Architecting Health Reporter Profiles and Requirements

🧭 Context Introduction

Before writing a single line of code, every successful automation project begins with a clear blueprint. The System Health Reporter is your capstone project—a tool that will monitor critical system metrics, detect anomalies, and generate actionable reports. This section defines the who, what, and why behind the tool by architecting user profiles and capturing functional requirements. Think of this as the architectural foundation upon which all your Python scripts will be built.

👤 Defining User Profiles

Different engineers interact with a health reporter in different ways. We identify three primary user profiles to guide our design decisions:

🕵️ The Daily Operator – Needs a quick, at-a-glance dashboard of system health. Prefers summary reports with clear pass/fail indicators. Values speed and simplicity over deep detail.
🔧 The Troubleshooter – Requires granular data when something goes wrong. Needs historical context, trend lines, and raw metric values. Willing to dig into logs and verbose output.
📈 The Manager – Cares about overall system reliability and compliance. Wants scheduled reports, uptime percentages, and trend summaries. Less interested in real-time data, more focused on patterns over time.

Each profile influences how we structure our Python script—from output formatting to logging levels and reporting frequency.
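One way to make that influence explicit is to keep the per-profile choices in a single lookup table. The sketch below is only an illustration: the profile keys, the setting names (`output`, `log_level`, `interval_minutes`), and every value are assumptions, not part of the project specification.

```python
import logging

# Hypothetical per-profile settings; names and values are illustrative only.
PROFILE_SETTINGS = {
    "operator": {"output": "summary", "log_level": logging.WARNING, "interval_minutes": 5},
    "troubleshooter": {"output": "verbose", "log_level": logging.DEBUG, "interval_minutes": 5},
    "manager": {"output": "report", "log_level": logging.INFO, "interval_minutes": 1440},
}


def settings_for(profile):
    """Return the settings for a profile, or raise ValueError if it is unknown."""
    try:
        return PROFILE_SETTINGS[profile]
    except KeyError:
        raise ValueError(f"unknown profile: {profile!r}") from None


print(settings_for("operator")["output"])
```

Keeping these choices in data rather than in scattered `if` statements means a new profile can be added without touching the reporting logic.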

📋 Core Functional Requirements

The System Health Reporter must satisfy the following essential capabilities:

⚙️ Metric Collection – Gather CPU usage, memory consumption, disk space, and network latency from the local system.
📊 Threshold Evaluation – Compare collected metrics against predefined healthy thresholds (e.g., CPU > 80% triggers a warning).
🛠️ Alert Generation – Produce clear, timestamped alerts when any metric exceeds its threshold.
📁 Report Output – Generate both a human-readable summary (console output) and a machine-parseable log file.
⏱️ Scheduling Support – Allow the script to run on a timer (e.g., every 5 minutes) without manual intervention.
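The first three requirements can be sketched with the standard library alone. This is a minimal illustration under stated assumptions: it covers disk usage only, the function names and the `DISK_WARNING_PERCENT` value are hypothetical, and the alert format is one possible layout rather than a required one.

```python
import shutil
from datetime import datetime, timezone

# Hypothetical threshold, chosen to match the 80% example above.
DISK_WARNING_PERCENT = 80.0


def collect_disk_percent(path="/"):
    """Return the percentage of disk space used at `path`."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.used / usage.total * 100


def evaluate(value, threshold):
    """Return True when the value exceeds its threshold."""
    return value > threshold


def make_alert(metric, value, threshold, now=None):
    """Build a timestamped alert line for a metric that crossed its threshold."""
    now = now or datetime.now(timezone.utc)
    stamp = now.isoformat(timespec="seconds")
    return f"{stamp} WARNING {metric}={value:.1f} exceeds {threshold:.1f}"


disk_percent = collect_disk_percent()
if evaluate(disk_percent, DISK_WARNING_PERCENT):
    print(make_alert("disk_usage_percent", disk_percent, DISK_WARNING_PERCENT))
else:
    print(f"disk_usage_percent OK ({disk_percent:.1f}%)")
```

Accepting `now` as a parameter keeps the alert formatter easy to test, because a fixed timestamp can be supplied instead of the current time.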

🧩 Non-Functional Requirements

Beyond what the tool does, we define how it should behave:

🚀 Performance – The script must complete a full health check in under 2 seconds on standard hardware.
🔒 Reliability – If a single metric collection fails (e.g., network timeout), the script should continue checking remaining metrics and log the error gracefully.
🧹 Maintainability – All thresholds and configuration values must be stored in a separate configuration dictionary, not hardcoded throughout the script.
📖 Readability – Code must include inline comments and follow consistent naming conventions so any engineer can understand and modify it.
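The reliability and maintainability requirements can be illustrated together. In this sketch, the `CONFIG` keys, the two placeholder collectors, and the `collect_all` helper are all hypothetical names. The collectors return canned values (one deliberately fails) so the example shows the control flow rather than real measurements.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("health_reporter")

# All thresholds and limits live in one place (values are illustrative).
CONFIG = {
    "thresholds": {"cpu_percent": 80, "network_latency_ms": 200},
    "max_runtime_seconds": 2.0,
}


def read_cpu_percent():
    """Placeholder collector returning a fixed reading."""
    return 42.0


def read_network_latency_ms():
    """Placeholder collector that simulates a network timeout."""
    raise TimeoutError("network probe timed out")


def collect_all(collectors):
    """Run every collector; log any failure and continue with the rest."""
    results = {}
    for name, collector in collectors.items():
        try:
            results[name] = collector()
        except Exception as exc:
            logger.error("Could not collect %s: %s", name, exc)
            results[name] = None  # mark the metric as unavailable
    return results


start = time.perf_counter()
readings = collect_all({
    "cpu_percent": read_cpu_percent,
    "network_latency_ms": read_network_latency_ms,
})
elapsed = time.perf_counter() - start

print(readings)
print(f"within time budget: {elapsed < CONFIG['max_runtime_seconds']}")
```

Recording a failed metric as `None` lets later stages tell "unavailable" apart from a genuine reading of zero.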

🆚 Comparison: Profiles vs. Requirements

This table maps each user profile to the requirements that matter most to them:

User Profile	Priority Requirement	Output Preference	Interaction Style
🕵️ Daily Operator	Threshold Evaluation	Simple pass/fail summary	Runs on demand or via cron
🔧 Troubleshooter	Metric Collection + Alert Generation	Verbose logs with timestamps	Interactive debugging sessions
📈 Manager	Report Output + Scheduling	Scheduled email or file reports	Reviews weekly summaries

🗺️ Architectural Decisions Driven by Profiles

Each profile shapes a specific design choice in our Python implementation:

For the Daily Operator – We will implement a color-coded console output using ANSI escape sequences. Green for healthy, yellow for warning, red for critical. No extra reading required.
For the Troubleshooter – We will include a verbose mode flag (e.g., --verbose) that prints raw metric values, collection timestamps, and comparison logic step-by-step.
For the Manager – We will write a separate report generator function that formats the last 24 hours of data into a clean summary with uptime percentage and top warnings.
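The three decisions above can be sketched as small, independent helpers. The ANSI color codes and the `argparse` calls are standard, but the helper names are hypothetical, and treating uptime as the share of checks that were not CRITICAL is an assumption made for this example.

```python
import argparse

# Standard ANSI color codes: green, yellow, red, then reset.
COLORS = {"OK": "\033[32m", "WARNING": "\033[33m", "CRITICAL": "\033[31m"}
RESET = "\033[0m"


def colorize(status):
    """Wrap a status label in its ANSI color; unknown labels pass through."""
    color = COLORS.get(status)
    return f"{color}{status}{RESET}" if color else status


def build_parser():
    """Create a parser with the --verbose flag for the Troubleshooter."""
    parser = argparse.ArgumentParser(description="System Health Reporter (sketch)")
    parser.add_argument("--verbose", action="store_true",
                        help="print raw metric values and thresholds")
    return parser


def render(metric, value, threshold, status, verbose=False):
    """Format one result line, with raw details only in verbose mode."""
    line = f"{metric}: {colorize(status)}"
    if verbose:
        line += f" (value={value}, threshold={threshold})"
    return line


def uptime_percent(statuses):
    """Share of checks that were not CRITICAL (an assumed definition)."""
    if not statuses:
        return 0.0
    healthy = sum(1 for s in statuses if s != "CRITICAL")
    return healthy / len(statuses) * 100


# An explicit argument list is passed here; omit it to read sys.argv.
args = build_parser().parse_args(["--verbose"])
print(render("cpu_usage_percent", 85, 80, "WARNING", verbose=args.verbose))
print(f"uptime: {uptime_percent(['OK', 'OK', 'WARNING', 'CRITICAL']):.1f}%")
```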

✅ Final Requirement Checklist

Before moving to implementation, confirm your design covers these essentials:

[ ] Profile Identification – Have you identified who will use this tool and what they care about?
[ ] Metric Scope – Have you listed exactly which system metrics will be collected?
[ ] Threshold Definitions – Are healthy, warning, and critical thresholds defined for each metric?
[ ] Output Formats – Have you decided on console output style and log file structure?
[ ] Error Handling Strategy – Do you know how the script will behave when a metric is unavailable?

🎯 Next Step

With profiles defined and requirements documented, you are ready to move into the Implementation Phase, where these specifications will be translated into clean, modular Python functions. The blueprint is complete—now the build begins.

The examples below show how to design profiles and requirements for a system health reporter that tracks engineer-defined metrics and thresholds.

🧩 Example 1: Defining a basic health profile with a single metric

A health profile stores one metric name and its acceptable threshold value.

profile_name = "CPU Health"
metric_name = "cpu_usage_percent"
threshold = 80

📤 Output: No output (variable assignment)

🧩 Example 2: Creating a profile dictionary with multiple requirements

A dictionary groups all requirements for one system component into a single profile.

cpu_profile = {
    "name": "CPU Health",
    "metric": "cpu_usage_percent",
    "warning_threshold": 70,
    "critical_threshold": 90
}

📤 Output: No output (dictionary created)

🧩 Example 3: Checking if a metric value meets the profile requirement

A simple comparison tells the engineer whether the current reading is within the acceptable range.

current_cpu = 85
warning_level = 70
critical_level = 90

if current_cpu >= critical_level:
    status = "CRITICAL"
elif current_cpu >= warning_level:
    status = "WARNING"
else:
    status = "OK"

📤 Output: No output (status variable set to "WARNING")

🧩 Example 4: Building a list of health profiles for multiple system components

A list of profiles allows the reporter to check several components in one pass.

health_profiles = [
    {"component": "CPU", "metric": "usage", "max_ok": 80},
    {"component": "Memory", "metric": "usage", "max_ok": 85},
    {"component": "Disk", "metric": "usage", "max_ok": 90}
]

for profile in health_profiles:
    print(f"{profile['component']} max OK: {profile['max_ok']}%")

📤 Output: CPU max OK: 80%
Memory max OK: 85%
Disk max OK: 90%

🧩 Example 5: Defining a requirement with multiple thresholds and a check function

A function lets the engineer reuse the same requirement logic across different components.

def check_health(component_name, current_value, warning_at, critical_at):
    if current_value >= critical_at:
        return f"{component_name}: CRITICAL ({current_value})"
    elif current_value >= warning_at:
        return f"{component_name}: WARNING ({current_value})"
    else:
        return f"{component_name}: OK ({current_value})"

result = check_health("CPU", 75, 70, 90)

📤 Output: No output (result variable set to "CPU: WARNING (75)")
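As a possible next step, the check function can be applied to a list of profile dictionaries in one pass. This sketch repeats `check_health` so that it runs on its own. The `profiles` list and the `readings` values are made up for illustration.

```python
def check_health(component_name, current_value, warning_at, critical_at):
    """Return a status line for one component reading."""
    if current_value >= critical_at:
        return f"{component_name}: CRITICAL ({current_value})"
    elif current_value >= warning_at:
        return f"{component_name}: WARNING ({current_value})"
    else:
        return f"{component_name}: OK ({current_value})"


# Hypothetical profiles, using the key names from Example 2.
profiles = [
    {"name": "CPU", "warning_threshold": 70, "critical_threshold": 90},
    {"name": "Memory", "warning_threshold": 75, "critical_threshold": 90},
    {"name": "Disk", "warning_threshold": 80, "critical_threshold": 95},
]

# Hypothetical current readings, keyed by component name.
readings = {"CPU": 75, "Memory": 95, "Disk": 40}

results = [
    check_health(p["name"], readings[p["name"]],
                 p["warning_threshold"], p["critical_threshold"])
    for p in profiles
]

for line in results:
    print(line)
# CPU: WARNING (75)
# Memory: CRITICAL (95)
# Disk: OK (40)
```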

Comparison Table

Concept	Example	Purpose
Single metric profile	Example 1	Store one metric and threshold
Dictionary profile	Example 2	Group multiple requirements
Threshold check	Example 3	Compare current value to limits
List of profiles	Example 4	Manage multiple components
Reusable check function	Example 5	Apply same logic across profiles