Architecting Health Reporter Profiles and Requirements
Context Introduction
Before writing a single line of code, every successful automation project begins with a clear blueprint. The System Health Reporter is your capstone project: a tool that will monitor critical system metrics, detect anomalies, and generate actionable reports. This section defines the who, what, and why behind the tool by architecting user profiles and capturing functional requirements. Think of this as the architectural foundation upon which all your Python scripts will be built.
Defining User Profiles
Different engineers interact with a health reporter in different ways. We identify three primary user profiles to guide our design decisions:
- The Daily Operator: Needs a quick, at-a-glance dashboard of system health. Prefers summary reports with clear pass/fail indicators. Values speed and simplicity over deep detail.
- The Troubleshooter: Requires granular data when something goes wrong. Needs historical context, trend lines, and raw metric values. Willing to dig into logs and verbose output.
- The Manager: Cares about overall system reliability and compliance. Wants scheduled reports, uptime percentages, and trend summaries. Less interested in real-time data, more focused on patterns over time.
Each profile influences how we structure our Python script, from output formatting to logging levels and reporting frequency.
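One way to make this influence concrete is a small lookup table keyed by profile. The sketch below is illustrative only: the key names (`log_level`, `output_style`, `report_interval_minutes`), the interval values, and the `settings_for` helper are assumptions, not part of the project specification.

```python
import logging

# Hypothetical per-profile settings; names and values are illustrative assumptions.
PROFILE_SETTINGS = {
    "daily_operator": {
        "log_level": logging.WARNING,
        "output_style": "summary",
        "report_interval_minutes": 5,
    },
    "troubleshooter": {
        "log_level": logging.DEBUG,
        "output_style": "verbose",
        "report_interval_minutes": 5,
    },
    "manager": {
        "log_level": logging.INFO,
        "output_style": "scheduled_report",
        "report_interval_minutes": 7 * 24 * 60,  # weekly
    },
}


def settings_for(profile_name):
    """Return the settings for a profile, falling back to the Daily Operator defaults."""
    return PROFILE_SETTINGS.get(profile_name, PROFILE_SETTINGS["daily_operator"])


chosen = settings_for("troubleshooter")
print(chosen["output_style"])
```

Keeping these choices in one table means a new profile can be added without touching the reporting logic.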
Core Functional Requirements
The System Health Reporter must satisfy the following essential capabilities:
- Metric Collection: Gather CPU usage, memory consumption, disk space, and network latency from the local system.
- Threshold Evaluation: Compare collected metrics against predefined healthy thresholds (e.g., CPU > 80% triggers a warning).
- Alert Generation: Produce clear, timestamped alerts when any metric exceeds its threshold.
- Report Output: Generate both a human-readable summary (console output) and a machine-parseable log file.
- Scheduling Support: Allow the script to run on a timer (e.g., every 5 minutes) without manual intervention.
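The first three requirements can be sketched with the standard library alone. This is a minimal sketch, not the project's final design: it collects only disk usage (via `shutil.disk_usage`), and the function names, the alert message format, and the 90% threshold are assumptions made for illustration.

```python
import shutil
from datetime import datetime


def disk_used_percent(path="/"):
    """Metric collection: percentage of disk space used at `path`."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


def exceeds(value, threshold):
    """Threshold evaluation: True when the metric is above its healthy limit."""
    return value > threshold


def make_alert(metric, value, threshold, now=None):
    """Alert generation: build a timestamped, single-line alert message."""
    now = now or datetime.now()
    stamp = now.strftime("%Y-%m-%d %H:%M:%S")
    return f"{stamp} WARNING {metric}={value:.1f} exceeds threshold {threshold}"


def run_health_check(threshold=90):
    """Collect one metric, evaluate it, and return a list of alert strings."""
    value = disk_used_percent()
    if exceeds(value, threshold):
        return [make_alert("disk_used_percent", value, threshold)]
    return []


for alert in run_health_check():
    print(alert)
```

Scheduling is deliberately left out of the sketch; the same `run_health_check` call could be triggered by cron or wrapped in a loop that sleeps between runs.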
Non-Functional Requirements
Beyond what the tool does, we define how it should behave:
- Performance: The script must complete a full health check in under 2 seconds on standard hardware.
- Reliability: If a single metric collection fails (e.g., network timeout), the script should continue checking remaining metrics and log the error gracefully.
- Maintainability: All thresholds and configuration values must be stored in a separate configuration dictionary, not hardcoded throughout the script.
- Readability: Code must include inline comments and follow consistent naming conventions so any engineer can understand and modify it.
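The Reliability and Maintainability requirements might look like the following in practice. This is a sketch under stated assumptions: the `CONFIG` layout, the metric names, the threshold values, and the stand-in collectors (which return fixed numbers or raise on purpose) are all hypothetical.

```python
import logging

logger = logging.getLogger("health_reporter")

# All thresholds live in one place; metric names and values are illustrative assumptions.
CONFIG = {
    "thresholds": {
        "cpu_percent": 80,
        "memory_percent": 85,
        "network_latency_ms": 200,
    },
}


def collect_all(collectors):
    """Run every collector; a failure in one is logged and does not stop the rest."""
    results = {}
    for name, collect in collectors.items():
        try:
            results[name] = collect()
        except Exception as exc:  # broad on purpose: any collector failure is non-fatal
            logger.error("Could not collect %s: %s", name, exc)
            results[name] = None
    return results


def failing_latency_probe():
    """Stand-in for a network check that times out."""
    raise TimeoutError("network timeout")


# Hypothetical collectors returning fixed values so the sketch runs anywhere.
collectors = {
    "cpu_percent": lambda: 42.0,
    "network_latency_ms": failing_latency_probe,
    "memory_percent": lambda: 63.5,
}

readings = collect_all(collectors)
for name, value in readings.items():
    if value is None:
        print(f"{name}: unavailable")
    else:
        print(f"{name}: {value} (limit {CONFIG['thresholds'][name]})")
```

The failed latency probe is recorded as `None` and reported as unavailable, while the CPU and memory checks still complete.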
Comparison: Profiles vs. Requirements
This table maps each user profile to the requirements that matter most to them:
| User Profile | Priority Requirement | Output Preference | Interaction Style |
|---|---|---|---|
| Daily Operator | Threshold Evaluation | Simple pass/fail summary | Runs on demand or via cron |
| Troubleshooter | Metric Collection + Alert Generation | Verbose logs with timestamps | Interactive debugging sessions |
| Manager | Report Output + Scheduling | Scheduled email or file reports | Reviews weekly summaries |
Architectural Decisions Driven by Profiles
Each profile shapes a specific design choice in our Python implementation:
- For the Daily Operator: We will implement color-coded console output using ANSI escape sequences. Green for healthy, yellow for warning, red for critical. No extra reading required.
- For the Troubleshooter: We will include a verbose mode flag (e.g., --verbose) that prints raw metric values, collection timestamps, and comparison logic step by step.
- For the Manager: We will write a separate report generator function that formats the last 24 hours of data into a clean summary with uptime percentage and top warnings.
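The three decisions above can be sketched as follows. The ANSI color codes and `argparse` usage are standard; the helper names, the made-up reading, and the definition of uptime as "share of checks that were not CRITICAL" are assumptions chosen for illustration.

```python
import argparse

# Standard ANSI SGR color codes: 32 = green, 33 = yellow, 31 = red, 0 = reset.
COLORS = {"OK": "\033[32m", "WARNING": "\033[33m", "CRITICAL": "\033[31m"}
RESET = "\033[0m"


def colorize(status):
    """Wrap a status label in its ANSI color; unknown labels are returned unchanged."""
    color = COLORS.get(status)
    return f"{color}{status}{RESET}" if color else status


def parse_args(argv=None):
    """Parse command-line flags; --verbose switches on detailed output."""
    parser = argparse.ArgumentParser(description="System Health Reporter (sketch)")
    parser.add_argument("--verbose", action="store_true",
                        help="print raw metric values and comparisons")
    return parser.parse_args(argv)


def uptime_percent(statuses):
    """Share of checks that were not CRITICAL (an assumed definition of uptime)."""
    if not statuses:
        return 0.0
    healthy = sum(1 for s in statuses if s != "CRITICAL")
    return healthy / len(statuses) * 100


args = parse_args(["--verbose"])  # explicit list so the sketch does not read sys.argv
value, limit = 85, 80  # made-up reading and threshold
status = "WARNING" if value > limit else "OK"
if args.verbose:
    print(f"cpu_usage_percent: value={value}, limit={limit}, above limit: {value > limit}")
print(colorize(status))
print(f"Uptime: {uptime_percent(['OK', 'WARNING', 'CRITICAL', 'OK']):.1f}%")
```

In a real script, `parse_args()` would be called without arguments so that it reads the actual command line.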
Final Requirement Checklist
Before moving to implementation, confirm your design covers these essentials:
- [ ] Profile Identification: Have you identified who will use this tool and what they care about?
- [ ] Metric Scope: Have you listed exactly which system metrics will be collected?
- [ ] Threshold Definitions: Are healthy, warning, and critical thresholds defined for each metric?
- [ ] Output Formats: Have you decided on console output style and log file structure?
- [ ] Error Handling Strategy: Do you know how the script will behave when a metric is unavailable?
Next Step
With profiles defined and requirements documented, you are ready to move into the Implementation Phase, where these specifications will be translated into clean, modular Python functions. The blueprint is complete; now the build begins.
This file shows how to design profiles and requirements for a system health reporter that tracks engineer-defined metrics and thresholds.
Example 1: Defining a basic health profile with a single metric
A health profile stores one metric name and its acceptable threshold value.
profile_name = "CPU Health"
metric_name = "cpu_usage_percent"
threshold = 80
Output: No output (variable assignment)
Example 2: Creating a profile dictionary with multiple requirements
A dictionary groups all requirements for one engineer component into a single profile.
cpu_profile = {
    "name": "CPU Health",
    "metric": "cpu_usage_percent",
    "warning_threshold": 70,
    "critical_threshold": 90
}
Output: No output (dictionary created)
Example 3: Checking if a metric value meets the profile requirement
A simple comparison tells the engineer whether the current reading is within the acceptable range.
current_cpu = 85
warning_level = 70
critical_level = 90
if current_cpu >= critical_level:
    status = "CRITICAL"
elif current_cpu >= warning_level:
    status = "WARNING"
else:
    status = "OK"
Output: No output (status variable set to "WARNING")
Example 4: Building a list of health profiles for multiple engineer components
A list of profiles allows the reporter to check several components in one pass.
health_profiles = [
    {"component": "CPU", "metric": "usage", "max_ok": 80},
    {"component": "Memory", "metric": "usage", "max_ok": 85},
    {"component": "Disk", "metric": "usage", "max_ok": 90}
]
for profile in health_profiles:
    print(f"{profile['component']} max OK: {profile['max_ok']}%")
Output:
CPU max OK: 80%
Memory max OK: 85%
Disk max OK: 90%
Example 5: Defining a requirement with multiple thresholds and a check function
A function lets the engineer reuse the same requirement logic across different components.
def check_health(component_name, current_value, warning_at, critical_at):
    if current_value >= critical_at:
        return f"{component_name}: CRITICAL ({current_value})"
    elif current_value >= warning_at:
        return f"{component_name}: WARNING ({current_value})"
    else:
        return f"{component_name}: OK ({current_value})"
result = check_health("CPU", 75, 70, 90)
Output: No output (result variable set to "CPU: WARNING (75)")
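As a possible next step, the function from Example 5 can be applied to a list of profiles like the one in Example 4. The sketch below repeats `check_health` so that it runs on its own; the `warning_at` and `critical_at` keys and the sample readings are made up for illustration.

```python
def check_health(component_name, current_value, warning_at, critical_at):
    """Classify a reading as OK, WARNING, or CRITICAL (same logic as Example 5)."""
    if current_value >= critical_at:
        return f"{component_name}: CRITICAL ({current_value})"
    elif current_value >= warning_at:
        return f"{component_name}: WARNING ({current_value})"
    else:
        return f"{component_name}: OK ({current_value})"


# Hypothetical profiles and readings; the numbers are illustrative only.
profiles = [
    {"component": "CPU", "warning_at": 70, "critical_at": 90},
    {"component": "Memory", "warning_at": 75, "critical_at": 90},
    {"component": "Disk", "warning_at": 80, "critical_at": 95},
]
readings = {"CPU": 75, "Memory": 40, "Disk": 97}

report = [
    check_health(p["component"], readings[p["component"]], p["warning_at"], p["critical_at"])
    for p in profiles
]
for line in report:
    print(line)
# CPU: WARNING (75)
# Memory: OK (40)
# Disk: CRITICAL (97)
```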
Comparison Table
| Concept | Example | Purpose |
|---|---|---|
| Single metric profile | Example 1 | Store one metric and threshold |
| Dictionary profile | Example 2 | Group multiple requirements |
| Threshold check | Example 3 | Compare current value to limits |
| List of profiles | Example 4 | Manage multiple components |
| Reusable check function | Example 5 | Apply same logic across profiles |