Parsing Structured Logs for Error Entries
🎯 Context Introduction
In modern systems, logs are generated in structured formats like JSON, CSV, or key-value pairs. When something goes wrong, engineers need to quickly extract error entries from these logs to diagnose issues. Rather than manually scrolling through thousands of lines, you can use Python to parse structured logs and filter out only the error entries you care about.
This guide walks through parsing a JSON-formatted log file to extract all error-level entries, using simple Python techniques that build on your regex knowledge.
⚙️ Understanding Structured Logs
Structured logs follow a predictable format, making them ideal for automated parsing. Common structured formats include:
- JSON logs: Each line is a JSON object with fields like timestamp, level, message, service
- CSV logs: Fields separated by commas, often with a header row
- Key-value logs: Pairs like level=ERROR and message=Connection timeout
Why parse structured logs?
- You can filter by severity level (ERROR, WARN, INFO)
- You can extract specific fields for analysis
- You can aggregate errors across multiple services
- You can trigger alerts based on error patterns
🛠️ Step 1: Reading the Log File
Before parsing, you need to read the log file into Python. The approach differs slightly depending on the log format.
For JSON logs (one JSON object per line):
- Open the file using Python's built-in open() function
- Read the file line by line using a for loop
- Parse each line as a JSON object using the json.loads() function
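The JSON steps above can be sketched as follows. This is a minimal illustration, not a definitive reader: the sample entries are made up, io.StringIO stands in for a real file opened with open(), and the helper name read_json_lines is hypothetical. Skipping blank and malformed lines is one possible policy; a real script might count or report them instead.

```python
import io
import json

# Hypothetical sample data; io.StringIO stands in for open("app.log", encoding="utf-8").
sample_log = io.StringIO(
    '{"timestamp": "2024-01-15 10:30:45", "level": "INFO", "service": "api", "message": "Service started"}\n'
    '{"timestamp": "2024-01-15 10:31:12", "level": "ERROR", "service": "db", "message": "Connection failed"}\n'
    '\n'
    'this line is not valid JSON\n'
)

def read_json_lines(lines):
    """Yield one dict per valid JSON object line, skipping blank or malformed lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # assumption: silently skip lines that are not valid JSON
        if isinstance(entry, dict):
            yield entry

entries = list(read_json_lines(sample_log))
print(len(entries))         # 2
print(entries[1]["level"])  # ERROR
```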
For CSV logs:
- Use Python's csv module with csv.DictReader() to read rows as dictionaries
- Each row becomes accessible by column name
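A short sketch of the CSV case, assuming a header row with the hypothetical column names timestamp, level, service, and message. io.StringIO again stands in for a real file, which would normally be opened with newline="" as the csv module documentation recommends.

```python
import csv
import io

# Hypothetical sample; stands in for open("app.csv", newline="", encoding="utf-8").
sample_csv = io.StringIO(
    "timestamp,level,service,message\n"
    "2024-01-15 10:30:45,INFO,api,Service started\n"
    '2024-01-15 10:31:12,ERROR,db,"Connection failed, retrying"\n'
)

# DictReader uses the header row as keys, so each row is a dict.
rows = list(csv.DictReader(sample_csv))

print(rows[1]["level"])    # ERROR
print(rows[1]["message"])  # Connection failed, retrying
```

The quoted message shows why the csv module is preferable to splitting on commas by hand: a comma inside a quoted field stays part of that field.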
For key-value logs:
- Read each line as a string
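- Split the line on spaces or another delimiter to extract key-value pairs (values that contain spaces, such as message=Connection timeout, need quoting or a regex instead)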
- Use regex patterns to match specific fields like level=ERROR
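For key-value logs, one possible sketch uses a single regex that accepts either a double-quoted value or a run of non-space characters. The quoting convention, the field names, and the helper name parse_kv_line are assumptions for illustration; real key-value formats vary, so the pattern would need adjusting.

```python
import re

# Matches key=value where the value is "double quoted" or a run of non-space characters.
# Assumption: values containing spaces are wrapped in double quotes.
KV_PATTERN = re.compile(r'(\w+)=("([^"]*)"|\S+)')

def parse_kv_line(line):
    """Return a dict of the key-value pairs found in one log line."""
    fields = {}
    for key, raw, quoted in KV_PATTERN.findall(line):
        # Use the inner text for quoted values, the raw token otherwise.
        fields[key] = quoted if raw.startswith('"') else raw
    return fields

line = 'ts=2024-01-15T10:31:12 level=ERROR service=db message="Connection timeout"'
parsed = parse_kv_line(line)

print(parsed["level"])    # ERROR
print(parsed["message"])  # Connection timeout
```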
🕵️ Step 2: Filtering for Error Entries
Once you have the log data in a structured format (like a dictionary), filtering for errors becomes straightforward.
For JSON logs:
- Check if the level or severity field equals "ERROR"
- You can also check for case-insensitive matches using .lower()
For CSV logs:
- Access the column containing the log level (e.g., row['level'])
- Compare the value to "ERROR" using an if statement
For key-value logs:
- Extract the value after level= using string methods or regex
- Compare the extracted value to "ERROR"
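Once every format has been reduced to a dictionary, the same check can serve all three. The sketch below assumes the level lives under either a level or a severity key and compares case-insensitively; the helper name is_error and the sample entries are hypothetical.

```python
def is_error(entry):
    """Return True when the entry's level (or severity) field is ERROR, ignoring case."""
    level = entry.get("level") or entry.get("severity") or ""
    return str(level).strip().lower() == "error"

# Hypothetical parsed entries, as produced by json.loads() or csv.DictReader.
entries = [
    {"level": "INFO", "message": "Service started"},
    {"level": "error", "message": "Connection failed"},
    {"severity": "ERROR", "message": "Timeout occurred"},
    {"message": "entry without a level"},
]

errors = [e for e in entries if is_error(e)]
print([e["message"] for e in errors])  # ['Connection failed', 'Timeout occurred']
```

Using .get() rather than entry["level"] means an entry with no level field is treated as a non-error instead of raising KeyError.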
📊 Comparison Table: Parsing Approaches by Log Format
| Log Format | Parsing Method | Error Detection | Best For |
|---|---|---|---|
| JSON | json.loads() per line | Check level field | Modern applications, microservices |
| CSV | csv.DictReader() | Check severity column | Legacy systems, exported data |
| Key-Value | String splitting or regex | Match level=ERROR pattern | Simple custom log formats |
🧩 Step 3: Extracting Relevant Information
After filtering for error entries, you typically want to extract specific details for further analysis or reporting.
Common fields to extract from error entries:
- Timestamp: When the error occurred
- Service name: Which component generated the error
- Error message: The actual error description
- Error code: HTTP status code or application error number
- Request ID: For tracing the request that caused the error
How to extract fields:
- For JSON logs, access dictionary keys directly: entry['timestamp'], entry['message']
- For CSV logs, access by column name: row['timestamp'], row['message']
- For key-value logs, use regex to capture values after each field name
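Direct key access such as entry['timestamp'] raises KeyError when a field is missing, which is common in real logs. A hedged sketch of a more tolerant extraction is shown below; the field names error_code and request_id and the helper name extract_fields are hypothetical.

```python
# Hypothetical error entry; not every log line carries every field.
entry = {
    "timestamp": "2024-01-15 10:31:12",
    "level": "ERROR",
    "service": "db",
    "message": "Connection failed",
}

# Assumed field names; adjust to match the real log schema.
FIELDS = ("timestamp", "service", "message", "error_code", "request_id")

def extract_fields(entry, fields=FIELDS):
    """Pick the selected fields, using None for any that are absent."""
    return {name: entry.get(name) for name in fields}

summary = extract_fields(entry)
print(summary["service"])     # db
print(summary["error_code"])  # None
```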
📈 Step 4: Counting and Summarizing Errors
Beyond just listing errors, you often need to understand error patterns.
Common summarization techniques:
- Count total errors per service using a dictionary where service names are keys
- Group errors by error code or message pattern
- Find the most frequent error messages
- Calculate error rate over time intervals
Example approach:
- Create an empty dictionary called error_counts
- For each error entry, increment the count for the relevant category (service, error code, etc.)
- After processing all logs, print the summary
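The counting approach above can also be written with collections.Counter from the standard library, which behaves like a dictionary that starts every key at zero. The sample entries are made up for illustration.

```python
from collections import Counter

# Hypothetical error entries that have already been filtered.
errors = [
    {"service": "db", "message": "Connection failed"},
    {"service": "auth", "message": "Invalid token"},
    {"service": "db", "message": "Timeout"},
]

# Count errors per service; entries without a service go under "unknown".
error_counts = Counter(e.get("service", "unknown") for e in errors)

# most_common() returns (key, count) pairs, highest count first.
for service, count in error_counts.most_common():
    print(f"{service}: {count}")
```

Swapping "service" for another key, such as an error code, gives the other groupings listed above.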
🎨 Step 5: Writing Results to a New File
Once you've parsed and filtered the errors, you may want to save the results for further analysis or reporting.
Output options:
- Write filtered errors to a new JSON file (one error per line)
- Write a CSV summary with error counts per category
- Write a plain text report with timestamps and messages
- Append errors to an existing error tracking file
Writing JSON output:
- Open a new file in write mode
- For each filtered error, convert the dictionary to JSON using json.dumps()
- Write each JSON string as a new line in the output file
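A minimal sketch of the JSON output steps follows. It writes to a temporary directory so that it runs anywhere; the file name errors.jsonl and the sample entries are hypothetical, and a real script would write to a path of its own choosing.

```python
import json
import os
import tempfile

# Hypothetical filtered error entries.
errors = [
    {"timestamp": "2024-01-15 10:31:12", "service": "db", "message": "Connection failed"},
    {"timestamp": "2024-01-15 10:33:21", "service": "db", "message": "Timeout occurred"},
]

# A temporary directory keeps the sketch self-contained.
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "errors.jsonl")

    # Write one JSON object per line.
    with open(out_path, "w", encoding="utf-8") as out_file:
        for entry in errors:
            out_file.write(json.dumps(entry) + "\n")

    # Read the file back to confirm the round trip.
    with open(out_path, encoding="utf-8") as in_file:
        written = [json.loads(line) for line in in_file]

print(written == errors)  # True
```

Opening the file in "a" mode instead of "w" would append to an existing error file rather than overwrite it.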
🔄 Complete Workflow Summary
The full process for parsing structured logs for error entries follows this flow:
- Open the log file using open() with the appropriate mode
- Read each line using a for loop
- Parse the line into a structured format (dictionary for JSON/CSV, key-value pairs for text)
- Check the severity level using an if statement comparing to "ERROR"
- Extract relevant fields from matching entries (timestamp, message, service)
- Store or aggregate the error information in a list or dictionary
- Output the results to console or write to a new file
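Put together for JSON logs, the workflow might look like the sketch below. It is one possible arrangement under stated assumptions: the sample data is invented, io.StringIO stands in for a file opened with open(), the helper name collect_errors is hypothetical, and results are printed rather than written to disk.

```python
import io
import json
from collections import Counter

def collect_errors(lines):
    """Parse JSON lines and return the selected fields of each ERROR entry."""
    errors = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # assumption: skip lines that are not valid JSON
        if not isinstance(entry, dict):
            continue
        if str(entry.get("level", "")).upper() == "ERROR":
            errors.append({
                "timestamp": entry.get("timestamp"),
                "service": entry.get("service"),
                "message": entry.get("message"),
            })
    return errors

# Hypothetical log content; stands in for open("app.log", encoding="utf-8").
log_file = io.StringIO(
    '{"timestamp": "2024-01-15 10:30:45", "level": "INFO", "service": "api", "message": "Service started"}\n'
    '{"timestamp": "2024-01-15 10:31:12", "level": "ERROR", "service": "db", "message": "Connection failed"}\n'
    '{"timestamp": "2024-01-15 10:32:00", "level": "WARN", "service": "api", "message": "Memory usage high"}\n'
    '{"timestamp": "2024-01-15 10:33:21", "level": "ERROR", "service": "db", "message": "Timeout occurred"}\n'
)

errors = collect_errors(log_file)
counts = Counter(e["service"] for e in errors)

for e in errors:
    print(json.dumps(e))
print(dict(counts))  # {'db': 2}
```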
🧠 Key Takeaways for Engineers
- Structured logs are predictable: JSON and CSV formats make parsing reliable and repeatable
- Filtering is simple: A single if statement checking the level field is often all you need
- Extract what matters: Focus on timestamp, message, and service name for most debugging scenarios
- Summarize patterns: Counting errors by type or service reveals systemic issues
- Save your work: Writing filtered results to a new file preserves your analysis for later review
🚀 Next Steps for Practice
- Start with a small JSON log file containing a mix of INFO, WARN, and ERROR entries
- Write a script that reads the file and prints only the ERROR entries
- Add logic to count how many errors occurred per service
- Extend your script to write the filtered errors to a new file
- Try parsing a CSV log file using the csv module for comparison
By mastering structured log parsing, you'll be able to quickly pinpoint issues in production systems without manually searching through thousands of log lines.
This guide shows how to use regex to extract error entries from structured log files.
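🔧 Example 1: Finding occurrences of "ERROR"
This example finds every occurrence of the word "ERROR" in the log text. Note that re.findall() returns the matched word itself, not the line it appears on, so the result is useful mainly for counting.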
import re
log_data = """
2024-01-15 10:30:45 INFO Service started
2024-01-15 10:31:12 ERROR Database connection failed
2024-01-15 10:32:00 WARN Memory usage high
2024-01-15 10:33:21 ERROR Timeout occurred
"""
pattern = r"ERROR"
matches = re.findall(pattern, log_data)
print(matches)
📤 Output: ['ERROR', 'ERROR']
🔧 Example 2: Extracting full error lines from logs
This example captures the entire line containing an ERROR entry.
import re
log_data = """
2024-01-15 10:30:45 INFO Service started
2024-01-15 10:31:12 ERROR Database connection failed
2024-01-15 10:32:00 WARN Memory usage high
2024-01-15 10:33:21 ERROR Timeout occurred
"""
pattern = r"^.*ERROR.*$"
matches = re.findall(pattern, log_data, re.MULTILINE)
print(matches)
📤 Output: ['2024-01-15 10:31:12 ERROR Database connection failed', '2024-01-15 10:33:21 ERROR Timeout occurred']
🔧 Example 3: Extracting timestamp and error message separately
This example splits each error line into timestamp and message parts.
import re
log_data = """
2024-01-15 10:31:12 ERROR Database connection failed
2024-01-15 10:33:21 ERROR Timeout occurred
2024-01-15 10:35:00 ERROR Disk space low
"""
pattern = r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) ERROR (.+)"
matches = re.findall(pattern, log_data)
print(matches)
📤 Output: [('2024-01-15 10:31:12', 'Database connection failed'), ('2024-01-15 10:33:21', 'Timeout occurred'), ('2024-01-15 10:35:00', 'Disk space low')]
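As a variation on this example, named groups make the captured parts easier to use than positional tuples. The sketch below is an illustration rather than part of the original example; the group names and the decision to capture the level as its own field are assumptions.

```python
import re

log_data = """
2024-01-15 10:30:45 INFO Service started
2024-01-15 10:31:12 ERROR Database connection failed
2024-01-15 10:33:21 ERROR Timeout occurred
"""

# Named groups: timestamp, level, message. re.MULTILINE anchors ^ and $ to each line.
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<level>[A-Z]+) (?P<message>.+)$",
    re.MULTILINE,
)

# groupdict() turns each match into a dict keyed by group name.
errors = [
    m.groupdict()
    for m in LINE_PATTERN.finditer(log_data)
    if m.group("level") == "ERROR"
]

print(errors[0]["timestamp"])  # 2024-01-15 10:31:12
print(errors[1]["message"])    # Timeout occurred
```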
🔧 Example 4: Filtering errors by error code pattern
This example extracts only errors that contain a specific error code like "ERR-500".
import re
log_data = """
2024-01-15 10:31:12 ERROR ERR-500 Database connection failed
2024-01-15 10:32:45 ERROR ERR-404 Resource not found
2024-01-15 10:33:21 ERROR Timeout occurred
2024-01-15 10:35:00 ERROR ERR-500 Disk space low
"""
pattern = r"ERROR ERR-500 (.+)"
matches = re.findall(pattern, log_data)
print(matches)
📤 Output: ['Database connection failed', 'Disk space low']
🔧 Example 5: Counting errors by type in structured JSON logs
This example parses JSON-formatted log entries and counts errors by type.
import re
log_data = """
{"timestamp": "2024-01-15 10:31:12", "level": "ERROR", "type": "DB_ERROR", "message": "Connection failed"}
{"timestamp": "2024-01-15 10:32:45", "level": "ERROR", "type": "AUTH_ERROR", "message": "Invalid token"}
{"timestamp": "2024-01-15 10:33:21", "level": "ERROR", "type": "DB_ERROR", "message": "Timeout"}
{"timestamp": "2024-01-15 10:35:00", "level": "INFO", "type": "STARTUP", "message": "Service ready"}
"""
# Require "level": "ERROR" before the type so that non-error entries are not counted
pattern = r'"level": "ERROR", "type": "(\w+)"'
matches = re.findall(pattern, log_data)
# Count how many times each error type appears
error_types = {}
for error_type in matches:
    if error_type in error_types:
        error_types[error_type] += 1
    else:
        error_types[error_type] = 1
print(error_types)
📤 Output: {'DB_ERROR': 2, 'AUTH_ERROR': 1}
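Because the entries in this example are JSON, an alternative to matching text is to decode each line with json.loads() and check the level field directly. The sketch below reuses the same sample data and counts only ERROR entries; it does not depend on key order or spacing, which a regex over raw JSON text does. The "UNKNOWN" fallback is an assumption for entries with no type field.

```python
import json
from collections import Counter

log_data = """
{"timestamp": "2024-01-15 10:31:12", "level": "ERROR", "type": "DB_ERROR", "message": "Connection failed"}
{"timestamp": "2024-01-15 10:32:45", "level": "ERROR", "type": "AUTH_ERROR", "message": "Invalid token"}
{"timestamp": "2024-01-15 10:33:21", "level": "ERROR", "type": "DB_ERROR", "message": "Timeout"}
{"timestamp": "2024-01-15 10:35:00", "level": "INFO", "type": "STARTUP", "message": "Service ready"}
"""

error_types = Counter()
for line in log_data.splitlines():
    if not line.strip():
        continue  # skip the blank first line of the triple-quoted string
    entry = json.loads(line)
    if entry.get("level") == "ERROR":
        error_types[entry.get("type", "UNKNOWN")] += 1

print(dict(error_types))  # {'DB_ERROR': 2, 'AUTH_ERROR': 1}
```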
📊 Comparison Table: Regex Patterns for Log Parsing
| Pattern | Purpose | Example Match |
|---|---|---|
| ERROR | Find each occurrence of ERROR | ERROR |
| ^.*ERROR.*$ | Capture entire error line (with re.MULTILINE) | Full line containing ERROR |
| (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) ERROR (.+) | Extract timestamp and message | ('2024-01-15 10:31:12', 'Database connection failed') |
| ERROR ERR-500 (.+) | Filter by error code | Database connection failed |
| "level": "ERROR", "type": "(\w+)" | Extract type of ERROR-level JSON entries | DB_ERROR |