Practical Example: Archiving Old Log Files
๐ท๏ธ Operating System and System Operations / The shutil Module
๐ง Context Introduction
Log files are essential for troubleshooting and monitoring, but they can quickly consume disk space if left unchecked. Engineers often need to archive or compress older log files to free up storage while preserving them for future reference. This practical example demonstrates how to use Python's shutil module to automate the process of identifying and archiving old log files into a compressed archive.
โ๏ธ The Goal
The objective is to create a script that:
- Scans a specified directory for log files (e.g., files ending with .log)
- Identifies files older than a certain number of days (e.g., 7 days)
- Compresses those old files into a single archive (e.g., .zip or .tar.gz)
- Removes the original log files after successful archiving
๐ ๏ธ Key Tools from the shutil Module
The shutil module provides high-level file operations. For this task, we rely on:
- shutil.make_archive() โ Creates a compressed archive (zip, tar, etc.) from a collection of files
- shutil.rmtree() โ Removes a directory and all its contents (useful if logs are stored in subdirectories)
- os.path.getmtime() โ Retrieves the last modification time of a file to determine its age
๐ Comparison: Archive Formats
| Format | Compression | Platform Support | Use Case |
|---|---|---|---|
| zip | Moderate | Cross-platform | General purpose, easy to open on any OS |
| tar.gz | High (gzip) | Linux/macOS preferred | Better compression, common in server environments |
| tar.bz2 | Very high | Linux/macOS preferred | Maximum compression, slower to create |
For this example, we will use zip format for simplicity and broad compatibility.
๐ต๏ธ Step-by-Step Script Logic
Step 1: Define the target directory and age threshold
- Set a variable for the directory containing log files (e.g., /var/log/myapp)
- Set a variable for the maximum age in days (e.g., 7)
Step 2: List all files in the directory
- Use os.listdir() to get all entries in the directory
- Filter for files ending with .log using str.endswith()
Step 3: Check file age
- For each log file, use os.path.getmtime() to get the last modification timestamp
- Compare the timestamp with the current time minus the age threshold (using time.time())
- If the file is older than the threshold, add it to a list of files to archive
Step 4: Create the archive
- Use shutil.make_archive() with the base name (e.g., old_logs_archive), format (zip), and the list of file paths
- The archive will be created in the current working directory or a specified location
Step 5: Remove the original files
- After successful archiving, use os.remove() to delete each original log file
- Optionally, log the names of archived and removed files for auditing
๐งช Example Script Structure (Inline)
The script begins by importing os, shutil, and time. It then defines the target directory and age threshold. A function called archive_old_logs() is created to encapsulate the logic.
Inside the function:
- A list called files_to_archive is initialized as empty
- The script loops through all entries in the target directory using os.listdir()
- For each entry that ends with .log, it checks the file's modification time
- If the file is older than the threshold, its full path is appended to files_to_archive
- After the loop, if there are files to archive, shutil.make_archive() is called with a base name like archived_logs, format zip, and the root directory set to the target directory. The files to archive are passed as a list of relative paths
- Finally, each original file in files_to_archive is deleted using os.remove()
The script prints a message for each file archived and removed, and a summary at the end.
โ Expected Outcome
When the script runs:
- It scans the target directory and identifies all .log files older than 7 days
- It creates a single .zip file named archived_logs.zip containing those old log files
- It deletes the original log files from the directory
- The console output shows something like:
Archived: /var/log/myapp/error_20231001.log
Archived: /var/log/myapp/access_20230928.log
Removed: /var/log/myapp/error_20231001.log
Removed: /var/log/myapp/access_20230928.log
Archiving complete. 2 files processed.
๐งน Best Practices for Engineers
- Always test the script on a copy of your log directory first to avoid accidental data loss
- Use absolute paths to avoid confusion about the current working directory
- Add error handling (e.g., try-except blocks) to manage permission issues or missing files
- Schedule the script using cron (Linux) or Task Scheduler (Windows) for regular automated cleanup
- Consider logging the script's actions to a separate file for audit trails
๐ Summary
This practical example shows how the shutil module simplifies file archiving tasks that would otherwise require complex shell commands or manual effort. By combining shutil.make_archive() with file age checks, engineers can build a reliable log rotation system that keeps storage under control without losing historical data. The same approach can be extended to archive other file types or to compress entire directories.
This example shows how to use Python's shutil module to archive old log files into compressed archives for storage or cleanup.
๐ฆ Example 1: Creating a simple ZIP archive of a single file
This example demonstrates how to compress one log file into a ZIP archive.
import shutil
import os
log_file = "server.log"
archive_name = "server_log_backup"
shutil.make_archive(archive_name, "zip", ".", log_file)
๐ค Output: server_log_backup.zip created in current directory
๐ฆ Example 2: Archiving an entire directory of log files
This example shows how to compress a folder containing multiple log files into one archive.
import shutil
log_directory = "logs"
archive_name = "logs_backup"
shutil.make_archive(archive_name, "zip", log_directory)
๐ค Output: logs_backup.zip created containing all files from logs/
๐ฆ Example 3: Archiving only files older than 7 days
This example demonstrates how to filter and archive log files based on their age.
import shutil
import os
import time
log_directory = "logs"
archive_name = "old_logs_backup"
temp_dir = "temp_old_logs"
os.makedirs(temp_dir, exist_ok=True)
seven_days_ago = time.time() - (7 * 24 * 60 * 60)
for file_name in os.listdir(log_directory):
file_path = os.path.join(log_directory, file_name)
if os.path.getmtime(file_path) < seven_days_ago:
shutil.copy2(file_path, temp_dir)
shutil.make_archive(archive_name, "zip", temp_dir)
shutil.rmtree(temp_dir)
๐ค Output: old_logs_backup.zip created with files older than 7 days
๐ฆ Example 4: Archiving and deleting original log files after backup
This example shows how to archive old logs and then remove the original files to free up space.
import shutil
import os
import time
log_directory = "logs"
archive_name = "archived_logs"
temp_dir = "temp_archive"
os.makedirs(temp_dir, exist_ok=True)
seven_days_ago = time.time() - (7 * 24 * 60 * 60)
for file_name in os.listdir(log_directory):
file_path = os.path.join(log_directory, file_name)
if os.path.getmtime(file_path) < seven_days_ago:
shutil.copy2(file_path, temp_dir)
os.remove(file_path)
shutil.make_archive(archive_name, "zip", temp_dir)
shutil.rmtree(temp_dir)
๐ค Output: archived_logs.zip created, original old log files deleted
๐ฆ Example 5: Archiving logs with date-stamped archive names
This example demonstrates how to create archives with the current date in the filename for better organization.
import shutil
import os
from datetime import datetime
log_directory = "logs"
current_date = datetime.now().strftime("%Y-%m-%d")
archive_name = f"logs_archive_{current_date}"
shutil.make_archive(archive_name, "zip", log_directory)
๐ค Output: logs_archive_2024-01-15.zip created (date will vary)
๐ Comparison Table: Archive Methods
| Method | Use Case | Preserves Original Files | Creates Archive |
|---|---|---|---|
make_archive with single file |
Quick backup of one log | Yes | Yes |
make_archive with directory |
Backup entire log folder | Yes | Yes |
| Filter by age + archive | Archive old logs only | Yes | Yes |
| Archive + delete originals | Free up disk space | No | Yes |
| Date-stamped archive name | Organized backups | Yes | Yes |
๐ง Context Introduction
Log files are essential for troubleshooting and monitoring, but they can quickly consume disk space if left unchecked. Engineers often need to archive or compress older log files to free up storage while preserving them for future reference. This practical example demonstrates how to use Python's shutil module to automate the process of identifying and archiving old log files into a compressed archive.
โ๏ธ The Goal
The objective is to create a script that:
- Scans a specified directory for log files (e.g., files ending with .log)
- Identifies files older than a certain number of days (e.g., 7 days)
- Compresses those old files into a single archive (e.g., .zip or .tar.gz)
- Removes the original log files after successful archiving
๐ ๏ธ Key Tools from the shutil Module
The shutil module provides high-level file operations. For this task, we rely on:
- shutil.make_archive() โ Creates a compressed archive (zip, tar, etc.) from a collection of files
- shutil.rmtree() โ Removes a directory and all its contents (useful if logs are stored in subdirectories)
- os.path.getmtime() โ Retrieves the last modification time of a file to determine its age
๐ Comparison: Archive Formats
| Format | Compression | Platform Support | Use Case |
|---|---|---|---|
| zip | Moderate | Cross-platform | General purpose, easy to open on any OS |
| tar.gz | High (gzip) | Linux/macOS preferred | Better compression, common in server environments |
| tar.bz2 | Very high | Linux/macOS preferred | Maximum compression, slower to create |
For this example, we will use zip format for simplicity and broad compatibility.
๐ต๏ธ Step-by-Step Script Logic
Step 1: Define the target directory and age threshold
- Set a variable for the directory containing log files (e.g., /var/log/myapp)
- Set a variable for the maximum age in days (e.g., 7)
Step 2: List all files in the directory
- Use os.listdir() to get all entries in the directory
- Filter for files ending with .log using str.endswith()
Step 3: Check file age
- For each log file, use os.path.getmtime() to get the last modification timestamp
- Compare the timestamp with the current time minus the age threshold (using time.time())
- If the file is older than the threshold, add it to a list of files to archive
Step 4: Create the archive
- Use shutil.make_archive() with the base name (e.g., old_logs_archive), format (zip), and the list of file paths
- The archive will be created in the current working directory or a specified location
Step 5: Remove the original files
- After successful archiving, use os.remove() to delete each original log file
- Optionally, log the names of archived and removed files for auditing
๐งช Example Script Structure (Inline)
The script begins by importing os, shutil, and time. It then defines the target directory and age threshold. A function called archive_old_logs() is created to encapsulate the logic.
Inside the function:
- A list called files_to_archive is initialized as empty
- The script loops through all entries in the target directory using os.listdir()
- For each entry that ends with .log, it checks the file's modification time
- If the file is older than the threshold, its full path is appended to files_to_archive
- After the loop, if there are files to archive, shutil.make_archive() is called with a base name like archived_logs, format zip, and the root directory set to the target directory. The files to archive are passed as a list of relative paths
- Finally, each original file in files_to_archive is deleted using os.remove()
The script prints a message for each file archived and removed, and a summary at the end.
โ Expected Outcome
When the script runs:
- It scans the target directory and identifies all .log files older than 7 days
- It creates a single .zip file named archived_logs.zip containing those old log files
- It deletes the original log files from the directory
- The console output shows something like:
Archived: /var/log/myapp/error_20231001.log
Archived: /var/log/myapp/access_20230928.log
Removed: /var/log/myapp/error_20231001.log
Removed: /var/log/myapp/access_20230928.log
Archiving complete. 2 files processed.
๐งน Best Practices for Engineers
- Always test the script on a copy of your log directory first to avoid accidental data loss
- Use absolute paths to avoid confusion about the current working directory
- Add error handling (e.g., try-except blocks) to manage permission issues or missing files
- Schedule the script using cron (Linux) or Task Scheduler (Windows) for regular automated cleanup
- Consider logging the script's actions to a separate file for audit trails
๐ Summary
This practical example shows how the shutil module simplifies file archiving tasks that would otherwise require complex shell commands or manual effort. By combining shutil.make_archive() with file age checks, engineers can build a reliable log rotation system that keeps storage under control without losing historical data. The same approach can be extended to archive other file types or to compress entire directories.
Interactive Views
You are currently in ๐ All-in-One mode. Use the tabs at the top to switch to ๐ Theory Only or ๐ป Code Only views.
This example shows how to use Python's shutil module to archive old log files into compressed archives for storage or cleanup.
๐ฆ Example 1: Creating a simple ZIP archive of a single file
This example demonstrates how to compress one log file into a ZIP archive.
import shutil
import os
log_file = "server.log"
archive_name = "server_log_backup"
shutil.make_archive(archive_name, "zip", ".", log_file)
๐ค Output: server_log_backup.zip created in current directory
๐ฆ Example 2: Archiving an entire directory of log files
This example shows how to compress a folder containing multiple log files into one archive.
import shutil
log_directory = "logs"
archive_name = "logs_backup"
shutil.make_archive(archive_name, "zip", log_directory)
๐ค Output: logs_backup.zip created containing all files from logs/
๐ฆ Example 3: Archiving only files older than 7 days
This example demonstrates how to filter and archive log files based on their age.
import shutil
import os
import time
log_directory = "logs"
archive_name = "old_logs_backup"
temp_dir = "temp_old_logs"
os.makedirs(temp_dir, exist_ok=True)
seven_days_ago = time.time() - (7 * 24 * 60 * 60)
for file_name in os.listdir(log_directory):
file_path = os.path.join(log_directory, file_name)
if os.path.getmtime(file_path) < seven_days_ago:
shutil.copy2(file_path, temp_dir)
shutil.make_archive(archive_name, "zip", temp_dir)
shutil.rmtree(temp_dir)
๐ค Output: old_logs_backup.zip created with files older than 7 days
๐ฆ Example 4: Archiving and deleting original log files after backup
This example shows how to archive old logs and then remove the original files to free up space.
import shutil
import os
import time
log_directory = "logs"
archive_name = "archived_logs"
temp_dir = "temp_archive"
os.makedirs(temp_dir, exist_ok=True)
seven_days_ago = time.time() - (7 * 24 * 60 * 60)
for file_name in os.listdir(log_directory):
file_path = os.path.join(log_directory, file_name)
if os.path.getmtime(file_path) < seven_days_ago:
shutil.copy2(file_path, temp_dir)
os.remove(file_path)
shutil.make_archive(archive_name, "zip", temp_dir)
shutil.rmtree(temp_dir)
๐ค Output: archived_logs.zip created, original old log files deleted
๐ฆ Example 5: Archiving logs with date-stamped archive names
This example demonstrates how to create archives with the current date in the filename for better organization.
import shutil
import os
from datetime import datetime
log_directory = "logs"
current_date = datetime.now().strftime("%Y-%m-%d")
archive_name = f"logs_archive_{current_date}"
shutil.make_archive(archive_name, "zip", log_directory)
๐ค Output: logs_archive_2024-01-15.zip created (date will vary)
๐ Comparison Table: Archive Methods
| Method | Use Case | Preserves Original Files | Creates Archive |
|---|---|---|---|
make_archive with single file |
Quick backup of one log | Yes | Yes |
make_archive with directory |
Backup entire log folder | Yes | Yes |
| Filter by age + archive | Archive old logs only | Yes | Yes |
| Archive + delete originals | Free up disk space | No | Yes |
| Date-stamped archive name | Organized backups | Yes | Yes |