Substitutions and Replacing Patterns via sub()
🧠 Context Introduction
When working with text data—whether it's cleaning log files, updating configuration values, or masking sensitive information—you'll often need to find specific patterns and replace them with something else. Python's re.sub() function is your go-to tool for this task. It allows you to search for a regex pattern in a string and substitute all matches with a replacement string, giving you powerful control over text transformations.
⚙️ What is re.sub()?
The re.sub() function performs substitution (find-and-replace) using regular expressions. It scans through a string, finds all non-overlapping matches of a pattern, and replaces them with a specified replacement.
Key characteristics:
- Returns a new string with the replacements applied (the original string remains unchanged)
- Can replace all occurrences or a limited number of matches
- Supports backreferences to reuse parts of the matched text in the replacement
- Can accept a function as the replacement for dynamic substitutions
🛠️ Basic Syntax
The general structure of re.sub() is:
re.sub(pattern, replacement, string, count=0, flags=0)
Where:
- pattern: The regex pattern to search for
- replacement: The string (or function) to replace matches with
- string: The input text to process
- count: (Optional) Maximum number of replacements; the default 0 means replace all
- flags: (Optional) Regex flags like re.IGNORECASE
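As a rough sketch of the call shape described above, the snippet below passes count and flags by keyword. The sample text and variable names are made up for illustration.

```python
import re

# Hypothetical sample text, not taken from a real log.
text = "Error: disk full. error: retry. ERROR: abort."

# Replace at most two matches, ignoring letter case.
# count and flags are passed by keyword so their roles stay unambiguous.
result = re.sub(r"error", "WARN", text, count=2, flags=re.IGNORECASE)
print(result)  # WARN: disk full. WARN: retry. ERROR: abort.
```

Passing a flag positionally as the fourth argument is a common slip: that slot is count, so re.sub(pattern, replacement, string, re.IGNORECASE) silently limits the number of replacements instead of changing how the pattern matches.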
📊 Simple Substitution Examples
Example 1: Replace all digits with a placeholder
Input string: "Order 12345 shipped on 2024-01-15"
Pattern: \d+ (one or more digits)
Replacement: "[REDACTED]"
Result: "Order [REDACTED] shipped on [REDACTED]-[REDACTED]-[REDACTED]"
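Example 1 could be written roughly as follows; the variable names are illustrative.

```python
import re

order = "Order 12345 shipped on 2024-01-15"

# \d+ matches each run of digits, so the date produces three separate matches
# (year, month, day) and the hyphens between them are left in place.
redacted = re.sub(r"\d+", "[REDACTED]", order)
print(redacted)  # Order [REDACTED] shipped on [REDACTED]-[REDACTED]-[REDACTED]
```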
Example 2: Replace specific word
Input string: "The color of the sky is blue"
Pattern: blue
Replacement: red
Result: "The color of the sky is red"
🕵️ Using Backreferences in Replacement
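Backreferences allow you to reuse captured groups from the pattern inside the replacement string. They are referenced using \1, \2, etc. In Python code, write the replacement as a raw string (for example r"\2 \1") so the backslashes are not interpreted as string escape sequences.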
Example: Swap first and last name
Input string: "Doe, John"
Pattern: (\w+),\s(\w+)
Replacement: \2 \1
Result: "John Doe"
Explanation:
- The first (\w+) captures "Doe" into group 1
- The second (\w+) captures "John" into group 2
- \2 \1 places group 2 first, then group 1
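The name swap above can be sketched in code as shown here. The \g<N> and named-group forms are alternatives that the re module also accepts; the group names last and first are chosen for illustration.

```python
import re

name = "Doe, John"

# Numbered backreferences; the raw string keeps \2 and \1 intact.
swapped = re.sub(r"(\w+),\s(\w+)", r"\2 \1", name)
print(swapped)  # John Doe

# \g<N> avoids ambiguity when a literal digit follows the reference:
# r"\20" would mean group 20, while r"\g<2>0" means group 2 followed by "0".
padded = re.sub(r"(\w+),\s(\w+)", r"\g<2>0 \g<1>", name)
print(padded)  # John0 Doe

# Named groups can make longer patterns easier to read.
named = re.sub(r"(?P<last>\w+),\s(?P<first>\w+)", r"\g<first> \g<last>", name)
print(named)  # John Doe
```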
🔄 Limiting Replacements with count
By default, re.sub() replaces all occurrences. Use the count parameter to limit replacements.
Example: Replace only the first two numbers
Input string: "Item 123, Price 456, Tax 789"
Pattern: \d+
Replacement: "XXX"
Count: 2
Result: "Item XXX, Price XXX, Tax 789"
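A short sketch of the same example in code, with illustrative variable names:

```python
import re

line = "Item 123, Price 456, Tax 789"

# count=2 stops after the first two matches, scanning left to right.
masked = re.sub(r"\d+", "XXX", line, count=2)
print(masked)  # Item XXX, Price XXX, Tax 789
```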
🧩 Using a Function as Replacement
Instead of a static string, you can pass a function that receives each match object and returns the replacement dynamically.
Example: Double all numbers
Input string: "Values: 5, 10, 15"
Pattern: \d+
Replacement function: lambda m: str(int(m.group()) * 2)
Result: "Values: 10, 20, 30"
How it works:
- The function is called once for each match
- m.group() gets the matched text
- The function returns the transformed replacement string
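The doubling example above, sketched as runnable code (variable names are illustrative):

```python
import re

values = "Values: 5, 10, 15"

# The lambda receives a match object for each number and must return a string,
# hence the str() around the arithmetic.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), values)
print(doubled)  # Values: 10, 20, 30
```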
🧹 Practical Use Cases for Engineers
| Use Case | Pattern | Replacement | Example Input | Example Output |
|---|---|---|---|---|
| Mask email addresses | \b[\w.-]+@[\w.-]+\.\w+\b | "[EMAIL]" | "Contact: [email protected]" | "Contact: [EMAIL]" |
| Normalize whitespace | \s+ | " " | "Hello World\nTest" | "Hello World Test" |
| Remove HTML tags | <[^>]+> | "" | "<b>Hello</b>" | "Hello" |
| Format phone numbers | (\d{3})(\d{3})(\d{4}) | (\1) \2-\3 | "1234567890" | "(123) 456-7890" |
| Sanitize filenames | [^\w.-] | "_" | "my file (v2).txt" | "my_file__v2_.txt" |
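A few of these use cases, sketched in code. The sample inputs (including the address jane@example.com and the <b> tag) are hypothetical values chosen for illustration, and these patterns are simplifications rather than full validators for email or HTML.

```python
import re

# Mask email addresses; the dot before the final label is escaped
# so that it matches a literal "." only.
masked_email = re.sub(r"\b[\w.-]+@[\w.-]+\.\w+\b", "[EMAIL]", "Contact: jane@example.com")
print(masked_email)  # Contact: [EMAIL]

# Collapse any run of whitespace (spaces, tabs, newlines) into one space.
normalized = re.sub(r"\s+", " ", "Hello   World\nTest")
print(normalized)  # Hello World Test

# Strip simple HTML tags by replacing them with the empty string.
stripped = re.sub(r"<[^>]+>", "", "<b>Hello</b>")
print(stripped)  # Hello

# Reformat a 10-digit phone number using three capture groups.
phone = re.sub(r"(\d{3})(\d{3})(\d{4})", r"(\1) \2-\3", "1234567890")
print(phone)  # (123) 456-7890
```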
⚠️ Important Notes
- re.sub() is case-sensitive by default; use flags=re.IGNORECASE for case-insensitive matching
- Strings are immutable, so the original string is never modified; assign the return value to a variable to keep the result
- If no matches are found, the original string is returned unchanged
- When you also need to know how many replacements were made, use re.subn(), which returns a tuple of (new_string, number_of_replacements)
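The first three notes can be checked with a small sketch like this one; the sample string is made up.

```python
import re

original = "status: OK"

# No match: the returned text equals the input.
unchanged = re.sub(r"\d+", "#", original)
print(unchanged == original)  # True

# Matching is case-sensitive unless a flag says otherwise.
sensitive = re.sub(r"ok", "fine", original)
insensitive = re.sub(r"ok", "fine", original, flags=re.IGNORECASE)
print(sensitive)    # status: OK
print(insensitive)  # status: fine

# The input string itself still holds its initial value.
print(original)  # status: OK
```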
🧪 Quick Reference: re.sub() vs re.subn()
| Feature | re.sub() | re.subn() |
|---|---|---|
| Returns | Modified string | Tuple: (string, count) |
| Replacement info | Not provided | Number of substitutions made |
| Use case | Simple find-and-replace | When you need to know how many changes were made |
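A minimal sketch of re.subn(), using a made-up log line:

```python
import re

log = "retry 1, retry 2, retry 3"

# subn returns the new string together with the number of substitutions.
new_log, replaced = re.subn(r"retry", "attempt", log)
print(new_log)   # attempt 1, attempt 2, attempt 3
print(replaced)  # 3
```

The count is handy for reporting, or for skipping a file write when nothing changed.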
💡 Final Tip
Start with simple patterns and test them on sample strings before applying to real data. The re.sub() function is incredibly versatile—whether you're cleaning logs, transforming configuration files, or masking sensitive data, mastering substitutions will save you hours of manual text editing.
The re.sub() function finds all matches of a pattern in a string and replaces them with a specified replacement string.
🔧 Example 1: Basic word replacement
Replace all occurrences of the word "cat" with "dog" in a simple string.
import re
text = "The cat sat on the mat."
result = re.sub(r"cat", "dog", text)
print(result)
📤 Output: The dog sat on the mat.
🔧 Example 2: Case-insensitive replacement
Replace "python" with "Java" regardless of letter case.
import re
text = "I love Python. Python is great. python rocks!"
result = re.sub(r"python", "Java", text, flags=re.IGNORECASE)
print(result)
📤 Output: I love Java. Java is great. Java rocks!
🔧 Example 3: Limiting the number of replacements
Replace only the first two occurrences of "apple" with "orange".
import re
text = "apple apple apple apple"
result = re.sub(r"apple", "orange", text, count=2)
print(result)
📤 Output: orange orange apple apple
🔧 Example 4: Using a replacement function
Replace each number with its square value using a helper function.
import re
def square(match):
    number = int(match.group(0))
    return str(number ** 2)
text = "Numbers: 2, 3, 4"
result = re.sub(r"\d+", square, text)
print(result)
📤 Output: Numbers: 4, 9, 16
🔧 Example 5: Removing unwanted characters
Remove all non-digit characters from a phone number string.
import re
phone = "Call me at (555) 123-4567"
result = re.sub(r"\D", "", phone)
print(result)
📤 Output: 5551234567
🔧 Example 6: Replacing with captured groups
Swap first and last names in a list of names.
import re
text = "Smith, John | Doe, Jane"
result = re.sub(r"(\w+), (\w+)", r"\2 \1", text)
print(result)
📤 Output: John Smith | Jane Doe
Comparison Table
| Feature | re.sub() | str.replace() |
|---|---|---|
| Pattern matching | Supports regex patterns | Only exact strings |
| Case-insensitive | Yes, with flags=re.IGNORECASE | No |
| Limit replacements | Yes, with the count parameter | Yes, with the optional third argument (count) |
| Replacement function | Yes, pass a callable | No |
| Capture group support | Yes, via \1, \2, etc. | No |
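The differences in the table can be seen side by side in a short sketch; the sample string is made up for illustration.

```python
import re

text = "cat Cat cat"

# str.replace matches the exact substring only, so "Cat" is left alone.
plain = text.replace("cat", "dog")
print(plain)  # dog Cat dog

# The optional third argument limits the number of replacements.
limited = text.replace("cat", "dog", 1)
print(limited)  # dog Cat cat

# re.sub can ignore case, which str.replace cannot do on its own.
regex = re.sub(r"cat", "dog", text, flags=re.IGNORECASE)
print(regex)  # dog dog dog
```

For fixed, case-sensitive substrings, str.replace() is the simpler choice; re.sub() earns its place once a pattern, a flag, a callable, or a capture group is involved.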