Substitutions and Replacing Patterns via sub()


🧠 Context Introduction

When working with text data—whether it's cleaning log files, updating configuration values, or masking sensitive information—you'll often need to find specific patterns and replace them with something else. Python's re.sub() function is your go-to tool for this task. It allows you to search for a regex pattern in a string and substitute all matches with a replacement string, giving you powerful control over text transformations.


⚙️ What is re.sub()?

The re.sub() function performs substitution (find-and-replace) using regular expressions. It scans through a string, finds all non-overlapping matches of a pattern, and replaces them with a specified replacement.

Key characteristics:

  • Returns a new string with the replacements applied (the original string remains unchanged)
  • Can replace all occurrences or a limited number of matches
  • Supports backreferences to reuse parts of the matched text in the replacement
  • Can accept a function as the replacement for dynamic substitutions


🛠️ Basic Syntax

The general structure of re.sub() is:

re.sub(pattern, repl, string, count=0, flags=0)

Where:

  • pattern – The regex pattern to search for (a string or a compiled pattern object)
  • repl – The replacement: a string, or a function that is called once per match
  • string – The input text to process
  • count – (Optional) Maximum number of replacements; the default 0 means replace all
  • flags – (Optional) Regex flags such as re.IGNORECASE


📊 Simple Substitution Examples

Example 1: Replace all digits with a placeholder

Input string: "Order 12345 shipped on 2024-01-15"
Pattern: \d+ (one or more digits)
Replacement: "[REDACTED]"
Result: "Order [REDACTED] shipped on [REDACTED]-[REDACTED]-[REDACTED]"

Example 2: Replace specific word

Input string: "The color of the sky is blue"
Pattern: blue
Replacement: red
Result: "The color of the sky is red"
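
The two examples above can be reproduced with a short script. This is a minimal sketch; the variable names are illustrative and not part of the re API.

```python
import re

# Example 1: mask every run of digits with a placeholder.
order = "Order 12345 shipped on 2024-01-15"
masked = re.sub(r"\d+", "[REDACTED]", order)
print(masked)     # Order [REDACTED] shipped on [REDACTED]-[REDACTED]-[REDACTED]

# Example 2: replace a literal word.
sky = "The color of the sky is blue"
recolored = re.sub(r"blue", "red", sky)
print(recolored)  # The color of the sky is red
```

Note that the plain pattern blue would also match inside a longer word such as "bluebird"; writing it as \bblue\b restricts the match to the whole word.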


🕵️ Using Backreferences in Replacement

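Backreferences allow you to reuse captured groups from the pattern inside the replacement string. They are referenced using \1, \2, etc. In Python code, write the replacement as a raw string (for example r"\2 \1") so that the backslashes reach re.sub() unchanged.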

Example: Swap first and last name

Input string: "Doe, John"
Pattern: (\w+),\s(\w+)
Replacement: \2 \1
Result: "John Doe"

Explanation:

  • The first (\w+) captures "Doe" into group 1
  • The second (\w+) captures "John" into group 2
  • \2 \1 places group 2 first, then group 1
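
As a sketch of the same swap in code, the snippet below uses numbered groups and then named groups. The group names last and first are chosen here for illustration; \g<name> is the re module's syntax for referring to a named group in a replacement string.

```python
import re

name = "Doe, John"

# Numbered groups: \1 is the last name, \2 is the first name.
swapped = re.sub(r"(\w+),\s(\w+)", r"\2 \1", name)
print(swapped)        # John Doe

# Named groups make longer replacements easier to read.
swapped_named = re.sub(
    r"(?P<last>\w+),\s(?P<first>\w+)",
    r"\g<first> \g<last>",
    name,
)
print(swapped_named)  # John Doe
```

The \g<1> form is also useful when a group reference is followed directly by a digit, where \10 would be read as group 10 rather than group 1 followed by "0".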


🔄 Limiting Replacements with count

By default, re.sub() replaces all occurrences. Use the count parameter to limit replacements.

Example: Replace only the first two numbers

Input string: "Item 123, Price 456, Tax 789"
Pattern: \d+
Replacement: "XXX"
Count: 2
Result: "Item XXX, Price XXX, Tax 789"
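
A minimal sketch of the example above, with count passed as a keyword argument (variable names are illustrative):

```python
import re

line = "Item 123, Price 456, Tax 789"

# Matches are replaced left to right; replacement stops after two of them.
limited = re.sub(r"\d+", "XXX", line, count=2)
print(limited)  # Item XXX, Price XXX, Tax 789
```

Passing count by keyword is the safer habit: it reads clearly, and as of Python 3.13 passing count and flags positionally is deprecated.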


🧩 Using a Function as Replacement

Instead of a static string, you can pass a function that receives each match object and returns the replacement dynamically.

Example: Double all numbers

Input string: "Values: 5, 10, 15"
Pattern: \d+
Replacement function: lambda m: str(int(m.group()) * 2)
Result: "Values: 10, 20, 30"

How it works:

  • The function is called for each match
  • m.group() gets the matched text
  • The function returns the transformed replacement string
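
The doubling example can be sketched as below. The second call is a hypothetical extension: the units dictionary and its sample text are invented here to show a replacement function looking up a captured group.

```python
import re

values = "Values: 5, 10, 15"

# The lambda receives a match object and must return a string.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), values)
print(doubled)   # Values: 10, 20, 30

# Hypothetical lookup table: expand unit abbreviations via group 1.
units = {"kb": "kilobytes", "mb": "megabytes"}
expanded = re.sub(
    r"\b(kb|mb)\b",
    lambda m: units[m.group(1)],
    "10 mb free, 512 kb used",
)
print(expanded)  # 10 megabytes free, 512 kilobytes used
```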


🧹 Practical Use Cases for Engineers

| Use Case | Pattern | Replacement | Example Input | Example Output |
| --- | --- | --- | --- | --- |
| Mask email addresses | \b[\w.-]+@[\w.-]+\.\w+\b | "[EMAIL]" | "Contact: [email protected]" | "Contact: [EMAIL]" |
| Normalize whitespace | \s+ | " " | "Hello World\nTest" | "Hello World Test" |
| Remove HTML tags | <[^>]+> | "" | "<p>Hello</p>" | "Hello" |
| Format phone numbers | (\d{3})(\d{3})(\d{4}) | (\1) \2-\3 | "1234567890" | "(123) 456-7890" |
| Sanitize filenames | [^\w.-] | "_" | "my file (v2).txt" | "my_file__v2_.txt" |
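
The use cases above can be tried out with a sketch like the following. The sample inputs, including the email address, are made up for illustration, and these patterns are deliberately simple rather than robust validators.

```python
import re

# Mask email addresses; note the escaped dot before the top-level domain.
email_masked = re.sub(r"\b[\w.-]+@[\w.-]+\.\w+\b", "[EMAIL]",
                      "Contact: jane.doe@example.com")

# Collapse any run of whitespace (spaces, tabs, newlines) into one space.
normalized = re.sub(r"\s+", " ", "Hello   World\nTest")

# Strip anything that looks like an HTML tag.
no_tags = re.sub(r"<[^>]+>", "", "<p>Hello</p>")

# Reformat a 10-digit number using three captured groups.
phone = re.sub(r"(\d{3})(\d{3})(\d{4})", r"(\1) \2-\3", "1234567890")

# Replace each character that is not a word character, dot, or hyphen.
filename = re.sub(r"[^\w.-]", "_", "my file (v2).txt")

print(email_masked)  # Contact: [EMAIL]
print(normalized)    # Hello World Test
print(no_tags)       # Hello
print(phone)         # (123) 456-7890
print(filename)      # my_file__v2_.txt
```

Because the filename pattern replaces one character at a time, the space and the opening parenthesis each become an underscore. Using [^\w.-]+ instead would collapse each run into a single underscore.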

⚠️ Important Notes

  • re.sub() is case-sensitive by default; use flags=re.IGNORECASE for case-insensitive matching
  • The original string is never modified (Python strings are immutable); assign the result to a variable to keep it
  • If no matches are found, the original string is returned unchanged
  • When you also need to know how many replacements were made, use re.subn(), which returns a tuple of (new_string, number_of_replacements)
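
The first three notes can be checked with a short sketch; the sample log text is invented for illustration.

```python
import re

original = "Error: disk full. ERROR: retry failed."

# Case-sensitive by default: lowercase "error" never occurs, so nothing matches
# and the text comes back unchanged.
strict = re.sub(r"error", "warning", original)
print(strict == original)  # True

# With re.IGNORECASE both spellings are replaced.
relaxed = re.sub(r"error", "warning", original, flags=re.IGNORECASE)
print(relaxed)             # warning: disk full. warning: retry failed.

# The input string itself is untouched by either call.
print(original)            # Error: disk full. ERROR: retry failed.
```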

🧪 Quick Reference: re.sub() vs re.subn()

| Feature | re.sub() | re.subn() |
| --- | --- | --- |
| Returns | Modified string | Tuple: (string, count) |
| Replacement info | Not provided | Number of substitutions made |
| Use case | Simple find-and-replace | When you need to know how many changes were made |
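
A minimal sketch of re.subn() in use; the sample strings and variable names are illustrative.

```python
import re

log = "user=alice id=42 session=9001"

# subn takes the same arguments as sub but also reports the number of replacements.
cleaned, replaced = re.subn(r"\d+", "#", log)
print(cleaned)   # user=alice id=# session=#
print(replaced)  # 2

# With no match, the text is returned unchanged and the count is 0.
untouched, none_replaced = re.subn(r"\d+", "#", "no digits here")
print(untouched, none_replaced)  # no digits here 0
```

The count is handy for logging, or for deciding whether a file needs to be rewritten at all.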

💡 Final Tip

Start with simple patterns and test them on sample strings before applying them to real data. Whether you're cleaning logs, transforming configuration files, or masking sensitive data, re.sub() handles most find-and-replace tasks that would otherwise mean tedious manual editing.


The re.sub() function finds all matches of a pattern in a string and replaces them with a specified replacement string.


🔧 Example 1: Basic word replacement

Replace all occurrences of the word "cat" with "dog" in a simple string.

import re

text = "The cat sat on the mat."
result = re.sub(r"cat", "dog", text)
print(result)

📤 Output: The dog sat on the mat.


🔧 Example 2: Case-insensitive replacement

Replace "python" with "Java" regardless of letter case.

import re

text = "I love Python. Python is great. python rocks!"
result = re.sub(r"python", "Java", text, flags=re.IGNORECASE)
print(result)

📤 Output: I love Java. Java is great. Java rocks!


🔧 Example 3: Limiting the number of replacements

Replace only the first two occurrences of "apple" with "orange".

import re

text = "apple apple apple apple"
result = re.sub(r"apple", "orange", text, count=2)
print(result)

📤 Output: orange orange apple apple


🔧 Example 4: Using a replacement function

Replace each number with its square value using a helper function.

import re

def square(match):
    number = int(match.group(0))
    return str(number ** 2)

text = "Numbers: 2, 3, 4"
result = re.sub(r"\d+", square, text)
print(result)

📤 Output: Numbers: 4, 9, 16


🔧 Example 5: Removing unwanted characters

Remove all non-digit characters from a phone number string.

import re

phone = "Call me at (555) 123-4567"
result = re.sub(r"\D", "", phone)
print(result)

📤 Output: 5551234567


🔧 Example 6: Replacing with captured groups

Swap first and last names in a list of names.

import re

text = "Smith, John | Doe, Jane"
result = re.sub(r"(\w+), (\w+)", r"\2 \1", text)
print(result)

📤 Output: John Smith | Jane Doe


Comparison Table

| Feature | re.sub() | str.replace() |
| --- | --- | --- |
| Pattern matching | Supports regex patterns | Only exact (literal) strings |
| Case-insensitive | Yes, with flags=re.IGNORECASE | No |
| Limit replacements | Yes, with the count= parameter | Yes, with the optional third argument (count) |
| Replacement function | Yes, pass a callable | No |
| Capture group support | Yes, via \1, \2, etc. | No |
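
To make the comparison concrete, here is a small sketch that runs both approaches on the same text. The price string is invented for illustration.

```python
import re

text = "Price: $5.00 (was $7.50)"

# str.replace treats its arguments as literal text: no escaping needed.
literal = text.replace("$5.00", "$4.00")
print(literal)      # Price: $4.00 (was $7.50)

# In a regex, "$" and "." are special; re.escape makes the text match literally.
escaped = re.sub(re.escape("$5.00"), "$4.00", text)
print(escaped)      # Price: $4.00 (was $7.50)

# A real pattern is where re.sub earns its keep: match any dollar amount.
any_price = re.sub(r"\$\d+\.\d{2}", "$X", text)
print(any_price)    # Price: $X (was $X)

# str.replace limits replacements with its third positional argument.
first_only = "a-a-a".replace("a", "b", 1)
print(first_only)   # b-a-a
```

When the search text is a fixed string, str.replace() is usually the simpler choice. re.sub() is worth reaching for once you need patterns, flags, groups, or a replacement function.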
