The re Module for Regex Pattern Matching

🎯 Context Introduction

Regular expressions (regex) are powerful patterns used to search, match, and manipulate text. The re module in Python provides all the tools you need to work with regex patterns. Whether you're validating log formats, extracting IP addresses from configuration files, or searching for specific error codes in system logs, the re module is an essential tool in your Python toolkit.


⚙️ What is the re Module?

The re module is Python's built-in library for working with regular expressions. It allows you to:

  • Search for patterns within strings
  • Match patterns at the beginning of strings
  • Replace text that matches patterns
  • Split strings based on patterns
  • Extract specific parts of text

🛠️ Key Functions in the re Module

Here are the most commonly used functions from the re module:

  • re.search(pattern, string) – Searches the entire string for the first occurrence of the pattern. Returns a match object if found, otherwise returns None.

  • re.match(pattern, string) – Checks if the pattern matches at the beginning of the string. Returns a match object or None.

  • re.findall(pattern, string) – Returns a list of all non-overlapping matches of the pattern in the string.

  • re.sub(pattern, replacement, string) – Replaces all occurrences of the pattern with the replacement string.

  • re.split(pattern, string) – Splits the string at every occurrence of the pattern and returns a list of substrings.

  • re.compile(pattern) – Compiles a regex pattern into a regex object for better performance when using the same pattern multiple times.
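The functions above can be tried side by side on a single string. This is a minimal sketch; the log line and the patterns are made up for illustration.

```python
import re

# Hypothetical log line used only for this demonstration
line = "2024-01-15 ERROR disk full; WARN cpu high, ERROR net down"

first_error = re.search(r"ERROR \w+", line)         # first match anywhere in the string
leading_date = re.match(r"\d{4}-\d{2}-\d{2}", line)  # must match at position 0
all_errors = re.findall(r"ERROR \w+", line)         # every non-overlapping match
masked = re.sub(r"\d", "#", line)                   # replace each digit with '#'
parts = re.split(r"[;,]\s*", line)                  # split on ';' or ',' plus optional spaces

print(first_error.group())   # ERROR disk
print(leading_date.group())  # 2024-01-15
print(all_errors)            # ['ERROR disk', 'ERROR net']
print(masked)                # ####-##-## ERROR disk full; WARN cpu high, ERROR net down
print(parts)                 # ['2024-01-15 ERROR disk full', 'WARN cpu high', 'ERROR net down']
```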


📊 Common Regex Patterns for Engineers

Pattern   Meaning                                          Example Match
.         Matches any character except newline             a.c matches "abc", "a1c"
^         Matches the start of a string                    ^Hello matches "Hello world"
$         Matches the end of a string                      world$ matches "Hello world"
*         Matches 0 or more repetitions                    ab* matches "a", "ab", "abb"
+         Matches 1 or more repetitions                    ab+ matches "ab", "abb"
?         Matches 0 or 1 repetition                        ab? matches "a", "ab"
\d        Matches any digit (0-9)                          \d{3} matches "123"
\w        Matches any word character (a-z, A-Z, 0-9, _)    \w+ matches "hello_123"
\s        Matches any whitespace character                 \s matches space, tab, newline
[abc]     Matches any character in the set                 [aeiou] matches any vowel
[^abc]    Matches any character NOT in the set             [^0-9] matches any non-digit
|         Acts as OR operator                              cat|dog matches "cat" or "dog"
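To see several of these patterns in action, the sketch below runs each one through re.search() and collects the matched text. The sample strings are invented for the demonstration, and the last pair uses the | character, which is the OR operator.

```python
import re

# (pattern, sample text) pairs; the sample texts are illustrative only
samples = [
    (r"a.c", "a1c"),             # . matches any single character
    (r"^Hello", "Hello world"),  # ^ anchors at the start
    (r"world$", "Hello world"),  # $ anchors at the end
    (r"ab*", "abb"),             # * repeats the preceding 'b' zero or more times
    (r"\d{3}", "id 123"),        # exactly three digits
    (r"[^0-9]+", "abc42"),       # one or more non-digits
    (r"cat|dog", "hot dog"),     # either alternative
]

results = [re.search(pattern, text).group() for pattern, text in samples]

for (pattern, text), matched in zip(samples, results):
    print(f"{pattern!r} in {text!r} -> {matched!r}")
```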

🕵️ Working with Match Objects

When you use re.search() or re.match(), they return a match object with useful methods:

  • group() – Returns the matched string
  • start() – Returns the starting position of the match
  • end() – Returns the ending position of the match
  • span() – Returns a tuple of (start, end) positions
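As a small illustration of these methods, the sketch below searches a made-up status line for a percentage and inspects the match object. Checking the result against None first avoids an AttributeError when nothing matches.

```python
import re

text = "Disk usage at 91% on /dev/sda1"  # hypothetical status line
m = re.search(r"\d+%", text)

if m:
    print(m.group())                # 91%
    print(m.start())                # 14
    print(m.end())                  # 17
    print(m.span())                 # (14, 17)
    print(text[m.start():m.end()])  # slicing with the positions gives the same text
else:
    print("No percentage found")
```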

💻 Practical Examples for Engineers

Example 1: Searching for IP Addresses in Log Files

Use re.search() with the pattern \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3} to find an IP address in a log line. The dots are escaped as \. so that they match a literal period rather than any character. If a match is found, use .group() to extract the matched IP address. If no match is found, print a "No IP found" message.
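One way this could look in code is sketched below. The log line and the helper name find_ip are assumptions for illustration, and the dots in the pattern are written as \. so they match literal periods. The pattern only checks the shape of the address; it does not verify that each number is between 0 and 255.

```python
import re

# Four groups of 1-3 digits separated by literal dots
ip_pattern = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"

def find_ip(line):
    """Return the first IP-shaped substring in line, or None (hypothetical helper)."""
    match = re.search(ip_pattern, line)
    return match.group() if match else None

log_line = "2024-03-01 12:00:05 Accepted connection from 192.168.1.10 port 22"
ip = find_ip(log_line)
print(ip if ip else "No IP found")  # 192.168.1.10
```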

Example 2: Extracting All Error Codes

Use re.findall() with the pattern ERROR-\d{3} to find all error codes like "ERROR-404" or "ERROR-500" from a log string. The result will be a list of all matching error codes.
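A minimal sketch of this example, using an invented log string. Because re.findall() keeps duplicates, a set is used afterwards to list each distinct code once.

```python
import re

# Hypothetical log text
log = "ERROR-404 on /index, retry ok, ERROR-500 on /api, ERROR-404 on /img"

codes = re.findall(r"ERROR-\d{3}", log)
unique_codes = sorted(set(codes))  # distinct codes in sorted order

print(codes)         # ['ERROR-404', 'ERROR-500', 'ERROR-404']
print(unique_codes)  # ['ERROR-404', 'ERROR-500']
```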

Example 3: Replacing Sensitive Data

Use re.sub() with the pattern \d{4}-\d{4}-\d{4}-\d{4} to replace credit card numbers in a string with the text [REDACTED]. This is useful for sanitizing logs before sharing them.
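The redaction step might be sketched as follows. The log entry and card numbers are made up, and the pattern only covers numbers written as four dash-separated groups of four digits.

```python
import re

# Hypothetical log entry containing fake card numbers
entry = "Payment with card 1234-5678-9012-3456 approved; backup card 1111-2222-3333-4444"

redacted = re.sub(r"\d{4}-\d{4}-\d{4}-\d{4}", "[REDACTED]", entry)
print(redacted)  # Payment with card [REDACTED] approved; backup card [REDACTED]
```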

Example 4: Splitting Configuration Lines

Use re.split() with the pattern [,;]\s* to split a configuration line that uses commas or semicolons as delimiters. The result will be a list of individual configuration values.
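A short sketch of the split, assuming a configuration line that mixes both delimiters with uneven spacing. The \s* part of the pattern consumes any spaces after a delimiter, so the values come back without leading whitespace.

```python
import re

# Hypothetical configuration line with mixed delimiters
config_line = "host=db1, port=5432;user=admin;  timeout=30"

values = re.split(r"[,;]\s*", config_line)
print(values)  # ['host=db1', 'port=5432', 'user=admin', 'timeout=30']
```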


🔧 Using Raw Strings for Patterns

When writing regex patterns, always use raw strings by prefixing the string with r. This prevents Python from interpreting backslashes as escape sequences:

  • Use r"\d+" instead of "\d+"
  • Use r"\s+" instead of "\s+"

Raw strings make your patterns cleaner and easier to read.
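The difference is easiest to see with \b. In a normal string Python turns it into a backspace character, while in a raw string it reaches the regex engine as a word boundary. The sketch below compares the two; the sample sentence is arbitrary.

```python
import re

plain = "\bcat\b"   # Python converts each \b into a backspace character
raw = r"\bcat\b"    # the backslashes survive, so the regex engine sees \b (word boundary)

print(len(plain), len(raw))  # 5 7

plain_result = re.search(plain, "a cat sat")  # looks for backspace + "cat" + backspace
raw_result = re.search(raw, "a cat sat")      # looks for the whole word "cat"

print(plain_result)        # None
print(raw_result.group())  # cat
```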


🎯 Compiling Patterns for Performance

If you're using the same regex pattern multiple times, compile it with re.compile():

  • Create a compiled pattern object: pattern = re.compile(r"\d{3}-\d{2}-\d{4}")
  • Use the compiled object's methods: pattern.search(text), pattern.findall(text)
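  • This can improve performance when the same pattern runs many times, such as once per line of a large file. The re module also caches recently used patterns internally, so the gain is often modest; a named compiled pattern is mainly easier to reuse and read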
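A minimal sketch of reusing one compiled pattern across several lines. The variable name id_pattern and the sample lines are assumptions for illustration.

```python
import re

# Compile once, reuse for every line
id_pattern = re.compile(r"\d{3}-\d{2}-\d{4}")

# Hypothetical input lines
lines = [
    "user=alice id=123-45-6789",
    "user=bob id=none",
    "user=carol id=987-65-4321 alt=111-22-3333",
]

hits = [id_pattern.findall(line) for line in lines]
first = id_pattern.search(lines[0])

print(hits)           # [['123-45-6789'], [], ['987-65-4321', '111-22-3333']]
print(first.group())  # 123-45-6789
```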

📋 Comparison: search() vs match() vs findall()

Function       What It Does                                  When to Use
re.search()    Finds the pattern anywhere in the string      When you need to find a pattern that could appear anywhere
re.match()     Checks only at the beginning of the string    When you need to verify the string starts with a specific pattern
re.findall()   Returns all occurrences of the pattern        When you need to extract every matching piece of text
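Running the three functions with the same pattern on the same string makes the difference concrete. This is a small sketch with an invented status message.

```python
import re

text = "retry 3 of 5 failed"  # hypothetical status message

match_digit = re.match(r"\d", text)     # None: the string does not start with a digit
search_digit = re.search(r"\d", text)   # finds the first digit anywhere
all_digits = re.findall(r"\d", text)    # collects every digit
match_word = re.match(r"retry", text)   # succeeds: the string starts with "retry"

print(match_digit)           # None
print(search_digit.group())  # 3
print(all_digits)            # ['3', '5']
print(match_word.group())    # retry
```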

⚡ Quick Tips for Engineers

  • Always use raw strings (r"...") for regex patterns to avoid escape sequence issues
  • Test your regex patterns on small samples before applying them to large files
  • Use re.IGNORECASE flag for case-insensitive matching: re.search(pattern, text, re.IGNORECASE)
  • Use re.MULTILINE flag to make ^ and $ match at the start/end of each line, not just the string
  • Use re.DOTALL flag to make the . character match newlines as well
  • When in doubt, use re.search() instead of re.match() for more flexible pattern finding
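The three flags mentioned above can be compared against the default behavior in a few lines. The multi-line log text below is made up for the demonstration.

```python
import re

# Hypothetical three-line log
log = "INFO start\nerror: disk full\nERROR: net down"

any_case = re.findall(r"error", log, re.IGNORECASE)    # matches both 'error' and 'ERROR'
first_only = re.findall(r"^\w+", log)                  # ^ matches only at the string start
each_line = re.findall(r"^\w+", log, re.MULTILINE)     # ^ matches at every line start
dot_default = re.search(r"start.error", log)           # . does not match the newline
dot_all = re.search(r"start.error", log, re.DOTALL)    # . now matches the newline too

print(any_case)     # ['error', 'ERROR']
print(first_only)   # ['INFO']
print(each_line)    # ['INFO', 'error', 'ERROR']
print(dot_default)  # None
print(repr(dot_all.group()))  # 'start\nerror'
```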

🚀 Summary

The re module is an indispensable tool for any engineer working with text data. With functions like search(), match(), findall(), sub(), and split(), you can efficiently parse logs, validate input, extract information, and transform text. Mastering regex patterns will save you countless hours when dealing with configuration files, system logs, and data processing tasks.


The re module provides functions for searching, matching, and manipulating strings using regular expressions (patterns that describe text).


🔧 Example 1: Checking if a pattern exists in a string

This example shows how to search for a simple word pattern inside a larger string.

import re

text = "The server error code is 404"
pattern = "error"
result = re.search(pattern, text)
print(result)

📤 Output: <re.Match object; span=(11, 16), match='error'>


🔧 Example 2: Extracting the matched text

This example shows how to get the actual text that matched the pattern.

import re

text = "Connection timeout after 30 seconds"
pattern = "timeout"
match = re.search(pattern, text)
print(match.group())

📤 Output: timeout


🔧 Example 3: Finding all occurrences of a pattern

This example shows how to find every instance of a pattern in a string.

import re

log = "Error at line 10, Error at line 42, Error at line 87"
pattern = "Error"
matches = re.findall(pattern, log)
print(matches)

📤 Output: ['Error', 'Error', 'Error']


🔧 Example 4: Using a pattern to match digits

This example shows how to use \d to find all numeric values in a string.

import re

config = "Port: 8080, Timeout: 5000, Retries: 3"
pattern = r"\d+"  # raw string so \d reaches the regex engine unchanged
numbers = re.findall(pattern, config)
print(numbers)

📤 Output: ['8080', '5000', '3']


🔧 Example 5: Replacing matched patterns with new text

This example shows how to replace all IP addresses in a log with a placeholder.

import re

log_entry = "Connection from 192.168.1.10 failed. Retry from 10.0.0.5"
pattern = r"\d+\.\d+\.\d+\.\d+"  # raw string; \. matches a literal dot
cleaned_log = re.sub(pattern, "[IP_REDACTED]", log_entry)
print(cleaned_log)

📤 Output: Connection from [IP_REDACTED] failed. Retry from [IP_REDACTED]


📊 Comparison Table: Common re Functions

Function       What it does                            Returns when found   Returns when not found
re.search()    Finds first match anywhere in string    Match object         None
re.match()     Finds match only at start of string     Match object         None
re.findall()   Finds all non-overlapping matches       List of strings      Empty list []
re.sub()       Replaces all matches with new text      Modified string      Original string unchanged
