Character Classes (Digits, Words, Whitespace)
🧠 Context Introduction
When working with text data (parsing log files, validating configuration inputs, or cleaning up system output), you'll often need to match specific types of characters. Instead of writing long patterns like [0123456789] or [abcdefghijklmnopqrstuvwxyz], regex provides shorthand character classes that make your patterns cleaner, shorter, and easier to read.
Character classes are like shortcuts for common character groups. They allow you to match digits, word characters, whitespace, and their opposites with just a single symbol.
⚙️ The Three Core Character Classes
There are three essential character classes you'll use most often:
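- \d: Matches any digit (0 through 9). Equivalent to [0-9]
- \w: Matches any "word" character: letters (a-z, A-Z), digits (0-9), and underscore (_). Equivalent to [a-zA-Z0-9_]
- \s: Matches any whitespace character: space, tab, newline, carriage return, form feed. Equivalent to [ \t\n\r\f]

These bracket equivalents describe the ASCII behavior. Engines that are Unicode-aware by default, such as Python 3's re module on str patterns, also match non-ASCII digits, letters, and whitespace with these classes.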
Each of these also has a negated (opposite) version using an uppercase letter:
- \D: Matches any character that is NOT a digit
- \W: Matches any character that is NOT a word character
- \S: Matches any character that is NOT whitespace
📊 Quick Reference Table
| Class | Matches | Equivalent To | Opposite Class |
|---|---|---|---|
| \d | Any digit (0-9) | [0-9] | \D (non-digit) |
| \w | Letter, digit, or underscore | [a-zA-Z0-9_] | \W (non-word) |
| \s | Space, tab, newline, etc. | [ \t\n\r\f] | \S (non-whitespace) |
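The equivalences in the table above can be checked directly. The following is a minimal sketch, assuming Python's re module (the same module the examples later in this lesson use); the sample string is made up for illustration. It also shows the one caveat worth knowing: without the re.ASCII flag, Python's \d accepts non-ASCII decimal digits too.

```python
import re

sample = "id_42\tok\n"  # hypothetical sample text

# Pairs of (shorthand, bracket form from the table above).
pairs = [
    (r"\d", r"[0-9]"),
    (r"\w", r"[a-zA-Z0-9_]"),
    (r"\s", r"[ \t\n\r\f]"),
]

# re.ASCII limits the shorthands to ASCII characters only.
agreement = {
    shorthand: re.findall(shorthand, sample, re.ASCII) == re.findall(explicit, sample)
    for shorthand, explicit in pairs
}
print(all(agreement.values()))  # True

# Without re.ASCII, \d on a str pattern also accepts non-ASCII decimal digits.
arabic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE
print(bool(re.fullmatch(r"\d", arabic_three)))            # True
print(bool(re.fullmatch(r"\d", arabic_three, re.ASCII)))  # False
```

If your input is plain ASCII, the two forms behave the same and the flag makes no difference.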
🕵️ How They Work in Practice
🔢 Matching Digits with \d
The \d class is perfect when you need to find numbers in text:
- Pattern \d\d\d matches any three consecutive digits, such as 123, 456, or 789
- Pattern \d{3}-\d{2}-\d{4} matches a social security number format like 123-45-6789
- Pattern \d+ matches one or more digits in a row, like 42 or 1000
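The three digit patterns above can be tried out as follows. This is a small sketch using Python's re module; the input strings are invented for illustration.

```python
import re

# Three consecutive digits, found one group at a time.
three_digit_groups = re.findall(r"\d\d\d", "codes: 123 456 789")
print(three_digit_groups)  # ['123', '456', '789']

# Fixed 3-2-4 layout separated by hyphens.
ssn_like = re.search(r"\d{3}-\d{2}-\d{4}", "ID on file: 123-45-6789")
print(ssn_like.group())  # 123-45-6789

# One or more digits in a row, so each number comes back whole.
digit_runs = re.findall(r"\d+", "42 retries over 1000 ms")
print(digit_runs)  # ['42', '1000']
```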
🔤 Matching Word Characters with \w
The \w class is useful for finding identifiers, variable names, or any alphanumeric content:
- Pattern \w+ matches whole words like hello, test123, or user_name
- Pattern \w{5} matches any run of exactly five word characters, such as hello or world
- Pattern \w+@\w+\.\w+ is a simple pattern to match email-like strings (word characters, an @ sign, word characters, a dot, and more word characters)
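A quick way to see what the \w patterns above do, and where they fall short, is to run them. This sketch assumes Python's re module; the addresses use the reserved example.com domain and are purely hypothetical.

```python
import re

words = re.findall(r"\w+", "hello test123 user_name")
print(words)  # ['hello', 'test123', 'user_name']

# \w{5} takes any five word characters in a row, even inside a longer word.
five_chars = re.findall(r"\w{5}", "hello world")
print(five_chars)  # ['hello', 'world']
inside_longer_word = re.findall(r"\w{5}", "keyboard")
print(inside_longer_word)  # ['keybo']

email_pattern = r"\w+@\w+\.\w+"
simple_address = re.search(email_pattern, "contact: admin@example.com today")
print(simple_address.group())  # admin@example.com

# \w does not include ".", so a dotted local part is only partly matched.
dotted_address = re.search(email_pattern, "first.last@example.com")
print(dotted_address.group())  # last@example.com
```

The last result is why this pattern is best treated as a rough filter for email-like text rather than a validator.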
⬜ Matching Whitespace with \s
The \s class helps you find spaces, tabs, and line breaks:
- Pattern \s+ matches one or more whitespace characters, useful for splitting text
- Pattern \d\s\d matches a digit, followed by one whitespace character (such as a space), followed by another digit (3 7)
- Pattern \n specifically matches a newline character (though \s also matches it)
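The whitespace patterns above behave as follows in Python's re module. This is an illustrative sketch with made-up input strings.

```python
import re

# Splitting on runs of whitespace handles spaces, tabs, and newlines alike.
fields = re.split(r"\s+", "alpha  beta\tgamma\ndelta")
print(fields)  # ['alpha', 'beta', 'gamma', 'delta']

# \s between two digits accepts a space, but also a tab.
space_pair = re.search(r"\d\s\d", "3 7").group()
tab_pair = re.search(r"\d\s\d", "3\t7").group()
print(repr(space_pair), repr(tab_pair))  # '3 7' '3\t7'

# \n matches only the newline; \s matches the space as well.
newlines_only = re.findall(r"\n", "a b\nc")
any_whitespace = re.findall(r"\s", "a b\nc")
print(newlines_only, any_whitespace)  # ['\n'] [' ', '\n']
```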
🛠️ Using Negated Classes (Opposites)
Sometimes you need to match what is not a digit, word, or whitespace. This is where uppercase versions shine:
- \D matches any non-digit: letters, symbols, spaces, anything except 0-9
- \W matches any non-word character: punctuation, spaces, symbols like @, #, $, %
- \S matches any non-whitespace character: everything that isn't a space, tab, or newline
Example use cases:
- Pattern \D+ matches a sequence of non-digit characters, like hello or abc-xyz
- Pattern \W+ matches punctuation or symbols between words, like -- or !!!
- Pattern \S+ matches a continuous string of non-whitespace, which is great for extracting tokens or URLs
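The negated classes can be exercised the same way. The sketch below assumes Python's re module; the strings, including the example.com URL, are hypothetical.

```python
import re

# Runs of non-digits: everything around the number, spaces included.
non_digit_runs = re.findall(r"\D+", "abc-xyz 2024 hello")
print(non_digit_runs)  # ['abc-xyz ', ' hello']

# Runs of non-word characters: the punctuation between and after words.
separators = re.findall(r"\W+", "wait--what!!!")
print(separators)  # ['--', '!!!']

# Runs of non-whitespace: whole tokens, so a URL stays in one piece.
tokens = re.findall(r"\S+", "GET https://example.com/index.html 200")
print(tokens)  # ['GET', 'https://example.com/index.html', '200']
```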
🧪 Practical Examples for Engineers
Parsing a Log Line
Imagine a log entry like ERROR 404: Page not found at 2024-01-15 14:30:00
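- \d+ would match 404, 2024, 01, 15, 14, 30, 00
- \w+ would match ERROR, 404, Page, not, found, at, and each digit run in the timestamp, because \w includes digits
- \s+ would match each space between tokens
- \D+ would match each run of non-digits: ERROR with its trailing space, the stretch from the colon after 404 up to the space before 2024 (": Page not found at "), and the hyphens, space, and colons inside the timestamp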
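Running each class against the log line makes the differences concrete. This is a minimal sketch using Python's re.findall on the exact log entry quoted above.

```python
import re

log_line = "ERROR 404: Page not found at 2024-01-15 14:30:00"

digit_runs = re.findall(r"\d+", log_line)
print(digit_runs)
# ['404', '2024', '01', '15', '14', '30', '00']

# \w includes digits, so the numbers show up here as well.
word_runs = re.findall(r"\w+", log_line)
print(word_runs)
# ['ERROR', '404', 'Page', 'not', 'found', 'at', '2024', '01', '15', '14', '30', '00']

space_runs = re.findall(r"\s+", log_line)
print(len(space_runs))  # 7

# \D+ keeps spaces and punctuation attached to the surrounding text.
non_digit_runs = re.findall(r"\D+", log_line)
print(non_digit_runs)
# ['ERROR ', ': Page not found at ', '-', '-', ' ', ':', ':']
```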
Validating a Simple Input
For a username that should only contain letters, digits, and underscores:
- Pattern ^\w+$ ensures the entire string consists only of word characters
- The ^ anchors to the start, $ anchors to the end
For a phone number like 555-123-4567:
- Pattern \d{3}-\d{3}-\d{4} matches the exact format
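The two validation patterns above could be wrapped in small helpers like these. This is a sketch under a few assumptions: the function names are hypothetical, re.fullmatch is used in place of explicit ^ and $ anchors, and re.ASCII is added so that only a-z, A-Z, 0-9, and _ count as word characters.

```python
import re

def is_valid_username(name: str) -> bool:
    # fullmatch requires the whole string to match, like ^...$ but stricter.
    return re.fullmatch(r"\w+", name, re.ASCII) is not None

def is_phone_number(value: str) -> bool:
    # Exactly 3 digits, hyphen, 3 digits, hyphen, 4 digits.
    return re.fullmatch(r"\d{3}-\d{3}-\d{4}", value, re.ASCII) is not None

print(is_valid_username("user_name1"))   # True
print(is_valid_username("bad name"))     # False
print(is_phone_number("555-123-4567"))   # True
print(is_phone_number("555-1234"))       # False

# In Python, $ also matches just before a trailing newline,
# so the anchored form accepts input that fullmatch rejects.
anchored_accepts_newline = re.search(r"^\w+$", "admin\n") is not None
print(anchored_accepts_newline)          # True
print(is_valid_username("admin\n"))      # False
```

Whether to allow non-ASCII letters in usernames is a policy choice; dropping re.ASCII would accept them.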
⚠️ Important Notes to Remember
- Character classes match one single character at a time unless you use a quantifier like + (one or more) or * (zero or more)
- The underscore _ is included in \w, which is why it's great for matching variable names
- \s matches more than just a space: it also matches tabs (\t), newlines (\n), and carriage returns (\r)
- Negated classes (\D, \W, \S) match everything that the lowercase version does not, including characters you might not expect, like punctuation or symbols
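Each of these notes can be confirmed with a one-line check. The sketch below assumes Python's re module and uses invented sample strings.

```python
import re

# Without a quantifier, a class matches one character per match.
one_at_a_time = re.findall(r"\d", "2024")
as_a_run = re.findall(r"\d+", "2024")
print(one_at_a_time, as_a_run)  # ['2', '0', '2', '4'] ['2024']

# The underscore is a word character, so identifiers stay in one piece.
identifier_parts = re.findall(r"\w+", "max_retries = 3")
print(identifier_parts)  # ['max_retries', '3']

# \s covers tabs, newlines, and carriage returns as well as spaces.
whitespace_hits = re.findall(r"\s", "a b\tc\nd\re")
print(whitespace_hits)  # [' ', '\t', '\n', '\r']

# A negated class picks up punctuation and symbols too.
non_word_hits = re.findall(r"\W", "a@b#c")
print(non_word_hits)  # ['@', '#']
```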
🧩 Summary
Character classes are your shortcuts for matching common character types:
- \d for digits, \D for non-digits
- \w for word characters, \W for non-word characters
- \s for whitespace, \S for non-whitespace
These classes make your regex patterns cleaner, more readable, and easier to maintain. Practice combining them with quantifiers and anchors to build powerful text-matching patterns for your everyday tasks.
Character classes let you match specific types of characters like digits, letters, or whitespace using shorthand codes.
🔢 Example 1: Matching a single digit with \d
This example finds the first digit in a string.
```python
import re

text = "Order 42 is ready"
pattern = r"\d"
match = re.search(pattern, text)
print(match.group())
```
📤 Output: 4
🔤 Example 2: Matching a single word character with \w
This example finds the first letter, digit, or underscore in a string.
```python
import re

text = "Hello, World!"
pattern = r"\w"
match = re.search(pattern, text)
print(match.group())
```
📤 Output: H
⬜ Example 3: Matching whitespace with \s
This example finds the first space, tab, or newline in a string.
```python
import re

text = "first second"
pattern = r"\s"
match = re.search(pattern, text)
# repr() makes the otherwise invisible space visible in the output
print(repr(match.group()))
```
📤 Output: ' '
🔢 Example 4: Finding all digits in a string with \d
This example extracts every digit from a phone number string.
```python
import re

text = "Call 555-1234 now"
pattern = r"\d"
matches = re.findall(pattern, text)
print(matches)
```
📤 Output: ['5', '5', '5', '1', '2', '3', '4']
🔤 Example 5: Finding all words in a sentence with \w+
This example extracts every run of word characters from a sentence. Note that 3.9 is split into 3 and 9, because the dot is not a word character.
```python
import re

text = "Python 3.9 is great!"
pattern = r"\w+"
matches = re.findall(pattern, text)
print(matches)
```
📤 Output: ['Python', '3', '9', 'is', 'great']
📊 Character Class Comparison Table
| Shorthand | Matches | Opposite | Opposite Matches |
|---|---|---|---|
| \d | Any digit (0-9) | \D | Any non-digit |
| \w | Any word char (a-z, A-Z, 0-9, _) | \W | Any non-word character |
| \s | Any whitespace (space, tab, newline) | \S | Any non-whitespace character |