Character Classes (Digits, Words, Whitespace)

🧠 Context Introduction

When working with text data (parsing log files, validating configuration inputs, or cleaning up system output), you'll often need to match specific types of characters. Instead of writing long patterns like [0123456789] or [abcdefghijklmnopqrstuvwxyz], regex provides shorthand character classes that make your patterns cleaner, shorter, and easier to read.

Character classes are like shortcuts for common character groups. They allow you to match digits, word characters, whitespace, and their opposites with just a single symbol.

⚙️ The Three Core Character Classes

There are three essential character classes you'll use most often:

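\d – Matches any digit (0 through 9). Equivalent to [0-9] when matching is limited to ASCII; Unicode-aware engines, including Python 3's re module with str patterns, also match decimal digits from other scripts
\w – Matches any "word" character: letters (a-z, A-Z), digits (0-9), and underscore (_). Equivalent to [a-zA-Z0-9_] in ASCII mode; Unicode-aware engines also match letters such as é or ß
\s – Matches any whitespace character: space, tab, newline, carriage return, form feed. Equivalent to [ \t\n\r\f] in many engines; Python's re module also includes the vertical tab (\v)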

Each of these also has a negated (opposite) version using an uppercase letter:

\D – Matches any character that is NOT a digit
\W – Matches any character that is NOT a word character
\S – Matches any character that is NOT whitespace
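
The bracket equivalences listed above are exact only when matching is restricted to ASCII. As a minimal sketch, assuming Python's `re` module (the sample strings are made up for illustration), the snippet below compares the default Unicode-aware behaviour with the `re.ASCII` flag and shows one negated class:

```python
import re

# Two Arabic-Indic digits (U+0663, U+0664), a space, then two ASCII digits.
sample = "\u0663\u0664 12"

# Python 3 str patterns are Unicode-aware by default,
# so \d matches decimal digits from any script.
unicode_digits = re.findall(r"\d", sample)
print(len(unicode_digits))  # 4: both Arabic-Indic digits plus '1' and '2'

# With re.ASCII, \d is restricted to [0-9].
ascii_digits = re.findall(r"\d", sample, flags=re.ASCII)
print(ascii_digits)  # ['1', '2']

# The uppercase form matches exactly what the lowercase form rejects.
non_digits = re.findall(r"\D", "a1 b2")
print(non_digits)  # ['a', ' ', 'b']
```

If your input is known to be plain ASCII, the two modes behave the same; the flag matters mainly when validating untrusted input.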

📊 Quick Reference Table

| Class | Matches | Equivalent To | Opposite Class |
|-------|---------|---------------|----------------|
| `\d` | Any digit (0-9) | `[0-9]` | `\D` (non-digit) |
| `\w` | Letter, digit, or underscore | `[a-zA-Z0-9_]` | `\W` (non-word) |
| `\s` | Space, tab, newline, etc. | `[ \t\n\r\f]` | `\S` (non-whitespace) |

🕵️ How They Work in Practice

🔢 Matching Digits with `\d`

The \d class is perfect when you need to find numbers in text:

Pattern \d\d\d matches any three consecutive digits, such as 123, 456, or 789
Pattern \d{3}-\d{2}-\d{4} matches a social security number format like 123-45-6789
Pattern \d+ matches one or more digits in a row, like 42 or 1000
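
The three digit patterns above can be tried side by side. This is a small sketch using Python's `re.findall`; the sample text is invented for the example. Note that `findall` returns non-overlapping matches, so `\d\d\d` consumes digits three at a time.

```python
import re

text = "SSN 123-45-6789, order 42, total 1000"

# Exactly three consecutive digits, scanned left to right without overlap.
three = re.findall(r"\d\d\d", text)
print(three)  # ['123', '678', '100']

# The 3-2-4 digit layout separated by hyphens.
ssn_like = re.findall(r"\d{3}-\d{2}-\d{4}", text)
print(ssn_like)  # ['123-45-6789']

# Runs of one or more digits.
runs = re.findall(r"\d+", text)
print(runs)  # ['123', '45', '6789', '42', '1000']
```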

🔤 Matching Word Characters with `\w`

The \w class is useful for finding identifiers, variable names, or any alphanumeric content:

Pattern \w+ matches whole words like hello, test123, or user_name
Pattern \w{5} matches any five consecutive word characters, such as hello or world; inside a longer word it matches the first five characters
Pattern \w+@\w+\.\w+ is a deliberately simple pattern to match email-like strings of the form name@domain.tld
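
To see how these word patterns behave on real input, here is a short sketch in Python. The sample strings and the address `first.last@example.com` are made up, and the `\b` word-boundary anchor in the second-to-last pattern is an addition not covered in this section; it is shown only to contrast with the bare `\w{5}`.

```python
import re

text = "hello test123 user_name world"

# Runs of word characters: letters, digits, and underscores stay together.
words = re.findall(r"\w+", text)
print(words)  # ['hello', 'test123', 'user_name', 'world']

# \w{5} takes any five consecutive word characters, even inside longer words.
five = re.findall(r"\w{5}", text)
print(five)  # ['hello', 'test1', 'user_', 'world']

# Assumption: \b (word boundary) limits the match to whole five-letter words.
five_exact = re.findall(r"\b\w{5}\b", text)
print(five_exact)  # ['hello', 'world']

# The simple email-like pattern stops at characters outside \w, such as dots.
email = re.search(r"\w+@\w+\.\w+", "Contact first.last@example.com today")
print(email.group())  # last@example.com
```

The last result shows why the email pattern is only a rough filter: the dot in the local part is not a word character, so the match starts after it.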

⬜ Matching Whitespace with `\s`

The \s class helps you find spaces, tabs, and line breaks:

Pattern \s+ matches one or more whitespace characters, useful for splitting text
Pattern \d\s\d matches a digit, followed by exactly one whitespace character (a space, tab, or newline), followed by another digit (3 7)
Pattern \n specifically matches a newline character (though \s also matches it)
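
A brief sketch of the whitespace patterns above, assuming Python's `re` module; the input strings are invented for illustration.

```python
import re

messy = "alpha  beta\tgamma\ndelta"

# Split on any run of whitespace: double spaces, tabs, and newlines alike.
tokens = re.split(r"\s+", messy)
print(tokens)  # ['alpha', 'beta', 'gamma', 'delta']

# \s matches a single whitespace character of any kind, not only a space.
pairs = re.findall(r"\d\s\d", "3 7 and 4\t2")
print(pairs)  # ['3 7', '4\t2']

# \n matches only the newline, while \s+ matches every whitespace run.
newlines = re.findall(r"\n", messy)
print(len(newlines))  # 1
```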

🛠️ Using Negated Classes (Opposites)

Sometimes you need to match what is not a digit, word, or whitespace. This is where uppercase versions shine:

\D matches any non-digit: letters, symbols, spaces—anything except 0-9
\W matches any non-word character: punctuation, spaces, symbols like @, #, $, %
\S matches any non-whitespace character: everything that isn't a space, tab, or newline

Example use cases:

Pattern \D+ matches a sequence of non-digit characters, like hello or abc-xyz
Pattern \W+ matches punctuation or symbols between words, like -- or !!!
Pattern \S+ matches a continuous string of non-whitespace, which is great for extracting tokens or URLs
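
The negated classes can be compared on a single string. The following is a sketch in Python; the sample text and the `example.com` URL are placeholders.

```python
import re

text = "abc-xyz 2024!!! see https://example.com/docs"

# Runs of non-digits: everything except the year.
non_digits = re.findall(r"\D+", text)
print(non_digits)  # ['abc-xyz ', '!!! see https://example.com/docs']

# Runs of non-word characters: punctuation, symbols, and spaces.
non_words = re.findall(r"\W+", text)
print(non_words)  # ['-', ' ', '!!! ', ' ', '://', '.', '/']

# Runs of non-whitespace: whitespace-delimited tokens, URL kept whole.
tokens = re.findall(r"\S+", text)
print(tokens)  # ['abc-xyz', '2024!!!', 'see', 'https://example.com/docs']
```

Note that `\W+` includes the spaces, since a space is not a word character either.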

🧪 Practical Examples for Engineers

Parsing a Log Line

Imagine a log entry like ERROR 404: Page not found at 2024-01-15 14:30:00

\d+ would match 404, 2024, 01, 15, 14, 30, 00
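\w+ would match ERROR, 404, Page, not, found, at, 2024, 01, 15, 14, 30, 00 (digits are word characters too)
\s+ would match each space between words
\D+ would match every run of non-digits: "ERROR ", ": Page not found at ", each hyphen in the date, the space before the time, and each colon in the time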
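
Running the four patterns against the log entry makes the results concrete. This sketch uses Python's `re.findall` on the exact line quoted above; note that `\w+` also picks up the numeric tokens, because digits are word characters.

```python
import re

log = "ERROR 404: Page not found at 2024-01-15 14:30:00"

numbers = re.findall(r"\d+", log)
print(numbers)  # ['404', '2024', '01', '15', '14', '30', '00']

words = re.findall(r"\w+", log)
print(words)
# ['ERROR', '404', 'Page', 'not', 'found', 'at',
#  '2024', '01', '15', '14', '30', '00']

gaps = re.findall(r"\s+", log)
print(len(gaps))  # 7 single spaces

non_digits = re.findall(r"\D+", log)
print(non_digits)
# ['ERROR ', ': Page not found at ', '-', '-', ' ', ':', ':']
```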

Validating a Simple Input

For a username that should only contain letters, digits, and underscores:

Pattern ^\w+$ ensures the entire string consists only of word characters
The ^ anchors to the start, $ anchors to the end

For a phone number like 555-123-4567:

Pattern \d{3}-\d{3}-\d{4} matches the exact format; add the ^ and $ anchors to reject input that has extra characters before or after the number
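
One way to wrap these two checks in Python is sketched below. The function names `is_valid_username` and `is_valid_phone` are hypothetical, and the choice of `re.fullmatch` with `re.ASCII` is an assumption on top of the patterns above: `fullmatch` requires the whole string to match, and `re.ASCII` keeps `\w` and `\d` limited to ASCII characters.

```python
import re


def is_valid_username(name: str) -> bool:
    """Return True if name consists only of ASCII letters, digits, and underscores."""
    return re.fullmatch(r"\w+", name, flags=re.ASCII) is not None


def is_valid_phone(number: str) -> bool:
    """Return True if number has the exact form 555-123-4567."""
    return re.fullmatch(r"\d{3}-\d{3}-\d{4}", number, flags=re.ASCII) is not None


print(is_valid_username("user_name1"))  # True
print(is_valid_username("bad name!"))   # False
print(is_valid_phone("555-123-4567"))   # True
print(is_valid_phone("555-1234"))       # False

# In Python, $ also matches just before a trailing newline,
# so ^\w+$ accepts "admin\n" while fullmatch rejects it.
dollar_accepts_newline = re.match(r"^\w+$", "admin\n") is not None
print(dollar_accepts_newline)           # True
print(is_valid_username("admin\n"))     # False
```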

⚠️ Important Notes to Remember

Character classes match one single character at a time unless you use a quantifier like + (one or more) or * (zero or more)
The underscore _ is included in \w, which is why it's great for matching variable names
\s matches more than just a space: it also matches tabs (\t), newlines (\n), and carriage returns (\r)
Negated classes (\D, \W, \S) match everything that the lowercase version does not, including characters you might not expect, like punctuation or symbols
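
The first and third notes are easy to check directly. A minimal sketch in Python, with an invented sample string:

```python
import re

text = "id=12345"

# Without a quantifier, a class matches exactly one character.
single = re.search(r"\d", text).group()
print(single)  # 1

# With +, it matches the whole run.
run = re.search(r"\d+", text).group()
print(run)  # 12345

# With *, zero characters is a valid match, so the search succeeds
# immediately at position 0 with an empty string.
empty = re.search(r"\d*", text).group()
print(repr(empty))  # ''

# \s covers several whitespace characters, not only the space.
all_whitespace = all(re.fullmatch(r"\s", ch) for ch in " \t\n\r")
print(all_whitespace)  # True
```

The empty match from `\d*` is a common surprise; prefer `+` when at least one character is required.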

🧩 Summary

Character classes are your shortcuts for matching common character types:

\d for digits, \D for non-digits
\w for word characters, \W for non-word characters
\s for whitespace, \S for non-whitespace

These classes make your regex patterns cleaner, more readable, and easier to maintain. Practice combining them with quantifiers and anchors to build powerful text-matching patterns for your everyday tasks.

Character classes let you match specific types of characters like digits, letters, or whitespace using shorthand codes.

🔢 Example 1: Matching a single digit with `\d`

This example finds the first digit in a string.

import re

text = "Order 42 is ready"
pattern = r"\d"
match = re.search(pattern, text)
print(match.group())

📤 Output: 4

🔤 Example 2: Matching a single word character with `\w`

This example finds the first letter, digit, or underscore in a string.

import re

text = "Hello, World!"
pattern = r"\w"
match = re.search(pattern, text)
print(match.group())

📤 Output: H

⬜ Example 3: Matching whitespace with `\s`

This example finds the first space, tab, or newline in a string.

import re

text = "first second"
pattern = r"\s"
match = re.search(pattern, text)
print(match.group())

📤 Output: (a single space character)

🔢 Example 4: Finding all digits in a string with `\d`

This example extracts every digit from a phone number string.

import re

text = "Call 555-1234 now"
pattern = r"\d"
matches = re.findall(pattern, text)
print(matches)

📤 Output: ['5', '5', '5', '1', '2', '3', '4']

🔤 Example 5: Finding all words in a sentence with `\w+`

This example extracts every run of word characters from a sentence. Note that 3.9 is split into 3 and 9 because the period is not a word character.

import re

text = "Python 3.9 is great!"
pattern = r"\w+"
matches = re.findall(pattern, text)
print(matches)

📤 Output: ['Python', '3', '9', 'is', 'great']

📊 Character Class Comparison Table

| Shorthand | Matches | Opposite | Opposite Matches |
|-----------|---------|----------|------------------|
| `\d` | Any digit (0-9) | `\D` | Any non-digit |
| `\w` | Any word char (a-z, A-Z, 0-9, _) | `\W` | Any non-word character |
| `\s` | Any whitespace (space, tab, newline) | `\S` | Any non-whitespace character |
