Matching Character Sets Using Brackets
When working with text data, you often need to match not just a single specific character, but any character from a group of allowed options. This is where bracket notation [...] becomes incredibly useful. Instead of writing multiple patterns for each possible character, you define a character set inside square brackets, and the regex engine will match if it finds any one of those characters at that position.
⚙️ What Are Character Sets?
A character set is a list of characters placed inside square brackets [ ]. The pattern will match exactly one character from that list.
- Syntax: `[abc]` matches either `a`, `b`, or `c`.
- Behavior: It matches only one character, but that character can be any one from the set.
- Case sensitivity: By default, `[abc]` does not match `A`, `B`, or `C`. You must include uppercase versions if needed.
Example: The pattern [aeiou] will match any single vowel (lowercase) in a string. If you search the word "hello", it will match the e and the o.
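A minimal sketch of the vowel example, assuming Python's standard `re` module (the variable names are illustrative only). A single `re.search` call stops at the first match, so collecting both vowels in "hello" takes `re.findall`:

```python
import re

text = "hello"

# re.search returns only the first match, scanning left to right.
first_vowel = re.search(r"[aeiou]", text).group()

# re.findall returns every non-overlapping match as a list.
all_vowels = re.findall(r"[aeiou]", text)

# The set is case-sensitive: uppercase vowels are not matched
# unless they are listed in the set too.
upper_only = re.findall(r"[aeiou]", "HELLO")
both_cases = re.findall(r"[aeiouAEIOU]", "HELLO")

print(first_vowel)  # e
print(all_vowels)   # ['e', 'o']
print(upper_only)   # []
print(both_cases)   # ['E', 'O']
```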
🕵️ Defining Ranges Inside Brackets
Instead of listing every character individually (e.g., [0123456789]), you can use a hyphen - to define a range.
- Digit range: `[0-9]` matches any single digit from 0 to 9.
- Letter range: `[a-z]` matches any lowercase letter from a to z.
- Mixed ranges: `[a-zA-Z]` matches any single uppercase or lowercase letter.
- Alphanumeric: `[a-zA-Z0-9]` matches any single letter or digit.
Important rule: The hyphen is only treated as a range indicator when it appears between two characters. If you place it at the start or end of the set (e.g., [-abc] or [abc-]), it becomes a literal hyphen character.
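The range and literal-hyphen rules can be checked with a short sketch like the one below. It assumes Python's `re` module, and the sample strings are made up for illustration:

```python
import re

text = "a-b 7 Z_9"

# Ranges: any digit, then any letter of either case.
digits = re.findall(r"[0-9]", text)
letters = re.findall(r"[a-zA-Z]", text)

# A hyphen at the start or end of the set is a literal hyphen.
hyphen_first = re.findall(r"[-ab]", text)
hyphen_last = re.findall(r"[ab-]", text)

# Same three characters, different meaning depending on hyphen position:
# [a-z] is a range, [az-] is the literal characters a, z, and hyphen.
as_range = re.findall(r"[a-z]", "a-m-z")
as_literals = re.findall(r"[az-]", "a-m-z")

print(digits)        # ['7', '9']
print(letters)       # ['a', 'b', 'Z']
print(hyphen_first)  # ['a', '-', 'b']
print(hyphen_last)   # ['a', '-', 'b']
print(as_range)      # ['a', 'm', 'z']
print(as_literals)   # ['a', '-', '-', 'z']
```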
📊 Comparison: Single Character vs. Character Set
| Pattern | What It Matches | Example Match in "cat1" |
|---|---|---|
| `c` | Only the letter c | c |
| `[abc]` | Any one of a, b, or c | c (first match from left) |
| `[0-9]` | Any single digit | 1 |
| `[a-z]` | Any single lowercase letter | c (first match from left) |
| `[cC]` | Either lowercase c or uppercase C | c |
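One way to confirm the table is to run each pattern against "cat1" and record the first match. This is a small sketch assuming Python's `re` module; the dictionary name is illustrative:

```python
import re

text = "cat1"
patterns = [r"c", r"[abc]", r"[0-9]", r"[a-z]", r"[cC]"]

# Map each pattern to its first (leftmost) match in the text.
first_matches = {p: re.search(p, text).group() for p in patterns}

for pattern, match in first_matches.items():
    print(f"{pattern:8} -> {match}")
```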
🛠️ Negating a Character Set
Sometimes you want to match any character except the ones listed. You do this by placing a caret ^ immediately after the opening bracket.
- Syntax: `[^abc]` matches any character that is not `a`, `b`, or `c`.
- Behavior: It will match spaces, digits, punctuation, and any other character not in the set.
- Important: The caret only negates when it is the first character inside the brackets. If placed elsewhere (e.g., `[a^b]`), it becomes a literal caret character.
Example: The pattern [^0-9] will match any character that is not a digit. In the string "Room 101", it would match R, o, o, m, and the space.
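The "Room 101" example can be reproduced with the sketch below, assuming Python's `re` module. The second string is invented to show the caret behaving as a literal when it is not the first character in the set:

```python
import re

text = "Room 101"

# Negated set: every character that is not a digit.
non_digits = re.findall(r"[^0-9]", text)

# Removing everything the negated set matches leaves only the digits.
digits_only = re.sub(r"[^0-9]", "", text)

# A caret that is not first in the set is just a literal caret.
literal_caret = re.findall(r"[a^b]", "a^b c")

print(non_digits)     # ['R', 'o', 'o', 'm', ' ']
print(digits_only)    # 101
print(literal_caret)  # ['a', '^', 'b']
```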
🧩 Common Use Cases for Engineers
- Validating input fields: `^[a-zA-Z0-9]+$` to ensure a username contains only letters and digits. The set alone matches a single character, so the `+` quantifier and the anchors are needed to cover the whole string.
- Parsing log files: `[0-9]{2}` to match two-digit values like hours or days.
- Filtering special characters: `[^a-zA-Z0-9]` to find all non-alphanumeric characters in a string.
- Matching specific delimiters: `[,;:]` to match commas, semicolons, or colons in CSV-like data.
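The use cases above might look like this in Python. This is a sketch rather than a production validator: `is_valid_username` is a hypothetical helper, the sample strings are invented, and `re.fullmatch` is used so that the set has to cover the entire string instead of a single character.

```python
import re

def is_valid_username(name: str) -> bool:
    """Hypothetical helper: True if name is one or more ASCII letters or digits."""
    return re.fullmatch(r"[a-zA-Z0-9]+", name) is not None

# Validating input: the whole string must consist of allowed characters.
valid = is_valid_username("alice42")
invalid = is_valid_username("alice_42")
empty = is_valid_username("")

# Parsing: two-digit values in a timestamp-like string.
two_digit_values = re.findall(r"[0-9]{2}", "12:45:07")

# Filtering: every character that is not a letter or digit.
special_chars = re.findall(r"[^a-zA-Z0-9]", "user@example.com!")

# Delimiters: split on commas, semicolons, or colons.
fields = re.split(r"[,;:]", "a,b;c:d")

print(valid, invalid, empty)  # True False False
print(two_digit_values)       # ['12', '45', '07']
print(special_chars)          # ['@', '.', '!']
print(fields)                 # ['a', 'b', 'c', 'd']
```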
🎯 Practical Example: Matching a Date Component
Imagine you need to find the month number in a date string like "2024-03-15".
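- Pattern: `[0-9]{2}` would match `20`, `24`, `03`, and `15`, which is too broad.
- Better pattern: `-[0-9]{2}-` would match `-03-` specifically.
- More restrictive: `-[0-1][0-9]-` requires the first digit to be 0 or 1, so it accepts `-00-` through `-19-`. That rules out values such as `-25-`, but it still lets `-00-` and `-13-` to `-19-` through. To accept only `-01-` to `-12-`, use alternation between two set-based branches: `-(0[1-9]|1[0-2])-`.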
This shows how combining character sets with quantifiers (like {2}) gives you fine-grained control over what you extract.
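The date patterns can be compared side by side with a sketch like the following, assuming Python's `re` module. The capturing group in the last pattern is an addition not covered in this section; it is one common way to pull out the month digits without the surrounding hyphens. Note that `-[0-1][0-9]-` constrains only the first digit, so it also accepts out-of-range values such as `-19-`.

```python
import re

date = "2024-03-15"

# Too broad: every pair of digits in the string.
two_digit = re.findall(r"[0-9]{2}", date)

# Hyphens on both sides isolate the month component.
between_hyphens = re.search(r"-[0-9]{2}-", date).group()

# First digit limited to 0 or 1.
narrowed = re.search(r"-[0-1][0-9]-", date).group()

# The narrowed pattern still accepts a month like 19.
accepts_19 = re.search(r"-[0-1][0-9]-", "2024-19-15") is not None

# Assumed extension: a capturing group returns just the month digits.
month = re.search(r"-([0-9]{2})-", date).group(1)

print(two_digit)        # ['20', '24', '03', '15']
print(between_hyphens)  # -03-
print(narrowed)         # -03-
print(accepts_19)       # True
print(month)            # 03
```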
✅ Key Takeaways
- Brackets `[...]` define a set of allowed characters: the pattern matches if any one of them is present.
- Use hyphens `-` to define ranges like `[0-9]` or `[a-z]` for cleaner patterns.
- Use a caret `^` at the start of the set to negate it: `[^abc]` matches anything except `a`, `b`, or `c`.
- Character sets match exactly one character; combine them with quantifiers (`+`, `*`, `{n}`) to match multiple characters.
- Always consider case sensitivity: include both uppercase and lowercase ranges if needed.
Mastering character sets is a foundational step toward building powerful regex patterns for data validation, log parsing, and text extraction tasks.
Brackets [] in regex define a character set that matches any single character listed inside the brackets.
🔧 Example 1: Matching a Single Character from a Set
This example checks if a string contains either the letter "a" or "b".
```python
import re

pattern = r"[ab]"  # matches a single "a" or "b"
text = "cat"
result = re.search(pattern, text)  # first match, scanning left to right
print(result.group())
```
📤 Output: a
🔧 Example 2: Matching Any Digit from a Range
This example finds the first digit in a string using a range inside brackets.
```python
import re

pattern = r"[0-9]"  # matches any single digit
text = "Room 42"
result = re.search(pattern, text)  # stops at the first digit found
print(result.group())
```
📤 Output: 4
🔧 Example 3: Matching Any Letter from a Range
This example finds the first lowercase letter in a string using a range inside brackets.
```python
import re

pattern = r"[a-z]"  # matches any single lowercase letter
text = "HELLO world"
result = re.search(pattern, text)  # uppercase letters are skipped
print(result.group())
```
📤 Output: w
🔧 Example 4: Matching Characters NOT in a Set
This example finds the first character that is NOT a vowel.
```python
import re

pattern = r"[^aeiou]"  # matches any character except a lowercase vowel
text = "apple"
result = re.search(pattern, text)  # skips the leading "a"
print(result.group())
```
📤 Output: p
🔧 Example 5: Repeating a Character Set in a Phone Number
This example extracts the first group of three digits from a phone number by applying the quantifier {3} to a digit set.
```python
import re

pattern = r"[0-9]{3}"  # three consecutive digits
text = "Phone: 555-1234"
result = re.search(pattern, text)  # first run of three digits
print(result.group())
```
📤 Output: 555
📊 Comparison Table: Bracket Patterns
| Pattern | Matches | Example Input | Match Result |
|---|---|---|---|
| `[ab]` | a or b | "cat" | "a" |
| `[0-9]` | any digit | "Room 42" | "4" |
| `[a-z]` | any lowercase letter | "HELLO world" | "w" |
| `[^aeiou]` | any character except a lowercase vowel | "apple" | "p" |
| `[0-9]{3}` | three consecutive digits | "Phone: 555-1234" | "555" |