Matching Character Sets Using Brackets


When working with text data, you often need to match not one specific character but any character from a group of allowed options. Bracket notation [...] handles this: instead of writing a separate pattern for each possible character, you define a character set inside square brackets, and the regex engine matches if it finds any one of those characters at that position.


⚙️ What Are Character Sets?

A character set is a list of characters placed inside square brackets [ ]. The pattern will match exactly one character from that list.

  • Syntax: [abc] matches either a, b, or c.
  • Behavior: It matches only one character, but that character can be any one from the set.
  • Case sensitivity: By default, [abc] does not match A, B, or C. Include the uppercase versions in the set (e.g., [abcABC]) or turn on your regex engine's case-insensitive flag if you need them.

Example: The pattern [aeiou] will match any single vowel (lowercase) in a string. If you search the word "hello", it will match the e and the o.
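As a quick check of the example above, here is a minimal sketch in Python's re module (the language used by the code examples on this page). It uses re.findall, which returns every non-overlapping match, so both vowels show up. The variable names are illustrative only.

```python
import re

# [aeiou] matches exactly one lowercase vowel at each position it is tried.
lower = re.findall(r"[aeiou]", "hello")
print(lower)  # ['e', 'o']

# The set is case-sensitive, so uppercase vowels are not matched...
upper_miss = re.findall(r"[aeiou]", "HELLO")
print(upper_miss)  # []

# ...unless they are listed explicitly, or the IGNORECASE flag is passed.
upper_listed = re.findall(r"[aeiouAEIOU]", "HELLO")
upper_flag = re.findall(r"[aeiou]", "HELLO", flags=re.IGNORECASE)
print(upper_listed, upper_flag)  # ['E', 'O'] ['E', 'O']
```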


🕵️ Defining Ranges Inside Brackets

Instead of listing every character individually (e.g., [0123456789]), you can use a hyphen - to define a range.

  • Digit range: [0-9] matches any single digit from 0 to 9.
  • Letter range: [a-z] matches any lowercase letter from a to z.
  • Mixed ranges: [a-zA-Z] matches any single uppercase or lowercase letter.
  • Alphanumeric: [a-zA-Z0-9] matches any single letter or digit.

Important rule: The hyphen is only treated as a range indicator when it appears between two characters. If you place it at the start or end of the set (e.g., [-abc] or [abc-]), it becomes a literal hyphen character.
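The hyphen rule is easy to verify with a short sketch. The sample string below is made up for illustration. The last pattern escapes the hyphen with a backslash, which Python's re module also treats as a literal hyphen; that form is an addition here, not something this section relies on.

```python
import re

text = "a-b c_d 42"  # illustrative sample string

# Hyphen between two characters: a range (any lowercase letter).
letters = re.findall(r"[a-z]", text)
print(letters)  # ['a', 'b', 'c', 'd']

# Hyphen at the start or end of the set: a literal hyphen.
hyphen_first = re.findall(r"[-ab]", text)
hyphen_last = re.findall(r"[ab-]", text)
print(hyphen_first, hyphen_last)  # ['a', '-', 'b'] ['a', '-', 'b']

# Escaped hyphen in the middle: also literal, so this set is a, -, or c.
escaped = re.findall(r"[a\-c]", text)
print(escaped)  # ['a', '-', 'c']
```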


📊 Comparison: Single Character vs. Character Set

Pattern | What It Matches | Example Match in "cat1"
------- | --------------- | -----------------------
c | Only the letter c | c
[abc] | Any one of a, b, or c | c (first match from left)
[0-9] | Any single digit | 1
[a-z] | Any single lowercase letter | c (first match from left)
[cC] | Either lowercase c or uppercase C | c
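The rows above can be reproduced with a small loop. This is a sketch that assumes Python's re.search, which returns the leftmost match; the dictionary name is hypothetical.

```python
import re

text = "cat1"
patterns = ["c", "[abc]", "[0-9]", "[a-z]", "[cC]"]

# re.search scans left to right and stops at the first position that matches.
first_matches = {p: re.search(p, text).group() for p in patterns}

for pattern, match in first_matches.items():
    print(f"{pattern:8} -> {match}")
```

Every pattern here is known to match "cat1", so calling .group() directly is safe; with arbitrary input, check for None first.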

🛠️ Negating a Character Set

Sometimes you want to match any character except the ones listed. You do this by placing a caret ^ immediately after the opening bracket.

  • Syntax: [^abc] matches any character that is not a, b, or c.
  • Behavior: It will match spaces, digits, punctuation, and any other character not in the set.
  • Important: The caret only negates when it is the first character inside the brackets. If placed elsewhere (e.g., [a^b]), it becomes a literal caret character.

Example: The pattern [^0-9] will match any character that is not a digit. In the string "Room 101", it would match R, o, o, m, and the space.
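The following sketch checks the "Room 101" example and the caret-position rule in Python. It also shows one behavior worth knowing: in Python's re module, a negated set matches newline characters too. The sample strings other than "Room 101" are made up for illustration.

```python
import re

# Caret first: negation. Everything except digits is matched.
non_digits = re.findall(r"[^0-9]", "Room 101")
print(non_digits)  # ['R', 'o', 'o', 'm', ' ']

# Caret elsewhere: a literal ^ that is simply part of the set.
literal_caret = re.findall(r"[a^b]", "a^b c")
print(literal_caret)  # ['a', '^', 'b']

# A negated set also matches newlines, since "\n" is not a digit.
newline_match = re.findall(r"[^0-9]", "1\n2")
print(newline_match)  # ['\n']
```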


🧩 Common Use Cases for Engineers

  • Validating input fields: ^[a-zA-Z0-9]+$ to ensure a username contains only letters and digits. A bare [a-zA-Z0-9] only confirms that one such character appears somewhere in the input.
  • Parsing log files: [0-9]{2} to match two-digit values like hours or days.
  • Filtering special characters: [^a-zA-Z0-9] to find all non-alphanumeric characters in a string.
  • Matching specific delimiters: [,;:] to match commas, semicolons, or colons in CSV-like data.
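The use cases above can be sketched in Python as follows. The helper name is_valid_username and all sample inputs are hypothetical. Note that a character set on its own matches a single character, so validation needs a quantifier (+) and a whole-string match; re.fullmatch is one way to get that.

```python
import re

def is_valid_username(name: str) -> bool:
    """True when name is one or more ASCII letters/digits and nothing else."""
    return re.fullmatch(r"[a-zA-Z0-9]+", name) is not None

print(is_valid_username("user42"))   # True
print(is_valid_username("user 42"))  # False (contains a space)

# Two-digit values in a timestamp-like string.
two_digit = re.findall(r"[0-9]{2}", "12:34:56")
print(two_digit)  # ['12', '34', '56']

# Every non-alphanumeric character.
specials = re.findall(r"[^a-zA-Z0-9]", "a+b=c!")
print(specials)  # ['+', '=', '!']

# Split on any one of several delimiters.
fields = re.split(r"[,;:]", "name,age;city:zip")
print(fields)  # ['name', 'age', 'city', 'zip']
```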

🎯 Practical Example: Matching a Date Component

Imagine you need to find the month number in a date string like "2024-03-15".

  • Pattern: [0-9]{2} would match 20, 24, 03, 15 — too broad.
  • Better pattern: -[0-9]{2}- would match -03- specifically.
  • Narrower: -[0-1][0-9]- requires the first digit to be 0 or 1, but it still accepts invalid values such as -00- and -19-. To match only -01- through -12-, use -(0[1-9]|1[0-2])-.

This shows how combining character sets with quantifiers (like {2}) gives you fine-grained control over what you extract.
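Here is a sketch of the same progression in Python. The variable names and the second date string are made up for illustration. One caveat it makes visible: restricting only the first digit, as in -[0-1][0-9]-, still lets values like 19 through, so the last pattern uses alternation to accept only 01 to 12.

```python
import re

date = "2024-03-15"

# Too broad: every two-digit run, including both halves of the year.
broad = re.findall(r"[0-9]{2}", date)
print(broad)  # ['20', '24', '03', '15']

# Anchoring on the surrounding hyphens isolates the month.
between_hyphens = re.search(r"-[0-9]{2}-", date).group()
print(between_hyphens)  # -03-

# [0-1][0-9] narrows the first digit but still accepts an invalid month.
loose = re.search(r"-[0-1][0-9]-", "2024-19-15")
print(loose.group())  # -19-

# Stricter sketch: 01-09 or 10-12 only; the parentheses capture the month.
strict_pattern = r"-(0[1-9]|1[0-2])-"
month = re.search(strict_pattern, date).group(1)
rejected = re.search(strict_pattern, "2024-19-15")
print(month, rejected)  # 03 None
```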


✅ Key Takeaways

  • Brackets [...] define a set of allowed characters — the pattern matches if any one of them is present.
  • Use hyphens - to define ranges like [0-9] or [a-z] for cleaner patterns.
  • Use caret ^ at the start of the set to negate it: [^abc] matches anything except a, b, or c.
  • Character sets match exactly one character — combine them with quantifiers (+, *, {n}) to match multiple characters.
  • Always consider case sensitivity — include both uppercase and lowercase ranges if needed.

Mastering character sets is a foundational step toward building powerful regex patterns for data validation, log parsing, and text extraction tasks.


Brackets [] in regex define a character set that matches any single character listed inside the brackets.


🔧 Example 1: Matching a Single Character from a Set

This example checks if a string contains either the letter "a" or "b".

import re

pattern = r"[ab]"
text = "cat"
result = re.search(pattern, text)

print(result.group())

📤 Output: a


🔧 Example 2: Matching Any Digit from a Range

This example finds the first digit in a string using a range inside brackets.

import re

pattern = r"[0-9]"
text = "Room 42"
result = re.search(pattern, text)

print(result.group())

📤 Output: 4


🔧 Example 3: Matching Any Letter from a Range

This example finds the first lowercase letter in a string using a range inside brackets.

import re

pattern = r"[a-z]"
text = "HELLO world"
result = re.search(pattern, text)

print(result.group())

📤 Output: w


🔧 Example 4: Matching Characters NOT in a Set

This example finds the first character that is NOT a vowel.

import re

pattern = r"[^aeiou]"
text = "apple"
result = re.search(pattern, text)

print(result.group())

📤 Output: p


🔧 Example 5: Matching a Fixed Number of Digits in a Phone Number

This example extracts the first three digits of a phone number by combining a digit character set with the {3} quantifier.

import re

pattern = r"[0-9]{3}"
text = "Phone: 555-1234"
result = re.search(pattern, text)

print(result.group())

📤 Output: 555
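All five examples call result.group() directly, which works because each pattern is known to match its text. When the input is not guaranteed to match, re.search returns None and .group() raises AttributeError. A minimal sketch of a guard, using a hypothetical helper named first_match:

```python
import re

def first_match(pattern: str, text: str):
    """Return the first match as a string, or None when nothing matches."""
    result = re.search(pattern, text)
    return result.group() if result else None

found = first_match(r"[0-9]", "Room 42")
missing = first_match(r"[0-9]", "no digits here")
print(found)    # 4
print(missing)  # None
```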


📊 Comparison Table: Bracket Patterns

Pattern | Matches | Example Input | Match Result
------- | ------- | ------------- | ------------
[abc] | a, b, or c | "cat" | "a"
[0-9] | any digit | "Room 42" | "4"
[a-z] | any lowercase letter | "HELLO world" | "w"
[^aeiou] | any non-vowel | "apple" | "p"
[0-9]{3} | exactly 3 digits | "Phone: 555-1234" | "555"
