Handling Non-Standard Delimiters (Semicolons, Pipes)
When working with CSV files in real-world scenarios, you'll often encounter files that don't use commas as delimiters. Many systems export data using semicolons (;) or pipes (|) instead. This is especially common in European locales where commas are used as decimal separators, or in legacy systems that prefer pipe-delimited formats. Python's csv module handles these non-standard delimiters gracefully with just a small configuration change.
Why Non-Standard Delimiters Exist
- Semicolons are commonly used when data contains commas (e.g., addresses like "New York, NY" or numbers like "1,234.56").
- Pipes are preferred when data may contain both commas and semicolons, as pipes rarely appear in normal text.
- Some enterprise systems use tabs or spaces as delimiters for specific export formats.
Reading Files with Custom Delimiters
The key to handling non-standard delimiters is the delimiter parameter in the csv.reader function. Instead of the default comma, you specify the character that separates your fields.
For a semicolon-delimited file:
- Create a reader object with csv.reader(file, delimiter=';')
- Each row will be split at every semicolon, returning a list of fields
- This works exactly like a comma-delimited file, just with a different separator
For a pipe-delimited file:
- Use csv.reader(file, delimiter='|')
- The pipe character acts as the field separator
- All other CSV rules (quoting, escaping) still apply
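As a small illustration of the point that quoting rules still apply, the sketch below parses an in-memory sample rather than a file. The sample text and variable names are made up for this example; the second row's first field contains a semicolon inside quotes, so it should not be split there.

```python
import csv
import io

# Hypothetical sample data: the quoted field contains the delimiter itself.
sample = 'Name;Age\n"Smith; John";42\n'

# io.StringIO stands in for an open file so the example is self-contained.
rows = list(csv.reader(io.StringIO(sample), delimiter=";"))

print(rows)  # [['Name', 'Age'], ['Smith; John', '42']]
```

The quoted semicolon stays inside the field, which is the main reason to prefer csv.reader over a plain str.split(";").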
Comparison Table: Delimiter Types
| Delimiter | Python Parameter | Common Use Case | Example Row |
|---|---|---|---|
| Comma | delimiter=',' | Standard CSV format | Name,Age,City |
| Semicolon | delimiter=';' | European locales, comma-heavy data | Name;Age;City |
| Pipe | delimiter='\|' | Legacy systems, safe separator | Name\|Age\|City |
| Tab | delimiter='\t' | TSV files, spreadsheet exports | Name\tAge\tCity |
Detecting Delimiters Automatically
When you don't know the delimiter in advance, you can use Python's csv.Sniffer class to detect it automatically:
- The Sniffer analyzes a sample of your file to determine the delimiter, quote character, and other formatting details
- Use sniffer.sniff(sample) where sample is a string of your file's content
- The returned dialect object contains a delimiter attribute that reveals the detected character
- This is extremely useful when processing files from unknown sources or when the delimiter may vary
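The detection steps above can be sketched roughly as follows. The helper name detect_delimiter, the candidate list, and the sample data are assumptions made for this example; csv.Sniffer().sniff() and its delimiters argument are part of the standard library, and sniff() raises csv.Error when it cannot decide, so the sketch falls back to a default in that case.

```python
import csv
import io


def detect_delimiter(sample, candidates=";|,\t", default=","):
    """Return the delimiter csv.Sniffer finds in sample, or default on failure."""
    try:
        # Restricting the candidates keeps letters or digits from being picked.
        dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    except csv.Error:
        return default
    return dialect.delimiter


# Hypothetical pipe-delimited sample; in practice, read the first few KB of the file.
sample = "Item|Quantity|Price\nWidget|150|2.50\nGadget|75|8.00\nDoodad|200|1.20\n"

delimiter = detect_delimiter(sample)
rows = list(csv.reader(io.StringIO(sample), delimiter=delimiter))
print(repr(delimiter), rows[1])
```

Sniffer is a heuristic, so it is worth checking its guess against a few parsed rows before trusting it on a large file.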
Writing Files with Custom Delimiters
Writing files with non-standard delimiters follows the same pattern as reading:
- Create a csv.writer object with the desired delimiter parameter
- Use writer.writerow() to write each row as a list of values
- The writer will automatically insert your chosen delimiter between fields
- You can also specify a lineterminator if you need non-standard line endings
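A minimal sketch of these writer options, writing to an in-memory buffer instead of a file so the result can be inspected directly. The row contents are invented for illustration. With the default quoting mode (csv.QUOTE_MINIMAL), a field that contains the delimiter is quoted automatically.

```python
import csv
import io

buffer = io.StringIO()  # stands in for a file opened with newline=""

# lineterminator defaults to "\r\n"; "\n" is chosen here as the non-default ending.
writer = csv.writer(buffer, delimiter="|", lineterminator="\n")
writer.writerow(["Name", "Note"])
writer.writerow(["Widget", "size: 3|4 inch"])  # field contains a pipe

output = buffer.getvalue()
print(output)
# Name|Note
# Widget|"size: 3|4 inch"
```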
Practical Tips for Delimiter Handling
- Always check the first few lines of a file manually to confirm the delimiter before writing code
- When using semicolons, be aware that some European CSV files also use commas as decimal separators within numbers
- For pipe-delimited files, ensure your data doesn't contain pipe characters, or use quoting to escape them
- The csv.Sniffer works best with a representative sample of at least a few hundred characters
- If your data contains the delimiter character within fields, wrap those fields in quotes (the csv module handles this automatically)
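To illustrate the tip about decimal commas, here is one possible way to convert such values after parsing. The helper parse_decimal_comma and the sample data are hypothetical, and the sketch assumes that a period is only ever a thousands separator and a comma is the decimal mark; files from other locales may need different handling.

```python
import csv
import io


def parse_decimal_comma(text):
    """Convert a number written like '1.234,56' or '2,50' to a float."""
    # Assumption: '.' groups thousands and ',' marks the decimals.
    return float(text.replace(".", "").replace(",", "."))


# Hypothetical semicolon-delimited sample with decimal commas in the Price column.
sample = "Item;Price\nWidget;2,50\nGadget;1.234,56\n"

reader = csv.DictReader(io.StringIO(sample), delimiter=";")
prices = {row["Item"]: parse_decimal_comma(row["Price"]) for row in reader}

print(prices)  # {'Widget': 2.5, 'Gadget': 1234.56}
```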
Common Pitfalls to Avoid
- Forgetting to specify the delimiter when reading a non-standard file usually yields a single field per row, because the reader splits on commas and finds none (or splits in the wrong places if the data itself contains commas)
- Using a look-alike character instead of the delimiter (e.g., a lowercase L or capital I instead of a pipe) will split data incorrectly
- Assuming all files from a system use the same delimiter; always verify with a sample
- Mixing delimiters within the same file (e.g., some rows using commas, others using semicolons) does not raise an error in the csv module; the affected rows are silently split into the wrong number of fields
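Two of these pitfalls can be reproduced with a small in-memory sample. The data and variable names are invented for this sketch. Comparing each row's field count with the header's is one simple check, not the only one, for spotting rows that were split with the wrong delimiter.

```python
import csv
import io

# Hypothetical sample: the last line mixes a semicolon and a pipe.
sample = "Name;Age;City\nGrace;32;Boston\nHenry;28|Dallas\n"

# Pitfall: no delimiter given, so the reader looks for commas and finds none.
wrong = list(csv.reader(io.StringIO(sample)))
print(wrong[0])  # ['Name;Age;City']

# Pitfall: mixed delimiters show up as rows with an unexpected field count.
rows = list(csv.reader(io.StringIO(sample), delimiter=";"))
expected = len(rows[0])
bad_lines = [n for n, row in enumerate(rows, start=1) if len(row) != expected]
print(bad_lines)  # [3]
```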
Summary
Handling non-standard delimiters in Python comes down to the delimiter parameter. Whether the separator is a semicolon, pipe, tab, or another single character, csv.reader and csv.writer (and their DictReader and DictWriter counterparts) work the same way. For unknown formats, csv.Sniffer can guess the delimiter from a sample, though its result is a heuristic and worth checking. Always test with sample data and verify your delimiter choice before processing large files.
This topic shows how to read and write CSV files that use semicolons (;) or pipes (|) instead of commas as delimiters.
Example 1: Reading a CSV file with semicolon delimiter
This example reads a simple CSV file where columns are separated by semicolons.
import csv

# newline="" lets the csv module handle line endings itself
with open("employees.csv", "r", newline="") as file:
    reader = csv.reader(file, delimiter=";")
    for row in reader:
        print(row)  # each row is a list of strings
Output:
['Alice', 'Engineer', '60000']
['Bob', 'Technician', '45000']
['Carol', 'Analyst', '52000']
Example 2: Reading a CSV file with pipe delimiter
This example reads a CSV file where columns are separated by pipe symbols.
import csv

with open("inventory.csv", "r", newline="") as file:
    reader = csv.reader(file, delimiter="|")
    for row in reader:
        print(row)  # the first row printed is the header
Output:
['Item', 'Quantity', 'Price']
['Widget', '150', '2.50']
['Gadget', '75', '8.00']
['Doodad', '200', '1.20']
Example 3: Writing a CSV file with semicolon delimiter
This example writes data to a CSV file using semicolons as the delimiter.
import csv

data = [
    ["Name", "Department", "Salary"],
    ["Dave", "Engineering", 72000],
    ["Eve", "Marketing", 58000],
    ["Frank", "Sales", 63000],
]

# newline="" prevents extra blank lines on Windows
with open("departments.csv", "w", newline="") as file:
    writer = csv.writer(file, delimiter=";")
    writer.writerows(data)  # writes every row in one call
Result (contents of departments.csv):
Name;Department;Salary
Dave;Engineering;72000
Eve;Marketing;58000
Frank;Sales;63000
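Example 4: Reading a semicolon-delimited file with header row
This example reads a semicolon-delimited file and accesses data by column name using DictReader. It assumes the first line of employees.csv is a header row that includes the columns Name and Role; without a header, DictReader would treat the first data row as the column names and the lookups below would raise KeyError.
import csv

with open("employees.csv", "r", newline="") as file:
    # DictReader takes the column names from the first line of the file
    reader = csv.DictReader(file, delimiter=";")
    for row in reader:
        print(row["Name"], "works as", row["Role"])
Output:
Alice works as Engineer
Bob works as Technician
Carol works as Analyst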
Example 5: Converting a pipe-delimited file to a list of dictionaries
This example reads a pipe-delimited file and stores each row as a dictionary for easier data access.
import csv

with open("inventory.csv", "r", newline="") as file:
    reader = csv.DictReader(file, delimiter="|")
    records = list(reader)  # one dict per data row, keyed by the header names

for item in records:
    print(f"{item['Item']}: {item['Quantity']} units at ${item['Price']} each")
Output:
Widget: 150 units at $2.50 each
Gadget: 75 units at $8.00 each
Doodad: 200 units at $1.20 each
Example 6: Writing a pipe-delimited file from a list of dictionaries
This example writes data from a list of dictionaries to a pipe-delimited CSV file.
import csv

data = [
    {"Product": "Laptop", "Stock": 30, "Price": 899.99},
    {"Product": "Mouse", "Stock": 120, "Price": 24.99},
    {"Product": "Keyboard", "Stock": 85, "Price": 49.99},
]

with open("products.csv", "w", newline="") as file:
    fieldnames = ["Product", "Stock", "Price"]  # column order in the output
    writer = csv.DictWriter(file, fieldnames=fieldnames, delimiter="|")
    writer.writeheader()
    writer.writerows(data)
Result (contents of products.csv):
Product|Stock|Price
Laptop|30|899.99
Mouse|120|24.99
Keyboard|85|49.99
Example 7: Handling mixed delimiters in a single file
This example processes lines that use semicolons, pipes, or both by converting every pipe to a semicolon before splitting. The lines are held in a list here rather than read from a file. Because it uses plain string splitting instead of the csv module, it does not honor quoted fields.
lines = [
    "Name;Age;City",
    "Grace|32|Boston",
    "Henry;28|Dallas",
    "Iris|35;Miami",
]

for line in lines:
    # Normalize pipes to semicolons so one split handles pure and mixed lines
    parts = line.replace("|", ";").split(";")
    print(parts)
Output:
['Name', 'Age', 'City']
['Grace', '32', 'Boston']
['Henry', '28', 'Dallas']
['Iris', '35', 'Miami']
Comparison Table: Delimiter Types
| Feature | Comma (,) | Semicolon (;) | Pipe (\|) |
|---|---|---|---|
| Common use | Standard CSV | European locale data | Log files, system exports |
| Risk of conflict | High (data may contain commas) | Low | Very low |
| Readability | Good for simple data | Good when commas are in data | Excellent for complex data |
| Python parameter | delimiter="," (default) | delimiter=";" | delimiter="\|" |