Handling Non-Standard Delimiters (Semicolons, Pipes)


When working with CSV files in real-world scenarios, you'll often encounter files that don't use commas as delimiters. Many systems export data using semicolons (;) or pipes (|) instead. This is especially common in European locales where commas are used as decimal separators, or in legacy systems that prefer pipe-delimited formats. Python's csv module handles these non-standard delimiters gracefully with just a small configuration change.

⚙️ Why Non-Standard Delimiters Exist

Semicolons are commonly used when data contains commas (e.g., addresses like "New York, NY" or numbers like "1,234.56").
Pipes are preferred when data may contain both commas and semicolons, as pipes rarely appear in normal text.
Some enterprise systems use tabs or spaces as delimiters for specific export formats.

🛠️ Reading Files with Custom Delimiters

The key to handling non-standard delimiters is the delimiter parameter in the csv.reader function. Instead of the default comma, you specify the character that separates your fields.

For a semicolon-delimited file:
- Create a reader object with csv.reader(file, delimiter=';')
- Each row is split at every unquoted semicolon, returning a list of fields
- This works exactly like a comma-delimited file, just with a different separator

For a pipe-delimited file:
- Use csv.reader(file, delimiter='|')
- The pipe character acts as the field separator
- All other CSV rules (quoting, escaping) still apply
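As a minimal sketch of the point that quoting rules still apply with a custom delimiter, the snippet below parses a small pipe-delimited sample held in memory. The sample text and the names sample and rows are made up for illustration; a real script would pass an open file object instead.

```python
import csv
import io

# Hypothetical in-memory sample standing in for a pipe-delimited file.
# The second field of the data row contains a pipe, so it is quoted.
sample = io.StringIO('Name|Note\nAlice|"likes a|b testing"\n', newline="")

rows = list(csv.reader(sample, delimiter="|"))

for row in rows:
    print(row)
```

The quoted pipe stays inside its field, so each row still has exactly two fields.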

📊 Comparison Table: Delimiter Types

Delimiter	Python Parameter	Common Use Case	Example Row
Comma	delimiter=','	Standard CSV format	Name,Age,City
Semicolon	delimiter=';'	European locales, comma-heavy data	Name;Age;City
Pipe	delimiter='|'	Legacy systems, safe separator	Name|Age|City
Tab	delimiter='\t'	TSV files, spreadsheet exports	Name\tAge\tCity

🕵️ Detecting Delimiters Automatically

When you don't know the delimiter in advance, you can use Python's csv.Sniffer class to detect it automatically:

The Sniffer analyzes a sample of your file to determine the delimiter, quote character, and other formatting details
Use sniffer.sniff(sample) where sample is a string of your file's content
The returned dialect object contains a delimiter attribute that reveals the detected character
This is extremely useful when processing files from unknown sources or when the delimiter may vary
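The detection steps above could look roughly like this. It is a sketch under assumptions: the sample text is invented, and the delimiters argument restricts the guess to a shortlist of likely candidates, which is optional.

```python
import csv
import io

# Invented sample; in practice this would be the first few KB of a real file,
# e.g. sample_text = file.read(2048) followed by file.seek(0).
sample_text = (
    "Name;Age;City\n"
    "Alice;30;Berlin\n"
    "Bob;25;Madrid\n"
    "Carol;41;Paris\n"
)

# Restrict the guess to a few plausible delimiter characters.
dialect = csv.Sniffer().sniff(sample_text, delimiters=";,|\t")
print(repr(dialect.delimiter))

# The detected dialect can be passed straight to csv.reader.
rows = list(csv.reader(io.StringIO(sample_text), dialect))
print(rows[1])
```

Sniffer is a heuristic: it raises csv.Error when it cannot settle on a delimiter, so wrapping the call in a try/except and falling back to a known default is a reasonable precaution.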

✍️ Writing Files with Custom Delimiters

Writing files with non-standard delimiters follows the same pattern as reading:

Create a csv.writer object with the desired delimiter parameter
Use writer.writerow() to write each row as a list of values
The writer will automatically insert your chosen delimiter between fields
You can also specify a lineterminator if you need non-standard line endings
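A small sketch of the writing pattern, using an in-memory buffer so the result can be inspected directly. The data values are illustrative, and lineterminator="\n" is chosen here only to show the parameter; the csv module's default is "\r\n".

```python
import csv
import io

# In-memory buffer standing in for a file opened with newline="".
buffer = io.StringIO(newline="")

writer = csv.writer(buffer, delimiter="|", lineterminator="\n")
writer.writerow(["Item", "Quantity", "Price"])
writer.writerow(["Widget", 150, 2.5])
# A field that contains the delimiter is quoted automatically.
writer.writerow(["Nut|Bolt set", 20, 4.0])

output = buffer.getvalue()
print(output)
```

Non-string values such as 150 and 2.5 are converted with str() before being written.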

🧪 Practical Tips for Delimiter Handling

Always check the first few lines of a file manually to confirm the delimiter before writing code
When using semicolons, be aware that some European CSV files also use commas as decimal separators within numbers
For pipe-delimited files, ensure your data doesn't contain pipe characters, or use quoting to escape them
The csv.Sniffer works best with a representative sample of at least a few hundred characters
If your data contains the delimiter character within fields, those fields must be wrapped in quotes (csv.writer does this automatically, and csv.reader strips the quotes again when reading)
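To illustrate the tip about decimal commas, here is a hedged sketch that reads a semicolon-delimited sample and converts prices to floats. The column names and values are invented, and the conversion assumes the numbers contain no thousands separators (a value such as 1.234,75 would need the dots removed first).

```python
import csv
import io

# Invented European-style sample: semicolon delimiter, comma as decimal mark.
sample = io.StringIO("Product;Price\nWidget;2,50\nGadget;1234,75\n", newline="")

reader = csv.DictReader(sample, delimiter=";")

# Swap the decimal comma for a dot before converting to float.
prices = {row["Product"]: float(row["Price"].replace(",", ".")) for row in reader}
print(prices)
```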

⚠️ Common Pitfalls to Avoid

Forgetting to specify the delimiter when reading a non-standard file usually yields a single field per row, because the reader is still splitting on commas (and any commas inside the data will split fields in the wrong places)
Using the wrong delimiter character (e.g., a lowercase L instead of a pipe) will split data incorrectly
Assuming all files from a system use the same delimiter; always verify with a sample
Mixing delimiters within the same file (e.g., some rows using commas, others using semicolons) usually raises no error; rows are silently split into the wrong number of fields
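The first pitfall is easy to reproduce. The sketch below parses the same invented semicolon-delimited text twice, once with the default comma and once with the correct delimiter, and shows a simple sanity check on the field count.

```python
import csv
import io

text = "Name;Age;City\nGrace;32;Boston\n"

# Default delimiter (comma): nothing is split, so each row is one field.
wrong = list(csv.reader(io.StringIO(text)))
# Correct delimiter: three fields per row.
right = list(csv.reader(io.StringIO(text), delimiter=";"))

print(wrong)
print(right)

# A cheap guard: a lone field in every row often means the wrong delimiter.
if all(len(row) == 1 for row in wrong):
    print("Only one field per row; check the delimiter.")
```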

🎯 Summary

Handling non-standard delimiters in Python is straightforward once you understand the delimiter parameter. Whether you're working with semicolons, pipes, tabs, or any other single character, csv.reader and csv.writer work the same way. For unknown formats, csv.Sniffer can often detect the delimiter automatically, although it is a heuristic and can fail or guess wrong on unusual data. Always test with sample data and verify your delimiter choice before processing large files.

This topic shows how to read and write CSV files that use semicolons (;) or pipes (|) instead of commas as delimiters.

📘 Example 1: Reading a CSV file with semicolon delimiter

This example reads a simple CSV file where columns are separated by semicolons.

import csv

# newline="" lets the csv module handle line endings itself
with open("employees.csv", "r", newline="") as file:
    reader = csv.reader(file, delimiter=";")
    for row in reader:
        print(row)  # each row is a list of strings

📤 Output:
['Alice', 'Engineer', '60000']
['Bob', 'Technician', '45000']
['Carol', 'Analyst', '52000']

📘 Example 2: Reading a CSV file with pipe delimiter

This example reads a CSV file where columns are separated by pipe symbols.

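import csv

# newline="" lets the csv module handle line endings itself
with open("inventory.csv", "r", newline="") as file:
    reader = csv.reader(file, delimiter="|")
    for row in reader:
        print(row)  # the header row is returned like any other row

📤 Output:
['Item', 'Quantity', 'Price']
['Widget', '150', '2.50']
['Gadget', '75', '8.00']
['Doodad', '200', '1.20']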

📘 Example 3: Writing a CSV file with semicolon delimiter

This example writes data to a CSV file using semicolons as the delimiter.

import csv

data = [
    ["Name", "Department", "Salary"],
    ["Dave", "Engineering", 72000],
    ["Eve", "Marketing", 58000],
    ["Frank", "Sales", 63000]
]

with open("departments.csv", "w", newline="") as file:
    writer = csv.writer(file, delimiter=";")
    for row in data:
        writer.writerow(row)

📤 Output: File 'departments.csv' created with semicolon-separated values

📘 Example 4: Reading a semicolon-delimited file with header row

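This example reads a semicolon-delimited file and accesses data by column name using DictReader. It assumes employees.csv starts with a header row that includes Name and Role columns (unlike the header-less output shown in Example 1); without that header, the lookups below raise KeyError.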

import csv

with open("employees.csv", "r", newline="") as file:
    # DictReader takes the column names from the first row of the file
    reader = csv.DictReader(file, delimiter=";")
    for row in reader:
        print(row["Name"], "works as", row["Role"])

📤 Output:
Alice works as Engineer
Bob works as Technician
Carol works as Analyst

📘 Example 5: Converting a pipe-delimited file to a list of dictionaries

This example reads a pipe-delimited file and stores each row as a dictionary for easier data access.

import csv

with open("inventory.csv", "r", newline="") as file:
    reader = csv.DictReader(file, delimiter="|")
    # Materialize all rows while the file is still open
    records = list(reader)

for item in records:
    print(f"{item['Item']}: {item['Quantity']} units at ${item['Price']} each")

📤 Output:
Widget: 150 units at $2.50 each
Gadget: 75 units at $8.00 each
Doodad: 200 units at $1.20 each

📘 Example 6: Writing a pipe-delimited file from a list of dictionaries

This example writes data from a list of dictionaries to a pipe-delimited CSV file.

import csv

data = [
    {"Product": "Laptop", "Stock": 30, "Price": 899.99},
    {"Product": "Mouse", "Stock": 120, "Price": 24.99},
    {"Product": "Keyboard", "Stock": 85, "Price": 49.99}
]

with open("products.csv", "w", newline="") as file:
    fieldnames = ["Product", "Stock", "Price"]
    writer = csv.DictWriter(file, fieldnames=fieldnames, delimiter="|")
    writer.writeheader()
    for row in data:
        writer.writerow(row)

📤 Output: File 'products.csv' created with pipe-delimited columns

📘 Example 7: Handling mixed delimiters in a single file

This example processes lines that mix semicolons and pipes by handling each line separately. The lines are hard-coded here rather than read from a file, and plain string splitting ignores CSV quoting, so this approach only suits data without quoted fields.

lines = [
    "Name;Age;City",
    "Grace|32|Boston",
    "Henry;28|Dallas",
    "Iris|35;Miami"
]

for line in lines:
    if ";" in line and "|" in line:
        # Mixed line: normalize pipes to semicolons, then split once
        parts = line.replace("|", ";").split(";")
        print(parts)
    elif ";" in line:
        # Semicolon-only line
        print(line.split(";"))
    elif "|" in line:
        # Pipe-only line
        print(line.split("|"))

📤 Output:
['Name', 'Age', 'City']
['Grace', '32', 'Boston']
['Henry', '28', 'Dallas']
['Iris', '35', 'Miami']

Comparison Table: Delimiter Types

Feature	Comma (`,`)	Semicolon (`;`)	Pipe (`|`)
Common use	Standard CSV	European locale data	Log files, system exports
Risk of conflict	High (data may contain commas)	Low	Very low
Readability	Good for simple data	Good when commas are in data	Excellent for complex data
Python parameter	`delimiter=","` (default)	`delimiter=";"`	`delimiter="|"`