
Working with CSV Files in Python

Stanley Ulili
Updated on April 30, 2025

CSV (Comma-Separated Values) files are the backbone of data exchange. Their simple, tabular structure makes them perfect for data analysis, configuration storage, and sharing information between systems.

Python provides efficient tools for working with CSV data through its standard library and specialized packages.

This guide will show you how to handle CSV files in Python, from basic reading to advanced data manipulation.

Prerequisites

Make sure you have Python 3.7 or later installed. You should know Python basics like lists, dictionaries, and file operations to get the most from this guide.

Getting started with CSV in Python

Python's csv module has everything you need to process CSV files. It's part of the standard library, so you don't need to install any additional software.

As shown in the diagram below, working with CSV files in Python follows a simple flow:

Diagram showing CSV data flow through Python processing: input files are processed by Python's CSV module or pandas library, then output as analysis results or transformed data.

This workflow makes Python an excellent choice for CSV processing, whether you're doing simple data transformations or complex analysis.

To begin, create a project directory:

 
mkdir csv-python && cd csv-python

Next, set up a virtual environment:

 
python3 -m venv venv

Activate the virtual environment:

 
source venv/bin/activate  

Now, create a file named app.py that you'll use throughout this tutorial.

Reading CSV files in Python

Reading data from CSV files is often your first step in data analysis. The csv module makes this straightforward.

First, create a sample CSV file named employees.csv with this data:

employees.csv
id,name,department,salary
1,John Smith,Engineering,75000
2,Sara Johnson,Marketing,82000
3,Michael Brown,Finance,95000
4,Jessica Williams,Human Resources,67000
5,Robert Davis,Engineering,78000

Now add this code to your app.py file:

app.py
import csv

# Reading a CSV file using csv.reader
with open('employees.csv', 'r', newline='') as file:
    csv_reader = csv.reader(file)

    # The first row typically contains headers
    headers = next(csv_reader)
    print(f"Headers: {headers}")

    # Print each row of data
    for row in csv_reader:
        print(f"Row: {row}")
        print(f"  Employee: {row[1]} works in {row[2]} and earns ${row[3]}")

The csv.reader() function gives you an iterator that returns each row as a list of strings. The next(csv_reader) line extracts the header row before processing the data rows.

Run your script:

 
python app.py
Output
Headers: ['id', 'name', 'department', 'salary']
Row: ['1', 'John Smith', 'Engineering', '75000']
  Employee: John Smith works in Engineering and earns $75000
Row: ['2', 'Sara Johnson', 'Marketing', '82000']
  Employee: Sara Johnson works in Marketing and earns $82000
...
Row: ['5', 'Robert Davis', 'Engineering', '78000']
  Employee: Robert Davis works in Engineering and earns $78000

Notice the newline='' parameter when opening the file—this prevents issues with different line endings across operating systems.
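Also note that csv.reader returns every field as a string, so numeric columns need explicit conversion before you can do arithmetic on them. A minimal sketch that totals the salary column (it recreates the sample file first so it runs on its own):

```python
import csv

# Recreate the sample file so this snippet is self-contained
with open('employees.csv', 'w', newline='') as file:
    file.write(
        "id,name,department,salary\n"
        "1,John Smith,Engineering,75000\n"
        "2,Sara Johnson,Marketing,82000\n"
        "3,Michael Brown,Finance,95000\n"
        "4,Jessica Williams,Human Resources,67000\n"
        "5,Robert Davis,Engineering,78000\n"
    )

total = 0
with open('employees.csv', 'r', newline='') as file:
    reader = csv.reader(file)
    next(reader)  # skip the header row
    for row in reader:
        total += int(row[3])  # every field is a str until you convert it

print(f"Total payroll: ${total}")
```

Forgetting this conversion is a common source of bugs: comparing `'95000' > '100000'` is a string comparison and evaluates to True.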

Working with CSV data as dictionaries

While lists work for simple files, accessing data by column name instead of index position makes your code more readable. The csv.DictReader class does this by mapping each row to a dictionary using the header row as keys.

Side-by-side comparison of CSV reading methods in Python. Left: `csv.reader` converts rows to lists with numeric index access. Right: `csv.DictReader` converts rows to dictionaries with named field access.

Update your app.py with this code:

app.py
import csv

# Reading a CSV file using csv.reader
with open('employees.csv', 'r', newline='') as file:
    csv_reader = csv.reader(file)

    # The first row typically contains headers
    headers = next(csv_reader)
    print(f"Headers: {headers}")

    # Print each row of data
    for row in csv_reader:
        print(f"Row: {row}")
        print(f"  Employee: {row[1]} works in {row[2]} and earns ${row[3]}")

# Reading a CSV file using csv.DictReader
with open('employees.csv', 'r', newline='') as file:
    dict_reader = csv.DictReader(file)

    # Print the fieldnames (headers)
    print(f"\nField names: {dict_reader.fieldnames}")

    # Process each row as a dictionary
    for row in dict_reader:
        print(f"Row: {row}")
        print(f"  {row['name']} works in {row['department']} and earns ${row['salary']}")

        # Perform calculations or conditional operations directly with column names
        if int(row['salary']) > 80000:
            print(f"  {row['name']} is a high earner!")

Using DictReader gives you several advantages:

  • You access data by meaningful column names instead of numeric indices
  • Your code becomes more resilient to changes in column order
  • Your logic reflects the actual structure of your data
  • Extracting specific columns becomes more intuitive

Run the file again:

 
python app.py
Output
Headers: ['id', 'name', 'department', 'salary']
...

Field names: ['id', 'name', 'department', 'salary']
Row: {'id': '1', 'name': 'John Smith', 'department': 'Engineering', 'salary': '75000'}
  John Smith works in Engineering and earns $75000
Row: {'id': '2', 'name': 'Sara Johnson', 'department': 'Marketing', 'salary': '82000'}
  Sara Johnson works in Marketing and earns $82000
  Sara Johnson is a high earner!
...
 Jessica Williams works in Human Resources and earns $67000
Row: {'id': '5', 'name': 'Robert Davis', 'department': 'Engineering', 'salary': '78000'}
  Robert Davis works in Engineering and earns $78000

This name-based access makes your code more readable and less prone to errors when column positions change.
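For example, pulling one column out of the file becomes a one-liner. This sketch uses an in-memory copy of the same sample data (via io.StringIO) so it is self-contained; with a real file you would pass the file object to DictReader instead:

```python
import csv
import io

# Inline sample data; in practice, pass an open file object instead
data = io.StringIO(
    "id,name,department,salary\n"
    "1,John Smith,Engineering,75000\n"
    "2,Sara Johnson,Marketing,82000\n"
    "3,Michael Brown,Finance,95000\n"
    "4,Jessica Williams,Human Resources,67000\n"
    "5,Robert Davis,Engineering,78000\n"
)

# Each row is a dict keyed by the header names
engineers = [row['name'] for row in csv.DictReader(data)
             if row['department'] == 'Engineering']

print(engineers)  # ['John Smith', 'Robert Davis']
```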

Writing data to CSV files

Creating CSV files is just as important as reading them. Python's csv module gives you two approaches: csv.writer for list-based data and csv.DictWriter for dictionary-based data.

Update your app.py with this code:

app.py
import csv

# Sample data to write to CSV
employees = [
    {'id': 101, 'name': 'Alice Cooper', 'dept': 'Engineering', 'salary': 85000},
    {'id': 102, 'name': 'Bob Martinez', 'dept': 'Marketing', 'salary': 72000},
    {'id': 103, 'name': 'Charlie Lee', 'dept': 'Finance', 'salary': 93000}
]

# Using csv.writer (list-based approach)
with open('team.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(['ID', 'Name', 'Department', 'Salary'])

    for emp in employees:
        writer.writerow([emp['id'], emp['name'], emp['dept'], emp['salary']])

print("CSV written successfully!")

In this code, you create a sample dataset of employees, each represented as a dictionary containing their id, name, department, and salary.

The code uses csv.writer to write this data to a CSV file in list format. You first write a header row and then loop through the employee data to write each row as a list of values.

Run the script:

 
python app.py
Output
CSV written successfully!

Check the output file:

 
cat team.csv
Output
ID,Name,Department,Salary
101,Alice Cooper,Engineering,85000
102,Bob Martinez,Marketing,72000
103,Charlie Lee,Finance,93000

Alternatively, you can use csv.DictWriter for a more dictionary-oriented approach:

 
# Using csv.DictWriter (dictionary-based approach)
with open('team_dict.csv', 'w', newline='') as file:
    fields = ['id', 'name', 'dept', 'salary', 'category']
    writer = csv.DictWriter(file, fieldnames=fields)
    writer.writeheader()

    for emp in employees:
        # Add a calculated field
        category = 'Senior' if emp['salary'] >= 80000 else 'Junior'
        # Create a row with the additional field
        row = {**emp, 'category': category}
        writer.writerow(row)

The key benefits of DictWriter include:

  • It generates headers automatically with writeheader()
  • It maintains field order with the fieldnames parameter
  • You can easily add calculated fields
  • You can write dictionaries directly without manual ordering
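One more DictWriter feature worth knowing: the extrasaction='ignore' parameter lets you pass dictionaries that carry more keys than the CSV needs, and the extras are silently dropped. A minimal sketch (the 'internal_notes' key is a hypothetical example of data you would not want exported):

```python
import csv

employees = [
    {'id': 101, 'name': 'Alice Cooper', 'dept': 'Engineering',
     'salary': 85000, 'internal_notes': 'do not export'},
]

with open('team_dict.csv', 'w', newline='') as file:
    # Only the listed fields are written; 'internal_notes' is dropped
    writer = csv.DictWriter(file, fieldnames=['id', 'name', 'dept', 'salary'],
                            extrasaction='ignore')
    writer.writeheader()
    writer.writerows(employees)

with open('team_dict.csv', 'r', newline='') as file:
    print(file.read())
```

Without extrasaction='ignore', DictWriter raises a ValueError when it encounters a key that isn't in fieldnames, which is the safer default when you expect the dictionaries to match the schema exactly.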

Handling CSV dialects and formats

CSV isn't a strictly standardized format in practice. Different systems use variations in delimiters, quoting, and escape characters. Python's csv module handles these through "dialects."

Try these format variations:

app.py
import csv

# Sample data
sample_data = [
    ['Product', 'Price', 'Description'],
    ['Laptop', '999.99', 'High-performance laptop'],
    ['Keyboard', '87.99', 'Mechanical keyboard with "Cherry MX" switches']
]

# Standard CSV (comma-separated)
with open('products_comma.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerows(sample_data)

# Tab-separated values (TSV)
with open('products_tab.csv', 'w', newline='') as file:
    writer = csv.writer(file, delimiter='\t')
    writer.writerows(sample_data)

print("Created comma and tab-separated files")

This code creates two different CSV files from the same data. The first uses the standard comma delimiter, while the second uses tabs instead. Notice how you only need to change the delimiter parameter to switch formats.

Run the script:

 
python app.py
Output
Created comma and tab-separated files

You can compare the two formats:

 
cat products_comma.csv
Output
Product,Price,Description
Laptop,999.99,High-performance laptop
Keyboard,87.99,"Mechanical keyboard with ""Cherry MX"" switches"
 
cat products_tab.csv
Output
Product Price   Description
Laptop  999.99  High-performance laptop
Keyboard        87.99   "Mechanical keyboard with ""Cherry MX"" switches"

Notice how quotes are handled differently in the two formats. The comma-separated format needs to escape the quotes in "Cherry MX" with double quotes, while the tab-separated format doesn't have this requirement.
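If you'd rather not rely on minimal quoting, you can force quotes around every field with quoting=csv.QUOTE_ALL. A small sketch writing to an in-memory buffer so the exact output is visible:

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_ALL)

# Every field gets quoted, and the embedded quotes are doubled
writer.writerow(['Keyboard', '87.99',
                 'Mechanical keyboard with "Cherry MX" switches'])

print(buf.getvalue())
```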

For other common formats, modify dialect parameters as needed:

 
# Semicolon-separated (common in European locales)
writer = csv.writer(file, delimiter=';')

# Custom dialect with different quoting behavior
csv.register_dialect('custom', 
                     delimiter=',',
                     quotechar='\'',
                     quoting=csv.QUOTE_NONNUMERIC)
writer = csv.writer(file, dialect='custom')

The first part of the code shows how to create a semicolon-separated file, common in countries that use commas for decimal points.

The second creates a custom dialect that uses single quotes and applies them to all non-numeric fields.

Key dialect parameters include:

| Parameter  | Description             | Common Values                      |
|------------|-------------------------|------------------------------------|
| delimiter  | Field separator         | , (default), ;, \t (tab)           |
| quotechar  | Quoting character       | " (default), '                     |
| quoting    | Quoting style           | csv.QUOTE_MINIMAL, csv.QUOTE_ALL   |
| escapechar | Character for escaping  | \                                  |

When working with unknown CSV formats, use csv.Sniffer to detect the dialect automatically:

 
with open('unknown_format.csv', 'r', newline='') as file:
    sample = file.read(1024)
    file.seek(0)
    dialect = csv.Sniffer().sniff(sample)
    reader = csv.reader(file, dialect)
    # Now process the file with the detected dialect

This code reads a small sample of the file, uses csv.Sniffer().sniff() to analyze its format, then creates a reader with the detected dialect. This is especially useful when processing files from various sources where you don't know the exact format in advance.
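Sniffer can also guess whether the first row is a header via its has_header() method, which tells you whether to skip a row before processing. A self-contained sketch using an in-memory semicolon-separated sample (the names are illustrative):

```python
import csv
import io

sample = (
    "name;department;salary\n"
    "John Smith;Engineering;75000\n"
    "Sara Johnson;Marketing;82000\n"
)

sniffer = csv.Sniffer()
dialect = sniffer.sniff(sample)          # detects ';' as the delimiter
has_header = sniffer.has_header(sample)  # True: first row looks like labels

reader = csv.reader(io.StringIO(sample), dialect)
if has_header:
    next(reader)  # skip the header row
rows = list(reader)
print(rows)
```

Both methods are heuristics, so for very small or unusual samples it's worth falling back to an explicit dialect when sniffing fails (sniff() raises csv.Error in that case).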

These parameters help you handle almost any CSV variation you encounter.

Validating and cleaning CSV data

Real-world CSV files often contain inconsistencies, errors, and missing values. While you could write validation code from scratch, Python provides specialized tools that simplify this process significantly.

Pandas is a powerful data analysis library that excels at handling tabular data like CSV files. Before proceeding, you'll need to install it:

 
pip install pandas

Now, create a new file named validation.py to check your team.csv file using pandas for validation:

validation.py
import pandas as pd

# Load CSV into a DataFrame - pandas' core data structure
# The dtype parameter validates data types during import
df = pd.read_csv('team.csv', 
                 dtype={
                     'ID': int,
                     'Name': str,
                     'Department': str,
                     'Salary': float
                 })

# Display the first few rows to check the data
print("CSV data preview:")
print(df.head())

# Check for missing values
missing_values = df.isnull().sum()

# Validate salary range
invalid_salaries = df[df['Salary'] <= 0]

# Validate name field (not empty)
empty_names = df[df['Name'].str.strip() == '']

# Print validation results
print(f"\nMissing values per column:\n{missing_values}")
print(f"\nInvalid salaries: {len(invalid_salaries)}")
print(f"Empty names: {len(empty_names)}")

This code demonstrates why pandas is the preferred tool for CSV validation.

Its DataFrame structure automatically handles column types, provides powerful filtering for validation rules, and makes it easy to identify problematic data.

The dtype parameter validates data types while loading the file, immediately catching conversion errors.

Run the script:

 
python validation.py
Output
CSV data preview:
    ID          Name   Department   Salary
0  101  Alice Cooper  Engineering  85000.0
1  102  Bob Martinez    Marketing  72000.0
2  103   Charlie Lee      Finance  93000.0

Missing values per column:
ID            0
Name          0
Department    0
Salary        0
dtype: int64

Invalid salaries: 0
Empty names: 0
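Because dtype enforcement fails as soon as a value can't be converted, it's worth wrapping the load in a try/except so bad files fail loudly instead of crashing the script. A minimal sketch that writes a deliberately malformed file (bad.csv is a hypothetical name) and catches the error:

```python
import pandas as pd

# A deliberately bad file: 'abc' can't be coerced to float
with open('bad.csv', 'w') as f:
    f.write("ID,Salary\n101,abc\n")

try:
    df_bad = pd.read_csv('bad.csv', dtype={'ID': int, 'Salary': float})
    loaded = True
except ValueError as exc:
    loaded = False
    print(f"Could not load CSV: {exc}")
```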

Pandas offers many additional validation capabilities:

 
# Check for duplicate IDs
print(f"Duplicate IDs: {df['ID'].duplicated().sum()}")

You can detect duplicate values in columns that should be unique, like ID fields. The duplicated() method flags repeat values while sum() counts how many duplicates exist.

 
# Validate department values against allowed list
valid_depts = ['Engineering', 'Marketing', 'Finance', 'HR']
invalid_depts = df[~df['Department'].isin(valid_depts)]
print(f"Invalid departments: {len(invalid_depts)}")

For categorical data, you'll often need to verify values against an allowed list. Here, the isin() method checks if each department appears in the valid options list, while the tilde (~) operator finds values that don't match your approved categories.

 
# Statistical validation (detect outliers)
salary_mean = df['Salary'].mean()
salary_std = df['Salary'].std()
salary_outliers = df[(df['Salary'] < salary_mean - 2*salary_std) | 
                     (df['Salary'] > salary_mean + 2*salary_std)]
print(f"Salary outliers: {len(salary_outliers)}")

Statistical validation helps identify questionable values that technically pass type checking. This approach flags salary outliers using the standard "two-sigma rule": values more than two standard deviations from the mean are set aside for review.
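Once validation has flagged problems, pandas also makes the cleanup step short. This sketch builds a small DataFrame with the kinds of issues discussed above (a duplicate ID and a missing salary), drops the duplicate, fills the gap with the column median, and writes the cleaned result back to CSV:

```python
import pandas as pd

# Sample data with a duplicate ID and a missing salary
df = pd.DataFrame({
    'ID': [101, 102, 102, 103],
    'Name': ['Alice Cooper', 'Bob Martinez', 'Bob Martinez', 'Charlie Lee'],
    'Salary': [85000.0, 72000.0, 72000.0, None],
})

# Keep the first occurrence of each ID, then fill missing salaries
df = df.drop_duplicates(subset='ID', keep='first')
df['Salary'] = df['Salary'].fillna(df['Salary'].median())

df.to_csv('team_clean.csv', index=False)
print(df)
```

How you fill missing values (median, mean, a sentinel, or dropping the row entirely) depends on what the data is used for, so treat the median here as one reasonable default rather than a rule.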

Final thoughts

In this guide, you learned how to handle CSV files in Python, from basic reading and writing to advanced data manipulation. Python's csv module provides tools to process data in both list and dictionary formats, offering flexibility for different use cases.

You also explored handling various CSV formats through dialects and performing data validation and cleaning using pandas. These powerful tools allow you to efficiently manage and manipulate CSV data for a wide range of applications.

For more details, refer to the official Python documentation on the csv module:
Python CSV Documentation and Pandas Documentation.

Licensed under CC-BY-NC-SA

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
