The pd.read_csv() function in Pandas is a powerful tool for reading and processing CSV files efficiently.
Whether you're handling large datasets or small structured files, understanding how to use pd.read_csv() effectively can improve your data analysis workflow in Python.
Importing Pandas
Before using pd.read_csv(), make sure Pandas is installed and imported; it's standard practice to import it under the pd alias:
import pandas as pd
Basic Usage of pd.read_csv()
To read a CSV file into a Pandas DataFrame, you'd typically do the following:
df = pd.read_csv("data.csv")
print(df.head())  # Display the first five rows
Explanation: This reads the CSV file data.csv and stores it as a DataFrame.
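Once the file is loaded, it can help to take a quick look at its dimensions and the column types Pandas inferred; a minimal sketch, assuming the same data.csv as above:
print(df.shape)   # Number of rows and columns in the DataFrame
print(df.dtypes)  # Data type Pandas inferred for each column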
Handling Headers and Column Names
If the CSV file has no header row, it's a good idea to specify column names:
df = pd.read_csv("data.csv", header=None, names=["Column1", "Column2", "Column3"])Explanation: header=None tells Pandas there is no header row, and names assigns column names manually.
Selecting Specific Columns
If you only want to read specific columns, you simply pass a list of the column names:
df = pd.read_csv("data.csv", usecols=["Column1", "Column3"])Explanation: The usecols parameter selects only specified columns.
Handling Missing Values
Real-world data often contains missing values, and they aren't always left blank; the na_values parameter lets you tell Pandas which placeholder strings should be treated as missing:
df = pd.read_csv("data.csv", na_values=["NA", "?"])
Explanation: Any cell containing "NA" or "?" is read in as NaN.
Controlling Data Types
It can be more memory-efficient and performant to specify column data types:
df = pd.read_csv("data.csv", dtype={"Column1": int, "Column2": float})Explanation: The dtype parameter ensures each column is read as the specified type.
Handling Large Files
For large files, it's often best to read the data in chunks so you don't exhaust system memory:
chunk_size = 1000
for chunk in pd.read_csv("large_data.csv", chunksize=chunk_size):
    print(chunk.shape)
Explanation: This reads the file in chunks of 1000 rows at a time to optimize memory usage.
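Each chunk is an ordinary DataFrame, so you can filter or aggregate it and combine the results afterwards. A minimal sketch, assuming large_data.csv has a numeric column named Column1:
filtered_chunks = []
for chunk in pd.read_csv("large_data.csv", chunksize=1000):
    filtered_chunks.append(chunk[chunk["Column1"] > 100])  # Keep only the matching rows
result = pd.concat(filtered_chunks, ignore_index=True)  # Stitch the filtered pieces together
print(result.shape)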
Skipping Rows
To skip rows at the start of a CSV file, you simply specify how many to ignore:
df = pd.read_csv("data.csv", skiprows=5)Explanation: skiprows=5 ignores the first 5 rows.
Key Takeaways
- pd.read_csv() is essential for loading CSV data into Pandas DataFrames in your Python projects.
- Use header, names, and usecols to control column selection.
- Handle missing values with na_values.
- Optimize memory for large files with chunksize.
Practice Exercise
Here's a simple challenge: open up your Python editor and try to read a CSV file, then display only the rows where a specific column's value is greater than 100:
df = pd.read_csv("data.csv")
filtered_df = df[df["Column1"] > 100]
print(filtered_df)
Wrapping Up
The pd.read_csv() function is a versatile tool for reading and processing CSV files. By mastering its parameters, you can efficiently load, clean, and analyze data in Pandas. Happy coding!
 