Reading CSV Files in Python — Complete Tutorial for CBSE AI Students

Every AI project starts with data, and very often that data lives in a CSV file. Whether you are reading the CBSE-suggested rainfall.csv, a dataset downloaded from Kaggle, or your own student marks file, this is the skill that unlocks everything else.

This tutorial covers all the CSV reading methods you need for Class 10 (Unit 7), Class 11 (Unit 3 Level 2 and Unit 5), and Class 12 (Unit 1) practicals.

What You’ll Learn

  • What a CSV file is and how Python reads it
  • All essential pd.read_csv() options with practical examples
  • How to explore, filter, and extract data from a CSV
  • How to handle common problems — missing values, wrong separators, encoding errors
  • How to write data back to CSV after cleaning

What Is a CSV File?

CSV stands for Comma-Separated Values. It is a plain text file where each row is one line of text and the values within a row are separated by commas. The first line usually holds the column names (the header).

A CSV file looks like this in a text editor:

Name,Marks,Grade,City
Arjun,85,B,Delhi
Priya,92,A,Mumbai
Kiran,78,C,Delhi

And it looks like this when opened in Excel (each value in its own cell):

Name  | Marks | Grade | City
Arjun | 85    | B     | Delhi
Priya | 92    | A     | Mumbai
Kiran | 78    | C     | Delhi

Python’s Pandas library reads CSV files and converts them into a DataFrame — a table you can query, filter, and analyse with code.
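To see this conversion without creating a file, here is a minimal sketch that passes the sample text above to pd.read_csv() through io.StringIO. io.StringIO is part of standard Python and makes a string behave like an open file. It is used here only so the example runs anywhere.

```python
import io
import pandas as pd

# The same sample rows shown above, stored as plain text
csv_text = """Name,Marks,Grade,City
Arjun,85,B,Delhi
Priya,92,A,Mumbai
Kiran,78,C,Delhi
"""

# io.StringIO lets read_csv treat the string as if it were a file
df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(df.dtypes)
```

Pandas uses the first line as column names and works out on its own that Marks holds whole numbers, so it stores that column as integers.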


Part 1 — Basic CSV Reading

The Standard Way

python

# Program to read a CSV file and display its contents

import pandas as pd

df = pd.read_csv("data.csv")

print("First 5 rows:")
print(df.head())

Before running: Save your CSV file in the same folder as your Jupyter Notebook. If the file is elsewhere, you must provide the full path: pd.read_csv("C:/Users/Arjun/Documents/data.csv").
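If you do not have a data.csv yet, the sketch below creates a tiny sample file in the current folder so the programs in this tutorial have something to read. The rows are the made-up example from above, so replace them with your own data. The script only writes the file if data.csv does not already exist, so it will not overwrite your work.

```python
from pathlib import Path
import pandas as pd

csv_path = Path("data.csv")

if csv_path.exists():
    print("data.csv already exists - leaving it unchanged")
else:
    # Illustrative sample data (the same three rows used earlier)
    sample = pd.DataFrame({
        "Name": ["Arjun", "Priya", "Kiran"],
        "Marks": [85, 92, 78],
        "Grade": ["B", "A", "C"],
        "City": ["Delhi", "Mumbai", "Delhi"],
    })
    sample.to_csv(csv_path, index=False)   # index=False: no extra index column
    print("Created", csv_path.resolve())

df = pd.read_csv(csv_path)
print(df.head())
```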


Read and Display 10 Rows ✅ CBSE Class 10 Suggested Program

python

# Program to read the csv file saved in your system and display 10 rows

import pandas as pd

df = pd.read_csv("data.csv")
print(df.head(10))

Expected Output: The first 10 rows of your CSV displayed as a formatted table with column headers and row index numbers on the left.
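head(10) asks for up to 10 rows. If the file has fewer rows, Pandas simply returns all of them. Here is a quick sketch with a made-up 12-row table that shows head() alongside its partner tail():

```python
import pandas as pd

# Hypothetical data: 12 roll numbers with made-up marks
df = pd.DataFrame({
    "Roll": range(1, 13),
    "Marks": [70 + i for i in range(12)],
})

first_ten = df.head(10)   # index labels 0 to 9
last_three = df.tail(3)   # index labels 9 to 11

print(first_ten)
print(last_three)

# Asking for more rows than exist is not an error
print(len(df.head(50)))   # prints 12
```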


Read and Display Information ✅ CBSE Class 10 Suggested Program

python

# Program to read csv file saved in your system and display its information

import pandas as pd

df = pd.read_csv("data.csv")

print("Shape:", df.shape)
print("\nColumn Names:", df.columns.tolist())
print("\nDataset Info:")
df.info()   # info() prints its report itself and returns None, so print() would add an extra "None"
print("\nStatistical Summary:")
print(df.describe())

Expected Output (example for a 50-row, 4-column dataset):

Shape: (50, 4)

Column Names: ['Name', 'Marks', 'Grade', 'City']

Dataset Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Name    50 non-null     object
 1   Marks   47 non-null     float64
 2   Grade   50 non-null     object
 3   City    50 non-null     object
dtypes: float64(1), object(3)

Statistical Summary:
           Marks
count  47.000000
mean   78.340426
...

What df.info() reveals that df.describe() doesn't: the Non-Null Count column shows missing data for every column, including text columns, while df.describe() summarises only numeric columns by default. In the example above, Marks shows 47 non-null out of 50 rows, meaning 3 values are missing.
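To get this count in code rather than reading it off the info() table, use isnull().sum(), which gives the number of missing cells per column. The sketch below uses a made-up file in which two Marks cells are blank:

```python
import io
import pandas as pd

# Made-up data: Priya's and Kiran's marks are blank
csv_text = """Name,Marks,City
Arjun,85,Delhi
Priya,,Mumbai
Kiran,,Delhi
Meera,90,Pune
"""
df = pd.read_csv(io.StringIO(csv_text))

missing = df.isnull().sum()
print(missing)                # Marks: 2, other columns: 0
print(df["Marks"].count())    # non-null values in Marks: 2
print(df["Marks"].dtype)      # float64, because NaN is a float value
```

This also explains why Marks shows float64 in the info() output above. As soon as a numeric column has a blank, Pandas stores it as floats so that it can hold NaN.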


Part 2 — Useful read_csv() Options

pd.read_csv() has many parameters. These are the ones you will actually use in CBSE practicals:

Specifying a Different Separator

Some CSV files use semicolons or tabs instead of commas:

python

# Reading a semicolon-separated file
df = pd.read_csv("data.csv", sep=";")

# Reading a tab-separated file (.tsv)
df = pd.read_csv("data.tsv", sep="\t")
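When the separator is wrong, a telltale sign is that each whole line lands in a single column. This sketch reads the same made-up semicolon text twice to show the difference:

```python
import io
import pandas as pd

# Made-up semicolon-separated data
csv_text = "Name;Marks;City\nArjun;85;Delhi\nPriya;92;Mumbai\n"

wrong = pd.read_csv(io.StringIO(csv_text))            # default sep=","
right = pd.read_csv(io.StringIO(csv_text), sep=";")

print(wrong.columns.tolist())   # ['Name;Marks;City'] - everything in one column
print(right.columns.tolist())   # ['Name', 'Marks', 'City']
```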

Reading Only Specific Columns

python

# Read only Name and Marks columns — ignore the rest
df = pd.read_csv("data.csv", usecols=["Name", "Marks"])
print(df.head())
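One detail that often surprises students: usecols only chooses which columns to keep. It does not reorder them. Here is a small sketch on inline sample data:

```python
import io
import pandas as pd

csv_text = """Name,Marks,Grade,City
Arjun,85,B,Delhi
Priya,92,A,Mumbai
"""

df = pd.read_csv(io.StringIO(csv_text), usecols=["Name", "Marks"])
print(df.columns.tolist())    # ['Name', 'Marks']

# Columns come back in file order, even if usecols lists them differently
df2 = pd.read_csv(io.StringIO(csv_text), usecols=["Marks", "Name"])
print(df2.columns.tolist())   # ['Name', 'Marks']
```

If you need a different order, select the columns afterwards, for example df2[["Marks", "Name"]].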

Skipping Rows at the Top

python

# Skip the first 2 lines of the file (useful when a title or notes sit above the real header row)
df = pd.read_csv("data.csv", skiprows=2)
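Here is a sketch of the kind of file this option is meant for. The two title lines in this made-up export come before the real header, so skipping them lets Pandas find Name,Marks as the column names:

```python
import io
import pandas as pd

# Made-up export with two title lines above the real header
csv_text = """School Marks Report
Generated for Class 10
Name,Marks
Arjun,85
Priya,92
"""

df = pd.read_csv(io.StringIO(csv_text), skiprows=2)
print(df.columns.tolist())   # ['Name', 'Marks']
print(len(df))               # 2
```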

Setting a Column as the Index

python

# Use the Name column as the row label instead of 0, 1, 2...
df = pd.read_csv("data.csv", index_col="Name")
print(df.head())
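Once Name is the index, you can look rows up by name with .loc. A minimal sketch on inline sample data:

```python
import io
import pandas as pd

csv_text = """Name,Marks,City
Arjun,85,Delhi
Priya,92,Mumbai
"""

df = pd.read_csv(io.StringIO(csv_text), index_col="Name")

# Rows can now be looked up by name with .loc
print(df.loc["Priya", "Marks"])   # 92
print(df.columns.tolist())        # ['Marks', 'City'] - Name is now the index
```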

Handling Encoding Issues

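CSV files containing Hindi or other Indian-language text, or special characters, can raise a UnicodeDecodeError when they were saved in an encoding different from the one Pandas expects (UTF-8 by default):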

python

# Try utf-8 first (standard)
df = pd.read_csv("data.csv", encoding="utf-8")

# If that fails, try latin-1 (it never raises an error, but Hindi and other non-Latin text may come out garbled)
df = pd.read_csv("data.csv", encoding="latin-1")
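Instead of editing the line by hand each time, you could wrap the "try utf-8, then latin-1" idea in a small function. The name read_csv_with_fallback is made up for this tutorial and is not part of Pandas. The demo file is created inside a temporary folder, so nothing in your working folder changes.

```python
import tempfile
from pathlib import Path
import pandas as pd

def read_csv_with_fallback(path, encodings=("utf-8", "latin-1")):
    """Hypothetical helper: try each encoding in order, return (DataFrame, encoding used)."""
    last_error = None
    for enc in encodings:
        try:
            return pd.read_csv(path, encoding=enc), enc
        except UnicodeDecodeError as err:
            last_error = err   # remember the error and try the next encoding
    raise last_error

# Demo: a file saved in latin-1, where "é" is the single byte 0xE9 (not valid UTF-8)
with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "names.csv"
    path.write_bytes("Name,City\nJosé,Delhi\n".encode("latin-1"))
    df, used = read_csv_with_fallback(path)

print("Encoding used:", used)   # latin-1
print(df)
```

Because latin-1 accepts every byte, it always goes last in the list. If it came first, it would "succeed" on UTF-8 files too and silently garble any Hindi text.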

Part 3 — Exploring the Data After Reading

Once you have read the CSV, always explore it before doing anything else. This sequence is a common first step in almost any data science workflow:

python

# Program: Complete CSV exploration workflow

import pandas as pd

df = pd.read_csv("rainfall.csv")

# Step 1: Check shape
print("Rows, Columns:", df.shape)

# Step 2: Preview data
print("\nFirst 5 rows:")
print(df.head())

# Step 3: Check data types
print("\nData types:")
print(df.dtypes)

# Step 4: Missing values
print("\nMissing values per column:")
print(df.isnull().sum())

# Step 5: Basic statistics
print("\nStatistical summary:")
print(df.describe())

# Step 6: Unique values in a column (useful for categories)
# print(df["Grade"].unique())
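The steps above only look at the data. To filter and extract rows, you combine a condition with square brackets or .loc. Here is a minimal sketch on made-up marks data:

```python
import io
import pandas as pd

csv_text = """Name,Marks,Grade,City
Arjun,85,B,Delhi
Priya,92,A,Mumbai
Kiran,78,C,Delhi
Meera,88,B,Pune
"""
df = pd.read_csv(io.StringIO(csv_text))

# Rows where Marks is above 80
high_scorers = df[df["Marks"] > 80]

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
delhi_high = df[(df["City"] == "Delhi") & (df["Marks"] > 80)]

# .loc picks rows by a condition AND the columns you choose
names_in_delhi = df.loc[df["City"] == "Delhi", ["Name", "Marks"]]

print(high_scorers)
print(delhi_high)
print(names_in_delhi)
print(df["Grade"].unique())        # distinct grades, in order of first appearance
print(df["City"].value_counts())   # how many students per city
```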

Part 4 — Reading, Cleaning, and Saving

This is the complete workflow for Class 11 Unit 5 (Data Literacy — Data Pre-processing) and Class 12 Unit 1:

python

# Program to read a CSV, perform statistical analysis,
# check and fill missing values, then save cleaned data

import pandas as pd

# Step 1: Read
df = pd.read_csv("data.csv")
print("Original shape:", df.shape)
print("Missing values:\n", df.isnull().sum())

# Step 2: Statistical analysis
print("\nMean of numeric columns:")
print(df.mean(numeric_only=True))

# Step 3: Fill missing numeric values with column mean
df.fillna(df.mean(numeric_only=True), inplace=True)

# Step 4: Verify
print("\nMissing values after cleaning:")
print(df.isnull().sum())

# Step 5: Save cleaned data
df.to_csv("data_cleaned.csv", index=False)
print("\nCleaned file saved as data_cleaned.csv")
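Note that df.mean(numeric_only=True) only covers number columns, so a blank text value (such as a missing City) is still missing after Step 3. One common option is to fill text columns with their most frequent value (the mode). The sketch below shows this on made-up data. Whether the mean or the mode is a sensible fill depends on your dataset.

```python
import io
import pandas as pd

csv_text = """Name,Marks,City
Arjun,80,Delhi
Priya,,Mumbai
Kiran,90,
Meera,85,Delhi
"""
df = pd.read_csv(io.StringIO(csv_text))

# Numeric columns: fill with the column mean (same idea as Step 3 above)
df = df.fillna(df.mean(numeric_only=True))

# Text (non-numeric) columns: fill with the most frequent value
for col in df.select_dtypes(exclude="number").columns:
    df[col] = df[col].fillna(df[col].mode()[0])

print(df)
print(df.isnull().sum())   # all zeros
```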

Part 5 — Common Problems and Fixes

Problem: FileNotFoundError: [Errno 2] No such file or directory: 'data.csv'

This is the most common error. Cause: your CSV is not in the folder Python is currently looking in (the working directory, which in Jupyter is usually the notebook's folder).

Fix:

python

import os
print(os.getcwd())   # Shows which folder Jupyter is looking in

Move your CSV to that folder, or copy the full file path into read_csv().
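You can also make a program check for the file before reading it and print a helpful message instead of crashing. Here is a small sketch using pathlib from the standard library. The helper name load_csv and the file name are made up for illustration.

```python
from pathlib import Path
import pandas as pd

def load_csv(filename):
    """Hypothetical helper: return a DataFrame, or None (with hints) if the file is missing."""
    csv_path = Path(filename)
    if not csv_path.exists():
        print("Could not find:", csv_path)
        print("Python is looking in:", Path.cwd())
        print("CSV files in that folder:", sorted(p.name for p in Path.cwd().glob("*.csv")))
        return None
    return pd.read_csv(csv_path)

df = load_csv("missing_file.csv")   # placeholder name; use your own file
```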


Problem: Extra unnamed column appears (usually called Unnamed: 0)

Cause: The CSV was saved with index=True (the default), adding row numbers as an extra column.

Fix:

python

df = pd.read_csv("data.csv", index_col=0)   # Treats first column as index

Or, when saving: df.to_csv("data.csv", index=False)
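The sketch below reproduces the problem and both fixes. It saves a tiny DataFrame with and without the index and reads each version back. Calling to_csv() with no file name returns the CSV text as a string, which keeps the example free of files.

```python
import io
import pandas as pd

df = pd.DataFrame({"Name": ["Arjun", "Priya"], "Marks": [85, 92]})

with_index = df.to_csv()               # default index=True
without_index = df.to_csv(index=False)

print(with_index)   # first line is ",Name,Marks" - the index column has no name

bad = pd.read_csv(io.StringIO(with_index))
fixed = pd.read_csv(io.StringIO(with_index), index_col=0)
clean = pd.read_csv(io.StringIO(without_index))

print(bad.columns.tolist())     # ['Unnamed: 0', 'Name', 'Marks']
print(fixed.columns.tolist())   # ['Name', 'Marks']
print(clean.columns.tolist())   # ['Name', 'Marks']
```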


Problem: Numbers reading as text (object dtype instead of int/float)

python

# Check: df["Marks"].dtype → shows 'object' instead of 'int64'

# Fix: convert after reading
df["Marks"] = pd.to_numeric(df["Marks"], errors="coerce")

errors="coerce" converts non-numeric values to NaN instead of crashing.
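A typical cause in marks files is a text entry such as "AB" for absent. The sketch below uses made-up data to show the column before and after the fix:

```python
import io
import pandas as pd

# Made-up data where one student was marked "AB" (absent)
csv_text = """Name,Marks
Arjun,85
Priya,AB
Kiran,78
"""
df = pd.read_csv(io.StringIO(csv_text))

was_numeric = pd.api.types.is_numeric_dtype(df["Marks"])
print("Numeric before fix?", was_numeric)   # False: "AB" forces the column to text

df["Marks"] = pd.to_numeric(df["Marks"], errors="coerce")
print(df["Marks"].dtype)            # float64
print(df["Marks"].isnull().sum())   # 1 - "AB" became NaN
print(df["Marks"].mean())           # 81.5 - NaN is skipped
```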


Quick Revision Box

Function / Parameter                | What It Does
pd.read_csv("file.csv")             | Reads a CSV file into a DataFrame
df.head(n)                          | Shows first n rows (default 5)
df.tail(n)                          | Shows last n rows (default 5)
df.info()                           | Shows column names, types, and non-null counts
df.describe()                       | Statistical summary of numeric columns
df.shape                            | Returns (rows, columns) as a tuple
df.isnull().sum()                   | Counts missing values per column
df.fillna(value)                    | Replaces missing values
df.to_csv("file.csv", index=False)  | Saves DataFrame to CSV without the index column
sep=","                             | Separator character (default comma)
usecols=["col1","col2"]             | Read only specified columns
encoding="utf-8"                    | Character encoding for special characters

Practice Questions

Q1 (2 marks): Write Python code to read a CSV file called marks.csv and display its shape, column names, and the number of missing values in each column.

Model Answer:

python

import pandas as pd
df = pd.read_csv("marks.csv")
print("Shape:", df.shape)
print("Columns:", df.columns.tolist())
print("Missing values:\n", df.isnull().sum())

Q2 (MCQ): Which parameter in pd.read_csv() prevents an extra index column from appearing when the file was saved without index=False?

a) header=False b) index_col=0 c) skiprows=1 d) usecols=None

Answer: b) index_col=0 — treats the first column as the DataFrame index instead of loading it as a data column.


Frequently Asked Questions

Q1: What CSV file should I use for CBSE AI practice programs? The Class 11 AI syllabus specifically references rainfall.csv for Unit 5 programs. Ask your teacher for this file. For additional practice, download datasets from: Kaggle (kaggle.com/datasets), data.gov.in (Indian government open data), or the CBSE-linked AI activities spreadsheet mentioned in the Class 10 syllabus. Any properly formatted CSV works — the programs don’t depend on specific data.

Q2: My CSV file has Indian characters (Hindi names, etc.) and shows garbled text. How do I fix it? This is an encoding problem. Try these one at a time, moving to the next only if the previous one fails or the text still looks wrong:

python

df = pd.read_csv("data.csv", encoding="utf-8")        # Try first
df = pd.read_csv("data.csv", encoding="utf-8-sig")    # If saved from Excel as "CSV UTF-8"
df = pd.read_csv("data.csv", encoding="latin-1")      # Last resort: never errors, but Hindi text will not display correctly
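Why is latin-1 only a last resort? It accepts every possible byte, so it never raises an error, but it cannot represent Hindi. The sketch below encodes a Hindi name as UTF-8 bytes and reads it both ways. io.BytesIO stands in for a real file.

```python
import io
import pandas as pd

# A Hindi name saved as UTF-8 bytes, the way most modern tools save Hindi text
raw = "Name,City\nराम,Delhi\n".encode("utf-8")

good = pd.read_csv(io.BytesIO(raw), encoding="utf-8")
garbled = pd.read_csv(io.BytesIO(raw), encoding="latin-1")

print(good.loc[0, "Name"])      # राम
print(garbled.loc[0, "Name"])   # unreadable symbols, but no error was raised
```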

Q3: Can I read an Excel file instead of CSV using Pandas? Yes: df = pd.read_excel("file.xlsx"). You need the openpyxl library: pip install openpyxl. However, for CBSE AI practicals, CSV is the standard format — use CSV files to stay aligned with the syllabus.