Pandas Tutorial for Class 11 AI Students (CBSE 2025-26)

If NumPy is for working with numbers, Pandas is for working with data. The moment you have a table — student records, weather readings, sales data — Pandas is what you reach for. This tutorial covers everything Class 11 needs: creating DataFrames, reading CSV files, cleaning data, and analysing it, with every program ready for your practical file.

It is aligned with Unit 3: Python Programming (Level 2) and supports Unit 5: Data Literacy – Data Pre-processing of the CBSE AI Class 11 syllabus (Subject Code 843, 2025-26). The same programs also map directly to the Class 12 Unit 1: Python Programming – II sample programs.

What You’ll Learn

What a DataFrame is and how Pandas thinks about data
How to create DataFrames from dictionaries, lists, and CSV files
Essential operations: exploring, filtering, sorting, grouping data
Handling missing values — the most important data cleaning skill
Exporting cleaned data back to CSV
All practical file programs with code and expected output

What Is Pandas?

Pandas is a Python library for working with structured, tabular data — data that has rows and columns, like a spreadsheet or a database table.

Think of it this way: NumPy is excellent at fast maths on arrays of numbers. But real datasets have mixed types — names (text), marks (numbers), dates, categories. Pandas handles all of that in a single structure called a DataFrame.

For example, an analyst studying India’s agriculture sector might load district-level crop production data from CSV files published on data.gov.in, fill in missing entries for districts that didn’t report, and compute state-wise averages. That workflow uses exactly the functions you will practise here: read_csv(), isnull(), fillna(), and groupby().

python

import pandas as pd    # Standard alias — always use this

Part 1 — Creating DataFrames

A DataFrame is Pandas’ core data structure. Think of it as a table with labelled rows (index) and labelled columns (column names).
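A quick way to see both sets of labels: the sketch below passes a custom index (the roll numbers "R1" to "R3" are made up for illustration), prints the row and column labels, and looks up one cell by its row label and column name with `df.loc`.

```python
import pandas as pd

# Hypothetical roll numbers used as row labels (the index)
df = pd.DataFrame(
    {"Name": ["Arjun", "Priya", "Kiran"], "Marks": [85, 92, 78]},
    index=["R1", "R2", "R3"],
)

print(df)
print("Row labels   :", df.index.tolist())    # ['R1', 'R2', 'R3']
print("Column labels:", df.columns.tolist())  # ['Name', 'Marks']
print(df.dtypes)                              # each column keeps its own data type

# Look up one value using a row label and a column label
print("Marks of R2:", df.loc["R2", "Marks"])  # 92
```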

From a Dictionary

The most common way to create a DataFrame for practice programs:

python

# Program to create a Pandas DataFrame from a dictionary of lists
# (each list is a sequence holding one column's values) and perform basic display operations

import pandas as pd

data = {
    "Name"      : ["Arjun", "Priya", "Kiran", "Meena", "Rohan", "Sneha"],
    "Marks"     : [85, 92, 78, 95, 70, 88],
    "Grade"     : ["B", "A", "C", "A", "D", "B"],
    "Attendance": [88, 95, 72, 98, 65, 91]
}

df = pd.DataFrame(data)

# a) Display the full DataFrame
print("Full DataFrame:")
print(df)

# b) Display first 5 records
print("\nFirst 5 records:")
print(df.head(5))

# c) Display last 10 records (only 6 rows exist, so all are shown)
print("\nLast 10 records:")
print(df.tail(10))

# d) Display the number of missing values
print("\nMissing values in each column:")
print(df.isnull().sum())

Expected Output:

Full DataFrame:
    Name  Marks Grade  Attendance
0  Arjun     85     B          88
1  Priya     92     A          95
2  Kiran     78     C          72
3  Meena     95     A          98
4  Rohan     70     D          65
5  Sneha     88     B          91

First 5 records:
    Name  Marks Grade  Attendance
0  Arjun     85     B          88
1  Priya     92     A          95
2  Kiran     78     C          72
3  Meena     95     A          98
4  Rohan     70     D          65

Last 10 records:
    Name  Marks Grade  Attendance
0  Arjun     85     B          88
...

Missing values in each column:
Name          0
Marks         0
Grade         0
Attendance    0
dtype: int64

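📌 Class 12 note: This program maps to the Class 12 (Subject Code 843, 2025-26) Unit 1 sample program: “Write Python code to create a Pandas DataFrame using any sequence data type”. Strictly speaking, a dictionary is a mapping, not a sequence; the sequences here are the lists that hold each column's values. The list-of-lists program below uses sequences throughout.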

From a List of Lists

python

# Program to create a DataFrame from a list of lists

import pandas as pd

rows = [
    ["Aarav",  82, "B"],
    ["Diya",   91, "A"],
    ["Ishaan", 74, "C"],
    ["Kavya",  96, "A"]
]

df = pd.DataFrame(rows, columns=["Name", "Marks", "Grade"])
print(df)

Expected Output:

     Name  Marks Grade
0   Aarav     82     B
1    Diya     91     A
2  Ishaan     74     C
3   Kavya     96     A

Part 2 — Exploring a DataFrame

Before analysing any dataset, you explore it first. The attributes and methods used below (shape, columns, dtypes, head() and describe()) are the usual starting sequence in a data science project.

python

# Program to explore a DataFrame using standard methods

import pandas as pd

data = {
    "City"       : ["Mumbai", "Delhi", "Bengaluru", "Chennai", "Kolkata"],
    "Population" : [20667656, 32941309, 13193000, 10971108, 14850066],
    "Area_km2"   : [603, 1484, 741, 426, 205],
    "Literacy_%" : [89.2, 86.3, 87.7, 90.2, 87.1]
}

df = pd.DataFrame(data)

print("Shape (rows, columns):", df.shape)
print("\nColumn Names:", df.columns.tolist())
print("\nData Types:\n", df.dtypes)
print("\nFirst 3 rows:\n", df.head(3))
print("\nStatistical Summary:\n", df.describe())

Expected Output:

Shape (rows, columns): (5, 4)

Column Names: ['City', 'Population', 'Area_km2', 'Literacy_%']

Data Types:
 City           object
Population      int64
Area_km2        int64
Literacy_%    float64
dtype: object

First 3 rows:
        City  Population  Area_km2  Literacy_%
0     Mumbai    20667656       603        89.2
1      Delhi    32941309      1484        86.3
2  Bengaluru    13193000       741        87.7

Statistical Summary:
         Population      Area_km2  Literacy_%
count  5.000000e+00      5.000000    5.000000
mean   1.852463e+07    691.800000   88.100000
...

The exploration sequence — memorise this for Viva:

Attribute / Method	What It Tells You
`df.shape`	Number of rows and columns
`df.columns`	Column names
`df.dtypes`	Data type of each column
`df.head(n)`	First n rows (default 5)
`df.tail(n)`	Last n rows (default 5)
`df.info()`	Column types + non-null counts + memory
`df.describe()`	Count, mean, std, min, quartiles (25%, 50%, 75%) and max for numeric columns
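The program above does not call `df.info()` or `df.tail()`. A minimal sketch using the same city data shows both. Note that `info()` prints its report itself and returns `None`, so it does not need to be wrapped in `print()`.

```python
import pandas as pd

data = {
    "City"       : ["Mumbai", "Delhi", "Bengaluru", "Chennai", "Kolkata"],
    "Population" : [20667656, 32941309, 13193000, 10971108, 14850066],
    "Area_km2"   : [603, 1484, 741, 426, 205],
    "Literacy_%" : [89.2, 86.3, 87.7, 90.2, 87.1]
}
df = pd.DataFrame(data)

# info() prints column names, non-null counts, data types and memory usage
df.info()

# tail(2) returns the last two rows (Chennai and Kolkata)
last_two = df.tail(2)
print(last_two)
```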

Part 3 — Reading and Writing CSV Files

CSV (Comma-Separated Values) is one of the most common formats for sharing tabular data, and many real-world AI projects begin by loading a CSV file.

Reading a CSV File

python

# Program to read a CSV file and perform statistical analysis
# (Download dataset from Kaggle, data.gov.in, or use rainfall.csv from CBSE)

import pandas as pd

# Read the CSV
df = pd.read_csv("rainfall.csv")    # Replace with your filename

# a) Basic exploration
print("Shape:", df.shape)
print("\nFirst 5 rows:")
print(df.head())

# b) Statistical summary
print("\nStatistical Summary:")
print(df.describe())

# c) Check for missing values
print("\nMissing Values:")
print(df.isnull().sum())

# d) Column-wise statistics
print("\nMean of each numeric column:")
print(df.mean(numeric_only=True))

Which file to use: The CBSE Class 11 Unit 5 syllabus specifically mentions rainfall.csv for data literacy programs. Ask your teacher for this file, or download Indian rainfall data from data.gov.in or IMD (India Meteorological Department).
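If rainfall.csv is not available yet, you can still test the program. The sketch below holds a tiny CSV in a string and passes it to `pd.read_csv()` through `io.StringIO`, which makes a string behave like an open file. The district names and rainfall figures are invented for practice and are not real IMD data. Once you have the real file, replace `io.StringIO(csv_text)` with `"rainfall.csv"`.

```python
import io
import pandas as pd

# Invented practice data (not real IMD figures); District_A has no July value
csv_text = """District,June_mm,July_mm
District_A,180,
District_B,170,320
District_C,150,290
"""

# io.StringIO lets read_csv() treat the string as if it were a file
df = pd.read_csv(io.StringIO(csv_text))

print("Shape:", df.shape)            # (3, 3)
print(df.isnull().sum())             # July_mm has 1 missing value
print(df.mean(numeric_only=True))    # June_mm about 166.67, July_mm 305.0
```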

Writing a DataFrame to CSV

python

# Program to export a DataFrame to a CSV file

import pandas as pd

data = {
    "Student" : ["Arjun", "Priya", "Kiran"],
    "Score"   : [85, 92, 78]
}
df = pd.DataFrame(data)

# Save to CSV — index=False prevents adding an extra index column
df.to_csv("student_scores.csv", index=False)
print("File saved successfully as student_scores.csv")

# Read it back to verify
df_verify = pd.read_csv("student_scores.csv")
print(df_verify)

Expected Output:

File saved successfully as student_scores.csv
  Student  Score
0   Arjun     85
1   Priya     92
2   Kiran     78

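index=False explained: Without it, Pandas writes the row numbers (0, 1, 2…) as an extra, unnamed first column in your CSV. When you read the file back, that column appears as a separate column called "Unnamed: 0", next to the new index Pandas creates. Use index=False whenever the index is just the default row numbers.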
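You can see the extra column without creating any files. When `to_csv()` is called with no file name it returns the CSV text as a string, so this sketch writes the table both ways and reads each version back through `io.StringIO`.

```python
import io
import pandas as pd

df = pd.DataFrame({"Student": ["Arjun", "Priya", "Kiran"], "Score": [85, 92, 78]})

# to_csv() with no file name returns the CSV text instead of writing a file
with_index = df.to_csv()               # default: row numbers written as first column
without_index = df.to_csv(index=False)

print(pd.read_csv(io.StringIO(with_index)).columns.tolist())
# ['Unnamed: 0', 'Student', 'Score']
print(pd.read_csv(io.StringIO(without_index)).columns.tolist())
# ['Student', 'Score']
```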

Part 4 — Handling Missing Values

This is the most important data cleaning skill in AI. Real datasets are almost always incomplete — sensors fail, forms are left blank, values get corrupted. Before training any model, you must handle missing data.

python

# Program to detect and handle missing values in a DataFrame

import pandas as pd
import numpy as np

# Dataset with deliberate missing values (NaN = Not a Number)
data = {
    "Name"       : ["Arjun", "Priya", "Kiran", "Meena", "Rohan"],
    "Marks"      : [85, np.nan, 78, 95, np.nan],
    "Attendance" : [88, 95, np.nan, 98, 65],
    "Grade"      : ["B", "A", np.nan, "A", "D"]
}

df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)

print("\nMissing values per column:")
print(df.isnull().sum())

print("\nTotal missing values:", df.isnull().sum().sum())

# Strategy 1: Fill numeric missing values with the column mean
# (assign the result back; inplace=True on a single column is unreliable in recent pandas)
df["Marks"] = df["Marks"].fillna(df["Marks"].mean())

# Strategy 2: Fill numeric missing values with a specific value
df["Attendance"] = df["Attendance"].fillna(0)

# Strategy 3: Fill text/category column with a placeholder
df["Grade"] = df["Grade"].fillna("Unknown")

print("\nDataFrame after handling missing values:")
print(df)

print("\nMissing values after cleaning:")
print(df.isnull().sum())

Expected Output:

Original DataFrame:
    Name  Marks  Attendance Grade
0  Arjun   85.0        88.0     B
1  Priya    NaN        95.0     A
2  Kiran   78.0         NaN   NaN
3  Meena   95.0        98.0     A
4  Rohan    NaN        65.0     D

Missing values per column:
Name          0
Marks         2
Attendance    1
Grade         1
dtype: int64

Total missing values: 4

DataFrame after handling missing values:
    Name  Marks  Attendance    Grade
0  Arjun   85.0        88.0        B
1  Priya   86.0        95.0        A
2  Kiran   78.0         0.0  Unknown
3  Meena   95.0        98.0        A
4  Rohan   86.0        65.0        D

Missing values after cleaning:
Name          0
Marks         0
Attendance    0
Grade         0
dtype: int64

Four strategies for missing values:

Strategy	When to Use	Code
Fill with mean	Numeric column, data is roughly symmetric	`df["col"] = df["col"].fillna(df["col"].mean())`
Fill with median	Numeric column with outliers	`df["col"] = df["col"].fillna(df["col"].median())`
Fill with placeholder	Text/category column	`df["col"] = df["col"].fillna("Unknown")`
Drop rows	Very few rows missing, can afford to lose them	`df.dropna(inplace=True)`
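Why prefer the median when a column has outliers? In this sketch (the rainfall numbers are invented), one extreme storm day drags the mean far above a typical reading, while the median stays close to the usual values. The gap is then filled with the median using the assignment form `df["col"] = df["col"].fillna(...)`.

```python
import numpy as np
import pandas as pd

# Invented daily rainfall (mm): one missing reading and one extreme storm day
df = pd.DataFrame({"Rain_mm": [10, 12, 11, np.nan, 300]})

mean_value = df["Rain_mm"].mean()      # (10 + 12 + 11 + 300) / 4 = 83.25
median_value = df["Rain_mm"].median()  # middle of 10, 11, 12, 300 = 11.5

print("Mean  :", mean_value)
print("Median:", median_value)

# Fill the gap with the median, which is closer to a typical day
df["Rain_mm"] = df["Rain_mm"].fillna(median_value)
print(df)
```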

Part 5 — Filtering, Sorting, and Selecting Data

python

# Program to filter, sort, and select data from a DataFrame

import pandas as pd

data = {
    "Name"   : ["Arjun", "Priya", "Kiran", "Meena", "Rohan", "Sneha", "Dev"],
    "Marks"  : [85, 92, 78, 95, 70, 88, 63],
    "Grade"  : ["B", "A", "C", "A", "D", "B", "D"],
    "City"   : ["Delhi", "Mumbai", "Delhi", "Chennai", "Mumbai", "Delhi", "Chennai"]
}

df = pd.DataFrame(data)

# Filter: students scoring above 80
high_scorers = df[df["Marks"] > 80]
print("Students scoring above 80:")
print(high_scorers)

# Filter with multiple conditions: marks > 75 AND city is Delhi
delhi_toppers = df[(df["Marks"] > 75) & (df["City"] == "Delhi")]
print("\nDelhi students scoring above 75:")
print(delhi_toppers)

# Select specific columns
name_marks = df[["Name", "Marks"]]
print("\nName and Marks only:")
print(name_marks)

# Sort by Marks descending
sorted_df = df.sort_values("Marks", ascending=False)
print("\nSorted by Marks (highest first):")
print(sorted_df)

Expected Output:

Students scoring above 80:
    Name  Marks Grade     City
0  Arjun     85     B    Delhi
1  Priya     92     A   Mumbai
3  Meena     95     A  Chennai
5  Sneha     88     B    Delhi

Delhi students scoring above 75:
    Name  Marks Grade   City
0  Arjun     85     B  Delhi
2  Kiran     78     C  Delhi
5  Sneha     88     B  Delhi

Name and Marks only:
    Name  Marks
0  Arjun     85
...

Sorted by Marks (highest first):
    Name  Marks Grade     City
3  Meena     95     A  Chennai
1  Priya     92     A   Mumbai
...
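Filtering works in two steps: the condition produces a Series of True/False values (a boolean mask), and `df[mask]` keeps only the rows marked True. The sketch below, on a smaller made-up table, prints the mask and also shows `|` (OR) and `~` (NOT), which combine conditions the same way `&` does. Each condition needs its own brackets because these operators bind more tightly than `>` and `==`.

```python
import pandas as pd

df = pd.DataFrame({
    "Name"  : ["Arjun", "Priya", "Kiran", "Rohan"],
    "Marks" : [85, 92, 78, 70],
    "City"  : ["Delhi", "Mumbai", "Delhi", "Mumbai"]
})

# The condition alone gives one True/False value per row
mask = df["Marks"] > 80
print(mask.tolist())                   # [True, True, False, False]
print(df[mask]["Name"].tolist())       # ['Arjun', 'Priya']

# | means OR: scored above 90 OR lives in Delhi
either = df[(df["Marks"] > 90) | (df["City"] == "Delhi")]
print(either["Name"].tolist())         # ['Arjun', 'Priya', 'Kiran']

# ~ means NOT: everyone who is not from Mumbai
not_mumbai = df[~(df["City"] == "Mumbai")]
print(not_mumbai["Name"].tolist())     # ['Arjun', 'Kiran']
```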

Part 6 — Grouping and Aggregation

GroupBy is one of the most used operations in data analysis — it answers questions like “what is the average marks by city?” or “how many students per grade?”

python

# Program to demonstrate groupby and aggregation in Pandas

import pandas as pd

data = {
    "Name"  : ["Arjun","Priya","Kiran","Meena","Rohan","Sneha","Dev","Anita"],
    "Marks" : [85, 92, 78, 95, 70, 88, 63, 91],
    "Grade" : ["B","A","C","A","D","B","D","A"],
    "City"  : ["Delhi","Mumbai","Delhi","Chennai","Mumbai","Delhi","Chennai","Mumbai"]
}

df = pd.DataFrame(data)

# Average marks by Grade
print("Average marks by Grade:")
print(df.groupby("Grade")["Marks"].mean())

# Count of students by City
print("\nNumber of students per City:")
print(df.groupby("City")["Name"].count())

# Multiple aggregations at once
print("\nMarks summary by City:")
print(df.groupby("City")["Marks"].agg(["mean", "min", "max"]))

Expected Output:

Average marks by Grade:
Grade
A    92.666667
B    86.500000
C    78.000000
D    66.500000
Name: Marks, dtype: float64

Number of students per City:
City
Chennai    2
Delhi      3
Mumbai     3
Name: Name, dtype: int64

Marks summary by City:
              mean  min  max
City
Chennai  79.000000   63   95
Delhi    83.666667   78   88
Mumbai   84.333333   70   92
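The program counts students per city. The other question from the start of this part, “how many students per grade?”, follows the same pattern. Below is one way to answer it with `groupby().size()`, which counts the rows in each group; `df["Grade"].value_counts()` is a common alternative that sorts by count instead of by grade.

```python
import pandas as pd

df = pd.DataFrame({
    "Name"  : ["Arjun","Priya","Kiran","Meena","Rohan","Sneha","Dev","Anita"],
    "Grade" : ["B","A","C","A","D","B","D","A"]
})

# size() counts how many rows fall into each Grade group
per_grade = df.groupby("Grade").size()
print(per_grade)
# A: 3, B: 2, C: 1, D: 2
```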

Part 7 — Adding and Dropping Columns

python

# Program to add new columns and drop unwanted columns

import pandas as pd

data = {
    "Name"    : ["Arjun", "Priya", "Kiran", "Meena"],
    "Marks"   : [85, 92, 78, 95],
    "Max_Marks": [100, 100, 100, 100]
}

df = pd.DataFrame(data)

# Add a new column: Percentage
df["Percentage"] = (df["Marks"] / df["Max_Marks"]) * 100

# Add a new column based on condition: Pass/Fail
df["Result"] = df["Marks"].apply(lambda x: "Pass" if x >= 33 else "Fail")

print("DataFrame with new columns:")
print(df)

# Drop the Max_Marks column (no longer needed)
df.drop(columns=["Max_Marks"], inplace=True)

print("\nAfter dropping Max_Marks:")
print(df)

Expected Output:

DataFrame with new columns:
    Name  Marks  Max_Marks  Percentage Result
0  Arjun     85        100        85.0   Pass
1  Priya     92        100        92.0   Pass
2  Kiran     78        100        78.0   Pass
3  Meena     95        100        95.0   Pass

After dropping Max_Marks:
    Name  Marks  Percentage Result
0  Arjun     85        85.0   Pass
1  Priya     92        92.0   Pass
2  Kiran     78        78.0   Pass
3  Meena     95        95.0   Pass

lambda in one line: lambda x: "Pass" if x >= 33 else "Fail" is a small anonymous function that runs on each value in the Marks column. apply() passes each value through it and collects the results into a new Series, which becomes the Result column. It is a clean way to add a calculated category column.
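The same Pass/Fail column can be built in other ways. This sketch compares `apply()` with a lambda, `apply()` with a named function (easier to read when the rule grows), and NumPy's `np.where()`, which tests the whole column at once instead of calling a function for each value. The marks are made up, and all three give the same result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Marks": [85, 25, 78, 33]})

# 1) apply() with a lambda, as in the program above
df["Result_lambda"] = df["Marks"].apply(lambda x: "Pass" if x >= 33 else "Fail")

# 2) apply() with a named function
def result(marks):
    return "Pass" if marks >= 33 else "Fail"

df["Result_named"] = df["Marks"].apply(result)

# 3) np.where(condition, value_if_true, value_if_false) on the whole column
df["Result_where"] = np.where(df["Marks"] >= 33, "Pass", "Fail")

print(df)
```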

NumPy vs Pandas — Knowing Which to Use

Students often get confused about when to use NumPy and when to use Pandas. Here is the rule:

Use Case	NumPy	Pandas
Pure number crunching (arrays, matrices)	✅	Not needed
Tabular data (rows + columns, mixed types)	❌	✅
Statistical analysis on a single column	✅	✅
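Reading CSV files	Possible for numeric-only files (`np.loadtxt`), awkward for mixed types	✅
Filtering rows by condition	Possible but verbose	✅
Input to Scikit-learn	✅ (arrays)	✅ (DataFrames)
Matrix operations	✅	Possible (`df.dot`) but not its main purpose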

In practice, they work together: Pandas loads and cleans the data, NumPy does the maths underneath, Scikit-learn trains the model.
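A small sketch of that hand-off, left without Scikit-learn so it runs with just Pandas and NumPy installed: Pandas selects the numeric columns, and `to_numpy()` turns them into a NumPy array, the form many model-training functions accept. The student data is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name"      : ["Arjun", "Priya", "Kiran"],
    "Marks"     : [85, 92, 78],
    "Attendance": [88, 95, 72]
})

# Pandas: pick the numeric feature columns and convert them to an array
X = df[["Marks", "Attendance"]].to_numpy()

print(type(X))              # <class 'numpy.ndarray'>
print(X.shape)              # (3, 2)

# NumPy: column-wise mean of the array
print(np.mean(X, axis=0))   # [85. 85.]
```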

Quick Revision Box

Function	What It Does
`pd.DataFrame(data)`	Creates a DataFrame from a dictionary or list
`pd.read_csv("file.csv")`	Reads a CSV file into a DataFrame
`df.to_csv("file.csv", index=False)`	Saves DataFrame to CSV without extra index column
`df.shape`	Returns (rows, columns)
`df.head(n)`	First n rows
`df.tail(n)`	Last n rows
`df.info()`	Column types and non-null counts
`df.describe()`	Statistical summary of numeric columns
`df.isnull().sum()`	Count of missing values per column
`df.fillna(value, inplace=True)`	Fill missing values with given value
`df.dropna(inplace=True)`	Remove rows with any missing value
`df[df["col"] > value]`	Filter rows by condition
`df.sort_values("col")`	Sort by column (ascending by default)
`df.groupby("col")["num_col"].mean()`	Group rows and compute the mean of a numeric column per group
`df["new"] = ...`	Add a new column
`df.drop(columns=["col"])`	Remove a column
`df["col"].apply(func)`	Apply a function to every value in a column

Practice Questions

Q1 (2 marks): Write a Python program to create a Pandas DataFrame of 3 students with columns Name, Marks, and Grade. Display the first 2 rows and count the missing values.

Model Answer:

python

import pandas as pd
data = {
    "Name" : ["Arjun", "Priya", "Kiran"],
    "Marks": [85, 92, 78],
    "Grade": ["B", "A", "C"]
}
df = pd.DataFrame(data)
print(df.head(2))
print("Missing values:", df.isnull().sum())

Q2 (MCQ): Which Pandas method removes rows that contain missing values?

a) df.fillna() b) df.isnull() c) df.dropna() d) df.remove()

Answer: c) df.dropna() — removes all rows containing at least one missing (NaN) value.

Frequently Asked Questions

Q1: What is the difference between df.fillna() and df.dropna()? fillna() replaces missing values with something — a number, a string, or the column mean — so you keep all your rows. dropna() deletes any row that has at least one missing value. Use fillna() when you cannot afford to lose data (small datasets, important rows). Use dropna() when you have enough data and the missing rows are few and random.
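To see the trade-off in numbers, this sketch applies both methods to the Name, Marks and Attendance columns from Part 4: `fillna()` keeps every row, while `dropna()` keeps only the rows with no missing value at all. Both return new DataFrames and leave `df` unchanged.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name"       : ["Arjun", "Priya", "Kiran", "Meena", "Rohan"],
    "Marks"      : [85, np.nan, 78, 95, np.nan],
    "Attendance" : [88, 95, np.nan, 98, 65],
})

filled = df.fillna(0)     # new DataFrame, gaps replaced with 0
dropped = df.dropna()     # new DataFrame, rows with any gap removed

print("Rows after fillna:", len(filled))    # 5
print("Rows after dropna:", len(dropped))   # 2 (Arjun and Meena)
```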

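Q2: Why does inplace=True appear in so many Pandas operations? By default, Pandas operations return a new DataFrame and leave the original unchanged. inplace=True makes the method change the object you call it on instead (and return None). Without it, you must write df = df.fillna(0) to keep the change; with it, df.fillna(0, inplace=True) changes df directly. On a whole DataFrame both approaches work. On a single column, prefer assignment: write df["col"] = df["col"].fillna(0) rather than df["col"].fillna(0, inplace=True). Recent pandas versions warn about the in-place form on a selected column, and under Copy-on-Write (the default from pandas 3.0) it no longer updates df.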
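A minimal sketch comparing the reassignment form and the in-place form on a whole DataFrame, plus the column-assignment form that is the safer choice for a single column. The marks are made up for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"Marks": [85, np.nan, 78]})
df2 = df1.copy()

# Form 1: reassign the DataFrame that fillna() returns
df1 = df1.fillna(0)

# Form 2: change df2 itself (fillna returns None here)
df2.fillna(0, inplace=True)

# For one column, assign the filled Series back to that column
df3 = pd.DataFrame({"Marks": [85, np.nan, 78]})
df3["Marks"] = df3["Marks"].fillna(df3["Marks"].mean())

print(df1["Marks"].tolist())  # [85.0, 0.0, 78.0]
print(df2["Marks"].tolist())  # [85.0, 0.0, 78.0]
print(df3["Marks"].tolist())  # [85.0, 81.5, 78.0]
```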

Q3: What CSV file should I use for the Unit 5 practical programs? The CBSE Class 11 AI syllabus specifically references rainfall.csv for Unit 5 programs. Ask your teacher for this file. Alternatively, download Indian district-level rainfall data from the IMD website or data.gov.in. Any properly formatted CSV works — the programs are not tied to a specific dataset.