Every Class 12 AI student needs a practical file. Most students don’t know exactly what it must contain until a week before submission — and then they scramble.
This guide tells you exactly what CBSE requires in your Class 12 AI practical file for 2025-26, gives you all the Python programs you need, and walks you through the Orange Data Mining activities and the Data Story. Follow this guide and your practical file is complete.
What You’ll Learn
- The exact CBSE requirements for your Class 12 AI practical file (from the official 2025-26 curriculum)
- 6 complete Python programs with code, explanation, and expected output
- Step-by-step instructions for all 3 required Orange Data Mining activities
- How to write a complete Data Story — the one activity most students underestimate
- What optional programs to add if you want to score higher in the Lab Test
What Must Your Practical File Contain? (CBSE 2025-26 Official Requirements)
According to the CBSE Class 12 AI curriculum (Subject Code 843, 2025-26), your practical file must include:
- Minimum 6 Python programs
- Minimum 3 programs using Orange Data Mining tool (with screenshots)
- Minimum 1 Data Story — using all steps of Data Storytelling
The practical file is worth 10 marks in Part C. The Lab Test (which tests the same skills) is worth another 10 marks. Together, the practical file and lab test account for 20 marks out of 50 in Part C — a significant chunk of your total score.
Important: All Orange Data Mining activities require screenshots (snapshots) pasted directly into the file showing each step and output.
Practical File Marks Breakdown (Part C)
| Component | Marks |
|---|---|
| Capstone Project + Documentation + Video | 25 |
| Practical File | 10 |
| Lab Test (Python and Orange Data Mining) | 10 |
| Viva Voce | 5 |
| Total (Part C) | 50 |
Section 1: Python Programs (Minimum 6 Required)
Program 1: Create a Pandas DataFrame and Display Basic Information
Objective: Create a Pandas DataFrame from a sequence data type and display its contents, first 5 records, last 10 records, and the number of missing values.
# Program 1: Create and explore a Pandas DataFrame
import pandas as pd
import numpy as np
# Create a DataFrame from a dictionary (sequence data type)
data = {
'Student_Name': ['Arjun', 'Priya', 'Rohan', 'Sneha', 'Vikram',
'Ananya', 'Karan', 'Divya', 'Rahul', 'Meera',
'Suresh', 'Kavya'],
'Marks_AI': [85, 92, 78, np.nan, 88, 95, 72, np.nan, 81, 90, 76, 89],
'Marks_Math': [78, 88, 82, 91, 75, np.nan, 85, 79, 92, 84, 77, 93],
'Grade': ['A', 'A+', 'B+', 'A', 'A', 'A+', 'B', 'B+', 'A', 'A', 'B+', 'A']
}
df = pd.DataFrame(data)
print("=== Complete DataFrame ===")
print(df)
print("\n=== First 5 Records ===")
print(df.head(5))
print("\n=== Last 10 Records ===")
print(df.tail(10))
print("\n=== Number of Missing Values in Each Column ===")
print(df.isnull().sum())
Expected Output:
=== Complete DataFrame ===
Student_Name Marks_AI Marks_Math Grade
0 Arjun 85.0 78.0 A
1 Priya 92.0 88.0 A+
...
=== First 5 Records ===
Student_Name Marks_AI Marks_Math Grade
0 Arjun 85.0 78.0 A
...
=== Number of Missing Values in Each Column ===
Student_Name 0
Marks_AI 2
Marks_Math 1
Grade 0
Explanation: pd.DataFrame() creates the table from a dictionary. head(5) returns the first 5 rows. tail(10) returns the last 10. isnull().sum() counts the missing values (NaN) in each column.
Program 2: Read a CSV File and Handle Missing Values
Objective: Read a CSV file into a DataFrame, check for missing values, and fill them using appropriate methods.
# Program 2: Read CSV file and handle missing values
import pandas as pd
# Read CSV file (download any dataset from Kaggle or use a dataset provided by teacher)
# For this example, we create a sample CSV scenario
data = {
'City': ['Mumbai', 'Delhi', 'Bangalore', 'Chennai', 'Kolkata', 'Pune', 'Hyderabad'],
'Temperature': [32.5, 28.0, None, 35.2, 30.1, None, 33.8],
'Humidity': [85, 70, 75, None, 88, 65, None],
'AQI': [142, 189, 95, 112, 165, 88, 134]
}
# Save to CSV and read back (simulates reading an external CSV file)
df = pd.DataFrame(data)
df.to_csv('city_data.csv', index=False)
df = pd.read_csv('city_data.csv')
print("=== Original DataFrame (with missing values) ===")
print(df)
print("\n=== Data Types and Info ===")
df.info()  # info() prints its report directly and returns None, so no print() wrapper is needed
print("\n=== Statistical Summary ===")
print(df.describe())
print("\n=== Missing Values Count ===")
print(df.isnull().sum())
# Fill missing Temperature with the column mean
df['Temperature'] = df['Temperature'].fillna(df['Temperature'].mean())
# Fill missing Humidity with the column median
df['Humidity'] = df['Humidity'].fillna(df['Humidity'].median())
print("\n=== DataFrame After Handling Missing Values ===")
print(df)
print("\n=== Verify: Missing Values After Filling ===")
print(df.isnull().sum())
Explanation: read_csv() loads the file. info() shows column data types and non-null counts. describe() gives a statistical summary of the numeric columns. fillna() replaces missing values: we use the mean for Temperature and the median for Humidity, assigning the result back to each column. Avoid fillna(..., inplace=True) on a single column — recent versions of pandas warn against it because the operation runs on a temporary copy rather than the DataFrame itself.
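If your teacher asks about the alternative approach — removing incomplete rows instead of filling them — dropna() does that in one call. A minimal sketch with illustrative values (the column names mirror Program 2):

```python
import pandas as pd

# Illustrative data in the same shape as the Program 2 dataset
df = pd.DataFrame({
    'City': ['Mumbai', 'Delhi', 'Bangalore', 'Chennai'],
    'Temperature': [32.5, None, 29.0, 35.2],
    'Humidity': [85, 70, None, 60]
})

# dropna() removes every row that has at least one missing value
df_dropped = df.dropna()
print(df_dropped)           # only Mumbai and Chennai remain

# dropna(subset=[...]) removes rows missing a specific column only
df_temp_only = df.dropna(subset=['Temperature'])
print(len(df_temp_only))    # 3 rows still have a Temperature value
```

Filling (Program 2) keeps all rows but introduces estimated values; dropping keeps only real measurements but shrinks the dataset. Mention this trade-off in your viva.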
Program 3: Evaluate a Classification Model (Confusion Matrix and Metrics)
Objective: Build a simple classification model and evaluate it using a confusion matrix, precision, recall, and F1 score.
# Program 3: Build and evaluate a classification model
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score
from sklearn.metrics import recall_score, f1_score
# Dataset: Predict whether a student passes (1) or fails (0) based on study hours and attendance
data = {
'Study_Hours': [2, 5, 1, 8, 3, 7, 1, 6, 4, 9, 2, 7, 3, 8, 1],
'Attendance': [60, 85, 45, 95, 70, 90, 40, 88, 75, 98, 55, 92, 65, 96, 35],
'Result': [0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0]
}
df = pd.DataFrame(data)
X = df[['Study_Hours', 'Attendance']] # Features
y = df['Result'] # Target variable
# Split data into training (80%) and test (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Build and train the model
model = GaussianNB()
model.fit(X_train, y_train)
# Make predictions on test data
y_pred = model.predict(X_test)
# Evaluate the model
print("=== Confusion Matrix ===")
cm = confusion_matrix(y_test, y_pred)
print(cm)
print("TN:", cm[0][0], "| FP:", cm[0][1])
print("FN:", cm[1][0], "| TP:", cm[1][1])
print("\n=== Model Performance Metrics ===")
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")
print(f"Precision: {precision_score(y_test, y_pred, zero_division=0):.2f}")
print(f"Recall: {recall_score(y_test, y_pred, zero_division=0):.2f}")
print(f"F1 Score: {f1_score(y_test, y_pred, zero_division=0):.2f}")
Explanation: We use a Naive Bayes classifier from scikit-learn. train_test_split divides data 80:20. fit() trains the model. predict() generates predictions. The confusion matrix shows TP, TN, FP, FN values. Precision, recall, and F1 score are calculated using scikit-learn’s metrics module.
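As a compact companion to the four separate metric calls above, scikit-learn's classification_report produces precision, recall, and F1 for every class in a single call. A short sketch with toy labels (stand-ins for the y_test and y_pred from Program 3):

```python
from sklearn.metrics import classification_report

# Toy true and predicted labels standing in for y_test and y_pred
y_true = [0, 1, 0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 0, 1, 1]

# One call reports per-class precision, recall, F1 score, and support
report = classification_report(y_true, y_pred, zero_division=0)
print(report)
```

This is a handy line to know for the Lab Test when the examiner asks for "all the metrics".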
Program 4: Data Visualisation with Matplotlib
Objective: Create four types of charts — bar chart, line chart, histogram, and scatter plot — using Matplotlib.
# Program 4: Data visualisation using Matplotlib
import matplotlib.pyplot as plt
import numpy as np
# Data for all charts
subjects = ['AI', 'Math', 'Physics', 'Chemistry', 'English']
avg_marks = [82, 75, 68, 71, 88]
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
ai_scores = [72, 75, 79, 82, 85, 88]
student_marks = [45, 52, 58, 61, 65, 68, 71, 75, 78, 82, 85, 88, 91, 94]
study_hours = [2, 3, 4, 5, 6, 7, 8]
exam_score = [45, 55, 62, 70, 78, 85, 92]
fig, axes = plt.subplots(2, 2, figsize=(12, 8))
fig.suptitle('Class 12 AI Practical File — Data Visualisation', fontsize=14)
# Bar Chart
axes[0, 0].bar(subjects, avg_marks, color='steelblue', edgecolor='black')
axes[0, 0].set_title('Average Marks by Subject')
axes[0, 0].set_ylabel('Marks')
# Line Chart
axes[0, 1].plot(months, ai_scores, marker='o', color='green', linewidth=2)
axes[0, 1].set_title('AI Score Progress Over 6 Months')
axes[0, 1].set_ylabel('Score')
# Histogram
axes[1, 0].hist(student_marks, bins=5, color='orange', edgecolor='black')
axes[1, 0].set_title('Distribution of Student Marks')
axes[1, 0].set_xlabel('Marks Range')
axes[1, 0].set_ylabel('Number of Students')
# Scatter Plot
axes[1, 1].scatter(study_hours, exam_score, color='red', s=80)
axes[1, 1].set_title('Study Hours vs Exam Score')
axes[1, 1].set_xlabel('Study Hours per Day')
axes[1, 1].set_ylabel('Exam Score')
plt.tight_layout()
plt.savefig('visualisation_programs.png')
plt.show()
print("Chart saved as 'visualisation_programs.png'")
Explanation: plt.subplots(2,2) creates a 2×2 grid of charts. bar() creates a bar chart. plot() with marker='o' creates a line chart. hist() creates a histogram with 5 bins. scatter() creates a scatter plot. tight_layout() prevents overlapping titles. Save the output image and paste it in your practical file.
Program 5: Import and Export Data Between CSV Files and DataFrames
Objective: Demonstrate importing data from CSV, performing operations, and exporting the modified data back to a new CSV file.
# Program 5: Import and export data between CSV files and DataFrames
import pandas as pd
import numpy as np
# Create sample dataset simulating a school database
np.random.seed(42)
n = 20
data = {
'Roll_No': range(1, n+1),
'Student_Name': [f'Student_{i}' for i in range(1, n+1)],
'AI_Marks': np.random.randint(50, 100, n),
'Math_Marks': np.random.randint(45, 98, n),
'Attendance_Pct': np.random.randint(60, 100, n)
}
# Export to CSV
df_original = pd.DataFrame(data)
df_original.to_csv('student_records.csv', index=False)
print("Original data exported to 'student_records.csv'")
# Import from CSV
df = pd.read_csv('student_records.csv')
print("\n=== Imported DataFrame ===")
print(df.head())
print(f"\nShape: {df.shape[0]} rows × {df.shape[1]} columns")
# Perform operations
df['Total_Marks'] = df['AI_Marks'] + df['Math_Marks']
df['Average'] = (df['AI_Marks'] + df['Math_Marks']) / 2
df['Status'] = df['Average'].apply(lambda x: 'Pass' if x >= 60 else 'Fail')
print("\n=== DataFrame After Adding Computed Columns ===")
print(df.head())
# Export modified data
df.to_csv('student_records_updated.csv', index=False)
print("\nUpdated data exported to 'student_records_updated.csv'")
# Summary statistics
print("\n=== Summary Statistics ===")
print(df[['AI_Marks', 'Math_Marks', 'Total_Marks']].describe())
status_counts = df['Status'].value_counts()
print(f"\nPass: {status_counts.get('Pass', 0)} students")
print(f"Fail: {status_counts.get('Fail', 0)} students")
Explanation: to_csv() exports the DataFrame to a CSV file. index=False prevents writing row numbers. read_csv() imports it back. We add computed columns using arithmetic operations. apply() with a lambda function assigns Pass/Fail status based on a condition. The final export saves the enriched dataset.
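The apply() + lambda pattern is the one used in the program; a vectorised alternative worth knowing for the viva is NumPy's where(), which assigns Pass/Fail to the whole column in one expression. A small sketch with illustrative values:

```python
import pandas as pd
import numpy as np

# Small stand-in for the Average column from Program 5
df = pd.DataFrame({'Average': [72.5, 55.0, 60.0, 48.5]})

# np.where(condition, value_if_true, value_if_false) evaluates the whole column at once
df['Status'] = np.where(df['Average'] >= 60, 'Pass', 'Fail')
print(df)   # Pass, Fail, Pass, Fail
```

Both approaches give the same result; np.where is typically faster on large DataFrames because it avoids a Python-level loop.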
Program 6: NumPy Array Operations for Data Analysis
Objective: Demonstrate NumPy operations — array creation, mathematical operations, statistical functions, and reshaping.
# Program 6: NumPy array operations
import numpy as np
# Create arrays
marks_ai = np.array([85, 92, 78, 88, 95, 72, 81, 90, 76, 89])
marks_math = np.array([78, 88, 82, 91, 75, 85, 79, 92, 84, 77])  # same length as marks_ai so elementwise operations work
print("=== Basic Array Information ===")
print(f"AI Marks Array: {marks_ai}")
print(f"Shape: {marks_ai.shape}")
print(f"Data Type: {marks_ai.dtype}")
print("\n=== Statistical Analysis of AI Marks ===")
print(f"Mean: {np.mean(marks_ai):.2f}")
print(f"Median: {np.median(marks_ai):.2f}")
print(f"Standard Deviation: {np.std(marks_ai):.2f}")
print(f"Variance: {np.var(marks_ai):.2f}")
print(f"Minimum: {np.min(marks_ai)}")
print(f"Maximum: {np.max(marks_ai)}")
print("\n=== Array Operations ===")
bonus = np.full(10, 5) # Array of 5s for bonus marks
adjusted_marks = marks_ai + bonus
print(f"Original AI Marks: {marks_ai}")
print(f"After +5 Bonus: {adjusted_marks}")
print("\n=== Filtering: Students scoring above 85 ===")
above_85 = marks_ai[marks_ai > 85]
print(f"Scores above 85: {above_85}")
print(f"Count: {len(above_85)} students")
print("\n=== 2D Array: Reshape marks into 2×5 grid ===")
grid = marks_ai.reshape(2, 5)
print(grid)
print(f"Shape after reshape: {grid.shape}")
print("\n=== Sorting ===")
print(f"Ascending order: {np.sort(marks_ai)}")
print(f"Descending order: {np.sort(marks_ai)[::-1]}")
Explanation: NumPy arrays are faster and more memory-efficient than Python lists for numerical operations. np.mean(), np.median(), np.std(), np.var() are the standard statistical functions. Boolean indexing (marks_ai > 85) filters elements based on a condition. reshape() changes the array dimensions without changing the data.
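The speed claim in the explanation is easy to verify yourself. A quick sketch timing a pure-Python sum against np.sum (exact timings vary by machine, so treat the printed numbers as indicative, not fixed):

```python
import time
import numpy as np

values = list(range(1_000_000))
arr = np.array(values)

start = time.perf_counter()
total_list = sum(values)        # Python-level iteration over a list
list_time = time.perf_counter() - start

start = time.perf_counter()
total_arr = int(np.sum(arr))    # single vectorised call into compiled C code
numpy_time = time.perf_counter() - start

print(f"Python list sum: {list_time:.5f}s")
print(f"NumPy array sum: {numpy_time:.5f}s")
print(f"Same result from both: {total_list == total_arr}")
```

On most machines the NumPy version is noticeably faster, which is a concrete answer if a viva examiner asks why we use NumPy at all.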
Section 2: Orange Data Mining Activities (Minimum 3 Required)
Important: Each Orange activity must include step-by-step screenshots of every widget and its output pasted into your practical file. The note in the CBSE curriculum is explicit: “Snapshots of all the steps and outputs are to be taken and pasted in the practical file.”
Orange Activity 1: Data Visualisation Using Orange Data Mining Tool
Step-by-step procedure:
Step 1 — Open Orange and Start a New Workflow. Open Orange Data Mining and click File → New. You will see an empty canvas.
Step 2 — Add the File Widget. From the Data panel on the left, drag the File widget onto the canvas. Double-click it and load a dataset. Use the Iris dataset (built into Orange: click the folder icon → look for iris.tab in the Datasets folder) or any CSV file you have.
Step 3 — Add a Data Table Widget. Drag the Data Table widget from the Data panel. Connect the File widget to the Data Table widget by dragging from the output dot of File to the input dot of Data Table. Double-click Data Table to view the loaded data.
Step 4 — Add a Scatter Plot Widget. From the Visualize panel, drag the Scatter Plot widget onto the canvas. Connect File → Scatter Plot. Double-click Scatter Plot. Set X-Axis to one numeric feature (e.g. sepal length) and Y-Axis to another (e.g. petal length). Set Color by class label. [Take screenshot of the scatter plot output.]
Step 5 — Add a Box Plot Widget. From the Visualize panel, drag Box Plot. Connect File → Box Plot. Double-click to view the distribution of each feature across classes. [Take screenshot.]
Step 6 — Add a Distributions Widget. Connect File → Distributions. View the frequency distribution of a selected feature. [Take screenshot.]
Observation to write in your file: The scatter plot shows clear visual separation between the three Iris classes when petal length and petal width are used as axes, suggesting these features are strong predictors for classification.
Orange Activity 2: Perform Classification with Orange Data Mining
Step-by-step procedure:
Step 1 — Load the Dataset. Add a File widget and load the Iris dataset (iris.tab) or a classification dataset of your choice.
Step 2 — Add a Naive Bayes Classifier. From the Model panel, drag the Naive Bayes widget onto the canvas.
Step 3 — Add a Test and Score Widget. From the Evaluate panel, drag the Test and Score widget. Connect File → Test and Score and also Naive Bayes → Test and Score.
Step 4 — Run and View Results. Double-click Test and Score. In the Sampling section, select Cross-validation with 10 folds. The results table will show Accuracy, Precision, Recall, and F1 Score for the model. [Take screenshot of the results table.]
Step 5 — Try a Second Classifier. Add a kNN (k-Nearest Neighbours) widget from the Model panel and connect it to Test and Score as well. Now compare the performance of Naive Bayes and kNN side by side. [Take screenshot showing both models.]
Observation to write in your file: Record the accuracy values for both classifiers. Note which performed better and suggest one reason why.
Orange Activity 3: Evaluate the Classification Model with Confusion Matrix
Step-by-step procedure (continues from Activity 2):
Step 1 — Add a Confusion Matrix Widget. From the Evaluate panel, drag the Confusion Matrix widget. Connect Test and Score → Confusion Matrix.
Step 2 — View the Confusion Matrix. Double-click Confusion Matrix. You will see a table showing True Positives, True Negatives, False Positives, and False Negatives for each class. [Take screenshot of the full confusion matrix.]
Step 3 — Interpret the Results. In your file, write the following for your model:
- True Positive rate for each class (correctly identified instances of that class)
- Which class was most often misclassified, and into which other class
- The overall accuracy shown in Test and Score
Step 4 — Add a ROC Analysis Widget (Optional, for extra marks). Connect Test and Score → ROC Analysis. View the ROC curves for each class. A curve closer to the top-left corner indicates a better model. [Take screenshot.]
Observation to write: The confusion matrix shows that [class name] has the lowest misclassification rate, while [class name] is sometimes confused with [other class]. This matches the scatter plot from Activity 1, where those two classes showed overlapping data points.
Section 3: Data Story (Minimum 1 Required)
Your Data Story must use all the steps of Data Storytelling. The CBSE 2025-26 curriculum provides a sample topic: the impact of the Mid-Day Meal Scheme (MDMS) on student dropout rates. You can use this topic or choose another dataset-backed topic.
What a Complete Data Story Looks Like
A Data Story has three mandatory components: Data, Visualisation, and Narrative. Your practical file entry should show all three.
Sample Data Story: Student AI Performance vs Study Hours (Class 12)
Component 1 — Data
Dataset: 30 Class 12 AI students’ study hours per day and their AI exam score.
| Study Hours/Day | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Average Score | 42 | 51 | 60 | 68 | 75 | 82 | 88 | 93 |
Data source: School internal records (or simulated data for practical purposes).
Component 2 — Visualisation
Create the following charts using Matplotlib (or Orange) and paste them in your file:
- A scatter plot showing the relationship between study hours (X-axis) and exam score (Y-axis) — this shows the positive correlation visually.
- A line chart showing the same trend — this makes the direction of improvement clearer.
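Both charts can be generated directly from the Component 1 table with a few lines of Matplotlib. A sketch (adapt the colours, labels, and filename to your own story):

```python
import matplotlib
matplotlib.use('Agg')           # renders without a display; harmless on a normal desktop
import matplotlib.pyplot as plt

# Data taken from the Component 1 table
study_hours = [1, 2, 3, 4, 5, 6, 7, 8]
avg_score = [42, 51, 60, 68, 75, 82, 88, 93]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Scatter plot: shows the positive correlation between hours and score
ax1.scatter(study_hours, avg_score, color='teal', s=60)
ax1.set_title('Study Hours vs AI Exam Score')
ax1.set_xlabel('Study Hours per Day')
ax1.set_ylabel('Average Score')

# Line chart: makes the direction of the trend clearer
ax2.plot(study_hours, avg_score, marker='o', color='darkorange', linewidth=2)
ax2.set_title('Score Trend by Study Hours')
ax2.set_xlabel('Study Hours per Day')
ax2.set_ylabel('Average Score')

plt.tight_layout()
plt.savefig('data_story_charts.png')   # paste this image into your practical file
```

Print the saved image and paste it under Component 2 with a one-line caption for each chart.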
Component 3 — Narrative
Title: The Study Hours Effect — What the Data Says About AI Exam Performance
Analysis shows a strong positive relationship between daily study hours and AI exam scores in Class 12. Students studying fewer than 3 hours per day average below 60 marks, while those studying 6 or more hours consistently score above 80. Each additional hour of daily study is associated with a gain of roughly 5 to 9 points (about 7 on average). The largest gains come at the low end: moving from 1 to 2 hours adds 9 points, while moving from 7 to 8 hours adds only 5, so the returns diminish as hours increase.
This has a clear practical implication: students targeting 80+ marks should aim for a consistent 5–6 hours of daily study rather than last-minute cramming. Sporadic high-intensity study sessions do not produce the same results as consistent daily effort — a finding consistent with research on distributed practice in educational psychology.
Recommended action: Schools could use this analysis to set minimum study hour targets and design their revision schedules to encourage consistent daily practice rather than exam-season cramming.
Optional Programs (Add These to Score Higher in Lab Tests)
The CBSE curriculum lists five optional programs. Including even two of these will strengthen your Lab Test performance significantly.
Optional 1 — Train-Test Split in Linear Regression. Demonstrates the complete model-building workflow for a regression problem and tests your understanding of how training and testing work in practice. This is tagged “For Advanced Learners” in the curriculum.
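If you attempt this optional, the workflow mirrors Program 3 but with a continuous target and a regression model. A minimal sketch (the data values here are illustrative, not from the curriculum):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Illustrative data: daily study hours vs a continuous exam score
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]])
y = np.array([42, 50, 58, 67, 74, 81, 86, 90, 94, 97])

# Same 80:20 split idea as Program 3, but for regression
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(f"Slope: {model.coef_[0]:.2f}, Intercept: {model.intercept_:.2f}")
print(f"Mean Squared Error on test set: {mean_squared_error(y_test, y_pred):.2f}")
```

In your explanation, point out that the model is evaluated only on the held-out 20% — the same principle the classification program relies on.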
Optional 2 — Chatbot Using Google Gemini API. Build a simple chatbot using Python and the Gemini API. Requires an API key from Google AI Studio (free tier available). This is also tagged “For Advanced Learners.”
Optional 3 — Orange Data Mining for Data Analytics. Extend your Orange workflow to include data analytics widgets such as Distributions, Feature Statistics, and Mosaic Display.
Optional 4 — Classification Using TensorFlow Playground. Demonstrate a neural-network classification problem using the browser-based TensorFlow Playground tool (playground.tensorflow.org). Take screenshots showing the network configuration and decision boundary. No Python code required.
Optional 5 — Regression Using TensorFlow Playground. Same as above, but with a regression dataset. Show how changing the number of hidden layers and neurons affects the model fit.
For all optional programs: snapshots must be attached.
Quick Revision Box
Before you submit your practical file, confirm:
☑ Minimum 6 Python programs — each with code, explanation, and expected output
☑ Minimum 3 Orange activities — each with screenshots of every step and output
☑ Minimum 1 Data Story — with all three components: data, visualisation, and narrative
☑ Front page with your name, class, section, roll number, school name, and session
☑ Index page listing all programs with page numbers
☑ All code tested and working — never paste code that hasn’t run successfully
☑ All Orange screenshots are clear, readable, and correctly labelled
☑ Teacher signature obtained on the front page and index
Practice Questions
Q1. Your teacher asks you to add a seventh Python program to your practical file. They want you to demonstrate how to filter a DataFrame to show only rows where the value in one column exceeds a threshold, then sort the result. Write the key lines of code.
Model Answer:
# Filter students with AI marks above 80 and sort by marks descending
filtered_df = df[df['Marks_AI'] > 80]
sorted_df = filtered_df.sort_values('Marks_AI', ascending=False)
print(sorted_df)
df[df['Marks_AI'] > 80] applies a boolean mask — only rows where the condition is True are kept. sort_values() with ascending=False puts the highest marks first.
Q2. In your Orange practical, your classification model shows 65% accuracy. Your teacher asks you to explain whether this is a good result. What do you say?
Model Answer: Whether 65% is acceptable depends on the context. For a balanced dataset with multiple classes, 65% can be a starting point — there is room for improvement. I would first check if the dataset is imbalanced (one class having far more instances than others), which can inflate apparent accuracy. I would then try a different algorithm (like Naive Bayes vs. kNN) to compare, and check the confusion matrix to understand which classes are being misclassified. Accuracy alone does not tell the full story — I would also look at the F1 Score, especially if one class matters more than the others.
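The imbalance point in the answer can be demonstrated in a few lines: a model that always predicts the majority class still scores high accuracy, yet achieves zero F1 on the minority class. A toy illustration (labels are made up for the demonstration):

```python
from sklearn.metrics import accuracy_score, f1_score

# Imbalanced toy labels: 9 students pass (1), only 1 fails (0)
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
# A lazy model that always predicts the majority class
y_pred = [1] * 10

acc = accuracy_score(y_true, y_pred)
f1_fail = f1_score(y_true, y_pred, pos_label=0, zero_division=0)
print(f"Accuracy: {acc:.2f}")                   # 0.90 looks impressive...
print(f"F1 for the fail class: {f1_fail:.2f}")  # ...but 0.00: every failing student is missed
```

This is exactly why the model answer says accuracy alone does not tell the full story.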
Frequently Asked Questions
Q: Can I submit the same programs that are listed in the CBSE sample programs section? Yes — the three sample Python programs in the official CBSE curriculum are valid starting points. You must make them your own by using your own data (even slightly modified datasets) and adding your own observations. Simply copying the CBSE sample programs verbatim without any personalisation risks appearing unoriginal to your examiner.
Q: Does the Data Story have to be handwritten or typed? Most schools accept typed and printed practical files. Confirm with your teacher. If your school requires handwritten files, write the narrative, draw the table, and paste printouts of the charts.
Q: How many screenshots do I need for Orange activities? The rule is: one screenshot per step, per activity. For Activity 1 (Data Visualisation), you should have at least 4 screenshots — one for the completed widget workflow and one each for the Scatter Plot, Box Plot, and Distributions outputs. Quality matters more than quantity: screenshots must be clear enough for the examiner to read the labels and values.
Q: Can I use a dataset from Kaggle for my Python programs? Yes. The CBSE curriculum specifically says: “Download dataset in the form of CSV from any public open-source website.” Kaggle, UCI Machine Learning Repository, and data.gov.in are all appropriate sources. Always mention the source in your practical file entry.
Q: My Orange workflow is not producing the expected output. What do I do? The most common issue is a broken connection between widgets. Check that the output port of one widget is connected to the correct input port of the next. In Orange, a yellow or red dot on a widget indicates an error or warning — click the widget to read the error message. Also ensure your dataset has the correct format: classification tasks require a target column set as “Class Variable” in the File widget settings.
The Students Who Score 48/50 In Practicals All Have One Thing in Common
They finished their practical file two weeks before submission — not two days before. With a complete file, you walk into the Lab Test with confidence because you have already run every program, seen the outputs, and understood what each line does. The viva examiner will ask you about your own file. You should know it better than anyone.
