Week 4: Image Datasets and CNNs

📋

Laboratory Overview

This laboratory focuses on image datasets and Convolutional Neural Networks (CNNs) for computer vision tasks. You will learn to process the CIFAR-10 dataset, implement CNN architectures from scratch, apply data augmentation techniques, and use transfer learning with pre-trained models. Through hands-on exercises on the Jetson Orin Nano, you will develop practical skills in building and deploying image classification systems.

What You'll Learn

Image Data Loading: Use torchvision's ImageFolder and transforms for efficient dataset handling
CNN Architecture: Design convolutional layers, pooling layers, and feature extraction pipelines
Data Augmentation: Apply transformations to improve model generalization
CIFAR-10 Training: Train and evaluate CNNs on a standard benchmark dataset
Transfer Learning: Fine-tune pre-trained models (VGG, ResNet) for custom tasks
Custom Datasets: Organize and load your own image datasets

💡 Why This Matters

CNNs are the backbone of many modern computer vision systems, powering applications from facial recognition and autonomous vehicles to medical image analysis and satellite imagery processing. Understanding how to work with image datasets, design CNN architectures, and apply transfer learning is essential for AI engineers who work with visual data. Running these models on an edge device like the Jetson Orin Nano also gives you practical experience with real-world deployment constraints.

Lab Structure

This laboratory consists of three progressive parts, each building upon previous concepts:

Part 1: CIFAR-10 Classification with CNNs - Building and training convolutional networks
Part 2: Loading Image Data - Working with ImageFolder and custom datasets
Part 3: Transfer Learning - Fine-tuning pre-trained models for custom tasks

🎯

Learning Objectives

By the end of this laboratory session, you will be able to:

Load and preprocess image datasets using torchvision's ImageFolder and transforms modules for efficient data handling in computer vision tasks.
Design and implement CNN architectures for image classification, understanding the role of convolutional layers, pooling layers, and feature extraction.
Apply data augmentation techniques to improve model generalization and prevent overfitting when working with limited training data.
Train CNNs on the CIFAR-10 dataset and evaluate their performance using appropriate metrics for multi-class classification.
Implement transfer learning by fine-tuning pre-trained models (VGG, ResNet) for custom image classification tasks.
Create custom dataset loaders for organizing and loading your own image datasets using PyTorch's Dataset and DataLoader classes.

📚

Background

Introduction to Image Datasets and CNNs

Image datasets are fundamental to computer vision and deep learning applications. Unlike simple numerical data, images require specialized processing and neural network architectures to effectively learn visual patterns and features. This laboratory explores working with standard image datasets like CIFAR-10 and leveraging transfer learning with pre-trained models.

The CIFAR-10 dataset consists of 60,000 32×32 color images across 10 classes (6,000 images per class): airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. It is split into 50,000 training images and 10,000 test images. Because the images are small, models train quickly, which makes CIFAR-10 a convenient benchmark for learning CNN architectures and training strategies. At the same time, the low resolution removes fine detail, so the classification task remains challenging enough to expose the fundamental concepts of computer vision.

Convolutional Neural Networks

Convolutional Neural Networks (CNNs) are specifically designed for processing grid-like data such as images. Unlike fully connected networks that treat images as flat vectors, CNNs preserve spatial relationships through several key mechanisms. Convolutional layers apply learnable filters to detect local patterns such as edges, textures, and shapes. Pooling layers reduce spatial dimensions while retaining important features, making the network more efficient and robust. Through multiple layers, CNNs build feature hierarchies that progress from low-level features (edges, corners) to high-level features (objects, faces).
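
To make the effect of convolution and pooling on spatial size concrete, the standard output-size formula, out = floor((in + 2 × padding − kernel) / stride) + 1, can be traced through a small stack of layers. The layer configuration below is an assumed example, not the architecture used in the lab.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(size, layers):
    """Return the spatial size after each (kernel, stride, padding) layer."""
    sizes = []
    for kernel, stride, padding in layers:
        size = conv_output_size(size, kernel, stride, padding)
        sizes.append(size)
    return sizes


# Hypothetical stack for a 32x32 CIFAR-10 image: three blocks, each a
# 3x3 convolution with padding 1 followed by a 2x2 max-pool with stride 2.
layers = [(3, 1, 1), (2, 2, 0)] * 3
print(trace_shapes(32, layers))  # [32, 16, 16, 8, 8, 4]
```

With padding 1, each 3×3 convolution keeps the spatial size unchanged, while each pooling layer halves it, so the final feature maps are 4×4.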

Data Augmentation

Data augmentation artificially expands training datasets by applying transformations like rotations, flips, crops, and color adjustments to existing images. This technique provides several important benefits: it improves model generalization by exposing the network to varied inputs, reduces overfitting especially when training data is limited, makes models more robust to real-world variations in lighting and orientation, and effectively increases dataset size without collecting new data. In PyTorch, data augmentation is implemented through the transforms module in torchvision.

Transfer Learning

Transfer learning takes a model pre-trained on a large dataset such as ImageNet (the commonly used ILSVRC subset contains about 1.28 million training images across 1,000 categories) and adapts it to a new task. This approach offers several advantages: training is faster because the model starts from learned features rather than random weights; performance is usually better, especially with limited training data; low-level features (edges, textures, shapes) transfer well across visual domains; and you avoid the computational cost of training a large model from scratch. In this lab, you'll experiment with fine-tuning pre-trained VGG and ResNet models for custom classification tasks.

🎬

Pre-lab Preparation

Before starting the laboratory exercises, watch the following video tutorials from Udacity's "Introduction to Deep Learning with PyTorch" course. These videos provide essential background knowledge and practical demonstrations of the concepts you'll implement in this lab.

📺 Required Video Tutorials

Watch all videos from Udacity - Introduction to Deep Learning with PyTorch

Convolutional Neural Networks

Chapter: Convolutional Neural Networks

Transfer Learning

Chapter: Introduction to PyTorch

📝 Pre-lab Quiz

Instructions: Complete this quiz after watching the required videos to assess your readiness for the lab.

Question 1: What is the size of images in the CIFAR-10 dataset?

A) 28×28 pixels (grayscale)
B) 64×64 pixels (RGB)
C) 32×32 pixels (RGB)
D) 224×224 pixels (RGB)

Question 2: What is the primary advantage of transfer learning?

A) It eliminates the need for training data
B) It leverages pre-trained features to improve performance with less data
C) It always produces 100% accuracy
D) It makes training slower but more accurate

Question 3: What is the purpose of data augmentation in image classification?

A) To reduce the size of the dataset
B) To make training faster by reducing image resolution
C) To increase dataset diversity and improve model generalization
D) To compress images for storage efficiency

Question 4: In a CNN, what is the primary function of a convolutional layer?

A) To detect local patterns and features in the input image
B) To reduce the spatial dimensions of the feature maps
C) To flatten the image into a vector
D) To perform classification at the output layer

Question 5: Which PyTorch module is commonly used for loading and preprocessing image datasets?

A) torch.nn
B) torchvision
C) torch.optim
D) torch.autograd

Question 6: What is the primary purpose of pooling layers in a CNN?

A) To increase the spatial dimensions of feature maps
B) To reduce spatial dimensions while retaining important features
C) To apply non-linear activation functions
D) To normalize the input data

Question 7: What does a typical CNN filter (kernel) size of 3×3 mean?

A) The filter examines a 3×3 pixel region at a time
B) The output image will be 3×3 pixels
C) The neural network has 3 input and 3 output layers
D) The stride is always 3 pixels

Question 8: Which pre-trained model architecture is known for using residual connections (skip connections)?

A) VGG
B) ResNet
C) AlexNet
D) LeNet

Question 9: When loading a custom image dataset using ImageFolder in PyTorch, how should the images be organized?

A) All images in a single folder with labels in a CSV file
B) Images organized in subdirectories, where each subdirectory name represents a class label
C) Images with labels embedded in the filename
D) Images in random folders with a separate JSON configuration file

Question 10: Which technique is NOT typically used to prevent overfitting in CNNs?

A) Dropout
B) Data augmentation
C) Increasing the learning rate significantly
D) L2 regularization (weight decay)

Note: Discuss your answers with your lab instructor before beginning the practical exercises. Understanding these concepts is crucial for successfully completing the lab.

⚙️

Lab Procedure

This laboratory is divided into three parts, each focusing on a specific aspect of working with image datasets and CNNs. Complete each part sequentially, as they build upon each other. Each part includes exercises that you should attempt before viewing the solutions.

⚠️ Before You Begin:

Ensure your Jetson Orin Nano is powered on and connected
Verify that PyTorch and torchvision have been pre-configured by the lab technician
All datasets will be automatically downloaded when running the exercises
Work through parts in order - later parts build on earlier concepts

Part 1: CIFAR-10 Classification with CNNs

Build and train a convolutional neural network to classify images from the CIFAR-10 dataset. You'll design a CNN architecture, implement training loops, apply data augmentation, and evaluate model performance on test data.

Key Topics:

CNN architecture design for image classification
CIFAR-10 dataset loading and preprocessing
Data augmentation techniques
Training and validation loops
Model evaluation and accuracy metrics
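
The topics above can be sketched together in one small example: a CNN for 3×32×32 inputs, a single training step, and an accuracy function. The `SmallCNN` architecture, the optimizer settings, and the random stand-in batch are all assumptions for illustration and are not the lab's reference solution.

```python
import torch
import torch.nn as nn


class SmallCNN(nn.Module):
    """Illustrative CNN for 3x32x32 inputs."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                   # 32 -> 16
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                   # 16 -> 8
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                   # 8 -> 4
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(0.25),
            nn.Linear(64 * 4 * 4, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))


def train_step(model, images, labels, optimizer, criterion):
    """Run one optimization step and return the batch loss."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()


@torch.no_grad()
def accuracy(model, images, labels):
    """Fraction of correct top-1 predictions on one batch."""
    model.eval()
    preds = model(images).argmax(dim=1)
    return (preds == labels).float().mean().item()


torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = SmallCNN().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# Random stand-in batch; in the lab this comes from a CIFAR-10 DataLoader.
images = torch.randn(8, 3, 32, 32, device=device)
labels = torch.randint(0, 10, (8,), device=device)
loss = train_step(model, images, labels, optimizer, criterion)
print(f"loss={loss:.3f}  acc={accuracy(model, images, labels):.2f}")
```

In a full training run, `train_step` would be called for every batch in the training loader, and `accuracy` would be averaged over a held-out validation or test loader at the end of each epoch.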

📓 Open Exercise

Part 2: Loading Image Data

Learn to organize, load, and preprocess custom image datasets using PyTorch's ImageFolder and transforms. You'll work with image datasets, implement data loaders, apply preprocessing pipelines, and visualize loaded images.

Key Topics:

ImageFolder for dataset organization
Data transforms and normalization
DataLoader configuration
Batch processing
Custom dataset creation

📓 Open Exercise

Part 3: Transfer Learning

Apply transfer learning by fine-tuning a pre-trained model (VGG or ResNet) for a custom classification task. You'll load pre-trained weights, freeze/unfreeze layers, replace the classifier head, and compare performance with training from scratch.

Key Topics:

Pre-trained model loading (VGG, ResNet)
Feature extraction vs fine-tuning
Layer freezing techniques
Classifier head modification
Performance comparison with training from scratch

📓 Open Exercise

💡 Tips for Success

Work sequentially - Each part builds on previous concepts
Experiment - Try modifying hyperparameters to see effects
Monitor training - Watch loss curves and accuracy metrics
Take screenshots - Capture all outputs and plots for your report
Ask questions - Consult your instructor when stuck

🔧

Lab Materials

Hardware Requirements

Platform: Jetson Orin Nano Developer Kit (assembled in Week 1)
Memory: Minimum 4GB RAM available
Storage: At least 2GB free space for datasets
Connection: Internet access for downloading CIFAR-10 dataset

Software Prerequisites

Pre-installed by lab technician on your Jetson Orin Nano:

Python: Version 3.8 or higher
PyTorch: Version compatible with Jetson (with CUDA support)
torchvision: For dataset loading and transformations
NumPy: For numerical operations
Matplotlib: For visualization

Datasets

CIFAR-10: Automatically downloaded by torchvision when running the exercises
Size: Approximately 170MB compressed
Classes: 10 categories (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck)
Images: 60,000 total (50,000 training + 10,000 test), 32×32 RGB

Included Files

All necessary code and helper functions are included in the three lab parts. Each HTML file is self-contained and ready to execute on your Jetson Orin Nano through a Jupyter Notebook interface.

Part 1: CIFAR-10 CNN Exercise and Solution HTML files
Part 2: Loading Image Data Exercise and Solution HTML files
Part 3: Transfer Learning Exercise and Solution HTML files

📖

References & Resources

Primary Course Material

Source: Udacity - Introduction to Deep Learning with PyTorch

Chapters: Convolutional Neural Networks & Introduction to PyTorch

See Pre-lab Preparation section above for complete video tutorial list

Additional Resources

PyTorch Documentation: https://pytorch.org/docs/
torchvision Documentation: https://pytorch.org/vision/stable/
CIFAR-10 Dataset: https://www.cs.toronto.edu/~kriz/cifar.html
Transfer Learning Tutorial: https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html

📄

Lab Report Requirements

Students must submit a comprehensive lab report demonstrating their understanding of CNNs and transfer learning. The report should showcase practical skills acquired through the three laboratory parts and include evidence of all completed exercises.

⚠️ Submission Deadline:

Submit your completed lab report by [Insert Deadline - Typically 1 week after lab session]. Late submissions will be penalized according to course policy (10% per day, maximum 3 days).

Report Structure

Your lab report must include the following sections:

1. Title Page & Formatting (5 points)

Lab title, your name, student ID, date, course name, and instructor name
Professional formatting with clear headers and page numbers

2. Objectives (10 points)

List all learning objectives
Briefly explain why each is important (1-2 sentences each)

3. Procedure & Results (50 points)

For each of the 3 parts:

Include code snippets with clear outputs
Provide screenshots of key results
Add plots where applicable (loss curves, accuracy graphs, etc.)
Explain what each part demonstrates

4. Discussion (20 points)

Analyze your experimental results
Compare CNN performance on CIFAR-10 with different architectures
Discuss the benefits of transfer learning vs. training from scratch
Support all statements with evidence from your experiments

5. Challenges & Solutions (10 points)

Describe problems you encountered
Explain your debugging process
Reflect on what you learned from solving these challenges

6. Conclusion (5 points)

Summarize key learnings
Reflect on the most challenging concepts
Discuss potential applications of this knowledge

📋 Submission Checklist

Before submitting, ensure you have:

✓ Completed all 3 lab parts with working, tested code
✓ Included clear screenshots of all outputs, plots, and visualizations
✓ Answered all discussion questions thoroughly with supporting evidence
✓ Documented challenges and solutions in detail
✓ Checked all code for errors and verified all functions execute correctly
✓ Formatted report professionally with clear section headers and page numbers
✓ Referenced all sources and datasets used
✓ Proofread for grammar, spelling, and technical accuracy
✓ Verified all images are clear, properly labeled, and referenced in text
✓ Included your name and student ID on all pages

📤 Submission Format

File Format: Submit report as PDF document (required)
Code Files: Include Jupyter notebooks (.ipynb) in a separate ZIP file
File Naming Convention:
- Report: Week4_[YourLastName]_[StudentID].pdf
- Code: Week4_[YourLastName]_[StudentID]_Code.zip
- Example: Week4_Ahmed_202012345.pdf
Submission Method: Upload to University LMS (Blackboard/Moodle)
File Size Limit: Maximum 50MB total
- If exceeded, compress images or use PDF compression tools
- Ensure PDF is searchable text, not scanned images
Required Components:
- Main PDF lab report
- ZIP file containing all Jupyter notebooks with outputs
- Any modified helper files (if applicable)

Important:

Ensure PDF is searchable and not password-protected
All code must be properly commented and executable
Include all necessary imports and dependencies
Test that your notebooks run completely from top to bottom

📊 Grading Rubric

Component	Points	Criteria
Title Page & Formatting	5	Complete, professional
Objectives	10	Clear, comprehensive
Procedure & Results	50	All parts complete, correct code, proper outputs
Discussion	20	Thoughtful analysis, supported by results
Challenges & Solutions	10	Detailed problem-solving process
Conclusion	5	Reflective, insightful
Total	100

Grading Notes:

All code must execute without errors for full credit
Screenshots must be clear, properly labeled, and referenced
All discussion questions must be answered with supporting evidence
Late penalty: 10% per day (up to 3 days)
Plagiarism will result in zero credit