This laboratory introduces image classification, a fundamental task in artificial intelligence and machine learning. Classification more broadly has applications in computer vision, autonomous vehicles, medical diagnosis, and fraud detection: it lets systems automatically assign data to predefined categories, an essential capability for automated decision-making. In this hands-on lab, you will develop a practical understanding of deep learning-based image classification by training a custom model to recognize hand gestures. Using the Jetson Orin Nano platform and its attached camera, you will work through the complete workflow, from data collection and labeling through model training to real-time inference. This experience bridges the gap between theoretical deep learning concepts and practical edge AI implementation.
What You'll Learn
- Classification Fundamentals: Understanding how neural networks map inputs to categorical outputs and the role of training data in model accuracy.
- Data Collection & Labeling: Building custom image datasets through interactive data capture and organizing training examples by category.
- Model Training: Configuring and training a deep neural network for image classification using transfer learning techniques.
- Real-time Inference: Deploying trained models on edge devices for live classification of camera input with minimal latency.
- Model Evaluation: Assessing classifier performance through accuracy metrics and testing with diverse input examples.
- Edge AI Optimization: Understanding the considerations for running AI models on resource-constrained embedded platforms.
Why This Matters
Classification is everywhere in modern AI systems. Autonomous vehicles must classify whether the road ahead is clear or blocked, a decision critical for passenger safety. Medical imaging systems classify tumors as benign or malignant, supporting life-saving diagnoses. Industrial quality control systems classify products as acceptable or defective, ensuring manufacturing standards. Your smartphone's camera classifies scenes to optimize photo settings automatically. By mastering classification techniques, you're learning the foundation for countless real-world applications. Moreover, implementing these systems on edge devices like the Jetson enables real-time, privacy-preserving AI that processes data locally rather than in the cloud. This is increasingly important for robotics, IoT devices, and applications requiring low latency or offline operation.
Lab Structure
This laboratory consists of one comprehensive hands-on part that guides you through the complete classification pipeline:
- Part 1: Hand Gesture Classification - Build a complete image classification system by collecting training data (thumbs up/down gestures), training a neural network model, and deploying it for real-time gesture recognition on your Jetson platform. You'll gain practical experience with the entire machine learning workflow from data to deployment.
By the end of this laboratory session, you will be able to:
- Explain Classification Concepts: Understand how deep learning models perform image classification by mapping pixel inputs to categorical outputs, and explain the role of training data in teaching models to recognize patterns.
- Collect and Organize Training Data: Use the Jetson camera system to capture diverse training examples, organize them into labeled categories, and understand the importance of dataset quality and diversity for model performance.
- Train Image Classification Models: Configure training parameters, execute the training process on the Jetson platform, and monitor training metrics to develop accurate classification models.
- Deploy Models for Real-time Inference: Load trained models and perform real-time classification on live camera feeds, understanding the inference pipeline from image capture to prediction output.
- Evaluate Classifier Performance: Test trained models with various input examples, assess accuracy, identify classification errors, and understand factors affecting model reliability.
- Apply Classification to Real Problems: Design custom classification applications by defining appropriate categories for specific use cases and adapting the classification pipeline to different domains.
What is Classification?
Classification is a supervised learning task where an algorithm learns to assign input data to predefined categories or classes based on labeled training examples. In image classification, the input is an image (represented as a matrix of pixel values), and the output is a class label that identifies what the image contains. The classifier must learn the visual patterns, features, and characteristics that distinguish one class from another.
Unlike regression (which predicts continuous values), classification predicts discrete categories. The number of categories can range from two in binary classification (like thumbs up or down) to many in multi-class classification (like recognizing different objects, animals, or scenes). On some benchmark tasks, such as ImageNet, modern deep learning models have matched or exceeded reported human accuracy.
Deep Learning for Image Classification
A deep learning model consists of a neural network with internal parameters, or weights, configured to map inputs to outputs. In image classification, the inputs are the pixels of a camera image and the outputs are the possible categories, or classes, that the model is trained to recognize. To train the model, many labeled examples are presented to it repeatedly so that it learns to recognize patterns in the images.
During training, the network learns hierarchical representations of visual features. Early layers might learn to detect simple edges and colors, while deeper layers combine these into more complex patterns like shapes and textures, eventually recognizing entire objects or gestures. The network adjusts its weights through backpropagation, minimizing the difference between its predictions and the true labels in the training data.
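To make the weight-update idea concrete, here is a minimal sketch of gradient descent on a single-feature logistic classifier in plain Python. The toy data, learning rate, and function names are illustrative assumptions and are not taken from the lab notebook. A deep network applies the same kind of update to all of its weights, with backpropagation supplying the gradients.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mean_loss(w, b, xs, ys):
    """Binary cross-entropy averaged over the dataset."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(xs)

def train(xs, ys, lr=0.5, epochs=200):
    """Fit weight w and bias b by repeatedly stepping against the loss gradient."""
    w, b = 0.0, 0.0
    history = []
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y   # prediction minus true label
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / len(xs)
        b -= lr * grad_b / len(xs)
        history.append(mean_loss(w, b, xs, ys))
    return w, b, history

# Toy data (made up): negative feature -> class 0, positive feature -> class 1
xs = [-2.0, -1.5, -1.0, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 1, 1, 1]
w, b, history = train(xs, ys)
print(f"first loss {history[0]:.3f}, last loss {history[-1]:.3f}")
```

The recorded loss falls as training proceeds, which is the same signal you will watch in the training plots during the lab.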
Once the model is trained, it can be run on live data and provide results in real time. This is called inference. During inference, the trained model takes a new image as input and outputs a prediction, typically a probability distribution over all possible classes, with the highest probability indicating the predicted category.
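The step from raw network outputs to a predicted class can be sketched as follows. This assumes the model emits one score (logit) per class and that softmax is used to turn scores into probabilities, which is the common arrangement. The class names, their order, and the example scores are made up for illustration.

```python
import math

def softmax(logits):
    """Convert raw model scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def predict(logits, class_names):
    """Return the most probable class name and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]

class_names = ["thumbs_up", "thumbs_down"]   # assumed label order
logits = [2.0, 0.5]                          # made-up network outputs
label, confidence = predict(logits, class_names)
print(label, round(confidence, 3))
```

The probability attached to the winning class is what the lab report refers to as a confidence score.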
Transfer Learning
Training a deep neural network from scratch requires enormous amounts of labeled data and significant computational resources. Transfer learning addresses this challenge by starting with a model pre-trained on a large dataset (like ImageNet, which contains millions of images across thousands of categories) and fine-tuning it for your specific task. The pre-trained model has already learned useful visual features that transfer well to new classification problems.
In transfer learning, we typically freeze the early layers (which have learned general features like edges and textures) and only retrain the final layers to recognize your specific categories. This approach dramatically reduces the amount of training data needed and speeds up training time, making it practical to train custom classifiers on modest hardware like the Jetson with relatively small datasets.
The Classification Pipeline
Developing a classification system involves several key stages:
- Problem Definition: Identify what categories you need to classify and whether classification is appropriate for your task.
- Data Collection: Gather diverse training examples for each category, ensuring variation in lighting, angles, backgrounds, and other relevant factors.
- Data Labeling: Organize collected examples by category, ensuring each example is correctly labeled.
- Model Training: Feed labeled examples to the neural network, allowing it to learn discriminative features.
- Validation & Testing: Evaluate model performance on new examples not seen during training.
- Deployment: Deploy the trained model for real-time inference on new data.
- Monitoring & Refinement: Track real-world performance and collect additional training data to improve accuracy.
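The Validation & Testing stage depends on holding some examples back from training. A minimal sketch of such a split in plain Python follows. The file names, the 20% validation fraction, and the helper name `split_dataset` are assumptions for illustration.

```python
import random

def split_dataset(samples, val_fraction=0.2, seed=0):
    """Shuffle (path, label) pairs and hold out a fraction for validation."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # fixed seed makes the split repeatable
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]

# Hypothetical file names standing in for captured images
samples = [(f"thumbs_up/{i}.jpg", "thumbs_up") for i in range(50)]
samples += [(f"thumbs_down/{i}.jpg", "thumbs_down") for i in range(50)]

train_set, val_set = split_dataset(samples)
print(len(train_set), len(val_set))
```

A plain random split does not guarantee the same class ratio in both parts. With small datasets it is worth checking the per-class counts of the validation set after splitting.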
Edge AI Deployment
Running classification models on edge devices like the Jetson Orin Nano offers several advantages over cloud-based approaches. Edge inference provides low latency (important for real-time applications like robotics), preserves privacy (data stays on-device), reduces bandwidth costs, and enables offline operation. However, edge deployment requires optimizing models for limited computational resources and memory constraints.
The Jetson platform includes GPU acceleration and specialized hardware for neural network inference, making it capable of running sophisticated models in real-time. Understanding how to balance model accuracy with computational efficiency is a key skill for edge AI development.
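One way to put a number on computational efficiency is to time repeated inference calls. The sketch below uses a stand-in `dummy_predict` function in place of a real model, so the figures it prints say nothing about Jetson performance. The helper names and the warm-up and run counts are assumptions.

```python
import time

def measure_latency(predict_fn, frame, warmup=3, runs=20):
    """Return average seconds per call and the implied frames per second."""
    for _ in range(warmup):          # warm-up calls are excluded from timing
        predict_fn(frame)
    start = time.perf_counter()
    for _ in range(runs):
        predict_fn(frame)
    elapsed = time.perf_counter() - start
    avg = elapsed / runs
    return avg, (1.0 / avg if avg > 0 else float("inf"))

def dummy_predict(frame):
    """Stand-in for a real model call; just sums the pixel values."""
    return sum(frame)

frame = [0] * (224 * 224)            # placeholder for a 224x224 grayscale image
avg_s, fps = measure_latency(dummy_predict, frame)
print(f"{avg_s * 1000:.3f} ms per frame, about {fps:.0f} FPS")
```

When timing a PyTorch model on the GPU, call `torch.cuda.synchronize()` before reading the timer, because CUDA work is launched asynchronously and the Python call can return before the computation finishes.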
Important Considerations
- Data Quality: Model accuracy depends heavily on the quality and diversity of training data. Collect examples under various conditions (different lighting, backgrounds, orientations).
- Balanced Datasets: Ensure each category has roughly equal numbers of examples to prevent bias.
- Overfitting: Models trained too long on small datasets may memorize training examples rather than learning generalizable patterns.
- Testing: Always test models on new examples not seen during training to assess true performance.
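The balanced-dataset check above is easy to automate. The sketch below assumes images are stored in one subfolder per class. The folder names `thumbs_up` and `thumbs_down` are hypothetical, and the layout used by the lab's collection tool may differ. The demo builds a temporary folder tree with empty placeholder files so it can run anywhere.

```python
from pathlib import Path
import tempfile

def count_images(dataset_dir, extensions=(".jpg", ".jpeg", ".png")):
    """Count image files in each class subfolder of dataset_dir."""
    counts = {}
    for class_dir in sorted(Path(dataset_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for f in class_dir.iterdir() if f.suffix.lower() in extensions
            )
    return counts

def imbalance_ratio(counts):
    """Largest class size divided by smallest; 1.0 means perfectly balanced."""
    smallest = min(counts.values())
    return float("inf") if smallest == 0 else max(counts.values()) / smallest

# Demo on a throwaway folder tree with empty placeholder files
with tempfile.TemporaryDirectory() as root:
    for name, n in [("thumbs_up", 60), ("thumbs_down", 40)]:
        folder = Path(root) / name
        folder.mkdir()
        for i in range(n):
            (folder / f"{i}.jpg").touch()
    counts = count_images(root)
    ratio = imbalance_ratio(counts)

print(counts, ratio)
```

A ratio well above 1 is a cue to capture more images of the smaller class before training.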
Before starting the laboratory exercises, review the background material above and test your understanding with the following questions. These questions cover fundamental concepts you'll apply during the lab.
Knowledge Check Questions
1. What is the primary goal of image classification?
A) To compress images into smaller file sizes
B) To assign images to predefined categorical labels based on their content
C) To enhance image quality and resolution
D) To generate new images from random noise
2. What is the difference between classification and regression?
A) Classification is faster than regression
B) Classification predicts discrete categories while regression predicts continuous values
C) Classification requires more training data than regression
D) There is no difference; they are the same task
3. In a deep learning model for image classification, what do the internal parameters (weights) represent?
A) The pixel values of the training images
B) Learned values that configure the network to map inputs to outputs correctly
C) The number of images in each category
D) The names of the classification categories
4. What is "inference" in the context of deep learning models?
A) The process of collecting training data
B) The process of training the model on labeled examples
C) Running a trained model on new data to make predictions in real-time
D) Evaluating model accuracy on a test dataset
5. What is the main advantage of transfer learning?
A) It eliminates the need for any training data
B) It allows you to leverage pre-learned features from large datasets, reducing training time and data requirements
C) It makes models 100% accurate on all tasks
D) It automatically generates training data
6. Why is dataset diversity important when training a classification model?
A) It makes training faster
B) It reduces the amount of data needed
C) It helps the model learn generalizable patterns that work under various conditions rather than memorizing specific examples
D) It is not important; identical examples are preferred
7. What problem does an unbalanced dataset cause in classification?
A) It makes the model train slower
B) The model may become biased toward predicting the overrepresented classes
C) It increases computational requirements
D) Unbalanced datasets have no effect on model performance
8. What is "overfitting" in machine learning?
A) When the model trains too quickly
B) When the model memorizes training examples rather than learning generalizable patterns
C) When the model is too accurate on training data
D) When you have too much training data
9. What is a key advantage of performing inference on edge devices (like Jetson) rather than in the cloud?
A) Edge devices have unlimited computing power
B) Lower latency, better privacy, and ability to operate offline
C) Edge inference is always more accurate than cloud inference
D) Edge devices never require power or maintenance
10. In a typical classification model output, what does the highest probability value indicate?
A) The number of training examples for that class
B) How long the model took to train
C) The model's prediction of which class the input belongs to
D) The quality of the input image
Before You Begin
- Ensure your Jetson Orin Nano is powered on and the camera is connected
- Verify that you can access the Jupyter notebook interface
- Have adequate lighting for capturing clear images
- Plan your gesture poses ahead of time for consistent data collection
Part 1: Hand Gesture Classification
In this comprehensive exercise, you will build a complete image classification system from scratch. You'll collect your own training data using the Jetson camera, train a neural network to recognize hand gestures (thumbs up and thumbs down), and deploy the trained model for real-time gesture recognition. This hands-on experience covers the entire machine learning pipeline from data collection to inference.
Key Topics Covered
- Interactive data collection using Jetson camera
- Dataset organization and labeling
- Neural network training with transfer learning
- Real-time inference and gesture recognition
- Model evaluation and testing
What You'll Do
- Set up the interactive classification tool on your Jetson
- Collect 50-100 images of "thumbs up" gestures from various angles and under varied lighting
- Collect 50-100 images of "thumbs down" gestures with similar diversity
- Configure and execute the training process
- Monitor training metrics (loss and accuracy)
- Test the trained model with live camera input
- Evaluate model performance and identify any misclassifications
- (Optional) Add additional gesture categories to expand the classifier
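For the evaluation step, overall accuracy and a breakdown of misclassifications can be computed from paired true and predicted labels. This is a minimal sketch in plain Python. The label strings and test results are made up, and `evaluate` is a hypothetical helper name.

```python
from collections import Counter

def evaluate(true_labels, predicted_labels):
    """Return overall accuracy and a Counter keyed by (true, predicted)."""
    pairs = list(zip(true_labels, predicted_labels))
    confusion = Counter(pairs)
    correct = sum(n for (t, p), n in confusion.items() if t == p)
    return correct / len(pairs), confusion

# Made-up test results for illustration
true_labels = ["up", "up", "up", "down", "down", "down", "down", "up"]
predicted = ["up", "up", "down", "down", "down", "up", "down", "up"]

accuracy, confusion = evaluate(true_labels, predicted)
print(f"accuracy = {accuracy:.2f}")
for (t, p), n in sorted(confusion.items()):
    print(f"true={t:<5} predicted={p:<5} count={n}")
```

Entries where the true and predicted labels differ are the misclassifications to examine, for example by looking at which lighting or background conditions they share.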
Tips for Success
- Data Collection: Capture images from multiple angles, distances, and lighting conditions
- Background Variation: Include diverse backgrounds to improve model generalization
- Consistent Gestures: Make sure your gestures are clear and consistent within each category
- Training Duration: Allow sufficient training epochs but watch for overfitting
- Testing: Test with gestures not used during training to assess true accuracy
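One common way to watch for overfitting is patience-based early stopping: track the loss on held-out validation images and stop once it has not improved for a set number of epochs. The sketch below shows that logic on a made-up list of validation losses. The function name and the patience of 3 are assumptions, and the lab notebook may not include this mechanism.

```python
def early_stop(val_losses, patience=3):
    """Return (stop_epoch, best_epoch) using patience-based early stopping.

    Training stops once validation loss has failed to improve for
    `patience` consecutive epochs. Epochs are numbered from 0.
    """
    best_loss, best_epoch, waited = float("inf"), -1, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Made-up validation losses: improvement, then a rise that suggests overfitting
val_losses = [0.69, 0.52, 0.41, 0.38, 0.40, 0.43, 0.47, 0.55]
stop_epoch, best_epoch = early_stop(val_losses)
print(f"stop at epoch {stop_epoch}, keep weights from epoch {best_epoch}")
```

If training loss keeps falling while validation loss rises, as in the second half of this list, the model is likely memorizing the training images.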
Hardware Required
This laboratory uses the NVIDIA Jetson Orin Nano Developer Kit that you assembled in Week 1. All necessary hardware is pre-configured:
- NVIDIA Jetson Orin Nano Developer Kit - Assembled in Week 1
- CSI Camera Module - Connected to Jetson CSI port
- Power Supply - DC barrel jack adapter (use the 19V supply included with the Orin Nano Developer Kit, not a 5V adapter)
- Monitor, Keyboard, Mouse - For Jetson interaction
- Internet Connection - For downloading dependencies (if needed)
Software Environment
All required software has been pre-configured by the lab technician on your Jetson system:
- JetPack SDK - NVIDIA's comprehensive AI development suite
- PyTorch - Deep learning framework with CUDA support
- torchvision - Computer vision models and utilities
- Jupyter Notebook - Interactive development environment
- Python Libraries - NumPy, OpenCV, PIL for image processing
- JetBot Libraries - Camera interface utilities
No Installation Required
Your Jetson system is pre-configured with all necessary software. You do not need to install any packages. Simply access the Jupyter notebook interface and begin the exercises.
Exercise Files
- Classification.html - Main exercise notebook (download from lab materials)
- Classification_solution.html - Reference solution (password protected)
Documentation & Resources
Course Materials
Review previous laboratory materials for foundational concepts:
- Week 3: Neural Network Architectures (ANN, DNN, CNN) with PyTorch
- Week 4: Image Datasets and CNNs (MNIST, CIFAR-10)
- Week 7: Sensor Interface and Data Acquisition (Camera systems)
Your lab report should demonstrate comprehensive understanding of image classification concepts and document your practical implementation experience. Follow the structure below and ensure all sections are complete and well-organized.
Report Structure & Grading (100 points total)
1. Title Page & Formatting (5 points)
- Lab title: "Week 8: Image Classification with Deep Learning"
- Your name, student ID, date, course name (ELEC 395), and instructor name
- Professional formatting with clear section headers and page numbers
2. Objectives (10 points)
- List all learning objectives from this lab
- For each objective, briefly explain its importance (1-2 sentences)
3. Procedure & Results (50 points)
Document your complete classification implementation:
- Data Collection (15 points):
  - Describe your data collection process and strategy
  - Include sample images from your dataset (at least 4-6 examples per class)
  - Document the number of images collected per category
  - Explain how you ensured dataset diversity
- Model Training (20 points):
  - Include code snippets showing training configuration
  - Document training parameters (epochs, learning rate, etc.)
  - Show training progress plots (loss and accuracy over epochs)
  - Report final training accuracy
  - Explain the training process and any adjustments made
- Testing & Inference (15 points):
  - Provide screenshots of successful gesture classifications
  - Show examples of correct predictions with confidence scores
  - Document any misclassifications and analyze why they occurred
  - Report overall testing accuracy
4. Discussion (20 points)
- Analyze factors that affected your model's accuracy (data quality, quantity, diversity)
- Compare your results to expected performance and explain any discrepancies
- Discuss the importance of balanced datasets in classification
- Explain how transfer learning benefited your implementation
- Describe real-world applications where similar classification systems could be deployed
- Discuss challenges of deploying classification models on edge devices
- Support all statements with evidence from your experimental results
5. Challenges & Solutions (10 points)
- Describe specific problems encountered during data collection or training
- Explain your debugging and troubleshooting process
- Document solutions implemented to overcome challenges
- Reflect on lessons learned from addressing these issues
6. Conclusion (5 points)
- Summarize key concepts learned about image classification
- Reflect on challenges faced and skills developed
- Discuss potential improvements or extensions to your classifier
- Describe how this knowledge applies to future AI projects
Submission Checklist
- Completed classification exercise with working trained model
- Included sample training images showing data diversity
- Training progress plots (loss and accuracy curves)
- Screenshots of successful real-time gesture recognition
- Analysis of model performance and any errors
- Discussion of classification concepts and applications
- Documented challenges and solutions
- All sections complete with clear explanations
- Professional formatting with proper citations
Submission Format
- File Format: PDF (required)
- Code Files: Jupyter notebook (.ipynb) in ZIP file
- Supporting Files: Include sample training images and trained model (if file size permits)
- File Naming: Week8_[LastName]_[StudentID].pdf
- Submission Method: Upload to Learning Management System (LMS)
- File Size Limit: 50MB maximum (compress images if needed)
- Due Date: As specified in course schedule
Grading Rubric
| Component | Points | Criteria |
|---|---|---|
| Title Page & Formatting | 5 | Complete information, professional presentation |
| Objectives | 10 | Clear listing with importance explained |
| Procedure & Results | 50 | Complete documentation of all parts with evidence |
| Discussion | 20 | Thoughtful analysis with supporting evidence |
| Challenges & Solutions | 10 | Detailed problem-solving documentation |
| Conclusion | 5 | Reflective summary with insights |
| Total | 100 | |
Grading Notes:
- Model must demonstrate successful gesture classification
- Screenshots must be clear, labeled, and demonstrate functionality
- All code must execute without errors
- Late penalty: 10% per day (maximum 3 days)
- Plagiarism results in zero credit for the assignment