This laboratory introduces students to regression techniques in machine learning, focusing on predicting continuous numerical values rather than discrete classes. Using the NVIDIA Jetson Orin Nano with an attached camera, students will develop real-time object tracking systems that predict the position of body parts (such as a hand or the nose) in camera frames. This hands-on experience connects theoretical regression concepts with practical computer vision applications essential for autonomous systems, robotics, and human-computer interaction.
Regression is fundamental to many AI applications beyond classification. Autonomous vehicles need to predict continuous trajectories, robotic systems require precise position estimation, and human-computer interfaces depend on accurate gesture tracking. This laboratory provides practical experience with regression techniques that are essential for developing intelligent systems that interact with the physical world. By mastering regression on edge devices, you're preparing for real-world scenarios where AI systems must make continuous predictions in real-time with limited computational resources.
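This laboratory consists of one comprehensive interactive exercise that progresses through the complete regression workflow: camera setup, data collection with coordinate labeling, model training with transfer learning, and live inference.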
By the end of this laboratory session, you will be able to:
Regression is a fundamental supervised learning technique in machine learning that predicts continuous numerical values based on input data. Unlike classification, which assigns inputs to discrete categories (e.g., "cat" or "dog"), regression estimates quantities along a continuous scale (e.g., temperature, price, or coordinate position). This makes regression essential for applications requiring precise numerical predictions rather than categorical decisions.
In the context of computer vision and robotics, regression enables systems to predict positions, trajectories, distances, and other continuous measurements. For example, an autonomous vehicle uses regression to predict the continuous steering angle needed to follow a road, while a robotic arm uses regression to estimate the precise coordinates where it should move to grasp an object.
The fundamental distinction between classification and regression lies in their output types:
Classification predicts discrete labels or categories. The model assigns each input to one of a predefined set of classes. For example, determining whether an image contains a cat, dog, or bird is a classification problem. The output is categorical, and performance is typically measured using accuracy, precision, recall, or F1-score.
Regression predicts continuous numerical values. The model estimates a quantity that can take any value within a range. For example, predicting the exact (x, y) pixel coordinates of a hand in an image is a regression problem. The output is numerical, and performance is measured using metrics like Mean Squared Error (MSE), Mean Absolute Error (MAE), or R-squared.
In this laboratory, we apply regression to predict the precise pixel coordinates of body parts in camera images—a problem that requires continuous predictions rather than discrete classifications.
While neural networks are often associated with classification tasks, they are equally powerful for regression. The key differences when using neural networks for regression include:
Output Layer Architecture: Regression networks typically use a linear output layer (no activation function), allowing the network to produce any real-valued number. For predicting XY coordinates, the output layer has two neurons: one for the x-coordinate and one for the y-coordinate.
Loss Functions: Instead of the cross-entropy loss used in classification, regression employs loss functions that measure the distance between predicted and actual continuous values. Mean Squared Error (MSE) is the most common; it is the average of the squared differences between predictions and targets. Because errors are squared, MSE penalizes larger errors more heavily, pushing the model to avoid large deviations.
Evaluation Metrics: Regression performance is assessed using metrics such as MSE, Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE), which quantify how close predictions are to the actual values. RMSE and MAE are expressed in the original units of measurement (pixels, for coordinate prediction), whereas MSE is expressed in squared units.
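To make these metrics concrete, here is a small pure-Python sketch. It assumes predictions and targets are lists of (x, y) pairs in pixels and averages over every individual coordinate (the same element-wise averaging PyTorch's `nn.MSELoss` applies with its default `reduction='mean'`); averaging Euclidean point distances instead is another common convention. `regression_metrics` is a hypothetical helper, not part of the lab notebook.

```python
import math


def regression_metrics(predictions, targets):
    """Return MSE, RMSE, and MAE for paired (x, y) coordinate predictions.

    Errors are computed per coordinate, so each (x, y) pair contributes
    two differences to the averages.
    """
    if len(predictions) != len(targets) or not predictions:
        raise ValueError("need the same, non-zero number of predictions and targets")
    diffs = [p - t
             for pred, target in zip(predictions, targets)
             for p, t in zip(pred, target)]
    mse = sum(d * d for d in diffs) / len(diffs)    # squared pixels
    mae = sum(abs(d) for d in diffs) / len(diffs)   # pixels
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae}


m = regression_metrics([(100, 50), (200, 80)], [(102, 50), (200, 86)])
print(m)  # {'mse': 10.0, 'rmse': 3.1622776601683795, 'mae': 2.0}
```

Here the single 6-pixel miss contributes 36 of the 40 summed squared error (90%) but only 6 of the 8 summed absolute error (75%), which is MSE's heavier penalty on large errors expressed in numbers. RMSE (about 3.16 px) and MAE (2 px) are both in pixels.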
Transfer learning, which uses pre-trained models as starting points, is not limited to classification. We can adapt classification models such as ResNet-18 for regression tasks by modifying the final layer. A ResNet-18 pre-trained on ImageNet (more than a million labeled images) has learned to extract rich visual features such as edges, textures, shapes, and object parts. These learned features are also useful for regression tasks.
To convert a classification model to regression, we replace the final fully connected layer. Instead of producing scores for 1000 classes, the modified layer outputs continuous values. For XY coordinate prediction, we replace the original (512, 1000) classification layer (512 input features, 1000 outputs) with a (512, 2) regression layer, whose two outputs represent the x and y coordinates. The rest of the network remains unchanged, so we keep the pre-trained feature extractor and adapt only the final prediction layer to our specific regression task.
Object tracking involves following the position of objects across video frames. Traditional computer vision approaches used techniques such as optical flow or template matching. Modern deep learning approaches use regression to predict object coordinates directly and, when trained on sufficiently varied data, can be more robust to appearance changes, occlusions, and varying backgrounds.
In this laboratory, we track body parts (hands, nose, etc.) by training a regression model to predict their (x, y) pixel coordinates. The process involves three stages:
1. Data Collection: We capture camera images and manually click on the target body part (e.g., hand) in each image. The system records both the image and the clicked coordinates as a labeled training example. Collecting diverse examples with the body part in different positions, angles, and backgrounds helps the model generalize.
2. Model Training: Using the collected data, we train a regression neural network to predict coordinates from images. The model learns to identify visual patterns associated with the target body part and map them to spatial locations. Transfer learning accelerates this process by starting with pre-trained feature extractors.
3. Live Inference: Once trained, the model processes live camera frames, predicting the coordinates of the target body part in real-time. These predictions can drive robotic actions, gesture controls, or visual feedback—making the system interactive and responsive.
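The training and inference stages above can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the notebook's code: `train_one_epoch` and `predict_xy` are hypothetical helper names, the loader is assumed to yield `(images, targets)` batches (as a `torch.utils.data.DataLoader` would) with targets already normalized, and a toy model with random tensors stands in for ResNet-18 and real camera data so the example runs anywhere.

```python
import torch
import torch.nn as nn


def train_one_epoch(model, loader, optimizer, device="cpu"):
    """Stage 2: one pass over (images, xy_targets) batches, minimizing MSE.

    Assumes the model is already on `device`. Returns the mean per-batch loss.
    """
    model.train()
    loss_fn = nn.MSELoss()
    total, batches = 0.0, 0
    for images, targets in loader:
        images, targets = images.to(device), targets.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(images), targets)  # predictions and targets: (batch, 2)
        loss.backward()
        optimizer.step()
        total += loss.item()
        batches += 1
    return total / max(batches, 1)


@torch.no_grad()
def predict_xy(model, frame, device="cpu"):
    """Stage 3: predict normalized (x, y) for one preprocessed frame of shape (3, H, W)."""
    model.eval()
    x, y = model(frame.unsqueeze(0).to(device))[0].tolist()
    return x, y


# Toy stand-ins (assumptions): a tiny linear model and random tensors replace
# ResNet-18 and labeled camera images so the sketch runs on any machine.
torch.manual_seed(0)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 2))
images = torch.randn(16, 3, 8, 8)
targets = torch.rand(16, 2) * 2 - 1  # normalized coordinates in [-1, 1]
loader = [(images[i:i + 4], targets[i:i + 4]) for i in range(0, 16, 4)]
optimizer = torch.optim.Adam(toy_model.parameters(), lr=1e-3)
for epoch in range(3):
    print(f"epoch {epoch}: mean MSE {train_one_epoch(toy_model, loader, optimizer):.4f}")
print("predicted (x, y):", predict_xy(toy_model, images[0]))
```

In the real exercise the toy model would be the modified ResNet-18 and each frame would be resized and normalized the same way as the training images before calling the prediction step.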
When predicting pixel coordinates, we often normalize values to improve training stability and model performance. Instead of predicting raw pixel values (e.g., x from 0 to 640), we normalize coordinates to a standard range like [-1, 1] or [0, 1]. This normalization helps the neural network learn more effectively by keeping values in a consistent range across different image resolutions.
During inference, we convert the normalized predictions back to pixel coordinates for visualization and downstream use. With the [-1, 1] convention, the conversion uses the image dimensions: x_pixel = (x_norm + 1) * image_width / 2 and y_pixel = (y_norm + 1) * image_height / 2. For example, on a 640x480 frame, a predicted normalized position of (0.5, -0.2) maps to the pixel location (480, 192).
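The two conversions can be written as a pair of helpers. This sketch assumes the [-1, 1] convention used in the formula above, with the frame width and height passed in explicitly; `normalize_xy` and `denormalize_xy` are hypothetical names, and the lab notebook may organize this differently.

```python
def normalize_xy(x_pixel, y_pixel, width, height):
    """Map pixel coordinates in [0, width] x [0, height] to [-1, 1] x [-1, 1]."""
    return 2.0 * x_pixel / width - 1.0, 2.0 * y_pixel / height - 1.0


def denormalize_xy(x_norm, y_norm, width, height):
    """Inverse of normalize_xy: x_pixel = (x_norm + 1) * width / 2, likewise for y."""
    return (x_norm + 1.0) * width / 2.0, (y_norm + 1.0) * height / 2.0


# On a 640x480 frame, a click at the image center maps to (0, 0).
print(normalize_xy(320, 240, 640, 480))     # (0.0, 0.0)
print(denormalize_xy(0.5, -0.2, 640, 480))  # approximately (480.0, 192.0)
```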
Beyond object tracking, regression appears throughout AI applications: autonomous vehicles predict steering angles and throttle positions; medical imaging systems estimate tumor sizes; financial systems forecast stock prices; and recommender systems predict user ratings. Understanding regression provides a foundation for countless real-world AI deployments where precise numerical predictions are essential.
Before starting the laboratory exercises, complete the following knowledge assessment quiz. These questions test your understanding of regression concepts, neural networks for continuous prediction, and the differences between classification and regression tasks.
Instructions: Answer the following 10 questions to assess your readiness for the regression laboratory.
This laboratory consists of one comprehensive interactive exercise that guides you through the complete regression workflow—from data collection to live inference. Follow the instructions carefully and ensure you understand each concept before proceeding to the next step.
In this comprehensive exercise, you will build an end-to-end regression system for real-time body part tracking. The interactive Jupyter notebook guides you through camera setup, data collection with coordinate labeling, model training using transfer learning, and live inference that tracks body parts such as a hand or the nose in the real-time camera feed.
This laboratory uses the NVIDIA Jetson Orin Nano Developer Kit assembled in Week 1, with all necessary software pre-configured by the lab technician.
All required software has been pre-installed and configured on your Jetson Orin Nano:
Your Jetson Orin Nano is ready to use. Open JupyterLab in your web browser, open the exercise notebook, and follow the instructions. Make sure your camera is connected before starting the data collection phase; the notebook includes detailed guidance for both USB and CSI cameras.
Prepare a comprehensive laboratory report documenting your regression experiments, findings, and analysis. Your report should demonstrate understanding of regression concepts, implementation skills, and critical evaluation of model performance.
For the interactive regression exercise:
| Component | Points | Criteria |
|---|---|---|
| Title Page & Formatting | 5 | Complete information, professional appearance |
| Objectives | 10 | Clear understanding of goals, well-articulated |
| Procedure & Results | 50 | Complete exercise, correct implementation, clear documentation |
| Discussion | 20 | Thoughtful analysis, evidence-based conclusions |
| Challenges & Solutions | 10 | Detailed problem-solving, reflective learning |
| Conclusion | 5 | Insightful summary, meaningful reflections |
| Total | 100 | |