Week 9: Regression - Real-Time Object Tracking with Neural Networks

📋

Laboratory Overview

This laboratory introduces students to regression techniques in machine learning, focusing on predicting continuous numerical values rather than discrete classifications. Using the NVIDIA Jetson Orin Nano and camera integration, students will develop real-time object tracking systems that predict the position of body parts (such as a hand or the nose) in camera frames. This hands-on experience bridges theoretical regression concepts with practical computer vision applications essential for autonomous systems, robotics, and human-computer interaction.

What You'll Learn

Regression Fundamentals: Understand how regression differs from classification and when to use each approach
Interactive Data Collection: Capture labeled training data by clicking on target points in camera images
XY Coordinate Prediction: Build models that predict continuous (x, y) coordinates for object tracking
Transfer Learning for Regression: Adapt pre-trained ResNet-18 models for regression tasks
Real-Time Inference: Deploy regression models for live object tracking with camera feed
Model Evaluation: Assess regression performance using appropriate loss metrics and visualization

💡 Why This Matters

Regression is fundamental to many AI applications beyond classification. Autonomous vehicles need to predict continuous trajectories, robotic systems require precise position estimation, and human-computer interfaces depend on accurate gesture tracking. This laboratory provides practical experience with regression techniques that are essential for developing intelligent systems that interact with the physical world. By mastering regression on edge devices, you're preparing for real-world scenarios where AI systems must make continuous predictions in real-time with limited computational resources.

Lab Structure

This laboratory consists of one comprehensive interactive exercise that progresses through the complete regression workflow:

Part 1: Interactive Regression System - Complete data collection, model training, and live inference for body part tracking using camera and neural networks

🎯

Learning Objectives

By the end of this laboratory session, you will be able to:

Distinguish between regression and classification problems, understanding when to predict continuous values versus discrete categories in machine learning applications.
Implement interactive data collection systems using camera integration and coordinate labeling to create training datasets for regression models.
Build and train regression neural networks using PyTorch and transfer learning with ResNet-18 to predict continuous XY coordinates.
Adapt pre-trained classification models for regression tasks by modifying output layers and loss functions appropriately.
Deploy regression models for real-time inference on edge devices, performing live object tracking with camera input.
Evaluate regression model performance using appropriate metrics and visualization techniques for continuous predictions.
Apply regression techniques to practical applications such as gesture recognition, body part tracking, and position estimation for robotics and autonomous systems.

📚

Background

Introduction to Regression

Regression is a fundamental supervised learning technique in machine learning that predicts continuous numerical values based on input data. Unlike classification, which assigns inputs to discrete categories (e.g., "cat" or "dog"), regression estimates quantities along a continuous scale (e.g., temperature, price, or coordinate position). This makes regression essential for applications requiring precise numerical predictions rather than categorical decisions.

In the context of computer vision and robotics, regression enables systems to predict positions, trajectories, distances, and other continuous measurements. For example, an autonomous vehicle uses regression to predict the continuous steering angle needed to follow a road, while a robotic arm uses regression to estimate the precise coordinates where it should move to grasp an object.

Classification vs Regression

The fundamental distinction between classification and regression lies in their output types:

Classification predicts discrete labels or categories. The model assigns each input to one of a predefined set of classes. For example, determining whether an image contains a cat, dog, or bird is a classification problem. The output is categorical, and performance is typically measured using accuracy, precision, recall, or F1-score.

Regression predicts continuous numerical values. The model estimates a quantity that can take any value within a range. For example, predicting the exact (x, y) pixel coordinates of a hand in an image is a regression problem. The output is numerical, and performance is measured using metrics like Mean Squared Error (MSE), Mean Absolute Error (MAE), or R-squared.
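To make the distinction concrete, the sketch below interprets raw network outputs both ways: a classifier picks the single highest-scoring class, while a regressor uses its output values directly as numbers. The class names, scores, and coordinate values are made-up illustrations, not outputs from the lab's model.

```python
def classify(scores, class_names):
    """Classification: return the class whose score is highest (a discrete label)."""
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return class_names[best_index]


def regress_xy(outputs):
    """Regression: use the two raw output values directly as continuous (x, y) coordinates."""
    x, y = outputs
    return (x, y)


# Hypothetical classifier scores for three classes.
label = classify([0.2, 2.7, -1.1], ["cat", "dog", "bird"])
print(label)  # dog

# Hypothetical regression outputs: normalized (x, y) coordinates.
point = regress_xy([0.31, -0.58])
print(point)  # (0.31, -0.58)
```

The classifier's answer can only ever be one of the three names; the regressor's answer can be any pair of real numbers.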

In this laboratory, we apply regression to predict the precise pixel coordinates of body parts in camera images—a problem that requires continuous predictions rather than discrete classifications.

Neural Networks for Regression

While neural networks are often associated with classification tasks, they are equally powerful for regression. The key differences when using neural networks for regression include:

Output Layer Architecture: Regression networks typically use linear activation (no activation function) in the output layer, allowing the network to produce any real-valued number. For predicting XY coordinates, the output layer has two neurons—one for the x-coordinate and one for the y-coordinate.

Loss Functions: Instead of cross-entropy loss used in classification, regression employs loss functions that measure the distance between predicted and actual continuous values. Mean Squared Error (MSE) is the most common, calculated as the average of squared differences between predictions and targets. MSE penalizes larger errors more heavily, encouraging the model to minimize significant deviations.

Evaluation Metrics: Regression performance is assessed using metrics like MSE, Root Mean Squared Error (RMSE), or Mean Absolute Error (MAE), which quantify how close predictions are to actual values. RMSE and MAE are expressed in the original units of measurement (pixels, for coordinate prediction), while MSE is expressed in squared units.
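MSE averages the squared error over all n predicted values, MSE = (1/n) Σ (prediction − target)², where each (x, y) pair contributes two values. The sketch below computes MSE, RMSE, and MAE for a few predicted and labeled pixel coordinates; the numbers are invented for illustration. Note the units: MSE comes out in squared pixels, while RMSE and MAE come out in pixels, which makes them easier to read as "how far off" a prediction typically is.

```python
import math


def regression_metrics(predicted, actual):
    """Return (MSE, RMSE, MAE) over every x and y value in the given (x, y) pairs."""
    errors = []
    for (px, py), (ax, ay) in zip(predicted, actual):
        errors.append(px - ax)
        errors.append(py - ay)
    mse = sum(e * e for e in errors) / len(errors)   # squared pixels
    rmse = math.sqrt(mse)                             # pixels
    mae = sum(abs(e) for e in errors) / len(errors)   # pixels
    return mse, rmse, mae


# Hypothetical model predictions and clicked labels, in pixels.
predicted = [(120, 80), (300, 210), (455, 330)]
actual = [(124, 77), (300, 214), (450, 330)]

mse, rmse, mae = regression_metrics(predicted, actual)
print(f"MSE = {mse:.2f} px^2, RMSE = {rmse:.2f} px, MAE = {mae:.2f} px")
# MSE = 11.00 px^2, RMSE = 3.32 px, MAE = 2.67 px
```

Because the errors are squared, the single 5-pixel miss contributes more to MSE and RMSE than to MAE, which is the "penalizes larger errors more heavily" behavior described above.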

Transfer Learning for Regression

Transfer learning, which uses pre-trained models as starting points, is not limited to classification. We can adapt classification models like ResNet-18 for regression tasks by modifying the final layer. A pre-trained ResNet-18 has learned to extract rich visual features from images (edges, textures, shapes, object parts) through training on the ImageNet dataset of over a million labeled images. These learned features are equally valuable for regression tasks.

To convert a classification model to regression, we replace the final fully connected layer. Instead of outputting 1000 class scores (which a softmax turns into class probabilities), the modified layer outputs continuous values. For XY coordinate prediction, we replace the original (512, 1000) classification layer, which maps 512 features to 1000 classes, with a (512, 2) regression layer whose two outputs represent the x and y coordinates. The rest of the network's architecture remains unchanged, so we keep the pre-trained feature extraction capabilities and replace only the final prediction layer for our specific regression task.

Real-Time Object Tracking with Regression

Object tracking involves following the position of objects across video frames. Traditional computer vision approaches used techniques like optical flow or template matching. Modern deep learning approaches can instead use regression to predict object coordinates directly; given sufficiently varied training data, such models tend to be more robust to appearance changes, partial occlusions, and varying backgrounds.

In this laboratory, we track body parts (hands, nose, etc.) by training a regression model to predict their (x, y) pixel coordinates. The process involves three stages:

1. Data Collection: We capture camera images and manually click on the target body part (e.g., hand) in each image. The system records both the image and the clicked coordinates as a labeled training example. Collecting diverse examples with the body part in different positions, angles, and backgrounds helps the model generalize.

2. Model Training: Using the collected data, we train a regression neural network to predict coordinates from images. The model learns to identify visual patterns associated with the target body part and map them to spatial locations. Transfer learning accelerates this process by starting with pre-trained feature extractors.

3. Live Inference: Once trained, the model processes live camera frames, predicting the coordinates of the target body part in real-time. These predictions can drive robotic actions, gesture controls, or visual feedback—making the system interactive and responsive.
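The training stage amounts to repeatedly comparing predicted and labeled coordinates with MSE and nudging the weights to shrink that error. The sketch below shows that loop with PyTorch's `nn.MSELoss` and the Adam optimizer. To stay self-contained it uses random tensors and a single linear layer as stand-ins for camera images and ResNet-18, so the data, model, learning rate, and epoch count are all illustrative assumptions rather than the lab's settings.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in data: 64 "images" flattened to 16 features, each labeled with a
# normalized (x, y) target in (-1, 1). Real data would come from the click-labeled dataset.
inputs = torch.randn(64, 16)
true_weights = torch.randn(16, 2)
targets = torch.tanh(inputs @ true_weights)

# Stand-in model: one linear layer producing two outputs (x, y).
model = nn.Linear(16, 2)
loss_fn = nn.MSELoss()  # default reduction="mean": averages over every x and y value
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

losses = []
for epoch in range(200):
    optimizer.zero_grad()
    predictions = model(inputs)           # shape (64, 2)
    loss = loss_fn(predictions, targets)
    loss.backward()                       # gradients of the MSE w.r.t. the weights
    optimizer.step()                      # update weights to reduce the MSE
    losses.append(loss.item())

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

The recorded `losses` list is the kind of data you would plot as a training loss curve for the lab report.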

Coordinate Normalization

When predicting pixel coordinates, we often normalize values to improve training stability and model performance. Instead of predicting raw pixel values (e.g., x from 0 to 640 for a 640-pixel-wide frame), we normalize coordinates to a standard range like [-1, 1] or [0, 1]. This keeps the targets small and on the same scale regardless of image resolution, which helps the neural network learn more effectively.

During inference, we convert the normalized predictions back to pixel coordinates for visualization and use. For coordinates normalized to [-1, 1], the conversion uses the image dimensions: x_pixel = (x_norm + 1) * image_width / 2 and y_pixel = (y_norm + 1) * image_height / 2. For example, a predicted (0.5, -0.2) on a 640 × 480 frame maps to pixel (480, 192).
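Both directions of the conversion fit in two small functions. This sketch assumes the [-1, 1] range with pixel (0, 0) at the top-left corner of the frame; whether the lab notebook uses exactly this convention is an assumption, and the 640 × 480 frame size is only an example.

```python
def normalize(x_pixel, y_pixel, width, height):
    """Map pixel coordinates to [-1, 1] (left/top edge -> -1, right/bottom edge -> 1)."""
    x_norm = 2.0 * x_pixel / width - 1.0
    y_norm = 2.0 * y_pixel / height - 1.0
    return x_norm, y_norm


def denormalize(x_norm, y_norm, width, height):
    """Inverse of normalize(): map [-1, 1] coordinates back to pixels."""
    x_pixel = (x_norm + 1.0) * width / 2.0
    y_pixel = (y_norm + 1.0) * height / 2.0
    return x_pixel, y_pixel


# Example prediction (0.5, -0.2) on an assumed 640 x 480 frame.
x_px, y_px = denormalize(0.5, -0.2, 640, 480)
print(round(x_px, 1), round(y_px, 1))  # 480.0 192.0
```

Applying `normalize` to the clicked labels before training and `denormalize` to the model's outputs at inference keeps the network working in one consistent range.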

Applications of Regression in AI

Beyond object tracking, regression appears throughout AI applications: autonomous vehicles predict steering angles and throttle positions; medical imaging systems estimate tumor sizes; financial systems forecast stock prices; and recommender systems predict user ratings. Understanding regression provides a foundation for countless real-world AI deployments where precise numerical predictions are essential.

📝

Pre-lab Preparation

Before starting the laboratory exercises, complete the following knowledge assessment quiz. These questions test your understanding of regression concepts, neural networks for continuous prediction, and the differences between classification and regression tasks.

📝 Pre-Lab Knowledge Assessment

Instructions: Answer the following 10 questions to assess your readiness for the regression laboratory. Click on your chosen answer to see if it's correct.

Question 1: What is the primary difference between classification and regression in machine learning?

A) Classification is more accurate than regression
B) Classification predicts discrete categories while regression predicts continuous values
C) Regression can only be used with neural networks
D) Classification requires more training data than regression

Question 2: Which loss function is most commonly used for training regression neural networks?

A) Cross-Entropy Loss
B) Mean Squared Error (MSE)
C) Binary Cross-Entropy Loss
D) Hinge Loss

Question 3: When predicting XY coordinates for object tracking, how many output neurons should the final layer have?

A) 1 neuron (for both x and y)
B) 2 neurons (one for x, one for y)
C) 3 neurons (x, y, and z)
D) 1000 neurons (one for each possible coordinate)

Question 4: What activation function is typically used in the output layer of a regression neural network?

A) ReLU activation
B) Sigmoid activation
C) Softmax activation
D) Linear activation (no activation function)

Question 5: Why is transfer learning useful for regression tasks in computer vision?

A) It eliminates the need for any training data
B) Pre-trained models have already learned visual features that can be adapted for regression
C) Transfer learning only works for classification, not regression
D) It reduces the model's accuracy but increases speed

Question 6: In the context of object tracking, what does the regression model predict?

A) The object's class label
B) The continuous (x, y) coordinates of the object's position
C) Whether the object is present or absent
D) The object's RGB color values

Question 7: What is the purpose of coordinate normalization in regression tasks?

A) To reduce image file sizes
B) To keep coordinate values in a consistent range for better training stability
C) To convert coordinates to integer values
D) To eliminate the need for a loss function

Question 8: Which of the following is an example of a regression problem in autonomous systems?

A) Detecting whether a traffic light is red or green
B) Predicting the steering angle for a self-driving car
C) Classifying pedestrians vs vehicles
D) Identifying road signs

Question 9: How does data collection for regression differ from classification?

A) Regression requires 10x more data than classification
B) Each training example must be labeled with continuous numerical values instead of discrete categories
C) Regression doesn't need labeled data
D) Data collection is identical for both tasks

Question 10: What metric would best evaluate a model predicting pixel coordinates in an image?

A) Classification accuracy
B) Precision and recall
C) Mean Squared Error (MSE) or Mean Absolute Error (MAE)
D) F1-score

⚙️

Lab Procedure

This laboratory consists of one comprehensive interactive exercise that guides you through the complete regression workflow—from data collection to live inference. Follow the instructions carefully and ensure you understand each concept before proceeding to the next step.

Part 1: Interactive Regression for Body Part Tracking

In this comprehensive exercise, you will build an end-to-end regression system for real-time body part tracking. The interactive Jupyter notebook guides you through camera setup, data collection with coordinate labeling, model training using transfer learning, and live inference for tracking body parts such as a hand or the nose in a live camera feed.

🔑 Key Concepts Covered:

Setting up camera integration (USB or CSI camera) on Jetson Orin Nano
Creating interactive data collection interfaces with ipywidgets
Labeling training data by clicking on target body parts in images
Defining XY coordinate datasets for multiple tracking categories
Implementing coordinate normalization for neural network training
Adapting ResNet-18 from classification to regression with transfer learning
Training regression models with MSE loss and appropriate optimizers
Visualizing training progress and model performance
Deploying trained models for real-time coordinate prediction
Processing live camera feed with continuous position tracking
Overlaying predictions on video frames for visual feedback
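When you click the target in a captured frame, data collection has to store the image together with its coordinate label. The sketch below shows one hypothetical way to do that: it records the clicked pixel position in the image filename and later recovers a normalized training target from it. The `xy_<x>_<y>_<id>.jpg` pattern and the function names are illustrative assumptions, not the lab notebook's actual code.

```python
import re
import uuid

# Hypothetical filename pattern: xy_<x pixel>_<y pixel>_<unique id>.jpg
FILENAME_PATTERN = re.compile(r"^xy_(\d+)_(\d+)_[0-9a-f]+\.jpg$")


def label_filename(x_click, y_click):
    """Build a unique filename that records where the user clicked (in pixels)."""
    return f"xy_{int(x_click):03d}_{int(y_click):03d}_{uuid.uuid4().hex}.jpg"


def target_from_filename(filename, width, height):
    """Parse the clicked pixel position and return it normalized to [-1, 1]."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not a labeled image name: {filename}")
    x_pixel, y_pixel = int(match.group(1)), int(match.group(2))
    return 2.0 * x_pixel / width - 1.0, 2.0 * y_pixel / height - 1.0


# A hypothetical click at pixel (168, 56) in a 224 x 224 frame.
name = label_filename(168, 56)
print(target_from_filename(name, 224, 224))  # (0.5, -0.5)
```

Keeping the label in the filename means a folder of images is the whole dataset; a separate CSV or JSON file of labels is an equally valid design.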

⚠️ Important: Make sure your camera is properly connected before starting the notebook. The notebook includes detailed instructions for both USB and CSI camera types. Collect at least 30 to 50 labeled examples per tracking category for effective model training. Vary the position, angle, and background of the tracked body part to help the model generalize.

📓 Open Interactive Exercise

🛠️

Lab Materials

Hardware Requirements

This laboratory uses the NVIDIA Jetson Orin Nano Developer Kit assembled in Week 1, with all necessary software pre-configured by the lab technician.

NVIDIA Jetson Orin Nano Developer Kit (assembled and configured)
USB Camera or CSI Camera (compatible with Jetson)
Power supply for Jetson Orin Nano
Mouse and keyboard for data labeling interactions
Network connection (Wi-Fi or Ethernet) for JupyterLab access

Software Environment

All required software has been pre-installed and configured on your Jetson Orin Nano:

JetPack SDK: NVIDIA's comprehensive AI software stack
PyTorch: Deep learning framework for model training and inference
torchvision: Computer vision utilities and pre-trained models
JupyterLab: Interactive development environment for notebooks
ipywidgets: Interactive widgets for data collection interfaces
ipyevents: Event handling for mouse interactions in widgets
OpenCV: Computer vision library for image processing
Custom utilities: Dataset classes and preprocessing functions

Exercise Files

regression_interactive_ipevent.html: Interactive notebook for complete regression workflow including camera setup, data collection, training, and live inference

💡 Setup Notes

Your Jetson Orin Nano is ready to use with all software pre-configured. Simply access JupyterLab through your web browser, open the exercise notebook, and follow the instructions. Make sure your camera is connected before starting the data collection phase. The notebook includes detailed guidance for both USB and CSI camera types.

📖

References

Primary Resources

NVIDIA Deep Learning Institute: Getting Started with AI on Jetson Nano
Comprehensive course covering AI fundamentals, transfer learning, and regression techniques on NVIDIA Jetson platforms

PyTorch Documentation: Neural Network Regression
Official PyTorch guides for implementing regression models, loss functions, and training procedures

NVIDIA JetBot Documentation: Interactive Regression Notebooks
Open-source notebooks demonstrating regression-based object following and coordinate prediction

Additional Resources

NVIDIA Jetson Orin Nano Developer Kit Documentation
Hardware specifications, software stack, and optimization guides

PyTorch Tutorials: Training Neural Networks
Step-by-step guides for model training, evaluation, and deployment

Computer Vision for Robotics
Applications of regression in robotic perception and control systems

📄

Lab Report

Prepare a comprehensive laboratory report documenting your regression experiments, findings, and analysis. Your report should demonstrate understanding of regression concepts, implementation skills, and critical evaluation of model performance.

📋 Report Structure

1. Title Page & Course Information (5 points)

Course name, number, and section
Experiment title and week number
Your name and student ID
Submission date
Instructor name

2. Objectives Summary (10 points)

Restate laboratory objectives in your own words
Explain the purpose of learning regression techniques
Describe expected outcomes and skills to be developed

3. Procedure & Results (50 points)

For the interactive regression exercise:

Document your data collection process with screenshots
Show examples of labeled training data (images with coordinate annotations)
Include code snippets for model architecture modification
Present training loss curves and convergence analysis
Provide screenshots of live inference results with predictions overlaid
Demonstrate model performance with different body parts or scenarios
Compare coordinate prediction accuracy across training examples
Explain any preprocessing or normalization steps applied

4. Discussion (20 points)

Compare regression vs classification approaches for tracking tasks
Analyze how data collection quality affects model performance
Discuss the role of transfer learning in accelerating regression training
Evaluate the accuracy of coordinate predictions in different scenarios
Explain why MSE is appropriate for this regression task
Discuss potential real-world applications of this tracking system
Support your analysis with quantitative results and visual evidence

5. Challenges & Solutions (10 points)

Describe technical challenges encountered (camera setup, data labeling, training issues)
Explain your debugging and problem-solving process
Discuss strategies for improving tracking accuracy
Reflect on lessons learned about regression model development

6. Conclusion (5 points)

Summarize key insights about regression in computer vision
Reflect on understanding of continuous vs discrete predictions
Discuss potential improvements or extensions to the tracking system
Identify possible applications in robotics or autonomous systems

📋 Submission Checklist

✓ Completed interactive regression exercise with data collection and training
✓ Included clear screenshots of data labeling interface
✓ Documented training process with loss curves
✓ Provided live inference results with coordinate predictions
✓ Analyzed regression performance with quantitative metrics
✓ Discussed differences between regression and classification
✓ All code snippets are properly formatted and explained
✓ Report follows professional formatting standards

📤 Submission Format

File Format: PDF (required)
Code Files: Include Jupyter notebook (.ipynb) in ZIP archive
File Naming: Week9_[LastName]_[StudentID].pdf
Submission Method: Upload to Learning Management System (LMS)
File Size Limit: 50MB maximum

📊 Grading Rubric

Component	Points	Criteria
Title Page & Formatting	5	Complete information, professional appearance
Objectives	10	Clear understanding of goals, well-articulated
Procedure & Results	50	Complete exercise, correct implementation, clear documentation
Discussion	20	Thoughtful analysis, evidence-based conclusions
Challenges & Solutions	10	Detailed problem-solving, reflective learning
Conclusion	5	Insightful summary, meaningful reflections
Total	100

Grading Notes:

All code must execute without errors
Screenshots must clearly show data collection and predictions
Analysis must demonstrate understanding of regression concepts
Late penalty: 10% per day (maximum 3 days accepted)
Plagiarism or unauthorized collaboration: zero credit
