Week 9
Experiment 9
Regression
Real-Time Object Tracking with Neural Networks

Laboratory Overview

This laboratory introduces students to regression techniques in machine learning, focusing on predicting continuous numerical values rather than discrete classifications. Using the NVIDIA Jetson Orin Nano with an attached camera, students will develop real-time object tracking systems that predict the position of a body part (such as a hand or the nose) in camera frames. This hands-on experience bridges theoretical regression concepts with practical computer vision applications essential for autonomous systems, robotics, and human-computer interaction.

What You'll Learn

  • Regression Fundamentals: Understand how regression differs from classification and when to use each approach
  • Interactive Data Collection: Capture labeled training data by clicking on target points in camera images
  • XY Coordinate Prediction: Build models that predict continuous (x, y) coordinates for object tracking
  • Transfer Learning for Regression: Adapt pre-trained ResNet-18 models for regression tasks
  • Real-Time Inference: Deploy regression models for live object tracking with camera feed
  • Model Evaluation: Assess regression performance using appropriate loss metrics and visualization

💡 Why This Matters

Regression is fundamental to many AI applications beyond classification. Autonomous vehicles need to predict continuous trajectories, robotic systems require precise position estimation, and human-computer interfaces depend on accurate gesture tracking. This laboratory provides practical experience with regression techniques that are essential for developing intelligent systems that interact with the physical world. By mastering regression on edge devices, you're preparing for real-world scenarios where AI systems must make continuous predictions in real-time with limited computational resources.

Lab Structure

This laboratory consists of one comprehensive interactive exercise that progresses through the complete regression workflow:

  • Part 1: Interactive Regression System - Complete data collection, model training, and live inference for body part tracking using camera and neural networks

Learning Objectives

By the end of this laboratory session, you will be able to:

  • Distinguish between regression and classification problems, understanding when to predict continuous values versus discrete categories in machine learning applications.
  • Implement interactive data collection systems using camera integration and coordinate labeling to create training datasets for regression models.
  • Build and train regression neural networks using PyTorch and transfer learning with ResNet-18 to predict continuous XY coordinates.
  • Adapt pre-trained classification models for regression tasks by modifying output layers and loss functions appropriately.
  • Deploy regression models for real-time inference on edge devices, performing live object tracking with camera input.
  • Evaluate regression model performance using appropriate metrics and visualization techniques for continuous predictions.
  • Apply regression techniques to practical applications such as gesture recognition, body part tracking, and position estimation for robotics and autonomous systems.

Background

Introduction to Regression

Regression is a fundamental supervised learning technique in machine learning that predicts continuous numerical values based on input data. Unlike classification, which assigns inputs to discrete categories (e.g., "cat" or "dog"), regression estimates quantities along a continuous scale (e.g., temperature, price, or coordinate position). This makes regression essential for applications requiring precise numerical predictions rather than categorical decisions.

In the context of computer vision and robotics, regression enables systems to predict positions, trajectories, distances, and other continuous measurements. For example, an autonomous vehicle uses regression to predict the continuous steering angle needed to follow a road, while a robotic arm uses regression to estimate the precise coordinates where it should move to grasp an object.

Classification vs Regression

The fundamental distinction between classification and regression lies in their output types:

Classification predicts discrete labels or categories. The model assigns each input to one of a predefined set of classes. For example, determining whether an image contains a cat, dog, or bird is a classification problem. The output is categorical, and performance is typically measured using accuracy, precision, recall, or F1-score.

Regression predicts continuous numerical values. The model estimates a quantity that can take any value within a range. For example, predicting the exact (x, y) pixel coordinates of a hand in an image is a regression problem. The output is numerical, and performance is measured using metrics like Mean Squared Error (MSE), Mean Absolute Error (MAE), or R-squared.

In this laboratory, we apply regression to predict the precise pixel coordinates of body parts in camera images—a problem that requires continuous predictions rather than discrete classifications.

Neural Networks for Regression

While neural networks are often associated with classification tasks, they are equally powerful for regression. The key differences when using neural networks for regression include:

Output Layer Architecture: Regression networks typically use linear activation (no activation function) in the output layer, allowing the network to produce any real-valued number. For predicting XY coordinates, the output layer has two neurons—one for the x-coordinate and one for the y-coordinate.

Loss Functions: Instead of cross-entropy loss used in classification, regression employs loss functions that measure the distance between predicted and actual continuous values. Mean Squared Error (MSE) is the most common, calculated as the average of squared differences between predictions and targets. MSE penalizes larger errors more heavily, encouraging the model to minimize significant deviations.

Evaluation Metrics: Regression performance is assessed using metrics like MSE, Root Mean Squared Error (RMSE), or Mean Absolute Error (MAE), which quantify how close predictions are to actual values. RMSE and MAE are expressed in the original units of measurement (pixels, for raw pixel coordinates), while MSE is expressed in squared units.
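To make the three metrics concrete, here is a minimal pure-Python sketch that computes them for a handful of predicted and labeled (x, y) points. The function names and sample values are illustrative assumptions, not part of the lab notebook, which would normally rely on PyTorch's built-in loss functions.

```python
import math

def mse(preds, targets):
    """Mean of squared errors over every coordinate of every point."""
    sq = [(p - t) ** 2
          for pred, target in zip(preds, targets)
          for p, t in zip(pred, target)]
    return sum(sq) / len(sq)

def rmse(preds, targets):
    """Square root of MSE, back in the original units (e.g. pixels)."""
    return math.sqrt(mse(preds, targets))

def mae(preds, targets):
    """Mean of absolute errors over every coordinate of every point."""
    ab = [abs(p - t)
          for pred, target in zip(preds, targets)
          for p, t in zip(pred, target)]
    return sum(ab) / len(ab)

# Hypothetical pixel predictions and click labels for two frames.
preds = [(100.0, 50.0), (200.0, 120.0)]
targets = [(110.0, 50.0), (200.0, 100.0)]

print("MSE :", mse(preds, targets))   # squared pixels
print("RMSE:", rmse(preds, targets))  # pixels
print("MAE :", mae(preds, targets))   # pixels
```

The per-coordinate errors here are 10, 0, 0, and 20 pixels. Squaring gives the 20-pixel miss four times the weight of the 10-pixel miss, which is the "penalizes larger errors more heavily" behavior described above.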

Transfer Learning for Regression

Transfer learning, which uses pre-trained models as starting points, is not limited to classification. We can adapt classification models like ResNet-18 for regression tasks by modifying the final layer. A pre-trained ResNet-18 has learned to extract rich visual features from images (edges, textures, shapes, objects) through training on more than a million labeled ImageNet images. These learned features are equally valuable for regression tasks.

To convert a classification model to regression, we replace the final fully connected layer. Instead of outputting one score per class, the modified layer outputs continuous values. For XY coordinate prediction, we replace the original classification layer (512 input features, 1000 class outputs) with a regression layer (512 input features, 2 outputs), where the two outputs represent the x and y coordinates. The rest of the architecture remains unchanged, so the pre-trained feature extraction layers serve as the starting point: they can be kept frozen or fine-tuned together with the new layer during training.

Real-Time Object Tracking with Regression

Object tracking involves following the position of objects across video frames. Traditional computer vision approaches used techniques like optical flow or template matching. Modern deep learning approaches use regression to directly predict object coordinates and, given sufficiently varied training data, can be more robust to appearance changes, partial occlusions, and varying backgrounds.

In this laboratory, we track body parts (hands, nose, etc.) by training a regression model to predict their (x, y) pixel coordinates. The process involves three stages:

1. Data Collection: We capture camera images and manually click on the target body part (e.g., hand) in each image. The system records both the image and the clicked coordinates as a labeled training example. Collecting diverse examples with the body part in different positions, angles, and backgrounds helps the model generalize.

2. Model Training: Using the collected data, we train a regression neural network to predict coordinates from images. The model learns to identify visual patterns associated with the target body part and map them to spatial locations. Transfer learning accelerates this process by starting with pre-trained feature extractors.

3. Live Inference: Once trained, the model processes live camera frames, predicting the coordinates of the target body part in real-time. These predictions can drive robotic actions, gesture controls, or visual feedback—making the system interactive and responsive.
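As a rough illustration of the training stage, the sketch below runs a small regression training loop with MSE loss in PyTorch. Everything in it is a stand-in chosen so the example is quick and self-contained: the random tensors replace real camera images and click labels, the tiny linear model replaces ResNet-18, and the optimizer, learning rate, and step count are assumptions rather than the notebook's settings.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic stand-ins for collected data: 16 tiny RGB "images" and their
# (x, y) labels, already normalized to the range [-1, 1].
images = torch.rand(16, 3, 8, 8)
targets = torch.rand(16, 2) * 2 - 1

# Placeholder model with two linear outputs (x and y).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 2))
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

losses = []
for step in range(50):
    optimizer.zero_grad()            # clear gradients from the previous step
    preds = model(images)            # forward pass: predicted (x, y) per image
    loss = loss_fn(preds, targets)   # mean squared coordinate error
    loss.backward()                  # backpropagate
    optimizer.step()                 # update weights
    losses.append(loss.item())

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

Recording the loss at every step, as `losses` does here, is also what makes it possible to plot a training loss curve afterward.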

Coordinate Normalization

When predicting pixel coordinates, we often normalize values to improve training stability and model performance. Instead of predicting raw pixel values (e.g., x from 0 to 640), we normalize coordinates to a standard range like [-1, 1] or [0, 1]. This normalization helps the neural network learn more effectively by keeping values in a consistent range across different image resolutions.

During inference, we convert the normalized predictions back to pixel coordinates for visualization and use. For example, if the model predicts coordinates (0.5, -0.2) normalized to [-1, 1], we can recover the pixel location from the image dimensions: x_pixel = (x_norm + 1) * image_width / 2 and y_pixel = (y_norm + 1) * image_height / 2.
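The conversion in both directions can be written as a pair of small helpers. This is a sketch assuming the [-1, 1] convention and a 640x480 frame; the function names are hypothetical, and the notebook's own utilities may use a different range or naming.

```python
def normalize(x_pixel, y_pixel, width, height):
    """Map pixel coordinates to [-1, 1] on each axis."""
    x_norm = 2.0 * x_pixel / width - 1.0
    y_norm = 2.0 * y_pixel / height - 1.0
    return x_norm, y_norm

def denormalize(x_norm, y_norm, width, height):
    """Inverse mapping: x_pixel = (x_norm + 1) * width / 2, same for y."""
    x_pixel = (x_norm + 1.0) * width / 2.0
    y_pixel = (y_norm + 1.0) * height / 2.0
    return x_pixel, y_pixel

# Assumed frame size for this example.
WIDTH, HEIGHT = 640, 480

# A model output of (0.5, -0.2) lands right of center and above center.
x_px, y_px = denormalize(0.5, -0.2, WIDTH, HEIGHT)
print(round(x_px), round(y_px))  # approximately (480, 192)

# Round trip: normalizing the pixel location recovers the prediction.
x_n, y_n = normalize(x_px, y_px, WIDTH, HEIGHT)
```

With this convention the image center maps to (0, 0), so the same normalized prediction remains meaningful if the camera resolution changes.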

Applications of Regression in AI

Beyond object tracking, regression appears throughout AI applications: autonomous vehicles predict steering angles and throttle positions; medical imaging systems estimate tumor sizes; financial systems forecast stock prices; and recommender systems predict user ratings. Understanding regression provides a foundation for countless real-world AI deployments where precise numerical predictions are essential.

Pre-lab Preparation

Before starting the laboratory exercises, complete the following knowledge assessment quiz. These questions test your understanding of regression concepts, neural networks for continuous prediction, and the differences between classification and regression tasks.

📝 Pre-Lab Knowledge Assessment

Instructions: Answer the following 10 questions to assess your readiness for the regression laboratory.

Question 1: What is the primary difference between classification and regression in machine learning?

  • A) Classification is more accurate than regression
  • B) Classification predicts discrete categories while regression predicts continuous values
  • C) Regression can only be used with neural networks
  • D) Classification requires more training data than regression

Question 2: Which loss function is most commonly used for training regression neural networks?

  • A) Cross-Entropy Loss
  • B) Mean Squared Error (MSE)
  • C) Binary Cross-Entropy Loss
  • D) Hinge Loss

Question 3: When predicting XY coordinates for object tracking, how many output neurons should the final layer have?

  • A) 1 neuron (for both x and y)
  • B) 2 neurons (one for x, one for y)
  • C) 3 neurons (x, y, and z)
  • D) 1000 neurons (one for each possible coordinate)

Question 4: What activation function is typically used in the output layer of a regression neural network?

  • A) ReLU activation
  • B) Sigmoid activation
  • C) Softmax activation
  • D) Linear activation (no activation function)

Question 5: Why is transfer learning useful for regression tasks in computer vision?

  • A) It eliminates the need for any training data
  • B) Pre-trained models have already learned visual features that can be adapted for regression
  • C) Transfer learning only works for classification, not regression
  • D) It reduces the model's accuracy but increases speed

Question 6: In the context of object tracking, what does the regression model predict?

  • A) The object's class label
  • B) The continuous (x, y) coordinates of the object's position
  • C) Whether the object is present or absent
  • D) The object's RGB color values

Question 7: What is the purpose of coordinate normalization in regression tasks?

  • A) To reduce image file sizes
  • B) To keep coordinate values in a consistent range for better training stability
  • C) To convert coordinates to integer values
  • D) To eliminate the need for a loss function

Question 8: Which of the following is an example of a regression problem in autonomous systems?

  • A) Detecting whether a traffic light is red or green
  • B) Predicting the steering angle for a self-driving car
  • C) Classifying pedestrians vs vehicles
  • D) Identifying road signs

Question 9: How does data collection for regression differ from classification?

  • A) Regression requires 10x more data than classification
  • B) Each training example must be labeled with continuous numerical values instead of discrete categories
  • C) Regression doesn't need labeled data
  • D) Data collection is identical for both tasks

Question 10: What metric would best evaluate a model predicting pixel coordinates in an image?

  • A) Classification accuracy
  • B) Precision and recall
  • C) Mean Squared Error (MSE) or Mean Absolute Error (MAE)
  • D) F1-score

Lab Procedure

This laboratory consists of one comprehensive interactive exercise that guides you through the complete regression workflow—from data collection to live inference. Follow the instructions carefully and ensure you understand each concept before proceeding to the next step.

Part 1: Interactive Regression for Body Part Tracking

In this comprehensive exercise, you will build an end-to-end regression system for real-time body part tracking. The interactive Jupyter notebook guides you through camera setup, data collection with coordinate labeling, model training using transfer learning, and live inference for tracking a body part such as a hand or the nose in a real-time camera feed.

🔑 Key Concepts Covered:

  • Setting up camera integration (USB or CSI camera) on Jetson Orin Nano
  • Creating interactive data collection interfaces with ipywidgets
  • Labeling training data by clicking on target body parts in images
  • Defining XY coordinate datasets for multiple tracking categories
  • Implementing coordinate normalization for neural network training
  • Adapting ResNet-18 from classification to regression with transfer learning
  • Training regression models with MSE loss and appropriate optimizers
  • Visualizing training progress and model performance
  • Deploying trained models for real-time coordinate prediction
  • Processing live camera feed with continuous position tracking
  • Overlaying predictions on video frames for visual feedback
⚠️ Important: Make sure your camera is properly connected before starting the notebook. The notebook includes detailed instructions for both USB and CSI camera types. Collect at least 30-50 labeled examples per tracking category for effective model training. Vary the position, angle, and background of the tracked body part to help the model generalize.

Lab Materials

Hardware Requirements

This laboratory uses the NVIDIA Jetson Orin Nano Developer Kit assembled in Week 1, with all necessary software pre-configured by the lab technician.

  • NVIDIA Jetson Orin Nano Developer Kit (assembled and configured)
  • USB Camera or CSI Camera (compatible with Jetson)
  • Power supply for Jetson Orin Nano
  • Mouse and keyboard for data labeling interactions
  • Network connection (Wi-Fi or Ethernet) for JupyterLab access

Software Environment

All required software has been pre-installed and configured on your Jetson Orin Nano:

  • JetPack SDK: NVIDIA's comprehensive AI software stack
  • PyTorch: Deep learning framework for model training and inference
  • torchvision: Computer vision utilities and pre-trained models
  • JupyterLab: Interactive development environment for notebooks
  • ipywidgets: Interactive widgets for data collection interfaces
  • ipyevents: Event handling for mouse interactions in widgets
  • OpenCV: Computer vision library for image processing
  • Custom utilities: Dataset classes and preprocessing functions

Exercise Files

  • regression_interactive_ipevent.html: Interactive notebook for complete regression workflow including camera setup, data collection, training, and live inference

💡 Setup Notes

Your Jetson Orin Nano is ready to use with all software pre-configured. Simply access JupyterLab through your web browser, open the exercise notebook, and follow the instructions. Make sure your camera is connected before starting the data collection phase. The notebook includes detailed guidance for both USB and CSI camera types.

Lab Report

Prepare a comprehensive laboratory report documenting your regression experiments, findings, and analysis. Your report should demonstrate understanding of regression concepts, implementation skills, and critical evaluation of model performance.

📋 Report Structure

1. Title Page & Course Information (5 points)

  • Course name, number, and section
  • Experiment title and week number
  • Your name and student ID
  • Submission date
  • Instructor name

2. Objectives Summary (10 points)

  • Restate laboratory objectives in your own words
  • Explain the purpose of learning regression techniques
  • Describe expected outcomes and skills to be developed

3. Procedure & Results (50 points)

For the interactive regression exercise:

  • Document your data collection process with screenshots
  • Show examples of labeled training data (images with coordinate annotations)
  • Include code snippets for model architecture modification
  • Present training loss curves and convergence analysis
  • Provide screenshots of live inference results with predictions overlaid
  • Demonstrate model performance with different body parts or scenarios
  • Compare coordinate prediction accuracy across training examples
  • Explain any preprocessing or normalization steps applied

4. Discussion (20 points)

  • Compare regression vs classification approaches for tracking tasks
  • Analyze how data collection quality affects model performance
  • Discuss the role of transfer learning in accelerating regression training
  • Evaluate the accuracy of coordinate predictions in different scenarios
  • Explain why MSE is appropriate for this regression task
  • Discuss potential real-world applications of this tracking system
  • Support your analysis with quantitative results and visual evidence

5. Challenges & Solutions (10 points)

  • Describe technical challenges encountered (camera setup, data labeling, training issues)
  • Explain your debugging and problem-solving process
  • Discuss strategies for improving tracking accuracy
  • Reflect on lessons learned about regression model development

6. Conclusion (5 points)

  • Summarize key insights about regression in computer vision
  • Reflect on understanding of continuous vs discrete predictions
  • Discuss potential improvements or extensions to the tracking system
  • Identify possible applications in robotics or autonomous systems

📋 Submission Checklist

  • ✓ Completed interactive regression exercise with data collection and training
  • ✓ Included clear screenshots of data labeling interface
  • ✓ Documented training process with loss curves
  • ✓ Provided live inference results with coordinate predictions
  • ✓ Analyzed regression performance with quantitative metrics
  • ✓ Discussed differences between regression and classification
  • ✓ All code snippets are properly formatted and explained
  • ✓ Report follows professional formatting standards

📤 Submission Format

  • File Format: PDF (required)
  • Code Files: Include Jupyter notebook (.ipynb) in ZIP archive
  • File Naming: Week9_[LastName]_[StudentID].pdf
  • Submission Method: Upload to Learning Management System (LMS)
  • File Size Limit: 50MB maximum

📊 Grading Rubric

Component | Points | Criteria
Title Page & Formatting | 5 | Complete information, professional appearance
Objectives | 10 | Clear understanding of goals, well-articulated
Procedure & Results | 50 | Complete exercise, correct implementation, clear documentation
Discussion | 20 | Thoughtful analysis, evidence-based conclusions
Challenges & Solutions | 10 | Detailed problem-solving, reflective learning
Conclusion | 5 | Insightful summary, meaningful reflections
Total | 100 |

Grading Notes:

  • All code must execute without errors
  • Screenshots must clearly show data collection and predictions
  • Analysis must demonstrate understanding of regression concepts
  • Late penalty: 10% per day (maximum 3 days accepted)
  • Plagiarism or unauthorized collaboration: zero credit