Gradient Descent Notebook Explained - Setup & Training Loop

1

Import Statements

These lines import the essential Python libraries for numerical computing, data handling, and visualization.

imports.py

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
                

Line 1 import matplotlib.pyplot as plt

📊 Matplotlib - Visualization Library

Imports the pyplot module from Matplotlib for creating plots.

matplotlib — Python's most popular plotting library
pyplot — Provides a MATLAB-like interface for simple plotting
as plt — Creates a short alias so we can write plt.plot()

Used for: Plotting data points, decision boundaries, and error curves

Line 2 import numpy as np

🔢 NumPy - Numerical Computing

Imports NumPy, the fundamental package for scientific computing in Python.

Provides multi-dimensional arrays (faster than Python lists)
Mathematical functions: np.exp(), np.log(), np.dot()
Random number generation: np.random
Array operations are vectorized (operate on entire arrays at once)

NumPy is the backbone of machine learning in Python. Most ML libraries either build on it or interoperate with its arrays.
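
To make "vectorized" concrete, here is a minimal sketch of the three kinds of operations listed above. The array values are hypothetical and chosen only for illustration; they are not taken from the notebook's data.

```python
import numpy as np

# Hypothetical sample values, used only for illustration.
scores = np.array([0.2, 0.5, 0.9])
weights = np.array([1.0, -2.0, 0.5])

# Element-wise arithmetic applies to the whole array without a Python loop.
scaled = scores * 2.0

# np.dot computes the weighted sum of the two vectors.
weighted_sum = np.dot(scores, weights)

# Mathematical functions such as np.exp are also applied element-wise.
exponentials = np.exp(-scores)

print(scaled)
print(weighted_sum)
print(exponentials.shape)
```

The same pattern (a dot product followed by an element-wise function) is what a prediction step typically looks like in this kind of notebook.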

Line 3 import pandas as pd

📋 Pandas - Data Analysis

Imports Pandas for reading and manipulating tabular data.

Provides DataFrames — like spreadsheets in Python
Easy CSV file reading with pd.read_csv()
Powerful data selection and filtering

Used for: Loading the training data from data.csv

2

Visualization Helpers

These provided functions handle plotting data points and decision boundaries. You don't need to modify them.

plot_points.py

def plot_points(X, y):
    admitted = X[np.argwhere(y==1)]
    rejected = X[np.argwhere(y==0)]
    plt.scatter([s[0][0] for s in rejected], ..., color='blue')
    plt.scatter([s[0][0] for s in admitted], ..., color='red')
                

Lines 1-5 plot_points(X, y)

🔴🔵 Plot Data Points by Class

This function separates data into two classes and plots them with different colors.

np.argwhere(y==1) — Finds indices where label is 1
X[...] — Selects rows from X at those indices
Blue dots = Class 0 (rejected)
Red dots = Class 1 (admitted)
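
The s[0][0] indexing in plot_points can look odd at first. The sketch below, using a small hypothetical X and y, shows why it is needed: np.argwhere returns a column of indices, so indexing X with it adds an extra axis.

```python
import numpy as np

# Hypothetical data: four samples with two features each.
X = np.array([[0.1, 0.2], [0.8, 0.9], [0.4, 0.3], [0.7, 0.6]])
y = np.array([0, 1, 0, 1])

idx = np.argwhere(y == 1)   # shape (2, 1): one row per matching index
admitted = X[idx]           # shape (2, 1, 2): note the extra middle axis

# Each s has shape (1, 2), so s[0] is the [x1, x2] row and s[0][0] is x1.
first_features = [s[0][0] for s in admitted]

# A boolean mask selects the same rows without the extra axis.
admitted_flat = X[y == 1]   # shape (2, 2)

print(idx.shape, admitted.shape, admitted_flat.shape)
print(first_features)
```

The boolean-mask form is shown only as a point of comparison; the provided helper works as written.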

display.py

def display(m, b, color='g--'):
    plt.xlim(-0.05, 1.05)
    plt.ylim(-0.05, 1.05)
    x = np.arange(-10, 10, 0.1)
    plt.plot(x, m*x+b, color)
                

Lines 1-5 display(m, b, color='g--')

📈 Plot Decision Boundary Line

Draws the decision boundary using slope-intercept form: y = mx + b

m — Slope of the line
b — Y-intercept
'g--' — Green dashed line (default)
xlim/ylim — Sets visible range to [0, 1] with padding
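
The listing does not show how m and b are obtained. A common approach, assumed here rather than taken from the notebook, is to treat the boundary as the set of points where w1*x1 + w2*x2 + bias = 0 and solve for x2. The helper name boundary_line and the sample values are hypothetical.

```python
import numpy as np

def boundary_line(weights, bias):
    """Return (m, b) for the line w1*x1 + w2*x2 + bias = 0, assuming w2 != 0."""
    w1, w2 = weights
    # Solving for x2 gives x2 = -(w1 / w2) * x1 - bias / w2.
    return -w1 / w2, -bias / w2

# Hypothetical weights and bias, for illustration only.
m, b = boundary_line(np.array([2.0, 4.0]), -3.0)
print(m, b)
```

The resulting pair could then be passed to display(m, b) to draw the line.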

3

Loading the Data

These lines read the training data from a CSV file and prepare it for the algorithm.

load_data.py

data = pd.read_csv('data.csv', header=None)
X = np.array(data[[0,1]])
y = np.array(data[2])
plot_points(X, y)
plt.show()
                

Line 1 data = pd.read_csv('data.csv', header=None)

📁 Read CSV File

Loads the training data from a CSV file into a Pandas DataFrame.

'data.csv' — Filename containing our training examples
header=None — File has no header row, just data
Result: DataFrame with columns accessed by index (0, 1, 2)

CSV Structure

Column 0: Feature x₁ (e.g., test score 1)
Column 1: Feature x₂ (e.g., test score 2)
Column 2: Label y (0 = rejected, 1 = admitted)

Line 2 X = np.array(data[[0,1]])

📊 Extract Feature Matrix

Selects the input features and converts to a NumPy array.

data[[0,1]] — Selects columns 0 and 1
np.array() — Converts to NumPy array for fast math
Result shape: (n_samples, 2) — each row is [x₁, x₂]

Convention: Capital X for the feature matrix, lowercase x for a single sample.

Line 3 y = np.array(data[2])

🏷️ Extract Labels

Gets the target labels (ground truth) as a NumPy array.

data[2] — Selects column 2 (labels)
Result shape: (n_samples,) — 1D array of 0s and 1s
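
The loading steps above can be tried without the real file. This sketch reads three made-up rows from an in-memory string instead of data.csv, so the values are hypothetical, but the calls and resulting shapes match the listing.

```python
import io

import numpy as np
import pandas as pd

# Three hypothetical rows in the same layout: x1, x2, label.
csv_text = "0.78,0.72,1\n0.15,0.30,0\n0.60,0.85,1\n"

# header=None tells pandas the first row is data, so columns are named 0, 1, 2.
data = pd.read_csv(io.StringIO(csv_text), header=None)

X = np.array(data[[0, 1]])  # feature matrix, shape (n_samples, 2)
y = np.array(data[2])       # label vector, shape (n_samples,)

print(list(data.columns))
print(X.shape, y.shape)
```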

Lines 4-5 plot_points(X, y) / plt.show()

👁️ Visualize the Data

Plots the data to see what we're working with before training.

4

Hyperparameters

These settings control how the gradient descent algorithm behaves during training.

config.py

np.random.seed(44)

epochs = 100
learnrate = 0.01

Line 1 np.random.seed(44)

🎲 Set Random Seed

Makes random number generation reproducible.

Weights are initialized randomly
Setting a seed ensures the same "random" numbers on each run
Allows you to reproduce results exactly
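
A quick way to see the effect of the seed is to draw numbers, reset the seed, and draw again. This is a small sketch and not part of the notebook.

```python
import numpy as np

np.random.seed(44)
first = np.random.normal(size=3)

# Resetting the seed restarts the sequence, so the same values come out.
np.random.seed(44)
second = np.random.normal(size=3)

# Without a reset, the generator simply continues with new values.
third = np.random.normal(size=3)

print(np.array_equal(first, second))
print(np.array_equal(second, third))
```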

Line 3 epochs = 100

🔄 Number of Epochs

How many times to iterate through the entire training set.

1 epoch = 1 complete pass through all training samples
More epochs → lower training loss, until the loss levels off

Line 4 learnrate = 0.01

📏 Learning Rate (α)

Controls the step size for each weight update.

α = 0.01 is a common starting point
Too high → Overshoots, diverges
Too low → Very slow convergence

The learning rate is crucial! Experiment with different values to see how it affects training.
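
The three regimes can be seen on a toy problem. The sketch below minimizes f(w) = w² (gradient 2w), which is an assumption made for illustration and is not the notebook's loss, so the specific rates that count as "too high" or "too low" will differ on the real data. The function name descend is hypothetical.

```python
def descend(learnrate, steps=100, start=1.0):
    """Run plain gradient descent on f(w) = w**2 and return the final w."""
    w = start
    for _ in range(steps):
        grad = 2.0 * w             # derivative of w**2
        w = w - learnrate * grad   # step against the gradient
    return w

slow = descend(0.001)      # tiny steps: still far from the minimum at 0
moderate = descend(0.1)    # reasonable steps: very close to 0
diverged = descend(1.1)    # oversized steps: each update overshoots further

print(slow, moderate, diverged)
```

Each update here multiplies w by (1 - 2 * learnrate), which is why a rate above 1 makes the magnitude grow on this particular function.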

5

Training Loop

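The main function that orchestrates gradient descent by calling your four implemented functions; three of them (update_weights, output_formula, and error_formula) appear directly in the excerpt below.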

train.py

def train(features, targets, epochs, learnrate, graph_lines=False):
    errors = []
    n_records, n_features = features.shape
    last_loss = None
    weights = np.random.normal(scale=1/n_features**.5, size=n_features)
    bias = 0
    for e in range(epochs):
        for x, y in zip(features, targets):
            weights, bias = update_weights(x, y, weights, bias, learnrate)
        out = output_formula(features, weights, bias)
        loss = np.mean(error_formula(targets, out))
        errors.append(loss)
                

Line 5 weights = np.random.normal(scale=1/n_features**.5, size=n_features)

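⚖️ Initialize Weights (Xavier-Style Scaling)

Creates random initial weights from a normal distribution.

np.random.normal() — Draws from a Gaussian distribution with mean 0
scale=1/√n_features — Standard deviation of the distribution; a Xavier-style scaling (Glorot's original formula also takes the number of outputs into account)
size=n_features — One weight per input feature

Scaling by 1/√n_features keeps the initial weighted sum of the inputs at a moderate size regardless of how many features there are. In deep networks this kind of scaling helps against vanishing or exploding gradients; for the single-layer model here it mainly provides a reasonable starting point.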
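
The effect of the scale argument can be checked empirically. This sketch draws the two weights as in the listing, then draws a much larger sample (an addition for illustration only) to confirm that the spread is close to 1/√n_features.

```python
import numpy as np

np.random.seed(44)
n_features = 2

# Same call as in train(): one weight per feature.
weights = np.random.normal(scale=1 / n_features**.5, size=n_features)

# A large sample makes the standard deviation easy to measure.
sample = np.random.normal(scale=1 / n_features**.5, size=100_000)

print(weights.shape)
print(sample.std(), 1 / np.sqrt(n_features))
```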

Lines 7-9 for e in range(epochs): for x, y in zip(...): update_weights(...)

🔄 The Training Loops (Stochastic Gradient Descent)

Two nested loops that perform stochastic gradient descent:

Outer loop (Line 7): Iterates through epochs
Inner loop (Line 8): Iterates through each sample
zip(features, targets) — Pairs each input with its label
Line 9: update_weights() — YOUR function updates weights for each sample

This is Stochastic GD — weights update after EACH sample, not after seeing all samples.
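
The nested loops can be run end to end with stand-in functions. The bodies of sigmoid, output_formula, and update_weights below are assumptions (a standard logistic-regression formulation), not the notebook's reference solution, and the four data points are hypothetical.

```python
import numpy as np

def sigmoid(z):
    # Assumed activation: squashes any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def output_formula(features, weights, bias):
    # Assumed prediction: sigmoid of the weighted sum plus bias.
    return sigmoid(np.dot(features, weights) + bias)

def update_weights(x, y, weights, bias, learnrate):
    # Assumed update rule: move weights in proportion to the prediction error.
    error = y - output_formula(x, weights, bias)
    weights = weights + learnrate * error * x
    bias = bias + learnrate * error
    return weights, bias

features = np.array([[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.8]])
targets = np.array([0, 0, 1, 1])
weights = np.zeros(2)
bias = 0.0

n_updates = 0
for e in range(3):                          # outer loop: epochs
    for x, y in zip(features, targets):     # inner loop: one sample at a time
        weights, bias = update_weights(x, y, weights, bias, learnrate=0.1)
        n_updates += 1

print(n_updates)   # one update per sample per epoch
print(weights, bias)
```

With 3 epochs and 4 samples the weights are updated 12 times; a batch method would update them only 3 times, once per epoch.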

Lines 10-12 out = output_formula(...) / loss = np.mean(error_formula(...))

📉 Calculate and Track Loss

After each epoch, compute the average loss to monitor progress.

output_formula() — YOUR function: Gets predictions for all samples
error_formula() — YOUR function: Computes error for each sample
np.mean() — Averages errors across all samples
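
As a sketch of the loss step, the example below assumes error_formula is the binary cross-entropy, which is a common choice for this kind of classifier but is not confirmed by the excerpt. The predictions in out are hypothetical, and the accuracy line is an extra illustration.

```python
import numpy as np

def error_formula(y, output):
    # Assumed per-sample error: binary cross-entropy.
    return -y * np.log(output) - (1 - y) * np.log(1 - output)

targets = np.array([1, 0, 1, 0])
out = np.array([0.9, 0.2, 0.6, 0.4])   # hypothetical predicted probabilities

per_sample = error_formula(targets, out)   # one error value per sample
loss = np.mean(per_sample)                 # single number tracked per epoch

# Accuracy: fraction of samples where thresholding at 0.5 matches the label.
accuracy = np.mean((out > 0.5) == targets)

print(per_sample)
print(loss, accuracy)
```

Confident correct predictions (0.9 for a label of 1) contribute a small error, while hesitant ones (0.6) contribute more, which is why the mean falls as training improves the predictions.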

6

Running the Training

run.py

train(X, y, epochs, learnrate, True)

Line 1 train(X, y, epochs, learnrate, True)

🚀 Start Training!

Calls the training function with the prepared data and hyperparameters. The final argument, True, is passed as graph_lines.

What you'll see:

Loss and accuracy printed every 10 epochs
Decision boundary lines evolving
Final boundary plot
Error curve (should decrease!)
