Understanding the imports, helper functions, data loading, and training loop that orchestrate the gradient descent algorithm
These lines import the essential Python libraries for numerical computing, data handling, and visualization.
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
import matplotlib.pyplot as plt
Imports the pyplot module from Matplotlib for creating plots.
matplotlib: Python's most popular plotting library
pyplot: Provides a MATLAB-like interface for simple plotting
as plt: Creates a short alias so we can write plt.plot()
Used for: Plotting data points, decision boundaries, and error curves
import numpy as np
Imports NumPy, the fundamental package for scientific computing in Python.
Key functions: np.exp(), np.log(), np.dot()
Random numbers: np.random
NumPy is the backbone of machine learning in Python. Many ML libraries are built on top of it or work directly with its arrays.
import pandas as pd
Imports Pandas for reading and manipulating tabular data.
Key function: pd.read_csv()
Used for: Loading the training data from data.csv
These provided functions handle plotting data points and decision boundaries. You don't need to modify them.
    def plot_points(X, y):
        admitted = X[np.argwhere(y==1)]
        rejected = X[np.argwhere(y==0)]
        plt.scatter([s[0][0] for s in rejected], ..., color='blue')
        plt.scatter([s[0][0] for s in admitted], ..., color='red')
plot_points(X, y)
This function separates data into two classes and plots them with different colors.
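The selection step can be tried on its own to see why each selected row is later indexed as s[0][0]. This is a minimal sketch on hypothetical toy data (the four samples below are made up for illustration and are not from data.csv):

```python
import numpy as np

# Hypothetical toy data: four samples with two features each.
X = np.array([[0.1, 0.9],
              [0.8, 0.2],
              [0.4, 0.6],
              [0.7, 0.3]])
y = np.array([1, 0, 1, 0])

idx = np.argwhere(y == 1)   # column of indices, shape (2, 1)
admitted = X[idx]           # shape (2, 1, 2): the index column adds an axis

# Each element s has shape (1, 2), so s[0][0] is the first feature
# and s[0][1] is the second.
first_feature = [s[0][0] for s in admitted]
second_feature = [s[0][1] for s in admitted]

# A boolean mask selects the same rows without the extra axis.
admitted_flat = X[y == 1]   # shape (2, 2)
```

The boolean-mask form is shown only for comparison. The provided helper uses np.argwhere and does not need to be changed.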
np.argwhere(y==1): Finds indices where label is 1
X[...]: Selects rows from X at those indices
    def display(m, b, color='g--'):
        plt.xlim(-0.05, 1.05)
        plt.ylim(-0.05, 1.05)
        x = np.arange(-10, 10, 0.1)
        plt.plot(x, m*x+b, color)
display(m, b, color='g--')
Draws the decision boundary using slope-intercept form: y = mx + b
m: Slope of the line
b: Y-intercept
'g--': Green dashed line (default)
xlim/ylim: Sets visible range to [0, 1] with padding
These lines read the training data from a CSV file and prepare it for the algorithm.
    data = pd.read_csv('data.csv', header=None)
    X = np.array(data[[0,1]])
    y = np.array(data[2])
    plot_points(X, y)
    plt.show()
data = pd.read_csv('data.csv', header=None)
Loads the training data from a CSV file into a Pandas DataFrame.
'data.csv': Filename containing our training examples
header=None: File has no header row, just data
Column 0: Feature x₁ (e.g., test score 1)
Column 1: Feature x₂ (e.g., test score 2)
Column 2: Label y (0 = rejected, 1 = admitted)
X = np.array(data[[0,1]])
Selects the input features and converts to a NumPy array.
data[[0,1]]: Selects columns 0 and 1
np.array(): Converts to NumPy array for fast math
Shape (n_samples, 2): each row is [x₁, x₂]
Convention: Capital X for the feature matrix, lowercase x for a single sample.
y = np.array(data[2])
Gets the target labels (ground truth) as a NumPy array.
data[2]: Selects column 2 (labels)
Shape (n_samples,): 1D array of 0s and 1s
plot_points(X, y) / plt.show()
Plots the data to see what we're working with before training.
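The loading steps can be checked without the real file. The sketch below assumes a three-row stand-in for data.csv (the values are invented for illustration) and reads it from an in-memory string, so the resulting shapes can be inspected directly:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for data.csv: three rows of x1, x2, label.
csv_text = "0.78,0.66,1\n0.32,0.25,0\n0.55,0.91,1\n"

# header=None makes pandas label the columns 0, 1, 2 instead of
# treating the first row as column names.
data = pd.read_csv(io.StringIO(csv_text), header=None)

X = np.array(data[[0, 1]])  # feature matrix, one row per sample
y = np.array(data[2])       # label vector

print(X.shape)  # (3, 2)
print(y.shape)  # (3,)
```

Without header=None, pandas would use the first data row as column names, and one training example would silently disappear.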
These settings control how the gradient descent algorithm behaves during training.
    np.random.seed(44)

    epochs = 100
    learnrate = 0.01
np.random.seed(44)
Makes random number generation reproducible.
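A quick way to see what the seed does is to draw the same numbers twice. This small check uses only the seed value from the settings above:

```python
import numpy as np

np.random.seed(44)
first = np.random.normal(size=3)

np.random.seed(44)          # reset the global generator to the same state
second = np.random.normal(size=3)

print(np.array_equal(first, second))  # True
```

Because the initial weights are drawn with np.random.normal, fixing the seed means every run starts from the same weights.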
epochs = 100
How many times to iterate through the entire training set.
learnrate = 0.01
Controls the step size for each weight update.
The learning rate is crucial! Experiment with different values to see how it affects training.
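The effect of the step size is easiest to see on a toy problem. The sketch below is not the tutorial's model: it assumes a one-dimensional objective f(w) = w**2, and the helper name descend is hypothetical, chosen only for this illustration.

```python
def descend(learnrate, steps=100, w=1.0):
    """Run gradient descent on f(w) = w**2, whose gradient is 2*w."""
    for _ in range(steps):
        w = w - learnrate * 2 * w
    return w

small = descend(0.01)   # shrinks slowly toward the minimum at 0
medium = descend(0.4)   # shrinks quickly
large = descend(1.1)    # overshoots on every step and grows

print(abs(small), abs(medium), abs(large))
```

On this toy objective a tiny rate makes slow progress, a moderate rate converges fast, and a rate that is too large diverges. The useful range for the real training loop will differ, which is why experimenting is worthwhile.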
The main function that orchestrates gradient descent, calling your four implemented functions.
    def train(features, targets, epochs, learnrate, graph_lines=False):
        errors = []
        n_records, n_features = features.shape
        last_loss = None
        weights = np.random.normal(scale=1/n_features**.5, size=n_features)
        bias = 0
        for e in range(epochs):
            for x, y in zip(features, targets):
                weights, bias = update_weights(x, y, weights, bias, learnrate)
            out = output_formula(features, weights, bias)
            loss = np.mean(error_formula(targets, out))
            errors.append(loss)
weights = np.random.normal(scale=1/n_features**.5, size=n_features)
Creates random initial weights from a normal distribution.
np.random.normal(): Draws from a Gaussian distribution
scale=1/√n_features: Standard deviation scaled by the number of inputs, in the spirit of Xavier initialization (keeps values reasonable)
size=n_features: One weight per input feature
Scaling the initial weights this way keeps the weighted sums moderate at the start of training, which helps avoid vanishing or exploding gradients.
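The scale can be inspected numerically. This sketch assumes two input features, as in the data described earlier, and draws a large extra sample purely to make the spread visible:

```python
import numpy as np

np.random.seed(44)

n_features = 2
scale = 1 / n_features ** 0.5          # about 0.707 for two features
weights = np.random.normal(scale=scale, size=n_features)
print(weights.shape)                    # (2,)

# Drawing many values makes the spread visible: the sample standard
# deviation should land close to `scale`.
many = np.random.normal(scale=scale, size=100_000)
print(round(scale, 3), round(float(many.std()), 3))
```

With more input features the scale shrinks, so the sum of weight-times-feature terms stays in a similar range regardless of input size.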
for e in range(epochs): for x, y in zip(...): update_weights(...)
Two nested loops that perform stochastic gradient descent:
zip(features, targets): Pairs each input with its label
update_weights(): YOUR function updates weights for each sample
This is Stochastic GD: weights update after EACH sample, not after seeing all samples.
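To make the per-sample update concrete, here is one possible sketch of the helper functions and the loop around them. The function bodies are assumptions: they use a sigmoid output with log-loss, which is a common choice for this kind of binary classifier, but your own implementations may differ. The name sigmoid, the toy data, and the learning rate of 0.5 are likewise chosen for this illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def output_formula(features, weights, bias):
    # Works for a single sample (1D) or the whole feature matrix (2D).
    return sigmoid(np.dot(features, weights) + bias)

def error_formula(y, output):
    # Log-loss (binary cross-entropy) for each sample.
    return -y * np.log(output) - (1 - y) * np.log(1 - output)

def update_weights(x, y, weights, bias, learnrate):
    # One stochastic step: move the weights in proportion to the
    # prediction error on this single sample.
    d_error = y - output_formula(x, weights, bias)
    weights = weights + learnrate * d_error * x
    bias = bias + learnrate * d_error
    return weights, bias

# Hypothetical, linearly separable toy data (not from data.csv).
np.random.seed(44)
features = np.array([[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.8]])
targets = np.array([0, 0, 1, 1])
weights = np.random.normal(scale=1 / 2 ** 0.5, size=2)
bias = 0.0

losses = []
for epoch in range(100):
    for x, y in zip(features, targets):      # one update per sample
        weights, bias = update_weights(x, y, weights, bias, 0.5)
    out = output_formula(features, weights, bias)
    losses.append(np.mean(error_formula(targets, out)))

print(losses[0], losses[-1])
```

Each pass through the inner loop changes the weights four times, once per sample. A batch variant would instead accumulate the four errors and apply a single update per epoch.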
out = output_formula(...) / loss = np.mean(error_formula(...))
After each epoch, compute the average loss to monitor progress.
output_formula(): YOUR function; gets predictions for all samples
error_formula(): YOUR function; computes error for each sample
np.mean(): Averages errors across all samples
    train(X, y, epochs, learnrate, True)
train(X, y, epochs, learnrate, True)
Calls the training function with all our prepared data and settings.
What you'll see: