In Week 2 of Andrew Ng's Machine Learning course the fundamentals of linear regression are taught, culminating in Octave coding exercises. As an extra challenge throughout the course I will be completing these exercises in both Octave and Python. The Python code snippets from the exercises are documented below. The git repository of the full code can be found here.
import numpy as np
Two example data sets are provided:
Filename | Description |
---|---|
ex1data1.txt | data set for linear regression with one variable |
ex1data2.txt | data set for linear regression with two variables |
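As a quick sanity check that the files load as expected, here is a small sketch (assuming both files sit in the working directory) that prints the shape of each data set:

```python
import numpy as np

# peek at each file: the last column is y, the remaining columns are the features
for filename in ["ex1data1.txt", "ex1data2.txt"]:
    data = np.loadtxt(filename, delimiter=",")
    print(filename, "-> examples:", data.shape[0], "features:", data.shape[1] - 1)
```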
def import_data(filename, normalise):
    # load the comma-separated file; the last column is y, the rest are features
    data = np.loadtxt(filename, delimiter=",")
    y = data[:, -1]
    X = data[:, :-1]
    mu = None
    sigma = None
    if normalise:
        X, mu, sigma = feature_normalisation(X)
    # prepend a column of ones for the intercept term (x_0 = 1)
    X_1 = np.c_[np.ones(X.shape[0]), X]
    features = X_1.shape[1]
    return X_1, y, mu, sigma, features
...
X_1, y, mu, sigma, features = import_data("ex1data1.txt", normalise=False)
Input values:
Name | Type | Description |
---|---|---|
filename | string | filename of dataset to be loaded |
normalise | boolean | apply feature normalisation to dataset if True |
Return values are:
Name | Type | Description |
---|---|---|
X_1 | numpy.ndarray | X variables with first column of ones |
y | numpy.ndarray | y variables |
mu | numpy.ndarray | mean of each feature of X returned from the feature_normalisation function (None if normalise is False) |
sigma | numpy.ndarray | standard deviation of each feature of X returned from the feature_normalisation function (None if normalise is False) |
features | int | number of columns of X_1, including the added column of ones |
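For the two-variable data set the same call is made with normalisation switched on; a quick sketch (assuming ex1data2.txt is present):

```python
# mu and sigma are only populated when normalise=True;
# they are kept so the same scaling can be applied to new x values later
X_1, y, mu, sigma, features = import_data("ex1data2.txt", normalise=True)
print(X_1.shape, features)  # features includes the added column of ones
```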
The feature_normalisation function uses the following equation:
\begin{align*} x_i := \frac{x_i - \mu_i}{\sigma_i} \quad \forall i \in \{1, 2, \cdots, n\} \end{align*}
def feature_normalisation(X):
    # column-wise mean and (sample) standard deviation
    mu = np.mean(X, axis=0)
    sigma = np.std(X, axis=0, ddof=1)
    # scale each feature to zero mean and unit standard deviation
    X_n = (X - mu) / sigma
    return X_n, mu, sigma
...
X_n, mu, sigma = feature_normalisation(X)
Input values:
Name | Type | Description |
---|---|---|
X | numpy.ndarray | X variables |
Return values are:
Name | Type | Description |
---|---|---|
X_n | numpy.ndarray | X variables normalised |
mu | numpy.ndarray | mean of each column (feature) of X |
sigma | numpy.ndarray | sample standard deviation of each column (feature) of X |
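To see the normalisation in action, here is a tiny made-up example (the numbers are chosen purely for illustration):

```python
X_demo = np.array([[1.0, 10.0],
                   [2.0, 20.0],
                   [3.0, 30.0]])
X_n, mu, sigma = feature_normalisation(X_demo)
print(mu)     # [ 2. 20.]
print(sigma)  # [ 1. 10.]  (sample standard deviation, ddof=1)
print(X_n)    # [[-1. -1.]
              #  [ 0.  0.]
              #  [ 1.  1.]]
```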
The cost_function function uses the following equations:
\begin{align*}& J(\theta) = {1 \over 2m} \sum_{i=1}^m \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 \end{align*}
\begin{align*} h_\theta(x) &= \theta_0x_0 + \theta_1x_1 + \theta_2x_2 + \cdots + \theta_nx_n \text{, where }x_0 = 1 \newline &= \theta^T x \end{align*}
def cost_function(X, y, theta):
    m = y.shape[0]
    # vectorised hypothesis: h_theta(x) = X @ theta
    error = (X @ theta) - y
    # squared-error cost averaged over the m training examples
    J = 1 / (2 * m) * np.sum(error**2)
    return J
...
J = cost_function(X_1, y, theta)
Input values:
Name | Type | Description |
---|---|---|
X | numpy.ndarray | X variables with first column of ones, i.e. X_1 obtained from import_data function |
y | numpy.ndarray | y variables |
theta | numpy.ndarray | theta values to compute cost function with |
Return values are:
Name | Type | Description |
---|---|---|
J | numpy.float64 | value of the cost function for the given theta |
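As a quick sanity check on ex1data1.txt, evaluating the cost with theta initialised to zeros should, if I remember the exercise's expected output correctly, give a value of roughly 32.07:

```python
X_1, y, mu, sigma, features = import_data("ex1data1.txt", normalise=False)
theta = np.zeros(features)
print(cost_function(X_1, y, theta))  # expected to be approximately 32.07
```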
The gradient_descent function uses the following equations:
\begin{align*}& \text{repeat until convergence:} \lbrace \newline & \theta_j := \theta_j - \alpha \frac{1}{m} \sum\limits_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)} & \text{for j := 0…n}\newline \rbrace \end{align*}
\begin{align*} h_\theta(x) &= \theta_0x_0 + \theta_1x_1 + \theta_2x_2 + \cdots + \theta_nx_n \text{, where }x_0 = 1 \newline &= \theta^T x \end{align*}
def gradient_descent(X, y, theta, alpha, iterations):
    m = y.shape[0]
    J_history = []
    for i in range(iterations):
        # record the cost before each update to monitor convergence
        J_history.append(cost_function(X, y, theta))
        # vectorised gradient: X^T (X theta - y)
        error = np.transpose(X) @ ((X @ theta) - y)
        # simultaneous update of all theta values
        theta = theta - (alpha / m) * error
    return theta, J_history
...
theta, J_history = gradient_descent(X_1, y, theta, alpha, iterations)
Input values:
Name | Type | Description |
---|---|---|
X | numpy.ndarray | X variables with a first column of ones, with or without feature normalisation |
y | numpy.ndarray | y variables |
theta | numpy.ndarray | initial theta values, typically an array of zeros: `np.zeros(features)` |
alpha | float | learning rate |
iterations | integer | number of iterations of gradient descent to perform |
Return values:
Name | Type | Description |
---|---|---|
theta | numpy.ndarray | theta values computed from gradient descent |
J_history | list | cost computed at the start of each iteration (useful for checking that the learning rate produces a decreasing cost) |
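Putting it together on ex1data1.txt with the learning rate and iteration count used in the exercise (alpha = 0.01 and 1500 iterations, as far as I recall), and plotting J_history with matplotlib to confirm the cost keeps decreasing:

```python
import matplotlib.pyplot as plt

X_1, y, mu, sigma, features = import_data("ex1data1.txt", normalise=False)
theta, J_history = gradient_descent(X_1, y, np.zeros(features), alpha=0.01, iterations=1500)
print(theta)  # should land near [-3.63, 1.17] if this matches the course exercise

# J_history should decrease monotonically for a well-chosen learning rate
plt.plot(J_history)
plt.xlabel("iteration")
plt.ylabel("cost J")
plt.show()
```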
The normal_equation function uses the following equation:
\begin{align*} &\theta = \left( X^TX \right)^{-1}X^Ty \end{align*}
def normal_equation(X, y):
    # closed-form solution: theta = (X^T X)^{-1} X^T y
    theta = np.linalg.inv(np.transpose(X) @ X) @ np.transpose(X) @ y
    return theta
...
theta = normal_equation(X_1, y)
Input values:
Name | Type | Description |
---|---|---|
X | numpy.ndarray | X variables with a first column of ones, with or without feature normalisation |
y | numpy.ndarray | y variables |
Return value:
Name | Type | Description |
---|---|---|
theta | numpy.ndarray | theta values computed from normal equation |
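Unlike gradient descent, the normal equation needs no feature scaling, learning rate or iteration count, so it can be run directly on the raw multivariate data; a sketch, assuming ex1data2.txt is available:

```python
X_1, y, mu, sigma, features = import_data("ex1data2.txt", normalise=False)
theta = normal_equation(X_1, y)
print(theta)  # intercept followed by one coefficient per feature
```

The resulting theta values will differ numerically from those produced by gradient descent on normalised features, but both should give the same predictions.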
def predict_y(theta, x, mu, sigma):
    if any(v is None for v in [mu, sigma]):
        # no normalisation was applied to the training data
        x_1 = np.insert(x, 0, 1)
        y = x_1 @ theta
    else:
        # apply the training set's mu and sigma to the new x before predicting
        x_n = (x - mu) / sigma
        x_n1 = np.insert(x_n, 0, 1)
        y = x_n1 @ theta
    return y
...
y = predict_y(theta, x, mu, sigma)
Input values are:
Name | Type | Description |
---|---|---|
theta | numpy.ndarray | theta values used to calculate predicted y |
x | numpy.ndarray | set of x values used to predict y |
mu | numpy.ndarray | mean of X (returned from the import_data function); if None, no feature normalisation is applied when predicting y |
sigma | numpy.ndarray | standard deviation of X (returned from the import_data function); if None, no feature normalisation is applied when predicting y |
Return value:
Name | Type | Description |
---|---|---|
y | numpy.float64 | predicted y value from the given theta and x values |
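Finally, a usage sketch tying everything together for both data sets; the query values and learning-rate settings below are illustrative choices rather than the exercise's exact numbers:

```python
# one variable: no normalisation, so mu and sigma are None
X_1, y, mu, sigma, features = import_data("ex1data1.txt", normalise=False)
theta, J_history = gradient_descent(X_1, y, np.zeros(features), alpha=0.01, iterations=1500)
print(predict_y(theta, np.array([3.5]), mu, sigma))  # predicted y for a single x value

# two variables: mu and sigma from the normalised training data are re-applied to the new x
X_1, y, mu, sigma, features = import_data("ex1data2.txt", normalise=True)
theta, J_history = gradient_descent(X_1, y, np.zeros(features), alpha=0.1, iterations=400)
print(predict_y(theta, np.array([1650.0, 3.0]), mu, sigma))  # predicted y for x = (1650, 3)
```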