Programming Exercise 1

Week 2 of Andrew Ng’s Machine Learning course teaches the fundamentals of linear regression and culminates in Octave coding exercises. As an extra challenge throughout the course, I will be completing these exercises in both Octave and Python. This post documents the Python code snippets from the exercises. The git repository of the full code can be found here.

Required modules

import numpy as np

Importing the data

Two example data sets are provided:

ex1data1.txt data set for linear regression with one variable
ex1data2.txt data set for linear regression with two variables

def import_data(filename, normalise):
    # Each row is one training example; the last column is the target y
    data = np.loadtxt(filename, delimiter=",")
    y = data[:, -1]
    X = data[:, :-1]
    mu = None
    sigma = None

    if normalise:
        X, mu, sigma = feature_normalisation(X)

    # Prepend a column of ones so that theta_0 acts as the intercept
    X_1 = np.c_[np.ones(X.shape[0]), X]
    features = X_1.shape[1]

    return X_1, y, mu, sigma, features

X_1, y, mu, sigma, features = import_data("ex1data1.txt", normalise=False)

Input values:

Name Type Description
filename string filename of dataset to be loaded
normalise boolean apply feature normalisation to dataset if True

Return values are:

Name Type Description
X_1 numpy.ndarray X variables with first column of ones
y numpy.ndarray y variables
mu numpy.ndarray mean of each feature of X (returned from feature_normalisation function)
sigma numpy.ndarray standard deviation of each feature of X (returned from feature_normalisation function)
features int number of columns of X_1 (the features of X plus the column of ones)
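
To see the shapes involved without needing the course files, the loading steps can be sketched on an in-memory file. The three-row sample below is made up for illustration (it is not taken from ex1data1.txt), and io.StringIO stands in for the filename, which np.loadtxt accepts because it reads from file-like objects as well as paths.

```python
import io
import numpy as np

# Hypothetical three-row sample in the same comma-separated layout:
# one feature column followed by the target column.
sample = io.StringIO("1.0,2.0\n2.0,4.0\n3.0,6.0\n")

data = np.loadtxt(sample, delimiter=",")
y = data[:, -1]    # last column is the target
X = data[:, :-1]   # every other column is a feature

# Prepend the column of ones for the intercept term theta_0.
X_1 = np.c_[np.ones(X.shape[0]), X]

print(X_1.shape)  # (3, 2)
print(y.shape)    # (3,)
```

With one feature in the file, X_1 has two columns, so features would be 2 and the initial theta would be np.zeros(2).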

Feature normalisation

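Uses the following equation, where \mu_i is the mean and S_i the standard deviation of feature i (mu and sigma in the code):

\begin{align*} x_i := \frac{x_i - \mu_i}{S_i} \quad \forall i \in \lbrace 1, 2, \cdots, n \rbrace \end{align*}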

def feature_normalisation(X):
    # Column-wise mean and sample standard deviation (ddof=1 divides by m - 1)
    mu = np.mean(X, axis=0)
    sigma = np.std(X, axis=0, ddof=1)
    X_n = (X - mu) / sigma
    return X_n, mu, sigma

X_n, mu, sigma = feature_normalisation(X)

Input values:

Name Type Description
X numpy.ndarray X variables

Return values are:

Name Type Description
X_n numpy.ndarray X variables normalised
mu numpy.ndarray mean of X
sigma numpy.ndarray standard deviation of X
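
A quick way to check the normalisation is to confirm that every column of the result has mean 0 and sample standard deviation 1. The sketch below repeats the function so it runs on its own; the four-row array is made-up data, not taken from the course files.

```python
import numpy as np

def feature_normalisation(X):
    mu = np.mean(X, axis=0)
    sigma = np.std(X, axis=0, ddof=1)
    return (X - mu) / sigma, mu, sigma

# Made-up data: two features on very different scales.
X = np.array([[1000.0, 2.0],
              [1500.0, 3.0],
              [2000.0, 3.0],
              [2500.0, 4.0]])

X_n, mu, sigma = feature_normalisation(X)

col_means = X_n.mean(axis=0)
col_stds = X_n.std(axis=0, ddof=1)

print(np.allclose(col_means, 0.0))  # True
print(np.allclose(col_stds, 1.0))   # True
```

Note that a feature with identical values in every row would give sigma = 0 and a division by zero, so this sketch assumes every column varies.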

Cost function

Uses the following equation:

\begin{align*}& J(\theta) = {1 \over 2m} \sum_{i=1}^m \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 \end{align*}

\begin{align*} h_\theta(x) &= \theta_0x_0 + \theta_1x_1 + \theta_2x_2 + \cdots + \theta_nx_n \text{, where }x_0 = 1 \newline &= \theta^T x \end{align*}

def cost_function(X, y, theta):
    m = y.shape[0]
    # Residual of the hypothesis h_theta(x) = X @ theta for every example
    error = (X @ theta) - y
    J = 1 / (2*m) * np.sum(error**2)
    return J

J = cost_function(X_1, y, theta)

Input values:

Name Type Description
X numpy.ndarray X variables with first column of ones, i.e. X_1 obtained from import_data function
y numpy.ndarray y variables
theta numpy.ndarray theta values to compute cost function with

Return values are:

Name Type Description
J numpy.float64 cost function
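
A small worked example can make the formula concrete. The three points below are made up and lie exactly on the line y = x, so theta = [0, 1] should give a cost of zero, while theta = [0, 0] leaves errors of -1, -2 and -3 and a cost of (1 + 4 + 9) / (2 * 3).

```python
import numpy as np

def cost_function(X, y, theta):
    m = y.shape[0]
    error = (X @ theta) - y
    return np.sum(error**2) / (2 * m)

# Made-up data on the line y = x, with the column of ones already added.
X_1 = np.array([[1.0, 1.0],
                [1.0, 2.0],
                [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])

J_zero = cost_function(X_1, y, np.zeros(2))           # 14 / 6, about 2.33
J_fit = cost_function(X_1, y, np.array([0.0, 1.0]))   # perfect fit

print(J_zero)
print(J_fit)
```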

Gradient descent

Uses the following equation:

\begin{align*}& \text{repeat until convergence:} \lbrace \newline & \theta_j := \theta_j - \alpha \frac{1}{m} \sum\limits_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)} & \text{for j := 0…n}\newline \rbrace \end{align*}

\begin{align*} h_\theta(x) &= \theta_0x_0 + \theta_1x_1 + \theta_2x_2 + \cdots + \theta_nx_n \text{, where }x_0 = 1 \newline &= \theta^T x \end{align*}
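
The code applies all of these per-parameter updates at once. With the training examples stacked as the rows of X, the sum over i for a given j is the j-th entry of X^T (X\theta - y), so the simultaneous update can be written in vectorised form as:

\begin{align*} \theta := \theta - \frac{\alpha}{m} X^T \left( X\theta - y \right) \end{align*}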

def gradient_descent(X, y, theta, alpha, iterations):
    m = y.shape[0]
    J_history = []

    for i in range(iterations):
        # Record the cost before each update
        J_history.append(cost_function(X, y, theta))
        # X^T (X theta - y) holds the summed gradient term for every theta_j
        gradient = np.transpose(X) @ ((X @ theta) - y)
        theta = theta - (alpha / m) * gradient
    return theta, J_history

theta, J_history = gradient_descent(X_1, y, theta, alpha, iterations)

Input values:

Name Type Description
X numpy.ndarray X variables with first column of ones with / without feature normalisation
y numpy.ndarray y variables
theta numpy.ndarray initial theta values by default set at an array of zeros ‘np.zeros(features)’
alpha float learning rate
iterations integer number of iterations of gradient descent to perform

Return values:

Name Type Description
theta numpy.ndarray theta values computed from gradient descent
J_history list calculated cost function for each theta of each iteration (can be used to ensure optimal learning rate)
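
As a rough illustration of how J_history helps when choosing a learning rate, the sketch below runs gradient descent twice on made-up data that follows y = 1 + 2x exactly. The values alpha = 0.1 and alpha = 0.5 are picked only for this toy data set: the first converges, the second overshoots and the cost grows.

```python
import numpy as np

def cost_function(X, y, theta):
    m = y.shape[0]
    error = (X @ theta) - y
    return np.sum(error**2) / (2 * m)

def gradient_descent(X, y, theta, alpha, iterations):
    m = y.shape[0]
    J_history = []
    for _ in range(iterations):
        J_history.append(cost_function(X, y, theta))
        theta = theta - (alpha / m) * (X.T @ ((X @ theta) - y))
    return theta, J_history

# Made-up data that follows y = 1 + 2*x exactly.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1 + 2 * x
X_1 = np.c_[np.ones(x.shape[0]), x]

theta_good, J_good = gradient_descent(X_1, y, np.zeros(2), alpha=0.1, iterations=2000)
theta_bad, J_bad = gradient_descent(X_1, y, np.zeros(2), alpha=0.5, iterations=20)

print(np.allclose(theta_good, [1.0, 2.0], atol=1e-6))  # True: recovers intercept and slope
print(J_good[-1] < J_good[0])                          # True: cost went down
print(J_bad[-1] > J_bad[0])                            # True: cost blew up
```

Plotting J_history against the iteration number gives the same information visually: a curve that rises instead of falling suggests the learning rate is too large.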

Normal equation

Uses the following equation:

\begin{align*} &\theta = \left( X^TX \right)^{-1}X^Ty \end{align*}

def normal_equation(X, y):
    theta = np.linalg.inv(np.transpose(X) @ X) @ np.transpose(X) @ y
    return theta

theta = normal_equation(X_1, y)

Input values:

Name Type Description
X numpy.ndarray X variables with first column of ones with / without feature normalisation
y numpy.ndarray y variables

Return value:

Name Type Description
theta numpy.ndarray theta values computed from normal equation
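
Forming the inverse explicitly follows the formula directly, but it can be numerically fragile when features are close to collinear. As an alternative that is not part of the course exercise, NumPy can solve the same system with np.linalg.solve or fit X directly with np.linalg.lstsq. The sketch below uses made-up data and shows that all three approaches agree on a well-conditioned problem.

```python
import numpy as np

# Made-up data: y = 1 + 2*x plus a little fixed noise.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1 + 2 * x + np.array([0.1, -0.1, 0.05, -0.05, 0.0])
X_1 = np.c_[np.ones(x.shape[0]), x]

# The formula as written, with an explicit inverse.
theta_inv = np.linalg.inv(X_1.T @ X_1) @ X_1.T @ y

# Solve (X^T X) theta = X^T y without forming the inverse.
theta_solve = np.linalg.solve(X_1.T @ X_1, X_1.T @ y)

# Least-squares solver working on X directly.
theta_lstsq, residuals, rank, singular_values = np.linalg.lstsq(X_1, y, rcond=None)

print(np.allclose(theta_inv, theta_solve))  # True
print(np.allclose(theta_inv, theta_lstsq))  # True
```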

Predicting y

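def predict_y(theta, x, mu, sigma):
    if any(v is None for v in [mu, sigma]):
        # No normalisation statistics given: use x as-is
        x_1 = np.insert(x, 0, 1)
        y = x_1 @ theta
    else:
        # Scale x with the training mean and standard deviation first
        x_n = np.transpose((x - mu) / sigma)
        x_n1 = np.insert(x_n, 0, 1)
        y = x_n1 @ theta
    return y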

y = predict_y(theta, x, mu, sigma)

Input values are:

Name Type Description
theta numpy.ndarray theta values used to calculate predicted y
x numpy.ndarray set of x values used to predict y
mu numpy.ndarray mean of X (returned from import_data function). If set as None function will not apply feature normalisation when predicting y.
sigma numpy.ndarray standard deviation of X (returned from import_data function). If set as None function will not apply feature normalisation when predicting y.

Return value:

Name Type Description
y numpy.float64 predicted y value from given thetas and x variables
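
The reason mu and sigma are passed in is that thetas fitted on normalised features only make sense for inputs scaled the same way. The sketch below illustrates this with made-up data that follows y = 3 + 2*x1 + 0.5*x2 exactly. It uses a compact variant of predict_y and a hypothetical fit helper built on np.linalg.lstsq, both written only for this example.

```python
import numpy as np

def predict_y(theta, x, mu=None, sigma=None):
    # Scale x only when the training statistics are supplied.
    if mu is not None and sigma is not None:
        x = (x - mu) / sigma
    return np.insert(x, 0, 1) @ theta

def fit(X_features, y):
    # Hypothetical helper: least-squares fit with an intercept column.
    X_1 = np.c_[np.ones(X_features.shape[0]), X_features]
    theta, *_ = np.linalg.lstsq(X_1, y, rcond=None)
    return theta

# Made-up training data following y = 3 + 2*x1 + 0.5*x2 exactly.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 8.0]])
y = 3 + 2 * X[:, 0] + 0.5 * X[:, 1]

theta_raw = fit(X, y)
mu = X.mean(axis=0)
sigma = X.std(axis=0, ddof=1)
theta_norm = fit((X - mu) / sigma, y)

x_new = np.array([4.0, 10.0])                     # true value: 3 + 8 + 5 = 16
y_raw = predict_y(theta_raw, x_new)               # unnormalised model
y_norm = predict_y(theta_norm, x_new, mu, sigma)  # normalised model, scaled input
y_wrong = predict_y(theta_norm, x_new)            # normalised model, unscaled input

print(np.isclose(y_raw, 16.0))    # True
print(np.isclose(y_norm, 16.0))   # True
print(np.isclose(y_wrong, 16.0))  # False
```
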
June 2020