Programming Exercise 5



Week 6 of Andrew Ng’s Machine Learning Course covered how to ‘debug’ a learning algorithm, including the concepts of bias and variance. The coding assignment revisited linear regression, adding regularisation, and then used polynomial regression to demonstrate learning curves and validation curves. The graded assignments in Python are outlined below. The Git repository of the complete script is here.

Required modules

import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize
from scipy.io import loadmat
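
The snippets below assume the exercise data has already been loaded and an intercept column added. A minimal sketch of that step (the file name ex5data1.mat and the MATLAB variable names are taken from the original exercise; flattening y keeps the vector shapes consistent with the code below):

data = loadmat("ex5data1.mat")
X, y = data["X"], data["y"].flatten()
Xval, yval = data["Xval"], data["yval"].flatten()

# Prepend a column of ones for the intercept term
X_1 = np.concatenate([np.ones((X.shape[0], 1)), X], axis=1)
Xval_1 = np.concatenate([np.ones((Xval.shape[0], 1)), Xval], axis=1)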

Cost function

Uses the following formulas:

Regularised cost function: \begin{align*}& J(\theta) = {1 \over 2m} \sum_{i=1}^m \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2 \end{align*}

\begin{align*} h_\theta(x) &= \theta_0x_0 + \theta_1x_1 + \theta_2x_2 + \cdots + \theta_nx_n \text{, where }x_0 = 1 \newline &= \theta^T x \end{align*}

Regularised gradient:

\begin{align*} & \frac{\partial}{\partial \theta_j} J_{reg}(\theta) = \frac{1}{m} \sum\limits_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)} \hspace{25pt} \text{for }j = 0 \end{align*}

\begin{align*} \frac{\partial}{\partial \theta_j} J_{reg}(\theta) = \frac{1}{m} \sum\limits_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)} + \frac{\lambda}{m}\theta_j \hspace{25pt} \text{for }j \geq 1 \end{align*}

def cost_function(X, y, theta, lambda_=0):
    m = y.shape[0]
    h = X @ theta
    error = (h - y)

    # cost function
    regularise_cost = (lambda_ / (2*m)) * np.sum(theta[1:]**2)
    J = 1/(2*m) * np.sum(error**2) + regularise_cost

    # gradient
    grad = np.empty(theta.shape)
    grad[0] = 1/m * (error @ X)[0]
    grad[1:] = 1/m * (error @ X)[1:] + (lambda_/m) * theta[1:]

    return J, grad
...
J, grad = cost_function(X_1, y, theta, lambda_=1)

Input values:

Name | Type | Description
X | numpy.ndarray | X variables with first column of ones
y | numpy.ndarray | y variables
theta | numpy.ndarray | theta values to compute the cost function with
lambda_ | float | lambda value used for regularisation (if 0, no regularisation is applied)

Return values are:

Name | Type | Description
J | numpy.float64 | value of the cost function
grad | numpy.ndarray | gradient
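
The learning and validation curves below call train_linear_reg, which is not reproduced in this excerpt. A minimal sketch of such a helper built on scipy.optimize.minimize (the optimiser method, iteration cap and zero initialisation are assumptions, not the original implementation):

def train_linear_reg(cost_function, X, y, lambda_, maxiter=200):
    # Optimise theta starting from zeros; cost_function returns (J, grad),
    # so jac=True lets the optimiser reuse the analytical gradient
    initial_theta = np.zeros(X.shape[1])
    res = optimize.minimize(lambda t: cost_function(X, y, t, lambda_),
                            initial_theta,
                            jac=True,
                            method='TNC',
                            options={'maxiter': maxiter})
    return res.x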

Learning curve

def learning_curve(X, y, Xval, yval, lambda_=0):
    print("\n- Learning curve -")
    m = y.shape[0]
    error_train = np.zeros(m)
    error_val = np.zeros(m)

    for i in range(1, m+1):
        theta = train_linear_reg(cost_function, X[:i, ], y[:i], lambda_)
        error_train[i-1], _ = cost_function(X[:i, ], y[:i], theta, lambda_=0)
        error_val[i-1], _ = cost_function(Xval, yval, theta, lambda_=0)

    print("Training errors | Validation errors")
    print(np.stack([error_train, error_val], axis=1))
    print("\n")

    plt.plot(np.arange(1, m+1), error_train)
    plt.plot(np.arange(1, m+1), error_val)
    plt.xlabel("Number of training examples")
    plt.ylabel("Error")
    plt.legend(['Train', 'Cross Validation'])
    plt.title("Learning curve for linear regression")
    plt.show()
...
learning_curve(X_1, y, Xval_1, yval, lambda_=0)

Input values:

Name | Type | Description
X | numpy.ndarray | X variables from the training dataset with first column of ones
y | numpy.ndarray | y variables from the training dataset
Xval | numpy.ndarray | X variables from the validation dataset with first column of ones
yval | numpy.ndarray | y variables from the validation dataset
lambda_ | float | lambda value used for regularisation (if 0, no regularisation is applied)

Returns:

Table of training/validation errors and a plot of the learning curve.

Polynomial regression

Uses the formula:

\begin{align*} & h_\theta(x) = \theta_0x_0 + \theta_1x_1 \newline \newline & \text{map to p-th power} \newline \newline & h_\theta(x) = \theta_0x_0 + \theta_1x_1 + \theta_2x_1^2 + \cdots + \theta_px_1^p \end{align*}

def poly_features(X, p):
    X_poly = np.zeros((X.shape[0], p))

    for i in range(1, p+1):
        X_poly[:, i-1] = X.flatten()**(i)

    return X_poly
...
X_poly = poly_features(X, p=8)

Input values:

Name | Type | Description
X | numpy.ndarray | X variables
p | integer | polynomial power to map the features to

Return values are:

Name | Type | Description
X_poly | numpy.ndarray | array of X variables mapped up to the p-th polynomial power
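
The validation curve below is run on X_poly_n_1, i.e. the polynomial features after normalisation with a column of ones prepended. A minimal sketch of that step (the helper name feature_normalize is an assumption; in the original exercise the validation features are scaled with the training set’s mu and sigma before the ones column is added):

def feature_normalize(X):
    # Standardise each column to zero mean and unit standard deviation
    mu = np.mean(X, axis=0)
    sigma = np.std(X, axis=0, ddof=1)
    return (X - mu) / sigma, mu, sigma
...
X_poly_n, mu, sigma = feature_normalize(X_poly)
X_poly_n_1 = np.concatenate([np.ones((X_poly_n.shape[0], 1)), X_poly_n], axis=1)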

Validation curve

def validation_curve(X, y, Xval, yval):
    print("\n- Validation curve -")
    lambda_vec = [0, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3, 10]
    error_train = np.zeros(len(lambda_vec))
    error_val = np.zeros(len(lambda_vec))

    for i in range(len(lambda_vec)):
        lambda_ = lambda_vec[i]
        theta = train_linear_reg(cost_function, X, y, lambda_)
        error_train[i], _ = cost_function(X, y, theta, lambda_=0)
        error_val[i], _ = cost_function(Xval, yval, theta, lambda_=0)

    print("Training errors | Validation errors")
    print(np.stack([error_train, error_val], axis=1))
    print("\n")

    plt.plot(lambda_vec, error_train)
    plt.plot(lambda_vec, error_val)
    plt.xlabel("Lambda")
    plt.ylabel("Error")
    plt.legend(['Train', 'Cross Validation'])
    plt.title("Validation curve for linear regression")
    plt.show()
...
validation_curve(X_poly_n_1, y, X_poly_val_1, yval)

Input values:

Name | Type | Description
X | numpy.ndarray | X variables from the training dataset with first column of ones
y | numpy.ndarray | y variables from the training dataset
Xval | numpy.ndarray | X variables from the validation dataset with first column of ones
yval | numpy.ndarray | y variables from the validation dataset

Returns:

Table of training/validation errors and a plot of the validation curve.
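
To pick a regularisation strength from the curve, the usual choice is the lambda with the lowest cross-validation error; a sketch, assuming validation_curve were adapted to also return lambda_vec and error_val:

# Hypothetical: requires validation_curve to return (lambda_vec, error_val)
lambda_vec, error_val = validation_curve(X_poly_n_1, y, X_poly_val_1, yval)
best_lambda = lambda_vec[np.argmin(error_val)]
print("Lambda with lowest validation error:", best_lambda)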

July 2020