Programming Exercise 5



Week 6 of Andrew Ng’s Machine Learning Course covered how to ‘debug’ a learning algorithm, including the concepts of bias and variance. The coding assignment revisited linear regression, adding regularisation, and then used polynomial regression to demonstrate learning curves and validation curves. The graded assignments in Python are outlined below. The Git repository of the complete script is here.

Required modules

import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize
from scipy.io import loadmat
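
The snippets below assume the exercise data has already been loaded and an intercept column added. A minimal sketch of that step (the file name ex5data1.mat and the MATLAB variable names are taken from the original exercise; flattening y keeps the vector shapes consistent with the code below):

data = loadmat("ex5data1.mat")
X, y = data["X"], data["y"].flatten()
Xval, yval = data["Xval"], data["yval"].flatten()

# Prepend a column of ones for the intercept term
X_1 = np.concatenate([np.ones((X.shape[0], 1)), X], axis=1)
Xval_1 = np.concatenate([np.ones((Xval.shape[0], 1)), Xval], axis=1)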

Cost function

Uses the following formulas:

Regularised cost function: \begin{align*}& J(\theta) = {1 \over 2m} \sum_{i=1}^m \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2 \end{align*}

\begin{align*} h_\theta(x) &= \theta_0x_0 + \theta_1x_1 + \theta_2x_2 + \cdots + \theta_nx_n \text{, where }x_0 = 1 \newline &= \theta^T x \end{align*}

Regularised gradient:

\begin{align*} & \frac{\partial}{\partial \theta_j} J_{reg}(\theta) = \frac{1}{m} \sum\limits_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)} \hspace{25pt} \text{for }j = 0 \end{align*}

\begin{align*} \frac{\partial}{\partial \theta_j} J_{reg}(\theta) = \frac{1}{m} \sum\limits_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)} + \frac{\lambda}{m}\theta_j \hspace{25pt} \text{for }j \geq 1 \end{align*}

def cost_function(X, y, theta, lambda_=0):
    m = y.shape[0]
    h = X @ theta
    error = (h - y)

    # cost function
    regularise_cost = (lambda_ / (2*m)) * np.sum(theta[1:]**2)
    J = 1/(2*m) * np.sum(error**2) + regularise_cost

    # gradient
    grad = np.empty(theta.shape)
    grad[0] = 1/m * (error @ X)[0]
    grad[1:] = 1/m * (error @ X)[1:] + (lambda_/m) * theta[1:]

    return J, grad
...
J, grad = cost_function(X_1, y, theta, lambda_=1)

Input values:

Name | Type | Description
X | numpy.ndarray | X variables with first column of ones
y | numpy.ndarray | y variables
theta | numpy.ndarray | theta values to compute the cost function with
lambda_ | float | lambda value used for regularisation (if 0, no regularisation is applied)

Return values are:

Name | Type | Description
J | numpy.float64 | value of the cost function
grad | numpy.ndarray | gradient
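
The learning and validation curves below call train_linear_reg, which is not reproduced in this excerpt. A minimal sketch of such a helper built on scipy.optimize.minimize (the optimiser method, iteration cap and zero initialisation are assumptions, not the original implementation):

def train_linear_reg(cost_function, X, y, lambda_, maxiter=200):
    # Optimise theta starting from zeros; cost_function returns (J, grad),
    # so jac=True lets the optimiser reuse the analytical gradient
    initial_theta = np.zeros(X.shape[1])
    res = optimize.minimize(lambda t: cost_function(X, y, t, lambda_),
                            initial_theta,
                            jac=True,
                            method='TNC',
                            options={'maxiter': maxiter})
    return res.x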

Learning curve

def learning_curve(X, y, Xval, yval, lambda_=0):
    print("\n- Learning curve -")
    m = y.shape[0]
    error_train = np.zeros(m)
    error_val = np.zeros(m)

    for i in range(1, m+1):
        theta = train_linear_reg(cost_function, X[:i, ], y[:i], lambda_)
        error_train[i-1], _ = cost_function(X[:i, ], y[:i], theta, lambda_=0)
        error_val[i-1], _ = cost_function(Xval, yval, theta, lambda_=0)

    print("Training errors | Validation errors")
    print(np.stack([error_train, error_val], axis=1))
    print("\n")

    plt.plot(np.arange(1, m+1), error_train)
    plt.plot(np.arange(1, m+1), error_val)
    plt.xlabel("Number of training examples")
    plt.ylabel("Error")
    plt.legend(['Train', 'Cross Validation'])
    plt.title("Learning curve for linear regression")
    plt.show()
...
learning_curve(X_1, y, Xval_1, yval, lambda_=0)

Input values:

Name | Type | Description
X | numpy.ndarray | X variables from the training dataset with first column of ones
y | numpy.ndarray | y variables from the training dataset
Xval | numpy.ndarray | X variables from the validation dataset with first column of ones
yval | numpy.ndarray | y variables from the validation dataset
lambda_ | float | lambda value used for regularisation (if 0, no regularisation is applied)

Returns:

Table of training/validation errors and a plot of the learning curve.

Polynomial regression

Uses the formula:

\begin{align*} & h_\theta(x) = \theta_0x_0 + \theta_1x_1 \newline \newline & \text{map to p-th power} \newline \newline & h_\theta(x) = \theta_0x_0 + \theta_1x_1 + \theta_2x_1^2 + \cdots + \theta_px_1^p \end{align*}

def poly_features(X, p):
    X_poly = np.zeros((X.shape[0], p))

    for i in range(1, p+1):
        X_poly[:, i-1] = X.flatten()**(i)

    return X_poly
...
X_poly = poly_features(X, p=8)

Input values:

Name | Type | Description
X | numpy.ndarray | X variables
p | integer | polynomial power to map the features to

Return values are:

Name | Type | Description
X_poly | numpy.ndarray | array of X variables mapped up to the p-th polynomial power
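
The validation curve below is run on X_poly_n_1, i.e. the polynomial features after normalisation with a column of ones prepended. A minimal sketch of that step (the helper name feature_normalize is an assumption; in the original exercise the validation features are scaled with the training set’s mu and sigma before the ones column is added):

def feature_normalize(X):
    # Standardise each column to zero mean and unit standard deviation
    mu = np.mean(X, axis=0)
    sigma = np.std(X, axis=0, ddof=1)
    return (X - mu) / sigma, mu, sigma
...
X_poly_n, mu, sigma = feature_normalize(X_poly)
X_poly_n_1 = np.concatenate([np.ones((X_poly_n.shape[0], 1)), X_poly_n], axis=1)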

Validation curve

def validation_curve(X, y, Xval, yval):
    print("\n- Validation curve -")
    lambda_vec = [0, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3, 10]
    error_train = np.zeros(len(lambda_vec))
    error_val = np.zeros(len(lambda_vec))

    for i in range(len(lambda_vec)):
        lambda_ = lambda_vec[i]
        theta = train_linear_reg(cost_function, X, y, lambda_)
        error_train[i], _ = cost_function(X, y, theta, lambda_=0)
        error_val[i], _ = cost_function(Xval, yval, theta, lambda_=0)

    print("Training errors | Validation errors")
    print(np.stack([error_train, error_val], axis=1))
    print("\n")

    plt.plot(lambda_vec, error_train)
    plt.plot(lambda_vec, error_val)
    plt.xlabel("Lambda")
    plt.ylabel("Error")
    plt.legend(['Train', 'Cross Validation'])
    plt.title("Validation curve for linear regression")
    plt.show()
...
validation_curve(X_poly_n_1, y, X_poly_val_1, yval)

Input values:

Name | Type | Description
X | numpy.ndarray | X variables from the training dataset with first column of ones
y | numpy.ndarray | y variables from the training dataset
Xval | numpy.ndarray | X variables from the validation dataset with first column of ones
yval | numpy.ndarray | y variables from the validation dataset

Returns:

Table of training/validation errors and a plot of the validation curve.
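
To pick a regularisation strength from the curve, the usual choice is the lambda with the lowest cross-validation error; a sketch, assuming validation_curve were adapted to also return lambda_vec and error_val:

# Hypothetical: requires validation_curve to return (lambda_vec, error_val)
lambda_vec, error_val = validation_curve(X_poly_n_1, y, X_poly_val_1, yval)
best_lambda = lambda_vec[np.argmin(error_val)]
print("Lambda with lowest validation error:", best_lambda)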

July 2020