Programming Exercise 3



Week 4 of Andrew Ng’s Machine Learning course introduced the concepts of neural networks and their applications. The coding assignment was to recognise handwritten digits. Two methods were used: one-vs-all logistic regression and forward propagation of a neural network. The graded assignments in Python are outlined below. This repository was used as a guide to importing the .mat file data and to using optimize from scipy. The Git repository of the complete script is here.

Required modules

import numpy as np
from scipy import optimize
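
The later code also relies on a sigmoid helper and on an import_data function (referenced in the tables below) that loads the course .mat file and prepends the column of ones. Neither is shown in the snippets here; a minimal sketch, assuming the data is loaded with scipy.io.loadmat and that the file and variable names (ex3data1.mat, X, y) follow the course materials, is:

from scipy.io import loadmat

def sigmoid(z):
    # standard logistic function, applied element-wise
    return 1 / (1 + np.exp(-z))

def import_data(path='ex3data1.mat'):
    # load the MATLAB data file and flatten y to a 1-D array
    data = loadmat(path)
    X = data['X']
    # assumption: the course file stores the digit 0 as label 10,
    # so remap it back to 0 to give labels 0-9
    y = data['y'].ravel() % 10
    # prepend a column of ones for the intercept term
    X_1 = np.c_[np.ones(X.shape[0]), X]
    return(X_1, y)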

Cost function and gradient with optional regularisation

Uses the following formula:

Regularised cost function: \begin{align*}& J_{reg}(\theta) = - \dfrac{1}{m} \left[\sum_{i=1}^{m} y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1-h_\theta(x^{(i)}))\right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2 \end{align*}

Regularised gradient: \begin{align*} & \frac{\partial}{\partial \theta_j} J_{reg}(\theta) = \frac{1}{m} \sum\limits_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \cdot x_j^{(i)} \hspace{25pt} \text{for }j = 0 \end{align*}

\begin{align*} \frac{\partial}{\partial \theta_j} J_{reg}(\theta) = \frac{1}{m} \sum\limits_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \cdot x_j^{(i)} + \frac{\lambda}{m}\theta_j \hspace{25pt} \text{for }j \geq 1 \end{align*}

Note: If λ is set to 0 then the regularisation formula will evaluate to 0; therefore, no regularisation will be applied.

def cost_function(theta, X_1, y, lambda_=0):
    m = y.shape[0]
    h = sigmoid(X_1 @ theta)

    # cost function
    error = (-y * np.log(h)) - (1 - y) * np.log(1 - h)

    regularise_cost = (lambda_/(2*m)) * np.sum(theta[1:]**2)

    J = 1/m * np.sum(error) + regularise_cost

    # gradient
    regularise_gradient = (lambda_/m) * theta[1:]

    grad_0 = 1/m * (np.transpose(X_1) @ (h - y))[0]
    grad_1 = 1/m * (np.transpose(X_1) @ (h - y))[1:] + regularise_gradient
    grad = np.hstack([grad_0.reshape(1), grad_1])

    return(J, grad)

...
J, grad = cost_function(theta, X_1, y, lambda_=0)

Input values:

Name Type Description
theta numpy.ndarray theta values to compute cost function with
X_1 numpy.ndarray X variables with first column of ones, i.e. X_1 obtained from import_data function
y numpy.ndarray y variables
lambda_ float lambda value used for regularisation (if 0 no regularisation is applied)

Return values are:

Name Type Description
J numpy.float64 cost function
grad numpy.ndarray gradient
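
As a quick sanity check, cost_function can be run on a small hand-made example before using the full data set. The values below are purely illustrative and not taken from the course data:

theta_test = np.array([0.0, 0.5, -0.5])
X_test = np.array([[1.0, 0.2, 0.7],
                   [1.0, 0.9, 0.1],
                   [1.0, 0.4, 0.4]])
y_test = np.array([1.0, 0.0, 1.0])

# unregularised (lambda_ = 0) vs regularised (lambda_ = 1) cost and gradient
print(cost_function(theta_test, X_test, y_test, lambda_=0))
print(cost_function(theta_test, X_test, y_test, lambda_=1))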

One vs all classification

Uses the following formula:

\begin{align*}& y \in \lbrace0, 1 … n\rbrace \newline& h_\theta^{(0)}(x) = P(y = 0 | x ; \theta) \newline& h_\theta^{(1)}(x) = P(y = 1 | x ; \theta) \newline& \cdots \newline& h_\theta^{(n)}(x) = P(y = n | x ; \theta) \newline& \mathrm{prediction} = \max_i( h_\theta ^{(i)}(x) )\newline\end{align*}

def one_vs_all(X_1, y, num_labels, lambda_):
    m, n = X_1.shape
    theta_initial = np.zeros(n)
    theta_all = np.empty([num_labels, n])

    options = {'maxiter': 50}

    for i in range(num_labels):
        # Run minimize to obtain the optimal theta.
        res = optimize.minimize(cost_function,
                                theta_initial,
                                (X_1, (y == i).astype(float), lambda_),
                                jac=True,
                                method='TNC',
                                options=options)
        print(f"For nummber: {i} - Cost Function {res.fun}")
        theta_all[i] = res.x

    return(theta_all)
...

lambda_ = 0.1
num_labels = 10
theta_all = one_vs_all(X_1, y, num_labels, lambda_)

Input values:

Name Type Description
X_1 numpy.ndarray X variables with first column of ones, i.e. X_1 obtained from import_data function
y numpy.ndarray y variables
num_labels integer number of output classes, i.e. 0-9 digits gives 10 classes
lambda_ float lambda value used for regularisation (if 0 no regularisation is applied)

Return values are:

Name Type Description
theta_all numpy.ndarray theta values calculated by the optimize.minimize function, one row per class

Predicting from one vs all and measuring accuracy

Uses the formula:

\begin{align*} h_\theta(x) = \frac{1}{1 + e^{-\theta^{\top} x}} \end{align*}

def predict_one_vs_all(theta, X_1):
    p = sigmoid(X_1 @ np.transpose(theta))
    p = np.argmax(p, axis=1)
    return(p)

...
prediction = predict_one_vs_all(theta_all, X_1)
accuracy = np.mean(prediction == y) * 100
print(f"\nAccuracy of model: {accuracy} \n")

Input values are:

Name Type Description
theta numpy.ndarray theta values used to calculate the prediction, i.e. theta_all from the one_vs_all function
X_1 numpy.ndarray set of x values with first column of ones used to form prediction

Return value:

Name Type Description
p numpy.ndarray predicted values from given thetas and x variables

Predicting from neural network forward propagation and measuring accuracy

Uses the formula:

\begin{align*}& \text{Input Layer } \newline& a^{(1)} = x \hspace{10pt} \text{; add } a_0^{(1)} \newline& \newline& \text{Hidden Layer } \newline& z^{(2)} = \theta^{(1)}a^{(1)} \newline& a^{(2)} = g(z^{(2)}) \hspace{10pt} \text{; add } a_0^{(2)} \newline& \newline& \text{Output Layer } \newline& z^{(3)} = \theta^{(2)}a^{(2)} \newline& a^{(3)} = g(z^{(3)}) = h_\theta(x) \end{align*}

def predict_nn(theta_1, theta_2, X_1):
    a_2 = sigmoid(X_1 @ np.transpose(theta_1))
    a_2 = np.c_[np.ones(a_2.shape[0]), a_2]
    a_3 = sigmoid(a_2 @ np.transpose(theta_2))
    p = np.argmax(a_3, axis=1)
    return(p)

...
prediction = predict_nn(theta_1, theta_2, X_1)
accuracy = np.mean(prediction == y) * 100
print(f"\nAccuracy of model: {accuracy} \n")

Input values are:

Name Type Description
theta_1 numpy.ndarray theta values mapping the input layer to the hidden layer
theta_2 numpy.ndarray theta values mapping the hidden layer to the output layer
X_1 numpy.ndarray set of x values with first column of ones used to form prediction

Return value:

Name Type Description
p numpy.ndarray predicted values from given thetas and x variables
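
The pre-trained weight matrices theta_1 and theta_2 used above are supplied with the course exercise rather than trained here. A minimal loading sketch, assuming scipy.io.loadmat and the file and variable names (ex3weights.mat, Theta1, Theta2) used in the course materials, is:

from scipy.io import loadmat

weights = loadmat('ex3weights.mat')
theta_1 = weights['Theta1']   # maps the input layer to the hidden layer
theta_2 = weights['Theta2']   # maps the hidden layer to the output layer

prediction = predict_nn(theta_1, theta_2, X_1)

Note that the supplied weights were trained against the course's original 1-10 label ordering, so depending on how the labels were remapped when importing the data the argmax index may need shifting before comparing against y.
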
July 2020