Programming Exercise 3



Week 4 of Andrew Ng’s Machine Learning course introduced the concepts of neural networks and their applications. The coding assignment was to recognise handwritten digits. Two methods were used: one-vs-all logistic regression and forward propagation of a neural network. The graded assignments in Python are outlined below. This repository was used as a guide to importing the .mat file data and to using optimize from scipy. The Git repository of the complete script is here.

Required modules

import numpy as np
from scipy import optimize
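
The later code also relies on a sigmoid helper and on an import_data function (referenced in the tables below) that loads the course .mat file and prepends the column of ones. Neither is shown in the snippets here; a minimal sketch, assuming the data is loaded with scipy.io.loadmat and that the file and variable names (ex3data1.mat, X, y) follow the course materials, is:

from scipy.io import loadmat

def sigmoid(z):
    # standard logistic function, applied element-wise
    return 1 / (1 + np.exp(-z))

def import_data(path='ex3data1.mat'):
    # load the MATLAB data file and flatten y to a 1-D array
    data = loadmat(path)
    X = data['X']
    # assumption: the course file stores the digit 0 as label 10,
    # so remap it back to 0 to give labels 0-9
    y = data['y'].ravel() % 10
    # prepend a column of ones for the intercept term
    X_1 = np.c_[np.ones(X.shape[0]), X]
    return(X_1, y)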

Cost function and gradient with optional regularisation

Uses the following formula:

Regularised cost function: \begin{align*}& J_{reg}(\theta) = - \dfrac{1}{m} \left[\sum_{i=1}^{m} y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1-h_\theta(x^{(i)}))\right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2 \end{align*}

Regularised gradient: \begin{align*} & \frac{\partial}{\partial \theta_j} J_{reg}(\theta) = \frac{1}{m} \sum\limits_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \cdot x_j^{(i)} \hspace{25pt} \text{for }j = 0 \end{align*}

\begin{align*} \frac{\partial}{\partial \theta_j} J_{reg}(\theta) = \frac{1}{m} \sum\limits_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \cdot x_j^{(i)} + \frac{\lambda}{m}\theta_j \hspace{25pt} \text{for }j \geq 1 \end{align*}

Note: If λ is set to 0 then the regularisation formula will evaluate to 0; therefore, no regularisation will be applied.

def cost_function(theta, X_1, y, lambda_=0):
    m = y.shape[0]
    h = sigmoid(X_1 @ theta)

    # cost function
    error = (-y * np.log(h)) - (1 - y) * np.log(1 - h)

    regularise_cost = (lambda_/(2*m)) * np.sum(theta[1:]**2)

    J = 1/m * np.sum(error) + regularise_cost

    # gradient
    regularise_gradient = (lambda_/m) * theta[1:]

    grad_0 = 1/m * (np.transpose(X_1) @ (h - y))[0]
    grad_1 = 1/m * (np.transpose(X_1) @ (h - y))[1:] + regularise_gradient
    grad = np.hstack([grad_0.reshape(1), grad_1])

    return(J, grad)

...
J, grad = cost_function(theta, X_1, y, lambda_=0)

Input values:

Name Type Description
theta numpy.ndarray theta values to compute cost function with
X_1 numpy.ndarray X variables with first column of ones, i.e. X_1 obtained from import_data function
y numpy.ndarray y variables
lambda_ float lambda value used for regularisation (if 0 no regularisation is applied)

Return values are:

Name Type Description
J numpy.float64 cost function
grad numpy.ndarray gradient
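
As a quick sanity check, cost_function can be run on a small hand-made example before using the full data set. The values below are purely illustrative and not taken from the course data:

theta_test = np.array([0.0, 0.5, -0.5])
X_test = np.array([[1.0, 0.2, 0.7],
                   [1.0, 0.9, 0.1],
                   [1.0, 0.4, 0.4]])
y_test = np.array([1.0, 0.0, 1.0])

# unregularised (lambda_ = 0) vs regularised (lambda_ = 1) cost and gradient
print(cost_function(theta_test, X_test, y_test, lambda_=0))
print(cost_function(theta_test, X_test, y_test, lambda_=1))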

One vs all classification

Uses the following formula:

\begin{align*}& y \in \lbrace0, 1 … n\rbrace \newline& h_\theta^{(0)}(x) = P(y = 0 | x ; \theta) \newline& h_\theta^{(1)}(x) = P(y = 1 | x ; \theta) \newline& \cdots \newline& h_\theta^{(n)}(x) = P(y = n | x ; \theta) \newline& \mathrm{prediction} = \max_i( h_\theta ^{(i)}(x) )\newline\end{align*}

def one_vs_all(X_1, y, num_labels, lambda_):
    m, n = X_1.shape
    theta_initial = np.zeros(n)
    theta_all = np.empty([num_labels, n])

    options = {'maxiter': 50}

    for i in range(num_labels):
        # Run minimize to obtain the optimal theta.
        res = optimize.minimize(cost_function,
                                theta_initial,
                                (X_1, (y == i).astype(float), lambda_),
                                jac=True,
                                method='TNC',
                                options=options)
        print(f"For nummber: {i} - Cost Function {res.fun}")
        theta_all[i] = res.x

    return(theta_all)
...

lambda_ = 0.1
num_labels = 10
theta_all = one_vs_all(X_1, y, num_labels, lambda_)

Input values:

Name Type Description
X_1 numpy.ndarray X variables with first column of ones, i.e. X_1 obtained from import_data function
y numpy.ndarray y variables
num_labels integer number of output classes, i.e. 0-9 digits gives 10 classes
lambda_ float lambda value used for regularisation (if 0 no regularisation is applied)

Return values are:

Name Type Description
theta_all numpy.ndarray theta values calculated by the optimize.minimize function, one row per class

Predicting from one vs all and measuring accuracy

Uses the formula:

\begin{align*} h_\theta(x) = \frac{1}{1 + e^{-\theta^{\top} x}} \end{align*}

def predict_one_vs_all(theta, X_1):
    p = sigmoid(X_1 @ np.transpose(theta))
    p = np.argmax(p, axis=1)
    return(p)

...
prediction = predict_one_vs_all(theta_all, X_1)
accuracy = np.mean(prediction == y) * 100
print(f"\nAccuracy of model: {accuracy} \n")

Input values are:

Name Type Description
theta numpy.ndarray theta values used to calculate the prediction, i.e. theta_all from the one_vs_all function
X_1 numpy.ndarray set of x values with first column of ones used to form prediction

Return value:

Name Type Description
p numpy.ndarray predicted values from given thetas and x variables

Predicting from neural network forward propagation and measuring accuracy

Uses the formula:

\begin{align*}& \text{Input Layer } \newline& a^{(1)} = x \hspace{10pt} \text{; add } a_0^{(1)} \newline& \newline& \text{Hidden Layer } \newline& z^{(2)} = \theta^{(1)}a^{(1)} \newline& a^{(2)} = g(z^{(2)}) \hspace{10pt} \text{; add } a_0^{(2)} \newline& \newline& \text{Output Layer } \newline& z^{(3)} = \theta^{(2)}a^{(2)} \newline& a^{(3)} = g(z^{(3)}) = h_\theta(x) \end{align*}

def predict_nn(theta_1, theta_2, X_1):
    a_2 = sigmoid(X_1 @ np.transpose(theta_1))
    a_2 = np.c_[np.ones(a_2.shape[0]), a_2]
    a_3 = sigmoid(a_2 @ np.transpose(theta_2))
    p = np.argmax(a_3, axis=1)
    return(p)

...
prediction = predict_nn(theta_1, theta_2, X_1)
accuracy = np.mean(prediction == y) * 100
print(f"\nAccuracy of model: {accuracy} \n")

Input values are:

Name Type Description
theta_1 numpy.ndarray theta values mapping the input layer to the hidden layer
theta_2 numpy.ndarray theta values mapping the hidden layer to the output layer
X_1 numpy.ndarray set of x values with first column of ones used to form prediction

Return value:

Name Type Description
p numpy.ndarray predicted values from given thetas and x variables
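
The pre-trained weight matrices theta_1 and theta_2 used above are supplied with the course exercise rather than trained here. A minimal loading sketch, assuming scipy.io.loadmat and the file and variable names (ex3weights.mat, Theta1, Theta2) used in the course materials, is:

from scipy.io import loadmat

weights = loadmat('ex3weights.mat')
theta_1 = weights['Theta1']   # maps the input layer to the hidden layer
theta_2 = weights['Theta2']   # maps the hidden layer to the output layer

prediction = predict_nn(theta_1, theta_2, X_1)

Note that the supplied weights were trained against the course's original 1-10 label ordering, so depending on how the labels were remapped when importing the data the argmax index may need shifting before comparing against y.
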
July 2020