Week 4 of Andrew Ng’s Machine Learning Course introduced the concepts of neural networks and their applications. The coding assignment was to recognise handwritten digits. Two methods were used: one-vs-all logistic regression and forward propagation of a neural network. The graded assignments in Python are outlined below. This repository was used as a guide for importing the .mat
file data and for using optimize from scipy. The Git repository of the complete script is here.
import numpy as np
from scipy import optimize
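The tables below refer to X_1 as obtained from an import_data function, which is not reproduced in this post. A minimal sketch of what such a helper might look like, assuming a course-style .mat data file (the file name and variable keys are assumptions, not taken from the complete script):

from scipy.io import loadmat

def import_data(path='ex3data1.mat'):
    # Load the MATLAB-format data file provided with the course
    data = loadmat(path)
    X = data['X']             # one flattened image per row
    y = data['y'].flatten()   # digit labels (the raw course data may label digit 0 as 10 and need remapping to 0-9)
    # Prepend a column of ones so theta_0 acts as the intercept term
    X_1 = np.c_[np.ones(X.shape[0]), X]
    return X_1, y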
The cost_function uses the following formulas:
Regularised cost function: \begin{align*}& J_{reg}(\theta) = - \dfrac{1}{m} \left[\sum_{i=1}^{m} y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1-h_\theta(x^{(i)}))\right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2 \end{align*}
Regularised gradient: \begin{align*} & \frac{\partial}{\partial \theta_j} J_{reg}(\theta) = \frac{1}{m} \sum\limits_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \cdot x_j^{(i)} \hspace{25pt} \text{for }j = 0 \end{align*}
\begin{align*} \frac{\partial}{\partial \theta_j} J_{reg}(\theta) = \frac{1}{m} \sum\limits_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \cdot x_j^{(i)} + \frac{\lambda}{m}\theta_j \hspace{25pt} \text{for }j \geq 1 \end{align*}
Note: if λ is set to 0, the regularisation terms evaluate to 0 and no regularisation is applied.
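cost_function below relies on a sigmoid helper defined elsewhere in the script; a minimal sketch of the assumed implementation:

def sigmoid(z):
    # Element-wise logistic function g(z) = 1 / (1 + e^(-z))
    return 1 / (1 + np.exp(-z))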
def cost_function(theta, X_1, y, lambda_var=0):
    m = y.shape[0]
    h = sigmoid(X_1 @ theta)
    # cost function
    error = (-y * np.log(h)) - (1 - y) * np.log(1 - h)
    regularise_cost = (lambda_var/(2*m)) * np.sum(theta[1:]**2)
    J = 1/m * np.sum(error) + regularise_cost
    # gradient (theta_0 is not regularised)
    regularise_gradient = (lambda_var/m) * theta[1:]
    grad_0 = 1/m * (np.transpose(X_1) @ (h - y))[0]
    grad_1 = 1/m * (np.transpose(X_1) @ (h - y))[1:] + regularise_gradient
    grad = np.hstack([grad_0.reshape(1), grad_1])
    return(J, grad)
...
J, grad = cost_function(theta, X_1, y, lambda_var=0)
Input values:
Name | Type | Description |
---|---|---|
theta | numpy.ndarray | theta values to compute cost function with |
X_1 | numpy.ndarray | X variables with first column of ones, i.e. X_1 obtained from import_data function |
y | numpy.ndarray | y variables |
lambda_var | float or integer | lambda value used for regularisation (if 0, no regularisation is applied) |
Return values are:
Name | Type | Description |
---|---|---|
J | numpy.float64 | value of the regularised cost function |
grad | numpy.ndarray | gradient of the cost function with respect to theta |
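As a quick sanity check (not part of the original assignment write-up), evaluating the unregularised cost with theta set to all zeros should return roughly 0.693: with zero weights h = 0.5 for every example, so the cost is -log(0.5) ≈ 0.693 regardless of the labels.

theta_test = np.zeros(X_1.shape[1])
J_test, _ = cost_function(theta_test, X_1, (y == 1), lambda_var=0)
print(J_test)   # approximately 0.693, i.e. -log(0.5)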
The one_vs_all function uses the following formula:
\begin{align*}& y \in \lbrace0, 1 … n\rbrace \newline& h_\theta^{(0)}(x) = P(y = 0 | x ; \theta) \newline& h_\theta^{(1)}(x) = P(y = 1 | x ; \theta) \newline& \cdots \newline& h_\theta^{(n)}(x) = P(y = n | x ; \theta) \newline& \mathrm{prediction} = \max_i( h_\theta ^{(i)}(x) )\newline\end{align*}
def one_vs_all(X_1, y, num_labels, lambda_):
    m, n = X_1.shape
    theta_initial = np.zeros(n)
    theta_all = np.empty([num_labels, n])
    options = {'maxiter': 50}
    for i in range(num_labels):
        # Run minimize to obtain the optimal theta for class i.
        # (y == i) gives the binary targets for that classifier and
        # jac=True tells minimize that cost_function also returns the gradient.
        res = optimize.minimize(cost_function,
                                theta_initial,
                                (X_1, (y == i), lambda_),
                                jac=True,
                                method='TNC',
                                options=options)
        print(f"For number: {i} - cost function: {res.fun}")
        theta_all[i] = res.x
    return(theta_all)
...
lambda_ = 0.1
num_labels = 10
theta_all = one_vs_all(X_1, y, num_labels, lambda_)
Input values:
Name | Type | Description |
---|---|---|
X_1 | numpy.ndarray | X variables with first column of ones, i.e. X_1 obtained from import_data function |
y | numpy.ndarray | y variables |
num_labels | integer | number of output classes, i.e. 0-9 digits gives 10 classes |
lambda_ | float | lambda value used for regularisation (if 0, no regularisation is applied) |
Return values are:
Name | Type | Description |
---|---|---|
theta_all | numpy.ndarray | theta values calculated by the optimize function, one row per class |
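Inside the one_vs_all loop, (y == i) converts the multiclass labels into the binary target vector seen by classifier i. A tiny illustration with made-up labels:

y_toy = np.array([0, 3, 1, 3, 2])
print(y_toy == 3)   # [False  True False  True False] - the targets for the class-3 classifier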
The predict_one_vs_all function uses the formula:
\begin{align*} h_\theta(x) = \frac{1}{1 + e^{-\theta^{\top} x}} \end{align*}
def predict_one_vs_all(theta, X_1):
    # Probability of each class for every example, then pick the most likely class
    p = sigmoid(X_1 @ np.transpose(theta))
    p = np.argmax(p, axis=1)
    return(p)
...
prediction = predict_one_vs_all(theta_all, X_1)
accuracy = np.mean(prediction == y) * 100
print(f"\nAccuracy of model: {accuracy} \n")
Input values are:
Name | Type | Description |
---|---|---|
theta | numpy.ndarray | theta values used to calculate prediction, i.e. theta_all from one_vs_all function |
X_1 | numpy.ndarray | set of x values with first column of ones used to form prediction |
Return value:
Name | Type | Description |
---|---|---|
p | numpy.ndarray | predicted values from given thetas and x variables |
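np.argmax(p, axis=1) picks, for each row of predicted probabilities, the column index (i.e. the class) with the highest value. A tiny illustration with made-up probabilities:

p_toy = np.array([[0.1, 0.7, 0.2],
                  [0.6, 0.3, 0.1]])
print(np.argmax(p_toy, axis=1))   # [1 0] - most probable class for each row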
The predict_nn function uses the formula:
\begin{align*}& \text{Input Layer } \newline& a^{(1)} = x \hspace{10pt} \text{; add } a_0^{(1)} \newline& \newline& \text{Hidden Layer } \newline& z^{(2)} = \theta^{(1)}a^{(1)} \newline& a^{(2)} = g(z^{(2)}) \hspace{10pt} \text{; add } a_0^{(2)} \newline& \newline& \text{Output Layer } \newline& z^{(3)} = \theta^{(2)}a^{(2)} \newline& a^{(3)} = g(z^{(3)}) = h_\theta(x) \end{align*}
def predict_nn(theta_1, theta_2, X_1):
    # Hidden layer activations, with the bias unit a_0 = 1 prepended
    a_2 = sigmoid(X_1 @ np.transpose(theta_1))
    a_2 = np.c_[np.ones(a_2.shape[0]), a_2]
    # Output layer activations; the prediction is the most probable class
    a_3 = sigmoid(a_2 @ np.transpose(theta_2))
    p = np.argmax(a_3, axis=1)
    return(p)
...
prediction = predict_nn(theta_1, theta_2, X_1)
accuracy = np.mean(prediction == y) * 100
print(f"\nAccuracy of model: {accuracy} \n")
Input values are:
Name | Type | Description |
---|---|---|
theta_1 | numpy.ndarray | theta values mapping the input layer to the hidden layer (layer 2) |
theta_2 | numpy.ndarray | theta values mapping the hidden layer to the output layer (layer 3) |
X_1 | numpy.ndarray | set of x values with first column of ones used to form prediction |
Return value:
Name | Type | Description |
---|---|---|
p | numpy.ndarray | predicted values from given thetas and x variables |
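The neural network weights theta_1 and theta_2 are not trained in this post; in the course they are provided as a pre-trained weights file. A minimal sketch of loading them, assuming a course-style file named ex3weights.mat with keys Theta1 and Theta2 (these names are assumptions, not taken from the complete script):

from scipy.io import loadmat

weights = loadmat('ex3weights.mat')   # assumed file name
theta_1 = weights['Theta1']           # assumed key; maps the input layer to the hidden layer
theta_2 = weights['Theta2']           # assumed key; maps the hidden layer to the output layer

prediction = predict_nn(theta_1, theta_2, X_1)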