Week 5 of Andrew Ng’s Machine Learning Course went deeper into neural networks, explaining how to apply forward and backward propagation to calculate the elements required to train one. The coding assignment used the same dataset as the previous week; however, this time both forward and backward propagation had to be implemented. The graded assignments in Python are outlined below. This repository was used as a guide for implementing the gradient checking and calculating the numerical gradient. The Git repository of the complete script is here.
import numpy as np
from scipy import optimize
from scipy.io import loadmat
The sigmoid gradient uses the following formula:
\begin{align*} & g'(z) = {d \over dz}g(z) = g(z)(1-g(z)) \newline& \text{where} \newline& \text{sigmoid}(z) = g(z) = {1 \over 1+ e^{-z}} \end{align*}
def sigmoid_gradient(z):
g = sigmoid(z) * (1 - sigmoid(z))
return(g)
...
g = sigmoid_gradient(z)
Input values are:
Name | Type | Description |
---|---|---|
z | numpy.ndarray | vector or matrix input to the sigmoid function |
Return value:
Name | Type | Description |
---|---|---|
g | numpy.ndarray | gradient of sigmoid function; has same shape as input |
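Note that sigmoid_gradient relies on a sigmoid helper defined earlier in the full script and not shown here; a minimal sketch of it, matching the formula above, would be:

def sigmoid(z):
    # element-wise logistic function g(z) = 1 / (1 + e^(-z))
    return 1 / (1 + np.exp(-z))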
Random initialisation of the weights uses the following formula:
\begin{align*} & \text{Initialize each }\Theta^{(l)}_{ij}\text{ to a random value in }[-\epsilon,\epsilon] \newline & W = \text{rand}(m,\ 1 + n) \times (2 \times \epsilon) - \epsilon \end{align*}
def rand_init_weights(L_in, L_out, epsilon_init=0.12):
    # uniformly sample each weight from [-epsilon_init, epsilon_init];
    # the extra column accounts for the bias unit
    W = np.random.rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init
    return(W)
...
theta_1_rand = rand_init_weights(input_layer_size, hidden_layer_size)
theta_2_rand = rand_init_weights(hidden_layer_size, num_labels)
Input values are:
Name | Type | Description |
---|---|---|
L_in | int | number of incoming connections |
L_out | int | number of outgoing connections |
epsilon_init | float | half-width of the interval [-epsilon, epsilon] from which each weight is drawn uniformly |
Return value:
Name | Type | Description |
---|---|---|
W | numpy.ndarray | array of randomised weights |
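As a quick sanity check (not part of the graded assignment), the returned array should have shape (L_out, 1 + L_in), with the extra column corresponding to the bias unit, and every weight should fall inside [-epsilon_init, epsilon_init]. For example, with illustrative layer sizes:

W = rand_init_weights(L_in=400, L_out=25)
assert W.shape == (25, 401)       # one extra column for the bias unit
assert np.all(np.abs(W) <= 0.12)  # all weights within the default [-0.12, 0.12]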
The cost function uses the following formulas:
Regularised cost function:
\begin{gather*} J(\Theta) = - \frac{1}{m} \sum_{i=1}^m \sum_{k=1}^K \left[y^{(i)}_k \log ((h_\Theta (x^{(i)}))_k) + (1 - y^{(i)}_k)\log (1 - (h_\Theta(x^{(i)}))_k)\right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} ( \Theta_{j,i}^{(l)})^2\end{gather*}
Regularised gradient:
\begin{align*} & {\partial \over \partial \Theta^{(l)}_{ij}} J(\Theta) = D^{(l)}_{ij} = {1 \over m}\Delta^{(l)}_{ij} \hspace{10pt} \text{for } j = 0 \newline& {\partial \over \partial \Theta^{(l)}_{ij}} J(\Theta) = D^{(l)}_{ij} = {1 \over m}\Delta^{(l)}_{ij} + {\lambda \over m}\Theta^{(l)}_{ij} \hspace{10pt} \text{for } j \geq 1 \end{align*}
def cost_function(nn_params, input_layer_size, hidden_layer_size, num_labels,
X, y, lambda_=0):
theta_1 = np.reshape(nn_params[:hidden_layer_size * (input_layer_size + 1)],
(hidden_layer_size, (input_layer_size + 1)))
theta_2 = np.reshape(nn_params[(hidden_layer_size * (input_layer_size + 1)):],
(num_labels, (hidden_layer_size + 1)))
m = y.shape[0]
# forward propagation
a_1 = np.c_[np.ones(X.shape[0]), X]
z_2 = a_1 @ np.transpose(theta_1)
a_2 = sigmoid(z_2)
a_2 = np.c_[np.ones(a_2.shape[0]), a_2]
z_3 = a_2 @ np.transpose(theta_2)
a_3 = sigmoid(z_3)
# convert y to dummies
y_v = np.zeros([m, num_labels])
y_v[np.arange(m), y] = 1
# cost function
error = ((-1 * y_v) * np.log(a_3)) - (1 - y_v) * np.log(1 - a_3)
regularise_cost = (lambda_/(2*m)) * (np.sum(np.sum(theta_1[:, 1:]**2))
+ np.sum(np.sum(theta_2[:, 1:]**2)))
J = 1/m * np.sum(np.sum(error)) + regularise_cost
# backward propagation and gradient regularisation
d_3 = a_3 - y_v
gz_2 = sigmoid_gradient(np.c_[np.ones(z_2.shape[0]), z_2])
d_2 = (d_3 @ theta_2) * gz_2
d_2 = d_2[:, 1:]
theta_1_grad = np.zeros(theta_1.shape)
theta_1_grad += (np.transpose(d_2) @ a_1)
nn_theta_1_grad = theta_1_grad/m + (lambda_/m) \
* np.column_stack((np.zeros(theta_1.shape[0]), theta_1[:, 1:]))
theta_2_grad = np.zeros(theta_2.shape)
theta_2_grad += (np.transpose(d_3) @ a_2)
nn_theta_2_grad = theta_2_grad/m + (lambda_/m) \
* np.column_stack((np.zeros(theta_2.shape[0]), theta_2[:, 1:]))
grad = np.concatenate([nn_theta_1_grad.ravel(), nn_theta_2_grad.ravel()])
return(J, grad)
...
J, grad = cost_function(nn_params, input_layer_size, hidden_layer_size,
num_labels, X, y, lambda_)
Input values:
Name | Type | Description |
---|---|---|
nn_params | numpy.ndarray | theta parameters for neural network; ‘unrolled’ into vector |
input_layer_size | int | number of features of input layer |
hidden_layer_size | int | number of hidden units in second layer |
num_labels | int | number of units in output layer |
X | numpy.ndarray | input features, one training example per row |
y | numpy.ndarray | class labels for each training example |
lambda_ | float | lambda value used for regularisation (if 0, no regularisation is applied) |
Return values are:
Name | Type | Description |
---|---|---|
J | numpy.float64 | value of the cost function |
grad | numpy.ndarray | gradient as an ‘unrolled’ vector |
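As mentioned in the introduction, gradient checking was used to verify the backpropagation implementation; that part of the script is not reproduced above. A minimal sketch, assuming a small randomly generated network and dataset so the loop stays cheap, compares the analytic gradient returned by cost_function against a central-difference numerical gradient:

def numerical_gradient(cost_func, nn_params, eps=1e-4):
    # central-difference approximation of the gradient at nn_params
    num_grad = np.zeros(nn_params.shape)
    perturb = np.zeros(nn_params.shape)
    for i in range(nn_params.size):
        perturb[i] = eps
        loss_plus, _ = cost_func(nn_params + perturb)
        loss_minus, _ = cost_func(nn_params - perturb)
        num_grad[i] = (loss_plus - loss_minus) / (2 * eps)
        perturb[i] = 0
    return num_grad

# small illustrative network and random data so the check runs quickly
input_layer_size, hidden_layer_size, num_labels, m = 3, 5, 3, 5
theta_1 = rand_init_weights(input_layer_size, hidden_layer_size)
theta_2 = rand_init_weights(hidden_layer_size, num_labels)
nn_params = np.concatenate([theta_1.ravel(), theta_2.ravel()])
X = np.random.rand(m, input_layer_size)
y = np.random.randint(0, num_labels, m)

cost_func = lambda p: cost_function(p, input_layer_size, hidden_layer_size,
                                    num_labels, X, y, lambda_=1)
_, grad = cost_func(nn_params)
num_grad = numerical_gradient(cost_func, nn_params)

# relative difference should be tiny (around 1e-9) if backpropagation is correct
print(np.linalg.norm(num_grad - grad) / np.linalg.norm(num_grad + grad))

With the gradient verified, the unrolled parameters and cost_function can be passed to scipy.optimize.minimize with jac=True (since the function returns both the cost and the gradient) to train the network, which is presumably why optimize is imported at the top of the script.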