Week 6 of Andrew Ng’s Machine Learning Course covered how to ‘debug’ a learning algorithm, including the concepts of bias and variance. The coding assignment revisited linear regression, this time with regularisation, and then used polynomial regression to demonstrate learning curves and validation curves. The graded assignments in Python are outlined below. The Git repository with the complete script is here.
import numpy as np
from matplotlib import pyplot as plt
from scipy import optimize
from scipy.io import loadmat
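The calls below use X_1, Xval_1 and theta, i.e. the training and validation data with a column of ones prepended for the intercept term. The loading step is not shown in this post; a minimal sketch, assuming the exercise's ex5data1.mat file with keys 'X', 'y', 'Xval' and 'yval' (the file and key names, and the test value of theta, are assumptions):

data = loadmat("ex5data1.mat")  # assumed file name from the exercise
X, y = data["X"], data["y"].ravel()
Xval, yval = data["Xval"], data["yval"].ravel()
# prepend a column of ones for the intercept term
X_1 = np.concatenate([np.ones((X.shape[0], 1)), X], axis=1)
Xval_1 = np.concatenate([np.ones((Xval.shape[0], 1)), Xval], axis=1)
theta = np.array([1, 1])  # example value for checking the cost function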
Uses the following formulas:
Regularised cost function: \begin{align*}& J(\theta) = {1 \over 2m} \sum_{i=1}^m \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2 \end{align*}
\begin{align*} h_\theta(x) &= \theta_0x_0 + \theta_1x_1 + \theta_2x_2 + \cdots + \theta_nx_n \text{, where }x_0 = 1 \newline &= \theta^T x \end{align*}
Regularised gradient:
\begin{align*} & \frac{\partial}{\partial \theta_j} J_{reg}(\theta) = \frac{1}{m} \sum\limits_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)} \hspace{25pt} \text{for }j = 0 \end{align*}
\begin{align*} \frac{\partial}{\partial \theta_j} J_{reg}(\theta) = \frac{1}{m} \sum\limits_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)} + \frac{\lambda}{m}\theta_j \hspace{25pt} \text{for }j \geq 1 \end{align*}
def cost_function(X, y, theta, lambda_=0):
    m = y.shape[0]
    h = X @ theta
    error = h - y
    # regularised cost (theta[0] is not regularised)
    regularise_cost = (lambda_ / (2 * m)) * np.sum(theta[1:] ** 2)
    J = 1 / (2 * m) * np.sum(error ** 2) + regularise_cost
    # gradient (no regularisation term for theta[0])
    grad = np.empty(theta.shape)
    grad[0] = 1 / m * (error @ X)[0]
    grad[1:] = 1 / m * (error @ X)[1:] + (lambda_ / m) * theta[1:]
    return J, grad
...
J, grad = cost_function(X_1, y, theta, lambda_=1)
Input values:
Name | Type | Description |
---|---|---|
theta | numpy.ndarray | theta values to compute cost function with |
X | numpy.ndarray | X variables with first column of ones |
y | numpy.ndarray | y variables |
lambda_ | float | lambda value used for regularisation (if 0, no regularisation is applied) |
Return values are:
Name | Type | Description |
---|---|---|
J | numpy.float64 | value of the regularised cost function |
grad | numpy.ndarray | gradient |
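The learning_curve and validation_curve functions below rely on a helper train_linear_reg that is not shown in this section. A minimal sketch, assuming it minimises the regularised cost with scipy.optimize.minimize (the all-zero starting point and the choice of optimiser are assumptions):

def train_linear_reg(cost_function, X, y, lambda_=0):
    # scipy's minimize expects theta as the first argument, so wrap cost_function
    def objective(theta):
        return cost_function(X, y, theta, lambda_)
    initial_theta = np.zeros(X.shape[1])  # all-zero start (assumption)
    # cost_function returns (J, grad); jac=True lets the optimiser reuse the gradient
    res = optimize.minimize(objective, initial_theta, jac=True, method="TNC")
    return res.x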
def learning_curve(X, y, Xval, yval, lambda_=0):
    print("\n- Learning curve -")
    m = y.shape[0]
    error_train = np.zeros(m)
    error_val = np.zeros(m)
    for i in range(1, m + 1):
        # train on the first i examples only
        theta = train_linear_reg(cost_function, X[:i], y[:i], lambda_)
        # errors are computed without regularisation
        error_train[i - 1], _ = cost_function(X[:i], y[:i], theta, lambda_=0)
        error_val[i - 1], _ = cost_function(Xval, yval, theta, lambda_=0)
    print("Training errors | Validation errors")
    print(np.stack([error_train, error_val], axis=1))
    print("\n")
    plt.plot(np.arange(1, m + 1), error_train)
    plt.plot(np.arange(1, m + 1), error_val)
    plt.xlabel("Number of training examples")
    plt.ylabel("Error")
    plt.legend(['Train', 'Cross Validation'])
    plt.title("Learning curve for linear regression")
    plt.show()
...
learning_curve(X_1, y, Xval_1, yval, lambda_=0)
Input values:
Name | Type | Description |
---|---|---|
X | numpy.ndarray | X variables from training dataset with first column of ones |
y | numpy.ndarray | y variables from training dataset |
Xval | numpy.ndarray | X variables from validation dataset with first column of ones |
yval | numpy.ndarray | y variables from validation dataset |
lambda_ | float | lambda value used for regularisation (if 0, no regularisation is applied) |
Returns:
A table of training/validation errors and a plot of the learning curve.
Uses the formula:
\begin{align*} & h_\theta(x) = \theta_0x_0 + \theta_1x_1 \newline \newline & \text{map to p-th power} \newline \newline & h_\theta(x) = \theta_0x_0 + \theta_1x_1 + \theta_2x_1^2 + \cdots + \theta_px_1^p \end{align*}
def poly_features(X, p):
    X_poly = np.zeros((X.shape[0], p))
    # column i-1 holds X raised to the i-th power
    for i in range(1, p + 1):
        X_poly[:, i - 1] = X.flatten() ** i
    return X_poly
...
X_poly = poly_features(X, p=8)
Input values:
Name | Type | Description |
---|---|---|
X | numpy.ndarray | X variables |
p | integer | polynomial power to map the features |
Return values are:
Name | Type | Description |
---|---|---|
X_poly | numpy.ndarray | array of X variables mapped to polynomial powers 1 to p |
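The validation_curve call below uses X_poly_n_1 and X_poly_val_1, i.e. the polynomial features after normalisation and with a column of ones prepended. That preparation step is not shown in this section; a minimal sketch, assuming mean/standard-deviation normalisation with the training set's statistics reused for the validation set (the helper name feature_normalize and the intermediate variable names are assumptions):

def feature_normalize(X, mu=None, sigma=None):
    # compute statistics from the training set and reuse them for other sets
    if mu is None:
        mu = np.mean(X, axis=0)
    if sigma is None:
        sigma = np.std(X, axis=0, ddof=1)
    return (X - mu) / sigma, mu, sigma

X_poly_n, mu, sigma = feature_normalize(X_poly)
X_poly_n_1 = np.concatenate([np.ones((X_poly_n.shape[0], 1)), X_poly_n], axis=1)
# the validation set is normalised with the training set's mu and sigma
X_poly_val = poly_features(Xval, p=8)
X_poly_val_n, _, _ = feature_normalize(X_poly_val, mu, sigma)
X_poly_val_1 = np.concatenate([np.ones((X_poly_val_n.shape[0], 1)), X_poly_val_n], axis=1)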
def validation_curve(X, y, Xval, yval):
    print("\n- Validation curve -")
    lambda_vec = [0, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3, 10]
    error_train = np.zeros(len(lambda_vec))
    error_val = np.zeros(len(lambda_vec))
    for i in range(len(lambda_vec)):
        lambda_ = lambda_vec[i]
        # train with the candidate lambda, but compute errors without regularisation
        theta = train_linear_reg(cost_function, X, y, lambda_)
        error_train[i], _ = cost_function(X, y, theta, lambda_=0)
        error_val[i], _ = cost_function(Xval, yval, theta, lambda_=0)
    print("Training errors | Validation errors")
    print(np.stack([error_train, error_val], axis=1))
    print("\n")
    plt.plot(lambda_vec, error_train)
    plt.plot(lambda_vec, error_val)
    plt.xlabel("Lambda")
    plt.ylabel("Error")
    plt.legend(['Train', 'Cross Validation'])
    plt.title("Validation curve for linear regression")
    plt.show()
...
validation_curve(X_poly_n_1, y, X_poly_val_1, yval)
Input values:
Name | Type | Description |
---|---|---|
X | numpy.ndarray | X variables from training dataset with first column of ones |
y | numpy.ndarray | y variables from training dataset |
Xval | numpy.ndarray | X variables from validation dataset with first column of ones |
yval | numpy.ndarray | y variables from validation dataset |
Returns:
A table of training/validation errors and a plot of the validation curve.
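The validation curve is used to pick the lambda with the lowest cross-validation error. If validation_curve is adapted to also return lambda_vec and error_val (an assumption; the version above only prints and plots them), the choice can be made programmatically:

lambda_vec, error_val = validation_curve(X_poly_n_1, y, X_poly_val_1, yval)
best_lambda = lambda_vec[np.argmin(error_val)]  # lambda with the lowest validation error
print("Best lambda:", best_lambda)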