In Week 2 of Andrew Ng's Machine Learning course the fundamentals of linear regression are taught, culminating in Octave coding exercises. As an extra challenge throughout the course I will be completing these exercises in both Octave and Python. The Python code snippets from the exercises are documented below. The git repository of the full code can be found here.
import numpy as np
Two example data sets are provided:
Filename | Description |
---|---|
ex1data1.txt | data set for linear regression with one variable |
ex1data2.txt | data set for linear regression with two variables |
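As a quick sanity check that the files load as expected, here is a small sketch (assuming both files sit in the working directory) that prints the shape of each data set:

```python
import numpy as np

# peek at each file: the last column is y, the remaining columns are the features
for filename in ["ex1data1.txt", "ex1data2.txt"]:
    data = np.loadtxt(filename, delimiter=",")
    print(filename, "-> examples:", data.shape[0], "features:", data.shape[1] - 1)
```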
def import_data(filename, normalise):
    # load the comma-separated file; the last column is y, the rest are features
    data = np.loadtxt(filename, delimiter=",")
    y = data[:, -1]
    X = data[:, :-1]
    mu = None
    sigma = None
    if normalise:
        X, mu, sigma = feature_normalisation(X)
    # prepend a column of ones for the intercept term (x_0 = 1)
    X_1 = np.c_[np.ones(X.shape[0]), X]
    features = X_1.shape[1]
    return X_1, y, mu, sigma, features
...
X_1, y, mu, sigma, features = import_data("ex1data1.txt", normalise=False)
Input values:
Name | Type | Description |
---|---|---|
filename | string | filename of dataset to be loaded |
normalise | boolean | apply feature normalisation to dataset if True |
Return values are:
Name | Type | Description |
---|---|---|
X_1 | numpy.ndarray | X variables with first column of ones |
y | numpy.ndarray | y variables |
mu | numpy.ndarray | mean of each feature of X returned from the feature_normalisation function (None if normalise is False) |
sigma | numpy.ndarray | standard deviation of each feature of X returned from the feature_normalisation function (None if normalise is False) |
features | int | number of columns of X_1, including the added column of ones |
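For the two-variable data set the same call is made with normalisation switched on; a quick sketch (assuming ex1data2.txt is present):

```python
# mu and sigma are only populated when normalise=True;
# they are kept so the same scaling can be applied to new x values later
X_1, y, mu, sigma, features = import_data("ex1data2.txt", normalise=True)
print(X_1.shape, features)  # features includes the added column of ones
```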
The feature_normalisation function uses the following equation:
\begin{align*} x_i := \frac{x_i - \mu_i}{\sigma_i} \quad \forall i \in \{1, 2, \cdots, n\} \end{align*}
def feature_normalisation(X):
    # column-wise mean and (sample) standard deviation
    mu = np.mean(X, axis=0)
    sigma = np.std(X, axis=0, ddof=1)
    # scale each feature to zero mean and unit standard deviation
    X_n = (X - mu) / sigma
    return X_n, mu, sigma
...
X_n, mu, sigma = feature_normalisation(X)
Input values:
Name | Type | Description |
---|---|---|
X | numpy.ndarray | X variables |
Return values are:
Name | Type | Description |
---|---|---|
X_n | numpy.ndarray | X variables normalised |
mu | numpy.ndarray | mean of each column (feature) of X |
sigma | numpy.ndarray | sample standard deviation of each column (feature) of X |
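To see the normalisation in action, here is a tiny made-up example (the numbers are chosen purely for illustration):

```python
X_demo = np.array([[1.0, 10.0],
                   [2.0, 20.0],
                   [3.0, 30.0]])
X_n, mu, sigma = feature_normalisation(X_demo)
print(mu)     # [ 2. 20.]
print(sigma)  # [ 1. 10.]  (sample standard deviation, ddof=1)
print(X_n)    # [[-1. -1.]
              #  [ 0.  0.]
              #  [ 1.  1.]]
```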
The cost_function function uses the following equations:
\begin{align*}& J(\theta) = {1 \over 2m} \sum_{i=1}^m \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 \end{align*}
\begin{align*} h_\theta(x) &= \theta_0x_0 + \theta_1x_1 + \theta_2x_2 + \cdots + \theta_nx_n \text{, where }x_0 = 1 \newline &= \theta^T x \end{align*}
def cost_function(X, y, theta):
    m = y.shape[0]
    # vectorised hypothesis: h_theta(x) = X @ theta
    error = (X @ theta) - y
    # squared-error cost averaged over the m training examples
    J = 1 / (2 * m) * np.sum(error**2)
    return J
...
J = cost_function(X_1, y, theta)
Input values:
Name | Type | Description |
---|---|---|
X | numpy.ndarray | X variables with first column of ones, i.e. X_1 obtained from import_data function |
y | numpy.ndarray | y variables |
theta | numpy.ndarray | theta values to compute cost function with |
Return values are:
Name | Type | Description |
---|---|---|
J | numpy.float64 | value of the cost function for the given theta |
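As a quick sanity check on ex1data1.txt, evaluating the cost with theta initialised to zeros should, if I remember the exercise's expected output correctly, give a value of roughly 32.07:

```python
X_1, y, mu, sigma, features = import_data("ex1data1.txt", normalise=False)
theta = np.zeros(features)
print(cost_function(X_1, y, theta))  # expected to be approximately 32.07
```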
The gradient_descent function uses the following equations:
\begin{align*}& \text{repeat until convergence:} \lbrace \newline & \theta_j := \theta_j - \alpha \frac{1}{m} \sum\limits_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)} & \text{for j := 0…n}\newline \rbrace \end{align*}
\begin{align*} h_\theta(x) &= \theta_0x_0 + \theta_1x_1 + \theta_2x_2 + \cdots + \theta_nx_n \text{, where }x_0 = 1 \newline &= \theta^T x \end{align*}
def gradient_descent(X, y, theta, alpha, iterations):
    m = y.shape[0]
    J_history = []
    for i in range(iterations):
        # record the cost before each update to monitor convergence
        J_history.append(cost_function(X, y, theta))
        # vectorised gradient: X^T (X theta - y)
        error = np.transpose(X) @ ((X @ theta) - y)
        # simultaneous update of all theta values
        theta = theta - (alpha / m) * error
    return theta, J_history
...
theta, J_history = gradient_descent(X_1, y, theta, alpha, iterations)
Input values:
Name | Type | Description |
---|---|---|
X | numpy.ndarray | X variables with a first column of ones, with or without feature normalisation |
y | numpy.ndarray | y variables |
theta | numpy.ndarray | initial theta values, typically an array of zeros: `np.zeros(features)` |
alpha | float | learning rate |
iterations | integer | number of iterations of gradient descent to perform |
Return values:
Name | Type | Description |
---|---|---|
theta | numpy.ndarray | theta values computed from gradient descent |
J_history | list | cost computed at the start of each iteration (useful for checking that the learning rate produces a decreasing cost) |
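Putting it together on ex1data1.txt with the learning rate and iteration count used in the exercise (alpha = 0.01 and 1500 iterations, as far as I recall), and plotting J_history with matplotlib to confirm the cost keeps decreasing:

```python
import matplotlib.pyplot as plt

X_1, y, mu, sigma, features = import_data("ex1data1.txt", normalise=False)
theta, J_history = gradient_descent(X_1, y, np.zeros(features), alpha=0.01, iterations=1500)
print(theta)  # should land near [-3.63, 1.17] if this matches the course exercise

# J_history should decrease monotonically for a well-chosen learning rate
plt.plot(J_history)
plt.xlabel("iteration")
plt.ylabel("cost J")
plt.show()
```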
The normal_equation function uses the following equation:
\begin{align*} &\theta = \left( X^TX \right)^{-1}X^Ty \end{align*}
def normal_equation(X, y):
    # closed-form solution: theta = (X^T X)^{-1} X^T y
    theta = np.linalg.inv(np.transpose(X) @ X) @ np.transpose(X) @ y
    return theta
...
theta = normal_equation(X_1, y)
Input values:
Name | Type | Description |
---|---|---|
X | numpy.ndarray | X variables with a first column of ones, with or without feature normalisation |
y | numpy.ndarray | y variables |
Return value:
Name | Type | Description |
---|---|---|
theta | numpy.ndarray | theta values computed from normal equation |
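Unlike gradient descent, the normal equation needs no feature scaling, learning rate or iteration count, so it can be run directly on the raw multivariate data; a sketch, assuming ex1data2.txt is available:

```python
X_1, y, mu, sigma, features = import_data("ex1data2.txt", normalise=False)
theta = normal_equation(X_1, y)
print(theta)  # intercept followed by one coefficient per feature
```

The resulting theta values will differ numerically from those produced by gradient descent on normalised features, but both should give the same predictions.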
def predict_y(theta, x, mu, sigma):
    if any(v is None for v in [mu, sigma]):
        # no normalisation was applied to the training data
        x_1 = np.insert(x, 0, 1)
        y = x_1 @ theta
    else:
        # apply the training set's mu and sigma to the new x before predicting
        x_n = (x - mu) / sigma
        x_n1 = np.insert(x_n, 0, 1)
        y = x_n1 @ theta
    return y
...
y = predict_y(theta, x, mu, sigma)
Input values are:
Name | Type | Description |
---|---|---|
theta | numpy.ndarray | theta values used to calculate predicted y |
x | numpy.ndarray | set of x values used to predict y |
mu | numpy.ndarray | mean of X (returned from the import_data function); if None, no feature normalisation is applied when predicting y |
sigma | numpy.ndarray | standard deviation of X (returned from the import_data function); if None, no feature normalisation is applied when predicting y |
Return value:
Name | Type | Description |
---|---|---|
y | numpy.float64 | predicted y value from the given theta and x values |
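Finally, a usage sketch tying everything together for both data sets; the query values and learning-rate settings below are illustrative choices rather than the exercise's exact numbers:

```python
# one variable: no normalisation, so mu and sigma are None
X_1, y, mu, sigma, features = import_data("ex1data1.txt", normalise=False)
theta, J_history = gradient_descent(X_1, y, np.zeros(features), alpha=0.01, iterations=1500)
print(predict_y(theta, np.array([3.5]), mu, sigma))  # predicted y for a single x value

# two variables: mu and sigma from the normalised training data are re-applied to the new x
X_1, y, mu, sigma, features = import_data("ex1data2.txt", normalise=True)
theta, J_history = gradient_descent(X_1, y, np.zeros(features), alpha=0.1, iterations=400)
print(predict_y(theta, np.array([1650.0, 3.0]), mu, sigma))  # predicted y for x = (1650, 3)
```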