Week 9 of Andrew Ng’s Machine Learning Course covered anomaly detection and recommender systems, and it contains the final programming assignment of the course. The exercises were to build an anomaly detection algorithm and a movie recommendation system. The Python implementations of the programming assignments are below. This repository was used as a guide for the graphical plots and for the anomaly detection / recommender system implementations. The Git repository with the complete script is here.
import numpy as np
from scipy.io import loadmat
from scipy import optimize
import matplotlib.pyplot as plt
Uses the following formulas:
Mean
\begin{align*} \mu_i = \frac{1}{m} \sum_{j=1}^m x_i^{(j)} \end{align*}
Variance
\begin{align*} \sigma_i^2 = \frac{1}{m} \sum_{j=1}^m \left( x_i^{(j)} - \mu_i \right)^2 \end{align*}
def estimate_gaussian(X):
    # Per-feature mean and variance of the dataset
    m, n = X.shape
    mu = (1 / m) * np.sum(X, axis=0)
    sigma_2 = (1 / m) * np.sum((X - mu) ** 2, axis=0)
    return mu, sigma_2
...
mu, sigma_2 = estimate_gaussian(X)
Input values:
Name | Type | Description |
---|---|---|
X | numpy.ndarray | Dataset |
Return values are:
Name | Type | Description |
---|---|---|
mu | numpy.ndarray | Vector of the means of each dimension |
sigma_2 | numpy.ndarray | Vector of the variances of each dimension |
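The course pairs estimate_gaussian with a density function that turns mu and sigma_2 into the probabilities p_val used by select_threshold below; that helper is not reproduced in this post, so the following is a minimal sketch assuming independent features (i.e. a diagonal covariance), with gaussian_density and X_val as hypothetical names:

def gaussian_density(X, mu, sigma_2):
    # Product of per-feature univariate Gaussian densities, which is
    # equivalent to a multivariate Gaussian with diagonal covariance
    coefficient = 1 / np.sqrt(2 * np.pi * sigma_2)
    exponent = np.exp(-((X - mu) ** 2) / (2 * sigma_2))
    return np.prod(coefficient * exponent, axis=1)
...
p_val = gaussian_density(X_val, mu, sigma_2)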
Uses the following formulas:
F1 score
\begin{align*} F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \end{align*}
Precision
\begin{align*} \text{precision} = \frac{\text{true positives}}{\text{true positives} + \text{false positives}} \end{align*}
Recall
\begin{align*} \text{recall} = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}} \end{align*}
def select_threshold(y_val, p_val):
    best_epsilon = 0
    best_F1 = 0
    # Sweep 1000 candidate thresholds between the smallest and largest density
    for epsilon in np.linspace(1.01 * p_val.min(), p_val.max(), 1000):
        # Examples with density below epsilon are flagged as anomalies
        predictions = p_val < epsilon
        true_positives = np.sum((predictions == 1) & (y_val == 1))
        false_positives = np.sum((predictions == 1) & (y_val == 0))
        false_negatives = np.sum((predictions == 0) & (y_val == 1))
        # Skip thresholds that catch no true anomalies to avoid division by zero
        if true_positives == 0:
            continue
        precision = true_positives / (true_positives + false_positives)
        recall = true_positives / (true_positives + false_negatives)
        F1 = (2 * precision * recall) / (precision + recall)
        if F1 > best_F1:
            best_F1 = F1
            best_epsilon = epsilon
    return best_epsilon, best_F1
...
epsilon, F1 = select_threshold(y_val, p_val)
Input values:
Name | Type | Description |
---|---|---|
y_val | numpy.ndarray | Ground truth labels |
p_val | numpy.ndarray | Vector of probability densities computed from the mu and sigma_2 parameters |
Return values are:
Name | Type | Description |
---|---|---|
best_epsilon | float | Optimal threshold value |
best_F1 | float | Best F1 score |
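With the threshold chosen on the validation set, anomalies in the dataset itself are simply the points whose density falls below epsilon. A minimal usage sketch, reusing the hypothetical gaussian_density helper from above:

# Densities of the training examples under the fitted Gaussian
p = gaussian_density(X, mu, sigma_2)
# Points with density below the selected threshold are flagged as anomalies
outliers = X[p < epsilon]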
Uses the following formulas:
Cost function with regularisation:
\begin{align*} J(x^{(1)}, \dots, x^{(n_m)}, \theta^{(1)}, \dots, \theta^{(n_u)}) = \frac{1}{2} \sum_{(i,j):r(i,j)=1} \left( \left( \theta^{(j)} \right)^T x^{(i)} - y^{(i,j)} \right)^2 + \left( \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( \theta_k^{(j)} \right)^2 \right) + \left( \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^n \left(x_k^{(i)} \right)^2 \right) \end{align*}
Gradient with regularisation:
\begin{align*} \frac{\partial J}{\partial x_k^{(i)}} = \sum_{j:r(i,j)=1} \left( \left(\theta^{(j)}\right)^T x^{(i)} - y^{(i,j)} \right) \theta_k^{(j)} + \lambda x_k^{(i)} \end{align*}
\begin{align*} \frac{\partial J}{\partial \theta_k^{(j)}} = \sum_{i:r(i,j)=1} \left( \left(\theta^{(j)}\right)^T x^{(i)}- y^{(i,j)} \right) x_k^{(i)} + \lambda \theta_k^{(j)} \end{align*}
def cost_function(params, Y, R, num_users, num_movies, num_features,
                  lambda_=0.0):
    # Unroll the flattened parameter vector into X and theta
    X = params[:num_movies * num_features].reshape(num_movies, num_features)
    theta = params[num_movies * num_features:].reshape(num_users, num_features)
    # Cost function: squared error over rated entries plus regularisation
    error = (X @ theta.T) - Y
    cost_regularisation = (lambda_ / 2) * np.sum(theta ** 2) \
        + (lambda_ / 2) * np.sum(X ** 2)
    J = 1/2 * np.sum((error ** 2) * R) + cost_regularisation
    # Gradient: multiplying by R masks out movies a user has not rated
    X_grad = (error * R) @ theta + lambda_ * X
    theta_grad = (error * R).T @ X + lambda_ * theta
    grad = np.concatenate([X_grad.ravel(), theta_grad.ravel()])
    return J, grad
...
J, grad = cost_function(np.concatenate([X.ravel(), theta.ravel()]),
                        Y, R, num_users, num_movies, num_features,
                        lambda_=1.5)
Input values:
Name | Type | Description |
---|---|---|
params | numpy.ndarray | Concatenation of feature vector X and parameters theta |
Y | numpy.ndarray | Dataset of user ratings of movies |
R | numpy.ndarray | Binary array where 1 indicates that a movie was rated by a user |
num_users | int | Total number of users |
num_movies | int | Total number of movies |
num_features | int | Total number of features to learn |
lambda_ | float | Regularisation coefficient |
Return values are:
Name | Type | Description |
---|---|---|
J | float | Cost function for given parameters |
grad | numpy.ndarray | Vector of gradient for given parameters |
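The optimize import at the top is there for the fitting step: since cost_function returns both J and grad, it plugs directly into scipy.optimize.minimize with jac=True. A minimal sketch, assuming Y, R and the dimension variables are already loaded and lambda_ = 1.5 as in the example above:

# Random initialisation of the movie features and user parameters
initial_X = np.random.randn(num_movies, num_features)
initial_theta = np.random.randn(num_users, num_features)
initial_params = np.concatenate([initial_X.ravel(), initial_theta.ravel()])
# cost_function returns (J, grad), so jac=True lets scipy reuse the gradient
res = optimize.minimize(cost_function, initial_params,
                        args=(Y, R, num_users, num_movies, num_features, 1.5),
                        method='TNC', jac=True)
# Unroll the optimised parameters back into X and theta
X = res.x[:num_movies * num_features].reshape(num_movies, num_features)
theta = res.x[num_movies * num_features:].reshape(num_users, num_features)
# Predicted ratings for every (movie, user) pair
predictions = X @ theta.T

The choice of method='TNC' is an assumption; any gradient-based scipy method (e.g. 'CG' or 'L-BFGS-B') works here since the analytic gradient is supplied.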