Programming Exercise 8



Week 9 of Andrew Ng’s Machine Learning course covered anomaly detection and recommender systems, and included the last programming assignment of the course. The exercises were to build an anomaly detection algorithm and a movie recommendation system. The programming assignments implemented in Python are below. This repository was used as a guide for the plots and for the anomaly detection / recommender system implementations. The Git repository with the complete script is here.

Required modules

import numpy as np
from scipy.io import loadmat
from scipy import optimize
import matplotlib.pyplot as plt

Estimating Gaussian parameters

Uses the following formulas:

Mean

\begin{align*} \mu_i = \frac{1}{m} \sum_{j=1}^m x_i^{(j)} \end{align*}

Variance

\begin{align*} \sigma_i^2 = \frac{1}{m} \sum_{j=1}^m \left( x_i^{(j)} - \mu_i \right)^2 \end{align*}

def estimate_gaussian(X):
    # Per-feature mean and variance across the m training examples
    m, n = X.shape
    mu = (1 / m) * np.sum(X, axis=0)
    sigma_2 = (1 / m) * np.sum((X - mu) ** 2, axis=0)
    return mu, sigma_2
...
mu, sigma_2 = estimate_gaussian(X)

Input values:

Name Type Description
X numpy.ndarray Dataset

Return values are:

Name Type Description
mu numpy.ndarray Vector of the means of each dimension
sigma_2 numpy.ndarray Vector of the variances of each dimension
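
The probabilities p and p_val used in the next step are not returned by estimate_gaussian; they come from evaluating the Gaussian density with the estimated parameters. A minimal sketch of that intermediate step, assuming independent features so the density factorises into a product of univariate Gaussians (the helper name gaussian_density and the validation set X_val are illustrative placeholders, not names from the assignment code):

def gaussian_density(X, mu, sigma_2):
    # Product over features of the univariate Gaussian densities
    coefficient = 1 / np.sqrt(2 * np.pi * sigma_2)
    exponent = np.exp(-((X - mu) ** 2) / (2 * sigma_2))
    return np.prod(coefficient * exponent, axis=1)
...
p = gaussian_density(X, mu, sigma_2)
p_val = gaussian_density(X_val, mu, sigma_2)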

Select threshold

Uses the following formulas:

F1 score

\begin{align*} F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \end{align*}

Precision

\begin{align*} \text{precision} = \frac{\text{true positives}}{\text{true positives} + \text{false positives}} \end{align*}

Recall

\begin{align*} \text{recall} = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}} \end{align*}

def select_threshold(y_val, p_val):
    best_epsilon = 0
    best_F1 = 0

    # Try 1000 candidate thresholds between the smallest and largest probability
    for epsilon in np.linspace(1.01 * min(p_val), max(p_val), 1000):
        predictions = p_val < epsilon

        true_positives = np.sum((predictions == 1) & (y_val == 1))
        false_positives = np.sum((predictions == 1) & (y_val == 0))
        false_negatives = np.sum((predictions == 0) & (y_val == 1))

        # Skip degenerate thresholds with no true positives, where F1 is undefined
        if true_positives == 0:
            continue

        precision = true_positives / (true_positives + false_positives)
        recall = true_positives / (true_positives + false_negatives)

        F1 = (2 * precision * recall) / (precision + recall)

        if F1 > best_F1:
            best_F1 = F1
            best_epsilon = epsilon

    return best_epsilon, best_F1
...
epsilon, F1 = select_threshold(y_val, p_val)

Input values:

Name Type Description
y_val numpy.ndarray Ground truth anomaly labels of the cross-validation set
p_val numpy.ndarray Vector of probabilities of the cross-validation examples based on the mu and sigma_2 parameters

Return values are:

Name Type Description
best_epsilon float Optimal threshold value
best_F1 float Best F1 score
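
Once an epsilon has been selected, anomalies are simply the examples whose density falls below it. A short usage sketch, assuming p holds the training-set densities computed as above and that the first two features are plotted; this plotting snippet is illustrative rather than the assignment's own plotting code:

# Indices of training examples flagged as anomalies
outliers = np.nonzero(p < epsilon)[0]

plt.scatter(X[:, 0], X[:, 1], marker='x')
plt.scatter(X[outliers, 0], X[outliers, 1], facecolors='none', edgecolors='r')
plt.show()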

Collaborative filtering regularised cost function and gradient

Uses the following formulas:

Cost function with regularisation:

\begin{align*} J(x^{(1)}, \dots, x^{(n_m)}, \theta^{(1)}, \dots, \theta^{(n_u)}) = \frac{1}{2} \sum_{(i,j):r(i,j)=1} \left( \left( \theta^{(j)} \right)^T x^{(i)} - y^{(i,j)} \right)^2 + \left( \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( \theta_k^{(j)} \right)^2 \right) + \left( \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^n \left(x_k^{(i)} \right)^2 \right) \end{align*}

Gradient with regularisation:

\begin{align*} \frac{\partial J}{\partial x_k^{(i)}} = \sum_{j:r(i,j)=1} \left( \left(\theta^{(j)}\right)^T x^{(i)} - y^{(i,j)} \right) \theta_k^{(j)} + \lambda x_k^{(i)} \end{align*}

\begin{align*} \frac{\partial J}{\partial \theta_k^{(j)}} = \sum_{i:r(i,j)=1} \left( \left(\theta^{(j)}\right)^T x^{(i)}- y^{(i,j)} \right) x_k^{(i)} + \lambda \theta_k^{(j)} \end{align*}

def cost_function(params, Y, R, num_users, num_movies, num_features,
                  lambda_=0.0):

    # Unroll the parameter vector into the movie features X and user parameters theta
    X = params[:num_movies*num_features].reshape(num_movies, num_features)
    theta = params[num_movies*num_features:].reshape(num_users, num_features)

    # Cost function (errors are masked by R so only rated movies contribute)
    error = (X @ np.transpose(theta)) - Y
    cost_regularisation = (lambda_ / 2) * np.sum(theta ** 2)\
        + (lambda_ / 2) * np.sum(X ** 2)
    J = 1/2 * np.sum((error ** 2) * R) + cost_regularisation

    # Gradient with regularisation
    X_grad = (error * R) @ theta + lambda_ * X
    theta_grad = np.transpose(error * R) @ X + lambda_ * theta

    grad = np.concatenate([X_grad.ravel(), theta_grad.ravel()])

    return J, grad
...
J, grad = cost_function(np.concatenate([X.ravel(), theta.ravel()]),
                        Y, R, num_users, num_movies, num_features, lambda_=1.5)

Input values:

Name Type Description
params numpy.ndarray Concatenation of the unrolled feature matrix X and parameter matrix theta
Y numpy.ndarray Dataset of user ratings of movies
R numpy.ndarray Binary array where R[i, j] = 1 if movie i was rated by user j
num_users int Total number of users
num_movies int Total number of movies
num_features int Total number of features to learn
lambda_ float Regularisation coefficient

Return values are:

Name Type Description
J float Value of the cost function for the given parameters
grad numpy.ndarray Gradient vector for the given parameters
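
With the cost and gradient in place, the movie features X and user parameters theta can be learnt with the scipy optimiser imported at the top. A minimal training sketch, assuming random initialisation and lambda_ = 10; in the assignment the ratings matrix Y is mean-normalised before this step, which is omitted here:

initial_X = np.random.randn(num_movies, num_features)
initial_theta = np.random.randn(num_users, num_features)
initial_params = np.concatenate([initial_X.ravel(), initial_theta.ravel()])

# jac=True tells the optimiser that cost_function returns (J, grad)
result = optimize.minimize(cost_function, initial_params,
                           args=(Y, R, num_users, num_movies, num_features, 10),
                           method='TNC', jac=True)

X = result.x[:num_movies*num_features].reshape(num_movies, num_features)
theta = result.x[num_movies*num_features:].reshape(num_users, num_features)
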
August 2020