Programming Exercise 8



Week 9 of Andrew Ng’s Machine Learning course covered anomaly detection and recommender systems, and included the last programming assignment of the course. The exercises were to build an anomaly detection algorithm and a movie recommendation system. The programming assignments implemented in Python are below. This repository was used as a guide for the plots and for the anomaly detection / recommender system implementations. The Git repository with the complete script is here.

Required modules

import numpy as np
from scipy.io import loadmat
from scipy import optimize
import matplotlib.pyplot as plt

Estimating Gaussian parameters

Uses the following formulas:

Mean

\begin{align*} \mu_i = \frac{1}{m} \sum_{j=1}^m x_i^{(j)} \end{align*}

Variance

\begin{align*} \sigma_i^2 = \frac{1}{m} \sum_{j=1}^m \left( x_i^{(j)} - \mu_i \right)^2 \end{align*}

def estimate_gaussian(X):
    # Per-feature mean and variance across the m training examples
    m, n = X.shape
    mu = (1 / m) * np.sum(X, axis=0)
    sigma_2 = (1 / m) * np.sum((X - mu) ** 2, axis=0)
    return mu, sigma_2
...
mu, sigma_2 = estimate_gaussian(X)

Input values:

Name Type Description
X numpy.ndarray Dataset

Return values are:

Name Type Description
mu numpy.ndarray Vector of the means of each dimension
sigma_2 numpy.ndarray Vector of the variances of each dimension
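
The probabilities p and p_val used in the next step are not returned by estimate_gaussian; they come from evaluating the Gaussian density with the estimated parameters. A minimal sketch of that intermediate step, assuming independent features so the density factorises into a product of univariate Gaussians (the helper name gaussian_density and the validation set X_val are illustrative placeholders, not names from the assignment code):

def gaussian_density(X, mu, sigma_2):
    # Product over features of the univariate Gaussian densities
    coefficient = 1 / np.sqrt(2 * np.pi * sigma_2)
    exponent = np.exp(-((X - mu) ** 2) / (2 * sigma_2))
    return np.prod(coefficient * exponent, axis=1)
...
p = gaussian_density(X, mu, sigma_2)
p_val = gaussian_density(X_val, mu, sigma_2)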

Select threshold

Uses the following formulas:

F1 score

\begin{align*} F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \end{align*}

Precision

\begin{align*} \text{precision} = \frac{\text{true positives}}{\text{true positives} + \text{false positives}} \end{align*}

Recall

\begin{align*} \text{recall} = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}} \end{align*}

def select_threshold(y_val, p_val):
    best_epsilon = 0
    best_F1 = 0

    # Try 1000 candidate thresholds between the smallest and largest probability
    for epsilon in np.linspace(1.01 * min(p_val), max(p_val), 1000):
        predictions = p_val < epsilon

        true_positives = np.sum((predictions == 1) & (y_val == 1))
        false_positives = np.sum((predictions == 1) & (y_val == 0))
        false_negatives = np.sum((predictions == 0) & (y_val == 1))

        # Skip degenerate thresholds with no true positives, where F1 is undefined
        if true_positives == 0:
            continue

        precision = true_positives / (true_positives + false_positives)
        recall = true_positives / (true_positives + false_negatives)

        F1 = (2 * precision * recall) / (precision + recall)

        if F1 > best_F1:
            best_F1 = F1
            best_epsilon = epsilon

    return best_epsilon, best_F1
...
epsilon, F1 = select_threshold(y_val, p_val)

Input values:

Name Type Description
y_val numpy.ndarray Ground truth anomaly labels of the cross-validation set
p_val numpy.ndarray Vector of probabilities of the cross-validation examples based on the mu and sigma_2 parameters

Return values are:

Name Type Description
best_epsilon float Optimal threshold value
best_F1 float Best F1 score
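
Once an epsilon has been selected, anomalies are simply the examples whose density falls below it. A short usage sketch, assuming p holds the training-set densities computed as above and that the first two features are plotted; this plotting snippet is illustrative rather than the assignment's own plotting code:

# Indices of training examples flagged as anomalies
outliers = np.nonzero(p < epsilon)[0]

plt.scatter(X[:, 0], X[:, 1], marker='x')
plt.scatter(X[outliers, 0], X[outliers, 1], facecolors='none', edgecolors='r')
plt.show()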

Collaborative filtering regularised cost function and gradient

Uses the following formulas:

Cost function with regularisation:

\begin{align*} J(x^{(1)}, \dots, x^{(n_m)}, \theta^{(1)}, \dots, \theta^{(n_u)}) = \frac{1}{2} \sum_{(i,j):r(i,j)=1} \left( \left( \theta^{(j)} \right)^T x^{(i)} - y^{(i,j)} \right)^2 + \left( \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( \theta_k^{(j)} \right)^2 \right) + \left( \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^n \left(x_k^{(i)} \right)^2 \right) \end{align*}

Gradient with regularisation:

\begin{align*} \frac{\partial J}{\partial x_k^{(i)}} = \sum_{j:r(i,j)=1} \left( \left(\theta^{(j)}\right)^T x^{(i)} - y^{(i,j)} \right) \theta_k^{(j)} + \lambda x_k^{(i)} \end{align*}

\begin{align*} \frac{\partial J}{\partial \theta_k^{(j)}} = \sum_{i:r(i,j)=1} \left( \left(\theta^{(j)}\right)^T x^{(i)}- y^{(i,j)} \right) x_k^{(i)} + \lambda \theta_k^{(j)} \end{align*}

def cost_function(params, Y, R, num_users, num_movies, num_features,
                  lambda_=0.0):

    # Unroll the parameter vector into the movie features X and user parameters theta
    X = params[:num_movies*num_features].reshape(num_movies, num_features)
    theta = params[num_movies*num_features:].reshape(num_users, num_features)

    # Cost function (errors are masked by R so only rated movies contribute)
    error = (X @ np.transpose(theta)) - Y
    cost_regularisation = (lambda_ / 2) * np.sum(theta ** 2)\
        + (lambda_ / 2) * np.sum(X ** 2)
    J = 1/2 * np.sum((error ** 2) * R) + cost_regularisation

    # Gradient with regularisation
    X_grad = (error * R) @ theta + lambda_ * X
    theta_grad = np.transpose(error * R) @ X + lambda_ * theta

    grad = np.concatenate([X_grad.ravel(), theta_grad.ravel()])

    return J, grad
...
J, grad = cost_function(np.concatenate([X.ravel(), theta.ravel()]),
                        Y, R, num_users, num_movies, num_features, lambda_=1.5)

Input values:

Name Type Description
params numpy.ndarray Concatenation of the unrolled feature matrix X and parameter matrix theta
Y numpy.ndarray Dataset of user ratings of movies
R numpy.ndarray Binary array where R[i, j] = 1 if movie i was rated by user j
num_users int Total number of users
num_movies int Total number of movies
num_features int Total number of features to learn
lambda_ float Regularisation coefficient

Return values are:

Name Type Description
J float Value of the cost function for the given parameters
grad numpy.ndarray Gradient vector for the given parameters
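
With the cost and gradient in place, the movie features X and user parameters theta can be learnt with the scipy optimiser imported at the top. A minimal training sketch, assuming random initialisation and lambda_ = 10; in the assignment the ratings matrix Y is mean-normalised before this step, which is omitted here:

initial_X = np.random.randn(num_movies, num_features)
initial_theta = np.random.randn(num_users, num_features)
initial_params = np.concatenate([initial_X.ravel(), initial_theta.ravel()])

# jac=True tells the optimiser that cost_function returns (J, grad)
result = optimize.minimize(cost_function, initial_params,
                           args=(Y, R, num_users, num_movies, num_features, 10),
                           method='TNC', jac=True)

X = result.x[:num_movies*num_features].reshape(num_movies, num_features)
theta = result.x[num_movies*num_features:].reshape(num_users, num_features)
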
August 2020