Linear classifiers perform classification based on a linear combination of the component features. Some examples of linear classifiers include: the Naive Bayes classifier, Linear Discriminant Analysis, Logistic Regression, and Perceptrons.
ArrayFire's easy-to-use API enables users to write such classifiers from scratch fairly easily. In this post, we show how you can map the mathematical equations to ArrayFire code and implement them from scratch.
Naive Bayes Classifier
The Naive Bayes classifier is a probabilistic classifier that assumes all the features in a feature vector are independent of each other. This assumption simplifies Bayes' rule to a simple multiplication of probabilities, as shown below.
First, we start with the simple Bayes' rule:
$$ p(C_k | x) = \frac{p(C_k) \space p(x | C_k)}{p(x)} $$
Since the features are assumed to be independent of each other, the likelihood factorizes into a product over the individual features:
$$ p(C_k | x) = \frac{p(C_k)}{p(x)} \prod\limits_{i=1}^n p(x_i | C_k) $$
Where:
- $$ C_k \space \text{is class k} $$
- $$ x \space \text{is a feature vector containing the features} \space \{ x_1, x_2, \dots, x_n \} $$
- $$ p(C_k | x) \space \text{is the posterior probability of class} \space C_k \space \text{given feature vector} \space x $$
- $$ p(C_k) \space \text{is the prior probability of class} \space C_k $$
- $$ p(x | C_k) \space \text{is the likelihood of feature vector} \space x \space \text{given class} \space C_k $$
Since p(x) is the same for all classes, it can be safely ignored. The classification is then done using the following formula:
$$ \newcommand{\argmax}{\operatornamewithlimits{argmax}} y = \argmax_k \left( p(C_k) \prod\limits_{i=1}^n p(x_i | C_k) \right) $$
Classification based on the Naive Bayes method uses the following phases:
- Training phase: the probability distribution of each feature variable is modeled for each class
- Prediction phase: the probability of a sample belonging to each class is calculated based on the distributions from the training phase
In our example, we assume all the input features fit a normal distribution. This requires us to calculate the mean mu and the variance sig2 in the training phase.
The necessary ArrayFire code for the training phase can be seen below:
// Iterate through each class
for (int ii = 0; ii < num_classes; ii++) {

    // train_feats is of size [num_feats, num_train]
    // train_classes is of size [1, num_train]

    // Get indices where train_classes matches the current class
    array idx = where(train_classes == ii);

    // Get train_feats from the current class only
    array train_feats_ii = train_feats(span, idx);

    // Get the distribution parameters
    mu(span, ii)   = mean(train_feats_ii, 1);
    sig2(span, ii) = var(train_feats_ii, 0, 1);

    // Calculate the prior as the fraction of training samples in this class
    priors[ii] = (float)idx.elements() / (float)num_train;
}
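The loop above assumes that mu, sig2, and priors have already been allocated outside the loop. A minimal sketch of that setup is shown here; the shapes mirror the comments in the snippet, but the exact declarations are assumptions rather than part of the original example:

// Per-class distribution parameters, one column per class
array mu   = constant(0, num_feats, num_classes);
array sig2 = constant(0, num_feats, num_classes);

// Per-class priors stored on the host (requires <vector>)
std::vector<float> priors(num_classes);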
With the mean and variance of each class known, we can compute the probability of a sample belonging to a particular class using the PDF of the normal distribution. The class with the highest probability wins.
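For a single feature, this density is:
$$ p(x_i | C_k) = \frac{1}{\sqrt{2 \pi \sigma_{ik}^2}} \exp\left(-\frac{(x_i - \mu_{ik})^2}{2 \sigma_{ik}^2}\right) $$
where the mean and variance for each feature and class are the columns of mu and sig2 computed above.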
However, when working with a large number of feature variables, multiplying the probabilities can lead to underflow errors. We can get around this by using log and sum instead of exp and mul.
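Taking the log turns the product into a sum, so the per-class score becomes:
$$ \log p(C_k) + \sum\limits_{i=1}^n \log p(x_i | C_k) $$
and the class with the largest score wins.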
The prediction phase can be seen below.
for (int ii = 0; ii < num_classes; ii++) {

    // test_feats is of size [num_feats, num_test]
    // mu, sig2 are of size [num_feats, num_classes]
    // mu(span, ii), sig2(span, ii) will get the mean and variance of the current class

    // Tile the mean and variance of the current class to match test_feats
    array Mu   = tile(mu(span, ii), 1, num_test);
    array Sig2 = tile(sig2(span, ii), 1, num_test) + 0.01; // to avoid dividing by zero

    array Df = test_feats - Mu;

    // The PDF of the normal distribution is as follows
    // array P = exp(-(Df * Df) / (2 * Sig2)) / sqrt(2 * af::Pi * Sig2);
    // Multiply the conditional probabilities and the prior of the class
    // probs(span, ii) = priors[ii] * product(P).T();

    // Calculate log_P instead for numerical stability
    array log_P = (-(Df * Df) / (2 * Sig2)) - log(sqrt(2 * af::Pi * Sig2));

    // Accumulate the log of the conditional probabilities and the log of the prior of the class
    log_probs(span, ii) = log(priors[ii]) + sum(log_P).T();
}
array val; // ignored
array idx; // Gets the class id
// Get the class ID with maximum log_probability
max(val, idx, log_probs, 1);
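From here, accuracy can be computed by comparing the predicted class ids against the ground-truth labels. The snippet below is a minimal sketch; the name test_classes and its [1, num_test] layout are assumptions that mirror train_classes from the training snippet:

// idx is of size [num_test, 1]; transpose it to match test_classes
array matches = (idx.T() == test_classes);

// Percentage of correctly classified test samples
float accuracy = 100.f * count<float>(matches) / num_test;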
Perceptron
The perceptron is one of the first classification algorithms to use supervised learning. The perceptron transforms the weighted input features into an activation in the range [0, 1].
To perform classification on K classes, we use K perceptrons and choose the class with the maximum activation. For perceptron k, the activation function can be seen below:
$$ Y_k = f(W_{k,i} * X_i) $$
To get an activation function that is continuous, we use the sigmoid function shown below:
$$ \text{sigmoid}(Z) = \frac{1}{1 + \exp(-Z)} $$
The equivalent ArrayFire code can be seen here:
// Activation function
array sigmoid(const array &Z)
{
    // Transforms Z to a value in the range [0, 1]
    return 1 / (1 + exp(-Z));
}

// Predict based on the given parameters
array predict(const array &X, const array &Weights)
{
    // Weighted inputs
    array Z = matmul(X, Weights);

    // Transform the weighted inputs to the output
    return sigmoid(Z);
}
The weights of the perceptron are trained in an iterative process. The trained weights are then used for prediction on test data. Training a perceptron includes the following steps:
- Initialize the weights (a sketch of this step follows the list)
- Predict the output based on training data
- Update the weights according to the following rule
$$ W_{k,i} = W_{k,i} + \alpha * (T_k - Y_k)^T * X_i $$
- If the error is greater than the threshold, go to step 2.
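Step 1 is not part of the training loop below; a minimal sketch of that initialization is shown here. The zero initialization and the hyperparameter values are assumptions rather than part of the original example:

// Learning rate, error bound, and iteration limit (assumed values)
const float alpha   = 0.1f;
const float maxerr  = 0.01f;
const int   maxiter = 1000;

// One weight vector per class, matching the feature dimension
// input is of size [num_train, num_feats], target is of size [num_train, num_classes]
array Weights = constant(0, num_feats, num_classes);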
The training process of the perceptron can be implemented in ArrayFire using the following code.
for (int i = 0; i < maxiter; i++) {

    // Make a prediction based on the current weights
    array predicted = predict(input, Weights);

    // Get the error between the target and the prediction
    array err = target - predicted;

    // Get the average absolute error
    float mean_abs_err = mean<float>(abs(err));

    // Stop training if the error is less than the specified bound
    if (mean_abs_err < maxerr) break;

    // Update the weights based on the input and the error
    Weights = Weights + alpha * matmul(transpose(input), err);
}
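Once training finishes, prediction on test data is a single call to predict followed by picking the class with the maximum activation. A minimal sketch, assuming test_feats is of size [num_test, num_feats]:

// Activations for every test sample and class, of size [num_test, num_classes]
array test_out = predict(test_feats, Weights);

array val; // ignored
array idx; // Gets the class id

// Get the class ID with the maximum activation
max(val, idx, test_out, 1);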
Results
The complete source code for testing and benchmarking these algorithms can be found here.
We tested these algorithms on a small subset of the MNIST database, with 6,100 training images and 3,900 testing images.
The performance and accuracy results can be seen below:
Algorithm | Hardware | Accuracy | Train Time | Predict Time |
---|---|---|---|---|
Naive Bayes | NVIDIA GTX 680 | 80.69% | 0.003s | 0.011s |
Naive Bayes | AMD Radeon 7970 | 81.19% | 0.007s | 0.007s |
Perceptron (1000 iter) | NVIDIA GTX 680 | 83.64% | 2.005s | 0.0003s |
Perceptron (1000 iter) | AMD Radeon 7970 | 85.29% | 4.935s | 0.0005s |
Conclusion
We have shown how simple mathematical expressions can easily translate into ArrayFire code. If you are interested in learning more about how ArrayFire can be used to develop Machine Learning applications, follow this blog!
If you are going to GTC 2015, you may also be interested in the following tutorials.