Neural Networks

In the last exercise, Multi-class classification with logistic regression, we implemented a multi-class classifier capable of recognizing handwritten digits. However, logistic regression is only a linear classifier and cannot form more complex hypotheses unless additional polynomial features are added, which quickly becomes too computationally expensive to train.

In this exercise we will implement a neural network to recognize handwritten digits, using the same training set as before. Unlike logistic regression, a neural network can represent complex models that form non-linear hypotheses. Here we will use parameters from a neural network that has already been trained; the goal is to implement the feedforward propagation algorithm to make predictions with the provided weights. In a separate exercise we will implement the backpropagation algorithm to learn the network parameters ourselves.

Model representation

The network model we will be using today has three layers: an input layer, a single hidden layer and an output layer. Recall that the inputs from the training set are 20x20 pixel images that have been unrolled into a 400-element feature vector with an additional bias unit (401 total features). The weights are stored in ex3theta1.csv and ex3theta2.csv and are sized for a neural network with 25 units in the hidden layer and 10 output units, corresponding to the 10 digit classes.

The first (input) layer of the model consists of 401 nodes: 400 pixel values plus an additional bias node. These inputs are combined linearly with \(\Theta^{(1)}\) to give \(z^{(2)}\), which for all \(m\) training examples at once is an \(m \times 25\) matrix. \(z^{(2)}\) is then mapped through the sigmoid function to give the hidden-layer activations \(a^{(2)}\), which (after a bias unit is added) are combined with \(\Theta^{(2)}\) to give \(z^{(3)}\); passing \(z^{(3)}\) through the sigmoid function returns \(h_{\theta}(x)\).
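Written out for a single example (a sketch of the standard feedforward equations, with \(g\) denoting the sigmoid function and \(a^{(1)}\) the 401-element input including the bias unit):

\[ z^{(2)} = \Theta^{(1)} a^{(1)}, \qquad a^{(2)} = g\left(z^{(2)}\right) \]

\[ z^{(3)} = \Theta^{(2)} a^{(2)}, \qquad h_{\theta}(x) = g\left(z^{(3)}\right) \]

Here \(\Theta^{(1)}\) is \(25 \times 401\) and \(\Theta^{(2)}\) is \(10 \times 26\) (a bias unit is prepended to \(a^{(2)}\) before the second multiplication), so \(h_{\theta}(x)\) contains 10 outputs, one per class. In the vectorized implementation below the examples are the rows of \(X\), so we multiply by the transposed weight matrices, e.g. \(X\Theta^{(1)T}\).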

Implementation

# Sigmoid function
sigmoid_function <- function(z){
  1 / (1 + exp(-z))
}

# Feedforward prediction: X must already include the bias column;
# theta1 is the input-to-hidden weight matrix, theta2 the hidden-to-output weight matrix
nn_predict <- function(X, theta1, theta2){
  m <- dim(X)[1]
  z2 <- X %*% t(theta1)        # hidden layer pre-activations (m x 25)
  a2 <- sigmoid_function(z2)   # hidden layer activations
  a2 <- cbind(rep(1,m), a2)    # add bias unit to the hidden layer (m x 26)
  z3 <- a2 %*% t(theta2)       # output layer pre-activations (m x 10)
  hx <- sigmoid_function(z3)   # output activations, one column per class
  return(hx)
}
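As a quick sanity check (not part of the original exercise), we can call nn_predict on randomly generated matrices of the expected sizes and confirm that it returns one row of 10 class scores per example:

# Sanity check with hypothetical random inputs (not the exercise data)
set.seed(1)
X_fake      <- cbind(1, matrix(rnorm(3 * 400), nrow = 3))  # 3 examples: bias + 400 pixels
theta1_fake <- matrix(rnorm(25 * 401), nrow = 25)          # 25 x 401
theta2_fake <- matrix(rnorm(10 * 26), nrow = 10)           # 10 x 26
dim(nn_predict(X_fake, theta1_fake, theta2_fake))          # expect 3 x 10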

Now we can test the predictions returned by our neural network using the features in ex3data1_features.txt and the labels in ex3data1_labs.txt.

# Load test features
X <- as.matrix(read.csv("./data/ex3data1_features.txt", header=FALSE))
m <- dim(X)[1]
X <- cbind(rep(1,m), X)

# Load labels
y <- as.matrix(read.csv("./data/ex3data1_labs.txt", header=FALSE))

# Load pre-trained parameters
theta1 <- as.matrix(read.csv("./data/ex3theta1.csv", header=FALSE))
theta2 <- as.matrix(read.csv("./data/ex3theta2.csv", header=FALSE))

# Make predictions: the predicted class is the index of the largest output unit
predictions <- apply(nn_predict(X, theta1, theta2), 1, which.max)

# Determine accuracy
accuracy <- mean(predictions == y)
accuracy
## [1] 0.9752

Great, from the output above we can see that the accuracy of our model is 97.52%. Predictions for single feature vectors can be made by passing them through our prediction function.

# Pick a random example from the data set and keep it as a 1 x 401 matrix
random_image <- sample(5000,1)
X <- matrix(X[random_image,], nrow=1)

# The predicted class is the index of the largest output unit
prediction <- nn_predict(X, theta1, theta2)
prediction <- which.max(prediction)
actual_label <- y[random_image,1]

# Display the digit (drop the bias unit before reshaping to 20 x 20)
image(matrix(X[1,-1], 20, 20))

For the example above, our algorithm predicts 5, which matches the actual class label of 5.
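Finally, as a small addition to the original exercise, we can cross-tabulate the full set of predictions against the actual labels to see which digits the network confuses most often:

# Confusion table of predicted vs. actual labels for the full data set
table(predicted = predictions, actual = as.vector(y))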

In the next post, we’ll learn how to generate our own set of weights using the backpropagation algorithm.