Building a Convolutional Neural Network from Scratch
I thought it'd be fun to do this so here we are.
Process
Convolution Operation
Figuring out the convolution was quite simple: it's just an adaptation of the valid cross-correlation, which slides the kernel on top of the input, multiplies the overlapping values, and sums them up. To get the convolution, you rotate the kernel 180 degrees first. So the two operations look like:

$$(X \star K)_{i,j} = \sum_{m}\sum_{n} X_{i+m,\,j+n}\,K_{m,n} \qquad\qquad X * K = X \star \mathrm{rot180}(K)$$
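As a sanity check, here's a minimal NumPy sketch of both operations (the function names are my own):

```python
import numpy as np

def correlate2d_valid(input, kernel):
    # Slide the kernel over the input, multiply the overlapping
    # values, and sum them up (valid cross-correlation).
    ih, iw = input.shape
    kh, kw = kernel.shape
    oh, ow = ih - kh + 1, iw - kw + 1
    output = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            output[i, j] = np.sum(input[i:i + kh, j:j + kw] * kernel)
    return output

def convolve2d_valid(input, kernel):
    # The convolution is the cross-correlation with the kernel rotated 180 degrees.
    return correlate2d_valid(input, np.rot90(kernel, 2))
```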
Implementing the Convolutional Layer
Forward Kernel
Now for the crux of the CNN: the convolutional layer. It takes a 3-dimensional block of data as input, with shape (depth, height, width). The kernels are 3-dimensional blocks as well, each spanning the full depth of the input. Something neat is that we can have multiple kernels, all of which extend through the input's depth. Each kernel also has an associated bias matrix with the same shape as its output. The layer then produces a 3-dimensional block of data as the output: computing one output matrix involves taking the cross-correlation of the input with a kernel and summing this up with the bias, and that process is repeated for each kernel. For the first kernel, with an input of depth $d$, the output is:

$$Y_1 = B_1 + X_1 \star K_{11} + X_2 \star K_{12} + \cdots + X_d \star K_{1d}$$

We can repeat this equation for every kernel, simply by using a different kernel and bias matrix; the inputs, $X_1, \dots, X_d$, stay the same. This is called the forward propagation of the convolutional layer.
Using sum notation, we can write it like this:

$$Y_i = B_i + \sum_{j=1}^{d} X_j \star K_{ij}, \qquad i = 1, \dots, n$$

where $n$ is the number of kernels and $\star$ denotes the valid cross-correlation.
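As a rough sketch, the layer and its forward pass could look like this in NumPy/SciPy (the constructor argument names are mine; `depth` here means the number of kernels):

```python
import numpy as np
from scipy import signal

class Convolutional:
    def __init__(self, input_shape, kernel_size, depth):
        # input_shape = (input_depth, height, width); depth = number of kernels
        input_depth, input_height, input_width = input_shape
        self.depth = depth
        self.input_shape = input_shape
        self.input_depth = input_depth
        self.output_shape = (depth,
                             input_height - kernel_size + 1,
                             input_width - kernel_size + 1)
        self.kernels_shape = (depth, input_depth, kernel_size, kernel_size)
        self.kernels = np.random.randn(*self.kernels_shape)
        self.biases = np.random.randn(*self.output_shape)

    def forward(self, input):
        # Y_i = B_i + sum_j X_j (valid cross-correlation) K_ij
        self.input = input
        self.output = np.copy(self.biases)
        for i in range(self.depth):
            for j in range(self.input_depth):
                self.output[i] += signal.correlate2d(
                    self.input[j], self.kernels[i, j], "valid")
        return self.output
```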
Backward Kernel
To update the kernels and biases, we need to compute their gradients. We're given the derivative of E with respect to the output, $\frac{\partial E}{\partial Y}$, and we need to compute two things. First, the derivative of E with respect to the trainable parameters of the layer, $\frac{\partial E}{\partial K}$ and $\frac{\partial E}{\partial B}$:

$$\frac{\partial E}{\partial K_{ij}} = X_j \star \frac{\partial E}{\partial Y_i}, \qquad \frac{\partial E}{\partial B_i} = \frac{\partial E}{\partial Y_i}$$

Second, the derivative of E with respect to the input of the layer, $\frac{\partial E}{\partial X}$, which is what gets passed back to the previous layer:

$$\frac{\partial E}{\partial X_j} = \sum_{i=1}^{n} \frac{\partial E}{\partial Y_i} *_{\text{full}} K_{ij}$$

where $\star$ is again the valid cross-correlation and $*_{\text{full}}$ is the full convolution.
Once we have these, the backward method starts by initializing empty arrays for the kernel and input gradients. Then, inside two nested for loops over the kernel index $i$ and the input-depth index $j$, we fill in the kernel gradient and accumulate the input gradient, simply translating the formulas above into code (with the help of SciPy's signal module).
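Here's a rough sketch of that backward method, written as a continuation of the `Convolutional` class from the forward-pass sketch above (the `learning_rate` argument and the in-place parameter update are assumptions about how the layer API hangs together):

```python
    def backward(self, output_gradient, learning_rate):
        # output_gradient is dE/dY, shaped like the layer's output
        kernels_gradient = np.zeros(self.kernels_shape)
        input_gradient = np.zeros(self.input_shape)

        for i in range(self.depth):            # loop over kernels
            for j in range(self.input_depth):  # loop over input depth
                # dE/dK_ij = X_j (valid cross-correlation) dE/dY_i
                kernels_gradient[i, j] = signal.correlate2d(
                    self.input[j], output_gradient[i], "valid")
                # dE/dX_j += dE/dY_i (full convolution) K_ij
                input_gradient[j] += signal.convolve2d(
                    output_gradient[i], self.kernels[i, j], "full")

        # Gradient descent step; dE/dB is just dE/dY
        self.kernels -= learning_rate * kernels_gradient
        self.biases -= learning_rate * output_gradient
        return input_gradient
```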
> [!NOTE]
> To be honest, this is where I got really lost, and I'll definitely be revisiting this later to better understand what's going on here. This is also the core element of the computer vision algorithms using deep learning today, so it's pretty important to understand this part.
Implementing the Reshape Layer
So this layer inherits from the base layer class. The class looks something like this:
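(A sketch written standalone, assuming the base layer class just defines `forward` and `backward`.)

```python
import numpy as np

class Reshape:
    # In the actual code this would inherit from the base Layer class.
    def __init__(self, input_shape, output_shape):
        self.input_shape = input_shape
        self.output_shape = output_shape

    def forward(self, input):
        # e.g. flatten a (5, 26, 26) block into a (5 * 26 * 26, 1) column vector
        return np.reshape(input, self.output_shape)

    def backward(self, output_gradient, learning_rate):
        # Undo the reshape so the gradient matches the previous layer's output shape.
        return np.reshape(output_gradient, self.input_shape)
```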
The constructor takes in the shape of the input and output. The forward method reshapes the input to the output shape. The backward method reshapes the output to the input shape. Not too much going on here.
Binary Cross-Entropy Loss
We're given a vector, $Y^*$, containing the desired outputs of the neural network. Keep in mind that each $y_i^* \in \{0, 1\}$ (hence the term binary).
We also have the actual output of the neural network, $Y$.
The binary cross-entropy loss is defined as the following:

$$E = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i^* \log(y_i) + (1 - y_i^*) \log(1 - y_i) \right]$$
The goal is to compute the derivative of E with respect to the output, $\frac{\partial E}{\partial Y}$. For a given $y_i$, only the $i$-th term of the sum depends on it.
Thus, we can just use the chain rule on that term and we get the following:

$$\frac{\partial E}{\partial y_i} = \frac{1}{n} \left( \frac{1 - y_i^*}{1 - y_i} - \frac{y_i^*}{y_i} \right)$$
Also, I added a small epsilon value that prevents log(0) and division by 0. After converting this to code, it looks something like this:
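(A NumPy sketch of the loss and its derivative, with the epsilon baked in.)

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred):
    eps = 1e-15  # keeps log() and the divisions below away from 0
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def binary_cross_entropy_prime(y_true, y_pred):
    eps = 1e-15
    y_pred = np.clip(y_pred, eps, 1 - eps)
    # dE/dy_i = ((1 - y*_i) / (1 - y_i) - y*_i / y_i) / n
    return ((1 - y_true) / (1 - y_pred) - y_true / y_pred) / np.size(y_true)
```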
Sigmoid Activation
The sigmoid activation takes any real number and outputs a value between 0 and 1. This is particularly useful for binary classification problems, where the output is interpreted as a probability. The sigmoid activation is defined as:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
The derivative is:

$$\sigma'(x) = \sigma(x)\,(1 - \sigma(x))$$
And the implementation looks like this:
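(A sketch as a standalone layer; the stored input and the `learning_rate` argument just keep the interface consistent with the other layers.)

```python
import numpy as np

class Sigmoid:
    def forward(self, input):
        self.input = input
        return 1 / (1 + np.exp(-input))

    def backward(self, output_gradient, learning_rate):
        # chain rule: dE/dX = dE/dY * sigma'(X), with sigma'(x) = sigma(x)(1 - sigma(x))
        s = 1 / (1 + np.exp(-self.input))
        return output_gradient * s * (1 - s)
```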
Solving MNIST
MNIST is a dataset of handwritten digits (0-9). The goal of this CNN is to classify these images; since we're using a binary loss, we'll stick to the zeros and ones. We load the MNIST dataset from the keras library like so:
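```python
from keras.datasets import mnist

# 60,000 training images and 10,000 test images of handwritten digits
(x_train, y_train), (x_test, y_test) = mnist.load_data()
```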
First, we get the indices of the images representing a zero or a one, stack those indices together, and shuffle them, then extract just those images. Next, we reshape each image from 28x28 pixels to a 3D block of 1x28x28 pixels, because our convolutional layer takes in a 3D block of data with the depth as the first dimension. The images contain values from 0 to 255, so we normalize the input by dividing each value by 255. For the output vector, we use another util from keras called to_categorical, which creates a one-hot encoded vector from a number; essentially, 0 becomes [1, 0] and 1 becomes [0, 1]. Finally, we reshape each label into a 2x1 column vector, because that's the type of input the dense layer takes.
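Put together, the preprocessing could look something like this (the `preprocess_data` name and the `limit` cap on samples per class are my own choices):

```python
import numpy as np
from keras.utils import to_categorical

def preprocess_data(x, y, limit):
    # Keep only the images labelled 0 or 1, then shuffle them together.
    zero_index = np.where(y == 0)[0][:limit]
    one_index = np.where(y == 1)[0][:limit]
    all_indices = np.random.permutation(np.hstack((zero_index, one_index)))
    x, y = x[all_indices], y[all_indices]

    # Reshape each 28x28 image into a 1x28x28 block and normalize to [0, 1].
    x = x.reshape(len(x), 1, 28, 28).astype("float32") / 255

    # One-hot encode the labels (0 -> [1, 0], 1 -> [0, 1]) and shape them
    # as 2x1 column vectors, which is what the dense layer expects.
    y = to_categorical(y)
    y = y.reshape(len(y), 2, 1)
    return x, y
```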
Finally, our network looks something like this:
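(A sketch of one possible stack, assuming the `Dense` layer from the earlier fully-connected network; the 5 kernels of size 3x3 and the 100-unit hidden layer are illustrative choices.)

```python
network = [
    Convolutional((1, 28, 28), 3, 5),        # 5 kernels of size 3x3 over a 1x28x28 input
    Sigmoid(),
    Reshape((5, 26, 26), (5 * 26 * 26, 1)),  # flatten to a column vector
    Dense(5 * 26 * 26, 100),
    Sigmoid(),
    Dense(100, 2),                           # two outputs: "zero" vs "one"
    Sigmoid(),
]
```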
We then define our epochs and learning rate. I used values 20 and 0.1 respectively.
Now for training: it looks quite similar to training a regular neural network, except that we use the binary cross-entropy loss here.
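A sketch of what that loop could look like, assuming `x_train` and `y_train` are the preprocessed arrays from above and the layers expose the `forward`/`backward` interface sketched earlier:

```python
epochs = 20
learning_rate = 0.1

for epoch in range(epochs):
    error = 0
    for x, y in zip(x_train, y_train):
        # forward pass through every layer
        output = x
        for layer in network:
            output = layer.forward(output)

        error += binary_cross_entropy(y, output)

        # backward pass: start from the loss gradient and push it through in reverse
        grad = binary_cross_entropy_prime(y, output)
        for layer in reversed(network):
            grad = layer.backward(grad, learning_rate)

    print(f"epoch {epoch + 1}/{epochs}, error = {error / len(x_train):.4f}")
```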
Running It
Acknowledgements
This was super fun to build and I learned a lot. Thanks to The Independent Code and his extremely informative video which I followed and adapted.