The human brain is the most complex organ in the human body. It contains roughly 86 billion neurons, each linked to many others through 'synaptic connections', for an estimated total of 100-500 trillion connections. We still don't fully understand how our own brain works, how it gives rise to consciousness, or how it stores memories.

Modern neural networks began as an attempt to recreate the intelligence of our brains using a simplified mathematical model. Given enough data and compute, this lets them learn complex patterns and often produce more accurate results than earlier techniques like SVMs, and they now power large-scale models like GPT and Stable Diffusion.

Neural networks have been around since the 1940s, but they only started to gain serious popularity in the early 2010s because of the availability of more powerful hardware like GPUs.

Every large model you see online is built from neurons arranged in different ways, and almost all of them learn using the same underlying algorithm, which we are going to explore today: backpropagation.

We're going to build a neural network from the ground up with just NumPy. To get the most out of this blog, you should have a basic understanding of differential calculus, some Python fundamentals and some basic experience with NumPy, though I revisit a few of these concepts along the way.

💡 I would like to be clear that this work is NOT a recreation of any existing online content, such as Andrej Karpathy's Zero to Hero series or Andrew Ng's Deep Learning Specialisation. Though I have taken inspiration from various sources, this is my own independent work, built on my own understanding.

This is going to be a rather long and technical blog. To keep things simple, I'm going to skip over a lot of concepts you would usually learn in a deep learning course, such as dropout, early stopping, learning rate schedulers or quantisation.

What exactly is a Neuron?

A neuron is the fundamental unit of computation in a neural network. Neurons are also called perceptrons or units. Each neuron takes a set of inputs, holds one 'weight' for each input as well as a single 'bias', and computes the dot product of the inputs and weights, plus the bias. This weighted sum is then passed through an 'activation function', and the result of that function is the neuron's final output.

Weights and biases are just numerical values. A weight signifies the strength of an input or a connection, while the bias shifts the weighted sum before it reaches the activation function. Together, they're called the 'parameters' of the model.
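To make the description above concrete, here is a minimal sketch of a single neuron in NumPy. The function names `sigmoid` and `neuron_forward` are my own illustrative choices, and sigmoid is only an assumed example of an activation function, since none has been picked yet at this point. The input, weight and bias values are made up for the demonstration.

```python
import numpy as np

def sigmoid(z):
    """Assumed example activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron_forward(x, w, b, activation=sigmoid):
    """Hypothetical helper: compute one neuron's output, activation(w . x + b)."""
    z = np.dot(w, x) + b   # dot product of weights and inputs, plus the bias
    return activation(z)   # the activation's result is the neuron's output

# Illustrative values only
x = np.array([1.0, 2.0, 3.0])    # three inputs
w = np.array([0.5, -0.25, 0.0])  # one weight per input
b = 0.0                          # a single bias

output = neuron_forward(x, w, b)
print(output)  # weighted sum is 0.5 - 0.5 + 0.0 = 0.0, and sigmoid(0.0) = 0.5
```

This neuron has four parameters in total: three weights and one bias. Training a network means adjusting exactly these numbers.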