Neural Nets: A Beginner’s Guide…

Vinsmoke Somya
3 min read · Mar 6, 2024


This article is a primer on the inner workings of neural nets. We are going to cover the overall architecture of a neural network (its different layers), loss functions, gradient descent, and a simple example of logistic regression expressed as a neural network.

So, let’s dive in.

Thanks to their architecture, neural networks can approximate almost any function, which is why they are also called universal approximators.

Neural networks have the following layers:

  • Input Layer
  • Hidden Layers
  • Fully Connected Linear Layers / Output Layer

Let’s cover them layer by layer.

Input Layer: provides the inputs/features to the neural net. The number of neurons in the input layer equals the number of features. Inputs/features are numbers that the neural net uses to tease out patterns.

Hidden Layers: are where the NN learns complex patterns in the input data. Every neuron in a hidden layer has weights and a bias, and these layers are where nonlinear activation functions are applied. Nonlinear activation functions are what give the NN its nonlinearity. Some common activation functions are listed below (with a small sketch after the list):

  • Sigmoid, ReLU, Tanh and Softmax
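To make these concrete, here is a minimal NumPy sketch of the four activation functions listed above. The function names and implementations are my own illustration, not taken from any particular library:

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into (0, 1); common for binary classification outputs.
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # Passes positive values through unchanged and zeroes out negatives.
    return np.maximum(0.0, z)

def tanh(z):
    # Squashes values into (-1, 1); zero-centred, unlike sigmoid.
    return np.tanh(z)

def softmax(z):
    # Turns a vector of logits into a probability distribution over classes.
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()
```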

Output Layer: is a fully connected layer whose job is to produce the final prediction, i.e. the function's output. To train it, NNs use loss functions such as negative log-likelihood / cross-entropy loss for logistic regression and other classification tasks. For regression, the loss functions used are MSE, MAE, etc.
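As a rough sketch of those two loss families (assuming the predictions are already probabilities from a sigmoid or softmax; the helper names are my own), they look like this in NumPy:

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # Negative log-likelihood for logistic regression / binary classification.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def mse(y_true, y_pred):
    # Mean squared error, a common regression loss.
    return np.mean((y_true - y_pred) ** 2)
```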

Now that we have covered the layers of a NN, let's move on to how NNs work and get trained.

Refer to the image below for how a NN works. I will explain it in detail now.

Working of Neural Networks


The input layer receives the inputs; the fully connected hidden layers apply weights, biases and activation functions; and finally the output layer is where the loss function is computed.

Now, from the training perspective, a NN first performs a forward pass, in which the following equation is used.

Neural Net Function (a depiction of a very simplistic function)


The function in the equation above is ŷ = σ(w·x + b). Here yHat (ŷ) is the network's prediction, which is compared against the true label from the data set. 'Sigma' (σ) denotes an activation function, 'w' denotes the weights, 'x' is an input, and 'b' is a bias.

During training, the NN multiplies its weights with the inputs, adds the bias term, and produces logits that are given to the loss function. This is the forward pass of the NN.
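As a minimal illustration of that forward pass (the layer sizes and variable names below are invented for the example), with one hidden layer it looks like this:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: 3 input features, 4 hidden neurons, 1 output neuron.
x = rng.normal(size=3)                            # input features
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)     # hidden-layer weights and biases
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)     # output-layer weights and bias

h = np.maximum(0.0, W1 @ x + b1)                  # hidden layer: linear step + ReLU
logit = W2 @ h + b2                               # output layer produces a logit
y_hat = 1.0 / (1.0 + np.exp(-logit))              # sigmoid turns the logit into a probability
```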

In the backward pass, the weights are updated using gradient descent to reduce the difference between the NN's prediction (yHat) and the true label. Gradient descent is key for training NNs. It works by taking derivatives of the loss with respect to the weights, propagated back through the network using the chain rule.
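For the single-neuron function above paired with cross-entropy loss L, the chain rule unrolls into the familiar gradients (writing z = w·x + b and ŷ = σ(z), with y the true label; this is a standard textbook result rather than something shown in the original image):

∂L/∂w = (∂L/∂ŷ) · (∂ŷ/∂z) · (∂z/∂w) = (ŷ − y) · x

∂L/∂b = ŷ − y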

Starting from their initial values, the weights are updated using gradient descent, and this is KEY.

A depiction of gradient descent is as follows:

Gradient Descent

The image above shows how the weights are initialized and then updated in a way that moves the loss toward a minimum (ideally the global minimum). All of this is achieved through gradient descent. Stochastic Gradient Descent (SGD), Adam and other optimizers are the variants commonly used to train NNs.
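Putting the pieces together, here is a small sketch of plain gradient descent training a one-neuron logistic-regression "network" on toy data. The synthetic dataset, learning rate and number of epochs are arbitrary choices for illustration, not prescriptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary-classification data: 100 samples, 3 features, synthetic labels.
X = rng.normal(size=(100, 3))
y = (X @ np.array([1.5, -2.0, 0.5]) > 0).astype(float)

w, b = np.zeros(3), 0.0
lr = 0.1                                   # learning rate (step size)

for epoch in range(200):
    # Forward pass: logits -> sigmoid -> predicted probabilities.
    y_hat = 1.0 / (1.0 + np.exp(-(X @ w + b)))

    # Backward pass: gradients of the cross-entropy loss via the chain rule.
    grad_w = X.T @ (y_hat - y) / len(y)
    grad_b = np.mean(y_hat - y)

    # Gradient-descent update: step opposite to the gradient.
    w -= lr * grad_w
    b -= lr * grad_b
```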

An important point here is that you can stack any number N of hidden layers to make the NN as deep as you need.

Also, NNs can take embeddings as input, which are nothing but vectors of numbers representing tokens, allowing them to model language.
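As a toy illustration of that idea (the vocabulary and embedding size below are invented for the example), an embedding is just a lookup table from tokens to vectors:

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = {"the": 0, "cat": 1, "sat": 2}                 # token -> row index
embedding_table = rng.normal(size=(len(vocab), 4))     # one 4-dimensional vector per token

sentence = ["the", "cat", "sat"]
vectors = embedding_table[[vocab[t] for t in sentence]]   # these vectors become the NN's input
```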

In summary, NNs can be used to model nearly any nonlinear function.

Thank you for reading…

Contact Info…

LinkedIn · Kaggle · HuggingFace · Twitter / X · GitHub
