Neural Networks and Deep Learning — AI & LLM Fundamentals

The Artificial Neuron

A biological neuron receives signals through dendrites, processes them in the cell body, and fires an output through the axon. An artificial neuron is a loose mathematical analogy: it takes multiple inputs, multiplies each by a weight, sums the results, adds a bias, and passes that sum through an activation function.

output = activation(w1*x1 + w2*x2 + ... + wn*xn + bias)
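
As a minimal sketch of the formula above, the following Python computes one neuron's output. The function names (neuron, relu, sigmoid) and the example weights are illustrative assumptions, not part of any particular library.

```python
import math

def relu(z):
    # ReLU activation: max(0, z)
    return max(0.0, z)

def sigmoid(z):
    # Sigmoid activation: squashes z into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# Hypothetical example values
inputs = [1.0, 2.0]
weights = [0.5, -1.0]
bias = 0.5

# Weighted sum: 0.5*1.0 + (-1.0)*2.0 + 0.5 = -1.0
print(neuron(inputs, weights, bias, relu))       # 0.0 (negative sums are clipped to zero)
print(neuron(inputs, weights, bias, math.tanh))  # about -0.762
print(neuron(inputs, weights, bias, sigmoid))    # about 0.269
```

The same weighted sum gives a different output under each activation, which is why the choice of activation function matters.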

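The activation function introduces non-linearity. Without it, every layer would be a linear map, and any stack of linear layers is mathematically equivalent to a single linear layer. Common activation functions include ReLU (max(0, x)), sigmoid (which squashes values into the range 0 to 1), and tanh (which squashes values into the range -1 to 1).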
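
To see why non-linearity is needed, here is a small sketch with two one-input "layers". The weights are arbitrary example values and the function names are made up for illustration. Without an activation, the two layers give exactly the same result as one merged linear layer; with ReLU in between, they do not.

```python
def linear(w, b, x):
    return w * x + b

def relu(z):
    return max(0.0, z)

# Arbitrary example weights for two stacked one-input layers
w1, b1 = 2.0, 1.0
w2, b2 = -3.0, 0.5

# Algebra: w2*(w1*x + b1) + b2 = (w2*w1)*x + (w2*b1 + b2)
w_merged = w2 * w1        # -6.0
b_merged = w2 * b1 + b2   # -2.5

def stacked(x):
    # Two layers, no activation in between
    return linear(w2, b2, linear(w1, b1, x))

def single(x):
    # One layer using the merged weights
    return linear(w_merged, b_merged, x)

def stacked_relu(x):
    # Two layers with ReLU in between
    return linear(w2, b2, relu(linear(w1, b1, x)))

for x in [-2.0, 0.0, 1.5]:
    print(x, stacked(x), single(x), stacked_relu(x))
# At x = -2.0: stacked and single both give 9.5, but stacked_relu gives 0.5
```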

From Perceptrons to Deep Networks

A single neuron (perceptron) can only separate its inputs with a linear boundary (a line or hyperplane), so it cannot represent a function such as XOR. Stack neurons into layers and you get a neural network. Stack many layers and you get a deep neural network, hence "deep learning."

Input layer: Receives raw data (pixels, words, numbers)
Hidden layers: Extract increasingly abstract features
Output layer: Produces the final prediction
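
The sketch below wires these three layers together for XOR, a function no single neuron can compute because its classes are not linearly separable. The weights are set by hand for illustration rather than learned, and names such as dense and xor_net are hypothetical.

```python
def relu(z):
    return max(0.0, z)

def identity(z):
    return z

def dense(inputs, weights, biases, activation):
    """One fully connected layer: each row of weights feeds one neuron."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

# Hand-picked weights (an assumption for this example, not learned values)
HIDDEN_W = [[1.0, 1.0], [1.0, 1.0]]   # two hidden neurons, two inputs each
HIDDEN_B = [0.0, -1.0]
OUTPUT_W = [[1.0, -2.0]]              # one output neuron
OUTPUT_B = [0.0]

def xor_net(x1, x2):
    hidden = dense([x1, x2], HIDDEN_W, HIDDEN_B, relu)       # hidden layer
    return dense(hidden, OUTPUT_W, OUTPUT_B, identity)[0]    # output layer

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor_net(a, b))
# Outputs in order: 0.0, 1.0, 1.0, 0.0
```

The second hidden neuron only activates when both inputs are on, and the output layer subtracts it twice, which cancels the (1, 1) case.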

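In theory, a network with even a single hidden layer can approximate any continuous function on a bounded input range to arbitrary accuracy, provided it has enough hidden units (the universal approximation theorem). In practice, deeper networks tend to represent complex functions far more efficiently than very wide shallow ones. Modern LLMs use dozens to over a hundred layers.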
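
As a rough illustration of this approximation ability, the sketch below builds a one-hidden-layer ReLU network that interpolates x*x on the interval -1 to 1. The weights are constructed directly from the target function rather than trained, and all names (build_relu_net, max_error) are hypothetical. The point is only that the worst-case error shrinks as hidden units are added.

```python
def relu(z):
    return max(0.0, z)

def build_relu_net(f, lo, hi, units):
    """Pick weights so one hidden ReLU layer interpolates f at evenly spaced knots."""
    step = (hi - lo) / units
    knots = [lo + i * step for i in range(units + 1)]
    slopes = [(f(knots[i + 1]) - f(knots[i])) / step for i in range(units)]
    # The output weight of each hidden unit is the change in slope at its knot.
    out_weights = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, units)]
    offset = f(lo)  # output bias

    def net(x):
        hidden = [relu(x - k) for k in knots[:units]]  # hidden layer
        return offset + sum(w * h for w, h in zip(out_weights, hidden))

    return net

def max_error(f, net, lo, hi, samples=1000):
    """Largest absolute gap between f and the network on an evenly spaced grid."""
    xs = [lo + (hi - lo) * i / samples for i in range(samples + 1)]
    return max(abs(f(x) - net(x)) for x in xs)

def square(x):
    return x * x

for units in (2, 8, 32):
    net = build_relu_net(square, -1.0, 1.0, units)
    print(units, max_error(square, net, -1.0, 1.0))
```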

Backpropagation: How Networks Learn

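Training a neural network means finding weights that minimize the prediction error. Backpropagation is the algorithm that makes this possible: it computes how the error changes with respect to every weight, and gradient descent uses that information to update the weights. Each training step has four parts: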

Forward pass: Input flows through the network, producing a prediction
Loss calculation: Compare prediction to the correct answer using a loss function
Backward pass: Compute the gradient of the loss with respect to each weight, i.e. how much each weight contributed to the error (using the chain rule of calculus)
Weight update: Move each weight a small step against its gradient, the direction that reduces the error (gradient descent)

Repeat this over many passes through the training dataset (often millions of update steps for large models), and the network gradually learns to make accurate predictions.
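
The four steps can be sketched for the smallest possible case: one sigmoid neuron learning the logical OR function with a squared-error loss. The dataset, learning rate, epoch count, and function names are assumptions chosen for illustration. Real networks apply the same chain rule across many layers, usually through an automatic differentiation library.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy dataset (an assumption for this example): the logical OR function
DATA = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]

def train(epochs=5000, learning_rate=0.5):
    weights, bias = [0.0, 0.0], 0.0
    losses = []
    for _ in range(epochs):
        total = 0.0
        for inputs, target in DATA:
            # 1. Forward pass
            z = sum(w * x for w, x in zip(weights, inputs)) + bias
            pred = sigmoid(z)
            # 2. Loss calculation (squared error)
            total += (pred - target) ** 2
            # 3. Backward pass: chain rule, dL/dw = dL/dpred * dpred/dz * dz/dw
            dloss_dpred = 2.0 * (pred - target)
            dpred_dz = pred * (1.0 - pred)            # derivative of sigmoid
            dloss_dz = dloss_dpred * dpred_dz
            grad_w = [dloss_dz * x for x in inputs]   # dz/dw_i = x_i
            grad_b = dloss_dz                         # dz/db = 1
            # 4. Weight update: step against the gradient
            weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
            bias -= learning_rate * grad_b
        losses.append(total / len(DATA))
    return weights, bias, losses

def predict(weights, bias, inputs):
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

weights, bias, losses = train()
print("first epoch loss:", losses[0], "last epoch loss:", losses[-1])
for inputs, target in DATA:
    print(inputs, target, round(predict(weights, bias, inputs), 3))
```

Here the update runs after every example (stochastic gradient descent). Averaging gradients over a batch of examples before each update is a common alternative.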

Key Architectures

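CNNs (Convolutional Neural Networks): Specialized for grid-like data (images). Use small learned filters that slide across the input to detect edges, textures, and shapes. Dominated computer vision from roughly 2012 until around 2020, when transformer-based vision models became competitive.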
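
A sliding filter can be sketched in a few lines of plain Python. The image and the vertical-edge filter below are hand-made examples (in a real CNN the filter values are learned), and convolve2d is a hypothetical helper rather than a library function.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and sum the products.

    Like most deep learning libraries, this does not flip the kernel,
    so strictly speaking it computes a cross-correlation.
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output

# 4x6 image: dark left half (0), bright right half (1)
IMAGE = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]

# Hand-picked filter that responds to dark-to-bright vertical edges
VERTICAL_EDGE = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

FEATURE_MAP = convolve2d(IMAGE, VERTICAL_EDGE)
for row in FEATURE_MAP:
    print(row)
# Prints [0, 3, 3, 0] twice: the response is strongest where the filter straddles the edge
```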

RNNs (Recurrent Neural Networks): Designed for sequential data (text, audio). Process inputs one at a time while maintaining a hidden state as "memory." LSTMs and GRUs improved on basic RNNs by adding gates that control what the hidden state keeps and forgets, which mitigates (but does not fully eliminate) the vanishing gradient problem.
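
The hidden-state idea can be sketched with a one-unit basic RNN (not an LSTM or GRU). The weights are fixed example values and the names rnn_step and run_rnn are hypothetical. Each step mixes the current input with the previous hidden state.

```python
import math

def rnn_step(x, h_prev, w_x, w_h, bias):
    """One recurrent step: combine the current input with the previous hidden state."""
    return math.tanh(w_x * x + w_h * h_prev + bias)

def run_rnn(sequence, w_x=1.0, w_h=0.5, bias=0.0):
    h = 0.0  # initial hidden state
    states = []
    for x in sequence:  # inputs are processed one at a time, in order
        h = rnn_step(x, h, w_x, w_h, bias)
        states.append(h)
    return states

early = run_rnn([1.0, 0.0, 0.0])  # signal at the start of the sequence
late = run_rnn([0.0, 0.0, 1.0])   # same signal at the end
print(early)  # roughly [0.762, 0.363, 0.180]
print(late)   # roughly [0.0, 0.0, 0.762]
```

The same input value leaves a different final state depending on where it appears, and the early signal fades with each step. That fading is a small-scale picture of why basic RNNs struggle with long-range dependencies.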

Transformers: Replaced RNNs for most language tasks by processing all tokens in parallel using attention mechanisms, which let each token draw directly on information from the other tokens in the sequence. We'll cover these in detail in the next lesson.

Why Depth Matters

Each layer in a deep network learns to represent data at a different level of abstraction. In an image network, early layers tend to detect edges, middle layers detect shapes, and deep layers detect objects. In a language model, the picture is less clean-cut, but earlier layers tend to capture surface features and syntax, while later layers capture more of the semantics and task-specific patterns. This hierarchy of learned representations is what gives deep learning its power.