🧠 How AI Really Works

An Interactive Journey Inside Neural Networks

What is a Neural Network?

Imagine your brain is made of billions of tiny decision-makers called neurons. Each neuron:

  • 🎯 Takes in information (inputs)
  • 🤔 Thinks about it (processing)
  • 💡 Makes a decision (output)

An AI neural network works the same way! It's like a simplified brain made of math. Let's see it in action!

A neural network is a function approximator that transforms inputs through layers of neurons:

f(x) = σ(W₃ · σ(W₂ · σ(W₁ · x + b₁) + b₂) + b₃)

Where:

  • x = input vector
  • Wᵢ = weight matrix for layer i
  • bᵢ = bias vector for layer i
  • σ = activation function (e.g., ReLU, sigmoid)
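
Here's that nested formula as a minimal Python sketch (using NumPy; the layer sizes, random weights, and the choice of sigmoid for σ are made up purely for illustration):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Toy shapes: 2 inputs -> 3 hidden -> 3 hidden -> 1 output
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)
W2, b2 = rng.normal(size=(3, 3)), np.zeros(3)
W3, b3 = rng.normal(size=(1, 3)), np.zeros(1)

def f(x):
    # f(x) = σ(W₃ · σ(W₂ · σ(W₁ · x + b₁) + b₂) + b₃)
    return sigmoid(W3 @ sigmoid(W2 @ sigmoid(W1 @ x + b1) + b2) + b3)

print(f(np.array([0.0, 1.0])))  # untrained weights, so the output is arbitrary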

🎮 Live XOR Training Demo

Watch an AI learn the XOR problem in real time! XOR outputs 1 when its inputs are different and 0 when they're the same.

[Interactive demo: live readouts of the current epoch, loss, and accuracy as the network trains with learning rate 0.1.]
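
The demo's training data is just the XOR truth table, small enough to write out by hand (a sketch in NumPy):

import numpy as np

# Inputs: all four (a, b) combinations; target is 1 exactly when a ≠ b
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

for inputs, target in zip(X, y):
    print(inputs, "->", target)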

How Does Learning Work?

🎯 Forward Pass: Making Predictions

The network makes a prediction by passing data forward through each layer (a one-neuron sketch follows this list):

  1. Input: Feed in the data (like (0, 1) for XOR)
  2. Multiply & Add: Each connection has a "strength" (weight)
  3. Activate: Decide if the neuron should "fire"
  4. Output: Get the final prediction
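
Here's one neuron doing those steps, with a made-up weight vector and bias and ReLU as the activation:

import numpy as np

def relu(z):
    return max(0.0, z)

x = np.array([0.0, 1.0])    # step 1: the XOR input (0, 1)
w = np.array([0.5, -0.3])   # step 2: one "strength" per connection (made up)
b = 0.1                     # bias

z = w @ x + b               # step 2: multiply & add  -> -0.2
a = relu(z)                 # step 3: activate        -> 0.0 (blocked, since z < 0)
print(z, a)                 # step 4: this neuron's output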

📉 Backward Pass: Learning from Mistakes

When the network is wrong, it learns by adjusting its connections (a one-weight sketch follows this list):

  1. Calculate Error: How wrong was the prediction?
  2. Blame Game: Which connections caused the error?
  3. Adjust Weights: Make connections stronger or weaker
  4. Repeat: Try again with new weights!
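
Here's the whole cycle for a single weight of a single linear neuron (all numbers made up; the gradient -error · x comes from the squared-error loss defined below):

x, y = 1.0, 1.0   # input and target
w = 0.2           # current connection strength
alpha = 0.1       # learning rate

for step in range(3):
    y_hat = w * x              # prediction
    error = y - y_hat          # step 1: how wrong was it?
    grad = -error * x          # step 2: how much is this weight to blame?
    w -= alpha * grad          # step 3: adjust the weight
    print(step, y_hat, w)      # step 4: repeat; predictions creep toward y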

Forward Propagation

For each layer l:

z[l] = W[l] · a[l-1] + b[l]
a[l] = σ(z[l])

Where a[0] = x (input) and a[L] = ŷ (output)
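
These two equations translate almost line for line into a loop over layers (a sketch assuming sigmoid for every σ, with the weights and biases passed in as Python lists):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward(x, Ws, bs):
    a = x                    # a[0] = x
    for W, b in zip(Ws, bs):
        z = W @ a + b        # z[l] = W[l] · a[l-1] + b[l]
        a = sigmoid(z)       # a[l] = σ(z[l])
    return a                 # a[L] = ŷ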

Backpropagation

Loss function (Mean Squared Error):

L = ½ Σ(y - ŷ)²

Gradient computation:

δ[L] = ∇ₐL ⊙ σ'(z[L])
δ[l] = (W[l+1]ᵀ · δ[l+1]) ⊙ σ'(z[l])

Weight update:

W[l] = W[l] - α · δ[l] · a[l-1]ᵀ
b[l] = b[l] - α · δ[l]
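
Putting the loss, the deltas, and the updates together, here's a minimal training loop for XOR (a sketch, not the demo's exact code: one hidden layer of 4 sigmoid neurons, learning rate 0.5, and 5000 epochs are all assumptions):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# XOR data, one example per column
X = np.array([[0, 0, 1, 1],
              [0, 1, 0, 1]], dtype=float)   # shape (2, 4)
Y = np.array([[0, 1, 1, 0]], dtype=float)   # shape (1, 4)

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(4, 2)), np.zeros((4, 1))  # 2 inputs -> 4 hidden
W2, b2 = rng.normal(size=(1, 4)), np.zeros((1, 1))  # 4 hidden -> 1 output
alpha = 0.5

for epoch in range(5000):
    # Forward propagation
    a1 = sigmoid(W1 @ X + b1)
    a2 = sigmoid(W2 @ a1 + b2)          # ŷ

    # Backpropagation: ∇ₐL = (ŷ - y), and σ'(z) = a(1 - a) for sigmoid
    d2 = (a2 - Y) * a2 * (1 - a2)       # δ[L]
    d1 = (W2.T @ d2) * a1 * (1 - a1)    # δ[l] = (W[l+1]ᵀ · δ[l+1]) ⊙ σ'(z[l])

    # Weight updates, averaged over the 4 examples
    W2 -= alpha * (d2 @ a1.T) / 4
    b2 -= alpha * d2.mean(axis=1, keepdims=True)
    W1 -= alpha * (d1 @ X.T) / 4
    b1 -= alpha * d1.mean(axis=1, keepdims=True)

predictions = sigmoid(W2 @ sigmoid(W1 @ X + b1) + b2)
print(np.round(predictions, 2))  # should approach [[0, 1, 1, 0]] if training converges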

Key Components Explained

🔗 Weights & Biases

Weights are like volume knobs - they control how much each input matters.

Biases are like thresholds - they decide when a neuron should activate.

⚡ Activation Functions

These decide if a neuron should "fire" or not:

  • ReLU: If positive, pass it on. If negative, block it!
  • Sigmoid: Squash everything between 0 and 1
  • Tanh: Squash everything between -1 and 1

🎯 Gradient Descent

Imagine you're blindfolded on a hill, trying to reach the bottom:

  1. Feel the slope around you (calculate gradient)
  2. Take a small step downhill (adjust weights)
  3. Repeat until you reach the bottom (minimum loss)

Activation Functions

ReLU:

f(x) = max(0, x)
f'(x) = {1 if x > 0, 0 if x ≤ 0}

Sigmoid:

σ(x) = 1 / (1 + e⁻ˣ)
σ'(x) = σ(x) · (1 - σ(x))

Tanh:

tanh(x) = (eˣ - e⁻ˣ) / (eˣ + e⁻ˣ)
tanh'(x) = 1 - tanh²(x)
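
All three pairs map directly onto NumPy (a sketch; the (x > 0) convention treats ReLU's gradient at exactly 0 as 0):

import numpy as np

def relu(x):
    return np.maximum(0, x)           # f(x) = max(0, x)

def relu_grad(x):
    return (x > 0).astype(float)      # 1 where x > 0, else 0

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1 - s)                # σ'(x) = σ(x) · (1 - σ(x))

def tanh_grad(x):
    return 1 - np.tanh(x) ** 2        # tanh itself is built into NumPy

x = np.array([-2.0, 0.0, 2.0])
print(relu(x), sigmoid(x), np.tanh(x))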

Gradient Descent Update Rule

θₜ₊₁ = θₜ - α · ∇θ L(θₜ)

Where:

  • θ = parameters (weights and biases)
  • α = learning rate
  • ∇θ L = gradient of loss with respect to parameters
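
Here's the update rule walking a single parameter down a simple bowl-shaped loss, L(θ) = (θ - 3)², whose gradient is 2(θ - 3) (the starting point and learning rate are made up):

theta = 0.0   # θ₀: start somewhere on the hill
alpha = 0.1   # α: step size

for t in range(25):
    grad = 2 * (theta - 3)         # ∇θ L(θₜ): feel the slope
    theta = theta - alpha * grad   # θₜ₊₁ = θₜ - α · ∇θ L(θₜ): step downhill

print(theta)  # ends up close to 3, the bottom of the bowl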