1. Vectors and Dot Product
Before we build a brain, we need to understand how computers "talk" about lists of things.
Simple Intuition
Imagine you are buying groceries. You have a list of quantities and a list of prices. A Vector is just a list of numbers. The Dot Product multiplies each quantity by its price and adds everything up: it is the total bill you pay at the end.
Clear Definition
A Vector is an ordered list of numbers. In Deep Learning, we use them to represent data (like pixels or features).
Mathematical Formula
Dot Product (A · B) = (a1 * b1) + (a2 * b2) + ... + (an * bn)
Small Numerical Example
Inputs (x) = [2, 3] (2 apples, 3 oranges)
Weights (w) = [0.5, 1.2] (Price of apple, price of orange)
Dot Product = (2 * 0.5) + (3 * 1.2)
= 1.0 + 3.6 = 4.6
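If you want to see this as code, here is a minimal Python sketch (plain lists, no libraries), showing that the dot product is just multiply-and-add:

```python
# Dot product of two equal-length lists: multiply element-wise, then sum.
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

x = [2, 3]        # 2 apples, 3 oranges
w = [0.5, 1.2]    # price of an apple, price of an orange

print(dot(x, w))  # 4.6 -> the total bill
```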
Answer
(1*2) + (5*0) = 2
2. Linear Equation (z = wᵀx + b)
Now that we can multiply lists, let's see how a "neuron" looks at data.
Simple Intuition
Imagine you are deciding if a movie is "Good" or "Bad". You care about Action and Story. But maybe you care more about Story. The "Weights" (w) are how much you care. The "Bias" (b) is your personal mood before the movie starts.
Clear Definition
z = wᵀx + b is the standard "Linear" formula. wᵀx is just the Dot Product of weights and inputs. b is the Bias, which shifts the result up or down.
Visual Intuition
In a 2D graph, this is a straight line. In 3D, it is a flat plane. It separates space into two halves.
x = [1, 2], w = [0.5, -0.5], b = 1
z = (1 * 0.5) + (2 * -0.5) + 1
z = 0.5 - 1.0 + 1 = 0.5
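A quick Python sketch of the same calculation (reusing the dot product idea; the movie-score labels in the comments are just for flavour):

```python
# z = w.x + b : the dot product of weights and inputs, shifted by a bias.
def linear(x, w, b):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

x = [1, 2]          # inputs (e.g. action score, story score)
w = [0.5, -0.5]     # how much we care about each input
b = 1               # bias: our mood before the movie starts

print(linear(x, w, b))  # 0.5
```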
Answer
(1*2 + 1*2) - 10 = 4 - 10 = -6
3. Perceptron (The Single Neuron)
The Perceptron is the "Grandfather" of Deep Learning. It’s the simplest model of a brain cell.
Simple Intuition
Think of a Perceptron as a Voting Machine. It takes several inputs, calculates the score (z), and then decides: "If the score is positive, say YES (1). If negative, say NO (0)."
Mathematical Steps
- Take Inputs (x)
- Multiply by Weights (w) and add Bias (b) -> z = wᵀx + b
- Pass z through a decision rule (Activation Function).
x1 = Is it sunny? (1 for yes)
x2 = Is it a weekend? (1 for yes)
Let's say w = [5, 5] and b = -7.
If x = [1, 0] (Sunny but Monday): z = (5*1 + 5*0) - 7 = -2. (Output: 0 / No)
If x = [1, 1] (Sunny and Sunday): z = (5*1 + 5*1) - 7 = +3. (Output: 1 / Yes)
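The whole voting machine fits in a few lines of Python (a sketch of the idea, not a library implementation):

```python
# A perceptron: compute z = w.x + b, then apply a step rule (1 if z > 0 else 0).
def perceptron(x, w, b):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if z > 0 else 0

w, b = [5, 5], -7                    # the "go outside?" voting machine
print(perceptron([1, 0], w, b))      # 0 -> sunny but Monday: stay in
print(perceptron([1, 1], w, b))      # 1 -> sunny and Sunday: go out
```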
Answer
Yes, as long as you have 100 weights!
4. Activation Functions
In the previous step, we decided "Yes" or "No". But real life is often more "maybe" or "partially".
1. Step Function
Intuition: A light switch. Either 0 or 1. No middle ground.
2. Sigmoid
Intuition: An "S" shaped curve. It squashes any number into a range between 0 and 1. It represents probability.
3. ReLU (Rectified Linear Unit)
Intuition: "If it's negative, ignore it. If it's positive, keep it as it is." Most popular in Deep Learning today.
Step says: 0
Sigmoid says: ~0.006
ReLU says: 0
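The three rules side by side as a small Python sketch (the input values in the loop are just illustrative):

```python
import math

def step(z):     # light switch: 0 or 1, no middle ground
    return 1 if z > 0 else 0

def sigmoid(z):  # S-shaped curve: squashes any number into (0, 1)
    return 1 / (1 + math.exp(-z))

def relu(z):     # keep positives as they are, zero out negatives
    return max(0.0, z)

for z in (-5, 0.5, 10):
    print(z, step(z), round(sigmoid(z), 3), relu(z))
```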
Answer
10 (ReLU doesn't change positive numbers).
5. Linear vs Non-linear Models
Why do we need Sigmoid or ReLU? Why not just stick to z = wx + b?
Visual Intuition
Linear: Imagine a straight ruler. You can only draw straight lines. If your data is in a circle, a ruler cannot separate the inside from the outside.
Non-linear: Imagine a piece of flexible wire. You can bend it to wrap around complex shapes.
The Secret of Deep Learning
By stacking many neurons with Activation Functions, we can create complex "bends" in the data. Without them, 1000 layers of neurons would still just be one big straight line.
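A tiny NumPy check of that claim (the numbers are made up; NumPy is assumed to be available): two linear layers with no activation in between collapse into one linear layer.

```python
import numpy as np

# Two stacked linear layers WITHOUT an activation in between...
W1, b1 = np.array([[2.0, 1.0], [0.0, 3.0]]), np.array([1.0, -1.0])
W2, b2 = np.array([[1.0, -1.0]]), np.array([0.5])

x = np.array([1.0, 2.0])
two_layers = W2 @ (W1 @ x + b1) + b2

# ...collapse into a single linear layer W x + b:
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b

print(two_layers, one_layer)  # identical: still just one straight line
```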
Answer
No, it remains a linear model.
6. AND, OR, XOR Logic Problems
Let's see how our Perceptron handles logic. We have two inputs (A, B) which can be 0 or 1.
AND Gate
Only True if both are 1. Weights [1, 1], Bias -1.5.
- (0,0) -> 0+0 - 1.5 = -1.5 (False)
- (1,1) -> 1+1 - 1.5 = 0.5 (True)
OR Gate
True if at least one is 1. Weights [1, 1], Bias -0.5.
- (1,0) -> 1+0 - 0.5 = 0.5 (True)
- (0,0) -> 0+0 - 0.5 = -0.5 (False)
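Plugging those weights into the same perceptron sketch from earlier confirms both truth tables (a quick check, not a library implementation):

```python
# Reusing the perceptron from above: z = w.x + b, output 1 if z > 0 else 0.
def perceptron(x, w, b):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

for a, bit in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    AND = perceptron([a, bit], [1, 1], -1.5)
    OR  = perceptron([a, bit], [1, 1], -0.5)
    print(f"A={a} B={bit}  AND={AND}  OR={OR}")
```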
XOR Gate (The Trouble Maker)
True only if inputs are different. (1,0) or (0,1).
Try to find weights and a bias for XOR... you will fail!
Answer
It has a lower "threshold" (bias), so even one '1' can trigger it.
7. Why XOR Fails in Perceptron
This is a famous moment in AI history. In 1969, it was proved that a single Perceptron cannot solve XOR.
Visual Intuition
Imagine 4 dots on a square:
- (0,0) and (1,1) are labeled BLUE
- (1,0) and (0,1) are labeled RED
Try to draw one single straight line that keeps all REDs on one side and all BLUEs on the other. You can't!
Mathematical Proof (Intuition)
For XOR:
- (1) w1(0) + w2(0) + b < 0 (so b < 0)
- (2) w1(1) + w2(0) + b > 0 (so w1 + b > 0)
- (3) w1(0) + w2(1) + b > 0 (so w2 + b > 0)
- (4) w1(1) + w2(1) + b < 0 (so w1 + w2 + b < 0)
Add (2) and (3): w1 + w2 + 2b > 0. Since (1) says b < 0, this forces w1 + w2 + b > -b > 0. But (4) says w1 + w2 + b < 0. Both cannot be true. Contradiction!
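If you prefer a numerical hint rather than algebra, here is a brute-force sketch that tries a coarse grid of weights and biases and never finds one that reproduces XOR (a demonstration, not a formal proof):

```python
import itertools

# Try every (w1, w2, b) on a coarse grid with a step rule: output 1 if z > 0.
values = [x / 2 for x in range(-10, 11)]   # -5.0 .. 5.0 in steps of 0.5
xor_table = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

found = False
for w1, w2, b in itertools.product(values, repeat=3):
    if all((1 if w1 * a + w2 * c + b > 0 else 0) == target
           for (a, c), target in xor_table.items()):
        found = True
        break

print(found)  # False -> no single straight line separates XOR
```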
Answer
By using more than one neuron (Hidden Layers)!
8. Multi-Layer Perceptron (MLP)
If one neuron can't solve XOR, let's use a team of neurons!
The Architecture
- Input Layer: Where data enters.
- Hidden Layer: Where "thinking" happens. Neurons here transform the data into a new space where it is linearly separable.
- Output Layer: The final answer.
Visual Intuition
One neuron draws one line. A hidden layer with 3 neurons draws 3 lines. By combining these 3 lines, we can "fence in" a specific area of the graph, solving complex problems like XOR.
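Here is a hand-wired Python sketch of that idea (the weights are picked by hand purely for illustration, not learned). Two hidden neurons are already enough for XOR: one acts like OR, one like AND, and the output keeps "OR but not AND".

```python
# A tiny hand-wired MLP for XOR (weights chosen by hand, not learned).
def step(z):
    return 1 if z > 0 else 0

def xor_mlp(a, b):
    h1 = step(a + b - 0.5)        # hidden neuron 1: acts like OR
    h2 = step(a + b - 1.5)        # hidden neuron 2: acts like AND
    return step(h1 - h2 - 0.5)    # output: OR but not AND = XOR

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor_mlp(a, b))    # 0, 1, 1, 0
```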
Answer
YES! Without them, multiple layers are no better than one layer.
9. Forward Propagation
This is how data travels from the input to the output.
The Flow
- Input (x) enters.
- Hidden Layer: Calculates z1 = W1x + b1, then applies a1 = Activation(z1).
- Output Layer: Uses the hidden layer's output as its input! z2 = W2a1 + b2, then y_pred = Activation(z2).
Mathematical Example
Input x = 1. Hidden weight w = 2, bias = 0.
z = 1*2 = 2.
ReLU(2) = 2.
Output Weight w=0.5, bias=0.
Final z = 2 * 0.5 = 1.
Output = 1
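The same forward pass as a small Python sketch:

```python
def relu(z):
    return max(0.0, z)

def forward(x, w1, b1, w2, b2):
    z1 = w1 * x + b1        # hidden layer: linear step
    a1 = relu(z1)           # hidden layer: activation
    z2 = w2 * a1 + b2       # output layer uses a1 as its input
    return z2

print(forward(x=1, w1=2, b1=0, w2=0.5, b2=0))  # 1.0
```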
10. Loss Functions
How does the computer know if it's doing a good job? We need a "Scorecard."
1. Mean Squared Error (MSE)
Used for Regression (predicting numbers like house prices).
2. Cross Entropy
Used for Classification (Cats vs Dogs). It measures how "far away" your predicted probability is from the truth (0 or 1).
Why square the error?
If we just subtract (Pred - Actual), a positive error and a negative error might cancel out. Squaring makes all errors positive and punishes big mistakes much harder than small ones!
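A small sketch of both scorecards in Python (the binary cross-entropy version handles a single prediction; the function names are mine, not a library's):

```python
import math

def mse(y_pred, y_true):                 # regression scorecard
    return sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / len(y_true)

def binary_cross_entropy(p, label):      # classification scorecard (one example)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

print(round(mse([0.8], [1.0]), 4))             # 0.04
print(round(binary_cross_entropy(0.8, 1), 3))  # ~0.223
```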
Answer
(0.8 - 1.0)² = (-0.2)² = 0.04
11. Gradient Descent
If the Loss is high, how do we fix the weights? We "walk down the hill."
Intuition
Imagine you are on a mountain in the fog. You want to find the valley (lowest loss). You feel the slope with your foot and take a step in the opposite direction of the slope.
The Update Rule
New Weight = Old Weight - (Learning Rate * Gradient)
Gradient: The slope (derivative).
Learning Rate: The size of your step. Small step = slow but safe. Large step = fast but might overstep the valley.
Example, with old weight 5.0, gradient 2.0 and learning rate 0.1:
New Weight = 5.0 - (0.1 * 2.0) = 4.8
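The same update as a short Python loop, on a made-up toy loss whose slope at w = 5.0 happens to be 2.0:

```python
# Toy example: minimise loss(w) = (w - 4)^2, whose slope is d(loss)/dw = 2*(w - 4).
w, learning_rate = 5.0, 0.1

for step in range(4):
    gradient = 2 * (w - 4)            # at w = 5.0 the gradient is 2.0
    w = w - learning_rate * gradient  # walk opposite the slope
    print(round(w, 3))
# 4.8, 4.64, 4.512, 4.41 -> each step slides closer to the valley at w = 4
```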
12. Backpropagation
This is the "magic" of Deep Learning. It's how we tell the hidden layers they made a mistake.
Simple Intuition
Think of it as "Assigning Blame." The output layer says: "I was wrong by 0.5. It's mostly the fault of Neuron B in the hidden layer." Then Neuron B looks at its inputs and says: "Okay, then it's mostly the fault of Input 1."
The Math (Chain Rule)
We use the Chain Rule from calculus to calculate how much the Loss changes when a specific weight changes.
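A hand-worked chain-rule example in Python, reusing the numbers from the forward-pass example (x = 1, hidden weight 2, output weight 0.5) and assuming a made-up target of 2.0:

```python
# Chain rule on a tiny network: x -> (w1) -> h -> (w2) -> y_pred, loss = (y_pred - y)^2.
x, y = 1.0, 2.0
w1, w2 = 2.0, 0.5

h = w1 * x
y_pred = w2 * h
loss = (y_pred - y) ** 2

# Blame flows backwards, one local derivative at a time:
dloss_dpred = 2 * (y_pred - y)      # how loss changes with the prediction
dpred_dw2   = h                     # how the prediction changes with w2
dpred_dh    = w2                    # ...and with the hidden value
dh_dw1      = x                     # how the hidden value changes with w1

dloss_dw2 = dloss_dpred * dpred_dw2            # chain rule, output layer
dloss_dw1 = dloss_dpred * dpred_dh * dh_dw1    # chain rule, hidden layer

print(dloss_dw2, dloss_dw1)   # -4.0, -1.0
```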
13. Convolutional Neural Networks (CNN)
MLPs are great, but they are bad at images. Why? Because if you move a cat 1 pixel to the right, an MLP thinks it's a totally different object. CNNs solve this.
The Convolution Operation
Think of a Flashlight (Filter/Kernel) scanning an image. It looks for small patterns (edges, circles, eyes).
Components:
- Filters: Small squares (e.g. 3x3) that slide over the image.
- Feature Map: The result of the scan, showing where certain patterns were found.
- Pooling: Shrinking the image. "In this 2x2 area, what was the strongest signal?" (Max Pooling). This helps the network be less sensitive to the exact location.
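A bare-bones NumPy sketch of one filter scan followed by max pooling (the tiny image and the kernel values are invented just to show an edge being found):

```python
import numpy as np

# A tiny 5x5 "image": dark on the left, bright on the right.
image = np.array([[0, 0, 1, 1, 1]] * 5)

# A 2x2 filter that reacts to dark-to-bright vertical edges.
kernel = np.array([[-1, 1],
                   [-1, 1]])

# Convolution (stride 1, no padding): slide the filter, multiply, and sum.
h, w = image.shape
kh, kw = kernel.shape
feature_map = np.array([[np.sum(image[i:i+kh, j:j+kw] * kernel)
                         for j in range(w - kw + 1)]
                        for i in range(h - kh + 1)])
print(feature_map)       # strongest responses line up with the edge column

# 2x2 max pooling: keep only the strongest signal in each 2x2 block.
pooled = np.array([[feature_map[i:i+2, j:j+2].max()
                    for j in range(0, feature_map.shape[1] - 1, 2)]
                   for i in range(0, feature_map.shape[0] - 1, 2)])
print(pooled)            # smaller map, edge still detected
```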
14. MNIST Classification: Full Pipeline
Let's put it all together to recognize a handwritten digit "7".
Step-by-Step:
- Input: A 28x28 pixel image (784 numbers).
- Convolution: 32 filters scan the image. They find horizontal and vertical lines.
- ReLU: Any negative values from the scan are turned to zero.
- Pooling: The 28x28 map is shrunk to 14x14 to focus on the "essence" of the shape.
- Flatten: The 2D square is stretched into one long list of numbers.
- Dense Layer: A standard MLP looks at these pattern-scores.
- Output (Softmax): 10 neurons (representing digits 0 to 9). The neuron for "7" gets the highest score!
How it learns:
If the network guesses "1" but the label is "7", Loss is calculated. Backpropagation goes through the layers, and Gradient Descent tweaks the filters so next time they recognize that specific curve of a "7" better.
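To see how these steps map onto real code, here is one possible sketch using Keras (assuming TensorFlow is installed; the layer sizes are illustrative, not tuned, and other frameworks would work just as well):

```python
import tensorflow as tf

# Load MNIST: 28x28 grayscale digits, scaled to 0..1, with a channel dimension.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0
x_test = x_test[..., None] / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), padding="same", activation="relu",
                           input_shape=(28, 28, 1)),   # Convolution + ReLU
    tf.keras.layers.MaxPooling2D((2, 2)),              # Pooling: 28x28 -> 14x14
    tf.keras.layers.Flatten(),                         # Flatten to one long list
    tf.keras.layers.Dense(64, activation="relu"),      # Dense layer (64 is an illustrative choice)
    tf.keras.layers.Dense(10, activation="softmax"),   # Output: digits 0-9
])

# Loss, Backpropagation and Gradient Descent are all handled by compile/fit.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1, validation_data=(x_test, y_test))
```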