Perceptron

Understanding the fundamental building block of neural networks

Imagine a simple decision maker that takes multiple inputs and decides 'yes' or 'no'. For example, deciding whether to go outside: if it's sunny (input 1), not raining (input 2), and warm (input 3), you go outside (output = yes). The perceptron works the same way - it's the simplest artificial neuron that takes multiple inputs, weighs their importance, sums them up, and makes a binary decision. It's like a tiny brain cell that learns from mistakes!

What is a Perceptron?

The perceptron is the simplest type of artificial neural network, invented by Frank Rosenblatt in 1958. It's a binary linear classifier that takes multiple inputs, multiplies each by a weight, sums them together with a bias term, and passes the result through an activation function to produce a binary output (0 or 1). The perceptron learns by adjusting its weights based on errors, making it the foundation for understanding modern deep learning.

python
# Simple Perceptron Implementation
import numpy as np

class Perceptron:
    """
    Single-neuron binary classifier (the simplest neural network)

    Components:
    - Weights (w): importance of each input feature
    - Bias (b): shifts the decision boundary
    - Activation: step function for binary output
    """
    def __init__(self, learning_rate=0.1, n_iterations=100):
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations
        self.weights = None
        self.bias = None

    def step_function(self, z):
        """Activation function: returns 1 if z >= 0, else 0"""
        return np.where(z >= 0, 1, 0)

    def fit(self, X, y):
        """
        Train the perceptron using labeled data

        Learning rule (Rosenblatt's rule), applied when a prediction is wrong:
            w_new = w_old + learning_rate * (target - predicted) * input
            b_new = b_old + learning_rate * (target - predicted)
        """
        n_samples, n_features = X.shape
        # Initialize weights and bias to zeros
        self.weights = np.zeros(n_features)
        self.bias = 0.0
        # Training loop
        for iteration in range(self.n_iterations):
            errors = 0
            for i in range(n_samples):
                # Forward pass: compute weighted sum
                linear_output = np.dot(X[i], self.weights) + self.bias
                # Apply activation function
                y_predicted = self.step_function(linear_output)
                # Update weights if prediction is wrong
                error = y[i] - y_predicted
                if error != 0:
                    # Weight update rule
                    self.weights += self.learning_rate * error * X[i]
                    self.bias += self.learning_rate * error
                    errors += 1
            print(f"Iteration {iteration + 1}: {errors} errors")
            # Stop if no errors (converged)
            if errors == 0:
                print(f"Converged after {iteration + 1} iterations!")
                break

    def predict(self, X):
        """Make predictions on new data"""
        linear_output = np.dot(X, self.weights) + self.bias
        return self.step_function(linear_output)

# EXAMPLE: Learning the AND logic gate
print("=" * 60)
print("TRAINING PERCEPTRON ON AND GATE")
print("=" * 60)

# AND gate truth table
X_and = np.array([
    [0, 0],  # 0 AND 0 = 0
    [0, 1],  # 0 AND 1 = 0
    [1, 0],  # 1 AND 0 = 0
    [1, 1]   # 1 AND 1 = 1
])
y_and = np.array([0, 0, 0, 1])

# Create and train perceptron
perceptron = Perceptron(learning_rate=0.1, n_iterations=10)
perceptron.fit(X_and, y_and)

# Test predictions
print("\nTesting AND gate:")
for inputs, expected in zip(X_and, y_and):
    prediction = perceptron.predict([inputs])[0]
    print(f"{inputs[0]} AND {inputs[1]} = {prediction} (expected: {expected})")

print(f"\nLearned weights: {perceptron.weights}")
print(f"Learned bias: {perceptron.bias}")

Perceptron Architecture

The perceptron consists of several key components:

1. Inputs (x₁, x₂, ..., xₙ)

Feature values fed into the perceptron. Each input represents a dimension of the data.

2. Weights (w₁, w₂, ..., wₙ)

Parameters that determine the importance of each input. Learned during training.

3. Bias (b)

Shifts the decision boundary, allowing the perceptron to fit data whose separating line doesn't pass through the origin.

4. Summation (Σ)

Weighted sum: z = w₁x₁ + w₂x₂ + ... + wₙxₙ + b

5. Activation Function

Step function: f(z) = 1 if z ≥ 0, else 0. Creates binary decision.

6. Output (ŷ)

Binary classification: 0 or 1, representing the predicted class.

python
# Detailed Perceptron Architecture Visualization
import numpy as np

def perceptron_forward_pass(inputs, weights, bias):
    """
    Demonstrates each step in the perceptron computation

    Architecture:
    [Input Layer] -> [Weighted Sum] -> [Activation] -> [Output]
    """
    print("PERCEPTRON FORWARD PASS")
    print("=" * 60)
    # Step 1: Display inputs
    print(f"Inputs (x): {inputs}")
    # Step 2: Display weights
    print(f"Weights (w): {weights}")
    # Step 3: Display bias
    print(f"Bias (b): {bias}")
    # Step 4: Compute weighted sum
    weighted_sum = np.dot(inputs, weights) + bias
    print("\nWeighted Sum (z): ", end="")
    for i in range(len(inputs)):
        if i > 0:
            print(" + ", end="")
        print(f"({weights[i]} × {inputs[i]})", end="")
    print(f" + {bias}")
    print(f"  z = {weighted_sum:.3f}")
    # Step 5: Apply activation function (step function)
    output = 1 if weighted_sum >= 0 else 0
    print("\nActivation (step function):")
    print("  if z >= 0: output = 1")
    print("  else:      output = 0")
    print(f"\nOutput: {output}")
    return output

# Example: 2-input perceptron
inputs = np.array([0.8, 0.6])
weights = np.array([0.5, 0.3])
bias = -0.4
output = perceptron_forward_pass(inputs, weights, bias)

# Visualize the decision boundary
print("\n" + "=" * 60)
print("DECISION BOUNDARY")
print("=" * 60)
print("The perceptron creates a linear decision boundary:")
print("  w₁x₁ + w₂x₂ + b = 0")
print(f"  {weights[0]}x₁ + {weights[1]}x₂ + {bias} = 0")
print("\nRearranged as a line equation:")
x2_intercept = -bias / weights[1]
slope = -weights[0] / weights[1]
print(f"  x₂ = {slope:.2f}x₁ + {x2_intercept:.2f}")
print("\nPoints above this line: class 1")
print("Points below this line: class 0")

Perceptron Learning Algorithm

How the perceptron learns from labeled training data:

python
# Perceptron Learning Algorithm - Step by Step
import numpy as np

def train_perceptron_detailed(X, y, learning_rate=0.1, max_iterations=20):
    """
    Perceptron learning algorithm with detailed output

    Algorithm:
    1. Initialize weights and bias to zero (or small random values)
    2. For each training example:
       a. Compute prediction: ŷ = step(wᵀx + b)
       b. Calculate error: e = y - ŷ
       c. Update weights: w ← w + α × e × x
       d. Update bias:    b ← b + α × e
    3. Repeat until convergence or max iterations
    """
    n_samples, n_features = X.shape
    # Step 1: Initialize parameters
    weights = np.zeros(n_features)
    bias = 0.0
    print("PERCEPTRON LEARNING ALGORITHM")
    print("=" * 60)
    print(f"Initial weights: {weights}")
    print(f"Initial bias: {bias}")
    print(f"Learning rate: {learning_rate}")
    print("=" * 60)
    # Training loop
    for iteration in range(max_iterations):
        print(f"\n--- ITERATION {iteration + 1} ---")
        total_errors = 0
        for i, (x, target) in enumerate(zip(X, y)):
            # Step 2a: Forward pass - compute prediction
            z = np.dot(weights, x) + bias
            prediction = 1 if z >= 0 else 0
            # Step 2b: Calculate error
            error = target - prediction
            print(f"\nSample {i + 1}: x={x}, target={target}")
            print(f"  Weighted sum: z = {z:.3f}")
            print(f"  Prediction: ŷ = {prediction}")
            print(f"  Error: e = {target} - {prediction} = {error}")
            # Steps 2c & 2d: Update weights and bias on a wrong prediction
            if error != 0:
                # Weight update: w_new = w_old + learning_rate * error * input
                weight_update = learning_rate * error * x
                bias_update = learning_rate * error
                print("  ⚠️ WRONG! Updating parameters...")
                print(f"  Weight change: Δw = {learning_rate} × {error} × {x} = {weight_update}")
                print(f"  Bias change: Δb = {learning_rate} × {error} = {bias_update}")
                weights += weight_update
                bias += bias_update
                print(f"  New weights: {weights}")
                print(f"  New bias: {bias:.3f}")
                total_errors += 1
            else:
                print("  Correct prediction!")
        print(f"\nIteration {iteration + 1} complete: {total_errors} errors")
        # Step 3: Check convergence
        if total_errors == 0:
            print(f"\n🎉 CONVERGED after {iteration + 1} iterations!")
            break
    return weights, bias

# EXAMPLE: Learning the OR gate
print("\nTRAINING EXAMPLE: OR LOGIC GATE")
print("=" * 60)
X_or = np.array([
    [0, 0],  # 0 OR 0 = 0
    [0, 1],  # 0 OR 1 = 1
    [1, 0],  # 1 OR 0 = 1
    [1, 1]   # 1 OR 1 = 1
])
y_or = np.array([0, 1, 1, 1])
learned_weights, learned_bias = train_perceptron_detailed(X_or, y_or, learning_rate=0.1)

# Test the learned model
print("\n" + "=" * 60)
print("TESTING LEARNED MODEL")
print("=" * 60)
for x, expected in zip(X_or, y_or):
    z = np.dot(learned_weights, x) + learned_bias
    prediction = 1 if z >= 0 else 0
    status = "✓" if prediction == expected else "✗"
    print(f"{x[0]} OR {x[1]} = {prediction} (expected {expected}) {status}")

# Understanding the learned boundary
print("\n" + "=" * 60)
print("LEARNED DECISION BOUNDARY")
print("=" * 60)
print(f"Equation: {learned_weights[0]:.2f}x₁ + {learned_weights[1]:.2f}x₂ + {learned_bias:.2f} = 0")
print("This line separates class 0 (below) from class 1 (above)")

Limitations of the Perceptron

Understanding what perceptrons can and cannot do:

⚠️ The XOR Problem - Why a Single Perceptron Fails

The XOR (exclusive OR) function cannot be learned by a single perceptron because it's not linearly separable. You cannot draw a single straight line to separate the two classes.

python
# The XOR Problem - Perceptron's Limitation
import numpy as np

class Perceptron:
    def __init__(self, learning_rate=0.1, max_iterations=100):
        self.lr = learning_rate
        self.max_iter = max_iterations
        self.weights = None
        self.bias = None

    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0.0
        for iteration in range(self.max_iter):
            errors = 0
            for i in range(n_samples):
                z = np.dot(X[i], self.weights) + self.bias
                y_pred = 1 if z >= 0 else 0
                error = y[i] - y_pred
                if error != 0:
                    self.weights += self.lr * error * X[i]
                    self.bias += self.lr * error
                    errors += 1
            if errors == 0:
                return True, iteration + 1
        return False, self.max_iter

    def predict(self, X):
        z = np.dot(X, self.weights) + self.bias
        return 1 if z >= 0 else 0

# Test on linearly separable problems (WORKS)
print("=" * 60)
print("TESTING ON LINEARLY SEPARABLE PROBLEMS")
print("=" * 60)

# AND gate - linearly separable
X_and = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_and = np.array([0, 0, 0, 1])
perceptron_and = Perceptron()
converged, iterations = perceptron_and.fit(X_and, y_and)
print(f"\nAND gate: {'✓ CONVERGED' if converged else '✗ FAILED'} in {iterations} iterations")
for x in X_and:
    print(f"  {x[0]} AND {x[1]} = {perceptron_and.predict(x)}")

# OR gate - linearly separable
X_or = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_or = np.array([0, 1, 1, 1])
perceptron_or = Perceptron()
converged, iterations = perceptron_or.fit(X_or, y_or)
print(f"\nOR gate: {'✓ CONVERGED' if converged else '✗ FAILED'} in {iterations} iterations")
for x in X_or:
    print(f"  {x[0]} OR {x[1]} = {perceptron_or.predict(x)}")

# XOR gate - NOT linearly separable (FAILS!)
print("\n" + "=" * 60)
print("TESTING ON NON-LINEARLY SEPARABLE PROBLEM")
print("=" * 60)
X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_xor = np.array([0, 1, 1, 0])  # XOR pattern
perceptron_xor = Perceptron(max_iterations=1000)
converged, iterations = perceptron_xor.fit(X_xor, y_xor)
print(f"\nXOR gate: {'✓ CONVERGED' if converged else '✗ FAILED'} after {iterations} iterations")
print("\nPredictions (likely wrong):")
for x, expected in zip(X_xor, y_xor):
    pred = perceptron_xor.predict(x)
    status = "✓" if pred == expected else "✗"
    print(f"  {x[0]} XOR {x[1]} = {pred} (expected {expected}) {status}")

# Explanation
print("\n" + "=" * 60)
print("WHY XOR CANNOT BE SOLVED BY A SINGLE PERCEPTRON")
print("=" * 60)
print("\nXOR Truth Table:")
print("  (0,0) → 0 ┐")
print("  (0,1) → 1 │  Cannot separate with a single line!")
print("  (1,0) → 1 │")
print("  (1,1) → 0 ┘")
print("\nVisualize the points:")
print("  Class 0: (0,0) bottom-left, (1,1) top-right")
print("  Class 1: (0,1) top-left, (1,0) bottom-right")
print("  No single straight line separates these!")
print("\nSOLUTION: Use a Multi-Layer Perceptron (MLP)")
print("  - Multiple perceptrons in layers")
print("  - Hidden layers create non-linear boundaries")
print("  - XOR can be solved with 1 hidden layer (2 neurons)")

# Other limitations
print("\n" + "=" * 60)
print("OTHER PERCEPTRON LIMITATIONS")
print("=" * 60)
print("1. Only binary classification (0 or 1)")
print("2. Only linear decision boundaries")
print("3. No probabilistic outputs (unlike logistic regression)")
print("4. Sensitive to feature scaling")
print("5. Cannot learn complex patterns")
print("6. May never converge if data is not linearly separable")
print("\nHOWEVER, perceptrons are:")
print("✓ Foundation for understanding neural networks")
print("✓ Fast and simple for linearly separable problems")
print("✓ Historically important (first 'learning' algorithm)")

Key Concepts

Linear Separator

The perceptron can only learn linearly separable patterns - those that can be divided by a straight line (in 2D) or a hyperplane (in higher dimensions). This is why it cannot solve the XOR problem.
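
A quick sketch of what "linearly separable" means in code, using hand-picked (not learned) weights w = [1, 1] and bias b = -1.5: this single line classifies every AND-gate input correctly, whereas no choice of w and b can do the same for XOR.

python
# Hand-picked linear separator for the AND gate
import numpy as np

w, b = np.array([1.0, 1.0]), -1.5  # the line x₁ + x₂ = 1.5
for x, target in zip([[0, 0], [0, 1], [1, 0], [1, 1]], [0, 0, 0, 1]):
    pred = int(np.dot(w, x) + b >= 0)
    print(f"{x[0]} AND {x[1]} = {pred} (expected {target})")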

Weights and Bias

Weights determine the importance of each input, while the bias shifts the decision boundary. Learning involves adjusting these values to minimize classification errors.
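
For a concrete single update using the rule above (α = 0.1, AND-gate data, starting from w = [0, 0], b = 0): on input x = [0, 0] with target 0, the weighted sum is z = 0, so the step function outputs ŷ = 1 and the error is e = 0 − 1 = −1. The update gives w ← w + 0.1 × (−1) × [0, 0] = [0, 0] and b ← 0 + 0.1 × (−1) = −0.1, shifting the boundary so that (0, 0) is no longer classified as 1.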

Activation Function

The step (Heaviside) function outputs 1 if the weighted sum is at or above the threshold, and 0 otherwise. This creates the hard binary decision.
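
Since the perceptron is often contrasted with logistic regression (see the interview tips below), here is a minimal sketch comparing the step activation with the sigmoid: the step gives a hard 0/1 decision, while the sigmoid maps the same weighted sum to a probability.

python
# Step (Heaviside) vs. sigmoid activation on the same weighted sums
import numpy as np

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
step_out = np.where(z >= 0, 1, 0)
sigmoid_out = 1 / (1 + np.exp(-z))
for zi, s, p in zip(z, step_out, sigmoid_out):
    print(f"z = {zi:+.1f}  step = {s}  sigmoid = {p:.3f}")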

Convergence Theorem

The perceptron is guaranteed to converge (find a set of weights that classifies every training example correctly) in a finite number of steps if the data is linearly separable. There is no such guarantee for non-linearly separable data.
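
As an empirical sketch of the theorem (not from the original text): the classical mistake bound states that the number of weight updates is at most (R/γ)², where R is the radius of the data and γ is the margin of some separating hyperplane. The code below checks this on the AND gate using the textbook ±1-label formulation and a hand-picked separator w* to compute a (possibly loose) margin.

python
# Empirical check of the perceptron convergence (Novikoff) bound on the AND gate
import numpy as np

# Augment inputs with a constant 1 so the bias is folded into the weights
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])  # ±1 labels for the classical formulation

# Any unit-norm separator yields a valid (possibly loose) bound (R/γ)²
w_star = np.array([1.0, 1.0, -1.5])
w_star /= np.linalg.norm(w_star)
gamma = np.min(y * (X @ w_star))        # worst-case margin under w_star
R = np.max(np.linalg.norm(X, axis=1))   # radius of the data

# Classical perceptron: update w ← w + y·x on every mistake
w = np.zeros(3)
updates = 0
for _ in range(100):  # epochs
    mistakes_this_epoch = 0
    for xi, yi in zip(X, y):
        if yi * (xi @ w) <= 0:  # misclassified (ties count as mistakes)
            w += yi * xi
            updates += 1
            mistakes_this_epoch += 1
    if mistakes_this_epoch == 0:
        break

print(f"Observed updates: {updates}")
print(f"Bound (R/γ)²:     {(R / gamma) ** 2:.1f}")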

Interview Tips

  • 💡 Explain the perceptron formula: output = step(w₁x₁ + w₂x₂ + ... + wₙxₙ + b), where step(z) = 1 if z ≥ 0, else 0
  • 💡 Know the learning rule: if a prediction is wrong, update the weights: w_new = w_old + learning_rate × (target - predicted) × input
  • 💡 Be ready to explain why a perceptron cannot solve XOR: XOR is not linearly separable (no single line can separate the classes)
  • 💡 Discuss the perceptron convergence theorem: it is guaranteed to find a solution if the data is linearly separable, and may never converge otherwise
  • 💡 Explain the difference between the perceptron and logistic regression: the perceptron uses a step function, while logistic regression uses a sigmoid and gives probabilities
  • 💡 Know the historical significance: the perceptron (1958) contributed to the first AI winter when its limitations were discovered, but it inspired modern neural networks
  • 💡 Understand single-layer vs. multi-layer perceptrons: a single layer can only learn linear boundaries, while multiple layers (MLPs) can learn non-linear patterns
  • 💡 Be able to walk through a training example: show how the weights update when the perceptron makes a mistake on the AND or OR gate