Perceptron
Understanding the fundamental building block of neural networks
Imagine a simple decision maker that takes multiple inputs and decides 'yes' or 'no'. For example, deciding whether to go outside: if it's sunny (input 1), not raining (input 2), and warm (input 3), you go outside (output = yes). The perceptron works the same way: it is the simplest artificial neuron, taking multiple inputs, weighing their importance, summing them up, and making a binary decision. It's like a tiny brain cell that learns from its mistakes!
What is a Perceptron?
The perceptron is the simplest type of artificial neural network, invented by Frank Rosenblatt in 1958. It's a binary linear classifier that takes multiple inputs, multiplies each by a weight, sums them together with a bias term, and passes the result through an activation function to produce a binary output (0 or 1). The perceptron learns by adjusting its weights based on errors, making it the foundation for understanding modern deep learning.
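In symbols, for an input vector x = (x₁, ..., xₙ) with weights w = (w₁, ..., wₙ) and bias b, the perceptron computes

$$\hat{y} = f\left(\sum_{i=1}^{n} w_i x_i + b\right), \qquad f(z) = \begin{cases} 1 & \text{if } z \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

The implementation below follows this definition line by line.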
```python
# Simple Perceptron Implementation
import numpy as np

class Perceptron:
    """
    Single neuron binary classifier (the simplest neural network)

    Components:
    - Weights (w): Importance of each input feature
    - Bias (b): Shifts decision boundary
    - Activation: Step function for binary output
    """
    def __init__(self, learning_rate=0.1, n_iterations=100):
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations
        self.weights = None
        self.bias = None

    def step_function(self, z):
        """Activation function: returns 1 if z >= 0, else 0"""
        return np.where(z >= 0, 1, 0)

    def fit(self, X, y):
        """
        Train the perceptron using labeled data

        Learning Rule (Rosenblatt's Rule):
        If prediction is wrong:
            w_new = w_old + learning_rate * (target - predicted) * input
            b_new = b_old + learning_rate * (target - predicted)
        """
        n_samples, n_features = X.shape

        # Initialize weights and bias to zeros
        self.weights = np.zeros(n_features)
        self.bias = 0

        # Training loop
        for iteration in range(self.n_iterations):
            errors = 0
            for i in range(n_samples):
                # Forward pass: compute weighted sum
                linear_output = np.dot(X[i], self.weights) + self.bias
                # Apply activation function
                y_predicted = self.step_function(linear_output)

                # Update weights if prediction is wrong
                error = y[i] - y_predicted
                if error != 0:
                    # Weight update rule
                    self.weights += self.learning_rate * error * X[i]
                    self.bias += self.learning_rate * error
                    errors += 1

            print(f"Iteration {iteration + 1}: {errors} errors")

            # Stop if no errors (converged)
            if errors == 0:
                print(f"Converged after {iteration + 1} iterations!")
                break

    def predict(self, X):
        """Make predictions on new data"""
        linear_output = np.dot(X, self.weights) + self.bias
        return self.step_function(linear_output)


# EXAMPLE: Learning the AND logic gate
print("=" * 60)
print("TRAINING PERCEPTRON ON AND GATE")
print("=" * 60)

# AND gate truth table
X_and = np.array([
    [0, 0],  # 0 AND 0 = 0
    [0, 1],  # 0 AND 1 = 0
    [1, 0],  # 1 AND 0 = 0
    [1, 1]   # 1 AND 1 = 1
])
y_and = np.array([0, 0, 0, 1])

# Create and train perceptron
perceptron = Perceptron(learning_rate=0.1, n_iterations=10)
perceptron.fit(X_and, y_and)

# Test predictions
print("\nTesting AND gate:")
for inputs, expected in zip(X_and, y_and):
    prediction = perceptron.predict([inputs])[0]
    print(f"{inputs[0]} AND {inputs[1]} = {prediction} (expected: {expected})")

print(f"\nLearned weights: {perceptron.weights}")
print(f"Learned bias: {perceptron.bias}")
```

Perceptron Architecture
The perceptron consists of several key components, listed below with a worked numerical example after the list:
1. Inputs (x₁, x₂, ..., xₙ)
Feature values fed into the perceptron. Each input represents a dimension of the data.
2. Weights (w₁, w₂, ..., wₙ)
Parameters that determine the importance of each input. Learned during training.
3. Bias (b)
Shifts the decision boundary so it does not have to pass through the origin, letting the perceptron fit a wider range of data.
4. Summation (Σ)
Weighted sum: z = w₁x₁ + w₂x₂ + ... + wₙxₙ + b
5. Activation Function
Step function: f(z) = 1 if z ≥ 0, else 0. Creates binary decision.
6. Output (ŷ)
Binary classification: 0 or 1, representing predicted class.
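Putting the components together with the example values used in the code below (x = (0.8, 0.6), w = (0.5, 0.3), b = -0.4):

$$z = (0.5)(0.8) + (0.3)(0.6) + (-0.4) = 0.40 + 0.18 - 0.40 = 0.18, \qquad \hat{y} = f(0.18) = 1$$

Because z is non-negative, the step function fires and the perceptron outputs class 1.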
```python
# Detailed Perceptron Architecture Visualization
import numpy as np

def perceptron_forward_pass(inputs, weights, bias):
    """
    Demonstrates each step in perceptron computation

    Architecture:
    [Input Layer] → [Weighted Sum] → [Activation] → [Output]
    """
    print("PERCEPTRON FORWARD PASS")
    print("=" * 60)

    # Step 1: Display inputs
    print(f"Inputs (x):  {inputs}")
    # Step 2: Display weights
    print(f"Weights (w): {weights}")
    # Step 3: Display bias
    print(f"Bias (b):    {bias}")

    # Step 4: Compute weighted sum
    weighted_sum = np.dot(inputs, weights) + bias
    print("\nWeighted Sum (z): ", end="")
    for i in range(len(inputs)):
        if i > 0:
            print(" + ", end="")
        print(f"({weights[i]} × {inputs[i]})", end="")
    print(f" + {bias}")
    print(f"  z = {weighted_sum:.3f}")

    # Step 5: Apply activation function (step function)
    output = 1 if weighted_sum >= 0 else 0
    print("\nActivation (step function):")
    print("  if z >= 0: output = 1")
    print("  else:      output = 0")
    print(f"\nOutput: {output}")
    return output


# Example: 2-input perceptron
inputs = np.array([0.8, 0.6])
weights = np.array([0.5, 0.3])
bias = -0.4
output = perceptron_forward_pass(inputs, weights, bias)

# Visualize decision boundary
print("\n" + "=" * 60)
print("DECISION BOUNDARY")
print("=" * 60)
print("The perceptron creates a linear decision boundary:")
print("  w₁x₁ + w₂x₂ + b = 0")
print(f"  {weights[0]}x₁ + {weights[1]}x₂ + {bias} = 0")
print("\nRearranged as line equation:")
x2_intercept = -bias / weights[1]
slope = -weights[0] / weights[1]
print(f"  x₂ = {slope:.2f}x₁ + {x2_intercept:.2f}")
print("\nPoints above this line: class 1")
print("Points below this line: class 0")
```

Perceptron Learning Algorithm
How the perceptron learns from labeled training data:
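For each training example (x, y) with prediction ŷ and learning rate α, the parameters change only when the prediction is wrong:

$$w \leftarrow w + \alpha\,(y - \hat{y})\,x, \qquad b \leftarrow b + \alpha\,(y - \hat{y})$$

If y = 1 but ŷ = 0, the update pushes the weights toward x; if y = 0 but ŷ = 1, it pushes them away; correct predictions leave the parameters untouched. The walkthrough below prints every one of these updates.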
```python
# Perceptron Learning Algorithm - Step by Step
import numpy as np

def train_perceptron_detailed(X, y, learning_rate=0.1, max_iterations=20):
    """
    Perceptron learning algorithm with detailed output

    Algorithm:
    1. Initialize weights and bias to zero (or small random values)
    2. For each training example:
       a. Compute prediction: ŷ = step(wᵀx + b)
       b. Calculate error:    e = y - ŷ
       c. Update weights:     w ← w + α × e × x
       d. Update bias:        b ← b + α × e
    3. Repeat until convergence or max iterations
    """
    n_samples, n_features = X.shape

    # Step 1: Initialize parameters
    weights = np.zeros(n_features)
    bias = 0

    print("PERCEPTRON LEARNING ALGORITHM")
    print("=" * 60)
    print(f"Initial weights: {weights}")
    print(f"Initial bias: {bias}")
    print(f"Learning rate: {learning_rate}")
    print("=" * 60)

    # Training loop
    for iteration in range(max_iterations):
        print(f"\n--- ITERATION {iteration + 1} ---")
        total_errors = 0

        for i, (x, target) in enumerate(zip(X, y)):
            # Step 2a: Forward pass - compute prediction
            z = np.dot(weights, x) + bias
            prediction = 1 if z >= 0 else 0

            # Step 2b: Calculate error
            error = target - prediction

            print(f"\nSample {i+1}: x={x}, target={target}")
            print(f"  Weighted sum: z = {z:.3f}")
            print(f"  Prediction:   ŷ = {prediction}")
            print(f"  Error:        e = {target} - {prediction} = {error}")

            # Steps 2c & 2d: Update weights and bias if prediction is wrong
            if error != 0:
                # Weight update: w_new = w_old + learning_rate * error * input
                weight_update = learning_rate * error * x
                bias_update = learning_rate * error

                print("  ⚠️ WRONG! Updating parameters...")
                print(f"  Weight change: Δw = {learning_rate} × {error} × {x} = {weight_update}")
                print(f"  Bias change:   Δb = {learning_rate} × {error} = {bias_update}")

                weights += weight_update
                bias += bias_update

                print(f"  New weights: {weights}")
                print(f"  New bias:    {bias:.3f}")
                total_errors += 1
            else:
                print("  ✓ Correct prediction!")

        print(f"\nIteration {iteration + 1} complete: {total_errors} errors")

        # Step 3: Check convergence
        if total_errors == 0:
            print(f"\n🎉 CONVERGED after {iteration + 1} iterations!")
            break

    return weights, bias


# EXAMPLE: Learning OR gate
print("\nTRAINING EXAMPLE: OR LOGIC GATE")
print("=" * 60)
X_or = np.array([
    [0, 0],  # 0 OR 0 = 0
    [0, 1],  # 0 OR 1 = 1
    [1, 0],  # 1 OR 0 = 1
    [1, 1]   # 1 OR 1 = 1
])
y_or = np.array([0, 1, 1, 1])

learned_weights, learned_bias = train_perceptron_detailed(X_or, y_or, learning_rate=0.1)

# Test learned model
print("\n" + "=" * 60)
print("TESTING LEARNED MODEL")
print("=" * 60)
for x, expected in zip(X_or, y_or):
    z = np.dot(learned_weights, x) + learned_bias
    prediction = 1 if z >= 0 else 0
    status = "✓" if prediction == expected else "✗"
    print(f"{x[0]} OR {x[1]} = {prediction} (expected {expected}) {status}")

# Understanding the learned boundary
print("\n" + "=" * 60)
print("LEARNED DECISION BOUNDARY")
print("=" * 60)
print(f"Equation: {learned_weights[0]:.2f}x₁ + {learned_weights[1]:.2f}x₂ + {learned_bias:.2f} = 0")
print("This line separates class 0 (below) from class 1 (above)")
```

Limitations of the Perceptron
Understanding what perceptrons can and cannot do:
⚠️ The XOR Problem: Why a Single Perceptron Fails
The XOR (exclusive OR) function cannot be learned by a single perceptron because it's not linearly separable. You cannot draw a single straight line to separate the two classes.
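A short argument makes this precise. Suppose weights w₁, w₂ and bias b implemented XOR with the step activation. The four truth-table rows would require

$$b < 0, \qquad w_2 + b \ge 0, \qquad w_1 + b \ge 0, \qquad w_1 + w_2 + b < 0$$

Adding the two middle inequalities gives w₁ + w₂ + 2b ≥ 0, so w₁ + w₂ + b ≥ -b > 0, which contradicts the last inequality. No choice of weights and bias works.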
```python
# The XOR Problem - Perceptron's Limitation
import numpy as np

class Perceptron:
    def __init__(self, learning_rate=0.1, max_iterations=100):
        self.lr = learning_rate
        self.max_iter = max_iterations
        self.weights = None
        self.bias = None

    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0
        for iteration in range(self.max_iter):
            errors = 0
            for i in range(n_samples):
                z = np.dot(X[i], self.weights) + self.bias
                y_pred = 1 if z >= 0 else 0
                error = y[i] - y_pred
                if error != 0:
                    self.weights += self.lr * error * X[i]
                    self.bias += self.lr * error
                    errors += 1
            if errors == 0:
                return True, iteration + 1
        return False, self.max_iter

    def predict(self, X):
        z = np.dot(X, self.weights) + self.bias
        return 1 if z >= 0 else 0


# Test on linearly separable problems (WORKS)
print("=" * 60)
print("TESTING ON LINEARLY SEPARABLE PROBLEMS")
print("=" * 60)

# AND gate - linearly separable
X_and = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_and = np.array([0, 0, 0, 1])
perceptron_and = Perceptron()
converged, iterations = perceptron_and.fit(X_and, y_and)
print(f"\nAND gate: {'✓ CONVERGED' if converged else '✗ FAILED'} in {iterations} iterations")
for x in X_and:
    print(f"  {x[0]} AND {x[1]} = {perceptron_and.predict(x)}")

# OR gate - linearly separable
X_or = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_or = np.array([0, 1, 1, 1])
perceptron_or = Perceptron()
converged, iterations = perceptron_or.fit(X_or, y_or)
print(f"\nOR gate: {'✓ CONVERGED' if converged else '✗ FAILED'} in {iterations} iterations")
for x in X_or:
    print(f"  {x[0]} OR {x[1]} = {perceptron_or.predict(x)}")

# XOR gate - NOT linearly separable (FAILS!)
print("\n" + "=" * 60)
print("TESTING ON NON-LINEARLY SEPARABLE PROBLEM")
print("=" * 60)
X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_xor = np.array([0, 1, 1, 0])  # XOR pattern
perceptron_xor = Perceptron(max_iterations=1000)
converged, iterations = perceptron_xor.fit(X_xor, y_xor)
print(f"\nXOR gate: {'✓ CONVERGED' if converged else '✗ FAILED'} after {iterations} iterations")
print("\nPredictions (likely wrong):")
for x, expected in zip(X_xor, y_xor):
    pred = perceptron_xor.predict(x)
    status = "✓" if pred == expected else "✗"
    print(f"  {x[0]} XOR {x[1]} = {pred} (expected {expected}) {status}")

# Explanation
print("\n" + "=" * 60)
print("WHY XOR CANNOT BE SOLVED BY SINGLE PERCEPTRON")
print("=" * 60)
print("\nXOR Truth Table:")
print("  (0,0) → 0 ┐")
print("  (0,1) → 1 │ Cannot separate with single line!")
print("  (1,0) → 1 │")
print("  (1,1) → 0 ┘")
print("\nVisualize the points:")
print("  Class 0: (0,0) bottom-left, (1,1) top-right")
print("  Class 1: (0,1) top-left, (1,0) bottom-right")
print("  → No single straight line separates these!")
print("\nSOLUTION: Use Multi-Layer Perceptron (MLP)")
print("  - Multiple perceptrons in layers")
print("  - Hidden layers create non-linear boundaries")
print("  - XOR can be solved with 1 hidden layer (2 neurons)")

# Other limitations
print("\n" + "=" * 60)
print("OTHER PERCEPTRON LIMITATIONS")
print("=" * 60)
print("1. Only binary classification (0 or 1)")
print("2. Only linear decision boundaries")
print("3. No probabilistic outputs (unlike logistic regression)")
print("4. Sensitive to feature scaling")
print("5. Cannot learn complex patterns")
print("6. May never converge if data not linearly separable")
print("\nHOWEVER, perceptrons are:")
print("✓ Foundation for understanding neural networks")
print("✓ Fast and simple for linearly separable problems")
print("✓ Historically important (first 'learning' algorithm)")
```

Key Concepts
Linear Separator
The perceptron can only learn linearly separable patterns: those that can be divided by a straight line (in 2D) or a hyperplane (in higher dimensions). This is why it cannot solve the XOR problem.
Weights and Bias
Weights determine the importance of each input, while the bias shifts the decision boundary. Learning means adjusting these values to reduce classification errors.
Activation Function
The step (Heaviside) function outputs 1 if the weighted sum is at or above the threshold of 0, and 0 otherwise. This hard cutoff is what makes the output a binary decision.
Convergence Theorem
The perceptron is guaranteed to converge (find a separating set of weights) in a finite number of updates if the data is linearly separable. There is no such guarantee for data that is not linearly separable.
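The bound itself is not stated above, but a standard form of the guarantee (Novikoff, 1962) reads: with labels taken as ±1, the bias folded into the weights via a constant input feature, every input satisfying ‖xᵢ‖ ≤ R, and some unit-norm weight vector separating the data with margin γ > 0, the perceptron makes at most

$$\left(\frac{R}{\gamma}\right)^{2}$$

mistakes (weight updates) before it stops changing.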
Interview Tips
- 💡Explain the perceptron formula: output = step(w₁x₁ + w₂x₂ + ... + wₙxₙ + b), where step(z) = 1 if z ≥ 0, else 0
- 💡Know the learning rule: If prediction is wrong, update weights: w_new = w_old + learning_rate × (target - predicted) × input
- 💡Be ready to explain why perceptron cannot solve XOR: XOR is not linearly separable (cannot draw single line to separate classes)
- 💡Discuss the perceptron convergence theorem: guaranteed to find solution if data is linearly separable, may never converge otherwise
- 💡Explain the difference between perceptron and logistic regression: perceptron uses step function, logistic uses sigmoid; logistic gives probabilities
- 💡Know the historical significance: the perceptron (1958) contributed to the first AI winter after its limitations were exposed (Minsky and Papert, 1969), yet it inspired modern neural networks
- 💡Understand single-layer vs multi-layer perceptrons: single layer can only learn linear boundaries, multiple layers (MLPs) can learn non-linear patterns
- 💡Be able to walk through a training example: show how the weights update when the perceptron makes a mistake on an AND or OR gate; a minimal sketch follows this list
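As a study aid for the last two tips, here is a minimal sketch (hypothetical values, not taken from the examples above) that hand-traces one weight update on a single AND-gate sample and contrasts the hard step output with a sigmoid probability:

```python
# Minimal sketch: one perceptron update by hand, plus a sigmoid comparison.
import numpy as np

w = np.array([0.0, 0.0])          # weights start at zero
b = 0.0                           # bias starts at zero
lr = 0.1                          # learning rate (alpha)

x, target = np.array([0, 1]), 0   # AND gate: 0 AND 1 should be 0

# Forward pass with the step activation
z = np.dot(w, x) + b              # z = 0.0
y_hat = 1 if z >= 0 else 0        # step(0) = 1, which is wrong (target is 0)

# Perceptron rule: w <- w + lr*(target - y_hat)*x,  b <- b + lr*(target - y_hat)
error = target - y_hat            # -1
w = w + lr * error * x            # becomes [0.0, -0.1]
b = b + lr * error                # becomes -0.1
print(f"error={error}, new weights={w}, new bias={b}")

# Contrast with logistic regression: a sigmoid returns a probability, not a hard 0/1
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))
print(f"step({z}) = {y_hat}, sigmoid({z}) = {sigmoid(z):.2f}  (probability of class 1)")
```

Repeating this update over all four AND-gate rows is exactly what the fit loops earlier on this page do automatically.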