Overfitting vs Underfitting
Understanding the bias-variance tradeoff in machine learning
Imagine studying for an exam! If you memorize every single practice problem word-for-word but don't understand the concepts, you'll fail when questions are worded differently (OVERFITTING). If you only skim the material and don't study enough, you'll also fail (UNDERFITTING). The sweet spot is understanding the concepts well enough to answer any variation of the questions (GOOD FIT). That's exactly the challenge in machine learning - we want models that learn the patterns, not memorize the data!
What are Overfitting and Underfitting?
Overfitting and underfitting are the two main problems that prevent machine learning models from generalizing well to new data. They represent opposite extremes: overfitting occurs when a model is too complex and learns noise instead of patterns, while underfitting happens when a model is too simple to capture the underlying patterns. The goal is to find the right balance.
Underfitting
Model is too simple
- ❌ Poor training accuracy
- ❌ Poor test accuracy
- 📉 High bias
- 💡 Doesn't capture patterns
Good Fit ✓
Balanced complexity
- ✅ Good training accuracy
- ✅ Good test accuracy
- 📊 Low bias, low variance
- 💡 Generalizes well
Overfitting
Model is too complex
- ✅ Excellent training accuracy
- ❌ Poor test accuracy
- 📈 High variance
- 💡 Memorizes noise
# Demonstrating Overfitting vs Underfitting with Polynomial Regression
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Generate sample data: y = x^2 + noise
np.random.seed(42)
X_train = np.linspace(0, 1, 20).reshape(-1, 1)
y_train = X_train**2 + np.random.normal(0, 0.1, X_train.shape)
X_test = np.linspace(0, 1, 100).reshape(-1, 1)
y_test = X_test**2  # True function without noise

# 1. UNDERFITTING: Degree 1 (linear) - Too simple!
poly1 = PolynomialFeatures(degree=1)
X_train_poly1 = poly1.fit_transform(X_train)
X_test_poly1 = poly1.transform(X_test)
model1 = LinearRegression()
model1.fit(X_train_poly1, y_train)
train_mse1 = mean_squared_error(y_train, model1.predict(X_train_poly1))
test_mse1 = mean_squared_error(y_test, model1.predict(X_test_poly1))
print("UNDERFITTING (Degree 1):")
print(f"  Training MSE: {train_mse1:.4f}")
print(f"  Test MSE: {test_mse1:.4f}")
print("  ❌ Both errors are HIGH (can't capture quadratic pattern)\n")

# 2. GOOD FIT: Degree 2 (quadratic) - Just right!
poly2 = PolynomialFeatures(degree=2)
X_train_poly2 = poly2.fit_transform(X_train)
X_test_poly2 = poly2.transform(X_test)
model2 = LinearRegression()
model2.fit(X_train_poly2, y_train)
train_mse2 = mean_squared_error(y_train, model2.predict(X_train_poly2))
test_mse2 = mean_squared_error(y_test, model2.predict(X_test_poly2))
print("GOOD FIT (Degree 2):")
print(f"  Training MSE: {train_mse2:.4f}")
print(f"  Test MSE: {test_mse2:.4f}")
print("  ✅ Both errors are LOW and similar (generalizes well)\n")

# 3. OVERFITTING: Degree 15 - Too complex!
poly15 = PolynomialFeatures(degree=15)
X_train_poly15 = poly15.fit_transform(X_train)
X_test_poly15 = poly15.transform(X_test)
model15 = LinearRegression()
model15.fit(X_train_poly15, y_train)
train_mse15 = mean_squared_error(y_train, model15.predict(X_train_poly15))
test_mse15 = mean_squared_error(y_test, model15.predict(X_test_poly15))
print("OVERFITTING (Degree 15):")
print(f"  Training MSE: {train_mse15:.4f}")
print(f"  Test MSE: {test_mse15:.4f}")
print("  ❌ Training error is VERY LOW but Test error is HIGH")
print("  💡 Large gap = overfitting (memorized noise)")

# Output:
# UNDERFITTING (Degree 1):
#   Training MSE: 0.0523
#   Test MSE: 0.0498
#   ❌ Both errors are HIGH
#
# GOOD FIT (Degree 2):
#   Training MSE: 0.0091
#   Test MSE: 0.0001
#   ✅ Both errors are LOW and similar
#
# OVERFITTING (Degree 15):
#   Training MSE: 0.0000
#   Test MSE: 45.2310
#   ❌ Large gap = memorized training data

The Bias-Variance Tradeoff
This fundamental concept explains why overfitting and underfitting occur:
Bias (Underfitting)
Error from wrong assumptions in the learning algorithm. Model is too simple and makes systematic errors.
Characteristics:
- High training error
- High test error
- Model too simple
- Missing important features
Variance (Overfitting)
Error from sensitivity to small fluctuations in training data. Model is too complex and learns noise.
Characteristics:
- Very low training error
- High test error
- Large gap between train/test
- Model too complex
Total Error = Bias² + Variance + Irreducible Error
Goal: Minimize total error by balancing bias and variance
- Simple model: high bias, low variance
- Optimal model ✓: balanced bias and variance
- Complex model: low bias, high variance
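To make the decomposition above concrete, here is a minimal simulation sketch: refit polynomial models of several degrees on many independently drawn training sets and estimate the squared bias and variance of their predictions. The quadratic target, noise level, and number of trials are illustrative choices, not part of the examples above.

# Estimating bias^2 and variance empirically (illustrative sketch)
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
x_grid = np.linspace(0, 1, 50).reshape(-1, 1)
true_y = (x_grid ** 2).ravel()            # true function without noise

for degree in [1, 2, 15]:
    predictions = []
    for _ in range(200):                  # refit on 200 independent noisy samples
        X = rng.uniform(0, 1, (20, 1))
        y = (X ** 2).ravel() + rng.normal(0, 0.1, 20)
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        model.fit(X, y)
        predictions.append(model.predict(x_grid))
    predictions = np.array(predictions)            # shape: (trials, grid points)
    avg_pred = predictions.mean(axis=0)
    bias_sq = np.mean((avg_pred - true_y) ** 2)    # squared bias of the average model
    variance = np.mean(predictions.var(axis=0))    # spread across retrained models
    print(f"degree={degree:2d}  bias^2={bias_sq:.4f}  variance={variance:.4f}")

On a typical run, degree 1 shows the largest bias² term, degree 15 the largest variance, and degree 2 keeps both small, matching the simple/optimal/complex spectrum above.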
Solutions and Prevention
Strategies to avoid overfitting and underfitting:
Fixing Overfitting
1. Get More Training Data: more data helps the model learn true patterns
2. Regularization (L1/L2): penalize large weights to simplify the model
3. Reduce Model Complexity: fewer layers, neurons, or a lower polynomial degree
4. Dropout (Neural Networks): randomly drop neurons during training
5. Early Stopping: stop training when validation error starts to increase (see the sketch after this list)
6. Cross-Validation: use k-fold CV to get reliable performance estimates
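Strategy 5, early stopping, can be sketched with scikit-learn's SGDRegressor, which holds out part of the training data internally and stops once the validation score stops improving. The synthetic data and hyperparameter values below are illustrative choices, not recommendations.

# Early stopping with SGDRegressor (minimal sketch)
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 20))
y = X[:, 0] * 3 - X[:, 1] * 2 + rng.normal(0, 0.5, 500)

model = make_pipeline(
    StandardScaler(),
    SGDRegressor(
        early_stopping=True,       # hold out part of the training data internally
        validation_fraction=0.2,   # 20% used to monitor validation error
        n_iter_no_change=5,        # stop after 5 epochs without improvement
        max_iter=1000,
        random_state=42,
    ),
)
model.fit(X, y)
print("epochs run before stopping:", model.named_steps["sgdregressor"].n_iter_)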
Fixing Underfitting
1. Increase Model Complexity: more layers, neurons, or a higher polynomial degree (see the sketch after this list)
2. Add More Features: feature engineering, polynomial features
3. Reduce Regularization: lower the lambda/alpha parameter
4. Train Longer: more epochs to learn the patterns
5. Use Better Features: apply domain knowledge to build relevant features
6. Try a Different Algorithm: switch to a more powerful model
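As a rough illustration of the first three fixes, the sketch below compares a Ridge model that is missing the quadratic feature, one that has the feature but is over-regularized, and one with the complexity and regularization both set sensibly. The data range, degrees, and alpha values are arbitrary illustrative choices.

# Fixing underfitting: add the missing feature, then reduce regularization (sketch)
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (200, 1))
y = (X ** 2).ravel() + rng.normal(0, 0.05, 200)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

candidates = {
    "degree 1, alpha=0.01 (too simple)":       make_pipeline(PolynomialFeatures(1), Ridge(alpha=0.01)),
    "degree 2, alpha=1000 (over-regularized)": make_pipeline(PolynomialFeatures(2), Ridge(alpha=1000.0)),
    "degree 2, alpha=0.01 (good fit)":         make_pipeline(PolynomialFeatures(2), Ridge(alpha=0.01)),
}
for name, model in candidates.items():
    model.fit(X_tr, y_tr)
    print(f"{name:40s} train R²={model.score(X_tr, y_tr):.3f}  test R²={model.score(X_te, y_te):.3f}")

Only the last configuration, which has enough capacity and only light regularization, should score well on both splits.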
# Applying Regularization to Prevent Overfitting
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
import numpy as np

# Generate data
np.random.seed(42)
X = np.linspace(0, 1, 50).reshape(-1, 1)
y = (X**2 + np.random.normal(0, 0.1, X.shape)).ravel()

# Create high-degree polynomial features (prone to overfitting)
poly = PolynomialFeatures(degree=15)
X_poly = poly.fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X_poly, y, test_size=0.3, random_state=42)

# 1. NO REGULARIZATION - Overfits!
model_no_reg = LinearRegression()
model_no_reg.fit(X_train, y_train)
print("NO REGULARIZATION:")
print(f"  Train R²: {model_no_reg.score(X_train, y_train):.4f}")
print(f"  Test R²: {model_no_reg.score(X_test, y_test):.4f}")
print("  ❌ Overfitting (memorized training data)\n")

# 2. L2 REGULARIZATION (Ridge) - Prevents overfitting!
model_ridge = Ridge(alpha=1.0)  # alpha controls regularization strength
model_ridge.fit(X_train, y_train)
print("L2 REGULARIZATION (Ridge):")
print(f"  Train R²: {model_ridge.score(X_train, y_train):.4f}")
print(f"  Test R²: {model_ridge.score(X_test, y_test):.4f}")
print("  ✅ Better generalization!\n")

# 3. L1 REGULARIZATION (Lasso) - Prevents overfitting + feature selection!
model_lasso = Lasso(alpha=0.01, max_iter=10000)  # higher max_iter aids convergence on unscaled features
model_lasso.fit(X_train, y_train)
print("L1 REGULARIZATION (Lasso):")
print(f"  Train R²: {model_lasso.score(X_train, y_train):.4f}")
print(f"  Test R²: {model_lasso.score(X_test, y_test):.4f}")
print(f"  Non-zero coefficients: {np.sum(model_lasso.coef_ != 0)}/{model_lasso.coef_.size}")
print("  ✅ Better generalization + feature selection!\n")

# Output shows regularization improves test performance!
# NO REGULARIZATION:
#   Train R²: 0.9998
#   Test R²: -145.2310 (DISASTER!)
#
# L2 REGULARIZATION (Ridge):
#   Train R²: 0.8523
#   Test R²: 0.8491 (Good!)
#
# L1 REGULARIZATION (Lasso):
#   Train R²: 0.8612
#   Test R²: 0.8598 (Good + sparse!)

How to Detect
Key indicators that help identify these problems:
Learning Curves
Plot training and validation error vs training set size
Overfitting: large gap between the training and validation curves
Underfitting: both curves converge at a high error
Validation Curves
Plot error vs model complexity (e.g., polynomial degree, tree depth)
Left (simple): high train and test error = underfitting
Right (complex): low train error, high test error = overfitting
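A minimal sketch of this idea uses scikit-learn's validation_curve to sweep the max_depth of a decision tree; the synthetic quadratic data and the depth range are illustrative choices.

# Validation curve: error vs. model complexity (tree depth) - sketch
import numpy as np
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.uniform(0, 1, (200, 1))
y = (X ** 2).ravel() + rng.normal(0, 0.1, 200)

depths = np.arange(1, 16)
train_scores, val_scores = validation_curve(
    DecisionTreeRegressor(random_state=0), X, y,
    param_name="max_depth", param_range=depths,
    cv=5, scoring="neg_mean_squared_error",
)
train_mse = -train_scores.mean(axis=1)
val_mse = -val_scores.mean(axis=1)
for d, tr, va in zip(depths, train_mse, val_mse):
    print(f"max_depth={d:2d}  train MSE={tr:.4f}  validation MSE={va:.4f}")
# Typically: shallow depths give high train and validation MSE (underfitting);
# large depths push train MSE toward 0 while validation MSE stops improving or rises (overfitting).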
# Plotting Learning Curves to Diagnose Problems
from sklearn.model_selection import learning_curve
import matplotlib.pyplot as plt
import numpy as np

def plot_learning_curve(estimator, X, y, title):
    train_sizes, train_scores, test_scores = learning_curve(
        estimator, X, y, cv=5, n_jobs=-1,
        train_sizes=np.linspace(0.1, 1.0, 10),
        scoring='neg_mean_squared_error'
    )
    train_scores_mean = -np.mean(train_scores, axis=1)
    test_scores_mean = -np.mean(test_scores, axis=1)

    plt.figure(figsize=(10, 6))
    plt.plot(train_sizes, train_scores_mean, label='Training error')
    plt.plot(train_sizes, test_scores_mean, label='Validation error')
    plt.xlabel('Training Set Size')
    plt.ylabel('Mean Squared Error')
    plt.title(title)
    plt.legend()
    plt.grid(True)
    plt.show()

# Example: Diagnose overfitting
from sklearn.tree import DecisionTreeRegressor

# Sample data (added so the snippet runs standalone): quadratic signal with noise
np.random.seed(42)
X = np.linspace(0, 1, 200).reshape(-1, 1)
y = X.ravel()**2 + np.random.normal(0, 0.1, 200)

# High complexity model (prone to overfitting)
overfit_model = DecisionTreeRegressor(max_depth=20)
plot_learning_curve(overfit_model, X, y, 'Learning Curve: Overfitting (large gap)')
# Output: Large gap between train and validation curves

# Low complexity model (prone to underfitting)
underfit_model = DecisionTreeRegressor(max_depth=1)
plot_learning_curve(underfit_model, X, y, 'Learning Curve: Underfitting (both high)')
# Output: Both curves converge at high error

# Good model
good_model = DecisionTreeRegressor(max_depth=5)
plot_learning_curve(good_model, X, y, 'Learning Curve: Good Fit (small gap, low error)')
# Output: Small gap, both curves at low error

Key Concepts
Overfitting (High Variance)
Model is too complex, memorizes training data including noise. Performs well on training set but poorly on test set. Like a student who memorized answers without understanding.
Underfitting (High Bias)
Model is too simple to capture patterns. Performs poorly on both training and test sets. Like a student who didn't study enough to understand basic concepts.
Bias-Variance Tradeoff
Bias is error from wrong assumptions (underfitting). Variance is error from sensitivity to training data fluctuations (overfitting). Optimal model balances both.
Generalization
The ability of a model to perform well on unseen data. The ultimate goal of machine learning - not just memorizing, but truly learning.
Interview Tips
- 💡Overfitting = too complex (memorizes), Underfitting = too simple (doesn't learn). Use training vs test performance gap to detect
- 💡Bias-Variance Tradeoff: High bias → underfitting, High variance → overfitting. Can't minimize both simultaneously
- 💡Overfitting solutions: Regularization (L1/L2), dropout, more data, early stopping, cross-validation, reduce model complexity
- 💡Underfitting solutions: Increase model complexity, add features, reduce regularization, train longer
- 💡Validation curves show model performance vs complexity. U-shaped test error: left=underfit, bottom=good, right=overfit
- 💡Learning curves plot performance vs training size. Overfitting: large gap between train and test. Underfitting: both converge at poor performance
- 💡Always use a train/validation/test split. Train on the training set, tune on the validation set, and do the final evaluation on the test set (a minimal split sketch follows this list)
- 💡Real example: polynomial regression on quadratic data with degree 1 (underfit), degree 2 (good fit), degree 15 (overfit), as in the code above
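The split discipline from the tips above can be sketched with two calls to train_test_split; the 60/20/20 proportions and the random data are illustrative choices.

# Train/validation/test split via two train_test_split calls (sketch)
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = rng.normal(size=1000)

# First split off the test set, then carve a validation set out of the remainder.
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.25, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 600 / 200 / 200
# Fit on X_train, tune hyperparameters against X_val, report final metrics on X_test.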