Caching

Improve performance and scalability with intelligent caching strategies

🎒 Explain Like I'm 5...

Imagine you have a backpack where you keep your favorite snacks! 🎒

🍪 The Backpack Rule:

  • If you want a snack, first check your BACKPACK (super fast!)
  • If it's not there, walk to the KITCHEN (takes longer)
  • When you get the snack from kitchen, put some in your BACKPACK for next time!
  • Your backpack is small, so you only keep your FAVORITE snacks (most used ones)

🚀 Why Is This Amazing?

  • You don't walk to the kitchen every time (saves time and energy!)
  • Popular snacks are always close by (faster access!)
  • The kitchen doesn't get crowded with everyone asking for snacks (less load!)

🌍 Real-World Magic:

  • YouTube keeps popular videos close to you (that's why they load instantly!)
  • Your phone remembers your recent apps (opens them super fast!)
  • Netflix downloads shows to your device (watch without internet!)

What is Caching?

Caching is the technique of storing copies of frequently accessed data in a faster storage layer (the cache) so that future requests for that data can be served more quickly. Think of it as a temporary storage area that sits between your application and the data source.

Why Caching Matters

  • ⚡ Speed: Reduces latency from seconds to milliseconds
  • 💰 Cost: Reduces load on expensive resources (databases, APIs)
  • 📈 Scalability: Handles more users without adding more servers
  • 🛡️ Reliability: Can serve stale data when the backend is down

Cache Levels: The Cache Hierarchy

Caching happens at multiple layers, each with different characteristics:

1. Browser Cache

Closest to the user, stores static assets (images, CSS, JavaScript)

Example: When you visit a website, your browser saves images so it doesn't download them again

✓ Pros: Instant loading, no network request needed

✗ Cons: Limited to one user's device, can become stale

2. CDN (Content Delivery Network)

Geographically distributed servers that cache content close to users

Example: Netflix stores popular shows in servers near your city

✓ Pros: Reduces latency, handles traffic spikes, DDoS protection

✗ Cons: Costs money, cache invalidation across all edges is complex

3. Application Cache

In-memory cache at the application tier, either in-process or in a dedicated cache service (Redis, Memcached)

Example: Facebook caches your profile data so it loads instantly

✓ Pros: Very fast (sub-millisecond), flexible data structures

✗ Cons: Limited by server memory, need synchronization in distributed systems

4. Database Cache

Query results cached inside the database or with a query cache

Example: MySQL's query cache stores the results of SELECT queries (note: this feature was removed in MySQL 8.0, so modern setups rely on external caches instead)

✓ Pros: Transparent to application, automatically managed

✗ Cons: Invalidation can be tricky, limited flexibility

Visual Concepts

Cache Hit vs Cache Miss

Understanding the performance difference:

Cache Hit (Fast Path) ⚡

Client → Cache (found!)
Cache → Client (cached result returned)
Time: 1-10ms | DB Load: None

Cache Miss (Slow Path) 🐢

Client → Cache (not found)
Client → Database (full query)
Database → Client (result returned)
Client → Cache (store for next time)
Time: 50-500ms | DB Load: Full query
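
A minimal sketch of the two paths in Python, using a plain dict as the cache and a hypothetical slow_database_query function standing in for the real database:

```python
import time

cache = {}  # in-process cache: key -> value

def slow_database_query(key):
    """Hypothetical stand-in for a real database query."""
    time.sleep(0.2)               # simulate ~200ms of query latency
    return f"value-for-{key}"

def get(key):
    if key in cache:                  # cache hit: fast path, no DB load
        return cache[key]
    value = slow_database_query(key)  # cache miss: slow path hits the DB
    cache[key] = value                # store for next time
    return value

print(get("user:42"))  # miss: ~200ms
print(get("user:42"))  # hit: microseconds
```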

Caching Strategies

Different ways to manage how data flows between cache and database:

Cache-Aside (Lazy Loading)

Application checks cache first. On miss, loads from DB and updates cache.

When to use: Read-heavy workloads, data requested inconsistently

✓ Pros: Only requested data is cached, cache failures don't break the system

✗ Cons: First request is slow (cache miss), potential for stale data
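
A sketch of cache-aside with the redis-py client, assuming a Redis server on localhost:6379; fetch_user_from_db, the key layout, and the TTL are hypothetical placeholders:

```python
import json
import redis  # redis-py client; assumes a Redis server on localhost:6379

r = redis.Redis(host="localhost", port=6379, db=0)

def fetch_user_from_db(user_id):
    """Hypothetical database query; replace with your real lookup."""
    return {"id": user_id, "name": "Ada"}

def get_user(user_id, ttl_seconds=300):
    key = f"user:{user_id}"                      # namespaced cache key
    cached = r.get(key)
    if cached is not None:                       # cache hit
        return json.loads(cached)
    user = fetch_user_from_db(user_id)           # cache miss: go to the source
    r.setex(key, ttl_seconds, json.dumps(user))  # populate the cache with a TTL
    return user
```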

Write-Through

Data is written to cache and database simultaneously

When to use: Need strong consistency, can tolerate write latency

✓ Pros: Cache always consistent, no data loss risk

✗ Cons: Slower writes (double write), wasted cache space for unused data
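
A write-through sketch along the same lines (redis-py, hypothetical write_user_to_db): the write is only acknowledged after both the database and the cache have been updated.

```python
import json
import redis  # assumes a Redis server on localhost:6379

r = redis.Redis(host="localhost", port=6379, db=0)

def write_user_to_db(user):
    """Hypothetical database write; replace with your real INSERT/UPDATE."""
    pass

def save_user(user):
    # Write-through: every write updates the database and the cache together,
    # so reads served from the cache always see the latest value.
    write_user_to_db(user)                         # 1. durable write to the database
    r.set(f"user:{user['id']}", json.dumps(user))  # 2. synchronous cache update
```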

Write-Back (Write-Behind)

Data written to cache first, then asynchronously written to DB

When to use: Write-heavy workloads, can tolerate eventual consistency

✓ Pros: Very fast writes, reduces DB load

✗ Cons: Risk of data loss if cache fails, complexity in error handling
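
A single-process sketch of write-back, with an in-memory dict as the cache and a background thread flushing dirty entries to a hypothetical write_to_db; a production setup would usually lean on a durable queue or the cache layer's built-in write-behind support instead.

```python
import threading
import time

cache = {}             # fast in-memory cache
dirty_keys = set()     # written to cache, not yet persisted
lock = threading.Lock()

def write_to_db(key, value):
    """Hypothetical database write."""
    print(f"persisting {key} = {value}")

def put(key, value):
    # Write-back: the caller returns as soon as the cache is updated.
    with lock:
        cache[key] = value
        dirty_keys.add(key)

def flush_loop(interval_seconds=1.0):
    # Background worker that drains dirty entries to the database.
    while True:
        time.sleep(interval_seconds)
        with lock:
            pending = [(k, cache[k]) for k in dirty_keys]
            dirty_keys.clear()
        for key, value in pending:
            write_to_db(key, value)  # data is lost if the process dies before this runs

threading.Thread(target=flush_loop, daemon=True).start()
put("user:42", "Ada")
time.sleep(2)  # give the background flusher a chance to run before the sketch exits
```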

Read-Through

Cache automatically loads data from DB on cache miss

When to use: Want simplified read logic in application

✓ Pros: Application code is simpler, consistent loading pattern

✗ Cons: First request slow, all data loaded even if not needed
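
A read-through sketch: the cache object owns the loader, so application code only ever talks to the cache. ReadThroughCache and load_product are hypothetical names for illustration.

```python
class ReadThroughCache:
    """On a miss, the cache itself loads the value from the source."""

    def __init__(self, loader):
        self._store = {}
        self._loader = loader        # called by the cache, not the application

    def get(self, key):
        if key not in self._store:
            self._store[key] = self._loader(key)  # cache loads the data on a miss
        return self._store[key]

def load_product(product_id):
    """Hypothetical source-of-truth lookup."""
    return {"id": product_id, "price": 9.99}

products = ReadThroughCache(load_product)
products.get("sku-123")   # miss: loader runs
products.get("sku-123")   # hit: served from memory
```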

Cache Eviction Policies

When cache is full, which data should we remove?

LRU (Least Recently Used)

Removes the item that hasn't been accessed for the longest time

Analogy: Toss out the snack you haven't reached for in the longest time

Best for: General purpose, time-based access patterns
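
A minimal LRU sketch built on Python's OrderedDict (the standard library's functools.lru_cache decorator applies the same policy to function results):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()   # keeps keys in access order

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry
```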

LFU (Least Frequently Used)

Removes the item that has been accessed the fewest times

Analogy: Toss out the snack you've eaten the fewest times overall

Best for: Long-term popularity matters more than recent access

FIFO (First In, First Out)

Removes the oldest item in the cache

Analogy: Toss out whatever has been in the backpack the longest, no matter how much you like it

Best for: Simple implementation, when all data equally important

TTL (Time To Live)

Each item expires after a set duration

Analogy: Snacks past their expiration date get thrown out, whether or not you still want them

Best for: Data that becomes stale, session data, temporary data
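
A minimal TTL sketch that stores an expiry timestamp alongside each value and evicts lazily on read:

```python
import time

class TTLCache:
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._items = {}   # key -> (value, expiry_timestamp)

    def put(self, key, value):
        self._items[key] = (value, time.monotonic() + self.ttl)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() > expires_at:   # entry has expired: evict it
            del self._items[key]
            return None
        return value
```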

Cache Invalidation: The Hard Problem

"There are only two hard things in Computer Science: cache invalidation and naming things." - Phil Karlton

Keeping cache synchronized with the database is challenging:

Timing Issues

Race conditions between cache update and DB write

Distributed Systems

Multiple cache servers need to be synchronized

Partial Updates

When only part of cached data changes

Invalidation Strategies:

  1. Time-based: Set TTL (time-to-live) for automatic expiration
  2. Event-based: Invalidate when data changes (triggers, webhooks)
  3. Manual: Explicit cache clear when updating data
  4. Version-based: Use versioned keys for cache entries (see the sketch after this list)
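
A sketch of version-based invalidation with redis-py; the catalog:version counter and key layout are hypothetical, and the assumption is that stale versioned entries simply age out via their TTL:

```python
import json
import redis  # assumes a Redis server on localhost:6379

r = redis.Redis(host="localhost", port=6379, db=0)

def catalog_key(product_id):
    # The current version number is part of every key, so bumping the
    # version makes all old entries unreachable without deleting them.
    version = int(r.get("catalog:version") or 1)
    return f"catalog:v{version}:product:{product_id}"

def cache_product(product, ttl_seconds=600):
    r.setex(catalog_key(product["id"]), ttl_seconds, json.dumps(product))

def get_product(product_id):
    cached = r.get(catalog_key(product_id))
    return json.loads(cached) if cached else None

def invalidate_catalog():
    r.incr("catalog:version")   # one atomic bump invalidates the whole namespace
```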

Redis vs Memcached

Two popular in-memory caching solutions:

Redis (Remote Dictionary Server)

Features:

  • Rich data structures (strings, lists, sets, sorted sets, hashes)
  • Persistence options (can survive restarts)
  • Pub/Sub messaging
  • Lua scripting
  • Transactions and atomic operations

Best for: Complex data structures, need persistence, pub/sub patterns

Memcached

Features:

  • Simple key-value store (strings only)
  • Multi-threaded (better CPU utilization)
  • Simpler, more predictable performance
  • Lower memory overhead

Best for: Simple caching needs, high throughput, lower memory usage
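
For comparison, minimal client calls for each, assuming local servers on the default ports (redis-py for Redis, pymemcache for Memcached):

```python
# Redis via redis-py: rich data types and per-key TTLs
import redis
r = redis.Redis(host="localhost", port=6379)
r.setex("session:abc", 1800, "user-42")                      # string with a 30-minute TTL
r.hset("user:42", mapping={"name": "Ada", "plan": "pro"})    # hash (structured value)
r.zincrby("leaderboard", 1, "player-7")                      # sorted set (e.g., rankings)

# Memcached via pymemcache: plain key-value entries
from pymemcache.client.base import Client
mc = Client(("localhost", 11211))
mc.set("session:abc", "user-42", expire=1800)
mc.get("session:abc")
```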

Real-World Examples with Implementation Details

YouTube: Video Caching at Edge Servers

YouTube uses a multi-tier caching strategy:

  • Tier 1 - Origin Servers: Master copies of all videos
  • Tier 2 - Regional Caches: Cache popular videos in each region
  • Tier 3 - Edge Servers: Cache most popular videos closest to users

How it works:

  1. User requests video → Check nearest edge server
  2. If not found → Check regional cache
  3. If still not found → Fetch from origin
  4. Video cached at each level on the way back

Result: Popular videos load in <100ms, unpopular videos take longer

Impact: 95% of requests served from cache, saving massive bandwidth

Facebook: Profile Data Caching

Facebook uses memcached extensively:

Scale: Thousands of memcached servers, petabytes of cached data

Strategy: Cache-aside pattern with regional clusters

How it works:

  1. User visits profile → Check memcached for user data
  2. Cache miss → Query MySQL database
  3. Store result in memcached with a 15-30 min TTL
  4. Subsequent requests served from cache (sub-millisecond)

Invalidation: When user updates profile, delete cache entry

Result: Profile pages load instantly, database load reduced by 90%

Amazon: Product Catalog Caching

Amazon uses a hybrid caching approach:

  • Layer 1 - CloudFront CDN: Static content (images, CSS)
  • Layer 2 - ElastiCache (Redis): Product details, pricing
  • Layer 3 - Application Memory: Session data, cart info

How it works:

  1. Product page request → CDN serves static assets instantly
  2. Product details fetched from Redis cache (1-5ms)
  3. Cache miss → Query DynamoDB, update cache
  4. Pricing updated every 15 minutes via background jobs

Result: Product pages load in <200ms globally, handle Black Friday traffic

Google: Search Results Caching

Google caches search results intelligently:

Strategy: Multi-level cache with smart invalidation

How it works:

  1. Query parsed and normalized (lowercased, extra whitespace trimmed)
  2. Check L1 cache (in-memory) for an exact query match
  3. Cache hit → Return results instantly (<50ms)
  4. Cache miss → Run search algorithm, cache results

TTL Strategy: Popular queries cached longer (hours), rare queries shorter (minutes)

Invalidation: New web content triggers cache refresh for related queries

Result: 70% of searches served from cache, dramatically faster than re-running search

Types of Caches by Scope

Local Cache (In-Process)

Cache stored in application's memory

Example: Node.js Map, Python dict, Java HashMap

✓ Pros: Fastest access, no network overhead, simple

✗ Cons: Not shared between servers, lost on restart

Distributed Cache

Cache shared across multiple servers

Example: Redis cluster, Memcached pool

✓ Pros: Shared state, scalable, survives individual server failures

✗ Cons: Network latency, complexity, consistency challenges

When to Cache?

Cache data when:

  • Read frequently, written infrequently (high read/write ratio)
  • Expensive to compute or fetch (complex queries, API calls)
  • Doesn't change often (configuration, product catalog)
  • Tolerates some staleness (news articles, social media feeds)

Don't cache when:

  • Data changes very frequently (real-time stock prices)
  • Requires strong consistency (financial transactions)
  • Unique per request (personalized, one-time data)
  • Already fast to fetch (simple DB queries on indexed columns)

Best Practices

  1. Set appropriate TTLs - Balance freshness vs performance
  2. Use cache keys wisely - Include version numbers, user IDs, timestamps
  3. Monitor cache hit rates - Aim for >80% hit rate
  4. Implement cache warming - Pre-populate cache with expected data
  5. Handle cache failures gracefully - Always have a fallback to the source
  6. Use compression - Reduce memory usage for large objects
  7. Namespace your keys - Prevent collisions between different data types
  8. Log cache metrics - Track misses, latency, eviction rates

Common Pitfalls to Avoid

  • ⚠️ Over-caching: Caching too much data wastes memory
  • ⚠️ Cache stampede: Multiple requests fetch the same data simultaneously on a cache miss (see the sketch after this list)
  • ⚠️ Inconsistent invalidation: Cache and DB out of sync
  • ⚠️ Cache penetration: Malicious requests for non-existent data bypass the cache
  • ⚠️ Not monitoring: Can't optimize what you don't measure
  • ⚠️ Ignoring network latency: In-memory doesn't mean instant in distributed systems
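
A sketch of one mitigation for the cache stampede pitfall above: a per-key lock so that on a miss only one caller recomputes the value while the others wait. This is a single-process version with a hypothetical expensive_query; distributed setups typically implement the same "single flight" idea with a lock in the cache layer itself.

```python
import threading
import time

cache = {}
key_locks = {}                   # one lock per cache key (single-process sketch)
locks_guard = threading.Lock()

def expensive_query(key):
    """Hypothetical slow computation or database query."""
    time.sleep(0.5)
    return f"value-for-{key}"

def get(key):
    if key in cache:
        return cache[key]
    with locks_guard:
        lock = key_locks.setdefault(key, threading.Lock())
    with lock:                   # only one caller recomputes the value
        if key in cache:         # re-check: another thread may have filled it
            return cache[key]
        cache[key] = expensive_query(key)
        return cache[key]
```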