Caching
Improve performance and scalability with intelligent caching strategies
🎒 Explain Like I'm 5...
Imagine you have a backpack where you keep your favorite snacks! 🎒
🍪 The Backpack Rule:
- If you want a snack, first check your BACKPACK (super fast!)
- If it's not there, walk to the KITCHEN (takes longer)
- When you get the snack from the kitchen, put some in your BACKPACK for next time!
- Your backpack is small, so you only keep your FAVORITE snacks (the most-used ones)
🚀 Why Is This Amazing?
- You don't walk to the kitchen every time (saves time and energy!)
- Popular snacks are always close by (faster access!)
- The kitchen doesn't get crowded with everyone asking for snacks (less load!)
🌍 Real-World Magic:
- YouTube keeps popular videos close to you (that's why they load instantly!)
- Your phone remembers your recent apps (opens them super fast!)
- Netflix downloads shows to your device (watch without internet!)
What is Caching?
Caching is a technique of storing copies of frequently accessed data in a faster storage layer (cache) so that future requests for that data can be served faster. Think of it as a temporary storage area that sits between your application and the data source.
Why Caching Matters
- ⚡ Speed: Reduces latency from seconds to milliseconds
- 💰 Cost: Reduces load on expensive resources (databases, APIs)
- 📈 Scalability: Handles more users without adding more servers
- 🛡️ Reliability: Can serve stale data when the backend is down
Cache Levels: The Cache Hierarchy
Caching happens at multiple layers, each with different characteristics:
1. Browser Cache
Closest to the user, stores static assets (images, CSS, JavaScript)
Example: When you visit a website, your browser saves images so it doesn't download them again
✓ Pros: Instant loading, no network request needed
✗ Cons: Limited to one user's device, can become stale
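To make this concrete, here is a minimal Python sketch of the HTTP response headers a server might send to control browser caching; the file names, header values, and the ETag are illustrative assumptions, not a prescription:

```python
# Illustrative HTTP response headers a server might send so that
# browsers cache static assets. Values here are examples only.

def cache_headers_for(path: str) -> dict:
    """Pick browser-cache headers based on the kind of asset."""
    if path.endswith((".css", ".js", ".png", ".jpg")):
        # Fingerprinted static assets: safe to cache for a year without revalidation.
        return {"Cache-Control": "public, max-age=31536000, immutable"}
    # HTML: always revalidate so users see fresh content.
    return {"Cache-Control": "no-cache", "ETag": '"v1-abc123"'}

print(cache_headers_for("app.9f2c1.js"))   # long-lived cache
print(cache_headers_for("index.html"))     # revalidate on each visit
```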
2. CDN (Content Delivery Network)
Geographically distributed servers that cache content close to users
Example: Netflix stores popular shows in servers near your city
✓ Pros: Reduces latency, handles traffic spikes, DDoS protection
✗ Cons: Costs money, cache invalidation across all edges is complex
3. Application Cache
In-memory cache at the application tier, either in-process or in a dedicated cache server (Redis, Memcached)
Example: Facebook caches your profile data so it loads instantly
✓ Pros: Very fast (sub-millisecond), flexible data structures
✗ Cons: Limited by server memory, need synchronization in distributed systems
4. Database Cache
Query results cached inside the database or with a query cache
Example: MySQL's query cache stored the results of SELECT statements (deprecated in 5.7, removed in 8.0)
✓ Pros: Transparent to application, automatically managed
✗ Cons: Invalidation can be tricky, limited flexibility
Cache Hit vs Cache Miss
Understanding the performance difference:
- Cache Hit (Fast Path) ⚡: the requested data is already in the cache and is returned immediately, with no trip to the database.
- Cache Miss (Slow Path) 🐢: the data is not in the cache, so the request falls through to the database, and the result is written into the cache on the way back.
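A toy Python sketch of the two paths, with a sleep standing in for a 200 ms database round trip (the numbers are made up for illustration):

```python
import time

cache = {}

def slow_database_fetch(key):
    time.sleep(0.2)                      # simulate a 200 ms database round trip
    return f"value-for-{key}"

def get(key):
    if key in cache:                     # cache hit: served from memory
        return cache[key]
    value = slow_database_fetch(key)     # cache miss: slow path to the DB
    cache[key] = value                   # populate the cache on the way back
    return value

start = time.perf_counter(); get("user:42")
print(f"miss: {time.perf_counter() - start:.3f}s")   # ~0.2s
start = time.perf_counter(); get("user:42")
print(f"hit:  {time.perf_counter() - start:.6f}s")   # microseconds
```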
Caching Strategies
Different ways to manage how data flows between cache and database:
Cache-Aside (Lazy Loading)
Application checks cache first. On miss, loads from DB and updates cache.
When to use: Read-heavy workloads where access patterns are unpredictable
✓ Only requested data is cached, cache failures don't break the system
✗ First request is slow (cache miss), potential for stale data
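A minimal cache-aside sketch in Python, using plain dicts to stand in for the cache and the database; the names `get_user` and `update_user` are illustrative:

```python
cache = {}                                  # stand-in for Redis/Memcached
database = {"user:1": {"name": "Ada"}}      # stand-in for the real DB

def get_user(key):
    value = cache.get(key)
    if value is not None:                   # hit: return straight from cache
        return value
    value = database.get(key)               # miss: load from the database...
    if value is not None:
        cache[key] = value                  # ...and populate the cache for next time
    return value

def update_user(key, value):
    database[key] = value                   # write to the source of truth
    cache.pop(key, None)                    # invalidate so the next read reloads

print(get_user("user:1"))   # miss: loaded from the database, then cached
print(get_user("user:1"))   # hit: served from the cache
```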
Write-Through
Data is written to cache and database simultaneously
When to use: Need strong consistency, can tolerate write latency
✓ Cache always consistent, no data loss risk
✗ Slower writes (double write), wasted cache space for unused data
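A minimal write-through sketch under the same dict-based assumptions; a real implementation would also have to handle the case where one of the two writes fails:

```python
cache, database = {}, {}

def write_through(key, value):
    cache[key] = value       # write to the cache...
    database[key] = value    # ...and to the database before acknowledging
    # Both writes must succeed together; partial failure needs rollback/retry.

write_through("config:theme", "dark")
print(cache == database)     # True: cache and DB never diverge
```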
Write-Back (Write-Behind)
Data written to cache first, then asynchronously written to DB
When to use: Write-heavy workloads, can tolerate eventual consistency
✓ Very fast writes, reduces DB load
✗ Risk of data loss if cache fails, complexity in error handling
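A write-back sketch using a background thread and a queue as a stand-in for the async flusher; the 50 ms sleep simulates the slow DB write, and the window before it completes is exactly where the data-loss risk lives:

```python
import queue
import threading
import time

cache, database = {}, {}
pending = queue.Queue()            # writes waiting to be flushed to the DB

def write_back(key, value):
    cache[key] = value             # fast: only the cache is touched
    pending.put((key, value))      # the DB write happens later, off the hot path

def flusher():
    while True:
        key, value = pending.get()
        time.sleep(0.05)           # simulate a slow DB write
        database[key] = value      # data is at risk until this line runs
        pending.task_done()

threading.Thread(target=flusher, daemon=True).start()
write_back("user:1", {"name": "Ada"})
pending.join()                     # wait for the async flush (demo only)
print(database)
```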
Read-Through
Cache automatically loads data from DB on cache miss
When to use: Want simplified read logic in application
✓ Application code is simpler, consistent loading pattern
✗ First request slow, all data loaded even if not needed
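A read-through sketch: the cache is constructed with a loader function, so callers never talk to the database directly; `load_from_db` is a hypothetical loader:

```python
class ReadThroughCache:
    """The cache itself knows how to load data, so callers just call get()."""

    def __init__(self, loader):
        self._store = {}
        self._loader = loader                     # e.g. a function that queries the DB

    def get(self, key):
        if key not in self._store:                # miss: the cache loads the data itself
            self._store[key] = self._loader(key)
        return self._store[key]

def load_from_db(key):                            # hypothetical DB loader
    return f"row-for-{key}"

cache = ReadThroughCache(load_from_db)
print(cache.get("order:7"))   # miss: loader runs
print(cache.get("order:7"))   # hit: served from cache
```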
Cache Eviction Policies
When the cache is full, which data should we remove?
LRU (Least Recently Used)
Removes the item that hasn't been accessed for the longest time
Best for: General purpose, time-based access patterns
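A minimal LRU sketch built on Python's OrderedDict (for caching function results, the standard library's functools.lru_cache decorator does this for you):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()          # insertion order = recency order

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)         # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used

cache = LRUCache(capacity=2)
cache.put("a", 1); cache.put("b", 2)
cache.get("a")            # touch "a" so "b" becomes the LRU entry
cache.put("c", 3)         # over capacity: evicts "b"
print(cache.get("b"))     # None
```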
LFU (Least Frequently Used)
Removes the item that has been accessed the fewest times
Best for: Long-term popularity matters more than recent access
FIFO (First In, First Out)
Removes the oldest item in the cache
Best for: Simple implementation, when all data equally important
TTL (Time To Live)
Each item expires after a set duration
Best for: Data that becomes stale, session data, temporary data
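A minimal TTL sketch that stores an expiry timestamp next to each value and treats expired entries as misses:

```python
import time

class TTLCache:
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._items = {}                          # key -> (value, expiry timestamp)

    def put(self, key, value):
        self._items[key] = (value, time.monotonic() + self.ttl)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() > expires_at:         # expired: treat as a miss
            del self._items[key]
            return None
        return value

cache = TTLCache(ttl_seconds=1)
cache.put("session:9", "alice")
print(cache.get("session:9"))   # "alice"
time.sleep(1.1)
print(cache.get("session:9"))   # None: the entry expired
```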
Cache Invalidation: The Hard Problem
"There are only two hard things in Computer Science: cache invalidation and naming things." - Phil Karlton
Keeping the cache synchronized with the database is challenging:
- Timing issues: race conditions between the cache update and the DB write
- Distributed systems: multiple cache servers need to be synchronized
- Partial updates: only part of the cached data changes
Invalidation Strategies:
1. Time-based: Set a TTL (time-to-live) for automatic expiration
2. Event-based: Invalidate when data changes (triggers, webhooks)
3. Manual: Explicitly clear the cache when updating data
4. Version-based: Use versioned keys for cache entries (see the sketch after this list)
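A sketch of the version-based approach: instead of deleting entries on update, bump a per-item version so old entries simply become unreachable and age out via TTL or eviction; the key format and names are illustrative:

```python
cache = {}
user_version = {"42": 1}        # current cache version per user

def profile_key(user_id):
    return f"user:{user_id}:v{user_version[user_id]}"

cache[profile_key("42")] = {"name": "Ada"}
print(cache.get(profile_key("42")))   # {'name': 'Ada'}

# On update: bump the version; the next read misses and reloads fresh data.
user_version["42"] += 1
print(cache.get(profile_key("42")))   # None: the old entry is now unreachable
```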
Redis vs Memcached
Two popular in-memory caching solutions:
Redis (Remote Dictionary Server)
Features:
- Rich data structures (strings, lists, sets, sorted sets, hashes)
- Persistence options (can survive restarts)
- Pub/Sub messaging
- Lua scripting
- Transactions and atomic operations
Best for: Complex data structures, need persistence, pub/sub patterns
Memcached
Features:
- Simple key-value store (strings only)
- Multi-threaded (better CPU utilization)
- Simpler, more predictable performance
- Lower memory overhead
Best for: Simple caching needs, high throughput, lower memory usage
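A minimal cache-aside example with Redis via the redis-py client; it assumes a Redis server on localhost:6379 and `pip install redis`, and the product lookup and 5-minute TTL are illustrative assumptions:

```python
import json

import redis  # pip install redis; assumes a Redis server on localhost:6379

r = redis.Redis(host="localhost", port=6379, db=0)

def get_product(product_id):
    key = f"product:{product_id}"                  # namespaced key
    cached = r.get(key)
    if cached is not None:                         # hit: deserialize and return
        return json.loads(cached)
    product = {"id": product_id, "price": 9.99}    # stand-in for a real DB query
    r.set(key, json.dumps(product), ex=300)        # cache for 5 minutes (TTL)
    return product

print(get_product(42))   # first call misses and caches; later calls hit Redis
```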
Real-World Examples with Implementation Details
YouTube: Video Caching at Edge Servers
YouTube uses a multi-tier caching strategy:
- Tier 1 - Origin Servers: Master copies of all videos
- Tier 2 - Regional Caches: Cache popular videos in each region
- Tier 3 - Edge Servers: Cache the most popular videos closest to users
How it works:
1. User requests video → Check nearest edge server
2. If not found → Check regional cache
3. If still not found → Fetch from origin
4. Video cached at each level on the way back
Result: Popular videos load in <100ms, unpopular videos take longer
Impact: 95% of requests served from cache, saving massive bandwidth
Facebook: Profile Data Caching
Facebook uses memcached extensively:
Scale: Thousands of memcached servers, petabytes of cached data
Strategy: Cache-aside pattern with regional clusters
How it works:
1. User visits profile → Check memcached for user data
2. Cache miss → Query MySQL database
3. Store result in memcached with a 15-30 min TTL
4. Subsequent requests served from cache (sub-millisecond)
Invalidation: When a user updates their profile, delete the cache entry
Result: Profile pages load instantly, database load reduced by 90%
Amazon: Product Catalog Caching
Amazon uses a hybrid caching approach:
- Layer 1 - CloudFront CDN: Static content (images, CSS)
- Layer 2 - ElastiCache (Redis): Product details, pricing
- Layer 3 - Application Memory: Session data, cart info
How it works:
1. Product page request → CDN serves static assets instantly
2. Product details fetched from Redis cache (1-5ms)
3. Cache miss → Query DynamoDB, update cache
4. Pricing updated every 15 minutes via background jobs
Result: Product pages load in <200ms globally, handle Black Friday traffic
Google: Search Results Caching
Google caches search results intelligently:
Strategy: Multi-level cache with smart invalidation
How it works:
1. Query parsed and normalized (lowercased, extra whitespace removed)
2. Check L1 cache (in-memory) for an exact query match
3. Cache hit → Return results instantly (<50ms)
4. Cache miss → Run search algorithm, cache results
TTL Strategy: Popular queries cached longer (hours), rare queries shorter (minutes)
Invalidation: New web content triggers cache refresh for related queries
Result: 70% of searches served from cache, dramatically faster than re-running search
Types of Caches by Scope
Local Cache (In-Process)
Cache stored in application's memory
Example: Node.js Map, Python dict, Java HashMap
✓ Pros: Fastest access, no network overhead, simple
✗ Cons: Not shared between servers, lost on restart
Distributed Cache
Cache shared across multiple servers
Example: Redis cluster, Memcached pool
✓ Pros: Shared state, scalable, survives individual server failures
✗ Cons: Network latency, complexity, consistency challenges
When to Cache?
✓ Cache data when:
- Read frequently, written infrequently (high read/write ratio)
- Expensive to compute or fetch (complex queries, API calls)
- Doesn't change often (configuration, product catalog)
- Tolerates some staleness (news articles, social media feeds)
✗ Don't cache when:
- Data changes very frequently (real-time stock prices)
- Requires strong consistency (financial transactions)
- Unique per request (personalized, one-time data)
- Already fast to fetch (simple DB queries on indexed columns)
Best Practices
1. Set appropriate TTLs - Balance freshness vs performance
2. Use cache keys wisely - Include version numbers, user IDs, timestamps
3. Monitor cache hit rates - Aim for a >80% hit rate (see the instrumented sketch after this list)
4. Implement cache warming - Pre-populate the cache with expected data
5. Handle cache failures gracefully - Always have a fallback to the source
6. Use compression - Reduce memory usage for large objects
7. Namespace your keys - Prevent collisions between different data types
8. Log cache metrics - Track misses, latency, eviction rates
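A tiny instrumented sketch tying several of these practices together: namespaced, versioned keys plus hit/miss counters so the hit rate can actually be measured; all names are illustrative:

```python
hits = misses = 0
cache = {}

def make_key(namespace, *parts):
    """Build a namespaced, versioned key, e.g. 'user:v2:42'."""
    return ":".join([namespace, *map(str, parts)])

def get(namespace, key_parts, fallback):
    global hits, misses
    key = make_key(namespace, *key_parts)
    if key in cache:
        hits += 1
        return cache[key]
    misses += 1
    cache[key] = fallback()        # graceful fallback to the source of truth
    return cache[key]

get("user:v2", (42,), lambda: {"name": "Ada"})   # miss
get("user:v2", (42,), lambda: {"name": "Ada"})   # hit
print(f"hit rate: {hits / (hits + misses):.0%}") # 50%
```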
Common Pitfalls to Avoid
- ⚠️ Over-caching: Caching too much data wastes memory
- ⚠️ Cache stampede: Many requests fetch the same data simultaneously on a cache miss (see the locking sketch after this list)
- ⚠️ Inconsistent invalidation: Cache and DB out of sync
- ⚠️ Cache penetration: Malicious requests for non-existent data bypass the cache
- ⚠️ Not monitoring: You can't optimize what you don't measure
- ⚠️ Ignoring network latency: In-memory doesn't mean instant in distributed systems
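One common stampede defense is a per-key lock, so only one caller rebuilds a missing entry while the rest wait; here is a minimal single-process threading sketch (a distributed system would need a distributed lock or request coalescing instead):

```python
import threading
import time

cache = {}
locks = {}                       # one lock per key (simplified: unbounded)
locks_guard = threading.Lock()   # protects the locks dict itself

def slow_fetch(key):
    time.sleep(0.2)              # simulate an expensive DB query
    return f"value-for-{key}"

def get(key):
    if key in cache:
        return cache[key]
    with locks_guard:            # get-or-create the per-key lock
        lock = locks.setdefault(key, threading.Lock())
    with lock:                   # only one thread rebuilds the entry
        if key in cache:         # another thread may have filled it meanwhile
            return cache[key]
        cache[key] = slow_fetch(key)
        return cache[key]

threads = [threading.Thread(target=get, args=("hot",)) for _ in range(10)]
for t in threads: t.start()
for t in threads: t.join()
print(cache["hot"])              # fetched once, not ten times
```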