Design a URL Shortener
π― Explain Like I'm 5...
Imagine you have a really long book title that's hard to remember. A URL shortener is like giving that book a short nickname that's easy to remember and share!
What is a URL Shortener?
It turns long web addresses into short, easy-to-share links.
Example:
Long URL: https://www.example.com/products/category/shoes/nike/air-max-2024?color=red&size=10
Short URL: https://short.url/abc123
π¨ How It Works (Simple Version)
- 1. User gives us a long URL
- 2. We create a short code (like 'abc123')
- 3. We save: short code β long URL in our database
- 4. When someone clicks the short link, we look up the long URL and redirect them!
π― Key Features to Design
- βURL Shortening: Convert long URLs to short ones
- βURL Redirection: When clicked, go to original URL
- βCustom Short URLs: Let users choose their own short code
- βAnalytics: Track how many times link was clicked
- βExpiration: Links can expire after some time
π Requirements
Functional Requirements:
- β’ Generate a unique short URL for each long URL
- β’ Redirect users from short URL to original URL
- β’ Support custom short URLs (optional)
- β’ Track click analytics
- β’ Delete/expire old URLs
Non-Functional Requirements:
- β’ High availability (99.9% uptime)
- β’ Low latency (<100ms for redirects)
- β’ Scalable (millions of URLs)
- β’ Secure (prevent abuse)
π Capacity Estimation
Assumptions:
- β’ 100 million URLs created per month
- β’ 100:1 read to write ratio (100 redirects per 1 creation)
- β’ URLs stored for 5 years
Calculations:
- β’ Write requests: 100M/month β 40 requests/second
- β’ Read requests: 40 * 100 = 4,000 requests/second
- β’ Storage: 100M URLs/month * 60 months = 6 billion URLs
- β’ If each URL = 500 bytes β 6B * 500B = 3TB storage
π API Design
// 1. Create Short URLPOST /api/shortenRequest Body:{ "originalUrl": "https://www.example.com/very/long/url", "customAlias": "mylink" // optional}Response:{ "shortUrl": "https://short.url/abc123", "originalUrl": "https://www.example.com/very/long/url", "createdAt": "2024-01-15T10:30:00Z"}// 2. Redirect Short URLGET /{shortCode}Response: HTTP 301/302 Redirect to original URL// 3. Get URL StatsGET /api/stats/{shortCode}Response:{ "shortCode": "abc123", "originalUrl": "https://www.example.com/very/long/url", "clicks": 1523, "createdAt": "2024-01-15T10:30:00Z", "expiresAt": "2025-01-15T10:30:00Z"}// 4. Delete URLDELETE /api/url/{shortCode}Response: 204 No ContentπΎ Database Schema
-- URLs TableCREATE TABLE urls ( id BIGINT PRIMARY KEY AUTO_INCREMENT, short_code VARCHAR(10) UNIQUE NOT NULL, original_url TEXT NOT NULL, user_id BIGINT, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, expires_at TIMESTAMP, is_custom BOOLEAN DEFAULT FALSE, INDEX idx_short_code (short_code), INDEX idx_user_id (user_id));-- Analytics TableCREATE TABLE url_analytics ( id BIGINT PRIMARY KEY AUTO_INCREMENT, short_code VARCHAR(10) NOT NULL, clicked_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, ip_address VARCHAR(45), user_agent TEXT, referrer TEXT, INDEX idx_short_code (short_code), INDEX idx_clicked_at (clicked_at));-- Users Table (Optional)CREATE TABLE users ( id BIGINT PRIMARY KEY AUTO_INCREMENT, email VARCHAR(255) UNIQUE, api_key VARCHAR(64) UNIQUE, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP);URL Shortening Algorithm
Option 1: Hash-Based (MD5/SHA256)
Pros: Fast, deterministic
Cons: Collisions possible, long output needs truncation
Option 2: Base62 Encoding
Convert unique ID to Base62 (a-z, A-Z, 0-9)
Pros: Short, no collisions with auto-increment ID
Cons: Predictable sequence
Option 3: Random String Generation
Pros: Unpredictable, secure
Cons: Need to check for collisions
β Recommended Approach: Base62 Encoding
public class Base62Encoder { // Base62 character set: a-z, A-Z, 0-9 (62 characters) private static final String BASE62_CHARS = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"; /** * Encode a number to Base62 string * @param num - The number to encode (database auto-increment ID) * @return Base62 encoded string */ public static String encodeBase62(long num) { if (num == 0) { return String.valueOf(BASE62_CHARS.charAt(0)); } StringBuilder encoded = new StringBuilder(); while (num > 0) { int remainder = (int)(num % 62); encoded.insert(0, BASE62_CHARS.charAt(remainder)); num = num / 62; } return encoded.toString(); } /** * Decode a Base62 string back to number * @param str - The Base62 string to decode * @return Decoded number */ public static long decodeBase62(String str) { long decoded = 0; for (int i = 0; i < str.length(); i++) { char c = str.charAt(i); int value = BASE62_CHARS.indexOf(c); decoded = decoded * 62 + value; } return decoded; } public static void main(String[] args) { // Example usage: long id = 125348L; // Auto-increment database ID String shortCode = encodeBase62(id); System.out.println("Short code: " + shortCode); // "w7e" // Decode back long decodedId = decodeBase62(shortCode); System.out.println("Decoded ID: " + decodedId); // 125348 // More examples System.out.println("ID 1: " + encodeBase62(1L)); // "b" System.out.println("ID 62: " + encodeBase62(62L)); // "ba" System.out.println("ID 1000000: " + encodeBase62(1000000L)); // "4c92" // Capacity calculations: // 7 characters = 62^7 = 3.5 trillion unique URLs! // 6 characters = 62^6 = 56 billion unique URLs // 5 characters = 62^5 = 916 million unique URLs }}ποΈ High-Level System Design
ββββββββββββββββ
β Clients β
β (Web/Mobile) β
ββββββββ¬ββββββββ
β
ββββββββΌββββββββ
βLoad Balancer β
ββββββββ¬ββββββββ
β
βββββββββββββββββΌββββββββββββββββ
β β β
ββββββββΌββββββ βββββββΌβββββββ ββββββΌβββββββ
β App β β App β β App β
β Server 1 β β Server 2 β β Server 3 β
ββββββββ¬ββββββ βββββββ¬βββββββ ββββββ¬βββββββ
β β β
βββββββββββββββββΌββββββββββββββββ
β
ββββββββββββ΄βββββββββββ
β β
ββββββββΌβββββββ ββββββββΌβββββββ
β Cache β β Analytics β
β (Redis) β β Service β
ββββββββ¬βββββββ ββββββββ¬βββββββ
β β
ββββββββΌβββββββ ββββββββΌβββββββ
β Database β β Message β
β (Master) β β Queue β
ββββββββ¬βββββββ β (Kafka) β
β βββββββββββββββ
ββββββββ΄βββββββ
β β
ββββββΌββββ ββββββΌββββ
βReplica β βReplica β
β 1 β β 2 β
ββββββββββ ββββββββββ
Components:
- β’ Load Balancer: Distribute traffic
- β’ Application Servers: Handle requests
- β’ Database: Store URL mappings
- β’ Cache (Redis): Fast lookups for popular URLs
- β’ Analytics Service: Track clicks
Data Flow:
Create Short URL:
- 1. Client β Load Balancer β App Server
- 2. Generate unique ID (auto-increment)
- 3. Convert ID to Base62
- 4. Save to database
- 5. Return short URL
Redirect Short URL:
- 1. Client clicks short URL
- 2. Check cache for short code
- 3. If miss, query database
- 4. Cache the result
- 5. Redirect to original URL (HTTP 301/302)
π Deep Dive Topics
Caching Strategy
Use Redis/Memcached to cache popular URLs. LRU eviction policy.
Scaling Database
Partition data by hash of short code. Use read replicas for high read traffic.
Rate Limiting
Prevent abuse by limiting requests per user/IP (e.g., 100 URLs/hour).
Security Considerations
- β’ Validate URLs before shortening
- β’ Prevent malicious redirects
- β’ Use CAPTCHA for suspicious activity
- β’ Implement expiration for old links
βοΈ Trade-offs & Decisions
HTTP 301 vs 302 Redirect:
301: 301 (Permanent): Cached by browser, faster but no analytics
302: 302 (Temporary): Always hits server, slower but full analytics
SQL vs NoSQL:
SQL: SQL: ACID, complex queries, harder to scale
NoSQL: NoSQL: Fast, scalable, but eventual consistency
π€ Follow-up Questions
Q: How would you handle custom short URLs?
A: Check availability, reserve in database, prevent conflicts with auto-generated codes
Q: How to prevent the same long URL from getting multiple short URLs?
A: Hash long URL, check if exists in database before creating new short URL
Q: How to implement analytics?
A: Use message queue (Kafka) to async process click events. Store in time-series database.
Q: How to delete/expire URLs?
A: Add 'expiry_date' field. Run cron job to clean expired URLs. Mark as deleted (soft delete).
π Real-World Examples
- β’ bit.ly - Popular URL shortener with analytics
- β’ tinyurl.com - Simple, no-frills URL shortening
- β’ goo.gl - Google's URL shortener (discontinued)