Large Language Models (LLMs)
Understanding GPT, ChatGPT, fine-tuning, and prompt engineering
Imagine a super-smart friend who has read the entire internet and can answer almost any question or help with any writing task! That's what Large Language Models (LLMs) like ChatGPT are: massive AI systems trained on billions of words to understand and generate human-like text. They learned by predicting the next word in sentences over and over, trillions of times. The 'large' part means they have billions of parameters (learned patterns): GPT-3 has 175 billion! This massive scale gives them amazing abilities: writing essays, coding, translation, math, and having natural conversations. They're like having Wikipedia, a tutor, and a writing assistant all rolled into one!
What are Large Language Models?
Large Language Models (LLMs) are neural networks with billions of parameters trained on massive text datasets. They use the Transformer architecture and are trained to predict the next token in a sequence. Through this simple task repeated trillions of times, they learn grammar, facts, reasoning, and even some coding ability. Examples include GPT-4, ChatGPT, Claude, PaLM, and LLaMA. The phenomenon of 'emergence' shows that at large scale, these models develop capabilities that were never explicitly programmed.
GPT-4
~1.7T parameters (rumored; not disclosed by OpenAI)
OpenAI
ChatGPT, API
Claude
Parameter count undisclosed
Anthropic
Assistant, API
LLaMA 2
7B-70B parameters
Meta
Open source
```python
# Using an LLM (OpenAI GPT API example)
# Note: this uses the legacy openai<1.0 interface; newer SDK versions
# use client = OpenAI(); client.chat.completions.create(...)
import openai

# Set API key
openai.api_key = "your-api-key"

# Simple completion
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain what an LLM is in simple terms."}
    ],
    temperature=0.7,  # Control randomness
    max_tokens=150,   # Limit response length
    top_p=0.9         # Nucleus sampling
)

print(response.choices[0].message.content)
# Example output (actual text varies from run to run):
# "A Large Language Model (LLM) is like a very smart computer
# program that has read millions of books and websites. It learned patterns
# in how humans write and communicate, so it can understand questions and
# generate human-like text responses. Think of it as an AI that can chat,
# write, translate, and help with various language tasks..."

# The model predicted this response token by token, each time choosing
# a likely next word based on its training!
```

How Do LLMs Work?
The fundamental process behind language models:
Next-Token Prediction
Training:
Input: 'The cat sat on the ___'
Model learns to predict: 'mat' (most likely), 'floor', 'chair' (possible)
Generation:
Start: 'Once upon a' → predict 'time' → now 'Once upon a time' → predict 'there' → continue...
```python
# Demonstrating autoregressive generation
"""
How LLMs generate text step by step:

USER PROMPT: "Write a haiku about AI"

GENERATION PROCESS (token by token):
Step 1:  ""                           → predict → "Silicon"
Step 2:  "Silicon"                    → predict → " minds"
Step 3:  "Silicon minds"              → predict → " awakening"
Step 4:  "Silicon minds awakening"    → predict → "\n"
Step 5:  "Silicon minds awakening\n"  → predict → "Patterns"
Step 6:  "Silicon minds awakening\nPatterns" → predict → " in"
Step 7:  "...Patterns in"             → predict → " the"
Step 8:  "...Patterns in the"         → predict → " data"
Step 9:  "...Patterns in the data"    → predict → "\n"
Step 10: "...\n"                      → predict → "Learning"
Step 11: "...Learning"                → predict → " never"
Step 12: "...Learning never"          → predict → " stops"
Step 13: "...Learning never stops"    → predict → "<END>"

FINAL OUTPUT:
"Silicon minds awakening
Patterns in the data
Learning never stops"

KEY INSIGHTS:
- Each token is predicted based on ALL previous tokens (context)
- Model doesn't "know" the full poem in advance
- Probability distribution at each step (could sample different tokens)
- This is why setting a random seed affects output!
- Computational complexity: O(n) sequential steps for n tokens
"""

# Simplified pseudo-code
def generate_text(prompt, max_tokens=50):
    context = tokenize(prompt)
    generated = []
    for _ in range(max_tokens):
        # Model computes a probability distribution over the vocabulary
        logits = model.forward(context)  # [vocab_size]
        probs = softmax(logits)
        # Sample the next token (various strategies possible)
        next_token = sample(probs, temperature=0.7)
        generated.append(next_token)
        context.append(next_token)  # Add to context
        if next_token == END_TOKEN:
            break
    return detokenize(generated)

# This happens internally when you call:
# response = openai.ChatCompletion.create(...)
```

Training Process
Three main stages in creating modern LLMs:
1. Pre-training (Unsupervised)
Train on a massive text corpus (web pages, books, code) to predict the next token. This is the most expensive stage.
Data:
- Billions of web pages
- Books, articles, Wikipedia
- Code repositories (GitHub)
Result:
- General language understanding
- World knowledge
- Cost: $2-10M+ for GPT-3 scale
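The next-token objective that pre-training optimizes can be illustrated with a toy bigram counter. Everything below (the six-sentence corpus, the `predict_next` helper) is an invented illustration; real LLMs learn a far richer version of this distribution with a Transformer over trillions of tokens:

```python
from collections import Counter, defaultdict

# Toy "pre-training": learn next-token statistics from a tiny corpus.
corpus = "the cat sat on the mat . the dog sat on the floor ."
tokens = corpus.split()

# Count how often each token follows each other token (bigram model)
bigram_counts = defaultdict(Counter)
for prev, nxt in zip(tokens, tokens[1:]):
    bigram_counts[prev][nxt] += 1

def predict_next(token):
    """Return the most likely next token and its estimated probability."""
    counts = bigram_counts[token]
    total = sum(counts.values())
    best, n = counts.most_common(1)[0]
    return best, n / total

print(predict_next("sat"))  # → ('on', 1.0): "sat" is always followed by "on"
```

A real LLM does the same thing conceptually, but conditions on the entire preceding context rather than just the previous word, which is what lets it learn grammar, facts, and long-range structure.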
2. Fine-tuning (Supervised)
Train on task-specific data with labeled examples, adapting the model to a particular use case.
Examples:
- Question-answer pairs
- Classification examples
- Instruction-following demos
Result:
- Task-specific performance
- More controlled outputs
- Cost: $100s-$10K
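Fine-tuning pipelines typically consume examples like those above serialized as JSONL: one prompt/completion (or chat-message) record per line. A minimal sketch with made-up data; the exact field names vary by provider:

```python
import json

# Hypothetical supervised fine-tuning examples (field names vary by provider)
examples = [
    {"prompt": "Classify sentiment: I love this product!", "completion": "Positive"},
    {"prompt": "Classify sentiment: Terrible experience.", "completion": "Negative"},
    {"prompt": "Classify sentiment: It was okay.", "completion": "Neutral"},
]

# One JSON object per line (JSONL) is the common on-disk format
jsonl = "\n".join(json.dumps(ex) for ex in examples)
print(jsonl)

# Reading it back, as a training job would
records = [json.loads(line) for line in jsonl.splitlines()]
```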
3. RLHF (Reinforcement Learning from Human Feedback)
Humans rank model outputs, a reward model learns those preferences, and the policy model is then optimized to earn higher rewards.
Process:
- Humans rank outputs (A > B > C)
- Train reward model on preferences
- Optimize policy with RL (PPO)
Result:
- Alignment with human values
- Helpful, harmless, honest
- Used in ChatGPT, Claude
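Step two of the RLHF process, training a reward model on human rankings, commonly minimizes a pairwise (Bradley-Terry style) loss: for a human judgment "A is better than B", the loss is -log(sigmoid(r_A - r_B)). A small numeric sketch with made-up reward scores:

```python
import math

def preference_loss(reward_preferred, reward_rejected):
    """Pairwise loss used to train RLHF reward models:
    -log(sigmoid(r_A - r_B)) for a human judgment A > B."""
    diff = reward_preferred - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-diff)))

# When the reward model already agrees with the human ranking,
# the loss is small; when it disagrees, the loss is large.
print(round(preference_loss(2.0, -1.0), 4))  # ≈ 0.0486 (agrees: small loss)
print(round(preference_loss(-1.0, 2.0), 4))  # ≈ 3.0486 (disagrees: large loss)
```

Minimizing this loss pushes the reward model to score human-preferred outputs higher, which then provides the reward signal for the PPO step.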
Prompt Engineering
The art and science of getting good outputs from LLMs:
Zero-Shot Prompting
Direct instruction without examples
Prompt:
"Classify sentiment: I love this product!"
Output:
"Positive"
Few-Shot Prompting
Provide examples to guide the model
Prompt:
"Good product → Positive
Terrible → Negative
Okay → Neutral
I love this → ?"
Output:
"Positive"
```python
# Advanced prompt engineering techniques

# 1. ZERO-SHOT (direct instruction)
prompt_zero = '''Translate to French: "Hello, how are you?"'''

# 2. FEW-SHOT (provide examples)
prompt_few = """Translate to French:
English: "Good morning"
French: "Bonjour"
English: "Thank you"
French: "Merci"
English: "Hello, how are you?"
French:"""

# 3. CHAIN-OF-THOUGHT (step-by-step reasoning)
prompt_cot = """Question: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
Answer: Let's think step by step.
1. Roger starts with 5 balls
2. He buys 2 cans, each with 3 balls
3. 2 cans × 3 balls/can = 6 balls
4. Total: 5 + 6 = 11 balls
Answer: 11 balls

Question: Janet has 4 apples. She gives 2 to her friend and buys 5 more. How many apples does she have?
Answer: Let's think step by step."""

# 4. INSTRUCTION FOLLOWING (clear structure)
prompt_instruction = """You are a helpful assistant that answers questions concisely.
Rules:
- Keep answers under 50 words
- Use simple language
- Cite sources when possible

Question: What is machine learning?
Answer:"""

# 5. SYSTEM + USER PATTERN (ChatGPT style)
messages = [
    {
        "role": "system",
        "content": "You are an expert Python tutor. Explain concepts clearly with examples."
    },
    {
        "role": "user",
        "content": "What is a list comprehension?"
    }
]

# TEMPERATURE CONTROL
# Low temperature (0.0-0.3): deterministic, focused, factual
response_factual = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What is 2+2?"}],
    temperature=0.1  # Almost deterministic
)

# High temperature (0.8-1.5): creative, diverse, random
response_creative = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Write a creative story opening."}],
    temperature=1.2  # More random/creative
)

# BEST PRACTICES:
# - Be specific and clear
# - Provide context and examples
# - Use delimiters (\"\"\", ---, ###) to separate sections
# - Specify output format (JSON, bullet points, etc.)
# - Iterate and refine based on results
```

Key Concepts
Autoregressive Generation
LLMs generate text one token at a time, using previously generated tokens as context. Each token is sampled from a probability distribution over the vocabulary.
Pre-training vs Fine-tuning
Pre-training: unsupervised learning on massive text (expensive, done once). Fine-tuning: supervised learning on specific task data (cheaper, adapts model). Transfer learning enables this two-stage approach.
Context Window
Maximum number of tokens the model can process at once (e.g., GPT-4: 8K-32K tokens). Longer context = remembers more of the conversation but more computationally expensive.
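Because the window is fixed, chat applications must drop (or summarize) old turns once a conversation exceeds it. A rough sketch, assuming a naive whitespace "tokenizer"; real systems count actual BPE tokens (e.g. with tiktoken for OpenAI models):

```python
def truncate_to_window(messages, max_tokens):
    """Keep the most recent messages that fit in the context window.
    Token counting here is a crude whitespace split; real systems use
    the model's actual tokenizer."""
    kept, used = [], 0
    for msg in reversed(messages):  # walk newest-first
        cost = len(msg["content"].split())
        if used + cost > max_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order

history = [
    {"role": "user", "content": "tell me a long story about dragons"},
    {"role": "assistant", "content": "once upon a time there was a dragon"},
    {"role": "user", "content": "what was its name"},
]
# With a 12-"token" budget, only the two most recent turns fit
print(truncate_to_window(history, max_tokens=12))
```

This is why a long chat can "forget" its beginning: the earliest turns were silently dropped to stay inside the window.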
Temperature & Sampling
Temperature controls randomness: low (0.1) = deterministic/focused, high (1.0+) = creative/random. Sampling methods: greedy, top-k, top-p (nucleus) affect output diversity.
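The temperature and nucleus-sampling knobs described above can be implemented in a few lines. A minimal pure-Python sketch over a made-up five-token vocabulary (the logit values are arbitrary illustrations):

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Divide logits by temperature before softmax: low T sharpens the
    distribution (near-greedy), high T flattens it (more random)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, p=0.9):
    """Nucleus sampling: keep the smallest set of tokens whose cumulative
    probability reaches p, then renormalize over that set."""
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

logits = [2.0, 1.0, 0.5, 0.1, -1.0]  # made-up scores for 5 tokens
cold = softmax_with_temperature(logits, temperature=0.1)
hot = softmax_with_temperature(logits, temperature=2.0)
print(max(cold))  # close to 1.0: nearly deterministic
print(max(hot))   # much smaller: probability spread across tokens
```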
Interview Tips
- 💡LLMs are Transformer-based models with billions of parameters trained on massive text corpora using next-token prediction
- 💡Training stages: 1) Pre-training (unsupervised, predict next token, expensive), 2) Fine-tuning (supervised, task-specific), 3) RLHF (reinforcement learning from human feedback)
- 💡GPT = Generative Pre-trained Transformer. Decoder-only architecture trained autoregressively (predict next token given previous)
- 💡Emergent abilities: at large scale (~100B parameters), models suddenly gain capabilities, such as reasoning and few-shot learning, not seen in smaller models
- 💡Context window: maximum tokens processed. GPT-3: 2K, GPT-4: 8K-32K. Longer context = more memory but also more cost and attention computation (O(n²))
- 💡Prompt engineering: zero-shot (no examples), few-shot (provide examples), chain-of-thought (step-by-step reasoning), instruction-following
- 💡Temperature: controls randomness. Low (0.0-0.3) = deterministic/factual, Medium (0.7) = balanced, High (1.0+) = creative/random
- 💡Fine-tuning vs RAG: Fine-tuning updates model weights (expensive but integrated knowledge). RAG retrieves relevant docs (cheaper but external dependency)
- 💡Key challenges: hallucinations (generating false info), bias (from training data), alignment (following human values), computational cost
- 💡Applications: chatbots, code generation (Copilot), content creation, translation, summarization, QA, data extraction
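The fine-tuning vs RAG tip above can be made concrete: RAG retrieves the most relevant documents at query time and pastes them into the prompt, leaving model weights untouched. A toy sketch using word-overlap scoring in place of real embeddings; production systems embed texts and use cosine similarity over a vector database (all documents below are invented examples):

```python
def retrieve(query, documents, k=1):
    """Toy retrieval: score each document by word overlap with the query.
    Real RAG systems use embedding vectors and cosine similarity instead."""
    q_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

docs = [
    "The Eiffel Tower is in Paris and opened in 1889.",
    "Photosynthesis converts sunlight into chemical energy.",
    "Transformers use self-attention to process sequences.",
]
query = "When did the Eiffel Tower open?"
context = retrieve(query, docs, k=1)[0]

# The retrieved passage is then pasted into the LLM prompt:
prompt = f"Answer using this context:\n{context}\n\nQuestion: {query}"
print(prompt)
```

Because only the prompt changes, RAG can incorporate new or private knowledge without any retraining, at the cost of depending on an external retrieval step.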