← Back to notes

Large Language Models (LLMs) – Introduction

Published on January 15, 2026

Large Language Models (LLMs) – Introduction

What is an LLM?

A Large Language Model (LLM) is a machine learning model trained on massive amounts of text to predict the next token (word or word fragment) in a sequence.

It learns patterns, relationships, structure, and context from text rather than memorizing fixed answers.

Examples:


What Makes an LLM Different?

Traditional software:

Input → Rules → Output

LLM:

Input → Learned Patterns → Output

Instead of explicitly programmed rules, the model learns statistical relationships from large datasets.

Example:

The capital of France is ___

The model predicts: Paris

Because it has seen similar patterns during training.


Core Building Blocks

Tokens

LLMs do not read text as sentences. They process tokens.

Example:

"ChatGPT is useful"

Becomes:

["Chat", "GPT", " is", " useful"]

Everything is ultimately converted into tokens.


Embeddings

Text is converted into numerical vectors. These vectors capture meaning.

Example:

King, Queen, Prince, Princess

Will have similar vector representations because they appear in similar contexts.

Embeddings allow semantic understanding rather than simple keyword matching.


Context Window

The amount of information the model can consider at one time.

Example:

Anything outside the context window is effectively forgotten for that interaction.


How an LLM Works

Step 1: Training Data

The model is trained on large collections of:

The objective is simple: Predict the next token

Example:

The sun rises in the ___

Expected answer: east

Step 2: Transformer Architecture

Modern LLMs are built using the Transformer architecture.

Key idea: The model determines which words in the input are most relevant to each other.

Example:

"Atharva dropped the toy because he was tired."

The model learns that "he" refers to Atharva.

This mechanism is called attention.


Step 3: Inference

When a user enters a prompt:

Explain TCP/IP

The model:

  1. Converts text to tokens
  2. Processes tokens through layers
  3. Predicts the next token
  4. Repeats until a complete response is generated

The model is generating one token at a time.


Why LLMs Appear Intelligent

LLMs are good at:

They do NOT:

They operate by predicting likely token sequences.


Common Use Cases

Content Generation

Coding

Search & Knowledge Assistance

Customer Support

Data Analysis

Enterprise Automation


RAG (Retrieval Augmented Generation)

Problem

An LLM only knows what is in its training and current context.

Solution

Retrieve relevant information before generating a response.

Flow

User Question
    ↓
Document Search
    ↓
Relevant Documents
    ↓
LLM
    ↓
Answer

Benefits


Fine-Tuning vs Prompting

Prompting

Provide instructions at runtime.

Example: "Act as a senior Java architect."

Model weights remain unchanged.

Fine-Tuning

Additional training on specialized data.

Examples:

Model behavior becomes more specialized.


Hallucinations

Hallucination = generating information that sounds plausible but is incorrect.

Examples:

Reasons

Mitigation


Typical LLM Application Architecture

User
  ↓
Frontend
  ↓
Application Layer
  ↓
LLM
  ↓
Tools / Databases / APIs
  ↓
Response

Modern AI applications rarely use an LLM alone. They combine:


Key Terms Reference

| Term | Meaning | |------|---------| | Token | Smallest text unit processed by model | | Embedding | Numeric representation of meaning | | Context Window | Information available during generation | | Transformer | Architecture behind modern LLMs | | Attention | Mechanism for identifying relevant context | | Inference | Generating output from a prompt | | RAG | Retrieving data before generation | | Fine-Tuning | Additional training on specific data | | Hallucination | Confident but incorrect output |


Summary

An LLM is a transformer-based model trained to predict the next token, enabling it to generate, summarize, reason over, and transform language when supplied with sufficient context.