A Large Language Model (LLM) is a machine learning model trained on massive amounts of text to predict the next token (word or word fragment) in a sequence.
It learns patterns, relationships, structure, and context from text rather than memorizing fixed answers.
Examples:
Traditional software:
Input → Rules → Output
LLM:
Input → Learned Patterns → Output
Instead of explicitly programmed rules, the model learns statistical relationships from large datasets.
Example:
The capital of France is ___
The model predicts: Paris
Because it has seen similar patterns during training.
LLMs do not read text as sentences. They process tokens.
Example:
"ChatGPT is useful"
Becomes:
["Chat", "GPT", " is", " useful"]
Everything is ultimately converted into tokens.
Text is converted into numerical vectors. These vectors capture meaning.
Example:
King, Queen, Prince, Princess
Will have similar vector representations because they appear in similar contexts.
Embeddings allow semantic understanding rather than simple keyword matching.
The amount of information the model can consider at one time.
Example:
Anything outside the context window is effectively forgotten for that interaction.
The model is trained on large collections of:
The objective is simple: Predict the next token
Example:
The sun rises in the ___
Expected answer: east
Modern LLMs are built using the Transformer architecture.
Key idea: The model determines which words in the input are most relevant to each other.
Example:
"Atharva dropped the toy because he was tired."
The model learns that "he" refers to Atharva.
This mechanism is called attention.
When a user enters a prompt:
Explain TCP/IP
The model:
The model is generating one token at a time.
LLMs are good at:
They do NOT:
They operate by predicting likely token sequences.
An LLM only knows what is in its training and current context.
Retrieve relevant information before generating a response.
User Question
↓
Document Search
↓
Relevant Documents
↓
LLM
↓
Answer
Provide instructions at runtime.
Example: "Act as a senior Java architect."
Model weights remain unchanged.
Additional training on specialized data.
Examples:
Model behavior becomes more specialized.
Hallucination = generating information that sounds plausible but is incorrect.
Examples:
User
↓
Frontend
↓
Application Layer
↓
LLM
↓
Tools / Databases / APIs
↓
Response
Modern AI applications rarely use an LLM alone. They combine:
| Term | Meaning | |------|---------| | Token | Smallest text unit processed by model | | Embedding | Numeric representation of meaning | | Context Window | Information available during generation | | Transformer | Architecture behind modern LLMs | | Attention | Mechanism for identifying relevant context | | Inference | Generating output from a prompt | | RAG | Retrieving data before generation | | Fine-Tuning | Additional training on specific data | | Hallucination | Confident but incorrect output |
An LLM is a transformer-based model trained to predict the next token, enabling it to generate, summarize, reason over, and transform language when supplied with sufficient context.