
Andrej Karpathy’s deep dive into LLMs video

Read Time 10 mins | Written by: Cole

In this video, Andrej Karpathy (one of the leading voices in AI research and education) guides us through the intricate process of building and refining large language models (LLMs). 

From ingesting vast amounts of internet text during pretraining to fine-tuning with human feedback and reinforcement learning, every stage of the model’s evolution is unpacked. 

Whether you’re interested in the basics of tokenization or the challenges of model “hallucinations,” this video has it all.

Watch Andrej Karpathy’s Deep Dive into LLMs video 

 

Andrej Karpathy recently launched Eureka Labs, a new initiative focused on democratizing AI education. With a mission to make complex machine learning concepts accessible and engaging, Eureka Labs offers high-quality, hands-on content that empowers learners of all levels to understand and build with AI. It’s part of Karpathy’s broader vision to accelerate public understanding of large language models and their real-world applications.

Here's a breakdown of the video by section, followed by a list of resources at the end.

LLM video outline by topic

00:00:00 introduction

  • The video introduces the topic of LLMs and sets the stage for a detailed exploration.
  • The speaker explains the importance of understanding the training stages and internal mechanics.
  • The discussion establishes expectations for a deep dive into LLM architecture and training processes.

00:01:00 pretraining data (internet)

  • LLMs begin their learning journey by ingesting vast amounts of text from the internet.
  • Pretraining builds a broad foundation of language patterns and factual knowledge.
  • This stage forms the baseline for all subsequent learning and specialization.
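
To make the pretraining data stage concrete, here is a minimal sketch of peeking at a web-scale corpus with the Hugging Face datasets library, using the FineWeb dataset linked in the resources at the end; streaming keeps you from downloading the full multi-terabyte corpus.

```python
# A small peek at a web-scale pretraining corpus. FineWeb is the dataset
# Karpathy uses as his running example (see the resources at the end);
# streaming avoids downloading the multi-terabyte corpus up front.
from datasets import load_dataset

ds = load_dataset("HuggingFaceFW/fineweb", split="train", streaming=True)

for i, example in enumerate(ds):
    print(example["text"][:200].replace("\n", " "), "...")
    if i >= 2:          # just show the first few documents
        break
```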

00:07:47 tokenization

  • Text is broken down into tokens: small units such as subwords or characters.
  • Tokenization makes processing more efficient while introducing limitations for fine-grained tasks.
  • The method shapes how the model “sees” and processes language; a short tokenization sketch follows this list.
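
As a quick, hands-on illustration, here is a minimal sketch using OpenAI's tiktoken library with the GPT-2 vocabulary; the video demonstrates the same idea interactively with the Tiktokenizer web app linked in the resources.

```python
# A minimal look at tokenization with the GPT-2 byte-pair-encoding vocabulary.
import tiktoken

enc = tiktoken.get_encoding("gpt2")

text = "Hello world, this is tokenization!"
token_ids = enc.encode(text)          # text -> list of integer token IDs
print(token_ids)

# Each ID maps back to a chunk of text, often a subword rather than a whole word
print([enc.decode([t]) for t in token_ids])
```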

00:14:27 neural network I/O

  • The model receives input as token sequences and generates output by predicting the next token.
  • The internal transformer architecture, with its attention mechanisms and multilayer perceptrons, processes each token in a fixed number of computational steps.
  • These components work together to encode language patterns and store knowledge; a minimal input/output sketch follows this list.
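
A rough sketch of that interface, using the small GPT-2 checkpoint from Hugging Face transformers as a stand-in for a modern model: token IDs go in, and a probability distribution over the next token comes out.

```python
# Input: a sequence of token IDs. Output: a probability distribution over the
# next token. GPT-2 small (via Hugging Face transformers) is used as a stand-in.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

ids = tok("The capital of France is", return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits                # shape: (1, seq_len, vocab_size)

probs = torch.softmax(logits[0, -1], dim=-1)  # distribution over the next token
top = torch.topk(probs, 5)
for p, i in zip(top.values, top.indices):
    print(f"{tok.decode([int(i)])!r}: {p.item():.3f}")
```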

00:26:01 inference

  • During inference, the model uses its fixed, pre-learned parameters to generate text one token at a time.
  • The output is produced by sampling from learned probability distributions.
  • Examples from GPT-2 and Llama 3.1 demonstrate how inference has evolved in modern models.
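
A stripped-down version of that loop might look like the following (again with GPT-2 as a stand-in); real inference stacks add temperature, top-p sampling, KV caching, and batching on top of the same idea.

```python
# A stripped-down inference loop: look at the distribution over the next token,
# sample one token, append it to the context, and repeat.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

ids = tok("Once upon a time", return_tensors="pt").input_ids
for _ in range(20):                                    # generate 20 tokens
    with torch.no_grad():
        logits = model(ids).logits[0, -1]              # logits for the next position
    probs = torch.softmax(logits, dim=-1)
    next_id = torch.multinomial(probs, num_samples=1)  # sample rather than argmax
    ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

print(tok.decode(ids[0]))
```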

00:31:09 GPT-2: training and inference

00:42:52 Llama 3.1 base model inference

  • The shift from general language understanding (pretraining) to specialized behavior (post-training) is explained.
  • Pretraining builds a broad knowledge base, while post-training refines this into a practical, interactive assistant.
  • This transition is crucial for converting a raw language simulator into a task-oriented tool.

01:01:06 post-training data (conversations)

  • Curated conversational data is used to fine-tune the model after pretraining.
  • Human labelers create ideal prompt–response pairs to teach the model to behave like a helpful assistant.
  • This phase adjusts the model’s personality and practical utility for dialogue.
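
To make the format concrete, here is a hedged illustration of a single training example: one conversation flattened into the string the model actually learns from. The <|im_start|>/<|im_end|> markers and the render() helper are illustrative placeholders, since the exact special tokens vary by model family.

```python
# One supervised fine-tuning example: a human-written ideal response, rendered
# into a single token stream with markers for who is speaking. The special
# tokens and the render() helper are illustrative placeholders.
conversation = [
    {"role": "user", "content": "Why is the sky blue?"},
    {"role": "assistant", "content": "Sunlight scatters off air molecules, and "
                                     "shorter (blue) wavelengths scatter the most."},
]

def render(messages):
    # Flatten the conversation into the string the model is actually trained on.
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    return "\n".join(parts)

print(render(conversation))
```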

01:20:32 hallucinations, tool use, knowledge/working memory

  • The speaker addresses the phenomenon of hallucinations, where models generate confident yet incorrect or fabricated information.
  • Limited working memory (context window) sometimes leads to errors in complex tasks.
  • External tools like code interpreters or web searches can offload certain computations, enhancing reliability.
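
The tool-use idea can be sketched in a few lines: the model emits a structured call instead of answering from memory, the runtime executes it, and the result is fed back into the context window. The calculator tool and the call format below are hypothetical.

```python
# Toy version of tool use: the model emits a structured call instead of relying
# on its parameters, the runtime executes it, and the result goes back into the
# context window (its working memory). The tool name and format are hypothetical.
import json

def run_tool(call):
    if call["tool"] == "calculator":
        return str(eval(call["expression"]))   # toy only; never eval untrusted input
    raise ValueError(f"unknown tool: {call['tool']}")

model_output = '{"tool": "calculator", "expression": "123456 * 789"}'
observation = run_tool(json.loads(model_output))
print("Result appended to the context:", observation)
```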

01:41:46 knowledge of self

  • The model’s responses about its own identity are derived from statistical patterns rather than true self-awareness.
  • Questions like “Who built you?” yield generic, data-driven answers.
  • This highlights that LLMs are simulations of human text, not sentient beings.

01:46:56 models need tokens to think

  • The model’s reasoning is distributed across many tokens in its output sequence.
  • Each token is produced with a fixed computational budget, so complex tasks must be broken into smaller steps.
  • Overloading a single token with too much computation can lead to errors.
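
One way to see this is to compare two prompts for the same arithmetic question; the prompts and the worked decomposition below are illustrative, not taken from the video.

```python
# Two ways to ask the same question. The first forces the whole calculation into
# essentially one emitted token; the second spreads it over many tokens, each of
# which gets its own forward pass.
single_token_prompt = "What is 13 * 47? Reply with only the number."
spread_out_prompt = "What is 13 * 47? Work it out step by step, then answer."

# What the step-by-step answer buys the model, roughly:
#   13 * 47 = 13 * 40 + 13 * 7
#           = 520 + 91
#           = 611
# Each intermediate token is cheap to produce; the final answer token only has
# to copy a result that is already sitting in the context.
```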

02:01:11 tokenization revisited: models struggle with spelling

  • Revisiting tokenization reveals challenges with precise, character-level tasks.
  • Operations like spelling or counting individual letters can be error-prone.
  • External mechanisms (such as using code) may be needed for fine-grained tasks, as in the short example below.
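
The classic case is counting letters in a word, which is trivial to offload to code but unreliable when answered from token-level statistics:

```python
# Counting characters is trivial in code but unreliable for a token-based model,
# which sees "strawberry" as one or two tokens rather than individual letters.
word = "strawberry"
print(word.count("r"))   # 3
```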

02:04:53 jagged intelligence

  • “Jagged intelligence” describes the uneven performance of LLMs, where they may show sudden lapses in reasoning.
  • Even high-performing models can exhibit inexplicable errors in simple tasks, like basic arithmetic.
  • This variability underscores inherent gaps in statistical learning.

02:07:28 supervised finetuning to reinforcement learning

  • The transition from supervised fine-tuning (imitating ideal human responses) to reinforcement learning (RL) is explained.
  • Supervised fine-tuning creates a baseline assistant behavior using human-generated dialogue.
  • Reinforcement learning then refines the model by rewarding effective solution paths.

02:14:42 reinforcement learning

  • RL helps the model explore multiple candidate solutions and reinforces strategies that yield correct outcomes.
  • The process is compared to a student practicing problem-solving through trial and error.
  • Rewarding correct strategies leads to improved performance over time.
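
Here is a toy sketch of that selection step under simplifying assumptions: sample several candidate solutions, keep the ones that reach the right answer, and reinforce those. Real systems update the model with policy-gradient methods; the candidate generator and string-matching reward below are stand-ins.

```python
# Toy version of the selection step: sample several candidate solutions, check
# which ones reach the correct answer, and reinforce those. generate_candidate()
# and the string-matching reward are stand-ins for the model and a real verifier.
import random

def generate_candidate():
    return random.choice(["... so the answer is 611", "... so the answer is 601"])

def reward(solution, correct_answer="611"):
    return 1.0 if solution.endswith(correct_answer) else 0.0

candidates = [generate_candidate() for _ in range(8)]
reinforced = [c for c in candidates if reward(c) > 0]
print(f"{len(reinforced)}/{len(candidates)} candidates would be reinforced")
```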

02:27:47 DeepSeek-R1

  • DeepSeek-R1 is showcased as an example of a model enhanced by reinforcement learning.
  • The model develops extended chains of thought, particularly useful in solving math and coding problems.
  • This demonstrates the emergence of “thinking” behavior in LLMs through RL.

02:42:07 AlphaGo

  • The RL process in LLMs is compared to AlphaGo’s innovative play in the game of Go.
  • Just as AlphaGo discovered moves beyond human strategies (e.g., “Move 37”), LLMs can develop novel reasoning techniques.
  • This analogy underscores the transformative potential of reinforcement learning.

02:48:26 reinforcement learning from human feedback (RLHF)

  • RLHF combines reinforcement learning with human feedback to fine-tune model responses.
  • Human evaluators rank multiple outputs, and a reward model is trained to mimic these judgments.
  • While effective, this method must be carefully managed to prevent the model from exploiting the reward system.
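
The heart of the reward model can be sketched as a pairwise ranking objective: train a scorer so the human-preferred response outscores the rejected one. The tiny network and random embeddings below are placeholders for a full transformer over real response pairs.

```python
# Pairwise reward-model objective: make the human-preferred ("chosen") response
# score higher than the rejected one. The tiny scorer and random embeddings are
# placeholders for a full transformer over real response pairs.
import torch
import torch.nn as nn
import torch.nn.functional as F

reward_model = nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 1))
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-4)

chosen_emb = torch.randn(16, 768)     # embeddings of preferred responses
rejected_emb = torch.randn(16, 768)   # embeddings of rejected responses

score_chosen = reward_model(chosen_emb)
score_rejected = reward_model(rejected_emb)

# Standard pairwise ranking loss: push chosen scores above rejected scores
loss = -F.logsigmoid(score_chosen - score_rejected).mean()

optimizer.zero_grad()
loss.backward()
optimizer.step()
```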

03:09:39 preview of things to come

  • The speaker previews future advancements in LLM technology, including strong multimodal capabilities.
  • Upcoming models are expected to handle text, audio, and images in a unified system.
  • Further improvements in managing long-duration tasks and dynamic real-time learning are anticipated.

03:15:15 keeping track of LLMs

  • Resources such as leaderboards (e.g., LM Arena) and AI newsletters help monitor model performance.
  • These tools allow users to compare models and stay updated with rapid advancements in the field.
  • Keeping informed is essential in this fast-evolving area of technology.

03:18:34 where to find LLMs

  • Guidance is provided on accessing various LLMs from both proprietary and open-weight providers.
  • Proprietary models can be found via platforms like OpenAI and Google, while open models (e.g., DeepSeek, Llama) are available on community inference sites.
  • Users are encouraged to experiment with multiple options to find the best fit for their needs.
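
For the local, open-weights route, here is a minimal sketch of querying a model served by LM Studio (linked in the resources below), which exposes an OpenAI-compatible endpoint; the port and model name are assumptions that depend on your local setup.

```python
# Querying an open-weights model served locally by LM Studio, which exposes an
# OpenAI-compatible endpoint. The port and model name depend on your local setup.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="llama-3.1-8b-instruct",   # whichever model you have downloaded
    messages=[{"role": "user", "content": "In one sentence, what is pretraining?"}],
)
print(resp.choices[0].message.content)
```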

03:21:46 grand summary

  • The video recaps the complete training pipeline – from pretraining to supervised fine-tuning and reinforcement learning.
  • It emphasizes the emergence of advanced chain-of-thought reasoning and acknowledges the limitations of LLMs.
  • Despite their power, LLMs are statistical simulators that require human oversight, serving as invaluable tools when used responsibly.

Resources from Video

ChatGPT: https://chatgpt.com/

FineWeb (pretraining dataset): https://huggingface.co/spaces/Hugging...

Tiktokenizer: https://tiktokenizer.vercel.app/

Transformer Neural Net 3D visualizer: https://bbycroft.net/llm

llm.c (Let's Reproduce GPT-2): https://github.com/karpathy/llm.c/dis...

Llama 3 paper from Meta: https://arxiv.org/abs/2407.21783

Hyperbolic, for inference of base model: https://app.hyperbolic.xyz/

InstructGPT paper on SFT: https://arxiv.org/abs/2203.02155

HuggingFace inference playground: https://huggingface.co/spaces/hugging...

DeepSeek-R1 paper: https://arxiv.org/abs/2501.12948

TogetherAI Playground for open model inference: https://api.together.xyz/playground

AlphaGo paper (PDF): https://discovery.ucl.ac.uk/id/eprint...

LM Arena for model rankings: https://lmarena.ai/

AI News Newsletter: https://buttondown.com/ainews

LMStudio for local inference: https://lmstudio.ai/

 

The visualization UI used in the video: https://excalidraw.com/

The specific file of Excalidraw built up in video: https://drive.google.com/file/d/1EZh5...

Discord channel for Eureka Labs and this video: discord

Cole

Cole is Codingscape's Content Marketing Strategist & Copywriter.