# What are LLMs?
In the previous section we learned that each Agent needs **an AI Model at its core**, and that LLMs are the most common type of AI model for this purpose.
Now we will learn what LLMs are and how they power Agents.
This section offers a concise technical introduction to LLMs. If you want to dive deeper, you can check out our free [Natural Language Processing Course](https://huggingface.co/learn/nlp-course).
## What is a Large Language Model?
An LLM is a type of AI model that excels at **understanding and generating human language**. LLMs are trained on vast amounts of text data, allowing them to learn patterns, structure, and even nuance in language. These models typically consist of many millions, or even billions, of parameters.
Most LLMs nowadays are **built on the Transformer architecture**, a deep learning architecture based on the attention mechanism, which has gained significant interest since the release of BERT from Google in 2018.
Each LLM has some **special tokens** specific to the model. The most important of these is the **End Of Sequence (EOS) token**, which signals the model to stop generating. For example:

| Model | Provider | EOS Token | Functionality |
|---|---|---|---|
| GPT4 | OpenAI | `<\|endoftext\|>` | End of message text |
| Llama 3 | Meta (Facebook AI Research) | `<\|eot_id\|>` | End of sequence |
| Deepseek-R1 | DeepSeek | `<\|end_of_sentence\|>` | End of message text |
| SmolLM2 | Hugging Face | `<\|im_end\|>` | End of instruction or message |
| Gemma | Google | `<end_of_turn>` | End of conversation turn |
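As a quick check, here is a minimal sketch that inspects a model's EOS token with the `transformers` tokenizer (the SmolLM2 checkpoint name is an assumption; any Hub model id works):

```python
# Minimal sketch: inspect a model's EOS token with the transformers tokenizer.
# The checkpoint name below is an assumption; substitute any Hub model id.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M-Instruct")
print(tokenizer.eos_token)     # the string form, e.g. <|im_end|> for SmolLM2
print(tokenizer.eos_token_id)  # the integer id the model emits to stop decoding
```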
In other words, an LLM will decode text until it reaches the EOS token. But what happens during a single decoding loop?
While the full process can be quite technical for the purpose of learning agents, here's a brief overview:
- Once the input text is **tokenized**, the model computes a representation of the sequence that captures information about the meaning and the position of each token in the input sequence.
- From this representation, the model outputs scores that rank the likelihood of each token in its vocabulary being the next one in the sequence.
Based on these scores, we have multiple strategies to select the tokens to complete the sentence.
- The easiest decoding strategy, known as *greedy decoding*, is to always take the token with the maximum score (see the sketch after this list).
You can interact with the decoding process yourself with SmolLM2 in this Space (remember, it decodes until reaching an **EOS** token, which is `<|im_end|>` for this model).
- But there are more advanced decoding strategies. For example, *beam search* explores multiple candidate sequences to find the one with the maximum total score, even if some individual tokens have lower scores.
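Here is a minimal sketch of that greedy loop, assuming the `transformers` library and the SmolLM2 checkpoint named below (any causal LM from the Hub would work):

```python
# A minimal greedy-decoding loop: repeatedly pick the highest-scoring token
# until the model emits its EOS token. The checkpoint name is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM2-135M-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

ids = tokenizer("The capital of France is", return_tensors="pt")["input_ids"]
with torch.no_grad():
    for _ in range(20):                               # cap generation at 20 tokens
        logits = model(ids).logits                    # scores over the whole vocabulary
        next_id = logits[0, -1].argmax().view(1, 1)   # greedy: take the max-score token
        ids = torch.cat([ids, next_id], dim=-1)
        if next_id.item() == tokenizer.eos_token_id:  # stop once EOS is emitted
            break
print(tokenizer.decode(ids[0]))
```

In practice you would call `model.generate(...)`, which implements greedy decoding as well as beam search (via `num_beams`) and sampling strategies.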
If you want to know more about decoding, you can take a look at the [NLP course](https://huggingface.co/learn/nlp-course).
## Attention is all you need
A key aspect of the Transformer architecture is **Attention**. When predicting the next word, not every word in a sentence is equally important; words like "France" and "capital" in the sentence *"The capital of France is ..."* carry the most meaning.
This process of identifying the most relevant words to predict the next token has proven to be incredibly effective.
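As a rough illustration, here is a minimal sketch of scaled dot-product attention, the core operation of the Transformer, on made-up data (all shapes and values are illustrative assumptions):

```python
# Minimal scaled dot-product attention on made-up data: each token's output is
# a weighted mix of all tokens' values, weighted by query-key similarity.
import torch

seq_len, d = 4, 8                        # toy sizes: 4 tokens, 8-dim embeddings
q = torch.randn(seq_len, d)              # queries
k = torch.randn(seq_len, d)              # keys
v = torch.randn(seq_len, d)              # values

scores = q @ k.T / d ** 0.5              # how relevant each token is to each other token
weights = torch.softmax(scores, dim=-1)  # each row sums to 1
output = weights @ v                     # attend: mix values by relevance
print(weights)                           # large entries mark the tokens that matter most
```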
Although the basic principle of LLMs—predicting the next token—has remained consistent since GPT-2, there have been significant advancements in scaling neural networks and making the attention mechanism work for longer and longer sequences.
If you've interacted with LLMs, you're probably familiar with the term *context length*, which refers to the maximum number of tokens the LLM can process, and the maximum _attention span_ it has.
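For instance, you can check how many tokens a prompt consumes against a model's configured maximum (a sketch; the checkpoint name is an assumption, and `model_max_length` is read from the tokenizer's configuration):

```python
# Count a prompt's tokens and compare against the tokenizer's configured maximum.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M-Instruct")
prompt = "The capital of France is"
n_tokens = len(tokenizer(prompt)["input_ids"])
print(f"{n_tokens} of {tokenizer.model_max_length} tokens used")
```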
## Prompting the LLM is important
Considering that the only job of an LLM is to predict the next token by looking at every input token, and to choose which tokens are "important", the wording of your input sequence matters a great deal.
The input sequence you provide an LLM is called _a prompt_. Careful design of the prompt makes it easier **to guide the generation of the LLM toward the desired output**.
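As a small illustration, here is a sketch comparing two phrasings of the same request (the checkpoint name is an assumption, and the exact outputs will vary from run to run):

```python
# Sketch: the same request, phrased vaguely vs. precisely, steers generation
# differently. The checkpoint name is an assumption; outputs will vary.
from transformers import pipeline

generator = pipeline("text-generation", model="HuggingFaceTB/SmolLM2-135M-Instruct")

prompts = [
    "France",                                            # vague: no clear task
    "Complete the sentence: the capital of France is",   # precise: clear task
]
for prompt in prompts:
    out = generator(prompt, max_new_tokens=10)[0]["generated_text"]
    print(repr(prompt), "->", repr(out))
```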
## How are LLMs trained?
LLMs are trained on large datasets of text, where they learn to predict the next word in a sequence through a self-supervised or masked language modeling objective.
From this self-supervised learning, the model learns the structure of the language and **underlying patterns in text, allowing the model to generalize to unseen data**.
After this initial _pre-training_, LLMs can be fine-tuned on a supervised learning objective to perform specific tasks. For example, some models are trained for conversational structures or tool usage, while others focus on classification or code generation.
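To make the pre-training objective concrete, here is a minimal sketch of the next-token (causal language modeling) loss using `transformers` (the base checkpoint name is an assumption; passing `labels` makes the library compute the shifted next-token cross-entropy):

```python
# Sketch of the next-token prediction objective used in pre-training:
# passing labels equal to the input ids makes transformers compute the
# cross-entropy of predicting each token from the ones before it.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM2-135M"  # assumed base (pre-trained) checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

batch = tokenizer("LLMs learn by predicting the next token.", return_tensors="pt")
out = model(**batch, labels=batch["input_ids"])  # shift-by-one handled internally
print(out.loss)  # pre-training minimizes this loss over huge text corpora
```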
## How can I use LLMs?
You have two main options:
1. **Run Locally** (if you have sufficient hardware).
2. **Use a Cloud/API** (e.g., via the Hugging Face Serverless Inference API).
Throughout this course, we will primarily use models via APIs on the Hugging Face Hub. Later on, we will explore how to run these models locally on your hardware.
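For the API route, here is a minimal sketch using the `huggingface_hub` client (the model id is an assumption, and you need a token from https://hf.co/settings/tokens in the `HF_TOKEN` environment variable):

```python
# Minimal sketch: query a hosted model through the Hugging Face Inference API.
# The model id is an assumption; HF_TOKEN must hold a valid Hugging Face token.
import os
from huggingface_hub import InferenceClient

client = InferenceClient(token=os.environ["HF_TOKEN"])
response = client.chat_completion(
    model="meta-llama/Llama-3.2-3B-Instruct",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=32,
)
print(response.choices[0].message.content)
```

Running locally instead would use the same `transformers` calls shown earlier; the trade-off is hardware requirements versus network latency and cost.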
## How are LLMs used in AI Agents?
LLMs are a key component of AI Agents, **providing the foundation for understanding and generating human language**.
They can interpret user instructions, maintain context in conversations, define a plan, and decide which tools to use.
We will explore these steps in more detail in this Unit, but for now, what you need to understand is that the LLM is **the brain of the Agent**.
---
That was a lot of information! We've covered the basics of what LLMs are, how they function, and their role in powering AI agents.
If you'd like to dive even deeper into the fascinating world of language models and natural language processing, don't hesitate to check out our free NLP course.
Now that we understand how LLMs work, it's time to see **how LLMs structure their generations in a conversational context**.