Files
claude-cookbooks/third_party/LlamaIndex/Basic_RAG_With_LlamaIndex.ipynb
Alex Notov 8d1c93365b Revert CLAUDE_API_KEY to ANTHROPIC_API_KEY throughout the repository
Reverted all instances of CLAUDE_API_KEY back to ANTHROPIC_API_KEY to maintain
compatibility with existing infrastructure and GitHub secrets. This affects:
- Environment variable examples (.env.example files)
- Python scripts and notebooks
- Documentation and README files
- Evaluation scripts and test files

Other naming changes (Claude API, Claude Console, Claude Docs, Claude Cookbook) remain intact.

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-16 17:02:29 -06:00

404 lines
9.2 KiB
Plaintext
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "813NspGRKhc2"
},
"source": [
"# RAG Pipeline with LlamaIndex\n",
"\n",
"In this notebook we will look into building Basic RAG Pipeline with LlamaIndex. The pipeline has following steps.\n",
"\n",
"1. Setup LLM and Embedding Model.\n",
"2. Download Data.\n",
"3. Load Data.\n",
"4. Index Data.\n",
"5. Create Query Engine.\n",
"6. Querying."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "qYHCYDecKDRZ"
},
"source": [
"### Installation"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "mLTzvn__ldjd"
},
"outputs": [],
"source": [
"!pip install llama-index\n",
"!pip install llama-index-llms-anthropic\n",
"!pip install llama-index-embeddings-huggingface"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Iu-wU44BKF9c"
},
"source": [
"### Setup API Keys"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"id": "ku5rkxtIlpCs"
},
"outputs": [],
"source": [
"import os\n",
"os.environ['ANTHROPIC_API_KEY'] = 'YOUR Claude API KEY'"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "jJTRVhkJKH4r"
},
"source": [
"### Setup LLM and Embedding model\n",
"\n",
"We will use anthropic latest released `Claude 3 Opus` models"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"id": "fQ0tqJL_mSe0"
},
"outputs": [],
"source": [
"from llama_index.llms.anthropic import Anthropic\n",
"from llama_index.embeddings.huggingface import HuggingFaceEmbedding"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"id": "snc2jpj4nlXJ"
},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "74a676bbd1d04390b4449a42b98a4428",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"config.json: 0%| | 0.00/777 [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "779eefefc7dc49bb9244dc54822d2195",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"model.safetensors: 0%| | 0.00/438M [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "167e22b0490c4c17a6a2ac3c1861e75a",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"tokenizer_config.json: 0%| | 0.00/366 [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "7f0e9e699cfb47b2bbe92756e95131ac",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"vocab.txt: 0%| | 0.00/232k [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "2623fec06f4c4fcd8fbaa496a0652b5e",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"tokenizer.json: 0%| | 0.00/711k [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "33d6ff6e06214af08c3474e2c9daf830",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"special_tokens_map.json: 0%| | 0.00/125 [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"llm = Anthropic(temperature=0.0, model='claude-3-opus-20240229')\n",
"embed_model = HuggingFaceEmbedding(model_name=\"BAAI/bge-base-en-v1.5\")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"id": "YmN2FEQFnx6Y"
},
"outputs": [],
"source": [
"from llama_index.core import Settings\n",
"Settings.llm = llm\n",
"Settings.embed_model = embed_model\n",
"Settings.chunk_size = 512"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "BqdSyD2_KK-m"
},
"source": [
"### Download Data"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "zC7CD222n72t",
"outputId": "42c54b8a-bec9-4c2e-9cd5-97387f7011eb"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--2024-03-08 06:51:30-- https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt\n",
"Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.108.133, 185.199.110.133, ...\n",
"Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 75042 (73K) [text/plain]\n",
"Saving to: data/paul_graham/paul_graham_essay.txt\n",
"\n",
"data/paul_graham/pa 100%[===================>] 73.28K --.-KB/s in 0.002s \n",
"\n",
"2024-03-08 06:51:30 (34.6 MB/s) - data/paul_graham/paul_graham_essay.txt saved [75042/75042]\n",
"\n"
]
}
],
"source": [
"!mkdir -p 'data/paul_graham/'\n",
"!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"id": "Ea9GbN2poO3V"
},
"outputs": [],
"source": [
"from llama_index.core import (\n",
" VectorStoreIndex,\n",
" SimpleDirectoryReader,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "2dOGl_SKKUNH"
},
"source": [
"### Load Data"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"id": "Q4hWhnhxoUBj"
},
"outputs": [],
"source": [
"documents = SimpleDirectoryReader(\"./data/paul_graham\").load_data()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "XM3-kLhdKRk1"
},
"source": [
"### Index Data"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"id": "yvblCnWYKSrh"
},
"outputs": [],
"source": [
"index = VectorStoreIndex.from_documents(\n",
" documents,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "a17Tz644KZ_P"
},
"source": [
"### Create Query Engine"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"id": "LIT8kqYKoaRq"
},
"outputs": [],
"source": [
"query_engine = index.as_query_engine(similarity_top_k=3)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "n6ODGRTxKd-u"
},
"source": [
"### Test Query"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"id": "igdrBclbKYrJ"
},
"outputs": [],
"source": [
"response = query_engine.query(\"What did author do growing up?\")"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "ap-6RUvSozgE",
"outputId": "37ba0327-afe7-4f1e-d862-0412085954c8"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Based on the information provided, the author worked on two main things outside of school before college: writing and programming.\n",
"\n",
"For writing, he wrote short stories as a beginning writer, though he felt they were awful, with hardly any plot and just characters with strong feelings.\n",
"\n",
"In terms of programming, in 9th grade he tried writing his first programs on an IBM 1401 computer that his school district used. He and his friend got permission to use it, programming in an early version of Fortran using punch cards. However, he had difficulty figuring out what to actually do with the computer at that stage given the limited inputs available.\n"
]
}
],
"source": [
"print(response)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "ri47de5wpMTo"
},
"outputs": [],
"source": []
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
},
"vscode": {
"interpreter": {
"hash": "b0fa6594d8f4cbf19f97940f81e996739fb7646882a419484c72d19e05852a7e"
}
}
},
"nbformat": 4,
"nbformat_minor": 4
}