{ "cells": [ { "cell_type": "markdown", "id": "1972a352-a0e3-41b7-81dc-dd4ae2b890c3", "metadata": {}, "source": [ "## Retrieval-Augmented Generation (RAG) using LlamaIndex" ] }, { "cell_type": "markdown", "id": "05d999bc-83a3-454f-a8a4-44cbff1fcedc", "metadata": {}, "source": [ "\r\n", " \"Open\r\n", "" ] }, { "cell_type": "markdown", "id": "a06035dc-f812-419b-bd08-538c2e00cdda", "metadata": {}, "source": [ "[**LlamaIndex**](https://github.com/run-llama/llama_index) is a widely used RAG framework. **LLMLingua** and **LongLLMLingua** have also been incorporated into the [LlamaIndex pipeline](https://github.com/run-llama/llama_index), which allows for more convenient use of LLMLingua-related technologies in RAG scenarios." ] }, { "cell_type": "markdown", "id": "a6137de2-0e3f-4962-860c-680da4df2eae", "metadata": {}, "source": [ "More specifically, [**LongLLMLinguaPostprocessor**](https://github.com/run-llama/llama_index/blob/main/llama_index/postprocessor/longllmlingua.py#L16) can be used as a **Postprocessor** in **LlamaIndex** by invoking it, with arguments consistent with those in the [**PromptCompressor**](https://github.com/microsoft/LLMLingua/blob/main/llmlingua/prompt_compressor.py) of [**LLMLingua**](https://github.com/microsoft/LLMLingua).\n", "You can call the corresponding compression algorithms in LLMLingua and the question-aware prompt compression method in LongLLMLingua." ] }, { "cell_type": "markdown", "id": "44f78b07-0a11-4c71-86cb-213a32c4fd7a", "metadata": {}, "source": [ "For examples,\n", "```python\n", "from llama_index.query_engine import RetrieverQueryEngine\n", "from llama_index.response_synthesizers import CompactAndRefine\n", "from llama_index.indices.postprocessor import LongLLMLinguaPostprocessor\n", "\n", "node_postprocessor = LongLLMLinguaPostprocessor(\n", " instruction_str=\"Given the context, please answer the final question\",\n", " target_token=300,\n", " rank_method=\"longllmlingua\",\n", " additional_compress_kwargs={\n", " \"condition_compare\": True,\n", " \"condition_in_question\": \"after\",\n", " \"context_budget\": \"+100\",\n", " \"reorder_context\": \"sort\", # enable document reorder\n", " \"dynamic_context_compression_ratio\": 0.4, # enable dynamic compression ratio\n", " },\n", ")\n", "```" ] }, { "cell_type": "markdown", "id": "fe3ed1ce-d38d-4048-9db6-9707b55dc642", "metadata": {}, "source": [ "Retrieval-Augmented Generation (RAG) is a powerful and popular technique that applies specialized knowledge to large language models (LLMs). However, traditional RAG methods tend to have increasingly long prompts, sometimes exceeding **40k**, which can result in high financial and latency costs. Moreover, the decreased information density within the prompts can lead to performance degradation in LLMs, such as the \"lost in the middle\" issue." ] }, { "cell_type": "markdown", "id": "ae003ead-2f07-44a4-b641-2e33be920dd9", "metadata": {}, "source": [ "
" ] }, { "cell_type": "markdown", "id": "0b39b33f-5860-4825-8f00-d60aed0dce86", "metadata": {}, "source": [ "To address this, we propose [**LongLLMLingua**](https://arxiv.org/abs/2310.06839), which specifically tackles the low information density problem in long context scenarios via prompt compression, making it particularly suitable for RAG tasks. The main ideas involve a two-stage compression process, as shown by the **red line**, which significantly improves the original curve:\n", "\n", "- Coarse-grained compression through document-level perplexity;\n", "- Fine-grained compression of the remaining text using token perplexity;" ] }, { "cell_type": "markdown", "id": "c748f877-4bbf-443c-b72b-332be1df6f1a", "metadata": {}, "source": [ "Instead of fighting against positional effects, we aim to utilize them to our advantage through document reordering, as illustrated by the **green line**. In this approach, the most critical passages are placed at the beginning and the end of the context. Furthermore, the entire process becomes more **cost-effective and faster** since it only requires handling **1/4** of the original context." ] }, { "cell_type": "markdown", "id": "18422597-687a-43aa-a6ed-ce6244d0eb55", "metadata": {}, "source": [ "### PG's essay" ] }, { "cell_type": "markdown", "id": "51a7accd-5ec2-4ed2-9582-1afdb441a998", "metadata": {}, "source": [ "Next, we will demonstrate the use of LongLLMLingua on the **PG's essay** dataset in LlamaIndex pipeline, which effectively alleviates the \"lost in the middle\" issue." ] }, { "cell_type": "code", "execution_count": 2, "id": "a970a901-11bd-43af-a8bc-7fb2fc6a1a07", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Defaulting to user installation because normal site-packages is not writeable\n", "Requirement already satisfied: llmlingua in /home/hjiang/Code/github/LLMLingua (0.1.2)\n", "Requirement already satisfied: llama-index in /home/hjiang/.local/lib/python3.9/site-packages (0.8.50)\n", "Requirement already satisfied: nltk in /home/hjiang/.local/lib/python3.9/site-packages (from llmlingua) (3.8.1)\n", "Requirement already satisfied: numpy in /home/hjiang/.local/lib/python3.9/site-packages (from llmlingua) (1.23.5)\n", "Requirement already satisfied: tiktoken in /home/hjiang/.local/lib/python3.9/site-packages (from llmlingua) (0.4.0)\n", "Requirement already satisfied: torch in /home/hjiang/.local/lib/python3.9/site-packages (from llmlingua) (1.13.1+cu116)\n", "Requirement already satisfied: transformers>=4.26.0 in /home/hjiang/.local/lib/python3.9/site-packages (from llmlingua) (4.34.1)\n", "Requirement already satisfied: SQLAlchemy[asyncio]>=1.4.49 in /home/hjiang/.local/lib/python3.9/site-packages (from llama-index) (2.0.22)\n", "Requirement already satisfied: dataclasses-json<0.6.0,>=0.5.7 in /home/hjiang/.local/lib/python3.9/site-packages (from llama-index) (0.5.14)\n", "Requirement already satisfied: deprecated>=1.2.9.3 in /home/hjiang/.local/lib/python3.9/site-packages (from llama-index) (1.2.14)\n", "Requirement already satisfied: fsspec>=2023.5.0 in /home/hjiang/.local/lib/python3.9/site-packages (from llama-index) (2023.6.0)\n", "Requirement already satisfied: langchain>=0.0.303 in /home/hjiang/.local/lib/python3.9/site-packages (from llama-index) (0.0.322)\n", "Requirement already satisfied: nest-asyncio<2.0.0,>=1.5.8 in /home/hjiang/.local/lib/python3.9/site-packages (from llama-index) (1.5.8)\n", "Requirement already satisfied: openai>=0.26.4 in /home/hjiang/.local/lib/python3.9/site-packages (from llama-index) (0.27.8)\n", "Requirement already satisfied: pandas in /home/hjiang/.local/lib/python3.9/site-packages (from llama-index) (2.0.3)\n", "Requirement already satisfied: tenacity<9.0.0,>=8.2.0 in /home/hjiang/.local/lib/python3.9/site-packages (from llama-index) (8.2.3)\n", "Requirement already satisfied: typing-extensions>=4.5.0 in /home/hjiang/.local/lib/python3.9/site-packages (from llama-index) (4.7.1)\n", "Requirement already satisfied: typing-inspect>=0.8.0 in /home/hjiang/.local/lib/python3.9/site-packages (from llama-index) (0.9.0)\n", "Requirement already satisfied: urllib3<2 in /home/hjiang/.local/lib/python3.9/site-packages (from llama-index) (1.26.16)\n", "Requirement already satisfied: marshmallow<4.0.0,>=3.18.0 in /home/hjiang/.local/lib/python3.9/site-packages (from dataclasses-json<0.6.0,>=0.5.7->llama-index) (3.20.1)\n", "Requirement already satisfied: wrapt<2,>=1.10 in /home/hjiang/.local/lib/python3.9/site-packages (from deprecated>=1.2.9.3->llama-index) (1.15.0)\n", "Requirement already satisfied: PyYAML>=5.3 in /usr/lib/python3/dist-packages (from langchain>=0.0.303->llama-index) (5.3.1)\n", "Requirement already satisfied: aiohttp<4.0.0,>=3.8.3 in /home/hjiang/.local/lib/python3.9/site-packages (from langchain>=0.0.303->llama-index) (3.8.5)\n", "Requirement already satisfied: anyio<4.0 in /home/hjiang/.local/lib/python3.9/site-packages (from langchain>=0.0.303->llama-index) (3.7.1)\n", "Requirement already satisfied: async-timeout<5.0.0,>=4.0.0 in /home/hjiang/.local/lib/python3.9/site-packages (from langchain>=0.0.303->llama-index) (4.0.2)\n", "Requirement already satisfied: jsonpatch<2.0,>=1.33 in /home/hjiang/.local/lib/python3.9/site-packages (from langchain>=0.0.303->llama-index) (1.33)\n", "Requirement already satisfied: langsmith<0.1.0,>=0.0.43 in /home/hjiang/.local/lib/python3.9/site-packages (from langchain>=0.0.303->llama-index) (0.0.51)\n", "Requirement already satisfied: pydantic<3,>=1 in /home/hjiang/.local/lib/python3.9/site-packages (from langchain>=0.0.303->llama-index) (1.10.12)\n", "Requirement already satisfied: requests<3,>=2 in /home/hjiang/.local/lib/python3.9/site-packages (from langchain>=0.0.303->llama-index) (2.29.0)\n", "Requirement already satisfied: click in /home/hjiang/.local/lib/python3.9/site-packages (from nltk->llmlingua) (8.1.6)\n", "Requirement already satisfied: joblib in /home/hjiang/.local/lib/python3.9/site-packages (from nltk->llmlingua) (1.3.1)\n", "Requirement already satisfied: regex>=2021.8.3 in /home/hjiang/.local/lib/python3.9/site-packages (from nltk->llmlingua) (2023.6.3)\n", "Requirement already satisfied: tqdm in /home/hjiang/.local/lib/python3.9/site-packages (from nltk->llmlingua) (4.65.0)\n", "Requirement already satisfied: greenlet!=0.4.17 in /home/hjiang/.local/lib/python3.9/site-packages (from SQLAlchemy[asyncio]>=1.4.49->llama-index) (3.0.0)\n", "Requirement already satisfied: filelock in /home/hjiang/.local/lib/python3.9/site-packages (from transformers>=4.26.0->llmlingua) (3.12.2)\n", "Requirement already satisfied: huggingface-hub<1.0,>=0.16.4 in /home/hjiang/.local/lib/python3.9/site-packages (from transformers>=4.26.0->llmlingua) (0.16.4)\n", "Requirement already satisfied: packaging>=20.0 in /home/hjiang/.local/lib/python3.9/site-packages (from transformers>=4.26.0->llmlingua) (23.0)\n", "Requirement already satisfied: tokenizers<0.15,>=0.14 in /home/hjiang/.local/lib/python3.9/site-packages (from transformers>=4.26.0->llmlingua) (0.14.1)\n", "Requirement already satisfied: safetensors>=0.3.1 in /home/hjiang/.local/lib/python3.9/site-packages (from transformers>=4.26.0->llmlingua) (0.3.1)\n", "Requirement already satisfied: mypy-extensions>=0.3.0 in /home/hjiang/.local/lib/python3.9/site-packages (from typing-inspect>=0.8.0->llama-index) (1.0.0)\n", "Requirement already satisfied: python-dateutil>=2.8.2 in /home/hjiang/.local/lib/python3.9/site-packages (from pandas->llama-index) (2.8.2)\n", "Requirement already satisfied: pytz>=2020.1 in /home/hjiang/.local/lib/python3.9/site-packages (from pandas->llama-index) (2023.3)\n", "Requirement already satisfied: tzdata>=2022.1 in /home/hjiang/.local/lib/python3.9/site-packages (from pandas->llama-index) (2023.3)\n", "Requirement already satisfied: attrs>=17.3.0 in /home/hjiang/.local/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain>=0.0.303->llama-index) (23.1.0)\n", "Requirement already satisfied: charset-normalizer<4.0,>=2.0 in /home/hjiang/.local/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain>=0.0.303->llama-index) (3.2.0)\n", "Requirement already satisfied: multidict<7.0,>=4.5 in /home/hjiang/.local/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain>=0.0.303->llama-index) (6.0.4)\n", "Requirement already satisfied: yarl<2.0,>=1.0 in /home/hjiang/.local/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain>=0.0.303->llama-index) (1.9.2)\n", "Requirement already satisfied: frozenlist>=1.1.1 in /home/hjiang/.local/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain>=0.0.303->llama-index) (1.4.0)\n", "Requirement already satisfied: aiosignal>=1.1.2 in /home/hjiang/.local/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain>=0.0.303->llama-index) (1.3.1)\n", "Requirement already satisfied: idna>=2.8 in /usr/lib/python3/dist-packages (from anyio<4.0->langchain>=0.0.303->llama-index) (2.8)\n", "Requirement already satisfied: sniffio>=1.1 in /home/hjiang/.local/lib/python3.9/site-packages (from anyio<4.0->langchain>=0.0.303->llama-index) (1.3.0)\n", "Requirement already satisfied: exceptiongroup in /home/hjiang/.local/lib/python3.9/site-packages (from anyio<4.0->langchain>=0.0.303->llama-index) (1.1.2)\n", "Requirement already satisfied: jsonpointer>=1.9 in /usr/lib/python3/dist-packages (from jsonpatch<2.0,>=1.33->langchain>=0.0.303->llama-index) (2.0)\n", "Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil>=2.8.2->pandas->llama-index) (1.14.0)\n", "Requirement already satisfied: certifi>=2017.4.17 in /usr/lib/python3/dist-packages (from requests<3,>=2->langchain>=0.0.303->llama-index) (2019.11.28)\n", "\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.3.1\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3.9 -m pip install --upgrade pip\u001b[0m\n" ] } ], "source": [ "# Install dependency.\n", "!pip install llmlingua llama-index" ] }, { "cell_type": "code", "execution_count": 7, "id": "cbbbf3de-a9d6-46cf-afab-dcb72a6154ec", "metadata": {}, "outputs": [], "source": [ "# Using the OAI\n", "import openai\n", "openai.api_key = \"\"" ] }, { "cell_type": "code", "execution_count": 3, "id": "46506810-8565-43da-984b-d862c56b49c2", "metadata": {}, "outputs": [], "source": [ "# or Using the AOAI\n", "import openai\n", "openai.api_key = \"\"\n", "openai.api_base = \"https://xxxx.openai.azure.com/\"\n", "openai.api_type = 'azure'\n", "openai.api_version = '2023-05-15'" ] }, { "cell_type": "markdown", "id": "f8676ffa-5117-44dc-9742-bb9ab1d56e0c", "metadata": {}, "source": [ "### Setup Data" ] }, { "cell_type": "code", "execution_count": 4, "id": "bb349566-83d8-44ac-a683-b67ed9ddf7a6", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--2023-10-31 15:16:22-- https://www.dropbox.com/s/f6bmb19xdg0xedm/paul_graham_essay.txt?dl=1\n", "Resolving www.dropbox.com (www.dropbox.com)... 162.125.2.18, 2620:100:6017:18::a27d:212\n", "Connecting to www.dropbox.com (www.dropbox.com)|162.125.2.18|:443... connected.\n", "HTTP request sent, awaiting response... 302 Found\n", "Location: /s/dl/f6bmb19xdg0xedm/paul_graham_essay.txt [following]\n", "--2023-10-31 15:16:22-- https://www.dropbox.com/s/dl/f6bmb19xdg0xedm/paul_graham_essay.txt\n", "Reusing existing connection to www.dropbox.com:443.\n", "HTTP request sent, awaiting response... 302 Found\n", "Location: https://uc79cc99922e921397f441d519f7.dl.dropboxusercontent.com/cd/0/get/CGo-ddVpLM8dpEbGPhaDcZnqlmurexkVdlYv9jcpsjMI9xyxqtt-feE8m6nlMFwBfbWAp9oEfbf0YZC65uNupypod6w4ANXltrG3NpGWErO9j18UQuwqd2wr79FcGtg55HxuwN_2xElcqEPjH3zg8RZl/file?dl=1# [following]\n", "--2023-10-31 15:16:22-- https://uc79cc99922e921397f441d519f7.dl.dropboxusercontent.com/cd/0/get/CGo-ddVpLM8dpEbGPhaDcZnqlmurexkVdlYv9jcpsjMI9xyxqtt-feE8m6nlMFwBfbWAp9oEfbf0YZC65uNupypod6w4ANXltrG3NpGWErO9j18UQuwqd2wr79FcGtg55HxuwN_2xElcqEPjH3zg8RZl/file?dl=1\n", "Resolving uc79cc99922e921397f441d519f7.dl.dropboxusercontent.com (uc79cc99922e921397f441d519f7.dl.dropboxusercontent.com)... 162.125.2.15, 2620:100:6017:15::a27d:20f\n", "Connecting to uc79cc99922e921397f441d519f7.dl.dropboxusercontent.com (uc79cc99922e921397f441d519f7.dl.dropboxusercontent.com)|162.125.2.15|:443... connected.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 75047 (73K) [application/binary]\n", "Saving to: ‘paul_graham_essay.txt’\n", "\n", "paul_graham_essay.t 100%[===================>] 73.29K --.-KB/s in 0.03s \n", "\n", "2023-10-31 15:16:23 (2.15 MB/s) - ‘paul_graham_essay.txt’ saved [75047/75047]\n", "\n" ] } ], "source": [ "!wget \"https://www.dropbox.com/s/f6bmb19xdg0xedm/paul_graham_essay.txt?dl=1\" -O paul_graham_essay.txt" ] }, { "cell_type": "code", "execution_count": 2, "id": "cc17bbc5-86cb-4d15-a730-955af85a10b2", "metadata": {}, "outputs": [], "source": [ "from llama_index import (\n", " VectorStoreIndex,\n", " SimpleDirectoryReader,\n", " load_index_from_storage,\n", " StorageContext,\n", ")" ] }, { "cell_type": "code", "execution_count": 3, "id": "2a4f0fa1-fd32-468c-aa9d-4bee21d9dd89", "metadata": {}, "outputs": [], "source": [ "# load documents\n", "documents = SimpleDirectoryReader(\n", " input_files=[\"paul_graham_essay.txt\"]\n", ").load_data()" ] }, { "cell_type": "code", "execution_count": 8, "id": "01c16eb9-b9d1-4357-9647-e587633fbcdd", "metadata": {}, "outputs": [], "source": [ "index = VectorStoreIndex.from_documents(documents)" ] }, { "cell_type": "code", "execution_count": 9, "id": "71fad540-eac7-425c-9ca0-7886d0b9a1cc", "metadata": {}, "outputs": [], "source": [ "retriever = index.as_retriever(similarity_top_k=10)" ] }, { "cell_type": "code", "execution_count": 10, "id": "672cccf6-dc14-40a6-a057-cc6f2a3aeea0", "metadata": {}, "outputs": [], "source": [ "# question = \"What did the author do growing up?\"\n", "# question = \"What did the author do during his time in YC?\"\n", "question = \"Where did the author go for art school?\"" ] }, { "cell_type": "code", "execution_count": 11, "id": "ef94e951-7576-45d7-bf75-8f28e70598fd", "metadata": {}, "outputs": [], "source": [ "# Ground-truth Answer\n", "answer = \"RISD\"" ] }, { "cell_type": "code", "execution_count": 12, "id": "5aa58abe-c2f8-4de0-b3af-c852f9ef9bdb", "metadata": {}, "outputs": [], "source": [ "contexts = retriever.retrieve(question)" ] }, { "cell_type": "code", "execution_count": 13, "id": "3973a921-0f52-4e77-a123-b0d06776cd4c", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "10" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "context_list = [n.get_content() for n in contexts]\n", "len(context_list)" ] }, { "cell_type": "markdown", "id": "ba1c6d52-dc87-434c-a41c-0bbc8a286504", "metadata": {}, "source": [ "### The response of Original prompt" ] }, { "cell_type": "code", "execution_count": 14, "id": "3d441f10-c5c7-4d45-b09a-717e536b36bf", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The author went to the Rhode Island School of Design (RISD) for art school.\n" ] } ], "source": [ "# The response from original prompt\n", "from llama_index.llms import OpenAI\n", "\n", "llm = OpenAI(model=\"gpt-3.5-turbo-16k\")\n", "prompt = \"\\n\\n\".join(context_list + [question])\n", "\n", "response = llm.complete(prompt)\n", "print(str(response))" ] }, { "cell_type": "markdown", "id": "9aa90492-8ad1-4a89-85c5-26b8472f1ff0", "metadata": {}, "source": [ "### The response of Compressed Prompt" ] }, { "cell_type": "code", "execution_count": 15, "id": "fa638dec-c9ec-4dce-9dac-d768145de714", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "7b37c874e2d34f2cbbd88f3556e42c80", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Loading checkpoint shards: 0%| | 0/2 [00:00