LLMLingua/examples/RAGLlamaIndex.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "1972a352-a0e3-41b7-81dc-dd4ae2b890c3",
   "metadata": {},
   "source": [
    "## Retrieval-Augmented Generation (RAG) using LlamaIndex"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "05d999bc-83a3-454f-a8a4-44cbff1fcedc",
   "metadata": {},
   "source": [
    "<a target=\"_blank\" href=\"https://colab.research.google.com/github/microsoft/LLMLingua/blob/main/examples/RAGLlamaIndex.ipynb\">\r\n",
    "  <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/>\r\n",
    "</a>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a06035dc-f812-419b-bd08-538c2e00cdda",
   "metadata": {},
   "source": [
    "[**LlamaIndex**](https://github.com/run-llama/llama_index) is a widely used RAG framework. **LLMLingua** and **LongLLMLingua** have also been incorporated into the [LlamaIndex pipeline](https://github.com/run-llama/llama_index), which allows for more convenient use of LLMLingua-related technologies in RAG scenarios."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a6137de2-0e3f-4962-860c-680da4df2eae",
   "metadata": {},
   "source": [
    "More specifically, [**LongLLMLinguaPostprocessor**](https://github.com/run-llama/llama_index/blob/main/llama_index/postprocessor/longllmlingua.py#L16) can be used as a **Postprocessor** in **LlamaIndex** by invoking it, with arguments consistent with those in the [**PromptCompressor**](https://github.com/microsoft/LLMLingua/blob/main/llmlingua/prompt_compressor.py) of [**LLMLingua**](https://github.com/microsoft/LLMLingua).\n",
    "You can call the corresponding compression algorithms in LLMLingua and the question-aware prompt compression method in LongLLMLingua."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "44f78b07-0a11-4c71-86cb-213a32c4fd7a",
   "metadata": {},
   "source": [
    "For examples,\n",
    "```python\n",
    "from llama_index.query_engine import RetrieverQueryEngine\n",
    "from llama_index.response_synthesizers import CompactAndRefine\n",
    "from llama_index.indices.postprocessor import LongLLMLinguaPostprocessor\n",
    "\n",
    "node_postprocessor = LongLLMLinguaPostprocessor(\n",
    "    instruction_str=\"Given the context, please answer the final question\",\n",
    "    target_token=300,\n",
    "    rank_method=\"longllmlingua\",\n",
    "    additional_compress_kwargs={\n",
    "        \"condition_compare\": True,\n",
    "        \"condition_in_question\": \"after\",\n",
    "        \"context_budget\": \"+100\",\n",
    "        \"reorder_context\": \"sort\",  # enable document reorder\n",
    "        \"dynamic_context_compression_ratio\": 0.4, # enable dynamic compression ratio\n",
    "    },\n",
    ")\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fe3ed1ce-d38d-4048-9db6-9707b55dc642",
   "metadata": {},
   "source": [
    "Retrieval-Augmented Generation (RAG) is a powerful and popular technique that applies specialized knowledge to large language models (LLMs). However, traditional RAG methods tend to have increasingly long prompts, sometimes exceeding **40k**, which can result in high financial and latency costs. Moreover, the decreased information density within the prompts can lead to performance degradation in LLMs, such as the \"lost in the middle\" issue."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ae003ead-2f07-44a4-b641-2e33be920dd9",
   "metadata": {},
   "source": [
    "<center><img width=\"800\" src=\"../images/LongLLMLingua_Motivation.png\"></center>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0b39b33f-5860-4825-8f00-d60aed0dce86",
   "metadata": {},
   "source": [
    "To address this, we propose [**LongLLMLingua**](https://arxiv.org/abs/2310.06839), which specifically tackles the low information density problem in long context scenarios via prompt compression, making it particularly suitable for RAG tasks. The main ideas involve a two-stage compression process, as shown by the  <font color='red'>**red line**</font>, which significantly improves the original curve:\n",
    "\n",
    "- Coarse-grained compression through document-level perplexity;\n",
    "- Fine-grained compression of the remaining text using token perplexity;"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c748f877-4bbf-443c-b72b-332be1df6f1a",
   "metadata": {},
   "source": [
    "Instead of fighting against positional effects, we aim to utilize them to our advantage through document reordering, as illustrated by the  <font color='green'>**green line**</font>. In this approach, the most critical passages are placed at the beginning and the end of the context. Furthermore, the entire process becomes more **cost-effective and faster** since it only requires handling **1/4** of the original context."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "18422597-687a-43aa-a6ed-ce6244d0eb55",
   "metadata": {},
   "source": [
    "### PG's essay"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "51a7accd-5ec2-4ed2-9582-1afdb441a998",
   "metadata": {},
   "source": [
    "Next, we will demonstrate the use of LongLLMLingua on the **PG's essay** dataset in LlamaIndex pipeline, which effectively alleviates the \"lost in the middle\" issue."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "a970a901-11bd-43af-a8bc-7fb2fc6a1a07",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Defaulting to user installation because normal site-packages is not writeable\n",
      "Requirement already satisfied: llmlingua in /home/hjiang/Code/github/LLMLingua (0.1.2)\n",
      "Requirement already satisfied: llama-index in /home/hjiang/.local/lib/python3.9/site-packages (0.8.50)\n",
      "Requirement already satisfied: nltk in /home/hjiang/.local/lib/python3.9/site-packages (from llmlingua) (3.8.1)\n",
      "Requirement already satisfied: numpy in /home/hjiang/.local/lib/python3.9/site-packages (from llmlingua) (1.23.5)\n",
      "Requirement already satisfied: tiktoken in /home/hjiang/.local/lib/python3.9/site-packages (from llmlingua) (0.4.0)\n",
      "Requirement already satisfied: torch in /home/hjiang/.local/lib/python3.9/site-packages (from llmlingua) (1.13.1+cu116)\n",
      "Requirement already satisfied: transformers>=4.26.0 in /home/hjiang/.local/lib/python3.9/site-packages (from llmlingua) (4.34.1)\n",
      "Requirement already satisfied: SQLAlchemy[asyncio]>=1.4.49 in /home/hjiang/.local/lib/python3.9/site-packages (from llama-index) (2.0.22)\n",
      "Requirement already satisfied: dataclasses-json<0.6.0,>=0.5.7 in /home/hjiang/.local/lib/python3.9/site-packages (from llama-index) (0.5.14)\n",
      "Requirement already satisfied: deprecated>=1.2.9.3 in /home/hjiang/.local/lib/python3.9/site-packages (from llama-index) (1.2.14)\n",
      "Requirement already satisfied: fsspec>=2023.5.0 in /home/hjiang/.local/lib/python3.9/site-packages (from llama-index) (2023.6.0)\n",
      "Requirement already satisfied: langchain>=0.0.303 in /home/hjiang/.local/lib/python3.9/site-packages (from llama-index) (0.0.322)\n",
      "Requirement already satisfied: nest-asyncio<2.0.0,>=1.5.8 in /home/hjiang/.local/lib/python3.9/site-packages (from llama-index) (1.5.8)\n",
      "Requirement already satisfied: openai>=0.26.4 in /home/hjiang/.local/lib/python3.9/site-packages (from llama-index) (0.27.8)\n",
      "Requirement already satisfied: pandas in /home/hjiang/.local/lib/python3.9/site-packages (from llama-index) (2.0.3)\n",
      "Requirement already satisfied: tenacity<9.0.0,>=8.2.0 in /home/hjiang/.local/lib/python3.9/site-packages (from llama-index) (8.2.3)\n",
      "Requirement already satisfied: typing-extensions>=4.5.0 in /home/hjiang/.local/lib/python3.9/site-packages (from llama-index) (4.7.1)\n",
      "Requirement already satisfied: typing-inspect>=0.8.0 in /home/hjiang/.local/lib/python3.9/site-packages (from llama-index) (0.9.0)\n",
      "Requirement already satisfied: urllib3<2 in /home/hjiang/.local/lib/python3.9/site-packages (from llama-index) (1.26.16)\n",
      "Requirement already satisfied: marshmallow<4.0.0,>=3.18.0 in /home/hjiang/.local/lib/python3.9/site-packages (from dataclasses-json<0.6.0,>=0.5.7->llama-index) (3.20.1)\n",
      "Requirement already satisfied: wrapt<2,>=1.10 in /home/hjiang/.local/lib/python3.9/site-packages (from deprecated>=1.2.9.3->llama-index) (1.15.0)\n",
      "Requirement already satisfied: PyYAML>=5.3 in /usr/lib/python3/dist-packages (from langchain>=0.0.303->llama-index) (5.3.1)\n",
      "Requirement already satisfied: aiohttp<4.0.0,>=3.8.3 in /home/hjiang/.local/lib/python3.9/site-packages (from langchain>=0.0.303->llama-index) (3.8.5)\n",
      "Requirement already satisfied: anyio<4.0 in /home/hjiang/.local/lib/python3.9/site-packages (from langchain>=0.0.303->llama-index) (3.7.1)\n",
      "Requirement already satisfied: async-timeout<5.0.0,>=4.0.0 in /home/hjiang/.local/lib/python3.9/site-packages (from langchain>=0.0.303->llama-index) (4.0.2)\n",
      "Requirement already satisfied: jsonpatch<2.0,>=1.33 in /home/hjiang/.local/lib/python3.9/site-packages (from langchain>=0.0.303->llama-index) (1.33)\n",
      "Requirement already satisfied: langsmith<0.1.0,>=0.0.43 in /home/hjiang/.local/lib/python3.9/site-packages (from langchain>=0.0.303->llama-index) (0.0.51)\n",
      "Requirement already satisfied: pydantic<3,>=1 in /home/hjiang/.local/lib/python3.9/site-packages (from langchain>=0.0.303->llama-index) (1.10.12)\n",
      "Requirement already satisfied: requests<3,>=2 in /home/hjiang/.local/lib/python3.9/site-packages (from langchain>=0.0.303->llama-index) (2.29.0)\n",
      "Requirement already satisfied: click in /home/hjiang/.local/lib/python3.9/site-packages (from nltk->llmlingua) (8.1.6)\n",
      "Requirement already satisfied: joblib in /home/hjiang/.local/lib/python3.9/site-packages (from nltk->llmlingua) (1.3.1)\n",
      "Requirement already satisfied: regex>=2021.8.3 in /home/hjiang/.local/lib/python3.9/site-packages (from nltk->llmlingua) (2023.6.3)\n",
      "Requirement already satisfied: tqdm in /home/hjiang/.local/lib/python3.9/site-packages (from nltk->llmlingua) (4.65.0)\n",
      "Requirement already satisfied: greenlet!=0.4.17 in /home/hjiang/.local/lib/python3.9/site-packages (from SQLAlchemy[asyncio]>=1.4.49->llama-index) (3.0.0)\n",
      "Requirement already satisfied: filelock in /home/hjiang/.local/lib/python3.9/site-packages (from transformers>=4.26.0->llmlingua) (3.12.2)\n",
      "Requirement already satisfied: huggingface-hub<1.0,>=0.16.4 in /home/hjiang/.local/lib/python3.9/site-packages (from transformers>=4.26.0->llmlingua) (0.16.4)\n",
      "Requirement already satisfied: packaging>=20.0 in /home/hjiang/.local/lib/python3.9/site-packages (from transformers>=4.26.0->llmlingua) (23.0)\n",
      "Requirement already satisfied: tokenizers<0.15,>=0.14 in /home/hjiang/.local/lib/python3.9/site-packages (from transformers>=4.26.0->llmlingua) (0.14.1)\n",
      "Requirement already satisfied: safetensors>=0.3.1 in /home/hjiang/.local/lib/python3.9/site-packages (from transformers>=4.26.0->llmlingua) (0.3.1)\n",
      "Requirement already satisfied: mypy-extensions>=0.3.0 in /home/hjiang/.local/lib/python3.9/site-packages (from typing-inspect>=0.8.0->llama-index) (1.0.0)\n",
      "Requirement already satisfied: python-dateutil>=2.8.2 in /home/hjiang/.local/lib/python3.9/site-packages (from pandas->llama-index) (2.8.2)\n",
      "Requirement already satisfied: pytz>=2020.1 in /home/hjiang/.local/lib/python3.9/site-packages (from pandas->llama-index) (2023.3)\n",
      "Requirement already satisfied: tzdata>=2022.1 in /home/hjiang/.local/lib/python3.9/site-packages (from pandas->llama-index) (2023.3)\n",
      "Requirement already satisfied: attrs>=17.3.0 in /home/hjiang/.local/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain>=0.0.303->llama-index) (23.1.0)\n",
      "Requirement already satisfied: charset-normalizer<4.0,>=2.0 in /home/hjiang/.local/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain>=0.0.303->llama-index) (3.2.0)\n",
      "Requirement already satisfied: multidict<7.0,>=4.5 in /home/hjiang/.local/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain>=0.0.303->llama-index) (6.0.4)\n",
      "Requirement already satisfied: yarl<2.0,>=1.0 in /home/hjiang/.local/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain>=0.0.303->llama-index) (1.9.2)\n",
      "Requirement already satisfied: frozenlist>=1.1.1 in /home/hjiang/.local/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain>=0.0.303->llama-index) (1.4.0)\n",
      "Requirement already satisfied: aiosignal>=1.1.2 in /home/hjiang/.local/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain>=0.0.303->llama-index) (1.3.1)\n",
      "Requirement already satisfied: idna>=2.8 in /usr/lib/python3/dist-packages (from anyio<4.0->langchain>=0.0.303->llama-index) (2.8)\n",
      "Requirement already satisfied: sniffio>=1.1 in /home/hjiang/.local/lib/python3.9/site-packages (from anyio<4.0->langchain>=0.0.303->llama-index) (1.3.0)\n",
      "Requirement already satisfied: exceptiongroup in /home/hjiang/.local/lib/python3.9/site-packages (from anyio<4.0->langchain>=0.0.303->llama-index) (1.1.2)\n",
      "Requirement already satisfied: jsonpointer>=1.9 in /usr/lib/python3/dist-packages (from jsonpatch<2.0,>=1.33->langchain>=0.0.303->llama-index) (2.0)\n",
      "Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil>=2.8.2->pandas->llama-index) (1.14.0)\n",
      "Requirement already satisfied: certifi>=2017.4.17 in /usr/lib/python3/dist-packages (from requests<3,>=2->langchain>=0.0.303->llama-index) (2019.11.28)\n",
      "\n",
      "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.3.1\u001b[0m\n",
      "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3.9 -m pip install --upgrade pip\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "# Install dependency.\n",
    "!pip install llmlingua llama-index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "cbbbf3de-a9d6-46cf-afab-dcb72a6154ec",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Using the OAI\n",
    "import openai\n",
    "openai.api_key = \"<insert_openai_key>\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "46506810-8565-43da-984b-d862c56b49c2",
   "metadata": {},
   "outputs": [],
   "source": [
    "# or Using the AOAI\n",
    "import openai\n",
    "openai.api_key = \"<insert_openai_key>\"\n",
    "openai.api_base = \"https://xxxx.openai.azure.com/\"\n",
    "openai.api_type = 'azure'\n",
    "openai.api_version = '2023-05-15'"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f8676ffa-5117-44dc-9742-bb9ab1d56e0c",
   "metadata": {},
   "source": [
    "### Setup Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "bb349566-83d8-44ac-a683-b67ed9ddf7a6",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--2023-10-31 15:16:22--  https://www.dropbox.com/s/f6bmb19xdg0xedm/paul_graham_essay.txt?dl=1\n",
      "Resolving www.dropbox.com (www.dropbox.com)... 162.125.2.18, 2620:100:6017:18::a27d:212\n",
      "Connecting to www.dropbox.com (www.dropbox.com)|162.125.2.18|:443... connected.\n",
      "HTTP request sent, awaiting response... 302 Found\n",
      "Location: /s/dl/f6bmb19xdg0xedm/paul_graham_essay.txt [following]\n",
      "--2023-10-31 15:16:22--  https://www.dropbox.com/s/dl/f6bmb19xdg0xedm/paul_graham_essay.txt\n",
      "Reusing existing connection to www.dropbox.com:443.\n",
      "HTTP request sent, awaiting response... 302 Found\n",
      "Location: https://uc79cc99922e921397f441d519f7.dl.dropboxusercontent.com/cd/0/get/CGo-ddVpLM8dpEbGPhaDcZnqlmurexkVdlYv9jcpsjMI9xyxqtt-feE8m6nlMFwBfbWAp9oEfbf0YZC65uNupypod6w4ANXltrG3NpGWErO9j18UQuwqd2wr79FcGtg55HxuwN_2xElcqEPjH3zg8RZl/file?dl=1# [following]\n",
      "--2023-10-31 15:16:22--  https://uc79cc99922e921397f441d519f7.dl.dropboxusercontent.com/cd/0/get/CGo-ddVpLM8dpEbGPhaDcZnqlmurexkVdlYv9jcpsjMI9xyxqtt-feE8m6nlMFwBfbWAp9oEfbf0YZC65uNupypod6w4ANXltrG3NpGWErO9j18UQuwqd2wr79FcGtg55HxuwN_2xElcqEPjH3zg8RZl/file?dl=1\n",
      "Resolving uc79cc99922e921397f441d519f7.dl.dropboxusercontent.com (uc79cc99922e921397f441d519f7.dl.dropboxusercontent.com)... 162.125.2.15, 2620:100:6017:15::a27d:20f\n",
      "Connecting to uc79cc99922e921397f441d519f7.dl.dropboxusercontent.com (uc79cc99922e921397f441d519f7.dl.dropboxusercontent.com)|162.125.2.15|:443... connected.\n",
      "HTTP request sent, awaiting response... 200 OK\n",
      "Length: 75047 (73K) [application/binary]\n",
      "Saving to: ‘paul_graham_essay.txt’\n",
      "\n",
      "paul_graham_essay.t 100%[===================>]  73.29K  --.-KB/s    in 0.03s   \n",
      "\n",
      "2023-10-31 15:16:23 (2.15 MB/s) - ‘paul_graham_essay.txt’ saved [75047/75047]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "!wget \"https://www.dropbox.com/s/f6bmb19xdg0xedm/paul_graham_essay.txt?dl=1\" -O paul_graham_essay.txt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "cc17bbc5-86cb-4d15-a730-955af85a10b2",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index import (\n",
    "    VectorStoreIndex,\n",
    "    SimpleDirectoryReader,\n",
    "    load_index_from_storage,\n",
    "    StorageContext,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "2a4f0fa1-fd32-468c-aa9d-4bee21d9dd89",
   "metadata": {},
   "outputs": [],
   "source": [
    "# load documents\n",
    "documents = SimpleDirectoryReader(\n",
    "    input_files=[\"paul_graham_essay.txt\"]\n",
    ").load_data()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "01c16eb9-b9d1-4357-9647-e587633fbcdd",
   "metadata": {},
   "outputs": [],
   "source": [
    "index = VectorStoreIndex.from_documents(documents)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "71fad540-eac7-425c-9ca0-7886d0b9a1cc",
   "metadata": {},
   "outputs": [],
   "source": [
    "retriever = index.as_retriever(similarity_top_k=10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "672cccf6-dc14-40a6-a057-cc6f2a3aeea0",
   "metadata": {},
   "outputs": [],
   "source": [
    "# question = \"What did the author do growing up?\"\n",
    "# question = \"What did the author do during his time in YC?\"\n",
    "question = \"Where did the author go for art school?\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "ef94e951-7576-45d7-bf75-8f28e70598fd",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Ground-truth Answer\n",
    "answer = \"RISD\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "5aa58abe-c2f8-4de0-b3af-c852f9ef9bdb",
   "metadata": {},
   "outputs": [],
   "source": [
    "contexts = retriever.retrieve(question)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "3973a921-0f52-4e77-a123-b0d06776cd4c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "10"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "context_list = [n.get_content() for n in contexts]\n",
    "len(context_list)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ba1c6d52-dc87-434c-a41c-0bbc8a286504",
   "metadata": {},
   "source": [
    "### The response of Original prompt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "3d441f10-c5c7-4d45-b09a-717e536b36bf",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The author went to the Rhode Island School of Design (RISD) for art school.\n"
     ]
    }
   ],
   "source": [
    "# The response from original prompt\n",
    "from llama_index.llms import OpenAI\n",
    "\n",
    "llm = OpenAI(model=\"gpt-3.5-turbo-16k\")\n",
    "prompt = \"\\n\\n\".join(context_list + [question])\n",
    "\n",
    "response = llm.complete(prompt)\n",
    "print(str(response))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9aa90492-8ad1-4a89-85c5-26b8472f1ff0",
   "metadata": {},
   "source": [
    "### The response of Compressed Prompt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "fa638dec-c9ec-4dce-9dac-d768145de714",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "7b37c874e2d34f2cbbd88f3556e42c80",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/hjiang/.local/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:362: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.\n",
      "  warnings.warn(\n",
      "/home/hjiang/.local/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:367: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.\n",
      "  warnings.warn(\n"
     ]
    }
   ],
   "source": [
    "# Setup LLMLingua\n",
    "from llama_index.query_engine import RetrieverQueryEngine\n",
    "from llama_index.response_synthesizers import CompactAndRefine\n",
    "from llama_index.indices.postprocessor import LongLLMLinguaPostprocessor\n",
    "\n",
    "node_postprocessor = LongLLMLinguaPostprocessor(\n",
    "    instruction_str=\"Given the context, please answer the final question\",\n",
    "    target_token=300,\n",
    "    rank_method=\"longllmlingua\",\n",
    "    additional_compress_kwargs={\n",
    "        \"condition_compare\": True,\n",
    "        \"condition_in_question\": \"after\",\n",
    "        \"context_budget\": \"+100\",\n",
    "        \"reorder_context\": \"sort\",  # enable document reorder,\n",
    "        \"dynamic_context_compression_ratio\": 0.3,\n",
    "    },\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3438c76e-5bf9-4db6-97a7-69f5d9be0707",
   "metadata": {},
   "source": [
    "We show you how to compose a `retriever` + `prompt compressor` + `query engine` into the **RAG** pipeline."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0e0df62f-be5f-43f5-9d53-0d31cfcc5c81",
   "metadata": {},
   "source": [
    "#### Method One: Call Step-by-Step"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "be610922-c84d-4ed7-91a3-52aff193bc56",
   "metadata": {},
   "outputs": [],
   "source": [
    "retrieved_nodes = retriever.retrieve(question)\n",
    "synthesizer = CompactAndRefine()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "f2239bab-0c64-4798-a435-f98f9e09107d",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.indices.query.schema import QueryBundle\n",
    "\n",
    "# outline steps in RetrieverQueryEngine for clarity:\n",
    "# postprocess (compress), synthesize\n",
    "new_retrieved_nodes = node_postprocessor.postprocess_nodes(\n",
    "    retrieved_nodes, query_bundle=QueryBundle(query_str=question)\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "3ac375b9-ee42-4b94-a9af-ce37bf62e0ec",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "next Rtm's advice hadn' included anything that. I wanted to do something completely different, so I decided I'd paint. I wanted to how good I could get if I focused on it. the day after stopped on YC, I painting. I was rusty and it took a while to get back into shape, but it was at least completely engaging.1]\n",
      "\n",
      "I wanted to back RISD, was now broke and RISD was very expensive so decided job for a year and return RISD the fall. I got one at Interleaf, which made software for creating documents. You like Microsoft Word? Exactly That was I low end software tends to high. Interleaf still had a few years to live yet. []\n",
      "\n",
      " the Accademia wasn't, and my money was running out, end year back to the\n",
      " lot the color class I tookD, but otherwise I was basically myself to do that for in993 I dropped I aroundidence bit then my friend Par did me a big A rent-partment building New York. Did I want it Itt more my place, and York be where the artists. wanted [For when you that ofs you big painting of this type hanging in the apartment of a hedge fund manager, you know he paid millions of dollars for it. That's not always why artists have a signature style, but it's usually why buyers pay a lot for such work. [6]\n",
      "\n",
      "Original Tokens: 10719\n",
      "Compressed Tokens: 308\n",
      "Comressed Ratio: 34.80x\n"
     ]
    }
   ],
   "source": [
    "original_contexts = \"\\n\\n\".join([n.get_content() for n in retrieved_nodes])\n",
    "compressed_contexts = \"\\n\\n\".join([n.get_content() for n in new_retrieved_nodes])\n",
    "\n",
    "original_tokens = node_postprocessor._llm_lingua.get_token_length(original_contexts)\n",
    "compressed_tokens = node_postprocessor._llm_lingua.get_token_length(compressed_contexts)\n",
    "\n",
    "print(compressed_contexts)\n",
    "print()\n",
    "print(\"Original Tokens:\", original_tokens)\n",
    "print(\"Compressed Tokens:\", compressed_tokens)\n",
    "print(\"Comressed Ratio:\", f\"{original_tokens/(compressed_tokens + 1e-5):.2f}x\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "c72e8559-12a7-4f9d-b3ec-b21f0241aff5",
   "metadata": {},
   "outputs": [],
   "source": [
    "response = synthesizer.synthesize(question, new_retrieved_nodes)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "5c26eb3e-bcc9-4d1c-9e9e-1e511b22831f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The author went to RISD for art school.\n"
     ]
    }
   ],
   "source": [
    "print(str(response))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a1eb3a56-5ba4-4a98-a4b7-47e6b3fb0027",
   "metadata": {},
   "source": [
    "#### Method Two: End-to-End Call"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "80042d53-97f3-4719-b95b-38c47b24f075",
   "metadata": {},
   "outputs": [],
   "source": [
    "retriever_query_engine = RetrieverQueryEngine.from_args(\n",
    "    retriever, node_postprocessors=[node_postprocessor]\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "6eb1a345-6f07-48b7-aab7-71c0da772839",
   "metadata": {},
   "outputs": [],
   "source": [
    "response = retriever_query_engine.query(question)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "0737ece9-0239-4e3e-adf6-d39cafc85a05",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The author went to RISD for art school.\n"
     ]
    }
   ],
   "source": [
    "print(str(response))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.18"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}