mirror of https://github.com/microsoft/LLMLingua.git (synced 2024-01-23 02:05:46 +03:00)

Feature(LLMLingua): add examples
README.md | 18
@@ -16,18 +16,19 @@ https://github.com/microsoft/LLMLingua/assets/30883354/eb0ea70d-6d4c-4aa7-8977-6
## News
- 🎈 We launched a [project page](https://llmlingua.com/) showcasing real-world case studies, including RAG, Online Meetings, CoT, and Code;
- 👨‍🦯 We have launched a series of examples in the [`./examples`](./examples) folder, which include [RAG](./examples/RAG.ipynb), [Online Meeting](./examples/OnlineMeeting.ipynb), [CoT](./examples/CoT.ipynb), [Code](./examples/Code.ipynb), and [RAG using LlamaIndex](./examples/RAGLlamaIndex.ipynb);
- 👾 LongLLMLingua has been incorporated into the [LlamaIndex pipeline](https://github.com/run-llama/llama_index/blob/main/llama_index/indices/postprocessor/longllmlingua.py), which is a widely used RAG framework.
## TL;DR
LLMLingua uses a well-trained small language model after alignment, such as GPT2-small or LLaMA-7B, to detect unimportant tokens in the prompt, enabling inference with the compressed prompt in black-box LLMs and achieving up to 20x compression with minimal performance loss.
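As a quick sketch of the API (the prompt text and the 200-token budget below are illustrative placeholders, not defaults; see the Quick Start for the full snippet):

```python
from llmlingua import PromptCompressor

llm_lingua = PromptCompressor()  # loads the default small LM used to score token importance

long_prompt = "..."  # any long prompt, e.g. a few-shot chain-of-thought demonstration set
result = llm_lingua.compress_prompt(long_prompt, instruction="", question="", target_token=200)
print(result["compressed_prompt"], result["ratio"])  # also returns origin_tokens, compressed_tokens, saving
```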
[LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models](https://arxiv.org/abs/2310.05736) (EMNLP 2023)<br>
_Huiqiang Jiang, Qianhui Wu, Chin-Yew Lin, Yuqing Yang and Lili Qiu_
LongLLMLingua is a method that enhances LLMs' ability to perceive key information in long-context scenarios using prompt compression, achieving up to $28.5 in cost savings per 1,000 samples while also improving performance.
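LongLLMLingua is exposed through the same `PromptCompressor` interface; a minimal sketch using the options from the bundled examples (`documents` and `question` here are placeholders for your own contexts and query):

```python
from llmlingua import PromptCompressor

llm_lingua = PromptCompressor()
documents = ["context chunk 1 ...", "context chunk 2 ..."]  # e.g. retrieved passages or transcript segments
question = "Question: ...\nAnswer:"

compressed = llm_lingua.compress_prompt(
    documents,
    instruction="",
    question=question,
    target_token=200,
    rank_method="longllmlingua",   # question-aware coarse-grained ranking
    condition_compare=True,
    condition_in_question="after",
    reorder_context="sort",        # reorder contexts to mitigate "lost in the middle"
)
```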
[LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression](https://arxiv.org/abs/2310.06839) (Under Review)<br>
_Huiqiang Jiang, Qianhui Wu, Xufang Luo, Dongsheng Li, Chin-Yew Lin, Yuqing Yang and Lili Qiu_
## 🎥 Overview
@@ -40,7 +41,7 @@ _Huiqiang Jiang, Qianhui Wu, Xufang Luo, Dongsheng Li, Chin-Yew Lin, Yuqing Yang
Large language models, such as ChatGPT and GPT-4, impress us with their amazing generalization and reasoning abilities, but they also come with some drawbacks, such as the prompt length limit and the prompt-based pricing scheme.

|

|
||||||
|
|
||||||
Now you can use **LLMLingua** & **LongLLMLingua**!
@@ -51,12 +52,15 @@ A simple and efficient method to compress prompt up to **20x**.
- ⚖️ **Robustness**, no training required for the LLMs;
- 🕵️ **Keeping** the original prompt knowledge like ICL, reasoning, etc.;
- 📜 **KV-Cache compression**, speeding up inference;
- 🪃 **GPT-4 can recover all key information from the compressed prompt** (see the sketch below).
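A hedged sketch of checking that last point (the recovery instruction wording is our own, and the model name is a placeholder for your GPT-4 deployment):

```python
import openai

# `compressed` is the dict returned by compress_prompt in the sketches above
recovery_prompt = (
    "Please reconstruct the original text from the following compressed prompt:\n\n"
    + compressed["compressed_prompt"]
)
response = openai.ChatCompletion.create(
    model="gpt-4",  # with Azure, pass engine=<deployment_name> instead
    messages=[{"role": "user", "content": recovery_prompt}],
    temperature=0,
)
print(response["choices"][0]["message"]["content"])
```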

|

|
||||||
|
|
||||||
![Motivation of LLMLingua](./images/motivation.png)

|

|
||||||
|
|
||||||
If you find this repo helpful, please cite the following papers:
```bibtex
@inproceedings{jiang-etal-2023-llmlingua,
@@ -103,7 +107,7 @@ compressed_prompt = llm_lingua.compress_prompt(prompt, instruction="", question=
# 'saving': ', Saving $0.1 in GPT-4.'}
```
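The compressed prompt is then used exactly like the original one; a sketch in the style of the bundled notebooks (`instruction` and `question` are placeholders, and the model name is illustrative):

```python
import openai

prompt = instruction + compressed_prompt["compressed_prompt"] + "\n\nQuestion: " + question
response = openai.Completion.create(
    model="gpt-3.5-turbo-0301",  # with Azure, pass engine=<deployment_name> instead
    prompt=prompt,
    max_tokens=400,
    temperature=0,
)
print(response["choices"][0]["text"])
```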
You can refer to the [**examples**](./examples) to understand how to use **LLMLingua** and **LongLLMLingua** in practical scenarios, such as RAG, Online Meeting, CoT, Code, and RAG using LlamaIndex. Additionally, you can refer to the [**document**](./DOCUMENT.md) for more recommendations on how to use LLMLingua effectively.
## Frequently Asked Questions
examples/CoT.ipynb | 585 (new file)
@@ -0,0 +1,585 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "1972a352-a0e3-41b7-81dc-dd4ae2b890c3",
"metadata": {},
"source": [
"## In-Context Learning, Chain-of-Thought, Reasoning"
]
},
{
"cell_type": "markdown",
"id": "05d999bc-83a3-454f-a8a4-44cbff1fcedc",
"metadata": {},
"source": [
"<a target=\"_blank\" href=\"https://colab.research.google.com/github/microsoft/LLMLingua/blob/main/examples/CoT.ipynb\">\r\n",
" <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/>\r\n",
"</a>"
]
},
{
"cell_type": "markdown",
"id": "fe3ed1ce-d38d-4048-9db6-9707b55dc642",
"metadata": {},
"source": [
"**In-Context Learning (ICL)** is a unique capability of large language models (LLMs), allowing them to quickly learn relevant tasks from a few examples. Generally, ICL is combined with the Chain-of-Thought (CoT) approach, which describes the reasoning process in detail within the examples to enhance the LLMs' reasoning abilities. For instance, Fu et al.'s Complexity-Based Prompting improved GSM8K performance from 74.9 to 78.85 with GPT-3.5-Turbo-0301. However, this can also lead to increasingly lengthy prompts, such as the GSM8K prompt with a token count of **2,366**."
]
},
{
"cell_type": "markdown",
"id": "ae003ead-2f07-44a4-b641-2e33be920dd9",
"metadata": {},
"source": [
"<center><img width=\"800\" src=\"../images/LLMLingua_framework.png\"></center>"
]
},
{
"cell_type": "markdown",
"id": "0b39b33f-5860-4825-8f00-d60aed0dce86",
"metadata": {},
"source": [
"To address this, we propose [**LLMLingua**](https://arxiv.org/abs/2310.05736), which uses a well-trained small language model after alignment, such as GPT2-small or LLaMA-7B, to detect unimportant tokens in the prompt and enable inference with the compressed prompt in black-box LLMs, achieving up to **20x** compression with minimal performance loss."
]
},
{
"cell_type": "markdown",
"id": "18422597-687a-43aa-a6ed-ce6244d0eb55",
"metadata": {},
"source": [
"### GSM8K"
]
},
{
"cell_type": "markdown",
"id": "51a7accd-5ec2-4ed2-9582-1afdb441a998",
"metadata": {},
"source": [
"Next, we will demonstrate the use of LLMLingua on the GSM8K dataset. The original prompt can be found at https://github.com/FranxYao/chain-of-thought-hub/blob/main/gsm8k/lib_prompt/prompt_hardest.txt; it has 2,366 tokens in an 8-shot setup."
]
},
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "a970a901-11bd-43af-a8bc-7fb2fc6a1a07",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Defaulting to user installation because normal site-packages is not writeable\n",
"Requirement already satisfied: llmlingua in /home/hjiang/Code/github/LLMLingua (0.1.2)\n",
"Requirement already satisfied: datasets in /home/hjiang/.local/lib/python3.9/site-packages (2.14.4)\n",
"Requirement already satisfied: nltk in /home/hjiang/.local/lib/python3.9/site-packages (from llmlingua) (3.8.1)\n",
"Requirement already satisfied: numpy in /home/hjiang/.local/lib/python3.9/site-packages (from llmlingua) (1.23.5)\n",
"Requirement already satisfied: tiktoken in /home/hjiang/.local/lib/python3.9/site-packages (from llmlingua) (0.4.0)\n",
"Requirement already satisfied: torch in /home/hjiang/.local/lib/python3.9/site-packages (from llmlingua) (1.13.1+cu116)\n",
"Requirement already satisfied: transformers>=4.26.0 in /home/hjiang/.local/lib/python3.9/site-packages (from llmlingua) (4.34.1)\n",
"Requirement already satisfied: pyarrow>=8.0.0 in /home/hjiang/.local/lib/python3.9/site-packages (from datasets) (11.0.0)\n",
"Requirement already satisfied: dill<0.3.8,>=0.3.0 in /home/hjiang/.local/lib/python3.9/site-packages (from datasets) (0.3.7)\n",
"Requirement already satisfied: pandas in /home/hjiang/.local/lib/python3.9/site-packages (from datasets) (2.0.3)\n",
"Requirement already satisfied: requests>=2.19.0 in /home/hjiang/.local/lib/python3.9/site-packages (from datasets) (2.29.0)\n",
"Requirement already satisfied: tqdm>=4.62.1 in /home/hjiang/.local/lib/python3.9/site-packages (from datasets) (4.65.0)\n",
"Requirement already satisfied: xxhash in /home/hjiang/.local/lib/python3.9/site-packages (from datasets) (3.3.0)\n",
"Requirement already satisfied: multiprocess in /home/hjiang/.local/lib/python3.9/site-packages (from datasets) (0.70.15)\n",
"Requirement already satisfied: fsspec[http]>=2021.11.1 in /home/hjiang/.local/lib/python3.9/site-packages (from datasets) (2023.6.0)\n",
"Requirement already satisfied: aiohttp in /home/hjiang/.local/lib/python3.9/site-packages (from datasets) (3.8.5)\n",
"Requirement already satisfied: huggingface-hub<1.0.0,>=0.14.0 in /home/hjiang/.local/lib/python3.9/site-packages (from datasets) (0.16.4)\n",
"Requirement already satisfied: packaging in /home/hjiang/.local/lib/python3.9/site-packages (from datasets) (23.0)\n",
"Requirement already satisfied: pyyaml>=5.1 in /usr/lib/python3/dist-packages (from datasets) (5.3.1)\n",
"Requirement already satisfied: attrs>=17.3.0 in /home/hjiang/.local/lib/python3.9/site-packages (from aiohttp->datasets) (23.1.0)\n",
"Requirement already satisfied: charset-normalizer<4.0,>=2.0 in /home/hjiang/.local/lib/python3.9/site-packages (from aiohttp->datasets) (3.2.0)\n",
"Requirement already satisfied: multidict<7.0,>=4.5 in /home/hjiang/.local/lib/python3.9/site-packages (from aiohttp->datasets) (6.0.4)\n",
"Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /home/hjiang/.local/lib/python3.9/site-packages (from aiohttp->datasets) (4.0.2)\n",
"Requirement already satisfied: yarl<2.0,>=1.0 in /home/hjiang/.local/lib/python3.9/site-packages (from aiohttp->datasets) (1.9.2)\n",
"Requirement already satisfied: frozenlist>=1.1.1 in /home/hjiang/.local/lib/python3.9/site-packages (from aiohttp->datasets) (1.4.0)\n",
"Requirement already satisfied: aiosignal>=1.1.2 in /home/hjiang/.local/lib/python3.9/site-packages (from aiohttp->datasets) (1.3.1)\n",
"Requirement already satisfied: filelock in /home/hjiang/.local/lib/python3.9/site-packages (from huggingface-hub<1.0.0,>=0.14.0->datasets) (3.12.2)\n",
"Requirement already satisfied: typing-extensions>=3.7.4.3 in /home/hjiang/.local/lib/python3.9/site-packages (from huggingface-hub<1.0.0,>=0.14.0->datasets) (4.7.1)\n",
"Requirement already satisfied: idna<4,>=2.5 in /usr/lib/python3/dist-packages (from requests>=2.19.0->datasets) (2.8)\n",
"Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/hjiang/.local/lib/python3.9/site-packages (from requests>=2.19.0->datasets) (1.26.16)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /usr/lib/python3/dist-packages (from requests>=2.19.0->datasets) (2019.11.28)\n",
"Requirement already satisfied: regex!=2019.12.17 in /home/hjiang/.local/lib/python3.9/site-packages (from transformers>=4.26.0->llmlingua) (2023.6.3)\n",
"Requirement already satisfied: tokenizers<0.15,>=0.14 in /home/hjiang/.local/lib/python3.9/site-packages (from transformers>=4.26.0->llmlingua) (0.14.1)\n",
"Requirement already satisfied: safetensors>=0.3.1 in /home/hjiang/.local/lib/python3.9/site-packages (from transformers>=4.26.0->llmlingua) (0.3.1)\n",
"Requirement already satisfied: click in /home/hjiang/.local/lib/python3.9/site-packages (from nltk->llmlingua) (8.1.6)\n",
"Requirement already satisfied: joblib in /home/hjiang/.local/lib/python3.9/site-packages (from nltk->llmlingua) (1.3.1)\n",
"Requirement already satisfied: python-dateutil>=2.8.2 in /home/hjiang/.local/lib/python3.9/site-packages (from pandas->datasets) (2.8.2)\n",
"Requirement already satisfied: pytz>=2020.1 in /home/hjiang/.local/lib/python3.9/site-packages (from pandas->datasets) (2023.3)\n",
"Requirement already satisfied: tzdata>=2022.1 in /home/hjiang/.local/lib/python3.9/site-packages (from pandas->datasets) (2023.3)\n",
"Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil>=2.8.2->pandas->datasets) (1.14.0)\n",
"\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.3.1\u001b[0m\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3.9 -m pip install --upgrade pip\u001b[0m\n"
]
}
],
"source": [
"# Install dependency.\n",
"!pip install llmlingua datasets"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "641235b6-71a5-4f2a-8eec-272c73931bef",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--2023-10-30 09:15:31-- https://raw.githubusercontent.com/FranxYao/chain-of-thought-hub/main/gsm8k/lib_prompt/prompt_hardest.txt\n",
"Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.108.133, 185.199.110.133, ...\n",
"Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 8464 (8.3K) [text/plain]\n",
"Saving to: ‘prompt_hardest.txt’\n",
"\n",
"prompt_hardest.txt 100%[===================>] 8.27K --.-KB/s in 0s \n",
"\n",
"2023-10-30 09:15:31 (78.8 MB/s) - ‘prompt_hardest.txt’ saved [8464/8464]\n",
"\n"
]
}
],
"source": [
"# Download the original prompt and dataset\n",
"from datasets import load_dataset\n",
"!wget https://raw.githubusercontent.com/FranxYao/chain-of-thought-hub/main/gsm8k/lib_prompt/prompt_hardest.txt\n",
"prompt_complex = open(\"./prompt_hardest.txt\").read()\n",
"gsm8k = load_dataset(\"gsm8k\", \"main\")\n",
"gsm8k_test = gsm8k[\"test\"]"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "cbbbf3de-a9d6-46cf-afab-dcb72a6154ec",
"metadata": {},
"outputs": [],
"source": [
"# Using the OAI\n",
"import openai\n",
"openai.api_key = \"<insert_openai_key>\""
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "46506810-8565-43da-984b-d862c56b49c2",
"metadata": {},
"outputs": [],
"source": [
"# or Using the AOAI\n",
"import openai\n",
"openai.api_key = \"<insert_openai_key>\"\n",
"openai.api_base = \"https://xxxx.openai.azure.com/\"\n",
"openai.api_type = 'azure'\n",
"openai.api_version = '2023-05-15'"
]
},
{
"cell_type": "markdown",
"id": "f8676ffa-5117-44dc-9742-bb9ab1d56e0c",
"metadata": {},
"source": [
"### Setup Data"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "cc17bbc5-86cb-4d15-a730-955af85a10b2",
"metadata": {},
"outputs": [],
"source": [
"# select an example from GSM8K\n",
"question, answer = [gsm8k_test[2][key] for key in [\"question\", \"answer\"]]"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "58718a19-cc4e-4002-a92a-58ea3de9c9d0",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Question: Josh decides to try flipping a house. He buys a house for $80,000 and then puts in $50,000 in repairs. This increased the value of the house by 150%. How much profit did he make?\n",
"Answer: The cost of the house and repairs came out to 80,000+50,000=$<<80000+50000=130000>>130,000\n",
"He increased the value of the house by 80,000*1.5=<<80000*1.5=120000>>120,000\n",
"So the new value of the house is 120,000+80,000=$<<120000+80000=200000>>200,000\n",
"So he made a profit of 200,000-130,000=$<<200000-130000=70000>>70,000\n",
"#### 70000\n"
]
}
],
"source": [
"# Ground-truth Answer\n",
"print(\"Question:\", question)\n",
"print(\"Answer:\", answer)"
]
},
{
"cell_type": "markdown",
"id": "ba1c6d52-dc87-434c-a41c-0bbc8a286504",
"metadata": {},
"source": [
"#### The response of the original prompt"
]
},
{
"cell_type": "code",
"execution_count": 40,
"id": "3d441f10-c5c7-4d45-b09a-717e536b36bf",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\n",
" \"id\": \"cmpl-8FZvcX70FH7ck9c9MegWmnUocH0A0\",\n",
" \"object\": \"text_completion\",\n",
" \"created\": 1698723720,\n",
" \"model\": \"gpt-35-turbo\",\n",
" \"choices\": [\n",
" {\n",
" \"text\": \" \\nLet's think step by step\\nThe repairs increased the value of the house by 150% so that means it increased by 80,000*1.5=$<<80000*1.5=120000>>120,000\\nSo the total value of the house is 80,000+120,000=$<<80000+120000=200000>>200,000\\nHe spent 80,000+50,000=$<<80000+50000=130000>>130,000\\nSo he made a profit of 200,000-130,000=$<<200000-130000=70000>>70,000\\nThe answer is 70,000\",\n",
" \"index\": 0,\n",
" \"finish_reason\": \"stop\",\n",
" \"logprobs\": null\n",
" }\n",
" ],\n",
" \"usage\": {\n",
" \"prompt_tokens\": 2428,\n",
" \"completion_tokens\": 142,\n",
" \"total_tokens\": 2570\n",
" }\n",
"}\n"
]
}
],
"source": [
"# The response from the original prompt\n",
"import json\n",
"instruction = \"Please reference the following examples to answer the math question,\\n\"\n",
"prompt = instruction + prompt_complex + \"\\n\\nQuestion: \" + question\n",
"\n",
"request_data = {\n",
" \"prompt\": prompt,\n",
" \"max_tokens\": 400,\n",
" \"temperature\": 0,\n",
" \"top_p\": 1,\n",
" \"n\": 1,\n",
" \"stream\": False,\n",
" \"stop\": \"\\n\\n\",\n",
"}\n",
"response = openai.Completion.create(\n",
" model=\"gpt-3.5-turbo-0301\",  # with Azure, pass engine=<deployment_name> instead\n",
" **request_data,\n",
|
||||||
|
")\n",
|
||||||
|
"print(json.dumps(response, indent=4))"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "9aa90492-8ad1-4a89-85c5-26b8472f1ff0",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"#### The response of Compressed Prompt"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 12,
|
||||||
|
"id": "fa638dec-c9ec-4dce-9dac-d768145de714",
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [
|
||||||
|
{
|
||||||
|
"data": {
|
||||||
|
"application/vnd.jupyter.widget-view+json": {
|
||||||
|
"model_id": "8ec90053e7274da59973427652f879a1",
|
||||||
|
"version_major": 2,
|
||||||
|
"version_minor": 0
|
||||||
|
},
|
||||||
|
"text/plain": [
|
||||||
|
"Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"metadata": {},
|
||||||
|
"output_type": "display_data"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"name": "stderr",
|
||||||
|
"output_type": "stream",
|
||||||
|
"text": [
|
||||||
|
"/home/hjiang/.local/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:362: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.\n",
|
||||||
|
" warnings.warn(\n",
|
||||||
|
"/home/hjiang/.local/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:367: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.\n",
|
||||||
|
" warnings.warn(\n"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"source": [
|
||||||
|
"# Setup LLMLingua\n",
|
||||||
|
"from llmlingua import PromptCompressor\n",
|
||||||
|
"llm_lingua = PromptCompressor()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 43,
|
||||||
|
"id": "5f61a186-6641-4118-ad04-5245a53b6d79",
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [
|
||||||
|
{
|
||||||
|
"name": "stdout",
|
||||||
|
"output_type": "stream",
|
||||||
|
"text": [
|
||||||
|
"{\n",
|
||||||
|
" \"compressed_prompt\": \"Question: Sam bought a dozen boxes, each with 30 highlighter pens inside, for $10 each. He reanged five of boxes into packages of sixlters each and sold them $3 per. He sold the rest theters separately at the of three pens $2. How much did make in total, dollars?\\nLets think step step\\nSam bought 1 boxes x00 oflters.\\nHe bought 12 00ters in total\\nSam then took5 boxes 6ters0ters\\nHe sold these boxes for 5 *5\\nAfterelling these boxes there were 30330ters remaining\\nese form 330 /30 of three\\n sold each for2 each, so made * =0 from\\n total, he0 $15\\nSince his original1 he earned $120 = $115 in profit.\\nThe answer is 115\",\n",
|
||||||
|
" \"origin_tokens\": 2365,\n",
|
||||||
|
" \"compressed_tokens\": 174,\n",
|
||||||
|
" \"ratio\": \"13.6x\",\n",
|
||||||
|
" \"saving\": \", Saving $0.1 in GPT-4.\"\n",
|
||||||
|
"}\n",
|
||||||
|
"Response: {\n",
|
||||||
|
" \"id\": \"cmpl-8FZwYp1QIwiQs6pEhy2cRK6wnLnAO\",\n",
|
||||||
|
" \"object\": \"text_completion\",\n",
|
||||||
|
" \"created\": 1698723778,\n",
|
||||||
|
" \"model\": \"gpt-35-turbo\",\n",
|
||||||
|
" \"choices\": [\n",
|
||||||
|
" {\n",
|
||||||
|
" \"text\": \" \\n\\nThe repairs increased the value of the house by 150% so that means it increased by 80000*1.5=$<<80000*1.5=120000>>120,000\\nSo the total value of the house is 120,000+80,000=$<<120000+80000=200000>>200,000\\nThat means he made a profit of 200,000-80,000-50,000=$<<200000-80000-50000=70000>>70,000. Answer: \\\\boxed{70,000}.<|im_end|>\",\n",
|
||||||
|
" \"index\": 0,\n",
|
||||||
|
" \"finish_reason\": \"stop\",\n",
|
||||||
|
" \"logprobs\": null\n",
|
||||||
|
" }\n",
|
||||||
|
" ],\n",
|
||||||
|
" \"usage\": {\n",
|
||||||
|
" \"prompt_tokens\": 237,\n",
|
||||||
|
" \"completion_tokens\": 120,\n",
|
||||||
|
" \"total_tokens\": 357\n",
|
||||||
|
" }\n",
|
||||||
|
"}\n"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"source": [
|
||||||
|
"# 174 tokens Compression, 13.6x\n",
|
||||||
|
"compressed_prompt = llm_lingua.compress_prompt(\n",
|
||||||
|
" prompt_complex.split(\"\\n\\n\"),\n",
|
||||||
|
" instruction=\"\",\n",
|
||||||
|
" question=\"\",\n",
|
||||||
|
" target_token=200,\n",
|
||||||
|
" context_budget=\"*1.5\",\n",
|
||||||
|
" iterative_size=100,\n",
|
||||||
|
")\n",
|
||||||
|
"\n",
|
||||||
|
"instruction = \"Please reference the following examples to answer the math question,\\n\"\n",
|
||||||
|
"prompt = instruction + compressed_prompt[\"compressed_prompt\"] + \"\\n\\nQuestion: \" + question\n",
|
||||||
|
"\n",
|
||||||
|
"request_data = {\n",
|
||||||
|
" \"prompt\": prompt,\n",
|
||||||
|
" \"max_tokens\": 400,\n",
|
||||||
|
" \"temperature\": 0,\n",
|
||||||
|
" \"top_p\": 1,\n",
|
||||||
|
" \"n\": 1,\n",
|
||||||
|
" \"stream\": False,\n",
|
||||||
|
" \"stop\": \"\\r\\n\",\n",
|
||||||
|
"}\n",
|
||||||
|
"response = openai.Completion.create(\n",
|
||||||
|
" \"gpt-3.5-turbo-0301\",\n",
|
||||||
|
" **request_data,\n",
|
||||||
|
")\n",
|
||||||
|
"print(json.dumps(compressed_prompt, indent=4))\n",
|
||||||
|
"print(\"Response:\", response)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "1f89bb0f-7959-4a14-95be-dc80d88ce576",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Test in GSM8K test set"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 44,
|
||||||
|
"id": "c1ac9bf5-23a9-446c-9394-8bb19aa1d89d",
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"import re\n",
|
||||||
|
"\n",
|
||||||
|
"def extract_ans(ans_model):\n",
|
||||||
|
" ans_model = ans_model.split(\"\\n\")\n",
|
||||||
|
" ans = []\n",
|
||||||
|
" residual = []\n",
|
||||||
|
" for li, al in enumerate(ans_model):\n",
|
||||||
|
" ans.append(al)\n",
|
||||||
|
" if \"answer is\" in al:\n",
|
||||||
|
" break\n",
|
||||||
|
" residual = list(ans_model[li + 1 :])\n",
|
||||||
|
" ans = \"\\n\".join(ans)\n",
|
||||||
|
" residual = \"\\n\".join(residual)\n",
|
||||||
|
" return ans, residual\n",
|
||||||
|
"\n",
|
||||||
|
"def parse_pred_ans(filename):\n",
|
||||||
|
" with open(filename) as fd:\n",
|
||||||
|
" lines = fd.readlines()\n",
|
||||||
|
" am, a = None, None\n",
|
||||||
|
" num_q, acc = 0, 0\n",
|
||||||
|
" current_mode = \"none\"\n",
|
||||||
|
" questions = []\n",
|
||||||
|
" ans_pred = []\n",
|
||||||
|
" ans_gold = []\n",
|
||||||
|
" for l in lines:\n",
|
||||||
|
" l = l.replace(\",\", \"\")\n",
|
||||||
|
" if l.startswith(\"Q: \"):\n",
|
||||||
|
" if am is not None and a is not None:\n",
|
||||||
|
" questions.append(q)\n",
|
||||||
|
" ans_pred.append(am)\n",
|
||||||
|
" ans_gold.append(a)\n",
|
||||||
|
" if test_answer(am, a):\n",
|
||||||
|
" acc += 1\n",
|
||||||
|
" current_mode = \"q\"\n",
|
||||||
|
" q = l\n",
|
||||||
|
" num_q += 1\n",
|
||||||
|
" elif l.startswith(\"A_model:\"):\n",
|
||||||
|
" current_mode = \"am\"\n",
|
||||||
|
" am = l\n",
|
||||||
|
" elif l.startswith(\"A:\"):\n",
|
||||||
|
" current_mode = \"a\"\n",
|
||||||
|
" a = l\n",
|
||||||
|
" else:\n",
|
||||||
|
" if current_mode == \"q\":\n",
|
||||||
|
" q += l\n",
|
||||||
|
" elif current_mode == \"am\":\n",
|
||||||
|
" am += l\n",
|
||||||
|
" elif current_mode == \"a\":\n",
|
||||||
|
" a += l\n",
|
||||||
|
" else:\n",
|
||||||
|
" raise ValueError(current_mode)\n",
|
||||||
|
"\n",
|
||||||
|
" questions.append(q)\n",
|
||||||
|
" ans_pred.append(am)\n",
|
||||||
|
" ans_gold.append(a)\n",
|
||||||
|
" if test_answer(am, a):\n",
|
||||||
|
" acc += 1\n",
|
||||||
|
" print(\"num_q %d correct %d ratio %.4f\" % (num_q, acc, float(acc / num_q)))\n",
|
||||||
|
" return questions, ans_pred, ans_gold\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"def get_result(text: str):\n",
|
||||||
|
" pattern = \"\\d*\\.?\\d+\"\n",
|
||||||
|
" res = re.findall(pattern, text)\n",
|
||||||
|
" return res[-1] if res else \"\"\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"def test_answer(pred_str, ans_str):\n",
|
||||||
|
" pred, gold = get_result(pred_str), get_result(ans_str)\n",
|
||||||
|
" return pred == gold"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 66,
|
||||||
|
"id": "cb209d5a-f822-4734-afc5-dafc07cc1bbc",
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [
|
||||||
|
{
|
||||||
|
"name": "stderr",
|
||||||
|
"output_type": "stream",
|
||||||
|
"text": [
|
||||||
|
"100%|██████████| 1319/1319 [47:55<00:00, 2.18s/it] \n"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"source": [
|
||||||
|
"# Test in GSM8K test set\n",
|
||||||
|
"from tqdm import tqdm\n",
|
||||||
|
"import os\n",
|
||||||
|
"os.makedirs(\"outputs\", exist_ok=True)\n",
|
||||||
|
"i = 0\n",
|
||||||
|
"\n",
|
||||||
|
"compressed_prompt = llm_lingua.compress_prompt(\n",
|
||||||
|
" prompt_complex.split(\"\\n\\n\"),\n",
|
||||||
|
" instruction=\"\",\n",
|
||||||
|
" question=\"\",\n",
|
||||||
|
" target_token=200,\n",
|
||||||
|
" context_budget=\"*1.5\",\n",
|
||||||
|
" iterative_size=100,\n",
|
||||||
|
")\n",
|
||||||
|
"\n",
|
||||||
|
"for q, a in tqdm(zip(gsm8k_test['question'], gsm8k_test['answer']), \n",
|
||||||
|
" total=len(gsm8k_test['question'])):\n",
|
||||||
|
" instruction = \"Please reference the following examples to answer the math question,\\n\"\n",
|
||||||
|
" prompt = instruction + compressed_prompt[\"compressed_prompt\"] + \"\\n\\nQuestion: \" + q + \"\\n\"\n",
|
||||||
|
" \n",
|
||||||
|
" request_data = {\n",
|
||||||
|
" \"prompt\": prompt,\n",
|
||||||
|
" \"max_tokens\": 400,\n",
|
||||||
|
" \"temperature\": 0,\n",
|
||||||
|
" \"top_p\": 1,\n",
|
||||||
|
" \"n\": 1,\n",
|
||||||
|
" \"stream\": False,\n",
|
||||||
|
" }\n",
|
||||||
|
" response = openai.Completion.create(\n",
|
||||||
|
" \"gpt-3.5-turbo-0301\",\n",
|
||||||
|
" **request_data,\n",
|
||||||
|
" )\n",
|
||||||
|
" ans_model = response[\"choices\"][0][\"text\"]\n",
|
||||||
|
" ans_, residual = extract_ans(ans_model)\n",
|
||||||
|
" with open('outputs/test_gpt_3.5_turbo_LLMLingua_174.txt', 'a') as fd:\n",
|
||||||
|
" fd.write(\"Q: %s\\nA_model:\\n%s\\nA:\\n%s\\n\\n\" % (q, ans_.replace(\"Q:\", \"\").replace(\"A:\", \"\"), a))\n",
|
||||||
|
" i += 1"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 67,
|
||||||
|
"id": "3a35d298-8596-4b92-8dda-8da4250c873c",
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [
|
||||||
|
{
|
||||||
|
"name": "stdout",
|
||||||
|
"output_type": "stream",
|
||||||
|
"text": [
|
||||||
|
"num_q 1319 correct 1032 ratio 0.7824\n"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"source": [
|
||||||
|
"_ = parse_pred_ans(\"outputs/test_gpt_3.5_turbo_LLMLingua_174.txt\")"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3 (ipykernel)",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python3"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"file_extension": ".py",
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"name": "python",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"version": "3.9.18"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
|
"nbformat_minor": 5
|
||||||
|
}
|
||||||
examples/Code.ipynb | 412 (new file)
File diff suppressed because one or more lines are too long
examples/OnlineMeeting.ipynb | 687 (new file)
@@ -0,0 +1,687 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "1972a352-a0e3-41b7-81dc-dd4ae2b890c3",
"metadata": {},
"source": [
"## Online Meeting"
]
},
{
"cell_type": "markdown",
"id": "05d999bc-83a3-454f-a8a4-44cbff1fcedc",
"metadata": {},
"source": [
"<a target=\"_blank\" href=\"https://colab.research.google.com/github/microsoft/LLMLingua/blob/main/examples/OnlineMeeting.ipynb\">\r\n",
" <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/>\r\n",
"</a>"
]
},
{
"cell_type": "markdown",
"id": "fe3ed1ce-d38d-4048-9db6-9707b55dc642",
"metadata": {},
"source": [
"Using generative AI like ChatGPT in online meetings can greatly improve work efficiency (e.g., **Teams**). However, the context in such applications tends to be more conversational, with a high degree of redundancy and a large number of tokens (more than **40k**). By utilizing LLMLingua to compress prompts, we can significantly reduce the length of prompts, which in turn helps to reduce latency. This makes the AI more efficient and responsive in real-time communication scenarios like online meetings, enabling smoother interactions and better overall performance. We use meeting transcripts from the [**MeetingBank** dataset](https://huggingface.co/datasets/lytang/MeetingBank-transcript) as an example to demonstrate the capabilities of LLMLingua."
]
},
{
"cell_type": "markdown",
"id": "18422597-687a-43aa-a6ed-ce6244d0eb55",
"metadata": {},
"source": [
"### MeetingBank Dataset"
]
},
{
"cell_type": "markdown",
"id": "51a7accd-5ec2-4ed2-9582-1afdb441a998",
"metadata": {},
"source": [
"Next, we will demonstrate the use of LongLLMLingua on the **MeetingBank** dataset, which can achieve similar or even better performance with significantly fewer tokens. The online meeting scenario is quite similar to RAG, as it also suffers from the \"lost in the middle\" issue, where noise data at the beginning or end of the prompt interferes with LLMs extracting key information. This dataset closely resembles real-world online meeting scenarios, with prompt lengths exceeding **60k** tokens at their longest.\n",
" \n",
"The original dataset can be found at https://huggingface.co/datasets/lytang/MeetingBank-transcript"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "a970a901-11bd-43af-a8bc-7fb2fc6a1a07",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Defaulting to user installation because normal site-packages is not writeable\n",
"Requirement already satisfied: llmlingua in /home/hjiang/Code/github/LLMLingua (0.1.2)\n",
"Requirement already satisfied: datasets in /home/hjiang/.local/lib/python3.9/site-packages (2.14.4)\n",
"Requirement already satisfied: nltk in /home/hjiang/.local/lib/python3.9/site-packages (from llmlingua) (3.8.1)\n",
"Requirement already satisfied: numpy in /home/hjiang/.local/lib/python3.9/site-packages (from llmlingua) (1.23.5)\n",
"Requirement already satisfied: tiktoken in /home/hjiang/.local/lib/python3.9/site-packages (from llmlingua) (0.4.0)\n",
"Requirement already satisfied: torch in /home/hjiang/.local/lib/python3.9/site-packages (from llmlingua) (1.13.1+cu116)\n",
"Requirement already satisfied: transformers>=4.26.0 in /home/hjiang/.local/lib/python3.9/site-packages (from llmlingua) (4.34.1)\n",
"Requirement already satisfied: pyarrow>=8.0.0 in /home/hjiang/.local/lib/python3.9/site-packages (from datasets) (11.0.0)\n",
"Requirement already satisfied: dill<0.3.8,>=0.3.0 in /home/hjiang/.local/lib/python3.9/site-packages (from datasets) (0.3.7)\n",
"Requirement already satisfied: pandas in /home/hjiang/.local/lib/python3.9/site-packages (from datasets) (2.0.3)\n",
"Requirement already satisfied: requests>=2.19.0 in /home/hjiang/.local/lib/python3.9/site-packages (from datasets) (2.29.0)\n",
"Requirement already satisfied: tqdm>=4.62.1 in /home/hjiang/.local/lib/python3.9/site-packages (from datasets) (4.65.0)\n",
"Requirement already satisfied: xxhash in /home/hjiang/.local/lib/python3.9/site-packages (from datasets) (3.3.0)\n",
"Requirement already satisfied: multiprocess in /home/hjiang/.local/lib/python3.9/site-packages (from datasets) (0.70.15)\n",
"Requirement already satisfied: fsspec[http]>=2021.11.1 in /home/hjiang/.local/lib/python3.9/site-packages (from datasets) (2023.6.0)\n",
"Requirement already satisfied: aiohttp in /home/hjiang/.local/lib/python3.9/site-packages (from datasets) (3.8.5)\n",
"Requirement already satisfied: huggingface-hub<1.0.0,>=0.14.0 in /home/hjiang/.local/lib/python3.9/site-packages (from datasets) (0.16.4)\n",
"Requirement already satisfied: packaging in /home/hjiang/.local/lib/python3.9/site-packages (from datasets) (23.0)\n",
"Requirement already satisfied: pyyaml>=5.1 in /usr/lib/python3/dist-packages (from datasets) (5.3.1)\n",
"Requirement already satisfied: attrs>=17.3.0 in /home/hjiang/.local/lib/python3.9/site-packages (from aiohttp->datasets) (23.1.0)\n",
"Requirement already satisfied: charset-normalizer<4.0,>=2.0 in /home/hjiang/.local/lib/python3.9/site-packages (from aiohttp->datasets) (3.2.0)\n",
"Requirement already satisfied: multidict<7.0,>=4.5 in /home/hjiang/.local/lib/python3.9/site-packages (from aiohttp->datasets) (6.0.4)\n",
"Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /home/hjiang/.local/lib/python3.9/site-packages (from aiohttp->datasets) (4.0.2)\n",
"Requirement already satisfied: yarl<2.0,>=1.0 in /home/hjiang/.local/lib/python3.9/site-packages (from aiohttp->datasets) (1.9.2)\n",
"Requirement already satisfied: frozenlist>=1.1.1 in /home/hjiang/.local/lib/python3.9/site-packages (from aiohttp->datasets) (1.4.0)\n",
"Requirement already satisfied: aiosignal>=1.1.2 in /home/hjiang/.local/lib/python3.9/site-packages (from aiohttp->datasets) (1.3.1)\n",
"Requirement already satisfied: filelock in /home/hjiang/.local/lib/python3.9/site-packages (from huggingface-hub<1.0.0,>=0.14.0->datasets) (3.12.2)\n",
"Requirement already satisfied: typing-extensions>=3.7.4.3 in /home/hjiang/.local/lib/python3.9/site-packages (from huggingface-hub<1.0.0,>=0.14.0->datasets) (4.7.1)\n",
"Requirement already satisfied: idna<4,>=2.5 in /usr/lib/python3/dist-packages (from requests>=2.19.0->datasets) (2.8)\n",
"Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/hjiang/.local/lib/python3.9/site-packages (from requests>=2.19.0->datasets) (1.26.16)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /usr/lib/python3/dist-packages (from requests>=2.19.0->datasets) (2019.11.28)\n",
"Requirement already satisfied: regex!=2019.12.17 in /home/hjiang/.local/lib/python3.9/site-packages (from transformers>=4.26.0->llmlingua) (2023.6.3)\n",
"Requirement already satisfied: tokenizers<0.15,>=0.14 in /home/hjiang/.local/lib/python3.9/site-packages (from transformers>=4.26.0->llmlingua) (0.14.1)\n",
"Requirement already satisfied: safetensors>=0.3.1 in /home/hjiang/.local/lib/python3.9/site-packages (from transformers>=4.26.0->llmlingua) (0.3.1)\n",
"Requirement already satisfied: click in /home/hjiang/.local/lib/python3.9/site-packages (from nltk->llmlingua) (8.1.6)\n",
"Requirement already satisfied: joblib in /home/hjiang/.local/lib/python3.9/site-packages (from nltk->llmlingua) (1.3.1)\n",
"Requirement already satisfied: python-dateutil>=2.8.2 in /home/hjiang/.local/lib/python3.9/site-packages (from pandas->datasets) (2.8.2)\n",
"Requirement already satisfied: pytz>=2020.1 in /home/hjiang/.local/lib/python3.9/site-packages (from pandas->datasets) (2023.3)\n",
"Requirement already satisfied: tzdata>=2022.1 in /home/hjiang/.local/lib/python3.9/site-packages (from pandas->datasets) (2023.3)\n",
"Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil>=2.8.2->pandas->datasets) (1.14.0)\n",
"\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.3.1\u001b[0m\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3.9 -m pip install --upgrade pip\u001b[0m\n"
]
}
],
"source": [
"# Install dependency.\n",
"!pip install llmlingua datasets"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "7bbb89f7-9f0e-4998-97a6-da033351ef1a",
"metadata": {},
"outputs": [],
"source": [
"# Download the original prompt and dataset\n",
"from datasets import load_dataset\n",
"dataset = load_dataset(\"lytang/MeetingBank-transcript\")[\"train\"]"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "cbbbf3de-a9d6-46cf-afab-dcb72a6154ec",
"metadata": {},
"outputs": [],
"source": [
"# Using the OAI\n",
"import openai\n",
"openai.api_key = \"<insert_openai_key>\""
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "46506810-8565-43da-984b-d862c56b49c2",
"metadata": {},
"outputs": [],
"source": [
"# or Using the AOAI\n",
"import openai\n",
"openai.api_key = \"<insert_openai_key>\"\n",
"openai.api_base = \"https://xxxx.openai.azure.com/\"\n",
"openai.api_type = 'azure'\n",
"openai.api_version = '2023-05-15'"
]
},
{
"cell_type": "markdown",
"id": "f8676ffa-5117-44dc-9742-bb9ab1d56e0c",
"metadata": {},
"source": [
"### Setup Data"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "cc17bbc5-86cb-4d15-a730-955af85a10b2",
"metadata": {},
"outputs": [],
"source": [
"# select an example from MeetingBank\n",
"contexts = dataset[1][\"source\"]"
]
},
{
"cell_type": "markdown",
"id": "ba1c6d52-dc87-434c-a41c-0bbc8a286504",
"metadata": {},
"source": [
"### Q1"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "f8a54b7f-5bd4-4d4f-9249-b900bd703884",
"metadata": {},
"outputs": [],
"source": [
"question = \"Question: How much did the crime rate increase last year?\\nAnswer:\"\n",
"reference = \"5.4%\""
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "3d441f10-c5c7-4d45-b09a-717e536b36bf",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\n",
" \"id\": \"chatcmpl-8FNC3cZSVtzUCxOVhB04RxnEUVrf8\",\n",
" \"object\": \"chat.completion\",\n",
" \"created\": 1698674767,\n",
" \"model\": \"gpt-4-32k\",\n",
" \"choices\": [\n",
" {\n",
" \"index\": 0,\n",
" \"finish_reason\": \"stop\",\n",
" \"message\": {\n",
" \"role\": \"assistant\",\n",
" \"content\": \"The crime rate increased by 5.4% year to date.\"\n",
" }\n",
" }\n",
" ],\n",
" \"usage\": {\n",
" \"prompt_tokens\": 30096,\n",
" \"completion_tokens\": 14,\n",
" \"total_tokens\": 30110\n",
" }\n",
"}\n"
]
}
],
"source": [
"# The response from the original prompt, using GPT-4-32k\n",
"import json\n",
"prompt = \"\\n\\n\".join([contexts, question])\n",
"\n",
"message = [\n",
" {\"role\": \"user\", \"content\": prompt},\n",
"]\n",
"\n",
"request_data = {\n",
" \"messages\": message,\n",
" \"max_tokens\": 100,\n",
" \"temperature\": 0,\n",
" \"top_p\": 1,\n",
" \"n\": 1,\n",
" \"stream\": False,\n",
"}\n",
"response = openai.ChatCompletion.create(\n",
" model=\"gpt-4-32k\",  # with Azure, pass engine=<deployment_name> instead\n",
" **request_data,\n",
|
||||||
|
")\n",
|
||||||
|
"print(json.dumps(response, indent=4))"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 1,
|
||||||
|
"id": "7859d7d7-a6cd-499a-a780-643ba8e0b832",
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [
|
||||||
|
{
|
||||||
|
"data": {
|
||||||
|
"application/vnd.jupyter.widget-view+json": {
|
||||||
|
"model_id": "ba191aa3d6554337a49e9b0896fc73e6",
|
||||||
|
"version_major": 2,
|
||||||
|
"version_minor": 0
|
||||||
|
},
|
||||||
|
"text/plain": [
|
||||||
|
"Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"metadata": {},
|
||||||
|
"output_type": "display_data"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"name": "stderr",
|
||||||
|
"output_type": "stream",
|
||||||
|
"text": [
|
||||||
|
"/home/hjiang/.local/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:362: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.\n",
|
||||||
|
" warnings.warn(\n",
|
||||||
|
"/home/hjiang/.local/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:367: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.\n",
|
||||||
|
" warnings.warn(\n"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"source": [
|
||||||
|
"# Setup LLMLingua\n",
|
||||||
|
"from llmlingua import PromptCompressor\n",
|
||||||
|
"llm_lingua = PromptCompressor()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 17,
|
||||||
|
"id": "0f850fc9-2f7f-42f8-8d8f-5a64e39c1a8b",
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [
|
||||||
|
{
|
||||||
|
"name": "stdout",
|
||||||
|
"output_type": "stream",
|
||||||
|
"text": [
|
||||||
|
"{\n",
|
||||||
|
" \"compressed_prompt\": \"aker3., the.\\n\\naker : Thank you Counciloman Yes,'s. 5.4% increase to date That after this a 1.4 increase in crime in. 1 From Police. Let the police. : day. Our department will continue to evolve and move forward, building on our existing strengths and taking advantage of opportunities for growth and renewal. Our priorities around crime and homelessness, employee and community wellness and open communication will help guide us further into 21st century policing, while also supporting the shared responsibility of public safety in the city of Long Beach. Thank you. Myself and Bureau Chief Josie Murray stand ready to answer any questions they can.\\n\\nQuestion: How much did the crime rate increase last year?\\nAnswer:\",\n",
|
||||||
|
" \"origin_tokens\": 30089,\n",
|
||||||
|
" \"compressed_tokens\": 149,\n",
|
||||||
|
" \"ratio\": \"201.9x\",\n",
|
||||||
|
" \"saving\": \", Saving $1.8 in GPT-4.\"\n",
|
||||||
|
"}\n",
|
||||||
|
"Response: {\n",
|
||||||
|
" \"id\": \"chatcmpl-8FNIg6iVYBfI1354r72xYE9X4tDDE\",\n",
|
||||||
|
" \"object\": \"chat.completion\",\n",
|
||||||
|
" \"created\": 1698675178,\n",
|
||||||
|
" \"model\": \"gpt-4-32k\",\n",
|
||||||
|
" \"choices\": [\n",
|
||||||
|
" {\n",
|
||||||
|
" \"index\": 0,\n",
|
||||||
|
" \"finish_reason\": \"stop\",\n",
|
||||||
|
" \"message\": {\n",
|
||||||
|
" \"role\": \"assistant\",\n",
|
||||||
|
" \"content\": \"The crime rate increased by 5.4% last year.\"\n",
|
||||||
|
" }\n",
|
||||||
|
" }\n",
|
||||||
|
" ],\n",
|
||||||
|
" \"usage\": {\n",
|
||||||
|
" \"prompt_tokens\": 156,\n",
|
||||||
|
" \"completion_tokens\": 13,\n",
|
||||||
|
" \"total_tokens\": 169\n",
|
||||||
|
" }\n",
|
||||||
|
"}\n"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"source": [
|
||||||
|
"# 200 Compression\n",
|
||||||
|
"compressed_prompt = llm_lingua.compress_prompt(\n",
|
||||||
|
" contexts.split(\"\\n\"),\n",
|
||||||
|
" instruction=\"\",\n",
|
||||||
|
" question=question,\n",
|
||||||
|
" target_token=200,\n",
|
||||||
|
" condition_compare=True,\n",
|
||||||
|
" condition_in_question='after',\n",
|
||||||
|
" rank_method='longllmlingua',\n",
|
||||||
|
" use_sentence_level_filter=False,\n",
|
||||||
|
" context_budget=\"+100\",\n",
|
||||||
|
" dynamic_context_compression_ratio=0.4, # enable dynamic_context_compression_ratio\n",
|
||||||
|
" reorder_context=\"sort\"\n",
|
||||||
|
")\n",
|
||||||
|
"message = [\n",
|
||||||
|
" {\"role\": \"user\", \"content\": compressed_prompt[\"compressed_prompt\"]},\n",
|
||||||
|
"]\n",
|
||||||
|
"\n",
|
||||||
|
"request_data = {\n",
|
||||||
|
" \"messages\": message,\n",
|
||||||
|
" \"max_tokens\": 100,\n",
|
||||||
|
" \"temperature\": 0,\n",
|
||||||
|
" \"top_p\": 1,\n",
|
||||||
|
" \"n\": 1,\n",
|
||||||
|
" \"stream\": False,\n",
|
||||||
|
"}\n",
|
||||||
|
"response = openai.ChatCompletion.create(\n",
|
||||||
|
" \"gpt-4-32k\",\n",
|
||||||
|
" **request_data,\n",
|
||||||
|
")\n",
|
||||||
|
"\n",
|
||||||
|
"print(json.dumps(compressed_prompt, indent=4))\n",
|
||||||
|
"print(\"Response:\", response)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "9aa90492-8ad1-4a89-85c5-26b8472f1ff0",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Q2"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 18,
|
||||||
|
"id": "fa638dec-c9ec-4dce-9dac-d768145de714",
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"question = \"Question: What is the homicide clearance rate?\\nAnswer:\"\n",
|
||||||
|
"reference = \"77%\""
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 19,
|
||||||
|
"id": "5f61a186-6641-4118-ad04-5245a53b6d79",
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [
|
||||||
|
{
|
||||||
|
"name": "stdout",
|
||||||
|
"output_type": "stream",
|
||||||
|
"text": [
|
||||||
|
"{\n",
|
||||||
|
" \"id\": \"chatcmpl-8FNJi0fTohhSuLHTF13uWBBcslAtx\",\n",
|
||||||
|
" \"object\": \"chat.completion\",\n",
|
||||||
|
" \"created\": 1698675242,\n",
|
||||||
|
" \"model\": \"gpt-4-32k\",\n",
|
||||||
|
" \"choices\": [\n",
|
||||||
|
" {\n",
|
||||||
|
" \"index\": 0,\n",
|
||||||
|
" \"finish_reason\": \"stop\",\n",
|
||||||
|
" \"message\": {\n",
|
||||||
|
" \"role\": \"assistant\",\n",
|
||||||
|
" \"content\": \"The homicide clearance rate for the Long Beach Fire Department is 77%.\"\n",
|
||||||
|
" }\n",
|
||||||
|
" }\n",
|
||||||
|
" ],\n",
|
||||||
|
" \"usage\": {\n",
|
||||||
|
" \"prompt_tokens\": 30093,\n",
|
||||||
|
" \"completion_tokens\": 14,\n",
|
||||||
|
" \"total_tokens\": 30107\n",
|
||||||
|
" }\n",
|
||||||
|
"}\n"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"source": [
|
||||||
|
"# The response from original prompt, using GPT-4-32k\n",
|
||||||
|
"import json\n",
|
||||||
|
"prompt = \"\\n\\n\".join([contexts, question])\n",
|
||||||
|
"\n",
|
||||||
|
"message = [\n",
|
||||||
|
" {\"role\": \"user\", \"content\": prompt},\n",
|
||||||
|
"]\n",
|
||||||
|
"\n",
|
||||||
|
"request_data = {\n",
|
||||||
|
" \"messages\": message,\n",
|
||||||
|
" \"max_tokens\": 100,\n",
|
||||||
|
" \"temperature\": 0,\n",
|
||||||
|
" \"top_p\": 1,\n",
|
||||||
|
" \"n\": 1,\n",
|
||||||
|
" \"stream\": False,\n",
|
||||||
|
"}\n",
|
||||||
|
"response = openai.ChatCompletion.create(\n",
|
||||||
|
" \"gpt-4-32k\",\n",
|
||||||
|
" **request_data,\n",
|
||||||
|
")\n",
|
||||||
|
"print(json.dumps(response, indent=4))"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 56,
|
||||||
|
"id": "4328e6c4-63f5-4a24-a459-baaa309f9825",
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [
|
||||||
|
{
|
||||||
|
"name": "stdout",
|
||||||
|
"output_type": "stream",
|
||||||
|
"text": [
|
||||||
|
"{\n",
|
||||||
|
" \"compressed_prompt\": \"\\n\\nEvery we discuss a variety of public we provide, emergency response and calls for service criminal investig, and advoc, safarding while protect infrastr and and threats.\\n you see we experiencing, exempl how our are working.\\n51% these arrests forb by law from possessing firear.\\n this alone have seized firear includes a 23% increase in the recovery manufactured firearms knownimps or ghost guns.And while every homic tragic, we not dissuaded and continue to toward bringing justice to the families and loved ones of victimsAmong accomplish,'ll see we have a homicide clearance rate of 77%.\\nThere are many factors that contribute to our effectiveness in this area, including a rapid reaction and response by patrol officers, immediate follow up by our Special Investigations Division and the excellent investigative efforts of our homicide detectives.\\nTo help increase our communication, transparency and engagement, we've developed a community advisory committee to help inform and shape department policies, and we engage our neighborhoods through division, specific events and commander forums.\\n\\nQuestion: What is the homicide clearance rate?\\nAnswer:\",\n",
|
||||||
|
" \"origin_tokens\": 30086,\n",
|
||||||
|
" \"compressed_tokens\": 211,\n",
|
||||||
|
" \"ratio\": \"142.6x\",\n",
|
||||||
|
" \"saving\": \", Saving $1.8 in GPT-4.\"\n",
|
||||||
|
"}\n",
|
||||||
|
"Response: {\n",
|
||||||
|
" \"id\": \"chatcmpl-8FNxaQUPnfByyAmNtRld4FSIVUMtW\",\n",
|
||||||
|
" \"object\": \"chat.completion\",\n",
|
||||||
|
" \"created\": 1698677714,\n",
|
||||||
|
" \"model\": \"gpt-4-32k\",\n",
|
||||||
|
" \"choices\": [\n",
|
||||||
|
" {\n",
|
||||||
|
" \"index\": 0,\n",
|
||||||
|
" \"finish_reason\": \"stop\",\n",
|
||||||
|
" \"message\": {\n",
|
||||||
|
" \"role\": \"assistant\",\n",
|
||||||
|
" \"content\": \"The homicide clearance rate is 77%.\"\n",
|
||||||
|
" }\n",
|
||||||
|
" }\n",
|
||||||
|
" ],\n",
|
||||||
|
" \"usage\": {\n",
|
||||||
|
" \"prompt_tokens\": 218,\n",
|
||||||
|
" \"completion_tokens\": 8,\n",
|
||||||
|
" \"total_tokens\": 226\n",
|
||||||
|
" }\n",
|
||||||
|
"}\n"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"source": [
|
||||||
|
"# 200 Compression\n",
|
||||||
|
"compressed_prompt = llm_lingua.compress_prompt(\n",
|
||||||
|
" contexts.split(\"\\n\"),\n",
|
||||||
|
" instruction=\"\",\n",
|
||||||
|
" question=question,\n",
|
||||||
|
" target_token=200,\n",
|
||||||
|
" condition_compare=True,\n",
|
||||||
|
" condition_in_question='after',\n",
|
||||||
|
" rank_method='longllmlingua',\n",
|
||||||
|
" use_sentence_level_filter=True,\n",
|
||||||
|
" context_budget=\"+100\",\n",
|
||||||
|
" reorder_context=\"sort\"\n",
|
||||||
|
")\n",
|
||||||
|
"message = [\n",
|
||||||
|
" {\"role\": \"user\", \"content\": compressed_prompt[\"compressed_prompt\"]},\n",
|
||||||
|
"]\n",
|
||||||
|
"\n",
|
||||||
|
"request_data = {\n",
|
||||||
|
" \"messages\": message,\n",
|
||||||
|
" \"max_tokens\": 100,\n",
|
||||||
|
" \"temperature\": 0,\n",
|
||||||
|
" \"top_p\": 1,\n",
|
||||||
|
" \"n\": 1,\n",
|
||||||
|
" \"stream\": False,\n",
|
||||||
|
"}\n",
|
||||||
|
"response = openai.ChatCompletion.create(\n",
|
||||||
|
" \"gpt-4-32k\",\n",
|
||||||
|
" **request_data,\n",
|
||||||
|
")\n",
|
||||||
|
"\n",
|
||||||
|
"print(json.dumps(compressed_prompt, indent=4))\n",
|
||||||
|
"print(\"Response:\", response)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "a085d6b2-2642-4ed6-a92f-1a1dc104b954",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Q3"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 57,
|
||||||
|
"id": "5217d1f8-009c-4665-aed9-ae4889358070",
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"question = \"Question: what are the arrangements the Police Department will make this year?\"\n",
|
||||||
|
"reference = \"enhancing community engagement and internal communication models, building a culture of accountability and transparency, and prioritizing recruitment and retention.\""
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 58,
|
||||||
|
"id": "9a78c641-b102-4cd9-bdec-e0fccdd8e19e",
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [
|
||||||
|
{
|
||||||
|
"name": "stdout",
|
||||||
|
"output_type": "stream",
|
||||||
|
"text": [
|
||||||
|
"{\n",
|
||||||
|
" \"id\": \"chatcmpl-8FNz2YdueWIGpFTnRAM0ZbWKPNWIY\",\n",
|
||||||
|
" \"object\": \"chat.completion\",\n",
|
||||||
|
" \"created\": 1698677804,\n",
|
||||||
|
" \"model\": \"gpt-4-32k\",\n",
|
||||||
|
" \"choices\": [\n",
|
||||||
|
" {\n",
|
||||||
|
" \"index\": 0,\n",
|
||||||
|
" \"finish_reason\": \"stop\",\n",
|
||||||
|
" \"message\": {\n",
|
||||||
|
" \"role\": \"assistant\",\n",
|
||||||
|
" \"content\": \"The Police Department plans to focus on addressing the steady increase in call volume and maintaining or improving response times to fires, emergency medical and other emergency responses. They will also prioritize firefighter safety and behavioral health, increase diversity in all ranks of the department through recruitment and training opportunities, and maintain staffing and resources to meet service demands of citywide growth. The department will also begin preparing for the upcoming emergency service demands brought on by the 2028 Summer Olympic Games. They plan to replace front line vehicles and improve compliance with mandated fire prevention inspections. The department also plans to implement alternate destination and telemedicine pilot programs.\"\n",
|
||||||
|
" }\n",
|
||||||
|
" }\n",
|
||||||
|
" ],\n",
|
||||||
|
" \"usage\": {\n",
|
||||||
|
" \"prompt_tokens\": 30096,\n",
|
||||||
|
" \"completion_tokens\": 121,\n",
|
||||||
|
" \"total_tokens\": 30217\n",
|
||||||
|
" }\n",
|
||||||
|
"}\n"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"source": [
|
||||||
|
"# The response from original prompt, using GPT-4-32k\n",
|
||||||
|
"import json\n",
|
||||||
|
"prompt = \"\\n\\n\".join([contexts, question])\n",
|
||||||
|
"\n",
|
||||||
|
"message = [\n",
|
||||||
|
" {\"role\": \"user\", \"content\": prompt},\n",
|
||||||
|
"]\n",
|
||||||
|
"\n",
|
||||||
|
"request_data = {\n",
|
||||||
|
" \"messages\": message,\n",
|
||||||
|
" \"max_tokens\": 500,\n",
|
||||||
|
" \"temperature\": 0,\n",
|
||||||
|
" \"top_p\": 1,\n",
|
||||||
|
" \"n\": 1,\n",
|
||||||
|
" \"stream\": False,\n",
|
||||||
|
"}\n",
|
||||||
|
"response = openai.ChatCompletion.create(\n",
|
||||||
|
" \"gpt-4-32k\",\n",
|
||||||
|
" **request_data,\n",
|
||||||
|
")\n",
|
||||||
|
"print(json.dumps(response, indent=4))"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 61,
|
||||||
|
"id": "26065113-7c23-4118-812e-8fff506ba749",
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [
|
||||||
|
{
|
||||||
|
"name": "stdout",
|
||||||
|
"output_type": "stream",
|
||||||
|
"text": [
|
||||||
|
"{\n",
|
||||||
|
" \"compressed_prompt\": \"Speaker3: Thank., the\\n\\nSpe Thank. Next keep the\\n Thank. Councilwoman Yes,'s5% year date. is after this year with a74%.: Mr. Mods,able Mayor and of the' very be presenting the Polices3 budget. for.ented. police and have experienced increased and de, they. their work purpose they are needed. to leave or, vast majority have toers with Department I believe because not typicalre worldized to maintain andation mental, qualityach programs as are, and to' mistakesre, is. of the or, the officers to here each a Theyageance In of, rising crime un police, theirment andre everyone our Every year we a of safety services,gency and, victim and ouringucture and resource, should also we ourhips and like with the Cityation,uma Program to toaborating with the Department Communic to responses. joining department as part the many other reason're we. Here volumere which. Year10 calls nearly60. Although' had to make modifications through the years to one or the of about 5 Like of remain andies adapted changes Our resulted notableend in And2,, with74% seen that numberink.% date accessms tocing have worked to illegal they this.ve had andve3ests forarms% over. peopleidden. And this, officers have which3% in the recovery of personally p guns Mov second of this'll, violence and other crime. we isic, we dis we to justice onesims .,ll we%. is There our in area, rapid by, up our excellent efforts ofives. In trust, more to becomees We understand this, these factors achieve the seeing' highlight work', racing exhib, City supported efforts and street to assist this extremely and improve we conductedcement we pending street haveve also continued supporting city Our andcement of quality mental evaluation Angeles our,. Year,50 found. our priority supported'reousides employee over the in new ouration and employee programsuma. intersection between community, such as communityison officers on critical, andating and to provide support tra It Officerging that biggestes facing is, theing in, been critical this constantly to futureed in like the focus increasing representation to the our next continued with, new P F T is and at par increase our shape specific of through, Project the for our media betterage public bying and effectively .ve increased a daily crimeter our as our health and we that Department onities .Y3 budget proposal department and communication. Bu're manyative the police budget our res the and to current and more offill this vision. are. our CRC Chief thatance the police. Kerry the divisions and Communityryize To we propose0 per in toness. to specific single officer life large to. The Departmentes Services we to fund. youngian while oursre departmentational. Will the while the will on or. and department, proposed re and the., O to nine F transferred other This inCP to with were works. communication the which The division willations In ourre currentlying.6 weate class2 new support and our the new part4 camera We our for additional forre to receive An new and Theorous and mostatic uned anti andperforce our work relationship and equitable systems in all areas of the department. We'll continue exploring ways to leverage new technology and improve operational efficiencies to help modernize our services or seek new grant opportunities that enhance employee and community training and develop partnerships with research and educational institutions. 
In closing, I'd like to again express how honored I am to lead the officers and professional staff of the Long Beach Police Department and how appreciative I am for the dedicated service they provide each day. Our department will continue to evolve and move forward, building on our existing strengths and taking advantage of opportunities for growth and renewal. Our priorities around crime and homelessness, employee and community wellness and open communication will help guide us further into 21st century policing, while also supporting the shared responsibility of public safety in the city of Long Beach. Thank you. Myself and Bureau Chief Josie Murray stand ready to answer any questions they can.\\n\\nQuestion: what are the arrangements the Police Department will make this year?\",\n",
|
||||||
|
" \"origin_tokens\": 30089,\n",
|
||||||
|
" \"compressed_tokens\": 864,\n",
|
||||||
|
" \"ratio\": \"34.8x\",\n",
|
||||||
|
" \"saving\": \", Saving $1.8 in GPT-4.\"\n",
|
||||||
|
"}\n",
|
||||||
|
"Response: {\n",
|
||||||
|
" \"id\": \"chatcmpl-8FO8w5VmNG5ujiTnL8gqVpQKbqzl8\",\n",
|
||||||
|
" \"object\": \"chat.completion\",\n",
|
||||||
|
" \"created\": 1698678418,\n",
|
||||||
|
" \"model\": \"gpt-4-32k\",\n",
|
||||||
|
" \"choices\": [\n",
|
||||||
|
" {\n",
|
||||||
|
" \"index\": 0,\n",
|
||||||
|
" \"finish_reason\": \"stop\",\n",
|
||||||
|
" \"message\": {\n",
|
||||||
|
" \"role\": \"assistant\",\n",
|
||||||
|
" \"content\": \"The Police Department plans to present a budget that addresses increased demands and challenges. They will focus on maintaining and improving mental health programs, crime prevention, and community engagement. They will also work on improving their response to rising crime rates and ensuring the safety of the community. They plan to collaborate with the City Trauma Program and the Department of Communication. They will also focus on the recovery of illegal firearms and addressing violence and other crimes. They will continue supporting city-wide mental health evaluation programs and increase representation within the department. They will also focus on leveraging new technology and improving operational efficiencies. They will seek new grant opportunities that enhance employee and community training and develop partnerships with research and educational institutions. They also plan to address issues around crime and homelessness, employee and community wellness, and open communication.\"\n",
|
||||||
|
" }\n",
|
||||||
|
" }\n",
|
||||||
|
" ],\n",
|
||||||
|
" \"usage\": {\n",
|
||||||
|
" \"prompt_tokens\": 871,\n",
|
||||||
|
" \"completion_tokens\": 157,\n",
|
||||||
|
" \"total_tokens\": 1028\n",
|
||||||
|
" }\n",
|
||||||
|
"}\n"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"source": [
|
||||||
|
"# 2000 Compression\n",
|
||||||
|
"compressed_prompt = llm_lingua.compress_prompt(\n",
|
||||||
|
" contexts.split(\"\\n\"),\n",
|
||||||
|
" instruction=\"\",\n",
|
||||||
|
" question=question,\n",
|
||||||
|
" target_token=2000,\n",
|
||||||
|
" condition_compare=True,\n",
|
||||||
|
" condition_in_question='after',\n",
|
||||||
|
" rank_method='longllmlingua',\n",
|
||||||
|
" use_sentence_level_filter=False,\n",
|
||||||
|
" context_budget=\"+100\",\n",
|
||||||
|
" dynamic_context_compression_ratio=0.4, # enable dynamic_context_compression_ratio\n",
|
||||||
|
" reorder_context=\"sort\"\n",
|
||||||
|
")\n",
|
||||||
|
"message = [\n",
|
||||||
|
" {\"role\": \"user\", \"content\": compressed_prompt[\"compressed_prompt\"]},\n",
|
||||||
|
"]\n",
|
||||||
|
"\n",
|
||||||
|
"request_data = {\n",
|
||||||
|
" \"messages\": message,\n",
|
||||||
|
" \"max_tokens\": 500,\n",
|
||||||
|
" \"temperature\": 0,\n",
|
||||||
|
" \"top_p\": 1,\n",
|
||||||
|
" \"n\": 1,\n",
|
||||||
|
" \"stream\": False,\n",
|
||||||
|
"}\n",
|
||||||
|
"response = openai.ChatCompletion.create(\n",
|
||||||
|
" \"gpt-4-32k\",\n",
|
||||||
|
" **request_data,\n",
|
||||||
|
")\n",
|
||||||
|
"\n",
|
||||||
|
"print(json.dumps(compressed_prompt, indent=4))\n",
|
||||||
|
"print(\"Response:\", response)"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3 (ipykernel)",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python3"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"file_extension": ".py",
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"name": "python",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"version": "3.9.18"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
|
"nbformat_minor": 5
|
||||||
|
}
|
||||||
527
examples/RAG.ipynb
Normal file
@@ -0,0 +1,527 @@
|
|||||||
|
{
|
||||||
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "1972a352-a0e3-41b7-81dc-dd4ae2b890c3",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Retrieval-Augmented Generation (RAG)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "05d999bc-83a3-454f-a8a4-44cbff1fcedc",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"<a target=\"_blank\" href=\"https://colab.research.google.com/github/microsoft/LLMLingua/blob/main/examples/RAG.ipynb\">\r\n",
|
||||||
|
" <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/>\r\n",
|
||||||
|
"</a>"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "fe3ed1ce-d38d-4048-9db6-9707b55dc642",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Retrieval-Augmented Generation (RAG) is a powerful and popular technique that applies specialized knowledge to large language models (LLMs). However, traditional RAG methods tend to have increasingly long prompts, sometimes exceeding **40k**, which can result in high financial and latency costs. Moreover, the decreased information density within the prompts can lead to performance degradation in LLMs, such as the \"lost in the middle\" issue."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "ae003ead-2f07-44a4-b641-2e33be920dd9",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"<center><img width=\"800\" src=\"../images/LongLLMLingua_Motivation.png\"></center>"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "0b39b33f-5860-4825-8f00-d60aed0dce86",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"To address this, we propose [**LongLLMLingua**](https://arxiv.org/abs/2310.06839), which specifically tackles the low information density problem in long context scenarios via prompt compression, making it particularly suitable for RAG tasks. The main ideas involve a two-stage compression process, as shown by the <font color='red'>**red line**</font>, which significantly improves the original curve:\n",
|
||||||
|
"\n",
|
||||||
|
"- Coarse-grained compression through document-level perplexity;\n",
|
||||||
|
"- Fine-grained compression of the remaining text using token perplexity;"
|
||||||
|
]
|
||||||
|
},
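{
"cell_type": "markdown",
"id": "b2f1c3a4-5d6e-4f7a-8b9c-0d1e2f3a4b01",
"metadata": {},
"source": [
"Both stages map onto arguments of `PromptCompressor.compress_prompt`. The cell below is a minimal sketch, not a run from this notebook: `docs`, `question`, and `target_token=500` are placeholder values.\n",
"\n",
"```python\n",
"from llmlingua import PromptCompressor\n",
"\n",
"llm_lingua = PromptCompressor()\n",
"result = llm_lingua.compress_prompt(\n",
"    docs,                         # list of document strings (placeholder)\n",
"    question=question,            # enables question-aware compression\n",
"    target_token=500,             # overall token budget\n",
"    rank_method=\"longllmlingua\",  # coarse stage: rank whole documents\n",
"    condition_compare=True,       # fine stage: contrastive token perplexity\n",
"    condition_in_question=\"after\",\n",
")\n",
"print(result[\"compressed_prompt\"])\n",
"```"
]
},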
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "c748f877-4bbf-443c-b72b-332be1df6f1a",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Instead of fighting against positional effects, we aim to utilize them to our advantage through document reordering, as illustrated by the <font color='green'>**green line**</font>. In this approach, the most critical passages are placed at the beginning and the end of the context. Furthermore, the entire process becomes more **cost-effective and faster** since it only requires handling **1/4** of the original context."
|
||||||
|
]
|
||||||
|
},
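{
"cell_type": "markdown",
"id": "7b8c9d0e-1f2a-4b3c-8d4e-5f6a7b8c9d02",
"metadata": {},
"source": [
"One simple way to realize this placement is sketched below. This illustrates the idea only; the `reorder_context=\"sort\"` option used later in this notebook may implement the reordering differently.\n",
"\n",
"```python\n",
"def reorder_for_position_bias(docs_ranked):\n",
"    \"\"\"Place the most important documents at the beginning and the end.\n",
"\n",
"    docs_ranked: documents sorted from most to least important.\n",
"    \"\"\"\n",
"    front, back = [], []\n",
"    for i, doc in enumerate(docs_ranked):\n",
"        # Even ranks fill the front, odd ranks fill the (reversed) back.\n",
"        (front if i % 2 == 0 else back).append(doc)\n",
"    return front + back[::-1]\n",
"\n",
"# Ranked by importance: A > B > C > D\n",
"print(reorder_for_position_bias([\"A\", \"B\", \"C\", \"D\"]))  # ['A', 'C', 'D', 'B']\n",
"```"
]
},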
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "18422597-687a-43aa-a6ed-ce6244d0eb55",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### NaturalQuestions Multi-document QA"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "51a7accd-5ec2-4ed2-9582-1afdb441a998",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Next, we will demonstrate the use of LongLLMLingua on the NaturalQuestions dataset, which effectively alleviates the \"lost in the middle\" issue. This dataset closely resembles real-world RAG scenarios, as it first employs the Contriever retrieval system to recall 20 relevant documents (including 1 ground truth and 19 related documents), and then answers the respective questions based on the prompts composed of these 20 documents.\n",
|
||||||
|
"\n",
|
||||||
|
"The original dataset can be found in https://github.com/nelson-liu/lost-in-the-middle/tree/main/qa_data."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 6,
|
||||||
|
"id": "a970a901-11bd-43af-a8bc-7fb2fc6a1a07",
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [
|
||||||
|
{
|
||||||
|
"name": "stdout",
|
||||||
|
"output_type": "stream",
|
||||||
|
"text": [
|
||||||
|
"Cloning into 'lost-in-the-middle'...\n",
|
||||||
|
"remote: Enumerating objects: 101, done.\u001b[K\n",
|
||||||
|
"remote: Counting objects: 100% (78/78), done.\u001b[K\n",
|
||||||
|
"remote: Compressing objects: 100% (52/52), done.\u001b[K\n",
|
||||||
|
"remote: Total 101 (delta 33), reused 61 (delta 17), pack-reused 23\u001b[K\n",
|
||||||
|
"Receiving objects: 100% (101/101), 254.44 MiB | 48.70 MiB/s, done.\n",
|
||||||
|
"Resolving deltas: 100% (33/33), done.\n",
|
||||||
|
"Defaulting to user installation because normal site-packages is not writeable\n",
|
||||||
|
"Obtaining file:///home/hjiang/Code/github/LLMLingua/examples/lost-in-the-middle\n",
|
||||||
|
" Installing build dependencies ... \u001b[?25ldone\n",
|
||||||
|
"\u001b[?25h Checking if build backend supports build_editable ... \u001b[?25ldone\n",
|
||||||
|
"\u001b[?25h Getting requirements to build editable ... \u001b[?25ldone\n",
|
||||||
|
"\u001b[?25h Preparing editable metadata (pyproject.toml) ... \u001b[?25ldone\n",
|
||||||
|
"\u001b[?25hRequirement already satisfied: xopen in /home/hjiang/.local/lib/python3.9/site-packages (from lost-in-the-middle==0.0.0) (1.7.0)\n",
|
||||||
|
"Requirement already satisfied: isal>=1.0.0 in /home/hjiang/.local/lib/python3.9/site-packages (from xopen->lost-in-the-middle==0.0.0) (1.2.0)\n",
|
||||||
|
"Building wheels for collected packages: lost-in-the-middle\n",
|
||||||
|
" Building editable for lost-in-the-middle (pyproject.toml) ... \u001b[?25ldone\n",
|
||||||
|
"\u001b[?25h Created wheel for lost-in-the-middle: filename=lost_in_the_middle-0.0.0-0.editable-py3-none-any.whl size=4611 sha256=2c670631c3bce6e2ca5b87fdc43e73402f33cc2b96aceaa3c89b4ae22f3de741\n",
|
||||||
|
" Stored in directory: /tmp/pip-ephem-wheel-cache-y7iw2jwb/wheels/1e/ff/75/6c31681b19235602b007f32c4ec397e7e2eeacc2c76fcefcde\n",
|
||||||
|
"Successfully built lost-in-the-middle\n",
|
||||||
|
"Installing collected packages: lost-in-the-middle\n",
|
||||||
|
" Attempting uninstall: lost-in-the-middle\n",
|
||||||
|
" Found existing installation: lost-in-the-middle 0.0.0\n",
|
||||||
|
" Uninstalling lost-in-the-middle-0.0.0:\n",
|
||||||
|
" Successfully uninstalled lost-in-the-middle-0.0.0\n",
|
||||||
|
"Successfully installed lost-in-the-middle-0.0.0\n",
|
||||||
|
"\n",
|
||||||
|
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.3.1\u001b[0m\n",
|
||||||
|
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3.9 -m pip install --upgrade pip\u001b[0m\n"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"source": [
|
||||||
|
"# Install dependency.\n",
|
||||||
|
"## Lost in the middle\n",
|
||||||
|
"!git clone https://github.com/nelson-liu/lost-in-the-middle\n",
|
||||||
|
"!cd lost-in-the-middle && echo \"xopen\" > requirements.txt && pip install -e .\n",
|
||||||
|
"## LLMLingu\n",
|
||||||
|
"!pip install llmlingua"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 8,
|
||||||
|
"id": "cbbbf3de-a9d6-46cf-afab-dcb72a6154ec",
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Using the OAI\n",
|
||||||
|
"import openai\n",
|
||||||
|
"openai.api_key = \"<insert_openai_key>\""
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 42,
|
||||||
|
"id": "46506810-8565-43da-984b-d862c56b49c2",
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# or Using the AOAI\n",
|
||||||
|
"import openai\n",
|
||||||
|
"openai.api_key = \"<insert_openai_key>\"\n",
|
||||||
|
"openai.api_base = \"https://xxxx.openai.azure.com/\"\n",
|
||||||
|
"openai.api_type = 'azure'\n",
|
||||||
|
"openai.api_version = '2023-05-15'"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "f8676ffa-5117-44dc-9742-bb9ab1d56e0c",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Setup Data"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 12,
|
||||||
|
"id": "bb349566-83d8-44ac-a683-b67ed9ddf7a6",
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [
|
||||||
|
{
|
||||||
|
"name": "stderr",
|
||||||
|
"output_type": "stream",
|
||||||
|
"text": [
|
||||||
|
"100%|██████████| 2655/2655 [00:01<00:00, 1550.38it/s]\n"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"source": [
|
||||||
|
"import json\n",
|
||||||
|
"from xopen import xopen\n",
|
||||||
|
"from copy import deepcopy\n",
|
||||||
|
"from tqdm import tqdm\n",
|
||||||
|
"from lost_in_the_middle.prompting import (\n",
|
||||||
|
" Document,\n",
|
||||||
|
" get_closedbook_qa_prompt,\n",
|
||||||
|
" get_qa_prompt,\n",
|
||||||
|
")\n",
|
||||||
|
"\n",
|
||||||
|
"datasets = []\n",
|
||||||
|
"path = \"./lost-in-the-middle/qa_data/20_total_documents/nq-open-20_total_documents_gold_at_9.jsonl.gz\"\n",
|
||||||
|
"with xopen(path) as f:\n",
|
||||||
|
" for ii, jj in tqdm(enumerate(f), total=2655):\n",
|
||||||
|
" input_example = json.loads(jj)\n",
|
||||||
|
" question = input_example[\"question\"]\n",
|
||||||
|
" documents = []\n",
|
||||||
|
" for ctx in deepcopy(input_example[\"ctxs\"]):\n",
|
||||||
|
" documents.append(Document.from_dict(ctx))\n",
|
||||||
|
"\n",
|
||||||
|
" prompt = get_qa_prompt(\n",
|
||||||
|
" question,\n",
|
||||||
|
" documents,\n",
|
||||||
|
" mention_random_ordering=False,\n",
|
||||||
|
" query_aware_contextualization=False,\n",
|
||||||
|
" )\n",
|
||||||
|
"\n",
|
||||||
|
" c = prompt.split(\"\\n\\n\")\n",
|
||||||
|
" instruction, question = c[0], c[-1]\n",
|
||||||
|
" demonstration = \"\\n\".join(c[1:-1])\n",
|
||||||
|
" datasets.append({\"id\": ii, \"instruction\": instruction, \"demonstration\": demonstration, \"question\": question, \"answer\": input_example[\"answers\"]})"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 20,
|
||||||
|
"id": "cc17bbc5-86cb-4d15-a730-955af85a10b2",
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# select an example from NaturalQuestions\n",
|
||||||
|
"instruction, demonstration_str, question, answer = [datasets[23][key] for key in [\"instruction\", \"demonstration\", \"question\", \"answer\"]]"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 23,
|
||||||
|
"id": "58718a19-cc4e-4002-a92a-58ea3de9c9d0",
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [
|
||||||
|
{
|
||||||
|
"data": {
|
||||||
|
"text/plain": [
|
||||||
|
"['14']"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"execution_count": 23,
|
||||||
|
"metadata": {},
|
||||||
|
"output_type": "execute_result"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"source": [
|
||||||
|
"# Ground-truth Answer\n",
|
||||||
|
"answer"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "ba1c6d52-dc87-434c-a41c-0bbc8a286504",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### The response of Original prompt (Error)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 25,
|
||||||
|
"id": "3d441f10-c5c7-4d45-b09a-717e536b36bf",
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [
|
||||||
|
{
|
||||||
|
"name": "stdout",
|
||||||
|
"output_type": "stream",
|
||||||
|
"text": [
|
||||||
|
"{\n",
|
||||||
|
" \"id\": \"chatcmpl-8FFZIQCjv9Dv5Q9WQcDmNBT1OJIP8\",\n",
|
||||||
|
" \"object\": \"chat.completion\",\n",
|
||||||
|
" \"created\": 1698645456,\n",
|
||||||
|
" \"model\": \"gpt-35-turbo\",\n",
|
||||||
|
" \"choices\": [\n",
|
||||||
|
" {\n",
|
||||||
|
" \"index\": 0,\n",
|
||||||
|
" \"finish_reason\": \"stop\",\n",
|
||||||
|
" \"message\": {\n",
|
||||||
|
" \"role\": \"assistant\",\n",
|
||||||
|
" \"content\": \"As of the provided search results, OPEC has 15 member countries.\"\n",
|
||||||
|
" }\n",
|
||||||
|
" }\n",
|
||||||
|
" ],\n",
|
||||||
|
" \"usage\": {\n",
|
||||||
|
" \"prompt_tokens\": 2897,\n",
|
||||||
|
" \"completion_tokens\": 15,\n",
|
||||||
|
" \"total_tokens\": 2912\n",
|
||||||
|
" }\n",
|
||||||
|
"}\n"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"source": [
|
||||||
|
"# The response from original prompt, error\n",
|
||||||
|
"prompt = \"\\n\\n\".join([instruction, demonstration_str, question])\n",
|
||||||
|
"\n",
|
||||||
|
"message = [\n",
|
||||||
|
" {\"role\": \"user\", \"content\": prompt},\n",
|
||||||
|
"]\n",
|
||||||
|
"\n",
|
||||||
|
"request_data = {\n",
|
||||||
|
" \"messages\": message,\n",
|
||||||
|
" \"max_tokens\": 100,\n",
|
||||||
|
" \"temperature\": 0,\n",
|
||||||
|
" \"top_p\": 1,\n",
|
||||||
|
" \"n\": 1,\n",
|
||||||
|
" \"stream\": False,\n",
|
||||||
|
"}\n",
|
||||||
|
"response = openai.ChatCompletion.create(\n",
|
||||||
|
" \"gpt-3.5-turbo\",\n",
|
||||||
|
" **request_data,\n",
|
||||||
|
")\n",
|
||||||
|
"print(json.dumps(response, indent=4))"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "9aa90492-8ad1-4a89-85c5-26b8472f1ff0",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### The response of Compressed Prompt (Correct in 10x Compression)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 29,
|
||||||
|
"id": "fa638dec-c9ec-4dce-9dac-d768145de714",
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [
|
||||||
|
{
|
||||||
|
"data": {
|
||||||
|
"application/vnd.jupyter.widget-view+json": {
|
||||||
|
"model_id": "0cbd44bf14024a3291cce2187b1ec363",
|
||||||
|
"version_major": 2,
|
||||||
|
"version_minor": 0
|
||||||
|
},
|
||||||
|
"text/plain": [
|
||||||
|
"Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"metadata": {},
|
||||||
|
"output_type": "display_data"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"name": "stderr",
|
||||||
|
"output_type": "stream",
|
||||||
|
"text": [
|
||||||
|
"/home/hjiang/.local/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:362: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.\n",
|
||||||
|
" warnings.warn(\n",
|
||||||
|
"/home/hjiang/.local/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:367: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.\n",
|
||||||
|
" warnings.warn(\n"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"source": [
|
||||||
|
"# Setup LLMLingua\n",
|
||||||
|
"from llmlingua import PromptCompressor\n",
|
||||||
|
"llm_lingua = PromptCompressor()"
|
||||||
|
]
|
||||||
|
},
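{
"cell_type": "markdown",
"id": "9c0d1e2f-3a4b-4c5d-8e6f-7a8b9c0d1e03",
"metadata": {},
"source": [
"By default, `PromptCompressor()` loads a LLaMA-2-7B-scale small language model as the compressor (hence the two checkpoint shards above), which requires a GPU. As a sketch for quick, lower-quality experiments; the argument names follow the llmlingua README, so verify them against your installed version:\n",
"\n",
"```python\n",
"# Hypothetical lighter-weight setup: a GPT-2-sized compressor on CPU.\n",
"llm_lingua_small = PromptCompressor(model_name=\"gpt2\", device_map=\"cpu\")\n",
"```"
]
},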
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 43,
|
||||||
|
"id": "5f61a186-6641-4118-ad04-5245a53b6d79",
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [
|
||||||
|
{
|
||||||
|
"name": "stdout",
|
||||||
|
"output_type": "stream",
|
||||||
|
"text": [
|
||||||
|
"{\n",
|
||||||
|
" \"compressed_prompt\": \"Write a high-quality answer for the given question using only the provided search results (some of which might be irrelevant).\\n\\nDocument [10](Title: OPEC Organization of the Petroleum Exporting Countries (OPEC, /\\u02c8o\\u028ap\\u025bk/ OH-pek, or OPEP in several other languages) is an intergovernmental organization of 14 nations as of February 2018, founded in 1960 in Baghdad by the first five members (Iran, Iraq, Kuwait, Saudi Arabia, and Venezuela), and headquartered since 1965 in Vienna, Austria. As of 2016, the 14 countries accounted for an estimated 44 percent of global oil production and 73 percent of the world's \\\"proven\\\" oil reserves, giving OPEC a major influence on global oil prices that were previously determined by American-dominated multinational oil companies.\\n\\nDocument1](Title: OPE OPE lost its newest members, who had in mid-1970s E withd in December 192, because it was unwilling to pay annual US$2 million membership fee felt that it needed produce more oil it was allowed under the OPEC quota, although it rejoined October 200. concerns prompted Gabon suspend membership in January 199; it rejoined in July 201. Ira remained a member of OPEC since the organization's found but Iraqi production was not part of OPEC quota agre from 198 to 26 due to the country'sun political.\\nDocument [Title OPEruption-den Libia) half of27 alongside promised Russia and ten non offset increases in the shale, Niger, sur late-2016 before the cut effect Indones another \\\"aryension of itsPE membership, rather than accepting organization requested cutuated US$50/l, in 2017 decided to extend newas through March201, the waiting to see and the inventory glut fully siphonoff by1C)\\\" of itsend cut. Some commentators consider that the United States was a de facto member of OPEC during its formal occupation of Iraq, due to its leadership of the Coalition Provisional Authority in 2003\\u20132004. But this is not borne out by the minutes of OPEC meetings, as no US representative attended in an official capacity. Since the 1980s, representatives from Egypt, Mexico, Norway, Oman, Russia, and other oil-exporting nations have attended many OPEC meetings as observers. This arrangement serves as an informal mechanism for coordinating policies. The OPEC Conference\\n\\nQuestion: how many countries are a part of opec\\nAnswer:\",\n",
|
||||||
|
" \"origin_tokens\": 2890,\n",
|
||||||
|
" \"compressed_tokens\": 520,\n",
|
||||||
|
" \"ratio\": \"5.6x\",\n",
|
||||||
|
" \"saving\": \", Saving $0.1 in GPT-4.\"\n",
|
||||||
|
"}\n",
|
||||||
|
"Response: {\n",
|
||||||
|
" \"id\": \"chatcmpl-8FHk8at6284Ur8hbKA41NBqRbeNSc\",\n",
|
||||||
|
" \"object\": \"chat.completion\",\n",
|
||||||
|
" \"created\": 1698653816,\n",
|
||||||
|
" \"model\": \"gpt-35-turbo\",\n",
|
||||||
|
" \"choices\": [\n",
|
||||||
|
" {\n",
|
||||||
|
" \"index\": 0,\n",
|
||||||
|
" \"finish_reason\": \"stop\",\n",
|
||||||
|
" \"message\": {\n",
|
||||||
|
" \"role\": \"assistant\",\n",
|
||||||
|
" \"content\": \"As of February 2018, there are 14 countries that are part of OPEC.\"\n",
|
||||||
|
" }\n",
|
||||||
|
" }\n",
|
||||||
|
" ],\n",
|
||||||
|
" \"usage\": {\n",
|
||||||
|
" \"prompt_tokens\": 527,\n",
|
||||||
|
" \"completion_tokens\": 19,\n",
|
||||||
|
" \"total_tokens\": 546\n",
|
||||||
|
" }\n",
|
||||||
|
"}\n"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"source": [
|
||||||
|
"# 6x Compression\n",
|
||||||
|
"compressed_prompt = llm_lingua.compress_prompt(\n",
|
||||||
|
" demonstration_str.split(\"\\n\"),\n",
|
||||||
|
" instruction=instruction,\n",
|
||||||
|
" question=question,\n",
|
||||||
|
" target_token=500,\n",
|
||||||
|
" condition_compare=True,\n",
|
||||||
|
" condition_in_question='after',\n",
|
||||||
|
" rank_method='longllmlingua',\n",
|
||||||
|
" use_sentence_level_filter=False,\n",
|
||||||
|
" context_budget=\"+100\",\n",
|
||||||
|
" dynamic_context_compression_ratio=0.4, # enable dynamic_context_compression_ratio\n",
|
||||||
|
" reorder_context=\"sort\"\n",
|
||||||
|
")\n",
|
||||||
|
"message = [\n",
|
||||||
|
" {\"role\": \"user\", \"content\": compressed_prompt[\"compressed_prompt\"]},\n",
|
||||||
|
"]\n",
|
||||||
|
"\n",
|
||||||
|
"request_data = {\n",
|
||||||
|
" \"messages\": message,\n",
|
||||||
|
" \"max_tokens\": 100,\n",
|
||||||
|
" \"temperature\": 0,\n",
|
||||||
|
" \"top_p\": 1,\n",
|
||||||
|
" \"n\": 1,\n",
|
||||||
|
" \"stream\": False,\n",
|
||||||
|
"}\n",
|
||||||
|
"response = openai.ChatCompletion.create(\n",
|
||||||
|
" \"gpt-3.5-turbo\",\n",
|
||||||
|
" **request_data,\n",
|
||||||
|
")\n",
|
||||||
|
"\n",
|
||||||
|
"print(json.dumps(compressed_prompt, indent=4))\n",
|
||||||
|
"print(\"Response:\", response)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 40,
|
||||||
|
"id": "4328e6c4-63f5-4a24-a459-baaa309f9825",
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [
|
||||||
|
{
|
||||||
|
"name": "stdout",
|
||||||
|
"output_type": "stream",
|
||||||
|
"text": [
|
||||||
|
"{\n",
|
||||||
|
" \"compressed_prompt\": \"Write a high-quality answer for the given question using only the provided search results (some of which might be irrelevant).\\n\\n0Title: OPECization of Petroleum Exporting Count (OPEC, /\\u02c8o\\u028ap\\u025bk OHpekP in other) is an intergovernmental14 nations as February 218 founded in 960 in Baghdad by fiveIran Iraq, Kuwait, Saudi Arab, and Venezuela), headquartered since 965 in, Austria. of the4ed an estimated4 percent of production and 3 percent of the world's \\\"proven\\\" oil res OPEC on global by Americandominatedin companies.\\n\\n5](Title: OPEC) OPEC lost its two newest members, who had joined in the mid-1970s. Ecuador withdrew in December 1992, because it was unwilling to pay the annual US$2 million membership fee and felt that it needed to produce more oil than it was allowed under the OPEC quota, although it rejoined in October 2007. Similar concerns prompted Gabon to suspend membership in January 1995; it rejoined in July 2016. Iraq has remained a member of OPEC since the organization's founding, but Iraqi production was not a part of OPEC quota agreements from 1998 to 2016, due to the country's daunting political difficulties.\\n\\nQuestion: how many countries are a part of opec\\nAnswer:\",\n",
|
||||||
|
" \"origin_tokens\": 2890,\n",
|
||||||
|
" \"compressed_tokens\": 285,\n",
|
||||||
|
" \"ratio\": \"10.1x\",\n",
|
||||||
|
" \"saving\": \", Saving $0.2 in GPT-4.\"\n",
|
||||||
|
"}\n",
|
||||||
|
"Response: {\n",
|
||||||
|
" \"id\": \"chatcmpl-8FHgqwIufzoXoDTLvh3S3eRVTOLSU\",\n",
|
||||||
|
" \"object\": \"chat.completion\",\n",
|
||||||
|
" \"created\": 1698653612,\n",
|
||||||
|
" \"model\": \"gpt-35-turbo\",\n",
|
||||||
|
" \"choices\": [\n",
|
||||||
|
" {\n",
|
||||||
|
" \"index\": 0,\n",
|
||||||
|
" \"finish_reason\": \"length\",\n",
|
||||||
|
" \"message\": {\n",
|
||||||
|
" \"role\": \"assistant\",\n",
|
||||||
|
" \"content\": \"OPEC, or the Organization of the Petroleum Exporting Countries, is currently composed of 14 member nations. These countries are Iran, Iraq, Kuwait, Saudi Arabia, Venezuela, Algeria, Angola, Congo, Equatorial Guinea, Gabon, Libya, Nigeria, United Arab Emirates, and Ecuador. However, it is important to note that Ecuador temporarily withdrew from OPEC in December 1992 but rejoined in October 2007, while Gabon suspended its membership in January 1995 but\"\n",
|
||||||
|
" }\n",
|
||||||
|
" }\n",
|
||||||
|
" ],\n",
|
||||||
|
" \"usage\": {\n",
|
||||||
|
" \"prompt_tokens\": 292,\n",
|
||||||
|
" \"completion_tokens\": 100,\n",
|
||||||
|
" \"total_tokens\": 392\n",
|
||||||
|
" }\n",
|
||||||
|
"}\n"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"source": [
|
||||||
|
"# 10x Compression\n",
|
||||||
|
"compressed_prompt = llm_lingua.compress_prompt(\n",
|
||||||
|
" demonstration_str.split(\"\\n\"),\n",
|
||||||
|
" instruction=instruction,\n",
|
||||||
|
" question=question,\n",
|
||||||
|
" target_token=100,\n",
|
||||||
|
" condition_compare=True,\n",
|
||||||
|
" condition_in_question='after',\n",
|
||||||
|
" rank_method='longllmlingua',\n",
|
||||||
|
" use_sentence_level_filter=False,\n",
|
||||||
|
" context_budget=\"+100\",\n",
|
||||||
|
" dynamic_context_compression_ratio=0.4, # enable dynamic_context_compression_ratio\n",
|
||||||
|
" reorder_context=\"sort\"\n",
|
||||||
|
")\n",
|
||||||
|
"message = [\n",
|
||||||
|
" {\"role\": \"user\", \"content\": compressed_prompt[\"compressed_prompt\"]},\n",
|
||||||
|
"]\n",
|
||||||
|
"\n",
|
||||||
|
"request_data = {\n",
|
||||||
|
" \"messages\": message,\n",
|
||||||
|
" \"max_tokens\": 100,\n",
|
||||||
|
" \"temperature\": 0,\n",
|
||||||
|
" \"top_p\": 1,\n",
|
||||||
|
" \"n\": 1,\n",
|
||||||
|
" \"stream\": False,\n",
|
||||||
|
"}\n",
|
||||||
|
"response = openai.ChatCompletion.create(\n",
|
||||||
|
" \"gpt-3.5-turbo\",\n",
|
||||||
|
" **request_data,\n",
|
||||||
|
")\n",
|
||||||
|
"\n",
|
||||||
|
"print(json.dumps(compressed_prompt, indent=4))\n",
|
||||||
|
"print(\"Response:\", response)"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3 (ipykernel)",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python3"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"file_extension": ".py",
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"name": "python",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"version": "3.9.18"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
|
"nbformat_minor": 5
|
||||||
|
}
|
||||||
637
examples/RAGLlamaIndex.ipynb
Normal file
@@ -0,0 +1,637 @@
|
|||||||
|
{
|
||||||
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "1972a352-a0e3-41b7-81dc-dd4ae2b890c3",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Retrieval-Augmented Generation (RAG) using LlamaIndex"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "05d999bc-83a3-454f-a8a4-44cbff1fcedc",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"<a target=\"_blank\" href=\"https://colab.research.google.com/github/microsoft/LLMLingua/blob/main/examples/RAGLlamaIndex.ipynb\">\r\n",
|
||||||
|
" <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/>\r\n",
|
||||||
|
"</a>"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "a06035dc-f812-419b-bd08-538c2e00cdda",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"[**LlamaIndex**](https://github.com/run-llama/llama_index) is a widely used RAG framework. **LLMLingua** and **LongLLMLingua** have also been incorporated into the [LlamaIndex pipeline](https://github.com/run-llama/llama_index), which allows for more convenient use of LLMLingua-related technologies in RAG scenarios."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "a6137de2-0e3f-4962-860c-680da4df2eae",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"More specifically, [**LongLLMLinguaPostprocessor**](https://github.com/run-llama/llama_index/blob/main/llama_index/indices/postprocessor/longllmlingua.py#L16) can be used as a **Postprocessor** in **LlamaIndex** by invoking it, with arguments consistent with those in the [**PromptCompressor**](https://github.com/microsoft/LLMLingua/blob/main/llmlingua/prompt_compressor.py) of [**LLMLingua**](https://github.com/microsoft/LLMLingua).\n",
|
||||||
|
"You can call the corresponding compression algorithms in LLMLingua and the question-aware prompt compression method in LongLLMLingua."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "44f78b07-0a11-4c71-86cb-213a32c4fd7a",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"For examples,\n",
|
||||||
|
"```python\n",
|
||||||
|
"from llama_index.query_engine import RetrieverQueryEngine\n",
|
||||||
|
"from llama_index.response_synthesizers import CompactAndRefine\n",
|
||||||
|
"from llama_index.indices.postprocessor import LongLLMLinguaPostprocessor\n",
|
||||||
|
"\n",
|
||||||
|
"node_postprocessor = LongLLMLinguaPostprocessor(\n",
|
||||||
|
" instruction_str=\"Given the context, please answer the final question\",\n",
|
||||||
|
" target_token=300,\n",
|
||||||
|
" rank_method=\"longllmlingua\",\n",
|
||||||
|
" additional_compress_kwargs={\n",
|
||||||
|
" \"condition_compare\": True,\n",
|
||||||
|
" \"condition_in_question\": \"after\",\n",
|
||||||
|
" \"context_budget\": \"+100\",\n",
|
||||||
|
" \"reorder_context\": \"sort\", # enable document reorder\n",
|
||||||
|
" \"dynamic_context_compression_ratio\": 0.4, # enable dynamic compression ratio\n",
|
||||||
|
" },\n",
|
||||||
|
")\n",
|
||||||
|
"```"
|
||||||
|
]
|
||||||
|
},
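{
"cell_type": "markdown",
"id": "1d2e3f4a-5b6c-4d7e-8f9a-0b1c2d3e4f04",
"metadata": {},
"source": [
"The postprocessor is then wired into a query engine together with a response synthesizer. The cell below is a sketch under the assumption that an `index` (e.g., a `VectorStoreIndex` over the documents loaded later in this notebook) is available; the query string is only an illustrative example.\n",
"\n",
"```python\n",
"# Assumes `index` has been built elsewhere, e.g. VectorStoreIndex.from_documents(documents).\n",
"retriever = index.as_retriever(similarity_top_k=10)\n",
"synthesizer = CompactAndRefine()\n",
"\n",
"query_engine = RetrieverQueryEngine.from_args(\n",
"    retriever,\n",
"    node_postprocessors=[node_postprocessor],  # LongLLMLingua compresses the retrieved nodes\n",
"    response_synthesizer=synthesizer,\n",
")\n",
"\n",
"response = query_engine.query(\"What did the author do growing up?\")\n",
"```"
]
},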
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "fe3ed1ce-d38d-4048-9db6-9707b55dc642",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Retrieval-Augmented Generation (RAG) is a powerful and popular technique that applies specialized knowledge to large language models (LLMs). However, traditional RAG methods tend to have increasingly long prompts, sometimes exceeding **40k**, which can result in high financial and latency costs. Moreover, the decreased information density within the prompts can lead to performance degradation in LLMs, such as the \"lost in the middle\" issue."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "ae003ead-2f07-44a4-b641-2e33be920dd9",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"<center><img width=\"800\" src=\"../images/LongLLMLingua_Motivation.png\"></center>"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "0b39b33f-5860-4825-8f00-d60aed0dce86",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"To address this, we propose [**LongLLMLingua**](https://arxiv.org/abs/2310.06839), which specifically tackles the low information density problem in long context scenarios via prompt compression, making it particularly suitable for RAG tasks. The main ideas involve a two-stage compression process, as shown by the <font color='red'>**red line**</font>, which significantly improves the original curve:\n",
|
||||||
|
"\n",
|
||||||
|
"- Coarse-grained compression through document-level perplexity;\n",
|
||||||
|
"- Fine-grained compression of the remaining text using token perplexity;"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "c748f877-4bbf-443c-b72b-332be1df6f1a",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Instead of fighting against positional effects, we aim to utilize them to our advantage through document reordering, as illustrated by the <font color='green'>**green line**</font>. In this approach, the most critical passages are placed at the beginning and the end of the context. Furthermore, the entire process becomes more **cost-effective and faster** since it only requires handling **1/4** of the original context."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "18422597-687a-43aa-a6ed-ce6244d0eb55",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### PG's essay"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "51a7accd-5ec2-4ed2-9582-1afdb441a998",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Next, we will demonstrate the use of LongLLMLingua on the **PG's essay** dataset in LlamaIndex pipeline, which effectively alleviates the \"lost in the middle\" issue."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 2,
|
||||||
|
"id": "a970a901-11bd-43af-a8bc-7fb2fc6a1a07",
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [
|
||||||
|
{
|
||||||
|
"name": "stdout",
|
||||||
|
"output_type": "stream",
|
||||||
|
"text": [
|
||||||
|
"Defaulting to user installation because normal site-packages is not writeable\n",
|
||||||
|
"Requirement already satisfied: llmlingua in /home/hjiang/Code/github/LLMLingua (0.1.2)\n",
|
||||||
|
"Requirement already satisfied: llama-index in /home/hjiang/.local/lib/python3.9/site-packages (0.8.50)\n",
|
||||||
|
"Requirement already satisfied: nltk in /home/hjiang/.local/lib/python3.9/site-packages (from llmlingua) (3.8.1)\n",
|
||||||
|
"Requirement already satisfied: numpy in /home/hjiang/.local/lib/python3.9/site-packages (from llmlingua) (1.23.5)\n",
|
||||||
|
"Requirement already satisfied: tiktoken in /home/hjiang/.local/lib/python3.9/site-packages (from llmlingua) (0.4.0)\n",
|
||||||
|
"Requirement already satisfied: torch in /home/hjiang/.local/lib/python3.9/site-packages (from llmlingua) (1.13.1+cu116)\n",
|
||||||
|
"Requirement already satisfied: transformers>=4.26.0 in /home/hjiang/.local/lib/python3.9/site-packages (from llmlingua) (4.34.1)\n",
|
||||||
|
"Requirement already satisfied: SQLAlchemy[asyncio]>=1.4.49 in /home/hjiang/.local/lib/python3.9/site-packages (from llama-index) (2.0.22)\n",
|
||||||
|
"Requirement already satisfied: dataclasses-json<0.6.0,>=0.5.7 in /home/hjiang/.local/lib/python3.9/site-packages (from llama-index) (0.5.14)\n",
|
||||||
|
"Requirement already satisfied: deprecated>=1.2.9.3 in /home/hjiang/.local/lib/python3.9/site-packages (from llama-index) (1.2.14)\n",
|
||||||
|
"Requirement already satisfied: fsspec>=2023.5.0 in /home/hjiang/.local/lib/python3.9/site-packages (from llama-index) (2023.6.0)\n",
|
||||||
|
"Requirement already satisfied: langchain>=0.0.303 in /home/hjiang/.local/lib/python3.9/site-packages (from llama-index) (0.0.322)\n",
|
||||||
|
"Requirement already satisfied: nest-asyncio<2.0.0,>=1.5.8 in /home/hjiang/.local/lib/python3.9/site-packages (from llama-index) (1.5.8)\n",
|
||||||
|
"Requirement already satisfied: openai>=0.26.4 in /home/hjiang/.local/lib/python3.9/site-packages (from llama-index) (0.27.8)\n",
|
||||||
|
"Requirement already satisfied: pandas in /home/hjiang/.local/lib/python3.9/site-packages (from llama-index) (2.0.3)\n",
|
||||||
|
"Requirement already satisfied: tenacity<9.0.0,>=8.2.0 in /home/hjiang/.local/lib/python3.9/site-packages (from llama-index) (8.2.3)\n",
|
||||||
|
"Requirement already satisfied: typing-extensions>=4.5.0 in /home/hjiang/.local/lib/python3.9/site-packages (from llama-index) (4.7.1)\n",
|
||||||
|
"Requirement already satisfied: typing-inspect>=0.8.0 in /home/hjiang/.local/lib/python3.9/site-packages (from llama-index) (0.9.0)\n",
|
||||||
|
"Requirement already satisfied: urllib3<2 in /home/hjiang/.local/lib/python3.9/site-packages (from llama-index) (1.26.16)\n",
|
||||||
|
"Requirement already satisfied: marshmallow<4.0.0,>=3.18.0 in /home/hjiang/.local/lib/python3.9/site-packages (from dataclasses-json<0.6.0,>=0.5.7->llama-index) (3.20.1)\n",
|
||||||
|
"Requirement already satisfied: wrapt<2,>=1.10 in /home/hjiang/.local/lib/python3.9/site-packages (from deprecated>=1.2.9.3->llama-index) (1.15.0)\n",
|
||||||
|
"Requirement already satisfied: PyYAML>=5.3 in /usr/lib/python3/dist-packages (from langchain>=0.0.303->llama-index) (5.3.1)\n",
|
||||||
|
"Requirement already satisfied: aiohttp<4.0.0,>=3.8.3 in /home/hjiang/.local/lib/python3.9/site-packages (from langchain>=0.0.303->llama-index) (3.8.5)\n",
|
||||||
|
"Requirement already satisfied: anyio<4.0 in /home/hjiang/.local/lib/python3.9/site-packages (from langchain>=0.0.303->llama-index) (3.7.1)\n",
|
||||||
|
"Requirement already satisfied: async-timeout<5.0.0,>=4.0.0 in /home/hjiang/.local/lib/python3.9/site-packages (from langchain>=0.0.303->llama-index) (4.0.2)\n",
|
||||||
|
"Requirement already satisfied: jsonpatch<2.0,>=1.33 in /home/hjiang/.local/lib/python3.9/site-packages (from langchain>=0.0.303->llama-index) (1.33)\n",
|
||||||
|
"Requirement already satisfied: langsmith<0.1.0,>=0.0.43 in /home/hjiang/.local/lib/python3.9/site-packages (from langchain>=0.0.303->llama-index) (0.0.51)\n",
|
||||||
|
"Requirement already satisfied: pydantic<3,>=1 in /home/hjiang/.local/lib/python3.9/site-packages (from langchain>=0.0.303->llama-index) (1.10.12)\n",
|
||||||
|
"Requirement already satisfied: requests<3,>=2 in /home/hjiang/.local/lib/python3.9/site-packages (from langchain>=0.0.303->llama-index) (2.29.0)\n",
|
||||||
|
"Requirement already satisfied: click in /home/hjiang/.local/lib/python3.9/site-packages (from nltk->llmlingua) (8.1.6)\n",
|
||||||
|
"Requirement already satisfied: joblib in /home/hjiang/.local/lib/python3.9/site-packages (from nltk->llmlingua) (1.3.1)\n",
|
||||||
|
"Requirement already satisfied: regex>=2021.8.3 in /home/hjiang/.local/lib/python3.9/site-packages (from nltk->llmlingua) (2023.6.3)\n",
|
||||||
|
"Requirement already satisfied: tqdm in /home/hjiang/.local/lib/python3.9/site-packages (from nltk->llmlingua) (4.65.0)\n",
|
||||||
|
"Requirement already satisfied: greenlet!=0.4.17 in /home/hjiang/.local/lib/python3.9/site-packages (from SQLAlchemy[asyncio]>=1.4.49->llama-index) (3.0.0)\n",
|
||||||
|
"Requirement already satisfied: filelock in /home/hjiang/.local/lib/python3.9/site-packages (from transformers>=4.26.0->llmlingua) (3.12.2)\n",
|
||||||
|
"Requirement already satisfied: huggingface-hub<1.0,>=0.16.4 in /home/hjiang/.local/lib/python3.9/site-packages (from transformers>=4.26.0->llmlingua) (0.16.4)\n",
|
||||||
|
"Requirement already satisfied: packaging>=20.0 in /home/hjiang/.local/lib/python3.9/site-packages (from transformers>=4.26.0->llmlingua) (23.0)\n",
|
||||||
|
"Requirement already satisfied: tokenizers<0.15,>=0.14 in /home/hjiang/.local/lib/python3.9/site-packages (from transformers>=4.26.0->llmlingua) (0.14.1)\n",
|
||||||
|
"Requirement already satisfied: safetensors>=0.3.1 in /home/hjiang/.local/lib/python3.9/site-packages (from transformers>=4.26.0->llmlingua) (0.3.1)\n",
|
||||||
|
"Requirement already satisfied: mypy-extensions>=0.3.0 in /home/hjiang/.local/lib/python3.9/site-packages (from typing-inspect>=0.8.0->llama-index) (1.0.0)\n",
|
||||||
|
"Requirement already satisfied: python-dateutil>=2.8.2 in /home/hjiang/.local/lib/python3.9/site-packages (from pandas->llama-index) (2.8.2)\n",
|
||||||
|
"Requirement already satisfied: pytz>=2020.1 in /home/hjiang/.local/lib/python3.9/site-packages (from pandas->llama-index) (2023.3)\n",
|
||||||
|
"Requirement already satisfied: tzdata>=2022.1 in /home/hjiang/.local/lib/python3.9/site-packages (from pandas->llama-index) (2023.3)\n",
|
||||||
|
"Requirement already satisfied: attrs>=17.3.0 in /home/hjiang/.local/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain>=0.0.303->llama-index) (23.1.0)\n",
|
||||||
|
"Requirement already satisfied: charset-normalizer<4.0,>=2.0 in /home/hjiang/.local/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain>=0.0.303->llama-index) (3.2.0)\n",
|
||||||
|
"Requirement already satisfied: multidict<7.0,>=4.5 in /home/hjiang/.local/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain>=0.0.303->llama-index) (6.0.4)\n",
|
||||||
|
"Requirement already satisfied: yarl<2.0,>=1.0 in /home/hjiang/.local/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain>=0.0.303->llama-index) (1.9.2)\n",
|
||||||
|
"Requirement already satisfied: frozenlist>=1.1.1 in /home/hjiang/.local/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain>=0.0.303->llama-index) (1.4.0)\n",
|
||||||
|
"Requirement already satisfied: aiosignal>=1.1.2 in /home/hjiang/.local/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain>=0.0.303->llama-index) (1.3.1)\n",
|
||||||
|
"Requirement already satisfied: idna>=2.8 in /usr/lib/python3/dist-packages (from anyio<4.0->langchain>=0.0.303->llama-index) (2.8)\n",
|
||||||
|
"Requirement already satisfied: sniffio>=1.1 in /home/hjiang/.local/lib/python3.9/site-packages (from anyio<4.0->langchain>=0.0.303->llama-index) (1.3.0)\n",
|
||||||
|
"Requirement already satisfied: exceptiongroup in /home/hjiang/.local/lib/python3.9/site-packages (from anyio<4.0->langchain>=0.0.303->llama-index) (1.1.2)\n",
|
||||||
|
"Requirement already satisfied: jsonpointer>=1.9 in /usr/lib/python3/dist-packages (from jsonpatch<2.0,>=1.33->langchain>=0.0.303->llama-index) (2.0)\n",
|
||||||
|
"Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil>=2.8.2->pandas->llama-index) (1.14.0)\n",
|
||||||
|
"Requirement already satisfied: certifi>=2017.4.17 in /usr/lib/python3/dist-packages (from requests<3,>=2->langchain>=0.0.303->llama-index) (2019.11.28)\n",
|
||||||
|
"\n",
|
||||||
|
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.3.1\u001b[0m\n",
|
||||||
|
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3.9 -m pip install --upgrade pip\u001b[0m\n"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"source": [
|
||||||
|
"# Install dependency.\n",
|
||||||
|
"!pip install llmlingua llama-index"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 7,
|
||||||
|
"id": "cbbbf3de-a9d6-46cf-afab-dcb72a6154ec",
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Using the OAI\n",
|
||||||
|
"import openai\n",
|
||||||
|
"openai.api_key = \"<insert_openai_key>\""
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 3,
|
||||||
|
"id": "46506810-8565-43da-984b-d862c56b49c2",
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# or Using the AOAI\n",
|
||||||
|
"import openai\n",
|
||||||
|
"openai.api_key = \"<insert_openai_key>\"\n",
|
||||||
|
"openai.api_base = \"https://xxxx.openai.azure.com/\"\n",
|
||||||
|
"openai.api_type = 'azure'\n",
|
||||||
|
"openai.api_version = '2023-05-15'"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "f8676ffa-5117-44dc-9742-bb9ab1d56e0c",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Setup Data"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "bb349566-83d8-44ac-a683-b67ed9ddf7a6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--2023-10-31 15:16:22-- https://www.dropbox.com/s/f6bmb19xdg0xedm/paul_graham_essay.txt?dl=1\n",
"Resolving www.dropbox.com (www.dropbox.com)... 162.125.2.18, 2620:100:6017:18::a27d:212\n",
"Connecting to www.dropbox.com (www.dropbox.com)|162.125.2.18|:443... connected.\n",
"HTTP request sent, awaiting response... 302 Found\n",
"Location: /s/dl/f6bmb19xdg0xedm/paul_graham_essay.txt [following]\n",
"--2023-10-31 15:16:22-- https://www.dropbox.com/s/dl/f6bmb19xdg0xedm/paul_graham_essay.txt\n",
"Reusing existing connection to www.dropbox.com:443.\n",
"HTTP request sent, awaiting response... 302 Found\n",
"Location: https://uc79cc99922e921397f441d519f7.dl.dropboxusercontent.com/cd/0/get/CGo-ddVpLM8dpEbGPhaDcZnqlmurexkVdlYv9jcpsjMI9xyxqtt-feE8m6nlMFwBfbWAp9oEfbf0YZC65uNupypod6w4ANXltrG3NpGWErO9j18UQuwqd2wr79FcGtg55HxuwN_2xElcqEPjH3zg8RZl/file?dl=1# [following]\n",
"--2023-10-31 15:16:22-- https://uc79cc99922e921397f441d519f7.dl.dropboxusercontent.com/cd/0/get/CGo-ddVpLM8dpEbGPhaDcZnqlmurexkVdlYv9jcpsjMI9xyxqtt-feE8m6nlMFwBfbWAp9oEfbf0YZC65uNupypod6w4ANXltrG3NpGWErO9j18UQuwqd2wr79FcGtg55HxuwN_2xElcqEPjH3zg8RZl/file?dl=1\n",
"Resolving uc79cc99922e921397f441d519f7.dl.dropboxusercontent.com (uc79cc99922e921397f441d519f7.dl.dropboxusercontent.com)... 162.125.2.15, 2620:100:6017:15::a27d:20f\n",
"Connecting to uc79cc99922e921397f441d519f7.dl.dropboxusercontent.com (uc79cc99922e921397f441d519f7.dl.dropboxusercontent.com)|162.125.2.15|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 75047 (73K) [application/binary]\n",
"Saving to: ‘paul_graham_essay.txt’\n",
"\n",
"paul_graham_essay.t 100%[===================>] 73.29K --.-KB/s in 0.03s \n",
"\n",
"2023-10-31 15:16:23 (2.15 MB/s) - ‘paul_graham_essay.txt’ saved [75047/75047]\n",
"\n"
]
}
],
"source": [
"!wget \"https://www.dropbox.com/s/f6bmb19xdg0xedm/paul_graham_essay.txt?dl=1\" -O paul_graham_essay.txt"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "cc17bbc5-86cb-4d15-a730-955af85a10b2",
"metadata": {},
"outputs": [],
"source": [
"from llama_index import (\n",
" VectorStoreIndex,\n",
" SimpleDirectoryReader,\n",
" load_index_from_storage,\n",
" StorageContext,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "2a4f0fa1-fd32-468c-aa9d-4bee21d9dd89",
"metadata": {},
"outputs": [],
"source": [
"# load documents\n",
"documents = SimpleDirectoryReader(\n",
" input_files=[\"paul_graham_essay.txt\"]\n",
").load_data()"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "01c16eb9-b9d1-4357-9647-e587633fbcdd",
"metadata": {},
"outputs": [],
"source": [
"index = VectorStoreIndex.from_documents(documents)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "71fad540-eac7-425c-9ca0-7886d0b9a1cc",
"metadata": {},
"outputs": [],
"source": [
"retriever = index.as_retriever(similarity_top_k=10)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "672cccf6-dc14-40a6-a057-cc6f2a3aeea0",
"metadata": {},
"outputs": [],
"source": [
"# question = \"What did the author do growing up?\"\n",
"# question = \"What did the author do during his time in YC?\"\n",
"question = \"Where did the author go for art school?\""
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "ef94e951-7576-45d7-bf75-8f28e70598fd",
"metadata": {},
"outputs": [],
"source": [
"# Ground-truth Answer\n",
"answer = \"RISD\""
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "5aa58abe-c2f8-4de0-b3af-c852f9ef9bdb",
"metadata": {},
"outputs": [],
"source": [
"contexts = retriever.retrieve(question)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "3973a921-0f52-4e77-a123-b0d06776cd4c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"10"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"context_list = [n.get_content() for n in contexts]\n",
"len(context_list)"
]
},
{
"cell_type": "markdown",
"id": "ba1c6d52-dc87-434c-a41c-0bbc8a286504",
"metadata": {},
"source": [
"### The response of Original prompt"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "3d441f10-c5c7-4d45-b09a-717e536b36bf",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The author went to the Rhode Island School of Design (RISD) for art school.\n"
]
}
],
"source": [
"# The response from original prompt\n",
"from llama_index.llms import OpenAI\n",
"\n",
"llm = OpenAI(model=\"gpt-3.5-turbo-16k\")\n",
"prompt = \"\\n\\n\".join(context_list + [question])\n",
"\n",
"response = llm.complete(prompt)\n",
"print(str(response))"
]
},
{
"cell_type": "markdown",
"id": "9aa90492-8ad1-4a89-85c5-26b8472f1ff0",
"metadata": {},
"source": [
"### The response of Compressed Prompt"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "fa638dec-c9ec-4dce-9dac-d768145de714",
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "7b37c874e2d34f2cbbd88f3556e42c80",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/hjiang/.local/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:362: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.\n",
" warnings.warn(\n",
"/home/hjiang/.local/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:367: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.\n",
" warnings.warn(\n"
]
}
],
"source": [
"# Setup LLMLingua\n",
"from llama_index.query_engine import RetrieverQueryEngine\n",
"from llama_index.response_synthesizers import CompactAndRefine\n",
"from llama_index.indices.postprocessor import LongLLMLinguaPostprocessor\n",
"\n",
"node_postprocessor = LongLLMLinguaPostprocessor(\n",
" instruction_str=\"Given the context, please answer the final question\",\n",
" target_token=300,\n",
" rank_method=\"longllmlingua\",\n",
" additional_compress_kwargs={\n",
" \"condition_compare\": True,\n",
" \"condition_in_question\": \"after\",\n",
" \"context_budget\": \"+100\",\n",
" \"reorder_context\": \"sort\", # enable document reorder,\n",
" \"dynamic_context_compression_ratio\": 0.3,\n",
" },\n",
")"
]
},
{
"cell_type": "markdown",
"id": "3438c76e-5bf9-4db6-97a7-69f5d9be0707",
"metadata": {},
"source": [
"We show you how to compose a `retriever` + `prompt compressor` + `query engine` into the **RAG** pipeline."
]
},
{
"cell_type": "markdown",
"id": "0e0df62f-be5f-43f5-9d53-0d31cfcc5c81",
"metadata": {},
"source": [
"#### Method One: Call Step-by-Step"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "be610922-c84d-4ed7-91a3-52aff193bc56",
"metadata": {},
"outputs": [],
"source": [
"retrieved_nodes = retriever.retrieve(question)\n",
"synthesizer = CompactAndRefine()"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "f2239bab-0c64-4798-a435-f98f9e09107d",
"metadata": {},
"outputs": [],
"source": [
"from llama_index.indices.query.schema import QueryBundle\n",
"\n",
"# outline steps in RetrieverQueryEngine for clarity:\n",
"# postprocess (compress), synthesize\n",
"new_retrieved_nodes = node_postprocessor.postprocess_nodes(\n",
" retrieved_nodes, query_bundle=QueryBundle(query_str=question)\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "3ac375b9-ee42-4b94-a9af-ce37bf62e0ec",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"next Rtm's advice hadn' included anything that. I wanted to do something completely different, so I decided I'd paint. I wanted to how good I could get if I focused on it. the day after stopped on YC, I painting. I was rusty and it took a while to get back into shape, but it was at least completely engaging.1]\n",
"\n",
"I wanted to back RISD, was now broke and RISD was very expensive so decided job for a year and return RISD the fall. I got one at Interleaf, which made software for creating documents. You like Microsoft Word? Exactly That was I low end software tends to high. Interleaf still had a few years to live yet. []\n",
"\n",
" the Accademia wasn't, and my money was running out, end year back to the\n",
" lot the color class I tookD, but otherwise I was basically myself to do that for in993 I dropped I aroundidence bit then my friend Par did me a big A rent-partment building New York. Did I want it Itt more my place, and York be where the artists. wanted [For when you that ofs you big painting of this type hanging in the apartment of a hedge fund manager, you know he paid millions of dollars for it. That's not always why artists have a signature style, but it's usually why buyers pay a lot for such work. [6]\n",
"\n",
"Original Tokens: 10719\n",
"Compressed Tokens: 308\n",
"Comressed Ratio: 34.80x\n"
]
}
],
"source": [
"original_contexts = \"\\n\\n\".join([n.get_content() for n in retrieved_nodes])\n",
"compressed_contexts = \"\\n\\n\".join([n.get_content() for n in new_retrieved_nodes])\n",
"\n",
"original_tokens = node_postprocessor._llm_lingua.get_token_length(original_contexts)\n",
"compressed_tokens = node_postprocessor._llm_lingua.get_token_length(compressed_contexts)\n",
"\n",
"print(compressed_contexts)\n",
"print()\n",
"print(\"Original Tokens:\", original_tokens)\n",
"print(\"Compressed Tokens:\", compressed_tokens)\n",
"print(\"Comressed Ratio:\", f\"{original_tokens/(compressed_tokens + 1e-5):.2f}x\")"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "c72e8559-12a7-4f9d-b3ec-b21f0241aff5",
"metadata": {},
"outputs": [],
"source": [
"response = synthesizer.synthesize(question, new_retrieved_nodes)"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "5c26eb3e-bcc9-4d1c-9e9e-1e511b22831f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The author went to RISD for art school.\n"
]
}
],
"source": [
"print(str(response))"
]
},
{
"cell_type": "markdown",
"id": "a1eb3a56-5ba4-4a98-a4b7-47e6b3fb0027",
"metadata": {},
"source": [
"#### Method Two: End-to-End Call"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "80042d53-97f3-4719-b95b-38c47b24f075",
"metadata": {},
"outputs": [],
"source": [
"retriever_query_engine = RetrieverQueryEngine.from_args(\n",
" retriever, node_postprocessors=[node_postprocessor]\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "6eb1a345-6f07-48b7-aab7-71c0da772839",
"metadata": {},
"outputs": [],
"source": [
"response = retriever_query_engine.query(question)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "0737ece9-0239-4e3e-adf6-d39cafc85a05",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The author went to RISD for art school.\n"
]
}
],
"source": [
"print(str(response))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.18"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
BIN images/LLMLingua.png (new file, 690 KiB, binary file not shown)
BIN images/LongLLMLingua.png (new file, 541 KiB, binary file not shown)
BIN images/motivation.png (new file, 243 KiB, binary file not shown)