added contextual_compression + adaptive_retrieval (the latter still has some bugs)

This commit is contained in:
nird
2024-07-23 18:35:22 +03:00
parent 9d7ba88755
commit 232021af35
4 changed files with 630 additions and 0 deletions

View File: adaptive_retrieval.ipynb

@@ -0,0 +1,462 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Adaptive Retrieval-Augmented Generation (RAG) System\n",
"\n",
"## Overview\n",
"\n",
"This system implements an advanced Retrieval-Augmented Generation (RAG) approach that adapts its retrieval strategy based on the type of query. By leveraging Language Models (LLMs) at various stages, it aims to provide more accurate, relevant, and context-aware responses to user queries.\n",
"\n",
"## Motivation\n",
"\n",
"Traditional RAG systems often use a one-size-fits-all approach to retrieval, which can be suboptimal for different types of queries. Our adaptive system is motivated by the understanding that different types of questions require different retrieval strategies. For example, a factual query might benefit from precise, focused retrieval, while an analytical query might require a broader, more diverse set of information.\n",
"\n",
"## Key Components\n",
"\n",
"1. **Query Classifier**: Determines the type of query (Factual, Analytical, Opinion, or Contextual).\n",
"\n",
"2. **Adaptive Retrieval Strategies**: Four distinct strategies tailored to different query types:\n",
" - Factual Strategy\n",
" - Analytical Strategy\n",
" - Opinion Strategy\n",
" - Contextual Strategy\n",
"\n",
"3. **LLM Integration**: LLMs are used throughout the process to enhance retrieval and ranking.\n",
"\n",
"4. **OpenAI GPT Model**: Generates the final response using the retrieved documents as context.\n",
"\n",
"## Method Details\n",
"\n",
"### 1. Query Classification\n",
"\n",
"The system begins by classifying the user's query into one of four categories:\n",
"- Factual: Queries seeking specific, verifiable information.\n",
"- Analytical: Queries requiring comprehensive analysis or explanation.\n",
"- Opinion: Queries about subjective matters or seeking diverse viewpoints.\n",
"- Contextual: Queries that depend on user-specific context.\n",
"\n",
"### 2. Adaptive Retrieval Strategies\n",
"\n",
"Each query type triggers a specific retrieval strategy:\n",
"\n",
"#### Factual Strategy\n",
"- Enhances the original query using an LLM for better precision.\n",
"- Retrieves documents based on the enhanced query.\n",
"- Uses an LLM to rank documents by relevance.\n",
"\n",
"#### Analytical Strategy\n",
"- Generates multiple sub-queries using an LLM to cover different aspects of the main query.\n",
"- Retrieves documents for each sub-query.\n",
"- Ensures diversity in the final document selection using an LLM.\n",
"\n",
"#### Opinion Strategy\n",
"- Identifies different viewpoints on the topic using an LLM.\n",
"- Retrieves documents representing each viewpoint.\n",
"- Uses an LLM to select a diverse range of opinions from the retrieved documents.\n",
"\n",
"#### Contextual Strategy\n",
"- Incorporates user-specific context into the query using an LLM.\n",
"- Performs retrieval based on the contextualized query.\n",
"- Ranks documents considering both relevance and user context.\n",
"\n",
"### 3. LLM-Enhanced Ranking\n",
"\n",
"After retrieval, each strategy uses an LLM to perform a final ranking of the documents. This step ensures that the most relevant and appropriate documents are selected for the next stage.\n",
"\n",
"### 4. Response Generation\n",
"\n",
"The final set of retrieved documents is passed to an OpenAI GPT model, which generates a response based on the query and the provided context.\n",
"\n",
"## Benefits of This Approach\n",
"\n",
"1. **Improved Accuracy**: By tailoring the retrieval strategy to the query type, the system can provide more accurate and relevant information.\n",
"\n",
"2. **Flexibility**: The system adapts to different types of queries, handling a wide range of user needs.\n",
"\n",
"3. **Context-Awareness**: Especially for contextual queries, the system can incorporate user-specific information for more personalized responses.\n",
"\n",
"4. **Diverse Perspectives**: For opinion-based queries, the system actively seeks out and presents multiple viewpoints.\n",
"\n",
"5. **Comprehensive Analysis**: The analytical strategy ensures a thorough exploration of complex topics.\n",
"\n",
"## Conclusion\n",
"\n",
"This adaptive RAG system represents a significant advancement over traditional RAG approaches. By dynamically adjusting its retrieval strategy and leveraging LLMs throughout the process, it aims to provide more accurate, relevant, and nuanced responses to a wide variety of user queries."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div style=\"text-align: center;\">\n",
"\n",
"<img src=\"../images/adaptive_retrieval.svg\" alt=\"adaptive retrieval\" style=\"width:100%; height:auto;\">\n",
"</div>"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"c:\\Users\\N7\\PycharmProjects\\llm_tasks\\RAG_TECHNIQUES\\.venv\\Lib\\site-packages\\deepeval\\__init__.py:45: UserWarning: You are using deepeval version 0.21.70, however version 0.21.71 is available. You should consider upgrading via the \"pip install --upgrade deepeval\" command.\n",
" warnings.warn(\n"
]
}
],
"source": [
"import os\n",
"import sys\n",
"from dotenv import load_dotenv\n",
"from langchain.prompts import PromptTemplate\n",
"from langchain.vectorstores import FAISS\n",
"from langchain.embeddings import OpenAIEmbeddings\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.prompts import PromptTemplate\n",
"\n",
"from langchain_core.retrievers import BaseRetriever\n",
"from typing import Dict, Any\n",
"from langchain.docstore.document import Document\n",
"from langchain_community.chat_models import ChatOpenAI\n",
"from langchain_core.pydantic_v1 import BaseModel, Field\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"sys.path.append(os.path.abspath(os.path.join(os.getcwd(), '..'))) # Add the parent directory to the path sicnce we work with notebooks\n",
"from helper_functions import *\n",
"from evaluation.evalute_rag import *\n",
"\n",
"# Load environment variables from a .env file\n",
"load_dotenv()\n",
"\n",
"# Set the OpenAI API key environment variable\n",
"os.environ[\"OPENAI_API_KEY\"] = os.getenv('OPENAI_API_KEY')"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"class categories_options(BaseModel):\n",
" category: str = Field(description=\"The category of the query, the options are: Factual, Analytical, Opinion, or Contextual\", example=\"Factual\")\n",
"\n",
"\n",
"class QueryClassifier:\n",
" def __init__(self):\n",
" self.llm = ChatOpenAI(temperature=0, model_name=\"gpt-4\", max_tokens=4000)\n",
" self.prompt = PromptTemplate(\n",
" input_variables=[\"query\"],\n",
" template=\"Classify the following query into one of these categories: Factual, Analytical, Opinion, or Contextual.\\nQuery: {query}\\nCategory:\"\n",
" )\n",
" self.chain = self.prompt | self.llm.with_structured_output(categories_options)\n",
"\n",
"\n",
" def classify(self, query):\n",
" return self.chain.invoke(query).category"
]
},
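{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick sanity check of the classifier. This is an illustrative sketch added for this writeup: the queries and the expected labels in the comments are our own examples, and running it requires a valid `OPENAI_API_KEY`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative usage sketch (hypothetical example queries; makes real API calls)\n",
"classifier = QueryClassifier()\n",
"print(classifier.classify(\"What is the boiling point of water at sea level?\"))  # expected: Factual\n",
"print(classifier.classify(\"How has climate policy evolved since the Paris Agreement?\"))  # expected: Analytical\n",
"print(classifier.classify(\"Is remote work better than office work?\"))  # expected: Opinion"
]
},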
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"class BaseRetrievalStrategy:\n",
" def __init__(self, texts):\n",
" self.embeddings = OpenAIEmbeddings()\n",
" text_splitter = CharacterTextSplitter(chunk_size=800, chunk_overlap=0)\n",
" self.documents = text_splitter.create_documents(texts)\n",
" self.db = FAISS.from_documents(self.documents, self.embeddings)\n",
" self.llm = ChatOpenAI(temperature=0, model_name=\"gpt-4o\", max_tokens=4000)\n",
"\n",
"\n",
" def retrieve(self, query, k=4):\n",
" return self.db.similarity_search(query, k=k)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"class relevant_score(BaseModel):\n",
" score: float = Field(description=\"The relevance score of the document to the query\", example=8.0)\n",
"\n",
"class FactualRetrievalStrategy(BaseRetrievalStrategy):\n",
" def retrieve(self, query, k=4):\n",
" # Use LLM to enhance the query\n",
" enhanced_query_prompt = PromptTemplate(\n",
" input_variables=[\"query\"],\n",
" template=\"Enhance this factual query for better information retrieval: {query}\"\n",
" )\n",
" query_chain = enhanced_query_prompt | self.llm\n",
" enhanced_query = query_chain.invoke(query).content\n",
"\n",
" # Retrieve documents using the enhanced query\n",
" docs = self.db.similarity_search(enhanced_query, k=k*2)\n",
"\n",
" # Use LLM to rank the relevance of retrieved documents\n",
" ranking_prompt = PromptTemplate(\n",
" input_variables=[\"query\", \"doc\"],\n",
" template=\"On a scale of 1-10, how relevant is this document to the query: '{query}'?\\nDocument: {doc}\\nRelevance score:\"\n",
" )\n",
" ranking_chain = ranking_prompt | self.llm.with_structured_output(relevant_score)\n",
"\n",
" ranked_docs = []\n",
" for doc in docs:\n",
" input_data = {\"query\": enhanced_query, \"doc\": doc.page_content}\n",
" score = float(ranking_chain.invoke(input_data).score)\n",
" ranked_docs.append((doc, score))\n",
"\n",
" # Sort by relevance score and return top k\n",
" ranked_docs.sort(key=lambda x: x[1], reverse=True)\n",
" return [doc for doc, _ in ranked_docs[:k]]"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"from pydantic import BaseModel, Field\n",
"from typing import List\n",
"from langchain.prompts import PromptTemplate\n",
"\n",
"class SelectedIndices(BaseModel):\n",
" indices: List[int] = Field(description=\"Indices of selected documents\", example=[0, 1, 2, 3])\n",
"\n",
"class AnalyticalRetrievalStrategy(BaseRetrievalStrategy):\n",
" def retrieve(self, query, k=3):\n",
" # Use LLM to generate sub-queries for comprehensive analysis\n",
" sub_queries_prompt = PromptTemplate(\n",
" input_variables=[\"query\", \"k\"],\n",
" template=\"Generate {k} different aspects or sub-questions to comprehensively analyze: {query}\"\n",
" )\n",
" sub_queries_chain = sub_queries_prompt | self.llm\n",
" input_data = {\"query\": query, \"k\": k}\n",
" sub_queries_result = sub_queries_chain.invoke(input_data).content\n",
" sub_queries = sub_queries_result.split('\\n')\n",
"\n",
" all_docs = []\n",
" for sub_query in sub_queries:\n",
" all_docs.extend(self.db.similarity_search(sub_query, k=2))\n",
"\n",
" # Use LLM to ensure diversity and relevance\n",
" diversity_prompt = PromptTemplate(\n",
" input_variables=[\"query\", \"docs\", \"k\"],\n",
" template=\"Select the most diverse and relevant set of {k} documents for the query: '{query}'\\nDocuments: {docs}\\nReturn only the indices of selected documents as a list of integers.\"\n",
" )\n",
" diversity_chain = diversity_prompt | self.llm.with_structured_output(SelectedIndices)\n",
" docs_text = \"\\n\".join([f\"{i}: {doc.page_content[:50]}...\" for i, doc in enumerate(all_docs)])\n",
" input_data = {\"query\": query, \"docs\": docs_text, \"k\": k}\n",
" selected_indices_result = diversity_chain.invoke(input_data).indices\n",
" \n",
" return [all_docs[i] for i in selected_indices_result.indices if i < len(all_docs)]"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"class OpinionRetrievalStrategy(BaseRetrievalStrategy):\n",
" def retrieve(self, query, k=3):\n",
" # Use LLM to identify potential viewpoints\n",
" viewpoints_prompt = PromptTemplate(\n",
" input_variables=[\"query\", \"k\"],\n",
" template=\"Identify {k} distinct viewpoints or perspectives on the topic: {query}\"\n",
" )\n",
" viewpoints_chain = viewpoints_prompt | self.llm\n",
" input_data = {\"query\": query, \"k\": k}\n",
" viewpoints = viewpoints_chain.invoke(input_data).content.split('\\n')\n",
"\n",
" all_docs = []\n",
" for viewpoint in viewpoints:\n",
" all_docs.extend(self.db.similarity_search(f\"{query} {viewpoint}\", k=2))\n",
"\n",
" # Use LLM to classify and select diverse opinions\n",
" opinion_prompt = PromptTemplate(\n",
" input_variables=[\"query\", \"docs\", \"k\"],\n",
" template=\"Classify these documents into distinct opinions on '{query}' and select the {k} most representative and diverse viewpoints:\\nDocuments: {docs}\\nSelected indices:\"\n",
" )\n",
" opinion_chain = opinion_prompt | self.llm.with_structured_output(SelectedIndices)\n",
" \n",
" docs_text = \"\\n\".join([f\"{i}: {doc.page_content[:100]}...\" for i, doc in enumerate(all_docs)])\n",
" input_data = {\"query\": query, \"docs\": docs_text, \"k\": k}\n",
" selected_indices = opinion_chain.invoke(input_data).indices\n",
" \n",
" return [all_docs[int(i)] for i in selected_indices.split() if i.isdigit() and int(i) < len(all_docs)]"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"class ContextualRetrievalStrategy(BaseRetrievalStrategy):\n",
" def retrieve(self, query, k=4, user_context=None):\n",
" # Use LLM to incorporate user context into the query\n",
" context_prompt = PromptTemplate(\n",
" input_variables=[\"query\", \"context\"],\n",
" template=\"Given the user context: {context}\\nReformulate the query to best address the user's needs: {query}\"\n",
" )\n",
" context_chain = context_prompt | self.llm\n",
" input_data = {\"query\": query, \"context\": user_context or \"No specific context provided\"}\n",
" contextualized_query = context_chain.invoke(input_data).content\n",
"\n",
" # Retrieve documents using the contextualized query\n",
" docs = self.db.similarity_search(contextualized_query, k=k*2)\n",
"\n",
" # Use LLM to rank the relevance of retrieved documents considering the user context\n",
" ranking_prompt = PromptTemplate(\n",
" input_variables=[\"query\", \"context\", \"doc\"],\n",
" template=\"Given the query: '{query}' and user context: '{context}', rate the relevance of this document on a scale of 1-10:\\nDocument: {doc}\\nRelevance score:\"\n",
" )\n",
" ranking_chain = ranking_prompt | self.llm.with_structured_output(relevant_score)\n",
"\n",
" ranked_docs = []\n",
" for doc in docs:\n",
" input_data = {\"query\": contextualized_query, \"context\": user_context or \"No specific context provided\", \"doc\": doc.page_content}\n",
" score = float(ranking_chain.invoke(input_data).score)\n",
" ranked_docs.append((doc, score))\n",
"\n",
" # Sort by relevance score and return top k\n",
" ranked_docs.sort(key=lambda x: x[1], reverse=True)\n",
" return [doc for doc, _ in ranked_docs[:k]]"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"class AdaptiveRetriever:\n",
" def __init__(self, texts: List[str]):\n",
" self.classifier = QueryClassifier()\n",
" self.strategies = {\n",
" \"Factual\": FactualRetrievalStrategy(texts),\n",
" \"Analytical\": AnalyticalRetrievalStrategy(texts),\n",
" \"Opinion\": OpinionRetrievalStrategy(texts),\n",
" \"Contextual\": ContextualRetrievalStrategy(texts)\n",
" }\n",
"\n",
" def get_relevant_documents(self, query: str) -> List[Document]:\n",
" category = self.classifier.classify(query)\n",
" strategy = self.strategies[category]\n",
" return strategy.retrieve(query)"
]
},
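{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see the routing end to end, here is a minimal sketch (our own illustrative example, not part of the original pipeline below): it builds the four strategy indices over two toy sentences, classifies the query, and retrieves with the matching strategy. Note that instantiating `AdaptiveRetriever` creates four FAISS stores and makes several OpenAI calls, so it assumes an API key and incurs token costs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch: route one query through the adaptive retriever\n",
"sample_texts = [\n",
"    \"Paris is the capital of France.\",\n",
"    \"The Eiffel Tower was completed in 1889 and is 330 meters tall.\"\n",
"]\n",
"adaptive = AdaptiveRetriever(sample_texts)\n",
"docs = adaptive.get_relevant_documents(\"What is the capital of France?\")  # should classify as Factual\n",
"for doc in docs:\n",
"    print(doc.page_content)"
]
},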
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"class PydanticAdaptiveRetriever(BaseRetriever):\n",
" adaptive_retriever: AdaptiveRetriever = Field(exclude=True)\n",
"\n",
" class Config:\n",
" arbitrary_types_allowed = True\n",
"\n",
" def get_relevant_documents(self, query: str) -> List[Document]:\n",
" return self.adaptive_retriever.get_relevant_documents(query)\n",
"\n",
" async def aget_relevant_documents(self, query: str) -> List[Document]:\n",
" return self.get_relevant_documents(query)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"class AdaptiveRAG:\n",
" def __init__(self, texts: List[str]):\n",
" adaptive_retriever = AdaptiveRetriever(texts)\n",
" self.retriever = PydanticAdaptiveRetriever(adaptive_retriever=adaptive_retriever)\n",
" self.llm = ChatOpenAI(temperature=0, model_name=\"gpt-4o\", max_tokens=4000)\n",
" \n",
" # Create a custom prompt\n",
" prompt_template = \"\"\"Use the following pieces of context to answer the question at the end. \n",
" If you don't know the answer, just say that you don't know, don't try to make up an answer.\n",
"\n",
" {context}\n",
"\n",
" Question: {question}\n",
" Answer:\"\"\"\n",
" prompt = PromptTemplate(template=prompt_template, input_variables=[\"context\", \"question\"])\n",
" \n",
" # Create the LLM chain\n",
" self.llm_chain = prompt | self.llm\n",
" \n",
" \n",
"\n",
" def answer(self, query: str) -> str:\n",
" docs = self.retriever.get_relevant_documents(query)\n",
" input_data = {\"context\": \"\\n\".join([doc.page_content for doc in docs]), \"question\": query}\n",
" return self.llm_chain.invoke(input_data)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Usage\n",
"texts = [\n",
" \"The Earth is the third planet from the Sun and the only astronomical object known to harbor life.\"\n",
" ]\n",
"rag_system = AdaptiveRAG(texts)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"answer = rag_system.answer(\"What are the effects of global warming?\")\n",
"print(answer)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File: contextual_compression.ipynb

@@ -0,0 +1,162 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div style=\"text-align: center;\">\n",
"\n",
"<img src=\"../images/contextual_compression.svg\" alt=\"contextual compression\" style=\"width:70%; height:auto;\">\n",
"</div>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Import libraries"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import sys\n",
"from dotenv import load_dotenv\n",
"from langchain.retrievers.document_compressors import LLMChainExtractor\n",
"from langchain.retrievers import ContextualCompressionRetriever\n",
"from langchain.chains import RetrievalQA\n",
"\n",
"\n",
"sys.path.append(os.path.abspath(os.path.join(os.getcwd(), '..'))) # Add the parent directory to the path sicnce we work with notebooks\n",
"from helper_functions import *\n",
"from evaluation.evalute_rag import *\n",
"\n",
"# Load environment variables from a .env file\n",
"load_dotenv()\n",
"\n",
"# Set the OpenAI API key environment variable\n",
"os.environ[\"OPENAI_API_KEY\"] = os.getenv('OPENAI_API_KEY')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Define document's path"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"path = \"../data/Understanding_Climate_Change.pdf\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a vector store"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"vector_store = encode_pdf(path)"
]
},
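{
"cell_type": "markdown",
"metadata": {},
"source": [
"`encode_pdf` comes from the repo's `helper_functions` module. For readers without the repo, a minimal stand-in might look like the sketch below; this is an assumption about the helper's behavior (load the PDF, split it into chunks, index the chunks in FAISS with OpenAI embeddings), and the real helper may use a different loader or chunking parameters."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical stand-in for helper_functions.encode_pdf (sketch only; the real helper may differ)\n",
"from langchain.document_loaders import PyPDFLoader\n",
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
"from langchain.vectorstores import FAISS\n",
"from langchain.embeddings import OpenAIEmbeddings\n",
"\n",
"def encode_pdf_sketch(path, chunk_size=1000, chunk_overlap=200):\n",
"    documents = PyPDFLoader(path).load()  # one Document per PDF page\n",
"    chunks = RecursiveCharacterTextSplitter(\n",
"        chunk_size=chunk_size, chunk_overlap=chunk_overlap\n",
"    ).split_documents(documents)\n",
"    return FAISS.from_documents(chunks, OpenAIEmbeddings())"
]
},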
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a retriever + contexual compressor + combine them "
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"# Create a retriever\n",
"retriever = vector_store.as_retriever()\n",
"\n",
"\n",
"#Create a contextual compressor\n",
"llm = ChatOpenAI(temperature=0, model_name=\"gpt-4o-mini\", max_tokens=4000)\n",
"compressor = LLMChainExtractor.from_llm(llm)\n",
"\n",
"#Combine the retriever with the compressor\n",
"compression_retriever = ContextualCompressionRetriever(\n",
" base_compressor=compressor,\n",
" base_retriever=retriever\n",
")\n",
"\n",
"# Create a QA chain with the compressed retriever\n",
"qa_chain = RetrievalQA.from_chain_type(\n",
" llm=llm,\n",
" retriever=compression_retriever,\n",
" return_source_documents=True\n",
")"
]
},
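{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before running the full QA chain, it can help to see what the compressor actually keeps. The cell below is an illustrative sketch added for this writeup (the probe query is our own example); it compares the raw retriever output with the compressed output for the same query."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative: compare raw retrieval with compressed retrieval for one query\n",
"probe_query = \"What is the main cause of climate change?\"  # hypothetical example query\n",
"raw_docs = retriever.invoke(probe_query)\n",
"compressed_docs = compression_retriever.invoke(probe_query)\n",
"print(f\"Raw: {len(raw_docs)} docs, {sum(len(d.page_content) for d in raw_docs)} characters\")\n",
"print(f\"Compressed: {len(compressed_docs)} docs, {sum(len(d.page_content) for d in compressed_docs)} characters\")\n",
"for doc in compressed_docs:\n",
"    print(\"---\")\n",
"    print(doc.page_content[:300])"
]
},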
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Example usage"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The main topic of the document is climate change, focusing on international collaboration, national strategies, policy development, and the ethical dimensions of climate justice. It discusses frameworks like the UNFCCC and the Paris Agreement, as well as the importance of sustainable practices for future generations.\n",
"Source documents: [Document(metadata={'source': '../data/Understanding_Climate_Change.pdf', 'page': 9}, page_content='Chapter 6: Global and Local Climate Action \\nInternational Collaboration \\nUnited Nations Framework Convention on Climate Change (UNFCCC) \\nThe UNFCCC is an international treaty aimed at addressing climate change. It provides a \\nframework for negotiating specific protocols and agreements, such as the Kyoto Protocol and \\nthe Paris Agreement. Global cooperation under the UNFCCC is crucial for coordinated \\nclimate action. \\nParis Agreement \\nThe Paris Agreement, adopted in 2015, aims to limit global warming to well below 2 degrees \\nCelsius above pre-industrial levels, with efforts to limit the increase to 1.5 degrees Celsius. \\nCountries submit nationally determined contributions (NDCs) outlining their climate action \\nplans and targets. \\nNational Strategies \\nCarbon Pricing \\nCarbon pricing mechanisms, such as carbon taxes and cap-and-trade systems, incentivize \\nemission reductions by assigning a cost to carbon emissions. These policies encourage'), Document(metadata={'source': '../data/Understanding_Climate_Change.pdf', 'page': 27}, page_content='Legacy for Future Generations \\nOur actions today shape the world for future generations. Ensuring a sustainable and resilient \\nplanet is our responsibility to future generations. By working together, we can create a legacy \\nof environmental stewardship, social equity, and global solidarity. \\nChapter 19: Climate Change and Policy \\nPolicy Development and Implementation \\nNational Climate Policies \\nCountries around the world are developing and implementing national climate policies to \\naddress climate change. These policies set emission reduction targets, promote renewable \\nenergy, and support adaptation measures. Effective policy implementation requires'), Document(metadata={'source': '../data/Understanding_Climate_Change.pdf', 'page': 18}, page_content='This vision includes a healthy planet, thriving ecosystems, and equitable societies. Working together towards this vision creates a sense of purpose and motivation . By embracing these principles and taking concerted action, we can address the urgent challenge of climate change and build a sustainable, resilient, and equitable world for all. The path forward requires courage, commitment, and collaboration, but the rewa rds are immense—a thriving planet and a prosperous future for generations to come. \\nChapter 13: Climate Change and Social Justice \\nClimate Justice \\nUnderstanding Climate Justice \\nClimate justice emphasizes the ethical dimensions of climate change, recognizing that its impacts are not evenly distributed. Vulnerable populations, including low -income communities, indigenous peoples, and marginalized groups, often face the greatest ris ks while contributing the least to greenhouse gas emissions. Climate justice advocates for')]\n"
]
}
],
"source": [
"query = \"What is the main topic of the document?\"\n",
"result = qa_chain.invoke({\"query\": query})\n",
"print(result[\"result\"])\n",
"print(\"Source documents:\", result[\"source_documents\"])"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

File diff suppressed because one or more lines are too long (new image, 34 KiB)

File diff suppressed because one or more lines are too long (new image, 19 KiB)