{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Query Transformations for Improved Retrieval in RAG Systems\n",
"\n",
"## Overview\n",
"\n",
"This code implements three query transformation techniques to enhance the retrieval process in Retrieval-Augmented Generation (RAG) systems:\n",
"\n",
"1. Query Rewriting\n",
"2. Step-back Prompting\n",
"3. Sub-query Decomposition\n",
"\n",
"Each technique aims to improve the relevance and comprehensiveness of retrieved information by modifying or expanding the original query.\n",
"\n",
"## Motivation\n",
"\n",
"RAG systems often face challenges in retrieving the most relevant information, especially when dealing with complex or ambiguous queries. These query transformation techniques address this issue by reformulating queries to better match relevant documents or to retrieve more comprehensive information.\n",
"\n",
"## Key Components\n",
"\n",
"1. Query Rewriting: Reformulates queries to be more specific and detailed.\n",
"2. Step-back Prompting: Generates broader queries for better context retrieval.\n",
"3. Sub-query Decomposition: Breaks down complex queries into simpler sub-queries.\n",
"\n",
"## Method Details\n",
"\n",
"### 1. Query Rewriting\n",
"\n",
"- **Purpose**: To make queries more specific and detailed, improving the likelihood of retrieving relevant information.\n",
"- **Implementation**:\n",
" - Uses a GPT-4 model with a custom prompt template.\n",
" - Takes the original query and reformulates it to be more specific and detailed.\n",
"\n",
"### 2. Step-back Prompting\n",
"\n",
"- **Purpose**: To generate broader, more general queries that can help retrieve relevant background information.\n",
"- **Implementation**:\n",
" - Uses a GPT-4 model with a custom prompt template.\n",
" - Takes the original query and generates a more general \"step-back\" query.\n",
"\n",
"### 3. Sub-query Decomposition\n",
"\n",
"- **Purpose**: To break down complex queries into simpler sub-queries for more comprehensive information retrieval.\n",
"- **Implementation**:\n",
" - Uses a GPT-4 model with a custom prompt template.\n",
" - Decomposes the original query into 2-4 simpler sub-queries.\n",
"\n",
"## Benefits of these Approaches\n",
"\n",
"1. **Improved Relevance**: Query rewriting helps in retrieving more specific and relevant information.\n",
"2. **Better Context**: Step-back prompting allows for retrieval of broader context and background information.\n",
"3. **Comprehensive Results**: Sub-query decomposition enables retrieval of information that covers different aspects of a complex query.\n",
"4. **Flexibility**: Each technique can be used independently or in combination, depending on the specific use case.\n",
"\n",
"## Implementation Details\n",
"\n",
"- All techniques use OpenAI's GPT-4 model for query transformation.\n",
"- Custom prompt templates are used to guide the model in generating appropriate transformations.\n",
"- The code provides separate functions for each transformation technique, allowing for easy integration into existing RAG systems.\n",
"\n",
"## Example Use Case\n",
"\n",
"The code demonstrates each technique using the example query:\n",
"\"What are the impacts of climate change on the environment?\"\n",
"\n",
"- **Query Rewriting** expands this to include specific aspects like temperature changes and biodiversity.\n",
"- **Step-back Prompting** generalizes it to \"What are the general effects of climate change?\"\n",
"- **Sub-query Decomposition** breaks it down into questions about biodiversity, oceans, weather patterns, and terrestrial environments.\n",
"\n",
"## Conclusion\n",
"\n",
"These query transformation techniques offer powerful ways to enhance the retrieval capabilities of RAG systems. By reformulating queries in various ways, they can significantly improve the relevance, context, and comprehensiveness of retrieved information. These methods are particularly valuable in domains where queries can be complex or multifaceted, such as scientific research, legal analysis, or comprehensive fact-finding tasks."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Import libraries and set environment variables"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from langchain_openai import ChatOpenAI\n",
"from langchain.prompts import PromptTemplate\n",
"\n",
"import os\n",
"from dotenv import load_dotenv\n",
"\n",
"# Load environment variables from a .env file\n",
"load_dotenv()\n",
"\n",
"# Set the OpenAI API key environment variable\n",
"os.environ[\"OPENAI_API_KEY\"] = os.getenv('OPENAI_API_KEY')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1 - Query Rewriting: Reformulating queries to improve retrieval."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"re_write_llm = ChatOpenAI(temperature=0, model_name=\"gpt-4o\", max_tokens=4000)\n",
"\n",
"# Create a prompt template for query rewriting\n",
"query_rewrite_template = \"\"\"You are an AI assistant tasked with reformulating user queries to improve retrieval in a RAG system. \n",
"Given the original query, rewrite it to be more specific, detailed, and likely to retrieve relevant information.\n",
"\n",
"Original query: {original_query}\n",
"\n",
"Rewritten query:\"\"\"\n",
"\n",
"query_rewrite_prompt = PromptTemplate(\n",
" input_variables=[\"original_query\"],\n",
" template=query_rewrite_template\n",
")\n",
"\n",
"# Create an LLMChain for query rewriting\n",
"query_rewriter = query_rewrite_prompt | re_write_llm\n",
"\n",
"def rewrite_query(original_query):\n",
" \"\"\"\n",
" Rewrite the original query to improve retrieval.\n",
" \n",
" Args:\n",
" original_query (str): The original user query\n",
" \n",
" Returns:\n",
" str: The rewritten query\n",
" \"\"\"\n",
" response = query_rewriter.invoke(original_query)\n",
" return response.content"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Demonstrate on a use case"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Original query: What are the impacts of climate change on the environment?\n",
"\n",
"Rewritten query: What are the specific effects of climate change on various ecosystems, including changes in temperature, precipitation patterns, sea levels, and biodiversity?\n"
]
}
],
"source": [
"# example query over the understanding climate change dataset\n",
"original_query = \"What are the impacts of climate change on the environment?\"\n",
"rewritten_query = rewrite_query(original_query)\n",
"print(\"Original query:\", original_query)\n",
"print(\"\\nRewritten query:\", rewritten_query)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2 - Step-back Prompting: Generating broader queries for better context retrieval.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"step_back_llm = ChatOpenAI(temperature=0, model_name=\"gpt-4o\", max_tokens=4000)\n",
"\n",
"\n",
"# Create a prompt template for step-back prompting\n",
"step_back_template = \"\"\"You are an AI assistant tasked with generating broader, more general queries to improve context retrieval in a RAG system.\n",
"Given the original query, generate a step-back query that is more general and can help retrieve relevant background information.\n",
"\n",
"Original query: {original_query}\n",
"\n",
"Step-back query:\"\"\"\n",
"\n",
"step_back_prompt = PromptTemplate(\n",
" input_variables=[\"original_query\"],\n",
" template=step_back_template\n",
")\n",
"\n",
"# Create an LLMChain for step-back prompting\n",
"step_back_chain = step_back_prompt | step_back_llm\n",
"\n",
"def generate_step_back_query(original_query):\n",
" \"\"\"\n",
" Generate a step-back query to retrieve broader context.\n",
" \n",
" Args:\n",
" original_query (str): The original user query\n",
" \n",
" Returns:\n",
" str: The step-back query\n",
" \"\"\"\n",
" response = step_back_chain.invoke(original_query)\n",
" return response.content"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Demonstrate on a use case"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Original query: What are the impacts of climate change on the environment?\n",
"\n",
"Step-back query: What are the general effects of climate change?\n"
]
}
],
"source": [
"# example query over the understanding climate change dataset\n",
"original_query = \"What are the impacts of climate change on the environment?\"\n",
"step_back_query = generate_step_back_query(original_query)\n",
"print(\"Original query:\", original_query)\n",
"print(\"\\nStep-back query:\", step_back_query)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3- Sub-query Decomposition: Breaking complex queries into simpler sub-queries."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"sub_query_llm = ChatOpenAI(temperature=0, model_name=\"gpt-4o\", max_tokens=4000)\n",
"\n",
"# Create a prompt template for sub-query decomposition\n",
"subquery_decomposition_template = \"\"\"You are an AI assistant tasked with breaking down complex queries into simpler sub-queries for a RAG system.\n",
"Given the original query, decompose it into 2-4 simpler sub-queries that, when answered together, would provide a comprehensive response to the original query.\n",
"\n",
"Original query: {original_query}\n",
"\n",
"example: What are the impacts of climate change on the environment?\n",
"\n",
"Sub-queries:\n",
"1. What are the impacts of climate change on biodiversity?\n",
"2. How does climate change affect the oceans?\n",
"3. What are the effects of climate change on agriculture?\n",
"4. What are the impacts of climate change on human health?\"\"\"\n",
"\n",
"\n",
"subquery_decomposition_prompt = PromptTemplate(\n",
" input_variables=[\"original_query\"],\n",
" template=subquery_decomposition_template\n",
")\n",
"\n",
"# Create an LLMChain for sub-query decomposition\n",
"subquery_decomposer_chain = subquery_decomposition_prompt | sub_query_llm\n",
"\n",
"def decompose_query(original_query: str):\n",
" \"\"\"\n",
" Decompose the original query into simpler sub-queries.\n",
" \n",
" Args:\n",
" original_query (str): The original complex query\n",
" \n",
" Returns:\n",
" List[str]: A list of simpler sub-queries\n",
" \"\"\"\n",
" response = subquery_decomposer_chain.invoke(original_query).content\n",
" sub_queries = [q.strip() for q in response.split('\\n') if q.strip() and not q.strip().startswith('Sub-queries:')]\n",
" return sub_queries"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Demonstrate on a use case"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Sub-queries:\n",
"Original query: What are the impacts of climate change on the environment?\n",
"1. How does climate change affect biodiversity and ecosystems?\n",
"2. What are the impacts of climate change on oceanic conditions and marine life?\n",
"3. How does climate change influence weather patterns and extreme weather events?\n",
"4. What are the effects of climate change on terrestrial environments, such as forests and deserts?\n"
]
}
],
"source": [
"# example query over the understanding climate change dataset\n",
"original_query = \"What are the impacts of climate change on the environment?\"\n",
"sub_queries = decompose_query(original_query)\n",
"print(\"\\nSub-queries:\")\n",
"for i, sub_query in enumerate(sub_queries, 1):\n",
" print(sub_query)"
]
},
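{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Combining the transformations (illustrative sketch)\n",
"\n",
"The three techniques can also be used together: generate a rewritten query, a step-back query, and sub-queries from the same input, send each variant to a retriever, and merge the results. The cell below is a minimal sketch of that pattern and is not part of the original pipeline: the function names (`expand_query`, `retrieve`, `retrieve_with_all_transformations`) are illustrative, and `retrieve` is a hypothetical placeholder to be replaced by whatever vector-store retriever your RAG system uses."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def expand_query(original_query):\n",
"    \"\"\"\n",
"    Combine the three transformations into a single list of query variants.\n",
"    \n",
"    Args:\n",
"        original_query (str): The original user query\n",
"    \n",
"    Returns:\n",
"        List[str]: De-duplicated query variants (original, rewritten, step-back, sub-queries)\n",
"    \"\"\"\n",
"    variants = [original_query, rewrite_query(original_query), generate_step_back_query(original_query)]\n",
"    variants.extend(decompose_query(original_query))\n",
"    # De-duplicate while preserving order\n",
"    seen = set()\n",
"    return [q for q in variants if not (q in seen or seen.add(q))]\n",
"\n",
"def retrieve(query):\n",
"    \"\"\"Hypothetical placeholder - swap in your own retriever, e.g. a vector store's similarity_search.\"\"\"\n",
"    return []\n",
"\n",
"def retrieve_with_all_transformations(original_query):\n",
"    \"\"\"Retrieve documents for every query variant and merge the results into one list.\"\"\"\n",
"    results = []\n",
"    for query in expand_query(original_query):\n",
"        results.extend(retrieve(query))\n",
"    return results"
]
}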
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}