Mirror of https://github.com/NirDiamant/RAG_Techniques.git (synced 2025-04-07)

README.md
@@ -153,7 +153,24 @@ Explore the extensive list of cutting-edge RAG techniques:
### 📚 Context and Content Enrichment

8. Hypothetical Prompt Embeddings (HyPE) ❓🚀

    - **[LangChain](all_rag_techniques/HyPE_Hypothetical_Prompt_Embeddings.ipynb)**
    - **[Runnable Script](all_rag_techniques_runnable_scripts/HyPE_Hypothetical_Prompt_Embedding.py)**
#### Overview 🔎

HyPE (Hypothetical Prompt Embeddings) is an enhancement to traditional RAG retrieval that **precomputes hypothetical prompts at the indexing stage**, embedding each generated question while storing the original chunk in its place. This transforms retrieval into a **question-question matching task**, avoiding the need for runtime synthetic answer generation and reducing inference-time computational overhead while **improving retrieval alignment**.
#### Implementation 🛠️

- 📖 **Precomputed Questions:** Instead of embedding document chunks, HyPE **generates multiple hypothetical queries per chunk** at indexing time (see the indexing sketch after this list).
- 🔍 **Question-Question Matching:** User queries are matched against stored hypothetical questions, leading to **better retrieval alignment**.
- ⚡ **No Runtime Overhead:** Unlike HyDE, HyPE does **not require LLM calls at query time**, making retrieval **faster and cheaper**.
- 📈 **Higher Precision & Recall:** Improves retrieval **context precision by up to 42 percentage points** and **claim recall by up to 45 percentage points**.
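Below is a minimal, hedged sketch of the HyPE indexing loop. The helper names (`gen_questions`, `embed`) are assumed stand-ins for an LLM call and an embedding model; the full LangChain/FAISS version lives in the linked notebook.

```python
from typing import Callable, List, Tuple

def hype_index(
    chunks: List[str],
    gen_questions: Callable[[str], List[str]],        # assumed: chunk -> hypothetical questions
    embed: Callable[[List[str]], List[List[float]]],  # assumed: texts -> embedding vectors
) -> List[Tuple[List[float], str]]:
    """Embed hypothetical questions, but store the source chunk as the payload."""
    index = []
    for chunk in chunks:
        for vec in embed(gen_questions(chunk)):
            index.append((vec, chunk))  # question vector points back to its chunk
    return index
```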
#### Additional Resources 📚

- **[Preprint: Hypothetical Prompt Embeddings (HyPE)](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5139335)** - Research paper detailing the method, evaluation, and benchmarks.
9. **[Contextual Chunk Headers :label:](https://github.com/NirDiamant/RAG_Techniques/blob/main/all_rag_techniques/contextual_chunk_headers.ipynb)**

#### Overview 🔎

Contextual chunk headers (CCH) is a method of creating document-level and section-level context and prepending those headers to the chunks before embedding them.
@@ -164,7 +181,7 @@ Explore the extensive list of cutting-edge RAG techniques:
#### Additional Resources 📚

**[dsRAG](https://github.com/D-Star-AI/dsRAG)**: an open-source retrieval engine that implements this technique (and a few other advanced RAG techniques)

10. **[Relevant Segment Extraction 🧩](https://github.com/NirDiamant/RAG_Techniques/blob/main/all_rag_techniques/relevant_segment_extraction.ipynb)**
#### Overview 🔎

Relevant segment extraction (RSE) is a method of dynamically constructing multi-chunk segments of text that are relevant to a given query.
@@ -172,7 +189,7 @@ Explore the extensive list of cutting-edge RAG techniques:
#### Implementation 🛠️

Perform a retrieval post-processing step that analyzes the most relevant chunks and identifies longer multi-chunk segments to provide more complete context to the LLM, as sketched below.
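As a hedged illustration (not the notebook's exact code), the sketch below merges runs of relevant chunks into contiguous segments, given per-chunk relevance scores in document order:

```python
from typing import List, Tuple

def extract_segments(scores: List[float], threshold: float = 0.5,
                     max_gap: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) chunk-index ranges covering runs of relevant chunks."""
    segments, start, gap = [], None, 0
    for i, score in enumerate(scores):
        if score >= threshold:
            start, gap = (i if start is None else start), 0
        elif start is not None:
            gap += 1
            if gap > max_gap:  # too many irrelevant chunks: close the segment
                segments.append((start, i - gap))
                start, gap = None, 0
    if start is not None:
        segments.append((start, len(scores) - 1 - gap))
    return segments

# e.g. extract_segments([0.9, 0.2, 0.8, 0.1, 0.1, 0.7]) -> [(0, 2), (5, 5)]
```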
11. Context Enrichment Techniques 📝

    - **[LangChain](all_rag_techniques/context_enrichment_window_around_chunk.ipynb)**
    - **[LlamaIndex](all_rag_techniques/context_enrichment_window_around_chunk_with_llamaindex.ipynb)**
    - **[Runnable Script](all_rag_techniques_runnable_scripts/context_enrichment_window_around_chunk.py)**
@@ -183,7 +200,7 @@ Explore the extensive list of cutting-edge RAG techniques:
#### Implementation 🛠️

Retrieve the most relevant sentence while also accessing the sentences before and after it in the original text, as sketched below.
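A minimal sketch of the window idea, assuming the document has already been split into an ordered list of sentences:

```python
from typing import List

def window_around(sentences: List[str], hit_idx: int, window: int = 1) -> str:
    """Return the retrieved sentence padded with `window` neighbors on each side."""
    lo = max(0, hit_idx - window)
    hi = min(len(sentences), hit_idx + window + 1)
    return " ".join(sentences[lo:hi])
```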
12. Semantic Chunking 🧠

    - **[LangChain](https://github.com/NirDiamant/RAG_Techniques/blob/main/all_rag_techniques/semantic_chunking.ipynb)**
    - **[Runnable Script](https://github.com/NirDiamant/RAG_Techniques/blob/main/all_rag_techniques_runnable_scripts/semantic_chunking.py)**
@@ -196,7 +213,7 @@ Explore the extensive list of cutting-edge RAG techniques:
#### Additional Resources 📚

- **[Semantic Chunking: Improving AI Information Retrieval](https://open.substack.com/pub/diamantai/p/semantic-chunking-improving-ai-information?r=336pe4&utm_campaign=post&utm_medium=web)** - A comprehensive blog post exploring the benefits and implementation of semantic chunking in RAG systems.

13. Contextual Compression 🗜️

    - **[LangChain](all_rag_techniques/contextual_compression.ipynb)**
    - **[Runnable Script](all_rag_techniques_runnable_scripts/contextual_compression.py)**
@@ -206,7 +223,7 @@ Explore the extensive list of cutting-edge RAG techniques:
#### Implementation 🛠️

Use an LLM to compress or summarize retrieved chunks, preserving key information relevant to the query; a hedged LangChain sketch follows.
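A sketch using LangChain's compression retriever (class names per LangChain's documented API; `vector_retriever` is an assumed, pre-built retriever):

```python
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
compressor = LLMChainExtractor.from_llm(llm)  # LLM keeps only query-relevant text
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=vector_retriever  # assumed retriever
)
docs = compression_retriever.invoke("What is the main cause of climate change?")
```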
14. Document Augmentation through Question Generation for Enhanced Retrieval

    - **[LangChain](all_rag_techniques/document_augmentation.ipynb)**
    - **[Runnable Script](all_rag_techniques_runnable_scripts/document_augmentation.py)**
@@ -218,7 +235,7 @@ Explore the extensive list of cutting-edge RAG techniques:
### 🚀 Advanced Retrieval Methods

15. Fusion Retrieval 🔗

    - **[LangChain](all_rag_techniques/fusion_retrieval.ipynb)**
    - **[LlamaIndex](https://github.com/NirDiamant/RAG_Techniques/blob/main/all_rag_techniques/fusion_retrieval_with_llamaindex.ipynb)**
    - **[Runnable Script](all_rag_techniques_runnable_scripts/fusion_retrieval.py)**
@@ -229,7 +246,7 @@ Explore the extensive list of cutting-edge RAG techniques:
#### Implementation 🛠️

Combine keyword-based search with vector-based search for more comprehensive and accurate retrieval; a reciprocal rank fusion sketch follows.
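One common way to fuse the two result lists is reciprocal rank fusion (RRF); this is a generic sketch, not the notebook's exact implementation:

```python
from collections import defaultdict
from typing import List

def rrf(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked doc-ID lists; k damps the influence of top ranks (standard RRF)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] += 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# e.g. fuse a BM25 ranking and a vector-search ranking of document IDs
fused = rrf([["d1", "d2", "d3"], ["d2", "d1", "d4"]])
```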
16. Intelligent Reranking 📈

    - **[LangChain](all_rag_techniques/reranking.ipynb)**
    - **[LlamaIndex](https://github.com/NirDiamant/RAG_Techniques/blob/main/all_rag_techniques/reranking_with_llamaindex.ipynb)**
    - **[Runnable Script](all_rag_techniques_runnable_scripts/reranking.py)**
@@ -245,7 +262,7 @@ Explore the extensive list of cutting-edge RAG techniques:
#### Additional Resources 📚

- **[Relevance Revolution: How Re-ranking Transforms RAG Systems](https://open.substack.com/pub/diamantai/p/relevance-revolution-how-re-ranking?r=336pe4&utm_campaign=post&utm_medium=web)** - A comprehensive blog post exploring the power of re-ranking in enhancing RAG system performance.

17. Multi-faceted Filtering 🔍

#### Overview 🔎

Applying various filtering techniques to refine and improve the quality of retrieved results.
@@ -256,7 +273,7 @@ Explore the extensive list of cutting-edge RAG techniques:
- 📄 **Content Filtering:** Remove results that don't match specific content criteria or essential keywords.
- 🌈 **Diversity Filtering:** Ensure result diversity by filtering out near-duplicate entries (see the sketch after this list).
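A minimal sketch combining a score filter with a diversity filter; the hit format, thresholds, and unit-normalized embeddings are all assumptions:

```python
import numpy as np

def filter_results(hits, min_score=0.5, dedup_sim=0.95):
    """hits: dicts with 'score' (float) and unit-norm 'embedding' (np.ndarray)."""
    kept = []
    for hit in sorted(hits, key=lambda h: h["score"], reverse=True):
        if hit["score"] < min_score:  # score/content filter
            continue
        if any(float(hit["embedding"] @ k["embedding"]) > dedup_sim for k in kept):
            continue  # diversity filter: drop near-duplicates
        kept.append(hit)
    return kept
```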
18. Hierarchical Indices 🗂️

    - **[LangChain](all_rag_techniques/hierarchical_indices.ipynb)**
    - **[Runnable Script](all_rag_techniques_runnable_scripts/hierarchical_indices.py)**
@@ -269,7 +286,7 @@ Explore the extensive list of cutting-edge RAG techniques:
#### Additional Resources 📚

- **[Hierarchical Indices: Enhancing RAG Systems](https://open.substack.com/pub/diamantai/p/hierarchical-indices-enhancing-rag?r=336pe4&utm_campaign=post&utm_medium=web)** - A comprehensive blog post exploring the power of hierarchical indices in enhancing RAG system performance.

19. Ensemble Retrieval 🎭

#### Overview 🔎

Combining multiple retrieval models or techniques for more robust and accurate results.
@@ -277,7 +294,7 @@ Explore the extensive list of cutting-edge RAG techniques:
#### Implementation 🛠️

Apply different embedding models or retrieval algorithms and use voting or weighting mechanisms to determine the final set of retrieved documents, as sketched below.
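A minimal weighted-voting sketch (the retriever callables and weights are assumed inputs):

```python
from collections import defaultdict

def ensemble_retrieve(query, retrievers, weights, top_k=5):
    """retrievers: list of fn(query) -> ranked list of doc IDs, best first."""
    votes = defaultdict(float)
    for retrieve, weight in zip(retrievers, weights):
        for rank, doc_id in enumerate(retrieve(query)):
            votes[doc_id] += weight / (rank + 1)  # earlier ranks earn more weight
    return sorted(votes, key=votes.get, reverse=True)[:top_k]
```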
20. Dartboard Retrieval 🎯

    - **[LangChain](https://github.com/NirDiamant/RAG_Techniques/blob/main/all_rag_techniques/dartboard.ipynb)**

#### Overview 🔎

Optimizing over relevant information gain in retrieval.
@@ -286,7 +303,7 @@ Explore the extensive list of cutting-edge RAG techniques:
- Combine both relevance and diversity into a single scoring function and directly optimize for it.
- A proof of concept showing plain simple RAG underperforming when the database is dense, and dartboard retrieval outperforming it (see the sketch after this list).
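In the same spirit (though not the dartboard paper's exact objective), an MMR-style greedy sketch that folds relevance and diversity into one score:

```python
import numpy as np

def greedy_select(query_sims: np.ndarray, doc_sims: np.ndarray,
                  k: int = 5, lam: float = 0.7) -> list:
    """query_sims: (n,) doc-query similarity; doc_sims: (n, n) doc-doc similarity."""
    selected, candidates = [], list(range(len(query_sims)))
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max((doc_sims[i][j] for j in selected), default=0.0)
            return lam * query_sims[i] - (1 - lam) * redundancy  # relevance vs. diversity
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```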
21. Multi-modal Retrieval 📽️

#### Overview 🔎

Extending RAG capabilities to handle diverse data types for richer responses.
@@ -298,7 +315,7 @@ Explore the extensive list of cutting-edge RAG techniques:
### 🔁 Iterative and Adaptive Techniques

22. Retrieval with Feedback Loops 🔁

    - **[LangChain](all_rag_techniques/retrieval_with_feedback_loop.ipynb)**
    - **[Runnable Script](all_rag_techniques_runnable_scripts/retrieval_with_feedback_loop.py)**
@@ -308,7 +325,7 @@ Explore the extensive list of cutting-edge RAG techniques:
#### Implementation 🛠️

Collect and utilize user feedback on the relevance and quality of retrieved documents and generated responses to fine-tune retrieval and ranking models, as sketched below.
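A minimal sketch of the feedback idea: keep a per-document prior that user feedback nudges up or down, and blend it into future retrieval scores (all names and the learning rate are assumptions):

```python
from collections import defaultdict

doc_prior = defaultdict(float)  # learned per-document adjustment

def record_feedback(doc_id: str, helpful: bool, lr: float = 0.1) -> None:
    doc_prior[doc_id] += lr if helpful else -lr

def adjusted_score(doc_id: str, base_score: float) -> float:
    return base_score + doc_prior[doc_id]  # blend prior into the retrieval score
```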
23. Adaptive Retrieval 🎯

    - **[LangChain](all_rag_techniques/adaptive_retrieval.ipynb)**
    - **[Runnable Script](all_rag_techniques_runnable_scripts/adaptive_retrieval.py)**
@@ -318,7 +335,7 @@ Explore the extensive list of cutting-edge RAG techniques:
#### Implementation 🛠️

Classify queries into different categories and use tailored retrieval strategies for each, considering user context and preferences; a toy dispatch sketch follows.
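A toy sketch of the dispatch pattern; the classifier heuristic and the `dense_retrieve` helper are hypothetical placeholders:

```python
def classify(query: str) -> str:
    # crude heuristic stand-in for an LLM or trained classifier
    return "factual" if len(query.split()) <= 8 else "analytical"

STRATEGIES = {
    "factual": lambda q: dense_retrieve(q, k=3),      # hypothetical helper
    "analytical": lambda q: dense_retrieve(q, k=10),  # wider context for analysis
}

def adaptive_retrieve(query: str):
    return STRATEGIES[classify(query)](query)
```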
24. Iterative Retrieval 🔄

#### Overview 🔎

Performing multiple rounds of retrieval to refine and enhance result quality.
@@ -328,7 +345,7 @@ Explore the extensive list of cutting-edge RAG techniques:
### 📊 Evaluation

25. **[DeepEval Evaluation](evaluation/evaluation_deep_eval.ipynb)** 📘

#### Overview 🔎

Performing evaluations of Retrieval-Augmented Generation systems by covering several metrics and creating test cases.
@@ -337,7 +354,7 @@ Explore the extensive list of cutting-edge RAG techniques:
Use the `deepeval` library to conduct test cases on the correctness, faithfulness, and contextual relevancy of RAG systems, as sketched below.
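A hedged sketch of a `deepeval` test case (class and metric names per deepeval's documented API; the strings are illustrative placeholders):

```python
from deepeval import evaluate
from deepeval.metrics import FaithfulnessMetric, ContextualRelevancyMetric
from deepeval.test_case import LLMTestCase

test_case = LLMTestCase(
    input="What is the main cause of climate change?",
    actual_output="Greenhouse gas emissions from human activity.",
    retrieval_context=["The primary cause of recent climate change is ..."],
)
evaluate(test_cases=[test_case],
         metrics=[FaithfulnessMetric(), ContextualRelevancyMetric()])
```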
26. **[GroUSE Evaluation](evaluation/evaluation_grouse.ipynb)** 🐦

#### Overview 🔎

Evaluate the final stage of Retrieval-Augmented Generation using the metrics of the GroUSE framework, and meta-evaluate your custom LLM judge on the GroUSE unit tests.
@@ -348,7 +365,7 @@ Explore the extensive list of cutting-edge RAG techniques:
### 🔬 Explainability and Transparency

27. Explainable Retrieval 🔍

    - **[LangChain](all_rag_techniques/explainable_retrieval.ipynb)**
    - **[Runnable Script](all_rag_techniques_runnable_scripts/explainable_retrieval.py)**
@@ -360,7 +377,7 @@ Explore the extensive list of cutting-edge RAG techniques:
### 🏗️ Advanced Architectures

28. Knowledge Graph Integration (Graph RAG) 🕸️

    - **[LangChain](all_rag_techniques/graph_rag.ipynb)**
    - **[Runnable Script](all_rag_techniques_runnable_scripts/graph_rag.py)**
@@ -370,7 +387,7 @@ Explore the extensive list of cutting-edge RAG techniques:
#### Implementation 🛠️

Retrieve entities and their relationships from a knowledge graph relevant to the query, combining this structured data with unstructured text for more informative responses (see the sketch below).
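A minimal sketch with `networkx`: pull entities mentioned in the query plus their one-hop neighborhood (the graph construction and the `relation` edge attribute are assumed):

```python
import networkx as nx

def graph_context(graph: nx.Graph, query: str, hops: int = 1) -> list:
    """Return 'entity -[relation]-> entity' strings near query-mentioned nodes."""
    nodes = {n for n in graph.nodes if str(n).lower() in query.lower()}
    for _ in range(hops):
        nodes |= {m for n in list(nodes) for m in graph.neighbors(n)}
    sub = graph.subgraph(nodes)
    return [f"{u} -[{d.get('relation', 'related_to')}]-> {v}"
            for u, v, d in sub.edges(data=True)]
```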
29. GraphRag (Microsoft) 🎯

    - **[GraphRag](all_rag_techniques/Microsoft_GraphRag.ipynb)**

#### Overview 🔎
@@ -379,7 +396,7 @@ Explore the extensive list of cutting-edge RAG techniques:
#### Implementation 🛠️

- Analyze an input corpus by extracting entities and relationships from text units, then generate summaries of each community and its constituents from the bottom up.
30. RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval 🌳

    - **[LangChain](all_rag_techniques/raptor.ipynb)**
    - **[Runnable Script](all_rag_techniques_runnable_scripts/raptor.py)**
@@ -389,7 +406,7 @@ Explore the extensive list of cutting-edge RAG techniques:
#### Implementation 🛠️

Use abstractive summarization to recursively process and summarize retrieved documents, organizing the information in a tree structure for hierarchical context, as sketched below.
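A minimal RAPTOR-style sketch: recursively summarize groups of texts into levels of a tree (`summarize` is an assumed LLM call; the naive sequential grouping stands in for clustering):

```python
from typing import Callable, List

def build_tree(texts: List[str], summarize: Callable[[str], str],
               fanout: int = 4) -> List[List[str]]:
    """Return levels bottom-up: level 0 = leaf chunks, last level = root summary."""
    levels = [texts]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([summarize(" ".join(prev[i:i + fanout]))
                       for i in range(0, len(prev), fanout)])
    return levels  # index all levels so retrieval can match any abstraction level
```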
31. Self RAG 🔁

    - **[LangChain](all_rag_techniques/self_rag.ipynb)**
    - **[Runnable Script](all_rag_techniques_runnable_scripts/self_rag.py)**
@@ -399,7 +416,7 @@ Explore the extensive list of cutting-edge RAG techniques:
#### Implementation 🛠️

- Implement a multi-step process including retrieval decision, document retrieval, relevance evaluation, response generation, support assessment, and utility evaluation to produce accurate, relevant, and useful outputs.
32. Corrective RAG 🔧

    - **[LangChain](all_rag_techniques/crag.ipynb)**
    - **[Runnable Script](all_rag_techniques_runnable_scripts/crag.py)**
@@ -411,7 +428,7 @@ Explore the extensive list of cutting-edge RAG techniques:
## 🌟 Special Advanced Technique 🌟

33. **[Sophisticated Controllable Agent for Complex RAG Tasks 🤖](https://github.com/NirDiamant/Controllable-RAG-Agent)**

#### Overview 🔎

An advanced RAG solution designed to tackle complex questions that simple semantic similarity-based retrieval cannot solve. This approach uses a sophisticated deterministic graph as the "brain" 🧠 of a highly controllable autonomous agent, capable of answering non-trivial questions from your own data.
all_rag_techniques/HyPE_Hypothetical_Prompt_Embeddings.ipynb (new file)
@@ -0,0 +1,558 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Hypothetical Prompt Embeddings (HyPE)\n",
    "\n",
    "## Overview\n",
    "\n",
    "This code implements a Retrieval-Augmented Generation (RAG) system enhanced by Hypothetical Prompt Embeddings (HyPE). Unlike traditional RAG pipelines that struggle with query-document style mismatch, HyPE precomputes hypothetical questions during the indexing phase. This transforms retrieval into a question-question matching problem, eliminating the need for expensive runtime query expansion techniques.\n",
    "\n",
    "## Key Components of the notebook\n",
    "\n",
    "1. PDF processing and text extraction\n",
    "2. Text chunking to maintain coherent information units\n",
    "3. **Hypothetical Prompt Embedding Generation** using an LLM to create multiple proxy questions per chunk\n",
    "4. Vector store creation using [FAISS](https://engineering.fb.com/2017/03/29/data-infrastructure/faiss-a-library-for-efficient-similarity-search/) and OpenAI embeddings\n",
    "5. Retriever setup for querying the processed documents\n",
    "6. Evaluation of the RAG system\n",
    "\n",
    "## Method Details\n",
    "\n",
    "### Document Preprocessing\n",
    "\n",
    "1. The PDF is loaded using `PyPDFLoader`.\n",
    "2. The text is split into chunks using `RecursiveCharacterTextSplitter` with specified chunk size and overlap.\n",
    "\n",
    "### Hypothetical Question Generation\n",
    "\n",
    "Instead of embedding raw text chunks, HyPE **generates multiple hypothetical prompts** for each chunk. These **precomputed questions** simulate user queries, improving alignment with real-world searches. This removes the runtime synthetic answer generation required by techniques like HyDE.\n",
    "\n",
    "### Vector Store Creation\n",
    "\n",
    "1. Each hypothetical question is embedded using OpenAI embeddings.\n",
    "2. A FAISS vector store is built, associating **each question embedding with its original chunk**.\n",
    "3. This approach **stores multiple representations per chunk**, increasing retrieval flexibility.\n",
    "\n",
    "### Retriever Setup\n",
    "\n",
    "1. The retriever is optimized for **question-question matching** rather than direct document retrieval.\n",
    "2. The FAISS index enables **efficient nearest-neighbor** search over the hypothetical prompt embeddings.\n",
    "3. Retrieved chunks provide a **richer and more precise context** for downstream LLM generation.\n",
    "\n",
    "## Key Features\n",
    "\n",
    "1. **Precomputed Hypothetical Prompts** – Improves query alignment without runtime overhead.\n",
    "2. **Multi-Vector Representation** – Each chunk is indexed multiple times for broader semantic coverage.\n",
    "3. **Efficient Retrieval** – FAISS ensures fast similarity search over the enhanced embeddings.\n",
    "4. **Modular Design** – The pipeline is easy to adapt for different datasets and retrieval settings. Additionally, it's compatible with most optimizations, like reranking.\n",
    "\n",
    "## Evaluation\n",
    "\n",
    "HyPE's effectiveness is evaluated across multiple datasets, showing:\n",
    "\n",
    "- Up to 42 percentage points improvement in retrieval precision\n",
    "- Up to 45 percentage points improvement in claim recall\n",
    "  (See full evaluation results in the [preprint](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5139335))\n",
    "\n",
    "## Benefits of this Approach\n",
    "\n",
    "1. **Eliminates Query-Time Overhead** – All hypothetical generation is done offline at indexing.\n",
    "2. **Enhanced Retrieval Precision** – Better alignment between queries and stored content.\n",
    "3. **Scalable & Efficient** – No additional per-query computational cost; retrieval is as fast as standard RAG.\n",
    "4. **Flexible & Extensible** – Can be combined with advanced RAG techniques like reranking.\n",
    "\n",
    "## Conclusion\n",
    "\n",
    "HyPE provides a scalable and efficient alternative to traditional RAG systems, overcoming query-document style mismatch while avoiding the computational cost of runtime query expansion. By moving hypothetical prompt generation to indexing, it significantly enhances retrieval precision and efficiency, making it a practical solution for real-world applications.\n",
    "\n",
    "For further details, refer to the full paper: [preprint](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5139335)\n",
    "\n",
    "\n",
    "<div style=\"text-align: center;\">\n",
    "\n",
    "<img src=\"../images/hype.svg\" alt=\"HyPE\" style=\"width:70%; height:auto;\">\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Import libraries and environment variables"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import sys\n",
    "import faiss\n",
    "from tqdm import tqdm\n",
    "from dotenv import load_dotenv\n",
    "from concurrent.futures import ThreadPoolExecutor, as_completed\n",
    "from langchain_community.docstore.in_memory import InMemoryDocstore\n",
    "\n",
    "\n",
    "# Load environment variables from a .env file\n",
    "load_dotenv()\n",
    "\n",
    "# Set the OpenAI API key environment variable (comment out if not using OpenAI)\n",
    "if not os.getenv('OPENAI_API_KEY'):\n",
    "    os.environ[\"OPENAI_API_KEY\"] = input(\"Please enter your OpenAI API key: \")\n",
    "else:\n",
    "    os.environ[\"OPENAI_API_KEY\"] = os.getenv('OPENAI_API_KEY')\n",
    "\n",
    "sys.path.append(os.path.abspath(os.path.join(os.getcwd(), '..')))  # Add the parent directory to the path since we work with notebooks\n",
    "from helper_functions import *\n",
    "from evaluation.evalute_rag import *\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define constants\n",
    "\n",
    "- `PATH`: path to the data to be embedded into the RAG pipeline\n",
    "\n",
    "This tutorial uses an OpenAI endpoint ([available models](https://platform.openai.com/docs/pricing)). \n",
    "- `LANGUAGE_MODEL_NAME`: The name of the language model to be used. \n",
    "- `EMBEDDING_MODEL_NAME`: The name of the embedding model to be used.\n",
    "\n",
    "The tutorial uses a `RecursiveCharacterTextSplitter` chunking approach where the chunking length function used is the Python `len` function. The chunking variables to be tweaked here are:\n",
    "- `CHUNK_SIZE`: The maximum length of one chunk\n",
    "- `CHUNK_OVERLAP`: The overlap of two consecutive chunks."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {},
   "outputs": [],
   "source": [
    "PATH = \"../data/Understanding_Climate_Change.pdf\"\n",
    "LANGUAGE_MODEL_NAME = \"gpt-4o-mini\"\n",
    "EMBEDDING_MODEL_NAME = \"text-embedding-3-small\"\n",
    "CHUNK_SIZE = 1000\n",
    "CHUNK_OVERLAP = 200"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define generation of Hypothetical Prompt Embeddings\n",
    "\n",
    "The code block below generates hypothetical questions for each text chunk and embeds them for retrieval.\n",
    "\n",
    "- An LLM extracts key questions from the input chunk.\n",
    "- These questions are embedded using OpenAI's model.\n",
    "- The function returns the original chunk and its prompt embeddings, later used for retrieval.\n",
    "\n",
    "To ensure clean output, extra newlines are removed, and regex parsing can improve list formatting when needed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {},
   "outputs": [],
   "source": [
    "def generate_hypothetical_prompt_embeddings(chunk_text: str):\n",
    "    \"\"\"\n",
    "    Uses the LLM to generate multiple hypothetical questions for a single chunk.\n",
    "    These questions will be used as 'proxies' for the chunk during retrieval.\n",
    "\n",
    "    Parameters:\n",
    "    chunk_text (str): Text contents of the chunk\n",
    "\n",
    "    Returns:\n",
    "    chunk_text (str): Text contents of the chunk. This is done to make the \n",
    "        multithreading easier\n",
    "    hypothetical prompt embeddings (List[float]): A list of embedding vectors\n",
    "        generated from the questions\n",
    "    \"\"\"\n",
    "    llm = ChatOpenAI(temperature=0, model_name=LANGUAGE_MODEL_NAME)\n",
    "    embedding_model = OpenAIEmbeddings(model=EMBEDDING_MODEL_NAME)\n",
    "\n",
    "    question_gen_prompt = PromptTemplate.from_template(\n",
    "        \"Analyze the input text and generate essential questions that, when answered, \\\n",
    "        capture the main points of the text. Each question should be one line, \\\n",
    "        without numbering or prefixes.\\n\\n \\\n",
    "        Text:\\n{chunk_text}\\n\\nQuestions:\\n\"\n",
    "    )\n",
    "    question_chain = question_gen_prompt | llm | StrOutputParser()\n",
    "\n",
    "    # parse questions from response\n",
    "    # Notes: \n",
    "    # - gpt4o likes to split questions by \\n\\n so we remove one \\n\n",
    "    # - for production or if using smaller models from ollama, it's beneficial to use regex to parse \n",
    "    #   things like (un)ordered lists\n",
    "    #   r\"^\\s*[\\-\\*\\•]|\\s*\\d+\\.\\s*|\\s*[a-zA-Z]\\)\\s*|\\s*\\(\\d+\\)\\s*|\\s*\\([a-zA-Z]\\)\\s*|\\s*\\([ivxlcdm]+\\)\\s*\"\n",
    "    questions = question_chain.invoke({\"chunk_text\": chunk_text}).replace(\"\\n\\n\", \"\\n\").split(\"\\n\")\n",
    "    \n",
    "    return chunk_text, embedding_model.embed_documents(questions)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define creation and population of FAISS Vector Store\n",
    "\n",
    "The code block below builds a FAISS vector store by embedding text chunks in parallel.\n",
    "\n",
    "What happens?\n",
    "- Parallel processing – Uses threading to generate embeddings faster.\n",
    "- FAISS initialization – Sets up an L2 index for efficient similarity search.\n",
    "- Chunk embedding – Each chunk is stored multiple times, once for each generated question embedding.\n",
    "- In-memory storage – Uses InMemoryDocstore for fast lookup.\n",
    "\n",
    "This ensures efficient retrieval, improving query alignment with precomputed question embeddings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {},
   "outputs": [],
   "source": [
    "def prepare_vector_store(chunks):\n",
    "    \"\"\"\n",
    "    Creates and populates a FAISS vector store from a list of text chunks.\n",
    "\n",
    "    This function processes a list of text chunks in parallel, generating \n",
    "    hypothetical prompt embeddings for each chunk.\n",
    "    The embeddings are stored in a FAISS index for efficient similarity search.\n",
    "\n",
    "    Parameters:\n",
    "    chunks (List[Document]): A list of LangChain document chunks to be embedded and stored.\n",
    "\n",
    "    Returns:\n",
    "    FAISS: A FAISS vector store containing the embedded text chunks.\n",
    "    \"\"\"\n",
    "\n",
    "    # Defer initialization until the first result reveals the vector dimension\n",
    "    vector_store = None\n",
    "\n",
    "    with ThreadPoolExecutor() as pool:\n",
    "        # Use threading to speed up generation of prompt embeddings\n",
    "        futures = [pool.submit(generate_hypothetical_prompt_embeddings, c) for c in chunks]\n",
    "        \n",
    "        # Process embeddings as they complete\n",
    "        for f in tqdm(as_completed(futures), total=len(chunks)):\n",
    "            \n",
    "            chunk, vectors = f.result()  # Retrieve the processed chunk and its embeddings\n",
    "            \n",
    "            # Initialize the FAISS vector store on the first chunk\n",
    "            if vector_store is None:\n",
    "                vector_store = FAISS(\n",
    "                    embedding_function=OpenAIEmbeddings(model=EMBEDDING_MODEL_NAME),  # Define embedding model\n",
    "                    index=faiss.IndexFlatL2(len(vectors[0])),  # Define an L2 index for similarity search\n",
    "                    docstore=InMemoryDocstore(),  # Use in-memory document storage\n",
    "                    index_to_docstore_id={}  # Maintain index-to-document mapping\n",
    "                )\n",
    "            \n",
    "            # Pair the chunk's content with each generated embedding vector.\n",
    "            # Each chunk is inserted multiple times, once for each prompt vector\n",
    "            chunks_with_embedding_vectors = [(chunk.page_content, vec) for vec in vectors]\n",
    "            \n",
    "            # Add embeddings to the store\n",
    "            vector_store.add_embeddings(chunks_with_embedding_vectors)\n",
    "\n",
    "    return vector_store  # Return the populated vector store\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Encode PDF into a FAISS Vector Store\n",
    "\n",
    "The code block below processes a PDF file and stores its content as embeddings for retrieval.\n",
    "\n",
    "What happens?\n",
    "- PDF loading – Extracts text from the document.\n",
    "- Chunking – Splits text into overlapping segments for better context retention.\n",
    "- Preprocessing – Cleans text to improve embedding quality.\n",
    "- Vector store creation – Generates embeddings and stores them in FAISS for retrieval."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {},
   "outputs": [],
   "source": [
    "def encode_pdf(path, chunk_size=1000, chunk_overlap=200):\n",
    "    \"\"\"\n",
    "    Encodes a PDF book into a vector store using OpenAI embeddings.\n",
    "\n",
    "    Args:\n",
    "        path: The path to the PDF file.\n",
    "        chunk_size: The desired size of each text chunk.\n",
    "        chunk_overlap: The amount of overlap between consecutive chunks.\n",
    "\n",
    "    Returns:\n",
    "        A FAISS vector store containing the encoded book content.\n",
    "    \"\"\"\n",
    "\n",
    "    # Load PDF documents\n",
    "    loader = PyPDFLoader(path)\n",
    "    documents = loader.load()\n",
    "\n",
    "    # Split documents into chunks\n",
    "    text_splitter = RecursiveCharacterTextSplitter(\n",
    "        chunk_size=chunk_size, chunk_overlap=chunk_overlap, length_function=len\n",
    "    )\n",
    "    texts = text_splitter.split_documents(documents)\n",
    "    cleaned_texts = replace_t_with_space(texts)\n",
    "\n",
    "    vectorstore = prepare_vector_store(cleaned_texts)\n",
    "\n",
    "    return vectorstore"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create HyPE vector store\n",
    "\n",
    "Now we process the PDF and store its embeddings.\n",
    "This step initializes the FAISS vector store with the encoded document."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 97/97 [00:22<00:00, 4.40it/s]\n"
     ]
    }
   ],
   "source": [
    "# Chunk size can be quite large with HyPE, as we are not losing precision with more\n",
    "# information. For production, test how exhaustive your model is in generating a sufficient\n",
    "# number of questions per chunk. This will mostly depend on your information density.\n",
    "chunks_vector_store = encode_pdf(PATH, chunk_size=CHUNK_SIZE, chunk_overlap=CHUNK_OVERLAP)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create retriever\n",
    "\n",
    "Now we set up the retriever to fetch relevant chunks from the vector store.\n",
    "\n",
    "It retrieves the top `k=3` most relevant chunks based on query similarity."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "metadata": {},
   "outputs": [],
   "source": [
    "chunks_query_retriever = chunks_vector_store.as_retriever(search_kwargs={\"k\": 3})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Test retriever\n",
    "\n",
    "Now we test retrieval using a sample query.\n",
    "\n",
    "- Queries the vector store to find the most relevant chunks.\n",
    "- Deduplicates results to remove potentially repeated chunks.\n",
    "- Displays the retrieved context for inspection.\n",
    "\n",
    "This step verifies that the retriever returns meaningful and diverse information for the given question."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Context 1:\n",
      "Most of these climate changes are attributed to very small variations in Earth's orbit that \n",
      "change the amount of solar energy our planet receives. During the Holocene epoch, which \n",
      "began at the end of the last ice age, human societies f lourished, but the industrial era has seen \n",
      "unprecedented changes. \n",
      "Modern Observations \n",
      "Modern scientific observations indicate a rapid increase in global temperatures, sea levels, \n",
      "and extreme weather events. The Intergovernmental Panel on Climate Change (IPCC) has \n",
      "documented these changes extensively. Ice core samples, tree rings, and ocean sediments \n",
      "provide a historical record that scientists use to understand past climate conditions and \n",
      "predict future trends. The evidence overwhelmingly shows that recent changes are primarily \n",
      "driven by human activities, particularly the emission of greenhou se gases. \n",
      "Chapter 2: Causes of Climate Change \n",
      "Greenhouse Gases\n",
      "\n",
      "\n",
      "Context 2:\n",
      "driven by human activities, particularly the emission of greenhou se gases. \n",
      "Chapter 2: Causes of Climate Change \n",
      "Greenhouse Gases \n",
      "The primary cause of recent climate change is the increase in greenhouse gases in the \n",
      "atmosphere. Greenhouse gases, such as carbon dioxide (CO2), methane (CH4), and nitrous \n",
      "oxide (N2O), trap heat from the sun, creating a \"greenhouse effect.\" This effect is essential \n",
      "for life on Earth, as it keeps the planet warm enough to support life. However, human \n",
      "activities have intensified this natural process, leading to a warmer climate. \n",
      "Fossil Fuels \n",
      "Burning fossil fuels for energy releases large amounts of CO2. This includes coal, oil, and \n",
      "natural gas used for electricity, heating, and transportation. The industrial revolution marked \n",
      "the beginning of a significant increase in fossil fuel consumption, which continues to rise \n",
      "today. \n",
      "Coal\n",
      "\n",
      "\n",
      "Context 3:\n",
      "Understanding Climate Change \n",
      "Chapter 1: Introduction to Climate Change \n",
      "Climate change refers to significant, long -term changes in the global climate. The term \n",
      "\"global climate\" encompasses the planet's overall weather patterns, including temperature, \n",
      "precipitation, and wind patterns, over an extended period. Over the past cent ury, human \n",
      "activities, particularly the burning of fossil fuels and deforestation, have significantly \n",
      "contributed to climate change. \n",
      "Historical Context \n",
      "The Earth's climate has changed throughout history. Over the past 650,000 years, there have \n",
      "been seven cycles of glacial advance and retreat, with the abrupt end of the last ice age about \n",
      "11,700 years ago marking the beginning of the modern climate era and human civilization. \n",
      "Most of these climate changes are attributed to very small variations in Earth's orbit that \n",
      "change the amount of solar energy our planet receives. During the Holocene epoch, which\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "test_query = \"What is the main cause of climate change?\"\n",
    "context = retrieve_context_per_question(test_query, chunks_query_retriever)\n",
    "context = list(set(context))\n",
    "show_context(context)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Evaluate results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'questions': ['1. **Multiple Choice: Causes of Climate Change**',\n",
       "  '   - What is the primary cause of the current climate change trend?',\n",
       "  '     A) Solar radiation variations',\n",
       "  '     B) Natural cycles of the Earth',\n",
       "  '     C) Human activities, such as burning fossil fuels',\n",
       "  '     D) Volcanic eruptions',\n",
       "  '',\n",
       "  '2. **True or False: Impact on Biodiversity**',\n",
       "  '   - True or False: Climate change does not have any significant impact on the migration patterns and extinction rates of various species.',\n",
       "  '',\n",
       "  '3. **Short Answer: Mitigation Strategies**',\n",
       "  '   - What are two effective strategies that can be implemented at a community level to mitigate the effects of climate change?',\n",
       "  '',\n",
       "  '4. **Matching: Climate Change Effects**',\n",
       "  '   - Match the following effects of climate change (numbered) with their likely consequences (lettered).',\n",
       "  '     1. Rising sea levels',\n",
       "  '     2. Increased frequency of extreme weather events',\n",
       "  '     3. Melting polar ice caps',\n",
       "  '     4. Ocean acidification',\n",
       "  '     ',\n",
       "  '     A) Displacement of coastal communities',\n",
       "  '     B) Loss of marine biodiversity',\n",
       "  '     C) Increased global temperatures',\n",
       "  '     D) More frequent and severe hurricanes and floods',\n",
       "  '',\n",
       "  '5. **Essay: International Cooperation**',\n",
       "  '   - Discuss the importance of international cooperation in combating climate change. Include examples of successful global agreements or initiatives and explain how they have contributed to addressing climate change.'],\n",
       " 'results': ['```json\\n{\\n \"Relevance\": 5,\\n \"Completeness\": 4,\\n \"Conciseness\": 3\\n}\\n```',\n",
       "  '```json\\n{\\n \"Relevance\": 5,\\n \"Completeness\": 4,\\n \"Conciseness\": 3\\n}\\n```',\n",
       "  '```json\\n{\\n \"Relevance\": 2,\\n \"Completeness\": 1,\\n \"Conciseness\": 2\\n}\\n```',\n",
       "  '```json\\n{\\n \"Relevance\": 4,\\n \"Completeness\": 3,\\n \"Conciseness\": 3\\n}\\n```',\n",
       "  '```json\\n{\\n \"Relevance\": 5,\\n \"Completeness\": 4,\\n \"Conciseness\": 3\\n}\\n```',\n",
       "  '```json\\n{\\n \"Relevance\": 1,\\n \"Completeness\": 1,\\n \"Conciseness\": 2\\n}\\n```',\n",
       "  '```json\\n{\\n \"Relevance\": 1,\\n \"Completeness\": 1,\\n \"Conciseness\": 2\\n}\\n```',\n",
       "  '```json\\n{\\n \"Relevance\": 5,\\n \"Completeness\": 4,\\n \"Conciseness\": 3\\n}\\n```',\n",
       "  '```json\\n{\\n \"Relevance\": 5,\\n \"Completeness\": 4,\\n \"Conciseness\": 3\\n}\\n```',\n",
       "  '```json\\n{\\n \"Relevance\": 2,\\n \"Completeness\": 1,\\n \"Conciseness\": 2\\n}\\n```',\n",
       "  '```json\\n{\\n \"Relevance\": 2,\\n \"Completeness\": 1,\\n \"Conciseness\": 2\\n}\\n```',\n",
       "  '```json\\n{\\n \"Relevance\": 4,\\n \"Completeness\": 3,\\n \"Conciseness\": 2\\n}\\n```',\n",
       "  '```json\\n{\\n \"Relevance\": 2,\\n \"Completeness\": 1,\\n \"Conciseness\": 2\\n}\\n```',\n",
       "  '```json\\n{\\n \"Relevance\": 4,\\n \"Completeness\": 3,\\n \"Conciseness\": 3\\n}\\n```',\n",
       "  '```json\\n{\\n \"Relevance\": 4,\\n \"Completeness\": 2,\\n \"Conciseness\": 3\\n}\\n```',\n",
       "  '```json\\n{\\n \"Relevance\": 5,\\n \"Completeness\": 4,\\n \"Conciseness\": 3\\n}\\n```',\n",
       "  '```json\\n{\\n \"Relevance\": 5,\\n \"Completeness\": 4,\\n \"Conciseness\": 3\\n}\\n```',\n",
       "  '```json\\n{\\n \"Relevance\": 5,\\n \"Completeness\": 4,\\n \"Conciseness\": 3\\n}\\n```',\n",
       "  '```json\\n{\\n \"Relevance\": 5,\\n \"Completeness\": 4,\\n \"Conciseness\": 3\\n}\\n```',\n",
       "  '```json\\n{\\n \"Relevance\": 4,\\n \"Completeness\": 3,\\n \"Conciseness\": 3\\n}\\n```',\n",
       "  '```json\\n{\\n \"Relevance\": 4,\\n \"Completeness\": 3,\\n \"Conciseness\": 2\\n}\\n```',\n",
       "  '```json\\n{\\n \"Relevance\": 4,\\n \"Completeness\": 3,\\n \"Conciseness\": 3\\n}\\n```',\n",
       "  '```json\\n{\\n \"Relevance\": 5,\\n \"Completeness\": 4,\\n \"Conciseness\": 3\\n}\\n```',\n",
       "  '```json\\n{\\n \"Relevance\": 5,\\n \"Completeness\": 4,\\n \"Conciseness\": 3\\n}\\n```',\n",
       "  '```json\\n{\\n \"Relevance\": 2,\\n \"Completeness\": 1,\\n \"Conciseness\": 2\\n}\\n```',\n",
       "  '```json\\n{\\n \"Relevance\": 4,\\n \"Completeness\": 3,\\n \"Conciseness\": 3\\n}\\n```',\n",
       "  '```json\\n{\\n \"Relevance\": 4,\\n \"Completeness\": 2,\\n \"Conciseness\": 3\\n}\\n```'],\n",
       " 'average_scores': None}"
      ]
     },
     "execution_count": 76,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "evaluate_rag(chunks_query_retriever)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
all_rag_techniques_runnable_scripts/HyPE_Hypothetical_Prompt_Embedding.py (new file)
@@ -0,0 +1,203 @@
import os
import sys
import argparse
import time
import faiss
from dotenv import load_dotenv
from tqdm import tqdm
from concurrent.futures import ThreadPoolExecutor, as_completed
from langchain_community.docstore.in_memory import InMemoryDocstore

# Add the parent directory to the path since we work with notebooks
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), '..')))

from helper_functions import *
from evaluation.evalute_rag import *

# Load environment variables from a .env file (e.g., OpenAI API key)
load_dotenv()
if os.getenv('OPENAI_API_KEY'):
    # Guard the assignment: os.environ values must be strings, so only set
    # the key when it is actually present in the environment/.env file.
    os.environ["OPENAI_API_KEY"] = os.getenv('OPENAI_API_KEY')

class HyPE:
    """
    A class to handle the HyPE RAG process, which enhances document chunking by
    generating hypothetical questions as proxies for retrieval.
    """

    def __init__(self, path, chunk_size=1000, chunk_overlap=200, n_retrieved=3):
        """
        Initializes the HyPE-based RAG retriever by encoding the PDF document with
        hypothetical prompt embeddings.

        Args:
            path (str): Path to the PDF file to encode.
            chunk_size (int): Size of each text chunk (default: 1000).
            chunk_overlap (int): Overlap between consecutive chunks (default: 200).
            n_retrieved (int): Number of chunks to retrieve for each query (default: 3).
        """
        print("\n--- Initializing HyPE RAG Retriever ---")

        # Encode the PDF document into a FAISS vector store using hypothetical prompt embeddings
        start_time = time.time()
        self.vector_store = self.encode_pdf(path, chunk_size=chunk_size, chunk_overlap=chunk_overlap)
        self.time_records = {'Chunking': time.time() - start_time}
        print(f"Chunking Time: {self.time_records['Chunking']:.2f} seconds")

        # Create a retriever from the vector store
        self.chunks_query_retriever = self.vector_store.as_retriever(search_kwargs={"k": n_retrieved})

    def generate_hypothetical_prompt_embeddings(self, chunk_text):
        """
        Uses an LLM to generate multiple hypothetical questions for a single chunk.
        These questions act as 'proxies' for the chunk during retrieval.

        Parameters:
        chunk_text (str): Text contents of the chunk.

        Returns:
        tuple: (Original chunk text, List of embedding vectors generated from the questions)
        """
        llm = ChatOpenAI(temperature=0, model_name="gpt-4o-mini")
        embedding_model = OpenAIEmbeddings(model="text-embedding-3-small")

        question_gen_prompt = PromptTemplate.from_template(
            "Analyze the input text and generate essential questions that, when answered, \
            capture the main points of the text. Each question should be one line, \
            without numbering or prefixes.\n\n \
            Text:\n{chunk_text}\n\nQuestions:\n"
        )
        question_chain = question_gen_prompt | llm | StrOutputParser()

        # Parse questions from response
        questions = question_chain.invoke({"chunk_text": chunk_text}).replace("\n\n", "\n").split("\n")

        return chunk_text, embedding_model.embed_documents(questions)

    def prepare_vector_store(self, chunks):
        """
        Creates and populates a FAISS vector store using hypothetical prompt embeddings.

        Parameters:
        chunks (List[Document]): LangChain document chunks to be embedded and stored.

        Returns:
        FAISS: A FAISS vector store containing the embedded text chunks.
        """
        vector_store = None  # Wait to initialize to determine vector size

        with ThreadPoolExecutor() as pool:
            # Parallelized embedding generation
            futures = [pool.submit(self.generate_hypothetical_prompt_embeddings, c) for c in chunks]

            for f in tqdm(as_completed(futures), total=len(chunks)):
                chunk, vectors = f.result()  # Retrieve processed chunk and embeddings

                # Initialize FAISS store once vector size is known
                if vector_store is None:
                    vector_store = FAISS(
                        embedding_function=OpenAIEmbeddings(model="text-embedding-3-small"),
                        index=faiss.IndexFlatL2(len(vectors[0])),
                        docstore=InMemoryDocstore(),
                        index_to_docstore_id={}
                    )

                # Store multiple vector representations per chunk
                chunks_with_embedding_vectors = [(chunk.page_content, vec) for vec in vectors]
                vector_store.add_embeddings(chunks_with_embedding_vectors)

        return vector_store

    def encode_pdf(self, path, chunk_size=1000, chunk_overlap=200):
        """
        Encodes a PDF document into a vector store using hypothetical prompt embeddings.

        Args:
            path: The path to the PDF file.
            chunk_size: The size of each text chunk.
            chunk_overlap: The overlap between consecutive chunks.

        Returns:
            A FAISS vector store containing the encoded book content.
        """
        # Load PDF documents
        loader = PyPDFLoader(path)
        documents = loader.load()

        # Split documents into chunks
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=chunk_size, chunk_overlap=chunk_overlap, length_function=len
        )
        texts = text_splitter.split_documents(documents)
        cleaned_texts = replace_t_with_space(texts)

        return self.prepare_vector_store(cleaned_texts)

    def run(self, query):
        """
        Retrieves and displays the context for the given query.

        Args:
            query (str): The query to retrieve context for.

        Returns:
            None
        """
        # Measure retrieval time
        start_time = time.time()
        context = retrieve_context_per_question(query, self.chunks_query_retriever)
        self.time_records['Retrieval'] = time.time() - start_time
        print(f"Retrieval Time: {self.time_records['Retrieval']:.2f} seconds")

        # Deduplicate context and display results
        context = list(set(context))
        show_context(context)


def validate_args(args):
    if args.chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer.")
    if args.chunk_overlap < 0:
        raise ValueError("chunk_overlap must be a non-negative integer.")
    if args.n_retrieved <= 0:
        raise ValueError("n_retrieved must be a positive integer.")
    return args


def parse_args():
    parser = argparse.ArgumentParser(description="Encode a PDF document and test a HyPE-based RAG system.")
    parser.add_argument("--path", type=str, default="../data/Understanding_Climate_Change.pdf",
                        help="Path to the PDF file to encode.")
    parser.add_argument("--chunk_size", type=int, default=1000,
                        help="Size of each text chunk (default: 1000).")
    parser.add_argument("--chunk_overlap", type=int, default=200,
                        help="Overlap between consecutive chunks (default: 200).")
    parser.add_argument("--n_retrieved", type=int, default=3,
                        help="Number of chunks to retrieve for each query (default: 3).")
    parser.add_argument("--query", type=str, default="What is the main cause of climate change?",
                        help="Query to test the retriever (default: 'What is the main cause of climate change?').")
    parser.add_argument("--evaluate", action="store_true",
                        help="Whether to evaluate the retriever's performance (default: False).")

    return validate_args(parser.parse_args())


def main(args):
    # Initialize the HyPE-based RAG Retriever
    hyperag = HyPE(
        path=args.path,
        chunk_size=args.chunk_size,
        chunk_overlap=args.chunk_overlap,
        n_retrieved=args.n_retrieved
    )

    # Retrieve context based on the query
    hyperag.run(args.query)

    # Evaluate the retriever's performance on the query (if requested)
    if args.evaluate:
        evaluate_rag(hyperag.chunks_query_retriever)


if __name__ == '__main__':
    # Call the main function with parsed arguments
    main(parse_args())
images/hype.svg (new file; image diff suppressed, 98 KiB)