nv-ingest/examples/langchain_multimodal_rag.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "7efe0f92-fdbb-4471-b74c-5bdaafed8102",
   "metadata": {},
   "source": [
    "# Multimodal RAG with LangChain"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "91ece9e3-155a-44f4-81e5-2f9492c62a2f",
   "metadata": {},
   "source": [
    "This notebook shows how to use LangChain to perform retrieval-augmented generation (RAG) over the table, chart, and text content that NV-Ingest extracts from PDFs."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c6905d11-0ec3-43c8-961b-24cb52e36bfe",
   "metadata": {},
   "source": [
    "**Note:** To run this notebook, you'll need the NV-Ingest microservice running along with all of the other included microservices. To do this, make sure all of the services are uncommented in [docker-compose.yaml](https://github.com/NVIDIA/nv-ingest/blob/main/docker-compose.yaml) and follow the [quickstart guide](https://github.com/NVIDIA/nv-ingest?tab=readme-ov-file#quickstart) to start everything up. You'll also need the NV-Ingest Python client installed, as demonstrated [here](https://github.com/NVIDIA/nv-ingest?tab=readme-ov-file#step-2-installing-python-dependencies)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "81014734-f765-48fc-8fc2-4c19f5f28eae",
   "metadata": {},
   "source": [
    "To start, make sure LangChain and pymilvus are installed and up to date."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bacbe052-4429-4c0a-8b1e-309ac55ad8fb",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install -qU langchain langchain_community langchain-nvidia-ai-endpoints langchain_milvus pymilvus"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d888ba26-04cf-4577-81a3-5bcd537fc2f6",
   "metadata": {},
   "source": [
    "Then, we'll use NV-Ingest's Ingestor interface to extract the tables and charts from a test PDF, embed them, and upload them to our Milvus vector database (VDB)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "32017922-9b9c-48b9-86ab-6319377fcce8",
   "metadata": {},
   "outputs": [],
   "source": [
    "from nv_ingest_client.client import Ingestor\n",
    "\n",
    "ingestor = (\n",
    "    Ingestor(message_client_hostname=\"localhost\")\n",
    "    .files(\"../data/multimodal_test.pdf\")\n",
    "    .extract(\n",
    "        extract_text=False,\n",
    "        extract_tables=True,\n",
    "        extract_images=False,\n",
    "    ).embed(\n",
    "        text=False,\n",
    "        tables=True,\n",
    "    ).vdb_upload()\n",
    ")\n",
    "\n",
    "results = ingestor.ingest()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "02131711-31bf-4536-81b7-8c464c7473e3",
   "metadata": {},
   "source": [
    "Now the table and chart content has been extracted and stored in the Milvus VDB along with its embeddings. Next, we'll connect LangChain to Milvus and create a vector store so that we can query our extraction results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "53957974-c688-4521-8c61-09f2649d5d53",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings\n",
    "from langchain_milvus import Milvus\n",
    "\n",
    "embedding_function = NVIDIAEmbeddings(base_url=\"http://localhost:8012/v1\")\n",
    "\n",
    "vectorstore = Milvus(\n",
    "    embedding_function=embedding_function,\n",
    "    collection_name=\"nv_ingest_collection\",\n",
    "    primary_field=\"pk\",\n",
    "    vector_field=\"vector\",\n",
    "    text_field=\"text\",\n",
    "    connection_args={\"uri\": \"http://localhost:19530\"},\n",
    ")\n",
    "retriever = vectorstore.as_retriever()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b87111b5-e5a8-45a0-9663-2ae6d9ea2ab6",
   "metadata": {},
   "source": [
    "Then, we'll create a RAG chain using [llama-3.1-405b-instruct](https://build.nvidia.com/meta/llama-3_1-405b-instruct) that we can use to query our PDF in natural language."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "b4c9e109-395c-40e2-a1a5-e0c0ef217e24",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "from langchain_nvidia_ai_endpoints import ChatNVIDIA\n",
    "\n",
    "# TODO: Add your NVIDIA API key\n",
    "os.environ[\"NVIDIA_API_KEY\"] = \"[YOUR NVIDIA API KEY HERE]\"\n",
    "\n",
    "llm = ChatNVIDIA(model=\"meta/llama-3.1-405b-instruct\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "77fd17f8-eac0-4457-b6fb-6e5c8ce90c84",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_core.prompts import PromptTemplate\n",
    "from langchain_core.runnables import RunnablePassthrough\n",
    "from langchain_core.output_parsers import StrOutputParser\n",
    "\n",
    "template = (\n",
    "    \"You are an assistant for question-answering tasks. \"\n",
    "    \"Use the following pieces of retrieved context to answer \"\n",
    "    \"the question. If you don't know the answer, say that you \"\n",
    "    \"don't know. Keep the answer concise.\"\n",
    "    \"\\n\\n\"\n",
    "    \"{context}\"\n",
    "    # Blank line separates the retrieved context from the question.\n",
    "    \"\\n\\n\"\n",
    "    \"Question: {question}\"\n",
    ")\n",
    "\n",
    "prompt = PromptTemplate.from_template(template)\n",
    "\n",
    "rag_chain = (\n",
    "    {\"context\": retriever, \"question\": RunnablePassthrough()}\n",
    "    | prompt\n",
    "    | llm\n",
    "    | StrOutputParser()\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cc2ee8fb-a154-46c9-9181-29a035fdcfbb",
   "metadata": {},
   "source": [
    "Now we can ask questions about our PDF."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "b547a19a-9ada-4a40-a246-6d7bc4d24482",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'The dog is chasing a squirrel in the front yard.'"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "rag_chain.invoke(\"What is the dog doing and where?\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.15"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}