llm-examples/learn/generation/langchain/handbook/09-langchain-streaming/09-langchain-streaming.ipynb

{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/learn/generation/langchain/handbook/08-langchain-retrieval-agent.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/learn/generation/langchain/handbook/08-langchain-retrieval-agent.ipynb)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### [LangChain Handbook](https://pinecone.io/learn/langchain)\n",
    "\n",
    "# Streaming\n",
    "\n",
    "For LLMs, streaming has become an increasingly popular feature. The idea is to rapidly return tokens as an LLM is generating them, rather than waiting for a full response to be created before returning anything.\n",
    "\n",
    "Streaming is actually very easy to implement for simple use-cases, but it can get complicated when we start including things like Agents which have their own logic running which can block our attempts at streaming. Fortunately, we can make it work — it just requires a little extra effort.\n",
    "\n",
    "We'll start easy by implementing streaming to the terminal for LLMs, but by the end of the notebook we'll be handling the more complex task of streaming via FastAPI for Agents.\n",
    "\n",
    "First, let's install all of the libraries we'll be using."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.0\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.2.1\u001b[0m\n",
      "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "!pip install -qU \\\n",
    "    openai==0.28.0 \\\n",
    "    langchain==0.0.301 \\\n",
    "    fastapi==0.103.1 \\\n",
    "    \"uvicorn[standard]\"==0.23.2"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## LLM Streaming to Stdout"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The simplest form of streaming is to simply \"print\" the tokens as they're generated. To set this up we need to initialize an LLM (one that supports streaming, not all do) with two specific parameters:\n",
    "\n",
    "* `streaming=True`, to enable streaming\n",
    "* `callbacks=[SomeCallBackHere()]`, where we pass a LangChain callback class (or list containing multiple).\n",
    "\n",
    "The `streaming` parameter is self-explanatory. The `callbacks` parameter and callback classes less so — essentially they act as little bits of code that do something as each token from our LLM is generated. As mentioned, the simplest form of streaming is to print the tokens as they're being generated, like with the `StreamingStdOutCallbackHandler`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "from langchain.chat_models import ChatOpenAI\n",
    "from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n",
    "\n",
    "os.environ[\"OPENAI_API_KEY\"] = os.getenv(\"OPENAI_API_KEY\") or \"YOUR_API_KEY\"\n",
    "\n",
    "llm = ChatOpenAI(\n",
    "    openai_api_key=os.getenv(\"OPENAI_API_KEY\"),\n",
    "    temperature=0.0,\n",
    "    model_name=\"gpt-3.5-turbo\",\n",
    "    streaming=True,  # ! important\n",
    "    callbacks=[StreamingStdOutCallbackHandler()]  # ! important\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now if we run the LLM we'll see the response being _streamed_."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Once upon a time, in a small village nestled deep within a lush forest, there lived a young girl named Lily. She was known for her vibrant red hair, sparkling green eyes, and a heart full of curiosity. Lily had always been fascinated by the stories her grandmother would tell her about a magical creature called the Moonlight Unicorn.\n",
      "\n",
      "Legend had it that the Moonlight Unicorn was a rare and majestic creature that only appeared during the full moon. Its coat shimmered like silver, and its horn glowed with a soft, ethereal light. The unicorn was said to possess incredible powers, capable of granting wishes and bringing good fortune to those who encountered it.\n",
      "\n",
      "Driven by her desire to see the Moonlight Unicorn, Lily embarked on a journey through the enchanted forest. Armed with her grandmother's stories and a sense of adventure, she ventured deeper into the woods, following the faint whispers of the wind.\n",
      "\n",
      "As she wandered through the forest, Lily encountered various magical creatures. She met a mischievous sprite who led her to a hidden waterfall, where she discovered a family of playful water nymphs. They taught her the secrets of the forest and shared their wisdom about the Moonlight Unicorn.\n",
      "\n",
      "With newfound knowledge, Lily continued her quest, guided by the moon's gentle glow. She encountered talking animals, wise old trees, and even a friendly dragon who offered her protection on her journey. Each encounter brought her closer to her goal, and her heart filled with hope.\n",
      "\n",
      "Days turned into weeks, and weeks turned into months, but Lily never lost faith. She faced countless challenges along the way, from treacherous paths to dark and eerie caves. Yet, she pressed on, fueled by her unwavering determination to find the Moonlight Unicorn.\n",
      "\n",
      "One fateful night, as the full moon illuminated the forest, Lily stumbled upon a clearing bathed in a soft, silver light. There, standing before her, was the Moonlight Unicorn. Its coat shimmered like stardust, and its horn emitted a gentle glow that illuminated the entire clearing.\n",
      "\n",
      "Overwhelmed with joy, Lily approached the unicorn cautiously, her heart pounding with excitement. The unicorn, sensing her pure intentions, greeted her with a gentle nudge. Lily reached out and touched the unicorn's horn, feeling a surge of warmth and magic course through her veins.\n",
      "\n",
      "In that moment, the Moonlight Unicorn spoke to Lily, its voice as soft as a whisper. It thanked her for her unwavering determination and rewarded her with a single wish. Lily, without hesitation, wished for the forest to be protected and preserved for generations to come, ensuring its magic would never fade away.\n",
      "\n",
      "As her wish was granted, the forest came alive with renewed energy. The trees stood taller, the animals thrived, and the enchantment of the forest grew stronger than ever before. Lily knew that her journey had not only brought her closer to the Moonlight Unicorn but had also helped preserve the magic of the forest for all to enjoy.\n",
      "\n",
      "With a heart full of gratitude, Lily bid farewell to the Moonlight Unicorn, knowing that she had fulfilled her purpose. She returned to her village, where she shared her incredible adventure with her grandmother and the villagers, inspiring them to cherish and protect the forest.\n",
      "\n",
      "From that day forward, Lily became known as the Guardian of the Forest, and her story was passed down through generations. The enchanted forest thrived, and the Moonlight Unicorn continued to bless those who sought its presence.\n",
      "\n",
      "And so, the tale of Lily and the Moonlight Unicorn became a legend, reminding people of the power of determination, the magic of nature, and the importance of preserving the wonders that surround us."
     ]
    },
    {
     "data": {
      "text/plain": [
       "AIMessageChunk(content=\"Once upon a time, in a small village nestled deep within a lush forest, there lived a young girl named Lily. She was known for her vibrant red hair, sparkling green eyes, and a heart full of curiosity. Lily had always been fascinated by the stories her grandmother would tell her about a magical creature called the Moonlight Unicorn.\\n\\nLegend had it that the Moonlight Unicorn was a rare and majestic creature that only appeared during the full moon. Its coat shimmered like silver, and its horn glowed with a soft, ethereal light. The unicorn was said to possess incredible powers, capable of granting wishes and bringing good fortune to those who encountered it.\\n\\nDriven by her desire to see the Moonlight Unicorn, Lily embarked on a journey through the enchanted forest. Armed with her grandmother's stories and a sense of adventure, she ventured deeper into the woods, following the faint whispers of the wind.\\n\\nAs she wandered through the forest, Lily encountered various magical creatures. She met a mischievous sprite who led her to a hidden waterfall, where she discovered a family of playful water nymphs. They taught her the secrets of the forest and shared their wisdom about the Moonlight Unicorn.\\n\\nWith newfound knowledge, Lily continued her quest, guided by the moon's gentle glow. She encountered talking animals, wise old trees, and even a friendly dragon who offered her protection on her journey. Each encounter brought her closer to her goal, and her heart filled with hope.\\n\\nDays turned into weeks, and weeks turned into months, but Lily never lost faith. She faced countless challenges along the way, from treacherous paths to dark and eerie caves. Yet, she pressed on, fueled by her unwavering determination to find the Moonlight Unicorn.\\n\\nOne fateful night, as the full moon illuminated the forest, Lily stumbled upon a clearing bathed in a soft, silver light. There, standing before her, was the Moonlight Unicorn. Its coat shimmered like stardust, and its horn emitted a gentle glow that illuminated the entire clearing.\\n\\nOverwhelmed with joy, Lily approached the unicorn cautiously, her heart pounding with excitement. The unicorn, sensing her pure intentions, greeted her with a gentle nudge. Lily reached out and touched the unicorn's horn, feeling a surge of warmth and magic course through her veins.\\n\\nIn that moment, the Moonlight Unicorn spoke to Lily, its voice as soft as a whisper. It thanked her for her unwavering determination and rewarded her with a single wish. Lily, without hesitation, wished for the forest to be protected and preserved for generations to come, ensuring its magic would never fade away.\\n\\nAs her wish was granted, the forest came alive with renewed energy. The trees stood taller, the animals thrived, and the enchantment of the forest grew stronger than ever before. Lily knew that her journey had not only brought her closer to the Moonlight Unicorn but had also helped preserve the magic of the forest for all to enjoy.\\n\\nWith a heart full of gratitude, Lily bid farewell to the Moonlight Unicorn, knowing that she had fulfilled her purpose. She returned to her village, where she shared her incredible adventure with her grandmother and the villagers, inspiring them to cherish and protect the forest.\\n\\nFrom that day forward, Lily became known as the Guardian of the Forest, and her story was passed down through generations. The enchanted forest thrived, and the Moonlight Unicorn continued to bless those who sought its presence.\\n\\nAnd so, the tale of Lily and the Moonlight Unicorn became a legend, reminding people of the power of determination, the magic of nature, and the importance of preserving the wonders that surround us.\", additional_kwargs={}, example=False)"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from langchain.schema import HumanMessage\n",
    "\n",
    "# create messages to be passed to chat LLM\n",
    "messages = [HumanMessage(content=\"tell me a long story\")]\n",
    "\n",
    "llm(messages)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "That was surprisingly easy, but things begin to get much more complicated as soon as we begin using agents. Let's first initialize an agent."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.memory import ConversationBufferWindowMemory\n",
    "from langchain.agents import load_tools, AgentType, initialize_agent\n",
    "\n",
    "# initialize conversational memory\n",
    "memory = ConversationBufferWindowMemory(\n",
    "    memory_key=\"chat_history\",\n",
    "    k=5,\n",
    "    return_messages=True,\n",
    "    output_key=\"output\"\n",
    ")\n",
    "\n",
    "# create a single tool to see how it impacts streaming\n",
    "tools = load_tools([\"llm-math\"], llm=llm)\n",
    "\n",
    "# initialize the agent\n",
    "agent = initialize_agent(\n",
    "    agent=AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION,\n",
    "    tools=tools,\n",
    "    llm=llm,\n",
    "    memory=memory,\n",
    "    verbose=True,\n",
    "    max_iterations=3,\n",
    "    early_stopping_method=\"generate\",\n",
    "    return_intermediate_steps=False\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We already added our `StreamingStdOutCallbackHandler` to the agent as we initialized the agent with the same `llm` as we created with that callback. So let's see what we get when running the agent."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
      "{\n",
      "    \"action\": \"Final Answer\",\n",
      "    \"action_input\": \"I'm an AI language model, so I don't have feelings, but I'm here to help you. How can I assist you today?\"\n",
      "}\u001b[32;1m\u001b[1;3m{\n",
      "    \"action\": \"Final Answer\",\n",
      "    \"action_input\": \"I'm an AI language model, so I don't have feelings, but I'm here to help you. How can I assist you today?\"\n",
      "}\u001b[0m\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{'input': 'Hello, how are you?',\n",
       " 'chat_history': [],\n",
       " 'output': \"I'm an AI language model, so I don't have feelings, but I'm here to help you. How can I assist you today?\"}"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "prompt = \"Hello, how are you?\"\n",
    "\n",
    "agent(prompt)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Not bad, but we do now have the issue of streaming the _entire_ output from the LLM. Because we're using an agent, the LLM is instructed to output the JSON format we can see here so that the agent logic can handle tool usage, multiple \"thinking\" steps, and so on. For example, if we ask a math question we'll see this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
      "{\n",
      "    \"action\": \"Calculator\",\n",
      "    \"action_input\": \"sqrt(71)\"\n",
      "}\u001b[32;1m\u001b[1;3m{\n",
      "    \"action\": \"Calculator\",\n",
      "    \"action_input\": \"sqrt(71)\"\n",
      "}\u001b[0m```text\n",
      "sqrt(71)\n",
      "```\n",
      "...numexpr.evaluate(\"sqrt(71)\")...\n",
      "\n",
      "Observation: \u001b[36;1m\u001b[1;3mAnswer: 8.426149773176359\u001b[0m\n",
      "Thought:{\n",
      "    \"action\": \"Final Answer\",\n",
      "    \"action_input\": \"The square root of 71 is approximately 8.426149773176359.\"\n",
      "}\u001b[32;1m\u001b[1;3m{\n",
      "    \"action\": \"Final Answer\",\n",
      "    \"action_input\": \"The square root of 71 is approximately 8.426149773176359.\"\n",
      "}\u001b[0m\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{'input': 'what is the square root of 71?',\n",
       " 'chat_history': [HumanMessage(content='Hello, how are you?', additional_kwargs={}, example=False),\n",
       "  AIMessage(content=\"I'm an AI language model, so I don't have feelings, but I'm here to help you. How can I assist you today?\", additional_kwargs={}, example=False),\n",
       "  HumanMessage(content='what is the square root of 71?', additional_kwargs={}, example=False),\n",
       "  AIMessage(content='The square root of 71 is approximately 8.426149773176359.', additional_kwargs={}, example=False),\n",
       "  HumanMessage(content='what is the square root of 71?', additional_kwargs={}, example=False),\n",
       "  AIMessage(content='The square root of 71 is approximately 8.426149773176359.', additional_kwargs={}, example=False)],\n",
       " 'output': 'The square root of 71 is approximately 8.426149773176359.'}"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "agent(\"what is the square root of 71?\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It's interesting to see during development but we'll want to clean this streaming up a little in any actual use-case. For that we can go with two approaches — either we build a custom callback handler, or use a purpose built callback handler from LangChain (as usual, LangChain has something for everything). Let's first try LangChain's purpose-built `FinalStreamingStdOutCallbackHandler`.\n",
    "\n",
    "We will overwrite the existing `callbacks` attribute found here:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "agent.agent.llm_chain.llm"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With the new callback handler:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.callbacks.streaming_stdout_final_only import (\n",
    "    FinalStreamingStdOutCallbackHandler,\n",
    ")\n",
    "\n",
    "agent.agent.llm_chain.llm.callbacks = [\n",
    "    FinalStreamingStdOutCallbackHandler(\n",
    "        answer_prefix_tokens=[\"Final\", \"Answer\"]\n",
    "    )\n",
    "]"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's try it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
      "\u001b[32;1m\u001b[1;3m{\n",
      "    \"action\": \"Calculator\",\n",
      "    \"action_input\": \"sqrt(71)\"\n",
      "}\u001b[0m```text\n",
      "sqrt(71)\n",
      "```\n",
      "...numexpr.evaluate(\"sqrt(71)\")...\n",
      "\n",
      "Observation: \u001b[36;1m\u001b[1;3mAnswer: 8.426149773176359\u001b[0m\n",
      "Thought:\",\n",
      "    \"action_input\": \"The square root of 71 is approximately 8.426149773176359.\"\n",
      "}\u001b[32;1m\u001b[1;3m{\n",
      "    \"action\": \"Final Answer\",\n",
      "    \"action_input\": \"The square root of 71 is approximately 8.426149773176359.\"\n",
      "}\u001b[0m\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{'input': 'what is the square root of 71?',\n",
       " 'chat_history': [HumanMessage(content='Hello, how are you?', additional_kwargs={}, example=False),\n",
       "  AIMessage(content=\"I'm an AI language model, so I don't have feelings, but I'm here to help you. How can I assist you today?\", additional_kwargs={}, example=False),\n",
       "  HumanMessage(content='what is the square root of 71?', additional_kwargs={}, example=False),\n",
       "  AIMessage(content='The square root of 71 is approximately 8.426149773176359.', additional_kwargs={}, example=False),\n",
       "  HumanMessage(content='what is the square root of 71?', additional_kwargs={}, example=False),\n",
       "  AIMessage(content='The square root of 71 is approximately 8.426149773176359.', additional_kwargs={}, example=False),\n",
       "  HumanMessage(content='what is the square root of 71?', additional_kwargs={}, example=False),\n",
       "  AIMessage(content='The square root of 71 is approximately 8.426149773176359.', additional_kwargs={}, example=False),\n",
       "  HumanMessage(content='what is the square root of 71?', additional_kwargs={}, example=False),\n",
       "  AIMessage(content='The square root of 71 is approximately 8.426149773176359.', additional_kwargs={}, example=False)],\n",
       " 'output': 'The square root of 71 is approximately 8.426149773176359.'}"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "agent(\"what is the square root of 71?\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Not quite there, we should really clean up the `answer_prefix_tokens` argument but it is hard to get right. It's generally easier to use a custom callback handler like so:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys\n",
    "\n",
    "class CallbackHandler(StreamingStdOutCallbackHandler):\n",
    "    def __init__(self):\n",
    "        self.content: str = \"\"\n",
    "        self.final_answer: bool = False\n",
    "\n",
    "    def on_llm_new_token(self, token: str, **kwargs: any) -> None:\n",
    "        self.content += token\n",
    "        if \"Final Answer\" in self.content:\n",
    "            # now we're in the final answer section, but don't print yet\n",
    "            self.final_answer = True\n",
    "            self.content = \"\"\n",
    "        if self.final_answer:\n",
    "            if '\"action_input\": \"' in self.content:\n",
    "                if token not in [\"}\"]:\n",
    "                    sys.stdout.write(token)  # equal to `print(token, end=\"\")`\n",
    "                    sys.stdout.flush()\n",
    "\n",
    "agent.agent.llm_chain.llm.callbacks = [CallbackHandler()]"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's try again:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
      "\u001b[32;1m\u001b[1;3m{\n",
      "    \"action\": \"Calculator\",\n",
      "    \"action_input\": \"sqrt(71)\"\n",
      "}\u001b[0m```text\n",
      "sqrt(71)\n",
      "```\n",
      "...numexpr.evaluate(\"sqrt(71)\")...\n",
      "\n",
      "Observation: \u001b[36;1m\u001b[1;3mAnswer: 8.426149773176359\u001b[0m\n",
      "Thought: \"The square root of 71 is approximately 8.426149773176359.\"\n",
      "\u001b[32;1m\u001b[1;3m{\n",
      "    \"action\": \"Final Answer\",\n",
      "    \"action_input\": \"The square root of 71 is approximately 8.426149773176359.\"\n",
      "}\u001b[0m\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{'input': 'what is the square root of 71?',\n",
       " 'chat_history': [HumanMessage(content='what is the square root of 71?', additional_kwargs={}, example=False),\n",
       "  AIMessage(content='The square root of 71 is approximately 8.426149773176359.', additional_kwargs={}, example=False),\n",
       "  HumanMessage(content='what is the square root of 71?', additional_kwargs={}, example=False),\n",
       "  AIMessage(content='The square root of 71 is approximately 8.426149773176359.', additional_kwargs={}, example=False),\n",
       "  HumanMessage(content='what is the square root of 71?', additional_kwargs={}, example=False),\n",
       "  AIMessage(content='The square root of 71 is approximately 8.426149773176359.', additional_kwargs={}, example=False),\n",
       "  HumanMessage(content='what is the square root of 71?', additional_kwargs={}, example=False),\n",
       "  AIMessage(content='The square root of 71 is approximately 8.426149773176359.', additional_kwargs={}, example=False),\n",
       "  HumanMessage(content='what is the square root of 71?', additional_kwargs={}, example=False),\n",
       "  AIMessage(content='The square root of 71 is approximately 8.426149773176359.', additional_kwargs={}, example=False)],\n",
       " 'output': 'The square root of 71 is approximately 8.426149773176359.'}"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "agent(\"what is the square root of 71?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[<langchain.callbacks.streaming_stdout.StreamingStdOutCallbackHandler at 0x11fd01640>]"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "agent.agent.llm_chain.llm"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It isn't perfect, but this is getting better. Now, in most scenarios we're unlikely to simply be printing output to a terminal or notebook. When we want to do something more complex like stream this data through another API, we need to do things differently."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Using FastAPI with Agents"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In most cases we'll be placing our LLMs, Agents, etc behind something like an API. Let's add that into the mix and see how we can implement streaming for agents with FastAPI.\n",
    "\n",
    "First, we'll create a simple `main.py` script to contain our FastAPI logic. You can find it in the same GitHub repo location as this notebook ([here's a link]() TK add link).\n",
    "\n",
    "To run the API, navigate to the directory and run `uvicorn main:app --reload`. Once complete, you can confirm it is running by looking for the 🤙 status in the next cell output:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'status': '🤙'}"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import requests\n",
    "\n",
    "res = requests.get(\"http://localhost:8000/health\")\n",
    "res.json()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Unlike with our StdOut streaming, we now need to send our tokens to a generator function that feeds those tokens to FastAPI via a `StreamingResponse` object. To handle this we need to use async code, otherwise our generator will not begin emitting anything until _after_ generation is already complete.\n",
    "\n",
    "The `Queue` is accessed by our callback handler, as as each token is generated, it puts the token into the queue. Our generator function asyncronously checks for new tokens being added to the queue. As soon as the generator sees a token has been added, it gets the token and yields it to our `StreamingResponse`.\n",
    "\n",
    "To see it in action, we'll define a stream requests function called `get_stream`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_stream(query: str):\n",
    "    s = requests.Session()\n",
    "    with s.post(\n",
    "        \"http://localhost:8000/chat\",\n",
    "        stream=True,\n",
    "        json={\"text\": query}\n",
    "    ) as r:\n",
    "        for line in r.iter_content():\n",
    "            print(line.decode(\"utf-8\"), end=\"\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " \"Once upon a time, in a faraway land, there was a young prince named Edward. He lived in a magnificent castle surrounded by lush gardens and sparkling fountains. Edward was known for his kindness and generosity, and he was loved by all who knew him. One day, a mysterious old woman appeared at the castle gates. She was hunched over and carried a staff, and her eyes sparkled with wisdom. The old woman told Edward that she was a powerful sorceress, and she had come to test his character. She presented him with a challenge: if he could prove himself worthy, she would grant him a single wish. Edward eagerly accepted the challenge, and the old woman led him on a journey through enchanted forests, treacherous mountains, and dark caves. Along the way, Edward encountered many obstacles and faced numerous trials. He showed bravery in the face of danger, compassion towards those in need, and wisdom in his decision-making. Finally, after what felt like an eternity, Edward reached the end of his journey. The old woman stood before him, her eyes filled with pride. She granted him his wish, and Edward chose to use it to bring prosperity and happiness to his kingdom. From that day forward, Edward ruled with fairness and justice, and his kingdom flourished. The story of Prince Edward's journey and his ultimate triumph spread far and wide, inspiring others to be brave, kind, and wise. And so, the legend of Prince Edward lived on, reminding us all of the power of character and the importance of staying true to ourselves.\"\n"
     ]
    }
   ],
   "source": [
    "get_stream(\"tell me a long story\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "ml",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.12"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}