mirror of
https://github.com/pinecone-io/examples.git
synced 2023-10-11 20:04:54 +03:00
adding final content/tweaks from Cohere + Pinecone demo
@@ -1,345 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "view-in-github",
|
||||
"colab_type": "text"
|
||||
},
|
||||
"source": [
|
||||
"<a href=\"https://colab.research.google.com/github/pinecone-io/examples/blob/cohere-webinar-2205/integrations/cohere/webinar_classification_and_search/03_filtering.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"# Searching and Filtering"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "7jef4UKJwpdC"
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"!pip install cohere pinecone-client"
|
||||
],
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"base_uri": "https://localhost:8080/"
|
||||
},
|
||||
"id": "Qia-LzrBwpvB",
|
||||
"outputId": "dd0b612d-5c58-40ac-d172-69201d6a7407"
|
||||
},
|
||||
"execution_count": 1,
|
||||
"outputs": [
|
||||
{
|
||||
"output_type": "stream",
|
||||
"name": "stdout",
|
||||
"text": [
|
||||
"Collecting cohere\n",
|
||||
" Downloading cohere-1.3.9-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.0 MB)\n",
|
||||
"\u001b[K |████████████████████████████████| 18.0 MB 180 kB/s \n",
|
||||
"\u001b[?25hCollecting pinecone-client\n",
|
||||
" Downloading pinecone_client-2.0.10-py3-none-any.whl (159 kB)\n",
|
||||
"\u001b[K |████████████████████████████████| 159 kB 58.5 MB/s \n",
|
||||
"\u001b[?25hRequirement already satisfied: requests in /usr/local/lib/python3.7/dist-packages (from cohere) (2.23.0)\n",
|
||||
"Collecting pyyaml>=5.4\n",
|
||||
" Downloading PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (596 kB)\n",
|
||||
"\u001b[K |████████████████████████████████| 596 kB 36.8 MB/s \n",
|
||||
"\u001b[?25hRequirement already satisfied: urllib3>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from pinecone-client) (1.24.3)\n",
|
||||
"Collecting loguru>=0.5.0\n",
|
||||
" Downloading loguru-0.6.0-py3-none-any.whl (58 kB)\n",
|
||||
"\u001b[K |████████████████████████████████| 58 kB 6.3 MB/s \n",
|
||||
"\u001b[?25hRequirement already satisfied: python-dateutil>=2.5.3 in /usr/local/lib/python3.7/dist-packages (from pinecone-client) (2.8.2)\n",
|
||||
"Collecting dnspython>=2.0.0\n",
|
||||
" Downloading dnspython-2.2.1-py3-none-any.whl (269 kB)\n",
|
||||
"\u001b[K |████████████████████████████████| 269 kB 37.2 MB/s \n",
|
||||
"\u001b[?25hRequirement already satisfied: typing-extensions>=3.7.4 in /usr/local/lib/python3.7/dist-packages (from pinecone-client) (4.2.0)\n",
|
||||
"Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (from python-dateutil>=2.5.3->pinecone-client) (1.15.0)\n",
|
||||
"Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests->cohere) (2.10)\n",
|
||||
"Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests->cohere) (2021.10.8)\n",
|
||||
"Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests->cohere) (3.0.4)\n",
|
||||
"Installing collected packages: pyyaml, loguru, dnspython, pinecone-client, cohere\n",
|
||||
" Attempting uninstall: pyyaml\n",
|
||||
" Found existing installation: PyYAML 3.13\n",
|
||||
" Uninstalling PyYAML-3.13:\n",
|
||||
" Successfully uninstalled PyYAML-3.13\n",
|
||||
"Successfully installed cohere-1.3.9 dnspython-2.2.1 loguru-0.6.0 pinecone-client-2.0.10 pyyaml-6.0\n"
|
||||
]
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "uQomGiK3wDbJ"
|
||||
},
|
||||
"source": [
|
||||
"\n",
|
||||
"\n",
|
||||
"Taking our search one step further, we can add filtering to narrow the search scope while keeping search times fast with Pinecone's single-stage filtering.\n",
|
||||
"\n",
|
||||
"We can start by initializing Cohere + Pinecone."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {
|
||||
"id": "29JYnPS0wDbM"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"COHERE_KEY = \"<<COHERE_KEY_HERE>>\"\n",
|
||||
"PINECONE_KEY = \"<<PINECONE_KEY_HERE>>\" # app.pinecone.io"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {
|
||||
"id": "XK1E7xhdwDbN"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import cohere\n",
|
||||
"import pinecone\n",
|
||||
"\n",
|
||||
"co = cohere.Client(COHERE_KEY)\n",
|
||||
"\n",
|
||||
"pinecone.init(PINECONE_KEY, environment='us-west1-gcp')\n",
|
||||
"\n",
|
||||
"index_name = 'cohere-pinecone-askscience'\n",
|
||||
"# connect to index\n",
|
||||
"index = pinecone.Index(index_name)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "VgXYr2RKwDbN"
|
||||
},
|
||||
"source": [
|
||||
"For the filters we will use *four* categories, each of which groups many of the flairs that users apply in **r/askscience**."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {
|
||||
"id": "y7XktfVpwDbN"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"all_tags = ['Physics', 'Biology', 'Engineering', 'Unknown', 'Earth Sciences',\n",
|
||||
" 'Astronomy', 'Anthropology', 'Human Body', 'Social Science',\n",
|
||||
" 'Medicine', 'Computing', 'Psychology', 'Chemistry', 'Linguistics',\n",
|
||||
" 'Mathematics', 'Planetary Sci.', 'Neuroscience', 'Paleontology',\n",
|
||||
" 'COVID-19', 'Archaeology', 'Earth Sciences and Biology', 'Meta',\n",
|
||||
" 'Economics', 'CERN AMA', 'Dog Cognition AMA',\n",
|
||||
" 'Cancer Treatment AMA', 'Psychology AMA', 'Archaeology AMA',\n",
|
||||
" 'Alzheimer’s disease AMA', 'Oceanography AMA', 'Biology AMA',\n",
|
||||
" 'Biology/Agriculture', 'Neuroscience AMA', 'Climate History AMA',\n",
|
||||
" 'Climate Science AMA', 'Food Safety AMA', 'Ecology and Evolution']\n",
|
||||
"\n",
|
||||
"chats = {\n",
|
||||
" \"#general\": all_tags,\n",
|
||||
" \"#medical\": [\n",
|
||||
" 'Human Body', 'Medicine', 'COVID-19', 'Cancer Treatment AMA', 'Food Safety AMA'\n",
|
||||
" ],\n",
|
||||
" \"#natural-sciences\": [\n",
|
||||
" 'Physics', 'Biology', 'Earth Sciences', 'Astronomy', 'Anthropology',\n",
|
||||
" 'Human Body', 'Chemistry', 'Mathematics', 'Planetary Sci.', 'Neuroscience',\n",
|
||||
" 'Earth Sciences and Biology', 'CERN AMA', 'Oceanography AMA',\n",
|
||||
" 'Biology AMA', 'Biology/Agriculture', 'Neuroscience AMA', 'Climate History AMA',\n",
|
||||
" 'Climate Science AMA', 'Ecology and Evolution'\n",
|
||||
" ],\n",
|
||||
" \"#social-sciences\": [\n",
|
||||
" 'Anthropology', 'Social Science', 'Psychology', 'Linguistics', 'Economics',\n",
|
||||
" 'Psychology AMA'\n",
|
||||
" ]\n",
|
||||
"}"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"base_uri": "https://localhost:8080/"
|
||||
},
|
||||
"id": "zWyxkxZKwDbO",
|
||||
"outputId": "0d60b392-1d38-425e-80f8-ae661544078b"
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"output_type": "stream",
|
||||
"name": "stdout",
|
||||
"text": [
|
||||
"0.48: Are there any positive effects of climate change? (Earth Sciences)\n",
|
||||
"0.42: AskScience AMA Series: We mapped human transformation of Earth over the past 10,000 years and the results will surprise you! Ask us anything! (Unknown)\n",
|
||||
"0.36: Has human society and culture fundamentally altered our own biological evolution? (Ecology and Evolution)\n",
|
||||
"0.36: What environmental impacts would a border wall between the United States and Mexico cause? (Earth Sciences)\n",
|
||||
"0.36: How different was this world ecologically, about 2000 to 2500 yrs ago? (Earth Sciences)\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"query = \"what are the effects of the anthropocene?\"\n",
|
||||
"\n",
|
||||
"# create embedding with cohere\n",
|
||||
"xq = co.embed(\n",
|
||||
" texts=[query],\n",
|
||||
" model='large',\n",
|
||||
" truncate='LEFT'\n",
|
||||
").embeddings\n",
|
||||
"\n",
|
||||
"# query, returning the top 5 most similar results\n",
|
||||
"res = index.query(xq, top_k=5, include_metadata=True)\n",
|
||||
"\n",
|
||||
"for match in res['results'][0]['matches']:\n",
|
||||
" print(f\"{match['score']:.2f}: {match['metadata']['title']} ({match['metadata']['link_flair_text']})\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "tu9MmgYzwDbP"
|
||||
},
|
||||
"source": [
|
||||
"Naturally there's some overlap between topics (and this grouping may be fairly inaccurate), but these categories define the filters we will use.\n",
|
||||
"\n",
|
||||
"Filtering in Pinecone is simple: we pass our conditions to the `filter` parameter using operators such as equal to (`$eq`), in (`$in`), and greater than (`$gt`). So if we want to return only `Paleontology` results, we can do so like this:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"base_uri": "https://localhost:8080/"
|
||||
},
|
||||
"id": "fDv2PR_9wDbS",
|
||||
"outputId": "74cec566-f64e-4767-96ce-6c1b28b23a4a"
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"output_type": "stream",
|
||||
"name": "stdout",
|
||||
"text": [
|
||||
"0.25: What exactly would the landscape of the British Isles have looked like prior to human cultivation? (Paleontology)\n",
|
||||
"0.17: AskScience AMA Series: I am paleontologist Hans Sues, I study late Paleozoic and Mesozoic vertebrates. Ask Me Anything! (Paleontology)\n",
|
||||
"0.16: Given the way the Indian subcontinent was once a very large island, is it possible to find the fossils of coastal animals in the Himalayas? (Paleontology)\n",
|
||||
"0.15: If I went back to the Cretacious era to go fishing, what would I catch? How big would they be? What eon would be most interesting to fish in? (Paleontology)\n",
|
||||
"0.13: We are paleontologists who study fossils from an incredible site in Texas called the Arlington Archosaur Site. Ask us anything! (Paleontology)\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"query = \"what are the effects of the anthropocene?\"\n",
|
||||
"\n",
|
||||
"# create embedding with cohere\n",
|
||||
"xq = co.embed(\n",
|
||||
" texts=[query],\n",
|
||||
" model='large',\n",
|
||||
" truncate='LEFT'\n",
|
||||
").embeddings\n",
|
||||
"\n",
|
||||
"# then query pinecone w/ a filter\n",
|
||||
"res = index.query(\n",
|
||||
" xq, top_k=5, include_metadata=True,\n",
|
||||
" filter={\n",
|
||||
" 'link_flair_text': {'$eq': 'Paleontology'}\n",
|
||||
" })\n",
|
||||
"\n",
|
||||
"for match in res['results'][0]['matches']:\n",
|
||||
" print(f\"{match['score']:.2f}: {match['metadata']['title']} ({match['metadata']['link_flair_text']})\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"Or, as in our demo, we might group flair labels together and use `$in`."
|
||||
],
|
||||
"metadata": {
|
||||
"id": "kytGzOZDxGHS"
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"query = \"what are the effects of the anthropocene?\"\n",
|
||||
"\n",
|
||||
"# create embedding with cohere\n",
|
||||
"xq = co.embed(\n",
|
||||
" texts=[query],\n",
|
||||
" model='large',\n",
|
||||
" truncate='LEFT'\n",
|
||||
").embeddings\n",
|
||||
"\n",
|
||||
"# then query pinecone w/ a filter\n",
|
||||
"res = index.query(\n",
|
||||
" xq, top_k=5, include_metadata=True,\n",
|
||||
" filter={\n",
|
||||
" 'link_flair_text': {'$in': chats['#social-sciences']}\n",
|
||||
" })\n",
|
||||
"\n",
|
||||
"for match in res['results'][0]['matches']:\n",
|
||||
" print(f\"{match['score']:.2f}: {match['metadata']['title']} ({match['metadata']['link_flair_text']})\")"
|
||||
],
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"base_uri": "https://localhost:8080/"
|
||||
},
|
||||
"id": "ft1YCgctxDNz",
|
||||
"outputId": "d331409c-a209-45b4-d5a9-2525e61d7835"
|
||||
},
|
||||
"execution_count": 8,
|
||||
"outputs": [
|
||||
{
|
||||
"output_type": "stream",
|
||||
"name": "stdout",
|
||||
"text": [
|
||||
"0.23: Why has Europe's population remained relatively constant whereas other continents have shown clear increase? (Social Science)\n",
|
||||
"0.22: AskScience AMA Series: I’m Stephan Lewandowsky, here with Klaus Oberauer, we will be responding to your questions about the conflict between our brains and our globe: How will we meet the challenges of the 21st century despite our cognitive limitations? AMA! (Psychology)\n",
|
||||
"0.20: Has the growing % of the population avoiding meat consumption had any impact on meat production? (Anthropology)\n",
|
||||
"0.20: What will happen to us if the birth replacement rate keeps falling? (Social Science)\n",
|
||||
"0.19: If modern man came into existence 200k years ago, but modern day societies began about 10k years ago with the discoveries of agriculture and livestock, what the hell where they doing the other 190k years?? (Anthropology)\n"
|
||||
]
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"interpreter": {
|
||||
"hash": "52a41a0d6b6e16f4b56c5995d2d276cfef8bcc0e6d8203d15fa60c631f3e9c76"
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.8.12 ('streamlit')",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.8.12"
|
||||
},
|
||||
"orig_nbformat": 4,
|
||||
"colab": {
|
||||
"name": "03_filtering.ipynb",
|
||||
"provenance": [],
|
||||
"collapsed_sections": [],
|
||||
"include_colab_link": true
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 0
|
||||
}
|
||||
@@ -1 +1 @@
|
||||
Here are notebooks and data for the Cohere x Pinecone webinar in May 2022.
|
||||
Here are notebooks and data for the Cohere x Pinecone webinar in May 2022. Find the [demo here](https://share.streamlit.io/pinecone-io/playground/not_slack_chatbot/src/server.py) and for a recording of the workshop [see here]().
|
||||
|
||||
22929
integrations/cohere/webinar_classification_and_search/askscience.tsv
Normal file
File diff suppressed because one or more lines are too long