Created using Colaboratory

This commit is contained in:
James Briggs
2022-05-19 16:09:05 +01:00
parent d17bbb4891
commit 0d2ee0d69f

@@ -7,7 +7,7 @@
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/pinecone-io/examples/blob/cohere-webinar-2205/integrations/cohere/webinar_classification_and_search/01_semantic_search.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
"<a href=\"https://colab.research.google.com/github/pinecone-io/examples/blob/master/integrations/cohere/webinar_classification_and_search/01_semantic_search.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
@@ -813,7 +813,208 @@
"source": [
"Looks great, our semantic search pipeline is clearly able to identify the meaning between each of our queries and return the most semantically similar questions from the already indexed questions.\n",
"\n",
"---"
"---\n",
"\n",
"## Adding Filtering\n",
"\n",
"Taking our search one step further, we can add filtering to specify our search scope, while still maintaining fast search times using Pinecone's single stage filtering.\n",
"\n",
"For the filters we will use *four* categories, each of which includes many flairs used by users in **r/askscience**."
]
},
{
"cell_type": "code",
"source": [
"all_tags = ['Physics', 'Biology', 'Engineering', 'Unknown', 'Earth Sciences',\n",
" 'Astronomy', 'Anthropology', 'Human Body', 'Social Science',\n",
" 'Medicine', 'Computing', 'Psychology', 'Chemistry', 'Linguistics',\n",
" 'Mathematics', 'Planetary Sci.', 'Neuroscience', 'Paleontology',\n",
" 'COVID-19', 'Archaeology', 'Earth Sciences and Biology', 'Meta',\n",
" 'Economics', 'CERN AMA', 'Dog Cognition AMA',\n",
" 'Cancer Treatment AMA', 'Psychology AMA', 'Archaeology AMA',\n",
" 'Alzheimers disease AMA', 'Oceanography AMA', 'Biology AMA',\n",
" 'Biology/Agriculture', 'Neuroscience AMA', 'Climate History AMA',\n",
" 'Climate Science AMA', 'Food Safety AMA', 'Ecology and Evolution']\n",
"\n",
"chats = {\n",
" \"#general\": all_tags,\n",
" \"#medical\": [\n",
" 'Human Body', 'Medicine', 'COVID-19', 'Cancer Treatment AMA', 'Food Safety AMA'\n",
" ],\n",
" \"#natural-sciences\": [\n",
" 'Physics', 'Biology', 'Earth Sciences', 'Astronomy', 'Anthropology'\n",
" 'Human Body', 'Chemistry', 'Mathematics', 'Planetry Sci.', 'Neuroscience',\n",
" 'Earth Sciences and Biology', 'CERN AMA', 'Oceanography AMA',\n",
" 'Biology AMA', 'Biology/Argiculture', 'Neuroscience AMA', 'Climate History AMA',\n",
" 'Climate Science AMA', 'Ecology and Evolution'\n",
" ],\n",
" \"#social-sciences\": [\n",
" 'Anthropology', 'Social Science', 'Psychology', 'Linguistics', 'Economics',\n",
" 'Psychology AMA'\n",
" ]\n",
"}"
],
"metadata": {
"id": "8e6D2BrU15kl"
},
"execution_count": null,
"outputs": []
},
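Because the filters match flair strings exactly, a misspelled label or two strings joined by a missing comma would silently never match anything. A minimal sanity check is sketched below, assuming structures shaped like `all_tags` and `chats` (shortened stand-ins are used so the snippet runs on its own; `unknown_flairs` is a hypothetical helper, not part of the notebook):

```python
def unknown_flairs(groups, known_tags):
    """Return, per group, the flairs that do not appear in known_tags."""
    known = set(known_tags)
    problems = {}
    for name, flairs in groups.items():
        missing = [flair for flair in flairs if flair not in known]
        if missing:
            problems[name] = missing
    return problems


# Shortened stand-ins for the notebook's all_tags and chats (illustrative only)
sample_tags = ['Physics', 'Anthropology', 'Human Body', 'Planetary Sci.']
sample_chats = {
    "#ok": ['Physics', 'Human Body'],
    # a missing comma concatenates two strings; 'Planetry Sci.' is a typo
    "#broken": ['Anthropology' 'Human Body', 'Planetry Sci.'],
}

problems = unknown_flairs(sample_chats, sample_tags)
print(problems)
```

An empty dictionary means every grouped flair exists in the known tag list.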
{
"cell_type": "markdown",
"source": [
"First lets try querying *without* any filters."
],
"metadata": {
"id": "zPagQJLI19LF"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "zWyxkxZKwDbO",
"outputId": "0d60b392-1d38-425e-80f8-ae661544078b"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"0.48: Are there any positive effects of climate change? (Earth Sciences)\n",
"0.42: AskScience AMA Series: We mapped human transformation of Earth over the past 10,000 years and the results will surprise you! Ask us anything! (Unknown)\n",
"0.36: Has human society and culture fundamentally altered our own biological evolution? (Ecology and Evolution)\n",
"0.36: What environmental impacts would a border wall between the United States and Mexico cause? (Earth Sciences)\n",
"0.36: How different was this world ecologically, about 2000 to 2500 yrs ago? (Earth Sciences)\n"
]
}
],
"source": [
"query = \"what are the effects of the anthropocene?\"\n",
"\n",
"# create embedding with cohere\n",
"xq = co.embed(\n",
" texts=[query],\n",
" model='large',\n",
" truncate='LEFT'\n",
").embeddings\n",
"\n",
"# query, returning the top 5 most similar results\n",
"res = index.query(xq, top_k=5, include_metadata=True)\n",
"\n",
"for match in res['results'][0]['matches']:\n",
" print(f\"{match['score']:.2f}: {match['metadata']['title']} ({match['metadata']['link_flair_text']})\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "tu9MmgYzwDbP"
},
"source": [
"Naturally there's some overlap between topics (and this example may be pretty inaccurate), but these will build the filters we will use.\n",
"\n",
"Filtering in Pinecone is pretty simple, we pass our conditions to the `filter` parameter using operators like equal to `$eq`, in `$in`, greater than `$gt`, etc. So if we want to return `Paleontology` specific results we can like so:"
]
},
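To make the operator semantics concrete, here is a small local emulation of how `$eq`, `$in`, and `$gt` conditions select records by their metadata. It is only an illustration in plain Python: `matches_filter` and `select` are hypothetical helpers, the `upvotes` field is invented for the `$gt` case, and none of this reflects how Pinecone evaluates filters internally.

```python
import operator

# Hypothetical mapping from filter operators to plain Python comparisons
OPERATORS = {
    '$eq': operator.eq,
    '$gt': operator.gt,
    '$in': lambda value, allowed: value in allowed,
}


def matches_filter(metadata, conditions):
    """Return True if metadata satisfies every {field: {op: operand}} pair."""
    for field, ops in conditions.items():
        if field not in metadata:
            return False
        for op, operand in ops.items():
            if not OPERATORS[op](metadata[field], operand):
                return False
    return True


# Toy records; 'upvotes' is an invented numeric field used to show $gt
records = [
    {'title': 'a', 'link_flair_text': 'Paleontology', 'upvotes': 12},
    {'title': 'b', 'link_flair_text': 'Physics', 'upvotes': 3},
    {'title': 'c', 'link_flair_text': 'Psychology', 'upvotes': 40},
]


def select(conditions):
    """Titles of the toy records that pass the given filter."""
    return [r['title'] for r in records if matches_filter(r, conditions)]


eq_hits = select({'link_flair_text': {'$eq': 'Paleontology'}})
in_hits = select({'link_flair_text': {'$in': ['Physics', 'Psychology']}})
gt_hits = select({'upvotes': {'$gt': 10}})
print(eq_hits, in_hits, gt_hits)
```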
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "fDv2PR_9wDbS",
"outputId": "74cec566-f64e-4767-96ce-6c1b28b23a4a"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"0.25: What exactly would the landscape of the British Isles have looked like prior to human cultivation? (Paleontology)\n",
"0.17: AskScience AMA Series: I am paleontologist Hans Sues, I study late Paleozoic and Mesozoic vertebrates. Ask Me Anything! (Paleontology)\n",
"0.16: Given the way the Indian subcontinent was once a very large island, is it possible to find the fossils of coastal animals in the Himalayas? (Paleontology)\n",
"0.15: If I went back to the Cretacious era to go fishing, what would I catch? How big would they be? What eon would be most interesting to fish in? (Paleontology)\n",
"0.13: We are paleontologists who study fossils from an incredible site in Texas called the Arlington Archosaur Site. Ask us anything! (Paleontology)\n"
]
}
],
"source": [
"query = \"what are the effects of the anthropocene?\"\n",
"\n",
"# create embedding with cohere\n",
"xq = co.embed(\n",
" texts=[query],\n",
" model='large',\n",
" truncate='LEFT'\n",
").embeddings\n",
"\n",
"# then query pinecone w/ a filter\n",
"res = index.query(\n",
" xq, top_k=5, include_metadata=True,\n",
" filter={\n",
" 'link_flair_text': {'$eq': 'Paleontology'}\n",
" })\n",
"\n",
"for match in res['results'][0]['matches']:\n",
" print(f\"{match['score']:.2f}: {match['metadata']['title']} ({match['metadata']['link_flair_text']})\")"
]
},
{
"cell_type": "markdown",
"source": [
"Or as with our demo, we might group flair labels together and use `$in`."
],
"metadata": {
"id": "kytGzOZDxGHS"
}
},
{
"cell_type": "code",
"source": [
"query = \"what are the effects of the anthropocene?\"\n",
"\n",
"# create embedding with cohere\n",
"xq = co.embed(\n",
" texts=[query],\n",
" model='large',\n",
" truncate='LEFT'\n",
").embeddings\n",
"\n",
"# then query pinecone w/ a filter\n",
"res = index.query(\n",
" xq, top_k=5, include_metadata=True,\n",
" filter={\n",
" 'link_flair_text': {'$in': chats['#social-sciences']}\n",
" })\n",
"\n",
"for match in res['results'][0]['matches']:\n",
" print(f\"{match['score']:.2f}: {match['metadata']['title']} ({match['metadata']['link_flair_text']})\")"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "ft1YCgctxDNz",
"outputId": "d331409c-a209-45b4-d5a9-2525e61d7835"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"0.23: Why has Europe's population remained relatively constant whereas other continents have shown clear increase? (Social Science)\n",
"0.22: AskScience AMA Series: Im Stephan Lewandowsky, here with Klaus Oberauer, we will be responding to your questions about the conflict between our brains and our globe: How will we meet the challenges of the 21st century despite our cognitive limitations? AMA! (Psychology)\n",
"0.20: Has the growing % of the population avoiding meat consumption had any impact on meat production? (Anthropology)\n",
"0.20: What will happen to us if the birth replacement rate keeps falling? (Social Science)\n",
"0.19: If modern man came into existence 200k years ago, but modern day societies began about 10k years ago with the discoveries of agriculture and livestock, what the hell where they doing the other 190k years?? (Anthropology)\n"
]
}
]
}
],
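The filtered queries in this notebook all build the same kind of dictionary by hand. One way to avoid repeating that is a small helper that turns a chat name into a filter. This is a sketch under assumptions: `flair_filter` is a hypothetical function, and `sample_chats` is a shortened stand-in for the notebook's `chats` mapping.

```python
def flair_filter(chat_name, groups, field='link_flair_text'):
    """Build a metadata filter restricting results to one chat's flairs."""
    if chat_name not in groups:
        raise KeyError(f"unknown chat: {chat_name!r}")
    flairs = list(groups[chat_name])
    if len(flairs) == 1:
        # a single flair only needs an equality condition
        return {field: {'$eq': flairs[0]}}
    return {field: {'$in': flairs}}


# Shortened stand-in for the notebook's chats mapping (illustrative only)
sample_chats = {
    '#paleo': ['Paleontology'],
    '#social-sciences': ['Anthropology', 'Social Science', 'Psychology'],
}

single = flair_filter('#paleo', sample_chats)
grouped = flair_filter('#social-sciences', sample_chats)
print(single)
print(grouped)
```

The returned dictionary could then be passed as the `filter` argument of the `index.query` calls used in the notebook, for example `filter=flair_filter('#social-sciences', chats)`.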