diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 9f74919..639b4b3 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -1,6 +1,6 @@
-# Contributing to Claude Cookbook
+# Contributing to Claude Cookbooks
-Thank you for your interest in contributing to the Claude Cookbook! This guide will help you get started with development and ensure your contributions meet our quality standards.
+Thank you for your interest in contributing to the Claude Cookbooks! This guide will help you get started with development and ensure your contributions meet our quality standards.
## Development Setup
diff --git a/README.md b/README.md
index ea4cbc2..21512cc 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
-# Claude Cookbook
+# Claude Cookbooks
-The Claude Cookbook provides code and guides designed to help developers build with Claude, offering copy-able code snippets that you can easily integrate into your own projects.
+The Claude Cookbooks provide code and guides designed to help developers build with Claude, offering copy-able code snippets that you can easily integrate into your own projects.
## Prerequisites
@@ -20,7 +20,7 @@ Looking for more resources to enhance your experience with Claude and AI assista
## Contributing
-The Claude Cookbook thrives on the contributions of the developer community. We value your input, whether it's submitting an idea, fixing a typo, adding a new guide, or improving an existing one. By contributing, you help make this resource even more valuable for everyone.
+The Claude Cookbooks thrive on the contributions of the developer community. We value your input, whether it's submitting an idea, fixing a typo, adding a new guide, or improving an existing one. By contributing, you help make this resource even more valuable for everyone.
To avoid duplication of efforts, please review the existing issues and pull requests before contributing.
diff --git a/lychee.toml b/lychee.toml
index 5612169..4b3cde5 100644
--- a/lychee.toml
+++ b/lychee.toml
@@ -1,4 +1,4 @@
-# Lychee configuration for Claude Cookbook
+# Lychee configuration for Claude Cookbooks
# Validates links in notebooks and documentation
# Core settings
diff --git a/skills/README.md b/skills/README.md
index f3c8a59..868e714 100644
--- a/skills/README.md
+++ b/skills/README.md
@@ -1,6 +1,6 @@
# Claude Skills
-Welcome to the Skills section of the Claude Cookbook! This directory contains a collection of guides that showcase specific skills and capabilities where Claude excels. Each guide provides an in-depth exploration of a particular skill, discussing potential use cases, prompt engineering techniques to optimize results, and approaches for evaluating Claude's performance.
+Welcome to the Skills section of the Claude Cookbooks! This directory contains a collection of guides that showcase specific skills and capabilities where Claude excels. Each guide provides an in-depth exploration of a particular skill, discussing potential use cases, prompt engineering techniques to optimize results, and approaches for evaluating Claude's performance.
## Guides
diff --git a/skills/retrieval_augmented_generation/data/anthropic_docs.json b/skills/retrieval_augmented_generation/data/anthropic_docs.json
index 699f784..03aecc4 100644
--- a/skills/retrieval_augmented_generation/data/anthropic_docs.json
+++ b/skills/retrieval_augmented_generation/data/anthropic_docs.json
@@ -12,7 +12,7 @@
{
"chunk_link": "https://docs.claude.com/en/docs/welcome#develop-with-claude",
"chunk_heading": "Develop with Claude",
- "text": "Develop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n"
+ "text": "Develop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n"
},
{
"chunk_link": "https://docs.claude.com/en/docs/welcome#key-capabilities",
@@ -67,7 +67,7 @@
{
"chunk_link": "https://docs.claude.com/en/docs/quickstart#next-steps",
"chunk_heading": "Next steps",
- "text": "Next steps\n\n\nNow that you have made your first Claude API request, it\u2019s time to explore what else is possible:\nPrompt Engineering GuideOptimize Claude\u2019s performance through prompting.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.Prompt LibraryExplore dozens of example prompts for inspiration across use cases.\nPrompt Engineering GuideOptimize Claude\u2019s performance through prompting.\n\nPrompt Engineering Guide\nOptimize Claude\u2019s performance through prompting.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nPrompt LibraryExplore dozens of example prompts for inspiration across use cases.\n\nPrompt Library\nExplore dozens of example prompts for inspiration across use cases.\nOverviewIntro to Claudexlinkedin\nOverviewIntro to Claude\nxlinkedin\nPrerequisites Start with the Workbench Install the SDK Set your API key Call the API Next steps\nPrerequisitesStart with the WorkbenchInstall the SDKSet your API keyCall the APINext steps\n"
+ "text": "Next steps\n\n\nNow that you have made your first Claude API request, it\u2019s time to explore what else is possible:\nPrompt Engineering GuideOptimize Claude\u2019s performance through prompting.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.Prompt LibraryExplore dozens of example prompts for inspiration across use cases.\nPrompt Engineering GuideOptimize Claude\u2019s performance through prompting.\n\nPrompt Engineering Guide\nOptimize Claude\u2019s performance through prompting.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nPrompt LibraryExplore dozens of example prompts for inspiration across use cases.\n\nPrompt Library\nExplore dozens of example prompts for inspiration across use cases.\nOverviewIntro to Claudexlinkedin\nOverviewIntro to Claude\nxlinkedin\nPrerequisites Start with the Workbench Install the SDK Set your API key Call the API Next steps\nPrerequisitesStart with the WorkbenchInstall the SDKSet your API keyCall the APINext steps\n"
},
{
"chunk_link": "https://docs.claude.com/en/docs/intro-to-claude#what-you-can-do-with-claude",
@@ -102,7 +102,7 @@
{
"chunk_link": "https://docs.claude.com/en/docs/intro-to-claude#start-building-with-claude",
"chunk_heading": "Start building with Claude",
- "text": "Start building with Claude\n\n\nWhen you\u2019re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbook for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 Family Enterprise considerations Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n"
+ "text": "Start building with Claude\n\n\nWhen you\u2019re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbooks for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 Family Enterprise considerations Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n"
},
{
"chunk_link": "https://docs.claude.com/en/docs/about-claude/models#model-names",
@@ -186,13 +186,13 @@
},
{
"chunk_link": "https://docs.claude.com/en/docs/build-with-claude/text-generation#anthropic-cookbook",
- "chunk_heading": "Claude Cookbook",
- "text": "Claude Cookbook\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude\u2019s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude\u2019s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude\u2019s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n"
+ "chunk_heading": "Claude Cookbooks",
+ "text": "Claude Cookbooks\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude\u2019s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude\u2019s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude\u2019s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n"
},
{
"chunk_link": "https://docs.claude.com/en/docs/build-with-claude/text-generation#more-resources",
"chunk_heading": "More Resources",
- "text": "More Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we\u2019ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbook More Resources\nText capabilities and use casesClaude CookbookMore Resources\n"
+ "text": "More Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we\u2019ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbooks More Resources\nText capabilities and use casesClaude CookbooksMore Resources\n"
},
{
"chunk_link": "https://docs.claude.com/en/docs/build-with-claude/embeddings#before-implementing-embeddings",
@@ -1027,7 +1027,7 @@
{
"chunk_link": "https://docs.claude.com/en/docs/about-claude/use-cases/classification#deploy-your-classifier",
"chunk_heading": "Deploy your classifier",
- "text": "Deploy your classifier\n\n\nTo see code examples of how to use Claude for classification, check out the Classification Guide in the Claude Cookbook.\nOverviewTicket Routingxlinkedin\nOverviewTicket Routing\nxlinkedin\nWhen to use Claude for classification Establish your classification use case Implement Claude for classification 1. Build a strong input prompt 2. Develop your test cases 3. Run your eval Evaluation metrics Deploy your classifier\nWhen to use Claude for classificationEstablish your classification use caseImplement Claude for classification1. Build a strong input prompt2. Develop your test cases3. Run your evalEvaluation metricsDeploy your classifier\n"
+ "text": "Deploy your classifier\n\n\nTo see code examples of how to use Claude for classification, check out the Classification Guide in the Claude Cookbooks.\nOverviewTicket Routingxlinkedin\nOverviewTicket Routing\nxlinkedin\nWhen to use Claude for classification Establish your classification use case Implement Claude for classification 1. Build a strong input prompt 2. Develop your test cases 3. Run your eval Evaluation metrics Deploy your classifier\nWhen to use Claude for classificationEstablish your classification use caseImplement Claude for classification1. Build a strong input prompt2. Develop your test cases3. Run your evalEvaluation metricsDeploy your classifier\n"
},
{
"chunk_link": "https://docs.claude.com/en/api/messages-streaming#streaming-with-sdks",
diff --git a/skills/retrieval_augmented_generation/data/anthropic_summary_indexed_docs.json b/skills/retrieval_augmented_generation/data/anthropic_summary_indexed_docs.json
index f6749ca..a25669d 100644
--- a/skills/retrieval_augmented_generation/data/anthropic_summary_indexed_docs.json
+++ b/skills/retrieval_augmented_generation/data/anthropic_summary_indexed_docs.json
@@ -14,7 +14,7 @@
{
"chunk_link": "https://docs.claude.com/en/docs/welcome#develop-with-claude",
"chunk_heading": "Develop with Claude",
- "text": "Develop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n",
+ "text": "Develop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n",
"summary": "Anthropic provides a suite of developer tools, including a browser-based Workbench and prompt generator, API reference documentation, and interactive Jupyter notebooks, to help developers build scalable applications with the Claude AI model. These tools enable easier, more powerful prompting, exploration and implementation of the Claude API and SDKs, and learning through interactive demonstrations."
},
{
@@ -80,8 +80,8 @@
{
"chunk_link": "https://docs.claude.com/en/docs/quickstart#next-steps",
"chunk_heading": "Next steps",
- "text": "Next steps\n\n\nNow that you have made your first Claude API request, it\u2019s time to explore what else is possible:\nPrompt Engineering GuideOptimize Claude\u2019s performance through prompting.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.Prompt LibraryExplore dozens of example prompts for inspiration across use cases.\nPrompt Engineering GuideOptimize Claude\u2019s performance through prompting.\n\nPrompt Engineering Guide\nOptimize Claude\u2019s performance through prompting.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nPrompt LibraryExplore dozens of example prompts for inspiration across use cases.\n\nPrompt Library\nExplore dozens of example prompts for inspiration across use cases.\nOverviewIntro to Claudexlinkedin\nOverviewIntro to Claude\nxlinkedin\nPrerequisites Start with the Workbench Install the SDK Set your API key Call the API Next steps\nPrerequisitesStart with the WorkbenchInstall the SDKSet your API keyCall the APINext steps\n",
- "summary": "The summary covers the next steps after making an initial Claude API request, including exploring the Prompt Engineering Guide to optimize Claude's performance, the Claude Cookbook for interactive Jupyter notebooks, and the Prompt Library for example prompts across use cases. It also mentions the overview and prerequisites for working with the Anthropic platform."
+ "text": "Next steps\n\n\nNow that you have made your first Claude API request, it\u2019s time to explore what else is possible:\nPrompt Engineering GuideOptimize Claude\u2019s performance through prompting.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.Prompt LibraryExplore dozens of example prompts for inspiration across use cases.\nPrompt Engineering GuideOptimize Claude\u2019s performance through prompting.\n\nPrompt Engineering Guide\nOptimize Claude\u2019s performance through prompting.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nPrompt LibraryExplore dozens of example prompts for inspiration across use cases.\n\nPrompt Library\nExplore dozens of example prompts for inspiration across use cases.\nOverviewIntro to Claudexlinkedin\nOverviewIntro to Claude\nxlinkedin\nPrerequisites Start with the Workbench Install the SDK Set your API key Call the API Next steps\nPrerequisitesStart with the WorkbenchInstall the SDKSet your API keyCall the APINext steps\n",
+ "summary": "The summary covers the next steps after making an initial Claude API request, including exploring the Prompt Engineering Guide to optimize Claude's performance, the Claude Cookbooks for interactive Jupyter notebooks, and the Prompt Library for example prompts across use cases. It also mentions the overview and prerequisites for working with the Anthropic platform."
},
{
"chunk_link": "https://docs.claude.com/en/docs/intro-to-claude#what-you-can-do-with-claude",
@@ -122,8 +122,8 @@
{
"chunk_link": "https://docs.claude.com/en/docs/intro-to-claude#start-building-with-claude",
"chunk_heading": "Start building with Claude",
- "text": "Start building with Claude\n\n\nWhen you\u2019re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbook for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 Family Enterprise considerations Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n",
- "summary": "The documentation provides guidance on how to start building with the Claude AI model, including following the Quickstart, exploring the API Reference and Prompt Library, using the Workbench, and checking out the Claude Cookbook for working code examples. It also covers model options, enterprise considerations, and implementation details."
+ "text": "Start building with Claude\n\n\nWhen you\u2019re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbooks for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 Family Enterprise considerations Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n",
+ "summary": "The documentation provides guidance on how to start building with the Claude AI model, including following the Quickstart, exploring the API Reference and Prompt Library, using the Workbench, and checking out the Claude Cookbooks for working code examples. It also covers model options, enterprise considerations, and implementation details."
},
{
"chunk_link": "https://docs.claude.com/en/docs/about-claude/models#model-names",
@@ -223,14 +223,14 @@
},
{
"chunk_link": "https://docs.claude.com/en/docs/build-with-claude/text-generation#anthropic-cookbook",
- "chunk_heading": "Claude Cookbook",
- "text": "Claude Cookbook\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude\u2019s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude\u2019s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude\u2019s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n",
- "summary": "The Claude Cookbook provides practical examples and hands-on tutorials, including how to upload PDFs and have Claude summarize their content, how to extend Claude's capabilities by integrating external tools and functions, and how to create and use embeddings with VoyageAI for advanced text similarity and search tasks."
+ "chunk_heading": "Claude Cookbooks",
+ "text": "Claude Cookbooks\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude\u2019s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude\u2019s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude\u2019s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n",
+ "summary": "The Claude Cookbooks provide practical examples and hands-on tutorials, including how to upload PDFs and have Claude summarize their content, how to extend Claude's capabilities by integrating external tools and functions, and how to create and use embeddings with VoyageAI for advanced text similarity and search tasks."
},
{
"chunk_link": "https://docs.claude.com/en/docs/build-with-claude/text-generation#more-resources",
"chunk_heading": "More Resources",
- "text": "More Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we\u2019ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbook More Resources\nText capabilities and use casesClaude CookbookMore Resources\n",
+ "text": "More Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we\u2019ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbooks More Resources\nText capabilities and use casesClaude CookbooksMore Resources\n",
"summary": "The Claude Documentation provides a Prompt Engineering Guide to help users master the art of prompt crafting, a Prompt Library with pre-crafted prompts for various tasks, and API Documentation for interacting with the Claude AI model. These resources are designed to help users get the most out of the Claude model, particularly for fine-tuning with legacy models."
},
{
@@ -1232,8 +1232,8 @@
{
"chunk_link": "https://docs.claude.com/en/docs/about-claude/use-cases/classification#deploy-your-classifier",
"chunk_heading": "Deploy your classifier",
- "text": "Deploy your classifier\n\n\nTo see code examples of how to use Claude for classification, check out the Classification Guide in the Claude Cookbook.\nOverviewTicket Routingxlinkedin\nOverviewTicket Routing\nxlinkedin\nWhen to use Claude for classification Establish your classification use case Implement Claude for classification 1. Build a strong input prompt 2. Develop your test cases 3. Run your eval Evaluation metrics Deploy your classifier\nWhen to use Claude for classificationEstablish your classification use caseImplement Claude for classification1. Build a strong input prompt2. Develop your test cases3. Run your evalEvaluation metricsDeploy your classifier\n",
- "summary": "Deploy your classifier: Check out the Classification Guide in the Claude Cookbook for code examples on using Claude for classification. The guide covers when to use Claude for classification, establishing your use case, implementing Claude, building prompts, developing test cases, running evaluations, and deploying your classifier."
+ "text": "Deploy your classifier\n\n\nTo see code examples of how to use Claude for classification, check out the Classification Guide in the Claude Cookbooks.\nOverviewTicket Routingxlinkedin\nOverviewTicket Routing\nxlinkedin\nWhen to use Claude for classification Establish your classification use case Implement Claude for classification 1. Build a strong input prompt 2. Develop your test cases 3. Run your eval Evaluation metrics Deploy your classifier\nWhen to use Claude for classificationEstablish your classification use caseImplement Claude for classification1. Build a strong input prompt2. Develop your test cases3. Run your evalEvaluation metricsDeploy your classifier\n",
+ "summary": "Deploy your classifier: Check out the Classification Guide in the Claude Cookbooks for code examples on using Claude for classification. The guide covers when to use Claude for classification, establishing your use case, implementing Claude, building prompts, developing test cases, running evaluations, and deploying your classifier."
},
{
"chunk_link": "https://docs.claude.com/en/api/messages-streaming#streaming-with-sdks",
diff --git a/skills/retrieval_augmented_generation/data/end_to_end_results.json b/skills/retrieval_augmented_generation/data/end_to_end_results.json
index 1648e12..a40181a 100644
--- a/skills/retrieval_augmented_generation/data/end_to_end_results.json
+++ b/skills/retrieval_augmented_generation/data/end_to_end_results.json
@@ -2316,7 +2316,7 @@
"label": "Haiku: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n How can I use Claude to more easily digest the content of long PDF documents?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nControlling Claude’s output\n\n\n\n\nText capabilities and use cases\n\n\nClaude has a broad range of text-based capabilities, including but not limited to:\nCapabilityThis enables you to…Text SummarizationDistill lengthy content into key insights for executives, social media, or product teams.Content GenerationCraft compelling content from blog posts and emails to marketing slogans and product descriptions.Data / Entity ExtractionUncover structured insights from unstructured text like reviews, news articles, or transcripts.Question AnsweringBuild intelligent, interactive systems from customer support chatbots to educational AI tutors.Text TranslationSeamlessly communicate across languages in products, support, and content creation.Text Analysis & RecommendationsUnderstand sentiment, preferences, and patterns to personalize user experiences and offerings.Dialogue and ConversationCreate engaging, context-aware interactions in games, virtual assistants, and storytelling apps.Code Explanation & GenerationAccelerate development with instant code reviews, boilerplate generation, and interactive tutorials.\n\n\nClaude Cookbook\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their 
content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n How can I use Claude to more easily digest the content of long PDF documents?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nControlling Claude’s output\n\n\n\n\nText capabilities and use cases\n\n\nClaude has a broad range of text-based capabilities, including but not limited to:\nCapabilityThis enables you to…Text SummarizationDistill lengthy content into key insights for executives, social media, or product teams.Content GenerationCraft compelling content from blog posts and emails to marketing slogans and product descriptions.Data / Entity ExtractionUncover structured insights from unstructured text like reviews, news articles, or transcripts.Question AnsweringBuild intelligent, interactive systems from customer support chatbots to educational AI tutors.Text TranslationSeamlessly communicate across languages in products, support, and content creation.Text Analysis & RecommendationsUnderstand sentiment, preferences, and patterns to personalize user experiences and offerings.Dialogue and ConversationCreate engaging, context-aware interactions in games, virtual assistants, and storytelling apps.Code Explanation & GenerationAccelerate development with instant code reviews, boilerplate generation, and interactive tutorials.\n\n\nClaude Cookbooks\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their 
content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_base"
},
"vars": {
@@ -2463,7 +2463,7 @@
"label": "Haiku: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n How can I use Claude to more easily digest the content of long PDF documents?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Text capabilities and use cases\n\nText\n Text capabilities and use cases\n\n\nClaude has a broad range of text-based capabilities, including but not limited to:\nCapabilityThis enables you to…Text SummarizationDistill lengthy content into key insights for executives, social media, or product teams.Content GenerationCraft compelling content from blog posts and emails to marketing slogans and product descriptions.Data / Entity ExtractionUncover structured insights from unstructured text like reviews, news articles, or transcripts.Question AnsweringBuild intelligent, interactive systems from customer support chatbots to educational AI tutors.Text TranslationSeamlessly communicate across languages in products, support, and content creation.Text Analysis & RecommendationsUnderstand sentiment, preferences, and patterns to personalize user experiences and offerings.Dialogue and ConversationCreate engaging, context-aware interactions in games, virtual assistants, and storytelling apps.Code Explanation & GenerationAccelerate development with instant code reviews, boilerplate generation, and interactive tutorials.\n \n\nSummary: \n Claude has a broad range of text-based capabilities, including text summarization, content generation, data/entity extraction, question answering, text translation, text analysis and recommendations, dialogue and conversation, and code explanation and generation. These capabilities enable a wide variety of use cases, from crafting compelling content to building intelligent interactive systems and accelerating software development. 
\n \n\n \n Claude Cookbook\n\nText\n Claude Cookbook\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n \n\nSummary: \n The Claude Cookbook provides practical examples and hands-on tutorials, including how to upload PDFs and have Claude summarize their content, how to extend Claude's capabilities by integrating external tools and functions, and how to create and use embeddings with VoyageAI for advanced text similarity and search tasks. 
\n \n\n \n What you can do with Claude\n\nText\n What you can do with Claude\n\n\nClaude is designed to empower enterprises at scale with strong performance across benchmark evaluations for reasoning, math, coding, and fluency in English and non-English languages.\nHere’s a non-exhaustive list of Claude’s capabilities and common uses.\nCapabilityEnables you to…Text and code generationAdhere to brand voice for excellent customer-facing experiences such as copywriting and chatbotsCreate production-level code and operate (in-line code generation, debugging, and conversational querying) within complex codebasesBuild automatic translation features between languagesConduct complex financial forecastsSupport legal use cases that require high-quality technical analysis, long context windows for processing detailed documents, and fast outputsVisionProcess and analyze visual input, such as extracting insights from charts and graphsGenerate code from images with code snippets or templates based on diagramsDescribe an image for a user with low visionTool useInteract with external client-side tools and functions, allowing Claude to reason, plan, and execute actions by generating structured outputs through API calls\nAdhere to brand voice for excellent customer-facing experiences such as copywriting and chatbotsCreate production-level code and operate (in-line code generation, debugging, and conversational querying) within complex codebasesBuild automatic translation features between languagesConduct complex financial forecastsSupport legal use cases that require high-quality technical analysis, long context windows for processing detailed documents, and fast outputs\nProcess and analyze visual input, such as extracting insights from charts and graphsGenerate code from images with code snippets or templates based on diagramsDescribe an image for a user with low vision\nInteract with external client-side tools and functions, allowing Claude to reason, plan, and execute actions by 
generating structured outputs through API calls\n \n\nSummary: \n Claude is a powerful AI model designed to assist enterprises with a wide range of capabilities, including text and code generation, language translation, financial forecasting, legal analysis, and visual processing. It can also interact with external tools and functions, allowing it to reason, plan, and execute actions through API calls. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n How can I use Claude to more easily digest the content of long PDF documents?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Text capabilities and use cases\n\nText\n Text capabilities and use cases\n\n\nClaude has a broad range of text-based capabilities, including but not limited to:\nCapabilityThis enables you to…Text SummarizationDistill lengthy content into key insights for executives, social media, or product teams.Content GenerationCraft compelling content from blog posts and emails to marketing slogans and product descriptions.Data / Entity ExtractionUncover structured insights from unstructured text like reviews, news articles, or transcripts.Question AnsweringBuild intelligent, interactive systems from customer support chatbots to educational AI tutors.Text TranslationSeamlessly communicate across languages in products, support, and content creation.Text Analysis & RecommendationsUnderstand sentiment, preferences, and patterns to personalize user experiences and offerings.Dialogue and ConversationCreate engaging, context-aware interactions in games, virtual assistants, and storytelling apps.Code Explanation & GenerationAccelerate development with instant code reviews, boilerplate generation, and interactive tutorials.\n \n\nSummary: \n Claude has a broad range of text-based capabilities, including text summarization, content generation, data/entity extraction, question answering, text translation, text analysis and recommendations, dialogue and conversation, and code explanation and generation. These capabilities enable a wide variety of use cases, from crafting compelling content to building intelligent interactive systems and accelerating software development. 
\n \n\n \n Claude Cookbooks\n\nText\n Claude Cookbooks\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n \n\nSummary: \n The Claude Cookbooks provides practical examples and hands-on tutorials, including how to upload PDFs and have Claude summarize their content, how to extend Claude's capabilities by integrating external tools and functions, and how to create and use embeddings with VoyageAI for advanced text similarity and search tasks. 
\n \n\n \n What you can do with Claude\n\nText\n What you can do with Claude\n\n\nClaude is designed to empower enterprises at scale with strong performance across benchmark evaluations for reasoning, math, coding, and fluency in English and non-English languages.\nHere’s a non-exhaustive list of Claude’s capabilities and common uses.\nCapabilityEnables you to…Text and code generationAdhere to brand voice for excellent customer-facing experiences such as copywriting and chatbotsCreate production-level code and operate (in-line code generation, debugging, and conversational querying) within complex codebasesBuild automatic translation features between languagesConduct complex financial forecastsSupport legal use cases that require high-quality technical analysis, long context windows for processing detailed documents, and fast outputsVisionProcess and analyze visual input, such as extracting insights from charts and graphsGenerate code from images with code snippets or templates based on diagramsDescribe an image for a user with low visionTool useInteract with external client-side tools and functions, allowing Claude to reason, plan, and execute actions by generating structured outputs through API calls\nAdhere to brand voice for excellent customer-facing experiences such as copywriting and chatbotsCreate production-level code and operate (in-line code generation, debugging, and conversational querying) within complex codebasesBuild automatic translation features between languagesConduct complex financial forecastsSupport legal use cases that require high-quality technical analysis, long context windows for processing detailed documents, and fast outputs\nProcess and analyze visual input, such as extracting insights from charts and graphsGenerate code from images with code snippets or templates based on diagramsDescribe an image for a user with low vision\nInteract with external client-side tools and functions, allowing Claude to reason, plan, and execute actions by 
generating structured outputs through API calls\n \n\nSummary: \n Claude is a powerful AI model designed to assist enterprises with a wide range of capabilities, including text and code generation, language translation, financial forecasting, legal analysis, and visual processing. It can also interact with external tools and functions, allowing it to reason, plan, and execute actions through API calls. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_level_two"
},
"vars": {
@@ -2471,7 +2471,7 @@
"correct_answer": "You can upload PDFs and have Claude summarize their content, making it easier to understand the key points of long documents without having to read through everything."
},
"response": {
- "output": "To use Claude to more easily digest the content of long PDF documents, you can leverage the PDF Upload & Summarization capability described in the Claude Cookbook. This allows you to upload PDF files and have Claude summarize the key insights, making it easier to quickly understand the content of lengthy documents.",
+ "output": "To use Claude to more easily digest the content of long PDF documents, you can leverage the PDF Upload & Summarization capability described in the Claude Cookbooks. This allows you to upload PDF files and have Claude summarize the key insights, making it easier to quickly understand the content of lengthy documents.",
"tokenUsage": {
"total": 1410,
"prompt": 1345,
@@ -2498,7 +2498,7 @@
{
"pass": true,
"score": 1,
- "reason": "The Generated Answer is correct based on the Correct Answer. Both answers convey the key point that Claude can be used to summarize PDF documents, making it easier to understand long documents without reading everything. The Generated Answer provides some additional context by mentioning the \"Claude Cookbook\", but this doesn't contradict or omit any critical information from the Correct Answer. The essence of both answers is the same: Claude can summarize PDFs to help users digest long documents more easily.",
+ "reason": "The Generated Answer is correct based on the Correct Answer. Both answers convey the key point that Claude can be used to summarize PDF documents, making it easier to understand long documents without reading everything. The Generated Answer provides some additional context by mentioning the \"Claude Cookbooks\", but this doesn't contradict or omit any critical information from the Correct Answer. The essence of both answers is the same: Claude can summarize PDFs to help users digest long documents more easily.",
"assertion": {
"type": "python",
"value": "file://eval_end_to_end.py"
@@ -2514,7 +2514,7 @@
"label": "3.5 Sonnet: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n How can I use Claude to more easily digest the content of long PDF documents?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nControlling Claude’s output\n\n\n\n\nText capabilities and use cases\n\n\nClaude has a broad range of text-based capabilities, including but not limited to:\nCapabilityThis enables you to…Text SummarizationDistill lengthy content into key insights for executives, social media, or product teams.Content GenerationCraft compelling content from blog posts and emails to marketing slogans and product descriptions.Data / Entity ExtractionUncover structured insights from unstructured text like reviews, news articles, or transcripts.Question AnsweringBuild intelligent, interactive systems from customer support chatbots to educational AI tutors.Text TranslationSeamlessly communicate across languages in products, support, and content creation.Text Analysis & RecommendationsUnderstand sentiment, preferences, and patterns to personalize user experiences and offerings.Dialogue and ConversationCreate engaging, context-aware interactions in games, virtual assistants, and storytelling apps.Code Explanation & GenerationAccelerate development with instant code reviews, boilerplate generation, and interactive tutorials.\n\n\nClaude Cookbook\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their 
content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n How can I use Claude to more easily digest the content of long PDF documents?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nControlling Claude’s output\n\n\n\n\nText capabilities and use cases\n\n\nClaude has a broad range of text-based capabilities, including but not limited to:\nCapabilityThis enables you to…Text SummarizationDistill lengthy content into key insights for executives, social media, or product teams.Content GenerationCraft compelling content from blog posts and emails to marketing slogans and product descriptions.Data / Entity ExtractionUncover structured insights from unstructured text like reviews, news articles, or transcripts.Question AnsweringBuild intelligent, interactive systems from customer support chatbots to educational AI tutors.Text TranslationSeamlessly communicate across languages in products, support, and content creation.Text Analysis & RecommendationsUnderstand sentiment, preferences, and patterns to personalize user experiences and offerings.Dialogue and ConversationCreate engaging, context-aware interactions in games, virtual assistants, and storytelling apps.Code Explanation & GenerationAccelerate development with instant code reviews, boilerplate generation, and interactive tutorials.\n\n\nClaude Cookbooks\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their 
content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_base"
},
"vars": {
@@ -2522,7 +2522,7 @@
"correct_answer": "You can upload PDFs and have Claude summarize their content, making it easier to understand the key points of long documents without having to read through everything."
},
"response": {
- "output": "To more easily digest the content of long PDF documents using Claude, you can take advantage of Claude's PDF upload and summarization capabilities. Here's how you can do it:\n\n1. Upload the PDF document to Claude. The system allows you to directly upload PDFs for Claude to analyze.\n\n2. Once uploaded, you can ask Claude to summarize the content of the PDF. Claude has the ability to distill lengthy content into key insights, making it easier for you to grasp the main points without reading the entire document.\n\n3. You can also ask specific questions about the content of the PDF. Claude's question-answering capabilities allow you to extract particular information or clarify certain points from the document.\n\n4. For even more structured insights, you can use Claude's data and entity extraction abilities to pull out specific types of information from the PDF, such as names, dates, or key statistics.\n\n5. If needed, you can ask Claude to provide analysis or recommendations based on the PDF content, helping you to understand sentiment, patterns, or implications within the document.\n\nBy leveraging these capabilities, you can quickly get the essence of long PDF documents, saving time and improving your understanding of the material. The Claude Cookbook mentioned in the context provides practical examples and tutorials on how to use these features effectively.",
+ "output": "To more easily digest the content of long PDF documents using Claude, you can take advantage of Claude's PDF upload and summarization capabilities. Here's how you can do it:\n\n1. Upload the PDF document to Claude. The system allows you to directly upload PDFs for Claude to analyze.\n\n2. Once uploaded, you can ask Claude to summarize the content of the PDF. Claude has the ability to distill lengthy content into key insights, making it easier for you to grasp the main points without reading the entire document.\n\n3. You can also ask specific questions about the content of the PDF. Claude's question-answering capabilities allow you to extract particular information or clarify certain points from the document.\n\n4. For even more structured insights, you can use Claude's data and entity extraction abilities to pull out specific types of information from the PDF, such as names, dates, or key statistics.\n\n5. If needed, you can ask Claude to provide analysis or recommendations based on the PDF content, helping you to understand sentiment, patterns, or implications within the document.\n\nBy leveraging these capabilities, you can quickly get the essence of long PDF documents, saving time and improving your understanding of the material. The Claude Cookbooks mentioned in the context provides practical examples and tutorials on how to use these features effectively.",
"tokenUsage": {
"total": 938,
"prompt": 664,
@@ -2565,7 +2565,7 @@
"label": "3.5 Sonnet: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n How can I use Claude to more easily digest the content of long PDF documents?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Text capabilities and use cases\n\nText\n Text capabilities and use cases\n\n\nClaude has a broad range of text-based capabilities, including but not limited to:\nCapabilityThis enables you to…Text SummarizationDistill lengthy content into key insights for executives, social media, or product teams.Content GenerationCraft compelling content from blog posts and emails to marketing slogans and product descriptions.Data / Entity ExtractionUncover structured insights from unstructured text like reviews, news articles, or transcripts.Question AnsweringBuild intelligent, interactive systems from customer support chatbots to educational AI tutors.Text TranslationSeamlessly communicate across languages in products, support, and content creation.Text Analysis & RecommendationsUnderstand sentiment, preferences, and patterns to personalize user experiences and offerings.Dialogue and ConversationCreate engaging, context-aware interactions in games, virtual assistants, and storytelling apps.Code Explanation & GenerationAccelerate development with instant code reviews, boilerplate generation, and interactive tutorials.\n \n\nSummary: \n Claude has a broad range of text-based capabilities, including text summarization, content generation, data/entity extraction, question answering, text translation, text analysis and recommendations, dialogue and conversation, and code explanation and generation. These capabilities enable a wide variety of use cases, from crafting compelling content to building intelligent interactive systems and accelerating software development. 
\n \n\n \n Claude Cookbook\n\nText\n Claude Cookbook\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n \n\nSummary: \n The Claude Cookbook provides practical examples and hands-on tutorials, including how to upload PDFs and have Claude summarize their content, how to extend Claude's capabilities by integrating external tools and functions, and how to create and use embeddings with VoyageAI for advanced text similarity and search tasks. 
\n \n\n \n What you can do with Claude\n\nText\n What you can do with Claude\n\n\nClaude is designed to empower enterprises at scale with strong performance across benchmark evaluations for reasoning, math, coding, and fluency in English and non-English languages.\nHere’s a non-exhaustive list of Claude’s capabilities and common uses.\nCapabilityEnables you to…Text and code generationAdhere to brand voice for excellent customer-facing experiences such as copywriting and chatbotsCreate production-level code and operate (in-line code generation, debugging, and conversational querying) within complex codebasesBuild automatic translation features between languagesConduct complex financial forecastsSupport legal use cases that require high-quality technical analysis, long context windows for processing detailed documents, and fast outputsVisionProcess and analyze visual input, such as extracting insights from charts and graphsGenerate code from images with code snippets or templates based on diagramsDescribe an image for a user with low visionTool useInteract with external client-side tools and functions, allowing Claude to reason, plan, and execute actions by generating structured outputs through API calls\nAdhere to brand voice for excellent customer-facing experiences such as copywriting and chatbotsCreate production-level code and operate (in-line code generation, debugging, and conversational querying) within complex codebasesBuild automatic translation features between languagesConduct complex financial forecastsSupport legal use cases that require high-quality technical analysis, long context windows for processing detailed documents, and fast outputs\nProcess and analyze visual input, such as extracting insights from charts and graphsGenerate code from images with code snippets or templates based on diagramsDescribe an image for a user with low vision\nInteract with external client-side tools and functions, allowing Claude to reason, plan, and execute actions by 
generating structured outputs through API calls\n \n\nSummary: \n Claude is a powerful AI model designed to assist enterprises with a wide range of capabilities, including text and code generation, language translation, financial forecasting, legal analysis, and visual processing. It can also interact with external tools and functions, allowing it to reason, plan, and execute actions through API calls. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n How can I use Claude to more easily digest the content of long PDF documents?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Text capabilities and use cases\n\nText\n Text capabilities and use cases\n\n\nClaude has a broad range of text-based capabilities, including but not limited to:\nCapabilityThis enables you to…Text SummarizationDistill lengthy content into key insights for executives, social media, or product teams.Content GenerationCraft compelling content from blog posts and emails to marketing slogans and product descriptions.Data / Entity ExtractionUncover structured insights from unstructured text like reviews, news articles, or transcripts.Question AnsweringBuild intelligent, interactive systems from customer support chatbots to educational AI tutors.Text TranslationSeamlessly communicate across languages in products, support, and content creation.Text Analysis & RecommendationsUnderstand sentiment, preferences, and patterns to personalize user experiences and offerings.Dialogue and ConversationCreate engaging, context-aware interactions in games, virtual assistants, and storytelling apps.Code Explanation & GenerationAccelerate development with instant code reviews, boilerplate generation, and interactive tutorials.\n \n\nSummary: \n Claude has a broad range of text-based capabilities, including text summarization, content generation, data/entity extraction, question answering, text translation, text analysis and recommendations, dialogue and conversation, and code explanation and generation. These capabilities enable a wide variety of use cases, from crafting compelling content to building intelligent interactive systems and accelerating software development. 
\n \n\n \n Claude Cookbooks\n\nText\n Claude Cookbooks\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n \n\nSummary: \n The Claude Cookbooks provides practical examples and hands-on tutorials, including how to upload PDFs and have Claude summarize their content, how to extend Claude's capabilities by integrating external tools and functions, and how to create and use embeddings with VoyageAI for advanced text similarity and search tasks. 
\n \n\n \n What you can do with Claude\n\nText\n What you can do with Claude\n\n\nClaude is designed to empower enterprises at scale with strong performance across benchmark evaluations for reasoning, math, coding, and fluency in English and non-English languages.\nHere’s a non-exhaustive list of Claude’s capabilities and common uses.\nCapabilityEnables you to…Text and code generationAdhere to brand voice for excellent customer-facing experiences such as copywriting and chatbotsCreate production-level code and operate (in-line code generation, debugging, and conversational querying) within complex codebasesBuild automatic translation features between languagesConduct complex financial forecastsSupport legal use cases that require high-quality technical analysis, long context windows for processing detailed documents, and fast outputsVisionProcess and analyze visual input, such as extracting insights from charts and graphsGenerate code from images with code snippets or templates based on diagramsDescribe an image for a user with low visionTool useInteract with external client-side tools and functions, allowing Claude to reason, plan, and execute actions by generating structured outputs through API calls\nAdhere to brand voice for excellent customer-facing experiences such as copywriting and chatbotsCreate production-level code and operate (in-line code generation, debugging, and conversational querying) within complex codebasesBuild automatic translation features between languagesConduct complex financial forecastsSupport legal use cases that require high-quality technical analysis, long context windows for processing detailed documents, and fast outputs\nProcess and analyze visual input, such as extracting insights from charts and graphsGenerate code from images with code snippets or templates based on diagramsDescribe an image for a user with low vision\nInteract with external client-side tools and functions, allowing Claude to reason, plan, and execute actions by 
generating structured outputs through API calls\n \n\nSummary: \n Claude is a powerful AI model designed to assist enterprises with a wide range of capabilities, including text and code generation, language translation, financial forecasting, legal analysis, and visual processing. It can also interact with external tools and functions, allowing it to reason, plan, and execute actions through API calls. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_level_two"
},
"vars": {
@@ -2662,7 +2662,7 @@
"label": "Haiku: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n How can I use Claude to more easily digest the content of long PDF documents?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Claude Cookbook\n\nClaude Cookbook\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n \n \n\n \n Text capabilities and use cases\n\nText capabilities and use cases\n\n\nClaude has a broad range of text-based capabilities, including but not limited to:\nCapabilityThis enables you to…Text SummarizationDistill lengthy content into key insights for executives, social media, or product teams.Content GenerationCraft compelling content from blog posts and emails 
to marketing slogans and product descriptions.Data / Entity ExtractionUncover structured insights from unstructured text like reviews, news articles, or transcripts.Question AnsweringBuild intelligent, interactive systems from customer support chatbots to educational AI tutors.Text TranslationSeamlessly communicate across languages in products, support, and content creation.Text Analysis & RecommendationsUnderstand sentiment, preferences, and patterns to personalize user experiences and offerings.Dialogue and ConversationCreate engaging, context-aware interactions in games, virtual assistants, and storytelling apps.Code Explanation & GenerationAccelerate development with instant code reviews, boilerplate generation, and interactive tutorials.\n \n \n\n \n More Resources\n\nMore Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. 
Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbook More Resources\nText capabilities and use casesClaude CookbookMore Resources\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n How can I use Claude to more easily digest the content of long PDF documents?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Claude Cookbooks\n\nClaude Cookbooks\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n \n \n\n \n Text capabilities and use cases\n\nText capabilities and use cases\n\n\nClaude has a broad range of text-based capabilities, including but not limited to:\nCapabilityThis enables you to…Text SummarizationDistill lengthy content into key insights for executives, social media, or product teams.Content GenerationCraft compelling content from blog posts and 
emails to marketing slogans and product descriptions.Data / Entity ExtractionUncover structured insights from unstructured text like reviews, news articles, or transcripts.Question AnsweringBuild intelligent, interactive systems from customer support chatbots to educational AI tutors.Text TranslationSeamlessly communicate across languages in products, support, and content creation.Text Analysis & RecommendationsUnderstand sentiment, preferences, and patterns to personalize user experiences and offerings.Dialogue and ConversationCreate engaging, context-aware interactions in games, virtual assistants, and storytelling apps.Code Explanation & GenerationAccelerate development with instant code reviews, boilerplate generation, and interactive tutorials.\n \n \n\n \n More Resources\n\nMore Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. 
Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbooks More Resources\nText capabilities and use casesClaude CookbooksMore Resources\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_level_three"
},
"vars": {
@@ -2670,7 +2670,7 @@
"correct_answer": "You can upload PDFs and have Claude summarize their content, making it easier to understand the key points of long documents without having to read through everything."
},
"response": {
- "output": "To use Claude to more easily digest the content of long PDF documents, you can follow these steps:\n\n1. Upload the PDF document to Claude using the \"PDF Upload & Summarization\" feature described in the Claude Cookbook. This will allow Claude to read and summarize the content of the PDF, making it easier to quickly understand the key points.\n\n2. Leverage Claude's text summarization capabilities to condense the lengthy PDF content into a concise summary. This can be done by providing the PDF document to Claude and requesting a high-level summary.\n\n3. If you need to dive deeper into specific sections of the PDF, you can use Claude's question answering abilities to ask targeted questions about the document's content. This will allow you to extract relevant information without having to read through the entire PDF.\n\n4. Additionally, you can use Claude's text analysis and recommendation features to identify key themes, sentiments, and insights within the PDF document, further aiding your understanding of the content.",
+ "output": "To use Claude to more easily digest the content of long PDF documents, you can follow these steps:\n\n1. Upload the PDF document to Claude using the \"PDF Upload & Summarization\" feature described in the Claude Cookbooks. This will allow Claude to read and summarize the content of the PDF, making it easier to quickly understand the key points.\n\n2. Leverage Claude's text summarization capabilities to condense the lengthy PDF content into a concise summary. This can be done by providing the PDF document to Claude and requesting a high-level summary.\n\n3. If you need to dive deeper into specific sections of the PDF, you can use Claude's question answering abilities to ask targeted questions about the document's content. This will allow you to extract relevant information without having to read through the entire PDF.\n\n4. Additionally, you can use Claude's text analysis and recommendation features to identify key themes, sentiments, and insights within the PDF document, further aiding your understanding of the content.",
"tokenUsage": {
"total": 1280,
"prompt": 1069,
@@ -2713,7 +2713,7 @@
"label": "3.5 Sonnet: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n How can I use Claude to more easily digest the content of long PDF documents?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Claude Cookbook\n\nClaude Cookbook\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n \n \n\n \n Text capabilities and use cases\n\nText capabilities and use cases\n\n\nClaude has a broad range of text-based capabilities, including but not limited to:\nCapabilityThis enables you to…Text SummarizationDistill lengthy content into key insights for executives, social media, or product teams.Content GenerationCraft compelling content from blog posts and emails 
to marketing slogans and product descriptions.Data / Entity ExtractionUncover structured insights from unstructured text like reviews, news articles, or transcripts.Question AnsweringBuild intelligent, interactive systems from customer support chatbots to educational AI tutors.Text TranslationSeamlessly communicate across languages in products, support, and content creation.Text Analysis & RecommendationsUnderstand sentiment, preferences, and patterns to personalize user experiences and offerings.Dialogue and ConversationCreate engaging, context-aware interactions in games, virtual assistants, and storytelling apps.Code Explanation & GenerationAccelerate development with instant code reviews, boilerplate generation, and interactive tutorials.\n \n \n\n \n More Resources\n\nMore Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. 
Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbook More Resources\nText capabilities and use casesClaude CookbookMore Resources\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n How can I use Claude to more easily digest the content of long PDF documents?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Claude Cookbooks\n\nClaude Cookbooks\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n \n \n\n \n Text capabilities and use cases\n\nText capabilities and use cases\n\n\nClaude has a broad range of text-based capabilities, including but not limited to:\nCapabilityThis enables you to…Text SummarizationDistill lengthy content into key insights for executives, social media, or product teams.Content GenerationCraft compelling content from blog posts and 
emails to marketing slogans and product descriptions.Data / Entity ExtractionUncover structured insights from unstructured text like reviews, news articles, or transcripts.Question AnsweringBuild intelligent, interactive systems from customer support chatbots to educational AI tutors.Text TranslationSeamlessly communicate across languages in products, support, and content creation.Text Analysis & RecommendationsUnderstand sentiment, preferences, and patterns to personalize user experiences and offerings.Dialogue and ConversationCreate engaging, context-aware interactions in games, virtual assistants, and storytelling apps.Code Explanation & GenerationAccelerate development with instant code reviews, boilerplate generation, and interactive tutorials.\n \n \n\n \n More Resources\n\nMore Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. 
Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbooks More Resources\nText capabilities and use casesClaude CookbooksMore Resources\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_level_three"
},
"vars": {
@@ -3065,7 +3065,7 @@
"label": "Haiku: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n How can we measure the performance of the ticket classification system implemented using Claude beyond just accuracy?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Evaluating the Performance of your Ticket Routing Classifier\n\nText\n Evaluating the Performance of your Ticket Routing Classifier\n\n\nBefore deploying your ticket routing classifier to production, it’s crucial to evaluate its performance in terms of accuracy, cost, and speed. These three factors determine the readiness of your new system and boost confidence in its real-world effectiveness. A thorough evaluation helps you convince both technical and business stakeholders of the appropriateness and impact of your solution.\n \n\nSummary: \n Evaluating the performance of a ticket routing classifier is crucial before deployment, as it determines the accuracy, cost, and speed of the system. A thorough evaluation helps convince stakeholders of the appropriateness and impact of the solution, boosting confidence in its real-world effectiveness. \n \n\n \n Advantages of Using Claude\n\nText\n Advantages of Using Claude\n\n\nTraditionally, multi-class classification techniques in Natural Language Processing (NLP) have been used to categorize support tickets. 
However, these methods require a very large training dataset, complex ontology design, and inflexible class definition.\nUsing Large Language Models (LLMs) like Claude, text classification for customer support ticket routing has become significantly more efficient and effective, addressing the limitations of traditional ML techniques:\nMinimal training data: Claude’s pre-trained language model can understand and classify tickets with just a few dozen labeled examples, greatly reducing the time and cost associated with data preparation.\nAdaptability to changing classes: As your product or customer needs evolve, Claude can easily adapt to changes in class definitions or the introduction of new classes without extensive relabeling of training data\nSimplified ontology design: Claude’s advanced language understanding capabilities allow it to accurately classify tickets based on their content and context, rather than relying on strict ontological structures.\nInterpretable reasoning: Claude can generate human-readable explanations for its classification decisions, providing interpretable reasoning that builds trust in the automation system and allow you to easily adapt the approach if needed\n \n\nSummary: \n Claude, Anthropic's large language model, offers significant advantages over traditional multi-class classification techniques for customer support ticket routing. It requires minimal training data, can easily adapt to changing class definitions, and simplifies ontology design, while providing interpretable reasoning for its classification decisions. \n \n\n \n Deploy your classifier\n\nText\n Deploy your classifier\n\n\nTo see code examples of how to use Claude for classification, check out the Classification Guide in the Claude Cookbook.\nOverviewTicket Routingxlinkedin\nOverviewTicket Routing\nxlinkedin\nWhen to use Claude for classification Establish your classification use case Implement Claude for classification 1. Build a strong input prompt 2. 
Develop your test cases 3. Run your eval Evaluation metrics Deploy your classifier\nWhen to use Claude for classificationEstablish your classification use caseImplement Claude for classification1. Build a strong input prompt2. Develop your test cases3. Run your evalEvaluation metricsDeploy your classifier\n \n\nSummary: \n Deploy your classifier: Check out the Classification Guide in the Claude Cookbook for code examples on using Claude for classification. The guide covers when to use Claude for classification, establishing your use case, implementing Claude, building prompts, developing test cases, running evaluations, and deploying your classifier. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n How can we measure the performance of the ticket classification system implemented using Claude beyond just accuracy?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Evaluating the Performance of your Ticket Routing Classifier\n\nText\n Evaluating the Performance of your Ticket Routing Classifier\n\n\nBefore deploying your ticket routing classifier to production, it’s crucial to evaluate its performance in terms of accuracy, cost, and speed. These three factors determine the readiness of your new system and boost confidence in its real-world effectiveness. A thorough evaluation helps you convince both technical and business stakeholders of the appropriateness and impact of your solution.\n \n\nSummary: \n Evaluating the performance of a ticket routing classifier is crucial before deployment, as it determines the accuracy, cost, and speed of the system. A thorough evaluation helps convince stakeholders of the appropriateness and impact of the solution, boosting confidence in its real-world effectiveness. \n \n\n \n Advantages of Using Claude\n\nText\n Advantages of Using Claude\n\n\nTraditionally, multi-class classification techniques in Natural Language Processing (NLP) have been used to categorize support tickets. 
However, these methods require a very large training dataset, complex ontology design, and inflexible class definition.\nUsing Large Language Models (LLMs) like Claude, text classification for customer support ticket routing has become significantly more efficient and effective, addressing the limitations of traditional ML techniques:\nMinimal training data: Claude’s pre-trained language model can understand and classify tickets with just a few dozen labeled examples, greatly reducing the time and cost associated with data preparation.\nAdaptability to changing classes: As your product or customer needs evolve, Claude can easily adapt to changes in class definitions or the introduction of new classes without extensive relabeling of training data\nSimplified ontology design: Claude’s advanced language understanding capabilities allow it to accurately classify tickets based on their content and context, rather than relying on strict ontological structures.\nInterpretable reasoning: Claude can generate human-readable explanations for its classification decisions, providing interpretable reasoning that builds trust in the automation system and allow you to easily adapt the approach if needed\n \n\nSummary: \n Claude, Anthropic's large language model, offers significant advantages over traditional multi-class classification techniques for customer support ticket routing. It requires minimal training data, can easily adapt to changing class definitions, and simplifies ontology design, while providing interpretable reasoning for its classification decisions. \n \n\n \n Deploy your classifier\n\nText\n Deploy your classifier\n\n\nTo see code examples of how to use Claude for classification, check out the Classification Guide in the Claude Cookbooks.\nOverviewTicket Routingxlinkedin\nOverviewTicket Routing\nxlinkedin\nWhen to use Claude for classification Establish your classification use case Implement Claude for classification 1. Build a strong input prompt 2. 
Develop your test cases 3. Run your eval Evaluation metrics Deploy your classifier\nWhen to use Claude for classificationEstablish your classification use caseImplement Claude for classification1. Build a strong input prompt2. Develop your test cases3. Run your evalEvaluation metricsDeploy your classifier\n \n\nSummary: \n Deploy your classifier: Check out the Classification Guide in the Claude Cookbooks for code examples on using Claude for classification. The guide covers when to use Claude for classification, establishing your use case, implementing Claude, building prompts, developing test cases, running evaluations, and deploying your classifier. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_level_two"
},
"vars": {
@@ -3169,7 +3169,7 @@
"label": "3.5 Sonnet: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n How can we measure the performance of the ticket classification system implemented using Claude beyond just accuracy?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Evaluating the Performance of your Ticket Routing Classifier\n\nText\n Evaluating the Performance of your Ticket Routing Classifier\n\n\nBefore deploying your ticket routing classifier to production, it’s crucial to evaluate its performance in terms of accuracy, cost, and speed. These three factors determine the readiness of your new system and boost confidence in its real-world effectiveness. A thorough evaluation helps you convince both technical and business stakeholders of the appropriateness and impact of your solution.\n \n\nSummary: \n Evaluating the performance of a ticket routing classifier is crucial before deployment, as it determines the accuracy, cost, and speed of the system. A thorough evaluation helps convince stakeholders of the appropriateness and impact of the solution, boosting confidence in its real-world effectiveness. \n \n\n \n Advantages of Using Claude\n\nText\n Advantages of Using Claude\n\n\nTraditionally, multi-class classification techniques in Natural Language Processing (NLP) have been used to categorize support tickets. 
However, these methods require a very large training dataset, complex ontology design, and inflexible class definition.\nUsing Large Language Models (LLMs) like Claude, text classification for customer support ticket routing has become significantly more efficient and effective, addressing the limitations of traditional ML techniques:\nMinimal training data: Claude’s pre-trained language model can understand and classify tickets with just a few dozen labeled examples, greatly reducing the time and cost associated with data preparation.\nAdaptability to changing classes: As your product or customer needs evolve, Claude can easily adapt to changes in class definitions or the introduction of new classes without extensive relabeling of training data\nSimplified ontology design: Claude’s advanced language understanding capabilities allow it to accurately classify tickets based on their content and context, rather than relying on strict ontological structures.\nInterpretable reasoning: Claude can generate human-readable explanations for its classification decisions, providing interpretable reasoning that builds trust in the automation system and allow you to easily adapt the approach if needed\n \n\nSummary: \n Claude, Anthropic's large language model, offers significant advantages over traditional multi-class classification techniques for customer support ticket routing. It requires minimal training data, can easily adapt to changing class definitions, and simplifies ontology design, while providing interpretable reasoning for its classification decisions. \n \n\n \n Deploy your classifier\n\nText\n Deploy your classifier\n\n\nTo see code examples of how to use Claude for classification, check out the Classification Guide in the Claude Cookbook.\nOverviewTicket Routingxlinkedin\nOverviewTicket Routing\nxlinkedin\nWhen to use Claude for classification Establish your classification use case Implement Claude for classification 1. Build a strong input prompt 2. 
Develop your test cases 3. Run your eval Evaluation metrics Deploy your classifier\nWhen to use Claude for classificationEstablish your classification use caseImplement Claude for classification1. Build a strong input prompt2. Develop your test cases3. Run your evalEvaluation metricsDeploy your classifier\n \n\nSummary: \n Deploy your classifier: Check out the Classification Guide in the Claude Cookbook for code examples on using Claude for classification. The guide covers when to use Claude for classification, establishing your use case, implementing Claude, building prompts, developing test cases, running evaluations, and deploying your classifier. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n How can we measure the performance of the ticket classification system implemented using Claude beyond just accuracy?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Evaluating the Performance of your Ticket Routing Classifier\n\nText\n Evaluating the Performance of your Ticket Routing Classifier\n\n\nBefore deploying your ticket routing classifier to production, it’s crucial to evaluate its performance in terms of accuracy, cost, and speed. These three factors determine the readiness of your new system and boost confidence in its real-world effectiveness. A thorough evaluation helps you convince both technical and business stakeholders of the appropriateness and impact of your solution.\n \n\nSummary: \n Evaluating the performance of a ticket routing classifier is crucial before deployment, as it determines the accuracy, cost, and speed of the system. A thorough evaluation helps convince stakeholders of the appropriateness and impact of the solution, boosting confidence in its real-world effectiveness. \n \n\n \n Advantages of Using Claude\n\nText\n Advantages of Using Claude\n\n\nTraditionally, multi-class classification techniques in Natural Language Processing (NLP) have been used to categorize support tickets. 
However, these methods require a very large training dataset, complex ontology design, and inflexible class definition.\nUsing Large Language Models (LLMs) like Claude, text classification for customer support ticket routing has become significantly more efficient and effective, addressing the limitations of traditional ML techniques:\nMinimal training data: Claude’s pre-trained language model can understand and classify tickets with just a few dozen labeled examples, greatly reducing the time and cost associated with data preparation.\nAdaptability to changing classes: As your product or customer needs evolve, Claude can easily adapt to changes in class definitions or the introduction of new classes without extensive relabeling of training data\nSimplified ontology design: Claude’s advanced language understanding capabilities allow it to accurately classify tickets based on their content and context, rather than relying on strict ontological structures.\nInterpretable reasoning: Claude can generate human-readable explanations for its classification decisions, providing interpretable reasoning that builds trust in the automation system and allow you to easily adapt the approach if needed\n \n\nSummary: \n Claude, Anthropic's large language model, offers significant advantages over traditional multi-class classification techniques for customer support ticket routing. It requires minimal training data, can easily adapt to changing class definitions, and simplifies ontology design, while providing interpretable reasoning for its classification decisions. \n \n\n \n Deploy your classifier\n\nText\n Deploy your classifier\n\n\nTo see code examples of how to use Claude for classification, check out the Classification Guide in the Claude Cookbooks.\nOverviewTicket Routingxlinkedin\nOverviewTicket Routing\nxlinkedin\nWhen to use Claude for classification Establish your classification use case Implement Claude for classification 1. Build a strong input prompt 2. 
Develop your test cases 3. Run your eval Evaluation metrics Deploy your classifier\nWhen to use Claude for classificationEstablish your classification use caseImplement Claude for classification1. Build a strong input prompt2. Develop your test cases3. Run your evalEvaluation metricsDeploy your classifier\n \n\nSummary: \n Deploy your classifier: Check out the Classification Guide in the Claude Cookbooks for code examples on using Claude for classification. The guide covers when to use Claude for classification, establishing your use case, implementing Claude, building prompts, developing test cases, running evaluations, and deploying your classifier. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_level_two"
},
"vars": {
@@ -4078,7 +4078,7 @@
"label": "Haiku: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n Before starting to engineer and improve a prompt in Claude, what key things does Anthropic recommend you have in place first?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nImplementing Claude\n\n\n1Scope your use case\nIdentify a problem to solve or tasks to automate with Claude.\nDefine requirements: features, performance, and cost.\n2Design your integration\nSelect Claude’s capabilities (e.g., vision, tool use) and models (Opus, Sonnet, Haiku) based on needs.\nChoose a deployment method, such as the Claude API, AWS Bedrock, or Vertex AI.\n3Prepare your data\nIdentify and clean relevant data (databases, code repos, knowledge bases) for Claude’s context.\n4Develop your prompts\nUse Workbench to create evals, draft prompts, and iteratively refine based on test results.\nDeploy polished prompts and monitor real-world performance for further refinement.\n5Implement Claude\nSet up your environment, integrate Claude with your systems (APIs, databases, UIs), and define human-in-the-loop requirements.\n6Test your system\nConduct red teaming for potential misuse and A/B test improvements.\n7Deploy to production\nOnce your application runs smoothly end-to-end, deploy to production.\n8Monitor and improve\nMonitor performance and effectiveness to make ongoing improvements.\n1Scope your use case\nIdentify a problem to solve or tasks to automate with Claude.\nDefine requirements: features, performance, and cost.\n\n1\n1\nScope your use case Identify a problem to solve or tasks to automate with Claude. 
Define requirements: features, performance, and cost.\nScope your use case\nIdentify a problem to solve or tasks to automate with Claude.\nDefine requirements: features, performance, and cost.\nIdentify a problem to solve or tasks to automate with Claude.\nDefine requirements: features, performance, and cost.\n2Design your integration\nSelect Claude’s capabilities (e.g., vision, tool use) and models (Opus, Sonnet, Haiku) based on needs.\nChoose a deployment method, such as the Claude API, AWS Bedrock, or Vertex AI.\n\n2\n2\nDesign your integration Select Claude’s capabilities (e.g., vision, tool use) and models (Opus, Sonnet, Haiku) based on needs. Choose a deployment method, such as the Claude API, AWS Bedrock, or Vertex AI.\nDesign your integration\nSelect Claude’s capabilities (e.g., vision, tool use) and models (Opus, Sonnet, Haiku) based on needs.\nChoose a deployment method, such as the Claude API, AWS Bedrock, or Vertex AI.\nSelect Claude’s capabilities (e.g., vision, tool use) and models (Opus, Sonnet, Haiku) based on needs.\nChoose a deployment method, such as the Claude API, AWS Bedrock, or Vertex AI.\n3Prepare your data\nIdentify and clean relevant data (databases, code repos, knowledge bases) for Claude’s context.\n\n3\n3\nPrepare your data Identify and clean relevant data (databases, code repos, knowledge bases) for Claude’s context.\nPrepare your data\nIdentify and clean relevant data (databases, code repos, knowledge bases) for Claude’s context.\nIdentify and clean relevant data (databases, code repos, knowledge bases) for Claude’s context.\n4Develop your prompts\nUse Workbench to create evals, draft prompts, and iteratively refine based on test results.\nDeploy polished prompts and monitor real-world performance for further refinement.\n\n4\n4\nDevelop your prompts Use Workbench to create evals, draft prompts, and iteratively refine based on test results. 
Deploy polished prompts and monitor real-world performance for further refinement.\nDevelop your prompts\nUse Workbench to create evals, draft prompts, and iteratively refine based on test results.\nDeploy polished prompts and monitor real-world performance for further refinement.\nUse Workbench to create evals, draft prompts, and iteratively refine based on test results.\nDeploy polished prompts and monitor real-world performance for further refinement.\n5Implement Claude\nSet up your environment, integrate Claude with your systems (APIs, databases, UIs), and define human-in-the-loop requirements.\n\n5\n5\nImplement Claude Set up your environment, integrate Claude with your systems (APIs, databases, UIs), and define human-in-the-loop requirements.\nImplement Claude\nSet up your environment, integrate Claude with your systems (APIs, databases, UIs), and define human-in-the-loop requirements.\nSet up your environment, integrate Claude with your systems (APIs, databases, UIs), and define human-in-the-loop requirements.\n6Test your system\nConduct red teaming for potential misuse and A/B test improvements.\n\n6\n6\nTest your system Conduct red teaming for potential misuse and A/B test improvements.\nTest your system\nConduct red teaming for potential misuse and A/B test improvements.\nConduct red teaming for potential misuse and A/B test improvements.\n7Deploy to production\nOnce your application runs smoothly end-to-end, deploy to production.\n\n7\n7\nDeploy to production Once your application runs smoothly end-to-end, deploy to production.\nDeploy to production\nOnce your application runs smoothly end-to-end, deploy to production.\nOnce your application runs smoothly end-to-end, deploy to production.\n8Monitor and improve\nMonitor performance and effectiveness to make ongoing improvements.\n\n8\n8\nMonitor and improve Monitor performance and effectiveness to make ongoing improvements.\nMonitor and improve\nMonitor performance and effectiveness to make ongoing 
improvements.\nMonitor performance and effectiveness to make ongoing improvements.\n\n\nHow to prompt engineer\n\n\nThe prompt engineering pages in this section have been organized from most broadly effective techniques to more specialized techniques. When troubleshooting performance, we suggest you try these techniques in order, although the actual impact of each technique will depend on our use case.\nPrompt generator\nBe clear and direct\nUse examples (multishot)\nLet Claude think (chain of thought)\nUse XML tags\nGive Claude a role (system prompts)\nPrefill Claude’s response\nChain complex prompts\nLong context tips\n\n\nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbook for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 Family Enterprise considerations Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n Before starting to engineer and improve a prompt in Claude, what key things does Anthropic recommend you have in place first?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nImplementing Claude\n\n\n1Scope your use case\nIdentify a problem to solve or tasks to automate with Claude.\nDefine requirements: features, performance, and cost.\n2Design your integration\nSelect Claude’s capabilities (e.g., vision, tool use) and models (Opus, Sonnet, Haiku) based on needs.\nChoose a deployment method, such as the Claude API, AWS Bedrock, or Vertex AI.\n3Prepare your data\nIdentify and clean relevant data (databases, code repos, knowledge bases) for Claude’s context.\n4Develop your prompts\nUse Workbench to create evals, draft prompts, and iteratively refine based on test results.\nDeploy polished prompts and monitor real-world performance for further refinement.\n5Implement Claude\nSet up your environment, integrate Claude with your systems (APIs, databases, UIs), and define human-in-the-loop requirements.\n6Test your system\nConduct red teaming for potential misuse and A/B test improvements.\n7Deploy to production\nOnce your application runs smoothly end-to-end, deploy to production.\n8Monitor and improve\nMonitor performance and effectiveness to make ongoing improvements.\n1Scope your use case\nIdentify a problem to solve or tasks to automate with Claude.\nDefine requirements: features, performance, and cost.\n\n1\n1\nScope your use case Identify a problem to solve or tasks to automate with Claude. 
Define requirements: features, performance, and cost.\nScope your use case\nIdentify a problem to solve or tasks to automate with Claude.\nDefine requirements: features, performance, and cost.\nIdentify a problem to solve or tasks to automate with Claude.\nDefine requirements: features, performance, and cost.\n2Design your integration\nSelect Claude’s capabilities (e.g., vision, tool use) and models (Opus, Sonnet, Haiku) based on needs.\nChoose a deployment method, such as the Claude API, AWS Bedrock, or Vertex AI.\n\n2\n2\nDesign your integration Select Claude’s capabilities (e.g., vision, tool use) and models (Opus, Sonnet, Haiku) based on needs. Choose a deployment method, such as the Claude API, AWS Bedrock, or Vertex AI.\nDesign your integration\nSelect Claude’s capabilities (e.g., vision, tool use) and models (Opus, Sonnet, Haiku) based on needs.\nChoose a deployment method, such as the Claude API, AWS Bedrock, or Vertex AI.\nSelect Claude’s capabilities (e.g., vision, tool use) and models (Opus, Sonnet, Haiku) based on needs.\nChoose a deployment method, such as the Claude API, AWS Bedrock, or Vertex AI.\n3Prepare your data\nIdentify and clean relevant data (databases, code repos, knowledge bases) for Claude’s context.\n\n3\n3\nPrepare your data Identify and clean relevant data (databases, code repos, knowledge bases) for Claude’s context.\nPrepare your data\nIdentify and clean relevant data (databases, code repos, knowledge bases) for Claude’s context.\nIdentify and clean relevant data (databases, code repos, knowledge bases) for Claude’s context.\n4Develop your prompts\nUse Workbench to create evals, draft prompts, and iteratively refine based on test results.\nDeploy polished prompts and monitor real-world performance for further refinement.\n\n4\n4\nDevelop your prompts Use Workbench to create evals, draft prompts, and iteratively refine based on test results. 
Deploy polished prompts and monitor real-world performance for further refinement.\nDevelop your prompts\nUse Workbench to create evals, draft prompts, and iteratively refine based on test results.\nDeploy polished prompts and monitor real-world performance for further refinement.\nUse Workbench to create evals, draft prompts, and iteratively refine based on test results.\nDeploy polished prompts and monitor real-world performance for further refinement.\n5Implement Claude\nSet up your environment, integrate Claude with your systems (APIs, databases, UIs), and define human-in-the-loop requirements.\n\n5\n5\nImplement Claude Set up your environment, integrate Claude with your systems (APIs, databases, UIs), and define human-in-the-loop requirements.\nImplement Claude\nSet up your environment, integrate Claude with your systems (APIs, databases, UIs), and define human-in-the-loop requirements.\nSet up your environment, integrate Claude with your systems (APIs, databases, UIs), and define human-in-the-loop requirements.\n6Test your system\nConduct red teaming for potential misuse and A/B test improvements.\n\n6\n6\nTest your system Conduct red teaming for potential misuse and A/B test improvements.\nTest your system\nConduct red teaming for potential misuse and A/B test improvements.\nConduct red teaming for potential misuse and A/B test improvements.\n7Deploy to production\nOnce your application runs smoothly end-to-end, deploy to production.\n\n7\n7\nDeploy to production Once your application runs smoothly end-to-end, deploy to production.\nDeploy to production\nOnce your application runs smoothly end-to-end, deploy to production.\nOnce your application runs smoothly end-to-end, deploy to production.\n8Monitor and improve\nMonitor performance and effectiveness to make ongoing improvements.\n\n8\n8\nMonitor and improve Monitor performance and effectiveness to make ongoing improvements.\nMonitor and improve\nMonitor performance and effectiveness to make ongoing 
improvements.\nMonitor performance and effectiveness to make ongoing improvements.\n\n\nHow to prompt engineer\n\n\nThe prompt engineering pages in this section have been organized from most broadly effective techniques to more specialized techniques. When troubleshooting performance, we suggest you try these techniques in order, although the actual impact of each technique will depend on our use case.\nPrompt generator\nBe clear and direct\nUse examples (multishot)\nLet Claude think (chain of thought)\nUse XML tags\nGive Claude a role (system prompts)\nPrefill Claude’s response\nChain complex prompts\nLong context tips\n\n\nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbooks for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 Family Enterprise considerations Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_base"
},
"vars": {
@@ -4380,7 +4380,7 @@
"label": "3.5 Sonnet: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n Before starting to engineer and improve a prompt in Claude, what key things does Anthropic recommend you have in place first?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nImplementing Claude\n\n\n1Scope your use case\nIdentify a problem to solve or tasks to automate with Claude.\nDefine requirements: features, performance, and cost.\n2Design your integration\nSelect Claude’s capabilities (e.g., vision, tool use) and models (Opus, Sonnet, Haiku) based on needs.\nChoose a deployment method, such as the Claude API, AWS Bedrock, or Vertex AI.\n3Prepare your data\nIdentify and clean relevant data (databases, code repos, knowledge bases) for Claude’s context.\n4Develop your prompts\nUse Workbench to create evals, draft prompts, and iteratively refine based on test results.\nDeploy polished prompts and monitor real-world performance for further refinement.\n5Implement Claude\nSet up your environment, integrate Claude with your systems (APIs, databases, UIs), and define human-in-the-loop requirements.\n6Test your system\nConduct red teaming for potential misuse and A/B test improvements.\n7Deploy to production\nOnce your application runs smoothly end-to-end, deploy to production.\n8Monitor and improve\nMonitor performance and effectiveness to make ongoing improvements.\n1Scope your use case\nIdentify a problem to solve or tasks to automate with Claude.\nDefine requirements: features, performance, and cost.\n\n1\n1\nScope your use case Identify a problem to solve or tasks to automate with Claude. 
Define requirements: features, performance, and cost.\nScope your use case\nIdentify a problem to solve or tasks to automate with Claude.\nDefine requirements: features, performance, and cost.\nIdentify a problem to solve or tasks to automate with Claude.\nDefine requirements: features, performance, and cost.\n2Design your integration\nSelect Claude’s capabilities (e.g., vision, tool use) and models (Opus, Sonnet, Haiku) based on needs.\nChoose a deployment method, such as the Claude API, AWS Bedrock, or Vertex AI.\n\n2\n2\nDesign your integration Select Claude’s capabilities (e.g., vision, tool use) and models (Opus, Sonnet, Haiku) based on needs. Choose a deployment method, such as the Claude API, AWS Bedrock, or Vertex AI.\nDesign your integration\nSelect Claude’s capabilities (e.g., vision, tool use) and models (Opus, Sonnet, Haiku) based on needs.\nChoose a deployment method, such as the Claude API, AWS Bedrock, or Vertex AI.\nSelect Claude’s capabilities (e.g., vision, tool use) and models (Opus, Sonnet, Haiku) based on needs.\nChoose a deployment method, such as the Claude API, AWS Bedrock, or Vertex AI.\n3Prepare your data\nIdentify and clean relevant data (databases, code repos, knowledge bases) for Claude’s context.\n\n3\n3\nPrepare your data Identify and clean relevant data (databases, code repos, knowledge bases) for Claude’s context.\nPrepare your data\nIdentify and clean relevant data (databases, code repos, knowledge bases) for Claude’s context.\nIdentify and clean relevant data (databases, code repos, knowledge bases) for Claude’s context.\n4Develop your prompts\nUse Workbench to create evals, draft prompts, and iteratively refine based on test results.\nDeploy polished prompts and monitor real-world performance for further refinement.\n\n4\n4\nDevelop your prompts Use Workbench to create evals, draft prompts, and iteratively refine based on test results. 
Deploy polished prompts and monitor real-world performance for further refinement.\nDevelop your prompts\nUse Workbench to create evals, draft prompts, and iteratively refine based on test results.\nDeploy polished prompts and monitor real-world performance for further refinement.\nUse Workbench to create evals, draft prompts, and iteratively refine based on test results.\nDeploy polished prompts and monitor real-world performance for further refinement.\n5Implement Claude\nSet up your environment, integrate Claude with your systems (APIs, databases, UIs), and define human-in-the-loop requirements.\n\n5\n5\nImplement Claude Set up your environment, integrate Claude with your systems (APIs, databases, UIs), and define human-in-the-loop requirements.\nImplement Claude\nSet up your environment, integrate Claude with your systems (APIs, databases, UIs), and define human-in-the-loop requirements.\nSet up your environment, integrate Claude with your systems (APIs, databases, UIs), and define human-in-the-loop requirements.\n6Test your system\nConduct red teaming for potential misuse and A/B test improvements.\n\n6\n6\nTest your system Conduct red teaming for potential misuse and A/B test improvements.\nTest your system\nConduct red teaming for potential misuse and A/B test improvements.\nConduct red teaming for potential misuse and A/B test improvements.\n7Deploy to production\nOnce your application runs smoothly end-to-end, deploy to production.\n\n7\n7\nDeploy to production Once your application runs smoothly end-to-end, deploy to production.\nDeploy to production\nOnce your application runs smoothly end-to-end, deploy to production.\nOnce your application runs smoothly end-to-end, deploy to production.\n8Monitor and improve\nMonitor performance and effectiveness to make ongoing improvements.\n\n8\n8\nMonitor and improve Monitor performance and effectiveness to make ongoing improvements.\nMonitor and improve\nMonitor performance and effectiveness to make ongoing 
improvements.\nMonitor performance and effectiveness to make ongoing improvements.\n\n\nHow to prompt engineer\n\n\nThe prompt engineering pages in this section have been organized from most broadly effective techniques to more specialized techniques. When troubleshooting performance, we suggest you try these techniques in order, although the actual impact of each technique will depend on our use case.\nPrompt generator\nBe clear and direct\nUse examples (multishot)\nLet Claude think (chain of thought)\nUse XML tags\nGive Claude a role (system prompts)\nPrefill Claude’s response\nChain complex prompts\nLong context tips\n\n\nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbook for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 Family Enterprise considerations Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n Before starting to engineer and improve a prompt in Claude, what key things does Anthropic recommend you have in place first?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nImplementing Claude\n\n\n1Scope your use case\nIdentify a problem to solve or tasks to automate with Claude.\nDefine requirements: features, performance, and cost.\n2Design your integration\nSelect Claude’s capabilities (e.g., vision, tool use) and models (Opus, Sonnet, Haiku) based on needs.\nChoose a deployment method, such as the Claude API, AWS Bedrock, or Vertex AI.\n3Prepare your data\nIdentify and clean relevant data (databases, code repos, knowledge bases) for Claude’s context.\n4Develop your prompts\nUse Workbench to create evals, draft prompts, and iteratively refine based on test results.\nDeploy polished prompts and monitor real-world performance for further refinement.\n5Implement Claude\nSet up your environment, integrate Claude with your systems (APIs, databases, UIs), and define human-in-the-loop requirements.\n6Test your system\nConduct red teaming for potential misuse and A/B test improvements.\n7Deploy to production\nOnce your application runs smoothly end-to-end, deploy to production.\n8Monitor and improve\nMonitor performance and effectiveness to make ongoing improvements.\n1Scope your use case\nIdentify a problem to solve or tasks to automate with Claude.\nDefine requirements: features, performance, and cost.\n\n1\n1\nScope your use case Identify a problem to solve or tasks to automate with Claude. 
Define requirements: features, performance, and cost.\nScope your use case\nIdentify a problem to solve or tasks to automate with Claude.\nDefine requirements: features, performance, and cost.\nIdentify a problem to solve or tasks to automate with Claude.\nDefine requirements: features, performance, and cost.\n2Design your integration\nSelect Claude’s capabilities (e.g., vision, tool use) and models (Opus, Sonnet, Haiku) based on needs.\nChoose a deployment method, such as the Claude API, AWS Bedrock, or Vertex AI.\n\n2\n2\nDesign your integration Select Claude’s capabilities (e.g., vision, tool use) and models (Opus, Sonnet, Haiku) based on needs. Choose a deployment method, such as the Claude API, AWS Bedrock, or Vertex AI.\nDesign your integration\nSelect Claude’s capabilities (e.g., vision, tool use) and models (Opus, Sonnet, Haiku) based on needs.\nChoose a deployment method, such as the Claude API, AWS Bedrock, or Vertex AI.\nSelect Claude’s capabilities (e.g., vision, tool use) and models (Opus, Sonnet, Haiku) based on needs.\nChoose a deployment method, such as the Claude API, AWS Bedrock, or Vertex AI.\n3Prepare your data\nIdentify and clean relevant data (databases, code repos, knowledge bases) for Claude’s context.\n\n3\n3\nPrepare your data Identify and clean relevant data (databases, code repos, knowledge bases) for Claude’s context.\nPrepare your data\nIdentify and clean relevant data (databases, code repos, knowledge bases) for Claude’s context.\nIdentify and clean relevant data (databases, code repos, knowledge bases) for Claude’s context.\n4Develop your prompts\nUse Workbench to create evals, draft prompts, and iteratively refine based on test results.\nDeploy polished prompts and monitor real-world performance for further refinement.\n\n4\n4\nDevelop your prompts Use Workbench to create evals, draft prompts, and iteratively refine based on test results. 
Deploy polished prompts and monitor real-world performance for further refinement.\nDevelop your prompts\nUse Workbench to create evals, draft prompts, and iteratively refine based on test results.\nDeploy polished prompts and monitor real-world performance for further refinement.\nUse Workbench to create evals, draft prompts, and iteratively refine based on test results.\nDeploy polished prompts and monitor real-world performance for further refinement.\n5Implement Claude\nSet up your environment, integrate Claude with your systems (APIs, databases, UIs), and define human-in-the-loop requirements.\n\n5\n5\nImplement Claude Set up your environment, integrate Claude with your systems (APIs, databases, UIs), and define human-in-the-loop requirements.\nImplement Claude\nSet up your environment, integrate Claude with your systems (APIs, databases, UIs), and define human-in-the-loop requirements.\nSet up your environment, integrate Claude with your systems (APIs, databases, UIs), and define human-in-the-loop requirements.\n6Test your system\nConduct red teaming for potential misuse and A/B test improvements.\n\n6\n6\nTest your system Conduct red teaming for potential misuse and A/B test improvements.\nTest your system\nConduct red teaming for potential misuse and A/B test improvements.\nConduct red teaming for potential misuse and A/B test improvements.\n7Deploy to production\nOnce your application runs smoothly end-to-end, deploy to production.\n\n7\n7\nDeploy to production Once your application runs smoothly end-to-end, deploy to production.\nDeploy to production\nOnce your application runs smoothly end-to-end, deploy to production.\nOnce your application runs smoothly end-to-end, deploy to production.\n8Monitor and improve\nMonitor performance and effectiveness to make ongoing improvements.\n\n8\n8\nMonitor and improve Monitor performance and effectiveness to make ongoing improvements.\nMonitor and improve\nMonitor performance and effectiveness to make ongoing 
improvements.\nMonitor performance and effectiveness to make ongoing improvements.\n\n\nHow to prompt engineer\n\n\nThe prompt engineering pages in this section have been organized from most broadly effective techniques to more specialized techniques. When troubleshooting performance, we suggest you try these techniques in order, although the actual impact of each technique will depend on our use case.\nPrompt generator\nBe clear and direct\nUse examples (multishot)\nLet Claude think (chain of thought)\nUse XML tags\nGive Claude a role (system prompts)\nPrefill Claude’s response\nChain complex prompts\nLong context tips\n\n\nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbooks for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 Family Enterprise considerations Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_base"
},
"vars": {
@@ -10199,7 +10199,7 @@
"label": "Haiku: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n In what order did Anthropic launch Claude.ai and the Claude iOS app in Canada and Europe?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nMay 13th, 2024\n\n\nClaude.ai and our iOS app are now available in Europe. Learn more in our Europe launch announcement.\n\n\nJune 5th, 2024\n\n\nClaude.ai, our API, and iOS app are now available in Canada. Learn more in our Canada launch announcement.\n\n\nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbook for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 Family Enterprise considerations Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n In what order did Anthropic launch Claude.ai and the Claude iOS app in Canada and Europe?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nMay 13th, 2024\n\n\nClaude.ai and our iOS app are now available in Europe. Learn more in our Europe launch announcement.\n\n\nJune 5th, 2024\n\n\nClaude.ai, our API, and iOS app are now available in Canada. Learn more in our Canada launch announcement.\n\n\nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbooks for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 Family Enterprise considerations Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_base"
},
"vars": {
@@ -10295,7 +10295,7 @@
"label": "3.5 Sonnet: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n In what order did Anthropic launch Claude.ai and the Claude iOS app in Canada and Europe?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nMay 13th, 2024\n\n\nClaude.ai and our iOS app are now available in Europe. Learn more in our Europe launch announcement.\n\n\nJune 5th, 2024\n\n\nClaude.ai, our API, and iOS app are now available in Canada. Learn more in our Canada launch announcement.\n\n\nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbook for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 Family Enterprise considerations Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n In what order did Anthropic launch Claude.ai and the Claude iOS app in Canada and Europe?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nMay 13th, 2024\n\n\nClaude.ai and our iOS app are now available in Europe. Learn more in our Europe launch announcement.\n\n\nJune 5th, 2024\n\n\nClaude.ai, our API, and iOS app are now available in Canada. Learn more in our Canada launch announcement.\n\n\nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbooks for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 Family Enterprise considerations Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_base"
},
"vars": {
@@ -12252,7 +12252,7 @@
"label": "Haiku: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n How can you guide Claude's response by pre-filling part of the response, and what API parameter is used to generate a short response in this case?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nHow to prefill Claude’s response\n\n\nTo prefill, include the desired initial text in the Assistant message (Claude’s response will continue from where the Assistant message leaves off):\nimport anthropic\n\nclient = anthropic.Anthropic()\nresponse = client.messages.create(\n model=\"claude-3-5-sonnet-20240620\",\n max_tokens=1024,\n messages=[\n {\"role\": \"user\", \"content\": \"What is your favorite color?\"},\n {\"role\": \"assistant\", \"content\": \"As an AI assistant, I don't have a favorite color, But if I had to pick, it would be green because\"} # Prefill here\n ]\n)\nimport anthropic\n\nclient = anthropic.Anthropic()\nresponse = client.messages.create(\n model=\"claude-3-5-sonnet-20240620\",\n max_tokens=1024,\n messages=[\n {\"role\": \"user\", \"content\": \"What is your favorite color?\"},\n {\"role\": \"assistant\", \"content\": \"As an AI assistant, I don't have a favorite color, But if I had to pick, it would be green because\"} # Prefill here\n ]\n)\nimport anthropic\n\nclient = anthropic.Anthropic()\nresponse = client.messages.create(\n model=\"claude-3-5-sonnet-20240620\",\n max_tokens=1024,\n messages=[\n {\"role\": \"user\", \"content\": \"What is your favorite color?\"},\n {\"role\": \"assistant\", \"content\": \"As an AI assistant, I don't have a favorite color, But if I had to pick, it would be green because\"} # Prefill here\n ]\n)\n```\nimport anthropic\n\nclient = anthropic.Anthropic()\nresponse = client.messages.create(\n model=\"claude-3-5-sonnet-20240620\",\n max_tokens=1024,\n messages=[\n {\"role\": \"user\", \"content\": \"What is your favorite color?\"},\n {\"role\": \"assistant\", 
\"content\": \"As an AI assistant, I don't have a favorite color, But if I had to pick, it would be green because\"} # Prefill here\n ]\n)\n\n```\n\n\nGet started\n\n\nIf you’re new to Claude, start here to learn the essentials and make your first API call.\nIntro to ClaudeExplore Claude’s capabilities and development flow.QuickstartLearn how to make your first API call in minutes.Prompt LibraryExplore example prompts for inspiration.\nIntro to ClaudeExplore Claude’s capabilities and development flow.\n\nIntro to Claude\nExplore Claude’s capabilities and development flow.\nQuickstartLearn how to make your first API call in minutes.\n\nQuickstart\nLearn how to make your first API call in minutes.\nPrompt LibraryExplore example prompts for inspiration.\n\nPrompt Library\nExplore example prompts for inspiration.\n\n\nMore Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. 
Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbook More Resources\nText capabilities and use casesClaude CookbookMore Resources\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n How can you guide Claude's response by pre-filling part of the response, and what API parameter is used to generate a short response in this case?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nHow to prefill Claude’s response\n\n\nTo prefill, include the desired initial text in the Assistant message (Claude’s response will continue from where the Assistant message leaves off):\nimport anthropic\n\nclient = anthropic.Anthropic()\nresponse = client.messages.create(\n model=\"claude-3-5-sonnet-20240620\",\n max_tokens=1024,\n messages=[\n {\"role\": \"user\", \"content\": \"What is your favorite color?\"},\n {\"role\": \"assistant\", \"content\": \"As an AI assistant, I don't have a favorite color, But if I had to pick, it would be green because\"} # Prefill here\n ]\n)\nimport anthropic\n\nclient = anthropic.Anthropic()\nresponse = client.messages.create(\n model=\"claude-3-5-sonnet-20240620\",\n max_tokens=1024,\n messages=[\n {\"role\": \"user\", \"content\": \"What is your favorite color?\"},\n {\"role\": \"assistant\", \"content\": \"As an AI assistant, I don't have a favorite color, But if I had to pick, it would be green because\"} # Prefill here\n ]\n)\nimport anthropic\n\nclient = anthropic.Anthropic()\nresponse = client.messages.create(\n model=\"claude-3-5-sonnet-20240620\",\n max_tokens=1024,\n messages=[\n {\"role\": \"user\", \"content\": \"What is your favorite color?\"},\n {\"role\": \"assistant\", \"content\": \"As an AI assistant, I don't have a favorite color, But if I had to pick, it would be green because\"} # Prefill here\n ]\n)\n```\nimport anthropic\n\nclient = anthropic.Anthropic()\nresponse = client.messages.create(\n model=\"claude-3-5-sonnet-20240620\",\n max_tokens=1024,\n messages=[\n {\"role\": \"user\", \"content\": \"What is your favorite color?\"},\n {\"role\": \"assistant\", 
\"content\": \"As an AI assistant, I don't have a favorite color, But if I had to pick, it would be green because\"} # Prefill here\n ]\n)\n\n```\n\n\nGet started\n\n\nIf you’re new to Claude, start here to learn the essentials and make your first API call.\nIntro to ClaudeExplore Claude’s capabilities and development flow.QuickstartLearn how to make your first API call in minutes.Prompt LibraryExplore example prompts for inspiration.\nIntro to ClaudeExplore Claude’s capabilities and development flow.\n\nIntro to Claude\nExplore Claude’s capabilities and development flow.\nQuickstartLearn how to make your first API call in minutes.\n\nQuickstart\nLearn how to make your first API call in minutes.\nPrompt LibraryExplore example prompts for inspiration.\n\nPrompt Library\nExplore example prompts for inspiration.\n\n\nMore Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. 
Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbooks More Resources\nText capabilities and use casesClaude CookbooksMore Resources\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_base"
},
"vars": {
@@ -12450,7 +12450,7 @@
"label": "3.5 Sonnet: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n How can you guide Claude's response by pre-filling part of the response, and what API parameter is used to generate a short response in this case?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nHow to prefill Claude’s response\n\n\nTo prefill, include the desired initial text in the Assistant message (Claude’s response will continue from where the Assistant message leaves off):\nimport anthropic\n\nclient = anthropic.Anthropic()\nresponse = client.messages.create(\n model=\"claude-3-5-sonnet-20240620\",\n max_tokens=1024,\n messages=[\n {\"role\": \"user\", \"content\": \"What is your favorite color?\"},\n {\"role\": \"assistant\", \"content\": \"As an AI assistant, I don't have a favorite color, But if I had to pick, it would be green because\"} # Prefill here\n ]\n)\nimport anthropic\n\nclient = anthropic.Anthropic()\nresponse = client.messages.create(\n model=\"claude-3-5-sonnet-20240620\",\n max_tokens=1024,\n messages=[\n {\"role\": \"user\", \"content\": \"What is your favorite color?\"},\n {\"role\": \"assistant\", \"content\": \"As an AI assistant, I don't have a favorite color, But if I had to pick, it would be green because\"} # Prefill here\n ]\n)\nimport anthropic\n\nclient = anthropic.Anthropic()\nresponse = client.messages.create(\n model=\"claude-3-5-sonnet-20240620\",\n max_tokens=1024,\n messages=[\n {\"role\": \"user\", \"content\": \"What is your favorite color?\"},\n {\"role\": \"assistant\", \"content\": \"As an AI assistant, I don't have a favorite color, But if I had to pick, it would be green because\"} # Prefill here\n ]\n)\n```\nimport anthropic\n\nclient = anthropic.Anthropic()\nresponse = client.messages.create(\n model=\"claude-3-5-sonnet-20240620\",\n max_tokens=1024,\n messages=[\n {\"role\": \"user\", \"content\": \"What is your favorite color?\"},\n {\"role\": \"assistant\", 
\"content\": \"As an AI assistant, I don't have a favorite color, But if I had to pick, it would be green because\"} # Prefill here\n ]\n)\n\n```\n\n\nGet started\n\n\nIf you’re new to Claude, start here to learn the essentials and make your first API call.\nIntro to ClaudeExplore Claude’s capabilities and development flow.QuickstartLearn how to make your first API call in minutes.Prompt LibraryExplore example prompts for inspiration.\nIntro to ClaudeExplore Claude’s capabilities and development flow.\n\nIntro to Claude\nExplore Claude’s capabilities and development flow.\nQuickstartLearn how to make your first API call in minutes.\n\nQuickstart\nLearn how to make your first API call in minutes.\nPrompt LibraryExplore example prompts for inspiration.\n\nPrompt Library\nExplore example prompts for inspiration.\n\n\nMore Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. 
Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbook More Resources\nText capabilities and use casesClaude CookbookMore Resources\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n How can you guide Claude's response by pre-filling part of the response, and what API parameter is used to generate a short response in this case?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nHow to prefill Claude’s response\n\n\nTo prefill, include the desired initial text in the Assistant message (Claude’s response will continue from where the Assistant message leaves off):\nimport anthropic\n\nclient = anthropic.Anthropic()\nresponse = client.messages.create(\n model=\"claude-3-5-sonnet-20240620\",\n max_tokens=1024,\n messages=[\n {\"role\": \"user\", \"content\": \"What is your favorite color?\"},\n {\"role\": \"assistant\", \"content\": \"As an AI assistant, I don't have a favorite color, But if I had to pick, it would be green because\"} # Prefill here\n ]\n)\nimport anthropic\n\nclient = anthropic.Anthropic()\nresponse = client.messages.create(\n model=\"claude-3-5-sonnet-20240620\",\n max_tokens=1024,\n messages=[\n {\"role\": \"user\", \"content\": \"What is your favorite color?\"},\n {\"role\": \"assistant\", \"content\": \"As an AI assistant, I don't have a favorite color, But if I had to pick, it would be green because\"} # Prefill here\n ]\n)\nimport anthropic\n\nclient = anthropic.Anthropic()\nresponse = client.messages.create(\n model=\"claude-3-5-sonnet-20240620\",\n max_tokens=1024,\n messages=[\n {\"role\": \"user\", \"content\": \"What is your favorite color?\"},\n {\"role\": \"assistant\", \"content\": \"As an AI assistant, I don't have a favorite color, But if I had to pick, it would be green because\"} # Prefill here\n ]\n)\n```\nimport anthropic\n\nclient = anthropic.Anthropic()\nresponse = client.messages.create(\n model=\"claude-3-5-sonnet-20240620\",\n max_tokens=1024,\n messages=[\n {\"role\": \"user\", \"content\": \"What is your favorite color?\"},\n {\"role\": \"assistant\", 
\"content\": \"As an AI assistant, I don't have a favorite color, But if I had to pick, it would be green because\"} # Prefill here\n ]\n)\n\n```\n\n\nGet started\n\n\nIf you’re new to Claude, start here to learn the essentials and make your first API call.\nIntro to ClaudeExplore Claude’s capabilities and development flow.QuickstartLearn how to make your first API call in minutes.Prompt LibraryExplore example prompts for inspiration.\nIntro to ClaudeExplore Claude’s capabilities and development flow.\n\nIntro to Claude\nExplore Claude’s capabilities and development flow.\nQuickstartLearn how to make your first API call in minutes.\n\nQuickstart\nLearn how to make your first API call in minutes.\nPrompt LibraryExplore example prompts for inspiration.\n\nPrompt Library\nExplore example prompts for inspiration.\n\n\nMore Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. 
Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbooks More Resources\nText capabilities and use casesClaude CookbooksMore Resources\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_base"
},
"vars": {
@@ -13209,26 +13209,26 @@
"label": "Haiku: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nClaude Cookbook\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\n\nMore Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. 
Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbook More Resources\nText capabilities and use casesClaude CookbookMore Resources\n\n\nGet started with Claude\n\n\nIf you’re ready to start exploring what Claude can do for you, let’s dive in! Whether you’re a developer looking to integrate Claude into your applications or a user wanting to experience the power of AI firsthand, we’ve got you covered.\nCheck out our quickstart guide for step-by-step instructions on how to get up and running with Claude. You’ll learn how to create an account, obtain API keys, and start interacting with our models in no time. 
You can also head over to claude.ai or our web Console to start experimenting with Claude right away!\nIf you have any questions or need assistance, don’t hesitate to reach out to our support team or consult the Discord community.\nTicket RoutingSecurity and compliancexlinkedin\nTicket RoutingSecurity and compliance\nxlinkedin\nModel names Model comparison Prompt and output performance Legacy models Legacy model comparison Get started with Claude\nModel namesModel comparisonPrompt and output performanceLegacy modelsLegacy model comparisonGet started with Claude\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nClaude Cookbooks\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\n\nMore Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. 
Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbooks More Resources\nText capabilities and use casesClaude CookbooksMore Resources\n\n\nGet started with Claude\n\n\nIf you’re ready to start exploring what Claude can do for you, let’s dive in! Whether you’re a developer looking to integrate Claude into your applications or a user wanting to experience the power of AI firsthand, we’ve got you covered.\nCheck out our quickstart guide for step-by-step instructions on how to get up and running with Claude. You’ll learn how to create an account, obtain API keys, and start interacting with our models in no time. 
You can also head over to claude.ai or our web Console to start experimenting with Claude right away!\nIf you have any questions or need assistance, don’t hesitate to reach out to our support team or consult the Discord community.\nTicket RoutingSecurity and compliancexlinkedin\nTicket RoutingSecurity and compliance\nxlinkedin\nModel names Model comparison Prompt and output performance Legacy models Legacy model comparison Get started with Claude\nModel namesModel comparisonPrompt and output performanceLegacy modelsLegacy model comparisonGet started with Claude\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_base"
},
"vars": {
"query": "What are two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings?",
- "correct_answer": "The Claude Cookbook provides interactive Jupyter notebooks demonstrating how to upload PDFs, generate embeddings, and more. The Developer Console offers a prompt generator tool for easier, more powerful prompting."
+ "correct_answer": "The Claude Cookbooks provides interactive Jupyter notebooks demonstrating how to upload PDFs, generate embeddings, and more. The Developer Console offers a prompt generator tool for easier, more powerful prompting."
},
"response": {
- "output": "Two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings, are:\n\n1. Explore the \"Claude Cookbook\" which provides practical examples and hands-on tutorials, including sections on \"PDF Upload & Summarization\" and \"Embeddings with VoyageAI\".\n\n2. Review the \"More Resources\" section, which includes a \"Prompt Engineering Guide\" to master the art of prompt crafting, and the \"API Documentation\" which covers how to interact with Claude via the API, including details on request formats, response handling, and troubleshooting.",
+ "output": "Two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings, are:\n\n1. Explore the \"Claude Cookbooks\" which provides practical examples and hands-on tutorials, including sections on \"PDF Upload & Summarization\" and \"Embeddings with VoyageAI\".\n\n2. Review the \"More Resources\" section, which includes a \"Prompt Engineering Guide\" to master the art of prompt crafting, and the \"API Documentation\" which covers how to interact with Claude via the API, including details on request formats, response handling, and troubleshooting.",
"tokenUsage": {}
},
"success": false,
"score": 0,
"namedScores": {},
"latencyMs": 2,
- "error": "The Generated Answer is partially correct but misses a key element mentioned in the Correct Answer. While it correctly identifies the Claude Cookbook as an interactive resource for learning Claude's capabilities, it fails to mention the Developer Console and its prompt generator tool, which is specifically highlighted in the Correct Answer. \n\nThe Generated Answer instead mentions the \"More Resources\" section and a \"Prompt Engineering Guide\", which are not mentioned in the Correct Answer and may not be as interactive as the Developer Console's prompt generator tool.\n\nAlthough both answers discuss ways to learn about Claude's capabilities, including PDF handling and embeddings, the Generated Answer doesn't fully align with the specific interactive methods outlined in the Correct Answer.",
+ "error": "The Generated Answer is partially correct but misses a key element mentioned in the Correct Answer. While it correctly identifies the Claude Cookbooks as an interactive resource for learning Claude's capabilities, it fails to mention the Developer Console and its prompt generator tool, which is specifically highlighted in the Correct Answer. \n\nThe Generated Answer instead mentions the \"More Resources\" section and a \"Prompt Engineering Guide\", which are not mentioned in the Correct Answer and may not be as interactive as the Developer Console's prompt generator tool.\n\nAlthough both answers discuss ways to learn about Claude's capabilities, including PDF handling and embeddings, the Generated Answer doesn't fully align with the specific interactive methods outlined in the Correct Answer.",
"gradingResult": {
"pass": false,
"score": 0,
- "reason": "The Generated Answer is partially correct but misses a key element mentioned in the Correct Answer. While it correctly identifies the Claude Cookbook as an interactive resource for learning Claude's capabilities, it fails to mention the Developer Console and its prompt generator tool, which is specifically highlighted in the Correct Answer. \n\nThe Generated Answer instead mentions the \"More Resources\" section and a \"Prompt Engineering Guide\", which are not mentioned in the Correct Answer and may not be as interactive as the Developer Console's prompt generator tool.\n\nAlthough both answers discuss ways to learn about Claude's capabilities, including PDF handling and embeddings, the Generated Answer doesn't fully align with the specific interactive methods outlined in the Correct Answer.",
+ "reason": "The Generated Answer is partially correct but misses a key element mentioned in the Correct Answer. While it correctly identifies the Claude Cookbooks as an interactive resource for learning Claude's capabilities, it fails to mention the Developer Console and its prompt generator tool, which is specifically highlighted in the Correct Answer. \n\nThe Generated Answer instead mentions the \"More Resources\" section and a \"Prompt Engineering Guide\", which are not mentioned in the Correct Answer and may not be as interactive as the Developer Console's prompt generator tool.\n\nAlthough both answers discuss ways to learn about Claude's capabilities, including PDF handling and embeddings, the Generated Answer doesn't fully align with the specific interactive methods outlined in the Correct Answer.",
"namedScores": {},
"tokensUsed": {
"total": 0,
@@ -13239,7 +13239,7 @@
{
"pass": false,
"score": 0,
- "reason": "The Generated Answer is partially correct but misses a key element mentioned in the Correct Answer. While it correctly identifies the Claude Cookbook as an interactive resource for learning Claude's capabilities, it fails to mention the Developer Console and its prompt generator tool, which is specifically highlighted in the Correct Answer. \n\nThe Generated Answer instead mentions the \"More Resources\" section and a \"Prompt Engineering Guide\", which are not mentioned in the Correct Answer and may not be as interactive as the Developer Console's prompt generator tool.\n\nAlthough both answers discuss ways to learn about Claude's capabilities, including PDF handling and embeddings, the Generated Answer doesn't fully align with the specific interactive methods outlined in the Correct Answer.",
+ "reason": "The Generated Answer is partially correct but misses a key element mentioned in the Correct Answer. While it correctly identifies the Claude Cookbooks as an interactive resource for learning Claude's capabilities, it fails to mention the Developer Console and its prompt generator tool, which is specifically highlighted in the Correct Answer. \n\nThe Generated Answer instead mentions the \"More Resources\" section and a \"Prompt Engineering Guide\", which are not mentioned in the Correct Answer and may not be as interactive as the Developer Console's prompt generator tool.\n\nAlthough both answers discuss ways to learn about Claude's capabilities, including PDF handling and embeddings, the Generated Answer doesn't fully align with the specific interactive methods outlined in the Correct Answer.",
"assertion": {
"type": "python",
"value": "file://eval_end_to_end.py"
@@ -13307,15 +13307,15 @@
"label": "Haiku: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Claude for Sheets usage examples\n\nText\n Claude for Sheets usage examples\n\n\n \n\nSummary: \n Claude for Sheets usage examples provide demonstrations of how to integrate the Claude AI model into Google Sheets, enabling users to leverage the model's capabilities within the spreadsheet environment for tasks such as data analysis, text generation, and more. \n \n\n \n Claude Cookbook\n\nText\n Claude Cookbook\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI 
for advanced text similarity and search tasks.\n \n\nSummary: \n The Claude Cookbook provides practical examples and hands-on tutorials, including how to upload PDFs and have Claude summarize their content, how to extend Claude's capabilities by integrating external tools and functions, and how to create and use embeddings with VoyageAI for advanced text similarity and search tasks. \n \n\n \n Further information\n\nText\n Further information\n\n\nFor more information regarding this extension, see the Claude for Sheets Google Workspace Marketplace overview page.\nEmbeddingsVisionxlinkedin\nEmbeddingsVision\nxlinkedin\nWhy use Claude for Sheets? Get started with Claude for Sheets Install Claude for Sheets Enter your first prompt Advanced use Optional function parameters Claude for Sheets usage examples Prompt engineering interactive tutorial Prompt engineering workflow Claude for Sheets workbook template Troubleshooting Further information\nWhy use Claude for Sheets?Get started with Claude for SheetsInstall Claude for SheetsEnter your first promptAdvanced useOptional function parametersClaude for Sheets usage examplesPrompt engineering interactive tutorialPrompt engineering workflowClaude for Sheets workbook templateTroubleshootingFurther information\n \n\nSummary: \n The \"Further information\" section provides additional resources for the Claude for Sheets extension, including a Google Workspace Marketplace overview page. It also lists various related topics such as Embeddings, Vision, and LinkedIn, which may be of interest to users. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Claude for Sheets usage examples\n\nText\n Claude for Sheets usage examples\n\n\n \n\nSummary: \n Claude for Sheets usage examples provide demonstrations of how to integrate the Claude AI model into Google Sheets, enabling users to leverage the model's capabilities within the spreadsheet environment for tasks such as data analysis, text generation, and more. \n \n\n \n Claude Cookbooks\n\nText\n Claude Cookbooks\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with 
VoyageAI for advanced text similarity and search tasks.\n \n\nSummary: \n The Claude Cookbooks provides practical examples and hands-on tutorials, including how to upload PDFs and have Claude summarize their content, how to extend Claude's capabilities by integrating external tools and functions, and how to create and use embeddings with VoyageAI for advanced text similarity and search tasks. \n \n\n \n Further information\n\nText\n Further information\n\n\nFor more information regarding this extension, see the Claude for Sheets Google Workspace Marketplace overview page.\nEmbeddingsVisionxlinkedin\nEmbeddingsVision\nxlinkedin\nWhy use Claude for Sheets? Get started with Claude for Sheets Install Claude for Sheets Enter your first prompt Advanced use Optional function parameters Claude for Sheets usage examples Prompt engineering interactive tutorial Prompt engineering workflow Claude for Sheets workbook template Troubleshooting Further information\nWhy use Claude for Sheets?Get started with Claude for SheetsInstall Claude for SheetsEnter your first promptAdvanced useOptional function parametersClaude for Sheets usage examplesPrompt engineering interactive tutorialPrompt engineering workflowClaude for Sheets workbook templateTroubleshootingFurther information\n \n\nSummary: \n The \"Further information\" section provides additional resources for the Claude for Sheets extension, including a Google Workspace Marketplace overview page. It also lists various related topics such as Embeddings, Vision, and LinkedIn, which may be of interest to users. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_level_two"
},
"vars": {
"query": "What are two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings?",
- "correct_answer": "The Claude Cookbook provides interactive Jupyter notebooks demonstrating how to upload PDFs, generate embeddings, and more. The Developer Console offers a prompt generator tool for easier, more powerful prompting."
+ "correct_answer": "The Claude Cookbooks provides interactive Jupyter notebooks demonstrating how to upload PDFs, generate embeddings, and more. The Developer Console offers a prompt generator tool for easier, more powerful prompting."
},
"response": {
- "output": "Two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings, are:\n\n1. The Claude Cookbook, which provides practical examples and hands-on tutorials, including how to upload PDFs and have Claude summarize their content, and how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\n2. The Claude for Sheets usage examples, which demonstrate how to integrate the Claude AI model into Google Sheets, enabling users to leverage the model's capabilities within the spreadsheet environment for tasks such as data analysis and text generation.",
+ "output": "Two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings, are:\n\n1. The Claude Cookbooks, which provides practical examples and hands-on tutorials, including how to upload PDFs and have Claude summarize their content, and how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\n2. The Claude for Sheets usage examples, which demonstrate how to integrate the Claude AI model into Google Sheets, enabling users to leverage the model's capabilities within the spreadsheet environment for tasks such as data analysis and text generation.",
"tokenUsage": {
"total": 1025,
"prompt": 894,
@@ -13328,11 +13328,11 @@
"namedScores": {},
"latencyMs": 1389,
"cost": 0.00038725,
- "error": "The Generated Answer is partially correct but misses a key element and includes incorrect information. It correctly mentions the Claude Cookbook as an interactive way to learn Claude's capabilities, including uploading PDFs and working with embeddings. However, it fails to mention the Developer Console and its prompt generator tool, which is a critical piece of information from the Correct Answer. Instead, it incorrectly includes Claude for Sheets usage examples, which is not mentioned in the Correct Answer as one of the two interactive ways to learn Claude's capabilities. Due to this significant omission and incorrect addition, the Generated Answer cannot be considered fully correct.",
+ "error": "The Generated Answer is partially correct but misses a key element and includes incorrect information. It correctly mentions the Claude Cookbooks as an interactive way to learn Claude's capabilities, including uploading PDFs and working with embeddings. However, it fails to mention the Developer Console and its prompt generator tool, which is a critical piece of information from the Correct Answer. Instead, it incorrectly includes Claude for Sheets usage examples, which is not mentioned in the Correct Answer as one of the two interactive ways to learn Claude's capabilities. Due to this significant omission and incorrect addition, the Generated Answer cannot be considered fully correct.",
"gradingResult": {
"pass": false,
"score": 0,
- "reason": "The Generated Answer is partially correct but misses a key element and includes incorrect information. It correctly mentions the Claude Cookbook as an interactive way to learn Claude's capabilities, including uploading PDFs and working with embeddings. However, it fails to mention the Developer Console and its prompt generator tool, which is a critical piece of information from the Correct Answer. Instead, it incorrectly includes Claude for Sheets usage examples, which is not mentioned in the Correct Answer as one of the two interactive ways to learn Claude's capabilities. Due to this significant omission and incorrect addition, the Generated Answer cannot be considered fully correct.",
+ "reason": "The Generated Answer is partially correct but misses a key element and includes incorrect information. It correctly mentions the Claude Cookbooks as an interactive way to learn Claude's capabilities, including uploading PDFs and working with embeddings. However, it fails to mention the Developer Console and its prompt generator tool, which is a critical piece of information from the Correct Answer. Instead, it incorrectly includes Claude for Sheets usage examples, which is not mentioned in the Correct Answer as one of the two interactive ways to learn Claude's capabilities. Due to this significant omission and incorrect addition, the Generated Answer cannot be considered fully correct.",
"namedScores": {},
"tokensUsed": {
"total": 0,
@@ -13343,7 +13343,7 @@
{
"pass": false,
"score": 0,
- "reason": "The Generated Answer is partially correct but misses a key element and includes incorrect information. It correctly mentions the Claude Cookbook as an interactive way to learn Claude's capabilities, including uploading PDFs and working with embeddings. However, it fails to mention the Developer Console and its prompt generator tool, which is a critical piece of information from the Correct Answer. Instead, it incorrectly includes Claude for Sheets usage examples, which is not mentioned in the Correct Answer as one of the two interactive ways to learn Claude's capabilities. Due to this significant omission and incorrect addition, the Generated Answer cannot be considered fully correct.",
+ "reason": "The Generated Answer is partially correct but misses a key element and includes incorrect information. It correctly mentions the Claude Cookbooks as an interactive way to learn Claude's capabilities, including uploading PDFs and working with embeddings. However, it fails to mention the Developer Console and its prompt generator tool, which is a critical piece of information from the Correct Answer. Instead, it incorrectly includes Claude for Sheets usage examples, which is not mentioned in the Correct Answer as one of the two interactive ways to learn Claude's capabilities. Due to this significant omission and incorrect addition, the Generated Answer cannot be considered fully correct.",
"assertion": {
"type": "python",
"value": "file://eval_end_to_end.py"
@@ -13359,15 +13359,15 @@
"label": "3.5 Sonnet: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nClaude Cookbook\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\n\nMore Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. 
Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbook More Resources\nText capabilities and use casesClaude CookbookMore Resources\n\n\nGet started with Claude\n\n\nIf you’re ready to start exploring what Claude can do for you, let’s dive in! Whether you’re a developer looking to integrate Claude into your applications or a user wanting to experience the power of AI firsthand, we’ve got you covered.\nCheck out our quickstart guide for step-by-step instructions on how to get up and running with Claude. You’ll learn how to create an account, obtain API keys, and start interacting with our models in no time. 
You can also head over to claude.ai or our web Console to start experimenting with Claude right away!\nIf you have any questions or need assistance, don’t hesitate to reach out to our support team or consult the Discord community.\nTicket RoutingSecurity and compliancexlinkedin\nTicket RoutingSecurity and compliance\nxlinkedin\nModel names Model comparison Prompt and output performance Legacy models Legacy model comparison Get started with Claude\nModel namesModel comparisonPrompt and output performanceLegacy modelsLegacy model comparisonGet started with Claude\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nClaude Cookbooks\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\n\nMore Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. 
Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbooks More Resources\nText capabilities and use casesClaude CookbooksMore Resources\n\n\nGet started with Claude\n\n\nIf you’re ready to start exploring what Claude can do for you, let’s dive in! Whether you’re a developer looking to integrate Claude into your applications or a user wanting to experience the power of AI firsthand, we’ve got you covered.\nCheck out our quickstart guide for step-by-step instructions on how to get up and running with Claude. You’ll learn how to create an account, obtain API keys, and start interacting with our models in no time. 
You can also head over to claude.ai or our web Console to start experimenting with Claude right away!\nIf you have any questions or need assistance, don’t hesitate to reach out to our support team or consult the Discord community.\nTicket RoutingSecurity and compliancexlinkedin\nTicket RoutingSecurity and compliance\nxlinkedin\nModel names Model comparison Prompt and output performance Legacy models Legacy model comparison Get started with Claude\nModel namesModel comparisonPrompt and output performanceLegacy modelsLegacy model comparisonGet started with Claude\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_base"
},
"vars": {
"query": "What are two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings?",
- "correct_answer": "The Claude Cookbook provides interactive Jupyter notebooks demonstrating how to upload PDFs, generate embeddings, and more. The Developer Console offers a prompt generator tool for easier, more powerful prompting."
+ "correct_answer": "The Claude Cookbooks provides interactive Jupyter notebooks demonstrating how to upload PDFs, generate embeddings, and more. The Developer Console offers a prompt generator tool for easier, more powerful prompting."
},
"response": {
- "output": "Based on the provided context, two interactive ways to learn how to use Claude's capabilities, including uploading PDFs and generating embeddings, are:\n\n1. Using the Claude Cookbook: This collection of Jupyter notebooks offers practical examples and hands-on tutorials. Specifically, it includes a section on \"PDF Upload & Summarization\" where you can learn how to upload PDFs and have Claude summarize their content. It also has a section on \"Embeddings with VoyageAI\" where you can explore how to create and use embeddings for advanced text similarity and search tasks.\n\n2. Experimenting directly with Claude: The context mentions that you can \"head over to claude.ai or our web Console to start experimenting with Claude right away!\" This allows for direct interaction with Claude to test various capabilities, including potentially uploading PDFs and working with embeddings.",
+ "output": "Based on the provided context, two interactive ways to learn how to use Claude's capabilities, including uploading PDFs and generating embeddings, are:\n\n1. Using the Claude Cookbooks: This collection of Jupyter notebooks offers practical examples and hands-on tutorials. Specifically, it includes a section on \"PDF Upload & Summarization\" where you can learn how to upload PDFs and have Claude summarize their content. It also has a section on \"Embeddings with VoyageAI\" where you can explore how to create and use embeddings for advanced text similarity and search tasks.\n\n2. Experimenting directly with Claude: The context mentions that you can \"head over to claude.ai or our web Console to start experimenting with Claude right away!\" This allows for direct interaction with Claude to test various capabilities, including potentially uploading PDFs and working with embeddings.",
"tokenUsage": {
"total": 1217,
"prompt": 1031,
@@ -13380,11 +13380,11 @@
"namedScores": {},
"latencyMs": 4115,
"cost": 0.005883,
- "error": "The Generated Answer is partially correct but misses a key piece of information. It correctly identifies the Claude Cookbook as an interactive way to learn about Claude's capabilities, including uploading PDFs and generating embeddings. However, it fails to mention the Developer Console and its prompt generator tool, which is specifically mentioned in the Correct Answer as the second interactive learning method.\n\nInstead, the Generated Answer suggests \"Experimenting directly with Claude\" via claude.ai or the web Console as the second method. While this is a way to interact with Claude, it's not as specific or tailored to learning about capabilities as the Developer Console's prompt generator tool mentioned in the Correct Answer.\n\nDue to this omission of a critical piece of information (the Developer Console and its prompt generator), and the substitution of a less specific alternative, the Generated Answer cannot be considered fully correct.",
+ "error": "The Generated Answer is partially correct but misses a key piece of information. It correctly identifies the Claude Cookbooks as an interactive way to learn about Claude's capabilities, including uploading PDFs and generating embeddings. However, it fails to mention the Developer Console and its prompt generator tool, which is specifically mentioned in the Correct Answer as the second interactive learning method.\n\nInstead, the Generated Answer suggests \"Experimenting directly with Claude\" via claude.ai or the web Console as the second method. While this is a way to interact with Claude, it's not as specific or tailored to learning about capabilities as the Developer Console's prompt generator tool mentioned in the Correct Answer.\n\nDue to this omission of a critical piece of information (the Developer Console and its prompt generator), and the substitution of a less specific alternative, the Generated Answer cannot be considered fully correct.",
"gradingResult": {
"pass": false,
"score": 0,
- "reason": "The Generated Answer is partially correct but misses a key piece of information. It correctly identifies the Claude Cookbook as an interactive way to learn about Claude's capabilities, including uploading PDFs and generating embeddings. However, it fails to mention the Developer Console and its prompt generator tool, which is specifically mentioned in the Correct Answer as the second interactive learning method.\n\nInstead, the Generated Answer suggests \"Experimenting directly with Claude\" via claude.ai or the web Console as the second method. While this is a way to interact with Claude, it's not as specific or tailored to learning about capabilities as the Developer Console's prompt generator tool mentioned in the Correct Answer.\n\nDue to this omission of a critical piece of information (the Developer Console and its prompt generator), and the substitution of a less specific alternative, the Generated Answer cannot be considered fully correct.",
+ "reason": "The Generated Answer is partially correct but misses a key piece of information. It correctly identifies the Claude Cookbooks as an interactive way to learn about Claude's capabilities, including uploading PDFs and generating embeddings. However, it fails to mention the Developer Console and its prompt generator tool, which is specifically mentioned in the Correct Answer as the second interactive learning method.\n\nInstead, the Generated Answer suggests \"Experimenting directly with Claude\" via claude.ai or the web Console as the second method. While this is a way to interact with Claude, it's not as specific or tailored to learning about capabilities as the Developer Console's prompt generator tool mentioned in the Correct Answer.\n\nDue to this omission of a critical piece of information (the Developer Console and its prompt generator), and the substitution of a less specific alternative, the Generated Answer cannot be considered fully correct.",
"namedScores": {},
"tokensUsed": {
"total": 0,
@@ -13395,7 +13395,7 @@
{
"pass": false,
"score": 0,
- "reason": "The Generated Answer is partially correct but misses a key piece of information. It correctly identifies the Claude Cookbook as an interactive way to learn about Claude's capabilities, including uploading PDFs and generating embeddings. However, it fails to mention the Developer Console and its prompt generator tool, which is specifically mentioned in the Correct Answer as the second interactive learning method.\n\nInstead, the Generated Answer suggests \"Experimenting directly with Claude\" via claude.ai or the web Console as the second method. While this is a way to interact with Claude, it's not as specific or tailored to learning about capabilities as the Developer Console's prompt generator tool mentioned in the Correct Answer.\n\nDue to this omission of a critical piece of information (the Developer Console and its prompt generator), and the substitution of a less specific alternative, the Generated Answer cannot be considered fully correct.",
+ "reason": "The Generated Answer is partially correct but misses a key piece of information. It correctly identifies the Claude Cookbooks as an interactive way to learn about Claude's capabilities, including uploading PDFs and generating embeddings. However, it fails to mention the Developer Console and its prompt generator tool, which is specifically mentioned in the Correct Answer as the second interactive learning method.\n\nInstead, the Generated Answer suggests \"Experimenting directly with Claude\" via claude.ai or the web Console as the second method. While this is a way to interact with Claude, it's not as specific or tailored to learning about capabilities as the Developer Console's prompt generator tool mentioned in the Correct Answer.\n\nDue to this omission of a critical piece of information (the Developer Console and its prompt generator), and the substitution of a less specific alternative, the Generated Answer cannot be considered fully correct.",
"assertion": {
"type": "python",
"value": "file://eval_end_to_end.py"
@@ -13411,15 +13411,15 @@
"label": "3.5 Sonnet: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Claude for Sheets usage examples\n\nText\n Claude for Sheets usage examples\n\n\n \n\nSummary: \n Claude for Sheets usage examples provide demonstrations of how to integrate the Claude AI model into Google Sheets, enabling users to leverage the model's capabilities within the spreadsheet environment for tasks such as data analysis, text generation, and more. \n \n\n \n Claude Cookbook\n\nText\n Claude Cookbook\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI 
for advanced text similarity and search tasks.\n \n\nSummary: \n The Claude Cookbook provides practical examples and hands-on tutorials, including how to upload PDFs and have Claude summarize their content, how to extend Claude's capabilities by integrating external tools and functions, and how to create and use embeddings with VoyageAI for advanced text similarity and search tasks. \n \n\n \n Further information\n\nText\n Further information\n\n\nFor more information regarding this extension, see the Claude for Sheets Google Workspace Marketplace overview page.\nEmbeddingsVisionxlinkedin\nEmbeddingsVision\nxlinkedin\nWhy use Claude for Sheets? Get started with Claude for Sheets Install Claude for Sheets Enter your first prompt Advanced use Optional function parameters Claude for Sheets usage examples Prompt engineering interactive tutorial Prompt engineering workflow Claude for Sheets workbook template Troubleshooting Further information\nWhy use Claude for Sheets?Get started with Claude for SheetsInstall Claude for SheetsEnter your first promptAdvanced useOptional function parametersClaude for Sheets usage examplesPrompt engineering interactive tutorialPrompt engineering workflowClaude for Sheets workbook templateTroubleshootingFurther information\n \n\nSummary: \n The \"Further information\" section provides additional resources for the Claude for Sheets extension, including a Google Workspace Marketplace overview page. It also lists various related topics such as Embeddings, Vision, and LinkedIn, which may be of interest to users. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Claude for Sheets usage examples\n\nText\n Claude for Sheets usage examples\n\n\n \n\nSummary: \n Claude for Sheets usage examples provide demonstrations of how to integrate the Claude AI model into Google Sheets, enabling users to leverage the model's capabilities within the spreadsheet environment for tasks such as data analysis, text generation, and more. \n \n\n \n Claude Cookbooks\n\nText\n Claude Cookbooks\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with 
VoyageAI for advanced text similarity and search tasks.\n \n\nSummary: \n The Claude Cookbooks provides practical examples and hands-on tutorials, including how to upload PDFs and have Claude summarize their content, how to extend Claude's capabilities by integrating external tools and functions, and how to create and use embeddings with VoyageAI for advanced text similarity and search tasks. \n \n\n \n Further information\n\nText\n Further information\n\n\nFor more information regarding this extension, see the Claude for Sheets Google Workspace Marketplace overview page.\nEmbeddingsVisionxlinkedin\nEmbeddingsVision\nxlinkedin\nWhy use Claude for Sheets? Get started with Claude for Sheets Install Claude for Sheets Enter your first prompt Advanced use Optional function parameters Claude for Sheets usage examples Prompt engineering interactive tutorial Prompt engineering workflow Claude for Sheets workbook template Troubleshooting Further information\nWhy use Claude for Sheets?Get started with Claude for SheetsInstall Claude for SheetsEnter your first promptAdvanced useOptional function parametersClaude for Sheets usage examplesPrompt engineering interactive tutorialPrompt engineering workflowClaude for Sheets workbook templateTroubleshootingFurther information\n \n\nSummary: \n The \"Further information\" section provides additional resources for the Claude for Sheets extension, including a Google Workspace Marketplace overview page. It also lists various related topics such as Embeddings, Vision, and LinkedIn, which may be of interest to users. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_level_two"
},
"vars": {
"query": "What are two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings?",
- "correct_answer": "The Claude Cookbook provides interactive Jupyter notebooks demonstrating how to upload PDFs, generate embeddings, and more. The Developer Console offers a prompt generator tool for easier, more powerful prompting."
+ "correct_answer": "The Claude Cookbooks provides interactive Jupyter notebooks demonstrating how to upload PDFs, generate embeddings, and more. The Developer Console offers a prompt generator tool for easier, more powerful prompting."
},
"response": {
- "output": "Based on the provided context, two interactive ways to learn how to use Claude's capabilities are:\n\n1. Exploring the Claude Cookbook: This collection of Jupyter notebooks offers practical examples and hands-on tutorials, including how to upload PDFs and have Claude summarize their content, as well as how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\n2. Using the prompt engineering interactive tutorial: While not explicitly described in detail, the context mentions a \"Prompt engineering interactive tutorial\" as part of the Claude for Sheets resources, which likely provides an interactive way to learn about Claude's capabilities within the Google Sheets environment.",
+ "output": "Based on the provided context, two interactive ways to learn how to use Claude's capabilities are:\n\n1. Exploring the Claude Cookbooks: This collection of Jupyter notebooks offers practical examples and hands-on tutorials, including how to upload PDFs and have Claude summarize their content, as well as how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\n2. Using the prompt engineering interactive tutorial: While not explicitly described in detail, the context mentions a \"Prompt engineering interactive tutorial\" as part of the Claude for Sheets resources, which likely provides an interactive way to learn about Claude's capabilities within the Google Sheets environment.",
"tokenUsage": {
"total": 1037,
"prompt": 894,
@@ -13432,11 +13432,11 @@
"namedScores": {},
"latencyMs": 3983,
"cost": 0.004827,
- "error": "The Generated Answer is partially correct but misses a key element mentioned in the Correct Answer. Both answers correctly identify the Claude Cookbook with its interactive Jupyter notebooks as one way to learn about Claude's capabilities, including uploading PDFs and working with embeddings. However, the Generated Answer fails to mention the Developer Console and its prompt generator tool, which is the second interactive method specified in the Correct Answer. Instead, it mentions a \"prompt engineering interactive tutorial\" for Google Sheets, which is not part of the Correct Answer. This omission of a critical piece of information (the Developer Console) and the inclusion of an unmentioned element makes the Generated Answer incorrect overall.",
+ "error": "The Generated Answer is partially correct but misses a key element mentioned in the Correct Answer. Both answers correctly identify the Claude Cookbooks with its interactive Jupyter notebooks as one way to learn about Claude's capabilities, including uploading PDFs and working with embeddings. However, the Generated Answer fails to mention the Developer Console and its prompt generator tool, which is the second interactive method specified in the Correct Answer. Instead, it mentions a \"prompt engineering interactive tutorial\" for Google Sheets, which is not part of the Correct Answer. This omission of a critical piece of information (the Developer Console) and the inclusion of an unmentioned element makes the Generated Answer incorrect overall.",
"gradingResult": {
"pass": false,
"score": 0,
- "reason": "The Generated Answer is partially correct but misses a key element mentioned in the Correct Answer. Both answers correctly identify the Claude Cookbook with its interactive Jupyter notebooks as one way to learn about Claude's capabilities, including uploading PDFs and working with embeddings. However, the Generated Answer fails to mention the Developer Console and its prompt generator tool, which is the second interactive method specified in the Correct Answer. Instead, it mentions a \"prompt engineering interactive tutorial\" for Google Sheets, which is not part of the Correct Answer. This omission of a critical piece of information (the Developer Console) and the inclusion of an unmentioned element makes the Generated Answer incorrect overall.",
+ "reason": "The Generated Answer is partially correct but misses a key element mentioned in the Correct Answer. Both answers correctly identify the Claude Cookbooks with its interactive Jupyter notebooks as one way to learn about Claude's capabilities, including uploading PDFs and working with embeddings. However, the Generated Answer fails to mention the Developer Console and its prompt generator tool, which is the second interactive method specified in the Correct Answer. Instead, it mentions a \"prompt engineering interactive tutorial\" for Google Sheets, which is not part of the Correct Answer. This omission of a critical piece of information (the Developer Console) and the inclusion of an unmentioned element makes the Generated Answer incorrect overall.",
"namedScores": {},
"tokensUsed": {
"total": 0,
@@ -13447,7 +13447,7 @@
{
"pass": false,
"score": 0,
- "reason": "The Generated Answer is partially correct but misses a key element mentioned in the Correct Answer. Both answers correctly identify the Claude Cookbook with its interactive Jupyter notebooks as one way to learn about Claude's capabilities, including uploading PDFs and working with embeddings. However, the Generated Answer fails to mention the Developer Console and its prompt generator tool, which is the second interactive method specified in the Correct Answer. Instead, it mentions a \"prompt engineering interactive tutorial\" for Google Sheets, which is not part of the Correct Answer. This omission of a critical piece of information (the Developer Console) and the inclusion of an unmentioned element makes the Generated Answer incorrect overall.",
+ "reason": "The Generated Answer is partially correct but misses a key element mentioned in the Correct Answer. Both answers correctly identify the Claude Cookbooks with its interactive Jupyter notebooks as one way to learn about Claude's capabilities, including uploading PDFs and working with embeddings. However, the Generated Answer fails to mention the Developer Console and its prompt generator tool, which is the second interactive method specified in the Correct Answer. Instead, it mentions a \"prompt engineering interactive tutorial\" for Google Sheets, which is not part of the Correct Answer. This omission of a critical piece of information (the Developer Console) and the inclusion of an unmentioned element makes the Generated Answer incorrect overall.",
"assertion": {
"type": "python",
"value": "file://eval_end_to_end.py"
@@ -13508,15 +13508,15 @@
"label": "Haiku: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Claude Cookbook\n\nClaude Cookbook\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n \n \n\n \n Develop with Claude\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API 
and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n \n\n \n Get started\n\nGet started\n\n\nIf you’re new to Claude, start here to learn the essentials and make your first API call.\nIntro to ClaudeExplore Claude’s capabilities and development flow.QuickstartLearn how to make your first API call in minutes.Prompt LibraryExplore example prompts for inspiration.\nIntro to ClaudeExplore Claude’s capabilities and development flow.\n\nIntro to Claude\nExplore Claude’s capabilities and development flow.\nQuickstartLearn how to make your first API call in minutes.\n\nQuickstart\nLearn how to make your first API call in minutes.\nPrompt LibraryExplore example prompts for inspiration.\n\nPrompt Library\nExplore example prompts for inspiration.\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Claude Cookbooks\n\nClaude Cookbooks\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n \n \n\n \n Develop with Claude\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API 
and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n \n\n \n Get started\n\nGet started\n\n\nIf you’re new to Claude, start here to learn the essentials and make your first API call.\nIntro to ClaudeExplore Claude’s capabilities and development flow.QuickstartLearn how to make your first API call in minutes.Prompt LibraryExplore example prompts for inspiration.\nIntro to ClaudeExplore Claude’s capabilities and development flow.\n\nIntro to Claude\nExplore Claude’s capabilities and development flow.\nQuickstartLearn how to make your first API call in minutes.\n\nQuickstart\nLearn how to make your first API call in minutes.\nPrompt LibraryExplore example prompts for inspiration.\n\nPrompt Library\nExplore example prompts for inspiration.\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_level_three"
},
"vars": {
"query": "What are two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings?",
- "correct_answer": "The Claude Cookbook provides interactive Jupyter notebooks demonstrating how to upload PDFs, generate embeddings, and more. The Developer Console offers a prompt generator tool for easier, more powerful prompting."
+ "correct_answer": "The Claude Cookbooks provides interactive Jupyter notebooks demonstrating how to upload PDFs, generate embeddings, and more. The Developer Console offers a prompt generator tool for easier, more powerful prompting."
},
"response": {
- "output": "Two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings, are:\n\n1. The Claude Cookbook, which provides Jupyter notebooks that demonstrate how to upload PDFs and have Claude summarize their content, as well as how to create and use embeddings with VoyageAI.\n\n2. The Anthropic Developer Console, which includes the Workbench and prompt generator tool, allowing you to enjoy easier and more powerful prompting in your browser to explore Claude's capabilities.",
+ "output": "Two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings, are:\n\n1. The Claude Cookbooks, which provides Jupyter notebooks that demonstrate how to upload PDFs and have Claude summarize their content, as well as how to create and use embeddings with VoyageAI.\n\n2. The Anthropic Developer Console, which includes the Workbench and prompt generator tool, allowing you to enjoy easier and more powerful prompting in your browser to explore Claude's capabilities.",
"tokenUsage": {
"total": 1043,
"prompt": 928,
@@ -13543,7 +13543,7 @@
{
"pass": true,
"score": 1,
- "reason": "The Generated Answer is essentially correct and aligns well with the Correct Answer. Both answers mention the two key interactive ways to learn Claude's capabilities:\n\n1. The Claude Cookbook with its interactive Jupyter notebooks for demonstrating PDF uploads, embeddings, and other features.\n\n2. The Developer Console, which includes tools for easier and more powerful prompting.\n\nThe Generated Answer provides slightly more detail about the Cookbook mentioning VoyageAI for embeddings and summarizing PDF content, while the Correct Answer is more general. The Generated Answer also specifies the Workbench as part of the Developer Console, which isn't mentioned in the Correct Answer but doesn't contradict it.\n\nOverall, the substance and key points of both answers are the same, and there are no critical pieces of information missing or contradictions between the two.",
+ "reason": "The Generated Answer is essentially correct and aligns well with the Correct Answer. Both answers mention the two key interactive ways to learn Claude's capabilities:\n\n1. The Claude Cookbooks with its interactive Jupyter notebooks for demonstrating PDF uploads, embeddings, and other features.\n\n2. The Developer Console, which includes tools for easier and more powerful prompting.\n\nThe Generated Answer provides slightly more detail about the Cookbook mentioning VoyageAI for embeddings and summarizing PDF content, while the Correct Answer is more general. The Generated Answer also specifies the Workbench as part of the Developer Console, which isn't mentioned in the Correct Answer but doesn't contradict it.\n\nOverall, the substance and key points of both answers are the same, and there are no critical pieces of information missing or contradictions between the two.",
"assertion": {
"type": "python",
"value": "file://eval_end_to_end.py"
@@ -13559,15 +13559,15 @@
"label": "3.5 Sonnet: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Claude Cookbook\n\nClaude Cookbook\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n \n \n\n \n Develop with Claude\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API 
and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n \n\n \n Get started\n\nGet started\n\n\nIf you’re new to Claude, start here to learn the essentials and make your first API call.\nIntro to ClaudeExplore Claude’s capabilities and development flow.QuickstartLearn how to make your first API call in minutes.Prompt LibraryExplore example prompts for inspiration.\nIntro to ClaudeExplore Claude’s capabilities and development flow.\n\nIntro to Claude\nExplore Claude’s capabilities and development flow.\nQuickstartLearn how to make your first API call in minutes.\n\nQuickstart\nLearn how to make your first API call in minutes.\nPrompt LibraryExplore example prompts for inspiration.\n\nPrompt Library\nExplore example prompts for inspiration.\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Claude Cookbooks\n\nClaude Cookbooks\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n \n \n\n \n Develop with Claude\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API 
and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n \n\n \n Get started\n\nGet started\n\n\nIf you’re new to Claude, start here to learn the essentials and make your first API call.\nIntro to ClaudeExplore Claude’s capabilities and development flow.QuickstartLearn how to make your first API call in minutes.Prompt LibraryExplore example prompts for inspiration.\nIntro to ClaudeExplore Claude’s capabilities and development flow.\n\nIntro to Claude\nExplore Claude’s capabilities and development flow.\nQuickstartLearn how to make your first API call in minutes.\n\nQuickstart\nLearn how to make your first API call in minutes.\nPrompt LibraryExplore example prompts for inspiration.\n\nPrompt Library\nExplore example prompts for inspiration.\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_level_three"
},
"vars": {
"query": "What are two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings?",
- "correct_answer": "The Claude Cookbook provides interactive Jupyter notebooks demonstrating how to upload PDFs, generate embeddings, and more. The Developer Console offers a prompt generator tool for easier, more powerful prompting."
+ "correct_answer": "The Claude Cookbooks provides interactive Jupyter notebooks demonstrating how to upload PDFs, generate embeddings, and more. The Developer Console offers a prompt generator tool for easier, more powerful prompting."
},
"response": {
- "output": "Based on the provided context, there are two interactive ways to learn how to use Claude's capabilities, including uploading PDFs and generating embeddings:\n\n1. Claude Cookbook: This is a collection of Jupyter notebooks that provide practical examples and hands-on tutorials. It specifically includes lessons on PDF upload and summarization, as well as creating and using embeddings with VoyageAI for advanced text similarity and search tasks.\n\n2. Developer Console: This tool offers a Workbench and prompt generator in your browser, allowing for easier and more powerful prompting. While not explicitly stated to cover PDF uploads and embeddings, it provides an interactive environment to explore Claude's capabilities.\n\nBoth of these resources offer interactive ways to learn and experiment with Claude's various features, including those mentioned in the query.",
+ "output": "Based on the provided context, there are two interactive ways to learn how to use Claude's capabilities, including uploading PDFs and generating embeddings:\n\n1. Claude Cookbooks: This is a collection of Jupyter notebooks that provide practical examples and hands-on tutorials. It specifically includes lessons on PDF upload and summarization, as well as creating and using embeddings with VoyageAI for advanced text similarity and search tasks.\n\n2. Developer Console: This tool offers a Workbench and prompt generator in your browser, allowing for easier and more powerful prompting. While not explicitly stated to cover PDF uploads and embeddings, it provides an interactive environment to explore Claude's capabilities.\n\nBoth of these resources offer interactive ways to learn and experiment with Claude's various features, including those mentioned in the query.",
"tokenUsage": {
"total": 1098,
"prompt": 928,
@@ -13594,7 +13594,7 @@
{
"pass": true,
"score": 1,
- "reason": "The Generated Answer is correct based on the Correct Answer. Both answers highlight the same two interactive ways to learn Claude's capabilities:\n\n1. The Claude Cookbook: Both answers mention this as a collection of Jupyter notebooks that demonstrate various capabilities, including uploading PDFs and generating embeddings.\n\n2. The Developer Console: Both answers describe this as a tool for easier and more powerful prompting, with the Generated Answer providing slightly more detail about the Workbench feature.\n\nThe Generated Answer covers the key points from the Correct Answer, providing essentially the same information about these two interactive learning methods. There are no critical omissions or contradictions between the two answers.",
+ "reason": "The Generated Answer is correct based on the Correct Answer. Both answers highlight the same two interactive ways to learn Claude's capabilities:\n\n1. The Claude Cookbooks: Both answers mention this as a collection of Jupyter notebooks that demonstrate various capabilities, including uploading PDFs and generating embeddings.\n\n2. The Developer Console: Both answers describe this as a tool for easier and more powerful prompting, with the Generated Answer providing slightly more detail about the Workbench feature.\n\nThe Generated Answer covers the key points from the Correct Answer, providing essentially the same information about these two interactive learning methods. There are no critical omissions or contradictions between the two answers.",
"assertion": {
"type": "python",
"value": "file://eval_end_to_end.py"
@@ -14114,7 +14114,7 @@
"label": "Haiku: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways to start experimenting with Claude as a user, according to Anthropic's documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbook for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 Family Enterprise considerations Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, 
embeddings, and more.\n\n\nGet started\n\n\nIf you’re new to Claude, start here to learn the essentials and make your first API call.\nIntro to ClaudeExplore Claude’s capabilities and development flow.QuickstartLearn how to make your first API call in minutes.Prompt LibraryExplore example prompts for inspiration.\nIntro to ClaudeExplore Claude’s capabilities and development flow.\n\nIntro to Claude\nExplore Claude’s capabilities and development flow.\nQuickstartLearn how to make your first API call in minutes.\n\nQuickstart\nLearn how to make your first API call in minutes.\nPrompt LibraryExplore example prompts for inspiration.\n\nPrompt Library\nExplore example prompts for inspiration.\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways to start experimenting with Claude as a user, according to Anthropic's documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbooks for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 Family Enterprise considerations Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, 
embeddings, and more.\n\n\nGet started\n\n\nIf you’re new to Claude, start here to learn the essentials and make your first API call.\nIntro to ClaudeExplore Claude’s capabilities and development flow.QuickstartLearn how to make your first API call in minutes.Prompt LibraryExplore example prompts for inspiration.\nIntro to ClaudeExplore Claude’s capabilities and development flow.\n\nIntro to Claude\nExplore Claude’s capabilities and development flow.\nQuickstartLearn how to make your first API call in minutes.\n\nQuickstart\nLearn how to make your first API call in minutes.\nPrompt LibraryExplore example prompts for inspiration.\n\nPrompt Library\nExplore example prompts for inspiration.\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_base"
},
"vars": {
@@ -14160,7 +14160,7 @@
"label": "3.5 Sonnet: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways to start experimenting with Claude as a user, according to Anthropic's documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbook for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 Family Enterprise considerations Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, 
embeddings, and more.\n\n\nGet started\n\n\nIf you’re new to Claude, start here to learn the essentials and make your first API call.\nIntro to ClaudeExplore Claude’s capabilities and development flow.QuickstartLearn how to make your first API call in minutes.Prompt LibraryExplore example prompts for inspiration.\nIntro to ClaudeExplore Claude’s capabilities and development flow.\n\nIntro to Claude\nExplore Claude’s capabilities and development flow.\nQuickstartLearn how to make your first API call in minutes.\n\nQuickstart\nLearn how to make your first API call in minutes.\nPrompt LibraryExplore example prompts for inspiration.\n\nPrompt Library\nExplore example prompts for inspiration.\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways to start experimenting with Claude as a user, according to Anthropic's documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbooks for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 Family Enterprise considerations Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, 
embeddings, and more.\n\n\nGet started\n\n\nIf you’re new to Claude, start here to learn the essentials and make your first API call.\nIntro to ClaudeExplore Claude’s capabilities and development flow.QuickstartLearn how to make your first API call in minutes.Prompt LibraryExplore example prompts for inspiration.\nIntro to ClaudeExplore Claude’s capabilities and development flow.\n\nIntro to Claude\nExplore Claude’s capabilities and development flow.\nQuickstartLearn how to make your first API call in minutes.\n\nQuickstart\nLearn how to make your first API call in minutes.\nPrompt LibraryExplore example prompts for inspiration.\n\nPrompt Library\nExplore example prompts for inspiration.\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_base"
},
"vars": {
@@ -14263,7 +14263,7 @@
"label": "Haiku: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways to start experimenting with Claude as a user, according to Anthropic's documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Get started\n\nText\n Get started\n\n\nIf you’re new to Claude, start here to learn the essentials and make your first API call.\nIntro to ClaudeExplore Claude’s capabilities and development flow.QuickstartLearn how to make your first API call in minutes.Prompt LibraryExplore example prompts for inspiration.\nIntro to ClaudeExplore Claude’s capabilities and development flow.\n\nIntro to Claude\nExplore Claude’s capabilities and development flow.\nQuickstartLearn how to make your first API call in minutes.\n\nQuickstart\nLearn how to make your first API call in minutes.\nPrompt LibraryExplore example prompts for inspiration.\n\nPrompt Library\nExplore example prompts for inspiration.\n \n\nSummary: \n The documentation covers getting started with Anthropic's Claude AI model, including an introduction to its capabilities, a quickstart guide for making API calls, and a prompt library for inspiration. It provides essential information for users new to Claude to learn the basics and start using the model. \n \n\n \n Get started with Claude\n\nText\n Get started with Claude\n\n\nIf you’re ready to start exploring what Claude can do for you, let’s dive in! Whether you’re a developer looking to integrate Claude into your applications or a user wanting to experience the power of AI firsthand, we’ve got you covered.\nCheck out our quickstart guide for step-by-step instructions on how to get up and running with Claude. You’ll learn how to create an account, obtain API keys, and start interacting with our models in no time. 
You can also head over to claude.ai or our web Console to start experimenting with Claude right away!\nIf you have any questions or need assistance, don’t hesitate to reach out to our support team or consult the Discord community.\nTicket RoutingSecurity and compliancexlinkedin\nTicket RoutingSecurity and compliance\nxlinkedin\nModel names Model comparison Prompt and output performance Legacy models Legacy model comparison Get started with Claude\nModel namesModel comparisonPrompt and output performanceLegacy modelsLegacy model comparisonGet started with Claude\n \n\nSummary: \n The documentation covers getting started with Anthropic's Claude AI model, including a quickstart guide, account creation, API key obtainment, and interactive experimentation through the web Console. It also provides information on support resources and additional model-related topics. \n \n\n \n Develop with Claude\n\nText\n Develop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n\nSummary: \n Anthropic provides a suite of 
developer tools, including a browser-based Workbench and prompt generator, API reference documentation, and interactive Jupyter notebooks, to help developers build scalable applications with the Claude AI model. These tools enable easier, more powerful prompting, exploration and implementation of the Claude API and SDKs, and learning through interactive demonstrations. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways to start experimenting with Claude as a user, according to Anthropic's documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Get started\n\nText\n Get started\n\n\nIf you’re new to Claude, start here to learn the essentials and make your first API call.\nIntro to ClaudeExplore Claude’s capabilities and development flow.QuickstartLearn how to make your first API call in minutes.Prompt LibraryExplore example prompts for inspiration.\nIntro to ClaudeExplore Claude’s capabilities and development flow.\n\nIntro to Claude\nExplore Claude’s capabilities and development flow.\nQuickstartLearn how to make your first API call in minutes.\n\nQuickstart\nLearn how to make your first API call in minutes.\nPrompt LibraryExplore example prompts for inspiration.\n\nPrompt Library\nExplore example prompts for inspiration.\n \n\nSummary: \n The documentation covers getting started with Anthropic's Claude AI model, including an introduction to its capabilities, a quickstart guide for making API calls, and a prompt library for inspiration. It provides essential information for users new to Claude to learn the basics and start using the model. \n \n\n \n Get started with Claude\n\nText\n Get started with Claude\n\n\nIf you’re ready to start exploring what Claude can do for you, let’s dive in! Whether you’re a developer looking to integrate Claude into your applications or a user wanting to experience the power of AI firsthand, we’ve got you covered.\nCheck out our quickstart guide for step-by-step instructions on how to get up and running with Claude. You’ll learn how to create an account, obtain API keys, and start interacting with our models in no time. 
You can also head over to claude.ai or our web Console to start experimenting with Claude right away!\nIf you have any questions or need assistance, don’t hesitate to reach out to our support team or consult the Discord community.\nTicket RoutingSecurity and compliancexlinkedin\nTicket RoutingSecurity and compliance\nxlinkedin\nModel names Model comparison Prompt and output performance Legacy models Legacy model comparison Get started with Claude\nModel namesModel comparisonPrompt and output performanceLegacy modelsLegacy model comparisonGet started with Claude\n \n\nSummary: \n The documentation covers getting started with Anthropic's Claude AI model, including a quickstart guide, account creation, API key obtainment, and interactive experimentation through the web Console. It also provides information on support resources and additional model-related topics. \n \n\n \n Develop with Claude\n\nText\n Develop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n\nSummary: \n Anthropic provides a suite of 
developer tools, including a browser-based Workbench and prompt generator, API reference documentation, and interactive Jupyter notebooks, to help developers build scalable applications with the Claude AI model. These tools enable easier, more powerful prompting, exploration and implementation of the Claude API and SDKs, and learning through interactive demonstrations. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_level_two"
},
"vars": {
@@ -14315,7 +14315,7 @@
"label": "3.5 Sonnet: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways to start experimenting with Claude as a user, according to Anthropic's documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Get started\n\nText\n Get started\n\n\nIf you’re new to Claude, start here to learn the essentials and make your first API call.\nIntro to ClaudeExplore Claude’s capabilities and development flow.QuickstartLearn how to make your first API call in minutes.Prompt LibraryExplore example prompts for inspiration.\nIntro to ClaudeExplore Claude’s capabilities and development flow.\n\nIntro to Claude\nExplore Claude’s capabilities and development flow.\nQuickstartLearn how to make your first API call in minutes.\n\nQuickstart\nLearn how to make your first API call in minutes.\nPrompt LibraryExplore example prompts for inspiration.\n\nPrompt Library\nExplore example prompts for inspiration.\n \n\nSummary: \n The documentation covers getting started with Anthropic's Claude AI model, including an introduction to its capabilities, a quickstart guide for making API calls, and a prompt library for inspiration. It provides essential information for users new to Claude to learn the basics and start using the model. \n \n\n \n Get started with Claude\n\nText\n Get started with Claude\n\n\nIf you’re ready to start exploring what Claude can do for you, let’s dive in! Whether you’re a developer looking to integrate Claude into your applications or a user wanting to experience the power of AI firsthand, we’ve got you covered.\nCheck out our quickstart guide for step-by-step instructions on how to get up and running with Claude. You’ll learn how to create an account, obtain API keys, and start interacting with our models in no time. 
You can also head over to claude.ai or our web Console to start experimenting with Claude right away!\nIf you have any questions or need assistance, don’t hesitate to reach out to our support team or consult the Discord community.\nTicket RoutingSecurity and compliancexlinkedin\nTicket RoutingSecurity and compliance\nxlinkedin\nModel names Model comparison Prompt and output performance Legacy models Legacy model comparison Get started with Claude\nModel namesModel comparisonPrompt and output performanceLegacy modelsLegacy model comparisonGet started with Claude\n \n\nSummary: \n The documentation covers getting started with Anthropic's Claude AI model, including a quickstart guide, account creation, API key obtainment, and interactive experimentation through the web Console. It also provides information on support resources and additional model-related topics. \n \n\n \n Develop with Claude\n\nText\n Develop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n\nSummary: \n Anthropic provides a suite of 
developer tools, including a browser-based Workbench and prompt generator, API reference documentation, and interactive Jupyter notebooks, to help developers build scalable applications with the Claude AI model. These tools enable easier, more powerful prompting, exploration and implementation of the Claude API and SDKs, and learning through interactive demonstrations. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways to start experimenting with Claude as a user, according to Anthropic's documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Get started\n\nText\n Get started\n\n\nIf you’re new to Claude, start here to learn the essentials and make your first API call.\nIntro to ClaudeExplore Claude’s capabilities and development flow.QuickstartLearn how to make your first API call in minutes.Prompt LibraryExplore example prompts for inspiration.\nIntro to ClaudeExplore Claude’s capabilities and development flow.\n\nIntro to Claude\nExplore Claude’s capabilities and development flow.\nQuickstartLearn how to make your first API call in minutes.\n\nQuickstart\nLearn how to make your first API call in minutes.\nPrompt LibraryExplore example prompts for inspiration.\n\nPrompt Library\nExplore example prompts for inspiration.\n \n\nSummary: \n The documentation covers getting started with Anthropic's Claude AI model, including an introduction to its capabilities, a quickstart guide for making API calls, and a prompt library for inspiration. It provides essential information for users new to Claude to learn the basics and start using the model. \n \n\n \n Get started with Claude\n\nText\n Get started with Claude\n\n\nIf you’re ready to start exploring what Claude can do for you, let’s dive in! Whether you’re a developer looking to integrate Claude into your applications or a user wanting to experience the power of AI firsthand, we’ve got you covered.\nCheck out our quickstart guide for step-by-step instructions on how to get up and running with Claude. You’ll learn how to create an account, obtain API keys, and start interacting with our models in no time. 
You can also head over to claude.ai or our web Console to start experimenting with Claude right away!\nIf you have any questions or need assistance, don’t hesitate to reach out to our support team or consult the Discord community.\nTicket RoutingSecurity and compliancexlinkedin\nTicket RoutingSecurity and compliance\nxlinkedin\nModel names Model comparison Prompt and output performance Legacy models Legacy model comparison Get started with Claude\nModel namesModel comparisonPrompt and output performanceLegacy modelsLegacy model comparisonGet started with Claude\n \n\nSummary: \n The documentation covers getting started with Anthropic's Claude AI model, including a quickstart guide, account creation, API key obtainment, and interactive experimentation through the web Console. It also provides information on support resources and additional model-related topics. \n \n\n \n Develop with Claude\n\nText\n Develop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n\nSummary: \n Anthropic provides a suite of 
developer tools, including a browser-based Workbench and prompt generator, API reference documentation, and interactive Jupyter notebooks, to help developers build scalable applications with the Claude AI model. These tools enable easier, more powerful prompting, exploration and implementation of the Claude API and SDKs, and learning through interactive demonstrations. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_level_two"
},
"vars": {
@@ -14366,7 +14366,7 @@
"label": "Haiku: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways to start experimenting with Claude as a user, according to Anthropic's documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Get started with Claude\n\nGet started with Claude\n\n\nIf you’re ready to start exploring what Claude can do for you, let’s dive in! Whether you’re a developer looking to integrate Claude into your applications or a user wanting to experience the power of AI firsthand, we’ve got you covered.\nCheck out our quickstart guide for step-by-step instructions on how to get up and running with Claude. You’ll learn how to create an account, obtain API keys, and start interacting with our models in no time. You can also head over to claude.ai or our web Console to start experimenting with Claude right away!\nIf you have any questions or need assistance, don’t hesitate to reach out to our support team or consult the Discord community.\nTicket RoutingSecurity and compliancexlinkedin\nTicket RoutingSecurity and compliance\nxlinkedin\nModel names Model comparison Prompt and output performance Legacy models Legacy model comparison Get started with Claude\nModel namesModel comparisonPrompt and output performanceLegacy modelsLegacy model comparisonGet started with Claude\n \n \n\n \n Get started\n\nGet started\n\n\nIf you’re new to Claude, start here to learn the essentials and make your first API call.\nIntro to ClaudeExplore Claude’s capabilities and development flow.QuickstartLearn how to make your first API call in minutes.Prompt LibraryExplore example prompts for inspiration.\nIntro to ClaudeExplore Claude’s capabilities and development flow.\n\nIntro to Claude\nExplore Claude’s capabilities and development flow.\nQuickstartLearn how to make your first API call in minutes.\n\nQuickstart\nLearn how to make your first API call in minutes.\nPrompt LibraryExplore example 
prompts for inspiration.\n\nPrompt Library\nExplore example prompts for inspiration.\n \n \n\n \n Start building with Claude\n\nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbook for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 Family Enterprise considerations Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways to start experimenting with Claude as a user, according to Anthropic's documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Get started with Claude\n\nGet started with Claude\n\n\nIf you’re ready to start exploring what Claude can do for you, let’s dive in! Whether you’re a developer looking to integrate Claude into your applications or a user wanting to experience the power of AI firsthand, we’ve got you covered.\nCheck out our quickstart guide for step-by-step instructions on how to get up and running with Claude. You’ll learn how to create an account, obtain API keys, and start interacting with our models in no time. You can also head over to claude.ai or our web Console to start experimenting with Claude right away!\nIf you have any questions or need assistance, don’t hesitate to reach out to our support team or consult the Discord community.\nTicket RoutingSecurity and compliancexlinkedin\nTicket RoutingSecurity and compliance\nxlinkedin\nModel names Model comparison Prompt and output performance Legacy models Legacy model comparison Get started with Claude\nModel namesModel comparisonPrompt and output performanceLegacy modelsLegacy model comparisonGet started with Claude\n \n \n\n \n Get started\n\nGet started\n\n\nIf you’re new to Claude, start here to learn the essentials and make your first API call.\nIntro to ClaudeExplore Claude’s capabilities and development flow.QuickstartLearn how to make your first API call in minutes.Prompt LibraryExplore example prompts for inspiration.\nIntro to ClaudeExplore Claude’s capabilities and development flow.\n\nIntro to Claude\nExplore Claude’s capabilities and development flow.\nQuickstartLearn how to make your first API call in minutes.\n\nQuickstart\nLearn how to make your first API call in minutes.\nPrompt LibraryExplore example 
prompts for inspiration.\n\nPrompt Library\nExplore example prompts for inspiration.\n \n \n\n \n Start building with Claude\n\nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbooks for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 Family Enterprise considerations Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_level_three"
},
"vars": {
@@ -14418,7 +14418,7 @@
"label": "3.5 Sonnet: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways to start experimenting with Claude as a user, according to Anthropic's documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Get started with Claude\n\nGet started with Claude\n\n\nIf you’re ready to start exploring what Claude can do for you, let’s dive in! Whether you’re a developer looking to integrate Claude into your applications or a user wanting to experience the power of AI firsthand, we’ve got you covered.\nCheck out our quickstart guide for step-by-step instructions on how to get up and running with Claude. You’ll learn how to create an account, obtain API keys, and start interacting with our models in no time. You can also head over to claude.ai or our web Console to start experimenting with Claude right away!\nIf you have any questions or need assistance, don’t hesitate to reach out to our support team or consult the Discord community.\nTicket RoutingSecurity and compliancexlinkedin\nTicket RoutingSecurity and compliance\nxlinkedin\nModel names Model comparison Prompt and output performance Legacy models Legacy model comparison Get started with Claude\nModel namesModel comparisonPrompt and output performanceLegacy modelsLegacy model comparisonGet started with Claude\n \n \n\n \n Get started\n\nGet started\n\n\nIf you’re new to Claude, start here to learn the essentials and make your first API call.\nIntro to ClaudeExplore Claude’s capabilities and development flow.QuickstartLearn how to make your first API call in minutes.Prompt LibraryExplore example prompts for inspiration.\nIntro to ClaudeExplore Claude’s capabilities and development flow.\n\nIntro to Claude\nExplore Claude’s capabilities and development flow.\nQuickstartLearn how to make your first API call in minutes.\n\nQuickstart\nLearn how to make your first API call in minutes.\nPrompt LibraryExplore example 
prompts for inspiration.\n\nPrompt Library\nExplore example prompts for inspiration.\n \n \n\n \n Start building with Claude\n\nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbook for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 Family Enterprise considerations Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways to start experimenting with Claude as a user, according to Anthropic's documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Get started with Claude\n\nGet started with Claude\n\n\nIf you’re ready to start exploring what Claude can do for you, let’s dive in! Whether you’re a developer looking to integrate Claude into your applications or a user wanting to experience the power of AI firsthand, we’ve got you covered.\nCheck out our quickstart guide for step-by-step instructions on how to get up and running with Claude. You’ll learn how to create an account, obtain API keys, and start interacting with our models in no time. You can also head over to claude.ai or our web Console to start experimenting with Claude right away!\nIf you have any questions or need assistance, don’t hesitate to reach out to our support team or consult the Discord community.\nTicket RoutingSecurity and compliancexlinkedin\nTicket RoutingSecurity and compliance\nxlinkedin\nModel names Model comparison Prompt and output performance Legacy models Legacy model comparison Get started with Claude\nModel namesModel comparisonPrompt and output performanceLegacy modelsLegacy model comparisonGet started with Claude\n \n \n\n \n Get started\n\nGet started\n\n\nIf you’re new to Claude, start here to learn the essentials and make your first API call.\nIntro to ClaudeExplore Claude’s capabilities and development flow.QuickstartLearn how to make your first API call in minutes.Prompt LibraryExplore example prompts for inspiration.\nIntro to ClaudeExplore Claude’s capabilities and development flow.\n\nIntro to Claude\nExplore Claude’s capabilities and development flow.\nQuickstartLearn how to make your first API call in minutes.\n\nQuickstart\nLearn how to make your first API call in minutes.\nPrompt LibraryExplore example 
prompts for inspiration.\n\nPrompt Library\nExplore example prompts for inspiration.\n \n \n\n \n Start building with Claude\n\nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbooks for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 Family Enterprise considerations Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_level_three"
},
"vars": {
@@ -16214,7 +16214,7 @@
"label": "Haiku: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are some of the key capabilities that make Claude suitable for enterprise use cases requiring integration with specialized applications and processing of large volumes of sensitive data?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Enterprise considerations\n\nEnterprise considerations\n\n\nAlong with an extensive set of features, tools, and capabilities, Claude is also built to be secure, trustworthy, and scalable for wide-reaching enterprise needs.\nFeatureDescriptionSecureEnterprise-grade security and data handling for APISOC II Type 2 certified, HIPAA compliance options for APIAccessible through AWS (GA) and GCP (in private preview)TrustworthyResistant to jailbreaks and misuse. We continuously monitor prompts and outputs for harmful, malicious use cases that violate our AUP.Copyright indemnity protections for paid commercial servicesUniquely positioned to serve high trust industries that process large volumes of sensitive user dataCapable200K token context window for expanded use cases, with future support for 1MTool use, also known as function calling, which allows seamless integration of Claude into specialized applications and custom workflowsMultimodal input capabilities with text output, allowing you to upload images (such as tables, graphs, and photos) along with text prompts for richer context and complex use casesDeveloper Console with Workbench and prompt generation tool for easier, more powerful prompting and experimentationSDKs and APIs to expedite and enhance developmentReliableVery low hallucination ratesAccurate over long documentsGlobalGreat for coding tasks and fluency in English and non-English languages like Spanish and JapaneseEnables use cases like translation services and broader global utilityCost consciousFamily of models balances cost, performance, and 
intelligence\nEnterprise-grade security and data handling for APISOC II Type 2 certified, HIPAA compliance options for APIAccessible through AWS (GA) and GCP (in private preview)\nResistant to jailbreaks and misuse. We continuously monitor prompts and outputs for harmful, malicious use cases that violate our AUP.Copyright indemnity protections for paid commercial servicesUniquely positioned to serve high trust industries that process large volumes of sensitive user data\n200K token context window for expanded use cases, with future support for 1MTool use, also known as function calling, which allows seamless integration of Claude into specialized applications and custom workflowsMultimodal input capabilities with text output, allowing you to upload images (such as tables, graphs, and photos) along with text prompts for richer context and complex use casesDeveloper Console with Workbench and prompt generation tool for easier, more powerful prompting and experimentationSDKs and APIs to expedite and enhance development\nVery low hallucination ratesAccurate over long documents\nGreat for coding tasks and fluency in English and non-English languages like Spanish and JapaneseEnables use cases like translation services and broader global utility\nFamily of models balances cost, performance, and intelligence\n \n \n\n \n Deploy your classifier\n\nDeploy your classifier\n\n\nTo see code examples of how to use Claude for classification, check out the Classification Guide in the Claude Cookbook.\nOverviewTicket Routingxlinkedin\nOverviewTicket Routing\nxlinkedin\nWhen to use Claude for classification Establish your classification use case Implement Claude for classification 1. Build a strong input prompt 2. Develop your test cases 3. Run your eval Evaluation metrics Deploy your classifier\nWhen to use Claude for classificationEstablish your classification use caseImplement Claude for classification1. Build a strong input prompt2. Develop your test cases3. 
Run your evalEvaluation metricsDeploy your classifier\n \n \n\n \n Integrate Claude into your Support Workflow\n\nIntegrate Claude into your Support Workflow\n\n\nWhen integrating your code into production, you’ll need to architect how it fits into the flow of your ticket routing system. There are two ways you could go around doing this:\nPush-based: Where the Support Ticket System you’re using (e.g. Zendesk an Anthropic partner) will trigger your code by sending a webhook event to your routing service, which will then classify the intent and route it.\nPull-Based: Where your code could pull for the latest tickets at a certain schedule and then route them.\nWhile the bulk of the classification work discussed in previous sections remains the same, you will need to wrap your code in a service for either of the two approaches above. The choice of approach depends on what APIs the support ticketing system provides. Between the two, the push-based approach using webhooks is more web-scaleable but needs you to expose a public endpoint that might have IT Security implications. The pull-based approach is easier to implement but makes unnecessary calls to the Support Ticket System.\n\nThe diagram above shows the push-based approach in action:\nSupport Ticket Creation - The process begins when a customer creates a new support ticket. The customer provides the necessary information about their issue or inquiry, which is then submitted to the Support Ticket System.\nWebhook Event Generation - Upon receiving the new support ticket, the Support Ticket System should generate a Webhook Event Ticket Created notification. This event triggers the subsequent steps in the ticket routing process.\nTicket Content Retrieval - The webhook event initiates the retrieval of the ticket’s contents from the Support Ticket System. 
This step ensures that the full details of the customer’s issue are available for analysis and classification.\nSupport Request Classification - Using the retrieved ticket contents, the system classifies the intent behind the support request using your code. This classification helps identify the most appropriate team or service to handle the ticket. For the webhook-based approach to work, your code from the previous section will need to be served using a RESTful API which can be called from the webhook. The endpoint for the request would need to be reachable from the internet.\nTicket Update - Finally, the ticket is updated back into the Support Ticket System, from where the assigned support team can work on resolving it.\nNote: While the classification method calls Claude API, we’ve removed that extra call from the diagram for simplicity.\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are some of the key capabilities that make Claude suitable for enterprise use cases requiring integration with specialized applications and processing of large volumes of sensitive data?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Enterprise considerations\n\nEnterprise considerations\n\n\nAlong with an extensive set of features, tools, and capabilities, Claude is also built to be secure, trustworthy, and scalable for wide-reaching enterprise needs.\nFeatureDescriptionSecureEnterprise-grade security and data handling for APISOC II Type 2 certified, HIPAA compliance options for APIAccessible through AWS (GA) and GCP (in private preview)TrustworthyResistant to jailbreaks and misuse. We continuously monitor prompts and outputs for harmful, malicious use cases that violate our AUP.Copyright indemnity protections for paid commercial servicesUniquely positioned to serve high trust industries that process large volumes of sensitive user dataCapable200K token context window for expanded use cases, with future support for 1MTool use, also known as function calling, which allows seamless integration of Claude into specialized applications and custom workflowsMultimodal input capabilities with text output, allowing you to upload images (such as tables, graphs, and photos) along with text prompts for richer context and complex use casesDeveloper Console with Workbench and prompt generation tool for easier, more powerful prompting and experimentationSDKs and APIs to expedite and enhance developmentReliableVery low hallucination ratesAccurate over long documentsGlobalGreat for coding tasks and fluency in English and non-English languages like Spanish and JapaneseEnables use cases like translation services and broader global utilityCost consciousFamily of models balances cost, performance, and 
intelligence\nEnterprise-grade security and data handling for APISOC II Type 2 certified, HIPAA compliance options for APIAccessible through AWS (GA) and GCP (in private preview)\nResistant to jailbreaks and misuse. We continuously monitor prompts and outputs for harmful, malicious use cases that violate our AUP.Copyright indemnity protections for paid commercial servicesUniquely positioned to serve high trust industries that process large volumes of sensitive user data\n200K token context window for expanded use cases, with future support for 1MTool use, also known as function calling, which allows seamless integration of Claude into specialized applications and custom workflowsMultimodal input capabilities with text output, allowing you to upload images (such as tables, graphs, and photos) along with text prompts for richer context and complex use casesDeveloper Console with Workbench and prompt generation tool for easier, more powerful prompting and experimentationSDKs and APIs to expedite and enhance development\nVery low hallucination ratesAccurate over long documents\nGreat for coding tasks and fluency in English and non-English languages like Spanish and JapaneseEnables use cases like translation services and broader global utility\nFamily of models balances cost, performance, and intelligence\n \n \n\n \n Deploy your classifier\n\nDeploy your classifier\n\n\nTo see code examples of how to use Claude for classification, check out the Classification Guide in the Claude Cookbooks.\nOverviewTicket Routingxlinkedin\nOverviewTicket Routing\nxlinkedin\nWhen to use Claude for classification Establish your classification use case Implement Claude for classification 1. Build a strong input prompt 2. Develop your test cases 3. Run your eval Evaluation metrics Deploy your classifier\nWhen to use Claude for classificationEstablish your classification use caseImplement Claude for classification1. Build a strong input prompt2. Develop your test cases3. 
Run your evalEvaluation metricsDeploy your classifier\n \n \n\n \n Integrate Claude into your Support Workflow\n\nIntegrate Claude into your Support Workflow\n\n\nWhen integrating your code into production, you’ll need to architect how it fits into the flow of your ticket routing system. There are two ways you could go around doing this:\nPush-based: Where the Support Ticket System you’re using (e.g. Zendesk an Anthropic partner) will trigger your code by sending a webhook event to your routing service, which will then classify the intent and route it.\nPull-Based: Where your code could pull for the latest tickets at a certain schedule and then route them.\nWhile the bulk of the classification work discussed in previous sections remains the same, you will need to wrap your code in a service for either of the two approaches above. The choice of approach depends on what APIs the support ticketing system provides. Between the two, the push-based approach using webhooks is more web-scaleable but needs you to expose a public endpoint that might have IT Security implications. The pull-based approach is easier to implement but makes unnecessary calls to the Support Ticket System.\n\nThe diagram above shows the push-based approach in action:\nSupport Ticket Creation - The process begins when a customer creates a new support ticket. The customer provides the necessary information about their issue or inquiry, which is then submitted to the Support Ticket System.\nWebhook Event Generation - Upon receiving the new support ticket, the Support Ticket System should generate a Webhook Event Ticket Created notification. This event triggers the subsequent steps in the ticket routing process.\nTicket Content Retrieval - The webhook event initiates the retrieval of the ticket’s contents from the Support Ticket System. 
This step ensures that the full details of the customer’s issue are available for analysis and classification.\nSupport Request Classification - Using the retrieved ticket contents, the system classifies the intent behind the support request using your code. This classification helps identify the most appropriate team or service to handle the ticket. For the webhook-based approach to work, your code from the previous section will need to be served using a RESTful API which can be called from the webhook. The endpoint for the request would need to be reachable from the internet.\nTicket Update - Finally, the ticket is updated back into the Support Ticket System, from where the assigned support team can work on resolving it.\nNote: While the classification method calls Claude API, we’ve removed that extra call from the diagram for simplicity.\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_level_three"
},
"vars": {
@@ -16369,7 +16369,7 @@
"label": "3.5 Sonnet: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are some of the key capabilities that make Claude suitable for enterprise use cases requiring integration with specialized applications and processing of large volumes of sensitive data?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Enterprise considerations\n\nEnterprise considerations\n\n\nAlong with an extensive set of features, tools, and capabilities, Claude is also built to be secure, trustworthy, and scalable for wide-reaching enterprise needs.\nFeatureDescriptionSecureEnterprise-grade security and data handling for APISOC II Type 2 certified, HIPAA compliance options for APIAccessible through AWS (GA) and GCP (in private preview)TrustworthyResistant to jailbreaks and misuse. We continuously monitor prompts and outputs for harmful, malicious use cases that violate our AUP.Copyright indemnity protections for paid commercial servicesUniquely positioned to serve high trust industries that process large volumes of sensitive user dataCapable200K token context window for expanded use cases, with future support for 1MTool use, also known as function calling, which allows seamless integration of Claude into specialized applications and custom workflowsMultimodal input capabilities with text output, allowing you to upload images (such as tables, graphs, and photos) along with text prompts for richer context and complex use casesDeveloper Console with Workbench and prompt generation tool for easier, more powerful prompting and experimentationSDKs and APIs to expedite and enhance developmentReliableVery low hallucination ratesAccurate over long documentsGlobalGreat for coding tasks and fluency in English and non-English languages like Spanish and JapaneseEnables use cases like translation services and broader global utilityCost consciousFamily of models balances cost, performance, and 
intelligence\nEnterprise-grade security and data handling for APISOC II Type 2 certified, HIPAA compliance options for APIAccessible through AWS (GA) and GCP (in private preview)\nResistant to jailbreaks and misuse. We continuously monitor prompts and outputs for harmful, malicious use cases that violate our AUP.Copyright indemnity protections for paid commercial servicesUniquely positioned to serve high trust industries that process large volumes of sensitive user data\n200K token context window for expanded use cases, with future support for 1MTool use, also known as function calling, which allows seamless integration of Claude into specialized applications and custom workflowsMultimodal input capabilities with text output, allowing you to upload images (such as tables, graphs, and photos) along with text prompts for richer context and complex use casesDeveloper Console with Workbench and prompt generation tool for easier, more powerful prompting and experimentationSDKs and APIs to expedite and enhance development\nVery low hallucination ratesAccurate over long documents\nGreat for coding tasks and fluency in English and non-English languages like Spanish and JapaneseEnables use cases like translation services and broader global utility\nFamily of models balances cost, performance, and intelligence\n \n \n\n \n Deploy your classifier\n\nDeploy your classifier\n\n\nTo see code examples of how to use Claude for classification, check out the Classification Guide in the Claude Cookbook.\nOverviewTicket Routingxlinkedin\nOverviewTicket Routing\nxlinkedin\nWhen to use Claude for classification Establish your classification use case Implement Claude for classification 1. Build a strong input prompt 2. Develop your test cases 3. Run your eval Evaluation metrics Deploy your classifier\nWhen to use Claude for classificationEstablish your classification use caseImplement Claude for classification1. Build a strong input prompt2. Develop your test cases3. 
Run your evalEvaluation metricsDeploy your classifier\n \n \n\n \n Integrate Claude into your Support Workflow\n\nIntegrate Claude into your Support Workflow\n\n\nWhen integrating your code into production, you’ll need to architect how it fits into the flow of your ticket routing system. There are two ways you could go around doing this:\nPush-based: Where the Support Ticket System you’re using (e.g. Zendesk an Anthropic partner) will trigger your code by sending a webhook event to your routing service, which will then classify the intent and route it.\nPull-Based: Where your code could pull for the latest tickets at a certain schedule and then route them.\nWhile the bulk of the classification work discussed in previous sections remains the same, you will need to wrap your code in a service for either of the two approaches above. The choice of approach depends on what APIs the support ticketing system provides. Between the two, the push-based approach using webhooks is more web-scaleable but needs you to expose a public endpoint that might have IT Security implications. The pull-based approach is easier to implement but makes unnecessary calls to the Support Ticket System.\n\nThe diagram above shows the push-based approach in action:\nSupport Ticket Creation - The process begins when a customer creates a new support ticket. The customer provides the necessary information about their issue or inquiry, which is then submitted to the Support Ticket System.\nWebhook Event Generation - Upon receiving the new support ticket, the Support Ticket System should generate a Webhook Event Ticket Created notification. This event triggers the subsequent steps in the ticket routing process.\nTicket Content Retrieval - The webhook event initiates the retrieval of the ticket’s contents from the Support Ticket System. 
This step ensures that the full details of the customer’s issue are available for analysis and classification.\nSupport Request Classification - Using the retrieved ticket contents, the system classifies the intent behind the support request using your code. This classification helps identify the most appropriate team or service to handle the ticket. For the webhook-based approach to work, your code from the previous section will need to be served using a RESTful API which can be called from the webhook. The endpoint for the request would need to be reachable from the internet.\nTicket Update - Finally, the ticket is updated back into the Support Ticket System, from where the assigned support team can work on resolving it.\nNote: While the classification method calls Claude API, we’ve removed that extra call from the diagram for simplicity.\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are some of the key capabilities that make Claude suitable for enterprise use cases requiring integration with specialized applications and processing of large volumes of sensitive data?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Enterprise considerations\n\nEnterprise considerations\n\n\nAlong with an extensive set of features, tools, and capabilities, Claude is also built to be secure, trustworthy, and scalable for wide-reaching enterprise needs.\nFeatureDescriptionSecureEnterprise-grade security and data handling for APISOC II Type 2 certified, HIPAA compliance options for APIAccessible through AWS (GA) and GCP (in private preview)TrustworthyResistant to jailbreaks and misuse. We continuously monitor prompts and outputs for harmful, malicious use cases that violate our AUP.Copyright indemnity protections for paid commercial servicesUniquely positioned to serve high trust industries that process large volumes of sensitive user dataCapable200K token context window for expanded use cases, with future support for 1MTool use, also known as function calling, which allows seamless integration of Claude into specialized applications and custom workflowsMultimodal input capabilities with text output, allowing you to upload images (such as tables, graphs, and photos) along with text prompts for richer context and complex use casesDeveloper Console with Workbench and prompt generation tool for easier, more powerful prompting and experimentationSDKs and APIs to expedite and enhance developmentReliableVery low hallucination ratesAccurate over long documentsGlobalGreat for coding tasks and fluency in English and non-English languages like Spanish and JapaneseEnables use cases like translation services and broader global utilityCost consciousFamily of models balances cost, performance, and 
intelligence\nEnterprise-grade security and data handling for APISOC II Type 2 certified, HIPAA compliance options for APIAccessible through AWS (GA) and GCP (in private preview)\nResistant to jailbreaks and misuse. We continuously monitor prompts and outputs for harmful, malicious use cases that violate our AUP.Copyright indemnity protections for paid commercial servicesUniquely positioned to serve high trust industries that process large volumes of sensitive user data\n200K token context window for expanded use cases, with future support for 1MTool use, also known as function calling, which allows seamless integration of Claude into specialized applications and custom workflowsMultimodal input capabilities with text output, allowing you to upload images (such as tables, graphs, and photos) along with text prompts for richer context and complex use casesDeveloper Console with Workbench and prompt generation tool for easier, more powerful prompting and experimentationSDKs and APIs to expedite and enhance development\nVery low hallucination ratesAccurate over long documents\nGreat for coding tasks and fluency in English and non-English languages like Spanish and JapaneseEnables use cases like translation services and broader global utility\nFamily of models balances cost, performance, and intelligence\n \n \n\n \n Deploy your classifier\n\nDeploy your classifier\n\n\nTo see code examples of how to use Claude for classification, check out the Classification Guide in the Claude Cookbooks.\nOverviewTicket Routingxlinkedin\nOverviewTicket Routing\nxlinkedin\nWhen to use Claude for classification Establish your classification use case Implement Claude for classification 1. Build a strong input prompt 2. Develop your test cases 3. Run your eval Evaluation metrics Deploy your classifier\nWhen to use Claude for classificationEstablish your classification use caseImplement Claude for classification1. Build a strong input prompt2. Develop your test cases3. 
Run your evalEvaluation metricsDeploy your classifier\n \n \n\n \n Integrate Claude into your Support Workflow\n\nIntegrate Claude into your Support Workflow\n\n\nWhen integrating your code into production, you’ll need to architect how it fits into the flow of your ticket routing system. There are two ways you could go around doing this:\nPush-based: Where the Support Ticket System you’re using (e.g. Zendesk an Anthropic partner) will trigger your code by sending a webhook event to your routing service, which will then classify the intent and route it.\nPull-Based: Where your code could pull for the latest tickets at a certain schedule and then route them.\nWhile the bulk of the classification work discussed in previous sections remains the same, you will need to wrap your code in a service for either of the two approaches above. The choice of approach depends on what APIs the support ticketing system provides. Between the two, the push-based approach using webhooks is more web-scaleable but needs you to expose a public endpoint that might have IT Security implications. The pull-based approach is easier to implement but makes unnecessary calls to the Support Ticket System.\n\nThe diagram above shows the push-based approach in action:\nSupport Ticket Creation - The process begins when a customer creates a new support ticket. The customer provides the necessary information about their issue or inquiry, which is then submitted to the Support Ticket System.\nWebhook Event Generation - Upon receiving the new support ticket, the Support Ticket System should generate a Webhook Event Ticket Created notification. This event triggers the subsequent steps in the ticket routing process.\nTicket Content Retrieval - The webhook event initiates the retrieval of the ticket’s contents from the Support Ticket System. 
This step ensures that the full details of the customer’s issue are available for analysis and classification.\nSupport Request Classification - Using the retrieved ticket contents, the system classifies the intent behind the support request using your code. This classification helps identify the most appropriate team or service to handle the ticket. For the webhook-based approach to work, your code from the previous section will need to be served using a RESTful API which can be called from the webhook. The endpoint for the request would need to be reachable from the internet.\nTicket Update - Finally, the ticket is updated back into the Support Ticket System, from where the assigned support team can work on resolving it.\nNote: While the classification method calls Claude API, we’ve removed that extra call from the diagram for simplicity.\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_level_three"
},
"vars": {
@@ -16824,7 +16824,7 @@
"label": "Haiku: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n When did Anthropic release a prompt generator tool to help guide Claude in generating high-quality prompts, and through what interface is it available?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nMore Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. 
Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbook More Resources\nText capabilities and use casesClaude CookbookMore Resources\n\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\n\nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbook for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 
Family Enterprise considerations Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n When did Anthropic release a prompt generator tool to help guide Claude in generating high-quality prompts, and through what interface is it available?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nMore Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. 
Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbooks More Resources\nText capabilities and use casesClaude CookbooksMore Resources\n\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\n\nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbooks for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family 
Claude 3 Family Enterprise considerations Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_base"
},
"vars": {
@@ -16870,7 +16870,7 @@
"label": "3.5 Sonnet: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n When did Anthropic release a prompt generator tool to help guide Claude in generating high-quality prompts, and through what interface is it available?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nMore Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. 
Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbook More Resources\nText capabilities and use casesClaude CookbookMore Resources\n\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\n\nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbook for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 
Family Enterprise considerations Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n When did Anthropic release a prompt generator tool to help guide Claude in generating high-quality prompts, and through what interface is it available?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nMore Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. 
Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbooks More Resources\nText capabilities and use casesClaude CookbooksMore Resources\n\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\n\nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbooks for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family 
Claude 3 Family Enterprise considerations Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_base"
},
"vars": {
@@ -16973,7 +16973,7 @@
"label": "Haiku: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n When did Anthropic release a prompt generator tool to help guide Claude in generating high-quality prompts, and through what interface is it available?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n May 10th, 2024\n\nText\n May 10th, 2024\n\n\nOur prompt generator tool is now available in the Developer Console. Prompt Generator makes it easy to guide Claude to generate a high-quality prompts tailored to your specific tasks. Read more in our blog post.\nOverviewClaude Appsxlinkedin\nOverviewClaude Apps\nxlinkedin\nJune 27th, 2024 June 20th, 2024 May 30th, 2024 May 10th, 2024\nJune 27th, 2024June 20th, 2024May 30th, 2024May 10th, 2024\n \n\nSummary: \n Anthropic has released a Prompt Generator tool in the Developer Console, which helps users create high-quality prompts tailored to their specific tasks. The tool is discussed in a recent blog post, and is part of Anthropic's suite of Claude AI model-related products and services. \n \n\n \n More Resources\n\nText\n More Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. 
Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbook More Resources\nText capabilities and use casesClaude CookbookMore Resources\n \n\nSummary: \n The Claude Documentation provides a Prompt Engineering Guide to help users master the art of prompt crafting, a Prompt Library with pre-crafted prompts for various tasks, and API Documentation for interacting with the Claude AI model. These resources are designed to help users get the most out of the Claude model, particularly for fine-tuning with legacy models. \n \n\n \n Prompt engineering workflow\n\nText\n Prompt engineering workflow\n\n\nOur Claude for Sheets prompting examples workbench is a Claude-powered spreadsheet that houses example prompts and prompt engineering structures.\n \n\nSummary: \n The Anthropic Claude for Sheets prompting examples workbench is a Claude-powered spreadsheet that provides example prompts and prompt engineering structures, serving as a resource for users to explore and learn about prompt engineering. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n When did Anthropic release a prompt generator tool to help guide Claude in generating high-quality prompts, and through what interface is it available?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n May 10th, 2024\n\nText\n May 10th, 2024\n\n\nOur prompt generator tool is now available in the Developer Console. Prompt Generator makes it easy to guide Claude to generate a high-quality prompts tailored to your specific tasks. Read more in our blog post.\nOverviewClaude Appsxlinkedin\nOverviewClaude Apps\nxlinkedin\nJune 27th, 2024 June 20th, 2024 May 30th, 2024 May 10th, 2024\nJune 27th, 2024June 20th, 2024May 30th, 2024May 10th, 2024\n \n\nSummary: \n Anthropic has released a Prompt Generator tool in the Developer Console, which helps users create high-quality prompts tailored to their specific tasks. The tool is discussed in a recent blog post, and is part of Anthropic's suite of Claude AI model-related products and services. \n \n\n \n More Resources\n\nText\n More Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. 
Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbooks More Resources\nText capabilities and use casesClaude CookbooksMore Resources\n \n\nSummary: \n The Claude Documentation provides a Prompt Engineering Guide to help users master the art of prompt crafting, a Prompt Library with pre-crafted prompts for various tasks, and API Documentation for interacting with the Claude AI model. These resources are designed to help users get the most out of the Claude model, particularly for fine-tuning with legacy models. \n \n\n \n Prompt engineering workflow\n\nText\n Prompt engineering workflow\n\n\nOur Claude for Sheets prompting examples workbench is a Claude-powered spreadsheet that houses example prompts and prompt engineering structures.\n \n\nSummary: \n The Anthropic Claude for Sheets prompting examples workbench is a Claude-powered spreadsheet that provides example prompts and prompt engineering structures, serving as a resource for users to explore and learn about prompt engineering. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_level_two"
},
"vars": {
@@ -17024,7 +17024,7 @@
"label": "3.5 Sonnet: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n When did Anthropic release a prompt generator tool to help guide Claude in generating high-quality prompts, and through what interface is it available?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n May 10th, 2024\n\nText\n May 10th, 2024\n\n\nOur prompt generator tool is now available in the Developer Console. Prompt Generator makes it easy to guide Claude to generate a high-quality prompts tailored to your specific tasks. Read more in our blog post.\nOverviewClaude Appsxlinkedin\nOverviewClaude Apps\nxlinkedin\nJune 27th, 2024 June 20th, 2024 May 30th, 2024 May 10th, 2024\nJune 27th, 2024June 20th, 2024May 30th, 2024May 10th, 2024\n \n\nSummary: \n Anthropic has released a Prompt Generator tool in the Developer Console, which helps users create high-quality prompts tailored to their specific tasks. The tool is discussed in a recent blog post, and is part of Anthropic's suite of Claude AI model-related products and services. \n \n\n \n More Resources\n\nText\n More Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. 
Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbook More Resources\nText capabilities and use casesClaude CookbookMore Resources\n \n\nSummary: \n The Claude Documentation provides a Prompt Engineering Guide to help users master the art of prompt crafting, a Prompt Library with pre-crafted prompts for various tasks, and API Documentation for interacting with the Claude AI model. These resources are designed to help users get the most out of the Claude model, particularly for fine-tuning with legacy models. \n \n\n \n Prompt engineering workflow\n\nText\n Prompt engineering workflow\n\n\nOur Claude for Sheets prompting examples workbench is a Claude-powered spreadsheet that houses example prompts and prompt engineering structures.\n \n\nSummary: \n The Anthropic Claude for Sheets prompting examples workbench is a Claude-powered spreadsheet that provides example prompts and prompt engineering structures, serving as a resource for users to explore and learn about prompt engineering. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n When did Anthropic release a prompt generator tool to help guide Claude in generating high-quality prompts, and through what interface is it available?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n May 10th, 2024\n\nText\n May 10th, 2024\n\n\nOur prompt generator tool is now available in the Developer Console. Prompt Generator makes it easy to guide Claude to generate a high-quality prompts tailored to your specific tasks. Read more in our blog post.\nOverviewClaude Appsxlinkedin\nOverviewClaude Apps\nxlinkedin\nJune 27th, 2024 June 20th, 2024 May 30th, 2024 May 10th, 2024\nJune 27th, 2024June 20th, 2024May 30th, 2024May 10th, 2024\n \n\nSummary: \n Anthropic has released a Prompt Generator tool in the Developer Console, which helps users create high-quality prompts tailored to their specific tasks. The tool is discussed in a recent blog post, and is part of Anthropic's suite of Claude AI model-related products and services. \n \n\n \n More Resources\n\nText\n More Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. 
Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbooks More Resources\nText capabilities and use casesClaude CookbooksMore Resources\n \n\nSummary: \n The Claude Documentation provides a Prompt Engineering Guide to help users master the art of prompt crafting, a Prompt Library with pre-crafted prompts for various tasks, and API Documentation for interacting with the Claude AI model. These resources are designed to help users get the most out of the Claude model, particularly for fine-tuning with legacy models. \n \n\n \n Prompt engineering workflow\n\nText\n Prompt engineering workflow\n\n\nOur Claude for Sheets prompting examples workbench is a Claude-powered spreadsheet that houses example prompts and prompt engineering structures.\n \n\nSummary: \n The Anthropic Claude for Sheets prompting examples workbench is a Claude-powered spreadsheet that provides example prompts and prompt engineering structures, serving as a resource for users to explore and learn about prompt engineering. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_level_two"
},
"vars": {
@@ -17075,7 +17075,7 @@
"label": "Haiku: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n When did Anthropic release a prompt generator tool to help guide Claude in generating high-quality prompts, and through what interface is it available?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n May 10th, 2024\n\nMay 10th, 2024\n\n\nOur prompt generator tool is now available in the Developer Console. Prompt Generator makes it easy to guide Claude to generate a high-quality prompts tailored to your specific tasks. Read more in our blog post.\nOverviewClaude Appsxlinkedin\nOverviewClaude Apps\nxlinkedin\nJune 27th, 2024 June 20th, 2024 May 30th, 2024 May 10th, 2024\nJune 27th, 2024June 20th, 2024May 30th, 2024May 10th, 2024\n \n \n\n \n Develop with Claude\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n \n\n \n Before prompt engineering\n\nBefore prompt engineering\n\n\nThis guide assumes that you have:\nA clear definition of the 
success criteria for your use case\nSome ways to empirically test against those criteria\nA first draft prompt you want to improve\nIf not, we highly suggest you spend time establishing that first. Check out Define your success criteria and Create strong empirical evaluations for tips and guidance.\nPrompt generatorDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n\nPrompt generator\nDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n When did Anthropic release a prompt generator tool to help guide Claude in generating high-quality prompts, and through what interface is it available?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n May 10th, 2024\n\nMay 10th, 2024\n\n\nOur prompt generator tool is now available in the Developer Console. Prompt Generator makes it easy to guide Claude to generate a high-quality prompts tailored to your specific tasks. Read more in our blog post.\nOverviewClaude Appsxlinkedin\nOverviewClaude Apps\nxlinkedin\nJune 27th, 2024 June 20th, 2024 May 30th, 2024 May 10th, 2024\nJune 27th, 2024June 20th, 2024May 30th, 2024May 10th, 2024\n \n \n\n \n Develop with Claude\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n \n\n \n Before prompt engineering\n\nBefore prompt engineering\n\n\nThis guide assumes that you have:\nA clear definition of the 
success criteria for your use case\nSome ways to empirically test against those criteria\nA first draft prompt you want to improve\nIf not, we highly suggest you spend time establishing that first. Check out Define your success criteria and Create strong empirical evaluations for tips and guidance.\nPrompt generatorDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n\nPrompt generator\nDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_level_three"
},
"vars": {
@@ -17172,7 +17172,7 @@
"label": "3.5 Sonnet: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n When did Anthropic release a prompt generator tool to help guide Claude in generating high-quality prompts, and through what interface is it available?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n May 10th, 2024\n\nMay 10th, 2024\n\n\nOur prompt generator tool is now available in the Developer Console. Prompt Generator makes it easy to guide Claude to generate a high-quality prompts tailored to your specific tasks. Read more in our blog post.\nOverviewClaude Appsxlinkedin\nOverviewClaude Apps\nxlinkedin\nJune 27th, 2024 June 20th, 2024 May 30th, 2024 May 10th, 2024\nJune 27th, 2024June 20th, 2024May 30th, 2024May 10th, 2024\n \n \n\n \n Develop with Claude\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n \n\n \n Before prompt engineering\n\nBefore prompt engineering\n\n\nThis guide assumes that you have:\nA clear definition of the 
success criteria for your use case\nSome ways to empirically test against those criteria\nA first draft prompt you want to improve\nIf not, we highly suggest you spend time establishing that first. Check out Define your success criteria and Create strong empirical evaluations for tips and guidance.\nPrompt generatorDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n\nPrompt generator\nDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n When did Anthropic release a prompt generator tool to help guide Claude in generating high-quality prompts, and through what interface is it available?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n May 10th, 2024\n\nMay 10th, 2024\n\n\nOur prompt generator tool is now available in the Developer Console. Prompt Generator makes it easy to guide Claude to generate a high-quality prompts tailored to your specific tasks. Read more in our blog post.\nOverviewClaude Appsxlinkedin\nOverviewClaude Apps\nxlinkedin\nJune 27th, 2024 June 20th, 2024 May 30th, 2024 May 10th, 2024\nJune 27th, 2024June 20th, 2024May 30th, 2024May 10th, 2024\n \n \n\n \n Develop with Claude\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n \n\n \n Before prompt engineering\n\nBefore prompt engineering\n\n\nThis guide assumes that you have:\nA clear definition of the 
success criteria for your use case\nSome ways to empirically test against those criteria\nA first draft prompt you want to improve\nIf not, we highly suggest you spend time establishing that first. Check out Define your success criteria and Create strong empirical evaluations for tips and guidance.\nPrompt generatorDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n\nPrompt generator\nDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_level_three"
},
"vars": {
@@ -19533,7 +19533,7 @@
"label": "Haiku: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n What two steps are needed before running a classification evaluation on Claude according to the documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nImplement Claude for classification\n\n\nThe three key model decision factors are: intelligence, latency, and price.\nFor classification, a smaller model like Claude 3 Haiku is typically ideal due to its speed and efficiency. Though, for classification tasks where specialized knowledge or complex reasoning is required, Sonnet or Opus may be a better choice. Learn more about how Opus, Sonnet, and Haiku compare here.\nUse evaluations to gauge whether a Claude model is performing well enough to launch into production.\n\n\nDeploy your classifier\n\n\nTo see code examples of how to use Claude for classification, check out the Classification Guide in the Claude Cookbook.\nOverviewTicket Routingxlinkedin\nOverviewTicket Routing\nxlinkedin\nWhen to use Claude for classification Establish your classification use case Implement Claude for classification 1. Build a strong input prompt 2. Develop your test cases 3. Run your eval Evaluation metrics Deploy your classifier\nWhen to use Claude for classificationEstablish your classification use caseImplement Claude for classification1. Build a strong input prompt2. Develop your test cases3. Run your evalEvaluation metricsDeploy your classifier\n\n\nEvaluation metrics\n\n\nSome success metrics to consider evaluating Claude’s performance on a classification task include:\nCriteriaDescriptionAccuracyThe model’s output exactly matches the golden answer or correctly classifies the input according to the task’s requirements. 
This is typically calculated as (Number of correct predictions) / (Overall number of predictions).F1 ScoreThe model’s output optimally balances precision and recall.ConsistencyThe model’s output is consistent with its predictions for similar inputs or follows a logical pattern.StructureThe model’s output follows the expected format or structure, making it easy to parse and interpret. For example, many classifiers are expected to output JSON format.SpeedThe model provides a response within the acceptable time limit or latency threshold for the task.Bias and FairnessIf classifying data about people, is it important that the model does not demonstrate any biases based on gender, ethnicity, or other characteristics that would lead to its misclassification.\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n What two steps are needed before running a classification evaluation on Claude according to the documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nImplement Claude for classification\n\n\nThe three key model decision factors are: intelligence, latency, and price.\nFor classification, a smaller model like Claude 3 Haiku is typically ideal due to its speed and efficiency. Though, for classification tasks where specialized knowledge or complex reasoning is required, Sonnet or Opus may be a better choice. Learn more about how Opus, Sonnet, and Haiku compare here.\nUse evaluations to gauge whether a Claude model is performing well enough to launch into production.\n\n\nDeploy your classifier\n\n\nTo see code examples of how to use Claude for classification, check out the Classification Guide in the Claude Cookbooks.\nOverviewTicket Routingxlinkedin\nOverviewTicket Routing\nxlinkedin\nWhen to use Claude for classification Establish your classification use case Implement Claude for classification 1. Build a strong input prompt 2. Develop your test cases 3. Run your eval Evaluation metrics Deploy your classifier\nWhen to use Claude for classificationEstablish your classification use caseImplement Claude for classification1. Build a strong input prompt2. Develop your test cases3. Run your evalEvaluation metricsDeploy your classifier\n\n\nEvaluation metrics\n\n\nSome success metrics to consider evaluating Claude’s performance on a classification task include:\nCriteriaDescriptionAccuracyThe model’s output exactly matches the golden answer or correctly classifies the input according to the task’s requirements. 
This is typically calculated as (Number of correct predictions) / (Overall number of predictions).F1 ScoreThe model’s output optimally balances precision and recall.ConsistencyThe model’s output is consistent with its predictions for similar inputs or follows a logical pattern.StructureThe model’s output follows the expected format or structure, making it easy to parse and interpret. For example, many classifiers are expected to output JSON format.SpeedThe model provides a response within the acceptable time limit or latency threshold for the task.Bias and FairnessIf classifying data about people, is it important that the model does not demonstrate any biases based on gender, ethnicity, or other characteristics that would lead to its misclassification.\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_base"
},
"vars": {
@@ -19579,7 +19579,7 @@
"label": "3.5 Sonnet: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n What two steps are needed before running a classification evaluation on Claude according to the documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nImplement Claude for classification\n\n\nThe three key model decision factors are: intelligence, latency, and price.\nFor classification, a smaller model like Claude 3 Haiku is typically ideal due to its speed and efficiency. Though, for classification tasks where specialized knowledge or complex reasoning is required, Sonnet or Opus may be a better choice. Learn more about how Opus, Sonnet, and Haiku compare here.\nUse evaluations to gauge whether a Claude model is performing well enough to launch into production.\n\n\nDeploy your classifier\n\n\nTo see code examples of how to use Claude for classification, check out the Classification Guide in the Claude Cookbook.\nOverviewTicket Routingxlinkedin\nOverviewTicket Routing\nxlinkedin\nWhen to use Claude for classification Establish your classification use case Implement Claude for classification 1. Build a strong input prompt 2. Develop your test cases 3. Run your eval Evaluation metrics Deploy your classifier\nWhen to use Claude for classificationEstablish your classification use caseImplement Claude for classification1. Build a strong input prompt2. Develop your test cases3. Run your evalEvaluation metricsDeploy your classifier\n\n\nEvaluation metrics\n\n\nSome success metrics to consider evaluating Claude’s performance on a classification task include:\nCriteriaDescriptionAccuracyThe model’s output exactly matches the golden answer or correctly classifies the input according to the task’s requirements. 
This is typically calculated as (Number of correct predictions) / (Overall number of predictions).F1 ScoreThe model’s output optimally balances precision and recall.ConsistencyThe model’s output is consistent with its predictions for similar inputs or follows a logical pattern.StructureThe model’s output follows the expected format or structure, making it easy to parse and interpret. For example, many classifiers are expected to output JSON format.SpeedThe model provides a response within the acceptable time limit or latency threshold for the task.Bias and FairnessIf classifying data about people, is it important that the model does not demonstrate any biases based on gender, ethnicity, or other characteristics that would lead to its misclassification.\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n What two steps are needed before running a classification evaluation on Claude according to the documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nImplement Claude for classification\n\n\nThe three key model decision factors are: intelligence, latency, and price.\nFor classification, a smaller model like Claude 3 Haiku is typically ideal due to its speed and efficiency. Though, for classification tasks where specialized knowledge or complex reasoning is required, Sonnet or Opus may be a better choice. Learn more about how Opus, Sonnet, and Haiku compare here.\nUse evaluations to gauge whether a Claude model is performing well enough to launch into production.\n\n\nDeploy your classifier\n\n\nTo see code examples of how to use Claude for classification, check out the Classification Guide in the Claude Cookbooks.\nOverviewTicket Routingxlinkedin\nOverviewTicket Routing\nxlinkedin\nWhen to use Claude for classification Establish your classification use case Implement Claude for classification 1. Build a strong input prompt 2. Develop your test cases 3. Run your eval Evaluation metrics Deploy your classifier\nWhen to use Claude for classificationEstablish your classification use caseImplement Claude for classification1. Build a strong input prompt2. Develop your test cases3. Run your evalEvaluation metricsDeploy your classifier\n\n\nEvaluation metrics\n\n\nSome success metrics to consider evaluating Claude’s performance on a classification task include:\nCriteriaDescriptionAccuracyThe model’s output exactly matches the golden answer or correctly classifies the input according to the task’s requirements. 
This is typically calculated as (Number of correct predictions) / (Overall number of predictions).F1 ScoreThe model’s output optimally balances precision and recall.ConsistencyThe model’s output is consistent with its predictions for similar inputs or follows a logical pattern.StructureThe model’s output follows the expected format or structure, making it easy to parse and interpret. For example, many classifiers are expected to output JSON format.SpeedThe model provides a response within the acceptable time limit or latency threshold for the task.Bias and FairnessIf classifying data about people, is it important that the model does not demonstrate any biases based on gender, ethnicity, or other characteristics that would lead to its misclassification.\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_base"
},
"vars": {
@@ -19631,7 +19631,7 @@
"label": "Haiku: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n What two steps are needed before running a classification evaluation on Claude according to the documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Implement Claude for classification\n\nText\n Implement Claude for classification\n\n\nThe three key model decision factors are: intelligence, latency, and price.\nFor classification, a smaller model like Claude 3 Haiku is typically ideal due to its speed and efficiency. Though, for classification tasks where specialized knowledge or complex reasoning is required, Sonnet or Opus may be a better choice. Learn more about how Opus, Sonnet, and Haiku compare here.\nUse evaluations to gauge whether a Claude model is performing well enough to launch into production.\n \n\nSummary: \n For classification tasks, the smaller Claude 3 Haiku model is typically ideal due to its speed and efficiency, though Sonnet or Opus may be better for tasks requiring specialized knowledge or complex reasoning. Evaluations should be used to gauge whether a Claude model is performing well enough for production. \n \n\n \n Evaluation metrics\n\nText\n Evaluation metrics\n\n\nSome success metrics to consider evaluating Claude’s performance on a classification task include:\nCriteriaDescriptionAccuracyThe model’s output exactly matches the golden answer or correctly classifies the input according to the task’s requirements. This is typically calculated as (Number of correct predictions) / (Overall number of predictions).F1 ScoreThe model’s output optimally balances precision and recall.ConsistencyThe model’s output is consistent with its predictions for similar inputs or follows a logical pattern.StructureThe model’s output follows the expected format or structure, making it easy to parse and interpret. 
For example, many classifiers are expected to output JSON format.SpeedThe model provides a response within the acceptable time limit or latency threshold for the task.Bias and FairnessIf classifying data about people, is it important that the model does not demonstrate any biases based on gender, ethnicity, or other characteristics that would lead to its misclassification.\n \n\nSummary: \n The documentation covers evaluation metrics for the Claude AI model, including accuracy, F1 score, consistency, structure, speed, and bias/fairness. These metrics can be used to assess the model's performance on classification tasks, ensuring it meets the required standards for output quality, consistency, and fairness. \n \n\n \n Deploy your classifier\n\nText\n Deploy your classifier\n\n\nTo see code examples of how to use Claude for classification, check out the Classification Guide in the Claude Cookbook.\nOverviewTicket Routingxlinkedin\nOverviewTicket Routing\nxlinkedin\nWhen to use Claude for classification Establish your classification use case Implement Claude for classification 1. Build a strong input prompt 2. Develop your test cases 3. Run your eval Evaluation metrics Deploy your classifier\nWhen to use Claude for classificationEstablish your classification use caseImplement Claude for classification1. Build a strong input prompt2. Develop your test cases3. Run your evalEvaluation metricsDeploy your classifier\n \n\nSummary: \n Deploy your classifier: Check out the Classification Guide in the Claude Cookbook for code examples on using Claude for classification. The guide covers when to use Claude for classification, establishing your use case, implementing Claude, building prompts, developing test cases, running evaluations, and deploying your classifier. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. 
\n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n What two steps are needed before running a classification evaluation on Claude according to the documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Implement Claude for classification\n\nText\n Implement Claude for classification\n\n\nThe three key model decision factors are: intelligence, latency, and price.\nFor classification, a smaller model like Claude 3 Haiku is typically ideal due to its speed and efficiency. Though, for classification tasks where specialized knowledge or complex reasoning is required, Sonnet or Opus may be a better choice. Learn more about how Opus, Sonnet, and Haiku compare here.\nUse evaluations to gauge whether a Claude model is performing well enough to launch into production.\n \n\nSummary: \n For classification tasks, the smaller Claude 3 Haiku model is typically ideal due to its speed and efficiency, though Sonnet or Opus may be better for tasks requiring specialized knowledge or complex reasoning. Evaluations should be used to gauge whether a Claude model is performing well enough for production. \n \n\n \n Evaluation metrics\n\nText\n Evaluation metrics\n\n\nSome success metrics to consider evaluating Claude’s performance on a classification task include:\nCriteriaDescriptionAccuracyThe model’s output exactly matches the golden answer or correctly classifies the input according to the task’s requirements. This is typically calculated as (Number of correct predictions) / (Overall number of predictions).F1 ScoreThe model’s output optimally balances precision and recall.ConsistencyThe model’s output is consistent with its predictions for similar inputs or follows a logical pattern.StructureThe model’s output follows the expected format or structure, making it easy to parse and interpret. 
For example, many classifiers are expected to output JSON format.SpeedThe model provides a response within the acceptable time limit or latency threshold for the task.Bias and FairnessIf classifying data about people, is it important that the model does not demonstrate any biases based on gender, ethnicity, or other characteristics that would lead to its misclassification.\n \n\nSummary: \n The documentation covers evaluation metrics for the Claude AI model, including accuracy, F1 score, consistency, structure, speed, and bias/fairness. These metrics can be used to assess the model's performance on classification tasks, ensuring it meets the required standards for output quality, consistency, and fairness. \n \n\n \n Deploy your classifier\n\nText\n Deploy your classifier\n\n\nTo see code examples of how to use Claude for classification, check out the Classification Guide in the Claude Cookbooks.\nOverviewTicket Routingxlinkedin\nOverviewTicket Routing\nxlinkedin\nWhen to use Claude for classification Establish your classification use case Implement Claude for classification 1. Build a strong input prompt 2. Develop your test cases 3. Run your eval Evaluation metrics Deploy your classifier\nWhen to use Claude for classificationEstablish your classification use caseImplement Claude for classification1. Build a strong input prompt2. Develop your test cases3. Run your evalEvaluation metricsDeploy your classifier\n \n\nSummary: \n Deploy your classifier: Check out the Classification Guide in the Claude Cookbooks for code examples on using Claude for classification. The guide covers when to use Claude for classification, establishing your use case, implementing Claude, building prompts, developing test cases, running evaluations, and deploying your classifier. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. 
\n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_level_two"
},
"vars": {
@@ -19734,7 +19734,7 @@
"label": "3.5 Sonnet: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n What two steps are needed before running a classification evaluation on Claude according to the documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Implement Claude for classification\n\nText\n Implement Claude for classification\n\n\nThe three key model decision factors are: intelligence, latency, and price.\nFor classification, a smaller model like Claude 3 Haiku is typically ideal due to its speed and efficiency. Though, for classification tasks where specialized knowledge or complex reasoning is required, Sonnet or Opus may be a better choice. Learn more about how Opus, Sonnet, and Haiku compare here.\nUse evaluations to gauge whether a Claude model is performing well enough to launch into production.\n \n\nSummary: \n For classification tasks, the smaller Claude 3 Haiku model is typically ideal due to its speed and efficiency, though Sonnet or Opus may be better for tasks requiring specialized knowledge or complex reasoning. Evaluations should be used to gauge whether a Claude model is performing well enough for production. \n \n\n \n Evaluation metrics\n\nText\n Evaluation metrics\n\n\nSome success metrics to consider evaluating Claude’s performance on a classification task include:\nCriteriaDescriptionAccuracyThe model’s output exactly matches the golden answer or correctly classifies the input according to the task’s requirements. This is typically calculated as (Number of correct predictions) / (Overall number of predictions).F1 ScoreThe model’s output optimally balances precision and recall.ConsistencyThe model’s output is consistent with its predictions for similar inputs or follows a logical pattern.StructureThe model’s output follows the expected format or structure, making it easy to parse and interpret. 
For example, many classifiers are expected to output JSON format.SpeedThe model provides a response within the acceptable time limit or latency threshold for the task.Bias and FairnessIf classifying data about people, is it important that the model does not demonstrate any biases based on gender, ethnicity, or other characteristics that would lead to its misclassification.\n \n\nSummary: \n The documentation covers evaluation metrics for the Claude AI model, including accuracy, F1 score, consistency, structure, speed, and bias/fairness. These metrics can be used to assess the model's performance on classification tasks, ensuring it meets the required standards for output quality, consistency, and fairness. \n \n\n \n Deploy your classifier\n\nText\n Deploy your classifier\n\n\nTo see code examples of how to use Claude for classification, check out the Classification Guide in the Claude Cookbook.\nOverviewTicket Routingxlinkedin\nOverviewTicket Routing\nxlinkedin\nWhen to use Claude for classification Establish your classification use case Implement Claude for classification 1. Build a strong input prompt 2. Develop your test cases 3. Run your eval Evaluation metrics Deploy your classifier\nWhen to use Claude for classificationEstablish your classification use caseImplement Claude for classification1. Build a strong input prompt2. Develop your test cases3. Run your evalEvaluation metricsDeploy your classifier\n \n\nSummary: \n Deploy your classifier: Check out the Classification Guide in the Claude Cookbook for code examples on using Claude for classification. The guide covers when to use Claude for classification, establishing your use case, implementing Claude, building prompts, developing test cases, running evaluations, and deploying your classifier. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. 
\n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n What two steps are needed before running a classification evaluation on Claude according to the documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Implement Claude for classification\n\nText\n Implement Claude for classification\n\n\nThe three key model decision factors are: intelligence, latency, and price.\nFor classification, a smaller model like Claude 3 Haiku is typically ideal due to its speed and efficiency. Though, for classification tasks where specialized knowledge or complex reasoning is required, Sonnet or Opus may be a better choice. Learn more about how Opus, Sonnet, and Haiku compare here.\nUse evaluations to gauge whether a Claude model is performing well enough to launch into production.\n \n\nSummary: \n For classification tasks, the smaller Claude 3 Haiku model is typically ideal due to its speed and efficiency, though Sonnet or Opus may be better for tasks requiring specialized knowledge or complex reasoning. Evaluations should be used to gauge whether a Claude model is performing well enough for production. \n \n\n \n Evaluation metrics\n\nText\n Evaluation metrics\n\n\nSome success metrics to consider evaluating Claude’s performance on a classification task include:\nCriteriaDescriptionAccuracyThe model’s output exactly matches the golden answer or correctly classifies the input according to the task’s requirements. This is typically calculated as (Number of correct predictions) / (Overall number of predictions).F1 ScoreThe model’s output optimally balances precision and recall.ConsistencyThe model’s output is consistent with its predictions for similar inputs or follows a logical pattern.StructureThe model’s output follows the expected format or structure, making it easy to parse and interpret. 
For example, many classifiers are expected to output JSON format.SpeedThe model provides a response within the acceptable time limit or latency threshold for the task.Bias and FairnessIf classifying data about people, is it important that the model does not demonstrate any biases based on gender, ethnicity, or other characteristics that would lead to its misclassification.\n \n\nSummary: \n The documentation covers evaluation metrics for the Claude AI model, including accuracy, F1 score, consistency, structure, speed, and bias/fairness. These metrics can be used to assess the model's performance on classification tasks, ensuring it meets the required standards for output quality, consistency, and fairness. \n \n\n \n Deploy your classifier\n\nText\n Deploy your classifier\n\n\nTo see code examples of how to use Claude for classification, check out the Classification Guide in the Claude Cookbooks.\nOverviewTicket Routingxlinkedin\nOverviewTicket Routing\nxlinkedin\nWhen to use Claude for classification Establish your classification use case Implement Claude for classification 1. Build a strong input prompt 2. Develop your test cases 3. Run your eval Evaluation metrics Deploy your classifier\nWhen to use Claude for classificationEstablish your classification use caseImplement Claude for classification1. Build a strong input prompt2. Develop your test cases3. Run your evalEvaluation metricsDeploy your classifier\n \n\nSummary: \n Deploy your classifier: Check out the Classification Guide in the Claude Cookbooks for code examples on using Claude for classification. The guide covers when to use Claude for classification, establishing your use case, implementing Claude, building prompts, developing test cases, running evaluations, and deploying your classifier. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. 
\n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_level_two"
},
"vars": {
@@ -19832,7 +19832,7 @@
"label": "Haiku: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n What two steps are needed before running a classification evaluation on Claude according to the documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Building evals and test cases\n\nBuilding evals and test cases\n\n\n \n \n\n \n 2. Develop your test cases\n\n2. Develop your test cases\n\n\nTo run your classification evaluation, you will need test cases to run it on. Take a look at our guide to developing test cases.\n \n \n\n \n Deploy your classifier\n\nDeploy your classifier\n\n\nTo see code examples of how to use Claude for classification, check out the Classification Guide in the Claude Cookbook.\nOverviewTicket Routingxlinkedin\nOverviewTicket Routing\nxlinkedin\nWhen to use Claude for classification Establish your classification use case Implement Claude for classification 1. Build a strong input prompt 2. Develop your test cases 3. Run your eval Evaluation metrics Deploy your classifier\nWhen to use Claude for classificationEstablish your classification use caseImplement Claude for classification1. Build a strong input prompt2. Develop your test cases3. Run your evalEvaluation metricsDeploy your classifier\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n What two steps are needed before running a classification evaluation on Claude according to the documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Building evals and test cases\n\nBuilding evals and test cases\n\n\n \n \n\n \n 2. Develop your test cases\n\n2. Develop your test cases\n\n\nTo run your classification evaluation, you will need test cases to run it on. Take a look at our guide to developing test cases.\n \n \n\n \n Deploy your classifier\n\nDeploy your classifier\n\n\nTo see code examples of how to use Claude for classification, check out the Classification Guide in the Claude Cookbooks.\nOverviewTicket Routingxlinkedin\nOverviewTicket Routing\nxlinkedin\nWhen to use Claude for classification Establish your classification use case Implement Claude for classification 1. Build a strong input prompt 2. Develop your test cases 3. Run your eval Evaluation metrics Deploy your classifier\nWhen to use Claude for classificationEstablish your classification use caseImplement Claude for classification1. Build a strong input prompt2. Develop your test cases3. Run your evalEvaluation metricsDeploy your classifier\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_level_three"
},
"vars": {
@@ -19884,7 +19884,7 @@
"label": "3.5 Sonnet: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n What two steps are needed before running a classification evaluation on Claude according to the documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Building evals and test cases\n\nBuilding evals and test cases\n\n\n \n \n\n \n 2. Develop your test cases\n\n2. Develop your test cases\n\n\nTo run your classification evaluation, you will need test cases to run it on. Take a look at our guide to developing test cases.\n \n \n\n \n Deploy your classifier\n\nDeploy your classifier\n\n\nTo see code examples of how to use Claude for classification, check out the Classification Guide in the Claude Cookbook.\nOverviewTicket Routingxlinkedin\nOverviewTicket Routing\nxlinkedin\nWhen to use Claude for classification Establish your classification use case Implement Claude for classification 1. Build a strong input prompt 2. Develop your test cases 3. Run your eval Evaluation metrics Deploy your classifier\nWhen to use Claude for classificationEstablish your classification use caseImplement Claude for classification1. Build a strong input prompt2. Develop your test cases3. Run your evalEvaluation metricsDeploy your classifier\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n What two steps are needed before running a classification evaluation on Claude according to the documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Building evals and test cases\n\nBuilding evals and test cases\n\n\n \n \n\n \n 2. Develop your test cases\n\n2. Develop your test cases\n\n\nTo run your classification evaluation, you will need test cases to run it on. Take a look at our guide to developing test cases.\n \n \n\n \n Deploy your classifier\n\nDeploy your classifier\n\n\nTo see code examples of how to use Claude for classification, check out the Classification Guide in the Claude Cookbooks.\nOverviewTicket Routingxlinkedin\nOverviewTicket Routing\nxlinkedin\nWhen to use Claude for classification Establish your classification use case Implement Claude for classification 1. Build a strong input prompt 2. Develop your test cases 3. Run your eval Evaluation metrics Deploy your classifier\nWhen to use Claude for classificationEstablish your classification use caseImplement Claude for classification1. Build a strong input prompt2. Develop your test cases3. Run your evalEvaluation metricsDeploy your classifier\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_level_three"
},
"vars": {
@@ -22498,15 +22498,15 @@
"label": "Haiku: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\n\nNext steps\n\n\nNow that you have made your first Claude API request, it’s time to explore what else is possible:\nPrompt Engineering GuideOptimize Claude’s performance through prompting.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.Prompt LibraryExplore dozens of example prompts for inspiration across use cases.\nPrompt Engineering GuideOptimize Claude’s performance through prompting.\n\nPrompt Engineering Guide\nOptimize Claude’s performance through prompting.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn 
with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nPrompt LibraryExplore dozens of example prompts for inspiration across use cases.\n\nPrompt Library\nExplore dozens of example prompts for inspiration across use cases.\nOverviewIntro to Claudexlinkedin\nOverviewIntro to Claude\nxlinkedin\nPrerequisites Start with the Workbench Install the SDK Set your API key Call the API Next steps\nPrerequisitesStart with the WorkbenchInstall the SDKSet your API keyCall the APINext steps\n\n\nClaude Cookbook\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. 
\n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways the Claude Cookbooks can help developers learn to use Anthropic's APIs?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\n\nNext steps\n\n\nNow that you have made your first Claude API request, it’s time to explore what else is possible:\nPrompt Engineering GuideOptimize Claude’s performance through prompting.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.Prompt LibraryExplore dozens of example prompts for inspiration across use cases.\nPrompt Engineering GuideOptimize Claude’s performance through prompting.\n\nPrompt Engineering Guide\nOptimize Claude’s performance through prompting.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude 
Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nPrompt LibraryExplore dozens of example prompts for inspiration across use cases.\n\nPrompt Library\nExplore dozens of example prompts for inspiration across use cases.\nOverviewIntro to Claudexlinkedin\nOverviewIntro to Claude\nxlinkedin\nPrerequisites Start with the Workbench Install the SDK Set your API key Call the API Next steps\nPrerequisitesStart with the WorkbenchInstall the SDKSet your API keyCall the APINext steps\n\n\nClaude Cookbooks\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer 
already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_base"
},
"vars": {
- "query": "What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?",
- "correct_answer": "The Claude Cookbook provides interactive Jupyter notebooks that demonstrate how to upload PDFs and work with embeddings to help developers learn to use Anthropic's APIs."
+ "query": "What are two ways the Claude Cookbooks can help developers learn to use Anthropic's APIs?",
+ "correct_answer": "The Claude Cookbooks provides interactive Jupyter notebooks that demonstrate how to upload PDFs and work with embeddings to help developers learn to use Anthropic's APIs."
},
"response": {
- "output": "The Claude Cookbook can help developers learn to use Anthropic's APIs in two ways:\n\n1. Interactive Jupyter notebooks that demonstrate uploading PDFs, creating embeddings, and more.\n2. Hands-on tutorials and practical examples that extend Claude's capabilities by integrating external tools and functions into workflows.",
+ "output": "The Claude Cookbooks can help developers learn to use Anthropic's APIs in two ways:\n\n1. Interactive Jupyter notebooks that demonstrate uploading PDFs, creating embeddings, and more.\n2. Hands-on tutorials and practical examples that extend Claude's capabilities by integrating external tools and functions into workflows.",
"tokenUsage": {}
},
"success": false,
@@ -22595,15 +22595,15 @@
"label": "Haiku: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Next steps\n\nText\n Next steps\n\n\nNow that you have made your first Claude API request, it’s time to explore what else is possible:\nPrompt Engineering GuideOptimize Claude’s performance through prompting.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.Prompt LibraryExplore dozens of example prompts for inspiration across use cases.\nPrompt Engineering GuideOptimize Claude’s performance through prompting.\n\nPrompt Engineering Guide\nOptimize Claude’s performance through prompting.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nPrompt LibraryExplore dozens of example prompts for inspiration across use cases.\n\nPrompt Library\nExplore dozens of example prompts for inspiration across use cases.\nOverviewIntro to Claudexlinkedin\nOverviewIntro to Claude\nxlinkedin\nPrerequisites Start with the Workbench Install the SDK Set your API key Call the API Next steps\nPrerequisitesStart with the WorkbenchInstall the SDKSet your API keyCall the APINext steps\n \n\nSummary: \n The summary covers the next steps after making an initial Claude API request, including exploring the Prompt Engineering Guide to optimize Claude's performance, the Claude Cookbook for interactive Jupyter notebooks, and the Prompt Library for example prompts across use cases. It also mentions the overview and prerequisites for working with the Anthropic platform. \n \n\n \n Accessing the API\n\nText\n Accessing the API\n\n\nThe API is made available via our web Console. 
You can use the Workbench to try out the API in the browser and then generate API keys in Account Settings.\n \n\nSummary: \n The API can be accessed through Anthropic's web Console. Users can use the Workbench to try out the API in the browser and then generate API keys in the Account Settings. \n \n\n \n Develop with Claude\n\nText\n Develop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n\nSummary: \n Anthropic provides a suite of developer tools, including a browser-based Workbench and prompt generator, API reference documentation, and interactive Jupyter notebooks, to help developers build scalable applications with the Claude AI model. These tools enable easier, more powerful prompting, exploration and implementation of the Claude API and SDKs, and learning through interactive demonstrations. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. 
\n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways the Claude Cookbooks can help developers learn to use Anthropic's APIs?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Next steps\n\nText\n Next steps\n\n\nNow that you have made your first Claude API request, it’s time to explore what else is possible:\nPrompt Engineering GuideOptimize Claude’s performance through prompting.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.Prompt LibraryExplore dozens of example prompts for inspiration across use cases.\nPrompt Engineering GuideOptimize Claude’s performance through prompting.\n\nPrompt Engineering Guide\nOptimize Claude’s performance through prompting.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nPrompt LibraryExplore dozens of example prompts for inspiration across use cases.\n\nPrompt Library\nExplore dozens of example prompts for inspiration across use cases.\nOverviewIntro to Claudexlinkedin\nOverviewIntro to Claude\nxlinkedin\nPrerequisites Start with the Workbench Install the SDK Set your API key Call the API Next steps\nPrerequisitesStart with the WorkbenchInstall the SDKSet your API keyCall the APINext steps\n \n\nSummary: \n The summary covers the next steps after making an initial Claude API request, including exploring the Prompt Engineering Guide to optimize Claude's performance, the Claude Cookbooks for interactive Jupyter notebooks, and the Prompt Library for example prompts across use cases. It also mentions the overview and prerequisites for working with the Anthropic platform. 
\n \n\n \n Accessing the API\n\nText\n Accessing the API\n\n\nThe API is made available via our web Console. You can use the Workbench to try out the API in the browser and then generate API keys in Account Settings.\n \n\nSummary: \n The API can be accessed through Anthropic's web Console. Users can use the Workbench to try out the API in the browser and then generate API keys in the Account Settings. \n \n\n \n Develop with Claude\n\nText\n Develop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n\nSummary: \n Anthropic provides a suite of developer tools, including a browser-based Workbench and prompt generator, API reference documentation, and interactive Jupyter notebooks, to help developers build scalable applications with the Claude AI model. These tools enable easier, more powerful prompting, exploration and implementation of the Claude API and SDKs, and learning through interactive demonstrations. 
\n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_level_two"
},
"vars": {
- "query": "What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?",
- "correct_answer": "The Claude Cookbook provides interactive Jupyter notebooks that demonstrate how to upload PDFs and work with embeddings to help developers learn to use Anthropic's APIs."
+ "query": "What are two ways the Claude Cookbooks can help developers learn to use Anthropic's APIs?",
+ "correct_answer": "The Claude Cookbooks provides interactive Jupyter notebooks that demonstrate how to upload PDFs and work with embeddings to help developers learn to use Anthropic's APIs."
},
"response": {
- "output": "The Claude Cookbook can help developers learn to use Anthropic's APIs in two ways:\n\n1. Interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more. The Claude Cookbook provides these interactive notebooks to help developers learn through hands-on examples.\n\n2. The Claude Cookbook is part of Anthropic's suite of developer tools, which also includes the Developer Console with the Workbench and prompt generator tool, as well as the API Reference documentation. These tools enable easier, more powerful prompting and exploration of the Claude API and SDKs.",
+ "output": "The Claude Cookbooks can help developers learn to use Anthropic's APIs in two ways:\n\n1. Interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more. The Claude Cookbooks provides these interactive notebooks to help developers learn through hands-on examples.\n\n2. The Claude Cookbooks is part of Anthropic's suite of developer tools, which also includes the Developer Console with the Workbench and prompt generator tool, as well as the API Reference documentation. These tools enable easier, more powerful prompting and exploration of the Claude API and SDKs.",
"tokenUsage": {
"total": 1132,
"prompt": 1002,
@@ -22698,15 +22698,15 @@
"label": "3.5 Sonnet: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Next steps\n\nText\n Next steps\n\n\nNow that you have made your first Claude API request, it’s time to explore what else is possible:\nPrompt Engineering GuideOptimize Claude’s performance through prompting.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.Prompt LibraryExplore dozens of example prompts for inspiration across use cases.\nPrompt Engineering GuideOptimize Claude’s performance through prompting.\n\nPrompt Engineering Guide\nOptimize Claude’s performance through prompting.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nPrompt LibraryExplore dozens of example prompts for inspiration across use cases.\n\nPrompt Library\nExplore dozens of example prompts for inspiration across use cases.\nOverviewIntro to Claudexlinkedin\nOverviewIntro to Claude\nxlinkedin\nPrerequisites Start with the Workbench Install the SDK Set your API key Call the API Next steps\nPrerequisitesStart with the WorkbenchInstall the SDKSet your API keyCall the APINext steps\n \n\nSummary: \n The summary covers the next steps after making an initial Claude API request, including exploring the Prompt Engineering Guide to optimize Claude's performance, the Claude Cookbook for interactive Jupyter notebooks, and the Prompt Library for example prompts across use cases. It also mentions the overview and prerequisites for working with the Anthropic platform. \n \n\n \n Accessing the API\n\nText\n Accessing the API\n\n\nThe API is made available via our web Console. 
You can use the Workbench to try out the API in the browser and then generate API keys in Account Settings.\n \n\nSummary: \n The API can be accessed through Anthropic's web Console. Users can use the Workbench to try out the API in the browser and then generate API keys in the Account Settings. \n \n\n \n Develop with Claude\n\nText\n Develop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n\nSummary: \n Anthropic provides a suite of developer tools, including a browser-based Workbench and prompt generator, API reference documentation, and interactive Jupyter notebooks, to help developers build scalable applications with the Claude AI model. These tools enable easier, more powerful prompting, exploration and implementation of the Claude API and SDKs, and learning through interactive demonstrations. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. 
\n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways the Claude Cookbooks can help developers learn to use Anthropic's APIs?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Next steps\n\nText\n Next steps\n\n\nNow that you have made your first Claude API request, it’s time to explore what else is possible:\nPrompt Engineering GuideOptimize Claude’s performance through prompting.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.Prompt LibraryExplore dozens of example prompts for inspiration across use cases.\nPrompt Engineering GuideOptimize Claude’s performance through prompting.\n\nPrompt Engineering Guide\nOptimize Claude’s performance through prompting.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nPrompt LibraryExplore dozens of example prompts for inspiration across use cases.\n\nPrompt Library\nExplore dozens of example prompts for inspiration across use cases.\nOverviewIntro to Claudexlinkedin\nOverviewIntro to Claude\nxlinkedin\nPrerequisites Start with the Workbench Install the SDK Set your API key Call the API Next steps\nPrerequisitesStart with the WorkbenchInstall the SDKSet your API keyCall the APINext steps\n \n\nSummary: \n The summary covers the next steps after making an initial Claude API request, including exploring the Prompt Engineering Guide to optimize Claude's performance, the Claude Cookbooks for interactive Jupyter notebooks, and the Prompt Library for example prompts across use cases. It also mentions the overview and prerequisites for working with the Anthropic platform. 
\n \n\n \n Accessing the API\n\nText\n Accessing the API\n\n\nThe API is made available via our web Console. You can use the Workbench to try out the API in the browser and then generate API keys in Account Settings.\n \n\nSummary: \n The API can be accessed through Anthropic's web Console. Users can use the Workbench to try out the API in the browser and then generate API keys in the Account Settings. \n \n\n \n Develop with Claude\n\nText\n Develop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n\nSummary: \n Anthropic provides a suite of developer tools, including a browser-based Workbench and prompt generator, API reference documentation, and interactive Jupyter notebooks, to help developers build scalable applications with the Claude AI model. These tools enable easier, more powerful prompting, exploration and implementation of the Claude API and SDKs, and learning through interactive demonstrations. 
\n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_level_two"
},
"vars": {
- "query": "What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?",
- "correct_answer": "The Claude Cookbook provides interactive Jupyter notebooks that demonstrate how to upload PDFs and work with embeddings to help developers learn to use Anthropic's APIs."
+ "query": "What are two ways the Claude Cookbooks can help developers learn to use Anthropic's APIs?",
+ "correct_answer": "The Claude Cookbooks provides interactive Jupyter notebooks that demonstrate how to upload PDFs and work with embeddings to help developers learn to use Anthropic's APIs."
},
"response": {
- "output": "The Claude Cookbook can help developers learn to use Anthropic's APIs in two key ways:\n\n1. It provides interactive Jupyter notebooks that demonstrate various API functionalities, including uploading PDFs and working with embeddings.\n\n2. It offers hands-on learning experiences through these interactive notebooks, allowing developers to explore and experiment with the APIs in a guided environment.\n\nThese features enable developers to gain practical experience and understanding of Anthropic's APIs through interactive, example-based learning.",
+ "output": "The Claude Cookbooks can help developers learn to use Anthropic's APIs in two key ways:\n\n1. It provides interactive Jupyter notebooks that demonstrate various API functionalities, including uploading PDFs and working with embeddings.\n\n2. It offers hands-on learning experiences through these interactive notebooks, allowing developers to explore and experiment with the APIs in a guided environment.\n\nThese features enable developers to gain practical experience and understanding of Anthropic's APIs through interactive, example-based learning.",
"tokenUsage": {
"total": 1107,
"prompt": 1002,
@@ -22733,7 +22733,7 @@
{
"pass": true,
"score": 1,
- "reason": "The Generated Answer is correct based on the Correct Answer. Both answers highlight that the Claude Cookbook provides interactive Jupyter notebooks as a way to help developers learn to use Anthropic's APIs. They both specifically mention that these notebooks demonstrate how to upload PDFs and work with embeddings. The Generated Answer expands on this by emphasizing the hands-on learning experience, but this is consistent with and elaborates on the core information provided in the Correct Answer. There are no contradictions or critical missing pieces of information between the two answers.",
+ "reason": "The Generated Answer is correct based on the Correct Answer. Both answers highlight that the Claude Cookbooks provides interactive Jupyter notebooks as a way to help developers learn to use Anthropic's APIs. They both specifically mention that these notebooks demonstrate how to upload PDFs and work with embeddings. The Generated Answer expands on this by emphasizing the hands-on learning experience, but this is consistent with and elaborates on the core information provided in the Correct Answer. There are no contradictions or critical missing pieces of information between the two answers.",
"assertion": {
"type": "python",
"value": "file://eval_end_to_end.py"
@@ -22749,15 +22749,15 @@
"label": "3.5 Sonnet: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\n\nNext steps\n\n\nNow that you have made your first Claude API request, it’s time to explore what else is possible:\nPrompt Engineering GuideOptimize Claude’s performance through prompting.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.Prompt LibraryExplore dozens of example prompts for inspiration across use cases.\nPrompt Engineering GuideOptimize Claude’s performance through prompting.\n\nPrompt Engineering Guide\nOptimize Claude’s performance through prompting.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn 
with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nPrompt LibraryExplore dozens of example prompts for inspiration across use cases.\n\nPrompt Library\nExplore dozens of example prompts for inspiration across use cases.\nOverviewIntro to Claudexlinkedin\nOverviewIntro to Claude\nxlinkedin\nPrerequisites Start with the Workbench Install the SDK Set your API key Call the API Next steps\nPrerequisitesStart with the WorkbenchInstall the SDKSet your API keyCall the APINext steps\n\n\nClaude Cookbook\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. 
\n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways the Claude Cookbooks can help developers learn to use Anthropic's APIs?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\n\nNext steps\n\n\nNow that you have made your first Claude API request, it’s time to explore what else is possible:\nPrompt Engineering GuideOptimize Claude’s performance through prompting.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.Prompt LibraryExplore dozens of example prompts for inspiration across use cases.\nPrompt Engineering GuideOptimize Claude’s performance through prompting.\n\nPrompt Engineering Guide\nOptimize Claude’s performance through prompting.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude 
Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nPrompt LibraryExplore dozens of example prompts for inspiration across use cases.\n\nPrompt Library\nExplore dozens of example prompts for inspiration across use cases.\nOverviewIntro to Claudexlinkedin\nOverviewIntro to Claude\nxlinkedin\nPrerequisites Start with the Workbench Install the SDK Set your API key Call the API Next steps\nPrerequisitesStart with the WorkbenchInstall the SDKSet your API keyCall the APINext steps\n\n\nClaude Cookbooks\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer 
already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_base"
},
"vars": {
- "query": "What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?",
- "correct_answer": "The Claude Cookbook provides interactive Jupyter notebooks that demonstrate how to upload PDFs and work with embeddings to help developers learn to use Anthropic's APIs."
+ "query": "What are two ways the Claude Cookbooks can help developers learn to use Anthropic's APIs?",
+ "correct_answer": "The Claude Cookbooks provides interactive Jupyter notebooks that demonstrate how to upload PDFs and work with embeddings to help developers learn to use Anthropic's APIs."
},
"response": {
- "output": "The Claude Cookbook can help developers learn to use Anthropic's APIs in two key ways:\n\n1. It provides interactive Jupyter notebooks that demonstrate practical examples, such as uploading PDFs and working with embeddings.\n\n2. It offers hands-on tutorials for specific tasks like PDF summarization and integrating external tools and functions into workflows with Claude.\n\nThese interactive and practical resources allow developers to gain hands-on experience with Anthropic's APIs through guided examples and tutorials.",
+ "output": "The Claude Cookbooks can help developers learn to use Anthropic's APIs in two key ways:\n\n1. It provides interactive Jupyter notebooks that demonstrate practical examples, such as uploading PDFs and working with embeddings.\n\n2. It offers hands-on tutorials for specific tasks like PDF summarization and integrating external tools and functions into workflows with Claude.\n\nThese interactive and practical resources allow developers to gain hands-on experience with Anthropic's APIs through guided examples and tutorials.",
"tokenUsage": {
"total": 1090,
"prompt": 986,
@@ -22784,7 +22784,7 @@
{
"pass": true,
"score": 1,
- "reason": "The Generated Answer is correct based on the Correct Answer. Both answers highlight that the Claude Cookbook provides interactive Jupyter notebooks to help developers learn to use Anthropic's APIs. They both specifically mention working with PDFs and embeddings as examples. While the Generated Answer provides an additional point about tutorials for specific tasks, this doesn't contradict the Correct Answer and can be seen as an elaboration on the same core idea of providing practical, interactive learning resources. The substance of both answers is essentially the same, focusing on the Cookbook's role in providing hands-on, interactive learning experiences for developers working with Anthropic's APIs.",
+ "reason": "The Generated Answer is correct based on the Correct Answer. Both answers highlight that the Claude Cookbooks provides interactive Jupyter notebooks to help developers learn to use Anthropic's APIs. They both specifically mention working with PDFs and embeddings as examples. While the Generated Answer provides an additional point about tutorials for specific tasks, this doesn't contradict the Correct Answer and can be seen as an elaboration on the same core idea of providing practical, interactive learning resources. The substance of both answers is essentially the same, focusing on the Cookbook's role in providing hands-on, interactive learning experiences for developers working with Anthropic's APIs.",
"assertion": {
"type": "python",
"value": "file://eval_end_to_end.py"
@@ -22845,15 +22845,15 @@
"label": "Haiku: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Claude Cookbook\n\nClaude Cookbook\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n \n \n\n \n Next steps\n\nNext steps\n\n\nNow that you have made your first Claude API request, it’s time to explore what else is possible:\nPrompt Engineering GuideOptimize Claude’s performance through prompting.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.Prompt LibraryExplore dozens of example 
prompts for inspiration across use cases.\nPrompt Engineering GuideOptimize Claude’s performance through prompting.\n\nPrompt Engineering Guide\nOptimize Claude’s performance through prompting.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nPrompt LibraryExplore dozens of example prompts for inspiration across use cases.\n\nPrompt Library\nExplore dozens of example prompts for inspiration across use cases.\nOverviewIntro to Claudexlinkedin\nOverviewIntro to Claude\nxlinkedin\nPrerequisites Start with the Workbench Install the SDK Set your API key Call the API Next steps\nPrerequisitesStart with the WorkbenchInstall the SDKSet your API keyCall the APINext steps\n \n \n\n \n Develop with Claude\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n \n\n \n Please remain faithful to the underlying context, and only deviate 
from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways the Claude Cookbooks can help developers learn to use Anthropic's APIs?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Claude Cookbooks\n\nClaude Cookbooks\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n \n \n\n \n Next steps\n\nNext steps\n\n\nNow that you have made your first Claude API request, it’s time to explore what else is possible:\nPrompt Engineering GuideOptimize Claude’s performance through prompting.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.Prompt LibraryExplore dozens of 
example prompts for inspiration across use cases.\nPrompt Engineering GuideOptimize Claude’s performance through prompting.\n\nPrompt Engineering Guide\nOptimize Claude’s performance through prompting.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nPrompt LibraryExplore dozens of example prompts for inspiration across use cases.\n\nPrompt Library\nExplore dozens of example prompts for inspiration across use cases.\nOverviewIntro to Claudexlinkedin\nOverviewIntro to Claude\nxlinkedin\nPrerequisites Start with the Workbench Install the SDK Set your API key Call the API Next steps\nPrerequisitesStart with the WorkbenchInstall the SDKSet your API keyCall the APINext steps\n \n \n\n \n Develop with Claude\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n \n\n \n Please remain faithful to the underlying context, and 
only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_level_three"
},
"vars": {
- "query": "What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?",
- "correct_answer": "The Claude Cookbook provides interactive Jupyter notebooks that demonstrate how to upload PDFs and work with embeddings to help developers learn to use Anthropic's APIs."
+ "query": "What are two ways the Claude Cookbooks can help developers learn to use Anthropic's APIs?",
+ "correct_answer": "The Claude Cookbooks provides interactive Jupyter notebooks that demonstrate how to upload PDFs and work with embeddings to help developers learn to use Anthropic's APIs."
},
"response": {
- "output": "The Claude Cookbook can help developers learn to use Anthropic's APIs in two ways:\n\n1. Providing interactive Jupyter notebooks that demonstrate how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\n2. Showing how to extend Claude's capabilities by integrating external tools and functions into workflows, as well as how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.",
+ "output": "The Claude Cookbooks can help developers learn to use Anthropic's APIs in two ways:\n\n1. Providing interactive Jupyter notebooks that demonstrate how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\n2. Showing how to extend Claude's capabilities by integrating external tools and functions into workflows, as well as how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.",
"tokenUsage": {
"total": 1144,
"prompt": 1045,
@@ -22897,15 +22897,15 @@
"label": "3.5 Sonnet: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Claude Cookbook\n\nClaude Cookbook\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n \n \n\n \n Next steps\n\nNext steps\n\n\nNow that you have made your first Claude API request, it’s time to explore what else is possible:\nPrompt Engineering GuideOptimize Claude’s performance through prompting.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.Prompt LibraryExplore dozens of example 
prompts for inspiration across use cases.\nPrompt Engineering GuideOptimize Claude’s performance through prompting.\n\nPrompt Engineering Guide\nOptimize Claude’s performance through prompting.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nPrompt LibraryExplore dozens of example prompts for inspiration across use cases.\n\nPrompt Library\nExplore dozens of example prompts for inspiration across use cases.\nOverviewIntro to Claudexlinkedin\nOverviewIntro to Claude\nxlinkedin\nPrerequisites Start with the Workbench Install the SDK Set your API key Call the API Next steps\nPrerequisitesStart with the WorkbenchInstall the SDKSet your API keyCall the APINext steps\n \n \n\n \n Develop with Claude\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n \n\n \n Please remain faithful to the underlying context, and only deviate 
from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways the Claude Cookbooks can help developers learn to use Anthropic's APIs?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Claude Cookbooks\n\nClaude Cookbooks\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n \n \n\n \n Next steps\n\nNext steps\n\n\nNow that you have made your first Claude API request, it’s time to explore what else is possible:\nPrompt Engineering GuideOptimize Claude’s performance through prompting.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.Prompt LibraryExplore dozens of 
example prompts for inspiration across use cases.\nPrompt Engineering GuideOptimize Claude’s performance through prompting.\n\nPrompt Engineering Guide\nOptimize Claude’s performance through prompting.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nPrompt LibraryExplore dozens of example prompts for inspiration across use cases.\n\nPrompt Library\nExplore dozens of example prompts for inspiration across use cases.\nOverviewIntro to Claudexlinkedin\nOverviewIntro to Claude\nxlinkedin\nPrerequisites Start with the Workbench Install the SDK Set your API key Call the API Next steps\nPrerequisitesStart with the WorkbenchInstall the SDKSet your API keyCall the APINext steps\n \n \n\n \n Develop with Claude\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n \n\n \n Please remain faithful to the underlying context, and 
only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_level_three"
},
"vars": {
- "query": "What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?",
- "correct_answer": "The Claude Cookbook provides interactive Jupyter notebooks that demonstrate how to upload PDFs and work with embeddings to help developers learn to use Anthropic's APIs."
+ "query": "What are two ways the Claude Cookbooks can help developers learn to use Anthropic's APIs?",
+ "correct_answer": "The Claude Cookbooks provides interactive Jupyter notebooks that demonstrate how to upload PDFs and work with embeddings to help developers learn to use Anthropic's APIs."
},
"response": {
- "output": "The Claude Cookbook can help developers learn to use Anthropic's APIs in two main ways:\n\n1. It provides interactive Jupyter notebooks that demonstrate practical examples and hands-on tutorials. These notebooks cover topics like uploading PDFs and having Claude summarize their content, making it easier to work with long documents.\n\n2. It offers tutorials on extending Claude's capabilities by integrating external tools and functions into workflows. This allows developers to learn about tool use and function calling with Anthropic's APIs.",
+ "output": "The Claude Cookbooks can help developers learn to use Anthropic's APIs in two main ways:\n\n1. It provides interactive Jupyter notebooks that demonstrate practical examples and hands-on tutorials. These notebooks cover topics like uploading PDFs and having Claude summarize their content, making it easier to work with long documents.\n\n2. It offers tutorials on extending Claude's capabilities by integrating external tools and functions into workflows. This allows developers to learn about tool use and function calling with Anthropic's APIs.",
"tokenUsage": {
"total": 1152,
"prompt": 1045,
@@ -22918,11 +22918,11 @@
"namedScores": {},
"latencyMs": 2778,
"cost": 0.00474,
- "error": "The Generated Answer is partially correct but misses a key element from the Correct Answer. Both answers agree that the Claude Cookbook provides interactive Jupyter notebooks to help developers learn to use Anthropic's APIs. However, the Generated Answer does not mention working with embeddings, which is a specific capability highlighted in the Correct Answer. Instead, it introduces information about extending Claude's capabilities and tool use, which is not mentioned in the Correct Answer. While the Generated Answer provides useful information, it doesn't fully align with the specific points outlined in the Correct Answer.",
+ "error": "The Generated Answer is partially correct but misses a key element from the Correct Answer. Both answers agree that the Claude Cookbooks provides interactive Jupyter notebooks to help developers learn to use Anthropic's APIs. However, the Generated Answer does not mention working with embeddings, which is a specific capability highlighted in the Correct Answer. Instead, it introduces information about extending Claude's capabilities and tool use, which is not mentioned in the Correct Answer. While the Generated Answer provides useful information, it doesn't fully align with the specific points outlined in the Correct Answer.",
"gradingResult": {
"pass": false,
"score": 0,
- "reason": "The Generated Answer is partially correct but misses a key element from the Correct Answer. Both answers agree that the Claude Cookbook provides interactive Jupyter notebooks to help developers learn to use Anthropic's APIs. However, the Generated Answer does not mention working with embeddings, which is a specific capability highlighted in the Correct Answer. Instead, it introduces information about extending Claude's capabilities and tool use, which is not mentioned in the Correct Answer. While the Generated Answer provides useful information, it doesn't fully align with the specific points outlined in the Correct Answer.",
+ "reason": "The Generated Answer is partially correct but misses a key element from the Correct Answer. Both answers agree that the Claude Cookbooks provides interactive Jupyter notebooks to help developers learn to use Anthropic's APIs. However, the Generated Answer does not mention working with embeddings, which is a specific capability highlighted in the Correct Answer. Instead, it introduces information about extending Claude's capabilities and tool use, which is not mentioned in the Correct Answer. While the Generated Answer provides useful information, it doesn't fully align with the specific points outlined in the Correct Answer.",
"namedScores": {},
"tokensUsed": {
"total": 0,
@@ -22933,7 +22933,7 @@
{
"pass": false,
"score": 0,
- "reason": "The Generated Answer is partially correct but misses a key element from the Correct Answer. Both answers agree that the Claude Cookbook provides interactive Jupyter notebooks to help developers learn to use Anthropic's APIs. However, the Generated Answer does not mention working with embeddings, which is a specific capability highlighted in the Correct Answer. Instead, it introduces information about extending Claude's capabilities and tool use, which is not mentioned in the Correct Answer. While the Generated Answer provides useful information, it doesn't fully align with the specific points outlined in the Correct Answer.",
+ "reason": "The Generated Answer is partially correct but misses a key element from the Correct Answer. Both answers agree that the Claude Cookbooks provides interactive Jupyter notebooks to help developers learn to use Anthropic's APIs. However, the Generated Answer does not mention working with embeddings, which is a specific capability highlighted in the Correct Answer. Instead, it introduces information about extending Claude's capabilities and tool use, which is not mentioned in the Correct Answer. While the Generated Answer provides useful information, it doesn't fully align with the specific points outlined in the Correct Answer.",
"assertion": {
"type": "python",
"value": "file://eval_end_to_end.py"
@@ -23300,7 +23300,7 @@
"label": "Haiku: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n How can the Evaluation tool in Anthropic's Claude platform help improve prompts and build more robust AI applications?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Understanding Results\n\nText\n Understanding Results\n\n\nThe Evaluation tool helps you:\nIdentify edge cases where your prompt might falter\nRate individual results to determine cases where your prompt performance better or worse\nEnsure consistent performance across a range of inputs\nRefine your prompt for better reliability\nBy reviewing results across test cases, you can spot patterns and make informed adjustments to your prompt.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\n\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nStart evaluating your prompts today to build more robust AI applications with Claude!\nReducing latencyGlossaryxlinkedin\nReducing latencyGlossary\nxlinkedin\nAccessing the Evaluate Feature Creating Test Cases Tips for Effective Evaluation Understanding Results\nAccessing the Evaluate FeatureCreating Test CasesTips for Effective EvaluationUnderstanding Results\n \n\nSummary: \n The Evaluation tool helps users identify edge cases, rate individual results, ensure consistent performance, and refine prompts for better reliability. 
By reviewing results across test cases, users can spot patterns and make informed adjustments to their prompts. The Evaluation tool is currently in beta, and user feedback is valuable for the Anthropic team. \n \n\n \n Prompt engineering tutorial\n\nText\n Prompt engineering tutorial\n\n\nIf you’re an interactive learner, you can dive into our interactive tutorials instead!\nGitHub prompting tutorialAn example-filled tutorial that covers the prompt engineering concepts found in our docs.Google Sheets prompting tutorialA lighter weight version of our prompt engineering tutorial via an interactive spreadsheet.\nGitHub prompting tutorialAn example-filled tutorial that covers the prompt engineering concepts found in our docs.\n\nGitHub prompting tutorial\nAn example-filled tutorial that covers the prompt engineering concepts found in our docs.\nGoogle Sheets prompting tutorialA lighter weight version of our prompt engineering tutorial via an interactive spreadsheet.\n\nGoogle Sheets prompting tutorial\nA lighter weight version of our prompt engineering tutorial via an interactive spreadsheet.\nDevelop test casesPrompt generatorxlinkedin\nDevelop test casesPrompt generator\nxlinkedin\nBefore prompt engineering When to prompt engineer How to prompt engineer Prompt engineering tutorial\nBefore prompt engineeringWhen to prompt engineerHow to prompt engineerPrompt engineering tutorial\n \n\nSummary: \n Anthropic's documentation includes a prompt engineering tutorial, which is available in two formats: a GitHub-based tutorial with examples, and a lighter-weight version in a Google Sheets spreadsheet. These tutorials cover the concepts and techniques of prompt engineering for Anthropic's Claude AI model. \n \n\n \n More Resources\n\nText\n More Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. 
Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbook More Resources\nText capabilities and use casesClaude CookbookMore Resources\n \n\nSummary: \n The Claude Documentation provides a Prompt Engineering Guide to help users master the art of prompt crafting, a Prompt Library with pre-crafted prompts for various tasks, and API Documentation for interacting with the Claude AI model. These resources are designed to help users get the most out of the Claude model, particularly for fine-tuning with legacy models. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n How can the Evaluation tool in Anthropic's Claude platform help improve prompts and build more robust AI applications?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Understanding Results\n\nText\n Understanding Results\n\n\nThe Evaluation tool helps you:\nIdentify edge cases where your prompt might falter\nRate individual results to determine cases where your prompt performance better or worse\nEnsure consistent performance across a range of inputs\nRefine your prompt for better reliability\nBy reviewing results across test cases, you can spot patterns and make informed adjustments to your prompt.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\n\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nStart evaluating your prompts today to build more robust AI applications with Claude!\nReducing latencyGlossaryxlinkedin\nReducing latencyGlossary\nxlinkedin\nAccessing the Evaluate Feature Creating Test Cases Tips for Effective Evaluation Understanding Results\nAccessing the Evaluate FeatureCreating Test CasesTips for Effective EvaluationUnderstanding Results\n \n\nSummary: \n The Evaluation tool helps users identify edge cases, rate individual results, ensure consistent performance, and refine prompts for better reliability. 
By reviewing results across test cases, users can spot patterns and make informed adjustments to their prompts. The Evaluation tool is currently in beta, and user feedback is valuable for the Anthropic team. \n \n\n \n Prompt engineering tutorial\n\nText\n Prompt engineering tutorial\n\n\nIf you’re an interactive learner, you can dive into our interactive tutorials instead!\nGitHub prompting tutorialAn example-filled tutorial that covers the prompt engineering concepts found in our docs.Google Sheets prompting tutorialA lighter weight version of our prompt engineering tutorial via an interactive spreadsheet.\nGitHub prompting tutorialAn example-filled tutorial that covers the prompt engineering concepts found in our docs.\n\nGitHub prompting tutorial\nAn example-filled tutorial that covers the prompt engineering concepts found in our docs.\nGoogle Sheets prompting tutorialA lighter weight version of our prompt engineering tutorial via an interactive spreadsheet.\n\nGoogle Sheets prompting tutorial\nA lighter weight version of our prompt engineering tutorial via an interactive spreadsheet.\nDevelop test casesPrompt generatorxlinkedin\nDevelop test casesPrompt generator\nxlinkedin\nBefore prompt engineering When to prompt engineer How to prompt engineer Prompt engineering tutorial\nBefore prompt engineeringWhen to prompt engineerHow to prompt engineerPrompt engineering tutorial\n \n\nSummary: \n Anthropic's documentation includes a prompt engineering tutorial, which is available in two formats: a GitHub-based tutorial with examples, and a lighter-weight version in a Google Sheets spreadsheet. These tutorials cover the concepts and techniques of prompt engineering for Anthropic's Claude AI model. \n \n\n \n More Resources\n\nText\n More Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. 
Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbooks More Resources\nText capabilities and use casesClaude CookbooksMore Resources\n \n\nSummary: \n The Claude Documentation provides a Prompt Engineering Guide to help users master the art of prompt crafting, a Prompt Library with pre-crafted prompts for various tasks, and API Documentation for interacting with the Claude AI model. These resources are designed to help users get the most out of the Claude model, particularly for fine-tuning with legacy models. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_level_two"
},
"vars": {
@@ -23447,7 +23447,7 @@
"label": "3.5 Sonnet: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n How can the Evaluation tool in Anthropic's Claude platform help improve prompts and build more robust AI applications?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Understanding Results\n\nText\n Understanding Results\n\n\nThe Evaluation tool helps you:\nIdentify edge cases where your prompt might falter\nRate individual results to determine cases where your prompt performance better or worse\nEnsure consistent performance across a range of inputs\nRefine your prompt for better reliability\nBy reviewing results across test cases, you can spot patterns and make informed adjustments to your prompt.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\n\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nStart evaluating your prompts today to build more robust AI applications with Claude!\nReducing latencyGlossaryxlinkedin\nReducing latencyGlossary\nxlinkedin\nAccessing the Evaluate Feature Creating Test Cases Tips for Effective Evaluation Understanding Results\nAccessing the Evaluate FeatureCreating Test CasesTips for Effective EvaluationUnderstanding Results\n \n\nSummary: \n The Evaluation tool helps users identify edge cases, rate individual results, ensure consistent performance, and refine prompts for better reliability. 
By reviewing results across test cases, users can spot patterns and make informed adjustments to their prompts. The Evaluation tool is currently in beta, and user feedback is valuable for the Anthropic team. \n \n\n \n Prompt engineering tutorial\n\nText\n Prompt engineering tutorial\n\n\nIf you’re an interactive learner, you can dive into our interactive tutorials instead!\nGitHub prompting tutorialAn example-filled tutorial that covers the prompt engineering concepts found in our docs.Google Sheets prompting tutorialA lighter weight version of our prompt engineering tutorial via an interactive spreadsheet.\nGitHub prompting tutorialAn example-filled tutorial that covers the prompt engineering concepts found in our docs.\n\nGitHub prompting tutorial\nAn example-filled tutorial that covers the prompt engineering concepts found in our docs.\nGoogle Sheets prompting tutorialA lighter weight version of our prompt engineering tutorial via an interactive spreadsheet.\n\nGoogle Sheets prompting tutorial\nA lighter weight version of our prompt engineering tutorial via an interactive spreadsheet.\nDevelop test casesPrompt generatorxlinkedin\nDevelop test casesPrompt generator\nxlinkedin\nBefore prompt engineering When to prompt engineer How to prompt engineer Prompt engineering tutorial\nBefore prompt engineeringWhen to prompt engineerHow to prompt engineerPrompt engineering tutorial\n \n\nSummary: \n Anthropic's documentation includes a prompt engineering tutorial, which is available in two formats: a GitHub-based tutorial with examples, and a lighter-weight version in a Google Sheets spreadsheet. These tutorials cover the concepts and techniques of prompt engineering for Anthropic's Claude AI model. \n \n\n \n More Resources\n\nText\n More Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. 
Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbook More Resources\nText capabilities and use casesClaude CookbookMore Resources\n \n\nSummary: \n The Claude Documentation provides a Prompt Engineering Guide to help users master the art of prompt crafting, a Prompt Library with pre-crafted prompts for various tasks, and API Documentation for interacting with the Claude AI model. These resources are designed to help users get the most out of the Claude model, particularly for fine-tuning with legacy models. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n How can the Evaluation tool in Anthropic's Claude platform help improve prompts and build more robust AI applications?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Understanding Results\n\nText\n Understanding Results\n\n\nThe Evaluation tool helps you:\nIdentify edge cases where your prompt might falter\nRate individual results to determine cases where your prompt performance better or worse\nEnsure consistent performance across a range of inputs\nRefine your prompt for better reliability\nBy reviewing results across test cases, you can spot patterns and make informed adjustments to your prompt.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\n\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nStart evaluating your prompts today to build more robust AI applications with Claude!\nReducing latencyGlossaryxlinkedin\nReducing latencyGlossary\nxlinkedin\nAccessing the Evaluate Feature Creating Test Cases Tips for Effective Evaluation Understanding Results\nAccessing the Evaluate FeatureCreating Test CasesTips for Effective EvaluationUnderstanding Results\n \n\nSummary: \n The Evaluation tool helps users identify edge cases, rate individual results, ensure consistent performance, and refine prompts for better reliability. 
By reviewing results across test cases, users can spot patterns and make informed adjustments to their prompts. The Evaluation tool is currently in beta, and user feedback is valuable for the Anthropic team. \n \n\n \n Prompt engineering tutorial\n\nText\n Prompt engineering tutorial\n\n\nIf you’re an interactive learner, you can dive into our interactive tutorials instead!\nGitHub prompting tutorialAn example-filled tutorial that covers the prompt engineering concepts found in our docs.Google Sheets prompting tutorialA lighter weight version of our prompt engineering tutorial via an interactive spreadsheet.\nGitHub prompting tutorialAn example-filled tutorial that covers the prompt engineering concepts found in our docs.\n\nGitHub prompting tutorial\nAn example-filled tutorial that covers the prompt engineering concepts found in our docs.\nGoogle Sheets prompting tutorialA lighter weight version of our prompt engineering tutorial via an interactive spreadsheet.\n\nGoogle Sheets prompting tutorial\nA lighter weight version of our prompt engineering tutorial via an interactive spreadsheet.\nDevelop test casesPrompt generatorxlinkedin\nDevelop test casesPrompt generator\nxlinkedin\nBefore prompt engineering When to prompt engineer How to prompt engineer Prompt engineering tutorial\nBefore prompt engineeringWhen to prompt engineerHow to prompt engineerPrompt engineering tutorial\n \n\nSummary: \n Anthropic's documentation includes a prompt engineering tutorial, which is available in two formats: a GitHub-based tutorial with examples, and a lighter-weight version in a Google Sheets spreadsheet. These tutorials cover the concepts and techniques of prompt engineering for Anthropic's Claude AI model. \n \n\n \n More Resources\n\nText\n More Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. 
Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbooks More Resources\nText capabilities and use casesClaude CookbooksMore Resources\n \n\nSummary: \n The Claude Documentation provides a Prompt Engineering Guide to help users master the art of prompt crafting, a Prompt Library with pre-crafted prompts for various tasks, and API Documentation for interacting with the Claude AI model. These resources are designed to help users get the most out of the Claude model, particularly for fine-tuning with legacy models. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_level_two"
},
"vars": {
@@ -26458,7 +26458,7 @@
"label": "Haiku: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n What new capabilities and features were introduced by Anthropic on May 10th, 2024 and how do they enable users to create and tailor prompts for specific tasks?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nMore Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. 
Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbook More Resources\nText capabilities and use casesClaude CookbookMore Resources\n\n\nNext steps\n\n\nNow that you have made your first Claude API request, it’s time to explore what else is possible:\nPrompt Engineering GuideOptimize Claude’s performance through prompting.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.Prompt LibraryExplore dozens of example prompts for inspiration across use cases.\nPrompt Engineering GuideOptimize Claude’s performance through prompting.\n\nPrompt Engineering Guide\nOptimize Claude’s performance through prompting.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nPrompt LibraryExplore dozens of example prompts for inspiration across use cases.\n\nPrompt Library\nExplore dozens of example prompts for inspiration across use cases.\nOverviewIntro to Claudexlinkedin\nOverviewIntro to Claude\nxlinkedin\nPrerequisites Start with the Workbench Install the SDK Set your API key Call the API Next steps\nPrerequisitesStart with the WorkbenchInstall the SDKSet your API keyCall the APINext steps\n\n\nMay 10th, 2024\n\n\nOur prompt generator tool is now available in the Developer Console. Prompt Generator makes it easy to guide Claude to generate a high-quality prompts tailored to your specific tasks. 
Read more in our blog post.\nOverviewClaude Appsxlinkedin\nOverviewClaude Apps\nxlinkedin\nJune 27th, 2024 June 20th, 2024 May 30th, 2024 May 10th, 2024\nJune 27th, 2024June 20th, 2024May 30th, 2024May 10th, 2024\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n What new capabilities and features were introduced by Anthropic on May 10th, 2024 and how do they enable users to create and tailor prompts for specific tasks?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nMore Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. 
Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbooks More Resources\nText capabilities and use casesClaude CookbooksMore Resources\n\n\nNext steps\n\n\nNow that you have made your first Claude API request, it’s time to explore what else is possible:\nPrompt Engineering GuideOptimize Claude’s performance through prompting.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.Prompt LibraryExplore dozens of example prompts for inspiration across use cases.\nPrompt Engineering GuideOptimize Claude’s performance through prompting.\n\nPrompt Engineering Guide\nOptimize Claude’s performance through prompting.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nPrompt LibraryExplore dozens of example prompts for inspiration across use cases.\n\nPrompt Library\nExplore dozens of example prompts for inspiration across use cases.\nOverviewIntro to Claudexlinkedin\nOverviewIntro to Claude\nxlinkedin\nPrerequisites Start with the Workbench Install the SDK Set your API key Call the API Next steps\nPrerequisitesStart with the WorkbenchInstall the SDKSet your API keyCall the APINext steps\n\n\nMay 10th, 2024\n\n\nOur prompt generator tool is now available in the Developer Console. Prompt Generator makes it easy to guide Claude to generate a high-quality prompts tailored to your specific tasks. 
Read more in our blog post.\nOverviewClaude Appsxlinkedin\nOverviewClaude Apps\nxlinkedin\nJune 27th, 2024 June 20th, 2024 May 30th, 2024 May 10th, 2024\nJune 27th, 2024June 20th, 2024May 30th, 2024May 10th, 2024\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_base"
},
"vars": {
@@ -26554,7 +26554,7 @@
"label": "Haiku: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n What new capabilities and features were introduced by Anthropic on May 10th, 2024 and how do they enable users to create and tailor prompts for specific tasks?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n May 10th, 2024\n\nText\n May 10th, 2024\n\n\nOur prompt generator tool is now available in the Developer Console. Prompt Generator makes it easy to guide Claude to generate a high-quality prompts tailored to your specific tasks. Read more in our blog post.\nOverviewClaude Appsxlinkedin\nOverviewClaude Apps\nxlinkedin\nJune 27th, 2024 June 20th, 2024 May 30th, 2024 May 10th, 2024\nJune 27th, 2024June 20th, 2024May 30th, 2024May 10th, 2024\n \n\nSummary: \n Anthropic has released a Prompt Generator tool in the Developer Console, which helps users create high-quality prompts tailored to their specific tasks. The tool is discussed in a recent blog post, and is part of Anthropic's suite of Claude AI model-related products and services. \n \n\n \n More Resources\n\nText\n More Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. 
Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbook More Resources\nText capabilities and use casesClaude CookbookMore Resources\n \n\nSummary: \n The Claude Documentation provides a Prompt Engineering Guide to help users master the art of prompt crafting, a Prompt Library with pre-crafted prompts for various tasks, and API Documentation for interacting with the Claude AI model. These resources are designed to help users get the most out of the Claude model, particularly for fine-tuning with legacy models. 
\n \n\n \n Next steps\n\nText\n Next steps\n\n\nNow that you have made your first Claude API request, it’s time to explore what else is possible:\nPrompt Engineering GuideOptimize Claude’s performance through prompting.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.Prompt LibraryExplore dozens of example prompts for inspiration across use cases.\nPrompt Engineering GuideOptimize Claude’s performance through prompting.\n\nPrompt Engineering Guide\nOptimize Claude’s performance through prompting.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nPrompt LibraryExplore dozens of example prompts for inspiration across use cases.\n\nPrompt Library\nExplore dozens of example prompts for inspiration across use cases.\nOverviewIntro to Claudexlinkedin\nOverviewIntro to Claude\nxlinkedin\nPrerequisites Start with the Workbench Install the SDK Set your API key Call the API Next steps\nPrerequisitesStart with the WorkbenchInstall the SDKSet your API keyCall the APINext steps\n \n\nSummary: \n The summary covers the next steps after making an initial Claude API request, including exploring the Prompt Engineering Guide to optimize Claude's performance, the Claude Cookbook for interactive Jupyter notebooks, and the Prompt Library for example prompts across use cases. It also mentions the overview and prerequisites for working with the Anthropic platform. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n What new capabilities and features were introduced by Anthropic on May 10th, 2024 and how do they enable users to create and tailor prompts for specific tasks?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n May 10th, 2024\n\nText\n May 10th, 2024\n\n\nOur prompt generator tool is now available in the Developer Console. Prompt Generator makes it easy to guide Claude to generate a high-quality prompts tailored to your specific tasks. Read more in our blog post.\nOverviewClaude Appsxlinkedin\nOverviewClaude Apps\nxlinkedin\nJune 27th, 2024 June 20th, 2024 May 30th, 2024 May 10th, 2024\nJune 27th, 2024June 20th, 2024May 30th, 2024May 10th, 2024\n \n\nSummary: \n Anthropic has released a Prompt Generator tool in the Developer Console, which helps users create high-quality prompts tailored to their specific tasks. The tool is discussed in a recent blog post, and is part of Anthropic's suite of Claude AI model-related products and services. \n \n\n \n More Resources\n\nText\n More Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. 
Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbooks More Resources\nText capabilities and use casesClaude CookbooksMore Resources\n \n\nSummary: \n The Claude Documentation provides a Prompt Engineering Guide to help users master the art of prompt crafting, a Prompt Library with pre-crafted prompts for various tasks, and API Documentation for interacting with the Claude AI model. These resources are designed to help users get the most out of the Claude model, particularly for fine-tuning with legacy models. 
\n \n\n \n Next steps\n\nText\n Next steps\n\n\nNow that you have made your first Claude API request, it’s time to explore what else is possible:\nPrompt Engineering GuideOptimize Claude’s performance through prompting.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.Prompt LibraryExplore dozens of example prompts for inspiration across use cases.\nPrompt Engineering GuideOptimize Claude’s performance through prompting.\n\nPrompt Engineering Guide\nOptimize Claude’s performance through prompting.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nPrompt LibraryExplore dozens of example prompts for inspiration across use cases.\n\nPrompt Library\nExplore dozens of example prompts for inspiration across use cases.\nOverviewIntro to Claudexlinkedin\nOverviewIntro to Claude\nxlinkedin\nPrerequisites Start with the Workbench Install the SDK Set your API key Call the API Next steps\nPrerequisitesStart with the WorkbenchInstall the SDKSet your API keyCall the APINext steps\n \n\nSummary: \n The summary covers the next steps after making an initial Claude API request, including exploring the Prompt Engineering Guide to optimize Claude's performance, the Claude Cookbooks for interactive Jupyter notebooks, and the Prompt Library for example prompts across use cases. It also mentions the overview and prerequisites for working with the Anthropic platform. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_level_two"
},
"vars": {
@@ -26605,7 +26605,7 @@
"label": "3.5 Sonnet: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n What new capabilities and features were introduced by Anthropic on May 10th, 2024 and how do they enable users to create and tailor prompts for specific tasks?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nMore Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. 
Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbook More Resources\nText capabilities and use casesClaude CookbookMore Resources\n\n\nNext steps\n\n\nNow that you have made your first Claude API request, it’s time to explore what else is possible:\nPrompt Engineering GuideOptimize Claude’s performance through prompting.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.Prompt LibraryExplore dozens of example prompts for inspiration across use cases.\nPrompt Engineering GuideOptimize Claude’s performance through prompting.\n\nPrompt Engineering Guide\nOptimize Claude’s performance through prompting.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nPrompt LibraryExplore dozens of example prompts for inspiration across use cases.\n\nPrompt Library\nExplore dozens of example prompts for inspiration across use cases.\nOverviewIntro to Claudexlinkedin\nOverviewIntro to Claude\nxlinkedin\nPrerequisites Start with the Workbench Install the SDK Set your API key Call the API Next steps\nPrerequisitesStart with the WorkbenchInstall the SDKSet your API keyCall the APINext steps\n\n\nMay 10th, 2024\n\n\nOur prompt generator tool is now available in the Developer Console. Prompt Generator makes it easy to guide Claude to generate a high-quality prompts tailored to your specific tasks. 
Read more in our blog post.\nOverviewClaude Appsxlinkedin\nOverviewClaude Apps\nxlinkedin\nJune 27th, 2024 June 20th, 2024 May 30th, 2024 May 10th, 2024\nJune 27th, 2024June 20th, 2024May 30th, 2024May 10th, 2024\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n What new capabilities and features were introduced by Anthropic on May 10th, 2024 and how do they enable users to create and tailor prompts for specific tasks?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nMore Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. 
Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbooks More Resources\nText capabilities and use casesClaude CookbooksMore Resources\n\n\nNext steps\n\n\nNow that you have made your first Claude API request, it’s time to explore what else is possible:\nPrompt Engineering GuideOptimize Claude’s performance through prompting.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.Prompt LibraryExplore dozens of example prompts for inspiration across use cases.\nPrompt Engineering GuideOptimize Claude’s performance through prompting.\n\nPrompt Engineering Guide\nOptimize Claude’s performance through prompting.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nPrompt LibraryExplore dozens of example prompts for inspiration across use cases.\n\nPrompt Library\nExplore dozens of example prompts for inspiration across use cases.\nOverviewIntro to Claudexlinkedin\nOverviewIntro to Claude\nxlinkedin\nPrerequisites Start with the Workbench Install the SDK Set your API key Call the API Next steps\nPrerequisitesStart with the WorkbenchInstall the SDKSet your API keyCall the APINext steps\n\n\nMay 10th, 2024\n\n\nOur prompt generator tool is now available in the Developer Console. Prompt Generator makes it easy to guide Claude to generate a high-quality prompts tailored to your specific tasks. 
Read more in our blog post.\nOverviewClaude Appsxlinkedin\nOverviewClaude Apps\nxlinkedin\nJune 27th, 2024 June 20th, 2024 May 30th, 2024 May 10th, 2024\nJune 27th, 2024June 20th, 2024May 30th, 2024May 10th, 2024\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_base"
},
"vars": {
@@ -26656,7 +26656,7 @@
"label": "Haiku: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n What new capabilities and features were introduced by Anthropic on May 10th, 2024 and how do they enable users to create and tailor prompts for specific tasks?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n May 10th, 2024\n\nMay 10th, 2024\n\n\nOur prompt generator tool is now available in the Developer Console. Prompt Generator makes it easy to guide Claude to generate a high-quality prompts tailored to your specific tasks. Read more in our blog post.\nOverviewClaude Appsxlinkedin\nOverviewClaude Apps\nxlinkedin\nJune 27th, 2024 June 20th, 2024 May 30th, 2024 May 10th, 2024\nJune 27th, 2024June 20th, 2024May 30th, 2024May 10th, 2024\n \n \n\n \n Develop with Claude\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n \n\n \n Before prompt engineering\n\nBefore prompt engineering\n\n\nThis guide assumes that you have:\nA clear definition of 
the success criteria for your use case\nSome ways to empirically test against those criteria\nA first draft prompt you want to improve\nIf not, we highly suggest you spend time establishing that first. Check out Define your success criteria and Create strong empirical evaluations for tips and guidance.\nPrompt generatorDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n\nPrompt generator\nDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n What new capabilities and features were introduced by Anthropic on May 10th, 2024 and how do they enable users to create and tailor prompts for specific tasks?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n May 10th, 2024\n\nMay 10th, 2024\n\n\nOur prompt generator tool is now available in the Developer Console. Prompt Generator makes it easy to guide Claude to generate a high-quality prompts tailored to your specific tasks. Read more in our blog post.\nOverviewClaude Appsxlinkedin\nOverviewClaude Apps\nxlinkedin\nJune 27th, 2024 June 20th, 2024 May 30th, 2024 May 10th, 2024\nJune 27th, 2024June 20th, 2024May 30th, 2024May 10th, 2024\n \n \n\n \n Develop with Claude\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n \n\n \n Before prompt engineering\n\nBefore prompt engineering\n\n\nThis guide assumes that you have:\nA clear definition 
of the success criteria for your use case\nSome ways to empirically test against those criteria\nA first draft prompt you want to improve\nIf not, we highly suggest you spend time establishing that first. Check out Define your success criteria and Create strong empirical evaluations for tips and guidance.\nPrompt generatorDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n\nPrompt generator\nDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_level_three"
},
"vars": {
@@ -26752,7 +26752,7 @@
"label": "3.5 Sonnet: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n What new capabilities and features were introduced by Anthropic on May 10th, 2024 and how do they enable users to create and tailor prompts for specific tasks?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n May 10th, 2024\n\nText\n May 10th, 2024\n\n\nOur prompt generator tool is now available in the Developer Console. Prompt Generator makes it easy to guide Claude to generate a high-quality prompts tailored to your specific tasks. Read more in our blog post.\nOverviewClaude Appsxlinkedin\nOverviewClaude Apps\nxlinkedin\nJune 27th, 2024 June 20th, 2024 May 30th, 2024 May 10th, 2024\nJune 27th, 2024June 20th, 2024May 30th, 2024May 10th, 2024\n \n\nSummary: \n Anthropic has released a Prompt Generator tool in the Developer Console, which helps users create high-quality prompts tailored to their specific tasks. The tool is discussed in a recent blog post, and is part of Anthropic's suite of Claude AI model-related products and services. \n \n\n \n More Resources\n\nText\n More Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. 
Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbook More Resources\nText capabilities and use casesClaude CookbookMore Resources\n \n\nSummary: \n The Claude Documentation provides a Prompt Engineering Guide to help users master the art of prompt crafting, a Prompt Library with pre-crafted prompts for various tasks, and API Documentation for interacting with the Claude AI model. These resources are designed to help users get the most out of the Claude model, particularly for fine-tuning with legacy models. 
\n \n\n \n Next steps\n\nText\n Next steps\n\n\nNow that you have made your first Claude API request, it’s time to explore what else is possible:\nPrompt Engineering GuideOptimize Claude’s performance through prompting.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.Prompt LibraryExplore dozens of example prompts for inspiration across use cases.\nPrompt Engineering GuideOptimize Claude’s performance through prompting.\n\nPrompt Engineering Guide\nOptimize Claude’s performance through prompting.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nPrompt LibraryExplore dozens of example prompts for inspiration across use cases.\n\nPrompt Library\nExplore dozens of example prompts for inspiration across use cases.\nOverviewIntro to Claudexlinkedin\nOverviewIntro to Claude\nxlinkedin\nPrerequisites Start with the Workbench Install the SDK Set your API key Call the API Next steps\nPrerequisitesStart with the WorkbenchInstall the SDKSet your API keyCall the APINext steps\n \n\nSummary: \n The summary covers the next steps after making an initial Claude API request, including exploring the Prompt Engineering Guide to optimize Claude's performance, the Claude Cookbook for interactive Jupyter notebooks, and the Prompt Library for example prompts across use cases. It also mentions the overview and prerequisites for working with the Anthropic platform. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n What new capabilities and features were introduced by Anthropic on May 10th, 2024 and how do they enable users to create and tailor prompts for specific tasks?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n May 10th, 2024\n\nText\n May 10th, 2024\n\n\nOur prompt generator tool is now available in the Developer Console. Prompt Generator makes it easy to guide Claude to generate a high-quality prompts tailored to your specific tasks. Read more in our blog post.\nOverviewClaude Appsxlinkedin\nOverviewClaude Apps\nxlinkedin\nJune 27th, 2024 June 20th, 2024 May 30th, 2024 May 10th, 2024\nJune 27th, 2024June 20th, 2024May 30th, 2024May 10th, 2024\n \n\nSummary: \n Anthropic has released a Prompt Generator tool in the Developer Console, which helps users create high-quality prompts tailored to their specific tasks. The tool is discussed in a recent blog post, and is part of Anthropic's suite of Claude AI model-related products and services. \n \n\n \n More Resources\n\nText\n More Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. 
Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbooks More Resources\nText capabilities and use casesClaude CookbooksMore Resources\n \n\nSummary: \n The Claude Documentation provides a Prompt Engineering Guide to help users master the art of prompt crafting, a Prompt Library with pre-crafted prompts for various tasks, and API Documentation for interacting with the Claude AI model. These resources are designed to help users get the most out of the Claude model, particularly for fine-tuning with legacy models. 
\n \n\n \n Next steps\n\nText\n Next steps\n\n\nNow that you have made your first Claude API request, it’s time to explore what else is possible:\nPrompt Engineering GuideOptimize Claude’s performance through prompting.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.Prompt LibraryExplore dozens of example prompts for inspiration across use cases.\nPrompt Engineering GuideOptimize Claude’s performance through prompting.\n\nPrompt Engineering Guide\nOptimize Claude’s performance through prompting.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nPrompt LibraryExplore dozens of example prompts for inspiration across use cases.\n\nPrompt Library\nExplore dozens of example prompts for inspiration across use cases.\nOverviewIntro to Claudexlinkedin\nOverviewIntro to Claude\nxlinkedin\nPrerequisites Start with the Workbench Install the SDK Set your API key Call the API Next steps\nPrerequisitesStart with the WorkbenchInstall the SDKSet your API keyCall the APINext steps\n \n\nSummary: \n The summary covers the next steps after making an initial Claude API request, including exploring the Prompt Engineering Guide to optimize Claude's performance, the Claude Cookbooks for interactive Jupyter notebooks, and the Prompt Library for example prompts across use cases. It also mentions the overview and prerequisites for working with the Anthropic platform. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_level_two"
},
"vars": {
@@ -26956,7 +26956,7 @@
"label": "3.5 Sonnet: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n What new capabilities and features were introduced by Anthropic on May 10th, 2024 and how do they enable users to create and tailor prompts for specific tasks?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n May 10th, 2024\n\nMay 10th, 2024\n\n\nOur prompt generator tool is now available in the Developer Console. Prompt Generator makes it easy to guide Claude to generate a high-quality prompts tailored to your specific tasks. Read more in our blog post.\nOverviewClaude Appsxlinkedin\nOverviewClaude Apps\nxlinkedin\nJune 27th, 2024 June 20th, 2024 May 30th, 2024 May 10th, 2024\nJune 27th, 2024June 20th, 2024May 30th, 2024May 10th, 2024\n \n \n\n \n Develop with Claude\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n \n\n \n Before prompt engineering\n\nBefore prompt engineering\n\n\nThis guide assumes that you have:\nA clear definition of 
the success criteria for your use case\nSome ways to empirically test against those criteria\nA first draft prompt you want to improve\nIf not, we highly suggest you spend time establishing that first. Check out Define your success criteria and Create strong empirical evaluations for tips and guidance.\nPrompt generatorDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n\nPrompt generator\nDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n What new capabilities and features were introduced by Anthropic on May 10th, 2024 and how do they enable users to create and tailor prompts for specific tasks?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n May 10th, 2024\n\nMay 10th, 2024\n\n\nOur prompt generator tool is now available in the Developer Console. Prompt Generator makes it easy to guide Claude to generate a high-quality prompts tailored to your specific tasks. Read more in our blog post.\nOverviewClaude Appsxlinkedin\nOverviewClaude Apps\nxlinkedin\nJune 27th, 2024 June 20th, 2024 May 30th, 2024 May 10th, 2024\nJune 27th, 2024June 20th, 2024May 30th, 2024May 10th, 2024\n \n \n\n \n Develop with Claude\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n \n\n \n Before prompt engineering\n\nBefore prompt engineering\n\n\nThis guide assumes that you have:\nA clear definition 
of the success criteria for your use case\nSome ways to empirically test against those criteria\nA first draft prompt you want to improve\nIf not, we highly suggest you spend time establishing that first. Check out Define your success criteria and Create strong empirical evaluations for tips and guidance.\nPrompt generatorDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n\nPrompt generator\nDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_level_three"
},
"vars": {
@@ -28262,7 +28262,7 @@
"label": "Haiku: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are some helpful resources provided by Anthropic to dive deeper into building with images using Claude?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nDive deeper into vision\n\n\nReady to start building with images using Claude? Here are a few helpful resources:\nMultimodal cookbook: This cookbook has tips on getting started with images and best practice techniques to ensure the highest quality performance with images. See how you can effectively prompt Claude with images to carry out tasks such as interpreting and analyzing charts or extracting content from forms.\nAPI reference: Visit our documentation for the Messages API, including example API calls involving images.\nIf you have any other questions, feel free to reach out to our support team. You can also join our developer community to connect with other creators and get help from Anthropic experts.\nGoogle Sheets add-onTool use (function calling)xlinkedin\nGoogle Sheets add-onTool use (function calling)\nxlinkedin\nHow to use vision Before you upload Evaluate image size Calculate image costs Ensuring image quality Prompt examples About the prompt examples Limitations FAQ Dive deeper into vision\nHow to use visionBefore you uploadEvaluate image sizeCalculate image costsEnsuring image qualityPrompt examplesAbout the prompt examplesLimitationsFAQDive deeper into vision\n\n\nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbook for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 Family Enterprise considerations 
Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are some helpful resources provided by Anthropic to dive deeper into building with images using Claude?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nDive deeper into vision\n\n\nReady to start building with images using Claude? Here are a few helpful resources:\nMultimodal cookbook: This cookbook has tips on getting started with images and best practice techniques to ensure the highest quality performance with images. See how you can effectively prompt Claude with images to carry out tasks such as interpreting and analyzing charts or extracting content from forms.\nAPI reference: Visit our documentation for the Messages API, including example API calls involving images.\nIf you have any other questions, feel free to reach out to our support team. You can also join our developer community to connect with other creators and get help from Anthropic experts.\nGoogle Sheets add-onTool use (function calling)xlinkedin\nGoogle Sheets add-onTool use (function calling)\nxlinkedin\nHow to use vision Before you upload Evaluate image size Calculate image costs Ensuring image quality Prompt examples About the prompt examples Limitations FAQ Dive deeper into vision\nHow to use visionBefore you uploadEvaluate image sizeCalculate image costsEnsuring image qualityPrompt examplesAbout the prompt examplesLimitationsFAQDive deeper into vision\n\n\nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbooks for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 Family Enterprise considerations 
Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_base"
},
"vars": {
@@ -28359,7 +28359,7 @@
"label": "Haiku: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are some helpful resources provided by Anthropic to dive deeper into building with images using Claude?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Prompt examples\n\nText\n Prompt examples\n\n\nMany of the prompting techniques that work well for text-based interactions with Claude can also be applied to image-based prompts.\nThese examples demonstrate best practice prompt structures involving images.\nJust as with document-query placement, Claude works best when images come before text. Images placed after text or interpolated with text will still perform well, but if your use case allows it, we recommend an image-then-text structure.\nJust as with document-query placement, Claude works best when images come before text. Images placed after text or interpolated with text will still perform well, but if your use case allows it, we recommend an image-then-text structure.\n\nJust as with document-query placement, Claude works best when images come before text. Images placed after text or interpolated with text will still perform well, but if your use case allows it, we recommend an image-then-text structure.\n \n\nSummary: \n Prompt examples demonstrate that many text-based techniques can be applied to image-based prompts with Claude. The model works best when images are placed before text, but images after text or interspersed with text will also perform well. Anthropic recommends an image-then-text structure if the use case allows it. \n \n\n \n Dive deeper into vision\n\nText\n Dive deeper into vision\n\n\nReady to start building with images using Claude? Here are a few helpful resources:\nMultimodal cookbook: This cookbook has tips on getting started with images and best practice techniques to ensure the highest quality performance with images. 
See how you can effectively prompt Claude with images to carry out tasks such as interpreting and analyzing charts or extracting content from forms.\nAPI reference: Visit our documentation for the Messages API, including example API calls involving images.\nIf you have any other questions, feel free to reach out to our support team. You can also join our developer community to connect with other creators and get help from Anthropic experts.\nGoogle Sheets add-onTool use (function calling)xlinkedin\nGoogle Sheets add-onTool use (function calling)\nxlinkedin\nHow to use vision Before you upload Evaluate image size Calculate image costs Ensuring image quality Prompt examples About the prompt examples Limitations FAQ Dive deeper into vision\nHow to use visionBefore you uploadEvaluate image sizeCalculate image costsEnsuring image qualityPrompt examplesAbout the prompt examplesLimitationsFAQDive deeper into vision\n \n\nSummary: \n This documentation covers resources for using images with the Claude AI model, including a multimodal cookbook with tips on effective prompting, an API reference for the Messages API, and information on image size, costs, and quality. It also provides prompt examples and addresses limitations and FAQs around using vision capabilities. 
\n \n\n \n Develop with Claude\n\nText\n Develop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n\nSummary: \n Anthropic provides a suite of developer tools, including a browser-based Workbench and prompt generator, API reference documentation, and interactive Jupyter notebooks, to help developers build scalable applications with the Claude AI model. These tools enable easier, more powerful prompting, exploration and implementation of the Claude API and SDKs, and learning through interactive demonstrations. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are some helpful resources provided by Anthropic to dive deeper into building with images using Claude?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Prompt examples\n\nText\n Prompt examples\n\n\nMany of the prompting techniques that work well for text-based interactions with Claude can also be applied to image-based prompts.\nThese examples demonstrate best practice prompt structures involving images.\nJust as with document-query placement, Claude works best when images come before text. Images placed after text or interpolated with text will still perform well, but if your use case allows it, we recommend an image-then-text structure.\nJust as with document-query placement, Claude works best when images come before text. Images placed after text or interpolated with text will still perform well, but if your use case allows it, we recommend an image-then-text structure.\n\nJust as with document-query placement, Claude works best when images come before text. Images placed after text or interpolated with text will still perform well, but if your use case allows it, we recommend an image-then-text structure.\n \n\nSummary: \n Prompt examples demonstrate that many text-based techniques can be applied to image-based prompts with Claude. The model works best when images are placed before text, but images after text or interspersed with text will also perform well. Anthropic recommends an image-then-text structure if the use case allows it. \n \n\n \n Dive deeper into vision\n\nText\n Dive deeper into vision\n\n\nReady to start building with images using Claude? Here are a few helpful resources:\nMultimodal cookbook: This cookbook has tips on getting started with images and best practice techniques to ensure the highest quality performance with images. 
See how you can effectively prompt Claude with images to carry out tasks such as interpreting and analyzing charts or extracting content from forms.\nAPI reference: Visit our documentation for the Messages API, including example API calls involving images.\nIf you have any other questions, feel free to reach out to our support team. You can also join our developer community to connect with other creators and get help from Anthropic experts.\nGoogle Sheets add-onTool use (function calling)xlinkedin\nGoogle Sheets add-onTool use (function calling)\nxlinkedin\nHow to use vision Before you upload Evaluate image size Calculate image costs Ensuring image quality Prompt examples About the prompt examples Limitations FAQ Dive deeper into vision\nHow to use visionBefore you uploadEvaluate image sizeCalculate image costsEnsuring image qualityPrompt examplesAbout the prompt examplesLimitationsFAQDive deeper into vision\n \n\nSummary: \n This documentation covers resources for using images with the Claude AI model, including a multimodal cookbook with tips on effective prompting, an API reference for the Messages API, and information on image size, costs, and quality. It also provides prompt examples and addresses limitations and FAQs around using vision capabilities. 
\n \n\n \n Develop with Claude\n\nText\n Develop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n\nSummary: \n Anthropic provides a suite of developer tools, including a browser-based Workbench and prompt generator, API reference documentation, and interactive Jupyter notebooks, to help developers build scalable applications with the Claude AI model. These tools enable easier, more powerful prompting, exploration and implementation of the Claude API and SDKs, and learning through interactive demonstrations. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_level_two"
},
"vars": {
@@ -28367,7 +28367,7 @@
"correct_answer": "Anthropic provides a multimodal cookbook with tips on getting started with images and best practices, as well as API reference documentation for the Messages API that includes example API calls involving images."
},
"response": {
- "output": "Anthropic provides several helpful resources to dive deeper into building with images using Claude:\n\n1. Multimodal Cookbook: This resource provides tips and best practices for effectively prompting Claude with images, such as how to interpret and analyze charts or extract content from forms.\n\n2. API Reference: The documentation for the Messages API includes example API calls involving images, allowing developers to explore and implement image-based functionality.\n\n3. Developer Community: Anthropic encourages developers to join their community to connect with other creators and get help from Anthropic experts on using the vision capabilities of Claude.\n\n4. Developer Console: Anthropic's browser-based Workbench and prompt generator tool enable easier and more powerful prompting, including with images.\n\n5. Claude Cookbook: Interactive Jupyter notebooks demonstrate how to use features like uploading PDFs and working with embeddings, which can be applied to image-based use cases.",
+ "output": "Anthropic provides several helpful resources to dive deeper into building with images using Claude:\n\n1. Multimodal Cookbook: This resource provides tips and best practices for effectively prompting Claude with images, such as how to interpret and analyze charts or extract content from forms.\n\n2. API Reference: The documentation for the Messages API includes example API calls involving images, allowing developers to explore and implement image-based functionality.\n\n3. Developer Community: Anthropic encourages developers to join their community to connect with other creators and get help from Anthropic experts on using the vision capabilities of Claude.\n\n4. Developer Console: Anthropic's browser-based Workbench and prompt generator tool enable easier and more powerful prompting, including with images.\n\n5. Claude Cookbooks: Interactive Jupyter notebooks demonstrate how to use features like uploading PDFs and working with embeddings, which can be applied to image-based use cases.",
"tokenUsage": {
"total": 1347,
"prompt": 1149,
@@ -28380,11 +28380,11 @@
"namedScores": {},
"latencyMs": 1886,
"cost": 0.00053475,
- "error": "The Generated Answer is partially correct but contains some inaccuracies and additional information not mentioned in the Correct Answer. The Generated Answer correctly mentions the multimodal cookbook and API reference documentation, which align with the Correct Answer. However, it also includes information about a developer community, developer console, and Claude Cookbook, which are not mentioned in the Correct Answer. While these additional resources may exist, they are not part of the specific resources mentioned in the Correct Answer for diving deeper into building with images using Claude. Due to these additions and the absence of some specific details (like mentioning \"Messages API\" explicitly), the Generated Answer cannot be considered fully correct based on the given Correct Answer.",
+ "error": "The Generated Answer is partially correct but contains some inaccuracies and additional information not mentioned in the Correct Answer. The Generated Answer correctly mentions the multimodal cookbook and API reference documentation, which align with the Correct Answer. However, it also includes information about a developer community, developer console, and Claude Cookbooks, which are not mentioned in the Correct Answer. While these additional resources may exist, they are not part of the specific resources mentioned in the Correct Answer for diving deeper into building with images using Claude. Due to these additions and the absence of some specific details (like mentioning \"Messages API\" explicitly), the Generated Answer cannot be considered fully correct based on the given Correct Answer.",
"gradingResult": {
"pass": false,
"score": 0,
- "reason": "The Generated Answer is partially correct but contains some inaccuracies and additional information not mentioned in the Correct Answer. The Generated Answer correctly mentions the multimodal cookbook and API reference documentation, which align with the Correct Answer. However, it also includes information about a developer community, developer console, and Claude Cookbook, which are not mentioned in the Correct Answer. While these additional resources may exist, they are not part of the specific resources mentioned in the Correct Answer for diving deeper into building with images using Claude. Due to these additions and the absence of some specific details (like mentioning \"Messages API\" explicitly), the Generated Answer cannot be considered fully correct based on the given Correct Answer.",
+ "reason": "The Generated Answer is partially correct but contains some inaccuracies and additional information not mentioned in the Correct Answer. The Generated Answer correctly mentions the multimodal cookbook and API reference documentation, which align with the Correct Answer. However, it also includes information about a developer community, developer console, and Claude Cookbooks, which are not mentioned in the Correct Answer. While these additional resources may exist, they are not part of the specific resources mentioned in the Correct Answer for diving deeper into building with images using Claude. Due to these additions and the absence of some specific details (like mentioning \"Messages API\" explicitly), the Generated Answer cannot be considered fully correct based on the given Correct Answer.",
"namedScores": {},
"tokensUsed": {
"total": 0,
@@ -28395,7 +28395,7 @@
{
"pass": false,
"score": 0,
- "reason": "The Generated Answer is partially correct but contains some inaccuracies and additional information not mentioned in the Correct Answer. The Generated Answer correctly mentions the multimodal cookbook and API reference documentation, which align with the Correct Answer. However, it also includes information about a developer community, developer console, and Claude Cookbook, which are not mentioned in the Correct Answer. While these additional resources may exist, they are not part of the specific resources mentioned in the Correct Answer for diving deeper into building with images using Claude. Due to these additions and the absence of some specific details (like mentioning \"Messages API\" explicitly), the Generated Answer cannot be considered fully correct based on the given Correct Answer.",
+ "reason": "The Generated Answer is partially correct but contains some inaccuracies and additional information not mentioned in the Correct Answer. The Generated Answer correctly mentions the multimodal cookbook and API reference documentation, which align with the Correct Answer. However, it also includes information about a developer community, developer console, and Claude Cookbooks, which are not mentioned in the Correct Answer. While these additional resources may exist, they are not part of the specific resources mentioned in the Correct Answer for diving deeper into building with images using Claude. Due to these additions and the absence of some specific details (like mentioning \"Messages API\" explicitly), the Generated Answer cannot be considered fully correct based on the given Correct Answer.",
"assertion": {
"type": "python",
"value": "file://eval_end_to_end.py"
@@ -28411,7 +28411,7 @@
"label": "3.5 Sonnet: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are some helpful resources provided by Anthropic to dive deeper into building with images using Claude?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nDive deeper into vision\n\n\nReady to start building with images using Claude? Here are a few helpful resources:\nMultimodal cookbook: This cookbook has tips on getting started with images and best practice techniques to ensure the highest quality performance with images. See how you can effectively prompt Claude with images to carry out tasks such as interpreting and analyzing charts or extracting content from forms.\nAPI reference: Visit our documentation for the Messages API, including example API calls involving images.\nIf you have any other questions, feel free to reach out to our support team. You can also join our developer community to connect with other creators and get help from Anthropic experts.\nGoogle Sheets add-onTool use (function calling)xlinkedin\nGoogle Sheets add-onTool use (function calling)\nxlinkedin\nHow to use vision Before you upload Evaluate image size Calculate image costs Ensuring image quality Prompt examples About the prompt examples Limitations FAQ Dive deeper into vision\nHow to use visionBefore you uploadEvaluate image sizeCalculate image costsEnsuring image qualityPrompt examplesAbout the prompt examplesLimitationsFAQDive deeper into vision\n\n\nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbook for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 Family Enterprise considerations 
Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are some helpful resources provided by Anthropic to dive deeper into building with images using Claude?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nDive deeper into vision\n\n\nReady to start building with images using Claude? Here are a few helpful resources:\nMultimodal cookbook: This cookbook has tips on getting started with images and best practice techniques to ensure the highest quality performance with images. See how you can effectively prompt Claude with images to carry out tasks such as interpreting and analyzing charts or extracting content from forms.\nAPI reference: Visit our documentation for the Messages API, including example API calls involving images.\nIf you have any other questions, feel free to reach out to our support team. You can also join our developer community to connect with other creators and get help from Anthropic experts.\nGoogle Sheets add-onTool use (function calling)xlinkedin\nGoogle Sheets add-onTool use (function calling)\nxlinkedin\nHow to use vision Before you upload Evaluate image size Calculate image costs Ensuring image quality Prompt examples About the prompt examples Limitations FAQ Dive deeper into vision\nHow to use visionBefore you uploadEvaluate image sizeCalculate image costsEnsuring image qualityPrompt examplesAbout the prompt examplesLimitationsFAQDive deeper into vision\n\n\nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbooks for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 Family Enterprise considerations 
Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_base"
},
"vars": {
@@ -28419,7 +28419,7 @@
"correct_answer": "Anthropic provides a multimodal cookbook with tips on getting started with images and best practices, as well as API reference documentation for the Messages API that includes example API calls involving images."
},
"response": {
- "output": "Anthropic provides several helpful resources for diving deeper into building with images using Claude:\n\n1. Multimodal cookbook: This contains tips for getting started with images, best practices, and techniques to ensure high-quality performance. It includes examples of how to effectively prompt Claude with images for tasks like interpreting charts or extracting content from forms.\n\n2. API reference: The documentation for the Messages API includes example API calls involving images.\n\n3. Developer Console: This includes the Workbench and prompt generator tool for easier and more powerful prompting in your browser.\n\n4. Claude Cookbook: This features interactive Jupyter notebooks demonstrating various capabilities, including working with images.\n\n5. Support team: Anthropic offers the option to reach out to their support team for additional questions.\n\n6. Developer community: Users can join Anthropic's developer community to connect with other creators and get help from Anthropic experts.\n\nThese resources provide a range of options for developers to learn about and implement image-related functionalities with Claude, from documentation and examples to interactive tools and community support.",
+ "output": "Anthropic provides several helpful resources for diving deeper into building with images using Claude:\n\n1. Multimodal cookbook: This contains tips for getting started with images, best practices, and techniques to ensure high-quality performance. It includes examples of how to effectively prompt Claude with images for tasks like interpreting charts or extracting content from forms.\n\n2. API reference: The documentation for the Messages API includes example API calls involving images.\n\n3. Developer Console: This includes the Workbench and prompt generator tool for easier and more powerful prompting in your browser.\n\n4. Claude Cookbooks: This features interactive Jupyter notebooks demonstrating various capabilities, including working with images.\n\n5. Support team: Anthropic offers the option to reach out to their support team for additional questions.\n\n6. Developer community: Users can join Anthropic's developer community to connect with other creators and get help from Anthropic experts.\n\nThese resources provide a range of options for developers to learn about and implement image-related functionalities with Claude, from documentation and examples to interactive tools and community support.",
"tokenUsage": {
"total": 1057,
"prompt": 828,
@@ -28432,11 +28432,11 @@
"namedScores": {},
"latencyMs": 4457,
"cost": 0.005919000000000001,
- "error": "The Generated Answer is partially correct but contains additional information not present in the Correct Answer. The key points from the Correct Answer - the multimodal cookbook with tips and best practices, and the API reference documentation with example API calls involving images - are both included in the Generated Answer. However, the Generated Answer goes beyond this by mentioning additional resources like the Developer Console, Claude Cookbook, support team, and developer community, which are not mentioned in the Correct Answer. While these additional resources may be helpful, they are not specified in the Correct Answer as resources specifically for diving deeper into building with images using Claude. Therefore, the Generated Answer, while containing the correct information, also includes extra information that cannot be verified based on the given Correct Answer.",
+ "error": "The Generated Answer is partially correct but contains additional information not present in the Correct Answer. The key points from the Correct Answer - the multimodal cookbook with tips and best practices, and the API reference documentation with example API calls involving images - are both included in the Generated Answer. However, the Generated Answer goes beyond this by mentioning additional resources like the Developer Console, Claude Cookbooks, support team, and developer community, which are not mentioned in the Correct Answer. While these additional resources may be helpful, they are not specified in the Correct Answer as resources specifically for diving deeper into building with images using Claude. Therefore, the Generated Answer, while containing the correct information, also includes extra information that cannot be verified based on the given Correct Answer.",
"gradingResult": {
"pass": false,
"score": 0,
- "reason": "The Generated Answer is partially correct but contains additional information not present in the Correct Answer. The key points from the Correct Answer - the multimodal cookbook with tips and best practices, and the API reference documentation with example API calls involving images - are both included in the Generated Answer. However, the Generated Answer goes beyond this by mentioning additional resources like the Developer Console, Claude Cookbook, support team, and developer community, which are not mentioned in the Correct Answer. While these additional resources may be helpful, they are not specified in the Correct Answer as resources specifically for diving deeper into building with images using Claude. Therefore, the Generated Answer, while containing the correct information, also includes extra information that cannot be verified based on the given Correct Answer.",
+ "reason": "The Generated Answer is partially correct but contains additional information not present in the Correct Answer. The key points from the Correct Answer - the multimodal cookbook with tips and best practices, and the API reference documentation with example API calls involving images - are both included in the Generated Answer. However, the Generated Answer goes beyond this by mentioning additional resources like the Developer Console, Claude Cookbooks, support team, and developer community, which are not mentioned in the Correct Answer. While these additional resources may be helpful, they are not specified in the Correct Answer as resources specifically for diving deeper into building with images using Claude. Therefore, the Generated Answer, while containing the correct information, also includes extra information that cannot be verified based on the given Correct Answer.",
"namedScores": {},
"tokensUsed": {
"total": 0,
@@ -28447,7 +28447,7 @@
{
"pass": false,
"score": 0,
- "reason": "The Generated Answer is partially correct but contains additional information not present in the Correct Answer. The key points from the Correct Answer - the multimodal cookbook with tips and best practices, and the API reference documentation with example API calls involving images - are both included in the Generated Answer. However, the Generated Answer goes beyond this by mentioning additional resources like the Developer Console, Claude Cookbook, support team, and developer community, which are not mentioned in the Correct Answer. While these additional resources may be helpful, they are not specified in the Correct Answer as resources specifically for diving deeper into building with images using Claude. Therefore, the Generated Answer, while containing the correct information, also includes extra information that cannot be verified based on the given Correct Answer.",
+ "reason": "The Generated Answer is partially correct but contains additional information not present in the Correct Answer. The key points from the Correct Answer - the multimodal cookbook with tips and best practices, and the API reference documentation with example API calls involving images - are both included in the Generated Answer. However, the Generated Answer goes beyond this by mentioning additional resources like the Developer Console, Claude Cookbooks, support team, and developer community, which are not mentioned in the Correct Answer. While these additional resources may be helpful, they are not specified in the Correct Answer as resources specifically for diving deeper into building with images using Claude. Therefore, the Generated Answer, while containing the correct information, also includes extra information that cannot be verified based on the given Correct Answer.",
"assertion": {
"type": "python",
"value": "file://eval_end_to_end.py"
@@ -28463,7 +28463,7 @@
"label": "Haiku: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are some helpful resources provided by Anthropic to dive deeper into building with images using Claude?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Dive deeper into vision\n\nDive deeper into vision\n\n\nReady to start building with images using Claude? Here are a few helpful resources:\nMultimodal cookbook: This cookbook has tips on getting started with images and best practice techniques to ensure the highest quality performance with images. See how you can effectively prompt Claude with images to carry out tasks such as interpreting and analyzing charts or extracting content from forms.\nAPI reference: Visit our documentation for the Messages API, including example API calls involving images.\nIf you have any other questions, feel free to reach out to our support team. You can also join our developer community to connect with other creators and get help from Anthropic experts.\nGoogle Sheets add-onTool use (function calling)xlinkedin\nGoogle Sheets add-onTool use (function calling)\nxlinkedin\nHow to use vision Before you upload Evaluate image size Calculate image costs Ensuring image quality Prompt examples About the prompt examples Limitations FAQ Dive deeper into vision\nHow to use visionBefore you uploadEvaluate image sizeCalculate image costsEnsuring image qualityPrompt examplesAbout the prompt examplesLimitationsFAQDive deeper into vision\n \n \n\n \n Claude Cookbook\n\nClaude Cookbook\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create 
and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n \n \n\n \n Vision\n\nVision\n\n\nClaude can read both text and images in requests. Currently, we support the base64 source type for images, and the image/jpeg, image/png, image/gif, and image/webp media types. 
See our vision guide for more details.\nShell Python TypeScript #!/bin/sh IMAGE_URL = \"https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg\" IMAGE_MEDIA_TYPE = \"image/jpeg\" IMAGE_BASE64 = $( curl \" $IMAGE_URL \" | base64 ) curl https://api.anthropic.com/v1/messages \\ --header \"x-api-key: $ANTHROPIC_API_KEY \" \\ --header \"anthropic-version: 2023-06-01\" \\ --header \"content-type: application/json\" \\ --data \\ '{\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"max_tokens\": 1024,\n \"messages\": [\n {\"role\": \"user\", \"content\": [\n {\"type\": \"image\", \"source\": {\n \"type\": \"base64\",\n \"media_type\": \"' $IMAGE_MEDIA_TYPE '\",\n \"data\": \"' $IMAGE_BASE64 '\"\n }},\n {\"type\": \"text\", \"text\": \"What is in the above image?\"}\n ]}\n ]\n}'\nShellPythonTypeScript\nShellPythonTypeScript\nShell\nShell\n\nPython\nPython\nTypeScript\nTypeScript\n\n#!/bin/sh\n\nIMAGE_URL=\"https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg\"\nIMAGE_MEDIA_TYPE=\"image/jpeg\"\nIMAGE_BASE64=$(curl \"$IMAGE_URL\" | base64)\n\ncurl https://api.anthropic.com/v1/messages \\\n --header \"x-api-key: $ANTHROPIC_API_KEY\" \\\n --header \"anthropic-version: 2023-06-01\" \\\n --header \"content-type: application/json\" \\\n --data \\\n'{\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"max_tokens\": 1024,\n \"messages\": [\n {\"role\": \"user\", \"content\": [\n {\"type\": \"image\", \"source\": {\n \"type\": \"base64\",\n \"media_type\": \"'$IMAGE_MEDIA_TYPE'\",\n \"data\": \"'$IMAGE_BASE64'\"\n }},\n {\"type\": \"text\", \"text\": \"What is in the above image?\"}\n ]}\n ]\n}'\n#!/bin/sh\n\nIMAGE_URL=\"https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg\"\nIMAGE_MEDIA_TYPE=\"image/jpeg\"\nIMAGE_BASE64=$(curl \"$IMAGE_URL\" | base64)\n\ncurl https://api.anthropic.com/v1/messages \\\n --header \"x-api-key: $ANTHROPIC_API_KEY\" \\\n --header \"anthropic-version: 
2023-06-01\" \\\n --header \"content-type: application/json\" \\\n --data \\\n'{\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"max_tokens\": 1024,\n \"messages\": [\n {\"role\": \"user\", \"content\": [\n {\"type\": \"image\", \"source\": {\n \"type\": \"base64\",\n \"media_type\": \"'$IMAGE_MEDIA_TYPE'\",\n \"data\": \"'$IMAGE_BASE64'\"\n }},\n {\"type\": \"text\", \"text\": \"What is in the above image?\"}\n ]}\n ]\n}'\n#!/bin/sh\n\nIMAGE_URL=\"https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg\"\nIMAGE_MEDIA_TYPE=\"image/jpeg\"\nIMAGE_BASE64=$(curl \"$IMAGE_URL\" | base64)\n\ncurl https://api.anthropic.com/v1/messages \\\n --header \"x-api-key: $ANTHROPIC_API_KEY\" \\\n --header \"anthropic-version: 2023-06-01\" \\\n --header \"content-type: application/json\" \\\n --data \\\n'{\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"max_tokens\": 1024,\n \"messages\": [\n {\"role\": \"user\", \"content\": [\n {\"type\": \"image\", \"source\": {\n \"type\": \"base64\",\n \"media_type\": \"'$IMAGE_MEDIA_TYPE'\",\n \"data\": \"'$IMAGE_BASE64'\"\n }},\n {\"type\": \"text\", \"text\": \"What is in the above image?\"}\n ]}\n ]\n}'\n```\n#!/bin/sh\n\nIMAGE_URL=\"https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg\"\nIMAGE_MEDIA_TYPE=\"image/jpeg\"\nIMAGE_BASE64=$(curl \"$IMAGE_URL\" | base64)\n\ncurl https://api.anthropic.com/v1/messages \\\n --header \"x-api-key: $ANTHROPIC_API_KEY\" \\\n --header \"anthropic-version: 2023-06-01\" \\\n --header \"content-type: application/json\" \\\n --data \\\n'{\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"max_tokens\": 1024,\n \"messages\": [\n {\"role\": \"user\", \"content\": [\n {\"type\": \"image\", \"source\": {\n \"type\": \"base64\",\n \"media_type\": \"'$IMAGE_MEDIA_TYPE'\",\n \"data\": \"'$IMAGE_BASE64'\"\n }},\n {\"type\": \"text\", \"text\": \"What is in the above image?\"}\n ]}\n ]\n}'\n\n```\nJSON{\n \"id\": \"msg_01EcyWo6m4hyW8KHs2y2pei5\",\n 
\"type\": \"message\",\n \"role\": \"assistant\",\n \"content\": [\n {\n \"type\": \"text\",\n \"text\": \"This image shows an ant, specifically a close-up view of an ant. The ant is shown in detail, with its distinct head, antennae, and legs clearly visible. The image is focused on capturing the intricate details and features of the ant, likely taken with a macro lens to get an extreme close-up perspective.\"\n }\n ],\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"stop_reason\": \"end_turn\",\n \"stop_sequence\": null,\n \"usage\": {\n \"input_tokens\": 1551,\n \"output_tokens\": 71\n }\n}\nJSON\nJSON\n\n{\n \"id\": \"msg_01EcyWo6m4hyW8KHs2y2pei5\",\n \"type\": \"message\",\n \"role\": \"assistant\",\n \"content\": [\n {\n \"type\": \"text\",\n \"text\": \"This image shows an ant, specifically a close-up view of an ant. The ant is shown in detail, with its distinct head, antennae, and legs clearly visible. The image is focused on capturing the intricate details and features of the ant, likely taken with a macro lens to get an extreme close-up perspective.\"\n }\n ],\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"stop_reason\": \"end_turn\",\n \"stop_sequence\": null,\n \"usage\": {\n \"input_tokens\": 1551,\n \"output_tokens\": 71\n }\n}\n{\n \"id\": \"msg_01EcyWo6m4hyW8KHs2y2pei5\",\n \"type\": \"message\",\n \"role\": \"assistant\",\n \"content\": [\n {\n \"type\": \"text\",\n \"text\": \"This image shows an ant, specifically a close-up view of an ant. The ant is shown in detail, with its distinct head, antennae, and legs clearly visible. 
The image is focused on capturing the intricate details and features of the ant, likely taken with a macro lens to get an extreme close-up perspective.\"\n }\n ],\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"stop_reason\": \"end_turn\",\n \"stop_sequence\": null,\n \"usage\": {\n \"input_tokens\": 1551,\n \"output_tokens\": 71\n }\n}\n```\n{\n \"id\": \"msg_01EcyWo6m4hyW8KHs2y2pei5\",\n \"type\": \"message\",\n \"role\": \"assistant\",\n \"content\": [\n {\n \"type\": \"text\",\n \"text\": \"This image shows an ant, specifically a close-up view of an ant. The ant is shown in detail, with its distinct head, antennae, and legs clearly visible. The image is focused on capturing the intricate details and features of the ant, likely taken with a macro lens to get an extreme close-up perspective.\"\n }\n ],\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"stop_reason\": \"end_turn\",\n \"stop_sequence\": null,\n \"usage\": {\n \"input_tokens\": 1551,\n \"output_tokens\": 71\n }\n}\n\n```\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are some helpful resources provided by Anthropic to dive deeper into building with images using Claude?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Dive deeper into vision\n\nDive deeper into vision\n\n\nReady to start building with images using Claude? Here are a few helpful resources:\nMultimodal cookbook: This cookbook has tips on getting started with images and best practice techniques to ensure the highest quality performance with images. See how you can effectively prompt Claude with images to carry out tasks such as interpreting and analyzing charts or extracting content from forms.\nAPI reference: Visit our documentation for the Messages API, including example API calls involving images.\nIf you have any other questions, feel free to reach out to our support team. You can also join our developer community to connect with other creators and get help from Anthropic experts.\nGoogle Sheets add-onTool use (function calling)xlinkedin\nGoogle Sheets add-onTool use (function calling)\nxlinkedin\nHow to use vision Before you upload Evaluate image size Calculate image costs Ensuring image quality Prompt examples About the prompt examples Limitations FAQ Dive deeper into vision\nHow to use visionBefore you uploadEvaluate image sizeCalculate image costsEnsuring image qualityPrompt examplesAbout the prompt examplesLimitationsFAQDive deeper into vision\n \n \n\n \n Claude Cookbooks\n\nClaude Cookbooks\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to 
create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n \n \n\n \n Vision\n\nVision\n\n\nClaude can read both text and images in requests. Currently, we support the base64 source type for images, and the image/jpeg, image/png, image/gif, and image/webp media types. 
See our vision guide for more details.\nShell Python TypeScript #!/bin/sh IMAGE_URL = \"https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg\" IMAGE_MEDIA_TYPE = \"image/jpeg\" IMAGE_BASE64 = $( curl \" $IMAGE_URL \" | base64 ) curl https://api.anthropic.com/v1/messages \\ --header \"x-api-key: $ANTHROPIC_API_KEY \" \\ --header \"anthropic-version: 2023-06-01\" \\ --header \"content-type: application/json\" \\ --data \\ '{\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"max_tokens\": 1024,\n \"messages\": [\n {\"role\": \"user\", \"content\": [\n {\"type\": \"image\", \"source\": {\n \"type\": \"base64\",\n \"media_type\": \"' $IMAGE_MEDIA_TYPE '\",\n \"data\": \"' $IMAGE_BASE64 '\"\n }},\n {\"type\": \"text\", \"text\": \"What is in the above image?\"}\n ]}\n ]\n}'\nShellPythonTypeScript\nShellPythonTypeScript\nShell\nShell\n\nPython\nPython\nTypeScript\nTypeScript\n\n#!/bin/sh\n\nIMAGE_URL=\"https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg\"\nIMAGE_MEDIA_TYPE=\"image/jpeg\"\nIMAGE_BASE64=$(curl \"$IMAGE_URL\" | base64)\n\ncurl https://api.anthropic.com/v1/messages \\\n --header \"x-api-key: $ANTHROPIC_API_KEY\" \\\n --header \"anthropic-version: 2023-06-01\" \\\n --header \"content-type: application/json\" \\\n --data \\\n'{\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"max_tokens\": 1024,\n \"messages\": [\n {\"role\": \"user\", \"content\": [\n {\"type\": \"image\", \"source\": {\n \"type\": \"base64\",\n \"media_type\": \"'$IMAGE_MEDIA_TYPE'\",\n \"data\": \"'$IMAGE_BASE64'\"\n }},\n {\"type\": \"text\", \"text\": \"What is in the above image?\"}\n ]}\n ]\n}'\n#!/bin/sh\n\nIMAGE_URL=\"https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg\"\nIMAGE_MEDIA_TYPE=\"image/jpeg\"\nIMAGE_BASE64=$(curl \"$IMAGE_URL\" | base64)\n\ncurl https://api.anthropic.com/v1/messages \\\n --header \"x-api-key: $ANTHROPIC_API_KEY\" \\\n --header \"anthropic-version: 
2023-06-01\" \\\n --header \"content-type: application/json\" \\\n --data \\\n'{\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"max_tokens\": 1024,\n \"messages\": [\n {\"role\": \"user\", \"content\": [\n {\"type\": \"image\", \"source\": {\n \"type\": \"base64\",\n \"media_type\": \"'$IMAGE_MEDIA_TYPE'\",\n \"data\": \"'$IMAGE_BASE64'\"\n }},\n {\"type\": \"text\", \"text\": \"What is in the above image?\"}\n ]}\n ]\n}'\n#!/bin/sh\n\nIMAGE_URL=\"https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg\"\nIMAGE_MEDIA_TYPE=\"image/jpeg\"\nIMAGE_BASE64=$(curl \"$IMAGE_URL\" | base64)\n\ncurl https://api.anthropic.com/v1/messages \\\n --header \"x-api-key: $ANTHROPIC_API_KEY\" \\\n --header \"anthropic-version: 2023-06-01\" \\\n --header \"content-type: application/json\" \\\n --data \\\n'{\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"max_tokens\": 1024,\n \"messages\": [\n {\"role\": \"user\", \"content\": [\n {\"type\": \"image\", \"source\": {\n \"type\": \"base64\",\n \"media_type\": \"'$IMAGE_MEDIA_TYPE'\",\n \"data\": \"'$IMAGE_BASE64'\"\n }},\n {\"type\": \"text\", \"text\": \"What is in the above image?\"}\n ]}\n ]\n}'\n```\n#!/bin/sh\n\nIMAGE_URL=\"https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg\"\nIMAGE_MEDIA_TYPE=\"image/jpeg\"\nIMAGE_BASE64=$(curl \"$IMAGE_URL\" | base64)\n\ncurl https://api.anthropic.com/v1/messages \\\n --header \"x-api-key: $ANTHROPIC_API_KEY\" \\\n --header \"anthropic-version: 2023-06-01\" \\\n --header \"content-type: application/json\" \\\n --data \\\n'{\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"max_tokens\": 1024,\n \"messages\": [\n {\"role\": \"user\", \"content\": [\n {\"type\": \"image\", \"source\": {\n \"type\": \"base64\",\n \"media_type\": \"'$IMAGE_MEDIA_TYPE'\",\n \"data\": \"'$IMAGE_BASE64'\"\n }},\n {\"type\": \"text\", \"text\": \"What is in the above image?\"}\n ]}\n ]\n}'\n\n```\nJSON{\n \"id\": \"msg_01EcyWo6m4hyW8KHs2y2pei5\",\n 
\"type\": \"message\",\n \"role\": \"assistant\",\n \"content\": [\n {\n \"type\": \"text\",\n \"text\": \"This image shows an ant, specifically a close-up view of an ant. The ant is shown in detail, with its distinct head, antennae, and legs clearly visible. The image is focused on capturing the intricate details and features of the ant, likely taken with a macro lens to get an extreme close-up perspective.\"\n }\n ],\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"stop_reason\": \"end_turn\",\n \"stop_sequence\": null,\n \"usage\": {\n \"input_tokens\": 1551,\n \"output_tokens\": 71\n }\n}\nJSON\nJSON\n\n{\n \"id\": \"msg_01EcyWo6m4hyW8KHs2y2pei5\",\n \"type\": \"message\",\n \"role\": \"assistant\",\n \"content\": [\n {\n \"type\": \"text\",\n \"text\": \"This image shows an ant, specifically a close-up view of an ant. The ant is shown in detail, with its distinct head, antennae, and legs clearly visible. The image is focused on capturing the intricate details and features of the ant, likely taken with a macro lens to get an extreme close-up perspective.\"\n }\n ],\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"stop_reason\": \"end_turn\",\n \"stop_sequence\": null,\n \"usage\": {\n \"input_tokens\": 1551,\n \"output_tokens\": 71\n }\n}\n{\n \"id\": \"msg_01EcyWo6m4hyW8KHs2y2pei5\",\n \"type\": \"message\",\n \"role\": \"assistant\",\n \"content\": [\n {\n \"type\": \"text\",\n \"text\": \"This image shows an ant, specifically a close-up view of an ant. The ant is shown in detail, with its distinct head, antennae, and legs clearly visible. 
The image is focused on capturing the intricate details and features of the ant, likely taken with a macro lens to get an extreme close-up perspective.\"\n }\n ],\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"stop_reason\": \"end_turn\",\n \"stop_sequence\": null,\n \"usage\": {\n \"input_tokens\": 1551,\n \"output_tokens\": 71\n }\n}\n```\n{\n \"id\": \"msg_01EcyWo6m4hyW8KHs2y2pei5\",\n \"type\": \"message\",\n \"role\": \"assistant\",\n \"content\": [\n {\n \"type\": \"text\",\n \"text\": \"This image shows an ant, specifically a close-up view of an ant. The ant is shown in detail, with its distinct head, antennae, and legs clearly visible. The image is focused on capturing the intricate details and features of the ant, likely taken with a macro lens to get an extreme close-up perspective.\"\n }\n ],\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"stop_reason\": \"end_turn\",\n \"stop_sequence\": null,\n \"usage\": {\n \"input_tokens\": 1551,\n \"output_tokens\": 71\n }\n}\n\n```\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_level_three"
},
"vars": {
@@ -28514,7 +28514,7 @@
"label": "3.5 Sonnet: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are some helpful resources provided by Anthropic to dive deeper into building with images using Claude?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Prompt examples\n\nText\n Prompt examples\n\n\nMany of the prompting techniques that work well for text-based interactions with Claude can also be applied to image-based prompts.\nThese examples demonstrate best practice prompt structures involving images.\nJust as with document-query placement, Claude works best when images come before text. Images placed after text or interpolated with text will still perform well, but if your use case allows it, we recommend an image-then-text structure.\nJust as with document-query placement, Claude works best when images come before text. Images placed after text or interpolated with text will still perform well, but if your use case allows it, we recommend an image-then-text structure.\n\nJust as with document-query placement, Claude works best when images come before text. Images placed after text or interpolated with text will still perform well, but if your use case allows it, we recommend an image-then-text structure.\n \n\nSummary: \n Prompt examples demonstrate that many text-based techniques can be applied to image-based prompts with Claude. The model works best when images are placed before text, but images after text or interspersed with text will also perform well. Anthropic recommends an image-then-text structure if the use case allows it. \n \n\n \n Dive deeper into vision\n\nText\n Dive deeper into vision\n\n\nReady to start building with images using Claude? Here are a few helpful resources:\nMultimodal cookbook: This cookbook has tips on getting started with images and best practice techniques to ensure the highest quality performance with images. 
See how you can effectively prompt Claude with images to carry out tasks such as interpreting and analyzing charts or extracting content from forms.\nAPI reference: Visit our documentation for the Messages API, including example API calls involving images.\nIf you have any other questions, feel free to reach out to our support team. You can also join our developer community to connect with other creators and get help from Anthropic experts.\nGoogle Sheets add-onTool use (function calling)xlinkedin\nGoogle Sheets add-onTool use (function calling)\nxlinkedin\nHow to use vision Before you upload Evaluate image size Calculate image costs Ensuring image quality Prompt examples About the prompt examples Limitations FAQ Dive deeper into vision\nHow to use visionBefore you uploadEvaluate image sizeCalculate image costsEnsuring image qualityPrompt examplesAbout the prompt examplesLimitationsFAQDive deeper into vision\n \n\nSummary: \n This documentation covers resources for using images with the Claude AI model, including a multimodal cookbook with tips on effective prompting, an API reference for the Messages API, and information on image size, costs, and quality. It also provides prompt examples and addresses limitations and FAQs around using vision capabilities. 
\n \n\n \n Develop with Claude\n\nText\n Develop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n\nSummary: \n Anthropic provides a suite of developer tools, including a browser-based Workbench and prompt generator, API reference documentation, and interactive Jupyter notebooks, to help developers build scalable applications with the Claude AI model. These tools enable easier, more powerful prompting, exploration and implementation of the Claude API and SDKs, and learning through interactive demonstrations. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are some helpful resources provided by Anthropic to dive deeper into building with images using Claude?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Prompt examples\n\nText\n Prompt examples\n\n\nMany of the prompting techniques that work well for text-based interactions with Claude can also be applied to image-based prompts.\nThese examples demonstrate best practice prompt structures involving images.\nJust as with document-query placement, Claude works best when images come before text. Images placed after text or interpolated with text will still perform well, but if your use case allows it, we recommend an image-then-text structure.\nJust as with document-query placement, Claude works best when images come before text. Images placed after text or interpolated with text will still perform well, but if your use case allows it, we recommend an image-then-text structure.\n\nJust as with document-query placement, Claude works best when images come before text. Images placed after text or interpolated with text will still perform well, but if your use case allows it, we recommend an image-then-text structure.\n \n\nSummary: \n Prompt examples demonstrate that many text-based techniques can be applied to image-based prompts with Claude. The model works best when images are placed before text, but images after text or interspersed with text will also perform well. Anthropic recommends an image-then-text structure if the use case allows it. \n \n\n \n Dive deeper into vision\n\nText\n Dive deeper into vision\n\n\nReady to start building with images using Claude? Here are a few helpful resources:\nMultimodal cookbook: This cookbook has tips on getting started with images and best practice techniques to ensure the highest quality performance with images. 
See how you can effectively prompt Claude with images to carry out tasks such as interpreting and analyzing charts or extracting content from forms.\nAPI reference: Visit our documentation for the Messages API, including example API calls involving images.\nIf you have any other questions, feel free to reach out to our support team. You can also join our developer community to connect with other creators and get help from Anthropic experts.\nGoogle Sheets add-onTool use (function calling)xlinkedin\nGoogle Sheets add-onTool use (function calling)\nxlinkedin\nHow to use vision Before you upload Evaluate image size Calculate image costs Ensuring image quality Prompt examples About the prompt examples Limitations FAQ Dive deeper into vision\nHow to use visionBefore you uploadEvaluate image sizeCalculate image costsEnsuring image qualityPrompt examplesAbout the prompt examplesLimitationsFAQDive deeper into vision\n \n\nSummary: \n This documentation covers resources for using images with the Claude AI model, including a multimodal cookbook with tips on effective prompting, an API reference for the Messages API, and information on image size, costs, and quality. It also provides prompt examples and addresses limitations and FAQs around using vision capabilities. 
\n \n\n \n Develop with Claude\n\nText\n Develop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n\nSummary: \n Anthropic provides a suite of developer tools, including a browser-based Workbench and prompt generator, API reference documentation, and interactive Jupyter notebooks, to help developers build scalable applications with the Claude AI model. These tools enable easier, more powerful prompting, exploration and implementation of the Claude API and SDKs, and learning through interactive demonstrations. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_level_two"
},
"vars": {
@@ -28522,7 +28522,7 @@
"correct_answer": "Anthropic provides a multimodal cookbook with tips on getting started with images and best practices, as well as API reference documentation for the Messages API that includes example API calls involving images."
},
"response": {
- "output": "Anthropic provides several helpful resources for diving deeper into building with images using Claude:\n\n1. Multimodal cookbook: This contains tips on getting started with images and best practices for using Claude with visual inputs. It demonstrates techniques for tasks like interpreting charts and extracting content from forms.\n\n2. API reference: The documentation for the Messages API includes example API calls involving images.\n\n3. Developer Console: This browser-based tool includes a Workbench and prompt generator to help with image-based prompting.\n\n4. Claude Cookbook: This resource offers interactive Jupyter notebooks that demonstrate various capabilities, which may include image-related functionalities.\n\n5. Developer community: Anthropic has a community where developers can connect with each other and get help from Anthropic experts on topics like working with images.\n\n6. Support team: Anthropic's support team is available to answer questions about using Claude with images.\n\nAdditionally, the documentation provides guidance on image size evaluation, cost calculation, ensuring image quality, and offers prompt examples specifically for working with images. There's also information on limitations and FAQs related to Claude's vision capabilities.",
+ "output": "Anthropic provides several helpful resources for diving deeper into building with images using Claude:\n\n1. Multimodal cookbook: This contains tips on getting started with images and best practices for using Claude with visual inputs. It demonstrates techniques for tasks like interpreting charts and extracting content from forms.\n\n2. API reference: The documentation for the Messages API includes example API calls involving images.\n\n3. Developer Console: This browser-based tool includes a Workbench and prompt generator to help with image-based prompting.\n\n4. Claude Cookbooks: This resource offers interactive Jupyter notebooks that demonstrate various capabilities, which may include image-related functionalities.\n\n5. Developer community: Anthropic has a community where developers can connect with each other and get help from Anthropic experts on topics like working with images.\n\n6. Support team: Anthropic's support team is available to answer questions about using Claude with images.\n\nAdditionally, the documentation provides guidance on image size evaluation, cost calculation, ensuring image quality, and offers prompt examples specifically for working with images. There's also information on limitations and FAQs related to Claude's vision capabilities.",
"tokenUsage": {
"total": 1392,
"prompt": 1149,
@@ -28549,7 +28549,7 @@
{
"pass": true,
"score": 1,
- "reason": "The Generated Answer is essentially correct, but it includes additional information not mentioned in the Correct Answer. The key points from the Correct Answer - the multimodal cookbook with tips and best practices, and the API reference documentation with example API calls involving images - are both included in the Generated Answer. However, the Generated Answer goes beyond this by mentioning several other resources like the Developer Console, Claude Cookbook, developer community, and support team. While these additional resources may be helpful, they weren't specified in the Correct Answer. Despite this extra information, the Generated Answer doesn't contradict the Correct Answer or omit any critical information from it. Therefore, the substance of the Generated Answer aligns with the Correct Answer.",
+ "reason": "The Generated Answer is essentially correct, but it includes additional information not mentioned in the Correct Answer. The key points from the Correct Answer - the multimodal cookbook with tips and best practices, and the API reference documentation with example API calls involving images - are both included in the Generated Answer. However, the Generated Answer goes beyond this by mentioning several other resources like the Developer Console, Claude Cookbooks, developer community, and support team. While these additional resources may be helpful, they weren't specified in the Correct Answer. Despite this extra information, the Generated Answer doesn't contradict the Correct Answer or omit any critical information from it. Therefore, the substance of the Generated Answer aligns with the Correct Answer.",
"assertion": {
"type": "python",
"value": "file://eval_end_to_end.py"
@@ -28611,7 +28611,7 @@
"label": "3.5 Sonnet: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are some helpful resources provided by Anthropic to dive deeper into building with images using Claude?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Dive deeper into vision\n\nDive deeper into vision\n\n\nReady to start building with images using Claude? Here are a few helpful resources:\nMultimodal cookbook: This cookbook has tips on getting started with images and best practice techniques to ensure the highest quality performance with images. See how you can effectively prompt Claude with images to carry out tasks such as interpreting and analyzing charts or extracting content from forms.\nAPI reference: Visit our documentation for the Messages API, including example API calls involving images.\nIf you have any other questions, feel free to reach out to our support team. You can also join our developer community to connect with other creators and get help from Anthropic experts.\nGoogle Sheets add-onTool use (function calling)xlinkedin\nGoogle Sheets add-onTool use (function calling)\nxlinkedin\nHow to use vision Before you upload Evaluate image size Calculate image costs Ensuring image quality Prompt examples About the prompt examples Limitations FAQ Dive deeper into vision\nHow to use visionBefore you uploadEvaluate image sizeCalculate image costsEnsuring image qualityPrompt examplesAbout the prompt examplesLimitationsFAQDive deeper into vision\n \n \n\n \n Claude Cookbook\n\nClaude Cookbook\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create 
and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n \n \n\n \n Vision\n\nVision\n\n\nClaude can read both text and images in requests. Currently, we support the base64 source type for images, and the image/jpeg, image/png, image/gif, and image/webp media types. 
See our vision guide for more details.\nShell Python TypeScript #!/bin/sh IMAGE_URL = \"https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg\" IMAGE_MEDIA_TYPE = \"image/jpeg\" IMAGE_BASE64 = $( curl \" $IMAGE_URL \" | base64 ) curl https://api.anthropic.com/v1/messages \\ --header \"x-api-key: $ANTHROPIC_API_KEY \" \\ --header \"anthropic-version: 2023-06-01\" \\ --header \"content-type: application/json\" \\ --data \\ '{\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"max_tokens\": 1024,\n \"messages\": [\n {\"role\": \"user\", \"content\": [\n {\"type\": \"image\", \"source\": {\n \"type\": \"base64\",\n \"media_type\": \"' $IMAGE_MEDIA_TYPE '\",\n \"data\": \"' $IMAGE_BASE64 '\"\n }},\n {\"type\": \"text\", \"text\": \"What is in the above image?\"}\n ]}\n ]\n}'\nShellPythonTypeScript\nShellPythonTypeScript\nShell\nShell\n\nPython\nPython\nTypeScript\nTypeScript\n\n#!/bin/sh\n\nIMAGE_URL=\"https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg\"\nIMAGE_MEDIA_TYPE=\"image/jpeg\"\nIMAGE_BASE64=$(curl \"$IMAGE_URL\" | base64)\n\ncurl https://api.anthropic.com/v1/messages \\\n --header \"x-api-key: $ANTHROPIC_API_KEY\" \\\n --header \"anthropic-version: 2023-06-01\" \\\n --header \"content-type: application/json\" \\\n --data \\\n'{\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"max_tokens\": 1024,\n \"messages\": [\n {\"role\": \"user\", \"content\": [\n {\"type\": \"image\", \"source\": {\n \"type\": \"base64\",\n \"media_type\": \"'$IMAGE_MEDIA_TYPE'\",\n \"data\": \"'$IMAGE_BASE64'\"\n }},\n {\"type\": \"text\", \"text\": \"What is in the above image?\"}\n ]}\n ]\n}'\n#!/bin/sh\n\nIMAGE_URL=\"https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg\"\nIMAGE_MEDIA_TYPE=\"image/jpeg\"\nIMAGE_BASE64=$(curl \"$IMAGE_URL\" | base64)\n\ncurl https://api.anthropic.com/v1/messages \\\n --header \"x-api-key: $ANTHROPIC_API_KEY\" \\\n --header \"anthropic-version: 
2023-06-01\" \\\n --header \"content-type: application/json\" \\\n --data \\\n'{\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"max_tokens\": 1024,\n \"messages\": [\n {\"role\": \"user\", \"content\": [\n {\"type\": \"image\", \"source\": {\n \"type\": \"base64\",\n \"media_type\": \"'$IMAGE_MEDIA_TYPE'\",\n \"data\": \"'$IMAGE_BASE64'\"\n }},\n {\"type\": \"text\", \"text\": \"What is in the above image?\"}\n ]}\n ]\n}'\n#!/bin/sh\n\nIMAGE_URL=\"https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg\"\nIMAGE_MEDIA_TYPE=\"image/jpeg\"\nIMAGE_BASE64=$(curl \"$IMAGE_URL\" | base64)\n\ncurl https://api.anthropic.com/v1/messages \\\n --header \"x-api-key: $ANTHROPIC_API_KEY\" \\\n --header \"anthropic-version: 2023-06-01\" \\\n --header \"content-type: application/json\" \\\n --data \\\n'{\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"max_tokens\": 1024,\n \"messages\": [\n {\"role\": \"user\", \"content\": [\n {\"type\": \"image\", \"source\": {\n \"type\": \"base64\",\n \"media_type\": \"'$IMAGE_MEDIA_TYPE'\",\n \"data\": \"'$IMAGE_BASE64'\"\n }},\n {\"type\": \"text\", \"text\": \"What is in the above image?\"}\n ]}\n ]\n}'\n```\n#!/bin/sh\n\nIMAGE_URL=\"https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg\"\nIMAGE_MEDIA_TYPE=\"image/jpeg\"\nIMAGE_BASE64=$(curl \"$IMAGE_URL\" | base64)\n\ncurl https://api.anthropic.com/v1/messages \\\n --header \"x-api-key: $ANTHROPIC_API_KEY\" \\\n --header \"anthropic-version: 2023-06-01\" \\\n --header \"content-type: application/json\" \\\n --data \\\n'{\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"max_tokens\": 1024,\n \"messages\": [\n {\"role\": \"user\", \"content\": [\n {\"type\": \"image\", \"source\": {\n \"type\": \"base64\",\n \"media_type\": \"'$IMAGE_MEDIA_TYPE'\",\n \"data\": \"'$IMAGE_BASE64'\"\n }},\n {\"type\": \"text\", \"text\": \"What is in the above image?\"}\n ]}\n ]\n}'\n\n```\nJSON{\n \"id\": \"msg_01EcyWo6m4hyW8KHs2y2pei5\",\n 
\"type\": \"message\",\n \"role\": \"assistant\",\n \"content\": [\n {\n \"type\": \"text\",\n \"text\": \"This image shows an ant, specifically a close-up view of an ant. The ant is shown in detail, with its distinct head, antennae, and legs clearly visible. The image is focused on capturing the intricate details and features of the ant, likely taken with a macro lens to get an extreme close-up perspective.\"\n }\n ],\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"stop_reason\": \"end_turn\",\n \"stop_sequence\": null,\n \"usage\": {\n \"input_tokens\": 1551,\n \"output_tokens\": 71\n }\n}\nJSON\nJSON\n\n{\n \"id\": \"msg_01EcyWo6m4hyW8KHs2y2pei5\",\n \"type\": \"message\",\n \"role\": \"assistant\",\n \"content\": [\n {\n \"type\": \"text\",\n \"text\": \"This image shows an ant, specifically a close-up view of an ant. The ant is shown in detail, with its distinct head, antennae, and legs clearly visible. The image is focused on capturing the intricate details and features of the ant, likely taken with a macro lens to get an extreme close-up perspective.\"\n }\n ],\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"stop_reason\": \"end_turn\",\n \"stop_sequence\": null,\n \"usage\": {\n \"input_tokens\": 1551,\n \"output_tokens\": 71\n }\n}\n{\n \"id\": \"msg_01EcyWo6m4hyW8KHs2y2pei5\",\n \"type\": \"message\",\n \"role\": \"assistant\",\n \"content\": [\n {\n \"type\": \"text\",\n \"text\": \"This image shows an ant, specifically a close-up view of an ant. The ant is shown in detail, with its distinct head, antennae, and legs clearly visible. 
The image is focused on capturing the intricate details and features of the ant, likely taken with a macro lens to get an extreme close-up perspective.\"\n }\n ],\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"stop_reason\": \"end_turn\",\n \"stop_sequence\": null,\n \"usage\": {\n \"input_tokens\": 1551,\n \"output_tokens\": 71\n }\n}\n```\n{\n \"id\": \"msg_01EcyWo6m4hyW8KHs2y2pei5\",\n \"type\": \"message\",\n \"role\": \"assistant\",\n \"content\": [\n {\n \"type\": \"text\",\n \"text\": \"This image shows an ant, specifically a close-up view of an ant. The ant is shown in detail, with its distinct head, antennae, and legs clearly visible. The image is focused on capturing the intricate details and features of the ant, likely taken with a macro lens to get an extreme close-up perspective.\"\n }\n ],\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"stop_reason\": \"end_turn\",\n \"stop_sequence\": null,\n \"usage\": {\n \"input_tokens\": 1551,\n \"output_tokens\": 71\n }\n}\n\n```\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are some helpful resources provided by Anthropic to dive deeper into building with images using Claude?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Dive deeper into vision\n\nDive deeper into vision\n\n\nReady to start building with images using Claude? Here are a few helpful resources:\nMultimodal cookbook: This cookbook has tips on getting started with images and best practice techniques to ensure the highest quality performance with images. See how you can effectively prompt Claude with images to carry out tasks such as interpreting and analyzing charts or extracting content from forms.\nAPI reference: Visit our documentation for the Messages API, including example API calls involving images.\nIf you have any other questions, feel free to reach out to our support team. You can also join our developer community to connect with other creators and get help from Anthropic experts.\nGoogle Sheets add-onTool use (function calling)xlinkedin\nGoogle Sheets add-onTool use (function calling)\nxlinkedin\nHow to use vision Before you upload Evaluate image size Calculate image costs Ensuring image quality Prompt examples About the prompt examples Limitations FAQ Dive deeper into vision\nHow to use visionBefore you uploadEvaluate image sizeCalculate image costsEnsuring image qualityPrompt examplesAbout the prompt examplesLimitationsFAQDive deeper into vision\n \n \n\n \n Claude Cookbooks\n\nClaude Cookbooks\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to 
create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n \n \n\n \n Vision\n\nVision\n\n\nClaude can read both text and images in requests. Currently, we support the base64 source type for images, and the image/jpeg, image/png, image/gif, and image/webp media types. 
See our vision guide for more details.\nShell Python TypeScript #!/bin/sh IMAGE_URL = \"https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg\" IMAGE_MEDIA_TYPE = \"image/jpeg\" IMAGE_BASE64 = $( curl \" $IMAGE_URL \" | base64 ) curl https://api.anthropic.com/v1/messages \\ --header \"x-api-key: $ANTHROPIC_API_KEY \" \\ --header \"anthropic-version: 2023-06-01\" \\ --header \"content-type: application/json\" \\ --data \\ '{\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"max_tokens\": 1024,\n \"messages\": [\n {\"role\": \"user\", \"content\": [\n {\"type\": \"image\", \"source\": {\n \"type\": \"base64\",\n \"media_type\": \"' $IMAGE_MEDIA_TYPE '\",\n \"data\": \"' $IMAGE_BASE64 '\"\n }},\n {\"type\": \"text\", \"text\": \"What is in the above image?\"}\n ]}\n ]\n}'\nShellPythonTypeScript\nShellPythonTypeScript\nShell\nShell\n\nPython\nPython\nTypeScript\nTypeScript\n\n#!/bin/sh\n\nIMAGE_URL=\"https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg\"\nIMAGE_MEDIA_TYPE=\"image/jpeg\"\nIMAGE_BASE64=$(curl \"$IMAGE_URL\" | base64)\n\ncurl https://api.anthropic.com/v1/messages \\\n --header \"x-api-key: $ANTHROPIC_API_KEY\" \\\n --header \"anthropic-version: 2023-06-01\" \\\n --header \"content-type: application/json\" \\\n --data \\\n'{\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"max_tokens\": 1024,\n \"messages\": [\n {\"role\": \"user\", \"content\": [\n {\"type\": \"image\", \"source\": {\n \"type\": \"base64\",\n \"media_type\": \"'$IMAGE_MEDIA_TYPE'\",\n \"data\": \"'$IMAGE_BASE64'\"\n }},\n {\"type\": \"text\", \"text\": \"What is in the above image?\"}\n ]}\n ]\n}'\n#!/bin/sh\n\nIMAGE_URL=\"https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg\"\nIMAGE_MEDIA_TYPE=\"image/jpeg\"\nIMAGE_BASE64=$(curl \"$IMAGE_URL\" | base64)\n\ncurl https://api.anthropic.com/v1/messages \\\n --header \"x-api-key: $ANTHROPIC_API_KEY\" \\\n --header \"anthropic-version: 
2023-06-01\" \\\n --header \"content-type: application/json\" \\\n --data \\\n'{\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"max_tokens\": 1024,\n \"messages\": [\n {\"role\": \"user\", \"content\": [\n {\"type\": \"image\", \"source\": {\n \"type\": \"base64\",\n \"media_type\": \"'$IMAGE_MEDIA_TYPE'\",\n \"data\": \"'$IMAGE_BASE64'\"\n }},\n {\"type\": \"text\", \"text\": \"What is in the above image?\"}\n ]}\n ]\n}'\n#!/bin/sh\n\nIMAGE_URL=\"https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg\"\nIMAGE_MEDIA_TYPE=\"image/jpeg\"\nIMAGE_BASE64=$(curl \"$IMAGE_URL\" | base64)\n\ncurl https://api.anthropic.com/v1/messages \\\n --header \"x-api-key: $ANTHROPIC_API_KEY\" \\\n --header \"anthropic-version: 2023-06-01\" \\\n --header \"content-type: application/json\" \\\n --data \\\n'{\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"max_tokens\": 1024,\n \"messages\": [\n {\"role\": \"user\", \"content\": [\n {\"type\": \"image\", \"source\": {\n \"type\": \"base64\",\n \"media_type\": \"'$IMAGE_MEDIA_TYPE'\",\n \"data\": \"'$IMAGE_BASE64'\"\n }},\n {\"type\": \"text\", \"text\": \"What is in the above image?\"}\n ]}\n ]\n}'\n```\n#!/bin/sh\n\nIMAGE_URL=\"https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg\"\nIMAGE_MEDIA_TYPE=\"image/jpeg\"\nIMAGE_BASE64=$(curl \"$IMAGE_URL\" | base64)\n\ncurl https://api.anthropic.com/v1/messages \\\n --header \"x-api-key: $ANTHROPIC_API_KEY\" \\\n --header \"anthropic-version: 2023-06-01\" \\\n --header \"content-type: application/json\" \\\n --data \\\n'{\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"max_tokens\": 1024,\n \"messages\": [\n {\"role\": \"user\", \"content\": [\n {\"type\": \"image\", \"source\": {\n \"type\": \"base64\",\n \"media_type\": \"'$IMAGE_MEDIA_TYPE'\",\n \"data\": \"'$IMAGE_BASE64'\"\n }},\n {\"type\": \"text\", \"text\": \"What is in the above image?\"}\n ]}\n ]\n}'\n\n```\nJSON{\n \"id\": \"msg_01EcyWo6m4hyW8KHs2y2pei5\",\n 
\"type\": \"message\",\n \"role\": \"assistant\",\n \"content\": [\n {\n \"type\": \"text\",\n \"text\": \"This image shows an ant, specifically a close-up view of an ant. The ant is shown in detail, with its distinct head, antennae, and legs clearly visible. The image is focused on capturing the intricate details and features of the ant, likely taken with a macro lens to get an extreme close-up perspective.\"\n }\n ],\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"stop_reason\": \"end_turn\",\n \"stop_sequence\": null,\n \"usage\": {\n \"input_tokens\": 1551,\n \"output_tokens\": 71\n }\n}\nJSON\nJSON\n\n{\n \"id\": \"msg_01EcyWo6m4hyW8KHs2y2pei5\",\n \"type\": \"message\",\n \"role\": \"assistant\",\n \"content\": [\n {\n \"type\": \"text\",\n \"text\": \"This image shows an ant, specifically a close-up view of an ant. The ant is shown in detail, with its distinct head, antennae, and legs clearly visible. The image is focused on capturing the intricate details and features of the ant, likely taken with a macro lens to get an extreme close-up perspective.\"\n }\n ],\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"stop_reason\": \"end_turn\",\n \"stop_sequence\": null,\n \"usage\": {\n \"input_tokens\": 1551,\n \"output_tokens\": 71\n }\n}\n{\n \"id\": \"msg_01EcyWo6m4hyW8KHs2y2pei5\",\n \"type\": \"message\",\n \"role\": \"assistant\",\n \"content\": [\n {\n \"type\": \"text\",\n \"text\": \"This image shows an ant, specifically a close-up view of an ant. The ant is shown in detail, with its distinct head, antennae, and legs clearly visible. 
The image is focused on capturing the intricate details and features of the ant, likely taken with a macro lens to get an extreme close-up perspective.\"\n }\n ],\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"stop_reason\": \"end_turn\",\n \"stop_sequence\": null,\n \"usage\": {\n \"input_tokens\": 1551,\n \"output_tokens\": 71\n }\n}\n```\n{\n \"id\": \"msg_01EcyWo6m4hyW8KHs2y2pei5\",\n \"type\": \"message\",\n \"role\": \"assistant\",\n \"content\": [\n {\n \"type\": \"text\",\n \"text\": \"This image shows an ant, specifically a close-up view of an ant. The ant is shown in detail, with its distinct head, antennae, and legs clearly visible. The image is focused on capturing the intricate details and features of the ant, likely taken with a macro lens to get an extreme close-up perspective.\"\n }\n ],\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"stop_reason\": \"end_turn\",\n \"stop_sequence\": null,\n \"usage\": {\n \"input_tokens\": 1551,\n \"output_tokens\": 71\n }\n}\n\n```\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_level_three"
},
"vars": {
@@ -29014,7 +29014,7 @@
"label": "Haiku: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are two key benefits of using the Anthropic Evaluation tool when developing prompts for an AI classification application?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Before prompt engineering\n\nText\n Before prompt engineering\n\n\nThis guide assumes that you have:\nA clear definition of the success criteria for your use case\nSome ways to empirically test against those criteria\nA first draft prompt you want to improve\nIf not, we highly suggest you spend time establishing that first. Check out Define your success criteria and Create strong empirical evaluations for tips and guidance.\nPrompt generatorDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n\nPrompt generator\nDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n \n\nSummary: \n This guide assumes you have a clear definition of success criteria, ways to empirically test against those criteria, and a first draft prompt to improve. If not, it suggests spending time establishing those first, and provides a prompt generator in the Claude Console as a starting point. \n \n\n \n Understanding Results\n\nText\n Understanding Results\n\n\nThe Evaluation tool helps you:\nIdentify edge cases where your prompt might falter\nRate individual results to determine cases where your prompt performance better or worse\nEnsure consistent performance across a range of inputs\nRefine your prompt for better reliability\nBy reviewing results across test cases, you can spot patterns and make informed adjustments to your prompt.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! 
If you encounter any issues or have suggestions, please reach out to the Anthropic team.\n\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nStart evaluating your prompts today to build more robust AI applications with Claude!\nReducing latencyGlossaryxlinkedin\nReducing latencyGlossary\nxlinkedin\nAccessing the Evaluate Feature Creating Test Cases Tips for Effective Evaluation Understanding Results\nAccessing the Evaluate FeatureCreating Test CasesTips for Effective EvaluationUnderstanding Results\n \n\nSummary: \n The Evaluation tool helps users identify edge cases, rate individual results, ensure consistent performance, and refine prompts for better reliability. By reviewing results across test cases, users can spot patterns and make informed adjustments to their prompts. The Evaluation tool is currently in beta, and user feedback is valuable for the Anthropic team. \n \n\n \n More Resources\n\nText\n More Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. 
Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbook More Resources\nText capabilities and use casesClaude CookbookMore Resources\n \n\nSummary: \n The Claude Documentation provides a Prompt Engineering Guide to help users master the art of prompt crafting, a Prompt Library with pre-crafted prompts for various tasks, and API Documentation for interacting with the Claude AI model. These resources are designed to help users get the most out of the Claude model, particularly for fine-tuning with legacy models. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are two key benefits of using the Anthropic Evaluation tool when developing prompts for an AI classification application?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Before prompt engineering\n\nText\n Before prompt engineering\n\n\nThis guide assumes that you have:\nA clear definition of the success criteria for your use case\nSome ways to empirically test against those criteria\nA first draft prompt you want to improve\nIf not, we highly suggest you spend time establishing that first. Check out Define your success criteria and Create strong empirical evaluations for tips and guidance.\nPrompt generatorDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n\nPrompt generator\nDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n \n\nSummary: \n This guide assumes you have a clear definition of success criteria, ways to empirically test against those criteria, and a first draft prompt to improve. If not, it suggests spending time establishing those first, and provides a prompt generator in the Claude Console as a starting point. \n \n\n \n Understanding Results\n\nText\n Understanding Results\n\n\nThe Evaluation tool helps you:\nIdentify edge cases where your prompt might falter\nRate individual results to determine cases where your prompt performance better or worse\nEnsure consistent performance across a range of inputs\nRefine your prompt for better reliability\nBy reviewing results across test cases, you can spot patterns and make informed adjustments to your prompt.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! 
If you encounter any issues or have suggestions, please reach out to the Anthropic team.\n\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nStart evaluating your prompts today to build more robust AI applications with Claude!\nReducing latencyGlossaryxlinkedin\nReducing latencyGlossary\nxlinkedin\nAccessing the Evaluate Feature Creating Test Cases Tips for Effective Evaluation Understanding Results\nAccessing the Evaluate FeatureCreating Test CasesTips for Effective EvaluationUnderstanding Results\n \n\nSummary: \n The Evaluation tool helps users identify edge cases, rate individual results, ensure consistent performance, and refine prompts for better reliability. By reviewing results across test cases, users can spot patterns and make informed adjustments to their prompts. The Evaluation tool is currently in beta, and user feedback is valuable for the Anthropic team. \n \n\n \n More Resources\n\nText\n More Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. 
Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbooks More Resources\nText capabilities and use casesClaude CookbooksMore Resources\n \n\nSummary: \n The Claude Documentation provides a Prompt Engineering Guide to help users master the art of prompt crafting, a Prompt Library with pre-crafted prompts for various tasks, and API Documentation for interacting with the Claude AI model. These resources are designed to help users get the most out of the Claude model, particularly for fine-tuning with legacy models. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_level_two"
},
"vars": {
@@ -29065,7 +29065,7 @@
"label": "3.5 Sonnet: T-0.0"
},
"prompt": {
- "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are two key benefits of using the Anthropic Evaluation tool when developing prompts for an AI classification application?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Before prompt engineering\n\nText\n Before prompt engineering\n\n\nThis guide assumes that you have:\nA clear definition of the success criteria for your use case\nSome ways to empirically test against those criteria\nA first draft prompt you want to improve\nIf not, we highly suggest you spend time establishing that first. Check out Define your success criteria and Create strong empirical evaluations for tips and guidance.\nPrompt generatorDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n\nPrompt generator\nDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n \n\nSummary: \n This guide assumes you have a clear definition of success criteria, ways to empirically test against those criteria, and a first draft prompt to improve. If not, it suggests spending time establishing those first, and provides a prompt generator in the Claude Console as a starting point. \n \n\n \n Understanding Results\n\nText\n Understanding Results\n\n\nThe Evaluation tool helps you:\nIdentify edge cases where your prompt might falter\nRate individual results to determine cases where your prompt performance better or worse\nEnsure consistent performance across a range of inputs\nRefine your prompt for better reliability\nBy reviewing results across test cases, you can spot patterns and make informed adjustments to your prompt.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! 
If you encounter any issues or have suggestions, please reach out to the Anthropic team.\n\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nStart evaluating your prompts today to build more robust AI applications with Claude!\nReducing latencyGlossaryxlinkedin\nReducing latencyGlossary\nxlinkedin\nAccessing the Evaluate Feature Creating Test Cases Tips for Effective Evaluation Understanding Results\nAccessing the Evaluate FeatureCreating Test CasesTips for Effective EvaluationUnderstanding Results\n \n\nSummary: \n The Evaluation tool helps users identify edge cases, rate individual results, ensure consistent performance, and refine prompts for better reliability. By reviewing results across test cases, users can spot patterns and make informed adjustments to their prompts. The Evaluation tool is currently in beta, and user feedback is valuable for the Anthropic team. \n \n\n \n More Resources\n\nText\n More Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. 
Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbook More Resources\nText capabilities and use casesClaude CookbookMore Resources\n \n\nSummary: \n The Claude Documentation provides a Prompt Engineering Guide to help users master the art of prompt crafting, a Prompt Library with pre-crafted prompts for various tasks, and API Documentation for interacting with the Claude AI model. These resources are designed to help users get the most out of the Claude model, particularly for fine-tuning with legacy models. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "raw": "\n You have been tasked with helping us to answer the following query: \n \n What are two key benefits of using the Anthropic Evaluation tool when developing prompts for an AI classification application?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Before prompt engineering\n\nText\n Before prompt engineering\n\n\nThis guide assumes that you have:\nA clear definition of the success criteria for your use case\nSome ways to empirically test against those criteria\nA first draft prompt you want to improve\nIf not, we highly suggest you spend time establishing that first. Check out Define your success criteria and Create strong empirical evaluations for tips and guidance.\nPrompt generatorDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n\nPrompt generator\nDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n \n\nSummary: \n This guide assumes you have a clear definition of success criteria, ways to empirically test against those criteria, and a first draft prompt to improve. If not, it suggests spending time establishing those first, and provides a prompt generator in the Claude Console as a starting point. \n \n\n \n Understanding Results\n\nText\n Understanding Results\n\n\nThe Evaluation tool helps you:\nIdentify edge cases where your prompt might falter\nRate individual results to determine cases where your prompt performance better or worse\nEnsure consistent performance across a range of inputs\nRefine your prompt for better reliability\nBy reviewing results across test cases, you can spot patterns and make informed adjustments to your prompt.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! 
If you encounter any issues or have suggestions, please reach out to the Anthropic team.\n\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nStart evaluating your prompts today to build more robust AI applications with Claude!\nReducing latencyGlossaryxlinkedin\nReducing latencyGlossary\nxlinkedin\nAccessing the Evaluate Feature Creating Test Cases Tips for Effective Evaluation Understanding Results\nAccessing the Evaluate FeatureCreating Test CasesTips for Effective EvaluationUnderstanding Results\n \n\nSummary: \n The Evaluation tool helps users identify edge cases, rate individual results, ensure consistent performance, and refine prompts for better reliability. By reviewing results across test cases, users can spot patterns and make informed adjustments to their prompts. The Evaluation tool is currently in beta, and user feedback is valuable for the Anthropic team. \n \n\n \n More Resources\n\nText\n More Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. 
Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbooks More Resources\nText capabilities and use casesClaude CookbooksMore Resources\n \n\nSummary: \n The Claude Documentation provides a Prompt Engineering Guide to help users master the art of prompt crafting, a Prompt Library with pre-crafted prompts for various tasks, and API Documentation for interacting with the Claude AI model. These resources are designed to help users get the most out of the Claude model, particularly for fine-tuning with legacy models. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"label": "prompts.py:answer_query_level_two"
},
"vars": {
@@ -32252,7 +32252,7 @@
"score": 1,
"namedScores": {},
"text": "To use Claude to more easily digest the content of long PDF documents, you can follow these steps:\n\n1. Upload the PDF document to your application or environment where you can interact with Claude.\n\n2. Utilize Claude's text summarization capabilities to distill the key insights and main points from the lengthy PDF content. This will allow you to quickly understand the core information without having to read through the entire document.\n\n3. Optionally, you can also leverage Claude's question answering abilities to ask specific questions about the PDF content and get concise, relevant responses.\n\n4. If you need to extract structured data or insights from the PDF, you can use Claude's data/entity extraction capabilities to uncover relevant information, such as key facts, figures, or named entities.\n\n5. For advanced use cases, you can explore integrating external tools and functions with Claude, as described in the \"Tool Use & Function Calling\" section, to further extend its capabilities for working with PDF documents.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n How can I use Claude to more easily digest the content of long PDF documents?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nControlling Claude’s output\n\n\n\n\nText capabilities and use cases\n\n\nClaude has a broad range of text-based capabilities, including but not limited to:\nCapabilityThis enables you to…Text SummarizationDistill lengthy content into key insights for executives, social media, or product teams.Content GenerationCraft compelling content from blog posts and emails to marketing slogans and product descriptions.Data / Entity ExtractionUncover structured insights from unstructured text like reviews, news articles, or transcripts.Question AnsweringBuild intelligent, interactive systems from customer support chatbots to educational AI tutors.Text TranslationSeamlessly communicate across languages in products, support, and content creation.Text Analysis & RecommendationsUnderstand sentiment, preferences, and patterns to personalize user experiences and offerings.Dialogue and ConversationCreate engaging, context-aware interactions in games, virtual assistants, and storytelling apps.Code Explanation & GenerationAccelerate development with instant code reviews, boilerplate generation, and interactive tutorials.\n\n\nClaude Cookbook\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their 
content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n How can I use Claude to more easily digest the content of long PDF documents?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nControlling Claude’s output\n\n\n\n\nText capabilities and use cases\n\n\nClaude has a broad range of text-based capabilities, including but not limited to:\nCapabilityThis enables you to…Text SummarizationDistill lengthy content into key insights for executives, social media, or product teams.Content GenerationCraft compelling content from blog posts and emails to marketing slogans and product descriptions.Data / Entity ExtractionUncover structured insights from unstructured text like reviews, news articles, or transcripts.Question AnsweringBuild intelligent, interactive systems from customer support chatbots to educational AI tutors.Text TranslationSeamlessly communicate across languages in products, support, and content creation.Text Analysis & RecommendationsUnderstand sentiment, preferences, and patterns to personalize user experiences and offerings.Dialogue and ConversationCreate engaging, context-aware interactions in games, virtual assistants, and storytelling apps.Code Explanation & GenerationAccelerate development with instant code reviews, boilerplate generation, and interactive tutorials.\n\n\nClaude Cookbooks\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize 
their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "Haiku: T-0.0",
"latencyMs": 2,
"tokenUsage": {},
@@ -32285,8 +32285,8 @@
"pass": true,
"score": 1,
"namedScores": {},
- "text": "To more easily digest the content of long PDF documents using Claude, you can take advantage of Claude's PDF upload and summarization capabilities. Here's how you can do it:\n\n1. Upload the PDF document to Claude. The system allows you to directly upload PDFs for Claude to analyze.\n\n2. Once uploaded, you can ask Claude to summarize the content of the PDF. Claude has the ability to distill lengthy content into key insights, making it easier for you to grasp the main points without reading the entire document.\n\n3. You can also ask specific questions about the content of the PDF. Claude's question-answering capabilities allow you to extract particular information or clarify certain points from the document.\n\n4. For even more structured insights, you can use Claude's data and entity extraction abilities to pull out specific types of information from the PDF, such as names, dates, or key statistics.\n\n5. If needed, you can ask Claude to provide analysis or recommendations based on the PDF content, helping you to understand sentiment, patterns, or implications within the document.\n\nBy leveraging these capabilities, you can quickly get the essence of long PDF documents, saving time and improving your understanding of the material. The Claude Cookbook mentioned in the context provides practical examples and tutorials on how to use these features effectively.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n How can I use Claude to more easily digest the content of long PDF documents?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nControlling Claude’s output\n\n\n\n\nText capabilities and use cases\n\n\nClaude has a broad range of text-based capabilities, including but not limited to:\nCapabilityThis enables you to…Text SummarizationDistill lengthy content into key insights for executives, social media, or product teams.Content GenerationCraft compelling content from blog posts and emails to marketing slogans and product descriptions.Data / Entity ExtractionUncover structured insights from unstructured text like reviews, news articles, or transcripts.Question AnsweringBuild intelligent, interactive systems from customer support chatbots to educational AI tutors.Text TranslationSeamlessly communicate across languages in products, support, and content creation.Text Analysis & RecommendationsUnderstand sentiment, preferences, and patterns to personalize user experiences and offerings.Dialogue and ConversationCreate engaging, context-aware interactions in games, virtual assistants, and storytelling apps.Code Explanation & GenerationAccelerate development with instant code reviews, boilerplate generation, and interactive tutorials.\n\n\nClaude Cookbook\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their 
content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "text": "To more easily digest the content of long PDF documents using Claude, you can take advantage of Claude's PDF upload and summarization capabilities. Here's how you can do it:\n\n1. Upload the PDF document to Claude. The system allows you to directly upload PDFs for Claude to analyze.\n\n2. Once uploaded, you can ask Claude to summarize the content of the PDF. Claude has the ability to distill lengthy content into key insights, making it easier for you to grasp the main points without reading the entire document.\n\n3. You can also ask specific questions about the content of the PDF. Claude's question-answering capabilities allow you to extract particular information or clarify certain points from the document.\n\n4. For even more structured insights, you can use Claude's data and entity extraction abilities to pull out specific types of information from the PDF, such as names, dates, or key statistics.\n\n5. If needed, you can ask Claude to provide analysis or recommendations based on the PDF content, helping you to understand sentiment, patterns, or implications within the document.\n\nBy leveraging these capabilities, you can quickly get the essence of long PDF documents, saving time and improving your understanding of the material. The Claude Cookbooks mentioned in the context provides practical examples and tutorials on how to use these features effectively.",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n How can I use Claude to more easily digest the content of long PDF documents?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nControlling Claude’s output\n\n\n\n\nText capabilities and use cases\n\n\nClaude has a broad range of text-based capabilities, including but not limited to:\nCapabilityThis enables you to…Text SummarizationDistill lengthy content into key insights for executives, social media, or product teams.Content GenerationCraft compelling content from blog posts and emails to marketing slogans and product descriptions.Data / Entity ExtractionUncover structured insights from unstructured text like reviews, news articles, or transcripts.Question AnsweringBuild intelligent, interactive systems from customer support chatbots to educational AI tutors.Text TranslationSeamlessly communicate across languages in products, support, and content creation.Text Analysis & RecommendationsUnderstand sentiment, preferences, and patterns to personalize user experiences and offerings.Dialogue and ConversationCreate engaging, context-aware interactions in games, virtual assistants, and storytelling apps.Code Explanation & GenerationAccelerate development with instant code reviews, boilerplate generation, and interactive tutorials.\n\n\nClaude Cookbooks\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize 
their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "3.5 Sonnet: T-0.0",
"latencyMs": 5155,
"tokenUsage": {
@@ -32323,8 +32323,8 @@
"pass": true,
"score": 1,
"namedScores": {},
- "text": "To use Claude to more easily digest the content of long PDF documents, you can leverage the PDF Upload & Summarization capability described in the Claude Cookbook. This allows you to upload PDF files and have Claude summarize the key insights, making it easier to quickly understand the content of lengthy documents.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n How can I use Claude to more easily digest the content of long PDF documents?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Text capabilities and use cases\n\nText\n Text capabilities and use cases\n\n\nClaude has a broad range of text-based capabilities, including but not limited to:\nCapabilityThis enables you to…Text SummarizationDistill lengthy content into key insights for executives, social media, or product teams.Content GenerationCraft compelling content from blog posts and emails to marketing slogans and product descriptions.Data / Entity ExtractionUncover structured insights from unstructured text like reviews, news articles, or transcripts.Question AnsweringBuild intelligent, interactive systems from customer support chatbots to educational AI tutors.Text TranslationSeamlessly communicate across languages in products, support, and content creation.Text Analysis & RecommendationsUnderstand sentiment, preferences, and patterns to personalize user experiences and offerings.Dialogue and ConversationCreate engaging, context-aware interactions in games, virtual assistants, and storytelling apps.Code Explanation & GenerationAccelerate development with instant code reviews, boilerplate generation, and interactive tutorials.\n \n\nSummary: \n Claude has a broad range of text-based capabilities, including text summarization, content generation, data/entity extraction, question answering, text translation, text analysis and recommendations, dialogue and conversation, and code explanation and generation. These capabilities enable a wide variety of use cases, from crafting compelling content to building intelligent interactive systems and accelerating software development. 
\n \n\n \n Claude Cookbook\n\nText\n Claude Cookbook\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n \n\nSummary: \n The Claude Cookbook provides practical examples and hands-on tutorials, including how to upload PDFs and have Claude summarize their content, how to extend Claude's capabilities by integrating external tools and functions, and how to create and use embeddings with VoyageAI for advanced text similarity and search tasks. 
\n \n\n \n What you can do with Claude\n\nText\n What you can do with Claude\n\n\nClaude is designed to empower enterprises at scale with strong performance across benchmark evaluations for reasoning, math, coding, and fluency in English and non-English languages.\nHere’s a non-exhaustive list of Claude’s capabilities and common uses.\nCapabilityEnables you to…Text and code generationAdhere to brand voice for excellent customer-facing experiences such as copywriting and chatbotsCreate production-level code and operate (in-line code generation, debugging, and conversational querying) within complex codebasesBuild automatic translation features between languagesConduct complex financial forecastsSupport legal use cases that require high-quality technical analysis, long context windows for processing detailed documents, and fast outputsVisionProcess and analyze visual input, such as extracting insights from charts and graphsGenerate code from images with code snippets or templates based on diagramsDescribe an image for a user with low visionTool useInteract with external client-side tools and functions, allowing Claude to reason, plan, and execute actions by generating structured outputs through API calls\nAdhere to brand voice for excellent customer-facing experiences such as copywriting and chatbotsCreate production-level code and operate (in-line code generation, debugging, and conversational querying) within complex codebasesBuild automatic translation features between languagesConduct complex financial forecastsSupport legal use cases that require high-quality technical analysis, long context windows for processing detailed documents, and fast outputs\nProcess and analyze visual input, such as extracting insights from charts and graphsGenerate code from images with code snippets or templates based on diagramsDescribe an image for a user with low vision\nInteract with external client-side tools and functions, allowing Claude to reason, plan, and execute actions by 
generating structured outputs through API calls\n \n\nSummary: \n Claude is a powerful AI model designed to assist enterprises with a wide range of capabilities, including text and code generation, language translation, financial forecasting, legal analysis, and visual processing. It can also interact with external tools and functions, allowing it to reason, plan, and execute actions through API calls. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "text": "To use Claude to more easily digest the content of long PDF documents, you can leverage the PDF Upload & Summarization capability described in the Claude Cookbooks. This allows you to upload PDF files and have Claude summarize the key insights, making it easier to quickly understand the content of lengthy documents.",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n How can I use Claude to more easily digest the content of long PDF documents?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Text capabilities and use cases\n\nText\n Text capabilities and use cases\n\n\nClaude has a broad range of text-based capabilities, including but not limited to:\nCapabilityThis enables you to…Text SummarizationDistill lengthy content into key insights for executives, social media, or product teams.Content GenerationCraft compelling content from blog posts and emails to marketing slogans and product descriptions.Data / Entity ExtractionUncover structured insights from unstructured text like reviews, news articles, or transcripts.Question AnsweringBuild intelligent, interactive systems from customer support chatbots to educational AI tutors.Text TranslationSeamlessly communicate across languages in products, support, and content creation.Text Analysis & RecommendationsUnderstand sentiment, preferences, and patterns to personalize user experiences and offerings.Dialogue and ConversationCreate engaging, context-aware interactions in games, virtual assistants, and storytelling apps.Code Explanation & GenerationAccelerate development with instant code reviews, boilerplate generation, and interactive tutorials.\n \n\nSummary: \n Claude has a broad range of text-based capabilities, including text summarization, content generation, data/entity extraction, question answering, text translation, text analysis and recommendations, dialogue and conversation, and code explanation and generation. These capabilities enable a wide variety of use cases, from crafting compelling content to building intelligent interactive systems and accelerating software development. 
\n \n\n \n Claude Cookbooks\n\nText\n Claude Cookbooks\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n \n\nSummary: \n The Claude Cookbooks provides practical examples and hands-on tutorials, including how to upload PDFs and have Claude summarize their content, how to extend Claude's capabilities by integrating external tools and functions, and how to create and use embeddings with VoyageAI for advanced text similarity and search tasks. 
\n \n\n \n What you can do with Claude\n\nText\n What you can do with Claude\n\n\nClaude is designed to empower enterprises at scale with strong performance across benchmark evaluations for reasoning, math, coding, and fluency in English and non-English languages.\nHere’s a non-exhaustive list of Claude’s capabilities and common uses.\nCapabilityEnables you to…Text and code generationAdhere to brand voice for excellent customer-facing experiences such as copywriting and chatbotsCreate production-level code and operate (in-line code generation, debugging, and conversational querying) within complex codebasesBuild automatic translation features between languagesConduct complex financial forecastsSupport legal use cases that require high-quality technical analysis, long context windows for processing detailed documents, and fast outputsVisionProcess and analyze visual input, such as extracting insights from charts and graphsGenerate code from images with code snippets or templates based on diagramsDescribe an image for a user with low visionTool useInteract with external client-side tools and functions, allowing Claude to reason, plan, and execute actions by generating structured outputs through API calls\nAdhere to brand voice for excellent customer-facing experiences such as copywriting and chatbotsCreate production-level code and operate (in-line code generation, debugging, and conversational querying) within complex codebasesBuild automatic translation features between languagesConduct complex financial forecastsSupport legal use cases that require high-quality technical analysis, long context windows for processing detailed documents, and fast outputs\nProcess and analyze visual input, such as extracting insights from charts and graphsGenerate code from images with code snippets or templates based on diagramsDescribe an image for a user with low vision\nInteract with external client-side tools and functions, allowing Claude to reason, plan, and execute actions by 
generating structured outputs through API calls\n \n\nSummary: \n Claude is a powerful AI model designed to assist enterprises with a wide range of capabilities, including text and code generation, language translation, financial forecasting, legal analysis, and visual processing. It can also interact with external tools and functions, allowing it to reason, plan, and execute actions through API calls. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "Haiku: T-0.0",
"latencyMs": 1028,
"tokenUsage": {
@@ -32346,7 +32346,7 @@
{
"pass": true,
"score": 1,
- "reason": "The Generated Answer is correct based on the Correct Answer. Both answers convey the key point that Claude can be used to summarize PDF documents, making it easier to understand long documents without reading everything. The Generated Answer provides some additional context by mentioning the \"Claude Cookbook\", but this doesn't contradict or omit any critical information from the Correct Answer. The essence of both answers is the same: Claude can summarize PDFs to help users digest long documents more easily.",
+ "reason": "The Generated Answer is correct based on the Correct Answer. Both answers convey the key point that Claude can be used to summarize PDF documents, making it easier to understand long documents without reading everything. The Generated Answer provides some additional context by mentioning the \"Claude Cookbooks\", but this doesn't contradict or omit any critical information from the Correct Answer. The essence of both answers is the same: Claude can summarize PDFs to help users digest long documents more easily.",
"assertion": {
"type": "python",
"value": "file://eval_end_to_end.py"
@@ -32362,7 +32362,7 @@
"score": 1,
"namedScores": {},
"text": "To more easily digest the content of long PDF documents using Claude, you can utilize the PDF Upload & Summarization feature. This capability allows you to upload PDF files and have Claude summarize their content, making it much easier to extract key information from lengthy documents.\n\nThe process typically involves:\n\n1. Uploading the PDF document to Claude.\n2. Asking Claude to summarize the content.\n3. Receiving a concise summary that distills the main points and key insights from the document.\n\nThis feature is particularly useful for quickly understanding the core ideas of long reports, research papers, or any other extensive PDF documents without having to read through the entire text yourself. It can save significant time and effort in processing large amounts of information.\n\nAdditionally, you can follow up with specific questions about the document's content, allowing Claude to provide more targeted information or explanations based on the full text it has processed.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n How can I use Claude to more easily digest the content of long PDF documents?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Text capabilities and use cases\n\nText\n Text capabilities and use cases\n\n\nClaude has a broad range of text-based capabilities, including but not limited to:\nCapabilityThis enables you to…Text SummarizationDistill lengthy content into key insights for executives, social media, or product teams.Content GenerationCraft compelling content from blog posts and emails to marketing slogans and product descriptions.Data / Entity ExtractionUncover structured insights from unstructured text like reviews, news articles, or transcripts.Question AnsweringBuild intelligent, interactive systems from customer support chatbots to educational AI tutors.Text TranslationSeamlessly communicate across languages in products, support, and content creation.Text Analysis & RecommendationsUnderstand sentiment, preferences, and patterns to personalize user experiences and offerings.Dialogue and ConversationCreate engaging, context-aware interactions in games, virtual assistants, and storytelling apps.Code Explanation & GenerationAccelerate development with instant code reviews, boilerplate generation, and interactive tutorials.\n \n\nSummary: \n Claude has a broad range of text-based capabilities, including text summarization, content generation, data/entity extraction, question answering, text translation, text analysis and recommendations, dialogue and conversation, and code explanation and generation. These capabilities enable a wide variety of use cases, from crafting compelling content to building intelligent interactive systems and accelerating software development. 
\n \n\n \n Claude Cookbook\n\nText\n Claude Cookbook\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n \n\nSummary: \n The Claude Cookbook provides practical examples and hands-on tutorials, including how to upload PDFs and have Claude summarize their content, how to extend Claude's capabilities by integrating external tools and functions, and how to create and use embeddings with VoyageAI for advanced text similarity and search tasks. 
\n \n\n \n What you can do with Claude\n\nText\n What you can do with Claude\n\n\nClaude is designed to empower enterprises at scale with strong performance across benchmark evaluations for reasoning, math, coding, and fluency in English and non-English languages.\nHere’s a non-exhaustive list of Claude’s capabilities and common uses.\nCapabilityEnables you to…Text and code generationAdhere to brand voice for excellent customer-facing experiences such as copywriting and chatbotsCreate production-level code and operate (in-line code generation, debugging, and conversational querying) within complex codebasesBuild automatic translation features between languagesConduct complex financial forecastsSupport legal use cases that require high-quality technical analysis, long context windows for processing detailed documents, and fast outputsVisionProcess and analyze visual input, such as extracting insights from charts and graphsGenerate code from images with code snippets or templates based on diagramsDescribe an image for a user with low visionTool useInteract with external client-side tools and functions, allowing Claude to reason, plan, and execute actions by generating structured outputs through API calls\nAdhere to brand voice for excellent customer-facing experiences such as copywriting and chatbotsCreate production-level code and operate (in-line code generation, debugging, and conversational querying) within complex codebasesBuild automatic translation features between languagesConduct complex financial forecastsSupport legal use cases that require high-quality technical analysis, long context windows for processing detailed documents, and fast outputs\nProcess and analyze visual input, such as extracting insights from charts and graphsGenerate code from images with code snippets or templates based on diagramsDescribe an image for a user with low vision\nInteract with external client-side tools and functions, allowing Claude to reason, plan, and execute actions by 
generating structured outputs through API calls\n \n\nSummary: \n Claude is a powerful AI model designed to assist enterprises with a wide range of capabilities, including text and code generation, language translation, financial forecasting, legal analysis, and visual processing. It can also interact with external tools and functions, allowing it to reason, plan, and execute actions through API calls. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n How can I use Claude to more easily digest the content of long PDF documents?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Text capabilities and use cases\n\nText\n Text capabilities and use cases\n\n\nClaude has a broad range of text-based capabilities, including but not limited to:\nCapabilityThis enables you to…Text SummarizationDistill lengthy content into key insights for executives, social media, or product teams.Content GenerationCraft compelling content from blog posts and emails to marketing slogans and product descriptions.Data / Entity ExtractionUncover structured insights from unstructured text like reviews, news articles, or transcripts.Question AnsweringBuild intelligent, interactive systems from customer support chatbots to educational AI tutors.Text TranslationSeamlessly communicate across languages in products, support, and content creation.Text Analysis & RecommendationsUnderstand sentiment, preferences, and patterns to personalize user experiences and offerings.Dialogue and ConversationCreate engaging, context-aware interactions in games, virtual assistants, and storytelling apps.Code Explanation & GenerationAccelerate development with instant code reviews, boilerplate generation, and interactive tutorials.\n \n\nSummary: \n Claude has a broad range of text-based capabilities, including text summarization, content generation, data/entity extraction, question answering, text translation, text analysis and recommendations, dialogue and conversation, and code explanation and generation. These capabilities enable a wide variety of use cases, from crafting compelling content to building intelligent interactive systems and accelerating software development. 
\n \n\n \n Claude Cookbooks\n\nText\n Claude Cookbooks\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n \n\nSummary: \n The Claude Cookbooks provides practical examples and hands-on tutorials, including how to upload PDFs and have Claude summarize their content, how to extend Claude's capabilities by integrating external tools and functions, and how to create and use embeddings with VoyageAI for advanced text similarity and search tasks. 
\n \n\n \n What you can do with Claude\n\nText\n What you can do with Claude\n\n\nClaude is designed to empower enterprises at scale with strong performance across benchmark evaluations for reasoning, math, coding, and fluency in English and non-English languages.\nHere’s a non-exhaustive list of Claude’s capabilities and common uses.\nCapabilityEnables you to…Text and code generationAdhere to brand voice for excellent customer-facing experiences such as copywriting and chatbotsCreate production-level code and operate (in-line code generation, debugging, and conversational querying) within complex codebasesBuild automatic translation features between languagesConduct complex financial forecastsSupport legal use cases that require high-quality technical analysis, long context windows for processing detailed documents, and fast outputsVisionProcess and analyze visual input, such as extracting insights from charts and graphsGenerate code from images with code snippets or templates based on diagramsDescribe an image for a user with low visionTool useInteract with external client-side tools and functions, allowing Claude to reason, plan, and execute actions by generating structured outputs through API calls\nAdhere to brand voice for excellent customer-facing experiences such as copywriting and chatbotsCreate production-level code and operate (in-line code generation, debugging, and conversational querying) within complex codebasesBuild automatic translation features between languagesConduct complex financial forecastsSupport legal use cases that require high-quality technical analysis, long context windows for processing detailed documents, and fast outputs\nProcess and analyze visual input, such as extracting insights from charts and graphsGenerate code from images with code snippets or templates based on diagramsDescribe an image for a user with low vision\nInteract with external client-side tools and functions, allowing Claude to reason, plan, and execute actions by 
generating structured outputs through API calls\n \n\nSummary: \n Claude is a powerful AI model designed to assist enterprises with a wide range of capabilities, including text and code generation, language translation, financial forecasting, legal analysis, and visual processing. It can also interact with external tools and functions, allowing it to reason, plan, and execute actions through API calls. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "3.5 Sonnet: T-0.0",
"latencyMs": 5216,
"tokenUsage": {
@@ -32399,8 +32399,8 @@
"pass": true,
"score": 1,
"namedScores": {},
- "text": "To use Claude to more easily digest the content of long PDF documents, you can follow these steps:\n\n1. Upload the PDF document to Claude using the \"PDF Upload & Summarization\" feature described in the Claude Cookbook. This will allow Claude to read and summarize the content of the PDF, making it easier to quickly understand the key points.\n\n2. Leverage Claude's text summarization capabilities to condense the lengthy PDF content into a concise summary. This can be done by providing the PDF document to Claude and requesting a high-level summary.\n\n3. If you need to dive deeper into specific sections of the PDF, you can use Claude's question answering abilities to ask targeted questions about the document's content. This will allow you to extract relevant information without having to read through the entire PDF.\n\n4. Additionally, you can use Claude's text analysis and recommendation features to identify key themes, sentiments, and insights within the PDF document, further aiding your understanding of the content.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n How can I use Claude to more easily digest the content of long PDF documents?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Claude Cookbook\n\nClaude Cookbook\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n \n \n\n \n Text capabilities and use cases\n\nText capabilities and use cases\n\n\nClaude has a broad range of text-based capabilities, including but not limited to:\nCapabilityThis enables you to…Text SummarizationDistill lengthy content into key insights for executives, social media, or product teams.Content GenerationCraft compelling content from blog posts and 
emails to marketing slogans and product descriptions.Data / Entity ExtractionUncover structured insights from unstructured text like reviews, news articles, or transcripts.Question AnsweringBuild intelligent, interactive systems from customer support chatbots to educational AI tutors.Text TranslationSeamlessly communicate across languages in products, support, and content creation.Text Analysis & RecommendationsUnderstand sentiment, preferences, and patterns to personalize user experiences and offerings.Dialogue and ConversationCreate engaging, context-aware interactions in games, virtual assistants, and storytelling apps.Code Explanation & GenerationAccelerate development with instant code reviews, boilerplate generation, and interactive tutorials.\n \n \n\n \n More Resources\n\nMore Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. 
Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbook More Resources\nText capabilities and use casesClaude CookbookMore Resources\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "text": "To use Claude to more easily digest the content of long PDF documents, you can follow these steps:\n\n1. Upload the PDF document to Claude using the \"PDF Upload & Summarization\" feature described in the Claude Cookbooks. This will allow Claude to read and summarize the content of the PDF, making it easier to quickly understand the key points.\n\n2. Leverage Claude's text summarization capabilities to condense the lengthy PDF content into a concise summary. This can be done by providing the PDF document to Claude and requesting a high-level summary.\n\n3. If you need to dive deeper into specific sections of the PDF, you can use Claude's question answering abilities to ask targeted questions about the document's content. This will allow you to extract relevant information without having to read through the entire PDF.\n\n4. Additionally, you can use Claude's text analysis and recommendation features to identify key themes, sentiments, and insights within the PDF document, further aiding your understanding of the content.",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n How can I use Claude to more easily digest the content of long PDF documents?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Claude Cookbooks\n\nClaude Cookbooks\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n \n \n\n \n Text capabilities and use cases\n\nText capabilities and use cases\n\n\nClaude has a broad range of text-based capabilities, including but not limited to:\nCapabilityThis enables you to…Text SummarizationDistill lengthy content into key insights for executives, social media, or product teams.Content GenerationCraft compelling content from blog posts and 
emails to marketing slogans and product descriptions.Data / Entity ExtractionUncover structured insights from unstructured text like reviews, news articles, or transcripts.Question AnsweringBuild intelligent, interactive systems from customer support chatbots to educational AI tutors.Text TranslationSeamlessly communicate across languages in products, support, and content creation.Text Analysis & RecommendationsUnderstand sentiment, preferences, and patterns to personalize user experiences and offerings.Dialogue and ConversationCreate engaging, context-aware interactions in games, virtual assistants, and storytelling apps.Code Explanation & GenerationAccelerate development with instant code reviews, boilerplate generation, and interactive tutorials.\n \n \n\n \n More Resources\n\nMore Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. 
Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbooks More Resources\nText capabilities and use casesClaude CookbooksMore Resources\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "Haiku: T-0.0",
"latencyMs": 2554,
"tokenUsage": {
@@ -32438,7 +32438,7 @@
"score": 1,
"namedScores": {},
"text": "To more easily digest the content of long PDF documents using Claude, you can utilize the PDF Upload & Summarization feature. This capability allows you to upload PDF files and have Claude summarize their content, making it much easier to quickly grasp the key points of lengthy documents.\n\nThe process typically involves:\n\n1. Uploading the PDF document to Claude\n2. Asking Claude to summarize the content\n3. Receiving a concise summary that captures the main ideas and important details\n\nThis feature is particularly useful for executives, researchers, students, or anyone who needs to quickly understand the essence of long documents without reading them in full. You can then ask follow-up questions about specific parts of the summary or the original document if you need more detailed information on certain topics.\n\nAdditionally, you can combine this with other text capabilities like question answering to further interact with the document's content, or use data extraction to pull out specific types of information from the summarized text.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n How can I use Claude to more easily digest the content of long PDF documents?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Claude Cookbook\n\nClaude Cookbook\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n \n \n\n \n Text capabilities and use cases\n\nText capabilities and use cases\n\n\nClaude has a broad range of text-based capabilities, including but not limited to:\nCapabilityThis enables you to…Text SummarizationDistill lengthy content into key insights for executives, social media, or product teams.Content GenerationCraft compelling content from blog posts and 
emails to marketing slogans and product descriptions.Data / Entity ExtractionUncover structured insights from unstructured text like reviews, news articles, or transcripts.Question AnsweringBuild intelligent, interactive systems from customer support chatbots to educational AI tutors.Text TranslationSeamlessly communicate across languages in products, support, and content creation.Text Analysis & RecommendationsUnderstand sentiment, preferences, and patterns to personalize user experiences and offerings.Dialogue and ConversationCreate engaging, context-aware interactions in games, virtual assistants, and storytelling apps.Code Explanation & GenerationAccelerate development with instant code reviews, boilerplate generation, and interactive tutorials.\n \n \n\n \n More Resources\n\nMore Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. 
Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbook More Resources\nText capabilities and use casesClaude CookbookMore Resources\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n How can I use Claude to more easily digest the content of long PDF documents?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Claude Cookbooks\n\nClaude Cookbooks\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n \n \n\n \n Text capabilities and use cases\n\nText capabilities and use cases\n\n\nClaude has a broad range of text-based capabilities, including but not limited to:\nCapabilityThis enables you to…Text SummarizationDistill lengthy content into key insights for executives, social media, or product teams.Content GenerationCraft compelling content from blog posts and 
emails to marketing slogans and product descriptions.Data / Entity ExtractionUncover structured insights from unstructured text like reviews, news articles, or transcripts.Question AnsweringBuild intelligent, interactive systems from customer support chatbots to educational AI tutors.Text TranslationSeamlessly communicate across languages in products, support, and content creation.Text Analysis & RecommendationsUnderstand sentiment, preferences, and patterns to personalize user experiences and offerings.Dialogue and ConversationCreate engaging, context-aware interactions in games, virtual assistants, and storytelling apps.Code Explanation & GenerationAccelerate development with instant code reviews, boilerplate generation, and interactive tutorials.\n \n \n\n \n More Resources\n\nMore Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. 
Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbooks More Resources\nText capabilities and use casesClaude CookbooksMore Resources\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "3.5 Sonnet: T-0.0",
"latencyMs": 4320,
"tokenUsage": {
@@ -32818,7 +32818,7 @@
"score": 0,
"namedScores": {},
"text": "The Generated Answer is incorrect based on the Correct Answer. While the Generated Answer provides several valid metrics for evaluating the performance of a ticket classification system, it misses the specific metrics mentioned in the Correct Answer. The Correct Answer explicitly states that beyond accuracy, we should measure \"95th percentile response time and average cost per classification.\" These two critical pieces of information are not mentioned in the Generated Answer. The Generated Answer focuses on different metrics like precision, recall, F1-score, and others, which, while potentially useful, are not the specific metrics outlined in the Correct Answer for assessing the system's performance and production-readiness.\n---\nTo measure the performance of the ticket classification system implemented using Claude beyond just accuracy, you can consider the following additional metrics:\n\n1. Precision and Recall: Calculating the precision (the proportion of correctly classified tickets among all tickets classified as a particular class) and recall (the proportion of correctly classified tickets among all tickets that should have been classified as that class) can provide a more nuanced understanding of the system's performance.\n\n2. F1-Score: The F1-score is the harmonic mean of precision and recall, and it provides a balanced measure of the system's performance, taking both false positives and false negatives into account.\n\n3. Inference Time: Measuring the time it takes for the system to classify a ticket can be an important metric, especially if the system needs to operate in a real-time or near-real-time environment.\n\n4. Robustness: Evaluating the system's performance on edge cases, such as tickets with unusual language or formatting, can help assess its robustness and ability to handle diverse inputs.\n\n5. 
Explainability: Assessing the system's ability to provide human-readable explanations for its classification decisions can be valuable, as it can help build trust and facilitate further improvements.\n\n6. Feedback and Monitoring: Implementing a feedback loop to collect user feedback and monitor the system's performance in production can provide valuable insights for continuous improvement.\n\nBy considering these additional metrics, you can gain a more comprehensive understanding of the ticket classification system's performance, identify areas for improvement, and ensure that it meets the requirements of your specific use case.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n How can we measure the performance of the ticket classification system implemented using Claude beyond just accuracy?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Evaluating the Performance of your Ticket Routing Classifier\n\nText\n Evaluating the Performance of your Ticket Routing Classifier\n\n\nBefore deploying your ticket routing classifier to production, it’s crucial to evaluate its performance in terms of accuracy, cost, and speed. These three factors determine the readiness of your new system and boost confidence in its real-world effectiveness. A thorough evaluation helps you convince both technical and business stakeholders of the appropriateness and impact of your solution.\n \n\nSummary: \n Evaluating the performance of a ticket routing classifier is crucial before deployment, as it determines the accuracy, cost, and speed of the system. A thorough evaluation helps convince stakeholders of the appropriateness and impact of the solution, boosting confidence in its real-world effectiveness. \n \n\n \n Advantages of Using Claude\n\nText\n Advantages of Using Claude\n\n\nTraditionally, multi-class classification techniques in Natural Language Processing (NLP) have been used to categorize support tickets. 
However, these methods require a very large training dataset, complex ontology design, and inflexible class definition.\nUsing Large Language Models (LLMs) like Claude, text classification for customer support ticket routing has become significantly more efficient and effective, addressing the limitations of traditional ML techniques:\nMinimal training data: Claude’s pre-trained language model can understand and classify tickets with just a few dozen labeled examples, greatly reducing the time and cost associated with data preparation.\nAdaptability to changing classes: As your product or customer needs evolve, Claude can easily adapt to changes in class definitions or the introduction of new classes without extensive relabeling of training data\nSimplified ontology design: Claude’s advanced language understanding capabilities allow it to accurately classify tickets based on their content and context, rather than relying on strict ontological structures.\nInterpretable reasoning: Claude can generate human-readable explanations for its classification decisions, providing interpretable reasoning that builds trust in the automation system and allow you to easily adapt the approach if needed\n \n\nSummary: \n Claude, Anthropic's large language model, offers significant advantages over traditional multi-class classification techniques for customer support ticket routing. It requires minimal training data, can easily adapt to changing class definitions, and simplifies ontology design, while providing interpretable reasoning for its classification decisions. \n \n\n \n Deploy your classifier\n\nText\n Deploy your classifier\n\n\nTo see code examples of how to use Claude for classification, check out the Classification Guide in the Claude Cookbook.\nOverviewTicket Routingxlinkedin\nOverviewTicket Routing\nxlinkedin\nWhen to use Claude for classification Establish your classification use case Implement Claude for classification 1. Build a strong input prompt 2. 
Develop your test cases 3. Run your eval Evaluation metrics Deploy your classifier\nWhen to use Claude for classificationEstablish your classification use caseImplement Claude for classification1. Build a strong input prompt2. Develop your test cases3. Run your evalEvaluation metricsDeploy your classifier\n \n\nSummary: \n Deploy your classifier: Check out the Classification Guide in the Claude Cookbook for code examples on using Claude for classification. The guide covers when to use Claude for classification, establishing your use case, implementing Claude, building prompts, developing test cases, running evaluations, and deploying your classifier. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n How can we measure the performance of the ticket classification system implemented using Claude beyond just accuracy?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Evaluating the Performance of your Ticket Routing Classifier\n\nText\n Evaluating the Performance of your Ticket Routing Classifier\n\n\nBefore deploying your ticket routing classifier to production, it’s crucial to evaluate its performance in terms of accuracy, cost, and speed. These three factors determine the readiness of your new system and boost confidence in its real-world effectiveness. A thorough evaluation helps you convince both technical and business stakeholders of the appropriateness and impact of your solution.\n \n\nSummary: \n Evaluating the performance of a ticket routing classifier is crucial before deployment, as it determines the accuracy, cost, and speed of the system. A thorough evaluation helps convince stakeholders of the appropriateness and impact of the solution, boosting confidence in its real-world effectiveness. \n \n\n \n Advantages of Using Claude\n\nText\n Advantages of Using Claude\n\n\nTraditionally, multi-class classification techniques in Natural Language Processing (NLP) have been used to categorize support tickets. 
However, these methods require a very large training dataset, complex ontology design, and inflexible class definition.\nUsing Large Language Models (LLMs) like Claude, text classification for customer support ticket routing has become significantly more efficient and effective, addressing the limitations of traditional ML techniques:\nMinimal training data: Claude’s pre-trained language model can understand and classify tickets with just a few dozen labeled examples, greatly reducing the time and cost associated with data preparation.\nAdaptability to changing classes: As your product or customer needs evolve, Claude can easily adapt to changes in class definitions or the introduction of new classes without extensive relabeling of training data\nSimplified ontology design: Claude’s advanced language understanding capabilities allow it to accurately classify tickets based on their content and context, rather than relying on strict ontological structures.\nInterpretable reasoning: Claude can generate human-readable explanations for its classification decisions, providing interpretable reasoning that builds trust in the automation system and allow you to easily adapt the approach if needed\n \n\nSummary: \n Claude, Anthropic's large language model, offers significant advantages over traditional multi-class classification techniques for customer support ticket routing. It requires minimal training data, can easily adapt to changing class definitions, and simplifies ontology design, while providing interpretable reasoning for its classification decisions. \n \n\n \n Deploy your classifier\n\nText\n Deploy your classifier\n\n\nTo see code examples of how to use Claude for classification, check out the Classification Guide in the Claude Cookbooks.\nOverviewTicket Routingxlinkedin\nOverviewTicket Routing\nxlinkedin\nWhen to use Claude for classification Establish your classification use case Implement Claude for classification 1. Build a strong input prompt 2. 
Develop your test cases 3. Run your eval Evaluation metrics Deploy your classifier\nWhen to use Claude for classificationEstablish your classification use caseImplement Claude for classification1. Build a strong input prompt2. Develop your test cases3. Run your evalEvaluation metricsDeploy your classifier\n \n\nSummary: \n Deploy your classifier: Check out the Classification Guide in the Claude Cookbooks for code examples on using Claude for classification. The guide covers when to use Claude for classification, establishing your use case, implementing Claude, building prompts, developing test cases, running evaluations, and deploying your classifier. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "Haiku: T-0.0",
"latencyMs": 2692,
"tokenUsage": {
@@ -32856,7 +32856,7 @@
"score": 0,
"namedScores": {},
"text": "The Generated Answer is incorrect based on the Correct Answer. While the Generated Answer provides a comprehensive list of potential metrics and approaches to evaluate the ticket classification system, it does not specifically mention the two key metrics highlighted in the Correct Answer: the 95th percentile response time and average cost per classification. These specific metrics are crucial for assessing the system's performance and production-readiness according to the Correct Answer. The Generated Answer touches on cost and speed in a general sense, but it lacks the precise metrics specified in the Correct Answer. Therefore, despite providing valuable additional information, the Generated Answer misses the critical pieces of information present in the Correct Answer.\n---\nTo measure the performance of the ticket classification system implemented using Claude beyond just accuracy, we can consider several additional metrics and approaches:\n\n1. Cost and speed: Evaluate the system's efficiency in terms of computational resources required and the time taken to classify tickets.\n\n2. Adaptability: Assess how well the system adapts to changing class definitions or the introduction of new classes without extensive retraining.\n\n3. Interpretability: Analyze the human-readable explanations provided by Claude for its classification decisions, which can help build trust in the system and allow for easy adjustments if needed.\n\n4. Minimal data requirements: Evaluate how well the system performs with limited training data, as Claude can work effectively with just a few dozen labeled examples.\n\n5. Confusion matrix: Examine the system's performance across different classes to identify any specific areas of strength or weakness.\n\n6. Precision and recall: Measure these metrics for each class to understand the system's ability to correctly identify relevant tickets and avoid false positives.\n\n7. 
F1 score: Calculate the harmonic mean of precision and recall to get a balanced measure of the classifier's performance.\n\n8. ROC curve and AUC: For binary classification tasks within the system, these metrics can provide insights into the classifier's ability to distinguish between classes.\n\n9. Cross-validation: Use techniques like k-fold cross-validation to ensure the system's performance is consistent across different subsets of data.\n\n10. Real-world impact: Assess the system's effectiveness in improving overall ticket routing efficiency and customer satisfaction in a production environment.\n\nBy considering these additional metrics and approaches, you can gain a more comprehensive understanding of the ticket classification system's performance beyond simple accuracy measures.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n How can we measure the performance of the ticket classification system implemented using Claude beyond just accuracy?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Evaluating the Performance of your Ticket Routing Classifier\n\nText\n Evaluating the Performance of your Ticket Routing Classifier\n\n\nBefore deploying your ticket routing classifier to production, it’s crucial to evaluate its performance in terms of accuracy, cost, and speed. These three factors determine the readiness of your new system and boost confidence in its real-world effectiveness. A thorough evaluation helps you convince both technical and business stakeholders of the appropriateness and impact of your solution.\n \n\nSummary: \n Evaluating the performance of a ticket routing classifier is crucial before deployment, as it determines the accuracy, cost, and speed of the system. A thorough evaluation helps convince stakeholders of the appropriateness and impact of the solution, boosting confidence in its real-world effectiveness. \n \n\n \n Advantages of Using Claude\n\nText\n Advantages of Using Claude\n\n\nTraditionally, multi-class classification techniques in Natural Language Processing (NLP) have been used to categorize support tickets. 
However, these methods require a very large training dataset, complex ontology design, and inflexible class definition.\nUsing Large Language Models (LLMs) like Claude, text classification for customer support ticket routing has become significantly more efficient and effective, addressing the limitations of traditional ML techniques:\nMinimal training data: Claude’s pre-trained language model can understand and classify tickets with just a few dozen labeled examples, greatly reducing the time and cost associated with data preparation.\nAdaptability to changing classes: As your product or customer needs evolve, Claude can easily adapt to changes in class definitions or the introduction of new classes without extensive relabeling of training data\nSimplified ontology design: Claude’s advanced language understanding capabilities allow it to accurately classify tickets based on their content and context, rather than relying on strict ontological structures.\nInterpretable reasoning: Claude can generate human-readable explanations for its classification decisions, providing interpretable reasoning that builds trust in the automation system and allow you to easily adapt the approach if needed\n \n\nSummary: \n Claude, Anthropic's large language model, offers significant advantages over traditional multi-class classification techniques for customer support ticket routing. It requires minimal training data, can easily adapt to changing class definitions, and simplifies ontology design, while providing interpretable reasoning for its classification decisions. \n \n\n \n Deploy your classifier\n\nText\n Deploy your classifier\n\n\nTo see code examples of how to use Claude for classification, check out the Classification Guide in the Claude Cookbook.\nOverviewTicket Routingxlinkedin\nOverviewTicket Routing\nxlinkedin\nWhen to use Claude for classification Establish your classification use case Implement Claude for classification 1. Build a strong input prompt 2. 
Develop your test cases 3. Run your eval Evaluation metrics Deploy your classifier\nWhen to use Claude for classificationEstablish your classification use caseImplement Claude for classification1. Build a strong input prompt2. Develop your test cases3. Run your evalEvaluation metricsDeploy your classifier\n \n\nSummary: \n Deploy your classifier: Check out the Classification Guide in the Claude Cookbook for code examples on using Claude for classification. The guide covers when to use Claude for classification, establishing your use case, implementing Claude, building prompts, developing test cases, running evaluations, and deploying your classifier. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n How can we measure the performance of the ticket classification system implemented using Claude beyond just accuracy?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Evaluating the Performance of your Ticket Routing Classifier\n\nText\n Evaluating the Performance of your Ticket Routing Classifier\n\n\nBefore deploying your ticket routing classifier to production, it’s crucial to evaluate its performance in terms of accuracy, cost, and speed. These three factors determine the readiness of your new system and boost confidence in its real-world effectiveness. A thorough evaluation helps you convince both technical and business stakeholders of the appropriateness and impact of your solution.\n \n\nSummary: \n Evaluating the performance of a ticket routing classifier is crucial before deployment, as it determines the accuracy, cost, and speed of the system. A thorough evaluation helps convince stakeholders of the appropriateness and impact of the solution, boosting confidence in its real-world effectiveness. \n \n\n \n Advantages of Using Claude\n\nText\n Advantages of Using Claude\n\n\nTraditionally, multi-class classification techniques in Natural Language Processing (NLP) have been used to categorize support tickets. 
However, these methods require a very large training dataset, complex ontology design, and inflexible class definition.\nUsing Large Language Models (LLMs) like Claude, text classification for customer support ticket routing has become significantly more efficient and effective, addressing the limitations of traditional ML techniques:\nMinimal training data: Claude’s pre-trained language model can understand and classify tickets with just a few dozen labeled examples, greatly reducing the time and cost associated with data preparation.\nAdaptability to changing classes: As your product or customer needs evolve, Claude can easily adapt to changes in class definitions or the introduction of new classes without extensive relabeling of training data\nSimplified ontology design: Claude’s advanced language understanding capabilities allow it to accurately classify tickets based on their content and context, rather than relying on strict ontological structures.\nInterpretable reasoning: Claude can generate human-readable explanations for its classification decisions, providing interpretable reasoning that builds trust in the automation system and allow you to easily adapt the approach if needed\n \n\nSummary: \n Claude, Anthropic's large language model, offers significant advantages over traditional multi-class classification techniques for customer support ticket routing. It requires minimal training data, can easily adapt to changing class definitions, and simplifies ontology design, while providing interpretable reasoning for its classification decisions. \n \n\n \n Deploy your classifier\n\nText\n Deploy your classifier\n\n\nTo see code examples of how to use Claude for classification, check out the Classification Guide in the Claude Cookbooks.\nOverviewTicket Routingxlinkedin\nOverviewTicket Routing\nxlinkedin\nWhen to use Claude for classification Establish your classification use case Implement Claude for classification 1. Build a strong input prompt 2. 
Develop your test cases 3. Run your eval Evaluation metrics Deploy your classifier\nWhen to use Claude for classificationEstablish your classification use caseImplement Claude for classification1. Build a strong input prompt2. Develop your test cases3. Run your evalEvaluation metricsDeploy your classifier\n \n\nSummary: \n Deploy your classifier: Check out the Classification Guide in the Claude Cookbooks for code examples on using Claude for classification. The guide covers when to use Claude for classification, establishing your use case, implementing Claude, building prompts, developing test cases, running evaluations, and deploying your classifier. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "3.5 Sonnet: T-0.0",
"latencyMs": 6928,
"tokenUsage": {
@@ -33734,7 +33734,7 @@
"score": 0,
"namedScores": {},
"text": "The Generated Answer is incorrect based on the Correct Answer. While the Generated Answer provides some useful information about preparing to use Claude, it does not specifically address the key things Anthropic recommends having in place before starting prompt engineering.\n\nThe Correct Answer emphasizes three critical elements:\n1. A clear definition of success criteria for the use case\n2. Ways to empirically test against those criteria\n3. A first draft prompt to improve\n\nThese specific elements are not mentioned in the Generated Answer. Instead, the Generated Answer provides a broader overview of steps to prepare for using Claude, which, while potentially useful, does not align with the specific recommendations for prompt engineering mentioned in the Correct Answer.\n\nThe Generated Answer misses the core focus on having success criteria, empirical testing methods, and an initial prompt draft, which are crucial for effective prompt engineering according to the Correct Answer.\n---\nAccording to the provided documents, Anthropic recommends the following key things to have in place before starting to engineer and improve a prompt in Claude:\n\n1. Scope your use case: Identify a problem to solve or tasks to automate with Claude, and define requirements such as features, performance, and cost.\n\n2. Design your integration: Select Claude's capabilities (e.g., vision, tool use) and models (Opus, Sonnet, Haiku) based on your needs, and choose a deployment method such as the Claude API, AWS Bedrock, or Vertex AI.\n\n3. Prepare your data: Identify and clean relevant data (databases, code repos, knowledge bases) for Claude's context.\n\n4. Develop your prompts: Use Workbench to create evals, draft prompts, and iteratively refine based on test results. Deploy polished prompts and monitor real-world performance for further refinement.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n Before starting to engineer and improve a prompt in Claude, what key things does Anthropic recommend you have in place first?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nImplementing Claude\n\n\n1Scope your use case\nIdentify a problem to solve or tasks to automate with Claude.\nDefine requirements: features, performance, and cost.\n2Design your integration\nSelect Claude’s capabilities (e.g., vision, tool use) and models (Opus, Sonnet, Haiku) based on needs.\nChoose a deployment method, such as the Claude API, AWS Bedrock, or Vertex AI.\n3Prepare your data\nIdentify and clean relevant data (databases, code repos, knowledge bases) for Claude’s context.\n4Develop your prompts\nUse Workbench to create evals, draft prompts, and iteratively refine based on test results.\nDeploy polished prompts and monitor real-world performance for further refinement.\n5Implement Claude\nSet up your environment, integrate Claude with your systems (APIs, databases, UIs), and define human-in-the-loop requirements.\n6Test your system\nConduct red teaming for potential misuse and A/B test improvements.\n7Deploy to production\nOnce your application runs smoothly end-to-end, deploy to production.\n8Monitor and improve\nMonitor performance and effectiveness to make ongoing improvements.\n1Scope your use case\nIdentify a problem to solve or tasks to automate with Claude.\nDefine requirements: features, performance, and cost.\n\n1\n1\nScope your use case Identify a problem to solve or tasks to automate with Claude. 
Define requirements: features, performance, and cost.\nScope your use case\nIdentify a problem to solve or tasks to automate with Claude.\nDefine requirements: features, performance, and cost.\nIdentify a problem to solve or tasks to automate with Claude.\nDefine requirements: features, performance, and cost.\n2Design your integration\nSelect Claude’s capabilities (e.g., vision, tool use) and models (Opus, Sonnet, Haiku) based on needs.\nChoose a deployment method, such as the Claude API, AWS Bedrock, or Vertex AI.\n\n2\n2\nDesign your integration Select Claude’s capabilities (e.g., vision, tool use) and models (Opus, Sonnet, Haiku) based on needs. Choose a deployment method, such as the Claude API, AWS Bedrock, or Vertex AI.\nDesign your integration\nSelect Claude’s capabilities (e.g., vision, tool use) and models (Opus, Sonnet, Haiku) based on needs.\nChoose a deployment method, such as the Claude API, AWS Bedrock, or Vertex AI.\nSelect Claude’s capabilities (e.g., vision, tool use) and models (Opus, Sonnet, Haiku) based on needs.\nChoose a deployment method, such as the Claude API, AWS Bedrock, or Vertex AI.\n3Prepare your data\nIdentify and clean relevant data (databases, code repos, knowledge bases) for Claude’s context.\n\n3\n3\nPrepare your data Identify and clean relevant data (databases, code repos, knowledge bases) for Claude’s context.\nPrepare your data\nIdentify and clean relevant data (databases, code repos, knowledge bases) for Claude’s context.\nIdentify and clean relevant data (databases, code repos, knowledge bases) for Claude’s context.\n4Develop your prompts\nUse Workbench to create evals, draft prompts, and iteratively refine based on test results.\nDeploy polished prompts and monitor real-world performance for further refinement.\n\n4\n4\nDevelop your prompts Use Workbench to create evals, draft prompts, and iteratively refine based on test results. 
Deploy polished prompts and monitor real-world performance for further refinement.\nDevelop your prompts\nUse Workbench to create evals, draft prompts, and iteratively refine based on test results.\nDeploy polished prompts and monitor real-world performance for further refinement.\nUse Workbench to create evals, draft prompts, and iteratively refine based on test results.\nDeploy polished prompts and monitor real-world performance for further refinement.\n5Implement Claude\nSet up your environment, integrate Claude with your systems (APIs, databases, UIs), and define human-in-the-loop requirements.\n\n5\n5\nImplement Claude Set up your environment, integrate Claude with your systems (APIs, databases, UIs), and define human-in-the-loop requirements.\nImplement Claude\nSet up your environment, integrate Claude with your systems (APIs, databases, UIs), and define human-in-the-loop requirements.\nSet up your environment, integrate Claude with your systems (APIs, databases, UIs), and define human-in-the-loop requirements.\n6Test your system\nConduct red teaming for potential misuse and A/B test improvements.\n\n6\n6\nTest your system Conduct red teaming for potential misuse and A/B test improvements.\nTest your system\nConduct red teaming for potential misuse and A/B test improvements.\nConduct red teaming for potential misuse and A/B test improvements.\n7Deploy to production\nOnce your application runs smoothly end-to-end, deploy to production.\n\n7\n7\nDeploy to production Once your application runs smoothly end-to-end, deploy to production.\nDeploy to production\nOnce your application runs smoothly end-to-end, deploy to production.\nOnce your application runs smoothly end-to-end, deploy to production.\n8Monitor and improve\nMonitor performance and effectiveness to make ongoing improvements.\n\n8\n8\nMonitor and improve Monitor performance and effectiveness to make ongoing improvements.\nMonitor and improve\nMonitor performance and effectiveness to make ongoing 
improvements.\nMonitor performance and effectiveness to make ongoing improvements.\n\n\nHow to prompt engineer\n\n\nThe prompt engineering pages in this section have been organized from most broadly effective techniques to more specialized techniques. When troubleshooting performance, we suggest you try these techniques in order, although the actual impact of each technique will depend on our use case.\nPrompt generator\nBe clear and direct\nUse examples (multishot)\nLet Claude think (chain of thought)\nUse XML tags\nGive Claude a role (system prompts)\nPrefill Claude’s response\nChain complex prompts\nLong context tips\n\n\nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbook for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 Family Enterprise considerations Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n Before starting to engineer and improve a prompt in Claude, what key things does Anthropic recommend you have in place first?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nImplementing Claude\n\n\n1Scope your use case\nIdentify a problem to solve or tasks to automate with Claude.\nDefine requirements: features, performance, and cost.\n2Design your integration\nSelect Claude’s capabilities (e.g., vision, tool use) and models (Opus, Sonnet, Haiku) based on needs.\nChoose a deployment method, such as the Claude API, AWS Bedrock, or Vertex AI.\n3Prepare your data\nIdentify and clean relevant data (databases, code repos, knowledge bases) for Claude’s context.\n4Develop your prompts\nUse Workbench to create evals, draft prompts, and iteratively refine based on test results.\nDeploy polished prompts and monitor real-world performance for further refinement.\n5Implement Claude\nSet up your environment, integrate Claude with your systems (APIs, databases, UIs), and define human-in-the-loop requirements.\n6Test your system\nConduct red teaming for potential misuse and A/B test improvements.\n7Deploy to production\nOnce your application runs smoothly end-to-end, deploy to production.\n8Monitor and improve\nMonitor performance and effectiveness to make ongoing improvements.\n1Scope your use case\nIdentify a problem to solve or tasks to automate with Claude.\nDefine requirements: features, performance, and cost.\n\n1\n1\nScope your use case Identify a problem to solve or tasks to automate with Claude. 
Define requirements: features, performance, and cost.\nScope your use case\nIdentify a problem to solve or tasks to automate with Claude.\nDefine requirements: features, performance, and cost.\nIdentify a problem to solve or tasks to automate with Claude.\nDefine requirements: features, performance, and cost.\n2Design your integration\nSelect Claude’s capabilities (e.g., vision, tool use) and models (Opus, Sonnet, Haiku) based on needs.\nChoose a deployment method, such as the Claude API, AWS Bedrock, or Vertex AI.\n\n2\n2\nDesign your integration Select Claude’s capabilities (e.g., vision, tool use) and models (Opus, Sonnet, Haiku) based on needs. Choose a deployment method, such as the Claude API, AWS Bedrock, or Vertex AI.\nDesign your integration\nSelect Claude’s capabilities (e.g., vision, tool use) and models (Opus, Sonnet, Haiku) based on needs.\nChoose a deployment method, such as the Claude API, AWS Bedrock, or Vertex AI.\nSelect Claude’s capabilities (e.g., vision, tool use) and models (Opus, Sonnet, Haiku) based on needs.\nChoose a deployment method, such as the Claude API, AWS Bedrock, or Vertex AI.\n3Prepare your data\nIdentify and clean relevant data (databases, code repos, knowledge bases) for Claude’s context.\n\n3\n3\nPrepare your data Identify and clean relevant data (databases, code repos, knowledge bases) for Claude’s context.\nPrepare your data\nIdentify and clean relevant data (databases, code repos, knowledge bases) for Claude’s context.\nIdentify and clean relevant data (databases, code repos, knowledge bases) for Claude’s context.\n4Develop your prompts\nUse Workbench to create evals, draft prompts, and iteratively refine based on test results.\nDeploy polished prompts and monitor real-world performance for further refinement.\n\n4\n4\nDevelop your prompts Use Workbench to create evals, draft prompts, and iteratively refine based on test results. 
Deploy polished prompts and monitor real-world performance for further refinement.\nDevelop your prompts\nUse Workbench to create evals, draft prompts, and iteratively refine based on test results.\nDeploy polished prompts and monitor real-world performance for further refinement.\nUse Workbench to create evals, draft prompts, and iteratively refine based on test results.\nDeploy polished prompts and monitor real-world performance for further refinement.\n5Implement Claude\nSet up your environment, integrate Claude with your systems (APIs, databases, UIs), and define human-in-the-loop requirements.\n\n5\n5\nImplement Claude Set up your environment, integrate Claude with your systems (APIs, databases, UIs), and define human-in-the-loop requirements.\nImplement Claude\nSet up your environment, integrate Claude with your systems (APIs, databases, UIs), and define human-in-the-loop requirements.\nSet up your environment, integrate Claude with your systems (APIs, databases, UIs), and define human-in-the-loop requirements.\n6Test your system\nConduct red teaming for potential misuse and A/B test improvements.\n\n6\n6\nTest your system Conduct red teaming for potential misuse and A/B test improvements.\nTest your system\nConduct red teaming for potential misuse and A/B test improvements.\nConduct red teaming for potential misuse and A/B test improvements.\n7Deploy to production\nOnce your application runs smoothly end-to-end, deploy to production.\n\n7\n7\nDeploy to production Once your application runs smoothly end-to-end, deploy to production.\nDeploy to production\nOnce your application runs smoothly end-to-end, deploy to production.\nOnce your application runs smoothly end-to-end, deploy to production.\n8Monitor and improve\nMonitor performance and effectiveness to make ongoing improvements.\n\n8\n8\nMonitor and improve Monitor performance and effectiveness to make ongoing improvements.\nMonitor and improve\nMonitor performance and effectiveness to make ongoing 
improvements.\nMonitor performance and effectiveness to make ongoing improvements.\n\n\nHow to prompt engineer\n\n\nThe prompt engineering pages in this section have been organized from most broadly effective techniques to more specialized techniques. When troubleshooting performance, we suggest you try these techniques in order, although the actual impact of each technique will depend on our use case.\nPrompt generator\nBe clear and direct\nUse examples (multishot)\nLet Claude think (chain of thought)\nUse XML tags\nGive Claude a role (system prompts)\nPrefill Claude’s response\nChain complex prompts\nLong context tips\n\n\nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbooks for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 Family Enterprise considerations Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "Haiku: T-0.0",
"latencyMs": 1,
"tokenUsage": {},
@@ -33768,7 +33768,7 @@
"score": 0,
"namedScores": {},
"text": "The Generated Answer is incorrect when compared to the Correct Answer. While the Generated Answer provides some useful information about preparing to use Claude, it does not accurately reflect the specific recommendations given in the Correct Answer for prompt engineering.\n\nThe Correct Answer emphasizes three key elements:\n1. Clear definition of success criteria\n2. Ways to empirically test against those criteria\n3. A first draft prompt to improve\n\nThe Generated Answer does not mention any of these specific points. Instead, it focuses on broader preparation steps for using Claude in general, rather than the specific prerequisites for prompt engineering that Anthropic recommends.\n\nWhile the Generated Answer's suggestions may be helpful in a general sense, they do not align with the specific prompt engineering preparation steps outlined in the Correct Answer. Therefore, the Generated Answer must be considered incorrect in this context.\n---\nBased on the information provided, Anthropic recommends having the following key things in place before starting to engineer and improve prompts for Claude:\n\n1. A well-scoped use case: Identify the specific problem you want to solve or tasks you want to automate with Claude. Define clear requirements including features, performance expectations, and cost considerations.\n\n2. An integration design: Select which of Claude's capabilities (e.g. vision, tool use) and which model (Opus, Sonnet, Haiku) you'll be using based on your needs. Choose your deployment method (Claude API, AWS Bedrock, Vertex AI, etc.).\n\n3. Prepared data: Identify and clean relevant data from your databases, code repositories, knowledge bases, etc. that will provide context for Claude.\n\n4. 
A testing environment: Set up Workbench to create evaluations, draft initial prompts, and iteratively refine them based on test results.\n\nHaving these elements in place provides a solid foundation for effective prompt engineering and improvement with Claude.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n Before starting to engineer and improve a prompt in Claude, what key things does Anthropic recommend you have in place first?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nImplementing Claude\n\n\n1Scope your use case\nIdentify a problem to solve or tasks to automate with Claude.\nDefine requirements: features, performance, and cost.\n2Design your integration\nSelect Claude’s capabilities (e.g., vision, tool use) and models (Opus, Sonnet, Haiku) based on needs.\nChoose a deployment method, such as the Claude API, AWS Bedrock, or Vertex AI.\n3Prepare your data\nIdentify and clean relevant data (databases, code repos, knowledge bases) for Claude’s context.\n4Develop your prompts\nUse Workbench to create evals, draft prompts, and iteratively refine based on test results.\nDeploy polished prompts and monitor real-world performance for further refinement.\n5Implement Claude\nSet up your environment, integrate Claude with your systems (APIs, databases, UIs), and define human-in-the-loop requirements.\n6Test your system\nConduct red teaming for potential misuse and A/B test improvements.\n7Deploy to production\nOnce your application runs smoothly end-to-end, deploy to production.\n8Monitor and improve\nMonitor performance and effectiveness to make ongoing improvements.\n1Scope your use case\nIdentify a problem to solve or tasks to automate with Claude.\nDefine requirements: features, performance, and cost.\n\n1\n1\nScope your use case Identify a problem to solve or tasks to automate with Claude. 
Define requirements: features, performance, and cost.\nScope your use case\nIdentify a problem to solve or tasks to automate with Claude.\nDefine requirements: features, performance, and cost.\nIdentify a problem to solve or tasks to automate with Claude.\nDefine requirements: features, performance, and cost.\n2Design your integration\nSelect Claude’s capabilities (e.g., vision, tool use) and models (Opus, Sonnet, Haiku) based on needs.\nChoose a deployment method, such as the Claude API, AWS Bedrock, or Vertex AI.\n\n2\n2\nDesign your integration Select Claude’s capabilities (e.g., vision, tool use) and models (Opus, Sonnet, Haiku) based on needs. Choose a deployment method, such as the Claude API, AWS Bedrock, or Vertex AI.\nDesign your integration\nSelect Claude’s capabilities (e.g., vision, tool use) and models (Opus, Sonnet, Haiku) based on needs.\nChoose a deployment method, such as the Claude API, AWS Bedrock, or Vertex AI.\nSelect Claude’s capabilities (e.g., vision, tool use) and models (Opus, Sonnet, Haiku) based on needs.\nChoose a deployment method, such as the Claude API, AWS Bedrock, or Vertex AI.\n3Prepare your data\nIdentify and clean relevant data (databases, code repos, knowledge bases) for Claude’s context.\n\n3\n3\nPrepare your data Identify and clean relevant data (databases, code repos, knowledge bases) for Claude’s context.\nPrepare your data\nIdentify and clean relevant data (databases, code repos, knowledge bases) for Claude’s context.\nIdentify and clean relevant data (databases, code repos, knowledge bases) for Claude’s context.\n4Develop your prompts\nUse Workbench to create evals, draft prompts, and iteratively refine based on test results.\nDeploy polished prompts and monitor real-world performance for further refinement.\n\n4\n4\nDevelop your prompts Use Workbench to create evals, draft prompts, and iteratively refine based on test results. 
Deploy polished prompts and monitor real-world performance for further refinement.\nDevelop your prompts\nUse Workbench to create evals, draft prompts, and iteratively refine based on test results.\nDeploy polished prompts and monitor real-world performance for further refinement.\nUse Workbench to create evals, draft prompts, and iteratively refine based on test results.\nDeploy polished prompts and monitor real-world performance for further refinement.\n5Implement Claude\nSet up your environment, integrate Claude with your systems (APIs, databases, UIs), and define human-in-the-loop requirements.\n\n5\n5\nImplement Claude Set up your environment, integrate Claude with your systems (APIs, databases, UIs), and define human-in-the-loop requirements.\nImplement Claude\nSet up your environment, integrate Claude with your systems (APIs, databases, UIs), and define human-in-the-loop requirements.\nSet up your environment, integrate Claude with your systems (APIs, databases, UIs), and define human-in-the-loop requirements.\n6Test your system\nConduct red teaming for potential misuse and A/B test improvements.\n\n6\n6\nTest your system Conduct red teaming for potential misuse and A/B test improvements.\nTest your system\nConduct red teaming for potential misuse and A/B test improvements.\nConduct red teaming for potential misuse and A/B test improvements.\n7Deploy to production\nOnce your application runs smoothly end-to-end, deploy to production.\n\n7\n7\nDeploy to production Once your application runs smoothly end-to-end, deploy to production.\nDeploy to production\nOnce your application runs smoothly end-to-end, deploy to production.\nOnce your application runs smoothly end-to-end, deploy to production.\n8Monitor and improve\nMonitor performance and effectiveness to make ongoing improvements.\n\n8\n8\nMonitor and improve Monitor performance and effectiveness to make ongoing improvements.\nMonitor and improve\nMonitor performance and effectiveness to make ongoing 
improvements.\nMonitor performance and effectiveness to make ongoing improvements.\n\n\nHow to prompt engineer\n\n\nThe prompt engineering pages in this section have been organized from most broadly effective techniques to more specialized techniques. When troubleshooting performance, we suggest you try these techniques in order, although the actual impact of each technique will depend on our use case.\nPrompt generator\nBe clear and direct\nUse examples (multishot)\nLet Claude think (chain of thought)\nUse XML tags\nGive Claude a role (system prompts)\nPrefill Claude’s response\nChain complex prompts\nLong context tips\n\n\nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbook for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 Family Enterprise considerations Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n Before starting to engineer and improve a prompt in Claude, what key things does Anthropic recommend you have in place first?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nImplementing Claude\n\n\n1Scope your use case\nIdentify a problem to solve or tasks to automate with Claude.\nDefine requirements: features, performance, and cost.\n2Design your integration\nSelect Claude’s capabilities (e.g., vision, tool use) and models (Opus, Sonnet, Haiku) based on needs.\nChoose a deployment method, such as the Claude API, AWS Bedrock, or Vertex AI.\n3Prepare your data\nIdentify and clean relevant data (databases, code repos, knowledge bases) for Claude’s context.\n4Develop your prompts\nUse Workbench to create evals, draft prompts, and iteratively refine based on test results.\nDeploy polished prompts and monitor real-world performance for further refinement.\n5Implement Claude\nSet up your environment, integrate Claude with your systems (APIs, databases, UIs), and define human-in-the-loop requirements.\n6Test your system\nConduct red teaming for potential misuse and A/B test improvements.\n7Deploy to production\nOnce your application runs smoothly end-to-end, deploy to production.\n8Monitor and improve\nMonitor performance and effectiveness to make ongoing improvements.\n1Scope your use case\nIdentify a problem to solve or tasks to automate with Claude.\nDefine requirements: features, performance, and cost.\n\n1\n1\nScope your use case Identify a problem to solve or tasks to automate with Claude. 
Define requirements: features, performance, and cost.\nScope your use case\nIdentify a problem to solve or tasks to automate with Claude.\nDefine requirements: features, performance, and cost.\nIdentify a problem to solve or tasks to automate with Claude.\nDefine requirements: features, performance, and cost.\n2Design your integration\nSelect Claude’s capabilities (e.g., vision, tool use) and models (Opus, Sonnet, Haiku) based on needs.\nChoose a deployment method, such as the Claude API, AWS Bedrock, or Vertex AI.\n\n2\n2\nDesign your integration Select Claude’s capabilities (e.g., vision, tool use) and models (Opus, Sonnet, Haiku) based on needs. Choose a deployment method, such as the Claude API, AWS Bedrock, or Vertex AI.\nDesign your integration\nSelect Claude’s capabilities (e.g., vision, tool use) and models (Opus, Sonnet, Haiku) based on needs.\nChoose a deployment method, such as the Claude API, AWS Bedrock, or Vertex AI.\nSelect Claude’s capabilities (e.g., vision, tool use) and models (Opus, Sonnet, Haiku) based on needs.\nChoose a deployment method, such as the Claude API, AWS Bedrock, or Vertex AI.\n3Prepare your data\nIdentify and clean relevant data (databases, code repos, knowledge bases) for Claude’s context.\n\n3\n3\nPrepare your data Identify and clean relevant data (databases, code repos, knowledge bases) for Claude’s context.\nPrepare your data\nIdentify and clean relevant data (databases, code repos, knowledge bases) for Claude’s context.\nIdentify and clean relevant data (databases, code repos, knowledge bases) for Claude’s context.\n4Develop your prompts\nUse Workbench to create evals, draft prompts, and iteratively refine based on test results.\nDeploy polished prompts and monitor real-world performance for further refinement.\n\n4\n4\nDevelop your prompts Use Workbench to create evals, draft prompts, and iteratively refine based on test results. 
Deploy polished prompts and monitor real-world performance for further refinement.\nDevelop your prompts\nUse Workbench to create evals, draft prompts, and iteratively refine based on test results.\nDeploy polished prompts and monitor real-world performance for further refinement.\nUse Workbench to create evals, draft prompts, and iteratively refine based on test results.\nDeploy polished prompts and monitor real-world performance for further refinement.\n5Implement Claude\nSet up your environment, integrate Claude with your systems (APIs, databases, UIs), and define human-in-the-loop requirements.\n\n5\n5\nImplement Claude Set up your environment, integrate Claude with your systems (APIs, databases, UIs), and define human-in-the-loop requirements.\nImplement Claude\nSet up your environment, integrate Claude with your systems (APIs, databases, UIs), and define human-in-the-loop requirements.\nSet up your environment, integrate Claude with your systems (APIs, databases, UIs), and define human-in-the-loop requirements.\n6Test your system\nConduct red teaming for potential misuse and A/B test improvements.\n\n6\n6\nTest your system Conduct red teaming for potential misuse and A/B test improvements.\nTest your system\nConduct red teaming for potential misuse and A/B test improvements.\nConduct red teaming for potential misuse and A/B test improvements.\n7Deploy to production\nOnce your application runs smoothly end-to-end, deploy to production.\n\n7\n7\nDeploy to production Once your application runs smoothly end-to-end, deploy to production.\nDeploy to production\nOnce your application runs smoothly end-to-end, deploy to production.\nOnce your application runs smoothly end-to-end, deploy to production.\n8Monitor and improve\nMonitor performance and effectiveness to make ongoing improvements.\n\n8\n8\nMonitor and improve Monitor performance and effectiveness to make ongoing improvements.\nMonitor and improve\nMonitor performance and effectiveness to make ongoing 
improvements.\nMonitor performance and effectiveness to make ongoing improvements.\n\n\nHow to prompt engineer\n\n\nThe prompt engineering pages in this section have been organized from most broadly effective techniques to more specialized techniques. When troubleshooting performance, we suggest you try these techniques in order, although the actual impact of each technique will depend on our use case.\nPrompt generator\nBe clear and direct\nUse examples (multishot)\nLet Claude think (chain of thought)\nUse XML tags\nGive Claude a role (system prompts)\nPrefill Claude’s response\nChain complex prompts\nLong context tips\n\n\nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbooks for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 Family Enterprise considerations Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "3.5 Sonnet: T-0.0",
"latencyMs": 4965,
"tokenUsage": {
@@ -38674,7 +38674,7 @@
"score": 1,
"namedScores": {},
"text": "Anthropic launched Claude.ai and the Claude iOS app in Europe first, on May 13th, 2024, followed by the launch in Canada on June 5th, 2024.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n In what order did Anthropic launch Claude.ai and the Claude iOS app in Canada and Europe?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nMay 13th, 2024\n\n\nClaude.ai and our iOS app are now available in Europe. Learn more in our Europe launch announcement.\n\n\nJune 5th, 2024\n\n\nClaude.ai, our API, and iOS app are now available in Canada. Learn more in our Canada launch announcement.\n\n\nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbook for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 Family Enterprise considerations Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n In what order did Anthropic launch Claude.ai and the Claude iOS app in Canada and Europe?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nMay 13th, 2024\n\n\nClaude.ai and our iOS app are now available in Europe. Learn more in our Europe launch announcement.\n\n\nJune 5th, 2024\n\n\nClaude.ai, our API, and iOS app are now available in Canada. Learn more in our Canada launch announcement.\n\n\nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbooks for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 Family Enterprise considerations Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "Haiku: T-0.0",
"latencyMs": 1,
"tokenUsage": {},
@@ -38708,7 +38708,7 @@
"score": 1,
"namedScores": {},
"text": "Based on the information provided:\n\n1. Claude.ai and the iOS app were launched in Europe first, on May 13th, 2024.\n\n2. Claude.ai, the API, and the iOS app were then launched in Canada on June 5th, 2024.\n\nSo the order was: Europe first, then Canada, with about a 3-week gap between the launches.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n In what order did Anthropic launch Claude.ai and the Claude iOS app in Canada and Europe?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nMay 13th, 2024\n\n\nClaude.ai and our iOS app are now available in Europe. Learn more in our Europe launch announcement.\n\n\nJune 5th, 2024\n\n\nClaude.ai, our API, and iOS app are now available in Canada. Learn more in our Canada launch announcement.\n\n\nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbook for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 Family Enterprise considerations Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n In what order did Anthropic launch Claude.ai and the Claude iOS app in Canada and Europe?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nMay 13th, 2024\n\n\nClaude.ai and our iOS app are now available in Europe. Learn more in our Europe launch announcement.\n\n\nJune 5th, 2024\n\n\nClaude.ai, our API, and iOS app are now available in Canada. Learn more in our Canada launch announcement.\n\n\nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbooks for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 Family Enterprise considerations Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "3.5 Sonnet: T-0.0",
"latencyMs": 2747,
"tokenUsage": {
@@ -40403,7 +40403,7 @@
"score": 1,
"namedScores": {},
"text": "To prefill Claude's response, you can include the desired initial text in the \"content\" field of the \"assistant\" message in the API request. The API parameter used to generate a short response in this case is \"max_tokens\", which limits the maximum number of tokens (words) in the generated response.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n How can you guide Claude's response by pre-filling part of the response, and what API parameter is used to generate a short response in this case?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nHow to prefill Claude’s response\n\n\nTo prefill, include the desired initial text in the Assistant message (Claude’s response will continue from where the Assistant message leaves off):\nimport anthropic\n\nclient = anthropic.Anthropic()\nresponse = client.messages.create(\n model=\"claude-3-5-sonnet-20240620\",\n max_tokens=1024,\n messages=[\n {\"role\": \"user\", \"content\": \"What is your favorite color?\"},\n {\"role\": \"assistant\", \"content\": \"As an AI assistant, I don't have a favorite color, But if I had to pick, it would be green because\"} # Prefill here\n ]\n)\nimport anthropic\n\nclient = anthropic.Anthropic()\nresponse = client.messages.create(\n model=\"claude-3-5-sonnet-20240620\",\n max_tokens=1024,\n messages=[\n {\"role\": \"user\", \"content\": \"What is your favorite color?\"},\n {\"role\": \"assistant\", \"content\": \"As an AI assistant, I don't have a favorite color, But if I had to pick, it would be green because\"} # Prefill here\n ]\n)\nimport anthropic\n\nclient = anthropic.Anthropic()\nresponse = client.messages.create(\n model=\"claude-3-5-sonnet-20240620\",\n max_tokens=1024,\n messages=[\n {\"role\": \"user\", \"content\": \"What is your favorite color?\"},\n {\"role\": \"assistant\", \"content\": \"As an AI assistant, I don't have a favorite color, But if I had to pick, it would be green because\"} # Prefill here\n ]\n)\n```\nimport anthropic\n\nclient = anthropic.Anthropic()\nresponse = client.messages.create(\n model=\"claude-3-5-sonnet-20240620\",\n max_tokens=1024,\n messages=[\n {\"role\": \"user\", \"content\": \"What is your favorite color?\"},\n {\"role\": \"assistant\", 
\"content\": \"As an AI assistant, I don't have a favorite color, But if I had to pick, it would be green because\"} # Prefill here\n ]\n)\n\n```\n\n\nGet started\n\n\nIf you’re new to Claude, start here to learn the essentials and make your first API call.\nIntro to ClaudeExplore Claude’s capabilities and development flow.QuickstartLearn how to make your first API call in minutes.Prompt LibraryExplore example prompts for inspiration.\nIntro to ClaudeExplore Claude’s capabilities and development flow.\n\nIntro to Claude\nExplore Claude’s capabilities and development flow.\nQuickstartLearn how to make your first API call in minutes.\n\nQuickstart\nLearn how to make your first API call in minutes.\nPrompt LibraryExplore example prompts for inspiration.\n\nPrompt Library\nExplore example prompts for inspiration.\n\n\nMore Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. 
Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbook More Resources\nText capabilities and use casesClaude CookbookMore Resources\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n How can you guide Claude's response by pre-filling part of the response, and what API parameter is used to generate a short response in this case?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nHow to prefill Claude’s response\n\n\nTo prefill, include the desired initial text in the Assistant message (Claude’s response will continue from where the Assistant message leaves off):\nimport anthropic\n\nclient = anthropic.Anthropic()\nresponse = client.messages.create(\n model=\"claude-3-5-sonnet-20240620\",\n max_tokens=1024,\n messages=[\n {\"role\": \"user\", \"content\": \"What is your favorite color?\"},\n {\"role\": \"assistant\", \"content\": \"As an AI assistant, I don't have a favorite color, But if I had to pick, it would be green because\"} # Prefill here\n ]\n)\nimport anthropic\n\nclient = anthropic.Anthropic()\nresponse = client.messages.create(\n model=\"claude-3-5-sonnet-20240620\",\n max_tokens=1024,\n messages=[\n {\"role\": \"user\", \"content\": \"What is your favorite color?\"},\n {\"role\": \"assistant\", \"content\": \"As an AI assistant, I don't have a favorite color, But if I had to pick, it would be green because\"} # Prefill here\n ]\n)\nimport anthropic\n\nclient = anthropic.Anthropic()\nresponse = client.messages.create(\n model=\"claude-3-5-sonnet-20240620\",\n max_tokens=1024,\n messages=[\n {\"role\": \"user\", \"content\": \"What is your favorite color?\"},\n {\"role\": \"assistant\", \"content\": \"As an AI assistant, I don't have a favorite color, But if I had to pick, it would be green because\"} # Prefill here\n ]\n)\n```\nimport anthropic\n\nclient = anthropic.Anthropic()\nresponse = client.messages.create(\n model=\"claude-3-5-sonnet-20240620\",\n max_tokens=1024,\n messages=[\n {\"role\": \"user\", \"content\": \"What is your favorite color?\"},\n {\"role\": \"assistant\", 
\"content\": \"As an AI assistant, I don't have a favorite color, But if I had to pick, it would be green because\"} # Prefill here\n ]\n)\n\n```\n\n\nGet started\n\n\nIf you’re new to Claude, start here to learn the essentials and make your first API call.\nIntro to ClaudeExplore Claude’s capabilities and development flow.QuickstartLearn how to make your first API call in minutes.Prompt LibraryExplore example prompts for inspiration.\nIntro to ClaudeExplore Claude’s capabilities and development flow.\n\nIntro to Claude\nExplore Claude’s capabilities and development flow.\nQuickstartLearn how to make your first API call in minutes.\n\nQuickstart\nLearn how to make your first API call in minutes.\nPrompt LibraryExplore example prompts for inspiration.\n\nPrompt Library\nExplore example prompts for inspiration.\n\n\nMore Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. 
Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbooks More Resources\nText capabilities and use casesClaude CookbooksMore Resources\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "Haiku: T-0.0",
"latencyMs": 1,
"tokenUsage": {},
@@ -40437,7 +40437,7 @@
"score": 0,
"namedScores": {},
"text": "The generated answer is partially correct but misses a key piece of information. It correctly explains how to pre-fill part of Claude's response by including the desired initial text in the \"assistant\" message. However, it fails to mention the specific API parameter \"max_tokens\" for generating a short response like a single multiple choice answer. While the generated answer does mention using max_tokens to limit response length, it doesn't specify setting it to a very small value like 1 for extremely short responses. This omission of a critical detail from the correct answer means the generated answer cannot be considered fully correct.\n---\nTo guide Claude's response by pre-filling part of it, you can include the desired initial text in the \"assistant\" message when making an API call. The API will then continue the response from where the pre-filled text leaves off. Here's an example:\n\n```python\nimport anthropic\n\nclient = anthropic.Anthropic()\nresponse = client.messages.create(\n model=\"claude-3-5-sonnet-20240620\",\n max_tokens=1024,\n messages=[\n {\"role\": \"user\", \"content\": \"What is your favorite color?\"},\n {\"role\": \"assistant\", \"content\": \"As an AI assistant, I don't have a favorite color, But if I had to pick, it would be green because\"} # Pre-filled text\n ]\n)\n```\n\nTo generate a short response, you can use the `max_tokens` parameter. In the example above, `max_tokens=1024` is used, but you can set this to a lower value to limit the length of the response. For instance, setting `max_tokens=50` would result in a much shorter completion.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n How can you guide Claude's response by pre-filling part of the response, and what API parameter is used to generate a short response in this case?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nHow to prefill Claude’s response\n\n\nTo prefill, include the desired initial text in the Assistant message (Claude’s response will continue from where the Assistant message leaves off):\nimport anthropic\n\nclient = anthropic.Anthropic()\nresponse = client.messages.create(\n model=\"claude-3-5-sonnet-20240620\",\n max_tokens=1024,\n messages=[\n {\"role\": \"user\", \"content\": \"What is your favorite color?\"},\n {\"role\": \"assistant\", \"content\": \"As an AI assistant, I don't have a favorite color, But if I had to pick, it would be green because\"} # Prefill here\n ]\n)\nimport anthropic\n\nclient = anthropic.Anthropic()\nresponse = client.messages.create(\n model=\"claude-3-5-sonnet-20240620\",\n max_tokens=1024,\n messages=[\n {\"role\": \"user\", \"content\": \"What is your favorite color?\"},\n {\"role\": \"assistant\", \"content\": \"As an AI assistant, I don't have a favorite color, But if I had to pick, it would be green because\"} # Prefill here\n ]\n)\nimport anthropic\n\nclient = anthropic.Anthropic()\nresponse = client.messages.create(\n model=\"claude-3-5-sonnet-20240620\",\n max_tokens=1024,\n messages=[\n {\"role\": \"user\", \"content\": \"What is your favorite color?\"},\n {\"role\": \"assistant\", \"content\": \"As an AI assistant, I don't have a favorite color, But if I had to pick, it would be green because\"} # Prefill here\n ]\n)\n```\nimport anthropic\n\nclient = anthropic.Anthropic()\nresponse = client.messages.create(\n model=\"claude-3-5-sonnet-20240620\",\n max_tokens=1024,\n messages=[\n {\"role\": \"user\", \"content\": \"What is your favorite color?\"},\n {\"role\": \"assistant\", 
\"content\": \"As an AI assistant, I don't have a favorite color, But if I had to pick, it would be green because\"} # Prefill here\n ]\n)\n\n```\n\n\nGet started\n\n\nIf you’re new to Claude, start here to learn the essentials and make your first API call.\nIntro to ClaudeExplore Claude’s capabilities and development flow.QuickstartLearn how to make your first API call in minutes.Prompt LibraryExplore example prompts for inspiration.\nIntro to ClaudeExplore Claude’s capabilities and development flow.\n\nIntro to Claude\nExplore Claude’s capabilities and development flow.\nQuickstartLearn how to make your first API call in minutes.\n\nQuickstart\nLearn how to make your first API call in minutes.\nPrompt LibraryExplore example prompts for inspiration.\n\nPrompt Library\nExplore example prompts for inspiration.\n\n\nMore Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. 
Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbook More Resources\nText capabilities and use casesClaude CookbookMore Resources\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n How can you guide Claude's response by pre-filling part of the response, and what API parameter is used to generate a short response in this case?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nHow to prefill Claude’s response\n\n\nTo prefill, include the desired initial text in the Assistant message (Claude’s response will continue from where the Assistant message leaves off):\nimport anthropic\n\nclient = anthropic.Anthropic()\nresponse = client.messages.create(\n model=\"claude-3-5-sonnet-20240620\",\n max_tokens=1024,\n messages=[\n {\"role\": \"user\", \"content\": \"What is your favorite color?\"},\n {\"role\": \"assistant\", \"content\": \"As an AI assistant, I don't have a favorite color, But if I had to pick, it would be green because\"} # Prefill here\n ]\n)\nimport anthropic\n\nclient = anthropic.Anthropic()\nresponse = client.messages.create(\n model=\"claude-3-5-sonnet-20240620\",\n max_tokens=1024,\n messages=[\n {\"role\": \"user\", \"content\": \"What is your favorite color?\"},\n {\"role\": \"assistant\", \"content\": \"As an AI assistant, I don't have a favorite color, But if I had to pick, it would be green because\"} # Prefill here\n ]\n)\nimport anthropic\n\nclient = anthropic.Anthropic()\nresponse = client.messages.create(\n model=\"claude-3-5-sonnet-20240620\",\n max_tokens=1024,\n messages=[\n {\"role\": \"user\", \"content\": \"What is your favorite color?\"},\n {\"role\": \"assistant\", \"content\": \"As an AI assistant, I don't have a favorite color, But if I had to pick, it would be green because\"} # Prefill here\n ]\n)\n```\nimport anthropic\n\nclient = anthropic.Anthropic()\nresponse = client.messages.create(\n model=\"claude-3-5-sonnet-20240620\",\n max_tokens=1024,\n messages=[\n {\"role\": \"user\", \"content\": \"What is your favorite color?\"},\n {\"role\": \"assistant\", 
\"content\": \"As an AI assistant, I don't have a favorite color, But if I had to pick, it would be green because\"} # Prefill here\n ]\n)\n\n```\n\n\nGet started\n\n\nIf you’re new to Claude, start here to learn the essentials and make your first API call.\nIntro to ClaudeExplore Claude’s capabilities and development flow.QuickstartLearn how to make your first API call in minutes.Prompt LibraryExplore example prompts for inspiration.\nIntro to ClaudeExplore Claude’s capabilities and development flow.\n\nIntro to Claude\nExplore Claude’s capabilities and development flow.\nQuickstartLearn how to make your first API call in minutes.\n\nQuickstart\nLearn how to make your first API call in minutes.\nPrompt LibraryExplore example prompts for inspiration.\n\nPrompt Library\nExplore example prompts for inspiration.\n\n\nMore Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. 
Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbooks More Resources\nText capabilities and use casesClaude CookbooksMore Resources\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "3.5 Sonnet: T-0.0",
"latencyMs": 5135,
"tokenUsage": {
@@ -41143,15 +41143,15 @@
"pass": false,
"score": 0,
"namedScores": {},
- "text": "The Generated Answer is partially correct but misses a key element mentioned in the Correct Answer. While it correctly identifies the Claude Cookbook as an interactive resource for learning Claude's capabilities, it fails to mention the Developer Console and its prompt generator tool, which is specifically highlighted in the Correct Answer. \n\nThe Generated Answer instead mentions the \"More Resources\" section and a \"Prompt Engineering Guide\", which are not mentioned in the Correct Answer and may not be as interactive as the Developer Console's prompt generator tool.\n\nAlthough both answers discuss ways to learn about Claude's capabilities, including PDF handling and embeddings, the Generated Answer doesn't fully align with the specific interactive methods outlined in the Correct Answer.\n---\nTwo interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings, are:\n\n1. Explore the \"Claude Cookbook\" which provides practical examples and hands-on tutorials, including sections on \"PDF Upload & Summarization\" and \"Embeddings with VoyageAI\".\n\n2. Review the \"More Resources\" section, which includes a \"Prompt Engineering Guide\" to master the art of prompt crafting, and the \"API Documentation\" which covers how to interact with Claude via the API, including details on request formats, response handling, and troubleshooting.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nClaude Cookbook\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\n\nMore Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. 
Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbook More Resources\nText capabilities and use casesClaude CookbookMore Resources\n\n\nGet started with Claude\n\n\nIf you’re ready to start exploring what Claude can do for you, let’s dive in! Whether you’re a developer looking to integrate Claude into your applications or a user wanting to experience the power of AI firsthand, we’ve got you covered.\nCheck out our quickstart guide for step-by-step instructions on how to get up and running with Claude. You’ll learn how to create an account, obtain API keys, and start interacting with our models in no time. 
You can also head over to claude.ai or our web Console to start experimenting with Claude right away!\nIf you have any questions or need assistance, don’t hesitate to reach out to our support team or consult the Discord community.\nTicket RoutingSecurity and compliancexlinkedin\nTicket RoutingSecurity and compliance\nxlinkedin\nModel names Model comparison Prompt and output performance Legacy models Legacy model comparison Get started with Claude\nModel namesModel comparisonPrompt and output performanceLegacy modelsLegacy model comparisonGet started with Claude\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "text": "The Generated Answer is partially correct but misses a key element mentioned in the Correct Answer. While it correctly identifies the Claude Cookbooks as an interactive resource for learning Claude's capabilities, it fails to mention the Developer Console and its prompt generator tool, which is specifically highlighted in the Correct Answer. \n\nThe Generated Answer instead mentions the \"More Resources\" section and a \"Prompt Engineering Guide\", which are not mentioned in the Correct Answer and may not be as interactive as the Developer Console's prompt generator tool.\n\nAlthough both answers discuss ways to learn about Claude's capabilities, including PDF handling and embeddings, the Generated Answer doesn't fully align with the specific interactive methods outlined in the Correct Answer.\n---\nTwo interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings, are:\n\n1. Explore the \"Claude Cookbooks\" which provides practical examples and hands-on tutorials, including sections on \"PDF Upload & Summarization\" and \"Embeddings with VoyageAI\".\n\n2. Review the \"More Resources\" section, which includes a \"Prompt Engineering Guide\" to master the art of prompt crafting, and the \"API Documentation\" which covers how to interact with Claude via the API, including details on request formats, response handling, and troubleshooting.",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nClaude Cookbooks\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\n\nMore Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. 
Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbooks More Resources\nText capabilities and use casesClaude CookbooksMore Resources\n\n\nGet started with Claude\n\n\nIf you’re ready to start exploring what Claude can do for you, let’s dive in! Whether you’re a developer looking to integrate Claude into your applications or a user wanting to experience the power of AI firsthand, we’ve got you covered.\nCheck out our quickstart guide for step-by-step instructions on how to get up and running with Claude. You’ll learn how to create an account, obtain API keys, and start interacting with our models in no time. 
You can also head over to claude.ai or our web Console to start experimenting with Claude right away!\nIf you have any questions or need assistance, don’t hesitate to reach out to our support team or consult the Discord community.\nTicket RoutingSecurity and compliancexlinkedin\nTicket RoutingSecurity and compliance\nxlinkedin\nModel names Model comparison Prompt and output performance Legacy models Legacy model comparison Get started with Claude\nModel namesModel comparisonPrompt and output performanceLegacy modelsLegacy model comparisonGet started with Claude\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "Haiku: T-0.0",
"latencyMs": 2,
"tokenUsage": {},
"gradingResult": {
"pass": false,
"score": 0,
- "reason": "The Generated Answer is partially correct but misses a key element mentioned in the Correct Answer. While it correctly identifies the Claude Cookbook as an interactive resource for learning Claude's capabilities, it fails to mention the Developer Console and its prompt generator tool, which is specifically highlighted in the Correct Answer. \n\nThe Generated Answer instead mentions the \"More Resources\" section and a \"Prompt Engineering Guide\", which are not mentioned in the Correct Answer and may not be as interactive as the Developer Console's prompt generator tool.\n\nAlthough both answers discuss ways to learn about Claude's capabilities, including PDF handling and embeddings, the Generated Answer doesn't fully align with the specific interactive methods outlined in the Correct Answer.",
+ "reason": "The Generated Answer is partially correct but misses a key element mentioned in the Correct Answer. While it correctly identifies the Claude Cookbooks as an interactive resource for learning Claude's capabilities, it fails to mention the Developer Console and its prompt generator tool, which is specifically highlighted in the Correct Answer. \n\nThe Generated Answer instead mentions the \"More Resources\" section and a \"Prompt Engineering Guide\", which are not mentioned in the Correct Answer and may not be as interactive as the Developer Console's prompt generator tool.\n\nAlthough both answers discuss ways to learn about Claude's capabilities, including PDF handling and embeddings, the Generated Answer doesn't fully align with the specific interactive methods outlined in the Correct Answer.",
"namedScores": {},
"tokensUsed": {
"total": 0,
@@ -41162,7 +41162,7 @@
{
"pass": false,
"score": 0,
- "reason": "The Generated Answer is partially correct but misses a key element mentioned in the Correct Answer. While it correctly identifies the Claude Cookbook as an interactive resource for learning Claude's capabilities, it fails to mention the Developer Console and its prompt generator tool, which is specifically highlighted in the Correct Answer. \n\nThe Generated Answer instead mentions the \"More Resources\" section and a \"Prompt Engineering Guide\", which are not mentioned in the Correct Answer and may not be as interactive as the Developer Console's prompt generator tool.\n\nAlthough both answers discuss ways to learn about Claude's capabilities, including PDF handling and embeddings, the Generated Answer doesn't fully align with the specific interactive methods outlined in the Correct Answer.",
+ "reason": "The Generated Answer is partially correct but misses a key element mentioned in the Correct Answer. While it correctly identifies the Claude Cookbooks as an interactive resource for learning Claude's capabilities, it fails to mention the Developer Console and its prompt generator tool, which is specifically highlighted in the Correct Answer. \n\nThe Generated Answer instead mentions the \"More Resources\" section and a \"Prompt Engineering Guide\", which are not mentioned in the Correct Answer and may not be as interactive as the Developer Console's prompt generator tool.\n\nAlthough both answers discuss ways to learn about Claude's capabilities, including PDF handling and embeddings, the Generated Answer doesn't fully align with the specific interactive methods outlined in the Correct Answer.",
"assertion": {
"type": "python",
"value": "file://eval_end_to_end.py"
@@ -41177,8 +41177,8 @@
"pass": false,
"score": 0,
"namedScores": {},
- "text": "The Generated Answer is partially correct but misses a key piece of information. It correctly identifies the Claude Cookbook as an interactive way to learn about Claude's capabilities, including uploading PDFs and generating embeddings. However, it fails to mention the Developer Console and its prompt generator tool, which is specifically mentioned in the Correct Answer as the second interactive learning method.\n\nInstead, the Generated Answer suggests \"Experimenting directly with Claude\" via claude.ai or the web Console as the second method. While this is a way to interact with Claude, it's not as specific or tailored to learning about capabilities as the Developer Console's prompt generator tool mentioned in the Correct Answer.\n\nDue to this omission of a critical piece of information (the Developer Console and its prompt generator), and the substitution of a less specific alternative, the Generated Answer cannot be considered fully correct.\n---\nBased on the provided context, two interactive ways to learn how to use Claude's capabilities, including uploading PDFs and generating embeddings, are:\n\n1. Using the Claude Cookbook: This collection of Jupyter notebooks offers practical examples and hands-on tutorials. Specifically, it includes a section on \"PDF Upload & Summarization\" where you can learn how to upload PDFs and have Claude summarize their content. It also has a section on \"Embeddings with VoyageAI\" where you can explore how to create and use embeddings for advanced text similarity and search tasks.\n\n2. Experimenting directly with Claude: The context mentions that you can \"head over to claude.ai or our web Console to start experimenting with Claude right away!\" This allows for direct interaction with Claude to test various capabilities, including potentially uploading PDFs and working with embeddings.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nClaude Cookbook\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\n\nMore Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. 
Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbook More Resources\nText capabilities and use casesClaude CookbookMore Resources\n\n\nGet started with Claude\n\n\nIf you’re ready to start exploring what Claude can do for you, let’s dive in! Whether you’re a developer looking to integrate Claude into your applications or a user wanting to experience the power of AI firsthand, we’ve got you covered.\nCheck out our quickstart guide for step-by-step instructions on how to get up and running with Claude. You’ll learn how to create an account, obtain API keys, and start interacting with our models in no time. 
You can also head over to claude.ai or our web Console to start experimenting with Claude right away!\nIf you have any questions or need assistance, don’t hesitate to reach out to our support team or consult the Discord community.\nTicket RoutingSecurity and compliancexlinkedin\nTicket RoutingSecurity and compliance\nxlinkedin\nModel names Model comparison Prompt and output performance Legacy models Legacy model comparison Get started with Claude\nModel namesModel comparisonPrompt and output performanceLegacy modelsLegacy model comparisonGet started with Claude\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "text": "The Generated Answer is partially correct but misses a key piece of information. It correctly identifies the Claude Cookbooks as an interactive way to learn about Claude's capabilities, including uploading PDFs and generating embeddings. However, it fails to mention the Developer Console and its prompt generator tool, which is specifically mentioned in the Correct Answer as the second interactive learning method.\n\nInstead, the Generated Answer suggests \"Experimenting directly with Claude\" via claude.ai or the web Console as the second method. While this is a way to interact with Claude, it's not as specific or tailored to learning about capabilities as the Developer Console's prompt generator tool mentioned in the Correct Answer.\n\nDue to this omission of a critical piece of information (the Developer Console and its prompt generator), and the substitution of a less specific alternative, the Generated Answer cannot be considered fully correct.\n---\nBased on the provided context, two interactive ways to learn how to use Claude's capabilities, including uploading PDFs and generating embeddings, are:\n\n1. Using the Claude Cookbooks: This collection of Jupyter notebooks offers practical examples and hands-on tutorials. Specifically, it includes a section on \"PDF Upload & Summarization\" where you can learn how to upload PDFs and have Claude summarize their content. It also has a section on \"Embeddings with VoyageAI\" where you can explore how to create and use embeddings for advanced text similarity and search tasks.\n\n2. Experimenting directly with Claude: The context mentions that you can \"head over to claude.ai or our web Console to start experimenting with Claude right away!\" This allows for direct interaction with Claude to test various capabilities, including potentially uploading PDFs and working with embeddings.",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nClaude Cookbooks\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\n\nMore Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. 
Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbooks More Resources\nText capabilities and use casesClaude CookbooksMore Resources\n\n\nGet started with Claude\n\n\nIf you’re ready to start exploring what Claude can do for you, let’s dive in! Whether you’re a developer looking to integrate Claude into your applications or a user wanting to experience the power of AI firsthand, we’ve got you covered.\nCheck out our quickstart guide for step-by-step instructions on how to get up and running with Claude. You’ll learn how to create an account, obtain API keys, and start interacting with our models in no time. 
You can also head over to claude.ai or our web Console to start experimenting with Claude right away!\nIf you have any questions or need assistance, don’t hesitate to reach out to our support team or consult the Discord community.\nTicket RoutingSecurity and compliancexlinkedin\nTicket RoutingSecurity and compliance\nxlinkedin\nModel names Model comparison Prompt and output performance Legacy models Legacy model comparison Get started with Claude\nModel namesModel comparisonPrompt and output performanceLegacy modelsLegacy model comparisonGet started with Claude\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "3.5 Sonnet: T-0.0",
"latencyMs": 4115,
"tokenUsage": {
@@ -41189,7 +41189,7 @@
"gradingResult": {
"pass": false,
"score": 0,
- "reason": "The Generated Answer is partially correct but misses a key piece of information. It correctly identifies the Claude Cookbook as an interactive way to learn about Claude's capabilities, including uploading PDFs and generating embeddings. However, it fails to mention the Developer Console and its prompt generator tool, which is specifically mentioned in the Correct Answer as the second interactive learning method.\n\nInstead, the Generated Answer suggests \"Experimenting directly with Claude\" via claude.ai or the web Console as the second method. While this is a way to interact with Claude, it's not as specific or tailored to learning about capabilities as the Developer Console's prompt generator tool mentioned in the Correct Answer.\n\nDue to this omission of a critical piece of information (the Developer Console and its prompt generator), and the substitution of a less specific alternative, the Generated Answer cannot be considered fully correct.",
+ "reason": "The Generated Answer is partially correct but misses a key piece of information. It correctly identifies the Claude Cookbooks as an interactive way to learn about Claude's capabilities, including uploading PDFs and generating embeddings. However, it fails to mention the Developer Console and its prompt generator tool, which is specifically mentioned in the Correct Answer as the second interactive learning method.\n\nInstead, the Generated Answer suggests \"Experimenting directly with Claude\" via claude.ai or the web Console as the second method. While this is a way to interact with Claude, it's not as specific or tailored to learning about capabilities as the Developer Console's prompt generator tool mentioned in the Correct Answer.\n\nDue to this omission of a critical piece of information (the Developer Console and its prompt generator), and the substitution of a less specific alternative, the Generated Answer cannot be considered fully correct.",
"namedScores": {},
"tokensUsed": {
"total": 0,
@@ -41200,7 +41200,7 @@
{
"pass": false,
"score": 0,
- "reason": "The Generated Answer is partially correct but misses a key piece of information. It correctly identifies the Claude Cookbook as an interactive way to learn about Claude's capabilities, including uploading PDFs and generating embeddings. However, it fails to mention the Developer Console and its prompt generator tool, which is specifically mentioned in the Correct Answer as the second interactive learning method.\n\nInstead, the Generated Answer suggests \"Experimenting directly with Claude\" via claude.ai or the web Console as the second method. While this is a way to interact with Claude, it's not as specific or tailored to learning about capabilities as the Developer Console's prompt generator tool mentioned in the Correct Answer.\n\nDue to this omission of a critical piece of information (the Developer Console and its prompt generator), and the substitution of a less specific alternative, the Generated Answer cannot be considered fully correct.",
+ "reason": "The Generated Answer is partially correct but misses a key piece of information. It correctly identifies the Claude Cookbooks as an interactive way to learn about Claude's capabilities, including uploading PDFs and generating embeddings. However, it fails to mention the Developer Console and its prompt generator tool, which is specifically mentioned in the Correct Answer as the second interactive learning method.\n\nInstead, the Generated Answer suggests \"Experimenting directly with Claude\" via claude.ai or the web Console as the second method. While this is a way to interact with Claude, it's not as specific or tailored to learning about capabilities as the Developer Console's prompt generator tool mentioned in the Correct Answer.\n\nDue to this omission of a critical piece of information (the Developer Console and its prompt generator), and the substitution of a less specific alternative, the Generated Answer cannot be considered fully correct.",
"assertion": {
"type": "python",
"value": "file://eval_end_to_end.py"
@@ -41215,8 +41215,8 @@
"pass": false,
"score": 0,
"namedScores": {},
- "text": "The Generated Answer is partially correct but misses a key element and includes incorrect information. It correctly mentions the Claude Cookbook as an interactive way to learn Claude's capabilities, including uploading PDFs and working with embeddings. However, it fails to mention the Developer Console and its prompt generator tool, which is a critical piece of information from the Correct Answer. Instead, it incorrectly includes Claude for Sheets usage examples, which is not mentioned in the Correct Answer as one of the two interactive ways to learn Claude's capabilities. Due to this significant omission and incorrect addition, the Generated Answer cannot be considered fully correct.\n---\nTwo interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings, are:\n\n1. The Claude Cookbook, which provides practical examples and hands-on tutorials, including how to upload PDFs and have Claude summarize their content, and how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\n2. The Claude for Sheets usage examples, which demonstrate how to integrate the Claude AI model into Google Sheets, enabling users to leverage the model's capabilities within the spreadsheet environment for tasks such as data analysis and text generation.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Claude for Sheets usage examples\n\nText\n Claude for Sheets usage examples\n\n\n \n\nSummary: \n Claude for Sheets usage examples provide demonstrations of how to integrate the Claude AI model into Google Sheets, enabling users to leverage the model's capabilities within the spreadsheet environment for tasks such as data analysis, text generation, and more. \n \n\n \n Claude Cookbook\n\nText\n Claude Cookbook\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with 
VoyageAI for advanced text similarity and search tasks.\n \n\nSummary: \n The Claude Cookbook provides practical examples and hands-on tutorials, including how to upload PDFs and have Claude summarize their content, how to extend Claude's capabilities by integrating external tools and functions, and how to create and use embeddings with VoyageAI for advanced text similarity and search tasks. \n \n\n \n Further information\n\nText\n Further information\n\n\nFor more information regarding this extension, see the Claude for Sheets Google Workspace Marketplace overview page.\nEmbeddingsVisionxlinkedin\nEmbeddingsVision\nxlinkedin\nWhy use Claude for Sheets? Get started with Claude for Sheets Install Claude for Sheets Enter your first prompt Advanced use Optional function parameters Claude for Sheets usage examples Prompt engineering interactive tutorial Prompt engineering workflow Claude for Sheets workbook template Troubleshooting Further information\nWhy use Claude for Sheets?Get started with Claude for SheetsInstall Claude for SheetsEnter your first promptAdvanced useOptional function parametersClaude for Sheets usage examplesPrompt engineering interactive tutorialPrompt engineering workflowClaude for Sheets workbook templateTroubleshootingFurther information\n \n\nSummary: \n The \"Further information\" section provides additional resources for the Claude for Sheets extension, including a Google Workspace Marketplace overview page. It also lists various related topics such as Embeddings, Vision, and LinkedIn, which may be of interest to users. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "text": "The Generated Answer is partially correct but misses a key element and includes incorrect information. It correctly mentions the Claude Cookbooks as an interactive way to learn Claude's capabilities, including uploading PDFs and working with embeddings. However, it fails to mention the Developer Console and its prompt generator tool, which is a critical piece of information from the Correct Answer. Instead, it incorrectly includes Claude for Sheets usage examples, which is not mentioned in the Correct Answer as one of the two interactive ways to learn Claude's capabilities. Due to this significant omission and incorrect addition, the Generated Answer cannot be considered fully correct.\n---\nTwo interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings, are:\n\n1. The Claude Cookbooks, which provides practical examples and hands-on tutorials, including how to upload PDFs and have Claude summarize their content, and how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\n2. The Claude for Sheets usage examples, which demonstrate how to integrate the Claude AI model into Google Sheets, enabling users to leverage the model's capabilities within the spreadsheet environment for tasks such as data analysis and text generation.",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Claude for Sheets usage examples\n\nText\n Claude for Sheets usage examples\n\n\n \n\nSummary: \n Claude for Sheets usage examples provide demonstrations of how to integrate the Claude AI model into Google Sheets, enabling users to leverage the model's capabilities within the spreadsheet environment for tasks such as data analysis, text generation, and more. \n \n\n \n Claude Cookbooks\n\nText\n Claude Cookbooks\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with 
VoyageAI for advanced text similarity and search tasks.\n \n\nSummary: \n The Claude Cookbooks provides practical examples and hands-on tutorials, including how to upload PDFs and have Claude summarize their content, how to extend Claude's capabilities by integrating external tools and functions, and how to create and use embeddings with VoyageAI for advanced text similarity and search tasks. \n \n\n \n Further information\n\nText\n Further information\n\n\nFor more information regarding this extension, see the Claude for Sheets Google Workspace Marketplace overview page.\nEmbeddingsVisionxlinkedin\nEmbeddingsVision\nxlinkedin\nWhy use Claude for Sheets? Get started with Claude for Sheets Install Claude for Sheets Enter your first prompt Advanced use Optional function parameters Claude for Sheets usage examples Prompt engineering interactive tutorial Prompt engineering workflow Claude for Sheets workbook template Troubleshooting Further information\nWhy use Claude for Sheets?Get started with Claude for SheetsInstall Claude for SheetsEnter your first promptAdvanced useOptional function parametersClaude for Sheets usage examplesPrompt engineering interactive tutorialPrompt engineering workflowClaude for Sheets workbook templateTroubleshootingFurther information\n \n\nSummary: \n The \"Further information\" section provides additional resources for the Claude for Sheets extension, including a Google Workspace Marketplace overview page. It also lists various related topics such as Embeddings, Vision, and LinkedIn, which may be of interest to users. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "Haiku: T-0.0",
"latencyMs": 1389,
"tokenUsage": {
@@ -41227,7 +41227,7 @@
"gradingResult": {
"pass": false,
"score": 0,
- "reason": "The Generated Answer is partially correct but misses a key element and includes incorrect information. It correctly mentions the Claude Cookbook as an interactive way to learn Claude's capabilities, including uploading PDFs and working with embeddings. However, it fails to mention the Developer Console and its prompt generator tool, which is a critical piece of information from the Correct Answer. Instead, it incorrectly includes Claude for Sheets usage examples, which is not mentioned in the Correct Answer as one of the two interactive ways to learn Claude's capabilities. Due to this significant omission and incorrect addition, the Generated Answer cannot be considered fully correct.",
+ "reason": "The Generated Answer is partially correct but misses a key element and includes incorrect information. It correctly mentions the Claude Cookbooks as an interactive way to learn Claude's capabilities, including uploading PDFs and working with embeddings. However, it fails to mention the Developer Console and its prompt generator tool, which is a critical piece of information from the Correct Answer. Instead, it incorrectly includes Claude for Sheets usage examples, which is not mentioned in the Correct Answer as one of the two interactive ways to learn Claude's capabilities. Due to this significant omission and incorrect addition, the Generated Answer cannot be considered fully correct.",
"namedScores": {},
"tokensUsed": {
"total": 0,
@@ -41238,7 +41238,7 @@
{
"pass": false,
"score": 0,
- "reason": "The Generated Answer is partially correct but misses a key element and includes incorrect information. It correctly mentions the Claude Cookbook as an interactive way to learn Claude's capabilities, including uploading PDFs and working with embeddings. However, it fails to mention the Developer Console and its prompt generator tool, which is a critical piece of information from the Correct Answer. Instead, it incorrectly includes Claude for Sheets usage examples, which is not mentioned in the Correct Answer as one of the two interactive ways to learn Claude's capabilities. Due to this significant omission and incorrect addition, the Generated Answer cannot be considered fully correct.",
+ "reason": "The Generated Answer is partially correct but misses a key element and includes incorrect information. It correctly mentions the Claude Cookbooks as an interactive way to learn Claude's capabilities, including uploading PDFs and working with embeddings. However, it fails to mention the Developer Console and its prompt generator tool, which is a critical piece of information from the Correct Answer. Instead, it incorrectly includes Claude for Sheets usage examples, which is not mentioned in the Correct Answer as one of the two interactive ways to learn Claude's capabilities. Due to this significant omission and incorrect addition, the Generated Answer cannot be considered fully correct.",
"assertion": {
"type": "python",
"value": "file://eval_end_to_end.py"
@@ -41253,8 +41253,8 @@
"pass": false,
"score": 0,
"namedScores": {},
- "text": "The Generated Answer is partially correct but misses a key element mentioned in the Correct Answer. Both answers correctly identify the Claude Cookbook with its interactive Jupyter notebooks as one way to learn about Claude's capabilities, including uploading PDFs and working with embeddings. However, the Generated Answer fails to mention the Developer Console and its prompt generator tool, which is the second interactive method specified in the Correct Answer. Instead, it mentions a \"prompt engineering interactive tutorial\" for Google Sheets, which is not part of the Correct Answer. This omission of a critical piece of information (the Developer Console) and the inclusion of an unmentioned element makes the Generated Answer incorrect overall.\n---\nBased on the provided context, two interactive ways to learn how to use Claude's capabilities are:\n\n1. Exploring the Claude Cookbook: This collection of Jupyter notebooks offers practical examples and hands-on tutorials, including how to upload PDFs and have Claude summarize their content, as well as how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\n2. Using the prompt engineering interactive tutorial: While not explicitly described in detail, the context mentions a \"Prompt engineering interactive tutorial\" as part of the Claude for Sheets resources, which likely provides an interactive way to learn about Claude's capabilities within the Google Sheets environment.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Claude for Sheets usage examples\n\nText\n Claude for Sheets usage examples\n\n\n \n\nSummary: \n Claude for Sheets usage examples provide demonstrations of how to integrate the Claude AI model into Google Sheets, enabling users to leverage the model's capabilities within the spreadsheet environment for tasks such as data analysis, text generation, and more. \n \n\n \n Claude Cookbook\n\nText\n Claude Cookbook\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with 
VoyageAI for advanced text similarity and search tasks.\n \n\nSummary: \n The Claude Cookbook provides practical examples and hands-on tutorials, including how to upload PDFs and have Claude summarize their content, how to extend Claude's capabilities by integrating external tools and functions, and how to create and use embeddings with VoyageAI for advanced text similarity and search tasks. \n \n\n \n Further information\n\nText\n Further information\n\n\nFor more information regarding this extension, see the Claude for Sheets Google Workspace Marketplace overview page.\nEmbeddingsVisionxlinkedin\nEmbeddingsVision\nxlinkedin\nWhy use Claude for Sheets? Get started with Claude for Sheets Install Claude for Sheets Enter your first prompt Advanced use Optional function parameters Claude for Sheets usage examples Prompt engineering interactive tutorial Prompt engineering workflow Claude for Sheets workbook template Troubleshooting Further information\nWhy use Claude for Sheets?Get started with Claude for SheetsInstall Claude for SheetsEnter your first promptAdvanced useOptional function parametersClaude for Sheets usage examplesPrompt engineering interactive tutorialPrompt engineering workflowClaude for Sheets workbook templateTroubleshootingFurther information\n \n\nSummary: \n The \"Further information\" section provides additional resources for the Claude for Sheets extension, including a Google Workspace Marketplace overview page. It also lists various related topics such as Embeddings, Vision, and LinkedIn, which may be of interest to users. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "text": "The Generated Answer is partially correct but misses a key element mentioned in the Correct Answer. Both answers correctly identify the Claude Cookbooks with its interactive Jupyter notebooks as one way to learn about Claude's capabilities, including uploading PDFs and working with embeddings. However, the Generated Answer fails to mention the Developer Console and its prompt generator tool, which is the second interactive method specified in the Correct Answer. Instead, it mentions a \"prompt engineering interactive tutorial\" for Google Sheets, which is not part of the Correct Answer. This omission of a critical piece of information (the Developer Console) and the inclusion of an unmentioned element makes the Generated Answer incorrect overall.\n---\nBased on the provided context, two interactive ways to learn how to use Claude's capabilities are:\n\n1. Exploring the Claude Cookbooks: This collection of Jupyter notebooks offers practical examples and hands-on tutorials, including how to upload PDFs and have Claude summarize their content, as well as how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\n2. Using the prompt engineering interactive tutorial: While not explicitly described in detail, the context mentions a \"Prompt engineering interactive tutorial\" as part of the Claude for Sheets resources, which likely provides an interactive way to learn about Claude's capabilities within the Google Sheets environment.",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Claude for Sheets usage examples\n\nText\n Claude for Sheets usage examples\n\n\n \n\nSummary: \n Claude for Sheets usage examples provide demonstrations of how to integrate the Claude AI model into Google Sheets, enabling users to leverage the model's capabilities within the spreadsheet environment for tasks such as data analysis, text generation, and more. \n \n\n \n Claude Cookbooks\n\nText\n Claude Cookbooks\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with 
VoyageAI for advanced text similarity and search tasks.\n \n\nSummary: \n The Claude Cookbooks provides practical examples and hands-on tutorials, including how to upload PDFs and have Claude summarize their content, how to extend Claude's capabilities by integrating external tools and functions, and how to create and use embeddings with VoyageAI for advanced text similarity and search tasks. \n \n\n \n Further information\n\nText\n Further information\n\n\nFor more information regarding this extension, see the Claude for Sheets Google Workspace Marketplace overview page.\nEmbeddingsVisionxlinkedin\nEmbeddingsVision\nxlinkedin\nWhy use Claude for Sheets? Get started with Claude for Sheets Install Claude for Sheets Enter your first prompt Advanced use Optional function parameters Claude for Sheets usage examples Prompt engineering interactive tutorial Prompt engineering workflow Claude for Sheets workbook template Troubleshooting Further information\nWhy use Claude for Sheets?Get started with Claude for SheetsInstall Claude for SheetsEnter your first promptAdvanced useOptional function parametersClaude for Sheets usage examplesPrompt engineering interactive tutorialPrompt engineering workflowClaude for Sheets workbook templateTroubleshootingFurther information\n \n\nSummary: \n The \"Further information\" section provides additional resources for the Claude for Sheets extension, including a Google Workspace Marketplace overview page. It also lists various related topics such as Embeddings, Vision, and LinkedIn, which may be of interest to users. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "3.5 Sonnet: T-0.0",
"latencyMs": 3983,
"tokenUsage": {
@@ -41265,7 +41265,7 @@
"gradingResult": {
"pass": false,
"score": 0,
- "reason": "The Generated Answer is partially correct but misses a key element mentioned in the Correct Answer. Both answers correctly identify the Claude Cookbook with its interactive Jupyter notebooks as one way to learn about Claude's capabilities, including uploading PDFs and working with embeddings. However, the Generated Answer fails to mention the Developer Console and its prompt generator tool, which is the second interactive method specified in the Correct Answer. Instead, it mentions a \"prompt engineering interactive tutorial\" for Google Sheets, which is not part of the Correct Answer. This omission of a critical piece of information (the Developer Console) and the inclusion of an unmentioned element makes the Generated Answer incorrect overall.",
+ "reason": "The Generated Answer is partially correct but misses a key element mentioned in the Correct Answer. Both answers correctly identify the Claude Cookbooks with its interactive Jupyter notebooks as one way to learn about Claude's capabilities, including uploading PDFs and working with embeddings. However, the Generated Answer fails to mention the Developer Console and its prompt generator tool, which is the second interactive method specified in the Correct Answer. Instead, it mentions a \"prompt engineering interactive tutorial\" for Google Sheets, which is not part of the Correct Answer. This omission of a critical piece of information (the Developer Console) and the inclusion of an unmentioned element makes the Generated Answer incorrect overall.",
"namedScores": {},
"tokensUsed": {
"total": 0,
@@ -41276,7 +41276,7 @@
{
"pass": false,
"score": 0,
- "reason": "The Generated Answer is partially correct but misses a key element mentioned in the Correct Answer. Both answers correctly identify the Claude Cookbook with its interactive Jupyter notebooks as one way to learn about Claude's capabilities, including uploading PDFs and working with embeddings. However, the Generated Answer fails to mention the Developer Console and its prompt generator tool, which is the second interactive method specified in the Correct Answer. Instead, it mentions a \"prompt engineering interactive tutorial\" for Google Sheets, which is not part of the Correct Answer. This omission of a critical piece of information (the Developer Console) and the inclusion of an unmentioned element makes the Generated Answer incorrect overall.",
+ "reason": "The Generated Answer is partially correct but misses a key element mentioned in the Correct Answer. Both answers correctly identify the Claude Cookbooks with its interactive Jupyter notebooks as one way to learn about Claude's capabilities, including uploading PDFs and working with embeddings. However, the Generated Answer fails to mention the Developer Console and its prompt generator tool, which is the second interactive method specified in the Correct Answer. Instead, it mentions a \"prompt engineering interactive tutorial\" for Google Sheets, which is not part of the Correct Answer. This omission of a critical piece of information (the Developer Console) and the inclusion of an unmentioned element makes the Generated Answer incorrect overall.",
"assertion": {
"type": "python",
"value": "file://eval_end_to_end.py"
@@ -41291,8 +41291,8 @@
"pass": true,
"score": 1,
"namedScores": {},
- "text": "Two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings, are:\n\n1. The Claude Cookbook, which provides Jupyter notebooks that demonstrate how to upload PDFs and have Claude summarize their content, as well as how to create and use embeddings with VoyageAI.\n\n2. The Anthropic Developer Console, which includes the Workbench and prompt generator tool, allowing you to enjoy easier and more powerful prompting in your browser to explore Claude's capabilities.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Claude Cookbook\n\nClaude Cookbook\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n \n \n\n \n Develop with Claude\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API 
and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n \n\n \n Get started\n\nGet started\n\n\nIf you’re new to Claude, start here to learn the essentials and make your first API call.\nIntro to ClaudeExplore Claude’s capabilities and development flow.QuickstartLearn how to make your first API call in minutes.Prompt LibraryExplore example prompts for inspiration.\nIntro to ClaudeExplore Claude’s capabilities and development flow.\n\nIntro to Claude\nExplore Claude’s capabilities and development flow.\nQuickstartLearn how to make your first API call in minutes.\n\nQuickstart\nLearn how to make your first API call in minutes.\nPrompt LibraryExplore example prompts for inspiration.\n\nPrompt Library\nExplore example prompts for inspiration.\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "text": "Two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings, are:\n\n1. The Claude Cookbooks, which provides Jupyter notebooks that demonstrate how to upload PDFs and have Claude summarize their content, as well as how to create and use embeddings with VoyageAI.\n\n2. The Anthropic Developer Console, which includes the Workbench and prompt generator tool, allowing you to enjoy easier and more powerful prompting in your browser to explore Claude's capabilities.",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Claude Cookbooks\n\nClaude Cookbooks\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n \n \n\n \n Develop with Claude\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude 
API and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n \n\n \n Get started\n\nGet started\n\n\nIf you’re new to Claude, start here to learn the essentials and make your first API call.\nIntro to ClaudeExplore Claude’s capabilities and development flow.QuickstartLearn how to make your first API call in minutes.Prompt LibraryExplore example prompts for inspiration.\nIntro to ClaudeExplore Claude’s capabilities and development flow.\n\nIntro to Claude\nExplore Claude’s capabilities and development flow.\nQuickstartLearn how to make your first API call in minutes.\n\nQuickstart\nLearn how to make your first API call in minutes.\nPrompt LibraryExplore example prompts for inspiration.\n\nPrompt Library\nExplore example prompts for inspiration.\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "Haiku: T-0.0",
"latencyMs": 1146,
"tokenUsage": {
@@ -41314,7 +41314,7 @@
{
"pass": true,
"score": 1,
- "reason": "The Generated Answer is essentially correct and aligns well with the Correct Answer. Both answers mention the two key interactive ways to learn Claude's capabilities:\n\n1. The Claude Cookbook with its interactive Jupyter notebooks for demonstrating PDF uploads, embeddings, and other features.\n\n2. The Developer Console, which includes tools for easier and more powerful prompting.\n\nThe Generated Answer provides slightly more detail about the Cookbook mentioning VoyageAI for embeddings and summarizing PDF content, while the Correct Answer is more general. The Generated Answer also specifies the Workbench as part of the Developer Console, which isn't mentioned in the Correct Answer but doesn't contradict it.\n\nOverall, the substance and key points of both answers are the same, and there are no critical pieces of information missing or contradictions between the two.",
+ "reason": "The Generated Answer is essentially correct and aligns well with the Correct Answer. Both answers mention the two key interactive ways to learn Claude's capabilities:\n\n1. The Claude Cookbooks with its interactive Jupyter notebooks for demonstrating PDF uploads, embeddings, and other features.\n\n2. The Developer Console, which includes tools for easier and more powerful prompting.\n\nThe Generated Answer provides slightly more detail about the Cookbook mentioning VoyageAI for embeddings and summarizing PDF content, while the Correct Answer is more general. The Generated Answer also specifies the Workbench as part of the Developer Console, which isn't mentioned in the Correct Answer but doesn't contradict it.\n\nOverall, the substance and key points of both answers are the same, and there are no critical pieces of information missing or contradictions between the two.",
"assertion": {
"type": "python",
"value": "file://eval_end_to_end.py"
@@ -41329,8 +41329,8 @@
"pass": true,
"score": 1,
"namedScores": {},
- "text": "Based on the provided context, there are two interactive ways to learn how to use Claude's capabilities, including uploading PDFs and generating embeddings:\n\n1. Claude Cookbook: This is a collection of Jupyter notebooks that provide practical examples and hands-on tutorials. It specifically includes lessons on PDF upload and summarization, as well as creating and using embeddings with VoyageAI for advanced text similarity and search tasks.\n\n2. Developer Console: This tool offers a Workbench and prompt generator in your browser, allowing for easier and more powerful prompting. While not explicitly stated to cover PDF uploads and embeddings, it provides an interactive environment to explore Claude's capabilities.\n\nBoth of these resources offer interactive ways to learn and experiment with Claude's various features, including those mentioned in the query.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Claude Cookbook\n\nClaude Cookbook\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n \n \n\n \n Develop with Claude\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API 
and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n \n\n \n Get started\n\nGet started\n\n\nIf you’re new to Claude, start here to learn the essentials and make your first API call.\nIntro to ClaudeExplore Claude’s capabilities and development flow.QuickstartLearn how to make your first API call in minutes.Prompt LibraryExplore example prompts for inspiration.\nIntro to ClaudeExplore Claude’s capabilities and development flow.\n\nIntro to Claude\nExplore Claude’s capabilities and development flow.\nQuickstartLearn how to make your first API call in minutes.\n\nQuickstart\nLearn how to make your first API call in minutes.\nPrompt LibraryExplore example prompts for inspiration.\n\nPrompt Library\nExplore example prompts for inspiration.\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "text": "Based on the provided context, there are two interactive ways to learn how to use Claude's capabilities, including uploading PDFs and generating embeddings:\n\n1. Claude Cookbooks: This is a collection of Jupyter notebooks that provide practical examples and hands-on tutorials. It specifically includes lessons on PDF upload and summarization, as well as creating and using embeddings with VoyageAI for advanced text similarity and search tasks.\n\n2. Developer Console: This tool offers a Workbench and prompt generator in your browser, allowing for easier and more powerful prompting. While not explicitly stated to cover PDF uploads and embeddings, it provides an interactive environment to explore Claude's capabilities.\n\nBoth of these resources offer interactive ways to learn and experiment with Claude's various features, including those mentioned in the query.",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Claude Cookbooks\n\nClaude Cookbooks\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n \n \n\n \n Develop with Claude\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude 
API and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n \n\n \n Get started\n\nGet started\n\n\nIf you’re new to Claude, start here to learn the essentials and make your first API call.\nIntro to ClaudeExplore Claude’s capabilities and development flow.QuickstartLearn how to make your first API call in minutes.Prompt LibraryExplore example prompts for inspiration.\nIntro to ClaudeExplore Claude’s capabilities and development flow.\n\nIntro to Claude\nExplore Claude’s capabilities and development flow.\nQuickstartLearn how to make your first API call in minutes.\n\nQuickstart\nLearn how to make your first API call in minutes.\nPrompt LibraryExplore example prompts for inspiration.\n\nPrompt Library\nExplore example prompts for inspiration.\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "3.5 Sonnet: T-0.0",
"latencyMs": 4782,
"tokenUsage": {
@@ -41352,7 +41352,7 @@
{
"pass": true,
"score": 1,
- "reason": "The Generated Answer is correct based on the Correct Answer. Both answers highlight the same two interactive ways to learn Claude's capabilities:\n\n1. The Claude Cookbook: Both answers mention this as a collection of Jupyter notebooks that demonstrate various capabilities, including uploading PDFs and generating embeddings.\n\n2. The Developer Console: Both answers describe this as a tool for easier and more powerful prompting, with the Generated Answer providing slightly more detail about the Workbench feature.\n\nThe Generated Answer covers the key points from the Correct Answer, providing essentially the same information about these two interactive learning methods. There are no critical omissions or contradictions between the two answers.",
+ "reason": "The Generated Answer is correct based on the Correct Answer. Both answers highlight the same two interactive ways to learn Claude's capabilities:\n\n1. The Claude Cookbooks: Both answers mention this as a collection of Jupyter notebooks that demonstrate various capabilities, including uploading PDFs and generating embeddings.\n\n2. The Developer Console: Both answers describe this as a tool for easier and more powerful prompting, with the Generated Answer providing slightly more detail about the Workbench feature.\n\nThe Generated Answer covers the key points from the Correct Answer, providing essentially the same information about these two interactive learning methods. There are no critical omissions or contradictions between the two answers.",
"assertion": {
"type": "python",
"value": "file://eval_end_to_end.py"
@@ -41367,7 +41367,7 @@
"test": {
"vars": {
"query": "What are two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings?",
- "correct_answer": "The Claude Cookbook provides interactive Jupyter notebooks demonstrating how to upload PDFs, generate embeddings, and more. The Developer Console offers a prompt generator tool for easier, more powerful prompting."
+ "correct_answer": "The Claude Cookbooks provides interactive Jupyter notebooks demonstrating how to upload PDFs, generate embeddings, and more. The Developer Console offers a prompt generator tool for easier, more powerful prompting."
},
"assert": [
{
@@ -41379,7 +41379,7 @@
"description": "Row #45"
},
"vars": [
- "The Claude Cookbook provides interactive Jupyter notebooks demonstrating how to upload PDFs, generate embeddings, and more. The Developer Console offers a prompt generator tool for easier, more powerful prompting.",
+ "The Claude Cookbooks provides interactive Jupyter notebooks demonstrating how to upload PDFs, generate embeddings, and more. The Developer Console offers a prompt generator tool for easier, more powerful prompting.",
"What are two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings?"
]
},
@@ -41885,7 +41885,7 @@
"score": 0,
"namedScores": {},
"text": "The Generated Answer is incorrect based on the Correct Answer. The Generated Answer mentions using the Quickstart to make an API call and experimenting with the Workbench, which are not mentioned in the Correct Answer. The Correct Answer specifically states that users can start experimenting with Claude by visiting claude.ai or using Anthropic's web Console. These methods are completely different from what is mentioned in the Generated Answer. There is no overlap or similarity in the substance of the two answers, and the Generated Answer is missing the critical information provided in the Correct Answer.\n---\nAccording to Anthropic's documentation, two ways to start experimenting with Claude as a user are:\n\n1. Follow the Quickstart to make your first API call.\n2. Experiment and start building with the Workbench.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways to start experimenting with Claude as a user, according to Anthropic's documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbook for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 Family Enterprise considerations Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, 
embeddings, and more.\n\n\nGet started\n\n\nIf you’re new to Claude, start here to learn the essentials and make your first API call.\nIntro to ClaudeExplore Claude’s capabilities and development flow.QuickstartLearn how to make your first API call in minutes.Prompt LibraryExplore example prompts for inspiration.\nIntro to ClaudeExplore Claude’s capabilities and development flow.\n\nIntro to Claude\nExplore Claude’s capabilities and development flow.\nQuickstartLearn how to make your first API call in minutes.\n\nQuickstart\nLearn how to make your first API call in minutes.\nPrompt LibraryExplore example prompts for inspiration.\n\nPrompt Library\nExplore example prompts for inspiration.\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways to start experimenting with Claude as a user, according to Anthropic's documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbooks for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 Family Enterprise considerations Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, 
embeddings, and more.\n\n\nGet started\n\n\nIf you’re new to Claude, start here to learn the essentials and make your first API call.\nIntro to ClaudeExplore Claude’s capabilities and development flow.QuickstartLearn how to make your first API call in minutes.Prompt LibraryExplore example prompts for inspiration.\nIntro to ClaudeExplore Claude’s capabilities and development flow.\n\nIntro to Claude\nExplore Claude’s capabilities and development flow.\nQuickstartLearn how to make your first API call in minutes.\n\nQuickstart\nLearn how to make your first API call in minutes.\nPrompt LibraryExplore example prompts for inspiration.\n\nPrompt Library\nExplore example prompts for inspiration.\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "Haiku: T-0.0",
"latencyMs": 1,
"tokenUsage": {},
@@ -41919,7 +41919,7 @@
"score": 0,
"namedScores": {},
"text": "The Generated Answer is incorrect based on the Correct Answer. While both answers discuss ways to experiment with Claude, they mention different methods. The Correct Answer specifically states that users can start experimenting with Claude by visiting claude.ai or using Anthropic's web Console. The Generated Answer, however, mentions using the Workbench in the Developer Console and exploring example prompts in the Prompt Library. These are different methods than those mentioned in the Correct Answer. Although both answers are discussing ways to interact with Claude, the specific methods mentioned are not the same, and the Generated Answer is missing the critical information about claude.ai and the web Console mentioned in the Correct Answer.\n---\nAccording to Anthropic's documentation, two ways to start experimenting with Claude as a user are:\n\n1. Using the Workbench in the Developer Console, which allows for easier and more powerful prompting in your browser.\n\n2. Exploring example prompts in the Prompt Library for inspiration.\n\nThese options provide users with interactive ways to experiment with Claude's capabilities without needing to immediately dive into API integration.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways to start experimenting with Claude as a user, according to Anthropic's documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbook for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 Family Enterprise considerations Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, 
embeddings, and more.\n\n\nGet started\n\n\nIf you’re new to Claude, start here to learn the essentials and make your first API call.\nIntro to ClaudeExplore Claude’s capabilities and development flow.QuickstartLearn how to make your first API call in minutes.Prompt LibraryExplore example prompts for inspiration.\nIntro to ClaudeExplore Claude’s capabilities and development flow.\n\nIntro to Claude\nExplore Claude’s capabilities and development flow.\nQuickstartLearn how to make your first API call in minutes.\n\nQuickstart\nLearn how to make your first API call in minutes.\nPrompt LibraryExplore example prompts for inspiration.\n\nPrompt Library\nExplore example prompts for inspiration.\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways to start experimenting with Claude as a user, according to Anthropic's documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbooks for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 Family Enterprise considerations Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, 
embeddings, and more.\n\n\nGet started\n\n\nIf you’re new to Claude, start here to learn the essentials and make your first API call.\nIntro to ClaudeExplore Claude’s capabilities and development flow.QuickstartLearn how to make your first API call in minutes.Prompt LibraryExplore example prompts for inspiration.\nIntro to ClaudeExplore Claude’s capabilities and development flow.\n\nIntro to Claude\nExplore Claude’s capabilities and development flow.\nQuickstartLearn how to make your first API call in minutes.\n\nQuickstart\nLearn how to make your first API call in minutes.\nPrompt LibraryExplore example prompts for inspiration.\n\nPrompt Library\nExplore example prompts for inspiration.\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "3.5 Sonnet: T-0.0",
"latencyMs": 2629,
"tokenUsage": {
@@ -41957,7 +41957,7 @@
"score": 0,
"namedScores": {},
"text": "The Generated Answer is partially correct but contains some inaccuracies when compared to the Correct Answer. Both answers correctly mention using claude.ai (referred to as the \"web Console\" in the Generated Answer) as one way to start experimenting with Claude. However, the Generated Answer incorrectly states that following the Quickstart guide to make an API call is the second method, while the Correct Answer indicates that using Anthropic's web Console is the second method. This is a significant difference in the information provided, as making API calls is not mentioned in the Correct Answer as one of the two primary ways to start experimenting with Claude as a user.\n---\nAccording to Anthropic's documentation, two ways to start experimenting with Claude as a user are:\n\n1. Use the web Console at claude.ai to start interacting with Claude right away.\n2. Follow the Quickstart guide to learn how to make your first API call in minutes.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways to start experimenting with Claude as a user, according to Anthropic's documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Get started\n\nText\n Get started\n\n\nIf you’re new to Claude, start here to learn the essentials and make your first API call.\nIntro to ClaudeExplore Claude’s capabilities and development flow.QuickstartLearn how to make your first API call in minutes.Prompt LibraryExplore example prompts for inspiration.\nIntro to ClaudeExplore Claude’s capabilities and development flow.\n\nIntro to Claude\nExplore Claude’s capabilities and development flow.\nQuickstartLearn how to make your first API call in minutes.\n\nQuickstart\nLearn how to make your first API call in minutes.\nPrompt LibraryExplore example prompts for inspiration.\n\nPrompt Library\nExplore example prompts for inspiration.\n \n\nSummary: \n The documentation covers getting started with Anthropic's Claude AI model, including an introduction to its capabilities, a quickstart guide for making API calls, and a prompt library for inspiration. It provides essential information for users new to Claude to learn the basics and start using the model. \n \n\n \n Get started with Claude\n\nText\n Get started with Claude\n\n\nIf you’re ready to start exploring what Claude can do for you, let’s dive in! Whether you’re a developer looking to integrate Claude into your applications or a user wanting to experience the power of AI firsthand, we’ve got you covered.\nCheck out our quickstart guide for step-by-step instructions on how to get up and running with Claude. You’ll learn how to create an account, obtain API keys, and start interacting with our models in no time. 
You can also head over to claude.ai or our web Console to start experimenting with Claude right away!\nIf you have any questions or need assistance, don’t hesitate to reach out to our support team or consult the Discord community.\nTicket RoutingSecurity and compliancexlinkedin\nTicket RoutingSecurity and compliance\nxlinkedin\nModel names Model comparison Prompt and output performance Legacy models Legacy model comparison Get started with Claude\nModel namesModel comparisonPrompt and output performanceLegacy modelsLegacy model comparisonGet started with Claude\n \n\nSummary: \n The documentation covers getting started with Anthropic's Claude AI model, including a quickstart guide, account creation, API key obtainment, and interactive experimentation through the web Console. It also provides information on support resources and additional model-related topics. \n \n\n \n Develop with Claude\n\nText\n Develop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n\nSummary: \n Anthropic provides a suite of 
developer tools, including a browser-based Workbench and prompt generator, API reference documentation, and interactive Jupyter notebooks, to help developers build scalable applications with the Claude AI model. These tools enable easier, more powerful prompting, exploration and implementation of the Claude API and SDKs, and learning through interactive demonstrations. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways to start experimenting with Claude as a user, according to Anthropic's documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Get started\n\nText\n Get started\n\n\nIf you’re new to Claude, start here to learn the essentials and make your first API call.\nIntro to ClaudeExplore Claude’s capabilities and development flow.QuickstartLearn how to make your first API call in minutes.Prompt LibraryExplore example prompts for inspiration.\nIntro to ClaudeExplore Claude’s capabilities and development flow.\n\nIntro to Claude\nExplore Claude’s capabilities and development flow.\nQuickstartLearn how to make your first API call in minutes.\n\nQuickstart\nLearn how to make your first API call in minutes.\nPrompt LibraryExplore example prompts for inspiration.\n\nPrompt Library\nExplore example prompts for inspiration.\n \n\nSummary: \n The documentation covers getting started with Anthropic's Claude AI model, including an introduction to its capabilities, a quickstart guide for making API calls, and a prompt library for inspiration. It provides essential information for users new to Claude to learn the basics and start using the model. \n \n\n \n Get started with Claude\n\nText\n Get started with Claude\n\n\nIf you’re ready to start exploring what Claude can do for you, let’s dive in! Whether you’re a developer looking to integrate Claude into your applications or a user wanting to experience the power of AI firsthand, we’ve got you covered.\nCheck out our quickstart guide for step-by-step instructions on how to get up and running with Claude. You’ll learn how to create an account, obtain API keys, and start interacting with our models in no time. 
You can also head over to claude.ai or our web Console to start experimenting with Claude right away!\nIf you have any questions or need assistance, don’t hesitate to reach out to our support team or consult the Discord community.\nTicket RoutingSecurity and compliancexlinkedin\nTicket RoutingSecurity and compliance\nxlinkedin\nModel names Model comparison Prompt and output performance Legacy models Legacy model comparison Get started with Claude\nModel namesModel comparisonPrompt and output performanceLegacy modelsLegacy model comparisonGet started with Claude\n \n\nSummary: \n The documentation covers getting started with Anthropic's Claude AI model, including a quickstart guide, account creation, API key obtainment, and interactive experimentation through the web Console. It also provides information on support resources and additional model-related topics. \n \n\n \n Develop with Claude\n\nText\n Develop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n\nSummary: \n Anthropic provides a suite of 
developer tools, including a browser-based Workbench and prompt generator, API reference documentation, and interactive Jupyter notebooks, to help developers build scalable applications with the Claude AI model. These tools enable easier, more powerful prompting, exploration and implementation of the Claude API and SDKs, and learning through interactive demonstrations. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "Haiku: T-0.0",
"latencyMs": 1344,
"tokenUsage": {
@@ -41995,7 +41995,7 @@
"score": 1,
"namedScores": {},
"text": "According to Anthropic's documentation, there are two main ways to start experimenting with Claude as a user:\n\n1. Visit claude.ai to interact with Claude directly through a web interface.\n\n2. Use the web Console provided by Anthropic to start experimenting with Claude right away.\n\nBoth of these options allow users to begin exploring Claude's capabilities without needing to integrate the API or write code.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways to start experimenting with Claude as a user, according to Anthropic's documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Get started\n\nText\n Get started\n\n\nIf you’re new to Claude, start here to learn the essentials and make your first API call.\nIntro to ClaudeExplore Claude’s capabilities and development flow.QuickstartLearn how to make your first API call in minutes.Prompt LibraryExplore example prompts for inspiration.\nIntro to ClaudeExplore Claude’s capabilities and development flow.\n\nIntro to Claude\nExplore Claude’s capabilities and development flow.\nQuickstartLearn how to make your first API call in minutes.\n\nQuickstart\nLearn how to make your first API call in minutes.\nPrompt LibraryExplore example prompts for inspiration.\n\nPrompt Library\nExplore example prompts for inspiration.\n \n\nSummary: \n The documentation covers getting started with Anthropic's Claude AI model, including an introduction to its capabilities, a quickstart guide for making API calls, and a prompt library for inspiration. It provides essential information for users new to Claude to learn the basics and start using the model. \n \n\n \n Get started with Claude\n\nText\n Get started with Claude\n\n\nIf you’re ready to start exploring what Claude can do for you, let’s dive in! Whether you’re a developer looking to integrate Claude into your applications or a user wanting to experience the power of AI firsthand, we’ve got you covered.\nCheck out our quickstart guide for step-by-step instructions on how to get up and running with Claude. You’ll learn how to create an account, obtain API keys, and start interacting with our models in no time. 
You can also head over to claude.ai or our web Console to start experimenting with Claude right away!\nIf you have any questions or need assistance, don’t hesitate to reach out to our support team or consult the Discord community.\nTicket RoutingSecurity and compliancexlinkedin\nTicket RoutingSecurity and compliance\nxlinkedin\nModel names Model comparison Prompt and output performance Legacy models Legacy model comparison Get started with Claude\nModel namesModel comparisonPrompt and output performanceLegacy modelsLegacy model comparisonGet started with Claude\n \n\nSummary: \n The documentation covers getting started with Anthropic's Claude AI model, including a quickstart guide, account creation, API key obtainment, and interactive experimentation through the web Console. It also provides information on support resources and additional model-related topics. \n \n\n \n Develop with Claude\n\nText\n Develop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n\nSummary: \n Anthropic provides a suite of 
developer tools, including a browser-based Workbench and prompt generator, API reference documentation, and interactive Jupyter notebooks, to help developers build scalable applications with the Claude AI model. These tools enable easier, more powerful prompting, exploration and implementation of the Claude API and SDKs, and learning through interactive demonstrations. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways to start experimenting with Claude as a user, according to Anthropic's documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Get started\n\nText\n Get started\n\n\nIf you’re new to Claude, start here to learn the essentials and make your first API call.\nIntro to ClaudeExplore Claude’s capabilities and development flow.QuickstartLearn how to make your first API call in minutes.Prompt LibraryExplore example prompts for inspiration.\nIntro to ClaudeExplore Claude’s capabilities and development flow.\n\nIntro to Claude\nExplore Claude’s capabilities and development flow.\nQuickstartLearn how to make your first API call in minutes.\n\nQuickstart\nLearn how to make your first API call in minutes.\nPrompt LibraryExplore example prompts for inspiration.\n\nPrompt Library\nExplore example prompts for inspiration.\n \n\nSummary: \n The documentation covers getting started with Anthropic's Claude AI model, including an introduction to its capabilities, a quickstart guide for making API calls, and a prompt library for inspiration. It provides essential information for users new to Claude to learn the basics and start using the model. \n \n\n \n Get started with Claude\n\nText\n Get started with Claude\n\n\nIf you’re ready to start exploring what Claude can do for you, let’s dive in! Whether you’re a developer looking to integrate Claude into your applications or a user wanting to experience the power of AI firsthand, we’ve got you covered.\nCheck out our quickstart guide for step-by-step instructions on how to get up and running with Claude. You’ll learn how to create an account, obtain API keys, and start interacting with our models in no time. 
You can also head over to claude.ai or our web Console to start experimenting with Claude right away!\nIf you have any questions or need assistance, don’t hesitate to reach out to our support team or consult the Discord community.\nTicket RoutingSecurity and compliancexlinkedin\nTicket RoutingSecurity and compliance\nxlinkedin\nModel names Model comparison Prompt and output performance Legacy models Legacy model comparison Get started with Claude\nModel namesModel comparisonPrompt and output performanceLegacy modelsLegacy model comparisonGet started with Claude\n \n\nSummary: \n The documentation covers getting started with Anthropic's Claude AI model, including a quickstart guide, account creation, API key obtainment, and interactive experimentation through the web Console. It also provides information on support resources and additional model-related topics. \n \n\n \n Develop with Claude\n\nText\n Develop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n\nSummary: \n Anthropic provides a suite of 
developer tools, including a browser-based Workbench and prompt generator, API reference documentation, and interactive Jupyter notebooks, to help developers build scalable applications with the Claude AI model. These tools enable easier, more powerful prompting, exploration and implementation of the Claude API and SDKs, and learning through interactive demonstrations. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "3.5 Sonnet: T-0.0",
"latencyMs": 3306,
"tokenUsage": {
@@ -42033,7 +42033,7 @@
"score": 0,
"namedScores": {},
"text": "The Generated Answer is partially correct but contains some inaccuracies when compared to the Correct Answer. Both answers correctly mention using claude.ai as one way to start experimenting with Claude. However, the Generated Answer incorrectly states that the second method is to follow the Quickstart guide to make an API call. The Correct Answer instead mentions using Anthropic's web Console as the second method. This is a significant difference in the information provided. While the Generated Answer does contain some correct information, it misses a key point from the Correct Answer and includes information not present in the Correct Answer. Therefore, it cannot be considered fully correct.\n---\nAccording to Anthropic's documentation, two ways to start experimenting with Claude as a user are:\n\n1. Go to the Claude web Console at claude.ai and start interacting with the models there.\n\n2. Follow the Quickstart guide to learn how to make your first API call and start building with Claude.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways to start experimenting with Claude as a user, according to Anthropic's documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Get started with Claude\n\nGet started with Claude\n\n\nIf you’re ready to start exploring what Claude can do for you, let’s dive in! Whether you’re a developer looking to integrate Claude into your applications or a user wanting to experience the power of AI firsthand, we’ve got you covered.\nCheck out our quickstart guide for step-by-step instructions on how to get up and running with Claude. You’ll learn how to create an account, obtain API keys, and start interacting with our models in no time. You can also head over to claude.ai or our web Console to start experimenting with Claude right away!\nIf you have any questions or need assistance, don’t hesitate to reach out to our support team or consult the Discord community.\nTicket RoutingSecurity and compliancexlinkedin\nTicket RoutingSecurity and compliance\nxlinkedin\nModel names Model comparison Prompt and output performance Legacy models Legacy model comparison Get started with Claude\nModel namesModel comparisonPrompt and output performanceLegacy modelsLegacy model comparisonGet started with Claude\n \n \n\n \n Get started\n\nGet started\n\n\nIf you’re new to Claude, start here to learn the essentials and make your first API call.\nIntro to ClaudeExplore Claude’s capabilities and development flow.QuickstartLearn how to make your first API call in minutes.Prompt LibraryExplore example prompts for inspiration.\nIntro to ClaudeExplore Claude’s capabilities and development flow.\n\nIntro to Claude\nExplore Claude’s capabilities and development flow.\nQuickstartLearn how to make your first API call in minutes.\n\nQuickstart\nLearn how to make your first API call in minutes.\nPrompt LibraryExplore example 
prompts for inspiration.\n\nPrompt Library\nExplore example prompts for inspiration.\n \n \n\n \n Start building with Claude\n\nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbook for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 Family Enterprise considerations Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways to start experimenting with Claude as a user, according to Anthropic's documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Get started with Claude\n\nGet started with Claude\n\n\nIf you’re ready to start exploring what Claude can do for you, let’s dive in! Whether you’re a developer looking to integrate Claude into your applications or a user wanting to experience the power of AI firsthand, we’ve got you covered.\nCheck out our quickstart guide for step-by-step instructions on how to get up and running with Claude. You’ll learn how to create an account, obtain API keys, and start interacting with our models in no time. You can also head over to claude.ai or our web Console to start experimenting with Claude right away!\nIf you have any questions or need assistance, don’t hesitate to reach out to our support team or consult the Discord community.\nTicket RoutingSecurity and compliancexlinkedin\nTicket RoutingSecurity and compliance\nxlinkedin\nModel names Model comparison Prompt and output performance Legacy models Legacy model comparison Get started with Claude\nModel namesModel comparisonPrompt and output performanceLegacy modelsLegacy model comparisonGet started with Claude\n \n \n\n \n Get started\n\nGet started\n\n\nIf you’re new to Claude, start here to learn the essentials and make your first API call.\nIntro to ClaudeExplore Claude’s capabilities and development flow.QuickstartLearn how to make your first API call in minutes.Prompt LibraryExplore example prompts for inspiration.\nIntro to ClaudeExplore Claude’s capabilities and development flow.\n\nIntro to Claude\nExplore Claude’s capabilities and development flow.\nQuickstartLearn how to make your first API call in minutes.\n\nQuickstart\nLearn how to make your first API call in minutes.\nPrompt LibraryExplore example 
prompts for inspiration.\n\nPrompt Library\nExplore example prompts for inspiration.\n \n \n\n \n Start building with Claude\n\nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbooks for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 Family Enterprise considerations Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "Haiku: T-0.0",
"latencyMs": 914,
"tokenUsage": {
@@ -42071,7 +42071,7 @@
"score": 1,
"namedScores": {},
"text": "According to Anthropic's documentation, two ways to start experimenting with Claude as a user are:\n\n1. Go to claude.ai to interact with Claude directly through their web interface.\n\n2. Use the web Console provided by Anthropic to start experimenting with Claude right away.\n\nThese options allow users to quickly begin exploring Claude's capabilities without needing to integrate the API or write code.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways to start experimenting with Claude as a user, according to Anthropic's documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Get started with Claude\n\nGet started with Claude\n\n\nIf you’re ready to start exploring what Claude can do for you, let’s dive in! Whether you’re a developer looking to integrate Claude into your applications or a user wanting to experience the power of AI firsthand, we’ve got you covered.\nCheck out our quickstart guide for step-by-step instructions on how to get up and running with Claude. You’ll learn how to create an account, obtain API keys, and start interacting with our models in no time. You can also head over to claude.ai or our web Console to start experimenting with Claude right away!\nIf you have any questions or need assistance, don’t hesitate to reach out to our support team or consult the Discord community.\nTicket RoutingSecurity and compliancexlinkedin\nTicket RoutingSecurity and compliance\nxlinkedin\nModel names Model comparison Prompt and output performance Legacy models Legacy model comparison Get started with Claude\nModel namesModel comparisonPrompt and output performanceLegacy modelsLegacy model comparisonGet started with Claude\n \n \n\n \n Get started\n\nGet started\n\n\nIf you’re new to Claude, start here to learn the essentials and make your first API call.\nIntro to ClaudeExplore Claude’s capabilities and development flow.QuickstartLearn how to make your first API call in minutes.Prompt LibraryExplore example prompts for inspiration.\nIntro to ClaudeExplore Claude’s capabilities and development flow.\n\nIntro to Claude\nExplore Claude’s capabilities and development flow.\nQuickstartLearn how to make your first API call in minutes.\n\nQuickstart\nLearn how to make your first API call in minutes.\nPrompt LibraryExplore example 
prompts for inspiration.\n\nPrompt Library\nExplore example prompts for inspiration.\n \n \n\n \n Start building with Claude\n\nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbook for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 Family Enterprise considerations Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways to start experimenting with Claude as a user, according to Anthropic's documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Get started with Claude\n\nGet started with Claude\n\n\nIf you’re ready to start exploring what Claude can do for you, let’s dive in! Whether you’re a developer looking to integrate Claude into your applications or a user wanting to experience the power of AI firsthand, we’ve got you covered.\nCheck out our quickstart guide for step-by-step instructions on how to get up and running with Claude. You’ll learn how to create an account, obtain API keys, and start interacting with our models in no time. You can also head over to claude.ai or our web Console to start experimenting with Claude right away!\nIf you have any questions or need assistance, don’t hesitate to reach out to our support team or consult the Discord community.\nTicket RoutingSecurity and compliancexlinkedin\nTicket RoutingSecurity and compliance\nxlinkedin\nModel names Model comparison Prompt and output performance Legacy models Legacy model comparison Get started with Claude\nModel namesModel comparisonPrompt and output performanceLegacy modelsLegacy model comparisonGet started with Claude\n \n \n\n \n Get started\n\nGet started\n\n\nIf you’re new to Claude, start here to learn the essentials and make your first API call.\nIntro to ClaudeExplore Claude’s capabilities and development flow.QuickstartLearn how to make your first API call in minutes.Prompt LibraryExplore example prompts for inspiration.\nIntro to ClaudeExplore Claude’s capabilities and development flow.\n\nIntro to Claude\nExplore Claude’s capabilities and development flow.\nQuickstartLearn how to make your first API call in minutes.\n\nQuickstart\nLearn how to make your first API call in minutes.\nPrompt LibraryExplore example 
prompts for inspiration.\n\nPrompt Library\nExplore example prompts for inspiration.\n \n \n\n \n Start building with Claude\n\nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbooks for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 Family Enterprise considerations Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "3.5 Sonnet: T-0.0",
"latencyMs": 1949,
"tokenUsage": {
@@ -43515,7 +43515,7 @@
"score": 1,
"namedScores": {},
"text": "Claude is well-suited for enterprise use cases that require integration with specialized applications and processing of large volumes of sensitive data due to the following key capabilities:\n\n1. Enterprise-grade security and data handling: Claude is SOC II Type 2 certified and offers HIPAA compliance options for APIs, ensuring secure handling of sensitive data. It is accessible through AWS (GA) and GCP (in private preview), providing enterprise-level deployment options.\n\n2. Trustworthiness and resistance to misuse: Claude is resistant to jailbreaks and misuse, with continuous monitoring of prompts and outputs to prevent harmful or malicious use cases. It also provides copyright indemnity protections for paid commercial services, making it suitable for high-trust industries.\n\n3. Expanded capabilities for complex use cases: Claude has a 200K token context window, with future support for 1M tokens, enabling it to handle large volumes of data. Its tool use (function calling) capability allows seamless integration into specialized applications and custom workflows. Additionally, its multimodal input capabilities, which accept text, images, and other media, enable processing of complex use cases.\n\n4. Developer-friendly features: Claude provides a Developer Console with Workbench and prompt generation tools, making it easier for developers to create and experiment with powerful prompts. It also offers SDKs and APIs to expedite and enhance the development process.\n\n5. Reliability and accuracy: Claude has very low hallucination rates and is accurate over long documents, making it suitable for enterprise-level applications that require consistent and reliable performance.\n\n6. Global capabilities: Claude is great for coding tasks and has fluency in multiple languages, including English, Spanish, and Japanese. This enables use cases like translation services and broader global utility.\n\n7. 
Cost-conscious model family: Claude's family of models balances cost, performance, and intelligence, providing enterprise customers with options that fit their specific requirements and budgets.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are some of the key capabilities that make Claude suitable for enterprise use cases requiring integration with specialized applications and processing of large volumes of sensitive data?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Enterprise considerations\n\nEnterprise considerations\n\n\nAlong with an extensive set of features, tools, and capabilities, Claude is also built to be secure, trustworthy, and scalable for wide-reaching enterprise needs.\nFeatureDescriptionSecureEnterprise-grade security and data handling for APISOC II Type 2 certified, HIPAA compliance options for APIAccessible through AWS (GA) and GCP (in private preview)TrustworthyResistant to jailbreaks and misuse. We continuously monitor prompts and outputs for harmful, malicious use cases that violate our AUP.Copyright indemnity protections for paid commercial servicesUniquely positioned to serve high trust industries that process large volumes of sensitive user dataCapable200K token context window for expanded use cases, with future support for 1MTool use, also known as function calling, which allows seamless integration of Claude into specialized applications and custom workflowsMultimodal input capabilities with text output, allowing you to upload images (such as tables, graphs, and photos) along with text prompts for richer context and complex use casesDeveloper Console with Workbench and prompt generation tool for easier, more powerful prompting and experimentationSDKs and APIs to expedite and enhance developmentReliableVery low hallucination ratesAccurate over long documentsGlobalGreat for coding tasks and fluency in English and non-English languages like Spanish and JapaneseEnables use cases like translation services and broader global utilityCost consciousFamily of models balances cost, performance, and 
intelligence\nEnterprise-grade security and data handling for APISOC II Type 2 certified, HIPAA compliance options for APIAccessible through AWS (GA) and GCP (in private preview)\nResistant to jailbreaks and misuse. We continuously monitor prompts and outputs for harmful, malicious use cases that violate our AUP.Copyright indemnity protections for paid commercial servicesUniquely positioned to serve high trust industries that process large volumes of sensitive user data\n200K token context window for expanded use cases, with future support for 1MTool use, also known as function calling, which allows seamless integration of Claude into specialized applications and custom workflowsMultimodal input capabilities with text output, allowing you to upload images (such as tables, graphs, and photos) along with text prompts for richer context and complex use casesDeveloper Console with Workbench and prompt generation tool for easier, more powerful prompting and experimentationSDKs and APIs to expedite and enhance development\nVery low hallucination ratesAccurate over long documents\nGreat for coding tasks and fluency in English and non-English languages like Spanish and JapaneseEnables use cases like translation services and broader global utility\nFamily of models balances cost, performance, and intelligence\n \n \n\n \n Deploy your classifier\n\nDeploy your classifier\n\n\nTo see code examples of how to use Claude for classification, check out the Classification Guide in the Claude Cookbook.\nOverviewTicket Routingxlinkedin\nOverviewTicket Routing\nxlinkedin\nWhen to use Claude for classification Establish your classification use case Implement Claude for classification 1. Build a strong input prompt 2. Develop your test cases 3. Run your eval Evaluation metrics Deploy your classifier\nWhen to use Claude for classificationEstablish your classification use caseImplement Claude for classification1. Build a strong input prompt2. Develop your test cases3. 
Run your evalEvaluation metricsDeploy your classifier\n \n \n\n \n Integrate Claude into your Support Workflow\n\nIntegrate Claude into your Support Workflow\n\n\nWhen integrating your code into production, you’ll need to architect how it fits into the flow of your ticket routing system. There are two ways you could go around doing this:\nPush-based: Where the Support Ticket System you’re using (e.g. Zendesk an Anthropic partner) will trigger your code by sending a webhook event to your routing service, which will then classify the intent and route it.\nPull-Based: Where your code could pull for the latest tickets at a certain schedule and then route them.\nWhile the bulk of the classification work discussed in previous sections remains the same, you will need to wrap your code in a service for either of the two approaches above. The choice of approach depends on what APIs the support ticketing system provides. Between the two, the push-based approach using webhooks is more web-scaleable but needs you to expose a public endpoint that might have IT Security implications. The pull-based approach is easier to implement but makes unnecessary calls to the Support Ticket System.\n\nThe diagram above shows the push-based approach in action:\nSupport Ticket Creation - The process begins when a customer creates a new support ticket. The customer provides the necessary information about their issue or inquiry, which is then submitted to the Support Ticket System.\nWebhook Event Generation - Upon receiving the new support ticket, the Support Ticket System should generate a Webhook Event Ticket Created notification. This event triggers the subsequent steps in the ticket routing process.\nTicket Content Retrieval - The webhook event initiates the retrieval of the ticket’s contents from the Support Ticket System. 
This step ensures that the full details of the customer’s issue are available for analysis and classification.\nSupport Request Classification - Using the retrieved ticket contents, the system classifies the intent behind the support request using your code. This classification helps identify the most appropriate team or service to handle the ticket. For the webhook-based approach to work, your code from the previous section will need to be served using a RESTful API which can be called from the webhook. The endpoint for the request would need to be reachable from the internet.\nTicket Update - Finally, the ticket is updated back into the Support Ticket System, from where the assigned support team can work on resolving it.\nNote: While the classification method calls Claude API, we’ve removed that extra call from the diagram for simplicity.\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are some of the key capabilities that make Claude suitable for enterprise use cases requiring integration with specialized applications and processing of large volumes of sensitive data?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Enterprise considerations\n\nEnterprise considerations\n\n\nAlong with an extensive set of features, tools, and capabilities, Claude is also built to be secure, trustworthy, and scalable for wide-reaching enterprise needs.\nFeatureDescriptionSecureEnterprise-grade security and data handling for APISOC II Type 2 certified, HIPAA compliance options for APIAccessible through AWS (GA) and GCP (in private preview)TrustworthyResistant to jailbreaks and misuse. We continuously monitor prompts and outputs for harmful, malicious use cases that violate our AUP.Copyright indemnity protections for paid commercial servicesUniquely positioned to serve high trust industries that process large volumes of sensitive user dataCapable200K token context window for expanded use cases, with future support for 1MTool use, also known as function calling, which allows seamless integration of Claude into specialized applications and custom workflowsMultimodal input capabilities with text output, allowing you to upload images (such as tables, graphs, and photos) along with text prompts for richer context and complex use casesDeveloper Console with Workbench and prompt generation tool for easier, more powerful prompting and experimentationSDKs and APIs to expedite and enhance developmentReliableVery low hallucination ratesAccurate over long documentsGlobalGreat for coding tasks and fluency in English and non-English languages like Spanish and JapaneseEnables use cases like translation services and broader global utilityCost consciousFamily of models balances cost, performance, and 
intelligence\nEnterprise-grade security and data handling for APISOC II Type 2 certified, HIPAA compliance options for APIAccessible through AWS (GA) and GCP (in private preview)\nResistant to jailbreaks and misuse. We continuously monitor prompts and outputs for harmful, malicious use cases that violate our AUP.Copyright indemnity protections for paid commercial servicesUniquely positioned to serve high trust industries that process large volumes of sensitive user data\n200K token context window for expanded use cases, with future support for 1MTool use, also known as function calling, which allows seamless integration of Claude into specialized applications and custom workflowsMultimodal input capabilities with text output, allowing you to upload images (such as tables, graphs, and photos) along with text prompts for richer context and complex use casesDeveloper Console with Workbench and prompt generation tool for easier, more powerful prompting and experimentationSDKs and APIs to expedite and enhance development\nVery low hallucination ratesAccurate over long documents\nGreat for coding tasks and fluency in English and non-English languages like Spanish and JapaneseEnables use cases like translation services and broader global utility\nFamily of models balances cost, performance, and intelligence\n \n \n\n \n Deploy your classifier\n\nDeploy your classifier\n\n\nTo see code examples of how to use Claude for classification, check out the Classification Guide in the Claude Cookbooks.\nOverviewTicket Routingxlinkedin\nOverviewTicket Routing\nxlinkedin\nWhen to use Claude for classification Establish your classification use case Implement Claude for classification 1. Build a strong input prompt 2. Develop your test cases 3. Run your eval Evaluation metrics Deploy your classifier\nWhen to use Claude for classificationEstablish your classification use caseImplement Claude for classification1. Build a strong input prompt2. Develop your test cases3. 
Run your evalEvaluation metricsDeploy your classifier\n \n \n\n \n Integrate Claude into your Support Workflow\n\nIntegrate Claude into your Support Workflow\n\n\nWhen integrating your code into production, you’ll need to architect how it fits into the flow of your ticket routing system. There are two ways you could go around doing this:\nPush-based: Where the Support Ticket System you’re using (e.g. Zendesk an Anthropic partner) will trigger your code by sending a webhook event to your routing service, which will then classify the intent and route it.\nPull-Based: Where your code could pull for the latest tickets at a certain schedule and then route them.\nWhile the bulk of the classification work discussed in previous sections remains the same, you will need to wrap your code in a service for either of the two approaches above. The choice of approach depends on what APIs the support ticketing system provides. Between the two, the push-based approach using webhooks is more web-scaleable but needs you to expose a public endpoint that might have IT Security implications. The pull-based approach is easier to implement but makes unnecessary calls to the Support Ticket System.\n\nThe diagram above shows the push-based approach in action:\nSupport Ticket Creation - The process begins when a customer creates a new support ticket. The customer provides the necessary information about their issue or inquiry, which is then submitted to the Support Ticket System.\nWebhook Event Generation - Upon receiving the new support ticket, the Support Ticket System should generate a Webhook Event Ticket Created notification. This event triggers the subsequent steps in the ticket routing process.\nTicket Content Retrieval - The webhook event initiates the retrieval of the ticket’s contents from the Support Ticket System. 
This step ensures that the full details of the customer’s issue are available for analysis and classification.\nSupport Request Classification - Using the retrieved ticket contents, the system classifies the intent behind the support request using your code. This classification helps identify the most appropriate team or service to handle the ticket. For the webhook-based approach to work, your code from the previous section will need to be served using a RESTful API which can be called from the webhook. The endpoint for the request would need to be reachable from the internet.\nTicket Update - Finally, the ticket is updated back into the Support Ticket System, from where the assigned support team can work on resolving it.\nNote: While the classification method calls Claude API, we’ve removed that extra call from the diagram for simplicity.\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "Haiku: T-0.0",
"latencyMs": 3785,
"tokenUsage": {
@@ -43553,7 +43553,7 @@
"score": 1,
"namedScores": {},
"text": "Based on the provided context, some key capabilities that make Claude suitable for enterprise use cases requiring integration with specialized applications and processing of large volumes of sensitive data include:\n\n1. Enterprise-grade security: Claude is SOC II Type 2 certified and offers HIPAA compliance options for API use, making it suitable for handling sensitive data.\n\n2. Large context window: Claude has a 200K token context window, with future support for 1M tokens, allowing it to process large volumes of data.\n\n3. Tool use/function calling: This enables seamless integration of Claude into specialized applications and custom workflows.\n\n4. Multimodal input capabilities: Claude can process text along with images like tables, graphs, and photos, allowing for richer context and complex use cases.\n\n5. Low hallucination rates and accuracy over long documents: This makes Claude reliable for processing large volumes of data accurately.\n\n6. APIs and SDKs: These tools expedite and enhance development, making it easier to integrate Claude into existing enterprise systems.\n\n7. Trustworthiness: Claude is resistant to jailbreaks and misuse, with continuous monitoring for harmful or malicious use cases.\n\n8. Scalability: Claude is accessible through major cloud platforms (AWS and GCP), allowing for enterprise-scale deployments.\n\n9. Versatility: Claude excels at coding tasks and is fluent in multiple languages, enabling diverse use cases like translation services.\n\n10. Developer tools: The Developer Console with Workbench and prompt generation tools facilitate easier and more powerful prompting and experimentation for enterprise developers.\n\nThese capabilities make Claude well-suited for enterprises that need to process large volumes of sensitive data while integrating with specialized applications and workflows.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are some of the key capabilities that make Claude suitable for enterprise use cases requiring integration with specialized applications and processing of large volumes of sensitive data?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Enterprise considerations\n\nEnterprise considerations\n\n\nAlong with an extensive set of features, tools, and capabilities, Claude is also built to be secure, trustworthy, and scalable for wide-reaching enterprise needs.\nFeatureDescriptionSecureEnterprise-grade security and data handling for APISOC II Type 2 certified, HIPAA compliance options for APIAccessible through AWS (GA) and GCP (in private preview)TrustworthyResistant to jailbreaks and misuse. We continuously monitor prompts and outputs for harmful, malicious use cases that violate our AUP.Copyright indemnity protections for paid commercial servicesUniquely positioned to serve high trust industries that process large volumes of sensitive user dataCapable200K token context window for expanded use cases, with future support for 1MTool use, also known as function calling, which allows seamless integration of Claude into specialized applications and custom workflowsMultimodal input capabilities with text output, allowing you to upload images (such as tables, graphs, and photos) along with text prompts for richer context and complex use casesDeveloper Console with Workbench and prompt generation tool for easier, more powerful prompting and experimentationSDKs and APIs to expedite and enhance developmentReliableVery low hallucination ratesAccurate over long documentsGlobalGreat for coding tasks and fluency in English and non-English languages like Spanish and JapaneseEnables use cases like translation services and broader global utilityCost consciousFamily of models balances cost, performance, and 
intelligence\nEnterprise-grade security and data handling for APISOC II Type 2 certified, HIPAA compliance options for APIAccessible through AWS (GA) and GCP (in private preview)\nResistant to jailbreaks and misuse. We continuously monitor prompts and outputs for harmful, malicious use cases that violate our AUP.Copyright indemnity protections for paid commercial servicesUniquely positioned to serve high trust industries that process large volumes of sensitive user data\n200K token context window for expanded use cases, with future support for 1MTool use, also known as function calling, which allows seamless integration of Claude into specialized applications and custom workflowsMultimodal input capabilities with text output, allowing you to upload images (such as tables, graphs, and photos) along with text prompts for richer context and complex use casesDeveloper Console with Workbench and prompt generation tool for easier, more powerful prompting and experimentationSDKs and APIs to expedite and enhance development\nVery low hallucination ratesAccurate over long documents\nGreat for coding tasks and fluency in English and non-English languages like Spanish and JapaneseEnables use cases like translation services and broader global utility\nFamily of models balances cost, performance, and intelligence\n \n \n\n \n Deploy your classifier\n\nDeploy your classifier\n\n\nTo see code examples of how to use Claude for classification, check out the Classification Guide in the Claude Cookbook.\nOverviewTicket Routingxlinkedin\nOverviewTicket Routing\nxlinkedin\nWhen to use Claude for classification Establish your classification use case Implement Claude for classification 1. Build a strong input prompt 2. Develop your test cases 3. Run your eval Evaluation metrics Deploy your classifier\nWhen to use Claude for classificationEstablish your classification use caseImplement Claude for classification1. Build a strong input prompt2. Develop your test cases3. 
Run your evalEvaluation metricsDeploy your classifier\n \n \n\n \n Integrate Claude into your Support Workflow\n\nIntegrate Claude into your Support Workflow\n\n\nWhen integrating your code into production, you’ll need to architect how it fits into the flow of your ticket routing system. There are two ways you could go around doing this:\nPush-based: Where the Support Ticket System you’re using (e.g. Zendesk an Anthropic partner) will trigger your code by sending a webhook event to your routing service, which will then classify the intent and route it.\nPull-Based: Where your code could pull for the latest tickets at a certain schedule and then route them.\nWhile the bulk of the classification work discussed in previous sections remains the same, you will need to wrap your code in a service for either of the two approaches above. The choice of approach depends on what APIs the support ticketing system provides. Between the two, the push-based approach using webhooks is more web-scaleable but needs you to expose a public endpoint that might have IT Security implications. The pull-based approach is easier to implement but makes unnecessary calls to the Support Ticket System.\n\nThe diagram above shows the push-based approach in action:\nSupport Ticket Creation - The process begins when a customer creates a new support ticket. The customer provides the necessary information about their issue or inquiry, which is then submitted to the Support Ticket System.\nWebhook Event Generation - Upon receiving the new support ticket, the Support Ticket System should generate a Webhook Event Ticket Created notification. This event triggers the subsequent steps in the ticket routing process.\nTicket Content Retrieval - The webhook event initiates the retrieval of the ticket’s contents from the Support Ticket System. 
This step ensures that the full details of the customer’s issue are available for analysis and classification.\nSupport Request Classification - Using the retrieved ticket contents, the system classifies the intent behind the support request using your code. This classification helps identify the most appropriate team or service to handle the ticket. For the webhook-based approach to work, your code from the previous section will need to be served using a RESTful API which can be called from the webhook. The endpoint for the request would need to be reachable from the internet.\nTicket Update - Finally, the ticket is updated back into the Support Ticket System, from where the assigned support team can work on resolving it.\nNote: While the classification method calls Claude API, we’ve removed that extra call from the diagram for simplicity.\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are some of the key capabilities that make Claude suitable for enterprise use cases requiring integration with specialized applications and processing of large volumes of sensitive data?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Enterprise considerations\n\nEnterprise considerations\n\n\nAlong with an extensive set of features, tools, and capabilities, Claude is also built to be secure, trustworthy, and scalable for wide-reaching enterprise needs.\nFeatureDescriptionSecureEnterprise-grade security and data handling for APISOC II Type 2 certified, HIPAA compliance options for APIAccessible through AWS (GA) and GCP (in private preview)TrustworthyResistant to jailbreaks and misuse. We continuously monitor prompts and outputs for harmful, malicious use cases that violate our AUP.Copyright indemnity protections for paid commercial servicesUniquely positioned to serve high trust industries that process large volumes of sensitive user dataCapable200K token context window for expanded use cases, with future support for 1MTool use, also known as function calling, which allows seamless integration of Claude into specialized applications and custom workflowsMultimodal input capabilities with text output, allowing you to upload images (such as tables, graphs, and photos) along with text prompts for richer context and complex use casesDeveloper Console with Workbench and prompt generation tool for easier, more powerful prompting and experimentationSDKs and APIs to expedite and enhance developmentReliableVery low hallucination ratesAccurate over long documentsGlobalGreat for coding tasks and fluency in English and non-English languages like Spanish and JapaneseEnables use cases like translation services and broader global utilityCost consciousFamily of models balances cost, performance, and 
intelligence\nEnterprise-grade security and data handling for APISOC II Type 2 certified, HIPAA compliance options for APIAccessible through AWS (GA) and GCP (in private preview)\nResistant to jailbreaks and misuse. We continuously monitor prompts and outputs for harmful, malicious use cases that violate our AUP.Copyright indemnity protections for paid commercial servicesUniquely positioned to serve high trust industries that process large volumes of sensitive user data\n200K token context window for expanded use cases, with future support for 1MTool use, also known as function calling, which allows seamless integration of Claude into specialized applications and custom workflowsMultimodal input capabilities with text output, allowing you to upload images (such as tables, graphs, and photos) along with text prompts for richer context and complex use casesDeveloper Console with Workbench and prompt generation tool for easier, more powerful prompting and experimentationSDKs and APIs to expedite and enhance development\nVery low hallucination ratesAccurate over long documents\nGreat for coding tasks and fluency in English and non-English languages like Spanish and JapaneseEnables use cases like translation services and broader global utility\nFamily of models balances cost, performance, and intelligence\n \n \n\n \n Deploy your classifier\n\nDeploy your classifier\n\n\nTo see code examples of how to use Claude for classification, check out the Classification Guide in the Claude Cookbooks.\nOverviewTicket Routingxlinkedin\nOverviewTicket Routing\nxlinkedin\nWhen to use Claude for classification Establish your classification use case Implement Claude for classification 1. Build a strong input prompt 2. Develop your test cases 3. Run your eval Evaluation metrics Deploy your classifier\nWhen to use Claude for classificationEstablish your classification use caseImplement Claude for classification1. Build a strong input prompt2. Develop your test cases3. 
Run your evalEvaluation metricsDeploy your classifier\n \n \n\n \n Integrate Claude into your Support Workflow\n\nIntegrate Claude into your Support Workflow\n\n\nWhen integrating your code into production, you’ll need to architect how it fits into the flow of your ticket routing system. There are two ways you could go around doing this:\nPush-based: Where the Support Ticket System you’re using (e.g. Zendesk an Anthropic partner) will trigger your code by sending a webhook event to your routing service, which will then classify the intent and route it.\nPull-Based: Where your code could pull for the latest tickets at a certain schedule and then route them.\nWhile the bulk of the classification work discussed in previous sections remains the same, you will need to wrap your code in a service for either of the two approaches above. The choice of approach depends on what APIs the support ticketing system provides. Between the two, the push-based approach using webhooks is more web-scaleable but needs you to expose a public endpoint that might have IT Security implications. The pull-based approach is easier to implement but makes unnecessary calls to the Support Ticket System.\n\nThe diagram above shows the push-based approach in action:\nSupport Ticket Creation - The process begins when a customer creates a new support ticket. The customer provides the necessary information about their issue or inquiry, which is then submitted to the Support Ticket System.\nWebhook Event Generation - Upon receiving the new support ticket, the Support Ticket System should generate a Webhook Event Ticket Created notification. This event triggers the subsequent steps in the ticket routing process.\nTicket Content Retrieval - The webhook event initiates the retrieval of the ticket’s contents from the Support Ticket System. 
This step ensures that the full details of the customer’s issue are available for analysis and classification.\nSupport Request Classification - Using the retrieved ticket contents, the system classifies the intent behind the support request using your code. This classification helps identify the most appropriate team or service to handle the ticket. For the webhook-based approach to work, your code from the previous section will need to be served using a RESTful API which can be called from the webhook. The endpoint for the request would need to be reachable from the internet.\nTicket Update - Finally, the ticket is updated back into the Support Ticket System, from where the assigned support team can work on resolving it.\nNote: While the classification method calls Claude API, we’ve removed that extra call from the diagram for simplicity.\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "3.5 Sonnet: T-0.0",
"latencyMs": 7560,
"tokenUsage": {
@@ -44108,7 +44108,7 @@
"score": 0,
"namedScores": {},
"text": "The Generated Answer is partially correct but missing a critical piece of information. It correctly states that Anthropic released a prompt generator tool and that it's available through the Developer Console interface. However, it omits the specific release date (May 10th, 2024) mentioned in the Correct Answer. This date is a key piece of information in the Correct Answer. While we're instructed to ignore minute details, a specific release date seems significant enough to be considered critical information. Therefore, despite the rest of the answer being correct in substance, the omission of this date makes the Generated Answer incomplete.\n---\nAnthropic released a prompt generator tool to help guide Claude in generating high-quality prompts, and it is available through the Developer Console interface.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n When did Anthropic release a prompt generator tool to help guide Claude in generating high-quality prompts, and through what interface is it available?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nMore Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. 
Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbook More Resources\nText capabilities and use casesClaude CookbookMore Resources\n\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\n\nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbook for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 
Family Enterprise considerations Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n When did Anthropic release a prompt generator tool to help guide Claude in generating high-quality prompts, and through what interface is it available?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nMore Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. 
Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbooks More Resources\nText capabilities and use casesClaude CookbooksMore Resources\n\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\n\nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbooks for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family 
Claude 3 Family Enterprise considerations Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "Haiku: T-0.0",
"latencyMs": 2,
"tokenUsage": {},
@@ -44142,7 +44142,7 @@
"score": 0,
"namedScores": {},
"text": "The Generated Answer is incorrect based on the Correct Answer. While the Generated Answer correctly identifies that a prompt generator tool is available through the Developer Console interface, it fails to provide the critical release date information. The Correct Answer specifies that Anthropic released this tool on May 10th, 2024, which is a key piece of information missing from the Generated Answer. Additionally, the Generated Answer explicitly states that no specific release date is provided in the given context, which directly contradicts the Correct Answer. These discrepancies make the Generated Answer incorrect in substance, despite correctly identifying the existence of the tool and its availability through the Developer Console.\n---\nBased on the information provided in the documents, Anthropic has not released a specific prompt generator tool with a release date mentioned. However, the documents do reference a \"prompt generator tool\" as part of the Developer Console. Specifically, it states:\n\n\"Developer Console: Enjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\"\n\nThis indicates that a prompt generator tool is available through Anthropic's Developer Console interface, which is accessible via a web browser. However, no specific release date for this tool is provided in the given context.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n When did Anthropic release a prompt generator tool to help guide Claude in generating high-quality prompts, and through what interface is it available?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nMore Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. 
Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbook More Resources\nText capabilities and use casesClaude CookbookMore Resources\n\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\n\nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbook for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 
Family Enterprise considerations Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n When did Anthropic release a prompt generator tool to help guide Claude in generating high-quality prompts, and through what interface is it available?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nMore Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. 
Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbooks More Resources\nText capabilities and use casesClaude CookbooksMore Resources\n\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\n\nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbooks for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family 
Claude 3 Family Enterprise considerations Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "3.5 Sonnet: T-0.0",
"latencyMs": 3254,
"tokenUsage": {
@@ -44180,7 +44180,7 @@
"score": 1,
"namedScores": {},
"text": "Anthropic released a prompt generator tool to help guide Claude in generating high-quality prompts in May 2024, and it is available through the Developer Console.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n When did Anthropic release a prompt generator tool to help guide Claude in generating high-quality prompts, and through what interface is it available?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n May 10th, 2024\n\nText\n May 10th, 2024\n\n\nOur prompt generator tool is now available in the Developer Console. Prompt Generator makes it easy to guide Claude to generate a high-quality prompts tailored to your specific tasks. Read more in our blog post.\nOverviewClaude Appsxlinkedin\nOverviewClaude Apps\nxlinkedin\nJune 27th, 2024 June 20th, 2024 May 30th, 2024 May 10th, 2024\nJune 27th, 2024June 20th, 2024May 30th, 2024May 10th, 2024\n \n\nSummary: \n Anthropic has released a Prompt Generator tool in the Developer Console, which helps users create high-quality prompts tailored to their specific tasks. The tool is discussed in a recent blog post, and is part of Anthropic's suite of Claude AI model-related products and services. \n \n\n \n More Resources\n\nText\n More Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. 
Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbook More Resources\nText capabilities and use casesClaude CookbookMore Resources\n \n\nSummary: \n The Claude Documentation provides a Prompt Engineering Guide to help users master the art of prompt crafting, a Prompt Library with pre-crafted prompts for various tasks, and API Documentation for interacting with the Claude AI model. These resources are designed to help users get the most out of the Claude model, particularly for fine-tuning with legacy models. \n \n\n \n Prompt engineering workflow\n\nText\n Prompt engineering workflow\n\n\nOur Claude for Sheets prompting examples workbench is a Claude-powered spreadsheet that houses example prompts and prompt engineering structures.\n \n\nSummary: \n The Anthropic Claude for Sheets prompting examples workbench is a Claude-powered spreadsheet that provides example prompts and prompt engineering structures, serving as a resource for users to explore and learn about prompt engineering. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n When did Anthropic release a prompt generator tool to help guide Claude in generating high-quality prompts, and through what interface is it available?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n May 10th, 2024\n\nText\n May 10th, 2024\n\n\nOur prompt generator tool is now available in the Developer Console. Prompt Generator makes it easy to guide Claude to generate a high-quality prompts tailored to your specific tasks. Read more in our blog post.\nOverviewClaude Appsxlinkedin\nOverviewClaude Apps\nxlinkedin\nJune 27th, 2024 June 20th, 2024 May 30th, 2024 May 10th, 2024\nJune 27th, 2024June 20th, 2024May 30th, 2024May 10th, 2024\n \n\nSummary: \n Anthropic has released a Prompt Generator tool in the Developer Console, which helps users create high-quality prompts tailored to their specific tasks. The tool is discussed in a recent blog post, and is part of Anthropic's suite of Claude AI model-related products and services. \n \n\n \n More Resources\n\nText\n More Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. 
Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbooks More Resources\nText capabilities and use casesClaude CookbooksMore Resources\n \n\nSummary: \n The Claude Documentation provides a Prompt Engineering Guide to help users master the art of prompt crafting, a Prompt Library with pre-crafted prompts for various tasks, and API Documentation for interacting with the Claude AI model. These resources are designed to help users get the most out of the Claude model, particularly for fine-tuning with legacy models. \n \n\n \n Prompt engineering workflow\n\nText\n Prompt engineering workflow\n\n\nOur Claude for Sheets prompting examples workbench is a Claude-powered spreadsheet that houses example prompts and prompt engineering structures.\n \n\nSummary: \n The Anthropic Claude for Sheets prompting examples workbench is a Claude-powered spreadsheet that provides example prompts and prompt engineering structures, serving as a resource for users to explore and learn about prompt engineering. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "Haiku: T-0.0",
"latencyMs": 758,
"tokenUsage": {
@@ -44218,7 +44218,7 @@
"score": 1,
"namedScores": {},
"text": "According to the provided information, Anthropic released a prompt generator tool on May 10th, 2024. The tool is available through the Developer Console interface. This prompt generator is designed to help users create high-quality prompts tailored to specific tasks when working with Claude.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n When did Anthropic release a prompt generator tool to help guide Claude in generating high-quality prompts, and through what interface is it available?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n May 10th, 2024\n\nText\n May 10th, 2024\n\n\nOur prompt generator tool is now available in the Developer Console. Prompt Generator makes it easy to guide Claude to generate a high-quality prompts tailored to your specific tasks. Read more in our blog post.\nOverviewClaude Appsxlinkedin\nOverviewClaude Apps\nxlinkedin\nJune 27th, 2024 June 20th, 2024 May 30th, 2024 May 10th, 2024\nJune 27th, 2024June 20th, 2024May 30th, 2024May 10th, 2024\n \n\nSummary: \n Anthropic has released a Prompt Generator tool in the Developer Console, which helps users create high-quality prompts tailored to their specific tasks. The tool is discussed in a recent blog post, and is part of Anthropic's suite of Claude AI model-related products and services. \n \n\n \n More Resources\n\nText\n More Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. 
Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbook More Resources\nText capabilities and use casesClaude CookbookMore Resources\n \n\nSummary: \n The Claude Documentation provides a Prompt Engineering Guide to help users master the art of prompt crafting, a Prompt Library with pre-crafted prompts for various tasks, and API Documentation for interacting with the Claude AI model. These resources are designed to help users get the most out of the Claude model, particularly for fine-tuning with legacy models. \n \n\n \n Prompt engineering workflow\n\nText\n Prompt engineering workflow\n\n\nOur Claude for Sheets prompting examples workbench is a Claude-powered spreadsheet that houses example prompts and prompt engineering structures.\n \n\nSummary: \n The Anthropic Claude for Sheets prompting examples workbench is a Claude-powered spreadsheet that provides example prompts and prompt engineering structures, serving as a resource for users to explore and learn about prompt engineering. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n When did Anthropic release a prompt generator tool to help guide Claude in generating high-quality prompts, and through what interface is it available?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n May 10th, 2024\n\nText\n May 10th, 2024\n\n\nOur prompt generator tool is now available in the Developer Console. Prompt Generator makes it easy to guide Claude to generate a high-quality prompts tailored to your specific tasks. Read more in our blog post.\nOverviewClaude Appsxlinkedin\nOverviewClaude Apps\nxlinkedin\nJune 27th, 2024 June 20th, 2024 May 30th, 2024 May 10th, 2024\nJune 27th, 2024June 20th, 2024May 30th, 2024May 10th, 2024\n \n\nSummary: \n Anthropic has released a Prompt Generator tool in the Developer Console, which helps users create high-quality prompts tailored to their specific tasks. The tool is discussed in a recent blog post, and is part of Anthropic's suite of Claude AI model-related products and services. \n \n\n \n More Resources\n\nText\n More Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. 
Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbooks More Resources\nText capabilities and use casesClaude CookbooksMore Resources\n \n\nSummary: \n The Claude Documentation provides a Prompt Engineering Guide to help users master the art of prompt crafting, a Prompt Library with pre-crafted prompts for various tasks, and API Documentation for interacting with the Claude AI model. These resources are designed to help users get the most out of the Claude model, particularly for fine-tuning with legacy models. \n \n\n \n Prompt engineering workflow\n\nText\n Prompt engineering workflow\n\n\nOur Claude for Sheets prompting examples workbench is a Claude-powered spreadsheet that houses example prompts and prompt engineering structures.\n \n\nSummary: \n The Anthropic Claude for Sheets prompting examples workbench is a Claude-powered spreadsheet that provides example prompts and prompt engineering structures, serving as a resource for users to explore and learn about prompt engineering. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "3.5 Sonnet: T-0.0",
"latencyMs": 1816,
"tokenUsage": {
@@ -44256,7 +44256,7 @@
"score": 1,
"namedScores": {},
"text": "Anthropic released a prompt generator tool to help guide Claude in generating high-quality prompts on May 10th, 2024, and it is available through the Developer Console.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n When did Anthropic release a prompt generator tool to help guide Claude in generating high-quality prompts, and through what interface is it available?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n May 10th, 2024\n\nMay 10th, 2024\n\n\nOur prompt generator tool is now available in the Developer Console. Prompt Generator makes it easy to guide Claude to generate a high-quality prompts tailored to your specific tasks. Read more in our blog post.\nOverviewClaude Appsxlinkedin\nOverviewClaude Apps\nxlinkedin\nJune 27th, 2024 June 20th, 2024 May 30th, 2024 May 10th, 2024\nJune 27th, 2024June 20th, 2024May 30th, 2024May 10th, 2024\n \n \n\n \n Develop with Claude\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n \n\n \n Before prompt engineering\n\nBefore prompt engineering\n\n\nThis guide assumes that you have:\nA clear definition of the 
success criteria for your use case\nSome ways to empirically test against those criteria\nA first draft prompt you want to improve\nIf not, we highly suggest you spend time establishing that first. Check out Define your success criteria and Create strong empirical evaluations for tips and guidance.\nPrompt generatorDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n\nPrompt generator\nDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n When did Anthropic release a prompt generator tool to help guide Claude in generating high-quality prompts, and through what interface is it available?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n May 10th, 2024\n\nMay 10th, 2024\n\n\nOur prompt generator tool is now available in the Developer Console. Prompt Generator makes it easy to guide Claude to generate a high-quality prompts tailored to your specific tasks. Read more in our blog post.\nOverviewClaude Appsxlinkedin\nOverviewClaude Apps\nxlinkedin\nJune 27th, 2024 June 20th, 2024 May 30th, 2024 May 10th, 2024\nJune 27th, 2024June 20th, 2024May 30th, 2024May 10th, 2024\n \n \n\n \n Develop with Claude\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n \n\n \n Before prompt engineering\n\nBefore prompt engineering\n\n\nThis guide assumes that you have:\nA clear definition of 
the success criteria for your use case\nSome ways to empirically test against those criteria\nA first draft prompt you want to improve\nIf not, we highly suggest you spend time establishing that first. Check out Define your success criteria and Create strong empirical evaluations for tips and guidance.\nPrompt generatorDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n\nPrompt generator\nDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "Haiku: T-0.0",
"latencyMs": 840,
"tokenUsage": {
@@ -44294,7 +44294,7 @@
"score": 1,
"namedScores": {},
"text": "According to the provided documents, Anthropic released a prompt generator tool on May 10th, 2024. The prompt generator is available through the Developer Console interface. This tool is designed to help guide Claude in generating high-quality prompts tailored to specific tasks.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n When did Anthropic release a prompt generator tool to help guide Claude in generating high-quality prompts, and through what interface is it available?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n May 10th, 2024\n\nMay 10th, 2024\n\n\nOur prompt generator tool is now available in the Developer Console. Prompt Generator makes it easy to guide Claude to generate a high-quality prompts tailored to your specific tasks. Read more in our blog post.\nOverviewClaude Appsxlinkedin\nOverviewClaude Apps\nxlinkedin\nJune 27th, 2024 June 20th, 2024 May 30th, 2024 May 10th, 2024\nJune 27th, 2024June 20th, 2024May 30th, 2024May 10th, 2024\n \n \n\n \n Develop with Claude\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n \n\n \n Before prompt engineering\n\nBefore prompt engineering\n\n\nThis guide assumes that you have:\nA clear definition of the 
success criteria for your use case\nSome ways to empirically test against those criteria\nA first draft prompt you want to improve\nIf not, we highly suggest you spend time establishing that first. Check out Define your success criteria and Create strong empirical evaluations for tips and guidance.\nPrompt generatorDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n\nPrompt generator\nDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n When did Anthropic release a prompt generator tool to help guide Claude in generating high-quality prompts, and through what interface is it available?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n May 10th, 2024\n\nMay 10th, 2024\n\n\nOur prompt generator tool is now available in the Developer Console. Prompt Generator makes it easy to guide Claude to generate a high-quality prompts tailored to your specific tasks. Read more in our blog post.\nOverviewClaude Appsxlinkedin\nOverviewClaude Apps\nxlinkedin\nJune 27th, 2024 June 20th, 2024 May 30th, 2024 May 10th, 2024\nJune 27th, 2024June 20th, 2024May 30th, 2024May 10th, 2024\n \n \n\n \n Develop with Claude\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n \n\n \n Before prompt engineering\n\nBefore prompt engineering\n\n\nThis guide assumes that you have:\nA clear definition of 
the success criteria for your use case\nSome ways to empirically test against those criteria\nA first draft prompt you want to improve\nIf not, we highly suggest you spend time establishing that first. Check out Define your success criteria and Create strong empirical evaluations for tips and guidance.\nPrompt generatorDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n\nPrompt generator\nDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "3.5 Sonnet: T-0.0",
"latencyMs": 1365,
"tokenUsage": {
@@ -46331,7 +46331,7 @@
"score": 0,
"namedScores": {},
"text": "The Generated Answer is partially correct but misses a key piece of information. It correctly identifies \"Develop your test cases\" as one of the steps, which aligns with the Correct Answer. However, it incorrectly states \"Build a strong input prompt\" as the first step, which is not mentioned in the Correct Answer. The Correct Answer specifies that the second step is to \"take a look at Anthropic's guide to developing test cases,\" which is not included in the Generated Answer. This omission of a critical piece of information from the Correct Answer makes the Generated Answer incomplete and therefore incorrect.\n---\nThe two steps needed before running a classification evaluation on Claude according to the documentation are:\n\n1. Build a strong input prompt.\n2. Develop your test cases.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What two steps are needed before running a classification evaluation on Claude according to the documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nImplement Claude for classification\n\n\nThe three key model decision factors are: intelligence, latency, and price.\nFor classification, a smaller model like Claude 3 Haiku is typically ideal due to its speed and efficiency. Though, for classification tasks where specialized knowledge or complex reasoning is required, Sonnet or Opus may be a better choice. Learn more about how Opus, Sonnet, and Haiku compare here.\nUse evaluations to gauge whether a Claude model is performing well enough to launch into production.\n\n\nDeploy your classifier\n\n\nTo see code examples of how to use Claude for classification, check out the Classification Guide in the Claude Cookbook.\nOverviewTicket Routingxlinkedin\nOverviewTicket Routing\nxlinkedin\nWhen to use Claude for classification Establish your classification use case Implement Claude for classification 1. Build a strong input prompt 2. Develop your test cases 3. Run your eval Evaluation metrics Deploy your classifier\nWhen to use Claude for classificationEstablish your classification use caseImplement Claude for classification1. Build a strong input prompt2. Develop your test cases3. Run your evalEvaluation metricsDeploy your classifier\n\n\nEvaluation metrics\n\n\nSome success metrics to consider evaluating Claude’s performance on a classification task include:\nCriteriaDescriptionAccuracyThe model’s output exactly matches the golden answer or correctly classifies the input according to the task’s requirements. 
This is typically calculated as (Number of correct predictions) / (Overall number of predictions).F1 ScoreThe model’s output optimally balances precision and recall.ConsistencyThe model’s output is consistent with its predictions for similar inputs or follows a logical pattern.StructureThe model’s output follows the expected format or structure, making it easy to parse and interpret. For example, many classifiers are expected to output JSON format.SpeedThe model provides a response within the acceptable time limit or latency threshold for the task.Bias and FairnessIf classifying data about people, is it important that the model does not demonstrate any biases based on gender, ethnicity, or other characteristics that would lead to its misclassification.\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What two steps are needed before running a classification evaluation on Claude according to the documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nImplement Claude for classification\n\n\nThe three key model decision factors are: intelligence, latency, and price.\nFor classification, a smaller model like Claude 3 Haiku is typically ideal due to its speed and efficiency. Though, for classification tasks where specialized knowledge or complex reasoning is required, Sonnet or Opus may be a better choice. Learn more about how Opus, Sonnet, and Haiku compare here.\nUse evaluations to gauge whether a Claude model is performing well enough to launch into production.\n\n\nDeploy your classifier\n\n\nTo see code examples of how to use Claude for classification, check out the Classification Guide in the Claude Cookbooks.\nOverviewTicket Routingxlinkedin\nOverviewTicket Routing\nxlinkedin\nWhen to use Claude for classification Establish your classification use case Implement Claude for classification 1. Build a strong input prompt 2. Develop your test cases 3. Run your eval Evaluation metrics Deploy your classifier\nWhen to use Claude for classificationEstablish your classification use caseImplement Claude for classification1. Build a strong input prompt2. Develop your test cases3. Run your evalEvaluation metricsDeploy your classifier\n\n\nEvaluation metrics\n\n\nSome success metrics to consider evaluating Claude’s performance on a classification task include:\nCriteriaDescriptionAccuracyThe model’s output exactly matches the golden answer or correctly classifies the input according to the task’s requirements. 
This is typically calculated as (Number of correct predictions) / (Overall number of predictions).F1 ScoreThe model’s output optimally balances precision and recall.ConsistencyThe model’s output is consistent with its predictions for similar inputs or follows a logical pattern.StructureThe model’s output follows the expected format or structure, making it easy to parse and interpret. For example, many classifiers are expected to output JSON format.SpeedThe model provides a response within the acceptable time limit or latency threshold for the task.Bias and FairnessIf classifying data about people, is it important that the model does not demonstrate any biases based on gender, ethnicity, or other characteristics that would lead to its misclassification.\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "Haiku: T-0.0",
"latencyMs": 1,
"tokenUsage": {},
@@ -46365,7 +46365,7 @@
"score": 0,
"namedScores": {},
"text": "The Generated Answer is partially correct but misses a key element mentioned in the Correct Answer. Both answers agree that developing test cases is one of the necessary steps. However, the Generated Answer incorrectly states that \"Build a strong input prompt\" is the other step, while the Correct Answer specifies that users should \"take a look at Anthropic's guide to developing test cases.\" This guide is not mentioned in the Generated Answer, which is a critical piece of information missing. Additionally, the Generated Answer does not capture the implied sequence (first look at the guide, then develop test cases) that the Correct Answer suggests. Due to these discrepancies, the Generated Answer cannot be considered fully correct.\n---\nAccording to the documentation provided, two steps are needed before running a classification evaluation on Claude:\n\n1. Build a strong input prompt\n2. Develop your test cases\n\nThese steps are listed under the \"Implement Claude for classification\" section, preceding the \"Run your eval\" step.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What two steps are needed before running a classification evaluation on Claude according to the documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nImplement Claude for classification\n\n\nThe three key model decision factors are: intelligence, latency, and price.\nFor classification, a smaller model like Claude 3 Haiku is typically ideal due to its speed and efficiency. Though, for classification tasks where specialized knowledge or complex reasoning is required, Sonnet or Opus may be a better choice. Learn more about how Opus, Sonnet, and Haiku compare here.\nUse evaluations to gauge whether a Claude model is performing well enough to launch into production.\n\n\nDeploy your classifier\n\n\nTo see code examples of how to use Claude for classification, check out the Classification Guide in the Claude Cookbook.\nOverviewTicket Routingxlinkedin\nOverviewTicket Routing\nxlinkedin\nWhen to use Claude for classification Establish your classification use case Implement Claude for classification 1. Build a strong input prompt 2. Develop your test cases 3. Run your eval Evaluation metrics Deploy your classifier\nWhen to use Claude for classificationEstablish your classification use caseImplement Claude for classification1. Build a strong input prompt2. Develop your test cases3. Run your evalEvaluation metricsDeploy your classifier\n\n\nEvaluation metrics\n\n\nSome success metrics to consider evaluating Claude’s performance on a classification task include:\nCriteriaDescriptionAccuracyThe model’s output exactly matches the golden answer or correctly classifies the input according to the task’s requirements. 
This is typically calculated as (Number of correct predictions) / (Overall number of predictions).F1 ScoreThe model’s output optimally balances precision and recall.ConsistencyThe model’s output is consistent with its predictions for similar inputs or follows a logical pattern.StructureThe model’s output follows the expected format or structure, making it easy to parse and interpret. For example, many classifiers are expected to output JSON format.SpeedThe model provides a response within the acceptable time limit or latency threshold for the task.Bias and FairnessIf classifying data about people, is it important that the model does not demonstrate any biases based on gender, ethnicity, or other characteristics that would lead to its misclassification.\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What two steps are needed before running a classification evaluation on Claude according to the documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nImplement Claude for classification\n\n\nThe three key model decision factors are: intelligence, latency, and price.\nFor classification, a smaller model like Claude 3 Haiku is typically ideal due to its speed and efficiency. Though, for classification tasks where specialized knowledge or complex reasoning is required, Sonnet or Opus may be a better choice. Learn more about how Opus, Sonnet, and Haiku compare here.\nUse evaluations to gauge whether a Claude model is performing well enough to launch into production.\n\n\nDeploy your classifier\n\n\nTo see code examples of how to use Claude for classification, check out the Classification Guide in the Claude Cookbooks.\nOverviewTicket Routingxlinkedin\nOverviewTicket Routing\nxlinkedin\nWhen to use Claude for classification Establish your classification use case Implement Claude for classification 1. Build a strong input prompt 2. Develop your test cases 3. Run your eval Evaluation metrics Deploy your classifier\nWhen to use Claude for classificationEstablish your classification use caseImplement Claude for classification1. Build a strong input prompt2. Develop your test cases3. Run your evalEvaluation metricsDeploy your classifier\n\n\nEvaluation metrics\n\n\nSome success metrics to consider evaluating Claude’s performance on a classification task include:\nCriteriaDescriptionAccuracyThe model’s output exactly matches the golden answer or correctly classifies the input according to the task’s requirements. 
This is typically calculated as (Number of correct predictions) / (Overall number of predictions).F1 ScoreThe model’s output optimally balances precision and recall.ConsistencyThe model’s output is consistent with its predictions for similar inputs or follows a logical pattern.StructureThe model’s output follows the expected format or structure, making it easy to parse and interpret. For example, many classifiers are expected to output JSON format.SpeedThe model provides a response within the acceptable time limit or latency threshold for the task.Bias and FairnessIf classifying data about people, is it important that the model does not demonstrate any biases based on gender, ethnicity, or other characteristics that would lead to its misclassification.\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "3.5 Sonnet: T-0.0",
"latencyMs": 1619,
"tokenUsage": {
@@ -46403,7 +46403,7 @@
"score": 0,
"namedScores": {},
"text": "The Generated Answer is partially correct but misses a critical piece of information. Both answers correctly identify \"Develop your test cases\" as the first step. However, the Generated Answer incorrectly states that the second step is to \"Run your evaluation.\" The Correct Answer indicates that the second step is to \"take a look at Anthropic's guide to developing test cases.\" This is a significant difference, as reviewing the guide is an important preparatory step before actually running the evaluation. The Generated Answer skips this crucial step and jumps directly to running the evaluation, which could lead to poorly designed test cases. Therefore, while the Generated Answer gets the first step right, it misses an essential part of the preparation process.\n---\nAccording to the documentation, the two steps needed before running a classification evaluation on Claude are:\n\n1. Develop your test cases.\n2. Run your evaluation.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What two steps are needed before running a classification evaluation on Claude according to the documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Implement Claude for classification\n\nText\n Implement Claude for classification\n\n\nThe three key model decision factors are: intelligence, latency, and price.\nFor classification, a smaller model like Claude 3 Haiku is typically ideal due to its speed and efficiency. Though, for classification tasks where specialized knowledge or complex reasoning is required, Sonnet or Opus may be a better choice. Learn more about how Opus, Sonnet, and Haiku compare here.\nUse evaluations to gauge whether a Claude model is performing well enough to launch into production.\n \n\nSummary: \n For classification tasks, the smaller Claude 3 Haiku model is typically ideal due to its speed and efficiency, though Sonnet or Opus may be better for tasks requiring specialized knowledge or complex reasoning. Evaluations should be used to gauge whether a Claude model is performing well enough for production. \n \n\n \n Evaluation metrics\n\nText\n Evaluation metrics\n\n\nSome success metrics to consider evaluating Claude’s performance on a classification task include:\nCriteriaDescriptionAccuracyThe model’s output exactly matches the golden answer or correctly classifies the input according to the task’s requirements. This is typically calculated as (Number of correct predictions) / (Overall number of predictions).F1 ScoreThe model’s output optimally balances precision and recall.ConsistencyThe model’s output is consistent with its predictions for similar inputs or follows a logical pattern.StructureThe model’s output follows the expected format or structure, making it easy to parse and interpret. 
For example, many classifiers are expected to output JSON format.SpeedThe model provides a response within the acceptable time limit or latency threshold for the task.Bias and FairnessIf classifying data about people, is it important that the model does not demonstrate any biases based on gender, ethnicity, or other characteristics that would lead to its misclassification.\n \n\nSummary: \n The documentation covers evaluation metrics for the Claude AI model, including accuracy, F1 score, consistency, structure, speed, and bias/fairness. These metrics can be used to assess the model's performance on classification tasks, ensuring it meets the required standards for output quality, consistency, and fairness. \n \n\n \n Deploy your classifier\n\nText\n Deploy your classifier\n\n\nTo see code examples of how to use Claude for classification, check out the Classification Guide in the Claude Cookbook.\nOverviewTicket Routingxlinkedin\nOverviewTicket Routing\nxlinkedin\nWhen to use Claude for classification Establish your classification use case Implement Claude for classification 1. Build a strong input prompt 2. Develop your test cases 3. Run your eval Evaluation metrics Deploy your classifier\nWhen to use Claude for classificationEstablish your classification use caseImplement Claude for classification1. Build a strong input prompt2. Develop your test cases3. Run your evalEvaluation metricsDeploy your classifier\n \n\nSummary: \n Deploy your classifier: Check out the Classification Guide in the Claude Cookbook for code examples on using Claude for classification. The guide covers when to use Claude for classification, establishing your use case, implementing Claude, building prompts, developing test cases, running evaluations, and deploying your classifier. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. 
\n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What two steps are needed before running a classification evaluation on Claude according to the documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Implement Claude for classification\n\nText\n Implement Claude for classification\n\n\nThe three key model decision factors are: intelligence, latency, and price.\nFor classification, a smaller model like Claude 3 Haiku is typically ideal due to its speed and efficiency. Though, for classification tasks where specialized knowledge or complex reasoning is required, Sonnet or Opus may be a better choice. Learn more about how Opus, Sonnet, and Haiku compare here.\nUse evaluations to gauge whether a Claude model is performing well enough to launch into production.\n \n\nSummary: \n For classification tasks, the smaller Claude 3 Haiku model is typically ideal due to its speed and efficiency, though Sonnet or Opus may be better for tasks requiring specialized knowledge or complex reasoning. Evaluations should be used to gauge whether a Claude model is performing well enough for production. \n \n\n \n Evaluation metrics\n\nText\n Evaluation metrics\n\n\nSome success metrics to consider evaluating Claude’s performance on a classification task include:\nCriteriaDescriptionAccuracyThe model’s output exactly matches the golden answer or correctly classifies the input according to the task’s requirements. This is typically calculated as (Number of correct predictions) / (Overall number of predictions).F1 ScoreThe model’s output optimally balances precision and recall.ConsistencyThe model’s output is consistent with its predictions for similar inputs or follows a logical pattern.StructureThe model’s output follows the expected format or structure, making it easy to parse and interpret. 
For example, many classifiers are expected to output JSON format.SpeedThe model provides a response within the acceptable time limit or latency threshold for the task.Bias and FairnessIf classifying data about people, is it important that the model does not demonstrate any biases based on gender, ethnicity, or other characteristics that would lead to its misclassification.\n \n\nSummary: \n The documentation covers evaluation metrics for the Claude AI model, including accuracy, F1 score, consistency, structure, speed, and bias/fairness. These metrics can be used to assess the model's performance on classification tasks, ensuring it meets the required standards for output quality, consistency, and fairness. \n \n\n \n Deploy your classifier\n\nText\n Deploy your classifier\n\n\nTo see code examples of how to use Claude for classification, check out the Classification Guide in the Claude Cookbooks.\nOverviewTicket Routingxlinkedin\nOverviewTicket Routing\nxlinkedin\nWhen to use Claude for classification Establish your classification use case Implement Claude for classification 1. Build a strong input prompt 2. Develop your test cases 3. Run your eval Evaluation metrics Deploy your classifier\nWhen to use Claude for classificationEstablish your classification use caseImplement Claude for classification1. Build a strong input prompt2. Develop your test cases3. Run your evalEvaluation metricsDeploy your classifier\n \n\nSummary: \n Deploy your classifier: Check out the Classification Guide in the Claude Cookbooks for code examples on using Claude for classification. The guide covers when to use Claude for classification, establishing your use case, implementing Claude, building prompts, developing test cases, running evaluations, and deploying your classifier. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. 
\n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "Haiku: T-0.0",
"latencyMs": 916,
"tokenUsage": {
@@ -46441,7 +46441,7 @@
"score": 0,
"namedScores": {},
"text": "The Generated Answer is partially correct, but misses a key element mentioned in the Correct Answer. Both answers agree that developing test cases is one of the necessary steps. However, the Generated Answer incorrectly states that building a strong input prompt is the other step, while the Correct Answer specifies that taking a look at Anthropic's guide to developing test cases is the second step. This is a significant difference in content. Additionally, the Generated Answer does not mention the need to review Anthropic's guide, which is an important part of the process according to the Correct Answer. Due to this missing critical information and the incorrect step mentioned, the Generated Answer cannot be considered fully correct.\n---\nAccording to the documentation provided, two steps needed before running a classification evaluation on Claude are:\n\n1. Build a strong input prompt\n2. Develop your test cases\n\nThese steps are listed in the \"Deploy your classifier\" section, which outlines the process for implementing Claude for classification tasks.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What two steps are needed before running a classification evaluation on Claude according to the documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Implement Claude for classification\n\nText\n Implement Claude for classification\n\n\nThe three key model decision factors are: intelligence, latency, and price.\nFor classification, a smaller model like Claude 3 Haiku is typically ideal due to its speed and efficiency. Though, for classification tasks where specialized knowledge or complex reasoning is required, Sonnet or Opus may be a better choice. Learn more about how Opus, Sonnet, and Haiku compare here.\nUse evaluations to gauge whether a Claude model is performing well enough to launch into production.\n \n\nSummary: \n For classification tasks, the smaller Claude 3 Haiku model is typically ideal due to its speed and efficiency, though Sonnet or Opus may be better for tasks requiring specialized knowledge or complex reasoning. Evaluations should be used to gauge whether a Claude model is performing well enough for production. \n \n\n \n Evaluation metrics\n\nText\n Evaluation metrics\n\n\nSome success metrics to consider evaluating Claude’s performance on a classification task include:\nCriteriaDescriptionAccuracyThe model’s output exactly matches the golden answer or correctly classifies the input according to the task’s requirements. This is typically calculated as (Number of correct predictions) / (Overall number of predictions).F1 ScoreThe model’s output optimally balances precision and recall.ConsistencyThe model’s output is consistent with its predictions for similar inputs or follows a logical pattern.StructureThe model’s output follows the expected format or structure, making it easy to parse and interpret. 
For example, many classifiers are expected to output JSON format.SpeedThe model provides a response within the acceptable time limit or latency threshold for the task.Bias and FairnessIf classifying data about people, is it important that the model does not demonstrate any biases based on gender, ethnicity, or other characteristics that would lead to its misclassification.\n \n\nSummary: \n The documentation covers evaluation metrics for the Claude AI model, including accuracy, F1 score, consistency, structure, speed, and bias/fairness. These metrics can be used to assess the model's performance on classification tasks, ensuring it meets the required standards for output quality, consistency, and fairness. \n \n\n \n Deploy your classifier\n\nText\n Deploy your classifier\n\n\nTo see code examples of how to use Claude for classification, check out the Classification Guide in the Claude Cookbook.\nOverviewTicket Routingxlinkedin\nOverviewTicket Routing\nxlinkedin\nWhen to use Claude for classification Establish your classification use case Implement Claude for classification 1. Build a strong input prompt 2. Develop your test cases 3. Run your eval Evaluation metrics Deploy your classifier\nWhen to use Claude for classificationEstablish your classification use caseImplement Claude for classification1. Build a strong input prompt2. Develop your test cases3. Run your evalEvaluation metricsDeploy your classifier\n \n\nSummary: \n Deploy your classifier: Check out the Classification Guide in the Claude Cookbook for code examples on using Claude for classification. The guide covers when to use Claude for classification, establishing your use case, implementing Claude, building prompts, developing test cases, running evaluations, and deploying your classifier. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. 
\n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What two steps are needed before running a classification evaluation on Claude according to the documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Implement Claude for classification\n\nText\n Implement Claude for classification\n\n\nThe three key model decision factors are: intelligence, latency, and price.\nFor classification, a smaller model like Claude 3 Haiku is typically ideal due to its speed and efficiency. Though, for classification tasks where specialized knowledge or complex reasoning is required, Sonnet or Opus may be a better choice. Learn more about how Opus, Sonnet, and Haiku compare here.\nUse evaluations to gauge whether a Claude model is performing well enough to launch into production.\n \n\nSummary: \n For classification tasks, the smaller Claude 3 Haiku model is typically ideal due to its speed and efficiency, though Sonnet or Opus may be better for tasks requiring specialized knowledge or complex reasoning. Evaluations should be used to gauge whether a Claude model is performing well enough for production. \n \n\n \n Evaluation metrics\n\nText\n Evaluation metrics\n\n\nSome success metrics to consider evaluating Claude’s performance on a classification task include:\nCriteriaDescriptionAccuracyThe model’s output exactly matches the golden answer or correctly classifies the input according to the task’s requirements. This is typically calculated as (Number of correct predictions) / (Overall number of predictions).F1 ScoreThe model’s output optimally balances precision and recall.ConsistencyThe model’s output is consistent with its predictions for similar inputs or follows a logical pattern.StructureThe model’s output follows the expected format or structure, making it easy to parse and interpret. 
For example, many classifiers are expected to output JSON format.SpeedThe model provides a response within the acceptable time limit or latency threshold for the task.Bias and FairnessIf classifying data about people, is it important that the model does not demonstrate any biases based on gender, ethnicity, or other characteristics that would lead to its misclassification.\n \n\nSummary: \n The documentation covers evaluation metrics for the Claude AI model, including accuracy, F1 score, consistency, structure, speed, and bias/fairness. These metrics can be used to assess the model's performance on classification tasks, ensuring it meets the required standards for output quality, consistency, and fairness. \n \n\n \n Deploy your classifier\n\nText\n Deploy your classifier\n\n\nTo see code examples of how to use Claude for classification, check out the Classification Guide in the Claude Cookbooks.\nOverviewTicket Routingxlinkedin\nOverviewTicket Routing\nxlinkedin\nWhen to use Claude for classification Establish your classification use case Implement Claude for classification 1. Build a strong input prompt 2. Develop your test cases 3. Run your eval Evaluation metrics Deploy your classifier\nWhen to use Claude for classificationEstablish your classification use caseImplement Claude for classification1. Build a strong input prompt2. Develop your test cases3. Run your evalEvaluation metricsDeploy your classifier\n \n\nSummary: \n Deploy your classifier: Check out the Classification Guide in the Claude Cookbooks for code examples on using Claude for classification. The guide covers when to use Claude for classification, establishing your use case, implementing Claude, building prompts, developing test cases, running evaluations, and deploying your classifier. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. 
\n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "3.5 Sonnet: T-0.0",
"latencyMs": 1943,
"tokenUsage": {
@@ -46479,7 +46479,7 @@
"score": 0,
"namedScores": {},
"text": "The Generated Answer is partially correct but misses a key piece of information. Both answers correctly identify \"Develop your test cases\" as the first step. However, the Generated Answer incorrectly states that the second step is to \"Deploy your classifier.\" The Correct Answer indicates that the second step is to \"take a look at Anthropic's guide to developing test cases.\" This is a significant difference, as reviewing the guide is an important preparatory step that the Generated Answer fails to mention. Additionally, the Generated Answer does not include any reference to Anthropic's guide, which is a critical piece of information present in the Correct Answer. Due to this missing crucial information, the Generated Answer cannot be considered fully correct.\n---\nThe two steps needed before running a classification evaluation on Claude according to the documentation are:\n\n1. Develop your test cases.\n2. Deploy your classifier.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What two steps are needed before running a classification evaluation on Claude according to the documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Building evals and test cases\n\nBuilding evals and test cases\n\n\n \n \n\n \n 2. Develop your test cases\n\n2. Develop your test cases\n\n\nTo run your classification evaluation, you will need test cases to run it on. Take a look at our guide to developing test cases.\n \n \n\n \n Deploy your classifier\n\nDeploy your classifier\n\n\nTo see code examples of how to use Claude for classification, check out the Classification Guide in the Claude Cookbook.\nOverviewTicket Routingxlinkedin\nOverviewTicket Routing\nxlinkedin\nWhen to use Claude for classification Establish your classification use case Implement Claude for classification 1. Build a strong input prompt 2. Develop your test cases 3. Run your eval Evaluation metrics Deploy your classifier\nWhen to use Claude for classificationEstablish your classification use caseImplement Claude for classification1. Build a strong input prompt2. Develop your test cases3. Run your evalEvaluation metricsDeploy your classifier\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What two steps are needed before running a classification evaluation on Claude according to the documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Building evals and test cases\n\nBuilding evals and test cases\n\n\n \n \n\n \n 2. Develop your test cases\n\n2. Develop your test cases\n\n\nTo run your classification evaluation, you will need test cases to run it on. Take a look at our guide to developing test cases.\n \n \n\n \n Deploy your classifier\n\nDeploy your classifier\n\n\nTo see code examples of how to use Claude for classification, check out the Classification Guide in the Claude Cookbooks.\nOverviewTicket Routingxlinkedin\nOverviewTicket Routing\nxlinkedin\nWhen to use Claude for classification Establish your classification use case Implement Claude for classification 1. Build a strong input prompt 2. Develop your test cases 3. Run your eval Evaluation metrics Deploy your classifier\nWhen to use Claude for classificationEstablish your classification use caseImplement Claude for classification1. Build a strong input prompt2. Develop your test cases3. Run your evalEvaluation metricsDeploy your classifier\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "Haiku: T-0.0",
"latencyMs": 799,
"tokenUsage": {
@@ -46517,7 +46517,7 @@
"score": 0,
"namedScores": {},
"text": "The Generated Answer is partially correct but misses a key point mentioned in the Correct Answer. Both answers agree that developing test cases is one of the necessary steps. However, the Generated Answer incorrectly states that building a strong input prompt is the other step, while the Correct Answer specifies that taking a look at Anthropic's guide to developing test cases is the second step. This is a significant difference in content. Additionally, the Generated Answer does not mention the need to review Anthropic's guide, which is explicitly stated in the Correct Answer. Due to this missing critical information and the incorrect step mentioned, the Generated Answer cannot be considered fully correct.\n---\nAccording to the documentation, the two steps needed before running a classification evaluation on Claude are:\n\n1. Build a strong input prompt\n2. Develop your test cases\n\nThese steps are listed in the \"Implement Claude for classification\" section of the document, preceding the step to \"Run your eval\".",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What two steps are needed before running a classification evaluation on Claude according to the documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Building evals and test cases\n\nBuilding evals and test cases\n\n\n \n \n\n \n 2. Develop your test cases\n\n2. Develop your test cases\n\n\nTo run your classification evaluation, you will need test cases to run it on. Take a look at our guide to developing test cases.\n \n \n\n \n Deploy your classifier\n\nDeploy your classifier\n\n\nTo see code examples of how to use Claude for classification, check out the Classification Guide in the Claude Cookbook.\nOverviewTicket Routingxlinkedin\nOverviewTicket Routing\nxlinkedin\nWhen to use Claude for classification Establish your classification use case Implement Claude for classification 1. Build a strong input prompt 2. Develop your test cases 3. Run your eval Evaluation metrics Deploy your classifier\nWhen to use Claude for classificationEstablish your classification use caseImplement Claude for classification1. Build a strong input prompt2. Develop your test cases3. Run your evalEvaluation metricsDeploy your classifier\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What two steps are needed before running a classification evaluation on Claude according to the documentation?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Building evals and test cases\n\nBuilding evals and test cases\n\n\n \n \n\n \n 2. Develop your test cases\n\n2. Develop your test cases\n\n\nTo run your classification evaluation, you will need test cases to run it on. Take a look at our guide to developing test cases.\n \n \n\n \n Deploy your classifier\n\nDeploy your classifier\n\n\nTo see code examples of how to use Claude for classification, check out the Classification Guide in the Claude Cookbooks.\nOverviewTicket Routingxlinkedin\nOverviewTicket Routing\nxlinkedin\nWhen to use Claude for classification Establish your classification use case Implement Claude for classification 1. Build a strong input prompt 2. Develop your test cases 3. Run your eval Evaluation metrics Deploy your classifier\nWhen to use Claude for classificationEstablish your classification use caseImplement Claude for classification1. Build a strong input prompt2. Develop your test cases3. Run your evalEvaluation metricsDeploy your classifier\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "3.5 Sonnet: T-0.0",
"latencyMs": 2815,
"tokenUsage": {
@@ -48800,8 +48800,8 @@
"pass": false,
"score": 0,
"namedScores": {},
- "text": "The Generated Answer is partially correct but contains additional information not present in the Correct Answer. Both answers mention the interactive Jupyter notebooks that demonstrate how to upload PDFs and work with embeddings. However, the Generated Answer adds a second point about hands-on tutorials and practical examples that extend Claude's capabilities, which is not mentioned in the Correct Answer. While this additional information may be true, it goes beyond what is specified in the Correct Answer. Since we are instructed to mark an answer as incorrect if any critical piece of information is missing from the Generated Answer compared to the Correct Answer, we must consider this Generated Answer incorrect.\n---\nThe Claude Cookbook can help developers learn to use Anthropic's APIs in two ways:\n\n1. Interactive Jupyter notebooks that demonstrate uploading PDFs, creating embeddings, and more.\n2. Hands-on tutorials and practical examples that extend Claude's capabilities by integrating external tools and functions into workflows.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\n\nNext steps\n\n\nNow that you have made your first Claude API request, it’s time to explore what else is possible:\nPrompt Engineering GuideOptimize Claude’s performance through prompting.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.Prompt LibraryExplore dozens of example prompts for inspiration across use cases.\nPrompt Engineering GuideOptimize Claude’s performance through prompting.\n\nPrompt Engineering Guide\nOptimize Claude’s performance through prompting.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude 
Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nPrompt LibraryExplore dozens of example prompts for inspiration across use cases.\n\nPrompt Library\nExplore dozens of example prompts for inspiration across use cases.\nOverviewIntro to Claudexlinkedin\nOverviewIntro to Claude\nxlinkedin\nPrerequisites Start with the Workbench Install the SDK Set your API key Call the API Next steps\nPrerequisitesStart with the WorkbenchInstall the SDKSet your API keyCall the APINext steps\n\n\nClaude Cookbook\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. 
\n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "text": "The Generated Answer is partially correct but contains additional information not present in the Correct Answer. Both answers mention the interactive Jupyter notebooks that demonstrate how to upload PDFs and work with embeddings. However, the Generated Answer adds a second point about hands-on tutorials and practical examples that extend Claude's capabilities, which is not mentioned in the Correct Answer. While this additional information may be true, it goes beyond what is specified in the Correct Answer. Since we are instructed to mark an answer as incorrect if any critical piece of information is missing from the Generated Answer compared to the Correct Answer, we must consider this Generated Answer incorrect.\n---\nThe Claude Cookbooks can help developers learn to use Anthropic's APIs in two ways:\n\n1. Interactive Jupyter notebooks that demonstrate uploading PDFs, creating embeddings, and more.\n2. Hands-on tutorials and practical examples that extend Claude's capabilities by integrating external tools and functions into workflows.",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways the Claude Cookbooks can help developers learn to use Anthropic's APIs?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\n\nNext steps\n\n\nNow that you have made your first Claude API request, it’s time to explore what else is possible:\nPrompt Engineering GuideOptimize Claude’s performance through prompting.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.Prompt LibraryExplore dozens of example prompts for inspiration across use cases.\nPrompt Engineering GuideOptimize Claude’s performance through prompting.\n\nPrompt Engineering Guide\nOptimize Claude’s performance through prompting.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude 
Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nPrompt LibraryExplore dozens of example prompts for inspiration across use cases.\n\nPrompt Library\nExplore dozens of example prompts for inspiration across use cases.\nOverviewIntro to Claudexlinkedin\nOverviewIntro to Claude\nxlinkedin\nPrerequisites Start with the Workbench Install the SDK Set your API key Call the API Next steps\nPrerequisitesStart with the WorkbenchInstall the SDKSet your API keyCall the APINext steps\n\n\nClaude Cookbooks\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer 
already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "Haiku: T-0.0",
"latencyMs": 1,
"tokenUsage": {},
@@ -48834,8 +48834,8 @@
"pass": true,
"score": 1,
"namedScores": {},
- "text": "The Claude Cookbook can help developers learn to use Anthropic's APIs in two key ways:\n\n1. It provides interactive Jupyter notebooks that demonstrate practical examples, such as uploading PDFs and working with embeddings.\n\n2. It offers hands-on tutorials for specific tasks like PDF summarization and integrating external tools and functions into workflows with Claude.\n\nThese interactive and practical resources allow developers to gain hands-on experience with Anthropic's APIs through guided examples and tutorials.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\n\nNext steps\n\n\nNow that you have made your first Claude API request, it’s time to explore what else is possible:\nPrompt Engineering GuideOptimize Claude’s performance through prompting.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.Prompt LibraryExplore dozens of example prompts for inspiration across use cases.\nPrompt Engineering GuideOptimize Claude’s performance through prompting.\n\nPrompt Engineering Guide\nOptimize Claude’s performance through prompting.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude 
Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nPrompt LibraryExplore dozens of example prompts for inspiration across use cases.\n\nPrompt Library\nExplore dozens of example prompts for inspiration across use cases.\nOverviewIntro to Claudexlinkedin\nOverviewIntro to Claude\nxlinkedin\nPrerequisites Start with the Workbench Install the SDK Set your API key Call the API Next steps\nPrerequisitesStart with the WorkbenchInstall the SDKSet your API keyCall the APINext steps\n\n\nClaude Cookbook\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. 
\n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "text": "The Claude Cookbooks can help developers learn to use Anthropic's APIs in two key ways:\n\n1. It provides interactive Jupyter notebooks that demonstrate practical examples, such as uploading PDFs and working with embeddings.\n\n2. It offers hands-on tutorials for specific tasks like PDF summarization and integrating external tools and functions into workflows with Claude.\n\nThese interactive and practical resources allow developers to gain hands-on experience with Anthropic's APIs through guided examples and tutorials.",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways the Claude Cookbooks can help developers learn to use Anthropic's APIs?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\n\nNext steps\n\n\nNow that you have made your first Claude API request, it’s time to explore what else is possible:\nPrompt Engineering GuideOptimize Claude’s performance through prompting.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.Prompt LibraryExplore dozens of example prompts for inspiration across use cases.\nPrompt Engineering GuideOptimize Claude’s performance through prompting.\n\nPrompt Engineering Guide\nOptimize Claude’s performance through prompting.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude 
Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nPrompt LibraryExplore dozens of example prompts for inspiration across use cases.\n\nPrompt Library\nExplore dozens of example prompts for inspiration across use cases.\nOverviewIntro to Claudexlinkedin\nOverviewIntro to Claude\nxlinkedin\nPrerequisites Start with the Workbench Install the SDK Set your API key Call the API Next steps\nPrerequisitesStart with the WorkbenchInstall the SDKSet your API keyCall the APINext steps\n\n\nClaude Cookbooks\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer 
already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "3.5 Sonnet: T-0.0",
"latencyMs": 2820,
"tokenUsage": {
@@ -48857,7 +48857,7 @@
{
"pass": true,
"score": 1,
- "reason": "The Generated Answer is correct based on the Correct Answer. Both answers highlight that the Claude Cookbook provides interactive Jupyter notebooks to help developers learn to use Anthropic's APIs. They both specifically mention working with PDFs and embeddings as examples. While the Generated Answer provides an additional point about tutorials for specific tasks, this doesn't contradict the Correct Answer and can be seen as an elaboration on the same core idea of providing practical, interactive learning resources. The substance of both answers is essentially the same, focusing on the Cookbook's role in providing hands-on, interactive learning experiences for developers working with Anthropic's APIs.",
+ "reason": "The Generated Answer is correct based on the Correct Answer. Both answers highlight that the Claude Cookbooks provides interactive Jupyter notebooks to help developers learn to use Anthropic's APIs. They both specifically mention working with PDFs and embeddings as examples. While the Generated Answer provides an additional point about tutorials for specific tasks, this doesn't contradict the Correct Answer and can be seen as an elaboration on the same core idea of providing practical, interactive learning resources. The substance of both answers is essentially the same, focusing on the Cookbook's role in providing hands-on, interactive learning experiences for developers working with Anthropic's APIs.",
"assertion": {
"type": "python",
"value": "file://eval_end_to_end.py"
@@ -48872,8 +48872,8 @@
"pass": false,
"score": 0,
"namedScores": {},
- "text": "The Generated Answer is partially correct but contains some information not present in the Correct Answer. The first point about interactive Jupyter notebooks demonstrating PDF uploads and embeddings aligns perfectly with the Correct Answer. However, the second point about the Developer Console, Workbench, and API Reference documentation is not mentioned in the Correct Answer. While this additional information may be accurate, it goes beyond the scope of what was specified in the Correct Answer. Since the Generated Answer is missing one of the two ways explicitly mentioned in the Correct Answer (working with embeddings), and includes information not present in the Correct Answer, it cannot be considered fully correct.\n---\nThe Claude Cookbook can help developers learn to use Anthropic's APIs in two ways:\n\n1. Interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more. The Claude Cookbook provides these interactive notebooks to help developers learn through hands-on examples.\n\n2. The Claude Cookbook is part of Anthropic's suite of developer tools, which also includes the Developer Console with the Workbench and prompt generator tool, as well as the API Reference documentation. These tools enable easier, more powerful prompting and exploration of the Claude API and SDKs.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Next steps\n\nText\n Next steps\n\n\nNow that you have made your first Claude API request, it’s time to explore what else is possible:\nPrompt Engineering GuideOptimize Claude’s performance through prompting.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.Prompt LibraryExplore dozens of example prompts for inspiration across use cases.\nPrompt Engineering GuideOptimize Claude’s performance through prompting.\n\nPrompt Engineering Guide\nOptimize Claude’s performance through prompting.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nPrompt LibraryExplore dozens of example prompts for inspiration across use cases.\n\nPrompt Library\nExplore dozens of example prompts for inspiration across use cases.\nOverviewIntro to Claudexlinkedin\nOverviewIntro to Claude\nxlinkedin\nPrerequisites Start with the Workbench Install the SDK Set your API key Call the API Next steps\nPrerequisitesStart with the WorkbenchInstall the SDKSet your API keyCall the APINext steps\n \n\nSummary: \n The summary covers the next steps after making an initial Claude API request, including exploring the Prompt Engineering Guide to optimize Claude's performance, the Claude Cookbook for interactive Jupyter notebooks, and the Prompt Library for example prompts across use cases. It also mentions the overview and prerequisites for working with the Anthropic platform. \n \n\n \n Accessing the API\n\nText\n Accessing the API\n\n\nThe API is made available via our web Console. 
You can use the Workbench to try out the API in the browser and then generate API keys in Account Settings.\n \n\nSummary: \n The API can be accessed through Anthropic's web Console. Users can use the Workbench to try out the API in the browser and then generate API keys in the Account Settings. \n \n\n \n Develop with Claude\n\nText\n Develop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n\nSummary: \n Anthropic provides a suite of developer tools, including a browser-based Workbench and prompt generator, API reference documentation, and interactive Jupyter notebooks, to help developers build scalable applications with the Claude AI model. These tools enable easier, more powerful prompting, exploration and implementation of the Claude API and SDKs, and learning through interactive demonstrations. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. 
\n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "text": "The Generated Answer is partially correct but contains some information not present in the Correct Answer. The first point about interactive Jupyter notebooks demonstrating PDF uploads and embeddings aligns perfectly with the Correct Answer. However, the second point about the Developer Console, Workbench, and API Reference documentation is not mentioned in the Correct Answer. While this additional information may be accurate, it goes beyond the scope of what was specified in the Correct Answer. Since the Generated Answer is missing one of the two ways explicitly mentioned in the Correct Answer (working with embeddings), and includes information not present in the Correct Answer, it cannot be considered fully correct.\n---\nThe Claude Cookbooks can help developers learn to use Anthropic's APIs in two ways:\n\n1. Interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more. The Claude Cookbooks provides these interactive notebooks to help developers learn through hands-on examples.\n\n2. The Claude Cookbooks is part of Anthropic's suite of developer tools, which also includes the Developer Console with the Workbench and prompt generator tool, as well as the API Reference documentation. These tools enable easier, more powerful prompting and exploration of the Claude API and SDKs.",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways the Claude Cookbooks can help developers learn to use Anthropic's APIs?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Next steps\n\nText\n Next steps\n\n\nNow that you have made your first Claude API request, it’s time to explore what else is possible:\nPrompt Engineering GuideOptimize Claude’s performance through prompting.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.Prompt LibraryExplore dozens of example prompts for inspiration across use cases.\nPrompt Engineering GuideOptimize Claude’s performance through prompting.\n\nPrompt Engineering Guide\nOptimize Claude’s performance through prompting.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nPrompt LibraryExplore dozens of example prompts for inspiration across use cases.\n\nPrompt Library\nExplore dozens of example prompts for inspiration across use cases.\nOverviewIntro to Claudexlinkedin\nOverviewIntro to Claude\nxlinkedin\nPrerequisites Start with the Workbench Install the SDK Set your API key Call the API Next steps\nPrerequisitesStart with the WorkbenchInstall the SDKSet your API keyCall the APINext steps\n \n\nSummary: \n The summary covers the next steps after making an initial Claude API request, including exploring the Prompt Engineering Guide to optimize Claude's performance, the Claude Cookbooks for interactive Jupyter notebooks, and the Prompt Library for example prompts across use cases. It also mentions the overview and prerequisites for working with the Anthropic platform. 
\n \n\n \n Accessing the API\n\nText\n Accessing the API\n\n\nThe API is made available via our web Console. You can use the Workbench to try out the API in the browser and then generate API keys in Account Settings.\n \n\nSummary: \n The API can be accessed through Anthropic's web Console. Users can use the Workbench to try out the API in the browser and then generate API keys in the Account Settings. \n \n\n \n Develop with Claude\n\nText\n Develop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n\nSummary: \n Anthropic provides a suite of developer tools, including a browser-based Workbench and prompt generator, API reference documentation, and interactive Jupyter notebooks, to help developers build scalable applications with the Claude AI model. These tools enable easier, more powerful prompting, exploration and implementation of the Claude API and SDKs, and learning through interactive demonstrations. 
\n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "Haiku: T-0.0",
"latencyMs": 1319,
"tokenUsage": {
@@ -48910,8 +48910,8 @@
"pass": true,
"score": 1,
"namedScores": {},
- "text": "The Claude Cookbook can help developers learn to use Anthropic's APIs in two key ways:\n\n1. It provides interactive Jupyter notebooks that demonstrate various API functionalities, including uploading PDFs and working with embeddings.\n\n2. It offers hands-on learning experiences through these interactive notebooks, allowing developers to explore and experiment with the APIs in a guided environment.\n\nThese features enable developers to gain practical experience and understanding of Anthropic's APIs through interactive, example-based learning.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Next steps\n\nText\n Next steps\n\n\nNow that you have made your first Claude API request, it’s time to explore what else is possible:\nPrompt Engineering GuideOptimize Claude’s performance through prompting.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.Prompt LibraryExplore dozens of example prompts for inspiration across use cases.\nPrompt Engineering GuideOptimize Claude’s performance through prompting.\n\nPrompt Engineering Guide\nOptimize Claude’s performance through prompting.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nPrompt LibraryExplore dozens of example prompts for inspiration across use cases.\n\nPrompt Library\nExplore dozens of example prompts for inspiration across use cases.\nOverviewIntro to Claudexlinkedin\nOverviewIntro to Claude\nxlinkedin\nPrerequisites Start with the Workbench Install the SDK Set your API key Call the API Next steps\nPrerequisitesStart with the WorkbenchInstall the SDKSet your API keyCall the APINext steps\n \n\nSummary: \n The summary covers the next steps after making an initial Claude API request, including exploring the Prompt Engineering Guide to optimize Claude's performance, the Claude Cookbook for interactive Jupyter notebooks, and the Prompt Library for example prompts across use cases. It also mentions the overview and prerequisites for working with the Anthropic platform. \n \n\n \n Accessing the API\n\nText\n Accessing the API\n\n\nThe API is made available via our web Console. 
You can use the Workbench to try out the API in the browser and then generate API keys in Account Settings.\n \n\nSummary: \n The API can be accessed through Anthropic's web Console. Users can use the Workbench to try out the API in the browser and then generate API keys in the Account Settings. \n \n\n \n Develop with Claude\n\nText\n Develop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n\nSummary: \n Anthropic provides a suite of developer tools, including a browser-based Workbench and prompt generator, API reference documentation, and interactive Jupyter notebooks, to help developers build scalable applications with the Claude AI model. These tools enable easier, more powerful prompting, exploration and implementation of the Claude API and SDKs, and learning through interactive demonstrations. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. 
\n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "text": "The Claude Cookbooks can help developers learn to use Anthropic's APIs in two key ways:\n\n1. It provides interactive Jupyter notebooks that demonstrate various API functionalities, including uploading PDFs and working with embeddings.\n\n2. It offers hands-on learning experiences through these interactive notebooks, allowing developers to explore and experiment with the APIs in a guided environment.\n\nThese features enable developers to gain practical experience and understanding of Anthropic's APIs through interactive, example-based learning.",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways the Claude Cookbooks can help developers learn to use Anthropic's APIs?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Next steps\n\nText\n Next steps\n\n\nNow that you have made your first Claude API request, it’s time to explore what else is possible:\nPrompt Engineering GuideOptimize Claude’s performance through prompting.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.Prompt LibraryExplore dozens of example prompts for inspiration across use cases.\nPrompt Engineering GuideOptimize Claude’s performance through prompting.\n\nPrompt Engineering Guide\nOptimize Claude’s performance through prompting.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nPrompt LibraryExplore dozens of example prompts for inspiration across use cases.\n\nPrompt Library\nExplore dozens of example prompts for inspiration across use cases.\nOverviewIntro to Claudexlinkedin\nOverviewIntro to Claude\nxlinkedin\nPrerequisites Start with the Workbench Install the SDK Set your API key Call the API Next steps\nPrerequisitesStart with the WorkbenchInstall the SDKSet your API keyCall the APINext steps\n \n\nSummary: \n The summary covers the next steps after making an initial Claude API request, including exploring the Prompt Engineering Guide to optimize Claude's performance, the Claude Cookbooks for interactive Jupyter notebooks, and the Prompt Library for example prompts across use cases. It also mentions the overview and prerequisites for working with the Anthropic platform. 
\n \n\n \n Accessing the API\n\nText\n Accessing the API\n\n\nThe API is made available via our web Console. You can use the Workbench to try out the API in the browser and then generate API keys in Account Settings.\n \n\nSummary: \n The API can be accessed through Anthropic's web Console. Users can use the Workbench to try out the API in the browser and then generate API keys in the Account Settings. \n \n\n \n Develop with Claude\n\nText\n Develop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n\nSummary: \n Anthropic provides a suite of developer tools, including a browser-based Workbench and prompt generator, API reference documentation, and interactive Jupyter notebooks, to help developers build scalable applications with the Claude AI model. These tools enable easier, more powerful prompting, exploration and implementation of the Claude API and SDKs, and learning through interactive demonstrations. 
\n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "3.5 Sonnet: T-0.0",
"latencyMs": 2879,
"tokenUsage": {
@@ -48933,7 +48933,7 @@
{
"pass": true,
"score": 1,
- "reason": "The Generated Answer is correct based on the Correct Answer. Both answers highlight that the Claude Cookbook provides interactive Jupyter notebooks as a way to help developers learn to use Anthropic's APIs. They both specifically mention that these notebooks demonstrate how to upload PDFs and work with embeddings. The Generated Answer expands on this by emphasizing the hands-on learning experience, but this is consistent with and elaborates on the core information provided in the Correct Answer. There are no contradictions or critical missing pieces of information between the two answers.",
+ "reason": "The Generated Answer is correct based on the Correct Answer. Both answers highlight that the Claude Cookbooks provides interactive Jupyter notebooks as a way to help developers learn to use Anthropic's APIs. They both specifically mention that these notebooks demonstrate how to upload PDFs and work with embeddings. The Generated Answer expands on this by emphasizing the hands-on learning experience, but this is consistent with and elaborates on the core information provided in the Correct Answer. There are no contradictions or critical missing pieces of information between the two answers.",
"assertion": {
"type": "python",
"value": "file://eval_end_to_end.py"
@@ -48948,8 +48948,8 @@
"pass": false,
"score": 0,
"namedScores": {},
- "text": "The Generated Answer is partially correct but misses a key element from the Correct Answer. While both answers mention interactive Jupyter notebooks and working with PDFs, the Generated Answer does not specifically mention \"embeddings\" as stated in the Correct Answer. Instead, it talks about extending Claude's capabilities and using VoyageAI for text similarity and search tasks, which are not mentioned in the Correct Answer. The Generated Answer also includes information about summarizing PDF content, which is not present in the Correct Answer. Due to these discrepancies and the omission of the key concept of embeddings, the Generated Answer cannot be considered fully correct based on the given Correct Answer.\n---\nThe Claude Cookbook can help developers learn to use Anthropic's APIs in two ways:\n\n1. Providing interactive Jupyter notebooks that demonstrate how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\n2. Showing how to extend Claude's capabilities by integrating external tools and functions into workflows, as well as how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Claude Cookbook\n\nClaude Cookbook\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n \n \n\n \n Next steps\n\nNext steps\n\n\nNow that you have made your first Claude API request, it’s time to explore what else is possible:\nPrompt Engineering GuideOptimize Claude’s performance through prompting.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.Prompt LibraryExplore dozens of 
example prompts for inspiration across use cases.\nPrompt Engineering GuideOptimize Claude’s performance through prompting.\n\nPrompt Engineering Guide\nOptimize Claude’s performance through prompting.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nPrompt LibraryExplore dozens of example prompts for inspiration across use cases.\n\nPrompt Library\nExplore dozens of example prompts for inspiration across use cases.\nOverviewIntro to Claudexlinkedin\nOverviewIntro to Claude\nxlinkedin\nPrerequisites Start with the Workbench Install the SDK Set your API key Call the API Next steps\nPrerequisitesStart with the WorkbenchInstall the SDKSet your API keyCall the APINext steps\n \n \n\n \n Develop with Claude\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n \n\n \n Please remain faithful to the underlying context, and only 
deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "text": "The Generated Answer is partially correct but misses a key element from the Correct Answer. While both answers mention interactive Jupyter notebooks and working with PDFs, the Generated Answer does not specifically mention \"embeddings\" as stated in the Correct Answer. Instead, it talks about extending Claude's capabilities and using VoyageAI for text similarity and search tasks, which are not mentioned in the Correct Answer. The Generated Answer also includes information about summarizing PDF content, which is not present in the Correct Answer. Due to these discrepancies and the omission of the key concept of embeddings, the Generated Answer cannot be considered fully correct based on the given Correct Answer.\n---\nThe Claude Cookbooks can help developers learn to use Anthropic's APIs in two ways:\n\n1. Providing interactive Jupyter notebooks that demonstrate how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\n2. Showing how to extend Claude's capabilities by integrating external tools and functions into workflows, as well as how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways the Claude Cookbooks can help developers learn to use Anthropic's APIs?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Claude Cookbooks\n\nClaude Cookbooks\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n \n \n\n \n Next steps\n\nNext steps\n\n\nNow that you have made your first Claude API request, it’s time to explore what else is possible:\nPrompt Engineering GuideOptimize Claude’s performance through prompting.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.Prompt LibraryExplore dozens of 
example prompts for inspiration across use cases.\nPrompt Engineering GuideOptimize Claude’s performance through prompting.\n\nPrompt Engineering Guide\nOptimize Claude’s performance through prompting.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nPrompt LibraryExplore dozens of example prompts for inspiration across use cases.\n\nPrompt Library\nExplore dozens of example prompts for inspiration across use cases.\nOverviewIntro to Claudexlinkedin\nOverviewIntro to Claude\nxlinkedin\nPrerequisites Start with the Workbench Install the SDK Set your API key Call the API Next steps\nPrerequisitesStart with the WorkbenchInstall the SDKSet your API keyCall the APINext steps\n \n \n\n \n Develop with Claude\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n \n\n \n Please remain faithful to the underlying context, and 
only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "Haiku: T-0.0",
"latencyMs": 1263,
"tokenUsage": {
@@ -48986,8 +48986,8 @@
"pass": false,
"score": 0,
"namedScores": {},
- "text": "The Generated Answer is partially correct but misses a key element from the Correct Answer. Both answers agree that the Claude Cookbook provides interactive Jupyter notebooks to help developers learn to use Anthropic's APIs. However, the Generated Answer does not mention working with embeddings, which is a specific capability highlighted in the Correct Answer. Instead, it introduces information about extending Claude's capabilities and tool use, which is not mentioned in the Correct Answer. While the Generated Answer provides useful information, it doesn't fully align with the specific points outlined in the Correct Answer.\n---\nThe Claude Cookbook can help developers learn to use Anthropic's APIs in two main ways:\n\n1. It provides interactive Jupyter notebooks that demonstrate practical examples and hands-on tutorials. These notebooks cover topics like uploading PDFs and having Claude summarize their content, making it easier to work with long documents.\n\n2. It offers tutorials on extending Claude's capabilities by integrating external tools and functions into workflows. This allows developers to learn about tool use and function calling with Anthropic's APIs.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Claude Cookbook\n\nClaude Cookbook\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n \n \n\n \n Next steps\n\nNext steps\n\n\nNow that you have made your first Claude API request, it’s time to explore what else is possible:\nPrompt Engineering GuideOptimize Claude’s performance through prompting.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.Prompt LibraryExplore dozens of 
example prompts for inspiration across use cases.\nPrompt Engineering GuideOptimize Claude’s performance through prompting.\n\nPrompt Engineering Guide\nOptimize Claude’s performance through prompting.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nPrompt LibraryExplore dozens of example prompts for inspiration across use cases.\n\nPrompt Library\nExplore dozens of example prompts for inspiration across use cases.\nOverviewIntro to Claudexlinkedin\nOverviewIntro to Claude\nxlinkedin\nPrerequisites Start with the Workbench Install the SDK Set your API key Call the API Next steps\nPrerequisitesStart with the WorkbenchInstall the SDKSet your API keyCall the APINext steps\n \n \n\n \n Develop with Claude\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n \n\n \n Please remain faithful to the underlying context, and only 
deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "text": "The Generated Answer is partially correct but misses a key element from the Correct Answer. Both answers agree that the Claude Cookbooks provides interactive Jupyter notebooks to help developers learn to use Anthropic's APIs. However, the Generated Answer does not mention working with embeddings, which is a specific capability highlighted in the Correct Answer. Instead, it introduces information about extending Claude's capabilities and tool use, which is not mentioned in the Correct Answer. While the Generated Answer provides useful information, it doesn't fully align with the specific points outlined in the Correct Answer.\n---\nThe Claude Cookbooks can help developers learn to use Anthropic's APIs in two main ways:\n\n1. It provides interactive Jupyter notebooks that demonstrate practical examples and hands-on tutorials. These notebooks cover topics like uploading PDFs and having Claude summarize their content, making it easier to work with long documents.\n\n2. It offers tutorials on extending Claude's capabilities by integrating external tools and functions into workflows. This allows developers to learn about tool use and function calling with Anthropic's APIs.",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are two ways the Claude Cookbooks can help developers learn to use Anthropic's APIs?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Claude Cookbooks\n\nClaude Cookbooks\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n \n \n\n \n Next steps\n\nNext steps\n\n\nNow that you have made your first Claude API request, it’s time to explore what else is possible:\nPrompt Engineering GuideOptimize Claude’s performance through prompting.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.Prompt LibraryExplore dozens of 
example prompts for inspiration across use cases.\nPrompt Engineering GuideOptimize Claude’s performance through prompting.\n\nPrompt Engineering Guide\nOptimize Claude’s performance through prompting.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nPrompt LibraryExplore dozens of example prompts for inspiration across use cases.\n\nPrompt Library\nExplore dozens of example prompts for inspiration across use cases.\nOverviewIntro to Claudexlinkedin\nOverviewIntro to Claude\nxlinkedin\nPrerequisites Start with the Workbench Install the SDK Set your API key Call the API Next steps\nPrerequisitesStart with the WorkbenchInstall the SDKSet your API keyCall the APINext steps\n \n \n\n \n Develop with Claude\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n \n\n \n Please remain faithful to the underlying context, and 
only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "3.5 Sonnet: T-0.0",
"latencyMs": 2778,
"tokenUsage": {
@@ -48998,7 +48998,7 @@
"gradingResult": {
"pass": false,
"score": 0,
- "reason": "The Generated Answer is partially correct but misses a key element from the Correct Answer. Both answers agree that the Claude Cookbook provides interactive Jupyter notebooks to help developers learn to use Anthropic's APIs. However, the Generated Answer does not mention working with embeddings, which is a specific capability highlighted in the Correct Answer. Instead, it introduces information about extending Claude's capabilities and tool use, which is not mentioned in the Correct Answer. While the Generated Answer provides useful information, it doesn't fully align with the specific points outlined in the Correct Answer.",
+ "reason": "The Generated Answer is partially correct but misses a key element from the Correct Answer. Both answers agree that the Claude Cookbooks provides interactive Jupyter notebooks to help developers learn to use Anthropic's APIs. However, the Generated Answer does not mention working with embeddings, which is a specific capability highlighted in the Correct Answer. Instead, it introduces information about extending Claude's capabilities and tool use, which is not mentioned in the Correct Answer. While the Generated Answer provides useful information, it doesn't fully align with the specific points outlined in the Correct Answer.",
"namedScores": {},
"tokensUsed": {
"total": 0,
@@ -49009,7 +49009,7 @@
{
"pass": false,
"score": 0,
- "reason": "The Generated Answer is partially correct but misses a key element from the Correct Answer. Both answers agree that the Claude Cookbook provides interactive Jupyter notebooks to help developers learn to use Anthropic's APIs. However, the Generated Answer does not mention working with embeddings, which is a specific capability highlighted in the Correct Answer. Instead, it introduces information about extending Claude's capabilities and tool use, which is not mentioned in the Correct Answer. While the Generated Answer provides useful information, it doesn't fully align with the specific points outlined in the Correct Answer.",
+ "reason": "The Generated Answer is partially correct but misses a key element from the Correct Answer. Both answers agree that the Claude Cookbooks provides interactive Jupyter notebooks to help developers learn to use Anthropic's APIs. However, the Generated Answer does not mention working with embeddings, which is a specific capability highlighted in the Correct Answer. Instead, it introduces information about extending Claude's capabilities and tool use, which is not mentioned in the Correct Answer. While the Generated Answer provides useful information, it doesn't fully align with the specific points outlined in the Correct Answer.",
"assertion": {
"type": "python",
"value": "file://eval_end_to_end.py"
@@ -49023,8 +49023,8 @@
],
"test": {
"vars": {
- "query": "What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?",
- "correct_answer": "The Claude Cookbook provides interactive Jupyter notebooks that demonstrate how to upload PDFs and work with embeddings to help developers learn to use Anthropic's APIs."
+ "query": "What are two ways the Claude Cookbooks can help developers learn to use Anthropic's APIs?",
+ "correct_answer": "The Claude Cookbooks provides interactive Jupyter notebooks that demonstrate how to upload PDFs and work with embeddings to help developers learn to use Anthropic's APIs."
},
"assert": [
{
@@ -49036,8 +49036,8 @@
"description": "Row #76"
},
"vars": [
- "The Claude Cookbook provides interactive Jupyter notebooks that demonstrate how to upload PDFs and work with embeddings to help developers learn to use Anthropic's APIs.",
- "What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?"
+ "The Claude Cookbooks provides interactive Jupyter notebooks that demonstrate how to upload PDFs and work with embeddings to help developers learn to use Anthropic's APIs.",
+ "What are two ways the Claude Cookbooks can help developers learn to use Anthropic's APIs?"
]
},
{
@@ -49367,7 +49367,7 @@
"score": 1,
"namedScores": {},
"text": "The Evaluation tool in Anthropic's Claude platform can help improve prompts and build more robust AI applications in the following ways:\n\n1. Identifying edge cases: The Evaluation tool allows users to test their prompts across a range of inputs, helping them identify edge cases where the prompt might falter or perform poorly.\n\n2. Rating individual results: Users can rate individual results from the Evaluation tool, which helps them determine cases where their prompt performs better or worse, enabling them to make informed adjustments.\n\n3. Ensuring consistent performance: By reviewing results across multiple test cases, users can identify patterns and ensure their prompt maintains consistent performance, rather than having unpredictable or inconsistent outputs.\n\n4. Refining prompts for better reliability: The insights gained from the Evaluation tool can help users refine their prompts, making them more reliable and robust for a wider range of inputs and use cases.\n\n5. Providing valuable feedback: The Evaluation tool is currently in beta, and Anthropic encourages users to provide feedback on any issues or suggestions they have, which can help the company improve the tool and support the development of more robust AI applications.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n How can the Evaluation tool in Anthropic's Claude platform help improve prompts and build more robust AI applications?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Understanding Results\n\nText\n Understanding Results\n\n\nThe Evaluation tool helps you:\nIdentify edge cases where your prompt might falter\nRate individual results to determine cases where your prompt performance better or worse\nEnsure consistent performance across a range of inputs\nRefine your prompt for better reliability\nBy reviewing results across test cases, you can spot patterns and make informed adjustments to your prompt.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\n\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nStart evaluating your prompts today to build more robust AI applications with Claude!\nReducing latencyGlossaryxlinkedin\nReducing latencyGlossary\nxlinkedin\nAccessing the Evaluate Feature Creating Test Cases Tips for Effective Evaluation Understanding Results\nAccessing the Evaluate FeatureCreating Test CasesTips for Effective EvaluationUnderstanding Results\n \n\nSummary: \n The Evaluation tool helps users identify edge cases, rate individual results, ensure consistent performance, and refine prompts for better reliability. 
By reviewing results across test cases, users can spot patterns and make informed adjustments to their prompts. The Evaluation tool is currently in beta, and user feedback is valuable for the Anthropic team. \n \n\n \n Prompt engineering tutorial\n\nText\n Prompt engineering tutorial\n\n\nIf you’re an interactive learner, you can dive into our interactive tutorials instead!\nGitHub prompting tutorialAn example-filled tutorial that covers the prompt engineering concepts found in our docs.Google Sheets prompting tutorialA lighter weight version of our prompt engineering tutorial via an interactive spreadsheet.\nGitHub prompting tutorialAn example-filled tutorial that covers the prompt engineering concepts found in our docs.\n\nGitHub prompting tutorial\nAn example-filled tutorial that covers the prompt engineering concepts found in our docs.\nGoogle Sheets prompting tutorialA lighter weight version of our prompt engineering tutorial via an interactive spreadsheet.\n\nGoogle Sheets prompting tutorial\nA lighter weight version of our prompt engineering tutorial via an interactive spreadsheet.\nDevelop test casesPrompt generatorxlinkedin\nDevelop test casesPrompt generator\nxlinkedin\nBefore prompt engineering When to prompt engineer How to prompt engineer Prompt engineering tutorial\nBefore prompt engineeringWhen to prompt engineerHow to prompt engineerPrompt engineering tutorial\n \n\nSummary: \n Anthropic's documentation includes a prompt engineering tutorial, which is available in two formats: a GitHub-based tutorial with examples, and a lighter-weight version in a Google Sheets spreadsheet. These tutorials cover the concepts and techniques of prompt engineering for Anthropic's Claude AI model. \n \n\n \n More Resources\n\nText\n More Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. 
Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbook More Resources\nText capabilities and use casesClaude CookbookMore Resources\n \n\nSummary: \n The Claude Documentation provides a Prompt Engineering Guide to help users master the art of prompt crafting, a Prompt Library with pre-crafted prompts for various tasks, and API Documentation for interacting with the Claude AI model. These resources are designed to help users get the most out of the Claude model, particularly for fine-tuning with legacy models. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n How can the Evaluation tool in Anthropic's Claude platform help improve prompts and build more robust AI applications?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Understanding Results\n\nText\n Understanding Results\n\n\nThe Evaluation tool helps you:\nIdentify edge cases where your prompt might falter\nRate individual results to determine cases where your prompt performance better or worse\nEnsure consistent performance across a range of inputs\nRefine your prompt for better reliability\nBy reviewing results across test cases, you can spot patterns and make informed adjustments to your prompt.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\n\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nStart evaluating your prompts today to build more robust AI applications with Claude!\nReducing latencyGlossaryxlinkedin\nReducing latencyGlossary\nxlinkedin\nAccessing the Evaluate Feature Creating Test Cases Tips for Effective Evaluation Understanding Results\nAccessing the Evaluate FeatureCreating Test CasesTips for Effective EvaluationUnderstanding Results\n \n\nSummary: \n The Evaluation tool helps users identify edge cases, rate individual results, ensure consistent performance, and refine prompts for better reliability. 
By reviewing results across test cases, users can spot patterns and make informed adjustments to their prompts. The Evaluation tool is currently in beta, and user feedback is valuable for the Anthropic team. \n \n\n \n Prompt engineering tutorial\n\nText\n Prompt engineering tutorial\n\n\nIf you’re an interactive learner, you can dive into our interactive tutorials instead!\nGitHub prompting tutorialAn example-filled tutorial that covers the prompt engineering concepts found in our docs.Google Sheets prompting tutorialA lighter weight version of our prompt engineering tutorial via an interactive spreadsheet.\nGitHub prompting tutorialAn example-filled tutorial that covers the prompt engineering concepts found in our docs.\n\nGitHub prompting tutorial\nAn example-filled tutorial that covers the prompt engineering concepts found in our docs.\nGoogle Sheets prompting tutorialA lighter weight version of our prompt engineering tutorial via an interactive spreadsheet.\n\nGoogle Sheets prompting tutorial\nA lighter weight version of our prompt engineering tutorial via an interactive spreadsheet.\nDevelop test casesPrompt generatorxlinkedin\nDevelop test casesPrompt generator\nxlinkedin\nBefore prompt engineering When to prompt engineer How to prompt engineer Prompt engineering tutorial\nBefore prompt engineeringWhen to prompt engineerHow to prompt engineerPrompt engineering tutorial\n \n\nSummary: \n Anthropic's documentation includes a prompt engineering tutorial, which is available in two formats: a GitHub-based tutorial with examples, and a lighter-weight version in a Google Sheets spreadsheet. These tutorials cover the concepts and techniques of prompt engineering for Anthropic's Claude AI model. \n \n\n \n More Resources\n\nText\n More Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. 
Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbooks More Resources\nText capabilities and use casesClaude CookbooksMore Resources\n \n\nSummary: \n The Claude Documentation provides a Prompt Engineering Guide to help users master the art of prompt crafting, a Prompt Library with pre-crafted prompts for various tasks, and API Documentation for interacting with the Claude AI model. These resources are designed to help users get the most out of the Claude model, particularly for fine-tuning with legacy models. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "Haiku: T-0.0",
"latencyMs": 2556,
"tokenUsage": {
@@ -49405,7 +49405,7 @@
"score": 1,
"namedScores": {},
"text": "The Evaluation tool in Anthropic's Claude platform can help improve prompts and build more robust AI applications in several key ways:\n\n1. Identifying edge cases: It helps users find scenarios where their prompts might not perform as expected, allowing them to address potential weaknesses.\n\n2. Rating individual results: Users can assess the performance of their prompts on specific inputs, pinpointing areas of strength and weakness.\n\n3. Ensuring consistent performance: By testing prompts across a range of inputs, users can verify that their AI applications perform reliably in various scenarios.\n\n4. Refining prompts: The tool enables users to make informed adjustments to their prompts based on the patterns and insights gained from reviewing test results.\n\n5. Spotting patterns: By analyzing results across multiple test cases, users can identify trends in prompt performance and make targeted improvements.\n\nThese capabilities allow developers to iteratively improve their prompts, leading to more reliable and robust AI applications. The Evaluation tool is currently in beta, and Anthropic encourages users to provide feedback to further enhance its functionality.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n How can the Evaluation tool in Anthropic's Claude platform help improve prompts and build more robust AI applications?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Understanding Results\n\nText\n Understanding Results\n\n\nThe Evaluation tool helps you:\nIdentify edge cases where your prompt might falter\nRate individual results to determine cases where your prompt performance better or worse\nEnsure consistent performance across a range of inputs\nRefine your prompt for better reliability\nBy reviewing results across test cases, you can spot patterns and make informed adjustments to your prompt.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\n\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nStart evaluating your prompts today to build more robust AI applications with Claude!\nReducing latencyGlossaryxlinkedin\nReducing latencyGlossary\nxlinkedin\nAccessing the Evaluate Feature Creating Test Cases Tips for Effective Evaluation Understanding Results\nAccessing the Evaluate FeatureCreating Test CasesTips for Effective EvaluationUnderstanding Results\n \n\nSummary: \n The Evaluation tool helps users identify edge cases, rate individual results, ensure consistent performance, and refine prompts for better reliability. 
By reviewing results across test cases, users can spot patterns and make informed adjustments to their prompts. The Evaluation tool is currently in beta, and user feedback is valuable for the Anthropic team. \n \n\n \n Prompt engineering tutorial\n\nText\n Prompt engineering tutorial\n\n\nIf you’re an interactive learner, you can dive into our interactive tutorials instead!\nGitHub prompting tutorialAn example-filled tutorial that covers the prompt engineering concepts found in our docs.Google Sheets prompting tutorialA lighter weight version of our prompt engineering tutorial via an interactive spreadsheet.\nGitHub prompting tutorialAn example-filled tutorial that covers the prompt engineering concepts found in our docs.\n\nGitHub prompting tutorial\nAn example-filled tutorial that covers the prompt engineering concepts found in our docs.\nGoogle Sheets prompting tutorialA lighter weight version of our prompt engineering tutorial via an interactive spreadsheet.\n\nGoogle Sheets prompting tutorial\nA lighter weight version of our prompt engineering tutorial via an interactive spreadsheet.\nDevelop test casesPrompt generatorxlinkedin\nDevelop test casesPrompt generator\nxlinkedin\nBefore prompt engineering When to prompt engineer How to prompt engineer Prompt engineering tutorial\nBefore prompt engineeringWhen to prompt engineerHow to prompt engineerPrompt engineering tutorial\n \n\nSummary: \n Anthropic's documentation includes a prompt engineering tutorial, which is available in two formats: a GitHub-based tutorial with examples, and a lighter-weight version in a Google Sheets spreadsheet. These tutorials cover the concepts and techniques of prompt engineering for Anthropic's Claude AI model. \n \n\n \n More Resources\n\nText\n More Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. 
Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbook More Resources\nText capabilities and use casesClaude CookbookMore Resources\n \n\nSummary: \n The Claude Documentation provides a Prompt Engineering Guide to help users master the art of prompt crafting, a Prompt Library with pre-crafted prompts for various tasks, and API Documentation for interacting with the Claude AI model. These resources are designed to help users get the most out of the Claude model, particularly for fine-tuning with legacy models. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n How can the Evaluation tool in Anthropic's Claude platform help improve prompts and build more robust AI applications?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Understanding Results\n\nText\n Understanding Results\n\n\nThe Evaluation tool helps you:\nIdentify edge cases where your prompt might falter\nRate individual results to determine cases where your prompt performance better or worse\nEnsure consistent performance across a range of inputs\nRefine your prompt for better reliability\nBy reviewing results across test cases, you can spot patterns and make informed adjustments to your prompt.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\n\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nStart evaluating your prompts today to build more robust AI applications with Claude!\nReducing latencyGlossaryxlinkedin\nReducing latencyGlossary\nxlinkedin\nAccessing the Evaluate Feature Creating Test Cases Tips for Effective Evaluation Understanding Results\nAccessing the Evaluate FeatureCreating Test CasesTips for Effective EvaluationUnderstanding Results\n \n\nSummary: \n The Evaluation tool helps users identify edge cases, rate individual results, ensure consistent performance, and refine prompts for better reliability. 
By reviewing results across test cases, users can spot patterns and make informed adjustments to their prompts. The Evaluation tool is currently in beta, and user feedback is valuable for the Anthropic team. \n \n\n \n Prompt engineering tutorial\n\nText\n Prompt engineering tutorial\n\n\nIf you’re an interactive learner, you can dive into our interactive tutorials instead!\nGitHub prompting tutorialAn example-filled tutorial that covers the prompt engineering concepts found in our docs.Google Sheets prompting tutorialA lighter weight version of our prompt engineering tutorial via an interactive spreadsheet.\nGitHub prompting tutorialAn example-filled tutorial that covers the prompt engineering concepts found in our docs.\n\nGitHub prompting tutorial\nAn example-filled tutorial that covers the prompt engineering concepts found in our docs.\nGoogle Sheets prompting tutorialA lighter weight version of our prompt engineering tutorial via an interactive spreadsheet.\n\nGoogle Sheets prompting tutorial\nA lighter weight version of our prompt engineering tutorial via an interactive spreadsheet.\nDevelop test casesPrompt generatorxlinkedin\nDevelop test casesPrompt generator\nxlinkedin\nBefore prompt engineering When to prompt engineer How to prompt engineer Prompt engineering tutorial\nBefore prompt engineeringWhen to prompt engineerHow to prompt engineerPrompt engineering tutorial\n \n\nSummary: \n Anthropic's documentation includes a prompt engineering tutorial, which is available in two formats: a GitHub-based tutorial with examples, and a lighter-weight version in a Google Sheets spreadsheet. These tutorials cover the concepts and techniques of prompt engineering for Anthropic's Claude AI model. \n \n\n \n More Resources\n\nText\n More Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. 
Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbooks More Resources\nText capabilities and use casesClaude CookbooksMore Resources\n \n\nSummary: \n The Claude Documentation provides a Prompt Engineering Guide to help users master the art of prompt crafting, a Prompt Library with pre-crafted prompts for various tasks, and API Documentation for interacting with the Claude AI model. These resources are designed to help users get the most out of the Claude model, particularly for fine-tuning with legacy models. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "3.5 Sonnet: T-0.0",
"latencyMs": 6321,
"tokenUsage": {
@@ -52012,7 +52012,7 @@
"score": 1,
"namedScores": {},
"text": "On May 10th, 2024, Anthropic introduced a new Prompt Generator tool in the Developer Console. This tool enables users to create and tailor prompts for specific tasks more easily. The Prompt Generator provides guidance and suggestions to help users craft high-quality prompts that can optimize Claude's performance for their particular use cases.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What new capabilities and features were introduced by Anthropic on May 10th, 2024 and how do they enable users to create and tailor prompts for specific tasks?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nMore Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. 
Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbook More Resources\nText capabilities and use casesClaude CookbookMore Resources\n\n\nNext steps\n\n\nNow that you have made your first Claude API request, it’s time to explore what else is possible:\nPrompt Engineering GuideOptimize Claude’s performance through prompting.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.Prompt LibraryExplore dozens of example prompts for inspiration across use cases.\nPrompt Engineering GuideOptimize Claude’s performance through prompting.\n\nPrompt Engineering Guide\nOptimize Claude’s performance through prompting.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nPrompt LibraryExplore dozens of example prompts for inspiration across use cases.\n\nPrompt Library\nExplore dozens of example prompts for inspiration across use cases.\nOverviewIntro to Claudexlinkedin\nOverviewIntro to Claude\nxlinkedin\nPrerequisites Start with the Workbench Install the SDK Set your API key Call the API Next steps\nPrerequisitesStart with the WorkbenchInstall the SDKSet your API keyCall the APINext steps\n\n\nMay 10th, 2024\n\n\nOur prompt generator tool is now available in the Developer Console. Prompt Generator makes it easy to guide Claude to generate a high-quality prompts tailored to your specific tasks. 
Read more in our blog post.\nOverviewClaude Appsxlinkedin\nOverviewClaude Apps\nxlinkedin\nJune 27th, 2024 June 20th, 2024 May 30th, 2024 May 10th, 2024\nJune 27th, 2024June 20th, 2024May 30th, 2024May 10th, 2024\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What new capabilities and features were introduced by Anthropic on May 10th, 2024 and how do they enable users to create and tailor prompts for specific tasks?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nMore Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. 
Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbooks More Resources\nText capabilities and use casesClaude CookbooksMore Resources\n\n\nNext steps\n\n\nNow that you have made your first Claude API request, it’s time to explore what else is possible:\nPrompt Engineering GuideOptimize Claude’s performance through prompting.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.Prompt LibraryExplore dozens of example prompts for inspiration across use cases.\nPrompt Engineering GuideOptimize Claude’s performance through prompting.\n\nPrompt Engineering Guide\nOptimize Claude’s performance through prompting.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nPrompt LibraryExplore dozens of example prompts for inspiration across use cases.\n\nPrompt Library\nExplore dozens of example prompts for inspiration across use cases.\nOverviewIntro to Claudexlinkedin\nOverviewIntro to Claude\nxlinkedin\nPrerequisites Start with the Workbench Install the SDK Set your API key Call the API Next steps\nPrerequisitesStart with the WorkbenchInstall the SDKSet your API keyCall the APINext steps\n\n\nMay 10th, 2024\n\n\nOur prompt generator tool is now available in the Developer Console. Prompt Generator makes it easy to guide Claude to generate a high-quality prompts tailored to your specific tasks. 
Read more in our blog post.\nOverviewClaude Appsxlinkedin\nOverviewClaude Apps\nxlinkedin\nJune 27th, 2024 June 20th, 2024 May 30th, 2024 May 10th, 2024\nJune 27th, 2024June 20th, 2024May 30th, 2024May 10th, 2024\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "Haiku: T-0.0",
"latencyMs": 1,
"tokenUsage": {},
@@ -52046,7 +52046,7 @@
"score": 1,
"namedScores": {},
"text": "On May 10th, 2024, Anthropic introduced a new prompt generator tool in their Developer Console. This tool, called Prompt Generator, is designed to help users create high-quality prompts tailored to specific tasks. The Prompt Generator makes it easier for users to guide Claude in generating prompts that are optimized for their particular needs. This new capability enables users to more effectively craft prompts for specific use cases, potentially improving the quality and relevance of Claude's outputs for their intended tasks. Anthropic provided more details about this new feature in a blog post, though the specific contents of that post are not included in the given context.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What new capabilities and features were introduced by Anthropic on May 10th, 2024 and how do they enable users to create and tailor prompts for specific tasks?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nMore Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. 
Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbook More Resources\nText capabilities and use casesClaude CookbookMore Resources\n\n\nNext steps\n\n\nNow that you have made your first Claude API request, it’s time to explore what else is possible:\nPrompt Engineering GuideOptimize Claude’s performance through prompting.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.Prompt LibraryExplore dozens of example prompts for inspiration across use cases.\nPrompt Engineering GuideOptimize Claude’s performance through prompting.\n\nPrompt Engineering Guide\nOptimize Claude’s performance through prompting.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nPrompt LibraryExplore dozens of example prompts for inspiration across use cases.\n\nPrompt Library\nExplore dozens of example prompts for inspiration across use cases.\nOverviewIntro to Claudexlinkedin\nOverviewIntro to Claude\nxlinkedin\nPrerequisites Start with the Workbench Install the SDK Set your API key Call the API Next steps\nPrerequisitesStart with the WorkbenchInstall the SDKSet your API keyCall the APINext steps\n\n\nMay 10th, 2024\n\n\nOur prompt generator tool is now available in the Developer Console. Prompt Generator makes it easy to guide Claude to generate a high-quality prompts tailored to your specific tasks. 
Read more in our blog post.\nOverviewClaude Appsxlinkedin\nOverviewClaude Apps\nxlinkedin\nJune 27th, 2024 June 20th, 2024 May 30th, 2024 May 10th, 2024\nJune 27th, 2024June 20th, 2024May 30th, 2024May 10th, 2024\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What new capabilities and features were introduced by Anthropic on May 10th, 2024 and how do they enable users to create and tailor prompts for specific tasks?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nMore Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. 
Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbooks More Resources\nText capabilities and use casesClaude CookbooksMore Resources\n\n\nNext steps\n\n\nNow that you have made your first Claude API request, it’s time to explore what else is possible:\nPrompt Engineering GuideOptimize Claude’s performance through prompting.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.Prompt LibraryExplore dozens of example prompts for inspiration across use cases.\nPrompt Engineering GuideOptimize Claude’s performance through prompting.\n\nPrompt Engineering Guide\nOptimize Claude’s performance through prompting.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nPrompt LibraryExplore dozens of example prompts for inspiration across use cases.\n\nPrompt Library\nExplore dozens of example prompts for inspiration across use cases.\nOverviewIntro to Claudexlinkedin\nOverviewIntro to Claude\nxlinkedin\nPrerequisites Start with the Workbench Install the SDK Set your API key Call the API Next steps\nPrerequisitesStart with the WorkbenchInstall the SDKSet your API keyCall the APINext steps\n\n\nMay 10th, 2024\n\n\nOur prompt generator tool is now available in the Developer Console. Prompt Generator makes it easy to guide Claude to generate a high-quality prompts tailored to your specific tasks. 
Read more in our blog post.\nOverviewClaude Appsxlinkedin\nOverviewClaude Apps\nxlinkedin\nJune 27th, 2024 June 20th, 2024 May 30th, 2024 May 10th, 2024\nJune 27th, 2024June 20th, 2024May 30th, 2024May 10th, 2024\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "3.5 Sonnet: T-0.0",
"latencyMs": 3814,
"tokenUsage": {
@@ -52084,7 +52084,7 @@
"score": 1,
"namedScores": {},
"text": "On May 10th, 2024, Anthropic introduced a new Prompt Generator tool in the Developer Console. This tool is designed to help users create high-quality prompts tailored to their specific tasks. The Prompt Generator makes it easier for users to guide the Claude AI model to generate prompts that are well-suited for their particular needs.\n\nIn addition to the Prompt Generator, Anthropic also provides other resources to help users work with the Claude model, including:\n\n1. Prompt Engineering Guide: This guide helps users master the art of prompt crafting to get the most out of the Claude model, particularly for fine-tuning with legacy models.\n\n2. Prompt Library: This library contains a wide range of pre-crafted prompts for various tasks and industries, which can be used for inspiration or as a starting point.\n\n3. API Documentation: This documentation provides everything users need to interact with the Claude model via the Claude API, including request formats, response handling, and troubleshooting.\n\nThese new capabilities and features introduced by Anthropic on May 10th, 2024 are designed to empower users to create and tailor prompts for specific tasks, ultimately enhancing their ability to leverage the Claude AI model effectively.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What new capabilities and features were introduced by Anthropic on May 10th, 2024 and how do they enable users to create and tailor prompts for specific tasks?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n May 10th, 2024\n\nText\n May 10th, 2024\n\n\nOur prompt generator tool is now available in the Developer Console. Prompt Generator makes it easy to guide Claude to generate a high-quality prompts tailored to your specific tasks. Read more in our blog post.\nOverviewClaude Appsxlinkedin\nOverviewClaude Apps\nxlinkedin\nJune 27th, 2024 June 20th, 2024 May 30th, 2024 May 10th, 2024\nJune 27th, 2024June 20th, 2024May 30th, 2024May 10th, 2024\n \n\nSummary: \n Anthropic has released a Prompt Generator tool in the Developer Console, which helps users create high-quality prompts tailored to their specific tasks. The tool is discussed in a recent blog post, and is part of Anthropic's suite of Claude AI model-related products and services. \n \n\n \n More Resources\n\nText\n More Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. 
Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbook More Resources\nText capabilities and use casesClaude CookbookMore Resources\n \n\nSummary: \n The Claude Documentation provides a Prompt Engineering Guide to help users master the art of prompt crafting, a Prompt Library with pre-crafted prompts for various tasks, and API Documentation for interacting with the Claude AI model. These resources are designed to help users get the most out of the Claude model, particularly for fine-tuning with legacy models. 
\n \n\n \n Next steps\n\nText\n Next steps\n\n\nNow that you have made your first Claude API request, it’s time to explore what else is possible:\nPrompt Engineering GuideOptimize Claude’s performance through prompting.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.Prompt LibraryExplore dozens of example prompts for inspiration across use cases.\nPrompt Engineering GuideOptimize Claude’s performance through prompting.\n\nPrompt Engineering Guide\nOptimize Claude’s performance through prompting.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nPrompt LibraryExplore dozens of example prompts for inspiration across use cases.\n\nPrompt Library\nExplore dozens of example prompts for inspiration across use cases.\nOverviewIntro to Claudexlinkedin\nOverviewIntro to Claude\nxlinkedin\nPrerequisites Start with the Workbench Install the SDK Set your API key Call the API Next steps\nPrerequisitesStart with the WorkbenchInstall the SDKSet your API keyCall the APINext steps\n \n\nSummary: \n The summary covers the next steps after making an initial Claude API request, including exploring the Prompt Engineering Guide to optimize Claude's performance, the Claude Cookbook for interactive Jupyter notebooks, and the Prompt Library for example prompts across use cases. It also mentions the overview and prerequisites for working with the Anthropic platform. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What new capabilities and features were introduced by Anthropic on May 10th, 2024 and how do they enable users to create and tailor prompts for specific tasks?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n May 10th, 2024\n\nText\n May 10th, 2024\n\n\nOur prompt generator tool is now available in the Developer Console. Prompt Generator makes it easy to guide Claude to generate a high-quality prompts tailored to your specific tasks. Read more in our blog post.\nOverviewClaude Appsxlinkedin\nOverviewClaude Apps\nxlinkedin\nJune 27th, 2024 June 20th, 2024 May 30th, 2024 May 10th, 2024\nJune 27th, 2024June 20th, 2024May 30th, 2024May 10th, 2024\n \n\nSummary: \n Anthropic has released a Prompt Generator tool in the Developer Console, which helps users create high-quality prompts tailored to their specific tasks. The tool is discussed in a recent blog post, and is part of Anthropic's suite of Claude AI model-related products and services. \n \n\n \n More Resources\n\nText\n More Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. 
Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbooks More Resources\nText capabilities and use casesClaude CookbooksMore Resources\n \n\nSummary: \n The Claude Documentation provides a Prompt Engineering Guide to help users master the art of prompt crafting, a Prompt Library with pre-crafted prompts for various tasks, and API Documentation for interacting with the Claude AI model. These resources are designed to help users get the most out of the Claude model, particularly for fine-tuning with legacy models. 
\n \n\n \n Next steps\n\nText\n Next steps\n\n\nNow that you have made your first Claude API request, it’s time to explore what else is possible:\nPrompt Engineering GuideOptimize Claude’s performance through prompting.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.Prompt LibraryExplore dozens of example prompts for inspiration across use cases.\nPrompt Engineering GuideOptimize Claude’s performance through prompting.\n\nPrompt Engineering Guide\nOptimize Claude’s performance through prompting.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nPrompt LibraryExplore dozens of example prompts for inspiration across use cases.\n\nPrompt Library\nExplore dozens of example prompts for inspiration across use cases.\nOverviewIntro to Claudexlinkedin\nOverviewIntro to Claude\nxlinkedin\nPrerequisites Start with the Workbench Install the SDK Set your API key Call the API Next steps\nPrerequisitesStart with the WorkbenchInstall the SDKSet your API keyCall the APINext steps\n \n\nSummary: \n The summary covers the next steps after making an initial Claude API request, including exploring the Prompt Engineering Guide to optimize Claude's performance, the Claude Cookbooks for interactive Jupyter notebooks, and the Prompt Library for example prompts across use cases. It also mentions the overview and prerequisites for working with the Anthropic platform. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "Haiku: T-0.0",
"latencyMs": 2235,
"tokenUsage": {
@@ -52122,7 +52122,7 @@
"score": 1,
"namedScores": {},
"text": "Based on the information provided in the documents, on May 10th, 2024, Anthropic introduced a new Prompt Generator tool in their Developer Console. This tool is designed to help users create high-quality prompts tailored to their specific tasks when working with Claude, Anthropic's AI model.\n\nThe Prompt Generator enables users to more easily craft effective prompts by:\n\n1. Guiding users through the process of creating prompts specifically suited to their tasks.\n2. Helping to optimize Claude's performance through better prompt engineering.\n3. Potentially drawing from or integrating with Anthropic's existing Prompt Library, which contains pre-crafted prompts for various tasks and industries.\n\nWhile the exact details of how the Prompt Generator works are not provided in the given context, it appears to be a significant addition to Anthropic's suite of tools for working with Claude. This tool likely complements existing resources such as the Prompt Engineering Guide and Prompt Library, making it easier for users to create effective, task-specific prompts without needing extensive expertise in prompt engineering.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What new capabilities and features were introduced by Anthropic on May 10th, 2024 and how do they enable users to create and tailor prompts for specific tasks?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n May 10th, 2024\n\nText\n May 10th, 2024\n\n\nOur prompt generator tool is now available in the Developer Console. Prompt Generator makes it easy to guide Claude to generate a high-quality prompts tailored to your specific tasks. Read more in our blog post.\nOverviewClaude Appsxlinkedin\nOverviewClaude Apps\nxlinkedin\nJune 27th, 2024 June 20th, 2024 May 30th, 2024 May 10th, 2024\nJune 27th, 2024June 20th, 2024May 30th, 2024May 10th, 2024\n \n\nSummary: \n Anthropic has released a Prompt Generator tool in the Developer Console, which helps users create high-quality prompts tailored to their specific tasks. The tool is discussed in a recent blog post, and is part of Anthropic's suite of Claude AI model-related products and services. \n \n\n \n More Resources\n\nText\n More Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. 
Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbook More Resources\nText capabilities and use casesClaude CookbookMore Resources\n \n\nSummary: \n The Claude Documentation provides a Prompt Engineering Guide to help users master the art of prompt crafting, a Prompt Library with pre-crafted prompts for various tasks, and API Documentation for interacting with the Claude AI model. These resources are designed to help users get the most out of the Claude model, particularly for fine-tuning with legacy models. 
\n \n\n \n Next steps\n\nText\n Next steps\n\n\nNow that you have made your first Claude API request, it’s time to explore what else is possible:\nPrompt Engineering GuideOptimize Claude’s performance through prompting.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.Prompt LibraryExplore dozens of example prompts for inspiration across use cases.\nPrompt Engineering GuideOptimize Claude’s performance through prompting.\n\nPrompt Engineering Guide\nOptimize Claude’s performance through prompting.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nPrompt LibraryExplore dozens of example prompts for inspiration across use cases.\n\nPrompt Library\nExplore dozens of example prompts for inspiration across use cases.\nOverviewIntro to Claudexlinkedin\nOverviewIntro to Claude\nxlinkedin\nPrerequisites Start with the Workbench Install the SDK Set your API key Call the API Next steps\nPrerequisitesStart with the WorkbenchInstall the SDKSet your API keyCall the APINext steps\n \n\nSummary: \n The summary covers the next steps after making an initial Claude API request, including exploring the Prompt Engineering Guide to optimize Claude's performance, the Claude Cookbook for interactive Jupyter notebooks, and the Prompt Library for example prompts across use cases. It also mentions the overview and prerequisites for working with the Anthropic platform. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What new capabilities and features were introduced by Anthropic on May 10th, 2024 and how do they enable users to create and tailor prompts for specific tasks?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n May 10th, 2024\n\nText\n May 10th, 2024\n\n\nOur prompt generator tool is now available in the Developer Console. Prompt Generator makes it easy to guide Claude to generate a high-quality prompts tailored to your specific tasks. Read more in our blog post.\nOverviewClaude Appsxlinkedin\nOverviewClaude Apps\nxlinkedin\nJune 27th, 2024 June 20th, 2024 May 30th, 2024 May 10th, 2024\nJune 27th, 2024June 20th, 2024May 30th, 2024May 10th, 2024\n \n\nSummary: \n Anthropic has released a Prompt Generator tool in the Developer Console, which helps users create high-quality prompts tailored to their specific tasks. The tool is discussed in a recent blog post, and is part of Anthropic's suite of Claude AI model-related products and services. \n \n\n \n More Resources\n\nText\n More Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. 
Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbooks More Resources\nText capabilities and use casesClaude CookbooksMore Resources\n \n\nSummary: \n The Claude Documentation provides a Prompt Engineering Guide to help users master the art of prompt crafting, a Prompt Library with pre-crafted prompts for various tasks, and API Documentation for interacting with the Claude AI model. These resources are designed to help users get the most out of the Claude model, particularly for fine-tuning with legacy models. 
\n \n\n \n Next steps\n\nText\n Next steps\n\n\nNow that you have made your first Claude API request, it’s time to explore what else is possible:\nPrompt Engineering GuideOptimize Claude’s performance through prompting.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.Prompt LibraryExplore dozens of example prompts for inspiration across use cases.\nPrompt Engineering GuideOptimize Claude’s performance through prompting.\n\nPrompt Engineering Guide\nOptimize Claude’s performance through prompting.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nPrompt LibraryExplore dozens of example prompts for inspiration across use cases.\n\nPrompt Library\nExplore dozens of example prompts for inspiration across use cases.\nOverviewIntro to Claudexlinkedin\nOverviewIntro to Claude\nxlinkedin\nPrerequisites Start with the Workbench Install the SDK Set your API key Call the API Next steps\nPrerequisitesStart with the WorkbenchInstall the SDKSet your API keyCall the APINext steps\n \n\nSummary: \n The summary covers the next steps after making an initial Claude API request, including exploring the Prompt Engineering Guide to optimize Claude's performance, the Claude Cookbooks for interactive Jupyter notebooks, and the Prompt Library for example prompts across use cases. It also mentions the overview and prerequisites for working with the Anthropic platform. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "3.5 Sonnet: T-0.0",
"latencyMs": 4019,
"tokenUsage": {
@@ -52160,7 +52160,7 @@
"score": 1,
"namedScores": {},
"text": "On May 10th, 2024, Anthropic introduced a new Prompt Generator tool in the Developer Console. This tool enables users to more easily create and tailor prompts for specific tasks. The Prompt Generator allows users to guide Claude, Anthropic's AI assistant, to generate high-quality prompts that are tailored to the user's needs. This feature provides a more streamlined and powerful way for developers to prompt Claude, enabling them to create more effective and customized prompts for their applications.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What new capabilities and features were introduced by Anthropic on May 10th, 2024 and how do they enable users to create and tailor prompts for specific tasks?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n May 10th, 2024\n\nMay 10th, 2024\n\n\nOur prompt generator tool is now available in the Developer Console. Prompt Generator makes it easy to guide Claude to generate a high-quality prompts tailored to your specific tasks. Read more in our blog post.\nOverviewClaude Appsxlinkedin\nOverviewClaude Apps\nxlinkedin\nJune 27th, 2024 June 20th, 2024 May 30th, 2024 May 10th, 2024\nJune 27th, 2024June 20th, 2024May 30th, 2024May 10th, 2024\n \n \n\n \n Develop with Claude\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n \n\n \n Before prompt engineering\n\nBefore prompt engineering\n\n\nThis guide assumes that you have:\nA clear definition 
of the success criteria for your use case\nSome ways to empirically test against those criteria\nA first draft prompt you want to improve\nIf not, we highly suggest you spend time establishing that first. Check out Define your success criteria and Create strong empirical evaluations for tips and guidance.\nPrompt generatorDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n\nPrompt generator\nDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What new capabilities and features were introduced by Anthropic on May 10th, 2024 and how do they enable users to create and tailor prompts for specific tasks?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n May 10th, 2024\n\nMay 10th, 2024\n\n\nOur prompt generator tool is now available in the Developer Console. Prompt Generator makes it easy to guide Claude to generate a high-quality prompts tailored to your specific tasks. Read more in our blog post.\nOverviewClaude Appsxlinkedin\nOverviewClaude Apps\nxlinkedin\nJune 27th, 2024 June 20th, 2024 May 30th, 2024 May 10th, 2024\nJune 27th, 2024June 20th, 2024May 30th, 2024May 10th, 2024\n \n \n\n \n Develop with Claude\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n \n\n \n Before prompt engineering\n\nBefore prompt engineering\n\n\nThis guide assumes that you have:\nA clear 
definition of the success criteria for your use case\nSome ways to empirically test against those criteria\nA first draft prompt you want to improve\nIf not, we highly suggest you spend time establishing that first. Check out Define your success criteria and Create strong empirical evaluations for tips and guidance.\nPrompt generatorDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n\nPrompt generator\nDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "Haiku: T-0.0",
"latencyMs": 1360,
"tokenUsage": {
@@ -52198,7 +52198,7 @@
"score": 1,
"namedScores": {},
"text": "On May 10th, 2024, Anthropic introduced a new prompt generator tool in their Developer Console. This tool is designed to help users create high-quality prompts tailored to specific tasks when working with Claude, Anthropic's AI assistant.\n\nThe prompt generator makes it easier for users to guide Claude in generating prompts that are customized to their particular needs. This new capability enables users to more effectively craft prompts for their specific use cases, potentially improving the quality and relevance of Claude's outputs.\n\nThe tool is integrated into the Developer Console, which Anthropic describes as offering \"easier, more powerful prompting in your browser.\" This suggests that the prompt generator is part of a suite of developer tools aimed at enhancing the user experience and effectiveness when working with Claude.\n\nWhile the exact details of how the prompt generator functions are not provided in the given context, it's clear that this new feature is intended to streamline the process of creating task-specific prompts, which is a crucial aspect of effectively utilizing AI language models like Claude.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What new capabilities and features were introduced by Anthropic on May 10th, 2024 and how do they enable users to create and tailor prompts for specific tasks?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n May 10th, 2024\n\nMay 10th, 2024\n\n\nOur prompt generator tool is now available in the Developer Console. Prompt Generator makes it easy to guide Claude to generate a high-quality prompts tailored to your specific tasks. Read more in our blog post.\nOverviewClaude Appsxlinkedin\nOverviewClaude Apps\nxlinkedin\nJune 27th, 2024 June 20th, 2024 May 30th, 2024 May 10th, 2024\nJune 27th, 2024June 20th, 2024May 30th, 2024May 10th, 2024\n \n \n\n \n Develop with Claude\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n \n\n \n Before prompt engineering\n\nBefore prompt engineering\n\n\nThis guide assumes that you have:\nA clear definition 
of the success criteria for your use case\nSome ways to empirically test against those criteria\nA first draft prompt you want to improve\nIf not, we highly suggest you spend time establishing that first. Check out Define your success criteria and Create strong empirical evaluations for tips and guidance.\nPrompt generatorDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n\nPrompt generator\nDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What new capabilities and features were introduced by Anthropic on May 10th, 2024 and how do they enable users to create and tailor prompts for specific tasks?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n May 10th, 2024\n\nMay 10th, 2024\n\n\nOur prompt generator tool is now available in the Developer Console. Prompt Generator makes it easy to guide Claude to generate a high-quality prompts tailored to your specific tasks. Read more in our blog post.\nOverviewClaude Appsxlinkedin\nOverviewClaude Apps\nxlinkedin\nJune 27th, 2024 June 20th, 2024 May 30th, 2024 May 10th, 2024\nJune 27th, 2024June 20th, 2024May 30th, 2024May 10th, 2024\n \n \n\n \n Develop with Claude\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n \n\n \n Before prompt engineering\n\nBefore prompt engineering\n\n\nThis guide assumes that you have:\nA clear 
definition of the success criteria for your use case\nSome ways to empirically test against those criteria\nA first draft prompt you want to improve\nIf not, we highly suggest you spend time establishing that first. Check out Define your success criteria and Create strong empirical evaluations for tips and guidance.\nPrompt generatorDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n\nPrompt generator\nDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "3.5 Sonnet: T-0.0",
"latencyMs": 5065,
"tokenUsage": {
@@ -53494,7 +53494,7 @@
"score": 0,
"namedScores": {},
"text": "The Generated Answer is mostly correct, but contains one piece of information that is not present in the Correct Answer and cannot be directly inferred from it. The first two points in the Generated Answer (multimodal cookbook and API reference) align well with the Correct Answer. However, the third point about the developer community is not mentioned in the Correct Answer. While this might be a helpful resource, it's not explicitly stated as one of the resources for diving deeper into building with images using Claude according to the given Correct Answer. Therefore, despite the overall similarity, this additional information makes the Generated Answer not fully correct based on the criteria provided.\n---\nThe key resources provided by Anthropic to dive deeper into building with images using Claude are:\n\n1. Multimodal cookbook: This cookbook provides tips on getting started with images and best practice techniques for effectively prompting Claude with images to perform tasks like interpreting and analyzing charts or extracting content from forms.\n\n2. API reference: The documentation for the Messages API, including example API calls involving images, can be used to explore and implement image-related functionality.\n\n3. Developer community: Anthropic encourages joining their developer community to connect with other creators and get help from Anthropic experts on building with images using Claude.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are some helpful resources provided by Anthropic to dive deeper into building with images using Claude?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nDive deeper into vision\n\n\nReady to start building with images using Claude? Here are a few helpful resources:\nMultimodal cookbook: This cookbook has tips on getting started with images and best practice techniques to ensure the highest quality performance with images. See how you can effectively prompt Claude with images to carry out tasks such as interpreting and analyzing charts or extracting content from forms.\nAPI reference: Visit our documentation for the Messages API, including example API calls involving images.\nIf you have any other questions, feel free to reach out to our support team. You can also join our developer community to connect with other creators and get help from Anthropic experts.\nGoogle Sheets add-onTool use (function calling)xlinkedin\nGoogle Sheets add-onTool use (function calling)\nxlinkedin\nHow to use vision Before you upload Evaluate image size Calculate image costs Ensuring image quality Prompt examples About the prompt examples Limitations FAQ Dive deeper into vision\nHow to use visionBefore you uploadEvaluate image sizeCalculate image costsEnsuring image qualityPrompt examplesAbout the prompt examplesLimitationsFAQDive deeper into vision\n\n\nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbook for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 Family Enterprise considerations 
Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are some helpful resources provided by Anthropic to dive deeper into building with images using Claude?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nDive deeper into vision\n\n\nReady to start building with images using Claude? Here are a few helpful resources:\nMultimodal cookbook: This cookbook has tips on getting started with images and best practice techniques to ensure the highest quality performance with images. See how you can effectively prompt Claude with images to carry out tasks such as interpreting and analyzing charts or extracting content from forms.\nAPI reference: Visit our documentation for the Messages API, including example API calls involving images.\nIf you have any other questions, feel free to reach out to our support team. You can also join our developer community to connect with other creators and get help from Anthropic experts.\nGoogle Sheets add-onTool use (function calling)xlinkedin\nGoogle Sheets add-onTool use (function calling)\nxlinkedin\nHow to use vision Before you upload Evaluate image size Calculate image costs Ensuring image quality Prompt examples About the prompt examples Limitations FAQ Dive deeper into vision\nHow to use visionBefore you uploadEvaluate image sizeCalculate image costsEnsuring image qualityPrompt examplesAbout the prompt examplesLimitationsFAQDive deeper into vision\n\n\nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbooks for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 Family Enterprise considerations 
Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "Haiku: T-0.0",
"latencyMs": 1,
"tokenUsage": {},
@@ -53527,8 +53527,8 @@
"pass": false,
"score": 0,
"namedScores": {},
- "text": "The Generated Answer is partially correct but contains additional information not present in the Correct Answer. The key points from the Correct Answer - the multimodal cookbook with tips and best practices, and the API reference documentation with example API calls involving images - are both included in the Generated Answer. However, the Generated Answer goes beyond this by mentioning additional resources like the Developer Console, Claude Cookbook, support team, and developer community, which are not mentioned in the Correct Answer. While these additional resources may be helpful, they are not specified in the Correct Answer as resources specifically for diving deeper into building with images using Claude. Therefore, the Generated Answer, while containing the correct information, also includes extra information that cannot be verified based on the given Correct Answer.\n---\nAnthropic provides several helpful resources for diving deeper into building with images using Claude:\n\n1. Multimodal cookbook: This contains tips for getting started with images, best practices, and techniques to ensure high-quality performance. It includes examples of how to effectively prompt Claude with images for tasks like interpreting charts or extracting content from forms.\n\n2. API reference: The documentation for the Messages API includes example API calls involving images.\n\n3. Developer Console: This includes the Workbench and prompt generator tool for easier and more powerful prompting in your browser.\n\n4. Claude Cookbook: This features interactive Jupyter notebooks demonstrating various capabilities, including working with images.\n\n5. Support team: Anthropic offers the option to reach out to their support team for additional questions.\n\n6. 
Developer community: Users can join Anthropic's developer community to connect with other creators and get help from Anthropic experts.\n\nThese resources provide a range of options for developers to learn about and implement image-related functionalities with Claude, from documentation and examples to interactive tools and community support.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are some helpful resources provided by Anthropic to dive deeper into building with images using Claude?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nDive deeper into vision\n\n\nReady to start building with images using Claude? Here are a few helpful resources:\nMultimodal cookbook: This cookbook has tips on getting started with images and best practice techniques to ensure the highest quality performance with images. See how you can effectively prompt Claude with images to carry out tasks such as interpreting and analyzing charts or extracting content from forms.\nAPI reference: Visit our documentation for the Messages API, including example API calls involving images.\nIf you have any other questions, feel free to reach out to our support team. You can also join our developer community to connect with other creators and get help from Anthropic experts.\nGoogle Sheets add-onTool use (function calling)xlinkedin\nGoogle Sheets add-onTool use (function calling)\nxlinkedin\nHow to use vision Before you upload Evaluate image size Calculate image costs Ensuring image quality Prompt examples About the prompt examples Limitations FAQ Dive deeper into vision\nHow to use visionBefore you uploadEvaluate image sizeCalculate image costsEnsuring image qualityPrompt examplesAbout the prompt examplesLimitationsFAQDive deeper into vision\n\n\nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbook for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 Family Enterprise considerations 
Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "text": "The Generated Answer is partially correct but contains additional information not present in the Correct Answer. The key points from the Correct Answer - the multimodal cookbook with tips and best practices, and the API reference documentation with example API calls involving images - are both included in the Generated Answer. However, the Generated Answer goes beyond this by mentioning additional resources like the Developer Console, Claude Cookbooks, support team, and developer community, which are not mentioned in the Correct Answer. While these additional resources may be helpful, they are not specified in the Correct Answer as resources specifically for diving deeper into building with images using Claude. Therefore, the Generated Answer, while containing the correct information, also includes extra information that cannot be verified based on the given Correct Answer.\n---\nAnthropic provides several helpful resources for diving deeper into building with images using Claude:\n\n1. Multimodal cookbook: This contains tips for getting started with images, best practices, and techniques to ensure high-quality performance. It includes examples of how to effectively prompt Claude with images for tasks like interpreting charts or extracting content from forms.\n\n2. API reference: The documentation for the Messages API includes example API calls involving images.\n\n3. Developer Console: This includes the Workbench and prompt generator tool for easier and more powerful prompting in your browser.\n\n4. Claude Cookbooks: This features interactive Jupyter notebooks demonstrating various capabilities, including working with images.\n\n5. Support team: Anthropic offers the option to reach out to their support team for additional questions.\n\n6. 
Developer community: Users can join Anthropic's developer community to connect with other creators and get help from Anthropic experts.\n\nThese resources provide a range of options for developers to learn about and implement image-related functionalities with Claude, from documentation and examples to interactive tools and community support.",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are some helpful resources provided by Anthropic to dive deeper into building with images using Claude?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \nDive deeper into vision\n\n\nReady to start building with images using Claude? Here are a few helpful resources:\nMultimodal cookbook: This cookbook has tips on getting started with images and best practice techniques to ensure the highest quality performance with images. See how you can effectively prompt Claude with images to carry out tasks such as interpreting and analyzing charts or extracting content from forms.\nAPI reference: Visit our documentation for the Messages API, including example API calls involving images.\nIf you have any other questions, feel free to reach out to our support team. You can also join our developer community to connect with other creators and get help from Anthropic experts.\nGoogle Sheets add-onTool use (function calling)xlinkedin\nGoogle Sheets add-onTool use (function calling)\nxlinkedin\nHow to use vision Before you upload Evaluate image size Calculate image costs Ensuring image quality Prompt examples About the prompt examples Limitations FAQ Dive deeper into vision\nHow to use visionBefore you uploadEvaluate image sizeCalculate image costsEnsuring image qualityPrompt examplesAbout the prompt examplesLimitationsFAQDive deeper into vision\n\n\nStart building with Claude\n\n\nWhen you’re ready, start building with Claude:\nFollow the Quickstart to make your first API call\nCheck out the API Reference\nExplore the Prompt Library for example prompts\nExperiment and start building with the Workbench\nCheck out the Claude Cookbooks for working code examples\nQuickstartOverviewxlinkedin\nQuickstartOverview\nxlinkedin\nWhat you can do with Claude Model options Claude 3.5 Family Claude 3 Family Enterprise considerations 
Implementing Claude Start building with Claude\nWhat you can do with ClaudeModel optionsClaude 3.5 FamilyClaude 3 FamilyEnterprise considerationsImplementing ClaudeStart building with Claude\n\n\nDevelop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "3.5 Sonnet: T-0.0",
"latencyMs": 4457,
"tokenUsage": {
@@ -53539,7 +53539,7 @@
"gradingResult": {
"pass": false,
"score": 0,
- "reason": "The Generated Answer is partially correct but contains additional information not present in the Correct Answer. The key points from the Correct Answer - the multimodal cookbook with tips and best practices, and the API reference documentation with example API calls involving images - are both included in the Generated Answer. However, the Generated Answer goes beyond this by mentioning additional resources like the Developer Console, Claude Cookbook, support team, and developer community, which are not mentioned in the Correct Answer. While these additional resources may be helpful, they are not specified in the Correct Answer as resources specifically for diving deeper into building with images using Claude. Therefore, the Generated Answer, while containing the correct information, also includes extra information that cannot be verified based on the given Correct Answer.",
+ "reason": "The Generated Answer is partially correct but contains additional information not present in the Correct Answer. The key points from the Correct Answer - the multimodal cookbook with tips and best practices, and the API reference documentation with example API calls involving images - are both included in the Generated Answer. However, the Generated Answer goes beyond this by mentioning additional resources like the Developer Console, Claude Cookbooks, support team, and developer community, which are not mentioned in the Correct Answer. While these additional resources may be helpful, they are not specified in the Correct Answer as resources specifically for diving deeper into building with images using Claude. Therefore, the Generated Answer, while containing the correct information, also includes extra information that cannot be verified based on the given Correct Answer.",
"namedScores": {},
"tokensUsed": {
"total": 0,
@@ -53550,7 +53550,7 @@
{
"pass": false,
"score": 0,
- "reason": "The Generated Answer is partially correct but contains additional information not present in the Correct Answer. The key points from the Correct Answer - the multimodal cookbook with tips and best practices, and the API reference documentation with example API calls involving images - are both included in the Generated Answer. However, the Generated Answer goes beyond this by mentioning additional resources like the Developer Console, Claude Cookbook, support team, and developer community, which are not mentioned in the Correct Answer. While these additional resources may be helpful, they are not specified in the Correct Answer as resources specifically for diving deeper into building with images using Claude. Therefore, the Generated Answer, while containing the correct information, also includes extra information that cannot be verified based on the given Correct Answer.",
+ "reason": "The Generated Answer is partially correct but contains additional information not present in the Correct Answer. The key points from the Correct Answer - the multimodal cookbook with tips and best practices, and the API reference documentation with example API calls involving images - are both included in the Generated Answer. However, the Generated Answer goes beyond this by mentioning additional resources like the Developer Console, Claude Cookbooks, support team, and developer community, which are not mentioned in the Correct Answer. While these additional resources may be helpful, they are not specified in the Correct Answer as resources specifically for diving deeper into building with images using Claude. Therefore, the Generated Answer, while containing the correct information, also includes extra information that cannot be verified based on the given Correct Answer.",
"assertion": {
"type": "python",
"value": "file://eval_end_to_end.py"
@@ -53565,8 +53565,8 @@
"pass": false,
"score": 0,
"namedScores": {},
- "text": "The Generated Answer is partially correct but contains some inaccuracies and additional information not mentioned in the Correct Answer. The Generated Answer correctly mentions the multimodal cookbook and API reference documentation, which align with the Correct Answer. However, it also includes information about a developer community, developer console, and Claude Cookbook, which are not mentioned in the Correct Answer. While these additional resources may exist, they are not part of the specific resources mentioned in the Correct Answer for diving deeper into building with images using Claude. Due to these additions and the absence of some specific details (like mentioning \"Messages API\" explicitly), the Generated Answer cannot be considered fully correct based on the given Correct Answer.\n---\nAnthropic provides several helpful resources to dive deeper into building with images using Claude:\n\n1. Multimodal Cookbook: This resource provides tips and best practices for effectively prompting Claude with images, such as how to interpret and analyze charts or extract content from forms.\n\n2. API Reference: The documentation for the Messages API includes example API calls involving images, allowing developers to explore and implement image-based functionality.\n\n3. Developer Community: Anthropic encourages developers to join their community to connect with other creators and get help from Anthropic experts on using the vision capabilities of Claude.\n\n4. Developer Console: Anthropic's browser-based Workbench and prompt generator tool enable easier and more powerful prompting, including with images.\n\n5. Claude Cookbook: Interactive Jupyter notebooks demonstrate how to use features like uploading PDFs and working with embeddings, which can be applied to image-based use cases.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are some helpful resources provided by Anthropic to dive deeper into building with images using Claude?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Prompt examples\n\nText\n Prompt examples\n\n\nMany of the prompting techniques that work well for text-based interactions with Claude can also be applied to image-based prompts.\nThese examples demonstrate best practice prompt structures involving images.\nJust as with document-query placement, Claude works best when images come before text. Images placed after text or interpolated with text will still perform well, but if your use case allows it, we recommend an image-then-text structure.\nJust as with document-query placement, Claude works best when images come before text. Images placed after text or interpolated with text will still perform well, but if your use case allows it, we recommend an image-then-text structure.\n\nJust as with document-query placement, Claude works best when images come before text. Images placed after text or interpolated with text will still perform well, but if your use case allows it, we recommend an image-then-text structure.\n \n\nSummary: \n Prompt examples demonstrate that many text-based techniques can be applied to image-based prompts with Claude. The model works best when images are placed before text, but images after text or interspersed with text will also perform well. Anthropic recommends an image-then-text structure if the use case allows it. \n \n\n \n Dive deeper into vision\n\nText\n Dive deeper into vision\n\n\nReady to start building with images using Claude? Here are a few helpful resources:\nMultimodal cookbook: This cookbook has tips on getting started with images and best practice techniques to ensure the highest quality performance with images. 
See how you can effectively prompt Claude with images to carry out tasks such as interpreting and analyzing charts or extracting content from forms.\nAPI reference: Visit our documentation for the Messages API, including example API calls involving images.\nIf you have any other questions, feel free to reach out to our support team. You can also join our developer community to connect with other creators and get help from Anthropic experts.\nGoogle Sheets add-onTool use (function calling)xlinkedin\nGoogle Sheets add-onTool use (function calling)\nxlinkedin\nHow to use vision Before you upload Evaluate image size Calculate image costs Ensuring image quality Prompt examples About the prompt examples Limitations FAQ Dive deeper into vision\nHow to use visionBefore you uploadEvaluate image sizeCalculate image costsEnsuring image qualityPrompt examplesAbout the prompt examplesLimitationsFAQDive deeper into vision\n \n\nSummary: \n This documentation covers resources for using images with the Claude AI model, including a multimodal cookbook with tips on effective prompting, an API reference for the Messages API, and information on image size, costs, and quality. It also provides prompt examples and addresses limitations and FAQs around using vision capabilities. 
\n \n\n \n Develop with Claude\n\nText\n Develop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n\nSummary: \n Anthropic provides a suite of developer tools, including a browser-based Workbench and prompt generator, API reference documentation, and interactive Jupyter notebooks, to help developers build scalable applications with the Claude AI model. These tools enable easier, more powerful prompting, exploration and implementation of the Claude API and SDKs, and learning through interactive demonstrations. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "text": "The Generated Answer is partially correct but contains some inaccuracies and additional information not mentioned in the Correct Answer. The Generated Answer correctly mentions the multimodal cookbook and API reference documentation, which align with the Correct Answer. However, it also includes information about a developer community, developer console, and Claude Cookbooks, which are not mentioned in the Correct Answer. While these additional resources may exist, they are not part of the specific resources mentioned in the Correct Answer for diving deeper into building with images using Claude. Due to these additions and the absence of some specific details (like mentioning \"Messages API\" explicitly), the Generated Answer cannot be considered fully correct based on the given Correct Answer.\n---\nAnthropic provides several helpful resources to dive deeper into building with images using Claude:\n\n1. Multimodal Cookbook: This resource provides tips and best practices for effectively prompting Claude with images, such as how to interpret and analyze charts or extract content from forms.\n\n2. API Reference: The documentation for the Messages API includes example API calls involving images, allowing developers to explore and implement image-based functionality.\n\n3. Developer Community: Anthropic encourages developers to join their community to connect with other creators and get help from Anthropic experts on using the vision capabilities of Claude.\n\n4. Developer Console: Anthropic's browser-based Workbench and prompt generator tool enable easier and more powerful prompting, including with images.\n\n5. Claude Cookbooks: Interactive Jupyter notebooks demonstrate how to use features like uploading PDFs and working with embeddings, which can be applied to image-based use cases.",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are some helpful resources provided by Anthropic to dive deeper into building with images using Claude?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Prompt examples\n\nText\n Prompt examples\n\n\nMany of the prompting techniques that work well for text-based interactions with Claude can also be applied to image-based prompts.\nThese examples demonstrate best practice prompt structures involving images.\nJust as with document-query placement, Claude works best when images come before text. Images placed after text or interpolated with text will still perform well, but if your use case allows it, we recommend an image-then-text structure.\nJust as with document-query placement, Claude works best when images come before text. Images placed after text or interpolated with text will still perform well, but if your use case allows it, we recommend an image-then-text structure.\n\nJust as with document-query placement, Claude works best when images come before text. Images placed after text or interpolated with text will still perform well, but if your use case allows it, we recommend an image-then-text structure.\n \n\nSummary: \n Prompt examples demonstrate that many text-based techniques can be applied to image-based prompts with Claude. The model works best when images are placed before text, but images after text or interspersed with text will also perform well. Anthropic recommends an image-then-text structure if the use case allows it. \n \n\n \n Dive deeper into vision\n\nText\n Dive deeper into vision\n\n\nReady to start building with images using Claude? Here are a few helpful resources:\nMultimodal cookbook: This cookbook has tips on getting started with images and best practice techniques to ensure the highest quality performance with images. 
See how you can effectively prompt Claude with images to carry out tasks such as interpreting and analyzing charts or extracting content from forms.\nAPI reference: Visit our documentation for the Messages API, including example API calls involving images.\nIf you have any other questions, feel free to reach out to our support team. You can also join our developer community to connect with other creators and get help from Anthropic experts.\nGoogle Sheets add-onTool use (function calling)xlinkedin\nGoogle Sheets add-onTool use (function calling)\nxlinkedin\nHow to use vision Before you upload Evaluate image size Calculate image costs Ensuring image quality Prompt examples About the prompt examples Limitations FAQ Dive deeper into vision\nHow to use visionBefore you uploadEvaluate image sizeCalculate image costsEnsuring image qualityPrompt examplesAbout the prompt examplesLimitationsFAQDive deeper into vision\n \n\nSummary: \n This documentation covers resources for using images with the Claude AI model, including a multimodal cookbook with tips on effective prompting, an API reference for the Messages API, and information on image size, costs, and quality. It also provides prompt examples and addresses limitations and FAQs around using vision capabilities. 
\n \n\n \n Develop with Claude\n\nText\n Develop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n\nSummary: \n Anthropic provides a suite of developer tools, including a browser-based Workbench and prompt generator, API reference documentation, and interactive Jupyter notebooks, to help developers build scalable applications with the Claude AI model. These tools enable easier, more powerful prompting, exploration and implementation of the Claude API and SDKs, and learning through interactive demonstrations. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "Haiku: T-0.0",
"latencyMs": 1886,
"tokenUsage": {
@@ -53577,7 +53577,7 @@
"gradingResult": {
"pass": false,
"score": 0,
- "reason": "The Generated Answer is partially correct but contains some inaccuracies and additional information not mentioned in the Correct Answer. The Generated Answer correctly mentions the multimodal cookbook and API reference documentation, which align with the Correct Answer. However, it also includes information about a developer community, developer console, and Claude Cookbook, which are not mentioned in the Correct Answer. While these additional resources may exist, they are not part of the specific resources mentioned in the Correct Answer for diving deeper into building with images using Claude. Due to these additions and the absence of some specific details (like mentioning \"Messages API\" explicitly), the Generated Answer cannot be considered fully correct based on the given Correct Answer.",
+ "reason": "The Generated Answer is partially correct but contains some inaccuracies and additional information not mentioned in the Correct Answer. The Generated Answer correctly mentions the multimodal cookbook and API reference documentation, which align with the Correct Answer. However, it also includes information about a developer community, developer console, and Claude Cookbooks, which are not mentioned in the Correct Answer. While these additional resources may exist, they are not part of the specific resources mentioned in the Correct Answer for diving deeper into building with images using Claude. Due to these additions and the absence of some specific details (like mentioning \"Messages API\" explicitly), the Generated Answer cannot be considered fully correct based on the given Correct Answer.",
"namedScores": {},
"tokensUsed": {
"total": 0,
@@ -53588,7 +53588,7 @@
{
"pass": false,
"score": 0,
- "reason": "The Generated Answer is partially correct but contains some inaccuracies and additional information not mentioned in the Correct Answer. The Generated Answer correctly mentions the multimodal cookbook and API reference documentation, which align with the Correct Answer. However, it also includes information about a developer community, developer console, and Claude Cookbook, which are not mentioned in the Correct Answer. While these additional resources may exist, they are not part of the specific resources mentioned in the Correct Answer for diving deeper into building with images using Claude. Due to these additions and the absence of some specific details (like mentioning \"Messages API\" explicitly), the Generated Answer cannot be considered fully correct based on the given Correct Answer.",
+ "reason": "The Generated Answer is partially correct but contains some inaccuracies and additional information not mentioned in the Correct Answer. The Generated Answer correctly mentions the multimodal cookbook and API reference documentation, which align with the Correct Answer. However, it also includes information about a developer community, developer console, and Claude Cookbooks, which are not mentioned in the Correct Answer. While these additional resources may exist, they are not part of the specific resources mentioned in the Correct Answer for diving deeper into building with images using Claude. Due to these additions and the absence of some specific details (like mentioning \"Messages API\" explicitly), the Generated Answer cannot be considered fully correct based on the given Correct Answer.",
"assertion": {
"type": "python",
"value": "file://eval_end_to_end.py"
@@ -53603,8 +53603,8 @@
"pass": true,
"score": 1,
"namedScores": {},
- "text": "Anthropic provides several helpful resources for diving deeper into building with images using Claude:\n\n1. Multimodal cookbook: This contains tips on getting started with images and best practices for using Claude with visual inputs. It demonstrates techniques for tasks like interpreting charts and extracting content from forms.\n\n2. API reference: The documentation for the Messages API includes example API calls involving images.\n\n3. Developer Console: This browser-based tool includes a Workbench and prompt generator to help with image-based prompting.\n\n4. Claude Cookbook: This resource offers interactive Jupyter notebooks that demonstrate various capabilities, which may include image-related functionalities.\n\n5. Developer community: Anthropic has a community where developers can connect with each other and get help from Anthropic experts on topics like working with images.\n\n6. Support team: Anthropic's support team is available to answer questions about using Claude with images.\n\nAdditionally, the documentation provides guidance on image size evaluation, cost calculation, ensuring image quality, and offers prompt examples specifically for working with images. There's also information on limitations and FAQs related to Claude's vision capabilities.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are some helpful resources provided by Anthropic to dive deeper into building with images using Claude?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Prompt examples\n\nText\n Prompt examples\n\n\nMany of the prompting techniques that work well for text-based interactions with Claude can also be applied to image-based prompts.\nThese examples demonstrate best practice prompt structures involving images.\nJust as with document-query placement, Claude works best when images come before text. Images placed after text or interpolated with text will still perform well, but if your use case allows it, we recommend an image-then-text structure.\nJust as with document-query placement, Claude works best when images come before text. Images placed after text or interpolated with text will still perform well, but if your use case allows it, we recommend an image-then-text structure.\n\nJust as with document-query placement, Claude works best when images come before text. Images placed after text or interpolated with text will still perform well, but if your use case allows it, we recommend an image-then-text structure.\n \n\nSummary: \n Prompt examples demonstrate that many text-based techniques can be applied to image-based prompts with Claude. The model works best when images are placed before text, but images after text or interspersed with text will also perform well. Anthropic recommends an image-then-text structure if the use case allows it. \n \n\n \n Dive deeper into vision\n\nText\n Dive deeper into vision\n\n\nReady to start building with images using Claude? Here are a few helpful resources:\nMultimodal cookbook: This cookbook has tips on getting started with images and best practice techniques to ensure the highest quality performance with images. 
See how you can effectively prompt Claude with images to carry out tasks such as interpreting and analyzing charts or extracting content from forms.\nAPI reference: Visit our documentation for the Messages API, including example API calls involving images.\nIf you have any other questions, feel free to reach out to our support team. You can also join our developer community to connect with other creators and get help from Anthropic experts.\nGoogle Sheets add-onTool use (function calling)xlinkedin\nGoogle Sheets add-onTool use (function calling)\nxlinkedin\nHow to use vision Before you upload Evaluate image size Calculate image costs Ensuring image quality Prompt examples About the prompt examples Limitations FAQ Dive deeper into vision\nHow to use visionBefore you uploadEvaluate image sizeCalculate image costsEnsuring image qualityPrompt examplesAbout the prompt examplesLimitationsFAQDive deeper into vision\n \n\nSummary: \n This documentation covers resources for using images with the Claude AI model, including a multimodal cookbook with tips on effective prompting, an API reference for the Messages API, and information on image size, costs, and quality. It also provides prompt examples and addresses limitations and FAQs around using vision capabilities. 
\n \n\n \n Develop with Claude\n\nText\n Develop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbookLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbook\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n\nSummary: \n Anthropic provides a suite of developer tools, including a browser-based Workbench and prompt generator, API reference documentation, and interactive Jupyter notebooks, to help developers build scalable applications with the Claude AI model. These tools enable easier, more powerful prompting, exploration and implementation of the Claude API and SDKs, and learning through interactive demonstrations. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "text": "Anthropic provides several helpful resources for diving deeper into building with images using Claude:\n\n1. Multimodal cookbook: This contains tips on getting started with images and best practices for using Claude with visual inputs. It demonstrates techniques for tasks like interpreting charts and extracting content from forms.\n\n2. API reference: The documentation for the Messages API includes example API calls involving images.\n\n3. Developer Console: This browser-based tool includes a Workbench and prompt generator to help with image-based prompting.\n\n4. Claude Cookbooks: This resource offers interactive Jupyter notebooks that demonstrate various capabilities, which may include image-related functionalities.\n\n5. Developer community: Anthropic has a community where developers can connect with each other and get help from Anthropic experts on topics like working with images.\n\n6. Support team: Anthropic's support team is available to answer questions about using Claude with images.\n\nAdditionally, the documentation provides guidance on image size evaluation, cost calculation, ensuring image quality, and offers prompt examples specifically for working with images. There's also information on limitations and FAQs related to Claude's vision capabilities.",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are some helpful resources provided by Anthropic to dive deeper into building with images using Claude?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Prompt examples\n\nText\n Prompt examples\n\n\nMany of the prompting techniques that work well for text-based interactions with Claude can also be applied to image-based prompts.\nThese examples demonstrate best practice prompt structures involving images.\nJust as with document-query placement, Claude works best when images come before text. Images placed after text or interpolated with text will still perform well, but if your use case allows it, we recommend an image-then-text structure.\nJust as with document-query placement, Claude works best when images come before text. Images placed after text or interpolated with text will still perform well, but if your use case allows it, we recommend an image-then-text structure.\n\nJust as with document-query placement, Claude works best when images come before text. Images placed after text or interpolated with text will still perform well, but if your use case allows it, we recommend an image-then-text structure.\n \n\nSummary: \n Prompt examples demonstrate that many text-based techniques can be applied to image-based prompts with Claude. The model works best when images are placed before text, but images after text or interspersed with text will also perform well. Anthropic recommends an image-then-text structure if the use case allows it. \n \n\n \n Dive deeper into vision\n\nText\n Dive deeper into vision\n\n\nReady to start building with images using Claude? Here are a few helpful resources:\nMultimodal cookbook: This cookbook has tips on getting started with images and best practice techniques to ensure the highest quality performance with images. 
See how you can effectively prompt Claude with images to carry out tasks such as interpreting and analyzing charts or extracting content from forms.\nAPI reference: Visit our documentation for the Messages API, including example API calls involving images.\nIf you have any other questions, feel free to reach out to our support team. You can also join our developer community to connect with other creators and get help from Anthropic experts.\nGoogle Sheets add-onTool use (function calling)xlinkedin\nGoogle Sheets add-onTool use (function calling)\nxlinkedin\nHow to use vision Before you upload Evaluate image size Calculate image costs Ensuring image quality Prompt examples About the prompt examples Limitations FAQ Dive deeper into vision\nHow to use visionBefore you uploadEvaluate image sizeCalculate image costsEnsuring image qualityPrompt examplesAbout the prompt examplesLimitationsFAQDive deeper into vision\n \n\nSummary: \n This documentation covers resources for using images with the Claude AI model, including a multimodal cookbook with tips on effective prompting, an API reference for the Messages API, and information on image size, costs, and quality. It also provides prompt examples and addresses limitations and FAQs around using vision capabilities. 
\n \n\n \n Develop with Claude\n\nText\n Develop with Claude\n\n\nAnthropic has best-in-class developer tools to build scalable applications with Claude.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.API ReferenceExplore, implement, and scale with the Claude API and SDKs.Claude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\nDeveloper ConsoleEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\n\nDeveloper Console\nEnjoy easier, more powerful prompting in your browser with the Workbench and prompt generator tool.\nAPI ReferenceExplore, implement, and scale with the Claude API and SDKs.\n\nAPI Reference\nExplore, implement, and scale with the Claude API and SDKs.\nClaude CookbooksLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n\nClaude Cookbooks\nLearn with interactive Jupyter notebooks that demonstrate uploading PDFs, embeddings, and more.\n \n\nSummary: \n Anthropic provides a suite of developer tools, including a browser-based Workbench and prompt generator, API reference documentation, and interactive Jupyter notebooks, to help developers build scalable applications with the Claude AI model. These tools enable easier, more powerful prompting, exploration and implementation of the Claude API and SDKs, and learning through interactive demonstrations. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "3.5 Sonnet: T-0.0",
"latencyMs": 6191,
"tokenUsage": {
@@ -53626,7 +53626,7 @@
{
"pass": true,
"score": 1,
- "reason": "The Generated Answer is essentially correct, but it includes additional information not mentioned in the Correct Answer. The key points from the Correct Answer - the multimodal cookbook with tips and best practices, and the API reference documentation with example API calls involving images - are both included in the Generated Answer. However, the Generated Answer goes beyond this by mentioning several other resources like the Developer Console, Claude Cookbook, developer community, and support team. While these additional resources may be helpful, they weren't specified in the Correct Answer. Despite this extra information, the Generated Answer doesn't contradict the Correct Answer or omit any critical information from it. Therefore, the substance of the Generated Answer aligns with the Correct Answer.",
+ "reason": "The Generated Answer is essentially correct, but it includes additional information not mentioned in the Correct Answer. The key points from the Correct Answer - the multimodal cookbook with tips and best practices, and the API reference documentation with example API calls involving images - are both included in the Generated Answer. However, the Generated Answer goes beyond this by mentioning several other resources like the Developer Console, Claude Cookbooks, developer community, and support team. While these additional resources may be helpful, they weren't specified in the Correct Answer. Despite this extra information, the Generated Answer doesn't contradict the Correct Answer or omit any critical information from it. Therefore, the substance of the Generated Answer aligns with the Correct Answer.",
"assertion": {
"type": "python",
"value": "file://eval_end_to_end.py"
@@ -53642,7 +53642,7 @@
"score": 1,
"namedScores": {},
"text": "Anthropic provides the following helpful resources to dive deeper into building with images using Claude:\n\n1. Multimodal cookbook: This cookbook has tips on getting started with images and best practice techniques to ensure the highest quality performance with images. It covers how to effectively prompt Claude with images to carry out tasks such as interpreting and analyzing charts or extracting content from forms.\n\n2. API reference: The documentation for the Messages API includes example API calls involving images, providing guidance on how to integrate images into your workflows.\n\n3. Developer community: Anthropic encourages users to join their developer community to connect with other creators and get help from Anthropic experts on building with images using Claude.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are some helpful resources provided by Anthropic to dive deeper into building with images using Claude?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Dive deeper into vision\n\nDive deeper into vision\n\n\nReady to start building with images using Claude? Here are a few helpful resources:\nMultimodal cookbook: This cookbook has tips on getting started with images and best practice techniques to ensure the highest quality performance with images. See how you can effectively prompt Claude with images to carry out tasks such as interpreting and analyzing charts or extracting content from forms.\nAPI reference: Visit our documentation for the Messages API, including example API calls involving images.\nIf you have any other questions, feel free to reach out to our support team. You can also join our developer community to connect with other creators and get help from Anthropic experts.\nGoogle Sheets add-onTool use (function calling)xlinkedin\nGoogle Sheets add-onTool use (function calling)\nxlinkedin\nHow to use vision Before you upload Evaluate image size Calculate image costs Ensuring image quality Prompt examples About the prompt examples Limitations FAQ Dive deeper into vision\nHow to use visionBefore you uploadEvaluate image sizeCalculate image costsEnsuring image qualityPrompt examplesAbout the prompt examplesLimitationsFAQDive deeper into vision\n \n \n\n \n Claude Cookbook\n\nClaude Cookbook\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to 
create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n \n \n\n \n Vision\n\nVision\n\n\nClaude can read both text and images in requests. Currently, we support the base64 source type for images, and the image/jpeg, image/png, image/gif, and image/webp media types. 
See our vision guide for more details.\nShell Python TypeScript #!/bin/sh IMAGE_URL = \"https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg\" IMAGE_MEDIA_TYPE = \"image/jpeg\" IMAGE_BASE64 = $( curl \" $IMAGE_URL \" | base64 ) curl https://api.anthropic.com/v1/messages \\ --header \"x-api-key: $ANTHROPIC_API_KEY \" \\ --header \"anthropic-version: 2023-06-01\" \\ --header \"content-type: application/json\" \\ --data \\ '{\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"max_tokens\": 1024,\n \"messages\": [\n {\"role\": \"user\", \"content\": [\n {\"type\": \"image\", \"source\": {\n \"type\": \"base64\",\n \"media_type\": \"' $IMAGE_MEDIA_TYPE '\",\n \"data\": \"' $IMAGE_BASE64 '\"\n }},\n {\"type\": \"text\", \"text\": \"What is in the above image?\"}\n ]}\n ]\n}'\nShellPythonTypeScript\nShellPythonTypeScript\nShell\nShell\n\nPython\nPython\nTypeScript\nTypeScript\n\n#!/bin/sh\n\nIMAGE_URL=\"https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg\"\nIMAGE_MEDIA_TYPE=\"image/jpeg\"\nIMAGE_BASE64=$(curl \"$IMAGE_URL\" | base64)\n\ncurl https://api.anthropic.com/v1/messages \\\n --header \"x-api-key: $ANTHROPIC_API_KEY\" \\\n --header \"anthropic-version: 2023-06-01\" \\\n --header \"content-type: application/json\" \\\n --data \\\n'{\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"max_tokens\": 1024,\n \"messages\": [\n {\"role\": \"user\", \"content\": [\n {\"type\": \"image\", \"source\": {\n \"type\": \"base64\",\n \"media_type\": \"'$IMAGE_MEDIA_TYPE'\",\n \"data\": \"'$IMAGE_BASE64'\"\n }},\n {\"type\": \"text\", \"text\": \"What is in the above image?\"}\n ]}\n ]\n}'\n#!/bin/sh\n\nIMAGE_URL=\"https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg\"\nIMAGE_MEDIA_TYPE=\"image/jpeg\"\nIMAGE_BASE64=$(curl \"$IMAGE_URL\" | base64)\n\ncurl https://api.anthropic.com/v1/messages \\\n --header \"x-api-key: $ANTHROPIC_API_KEY\" \\\n --header \"anthropic-version: 
2023-06-01\" \\\n --header \"content-type: application/json\" \\\n --data \\\n'{\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"max_tokens\": 1024,\n \"messages\": [\n {\"role\": \"user\", \"content\": [\n {\"type\": \"image\", \"source\": {\n \"type\": \"base64\",\n \"media_type\": \"'$IMAGE_MEDIA_TYPE'\",\n \"data\": \"'$IMAGE_BASE64'\"\n }},\n {\"type\": \"text\", \"text\": \"What is in the above image?\"}\n ]}\n ]\n}'\n#!/bin/sh\n\nIMAGE_URL=\"https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg\"\nIMAGE_MEDIA_TYPE=\"image/jpeg\"\nIMAGE_BASE64=$(curl \"$IMAGE_URL\" | base64)\n\ncurl https://api.anthropic.com/v1/messages \\\n --header \"x-api-key: $ANTHROPIC_API_KEY\" \\\n --header \"anthropic-version: 2023-06-01\" \\\n --header \"content-type: application/json\" \\\n --data \\\n'{\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"max_tokens\": 1024,\n \"messages\": [\n {\"role\": \"user\", \"content\": [\n {\"type\": \"image\", \"source\": {\n \"type\": \"base64\",\n \"media_type\": \"'$IMAGE_MEDIA_TYPE'\",\n \"data\": \"'$IMAGE_BASE64'\"\n }},\n {\"type\": \"text\", \"text\": \"What is in the above image?\"}\n ]}\n ]\n}'\n```\n#!/bin/sh\n\nIMAGE_URL=\"https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg\"\nIMAGE_MEDIA_TYPE=\"image/jpeg\"\nIMAGE_BASE64=$(curl \"$IMAGE_URL\" | base64)\n\ncurl https://api.anthropic.com/v1/messages \\\n --header \"x-api-key: $ANTHROPIC_API_KEY\" \\\n --header \"anthropic-version: 2023-06-01\" \\\n --header \"content-type: application/json\" \\\n --data \\\n'{\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"max_tokens\": 1024,\n \"messages\": [\n {\"role\": \"user\", \"content\": [\n {\"type\": \"image\", \"source\": {\n \"type\": \"base64\",\n \"media_type\": \"'$IMAGE_MEDIA_TYPE'\",\n \"data\": \"'$IMAGE_BASE64'\"\n }},\n {\"type\": \"text\", \"text\": \"What is in the above image?\"}\n ]}\n ]\n}'\n\n```\nJSON{\n \"id\": \"msg_01EcyWo6m4hyW8KHs2y2pei5\",\n 
\"type\": \"message\",\n \"role\": \"assistant\",\n \"content\": [\n {\n \"type\": \"text\",\n \"text\": \"This image shows an ant, specifically a close-up view of an ant. The ant is shown in detail, with its distinct head, antennae, and legs clearly visible. The image is focused on capturing the intricate details and features of the ant, likely taken with a macro lens to get an extreme close-up perspective.\"\n }\n ],\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"stop_reason\": \"end_turn\",\n \"stop_sequence\": null,\n \"usage\": {\n \"input_tokens\": 1551,\n \"output_tokens\": 71\n }\n}\nJSON\nJSON\n\n{\n \"id\": \"msg_01EcyWo6m4hyW8KHs2y2pei5\",\n \"type\": \"message\",\n \"role\": \"assistant\",\n \"content\": [\n {\n \"type\": \"text\",\n \"text\": \"This image shows an ant, specifically a close-up view of an ant. The ant is shown in detail, with its distinct head, antennae, and legs clearly visible. The image is focused on capturing the intricate details and features of the ant, likely taken with a macro lens to get an extreme close-up perspective.\"\n }\n ],\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"stop_reason\": \"end_turn\",\n \"stop_sequence\": null,\n \"usage\": {\n \"input_tokens\": 1551,\n \"output_tokens\": 71\n }\n}\n{\n \"id\": \"msg_01EcyWo6m4hyW8KHs2y2pei5\",\n \"type\": \"message\",\n \"role\": \"assistant\",\n \"content\": [\n {\n \"type\": \"text\",\n \"text\": \"This image shows an ant, specifically a close-up view of an ant. The ant is shown in detail, with its distinct head, antennae, and legs clearly visible. 
The image is focused on capturing the intricate details and features of the ant, likely taken with a macro lens to get an extreme close-up perspective.\"\n }\n ],\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"stop_reason\": \"end_turn\",\n \"stop_sequence\": null,\n \"usage\": {\n \"input_tokens\": 1551,\n \"output_tokens\": 71\n }\n}\n```\n{\n \"id\": \"msg_01EcyWo6m4hyW8KHs2y2pei5\",\n \"type\": \"message\",\n \"role\": \"assistant\",\n \"content\": [\n {\n \"type\": \"text\",\n \"text\": \"This image shows an ant, specifically a close-up view of an ant. The ant is shown in detail, with its distinct head, antennae, and legs clearly visible. The image is focused on capturing the intricate details and features of the ant, likely taken with a macro lens to get an extreme close-up perspective.\"\n }\n ],\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"stop_reason\": \"end_turn\",\n \"stop_sequence\": null,\n \"usage\": {\n \"input_tokens\": 1551,\n \"output_tokens\": 71\n }\n}\n\n```\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are some helpful resources provided by Anthropic to dive deeper into building with images using Claude?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Dive deeper into vision\n\nDive deeper into vision\n\n\nReady to start building with images using Claude? Here are a few helpful resources:\nMultimodal cookbook: This cookbook has tips on getting started with images and best practice techniques to ensure the highest quality performance with images. See how you can effectively prompt Claude with images to carry out tasks such as interpreting and analyzing charts or extracting content from forms.\nAPI reference: Visit our documentation for the Messages API, including example API calls involving images.\nIf you have any other questions, feel free to reach out to our support team. You can also join our developer community to connect with other creators and get help from Anthropic experts.\nGoogle Sheets add-onTool use (function calling)xlinkedin\nGoogle Sheets add-onTool use (function calling)\nxlinkedin\nHow to use vision Before you upload Evaluate image size Calculate image costs Ensuring image quality Prompt examples About the prompt examples Limitations FAQ Dive deeper into vision\nHow to use visionBefore you uploadEvaluate image sizeCalculate image costsEnsuring image qualityPrompt examplesAbout the prompt examplesLimitationsFAQDive deeper into vision\n \n \n\n \n Claude Cookbooks\n\nClaude Cookbooks\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to 
create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n \n \n\n \n Vision\n\nVision\n\n\nClaude can read both text and images in requests. Currently, we support the base64 source type for images, and the image/jpeg, image/png, image/gif, and image/webp media types. 
See our vision guide for more details.\nShell Python TypeScript #!/bin/sh IMAGE_URL = \"https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg\" IMAGE_MEDIA_TYPE = \"image/jpeg\" IMAGE_BASE64 = $( curl \" $IMAGE_URL \" | base64 ) curl https://api.anthropic.com/v1/messages \\ --header \"x-api-key: $ANTHROPIC_API_KEY \" \\ --header \"anthropic-version: 2023-06-01\" \\ --header \"content-type: application/json\" \\ --data \\ '{\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"max_tokens\": 1024,\n \"messages\": [\n {\"role\": \"user\", \"content\": [\n {\"type\": \"image\", \"source\": {\n \"type\": \"base64\",\n \"media_type\": \"' $IMAGE_MEDIA_TYPE '\",\n \"data\": \"' $IMAGE_BASE64 '\"\n }},\n {\"type\": \"text\", \"text\": \"What is in the above image?\"}\n ]}\n ]\n}'\nShellPythonTypeScript\nShellPythonTypeScript\nShell\nShell\n\nPython\nPython\nTypeScript\nTypeScript\n\n#!/bin/sh\n\nIMAGE_URL=\"https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg\"\nIMAGE_MEDIA_TYPE=\"image/jpeg\"\nIMAGE_BASE64=$(curl \"$IMAGE_URL\" | base64)\n\ncurl https://api.anthropic.com/v1/messages \\\n --header \"x-api-key: $ANTHROPIC_API_KEY\" \\\n --header \"anthropic-version: 2023-06-01\" \\\n --header \"content-type: application/json\" \\\n --data \\\n'{\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"max_tokens\": 1024,\n \"messages\": [\n {\"role\": \"user\", \"content\": [\n {\"type\": \"image\", \"source\": {\n \"type\": \"base64\",\n \"media_type\": \"'$IMAGE_MEDIA_TYPE'\",\n \"data\": \"'$IMAGE_BASE64'\"\n }},\n {\"type\": \"text\", \"text\": \"What is in the above image?\"}\n ]}\n ]\n}'\n#!/bin/sh\n\nIMAGE_URL=\"https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg\"\nIMAGE_MEDIA_TYPE=\"image/jpeg\"\nIMAGE_BASE64=$(curl \"$IMAGE_URL\" | base64)\n\ncurl https://api.anthropic.com/v1/messages \\\n --header \"x-api-key: $ANTHROPIC_API_KEY\" \\\n --header \"anthropic-version: 
2023-06-01\" \\\n --header \"content-type: application/json\" \\\n --data \\\n'{\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"max_tokens\": 1024,\n \"messages\": [\n {\"role\": \"user\", \"content\": [\n {\"type\": \"image\", \"source\": {\n \"type\": \"base64\",\n \"media_type\": \"'$IMAGE_MEDIA_TYPE'\",\n \"data\": \"'$IMAGE_BASE64'\"\n }},\n {\"type\": \"text\", \"text\": \"What is in the above image?\"}\n ]}\n ]\n}'\n#!/bin/sh\n\nIMAGE_URL=\"https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg\"\nIMAGE_MEDIA_TYPE=\"image/jpeg\"\nIMAGE_BASE64=$(curl \"$IMAGE_URL\" | base64)\n\ncurl https://api.anthropic.com/v1/messages \\\n --header \"x-api-key: $ANTHROPIC_API_KEY\" \\\n --header \"anthropic-version: 2023-06-01\" \\\n --header \"content-type: application/json\" \\\n --data \\\n'{\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"max_tokens\": 1024,\n \"messages\": [\n {\"role\": \"user\", \"content\": [\n {\"type\": \"image\", \"source\": {\n \"type\": \"base64\",\n \"media_type\": \"'$IMAGE_MEDIA_TYPE'\",\n \"data\": \"'$IMAGE_BASE64'\"\n }},\n {\"type\": \"text\", \"text\": \"What is in the above image?\"}\n ]}\n ]\n}'\n```\n#!/bin/sh\n\nIMAGE_URL=\"https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg\"\nIMAGE_MEDIA_TYPE=\"image/jpeg\"\nIMAGE_BASE64=$(curl \"$IMAGE_URL\" | base64)\n\ncurl https://api.anthropic.com/v1/messages \\\n --header \"x-api-key: $ANTHROPIC_API_KEY\" \\\n --header \"anthropic-version: 2023-06-01\" \\\n --header \"content-type: application/json\" \\\n --data \\\n'{\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"max_tokens\": 1024,\n \"messages\": [\n {\"role\": \"user\", \"content\": [\n {\"type\": \"image\", \"source\": {\n \"type\": \"base64\",\n \"media_type\": \"'$IMAGE_MEDIA_TYPE'\",\n \"data\": \"'$IMAGE_BASE64'\"\n }},\n {\"type\": \"text\", \"text\": \"What is in the above image?\"}\n ]}\n ]\n}'\n\n```\nJSON{\n \"id\": \"msg_01EcyWo6m4hyW8KHs2y2pei5\",\n 
\"type\": \"message\",\n \"role\": \"assistant\",\n \"content\": [\n {\n \"type\": \"text\",\n \"text\": \"This image shows an ant, specifically a close-up view of an ant. The ant is shown in detail, with its distinct head, antennae, and legs clearly visible. The image is focused on capturing the intricate details and features of the ant, likely taken with a macro lens to get an extreme close-up perspective.\"\n }\n ],\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"stop_reason\": \"end_turn\",\n \"stop_sequence\": null,\n \"usage\": {\n \"input_tokens\": 1551,\n \"output_tokens\": 71\n }\n}\nJSON\nJSON\n\n{\n \"id\": \"msg_01EcyWo6m4hyW8KHs2y2pei5\",\n \"type\": \"message\",\n \"role\": \"assistant\",\n \"content\": [\n {\n \"type\": \"text\",\n \"text\": \"This image shows an ant, specifically a close-up view of an ant. The ant is shown in detail, with its distinct head, antennae, and legs clearly visible. The image is focused on capturing the intricate details and features of the ant, likely taken with a macro lens to get an extreme close-up perspective.\"\n }\n ],\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"stop_reason\": \"end_turn\",\n \"stop_sequence\": null,\n \"usage\": {\n \"input_tokens\": 1551,\n \"output_tokens\": 71\n }\n}\n{\n \"id\": \"msg_01EcyWo6m4hyW8KHs2y2pei5\",\n \"type\": \"message\",\n \"role\": \"assistant\",\n \"content\": [\n {\n \"type\": \"text\",\n \"text\": \"This image shows an ant, specifically a close-up view of an ant. The ant is shown in detail, with its distinct head, antennae, and legs clearly visible. 
The image is focused on capturing the intricate details and features of the ant, likely taken with a macro lens to get an extreme close-up perspective.\"\n }\n ],\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"stop_reason\": \"end_turn\",\n \"stop_sequence\": null,\n \"usage\": {\n \"input_tokens\": 1551,\n \"output_tokens\": 71\n }\n}\n```\n{\n \"id\": \"msg_01EcyWo6m4hyW8KHs2y2pei5\",\n \"type\": \"message\",\n \"role\": \"assistant\",\n \"content\": [\n {\n \"type\": \"text\",\n \"text\": \"This image shows an ant, specifically a close-up view of an ant. The ant is shown in detail, with its distinct head, antennae, and legs clearly visible. The image is focused on capturing the intricate details and features of the ant, likely taken with a macro lens to get an extreme close-up perspective.\"\n }\n ],\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"stop_reason\": \"end_turn\",\n \"stop_sequence\": null,\n \"usage\": {\n \"input_tokens\": 1551,\n \"output_tokens\": 71\n }\n}\n\n```\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "Haiku: T-0.0",
"latencyMs": 2075,
"tokenUsage": {
@@ -53680,7 +53680,7 @@
"score": 0,
"namedScores": {},
"text": "The Generated Answer is partially correct but contains some additional information not mentioned in the Correct Answer. The key points from the Correct Answer - the multimodal cookbook with tips and best practices, and the API reference documentation with example API calls involving images - are both accurately included in the Generated Answer. However, the Generated Answer also mentions additional resources like a vision guide, developer community, and support team, which are not specified in the Correct Answer. While these additional resources may exist, they were not part of the original Correct Answer, so we cannot verify their accuracy. Since the Generated Answer includes the critical information from the Correct Answer but also adds unverified information, it should be marked as incorrect.\n---\nAnthropic provides several helpful resources for diving deeper into building with images using Claude:\n\n1. Multimodal cookbook: This contains tips on getting started with images and best practice techniques for ensuring high-quality performance. It includes guidance on how to effectively prompt Claude with images for tasks like interpreting charts or extracting content from forms.\n\n2. API reference: The documentation for the Messages API includes example API calls involving images.\n\n3. Vision guide: This guide provides more details on Claude's image capabilities, including supported image formats and types.\n\n4. Developer community: Anthropic has a developer community where creators can connect with each other and get help from Anthropic experts.\n\n5. Support team: Anthropic's support team is available to answer questions about using Claude with images.\n\nThese resources offer a mix of practical examples, technical documentation, and community support to help developers effectively utilize Claude's image capabilities.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are some helpful resources provided by Anthropic to dive deeper into building with images using Claude?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Dive deeper into vision\n\nDive deeper into vision\n\n\nReady to start building with images using Claude? Here are a few helpful resources:\nMultimodal cookbook: This cookbook has tips on getting started with images and best practice techniques to ensure the highest quality performance with images. See how you can effectively prompt Claude with images to carry out tasks such as interpreting and analyzing charts or extracting content from forms.\nAPI reference: Visit our documentation for the Messages API, including example API calls involving images.\nIf you have any other questions, feel free to reach out to our support team. You can also join our developer community to connect with other creators and get help from Anthropic experts.\nGoogle Sheets add-onTool use (function calling)xlinkedin\nGoogle Sheets add-onTool use (function calling)\nxlinkedin\nHow to use vision Before you upload Evaluate image size Calculate image costs Ensuring image quality Prompt examples About the prompt examples Limitations FAQ Dive deeper into vision\nHow to use visionBefore you uploadEvaluate image sizeCalculate image costsEnsuring image qualityPrompt examplesAbout the prompt examplesLimitationsFAQDive deeper into vision\n \n \n\n \n Claude Cookbook\n\nClaude Cookbook\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to 
create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n \n \n\n \n Vision\n\nVision\n\n\nClaude can read both text and images in requests. Currently, we support the base64 source type for images, and the image/jpeg, image/png, image/gif, and image/webp media types. 
See our vision guide for more details.\nShell Python TypeScript #!/bin/sh IMAGE_URL = \"https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg\" IMAGE_MEDIA_TYPE = \"image/jpeg\" IMAGE_BASE64 = $( curl \" $IMAGE_URL \" | base64 ) curl https://api.anthropic.com/v1/messages \\ --header \"x-api-key: $ANTHROPIC_API_KEY \" \\ --header \"anthropic-version: 2023-06-01\" \\ --header \"content-type: application/json\" \\ --data \\ '{\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"max_tokens\": 1024,\n \"messages\": [\n {\"role\": \"user\", \"content\": [\n {\"type\": \"image\", \"source\": {\n \"type\": \"base64\",\n \"media_type\": \"' $IMAGE_MEDIA_TYPE '\",\n \"data\": \"' $IMAGE_BASE64 '\"\n }},\n {\"type\": \"text\", \"text\": \"What is in the above image?\"}\n ]}\n ]\n}'\nShellPythonTypeScript\nShellPythonTypeScript\nShell\nShell\n\nPython\nPython\nTypeScript\nTypeScript\n\n#!/bin/sh\n\nIMAGE_URL=\"https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg\"\nIMAGE_MEDIA_TYPE=\"image/jpeg\"\nIMAGE_BASE64=$(curl \"$IMAGE_URL\" | base64)\n\ncurl https://api.anthropic.com/v1/messages \\\n --header \"x-api-key: $ANTHROPIC_API_KEY\" \\\n --header \"anthropic-version: 2023-06-01\" \\\n --header \"content-type: application/json\" \\\n --data \\\n'{\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"max_tokens\": 1024,\n \"messages\": [\n {\"role\": \"user\", \"content\": [\n {\"type\": \"image\", \"source\": {\n \"type\": \"base64\",\n \"media_type\": \"'$IMAGE_MEDIA_TYPE'\",\n \"data\": \"'$IMAGE_BASE64'\"\n }},\n {\"type\": \"text\", \"text\": \"What is in the above image?\"}\n ]}\n ]\n}'\n#!/bin/sh\n\nIMAGE_URL=\"https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg\"\nIMAGE_MEDIA_TYPE=\"image/jpeg\"\nIMAGE_BASE64=$(curl \"$IMAGE_URL\" | base64)\n\ncurl https://api.anthropic.com/v1/messages \\\n --header \"x-api-key: $ANTHROPIC_API_KEY\" \\\n --header \"anthropic-version: 
2023-06-01\" \\\n --header \"content-type: application/json\" \\\n --data \\\n'{\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"max_tokens\": 1024,\n \"messages\": [\n {\"role\": \"user\", \"content\": [\n {\"type\": \"image\", \"source\": {\n \"type\": \"base64\",\n \"media_type\": \"'$IMAGE_MEDIA_TYPE'\",\n \"data\": \"'$IMAGE_BASE64'\"\n }},\n {\"type\": \"text\", \"text\": \"What is in the above image?\"}\n ]}\n ]\n}'\n#!/bin/sh\n\nIMAGE_URL=\"https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg\"\nIMAGE_MEDIA_TYPE=\"image/jpeg\"\nIMAGE_BASE64=$(curl \"$IMAGE_URL\" | base64)\n\ncurl https://api.anthropic.com/v1/messages \\\n --header \"x-api-key: $ANTHROPIC_API_KEY\" \\\n --header \"anthropic-version: 2023-06-01\" \\\n --header \"content-type: application/json\" \\\n --data \\\n'{\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"max_tokens\": 1024,\n \"messages\": [\n {\"role\": \"user\", \"content\": [\n {\"type\": \"image\", \"source\": {\n \"type\": \"base64\",\n \"media_type\": \"'$IMAGE_MEDIA_TYPE'\",\n \"data\": \"'$IMAGE_BASE64'\"\n }},\n {\"type\": \"text\", \"text\": \"What is in the above image?\"}\n ]}\n ]\n}'\n```\n#!/bin/sh\n\nIMAGE_URL=\"https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg\"\nIMAGE_MEDIA_TYPE=\"image/jpeg\"\nIMAGE_BASE64=$(curl \"$IMAGE_URL\" | base64)\n\ncurl https://api.anthropic.com/v1/messages \\\n --header \"x-api-key: $ANTHROPIC_API_KEY\" \\\n --header \"anthropic-version: 2023-06-01\" \\\n --header \"content-type: application/json\" \\\n --data \\\n'{\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"max_tokens\": 1024,\n \"messages\": [\n {\"role\": \"user\", \"content\": [\n {\"type\": \"image\", \"source\": {\n \"type\": \"base64\",\n \"media_type\": \"'$IMAGE_MEDIA_TYPE'\",\n \"data\": \"'$IMAGE_BASE64'\"\n }},\n {\"type\": \"text\", \"text\": \"What is in the above image?\"}\n ]}\n ]\n}'\n\n```\nJSON{\n \"id\": \"msg_01EcyWo6m4hyW8KHs2y2pei5\",\n 
\"type\": \"message\",\n \"role\": \"assistant\",\n \"content\": [\n {\n \"type\": \"text\",\n \"text\": \"This image shows an ant, specifically a close-up view of an ant. The ant is shown in detail, with its distinct head, antennae, and legs clearly visible. The image is focused on capturing the intricate details and features of the ant, likely taken with a macro lens to get an extreme close-up perspective.\"\n }\n ],\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"stop_reason\": \"end_turn\",\n \"stop_sequence\": null,\n \"usage\": {\n \"input_tokens\": 1551,\n \"output_tokens\": 71\n }\n}\nJSON\nJSON\n\n{\n \"id\": \"msg_01EcyWo6m4hyW8KHs2y2pei5\",\n \"type\": \"message\",\n \"role\": \"assistant\",\n \"content\": [\n {\n \"type\": \"text\",\n \"text\": \"This image shows an ant, specifically a close-up view of an ant. The ant is shown in detail, with its distinct head, antennae, and legs clearly visible. The image is focused on capturing the intricate details and features of the ant, likely taken with a macro lens to get an extreme close-up perspective.\"\n }\n ],\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"stop_reason\": \"end_turn\",\n \"stop_sequence\": null,\n \"usage\": {\n \"input_tokens\": 1551,\n \"output_tokens\": 71\n }\n}\n{\n \"id\": \"msg_01EcyWo6m4hyW8KHs2y2pei5\",\n \"type\": \"message\",\n \"role\": \"assistant\",\n \"content\": [\n {\n \"type\": \"text\",\n \"text\": \"This image shows an ant, specifically a close-up view of an ant. The ant is shown in detail, with its distinct head, antennae, and legs clearly visible. 
The image is focused on capturing the intricate details and features of the ant, likely taken with a macro lens to get an extreme close-up perspective.\"\n }\n ],\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"stop_reason\": \"end_turn\",\n \"stop_sequence\": null,\n \"usage\": {\n \"input_tokens\": 1551,\n \"output_tokens\": 71\n }\n}\n```\n{\n \"id\": \"msg_01EcyWo6m4hyW8KHs2y2pei5\",\n \"type\": \"message\",\n \"role\": \"assistant\",\n \"content\": [\n {\n \"type\": \"text\",\n \"text\": \"This image shows an ant, specifically a close-up view of an ant. The ant is shown in detail, with its distinct head, antennae, and legs clearly visible. The image is focused on capturing the intricate details and features of the ant, likely taken with a macro lens to get an extreme close-up perspective.\"\n }\n ],\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"stop_reason\": \"end_turn\",\n \"stop_sequence\": null,\n \"usage\": {\n \"input_tokens\": 1551,\n \"output_tokens\": 71\n }\n}\n\n```\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are some helpful resources provided by Anthropic to dive deeper into building with images using Claude?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Dive deeper into vision\n\nDive deeper into vision\n\n\nReady to start building with images using Claude? Here are a few helpful resources:\nMultimodal cookbook: This cookbook has tips on getting started with images and best practice techniques to ensure the highest quality performance with images. See how you can effectively prompt Claude with images to carry out tasks such as interpreting and analyzing charts or extracting content from forms.\nAPI reference: Visit our documentation for the Messages API, including example API calls involving images.\nIf you have any other questions, feel free to reach out to our support team. You can also join our developer community to connect with other creators and get help from Anthropic experts.\nGoogle Sheets add-onTool use (function calling)xlinkedin\nGoogle Sheets add-onTool use (function calling)\nxlinkedin\nHow to use vision Before you upload Evaluate image size Calculate image costs Ensuring image quality Prompt examples About the prompt examples Limitations FAQ Dive deeper into vision\nHow to use visionBefore you uploadEvaluate image sizeCalculate image costsEnsuring image qualityPrompt examplesAbout the prompt examplesLimitationsFAQDive deeper into vision\n \n \n\n \n Claude Cookbooks\n\nClaude Cookbooks\n\n\nDive into practical examples and hands-on tutorials with our collection of Jupyter notebooks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.Tool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.Embeddings with VoyageAIExplore how to 
create and use embeddings with VoyageAI for advanced text similarity and search tasks.\nPDF Upload & SummarizationLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\n\nPDF Upload & Summarization\nLearn how to upload PDFs and have Claude summarize their content, making it easy to digest long documents.\nTool Use & Function CallingDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\n\nTool Use & Function Calling\nDiscover how to extend Claude’s capabilities by integrating external tools and functions into your workflows.\nEmbeddings with VoyageAIExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n\nEmbeddings with VoyageAI\nExplore how to create and use embeddings with VoyageAI for advanced text similarity and search tasks.\n \n \n\n \n Vision\n\nVision\n\n\nClaude can read both text and images in requests. Currently, we support the base64 source type for images, and the image/jpeg, image/png, image/gif, and image/webp media types. 
See our vision guide for more details.\nShell Python TypeScript #!/bin/sh IMAGE_URL = \"https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg\" IMAGE_MEDIA_TYPE = \"image/jpeg\" IMAGE_BASE64 = $( curl \" $IMAGE_URL \" | base64 ) curl https://api.anthropic.com/v1/messages \\ --header \"x-api-key: $ANTHROPIC_API_KEY \" \\ --header \"anthropic-version: 2023-06-01\" \\ --header \"content-type: application/json\" \\ --data \\ '{\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"max_tokens\": 1024,\n \"messages\": [\n {\"role\": \"user\", \"content\": [\n {\"type\": \"image\", \"source\": {\n \"type\": \"base64\",\n \"media_type\": \"' $IMAGE_MEDIA_TYPE '\",\n \"data\": \"' $IMAGE_BASE64 '\"\n }},\n {\"type\": \"text\", \"text\": \"What is in the above image?\"}\n ]}\n ]\n}'\nShellPythonTypeScript\nShellPythonTypeScript\nShell\nShell\n\nPython\nPython\nTypeScript\nTypeScript\n\n#!/bin/sh\n\nIMAGE_URL=\"https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg\"\nIMAGE_MEDIA_TYPE=\"image/jpeg\"\nIMAGE_BASE64=$(curl \"$IMAGE_URL\" | base64)\n\ncurl https://api.anthropic.com/v1/messages \\\n --header \"x-api-key: $ANTHROPIC_API_KEY\" \\\n --header \"anthropic-version: 2023-06-01\" \\\n --header \"content-type: application/json\" \\\n --data \\\n'{\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"max_tokens\": 1024,\n \"messages\": [\n {\"role\": \"user\", \"content\": [\n {\"type\": \"image\", \"source\": {\n \"type\": \"base64\",\n \"media_type\": \"'$IMAGE_MEDIA_TYPE'\",\n \"data\": \"'$IMAGE_BASE64'\"\n }},\n {\"type\": \"text\", \"text\": \"What is in the above image?\"}\n ]}\n ]\n}'\n#!/bin/sh\n\nIMAGE_URL=\"https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg\"\nIMAGE_MEDIA_TYPE=\"image/jpeg\"\nIMAGE_BASE64=$(curl \"$IMAGE_URL\" | base64)\n\ncurl https://api.anthropic.com/v1/messages \\\n --header \"x-api-key: $ANTHROPIC_API_KEY\" \\\n --header \"anthropic-version: 
2023-06-01\" \\\n --header \"content-type: application/json\" \\\n --data \\\n'{\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"max_tokens\": 1024,\n \"messages\": [\n {\"role\": \"user\", \"content\": [\n {\"type\": \"image\", \"source\": {\n \"type\": \"base64\",\n \"media_type\": \"'$IMAGE_MEDIA_TYPE'\",\n \"data\": \"'$IMAGE_BASE64'\"\n }},\n {\"type\": \"text\", \"text\": \"What is in the above image?\"}\n ]}\n ]\n}'\n#!/bin/sh\n\nIMAGE_URL=\"https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg\"\nIMAGE_MEDIA_TYPE=\"image/jpeg\"\nIMAGE_BASE64=$(curl \"$IMAGE_URL\" | base64)\n\ncurl https://api.anthropic.com/v1/messages \\\n --header \"x-api-key: $ANTHROPIC_API_KEY\" \\\n --header \"anthropic-version: 2023-06-01\" \\\n --header \"content-type: application/json\" \\\n --data \\\n'{\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"max_tokens\": 1024,\n \"messages\": [\n {\"role\": \"user\", \"content\": [\n {\"type\": \"image\", \"source\": {\n \"type\": \"base64\",\n \"media_type\": \"'$IMAGE_MEDIA_TYPE'\",\n \"data\": \"'$IMAGE_BASE64'\"\n }},\n {\"type\": \"text\", \"text\": \"What is in the above image?\"}\n ]}\n ]\n}'\n```\n#!/bin/sh\n\nIMAGE_URL=\"https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg\"\nIMAGE_MEDIA_TYPE=\"image/jpeg\"\nIMAGE_BASE64=$(curl \"$IMAGE_URL\" | base64)\n\ncurl https://api.anthropic.com/v1/messages \\\n --header \"x-api-key: $ANTHROPIC_API_KEY\" \\\n --header \"anthropic-version: 2023-06-01\" \\\n --header \"content-type: application/json\" \\\n --data \\\n'{\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"max_tokens\": 1024,\n \"messages\": [\n {\"role\": \"user\", \"content\": [\n {\"type\": \"image\", \"source\": {\n \"type\": \"base64\",\n \"media_type\": \"'$IMAGE_MEDIA_TYPE'\",\n \"data\": \"'$IMAGE_BASE64'\"\n }},\n {\"type\": \"text\", \"text\": \"What is in the above image?\"}\n ]}\n ]\n}'\n\n```\nJSON{\n \"id\": \"msg_01EcyWo6m4hyW8KHs2y2pei5\",\n 
\"type\": \"message\",\n \"role\": \"assistant\",\n \"content\": [\n {\n \"type\": \"text\",\n \"text\": \"This image shows an ant, specifically a close-up view of an ant. The ant is shown in detail, with its distinct head, antennae, and legs clearly visible. The image is focused on capturing the intricate details and features of the ant, likely taken with a macro lens to get an extreme close-up perspective.\"\n }\n ],\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"stop_reason\": \"end_turn\",\n \"stop_sequence\": null,\n \"usage\": {\n \"input_tokens\": 1551,\n \"output_tokens\": 71\n }\n}\nJSON\nJSON\n\n{\n \"id\": \"msg_01EcyWo6m4hyW8KHs2y2pei5\",\n \"type\": \"message\",\n \"role\": \"assistant\",\n \"content\": [\n {\n \"type\": \"text\",\n \"text\": \"This image shows an ant, specifically a close-up view of an ant. The ant is shown in detail, with its distinct head, antennae, and legs clearly visible. The image is focused on capturing the intricate details and features of the ant, likely taken with a macro lens to get an extreme close-up perspective.\"\n }\n ],\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"stop_reason\": \"end_turn\",\n \"stop_sequence\": null,\n \"usage\": {\n \"input_tokens\": 1551,\n \"output_tokens\": 71\n }\n}\n{\n \"id\": \"msg_01EcyWo6m4hyW8KHs2y2pei5\",\n \"type\": \"message\",\n \"role\": \"assistant\",\n \"content\": [\n {\n \"type\": \"text\",\n \"text\": \"This image shows an ant, specifically a close-up view of an ant. The ant is shown in detail, with its distinct head, antennae, and legs clearly visible. 
The image is focused on capturing the intricate details and features of the ant, likely taken with a macro lens to get an extreme close-up perspective.\"\n }\n ],\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"stop_reason\": \"end_turn\",\n \"stop_sequence\": null,\n \"usage\": {\n \"input_tokens\": 1551,\n \"output_tokens\": 71\n }\n}\n```\n{\n \"id\": \"msg_01EcyWo6m4hyW8KHs2y2pei5\",\n \"type\": \"message\",\n \"role\": \"assistant\",\n \"content\": [\n {\n \"type\": \"text\",\n \"text\": \"This image shows an ant, specifically a close-up view of an ant. The ant is shown in detail, with its distinct head, antennae, and legs clearly visible. The image is focused on capturing the intricate details and features of the ant, likely taken with a macro lens to get an extreme close-up perspective.\"\n }\n ],\n \"model\": \"claude-3-5-sonnet-20240620\",\n \"stop_reason\": \"end_turn\",\n \"stop_sequence\": null,\n \"usage\": {\n \"input_tokens\": 1551,\n \"output_tokens\": 71\n }\n}\n\n```\n \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "3.5 Sonnet: T-0.0",
"latencyMs": 4710,
"tokenUsage": {
@@ -54060,7 +54060,7 @@
"score": 1,
"namedScores": {},
"text": "Two key benefits of using the Anthropic Evaluation tool when developing prompts for an AI classification application are:\n\n1. Identifying edge cases where the prompt might falter. The Evaluation tool helps users spot patterns and identify areas where the prompt's performance may be inconsistent or unreliable.\n\n2. Ensuring consistent performance across a range of inputs. By reviewing results across test cases, users can refine their prompts to achieve more reliable and consistent classification performance.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are two key benefits of using the Anthropic Evaluation tool when developing prompts for an AI classification application?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Before prompt engineering\n\nText\n Before prompt engineering\n\n\nThis guide assumes that you have:\nA clear definition of the success criteria for your use case\nSome ways to empirically test against those criteria\nA first draft prompt you want to improve\nIf not, we highly suggest you spend time establishing that first. Check out Define your success criteria and Create strong empirical evaluations for tips and guidance.\nPrompt generatorDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n\nPrompt generator\nDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n \n\nSummary: \n This guide assumes you have a clear definition of success criteria, ways to empirically test against those criteria, and a first draft prompt to improve. If not, it suggests spending time establishing those first, and provides a prompt generator in the Claude Console as a starting point. \n \n\n \n Understanding Results\n\nText\n Understanding Results\n\n\nThe Evaluation tool helps you:\nIdentify edge cases where your prompt might falter\nRate individual results to determine cases where your prompt performance better or worse\nEnsure consistent performance across a range of inputs\nRefine your prompt for better reliability\nBy reviewing results across test cases, you can spot patterns and make informed adjustments to your prompt.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! 
If you encounter any issues or have suggestions, please reach out to the Anthropic team.\n\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nStart evaluating your prompts today to build more robust AI applications with Claude!\nReducing latencyGlossaryxlinkedin\nReducing latencyGlossary\nxlinkedin\nAccessing the Evaluate Feature Creating Test Cases Tips for Effective Evaluation Understanding Results\nAccessing the Evaluate FeatureCreating Test CasesTips for Effective EvaluationUnderstanding Results\n \n\nSummary: \n The Evaluation tool helps users identify edge cases, rate individual results, ensure consistent performance, and refine prompts for better reliability. By reviewing results across test cases, users can spot patterns and make informed adjustments to their prompts. The Evaluation tool is currently in beta, and user feedback is valuable for the Anthropic team. \n \n\n \n More Resources\n\nText\n More Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. 
Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbook More Resources\nText capabilities and use casesClaude CookbookMore Resources\n \n\nSummary: \n The Claude Documentation provides a Prompt Engineering Guide to help users master the art of prompt crafting, a Prompt Library with pre-crafted prompts for various tasks, and API Documentation for interacting with the Claude AI model. These resources are designed to help users get the most out of the Claude model, particularly for fine-tuning with legacy models. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are two key benefits of using the Anthropic Evaluation tool when developing prompts for an AI classification application?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Before prompt engineering\n\nText\n Before prompt engineering\n\n\nThis guide assumes that you have:\nA clear definition of the success criteria for your use case\nSome ways to empirically test against those criteria\nA first draft prompt you want to improve\nIf not, we highly suggest you spend time establishing that first. Check out Define your success criteria and Create strong empirical evaluations for tips and guidance.\nPrompt generatorDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n\nPrompt generator\nDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n \n\nSummary: \n This guide assumes you have a clear definition of success criteria, ways to empirically test against those criteria, and a first draft prompt to improve. If not, it suggests spending time establishing those first, and provides a prompt generator in the Claude Console as a starting point. \n \n\n \n Understanding Results\n\nText\n Understanding Results\n\n\nThe Evaluation tool helps you:\nIdentify edge cases where your prompt might falter\nRate individual results to determine cases where your prompt performance better or worse\nEnsure consistent performance across a range of inputs\nRefine your prompt for better reliability\nBy reviewing results across test cases, you can spot patterns and make informed adjustments to your prompt.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! 
If you encounter any issues or have suggestions, please reach out to the Anthropic team.\n\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nStart evaluating your prompts today to build more robust AI applications with Claude!\nReducing latencyGlossaryxlinkedin\nReducing latencyGlossary\nxlinkedin\nAccessing the Evaluate Feature Creating Test Cases Tips for Effective Evaluation Understanding Results\nAccessing the Evaluate FeatureCreating Test CasesTips for Effective EvaluationUnderstanding Results\n \n\nSummary: \n The Evaluation tool helps users identify edge cases, rate individual results, ensure consistent performance, and refine prompts for better reliability. By reviewing results across test cases, users can spot patterns and make informed adjustments to their prompts. The Evaluation tool is currently in beta, and user feedback is valuable for the Anthropic team. \n \n\n \n More Resources\n\nText\n More Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. 
Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbooks More Resources\nText capabilities and use casesClaude CookbooksMore Resources\n \n\nSummary: \n The Claude Documentation provides a Prompt Engineering Guide to help users master the art of prompt crafting, a Prompt Library with pre-crafted prompts for various tasks, and API Documentation for interacting with the Claude AI model. These resources are designed to help users get the most out of the Claude model, particularly for fine-tuning with legacy models. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "Haiku: T-0.0",
"latencyMs": 1261,
"tokenUsage": {
@@ -54098,7 +54098,7 @@
"score": 1,
"namedScores": {},
"text": "Based on the provided context, two key benefits of using the Anthropic Evaluation tool when developing prompts for an AI classification application are:\n\n1. Identifying edge cases where your prompt might falter, allowing you to refine the prompt to handle a wider range of inputs more reliably.\n\n2. Ensuring consistent performance across a range of inputs, which helps build more robust AI applications.\n\nThe tool also allows users to rate individual results and spot patterns across test cases, enabling informed adjustments to improve prompt reliability and performance.",
- "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are two key benefits of using the Anthropic Evaluation tool when developing prompts for an AI classification application?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Before prompt engineering\n\nText\n Before prompt engineering\n\n\nThis guide assumes that you have:\nA clear definition of the success criteria for your use case\nSome ways to empirically test against those criteria\nA first draft prompt you want to improve\nIf not, we highly suggest you spend time establishing that first. Check out Define your success criteria and Create strong empirical evaluations for tips and guidance.\nPrompt generatorDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n\nPrompt generator\nDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n \n\nSummary: \n This guide assumes you have a clear definition of success criteria, ways to empirically test against those criteria, and a first draft prompt to improve. If not, it suggests spending time establishing those first, and provides a prompt generator in the Claude Console as a starting point. \n \n\n \n Understanding Results\n\nText\n Understanding Results\n\n\nThe Evaluation tool helps you:\nIdentify edge cases where your prompt might falter\nRate individual results to determine cases where your prompt performance better or worse\nEnsure consistent performance across a range of inputs\nRefine your prompt for better reliability\nBy reviewing results across test cases, you can spot patterns and make informed adjustments to your prompt.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! 
If you encounter any issues or have suggestions, please reach out to the Anthropic team.\n\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nStart evaluating your prompts today to build more robust AI applications with Claude!\nReducing latencyGlossaryxlinkedin\nReducing latencyGlossary\nxlinkedin\nAccessing the Evaluate Feature Creating Test Cases Tips for Effective Evaluation Understanding Results\nAccessing the Evaluate FeatureCreating Test CasesTips for Effective EvaluationUnderstanding Results\n \n\nSummary: \n The Evaluation tool helps users identify edge cases, rate individual results, ensure consistent performance, and refine prompts for better reliability. By reviewing results across test cases, users can spot patterns and make informed adjustments to their prompts. The Evaluation tool is currently in beta, and user feedback is valuable for the Anthropic team. \n \n\n \n More Resources\n\nText\n More Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. 
Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbook More Resources\nText capabilities and use casesClaude CookbookMore Resources\n \n\nSummary: \n The Claude Documentation provides a Prompt Engineering Guide to help users master the art of prompt crafting, a Prompt Library with pre-crafted prompts for various tasks, and API Documentation for interacting with the Claude AI model. These resources are designed to help users get the most out of the Claude model, particularly for fine-tuning with legacy models. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
+ "prompt": "\n You have been tasked with helping us to answer the following query: \n \n What are two key benefits of using the Anthropic Evaluation tool when developing prompts for an AI classification application?\n \n You have access to the following documents which are meant to provide context as you answer the query:\n \n \n \n Before prompt engineering\n\nText\n Before prompt engineering\n\n\nThis guide assumes that you have:\nA clear definition of the success criteria for your use case\nSome ways to empirically test against those criteria\nA first draft prompt you want to improve\nIf not, we highly suggest you spend time establishing that first. Check out Define your success criteria and Create strong empirical evaluations for tips and guidance.\nPrompt generatorDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n\nPrompt generator\nDon’t have a first draft prompt? Try the prompt generator in the Claude Console!\n \n\nSummary: \n This guide assumes you have a clear definition of success criteria, ways to empirically test against those criteria, and a first draft prompt to improve. If not, it suggests spending time establishing those first, and provides a prompt generator in the Claude Console as a starting point. \n \n\n \n Understanding Results\n\nText\n Understanding Results\n\n\nThe Evaluation tool helps you:\nIdentify edge cases where your prompt might falter\nRate individual results to determine cases where your prompt performance better or worse\nEnsure consistent performance across a range of inputs\nRefine your prompt for better reliability\nBy reviewing results across test cases, you can spot patterns and make informed adjustments to your prompt.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! 
If you encounter any issues or have suggestions, please reach out to the Anthropic team.\n\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nRemember that the Evaluation tool is in beta. Your feedback is valuable! If you encounter any issues or have suggestions, please reach out to the Anthropic team.\nStart evaluating your prompts today to build more robust AI applications with Claude!\nReducing latencyGlossaryxlinkedin\nReducing latencyGlossary\nxlinkedin\nAccessing the Evaluate Feature Creating Test Cases Tips for Effective Evaluation Understanding Results\nAccessing the Evaluate FeatureCreating Test CasesTips for Effective EvaluationUnderstanding Results\n \n\nSummary: \n The Evaluation tool helps users identify edge cases, rate individual results, ensure consistent performance, and refine prompts for better reliability. By reviewing results across test cases, users can spot patterns and make informed adjustments to their prompts. The Evaluation tool is currently in beta, and user feedback is valuable for the Anthropic team. \n \n\n \n More Resources\n\nText\n More Resources\n\n\nFrom crafting the perfect prompt to understanding API details, we’ve got you covered.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.Prompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.API DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nPrompt Engineering GuideMaster the art of prompt crafting to get the most out of Claude. Especially useful for fine-tuning with legacy models.\n\nPrompt Engineering Guide\nMaster the art of prompt crafting to get the most out of Claude. 
Especially useful for fine-tuning with legacy models.\nPrompt LibraryFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\n\nPrompt Library\nFind a wide range of pre-crafted prompts for various tasks and industries. Perfect for inspiration or quick starts.\nAPI DocumentationEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\n\nAPI Documentation\nEverything you need to interact with Claude via our API: request formats, response handling, and troubleshooting.\nLong context tipsEmbeddingsxlinkedin\nLong context tipsEmbeddings\nxlinkedin\nText capabilities and use cases Claude Cookbooks More Resources\nText capabilities and use casesClaude CookbooksMore Resources\n \n\nSummary: \n The Claude Documentation provides a Prompt Engineering Guide to help users master the art of prompt crafting, a Prompt Library with pre-crafted prompts for various tasks, and API Documentation for interacting with the Claude AI model. These resources are designed to help users get the most out of the Claude model, particularly for fine-tuning with legacy models. \n \n\n \n Please remain faithful to the underlying context, and only deviate from it if you are 100% sure that you know the answer already. \n Answer the question now, and avoid providing preamble such as 'Here is the answer', etc\n ",
"provider": "3.5 Sonnet: T-0.0",
"latencyMs": 2774,
"tokenUsage": {
@@ -55616,7 +55616,7 @@
{
"vars": {
"query": "What are two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings?",
- "correct_answer": "The Claude Cookbook provides interactive Jupyter notebooks demonstrating how to upload PDFs, generate embeddings, and more. The Developer Console offers a prompt generator tool for easier, more powerful prompting."
+ "correct_answer": "The Claude Cookbooks provides interactive Jupyter notebooks demonstrating how to upload PDFs, generate embeddings, and more. The Developer Console offers a prompt generator tool for easier, more powerful prompting."
},
"assert": [
{
@@ -56049,8 +56049,8 @@
},
{
"vars": {
- "query": "What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?",
- "correct_answer": "The Claude Cookbook provides interactive Jupyter notebooks that demonstrate how to upload PDFs and work with embeddings to help developers learn to use Anthropic's APIs."
+ "query": "What are two ways the Claude Cookbooks can help developers learn to use Anthropic's APIs?",
+ "correct_answer": "The Claude Cookbooks provides interactive Jupyter notebooks that demonstrate how to upload PDFs and work with embeddings to help developers learn to use Anthropic's APIs."
},
"assert": [
{
diff --git a/skills/retrieval_augmented_generation/data/retrieval_results.json b/skills/retrieval_augmented_generation/data/retrieval_results.json
index 7bad6f4..23f35ef 100644
--- a/skills/retrieval_augmented_generation/data/retrieval_results.json
+++ b/skills/retrieval_augmented_generation/data/retrieval_results.json
@@ -28819,11 +28819,11 @@
"id": "python:provider_retrieval.py:retrieve_base"
},
"prompt": {
- "raw": "What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?",
+ "raw": "What are two ways the Claude Cookbooks can help developers learn to use Anthropic's APIs?",
"label": "{{ query }}"
},
"vars": {
- "query": "What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?",
+ "query": "What are two ways the Claude Cookbooks can help developers learn to use Anthropic's APIs?",
"correct_chunks": "[\"https://docs.claude.com/en/docs/welcome#develop-with-claude\",\"https://docs.claude.com/en/docs/quickstart#next-steps\"]"
},
"response": {
@@ -28948,11 +28948,11 @@
"id": "python:provider_retrieval.py:retrieve_level_two"
},
"prompt": {
- "raw": "What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?",
+ "raw": "What are two ways the Claude Cookbooks can help developers learn to use Anthropic's APIs?",
"label": "{{ query }}"
},
"vars": {
- "query": "What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?",
+ "query": "What are two ways the Claude Cookbooks can help developers learn to use Anthropic's APIs?",
"correct_chunks": "[\"https://docs.claude.com/en/docs/welcome#develop-with-claude\",\"https://docs.claude.com/en/docs/quickstart#next-steps\"]"
},
"response": {
@@ -29851,11 +29851,11 @@
"id": "python:provider_retrieval.py:retrieve_level_three"
},
"prompt": {
- "raw": "What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?",
+ "raw": "What are two ways the Claude Cookbooks can help developers learn to use Anthropic's APIs?",
"label": "{{ query }}"
},
"vars": {
- "query": "What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?",
+ "query": "What are two ways the Claude Cookbooks can help developers learn to use Anthropic's APIs?",
"correct_chunks": "[\"https://docs.claude.com/en/docs/welcome#develop-with-claude\",\"https://docs.claude.com/en/docs/quickstart#next-steps\"]"
},
"response": {
@@ -66207,7 +66207,7 @@
"score": 0.8,
"namedScores": {},
"text": "[\"https://docs.claude.com/en/docs/welcome#develop-with-claude\",\"https://docs.claude.com/en/docs/quickstart#next-steps\",\"https://docs.claude.com/en/docs/build-with-claude/text-generation#anthropic-cookbook\"]",
- "prompt": "What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?",
+ "prompt": "What are two ways the Claude Cookbooks can help developers learn to use Anthropic's APIs?",
"provider": "python:provider_retrieval.py:retrieve_base",
"latencyMs": 1373,
"gradingResult": {
@@ -66322,7 +66322,7 @@
"score": 0.8,
"namedScores": {},
"text": "[\"https://docs.claude.com/en/docs/quickstart#next-steps\",\"https://docs.claude.com/en/api/#accessing-the-api\",\"https://docs.claude.com/en/docs/welcome#develop-with-claude\"]",
- "prompt": "What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?",
+ "prompt": "What are two ways the Claude Cookbooks can help developers learn to use Anthropic's APIs?",
"provider": "python:provider_retrieval.py:retrieve_level_two",
"latencyMs": 1494,
"gradingResult": {
@@ -66437,7 +66437,7 @@
"score": 0.8,
"namedScores": {},
"text": "[\"https://docs.claude.com/en/docs/build-with-claude/text-generation#anthropic-cookbook\",\"https://docs.claude.com/en/docs/quickstart#next-steps\",\"https://docs.claude.com/en/docs/welcome#develop-with-claude\"]",
- "prompt": "What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?",
+ "prompt": "What are two ways the Claude Cookbooks can help developers learn to use Anthropic's APIs?",
"provider": "python:provider_retrieval.py:retrieve_level_three",
"latencyMs": 4931,
"gradingResult": {
@@ -66550,7 +66550,7 @@
],
"test": {
"vars": {
- "query": "What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?",
+ "query": "What are two ways the Claude Cookbooks can help developers learn to use Anthropic's APIs?",
"correct_chunks": "[\"https://docs.claude.com/en/docs/welcome#develop-with-claude\",\"https://docs.claude.com/en/docs/quickstart#next-steps\"]"
},
"assert": [
@@ -66564,7 +66564,7 @@
},
"vars": [
"[\"https://docs.claude.com/en/docs/welcome#develop-with-claude\",\"https://docs.claude.com/en/docs/quickstart#next-steps\"]",
- "What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?"
+ "What are two ways the Claude Cookbooks can help developers learn to use Anthropic's APIs?"
]
},
{
@@ -76465,7 +76465,7 @@
},
{
"vars": {
- "query": "What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?",
+ "query": "What are two ways the Claude Cookbooks can help developers learn to use Anthropic's APIs?",
"correct_chunks": "[\"https://docs.claude.com/en/docs/welcome#develop-with-claude\",\"https://docs.claude.com/en/docs/quickstart#next-steps\"]"
},
"assert": [
diff --git a/skills/retrieval_augmented_generation/evaluation/csvs/evaluation_results_detailed.csv b/skills/retrieval_augmented_generation/evaluation/csvs/evaluation_results_detailed.csv
index f6cc622..de89142 100644
--- a/skills/retrieval_augmented_generation/evaluation/csvs/evaluation_results_detailed.csv
+++ b/skills/retrieval_augmented_generation/evaluation/csvs/evaluation_results_detailed.csv
@@ -74,7 +74,7 @@ How do the streaming API delta formats differ between tool_use content blocks an
What are the image file size limits when uploading images to Claude using the API versus on claude.ai?,0.3333333333333333,1.0,1.0,True
What is one key consideration when selecting a Claude model for an enterprise use case that needs low latency?,0.6666666666666666,1.0,0.5,True
"What embedding model does Anthropic recommend for code retrieval, and how does its performance compare to alternatives according to Voyage AI?",0.6666666666666666,1.0,1.0,True
-What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?,0.6666666666666666,1.0,1.0,True
+What are two ways the Claude Cookbooks can help developers learn to use Anthropic's APIs?,0.6666666666666666,1.0,1.0,True
How does the size of the context window impact a language model's ability to utilize retrieval augmented generation (RAG)?,0.6666666666666666,1.0,1.0,True
How can the Evaluation tool in Anthropic's Claude platform help improve prompts and build more robust AI applications?,0.3333333333333333,0.5,1.0,True
Which Claude model has the fastest comparative latency according to the comparison tables?,0.6666666666666666,1.0,1.0,True
diff --git a/skills/retrieval_augmented_generation/evaluation/csvs/evaluation_results_detailed_level_three.csv b/skills/retrieval_augmented_generation/evaluation/csvs/evaluation_results_detailed_level_three.csv
index 8e87b18..c7e6d82 100644
--- a/skills/retrieval_augmented_generation/evaluation/csvs/evaluation_results_detailed_level_three.csv
+++ b/skills/retrieval_augmented_generation/evaluation/csvs/evaluation_results_detailed_level_three.csv
@@ -74,7 +74,7 @@ How do the streaming API delta formats differ between tool_use content blocks an
What are the image file size limits when uploading images to Claude using the API versus on claude.ai?,0.3333333333333333,1.0,1.0,True
What is one key consideration when selecting a Claude model for an enterprise use case that needs low latency?,0.3333333333333333,0.5,0.5,True
"What embedding model does Anthropic recommend for code retrieval, and how does its performance compare to alternatives according to Voyage AI?",0.3333333333333333,0.5,1.0,False
-What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?,0.6666666666666666,1.0,0.5,False
+What are two ways the Claude Cookbooks can help developers learn to use Anthropic's APIs?,0.6666666666666666,1.0,0.5,False
How does the size of the context window impact a language model's ability to utilize retrieval augmented generation (RAG)?,0.6666666666666666,1.0,1.0,True
How can the Evaluation tool in Anthropic's Claude platform help improve prompts and build more robust AI applications?,0.3333333333333333,0.5,1.0,True
Which Claude model has the fastest comparative latency according to the comparison tables?,0.0,0.0,0.0,True
diff --git a/skills/retrieval_augmented_generation/evaluation/csvs/evaluation_results_detailed_level_two.csv b/skills/retrieval_augmented_generation/evaluation/csvs/evaluation_results_detailed_level_two.csv
index cb6ba10..4aaf792 100644
--- a/skills/retrieval_augmented_generation/evaluation/csvs/evaluation_results_detailed_level_two.csv
+++ b/skills/retrieval_augmented_generation/evaluation/csvs/evaluation_results_detailed_level_two.csv
@@ -74,7 +74,7 @@ How do the streaming API delta formats differ between tool_use content blocks an
What are the image file size limits when uploading images to Claude using the API versus on claude.ai?,0.3333333333333333,1.0,1.0,True
What is one key consideration when selecting a Claude model for an enterprise use case that needs low latency?,0.6666666666666666,1.0,1.0,True
"What embedding model does Anthropic recommend for code retrieval, and how does its performance compare to alternatives according to Voyage AI?",0.6666666666666666,1.0,1.0,True
-What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?,0.6666666666666666,1.0,1.0,True
+What are two ways the Claude Cookbooks can help developers learn to use Anthropic's APIs?,0.6666666666666666,1.0,1.0,True
How does the size of the context window impact a language model's ability to utilize retrieval augmented generation (RAG)?,0.6666666666666666,1.0,1.0,True
How can the Evaluation tool in Anthropic's Claude platform help improve prompts and build more robust AI applications?,0.3333333333333333,0.5,1.0,True
Which Claude model has the fastest comparative latency according to the comparison tables?,0.3333333333333333,0.5,1.0,True
diff --git a/skills/retrieval_augmented_generation/evaluation/csvs/evaluation_results_detailed_one.csv b/skills/retrieval_augmented_generation/evaluation/csvs/evaluation_results_detailed_one.csv
index 38ecb55..5d37142 100644
--- a/skills/retrieval_augmented_generation/evaluation/csvs/evaluation_results_detailed_one.csv
+++ b/skills/retrieval_augmented_generation/evaluation/csvs/evaluation_results_detailed_one.csv
@@ -74,7 +74,7 @@ How do the streaming API delta formats differ between tool_use content blocks an
What are the image file size limits when uploading images to Claude using the API versus on claude.ai?,0.3333333333333333,1.0,1.0,True
What is one key consideration when selecting a Claude model for an enterprise use case that needs low latency?,0.6666666666666666,1.0,0.5,True
"What embedding model does Anthropic recommend for code retrieval, and how does its performance compare to alternatives according to Voyage AI?",0.6666666666666666,1.0,1.0,True
-What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?,0.6666666666666666,1.0,1.0,False
+What are two ways the Claude Cookbooks can help developers learn to use Anthropic's APIs?,0.6666666666666666,1.0,1.0,False
How does the size of the context window impact a language model's ability to utilize retrieval augmented generation (RAG)?,0.6666666666666666,1.0,1.0,True
How can the Evaluation tool in Anthropic's Claude platform help improve prompts and build more robust AI applications?,0.3333333333333333,0.5,1.0,True
Which Claude model has the fastest comparative latency according to the comparison tables?,0.6666666666666666,1.0,1.0,True
diff --git a/skills/retrieval_augmented_generation/evaluation/docs_evaluation_dataset.json b/skills/retrieval_augmented_generation/evaluation/docs_evaluation_dataset.json
index fd36743..563b1d6 100644
--- a/skills/retrieval_augmented_generation/evaluation/docs_evaluation_dataset.json
+++ b/skills/retrieval_augmented_generation/evaluation/docs_evaluation_dataset.json
@@ -399,7 +399,7 @@
"https://docs.claude.com/en/docs/quickstart#next-steps",
"https://docs.claude.com/en/docs/welcome#develop-with-claude"
],
- "correct_answer": "The Claude Cookbook provides interactive Jupyter notebooks demonstrating how to upload PDFs, generate embeddings, and more. The Developer Console offers a prompt generator tool for easier, more powerful prompting."
+ "correct_answer": "The Claude Cookbooks provides interactive Jupyter notebooks demonstrating how to upload PDFs, generate embeddings, and more. The Developer Console offers a prompt generator tool for easier, more powerful prompting."
},
{
"id": "c417a6d5",
@@ -668,12 +668,12 @@
},
{
"id": "142b8567",
- "question": "What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?",
+ "question": "What are two ways the Claude Cookbooks can help developers learn to use Anthropic's APIs?",
"correct_chunks": [
"https://docs.claude.com/en/docs/welcome#develop-with-claude",
"https://docs.claude.com/en/docs/quickstart#next-steps"
],
- "correct_answer": "The Claude Cookbook provides interactive Jupyter notebooks that demonstrate how to upload PDFs and work with embeddings to help developers learn to use Anthropic's APIs."
+ "correct_answer": "The Claude Cookbooks provides interactive Jupyter notebooks that demonstrate how to upload PDFs and work with embeddings to help developers learn to use Anthropic's APIs."
},
{
"id": "79f3daa2",
diff --git a/skills/retrieval_augmented_generation/evaluation/promptfoo_datasets/end_to_end_dataset.csv b/skills/retrieval_augmented_generation/evaluation/promptfoo_datasets/end_to_end_dataset.csv
index 585c288..ce999d8 100644
--- a/skills/retrieval_augmented_generation/evaluation/promptfoo_datasets/end_to_end_dataset.csv
+++ b/skills/retrieval_augmented_generation/evaluation/promptfoo_datasets/end_to_end_dataset.csv
@@ -1,101 +1,101 @@
-query,correct_answer,__expected
-"How can you create multiple test cases for an evaluation in the Anthropic Evaluation tool?","To create multiple test cases in the Anthropic Evaluation tool, click the 'Add Test Case' button, fill in values for each variable in your prompt, and repeat the process to create additional test case scenarios.","python:file://eval_end_to_end.py"
-"What embeddings provider does Anthropic recommend for customized domain-specific models, and what capabilities does this provider offer?","Anthropic recommends Voyage AI for embedding models. Voyage AI offers customized models for specific industry domains like finance and healthcare, as well as bespoke fine-tuned models for individual customers. They have a wide variety of options and capabilities.","python:file://eval_end_to_end.py"
-"What are some key success metrics to consider when evaluating Claude's performance on a classification task, and how do they relate to choosing the right model to reduce latency?","When evaluating Claude's performance on a classification task, some key success metrics to consider include accuracy, F1 score, consistency, structure, speed, bias and fairness. Choosing the right model that fits your specific requirements in terms of speed and output quality is a straightforward way to reduce latency and meet the acceptable response time for your use case.","python:file://eval_end_to_end.py"
-"What are two ways that Claude for Sheets can improve prompt engineering workflows compared to using chained prompts?","Claude for Sheets enables testing prompts across evaluation suites in parallel, which is faster than running chained prompts sequentially. It also excels at office tasks like survey analysis and online data processing that may be more cumbersome with chained prompts.","python:file://eval_end_to_end.py"
-"What happens if a prompt for the Text Completions API is missing the ""\n\nHuman:"" and ""\n\nAssistant:"" turns?","If a prompt for the Text Completions API is missing the required ""\n\nHuman:"" and ""\n\nAssistant:"" turns, it will result in an API error.","python:file://eval_end_to_end.py"
-"How do the additional tokens required for tool use in Claude API requests impact pricing compared to regular API requests?","Tool use requests in the Claude API are priced the same as regular API requests, based on the total input and output tokens. However, tool use requests have additional tokens beyond the regular input and output, including the tools parameter, tool use content blocks, tool result content blocks, and a special system prompt that enables tool use, which add to the total tokens and cost.","python:file://eval_end_to_end.py"
-"When will the new Anthropic Developer Console features that show API usage, billing details, and rate limits be available?","The new Usage, Cost, and Rate Limits tabs in the Anthropic Developer Console that show API usage, billing details, and current rate limits will be available on June 27th, 2024.","python:file://eval_end_to_end.py"
-"When deciding whether to use chain-of-thought (CoT) for a task, what are two key factors to consider in order to strike the right balance between performance and latency?","When deciding whether to use CoT, consider if the task requires in-depth thinking that a human would need to work through, and be aware that the increased output length from CoT may impact latency.","python:file://eval_end_to_end.py"
-"How can I use Claude to more easily digest the content of long PDF documents?","You can upload PDFs and have Claude summarize their content, making it easier to understand the key points of long documents without having to read through everything.","python:file://eval_end_to_end.py"
-"According to the documentation, where can you view your organization's current API rate limits in the Claude Console?","You can view your organization's current API rate limits in the Rate Limits tab of the Developer Console.","python:file://eval_end_to_end.py"
-"How can we measure the performance of the ticket classification system implemented using Claude beyond just accuracy?","In addition to accuracy, we can measure the 95th percentile response time and average cost per classification to assess the ticket classification system's performance and production-readiness.","python:file://eval_end_to_end.py"
-"How can you specify a system prompt using the Text Completions API versus the Messages API?","With the Text Completions API, the system prompt is added as text before the first ""\n\nHuman:"" turn. With the Messages API, the system prompt is specified using the separate ""system"" parameter when making the API request.","python:file://eval_end_to_end.py"
-"How can you combine XML tags with chain of thought reasoning to create high-performance prompts for Claude?","You can combine XML tags like and with chain of thought reasoning, where Claude explains its step-by-step reasoning process, to create structured, high-performance prompts. For example, you can prompt Claude to show its reasoning by including ""Before answering, explain your reasoning step-by-step in tags."" in the user message or system prompt.","python:file://eval_end_to_end.py"
-"When evaluating the Claude model's performance for ticket routing, what three key metrics are calculated and what are the results for the claude-3-haiku-20240307 model on the 91 test samples?","When evaluating the claude-3-haiku-20240307 model's performance on the 91 test samples, the three key metrics calculated are accuracy (89.01%), 95th percentile response time (1.61 seconds), and average cost per request routing ($0.0004).","python:file://eval_end_to_end.py"
-"Before starting to engineer and improve a prompt in Claude, what key things does Anthropic recommend you have in place first?","Before prompt engineering, Anthropic highly recommends having a clear definition of success criteria for your use case, some ways to empirically test against those criteria, and a first draft prompt you want to improve.","python:file://eval_end_to_end.py"
-"How does the Messages API handle mid-response prompting compared to the Text Completions API?","The Messages API allows you to continue a response by making the last input message have the ""assistant"" role, whereas the Text Completions API lets you pre-fill part of Claude's response directly in the prompt string.","python:file://eval_end_to_end.py"
-"How does Claude's response differ when given a role through a system prompt compared to not having a specific role in the financial analysis example?","When given the role of CFO through a system prompt, Claude provides a much more insightful, structured, and actionable financial analysis compared to not having a specific role. The role-based response breaks down key financial metrics, provides strategic commentary, and makes specific recommendations.","python:file://eval_end_to_end.py"
-"What are some quantitative metrics that can be used to measure the success of a sentiment analysis model, and how might specific targets for those metrics be determined?","Quantitative metrics for evaluating a sentiment analysis model include task-specific metrics like F1 score, as well as generic metrics like accuracy, precision, and recall. Specific targets should be based on industry benchmarks, prior experiments, AI research, or expert knowledge, and should represent an improvement over the current baseline.","python:file://eval_end_to_end.py"
-"What is a power user tip mentioned in the documentation for creating high-performance prompts using XML tags?","Combining XML tags with other prompt engineering techniques like multishot prompting (using tags) or chain of thought (using and tags) to create super-structured, high-performance prompts.","python:file://eval_end_to_end.py"
-"How can you use an LLM like Claude to automatically grade the outputs of other LLMs based on a rubric?","You can use an LLM like Claude to grade the outputs of other LLMs by providing it with the output to grade along with a detailed rubric. Instruct the LLM to think through its reasoning and then output a simple 'correct' or 'incorrect' result based on how well the output matches the criteria in the rubric.","python:file://eval_end_to_end.py"
-"How can you access and deploy Voyage embeddings on AWS Marketplace?","To access Voyage embeddings on AWS, subscribe to the model package on AWS Marketplace, select the model to deploy, agree to the terms, and copy the Product ARN for your selected region. Then create a JupyterLab space in SageMaker Studio, upload Voyage's notebook, and follow the instructions to deploy the model package using the ARN.","python:file://eval_end_to_end.py"
-"When using tools just to get Claude to produce JSON output following a particular schema, what key things should you do in terms of tool setup and prompting?","When using tools to get JSON output, you should provide a single tool, set the tool_choice to explicitly instruct the model to use that tool, and ensure the tool name and description are from the model's perspective since it will pass the input to the tool.","python:file://eval_end_to_end.py"
-"What are the key differences between the legacy Claude Instant 1.2 model and the Claude 3 Haiku model in terms of capabilities and performance?","The Claude 3 Haiku model has vision capabilities, is faster, more performant, and more intelligent than the legacy Claude Instant 1.2 model. Claude 3 Haiku also has more up-to-date training data.","python:file://eval_end_to_end.py"
-"What is one key benefit of using examples when prompt engineering with Claude?","One key benefit of using examples in prompts is that they reduce misinterpretation of instructions, leading to more accurate outputs from Claude.","python:file://eval_end_to_end.py"
-"According to the Claude Documentation, what is one key advantage of using prompt engineering instead of fine-tuning when it comes to adapting an AI model to new domains or tasks?","Prompt engineering allows you to easily adapt AI models to new domains by providing domain-specific context directly in the prompts, without needing to retrain the model through fine-tuning.","python:file://eval_end_to_end.py"
-"How can I quickly get started using the Claude for Sheets extension with a pre-made template?","You can make a copy of Anthropic's provided Claude for Sheets workbook template to quickly get started using the extension with your own work.","python:file://eval_end_to_end.py"
-"How does the ""index"" field in the ""content_block_delta"" event relate to the text being streamed in a response?","The ""index"" field in each ""content_block_delta"" event indicates which content block the text delta applies to. Multiple deltas with the same index consecutively stream the text for a single content block in the response.","python:file://eval_end_to_end.py"
-"How can you include an image as part of a Claude API request, and what image formats are currently supported?","To include an image in a Claude API request, provide it as a base64-encoded image in an ""image"" content block within the ""messages"" array. The currently supported image formats are JPEG, PNG, GIF, and WebP.","python:file://eval_end_to_end.py"
-"What is the relationship between time to first token (TTFT) and latency when evaluating a language model's performance?","TTFT is a specific measure of latency that captures the time it takes for a language model to generate the first token of its response after receiving a prompt. It is an important component of a model's overall latency and responsiveness, especially for interactive applications.","python:file://eval_end_to_end.py"
-"How can providing Claude with examples of handling certain edge cases like implicit requests or emotional prioritization help improve its performance in routing support tickets?","Providing edge case examples to Claude in the prompt can meaningfully improve its performance in correctly routing support tickets in scenarios where it may otherwise misclassify them, such as implicit requests, emotional prioritization, ambiguous intent vs. routing, or issue prioritization.","python:file://eval_end_to_end.py"
-"How does the stop_reason of ""tool_use"" relate to the overall workflow of integrating external tools with Claude?","When Claude determines that one of the user-provided tools can help answer the user's query, it constructs a tool use request. This causes the API response to have a stop_reason of ""tool_use"", signaling Claude's intent to use the tool. The user must then extract the tool input from Claude's request, run the actual tool code client-side, and continue the conversation by sending the tool results back to Claude.","python:file://eval_end_to_end.py"
-"According to the documentation, what error event and corresponding HTTP error code may be sent during periods of high usage for the Claude API when using streaming responses?","During periods of high usage, an overloaded_error event may be sent in the event stream, which would normally correspond to an HTTP 529 error code in a non-streaming context.","python:file://eval_end_to_end.py"
-"What are the two types of deltas that can be contained in a content_block_delta event when streaming responses from the Claude API?","The two types of deltas that can be contained in a content_block_delta event are text_delta and input_json_delta.","python:file://eval_end_to_end.py"
-"On what date did Claude 3.5 Sonnet and tool use both become generally available across the Claude API, Amazon Bedrock, and Google Vertex AI?","Claude 3.5 Sonnet became generally available across those platforms on June 20th, 2024, while tool use became generally available on May 30th, 2024.","python:file://eval_end_to_end.py"
-"In what order did Anthropic launch Claude.ai and the Claude iOS app in Canada and Europe?","Anthropic launched Claude.ai and the Claude iOS app in Europe in May 2024, and then launched them in Canada the following month in June 2024.","python:file://eval_end_to_end.py"
-"When the API response from Claude has a stop_reason of ""tool_use"", what does this indicate and what should be done next to continue the conversation?","A stop_reason of ""tool_use"" signals that Claude has decided to use a tool and has constructed a formatted tool use request. To continue the conversation, the tool name and input should be extracted from Claude's request, the actual tool code should be executed client-side, and then a new user message containing a tool_result content block should be sent to Claude.","python:file://eval_end_to_end.py"
-"What Python libraries are used in the example code snippet for evaluating tone and style in a customer service chatbot?","The example code snippet for evaluating tone and style in a customer service chatbot uses the anthropic Python library to interact with the Claude AI model.","python:file://eval_end_to_end.py"
-"What are the two main ways to authenticate when using the Anthropic Python SDK to access Claude models on Amazon Bedrock?","The two main ways to authenticate are: 1) Directly providing the aws_access_key, aws_secret_key, and optionally aws_session_token, or 2) Using the default AWS credential providers, such as the ~/.aws/credentials file or the AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID environment variables.","python:file://eval_end_to_end.py"
-"When deciding whether to implement leak-resistant prompt engineering strategies, what two factors should be considered and balanced?","When deciding to use leak-resistant prompt engineering, the potential reduction in prompt leaks should be balanced against the risk of degraded model performance due to the added complexity of the prompt.","python:file://eval_end_to_end.py"
-"How can selecting the appropriate Claude model based on your specific requirements help reduce latency in your application?","Choosing the right Claude model that best fits your needs in terms of speed and output quality is one of the most straightforward ways to reduce latency in your application. Anthropic offers a range of Claude models with different capabilities and performance characteristics to allow you to choose the optimal balance of intelligence, speed, and cost for your use case.","python:file://eval_end_to_end.py"
-"How can you stream responses from the Claude API using the Python SDK?","You can stream responses from the Claude API using the Python SDK by using the client.messages.stream() method and iterating over the stream.text_stream attribute in a for loop.","python:file://eval_end_to_end.py"
-"How can you guide Claude's response by pre-filling part of the response, and what API parameter is used to generate a short response in this case?","You can shape Claude's response by pre-filling part of it in the last position of the input messages list. To get a short response like a single multiple choice answer, you can set the ""max_tokens"" parameter to a small value like 1.","python:file://eval_end_to_end.py"
-"What is more important when building an eval set for an AI system - having a larger number of test cases with automated grading, or having fewer high-quality test cases graded by humans?","When building an eval set, it is better to prioritize having a larger volume of test cases with slightly lower signal automated grading over having fewer questions with high-quality human hand-grading.","python:file://eval_end_to_end.py"
-"What are the two required fields in a content_block_delta event for a text delta type?","The two required fields in a content_block_delta event for a text delta type are ""index"" and ""delta"", where the ""delta"" field contains a ""type"" of ""text_delta"" and the ""text"" being added.","python:file://eval_end_to_end.py"
-"What are two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings?","The Claude Cookbook provides interactive Jupyter notebooks demonstrating how to upload PDFs, generate embeddings, and more. The Developer Console offers a prompt generator tool for easier, more powerful prompting.","python:file://eval_end_to_end.py"
-"Why does breaking a task into distinct subtasks for chained prompts help improve Claude's accuracy on the overall task?","Breaking a task into distinct subtasks for chained prompts improves Claude's accuracy because each subtask gets Claude's full attention, reducing errors compared to tackling the entire complex task at once.","python:file://eval_end_to_end.py"
-"How does the streaming format for Messages responses differ from Text Completions streaming responses?","Messages streaming responses can contain multiple content blocks of varying types, making the streaming format more complex compared to Text Completions which only include completion, ping, and error server-sent-events.","python:file://eval_end_to_end.py"
-"What are two ways to start experimenting with Claude as a user, according to Anthropic's documentation?","According to the documentation, users can start experimenting with Claude by visiting claude.ai or using Anthropic's web Console.","python:file://eval_end_to_end.py"
-"How can using chain prompts help reduce errors and inconsistency in complex tasks handled by Claude?","Chain prompts break complex tasks into smaller subtasks, allowing Claude to give its full attention to each one. This reduces errors and inconsistencies that may occur when trying to handle a complex workflow all at once.","python:file://eval_end_to_end.py"
-"What HTTP status code does an overloaded_error event correspond to in a non-streaming context for the Claude API?","In a non-streaming context, an overloaded_error event would normally correspond to an HTTP 529 status code.","python:file://eval_end_to_end.py"
-"What are the two ways to specify the format in which Voyage AI returns embeddings through its HTTP API?","When making a request to Voyage AI's embedding endpoint, you can either leave the encoding_format parameter unspecified to get the embeddings as lists of floating-point numbers, or set encoding_format to ""base64"" to get the embeddings compressed to Base64 encodings.","python:file://eval_end_to_end.py"
-"When streaming API requests that use tools, how are the input JSON deltas for tool_use content blocks sent, and how can they be accumulated and parsed by the client?","When streaming requests with tool use, the input JSON deltas for tool_use content blocks are sent as partial JSON strings in multiple content_block_delta events. The client can accumulate these partial JSON strings and parse the complete JSON object once a content_block_stop event is received, using a library like Pydantic for partial JSON parsing or helpers provided in Anthropic's SDKs.","python:file://eval_end_to_end.py"
-"What are the two interactive prompt engineering tutorials that Anthropic offers, and how do they differ?","Anthropic offers a GitHub prompting tutorial that covers prompt engineering concepts in-depth with examples, and a lighter-weight Google Sheets prompting tutorial that utilizes Claude for Sheets.","python:file://eval_end_to_end.py"
-"What are some of the key capabilities that make Claude suitable for enterprise use cases requiring integration with specialized applications and processing of large volumes of sensitive data?","Claude offers a 200K token context window, tool use for integration into specialized applications, multimodal input capabilities for richer context, and is uniquely positioned to serve high-trust industries processing large volumes of sensitive data with enterprise-grade security and data handling.","python:file://eval_end_to_end.py"
-"As of June 2024, in which regions are Anthropic's Claude.ai API and iOS app available?","As of June 2024, Anthropic's Claude.ai API and iOS app are available in the United States, Canada, and Europe.","python:file://eval_end_to_end.py"
-"What are the two main approaches for integrating Claude into a support ticket workflow, and how do they differ in terms of scalability and ease of implementation?","The two main approaches for integrating Claude into a support ticket workflow are push-based using webhooks, and pull-based. The push-based approach is more web-scalable but requires exposing a public endpoint which has IT security implications. The pull-based approach is easier to implement but makes unnecessary calls to the support ticket system.","python:file://eval_end_to_end.py"
-"When did Anthropic release a prompt generator tool to help guide Claude in generating high-quality prompts, and through what interface is it available?","On May 10th, 2024, Anthropic released a prompt generator tool that is available through the Developer Console.","python:file://eval_end_to_end.py"
-"Which Claude 3 model provides the best balance of intelligence and speed for high-throughput tasks like sales forecasting and targeted marketing?","The Claude 3 Sonnet model balances intelligence and speed, making it well-suited for high-throughput tasks like sales forecasting and targeted marketing.","python:file://eval_end_to_end.py"
-"How can you calculate the similarity between two Voyage embedding vectors, and what is this equivalent to since Voyage embeddings are normalized to length 1?","You can calculate the similarity between two Voyage embedding vectors using the dot product, which is equivalent to cosine similarity since Voyage embeddings are normalized to length 1.","python:file://eval_end_to_end.py"
-"How can using examples in prompts improve Claude's performance on complex tasks?","Well-chosen examples in prompts can boost Claude's ability to handle complex tasks by reducing misinterpretation of instructions, enforcing consistent structure and style, and serving as a guide for the desired output.","python:file://eval_end_to_end.py"
-"What are the two types of content block deltas that can be emitted when streaming responses with tool use, and what does each delta type contain?","When streaming responses with tool use, the two types of content block deltas are text deltas and input JSON deltas. Text deltas contain a ""text"" field with a string of the incrementally generated text. Input JSON deltas contain a ""partial_json"" field with a string containing part of the JSON object specifying the tool's input.","python:file://eval_end_to_end.py"
-"What are two key capabilities of Claude that enable it to build interactive systems and personalized user experiences?","Claude's question answering and text analysis capabilities enable it to build intelligent, interactive systems like chatbots and personalize user experiences by understanding sentiment and preferences.","python:file://eval_end_to_end.py"
-"What are the key event types included in a raw HTTP stream response when using message streaming, and what is the typical order they occur in?","A raw HTTP stream response includes a message_start event, followed by one or more content blocks (each with a content_block_start, content_block_delta events, and content_block_stop), a message_delta event, and a final message_stop event. Ping events may also be dispersed throughout.","python:file://eval_end_to_end.py"
-"What is the maximum number of images that can be included in a single request using the Claude API compared to the claude.ai interface?","The Messages API allows including up to 20 images per request, while the claude.ai interface has a lower limit of up to 5 images per turn.","python:file://eval_end_to_end.py"
-"When Claude's response is cut off due to hitting the max_tokens limit and contains an incomplete tool use block, what should you do to get the full tool use?","If Claude's response hits the max_tokens limit and has an incomplete tool use block, you should retry the request with a higher max_tokens value to get Claude's full response including the complete tool use.","python:file://eval_end_to_end.py"
-"What two steps are needed before running a classification evaluation on Claude according to the documentation?","Before running a classification evaluation on Claude, you need to 1) develop your test cases, and 2) take a look at Anthropic's guide to developing test cases.","python:file://eval_end_to_end.py"
-"How can you use the content parameter in the messages list to influence Claude's response?","You can provide content in the last position of the messages list, with the ""assistant"" role, to pre-fill part of Claude's response. This allows you to shape the assistant's output.","python:file://eval_end_to_end.py"
-"What are two key advantages of prompt engineering over fine-tuning when it comes to model comprehension and general knowledge preservation?","Compared to fine-tuning, prompt engineering is far more effective at helping models understand and utilize external content like retrieved documents. Prompt engineering also preserves the model's broad general knowledge, while fine-tuning risks catastrophic forgetting where the model loses its general capabilities.","python:file://eval_end_to_end.py"
-"What are the two main steps to get started with making requests to Claude models on Anthropic's Bedrock API?","To get started making requests to Claude models on Anthropic's Bedrock API, you need to: 1) Install and configure the AWS CLI, and 2) Install an SDK for accessing Bedrock, such as the Python SDK shown in the example code.","python:file://eval_end_to_end.py"
-"How can you check which Claude models are available in a specific AWS region using the AWS CLI?","You can list the available Claude models in a specific AWS region by running the command `aws bedrock list-foundation-models --region= --by-provider anthropic --query ""modelSummaries[*].modelId""`, replacing `` with the desired AWS region such as `us-west-2`.","python:file://eval_end_to_end.py"
-"What argument can be passed to the voyageai.Client.embed() method or the Voyage HTTP API to specify whether the input text is a query or a document?","The input_type argument can be passed with a value of ""query"" or ""document"" to specify the type of input text being embedded.","python:file://eval_end_to_end.py"
-"How do the streaming API delta formats differ between tool_use content blocks and text content blocks?","Tool_use content block deltas contain partial JSON strings for the input field, whereas text content block deltas directly contain the text delta. Tool_use deltas may have delays between streaming events as the model emits one complete key-value pair at a time.","python:file://eval_end_to_end.py"
-"What are the image file size limits when uploading images to Claude using the API versus on claude.ai?","When uploading images to Claude, the API has a maximum file size limit of 5MB per image, while on claude.ai the limit is 10MB per image.","python:file://eval_end_to_end.py"
-"What is one key consideration when selecting a Claude model for an enterprise use case that needs low latency?","When selecting a Claude model for an enterprise use case that requires low latency, it's important to choose the model that best balances speed and output quality based on the specific requirements of the use case.","python:file://eval_end_to_end.py"
-"What embedding model does Anthropic recommend for code retrieval, and how does its performance compare to alternatives according to Voyage AI?","For code retrieval, Voyage AI recommends using the voyage-code-2 embedding model, which they claim performs 17% better than alternatives and achieves state-of-the-art results on general-purpose corpora as well.","python:file://eval_end_to_end.py"
-"What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?","The Claude Cookbook provides interactive Jupyter notebooks that demonstrate how to upload PDFs and work with embeddings to help developers learn to use Anthropic's APIs.","python:file://eval_end_to_end.py"
-"How does the size of the context window impact a language model's ability to utilize retrieval augmented generation (RAG)?","The size of the context window determines how much retrieved information can be passed to the language model to augment its knowledge when generating a response using RAG. A larger context window allows more relevant retrieved information to be utilized by the model, improving the accuracy and groundedness of the generated text.","python:file://eval_end_to_end.py"
-"How can the Evaluation tool in Anthropic's Claude platform help improve prompts and build more robust AI applications?","The Evaluation tool helps identify edge cases where prompts might falter, allows rating individual results to determine prompt performance, ensures consistent performance across inputs, and enables prompt refinement for better reliability. Reviewing results across test cases helps spot patterns to make informed adjustments that lead to more robust AI applications.","python:file://eval_end_to_end.py"
-"Which Claude model has the fastest comparative latency according to the comparison tables?","The Claude 3 Haiku model has the fastest comparative latency","python:file://eval_end_to_end.py"
-"How can you build up a conversation with multiple turns using the Anthropic Messages API in Python?","To have a multi-turn conversation using the Anthropic Messages API in Python, send the full conversation history in the messages parameter each time, including any prior user and assistant messages. The API is stateless, so the entire context must be provided with each request.","python:file://eval_end_to_end.py"
-"How can using XML tags to provide a specific role or context help improve Claude's analysis of a legal contract compared to not using a role prompt?","Providing Claude with a specific role, such as being the General Counsel of a company, using XML tags can help it catch critical legal issues and risks in a contract that it might miss without the role context, potentially saving the company millions of dollars.","python:file://eval_end_to_end.py"
-"What are the key differences between how Claude 3 Opus and Claude 3 Sonnet handle missing information when making tool calls?","When required parameters are missing, Claude 3 Opus is more likely to ask the user for the missing information, while Claude 3 Sonnet is more likely to try to infer reasonable values on its own to proceed with the tool call.","python:file://eval_end_to_end.py"
-"What steps should be taken to ensure a reliable deployment of an automated ticket routing system using Claude into a production environment?","To ensure a reliable production deployment of Claude for ticket routing, key steps include implementing retry logic to handle errors, conducting thorough staging and load testing, setting up error handling and logging, using a gradual rollout process, providing documentation and training, and establishing monitoring and alerting.","python:file://eval_end_to_end.py"
-"How should you evaluate a model's performance on a ticket routing classifier?","You should evaluate performance in terms of accuracy, cost, and speed.","python:file://eval_end_to_end.py"
-"What two methods does Anthropic recommend for learning how to prompt engineer with Claude before diving into the techniques?","Anthropic recommends trying their interactive GitHub prompting tutorial and Google Sheets prompting tutorial to learn prompt engineering concepts before diving into the techniques in the documentation.","python:file://eval_end_to_end.py"
-"What are the key differences between a pretrained large language model and Claude in terms of their training and capabilities?","Pretrained large language models are trained on unlabeled text data to predict the next word given the previous context, but are not inherently good at answering questions or following instructions without prompt engineering. In contrast, Claude is a large language model that has been further fine-tuned and trained using RLHF to be more helpful, honest, and capable of performing a wider range of useful tasks.","python:file://eval_end_to_end.py"
-"What are some key advantages of using prompt engineering instead of fine-tuning to adapt a pretrained language model for a specific task or domain?","Prompt engineering is typically faster, more cost-effective, requires less data and compute resources, and preserves the model's general knowledge compared to fine-tuning. It also allows for greater flexibility, rapid iteration, and transparency.","python:file://eval_end_to_end.py"
-"How can you authenticate with GCP before running requests to access Claude models on Vertex AI?","Before running requests to access Claude models on Vertex AI, you may need to run `gcloud auth application-default login` to authenticate with GCP.","python:file://eval_end_to_end.py"
-"What new capabilities and features were introduced by Anthropic on May 10th, 2024 and how do they enable users to create and tailor prompts for specific tasks?","According to the information provided, on May 10th, 2024, Anthropic introduced a new ""Prompt Generator"" tool in the Developer Console. This tool is designed to help users guide Claude to generate high-quality prompts tailored to their specific tasks. The text states that the Prompt Generator ""makes it easy to guide Claude to generate a high-quality prompts tailored to your specific tasks."" This indicates that the Prompt Generator feature provides users with the ability to create customized prompts for Claude, going beyond the standard prompting capabilities. By combining this information with the details about the Claude iOS app and the Claude Team plan released around the same time, we can infer that Anthropic was expanding its platform and tools to provide users with more advanced capabilities for interacting with and leveraging the Claude AI assistant for their specific needs and use cases.","python:file://eval_end_to_end.py"
-"On what date did both the Claude 3.5 Sonnet model and the Artifacts feature in Claude.ai become available?","Both Claude 3.5 Sonnet and the Artifacts feature in Claude.ai became available on June 20th, 2024.","python:file://eval_end_to_end.py"
-"When putting words in Claude's mouth to shape the response, what header and value can you use in the request to limit Claude's response to a single token?","You can use ""max_tokens"": 1 in the request to limit Claude's response to a single token when putting words in its mouth.","python:file://eval_end_to_end.py"
-"What does the temperature parameter do when working with large language models?","Temperature is a parameter that controls the randomness of the model during generation","python:file://eval_end_to_end.py"
-"What are two ways to specify API parameters when calling the Claude API using Claude for Sheets?","When calling the Claude API using Claude for Sheets, you can specify API parameters in two ways: 1) As additional arguments after the prompt and model in the CLAUDE() function, like =CLAUDE(prompt, model, ""max_tokens"", 3). 2) By passing in an API key to be used just for a specific cell, like ""api_key"", ""sk-ant-api03-j1W...""","python:file://eval_end_to_end.py"
-"How does prefilling the response with an opening curly brace ({ ) affect Claude's output when extracting structured data from text?","Prefilling Claude's response with { causes it to skip the preamble explanation and directly output the extracted data as a JSON object, resulting in a more concise response that is easier for programs to parse without additional processing.","python:file://eval_end_to_end.py"
-"What are some helpful resources provided by Anthropic to dive deeper into building with images using Claude?","Anthropic provides a multimodal cookbook with tips on getting started with images and best practices, as well as API reference documentation for the Messages API that includes example API calls involving images.","python:file://eval_end_to_end.py"
-"How do you specify the API key when creating a new Anthropic client in the Python and TypeScript SDK examples?","In both the Python and TypeScript examples, you can specify the API key as a string parameter when creating a new Anthropic client object. If no API key is provided, it defaults to using the ANTHROPIC_API_KEY environment variable.","python:file://eval_end_to_end.py"
-"What are two key benefits of using the Anthropic Evaluation tool when developing prompts for an AI classification application?","The Evaluation tool helps identify edge cases where the prompt might falter, and ensures consistent performance across a range of test case inputs. This allows you to refine the prompt for better reliability in the AI classification application.","python:file://eval_end_to_end.py"
-"What are the key differences between a pretrained language model like Claude's underlying model, and the final version of Claude available through Anthropic's API?","The pretrained language model that forms Claude's foundation is not inherently good at answering questions or following instructions. To create the helpful, honest and safe Claude assistant available through the API, the pretrained model underwent fine-tuning and reinforcement learning from human feedback (RLHF).","python:file://eval_end_to_end.py"
-"What is the IPv6 address range used by Anthropic?","The IPv6 address range used by Anthropic is 2607:6bc0::/48.","python:file://eval_end_to_end.py"
-"When using the Python SDK to create a message with Claude, what are two ways you can specify your API key?","When using the Python SDK, you can specify your API key either by passing it as the api_key parameter when initializing the Anthropic client, or by setting it as an environment variable named ANTHROPIC_API_KEY which the client will use by default.","python:file://eval_end_to_end.py"
+query,correct_answer,__expected
+"How can you create multiple test cases for an evaluation in the Anthropic Evaluation tool?","To create multiple test cases in the Anthropic Evaluation tool, click the 'Add Test Case' button, fill in values for each variable in your prompt, and repeat the process to create additional test case scenarios.","python:file://eval_end_to_end.py"
+"What embeddings provider does Anthropic recommend for customized domain-specific models, and what capabilities does this provider offer?","Anthropic recommends Voyage AI for embedding models. Voyage AI offers customized models for specific industry domains like finance and healthcare, as well as bespoke fine-tuned models for individual customers. They have a wide variety of options and capabilities.","python:file://eval_end_to_end.py"
+"What are some key success metrics to consider when evaluating Claude's performance on a classification task, and how do they relate to choosing the right model to reduce latency?","When evaluating Claude's performance on a classification task, some key success metrics to consider include accuracy, F1 score, consistency, structure, speed, bias and fairness. Choosing the right model that fits your specific requirements in terms of speed and output quality is a straightforward way to reduce latency and meet the acceptable response time for your use case.","python:file://eval_end_to_end.py"
+"What are two ways that Claude for Sheets can improve prompt engineering workflows compared to using chained prompts?","Claude for Sheets enables testing prompts across evaluation suites in parallel, which is faster than running chained prompts sequentially. It also excels at office tasks like survey analysis and online data processing that may be more cumbersome with chained prompts.","python:file://eval_end_to_end.py"
+"What happens if a prompt for the Text Completions API is missing the ""\n\nHuman:"" and ""\n\nAssistant:"" turns?","If a prompt for the Text Completions API is missing the required ""\n\nHuman:"" and ""\n\nAssistant:"" turns, it will result in an API error.","python:file://eval_end_to_end.py"
+"How do the additional tokens required for tool use in Claude API requests impact pricing compared to regular API requests?","Tool use requests in the Claude API are priced the same as regular API requests, based on the total input and output tokens. However, tool use requests have additional tokens beyond the regular input and output, including the tools parameter, tool use content blocks, tool result content blocks, and a special system prompt that enables tool use, which add to the total tokens and cost.","python:file://eval_end_to_end.py"
+"When will the new Anthropic Developer Console features that show API usage, billing details, and rate limits be available?","The new Usage, Cost, and Rate Limits tabs in the Anthropic Developer Console that show API usage, billing details, and current rate limits will be available on June 27th, 2024.","python:file://eval_end_to_end.py"
+"When deciding whether to use chain-of-thought (CoT) for a task, what are two key factors to consider in order to strike the right balance between performance and latency?","When deciding whether to use CoT, consider if the task requires in-depth thinking that a human would need to work through, and be aware that the increased output length from CoT may impact latency.","python:file://eval_end_to_end.py"
+"How can I use Claude to more easily digest the content of long PDF documents?","You can upload PDFs and have Claude summarize their content, making it easier to understand the key points of long documents without having to read through everything.","python:file://eval_end_to_end.py"
+"According to the documentation, where can you view your organization's current API rate limits in the Claude Console?","You can view your organization's current API rate limits in the Rate Limits tab of the Developer Console.","python:file://eval_end_to_end.py"
+"How can we measure the performance of the ticket classification system implemented using Claude beyond just accuracy?","In addition to accuracy, we can measure the 95th percentile response time and average cost per classification to assess the ticket classification system's performance and production-readiness.","python:file://eval_end_to_end.py"
+"How can you specify a system prompt using the Text Completions API versus the Messages API?","With the Text Completions API, the system prompt is added as text before the first ""\n\nHuman:"" turn. With the Messages API, the system prompt is specified using the separate ""system"" parameter when making the API request.","python:file://eval_end_to_end.py"
+"How can you combine XML tags with chain of thought reasoning to create high-performance prompts for Claude?","You can combine XML tags like and with chain of thought reasoning, where Claude explains its step-by-step reasoning process, to create structured, high-performance prompts. For example, you can prompt Claude to show its reasoning by including ""Before answering, explain your reasoning step-by-step in tags."" in the user message or system prompt.","python:file://eval_end_to_end.py"
+"When evaluating the Claude model's performance for ticket routing, what three key metrics are calculated and what are the results for the claude-3-haiku-20240307 model on the 91 test samples?","When evaluating the claude-3-haiku-20240307 model's performance on the 91 test samples, the three key metrics calculated are accuracy (89.01%), 95th percentile response time (1.61 seconds), and average cost per request routing ($0.0004).","python:file://eval_end_to_end.py"
+"Before starting to engineer and improve a prompt in Claude, what key things does Anthropic recommend you have in place first?","Before prompt engineering, Anthropic highly recommends having a clear definition of success criteria for your use case, some ways to empirically test against those criteria, and a first draft prompt you want to improve.","python:file://eval_end_to_end.py"
+"How does the Messages API handle mid-response prompting compared to the Text Completions API?","The Messages API allows you to continue a response by making the last input message have the ""assistant"" role, whereas the Text Completions API lets you pre-fill part of Claude's response directly in the prompt string.","python:file://eval_end_to_end.py"
+"How does Claude's response differ when given a role through a system prompt compared to not having a specific role in the financial analysis example?","When given the role of CFO through a system prompt, Claude provides a much more insightful, structured, and actionable financial analysis compared to not having a specific role. The role-based response breaks down key financial metrics, provides strategic commentary, and makes specific recommendations.","python:file://eval_end_to_end.py"
+"What are some quantitative metrics that can be used to measure the success of a sentiment analysis model, and how might specific targets for those metrics be determined?","Quantitative metrics for evaluating a sentiment analysis model include task-specific metrics like F1 score, as well as generic metrics like accuracy, precision, and recall. Specific targets should be based on industry benchmarks, prior experiments, AI research, or expert knowledge, and should represent an improvement over the current baseline.","python:file://eval_end_to_end.py"
+"What is a power user tip mentioned in the documentation for creating high-performance prompts using XML tags?","Combining XML tags with other prompt engineering techniques like multishot prompting (using tags) or chain of thought (using and tags) to create super-structured, high-performance prompts.","python:file://eval_end_to_end.py"
+"How can you use an LLM like Claude to automatically grade the outputs of other LLMs based on a rubric?","You can use an LLM like Claude to grade the outputs of other LLMs by providing it with the output to grade along with a detailed rubric. Instruct the LLM to think through its reasoning and then output a simple 'correct' or 'incorrect' result based on how well the output matches the criteria in the rubric.","python:file://eval_end_to_end.py"
+"How can you access and deploy Voyage embeddings on AWS Marketplace?","To access Voyage embeddings on AWS, subscribe to the model package on AWS Marketplace, select the model to deploy, agree to the terms, and copy the Product ARN for your selected region. Then create a JupyterLab space in SageMaker Studio, upload Voyage's notebook, and follow the instructions to deploy the model package using the ARN.","python:file://eval_end_to_end.py"
+"When using tools just to get Claude to produce JSON output following a particular schema, what key things should you do in terms of tool setup and prompting?","When using tools to get JSON output, you should provide a single tool, set the tool_choice to explicitly instruct the model to use that tool, and ensure the tool name and description are from the model's perspective since it will pass the input to the tool.","python:file://eval_end_to_end.py"
+"What are the key differences between the legacy Claude Instant 1.2 model and the Claude 3 Haiku model in terms of capabilities and performance?","The Claude 3 Haiku model has vision capabilities, is faster, more performant, and more intelligent than the legacy Claude Instant 1.2 model. Claude 3 Haiku also has more up-to-date training data.","python:file://eval_end_to_end.py"
+"What is one key benefit of using examples when prompt engineering with Claude?","One key benefit of using examples in prompts is that they reduce misinterpretation of instructions, leading to more accurate outputs from Claude.","python:file://eval_end_to_end.py"
+"According to the Claude Documentation, what is one key advantage of using prompt engineering instead of fine-tuning when it comes to adapting an AI model to new domains or tasks?","Prompt engineering allows you to easily adapt AI models to new domains by providing domain-specific context directly in the prompts, without needing to retrain the model through fine-tuning.","python:file://eval_end_to_end.py"
+"How can I quickly get started using the Claude for Sheets extension with a pre-made template?","You can make a copy of Anthropic's provided Claude for Sheets workbook template to quickly get started using the extension with your own work.","python:file://eval_end_to_end.py"
+"How does the ""index"" field in the ""content_block_delta"" event relate to the text being streamed in a response?","The ""index"" field in each ""content_block_delta"" event indicates which content block the text delta applies to. Multiple deltas with the same index consecutively stream the text for a single content block in the response.","python:file://eval_end_to_end.py"
+"How can you include an image as part of a Claude API request, and what image formats are currently supported?","To include an image in a Claude API request, provide it as a base64-encoded image in an ""image"" content block within the ""messages"" array. The currently supported image formats are JPEG, PNG, GIF, and WebP.","python:file://eval_end_to_end.py"
+"What is the relationship between time to first token (TTFT) and latency when evaluating a language model's performance?","TTFT is a specific measure of latency that captures the time it takes for a language model to generate the first token of its response after receiving a prompt. It is an important component of a model's overall latency and responsiveness, especially for interactive applications.","python:file://eval_end_to_end.py"
+"How can providing Claude with examples of handling certain edge cases like implicit requests or emotional prioritization help improve its performance in routing support tickets?","Providing edge case examples to Claude in the prompt can meaningfully improve its performance in correctly routing support tickets in scenarios where it may otherwise misclassify them, such as implicit requests, emotional prioritization, ambiguous intent vs. routing, or issue prioritization.","python:file://eval_end_to_end.py"
+"How does the stop_reason of ""tool_use"" relate to the overall workflow of integrating external tools with Claude?","When Claude determines that one of the user-provided tools can help answer the user's query, it constructs a tool use request. This causes the API response to have a stop_reason of ""tool_use"", signaling Claude's intent to use the tool. The user must then extract the tool input from Claude's request, run the actual tool code client-side, and continue the conversation by sending the tool results back to Claude.","python:file://eval_end_to_end.py"
+"According to the documentation, what error event and corresponding HTTP error code may be sent during periods of high usage for the Claude API when using streaming responses?","During periods of high usage, an overloaded_error event may be sent in the event stream, which would normally correspond to an HTTP 529 error code in a non-streaming context.","python:file://eval_end_to_end.py"
+"What are the two types of deltas that can be contained in a content_block_delta event when streaming responses from the Claude API?","The two types of deltas that can be contained in a content_block_delta event are text_delta and input_json_delta.","python:file://eval_end_to_end.py"
+"On what date did Claude 3.5 Sonnet and tool use both become generally available across the Claude API, Amazon Bedrock, and Google Vertex AI?","Claude 3.5 Sonnet became generally available across those platforms on June 20th, 2024, while tool use became generally available on May 30th, 2024.","python:file://eval_end_to_end.py"
+"In what order did Anthropic launch Claude.ai and the Claude iOS app in Canada and Europe?","Anthropic launched Claude.ai and the Claude iOS app in Europe in May 2024, and then launched them in Canada the following month in June 2024.","python:file://eval_end_to_end.py"
+"When the API response from Claude has a stop_reason of ""tool_use"", what does this indicate and what should be done next to continue the conversation?","A stop_reason of ""tool_use"" signals that Claude has decided to use a tool and has constructed a formatted tool use request. To continue the conversation, the tool name and input should be extracted from Claude's request, the actual tool code should be executed client-side, and then a new user message containing a tool_result content block should be sent to Claude.","python:file://eval_end_to_end.py"
+"What Python libraries are used in the example code snippet for evaluating tone and style in a customer service chatbot?","The example code snippet for evaluating tone and style in a customer service chatbot uses the anthropic Python library to interact with the Claude AI model.","python:file://eval_end_to_end.py"
+"What are the two main ways to authenticate when using the Anthropic Python SDK to access Claude models on Amazon Bedrock?","The two main ways to authenticate are: 1) Directly providing the aws_access_key, aws_secret_key, and optionally aws_session_token, or 2) Using the default AWS credential providers, such as the ~/.aws/credentials file or the AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID environment variables.","python:file://eval_end_to_end.py"
+"When deciding whether to implement leak-resistant prompt engineering strategies, what two factors should be considered and balanced?","When deciding to use leak-resistant prompt engineering, the potential reduction in prompt leaks should be balanced against the risk of degraded model performance due to the added complexity of the prompt.","python:file://eval_end_to_end.py"
+"How can selecting the appropriate Claude model based on your specific requirements help reduce latency in your application?","Choosing the right Claude model that best fits your needs in terms of speed and output quality is one of the most straightforward ways to reduce latency in your application. Anthropic offers a range of Claude models with different capabilities and performance characteristics to allow you to choose the optimal balance of intelligence, speed, and cost for your use case.","python:file://eval_end_to_end.py"
+"How can you stream responses from the Claude API using the Python SDK?","You can stream responses from the Claude API using the Python SDK by using the client.messages.stream() method and iterating over the stream.text_stream attribute in a for loop.","python:file://eval_end_to_end.py"
+"How can you guide Claude's response by pre-filling part of the response, and what API parameter is used to generate a short response in this case?","You can shape Claude's response by pre-filling part of it in the last position of the input messages list. To get a short response like a single multiple choice answer, you can set the ""max_tokens"" parameter to a small value like 1.","python:file://eval_end_to_end.py"
+"What is more important when building an eval set for an AI system - having a larger number of test cases with automated grading, or having fewer high-quality test cases graded by humans?","When building an eval set, it is better to prioritize having a larger volume of test cases with slightly lower signal automated grading over having fewer questions with high-quality human hand-grading.","python:file://eval_end_to_end.py"
+"What are the two required fields in a content_block_delta event for a text delta type?","The two required fields in a content_block_delta event for a text delta type are ""index"" and ""delta"", where the ""delta"" field contains a ""type"" of ""text_delta"" and the ""text"" being added.","python:file://eval_end_to_end.py"
+"What are two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings?","The Claude Cookbooks provides interactive Jupyter notebooks demonstrating how to upload PDFs, generate embeddings, and more. The Developer Console offers a prompt generator tool for easier, more powerful prompting.","python:file://eval_end_to_end.py"
+"Why does breaking a task into distinct subtasks for chained prompts help improve Claude's accuracy on the overall task?","Breaking a task into distinct subtasks for chained prompts improves Claude's accuracy because each subtask gets Claude's full attention, reducing errors compared to tackling the entire complex task at once.","python:file://eval_end_to_end.py"
+"How does the streaming format for Messages responses differ from Text Completions streaming responses?","Messages streaming responses can contain multiple content blocks of varying types, making the streaming format more complex compared to Text Completions which only include completion, ping, and error server-sent-events.","python:file://eval_end_to_end.py"
+"What are two ways to start experimenting with Claude as a user, according to Anthropic's documentation?","According to the documentation, users can start experimenting with Claude by visiting claude.ai or using Anthropic's web Console.","python:file://eval_end_to_end.py"
+"How can using chain prompts help reduce errors and inconsistency in complex tasks handled by Claude?","Chain prompts break complex tasks into smaller subtasks, allowing Claude to give its full attention to each one. This reduces errors and inconsistencies that may occur when trying to handle a complex workflow all at once.","python:file://eval_end_to_end.py"
+"What HTTP status code does an overloaded_error event correspond to in a non-streaming context for the Claude API?","In a non-streaming context, an overloaded_error event would normally correspond to an HTTP 529 status code.","python:file://eval_end_to_end.py"
+"What are the two ways to specify the format in which Voyage AI returns embeddings through its HTTP API?","When making a request to Voyage AI's embedding endpoint, you can either leave the encoding_format parameter unspecified to get the embeddings as lists of floating-point numbers, or set encoding_format to ""base64"" to get the embeddings compressed to Base64 encodings.","python:file://eval_end_to_end.py"
+"When streaming API requests that use tools, how are the input JSON deltas for tool_use content blocks sent, and how can they be accumulated and parsed by the client?","When streaming requests with tool use, the input JSON deltas for tool_use content blocks are sent as partial JSON strings in multiple content_block_delta events. The client can accumulate these partial JSON strings and parse the complete JSON object once a content_block_stop event is received, using a library like Pydantic for partial JSON parsing or helpers provided in Anthropic's SDKs.","python:file://eval_end_to_end.py"
+"What are the two interactive prompt engineering tutorials that Anthropic offers, and how do they differ?","Anthropic offers a GitHub prompting tutorial that covers prompt engineering concepts in-depth with examples, and a lighter-weight Google Sheets prompting tutorial that utilizes Claude for Sheets.","python:file://eval_end_to_end.py"
+"What are some of the key capabilities that make Claude suitable for enterprise use cases requiring integration with specialized applications and processing of large volumes of sensitive data?","Claude offers a 200K token context window, tool use for integration into specialized applications, multimodal input capabilities for richer context, and is uniquely positioned to serve high-trust industries processing large volumes of sensitive data with enterprise-grade security and data handling.","python:file://eval_end_to_end.py"
+"As of June 2024, in which regions are Anthropic's Claude.ai API and iOS app available?","As of June 2024, Anthropic's Claude.ai API and iOS app are available in the United States, Canada, and Europe.","python:file://eval_end_to_end.py"
+"What are the two main approaches for integrating Claude into a support ticket workflow, and how do they differ in terms of scalability and ease of implementation?","The two main approaches for integrating Claude into a support ticket workflow are push-based using webhooks, and pull-based. The push-based approach is more web-scalable but requires exposing a public endpoint which has IT security implications. The pull-based approach is easier to implement but makes unnecessary calls to the support ticket system.","python:file://eval_end_to_end.py"
+"When did Anthropic release a prompt generator tool to help guide Claude in generating high-quality prompts, and through what interface is it available?","On May 10th, 2024, Anthropic released a prompt generator tool that is available through the Developer Console.","python:file://eval_end_to_end.py"
+"Which Claude 3 model provides the best balance of intelligence and speed for high-throughput tasks like sales forecasting and targeted marketing?","The Claude 3 Sonnet model balances intelligence and speed, making it well-suited for high-throughput tasks like sales forecasting and targeted marketing.","python:file://eval_end_to_end.py"
+"How can you calculate the similarity between two Voyage embedding vectors, and what is this equivalent to since Voyage embeddings are normalized to length 1?","You can calculate the similarity between two Voyage embedding vectors using the dot product, which is equivalent to cosine similarity since Voyage embeddings are normalized to length 1.","python:file://eval_end_to_end.py"
+"How can using examples in prompts improve Claude's performance on complex tasks?","Well-chosen examples in prompts can boost Claude's ability to handle complex tasks by reducing misinterpretation of instructions, enforcing consistent structure and style, and serving as a guide for the desired output.","python:file://eval_end_to_end.py"
+"What are the two types of content block deltas that can be emitted when streaming responses with tool use, and what does each delta type contain?","When streaming responses with tool use, the two types of content block deltas are text deltas and input JSON deltas. Text deltas contain a ""text"" field with a string of the incrementally generated text. Input JSON deltas contain a ""partial_json"" field with a string containing part of the JSON object specifying the tool's input.","python:file://eval_end_to_end.py"
+"What are two key capabilities of Claude that enable it to build interactive systems and personalized user experiences?","Claude's question answering and text analysis capabilities enable it to build intelligent, interactive systems like chatbots and personalize user experiences by understanding sentiment and preferences.","python:file://eval_end_to_end.py"
+"What are the key event types included in a raw HTTP stream response when using message streaming, and what is the typical order they occur in?","A raw HTTP stream response includes a message_start event, followed by one or more content blocks (each with a content_block_start, content_block_delta events, and content_block_stop), a message_delta event, and a final message_stop event. Ping events may also be dispersed throughout.","python:file://eval_end_to_end.py"
+"What is the maximum number of images that can be included in a single request using the Claude API compared to the claude.ai interface?","The Messages API allows including up to 20 images per request, while the claude.ai interface has a lower limit of up to 5 images per turn.","python:file://eval_end_to_end.py"
+"When Claude's response is cut off due to hitting the max_tokens limit and contains an incomplete tool use block, what should you do to get the full tool use?","If Claude's response hits the max_tokens limit and has an incomplete tool use block, you should retry the request with a higher max_tokens value to get Claude's full response including the complete tool use.","python:file://eval_end_to_end.py"
+"What two steps are needed before running a classification evaluation on Claude according to the documentation?","Before running a classification evaluation on Claude, you need to 1) develop your test cases, and 2) take a look at Anthropic's guide to developing test cases.","python:file://eval_end_to_end.py"
+"How can you use the content parameter in the messages list to influence Claude's response?","You can provide content in the last position of the messages list, with the ""assistant"" role, to pre-fill part of Claude's response. This allows you to shape the assistant's output.","python:file://eval_end_to_end.py"
+"What are two key advantages of prompt engineering over fine-tuning when it comes to model comprehension and general knowledge preservation?","Compared to fine-tuning, prompt engineering is far more effective at helping models understand and utilize external content like retrieved documents. Prompt engineering also preserves the model's broad general knowledge, while fine-tuning risks catastrophic forgetting where the model loses its general capabilities.","python:file://eval_end_to_end.py"
+"What are the two main steps to get started with making requests to Claude models on Anthropic's Bedrock API?","To get started making requests to Claude models on Anthropic's Bedrock API, you need to: 1) Install and configure the AWS CLI, and 2) Install an SDK for accessing Bedrock, such as the Python SDK shown in the example code.","python:file://eval_end_to_end.py"
+"How can you check which Claude models are available in a specific AWS region using the AWS CLI?","You can list the available Claude models in a specific AWS region by running the command `aws bedrock list-foundation-models --region= --by-provider anthropic --query ""modelSummaries[*].modelId""`, replacing `` with the desired AWS region such as `us-west-2`.","python:file://eval_end_to_end.py"
+"What argument can be passed to the voyageai.Client.embed() method or the Voyage HTTP API to specify whether the input text is a query or a document?","The input_type argument can be passed with a value of ""query"" or ""document"" to specify the type of input text being embedded.","python:file://eval_end_to_end.py"
+"How do the streaming API delta formats differ between tool_use content blocks and text content blocks?","Tool_use content block deltas contain partial JSON strings for the input field, whereas text content block deltas directly contain the text delta. Tool_use deltas may have delays between streaming events as the model emits one complete key-value pair at a time.","python:file://eval_end_to_end.py"
+"What are the image file size limits when uploading images to Claude using the API versus on claude.ai?","When uploading images to Claude, the API has a maximum file size limit of 5MB per image, while on claude.ai the limit is 10MB per image.","python:file://eval_end_to_end.py"
+"What is one key consideration when selecting a Claude model for an enterprise use case that needs low latency?","When selecting a Claude model for an enterprise use case that requires low latency, it's important to choose the model that best balances speed and output quality based on the specific requirements of the use case.","python:file://eval_end_to_end.py"
+"What embedding model does Anthropic recommend for code retrieval, and how does its performance compare to alternatives according to Voyage AI?","For code retrieval, Voyage AI recommends using the voyage-code-2 embedding model, which they claim performs 17% better than alternatives and achieves state-of-the-art results on general-purpose corpora as well.","python:file://eval_end_to_end.py"
+"What are two ways the Claude Cookbooks can help developers learn to use Anthropic's APIs?","The Claude Cookbooks provides interactive Jupyter notebooks that demonstrate how to upload PDFs and work with embeddings to help developers learn to use Anthropic's APIs.","python:file://eval_end_to_end.py"
+"How does the size of the context window impact a language model's ability to utilize retrieval augmented generation (RAG)?","The size of the context window determines how much retrieved information can be passed to the language model to augment its knowledge when generating a response using RAG. A larger context window allows more relevant retrieved information to be utilized by the model, improving the accuracy and groundedness of the generated text.","python:file://eval_end_to_end.py"
+"How can the Evaluation tool in Anthropic's Claude platform help improve prompts and build more robust AI applications?","The Evaluation tool helps identify edge cases where prompts might falter, allows rating individual results to determine prompt performance, ensures consistent performance across inputs, and enables prompt refinement for better reliability. Reviewing results across test cases helps spot patterns to make informed adjustments that lead to more robust AI applications.","python:file://eval_end_to_end.py"
+"Which Claude model has the fastest comparative latency according to the comparison tables?","The Claude 3 Haiku model has the fastest comparative latency.","python:file://eval_end_to_end.py"
+"How can you build up a conversation with multiple turns using the Anthropic Messages API in Python?","To have a multi-turn conversation using the Anthropic Messages API in Python, send the full conversation history in the messages parameter each time, including any prior user and assistant messages. The API is stateless, so the entire context must be provided with each request.","python:file://eval_end_to_end.py"
+"How can using XML tags to provide a specific role or context help improve Claude's analysis of a legal contract compared to not using a role prompt?","Providing Claude with a specific role, such as being the General Counsel of a company, using XML tags can help it catch critical legal issues and risks in a contract that it might miss without the role context, potentially saving the company millions of dollars.","python:file://eval_end_to_end.py"
+"What are the key differences between how Claude 3 Opus and Claude 3 Sonnet handle missing information when making tool calls?","When required parameters are missing, Claude 3 Opus is more likely to ask the user for the missing information, while Claude 3 Sonnet is more likely to try to infer reasonable values on its own to proceed with the tool call.","python:file://eval_end_to_end.py"
+"What steps should be taken to ensure a reliable deployment of an automated ticket routing system using Claude into a production environment?","To ensure a reliable production deployment of Claude for ticket routing, key steps include implementing retry logic to handle errors, conducting thorough staging and load testing, setting up error handling and logging, using a gradual rollout process, providing documentation and training, and establishing monitoring and alerting.","python:file://eval_end_to_end.py"
+"How should you evaluate a model's performance on a ticket routing classifier?","You should evaluate performance in terms of accuracy, cost, and speed.","python:file://eval_end_to_end.py"
+"What two methods does Anthropic recommend for learning how to prompt engineer with Claude before diving into the techniques?","Anthropic recommends trying their interactive GitHub prompting tutorial and Google Sheets prompting tutorial to learn prompt engineering concepts before diving into the techniques in the documentation.","python:file://eval_end_to_end.py"
+"What are the key differences between a pretrained large language model and Claude in terms of their training and capabilities?","Pretrained large language models are trained on unlabeled text data to predict the next word given the previous context, but are not inherently good at answering questions or following instructions without prompt engineering. In contrast, Claude is a large language model that has been further fine-tuned and trained using RLHF to be more helpful, honest, and capable of performing a wider range of useful tasks.","python:file://eval_end_to_end.py"
+"What are some key advantages of using prompt engineering instead of fine-tuning to adapt a pretrained language model for a specific task or domain?","Prompt engineering is typically faster, more cost-effective, requires less data and compute resources, and preserves the model's general knowledge compared to fine-tuning. It also allows for greater flexibility, rapid iteration, and transparency.","python:file://eval_end_to_end.py"
+"How can you authenticate with GCP before running requests to access Claude models on Vertex AI?","Before running requests to access Claude models on Vertex AI, you may need to run `gcloud auth application-default login` to authenticate with GCP.","python:file://eval_end_to_end.py"
+"What new capabilities and features were introduced by Anthropic on May 10th, 2024 and how do they enable users to create and tailor prompts for specific tasks?","According to the information provided, on May 10th, 2024, Anthropic introduced a new ""Prompt Generator"" tool in the Developer Console. This tool is designed to help users guide Claude to generate high-quality prompts tailored to their specific tasks. The text states that the Prompt Generator ""makes it easy to guide Claude to generate a high-quality prompts tailored to your specific tasks."" This indicates that the Prompt Generator feature provides users with the ability to create customized prompts for Claude, going beyond the standard prompting capabilities. By combining this information with the details about the Claude iOS app and the Claude Team plan released around the same time, we can infer that Anthropic was expanding its platform and tools to provide users with more advanced capabilities for interacting with and leveraging the Claude AI assistant for their specific needs and use cases.","python:file://eval_end_to_end.py"
+"On what date did both the Claude 3.5 Sonnet model and the Artifacts feature in Claude.ai become available?","Both Claude 3.5 Sonnet and the Artifacts feature in Claude.ai became available on June 20th, 2024.","python:file://eval_end_to_end.py"
+"When putting words in Claude's mouth to shape the response, what header and value can you use in the request to limit Claude's response to a single token?","You can use ""max_tokens"": 1 in the request to limit Claude's response to a single token when putting words in its mouth.","python:file://eval_end_to_end.py"
+"What does the temperature parameter do when working with large language models?","Temperature is a parameter that controls the randomness of the model during generation.","python:file://eval_end_to_end.py"
+"What are two ways to specify API parameters when calling the Claude API using Claude for Sheets?","When calling the Claude API using Claude for Sheets, you can specify API parameters in two ways: 1) As additional arguments after the prompt and model in the CLAUDE() function, like =CLAUDE(prompt, model, ""max_tokens"", 3). 2) By passing in an API key to be used just for a specific cell, like ""api_key"", ""sk-ant-api03-j1W...""","python:file://eval_end_to_end.py"
+"How does prefilling the response with an opening curly brace ({ ) affect Claude's output when extracting structured data from text?","Prefilling Claude's response with { causes it to skip the preamble explanation and directly output the extracted data as a JSON object, resulting in a more concise response that is easier for programs to parse without additional processing.","python:file://eval_end_to_end.py"
+"What are some helpful resources provided by Anthropic to dive deeper into building with images using Claude?","Anthropic provides a multimodal cookbook with tips on getting started with images and best practices, as well as API reference documentation for the Messages API that includes example API calls involving images.","python:file://eval_end_to_end.py"
+"How do you specify the API key when creating a new Anthropic client in the Python and TypeScript SDK examples?","In both the Python and TypeScript examples, you can specify the API key as a string parameter when creating a new Anthropic client object. If no API key is provided, it defaults to using the ANTHROPIC_API_KEY environment variable.","python:file://eval_end_to_end.py"
+"What are two key benefits of using the Anthropic Evaluation tool when developing prompts for an AI classification application?","The Evaluation tool helps identify edge cases where the prompt might falter, and ensures consistent performance across a range of test case inputs. This allows you to refine the prompt for better reliability in the AI classification application.","python:file://eval_end_to_end.py"
+"What are the key differences between a pretrained language model like Claude's underlying model, and the final version of Claude available through Anthropic's API?","The pretrained language model that forms Claude's foundation is not inherently good at answering questions or following instructions. To create the helpful, honest and safe Claude assistant available through the API, the pretrained model underwent fine-tuning and reinforcement learning from human feedback (RLHF).","python:file://eval_end_to_end.py"
+"What is the IPv6 address range used by Anthropic?","The IPv6 address range used by Anthropic is 2607:6bc0::/48.","python:file://eval_end_to_end.py"
+"When using the Python SDK to create a message with Claude, what are two ways you can specify your API key?","When using the Python SDK, you can specify your API key either by passing it as the api_key parameter when initializing the Anthropic client, or by setting it as an environment variable named ANTHROPIC_API_KEY which the client will use by default.","python:file://eval_end_to_end.py"
diff --git a/skills/retrieval_augmented_generation/evaluation/promptfoo_datasets/retrieval_dataset.csv b/skills/retrieval_augmented_generation/evaluation/promptfoo_datasets/retrieval_dataset.csv
index 5e50def..8f94028 100644
--- a/skills/retrieval_augmented_generation/evaluation/promptfoo_datasets/retrieval_dataset.csv
+++ b/skills/retrieval_augmented_generation/evaluation/promptfoo_datasets/retrieval_dataset.csv
@@ -1,101 +1,101 @@
-query,correct_chunks,__expected
-"How can you create multiple test cases for an evaluation in the Anthropic Evaluation tool?","[""https://docs.claude.com/en/docs/test-and-evaluate/eval-tool#creating-test-cases"",""https://docs.claude.com/en/docs/build-with-claude/develop-tests#building-evals-and-test-cases""]","python:file://eval_retrieval.py"
-"What embeddings provider does Anthropic recommend for customized domain-specific models, and what capabilities does this provider offer?","[""https://docs.claude.com/en/docs/build-with-claude/embeddings#before-implementing-embeddings"",""https://docs.claude.com/en/docs/build-with-claude/embeddings#how-to-get-embeddings-with-anthropic""]","python:file://eval_retrieval.py"
-"What are some key success metrics to consider when evaluating Claude's performance on a classification task, and how do they relate to choosing the right model to reduce latency?","[""https://docs.claude.com/en/docs/about-claude/use-cases/classification#evaluation-metrics"",""https://docs.claude.com/en/docs/test-and-evaluate/strengthen-guardrails/reduce-latency#1-choose-the-right-model""]","python:file://eval_retrieval.py"
-"What are two ways that Claude for Sheets can improve prompt engineering workflows compared to using chained prompts?","[""https://docs.claude.com/en/docs/build-with-claude/claude-for-sheets#why-use-claude-for-sheets"",""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/chain-prompts#how-to-chain-prompts""]","python:file://eval_retrieval.py"
-"What happens if a prompt for the Text Completions API is missing the ""\n\nHuman:"" and ""\n\nAssistant:"" turns?","[""https://docs.claude.com/en/api/migrating-from-text-completions-to-messages#system-prompt"",""https://docs.claude.com/en/api/prompt-validation#examples""]","python:file://eval_retrieval.py"
-"How do the additional tokens required for tool use in Claude API requests impact pricing compared to regular API requests?","[""https://docs.claude.com/en/docs/build-with-claude/tool-use#pricing"",""https://docs.claude.com/en/docs/build-with-claude/tool-use#how-tool-use-works""]","python:file://eval_retrieval.py"
-"When will the new Anthropic Developer Console features that show API usage, billing details, and rate limits be available?","[""https://docs.claude.com/en/release-notes/api#june-27th-2024""]","python:file://eval_retrieval.py"
-"When deciding whether to use chain-of-thought (CoT) for a task, what are two key factors to consider in order to strike the right balance between performance and latency?","[""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/chain-of-thought#why-not-let-claude-think"",""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/chain-of-thought#before-implementing-cot""]","python:file://eval_retrieval.py"
-"How can I use Claude to more easily digest the content of long PDF documents?","[""https://docs.claude.com/en/docs/build-with-claude/text-generation#anthropic-cookbook"",""https://docs.claude.com/en/docs/build-with-claude/vision#before-you-upload""]","python:file://eval_retrieval.py"
-"According to the documentation, where can you view your organization's current API rate limits in the Claude Console?","[""https://docs.claude.com/en/api/rate-limits#about-our-limits"",""https://docs.claude.com/en/release-notes/api#june-27th-2024""]","python:file://eval_retrieval.py"
-"How can we measure the performance of the ticket classification system implemented using Claude beyond just accuracy?","[""https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#evaluation-methodology"",""https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#prompting-claude-for-ticket-routing""]","python:file://eval_retrieval.py"
-"How can you specify a system prompt using the Text Completions API versus the Messages API?","[""https://docs.claude.com/en/api/prompt-validation#examples"",""https://docs.claude.com/en/api/migrating-from-text-completions-to-messages#system-prompt""]","python:file://eval_retrieval.py"
-"How can you combine XML tags with chain of thought reasoning to create high-performance prompts for Claude?","[""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/use-xml-tags#tagging-best-practices"",""https://docs.claude.com/en/docs/build-with-claude/tool-use#chain-of-thought""]","python:file://eval_retrieval.py"
-"When evaluating the Claude model's performance for ticket routing, what three key metrics are calculated and what are the results for the claude-3-haiku-20240307 model on the 91 test samples?","[""https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#evaluation-methodology"",""https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#example-data""]","python:file://eval_retrieval.py"
-"Before starting to engineer and improve a prompt in Claude, what key things does Anthropic recommend you have in place first?","[""https://docs.claude.com/en/docs/build-with-claude/define-success#next-steps"",""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering#before-prompt-engineering""]","python:file://eval_retrieval.py"
-"How does the Messages API handle mid-response prompting compared to the Text Completions API?","[""https://docs.claude.com/en/api/migrating-from-text-completions-to-messages#inputs-and-outputs"",""https://docs.claude.com/en/api/migrating-from-text-completions-to-messages#putting-words-in-claudes-mouth""]","python:file://eval_retrieval.py"
-"How does Claude's response differ when given a role through a system prompt compared to not having a specific role in the financial analysis example?","[""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/system-prompts#example-2-financial-analysis""]","python:file://eval_retrieval.py"
-"What are some quantitative metrics that can be used to measure the success of a sentiment analysis model, and how might specific targets for those metrics be determined?","[""https://docs.claude.com/en/docs/build-with-claude/define-success#building-strong-criteria""]","python:file://eval_retrieval.py"
-"What is a power user tip mentioned in the documentation for creating high-performance prompts using XML tags?","[""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering#how-to-prompt-engineer"",""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/use-xml-tags#tagging-best-practices""]","python:file://eval_retrieval.py"
-"How can you use an LLM like Claude to automatically grade the outputs of other LLMs based on a rubric?","[""https://docs.claude.com/en/docs/build-with-claude/develop-tests#tips-for-llm-based-grading"",""https://docs.claude.com/en/api/messages-examples#multiple-conversational-turns""]","python:file://eval_retrieval.py"
-"How can you access and deploy Voyage embeddings on AWS Marketplace?","[""https://docs.claude.com/en/docs/build-with-claude/embeddings#voyage-on-the-aws-marketplace""]","python:file://eval_retrieval.py"
-"When using tools just to get Claude to produce JSON output following a particular schema, what key things should you do in terms of tool setup and prompting?","[""https://docs.claude.com/en/docs/build-with-claude/tool-use#tool-use-examples"",""https://docs.claude.com/en/docs/build-with-claude/tool-use#json-output""]","python:file://eval_retrieval.py"
-"What are the key differences between the legacy Claude Instant 1.2 model and the Claude 3 Haiku model in terms of capabilities and performance?","[""https://docs.claude.com/en/docs/about-claude/models#legacy-model-comparison"",""https://docs.claude.com/en/docs/about-claude/models#model-comparison"",""https://docs.claude.com/en/docs/about-claude/models#legacy-models""]","python:file://eval_retrieval.py"
-"What is one key benefit of using examples when prompt engineering with Claude?","[""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/multishot-prompting#why-use-examples""]","python:file://eval_retrieval.py"
-"According to the Claude Documentation, what is one key advantage of using prompt engineering instead of fine-tuning when it comes to adapting an AI model to new domains or tasks?","[""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering#when-to-prompt-engineer"",""https://docs.claude.com/en/docs/resources/glossary#fine-tuning""]","python:file://eval_retrieval.py"
-"How can I quickly get started using the Claude for Sheets extension with a pre-made template?","[""https://docs.claude.com/en/docs/build-with-claude/claude-for-sheets#claude-for-sheets-workbook-template"",""https://docs.claude.com/en/docs/build-with-claude/claude-for-sheets#get-started-with-claude-for-sheets""]","python:file://eval_retrieval.py"
-"How does the ""index"" field in the ""content_block_delta"" event relate to the text being streamed in a response?","[""https://docs.claude.com/en/api/messages-streaming#basic-streaming-request"",""https://docs.claude.com/en/api/messages-streaming#text-delta""]","python:file://eval_retrieval.py"
-"How can you include an image as part of a Claude API request, and what image formats are currently supported?","[""https://docs.claude.com/en/api/messages-examples#vision"",""https://docs.claude.com/en/docs/build-with-claude/vision#about-the-prompt-examples""]","python:file://eval_retrieval.py"
-"What is the relationship between time to first token (TTFT) and latency when evaluating a language model's performance?","[""https://docs.claude.com/en/docs/resources/glossary#ttft-time-to-first-token"",""https://docs.claude.com/en/docs/test-and-evaluate/strengthen-guardrails/reduce-latency#how-to-measure-latency"",""https://docs.claude.com/en/docs/resources/glossary#latency""]","python:file://eval_retrieval.py"
-"How can providing Claude with examples of handling certain edge cases like implicit requests or emotional prioritization help improve its performance in routing support tickets?","[""https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#adapting-to-common-scenarios"",""https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#prompting-claude-for-ticket-routing""]","python:file://eval_retrieval.py"
-"How does the stop_reason of ""tool_use"" relate to the overall workflow of integrating external tools with Claude?","[""https://docs.claude.com/en/api/messages-examples#tool-use-and-json-mode"",""https://docs.claude.com/en/docs/build-with-claude/tool-use#how-tool-use-works""]","python:file://eval_retrieval.py"
-"According to the documentation, what error event and corresponding HTTP error code may be sent during periods of high usage for the Claude API when using streaming responses?","[""https://docs.claude.com/en/api/messages-streaming#error-events"",""https://docs.claude.com/en/api/streaming#error-event-types"",""https://docs.claude.com/en/api/errors#http-errors""]","python:file://eval_retrieval.py"
-"What are the two types of deltas that can be contained in a content_block_delta event when streaming responses from the Claude API?","[""https://docs.claude.com/en/api/messages-streaming#text-delta"",""https://docs.claude.com/en/api/messages-streaming#delta-types""]","python:file://eval_retrieval.py"
-"On what date did Claude 3.5 Sonnet and tool use both become generally available across the Claude API, Amazon Bedrock, and Google Vertex AI?","[""https://docs.claude.com/en/release-notes/api#june-20th-2024"",""https://docs.claude.com/en/release-notes/api#may-30th-2024""]","python:file://eval_retrieval.py"
-"In what order did Anthropic launch Claude.ai and the Claude iOS app in Canada and Europe?","[""https://docs.claude.com/en/release-notes/claude-apps#june-5th-2024"",""https://docs.claude.com/en/release-notes/claude-apps#may-13th-2024""]","python:file://eval_retrieval.py"
-"When the API response from Claude has a stop_reason of ""tool_use"", what does this indicate and what should be done next to continue the conversation?","[""https://docs.claude.com/en/docs/build-with-claude/tool-use#json-output"",""https://docs.claude.com/en/docs/build-with-claude/tool-use#how-tool-use-works""]","python:file://eval_retrieval.py"
-"What Python libraries are used in the example code snippet for evaluating tone and style in a customer service chatbot?","[""https://docs.claude.com/en/docs/build-with-claude/develop-tests#example-evals""]","python:file://eval_retrieval.py"
-"What are the two main ways to authenticate when using the Anthropic Python SDK to access Claude models on Amazon Bedrock?","[""https://docs.claude.com/en/api/claude-on-amazon-bedrock#install-an-sdk-for-accessing-bedrock"",""https://docs.claude.com/en/api/claude-on-amazon-bedrock#making-requests""]","python:file://eval_retrieval.py"
-"When deciding whether to implement leak-resistant prompt engineering strategies, what two factors should be considered and balanced?","[""https://docs.claude.com/en/docs/test-and-evaluate/strengthen-guardrails/reduce-prompt-leak#strategies-to-reduce-prompt-leak"",""https://docs.claude.com/en/docs/test-and-evaluate/strengthen-guardrails/reduce-prompt-leak#before-you-try-to-reduce-prompt-leak""]","python:file://eval_retrieval.py"
-"How can selecting the appropriate Claude model based on your specific requirements help reduce latency in your application?","[""https://docs.claude.com/en/docs/test-and-evaluate/strengthen-guardrails/reduce-latency#1-choose-the-right-model"",""https://docs.claude.com/en/docs/intro-to-claude#model-options""]","python:file://eval_retrieval.py"
-"How can you stream responses from the Claude API using the Python SDK?","[""https://docs.claude.com/en/api/messages-streaming#streaming-with-sdks"",""https://docs.claude.com/en/api/client-sdks#python""]","python:file://eval_retrieval.py"
-"How can you guide Claude's response by pre-filling part of the response, and what API parameter is used to generate a short response in this case?","[""https://docs.claude.com/en/api/messages-examples#putting-words-in-claudes-mouth"",""https://docs.claude.com/en/api/messages-examples#basic-request-and-response""]","python:file://eval_retrieval.py"
-"What is more important when building an eval set for an AI system - having a larger number of test cases with automated grading, or having fewer high-quality test cases graded by humans?","[""https://docs.claude.com/en/docs/build-with-claude/develop-tests#eval-design-principles"",""https://docs.claude.com/en/docs/build-with-claude/develop-tests#building-evals-and-test-cases""]","python:file://eval_retrieval.py"
-"What are the two required fields in a content_block_delta event for a text delta type?","[""https://docs.claude.com/en/api/messages-streaming#delta-types"",""https://docs.claude.com/en/api/messages-streaming#text-delta""]","python:file://eval_retrieval.py"
-"What are two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings?","[""https://docs.claude.com/en/docs/quickstart#next-steps"",""https://docs.claude.com/en/docs/welcome#develop-with-claude""]","python:file://eval_retrieval.py"
-"Why does breaking a task into distinct subtasks for chained prompts help improve Claude's accuracy on the overall task?","[""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/chain-prompts#how-to-chain-prompts"",""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/chain-prompts#why-chain-prompts""]","python:file://eval_retrieval.py"
-"How does the streaming format for Messages responses differ from Text Completions streaming responses?","[""https://docs.claude.com/en/api/migrating-from-text-completions-to-messages#streaming-format""]","python:file://eval_retrieval.py"
-"What are two ways to start experimenting with Claude as a user, according to Anthropic's documentation?","[""https://docs.claude.com/en/docs/about-claude/models#get-started-with-claude""]","python:file://eval_retrieval.py"
-"How can using chain prompts help reduce errors and inconsistency in complex tasks handled by Claude?","[""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/chain-prompts#why-chain-prompts"",""https://docs.claude.com/en/docs/test-and-evaluate/strengthen-guardrails/increase-consistency#chain-prompts-for-complex-tasks""]","python:file://eval_retrieval.py"
-"What HTTP status code does an overloaded_error event correspond to in a non-streaming context for the Claude API?","[""https://docs.claude.com/en/api/streaming#error-event-types"",""https://docs.claude.com/en/api/messages-streaming#error-events""]","python:file://eval_retrieval.py"
-"What are the two ways to specify the format in which Voyage AI returns embeddings through its HTTP API?","[""https://docs.claude.com/en/docs/build-with-claude/embeddings#voyage-http-api""]","python:file://eval_retrieval.py"
-"When streaming API requests that use tools, how are the input JSON deltas for tool_use content blocks sent, and how can they be accumulated and parsed by the client?","[""https://docs.claude.com/en/api/messages-streaming#input-json-delta"",""https://docs.claude.com/en/api/messages-streaming#streaming-request-with-tool-use""]","python:file://eval_retrieval.py"
-"What are the two interactive prompt engineering tutorials that Anthropic offers, and how do they differ?","[""https://docs.claude.com/en/docs/build-with-claude/claude-for-sheets#prompt-engineering-interactive-tutorial"",""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering#prompt-engineering-tutorial""]","python:file://eval_retrieval.py"
-"What are some of the key capabilities that make Claude suitable for enterprise use cases requiring integration with specialized applications and processing of large volumes of sensitive data?","[""https://docs.claude.com/en/docs/intro-to-claude#enterprise-considerations""]","python:file://eval_retrieval.py"
-"As of June 2024, in which regions are Anthropic's Claude.ai API and iOS app available?","[""https://docs.claude.com/en/release-notes/claude-apps#may-1st-2024"",""https://docs.claude.com/en/release-notes/claude-apps#june-5th-2024"",""https://docs.claude.com/en/release-notes/claude-apps#may-13th-2024""]","python:file://eval_retrieval.py"
-"What are the two main approaches for integrating Claude into a support ticket workflow, and how do they differ in terms of scalability and ease of implementation?","[""https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#integrate-claude-into-your-support-workflow"",""https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#introduction""]","python:file://eval_retrieval.py"
-"When did Anthropic release a prompt generator tool to help guide Claude in generating high-quality prompts, and through what interface is it available?","[""https://docs.claude.com/en/release-notes/api#may-10th-2024""]","python:file://eval_retrieval.py"
-"Which Claude 3 model provides the best balance of intelligence and speed for high-throughput tasks like sales forecasting and targeted marketing?","[""https://docs.claude.com/en/api/claude-on-vertex-ai#api-model-names"",""https://docs.claude.com/en/docs/intro-to-claude#claude-3-family""]","python:file://eval_retrieval.py"
-"How can you calculate the similarity between two Voyage embedding vectors, and what is this equivalent to since Voyage embeddings are normalized to length 1?","[""https://docs.claude.com/en/docs/build-with-claude/embeddings#faq"",""https://docs.claude.com/en/docs/build-with-claude/embeddings#voyage-embedding-example""]","python:file://eval_retrieval.py"
-"How can using examples in prompts improve Claude's performance on complex tasks?","[""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/multishot-prompting#why-use-examples"",""https://docs.claude.com/en/docs/test-and-evaluate/strengthen-guardrails/increase-consistency#chain-prompts-for-complex-tasks""]","python:file://eval_retrieval.py"
-"What are the two types of content block deltas that can be emitted when streaming responses with tool use, and what does each delta type contain?","[""https://docs.claude.com/en/api/messages-streaming#input-json-delta"",""https://docs.claude.com/en/api/messages-streaming#text-delta"",""https://docs.claude.com/en/api/messages-streaming#streaming-request-with-tool-use"",""https://docs.claude.com/en/api/messages-streaming#delta-types""]","python:file://eval_retrieval.py"
-"What are two key capabilities of Claude that enable it to build interactive systems and personalized user experiences?","[""https://docs.claude.com/en/docs/build-with-claude/text-generation#text-capabilities-and-use-cases""]","python:file://eval_retrieval.py"
-"What are the key event types included in a raw HTTP stream response when using message streaming, and what is the typical order they occur in?","[""https://docs.claude.com/en/api/messages-streaming#event-types"",""https://docs.claude.com/en/api/messages-streaming#raw-http-stream-response""]","python:file://eval_retrieval.py"
-"What is the maximum number of images that can be included in a single request using the Claude API compared to the claude.ai interface?","[""https://docs.claude.com/en/docs/build-with-claude/vision#about-the-prompt-examples"",""https://docs.claude.com/en/docs/build-with-claude/vision#faq""]","python:file://eval_retrieval.py"
-"When Claude's response is cut off due to hitting the max_tokens limit and contains an incomplete tool use block, what should you do to get the full tool use?","[""https://docs.claude.com/en/docs/build-with-claude/tool-use#troubleshooting-errors""]","python:file://eval_retrieval.py"
-"What two steps are needed before running a classification evaluation on Claude according to the documentation?","[""https://docs.claude.com/en/docs/about-claude/use-cases/classification#3-run-your-eval"",""https://docs.claude.com/en/docs/about-claude/use-cases/classification#2-develop-your-test-cases""]","python:file://eval_retrieval.py"
-"How can you use the content parameter in the messages list to influence Claude's response?","[""https://docs.claude.com/en/api/messages-examples#basic-request-and-response"",""https://docs.claude.com/en/api/messages-examples#putting-words-in-claudes-mouth""]","python:file://eval_retrieval.py"
-"What are two key advantages of prompt engineering over fine-tuning when it comes to model comprehension and general knowledge preservation?","[""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering#when-to-prompt-engineer"",""https://docs.claude.com/en/docs/resources/glossary#fine-tuning""]","python:file://eval_retrieval.py"
-"What are the two main steps to get started with making requests to Claude models on Anthropic's Bedrock API?","[""https://docs.claude.com/en/api/claude-on-amazon-bedrock#install-and-configure-the-aws-cli"",""https://docs.claude.com/en/api/claude-on-amazon-bedrock#making-requests""]","python:file://eval_retrieval.py"
-"How can you check which Claude models are available in a specific AWS region using the AWS CLI?","[""https://docs.claude.com/en/api/claude-on-amazon-bedrock#subscribe-to-anthropic-models"",""https://docs.claude.com/en/api/claude-on-amazon-bedrock#list-available-models""]","python:file://eval_retrieval.py"
-"What argument can be passed to the voyageai.Client.embed() method or the Voyage HTTP API to specify whether the input text is a query or a document?","[""https://docs.claude.com/en/docs/build-with-claude/embeddings#voyage-python-package"",""https://docs.claude.com/en/docs/build-with-claude/embeddings#voyage-http-api""]","python:file://eval_retrieval.py"
-"How do the streaming API delta formats differ between tool_use content blocks and text content blocks?","[""https://docs.claude.com/en/api/messages-streaming#input-json-delta"",""https://docs.claude.com/en/api/messages-streaming#text-delta""]","python:file://eval_retrieval.py"
-"What are the image file size limits when uploading images to Claude using the API versus on claude.ai?","[""https://docs.claude.com/en/docs/build-with-claude/vision#faq""]","python:file://eval_retrieval.py"
-"What is one key consideration when selecting a Claude model for an enterprise use case that needs low latency?","[""https://docs.claude.com/en/docs/intro-to-claude#model-options"",""https://docs.claude.com/en/docs/test-and-evaluate/strengthen-guardrails/reduce-latency#1-choose-the-right-model""]","python:file://eval_retrieval.py"
-"What embedding model does Anthropic recommend for code retrieval, and how does its performance compare to alternatives according to Voyage AI?","[""https://docs.claude.com/en/docs/build-with-claude/embeddings#how-to-get-embeddings-with-anthropic"",""https://docs.claude.com/en/docs/build-with-claude/embeddings#available-voyage-models""]","python:file://eval_retrieval.py"
-"What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?","[""https://docs.claude.com/en/docs/welcome#develop-with-claude"",""https://docs.claude.com/en/docs/quickstart#next-steps""]","python:file://eval_retrieval.py"
-"How does the size of the context window impact a language model's ability to utilize retrieval augmented generation (RAG)?","[""https://docs.claude.com/en/docs/resources/glossary#context-window"",""https://docs.claude.com/en/docs/resources/glossary#rag-retrieval-augmented-generation""]","python:file://eval_retrieval.py"
-"How can the Evaluation tool in Anthropic's Claude platform help improve prompts and build more robust AI applications?","[""https://docs.claude.com/en/docs/test-and-evaluate/eval-tool#understanding-results"",""https://docs.claude.com/en/docs/test-and-evaluate/eval-tool#creating-test-cases""]","python:file://eval_retrieval.py"
-"Which Claude model has the fastest comparative latency according to the comparison tables?","[""https://docs.claude.com/en/docs/about-claude/models#model-comparison"",""https://docs.claude.com/en/docs/about-claude/models#legacy-model-comparison""]","python:file://eval_retrieval.py"
-"How can you build up a conversation with multiple turns using the Anthropic Messages API in Python?","[""https://docs.claude.com/en/api/client-sdks#python"",""https://docs.claude.com/en/api/messages-examples#multiple-conversational-turns""]","python:file://eval_retrieval.py"
-"How can using XML tags to provide a specific role or context help improve Claude's analysis of a legal contract compared to not using a role prompt?","[""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/use-xml-tags#examples"",""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/system-prompts#example-1-legal-contract-analysis""]","python:file://eval_retrieval.py"
-"What are the key differences between how Claude 3 Opus and Claude 3 Sonnet handle missing information when making tool calls?","[""https://docs.claude.com/en/docs/build-with-claude/tool-use#chain-of-thought"",""https://docs.claude.com/en/docs/build-with-claude/tool-use#tool-use-examples""]","python:file://eval_retrieval.py"
-"What steps should be taken to ensure a reliable deployment of an automated ticket routing system using Claude into a production environment?","[""https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#additional-considerations"",""https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#integrate-claude-into-your-support-workflow""]","python:file://eval_retrieval.py"
-"How should you evaluate a model's performance on a ticket routing classifier?","[""https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#evaluating-the-performance-of-your-ticket-routing-classifier"",""https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#integrate-claude-into-your-support-workflow""]","python:file://eval_retrieval.py"
-"What two methods does Anthropic recommend for learning how to prompt engineer with Claude before diving into the techniques?","[""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering#how-to-prompt-engineer"",""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering#prompt-engineering-tutorial""]","python:file://eval_retrieval.py"
-"What are the key differences between a pretrained large language model and Claude in terms of their training and capabilities?","[""https://docs.claude.com/en/docs/resources/glossary#llm"",""https://docs.claude.com/en/docs/resources/glossary#pretraining""]","python:file://eval_retrieval.py"
-"What are some key advantages of using prompt engineering instead of fine-tuning to adapt a pretrained language model for a specific task or domain?","[""https://docs.claude.com/en/docs/resources/glossary#fine-tuning"",""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering#when-to-prompt-engineer"",""https://docs.claude.com/en/docs/resources/glossary#pretraining""]","python:file://eval_retrieval.py"
-"How can you authenticate with GCP before running requests to access Claude models on Vertex AI?","[""https://docs.claude.com/en/api/claude-on-vertex-ai#making-requests"",""https://docs.claude.com/en/api/claude-on-vertex-ai#accessing-vertex-ai""]","python:file://eval_retrieval.py"
-"What new capabilities and features were introduced by Anthropic on May 10th, 2024 and how do they enable users to create and tailor prompts for specific tasks?","[""https://docs.claude.com/en/release-notes/api#may-10th-2024""]","python:file://eval_retrieval.py"
-"On what date did both the Claude 3.5 Sonnet model and the Artifacts feature in Claude.ai become available?","[""https://docs.claude.com/en/release-notes/api#june-20th-2024"",""https://docs.claude.com/en/release-notes/claude-apps#june-20th-2024""]","python:file://eval_retrieval.py"
-"When putting words in Claude's mouth to shape the response, what header and value can you use in the request to limit Claude's response to a single token?","[""https://docs.claude.com/en/api/messages-examples#basic-request-and-response"",""https://docs.claude.com/en/api/messages-examples#putting-words-in-claudes-mouth""]","python:file://eval_retrieval.py"
-"What does the temperature parameter do when working with large language models?","[""https://docs.claude.com/en/docs/resources/glossary#temperature"",""https://docs.claude.com/en/docs/test-and-evaluate/strengthen-guardrails/reduce-latency#2-optimize-prompt-and-output-length""]","python:file://eval_retrieval.py"
-"What are two ways to specify API parameters when calling the Claude API using Claude for Sheets?","[""https://docs.claude.com/en/docs/test-and-evaluate/eval-tool#tips-for-effective-evaluation"",""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/prefill-claudes-response#how-to-prefill-claudes-response"",""https://docs.claude.com/en/docs/build-with-claude/claude-for-sheets#enter-your-first-prompt""]","python:file://eval_retrieval.py"
-"How does prefilling the response with an opening curly brace ({ ) affect Claude's output when extracting structured data from text?","[""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/prefill-claudes-response#example-1-controlling-output-formatting-and-skipping-the-preamble""]","python:file://eval_retrieval.py"
-"What are some helpful resources provided by Anthropic to dive deeper into building with images using Claude?","[""https://docs.claude.com/en/docs/build-with-claude/vision#dive-deeper-into-vision"",""https://docs.claude.com/en/docs/build-with-claude/vision#about-the-prompt-examples""]","python:file://eval_retrieval.py"
-"How do you specify the API key when creating a new Anthropic client in the Python and TypeScript SDK examples?","[""https://docs.claude.com/en/api/client-sdks#typescript"",""https://docs.claude.com/en/api/client-sdks#python""]","python:file://eval_retrieval.py"
-"What are two key benefits of using the Anthropic Evaluation tool when developing prompts for an AI classification application?","[""https://docs.claude.com/en/docs/about-claude/use-cases/classification#2-develop-your-test-cases"",""https://docs.claude.com/en/docs/test-and-evaluate/eval-tool#understanding-results""]","python:file://eval_retrieval.py"
-"What are the key differences between a pretrained language model like Claude's underlying model, and the final version of Claude available through Anthropic's API?","[""https://docs.claude.com/en/docs/resources/glossary#pretraining"",""https://docs.claude.com/en/docs/resources/glossary#llm"",""https://docs.claude.com/en/docs/resources/glossary#fine-tuning""]","python:file://eval_retrieval.py"
-"What is the IPv6 address range used by Anthropic?","[""https://docs.claude.com/en/api/ip-addresses#ipv6""]","python:file://eval_retrieval.py"
-"When using the Python SDK to create a message with Claude, what are two ways you can specify your API key?","[""https://docs.claude.com/en/api/messages-examples#multiple-conversational-turns"",""https://docs.claude.com/en/api/client-sdks#python""]","python:file://eval_retrieval.py"
+query,correct_chunks,__expected
+"How can you create multiple test cases for an evaluation in the Anthropic Evaluation tool?","[""https://docs.claude.com/en/docs/test-and-evaluate/eval-tool#creating-test-cases"",""https://docs.claude.com/en/docs/build-with-claude/develop-tests#building-evals-and-test-cases""]","python:file://eval_retrieval.py"
+"What embeddings provider does Anthropic recommend for customized domain-specific models, and what capabilities does this provider offer?","[""https://docs.claude.com/en/docs/build-with-claude/embeddings#before-implementing-embeddings"",""https://docs.claude.com/en/docs/build-with-claude/embeddings#how-to-get-embeddings-with-anthropic""]","python:file://eval_retrieval.py"
+"What are some key success metrics to consider when evaluating Claude's performance on a classification task, and how do they relate to choosing the right model to reduce latency?","[""https://docs.claude.com/en/docs/about-claude/use-cases/classification#evaluation-metrics"",""https://docs.claude.com/en/docs/test-and-evaluate/strengthen-guardrails/reduce-latency#1-choose-the-right-model""]","python:file://eval_retrieval.py"
+"What are two ways that Claude for Sheets can improve prompt engineering workflows compared to using chained prompts?","[""https://docs.claude.com/en/docs/build-with-claude/claude-for-sheets#why-use-claude-for-sheets"",""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/chain-prompts#how-to-chain-prompts""]","python:file://eval_retrieval.py"
+"What happens if a prompt for the Text Completions API is missing the ""\n\nHuman:"" and ""\n\nAssistant:"" turns?","[""https://docs.claude.com/en/api/migrating-from-text-completions-to-messages#system-prompt"",""https://docs.claude.com/en/api/prompt-validation#examples""]","python:file://eval_retrieval.py"
+"How do the additional tokens required for tool use in Claude API requests impact pricing compared to regular API requests?","[""https://docs.claude.com/en/docs/build-with-claude/tool-use#pricing"",""https://docs.claude.com/en/docs/build-with-claude/tool-use#how-tool-use-works""]","python:file://eval_retrieval.py"
+"When will the new Anthropic Developer Console features that show API usage, billing details, and rate limits be available?","[""https://docs.claude.com/en/release-notes/api#june-27th-2024""]","python:file://eval_retrieval.py"
+"When deciding whether to use chain-of-thought (CoT) for a task, what are two key factors to consider in order to strike the right balance between performance and latency?","[""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/chain-of-thought#why-not-let-claude-think"",""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/chain-of-thought#before-implementing-cot""]","python:file://eval_retrieval.py"
+"How can I use Claude to more easily digest the content of long PDF documents?","[""https://docs.claude.com/en/docs/build-with-claude/text-generation#anthropic-cookbook"",""https://docs.claude.com/en/docs/build-with-claude/vision#before-you-upload""]","python:file://eval_retrieval.py"
+"According to the documentation, where can you view your organization's current API rate limits in the Claude Console?","[""https://docs.claude.com/en/api/rate-limits#about-our-limits"",""https://docs.claude.com/en/release-notes/api#june-27th-2024""]","python:file://eval_retrieval.py"
+"How can we measure the performance of the ticket classification system implemented using Claude beyond just accuracy?","[""https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#evaluation-methodology"",""https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#prompting-claude-for-ticket-routing""]","python:file://eval_retrieval.py"
+"How can you specify a system prompt using the Text Completions API versus the Messages API?","[""https://docs.claude.com/en/api/prompt-validation#examples"",""https://docs.claude.com/en/api/migrating-from-text-completions-to-messages#system-prompt""]","python:file://eval_retrieval.py"
+"How can you combine XML tags with chain of thought reasoning to create high-performance prompts for Claude?","[""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/use-xml-tags#tagging-best-practices"",""https://docs.claude.com/en/docs/build-with-claude/tool-use#chain-of-thought""]","python:file://eval_retrieval.py"
+"When evaluating the Claude model's performance for ticket routing, what three key metrics are calculated and what are the results for the claude-3-haiku-20240307 model on the 91 test samples?","[""https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#evaluation-methodology"",""https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#example-data""]","python:file://eval_retrieval.py"
+"Before starting to engineer and improve a prompt in Claude, what key things does Anthropic recommend you have in place first?","[""https://docs.claude.com/en/docs/build-with-claude/define-success#next-steps"",""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering#before-prompt-engineering""]","python:file://eval_retrieval.py"
+"How does the Messages API handle mid-response prompting compared to the Text Completions API?","[""https://docs.claude.com/en/api/migrating-from-text-completions-to-messages#inputs-and-outputs"",""https://docs.claude.com/en/api/migrating-from-text-completions-to-messages#putting-words-in-claudes-mouth""]","python:file://eval_retrieval.py"
+"How does Claude's response differ when given a role through a system prompt compared to not having a specific role in the financial analysis example?","[""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/system-prompts#example-2-financial-analysis""]","python:file://eval_retrieval.py"
+"What are some quantitative metrics that can be used to measure the success of a sentiment analysis model, and how might specific targets for those metrics be determined?","[""https://docs.claude.com/en/docs/build-with-claude/define-success#building-strong-criteria""]","python:file://eval_retrieval.py"
+"What is a power user tip mentioned in the documentation for creating high-performance prompts using XML tags?","[""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering#how-to-prompt-engineer"",""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/use-xml-tags#tagging-best-practices""]","python:file://eval_retrieval.py"
+"How can you use an LLM like Claude to automatically grade the outputs of other LLMs based on a rubric?","[""https://docs.claude.com/en/docs/build-with-claude/develop-tests#tips-for-llm-based-grading"",""https://docs.claude.com/en/api/messages-examples#multiple-conversational-turns""]","python:file://eval_retrieval.py"
+"How can you access and deploy Voyage embeddings on AWS Marketplace?","[""https://docs.claude.com/en/docs/build-with-claude/embeddings#voyage-on-the-aws-marketplace""]","python:file://eval_retrieval.py"
+"When using tools just to get Claude to produce JSON output following a particular schema, what key things should you do in terms of tool setup and prompting?","[""https://docs.claude.com/en/docs/build-with-claude/tool-use#tool-use-examples"",""https://docs.claude.com/en/docs/build-with-claude/tool-use#json-output""]","python:file://eval_retrieval.py"
+"What are the key differences between the legacy Claude Instant 1.2 model and the Claude 3 Haiku model in terms of capabilities and performance?","[""https://docs.claude.com/en/docs/about-claude/models#legacy-model-comparison"",""https://docs.claude.com/en/docs/about-claude/models#model-comparison"",""https://docs.claude.com/en/docs/about-claude/models#legacy-models""]","python:file://eval_retrieval.py"
+"What is one key benefit of using examples when prompt engineering with Claude?","[""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/multishot-prompting#why-use-examples""]","python:file://eval_retrieval.py"
+"According to the Claude Documentation, what is one key advantage of using prompt engineering instead of fine-tuning when it comes to adapting an AI model to new domains or tasks?","[""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering#when-to-prompt-engineer"",""https://docs.claude.com/en/docs/resources/glossary#fine-tuning""]","python:file://eval_retrieval.py"
+"How can I quickly get started using the Claude for Sheets extension with a pre-made template?","[""https://docs.claude.com/en/docs/build-with-claude/claude-for-sheets#claude-for-sheets-workbook-template"",""https://docs.claude.com/en/docs/build-with-claude/claude-for-sheets#get-started-with-claude-for-sheets""]","python:file://eval_retrieval.py"
+"How does the ""index"" field in the ""content_block_delta"" event relate to the text being streamed in a response?","[""https://docs.claude.com/en/api/messages-streaming#basic-streaming-request"",""https://docs.claude.com/en/api/messages-streaming#text-delta""]","python:file://eval_retrieval.py"
+"How can you include an image as part of a Claude API request, and what image formats are currently supported?","[""https://docs.claude.com/en/api/messages-examples#vision"",""https://docs.claude.com/en/docs/build-with-claude/vision#about-the-prompt-examples""]","python:file://eval_retrieval.py"
+"What is the relationship between time to first token (TTFT) and latency when evaluating a language model's performance?","[""https://docs.claude.com/en/docs/resources/glossary#ttft-time-to-first-token"",""https://docs.claude.com/en/docs/test-and-evaluate/strengthen-guardrails/reduce-latency#how-to-measure-latency"",""https://docs.claude.com/en/docs/resources/glossary#latency""]","python:file://eval_retrieval.py"
+"How can providing Claude with examples of handling certain edge cases like implicit requests or emotional prioritization help improve its performance in routing support tickets?","[""https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#adapting-to-common-scenarios"",""https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#prompting-claude-for-ticket-routing""]","python:file://eval_retrieval.py"
+"How does the stop_reason of ""tool_use"" relate to the overall workflow of integrating external tools with Claude?","[""https://docs.claude.com/en/api/messages-examples#tool-use-and-json-mode"",""https://docs.claude.com/en/docs/build-with-claude/tool-use#how-tool-use-works""]","python:file://eval_retrieval.py"
+"According to the documentation, what error event and corresponding HTTP error code may be sent during periods of high usage for the Claude API when using streaming responses?","[""https://docs.claude.com/en/api/messages-streaming#error-events"",""https://docs.claude.com/en/api/streaming#error-event-types"",""https://docs.claude.com/en/api/errors#http-errors""]","python:file://eval_retrieval.py"
+"What are the two types of deltas that can be contained in a content_block_delta event when streaming responses from the Claude API?","[""https://docs.claude.com/en/api/messages-streaming#text-delta"",""https://docs.claude.com/en/api/messages-streaming#delta-types""]","python:file://eval_retrieval.py"
+"On what date did Claude 3.5 Sonnet and tool use both become generally available across the Claude API, Amazon Bedrock, and Google Vertex AI?","[""https://docs.claude.com/en/release-notes/api#june-20th-2024"",""https://docs.claude.com/en/release-notes/api#may-30th-2024""]","python:file://eval_retrieval.py"
+"In what order did Anthropic launch Claude.ai and the Claude iOS app in Canada and Europe?","[""https://docs.claude.com/en/release-notes/claude-apps#june-5th-2024"",""https://docs.claude.com/en/release-notes/claude-apps#may-13th-2024""]","python:file://eval_retrieval.py"
+"When the API response from Claude has a stop_reason of ""tool_use"", what does this indicate and what should be done next to continue the conversation?","[""https://docs.claude.com/en/docs/build-with-claude/tool-use#json-output"",""https://docs.claude.com/en/docs/build-with-claude/tool-use#how-tool-use-works""]","python:file://eval_retrieval.py"
+"What Python libraries are used in the example code snippet for evaluating tone and style in a customer service chatbot?","[""https://docs.claude.com/en/docs/build-with-claude/develop-tests#example-evals""]","python:file://eval_retrieval.py"
+"What are the two main ways to authenticate when using the Anthropic Python SDK to access Claude models on Amazon Bedrock?","[""https://docs.claude.com/en/api/claude-on-amazon-bedrock#install-an-sdk-for-accessing-bedrock"",""https://docs.claude.com/en/api/claude-on-amazon-bedrock#making-requests""]","python:file://eval_retrieval.py"
+"When deciding whether to implement leak-resistant prompt engineering strategies, what two factors should be considered and balanced?","[""https://docs.claude.com/en/docs/test-and-evaluate/strengthen-guardrails/reduce-prompt-leak#strategies-to-reduce-prompt-leak"",""https://docs.claude.com/en/docs/test-and-evaluate/strengthen-guardrails/reduce-prompt-leak#before-you-try-to-reduce-prompt-leak""]","python:file://eval_retrieval.py"
+"How can selecting the appropriate Claude model based on your specific requirements help reduce latency in your application?","[""https://docs.claude.com/en/docs/test-and-evaluate/strengthen-guardrails/reduce-latency#1-choose-the-right-model"",""https://docs.claude.com/en/docs/intro-to-claude#model-options""]","python:file://eval_retrieval.py"
+"How can you stream responses from the Claude API using the Python SDK?","[""https://docs.claude.com/en/api/messages-streaming#streaming-with-sdks"",""https://docs.claude.com/en/api/client-sdks#python""]","python:file://eval_retrieval.py"
+"How can you guide Claude's response by pre-filling part of the response, and what API parameter is used to generate a short response in this case?","[""https://docs.claude.com/en/api/messages-examples#putting-words-in-claudes-mouth"",""https://docs.claude.com/en/api/messages-examples#basic-request-and-response""]","python:file://eval_retrieval.py"
+"What is more important when building an eval set for an AI system - having a larger number of test cases with automated grading, or having fewer high-quality test cases graded by humans?","[""https://docs.claude.com/en/docs/build-with-claude/develop-tests#eval-design-principles"",""https://docs.claude.com/en/docs/build-with-claude/develop-tests#building-evals-and-test-cases""]","python:file://eval_retrieval.py"
+"What are the two required fields in a content_block_delta event for a text delta type?","[""https://docs.claude.com/en/api/messages-streaming#delta-types"",""https://docs.claude.com/en/api/messages-streaming#text-delta""]","python:file://eval_retrieval.py"
+"What are two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings?","[""https://docs.claude.com/en/docs/quickstart#next-steps"",""https://docs.claude.com/en/docs/welcome#develop-with-claude""]","python:file://eval_retrieval.py"
+"Why does breaking a task into distinct subtasks for chained prompts help improve Claude's accuracy on the overall task?","[""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/chain-prompts#how-to-chain-prompts"",""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/chain-prompts#why-chain-prompts""]","python:file://eval_retrieval.py"
+"How does the streaming format for Messages responses differ from Text Completions streaming responses?","[""https://docs.claude.com/en/api/migrating-from-text-completions-to-messages#streaming-format""]","python:file://eval_retrieval.py"
+"What are two ways to start experimenting with Claude as a user, according to Anthropic's documentation?","[""https://docs.claude.com/en/docs/about-claude/models#get-started-with-claude""]","python:file://eval_retrieval.py"
+"How can using chain prompts help reduce errors and inconsistency in complex tasks handled by Claude?","[""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/chain-prompts#why-chain-prompts"",""https://docs.claude.com/en/docs/test-and-evaluate/strengthen-guardrails/increase-consistency#chain-prompts-for-complex-tasks""]","python:file://eval_retrieval.py"
+"What HTTP status code does an overloaded_error event correspond to in a non-streaming context for the Claude API?","[""https://docs.claude.com/en/api/streaming#error-event-types"",""https://docs.claude.com/en/api/messages-streaming#error-events""]","python:file://eval_retrieval.py"
+"What are the two ways to specify the format in which Voyage AI returns embeddings through its HTTP API?","[""https://docs.claude.com/en/docs/build-with-claude/embeddings#voyage-http-api""]","python:file://eval_retrieval.py"
+"When streaming API requests that use tools, how are the input JSON deltas for tool_use content blocks sent, and how can they be accumulated and parsed by the client?","[""https://docs.claude.com/en/api/messages-streaming#input-json-delta"",""https://docs.claude.com/en/api/messages-streaming#streaming-request-with-tool-use""]","python:file://eval_retrieval.py"
+"What are the two interactive prompt engineering tutorials that Anthropic offers, and how do they differ?","[""https://docs.claude.com/en/docs/build-with-claude/claude-for-sheets#prompt-engineering-interactive-tutorial"",""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering#prompt-engineering-tutorial""]","python:file://eval_retrieval.py"
+"What are some of the key capabilities that make Claude suitable for enterprise use cases requiring integration with specialized applications and processing of large volumes of sensitive data?","[""https://docs.claude.com/en/docs/intro-to-claude#enterprise-considerations""]","python:file://eval_retrieval.py"
+"As of June 2024, in which regions are Anthropic's Claude.ai API and iOS app available?","[""https://docs.claude.com/en/release-notes/claude-apps#may-1st-2024"",""https://docs.claude.com/en/release-notes/claude-apps#june-5th-2024"",""https://docs.claude.com/en/release-notes/claude-apps#may-13th-2024""]","python:file://eval_retrieval.py"
+"What are the two main approaches for integrating Claude into a support ticket workflow, and how do they differ in terms of scalability and ease of implementation?","[""https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#integrate-claude-into-your-support-workflow"",""https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#introduction""]","python:file://eval_retrieval.py"
+"When did Anthropic release a prompt generator tool to help guide Claude in generating high-quality prompts, and through what interface is it available?","[""https://docs.claude.com/en/release-notes/api#may-10th-2024""]","python:file://eval_retrieval.py"
+"Which Claude 3 model provides the best balance of intelligence and speed for high-throughput tasks like sales forecasting and targeted marketing?","[""https://docs.claude.com/en/api/claude-on-vertex-ai#api-model-names"",""https://docs.claude.com/en/docs/intro-to-claude#claude-3-family""]","python:file://eval_retrieval.py"
+"How can you calculate the similarity between two Voyage embedding vectors, and what is this equivalent to since Voyage embeddings are normalized to length 1?","[""https://docs.claude.com/en/docs/build-with-claude/embeddings#faq"",""https://docs.claude.com/en/docs/build-with-claude/embeddings#voyage-embedding-example""]","python:file://eval_retrieval.py"
+"How can using examples in prompts improve Claude's performance on complex tasks?","[""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/multishot-prompting#why-use-examples"",""https://docs.claude.com/en/docs/test-and-evaluate/strengthen-guardrails/increase-consistency#chain-prompts-for-complex-tasks""]","python:file://eval_retrieval.py"
+"What are the two types of content block deltas that can be emitted when streaming responses with tool use, and what does each delta type contain?","[""https://docs.claude.com/en/api/messages-streaming#input-json-delta"",""https://docs.claude.com/en/api/messages-streaming#text-delta"",""https://docs.claude.com/en/api/messages-streaming#streaming-request-with-tool-use"",""https://docs.claude.com/en/api/messages-streaming#delta-types""]","python:file://eval_retrieval.py"
+"What are two key capabilities of Claude that enable it to build interactive systems and personalized user experiences?","[""https://docs.claude.com/en/docs/build-with-claude/text-generation#text-capabilities-and-use-cases""]","python:file://eval_retrieval.py"
+"What are the key event types included in a raw HTTP stream response when using message streaming, and what is the typical order they occur in?","[""https://docs.claude.com/en/api/messages-streaming#event-types"",""https://docs.claude.com/en/api/messages-streaming#raw-http-stream-response""]","python:file://eval_retrieval.py"
+"What is the maximum number of images that can be included in a single request using the Claude API compared to the claude.ai interface?","[""https://docs.claude.com/en/docs/build-with-claude/vision#about-the-prompt-examples"",""https://docs.claude.com/en/docs/build-with-claude/vision#faq""]","python:file://eval_retrieval.py"
+"When Claude's response is cut off due to hitting the max_tokens limit and contains an incomplete tool use block, what should you do to get the full tool use?","[""https://docs.claude.com/en/docs/build-with-claude/tool-use#troubleshooting-errors""]","python:file://eval_retrieval.py"
+"What two steps are needed before running a classification evaluation on Claude according to the documentation?","[""https://docs.claude.com/en/docs/about-claude/use-cases/classification#3-run-your-eval"",""https://docs.claude.com/en/docs/about-claude/use-cases/classification#2-develop-your-test-cases""]","python:file://eval_retrieval.py"
+"How can you use the content parameter in the messages list to influence Claude's response?","[""https://docs.claude.com/en/api/messages-examples#basic-request-and-response"",""https://docs.claude.com/en/api/messages-examples#putting-words-in-claudes-mouth""]","python:file://eval_retrieval.py"
+"What are two key advantages of prompt engineering over fine-tuning when it comes to model comprehension and general knowledge preservation?","[""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering#when-to-prompt-engineer"",""https://docs.claude.com/en/docs/resources/glossary#fine-tuning""]","python:file://eval_retrieval.py"
+"What are the two main steps to get started with making requests to Claude models on Anthropic's Bedrock API?","[""https://docs.claude.com/en/api/claude-on-amazon-bedrock#install-and-configure-the-aws-cli"",""https://docs.claude.com/en/api/claude-on-amazon-bedrock#making-requests""]","python:file://eval_retrieval.py"
+"How can you check which Claude models are available in a specific AWS region using the AWS CLI?","[""https://docs.claude.com/en/api/claude-on-amazon-bedrock#subscribe-to-anthropic-models"",""https://docs.claude.com/en/api/claude-on-amazon-bedrock#list-available-models""]","python:file://eval_retrieval.py"
+"What argument can be passed to the voyageai.Client.embed() method or the Voyage HTTP API to specify whether the input text is a query or a document?","[""https://docs.claude.com/en/docs/build-with-claude/embeddings#voyage-python-package"",""https://docs.claude.com/en/docs/build-with-claude/embeddings#voyage-http-api""]","python:file://eval_retrieval.py"
+"How do the streaming API delta formats differ between tool_use content blocks and text content blocks?","[""https://docs.claude.com/en/api/messages-streaming#input-json-delta"",""https://docs.claude.com/en/api/messages-streaming#text-delta""]","python:file://eval_retrieval.py"
+"What are the image file size limits when uploading images to Claude using the API versus on claude.ai?","[""https://docs.claude.com/en/docs/build-with-claude/vision#faq""]","python:file://eval_retrieval.py"
+"What is one key consideration when selecting a Claude model for an enterprise use case that needs low latency?","[""https://docs.claude.com/en/docs/intro-to-claude#model-options"",""https://docs.claude.com/en/docs/test-and-evaluate/strengthen-guardrails/reduce-latency#1-choose-the-right-model""]","python:file://eval_retrieval.py"
+"What embedding model does Anthropic recommend for code retrieval, and how does its performance compare to alternatives according to Voyage AI?","[""https://docs.claude.com/en/docs/build-with-claude/embeddings#how-to-get-embeddings-with-anthropic"",""https://docs.claude.com/en/docs/build-with-claude/embeddings#available-voyage-models""]","python:file://eval_retrieval.py"
+"What are two ways the Claude Cookbooks can help developers learn to use Anthropic's APIs?","[""https://docs.claude.com/en/docs/welcome#develop-with-claude"",""https://docs.claude.com/en/docs/quickstart#next-steps""]","python:file://eval_retrieval.py"
+"How does the size of the context window impact a language model's ability to utilize retrieval augmented generation (RAG)?","[""https://docs.claude.com/en/docs/resources/glossary#context-window"",""https://docs.claude.com/en/docs/resources/glossary#rag-retrieval-augmented-generation""]","python:file://eval_retrieval.py"
+"How can the Evaluation tool in Anthropic's Claude platform help improve prompts and build more robust AI applications?","[""https://docs.claude.com/en/docs/test-and-evaluate/eval-tool#understanding-results"",""https://docs.claude.com/en/docs/test-and-evaluate/eval-tool#creating-test-cases""]","python:file://eval_retrieval.py"
+"Which Claude model has the fastest comparative latency according to the comparison tables?","[""https://docs.claude.com/en/docs/about-claude/models#model-comparison"",""https://docs.claude.com/en/docs/about-claude/models#legacy-model-comparison""]","python:file://eval_retrieval.py"
+"How can you build up a conversation with multiple turns using the Anthropic Messages API in Python?","[""https://docs.claude.com/en/api/client-sdks#python"",""https://docs.claude.com/en/api/messages-examples#multiple-conversational-turns""]","python:file://eval_retrieval.py"
+"How can using XML tags to provide a specific role or context help improve Claude's analysis of a legal contract compared to not using a role prompt?","[""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/use-xml-tags#examples"",""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/system-prompts#example-1-legal-contract-analysis""]","python:file://eval_retrieval.py"
+"What are the key differences between how Claude 3 Opus and Claude 3 Sonnet handle missing information when making tool calls?","[""https://docs.claude.com/en/docs/build-with-claude/tool-use#chain-of-thought"",""https://docs.claude.com/en/docs/build-with-claude/tool-use#tool-use-examples""]","python:file://eval_retrieval.py"
+"What steps should be taken to ensure a reliable deployment of an automated ticket routing system using Claude into a production environment?","[""https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#additional-considerations"",""https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#integrate-claude-into-your-support-workflow""]","python:file://eval_retrieval.py"
+"How should you evaluate a model's performance on a ticket routing classifier?","[""https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#evaluating-the-performance-of-your-ticket-routing-classifier"",""https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#integrate-claude-into-your-support-workflow""]","python:file://eval_retrieval.py"
+"What two methods does Anthropic recommend for learning how to prompt engineer with Claude before diving into the techniques?","[""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering#how-to-prompt-engineer"",""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering#prompt-engineering-tutorial""]","python:file://eval_retrieval.py"
+"What are the key differences between a pretrained large language model and Claude in terms of their training and capabilities?","[""https://docs.claude.com/en/docs/resources/glossary#llm"",""https://docs.claude.com/en/docs/resources/glossary#pretraining""]","python:file://eval_retrieval.py"
+"What are some key advantages of using prompt engineering instead of fine-tuning to adapt a pretrained language model for a specific task or domain?","[""https://docs.claude.com/en/docs/resources/glossary#fine-tuning"",""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering#when-to-prompt-engineer"",""https://docs.claude.com/en/docs/resources/glossary#pretraining""]","python:file://eval_retrieval.py"
+"How can you authenticate with GCP before running requests to access Claude models on Vertex AI?","[""https://docs.claude.com/en/api/claude-on-vertex-ai#making-requests"",""https://docs.claude.com/en/api/claude-on-vertex-ai#accessing-vertex-ai""]","python:file://eval_retrieval.py"
+"What new capabilities and features were introduced by Anthropic on May 10th, 2024 and how do they enable users to create and tailor prompts for specific tasks?","[""https://docs.claude.com/en/release-notes/api#may-10th-2024""]","python:file://eval_retrieval.py"
+"On what date did both the Claude 3.5 Sonnet model and the Artifacts feature in Claude.ai become available?","[""https://docs.claude.com/en/release-notes/api#june-20th-2024"",""https://docs.claude.com/en/release-notes/claude-apps#june-20th-2024""]","python:file://eval_retrieval.py"
+"When putting words in Claude's mouth to shape the response, what header and value can you use in the request to limit Claude's response to a single token?","[""https://docs.claude.com/en/api/messages-examples#basic-request-and-response"",""https://docs.claude.com/en/api/messages-examples#putting-words-in-claudes-mouth""]","python:file://eval_retrieval.py"
+"What does the temperature parameter do when working with large language models?","[""https://docs.claude.com/en/docs/resources/glossary#temperature"",""https://docs.claude.com/en/docs/test-and-evaluate/strengthen-guardrails/reduce-latency#2-optimize-prompt-and-output-length""]","python:file://eval_retrieval.py"
+"What are two ways to specify API parameters when calling the Claude API using Claude for Sheets?","[""https://docs.claude.com/en/docs/test-and-evaluate/eval-tool#tips-for-effective-evaluation"",""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/prefill-claudes-response#how-to-prefill-claudes-response"",""https://docs.claude.com/en/docs/build-with-claude/claude-for-sheets#enter-your-first-prompt""]","python:file://eval_retrieval.py"
+"How does prefilling the response with an opening curly brace ({ ) affect Claude's output when extracting structured data from text?","[""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/prefill-claudes-response#example-1-controlling-output-formatting-and-skipping-the-preamble""]","python:file://eval_retrieval.py"
+"What are some helpful resources provided by Anthropic to dive deeper into building with images using Claude?","[""https://docs.claude.com/en/docs/build-with-claude/vision#dive-deeper-into-vision"",""https://docs.claude.com/en/docs/build-with-claude/vision#about-the-prompt-examples""]","python:file://eval_retrieval.py"
+"How do you specify the API key when creating a new Anthropic client in the Python and TypeScript SDK examples?","[""https://docs.claude.com/en/api/client-sdks#typescript"",""https://docs.claude.com/en/api/client-sdks#python""]","python:file://eval_retrieval.py"
+"What are two key benefits of using the Anthropic Evaluation tool when developing prompts for an AI classification application?","[""https://docs.claude.com/en/docs/about-claude/use-cases/classification#2-develop-your-test-cases"",""https://docs.claude.com/en/docs/test-and-evaluate/eval-tool#understanding-results""]","python:file://eval_retrieval.py"
+"What are the key differences between a pretrained language model like Claude's underlying model, and the final version of Claude available through Anthropic's API?","[""https://docs.claude.com/en/docs/resources/glossary#pretraining"",""https://docs.claude.com/en/docs/resources/glossary#llm"",""https://docs.claude.com/en/docs/resources/glossary#fine-tuning""]","python:file://eval_retrieval.py"
+"What is the IPv6 address range used by Anthropic?","[""https://docs.claude.com/en/api/ip-addresses#ipv6""]","python:file://eval_retrieval.py"
+"When using the Python SDK to create a message with Claude, what are two ways you can specify your API key?","[""https://docs.claude.com/en/api/messages-examples#multiple-conversational-turns"",""https://docs.claude.com/en/api/client-sdks#python""]","python:file://eval_retrieval.py"
diff --git a/skills/retrieval_augmented_generation/guide.ipynb b/skills/retrieval_augmented_generation/guide.ipynb
index e35907e..92765ae 100644
--- a/skills/retrieval_augmented_generation/guide.ipynb
+++ b/skills/retrieval_augmented_generation/guide.ipynb
@@ -1933,7 +1933,7 @@
"text": [
"\n",
"\n",
- "The generated answer is incorrect. While it correctly mentions the Claude Cookbook as one interactive learning resource, it fails to mention the Developer Console and its prompt generator tool, which is a key component mentioned in the correct answer. Instead, it references the \"More Resources\" section and documentation, which weren't identified in the correct answer as interactive learning methods. The generated answer therefore misses one of the two main interactive learning tools specified in the correct answer.\n",
+ "The generated answer is incorrect. While it correctly mentions the Claude Cookbooks as one interactive learning resource, it fails to mention the Developer Console and its prompt generator tool, which is a key component mentioned in the correct answer. Instead, it references the \"More Resources\" section and documentation, which weren't identified in the correct answer as interactive learning methods. The generated answer therefore misses one of the two main interactive learning tools specified in the correct answer.\n",
"false\n",
"\n",
"\n"
@@ -3804,7 +3804,7 @@
"text": [
"\n",
"\n",
- "The Generated Answer is correct as it conveys the same core message as the Correct Answer. Both answers emphasize that Claude can be used to summarize PDF documents, making it easier to understand long documents without reading everything. While the Generated Answer provides additional details about text analysis capabilities and mentions the Claude Cookbook, these are supplementary details that don't contradict the core message. The essential functionality - uploading PDFs and getting summaries to more easily digest long documents - is accurately captured in both answers.\n",
+ "The Generated Answer is correct as it conveys the same core message as the Correct Answer. Both answers emphasize that Claude can be used to summarize PDF documents, making it easier to understand long documents without reading everything. While the Generated Answer provides additional details about text analysis capabilities and mentions the Claude Cookbooks, these are supplementary details that don't contradict the core message. The essential functionality - uploading PDFs and getting summaries to more easily digest long documents - is accurately captured in both answers.\n",
"true\n",
"\n",
"\n"
@@ -4633,7 +4633,7 @@
"text": [
"\n",
"\n",
- "The Generated Answer is incorrect because it misses a critical piece of information from the Correct Answer. While it correctly mentions the Claude Cookbook as one interactive way to learn Claude's capabilities, it completely fails to mention the Developer Console and its prompt generator tool, which is the second key interactive learning method specified in the Correct Answer. Instead, it incorrectly references \"Claude for Sheets usage examples\" as the second method, which wasn't mentioned in the Correct Answer at all. The omission of the Developer Console and the inclusion of incorrect information makes this answer incomplete and partially inaccurate.\n",
+ "The Generated Answer is incorrect because it misses a critical piece of information from the Correct Answer. While it correctly mentions the Claude Cookbooks as one interactive way to learn Claude's capabilities, it completely fails to mention the Developer Console and its prompt generator tool, which is the second key interactive learning method specified in the Correct Answer. Instead, it incorrectly references \"Claude for Sheets usage examples\" as the second method, which wasn't mentioned in the Correct Answer at all. The omission of the Developer Console and the inclusion of incorrect information makes this answer incomplete and partially inaccurate.\n",
"false\n",
"\n",
"\n"
@@ -5298,7 +5298,7 @@
"text": [
"\n",
"\n",
- "The Generated Answer is essentially correct. Both answers highlight that the Claude Cookbook provides interactive Jupyter notebooks that demonstrate API functionality, specifically mentioning PDF uploads and embeddings. While the Generated Answer splits this into two points and adds some additional context about hands-on learning, the core information matches the Correct Answer. There are no contradictions or missing critical pieces of information between the two answers - they're conveying the same fundamental message about how the Cookbook helps developers learn through interactive notebooks and demonstrations.\n",
+ "The Generated Answer is essentially correct. Both answers highlight that the Claude Cookbooks provides interactive Jupyter notebooks that demonstrate API functionality, specifically mentioning PDF uploads and embeddings. While the Generated Answer splits this into two points and adds some additional context about hands-on learning, the core information matches the Correct Answer. There are no contradictions or missing critical pieces of information between the two answers - they're conveying the same fundamental message about how the Cookbook helps developers learn through interactive notebooks and demonstrations.\n",
"true\n",
"\n",
"\n"
@@ -8845,7 +8845,7 @@
"\n",
"The Generated Answer is correct. It captures the two key interactive ways to learn Claude's capabilities that were mentioned in the Correct Answer:\n",
"\n",
- "1. The Claude Cookbook with its interactive Jupyter notebooks\n",
+ "1. The Claude Cookbooks with its interactive Jupyter notebooks\n",
"2. The Developer Console with its prompt generator tool\n",
"\n",
"The Generated Answer actually provides slightly more detail than the Correct Answer, but the core substance is the same. The mention of VoyageAI and additional details about the Developer Console don't contradict the Correct Answer - they're just supplementary information. Both answers focus on the same two main interactive learning methods, and there are no critical omissions or contradictions between them.\n",