mirror of
https://github.com/anthropics/claude-cookbooks.git
synced 2025-10-06 01:00:28 +03:00
32 KiB
| line | query | correct_chunks | __expected |
|---|---|---|---|
| 2 | How can you create multiple test cases for an evaluation in the Anthropic Evaluation tool? | ["https://docs.claude.com/en/docs/test-and-evaluate/eval-tool#creating-test-cases","https://docs.claude.com/en/docs/build-with-claude/develop-tests#building-evals-and-test-cases"] | python:file://eval_retrieval.py |
| 3 | What embeddings provider does Anthropic recommend for customized domain-specific models, and what capabilities does this provider offer? | ["https://docs.claude.com/en/docs/build-with-claude/embeddings#before-implementing-embeddings","https://docs.claude.com/en/docs/build-with-claude/embeddings#how-to-get-embeddings-with-anthropic"] | python:file://eval_retrieval.py |
| 4 | What are some key success metrics to consider when evaluating Claude's performance on a classification task, and how do they relate to choosing the right model to reduce latency? | ["https://docs.claude.com/en/docs/about-claude/use-cases/classification#evaluation-metrics","https://docs.claude.com/en/docs/test-and-evaluate/strengthen-guardrails/reduce-latency#1-choose-the-right-model"] | python:file://eval_retrieval.py |
| 5 | What are two ways that Claude for Sheets can improve prompt engineering workflows compared to using chained prompts? | ["https://docs.claude.com/en/docs/build-with-claude/claude-for-sheets#why-use-claude-for-sheets","https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/chain-prompts#how-to-chain-prompts"] | python:file://eval_retrieval.py |
| 6 | What happens if a prompt for the Text Completions API is missing the "\n\nHuman:" and "\n\nAssistant:" turns? | ["https://docs.claude.com/en/api/migrating-from-text-completions-to-messages#system-prompt","https://docs.claude.com/en/api/prompt-validation#examples"] | python:file://eval_retrieval.py |
| 7 | How do the additional tokens required for tool use in Claude API requests impact pricing compared to regular API requests? | ["https://docs.claude.com/en/docs/build-with-claude/tool-use#pricing","https://docs.claude.com/en/docs/build-with-claude/tool-use#how-tool-use-works"] | python:file://eval_retrieval.py |
| 8 | When will the new Anthropic Developer Console features that show API usage, billing details, and rate limits be available? | ["https://docs.claude.com/en/release-notes/api#june-27th-2024"] | python:file://eval_retrieval.py |
| 9 | When deciding whether to use chain-of-thought (CoT) for a task, what are two key factors to consider in order to strike the right balance between performance and latency? | ["https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/chain-of-thought#why-not-let-claude-think","https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/chain-of-thought#before-implementing-cot"] | python:file://eval_retrieval.py |
| 10 | How can I use Claude to more easily digest the content of long PDF documents? | ["https://docs.claude.com/en/docs/build-with-claude/text-generation#anthropic-cookbook","https://docs.claude.com/en/docs/build-with-claude/vision#before-you-upload"] | python:file://eval_retrieval.py |
| 11 | According to the documentation, where can you view your organization's current API rate limits in the Claude Console? | ["https://docs.claude.com/en/api/rate-limits#about-our-limits","https://docs.claude.com/en/release-notes/api#june-27th-2024"] | python:file://eval_retrieval.py |
| 12 | How can we measure the performance of the ticket classification system implemented using Claude beyond just accuracy? | ["https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#evaluation-methodology","https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#prompting-claude-for-ticket-routing"] | python:file://eval_retrieval.py |
| 13 | How can you specify a system prompt using the Text Completions API versus the Messages API? | ["https://docs.claude.com/en/api/prompt-validation#examples","https://docs.claude.com/en/api/migrating-from-text-completions-to-messages#system-prompt"] | python:file://eval_retrieval.py |
| 14 | How can you combine XML tags with chain of thought reasoning to create high-performance prompts for Claude? | ["https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/use-xml-tags#tagging-best-practices","https://docs.claude.com/en/docs/build-with-claude/tool-use#chain-of-thought"] | python:file://eval_retrieval.py |
| 15 | When evaluating the Claude model's performance for ticket routing, what three key metrics are calculated and what are the results for the claude-3-haiku-20240307 model on the 91 test samples? | ["https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#evaluation-methodology","https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#example-data"] | python:file://eval_retrieval.py |
| 16 | Before starting to engineer and improve a prompt in Claude, what key things does Anthropic recommend you have in place first? | ["https://docs.claude.com/en/docs/build-with-claude/define-success#next-steps","https://docs.claude.com/en/docs/build-with-claude/prompt-engineering#before-prompt-engineering"] | python:file://eval_retrieval.py |
| 17 | How does the Messages API handle mid-response prompting compared to the Text Completions API? | ["https://docs.claude.com/en/api/migrating-from-text-completions-to-messages#inputs-and-outputs","https://docs.claude.com/en/api/migrating-from-text-completions-to-messages#putting-words-in-claudes-mouth"] | python:file://eval_retrieval.py |
| 18 | How does Claude's response differ when given a role through a system prompt compared to not having a specific role in the financial analysis example? | ["https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/system-prompts#example-2-financial-analysis"] | python:file://eval_retrieval.py |
| 19 | What are some quantitative metrics that can be used to measure the success of a sentiment analysis model, and how might specific targets for those metrics be determined? | ["https://docs.claude.com/en/docs/build-with-claude/define-success#building-strong-criteria"] | python:file://eval_retrieval.py |
| 20 | What is a power user tip mentioned in the documentation for creating high-performance prompts using XML tags? | ["https://docs.claude.com/en/docs/build-with-claude/prompt-engineering#how-to-prompt-engineer","https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/use-xml-tags#tagging-best-practices"] | python:file://eval_retrieval.py |
| 21 | How can you use an LLM like Claude to automatically grade the outputs of other LLMs based on a rubric? | ["https://docs.claude.com/en/docs/build-with-claude/develop-tests#tips-for-llm-based-grading","https://docs.claude.com/en/api/messages-examples#multiple-conversational-turns"] | python:file://eval_retrieval.py |
| 22 | How can you access and deploy Voyage embeddings on AWS Marketplace? | ["https://docs.claude.com/en/docs/build-with-claude/embeddings#voyage-on-the-aws-marketplace"] | python:file://eval_retrieval.py |
| 23 | When using tools just to get Claude to produce JSON output following a particular schema, what key things should you do in terms of tool setup and prompting? | ["https://docs.claude.com/en/docs/build-with-claude/tool-use#tool-use-examples","https://docs.claude.com/en/docs/build-with-claude/tool-use#json-output"] | python:file://eval_retrieval.py |
| 24 | What are the key differences between the legacy Claude Instant 1.2 model and the Claude 3 Haiku model in terms of capabilities and performance? | ["https://docs.claude.com/en/docs/about-claude/models#legacy-model-comparison","https://docs.claude.com/en/docs/about-claude/models#model-comparison","https://docs.claude.com/en/docs/about-claude/models#legacy-models"] | python:file://eval_retrieval.py |
| 25 | What is one key benefit of using examples when prompt engineering with Claude? | ["https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/multishot-prompting#why-use-examples"] | python:file://eval_retrieval.py |
| 26 | According to the Claude Documentation, what is one key advantage of using prompt engineering instead of fine-tuning when it comes to adapting an AI model to new domains or tasks? | ["https://docs.claude.com/en/docs/build-with-claude/prompt-engineering#when-to-prompt-engineer","https://docs.claude.com/en/docs/resources/glossary#fine-tuning"] | python:file://eval_retrieval.py |
| 27 | How can I quickly get started using the Claude for Sheets extension with a pre-made template? | ["https://docs.claude.com/en/docs/build-with-claude/claude-for-sheets#claude-for-sheets-workbook-template","https://docs.claude.com/en/docs/build-with-claude/claude-for-sheets#get-started-with-claude-for-sheets"] | python:file://eval_retrieval.py |
| 28 | How does the "index" field in the "content_block_delta" event relate to the text being streamed in a response? | ["https://docs.claude.com/en/api/messages-streaming#basic-streaming-request","https://docs.claude.com/en/api/messages-streaming#text-delta"] | python:file://eval_retrieval.py |
| 29 | How can you include an image as part of a Claude API request, and what image formats are currently supported? | ["https://docs.claude.com/en/api/messages-examples#vision","https://docs.claude.com/en/docs/build-with-claude/vision#about-the-prompt-examples"] | python:file://eval_retrieval.py |
| 30 | What is the relationship between time to first token (TTFT) and latency when evaluating a language model's performance? | ["https://docs.claude.com/en/docs/resources/glossary#ttft-time-to-first-token","https://docs.claude.com/en/docs/test-and-evaluate/strengthen-guardrails/reduce-latency#how-to-measure-latency","https://docs.claude.com/en/docs/resources/glossary#latency"] | python:file://eval_retrieval.py |
| 31 | How can providing Claude with examples of handling certain edge cases like implicit requests or emotional prioritization help improve its performance in routing support tickets? | ["https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#adapting-to-common-scenarios","https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#prompting-claude-for-ticket-routing"] | python:file://eval_retrieval.py |
| 32 | How does the stop_reason of "tool_use" relate to the overall workflow of integrating external tools with Claude? | ["https://docs.claude.com/en/api/messages-examples#tool-use-and-json-mode","https://docs.claude.com/en/docs/build-with-claude/tool-use#how-tool-use-works"] | python:file://eval_retrieval.py |
| 33 | According to the documentation, what error event and corresponding HTTP error code may be sent during periods of high usage for the Claude API when using streaming responses? | ["https://docs.claude.com/en/api/messages-streaming#error-events","https://docs.claude.com/en/api/streaming#error-event-types","https://docs.claude.com/en/api/errors#http-errors"] | python:file://eval_retrieval.py |
| 34 | What are the two types of deltas that can be contained in a content_block_delta event when streaming responses from the Claude API? | ["https://docs.claude.com/en/api/messages-streaming#text-delta","https://docs.claude.com/en/api/messages-streaming#delta-types"] | python:file://eval_retrieval.py |
| 35 | On what date did Claude 3.5 Sonnet and tool use both become generally available across the Claude API, Amazon Bedrock, and Google Vertex AI? | ["https://docs.claude.com/en/release-notes/api#june-20th-2024","https://docs.claude.com/en/release-notes/api#may-30th-2024"] | python:file://eval_retrieval.py |
| 36 | In what order did Anthropic launch Claude.ai and the Claude iOS app in Canada and Europe? | ["https://docs.claude.com/en/release-notes/claude-apps#june-5th-2024","https://docs.claude.com/en/release-notes/claude-apps#may-13th-2024"] | python:file://eval_retrieval.py |
| 37 | When the API response from Claude has a stop_reason of "tool_use", what does this indicate and what should be done next to continue the conversation? | ["https://docs.claude.com/en/docs/build-with-claude/tool-use#json-output","https://docs.claude.com/en/docs/build-with-claude/tool-use#how-tool-use-works"] | python:file://eval_retrieval.py |
| 38 | What Python libraries are used in the example code snippet for evaluating tone and style in a customer service chatbot? | ["https://docs.claude.com/en/docs/build-with-claude/develop-tests#example-evals"] | python:file://eval_retrieval.py |
| 39 | What are the two main ways to authenticate when using the Anthropic Python SDK to access Claude models on Amazon Bedrock? | ["https://docs.claude.com/en/api/claude-on-amazon-bedrock#install-an-sdk-for-accessing-bedrock","https://docs.claude.com/en/api/claude-on-amazon-bedrock#making-requests"] | python:file://eval_retrieval.py |
| 40 | When deciding whether to implement leak-resistant prompt engineering strategies, what two factors should be considered and balanced? | ["https://docs.claude.com/en/docs/test-and-evaluate/strengthen-guardrails/reduce-prompt-leak#strategies-to-reduce-prompt-leak","https://docs.claude.com/en/docs/test-and-evaluate/strengthen-guardrails/reduce-prompt-leak#before-you-try-to-reduce-prompt-leak"] | python:file://eval_retrieval.py |
| 41 | How can selecting the appropriate Claude model based on your specific requirements help reduce latency in your application? | ["https://docs.claude.com/en/docs/test-and-evaluate/strengthen-guardrails/reduce-latency#1-choose-the-right-model","https://docs.claude.com/en/docs/intro-to-claude#model-options"] | python:file://eval_retrieval.py |
| 42 | How can you stream responses from the Claude API using the Python SDK? | ["https://docs.claude.com/en/api/messages-streaming#streaming-with-sdks","https://docs.claude.com/en/api/client-sdks#python"] | python:file://eval_retrieval.py |
| 43 | How can you guide Claude's response by pre-filling part of the response, and what API parameter is used to generate a short response in this case? | ["https://docs.claude.com/en/api/messages-examples#putting-words-in-claudes-mouth","https://docs.claude.com/en/api/messages-examples#basic-request-and-response"] | python:file://eval_retrieval.py |
| 44 | What is more important when building an eval set for an AI system - having a larger number of test cases with automated grading, or having fewer high-quality test cases graded by humans? | ["https://docs.claude.com/en/docs/build-with-claude/develop-tests#eval-design-principles","https://docs.claude.com/en/docs/build-with-claude/develop-tests#building-evals-and-test-cases"] | python:file://eval_retrieval.py |
| 45 | What are the two required fields in a content_block_delta event for a text delta type? | ["https://docs.claude.com/en/api/messages-streaming#delta-types","https://docs.claude.com/en/api/messages-streaming#text-delta"] | python:file://eval_retrieval.py |
| 46 | What are two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings? | ["https://docs.claude.com/en/docs/quickstart#next-steps","https://docs.claude.com/en/docs/welcome#develop-with-claude"] | python:file://eval_retrieval.py |
| 47 | Why does breaking a task into distinct subtasks for chained prompts help improve Claude's accuracy on the overall task? | ["https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/chain-prompts#how-to-chain-prompts","https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/chain-prompts#why-chain-prompts"] | python:file://eval_retrieval.py |
| 48 | How does the streaming format for Messages responses differ from Text Completions streaming responses? | ["https://docs.claude.com/en/api/migrating-from-text-completions-to-messages#streaming-format"] | python:file://eval_retrieval.py |
| 49 | What are two ways to start experimenting with Claude as a user, according to Anthropic's documentation? | ["https://docs.claude.com/en/docs/about-claude/models#get-started-with-claude"] | python:file://eval_retrieval.py |
| 50 | How can using chain prompts help reduce errors and inconsistency in complex tasks handled by Claude? | ["https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/chain-prompts#why-chain-prompts","https://docs.claude.com/en/docs/test-and-evaluate/strengthen-guardrails/increase-consistency#chain-prompts-for-complex-tasks"] | python:file://eval_retrieval.py |
| 51 | What HTTP status code does an overloaded_error event correspond to in a non-streaming context for the Claude API? | ["https://docs.claude.com/en/api/streaming#error-event-types","https://docs.claude.com/en/api/messages-streaming#error-events"] | python:file://eval_retrieval.py |
| 52 | What are the two ways to specify the format in which Voyage AI returns embeddings through its HTTP API? | ["https://docs.claude.com/en/docs/build-with-claude/embeddings#voyage-http-api"] | python:file://eval_retrieval.py |
| 53 | When streaming API requests that use tools, how are the input JSON deltas for tool_use content blocks sent, and how can they be accumulated and parsed by the client? | ["https://docs.claude.com/en/api/messages-streaming#input-json-delta","https://docs.claude.com/en/api/messages-streaming#streaming-request-with-tool-use"] | python:file://eval_retrieval.py |
| 54 | What are the two interactive prompt engineering tutorials that Anthropic offers, and how do they differ? | ["https://docs.claude.com/en/docs/build-with-claude/claude-for-sheets#prompt-engineering-interactive-tutorial","https://docs.claude.com/en/docs/build-with-claude/prompt-engineering#prompt-engineering-tutorial"] | python:file://eval_retrieval.py |
| 55 | What are some of the key capabilities that make Claude suitable for enterprise use cases requiring integration with specialized applications and processing of large volumes of sensitive data? | ["https://docs.claude.com/en/docs/intro-to-claude#enterprise-considerations"] | python:file://eval_retrieval.py |
| 56 | As of June 2024, in which regions are Anthropic's Claude.ai API and iOS app available? | ["https://docs.claude.com/en/release-notes/claude-apps#may-1st-2024","https://docs.claude.com/en/release-notes/claude-apps#june-5th-2024","https://docs.claude.com/en/release-notes/claude-apps#may-13th-2024"] | python:file://eval_retrieval.py |
| 57 | What are the two main approaches for integrating Claude into a support ticket workflow, and how do they differ in terms of scalability and ease of implementation? | ["https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#integrate-claude-into-your-support-workflow","https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#introduction"] | python:file://eval_retrieval.py |
| 58 | When did Anthropic release a prompt generator tool to help guide Claude in generating high-quality prompts, and through what interface is it available? | ["https://docs.claude.com/en/release-notes/api#may-10th-2024"] | python:file://eval_retrieval.py |
| 59 | Which Claude 3 model provides the best balance of intelligence and speed for high-throughput tasks like sales forecasting and targeted marketing? | ["https://docs.claude.com/en/api/claude-on-vertex-ai#api-model-names","https://docs.claude.com/en/docs/intro-to-claude#claude-3-family"] | python:file://eval_retrieval.py |
| 60 | How can you calculate the similarity between two Voyage embedding vectors, and what is this equivalent to since Voyage embeddings are normalized to length 1? | ["https://docs.claude.com/en/docs/build-with-claude/embeddings#faq","https://docs.claude.com/en/docs/build-with-claude/embeddings#voyage-embedding-example"] | python:file://eval_retrieval.py |
| 61 | How can using examples in prompts improve Claude's performance on complex tasks? | ["https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/multishot-prompting#why-use-examples","https://docs.claude.com/en/docs/test-and-evaluate/strengthen-guardrails/increase-consistency#chain-prompts-for-complex-tasks"] | python:file://eval_retrieval.py |
| 62 | What are the two types of content block deltas that can be emitted when streaming responses with tool use, and what does each delta type contain? | ["https://docs.claude.com/en/api/messages-streaming#input-json-delta","https://docs.claude.com/en/api/messages-streaming#text-delta","https://docs.claude.com/en/api/messages-streaming#streaming-request-with-tool-use","https://docs.claude.com/en/api/messages-streaming#delta-types"] | python:file://eval_retrieval.py |
| 63 | What are two key capabilities of Claude that enable it to build interactive systems and personalized user experiences? | ["https://docs.claude.com/en/docs/build-with-claude/text-generation#text-capabilities-and-use-cases"] | python:file://eval_retrieval.py |
| 64 | What are the key event types included in a raw HTTP stream response when using message streaming, and what is the typical order they occur in? | ["https://docs.claude.com/en/api/messages-streaming#event-types","https://docs.claude.com/en/api/messages-streaming#raw-http-stream-response"] | python:file://eval_retrieval.py |
| 65 | What is the maximum number of images that can be included in a single request using the Claude API compared to the claude.ai interface? | ["https://docs.claude.com/en/docs/build-with-claude/vision#about-the-prompt-examples","https://docs.claude.com/en/docs/build-with-claude/vision#faq"] | python:file://eval_retrieval.py |
| 66 | When Claude's response is cut off due to hitting the max_tokens limit and contains an incomplete tool use block, what should you do to get the full tool use? | ["https://docs.claude.com/en/docs/build-with-claude/tool-use#troubleshooting-errors"] | python:file://eval_retrieval.py |
| 67 | What two steps are needed before running a classification evaluation on Claude according to the documentation? | ["https://docs.claude.com/en/docs/about-claude/use-cases/classification#3-run-your-eval","https://docs.claude.com/en/docs/about-claude/use-cases/classification#2-develop-your-test-cases"] | python:file://eval_retrieval.py |
| 68 | How can you use the content parameter in the messages list to influence Claude's response? | ["https://docs.claude.com/en/api/messages-examples#basic-request-and-response","https://docs.claude.com/en/api/messages-examples#putting-words-in-claudes-mouth"] | python:file://eval_retrieval.py |
| 69 | What are two key advantages of prompt engineering over fine-tuning when it comes to model comprehension and general knowledge preservation? | ["https://docs.claude.com/en/docs/build-with-claude/prompt-engineering#when-to-prompt-engineer","https://docs.claude.com/en/docs/resources/glossary#fine-tuning"] | python:file://eval_retrieval.py |
| 70 | What are the two main steps to get started with making requests to Claude models on Anthropic's Bedrock API? | ["https://docs.claude.com/en/api/claude-on-amazon-bedrock#install-and-configure-the-aws-cli","https://docs.claude.com/en/api/claude-on-amazon-bedrock#making-requests"] | python:file://eval_retrieval.py |
| 71 | How can you check which Claude models are available in a specific AWS region using the AWS CLI? | ["https://docs.claude.com/en/api/claude-on-amazon-bedrock#subscribe-to-anthropic-models","https://docs.claude.com/en/api/claude-on-amazon-bedrock#list-available-models"] | python:file://eval_retrieval.py |
| 72 | What argument can be passed to the voyageai.Client.embed() method or the Voyage HTTP API to specify whether the input text is a query or a document? | ["https://docs.claude.com/en/docs/build-with-claude/embeddings#voyage-python-package","https://docs.claude.com/en/docs/build-with-claude/embeddings#voyage-http-api"] | python:file://eval_retrieval.py |
| 73 | How do the streaming API delta formats differ between tool_use content blocks and text content blocks? | ["https://docs.claude.com/en/api/messages-streaming#input-json-delta","https://docs.claude.com/en/api/messages-streaming#text-delta"] | python:file://eval_retrieval.py |
| 74 | What are the image file size limits when uploading images to Claude using the API versus on claude.ai? | ["https://docs.claude.com/en/docs/build-with-claude/vision#faq"] | python:file://eval_retrieval.py |
| 75 | What is one key consideration when selecting a Claude model for an enterprise use case that needs low latency? | ["https://docs.claude.com/en/docs/intro-to-claude#model-options","https://docs.claude.com/en/docs/test-and-evaluate/strengthen-guardrails/reduce-latency#1-choose-the-right-model"] | python:file://eval_retrieval.py |
| 76 | What embedding model does Anthropic recommend for code retrieval, and how does its performance compare to alternatives according to Voyage AI? | ["https://docs.claude.com/en/docs/build-with-claude/embeddings#how-to-get-embeddings-with-anthropic","https://docs.claude.com/en/docs/build-with-claude/embeddings#available-voyage-models"] | python:file://eval_retrieval.py |
| 77 | What are two ways the Claude Cookbooks can help developers learn to use Anthropic's APIs? | ["https://docs.claude.com/en/docs/welcome#develop-with-claude","https://docs.claude.com/en/docs/quickstart#next-steps"] | python:file://eval_retrieval.py |
| 78 | How does the size of the context window impact a language model's ability to utilize retrieval augmented generation (RAG)? | ["https://docs.claude.com/en/docs/resources/glossary#context-window","https://docs.claude.com/en/docs/resources/glossary#rag-retrieval-augmented-generation"] | python:file://eval_retrieval.py |
| 79 | How can the Evaluation tool in Anthropic's Claude platform help improve prompts and build more robust AI applications? | ["https://docs.claude.com/en/docs/test-and-evaluate/eval-tool#understanding-results","https://docs.claude.com/en/docs/test-and-evaluate/eval-tool#creating-test-cases"] | python:file://eval_retrieval.py |
| 80 | Which Claude model has the fastest comparative latency according to the comparison tables? | ["https://docs.claude.com/en/docs/about-claude/models#model-comparison","https://docs.claude.com/en/docs/about-claude/models#legacy-model-comparison"] | python:file://eval_retrieval.py |
| 81 | How can you build up a conversation with multiple turns using the Anthropic Messages API in Python? | ["https://docs.claude.com/en/api/client-sdks#python","https://docs.claude.com/en/api/messages-examples#multiple-conversational-turns"] | python:file://eval_retrieval.py |
| 82 | How can using XML tags to provide a specific role or context help improve Claude's analysis of a legal contract compared to not using a role prompt? | ["https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/use-xml-tags#examples","https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/system-prompts#example-1-legal-contract-analysis"] | python:file://eval_retrieval.py |
| 83 | What are the key differences between how Claude 3 Opus and Claude 3 Sonnet handle missing information when making tool calls? | ["https://docs.claude.com/en/docs/build-with-claude/tool-use#chain-of-thought","https://docs.claude.com/en/docs/build-with-claude/tool-use#tool-use-examples"] | python:file://eval_retrieval.py |
| 84 | What steps should be taken to ensure a reliable deployment of an automated ticket routing system using Claude into a production environment? | ["https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#additional-considerations","https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#integrate-claude-into-your-support-workflow"] | python:file://eval_retrieval.py |
| 85 | How should you evaluate a model's performance on a ticket routing classifier? | ["https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#evaluating-the-performance-of-your-ticket-routing-classifier","https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#integrate-claude-into-your-support-workflow"] | python:file://eval_retrieval.py |
| 86 | What two methods does Anthropic recommend for learning how to prompt engineer with Claude before diving into the techniques? | ["https://docs.claude.com/en/docs/build-with-claude/prompt-engineering#how-to-prompt-engineer","https://docs.claude.com/en/docs/build-with-claude/prompt-engineering#prompt-engineering-tutorial"] | python:file://eval_retrieval.py |
| 87 | What are the key differences between a pretrained large language model and Claude in terms of their training and capabilities? | ["https://docs.claude.com/en/docs/resources/glossary#llm","https://docs.claude.com/en/docs/resources/glossary#pretraining"] | python:file://eval_retrieval.py |
| 88 | What are some key advantages of using prompt engineering instead of fine-tuning to adapt a pretrained language model for a specific task or domain? | ["https://docs.claude.com/en/docs/resources/glossary#fine-tuning","https://docs.claude.com/en/docs/build-with-claude/prompt-engineering#when-to-prompt-engineer","https://docs.claude.com/en/docs/resources/glossary#pretraining"] | python:file://eval_retrieval.py |
| 89 | How can you authenticate with GCP before running requests to access Claude models on Vertex AI? | ["https://docs.claude.com/en/api/claude-on-vertex-ai#making-requests","https://docs.claude.com/en/api/claude-on-vertex-ai#accessing-vertex-ai"] | python:file://eval_retrieval.py |
| 90 | What new capabilities and features were introduced by Anthropic on May 10th, 2024 and how do they enable users to create and tailor prompts for specific tasks? | ["https://docs.claude.com/en/release-notes/api#may-10th-2024"] | python:file://eval_retrieval.py |
| 91 | On what date did both the Claude 3.5 Sonnet model and the Artifacts feature in Claude.ai become available? | ["https://docs.claude.com/en/release-notes/api#june-20th-2024","https://docs.claude.com/en/release-notes/claude-apps#june-20th-2024"] | python:file://eval_retrieval.py |
| 92 | When putting words in Claude's mouth to shape the response, what header and value can you use in the request to limit Claude's response to a single token? | ["https://docs.claude.com/en/api/messages-examples#basic-request-and-response","https://docs.claude.com/en/api/messages-examples#putting-words-in-claudes-mouth"] | python:file://eval_retrieval.py |
| 93 | What does the temperature parameter do when working with large language models? | ["https://docs.claude.com/en/docs/resources/glossary#temperature","https://docs.claude.com/en/docs/test-and-evaluate/strengthen-guardrails/reduce-latency#2-optimize-prompt-and-output-length"] | python:file://eval_retrieval.py |
| 94 | What are two ways to specify API parameters when calling the Claude API using Claude for Sheets? | ["https://docs.claude.com/en/docs/test-and-evaluate/eval-tool#tips-for-effective-evaluation","https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/prefill-claudes-response#how-to-prefill-claudes-response","https://docs.claude.com/en/docs/build-with-claude/claude-for-sheets#enter-your-first-prompt"] | python:file://eval_retrieval.py |
| 95 | How does prefilling the response with an opening curly brace ({ ) affect Claude's output when extracting structured data from text? | ["https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/prefill-claudes-response#example-1-controlling-output-formatting-and-skipping-the-preamble"] | python:file://eval_retrieval.py |
| 96 | What are some helpful resources provided by Anthropic to dive deeper into building with images using Claude? | ["https://docs.claude.com/en/docs/build-with-claude/vision#dive-deeper-into-vision","https://docs.claude.com/en/docs/build-with-claude/vision#about-the-prompt-examples"] | python:file://eval_retrieval.py |
| 97 | How do you specify the API key when creating a new Anthropic client in the Python and TypeScript SDK examples? | ["https://docs.claude.com/en/api/client-sdks#typescript","https://docs.claude.com/en/api/client-sdks#python"] | python:file://eval_retrieval.py |
| 98 | What are two key benefits of using the Anthropic Evaluation tool when developing prompts for an AI classification application? | ["https://docs.claude.com/en/docs/about-claude/use-cases/classification#2-develop-your-test-cases","https://docs.claude.com/en/docs/test-and-evaluate/eval-tool#understanding-results"] | python:file://eval_retrieval.py |
| 99 | What are the key differences between a pretrained language model like Claude's underlying model, and the final version of Claude available through Anthropic's API? | ["https://docs.claude.com/en/docs/resources/glossary#pretraining","https://docs.claude.com/en/docs/resources/glossary#llm","https://docs.claude.com/en/docs/resources/glossary#fine-tuning"] | python:file://eval_retrieval.py |
| 100 | What is the IPv6 address range used by Anthropic? | ["https://docs.claude.com/en/api/ip-addresses#ipv6"] | python:file://eval_retrieval.py |
| 101 | When using the Python SDK to create a message with Claude, what are two ways you can specify your API key? | ["https://docs.claude.com/en/api/messages-examples#multiple-conversational-turns","https://docs.claude.com/en/api/client-sdks#python"] | python:file://eval_retrieval.py |
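
Each row pairs a query with a JSON array of `correct_chunks` URLs and points `__expected` at `eval_retrieval.py`, whose contents are not shown here. As a rough illustration of how a row like these could be scored, the sketch below compares a list of retrieved URLs against `correct_chunks` and reports precision, recall, and F1. The function name `score_retrieval`, the choice of metrics, and the sample `retrieved` list are all assumptions made for this example, not the actual logic of `eval_retrieval.py`.

```python
import json


def score_retrieval(retrieved, correct_chunks_json):
    """Score one row: compare retrieved URLs to the row's correct_chunks.

    Hypothetical helper; the real eval_retrieval.py may compute
    different metrics or use a different signature.
    """
    # correct_chunks is stored as a JSON array string in the table.
    correct = set(json.loads(correct_chunks_json))
    found = set(retrieved)
    hits = len(found & correct)

    precision = hits / len(found) if found else 0.0
    recall = hits / len(correct) if correct else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# correct_chunks cell copied from the IPv6 address range row.
row_correct = '["https://docs.claude.com/en/api/ip-addresses#ipv6"]'

# Assumed retriever output for that query: one relevant URL, one irrelevant.
retrieved = [
    "https://docs.claude.com/en/api/ip-addresses#ipv6",
    "https://docs.claude.com/en/api/rate-limits#about-our-limits",
]

scores = score_retrieval(retrieved, row_correct)
print(scores)
```

In this example the retriever finds the single correct chunk but also returns one irrelevant URL, so recall is 1.0 while precision is 0.5. Treating both sides as sets means duplicate URLs in the retriever output are not counted twice.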