Mirror of https://github.com/anthropics/claude-cookbooks.git (synced 2025-10-06 01:00:28 +03:00)

chore: Updates all references per new branding.
@@ -6,7 +6,7 @@ description: Validate Claude model usage against current public models
 Review the changed files for Claude model usage.
 
 First, fetch the current list of allowed models from:
-https://docs.anthropic.com/en/docs/about-claude/models/overview.md
+https://docs.claude.com/en/docs/about-claude/models/overview.md
 
 Then check:
 1. All model references are from the current public models list
@@ -7,7 +7,7 @@ Review the changes to Jupyter notebooks and Python scripts in this PR. Please ch
 ## Model Usage
 Verify all Claude model references against the current list at:
-https://docs.anthropic.com/en/docs/about-claude/models/overview.md
+https://docs.claude.com/en/docs/about-claude/models/overview.md
 - Flag any deprecated models (older Sonnet 3.5, Opus 3 versions)
 - Flag any internal/non-public model names
 - Suggest current alternatives when issues found
@@ -17,7 +17,7 @@ https://docs.anthropic.com/en/docs/about-claude/models/overview.md
 - Python code follows PEP 8 conventions
 - Proper error handling
 - Clear variable names and documentation
-- No hardcoded API keys (use os.getenv("ANTHROPIC_API_KEY"))
+- No hardcoded API keys (use os.getenv("CLAUDE_API_KEY"))
 
 ## Notebook Structure
 - Clear introduction explaining what the notebook demonstrates and why it's useful
@@ -1,8 +1,8 @@
-# Anthropic API Configuration
+# Claude API Configuration
 # Copy this file to .env and add your API key
-# Get your API key at: https://console.anthropic.com/settings/keys
+# Get your API key at: https://platform.claude.com/settings/keys
 
-ANTHROPIC_API_KEY=sk-ant-api03-...
+CLAUDE_API_KEY=sk-ant-api03-...
 
 # Optional: Default model for testing (recommended for cost savings)
 CLAUDE_MODEL=claude-3-5-haiku-latest
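For downstream code, the variable rename in this `.env` example is a breaking change; a minimal compatibility sketch (the fallback order is our assumption, not part of the commit):

```python
import os
from typing import Optional

def resolve_api_key() -> Optional[str]:
    """Prefer the new CLAUDE_API_KEY name, falling back to the
    legacy ANTHROPIC_API_KEY so older environments keep working."""
    return os.environ.get("CLAUDE_API_KEY") or os.environ.get("ANTHROPIC_API_KEY")

# Demo: only the legacy variable is set
os.environ.pop("CLAUDE_API_KEY", None)
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-api03-legacy"
legacy_key = resolve_api_key()
```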
2
.github/workflows/claude-link-review.yml
vendored
@@ -25,7 +25,7 @@ jobs:
       - name: Run Claude Link Review
         uses: anthropics/claude-code-action@v1
         with:
-          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
+          CLAUDE_API_KEY: ${{ secrets.CLAUDE_API_KEY }}
          github_token: ${{ secrets.GITHUB_TOKEN }}
          prompt: "/link-review"
          claude_args: |
2
.github/workflows/claude-model-check.yml
vendored
@@ -24,7 +24,7 @@ jobs:
       - name: Claude Model Validation
         uses: anthropics/claude-code-action@v1
         with:
-          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
+          CLAUDE_API_KEY: ${{ secrets.CLAUDE_API_KEY }}
          github_token: ${{ secrets.GITHUB_TOKEN }}
          prompt: "/model-check"
          claude_args: |
2
.github/workflows/claude-notebook-review.yml
vendored
@@ -25,7 +25,7 @@ jobs:
       - name: Run Claude Notebook Review
         uses: anthropics/claude-code-action@v1
         with:
-          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
+          CLAUDE_API_KEY: ${{ secrets.CLAUDE_API_KEY }}
          github_token: ${{ secrets.GITHUB_TOKEN }}
          prompt: "/notebook-review"
          claude_args: |
4
.github/workflows/notebook-quality.yml
vendored
@@ -57,7 +57,7 @@ jobs:
        if: github.event_name == 'pull_request' && steps.validate.outputs.has_issues == 'true'
        uses: anthropics/claude-code-action@v1
        with:
-         anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
+         CLAUDE_API_KEY: ${{ secrets.CLAUDE_API_KEY }}
         github_token: ${{ secrets.GITHUB_TOKEN }}
         prompt: |
           The notebook validation found these issues:
@@ -88,7 +88,7 @@ jobs:
          github.event.pull_request.author_association == 'MEMBER' ||
          github.event.pull_request.author_association == 'OWNER'
        env:
-         ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
+         CLAUDE_API_KEY: ${{ secrets.CLAUDE_API_KEY }}
        run: |
          mkdir -p test_outputs
          for notebook in $(find . -name "*.ipynb" -not -path "*/.*" -not -path "*/test_outputs/*"); do
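The validation workflow's `run:` step iterates notebooks with `find`, skipping hidden paths and prior outputs; a rough Python equivalent of that discovery step (the directory names in the demo are illustrative, not from the repo):

```python
import tempfile
from pathlib import Path

def discover_notebooks(root: Path) -> list:
    """Find .ipynb files, skipping hidden paths and test_outputs/,
    mirroring the workflow's find invocation."""
    hits = []
    for p in sorted(root.rglob("*.ipynb")):
        parts = p.relative_to(root).parts
        if any(seg.startswith(".") for seg in parts) or "test_outputs" in parts:
            continue
        hits.append(p)
    return hits

# Demo on a throwaway tree
tmp = Path(tempfile.mkdtemp())
(tmp / "skills").mkdir()
(tmp / "skills" / "demo.ipynb").write_text("{}")
(tmp / ".ipynb_checkpoints").mkdir()
(tmp / ".ipynb_checkpoints" / "demo.ipynb").write_text("{}")
(tmp / "test_outputs").mkdir()
(tmp / "test_outputs" / "out.ipynb").write_text("{}")
found = discover_notebooks(tmp)
```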
@@ -1,6 +1,6 @@
-# Contributing to Anthropic Cookbook
+# Contributing to Claude Cookbook
 
-Thank you for your interest in contributing to the Anthropic Cookbook! This guide will help you get started with development and ensure your contributions meet our quality standards.
+Thank you for your interest in contributing to the Claude Cookbook! This guide will help you get started with development and ensure your contributions meet our quality standards.
 
 ## Development Setup
@@ -45,7 +45,7 @@ Thank you for your interest in contributing to the Anthropic Cookbook! This guid
 5. **Set up your API key**:
    ```bash
    cp .env.example .env
-   # Edit .env and add your Anthropic API key
+   # Edit .env and add your Claude API key
    ```
 
 ## Quality Standards
@@ -113,12 +113,12 @@ If a hook fails, fix the issues and try committing again.
 1. **Use environment variables for API keys**:
    ```python
    import os
-   api_key = os.environ.get("ANTHROPIC_API_KEY")
+   api_key = os.environ.get("CLAUDE_API_KEY")
    ```
 
 2. **Use current Claude models**:
    - Use model aliases (e.g., `claude-3-5-haiku-latest`) for better maintainability
-   - Check current models at: https://docs.anthropic.com/en/docs/about-claude/models/overview
+   - Check current models at: https://docs.claude.com/en/docs/about-claude/models/overview
    - Claude will automatically validate model usage in PR reviews
 
 3. **Keep notebooks focused**:
14
README.md
@@ -1,26 +1,26 @@
-# Anthropic Cookbook
+# Claude Cookbook
 
-The Anthropic Cookbook provides code and guides designed to help developers build with Claude, offering copy-able code snippets that you can easily integrate into your own projects.
+The Claude Cookbook provides code and guides designed to help developers build with Claude, offering copy-able code snippets that you can easily integrate into your own projects.
 
 ## Prerequisites
 
-To make the most of the examples in this cookbook, you'll need an Anthropic API key (sign up for free [here](https://www.anthropic.com)).
+To make the most of the examples in this cookbook, you'll need an Claude API key (sign up for free [here](https://www.anthropic.com)).
 
-While the code examples are primarily written in Python, the concepts can be adapted to any programming language that supports interaction with the Anthropic API.
+While the code examples are primarily written in Python, the concepts can be adapted to any programming language that supports interaction with the Claude API.
 
-If you're new to working with the Anthropic API, we recommend starting with our [Anthropic API Fundamentals course](https://github.com/anthropics/courses/tree/master/anthropic_api_fundamentals) to get a solid foundation.
+If you're new to working with the Claude API, we recommend starting with our [Claude API Fundamentals course](https://github.com/anthropics/courses/tree/master/anthropic_api_fundamentals) to get a solid foundation.
 
 ## Explore Further
 
 Looking for more resources to enhance your experience with Claude and AI assistants? Check out these helpful links:
 
-- [Anthropic developer documentation](https://docs.anthropic.com/claude/docs/guide-to-anthropics-prompt-engineering-resources)
+- [Anthropic developer documentation](https://docs.claude.com/claude/docs/guide-to-anthropics-prompt-engineering-resources)
 - [Anthropic support docs](https://support.anthropic.com)
 - [Anthropic Discord community](https://www.anthropic.com/discord)
 
 ## Contributing
 
-The Anthropic Cookbook thrives on the contributions of the developer community. We value your input, whether it's submitting an idea, fixing a typo, adding a new guide, or improving an existing one. By contributing, you help make this resource even more valuable for everyone.
+The Claude Cookbook thrives on the contributions of the developer community. We value your input, whether it's submitting an idea, fixing a typo, adding a new guide, or improving an existing one. By contributing, you help make this resource even more valuable for everyone.
 
 To avoid duplication of efforts, please review the existing issues and pull requests before contributing.
@@ -3,7 +3,7 @@
 # Create a token at: https://github.com/settings/tokens
 GITHUB_TOKEN="your-github-personal-access-token-here"
 
-# Anthropic API Key
+# Claude API Key
 # Required for using Claude SDK
-# Get your key at: https://console.anthropic.com/settings/keys
-ANTHROPIC_API_KEY="sk-ant-api03-your-api-key-here"
+# Get your key at: https://platform.claude.com/settings/keys
+CLAUDE_API_KEY="sk-ant-api03-your-api-key-here"
@@ -41,9 +41,9 @@
 "\n",
 "Instead, a research agent requires the flexibility to explore unexpected leads and change direction based on what it finds. In its simplest form, a research agent can be an agent that simply searches the internet and summarizes it for you. \n",
 "\n",
-"Below, we'll implement a basic research agent with just a few lines of code. We provide Claude with exactly one tool which the Claude Code SDK contains straight out of the box: [web search tool](https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/web-search-tool). \n",
+"Below, we'll implement a basic research agent with just a few lines of code. We provide Claude with exactly one tool which the Claude Code SDK contains straight out of the box: [web search tool](https://docs.claude.com/en/docs/agents-and-tools/tool-use/web-search-tool). \n",
 "\n",
-"> Check [here](https://docs.anthropic.com/en/docs/claude-code/settings#tools-available-to-claude) for a list of Claude Code's readily available tools"
+"> Check [here](https://docs.claude.com/en/docs/claude-code/settings#tools-available-to-claude) for a list of Claude Code's readily available tools"
 ]
 },
 {
@@ -120,7 +120,7 @@
 "\n",
 "So far, we have laid out a very simple (maybe naive) implementation to illustrate how you can start leveraging the SDK to build a research agent. However, there are various ways we can improve our agent to turn it production ready. Let's cover a few of them:\n",
 "\n",
-"1. Notice how before we only sent one query? In many systems, a human will look at the output of the system, potentially assigning a follow up task. Just like text completions, if we want to send multiple queries to the agent (e.g., 1. analyze abc, 2. make xyz based on your analysis) we would have to copy over the entire analysis context in our second query. Instead, we can **[use the ClaudeSDKClient](https://docs.anthropic.com/en/docs/claude-code/sdk/sdk-python#1-the-claudesdkclient-class-recommended)** to maintain the conversation context for us.\n",
+"1. Notice how before we only sent one query? In many systems, a human will look at the output of the system, potentially assigning a follow up task. Just like text completions, if we want to send multiple queries to the agent (e.g., 1. analyze abc, 2. make xyz based on your analysis) we would have to copy over the entire analysis context in our second query. Instead, we can **[use the ClaudeSDKClient](https://docs.claude.com/en/docs/claude-code/sdk/sdk-python#1-the-claudesdkclient-class-recommended)** to maintain the conversation context for us.\n",
 "\n",
 "2. Another great way of steering the system is **providing a system prompt**, akin to a system prompt used for text completions. To learn how to write a good system prompt for a research agent, we recommend looking [here](https://github.com/anthropics/anthropic-cookbook/tree/main/patterns/agents/prompts).\n",
 "\n",
@@ -28,7 +28,7 @@
 "cell_type": "markdown",
 "id": "08cc95b6",
 "metadata": {},
-"source": "In the previous notebooks we have built a basic research agent and a Chief of Staff multi-agent framework. While the agents we have built are already powerful, they were still limited in what they could do: the web search agent is limited to searching the internet and our Chief of Staff agent was limited to interacting with its own filesystem.\n\nThis is a serious constraint: real-world agents often need to interact with other systems like databases, APIs, file systems, and other specialized services. [MCP (Model Context Protocol)](https://modelcontextprotocol.io/docs/getting-started/intro) is an open-source standard for AI-tool integrations that allows for an easy connection between our agents and these external systems. In this notebook, we will explore how to connect MCP servers to our agent.\n\n**Need more details on MCP?** For comprehensive setup instructions, configuration best practices, and troubleshooting tips, see the [Claude Code MCP documentation](https://docs.anthropic.com/en/docs/claude-code/mcp)."
+"source": "In the previous notebooks we have built a basic research agent and a Chief of Staff multi-agent framework. While the agents we have built are already powerful, they were still limited in what they could do: the web search agent is limited to searching the internet and our Chief of Staff agent was limited to interacting with its own filesystem.\n\nThis is a serious constraint: real-world agents often need to interact with other systems like databases, APIs, file systems, and other specialized services. [MCP (Model Context Protocol)](https://modelcontextprotocol.io/docs/getting-started/intro) is an open-source standard for AI-tool integrations that allows for an easy connection between our agents and these external systems. In this notebook, we will explore how to connect MCP servers to our agent.\n\n**Need more details on MCP?** For comprehensive setup instructions, configuration best practices, and troubleshooting tips, see the [Claude Code MCP documentation](https://docs.claude.com/en/docs/claude-code/mcp)."
 },
 {
 "cell_type": "markdown",
@@ -22,11 +22,11 @@ A tutorial series demonstrating how to build sophisticated general-purpose agent
 ```uv run python -m ipykernel install --user --name="cc-sdk-tutorial" --display-name "Python (cc-sdk-tutorial)" ```
 
-#### 4. Anthropic API Key
-1. Visit [console.anthropic.com](https://console.anthropic.com/dashboard)
+#### 4. Claude API Key
+1. Visit [platform.claude.com](https://platform.claude.com/dashboard)
 2. Sign up or log in to your account
 3. Click on "Get API keys"
-4. Copy the key and paste it into your `.env` file as ```ANTHROPIC_API_KEY=```
+4. Copy the key and paste it into your `.env` file as ```CLAUDE_API_KEY=```
 
 #### 5. GitHub Token for Notebook 02
 If you plan to work through the Observability Agent notebook:
@@ -16,7 +16,7 @@
 "\n",
 "This notebook demonstrates how to use Claude 3.7 Sonnet's extended thinking feature with various examples and edge cases.\n",
 "\n",
-"Extended thinking gives Claude 3.7 Sonnet enhanced reasoning capabilities for complex tasks, while also providing transparency into its step-by-step thought process before it delivers its final answer. When extended thinking is turned on, Claude creates `thinking` content blocks where it outputs its internal reasoning. Claude incorporates insights from this reasoning before crafting a final response. For more information on extended thinking, see our [documentation](https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking)."
+"Extended thinking gives Claude 3.7 Sonnet enhanced reasoning capabilities for complex tasks, while also providing transparency into its step-by-step thought process before it delivers its final answer. When extended thinking is turned on, Claude creates `thinking` content blocks where it outputs its internal reasoning. Claude incorporates insights from this reasoning before crafting a final response. For more information on extended thinking, see our [documentation](https://docs.claude.com/en/docs/build-with-claude/extended-thinking)."
 ]
 },
 {
@@ -59,7 +59,7 @@
 "import os\n",
 "\n",
 "# Set your API key as an environment variable or directly\n",
-"# os.environ[\"ANTHROPIC_API_KEY\"] = \"your-api-key-here\"\n",
+"# os.environ[\"CLAUDE_API_KEY\"] = \"your-api-key-here\"\n",
 "\n",
 "# Initialize the client\n",
 "client = anthropic.Anthropic()\n",
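The same notebook later prints API errors showing two request constraints: `budget_tokens` must be at least 1024, and `temperature` must be 1 while thinking is enabled. A hedged pre-flight sketch (the helper name and defaults are ours, not the notebook's):

```python
def build_thinking_request(prompt: str, budget_tokens: int = 2000, temperature: float = 1):
    """Assemble Messages API kwargs for an extended-thinking call,
    enforcing the two constraints the notebook's error output shows."""
    if budget_tokens < 1024:
        raise ValueError("thinking.enabled.budget_tokens: Input should be >= 1024")
    if temperature != 1:
        raise ValueError("`temperature` may only be set to 1 when thinking is enabled")
    return {
        "max_tokens": 4000,  # assumption: must exceed budget_tokens in real calls
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "temperature": temperature,
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_thinking_request("What is 27 * 453?")
```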
@@ -561,7 +561,7 @@
 "\n",
 "Error with too small thinking budget: Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'thinking.enabled.budget_tokens: Input should be greater than or equal to 1024'}}\n",
 "\n",
-"Error with temperature and thinking: Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': '`temperature` may only be set to 1 when thinking is enabled. Please consult our documentation at https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking#important-considerations-when-using-extended-thinking'}}\n",
+"Error with temperature and thinking: Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': '`temperature` may only be set to 1 when thinking is enabled. Please consult our documentation at https://docs.claude.com/en/docs/build-with-claude/extended-thinking#important-considerations-when-using-extended-thinking'}}\n",
 "\n",
 "Error from exceeding context window: Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'prompt is too long: 214315 tokens > 204798 maximum'}}\n"
 ]
@@ -14,7 +14,7 @@
 "\n",
 "This notebook demonstrates how to use Claude 3.7 Sonnet's extended thinking feature with tools. The extended thinking feature allows you to see Claude's step-by-step thinking before it provides a final answer, providing transparency into how it decides which tools to use and how it interprets tool results.\n",
 "\n",
-"When using extended thinking with tool use, the model will show its thinking before making tool requests, but not repeat the thinking process after receiving tool results. Claude will not output another thinking block until after the next non-`tool_result` `user` turn. For more information on extended thinking, see our [documentation](https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking)."
+"When using extended thinking with tool use, the model will show its thinking before making tool requests, but not repeat the thinking process after receiving tool results. Claude will not output another thinking block until after the next non-`tool_result` `user` turn. For more information on extended thinking, see our [documentation](https://docs.claude.com/en/docs/build-with-claude/extended-thinking)."
 ]
 },
 {
@@ -63,7 +63,7 @@
 "THINKING_BUDGET_TOKENS = 2000\n",
 "\n",
 "# Set your API key as an environment variable or directly\n",
-"# os.environ[\"ANTHROPIC_API_KEY\"] = \"your_api_key_here\"\n",
+"# os.environ[\"CLAUDE_API_KEY\"] = \"your_api_key_here\"\n",
 "\n",
 "# Initialize the client\n",
 "client = anthropic.Anthropic()\n",
@@ -656,7 +656,7 @@
 "Tool result: {'temperature': 60, 'condition': 'Foggy'}\n",
 "\n",
 "=== TEST 1: WITHOUT thinking block ===\n",
-"ERROR: Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'messages.1.content.0.type: Expected `thinking` or `redacted_thinking`, but found `tool_use`. When `thinking` is enabled, a final `assistant` message must start with a thinking block (preceeding the lastmost set of `tool_use` and `tool_result` blocks). We recommend you include thinking blocks from previous turns. To avoid this requirement, disable `thinking`. Please consult our documentation at https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking'}}\n",
+"ERROR: Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'messages.1.content.0.type: Expected `thinking` or `redacted_thinking`, but found `tool_use`. When `thinking` is enabled, a final `assistant` message must start with a thinking block (preceeding the lastmost set of `tool_use` and `tool_result` blocks). We recommend you include thinking blocks from previous turns. To avoid this requirement, disable `thinking`. Please consult our documentation at https://docs.claude.com/en/docs/build-with-claude/extended-thinking'}}\n",
 "This demonstrates that thinking blocks must be preserved\n",
 "\n",
 "=== TEST 2: WITH thinking block (correct approach) ===\n",
@@ -1,4 +1,4 @@
-# Lychee configuration for Anthropic Cookbook
+# Lychee configuration for Claude Cookbook
 # Validates links in notebooks and documentation
 
 # Core settings
@@ -35,7 +35,7 @@ exclude_path = [
 # Exclude API endpoints and local development URLs from link checking
 exclude = [
   "https://api.anthropic.com.*",
-  "https://console.anthropic.com.*",
+  "https://platform.claude.com.*",
   "https://www.claude.ai/",
   "http://localhost.*",
   "http://127.0.0.1.*"
@@ -32,7 +32,7 @@
 "Based on the guidelines above, classify this text as either ALLOW or BLOCK. Return nothing else.\n",
 "```\n",
 "\n",
-"To use this, you would replace `{{USER_TEXT}}` with the actual user-generated text to be classified, and then send the prompt to Claude using the Anthropic API. Claude's response should be either \"ALLOW\" or \"BLOCK\", indicating how the text should be handled based on your provided guidelines."
+"To use this, you would replace `{{USER_TEXT}}` with the actual user-generated text to be classified, and then send the prompt to Claude using the Claude API. Claude's response should be either \"ALLOW\" or \"BLOCK\", indicating how the text should be handled based on your provided guidelines."
 ]
 },
 {
@@ -22,7 +22,7 @@
 "\n",
 "Here we'd call thing1 and thing2 the \"variables\" -- and you want your prompt to behave well for many different possible values of thing1 and thing2.\n",
 "\n",
-"How can you test this prompt template? Maybe you have some real-life values you can substitute in. But maybe you don't, or maybe you aren't allowed to test on the ones you do have for privacy reasons. No worries -- Claude can make them up! This cookbook demonstrates how to generate synthetic test data for your prompts using Claude & the Anthropic API. It includes functions for extracting variables from templates, constructing example blocks, generating test cases, and iteratively refining the results. The benefits of this are twofold:\n",
+"How can you test this prompt template? Maybe you have some real-life values you can substitute in. But maybe you don't, or maybe you aren't allowed to test on the ones you do have for privacy reasons. No worries -- Claude can make them up! This cookbook demonstrates how to generate synthetic test data for your prompts using Claude & the Claude API. It includes functions for extracting variables from templates, constructing example blocks, generating test cases, and iteratively refining the results. The benefits of this are twofold:\n",
 "\n",
 "1. Prompt Evaluation\n",
 "You can use these test cases to see how Claude will perform on realistic examples.\n",
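The cell above mentions functions for extracting variables from templates; a minimal sketch of that extraction step, assuming `{{NAME}}`-style placeholders (the function name is ours, not the notebook's):

```python
import re

def extract_variables(prompt_template: str) -> list:
    """Return the unique {{VARIABLE}} names in order of first appearance."""
    seen = []
    for name in re.findall(r"\{\{([A-Za-z0-9_]+)\}\}", prompt_template):
        if name not in seen:
            seen.append(name)
    return seen

template = "Summarize {{thing1}} in the style of {{thing2}}. Focus on {{thing1}}."
variables = extract_variables(template)  # → ['thing1', 'thing2']
```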
@@ -241,7 +241,7 @@
 "outputs": [],
 "source": [
 "def get_test_data(prompt_template, examples, custom_planning=None):\n",
-"    \"\"\"Generate test data using the Anthropic API.\"\"\"\n",
+"    \"\"\"Generate test data using the Claude API.\"\"\"\n",
 "    synth_eval_prompt_ready = format_prompt_template_for_synth_evals(prompt_template, examples)\n",
 "\n",
 "    messages = [\n",
@@ -39,7 +39,7 @@
 "from anthropic import Anthropic\n",
 "import sqlite3\n",
 "\n",
-"# Set up the Anthropic API client\n",
+"# Set up the Claude API client\n",
 "client = Anthropic()\n",
 "MODEL_NAME = \"claude-3-opus-20240229\""
 ]
@@ -39,9 +39,9 @@
 "outputs": [],
 "source": [
 "STABILITY_API_KEY = \"\" # Stability API key goes here\n",
-"ANTHROPIC_API_KEY = \"\" # Anthropic API key goes here\n",
+"CLAUDE_API_KEY = \"\" # Claude API key goes here\n",
 "MODEL_NAME = \"claude-3-opus-20240229\"\n",
-"CLIENT = anthropic.Anthropic(api_key=ANTHROPIC_API_KEY)"
+"CLIENT = anthropic.Anthropic(api_key=CLAUDE_API_KEY)"
 ]
 },
 {
@@ -47,7 +47,7 @@
 "import anthropic, os, re, requests, trio, pandas as pd\n",
 "import numpy as np\n",
 "from bs4 import BeautifulSoup\n",
-"API_KEY = os.environ['ANTHROPIC_API_KEY']\n",
+"API_KEY = os.environ['CLAUDE_API_KEY']\n",
 "CLIENT = anthropic.Anthropic(api_key=API_KEY)"
 ]
 },
@@ -22,7 +22,7 @@
 "source": [
 "### Using This Notebook\n",
 "The notebook is designed to be maximally easy to use. You don't have to write any code. Just follow these steps:\n",
-"- Enter your Anthropic API key in between quotation marks where it says \"Put your API key here!\"\n",
+"- Enter your Claude API key in between quotation marks where it says \"Put your API key here!\"\n",
 "- Enter your task where it says \"Replace with your task!\"\n",
 "- Optionally, enter an all-caps list of variables in quotes separated by commas where it says \"specify the input variables you want Claude to use\".\n",
 "\n",
@@ -48,9 +48,9 @@
 "outputs": [],
 "source": [
 "import anthropic, re\n",
-"ANTHROPIC_API_KEY = \"\" # Put your API key here!\n",
+"CLAUDE_API_KEY = \"\" # Put your API key here!\n",
 "MODEL_NAME = \"claude-3-5-sonnet-20241022\"\n",
-"CLIENT = anthropic.Anthropic(api_key=ANTHROPIC_API_KEY)"
+"CLIENT = anthropic.Anthropic(api_key=CLAUDE_API_KEY)"
 ]
 },
 {
@@ -72,7 +72,7 @@
 "id": "xrDg6fb5_Bmo"
 },
 "source": [
-"We already have a PDF available in the `../multimodal/documents` directory. We'll convert the PDF file into base64 encoded bytes. This is the format required for the [PDF document block](https://docs.anthropic.com/en/docs/build-with-claude/pdf-support) in the Anthropic API. Note that this type of extraction works for both text and visual elements (like charts and graphs)."
+"We already have a PDF available in the `../multimodal/documents` directory. We'll convert the PDF file into base64 encoded bytes. This is the format required for the [PDF document block](https://docs.claude.com/en/docs/build-with-claude/pdf-support) in the Claude API. Note that this type of extraction works for both text and visual elements (like charts and graphs)."
 ]
 },
 {
@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"# Prompt caching through the Anthropic API\n",
+"# Prompt caching through the Claude API\n",
 "\n",
 "Prompt caching allows you to store and reuse context within your prompt. This makes it more practical to include additional information in your prompt—such as detailed instructions and example responses—which help improve every response Claude generates.\n",
 "\n",
@@ -37,7 +37,7 @@
 "# Import the required libraries\n",
 "from anthropic import Anthropic\n",
 "\n",
-"# Set up the Anthropic API client\n",
+"# Set up the Claude API client\n",
 "client = Anthropic()\n",
 "MODEL_NAME = \"claude-3-haiku-20240229\""
 ]
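As a companion to the setup cell, a sketch of the request shape prompt caching relies on: the large static prefix is sent as a system content block marked with `cache_control: ephemeral` so later calls can reuse it (the helper and strings here are illustrative, not from the notebook):

```python
def cached_system_block(big_context: str) -> list:
    """Build a system prompt as content blocks, marking the large
    static part ephemeral so the API can cache it across calls."""
    return [
        {"type": "text", "text": "You answer questions about the document below."},
        {"type": "text", "text": big_context, "cache_control": {"type": "ephemeral"}},
    ]

blocks = cached_system_block("(large reference text goes here)")
```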
@@ -6,7 +6,7 @@
 "source": [
 "# Citations \n",
 "\n",
-"The Anthropic API features citation support that enables Claude to provide detailed citations when answering questions about documents. Citations are a valuable affordance in many LLM powered applications to help users track and verify the sources of information in responses.\n",
+"The Claude API features citation support that enables Claude to provide detailed citations when answering questions about documents. Citations are a valuable affordance in many LLM powered applications to help users track and verify the sources of information in responses.\n",
 "\n",
 "Citations are supported on:\n",
 "* `claude-3-5-sonnet-20241022`\n",
@@ -17,7 +17,7 @@
 "- The citation feature will not return citations pointing to documents or locations that were not provided as valid sources.\n",
 "- While testing we found the citation feature to generate citations with higher recall and percision than prompt based techniques.\n",
 "\n",
-"The documentation for citations can be found [here](https://docs.anthropic.com/en/docs/build-with-claude/citations)."
+"The documentation for citations can be found [here](https://docs.claude.com/en/docs/build-with-claude/citations)."
 ]
 },
 {
@@ -48,10 +48,10 @@
 "import os\n",
 "import json\n",
 "\n",
-"ANTHROPIC_API_KEY = os.environ.get(\"ANTHROPIC_API_KEY\")\n",
-"# ANTHROPIC_API_KEY = \"\" # Put your API key here!\n",
+"CLAUDE_API_KEY = os.environ.get(\"CLAUDE_API_KEY\")\n",
+"# CLAUDE_API_KEY = \"\" # Put your API key here!\n",
 "\n",
-"client = anthropic.Anthropic(api_key=ANTHROPIC_API_KEY)"
+"client = anthropic.Anthropic(api_key=CLAUDE_API_KEY)"
 ]
 },
 {
@@ -14,7 +14,7 @@
 "metadata": {},
 "source": [
 "## Step 1: Set up the environment\n",
-"First, let's install the required libraries and set up the Anthropic API client."
+"First, let's install the required libraries and set up the Claude API client."
 ]
 },
 {
@@ -42,7 +42,7 @@
 "import requests\n",
 "import os\n",
 "\n",
-"# Set up the Anthropic API client\n",
+"# Set up the Claude API client\n",
 "client = Anthropic()\n",
 "MODEL_NAME = \"claude-3-haiku-20240229\""
 ]
@@ -45,7 +45,7 @@
 "\n",
 "### Prerequisites & Security\n",
 "\n",
-"- **Admin API Key**: Get from [Anthropic Console](https://console.anthropic.com/settings/admin-keys) (format: `sk-ant-admin...`)\n",
+"- **Admin API Key**: Get from [Claude Console](https://platform.claude.com/settings/admin-keys) (format: `sk-ant-admin...`)\n",
 "- **Security**: Store keys in environment variables, rotate regularly, never commit to version control"
 ]
 },
@@ -816,7 +816,7 @@
 "\n",
 "### Next Steps\n",
 "\n",
-"- Check the [official API documentation](https://docs.anthropic.com) for the latest field definitions\n",
+"- Check the [official API documentation](https://docs.claude.com) for the latest field definitions\n",
 "- Test your integration with small date ranges first\n",
 "- Consider data retention needs for your use case\n",
 "- Monitor for new API features that may enhance your analysis\n",
@@ -2,7 +2,7 @@ from anthropic import Anthropic
import os
import re

client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
client = Anthropic(api_key=os.environ["CLAUDE_API_KEY"])

def llm_call(prompt: str, system_prompt: str = "", model="claude-3-5-sonnet-20241022") -> str:
    """
@@ -16,7 +16,7 @@ def llm_call(prompt: str, system_prompt: str = "", model="claude-3-5-sonnet-2024
    Returns:
        str: The response from the language model.
    """
    client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
    client = Anthropic(api_key=os.environ["CLAUDE_API_KEY"])
    messages = [{"role": "user", "content": prompt}]
    response = client.messages.create(
        model=model,

@@ -153,7 +153,7 @@ class NotebookValidator:
    "type": "hardcoded_api_key",
    "severity": "critical",
    "cell": i,
    "details": "Hardcoded Anthropic API key detected"
    "details": "Hardcoded Claude API key detected"
})
elif 'api_key=' in source.lower() and 'os.environ' not in source and 'getenv' not in source:
    result["status"] = "error"
@@ -166,7 +166,7 @@ class NotebookValidator:

# Execute notebook if in full mode
if mode == "full" and result["status"] != "error":
    if os.environ.get("ANTHROPIC_API_KEY"):
    if os.environ.get("CLAUDE_API_KEY"):
        exec_result = self.execute_notebook(notebook_path)
        if not exec_result["success"]:
            result["status"] = "error"
@@ -306,8 +306,8 @@ Overall: {passing}/{total} notebooks passing ({percentage:.1f}%)
    dashboard += " → Run with --auto-fix to update deprecated models\n"
if critical_issues:
    dashboard += " → Fix critical security issues first\n"
if not os.environ.get("ANTHROPIC_API_KEY"):
    dashboard += " → Set ANTHROPIC_API_KEY to enable execution tests\n"
if not os.environ.get("CLAUDE_API_KEY"):
    dashboard += " → Set CLAUDE_API_KEY to enable execution tests\n"

return dashboard

@@ -688,8 +688,8 @@ Overall: {passing}/{total} notebooks passing ({percentage:.1f}%)
if choice == "1":
    self.run_validation(mode="quick")
elif choice == "2":
    if not os.environ.get("ANTHROPIC_API_KEY"):
        print("\n⚠️ Warning: ANTHROPIC_API_KEY not set. Execution tests will be skipped.")
    if not os.environ.get("CLAUDE_API_KEY"):
        print("\n⚠️ Warning: CLAUDE_API_KEY not set. Execution tests will be skipped.")
    cont = input("Continue anyway? (y/n): ")
    if cont.lower() != 'y':
        continue
@@ -766,8 +766,8 @@ Examples:
if args.quick:
    validator.run_validation(mode="quick")
elif args.full:
    if not os.environ.get("ANTHROPIC_API_KEY"):
        print("⚠️ Warning: ANTHROPIC_API_KEY not set. Execution tests will be skipped.")
    if not os.environ.get("CLAUDE_API_KEY"):
        print("⚠️ Warning: CLAUDE_API_KEY not set. Execution tests will be skipped.")
    validator.run_validation(mode="full")
elif args.dashboard:
    print(validator.generate_dashboard())

@@ -1,6 +1,6 @@
# Claude Skills

Welcome to the Skills section of the Anthropic Cookbook! This directory contains a collection of guides that showcase specific skills and capabilities where Claude excels. Each guide provides an in-depth exploration of a particular skill, discussing potential use cases, prompt engineering techniques to optimize results, and approaches for evaluating Claude's performance.
Welcome to the Skills section of the Claude Cookbook! This directory contains a collection of guides that showcase specific skills and capabilities where Claude excels. Each guide provides an in-depth exploration of a particular skill, discussing potential use cases, prompt engineering techniques to optimize results, and approaches for evaluating Claude's performance.

## Guides

@@ -19,7 +19,7 @@ The evaluation is orchestrated by the `promptfooconfig.yaml` file. In this file
- Prompts
  - Promptfoo enables you to import prompts in many different formats. You can read more about this [here](https://www.promptfoo.dev/docs/configuration/parameters).
  - In this example we will load 3 prompts - the same used in `guide.ipynb` from the `prompts.py` file:
    - The functions are identical to those used in `guide.ipynb` except that instead of calling the Anthropic API they just return the prompt. Promptfoo then handles the orchestration of calling the API and storing the results.
    - The functions are identical to those used in `guide.ipynb` except that instead of calling the Claude API they just return the prompt. Promptfoo then handles the orchestration of calling the API and storing the results.
    - You can read more about prompt functions [here](https://www.promptfoo.dev/docs/configuration/parameters#prompt-functions). Using python allows us to reuse the VectorDB class which is necessary for RAG, this is defined in `vectordb.py`.
- Providers
  - With Promptfoo you can connect to many different LLMs from different platforms, see [here for more](https://www.promptfoo.dev/docs/providers). In `guide.ipynb` we used Haiku with default temperature 0.0. We will use Promptfoo to experiment with an array of different temperature settings to identify the optimal choice for our use case.
@@ -39,7 +39,7 @@ To get started with Promptfoo open your terminal and navigate to this directory

Before running your evaluation you must define the following environment variables:

`export ANTHROPIC_API_KEY=YOUR_API_KEY`
`export CLAUDE_API_KEY=YOUR_API_KEY`
`export VOYAGE_API_KEY=YOUR_API_KEY`

From the `evaluation` directory, run the following command.

@@ -16,7 +16,7 @@
"\n",
"You will also need:\n",
"\n",
"- Anthropic API Key\n",
"- Claude API Key\n",
"- VoyageAI API Key (Optional)\n",
" - Embeddings are pre-computed but you will need API key if you make any changes"
]
@@ -44,7 +44,7 @@
"import os\n",
"\n",
"os.environ['VOYAGE_API_KEY'] = \"VOYAGE KEY HERE\"\n",
"os.environ['ANTHROPIC_API_KEY'] = \"ANTHROPIC KEY HERE\""
"os.environ['CLAUDE_API_KEY'] = \"ANTHROPIC KEY HERE\""
]
},
{
@@ -59,7 +59,7 @@
"\n",
"client = anthropic.Anthropic(\n",
" # This is the default and can be omitted\n",
" api_key=os.getenv(\"ANTHROPIC_API_KEY\"),\n",
" api_key=os.getenv(\"CLAUDE_API_KEY\"),\n",
")"
]
},
@@ -239,7 +239,7 @@
"\n",
"By using this evaluation code, you can assess the performance of your classifier and visualize the confusion matrix to gain insights into the model's predictions.\n",
"\n",
"Adjust the `MAXIMUM_CONCURRENT_REQUESTS` to match the rate limits associated with your Anthropic account, [see here](https://docs.anthropic.com/claude/reference/rate-limits)"
"Adjust the `MAXIMUM_CONCURRENT_REQUESTS` to match the rate limits associated with your Anthropic account, [see here](https://docs.claude.com/claude/reference/rate-limits)"
]
},
{
@@ -254,7 +254,7 @@
"import numpy as np\n",
"\n",
"#you can increase this number to speed up evaluation, but keep in mind that you may need a higher API rate limit\n",
"#see https://docs.anthropic.com/en/api/rate-limits#rate-limits for more details\n",
"#see https://docs.claude.com/en/api/rate-limits#rate-limits for more details\n",
"MAXIMUM_CONCURRENT_REQUESTS = 1\n",
"\n",
"def plot_confusion_matrix(cm, labels):\n",
@@ -391,7 +391,7 @@
"\n",
"Now let's construct a simple classifier using Claude.\n",
"\n",
"First we will encode the categories in XML format. This will make it easier for Claude to interpret the information. Encoding information in XML is a general prompting strategy, for more information [see here](https://docs.anthropic.com/claude/docs/use-xml-tags)"
"First we will encode the categories in XML format. This will make it easier for Claude to interpret the information. Encoding information in XML is a general prompting strategy, for more information [see here](https://docs.claude.com/claude/docs/use-xml-tags)"
]
},
{
@@ -551,7 +551,7 @@
"\n",
"To do this we will need to leverage a VectorDB, this will allow us to match a given query with similar examples from the training data. These examples will hopefully help increase the accuracy of our classifier\n",
"\n",
"We will build a simple VectorDB class that leverages the embedding models created by [VoyageAI](https://docs.anthropic.com/en/docs/embeddings)"
"We will build a simple VectorDB class that leverages the embedding models created by [VoyageAI](https://docs.claude.com/en/docs/embeddings)"
]
},
{
@@ -905,7 +905,7 @@
"source": [
"# Evaluation\n",
"\n",
"This guide has illustrated the importance of measuring prompt performance empirically when prompt engineering. You can read more about our empirical methodology to prompt engineering [here](https://docs.anthropic.com/en/docs/prompt-engineering). Using a Jupyter Notebook is a great way to start prompt engineering but as your datasets grow larger and your prompts more numerous it is important to leverage tooling that will scale with you. \n",
"This guide has illustrated the importance of measuring prompt performance empirically when prompt engineering. You can read more about our empirical methodology to prompt engineering [here](https://docs.claude.com/en/docs/prompt-engineering). Using a Jupyter Notebook is a great way to start prompt engineering but as your datasets grow larger and your prompts more numerous it is important to leverage tooling that will scale with you. \n",
"\n",
"In this section of the guide we will explore using [Promptfoo](https://www.promptfoo.dev/) an open source LLM evaluation toolkit. To get started head over to the `./evaluation` directory and checkout the `./evaluation/README.md`.\n",
"\n",

@@ -98,7 +98,7 @@
"import os\n",
"\n",
"os.environ['VOYAGE_API_KEY'] = \"YOUR KEY HERE\"\n",
"os.environ['ANTHROPIC_API_KEY'] = \"YOUR KEY HERE\"\n",
"os.environ['CLAUDE_API_KEY'] = \"YOUR KEY HERE\"\n",
"os.environ['COHERE_API_KEY'] = \"YOUR KEY HERE\""
]
},
@@ -112,7 +112,7 @@
"\n",
"client = anthropic.Anthropic(\n",
" # This is the default and can be omitted\n",
" api_key=os.getenv(\"ANTHROPIC_API_KEY\"),\n",
" api_key=os.getenv(\"CLAUDE_API_KEY\"),\n",
")"
]
},
@@ -449,7 +449,7 @@
"\n",
"The extra work we're doing to 'situate' each document happens only at ingestion time: it's a cost you'll pay once when you store each document (and periodically in the future if you have a knowledge base that updates over time). There are many approaches like HyDE (hypothetical document embeddings) which involve performing steps to improve the representation of the query prior to executing a search. These techniques have shown to be moderately effective, but they add significant latency at runtime.\n",
"\n",
"[Prompt caching](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching) also makes this much more cost effective. Creating contextual embeddings requires us to pass the same document to the model for every chunk we want to generate extra context for. With prompt caching, we can write the overall doc to the cache once, and then because we're doing our ingestion job all in sequence, we can just read the document from cache as we generate context for each chunk within that document (the information you write to the cache has a 5 minute time to live). This means that the first time we pass a document to the model, we pay a bit more to write it to the cache, but for each subsequent API call that contains that doc, we receive a 90% discount on all of the input tokens read from the cache. Assuming 800 token chunks, 8k token documents, 50 token context instructions, and 100 tokens of context per chunk, the cost to generate contextualized chunks is $1.02 per million document tokens.\n",
"[Prompt caching](https://docs.claude.com/en/docs/build-with-claude/prompt-caching) also makes this much more cost effective. Creating contextual embeddings requires us to pass the same document to the model for every chunk we want to generate extra context for. With prompt caching, we can write the overall doc to the cache once, and then because we're doing our ingestion job all in sequence, we can just read the document from cache as we generate context for each chunk within that document (the information you write to the cache has a 5 minute time to live). This means that the first time we pass a document to the model, we pay a bit more to write it to the cache, but for each subsequent API call that contains that doc, we receive a 90% discount on all of the input tokens read from the cache. Assuming 800 token chunks, 8k token documents, 50 token context instructions, and 100 tokens of context per chunk, the cost to generate contextualized chunks is $1.02 per million document tokens.\n",
"\n",
"When you load data into your ContextualVectorDB below, you'll see in logs just how big this impact is. \n",
"\n",
@@ -549,14 +549,14 @@
"from concurrent.futures import ThreadPoolExecutor, as_completed\n",
"\n",
"class ContextualVectorDB:\n",
"    def __init__(self, name: str, voyage_api_key=None, anthropic_api_key=None):\n",
"    def __init__(self, name: str, voyage_api_key=None, CLAUDE_API_KEY=None):\n",
"        if voyage_api_key is None:\n",
"            voyage_api_key = os.getenv(\"VOYAGE_API_KEY\")\n",
"        if anthropic_api_key is None:\n",
"            anthropic_api_key = os.getenv(\"ANTHROPIC_API_KEY\")\n",
"        if CLAUDE_API_KEY is None:\n",
"            CLAUDE_API_KEY = os.getenv(\"CLAUDE_API_KEY\")\n",
" \n",
"        self.voyage_client = voyageai.Client(api_key=voyage_api_key)\n",
"        self.anthropic_client = anthropic.Anthropic(api_key=anthropic_api_key)\n",
"        self.anthropic_client = anthropic.Anthropic(api_key=CLAUDE_API_KEY)\n",
"        self.name = name\n",
"        self.embeddings = []\n",
"        self.metadata = []\n",

File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
@@ -29,7 +29,7 @@ The evaluation is orchestrated by the `promptfooconfig...` `.yaml` files. In our
- Prompts
  - Promptfoo enables you to import prompts in many different formats. You can read more about this [here](https://www.promptfoo.dev/docs/configuration/parameters).
  - We have 3 prompts in our end to end evaluation config: each of which corresponds to a method use
    - The functions are identical to those used in `guide.ipynb` except that instead of calling the Anthropic API they just return the prompt. Promptfoo then handles the orchestration of calling the API and storing the results.
    - The functions are identical to those used in `guide.ipynb` except that instead of calling the Claude API they just return the prompt. Promptfoo then handles the orchestration of calling the API and storing the results.
    - You can read more about prompt functions [here](https://www.promptfoo.dev/docs/configuration/parameters#prompt-functions). Using python allows us to reuse the VectorDB class which is necessary for RAG, this is defined in `vectordb.py`.
- Providers
  - With Promptfoo you can connect to many different LLMs from different platforms, see [here for more](https://www.promptfoo.dev/docs/providers). In `guide.ipynb` we used Haiku with default temperature 0.0. We will use Promptfoo to experiment with different models.
@@ -47,7 +47,7 @@ To get started with Promptfoo open your terminal and navigate to this directory

Before running your evaluation you must define the following environment variables:

`export ANTHROPIC_API_KEY=YOUR_API_KEY`
`export CLAUDE_API_KEY=YOUR_API_KEY`
`export VOYAGE_API_KEY=YOUR_API_KEY`

From the `evaluation` directory, run one of the following commands.

@@ -8,7 +8,7 @@ How do the additional tokens required for tool use in Claude API requests impact
"When will the new Anthropic Developer Console features that show API usage, billing details, and rate limits be available?",0.3333333333333333,1.0,1.0,True
"When deciding whether to use chain-of-thought (CoT) for a task, what are two key factors to consider in order to strike the right balance between performance and latency?",0.6666666666666666,1.0,1.0,False
How can I use Claude to more easily digest the content of long PDF documents?,0.3333333333333333,0.5,0.3333333333333333,True
"According to the documentation, where can you view your organization's current API rate limits in the Anthropic Console?",0.6666666666666666,1.0,1.0,False
"According to the documentation, where can you view your organization's current API rate limits in the Claude Console?",0.6666666666666666,1.0,1.0,False
How can we measure the performance of the ticket classification system implemented using Claude beyond just accuracy?,0.0,0.0,0.0,False
How can you specify a system prompt using the Text Completions API versus the Messages API?,0.3333333333333333,0.5,1.0,True
How can you combine XML tags with chain of thought reasoning to create high-performance prompts for Claude?,0.0,0.0,0.0,False
@@ -23,23 +23,23 @@ How can you access and deploy Voyage embeddings on AWS Marketplace?,0.3333333333
"When using tools just to get Claude to produce JSON output following a particular schema, what key things should you do in terms of tool setup and prompting?",0.3333333333333333,0.5,0.3333333333333333,False
What are the key differences between the legacy Claude Instant 1.2 model and the Claude 3 Haiku model in terms of capabilities and performance?,0.6666666666666666,0.6666666666666666,1.0,False
What is one key benefit of using examples when prompt engineering with Claude?,0.3333333333333333,1.0,0.5,True
"According to the Anthropic documentation, what is one key advantage of using prompt engineering instead of fine-tuning when it comes to adapting an AI model to new domains or tasks?",0.3333333333333333,0.5,1.0,False
"According to the Claude Documentation, what is one key advantage of using prompt engineering instead of fine-tuning when it comes to adapting an AI model to new domains or tasks?",0.3333333333333333,0.5,1.0,False
How can I quickly get started using the Claude for Sheets extension with a pre-made template?,0.6666666666666666,1.0,1.0,True
"How does the ""index"" field in the ""content_block_delta"" event relate to the text being streamed in a response?",0.3333333333333333,0.5,0.5,True
"How can you include an image as part of a Claude API request, and what image formats are currently supported?",0.0,0.0,0.0,False
What is the relationship between time to first token (TTFT) and latency when evaluating a language model's performance?,1.0,1.0,1.0,True
How can providing Claude with examples of handling certain edge cases like implicit requests or emotional prioritization help improve its performance in routing support tickets?,0.3333333333333333,0.5,1.0,True
"How does the stop_reason of ""tool_use"" relate to the overall workflow of integrating external tools with Claude?",0.3333333333333333,0.5,1.0,True
"According to the documentation, what error event and corresponding HTTP error code may be sent during periods of high usage for the Anthropic API when using streaming responses?",1.0,1.0,1.0,True
What are the two types of deltas that can be contained in a content_block_delta event when streaming responses from the Anthropic API?,0.6666666666666666,1.0,1.0,True
"On what date did Claude 3.5 Sonnet and tool use both become generally available across the Anthropic API, Amazon Bedrock, and Google Vertex AI?",0.6666666666666666,1.0,1.0,False
"According to the documentation, what error event and corresponding HTTP error code may be sent during periods of high usage for the Claude API when using streaming responses?",1.0,1.0,1.0,True
What are the two types of deltas that can be contained in a content_block_delta event when streaming responses from the Claude API?,0.6666666666666666,1.0,1.0,True
"On what date did Claude 3.5 Sonnet and tool use both become generally available across the Claude API, Amazon Bedrock, and Google Vertex AI?",0.6666666666666666,1.0,1.0,False
In what order did Anthropic launch Claude.ai and the Claude iOS app in Canada and Europe?,0.6666666666666666,1.0,1.0,True
"When the API response from Claude has a stop_reason of ""tool_use"", what does this indicate and what should be done next to continue the conversation?",0.3333333333333333,0.5,1.0,True
What Python libraries are used in the example code snippet for evaluating tone and style in a customer service chatbot?,0.0,0.0,0.0,True
What are the two main ways to authenticate when using the Anthropic Python SDK to access Claude models on Amazon Bedrock?,0.6666666666666666,1.0,1.0,True
"When deciding whether to implement leak-resistant prompt engineering strategies, what two factors should be considered and balanced?",1.0,1.0,1.0,True
How can selecting the appropriate Claude model based on your specific requirements help reduce latency in your application?,0.6666666666666666,1.0,1.0,True
How can you stream responses from the Anthropic API using the Python SDK?,0.3333333333333333,0.5,1.0,True
How can you stream responses from the Claude API using the Python SDK?,0.3333333333333333,0.5,1.0,True
"How can you guide Claude's response by pre-filling part of the response, and what API parameter is used to generate a short response in this case?",0.0,0.0,0.0,True
"What is more important when building an eval set for an AI system - having a larger number of test cases with automated grading, or having fewer high-quality test cases graded by humans?",0.3333333333333333,0.5,1.0,True
What are the two required fields in a content_block_delta event for a text delta type?,0.6666666666666666,1.0,1.0,False
@@ -48,7 +48,7 @@ Why does breaking a task into distinct subtasks for chained prompts help improve
How does the streaming format for Messages responses differ from Text Completions streaming responses?,0.3333333333333333,1.0,1.0,True
"What are two ways to start experimenting with Claude as a user, according to Anthropic's documentation?",0.0,0.0,0.0,False
How can using chain prompts help reduce errors and inconsistency in complex tasks handled by Claude?,0.6666666666666666,1.0,1.0,True
What HTTP status code does an overloaded_error event correspond to in a non-streaming context for the Anthropic API?,0.6666666666666666,1.0,1.0,True
What HTTP status code does an overloaded_error event correspond to in a non-streaming context for the Claude API?,0.6666666666666666,1.0,1.0,True
What are the two ways to specify the format in which Voyage AI returns embeddings through its HTTP API?,0.3333333333333333,1.0,1.0,True
"When streaming API requests that use tools, how are the input JSON deltas for tool_use content blocks sent, and how can they be accumulated and parsed by the client?",0.3333333333333333,0.5,1.0,True
"What are the two interactive prompt engineering tutorials that Anthropic offers, and how do they differ?",0.6666666666666666,1.0,1.0,True
@@ -62,7 +62,7 @@ How can using examples in prompts improve Claude's performance on complex tasks?
"What are the two types of content block deltas that can be emitted when streaming responses with tool use, and what does each delta type contain?",1.0,0.75,1.0,True
What are two key capabilities of Claude that enable it to build interactive systems and personalized user experiences?,0.3333333333333333,1.0,0.5,False
"What are the key event types included in a raw HTTP stream response when using message streaming, and what is the typical order they occur in?",0.6666666666666666,1.0,1.0,True
What is the maximum number of images that can be included in a single request using the Anthropic API compared to the claude.ai interface?,0.3333333333333333,0.5,0.3333333333333333,True
What is the maximum number of images that can be included in a single request using the Claude API compared to the claude.ai interface?,0.3333333333333333,0.5,0.3333333333333333,True
"When Claude's response is cut off due to hitting the max_tokens limit and contains an incomplete tool use block, what should you do to get the full tool use?",0.0,0.0,0.0,False
What two steps are needed before running a classification evaluation on Claude according to the documentation?,0.0,0.0,0.0,False
How can you use the content parameter in the messages list to influence Claude's response?,0.0,0.0,0.0,False
@@ -74,7 +74,7 @@ How do the streaming API delta formats differ between tool_use content blocks an
What are the image file size limits when uploading images to Claude using the API versus on claude.ai?,0.3333333333333333,1.0,1.0,True
What is one key consideration when selecting a Claude model for an enterprise use case that needs low latency?,0.6666666666666666,1.0,0.5,True
"What embedding model does Anthropic recommend for code retrieval, and how does its performance compare to alternatives according to Voyage AI?",0.6666666666666666,1.0,1.0,True
What are two ways the Anthropic Cookbook can help developers learn to use Anthropic's APIs?,0.6666666666666666,1.0,1.0,True
What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?,0.6666666666666666,1.0,1.0,True
How does the size of the context window impact a language model's ability to utilize retrieval augmented generation (RAG)?,0.6666666666666666,1.0,1.0,True
How can the Evaluation tool in Anthropic's Claude platform help improve prompts and build more robust AI applications?,0.3333333333333333,0.5,1.0,True
Which Claude model has the fastest comparative latency according to the comparison tables?,0.6666666666666666,1.0,1.0,True

@@ -8,7 +8,7 @@ How do the additional tokens required for tool use in Claude API requests impact
"When will the new Anthropic Developer Console features that show API usage, billing details, and rate limits be available?",0.3333333333333333,1.0,1.0,True
"When deciding whether to use chain-of-thought (CoT) for a task, what are two key factors to consider in order to strike the right balance between performance and latency?",0.3333333333333333,0.5,1.0,True
How can I use Claude to more easily digest the content of long PDF documents?,0.3333333333333333,0.5,1.0,True
"According to the documentation, where can you view your organization's current API rate limits in the Anthropic Console?",0.6666666666666666,1.0,1.0,True
"According to the documentation, where can you view your organization's current API rate limits in the Claude Console?",0.6666666666666666,1.0,1.0,True
How can we measure the performance of the ticket classification system implemented using Claude beyond just accuracy?,0.3333333333333333,0.5,0.3333333333333333,True
How can you specify a system prompt using the Text Completions API versus the Messages API?,0.3333333333333333,0.5,1.0,True
How can you combine XML tags with chain of thought reasoning to create high-performance prompts for Claude?,0.3333333333333333,0.5,1.0,True
@@ -23,23 +23,23 @@ How can you access and deploy Voyage embeddings on AWS Marketplace?,0.3333333333
"When using tools just to get Claude to produce JSON output following a particular schema, what key things should you do in terms of tool setup and prompting?",0.3333333333333333,0.5,1.0,False
What are the key differences between the legacy Claude Instant 1.2 model and the Claude 3 Haiku model in terms of capabilities and performance?,1.0,1.0,1.0,True
What is one key benefit of using examples when prompt engineering with Claude?,0.3333333333333333,1.0,1.0,True
"According to the Anthropic documentation, what is one key advantage of using prompt engineering instead of fine-tuning when it comes to adapting an AI model to new domains or tasks?",0.6666666666666666,1.0,1.0,False
"According to the Claude Documentation, what is one key advantage of using prompt engineering instead of fine-tuning when it comes to adapting an AI model to new domains or tasks?",0.6666666666666666,1.0,1.0,False
How can I quickly get started using the Claude for Sheets extension with a pre-made template?,0.6666666666666666,1.0,1.0,True
"How does the ""index"" field in the ""content_block_delta"" event relate to the text being streamed in a response?",0.3333333333333333,0.5,0.5,True
"How can you include an image as part of a Claude API request, and what image formats are currently supported?",0.3333333333333333,0.5,1.0,True
What is the relationship between time to first token (TTFT) and latency when evaluating a language model's performance?,0.6666666666666666,0.6666666666666666,1.0,True
How can providing Claude with examples of handling certain edge cases like implicit requests or emotional prioritization help improve its performance in routing support tickets?,0.3333333333333333,0.5,1.0,True
"How does the stop_reason of ""tool_use"" relate to the overall workflow of integrating external tools with Claude?",0.3333333333333333,0.5,1.0,True
"According to the documentation, what error event and corresponding HTTP error code may be sent during periods of high usage for the Anthropic API when using streaming responses?",1.0,1.0,1.0,True
What are the two types of deltas that can be contained in a content_block_delta event when streaming responses from the Anthropic API?,0.6666666666666666,1.0,1.0,False
"On what date did Claude 3.5 Sonnet and tool use both become generally available across the Anthropic API, Amazon Bedrock, and Google Vertex AI?",0.6666666666666666,1.0,1.0,False
"According to the documentation, what error event and corresponding HTTP error code may be sent during periods of high usage for the Claude API when using streaming responses?",1.0,1.0,1.0,True
What are the two types of deltas that can be contained in a content_block_delta event when streaming responses from the Claude API?,0.6666666666666666,1.0,1.0,False
"On what date did Claude 3.5 Sonnet and tool use both become generally available across the Claude API, Amazon Bedrock, and Google Vertex AI?",0.6666666666666666,1.0,1.0,False
In what order did Anthropic launch Claude.ai and the Claude iOS app in Canada and Europe?,0.6666666666666666,1.0,1.0,True
"When the API response from Claude has a stop_reason of ""tool_use"", what does this indicate and what should be done next to continue the conversation?",0.3333333333333333,0.5,1.0,True
What Python libraries are used in the example code snippet for evaluating tone and style in a customer service chatbot?,0.0,0.0,0.0,False
What are the two main ways to authenticate when using the Anthropic Python SDK to access Claude models on Amazon Bedrock?,0.3333333333333333,0.5,0.3333333333333333,False
"When deciding whether to implement leak-resistant prompt engineering strategies, what two factors should be considered and balanced?",0.6666666666666666,1.0,1.0,True
How can selecting the appropriate Claude model based on your specific requirements help reduce latency in your application?,0.0,0.0,0.0,True
How can you stream responses from the Anthropic API using the Python SDK?,0.3333333333333333,0.5,1.0,True
How can you stream responses from the Claude API using the Python SDK?,0.3333333333333333,0.5,1.0,True
"How can you guide Claude's response by pre-filling part of the response, and what API parameter is used to generate a short response in this case?",0.3333333333333333,0.5,0.3333333333333333,True
"What is more important when building an eval set for an AI system - having a larger number of test cases with automated grading, or having fewer high-quality test cases graded by humans?",0.0,0.0,0.0,True
What are the two required fields in a content_block_delta event for a text delta type?,0.6666666666666666,1.0,1.0,False
@@ -48,7 +48,7 @@ Why does breaking a task into distinct subtasks for chained prompts help improve
How does the streaming format for Messages responses differ from Text Completions streaming responses?,0.3333333333333333,1.0,1.0,True
"What are two ways to start experimenting with Claude as a user, according to Anthropic's documentation?",0.3333333333333333,1.0,1.0,False
How can using chain prompts help reduce errors and inconsistency in complex tasks handled by Claude?,0.3333333333333333,0.5,1.0,True
What HTTP status code does an overloaded_error event correspond to in a non-streaming context for the Anthropic API?,0.6666666666666666,1.0,1.0,True
What HTTP status code does an overloaded_error event correspond to in a non-streaming context for the Claude API?,0.6666666666666666,1.0,1.0,True
What are the two ways to specify the format in which Voyage AI returns embeddings through its HTTP API?,0.3333333333333333,1.0,1.0,True
"When streaming API requests that use tools, how are the input JSON deltas for tool_use content blocks sent, and how can they be accumulated and parsed by the client?",0.3333333333333333,0.5,1.0,True
"What are the two interactive prompt engineering tutorials that Anthropic offers, and how do they differ?",0.3333333333333333,0.5,1.0,True
@@ -62,7 +62,7 @@ How can using examples in prompts improve Claude's performance on complex tasks?
"What are the two types of content block deltas that can be emitted when streaming responses with tool use, and what does each delta type contain?",0.6666666666666666,0.5,1.0,True
What are two key capabilities of Claude that enable it to build interactive systems and personalized user experiences?,0.3333333333333333,1.0,1.0,True
"What are the key event types included in a raw HTTP stream response when using message streaming, and what is the typical order they occur in?",0.6666666666666666,1.0,1.0,True
What is the maximum number of images that can be included in a single request using the Anthropic API compared to the claude.ai interface?,0.3333333333333333,0.5,0.5,True
What is the maximum number of images that can be included in a single request using the Claude API compared to the claude.ai interface?,0.3333333333333333,0.5,0.5,True
"When Claude's response is cut off due to hitting the max_tokens limit and contains an incomplete tool use block, what should you do to get the full tool use?",0.3333333333333333,1.0,1.0,True
What two steps are needed before running a classification evaluation on Claude according to the documentation?,0.3333333333333333,0.5,0.5,False
How can you use the content parameter in the messages list to influence Claude's response?,0.0,0.0,0.0,True
@@ -74,7 +74,7 @@ How do the streaming API delta formats differ between tool_use content blocks an
What are the image file size limits when uploading images to Claude using the API versus on claude.ai?,0.3333333333333333,1.0,1.0,True
What is one key consideration when selecting a Claude model for an enterprise use case that needs low latency?,0.3333333333333333,0.5,0.5,True
"What embedding model does Anthropic recommend for code retrieval, and how does its performance compare to alternatives according to Voyage AI?",0.3333333333333333,0.5,1.0,False
What are two ways the Anthropic Cookbook can help developers learn to use Anthropic's APIs?,0.6666666666666666,1.0,0.5,False
What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?,0.6666666666666666,1.0,0.5,False
How does the size of the context window impact a language model's ability to utilize retrieval augmented generation (RAG)?,0.6666666666666666,1.0,1.0,True
How can the Evaluation tool in Anthropic's Claude platform help improve prompts and build more robust AI applications?,0.3333333333333333,0.5,1.0,True
Which Claude model has the fastest comparative latency according to the comparison tables?,0.0,0.0,0.0,True

@@ -8,7 +8,7 @@ How do the additional tokens required for tool use in Claude API requests impact
"When will the new Anthropic Developer Console features that show API usage, billing details, and rate limits be available?",0.3333333333333333,1.0,1.0,True
"When deciding whether to use chain-of-thought (CoT) for a task, what are two key factors to consider in order to strike the right balance between performance and latency?",0.6666666666666666,1.0,1.0,False
How can I use Claude to more easily digest the content of long PDF documents?,0.3333333333333333,0.5,0.5,True
"According to the documentation, where can you view your organization's current API rate limits in the Anthropic Console?",0.6666666666666666,1.0,0.5,True
"According to the documentation, where can you view your organization's current API rate limits in the Claude Console?",0.6666666666666666,1.0,0.5,True
How can we measure the performance of the ticket classification system implemented using Claude beyond just accuracy?,0.0,0.0,0.0,False
How can you specify a system prompt using the Text Completions API versus the Messages API?,0.3333333333333333,0.5,1.0,True
How can you combine XML tags with chain of thought reasoning to create high-performance prompts for Claude?,0.0,0.0,0.0,False
@@ -23,23 +23,23 @@ How can you access and deploy Voyage embeddings on AWS Marketplace?,0.3333333333
"When using tools just to get Claude to produce JSON output following a particular schema, what key things should you do in terms of tool setup and prompting?",0.3333333333333333,0.5,0.5,False
What are the key differences between the legacy Claude Instant 1.2 model and the Claude 3 Haiku model in terms of capabilities and performance?,1.0,1.0,1.0,True
What is one key benefit of using examples when prompt engineering with Claude?,0.3333333333333333,1.0,1.0,True
"According to the Anthropic documentation, what is one key advantage of using prompt engineering instead of fine-tuning when it comes to adapting an AI model to new domains or tasks?",0.3333333333333333,0.5,1.0,False
"According to the Claude Documentation, what is one key advantage of using prompt engineering instead of fine-tuning when it comes to adapting an AI model to new domains or tasks?",0.3333333333333333,0.5,1.0,False
How can I quickly get started using the Claude for Sheets extension with a pre-made template?,0.6666666666666666,1.0,1.0,True
"How does the ""index"" field in the ""content_block_delta"" event relate to the text being streamed in a response?",0.3333333333333333,0.5,0.5,True
"How can you include an image as part of a Claude API request, and what image formats are currently supported?",0.3333333333333333,0.5,0.3333333333333333,True
What is the relationship between time to first token (TTFT) and latency when evaluating a language model's performance?,1.0,1.0,1.0,True
How can providing Claude with examples of handling certain edge cases like implicit requests or emotional prioritization help improve its performance in routing support tickets?,0.3333333333333333,0.5,1.0,True
"How does the stop_reason of ""tool_use"" relate to the overall workflow of integrating external tools with Claude?",0.3333333333333333,0.5,1.0,True
"According to the documentation, what error event and corresponding HTTP error code may be sent during periods of high usage for the Anthropic API when using streaming responses?",1.0,1.0,1.0,True
What are the two types of deltas that can be contained in a content_block_delta event when streaming responses from the Anthropic API?,0.6666666666666666,1.0,1.0,True
"On what date did Claude 3.5 Sonnet and tool use both become generally available across the Anthropic API, Amazon Bedrock, and Google Vertex AI?",0.6666666666666666,1.0,1.0,False
"According to the documentation, what error event and corresponding HTTP error code may be sent during periods of high usage for the Claude API when using streaming responses?",1.0,1.0,1.0,True
What are the two types of deltas that can be contained in a content_block_delta event when streaming responses from the Claude API?,0.6666666666666666,1.0,1.0,True
"On what date did Claude 3.5 Sonnet and tool use both become generally available across the Claude API, Amazon Bedrock, and Google Vertex AI?",0.6666666666666666,1.0,1.0,False
In what order did Anthropic launch Claude.ai and the Claude iOS app in Canada and Europe?,0.6666666666666666,1.0,1.0,True
"When the API response from Claude has a stop_reason of ""tool_use"", what does this indicate and what should be done next to continue the conversation?",0.3333333333333333,0.5,1.0,True
What Python libraries are used in the example code snippet for evaluating tone and style in a customer service chatbot?,0.0,0.0,0.0,True
What are the two main ways to authenticate when using the Anthropic Python SDK to access Claude models on Amazon Bedrock?,0.3333333333333333,0.5,1.0,True
"When deciding whether to implement leak-resistant prompt engineering strategies, what two factors should be considered and balanced?",1.0,1.0,1.0,True
How can selecting the appropriate Claude model based on your specific requirements help reduce latency in your application?,0.6666666666666666,1.0,1.0,True
How can you stream responses from the Anthropic API using the Python SDK?,0.3333333333333333,0.5,1.0,True
How can you stream responses from the Claude API using the Python SDK?,0.3333333333333333,0.5,1.0,True
"How can you guide Claude's response by pre-filling part of the response, and what API parameter is used to generate a short response in this case?",0.3333333333333333,0.5,0.3333333333333333,True
"What is more important when building an eval set for an AI system - having a larger number of test cases with automated grading, or having fewer high-quality test cases graded by humans?",0.3333333333333333,0.5,1.0,True
What are the two required fields in a content_block_delta event for a text delta type?,0.6666666666666666,1.0,1.0,False
@@ -48,7 +48,7 @@ Why does breaking a task into distinct subtasks for chained prompts help improve
How does the streaming format for Messages responses differ from Text Completions streaming responses?,0.3333333333333333,1.0,1.0,True
"What are two ways to start experimenting with Claude as a user, according to Anthropic's documentation?",0.3333333333333333,1.0,0.5,False
How can using chain prompts help reduce errors and inconsistency in complex tasks handled by Claude?,0.6666666666666666,1.0,1.0,True
What HTTP status code does an overloaded_error event correspond to in a non-streaming context for the Anthropic API?,0.6666666666666666,1.0,1.0,True
What HTTP status code does an overloaded_error event correspond to in a non-streaming context for the Claude API?,0.6666666666666666,1.0,1.0,True
What are the two ways to specify the format in which Voyage AI returns embeddings through its HTTP API?,0.3333333333333333,1.0,1.0,True
"When streaming API requests that use tools, how are the input JSON deltas for tool_use content blocks sent, and how can they be accumulated and parsed by the client?",0.3333333333333333,0.5,1.0,True
"What are the two interactive prompt engineering tutorials that Anthropic offers, and how do they differ?",0.6666666666666666,1.0,1.0,True
@@ -62,7 +62,7 @@ How can using examples in prompts improve Claude's performance on complex tasks?
"What are the two types of content block deltas that can be emitted when streaming responses with tool use, and what does each delta type contain?",1.0,0.75,1.0,True
What are two key capabilities of Claude that enable it to build interactive systems and personalized user experiences?,0.3333333333333333,1.0,0.5,False
"What are the key event types included in a raw HTTP stream response when using message streaming, and what is the typical order they occur in?",0.6666666666666666,1.0,1.0,True
What is the maximum number of images that can be included in a single request using the Anthropic API compared to the claude.ai interface?,0.0,0.0,0.0,True
What is the maximum number of images that can be included in a single request using the Claude API compared to the claude.ai interface?,0.0,0.0,0.0,True
"When Claude's response is cut off due to hitting the max_tokens limit and contains an incomplete tool use block, what should you do to get the full tool use?",0.3333333333333333,1.0,0.3333333333333333,True
What two steps are needed before running a classification evaluation on Claude according to the documentation?,0.0,0.0,0.0,False
How can you use the content parameter in the messages list to influence Claude's response?,0.3333333333333333,0.5,0.3333333333333333,False
@@ -74,7 +74,7 @@ How do the streaming API delta formats differ between tool_use content blocks an
What are the image file size limits when uploading images to Claude using the API versus on claude.ai?,0.3333333333333333,1.0,1.0,True
What is one key consideration when selecting a Claude model for an enterprise use case that needs low latency?,0.6666666666666666,1.0,1.0,True
"What embedding model does Anthropic recommend for code retrieval, and how does its performance compare to alternatives according to Voyage AI?",0.6666666666666666,1.0,1.0,True
What are two ways the Anthropic Cookbook can help developers learn to use Anthropic's APIs?,0.6666666666666666,1.0,1.0,True
What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?,0.6666666666666666,1.0,1.0,True
How does the size of the context window impact a language model's ability to utilize retrieval augmented generation (RAG)?,0.6666666666666666,1.0,1.0,True
How can the Evaluation tool in Anthropic's Claude platform help improve prompts and build more robust AI applications?,0.3333333333333333,0.5,1.0,True
Which Claude model has the fastest comparative latency according to the comparison tables?,0.3333333333333333,0.5,1.0,True

@@ -8,7 +8,7 @@ How do the additional tokens required for tool use in Claude API requests impact
"When will the new Anthropic Developer Console features that show API usage, billing details, and rate limits be available?",0.3333333333333333,1.0,1.0,True
"When deciding whether to use chain-of-thought (CoT) for a task, what are two key factors to consider in order to strike the right balance between performance and latency?",0.6666666666666666,1.0,1.0,False
How can I use Claude to more easily digest the content of long PDF documents?,0.3333333333333333,0.5,0.3333333333333333,True
"According to the documentation, where can you view your organization's current API rate limits in the Anthropic Console?",0.6666666666666666,1.0,1.0,False
"According to the documentation, where can you view your organization's current API rate limits in the Claude Console?",0.6666666666666666,1.0,1.0,False
How can we measure the performance of the ticket classification system implemented using Claude beyond just accuracy?,0.0,0.0,0.0,False
How can you specify a system prompt using the Text Completions API versus the Messages API?,0.3333333333333333,0.5,1.0,True
How can you combine XML tags with chain of thought reasoning to create high-performance prompts for Claude?,0.0,0.0,0.0,False
@@ -23,23 +23,23 @@ How can you access and deploy Voyage embeddings on AWS Marketplace?,0.3333333333
"When using tools just to get Claude to produce JSON output following a particular schema, what key things should you do in terms of tool setup and prompting?",0.3333333333333333,0.5,0.3333333333333333,False
What are the key differences between the legacy Claude Instant 1.2 model and the Claude 3 Haiku model in terms of capabilities and performance?,0.6666666666666666,0.6666666666666666,1.0,True
What is one key benefit of using examples when prompt engineering with Claude?,0.3333333333333333,1.0,0.5,True
"According to the Anthropic documentation, what is one key advantage of using prompt engineering instead of fine-tuning when it comes to adapting an AI model to new domains or tasks?",0.3333333333333333,0.5,1.0,False
"According to the Claude Documentation, what is one key advantage of using prompt engineering instead of fine-tuning when it comes to adapting an AI model to new domains or tasks?",0.3333333333333333,0.5,1.0,False
How can I quickly get started using the Claude for Sheets extension with a pre-made template?,0.6666666666666666,1.0,1.0,True
"How does the ""index"" field in the ""content_block_delta"" event relate to the text being streamed in a response?",0.3333333333333333,0.5,0.5,True
"How can you include an image as part of a Claude API request, and what image formats are currently supported?",0.0,0.0,0.0,False
What is the relationship between time to first token (TTFT) and latency when evaluating a language model's performance?,1.0,1.0,1.0,True
How can providing Claude with examples of handling certain edge cases like implicit requests or emotional prioritization help improve its performance in routing support tickets?,0.3333333333333333,0.5,1.0,True
"How does the stop_reason of ""tool_use"" relate to the overall workflow of integrating external tools with Claude?",0.3333333333333333,0.5,1.0,True
"According to the documentation, what error event and corresponding HTTP error code may be sent during periods of high usage for the Anthropic API when using streaming responses?",1.0,1.0,1.0,True
What are the two types of deltas that can be contained in a content_block_delta event when streaming responses from the Anthropic API?,0.6666666666666666,1.0,1.0,True
"On what date did Claude 3.5 Sonnet and tool use both become generally available across the Anthropic API, Amazon Bedrock, and Google Vertex AI?",0.6666666666666666,1.0,1.0,False
"According to the documentation, what error event and corresponding HTTP error code may be sent during periods of high usage for the Claude API when using streaming responses?",1.0,1.0,1.0,True
What are the two types of deltas that can be contained in a content_block_delta event when streaming responses from the Claude API?,0.6666666666666666,1.0,1.0,True
"On what date did Claude 3.5 Sonnet and tool use both become generally available across the Claude API, Amazon Bedrock, and Google Vertex AI?",0.6666666666666666,1.0,1.0,False
In what order did Anthropic launch Claude.ai and the Claude iOS app in Canada and Europe?,0.6666666666666666,1.0,1.0,True
"When the API response from Claude has a stop_reason of ""tool_use"", what does this indicate and what should be done next to continue the conversation?",0.3333333333333333,0.5,1.0,True
What Python libraries are used in the example code snippet for evaluating tone and style in a customer service chatbot?,0.0,0.0,0.0,True
What are the two main ways to authenticate when using the Anthropic Python SDK to access Claude models on Amazon Bedrock?,0.6666666666666666,1.0,1.0,True
"When deciding whether to implement leak-resistant prompt engineering strategies, what two factors should be considered and balanced?",1.0,1.0,1.0,True
How can selecting the appropriate Claude model based on your specific requirements help reduce latency in your application?,0.6666666666666666,1.0,1.0,True
How can you stream responses from the Anthropic API using the Python SDK?,0.3333333333333333,0.5,1.0,True
How can you stream responses from the Claude API using the Python SDK?,0.3333333333333333,0.5,1.0,True
"How can you guide Claude's response by pre-filling part of the response, and what API parameter is used to generate a short response in this case?",0.0,0.0,0.0,True
"What is more important when building an eval set for an AI system - having a larger number of test cases with automated grading, or having fewer high-quality test cases graded by humans?",0.3333333333333333,0.5,1.0,True
What are the two required fields in a content_block_delta event for a text delta type?,0.6666666666666666,1.0,1.0,False
@@ -48,7 +48,7 @@ Why does breaking a task into distinct subtasks for chained prompts help improve
How does the streaming format for Messages responses differ from Text Completions streaming responses?,0.3333333333333333,1.0,1.0,True
"What are two ways to start experimenting with Claude as a user, according to Anthropic's documentation?",0.0,0.0,0.0,False
How can using chain prompts help reduce errors and inconsistency in complex tasks handled by Claude?,0.6666666666666666,1.0,1.0,True
What HTTP status code does an overloaded_error event correspond to in a non-streaming context for the Anthropic API?,0.6666666666666666,1.0,1.0,True
What HTTP status code does an overloaded_error event correspond to in a non-streaming context for the Claude API?,0.6666666666666666,1.0,1.0,True
What are the two ways to specify the format in which Voyage AI returns embeddings through its HTTP API?,0.3333333333333333,1.0,1.0,True
"When streaming API requests that use tools, how are the input JSON deltas for tool_use content blocks sent, and how can they be accumulated and parsed by the client?",0.3333333333333333,0.5,1.0,True
"What are the two interactive prompt engineering tutorials that Anthropic offers, and how do they differ?",0.6666666666666666,1.0,1.0,True
@@ -62,7 +62,7 @@ How can using examples in prompts improve Claude's performance on complex tasks?
"What are the two types of content block deltas that can be emitted when streaming responses with tool use, and what does each delta type contain?",1.0,0.75,1.0,True
What are two key capabilities of Claude that enable it to build interactive systems and personalized user experiences?,0.3333333333333333,1.0,0.5,False
"What are the key event types included in a raw HTTP stream response when using message streaming, and what is the typical order they occur in?",0.6666666666666666,1.0,1.0,True
What is the maximum number of images that can be included in a single request using the Anthropic API compared to the claude.ai interface?,0.3333333333333333,0.5,0.3333333333333333,True
What is the maximum number of images that can be included in a single request using the Claude API compared to the claude.ai interface?,0.3333333333333333,0.5,0.3333333333333333,True
"When Claude's response is cut off due to hitting the max_tokens limit and contains an incomplete tool use block, what should you do to get the full tool use?",0.0,0.0,0.0,False
What two steps are needed before running a classification evaluation on Claude according to the documentation?,0.0,0.0,0.0,False
How can you use the content parameter in the messages list to influence Claude's response?,0.0,0.0,0.0,False
@@ -74,7 +74,7 @@ How do the streaming API delta formats differ between tool_use content blocks an
What are the image file size limits when uploading images to Claude using the API versus on claude.ai?,0.3333333333333333,1.0,1.0,True
What is one key consideration when selecting a Claude model for an enterprise use case that needs low latency?,0.6666666666666666,1.0,0.5,True
"What embedding model does Anthropic recommend for code retrieval, and how does its performance compare to alternatives according to Voyage AI?",0.6666666666666666,1.0,1.0,True
What are two ways the Anthropic Cookbook can help developers learn to use Anthropic's APIs?,0.6666666666666666,1.0,1.0,False
What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?,0.6666666666666666,1.0,1.0,False
How does the size of the context window impact a language model's ability to utilize retrieval augmented generation (RAG)?,0.6666666666666666,1.0,1.0,True
How can the Evaluation tool in Anthropic's Claude platform help improve prompts and build more robust AI applications?,0.3333333333333333,0.5,1.0,True
Which Claude model has the fastest comparative latency according to the comparison tables?,0.6666666666666666,1.0,1.0,True

File diff suppressed because it is too large
@@ -32,7 +32,7 @@ def evaluate_end_to_end(query, generated_answer, correct_answer):
</evaluation>
"""

client = Anthropic(api_key=os.environ.get('ANTHROPIC_API_KEY'))
client = Anthropic(api_key=os.environ.get('CLAUDE_API_KEY'))
try:
response = client.messages.create(
model="claude-3-5-sonnet-20241022",

@@ -8,7 +8,7 @@ query,correct_answer,__expected
"When will the new Anthropic Developer Console features that show API usage, billing details, and rate limits be available?","The new Usage, Cost, and Rate Limits tabs in the Anthropic Developer Console that show API usage, billing details, and current rate limits will be available on June 27th, 2024.","python:file://eval_end_to_end.py"
"When deciding whether to use chain-of-thought (CoT) for a task, what are two key factors to consider in order to strike the right balance between performance and latency?","When deciding whether to use CoT, consider if the task requires in-depth thinking that a human would need to work through, and be aware that the increased output length from CoT may impact latency.","python:file://eval_end_to_end.py"
"How can I use Claude to more easily digest the content of long PDF documents?","You can upload PDFs and have Claude summarize their content, making it easier to understand the key points of long documents without having to read through everything.","python:file://eval_end_to_end.py"
"According to the documentation, where can you view your organization's current API rate limits in the Anthropic Console?","You can view your organization's current API rate limits in the Rate Limits tab of the Developer Console.","python:file://eval_end_to_end.py"
"According to the documentation, where can you view your organization's current API rate limits in the Claude Console?","You can view your organization's current API rate limits in the Rate Limits tab of the Developer Console.","python:file://eval_end_to_end.py"
"How can we measure the performance of the ticket classification system implemented using Claude beyond just accuracy?","In addition to accuracy, we can measure the 95th percentile response time and average cost per classification to assess the ticket classification system's performance and production-readiness.","python:file://eval_end_to_end.py"
"How can you specify a system prompt using the Text Completions API versus the Messages API?","With the Text Completions API, the system prompt is added as text before the first ""\n\nHuman:"" turn. With the Messages API, the system prompt is specified using the separate ""system"" parameter when making the API request.","python:file://eval_end_to_end.py"
"How can you combine XML tags with chain of thought reasoning to create high-performance prompts for Claude?","You can combine XML tags like <thinking> and <answer> with chain of thought reasoning, where Claude explains its step-by-step reasoning process, to create structured, high-performance prompts. For example, you can prompt Claude to show its reasoning by including ""Before answering, explain your reasoning step-by-step in <thinking> tags."" in the user message or system prompt.","python:file://eval_end_to_end.py"
@@ -23,32 +23,32 @@ query,correct_answer,__expected
|
||||
"When using tools just to get Claude to produce JSON output following a particular schema, what key things should you do in terms of tool setup and prompting?","When using tools to get JSON output, you should provide a single tool, set the tool_choice to explicitly instruct the model to use that tool, and ensure the tool name and description are from the model's perspective since it will pass the input to the tool.","python:file://eval_end_to_end.py"
|
||||
"What are the key differences between the legacy Claude Instant 1.2 model and the Claude 3 Haiku model in terms of capabilities and performance?","The Claude 3 Haiku model has vision capabilities, is faster, more performant, and more intelligent than the legacy Claude Instant 1.2 model. Claude 3 Haiku also has more up-to-date training data.","python:file://eval_end_to_end.py"
|
||||
"What is one key benefit of using examples when prompt engineering with Claude?","One key benefit of using examples in prompts is that they reduce misinterpretation of instructions, leading to more accurate outputs from Claude.","python:file://eval_end_to_end.py"
|
||||
"According to the Anthropic documentation, what is one key advantage of using prompt engineering instead of fine-tuning when it comes to adapting an AI model to new domains or tasks?","Prompt engineering allows you to easily adapt AI models to new domains by providing domain-specific context directly in the prompts, without needing to retrain the model through fine-tuning.","python:file://eval_end_to_end.py"
|
||||
"According to the Claude Documentation, what is one key advantage of using prompt engineering instead of fine-tuning when it comes to adapting an AI model to new domains or tasks?","Prompt engineering allows you to easily adapt AI models to new domains by providing domain-specific context directly in the prompts, without needing to retrain the model through fine-tuning.","python:file://eval_end_to_end.py"
|
||||
"How can I quickly get started using the Claude for Sheets extension with a pre-made template?","You can make a copy of Anthropic's provided Claude for Sheets workbook template to quickly get started using the extension with your own work.","python:file://eval_end_to_end.py"
|
||||
"How does the ""index"" field in the ""content_block_delta"" event relate to the text being streamed in a response?","The ""index"" field in each ""content_block_delta"" event indicates which content block the text delta applies to. Multiple deltas with the same index consecutively stream the text for a single content block in the response.","python:file://eval_end_to_end.py"
"How can you include an image as part of a Claude API request, and what image formats are currently supported?","To include an image in a Claude API request, provide it as a base64-encoded image in an ""image"" content block within the ""messages"" array. The currently supported image formats are JPEG, PNG, GIF, and WebP.","python:file://eval_end_to_end.py"
"What is the relationship between time to first token (TTFT) and latency when evaluating a language model's performance?","TTFT is a specific measure of latency that captures the time it takes for a language model to generate the first token of its response after receiving a prompt. It is an important component of a model's overall latency and responsiveness, especially for interactive applications.","python:file://eval_end_to_end.py"
"How can providing Claude with examples of handling certain edge cases like implicit requests or emotional prioritization help improve its performance in routing support tickets?","Providing edge case examples to Claude in the prompt can meaningfully improve its performance in correctly routing support tickets in scenarios where it may otherwise misclassify them, such as implicit requests, emotional prioritization, ambiguous intent vs. routing, or issue prioritization.","python:file://eval_end_to_end.py"
"How does the stop_reason of ""tool_use"" relate to the overall workflow of integrating external tools with Claude?","When Claude determines that one of the user-provided tools can help answer the user's query, it constructs a tool use request. This causes the API response to have a stop_reason of ""tool_use"", signaling Claude's intent to use the tool. The user must then extract the tool input from Claude's request, run the actual tool code client-side, and continue the conversation by sending the tool results back to Claude.","python:file://eval_end_to_end.py"
"According to the documentation, what error event and corresponding HTTP error code may be sent during periods of high usage for the Anthropic API when using streaming responses?","During periods of high usage, an overloaded_error event may be sent in the event stream, which would normally correspond to an HTTP 529 error code in a non-streaming context.","python:file://eval_end_to_end.py"
"What are the two types of deltas that can be contained in a content_block_delta event when streaming responses from the Anthropic API?","The two types of deltas that can be contained in a content_block_delta event are text_delta and input_json_delta.","python:file://eval_end_to_end.py"
"On what date did Claude 3.5 Sonnet and tool use both become generally available across the Anthropic API, Amazon Bedrock, and Google Vertex AI?","Claude 3.5 Sonnet became generally available across those platforms on June 20th, 2024, while tool use became generally available on May 30th, 2024.","python:file://eval_end_to_end.py"
"According to the documentation, what error event and corresponding HTTP error code may be sent during periods of high usage for the Claude API when using streaming responses?","During periods of high usage, an overloaded_error event may be sent in the event stream, which would normally correspond to an HTTP 529 error code in a non-streaming context.","python:file://eval_end_to_end.py"
"What are the two types of deltas that can be contained in a content_block_delta event when streaming responses from the Claude API?","The two types of deltas that can be contained in a content_block_delta event are text_delta and input_json_delta.","python:file://eval_end_to_end.py"
"On what date did Claude 3.5 Sonnet and tool use both become generally available across the Claude API, Amazon Bedrock, and Google Vertex AI?","Claude 3.5 Sonnet became generally available across those platforms on June 20th, 2024, while tool use became generally available on May 30th, 2024.","python:file://eval_end_to_end.py"
"In what order did Anthropic launch Claude.ai and the Claude iOS app in Canada and Europe?","Anthropic launched Claude.ai and the Claude iOS app in Europe in May 2024, and then launched them in Canada the following month in June 2024.","python:file://eval_end_to_end.py"
"When the API response from Claude has a stop_reason of ""tool_use"", what does this indicate and what should be done next to continue the conversation?","A stop_reason of ""tool_use"" signals that Claude has decided to use a tool and has constructed a formatted tool use request. To continue the conversation, the tool name and input should be extracted from Claude's request, the actual tool code should be executed client-side, and then a new user message containing a tool_result content block should be sent to Claude.","python:file://eval_end_to_end.py"
"What Python libraries are used in the example code snippet for evaluating tone and style in a customer service chatbot?","The example code snippet for evaluating tone and style in a customer service chatbot uses the anthropic Python library to interact with the Claude AI model.","python:file://eval_end_to_end.py"
"What are the two main ways to authenticate when using the Anthropic Python SDK to access Claude models on Amazon Bedrock?","The two main ways to authenticate are: 1) Directly providing the aws_access_key, aws_secret_key, and optionally aws_session_token, or 2) Using the default AWS credential providers, such as the ~/.aws/credentials file or the AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID environment variables.","python:file://eval_end_to_end.py"
"When deciding whether to implement leak-resistant prompt engineering strategies, what two factors should be considered and balanced?","When deciding to use leak-resistant prompt engineering, the potential reduction in prompt leaks should be balanced against the risk of degraded model performance due to the added complexity of the prompt.","python:file://eval_end_to_end.py"
"How can selecting the appropriate Claude model based on your specific requirements help reduce latency in your application?","Choosing the right Claude model that best fits your needs in terms of speed and output quality is one of the most straightforward ways to reduce latency in your application. Anthropic offers a range of Claude models with different capabilities and performance characteristics to allow you to choose the optimal balance of intelligence, speed, and cost for your use case.","python:file://eval_end_to_end.py"
"How can you stream responses from the Anthropic API using the Python SDK?","You can stream responses from the Anthropic API using the Python SDK by using the client.messages.stream() method and iterating over the stream.text_stream attribute in a for loop.","python:file://eval_end_to_end.py"
"How can you stream responses from the Claude API using the Python SDK?","You can stream responses from the Claude API using the Python SDK by using the client.messages.stream() method and iterating over the stream.text_stream attribute in a for loop.","python:file://eval_end_to_end.py"
"How can you guide Claude's response by pre-filling part of the response, and what API parameter is used to generate a short response in this case?","You can shape Claude's response by pre-filling part of it in the last position of the input messages list. To get a short response like a single multiple choice answer, you can set the ""max_tokens"" parameter to a small value like 1.","python:file://eval_end_to_end.py"
"What is more important when building an eval set for an AI system - having a larger number of test cases with automated grading, or having fewer high-quality test cases graded by humans?","When building an eval set, it is better to prioritize having a larger volume of test cases with slightly lower signal automated grading over having fewer questions with high-quality human hand-grading.","python:file://eval_end_to_end.py"
"What are the two required fields in a content_block_delta event for a text delta type?","The two required fields in a content_block_delta event for a text delta type are ""index"" and ""delta"", where the ""delta"" field contains a ""type"" of ""text_delta"" and the ""text"" being added.","python:file://eval_end_to_end.py"
"What are two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings?","The Anthropic Cookbook provides interactive Jupyter notebooks demonstrating how to upload PDFs, generate embeddings, and more. The Developer Console offers a prompt generator tool for easier, more powerful prompting.","python:file://eval_end_to_end.py"
"What are two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings?","The Claude Cookbook provides interactive Jupyter notebooks demonstrating how to upload PDFs, generate embeddings, and more. The Developer Console offers a prompt generator tool for easier, more powerful prompting.","python:file://eval_end_to_end.py"
"Why does breaking a task into distinct subtasks for chained prompts help improve Claude's accuracy on the overall task?","Breaking a task into distinct subtasks for chained prompts improves Claude's accuracy because each subtask gets Claude's full attention, reducing errors compared to tackling the entire complex task at once.","python:file://eval_end_to_end.py"
"How does the streaming format for Messages responses differ from Text Completions streaming responses?","Messages streaming responses can contain multiple content blocks of varying types, making the streaming format more complex compared to Text Completions which only include completion, ping, and error server-sent-events.","python:file://eval_end_to_end.py"
"What are two ways to start experimenting with Claude as a user, according to Anthropic's documentation?","According to the documentation, users can start experimenting with Claude by visiting claude.ai or using Anthropic's web Console.","python:file://eval_end_to_end.py"
"How can using chain prompts help reduce errors and inconsistency in complex tasks handled by Claude?","Chain prompts break complex tasks into smaller subtasks, allowing Claude to give its full attention to each one. This reduces errors and inconsistencies that may occur when trying to handle a complex workflow all at once.","python:file://eval_end_to_end.py"
"What HTTP status code does an overloaded_error event correspond to in a non-streaming context for the Anthropic API?","In a non-streaming context, an overloaded_error event would normally correspond to an HTTP 529 status code.","python:file://eval_end_to_end.py"
"What HTTP status code does an overloaded_error event correspond to in a non-streaming context for the Claude API?","In a non-streaming context, an overloaded_error event would normally correspond to an HTTP 529 status code.","python:file://eval_end_to_end.py"
"What are the two ways to specify the format in which Voyage AI returns embeddings through its HTTP API?","When making a request to Voyage AI's embedding endpoint, you can either leave the encoding_format parameter unspecified to get the embeddings as lists of floating-point numbers, or set encoding_format to ""base64"" to get the embeddings compressed to Base64 encodings.","python:file://eval_end_to_end.py"
"When streaming API requests that use tools, how are the input JSON deltas for tool_use content blocks sent, and how can they be accumulated and parsed by the client?","When streaming requests with tool use, the input JSON deltas for tool_use content blocks are sent as partial JSON strings in multiple content_block_delta events. The client can accumulate these partial JSON strings and parse the complete JSON object once a content_block_stop event is received, using a library like Pydantic for partial JSON parsing or helpers provided in Anthropic's SDKs.","python:file://eval_end_to_end.py"
"What are the two interactive prompt engineering tutorials that Anthropic offers, and how do they differ?","Anthropic offers a GitHub prompting tutorial that covers prompt engineering concepts in-depth with examples, and a lighter-weight Google Sheets prompting tutorial that utilizes Claude for Sheets.","python:file://eval_end_to_end.py"
@@ -62,7 +62,7 @@ query,correct_answer,__expected
"What are the two types of content block deltas that can be emitted when streaming responses with tool use, and what does each delta type contain?","When streaming responses with tool use, the two types of content block deltas are text deltas and input JSON deltas. Text deltas contain a ""text"" field with a string of the incrementally generated text. Input JSON deltas contain a ""partial_json"" field with a string containing part of the JSON object specifying the tool's input.","python:file://eval_end_to_end.py"
"What are two key capabilities of Claude that enable it to build interactive systems and personalized user experiences?","Claude's question answering and text analysis capabilities enable it to build intelligent, interactive systems like chatbots and personalize user experiences by understanding sentiment and preferences.","python:file://eval_end_to_end.py"
"What are the key event types included in a raw HTTP stream response when using message streaming, and what is the typical order they occur in?","A raw HTTP stream response includes a message_start event, followed by one or more content blocks (each with a content_block_start, content_block_delta events, and content_block_stop), a message_delta event, and a final message_stop event. Ping events may also be dispersed throughout.","python:file://eval_end_to_end.py"
"What is the maximum number of images that can be included in a single request using the Anthropic API compared to the claude.ai interface?","The Messages API allows including up to 20 images per request, while the claude.ai interface has a lower limit of up to 5 images per turn.","python:file://eval_end_to_end.py"
"What is the maximum number of images that can be included in a single request using the Claude API compared to the claude.ai interface?","The Messages API allows including up to 20 images per request, while the claude.ai interface has a lower limit of up to 5 images per turn.","python:file://eval_end_to_end.py"
"When Claude's response is cut off due to hitting the max_tokens limit and contains an incomplete tool use block, what should you do to get the full tool use?","If Claude's response hits the max_tokens limit and has an incomplete tool use block, you should retry the request with a higher max_tokens value to get Claude's full response including the complete tool use.","python:file://eval_end_to_end.py"
"What two steps are needed before running a classification evaluation on Claude according to the documentation?","Before running a classification evaluation on Claude, you need to 1) develop your test cases, and 2) take a look at Anthropic's guide to developing test cases.","python:file://eval_end_to_end.py"
"How can you use the content parameter in the messages list to influence Claude's response?","You can provide content in the last position of the messages list, with the ""assistant"" role, to pre-fill part of Claude's response. This allows you to shape the assistant's output.","python:file://eval_end_to_end.py"
@@ -74,7 +74,7 @@ query,correct_answer,__expected
"What are the image file size limits when uploading images to Claude using the API versus on claude.ai?","When uploading images to Claude, the API has a maximum file size limit of 5MB per image, while on claude.ai the limit is 10MB per image.","python:file://eval_end_to_end.py"
"What is one key consideration when selecting a Claude model for an enterprise use case that needs low latency?","When selecting a Claude model for an enterprise use case that requires low latency, it's important to choose the model that best balances speed and output quality based on the specific requirements of the use case.","python:file://eval_end_to_end.py"
"What embedding model does Anthropic recommend for code retrieval, and how does its performance compare to alternatives according to Voyage AI?","For code retrieval, Voyage AI recommends using the voyage-code-2 embedding model, which they claim performs 17% better than alternatives and achieves state-of-the-art results on general-purpose corpora as well.","python:file://eval_end_to_end.py"
"What are two ways the Anthropic Cookbook can help developers learn to use Anthropic's APIs?","The Anthropic Cookbook provides interactive Jupyter notebooks that demonstrate how to upload PDFs and work with embeddings to help developers learn to use Anthropic's APIs.","python:file://eval_end_to_end.py"
"What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?","The Claude Cookbook provides interactive Jupyter notebooks that demonstrate how to upload PDFs and work with embeddings to help developers learn to use Anthropic's APIs.","python:file://eval_end_to_end.py"
"How does the size of the context window impact a language model's ability to utilize retrieval augmented generation (RAG)?","The size of the context window determines how much retrieved information can be passed to the language model to augment its knowledge when generating a response using RAG. A larger context window allows more relevant retrieved information to be utilized by the model, improving the accuracy and groundedness of the generated text.","python:file://eval_end_to_end.py"
"How can the Evaluation tool in Anthropic's Claude platform help improve prompts and build more robust AI applications?","The Evaluation tool helps identify edge cases where prompts might falter, allows rating individual results to determine prompt performance, ensures consistent performance across inputs, and enables prompt refinement for better reliability. Reviewing results across test cases helps spot patterns to make informed adjustments that lead to more robust AI applications.","python:file://eval_end_to_end.py"
"Which Claude model has the fastest comparative latency according to the comparison tables?","The Claude 3 Haiku model has the fastest comparative latency","python:file://eval_end_to_end.py"
@@ -94,8 +94,8 @@ query,correct_answer,__expected
"What are two ways to specify API parameters when calling the Claude API using Claude for Sheets?","When calling the Claude API using Claude for Sheets, you can specify API parameters in two ways: 1) As additional arguments after the prompt and model in the CLAUDE() function, like =CLAUDE(prompt, model, ""max_tokens"", 3). 2) By passing in an API key to be used just for a specific cell, like ""api_key"", ""sk-ant-api03-j1W...""","python:file://eval_end_to_end.py"
"How does prefilling the response with an opening curly brace ({ ) affect Claude's output when extracting structured data from text?","Prefilling Claude's response with { causes it to skip the preamble explanation and directly output the extracted data as a JSON object, resulting in a more concise response that is easier for programs to parse without additional processing.","python:file://eval_end_to_end.py"
"What are some helpful resources provided by Anthropic to dive deeper into building with images using Claude?","Anthropic provides a multimodal cookbook with tips on getting started with images and best practices, as well as API reference documentation for the Messages API that includes example API calls involving images.","python:file://eval_end_to_end.py"
"How do you specify the API key when creating a new Anthropic client in the Python and TypeScript SDK examples?","In both the Python and TypeScript examples, you can specify the API key as a string parameter when creating a new Anthropic client object. If no API key is provided, it defaults to using the ANTHROPIC_API_KEY environment variable.","python:file://eval_end_to_end.py"
"How do you specify the API key when creating a new Anthropic client in the Python and TypeScript SDK examples?","In both the Python and TypeScript examples, you can specify the API key as a string parameter when creating a new Anthropic client object. If no API key is provided, it defaults to using the CLAUDE_API_KEY environment variable.","python:file://eval_end_to_end.py"
"What are two key benefits of using the Anthropic Evaluation tool when developing prompts for an AI classification application?","The Evaluation tool helps identify edge cases where the prompt might falter, and ensures consistent performance across a range of test case inputs. This allows you to refine the prompt for better reliability in the AI classification application.","python:file://eval_end_to_end.py"
"What are the key differences between a pretrained language model like Claude's underlying model, and the final version of Claude available through Anthropic's API?","The pretrained language model that forms Claude's foundation is not inherently good at answering questions or following instructions. To create the helpful, honest and safe Claude assistant available through the API, the pretrained model underwent fine-tuning and reinforcement learning from human feedback (RLHF).","python:file://eval_end_to_end.py"
"What is the IPv6 address range used by Anthropic?","The IPv6 address range used by Anthropic is 2607:6bc0::/48.","python:file://eval_end_to_end.py"
"When using the Python SDK to create a message with Claude, what are two ways you can specify your API key?","When using the Python SDK, you can specify your API key either by passing it as the api_key parameter when initializing the Anthropic client, or by setting it as an environment variable named ANTHROPIC_API_KEY which the client will use by default.","python:file://eval_end_to_end.py"
"When using the Python SDK to create a message with Claude, what are two ways you can specify your API key?","When using the Python SDK, you can specify your API key either by passing it as the api_key parameter when initializing the Anthropic client, or by setting it as an environment variable named CLAUDE_API_KEY which the client will use by default.","python:file://eval_end_to_end.py"
@@ -1,101 +1,101 @@
query,correct_chunks,__expected
"How can you create multiple test cases for an evaluation in the Anthropic Evaluation tool?","[""https://docs.anthropic.com/en/docs/test-and-evaluate/eval-tool#creating-test-cases"",""https://docs.anthropic.com/en/docs/build-with-claude/develop-tests#building-evals-and-test-cases""]","python:file://eval_retrieval.py"
"What embeddings provider does Anthropic recommend for customized domain-specific models, and what capabilities does this provider offer?","[""https://docs.anthropic.com/en/docs/build-with-claude/embeddings#before-implementing-embeddings"",""https://docs.anthropic.com/en/docs/build-with-claude/embeddings#how-to-get-embeddings-with-anthropic""]","python:file://eval_retrieval.py"
"What are some key success metrics to consider when evaluating Claude's performance on a classification task, and how do they relate to choosing the right model to reduce latency?","[""https://docs.anthropic.com/en/docs/about-claude/use-cases/classification#evaluation-metrics"",""https://docs.anthropic.com/en/docs/test-and-evaluate/strengthen-guardrails/reduce-latency#1-choose-the-right-model""]","python:file://eval_retrieval.py"
"What are two ways that Claude for Sheets can improve prompt engineering workflows compared to using chained prompts?","[""https://docs.anthropic.com/en/docs/build-with-claude/claude-for-sheets#why-use-claude-for-sheets"",""https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/chain-prompts#how-to-chain-prompts""]","python:file://eval_retrieval.py"
"What happens if a prompt for the Text Completions API is missing the ""\n\nHuman:"" and ""\n\nAssistant:"" turns?","[""https://docs.anthropic.com/en/api/migrating-from-text-completions-to-messages#system-prompt"",""https://docs.anthropic.com/en/api/prompt-validation#examples""]","python:file://eval_retrieval.py"
"How do the additional tokens required for tool use in Claude API requests impact pricing compared to regular API requests?","[""https://docs.anthropic.com/en/docs/build-with-claude/tool-use#pricing"",""https://docs.anthropic.com/en/docs/build-with-claude/tool-use#how-tool-use-works""]","python:file://eval_retrieval.py"
"When will the new Anthropic Developer Console features that show API usage, billing details, and rate limits be available?","[""https://docs.anthropic.com/en/release-notes/api#june-27th-2024""]","python:file://eval_retrieval.py"
"When deciding whether to use chain-of-thought (CoT) for a task, what are two key factors to consider in order to strike the right balance between performance and latency?","[""https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/chain-of-thought#why-not-let-claude-think"",""https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/chain-of-thought#before-implementing-cot""]","python:file://eval_retrieval.py"
"How can I use Claude to more easily digest the content of long PDF documents?","[""https://docs.anthropic.com/en/docs/build-with-claude/text-generation#anthropic-cookbook"",""https://docs.anthropic.com/en/docs/build-with-claude/vision#before-you-upload""]","python:file://eval_retrieval.py"
"According to the documentation, where can you view your organization's current API rate limits in the Anthropic Console?","[""https://docs.anthropic.com/en/api/rate-limits#about-our-limits"",""https://docs.anthropic.com/en/release-notes/api#june-27th-2024""]","python:file://eval_retrieval.py"
"How can we measure the performance of the ticket classification system implemented using Claude beyond just accuracy?","[""https://docs.anthropic.com/en/docs/about-claude/use-cases/ticket-routing#evaluation-methodology"",""https://docs.anthropic.com/en/docs/about-claude/use-cases/ticket-routing#prompting-claude-for-ticket-routing""]","python:file://eval_retrieval.py"
"How can you specify a system prompt using the Text Completions API versus the Messages API?","[""https://docs.anthropic.com/en/api/prompt-validation#examples"",""https://docs.anthropic.com/en/api/migrating-from-text-completions-to-messages#system-prompt""]","python:file://eval_retrieval.py"
"How can you combine XML tags with chain of thought reasoning to create high-performance prompts for Claude?","[""https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/use-xml-tags#tagging-best-practices"",""https://docs.anthropic.com/en/docs/build-with-claude/tool-use#chain-of-thought""]","python:file://eval_retrieval.py"
"When evaluating the Claude model's performance for ticket routing, what three key metrics are calculated and what are the results for the claude-3-haiku-20240307 model on the 91 test samples?","[""https://docs.anthropic.com/en/docs/about-claude/use-cases/ticket-routing#evaluation-methodology"",""https://docs.anthropic.com/en/docs/about-claude/use-cases/ticket-routing#example-data""]","python:file://eval_retrieval.py"
"Before starting to engineer and improve a prompt in Claude, what key things does Anthropic recommend you have in place first?","[""https://docs.anthropic.com/en/docs/build-with-claude/define-success#next-steps"",""https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering#before-prompt-engineering""]","python:file://eval_retrieval.py"
"How does the Messages API handle mid-response prompting compared to the Text Completions API?","[""https://docs.anthropic.com/en/api/migrating-from-text-completions-to-messages#inputs-and-outputs"",""https://docs.anthropic.com/en/api/migrating-from-text-completions-to-messages#putting-words-in-claudes-mouth""]","python:file://eval_retrieval.py"
"How does Claude's response differ when given a role through a system prompt compared to not having a specific role in the financial analysis example?","[""https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/system-prompts#example-2-financial-analysis""]","python:file://eval_retrieval.py"
"What are some quantitative metrics that can be used to measure the success of a sentiment analysis model, and how might specific targets for those metrics be determined?","[""https://docs.anthropic.com/en/docs/build-with-claude/define-success#building-strong-criteria""]","python:file://eval_retrieval.py"
"What is a power user tip mentioned in the documentation for creating high-performance prompts using XML tags?","[""https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering#how-to-prompt-engineer"",""https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/use-xml-tags#tagging-best-practices""]","python:file://eval_retrieval.py"
"How can you use an LLM like Claude to automatically grade the outputs of other LLMs based on a rubric?","[""https://docs.anthropic.com/en/docs/build-with-claude/develop-tests#tips-for-llm-based-grading"",""https://docs.anthropic.com/en/api/messages-examples#multiple-conversational-turns""]","python:file://eval_retrieval.py"
"How can you access and deploy Voyage embeddings on AWS Marketplace?","[""https://docs.anthropic.com/en/docs/build-with-claude/embeddings#voyage-on-the-aws-marketplace""]","python:file://eval_retrieval.py"
"When using tools just to get Claude to produce JSON output following a particular schema, what key things should you do in terms of tool setup and prompting?","[""https://docs.anthropic.com/en/docs/build-with-claude/tool-use#tool-use-examples"",""https://docs.anthropic.com/en/docs/build-with-claude/tool-use#json-output""]","python:file://eval_retrieval.py"
"What are the key differences between the legacy Claude Instant 1.2 model and the Claude 3 Haiku model in terms of capabilities and performance?","[""https://docs.anthropic.com/en/docs/about-claude/models#legacy-model-comparison"",""https://docs.anthropic.com/en/docs/about-claude/models#model-comparison"",""https://docs.anthropic.com/en/docs/about-claude/models#legacy-models""]","python:file://eval_retrieval.py"
"What is one key benefit of using examples when prompt engineering with Claude?","[""https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/multishot-prompting#why-use-examples""]","python:file://eval_retrieval.py"
"According to the Anthropic documentation, what is one key advantage of using prompt engineering instead of fine-tuning when it comes to adapting an AI model to new domains or tasks?","[""https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering#when-to-prompt-engineer"",""https://docs.anthropic.com/en/docs/resources/glossary#fine-tuning""]","python:file://eval_retrieval.py"
"How can I quickly get started using the Claude for Sheets extension with a pre-made template?","[""https://docs.anthropic.com/en/docs/build-with-claude/claude-for-sheets#claude-for-sheets-workbook-template"",""https://docs.anthropic.com/en/docs/build-with-claude/claude-for-sheets#get-started-with-claude-for-sheets""]","python:file://eval_retrieval.py"
"How does the ""index"" field in the ""content_block_delta"" event relate to the text being streamed in a response?","[""https://docs.anthropic.com/en/api/messages-streaming#basic-streaming-request"",""https://docs.anthropic.com/en/api/messages-streaming#text-delta""]","python:file://eval_retrieval.py"
"How can you include an image as part of a Claude API request, and what image formats are currently supported?","[""https://docs.anthropic.com/en/api/messages-examples#vision"",""https://docs.anthropic.com/en/docs/build-with-claude/vision#about-the-prompt-examples""]","python:file://eval_retrieval.py"
|
||||
"What is the relationship between time to first token (TTFT) and latency when evaluating a language model's performance?","[""https://docs.anthropic.com/en/docs/resources/glossary#ttft-time-to-first-token"",""https://docs.anthropic.com/en/docs/test-and-evaluate/strengthen-guardrails/reduce-latency#how-to-measure-latency"",""https://docs.anthropic.com/en/docs/resources/glossary#latency""]","python:file://eval_retrieval.py"
|
||||
"How can providing Claude with examples of handling certain edge cases like implicit requests or emotional prioritization help improve its performance in routing support tickets?","[""https://docs.anthropic.com/en/docs/about-claude/use-cases/ticket-routing#adapting-to-common-scenarios"",""https://docs.anthropic.com/en/docs/about-claude/use-cases/ticket-routing#prompting-claude-for-ticket-routing""]","python:file://eval_retrieval.py"
|
||||
"How does the stop_reason of ""tool_use"" relate to the overall workflow of integrating external tools with Claude?","[""https://docs.anthropic.com/en/api/messages-examples#tool-use-and-json-mode"",""https://docs.anthropic.com/en/docs/build-with-claude/tool-use#how-tool-use-works""]","python:file://eval_retrieval.py"
|
||||
"According to the documentation, what error event and corresponding HTTP error code may be sent during periods of high usage for the Anthropic API when using streaming responses?","[""https://docs.anthropic.com/en/api/messages-streaming#error-events"",""https://docs.anthropic.com/en/api/streaming#error-event-types"",""https://docs.anthropic.com/en/api/errors#http-errors""]","python:file://eval_retrieval.py"
|
||||
"What are the two types of deltas that can be contained in a content_block_delta event when streaming responses from the Anthropic API?","[""https://docs.anthropic.com/en/api/messages-streaming#text-delta"",""https://docs.anthropic.com/en/api/messages-streaming#delta-types""]","python:file://eval_retrieval.py"
|
||||
"On what date did Claude 3.5 Sonnet and tool use both become generally available across the Anthropic API, Amazon Bedrock, and Google Vertex AI?","[""https://docs.anthropic.com/en/release-notes/api#june-20th-2024"",""https://docs.anthropic.com/en/release-notes/api#may-30th-2024""]","python:file://eval_retrieval.py"
|
||||
"In what order did Anthropic launch Claude.ai and the Claude iOS app in Canada and Europe?","[""https://docs.anthropic.com/en/release-notes/claude-apps#june-5th-2024"",""https://docs.anthropic.com/en/release-notes/claude-apps#may-13th-2024""]","python:file://eval_retrieval.py"
|
||||
"When the API response from Claude has a stop_reason of ""tool_use"", what does this indicate and what should be done next to continue the conversation?","[""https://docs.anthropic.com/en/docs/build-with-claude/tool-use#json-output"",""https://docs.anthropic.com/en/docs/build-with-claude/tool-use#how-tool-use-works""]","python:file://eval_retrieval.py"
|
||||
"What Python libraries are used in the example code snippet for evaluating tone and style in a customer service chatbot?","[""https://docs.anthropic.com/en/docs/build-with-claude/develop-tests#example-evals""]","python:file://eval_retrieval.py"
|
||||
"What are the two main ways to authenticate when using the Anthropic Python SDK to access Claude models on Amazon Bedrock?","[""https://docs.anthropic.com/en/api/claude-on-amazon-bedrock#install-an-sdk-for-accessing-bedrock"",""https://docs.anthropic.com/en/api/claude-on-amazon-bedrock#making-requests""]","python:file://eval_retrieval.py"
|
||||
"When deciding whether to implement leak-resistant prompt engineering strategies, what two factors should be considered and balanced?","[""https://docs.anthropic.com/en/docs/test-and-evaluate/strengthen-guardrails/reduce-prompt-leak#strategies-to-reduce-prompt-leak"",""https://docs.anthropic.com/en/docs/test-and-evaluate/strengthen-guardrails/reduce-prompt-leak#before-you-try-to-reduce-prompt-leak""]","python:file://eval_retrieval.py"
|
||||
"How can selecting the appropriate Claude model based on your specific requirements help reduce latency in your application?","[""https://docs.anthropic.com/en/docs/test-and-evaluate/strengthen-guardrails/reduce-latency#1-choose-the-right-model"",""https://docs.anthropic.com/en/docs/intro-to-claude#model-options""]","python:file://eval_retrieval.py"
|
||||
"How can you stream responses from the Anthropic API using the Python SDK?","[""https://docs.anthropic.com/en/api/messages-streaming#streaming-with-sdks"",""https://docs.anthropic.com/en/api/client-sdks#python""]","python:file://eval_retrieval.py"
|
||||
"How can you guide Claude's response by pre-filling part of the response, and what API parameter is used to generate a short response in this case?","[""https://docs.anthropic.com/en/api/messages-examples#putting-words-in-claudes-mouth"",""https://docs.anthropic.com/en/api/messages-examples#basic-request-and-response""]","python:file://eval_retrieval.py"
|
||||
"What is more important when building an eval set for an AI system - having a larger number of test cases with automated grading, or having fewer high-quality test cases graded by humans?","[""https://docs.anthropic.com/en/docs/build-with-claude/develop-tests#eval-design-principles"",""https://docs.anthropic.com/en/docs/build-with-claude/develop-tests#building-evals-and-test-cases""]","python:file://eval_retrieval.py"
|
||||
"What are the two required fields in a content_block_delta event for a text delta type?","[""https://docs.anthropic.com/en/api/messages-streaming#delta-types"",""https://docs.anthropic.com/en/api/messages-streaming#text-delta""]","python:file://eval_retrieval.py"
|
||||
"What are two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings?","[""https://docs.anthropic.com/en/docs/quickstart#next-steps"",""https://docs.anthropic.com/en/docs/welcome#develop-with-claude""]","python:file://eval_retrieval.py"
|
||||
"Why does breaking a task into distinct subtasks for chained prompts help improve Claude's accuracy on the overall task?","[""https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/chain-prompts#how-to-chain-prompts"",""https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/chain-prompts#why-chain-prompts""]","python:file://eval_retrieval.py"
|
||||
"How does the streaming format for Messages responses differ from Text Completions streaming responses?","[""https://docs.anthropic.com/en/api/migrating-from-text-completions-to-messages#streaming-format""]","python:file://eval_retrieval.py"
|
||||
"What are two ways to start experimenting with Claude as a user, according to Anthropic's documentation?","[""https://docs.anthropic.com/en/docs/about-claude/models#get-started-with-claude""]","python:file://eval_retrieval.py"
|
||||
"How can using chain prompts help reduce errors and inconsistency in complex tasks handled by Claude?","[""https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/chain-prompts#why-chain-prompts"",""https://docs.anthropic.com/en/docs/test-and-evaluate/strengthen-guardrails/increase-consistency#chain-prompts-for-complex-tasks""]","python:file://eval_retrieval.py"
|
||||
"What HTTP status code does an overloaded_error event correspond to in a non-streaming context for the Anthropic API?","[""https://docs.anthropic.com/en/api/streaming#error-event-types"",""https://docs.anthropic.com/en/api/messages-streaming#error-events""]","python:file://eval_retrieval.py"
|
||||
"What are the two ways to specify the format in which Voyage AI returns embeddings through its HTTP API?","[""https://docs.anthropic.com/en/docs/build-with-claude/embeddings#voyage-http-api""]","python:file://eval_retrieval.py"
|
||||
"When streaming API requests that use tools, how are the input JSON deltas for tool_use content blocks sent, and how can they be accumulated and parsed by the client?","[""https://docs.anthropic.com/en/api/messages-streaming#input-json-delta"",""https://docs.anthropic.com/en/api/messages-streaming#streaming-request-with-tool-use""]","python:file://eval_retrieval.py"
|
||||
"What are the two interactive prompt engineering tutorials that Anthropic offers, and how do they differ?","[""https://docs.anthropic.com/en/docs/build-with-claude/claude-for-sheets#prompt-engineering-interactive-tutorial"",""https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering#prompt-engineering-tutorial""]","python:file://eval_retrieval.py"
|
||||
"What are some of the key capabilities that make Claude suitable for enterprise use cases requiring integration with specialized applications and processing of large volumes of sensitive data?","[""https://docs.anthropic.com/en/docs/intro-to-claude#enterprise-considerations""]","python:file://eval_retrieval.py"
|
||||
"As of June 2024, in which regions are Anthropic's Claude.ai API and iOS app available?","[""https://docs.anthropic.com/en/release-notes/claude-apps#may-1st-2024"",""https://docs.anthropic.com/en/release-notes/claude-apps#june-5th-2024"",""https://docs.anthropic.com/en/release-notes/claude-apps#may-13th-2024""]","python:file://eval_retrieval.py"
|
||||
"What are the two main approaches for integrating Claude into a support ticket workflow, and how do they differ in terms of scalability and ease of implementation?","[""https://docs.anthropic.com/en/docs/about-claude/use-cases/ticket-routing#integrate-claude-into-your-support-workflow"",""https://docs.anthropic.com/en/docs/about-claude/use-cases/ticket-routing#introduction""]","python:file://eval_retrieval.py"
|
||||
"When did Anthropic release a prompt generator tool to help guide Claude in generating high-quality prompts, and through what interface is it available?","[""https://docs.anthropic.com/en/release-notes/api#may-10th-2024""]","python:file://eval_retrieval.py"
|
||||
"Which Claude 3 model provides the best balance of intelligence and speed for high-throughput tasks like sales forecasting and targeted marketing?","[""https://docs.anthropic.com/en/api/claude-on-vertex-ai#api-model-names"",""https://docs.anthropic.com/en/docs/intro-to-claude#claude-3-family""]","python:file://eval_retrieval.py"
|
||||
"How can you calculate the similarity between two Voyage embedding vectors, and what is this equivalent to since Voyage embeddings are normalized to length 1?","[""https://docs.anthropic.com/en/docs/build-with-claude/embeddings#faq"",""https://docs.anthropic.com/en/docs/build-with-claude/embeddings#voyage-embedding-example""]","python:file://eval_retrieval.py"
|
||||
"How can using examples in prompts improve Claude's performance on complex tasks?","[""https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/multishot-prompting#why-use-examples"",""https://docs.anthropic.com/en/docs/test-and-evaluate/strengthen-guardrails/increase-consistency#chain-prompts-for-complex-tasks""]","python:file://eval_retrieval.py"
|
||||
"What are the two types of content block deltas that can be emitted when streaming responses with tool use, and what does each delta type contain?","[""https://docs.anthropic.com/en/api/messages-streaming#input-json-delta"",""https://docs.anthropic.com/en/api/messages-streaming#text-delta"",""https://docs.anthropic.com/en/api/messages-streaming#streaming-request-with-tool-use"",""https://docs.anthropic.com/en/api/messages-streaming#delta-types""]","python:file://eval_retrieval.py"
|
||||
"What are two key capabilities of Claude that enable it to build interactive systems and personalized user experiences?","[""https://docs.anthropic.com/en/docs/build-with-claude/text-generation#text-capabilities-and-use-cases""]","python:file://eval_retrieval.py"
|
||||
"What are the key event types included in a raw HTTP stream response when using message streaming, and what is the typical order they occur in?","[""https://docs.anthropic.com/en/api/messages-streaming#event-types"",""https://docs.anthropic.com/en/api/messages-streaming#raw-http-stream-response""]","python:file://eval_retrieval.py"
|
||||
"What is the maximum number of images that can be included in a single request using the Anthropic API compared to the claude.ai interface?","[""https://docs.anthropic.com/en/docs/build-with-claude/vision#about-the-prompt-examples"",""https://docs.anthropic.com/en/docs/build-with-claude/vision#faq""]","python:file://eval_retrieval.py"
|
||||
"When Claude's response is cut off due to hitting the max_tokens limit and contains an incomplete tool use block, what should you do to get the full tool use?","[""https://docs.anthropic.com/en/docs/build-with-claude/tool-use#troubleshooting-errors""]","python:file://eval_retrieval.py"
|
||||
"What two steps are needed before running a classification evaluation on Claude according to the documentation?","[""https://docs.anthropic.com/en/docs/about-claude/use-cases/classification#3-run-your-eval"",""https://docs.anthropic.com/en/docs/about-claude/use-cases/classification#2-develop-your-test-cases""]","python:file://eval_retrieval.py"
|
||||
"How can you use the content parameter in the messages list to influence Claude's response?","[""https://docs.anthropic.com/en/api/messages-examples#basic-request-and-response"",""https://docs.anthropic.com/en/api/messages-examples#putting-words-in-claudes-mouth""]","python:file://eval_retrieval.py"
|
||||
"What are two key advantages of prompt engineering over fine-tuning when it comes to model comprehension and general knowledge preservation?","[""https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering#when-to-prompt-engineer"",""https://docs.anthropic.com/en/docs/resources/glossary#fine-tuning""]","python:file://eval_retrieval.py"
|
||||
"What are the two main steps to get started with making requests to Claude models on Anthropic's Bedrock API?","[""https://docs.anthropic.com/en/api/claude-on-amazon-bedrock#install-and-configure-the-aws-cli"",""https://docs.anthropic.com/en/api/claude-on-amazon-bedrock#making-requests""]","python:file://eval_retrieval.py"
|
||||
"How can you check which Claude models are available in a specific AWS region using the AWS CLI?","[""https://docs.anthropic.com/en/api/claude-on-amazon-bedrock#subscribe-to-anthropic-models"",""https://docs.anthropic.com/en/api/claude-on-amazon-bedrock#list-available-models""]","python:file://eval_retrieval.py"
|
||||
"What argument can be passed to the voyageai.Client.embed() method or the Voyage HTTP API to specify whether the input text is a query or a document?","[""https://docs.anthropic.com/en/docs/build-with-claude/embeddings#voyage-python-package"",""https://docs.anthropic.com/en/docs/build-with-claude/embeddings#voyage-http-api""]","python:file://eval_retrieval.py"
|
||||
"How do the streaming API delta formats differ between tool_use content blocks and text content blocks?","[""https://docs.anthropic.com/en/api/messages-streaming#input-json-delta"",""https://docs.anthropic.com/en/api/messages-streaming#text-delta""]","python:file://eval_retrieval.py"
|
||||
"What are the image file size limits when uploading images to Claude using the API versus on claude.ai?","[""https://docs.anthropic.com/en/docs/build-with-claude/vision#faq""]","python:file://eval_retrieval.py"
|
||||
"What is one key consideration when selecting a Claude model for an enterprise use case that needs low latency?","[""https://docs.anthropic.com/en/docs/intro-to-claude#model-options"",""https://docs.anthropic.com/en/docs/test-and-evaluate/strengthen-guardrails/reduce-latency#1-choose-the-right-model""]","python:file://eval_retrieval.py"
|
||||
"What embedding model does Anthropic recommend for code retrieval, and how does its performance compare to alternatives according to Voyage AI?","[""https://docs.anthropic.com/en/docs/build-with-claude/embeddings#how-to-get-embeddings-with-anthropic"",""https://docs.anthropic.com/en/docs/build-with-claude/embeddings#available-voyage-models""]","python:file://eval_retrieval.py"
|
||||
"What are two ways the Anthropic Cookbook can help developers learn to use Anthropic's APIs?","[""https://docs.anthropic.com/en/docs/welcome#develop-with-claude"",""https://docs.anthropic.com/en/docs/quickstart#next-steps""]","python:file://eval_retrieval.py"
|
||||
"How does the size of the context window impact a language model's ability to utilize retrieval augmented generation (RAG)?","[""https://docs.anthropic.com/en/docs/resources/glossary#context-window"",""https://docs.anthropic.com/en/docs/resources/glossary#rag-retrieval-augmented-generation""]","python:file://eval_retrieval.py"
|
||||
"How can the Evaluation tool in Anthropic's Claude platform help improve prompts and build more robust AI applications?","[""https://docs.anthropic.com/en/docs/test-and-evaluate/eval-tool#understanding-results"",""https://docs.anthropic.com/en/docs/test-and-evaluate/eval-tool#creating-test-cases""]","python:file://eval_retrieval.py"
|
||||
"Which Claude model has the fastest comparative latency according to the comparison tables?","[""https://docs.anthropic.com/en/docs/about-claude/models#model-comparison"",""https://docs.anthropic.com/en/docs/about-claude/models#legacy-model-comparison""]","python:file://eval_retrieval.py"
|
||||
"How can you build up a conversation with multiple turns using the Anthropic Messages API in Python?","[""https://docs.anthropic.com/en/api/client-sdks#python"",""https://docs.anthropic.com/en/api/messages-examples#multiple-conversational-turns""]","python:file://eval_retrieval.py"
|
||||
"How can using XML tags to provide a specific role or context help improve Claude's analysis of a legal contract compared to not using a role prompt?","[""https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/use-xml-tags#examples"",""https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/system-prompts#example-1-legal-contract-analysis""]","python:file://eval_retrieval.py"
|
||||
"What are the key differences between how Claude 3 Opus and Claude 3 Sonnet handle missing information when making tool calls?","[""https://docs.anthropic.com/en/docs/build-with-claude/tool-use#chain-of-thought"",""https://docs.anthropic.com/en/docs/build-with-claude/tool-use#tool-use-examples""]","python:file://eval_retrieval.py"
|
||||
"What steps should be taken to ensure a reliable deployment of an automated ticket routing system using Claude into a production environment?","[""https://docs.anthropic.com/en/docs/about-claude/use-cases/ticket-routing#additional-considerations"",""https://docs.anthropic.com/en/docs/about-claude/use-cases/ticket-routing#integrate-claude-into-your-support-workflow""]","python:file://eval_retrieval.py"
|
||||
"How should you evaluate a model's performance on a ticket routing classifier?","[""https://docs.anthropic.com/en/docs/about-claude/use-cases/ticket-routing#evaluating-the-performance-of-your-ticket-routing-classifier"",""https://docs.anthropic.com/en/docs/about-claude/use-cases/ticket-routing#integrate-claude-into-your-support-workflow""]","python:file://eval_retrieval.py"
|
||||
"What two methods does Anthropic recommend for learning how to prompt engineer with Claude before diving into the techniques?","[""https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering#how-to-prompt-engineer"",""https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering#prompt-engineering-tutorial""]","python:file://eval_retrieval.py"
|
||||
"What are the key differences between a pretrained large language model and Claude in terms of their training and capabilities?","[""https://docs.anthropic.com/en/docs/resources/glossary#llm"",""https://docs.anthropic.com/en/docs/resources/glossary#pretraining""]","python:file://eval_retrieval.py"
|
||||
"What are some key advantages of using prompt engineering instead of fine-tuning to adapt a pretrained language model for a specific task or domain?","[""https://docs.anthropic.com/en/docs/resources/glossary#fine-tuning"",""https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering#when-to-prompt-engineer"",""https://docs.anthropic.com/en/docs/resources/glossary#pretraining""]","python:file://eval_retrieval.py"
|
||||
"How can you authenticate with GCP before running requests to access Claude models on Vertex AI?","[""https://docs.anthropic.com/en/api/claude-on-vertex-ai#making-requests"",""https://docs.anthropic.com/en/api/claude-on-vertex-ai#accessing-vertex-ai""]","python:file://eval_retrieval.py"
|
||||
"What new capabilities and features were introduced by Anthropic on May 10th, 2024 and how do they enable users to create and tailor prompts for specific tasks?","[""https://docs.anthropic.com/en/release-notes/api#may-10th-2024""]","python:file://eval_retrieval.py"
|
||||
"On what date did both the Claude 3.5 Sonnet model and the Artifacts feature in Claude.ai become available?","[""https://docs.anthropic.com/en/release-notes/api#june-20th-2024"",""https://docs.anthropic.com/en/release-notes/claude-apps#june-20th-2024""]","python:file://eval_retrieval.py"
|
||||
"When putting words in Claude's mouth to shape the response, what header and value can you use in the request to limit Claude's response to a single token?","[""https://docs.anthropic.com/en/api/messages-examples#basic-request-and-response"",""https://docs.anthropic.com/en/api/messages-examples#putting-words-in-claudes-mouth""]","python:file://eval_retrieval.py"
|
||||
"What does the temperature parameter do when working with large language models?","[""https://docs.anthropic.com/en/docs/resources/glossary#temperature"",""https://docs.anthropic.com/en/docs/test-and-evaluate/strengthen-guardrails/reduce-latency#2-optimize-prompt-and-output-length""]","python:file://eval_retrieval.py"
|
||||
"What are two ways to specify API parameters when calling the Claude API using Claude for Sheets?","[""https://docs.anthropic.com/en/docs/test-and-evaluate/eval-tool#tips-for-effective-evaluation"",""https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/prefill-claudes-response#how-to-prefill-claudes-response"",""https://docs.anthropic.com/en/docs/build-with-claude/claude-for-sheets#enter-your-first-prompt""]","python:file://eval_retrieval.py"
|
||||
"How does prefilling the response with an opening curly brace ({ ) affect Claude's output when extracting structured data from text?","[""https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/prefill-claudes-response#example-1-controlling-output-formatting-and-skipping-the-preamble""]","python:file://eval_retrieval.py"
|
||||
"What are some helpful resources provided by Anthropic to dive deeper into building with images using Claude?","[""https://docs.anthropic.com/en/docs/build-with-claude/vision#dive-deeper-into-vision"",""https://docs.anthropic.com/en/docs/build-with-claude/vision#about-the-prompt-examples""]","python:file://eval_retrieval.py"
|
||||
"How do you specify the API key when creating a new Anthropic client in the Python and TypeScript SDK examples?","[""https://docs.anthropic.com/en/api/client-sdks#typescript"",""https://docs.anthropic.com/en/api/client-sdks#python""]","python:file://eval_retrieval.py"
|
||||
"What are two key benefits of using the Anthropic Evaluation tool when developing prompts for an AI classification application?","[""https://docs.anthropic.com/en/docs/about-claude/use-cases/classification#2-develop-your-test-cases"",""https://docs.anthropic.com/en/docs/test-and-evaluate/eval-tool#understanding-results""]","python:file://eval_retrieval.py"
|
||||
"What are the key differences between a pretrained language model like Claude's underlying model, and the final version of Claude available through Anthropic's API?","[""https://docs.anthropic.com/en/docs/resources/glossary#pretraining"",""https://docs.anthropic.com/en/docs/resources/glossary#llm"",""https://docs.anthropic.com/en/docs/resources/glossary#fine-tuning""]","python:file://eval_retrieval.py"
|
||||
"What is the IPv6 address range used by Anthropic?","[""https://docs.anthropic.com/en/api/ip-addresses#ipv6""]","python:file://eval_retrieval.py"
|
||||
"When using the Python SDK to create a message with Claude, what are two ways you can specify your API key?","[""https://docs.anthropic.com/en/api/messages-examples#multiple-conversational-turns"",""https://docs.anthropic.com/en/api/client-sdks#python""]","python:file://eval_retrieval.py"
|
||||
"How can you create multiple test cases for an evaluation in the Anthropic Evaluation tool?","[""https://docs.claude.com/en/docs/test-and-evaluate/eval-tool#creating-test-cases"",""https://docs.claude.com/en/docs/build-with-claude/develop-tests#building-evals-and-test-cases""]","python:file://eval_retrieval.py"
"What embeddings provider does Anthropic recommend for customized domain-specific models, and what capabilities does this provider offer?","[""https://docs.claude.com/en/docs/build-with-claude/embeddings#before-implementing-embeddings"",""https://docs.claude.com/en/docs/build-with-claude/embeddings#how-to-get-embeddings-with-anthropic""]","python:file://eval_retrieval.py"
"What are some key success metrics to consider when evaluating Claude's performance on a classification task, and how do they relate to choosing the right model to reduce latency?","[""https://docs.claude.com/en/docs/about-claude/use-cases/classification#evaluation-metrics"",""https://docs.claude.com/en/docs/test-and-evaluate/strengthen-guardrails/reduce-latency#1-choose-the-right-model""]","python:file://eval_retrieval.py"
"What are two ways that Claude for Sheets can improve prompt engineering workflows compared to using chained prompts?","[""https://docs.claude.com/en/docs/build-with-claude/claude-for-sheets#why-use-claude-for-sheets"",""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/chain-prompts#how-to-chain-prompts""]","python:file://eval_retrieval.py"
"What happens if a prompt for the Text Completions API is missing the ""\n\nHuman:"" and ""\n\nAssistant:"" turns?","[""https://docs.claude.com/en/api/migrating-from-text-completions-to-messages#system-prompt"",""https://docs.claude.com/en/api/prompt-validation#examples""]","python:file://eval_retrieval.py"
"How do the additional tokens required for tool use in Claude API requests impact pricing compared to regular API requests?","[""https://docs.claude.com/en/docs/build-with-claude/tool-use#pricing"",""https://docs.claude.com/en/docs/build-with-claude/tool-use#how-tool-use-works""]","python:file://eval_retrieval.py"
"When will the new Anthropic Developer Console features that show API usage, billing details, and rate limits be available?","[""https://docs.claude.com/en/release-notes/api#june-27th-2024""]","python:file://eval_retrieval.py"
"When deciding whether to use chain-of-thought (CoT) for a task, what are two key factors to consider in order to strike the right balance between performance and latency?","[""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/chain-of-thought#why-not-let-claude-think"",""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/chain-of-thought#before-implementing-cot""]","python:file://eval_retrieval.py"
"How can I use Claude to more easily digest the content of long PDF documents?","[""https://docs.claude.com/en/docs/build-with-claude/text-generation#anthropic-cookbook"",""https://docs.claude.com/en/docs/build-with-claude/vision#before-you-upload""]","python:file://eval_retrieval.py"
"According to the documentation, where can you view your organization's current API rate limits in the Claude Console?","[""https://docs.claude.com/en/api/rate-limits#about-our-limits"",""https://docs.claude.com/en/release-notes/api#june-27th-2024""]","python:file://eval_retrieval.py"
"How can we measure the performance of the ticket classification system implemented using Claude beyond just accuracy?","[""https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#evaluation-methodology"",""https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#prompting-claude-for-ticket-routing""]","python:file://eval_retrieval.py"
"How can you specify a system prompt using the Text Completions API versus the Messages API?","[""https://docs.claude.com/en/api/prompt-validation#examples"",""https://docs.claude.com/en/api/migrating-from-text-completions-to-messages#system-prompt""]","python:file://eval_retrieval.py"
"How can you combine XML tags with chain of thought reasoning to create high-performance prompts for Claude?","[""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/use-xml-tags#tagging-best-practices"",""https://docs.claude.com/en/docs/build-with-claude/tool-use#chain-of-thought""]","python:file://eval_retrieval.py"
"When evaluating the Claude model's performance for ticket routing, what three key metrics are calculated and what are the results for the claude-3-haiku-20240307 model on the 91 test samples?","[""https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#evaluation-methodology"",""https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#example-data""]","python:file://eval_retrieval.py"
"Before starting to engineer and improve a prompt in Claude, what key things does Anthropic recommend you have in place first?","[""https://docs.claude.com/en/docs/build-with-claude/define-success#next-steps"",""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering#before-prompt-engineering""]","python:file://eval_retrieval.py"
"How does the Messages API handle mid-response prompting compared to the Text Completions API?","[""https://docs.claude.com/en/api/migrating-from-text-completions-to-messages#inputs-and-outputs"",""https://docs.claude.com/en/api/migrating-from-text-completions-to-messages#putting-words-in-claudes-mouth""]","python:file://eval_retrieval.py"
"How does Claude's response differ when given a role through a system prompt compared to not having a specific role in the financial analysis example?","[""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/system-prompts#example-2-financial-analysis""]","python:file://eval_retrieval.py"
"What are some quantitative metrics that can be used to measure the success of a sentiment analysis model, and how might specific targets for those metrics be determined?","[""https://docs.claude.com/en/docs/build-with-claude/define-success#building-strong-criteria""]","python:file://eval_retrieval.py"
"What is a power user tip mentioned in the documentation for creating high-performance prompts using XML tags?","[""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering#how-to-prompt-engineer"",""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/use-xml-tags#tagging-best-practices""]","python:file://eval_retrieval.py"
"How can you use an LLM like Claude to automatically grade the outputs of other LLMs based on a rubric?","[""https://docs.claude.com/en/docs/build-with-claude/develop-tests#tips-for-llm-based-grading"",""https://docs.claude.com/en/api/messages-examples#multiple-conversational-turns""]","python:file://eval_retrieval.py"
"How can you access and deploy Voyage embeddings on AWS Marketplace?","[""https://docs.claude.com/en/docs/build-with-claude/embeddings#voyage-on-the-aws-marketplace""]","python:file://eval_retrieval.py"
"When using tools just to get Claude to produce JSON output following a particular schema, what key things should you do in terms of tool setup and prompting?","[""https://docs.claude.com/en/docs/build-with-claude/tool-use#tool-use-examples"",""https://docs.claude.com/en/docs/build-with-claude/tool-use#json-output""]","python:file://eval_retrieval.py"
"What are the key differences between the legacy Claude Instant 1.2 model and the Claude 3 Haiku model in terms of capabilities and performance?","[""https://docs.claude.com/en/docs/about-claude/models#legacy-model-comparison"",""https://docs.claude.com/en/docs/about-claude/models#model-comparison"",""https://docs.claude.com/en/docs/about-claude/models#legacy-models""]","python:file://eval_retrieval.py"
|
||||
"What is one key benefit of using examples when prompt engineering with Claude?","[""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/multishot-prompting#why-use-examples""]","python:file://eval_retrieval.py"
|
||||
"According to the Claude Documentation, what is one key advantage of using prompt engineering instead of fine-tuning when it comes to adapting an AI model to new domains or tasks?","[""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering#when-to-prompt-engineer"",""https://docs.claude.com/en/docs/resources/glossary#fine-tuning""]","python:file://eval_retrieval.py"
|
||||
"How can I quickly get started using the Claude for Sheets extension with a pre-made template?","[""https://docs.claude.com/en/docs/build-with-claude/claude-for-sheets#claude-for-sheets-workbook-template"",""https://docs.claude.com/en/docs/build-with-claude/claude-for-sheets#get-started-with-claude-for-sheets""]","python:file://eval_retrieval.py"
|
||||
"How does the ""index"" field in the ""content_block_delta"" event relate to the text being streamed in a response?","[""https://docs.claude.com/en/api/messages-streaming#basic-streaming-request"",""https://docs.claude.com/en/api/messages-streaming#text-delta""]","python:file://eval_retrieval.py"
|
||||
"How can you include an image as part of a Claude API request, and what image formats are currently supported?","[""https://docs.claude.com/en/api/messages-examples#vision"",""https://docs.claude.com/en/docs/build-with-claude/vision#about-the-prompt-examples""]","python:file://eval_retrieval.py"
|
||||
"What is the relationship between time to first token (TTFT) and latency when evaluating a language model's performance?","[""https://docs.claude.com/en/docs/resources/glossary#ttft-time-to-first-token"",""https://docs.claude.com/en/docs/test-and-evaluate/strengthen-guardrails/reduce-latency#how-to-measure-latency"",""https://docs.claude.com/en/docs/resources/glossary#latency""]","python:file://eval_retrieval.py"
|
||||
"How can providing Claude with examples of handling certain edge cases like implicit requests or emotional prioritization help improve its performance in routing support tickets?","[""https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#adapting-to-common-scenarios"",""https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#prompting-claude-for-ticket-routing""]","python:file://eval_retrieval.py"
|
||||
"How does the stop_reason of ""tool_use"" relate to the overall workflow of integrating external tools with Claude?","[""https://docs.claude.com/en/api/messages-examples#tool-use-and-json-mode"",""https://docs.claude.com/en/docs/build-with-claude/tool-use#how-tool-use-works""]","python:file://eval_retrieval.py"
|
||||
"According to the documentation, what error event and corresponding HTTP error code may be sent during periods of high usage for the Claude API when using streaming responses?","[""https://docs.claude.com/en/api/messages-streaming#error-events"",""https://docs.claude.com/en/api/streaming#error-event-types"",""https://docs.claude.com/en/api/errors#http-errors""]","python:file://eval_retrieval.py"
|
||||
"What are the two types of deltas that can be contained in a content_block_delta event when streaming responses from the Claude API?","[""https://docs.claude.com/en/api/messages-streaming#text-delta"",""https://docs.claude.com/en/api/messages-streaming#delta-types""]","python:file://eval_retrieval.py"
|
||||
"On what date did Claude 3.5 Sonnet and tool use both become generally available across the Claude API, Amazon Bedrock, and Google Vertex AI?","[""https://docs.claude.com/en/release-notes/api#june-20th-2024"",""https://docs.claude.com/en/release-notes/api#may-30th-2024""]","python:file://eval_retrieval.py"
|
||||
"In what order did Anthropic launch Claude.ai and the Claude iOS app in Canada and Europe?","[""https://docs.claude.com/en/release-notes/claude-apps#june-5th-2024"",""https://docs.claude.com/en/release-notes/claude-apps#may-13th-2024""]","python:file://eval_retrieval.py"
|
||||
"When the API response from Claude has a stop_reason of ""tool_use"", what does this indicate and what should be done next to continue the conversation?","[""https://docs.claude.com/en/docs/build-with-claude/tool-use#json-output"",""https://docs.claude.com/en/docs/build-with-claude/tool-use#how-tool-use-works""]","python:file://eval_retrieval.py"
|
||||
"What Python libraries are used in the example code snippet for evaluating tone and style in a customer service chatbot?","[""https://docs.claude.com/en/docs/build-with-claude/develop-tests#example-evals""]","python:file://eval_retrieval.py"
|
||||
"What are the two main ways to authenticate when using the Anthropic Python SDK to access Claude models on Amazon Bedrock?","[""https://docs.claude.com/en/api/claude-on-amazon-bedrock#install-an-sdk-for-accessing-bedrock"",""https://docs.claude.com/en/api/claude-on-amazon-bedrock#making-requests""]","python:file://eval_retrieval.py"
|
||||
"When deciding whether to implement leak-resistant prompt engineering strategies, what two factors should be considered and balanced?","[""https://docs.claude.com/en/docs/test-and-evaluate/strengthen-guardrails/reduce-prompt-leak#strategies-to-reduce-prompt-leak"",""https://docs.claude.com/en/docs/test-and-evaluate/strengthen-guardrails/reduce-prompt-leak#before-you-try-to-reduce-prompt-leak""]","python:file://eval_retrieval.py"
|
||||
"How can selecting the appropriate Claude model based on your specific requirements help reduce latency in your application?","[""https://docs.claude.com/en/docs/test-and-evaluate/strengthen-guardrails/reduce-latency#1-choose-the-right-model"",""https://docs.claude.com/en/docs/intro-to-claude#model-options""]","python:file://eval_retrieval.py"
|
||||
"How can you stream responses from the Claude API using the Python SDK?","[""https://docs.claude.com/en/api/messages-streaming#streaming-with-sdks"",""https://docs.claude.com/en/api/client-sdks#python""]","python:file://eval_retrieval.py"
|
||||
"How can you guide Claude's response by pre-filling part of the response, and what API parameter is used to generate a short response in this case?","[""https://docs.claude.com/en/api/messages-examples#putting-words-in-claudes-mouth"",""https://docs.claude.com/en/api/messages-examples#basic-request-and-response""]","python:file://eval_retrieval.py"
|
||||
"What is more important when building an eval set for an AI system - having a larger number of test cases with automated grading, or having fewer high-quality test cases graded by humans?","[""https://docs.claude.com/en/docs/build-with-claude/develop-tests#eval-design-principles"",""https://docs.claude.com/en/docs/build-with-claude/develop-tests#building-evals-and-test-cases""]","python:file://eval_retrieval.py"
|
||||
"What are the two required fields in a content_block_delta event for a text delta type?","[""https://docs.claude.com/en/api/messages-streaming#delta-types"",""https://docs.claude.com/en/api/messages-streaming#text-delta""]","python:file://eval_retrieval.py"
|
||||
"What are two interactive ways to learn how to use Claude's capabilities, such as uploading PDFs and generating embeddings?","[""https://docs.claude.com/en/docs/quickstart#next-steps"",""https://docs.claude.com/en/docs/welcome#develop-with-claude""]","python:file://eval_retrieval.py"
|
||||
"Why does breaking a task into distinct subtasks for chained prompts help improve Claude's accuracy on the overall task?","[""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/chain-prompts#how-to-chain-prompts"",""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/chain-prompts#why-chain-prompts""]","python:file://eval_retrieval.py"
|
||||
"How does the streaming format for Messages responses differ from Text Completions streaming responses?","[""https://docs.claude.com/en/api/migrating-from-text-completions-to-messages#streaming-format""]","python:file://eval_retrieval.py"
|
||||
"What are two ways to start experimenting with Claude as a user, according to Anthropic's documentation?","[""https://docs.claude.com/en/docs/about-claude/models#get-started-with-claude""]","python:file://eval_retrieval.py"
|
||||
"How can using chain prompts help reduce errors and inconsistency in complex tasks handled by Claude?","[""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/chain-prompts#why-chain-prompts"",""https://docs.claude.com/en/docs/test-and-evaluate/strengthen-guardrails/increase-consistency#chain-prompts-for-complex-tasks""]","python:file://eval_retrieval.py"
|
||||
"What HTTP status code does an overloaded_error event correspond to in a non-streaming context for the Claude API?","[""https://docs.claude.com/en/api/streaming#error-event-types"",""https://docs.claude.com/en/api/messages-streaming#error-events""]","python:file://eval_retrieval.py"
|
||||
"What are the two ways to specify the format in which Voyage AI returns embeddings through its HTTP API?","[""https://docs.claude.com/en/docs/build-with-claude/embeddings#voyage-http-api""]","python:file://eval_retrieval.py"
|
||||
"When streaming API requests that use tools, how are the input JSON deltas for tool_use content blocks sent, and how can they be accumulated and parsed by the client?","[""https://docs.claude.com/en/api/messages-streaming#input-json-delta"",""https://docs.claude.com/en/api/messages-streaming#streaming-request-with-tool-use""]","python:file://eval_retrieval.py"
|
||||
"What are the two interactive prompt engineering tutorials that Anthropic offers, and how do they differ?","[""https://docs.claude.com/en/docs/build-with-claude/claude-for-sheets#prompt-engineering-interactive-tutorial"",""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering#prompt-engineering-tutorial""]","python:file://eval_retrieval.py"
|
||||
"What are some of the key capabilities that make Claude suitable for enterprise use cases requiring integration with specialized applications and processing of large volumes of sensitive data?","[""https://docs.claude.com/en/docs/intro-to-claude#enterprise-considerations""]","python:file://eval_retrieval.py"
|
||||
"As of June 2024, in which regions are Anthropic's Claude.ai API and iOS app available?","[""https://docs.claude.com/en/release-notes/claude-apps#may-1st-2024"",""https://docs.claude.com/en/release-notes/claude-apps#june-5th-2024"",""https://docs.claude.com/en/release-notes/claude-apps#may-13th-2024""]","python:file://eval_retrieval.py"
|
||||
"What are the two main approaches for integrating Claude into a support ticket workflow, and how do they differ in terms of scalability and ease of implementation?","[""https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#integrate-claude-into-your-support-workflow"",""https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#introduction""]","python:file://eval_retrieval.py"
|
||||
"When did Anthropic release a prompt generator tool to help guide Claude in generating high-quality prompts, and through what interface is it available?","[""https://docs.claude.com/en/release-notes/api#may-10th-2024""]","python:file://eval_retrieval.py"
|
||||
"Which Claude 3 model provides the best balance of intelligence and speed for high-throughput tasks like sales forecasting and targeted marketing?","[""https://docs.claude.com/en/api/claude-on-vertex-ai#api-model-names"",""https://docs.claude.com/en/docs/intro-to-claude#claude-3-family""]","python:file://eval_retrieval.py"
|
||||
"How can you calculate the similarity between two Voyage embedding vectors, and what is this equivalent to since Voyage embeddings are normalized to length 1?","[""https://docs.claude.com/en/docs/build-with-claude/embeddings#faq"",""https://docs.claude.com/en/docs/build-with-claude/embeddings#voyage-embedding-example""]","python:file://eval_retrieval.py"
|
||||
"How can using examples in prompts improve Claude's performance on complex tasks?","[""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/multishot-prompting#why-use-examples"",""https://docs.claude.com/en/docs/test-and-evaluate/strengthen-guardrails/increase-consistency#chain-prompts-for-complex-tasks""]","python:file://eval_retrieval.py"
|
||||
"What are the two types of content block deltas that can be emitted when streaming responses with tool use, and what does each delta type contain?","[""https://docs.claude.com/en/api/messages-streaming#input-json-delta"",""https://docs.claude.com/en/api/messages-streaming#text-delta"",""https://docs.claude.com/en/api/messages-streaming#streaming-request-with-tool-use"",""https://docs.claude.com/en/api/messages-streaming#delta-types""]","python:file://eval_retrieval.py"
|
||||
"What are two key capabilities of Claude that enable it to build interactive systems and personalized user experiences?","[""https://docs.claude.com/en/docs/build-with-claude/text-generation#text-capabilities-and-use-cases""]","python:file://eval_retrieval.py"
|
||||
"What are the key event types included in a raw HTTP stream response when using message streaming, and what is the typical order they occur in?","[""https://docs.claude.com/en/api/messages-streaming#event-types"",""https://docs.claude.com/en/api/messages-streaming#raw-http-stream-response""]","python:file://eval_retrieval.py"
|
||||
"What is the maximum number of images that can be included in a single request using the Claude API compared to the claude.ai interface?","[""https://docs.claude.com/en/docs/build-with-claude/vision#about-the-prompt-examples"",""https://docs.claude.com/en/docs/build-with-claude/vision#faq""]","python:file://eval_retrieval.py"
|
||||
"When Claude's response is cut off due to hitting the max_tokens limit and contains an incomplete tool use block, what should you do to get the full tool use?","[""https://docs.claude.com/en/docs/build-with-claude/tool-use#troubleshooting-errors""]","python:file://eval_retrieval.py"
|
||||
"What two steps are needed before running a classification evaluation on Claude according to the documentation?","[""https://docs.claude.com/en/docs/about-claude/use-cases/classification#3-run-your-eval"",""https://docs.claude.com/en/docs/about-claude/use-cases/classification#2-develop-your-test-cases""]","python:file://eval_retrieval.py"
|
||||
"How can you use the content parameter in the messages list to influence Claude's response?","[""https://docs.claude.com/en/api/messages-examples#basic-request-and-response"",""https://docs.claude.com/en/api/messages-examples#putting-words-in-claudes-mouth""]","python:file://eval_retrieval.py"
|
||||
"What are two key advantages of prompt engineering over fine-tuning when it comes to model comprehension and general knowledge preservation?","[""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering#when-to-prompt-engineer"",""https://docs.claude.com/en/docs/resources/glossary#fine-tuning""]","python:file://eval_retrieval.py"
|
||||
"What are the two main steps to get started with making requests to Claude models on Anthropic's Bedrock API?","[""https://docs.claude.com/en/api/claude-on-amazon-bedrock#install-and-configure-the-aws-cli"",""https://docs.claude.com/en/api/claude-on-amazon-bedrock#making-requests""]","python:file://eval_retrieval.py"
|
||||
"How can you check which Claude models are available in a specific AWS region using the AWS CLI?","[""https://docs.claude.com/en/api/claude-on-amazon-bedrock#subscribe-to-anthropic-models"",""https://docs.claude.com/en/api/claude-on-amazon-bedrock#list-available-models""]","python:file://eval_retrieval.py"
|
||||
"What argument can be passed to the voyageai.Client.embed() method or the Voyage HTTP API to specify whether the input text is a query or a document?","[""https://docs.claude.com/en/docs/build-with-claude/embeddings#voyage-python-package"",""https://docs.claude.com/en/docs/build-with-claude/embeddings#voyage-http-api""]","python:file://eval_retrieval.py"
|
||||
"How do the streaming API delta formats differ between tool_use content blocks and text content blocks?","[""https://docs.claude.com/en/api/messages-streaming#input-json-delta"",""https://docs.claude.com/en/api/messages-streaming#text-delta""]","python:file://eval_retrieval.py"
|
||||
"What are the image file size limits when uploading images to Claude using the API versus on claude.ai?","[""https://docs.claude.com/en/docs/build-with-claude/vision#faq""]","python:file://eval_retrieval.py"
|
||||
"What is one key consideration when selecting a Claude model for an enterprise use case that needs low latency?","[""https://docs.claude.com/en/docs/intro-to-claude#model-options"",""https://docs.claude.com/en/docs/test-and-evaluate/strengthen-guardrails/reduce-latency#1-choose-the-right-model""]","python:file://eval_retrieval.py"
|
||||
"What embedding model does Anthropic recommend for code retrieval, and how does its performance compare to alternatives according to Voyage AI?","[""https://docs.claude.com/en/docs/build-with-claude/embeddings#how-to-get-embeddings-with-anthropic"",""https://docs.claude.com/en/docs/build-with-claude/embeddings#available-voyage-models""]","python:file://eval_retrieval.py"
|
||||
"What are two ways the Claude Cookbook can help developers learn to use Anthropic's APIs?","[""https://docs.claude.com/en/docs/welcome#develop-with-claude"",""https://docs.claude.com/en/docs/quickstart#next-steps""]","python:file://eval_retrieval.py"
|
||||
"How does the size of the context window impact a language model's ability to utilize retrieval augmented generation (RAG)?","[""https://docs.claude.com/en/docs/resources/glossary#context-window"",""https://docs.claude.com/en/docs/resources/glossary#rag-retrieval-augmented-generation""]","python:file://eval_retrieval.py"
|
||||
"How can the Evaluation tool in Anthropic's Claude platform help improve prompts and build more robust AI applications?","[""https://docs.claude.com/en/docs/test-and-evaluate/eval-tool#understanding-results"",""https://docs.claude.com/en/docs/test-and-evaluate/eval-tool#creating-test-cases""]","python:file://eval_retrieval.py"
|
||||
"Which Claude model has the fastest comparative latency according to the comparison tables?","[""https://docs.claude.com/en/docs/about-claude/models#model-comparison"",""https://docs.claude.com/en/docs/about-claude/models#legacy-model-comparison""]","python:file://eval_retrieval.py"
|
||||
"How can you build up a conversation with multiple turns using the Anthropic Messages API in Python?","[""https://docs.claude.com/en/api/client-sdks#python"",""https://docs.claude.com/en/api/messages-examples#multiple-conversational-turns""]","python:file://eval_retrieval.py"
|
||||
"How can using XML tags to provide a specific role or context help improve Claude's analysis of a legal contract compared to not using a role prompt?","[""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/use-xml-tags#examples"",""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/system-prompts#example-1-legal-contract-analysis""]","python:file://eval_retrieval.py"
|
||||
"What are the key differences between how Claude 3 Opus and Claude 3 Sonnet handle missing information when making tool calls?","[""https://docs.claude.com/en/docs/build-with-claude/tool-use#chain-of-thought"",""https://docs.claude.com/en/docs/build-with-claude/tool-use#tool-use-examples""]","python:file://eval_retrieval.py"
|
||||
"What steps should be taken to ensure a reliable deployment of an automated ticket routing system using Claude into a production environment?","[""https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#additional-considerations"",""https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#integrate-claude-into-your-support-workflow""]","python:file://eval_retrieval.py"
|
||||
"How should you evaluate a model's performance on a ticket routing classifier?","[""https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#evaluating-the-performance-of-your-ticket-routing-classifier"",""https://docs.claude.com/en/docs/about-claude/use-cases/ticket-routing#integrate-claude-into-your-support-workflow""]","python:file://eval_retrieval.py"
|
||||
"What two methods does Anthropic recommend for learning how to prompt engineer with Claude before diving into the techniques?","[""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering#how-to-prompt-engineer"",""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering#prompt-engineering-tutorial""]","python:file://eval_retrieval.py"
|
||||
"What are the key differences between a pretrained large language model and Claude in terms of their training and capabilities?","[""https://docs.claude.com/en/docs/resources/glossary#llm"",""https://docs.claude.com/en/docs/resources/glossary#pretraining""]","python:file://eval_retrieval.py"
|
||||
"What are some key advantages of using prompt engineering instead of fine-tuning to adapt a pretrained language model for a specific task or domain?","[""https://docs.claude.com/en/docs/resources/glossary#fine-tuning"",""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering#when-to-prompt-engineer"",""https://docs.claude.com/en/docs/resources/glossary#pretraining""]","python:file://eval_retrieval.py"
|
||||
"How can you authenticate with GCP before running requests to access Claude models on Vertex AI?","[""https://docs.claude.com/en/api/claude-on-vertex-ai#making-requests"",""https://docs.claude.com/en/api/claude-on-vertex-ai#accessing-vertex-ai""]","python:file://eval_retrieval.py"
|
||||
"What new capabilities and features were introduced by Anthropic on May 10th, 2024 and how do they enable users to create and tailor prompts for specific tasks?","[""https://docs.claude.com/en/release-notes/api#may-10th-2024""]","python:file://eval_retrieval.py"
|
||||
"On what date did both the Claude 3.5 Sonnet model and the Artifacts feature in Claude.ai become available?","[""https://docs.claude.com/en/release-notes/api#june-20th-2024"",""https://docs.claude.com/en/release-notes/claude-apps#june-20th-2024""]","python:file://eval_retrieval.py"
|
||||
"When putting words in Claude's mouth to shape the response, what header and value can you use in the request to limit Claude's response to a single token?","[""https://docs.claude.com/en/api/messages-examples#basic-request-and-response"",""https://docs.claude.com/en/api/messages-examples#putting-words-in-claudes-mouth""]","python:file://eval_retrieval.py"
|
||||
"What does the temperature parameter do when working with large language models?","[""https://docs.claude.com/en/docs/resources/glossary#temperature"",""https://docs.claude.com/en/docs/test-and-evaluate/strengthen-guardrails/reduce-latency#2-optimize-prompt-and-output-length""]","python:file://eval_retrieval.py"
|
||||
"What are two ways to specify API parameters when calling the Claude API using Claude for Sheets?","[""https://docs.claude.com/en/docs/test-and-evaluate/eval-tool#tips-for-effective-evaluation"",""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/prefill-claudes-response#how-to-prefill-claudes-response"",""https://docs.claude.com/en/docs/build-with-claude/claude-for-sheets#enter-your-first-prompt""]","python:file://eval_retrieval.py"
|
||||
"How does prefilling the response with an opening curly brace ({ ) affect Claude's output when extracting structured data from text?","[""https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/prefill-claudes-response#example-1-controlling-output-formatting-and-skipping-the-preamble""]","python:file://eval_retrieval.py"
|
||||
"What are some helpful resources provided by Anthropic to dive deeper into building with images using Claude?","[""https://docs.claude.com/en/docs/build-with-claude/vision#dive-deeper-into-vision"",""https://docs.claude.com/en/docs/build-with-claude/vision#about-the-prompt-examples""]","python:file://eval_retrieval.py"
|
||||
"How do you specify the API key when creating a new Anthropic client in the Python and TypeScript SDK examples?","[""https://docs.claude.com/en/api/client-sdks#typescript"",""https://docs.claude.com/en/api/client-sdks#python""]","python:file://eval_retrieval.py"
|
||||
"What are two key benefits of using the Anthropic Evaluation tool when developing prompts for an AI classification application?","[""https://docs.claude.com/en/docs/about-claude/use-cases/classification#2-develop-your-test-cases"",""https://docs.claude.com/en/docs/test-and-evaluate/eval-tool#understanding-results""]","python:file://eval_retrieval.py"
|
||||
"What are the key differences between a pretrained language model like Claude's underlying model, and the final version of Claude available through Anthropic's API?","[""https://docs.claude.com/en/docs/resources/glossary#pretraining"",""https://docs.claude.com/en/docs/resources/glossary#llm"",""https://docs.claude.com/en/docs/resources/glossary#fine-tuning""]","python:file://eval_retrieval.py"
|
||||
"What is the IPv6 address range used by Anthropic?","[""https://docs.claude.com/en/api/ip-addresses#ipv6""]","python:file://eval_retrieval.py"
|
||||
"When using the Python SDK to create a message with Claude, what are two ways you can specify your API key?","[""https://docs.claude.com/en/api/messages-examples#multiple-conversational-turns"",""https://docs.claude.com/en/api/client-sdks#python""]","python:file://eval_retrieval.py"
|
||||
|
||||
|
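Each row above appears to pair a question with a JSON array of the documentation URLs a retriever should surface, plus a pointer to the grading script (`python:file://eval_retrieval.py`). The column semantics below are inferred from the data itself, not from a documented schema, so treat this as a minimal sketch of how such a row might be parsed and scored:

```python
import csv
import io
import json

def parse_eval_row(line: str):
    """Parse one CSV row: question, JSON list of correct chunk URLs, grader reference."""
    question, chunks_json, grader = next(csv.reader(io.StringIO(line)))
    return question, json.loads(chunks_json), grader

def retrieval_f1(retrieved: list, correct: list) -> float:
    """F1 between the URLs a retriever returned and the labeled correct chunks."""
    hits = len(set(retrieved) & set(correct))
    if hits == 0:
        return 0.0
    precision = hits / len(retrieved)
    recall = hits / len(correct)
    return 2 * precision * recall / (precision + recall)

# One row copied from the dataset above; csv handles the doubled quotes,
# after which the second field is valid JSON.
row = '"What is the IPv6 address range used by Anthropic?","[""https://docs.claude.com/en/api/ip-addresses#ipv6""]","python:file://eval_retrieval.py"'
q, correct, grader = parse_eval_row(row)
```

A retriever under test would then be scored by averaging `retrieval_f1` over all rows.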
@@ -4,11 +4,11 @@ from typing import Callable, List, Dict, Any, Tuple, Set
from vectordb import VectorDB, SummaryIndexedVectorDB
from anthropic import Anthropic

client = Anthropic(api_key=os.environ.get('ANTHROPIC_API_KEY'))
client = Anthropic(api_key=os.environ.get('CLAUDE_API_KEY'))

# Initialize the VectorDB
db = VectorDB("anthropic_docs")
# Load the Anthropic documentation
# Load the Claude Documentation
with open('../data/anthropic_docs.json', 'r') as f:
    anthropic_docs = json.load(f)
db.load_data(anthropic_docs)
@@ -41,7 +41,7 @@ def answer_query_base(context):

# Initialize the VectorDB
db_summary = SummaryIndexedVectorDB("anthropic_docs_summaries")
# Load the Anthropic documentation
# Load the Claude Documentation
with open("../data/anthropic_summary_indexed_docs.json", 'r') as f:
    anthropic_docs_summaries = json.load(f)
db_summary.load_data(anthropic_docs_summaries)
@@ -74,7 +74,7 @@ def answer_query_level_two(context):

# Initialize the VectorDB
db_rerank = SummaryIndexedVectorDB("anthropic_docs_rerank")
# Load the Anthropic documentation
# Load the Claude Documentation
with open("../data/anthropic_summary_indexed_docs.json", 'r') as f:
    anthropic_docs_summaries = json.load(f)
db_rerank.load_data(anthropic_docs_summaries)
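`VectorDB` and `SummaryIndexedVectorDB` come from the cookbook's local `vectordb` helper module, so their internals are not shown in these hunks. The core idea is a brute-force nearest-neighbor store: embed each chunk, then rank chunks by cosine similarity to the query embedding (for unit-normalized embeddings such as Voyage's, cosine similarity is just a dot product). A minimal self-contained sketch, with a stand-in bag-of-words embedding in place of a real embedding API:

```python
import math
from collections import Counter

def embed(text: str) -> dict:
    """Stand-in embedding: L2-normalized bag of words (real code would call Voyage AI)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {w: v / norm for w, v in counts.items()}

class MiniVectorDB:
    def __init__(self):
        self._docs = []  # list of (vector, payload) pairs

    def load_data(self, docs):
        for doc in docs:
            self._docs.append((embed(doc["text"]), doc))

    def search(self, query: str, k: int = 3):
        qv = embed(query)
        # Cosine similarity equals the dot product because both vectors are unit length.
        scored = [
            (sum(qv.get(w, 0.0) * x for w, x in dv.items()), doc)
            for dv, doc in self._docs
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in scored[:k]]

db = MiniVectorDB()
db.load_data([
    {"text": "streaming responses from the messages api", "url": "a"},
    {"text": "vision image formats and size limits", "url": "b"},
])
top = db.search("how do streaming responses work", k=1)
```

The cookbook's real classes additionally persist embeddings to disk and, in the summary-indexed variant, embed chunk summaries rather than raw chunks.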
@@ -6,7 +6,7 @@ from anthropic import Anthropic

# Initialize the VectorDB
db = VectorDB("anthropic_docs")
# Load the Anthropic documentation
# Load the Claude Documentation
with open('../data/anthropic_docs.json', 'r') as f:
    anthropic_docs = json.load(f)
db.load_data(anthropic_docs)
@@ -23,7 +23,7 @@ def retrieve_base(query, options, context):

# Initialize the VectorDB
db_summary = SummaryIndexedVectorDB("anthropic_docs_summaries")
# Load the Anthropic documentation
# Load the Claude Documentation
with open("../data/anthropic_summary_indexed_docs.json", 'r') as f:
    anthropic_docs_summaries = json.load(f)
db_summary.load_data(anthropic_docs_summaries)
@@ -64,7 +64,7 @@ def _rerank_results(query: str, results: List[Dict], k: int = 3) -> List[Dict]:
<relevant_indices>put the numbers of your indices here, separated by commas</relevant_indices>
"""

client = Anthropic(api_key=os.environ.get('ANTHROPIC_API_KEY'))
client = Anthropic(api_key=os.environ.get('CLAUDE_API_KEY'))
try:
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
@@ -108,7 +108,7 @@ def _rerank_results(query: str, results: List[Dict], k: int = 3) -> List[Dict]:

# Initialize the VectorDB
db_rerank = SummaryIndexedVectorDB("anthropic_docs_summaries_rerank")
# Load the Anthropic documentation
# Load the Claude Documentation
with open("../data/anthropic_summary_indexed_docs.json", 'r') as f:
    anthropic_docs_summaries = json.load(f)
db_rerank.load_data(anthropic_docs_summaries)
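The `_rerank_results` prompt above asks the model to return its chosen chunk numbers inside a `<relevant_indices>` tag. The cookbook's actual parsing code falls outside these hunks, so the following is only a sketch of the client-side parsing such a prompt implies, with a fallback when the tag is missing or malformed:

```python
import re

def parse_relevant_indices(reply: str, num_results: int, k: int = 3) -> list:
    """Extract comma-separated indices from <relevant_indices>...</relevant_indices>.

    Falls back to the first k indices if the tag is missing or unparseable.
    """
    match = re.search(r"<relevant_indices>(.*?)</relevant_indices>", reply, re.DOTALL)
    if match:
        indices = []
        for part in match.group(1).split(","):
            part = part.strip()
            if part.isdigit() and int(part) < num_results:  # drop out-of-range indices
                indices.append(int(part))
        if indices:
            return indices[:k]
    return list(range(min(k, num_results)))  # fallback: keep the original top-k order

print(parse_relevant_indices("<relevant_indices>2, 0, 5</relevant_indices>", num_results=6))  # → [2, 0, 5]
```

Guarding the parse this way keeps the pipeline running even when the model's reply drifts from the requested format.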
@@ -8,7 +8,7 @@
|
||||
"\n",
|
||||
"Claude excels at a wide range of tasks, but it may struggle with queries specific to your unique business context. This is where Retrieval Augmented Generation (RAG) becomes invaluable. RAG enables Claude to leverage your internal knowledge bases or customer support documents, significantly enhancing its ability to answer domain-specific questions. Enterprises are increasingly building RAG applications to improve workflows in customer support, Q&A over internal company documents, financial & legal analysis, and much more.\n",
|
||||
"\n",
|
||||
"In this guide, we'll demonstrate how to build and optimize a RAG system using the Anthropic documentation as our knowledge base. We'll walk you through:\n",
|
||||
"In this guide, we'll demonstrate how to build and optimize a RAG system using the Claude Documentation as our knowledge base. We'll walk you through:\n",
|
||||
"\n",
|
||||
"1) Setting up a basic RAG system using an in-memory vector database and embeddings from [Voyage AI](https://www.voyageai.com/).\n",
|
||||
"\n",
|
||||
@@ -26,7 +26,7 @@
|
||||
"\n",
|
||||
"#### Note:\n",
|
||||
"\n",
|
||||
"The evaluations in this cookbook are meant to mirror a production evaluation system, and you should keep in mind that they can take a while to run. Also of note: if you run the evaluations in full, you may come up against rate limits unless you are in [Tier 2 and above](https://docs.anthropic.com/en/api/rate-limits). Consider skipping the full end to end eval if you're trying to conserve token usage.\n",
|
||||
"The evaluations in this cookbook are meant to mirror a production evaluation system, and you should keep in mind that they can take a while to run. Also of note: if you run the evaluations in full, you may come up against rate limits unless you are in [Tier 2 and above](https://docs.claude.com/en/api/rate-limits). Consider skipping the full end to end eval if you're trying to conserve token usage.\n",
|
||||
"\n",
|
||||
"## Table of Contents\n",
|
||||
"\n",
|
||||
@@ -166,7 +166,7 @@
|
||||
"import os\n",
|
||||
"\n",
|
||||
"os.environ['VOYAGE_API_KEY'] = \"VOYAGE KEY HERE\"\n",
|
||||
"os.environ['ANTHROPIC_API_KEY'] = \"ANTHROPIC KEY HERE\""
|
||||
"os.environ['CLAUDE_API_KEY'] = \"ANTHROPIC KEY HERE\""
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -180,7 +180,7 @@
|
||||
"\n",
|
||||
"client = anthropic.Anthropic(\n",
|
||||
" # This is the default and can be omitted\n",
|
||||
" api_key=os.getenv(\"ANTHROPIC_API_KEY\"),\n",
|
||||
" api_key=os.getenv(\"CLAUDE_API_KEY\"),\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
@@ -329,7 +329,7 @@
|
||||
"with open('evaluation/docs_evaluation_dataset.json', 'r') as f:\n",
|
||||
" eval_data = json.load(f)\n",
|
||||
"\n",
|
||||
"# Load the Anthropic documentation\n",
|
||||
"# Load the Claude Documentation\n",
|
||||
"with open('data/anthropic_docs.json', 'r') as f:\n",
|
||||
" anthropic_docs = json.load(f)\n",
|
||||
"\n",
|
||||
@@ -403,8 +403,8 @@
" \"id\": \"efc09699\",\n",
" \"question\": \"How can you create multiple test cases for an evaluation in the Anthropic Evaluation tool?\",\n",
" \"correct_chunks\": [\n",
" \"https://docs.anthropic.com/en/docs/test-and-evaluate/eval-tool#creating-test-cases\",\n",
" \"https://docs.anthropic.com/en/docs/build-with-claude/develop-tests#building-evals-and-test-cases\"\n",
" \"https://docs.claude.com/en/docs/test-and-evaluate/eval-tool#creating-test-cases\",\n",
" \"https://docs.claude.com/en/docs/build-with-claude/develop-tests#building-evals-and-test-cases\"\n",
" ],\n",
" \"correct_answer\": \"To create multiple test cases in the Anthropic Evaluation tool, click the 'Add Test Case' button, fill in values for each variable in your prompt, and repeat the process to create additional test case scenarios.\"\n",
" },\n",
@@ -412,8 +412,8 @@
" \"id\": \"1305ea00\",\n",
" \"question\": \"What embeddings provider does Anthropic recommend for customized domain-specific models, and what capabilities does this provider offer?\",\n",
" \"correct_chunks\": [\n",
" \"https://docs.anthropic.com/en/docs/build-with-claude/embeddings#before-implementing-embeddings\",\n",
" \"https://docs.anthropic.com/en/docs/build-with-claude/embeddings#how-to-get-embeddings-with-anthropic\"\n",
" \"https://docs.claude.com/en/docs/build-with-claude/embeddings#before-implementing-embeddings\",\n",
" \"https://docs.claude.com/en/docs/build-with-claude/embeddings#how-to-get-embeddings-with-anthropic\"\n",
" ],\n",
" \"correct_answer\": \"Anthropic recommends Voyage AI for embedding models. Voyage AI offers customized models for specific industry domains like finance and healthcare, as well as bespoke fine-tuned models for individual customers. They have a wide variety of options and capabilities.\"\n",
" },\n",
@@ -421,8 +421,8 @@
" \"id\": \"1811c10d\",\n",
" \"question\": \"What are some key success metrics to consider when evaluating Claude's performance on a classification task, and how do they relate to choosing the right model to reduce latency?\",\n",
" \"correct_chunks\": [\n",
" \"https://docs.anthropic.com/en/docs/about-claude/use-cases/classification#evaluation-metrics\",\n",
" \"https://docs.anthropic.com/en/docs/test-and-evaluate/strengthen-guardrails/reduce-latency#1-choose-the-right-model\"\n",
" \"https://docs.claude.com/en/docs/about-claude/use-cases/classification#evaluation-metrics\",\n",
" \"https://docs.claude.com/en/docs/test-and-evaluate/strengthen-guardrails/reduce-latency#1-choose-the-right-model\"\n",
" ],\n",
" \"correct_answer\": \"When evaluating Claude's performance on a classification task, some key success metrics to consider include accuracy, F1 score, consistency, structure, speed, bias and fairness. Choosing the right model that fits your specific requirements in terms of speed and output quality is a straightforward way to reduce latency and meet the acceptable response time for your use case.\"\n",
" }\n",
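The dataset entry above names accuracy and F1 score among the classification success metrics. For reference, a quick self-contained illustration of how those two metrics are computed over parallel label lists (the `accuracy` and `f1_binary` helpers are illustrative, not from the cookbook):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_binary(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
print(accuracy(y_true, y_pred))              # → 0.6
print(round(f1_binary(y_true, y_pred), 2))   # → 0.67
```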
@@ -1131,7 +1131,7 @@
"text": [
"\n",
"<content>\n",
"<explanation>The Generated Answer is incorrect. According to the Correct Answer, rate limits can be viewed in the \"Rate Limits tab\" of the Developer Console. However, the Generated Answer states they can be found in the \"Plans and Billing section.\" These are two different locations, representing a direct contradiction. The Generated Answer provides incorrect information about where to find this specific information in the Anthropic Console.</explanation>\n",
"<explanation>The Generated Answer is incorrect. According to the Correct Answer, rate limits can be viewed in the \"Rate Limits tab\" of the Developer Console. However, the Generated Answer states they can be found in the \"Plans and Billing section.\" These are two different locations, representing a direct contradiction. The Generated Answer provides incorrect information about where to find this specific information in the Claude Console.</explanation>\n",
"<is_correct>false</is_correct>\n",
"</content>\n",
"\n",
@@ -1933,7 +1933,7 @@
"text": [
"\n",
"<content>\n",
"<explanation>The generated answer is incorrect. While it correctly mentions the Anthropic Cookbook as one interactive learning resource, it fails to mention the Developer Console and its prompt generator tool, which is a key component mentioned in the correct answer. Instead, it references the \"More Resources\" section and documentation, which weren't identified in the correct answer as interactive learning methods. The generated answer therefore misses one of the two main interactive learning tools specified in the correct answer.</explanation>\n",
"<explanation>The generated answer is incorrect. While it correctly mentions the Claude Cookbook as one interactive learning resource, it fails to mention the Developer Console and its prompt generator tool, which is a key component mentioned in the correct answer. Instead, it references the \"More Resources\" section and documentation, which weren't identified in the correct answer as interactive learning methods. The generated answer therefore misses one of the two main interactive learning tools specified in the correct answer.</explanation>\n",
"<is_correct>false</is_correct>\n",
"</content>\n",
"\n"
@@ -2028,7 +2028,7 @@
"text": [
"\n",
"<content>\n",
"<explanation>The Generated Answer is correct. Both answers state that an overloaded_error event corresponds to HTTP status code 529 in a non-streaming context for the Anthropic API. While the Correct Answer uses slightly more formal language (\"would normally correspond to\"), the core information - the 529 status code - is identical in both answers. The difference in phrasing does not change the fundamental meaning or accuracy of the response.</explanation>\n",
"<explanation>The Generated Answer is correct. Both answers state that an overloaded_error event corresponds to HTTP status code 529 in a non-streaming context for the Claude API. While the Correct Answer uses slightly more formal language (\"would normally correspond to\"), the core information - the 529 status code - is identical in both answers. The difference in phrasing does not change the fundamental meaning or accuracy of the response.</explanation>\n",
"<is_correct>true</is_correct>\n",
"</content>\n",
"\n",
@@ -2346,7 +2346,7 @@
"text": [
"\n",
"<content>\n",
"<explanation>The Generated Answer is correct. It conveys the same key information as the Correct Answer - specifically that the Anthropic API allows up to 20 images per request while the claude.ai interface has a lower limit of 5 images per turn. While the Generated Answer is more concise and uses slightly different wording, it captures the essential numerical limits accurately and maintains the key comparison between the two interfaces. There are no missing critical details or contradictions between the two answers.</explanation>\n",
"<explanation>The Generated Answer is correct. It conveys the same key information as the Correct Answer - specifically that the Claude API allows up to 20 images per request while the claude.ai interface has a lower limit of 5 images per turn. While the Generated Answer is more concise and uses slightly different wording, it captures the essential numerical limits accurately and maintains the key comparison between the two interfaces. There are no missing critical details or contradictions between the two answers.</explanation>\n",
"<is_correct>true</is_correct>\n",
"</content>\n",
"\n"
@@ -3116,7 +3116,7 @@
"<explanation>The Generated Answer is correct. It describes the same two methods for specifying the API key as mentioned in the Correct Answer:\n",
"\n",
"1. Passing the API key directly when initializing the Anthropic client\n",
"2. Setting it as an environment variable named ANTHROPIC_API_KEY\n",
"2. Setting it as an environment variable named CLAUDE_API_KEY\n",
"\n",
"The Generated Answer even provides helpful code examples to illustrate both methods, though these weren't required to match the Correct Answer. The substance and key information is identical between both answers, just expressed in slightly different words.</explanation>\n",
"<is_correct>true</is_correct>\n",
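The explanation above describes the usual two ways of supplying an API key: passing it explicitly at client construction, or letting the client fall back to an environment variable. A sketch of that fallback logic, assuming the post-rebrand `CLAUDE_API_KEY` variable name — `resolve_api_key` is a hypothetical helper mimicking typical SDK behavior, not a real SDK function:

```python
import os

def resolve_api_key(explicit_key=None, env_var="CLAUDE_API_KEY"):
    """Return the explicit key if given, else fall back to the environment.

    Hypothetical helper illustrating the fallback order described above.
    """
    key = explicit_key or os.getenv(env_var)
    if not key:
        raise RuntimeError(f"No API key: pass one explicitly or set {env_var}")
    return key

os.environ["CLAUDE_API_KEY"] = "sk-ant-from-env"
print(resolve_api_key())                   # → sk-ant-from-env
print(resolve_api_key("sk-ant-explicit"))  # → sk-ant-explicit
```

The explicit parameter wins when both are present, which matches the "defaults to the environment variable" behavior the graded answers describe.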
@@ -3804,7 +3804,7 @@
"text": [
"\n",
"<content>\n",
"<explanation>The Generated Answer is correct as it conveys the same core message as the Correct Answer. Both answers emphasize that Claude can be used to summarize PDF documents, making it easier to understand long documents without reading everything. While the Generated Answer provides additional details about text analysis capabilities and mentions the Anthropic Cookbook, these are supplementary details that don't contradict the core message. The essential functionality - uploading PDFs and getting summaries to more easily digest long documents - is accurately captured in both answers.</explanation>\n",
"<explanation>The Generated Answer is correct as it conveys the same core message as the Correct Answer. Both answers emphasize that Claude can be used to summarize PDF documents, making it easier to understand long documents without reading everything. While the Generated Answer provides additional details about text analysis capabilities and mentions the Claude Cookbook, these are supplementary details that don't contradict the core message. The essential functionality - uploading PDFs and getting summaries to more easily digest long documents - is accurately captured in both answers.</explanation>\n",
"<is_correct>true</is_correct>\n",
"</content>\n",
"\n"
@@ -3823,7 +3823,7 @@
"text": [
"\n",
"<content>\n",
"<explanation>The Generated Answer is correct. Both answers indicate that you can view the API rate limits in a \"Rate Limits\" tab within Anthropic's console interface. While the Correct Answer specifically mentions \"Developer Console\" and the Generated Answer just says \"Anthropic Console,\" this is a minor difference in terminology that doesn't change the core substance of the answer. Both answers convey the same essential information - that rate limits can be viewed in a dedicated Rate Limits tab.</explanation>\n",
"<explanation>The Generated Answer is correct. Both answers indicate that you can view the API rate limits in a \"Rate Limits\" tab within Anthropic's console interface. While the Correct Answer specifically mentions \"Developer Console\" and the Generated Answer just says \"Claude Console,\" this is a minor difference in terminology that doesn't change the core substance of the answer. Both answers convey the same essential information - that rate limits can be viewed in a dedicated Rate Limits tab.</explanation>\n",
"<is_correct>true</is_correct>\n",
"</content>\n",
"\n",
@@ -3947,7 +3947,7 @@
"2. Having ways to empirically test against those criteria\n",
"3. Having a first draft prompt to improve\n",
"\n",
"The Generated Answer even presents these points in the same order as the Correct Answer. While it adds an additional detail about using the prompt generator in the Anthropic Console, this extra information doesn't contradict the core message and doesn't affect the fundamental correctness of the answer. The substance and main requirements are identical between both answers.</explanation>\n",
"The Generated Answer even presents these points in the same order as the Correct Answer. While it adds an additional detail about using the prompt generator in the Claude Console, this extra information doesn't contradict the core message and doesn't affect the fundamental correctness of the answer. The substance and main requirements are identical between both answers.</explanation>\n",
"<is_correct>true</is_correct>\n",
"</content>\n",
"\n"
@@ -4633,7 +4633,7 @@
"text": [
"\n",
"<content>\n",
"<explanation>The Generated Answer is incorrect because it misses a critical piece of information from the Correct Answer. While it correctly mentions the Anthropic Cookbook as one interactive way to learn Claude's capabilities, it completely fails to mention the Developer Console and its prompt generator tool, which is the second key interactive learning method specified in the Correct Answer. Instead, it incorrectly references \"Claude for Sheets usage examples\" as the second method, which wasn't mentioned in the Correct Answer at all. The omission of the Developer Console and the inclusion of incorrect information makes this answer incomplete and partially inaccurate.</explanation>\n",
"<explanation>The Generated Answer is incorrect because it misses a critical piece of information from the Correct Answer. While it correctly mentions the Claude Cookbook as one interactive way to learn Claude's capabilities, it completely fails to mention the Developer Console and its prompt generator tool, which is the second key interactive learning method specified in the Correct Answer. Instead, it incorrectly references \"Claude for Sheets usage examples\" as the second method, which wasn't mentioned in the Correct Answer at all. The omission of the Developer Console and the inclusion of incorrect information makes this answer incomplete and partially inaccurate.</explanation>\n",
"<is_correct>false</is_correct>\n",
"</content>\n",
"\n"
@@ -4728,7 +4728,7 @@
"text": [
"\n",
"<content>\n",
"<explanation>The Generated Answer is correct. Both answers state that an overloaded_error event corresponds to HTTP status code 529 in a non-streaming context for the Anthropic API. While the Correct Answer uses slightly more formal language (\"would normally correspond to\"), the core information - the 529 status code - is identical in both answers. The difference in phrasing does not change the fundamental meaning or accuracy of the response.</explanation>\n",
"<explanation>The Generated Answer is correct. Both answers state that an overloaded_error event corresponds to HTTP status code 529 in a non-streaming context for the Claude API. While the Correct Answer uses slightly more formal language (\"would normally correspond to\"), the core information - the 529 status code - is identical in both answers. The difference in phrasing does not change the fundamental meaning or accuracy of the response.</explanation>\n",
"<is_correct>true</is_correct>\n",
"</content>\n",
"\n",
@@ -5051,7 +5051,7 @@
"text": [
"\n",
"<content>\n",
"<explanation>The Generated Answer is correct. It conveys the same key information as the Correct Answer - specifically that the Anthropic API allows up to 20 images per request while the claude.ai interface has a 5 image limit. While the Correct Answer uses slightly different wording (\"per turn\" vs \"per request\"), the substance and numerical limits stated are identical. There are no critical missing pieces of information or contradictions between the two answers.</explanation>\n",
"<explanation>The Generated Answer is correct. It conveys the same key information as the Correct Answer - specifically that the Claude API allows up to 20 images per request while the claude.ai interface has a 5 image limit. While the Correct Answer uses slightly different wording (\"per turn\" vs \"per request\"), the substance and numerical limits stated are identical. There are no critical missing pieces of information or contradictions between the two answers.</explanation>\n",
"<is_correct>true</is_correct>\n",
"</content>\n",
"\n"
@@ -5298,7 +5298,7 @@
"text": [
"\n",
"<content>\n",
"<explanation>The Generated Answer is essentially correct. Both answers highlight that the Anthropic Cookbook provides interactive Jupyter notebooks that demonstrate API functionality, specifically mentioning PDF uploads and embeddings. While the Generated Answer splits this into two points and adds some additional context about hands-on learning, the core information matches the Correct Answer. There are no contradictions or missing critical pieces of information between the two answers - they're conveying the same fundamental message about how the Cookbook helps developers learn through interactive notebooks and demonstrations.</explanation>\n",
"<explanation>The Generated Answer is essentially correct. Both answers highlight that the Claude Cookbook provides interactive Jupyter notebooks that demonstrate API functionality, specifically mentioning PDF uploads and embeddings. While the Generated Answer splits this into two points and adds some additional context about hands-on learning, the core information matches the Correct Answer. There are no contradictions or missing critical pieces of information between the two answers - they're conveying the same fundamental message about how the Cookbook helps developers learn through interactive notebooks and demonstrations.</explanation>\n",
"<is_correct>true</is_correct>\n",
"</content>\n",
"\n"
@@ -5733,7 +5733,7 @@
"<explanation>The Generated Answer is correct as it conveys the same essential information as the Correct Answer. Both answers indicate that:\n",
"\n",
"1. You can specify the API key as a parameter when creating a new Anthropic client\n",
"2. If no API key is provided, it defaults to using the ANTHROPIC_API_KEY environment variable\n",
"2. If no API key is provided, it defaults to using the CLAUDE_API_KEY environment variable\n",
"\n",
"The Generated Answer actually provides more detail by showing code examples in both Python and TypeScript, but the core information matches the Correct Answer. There are no contradictions between the two answers, and no critical information from the Correct Answer is missing from the Generated Answer.</explanation>\n",
"<is_correct>true</is_correct>\n",
@@ -5817,7 +5817,7 @@
"\n",
"<content>\n",
"<explanation>The Generated Answer is correct. It identifies the same two methods for specifying the API key as mentioned in the Correct Answer:\n",
"1. Using the environment variable ANTHROPIC_API_KEY\n",
"1. Using the environment variable CLAUDE_API_KEY\n",
"2. Passing the API key directly when initializing the client via the api_key parameter\n",
"\n",
"While the Generated Answer is more concise, it captures all the essential information from the Correct Answer. There are no contradictions between the two answers, and no critical information is missing. The differences are merely in phrasing and level of detail, but the core substance is identical.</explanation>\n",
@@ -7947,7 +7947,7 @@
"text": [
"\n",
"<content>\n",
"<explanation>The Generated Answer is correct. Both answers indicate that you can view the API rate limits in a Rate Limits tab within Anthropic's console interface. The only difference is minor wording variation (\"Developer Console\" vs \"Anthropic Console\") and the Generated Answer's inclusion of the word \"new,\" but these don't change the core substance of the answer. Both answers convey the same essential information about where to find the rate limits.</explanation>\n",
"<explanation>The Generated Answer is correct. Both answers indicate that you can view the API rate limits in a Rate Limits tab within Anthropic's console interface. The only difference is minor wording variation (\"Developer Console\" vs \"Claude Console\") and the Generated Answer's inclusion of the word \"new,\" but these don't change the core substance of the answer. Both answers convey the same essential information about where to find the rate limits.</explanation>\n",
"<is_correct>true</is_correct>\n",
"</content>\n",
"\n",
@@ -8672,7 +8672,7 @@
"text": [
"\n",
"<content>\n",
"<explanation>The Generated Answer is incorrect. It describes authentication methods for the standard Anthropic API, not for accessing Claude models through Amazon Bedrock. The correct authentication methods involve AWS credentials (either direct credentials or using AWS credential providers), while the Generated Answer talks about using ANTHROPIC_API_KEY. These are fundamentally different authentication approaches since Bedrock requires AWS-specific credentials. The Generated Answer shows no awareness of AWS authentication requirements and instead provides completely different, incorrect authentication methods.</explanation>\n",
"<explanation>The Generated Answer is incorrect. It describes authentication methods for the standard Claude API, not for accessing Claude models through Amazon Bedrock. The correct authentication methods involve AWS credentials (either direct credentials or using AWS credential providers), while the Generated Answer talks about using CLAUDE_API_KEY. These are fundamentally different authentication approaches since Bedrock requires AWS-specific credentials. The Generated Answer shows no awareness of AWS authentication requirements and instead provides completely different, incorrect authentication methods.</explanation>\n",
"<is_correct>false</is_correct>\n",
"</content>\n",
"\n",
@@ -8845,7 +8845,7 @@
"<content>\n",
"<explanation>The Generated Answer is correct. It captures the two key interactive ways to learn Claude's capabilities that were mentioned in the Correct Answer:\n",
"\n",
"1. The Anthropic Cookbook with its interactive Jupyter notebooks\n",
"1. The Claude Cookbook with its interactive Jupyter notebooks\n",
"2. The Developer Console with its prompt generator tool\n",
"\n",
"The Generated Answer actually provides slightly more detail than the Correct Answer, but the core substance is the same. The mention of VoyageAI and additional details about the Developer Console don't contradict the Correct Answer - they're just supplementary information. Both answers focus on the same two main interactive learning methods, and there are no critical omissions or contradictions between them.</explanation>\n",
@@ -8958,7 +8958,7 @@
"text": [
"\n",
"<content>\n",
"<explanation>The Generated Answer is correct. Both answers state that an overloaded_error event corresponds to HTTP status code 529 in a non-streaming context for the Anthropic API. While the Correct Answer uses slightly more formal language (\"would normally correspond to\"), the core information - the 529 status code - is identical in both answers. The difference in phrasing does not change the fundamental meaning or accuracy of the response.</explanation>\n",
"<explanation>The Generated Answer is correct. Both answers state that an overloaded_error event corresponds to HTTP status code 529 in a non-streaming context for the Claude API. While the Correct Answer uses slightly more formal language (\"would normally correspond to\"), the core information - the 529 status code - is identical in both answers. The difference in phrasing does not change the fundamental meaning or accuracy of the response.</explanation>\n",
"<is_correct>true</is_correct>\n",
"</content>\n",
"\n",
@@ -9327,7 +9327,7 @@
"text": [
"\n",
"<content>\n",
"<explanation>The Generated Answer is correct. Both answers convey the same key information: that the Anthropic API allows up to 20 images per request, while the claude.ai interface has a limit of 5 images. While the Correct Answer provides slightly more context by mentioning \"Messages API\" and \"per turn,\" the core numerical limits are identical and accurately stated in the Generated Answer. The substance and critical information about the image limits are preserved, even if expressed more concisely.</explanation>\n",
"<explanation>The Generated Answer is correct. Both answers convey the same key information: that the Claude API allows up to 20 images per request, while the claude.ai interface has a limit of 5 images. While the Correct Answer provides slightly more context by mentioning \"Messages API\" and \"per turn,\" the core numerical limits are identical and accurately stated in the Generated Answer. The substance and critical information about the image limits are preserved, even if expressed more concisely.</explanation>\n",
"<is_correct>true</is_correct>\n",
"</content>\n",
"\n",
@@ -10121,7 +10121,7 @@
"<explanation>The Generated Answer is correct and actually provides more detailed information than the Correct Answer while maintaining the same core information. Both answers convey that:\n",
"\n",
"1. The API key can be specified as a parameter when creating a new Anthropic client\n",
"2. If not provided explicitly, the SDK will default to using the ANTHROPIC_API_KEY environment variable\n",
"2. If not provided explicitly, the SDK will default to using the CLAUDE_API_KEY environment variable\n",
"\n",
"The Generated Answer goes further by providing specific code examples in both Python and TypeScript, but this additional detail doesn't contradict or omit any of the key information from the Correct Answer. The substance of both answers is essentially the same.</explanation>\n",
"<is_correct>true</is_correct>\n",
@@ -10225,7 +10225,7 @@
"\n",
"<content>\n",
"<explanation>The Generated Answer is correct. It captures both key methods for specifying the API key that are mentioned in the Correct Answer:\n",
"1. Using the ANTHROPIC_API_KEY environment variable\n",
"1. Using the CLAUDE_API_KEY environment variable\n",
"2. Passing the API key directly when initializing the client\n",
"\n",
"While the Generated Answer is more concise, it contains the same essential information as the Correct Answer. The additional details in the Correct Answer (like mentioning that the environment variable is used \"by default\") are supplementary and don't change the core correctness of the Generated Answer. There are no contradictions between the two answers, and no critical information is missing.</explanation>\n",
@@ -3789,8 +3789,8 @@ Fail Reason: Average score is below threshold","[PASS] (1.66)

Pass Reason: All assertions passed","[FAIL] (0.34)

Error running Python script: anthropic.RateLimitError: Error code: 429 - {'type': 'error', 'error': {'type': 'rate_limit_error', 'message': 'Number of request tokens has exceeded your per-minute rate limit (https://docs.anthropic.com/en/api/rate-limits); see the response headers for current usage. Please reduce the prompt length or the maximum tokens requested, or try again later. You may also contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase.'}}
Stack Trace: Error: anthropic.RateLimitError: Error code: 429 - {'type': 'error', 'error': {'type': 'rate_limit_error', 'message': 'Number of request tokens has exceeded your per-minute rate limit (https://docs.anthropic.com/en/api/rate-limits); see the response headers for current usage. Please reduce the prompt length or the maximum tokens requested, or try again later. You may also contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase.'}}
Error running Python script: anthropic.RateLimitError: Error code: 429 - {'type': 'error', 'error': {'type': 'rate_limit_error', 'message': 'Number of request tokens has exceeded your per-minute rate limit (https://docs.claude.com/en/api/rate-limits); see the response headers for current usage. Please reduce the prompt length or the maximum tokens requested, or try again later. You may also contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase.'}}
Stack Trace: Error: anthropic.RateLimitError: Error code: 429 - {'type': 'error', 'error': {'type': 'rate_limit_error', 'message': 'Number of request tokens has exceeded your per-minute rate limit (https://docs.claude.com/en/api/rate-limits); see the response headers for current usage. Please reduce the prompt length or the maximum tokens requested, or try again later. You may also contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase.'}}
at PythonShell.parseError (/Users/sflamini/.npm/_npx/81bbc6515d992ace/node_modules/python-shell/index.js:303:21)
at terminateIfNeeded (/Users/sflamini/.npm/_npx/81bbc6515d992ace/node_modules/python-shell/index.js:193:32)
at ChildProcess.<anonymous> (/Users/sflamini/.npm/_npx/81bbc6515d992ace/node_modules/python-shell/index.js:185:13)
@@ -3870,8 +3870,8 @@ Key Provisions:

</summary>

Fail Reason: Error running Python script: anthropic.RateLimitError: Error code: 429 - {'type': 'error', 'error': {'type': 'rate_limit_error', 'message': 'Number of request tokens has exceeded your per-minute rate limit (https://docs.anthropic.com/en/api/rate-limits); see the response headers for current usage. Please reduce the prompt length or the maximum tokens requested, or try again later. You may also contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase.'}}
Stack Trace: Error: anthropic.RateLimitError: Error code: 429 - {'type': 'error', 'error': {'type': 'rate_limit_error', 'message': 'Number of request tokens has exceeded your per-minute rate limit (https://docs.anthropic.com/en/api/rate-limits); see the response headers for current usage. Please reduce the prompt length or the maximum tokens requested, or try again later. You may also contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase.'}}
Fail Reason: Error running Python script: anthropic.RateLimitError: Error code: 429 - {'type': 'error', 'error': {'type': 'rate_limit_error', 'message': 'Number of request tokens has exceeded your per-minute rate limit (https://docs.claude.com/en/api/rate-limits); see the response headers for current usage. Please reduce the prompt length or the maximum tokens requested, or try again later. You may also contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase.'}}
Stack Trace: Error: anthropic.RateLimitError: Error code: 429 - {'type': 'error', 'error': {'type': 'rate_limit_error', 'message': 'Number of request tokens has exceeded your per-minute rate limit (https://docs.claude.com/en/api/rate-limits); see the response headers for current usage. Please reduce the prompt length or the maximum tokens requested, or try again later. You may also contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase.'}}
at PythonShell.parseError (/Users/sflamini/.npm/_npx/81bbc6515d992ace/node_modules/python-shell/index.js:303:21)
at terminateIfNeeded (/Users/sflamini/.npm/_npx/81bbc6515d992ace/node_modules/python-shell/index.js:193:32)
at ChildProcess.<anonymous> (/Users/sflamini/.npm/_npx/81bbc6515d992ace/node_modules/python-shell/index.js:185:13)
@@ -3967,8 +3967,8 @@ Here is a summary of the key aspects of the sublease agreement:

Fail Reason: Expected output to contain all of ""parties involved, property details, term and rent, responsibilities, consent and notices, special provisions""","[FAIL] (0.75)

Error running Python script: anthropic.RateLimitError: Error code: 429 - {'type': 'error', 'error': {'type': 'rate_limit_error', 'message': 'Number of request tokens has exceeded your per-minute rate limit (https://docs.anthropic.com/en/api/rate-limits); see the response headers for current usage. Please reduce the prompt length or the maximum tokens requested, or try again later. You may also contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase.'}}
Stack Trace: Error: anthropic.RateLimitError: Error code: 429 - {'type': 'error', 'error': {'type': 'rate_limit_error', 'message': 'Number of request tokens has exceeded your per-minute rate limit (https://docs.anthropic.com/en/api/rate-limits); see the response headers for current usage. Please reduce the prompt length or the maximum tokens requested, or try again later. You may also contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase.'}}
Error running Python script: anthropic.RateLimitError: Error code: 429 - {'type': 'error', 'error': {'type': 'rate_limit_error', 'message': 'Number of request tokens has exceeded your per-minute rate limit (https://docs.claude.com/en/api/rate-limits); see the response headers for current usage. Please reduce the prompt length or the maximum tokens requested, or try again later. You may also contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase.'}}
Stack Trace: Error: anthropic.RateLimitError: Error code: 429 - {'type': 'error', 'error': {'type': 'rate_limit_error', 'message': 'Number of request tokens has exceeded your per-minute rate limit (https://docs.claude.com/en/api/rate-limits); see the response headers for current usage. Please reduce the prompt length or the maximum tokens requested, or try again later. You may also contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase.'}}
at PythonShell.parseError (/Users/sflamini/.npm/_npx/81bbc6515d992ace/node_modules/python-shell/index.js:303:21)
at terminateIfNeeded (/Users/sflamini/.npm/_npx/81bbc6515d992ace/node_modules/python-shell/index.js:193:32)
at ChildProcess.<anonymous> (/Users/sflamini/.npm/_npx/81bbc6515d992ace/node_modules/python-shell/index.js:185:13)
@@ -4063,8 +4063,8 @@ Here is a summary of the key aspects of the sublease agreement:
|
||||
|
||||
</summary>
|
||||
|
||||
Fail Reason: Error running Python script: anthropic.RateLimitError: Error code: 429 - {'type': 'error', 'error': {'type': 'rate_limit_error', 'message': 'Number of request tokens has exceeded your per-minute rate limit (https://docs.anthropic.com/en/api/rate-limits); see the response headers for current usage. Please reduce the prompt length or the maximum tokens requested, or try again later. You may also contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase.'}}
|
||||
Stack Trace: Error: anthropic.RateLimitError: Error code: 429 - {'type': 'error', 'error': {'type': 'rate_limit_error', 'message': 'Number of request tokens has exceeded your per-minute rate limit (https://docs.anthropic.com/en/api/rate-limits); see the response headers for current usage. Please reduce the prompt length or the maximum tokens requested, or try again later. You may also contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase.'}}
|
||||
Fail Reason: Error running Python script: anthropic.RateLimitError: Error code: 429 - {'type': 'error', 'error': {'type': 'rate_limit_error', 'message': 'Number of request tokens has exceeded your per-minute rate limit (https://docs.claude.com/en/api/rate-limits); see the response headers for current usage. Please reduce the prompt length or the maximum tokens requested, or try again later. You may also contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase.'}}
|
||||
Stack Trace: Error: anthropic.RateLimitError: Error code: 429 - {'type': 'error', 'error': {'type': 'rate_limit_error', 'message': 'Number of request tokens has exceeded your per-minute rate limit (https://docs.claude.com/en/api/rate-limits); see the response headers for current usage. Please reduce the prompt length or the maximum tokens requested, or try again later. You may also contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase.'}}
|
||||
at PythonShell.parseError (/Users/sflamini/.npm/_npx/81bbc6515d992ace/node_modules/python-shell/index.js:303:21)
|
||||
at terminateIfNeeded (/Users/sflamini/.npm/_npx/81bbc6515d992ace/node_modules/python-shell/index.js:193:32)
|
||||
at ChildProcess.<anonymous> (/Users/sflamini/.npm/_npx/81bbc6515d992ace/node_modules/python-shell/index.js:185:13)
|
||||
@@ -7733,9 +7733,9 @@ Expected output to contain all of ""parties involved, property details, term and
|
||||
|
||||
Fail Reason: Expected output to contain all of ""parties involved, property details, term and rent, responsibilities, consent and notices, special provisions""","[FAIL] (0.00)
|
||||
|
||||
API call error: Number of request tokens has exceeded your per-minute rate limit (https://docs.anthropic.com/en/api/rate-limits); see the response headers for current usage. Please reduce the prompt length or the maximum tokens requested, or try again later. You may also contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase., status 429, type rate_limit_error
|
||||
API call error: Number of request tokens has exceeded your per-minute rate limit (https://docs.claude.com/en/api/rate-limits); see the response headers for current usage. Please reduce the prompt length or the maximum tokens requested, or try again later. You may also contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase., status 429, type rate_limit_error
|
||||
---
|
||||
API call error: Number of request tokens has exceeded your per-minute rate limit (https://docs.anthropic.com/en/api/rate-limits); see the response headers for current usage. Please reduce the prompt length or the maximum tokens requested, or try again later. You may also contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase., status 429, type rate_limit_error","[FAIL] (1.34)
|
||||
API call error: Number of request tokens has exceeded your per-minute rate limit (https://docs.claude.com/en/api/rate-limits); see the response headers for current usage. Please reduce the prompt length or the maximum tokens requested, or try again later. You may also contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase., status 429, type rate_limit_error","[FAIL] (1.34)
|
||||
|
||||
Expected output to contain all of ""parties involved, property details, term and rent, responsibilities, consent and notices, special provisions""
|
||||
---
|
||||
|
||||
|
Can't render this file because it is too large.
|
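Every failure in the log above is the same 429 with the same advice: reduce tokens or retry later. When a large eval run hits per-minute limits, wrapping each API call in exponential backoff with jitter usually clears these transient failures without a manual rerun. A minimal sketch — the helper name and defaults are illustrative, not part of promptfoo or the anthropic SDK:

```python
import random
import time

def call_with_backoff(fn, max_retries=5, base_delay=1.0, is_rate_limit=lambda e: True):
    """Retry fn() with exponential backoff and jitter on rate-limit errors.

    In a real setup you would pass something like
    `is_rate_limit=lambda e: isinstance(e, anthropic.RateLimitError)`
    so that only 429s are retried and other errors propagate immediately.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as exc:
            if not is_rate_limit(exc) or attempt == max_retries - 1:
                raise
            # Sleep 1s, 2s, 4s, ... plus jitter so parallel eval rows
            # do not all retry in lockstep against the same limit.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.25))
```

The rate-limit response headers mentioned in the error message can also be read to size `base_delay` more precisely.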
@@ -25,7 +25,7 @@ For this example you will need to install the following dependencies in order fo

### Getting Started

-To get started, set your ANTHROPIC_API_KEY environment variable, or other required keys for the providers you selected. You can do `export ANTHROPIC_API_KEY=YOUR_API_KEY`.
+To get started, set your CLAUDE_API_KEY environment variable, or other required keys for the providers you selected. You can do `export CLAUDE_API_KEY=YOUR_API_KEY`.

Then, `cd` into the `evaluation` directory and write `npx promptfoo@latest eval -c promptfooconfig.yaml --output ../data/results.csv`
@@ -14,7 +14,7 @@ def llm_eval(summary, input):
Returns:
bool: True if the average score is above the threshold, False otherwise.
"""
-client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
+client = anthropic.Anthropic(api_key=os.getenv("CLAUDE_API_KEY"))

# You could include an example here too and likely improve performance further!
prompt = f"""Evaluate the following summary based on these criteria:
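The docstring in this hunk describes the grading rule: average the per-criterion scores from the LLM judge and return True when the average clears a threshold. Decoupled from the Claude call, that rule reduces to a few lines — a sketch only, with an illustrative threshold rather than the cookbook's actual setting:

```python
def passes_threshold(scores, threshold=4.0):
    """Pass/fail rule described in llm_eval's docstring: the mean of the
    per-criterion scores must meet the threshold. Threshold is illustrative."""
    return sum(scores) / len(scores) >= threshold
```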
@@ -54,7 +54,7 @@
"- seaborn\n",
"- [promptfoo](https://www.promptfoo.dev/) (for evaluation)\n",
"\n",
-"You'll also need an Anthropic API key.\n",
+"You'll also need a Claude API key.\n",
"\n",
"Let's start by installing the required packages and setting up our environment:"
]
@@ -99,7 +99,7 @@
"# load_dotenv()\n",
"\n",
"# or add your key directly\n",
-"api_key = 'ANTHROPIC_API_KEY' # Replace ANTHROPIC_API_KEY with your actual API key\n",
+"api_key = 'CLAUDE_API_KEY' # Replace CLAUDE_API_KEY with your actual API key\n",
"client = anthropic.Anthropic(api_key=api_key)\n",
"\n",
"print(\"Setup complete!\")"
@@ -1060,7 +1060,7 @@
"\n",
"As mentioned in the introduction to this cookbook, evaluating the quality of a summary is hard work. This is because there are many ways to summarize a document, and different summaries may be equally valid. Depending on the use case, different aspects of a summary may be more or less important.\n",
"\n",
-"You can read more about our empirical methodology to prompt engineering [here](https://docs.anthropic.com/en/docs/prompt-engineering). Using a Jupyter Notebook is a great way to start prompt engineering but as your datasets grow larger and your prompts more numerous it is important to leverage tooling that will scale with you. \n",
+"You can read more about our empirical methodology to prompt engineering [here](https://docs.claude.com/en/docs/prompt-engineering). Using a Jupyter Notebook is a great way to start prompt engineering but as your datasets grow larger and your prompts more numerous it is important to leverage tooling that will scale with you. \n",
"\n",
"In this section of the guide we will explore using [Promptfoo](https://www.promptfoo.dev/) an open source LLM evaluation toolkit. To get started head over to the `./evaluation` directory and checkout the `./evaluation/README.md`.\n",
"\n",
@@ -20,7 +20,7 @@ See the official docs [here](https://www.promptfoo.dev/docs/getting-started)

### Getting Started

-To get started, set your ANTHROPIC_API_KEY environment variable, or other required keys for the providers you selected. You can do `export ANTHROPIC_API_KEY=YOUR_API_KEY`.
+To get started, set your CLAUDE_API_KEY environment variable, or other required keys for the providers you selected. You can do `export CLAUDE_API_KEY=YOUR_API_KEY`.

Then, `cd` into the `evaluation` directory and write `npx promptfoo@latest eval -c promptfooconfig.yaml --output ../data/results.csv`
@@ -100,8 +100,8 @@
"import pandas as pd\n",
"from IPython.display import display\n",
"\n",
-"# Set your Anthropic API key\n",
-"os.environ[\"ANTHROPIC_API_KEY\"] = \"YOUR_ANTHROPIC_API_KEY\"\n",
+"# Set your Claude API key\n",
+"os.environ[\"CLAUDE_API_KEY\"] = \"YOUR_CLAUDE_API_KEY\"\n",
"os.environ[\"VOYAGE_API_KEY\"] = \"YOUR_VOYAGE_API_KEY\"\n",
"\n",
"# Initialize the Anthropic client\n",
@@ -610,7 +610,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"Now let's use this prompt with the Anthropic API to generate SQL:"
+"Now let's use this prompt with the Claude API to generate SQL:"
]
},
{
2
third_party/Deepgram/README.md
vendored
@@ -1,4 +1,4 @@
-# Deepgram <> Anthropic Cookbooks
+# Deepgram <> Claude Cookbooks

[Deepgram](https://deepgram.com/) is a foundational AI company providing the speech-to-text, text-to-speech, text-to-text and language intelligence capabilities you need to make your data readable and actionable by human or machines.
6
third_party/Deepgram/prerecorded_audio.ipynb
vendored
@@ -225,10 +225,10 @@
"# Load the transcript from the JSON file\n",
"message_text = get_transcript(transcription_file)\n",
"\n",
-"# Initialize the Anthropic API client\n",
+"# Initialize the Claude API client\n",
"client = anthropic.Anthropic(\n",
-" # Defaults to os.environ.get(\"ANTHROPIC_API_KEY\")\n",
-" # Anthropic API key\n",
+" # Defaults to os.environ.get(\"CLAUDE_API_KEY\")\n",
+" # Claude API key\n",
" api_key=\"🔑🔑🔑 Your API Key here! 🔑🔑🔑\"\n",
")\n",
"\n",
@@ -58,7 +58,7 @@
"outputs": [],
"source": [
"import os\n",
-"os.environ['ANTHROPIC_API_KEY'] = 'YOUR ANTHROPIC API KEY'"
+"os.environ['CLAUDE_API_KEY'] = 'YOUR CLAUDE API KEY'"
]
},
{
@@ -84,7 +84,7 @@
"id": "No_1L4P4K5J2"
},
"source": [
-"### Set Anthropic API Key"
+"### Set Claude API Key"
]
},
{
@@ -96,7 +96,7 @@
"outputs": [],
"source": [
"import os\n",
-"os.environ['ANTHROPIC_API_KEY'] = 'YOUR ANTHROPIC API KEY'"
+"os.environ['CLAUDE_API_KEY'] = 'YOUR CLAUDE API KEY'"
]
},
{
2
third_party/LlamaIndex/Multi_Modal.ipynb
vendored
@@ -48,7 +48,7 @@
"outputs": [],
"source": [
"import os\n",
-"os.environ['ANTHROPIC_API_KEY'] = 'YOUR ANTHROPIC API KEY'"
+"os.environ['CLAUDE_API_KEY'] = 'YOUR CLAUDE API KEY'"
]
},
{
2
third_party/LlamaIndex/README.md
vendored
@@ -1,4 +1,4 @@
-# LlamaIndex <> Anthropic Cookbooks
+# LlamaIndex <> Claude Cookbooks

[LlamaIndex](https://github.com/run-llama/llama_index) is a data framework for LLM-based applications that benefit from context augmentation.
2
third_party/LlamaIndex/ReAct_Agent.ipynb
vendored
@@ -63,7 +63,7 @@
"import os\n",
"\n",
"# Using Anthropic LLM API for LLM\n",
-"os.environ['ANTHROPIC_API_KEY'] = 'YOUR ANTHROPIC API KEY'\n",
+"os.environ['CLAUDE_API_KEY'] = 'YOUR CLAUDE API KEY'\n",
"\n",
"from IPython.display import display, HTML"
]
@@ -84,7 +84,7 @@
"id": "No_1L4P4K5J2"
},
"source": [
-"### Set Anthropic API Key"
+"### Set Claude API Key"
]
},
{
@@ -96,7 +96,7 @@
"outputs": [],
"source": [
"import os\n",
-"os.environ['ANTHROPIC_API_KEY'] = 'YOUR ANTHROPIC API KEY'"
+"os.environ['CLAUDE_API_KEY'] = 'YOUR CLAUDE API KEY'"
]
},
{
@@ -53,7 +53,7 @@
"outputs": [],
"source": [
"import os\n",
-"os.environ['ANTHROPIC_API_KEY'] = 'YOUR ANTHROPIC API KEY'"
+"os.environ['CLAUDE_API_KEY'] = 'YOUR CLAUDE API KEY'"
]
},
{
8
third_party/MongoDB/rag_using_mongodb.ipynb
vendored
@@ -16,7 +16,7 @@
"\n",
"\n",
"You will need the following:\n",
-"- Anthropic API Key\n",
+"- Claude API Key\n",
"- VoyageAI API Key\n",
"- Hugging Face Access Token"
]
@@ -469,7 +469,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"The next step in this section is to import the anthropic library and load the client to access the anthropic’s methods for handling messages and accessing Claude models. Ensure you obtain an Anthropic API key located within the settings page on the [official Anthropic website](https://console.anthropic.com/settings/keys).\n"
+"The next step in this section is to import the anthropic library and load the client to access Anthropic’s methods for handling messages and accessing Claude models. Ensure you obtain a Claude API key located within the settings page on the [official Anthropic website](https://platform.claude.com/settings/keys).\n"
]
},
{
@@ -481,7 +481,7 @@
"outputs": [],
"source": [
"import anthropic\n",
-"client = anthropic.Client(api_key=userdata.get(\"ANTHROPIC_API_KEY\"))"
+"client = anthropic.Client(api_key=userdata.get(\"CLAUDE_API_KEY\"))"
]
},
{
@@ -492,7 +492,7 @@
"\n",
"1. Vector Search Execution: The function begins by calling `vector_search` with the user's query and a specified collection as arguments. This performs a search within the collection, leveraging vector embeddings to find relevant information related to the query.\n",
"2. Compile Search Results: `search_result` is initialized as an empty string to aggregate information from the search. The search results are compiled by iterating over the results returned by the `vector_search` function, formates each item's details (title, company name, URL, publication date, article URL, and description) into a human-readable string, appending this information to search_result with a newline character \\n at the end of each entry.\n",
-"3. Generate Response Using Anthropic Client: The function then constructs a request to the Anthropic API (through a client object, presumably an instance of the anthropic. Client class created earlier). It specifies:\n",
+"3. Generate Response Using Anthropic Client: The function then constructs a request to the Claude API (through a client object, presumably an instance of the anthropic. Client class created earlier). It specifies:\n",
"- The model to use (\"claude-3-opus-20240229\") indicates a specific version of the Claude 3 model.\n",
"- The maximum token limit for the generated response (max_tokens=1024).\n",
"- A system description guides the model to behave as a \"Venture Capital Tech Analyst\" with access to tech company articles and information, using this context to advise.\n",
@@ -92,7 +92,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"And grab the required API keys. We will need API keys for [Claude](https://docs.anthropic.com/claude/reference/getting-started-with-the-api), [Voyage AI](https://docs.voyageai.com/install/), and [Pinecone](https://docs.pinecone.io/docs/quickstart)."
+"And grab the required API keys. We will need API keys for [Claude](https://docs.claude.com/claude/reference/getting-started-with-the-api), [Voyage AI](https://docs.voyageai.com/install/), and [Pinecone](https://docs.pinecone.io/docs/quickstart)."
]
},
{
@@ -102,7 +102,7 @@
"outputs": [],
"source": [
"# Insert your API keys here\n",
-"ANTHROPIC_API_KEY=\"<YOUR_ANTHROPIC_API_KEY>\"\n",
+"CLAUDE_API_KEY=\"<YOUR_CLAUDE_API_KEY>\"\n",
"PINECONE_API_KEY=\"<YOUR_PINECONE_API_KEY>\"\n",
"VOYAGE_API_KEY=\"<YOUR_VOYAGE_API_KEY>\""
]
@@ -684,7 +684,7 @@
"source": [
"We can see the XML format being used throughout the prompt when explaining to the LLM how it should use tools.\n",
"\n",
-"Next we initialize our connection to Anthropic, for this we need an [Anthropic API key](https://console.anthropic.com/)."
+"Next we initialize our connection to Anthropic, for this we need a [Claude API key](https://platform.claude.com/)."
]
},
{
@@ -699,7 +699,7 @@
"\n",
"# chat completion llm\n",
"llm = ChatAnthropic(\n",
-" anthropic_api_key=ANTHROPIC_API_KEY,\n",
+" anthropic_api_key=CLAUDE_API_KEY,\n",
" model_name=\"claude-3-opus-20240229\", # change \"opus\" -> \"sonnet\" for speed\n",
" temperature=0.0\n",
")"
@@ -21,7 +21,7 @@
"metadata": {},
"source": [
"## Setup\n",
-"First, let's install the necessary libraries and set the API keys we will need to use in this notebook. We will need to get a [Claude API key](https://docs.anthropic.com/claude/reference/getting-started-with-the-api), a free [Pinecone API key](https://docs.pinecone.io/docs/quickstart), and a free [Voyage AI API key](https://docs.voyageai.com/install/). "
+"First, let's install the necessary libraries and set the API keys we will need to use in this notebook. We will need to get a [Claude API key](https://docs.claude.com/claude/reference/getting-started-with-the-api), a free [Pinecone API key](https://docs.pinecone.io/docs/quickstart), and a free [Voyage AI API key](https://docs.voyageai.com/install/). "
]
},
{
@@ -40,7 +40,7 @@
"outputs": [],
"source": [
"# Insert your API keys here\n",
-"ANTHROPIC_API_KEY=\"<YOUR_ANTHROPIC_API_KEY>\"\n",
+"CLAUDE_API_KEY=\"<YOUR_CLAUDE_API_KEY>\"\n",
"PINECONE_API_KEY=\"<YOUR_PINECONE_API_KEY>\"\n",
"VOYAGE_API_KEY=\"<YOUR_VOYAGE_API_KEY>\""
]
@@ -392,7 +392,7 @@
"source": [
"import anthropic\n",
"\n",
-"client = anthropic.Anthropic(api_key=ANTHROPIC_API_KEY)\n",
+"client = anthropic.Anthropic(api_key=CLAUDE_API_KEY)\n",
"def get_completion(prompt):\n",
" completion = client.completions.create(\n",
" model=\"claude-2.1\",\n",
@@ -433,7 +433,7 @@
"wikipedia_search_tool = WikipediaSearchTool()\n",
"ANTHROPIC_SEARCH_MODEL = \"claude-2\"\n",
"\n",
-"client = ClientWithRetrieval(api_key=os.environ['ANTHROPIC_API_KEY'], verbose=True, search_tool = wikipedia_search_tool)\n",
+"client = ClientWithRetrieval(api_key=os.environ['CLAUDE_API_KEY'], verbose=True, search_tool = wikipedia_search_tool)\n",
"\n",
"query = \"Which movie came out first: Oppenheimer, or Are You There God It's Me Margaret?\"\n",
"\n",
2
third_party/WolframAlpha/using_llm_api.ipynb
vendored
@@ -13,7 +13,7 @@
"metadata": {},
"source": [
"## Step 1: Set up the environment\n",
-"First, let's install the required libraries and set up the Anthropic API client. We also will need to set our APP ID for using WolframAlpha. You can sign up and create a new App ID for this project for free [here](https://developer.wolframalpha.com/access)."
+"First, let's install the required libraries and set up the Claude API client. We also will need to set our APP ID for using WolframAlpha. You can sign up and create a new App ID for this project for free [here](https://developer.wolframalpha.com/access)."
]
},
{
@@ -14,7 +14,7 @@
"source": [
"## Step 1: Set up the environment\n",
"\n",
-"First, let's install the required libraries and set up the Anthropic API client."
+"First, let's install the required libraries and set up the Claude API client."
]
},
{
@@ -15,7 +15,7 @@
"source": [
"## Step 1: Set up the environment\n",
"\n",
-"First, let's install the required libraries and set up the Anthropic API client."
+"First, let's install the required libraries and set up the Claude API client."
]
},
{
@@ -17,7 +17,7 @@
"source": [
"## Set up the environment\n",
"\n",
-"First, let's install the required libraries and set up the Anthropic API client."
+"First, let's install the required libraries and set up the Claude API client."
]
},
{
@@ -34,7 +34,7 @@
"\n",
"#### Why do we need to manage memory?\n",
"\n",
-"LLMs have finite context windows (200k tokens for Claude 4 Sonnet & Opus). This means that for any request, if the sum of prompt tokens and output tokens exceeds the model’s context window, the system will return a validation error. As many teams building with LLMs quickly learn, there is additional complexity in identifying and working within the *effective* [context window](https://docs.anthropic.com/en/docs/build-with-claude/context-windows) of an LLM. See our tips for [long context prompting](https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/long-context-tips) to learn more about effective context windows and best practices.\n",
+"LLMs have finite context windows (200k tokens for Claude 4 Sonnet & Opus). This means that for any request, if the sum of prompt tokens and output tokens exceeds the model’s context window, the system will return a validation error. As many teams building with LLMs quickly learn, there is additional complexity in identifying and working within the *effective* [context window](https://docs.claude.com/en/docs/build-with-claude/context-windows) of an LLM. See our tips for [long context prompting](https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/long-context-tips) to learn more about effective context windows and best practices.\n",
"\n",
"In addition to the above, memory is important for the following reasons:\n",
"- **Long context windows are computationally expensive:** Attention mechanisms scale quadratically—doubling context length quadruples compute cost. Most tasks only need a small fraction of available context, making it wasteful to process millions of irrelevant tokens. This is why humans don't memorize entire textbooks; we take notes and build mental models instead.\n",
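The validation error described in this cell (prompt tokens plus requested output tokens exceeding the window) can be caught before a request is ever sent. A rough pre-flight sketch — the 4-characters-per-token ratio is a crude heuristic for English text, not a real tokenizer, and for exact counts you would use the SDK's token-counting endpoint instead:

```python
def fits_context(prompt, max_output_tokens, context_window=200_000):
    """Rough pre-flight check against a model's context window.

    Estimates prompt tokens as len(prompt) // 4 (heuristic only) and checks
    that prompt estimate + requested output tokens fit in the window. The
    200k default mirrors the window size quoted in the text above.
    """
    est_prompt_tokens = len(prompt) // 4
    return est_prompt_tokens + max_output_tokens <= context_window
```

In practice you would run this before each call and summarize or truncate the prompt when it returns False, rather than letting the API reject the request.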
@@ -73,8 +73,8 @@
"\n",
"# api key must be in .env file in project\n",
"load_dotenv()\n",
-"if os.getenv(\"ANTHROPIC_API_KEY\") is None:\n",
-" raise ValueError(\"ANTHROPIC_API_KEY not found in .env file\")\n",
+"if os.getenv(\"CLAUDE_API_KEY\") is None:\n",
+" raise ValueError(\"CLAUDE_API_KEY not found in .env file\")\n",
"\n",
"client = Anthropic()"
]
@@ -164,7 +164,7 @@
"source": [
"### Implementation 1: Simple Memory Tool\n",
"\n",
-"*This implementation is a reflection of our agents quickstarts repo [here](https://github.com/anthropics/anthropic-quickstarts/tree/main/agents/tools). For more information on tool use, see the Anthropic API tools [docs](https://docs.anthropic.com/en/docs/build-with-claude/tool-use/overview).*\n",
+"*This implementation is a reflection of our agents quickstarts repo [here](https://github.com/anthropics/anthropic-quickstarts/tree/main/agents/tools). For more information on tool use, see the Claude API tools [docs](https://docs.claude.com/en/docs/build-with-claude/tool-use/overview).*\n",
"\n",
"The `SimpleMemory()` tool gives the model a scratchpad to manage memory. This is maintained as a single string that can be read or updated.\n",
"\n",
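The scratchpad idea described in this cell — one string the model can read or overwrite via tool calls — reduces to very little code. A sketch of the shape, not the cookbook's actual `SimpleMemory` implementation:

```python
class SimpleMemory:
    """Scratchpad memory held as a single string, per the description above.

    A sketch only: in the real tool these methods would be exposed to the
    model as tool-call handlers rather than called directly.
    """

    def __init__(self):
        self._content = ""

    def read(self):
        # Return the current scratchpad contents verbatim.
        return self._content

    def write(self, new_content):
        # Overwrite rather than append: the model rewrites the full
        # scratchpad each time it updates its notes.
        self._content = new_content
        return "memory updated"
```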
@@ -373,7 +373,7 @@
"\n",
"This implementation gives Claude the ability to interact with a 'memory' system represented to the model as a hierarchical file structure. The example below implements a basic directory, where the 'files' are just strings that we've labeled as plaintext files (the '.txt' label has no impact functionally, but can be useful for behavioral consistency).\n",
"\n",
-"Hierarchical directory structures are easily readable and well-understood by humans and LLMs alike, so it's fitting to use them as a mechanism to represent persistent state more generally to an LLM. While you can connect to and define access patterns for any external storage system, a quick way to get started is with Anthropic's new <b>[Files API](https://docs.anthropic.com/en/docs/build-with-claude/files)</b>. The Files API enables storage and retrieval of objects for use in future requests.\n",
+"Hierarchical directory structures are easily readable and well-understood by humans and LLMs alike, so it's fitting to use them as a mechanism to represent persistent state more generally to an LLM. While you can connect to and define access patterns for any external storage system, a quick way to get started is with Anthropic's new <b>[Files API](https://docs.claude.com/en/docs/build-with-claude/files)</b>. The Files API enables storage and retrieval of objects for use in future requests.\n",
"\n",
"Ideally you (the developer & domain expert) would construct an initial state for the directory structure that adequately represents your domain context. Having some pre-defined structure provides useful behavioral queues for the model, but you should also introduce more explicit guidance to guard against excessive reads / writes / new file creation / etc."
]
@@ -595,7 +595,7 @@
"class StorageManager:\n",
" def __init__(self, api_key):\n",
" if api_key is None:\n",
-" raise ValueError(\"ANTHROPIC_API_KEY not available.\")\n",
+" raise ValueError(\"CLAUDE_API_KEY not available.\")\n",
" self.api_key = api_key\n",
" self.base_url = \"https://api.anthropic.com/v1/files\"\n",
" self.headers = {\n",
@@ -662,7 +662,7 @@
" \n",
"# example usage\n",
"#file_path = \"/Users/user/Downloads/SB1029-ProjectUpdate-FINAL_020317-A11Y.pdf\" # REPLACE\n",
-"storage_manager = StorageManager(os.getenv(\"ANTHROPIC_API_KEY\"))\n",
+"storage_manager = StorageManager(os.getenv(\"CLAUDE_API_KEY\"))\n",
"#uploaded = storage_manager.upload_file(file_path)\n",
"#storage_manager.get_file_metadata(uploaded['id'])\n",
"storage_manager.list_files()[:2]"
@@ -816,7 +816,7 @@
" new_memory_object = kwargs.get('new_memory_object')\n",
"\n",
" if action == 'get':\n",
-" # we need to build the file messages from the file metadata (https://docs.anthropic.com/en/docs/docs/build-with-claude/files)\n",
+" # we need to build the file messages from the file metadata (https://docs.claude.com/en/docs/build-with-claude/files)\n",
" message_refs = [{\"type\": \"document\", \"source\": { \"type\": \"file\", \"file_id\": self.full_memory.get(path)}} for path in paths]\n",
" return message_refs\n",
"\n",
@@ -14,7 +14,7 @@
"metadata": {},
"source": [
"## Step 1: Set up the environment\n",
-"First, let's install the required libraries and set up the Anthropic API client."
+"First, let's install the required libraries and set up the Claude API client."
]
},
{
@@ -19,7 +19,7 @@
"metadata": {},
"source": [
"## Setup\n",
-"First, let's install the necessary libraries and set up the Anthropic API client:"
+"First, let's install the necessary libraries and set up the Claude API client:"
]
},
{