mirror of
https://github.com/anthropics/claude-cookbooks.git
synced 2025-10-06 01:00:28 +03:00
533 lines
15 KiB
Plaintext
533 lines
15 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "20de56a8",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Prompting Claude for \"JSON Mode\""
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "8e7fe136",
|
|
"metadata": {},
|
|
"source": [
|
|
"Claude doesn't have a formal \"JSON Mode\" with constrained sampling. But not to worry -- you can still get reliable JSON from Claude! This recipe will show you how."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "e5e1a230",
|
|
"metadata": {},
|
|
"source": [
|
|
"First, let's look at Claude's default behavior."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "0c07a2c1",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"%pip install anthropic"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 1,
|
|
"id": "fc39114b",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from anthropic import Anthropic\n",
|
|
"import json\n",
|
|
"import re\n",
|
|
"from pprint import pprint"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 3,
|
|
"id": "1b991340",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"client = Anthropic()\n",
|
|
"MODEL_NAME = \"claude-3-opus-20240229\""
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 4,
|
|
"id": "c28ca1ad",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Here is a JSON dictionary with names of famous athletes and their respective sports:\n",
|
|
"\n",
|
|
"{\n",
|
|
" \"athletes\": [\n",
|
|
" {\n",
|
|
" \"name\": \"Usain Bolt\",\n",
|
|
" \"sport\": \"Track and Field\"\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"name\": \"Michael Phelps\",\n",
|
|
" \"sport\": \"Swimming\"\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"name\": \"Serena Williams\",\n",
|
|
" \"sport\": \"Tennis\"\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"name\": \"LeBron James\",\n",
|
|
" \"sport\": \"Basketball\"\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"name\": \"Lionel Messi\",\n",
|
|
" \"sport\": \"Soccer\"\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"name\": \"Simone Biles\",\n",
|
|
" \"sport\": \"Gymnastics\"\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"name\": \"Tom Brady\",\n",
|
|
" \"sport\": \"American Football\"\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"name\": \"Muhammad Ali\",\n",
|
|
" \"sport\": \"Boxing\"\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"name\": \"Nadia Comaneci\",\n",
|
|
" \"sport\": \"Gymnastics\"\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"name\": \"Michael Jordan\",\n",
|
|
" \"sport\": \"Basketball\"\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"name\": \"Pelé\",\n",
|
|
" \"sport\": \"Soccer\"\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"name\": \"Roger Federer\",\n",
|
|
" \"sport\": \"Tennis\"\n",
|
|
" }\n",
|
|
" ]\n",
|
|
"}\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"message = client.messages.create(\n",
|
|
" model=MODEL_NAME,\n",
|
|
" max_tokens=1024,\n",
|
|
" messages=[\n",
|
|
" {\n",
|
|
" \"role\": \"user\", \n",
|
|
" \"content\": \"Give me a JSON dict with names of famous athletes & their sports.\"\n",
|
|
" },\n",
|
|
" ]\n",
|
|
").content[0].text\n",
|
|
"print(message)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "0a63ad7e",
|
|
"metadata": {},
|
|
"source": [
|
|
"Claude followed instructions and outputted a nice dictionary, which we can extract with code:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 8,
|
|
"id": "08626553",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"{'athletes': [{'name': 'Usain Bolt', 'sport': 'Track and Field'},\n",
|
|
" {'name': 'Michael Phelps', 'sport': 'Swimming'},\n",
|
|
" {'name': 'Serena Williams', 'sport': 'Tennis'},\n",
|
|
" {'name': 'LeBron James', 'sport': 'Basketball'},\n",
|
|
" {'name': 'Lionel Messi', 'sport': 'Soccer'},\n",
|
|
" {'name': 'Simone Biles', 'sport': 'Gymnastics'},\n",
|
|
" {'name': 'Tom Brady', 'sport': 'American Football'},\n",
|
|
" {'name': 'Muhammad Ali', 'sport': 'Boxing'},\n",
|
|
" {'name': 'Nadia Comaneci', 'sport': 'Gymnastics'},\n",
|
|
" {'name': 'Michael Jordan', 'sport': 'Basketball'},\n",
|
|
" {'name': 'Pelé', 'sport': 'Soccer'},\n",
|
|
" {'name': 'Roger Federer', 'sport': 'Tennis'}]}"
|
|
]
|
|
},
|
|
"execution_count": 8,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"def extract_json(response):\n",
|
|
" json_start = response.index(\"{\")\n",
|
|
" json_end = response.rfind(\"}\")\n",
|
|
" return json.loads(response[json_start:json_end + 1])\n",
|
|
"extract_json(message)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "39275885",
|
|
"metadata": {},
|
|
"source": [
|
|
"But what if we want Claude to skip the preamble and go straight to the JSON? One simple way is to prefill Claude's response and include a \"{\" character."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 9,
|
|
"id": "155e088a",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"\n",
|
|
" \"athletes\":[\n",
|
|
" {\n",
|
|
" \"name\":\"Michael Jordan\",\n",
|
|
" \"sport\":\"Basketball\"\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"name\":\"Babe Ruth\",\n",
|
|
" \"sport\":\"Baseball\"\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"name\":\"Muhammad Ali\",\n",
|
|
" \"sport\":\"Boxing\"\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"name\":\"Serena Williams\",\n",
|
|
" \"sport\":\"Tennis\"\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"name\":\"Wayne Gretzky\",\n",
|
|
" \"sport\":\"Hockey\"\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"name\":\"Michael Phelps\",\n",
|
|
" \"sport\":\"Swimming\"\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"name\":\"Usain Bolt\",\n",
|
|
" \"sport\":\"Track and Field\"\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"name\":\"Mia Hamm\",\n",
|
|
" \"sport\":\"Soccer\"\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"name\":\"Michael Schumacher\",\n",
|
|
" \"sport\":\"Formula 1 Racing\"\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"name\":\"Simone Biles\",\n",
|
|
" \"sport\":\"Gymnastics\"\n",
|
|
" }\n",
|
|
" ]\n",
|
|
"}\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"message = client.messages.create(\n",
|
|
" model=MODEL_NAME,\n",
|
|
" max_tokens=1024,\n",
|
|
" messages=[\n",
|
|
" {\n",
|
|
" \"role\": \"user\", \n",
|
|
" \"content\": \"Give me a JSON dict with names of famous athletes & their sports.\"\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"role\": \"assistant\",\n",
|
|
" \"content\": \"Here is the JSON requested:\\n{\"\n",
|
|
" }\n",
|
|
" ]\n",
|
|
").content[0].text\n",
|
|
"print(message)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "64e94c65",
|
|
"metadata": {},
|
|
"source": [
|
|
"Now all we have to do is add back the \"{\" that we prefilled and we can extract the JSON."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 11,
|
|
"id": "6c066ac6",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"{'athletes': [{'name': 'Michael Jordan', 'sport': 'Basketball'},\n",
|
|
" {'name': 'Babe Ruth', 'sport': 'Baseball'},\n",
|
|
" {'name': 'Muhammad Ali', 'sport': 'Boxing'},\n",
|
|
" {'name': 'Serena Williams', 'sport': 'Tennis'},\n",
|
|
" {'name': 'Wayne Gretzky', 'sport': 'Hockey'},\n",
|
|
" {'name': 'Michael Phelps', 'sport': 'Swimming'},\n",
|
|
" {'name': 'Usain Bolt', 'sport': 'Track and Field'},\n",
|
|
" {'name': 'Mia Hamm', 'sport': 'Soccer'},\n",
|
|
" {'name': 'Michael Schumacher', 'sport': 'Formula 1 Racing'},\n",
|
|
" {'name': 'Simone Biles', 'sport': 'Gymnastics'}]}"
|
|
]
|
|
},
|
|
"execution_count": 11,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"output_json = json.loads(\"{\" + message[:message.rfind(\"}\") + 1])\n",
|
|
"output_json"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "cd4492fd",
|
|
"metadata": {},
|
|
"source": [
|
|
"For very long and complicated prompts, which contain multiple JSON outputs so that a string search for \"{\" and \"}\" don't do the trick, you can also have Claude output each JSON item in specified tags for future extraction."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 13,
|
|
"id": "443ad932",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
" \n",
|
|
"\n",
|
|
"<athlete_sports>\n",
|
|
"{\n",
|
|
" \"Michael Jordan\": \"Basketball\",\n",
|
|
" \"Serena Williams\": \"Tennis\",\n",
|
|
" \"Lionel Messi\": \"Soccer\", \n",
|
|
" \"Usain Bolt\": \"Track and Field\",\n",
|
|
" \"Michael Phelps\": \"Swimming\"\n",
|
|
"}\n",
|
|
"</athlete_sports>\n",
|
|
"\n",
|
|
"<athlete_name>\n",
|
|
"{\n",
|
|
" \"first\": [\"Magnificent\", \"Motivating\", \"Memorable\"],\n",
|
|
" \"last\": [\"Joyful\", \"Jumping\", \"Jocular\"]\n",
|
|
"}\n",
|
|
"</athlete_name>\n",
|
|
"\n",
|
|
"<athlete_name>\n",
|
|
"{\n",
|
|
" \"first\": [\"Skillful\", \"Strong\", \"Superstar\"],\n",
|
|
" \"last\": [\"Winning\", \"Willful\", \"Wise\"]\n",
|
|
"}\n",
|
|
"</athlete_name>\n",
|
|
"\n",
|
|
"<athlete_name>\n",
|
|
"{\n",
|
|
" \"first\": [\"Legendary\", \"Lively\", \"Leaping\"],\n",
|
|
" \"last\": [\"Magical\", \"Marvelous\", \"Masterful\"] \n",
|
|
"}\n",
|
|
"</athlete_name>\n",
|
|
"\n",
|
|
"<athlete_name>\n",
|
|
"{\n",
|
|
" \"first\": [\"Unbeatable\", \"Unbelievable\", \"Unstoppable\"],\n",
|
|
" \"last\": [\"Brave\", \"Bold\", \"Brilliant\"]\n",
|
|
"}\n",
|
|
"</athlete_name>\n",
|
|
"\n",
|
|
"<athlete_name>\n",
|
|
"{\n",
|
|
" \"first\": [\"Marvelous\", \"Methodical\", \"Medalist\"],\n",
|
|
" \"last\": [\"Powerful\", \"Persevering\", \"Precise\"]\n",
|
|
"}\n",
|
|
"</athlete_name>\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"message = client.messages.create(\n",
|
|
" model=MODEL_NAME,\n",
|
|
" max_tokens=1024,\n",
|
|
" messages=[\n",
|
|
" {\n",
|
|
" \"role\": \"user\", \n",
|
|
" \"content\": \"\"\"Give me a JSON dict with the names of 5 famous athletes & their sports.\n",
|
|
"Put this dictionary in <athlete_sports> tags. \n",
|
|
"\n",
|
|
"Then, for each athlete, output an additional JSON dictionary. In each of these additional dictionaries:\n",
|
|
"- Include two keys: the athlete's first name and last name.\n",
|
|
"- For the values, list three words that start with the same letter as that name.\n",
|
|
"Put each of these additional dictionaries in separate <athlete_name> tags.\"\"\"\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"role\": \"assistant\",\n",
|
|
" \"content\": \"Here is the JSON requested:\"\n",
|
|
" }\n",
|
|
" ],\n",
|
|
").content[0].text\n",
|
|
"print(message)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "74043369",
|
|
"metadata": {},
|
|
"source": [
|
|
"Now, we can use an extraction regex to get all the dictionaries."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 14,
|
|
"id": "bd847a70",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import re\n",
|
|
"def extract_between_tags(tag: str, string: str, strip: bool = False) -> list[str]:\n",
|
|
" ext_list = re.findall(f\"<{tag}>(.+?)</{tag}>\", string, re.DOTALL)\n",
|
|
" if strip:\n",
|
|
" ext_list = [e.strip() for e in ext_list]\n",
|
|
" return ext_list\n",
|
|
"\n",
|
|
"athlete_sports_dict = json.loads(extract_between_tags(\"athlete_sports\", message)[0])\n",
|
|
"athlete_name_dicts = [\n",
|
|
" json.loads(d)\n",
|
|
" for d in extract_between_tags(\"athlete_name\", message)\n",
|
|
"]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 15,
|
|
"id": "eb61ee06",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"{'Lionel Messi': 'Soccer',\n",
|
|
" 'Michael Jordan': 'Basketball',\n",
|
|
" 'Michael Phelps': 'Swimming',\n",
|
|
" 'Serena Williams': 'Tennis',\n",
|
|
" 'Usain Bolt': 'Track and Field'}\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"pprint(athlete_sports_dict)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 16,
|
|
"id": "57dade0f",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"[{'first': ['Magnificent',\n",
|
|
" 'Motivating',\n",
|
|
" 'Memorable'],\n",
|
|
" 'last': ['Joyful',\n",
|
|
" 'Jumping',\n",
|
|
" 'Jocular']},\n",
|
|
" {'first': ['Skillful',\n",
|
|
" 'Strong',\n",
|
|
" 'Superstar'],\n",
|
|
" 'last': ['Winning',\n",
|
|
" 'Willful',\n",
|
|
" 'Wise']},\n",
|
|
" {'first': ['Legendary',\n",
|
|
" 'Lively',\n",
|
|
" 'Leaping'],\n",
|
|
" 'last': ['Magical',\n",
|
|
" 'Marvelous',\n",
|
|
" 'Masterful']},\n",
|
|
" {'first': ['Unbeatable',\n",
|
|
" 'Unbelievable',\n",
|
|
" 'Unstoppable'],\n",
|
|
" 'last': ['Brave',\n",
|
|
" 'Bold',\n",
|
|
" 'Brilliant']},\n",
|
|
" {'first': ['Marvelous',\n",
|
|
" 'Methodical',\n",
|
|
" 'Medalist'],\n",
|
|
" 'last': ['Powerful',\n",
|
|
" 'Persevering',\n",
|
|
" 'Precise']}]\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"pprint(athlete_name_dicts, width=1)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "5e5854c0",
|
|
"metadata": {},
|
|
"source": [
|
|
"So to recap:\n",
|
|
"\n",
|
|
"- You can use string parsing to extract the text between \"```json\" and \"```\" to get the JSON.\n",
|
|
"- You can remove preambles *before* the JSON via a partial Assistant message. (However, this removes the possibility of having Claude do \"Chain of Thought\" for increased intelligence before beginning to output the JSON.)\n",
|
|
"- You can get rid of text that comes *after* the JSON by using a stop sequence.\n",
|
|
"- You can instruct Claude to output JSON in XML tags to make it easy to collect afterward for more complex prompts."
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "Python 3 (ipykernel)",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.10.12"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 5
|
|
}
|