mirror of
https://github.com/anthropics/claude-cookbooks.git
synced 2025-10-06 01:00:28 +03:00
306 lines
11 KiB
Plaintext
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Evaluator-Optimizer Workflow\n",
    "In this workflow, one LLM call generates a response while another provides evaluation and feedback in a loop.\n",
    "\n",
    "### When to use this workflow\n",
    "This workflow is particularly effective when we have:\n",
    "\n",
    "- Clear evaluation criteria\n",
    "- Value from iterative refinement\n",
    "\n",
    "The two signs of a good fit are:\n",
    "\n",
    "- LLM responses can be demonstrably improved when feedback is provided\n",
    "- The LLM can provide meaningful feedback itself"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from util import llm_call, extract_xml\n",
    "\n",
    "def generate(prompt: str, task: str, context: str = \"\") -> tuple[str, str]:\n",
    "    \"\"\"Generate and improve a solution based on feedback.\"\"\"\n",
    "    # Include prior attempts and feedback only when a context string is supplied.\n",
    "    full_prompt = f\"{prompt}\\n{context}\\nTask: {task}\" if context else f\"{prompt}\\nTask: {task}\"\n",
    "    response = llm_call(full_prompt)\n",
    "    thoughts = extract_xml(response, \"thoughts\")\n",
    "    result = extract_xml(response, \"response\")\n",
    "\n",
    "    print(\"\\n=== GENERATION START ===\")\n",
    "    print(f\"Thoughts:\\n{thoughts}\\n\")\n",
    "    print(f\"Generated:\\n{result}\")\n",
    "    print(\"=== GENERATION END ===\\n\")\n",
    "\n",
    "    return thoughts, result\n",
    "\n",
    "def evaluate(prompt: str, content: str, task: str) -> tuple[str, str]:\n",
    "    \"\"\"Evaluate if a solution meets requirements.\"\"\"\n",
    "    full_prompt = f\"{prompt}\\nOriginal task: {task}\\nContent to evaluate: {content}\"\n",
    "    response = llm_call(full_prompt)\n",
    "    evaluation = extract_xml(response, \"evaluation\")\n",
    "    feedback = extract_xml(response, \"feedback\")\n",
    "\n",
    "    print(\"=== EVALUATION START ===\")\n",
    "    print(f\"Status: {evaluation}\")\n",
    "    print(f\"Feedback: {feedback}\")\n",
    "    print(\"=== EVALUATION END ===\\n\")\n",
    "\n",
    "    return evaluation, feedback\n",
    "\n",
    "def loop(task: str, evaluator_prompt: str, generator_prompt: str) -> tuple[str, list[dict]]:\n",
    "    \"\"\"Keep generating and evaluating until requirements are met.\"\"\"\n",
    "    memory = []  # every attempt so far, replayed to the generator as context\n",
    "    chain_of_thought = []  # (thoughts, result) pairs returned to the caller\n",
    "\n",
    "    thoughts, result = generate(generator_prompt, task)\n",
    "    memory.append(result)\n",
    "    chain_of_thought.append({\"thoughts\": thoughts, \"result\": result})\n",
    "\n",
    "    # Runs until the evaluator returns PASS; there is no iteration cap.\n",
    "    while True:\n",
    "        evaluation, feedback = evaluate(evaluator_prompt, result, task)\n",
    "        if evaluation == \"PASS\":\n",
    "            return result, chain_of_thought\n",
    "\n",
    "        context = \"\\n\".join([\n",
    "            \"Previous attempts:\",\n",
    "            *[f\"- {m}\" for m in memory],\n",
    "            f\"\\nFeedback: {feedback}\"\n",
    "        ])\n",
    "\n",
    "        thoughts, result = generate(generator_prompt, task, context)\n",
    "        memory.append(result)\n",
    "        chain_of_thought.append({\"thoughts\": thoughts, \"result\": result})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example Use Case: Iterative coding loop\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "=== GENERATION START ===\n",
      "Thoughts:\n",
      "\n",
      "The task requires implementing a Stack with constant time operations including finding minimum. \n",
      "To achieve O(1) for getMin(), we need to maintain a second stack that keeps track of minimums.\n",
      "Each time we push, if the value is smaller than current min, we add it to minStack.\n",
      "When we pop, if the popped value equals current min, we also pop from minStack.\n",
      "\n",
      "\n",
      "Generated:\n",
      "\n",
      "```python\n",
      "class MinStack:\n",
      "    def __init__(self):\n",
      "        self.stack = []\n",
      "        self.minStack = []\n",
      "\n",
      "    def push(self, x: int) -> None:\n",
      "        self.stack.append(x)\n",
      "        if not self.minStack or x <= self.minStack[-1]:\n",
      "            self.minStack.append(x)\n",
      "\n",
      "    def pop(self) -> None:\n",
      "        if not self.stack:\n",
      "            return\n",
      "        if self.stack[-1] == self.minStack[-1]:\n",
      "            self.minStack.pop()\n",
      "        self.stack.pop()\n",
      "\n",
      "    def getMin(self) -> int:\n",
      "        if not self.minStack:\n",
      "            return None\n",
      "        return self.minStack[-1]\n",
      "```\n",
      "\n",
      "=== GENERATION END ===\n",
      "\n",
      "=== EVALUATION START ===\n",
      "Status: NEEDS_IMPROVEMENT\n",
      "Feedback: \n",
      "While the implementation is generally correct and achieves O(1) time complexity for all operations, there are several areas for improvement:\n",
      "\n",
      "1. Error Handling:\n",
      "- pop() should raise an exception when stack is empty rather than silently returning\n",
      "- getMin() should raise an exception when stack is empty rather than returning None\n",
      "- These behaviors should be consistent with standard stack implementations\n",
      "\n",
      "2. Type Hints:\n",
      "- Return type hint for pop() should be None or void\n",
      "- Missing type hints for class variables stack and minStack\n",
      "\n",
      "3. Documentation:\n",
      "- Missing docstrings for class and methods\n",
      "- Missing parameter descriptions\n",
      "\n",
      "4. Edge Cases:\n",
      "- No explicit handling of invalid input types for push()\n",
      "\n",
      "The core algorithm is correct and efficient, but the implementation could be more robust and better documented for production use.\n",
      "\n",
      "=== EVALUATION END ===\n",
      "\n",
      "\n",
      "=== GENERATION START ===\n",
      "Thoughts:\n",
      "\n",
      "Based on the feedback, I'll improve the implementation by:\n",
      "1. Adding proper error handling with exceptions\n",
      "2. Including type hints and docstrings\n",
      "3. Adding input validation\n",
      "4. Maintaining O(1) time complexity for all operations\n",
      "\n",
      "\n",
      "Generated:\n",
      "\n",
      "```python\n",
      "from typing import List\n",
      "\n",
      "class MinStack:\n",
      "    \"\"\"A stack that supports push, pop, and getting minimum element in O(1) time.\"\"\"\n",
      "\n",
      "    def __init__(self):\n",
      "        \"\"\"Initialize empty stack with two internal lists.\"\"\"\n",
      "        self.stack: List[int] = []\n",
      "        self.minStack: List[int] = []\n",
      "\n",
      "    def push(self, x: int) -> None:\n",
      "        \"\"\"\n",
      "        Push element onto stack and update minimum stack.\n",
      "\n",
      "        Args:\n",
      "            x: Integer to push onto stack\n",
      "\n",
      "        Raises:\n",
      "            TypeError: If x is not an integer\n",
      "        \"\"\"\n",
      "        if not isinstance(x, int):\n",
      "            raise TypeError(\"Input must be an integer\")\n",
      "\n",
      "        self.stack.append(x)\n",
      "        if not self.minStack or x <= self.minStack[-1]:\n",
      "            self.minStack.append(x)\n",
      "\n",
      "    def pop(self) -> None:\n",
      "        \"\"\"\n",
      "        Remove and return top element from stack.\n",
      "\n",
      "        Raises:\n",
      "            IndexError: If stack is empty\n",
      "        \"\"\"\n",
      "        if not self.stack:\n",
      "            raise IndexError(\"Cannot pop from empty stack\")\n",
      "\n",
      "        if self.stack[-1] == self.minStack[-1]:\n",
      "            self.minStack.pop()\n",
      "        self.stack.pop()\n",
      "\n",
      "    def getMin(self) -> int:\n",
      "        \"\"\"\n",
      "        Get minimum element in stack.\n",
      "\n",
      "        Returns:\n",
      "            int: Minimum element in stack\n",
      "\n",
      "        Raises:\n",
      "            IndexError: If stack is empty\n",
      "        \"\"\"\n",
      "        if not self.minStack:\n",
      "            raise IndexError(\"Cannot get minimum from empty stack\")\n",
      "        return self.minStack[-1]\n",
      "```\n",
      "\n",
      "=== GENERATION END ===\n",
      "\n"
     ]
    }
   ],
   "source": [
    "evaluator_prompt = \"\"\"\n",
    "Evaluate the following code implementation for:\n",
    "1. code correctness\n",
    "2. time complexity\n",
    "3. style and best practices\n",
    "\n",
    "You should be evaluating only and not attempting to solve the task.\n",
    "Only output \"PASS\" if all criteria are met and you have no further suggestions for improvements.\n",
    "Output your evaluation concisely in the following format.\n",
    "\n",
    "<evaluation>PASS, NEEDS_IMPROVEMENT, or FAIL</evaluation>\n",
    "<feedback>\n",
    "What needs improvement and why.\n",
    "</feedback>\n",
    "\"\"\"\n",
    "\n",
    "generator_prompt = \"\"\"\n",
    "Your goal is to complete the task based on <user input>. If there is feedback\n",
    "from your previous generations, you should reflect on it to improve your solution.\n",
    "\n",
    "Output your answer concisely in the following format:\n",
    "\n",
    "<thoughts>\n",
    "[Your understanding of the task and feedback and how you plan to improve]\n",
    "</thoughts>\n",
    "\n",
    "<response>\n",
    "[Your code implementation here]\n",
    "</response>\n",
    "\"\"\"\n",
    "\n",
    "task = \"\"\"\n",
    "<user input>\n",
    "Implement a Stack with:\n",
    "1. push(x)\n",
    "2. pop()\n",
    "3. getMin()\n",
    "All operations should be O(1).\n",
    "</user input>\n",
    "\"\"\"\n",
    "\n",
    "loop(task, evaluator_prompt, generator_prompt)\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "py311",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
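
The notebook's `loop` function uses `while True`, so it only stops when the evaluator returns `PASS`, and the evaluator prompt grants `PASS` only when there are no further suggestions. A capped variant may be safer in practice. The sketch below is not part of the notebook: `bounded_loop`, `max_iterations`, `stub_generate`, and `stub_evaluate` are hypothetical names, and the stubs replace the LLM-backed `generate` and `evaluate` so the control flow can run without an API key.

```python
from typing import Callable


def stub_generate(task: str, context: str = "") -> str:
    """Hypothetical stand-in for generate(): 'improves' once it has seen feedback."""
    if context:
        return "v2: raises IndexError on empty pop"
    return "v1: silently returns on empty pop"


def stub_evaluate(content: str, task: str) -> tuple[str, str]:
    """Hypothetical stand-in for evaluate(): passes only the improved version."""
    if "raises IndexError" in content:
        return "PASS", ""
    return "NEEDS_IMPROVEMENT", "pop() should raise on an empty stack"


def bounded_loop(
    task: str,
    generate_fn: Callable[[str, str], str],
    evaluate_fn: Callable[[str, str], tuple[str, str]],
    max_iterations: int = 5,
) -> tuple[str, list[str], bool]:
    """Generate and evaluate at most max_iterations times.

    Returns (last result, all attempts, whether the evaluator passed it).
    """
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")

    memory: list[str] = []
    context = ""
    for _ in range(max_iterations):
        result = generate_fn(task, context)
        memory.append(result)

        evaluation, feedback = evaluate_fn(result, task)
        if evaluation.strip() == "PASS":
            return result, memory, True

        # Same context layout as the notebook's loop(): prior attempts, then feedback.
        context = "\n".join([
            "Previous attempts:",
            *[f"- {m}" for m in memory],
            f"\nFeedback: {feedback}",
        ])

    # Budget exhausted: hand back the latest attempt and flag that it did not pass.
    return memory[-1], memory, False


result, attempts, passed = bounded_loop("Implement MinStack", stub_generate, stub_evaluate)
print(passed, len(attempts))
```

Returning a `passed` flag instead of raising lets the caller decide whether an unapproved final attempt is still usable, which is one possible design choice among several.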