Alex Notov 8d1c93365b Revert CLAUDE_API_KEY to ANTHROPIC_API_KEY throughout the repository
Reverted all instances of CLAUDE_API_KEY back to ANTHROPIC_API_KEY to maintain
compatibility with existing infrastructure and GitHub secrets. This affects:
- Environment variable examples (.env.example files)
- Python scripts and notebooks
- Documentation and README files
- Evaluation scripts and test files

Other naming changes (Claude API, Claude Console, Claude Docs, Claude Cookbook) remain intact.

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-16 17:02:29 -06:00

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Summarization with Claude\n",
"\n",
"## Introduction\n",
"\n",
"Summarization is a crucial task in natural language processing that involves condensing large amounts of text into shorter, more digestible formats while retaining key information. In today's information-rich world, the ability to quickly extract and synthesize essential points from lengthy documents is invaluable across various industries and applications.\n",
"\n",
"This guide focuses on leveraging Claude's capabilities for summarization, with a particular emphasis on legal documents. Legal documents can often be long and tedious to read particularly where there is a lot of fine print and legal terminology. We'll explore techniques for effective summarization of such documents, methods for evaluating summary quality, and strategies for systematically improving summarization performance.\n",
"\n",
"Key aspects we'll cover include:\n",
"- Crafting effective prompts for summarization\n",
"- Extracting specific metadata from documents\n",
"- Handling longer documents beyond typical token limits\n",
"- Evaluating summary quality using automated methods (e.g., ROUGE scores and [Promptfoo](https://www.promptfoo.dev/) custom methods)\n",
"- Iteratively improving summarization performance\n",
"- General conclusive tips on how to optimize your summarization workflows\n",
"\n",
"By the end of this guide, you'll have a solid understanding of how to implement and refine summarization tasks using Claude, along with a framework for applying these techniques to your own specific use cases.\n",
"\n",
"Before we get going, it's worth talking about evaluations in this guide. Evaluating the quality of summarization is a notoriously challenging task. Unlike many other natural language processing tasks, summarization evaluation often lacks clear-cut, objective metrics. The process can be highly subjective, with different readers valuing different aspects of a summary. Traditional empirical methods like ROUGE scores, while useful, have limitations in capturing nuanced aspects such as coherence, factual accuracy, and relevance. Moreover, the \"best\" summary can vary depending on the specific use case, target audience, and desired level of detail. Despite these challenges, we explore several different approaches in this guide that can be leveraged, combining automated metrics, regular expressions, and task-specific criteria. In this guide we recognize that the most effective approach often involves a tailored combination of techniques suited to the particular summarization task at hand.\n",
"\n",
"## Table of Contents\n",
"\n",
"1. [Setup](#setup)\n",
"2. [Data Preparation](#data-preparation)\n",
"3. [Basic Summarization](#basic-summarization)\n",
"4. [Multi-Shot Basic Summarization](#multi-shot-basic-summarization)\n",
"5. [Advanced Techniques](#advanced-techniques)\n",
" - [Guided Summarization](#guided-summarization)\n",
" - [Domain-Specific Guided Summarization](#domain-specific-guided-summarization)\n",
" - [Meta-Summarization](#including-the-context-of-the-entire-document-meta-summarization)\n",
"6. [Summary Indexed Documents: An Advanced RAG Approach](#summary-indexed-documents-an-advanced-rag-approach)\n",
" - [Best Practices for Summarization Rag](#best-practices-for-summarization-rag)\n",
"7. [Evaluations](#evaluations)\n",
"8. [Iterative Improvement](#iterative-improvement)\n",
"9. [Conclusion and Best Practices](#conclusion-and-best-practices)\n",
"\n",
"## Setup\n",
"\n",
"To complete this guide, you'll need to install the following packages:\n",
"- anthropic \n",
"- pypdf\n",
"- pandas\n",
"- matplotlib\n",
"- sklearn\n",
"- numpy\n",
"- rouge-score\n",
"- nltk\n",
"- seaborn\n",
"- [promptfoo](https://www.promptfoo.dev/) (for evaluation)\n",
"\n",
"You'll also need an Claude API key.\n",
"\n",
"Let's start by installing the required packages and setting up our environment:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# install packages\n",
"!pip install anthropic pypdf pandas matplotlib numpy rouge-score nltk seaborn --quiet"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Setup complete!\n"
]
}
],
"source": [
"import os\n",
"import re\n",
"import anthropic\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"from typing import List, Dict, Tuple\n",
"import json\n",
"import seaborn as sns\n",
"\n",
"# Set up Anthropic client\n",
"# You can set up a .env file with your API key to keep it private, and import it like so:\n",
"# from dotenv import load_dotenv\n",
"# load_dotenv()\n",
"\n",
"# or add your key directly\n",
"api_key = 'ANTHROPIC_API_KEY' # Replace ANTHROPIC_API_KEY with your actual API key\n",
"client = anthropic.Anthropic(api_key=api_key)\n",
"\n",
"print(\"Setup complete!\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data Preparation\n",
"Before we can begin summarizing documents, we need to prepare our data. This involves extracting text from PDFs, cleaning the text, and ensuring it's ready for input into our language model. For the purposes of this demo, we have sourced a [publicly available Sublease Agreement from the sec.gov website](https://www.sec.gov/Archives/edgar/data/1045425/000119312507044370/dex1032.htm). \n",
"\n",
"If you have any PDF you want to test this on, feel free to import it into this directory, and then change the file path below. **If you want to just use a text blob via copy and paste, skip this step and define `text = <text blob>`**.\n",
"\n",
"Here's a set of functions to handle this process:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"EX-10.32 7 dex1032.htm SUBLEASE AGREEMENT Exhibit 10.32 SUBLEASE AGREEMENT THIS SUBLEASE AGREEMENT (“Sublease ”), is dated as of April 1, 2006, by and between COHEN BROTHERS, LLC d/b/a COHEN & COMP ANY (“Sublessor ”) and TABERNA CAPIT AL MANAGEMENT , LLC (“Sublessee ”), collectively , the “ Parties ” and each a “ Party ”. WHEREAS, Sublessor is the lessee under a written lease agreement dated June 22, 2005 wherein Brandywine Cira, L.P ., a Delaware limited partnership (“ Lessor ”), leased Suite N\n"
]
}
],
"source": [
"import pypdf\n",
"import re\n",
"\n",
"pdf_path = \"data/Sample Sublease Agreement.pdf\"\n",
"\n",
"def extract_text_from_pdf(pdf_path):\n",
" with open(pdf_path, 'rb') as file:\n",
" reader = pypdf.PdfReader(file)\n",
" text = \"\"\n",
" for page in reader.pages:\n",
" text += page.extract_text() + \"\\n\"\n",
" return text\n",
"\n",
"def clean_text(text):\n",
" # Remove extra whitespace\n",
" text = re.sub(r'\\s+', ' ', text)\n",
" # Remove page numbers\n",
" text = re.sub(r'\\n\\s*\\d+\\s*\\n', '\\n', text)\n",
" return text.strip()\n",
"\n",
"def prepare_for_llm(text, max_tokens=180000):\n",
" # Truncate text to fit within token limit (approximate)\n",
" return text[:max_tokens * 4] # Assuming average of 4 characters per token\n",
"\n",
"def get_llm_text(path):\n",
" extracted_text = extract_text_from_pdf(path)\n",
" cleaned_text = clean_text(extracted_text)\n",
" llm_ready_text = prepare_for_llm(cleaned_text)\n",
" return llm_ready_text\n",
"\n",
"# You can now use get_llm_text in your LLM prompt\n",
"text = get_llm_text(pdf_path)\n",
"print(text[:500])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This setup allows us to easily process PDF documents and prepare them for summarization. In the next section, we'll start with a basic summarization approach and then build upon it with more advanced techniques."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Basic Summarization\n",
"\n",
"Let's start with a simple summarization function using Claude. This is a simple attempt at using Claude to summarize the text from the document above. As we progress through this guide, we will improve on this method.\n",
"\n",
"One thing to note is while this might seem basic, we are actually using some important functionality of Claude already. One piece worth noting is the use of the assistant role and stop sequences. The assistant preamble tees Claude up to include the summary directly after the final phrase `<summary>`. The stop sequence `</summary>` then tells Claude to stop generating. This is a pattern we will continue to use throughout this guide."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"def basic_summarize(text, max_tokens=1000):\n",
"\n",
" # Prompt the model to summarize the text\n",
" prompt = f\"\"\"Summarize the following text in bullet points. Focus on the main ideas and key details:\n",
" {text}\n",
" \"\"\"\n",
"\n",
" response = client.messages.create(\n",
" model=\"claude-3-5-sonnet-20241022\",\n",
" max_tokens=max_tokens,\n",
" system=\"You are a legal analyst known for highly accurate and detailed summaries of legal documents.\",\n",
" messages=[\n",
" {\n",
" \"role\": \"user\", \n",
" \"content\": prompt\n",
" },\n",
" {\n",
" \"role\": \"assistant\",\n",
" \"content\": \"Here is the summary of the legal document: <summary>\" \n",
" }\n",
" ],\n",
" stop_sequences=[\"</summary>\"]\n",
" )\n",
"\n",
" return response.content[0].text"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"Key Points:\n",
"•Between parties: COHEN BROTHERS, LLC d/b/a COHEN & COMPANY (Sublessor) and TABERNA CAPITAL MANAGEMENT, LLC (Sublessee).\n",
"•Signed on April 1, 2006.\n",
"•Premises: 2,000 square feet of office space in Suite 1703 in the Cira Center at 2929 Arch Street, Philadelphia.\n",
"\n",
"Major Terms:\n",
"•Term: 5 years starting April 1, 2006\n",
"•Payment: Fixed rent increases annually from $34.50/sf to $37.34/sf over the term.\n",
"•Utilities: Tenant pays for electricity and pro rata share of building expenses.\n",
"•Use: General office use only.\n",
"•Assignment/Subletting: Requires prior written consent of landlord.\n",
"\n",
"Key Obligations:\n",
"•Tenant must maintain insurance including liability and property insurance. \n",
"•Tenant responsible for interior maintenance/condition of the premises.\n",
"•Tenant must comply with all building rules and regulations.\n",
"•Tenant must maintain premises in good order and repair.\n",
"\n",
"Notable Provisions:\n",
"•Sublessor can recapture premises if tenant tries to assign/sublet without proper consent.\n",
"•Default provisions give sublessor multiple remedies including termination and accelerated rent. \n",
"•Tenant must indemnify landlord for claims related to tenant's use or actions.\n",
"•Tenant responsible for maintaining and repairing interior of premises.\n",
"\n",
"This appears to be a fairly standard commercial office sublease with typical provisions regarding tenant obligations, insurance requirements, default remedies, etc. The sublessor retains significant control and remedies while the sublessee has standard obligations for an office tenant.\n"
]
}
],
"source": [
"basic_response = basic_summarize(text, max_tokens=1000)\n",
"\n",
"print(basic_response)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This basic approach provides a simple summary, but it may not capture all the nuances we need for legal or financial documents. As you'll notice too, when you rerun the cell above, there is no standard, formalized output. Instead, we retrieve a basic summarization of the document, without much structured output to parse through. This makes it harder to read, more difficult to trust (how do we know it didn't miss something?) and thus trickier to use in any real world context.\n",
"\n",
"Let's see if we can tweak our prompt to get a more structure version of our summary."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Multi-Shot Basic Summarization\n",
"\n",
"It's pretty cool that we can summarize massive documents so fast, but we can do even better. Let's try adding a few examples to our prompt to see if we can improve the output and create a bit of structure in our output before we move on to even more advanced techniques. \n",
"\n",
"Note, here we haven't really change the actual format of the request, although we have appended 2 additional pieces: \n",
"\n",
"1. We've told the model \"do not preamble\". This can often be a good idea when it comes to constraining the model output to just the answer we want, without that initial form conversational angle you might be familiar with if you've used Claude before. It's particularly important when we aren't using other \"instructions\" within the prompt (as we might do later in this guide).\n",
"2. We append 3 examples of summarized documents. This is called few-shot or multi-shot learning, and it can help the model understand what we're looking for.\n",
"\n",
"Let's see how the output changes:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"# We import from our data directory to save space in our notebook\n",
"from data.multiple_subleases import document1, document2, document3, sample1, sample2, sample3\n",
"\n",
"def basic_summarize_multishot(text, max_tokens=1000):\n",
"\n",
" # Prompt the model to summarize the text\n",
" prompt = f\"\"\"Summarize the following text in bullet points. Focus on the main ideas and key details:\n",
" {text}\n",
"\n",
" Do not preamble.\n",
"\n",
" Use these examples for guidance in summarizing:\n",
"\n",
" <example1>\n",
" <original1>\n",
" {document1}\n",
" </original1>\n",
"\n",
" <summary1>\n",
" {sample1}\n",
" </summary1>\n",
" </example1>\n",
"\n",
" <example2>\n",
" <original2>\n",
" {document2}\n",
" </original2>\n",
"\n",
" <summary2>\n",
" {sample2}\n",
" </summary2>\n",
" </example2>\n",
"\n",
" <example3>\n",
" <original3>\n",
" {document3}\n",
" </original3>\n",
"\n",
" <summary3>\n",
" {sample3}\n",
" </summary3>\n",
" </example3>\n",
" \"\"\"\n",
"\n",
" response = client.messages.create(\n",
" model=\"claude-3-5-sonnet-20241022\",\n",
" max_tokens=max_tokens,\n",
" system=\"You are a legal analyst known for highly accurate and detailed summaries of legal documents.\",\n",
" messages=[\n",
" {\n",
" \"role\": \"user\", \n",
" \"content\": prompt\n",
" },\n",
" {\n",
" \"role\": \"assistant\",\n",
" \"content\": \"Here is the summary of the legal document: <summary>\" \n",
" }\n",
" ],\n",
" stop_sequences=[\"</summary>\"]\n",
" )\n",
"\n",
" return response.content[0].text"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"Description: This is a sublease agreement between Cohen Brothers, LLC (Sublessor) and Taberna Capital Management, LLC (Sublessee) for office space in Philadelphia.\n",
"\n",
"<parties involved>\n",
"Sublessor: Cohen Brothers, LLC d/b/a Cohen & Company\n",
"Sublessee: Taberna Capital Management, LLC\n",
"Original lessor: Brandywine Cira, L.P.\n",
"</parties involved>\n",
"\n",
"<property details> \n",
"Address: 2929 Arch Street, Suite 1703, Philadelphia, PA\n",
"Description: 2,000 square feet of office space with access to file space, printers, copiers, kitchen, conference rooms\n",
"Permitted use: General office use\n",
"</property details>\n",
"\n",
"<term and rent>\n",
"Start date: April 1, 2006\n",
"End date: 5 years from start date\n",
"Monthly rent:\n",
"• Months 1-12: $5,750\n",
"• Months 13-24: $5,865\n",
"• Months 25-36: $5,981.67\n",
"• Months 37-48: $6,101.67\n",
"• Months 49-60: $6,223.33\n",
"</term and rent>\n",
"\n",
"<responsibilities>\n",
"Utilities: Not explicitly specified\n",
"Maintenance: Not explicitly specified\n",
"Repairs: Tenant responsible for damage repairs\n",
"Insurance: Tenant required to maintain liability insurance with $3M limit and workers compensation insurance\n",
"</responsibilities>\n",
"\n",
"<special provisions>\n",
"Default: Detailed events of default and remedies specified\n",
"Holdover: Double rent for unauthorized holdover period\n",
"Assignment/Subletting: Not permitted without landlord consent\n",
"Alterations: Require landlord consent\n",
"Access to services: Includes file space, copiers, conference rooms, receptionist services\n",
"</special provisions>\n",
"\n",
"\n"
]
}
],
"source": [
"basic_multishot_response = basic_summarize_multishot(text, max_tokens=1000)\n",
"\n",
"print(basic_multishot_response)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you look at the examples we provided, you see see that the format of the output above is the same *(go to data/<any of the .txt files> to see)*. This is interesting we didn't explicitly tell Claude to follow the format of the examples, but it seems to have picked up on it anyway. This illustrates the power of few-shot learning, and how Claude can generalize from a few examples to new inputs."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Advanced Summarization Techniques\n",
"\n",
"### Guided Summarization\n",
"\n",
"Guided summarization is where we explicitly define a framework for the model to abide by in it's summarization task. We can do this all via prompting, changing the details of the prompt to guide Claude to be more or less verbose, include more or less technical terminology, or provide a higher or lower level summary of the context at hand. For legal documents, we can guide the summarization to focus on specific aspects.\n",
"\n",
"Note, we could likely accomplish the same formatted output we reveal below via examples (which we explored above)!"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"def guided_legal_summary(text, max_tokens=1000):\n",
"\n",
" # Prompt the model to summarize the text\n",
" prompt = f\"\"\"Summarize the following legal document. Focus on these key aspects:\n",
"\n",
" 1. Parties involved\n",
" 2. Main subject matter\n",
" 3. Key terms and conditions\n",
" 4. Important dates or deadlines\n",
" 5. Any unusual or notable clauses\n",
"\n",
" Provide the summary in bullet points under each category.\n",
"\n",
" Document text:\n",
" {text}\n",
" \n",
" \"\"\"\n",
"\n",
" response = client.messages.create(\n",
" model=\"claude-3-5-sonnet-20241022\",\n",
" max_tokens=max_tokens,\n",
" system=\"You are a legal analyst known for highly accurate and detailed summaries of legal documents.\",\n",
" messages=[\n",
" {\n",
" \"role\": \"user\", \n",
" \"content\": prompt\n",
" },\n",
" {\n",
" \"role\": \"assistant\",\n",
" \"content\": \"Here is the summary of the legal document: <summary>\" \n",
" }\n",
" ],\n",
" stop_sequences=[\"</summary>\"]\n",
" )\n",
"\n",
" return response.content[0].text"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"1. Parties Involved\n",
"- Sublessor: Cohen Brothers, LLC d/b/a Cohen & Company\n",
"- Sublessee: Taberna Capital Management, LLC\n",
"- Original Landlord: Brandywine Cira, L.P. (Master Lease landlord)\n",
"\n",
"2. Main Subject Matter\n",
"- Sublease agreement for Suite 1703 at Cira Centre, 2929 Arch Street, Philadelphia, PA\n",
"- 2,000 square feet of office space within the Master Premises of 13,777 rentable square feet\n",
"- Includes furniture, file space, printers, copiers, kitchen, conference room facilities and receptionist/secretarial services\n",
"\n",
"3. Key Terms and Conditions\n",
"- Initial Term: 5 years from April 1, 2006 \n",
"- Fixed Rent: Escalating annual rent schedule starting at $34.50/sq ft in Year 1 ($69,000 annually) up to $37.34/sq ft in Year 5 ($74,680 annually)\n",
"- Pro rata share of operating expenses and utilities\n",
"- No assignment or subletting without Sublessor's prior written consent\n",
"- Sublessee takes premises \"AS IS\"\n",
"- Sublessee must maintain required insurance coverage\n",
"- Default provisions for non-payment, breach of lease terms, bankruptcy, etc.\n",
"\n",
"4. Important Dates/Deadlines \n",
"- Commencement Date: April 1, 2006\n",
"- Expiration Date: 5 years from Commencement Date\n",
"- Fixed Rent payable monthly in advance on 1st of each month\n",
"- 5-day grace period for late payments before default\n",
"\n",
"5. Notable Clauses\n",
"- Indemnification requirements for both parties\n",
"- Holdover rent at 2x monthly rate if Sublessee remains after term ends\n",
"- Sublessor not liable for utilities/services interruption\n",
"- Sublessee responsible for any construction liens\n",
"- Confession of judgment provision\n",
"- Waiver of jury trial provision\n",
"\n",
"\n"
]
}
],
"source": [
"# Example usage\n",
"legal_summary = guided_legal_summary(text)\n",
"print(legal_summary)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This certainly makes it easier to parse out the most relevant sections of the document and understand the implications of specific line items and important clauses."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Domain-Specific Guided Summarization\n",
"\n",
"You could give the above guided summarization prompt to any type of document, but we can make it even more powerful by tailoring it to specific document types. For example, if we know we're dealing with a sublease agreement, we can guide the model to focus on the most relevant legal terms and concepts for that particular type of document. This would be most relevant when we are working on a specific use case using Claude and explicitly know the most relevant values we want to extract.\n",
"\n",
"Here's an example of how we might modify our guided summarization function for sublease agreements. Note that we'll also add the 'model' as an additional parameter to our function so that we can more easily choose different models for summarization based upon the task:\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"def guided_sublease_summary(text, model=\"claude-3-5-sonnet-20241022\", max_tokens=1000):\n",
"\n",
" # Prompt the model to summarize the sublease agreement\n",
" prompt = f\"\"\"Summarize the following sublease agreement. Focus on these key aspects:\n",
"\n",
" 1. Parties involved (sublessor, sublessee, original lessor)\n",
" 2. Property details (address, description, permitted use)\n",
" 3. Term and rent (start date, end date, monthly rent, security deposit)\n",
" 4. Responsibilities (utilities, maintenance, repairs)\n",
" 5. Consent and notices (landlord's consent, notice requirements)\n",
" 6. Special provisions (furniture, parking, subletting restrictions)\n",
"\n",
" Provide the summary in bullet points nested within the XML header for each section. For example:\n",
"\n",
" <parties involved>\n",
" - Sublessor: [Name]\n",
" // Add more details as needed\n",
" </parties involved>\n",
" \n",
" If any information is not explicitly stated in the document, note it as \"Not specified\". Do not preamble.\n",
"\n",
" Sublease agreement text:\n",
" {text}\n",
" \n",
" \"\"\"\n",
"\n",
" response = client.messages.create(\n",
" model=model,\n",
" max_tokens=max_tokens,\n",
" system=\"You are a legal analyst specializing in real estate law, known for highly accurate and detailed summaries of sublease agreements.\",\n",
" messages=[\n",
" {\n",
" \"role\": \"user\", \n",
" \"content\": prompt\n",
" },\n",
" {\n",
" \"role\": \"assistant\",\n",
" \"content\": \"Here is the summary of the sublease agreement: <summary>\" \n",
" }\n",
" ],\n",
" stop_sequences=[\"</summary>\"]\n",
" )\n",
"\n",
" return response.content[0].text"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"<parties involved>\n",
"- Sublessor: Cohen Brothers, LLC (d/b/a Cohen & Company) \n",
"- Sublessee: Taberna Capital Management, LLC\n",
"- Original Lessor: Brandywine Cira, L.P. (Master Lease holder)\n",
"</parties involved>\n",
"\n",
"<property details>\n",
"- Address: 2929 Arch Street, Suite 1703, Philadelphia, PA\n",
"- Description: 2,000 square feet of office space in Suite 1703 \n",
"- Permitted Use: General office use\n",
"- Includes: Access to file space, printers, copiers, kitchen, conference rooms, receptionist/secretarial services\n",
"</property details>\n",
"\n",
"<term and rent>\n",
"- Start Date: April 1, 2006\n",
"- End Date: 5 years from commencement\n",
"- Monthly Rent: Escalating schedule starting at $5,750 in year 1 up to $6,223.33 in year 5\n",
"- Security Deposit: Not specified\n",
"</term and rent>\n",
"\n",
"<responsibilities>\n",
"- Utilities: Sublessee pays proportional share of utilities and operating expenses\n",
"- Maintenance: Sublessor responsible for base building maintenance\n",
"- Repairs: Sublessee responsible for repairs due to its use\n",
"- Insurance: Sublessee must maintain general liability and property insurance\n",
"</responsibilities>\n",
"\n",
"<consent and notices>\n",
"- Landlord's Consent: Required for assignment/subletting\n",
"- Notice Requirements: All notices must be in writing and delivered to specified addresses\n",
"- Sublessor's Consent: Required for alterations, improvements, signage\n",
"</consent and notices>\n",
"\n",
"<special provisions>\n",
"- Furniture: Included in lease\n",
"- Parking: Not included\n",
"- Assignment: No assignment/subletting without Sublessor's consent\n",
"- Default Remedies: Specified remedies including termination and accelerated rent\n",
"</special provisions>\n",
"\n",
"\n"
]
}
],
"source": [
"# Example usage\n",
"sublease_summary = guided_sublease_summary(text)\n",
"print(sublease_summary)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because we decided to output each section of the summary in XML tags, we can now parse them individually out like so (this could also be done via JSON or any other format):"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Parties involved:\n",
"- Sublessor: Cohen Brothers, LLC (d/b/a Cohen & Company)\n",
"- Sublessee: Taberna Capital Management, LLC\n",
"- Original Lessor: Brandywine Cira, L.P. (Master Lease holder)\n"
]
}
],
"source": [
"import re\n",
"\n",
"def parse_sections_regex(text):\n",
" pattern = r'<(.*?)>(.*?)</\\1>'\n",
" matches = re.findall(pattern, text, re.DOTALL)\n",
" \n",
" parsed_sections = {}\n",
" for tag, content in matches:\n",
" items = [item.strip('- ').strip() for item in content.strip().split('\\n') if item.strip()]\n",
" parsed_sections[tag] = items\n",
" \n",
" return parsed_sections\n",
"\n",
"\n",
"# Parse the sections\n",
"parsed_sections = parse_sections_regex(sublease_summary)\n",
"\n",
"# Check if parsing was successful\n",
"if isinstance(parsed_sections, dict) and 'parties involved' in parsed_sections:\n",
" print(\"Parties involved:\")\n",
" for item in parsed_sections['parties involved']:\n",
" print(f\"- {item}\")\n",
"else:\n",
" print(\"Error: Parsing failed or 'parties involved' section not found.\")\n",
" print(\"Parsed result:\", parsed_sections)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Including the Context of Multiple Documents (Meta-Summarization)\n",
"\n",
"What if we have a lot of documents related to the same client? We can use a chunking method in order to handle this. This is a technique that involves breaking down documents into smaller, manageable chunks and then processing each chunk separately. We can then combine the summaries of each chunk to create a meta-summary of the entire document. This can be particularly helpful when we want to summarize a large number of documents or when we want to summarize a single document that is very long.\n",
"\n",
"Here's an example of how we might do this:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"<parties involved>\n",
"- Sublessor: Apex Innovations, Inc. (Delaware corporation) later identified as TechHub Enterprises, LLC\n",
"- Sublessee: NanoSphere Solutions, Inc. and Quantum Dynamics, LLC (California LLC)\n",
"- Original Lessor: Innovate Properties, LLP\n",
"</parties involved>\n",
"\n",
"<property details>\n",
"- Address: 9876 Innovation Park, Building C, San Francisco, CA 94107\n",
"- Description: Approximately 15,000-25,000 square feet of office and laboratory space\n",
"- Permitted Use: General office purposes, research and development, and laboratory uses consistent with BSL-2 facility requirements\n",
"</property details>\n",
"\n",
"<term and rent>\n",
"- Start Date: September 1, 2023\n",
"- End Date: August 31, 2026 (with option to extend for 3-5 additional years)\n",
"- Monthly Rent: Starting at $75,000/month with annual 3% increases\n",
"- Security Deposit: $450,000-$787,500\n",
"</term and rent>\n",
"\n",
"<responsibilities>\n",
"- Utilities: Sublessee responsible for all utilities and services, including electricity, gas, water, sewer, telephone, internet, and janitorial\n",
"- Maintenance: Sublessee responsible for interior maintenance, repairs and replacements, including walls, floors, ceilings, doors, windows, fixtures\n",
"- Repairs: Sublessee responsible for repairs except building structure, exterior walls, roof which are Sublessor's responsibility\n",
"</responsibilities>\n",
"\n",
"<consent and notices>\n",
"- Landlord's Consent: Required for assignments, subletting, alterations\n",
"- Notice Requirements: 30 days written notice for defaults, insurance changes; 9-12 months notice for term extensions\n",
"</consent and notices>\n",
"\n",
"<special provisions>\n",
"- Furniture: Right to install furniture/equipment 15-30 days before commencement\n",
"- Parking: Non-exclusive right to use common parking facilities\n",
"- Subletting Restrictions: No assignment/subletting without prior written consent, except to affiliated entities\n",
"- Additional: Hazardous materials restrictions, OFAC compliance requirements, jury trial waiver\n",
"</special provisions>\n",
"\n"
]
}
],
"source": [
"from data.multiple_subleases import document1, document2, document3\n",
"\n",
"def chunk_text(text, chunk_size=2000):\n",
"    # Split the text into fixed-size chunks of chunk_size characters (not tokens)\n",
"    return [text[i:i+chunk_size] for i in range(0, len(text), chunk_size)]\n",
"\n",
"def summarize_long_document(text, max_tokens=2000):\n",
"\n",
"    chunks = chunk_text(text)\n",
"\n",
"    # Iterate over chunks and summarize each one\n",
"    # We use guided_sublease_summary here, but you can use basic_summarize or any other summarization function\n",
"    # Note that we use Haiku for the interim summaries and 3.5 Sonnet for the final summary\n",
"    chunk_summaries = [guided_sublease_summary(chunk, model=\"claude-3-haiku-20240307\", max_tokens=max_tokens) for chunk in chunks]\n",
"\n",
"    final_summary_prompt = f\"\"\"\n",
"\n",
"    You are looking at the chunked summaries of multiple documents that are all related. Combine the following summaries of the document from different truthful sources into a coherent overall summary:\n",
"\n",
"    {\"\".join(chunk_summaries)}\n",
"\n",
"    1. Parties involved (sublessor, sublessee, original lessor)\n",
"    2. Property details (address, description, permitted use)\n",
"    3. Term and rent (start date, end date, monthly rent, security deposit)\n",
"    4. Responsibilities (utilities, maintenance, repairs)\n",
"    5. Consent and notices (landlord's consent, notice requirements)\n",
"    6. Special provisions (furniture, parking, subletting restrictions)\n",
"\n",
"    Provide the summary in bullet points nested within the XML header for each section. For example:\n",
"\n",
"    <parties involved>\n",
"    - Sublessor: [Name]\n",
"    // Add more details as needed\n",
"    </parties involved>\n",
"\n",
"    If any information is not explicitly stated in the document, note it as \"Not specified\".\n",
"\n",
"    Summary:\n",
"    \"\"\"\n",
"\n",
"    # Prefill the assistant turn with <summary> and stop at </summary> so only the summary body is returned\n",
"    response = client.messages.create(\n",
"        model=\"claude-3-5-sonnet-20241022\",\n",
"        max_tokens=max_tokens,\n",
"        system=\"You are a legal expert that summarizes notes on one document.\",\n",
"        messages=[\n",
"            {\n",
"                \"role\": \"user\",\n",
"                \"content\": final_summary_prompt\n",
"            },\n",
"            {\n",
"                \"role\": \"assistant\",\n",
"                \"content\": \"Here is the summary of the legal document: <summary>\"\n",
"            }\n",
"        ],\n",
"        stop_sequences=[\"</summary>\"]\n",
"    )\n",
"\n",
"    return response.content[0].text\n",
"\n",
"# Example usage\n",
"# Combine 3 related documents into one long text\n",
"text = document1 + document2 + document3\n",
"long_summary = summarize_long_document(text)\n",
"print(long_summary)"
]
},
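{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `chunk_text` helper above cuts the text at fixed character offsets, so a clause that straddles a boundary is split across two chunks. One option worth trying is to let neighbouring chunks overlap. The sketch below is a minimal illustration, not part of the pipeline above: `chunk_text_with_overlap` and its `overlap` parameter are hypothetical names introduced here, and the sizes are arbitrary.\n",
"\n",
"```python\n",
"def chunk_text_with_overlap(text, chunk_size=2000, overlap=200):\n",
"    # Hypothetical helper: slide a window of chunk_size characters,\n",
"    # stepping by (chunk_size - overlap) so neighbours share some text.\n",
"    if not 0 <= overlap < chunk_size:\n",
"        raise ValueError('overlap must be in [0, chunk_size)')\n",
"    step = chunk_size - overlap\n",
"    chunks = []\n",
"    for start in range(0, len(text), step):\n",
"        chunks.append(text[start:start + chunk_size])\n",
"        if start + chunk_size >= len(text):\n",
"            break  # this window already reaches the end of the text\n",
"    return chunks\n",
"\n",
"\n",
"sample = 'abcdefghij' * 3  # 30 characters of toy input\n",
"chunks = chunk_text_with_overlap(sample, chunk_size=12, overlap=4)\n",
"print([len(c) for c in chunks])\n",
"```\n",
"\n",
"Swapping this in for `chunk_text` inside `summarize_long_document` produces more chunks and therefore more interim summarization calls, so the overlap is a cost versus context trade-off to tune against your evals."
]
},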
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Summary Indexed Documents: An Advanced RAG Approach\n",
"\n",
"Summary Indexed Documents is an advanced approach to Retrieval-Augmented Generation (RAG) that operates at the document level: each document is summarized once, and retrieval ranks those summaries rather than raw text chunks.\n",
"\n",
"This method offers several advantages over traditional RAG techniques, particularly in scenarios involving large documents or when precise information retrieval is crucial.\n",
"\n",
"### How It Works\n",
"\n",
"1. Document Summarization: Generate a concise summary for each document in your corpus. Only a subset of each document's text is passed to the model, so the summaries are quick to produce.\n",
"2. Context Window Optimization: Ensure all summaries fit within the context window of your language model.\n",
"3. Relevancy Scoring: Ask a model to rate the relevance of each summary to the query being asked.\n",
"4. Reranking (Optional): Apply reranking techniques to further refine and compress the top-K results.\n",
"5. Answer Generation: Answer the query at hand using the top-ranked documents.\n",
"\n",
"This approach has some distinct advantages:\n",
"- Efficient Ranking: Ranks documents for retrieval using less context than traditional RAG methods.\n",
"- Strong Performance on Specific Tasks: Can outperform other RAG methods when the goal is to rank the correct document first.\n",
"- Optimized Information Retrieval: Reranking helps compress results, ensuring the most concise and relevant information is presented to the model."
]
},
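{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before the full implementation, the ranking step described above can be sketched without any API calls. This is a simplified illustration under stated assumptions: `rank_by_summary` and `keyword_overlap` are hypothetical names, the summaries are made-up toy strings, and the word-overlap scorer is only a stand-in for the model-based relevance rating used in the class below.\n",
"\n",
"```python\n",
"from typing import Callable, Dict, List, Tuple\n",
"\n",
"\n",
"def rank_by_summary(\n",
"    summaries: Dict[str, str],\n",
"    query: str,\n",
"    score_fn: Callable[[str, str], float],\n",
"    top_k: int = 3,\n",
") -> List[Tuple[str, float]]:\n",
"    # Score every summary against the query, then keep the top_k highest.\n",
"    scored = [(doc_id, score_fn(summary, query)) for doc_id, summary in summaries.items()]\n",
"    scored.sort(key=lambda pair: pair[1], reverse=True)\n",
"    return scored[:top_k]\n",
"\n",
"\n",
"def keyword_overlap(summary: str, query: str) -> float:\n",
"    # Stand-in scorer (an assumption for this sketch): count shared lowercase words.\n",
"    # In the real pipeline a model rates relevance instead.\n",
"    return float(len(set(summary.lower().split()) & set(query.lower().split())))\n",
"\n",
"\n",
"toy_summaries = {\n",
"    'doc1': 'sublease between apex innovations and a sublessee',\n",
"    'doc2': 'parking agreement for a retail unit',\n",
"    'doc3': 'equipment rental terms',\n",
"}\n",
"ranking = rank_by_summary(toy_summaries, 'apex innovations sublease', keyword_overlap, top_k=2)\n",
"print(ranking)\n",
"```\n",
"\n",
"Passing the scorer in as a function keeps the ranking logic separate from how relevance is judged, so the toy scorer can be replaced by a model call without touching the sort and top-K selection."
]
},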
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"class LegalSummaryIndexedDocuments:\n",
"\n",
"    def __init__(self, client):\n",
"        self.client = client  # Claude client\n",
"        self.documents: List[Dict[str, str]] = []  # List of docs to store\n",
"        self.summaries: List[str] = []  # One summary per document, in the same order\n",
"\n",
"    def add_document(self, doc_id: str, content: str):\n",
"        # Adds a document to the index\n",
"        self.documents.append({\"id\": doc_id, \"content\": content})\n",
"\n",
"    def generate_summaries(self):\n",
"        # Generates summaries for all documents in the index\n",
"        for doc in self.documents:\n",
"            summary = self._generate_legal_summary(doc[\"content\"])\n",
"            self.summaries.append(summary)\n",
"\n",
"    def _generate_legal_summary(self, content: str) -> str:\n",
"\n",
"        # Note how we truncate the content to its first 2000 characters. We do this because we don't need that much information for the initial ranking.\n",
"        prompt = f\"\"\"\n",
"        Summarize the following sublease agreement. Focus on these key aspects:\n",
"\n",
"        1. Parties involved (sublessor, sublessee, original lessor)\n",
"        2. Property details (address, description, permitted use)\n",
"        3. Term and rent (start date, end date, monthly rent, security deposit)\n",
"        4. Responsibilities (utilities, maintenance, repairs)\n",
"        5. Consent and notices (landlord's consent, notice requirements)\n",
"        6. Special provisions (furniture, parking, subletting restrictions)\n",
"\n",
"        Provide the summary in bullet points nested within the XML header for each section. For example:\n",
"\n",
"        <parties involved>\n",
"        - Sublessor: [Name]\n",
"        // Add more details as needed\n",
"        </parties involved>\n",
"\n",
"        If any information is not explicitly stated in the document, note it as \"Not specified\".\n",
"\n",
"        Sublease agreement text:\n",
"        {content[:2000]}...\n",
"\n",
"        Summary:\n",
"        \"\"\"\n",
"\n",
"        response = self.client.messages.create(\n",
"            model=\"claude-3-5-sonnet-20241022\",\n",
"            max_tokens=500,\n",
"            temperature=0.2,\n",
"            messages=[\n",
"                {\"role\": \"user\", \"content\": prompt},\n",
"                {\"role\": \"assistant\", \"content\": \"Here is the summary of the legal document: <summary>\"}\n",
"            ],\n",
"            stop_sequences=[\"</summary>\"]\n",
"        )\n",
"        return response.content[0].text\n",
"\n",
"    def rank_documents(self, query: str, top_k: int = 3) -> List[Tuple[str, float]]:\n",
"        \"\"\"\n",
"        Rank documents based on their relevance to the given query.\n",
"        We use Haiku here as a cheaper, faster model for ranking.\n",
"        \"\"\"\n",
"        ranked_scores = []\n",
"        for summary in self.summaries:\n",
"\n",
"            prompt = f\"Legal document summary: {summary}\\n\\nLegal query: {query}\\n\\nRate the relevance of this legal document to the query on a scale of 0 to 10. Only output the numeric value:\"\n",
"\n",
"            response = self.client.messages.create(\n",
"                model=\"claude-3-haiku-20240307\",\n",
"                max_tokens=2,\n",
"                temperature=0,\n",
"                messages=[\n",
"                    {\"role\": \"user\", \"content\": prompt}\n",
"                ]\n",
"            )\n",
"            ranked_score = float(response.content[0].text)\n",
"            ranked_scores.append(ranked_score)\n",
"\n",
"        # Sort scores in descending order and keep the top_k document indices\n",
"        ranked_indices = np.argsort(ranked_scores)[::-1][:top_k]\n",
"        return [(self.documents[i][\"id\"], ranked_scores[i]) for i in ranked_indices]\n",
"\n",
"    def extract_relevant_clauses(self, doc_id: str, query: str) -> List[str]:\n",
"        \"\"\"\n",
"        Extracts relevant clauses from a document based on a query.\n",
"        \"\"\"\n",
"        doc_content = next(doc[\"content\"] for doc in self.documents if doc[\"id\"] == doc_id)\n",
"\n",
"        prompt = f\"\"\"\n",
"        Given the following legal query and document content, extract the most relevant clauses or sections and write the answer to the query.\n",
"        Provide each relevant clause or section separately, preserving the original legal language:\n",
"\n",
"        Legal query: {query}\n",
"\n",
"        Document content: {doc_content}...\n",
"\n",
"        Relevant clauses or sections (separated by '---'):\"\"\"\n",
"\n",
"        response = self.client.messages.create(\n",
"            model=\"claude-3-5-sonnet-20241022\",\n",
"            max_tokens=1000,\n",
"            temperature=0,\n",
"            messages=[\n",
"                {\"role\": \"user\", \"content\": prompt}\n",
"            ]\n",
"        )\n",
"\n",
"        # Split the response on the '---' separator lines requested in the prompt\n",
"        clauses = re.split(r'\\n\\s*---\\s*\\n', response.content[0].text.strip())\n",
"        return [clause.strip() for clause in clauses if clause.strip()]"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Initial ranking: [('doc1', 8.0), ('doc3', 0.0), ('doc2', 0.0)]\n",
"\n",
"Relevant clauses from the top-ranked document:\n",
"Clause 1: COMMERCIAL SUBLEASE AGREEMENT\n",
"\n",
"THIS COMMERCIAL SUBLEASE AGREEMENT (hereinafter referred to as the \"Sublease\") is made and entered into on this 15th day of August, 2023 (the \"Effective Date\"), by and between:\n",
"\n",
"SUBLESSOR: Apex Innovations, Inc., a Delaware corporation with its principal place of business at 1234 Tech Boulevard, Suite 5000, San Francisco, CA 94105 (hereinafter referred to as the \"Sublessor\")\n",
"Clause 2: WHEREAS, Sublessor is the Tenant under that certain Master Lease Agreement dated January 1, 2020 (hereinafter referred to as the \"Master Lease\"), wherein Innovate Properties, LLP (hereinafter referred to as the \"Master Lessor\") leased to Sublessor those certain premises consisting of approximately 50,000 square feet of office space located at 9876 Innovation Park, Building C, Floors 10-12, San Francisco, CA 94107 (hereinafter referred to as the \"Master Premises\");\n",
"Clause 3: Answer: There appears to be an error in the legal query. The query refers to \"Apex Innovations, LLC\" but the document shows that the sublessor is actually \"Apex Innovations, Inc.\", a Delaware corporation. The contract for Apex Innovations, Inc. is:\n",
"\n",
"1. A Commercial Sublease Agreement dated August 15, 2023, where they are the Sublessor\n",
"2. A Master Lease Agreement dated January 1, 2020, where they are the Tenant under Innovate Properties, LLP\n"
]
}
],
"source": [
"from data.multiple_subleases import document1, document2, document3\n",
"\n",
"lsid = LegalSummaryIndexedDocuments(client=client)\n",
"\n",
"# Add documents\n",
"lsid.add_document(\"doc1\", document1)\n",
"lsid.add_document(\"doc2\", document2)\n",
"lsid.add_document(\"doc3\", document3)\n",
"\n",
"# Generate summaries - this would happen at ingestion\n",
"lsid.generate_summaries()\n",
"\n",
"# Rank documents for a legal query\n",
"legal_query = \"What contract is for the sublessor Apex Innovations, LLC?\"\n",
"ranked_results = lsid.rank_documents(legal_query)\n",
"\n",
"print(\"Initial ranking:\", ranked_results)\n",
"\n",
"# Extract relevant clauses from the top-ranked document\n",
"top_doc_id = ranked_results[0][0]\n",
"relevant_clauses = lsid.extract_relevant_clauses(top_doc_id, legal_query)\n",
"\n",
"print(\"\\nRelevant clauses from the top-ranked document:\")\n",
"for i, clause in enumerate(relevant_clauses[1:], 1):\n",
" print(f\"Clause {i}: {clause}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Best Practices for Summarization RAG\n",
"\n",
"- Optimal Summary Length: Experiment with different summary lengths to find the right balance between conciseness and informativeness.\n",
"- Iterative Reranking: Consider multiple rounds of reranking for more precise results, especially with larger document sets.\n",
"- Caching: Implement caching mechanisms for summaries and initial rankings to improve performance on repeated queries.\n",
"\n",
"Summary Indexed Documents offer a powerful approach to RAG, particularly excelling in scenarios involving large documents or when precise information retrieval is crucial. By combining document summarization, model-based relevance scoring, and optional reranking, this method provides an efficient and effective way to retrieve and present relevant information to language models."
]
},
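{
"cell_type": "markdown",
"metadata": {},
"source": [
"As one way to approach the caching suggestion above, summaries can be stored under a hash of the document text so that unchanged documents are never summarized twice. This is a minimal in-memory sketch under stated assumptions: `SummaryCache` is a hypothetical class that is not part of the code in this guide, and the lambda stands in for a real summarization call.\n",
"\n",
"```python\n",
"import hashlib\n",
"from typing import Callable, Dict\n",
"\n",
"\n",
"class SummaryCache:\n",
"    # Hypothetical in-memory cache keyed by a SHA-256 hash of the document text.\n",
"    def __init__(self, summarize_fn: Callable[[str], str]):\n",
"        self.summarize_fn = summarize_fn\n",
"        self._store: Dict[str, str] = {}\n",
"        self.misses = 0\n",
"\n",
"    def get(self, content: str) -> str:\n",
"        key = hashlib.sha256(content.encode('utf-8')).hexdigest()\n",
"        if key not in self._store:\n",
"            self.misses += 1  # only call the (expensive) summarizer on a miss\n",
"            self._store[key] = self.summarize_fn(content)\n",
"        return self._store[key]\n",
"\n",
"\n",
"cache = SummaryCache(lambda text: text[:20])  # stand-in for a model call\n",
"first = cache.get('This sublease agreement is made between two parties.')\n",
"second = cache.get('This sublease agreement is made between two parties.')\n",
"print(first == second, cache.misses)\n",
"```\n",
"\n",
"Keying on a content hash rather than a document ID means an edited document is re-summarized automatically. For anything beyond a single session, the dictionary would need to be replaced by persistent storage."
]
},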
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Evaluations\n",
"\n",
"As mentioned in the introduction to this cookbook, evaluating the quality of a summary is hard work. This is because there are many ways to summarize a document, and different summaries may be equally valid. Depending on the use case, different aspects of a summary may be more or less important.\n",
"\n",
"You can read more about our empirical approach to prompt engineering [here](https://docs.claude.com/en/docs/prompt-engineering). A Jupyter Notebook is a great place to start prompt engineering, but as your datasets grow larger and your prompts more numerous, it is important to use tooling that will scale with you.\n",
"\n",
"In this section of the guide we will explore [Promptfoo](https://www.promptfoo.dev/), an open source LLM evaluation toolkit. To get started, head over to the `./evaluation` directory and check out `./evaluation/README.md`.\n",
"\n",
"When you have successfully run an evaluation, come back here to view the results. Once you have some results, you can also explore them interactively with the command `npx promptfoo@latest view`."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/var/folders/c8/rjj6d5_15tj4qh_zhlnz9xxr0000gp/T/ipykernel_58010/2701104606.py:7: MatplotlibDeprecationWarning: The seaborn styles shipped by Matplotlib are deprecated since 3.6, as they no longer correspond to the styles shipped by seaborn. However, they will remain available as 'seaborn-v0_8-<style>'. Alternatively, directly use the seaborn API instead.\n",
" plt.style.use('seaborn')\n"
]
},
{
"data": {
"text/plain": [
"<Figure size 1200x600 with 0 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 800x550 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 800x800 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>[3.0 Haiku] prompts.py:basic_summarize_score</th>\n",
" <th>[3.0 Haiku] prompts.py:guided_legal_summary_score</th>\n",
" <th>[3.0 Haiku] prompts.py:summarize_long_document_score</th>\n",
" <th>[3.5 Sonnet] prompts.py:basic_summarize_score</th>\n",
" <th>[3.5 Sonnet] prompts.py:guided_legal_summary_score</th>\n",
" <th>[3.5 Sonnet] prompts.py:summarize_long_document_score</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>9.000000</td>\n",
" <td>9.000000</td>\n",
" <td>9.000000</td>\n",
" <td>9.000000</td>\n",
" <td>9.000000</td>\n",
" <td>9.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>1.423333</td>\n",
" <td>1.443333</td>\n",
" <td>1.522222</td>\n",
" <td>1.088889</td>\n",
" <td>1.330000</td>\n",
" <td>1.475556</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>0.135647</td>\n",
" <td>0.146969</td>\n",
" <td>0.092030</td>\n",
" <td>0.535547</td>\n",
" <td>0.086458</td>\n",
" <td>0.285750</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>1.190000</td>\n",
" <td>1.270000</td>\n",
" <td>1.440000</td>\n",
" <td>0.000000</td>\n",
" <td>1.230000</td>\n",
" <td>0.750000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>1.400000</td>\n",
" <td>1.300000</td>\n",
" <td>1.460000</td>\n",
" <td>1.210000</td>\n",
" <td>1.280000</td>\n",
" <td>1.450000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>1.440000</td>\n",
" <td>1.450000</td>\n",
" <td>1.460000</td>\n",
" <td>1.290000</td>\n",
" <td>1.310000</td>\n",
" <td>1.600000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>1.510000</td>\n",
" <td>1.490000</td>\n",
" <td>1.630000</td>\n",
" <td>1.400000</td>\n",
" <td>1.340000</td>\n",
" <td>1.640000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>1.600000</td>\n",
" <td>1.660000</td>\n",
" <td>1.660000</td>\n",
" <td>1.490000</td>\n",
" <td>1.480000</td>\n",
" <td>1.650000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" [3.0 Haiku] prompts.py:basic_summarize_score \\\n",
"count 9.000000 \n",
"mean 1.423333 \n",
"std 0.135647 \n",
"min 1.190000 \n",
"25% 1.400000 \n",
"50% 1.440000 \n",
"75% 1.510000 \n",
"max 1.600000 \n",
"\n",
" [3.0 Haiku] prompts.py:guided_legal_summary_score \\\n",
"count 9.000000 \n",
"mean 1.443333 \n",
"std 0.146969 \n",
"min 1.270000 \n",
"25% 1.300000 \n",
"50% 1.450000 \n",
"75% 1.490000 \n",
"max 1.660000 \n",
"\n",
" [3.0 Haiku] prompts.py:summarize_long_document_score \\\n",
"count 9.000000 \n",
"mean 1.522222 \n",
"std 0.092030 \n",
"min 1.440000 \n",
"25% 1.460000 \n",
"50% 1.460000 \n",
"75% 1.630000 \n",
"max 1.660000 \n",
"\n",
" [3.5 Sonnet] prompts.py:basic_summarize_score \\\n",
"count 9.000000 \n",
"mean 1.088889 \n",
"std 0.535547 \n",
"min 0.000000 \n",
"25% 1.210000 \n",
"50% 1.290000 \n",
"75% 1.400000 \n",
"max 1.490000 \n",
"\n",
" [3.5 Sonnet] prompts.py:guided_legal_summary_score \\\n",
"count 9.000000 \n",
"mean 1.330000 \n",
"std 0.086458 \n",
"min 1.230000 \n",
"25% 1.280000 \n",
"50% 1.310000 \n",
"75% 1.340000 \n",
"max 1.480000 \n",
"\n",
" [3.5 Sonnet] prompts.py:summarize_long_document_score \n",
"count 9.000000 \n",
"mean 1.475556 \n",
"std 0.285750 \n",
"min 0.750000 \n",
"25% 1.450000 \n",
"50% 1.600000 \n",
"75% 1.640000 \n",
"max 1.650000 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"import re\n",
"\n",
"%matplotlib inline\n",
"# The bare 'seaborn' style name is deprecated since Matplotlib 3.6; use the versioned name instead\n",
"plt.style.use('seaborn-v0_8')\n",
"\n",
"# Load the data\n",
"df = pd.read_csv('data/results.csv')\n",
"\n",
"# Function to extract PASS/FAIL and score from cells formatted like '[PASS] (1.23)'\n",
"def extract_result(text):\n",
"    match = re.search(r'\\[(PASS|FAIL)\\]\\s*\\((\\d+\\.\\d+)\\)', str(text))\n",
"    if match:\n",
"        return match.group(1), float(match.group(2))\n",
"    # Cells that don't match are recorded as UNKNOWN with a score of 0.0\n",
"    return 'UNKNOWN', 0.0\n",
"\n",
"# Apply the extraction to relevant columns\n",
"for col in df.columns[2:]:\n",
"    df[f'{col}_result'], df[f'{col}_score'] = zip(*df[col].apply(extract_result))\n",
"\n",
"# Prepare data for grouped pass rates\n",
"models = ['3.5 Sonnet', '3.0 Haiku']\n",
"prompts = ['basic_summarize', 'guided_legal_summary', 'summarize_long_document']\n",
"\n",
"results = []\n",
"for model in models:\n",
"    for prompt in prompts:\n",
"        col = f'[{model}] prompts.py:{prompt}_result'\n",
"        if col in df.columns:\n",
"            pass_rate = (df[col] == 'PASS').mean()\n",
"            results.append({'Model': model, 'Prompt': prompt, 'Pass Rate': pass_rate})\n",
"\n",
"result_df = pd.DataFrame(results)\n",
"\n",
"# 1. Grouped bar chart of pass rates\n",
"# DataFrame.plot creates its own figure, so pass figsize here rather than calling plt.figure first\n",
"result_pivot = result_df.pivot(index='Prompt', columns='Model', values='Pass Rate')\n",
"result_pivot.plot(kind='bar', figsize=(12, 6))\n",
"plt.title('Pass Rate by Model and Prompt')\n",
"plt.ylabel('Pass Rate')\n",
"plt.legend(title='Model')\n",
"plt.xticks(rotation=45, ha='right')\n",
"plt.tight_layout()\n",
"plt.show()\n",
"\n",
"# 2. Box plot of scores\n",
"plt.figure(figsize=(8, 8))\n",
"score_cols = [col for col in df.columns if col.endswith('_score')]\n",
"score_data = df[score_cols].melt()\n",
"sns.boxplot(x='variable', y='value', data=score_data)\n",
"plt.title('Distribution of Scores')\n",
"plt.xticks(rotation=90)\n",
"plt.xlabel('Model and Prompt')\n",
"plt.ylabel('Score')\n",
"plt.tight_layout()\n",
"plt.show()\n",
"\n",
"# Display summary statistics\n",
"summary_stats = df[score_cols].describe()\n",
"display(summary_stats)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Looking at the results, our best performer appears to be 3.5 Sonnet, with a 66% pass rate across all evals and only 3 failed tests out of 45 (when one test fails for a prompt, the whole prompt is deemed a fail). And this is just the beginning: we are using entirely notional data here that was either (a) generated by Claude or (b) taken from the SEC.gov website. We can do much better with real data, because we will know more about the specific problem set we are working with."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Iterative Improvement\n",
"\n",
"As we look into the eval results more, it's clear there's still room for improvement. This is where the iterative part of prompt engineering comes in. Here are some steps we can take to improve our results:\n",
"\n",
"1. Analyze the Promptfoo results to identify strengths and weaknesses. For example, our `contains` eval was failing a lot, probably because some of the documents don't contain the information required in the XML tags. We should refine this eval if we want to assess performance accurately (but this is just an example!).\n",
"2. Refine prompts to address specific issues (e.g., improve conciseness or completeness). We saw that multi-shot prompting was a really good first attempt; we should combine it with some of the advanced techniques to improve performance further.\n",
"3. Experiment with different chunking strategies for long documents.\n",
"4. Fine-tune temperature and max_tokens parameters.\n",
"5. Implement post-processing steps to enhance summary quality.\n",
"\n",
"## Conclusion and Best Practices\n",
"\n",
"In this guide, we've covered a range of techniques for summarizing documents with Claude, with a focus on legal documents. Building a perfect summarization system and eval framework is an art: it takes a combination of these methods to succeed. As we mentioned at the beginning, summarization is a very subjective topic, yet we've had a good stab at finding feasible ways to evaluate it and feel comfortable with our results. Remember, too, that you aren't benchmarking your results against 100% accuracy. You're benchmarking against how well you could perform this complex task yourself. With the speed and efficiency of Claude demonstrated in this guide, you start to realize the true benefits of such a methodical approach, which frees you to spend time on the real decision making.\n",
"\n",
"Wrapping up the advice here, we've included some best practices to keep in mind:\n",
"\n",
"1. Craft clear and specific prompts. Use instructions like \"don't preamble\" to constrain the output.\n",
"2. Use at least 2 examples (multi-shot prompting).\n",
"3. Use guided summarization for domain-specific documents.\n",
"4. Implement effective, advanced strategies for long documents, such as chunking and summary-indexed retrieval.\n",
"5. Regularly evaluate and refine your approach.\n",
"6. Consider the ethical implications and limitations of AI-generated summaries."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "py311",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}