mirror of https://github.com/anthropics/claude-cookbooks.git, synced 2025-10-06 01:00:28 +03:00
	Updates to Text to SQL guide
		| @@ -2078,6 +2078,58 @@ | ||||
|     "\n", | ||||
|     "" | ||||
|    ] | ||||
|   }, | ||||
|   { | ||||
|    "cell_type": "markdown", | ||||
|    "metadata": {}, | ||||
|    "source": [ | ||||
|     "## Further Exploration & Next Steps\n", | ||||
|     "\n", | ||||
|     "This guide covers the basics of building a Text-to-SQL system with Claude. Here are some directions to explore that can help improve your solution:\n", | ||||
|     "\n", | ||||
|     "### Refining Retrieval Performance\n", | ||||
|     "\n", | ||||
|     "As databases grow, it's important to make sure your RAG system finds the most relevant and current information:\n", | ||||
|     "\n", | ||||
|     "1. **Recent usage filter**: Focus on actively maintained, fresh data by including in the RAG lookup only tables that have been queried at least a set number of times within a given timeframe.\n", | ||||
|     "\n", | ||||
|     "2. **Query frequency ranking**: Prioritize more commonly used data by ranking RAG results based on how often tables are queried in production.\n", | ||||
|     "\n", | ||||
|     "3. **Regular updates**: Set up a system to update your vector database when schemas change, keeping it current with your database.\n", | ||||
|     "\n", | ||||
|     "4. **Context-aware embeddings**: Try embedding table relationships and usage patterns, in addition to table names, to improve search relevance.\n", | ||||
|     "\n", | ||||
|     "### Adding More Context to Prompts\n", | ||||
|     "\n", | ||||
|     "Giving Claude more information about your data's structure and content, beyond the database schema alone, can help it generate better queries:\n", | ||||
|     "\n", | ||||
|     "1. **Data samples**: Include a few rows of actual data for each relevant table in your prompts. For example:\n", | ||||
|     "\n", | ||||
|     "```\n", | ||||
|     "<data_sample>\n", | ||||
|     "Sample data for employees table:\n", | ||||
|     "id | name         | age | department_id | salary  | hire_date\n", | ||||
|     "1  | John Doe     | 35  | 2             | 75000.0 | 2020-01-15\n", | ||||
|     "2  | Jane Smith   | 28  | 3             | 65000.0 | 2021-03-01\n", | ||||
|     "</data_sample>\n", | ||||
|     "```\n", | ||||
|     "\n", | ||||
|     "2. **Column statistics**: Add useful facts about each column, such as:\n", | ||||
|     "   - Percentage of empty values\n", | ||||
|     "   - Lowest and highest values for numeric columns\n", | ||||
|     "   - Most common values for categorical columns\n", | ||||
|     "\n", | ||||
|     "3. **Data quality notes**: Mention any known issues or special cases with specific columns or tables.\n", | ||||
|     "\n", | ||||
|     "4. **Data catalog information**: If you use tools like dbt to manage your data, include relevant details from your data catalog:\n", | ||||
|     "   - Table overviews from dbt .yml files\n", | ||||
|     "   - Column explanations, including calculations or business rules\n", | ||||
|     "   - Table relationships and data lineage\n", | ||||
|     "   - Data quality checks and expectations\n", | ||||
|     "   - Update frequency for time-sensitive queries\n", | ||||
|     "\n", | ||||
|     "5. **Business context**: Add information about how the data is used in your organization, common types of analyses performed, or important business metrics derived from the data." | ||||
|    ] | ||||
|   } | ||||
|  ], | ||||
|  "metadata": { | ||||
|   | ||||
	 Mahesh Murag