Updates to Text to SQL guide

Mahesh Murag
2024-09-25 17:16:54 +02:00
parent a3c89d8a4a
commit 2e28921650

@@ -2078,6 +2078,58 @@
"![image-3.png](attachment:image-3.png)\n",
"![image.png](attachment:image.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Further Exploration & Next Steps\n",
"\n",
"This guide covers the basics of building a Text-to-SQL system with Claude. Here are some directions to explore that can help improve your solution:\n",
"\n",
"### Refining Retrieval Performance\n",
"\n",
"As databases grow, it's important to make sure your RAG system finds the most relevant and current information:\n",
"\n",
"1. **Recent usage filter**: Focus on actively-maintained and fresh data by only including tables in the RAG lookup that have been queried a certain number of times in a set timeframe.\n",
"\n",
"2. **Query frequency ranking**: Prioritize more commonly-used data by ranking RAG results based on how often tables are queried in production.\n",
"\n",
"3. **Regular updates**: Set up a system to update your vector database when schemas change, keeping it current with your database.\n",
"\n",
"4. **Context-aware embeddings**: Try embedding table relationships and usage patterns, along with their names, to improve search relevance.\n",
"\n",
"### Adding More Context to Prompts\n",
"\n",
"Giving Claude more information about your data structure and content in prompts, in addition to database schemas, can help it generate better queries:\n",
"\n",
"1. **Data samples**: Include a few rows of actual data for each relevant table in your prompts. For example:\n",
"\n",
"```\n",
"<data_sample>\n",
"Sample data for employees table:\n",
"id | name | age | department_id | salary | hire_date\n",
"1 | John Doe | 35 | 2 | 75000.0 | 2020-01-15\n",
"2 | Jane Smith | 28 | 3 | 65000.0 | 2021-03-01\n",
"</data_sample>\n",
"```\n",
"\n",
"2. **Column statistics**: Add useful facts about each column, such as:\n",
" - Percentage of empty values\n",
" - Lowest and highest values for number columns\n",
" - Most common values for category columns\n",
"\n",
"3. **Data quality notes**: Mention any known issues or special cases with specific columns or tables.\n",
"\n",
"4. **Data catalog information**: If you use tools like dbt to manage your data, include relevant details from your data catalog:\n",
" - Table overviews from dbt .yml files\n",
" - Column explanations, including calculations or business rules\n",
" - Table relationships and data lineage\n",
" - Data quality checks and expectations\n",
" - Update frequency for time-sensitive queries\n",
"\n",
"5. **Business context**: Add information about how the data is used in your organization, common types of analyses performed, or important business metrics derived from the data."
]
}
],
"metadata": {