Updates to Text to SQL guide

2025-10-06 01:00:28 +03:00 · 2024-09-25 17:16:54 +02:00
parent a3c89d8a4a
commit 2e28921650
1 changed files with 52 additions and 0 deletions
--- a/skills/text_to_sql/guide.ipynb
+++ b/skills/text_to_sql/guide.ipynb
@@ -2078,6 +2078,58 @@
    "![image-3.png](attachment:image-3.png)\n",
    "![image.png](attachment:image.png)"
   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Further Exploration & Next Steps\n",
+    "\n",
+    "This guide covers the basics of building a Text-to-SQL system with Claude. Here are some directions you can explore to improve your solution:\n",
+    "\n",
+    "### Refining Retrieval Performance\n",
+    "\n",
+    "As your database grows, it becomes more important to make sure your RAG system retrieves the most relevant and current schema information:\n",
+    "\n",
+    "1. **Recent usage filter**: Focus on actively maintained, fresh data by including in the RAG lookup only those tables that have been queried at least a minimum number of times within a recent time window.\n",
+    "\n",
+    "2. **Query frequency ranking**: Prioritize commonly used data by ranking RAG results according to how often each table is queried in production.\n",
+    "\n",
+    "3. **Regular updates**: Set up a process that updates your vector database whenever table schemas change, so retrieved schemas always match your live database.\n",
+    "\n",
+    "4. **Context-aware embeddings**: Try embedding table relationships and usage patterns along with table names to improve search relevance.\n",
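+    "\n",
+    "The first two ideas can be combined into a small post-processing step on your retrieval results. The sketch below is one possible approach, assuming a hypothetical query log of `(table_name, queried_at)` pairs (for example, parsed from your database's query history) and a list of candidate tables already returned by the vector search. The function name `rerank_by_usage` and its `window_days` and `min_queries` defaults are illustrative, not part of this guide's code:\n",
+    "\n",
+    "```python\n",
+    "from collections import Counter\n",
+    "from datetime import datetime, timedelta\n",
+    "\n",
+    "\n",
+    "def rerank_by_usage(candidates, query_log, now, window_days=90, min_queries=5):\n",
+    "    # candidates: table names returned by the vector search\n",
+    "    # query_log: hypothetical (table_name, queried_at) pairs from query history\n",
+    "    cutoff = now - timedelta(days=window_days)\n",
+    "    # Count only the queries that fall inside the recent time window\n",
+    "    counts = Counter(table for table, queried_at in query_log if queried_at >= cutoff)\n",
+    "    # Recent usage filter: keep tables queried at least min_queries times\n",
+    "    active = [table for table in candidates if counts[table] >= min_queries]\n",
+    "    # Query frequency ranking: most-queried first; ties keep retrieval order\n",
+    "    return sorted(active, key=lambda table: counts[table], reverse=True)\n",
+    "\n",
+    "\n",
+    "now = datetime(2024, 9, 25)\n",
+    "log = (\n",
+    "    [('employees', now - timedelta(days=1))] * 12\n",
+    "    + [('departments', now - timedelta(days=3))] * 6\n",
+    "    + [('legacy_payroll', now - timedelta(days=400))] * 50  # outside the window\n",
+    ")\n",
+    "print(rerank_by_usage(['legacy_payroll', 'departments', 'employees'], log, now))\n",
+    "# ['employees', 'departments']\n",
+    "```\n",
+    "\n",
+    "Filtering before ranking keeps stale tables out of the prompt entirely, while the ranking can decide which of the remaining schemas to include first when the context budget is tight.\n",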
+    "\n",
+    "### Adding More Context to Prompts\n",
+    "\n",
+    "Beyond database schemas, giving Claude more information about your data's structure and content can help it generate more accurate queries:\n",
+    "\n",
+    "1. **Data samples**: Include a few rows of actual data for each relevant table in your prompts. For example:\n",
+    "\n",
+    "```\n",
+    "<data_sample>\n",
+    "Sample data for employees table:\n",
+    "id | name         | age | department_id | salary  | hire_date\n",
+    "1  | John Doe     | 35  | 2             | 75000.0 | 2020-01-15\n",
+    "2  | Jane Smith   | 28  | 3             | 65000.0 | 2021-03-01\n",
+    "</data_sample>\n",
+    "```\n",
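+    "\n",
+    "Rather than writing samples by hand, you can generate them from the database. The following is a minimal sketch assuming a SQLite database accessed through Python's built-in `sqlite3` module; the `format_data_sample` helper and its pipe-separated layout are illustrative assumptions. Because the table name is interpolated directly into the SQL, only pass trusted table names (for example, ones taken from your schema):\n",
+    "\n",
+    "```python\n",
+    "import sqlite3\n",
+    "\n",
+    "\n",
+    "def format_data_sample(conn, table, limit=3):\n",
+    "    # Table names cannot be bound as parameters, so `table` must be trusted\n",
+    "    cursor = conn.execute(f'SELECT * FROM {table} LIMIT ?', (limit,))\n",
+    "    columns = [desc[0] for desc in cursor.description]\n",
+    "    lines = [f'Sample data for {table} table:', ' | '.join(columns)]\n",
+    "    for row in cursor.fetchall():\n",
+    "        lines.append(' | '.join(str(value) for value in row))\n",
+    "    # Wrap the sample in the same tags used in the prompt example above\n",
+    "    return '\\n'.join(['<data_sample>', *lines, '</data_sample>'])\n",
+    "\n",
+    "\n",
+    "conn = sqlite3.connect(':memory:')\n",
+    "conn.execute('CREATE TABLE employees (id INTEGER, name TEXT, salary REAL)')\n",
+    "conn.executemany(\n",
+    "    'INSERT INTO employees VALUES (?, ?, ?)',\n",
+    "    [(1, 'John Doe', 75000.0), (2, 'Jane Smith', 65000.0)],\n",
+    ")\n",
+    "print(format_data_sample(conn, 'employees'))\n",
+    "```\n",
+    "\n",
+    "Without an `ORDER BY`, SQLite does not guarantee which rows a `LIMIT` query returns, so add one if you want the same sample on every run.\n",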
+    "\n",
+    "2. **Column statistics**: Add useful facts about each column, such as:\n",
+    "   - Percentage of null (empty) values\n",
+    "   - Minimum and maximum values for numeric columns\n",
+    "   - Most common values for categorical columns\n",
+    "\n",
+    "3. **Data quality notes**: Mention any known issues or special cases with specific columns or tables.\n",
+    "\n",
+    "4. **Data catalog information**: If you use tools like dbt to manage your data, include relevant details from your data catalog:\n",
+    "   - Table descriptions from dbt `.yml` files\n",
+    "   - Column explanations, including calculations or business rules\n",
+    "   - Table relationships and data lineage\n",
+    "   - Data quality checks and expectations\n",
+    "   - How often each table is refreshed, which matters for time-sensitive queries\n",
+    "\n",
+    "5. **Business context**: Add information about how the data is used in your organization, common types of analyses performed, or important business metrics derived from the data."
+   ]
  }
 ],
 "metadata": {