more comments and markdown

rotbka
2025-02-15 21:05:36 +02:00
parent c1d4bb450f
commit db8b6a7b6c


@@ -48,9 +48,6 @@
"\n",
"2. **Scoring & Selection** \n",
" - Each documents overall score combines **relevance** and **diversity**: \n",
" \\[\n",
" \\text{Score}(d) = \\alpha \\cdot \\text{Relevance}(d)\\;-\\;\\beta \\cdot \\text{Diversity}(d)\n",
" \\] \n",
" - Select the highest-scoring document, then penalize documents that are overly similar to it. \n",
" - Repeat until top-k documents are identified.\n",
"\n",
@@ -260,7 +257,7 @@
"metadata": {},
"source": [
"### Regular top k retrieval\n",
"- This demonstration shows that when database is dense (here we simulate density by loading each document 5 times), the results are not good, we don't get the most relevant results. "
"- This demonstration shows that when database is dense (here we simulate density by loading each document 5 times), the results are not good, we don't get the most relevant results. Note that the top 3 results are all repetitions of the same document."
]
},
{
@@ -359,7 +356,8 @@
"metadata": {},
"source": [
"\n",
"### Definitions of parameters, and the actual function that optimizes both relevance and diversity "
"### Definitions of parameters, and the actual function that optimizes both relevance and diversity \n",
"This is the core function that chooses the top k documents based on relevance and diversity. It uses distances between each candidate document and the query and between candidate documents."
]
},
{
@@ -405,7 +403,12 @@
"metadata": {},
"source": [
"\n",
"### Main function for using the dartboard retrieval. This serves instead of get_context (which is simple RAG)"
"### Main function for using the dartboard retrieval. This serves instead of get_context (which is simple RAG) it:\n",
"\n",
"1. Takes a text query, vectorzes it, gets the top k documents (and their vectors) via simple RAG. \n",
"2. Uses these vectors to calculate the similarities to query and between candidate matches.\n",
"3. Runs the dartboard algorithm to refine the candidate matches to a final list of k documents.\n",
"4. Returns the final list of documents and their scores."
]
},
{