add bm25 hybrid to docs (#137)

Signed-off-by: ChengZi <chen.zhang@zilliz.com>
2025-10-06 01:10:02 +03:00 · 2025-08-08 11:56:10 +08:00
parent 5589686e50
commit 5210b97857
5 changed files with 12 additions and 5 deletions
--- a/.env.example
+++ b/.env.example
@@ -82,3 +82,6 @@ SPLITTER_TYPE=ast
 # Additional ignore patterns to exclude files/directories (comma-separated)
 # Example: temp/**,*.backup,private/**,uploads/**
 # CUSTOM_IGNORE_PATTERNS=temp/**,*.backup,private/**
+# Whether to use hybrid search mode. If true, it will use both dense vector and BM25; if false, it will use only dense vector search.
+# HYBRID_MODE=true
--- a/.gitignore
+++ b/.gitignore
@@ -54,6 +54,9 @@ Thumbs.db
 *.crx
 *.pem
 __pycache__/
 *.log
 .claude/*
 CLAUDE.md
--- a/README.md
+++ b/README.md
@@ -403,7 +403,7 @@ For more detailed MCP environment variable configuration, see our [Environment V
 ### 🔧 Implementation Details
- 🔍 **Semantic Code Search**: Ask questions like *"find functions that handle user authentication"* and get relevant, context-rich code instantly.
+- 🔍 **Hybrid Code Search**: Ask questions like *"find functions that handle user authentication"* and get relevant, context-rich code instantly. Results come from hybrid search, which combines BM25 keyword matching with dense vector similarity.
 - 🧠 **Context-Aware**: Explore large codebases and understand how different parts relate to each other, even across millions of lines of code.
 - ⚡ **Incremental Indexing**: Efficiently re-index only changed files using Merkle trees.
 - 🧩 **Intelligent Code Chunking**: Analyze code in Abstract Syntax Trees (AST) for chunking.
--- a/docs/getting-started/environment-variables.md
+++ b/docs/getting-started/environment-variables.md
@@ -40,6 +40,7 @@ Claude Context supports a global configuration file at `~/.context/.env` to simp
 ### Advanced Configuration
 | Variable | Description | Default |
 |----------|-------------|---------|
+| `HYBRID_MODE` | Enable hybrid search (BM25 + dense vector). Set to `false` for dense-only search | `true` |
 | `EMBEDDING_BATCH_SIZE` | Batch size for processing. Larger batch size means less indexing time | `100` |
 | `SPLITTER_TYPE` | Code splitter type: `ast`, `langchain` | `ast` |
 | `CUSTOM_EXTENSIONS` | Additional file extensions to include (comma-separated, e.g., `.vue,.svelte,.astro`) | None |
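
The `HYBRID_MODE` row above documents a default of `true` and `false` as the dense-only switch. As a minimal sketch of how such a flag could be read, the following uses a hypothetical helper, `isHybridModeEnabled`, which is not taken from the project's source. Treating every value other than `false` (case-insensitive) as enabled is an assumption made for this illustration.

```typescript
// Hypothetical helper, not part of the project's code.
// Assumption: unset or empty falls back to the documented default (`true`),
// and only the literal string "false" (case-insensitive) disables hybrid search.
function isHybridModeEnabled(env: Record<string, string | undefined>): boolean {
  const raw = env["HYBRID_MODE"];
  if (raw === undefined || raw.trim() === "") {
    return true; // documented default
  }
  return raw.trim().toLowerCase() !== "false";
}

// Example inputs covering the documented cases.
const unsetResult = isHybridModeEnabled({});
const disabledResult = isHybridModeEnabled({ HYBRID_MODE: "false" });
const enabledResult = isHybridModeEnabled({ HYBRID_MODE: "TRUE" });
```

Passing the environment in as an argument, rather than reading a global inside the function, keeps the sketch easy to test.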
--- a/docs/getting-started/overview.md
+++ b/docs/getting-started/overview.md
@@ -6,8 +6,8 @@ Claude Context is a powerful semantic code search tool that gives AI coding assi
 ## Key Features
-### 🔍 Semantic Code Search
+### 🔍 Hybrid Code Search
-Ask natural language questions like "find functions that handle user authentication" and get relevant code snippets from across your entire codebase.
+Ask natural language questions like "find functions that handle user authentication" and get relevant code snippets from across your entire codebase. Results come from hybrid search, which combines BM25 keyword matching with dense vector similarity.
 ### 🧠 Context-Aware Understanding
 Discover relationships between different parts of your code, even across millions of lines. The system understands code structure, patterns, and dependencies.
@@ -38,8 +38,8 @@ Each code chunk is converted into high-dimensional vectors using state-of-the-ar
 ### 4. Vector Storage
 Embeddings are stored in a vector database (Milvus/Zilliz Cloud) for efficient similarity search.
-### 5. Semantic Search
+### 5. Hybrid Search
-Natural language queries are converted to vectors and matched against stored code embeddings.
+Natural language queries are run through both dense vector search and BM25 sparse retrieval, and the two result lists are merged into a single ranking with RRF (Reciprocal Rank Fusion).
 ## Architecture Components
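
The "Hybrid Search" step added in this commit names RRF (Reciprocal Rank Fusion) as the way dense and BM25 results are combined. The sketch below shows the general technique only, not the project's implementation. The function name, the chunk identifiers, and the constant `k = 60` (a commonly used value) are all assumptions chosen for illustration. Each result scores `1 / (k + rank)` in every list it appears in, and the scores are summed.

```typescript
// Illustrative Reciprocal Rank Fusion over ranked lists of result ids.
// Assumptions: ranks start at 1, and k = 60 is a conventional choice,
// not a value confirmed by the documentation above.
function reciprocalRankFusion(
  rankings: string[][],
  k: number = 60
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Hypothetical result lists: one from dense vector search, one from BM25.
const denseHits = ["auth.ts#login", "session.ts#refresh", "user.ts#create"];
const bm25Hits = ["session.ts#refresh", "token.ts#verify", "auth.ts#login"];

const fused = reciprocalRankFusion([denseHits, bm25Hits]);
```

Because RRF uses only rank positions, it does not require the BM25 scores and the vector similarity scores to be on a comparable scale. A chunk that ranks well in both lists, such as `session.ts#refresh` here, rises above one that is first in a single list only.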