add bm25 hybrid to docs (#137)

Signed-off-by: ChengZi <chen.zhang@zilliz.com>
Cheney Zhang
2025-08-08 11:56:10 +08:00
committed by GitHub
parent 5589686e50
commit 5210b97857
5 changed files with 12 additions and 5 deletions


@@ -82,3 +82,6 @@ SPLITTER_TYPE=ast
 # Additional ignore patterns to exclude files/directories (comma-separated)
 # Example: temp/**,*.backup,private/**,uploads/**
 # CUSTOM_IGNORE_PATTERNS=temp/**,*.backup,private/**
+# Whether to use hybrid search mode. If true, it will use both dense vector and BM25; if false, it will use only dense vector search.
+# HYBRID_MODE=true

.gitignore

@@ -54,6 +54,9 @@ Thumbs.db
 *.crx
 *.pem
+__pycache__/
+*.log
 .claude/*
 CLAUDE.md


@@ -403,7 +403,7 @@ For more detailed MCP environment variable configuration, see our [Environment V
 ### 🔧 Implementation Details
-- 🔍 **Semantic Code Search**: Ask questions like *"find functions that handle user authentication"* and get relevant, context-rich code instantly.
+- 🔍 **Hybrid Code Search**: Ask questions like *"find functions that handle user authentication"* and get relevant, context-rich code instantly using advanced hybrid search (BM25 + dense vector).
 - 🧠 **Context-Aware**: Discover large codebase, understand how different parts of your codebase relate, even across millions of lines of code.
 - ⚡ **Incremental Indexing**: Efficiently re-index only changed files using Merkle trees.
 - 🧩 **Intelligent Code Chunking**: Analyze code in Abstract Syntax Trees (AST) for chunking.


@@ -40,6 +40,7 @@ Claude Context supports a global configuration file at `~/.context/.env` to simp
 ### Advanced Configuration
 | Variable | Description | Default |
 |----------|-------------|---------|
+| `HYBRID_MODE` | Enable hybrid search (BM25 + dense vector). Set to `false` for dense-only search | `true` |
 | `EMBEDDING_BATCH_SIZE` | Batch size for processing. Larger batch size means less indexing time | `100` |
 | `SPLITTER_TYPE` | Code splitter type: `ast`, `langchain` | `ast` |
 | `CUSTOM_EXTENSIONS` | Additional file extensions to include (comma-separated, e.g., `.vue,.svelte,.astro`) | None |
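
Reviewer note: the table documents `HYBRID_MODE` as defaulting to `true`, with `false` selecting dense-only search. A minimal sketch of that behavior follows. The helper name `hybrid_mode_enabled` is hypothetical, and the parsing rule (only the literal `false`, case-insensitive, disables hybrid search) is an assumption, not taken from the project's code.

```python
import os
from typing import Mapping, Optional


def hybrid_mode_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical helper: decide whether hybrid (BM25 + dense) search is on.

    Mirrors the documented default: hybrid search is enabled unless
    HYBRID_MODE is explicitly set to "false".
    """
    env = os.environ if env is None else env
    raw = env.get("HYBRID_MODE")
    if raw is None or raw.strip() == "":
        return True  # documented default when the variable is unset
    # Assumption: any value other than "false" leaves hybrid search enabled.
    return raw.strip().lower() != "false"
```

Passing a plain dictionary instead of reading `os.environ` keeps the helper easy to test, for example `hybrid_mode_enabled({"HYBRID_MODE": "false"})`.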


@@ -6,8 +6,8 @@ Claude Context is a powerful semantic code search tool that gives AI coding assi
 ## Key Features
-### 🔍 Semantic Code Search
-Ask natural language questions like "find functions that handle user authentication" and get relevant code snippets from across your entire codebase.
+### 🔍 Hybrid Code Search
+Ask natural language questions like "find functions that handle user authentication" and get relevant code snippets from across your entire codebase using advanced hybrid search (BM25 + dense vector).
 ### 🧠 Context-Aware Understanding
 Discover relationships between different parts of your code, even across millions of lines. The system understands code structure, patterns, and dependencies.
@@ -38,8 +38,8 @@ Each code chunk is converted into high-dimensional vectors using state-of-the-ar
 ### 4. Vector Storage
 Embeddings are stored in a vector database (Milvus/Zilliz Cloud) for efficient similarity search.
-### 5. Semantic Search
-Natural language queries are converted to vectors and matched against stored code embeddings.
+### 5. Hybrid Search
+Natural language queries are processed using both dense vector embeddings and BM25 sparse retrieval, then combined with RRF (Reciprocal Rank Fusion) for optimal results.
 ## Architecture Components
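
Reviewer note: the new "Hybrid Search" text says dense and BM25 results are combined with RRF. As a rough illustration of what that fusion step does, here is a self-contained sketch of Reciprocal Rank Fusion over two ranked lists. The function name, the file names, and the constant `k = 60` (the value commonly used for RRF) are assumptions for illustration; the project may delegate this step to the vector database and use different settings.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1 for the top hit of each list.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists for one query, best match first.
dense_hits = ["auth.ts", "login.ts", "session.ts"]
bm25_hits = ["login.ts", "token.ts", "auth.ts"]

fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
```

Because RRF uses only rank positions, it needs no normalization between BM25 scores and vector similarities, which live on different scales. Documents that rank well in both lists (`login.ts` and `auth.ts` here) move ahead of documents that appear in only one.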