add bm25 hybrid to docs (#137)

Signed-off-by: ChengZi <chen.zhang@zilliz.com>
Cheney Zhang
2025-08-08 11:56:10 +08:00
committed by GitHub
parent 5589686e50
commit 5210b97857
5 changed files with 12 additions and 5 deletions


@@ -82,3 +82,6 @@ SPLITTER_TYPE=ast
# Additional ignore patterns to exclude files/directories (comma-separated)
# Example: temp/**,*.backup,private/**,uploads/**
# CUSTOM_IGNORE_PATTERNS=temp/**,*.backup,private/**
+# Whether to use hybrid search mode. If true, it will use both dense vector and BM25; if false, it will use only dense vector search.
+# HYBRID_MODE=true
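The new `HYBRID_MODE` setting is a boolean toggle between two retrieval paths. The sketch below shows one way a client might interpret it. The function names, the `["dense", "bm25"]` labels, and the extra false-like spellings (`0`, `no`, `off`) are assumptions; the comment above only documents `true` and `false`.

```python
import os
from typing import Mapping, Optional

# Assumption: spellings treated as "off"; only `true`/`false` are documented.
FALSE_VALUES = {"false", "0", "no", "off"}


def hybrid_mode_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True unless HYBRID_MODE is explicitly set to a false-like value."""
    env = os.environ if env is None else env
    # Unset means hybrid search, matching the documented default of `true`.
    raw = env.get("HYBRID_MODE", "true").strip().lower()
    return raw not in FALSE_VALUES


def retrievers(env: Optional[Mapping[str, str]] = None) -> list[str]:
    # Hypothetical labels for the two retrieval paths described in the comment.
    return ["dense", "bm25"] if hybrid_mode_enabled(env) else ["dense"]
```

With `HYBRID_MODE=false` exported, `retrievers()` returns `["dense"]`. Leaving the variable unset keeps both retrievers active.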

.gitignore

@@ -54,6 +54,9 @@ Thumbs.db
*.crx
*.pem
__pycache__/
*.log
+.claude/*
+CLAUDE.md


@@ -403,7 +403,7 @@ For more detailed MCP environment variable configuration, see our [Environment V
### 🔧 Implementation Details
-- 🔍 **Semantic Code Search**: Ask questions like *"find functions that handle user authentication"* and get relevant, context-rich code instantly.
+- 🔍 **Hybrid Code Search**: Ask questions like *"find functions that handle user authentication"* and get relevant, context-rich code instantly using advanced hybrid search (BM25 + dense vector).
- 🧠 **Context-Aware**: Discover large codebase, understand how different parts of your codebase relate, even across millions of lines of code.
- ⚡ **Incremental Indexing**: Efficiently re-index only changed files using Merkle trees.
- 🧩 **Intelligent Code Chunking**: Analyze code in Abstract Syntax Trees (AST) for chunking.
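The **Incremental Indexing** bullet relies on Merkle trees to find changed files. Below is a minimal sketch, assuming SHA-256 leaves keyed by file path and hypothetical helper names. If the roots are equal, nothing needs re-indexing. Only when the roots differ are the per-file leaves compared. A production tree would also descend into subtrees instead of scanning every leaf.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def leaf_hashes(files: dict[str, bytes]) -> dict[str, str]:
    # One leaf per file, hashing the path together with the contents.
    return {path: sha256_hex(path.encode() + b"\0" + body) for path, body in files.items()}


def merkle_root(leaves: dict[str, str]) -> str:
    # Combine sorted leaf hashes pairwise, level by level, until one root remains.
    level = [leaves[path] for path in sorted(leaves)]
    if not level:
        return sha256_hex(b"")
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        level = [sha256_hex((level[i] + level[i + 1]).encode()) for i in range(0, len(level), 2)]
    return level[0]


def changed_files(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    # Fast path: identical roots mean no file needs re-indexing.
    if merkle_root(old) == merkle_root(new):
        return {"added": [], "removed": [], "modified": []}
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "modified": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
    }
```

Only the files in `added` and `modified` would need re-chunking and re-embedding. Chunks from `removed` files would be deleted from the index.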


@@ -40,6 +40,7 @@ Claude Context supports a global configuration file at `~/.context/.env` to simp
### Advanced Configuration
| Variable | Description | Default |
|----------|-------------|---------|
+| `HYBRID_MODE` | Enable hybrid search (BM25 + dense vector). Set to `false` for dense-only search | `true` |
| `EMBEDDING_BATCH_SIZE` | Batch size for processing. Larger batch size means less indexing time | `100` |
| `SPLITTER_TYPE` | Code splitter type: `ast`, `langchain` | `ast` |
| `CUSTOM_EXTENSIONS` | Additional file extensions to include (comma-separated, e.g., `.vue,.svelte,.astro`) | None |
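The `EMBEDDING_BATCH_SIZE` row says that larger batches reduce indexing time. One plausible reason is that fewer, larger requests go to the embedding provider. The sketch below shows that arithmetic alongside parsing of the comma-separated `CUSTOM_EXTENSIONS` value. The helper names and the one-request-per-batch assumption are illustrative and not taken from Claude Context's source.

```python
import math
from typing import Iterator, Optional


def parse_extensions(raw: Optional[str]) -> list[str]:
    # CUSTOM_EXTENSIONS is comma-separated, e.g. ".vue,.svelte,.astro"; empty entries are dropped.
    if not raw:
        return []
    return [ext.strip() for ext in raw.split(",") if ext.strip()]


def batches(chunks: list[str], batch_size: int = 100) -> Iterator[list[str]]:
    # Yield consecutive groups of at most `batch_size` chunks (table default: 100).
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(chunks), batch_size):
        yield chunks[start:start + batch_size]


def embedding_calls(num_chunks: int, batch_size: int = 100) -> int:
    # Assumed one request per batch, so the count is ceil(num_chunks / batch_size).
    return math.ceil(num_chunks / batch_size)
```

For example, 10,000 chunks need 100 requests at the default batch size of 100, but only 20 at a batch size of 500. The provider's own per-request limits still apply.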


@@ -6,8 +6,8 @@ Claude Context is a powerful semantic code search tool that gives AI coding assi
## Key Features
-### 🔍 Semantic Code Search
-Ask natural language questions like "find functions that handle user authentication" and get relevant code snippets from across your entire codebase.
+### 🔍 Hybrid Code Search
+Ask natural language questions like "find functions that handle user authentication" and get relevant code snippets from across your entire codebase using advanced hybrid search (BM25 + dense vector).
### 🧠 Context-Aware Understanding
Discover relationships between different parts of your code, even across millions of lines. The system understands code structure, patterns, and dependencies.
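In the Hybrid Code Search feature, BM25 is the sparse, keyword half. It rewards chunks that contain the query's exact terms, which helps with identifiers that dense embeddings can blur. The sketch below is textbook Okapi BM25 with a naive tokenizer, the common defaults `k1 = 1.2` and `b = 0.75`, and the non-negative IDF variant. This document does not say how BM25 is computed internally (for example, inside Milvus/Zilliz Cloud), so treat this as an illustration rather than the project's code.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive tokenizer: lowercase runs of letters, digits, and underscores.
    return re.findall(r"\w+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of `query` against each document in `docs`."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs if n_docs else 0.0
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tf.keys() & query_terms:
            # Non-negative IDF variant: ln(1 + (N - n + 0.5) / (n + 0.5)).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and document-length normalization (b).
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores
```

Note that whole-token matching means `user` does not match `UserSession` here. Real code tokenizers often split camelCase and snake_case identifiers.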
@@ -38,8 +38,8 @@ Each code chunk is converted into high-dimensional vectors using state-of-the-ar
### 4. Vector Storage
Embeddings are stored in a vector database (Milvus/Zilliz Cloud) for efficient similarity search.
-### 5. Semantic Search
-Natural language queries are converted to vectors and matched against stored code embeddings.
+### 5. Hybrid Search
+Natural language queries are processed using both dense vector embeddings and BM25 sparse retrieval, then combined with RRF (Reciprocal Rank Fusion) for optimal results.
## Architecture Components
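The Hybrid Search step merges the dense and BM25 result lists with Reciprocal Rank Fusion. Each chunk scores the sum of `1 / (k + rank)` across the lists, so only ranks matter and the two retrievers' raw scores never need to be on the same scale. Below is a minimal sketch. The brute-force cosine ranking stands in for the Milvus/Zilliz Cloud similarity search, and the BM25 ranking is given directly. `k = 60` is a commonly used default, not a value this document specifies. The chunk IDs and vectors are made up.

```python
import math
from collections import defaultdict


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero length.
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def dense_ranking(query_vec: list[float], store: dict[str, list[float]]) -> list[str]:
    # Brute-force stand-in for the vector database: chunk ids, most similar first.
    return sorted(store, key=lambda cid: cosine(query_vec, store[cid]), reverse=True)


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank(d))."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

With the test data, `auth.py#logout` ranks second for dense and first for BM25. It therefore overtakes `auth.py#login`, which tops only the dense list. This is the consensus effect that makes RRF a reasonable default for hybrid retrieval.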