Mirror of https://github.com/zilliztech/claude-context.git (synced 2025-10-06 01:10:02 +03:00)
add bm25 hybrid to docs (#137)
Signed-off-by: ChengZi <chen.zhang@zilliz.com>
@@ -82,3 +82,6 @@ SPLITTER_TYPE=ast
# Additional ignore patterns to exclude files/directories (comma-separated)
# Example: temp/**,*.backup,private/**,uploads/**
# CUSTOM_IGNORE_PATTERNS=temp/**,*.backup,private/**
# Whether to use hybrid search mode. If true, it will use both dense vector and BM25; if false, it will use only dense vector search.
# HYBRID_MODE=true
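Both settings above are plain environment variables. The following is a minimal sketch in Python of how they could be interpreted; it is not Claude Context's own loader. It assumes that an unset `HYBRID_MODE` falls back to the `true` default listed in the Advanced Configuration table and that only the value `false` switches to dense-only search. It approximates ignore patterns with Python's `fnmatchcase`, which is not full `.gitignore` semantics. The helper names are hypothetical.

```python
import os
from collections.abc import Mapping
from fnmatch import fnmatchcase


def read_hybrid_mode(env: Mapping[str, str] = os.environ) -> bool:
    """Hybrid (BM25 + dense) unless HYBRID_MODE is explicitly "false"."""
    raw = env.get("HYBRID_MODE", "").strip().lower()
    # Unset or empty falls back to the documented default of true.
    return raw != "false"


def read_ignore_patterns(env: Mapping[str, str] = os.environ) -> list[str]:
    """Split the comma-separated CUSTOM_IGNORE_PATTERNS value into a list."""
    raw = env.get("CUSTOM_IGNORE_PATTERNS", "")
    return [p.strip() for p in raw.split(",") if p.strip()]


def is_ignored(rel_path: str, patterns: list[str]) -> bool:
    """Approximate glob check (hypothetical helper, NOT .gitignore semantics).

    fnmatchcase's "*" also matches "/", so "temp/**" covers everything under
    temp/ and "*.backup" matches at any directory depth.
    """
    return any(fnmatchcase(rel_path, p) for p in patterns)


env = {"HYBRID_MODE": "false", "CUSTOM_IGNORE_PATTERNS": "temp/**,*.backup,private/**"}
patterns = read_ignore_patterns(env)
print(read_hybrid_mode(env))  # False: dense-only search
print(is_ignored("temp/cache/index.ts", patterns), is_ignored("src/main.ts", patterns))  # True False
```

Passing the environment as a mapping keeps the helpers testable without mutating `os.environ`.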
.gitignore (3 lines added)
@@ -54,6 +54,9 @@ Thumbs.db
*.crx
*.pem
__pycache__/
*.log
.claude/*
CLAUDE.md
@@ -403,7 +403,7 @@ For more detailed MCP environment variable configuration, see our [Environment V
### 🔧 Implementation Details
- 🔍 **Semantic Code Search**: Ask questions like *"find functions that handle user authentication"* and get relevant, context-rich code instantly.
- 🔍 **Hybrid Code Search**: Ask questions like *"find functions that handle user authentication"* and get relevant, context-rich code instantly using advanced hybrid search (BM25 + dense vector).
- 🧠 **Context-Aware**: Explore large codebases and understand how their different parts relate, even across millions of lines of code.
- ⚡ **Incremental Indexing**: Efficiently re-index only changed files using Merkle trees.
- 🧩 **Intelligent Code Chunking**: Parse code into Abstract Syntax Trees (AST) and use them to split it into chunks.
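The Incremental Indexing item says only changed files are re-indexed using Merkle trees. Below is a minimal Python sketch of that idea, not the project's implementation. It assumes SHA-256 leaf hashes over each file's path plus contents and a flat list of sorted leaves rather than a directory-shaped tree. If two snapshots share a root, nothing needs re-indexing. Otherwise, the per-file leaves reveal what changed. All function names are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def leaf_hashes(files: dict[str, bytes]) -> dict[str, str]:
    """One leaf per file; the path is hashed in so a rename changes the leaf."""
    return {path: sha256_hex(path.encode() + b"\0" + body) for path, body in files.items()}


def merkle_root(leaves: dict[str, str]) -> str:
    """Fold sorted leaf hashes pairwise until a single root hash remains."""
    level = [leaves[path] for path in sorted(leaves)]
    if not level:
        return sha256_hex(b"")
    while len(level) > 1:
        if len(level) % 2:  # odd count: duplicate the last node so every node has a partner
            level.append(level[-1])
        level = [sha256_hex((level[i] + level[i + 1]).encode()) for i in range(0, len(level), 2)]
    return level[0]


def changed_files(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Return added/removed/modified paths between two snapshots."""
    if merkle_root(old) == merkle_root(new):
        # Equal roots: nothing changed, so no file needs re-indexing.
        return {"added": [], "removed": [], "modified": []}
    # Roots differ. This flat sketch compares leaves directly; a tree shaped
    # like the directory layout would let unchanged subtrees be skipped.
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "modified": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
    }


before = leaf_hashes({"a.py": b"x = 1", "b.py": b"y = 2"})
after = leaf_hashes({"a.py": b"x = 1", "b.py": b"y = 3", "c.py": b"z = 0"})
print(changed_files(before, after))
# {'added': ['c.py'], 'removed': [], 'modified': ['b.py']}
```

In this sketch, only `c.py` and `b.py` would be sent back through chunking and embedding.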
@@ -40,6 +40,7 @@ Claude Context supports a global configuration file at `~/.context/.env` to simp
### Advanced Configuration
| Variable | Description | Default |
|----------|-------------|---------|
| `HYBRID_MODE` | Enable hybrid search (BM25 + dense vector). Set to `false` for dense-only search | `true` |
| `EMBEDDING_BATCH_SIZE` | Batch size for processing. Larger batch size means less indexing time | `100` |
| `SPLITTER_TYPE` | Code splitter type: `ast`, `langchain` | `ast` |
| `CUSTOM_EXTENSIONS` | Additional file extensions to include (comma-separated, e.g., `.vue,.svelte,.astro`) | None |
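`EMBEDDING_BATCH_SIZE` controls how many chunks go into each embedding call. Larger batches mean fewer round trips, which is why they cut indexing time. Below is a minimal Python sketch of the grouping, assuming a hypothetical `embed_batch` callable that stands in for whichever embedding provider is configured. With the default of 100, 250 chunks become three calls of 100, 100 and 50.

```python
import os
from collections.abc import Callable, Iterator, Sequence
from typing import Optional


def batched(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_all(
    chunks: Sequence[str],
    embed_batch: Callable[[list[str]], list[list[float]]],
    batch_size: Optional[int] = None,
) -> list[list[float]]:
    """Embed chunks in groups, one embed_batch call per group."""
    if batch_size is None:
        batch_size = int(os.environ.get("EMBEDDING_BATCH_SIZE", "100"))
    vectors: list[list[float]] = []
    for batch in batched(chunks, batch_size):
        vectors.extend(embed_batch(list(batch)))
    return vectors


calls: list[int] = []


def fake_embed(batch: list[str]) -> list[list[float]]:
    """Hypothetical stand-in for a provider call: records the batch size."""
    calls.append(len(batch))
    return [[float(len(text))] for text in batch]


chunks = [f"chunk {i}" for i in range(250)]
vectors = embed_all(chunks, fake_embed, batch_size=100)
print(calls)  # [100, 100, 50]
```

The upper limit on a useful batch size depends on the provider's request limits, which this sketch does not model.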
@@ -6,8 +6,8 @@ Claude Context is a powerful semantic code search tool that gives AI coding assi
## Key Features
### 🔍 Semantic Code Search
Ask natural language questions like "find functions that handle user authentication" and get relevant code snippets from across your entire codebase.
### 🔍 Hybrid Code Search
Ask natural language questions like "find functions that handle user authentication" and get relevant code snippets from across your entire codebase using advanced hybrid search (BM25 + dense vector).
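BM25 is the keyword-based, sparse half of hybrid search. Below is a minimal Python sketch of the standard Okapi BM25 formula. It assumes lowercased whitespace tokenization and the common defaults k1 = 1.2 and b = 0.75. A real code index would tokenize identifiers differently, and the vector database's built-in BM25 may use other parameters.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of every doc for the query (lowercased whitespace tokens)."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n if n else 0.0
    # Document frequency: in how many docs each term appears at least once.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # a term absent from this doc contributes nothing
            # Rare terms (low df) get a larger inverse document frequency.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # k1 saturates repeated terms; b normalizes for document length.
            norm = k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


docs = [
    "verify user token",
    "render html page",
    "refresh user session token token",
]
scores = bm25_scores("user token", docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print([docs[i] for i in ranking])
```

Exact-term matching like this catches identifiers that an embedding may blur, which is the usual motivation for pairing it with dense vectors.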
### 🧠 Context-Aware Understanding
Discover relationships between different parts of your code, even across millions of lines. The system understands code structure, patterns, and dependencies.
@@ -38,8 +38,8 @@ Each code chunk is converted into high-dimensional vectors using state-of-the-ar
### 4. Vector Storage
Embeddings are stored in a vector database (Milvus/Zilliz Cloud) for efficient similarity search.
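Similarity search returns the stored vectors closest to the query vector. Below is a minimal brute-force Python sketch using cosine similarity, for illustration only. Milvus and Zilliz Cloud use indexes rather than a linear scan, and the metric and chunk IDs shown here are assumptions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: list[float], store: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Exact (brute-force) nearest neighbours by cosine similarity."""
    scored = [(chunk_id, cosine(query, vec)) for chunk_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real embedding models use many more dimensions.
store = {
    "auth.py#login": [0.9, 0.1, 0.0],
    "ui.py#render": [0.0, 0.2, 0.9],
    "auth.py#logout": [0.7, 0.3, 0.1],
}
hits = top_k([1.0, 0.0, 0.0], store, k=2)
print([chunk_id for chunk_id, _ in hits])  # ['auth.py#login', 'auth.py#logout']
```

A linear scan is exact but costs one comparison per stored chunk. That cost is what a vector database's index avoids at scale.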
### 5. Semantic Search
Natural language queries are converted to vectors and matched against stored code embeddings.
### 5. Hybrid Search
Natural language queries are run through both dense vector search and BM25 sparse retrieval, and the two ranked result lists are merged with RRF (Reciprocal Rank Fusion).
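Reciprocal Rank Fusion uses only each result's position in every list: score(d) = Σ 1 / (k + rank(d)), summed over the lists in which d appears. Below is a minimal Python sketch, assuming 1-based ranks and the commonly used constant k = 60. The value Claude Context or Milvus actually uses is not stated here. The chunk IDs are made up.

```python
from collections import defaultdict


def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked ID lists into one list sorted by RRF score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each appearance adds 1 / (k + rank); raw BM25/cosine scores are ignored.
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


dense = ["auth.py#login", "auth.py#logout", "db.py#connect"]
bm25 = ["auth.py#verify_token", "auth.py#login", "ui.py#render"]
fused = rrf([dense, bm25])
print([doc_id for doc_id, _ in fused][:3])
# ['auth.py#login', 'auth.py#verify_token', 'auth.py#logout']
```

Because RRF ignores raw scores, it needs no calibration between cosine similarities and BM25 scores, which live on different scales. A chunk that ranks well in both lists, like `auth.py#login` here, rises to the top.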
## Architecture Components