enhance environment variables and mcp documentation (#169)

Signed-off-by: ChengZi <chen.zhang@zilliz.com>
Cheney Zhang
2025-08-20 11:32:40 +08:00
committed by GitHub
parent 188529de44
commit 9cbe7870f5
4 changed files with 61 additions and 40 deletions


@@ -411,13 +411,13 @@ npx @zilliz/claude-context-mcp@latest
</details>
---
**How to configure environment variables for MCP:** See our [Environment Variables Guide](docs/getting-started/environment-variables.md) for detailed MCP environment variable configuration.
**Using Different Embedding Models with MCP:** To configure specific embedding models (e.g., `text-embedding-3-large` for OpenAI, `voyage-code-3` for VoyageAI), see the [MCP Configuration Examples](packages/mcp/README.md#embedding-provider-configuration) for detailed setup instructions for each provider.
📚 **Need more help?** Check out our [complete documentation](docs/) for detailed guides and troubleshooting tips.
---
## 🏗️ Architecture


@@ -21,21 +21,37 @@ Claude Context supports a global configuration file at `~/.context/.env` to simp
| Variable | Description | Default |
|----------|-------------|---------|
| `EMBEDDING_PROVIDER` | Provider: `OpenAI`, `VoyageAI`, `Gemini`, `Ollama` | `OpenAI` |
| `EMBEDDING_MODEL` | Embedding model name (works for all providers) | Provider-specific default |
| `OPENAI_API_KEY` | OpenAI API key | Required for OpenAI |
| `VOYAGEAI_API_KEY` | VoyageAI API key | Required for VoyageAI |
| `GEMINI_API_KEY` | Gemini API key | Required for Gemini |
> **💡 Note:** `EMBEDDING_MODEL` is a universal environment variable that works with all embedding providers. Simply set it to the model name you want to use (e.g., `text-embedding-3-large` for OpenAI, `voyage-code-3` for VoyageAI, etc.).
> **Supported Model Names:**
>
> - OpenAI Models: See `getSupportedModels` in [`openai-embedding.ts`](https://github.com/zilliztech/claude-context/blob/master/packages/core/src/embedding/openai-embedding.ts) for the full list of supported models.
>
> - VoyageAI Models: See `getSupportedModels` in [`voyageai-embedding.ts`](https://github.com/zilliztech/claude-context/blob/master/packages/core/src/embedding/voyageai-embedding.ts) for the full list of supported models.
>
> - Gemini Models: See `getSupportedModels` in [`gemini-embedding.ts`](https://github.com/zilliztech/claude-context/blob/master/packages/core/src/embedding/gemini-embedding.ts) for the full list of supported models.
>
> - Ollama Models: Depends on the model you install locally.
> **📖 For detailed provider-specific configuration examples and setup instructions, see the [MCP Configuration Guide](../../packages/mcp/README.md#embedding-provider-configuration).**
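For example, pointing the global config at a non-default provider such as VoyageAI might look like this (a sketch; the values are placeholders and the variable names come from the table above):

```shell
# Write a global config that selects VoyageAI and its code-optimized model
mkdir -p ~/.context
cat > ~/.context/.env << 'EOF'
EMBEDDING_PROVIDER=VoyageAI
VOYAGEAI_API_KEY=pa-your-voyageai-api-key
EMBEDDING_MODEL=voyage-code-3
MILVUS_TOKEN=your-zilliz-cloud-api-key
EOF
```

Because `EMBEDDING_MODEL` is universal, only `EMBEDDING_PROVIDER` and the matching API key change when switching providers.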
### Vector Database
| Variable | Description | Default |
|----------|-------------|---------|
| `MILVUS_TOKEN` | Milvus authentication token. See how to get a [Zilliz Personal API Key](https://github.com/zilliztech/claude-context/blob/master/assets/signup_and_get_apikey.png) | Recommended |
| `MILVUS_ADDRESS` | Milvus server address. Optional when using Zilliz Personal API Key | Auto-resolved from token |
### Ollama (Optional)
| Variable | Description | Default |
|----------|-------------|---------|
| `OLLAMA_HOST` | Ollama server URL | `http://127.0.0.1:11434` |
| `OLLAMA_MODEL` | Ollama model name (alternative to `EMBEDDING_MODEL`; takes precedence when both are set) | |
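The precedence between the two variables can be sketched in shell (`mxbai-embed-large` is just an example model name):

```shell
# For the Ollama provider, OLLAMA_MODEL wins over EMBEDDING_MODEL when both are set.
EMBEDDING_MODEL=nomic-embed-text
OLLAMA_MODEL=mxbai-embed-large
selected="${OLLAMA_MODEL:-$EMBEDDING_MODEL}"
echo "$selected"   # mxbai-embed-large
```

If neither variable is set, the server falls back to the provider's built-in default.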
### Advanced Configuration
| Variable | Description | Default |
@@ -54,6 +70,7 @@ mkdir -p ~/.context
cat > ~/.context/.env << 'EOF'
EMBEDDING_PROVIDER=OpenAI
OPENAI_API_KEY=sk-your-openai-api-key
EMBEDDING_MODEL=text-embedding-3-small
MILVUS_TOKEN=your-zilliz-cloud-api-key
EOF
```


@@ -31,7 +31,7 @@ Before using the MCP server, make sure you have:
Claude Context MCP supports multiple embedding providers. Choose the one that best fits your needs:
> 💡 **Tip**: You can also use [global environment variables](../../docs/getting-started/environment-variables.md) for easier configuration management across different MCP clients.
> 📋 **Quick Reference**: For a complete list of environment variables and their descriptions, see the [Environment Variables Guide](../../docs/getting-started/environment-variables.md).
```bash
# Supported providers: OpenAI, VoyageAI, Gemini, Ollama
@@ -55,9 +55,7 @@ OPENAI_BASE_URL=https://api.openai.com/v1
```
**Available Models:**
See `getSupportedModels` in [`openai-embedding.ts`](https://github.com/zilliztech/claude-context/blob/master/packages/core/src/embedding/openai-embedding.ts) for the full list of supported models.
**Getting API Key:**
1. Visit [OpenAI Platform](https://platform.openai.com/api-keys)
@@ -81,9 +79,7 @@ EMBEDDING_MODEL=voyage-code-3
```
**Available Models:**
See `getSupportedModels` in [`voyageai-embedding.ts`](https://github.com/zilliztech/claude-context/blob/master/packages/core/src/embedding/voyageai-embedding.ts) for the full list of supported models.
**Getting API Key:**
1. Visit [VoyageAI Console](https://dash.voyageai.com/)
@@ -107,7 +103,7 @@ EMBEDDING_MODEL=gemini-embedding-001
```
**Available Models:**
See `getSupportedModels` in [`gemini-embedding.ts`](https://github.com/zilliztech/claude-context/blob/master/packages/core/src/embedding/gemini-embedding.ts) for the full list of supported models.
**Getting API Key:**
1. Visit [Google AI Studio](https://aistudio.google.com/)
@@ -130,11 +126,6 @@ EMBEDDING_MODEL=nomic-embed-text
OLLAMA_HOST=http://127.0.0.1:11434
```
**Setup Instructions:**
1. Install Ollama from [ollama.ai](https://ollama.ai/)
2. Pull the embedding model:
@@ -557,18 +548,19 @@ npx @zilliz/claude-context-mcp@latest
## Features
- 🔌 **MCP Protocol Compliance**: Full compatibility with MCP-enabled AI assistants and agents
- 🔍 **Hybrid Code Search**: Natural language queries using advanced hybrid search (BM25 + dense vector) to find relevant code snippets
- 📁 **Codebase Indexing**: Index entire codebases for fast hybrid search across millions of lines of code
- 🔄 **Incremental Indexing**: Efficiently re-index only changed files using Merkle trees for auto-sync
- 🧩 **Intelligent Code Chunking**: AST-based code analysis for syntax-aware chunking with automatic fallback
- 🗄️ **Scalable**: Integrates with Zilliz Cloud for scalable vector search, no matter how large your codebase is
- 🛠️ **Customizable**: Configure file extensions, ignore patterns, and embedding models
- ⚡ **Real-time**: Interactive indexing and searching with progress feedback
## Available Tools
### 1. `index_codebase`
Index a codebase directory for hybrid search (BM25 + dense vector).
**Parameters:**
- `path` (required): Absolute path to the codebase directory to index
@@ -578,12 +570,13 @@ Index a codebase directory for semantic search.
- `ignorePatterns` (optional): Additional ignore patterns to exclude specific files/directories beyond defaults (e.g., ['static/**', '*.tmp', 'private/**']) (default: [])
### 2. `search_code`
Search the indexed codebase using natural language queries with hybrid search (BM25 + dense vector).
**Parameters:**
- `path` (required): Absolute path to the codebase directory to search in
- `query` (required): Natural language query to search for in the codebase
- `limit` (optional): Maximum number of results to return (default: 10, max: 50)
- `extensionFilter` (optional): List of file extensions to filter results (e.g., ['.ts', '.py']) (default: [])
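Put together, a `search_code` invocation carries an argument payload shaped like this (illustrative path and query; the keys are the parameters listed above):

```shell
# Build the tool arguments as JSON (placeholder values)
args='{"path":"/absolute/path/to/codebase","query":"where are embeddings generated","limit":10,"extensionFilter":[".ts"]}'
echo "$args"
```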
### 3. `clear_index`
Clear the search index for a specific codebase.
@@ -591,6 +584,12 @@ Clear the search index for a specific codebase.
**Parameters:**
- `path` (required): Absolute path to the codebase directory to clear index for
### 4. `get_indexing_status`
Get the current indexing status of a codebase. Shows progress percentage for actively indexing codebases and completion status for indexed codebases.
**Parameters:**
- `path` (required): Absolute path to the codebase directory to check status for
## Contributing


@@ -45,7 +45,7 @@ export function getDefaultModelForProvider(provider: string): string {
export function getEmbeddingModelForProvider(provider: string): string {
switch (provider) {
case 'Ollama':
// For Ollama, prioritize OLLAMA_MODEL over EMBEDDING_MODEL for backward compatibility
const ollamaModel = envManager.get('OLLAMA_MODEL') || envManager.get('EMBEDDING_MODEL') || getDefaultModelForProvider(provider);
console.log(`[DEBUG] 🎯 Ollama model selection: OLLAMA_MODEL=${envManager.get('OLLAMA_MODEL') || 'NOT SET'}, EMBEDDING_MODEL=${envManager.get('EMBEDDING_MODEL') || 'NOT SET'}, selected=${ollamaModel}`);
return ollamaModel;
@@ -53,8 +53,10 @@ export function getEmbeddingModelForProvider(provider: string): string {
case 'VoyageAI':
case 'Gemini':
default:
// For all other providers, use EMBEDDING_MODEL or default
const selectedModel = envManager.get('EMBEDDING_MODEL') || getDefaultModelForProvider(provider);
console.log(`[DEBUG] 🎯 ${provider} model selection: EMBEDDING_MODEL=${envManager.get('EMBEDDING_MODEL') || 'NOT SET'}, selected=${selectedModel}`);
return selectedModel;
}
}
@@ -138,7 +140,7 @@ Environment Variables:
Embedding Provider Configuration:
EMBEDDING_PROVIDER Embedding provider: OpenAI, VoyageAI, Gemini, Ollama (default: OpenAI)
EMBEDDING_MODEL Embedding model name (works for all providers)
Provider-specific API Keys:
OPENAI_API_KEY OpenAI API key (required for OpenAI provider)
@@ -148,7 +150,7 @@ Environment Variables:
Ollama Configuration:
OLLAMA_HOST Ollama server host (default: http://127.0.0.1:11434)
OLLAMA_MODEL Ollama model name (alternative to EMBEDDING_MODEL for Ollama)
Vector Database Configuration:
MILVUS_ADDRESS Milvus address (optional, can be auto-resolved from token)
@@ -158,16 +160,19 @@ Examples:
# Start MCP server with OpenAI (default) and explicit Milvus address
OPENAI_API_KEY=sk-xxx MILVUS_ADDRESS=localhost:19530 npx @zilliz/claude-context-mcp@latest
# Start MCP server with OpenAI and auto-resolve Milvus address from token
OPENAI_API_KEY=sk-xxx MILVUS_TOKEN=your-zilliz-token npx @zilliz/claude-context-mcp@latest
# Start MCP server with OpenAI and specific model
OPENAI_API_KEY=sk-xxx EMBEDDING_MODEL=text-embedding-3-large MILVUS_TOKEN=your-token npx @zilliz/claude-context-mcp@latest
# Start MCP server with VoyageAI
EMBEDDING_PROVIDER=VoyageAI VOYAGEAI_API_KEY=pa-xxx MILVUS_TOKEN=your-token npx @zilliz/claude-context-mcp@latest
# Start MCP server with VoyageAI and specific model
EMBEDDING_PROVIDER=VoyageAI VOYAGEAI_API_KEY=pa-xxx EMBEDDING_MODEL=voyage-3-large MILVUS_TOKEN=your-token npx @zilliz/claude-context-mcp@latest
# Start MCP server with Gemini
EMBEDDING_PROVIDER=Gemini GEMINI_API_KEY=xxx MILVUS_TOKEN=your-token npx @zilliz/claude-context-mcp@latest
# Start MCP server with Gemini and specific model
EMBEDDING_PROVIDER=Gemini GEMINI_API_KEY=xxx EMBEDDING_MODEL=gemini-embedding-001 MILVUS_TOKEN=your-token npx @zilliz/claude-context-mcp@latest
# Start MCP server with Ollama and specific model (using OLLAMA_MODEL)
EMBEDDING_PROVIDER=Ollama OLLAMA_MODEL=mxbai-embed-large MILVUS_TOKEN=your-token npx @zilliz/claude-context-mcp@latest
# Start MCP server with Ollama and specific model (using EMBEDDING_MODEL)
EMBEDDING_PROVIDER=Ollama EMBEDDING_MODEL=nomic-embed-text MILVUS_TOKEN=your-token npx @zilliz/claude-context-mcp@latest
`);
}