support custom file extensions

Signed-off-by: ChengZi <chen.zhang@zilliz.com>
ChengZi
2025-08-01 16:12:10 +08:00
committed by Cheney Zhang
parent 546e19de36
commit 2934a0ba98
7 changed files with 195 additions and 19 deletions


@@ -70,3 +70,15 @@ MILVUS_TOKEN=your-zilliz-cloud-api-key
# Code splitter type: ast, langchain
SPLITTER_TYPE=ast
# =============================================================================
# Custom File Processing Configuration
# =============================================================================
# Additional file extensions to include beyond defaults (comma-separated)
# Example: .vue,.svelte,.astro,.twig,.blade.php
# CUSTOM_EXTENSIONS=.vue,.svelte,.astro
# Additional ignore patterns to exclude files/directories (comma-separated)
# Example: temp/**,*.backup,private/**,uploads/**
# CUSTOM_IGNORE_PATTERNS=temp/**,*.backup,private/**
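The FAQ changed in this commit states that system environment variables take priority over the global `~/.codecontext/.env` file. The following is a minimal sketch of that lookup order, not the project's actual `envManager`; `parseEnvFile` and `resolveSetting` are hypothetical names introduced only for illustration. It also shows why the commented-out examples above have no effect until they are uncommented.

```typescript
// Hypothetical helpers: a simplified stand-in for an env lookup, not the real envManager.

// Parse KEY=VALUE lines, skipping blank lines and lines that start with '#'.
function parseEnvFile(text: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const line of text.split('\n')) {
    const trimmed = line.trim();
    if (trimmed === '' || trimmed.startsWith('#')) continue;
    const eq = trimmed.indexOf('=');
    if (eq <= 0) continue; // no key before '=' or no '=' at all
    out[trimmed.slice(0, eq).trim()] = trimmed.slice(eq + 1).trim();
  }
  return out;
}

// System environment wins; the file value is only a fallback.
function resolveSetting(
  name: string,
  systemEnv: Record<string, string | undefined>,
  fileEnv: Record<string, string>
): string | undefined {
  const fromSystem = systemEnv[name];
  return fromSystem !== undefined ? fromSystem : fileEnv[name];
}

const fileEnv = parseEnvFile(
  '# CUSTOM_EXTENSIONS=.vue,.svelte,.astro\nCUSTOM_IGNORE_PATTERNS=temp/**,*.backup\n'
);
const ignoreSetting = resolveSetting('CUSTOM_IGNORE_PATTERNS', {}, fileEnv);
```

In a real process, `systemEnv` would typically be `process.env` and the file text would be read from `~/.codecontext/.env`.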


@@ -42,6 +42,8 @@ Code Context supports a global configuration file at `~/.codecontext/.env` to si
|----------|-------------|---------|
| `EMBEDDING_BATCH_SIZE` | Batch size for processing. Larger batch size means less indexing time | `100` |
| `SPLITTER_TYPE` | Code splitter type: `ast`, `langchain` | `ast` |
| `CUSTOM_EXTENSIONS` | Additional file extensions to include (comma-separated, e.g., `.vue,.svelte,.astro`) | None |
| `CUSTOM_IGNORE_PATTERNS` | Additional ignore patterns (comma-separated, e.g., `temp/**,*.backup,private/**`) | None |
## 🚀 Quick Setup
@@ -73,4 +75,9 @@ claude mcp add code-context -- npx @zilliz/code-context-mcp@latest
}
}
```
## 📚 Additional Information
For detailed information about file processing rules and how custom patterns work, see:
- [What files does Code Context decide to embed?](../troubleshooting/faq.md#q-what-files-does-code-context-decide-to-embed)


@@ -5,24 +5,44 @@
**A:** Code Context embeds files based on the following rules:
**Files that are included:**
- Files with supported extensions (DEFAULT_SUPPORTED_EXTENSIONS)
- Files with supported extensions from multiple sources:
- DEFAULT_SUPPORTED_EXTENSIONS (built-in defaults)
- MCP custom extensions (via `customExtensions` parameter)
- Environment variable custom extensions (via `CUSTOM_EXTENSIONS`)
**Files that are excluded:**
- Files matching DEFAULT_IGNORE_PATTERNS
- Files matching patterns in .gitignore
- Files matching patterns in any .xxxignore files (e.g., .cursorignore, .codeiumignore)
- Files matching patterns in global ~/.codecontext/.codecontextignore
- Files matching ignore patterns from multiple sources:
- DEFAULT_IGNORE_PATTERNS (built-in defaults)
- MCP custom ignore patterns (via `ignorePatterns` parameter)
- Environment variable custom ignore patterns (via `CUSTOM_IGNORE_PATTERNS`)
- Files matching patterns in .gitignore
- Files matching patterns in any .xxxignore files (e.g., .cursorignore, .codeiumignore)
- Files matching patterns in global ~/.codecontext/.codecontextignore
The final rule is: `DEFAULT_SUPPORTED_EXTENSIONS - (DEFAULT_IGNORE_PATTERNS + MCP_CUSTOM_PATTERNS + .gitignore + .xxxignore files + global .codecontextignore)`
The final rule is: `(DEFAULT_SUPPORTED_EXTENSIONS + MCP custom extensions + custom extensions from env variable) - (DEFAULT_IGNORE_PATTERNS + MCP custom ignore patterns + custom ignore patterns from env variable + .gitignore + .xxxignore files + global .codecontextignore)`
**Ignore pattern merging (all patterns are combined):**
**Extension sources (all extensions are combined):**
1. **Default extensions**: Built-in supported file extensions (.ts, .js, .py, .java, .cpp, .md, etc.)
2. **MCP custom extensions**: Additional extensions passed via MCP `customExtensions` parameter
3. **Environment custom extensions**: Extensions from `CUSTOM_EXTENSIONS` env variable (comma-separated, e.g., `.vue,.svelte,.astro`)
**Ignore pattern sources (all patterns are combined):**
1. **Default patterns**: Built-in ignore patterns for common build outputs, dependencies, etc.
2. **MCP custom patterns**: Additional patterns passed via MCP `ignorePatterns` parameter
3. **.gitignore**: Standard Git ignore patterns in codebase root
4. **.xxxignore files**: Any file in codebase root matching pattern `.xxxignore` (e.g., `.cursorignore`, `.codeiumignore`)
5. **Global ignore**: `~/.codecontext/.codecontextignore` for user-wide patterns
2. **MCP custom ignore patterns**: Additional patterns passed via MCP `ignorePatterns` parameter
3. **Environment custom ignore patterns**: Patterns from `CUSTOM_IGNORE_PATTERNS` env variable (comma-separated)
4. **.gitignore**: Standard Git ignore patterns in codebase root
5. **.xxxignore files**: Any file in codebase root matching pattern `.xxxignore` (e.g., `.cursorignore`, `.codeiumignore`)
6. **Global ignore**: `~/.codecontext/.codecontextignore` for user-wide patterns
All patterns are merged together - MCP custom patterns will NOT be overwritten by file-based patterns.
All patterns are merged together - MCP custom patterns and patterns from environment variables will NOT be overwritten by file-based patterns.
**Environment Variables:**
- `CUSTOM_EXTENSIONS`: Comma-separated list of file extensions (e.g., `.vue,.svelte,.astro`)
- `CUSTOM_IGNORE_PATTERNS`: Comma-separated list of ignore patterns (e.g., `temp/**,*.backup,private/**`)
These environment variables can be set in:
- System environment variables (highest priority)
- Global `~/.codecontext/.env` file (lower priority)
Supported extensions include common programming languages (.ts, .js, .py, .java, .cpp, etc.) and documentation files (.md, .markdown). Default ignore patterns cover build outputs, dependencies (node_modules), IDE files, and temporary files.
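The final rule above can be read as "has a supported extension and matches no ignore pattern". The sketch below illustrates that reading under simplifying assumptions: `globToRegExp`, `isIgnored`, and `shouldEmbed` are hypothetical helpers, and the glob handling covers only `*` and `**`, which is far less than real `.gitignore` semantics. It is not the project's matcher.

```typescript
// Hypothetical, simplified glob support: '**' matches across '/', '*' stays within one segment.
function globToRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+^${}()|[\]\\?]/g, '\\$&');
  const body = escaped
    .replace(/\*\*/g, '\u0000') // protect '**' before handling single '*'
    .replace(/\*/g, '[^/]*')
    .replace(/\u0000/g, '.*');
  return new RegExp(`^${body}$`);
}

// A path is ignored if any pattern matches the relative path or its basename.
function isIgnored(relPath: string, patterns: string[]): boolean {
  const base = relPath.split('/').pop() || relPath;
  return patterns.some(p => {
    const re = globToRegExp(p);
    return re.test(relPath) || re.test(base);
  });
}

// (all extension sources) minus (all ignore sources), applied to one file.
function shouldEmbed(relPath: string, extensions: string[], ignorePatterns: string[]): boolean {
  const hasSupportedExtension = extensions.some(ext => relPath.endsWith(ext));
  return hasSupportedExtension && !isIgnored(relPath, ignorePatterns);
}

const extensions = ['.ts', '.vue', '.backup'];
const ignorePatterns = ['temp/**', '*.backup'];
const candidates = ['src/App.vue', 'temp/scratch.ts', 'notes/old.backup', 'docs/readme.txt'];
const embedded = candidates.filter(f => shouldEmbed(f, extensions, ignorePatterns));
```

Because the ignore sources are subtracted after the extension sources are combined, adding `.backup` as a custom extension does not override a `*.backup` ignore pattern in this sketch.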


@@ -89,6 +89,8 @@ export interface CodeContextConfig {
codeSplitter?: Splitter;
supportedExtensions?: string[];
ignorePatterns?: string[];
customExtensions?: string[]; // New: custom extensions from MCP
customIgnorePatterns?: string[]; // New: custom ignore patterns from MCP
}
export class CodeContext {
@@ -114,8 +116,39 @@ export class CodeContext {
this.codeSplitter = config.codeSplitter || new AstCodeSplitter(2500, 300);
this.supportedExtensions = config.supportedExtensions || DEFAULT_SUPPORTED_EXTENSIONS;
this.ignorePatterns = config.ignorePatterns || DEFAULT_IGNORE_PATTERNS;
// Load custom extensions from environment variables
const envCustomExtensions = this.getCustomExtensionsFromEnv();
// Combine default extensions with config extensions and env extensions
const allSupportedExtensions = [
...DEFAULT_SUPPORTED_EXTENSIONS,
...(config.supportedExtensions || []),
...(config.customExtensions || []),
...envCustomExtensions
];
// Remove duplicates
this.supportedExtensions = [...new Set(allSupportedExtensions)];
// Load custom ignore patterns from environment variables
const envCustomIgnorePatterns = this.getCustomIgnorePatternsFromEnv();
// Combine default ignore patterns with config, custom, and env ignore patterns
const allIgnorePatterns = [
...DEFAULT_IGNORE_PATTERNS,
...(config.ignorePatterns || []),
...(config.customIgnorePatterns || []),
...envCustomIgnorePatterns
];
// Remove duplicates
this.ignorePatterns = [...new Set(allIgnorePatterns)];
console.log(`🔧 Initialized with ${this.supportedExtensions.length} supported extensions and ${this.ignorePatterns.length} ignore patterns`);
if (envCustomExtensions.length > 0) {
console.log(`📎 Loaded ${envCustomExtensions.length} custom extensions from environment: ${envCustomExtensions.join(', ')}`);
}
if (envCustomIgnorePatterns.length > 0) {
console.log(`🚫 Loaded ${envCustomIgnorePatterns.length} custom ignore patterns from environment: ${envCustomIgnorePatterns.join(', ')}`);
}
}
/**
@@ -854,6 +887,74 @@ export class CodeContext {
return regex.test(text);
}
/**
* Get custom extensions from environment variables
* Supports CUSTOM_EXTENSIONS as comma-separated list
* @returns Array of custom extensions
*/
private getCustomExtensionsFromEnv(): string[] {
const envExtensions = envManager.get('CUSTOM_EXTENSIONS');
if (!envExtensions) {
return [];
}
try {
const extensions = envExtensions
.split(',')
.map(ext => ext.trim())
.filter(ext => ext.length > 0)
.map(ext => ext.startsWith('.') ? ext : `.${ext}`); // Ensure extensions start with dot
return extensions;
} catch (error) {
console.warn(`⚠️ Failed to parse CUSTOM_EXTENSIONS: ${error}`);
return [];
}
}
/**
* Get custom ignore patterns from environment variables
* Supports CUSTOM_IGNORE_PATTERNS as comma-separated list
* @returns Array of custom ignore patterns
*/
private getCustomIgnorePatternsFromEnv(): string[] {
const envIgnorePatterns = envManager.get('CUSTOM_IGNORE_PATTERNS');
if (!envIgnorePatterns) {
return [];
}
try {
const patterns = envIgnorePatterns
.split(',')
.map(pattern => pattern.trim())
.filter(pattern => pattern.length > 0);
return patterns;
} catch (error) {
console.warn(`⚠️ Failed to parse CUSTOM_IGNORE_PATTERNS: ${error}`);
return [];
}
}
/**
* Add custom extensions (from MCP or other sources) without replacing existing ones
* @param customExtensions Array of custom extensions to add
*/
addCustomExtensions(customExtensions: string[]): void {
if (customExtensions.length === 0) return;
// Ensure extensions start with dot
const normalizedExtensions = customExtensions.map(ext =>
ext.startsWith('.') ? ext : `.${ext}`
);
// Merge current extensions with new custom extensions, avoiding duplicates
const mergedExtensions = [...this.supportedExtensions, ...normalizedExtensions];
const uniqueExtensions: string[] = [...new Set(mergedExtensions)];
this.supportedExtensions = uniqueExtensions;
console.log(`📎 Added ${customExtensions.length} custom extensions. Total: ${this.supportedExtensions.length} extensions`);
}
/**
* Get current splitter information
*/
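The merging done in this hunk (split the comma-separated value, trim, drop empty entries, add a missing dot, then deduplicate with a `Set`) can be sketched in isolation as follows. `parseList`, `normalizeExtension`, and `mergeExtensions` are hypothetical names, and unlike the constructor shown above, this sketch normalizes every source rather than only the environment value.

```typescript
// Hypothetical standalone helpers mirroring the parse/normalize/dedupe steps in the diff.

// Split a comma-separated string, trimming entries and dropping empty ones.
function parseList(raw: string | undefined): string[] {
  if (!raw) return [];
  return raw
    .split(',')
    .map(item => item.trim())
    .filter(item => item.length > 0);
}

// Ensure an extension starts with a dot.
function normalizeExtension(ext: string): string {
  return ext.startsWith('.') ? ext : `.${ext}`;
}

// Combine all sources; Set preserves first-seen order while removing duplicates.
function mergeExtensions(defaults: string[], mcp: string[], env: string[]): string[] {
  return [...new Set([...defaults, ...mcp, ...env].map(normalizeExtension))];
}

const merged = mergeExtensions(['.ts', '.js'], ['vue'], parseList(' .vue, svelte ,,'));
```

Since a `Set` keeps insertion order, defaults stay first and later sources only append extensions that are not already present.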


@@ -167,6 +167,19 @@ You can set the embedding batch size to optimize the performance of the MCP serv
EMBEDDING_BATCH_SIZE=512
```
#### Custom File Processing (Optional)
You can configure custom file extensions and ignore patterns globally via environment variables:
```bash
# Additional file extensions to include beyond defaults
CUSTOM_EXTENSIONS=.vue,.svelte,.astro,.twig
# Additional ignore patterns to exclude files/directories
CUSTOM_IGNORE_PATTERNS=temp/**,*.backup,private/**,uploads/**
```
These settings work in combination with tool parameters - patterns from both sources will be merged together.
## Usage with MCP Clients
@@ -531,21 +544,25 @@ npx @zilliz/code-context-mcp@latest
Index a codebase directory for semantic search.
**Parameters:**
- `path` (required): Path to the codebase directory to index
- `path` (required): Absolute path to the codebase directory to index
- `force` (optional): Force re-indexing even if already indexed (default: false)
- `splitter` (optional): Code splitter to use - 'ast' for syntax-aware splitting with automatic fallback, 'langchain' for character-based splitting (default: "ast")
- `customExtensions` (optional): Additional file extensions to include beyond defaults (e.g., ['.vue', '.svelte', '.astro']). Extensions should include the dot prefix; if it is missing, it is added automatically (default: [])
- `ignorePatterns` (optional): Additional ignore patterns to exclude specific files/directories beyond defaults (e.g., ['static/**', '*.tmp', 'private/**']) (default: [])
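As an illustration of the parameter list above, an `index_codebase` argument object might look like the following. The `IndexCodebaseArgs` interface, the `withDefaults` helper, and the example path are assumptions made for this sketch; only the parameter names and documented defaults come from the list above.

```typescript
// Hypothetical type mirroring the documented `index_codebase` parameters.
interface IndexCodebaseArgs {
  path: string;
  force?: boolean;
  splitter?: 'ast' | 'langchain';
  customExtensions?: string[];
  ignorePatterns?: string[];
}

// Example arguments; the path is a made-up absolute path.
const args: IndexCodebaseArgs = {
  path: '/home/me/projects/my-app',
  customExtensions: ['.vue', 'svelte'], // 'svelte' lacks the dot on purpose
  ignorePatterns: ['static/**', '*.tmp'],
};

// Fill in the documented defaults for any omitted optional field.
function withDefaults(a: IndexCodebaseArgs): Required<IndexCodebaseArgs> {
  return {
    path: a.path,
    force: a.force !== undefined ? a.force : false,
    splitter: a.splitter !== undefined ? a.splitter : 'ast',
    customExtensions: a.customExtensions !== undefined ? a.customExtensions : [],
    ignorePatterns: a.ignorePatterns !== undefined ? a.ignorePatterns : [],
  };
}

const resolved = withDefaults(args);
```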
### 2. `search_code`
Search the indexed codebase using natural language queries.
**Parameters:**
- `path` (required): Absolute path to the codebase directory to search in
- `query` (required): Natural language query to search for in the codebase
- `limit` (optional): Maximum number of results to return (default: 10, max: 50)
### 3. `clear_index`
Clear the search index.
Clear the search index for a specific codebase.
**Parameters:**
- `confirm` (required): Confirmation flag to prevent accidental clearing
- `path` (required): Absolute path to the codebase directory to clear index for
## Contributing


@@ -142,9 +142,10 @@ export class ToolHandlers {
}
public async handleIndexCodebase(args: any) {
const { path: codebasePath, force, splitter, ignorePatterns } = args;
const { path: codebasePath, force, splitter, customExtensions, ignorePatterns } = args;
const forceReindex = force || false;
const splitterType = splitter || 'ast'; // Default to AST
const customFileExtensions = customExtensions || [];
const customIgnorePatterns = ignorePatterns || [];
try {
@@ -278,6 +279,12 @@ export class ToolHandlers {
}
}
// Add custom extensions if provided
if (customFileExtensions.length > 0) {
console.log(`[CUSTOM-EXTENSIONS] Adding ${customFileExtensions.length} custom extensions: ${customFileExtensions.join(', ')}`);
this.codeContext.addCustomExtensions(customFileExtensions);
}
// Add custom ignore patterns if provided (before loading file-based patterns)
if (customIgnorePatterns.length > 0) {
console.log(`[IGNORE-PATTERNS] Adding ${customIgnorePatterns.length} custom ignore patterns: ${customIgnorePatterns.join(', ')}`);
@@ -298,6 +305,10 @@ export class ToolHandlers {
? `\nNote: Input path '${codebasePath}' was resolved to absolute path '${absolutePath}'`
: '';
const extensionInfo = customFileExtensions.length > 0
? `\nUsing ${customFileExtensions.length} custom extensions: ${customFileExtensions.join(', ')}`
: '';
const ignoreInfo = customIgnorePatterns.length > 0
? `\nUsing ${customIgnorePatterns.length} custom ignore patterns: ${customIgnorePatterns.join(', ')}`
: '';
@@ -305,7 +316,7 @@ export class ToolHandlers {
return {
content: [{
type: "text",
text: `Started background indexing for codebase '${absolutePath}' using ${splitterType.toUpperCase()} splitter.${pathInfo}${ignoreInfo}\n\nIndexing is running in the background. You can search the codebase while indexing is in progress, but results may be incomplete until indexing completes.`
text: `Started background indexing for codebase '${absolutePath}' using ${splitterType.toUpperCase()} splitter.${pathInfo}${extensionInfo}${ignoreInfo}\n\nIndexing is running in the background. You can search the codebase while indexing is in progress, but results may be incomplete until indexing completes.`
}]
};


@@ -139,6 +139,14 @@ Search the indexed codebase using natural language queries within a specified ab
enum: ["ast", "langchain"],
default: "ast"
},
customExtensions: {
type: "array",
items: {
type: "string"
},
description: "Optional: Additional file extensions to include beyond defaults (e.g., ['.vue', '.svelte', '.astro']). Extensions should include the dot prefix; if it is missing, it is added automatically",
default: []
},
ignorePatterns: {
type: "array",
items: {