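docs: Update the README files to include an acknowledgements section and English documentation
Added an acknowledgements section to README-CN.md, thanking the AI tools and models used during development. Also added a new README.md file that provides English documentation for the project, covering features, installation, usage, performance optimization, and more.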
17
README-CN.md
@@ -112,6 +112,23 @@ mcp run whisper_server.py
MIT

## Acknowledgements

This project was developed with the help of the following excellent AI tools and models:

- [GitHub Copilot](https://github.com/features/copilot) - AI pair programmer
- [Trae](https://trae.ai/) - Intelligent AI coding assistant
- [Cline](https://cline.ai/) - AI-powered terminal
- [DeepSeek](https://www.deepseek.com/) - Advanced AI model
- [Claude-3.7-Sonnet](https://www.anthropic.com/claude) - Anthropic's powerful AI assistant
- [Gemini-2.0-Flash](https://ai.google/gemini/) - Google's multimodal AI model
- [VS Code](https://code.visualstudio.com/) - Powerful code editor
- [Whisper](https://github.com/openai/whisper) - OpenAI's speech recognition model
- [Faster Whisper](https://github.com/guillaumekln/faster-whisper) - Optimized Whisper implementation

Special thanks to these outstanding tools and the teams behind them.

---

# Whisper Speech Recognition MCP Server (notes written after cline claude sonnet 3.7 completed all tasks)

163
README.md
Normal file
@@ -0,0 +1,163 @@
# Whisper Speech Recognition MCP Server

---

[中文文档](README-CN.md)

---

A high-performance speech recognition MCP server based on Faster Whisper, providing efficient audio transcription capabilities.

## Features

- Integrated with Faster Whisper for efficient speech recognition
- Batch processing acceleration for improved transcription speed
- Automatic CUDA acceleration (if available)
- Support for multiple model sizes (tiny to large-v3)
- Output formats include VTT subtitles, SRT, and JSON
- Support for batch transcription of audio files in a folder
- Model instance caching to avoid repeated loading
- Dynamic batch size adjustment based on GPU memory

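
As an illustration of the model instance caching listed above, here is a minimal sketch. It is not the code in `model_manager.py`; the `ModelCache` class, its cache key, and the default `device` and `compute_type` values are assumptions made for this example. The loader is passed in so that the sketch runs without faster-whisper installed.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical cache key: (model size, device, compute type).
CacheKey = Tuple[str, str, str]


class ModelCache:
    """Keep one loaded model per configuration so repeated requests reuse it."""

    def __init__(self, loader: Callable[[str, str, str], Any]) -> None:
        # The loader is injected so the cache can be exercised with a stub.
        self._loader = loader
        self._models: Dict[CacheKey, Any] = {}

    def get(self, size: str, device: str = "cpu", compute_type: str = "int8") -> Any:
        key = (size, device, compute_type)
        if key not in self._models:
            # Load only on the first request for this configuration.
            self._models[key] = self._loader(size, device, compute_type)
        return self._models[key]


def load_faster_whisper(size: str, device: str, compute_type: str) -> Any:
    """Real loader; imported lazily because it needs faster-whisper and model weights."""
    from faster_whisper import WhisperModel

    return WhisperModel(size, device=device, compute_type=compute_type)
```

With faster-whisper installed, `ModelCache(load_faster_whisper).get("large-v3", "cuda", "float16")` would load the model once and return the same instance on later calls with the same arguments.
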
## Installation

### Dependencies

- Python 3.10+
- faster-whisper>=0.9.0
- torch==2.6.0+cu126
- torchaudio==2.6.0+cu126
- mcp[cli]>=1.2.0

### Installation Steps

1. Clone or download this repository
2. Create and activate a virtual environment (recommended)
3. Install dependencies:

```bash
pip install -r requirements.txt
```

### PyTorch Installation Guide

Install the appropriate version of PyTorch based on your CUDA version:

- CUDA 12.6:
```bash
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu126
```

- CUDA 12.1:
```bash
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
```

- CPU version:
```bash
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cpu
```

You can check your CUDA version with `nvcc --version` or `nvidia-smi`.

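
After installing, it can help to confirm which PyTorch build Python actually picks up. The sketch below is not part of the repository; `describe_torch_backend` is a hypothetical helper name. It falls back to a plain message when PyTorch is missing, so it is safe to run in any environment.

```python
def describe_torch_backend() -> str:
    """Return a one-line summary of the installed PyTorch build."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"

    if torch.cuda.is_available():
        # torch.version.cuda is the CUDA version the wheel was built against.
        name = torch.cuda.get_device_name(0)
        return f"torch {torch.__version__}, CUDA {torch.version.cuda}, GPU: {name}"
    return f"torch {torch.__version__}, running on CPU"


print(describe_torch_backend())
```

If the summary reports a CPU build on a machine with an NVIDIA GPU, the wheel was probably installed from the wrong index URL.
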
## Usage

### Starting the Server

On Windows, simply run `start_server.bat`.

On other platforms, run:

```bash
python whisper_server.py
```

### Configuring Claude Desktop

1. Open the Claude Desktop configuration file:
   - Windows: `%APPDATA%\Claude\claude_desktop_config.json`
   - macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`

2. Add the Whisper server configuration:

```json
{
  "mcpServers": {
    "whisper": {
      "command": "python",
      "args": ["D:/path/to/whisper_server.py"],
      "env": {}
    }
  }
}
```

3. Restart Claude Desktop

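
If the configuration file already lists other MCP servers, the `whisper` entry has to be merged in rather than pasted over them. The following sketch shows one way to do that; `add_whisper_server` is a hypothetical helper, not something shipped with this project, and the script path is the same placeholder used in the JSON above.

```python
import json
from pathlib import Path


def add_whisper_server(config_path: Path, script_path: str) -> dict:
    """Merge a "whisper" entry into the config, keeping any existing servers."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():
            config = json.loads(text)

    # Create "mcpServers" if it is missing, then add or overwrite "whisper".
    servers = config.setdefault("mcpServers", {})
    servers["whisper"] = {"command": "python", "args": [script_path], "env": {}}

    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

For example, `add_whisper_server(Path.home() / "Library/Application Support/Claude/claude_desktop_config.json", "/path/to/whisper_server.py")` would update the macOS file in place. Back up the file first, since the helper rewrites it.
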
### Available Tools

The server provides the following tools:

1. **get_model_info** - Get information about available Whisper models
2. **transcribe** - Transcribe a single audio file
3. **batch_transcribe** - Batch transcribe audio files in a folder

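
To give a rough idea of how a tool such as `get_model_info` can be exposed, here is a sketch that uses the `FastMCP` class from the MCP Python SDK. The return value, the exact model list, and the server name are assumptions for illustration; the real implementation in `whisper_server.py` may differ. The SDK import is deferred so the plain function can be run without `mcp` installed.

```python
from typing import Dict, List

# Sizes inferred from the "tiny to large-v3" range; the exact list is an assumption.
MODEL_SIZES: List[str] = [
    "tiny", "base", "small", "medium", "large-v1", "large-v2", "large-v3",
]


def get_model_info() -> Dict[str, List[str]]:
    """Report which model sizes and output formats the server offers."""
    return {"models": MODEL_SIZES, "output_formats": ["vtt", "srt", "json"]}


def build_server():
    """Register the function as an MCP tool (requires the `mcp` package)."""
    from mcp.server.fastmcp import FastMCP

    server = FastMCP("whisper")  # hypothetical server name
    server.tool()(get_model_info)
    return server
```

`transcribe` and `batch_transcribe` would be registered the same way, and calling `build_server().run()` would start the server.
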
## Performance Optimization Tips

- Using CUDA acceleration significantly improves transcription speed
- Batch processing mode is more efficient for large numbers of short audio files
- Batch size is automatically adjusted based on GPU memory size
- Using VAD (Voice Activity Detection) filtering improves accuracy for long audio
- Specifying the correct language can improve transcription quality

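
The tips above can be sketched in code as follows. The memory thresholds and batch sizes in `choose_batch_size` are illustrative guesses, not the values the server uses, and both helper names are hypothetical. `vad_filter` and `language` are real keyword arguments of faster-whisper's `WhisperModel.transcribe`.

```python
from typing import Any, Dict, Optional


def choose_batch_size(free_gpu_gib: float) -> int:
    """Map free GPU memory to a batch size. Thresholds are illustrative only."""
    if free_gpu_gib >= 16:
        return 32
    if free_gpu_gib >= 8:
        return 16
    if free_gpu_gib >= 4:
        return 8
    return 4


def transcribe_options(language: Optional[str] = None, use_vad: bool = True) -> Dict[str, Any]:
    """Build keyword arguments for faster-whisper's transcribe call."""
    options: Dict[str, Any] = {"vad_filter": use_vad}
    if language is not None:
        # An explicit language code skips automatic language detection.
        options["language"] = language
    return options


def transcribe(path: str, language: Optional[str] = None) -> str:
    """Transcribe one file on GPU; needs faster-whisper, CUDA, and model weights."""
    from faster_whisper import WhisperModel

    model = WhisperModel("large-v3", device="cuda", compute_type="float16")
    segments, _info = model.transcribe(path, **transcribe_options(language))
    return " ".join(segment.text.strip() for segment in segments)
```

Keeping the option building separate from the model call makes the defaults easy to inspect and change without loading a model.
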
## Local Testing Methods

1. Use MCP Inspector for quick testing:

```bash
mcp dev whisper_server.py
```

2. Use Claude Desktop for integration testing

3. Invoke the server directly from the command line (requires mcp[cli]):

```bash
mcp run whisper_server.py
```

## Error Handling

The server implements the following error handling mechanisms:

- Audio file existence check
- Model loading failure handling
- Exception catching during transcription
- GPU memory management
- Adaptive adjustment of batch processing parameters

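
Two of these mechanisms, the file existence check and exception catching during transcription, can be sketched as a small wrapper. `safe_transcribe` and its error message wording are hypothetical and only illustrate the pattern; the server's own messages may differ.

```python
import os
from typing import Callable


def safe_transcribe(path: str, transcribe: Callable[[str], str]) -> str:
    """Run `transcribe` on `path`, turning failures into readable messages."""
    # Check the file first so a missing path never reaches the model.
    if not os.path.isfile(path):
        return f"Error: audio file not found: {path}"
    try:
        return transcribe(path)
    except Exception as exc:  # last-resort catch so the server keeps running
        return f"Error: transcription failed: {exc}"
```

Returning a message instead of raising keeps one bad file from aborting a whole batch.
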
## Project Structure

- `whisper_server.py`: Main server code
- `model_manager.py`: Whisper model loading and caching
- `audio_processor.py`: Audio file validation and preprocessing
- `formatters.py`: Output formatting (VTT, SRT, JSON)
- `transcriber.py`: Core transcription logic
- `start_server.bat`: Windows startup script

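
To illustrate the kind of work `formatters.py` does, the sketch below renders segments as SRT and VTT. It is not the module's actual code; the `Segment` tuple and the function names are assumptions. The two formats differ mainly in the millisecond separator (`,` in SRT, `.` in VTT) and in the `WEBVTT` header.

```python
from typing import Iterable, Tuple

Segment = Tuple[float, float, str]  # (start seconds, end seconds, text)


def format_timestamp(seconds: float, separator: str) -> str:
    """Render seconds as HH:MM:SS<separator>mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{millis:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Build numbered SRT cues separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        span = f"{format_timestamp(start, ',')} --> {format_timestamp(end, ',')}"
        blocks.append(f"{index}\n{span}\n{text.strip()}\n")
    return "\n".join(blocks)


def to_vtt(segments: Iterable[Segment]) -> str:
    """Build a WebVTT document: header, blank line, then one cue per segment."""
    lines = ["WEBVTT", ""]
    for start, end, text in segments:
        lines.append(f"{format_timestamp(start, '.')} --> {format_timestamp(end, '.')}")
        lines.append(text.strip())
        lines.append("")
    return "\n".join(lines)
```

For example, `format_timestamp(3661.5, ",")` returns `01:01:01,500`.
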
## License

MIT

## Acknowledgements

This project was developed with the assistance of these amazing AI tools and models:

- [GitHub Copilot](https://github.com/features/copilot) - AI pair programmer
- [Trae](https://trae.ai/) - Agentic AI coding assistant
- [Cline](https://cline.ai/) - AI-powered terminal
- [DeepSeek](https://www.deepseek.com/) - Advanced AI model
- [Claude-3.7-Sonnet](https://www.anthropic.com/claude) - Anthropic's powerful AI assistant
- [Gemini-2.0-Flash](https://ai.google/gemini/) - Google's multimodal AI model
- [VS Code](https://code.visualstudio.com/) - Powerful code editor
- [Whisper](https://github.com/openai/whisper) - OpenAI's speech recognition model
- [Faster Whisper](https://github.com/guillaumekln/faster-whisper) - Optimized Whisper implementation

Special thanks to these incredible tools and the teams behind them.