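docs: update README files to include an acknowledgements section and English documentation
Added an acknowledgements section to README-CN.md, thanking the AI tools and models used during development. Also added a new README.md that provides English documentation for the project, covering features, installation, usage, performance optimization, and more.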
README-CN.md
@@ -112,6 +112,23 @@ mcp run whisper_server.py

MIT

## Acknowledgements

This project was developed with the help of the following excellent AI tools and models:

- [GitHub Copilot](https://github.com/features/copilot) - AI pair-programming assistant
- [Trae](https://trae.ai/) - Intelligent AI coding assistant
- [Cline](https://cline.ai/) - AI-powered terminal
- [DeepSeek](https://www.deepseek.com/) - Advanced AI model
- [Claude-3.7-Sonnet](https://www.anthropic.com/claude) - Anthropic's powerful AI assistant
- [Gemini-2.0-Flash](https://ai.google/gemini/) - Google's multimodal AI model
- [VS Code](https://code.visualstudio.com/) - Powerful code editor
- [Whisper](https://github.com/openai/whisper) - OpenAI's speech recognition model
- [Faster Whisper](https://github.com/guillaumekln/faster-whisper) - Optimized Whisper implementation

Special thanks to these outstanding tools and the teams behind them.

---

# Whisper Speech Recognition MCP Server (notes written after cline claude sonnet 3.7 completed all tasks)

README.md
Normal file
@@ -0,0 +1,163 @@
# Whisper Speech Recognition MCP Server
---
[Chinese documentation](README-CN.md)
---
A high-performance speech recognition MCP server based on Faster Whisper, providing efficient audio transcription capabilities.

## Features

- Integrated with Faster Whisper for efficient speech recognition
- Batch processing acceleration for improved transcription speed
- Automatic CUDA acceleration (if available)
- Support for multiple model sizes (tiny to large-v3)
- Output formats include VTT subtitles, SRT, and JSON
- Support for batch transcription of audio files in a folder
- Model instance caching to avoid repeated loading
- Dynamic batch size adjustment based on GPU memory
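
The model instance caching item in the list above can be pictured with a small sketch. The project's `model_manager.py` is not shown in this commit, so `get_model`, its cache key, and the dict it returns are hypothetical stand-ins rather than the server's actual code.

```python
from functools import lru_cache

# Counts how many times the stand-in loader body actually runs.
load_calls = 0


@lru_cache(maxsize=None)
def get_model(model_name: str, device: str = "cpu") -> dict:
    """Return a cached model handle for (model_name, device).

    Hypothetical stand-in: a real server would construct the Whisper
    model here instead of returning a plain dict.
    """
    global load_calls
    load_calls += 1
    return {"name": model_name, "device": device}


first = get_model("large-v3", "cuda")
second = get_model("large-v3", "cuda")  # same arguments: served from the cache
other = get_model("tiny", "cpu")        # different arguments: loads again
```

Keying the cache on both the model name and the device means a request for the same model on a different device triggers a fresh load instead of reusing an incompatible instance.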

## Installation

### Dependencies

- Python 3.10+
- faster-whisper>=0.9.0
- torch==2.6.0+cu126
- torchaudio==2.6.0+cu126
- mcp[cli]>=1.2.0

### Installation Steps

1. Clone or download this repository
2. Create and activate a virtual environment (recommended)
3. Install dependencies:

```bash
pip install -r requirements.txt
```

### PyTorch Installation Guide

Install the appropriate version of PyTorch based on your CUDA version:

- CUDA 12.6:
  ```bash
  pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu126
  ```

- CUDA 12.1:
  ```bash
  pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
  ```

- CPU version:
  ```bash
  pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cpu
  ```

You can check your CUDA version with `nvcc --version` or `nvidia-smi`.
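
Once PyTorch is installed, the same check can be made from Python. The sketch below is one possible way to choose between GPU and CPU at startup; the function name `pick_device` and the pairing of `float16` with CUDA and `int8` with CPU are assumptions for illustration, not settings confirmed by this repository.

```python
def pick_device() -> tuple:
    """Return (device, compute_type), falling back to CPU when needed.

    The compute types are assumed defaults: float16 on GPU, int8 on CPU.
    """
    try:
        import torch  # optional here; absence simply means CPU mode
    except (ImportError, OSError):
        return "cpu", "int8"
    if torch.cuda.is_available():
        return "cuda", "float16"
    return "cpu", "int8"


device, compute_type = pick_device()
print(f"device={device}, compute_type={compute_type}")
```

If this prints `device=cpu` on a machine with an NVIDIA GPU, the installed PyTorch build probably does not match the local CUDA version.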

## Usage

### Starting the Server

On Windows, simply run `start_server.bat`.

On other platforms, run:

```bash
python whisper_server.py
```

### Configuring Claude Desktop

1. Open the Claude Desktop configuration file:
   - Windows: `%APPDATA%\Claude\claude_desktop_config.json`
   - macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`

2. Add the Whisper server configuration:

```json
{
  "mcpServers": {
    "whisper": {
      "command": "python",
      "args": ["D:/path/to/whisper_server.py"],
      "env": {}
    }
  }
}
```

3. Restart Claude Desktop
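
If the configuration file already lists other MCP servers, editing it by hand risks overwriting them. The following is a minimal sketch of merging the `whisper` entry programmatically; the helper name `add_whisper_server` is hypothetical, and the demo writes to a temporary file rather than the real Claude Desktop config.

```python
import json
import tempfile
from pathlib import Path


def add_whisper_server(config_path: Path, server_script: str) -> dict:
    """Merge a "whisper" entry into mcpServers, keeping existing entries."""
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    servers = config.setdefault("mcpServers", {})
    servers["whisper"] = {"command": "python", "args": [server_script], "env": {}}
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config


# Demo against a throwaway file that already contains another server.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "claude_desktop_config.json"
    demo_path.write_text('{"mcpServers": {"other": {"command": "node"}}}',
                         encoding="utf-8")
    result = add_whisper_server(demo_path, "D:/path/to/whisper_server.py")
    written = json.loads(demo_path.read_text(encoding="utf-8"))
```

To apply it for real, pass the platform-specific path from step 1 and the actual location of `whisper_server.py`.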

### Available Tools

The server provides the following tools:

1. **get_model_info** - Get information about available Whisper models
2. **transcribe** - Transcribe a single audio file
3. **batch_transcribe** - Batch transcribe audio files in a folder
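
A folder-level tool such as **batch_transcribe** first has to decide which files in the folder count as audio. The sketch below shows one way to do that; the extension list and the function name `find_audio_files` are assumptions, since the server's own selection logic is not part of this commit.

```python
import tempfile
from pathlib import Path

# Assumed extension list; the real server may accept a different set.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}


def find_audio_files(folder: str) -> list:
    """Return audio files directly inside `folder`, sorted by path."""
    root = Path(folder)
    if not root.is_dir():
        raise NotADirectoryError(f"Not a folder: {folder}")
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )


# Demo: only the two audio files are picked up, regardless of suffix case.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.mp3", "notes.txt", "C.WAV"):
        (Path(tmp) / name).write_bytes(b"x")
    found_names = {p.name for p in find_audio_files(tmp)}
```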

## Performance Optimization Tips

- Using CUDA acceleration significantly improves transcription speed
- Batch processing mode is more efficient for large numbers of short audio files
- Batch size is automatically adjusted based on GPU memory size
- Using VAD (Voice Activity Detection) filtering improves accuracy for long audio
- Specifying the correct language can improve transcription quality
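
The tip about batch size following GPU memory can be illustrated with a simple threshold table. The thresholds and batch sizes below are invented for illustration and are not taken from the server's code; `choose_batch_size` is likewise a hypothetical name.

```python
from typing import Optional


def choose_batch_size(gpu_memory_gb: Optional[float]) -> int:
    """Map available GPU memory (in GB) to a batch size.

    `None` means no GPU was detected. All thresholds are assumptions.
    """
    if gpu_memory_gb is None:
        return 1   # CPU fallback: keep memory use minimal
    if gpu_memory_gb >= 16:
        return 32
    if gpu_memory_gb >= 8:
        return 16
    if gpu_memory_gb >= 4:
        return 8
    return 4


sizes = [choose_batch_size(m) for m in (24, 8, 2, None)]
```

A real implementation would typically also shrink the batch size and retry when an out-of-memory error occurs, instead of relying on the static table alone.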

## Local Testing Methods

1. Use MCP Inspector for quick testing:

```bash
mcp dev whisper_server.py
```

2. Use Claude Desktop for integration testing

3. Invoke the server directly from the command line (requires mcp[cli]):

```bash
mcp run whisper_server.py
```

## Error Handling

The server implements the following error handling mechanisms:

- Audio file existence check
- Model loading failure handling
- Transcription process exception catching
- GPU memory management
- Batch processing parameter adaptive adjustment
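
Two of the mechanisms above, the file existence check and exception catching around transcription, can be sketched as follows. The function names and error messages are hypothetical, and `transcribe_fn` stands in for whatever callable performs the actual transcription.

```python
import os
import tempfile


def validate_audio_file(path: str) -> None:
    """Raise a descriptive error if `path` is not a usable file."""
    if not os.path.exists(path):
        raise FileNotFoundError(f"Audio file not found: {path}")
    if not os.path.isfile(path):
        raise ValueError(f"Not a file: {path}")
    if os.path.getsize(path) == 0:
        raise ValueError(f"Audio file is empty: {path}")


def safe_transcribe(path: str, transcribe_fn) -> str:
    """Run `transcribe_fn(path)`, returning an error string instead of raising."""
    try:
        validate_audio_file(path)
        return transcribe_fn(path)
    except Exception as exc:  # report any failure back to the caller as text
        return f"Error: {exc}"


missing = safe_transcribe("/no/such/file.wav", lambda p: "transcript")

with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.wav")
    with open(sample, "wb") as handle:
        handle.write(b"\x00\x01")
    present = safe_transcribe(sample, lambda p: "transcript")
```

Returning an error string keeps a tool call from crashing the server process, at the cost of the caller having to inspect the text to detect failure.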

## Project Structure

- `whisper_server.py`: Main server code
- `model_manager.py`: Whisper model loading and caching
- `audio_processor.py`: Audio file validation and preprocessing
- `formatters.py`: Output formatting (VTT, SRT, JSON)
- `transcriber.py`: Core transcription logic
- `start_server.bat`: Windows startup script
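
As an illustration of what the output formatting in `formatters.py` has to produce, here is a minimal sketch of SRT and VTT rendering. It is written from the two subtitle formats themselves, not from the project's source; the function names and the `(start, end, text)` segment tuples are assumptions.

```python
def format_timestamp(seconds: float, decimal_marker: str = ".") -> str:
    """Format seconds as HH:MM:SS<marker>mmm (VTT uses '.', SRT uses ',')."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{ms:03d}"


def to_srt(segments) -> str:
    """Render (start, end, text) tuples as numbered SRT cues."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        timing = f"{format_timestamp(start, ',')} --> {format_timestamp(end, ',')}"
        blocks.append(f"{index}\n{timing}\n{text.strip()}\n")
    return "\n".join(blocks)


def to_vtt(segments) -> str:
    """Render (start, end, text) tuples as a WebVTT document."""
    lines = ["WEBVTT", ""]
    for start, end, text in segments:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text.strip())
        lines.append("")
    return "\n".join(lines)


demo_segments = [(0.0, 1.5, " Hello "), (1.5, 3661.5, "world")]
srt_text = to_srt(demo_segments)
vtt_text = to_vtt(demo_segments)
```

The only differences between the two renderings are the `WEBVTT` header, the cue numbering in SRT, and the decimal marker in the timestamps.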

## License

MIT

## Acknowledgements

This project was developed with the assistance of these amazing AI tools and models:

- [GitHub Copilot](https://github.com/features/copilot) - AI pair programmer
- [Trae](https://trae.ai/) - Agentic AI coding assistant
- [Cline](https://cline.ai/) - AI-powered terminal
- [DeepSeek](https://www.deepseek.com/) - Advanced AI model
- [Claude-3.7-Sonnet](https://www.anthropic.com/claude) - Anthropic's powerful AI assistant
- [Gemini-2.0-Flash](https://ai.google/gemini/) - Google's multimodal AI model
- [VS Code](https://code.visualstudio.com/) - Powerful code editor
- [Whisper](https://github.com/openai/whisper) - OpenAI's speech recognition model
- [Faster Whisper](https://github.com/guillaumekln/faster-whisper) - Optimized Whisper implementation

Special thanks to these incredible tools and the teams behind them.