From 5c2cfaa20687b84cee58d8aa05db3839f5178d9c Mon Sep 17 00:00:00 2001
From: BigUncleHomePC
Date: Sat, 22 Mar 2025 05:38:27 +0800
Subject: [PATCH] docs: update README files to include an acknowledgements
 section and English documentation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Added an acknowledgements section to README-CN.md, thanking the AI tools
and models used during development. Also added a new README.md file that
provides English documentation for the project, covering features,
installation, usage instructions, performance optimization, and more.

---
 README-CN.md |  17 ++++++
 README.md    | 163 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 180 insertions(+)
 create mode 100644 README.md

diff --git a/README-CN.md b/README-CN.md
index 602a730..bc2ddec 100644
--- a/README-CN.md
+++ b/README-CN.md
@@ -112,6 +112,23 @@ mcp run whisper_server.py
 
 MIT
 
+## 致谢
+
+本项目在开发过程中得到了以下优秀AI工具和模型的帮助:
+
+- [GitHub Copilot](https://github.com/features/copilot) - AI结对编程助手
+- [Trae](https://trae.ai/) - 智能AI编码助手
+- [Cline](https://cline.ai/) - AI驱动的终端
+- [DeepSeek](https://www.deepseek.com/) - 先进的AI模型
+- [Claude-3.7-Sonnet](https://www.anthropic.com/claude) - Anthropic强大的AI助手
+- [Gemini-2.0-Flash](https://ai.google/gemini/) - Google的多模态AI模型
+- [VS Code](https://code.visualstudio.com/) - 强大的代码编辑器
+- [Whisper](https://github.com/openai/whisper) - OpenAI的语音识别模型
+- [Faster Whisper](https://github.com/guillaumekln/faster-whisper) - 优化的Whisper实现
+
+特别感谢这些出色的工具和背后的团队。
+
+---
 # Whisper 语音识别 MCP 服务器(cline claude sonnet 3.7 完成所有任务后的说明)
 
 
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..a24f6f3
--- /dev/null
+++ b/README.md
@@ -0,0 +1,163 @@
+# Whisper Speech Recognition MCP Server
+---
+[中文文档](README-CN.md)
+---
+A high-performance speech recognition MCP server based on Faster Whisper, providing efficient audio transcription capabilities.
+
+## Features
+
+- Integrated with Faster Whisper for efficient speech recognition
+- Batch processing acceleration for improved transcription speed
+- Automatic CUDA acceleration (if available)
+- Support for multiple model sizes (tiny to large-v3)
+- Output formats include VTT subtitles, SRT, and JSON
+- Support for batch transcription of audio files in a folder
+- Model instance caching to avoid repeated loading
+- Dynamic batch size adjustment based on GPU memory
+
+## Installation
+
+### Dependencies
+
+- Python 3.10+
+- faster-whisper>=0.9.0
+- torch==2.6.0+cu126
+- torchaudio==2.6.0+cu126
+- mcp[cli]>=1.2.0
+
+### Installation Steps
+
+1. Clone or download this repository
+2. Create and activate a virtual environment (recommended)
+3. Install dependencies:
+
+```bash
+pip install -r requirements.txt
+```
+
+### PyTorch Installation Guide
+
+Install the appropriate version of PyTorch based on your CUDA version:
+
+- CUDA 12.6:
+  ```bash
+  pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu126
+  ```
+
+- CUDA 12.1:
+  ```bash
+  pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
+  ```
+
+- CPU version:
+  ```bash
+  pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cpu
+  ```
+
+You can check your CUDA version with `nvcc --version` or `nvidia-smi`.
+
+## Usage
+
+### Starting the Server
+
+On Windows, simply run `start_server.bat`.
+
+On other platforms, run:
+
+```bash
+python whisper_server.py
+```
+
+### Configuring Claude Desktop
+
+1. Open the Claude Desktop configuration file:
+   - Windows: `%APPDATA%\Claude\claude_desktop_config.json`
+   - macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
+
+2. Add the Whisper server configuration:
+
+```json
+{
+  "mcpServers": {
+    "whisper": {
+      "command": "python",
+      "args": ["D:/path/to/whisper_server.py"],
+      "env": {}
+    }
+  }
+}
+```
+
+3. Restart Claude Desktop
+
+### Available Tools
+
+The server provides the following tools:
+
+1. **get_model_info** - Get information about available Whisper models
+2. **transcribe** - Transcribe a single audio file
+3. **batch_transcribe** - Batch transcribe audio files in a folder
+
+## Performance Optimization Tips
+
+- Using CUDA acceleration significantly improves transcription speed
+- Batch processing mode is more efficient for large numbers of short audio files
+- Batch size is automatically adjusted based on GPU memory size
+- Using VAD (Voice Activity Detection) filtering improves accuracy for long audio
+- Specifying the correct language can improve transcription quality
+
+## Local Testing Methods
+
+1. Use MCP Inspector for quick testing:
+
+```bash
+mcp dev whisper_server.py
+```
+
+2. Use Claude Desktop for integration testing
+
+3. Use command line direct invocation (requires mcp[cli]):
+
+```bash
+mcp run whisper_server.py
+```
+
+## Error Handling
+
+The server implements the following error handling mechanisms:
+
+- Audio file existence check
+- Model loading failure handling
+- Transcription process exception catching
+- GPU memory management
+- Batch processing parameter adaptive adjustment
+
+## Project Structure
+
+- `whisper_server.py`: Main server code
+- `model_manager.py`: Whisper model loading and caching
+- `audio_processor.py`: Audio file validation and preprocessing
+- `formatters.py`: Output formatting (VTT, SRT, JSON)
+- `transcriber.py`: Core transcription logic
+- `start_server.bat`: Windows startup script
+
+## License
+
+MIT
+
+## Acknowledgements
+
+This project was developed with the assistance of these amazing AI tools and models:
+
+- [GitHub Copilot](https://github.com/features/copilot) - AI pair programmer
+- [Trae](https://trae.ai/) - Agentic AI coding assistant
+- [Cline](https://cline.ai/) - AI-powered terminal
+- [DeepSeek](https://www.deepseek.com/) - Advanced AI model
+- [Claude-3.7-Sonnet](https://www.anthropic.com/claude) - Anthropic's powerful AI assistant
+- [Gemini-2.0-Flash](https://ai.google/gemini/) - Google's multimodal AI model
+- [VS Code](https://code.visualstudio.com/) - Powerful code editor
+- [Whisper](https://github.com/openai/whisper) - OpenAI's speech recognition model
+- [Faster Whisper](https://github.com/guillaumekln/faster-whisper) - Optimized Whisper implementation
+
+Special thanks to these incredible tools and the teams behind them.
+