From 5c2cfaa20687b84cee58d8aa05db3839f5178d9c Mon Sep 17 00:00:00 2001
From: BigUncleHomePC <biguncle2017@gmail.com>
Date: Sat, 22 Mar 2025 05:38:27 +0800
Subject: [PATCH] docs: update README files to add an acknowledgements section and English documentation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

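Add an acknowledgements section to README-CN.md thanking the AI tools and models used during development. Also add README.md, which provides English documentation for the project covering features, installation, usage, performance optimization, and more.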
---
 README-CN.md |  17 ++++++
 README.md    | 163 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 180 insertions(+)
 create mode 100644 README.md

diff --git a/README-CN.md b/README-CN.md
index 602a730..bc2ddec 100644
--- a/README-CN.md
+++ b/README-CN.md
@@ -112,6 +112,23 @@ mcp run whisper_server.py
 
 MIT
 
+## 致谢
+
+本项目在开发过程中得到了以下优秀AI工具和模型的帮助：
+
+- [GitHub Copilot](https://github.com/features/copilot) - AI结对编程助手
+- [Trae](https://trae.ai/) - 智能AI编码助手
+- [Cline](https://cline.ai/) - AI驱动的终端
+- [DeepSeek](https://www.deepseek.com/) - 先进的AI模型
+- [Claude-3.7-Sonnet](https://www.anthropic.com/claude) - Anthropic强大的AI助手
+- [Gemini-2.0-Flash](https://ai.google/gemini/) - Google的多模态AI模型
+- [VS Code](https://code.visualstudio.com/) - 强大的代码编辑器
+- [Whisper](https://github.com/openai/whisper) - OpenAI的语音识别模型
+- [Faster Whisper](https://github.com/guillaumekln/faster-whisper) - 优化的Whisper实现
+
+特别感谢这些出色的工具和背后的团队。
+
+---
 
 # Whisper 语音识别 MCP 服务器（cline claude sonnet 3.7 完成所有任务后的说明）
 
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..a24f6f3
--- /dev/null
+++ b/README.md
@@ -0,0 +1,163 @@
+# Whisper Speech Recognition MCP Server
+
+[中文文档](README-CN.md)
+
+A high-performance speech recognition MCP server based on Faster Whisper, providing efficient audio transcription capabilities.
+
+## Features
+
+- Integrated with Faster Whisper for efficient speech recognition
+- Batch processing acceleration for improved transcription speed
+- Automatic CUDA acceleration (if available)
+- Support for multiple model sizes (tiny to large-v3)
+- Output formats include VTT subtitles, SRT, and JSON
+- Support for batch transcription of audio files in a folder
+- Model instance caching to avoid repeated loading
+- Dynamic batch size adjustment based on GPU memory
+
+## Installation
+
+### Dependencies
+
+- Python 3.10+
+- faster-whisper>=0.9.0
+- torch==2.6.0+cu126
+- torchaudio==2.6.0+cu126
+- mcp[cli]>=1.2.0
+
+### Installation Steps
+
+1. Clone or download this repository
+2. Create and activate a virtual environment (recommended)
+3. Install dependencies:
+
+```bash
+pip install -r requirements.txt
+```
+
+### PyTorch Installation Guide
+
+Install the appropriate version of PyTorch based on your CUDA version:
+
+- CUDA 12.6:
+  ```bash
+  pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu126
+  ```
+
+- CUDA 12.1:
+  ```bash
+  pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
+  ```
+
+- CPU version:
+  ```bash
+  pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cpu
+  ```
+
+Check your CUDA version with `nvcc --version` (installed toolkit) or `nvidia-smi` (highest version the driver supports).
+
+## Usage
+
+### Starting the Server
+
+On Windows, simply run `start_server.bat`.
+
+On other platforms, run:
+
+```bash
+python whisper_server.py
+```
+
+### Configuring Claude Desktop
+
+1. Open the Claude Desktop configuration file:
+   - Windows: `%APPDATA%\Claude\claude_desktop_config.json`
+   - macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
+
+2. Add the Whisper server configuration:
+
+```json
+{
+  "mcpServers": {
+    "whisper": {
+      "command": "python",
+      "args": ["D:/path/to/whisper_server.py"],
+      "env": {}
+    }
+  }
+}
+```
+
+3. Restart Claude Desktop
+
+### Available Tools
+
+The server provides the following tools:
+
+1. **get_model_info** - Get information about available Whisper models
+2. **transcribe** - Transcribe a single audio file
+3. **batch_transcribe** - Batch transcribe audio files in a folder
+
+## Performance Optimization Tips
+
+- Using CUDA acceleration significantly improves transcription speed
+- Batch processing mode is more efficient for large numbers of short audio files
+- Batch size is automatically adjusted based on GPU memory size
+- Using VAD (Voice Activity Detection) filtering improves accuracy for long audio
+- Specifying the correct language can improve transcription quality
+
+## Local Testing Methods
+
+1. Use MCP Inspector for quick testing:
+
+```bash
+mcp dev whisper_server.py
+```
+
+2. Use Claude Desktop for integration testing
+
+3. Run the server directly from the command line (requires `mcp[cli]`):
+
+```bash
+mcp run whisper_server.py
+```
+
+## Error Handling
+
+The server implements the following error handling mechanisms:
+
+- Checking that the audio file exists
+- Handling model loading failures
+- Catching exceptions raised during transcription
+- Managing GPU memory
+- Adaptively adjusting batch processing parameters
+
+## Project Structure
+
+- `whisper_server.py`: Main server code
+- `model_manager.py`: Whisper model loading and caching
+- `audio_processor.py`: Audio file validation and preprocessing
+- `formatters.py`: Output formatting (VTT, SRT, JSON)
+- `transcriber.py`: Core transcription logic
+- `start_server.bat`: Windows startup script
+
+## License
+
+MIT
+
+## Acknowledgements
+
+This project was developed with the help of the following tools and models:
+
+- [GitHub Copilot](https://github.com/features/copilot) - AI pair programmer
+- [Trae](https://trae.ai/) - Agentic AI coding assistant
+- [Cline](https://cline.ai/) - AI coding agent for VS Code
+- [DeepSeek](https://www.deepseek.com/) - Advanced AI model
+- [Claude-3.7-Sonnet](https://www.anthropic.com/claude) - Anthropic's powerful AI assistant
+- [Gemini-2.0-Flash](https://ai.google/gemini/) - Google's multimodal AI model
+- [VS Code](https://code.visualstudio.com/) - Powerful code editor
+- [Whisper](https://github.com/openai/whisper) - OpenAI's speech recognition model
+- [Faster Whisper](https://github.com/guillaumekln/faster-whisper) - Optimized Whisper implementation
+
+Special thanks to these incredible tools and the teams behind them.
+