feat: initialize Faster Whisper-based speech recognition MCP server
Adds the core server code, startup script, dependency configuration, and documentation. Supports batched inference, CUDA optimization, and multiple output formats for easy integration into Claude Desktop.
17 .gitignore vendored Normal file
@@ -0,0 +1,17 @@
# Python-generated files
__pycache__/
*.py[oc]
build/
dist/
wheels/
*.egg-info

# Virtual environments
.venv/
venv/
.ven/

# Cython
*.pyd
1 .python-version Normal file
@@ -0,0 +1 @@
3.12
167 README-CN.md Normal file
@@ -0,0 +1,167 @@
# Whisper Speech Recognition MCP Server

A speech recognition MCP server based on Faster Whisper, providing high-performance audio transcription.

## Features

- Integrates Faster Whisper for efficient speech recognition
- Batched inference for faster transcription
- Automatic CUDA acceleration when available
- Multiple model sizes supported (tiny through large-v3)
- Output as VTT subtitles or JSON
- Batch transcription of all audio files in a folder
- Model instance caching to avoid repeated loading

## Installation

### Dependencies

- Python 3.10+
- faster-whisper>=0.9.0
- torch==2.6.0+cu126
- torchaudio==2.6.0+cu126
- mcp[cli]>=1.2.0

### Installation Steps

1. Clone or download this repository
2. Create and activate a virtual environment (recommended)
3. Install the dependencies:

```bash
pip install -r requirements.txt
```

## Usage

### Starting the Server

On Windows, simply run `start_server.bat`.

On other platforms, run:

```bash
python whisper_server.py
```

### Configuring Claude Desktop

1. Open the Claude Desktop configuration file:
   - Windows: `%APPDATA%\Claude\claude_desktop_config.json`
   - macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`

2. Add the Whisper server configuration:

```json
{
  "mcpServers": {
    "whisper": {
      "command": "python",
      "args": ["D:/path/to/whisper_server.py"],
      "env": {}
    }
  }
}
```

3. Restart Claude Desktop

### Available Tools

The server provides the following tools (a usage sketch follows the list):

1. **get_model_info** - Get information about the available Whisper models
2. **transcribe** - Transcribe a single audio file
3. **batch_transcribe** - Batch-transcribe the audio files in a folder
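
To get a feel for what the `transcribe` tool does internally, here is a minimal sketch that calls faster-whisper directly with the same VAD-filtered settings the server uses. The file path `audio.mp3` and the language code `zh` are placeholder values for illustration only.

```python
from faster_whisper import WhisperModel

# Illustrative sketch only: mirrors the defaults of the server's transcribe tool.
model = WhisperModel("large-v3", device="cuda", compute_type="float16")
segments, info = model.transcribe(
    "audio.mp3",                                      # placeholder path
    language="zh",                                    # omit to auto-detect
    vad_filter=True,                                  # same VAD filtering the server enables
    vad_parameters={"min_silence_duration_ms": 500},
)
for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text.strip()}")
```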

## Performance Tuning Tips

- CUDA acceleration significantly speeds up transcription
- For large numbers of short audio files, batched mode is more efficient
- The batch size is adjusted automatically based on GPU memory (see the sketch after this list)
- For long audio, VAD filtering improves accuracy
- Specifying the correct language improves transcription quality
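
As a concrete illustration of the automatic batch-size adjustment, the following sketch reproduces the memory thresholds used by `get_whisper_model` in `whisper_server.py` below. It is a simplified excerpt, not a separate API; the helper name `pick_batch_size` exists only for this illustration.

```python
import torch

def pick_batch_size() -> int:
    """Pick a batch size from total GPU memory (batching is only enabled on CUDA)."""
    if not torch.cuda.is_available():
        return 0  # no batching on CPU
    gpu_mem = torch.cuda.get_device_properties(0).total_memory
    if gpu_mem > 16e9:    # > 16 GB
        return 32
    elif gpu_mem > 12e9:  # > 12 GB
        return 16
    elif gpu_mem > 8e9:   # > 8 GB
        return 8
    return 4              # smaller GPUs
```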

## Local Testing

1. Quick test with the MCP Inspector:

```bash
mcp dev whisper_server.py
```

2. Integration testing with Claude Desktop

3. Direct invocation from the command line (requires mcp[cli]):

```bash
mcp run whisper_server.py
```

## Error Handling

The server implements the following error handling mechanisms:

- Checks that the audio file exists
- Handles model loading failures
- Catches exceptions during transcription
- GPU memory management
- Adaptive adjustment of batching parameters

## License

MIT

# Whisper Speech Recognition MCP Server (notes from cline claude sonnet 3.7 after completing all tasks)

I have successfully created a Python-based speech recognition MCP server that integrates Faster Whisper and supports batched inference. The server has the following characteristics:

## Core Features

- **Faster Whisper integration**: uses a recent version (>=0.9.0) for efficient speech recognition
- **CUDA acceleration**: automatically detects and uses CUDA, defaulting to torch==2.6.0+cu126
- **Batched inference**: uses BatchedInferencePipeline, with the batch size adjusted automatically based on GPU memory
- **Model caching**: caches model instances to avoid reloading
- **Multiple output formats**: transcription results as VTT subtitles or JSON
- **Batch processing**: batch transcription of all audio files in a folder

## Main Tools

The server provides three main tools:

1. **get_model_info**: get information about the available Whisper models and system configuration
2. **transcribe**: transcribe a single audio file with configurable parameters
3. **batch_transcribe**: batch-transcribe the audio files in a folder

## Error Handling Mechanisms

- Validation that the audio file exists
- Catching and logging model loading exceptions
- Exception handling during transcription
- GPU memory management and cleanup
- Adaptive adjustment of batching parameters

## Performance Optimizations

- Batch size adjusted dynamically (4-32) based on GPU memory
- VAD (voice activity detection) filtering to improve accuracy
- Model instance caching to avoid reloading
- Automatic selection of the best device and compute type (see the snippet after this list)
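
The device and compute-type auto-selection boils down to the following; this is a simplified excerpt of the "auto" handling in `get_whisper_model` in `whisper_server.py` below, shown here only for illustration.

```python
import torch

# Prefer CUDA with float16, fall back to CPU with int8.
device = "cuda" if torch.cuda.is_available() else "cpu"
compute_type = "float16" if device == "cuda" else "int8"
```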

## Local Testing

Several testing approaches are available:

- Quick test with the MCP Inspector: `mcp dev whisper_server.py`
- Integration testing with Claude Desktop
- Direct invocation from the command line: `mcp run whisper_server.py`

All files are ready, including:

- whisper_server.py: main server code
- requirements.txt: dependency list
- start_server.bat: Windows startup script
- README.md: detailed documentation

You can start the server by running start_server.bat or by executing `python whisper_server.py` directly.
5 __init__.py Normal file
@@ -0,0 +1,5 @@
"""
Speech recognition MCP service module
"""

__version__ = "0.1.0"
6 main.py Normal file
@@ -0,0 +1,6 @@
def main():
    print("Hello from fast-whisper-mcp-server!")


if __name__ == "__main__":
    main()
7 pyproject.toml Normal file
@@ -0,0 +1,7 @@
[project]
name = "fast-whisper-mcp-server"
version = "0.1.0"
description = "Speech recognition MCP server based on Faster Whisper"
readme = "README.md"
requires-python = ">=3.12"
dependencies = []
4 requirements.txt Normal file
@@ -0,0 +1,4 @@
faster-whisper>=0.9.0
torch==2.6.0+cu126
torchaudio==2.6.0+cu126
mcp[cli]>=1.2.0
16 start_server.bat Normal file
@@ -0,0 +1,16 @@
@echo off
echo Starting Whisper speech recognition MCP server...

:: Activate the virtual environment (if it exists)
if exist "..\venv\Scripts\activate.bat" (
    call ..\venv\Scripts\activate.bat
)

:: Run the MCP server
python whisper_server.py

:: If startup fails, pause so the error message can be read
if %ERRORLEVEL% neq 0 (
    echo Server failed to start, error code: %ERRORLEVEL%
    pause
)
296 whisper_server.py Normal file
@@ -0,0 +1,296 @@
#!/usr/bin/env python3
"""
Speech recognition MCP service based on Faster Whisper
"""

import os
import json
import logging
from typing import Optional, Dict, List
import torch
from faster_whisper import WhisperModel, BatchedInferencePipeline
from mcp.server.fastmcp import FastMCP, Context

# Logging configuration
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Create the FastMCP server instance
mcp = FastMCP(
    name="whisper-server",
    version="0.1.0",
    dependencies=["faster-whisper>=0.9.0", "torch==2.6.0+cu126", "torchaudio==2.6.0+cu126"]
)

# Global cache of model instances
model_instances = {}

@mcp.tool()
def get_model_info() -> str:
    """Get information about the available Whisper models"""
    models = [
        "tiny", "base", "small", "medium", "large-v1", "large-v2", "large-v3"
    ]
    devices = ["cpu", "cuda"] if torch.cuda.is_available() else ["cpu"]
    compute_types = ["float16", "int8"] if torch.cuda.is_available() else ["int8"]

    info = {
        "available_models": models,
        "default_model": "large-v3",
        "available_devices": devices,
        "default_device": "cuda" if torch.cuda.is_available() else "cpu",
        "available_compute_types": compute_types,
        "default_compute_type": "float16" if torch.cuda.is_available() else "int8",
        "cuda_available": torch.cuda.is_available()
    }

    if torch.cuda.is_available():
        info["gpu_info"] = {
            "name": torch.cuda.get_device_name(0),
            "memory_total": f"{torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB"
        }

    return json.dumps(info, indent=2)

def get_whisper_model(model_name: str, device: str, compute_type: str) -> Dict:
    """
    Get or create a Whisper model instance

    Args:
        model_name: model name (tiny, base, small, medium, large-v1, large-v2, large-v3)
        device: device to run on (cpu, cuda)
        compute_type: compute type (float16, int8)

    Returns:
        dict: dictionary containing the model instance and its configuration
    """
    global model_instances

    # Build the cache key for this model configuration
    model_key = f"{model_name}_{device}_{compute_type}"

    # If the model is already instantiated, return it directly
    if model_key in model_instances:
        return model_instances[model_key]

    # Auto-detect the device
    if device == "auto":
        device = "cuda" if torch.cuda.is_available() else "cpu"
        compute_type = "float16" if device == "cuda" else "int8"

    # Free GPU memory (when using CUDA)
    if device == "cuda":
        torch.cuda.empty_cache()

    # Instantiate the model
    try:
        logger.info(f"Loading Whisper model: {model_name} device: {device} compute type: {compute_type}")

        # Base model
        model = WhisperModel(
            model_name,
            device=device,
            compute_type=compute_type
        )

        # Batching setup: enabled by default to improve speed
        batched_model = None
        batch_size = 0

        if device == "cuda":  # only use batching on CUDA devices
            # Determine a suitable batch size from the available GPU memory
            if torch.cuda.is_available():
                gpu_mem = torch.cuda.get_device_properties(0).total_memory
                # Adjust the batch size dynamically based on GPU memory
                if gpu_mem > 16e9:  # > 16 GB
                    batch_size = 32
                elif gpu_mem > 12e9:  # > 12 GB
                    batch_size = 16
                elif gpu_mem > 8e9:  # > 8 GB
                    batch_size = 8
                else:  # smaller GPUs
                    batch_size = 4
            else:
                batch_size = 8  # default value

            logger.info(f"Batched inference enabled, batch size: {batch_size}")
            batched_model = BatchedInferencePipeline(model=model)

        # Build the result object
        result = {
            'model': model,
            'device': device,
            'compute_type': compute_type,
            'batched_model': batched_model,
            'batch_size': batch_size
        }

        # Cache the instance
        model_instances[model_key] = result
        return result

    except Exception as e:
        logger.error(f"Failed to load model: {str(e)}")
        raise

@mcp.tool()
def transcribe(audio_path: str, model_name: str = "large-v3", device: str = "auto",
               compute_type: str = "auto", language: Optional[str] = None, output_format: str = "vtt") -> str:
    """
    Transcribe an audio file with Faster Whisper

    Args:
        audio_path: path to the audio file
        model_name: model name (tiny, base, small, medium, large-v1, large-v2, large-v3)
        device: device to run on (cpu, cuda, auto)
        compute_type: compute type (float16, int8, auto)
        language: language code (e.g. zh, en, ja; auto-detected by default)
        output_format: output format (vtt or json)

    Returns:
        str: transcription result as VTT subtitles or JSON
    """
    # Validate the arguments
    if not os.path.exists(audio_path):
        return f"Error: audio file does not exist: {audio_path}"

    try:
        # Get the model instance
        model_instance = get_whisper_model(model_name, device, compute_type)

        # Transcription options
        options = {
            "language": language,
            "vad_filter": True,  # use voice activity detection
            "vad_parameters": {"min_silence_duration_ms": 500},  # tuned VAD parameters
        }

        # Run the transcription, preferring the batched model
        if model_instance['batched_model'] is not None and model_instance['device'] == 'cuda':
            logger.info("Transcribing with batched inference...")
            # The batched model takes an explicit batch_size argument
            segments, info = model_instance['batched_model'].transcribe(
                audio_path,
                batch_size=model_instance['batch_size'],
                **options
            )
        else:
            logger.info("Transcribing with the standard model...")
            segments, info = model_instance['model'].transcribe(audio_path, **options)

        # Convert the generator to a list
        segment_list = list(segments)

        if not segment_list:
            return "Transcription failed, no result was produced"

        # Return the result in the requested output format
        if output_format.lower() == "vtt":
            return format_vtt(segment_list)
        else:
            return format_json(segment_list, info)

    except Exception as e:
        logger.error(f"Transcription failed: {str(e)}")
        return f"An error occurred during transcription: {str(e)}"

def format_vtt(segments) -> str:
    """Format the transcription result as VTT"""
    vtt_content = "WEBVTT\n\n"

    for segment in segments:
        start = format_timestamp(segment.start)
        end = format_timestamp(segment.end)
        text = segment.text.strip()

        if text:
            vtt_content += f"{start} --> {end}\n{text}\n\n"

    return vtt_content

def format_json(segments, info) -> str:
    """Format the transcription result as JSON"""
    result = {
        "segments": [{
            "start": segment.start,
            "end": segment.end,
            "text": segment.text
        } for segment in segments],
        "language": info.language,
        "duration": info.duration
    }
    return json.dumps(result, indent=2, ensure_ascii=False)

def format_timestamp(seconds: float) -> str:
    """Format a timestamp in VTT format"""
    hours = int(seconds // 3600)
    minutes = int((seconds % 3600) // 60)
    seconds = seconds % 60
    return f"{hours:02d}:{minutes:02d}:{seconds:06.3f}"

@mcp.tool()
def batch_transcribe(audio_folder: str, output_folder: Optional[str] = None, model_name: str = "large-v3",
                     device: str = "auto", compute_type: str = "auto") -> str:
    """
    Batch-transcribe the audio files in a folder

    Args:
        audio_folder: path to the folder containing audio files
        output_folder: output folder path, defaults to a "transcript" subfolder of audio_folder
        model_name: model name
        device: device to run on
        compute_type: compute type

    Returns:
        str: summary of the batch run
    """
    if not os.path.isdir(audio_folder):
        return f"Error: folder does not exist: {audio_folder}"

    # Set the output folder
    if output_folder is None:
        output_folder = os.path.join(audio_folder, "transcript")

    # Make sure the output directory exists
    os.makedirs(output_folder, exist_ok=True)

    # Collect all audio files
    audio_files = []
    for filename in os.listdir(audio_folder):
        if filename.lower().endswith(('.mp3', '.wav', '.m4a', '.flac')):
            audio_files.append(os.path.join(audio_folder, filename))

    if not audio_files:
        return f"No audio files found in {audio_folder}"

    # Process each file
    results = []
    for i, audio_path in enumerate(audio_files):
        logger.info(f"Processing file {i+1}/{len(audio_files)}: {os.path.basename(audio_path)}")

        # Build the output file path
        base_name = os.path.splitext(os.path.basename(audio_path))[0]
        vtt_path = os.path.join(output_folder, f"{base_name}.vtt")

        # Run the transcription
        result = transcribe(
            audio_path=audio_path,
            model_name=model_name,
            device=device,
            compute_type=compute_type,
            output_format="vtt"
        )

        # Save the result to a file
        with open(vtt_path, 'w', encoding='utf-8') as f:
            f.write(result)

        results.append(f"Transcribed: {os.path.basename(audio_path)} -> {os.path.basename(vtt_path)}")

    summary = f"Batch run complete, successfully transcribed {len(results)}/{len(audio_files)} files\n\n"
    summary += "\n".join(results)
    return summary

if __name__ == "__main__":
    # Run the server
    mcp.run()