Compare commits

...

2 Commits

7 changed files with 604 additions and 11 deletions

CLAUDE.md (new file, +265)

@@ -0,0 +1,265 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Overview
This is a Whisper-based speech recognition service that provides high-performance audio transcription using Faster Whisper. The service can run as either:
1. **MCP Server** - For integration with Claude Desktop and other MCP clients
2. **REST API Server** - For HTTP-based integrations
Both servers share the same core transcription logic and can run independently or simultaneously on different ports.
## Development Commands
### Environment Setup
```bash
# Create and activate virtual environment
python3.12 -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Install PyTorch with CUDA 12.6 support
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu126
# For CUDA 12.1
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
# For CPU-only
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cpu
```
### Running the Servers
#### MCP Server (for Claude Desktop)
```bash
# Using the startup script (recommended - sets all env vars)
./run_mcp_server.sh
# Direct Python execution
python whisper_server.py
# Using MCP CLI for development testing
mcp dev whisper_server.py
# Run server with MCP CLI
mcp run whisper_server.py
```
#### REST API Server (for HTTP clients)
```bash
# Using the startup script (recommended - sets all env vars)
./run_api_server.sh
# Direct Python execution with uvicorn
python api_server.py
# Or using uvicorn directly
uvicorn api_server:app --host 0.0.0.0 --port 8000
# Development mode with auto-reload
uvicorn api_server:app --reload --host 0.0.0.0 --port 8000
```
#### Running Both Simultaneously
```bash
# Terminal 1: Start MCP server
./run_mcp_server.sh
# Terminal 2: Start REST API server
./run_api_server.sh
```
### Docker
```bash
# Build Docker image
docker build -t whisper-mcp-server .
# Run with GPU support
docker run --gpus all -v /path/to/models:/models -v /path/to/outputs:/outputs whisper-mcp-server
```
## Architecture
### Core Components
1. **whisper_server.py** - MCP server entry point
- Uses FastMCP framework to expose three MCP tools
- Delegates to transcriber.py for actual processing
- Server initialization at line 19
2. **api_server.py** - REST API server entry point
- Uses FastAPI framework to expose HTTP endpoints
- Provides six REST endpoints: `/`, `/health`, `/models`, `/transcribe`, `/batch-transcribe`, `/upload-transcribe`
- Shares the same core transcription logic with MCP server
- Includes file upload support via multipart/form-data
3. **transcriber.py** - Core transcription logic (shared by both servers)
- `transcribe_audio()` (line 38) - Single file transcription with environment variable support
- `batch_transcribe()` (line 208) - Batch processing with progress reporting
- All parameters support environment variable defaults
- Handles output formatting delegation to formatters.py
4. **model_manager.py** - Whisper model lifecycle management
- `get_whisper_model()` (line 44) - Returns cached model instances or loads new ones
- `test_gpu_driver()` (line 20) - GPU validation before model loading
- Global `model_instances` dict caches loaded models to prevent reloading
- Automatically determines batch size based on available GPU memory (lines 113-134)
5. **audio_processor.py** - Audio file validation and preprocessing
- `validate_audio_file()` (line 15) - Checks file existence, format, and size
- `process_audio()` (line 50) - Decodes audio using faster_whisper's decode_audio
6. **formatters.py** - Output format conversion
- `format_vtt()`, `format_srt()`, `format_txt()`, `format_json()` - Convert segments to various formats
- All formatters accept segment lists from Whisper output
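To illustrate the formatting step, here is a minimal sketch of SRT output generation, assuming each segment exposes `start`, `end`, and `text` attributes as faster_whisper segments do; the helper name and exact layout are illustrative, not the actual code in formatters.py:

```python
def _srt_timestamp(seconds: float) -> str:
    """Convert seconds to an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def format_srt(segments) -> str:
    """Render a list of Whisper segments as an SRT document."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{_srt_timestamp(seg.start)} --> {_srt_timestamp(seg.end)}\n{seg.text.strip()}\n"
        )
    return "\n".join(blocks)
```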
### Key Architecture Patterns
- **Dual Server Architecture**: Both MCP and REST API servers import and use the same core modules (transcriber.py, model_manager.py, audio_processor.py, formatters.py), ensuring consistent behavior
- **Model Caching**: Models are cached in the `model_instances` dictionary under the key format `{model_name}_{device}_{compute_type}` (model_manager.py:84). The cache is shared when both servers run in the same process; see the sketch after this list
- **Batch Processing**: CUDA devices automatically use BatchedInferencePipeline for performance (model_manager.py:109-134)
- **Environment Variable Configuration**: All transcription parameters support env var defaults (transcriber.py:19-36)
- **Device Auto-Detection**: `device="auto"` automatically selects CUDA if available, otherwise CPU (model_manager.py:64-66)
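A minimal sketch of the caching and device auto-detection pattern described above (illustrative only; the real `get_whisper_model()` also runs the GPU driver test and sets up batched inference):

```python
import torch
from faster_whisper import WhisperModel

model_instances = {}  # shared by both servers when they run in the same process

def get_whisper_model(model_name: str, device: str, compute_type: str):
    # Resolve "auto" to a concrete device before building the cache key
    if device == "auto":
        device = "cuda" if torch.cuda.is_available() else "cpu"
    key = f"{model_name}_{device}_{compute_type}"
    if key not in model_instances:
        model_instances[key] = WhisperModel(model_name, device=device, compute_type=compute_type)
    return model_instances[key]
```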
## Environment Variables
All configuration can be set via environment variables in run_mcp_server.sh and run_api_server.sh:
**API Server Specific:**
- `API_HOST` - API server host (default: 0.0.0.0)
- `API_PORT` - API server port (default: 8000)
**Transcription Configuration (shared by both servers):**
- `CUDA_VISIBLE_DEVICES` - GPU device selection
- `WHISPER_MODEL_DIR` - Model storage location (defaults to None for HuggingFace cache)
- `TRANSCRIPTION_OUTPUT_DIR` - Default output directory for single transcriptions
- `TRANSCRIPTION_BATCH_OUTPUT_DIR` - Default output directory for batch processing
- `TRANSCRIPTION_MODEL` - Model size (tiny, base, small, medium, large-v1, large-v2, large-v3)
- `TRANSCRIPTION_DEVICE` - Execution device (cpu, cuda, auto)
- `TRANSCRIPTION_COMPUTE_TYPE` - Computation type (float16, int8, auto)
- `TRANSCRIPTION_OUTPUT_FORMAT` - Output format (vtt, srt, txt, json)
- `TRANSCRIPTION_BEAM_SIZE` - Beam search size (default: 5)
- `TRANSCRIPTION_TEMPERATURE` - Sampling temperature (default: 0.0)
- `TRANSCRIPTION_USE_TIMESTAMP` - Add timestamp to filenames (true/false)
- `TRANSCRIPTION_FILENAME_PREFIX` - Prefix for output filenames
- `TRANSCRIPTION_FILENAME_SUFFIX` - Suffix for output filenames
- `TRANSCRIPTION_LANGUAGE` - Language code (zh, en, ja, etc., auto-detect if not set)
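A rough sketch of how the variables listed above are typically picked up as defaults in transcriber.py (constant names and fallback values here are illustrative, not the exact code):

```python
import os

# Fall back to environment variables when a parameter is not passed explicitly
DEFAULT_MODEL = os.getenv("TRANSCRIPTION_MODEL", "large-v3")
DEFAULT_DEVICE = os.getenv("TRANSCRIPTION_DEVICE", "auto")
DEFAULT_OUTPUT_FORMAT = os.getenv("TRANSCRIPTION_OUTPUT_FORMAT", "txt")
DEFAULT_BEAM_SIZE = int(os.getenv("TRANSCRIPTION_BEAM_SIZE", "5"))
DEFAULT_TEMPERATURE = float(os.getenv("TRANSCRIPTION_TEMPERATURE", "0.0"))
USE_TIMESTAMP = os.getenv("TRANSCRIPTION_USE_TIMESTAMP", "false").lower() == "true"
```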
## Supported Configurations
- **Models**: tiny, base, small, medium, large-v1, large-v2, large-v3
- **Audio formats**: .mp3, .wav, .m4a, .flac, .ogg, .aac
- **Output formats**: vtt, srt, json, txt
- **Languages**: zh (Chinese), en (English), ja (Japanese), ko (Korean), de (German), fr (French), es (Spanish), ru (Russian), it (Italian), pt (Portuguese), nl (Dutch), ar (Arabic), hi (Hindi), tr (Turkish), vi (Vietnamese), th (Thai), id (Indonesian)
## REST API Endpoints
The REST API server provides the following HTTP endpoints:
### GET /
Returns API information and available endpoints.
### GET /health
Health check endpoint. Returns `{"status": "healthy", "service": "whisper-transcription"}`.
### GET /models
Returns available Whisper models, devices, languages, and system information (GPU details if CUDA available).
### POST /transcribe
Transcribe a single audio file that exists on the server.
**Request Body:**
```json
{
"audio_path": "/path/to/audio.mp3",
"model_name": "large-v3",
"device": "auto",
"compute_type": "auto",
"language": "en",
"output_format": "txt",
"beam_size": 5,
"temperature": 0.0,
"initial_prompt": null,
"output_directory": null
}
```
**Response:**
```json
{
"success": true,
"message": "Transcription successful, results saved to: /path/to/output.txt",
"output_path": "/path/to/output.txt"
}
```
### POST /batch-transcribe
Batch transcribe all audio files in a folder.
**Request Body:**
```json
{
"audio_folder": "/path/to/audio/folder",
"output_folder": "/path/to/output",
"model_name": "large-v3",
"output_format": "txt",
...
}
```
**Response:**
```json
{
"success": true,
"summary": "Batch processing completed, total transcription time: 00:05:23 | Success: 10/10 | Failed: 0/10"
}
```
### POST /upload-transcribe
Upload an audio file and transcribe it immediately. Returns the transcription file as a download.
**Form Data:**
- `file`: Audio file (multipart/form-data)
- `model_name`: Model name (default: "large-v3")
- `device`: Device (default: "auto")
- `output_format`: Output format (default: "txt")
- ... (other transcription parameters)
**Response:** Returns the transcription file for download.
### API Usage Examples
```bash
# Get model information
curl http://localhost:8000/models
# Transcribe existing file
curl -X POST http://localhost:8000/transcribe \
  -H "Content-Type: application/json" \
  -d '{"audio_path": "/path/to/audio.mp3", "output_format": "txt"}'
# Upload and transcribe
curl -X POST http://localhost:8000/upload-transcribe \
  -F "file=@audio.mp3" \
  -F "output_format=txt" \
  -F "model_name=large-v3"
```
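The same endpoints can also be called from Python; a minimal sketch using the `requests` library (assumes the server is reachable at http://localhost:8000 and that paths shown are placeholders):

```python
import requests

BASE = "http://localhost:8000"

# Transcribe a file that already exists on the server
resp = requests.post(f"{BASE}/transcribe", json={
    "audio_path": "/path/to/audio.mp3",
    "output_format": "txt",
})
resp.raise_for_status()
print(resp.json()["output_path"])

# Upload a local file and save the returned transcript
with open("audio.mp3", "rb") as f:
    resp = requests.post(
        f"{BASE}/upload-transcribe",
        files={"file": f},
        data={"output_format": "txt", "model_name": "large-v3"},
    )
resp.raise_for_status()
with open("audio.txt", "wb") as out:
    out.write(resp.content)
```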
## Important Implementation Details
- GPU memory is checked before loading models (model_manager.py:115-127)
- Batch size adjusts dynamically with available GPU memory: 32 (>16GB), 16 (>12GB), 8 (>8GB), 4 (>4GB), 2 otherwise; see the sketch below
- VAD (Voice Activity Detection) is enabled by default for better long-audio accuracy (transcriber.py:101)
- Word timestamps are enabled by default (transcriber.py:106)
- Model loading includes GPU driver test to fail fast if GPU is unavailable (model_manager.py:92)
- Files over 1GB generate warnings about processing time (audio_processor.py:42)
- Default output format is "txt" for REST API, configured via environment variables for MCP server
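The batch-size selection above is a simple threshold ladder; an illustrative sketch (the actual logic lives in model_manager.py:113-134):

```python
def pick_batch_size(gpu_mem_gb: float) -> int:
    # Larger cards get larger batches for BatchedInferencePipeline
    if gpu_mem_gb > 16:
        return 32
    if gpu_mem_gb > 12:
        return 16
    if gpu_mem_gb > 8:
        return 8
    if gpu_mem_gb > 4:
        return 4
    return 2
```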

api_server.py (new file, +286)

@@ -0,0 +1,286 @@
#!/usr/bin/env python3
"""
FastAPI REST API Server for Whisper Transcription
Provides HTTP REST endpoints for audio transcription
"""
import os
import logging
from typing import Optional
from fastapi import FastAPI, HTTPException, UploadFile, File, Form
from fastapi.responses import JSONResponse, FileResponse
from pydantic import BaseModel, Field
import json
from model_manager import get_model_info
from transcriber import transcribe_audio, batch_transcribe
# Logging configuration
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
# Create FastAPI app
app = FastAPI(
    title="Whisper Speech Recognition API",
    description="High-performance audio transcription API based on Faster Whisper",
    version="0.1.1"
)
# Request/Response Models
class TranscribeRequest(BaseModel):
    audio_path: str = Field(..., description="Path to the audio file on the server")
    model_name: str = Field("large-v3", description="Whisper model name")
    device: str = Field("auto", description="Execution device (cpu, cuda, auto)")
    compute_type: str = Field("auto", description="Computation type (float16, int8, auto)")
    language: Optional[str] = Field(None, description="Language code (zh, en, ja, etc.)")
    output_format: str = Field("txt", description="Output format (vtt, srt, json, txt)")
    beam_size: int = Field(5, description="Beam search size")
    temperature: float = Field(0.0, description="Sampling temperature")
    initial_prompt: Optional[str] = Field(None, description="Initial prompt text")
    output_directory: Optional[str] = Field(None, description="Output directory path")

class BatchTranscribeRequest(BaseModel):
    audio_folder: str = Field(..., description="Path to folder containing audio files")
    output_folder: Optional[str] = Field(None, description="Output folder path")
    model_name: str = Field("large-v3", description="Whisper model name")
    device: str = Field("auto", description="Execution device (cpu, cuda, auto)")
    compute_type: str = Field("auto", description="Computation type (float16, int8, auto)")
    language: Optional[str] = Field(None, description="Language code (zh, en, ja, etc.)")
    output_format: str = Field("txt", description="Output format (vtt, srt, json, txt)")
    beam_size: int = Field(5, description="Beam search size")
    temperature: float = Field(0.0, description="Sampling temperature")
    initial_prompt: Optional[str] = Field(None, description="Initial prompt text")
    parallel_files: int = Field(1, description="Number of files to process in parallel")

class TranscribeResponse(BaseModel):
    success: bool
    message: str
    output_path: Optional[str] = None

class BatchTranscribeResponse(BaseModel):
    success: bool
    summary: str
# API Endpoints
@app.get("/")
async def root():
"""Root endpoint with API information"""
return {
"name": "Whisper Speech Recognition API",
"version": "0.1.1",
"endpoints": {
"GET /health": "Health check",
"GET /models": "Get available models information",
"POST /transcribe": "Transcribe a single audio file",
"POST /batch-transcribe": "Batch transcribe audio files",
"POST /upload-transcribe": "Upload and transcribe audio file"
}
}
@app.get("/health")
async def health_check():
"""Health check endpoint"""
return {"status": "healthy", "service": "whisper-transcription"}
@app.get("/models")
async def get_models():
"""Get available Whisper models and configuration information"""
try:
model_info = get_model_info()
return JSONResponse(content=json.loads(model_info))
except Exception as e:
logger.error(f"Failed to get model info: {str(e)}")
raise HTTPException(status_code=500, detail=f"Failed to get model info: {str(e)}")
@app.post("/transcribe", response_model=TranscribeResponse)
async def transcribe(request: TranscribeRequest):
"""
Transcribe a single audio file
The audio file must already exist on the server at the specified path.
"""
try:
logger.info(f"Received transcription request for: {request.audio_path}")
result = transcribe_audio(
audio_path=request.audio_path,
model_name=request.model_name,
device=request.device,
compute_type=request.compute_type,
language=request.language,
output_format=request.output_format,
beam_size=request.beam_size,
temperature=request.temperature,
initial_prompt=request.initial_prompt,
output_directory=request.output_directory
)
# Parse result to determine success
if result.startswith("Error") or "failed" in result.lower():
return TranscribeResponse(
success=False,
message=result,
output_path=None
)
# Extract output path from success message
output_path = None
if "saved to:" in result:
output_path = result.split("saved to:")[1].strip()
return TranscribeResponse(
success=True,
message=result,
output_path=output_path
)
except Exception as e:
logger.error(f"Transcription failed: {str(e)}")
raise HTTPException(status_code=500, detail=f"Transcription failed: {str(e)}")
@app.post("/batch-transcribe", response_model=BatchTranscribeResponse)
async def batch_transcribe_endpoint(request: BatchTranscribeRequest):
"""
Batch transcribe all audio files in a folder
Processes all supported audio files in the specified folder.
"""
try:
logger.info(f"Received batch transcription request for: {request.audio_folder}")
result = batch_transcribe(
audio_folder=request.audio_folder,
output_folder=request.output_folder,
model_name=request.model_name,
device=request.device,
compute_type=request.compute_type,
language=request.language,
output_format=request.output_format,
beam_size=request.beam_size,
temperature=request.temperature,
initial_prompt=request.initial_prompt,
parallel_files=request.parallel_files
)
# Check if there were errors
success = not result.startswith("Error")
return BatchTranscribeResponse(
success=success,
summary=result
)
except Exception as e:
logger.error(f"Batch transcription failed: {str(e)}")
raise HTTPException(status_code=500, detail=f"Batch transcription failed: {str(e)}")
@app.post("/upload-transcribe")
async def upload_and_transcribe(
file: UploadFile = File(...),
model_name: str = Form("large-v3"),
device: str = Form("auto"),
compute_type: str = Form("auto"),
language: Optional[str] = Form(None),
output_format: str = Form("txt"),
beam_size: int = Form(5),
temperature: float = Form(0.0),
initial_prompt: Optional[str] = Form(None)
):
"""
Upload an audio file and transcribe it
This endpoint accepts file uploads via multipart/form-data.
"""
import tempfile
import shutil
try:
# Create temporary directory for upload
temp_dir = tempfile.mkdtemp(prefix="whisper_upload_")
# Save uploaded file
file_ext = os.path.splitext(file.filename)[1]
temp_audio_path = os.path.join(temp_dir, f"upload{file_ext}")
with open(temp_audio_path, "wb") as buffer:
shutil.copyfileobj(file.file, buffer)
logger.info(f"Uploaded file saved to: {temp_audio_path}")
# Transcribe the uploaded file
result = transcribe_audio(
audio_path=temp_audio_path,
model_name=model_name,
device=device,
compute_type=compute_type,
language=language,
output_format=output_format,
beam_size=beam_size,
temperature=temperature,
initial_prompt=initial_prompt,
output_directory=temp_dir
)
# Parse result
if result.startswith("Error") or "failed" in result.lower():
# Clean up temp files
shutil.rmtree(temp_dir, ignore_errors=True)
raise HTTPException(status_code=500, detail=result)
# Extract output path
output_path = None
if "saved to:" in result:
output_path = result.split("saved to:")[1].strip()
# Return the transcription file
if output_path and os.path.exists(output_path):
return FileResponse(
output_path,
media_type="text/plain",
filename=os.path.basename(output_path),
background=None # Don't delete yet, we'll clean up after
)
else:
# Clean up temp files
shutil.rmtree(temp_dir, ignore_errors=True)
return JSONResponse(content={
"success": True,
"message": result
})
except Exception as e:
logger.error(f"Upload and transcribe failed: {str(e)}")
# Clean up temp files on error
if 'temp_dir' in locals():
shutil.rmtree(temp_dir, ignore_errors=True)
raise HTTPException(status_code=500, detail=f"Upload and transcribe failed: {str(e)}")
finally:
await file.close()
if __name__ == "__main__":
    import uvicorn

    # Get configuration from environment variables
    host = os.getenv("API_HOST", "0.0.0.0")
    port = int(os.getenv("API_PORT", "8000"))
    logger.info(f"Starting Whisper REST API server on {host}:{port}")
    uvicorn.run(
        app,
        host=host,
        port=port,
        log_level="info"
    )


@@ -1,6 +0,0 @@
{"jsonrpc":"2.0","id":1,"result":{"protocolVersion":"2025-03-26","capabilities":{"experimental":{},"prompts":{"listChanged":false},"resources":{"subscribe":false,"listChanged":false},"tools":{"listChanged":false}},"serverInfo":{"name":"fast-whisper-mcp-server","version":"1.9.4"}}}
INFO:mcp.server.lowlevel.server:Processing request of type ListToolsRequest
{"jsonrpc":"2.0","id":2,"result":{"tools":[{"name":"get_model_info_api","description":"\n Get available Whisper model information\n ","inputSchema":{"properties":{},"title":"get_model_info_apiArguments","type":"object"}},{"name":"transcribe","description":"\n Transcribe audio files using Faster Whisper\n\n Args:\n audio_path: Path to the audio file\n model_name: Model name (tiny, base, small, medium, large-v1, large-v2, large-v3)\n device: Execution device (cpu, cuda, auto)\n compute_type: Computation type (float16, int8, auto)\n language: Language code (such as zh, en, ja, etc., auto-detect by default)\n output_format: Output format (vtt, srt, json or txt)\n beam_size: Beam search size, larger values may improve accuracy but reduce speed\n temperature: Sampling temperature, greedy decoding\n initial_prompt: Initial prompt text, can help the model better understand context\n output_directory: Output directory path, defaults to the audio file's directory\n\n Returns:\n str: Transcription result, in VTT subtitle or JSON format\n ","inputSchema":{"properties":{"audio_path":{"title":"Audio Path","type":"string"},"model_name":{"default":"large-v3","title":"Model Name","type":"string"},"device":{"default":"auto","title":"Device","type":"string"},"compute_type":{"default":"auto","title":"Compute Type","type":"string"},"language":{"default":null,"title":"Language","type":"string"},"output_format":{"default":"vtt","title":"Output Format","type":"string"},"beam_size":{"default":5,"title":"Beam Size","type":"integer"},"temperature":{"default":0.0,"title":"Temperature","type":"number"},"initial_prompt":{"default":null,"title":"Initial Prompt","type":"string"},"output_directory":{"default":null,"title":"Output Directory","type":"string"}},"required":["audio_path"],"title":"transcribeArguments","type":"object"}},{"name":"batch_transcribe_audio","description":"\n Batch transcribe audio files in a folder\n\n Args:\n audio_folder: Path to the folder containing audio files\n output_folder: Output folder path, defaults to a 'transcript' subfolder in audio_folder\n model_name: Model name (tiny, base, small, medium, large-v1, large-v2, large-v3)\n device: Execution device (cpu, cuda, auto)\n compute_type: Computation type (float16, int8, auto)\n language: Language code (such as zh, en, ja, etc., auto-detect by default)\n output_format: Output format (vtt, srt, json or txt)\n beam_size: Beam search size, larger values may improve accuracy but reduce speed\n temperature: Sampling temperature, 0 means greedy decoding\n initial_prompt: Initial prompt text, can help the model better understand context\n parallel_files: Number of files to process in parallel (only effective in CPU mode)\n\n Returns:\n str: Batch processing summary, including processing time and success rate\n ","inputSchema":{"properties":{"audio_folder":{"title":"Audio Folder","type":"string"},"output_folder":{"default":null,"title":"Output Folder","type":"string"},"model_name":{"default":"large-v3","title":"Model Name","type":"string"},"device":{"default":"auto","title":"Device","type":"string"},"compute_type":{"default":"auto","title":"Compute Type","type":"string"},"language":{"default":null,"title":"Language","type":"string"},"output_format":{"default":"vtt","title":"Output Format","type":"string"},"beam_size":{"default":5,"title":"Beam Size","type":"integer"},"temperature":{"default":0.0,"title":"Temperature","type":"number"},"initial_prompt":{"default":null,"title":"Initial 
Prompt","type":"string"},"parallel_files":{"default":1,"title":"Parallel Files","type":"integer"}},"required":["audio_folder"],"title":"batch_transcribe_audioArguments","type":"object"}}]}}
INFO:mcp.server.lowlevel.server:Processing request of type CallToolRequest
INFO:model_manager:GPU test passed: NVIDIA GeForce RTX 3060 (12.5GB)
INFO:model_manager:Loading Whisper model: large-v3 device: cuda compute type: float16

requirements.txt

@@ -8,6 +8,11 @@ torchaudio #==2.6.0+cu126
# pip install mcp[cli]>=1.2.0
mcp[cli]
# REST API dependencies
fastapi>=0.115.0
uvicorn[standard]>=0.32.0
python-multipart>=0.0.9
# PyTorch Installation Guide:
# Please install the appropriate version of PyTorch based on your CUDA version:
#

run_api_server.sh (new executable file, +42)

@@ -0,0 +1,42 @@
#!/bin/bash
set -e
datetime_prefix() {
    date "+[%Y-%m-%d %H:%M:%S]"
}
# Set environment variables
export CUDA_VISIBLE_DEVICES=1
export WHISPER_MODEL_DIR="/home/uad/agents/tools/mcp-transcriptor/data/models"
export TRANSCRIPTION_OUTPUT_DIR="/media/raid/agents/tools/mcp-transcriptor/outputs"
export TRANSCRIPTION_BATCH_OUTPUT_DIR="/media/raid/agents/tools/mcp-transcriptor/outputs/batch"
export TRANSCRIPTION_MODEL="large-v3"
export TRANSCRIPTION_DEVICE="cuda"
export TRANSCRIPTION_COMPUTE_TYPE="float16"
export TRANSCRIPTION_OUTPUT_FORMAT="txt"
export TRANSCRIPTION_BEAM_SIZE="5"
export TRANSCRIPTION_TEMPERATURE="0.0"
export TRANSCRIPTION_USE_TIMESTAMP="false"
export TRANSCRIPTION_FILENAME_PREFIX=""
# API server configuration
export API_HOST="0.0.0.0"
export API_PORT="8000"
# Log start of the script
echo "$(datetime_prefix) Starting Whisper REST API server..."
echo "$(datetime_prefix) Model directory: $WHISPER_MODEL_DIR"
echo "$(datetime_prefix) API server: http://$API_HOST:$API_PORT"
# Optional: Verify required directories exist
if [ ! -d "$WHISPER_MODEL_DIR" ]; then
echo "$(datetime_prefix) Warning: Whisper model directory does not exist: $WHISPER_MODEL_DIR"
echo "$(datetime_prefix) Models will be downloaded to default cache directory"
fi
# Ensure output directories exist
mkdir -p "$TRANSCRIPTION_OUTPUT_DIR"
mkdir -p "$TRANSCRIPTION_BATCH_OUTPUT_DIR"
# Run the API server
/home/uad/agents/tools/mcp-transcriptor/venv/bin/python -u /home/uad/agents/tools/mcp-transcriptor/api_server.py 2>&1 | tee /home/uad/agents/tools/mcp-transcriptor/api.logs

run_mcp_server.sh

@@ -34,4 +34,6 @@ if [ ! -d "$WHISPER_MODEL_DIR" ]; then
fi
# Run the Python script with the defined environment variables
/home/uad/agents/tools/mcp-transcriptor/venv/bin/python /home/uad/agents/tools/mcp-transcriptor/whisper_server.py 2>&1 | tee /home/uad/agents/tools/mcp-transcriptor/mcp.logs
#/home/uad/agents/tools/mcp-transcriptor/venv/bin/python /home/uad/agents/tools/mcp-transcriptor/whisper_server.py 2>&1 | tee /home/uad/agents/tools/mcp-transcriptor/mcp.logs
/home/uad/agents/tools/mcp-transcriptor/venv/bin/python -u /home/uad/agents/tools/mcp-transcriptor/whisper_server.py 2>&1 | tee /home/uad/agents/tools/mcp-transcriptor/mcp.logs

transcriber.py

@@ -98,7 +98,6 @@ def transcribe_audio(
    # Set transcription parameters
    options = {
        "verbose": True,
        "language": language,
        "vad_filter": True,
        "vad_parameters": {"min_silence_duration_ms": 500},
@@ -181,12 +180,12 @@ def transcribe_audio(
    # Add suffix if specified
    if FILENAME_SUFFIX:
        filename_parts.append(FILENAME_SUFFIX)
    # Add timestamp if enabled
    if USE_TIMESTAMP:
        timestamp = time.strftime("%Y%m%d%H%M%S")
        filename_parts.append(timestamp)
    # Join parts and add extension
    base_name = "_".join(filename_parts)
    output_filename = f"{base_name}.{output_format_lower}"
@@ -358,4 +357,4 @@ def report_progress(current: int, total: int, elapsed_time: float) -> str:
    eta = (elapsed_time / current) * (total - current) if current > 0 else 0
    return (f"Progress: {current}/{total} ({progress:.1f}%)" +
            f" | Time used: {format_time(elapsed_time)}" +
            f" | Estimated remaining: {format_time(eta)}")
            f" | Estimated remaining: {format_time(eta)}")