Compare commits

2 Commits: 56ccc0e1d7 ... 7c9a8d8378

| Author | SHA1 | Date |
|---|---|---|
| | 7c9a8d8378 | |
| | 2cc9f298a5 | |

CLAUDE.md (new file, 265 lines)

@@ -0,0 +1,265 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Overview

This is a Whisper-based speech recognition service that provides high-performance audio transcription using Faster Whisper. The service can run as either:

1. **MCP Server** - For integration with Claude Desktop and other MCP clients
2. **REST API Server** - For HTTP-based integrations

Both servers share the same core transcription logic and can run independently or simultaneously on different ports.

## Development Commands

### Environment Setup

```bash
# Create and activate virtual environment
python3.12 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Install PyTorch with CUDA 12.6 support
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu126

# For CUDA 12.1
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121

# For CPU-only
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cpu
```

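After installing one of the PyTorch builds above, a quick check like the following can confirm that the GPU build is actually usable before starting either server (a minimal sketch, not a file in this repository; it relies only on the standard `torch.cuda` API):

```python
# Hypothetical sanity check, not part of this repository.
import torch

# True only when a CUDA-enabled PyTorch build and a visible GPU driver are present
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    # Name and memory of the first visible device (respects CUDA_VISIBLE_DEVICES)
    props = torch.cuda.get_device_properties(0)
    print(f"Device: {props.name}, {props.total_memory / 1024**3:.1f} GB")
```
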
### Running the Servers

#### MCP Server (for Claude Desktop)

```bash
# Using the startup script (recommended - sets all env vars)
./run_mcp_server.sh

# Direct Python execution
python whisper_server.py

# Using MCP CLI for development testing
mcp dev whisper_server.py

# Run server with MCP CLI
mcp run whisper_server.py
```

#### REST API Server (for HTTP clients)

```bash
# Using the startup script (recommended - sets all env vars)
./run_api_server.sh

# Direct Python execution with uvicorn
python api_server.py

# Or using uvicorn directly
uvicorn api_server:app --host 0.0.0.0 --port 8000

# Development mode with auto-reload
uvicorn api_server:app --reload --host 0.0.0.0 --port 8000
```

#### Running Both Simultaneously

```bash
# Terminal 1: Start MCP server
./run_mcp_server.sh

# Terminal 2: Start REST API server
./run_api_server.sh
```

### Docker

```bash
# Build Docker image
docker build -t whisper-mcp-server .

# Run with GPU support
docker run --gpus all -v /path/to/models:/models -v /path/to/outputs:/outputs whisper-mcp-server
```

## Architecture

### Core Components

1. **whisper_server.py** - MCP server entry point
   - Uses FastMCP framework to expose three MCP tools
   - Delegates to transcriber.py for actual processing
   - Server initialization at line 19

2. **api_server.py** - REST API server entry point
   - Uses FastAPI framework to expose HTTP endpoints
   - Provides six REST endpoints: `/`, `/health`, `/models`, `/transcribe`, `/batch-transcribe`, `/upload-transcribe`
   - Shares the same core transcription logic with the MCP server
   - Includes file upload support via multipart/form-data

3. **transcriber.py** - Core transcription logic (shared by both servers)
   - `transcribe_audio()` (line 38) - Single file transcription with environment variable support
   - `batch_transcribe()` (line 208) - Batch processing with progress reporting
   - All parameters support environment variable defaults
   - Delegates output formatting to formatters.py

4. **model_manager.py** - Whisper model lifecycle management
   - `get_whisper_model()` (line 44) - Returns cached model instances or loads new ones
   - `test_gpu_driver()` (line 20) - GPU validation before model loading
   - Global `model_instances` dict caches loaded models to prevent reloading
   - Automatically determines batch size based on available GPU memory (lines 113-134)

5. **audio_processor.py** - Audio file validation and preprocessing
   - `validate_audio_file()` (line 15) - Checks file existence, format, and size
   - `process_audio()` (line 50) - Decodes audio using faster_whisper's decode_audio

6. **formatters.py** - Output format conversion
   - `format_vtt()`, `format_srt()`, `format_txt()`, `format_json()` - Convert segments to various formats
   - All formatters accept segment lists from Whisper output (see the sketch after this list)

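As a rough illustration of the formatter contract in item 6: a minimal sketch of an SRT-style formatter, assuming each segment exposes `start`, `end`, and `text` attributes as faster-whisper segments do. The actual `format_srt()` in formatters.py may differ.

```python
# Illustrative sketch only; the real format_srt() in formatters.py may differ.
def format_srt(segments) -> str:
    def stamp(seconds: float) -> str:
        # SRT timestamps look like 00:01:02,345
        ms = int(round(seconds * 1000))
        h, rem = divmod(ms, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(f"{i}\n{stamp(seg.start)} --> {stamp(seg.end)}\n{seg.text.strip()}\n")
    return "\n".join(blocks)
```
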
### Key Architecture Patterns

- **Dual Server Architecture**: Both MCP and REST API servers import and use the same core modules (transcriber.py, model_manager.py, audio_processor.py, formatters.py), ensuring consistent behavior
- **Model Caching**: Models are cached in the `model_instances` dictionary with key format `{model_name}_{device}_{compute_type}` (model_manager.py:84). This cache is shared if both servers run in the same process (see the sketch after this list)
- **Batch Processing**: CUDA devices automatically use BatchedInferencePipeline for performance (model_manager.py:109-134)
- **Environment Variable Configuration**: All transcription parameters support env var defaults (transcriber.py:19-36)
- **Device Auto-Detection**: `device="auto"` automatically selects CUDA if available, otherwise CPU (model_manager.py:64-66)

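A condensed sketch of how the model-caching and device auto-detection patterns above fit together. This is illustrative only: apart from the cited cache-key format and the `model_instances` name, the details (such as the compute-type fallback) are assumptions rather than the actual model_manager.py code.

```python
# Illustrative sketch; the real get_whisper_model() in model_manager.py differs.
import torch
from faster_whisper import WhisperModel

model_instances = {}  # cache shared for the lifetime of the process

def get_whisper_model(model_name: str, device: str, compute_type: str):
    # Device auto-detection: prefer CUDA when a GPU is visible
    if device == "auto":
        device = "cuda" if torch.cuda.is_available() else "cpu"
    if compute_type == "auto":
        compute_type = "float16" if device == "cuda" else "int8"

    # Cache key mirrors the documented {model_name}_{device}_{compute_type} format
    key = f"{model_name}_{device}_{compute_type}"
    if key not in model_instances:
        model_instances[key] = WhisperModel(model_name, device=device, compute_type=compute_type)
    return model_instances[key]
```
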
## Environment Variables

All configuration can be set via environment variables in run_mcp_server.sh and run_api_server.sh (see the sketch after these lists):

**API Server Specific:**

- `API_HOST` - API server host (default: 0.0.0.0)
- `API_PORT` - API server port (default: 8000)

**Transcription Configuration (shared by both servers):**

- `CUDA_VISIBLE_DEVICES` - GPU device selection
- `WHISPER_MODEL_DIR` - Model storage location (defaults to None for HuggingFace cache)
- `TRANSCRIPTION_OUTPUT_DIR` - Default output directory for single transcriptions
- `TRANSCRIPTION_BATCH_OUTPUT_DIR` - Default output directory for batch processing
- `TRANSCRIPTION_MODEL` - Model size (tiny, base, small, medium, large-v1, large-v2, large-v3)
- `TRANSCRIPTION_DEVICE` - Execution device (cpu, cuda, auto)
- `TRANSCRIPTION_COMPUTE_TYPE` - Computation type (float16, int8, auto)
- `TRANSCRIPTION_OUTPUT_FORMAT` - Output format (vtt, srt, txt, json)
- `TRANSCRIPTION_BEAM_SIZE` - Beam search size (default: 5)
- `TRANSCRIPTION_TEMPERATURE` - Sampling temperature (default: 0.0)
- `TRANSCRIPTION_USE_TIMESTAMP` - Add timestamp to filenames (true/false)
- `TRANSCRIPTION_FILENAME_PREFIX` - Prefix for output filenames
- `TRANSCRIPTION_FILENAME_SUFFIX` - Suffix for output filenames
- `TRANSCRIPTION_LANGUAGE` - Language code (zh, en, ja, etc.; auto-detect if not set)

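The environment-variable-default pattern referenced above (transcriber.py:19-36) roughly amounts to module-level lookups like the following. This is a sketch; the exact default values and parsing in transcriber.py may differ.

```python
# Illustrative sketch of env-var defaults; transcriber.py:19-36 is the authoritative source.
import os

DEFAULT_MODEL = os.getenv("TRANSCRIPTION_MODEL", "large-v3")
DEFAULT_DEVICE = os.getenv("TRANSCRIPTION_DEVICE", "auto")
DEFAULT_OUTPUT_FORMAT = os.getenv("TRANSCRIPTION_OUTPUT_FORMAT", "txt")
DEFAULT_BEAM_SIZE = int(os.getenv("TRANSCRIPTION_BEAM_SIZE", "5"))
DEFAULT_TEMPERATURE = float(os.getenv("TRANSCRIPTION_TEMPERATURE", "0.0"))
# Boolean flags arrive as strings, so they need explicit normalization
USE_TIMESTAMP = os.getenv("TRANSCRIPTION_USE_TIMESTAMP", "false").lower() == "true"
```
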
## Supported Configurations

- **Models**: tiny, base, small, medium, large-v1, large-v2, large-v3
- **Audio formats**: .mp3, .wav, .m4a, .flac, .ogg, .aac
- **Output formats**: vtt, srt, json, txt
- **Languages**: zh (Chinese), en (English), ja (Japanese), ko (Korean), de (German), fr (French), es (Spanish), ru (Russian), it (Italian), pt (Portuguese), nl (Dutch), ar (Arabic), hi (Hindi), tr (Turkish), vi (Vietnamese), th (Thai), id (Indonesian)

## REST API Endpoints

The REST API server provides the following HTTP endpoints:

### GET /

Returns API information and available endpoints.

### GET /health

Health check endpoint. Returns `{"status": "healthy", "service": "whisper-transcription"}`.

### GET /models

Returns available Whisper models, devices, languages, and system information (GPU details if CUDA is available).

### POST /transcribe

Transcribe a single audio file that already exists on the server.

**Request Body:**
```json
{
  "audio_path": "/path/to/audio.mp3",
  "model_name": "large-v3",
  "device": "auto",
  "compute_type": "auto",
  "language": "en",
  "output_format": "txt",
  "beam_size": 5,
  "temperature": 0.0,
  "initial_prompt": null,
  "output_directory": null
}
```

**Response:**
```json
{
  "success": true,
  "message": "Transcription successful, results saved to: /path/to/output.txt",
  "output_path": "/path/to/output.txt"
}
```

### POST /batch-transcribe

Batch transcribe all audio files in a folder.

**Request Body:**
```json
{
  "audio_folder": "/path/to/audio/folder",
  "output_folder": "/path/to/output",
  "model_name": "large-v3",
  "output_format": "txt",
  ...
}
```

**Response:**
```json
{
  "success": true,
  "summary": "Batch processing completed, total transcription time: 00:05:23 | Success: 10/10 | Failed: 0/10"
}
```

### POST /upload-transcribe

Upload an audio file and transcribe it immediately. Returns the transcription file as a download.

**Form Data:**

- `file`: Audio file (multipart/form-data)
- `model_name`: Model name (default: "large-v3")
- `device`: Device (default: "auto")
- `output_format`: Output format (default: "txt")
- ... (other transcription parameters)

**Response:** Returns the transcription file for download.

### API Usage Examples

```bash
# Get model information
curl http://localhost:8000/models

# Transcribe existing file
curl -X POST http://localhost:8000/transcribe \
  -H "Content-Type: application/json" \
  -d '{"audio_path": "/path/to/audio.mp3", "output_format": "txt"}'

# Upload and transcribe
curl -X POST http://localhost:8000/upload-transcribe \
  -F "file=@audio.mp3" \
  -F "output_format=txt" \
  -F "model_name=large-v3"
```

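The same calls from Python, for clients that prefer not to shell out to curl. This is a client-side sketch that assumes the third-party `requests` package is installed and a server listening on localhost:8000; it uses only the endpoints and fields documented above.

```python
# Illustrative client sketch; assumes `pip install requests` and a server on localhost:8000.
import requests

BASE = "http://localhost:8000"

# Transcribe a file that already exists on the server
resp = requests.post(f"{BASE}/transcribe", json={
    "audio_path": "/path/to/audio.mp3",
    "output_format": "txt",
})
print(resp.json())  # {"success": ..., "message": ..., "output_path": ...}

# Upload a local file and save the returned transcription
with open("audio.mp3", "rb") as f:
    resp = requests.post(
        f"{BASE}/upload-transcribe",
        files={"file": f},
        data={"output_format": "txt", "model_name": "large-v3"},
    )
resp.raise_for_status()
with open("audio.txt", "wb") as out:
    out.write(resp.content)
```
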
## Important Implementation Details

- GPU memory is checked before loading models (model_manager.py:115-127)
- Batch size dynamically adjusts: 32 (>16GB), 16 (>12GB), 8 (>8GB), 4 (>4GB), 2 (otherwise) (see the sketch below)
- VAD (Voice Activity Detection) is enabled by default for better long-audio accuracy (transcriber.py:101)
- Word timestamps are enabled by default (transcriber.py:106)
- Model loading includes a GPU driver test to fail fast if the GPU is unavailable (model_manager.py:92)
- Files over 1GB generate warnings about processing time (audio_processor.py:42)
- Default output format is "txt" for the REST API and is configured via environment variables for the MCP server

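The batch-size tiering above maps onto a simple threshold ladder. The following sketch only restates the documented thresholds and is not the actual model_manager.py implementation (which inspects GPU memory around lines 113-134).

```python
# Sketch of the documented batch-size tiers; model_manager.py:113-134 is authoritative.
import torch

def pick_batch_size(device_index: int = 0) -> int:
    if not torch.cuda.is_available():
        return 2
    total_gb = torch.cuda.get_device_properties(device_index).total_memory / 1024**3
    if total_gb > 16:
        return 32
    if total_gb > 12:
        return 16
    if total_gb > 8:
        return 8
    if total_gb > 4:
        return 4
    return 2
```
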
api_server.py (new file, 286 lines)

@@ -0,0 +1,286 @@
#!/usr/bin/env python3
"""
FastAPI REST API Server for Whisper Transcription
Provides HTTP REST endpoints for audio transcription
"""

import os
import logging
from typing import Optional
from fastapi import FastAPI, HTTPException, UploadFile, File, Form
from fastapi.responses import JSONResponse, FileResponse
from pydantic import BaseModel, Field
import json

from model_manager import get_model_info
from transcriber import transcribe_audio, batch_transcribe

# Logging configuration
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Create FastAPI app
app = FastAPI(
    title="Whisper Speech Recognition API",
    description="High-performance audio transcription API based on Faster Whisper",
    version="0.1.1"
)


# Request/Response Models
class TranscribeRequest(BaseModel):
    audio_path: str = Field(..., description="Path to the audio file on the server")
    model_name: str = Field("large-v3", description="Whisper model name")
    device: str = Field("auto", description="Execution device (cpu, cuda, auto)")
    compute_type: str = Field("auto", description="Computation type (float16, int8, auto)")
    language: Optional[str] = Field(None, description="Language code (zh, en, ja, etc.)")
    output_format: str = Field("txt", description="Output format (vtt, srt, json, txt)")
    beam_size: int = Field(5, description="Beam search size")
    temperature: float = Field(0.0, description="Sampling temperature")
    initial_prompt: Optional[str] = Field(None, description="Initial prompt text")
    output_directory: Optional[str] = Field(None, description="Output directory path")


class BatchTranscribeRequest(BaseModel):
    audio_folder: str = Field(..., description="Path to folder containing audio files")
    output_folder: Optional[str] = Field(None, description="Output folder path")
    model_name: str = Field("large-v3", description="Whisper model name")
    device: str = Field("auto", description="Execution device (cpu, cuda, auto)")
    compute_type: str = Field("auto", description="Computation type (float16, int8, auto)")
    language: Optional[str] = Field(None, description="Language code (zh, en, ja, etc.)")
    output_format: str = Field("txt", description="Output format (vtt, srt, json, txt)")
    beam_size: int = Field(5, description="Beam search size")
    temperature: float = Field(0.0, description="Sampling temperature")
    initial_prompt: Optional[str] = Field(None, description="Initial prompt text")
    parallel_files: int = Field(1, description="Number of files to process in parallel")


class TranscribeResponse(BaseModel):
    success: bool
    message: str
    output_path: Optional[str] = None


class BatchTranscribeResponse(BaseModel):
    success: bool
    summary: str


# API Endpoints

@app.get("/")
async def root():
    """Root endpoint with API information"""
    return {
        "name": "Whisper Speech Recognition API",
        "version": "0.1.1",
        "endpoints": {
            "GET /health": "Health check",
            "GET /models": "Get available models information",
            "POST /transcribe": "Transcribe a single audio file",
            "POST /batch-transcribe": "Batch transcribe audio files",
            "POST /upload-transcribe": "Upload and transcribe audio file"
        }
    }


@app.get("/health")
async def health_check():
    """Health check endpoint"""
    return {"status": "healthy", "service": "whisper-transcription"}


@app.get("/models")
async def get_models():
    """Get available Whisper models and configuration information"""
    try:
        model_info = get_model_info()
        return JSONResponse(content=json.loads(model_info))
    except Exception as e:
        logger.error(f"Failed to get model info: {str(e)}")
        raise HTTPException(status_code=500, detail=f"Failed to get model info: {str(e)}")


@app.post("/transcribe", response_model=TranscribeResponse)
async def transcribe(request: TranscribeRequest):
    """
    Transcribe a single audio file

    The audio file must already exist on the server at the specified path.
    """
    try:
        logger.info(f"Received transcription request for: {request.audio_path}")

        result = transcribe_audio(
            audio_path=request.audio_path,
            model_name=request.model_name,
            device=request.device,
            compute_type=request.compute_type,
            language=request.language,
            output_format=request.output_format,
            beam_size=request.beam_size,
            temperature=request.temperature,
            initial_prompt=request.initial_prompt,
            output_directory=request.output_directory
        )

        # Parse result to determine success
        if result.startswith("Error") or "failed" in result.lower():
            return TranscribeResponse(
                success=False,
                message=result,
                output_path=None
            )

        # Extract output path from success message
        output_path = None
        if "saved to:" in result:
            output_path = result.split("saved to:")[1].strip()

        return TranscribeResponse(
            success=True,
            message=result,
            output_path=output_path
        )

    except Exception as e:
        logger.error(f"Transcription failed: {str(e)}")
        raise HTTPException(status_code=500, detail=f"Transcription failed: {str(e)}")


@app.post("/batch-transcribe", response_model=BatchTranscribeResponse)
async def batch_transcribe_endpoint(request: BatchTranscribeRequest):
    """
    Batch transcribe all audio files in a folder

    Processes all supported audio files in the specified folder.
    """
    try:
        logger.info(f"Received batch transcription request for: {request.audio_folder}")

        result = batch_transcribe(
            audio_folder=request.audio_folder,
            output_folder=request.output_folder,
            model_name=request.model_name,
            device=request.device,
            compute_type=request.compute_type,
            language=request.language,
            output_format=request.output_format,
            beam_size=request.beam_size,
            temperature=request.temperature,
            initial_prompt=request.initial_prompt,
            parallel_files=request.parallel_files
        )

        # Check if there were errors
        success = not result.startswith("Error")

        return BatchTranscribeResponse(
            success=success,
            summary=result
        )

    except Exception as e:
        logger.error(f"Batch transcription failed: {str(e)}")
        raise HTTPException(status_code=500, detail=f"Batch transcription failed: {str(e)}")


@app.post("/upload-transcribe")
async def upload_and_transcribe(
    file: UploadFile = File(...),
    model_name: str = Form("large-v3"),
    device: str = Form("auto"),
    compute_type: str = Form("auto"),
    language: Optional[str] = Form(None),
    output_format: str = Form("txt"),
    beam_size: int = Form(5),
    temperature: float = Form(0.0),
    initial_prompt: Optional[str] = Form(None)
):
    """
    Upload an audio file and transcribe it

    This endpoint accepts file uploads via multipart/form-data.
    """
    import tempfile
    import shutil

    try:
        # Create temporary directory for upload
        temp_dir = tempfile.mkdtemp(prefix="whisper_upload_")

        # Save uploaded file
        file_ext = os.path.splitext(file.filename)[1]
        temp_audio_path = os.path.join(temp_dir, f"upload{file_ext}")

        with open(temp_audio_path, "wb") as buffer:
            shutil.copyfileobj(file.file, buffer)

        logger.info(f"Uploaded file saved to: {temp_audio_path}")

        # Transcribe the uploaded file
        result = transcribe_audio(
            audio_path=temp_audio_path,
            model_name=model_name,
            device=device,
            compute_type=compute_type,
            language=language,
            output_format=output_format,
            beam_size=beam_size,
            temperature=temperature,
            initial_prompt=initial_prompt,
            output_directory=temp_dir
        )

        # Parse result
        if result.startswith("Error") or "failed" in result.lower():
            # Clean up temp files
            shutil.rmtree(temp_dir, ignore_errors=True)
            raise HTTPException(status_code=500, detail=result)

        # Extract output path
        output_path = None
        if "saved to:" in result:
            output_path = result.split("saved to:")[1].strip()

        # Return the transcription file
        if output_path and os.path.exists(output_path):
            return FileResponse(
                output_path,
                media_type="text/plain",
                filename=os.path.basename(output_path),
                background=None  # Don't delete yet, we'll clean up after
            )
        else:
            # Clean up temp files
            shutil.rmtree(temp_dir, ignore_errors=True)
            return JSONResponse(content={
                "success": True,
                "message": result
            })

    except Exception as e:
        logger.error(f"Upload and transcribe failed: {str(e)}")
        # Clean up temp files on error
        if 'temp_dir' in locals():
            shutil.rmtree(temp_dir, ignore_errors=True)
        raise HTTPException(status_code=500, detail=f"Upload and transcribe failed: {str(e)}")
    finally:
        await file.close()


if __name__ == "__main__":
    import uvicorn

    # Get configuration from environment variables
    host = os.getenv("API_HOST", "0.0.0.0")
    port = int(os.getenv("API_PORT", "8000"))

    logger.info(f"Starting Whisper REST API server on {host}:{port}")

    uvicorn.run(
        app,
        host=host,
        port=port,
        log_level="info"
    )

mcp.logs (6 lines removed)

@@ -1,6 +0,0 @@
{"jsonrpc":"2.0","id":1,"result":{"protocolVersion":"2025-03-26","capabilities":{"experimental":{},"prompts":{"listChanged":false},"resources":{"subscribe":false,"listChanged":false},"tools":{"listChanged":false}},"serverInfo":{"name":"fast-whisper-mcp-server","version":"1.9.4"}}}
INFO:mcp.server.lowlevel.server:Processing request of type ListToolsRequest
{"jsonrpc":"2.0","id":2,"result":{"tools":[{"name":"get_model_info_api","description":"\n Get available Whisper model information\n ","inputSchema":{"properties":{},"title":"get_model_info_apiArguments","type":"object"}},{"name":"transcribe","description":"\n Transcribe audio files using Faster Whisper\n\n Args:\n audio_path: Path to the audio file\n model_name: Model name (tiny, base, small, medium, large-v1, large-v2, large-v3)\n device: Execution device (cpu, cuda, auto)\n compute_type: Computation type (float16, int8, auto)\n language: Language code (such as zh, en, ja, etc., auto-detect by default)\n output_format: Output format (vtt, srt, json or txt)\n beam_size: Beam search size, larger values may improve accuracy but reduce speed\n temperature: Sampling temperature, greedy decoding\n initial_prompt: Initial prompt text, can help the model better understand context\n output_directory: Output directory path, defaults to the audio file's directory\n\n Returns:\n str: Transcription result, in VTT subtitle or JSON format\n ","inputSchema":{"properties":{"audio_path":{"title":"Audio Path","type":"string"},"model_name":{"default":"large-v3","title":"Model Name","type":"string"},"device":{"default":"auto","title":"Device","type":"string"},"compute_type":{"default":"auto","title":"Compute Type","type":"string"},"language":{"default":null,"title":"Language","type":"string"},"output_format":{"default":"vtt","title":"Output Format","type":"string"},"beam_size":{"default":5,"title":"Beam Size","type":"integer"},"temperature":{"default":0.0,"title":"Temperature","type":"number"},"initial_prompt":{"default":null,"title":"Initial Prompt","type":"string"},"output_directory":{"default":null,"title":"Output Directory","type":"string"}},"required":["audio_path"],"title":"transcribeArguments","type":"object"}},{"name":"batch_transcribe_audio","description":"\n Batch transcribe audio files in a folder\n\n Args:\n audio_folder: Path to the folder containing audio files\n output_folder: Output folder path, defaults to a 'transcript' subfolder in audio_folder\n model_name: Model name (tiny, base, small, medium, large-v1, large-v2, large-v3)\n device: Execution device (cpu, cuda, auto)\n compute_type: Computation type (float16, int8, auto)\n language: Language code (such as zh, en, ja, etc., auto-detect by default)\n output_format: Output format (vtt, srt, json or txt)\n beam_size: Beam search size, larger values may improve accuracy but reduce speed\n temperature: Sampling temperature, 0 means greedy decoding\n initial_prompt: Initial prompt text, can help the model better understand context\n parallel_files: Number of files to process in parallel (only effective in CPU mode)\n\n Returns:\n str: Batch processing summary, including processing time and success rate\n ","inputSchema":{"properties":{"audio_folder":{"title":"Audio Folder","type":"string"},"output_folder":{"default":null,"title":"Output Folder","type":"string"},"model_name":{"default":"large-v3","title":"Model Name","type":"string"},"device":{"default":"auto","title":"Device","type":"string"},"compute_type":{"default":"auto","title":"Compute Type","type":"string"},"language":{"default":null,"title":"Language","type":"string"},"output_format":{"default":"vtt","title":"Output Format","type":"string"},"beam_size":{"default":5,"title":"Beam Size","type":"integer"},"temperature":{"default":0.0,"title":"Temperature","type":"number"},"initial_prompt":{"default":null,"title":"Initial 
Prompt","type":"string"},"parallel_files":{"default":1,"title":"Parallel Files","type":"integer"}},"required":["audio_folder"],"title":"batch_transcribe_audioArguments","type":"object"}}]}}
INFO:mcp.server.lowlevel.server:Processing request of type CallToolRequest
INFO:model_manager:GPU test passed: NVIDIA GeForce RTX 3060 (12.5GB)
INFO:model_manager:Loading Whisper model: large-v3 device: cuda compute type: float16
requirements.txt

@@ -8,6 +8,11 @@ torchaudio #==2.6.0+cu126
# pip install mcp[cli]>=1.2.0
mcp[cli]

# REST API dependencies
fastapi>=0.115.0
uvicorn[standard]>=0.32.0
python-multipart>=0.0.9

# PyTorch Installation Guide:
# Please install the appropriate version of PyTorch based on your CUDA version:
#

run_api_server.sh (new executable file, 42 lines)

@@ -0,0 +1,42 @@
#!/bin/bash
set -e

datetime_prefix() {
    date "+[%Y-%m-%d %H:%M:%S]"
}

# Set environment variables
export CUDA_VISIBLE_DEVICES=1
export WHISPER_MODEL_DIR="/home/uad/agents/tools/mcp-transcriptor/data/models"
export TRANSCRIPTION_OUTPUT_DIR="/media/raid/agents/tools/mcp-transcriptor/outputs"
export TRANSCRIPTION_BATCH_OUTPUT_DIR="/media/raid/agents/tools/mcp-transcriptor/outputs/batch"
export TRANSCRIPTION_MODEL="large-v3"
export TRANSCRIPTION_DEVICE="cuda"
export TRANSCRIPTION_COMPUTE_TYPE="float16"
export TRANSCRIPTION_OUTPUT_FORMAT="txt"
export TRANSCRIPTION_BEAM_SIZE="5"
export TRANSCRIPTION_TEMPERATURE="0.0"
export TRANSCRIPTION_USE_TIMESTAMP="false"
export TRANSCRIPTION_FILENAME_PREFIX=""

# API server configuration
export API_HOST="0.0.0.0"
export API_PORT="8000"

# Log start of the script
echo "$(datetime_prefix) Starting Whisper REST API server..."
echo "$(datetime_prefix) Model directory: $WHISPER_MODEL_DIR"
echo "$(datetime_prefix) API server: http://$API_HOST:$API_PORT"

# Optional: Verify required directories exist
if [ ! -d "$WHISPER_MODEL_DIR" ]; then
    echo "$(datetime_prefix) Warning: Whisper model directory does not exist: $WHISPER_MODEL_DIR"
    echo "$(datetime_prefix) Models will be downloaded to default cache directory"
fi

# Ensure output directories exist
mkdir -p "$TRANSCRIPTION_OUTPUT_DIR"
mkdir -p "$TRANSCRIPTION_BATCH_OUTPUT_DIR"

# Run the API server
/home/uad/agents/tools/mcp-transcriptor/venv/bin/python -u /home/uad/agents/tools/mcp-transcriptor/api_server.py 2>&1 | tee /home/uad/agents/tools/mcp-transcriptor/api.logs

run_mcp_server.sh

@@ -34,4 +34,6 @@ if [ ! -d "$WHISPER_MODEL_DIR" ]; then
fi

# Run the Python script with the defined environment variables
/home/uad/agents/tools/mcp-transcriptor/venv/bin/python /home/uad/agents/tools/mcp-transcriptor/whisper_server.py 2>&1 | tee /home/uad/agents/tools/mcp-transcriptor/mcp.logs
#/home/uad/agents/tools/mcp-transcriptor/venv/bin/python /home/uad/agents/tools/mcp-transcriptor/whisper_server.py 2>&1 | tee /home/uad/agents/tools/mcp-transcriptor/mcp.logs
/home/uad/agents/tools/mcp-transcriptor/venv/bin/python -u /home/uad/agents/tools/mcp-transcriptor/whisper_server.py 2>&1 | tee /home/uad/agents/tools/mcp-transcriptor/mcp.logs

transcriber.py

@@ -98,7 +98,6 @@ def transcribe_audio(

    # Set transcription parameters
    options = {
        "verbose": True,
        "language": language,
        "vad_filter": True,
        "vad_parameters": {"min_silence_duration_ms": 500},

@@ -181,12 +180,12 @@ def transcribe_audio(
    # Add suffix if specified
    if FILENAME_SUFFIX:
        filename_parts.append(FILENAME_SUFFIX)

    # Add timestamp if enabled
    if USE_TIMESTAMP:
        timestamp = time.strftime("%Y%m%d%H%M%S")
        filename_parts.append(timestamp)

    # Join parts and add extension
    base_name = "_".join(filename_parts)
    output_filename = f"{base_name}.{output_format_lower}"

@@ -358,4 +357,4 @@ def report_progress(current: int, total: int, elapsed_time: float) -> str:
    eta = (elapsed_time / current) * (total - current) if current > 0 else 0
    return (f"Progress: {current}/{total} ({progress:.1f}%)" +
            f" | Time used: {format_time(elapsed_time)}" +
            f" | Estimated remaining: {format_time(eta)}")
            f" | Estimated remaining: {format_time(eta)}")