Refactor codebase structure with organized src/ directory

- Reorganize source code into src/ directory with logical subdirectories:
  - src/servers/: MCP and REST API server implementations
  - src/core/: Core business logic (transcriber, model_manager)
  - src/utils/: Utility modules (audio_processor, formatters)

- Update all import statements to use proper module paths
- Configure PYTHONPATH in startup scripts and Dockerfile
- Update documentation with new structure and paths
- Update pyproject.toml with package configuration
- Keep DevOps files (scripts, Dockerfile, configs) at root level

All functionality validated and working correctly.
Author: Alihan
Date: 2025-10-07 12:28:03 +03:00
Parent: 7c9a8d8378
Commit: e7a457e602
15 changed files with 69 additions and 34 deletions


@@ -88,46 +88,67 @@ docker run --gpus all -v /path/to/models:/models -v /path/to/outputs:/outputs wh
## Architecture
### Directory Structure
```
.
├── src/ # Source code directory
│ ├── servers/ # Server implementations
│ │ ├── whisper_server.py # MCP server entry point
│ │ └── api_server.py # REST API server entry point
│ ├── core/ # Core business logic
│ │ ├── transcriber.py # Transcription logic
│ │ └── model_manager.py # Model lifecycle management
│ └── utils/ # Utility modules
│ ├── audio_processor.py # Audio validation and preprocessing
│ └── formatters.py # Output format conversion
├── run_mcp_server.sh # MCP server startup script
├── run_api_server.sh # API server startup script
├── Dockerfile # Docker container configuration
├── requirements.txt # Python dependencies
└── pyproject.toml # Project configuration
```
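Because `PYTHONPATH` points at `src/` (see the startup scripts and Dockerfile), the subdirectories resolve as top-level packages (`core`, `servers`, `utils`) rather than `src.core` and friends. A self-contained sketch of that resolution, using a throwaway directory in place of the real tree:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Recreate a minimal src/ layout in a temp dir to show how the
# PYTHONPATH entry makes "core" importable without a "src." prefix.
root = Path(tempfile.mkdtemp())
(root / "src" / "core").mkdir(parents=True)
(root / "src" / "core" / "__init__.py").write_text("VERSION = '0.1.1'\n")

# Equivalent to `export PYTHONPATH=/app/src` in the Dockerfile/scripts.
sys.path.insert(0, str(root / "src"))

core = importlib.import_module("core")
print(core.VERSION)
```

This is why the refactor adds `__init__.py` files under src/core/, src/servers/, and src/utils/, and why imports read `from core.transcriber import ...` instead of `from src.core.transcriber import ...`.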
### Core Components
-1. **whisper_server.py** - MCP server entry point
+1. **src/servers/whisper_server.py** - MCP server entry point
- Uses FastMCP framework to expose three MCP tools
-- Delegates to transcriber.py for actual processing
+- Delegates to core.transcriber for actual processing
- Server initialization at line 19
-2. **api_server.py** - REST API server entry point
+2. **src/servers/api_server.py** - REST API server entry point
- Uses FastAPI framework to expose HTTP endpoints
- Provides six REST endpoints: `/`, `/health`, `/models`, `/transcribe`, `/batch-transcribe`, `/upload-transcribe`
- Shares the same core transcription logic with MCP server
- Includes file upload support via multipart/form-data
-3. **transcriber.py** - Core transcription logic (shared by both servers)
-- `transcribe_audio()` (line 38) - Single file transcription with environment variable support
-- `batch_transcribe()` (line 208) - Batch processing with progress reporting
+3. **src/core/transcriber.py** - Core transcription logic (shared by both servers)
+- `transcribe_audio()` (line 39) - Single file transcription with environment variable support
+- `batch_transcribe()` (line 209) - Batch processing with progress reporting
- All parameters support environment variable defaults
-- Handles output formatting delegation to formatters.py
+- Handles output formatting delegation to utils.formatters
-4. **model_manager.py** - Whisper model lifecycle management
+4. **src/core/model_manager.py** - Whisper model lifecycle management
- `get_whisper_model()` (line 44) - Returns cached model instances or loads new ones
- `test_gpu_driver()` (line 20) - GPU validation before model loading
- Global `model_instances` dict caches loaded models to prevent reloading
- Automatically determines batch size based on available GPU memory (lines 113-134)
-5. **audio_processor.py** - Audio file validation and preprocessing
+5. **src/utils/audio_processor.py** - Audio file validation and preprocessing
- `validate_audio_file()` (line 15) - Checks file existence, format, and size
- `process_audio()` (line 50) - Decodes audio using faster_whisper's decode_audio
-6. **formatters.py** - Output format conversion
+6. **src/utils/formatters.py** - Output format conversion
- `format_vtt()`, `format_srt()`, `format_txt()`, `format_json()` - Convert segments to various formats
- All formatters accept segment lists from Whisper output
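The formatter contract described in item 6 can be sketched as follows. This is a hedged illustration, not the project's code: the `(start, end, text)` segment shape and the exact `format_time` behavior are assumptions; the real implementations live in src/utils/formatters.py.

```python
def format_time(seconds: float, sep: str = ",") -> str:
    # Render HH:MM:SS with millisecond precision.
    # SRT uses "," before milliseconds, VTT uses "." (assumed convention).
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}{sep}{ms:03d}"

def format_srt(segments) -> str:
    # segments: iterable of (start, end, text) tuples -- an assumed shape
    # standing in for faster-whisper's segment objects.
    blocks = []
    for i, (start, end, text) in enumerate(segments, 1):
        blocks.append(f"{i}\n{format_time(start)} --> {format_time(end)}\n{text}\n")
    return "\n".join(blocks)

print(format_srt([(0.0, 2.5, "Hello"), (2.5, 5.0, "world")]))
```

The other formatters (`format_vtt`, `format_txt`, `format_json`) would differ only in the per-segment template, which is why a shared `format_time` helper is exported alongside them.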
### Key Architecture Patterns
-- **Dual Server Architecture**: Both MCP and REST API servers import and use the same core modules (transcriber.py, model_manager.py, audio_processor.py, formatters.py), ensuring consistent behavior
-- **Model Caching**: Models are cached in `model_instances` dictionary with key format `{model_name}_{device}_{compute_type}` (model_manager.py:84). This cache is shared if both servers run in the same process
-- **Batch Processing**: CUDA devices automatically use BatchedInferencePipeline for performance (model_manager.py:109-134)
-- **Environment Variable Configuration**: All transcription parameters support env var defaults (transcriber.py:19-36)
-- **Device Auto-Detection**: `device="auto"` automatically selects CUDA if available, otherwise CPU (model_manager.py:64-66)
+- **Dual Server Architecture**: Both MCP and REST API servers import and use the same core modules (core.transcriber, core.model_manager, utils.audio_processor, utils.formatters), ensuring consistent behavior
+- **Model Caching**: Models are cached in `model_instances` dictionary with key format `{model_name}_{device}_{compute_type}` (src/core/model_manager.py:84). This cache is shared if both servers run in the same process
+- **Batch Processing**: CUDA devices automatically use BatchedInferencePipeline for performance (src/core/model_manager.py:109-134)
+- **Environment Variable Configuration**: All transcription parameters support env var defaults (src/core/transcriber.py:19-36)
+- **Device Auto-Detection**: `device="auto"` automatically selects CUDA if available, otherwise CPU (src/core/model_manager.py:64-66)
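A minimal self-contained sketch of the caching and auto-detection patterns above, with stubbed-out CUDA probing and model loading (the real logic, including the GPU driver test, lives in src/core/model_manager.py):

```python
# Cache keyed by "{model_name}_{device}_{compute_type}", as described above.
model_instances = {}

def _cuda_available() -> bool:
    # Stub for illustration; the real code validates the GPU driver
    # before committing to CUDA.
    return False

def get_whisper_model(model_name: str, device: str = "auto",
                      compute_type: str = "float16"):
    if device == "auto":
        # Device auto-detection: prefer CUDA, fall back to CPU.
        device = "cuda" if _cuda_available() else "cpu"
    key = f"{model_name}_{device}_{compute_type}"
    if key not in model_instances:
        # Placeholder for the actual faster-whisper model load.
        model_instances[key] = object()
    return model_instances[key]

a = get_whisper_model("large-v3")
b = get_whisper_model("large-v3")
assert a is b  # second call hits the cache, no reload
```

Because the cache is a module-level dictionary, it is shared only when both servers run in the same process, exactly as noted in the Model Caching bullet.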
## Environment Variables
@@ -256,10 +277,10 @@ curl -X POST http://localhost:8000/upload-transcribe \
## Important Implementation Details
-- GPU memory is checked before loading models (model_manager.py:115-127)
+- GPU memory is checked before loading models (src/core/model_manager.py:115-127)
- Batch size dynamically adjusts: 32 (>16GB), 16 (>12GB), 8 (>8GB), 4 (>4GB), 2 (otherwise)
-- VAD (Voice Activity Detection) is enabled by default for better long-audio accuracy (transcriber.py:101)
-- Word timestamps are enabled by default (transcriber.py:106)
-- Model loading includes GPU driver test to fail fast if GPU is unavailable (model_manager.py:92)
-- Files over 1GB generate warnings about processing time (audio_processor.py:42)
+- VAD (Voice Activity Detection) is enabled by default for better long-audio accuracy (src/core/transcriber.py:102)
+- Word timestamps are enabled by default (src/core/transcriber.py:107)
+- Model loading includes GPU driver test to fail fast if GPU is unavailable (src/core/model_manager.py:92)
+- Files over 1GB generate warnings about processing time (src/utils/audio_processor.py:42)
- Default output format is "txt" for REST API, configured via environment variables for MCP server
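The batch-size tiers listed above map directly onto a threshold ladder; a sketch (the function name is an assumption, the thresholds are as documented):

```python
def pick_batch_size(gpu_mem_gb: float) -> int:
    # Tiers as documented: 32 (>16GB), 16 (>12GB), 8 (>8GB), 4 (>4GB), 2 otherwise
    if gpu_mem_gb > 16:
        return 32
    if gpu_mem_gb > 12:
        return 16
    if gpu_mem_gb > 8:
        return 8
    if gpu_mem_gb > 4:
        return 4
    return 2

print(pick_batch_size(24), pick_batch_size(6))  # 32 4
```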


@@ -25,7 +25,7 @@ RUN python -m pip install --upgrade pip
WORKDIR /app
# Copy requirements first for better caching
-COPY fast-whisper-mcp-server/requirements.txt .
+COPY requirements.txt .
# Install Python dependencies with CUDA support
RUN pip install --no-cache-dir \
@@ -35,11 +35,16 @@ RUN pip install --no-cache-dir \
mcp[cli]
# Copy application code
-COPY fast-whisper-mcp-server/ .
+COPY src/ ./src/
+COPY pyproject.toml .
+COPY README.md .
# Create directories for models and outputs
RUN mkdir -p /models /outputs
+# Set Python path
+ENV PYTHONPATH=/app/src
# Set environment variables for GPU
ENV WHISPER_MODEL_DIR=/models
ENV TRANSCRIPTION_OUTPUT_DIR=/outputs
@@ -48,4 +53,4 @@ ENV TRANSCRIPTION_DEVICE=cuda
ENV TRANSCRIPTION_COMPUTE_TYPE=float16
# Run the server
-CMD ["python", "whisper_server.py"]
+CMD ["python", "src/servers/whisper_server.py"]


@@ -1,9 +1,12 @@
[project]
name = "fast-whisper-mcp-server"
-version = "0.1.0"
-description = "Add your description here"
+version = "0.1.1"
+description = "High-performance speech recognition service with MCP and REST API servers"
readme = "README.md"
requires-python = ">=3.12"
dependencies = [
"faster-whisper>=1.1.1",
]
+[tool.setuptools]
+packages = ["src"]


@@ -5,6 +5,9 @@ datetime_prefix() {
date "+[%Y-%m-%d %H:%M:%S]"
}
+# Set Python path to include src directory
+export PYTHONPATH="/home/uad/agents/tools/mcp-transcriptor/src:$PYTHONPATH"
# Set environment variables
export CUDA_VISIBLE_DEVICES=1
export WHISPER_MODEL_DIR="/home/uad/agents/tools/mcp-transcriptor/data/models"
@@ -39,4 +42,4 @@ mkdir -p "$TRANSCRIPTION_OUTPUT_DIR"
mkdir -p "$TRANSCRIPTION_BATCH_OUTPUT_DIR"
# Run the API server
-/home/uad/agents/tools/mcp-transcriptor/venv/bin/python -u /home/uad/agents/tools/mcp-transcriptor/api_server.py 2>&1 | tee /home/uad/agents/tools/mcp-transcriptor/api.logs
+/home/uad/agents/tools/mcp-transcriptor/venv/bin/python -u /home/uad/agents/tools/mcp-transcriptor/src/servers/api_server.py 2>&1 | tee /home/uad/agents/tools/mcp-transcriptor/api.logs


@@ -9,6 +9,9 @@ datetime_prefix() {
USER_ID=$(id -u)
GROUP_ID=$(id -g)
+# Set Python path to include src directory
+export PYTHONPATH="/home/uad/agents/tools/mcp-transcriptor/src:$PYTHONPATH"
# Set environment variables
export CUDA_VISIBLE_DEVICES=1
export WHISPER_MODEL_DIR="/home/uad/agents/tools/mcp-transcriptor/data/models"
@@ -35,5 +38,5 @@ fi
# Run the Python script with the defined environment variables
#/home/uad/agents/tools/mcp-transcriptor/venv/bin/python /home/uad/agents/tools/mcp-transcriptor/whisper_server.py 2>&1 | tee /home/uad/agents/tools/mcp-transcriptor/mcp.logs
-/home/uad/agents/tools/mcp-transcriptor/venv/bin/python -u /home/uad/agents/tools/mcp-transcriptor/whisper_server.py 2>&1 | tee /home/uad/agents/tools/mcp-transcriptor/mcp.logs
+/home/uad/agents/tools/mcp-transcriptor/venv/bin/python -u /home/uad/agents/tools/mcp-transcriptor/src/servers/whisper_server.py 2>&1 | tee /home/uad/agents/tools/mcp-transcriptor/mcp.logs

src/core/__init__.py (new, empty file)

@@ -9,9 +9,9 @@ import time
import logging
from typing import Dict, Any, Tuple, List, Optional, Union
-from model_manager import get_whisper_model
-from audio_processor import validate_audio_file, process_audio
-from formatters import format_vtt, format_srt, format_json, format_txt, format_time
+from core.model_manager import get_whisper_model
+from utils.audio_processor import validate_audio_file, process_audio
+from utils.formatters import format_vtt, format_srt, format_json, format_txt, format_time
# Logging configuration
logger = logging.getLogger(__name__)

src/servers/__init__.py (new, empty file)

@@ -12,8 +12,8 @@ from fastapi.responses import JSONResponse, FileResponse
from pydantic import BaseModel, Field
import json
-from model_manager import get_model_info
-from transcriber import transcribe_audio, batch_transcribe
+from core.model_manager import get_model_info
+from core.transcriber import transcribe_audio, batch_transcribe
# Logging configuration
logging.basicConfig(level=logging.INFO)


@@ -8,8 +8,8 @@ import os
import logging
from mcp.server.fastmcp import FastMCP
-from model_manager import get_model_info
-from transcriber import transcribe_audio, batch_transcribe
+from core.model_manager import get_model_info
+from core.transcriber import transcribe_audio, batch_transcribe
# Log configuration
logging.basicConfig(level=logging.INFO)

src/utils/__init__.py (new, empty file)