Refactor codebase structure with organized src/ directory

- Reorganize source code into src/ directory with logical subdirectories:
  - src/servers/: MCP and REST API server implementations
  - src/core/: Core business logic (transcriber, model_manager)
  - src/utils/: Utility modules (audio_processor, formatters)

- Update all import statements to use proper module paths
- Configure PYTHONPATH in startup scripts and Dockerfile
- Update documentation with new structure and paths
- Update pyproject.toml with package configuration
- Keep DevOps files (scripts, Dockerfile, configs) at root level

All functionality validated and working correctly.
Author: Alihan
Date: 2025-10-07 12:28:03 +03:00
Parent: 7c9a8d8378
Commit: e7a457e602
15 changed files with 69 additions and 34 deletions


@@ -88,46 +88,67 @@ docker run --gpus all -v /path/to/models:/models -v /path/to/outputs:/outputs wh
## Architecture
### Directory Structure
```
.
├── src/ # Source code directory
│ ├── servers/ # Server implementations
│ │ ├── whisper_server.py # MCP server entry point
│ │ └── api_server.py # REST API server entry point
│ ├── core/ # Core business logic
│ │ ├── transcriber.py # Transcription logic
│ │ └── model_manager.py # Model lifecycle management
│ └── utils/ # Utility modules
│ ├── audio_processor.py # Audio validation and preprocessing
│ └── formatters.py # Output format conversion
├── run_mcp_server.sh # MCP server startup script
├── run_api_server.sh # API server startup script
├── Dockerfile # Docker container configuration
├── requirements.txt # Python dependencies
└── pyproject.toml # Project configuration
```
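Because `PYTHONPATH` points at `src/` (see the startup scripts and Dockerfile), the subdirectories resolve as top-level packages (`core`, `servers`, `utils`) rather than `src.core` and friends. A self-contained sketch of that resolution, using a throwaway directory in place of the real tree:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Recreate a minimal src/ layout in a temp dir to show how the
# PYTHONPATH entry makes "core" importable without a "src." prefix.
root = Path(tempfile.mkdtemp())
(root / "src" / "core").mkdir(parents=True)
(root / "src" / "core" / "__init__.py").write_text("VERSION = '0.1.1'\n")

# Equivalent to `export PYTHONPATH=/app/src` in the Dockerfile/scripts.
sys.path.insert(0, str(root / "src"))

core = importlib.import_module("core")
print(core.VERSION)
```

This is why the refactor adds `__init__.py` files under src/core/, src/servers/, and src/utils/, and why imports read `from core.transcriber import ...` instead of `from src.core.transcriber import ...`.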
### Core Components
-1. **whisper_server.py** - MCP server entry point
+1. **src/servers/whisper_server.py** - MCP server entry point
- Uses FastMCP framework to expose three MCP tools
-- Delegates to transcriber.py for actual processing
+- Delegates to core.transcriber for actual processing
- Server initialization at line 19
-2. **api_server.py** - REST API server entry point
+2. **src/servers/api_server.py** - REST API server entry point
- Uses FastAPI framework to expose HTTP endpoints
- Provides six REST endpoints: `/`, `/health`, `/models`, `/transcribe`, `/batch-transcribe`, `/upload-transcribe`
- Shares the same core transcription logic with MCP server
- Includes file upload support via multipart/form-data
-3. **transcriber.py** - Core transcription logic (shared by both servers)
-- `transcribe_audio()` (line 38) - Single file transcription with environment variable support
-- `batch_transcribe()` (line 208) - Batch processing with progress reporting
+3. **src/core/transcriber.py** - Core transcription logic (shared by both servers)
+- `transcribe_audio()` (line 39) - Single file transcription with environment variable support
+- `batch_transcribe()` (line 209) - Batch processing with progress reporting
- All parameters support environment variable defaults
-- Handles output formatting delegation to formatters.py
+- Handles output formatting delegation to utils.formatters
-4. **model_manager.py** - Whisper model lifecycle management
+4. **src/core/model_manager.py** - Whisper model lifecycle management
- `get_whisper_model()` (line 44) - Returns cached model instances or loads new ones
- `test_gpu_driver()` (line 20) - GPU validation before model loading
- Global `model_instances` dict caches loaded models to prevent reloading
- Automatically determines batch size based on available GPU memory (lines 113-134)
-5. **audio_processor.py** - Audio file validation and preprocessing
+5. **src/utils/audio_processor.py** - Audio file validation and preprocessing
- `validate_audio_file()` (line 15) - Checks file existence, format, and size
- `process_audio()` (line 50) - Decodes audio using faster_whisper's decode_audio
-6. **formatters.py** - Output format conversion
+6. **src/utils/formatters.py** - Output format conversion
- `format_vtt()`, `format_srt()`, `format_txt()`, `format_json()` - Convert segments to various formats
- All formatters accept segment lists from Whisper output
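The formatter contract described in item 6 can be sketched as follows. This is a hedged illustration, not the project's code: the `(start, end, text)` segment shape and the exact `format_time` behavior are assumptions; the real implementations live in src/utils/formatters.py.

```python
def format_time(seconds: float, sep: str = ",") -> str:
    # Render HH:MM:SS with millisecond precision.
    # SRT uses "," before milliseconds, VTT uses "." (assumed convention).
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}{sep}{ms:03d}"

def format_srt(segments) -> str:
    # segments: iterable of (start, end, text) tuples -- an assumed shape
    # standing in for faster-whisper's segment objects.
    blocks = []
    for i, (start, end, text) in enumerate(segments, 1):
        blocks.append(f"{i}\n{format_time(start)} --> {format_time(end)}\n{text}\n")
    return "\n".join(blocks)

print(format_srt([(0.0, 2.5, "Hello"), (2.5, 5.0, "world")]))
```

The other formatters (`format_vtt`, `format_txt`, `format_json`) would differ only in the per-segment template, which is why a shared `format_time` helper is exported alongside them.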
### Key Architecture Patterns
-- **Dual Server Architecture**: Both MCP and REST API servers import and use the same core modules (transcriber.py, model_manager.py, audio_processor.py, formatters.py), ensuring consistent behavior
-- **Model Caching**: Models are cached in `model_instances` dictionary with key format `{model_name}_{device}_{compute_type}` (model_manager.py:84). This cache is shared if both servers run in the same process
-- **Batch Processing**: CUDA devices automatically use BatchedInferencePipeline for performance (model_manager.py:109-134)
-- **Environment Variable Configuration**: All transcription parameters support env var defaults (transcriber.py:19-36)
-- **Device Auto-Detection**: `device="auto"` automatically selects CUDA if available, otherwise CPU (model_manager.py:64-66)
+- **Dual Server Architecture**: Both MCP and REST API servers import and use the same core modules (core.transcriber, core.model_manager, utils.audio_processor, utils.formatters), ensuring consistent behavior
+- **Model Caching**: Models are cached in `model_instances` dictionary with key format `{model_name}_{device}_{compute_type}` (src/core/model_manager.py:84). This cache is shared if both servers run in the same process
+- **Batch Processing**: CUDA devices automatically use BatchedInferencePipeline for performance (src/core/model_manager.py:109-134)
+- **Environment Variable Configuration**: All transcription parameters support env var defaults (src/core/transcriber.py:19-36)
+- **Device Auto-Detection**: `device="auto"` automatically selects CUDA if available, otherwise CPU (src/core/model_manager.py:64-66)
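A minimal self-contained sketch of the caching and auto-detection patterns above, with stubbed-out CUDA probing and model loading (the real logic, including the GPU driver test, lives in src/core/model_manager.py):

```python
# Cache keyed by "{model_name}_{device}_{compute_type}", as described above.
model_instances = {}

def _cuda_available() -> bool:
    # Stub for illustration; the real code validates the GPU driver
    # before committing to CUDA.
    return False

def get_whisper_model(model_name: str, device: str = "auto",
                      compute_type: str = "float16"):
    if device == "auto":
        # Device auto-detection: prefer CUDA, fall back to CPU.
        device = "cuda" if _cuda_available() else "cpu"
    key = f"{model_name}_{device}_{compute_type}"
    if key not in model_instances:
        # Placeholder for the actual faster-whisper model load.
        model_instances[key] = object()
    return model_instances[key]

a = get_whisper_model("large-v3")
b = get_whisper_model("large-v3")
assert a is b  # second call hits the cache, no reload
```

Because the cache is a module-level dictionary, it is shared only when both servers run in the same process, exactly as noted in the Model Caching bullet.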
## Environment Variables
@@ -256,10 +277,10 @@ curl -X POST http://localhost:8000/upload-transcribe \
## Important Implementation Details
-- GPU memory is checked before loading models (model_manager.py:115-127)
+- GPU memory is checked before loading models (src/core/model_manager.py:115-127)
- Batch size dynamically adjusts: 32 (>16GB), 16 (>12GB), 8 (>8GB), 4 (>4GB), 2 (otherwise)
-- VAD (Voice Activity Detection) is enabled by default for better long-audio accuracy (transcriber.py:101)
-- Word timestamps are enabled by default (transcriber.py:106)
-- Model loading includes GPU driver test to fail fast if GPU is unavailable (model_manager.py:92)
-- Files over 1GB generate warnings about processing time (audio_processor.py:42)
+- VAD (Voice Activity Detection) is enabled by default for better long-audio accuracy (src/core/transcriber.py:102)
+- Word timestamps are enabled by default (src/core/transcriber.py:107)
+- Model loading includes GPU driver test to fail fast if GPU is unavailable (src/core/model_manager.py:92)
+- Files over 1GB generate warnings about processing time (src/utils/audio_processor.py:42)
- Default output format is "txt" for REST API, configured via environment variables for MCP server
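The batch-size tiers listed above map directly onto a threshold ladder; a sketch (the function name is an assumption, the thresholds are as documented):

```python
def pick_batch_size(gpu_mem_gb: float) -> int:
    # Tiers as documented: 32 (>16GB), 16 (>12GB), 8 (>8GB), 4 (>4GB), 2 otherwise
    if gpu_mem_gb > 16:
        return 32
    if gpu_mem_gb > 12:
        return 16
    if gpu_mem_gb > 8:
        return 8
    if gpu_mem_gb > 4:
        return 4
    return 2

print(pick_batch_size(24), pick_batch_size(6))  # 32 4
```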


@@ -25,7 +25,7 @@ RUN python -m pip install --upgrade pip
WORKDIR /app
# Copy requirements first for better caching
-COPY fast-whisper-mcp-server/requirements.txt .
+COPY requirements.txt .
# Install Python dependencies with CUDA support
RUN pip install --no-cache-dir \
@@ -35,11 +35,16 @@ RUN pip install --no-cache-dir \
mcp[cli]
# Copy application code
-COPY fast-whisper-mcp-server/ .
+COPY src/ ./src/
+COPY pyproject.toml .
+COPY README.md .
# Create directories for models and outputs
RUN mkdir -p /models /outputs
+# Set Python path
+ENV PYTHONPATH=/app/src
# Set environment variables for GPU
ENV WHISPER_MODEL_DIR=/models
ENV TRANSCRIPTION_OUTPUT_DIR=/outputs
@@ -48,4 +53,4 @@ ENV TRANSCRIPTION_DEVICE=cuda
ENV TRANSCRIPTION_COMPUTE_TYPE=float16
# Run the server
-CMD ["python", "whisper_server.py"]
+CMD ["python", "src/servers/whisper_server.py"]


@@ -1,9 +1,12 @@
[project]
name = "fast-whisper-mcp-server"
-version = "0.1.0"
-description = "Add your description here"
+version = "0.1.1"
+description = "High-performance speech recognition service with MCP and REST API servers"
readme = "README.md"
requires-python = ">=3.12"
dependencies = [
"faster-whisper>=1.1.1",
]
+[tool.setuptools]
+packages = ["src"]


@@ -5,6 +5,9 @@ datetime_prefix() {
date "+[%Y-%m-%d %H:%M:%S]"
}
+# Set Python path to include src directory
+export PYTHONPATH="/home/uad/agents/tools/mcp-transcriptor/src:$PYTHONPATH"
# Set environment variables
export CUDA_VISIBLE_DEVICES=1
export WHISPER_MODEL_DIR="/home/uad/agents/tools/mcp-transcriptor/data/models"
@@ -39,4 +42,4 @@ mkdir -p "$TRANSCRIPTION_OUTPUT_DIR"
mkdir -p "$TRANSCRIPTION_BATCH_OUTPUT_DIR"
# Run the API server
-/home/uad/agents/tools/mcp-transcriptor/venv/bin/python -u /home/uad/agents/tools/mcp-transcriptor/api_server.py 2>&1 | tee /home/uad/agents/tools/mcp-transcriptor/api.logs
+/home/uad/agents/tools/mcp-transcriptor/venv/bin/python -u /home/uad/agents/tools/mcp-transcriptor/src/servers/api_server.py 2>&1 | tee /home/uad/agents/tools/mcp-transcriptor/api.logs


@@ -9,6 +9,9 @@ datetime_prefix() {
USER_ID=$(id -u)
GROUP_ID=$(id -g)
+# Set Python path to include src directory
+export PYTHONPATH="/home/uad/agents/tools/mcp-transcriptor/src:$PYTHONPATH"
# Set environment variables
export CUDA_VISIBLE_DEVICES=1
export WHISPER_MODEL_DIR="/home/uad/agents/tools/mcp-transcriptor/data/models"
@@ -35,5 +38,5 @@ fi
# Run the Python script with the defined environment variables
#/home/uad/agents/tools/mcp-transcriptor/venv/bin/python /home/uad/agents/tools/mcp-transcriptor/whisper_server.py 2>&1 | tee /home/uad/agents/tools/mcp-transcriptor/mcp.logs
-/home/uad/agents/tools/mcp-transcriptor/venv/bin/python -u /home/uad/agents/tools/mcp-transcriptor/whisper_server.py 2>&1 | tee /home/uad/agents/tools/mcp-transcriptor/mcp.logs
+/home/uad/agents/tools/mcp-transcriptor/venv/bin/python -u /home/uad/agents/tools/mcp-transcriptor/src/servers/whisper_server.py 2>&1 | tee /home/uad/agents/tools/mcp-transcriptor/mcp.logs

src/core/__init__.py (new, empty file)

@@ -9,9 +9,9 @@ import time
import logging
from typing import Dict, Any, Tuple, List, Optional, Union
-from model_manager import get_whisper_model
-from audio_processor import validate_audio_file, process_audio
-from formatters import format_vtt, format_srt, format_json, format_txt, format_time
+from core.model_manager import get_whisper_model
+from utils.audio_processor import validate_audio_file, process_audio
+from utils.formatters import format_vtt, format_srt, format_json, format_txt, format_time
# Logging configuration
logger = logging.getLogger(__name__)

src/servers/__init__.py (new, empty file)

@@ -12,8 +12,8 @@ from fastapi.responses import JSONResponse, FileResponse
from pydantic import BaseModel, Field
import json
-from model_manager import get_model_info
-from transcriber import transcribe_audio, batch_transcribe
+from core.model_manager import get_model_info
+from core.transcriber import transcribe_audio, batch_transcribe
# Logging configuration
logging.basicConfig(level=logging.INFO)


@@ -8,8 +8,8 @@ import os
import logging
from mcp.server.fastmcp import FastMCP
-from model_manager import get_model_info
-from transcriber import transcribe_audio, batch_transcribe
+from core.model_manager import get_model_info
+from core.transcriber import transcribe_audio, batch_transcribe
# Log configuration
logging.basicConfig(level=logging.INFO)

src/utils/__init__.py (new, empty file)