Refactor codebase structure with organized src/ directory
- Reorganize source code into src/ directory with logical subdirectories:
  - src/servers/: MCP and REST API server implementations
  - src/core/: Core business logic (transcriber, model_manager)
  - src/utils/: Utility modules (audio_processor, formatters)
- Update all import statements to use proper module paths
- Configure PYTHONPATH in startup scripts and Dockerfile
- Update documentation with new structure and paths
- Update pyproject.toml with package configuration
- Keep DevOps files (scripts, Dockerfile, configs) at root level

All functionality validated and working correctly.
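For orientation, a minimal sketch of how imports resolve under the new layout, assuming src/ is on PYTHONPATH as the startup scripts and Dockerfile below configure (the `sys.path` line is just the in-process equivalent):

```python
# Sketch: with src/ on PYTHONPATH (as configured by the startup scripts and
# Dockerfile in this commit), modules are imported by their package paths.
import sys
sys.path.insert(0, "src")  # in-process equivalent of: export PYTHONPATH=src

from core.transcriber import transcribe_audio, batch_transcribe
from core.model_manager import get_whisper_model
from utils.formatters import format_srt
```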
CLAUDE.md (61 lines changed)
@@ -88,46 +88,67 @@ docker run --gpus all -v /path/to/models:/models -v /path/to/outputs:/outputs wh
 ## Architecture
 
+### Directory Structure
+
+```
+.
+├── src/                        # Source code directory
+│   ├── servers/                # Server implementations
+│   │   ├── whisper_server.py   # MCP server entry point
+│   │   └── api_server.py       # REST API server entry point
+│   ├── core/                   # Core business logic
+│   │   ├── transcriber.py      # Transcription logic
+│   │   └── model_manager.py    # Model lifecycle management
+│   └── utils/                  # Utility modules
+│       ├── audio_processor.py  # Audio validation and preprocessing
+│       └── formatters.py       # Output format conversion
+├── run_mcp_server.sh           # MCP server startup script
+├── run_api_server.sh           # API server startup script
+├── Dockerfile                  # Docker container configuration
+├── requirements.txt            # Python dependencies
+└── pyproject.toml              # Project configuration
+```
+
 ### Core Components
 
-1. **whisper_server.py** - MCP server entry point
+1. **src/servers/whisper_server.py** - MCP server entry point
    - Uses FastMCP framework to expose three MCP tools
-   - Delegates to transcriber.py for actual processing
+   - Delegates to core.transcriber for actual processing
    - Server initialization at line 19
 
-2. **api_server.py** - REST API server entry point
+2. **src/servers/api_server.py** - REST API server entry point
    - Uses FastAPI framework to expose HTTP endpoints
    - Provides 6 REST endpoints: `/`, `/health`, `/models`, `/transcribe`, `/batch-transcribe`, `/upload-transcribe`
    - Shares the same core transcription logic with MCP server
    - Includes file upload support via multipart/form-data
 
-3. **transcriber.py** - Core transcription logic (shared by both servers)
-   - `transcribe_audio()` (line 38) - Single file transcription with environment variable support
-   - `batch_transcribe()` (line 208) - Batch processing with progress reporting
+3. **src/core/transcriber.py** - Core transcription logic (shared by both servers)
+   - `transcribe_audio()` (line 39) - Single file transcription with environment variable support
+   - `batch_transcribe()` (line 209) - Batch processing with progress reporting
    - All parameters support environment variable defaults
-   - Handles output formatting delegation to formatters.py
+   - Handles output formatting delegation to utils.formatters
 
-4. **model_manager.py** - Whisper model lifecycle management
+4. **src/core/model_manager.py** - Whisper model lifecycle management
    - `get_whisper_model()` (line 44) - Returns cached model instances or loads new ones
    - `test_gpu_driver()` (line 20) - GPU validation before model loading
    - Global `model_instances` dict caches loaded models to prevent reloading
    - Automatically determines batch size based on available GPU memory (lines 113-134)
 
-5. **audio_processor.py** - Audio file validation and preprocessing
+5. **src/utils/audio_processor.py** - Audio file validation and preprocessing
    - `validate_audio_file()` (line 15) - Checks file existence, format, and size
    - `process_audio()` (line 50) - Decodes audio using faster_whisper's decode_audio
 
-6. **formatters.py** - Output format conversion
+6. **src/utils/formatters.py** - Output format conversion
    - `format_vtt()`, `format_srt()`, `format_txt()`, `format_json()` - Convert segments to various formats
    - All formatters accept segment lists from Whisper output
 
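The formatters described in item 6 turn faster-whisper segments into subtitle/text output. As a rough illustration only — the real src/utils/formatters.py is not part of this diff, so the helper name and segment attributes below are assumptions — an SRT formatter typically looks like:

```python
# Illustrative sketch only: the real src/utils/formatters.py is not shown in
# this diff, so the helper name and segment attributes are assumptions.
def _srt_time(seconds: float) -> str:
    """Render seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def format_srt(segments) -> str:
    """Convert Whisper segments (objects with .start/.end/.text) to SRT."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{_srt_time(seg.start)} --> {_srt_time(seg.end)}\n{seg.text.strip()}\n"
        )
    return "\n".join(blocks)
```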
 ### Key Architecture Patterns
 
-- **Dual Server Architecture**: Both MCP and REST API servers import and use the same core modules (transcriber.py, model_manager.py, audio_processor.py, formatters.py), ensuring consistent behavior
-- **Model Caching**: Models are cached in the `model_instances` dictionary with key format `{model_name}_{device}_{compute_type}` (model_manager.py:84). This cache is shared if both servers run in the same process
-- **Batch Processing**: CUDA devices automatically use BatchedInferencePipeline for performance (model_manager.py:109-134)
-- **Environment Variable Configuration**: All transcription parameters support env var defaults (transcriber.py:19-36)
-- **Device Auto-Detection**: `device="auto"` automatically selects CUDA if available, otherwise CPU (model_manager.py:64-66)
+- **Dual Server Architecture**: Both MCP and REST API servers import and use the same core modules (core.transcriber, core.model_manager, utils.audio_processor, utils.formatters), ensuring consistent behavior
+- **Model Caching**: Models are cached in the `model_instances` dictionary with key format `{model_name}_{device}_{compute_type}` (src/core/model_manager.py:84). This cache is shared if both servers run in the same process
+- **Batch Processing**: CUDA devices automatically use BatchedInferencePipeline for performance (src/core/model_manager.py:109-134)
+- **Environment Variable Configuration**: All transcription parameters support env var defaults (src/core/transcriber.py:19-36)
+- **Device Auto-Detection**: `device="auto"` automatically selects CUDA if available, otherwise CPU (src/core/model_manager.py:64-66)
 
 ## Environment Variables
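Taken together, the caching and auto-detection patterns above amount to a keyed singleton per model configuration. A minimal sketch, assuming faster-whisper's `WhisperModel` and torch for CUDA detection — the real `get_whisper_model()` also runs `test_gpu_driver()` and the batch-size logic, which this omits:

```python
# Hedged sketch of the caching + auto-detection pattern described above; the
# actual src/core/model_manager.py is not part of this diff.
import torch
from faster_whisper import WhisperModel

model_instances: dict = {}  # key format: {model_name}_{device}_{compute_type}

def get_whisper_model(model_name: str, device: str = "auto",
                      compute_type: str = "float16") -> WhisperModel:
    if device == "auto":  # pick CUDA when present, else CPU
        device = "cuda" if torch.cuda.is_available() else "cpu"
    key = f"{model_name}_{device}_{compute_type}"
    if key not in model_instances:  # load once, reuse on later calls
        model_instances[key] = WhisperModel(model_name, device=device,
                                            compute_type=compute_type)
    return model_instances[key]
```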
@@ -256,10 +277,10 @@ curl -X POST http://localhost:8000/upload-transcribe \
 ## Important Implementation Details
 
-- GPU memory is checked before loading models (model_manager.py:115-127)
+- GPU memory is checked before loading models (src/core/model_manager.py:115-127)
 - Batch size dynamically adjusts: 32 (>16GB), 16 (>12GB), 8 (>8GB), 4 (>4GB), 2 (otherwise)
-- VAD (Voice Activity Detection) is enabled by default for better long-audio accuracy (transcriber.py:101)
-- Word timestamps are enabled by default (transcriber.py:106)
-- Model loading includes GPU driver test to fail fast if GPU is unavailable (model_manager.py:92)
-- Files over 1GB generate warnings about processing time (audio_processor.py:42)
+- VAD (Voice Activity Detection) is enabled by default for better long-audio accuracy (src/core/transcriber.py:102)
+- Word timestamps are enabled by default (src/core/transcriber.py:107)
+- Model loading includes GPU driver test to fail fast if GPU is unavailable (src/core/model_manager.py:92)
+- Files over 1GB generate warnings about processing time (src/utils/audio_processor.py:42)
 - Default output format is "txt" for REST API, configured via environment variables for MCP server
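The batch-size tiers noted above map directly onto a threshold ladder. A hedged sketch — the tier boundaries come from the documentation, but the function name and the use of `torch.cuda.mem_get_info()` are assumptions, since src/core/model_manager.py is not shown in this diff:

```python
# Hedged sketch of the documented batch-size ladder; tier thresholds are from
# the notes above, but the function name and memory query are assumed.
import torch

def pick_batch_size() -> int:
    if not torch.cuda.is_available():
        return 1  # assumption: batched inference is only used on CUDA
    free_bytes, _total = torch.cuda.mem_get_info()
    free_gb = free_bytes / 1024**3
    if free_gb > 16:
        return 32
    if free_gb > 12:
        return 16
    if free_gb > 8:
        return 8
    if free_gb > 4:
        return 4
    return 2
```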
Dockerfile (11 lines changed)
@@ -25,7 +25,7 @@ RUN python -m pip install --upgrade pip
 WORKDIR /app
 
 # Copy requirements first for better caching
-COPY fast-whisper-mcp-server/requirements.txt .
+COPY requirements.txt .
 
 # Install Python dependencies with CUDA support
 RUN pip install --no-cache-dir \
@@ -35,11 +35,16 @@ RUN pip install --no-cache-dir \
     mcp[cli]
 
 # Copy application code
-COPY fast-whisper-mcp-server/ .
+COPY src/ ./src/
+COPY pyproject.toml .
+COPY README.md .
 
 # Create directories for models and outputs
 RUN mkdir -p /models /outputs
 
+# Set Python path
+ENV PYTHONPATH=/app/src
+
 # Set environment variables for GPU
 ENV WHISPER_MODEL_DIR=/models
 ENV TRANSCRIPTION_OUTPUT_DIR=/outputs
@@ -48,4 +53,4 @@ ENV TRANSCRIPTION_DEVICE=cuda
 ENV TRANSCRIPTION_COMPUTE_TYPE=float16
 
 # Run the server
-CMD ["python", "whisper_server.py"]
+CMD ["python", "src/servers/whisper_server.py"]
pyproject.toml

@@ -1,9 +1,12 @@
 [project]
 name = "fast-whisper-mcp-server"
-version = "0.1.0"
-description = "Add your description here"
+version = "0.1.1"
+description = "High-performance speech recognition service with MCP and REST API servers"
 readme = "README.md"
 requires-python = ">=3.12"
 dependencies = [
     "faster-whisper>=1.1.1",
 ]
+
+[tool.setuptools]
+packages = ["src"]
run_api_server.sh

@@ -5,6 +5,9 @@ datetime_prefix() {
     date "+[%Y-%m-%d %H:%M:%S]"
 }
 
+# Set Python path to include src directory
+export PYTHONPATH="/home/uad/agents/tools/mcp-transcriptor/src:$PYTHONPATH"
+
 # Set environment variables
 export CUDA_VISIBLE_DEVICES=1
 export WHISPER_MODEL_DIR="/home/uad/agents/tools/mcp-transcriptor/data/models"
@@ -39,4 +42,4 @@ mkdir -p "$TRANSCRIPTION_OUTPUT_DIR"
 mkdir -p "$TRANSCRIPTION_BATCH_OUTPUT_DIR"
 
 # Run the API server
-/home/uad/agents/tools/mcp-transcriptor/venv/bin/python -u /home/uad/agents/tools/mcp-transcriptor/api_server.py 2>&1 | tee /home/uad/agents/tools/mcp-transcriptor/api.logs
+/home/uad/agents/tools/mcp-transcriptor/venv/bin/python -u /home/uad/agents/tools/mcp-transcriptor/src/servers/api_server.py 2>&1 | tee /home/uad/agents/tools/mcp-transcriptor/api.logs
run_mcp_server.sh

@@ -9,6 +9,9 @@ datetime_prefix() {
 USER_ID=$(id -u)
 GROUP_ID=$(id -g)
 
+# Set Python path to include src directory
+export PYTHONPATH="/home/uad/agents/tools/mcp-transcriptor/src:$PYTHONPATH"
+
 # Set environment variables
 export CUDA_VISIBLE_DEVICES=1
 export WHISPER_MODEL_DIR="/home/uad/agents/tools/mcp-transcriptor/data/models"
@@ -35,5 +38,5 @@ fi
 
 # Run the Python script with the defined environment variables
 #/home/uad/agents/tools/mcp-transcriptor/venv/bin/python /home/uad/agents/tools/mcp-transcriptor/whisper_server.py 2>&1 | tee /home/uad/agents/tools/mcp-transcriptor/mcp.logs
-/home/uad/agents/tools/mcp-transcriptor/venv/bin/python -u /home/uad/agents/tools/mcp-transcriptor/whisper_server.py 2>&1 | tee /home/uad/agents/tools/mcp-transcriptor/mcp.logs
+/home/uad/agents/tools/mcp-transcriptor/venv/bin/python -u /home/uad/agents/tools/mcp-transcriptor/src/servers/whisper_server.py 2>&1 | tee /home/uad/agents/tools/mcp-transcriptor/mcp.logs
src/core/__init__.py (new empty file)
src/core/transcriber.py

@@ -9,9 +9,9 @@ import time
 import logging
 from typing import Dict, Any, Tuple, List, Optional, Union
 
-from model_manager import get_whisper_model
-from audio_processor import validate_audio_file, process_audio
-from formatters import format_vtt, format_srt, format_json, format_txt, format_time
+from core.model_manager import get_whisper_model
+from utils.audio_processor import validate_audio_file, process_audio
+from utils.formatters import format_vtt, format_srt, format_json, format_txt, format_time
 
 # Logging configuration
 logger = logging.getLogger(__name__)
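The env-var defaults attributed to transcriber.py (lines 19-36) follow the usual `os.getenv` pattern. A sketch — `TRANSCRIPTION_DEVICE` and `TRANSCRIPTION_COMPUTE_TYPE` appear in the Dockerfile above, while `TRANSCRIPTION_MODEL` and `TRANSCRIPTION_OUTPUT_FORMAT` are assumed names:

```python
# Sketch of the env-var default pattern; TRANSCRIPTION_DEVICE and
# TRANSCRIPTION_COMPUTE_TYPE are real (see Dockerfile), the others are assumed.
import os

DEFAULT_MODEL = os.getenv("TRANSCRIPTION_MODEL", "large-v3")
DEFAULT_DEVICE = os.getenv("TRANSCRIPTION_DEVICE", "auto")
DEFAULT_COMPUTE_TYPE = os.getenv("TRANSCRIPTION_COMPUTE_TYPE", "float16")
DEFAULT_OUTPUT_FORMAT = os.getenv("TRANSCRIPTION_OUTPUT_FORMAT", "txt")

def transcribe_audio(audio_path: str,
                     model_name: str = DEFAULT_MODEL,
                     device: str = DEFAULT_DEVICE,
                     compute_type: str = DEFAULT_COMPUTE_TYPE,
                     output_format: str = DEFAULT_OUTPUT_FORMAT) -> str:
    ...  # validate, load model via core.model_manager, format via utils.formatters
```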
src/servers/__init__.py (new empty file)
src/servers/api_server.py

@@ -12,8 +12,8 @@ from fastapi.responses import JSONResponse, FileResponse
 from pydantic import BaseModel, Field
 import json
 
-from model_manager import get_model_info
-from transcriber import transcribe_audio, batch_transcribe
+from core.model_manager import get_model_info
+from core.transcriber import transcribe_audio, batch_transcribe
 
 # Logging configuration
 logging.basicConfig(level=logging.INFO)
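For context, a minimal FastAPI shape matching the endpoint list in CLAUDE.md — the response payload here is an assumption, not the server's actual schema:

```python
# Minimal FastAPI sketch; the /health response shape is assumed for
# illustration and is not taken from the actual api_server.py.
from fastapi import FastAPI

app = FastAPI(title="fast-whisper REST API")

@app.get("/health")
def health() -> dict:
    return {"status": "ok"}
```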
src/servers/whisper_server.py

@@ -8,8 +8,8 @@ import os
 import logging
 from mcp.server.fastmcp import FastMCP
 
-from model_manager import get_model_info
-from transcriber import transcribe_audio, batch_transcribe
+from core.model_manager import get_model_info
+from core.transcriber import transcribe_audio, batch_transcribe
 
 # Log configuration
 logging.basicConfig(level=logging.INFO)
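And the MCP side, per the "three MCP tools" note in CLAUDE.md — a hedged sketch using the FastMCP decorator API; the tool name and signature are illustrative, not the server's actual tools:

```python
# Hedged FastMCP sketch; the tool name and signature are illustrative only.
from mcp.server.fastmcp import FastMCP
from core.transcriber import transcribe_audio  # resolves via PYTHONPATH=src

mcp = FastMCP("whisper")

@mcp.tool()
def transcribe(audio_path: str, output_format: str = "txt") -> str:
    """Transcribe one audio file and return the formatted text."""
    return transcribe_audio(audio_path, output_format=output_format)

if __name__ == "__main__":
    mcp.run()
```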
src/utils/__init__.py (new empty file)