Compare commits

..

4 Commits

Author SHA1 Message Date
Alihan
5fb742a312 Add circuit breaker, input validation, and refactor startup logic
- Implement circuit breaker pattern for GPU health checks
  - Prevents repeated failures with configurable thresholds
  - Three states: CLOSED, OPEN, HALF_OPEN
  - Integrated into GPU health monitoring
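For reference, a minimal three-state breaker along these lines could look as follows; this is an illustrative sketch, not the project's `utils/circuit_breaker.py`, and the thresholds are example values.
```python
# Illustrative circuit-breaker sketch (not the actual utils/circuit_breaker.py).
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=3, success_threshold=2, timeout_seconds=60):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.timeout_seconds = timeout_seconds
        self.state = "CLOSED"
        self.failures = 0
        self.successes = 0
        self.opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state == "OPEN":
            if time.time() - self.opened_at >= self.timeout_seconds:
                self.state = "HALF_OPEN"        # allow a trial call
            else:
                raise RuntimeError("circuit open: skipping GPU health check")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            self.successes = 0
            if self.failures >= self.failure_threshold or self.state == "HALF_OPEN":
                self.state = "OPEN"             # stop calling until timeout elapses
                self.opened_at = time.time()
            raise
        if self.state == "HALF_OPEN":
            self.successes += 1
            if self.successes >= self.success_threshold:
                self.state = "CLOSED"           # recovered: resume normal operation
                self.failures = 0
                self.successes = 0
        else:
            self.failures = 0
        return result
```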

- Add comprehensive input validation and path sanitization
  - Path traversal attack prevention
  - Whitelist-based validation for models, devices, formats
  - Error message sanitization to prevent information leakage
  - File size limits and security checks
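As an illustration of the validation approach (the allowed values and root directory below are placeholders, not the project's actual lists):
```python
# Illustrative whitelist validation and path-traversal check; values are examples.
import os

ALLOWED_MODELS = {"tiny", "base", "small", "medium", "large-v3"}
ALLOWED_FORMATS = {"vtt", "srt", "txt", "json"}
ALLOWED_AUDIO_ROOT = "/data/audio"          # hypothetical allowed root

def validate_request(audio_path: str, model_name: str, output_format: str) -> str:
    if model_name not in ALLOWED_MODELS:
        raise ValueError(f"Unsupported model: {model_name}")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"Unsupported output format: {output_format}")
    # Resolve symlinks and ".." segments, then confirm the file stays under the root.
    real_path = os.path.realpath(audio_path)
    if not real_path.startswith(ALLOWED_AUDIO_ROOT + os.sep):
        raise ValueError("Audio path outside the allowed directory")
    return real_path
```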

- Centralize startup logic across servers
  - Extract common startup procedures to utils/startup.py
  - Deduplicate GPU health checks and initialization code
  - Simplify both MCP and API server startup sequences

- Add proper Python package structure
  - Add __init__.py files to all modules
  - Improve package organization

- Add circuit breaker status API endpoints
  - GET /health/circuit-breaker - View circuit breaker stats
  - POST /health/circuit-breaker/reset - Reset circuit breaker

- Reorganize test files into tests/ directory
  - Rename and restructure test files for better organization
2025-10-10 01:03:55 +03:00
Alihan
40555592e6 Fix deadlock in job queue and refactor Phase 2 tests
- Fix: Change threading.Lock to threading.RLock in JobQueue to prevent deadlock
  - Issue: list_jobs() acquired lock then called get_job_status() which tried to acquire same lock
  - Solution: Use re-entrant lock (RLock) to allow nested lock acquisition (src/core/job_queue.py:144)
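A minimal illustration of the fix, with hypothetical names rather than the actual `JobQueue` code: with `threading.Lock` the nested call below would deadlock, while `threading.RLock` lets the same thread re-acquire the lock.
```python
# Illustrative only: nested acquisition deadlocks with Lock but works with RLock.
import threading

class Queueish:
    def __init__(self):
        self._lock = threading.RLock()   # with threading.Lock() this would deadlock
        self._jobs = {"abc": {"status": "queued"}}

    def get_job_status(self, job_id):
        with self._lock:                 # re-acquired by the same thread
            return self._jobs[job_id]["status"]

    def list_jobs(self):
        with self._lock:                 # outer acquisition
            return {jid: self.get_job_status(jid) for jid in self._jobs}

print(Queueish().list_jobs())            # {'abc': 'queued'}
```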

- Refactor: Update test_phase2.py to use real test.mp3 file
  - Changed _create_test_audio_file() to return /home/uad/agents/tools/mcp-transcriptor/data/test.mp3
  - Removed specific text assertion, now just verifies transcription is not empty
  - Tests use tiny model for speed while processing real 6.95s audio file

- Update: Improve audio validation error handling in transcriber.py
  - Changed validate_audio_file() to use exception-based validation
  - Better error messages for API responses

- Add: Job queue configuration to startup scripts
  - Added JOB_QUEUE_MAX_SIZE, JOB_METADATA_DIR, JOB_RETENTION_DAYS env vars
  - Added GPU health monitoring configuration
  - Create job metadata directory on startup
2025-10-10 00:11:36 +03:00
Alihan
1292f0f09b Add GPU auto-reset, job queue, health monitoring, and test infrastructure
Major features:
- GPU auto-reset on CUDA errors with cooldown protection (handles sleep/wake)
- Async job queue system for long-running transcriptions
- Comprehensive GPU health monitoring with real model tests
- Phase 1 component testing with detailed logging

New modules:
- src/core/gpu_reset.py: GPU driver reset with 5-min cooldown
- src/core/gpu_health.py: Real GPU health checks using model inference
- src/core/job_queue.py: FIFO queue with background worker and persistence
- src/utils/test_audio_generator.py: Test audio generation for GPU checks
- test_phase1.py: Component tests with logging
- reset_gpu.sh: GPU driver reset script

Updates:
- CLAUDE.md: Added GPU auto-reset docs and passwordless sudo setup
- requirements.txt: Updated to PyTorch CUDA 12.4
- Model manager: Integrated GPU health check with reset
- Both servers: Added startup GPU validation with auto-reset
- Startup scripts: Added GPU_RESET_COOLDOWN_MINUTES env var
2025-10-09 23:13:11 +03:00
Alihan
e7a457e602 Refactor codebase structure with organized src/ directory
- Reorganize source code into src/ directory with logical subdirectories:
  - src/servers/: MCP and REST API server implementations
  - src/core/: Core business logic (transcriber, model_manager)
  - src/utils/: Utility modules (audio_processor, formatters)

- Update all import statements to use proper module paths
- Configure PYTHONPATH in startup scripts and Dockerfile
- Update documentation with new structure and paths
- Update pyproject.toml with package configuration
- Keep DevOps files (scripts, Dockerfile, configs) at root level

All functionality validated and working correctly.
2025-10-07 12:28:03 +03:00
35 changed files with 5149 additions and 696 deletions

334
CLAUDE.md

@@ -4,13 +4,21 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
## Overview
This is a Whisper-based speech recognition service that provides high-performance audio transcription using Faster Whisper. The service can run as either:
This is a Whisper-based speech recognition service that provides high-performance audio transcription using Faster Whisper. The service runs as either:
1. **MCP Server** - For integration with Claude Desktop and other MCP clients
2. **REST API Server** - For HTTP-based integrations
2. **REST API Server** - For HTTP-based integrations with async job queue support
Both servers share the same core transcription logic and can run independently or simultaneously on different ports.
**Key Features:**
- Async job queue system for long-running transcriptions (prevents HTTP timeouts)
- GPU health monitoring with strict failure detection (prevents silent CPU fallback)
- **Automatic GPU driver reset** on CUDA errors with cooldown protection (handles sleep/wake issues)
- Dual-server architecture (MCP + REST API)
- Model caching for fast repeated transcriptions
- Automatic batch size optimization based on GPU memory
## Development Commands
### Environment Setup
@@ -88,46 +96,93 @@ docker run --gpus all -v /path/to/models:/models -v /path/to/outputs:/outputs wh
## Architecture
### Directory Structure
```
.
├── src/ # Source code directory
│ ├── servers/ # Server implementations
│ │ ├── whisper_server.py # MCP server entry point
│ │ └── api_server.py # REST API server (async job queue)
│ ├── core/ # Core business logic
│ │ ├── transcriber.py # Transcription logic (single & batch)
│ │ ├── model_manager.py # Model lifecycle & caching
│ │ ├── job_queue.py # Async job queue manager
│ │ └── gpu_health.py # GPU health monitoring
│ └── utils/ # Utility modules
│ ├── audio_processor.py # Audio validation & preprocessing
│ ├── formatters.py # Output format conversion
│ └── test_audio_generator.py # Test audio generation for GPU checks
├── run_mcp_server.sh # MCP server startup script
├── run_api_server.sh # API server startup script
├── reset_gpu.sh # GPU driver reset script
├── DEV_PLAN.md # Development plan for async features
├── requirements.txt # Python dependencies
└── pyproject.toml # Project configuration
```
### Core Components
1. **whisper_server.py** - MCP server entry point
- Uses FastMCP framework to expose three MCP tools
- Delegates to transcriber.py for actual processing
1. **src/servers/whisper_server.py** - MCP server entry point
- Uses FastMCP framework to expose MCP tools
- Three main tools: `get_model_info_api()`, `transcribe()`, `batch_transcribe_audio()`
- Server initialization at line 19
2. **api_server.py** - REST API server entry point
- Uses FastAPI framework to expose HTTP endpoints
- Provides 5 REST endpoints: `/`, `/health`, `/models`, `/transcribe`, `/batch-transcribe`, `/upload-transcribe`
- Shares the same core transcription logic with MCP server
- Includes file upload support via multipart/form-data
2. **src/servers/api_server.py** - REST API server entry point
- Uses FastAPI framework for HTTP endpoints
- Provides REST endpoints: `/`, `/health`, `/models`, `/transcribe`, `/batch-transcribe`, `/upload-transcribe`
- Shares core transcription logic with MCP server
- File upload support via multipart/form-data
3. **transcriber.py** - Core transcription logic (shared by both servers)
- `transcribe_audio()` (line 38) - Single file transcription with environment variable support
- `batch_transcribe()` (line 208) - Batch processing with progress reporting
- All parameters support environment variable defaults
- Handles output formatting delegation to formatters.py
3. **src/core/transcriber.py** - Core transcription logic (shared by both servers)
- `transcribe_audio()`:39 - Single file transcription with environment variable support
- `batch_transcribe()`:209 - Batch processing with progress reporting
- All parameters support environment variable defaults (lines 21-37)
- Delegates output formatting to utils.formatters
4. **model_manager.py** - Whisper model lifecycle management
- `get_whisper_model()` (line 44) - Returns cached model instances or loads new ones
- `test_gpu_driver()` (line 20) - GPU validation before model loading
4. **src/core/model_manager.py** - Whisper model lifecycle management
- `get_whisper_model()`:44 - Returns cached model instances or loads new ones
- `test_gpu_driver()`:20 - GPU validation before model loading
- **CRITICAL**: GPU-only mode enforced at lines 64-90 (no CPU fallback)
- Global `model_instances` dict caches loaded models to prevent reloading
- Automatically determines batch size based on available GPU memory (lines 113-134)
- Automatic batch size optimization based on GPU memory (lines 134-147)
5. **audio_processor.py** - Audio file validation and preprocessing
- `validate_audio_file()` (line 15) - Checks file existence, format, and size
- `process_audio()` (line 50) - Decodes audio using faster_whisper's decode_audio
5. **src/core/job_queue.py** - Async job queue manager
- `JobQueue` class manages FIFO queue with background worker thread
- `submit_job()` - Validates audio, checks GPU health, adds to queue
- `get_job_status()` - Returns current job status and queue position
- `get_job_result()` - Returns transcription result for completed jobs
- Jobs persist to disk as JSON files for crash recovery
- Single worker thread processes jobs sequentially (prevents GPU contention)
6. **formatters.py** - Output format conversion
6. **src/core/gpu_health.py** - GPU health monitoring
- `check_gpu_health()`:39 - Real GPU test using tiny model + test audio
- `GPUHealthStatus` dataclass contains detailed GPU metrics
- **CRITICAL**: Raises RuntimeError if device=cuda but GPU fails (lines 99-135)
- Prevents silent CPU fallback that would cause 10-100x slowdown
- `HealthMonitor` class for periodic background monitoring
7. **src/utils/audio_processor.py** - Audio file validation and preprocessing
- `validate_audio_file()`:15 - Checks file existence, format, and size
- `process_audio()`:50 - Decodes audio using faster_whisper's decode_audio
8. **src/utils/formatters.py** - Output format conversion
- `format_vtt()`, `format_srt()`, `format_txt()`, `format_json()` - Convert segments to various formats
- All formatters accept segment lists from Whisper output
9. **src/utils/test_audio_generator.py** - Test audio generation
- `generate_test_audio()` - Creates synthetic 1-second audio for GPU health checks
- Uses numpy to generate sine wave, no external audio files needed
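A rough sketch of that kind of generator, assuming the clip is written as a WAV via `scipy.io.wavfile` (the actual module may write a different format):
```python
# Illustrative sine-wave test clip generator (assumes numpy/scipy;
# the project's utils/test_audio_generator.py may differ).
import numpy as np
from scipy.io import wavfile

def generate_test_audio(path="/tmp/whisper_health_check.wav",
                        duration_seconds=1.0, sample_rate=16000, freq_hz=440.0):
    t = np.linspace(0.0, duration_seconds,
                    int(sample_rate * duration_seconds), endpoint=False)
    tone = 0.3 * np.sin(2.0 * np.pi * freq_hz * t)      # quiet 440 Hz tone
    wavfile.write(path, sample_rate, (tone * 32767).astype(np.int16))
    return path
```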
### Key Architecture Patterns
- **Dual Server Architecture**: Both MCP and REST API servers import and use the same core modules (transcriber.py, model_manager.py, audio_processor.py, formatters.py), ensuring consistent behavior
- **Model Caching**: Models are cached in `model_instances` dictionary with key format `{model_name}_{device}_{compute_type}` (model_manager.py:84). This cache is shared if both servers run in the same process
- **Batch Processing**: CUDA devices automatically use BatchedInferencePipeline for performance (model_manager.py:109-134)
- **Environment Variable Configuration**: All transcription parameters support env var defaults (transcriber.py:19-36)
- **Device Auto-Detection**: `device="auto"` automatically selects CUDA if available, otherwise CPU (model_manager.py:64-66)
- **Dual Server Architecture**: Both MCP and REST API servers import and use the same core modules (core.transcriber, core.model_manager, utils.audio_processor, utils.formatters), ensuring consistent behavior
- **Model Caching**: Models are cached in the `model_instances` dictionary with key format `{model_name}_{device}_{compute_type}` (src/core/model_manager.py:104). This cache is shared if both servers run in the same process; a minimal sketch follows this list
- **Batch Processing**: CUDA devices automatically use BatchedInferencePipeline for performance (src/core/model_manager.py:132-160)
- **Environment Variable Configuration**: All transcription parameters support env var defaults (src/core/transcriber.py:21-37)
- **GPU-Only Mode**: Service is configured for GPU-only operation. `device="auto"` requires CUDA, `device="cpu"` is rejected (src/core/model_manager.py:64-90)
- **Async Job Queue**: Long-running transcriptions use async queue pattern to prevent HTTP timeouts. Jobs return immediately with job_id for polling
- **GPU Health Monitoring**: Real GPU tests with tiny model prevent silent CPU fallback. Jobs are rejected immediately if GPU fails rather than running 10-100x slower on CPU
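A minimal sketch of the model-caching pattern above; the real `src/core/model_manager.py` adds GPU health checks and batched inference on top of this:
```python
# Illustrative cache lookup keyed on {model_name}_{device}_{compute_type}.
from faster_whisper import WhisperModel

model_instances = {}

def get_whisper_model(model_name: str, device: str, compute_type: str) -> WhisperModel:
    key = f"{model_name}_{device}_{compute_type}"
    if key not in model_instances:
        # Loaded once, then reused by subsequent requests in the same process.
        model_instances[key] = WhisperModel(model_name, device=device,
                                            compute_type=compute_type)
    return model_instances[key]
```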
## Environment Variables
@@ -137,6 +192,22 @@ All configuration can be set via environment variables in run_mcp_server.sh and
- `API_HOST` - API server host (default: 0.0.0.0)
- `API_PORT` - API server port (default: 8000)
**Job Queue Configuration (if using async features):**
- `JOB_QUEUE_MAX_SIZE` - Maximum queue size (default: 100)
- `JOB_METADATA_DIR` - Directory for job metadata JSON files
- `JOB_RETENTION_DAYS` - Auto-cleanup old jobs (0=disabled)
**GPU Health Monitoring:**
- `GPU_HEALTH_CHECK_ENABLED` - Enable periodic GPU monitoring (true/false)
- `GPU_HEALTH_CHECK_INTERVAL_MINUTES` - Monitoring interval (default: 10)
- `GPU_HEALTH_TEST_MODEL` - Model for health checks (default: tiny)
**GPU Auto-Reset Configuration:**
- `GPU_RESET_COOLDOWN_MINUTES` - Minimum time between GPU reset attempts (default: 5 minutes)
- Prevents reset loops while allowing recovery from sleep/wake cycles
- Auto-reset is **enabled by default**
- Service terminates if GPU unavailable after reset attempt
**Transcription Configuration (shared by both servers):**
- `CUDA_VISIBLE_DEVICES` - GPU device selection
@@ -144,7 +215,7 @@ All configuration can be set via environment variables in run_mcp_server.sh and
- `TRANSCRIPTION_OUTPUT_DIR` - Default output directory for single transcriptions
- `TRANSCRIPTION_BATCH_OUTPUT_DIR` - Default output directory for batch processing
- `TRANSCRIPTION_MODEL` - Model size (tiny, base, small, medium, large-v1, large-v2, large-v3)
- `TRANSCRIPTION_DEVICE` - Execution device (cpu, cuda, auto)
- `TRANSCRIPTION_DEVICE` - Execution device (cuda, auto) - **NOTE: cpu is rejected in GPU-only mode**
- `TRANSCRIPTION_COMPUTE_TYPE` - Computation type (float16, int8, auto)
- `TRANSCRIPTION_OUTPUT_FORMAT` - Output format (vtt, srt, txt, json)
- `TRANSCRIPTION_BEAM_SIZE` - Beam search size (default: 5)
@@ -242,7 +313,7 @@ Upload an audio file and transcribe it immediately. Returns the transcription fi
# Get model information
curl http://localhost:8000/models
# Transcribe existing file
# Transcribe existing file (synchronous)
curl -X POST http://localhost:8000/transcribe \
-H "Content-Type: application/json" \
-d '{"audio_path": "/path/to/audio.mp3", "output_format": "txt"}'
@@ -252,14 +323,207 @@ curl -X POST http://localhost:8000/upload-transcribe \
-F "file=@audio.mp3" \
-F "output_format=txt" \
-F "model_name=large-v3"
# Async job queue (if enabled)
# Submit job
curl -X POST http://localhost:8000/jobs \
-H "Content-Type: application/json" \
-d '{"audio_path": "/path/to/audio.mp3"}'
# Returns: {"job_id": "abc-123", "status": "queued", "queue_position": 1}
# Check status
curl http://localhost:8000/jobs/abc-123
# Returns: {"status": "running", ...}
# Get result (when completed)
curl http://localhost:8000/jobs/abc-123/result
# Returns: transcription text
# Check GPU health
curl http://localhost:8000/health/gpu
# Returns: {"gpu_available": true, "gpu_working": true, ...}
```
## GPU Auto-Reset Configuration
### Overview
This service features automatic GPU driver reset on CUDA errors, which is especially useful for recovering from sleep/wake cycles. The reset functionality is **enabled by default** and includes cooldown protection to prevent reset loops.
### Passwordless Sudo Setup (Required)
For automatic GPU reset to work, you must configure passwordless sudo for NVIDIA commands. Create a sudoers configuration file:
```bash
sudo visudo -f /etc/sudoers.d/whisper-gpu-reset
```
Add the following (replace `your_username` with your actual username):
```
# Whisper GPU Auto-Reset Permissions
your_username ALL=(ALL) NOPASSWD: /bin/systemctl stop nvidia-persistenced
your_username ALL=(ALL) NOPASSWD: /bin/systemctl start nvidia-persistenced
your_username ALL=(ALL) NOPASSWD: /sbin/rmmod nvidia_uvm
your_username ALL=(ALL) NOPASSWD: /sbin/rmmod nvidia_drm
your_username ALL=(ALL) NOPASSWD: /sbin/rmmod nvidia_modeset
your_username ALL=(ALL) NOPASSWD: /sbin/rmmod nvidia
your_username ALL=(ALL) NOPASSWD: /sbin/modprobe nvidia
your_username ALL=(ALL) NOPASSWD: /sbin/modprobe nvidia_modeset
your_username ALL=(ALL) NOPASSWD: /sbin/modprobe nvidia_uvm
your_username ALL=(ALL) NOPASSWD: /sbin/modprobe nvidia_drm
```
**Security Note:** These permissions are limited to the specific NVIDIA driver commands listed above. The reset script (`reset_gpu.sh`) invokes those commands via sudo, but it is part of the codebase and can be audited.
### How It Works
1. **Startup Check**: When the service starts, it performs a GPU health check
- If CUDA errors detected → automatic reset attempt → retry
- If retry fails → service terminates
2. **Runtime Check**: Before job submission and model loading
- If CUDA errors detected → automatic reset attempt → retry
- If retry fails → job rejected, service continues
3. **Cooldown Protection**: Prevents reset loops
- Minimum 5 minutes between reset attempts (configurable via `GPU_RESET_COOLDOWN_MINUTES`)
- Cooldown persists across restarts (stored in `/tmp/whisper-gpu-last-reset`)
- If reset needed but cooldown active → service/job fails immediately
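A sketch of that cooldown gate, assuming the marker file simply stores a Unix timestamp (the actual `core/gpu_reset.py` may use a different format):
```python
# Illustrative cooldown check; assumes /tmp/whisper-gpu-last-reset holds a
# Unix timestamp written by the previous reset attempt.
import time

COOLDOWN_FILE = "/tmp/whisper-gpu-last-reset"

def can_attempt_reset(cooldown_minutes: float = 5.0) -> bool:
    try:
        with open(COOLDOWN_FILE) as f:
            last_reset = float(f.read().strip())
    except (FileNotFoundError, ValueError):
        return True                               # no prior reset recorded
    return (time.time() - last_reset) >= cooldown_minutes * 60

def record_reset_attempt() -> None:
    with open(COOLDOWN_FILE, "w") as f:           # persists across restarts
        f.write(str(time.time()))
```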
### Manual GPU Reset
You can manually reset the GPU anytime:
```bash
./reset_gpu.sh
```
Or clear the cooldown to allow immediate reset:
```python
from core.gpu_reset import clear_reset_cooldown
clear_reset_cooldown()
```
### Behavior Examples
**After sleep/wake with GPU issue:**
```
Service starts → GPU check fails (CUDA error)
→ Cooldown OK → Reset drivers → Wait 3s → Retry
→ Success → Service continues
```
**Multiple failures (hardware issue):**
```
First failure → Reset → Retry fails → Job fails
Second failure within 5 min → Cooldown active → Fail immediately
(Prevents reset loop)
```
**Normal operation:**
```
No CUDA errors → No resets → Normal performance
Reset only happens on actual CUDA failures
```
## Important Implementation Details
- GPU memory is checked before loading models (model_manager.py:115-127)
### GPU-Only Architecture
- **CRITICAL**: Service enforces GPU-only mode. CPU device is explicitly rejected (src/core/model_manager.py:84-90)
- `device="auto"` requires CUDA to be available, raises RuntimeError if not (src/core/model_manager.py:64-73)
- GPU health checks use real model loading + transcription, not just torch.cuda.is_available()
- If GPU health check fails, jobs are rejected immediately rather than silently falling back to CPU
- **GPU Auto-Reset**: Automatic driver reset on CUDA errors with 5-minute cooldown (handles sleep/wake issues)
### Model Management
- GPU memory is checked before loading models (src/core/model_manager.py:115-127)
- Batch size dynamically adjusts: 32 (>16GB), 16 (>12GB), 8 (>8GB), 4 (>4GB), 2 (otherwise); see the sketch after this list
- VAD (Voice Activity Detection) is enabled by default for better long-audio accuracy (transcriber.py:101)
- Word timestamps are enabled by default (transcriber.py:106)
- Model loading includes GPU driver test to fail fast if GPU is unavailable (model_manager.py:92)
- Files over 1GB generate warnings about processing time (audio_processor.py:42)
- Models are cached globally in `model_instances` dict, shared across requests
- Model loading includes GPU driver test to fail fast if GPU is unavailable (src/core/model_manager.py:112-114)
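The memory thresholds above amount to a simple selection function along these lines (illustrative; the exact logic lives in `src/core/model_manager.py`):
```python
# Illustrative batch-size selection from available GPU memory (GB),
# matching the documented thresholds.
def select_batch_size(gpu_memory_gb: float) -> int:
    if gpu_memory_gb > 16:
        return 32
    if gpu_memory_gb > 12:
        return 16
    if gpu_memory_gb > 8:
        return 8
    if gpu_memory_gb > 4:
        return 4
    return 2
```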
### Transcription Settings
- VAD (Voice Activity Detection) is enabled by default for better long-audio accuracy (src/core/transcriber.py:102)
- Word timestamps are enabled by default (src/core/transcriber.py:107)
- Files over 1GB generate warnings about processing time (src/utils/audio_processor.py:42)
- Default output format is "txt" for REST API, configured via environment variables for MCP server
### Async Job Queue (if enabled)
- Single worker thread processes jobs sequentially (prevents GPU memory contention)
- Jobs persist to disk as JSON files in JOB_METADATA_DIR
- Queue has max size limit (default 100), returns 503 when full
- Job status polling recommended every 5-10 seconds for LLM agents
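For example, a client could poll at the recommended interval roughly like this; it assumes the `requests` package and that the result endpoint returns plain text, as the curl examples above suggest:
```python
# Illustrative polling client for the async job endpoints.
import time
import requests

BASE = "http://localhost:8000"

def wait_for_job(job_id: str, poll_seconds: float = 5.0, timeout: float = 3600.0) -> str:
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = requests.get(f"{BASE}/jobs/{job_id}").json()["status"]
        if status == "completed":
            return requests.get(f"{BASE}/jobs/{job_id}/result").text
        if status == "failed":
            raise RuntimeError(f"Job {job_id} failed")
        time.sleep(poll_seconds)                  # poll every 5-10 seconds
    raise TimeoutError(f"Job {job_id} did not finish within {timeout}s")
```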
## Development Workflow
### Testing GPU Health
```python
# Test GPU health check manually
from src.core.gpu_health import check_gpu_health
status = check_gpu_health(expected_device="cuda")
print(f"GPU Working: {status.gpu_working}")
print(f"Device: {status.device_used}")
print(f"Test Duration: {status.test_duration_seconds}s")
# Expected: <1s for GPU, 3-10s for CPU
```
### Testing Job Queue
```python
# Test job queue manually
from src.core.job_queue import JobQueue
queue = JobQueue(max_queue_size=100, metadata_dir="/tmp/jobs")
queue.start()
# Submit job
job_info = queue.submit_job(
audio_path="/path/to/test.mp3",
model_name="large-v3",
device="cuda"
)
print(f"Job ID: {job_info['job_id']}")
# Poll status
status = queue.get_job_status(job_info['job_id'])
print(f"Status: {status['status']}")
# Get result when completed
result = queue.get_job_result(job_info['job_id'])
```
### Common Debugging
**Model loading issues:**
- Check `WHISPER_MODEL_DIR` is set correctly
- Verify GPU memory with `nvidia-smi`
- Check logs for GPU driver test failures at model_manager.py:112-114
**GPU not detected:**
- Verify `CUDA_VISIBLE_DEVICES` is set correctly
- Check `torch.cuda.is_available()` returns True
- Run GPU health check to see detailed error
**Silent failures:**
- Check that service is NOT silently falling back to CPU
- GPU health check should RAISE errors, not log warnings
- If device=cuda fails, the job should be rejected, not processed on CPU
**Job queue issues:**
- Check `JOB_METADATA_DIR` exists and is writable
- Verify background worker thread is running (check logs)
- Job metadata files are in {JOB_METADATA_DIR}/{job_id}.json
### File Locations
- **Logs**: `mcp.logs` (MCP server), `api.logs` (API server)
- **Models**: `$WHISPER_MODEL_DIR` or HuggingFace cache
- **Outputs**: `$TRANSCRIPTION_OUTPUT_DIR` or `$TRANSCRIPTION_BATCH_OUTPUT_DIR`
- **Job Metadata**: `$JOB_METADATA_DIR/{job_id}.json`
### Important Development Notes
- See `DEV_PLAN.md` for detailed architecture and implementation plan for async job queue features
- The service is designed for GPU-only operation - CPU fallback is intentionally disabled to prevent silent performance degradation
- When modifying model_manager.py, maintain the strict GPU-only enforcement
- When adding new endpoints, follow the async pattern if transcription time >30 seconds

Dockerfile

@@ -25,7 +25,7 @@ RUN python -m pip install --upgrade pip
WORKDIR /app
# Copy requirements first for better caching
COPY fast-whisper-mcp-server/requirements.txt .
COPY requirements.txt .
# Install Python dependencies with CUDA support
RUN pip install --no-cache-dir \
@@ -35,11 +35,16 @@ RUN pip install --no-cache-dir \
mcp[cli]
# Copy application code
COPY fast-whisper-mcp-server/ .
COPY src/ ./src/
COPY pyproject.toml .
COPY README.md .
# Create directories for models and outputs
RUN mkdir -p /models /outputs
# Set Python path
ENV PYTHONPATH=/app/src
# Set environment variables for GPU
ENV WHISPER_MODEL_DIR=/models
ENV TRANSCRIPTION_OUTPUT_DIR=/outputs
@@ -48,4 +53,4 @@ ENV TRANSCRIPTION_DEVICE=cuda
ENV TRANSCRIPTION_COMPUTE_TYPE=float16
# Run the server
CMD ["python", "whisper_server.py"]
CMD ["python", "src/servers/whisper_server.py"]

163
README.md

@@ -1,163 +0,0 @@
# Whisper Speech Recognition MCP Server
---
[Chinese documentation](README-CN.md)
---
A high-performance speech recognition MCP server based on Faster Whisper, providing efficient audio transcription capabilities.
## Features
- Integrated with Faster Whisper for efficient speech recognition
- Batch processing acceleration for improved transcription speed
- Automatic CUDA acceleration (if available)
- Support for multiple model sizes (tiny to large-v3)
- Output formats include VTT subtitles, SRT, and JSON
- Support for batch transcription of audio files in a folder
- Model instance caching to avoid repeated loading
- Dynamic batch size adjustment based on GPU memory
## Installation
### Dependencies
- Python 3.10+
- faster-whisper>=0.9.0
- torch==2.6.0+cu126
- torchaudio==2.6.0+cu126
- mcp[cli]>=1.2.0
### Installation Steps
1. Clone or download this repository
2. Create and activate a virtual environment (recommended)
3. Install dependencies:
```bash
pip install -r requirements.txt
```
### PyTorch Installation Guide
Install the appropriate version of PyTorch based on your CUDA version:
- CUDA 12.6:
```bash
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu126
```
- CUDA 12.1:
```bash
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
```
- CPU version:
```bash
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cpu
```
You can check your CUDA version with `nvcc --version` or `nvidia-smi`.
## Usage
### Starting the Server
On Windows, simply run `start_server.bat`.
On other platforms, run:
```bash
python whisper_server.py
```
### Configuring Claude Desktop
1. Open the Claude Desktop configuration file:
- Windows: `%APPDATA%\Claude\claude_desktop_config.json`
- macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
2. Add the Whisper server configuration:
```json
{
"mcpServers": {
"whisper": {
"command": "python",
"args": ["D:/path/to/whisper_server.py"],
"env": {}
}
}
}
```
3. Restart Claude Desktop
### Available Tools
The server provides the following tools:
1. **get_model_info** - Get information about available Whisper models
2. **transcribe** - Transcribe a single audio file
3. **batch_transcribe** - Batch transcribe audio files in a folder
## Performance Optimization Tips
- Using CUDA acceleration significantly improves transcription speed
- Batch processing mode is more efficient for large numbers of short audio files
- Batch size is automatically adjusted based on GPU memory size
- Using VAD (Voice Activity Detection) filtering improves accuracy for long audio
- Specifying the correct language can improve transcription quality
## Local Testing Methods
1. Use MCP Inspector for quick testing:
```bash
mcp dev whisper_server.py
```
2. Use Claude Desktop for integration testing
3. Use command line direct invocation (requires mcp[cli]):
```bash
mcp run whisper_server.py
```
## Error Handling
The server implements the following error handling mechanisms:
- Audio file existence check
- Model loading failure handling
- Transcription process exception catching
- GPU memory management
- Batch processing parameter adaptive adjustment
## Project Structure
- `whisper_server.py`: Main server code
- `model_manager.py`: Whisper model loading and caching
- `audio_processor.py`: Audio file validation and preprocessing
- `formatters.py`: Output formatting (VTT, SRT, JSON)
- `transcriber.py`: Core transcription logic
- `start_server.bat`: Windows startup script
## License
MIT
## Acknowledgements
This project was developed with the assistance of these amazing AI tools and models:
- [GitHub Copilot](https://github.com/features/copilot) - AI pair programmer
- [Trae](https://trae.ai/) - Agentic AI coding assistant
- [Cline](https://cline.ai/) - AI-powered terminal
- [DeepSeek](https://www.deepseek.com/) - Advanced AI model
- [Claude-3.7-Sonnet](https://www.anthropic.com/claude) - Anthropic's powerful AI assistant
- [Gemini-2.0-Flash](https://ai.google/gemini/) - Google's multimodal AI model
- [VS Code](https://code.visualstudio.com/) - Powerful code editor
- [Whisper](https://github.com/openai/whisper) - OpenAI's speech recognition model
- [Faster Whisper](https://github.com/guillaumekln/faster-whisper) - Optimized Whisper implementation
Special thanks to these incredible tools and the teams behind them.

__init__.py

@@ -1,5 +0,0 @@
"""
Speech recognition MCP service module
"""
__version__ = "0.1.0"

85
api.logs Normal file

@@ -0,0 +1,85 @@
INFO:__main__:======================================================================
INFO:__main__:PERFORMING STARTUP GPU HEALTH CHECK
INFO:__main__:======================================================================
INFO:faster_whisper:Processing audio with duration 00:01.512
INFO:faster_whisper:Detected language 'en' with probability 0.95
INFO:core.gpu_health:GPU health check passed: NVIDIA GeForce RTX 3060, test duration: 1.04s
INFO:__main__:======================================================================
INFO:__main__:STARTUP GPU CHECK SUCCESSFUL
INFO:__main__:GPU Device: NVIDIA GeForce RTX 3060
INFO:__main__:Memory Available: 11.66 GB
INFO:__main__:Test Duration: 1.04s
INFO:__main__:======================================================================
INFO:__main__:Starting Whisper REST API server on 0.0.0.0:8000
INFO: Started server process [69821]
INFO: Waiting for application startup.
INFO:__main__:Starting job queue and health monitor...
INFO:core.job_queue:Starting job queue (max size: 100)
INFO:core.job_queue:Loading jobs from /media/raid/agents/tools/mcp-transcriptor/outputs/jobs
INFO:core.job_queue:Loaded 8 jobs from disk
INFO:core.job_queue:Job queue worker loop started
INFO:core.job_queue:Job queue worker started
INFO:__main__:Job queue started (max_size=100, metadata_dir=/media/raid/agents/tools/mcp-transcriptor/outputs/jobs)
INFO:core.gpu_health:Starting GPU health monitor (interval: 10.0 minutes)
INFO:faster_whisper:Processing audio with duration 00:01.512
INFO:faster_whisper:Detected language 'en' with probability 0.95
INFO:core.gpu_health:GPU health check passed: NVIDIA GeForce RTX 3060, test duration: 0.37s
INFO:__main__:GPU health monitor started (interval=10 minutes)
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO: 127.0.0.1:48092 - "GET /jobs HTTP/1.1" 200 OK
INFO: 127.0.0.1:60874 - "GET /jobs?status=completed&limit=3 HTTP/1.1" 200 OK
INFO: 127.0.0.1:60876 - "GET /jobs?status=failed&limit=10 HTTP/1.1" 200 OK
INFO:core.job_queue:Running GPU health check before job submission
INFO:faster_whisper:Processing audio with duration 00:01.512
INFO:faster_whisper:Detected language 'en' with probability 0.95
INFO:core.gpu_health:GPU health check passed: NVIDIA GeForce RTX 3060, test duration: 0.39s
INFO:core.job_queue:GPU health check passed
INFO:core.job_queue:Job 6be8e49a-bdc1-4508-af99-280bef033cb0 submitted: /tmp/whisper_test_voice_1s.mp3 (queue position: 1)
INFO: 127.0.0.1:58376 - "POST /jobs HTTP/1.1" 200 OK
INFO:core.job_queue:Job 6be8e49a-bdc1-4508-af99-280bef033cb0 started processing
INFO:core.model_manager:Running GPU health check with auto-reset before model loading
INFO:faster_whisper:Processing audio with duration 00:01.512
INFO:faster_whisper:Detected language 'en' with probability 0.95
INFO:core.gpu_health:GPU health check passed: NVIDIA GeForce RTX 3060, test duration: 0.54s
INFO:core.model_manager:Loading Whisper model: tiny device: cuda compute type: float16
INFO:core.model_manager:Available GPU memory: 12.52 GB
INFO:core.model_manager:Enabling batch processing acceleration, batch size: 16
INFO:core.transcriber:Starting transcription of file: whisper_test_voice_1s.mp3
INFO:utils.audio_processor:Successfully preprocessed audio: whisper_test_voice_1s.mp3
INFO:core.transcriber:Using batch acceleration for transcription...
INFO:faster_whisper:Processing audio with duration 00:01.512
INFO:faster_whisper:VAD filter removed 00:00.000 of audio
INFO:faster_whisper:Detected language 'en' with probability 0.95
INFO:core.transcriber:Transcription completed, time used: 0.16 seconds, detected language: en, audio length: 1.51 seconds
INFO:core.transcriber:Transcription results saved to: /media/raid/agents/tools/mcp-transcriptor/outputs/whisper_test_voice_1s.txt
INFO:core.job_queue:Job 6be8e49a-bdc1-4508-af99-280bef033cb0 completed successfully: /media/raid/agents/tools/mcp-transcriptor/outputs/whisper_test_voice_1s.txt
INFO:core.job_queue:Job 6be8e49a-bdc1-4508-af99-280bef033cb0 finished: status=completed, duration=1.1s
INFO: 127.0.0.1:41646 - "GET /jobs/6be8e49a-bdc1-4508-af99-280bef033cb0 HTTP/1.1" 200 OK
INFO: 127.0.0.1:34046 - "GET /jobs/6be8e49a-bdc1-4508-af99-280bef033cb0/result HTTP/1.1" 200 OK
INFO:core.job_queue:Running GPU health check before job submission
INFO:faster_whisper:Processing audio with duration 00:01.512
INFO:faster_whisper:Detected language 'en' with probability 0.95
INFO:core.gpu_health:GPU health check passed: NVIDIA GeForce RTX 3060, test duration: 0.39s
INFO:core.job_queue:GPU health check passed
INFO:core.job_queue:Job 41ce74c0-8929-457b-96b3-1b8e4a720a7a submitted: /home/uad/agents/tools/mcp-transcriptor/data/test.mp3 (queue position: 1)
INFO: 127.0.0.1:44576 - "POST /jobs HTTP/1.1" 200 OK
INFO:core.job_queue:Job 41ce74c0-8929-457b-96b3-1b8e4a720a7a started processing
INFO:core.model_manager:Running GPU health check with auto-reset before model loading
INFO:faster_whisper:Processing audio with duration 00:01.512
INFO:faster_whisper:Detected language 'en' with probability 0.95
INFO:core.gpu_health:GPU health check passed: NVIDIA GeForce RTX 3060, test duration: 0.39s
INFO:core.model_manager:Loading Whisper model: large-v3 device: cuda compute type: float16
INFO:core.model_manager:Available GPU memory: 12.52 GB
INFO:core.model_manager:Enabling batch processing acceleration, batch size: 16
INFO:core.transcriber:Starting transcription of file: test.mp3
INFO:utils.audio_processor:Successfully preprocessed audio: test.mp3
INFO:core.transcriber:Using batch acceleration for transcription...
INFO:faster_whisper:Processing audio with duration 00:06.955
INFO:faster_whisper:VAD filter removed 00:00.299 of audio
INFO:core.transcriber:Transcription completed, time used: 0.52 seconds, detected language: en, audio length: 6.95 seconds
INFO:core.transcriber:Transcription results saved to: /media/raid/agents/tools/mcp-transcriptor/outputs/test.txt
INFO:core.job_queue:Job 41ce74c0-8929-457b-96b3-1b8e4a720a7a completed successfully: /media/raid/agents/tools/mcp-transcriptor/outputs/test.txt
INFO:core.job_queue:Job 41ce74c0-8929-457b-96b3-1b8e4a720a7a finished: status=completed, duration=23.3s
INFO: 127.0.0.1:59120 - "GET /jobs/41ce74c0-8929-457b-96b3-1b8e4a720a7a HTTP/1.1" 200 OK
INFO: 127.0.0.1:53806 - "GET /jobs/41ce74c0-8929-457b-96b3-1b8e4a720a7a/result HTTP/1.1" 200 OK

api_server.py

@@ -1,286 +0,0 @@
#!/usr/bin/env python3
"""
FastAPI REST API Server for Whisper Transcription
Provides HTTP REST endpoints for audio transcription
"""
import os
import logging
from typing import Optional
from fastapi import FastAPI, HTTPException, UploadFile, File, Form
from fastapi.responses import JSONResponse, FileResponse
from pydantic import BaseModel, Field
import json
from model_manager import get_model_info
from transcriber import transcribe_audio, batch_transcribe
# Logging configuration
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
# Create FastAPI app
app = FastAPI(
title="Whisper Speech Recognition API",
description="High-performance audio transcription API based on Faster Whisper",
version="0.1.1"
)
# Request/Response Models
class TranscribeRequest(BaseModel):
audio_path: str = Field(..., description="Path to the audio file on the server")
model_name: str = Field("large-v3", description="Whisper model name")
device: str = Field("auto", description="Execution device (cpu, cuda, auto)")
compute_type: str = Field("auto", description="Computation type (float16, int8, auto)")
language: Optional[str] = Field(None, description="Language code (zh, en, ja, etc.)")
output_format: str = Field("txt", description="Output format (vtt, srt, json, txt)")
beam_size: int = Field(5, description="Beam search size")
temperature: float = Field(0.0, description="Sampling temperature")
initial_prompt: Optional[str] = Field(None, description="Initial prompt text")
output_directory: Optional[str] = Field(None, description="Output directory path")
class BatchTranscribeRequest(BaseModel):
audio_folder: str = Field(..., description="Path to folder containing audio files")
output_folder: Optional[str] = Field(None, description="Output folder path")
model_name: str = Field("large-v3", description="Whisper model name")
device: str = Field("auto", description="Execution device (cpu, cuda, auto)")
compute_type: str = Field("auto", description="Computation type (float16, int8, auto)")
language: Optional[str] = Field(None, description="Language code (zh, en, ja, etc.)")
output_format: str = Field("txt", description="Output format (vtt, srt, json, txt)")
beam_size: int = Field(5, description="Beam search size")
temperature: float = Field(0.0, description="Sampling temperature")
initial_prompt: Optional[str] = Field(None, description="Initial prompt text")
parallel_files: int = Field(1, description="Number of files to process in parallel")
class TranscribeResponse(BaseModel):
success: bool
message: str
output_path: Optional[str] = None
class BatchTranscribeResponse(BaseModel):
success: bool
summary: str
# API Endpoints
@app.get("/")
async def root():
"""Root endpoint with API information"""
return {
"name": "Whisper Speech Recognition API",
"version": "0.1.1",
"endpoints": {
"GET /health": "Health check",
"GET /models": "Get available models information",
"POST /transcribe": "Transcribe a single audio file",
"POST /batch-transcribe": "Batch transcribe audio files",
"POST /upload-transcribe": "Upload and transcribe audio file"
}
}
@app.get("/health")
async def health_check():
"""Health check endpoint"""
return {"status": "healthy", "service": "whisper-transcription"}
@app.get("/models")
async def get_models():
"""Get available Whisper models and configuration information"""
try:
model_info = get_model_info()
return JSONResponse(content=json.loads(model_info))
except Exception as e:
logger.error(f"Failed to get model info: {str(e)}")
raise HTTPException(status_code=500, detail=f"Failed to get model info: {str(e)}")
@app.post("/transcribe", response_model=TranscribeResponse)
async def transcribe(request: TranscribeRequest):
"""
Transcribe a single audio file
The audio file must already exist on the server at the specified path.
"""
try:
logger.info(f"Received transcription request for: {request.audio_path}")
result = transcribe_audio(
audio_path=request.audio_path,
model_name=request.model_name,
device=request.device,
compute_type=request.compute_type,
language=request.language,
output_format=request.output_format,
beam_size=request.beam_size,
temperature=request.temperature,
initial_prompt=request.initial_prompt,
output_directory=request.output_directory
)
# Parse result to determine success
if result.startswith("Error") or "failed" in result.lower():
return TranscribeResponse(
success=False,
message=result,
output_path=None
)
# Extract output path from success message
output_path = None
if "saved to:" in result:
output_path = result.split("saved to:")[1].strip()
return TranscribeResponse(
success=True,
message=result,
output_path=output_path
)
except Exception as e:
logger.error(f"Transcription failed: {str(e)}")
raise HTTPException(status_code=500, detail=f"Transcription failed: {str(e)}")
@app.post("/batch-transcribe", response_model=BatchTranscribeResponse)
async def batch_transcribe_endpoint(request: BatchTranscribeRequest):
"""
Batch transcribe all audio files in a folder
Processes all supported audio files in the specified folder.
"""
try:
logger.info(f"Received batch transcription request for: {request.audio_folder}")
result = batch_transcribe(
audio_folder=request.audio_folder,
output_folder=request.output_folder,
model_name=request.model_name,
device=request.device,
compute_type=request.compute_type,
language=request.language,
output_format=request.output_format,
beam_size=request.beam_size,
temperature=request.temperature,
initial_prompt=request.initial_prompt,
parallel_files=request.parallel_files
)
# Check if there were errors
success = not result.startswith("Error")
return BatchTranscribeResponse(
success=success,
summary=result
)
except Exception as e:
logger.error(f"Batch transcription failed: {str(e)}")
raise HTTPException(status_code=500, detail=f"Batch transcription failed: {str(e)}")
@app.post("/upload-transcribe")
async def upload_and_transcribe(
file: UploadFile = File(...),
model_name: str = Form("large-v3"),
device: str = Form("auto"),
compute_type: str = Form("auto"),
language: Optional[str] = Form(None),
output_format: str = Form("txt"),
beam_size: int = Form(5),
temperature: float = Form(0.0),
initial_prompt: Optional[str] = Form(None)
):
"""
Upload an audio file and transcribe it
This endpoint accepts file uploads via multipart/form-data.
"""
import tempfile
import shutil
try:
# Create temporary directory for upload
temp_dir = tempfile.mkdtemp(prefix="whisper_upload_")
# Save uploaded file
file_ext = os.path.splitext(file.filename)[1]
temp_audio_path = os.path.join(temp_dir, f"upload{file_ext}")
with open(temp_audio_path, "wb") as buffer:
shutil.copyfileobj(file.file, buffer)
logger.info(f"Uploaded file saved to: {temp_audio_path}")
# Transcribe the uploaded file
result = transcribe_audio(
audio_path=temp_audio_path,
model_name=model_name,
device=device,
compute_type=compute_type,
language=language,
output_format=output_format,
beam_size=beam_size,
temperature=temperature,
initial_prompt=initial_prompt,
output_directory=temp_dir
)
# Parse result
if result.startswith("Error") or "failed" in result.lower():
# Clean up temp files
shutil.rmtree(temp_dir, ignore_errors=True)
raise HTTPException(status_code=500, detail=result)
# Extract output path
output_path = None
if "saved to:" in result:
output_path = result.split("saved to:")[1].strip()
# Return the transcription file
if output_path and os.path.exists(output_path):
return FileResponse(
output_path,
media_type="text/plain",
filename=os.path.basename(output_path),
background=None # Don't delete yet, we'll clean up after
)
else:
# Clean up temp files
shutil.rmtree(temp_dir, ignore_errors=True)
return JSONResponse(content={
"success": True,
"message": result
})
except Exception as e:
logger.error(f"Upload and transcribe failed: {str(e)}")
# Clean up temp files on error
if 'temp_dir' in locals():
shutil.rmtree(temp_dir, ignore_errors=True)
raise HTTPException(status_code=500, detail=f"Upload and transcribe failed: {str(e)}")
finally:
await file.close()
if __name__ == "__main__":
import uvicorn
# Get configuration from environment variables
host = os.getenv("API_HOST", "0.0.0.0")
port = int(os.getenv("API_PORT", "8000"))
logger.info(f"Starting Whisper REST API server on {host}:{port}")
uvicorn.run(
app,
host=host,
port=port,
log_level="info"
)

audio_processor.py

@@ -1,67 +0,0 @@
#!/usr/bin/env python3
"""
Audio Processing Module
Responsible for audio file validation and preprocessing
"""
import os
import logging
from typing import Union, Any
from faster_whisper import decode_audio
# Log configuration
logger = logging.getLogger(__name__)
def validate_audio_file(audio_path: str) -> str:
"""
Validate if an audio file is valid
Args:
audio_path: Path to the audio file
Returns:
str: Validation result, "ok" indicates validation passed, otherwise returns error message
"""
# Validate parameters
if not os.path.exists(audio_path):
return f"Error: Audio file does not exist: {audio_path}"
# Validate file format
supported_formats = [".mp3", ".wav", ".m4a", ".flac", ".ogg", ".aac"]
file_ext = os.path.splitext(audio_path)[1].lower()
if file_ext not in supported_formats:
return f"Error: Unsupported audio format: {file_ext}. Supported formats: {', '.join(supported_formats)}"
# Validate file size
try:
file_size = os.path.getsize(audio_path)
if file_size == 0:
return f"Error: Audio file is empty: {audio_path}"
# Warning for large files (over 1GB)
if file_size > 1024 * 1024 * 1024:
logger.warning(f"Warning: File size exceeds 1GB, may require longer processing time: {audio_path}")
except Exception as e:
logger.error(f"Failed to check file size: {str(e)}")
return f"Error: Failed to check file size: {str(e)}"
return "ok"
def process_audio(audio_path: str) -> Union[str, Any]:
"""
Process audio file, perform decoding and preprocessing
Args:
audio_path: Path to the audio file
Returns:
Union[str, Any]: Processed audio data or original file path
"""
# Try to preprocess audio using decode_audio to handle more formats
try:
audio_data = decode_audio(audio_path)
logger.info(f"Successfully preprocessed audio: {os.path.basename(audio_path)}")
return audio_data
except Exception as audio_error:
logger.warning(f"Audio preprocessing failed, will use file path directly: {str(audio_error)}")
return audio_path

25
mcp.logs Normal file

@@ -0,0 +1,25 @@
starting mcp server for whisper stt transcriptor
INFO:__main__:======================================================================
INFO:__main__:PERFORMING STARTUP GPU HEALTH CHECK
INFO:__main__:======================================================================
INFO:faster_whisper:Processing audio with duration 00:01.512
INFO:faster_whisper:Detected language 'en' with probability 0.95
INFO:core.gpu_health:GPU health check passed: NVIDIA GeForce RTX 3060, test duration: 0.93s
INFO:__main__:======================================================================
INFO:__main__:STARTUP GPU CHECK SUCCESSFUL
INFO:__main__:GPU Device: NVIDIA GeForce RTX 3060
INFO:__main__:Memory Available: 11.66 GB
INFO:__main__:Test Duration: 0.93s
INFO:__main__:======================================================================
INFO:__main__:Initializing job queue...
INFO:core.job_queue:Starting job queue (max size: 100)
INFO:core.job_queue:Loading jobs from /media/raid/agents/tools/mcp-transcriptor/outputs/jobs
INFO:core.job_queue:Loaded 5 jobs from disk
INFO:core.job_queue:Job queue worker loop started
INFO:core.job_queue:Job queue worker started
INFO:__main__:Job queue started (max_size=100, metadata_dir=/media/raid/agents/tools/mcp-transcriptor/outputs/jobs)
INFO:core.gpu_health:Starting GPU health monitor (interval: 10.0 minutes)
INFO:faster_whisper:Processing audio with duration 00:01.512
INFO:faster_whisper:Detected language 'en' with probability 0.95
INFO:core.gpu_health:GPU health check passed: NVIDIA GeForce RTX 3060, test duration: 0.38s
INFO:__main__:GPU health monitor started (interval=10 minutes)

pyproject.toml

@@ -1,9 +1,12 @@
[project]
name = "fast-whisper-mcp-server"
version = "0.1.0"
description = "Add your description here"
version = "0.1.1"
description = "High-performance speech recognition service with MCP and REST API servers"
readme = "README.md"
requires-python = ">=3.12"
dependencies = [
"faster-whisper>=1.1.1",
]
[tool.setuptools]
packages = ["src"]

requirements.txt

@@ -1,9 +1,9 @@
# uv pip install -r ./requirements.txt --index-url https://download.pytorch.org/whl/cu126
# uv pip install -r ./requirements.txt --index-url https://download.pytorch.org/whl/cu124
faster-whisper
torch #==2.6.0+cu126
torchaudio #==2.6.0+cu126
torch #==2.6.0+cu124
torchaudio #==2.6.0+cu124
# uv pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu126
# uv pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
# pip install faster-whisper>=0.9.0
# pip install mcp[cli]>=1.2.0
mcp[cli]
@@ -13,12 +13,21 @@ fastapi>=0.115.0
uvicorn[standard]>=0.32.0
python-multipart>=0.0.9
# Test audio generation dependencies
gTTS>=2.3.0
pyttsx3>=2.90
scipy>=1.10.0
numpy>=1.24.0
# PyTorch Installation Guide:
# Please install the appropriate version of PyTorch based on your CUDA version:
#
# • CUDA 12.6:
# pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu126
#
# • CUDA 12.4:
# pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
#
# • CUDA 12.1:
# pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
#

70
reset_gpu.sh Executable file

@@ -0,0 +1,70 @@
#!/bin/bash
# Script to reset NVIDIA GPU drivers without rebooting
# This reloads kernel modules and restarts nvidia-persistenced service
echo "============================================================"
echo "NVIDIA GPU Driver Reset Script"
echo "============================================================"
echo ""
# Stop nvidia-persistenced service
echo "Stopping nvidia-persistenced service..."
sudo systemctl stop nvidia-persistenced
if [ $? -eq 0 ]; then
echo "✓ nvidia-persistenced stopped"
else
echo "✗ Failed to stop nvidia-persistenced"
exit 1
fi
echo ""
# Unload NVIDIA kernel modules (in correct order)
echo "Unloading NVIDIA kernel modules..."
sudo rmmod nvidia_uvm 2>/dev/null && echo "✓ nvidia_uvm unloaded" || echo " nvidia_uvm not loaded or failed to unload"
sudo rmmod nvidia_drm 2>/dev/null && echo "✓ nvidia_drm unloaded" || echo " nvidia_drm not loaded or failed to unload"
sudo rmmod nvidia_modeset 2>/dev/null && echo "✓ nvidia_modeset unloaded" || echo " nvidia_modeset not loaded or failed to unload"
sudo rmmod nvidia 2>/dev/null && echo "✓ nvidia unloaded" || echo " nvidia not loaded or failed to unload"
echo ""
# Small delay to ensure clean unload
sleep 1
# Reload NVIDIA kernel modules (in correct order)
echo "Loading NVIDIA kernel modules..."
sudo modprobe nvidia && echo "✓ nvidia loaded" || { echo "✗ Failed to load nvidia"; exit 1; }
sudo modprobe nvidia_modeset && echo "✓ nvidia_modeset loaded" || { echo "✗ Failed to load nvidia_modeset"; exit 1; }
sudo modprobe nvidia_uvm && echo "✓ nvidia_uvm loaded" || { echo "✗ Failed to load nvidia_uvm"; exit 1; }
sudo modprobe nvidia_drm && echo "✓ nvidia_drm loaded" || { echo "✗ Failed to load nvidia_drm"; exit 1; }
echo ""
# Restart nvidia-persistenced service
echo "Starting nvidia-persistenced service..."
sudo systemctl start nvidia-persistenced
if [ $? -eq 0 ]; then
echo "✓ nvidia-persistenced started"
else
echo "✗ Failed to start nvidia-persistenced"
exit 1
fi
echo ""
# Verify GPU is accessible
echo "Verifying GPU accessibility..."
if command -v nvidia-smi &> /dev/null; then
nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
if [ $? -eq 0 ]; then
echo "✓ GPU reset successful"
else
echo "✗ GPU not accessible"
exit 1
fi
else
echo "✗ nvidia-smi not found"
exit 1
fi
echo ""
echo "============================================================"
echo "GPU driver reset completed successfully"
echo "============================================================"

run_api_server.sh

@@ -5,6 +5,12 @@ datetime_prefix() {
date "+[%Y-%m-%d %H:%M:%S]"
}
# Set Python path to include src directory
export PYTHONPATH="/home/uad/agents/tools/mcp-transcriptor/src:$PYTHONPATH"
# Set CUDA library path
export LD_LIBRARY_PATH=/usr/local/cuda-12.4/targets/x86_64-linux/lib:$LD_LIBRARY_PATH
# Set environment variables
export CUDA_VISIBLE_DEVICES=1
export WHISPER_MODEL_DIR="/home/uad/agents/tools/mcp-transcriptor/data/models"
@@ -23,6 +29,19 @@ export TRANSCRIPTION_FILENAME_PREFIX=""
export API_HOST="0.0.0.0"
export API_PORT="8000"
# GPU Auto-Reset Configuration
export GPU_RESET_COOLDOWN_MINUTES=5 # Minimum time between GPU reset attempts
# Job Queue Configuration
export JOB_QUEUE_MAX_SIZE=100
export JOB_METADATA_DIR="/media/raid/agents/tools/mcp-transcriptor/outputs/jobs"
export JOB_RETENTION_DAYS=7
# GPU Health Monitoring
export GPU_HEALTH_CHECK_ENABLED=true
export GPU_HEALTH_CHECK_INTERVAL_MINUTES=10
export GPU_HEALTH_TEST_MODEL="tiny"
# Log start of the script
echo "$(datetime_prefix) Starting Whisper REST API server..."
echo "$(datetime_prefix) Model directory: $WHISPER_MODEL_DIR"
@@ -37,6 +56,7 @@ fi
# Ensure output directories exist
mkdir -p "$TRANSCRIPTION_OUTPUT_DIR"
mkdir -p "$TRANSCRIPTION_BATCH_OUTPUT_DIR"
mkdir -p "$JOB_METADATA_DIR"
# Run the API server
/home/uad/agents/tools/mcp-transcriptor/venv/bin/python -u /home/uad/agents/tools/mcp-transcriptor/api_server.py 2>&1 | tee /home/uad/agents/tools/mcp-transcriptor/api.logs
/home/uad/agents/tools/mcp-transcriptor/venv/bin/python -u /home/uad/agents/tools/mcp-transcriptor/src/servers/api_server.py 2>&1 | tee /home/uad/agents/tools/mcp-transcriptor/api.logs

run_mcp_server.sh

@@ -9,6 +9,12 @@ datetime_prefix() {
USER_ID=$(id -u)
GROUP_ID=$(id -g)
# Set Python path to include src directory
export PYTHONPATH="/home/uad/agents/tools/mcp-transcriptor/src:$PYTHONPATH"
# Set CUDA library path
export LD_LIBRARY_PATH=/usr/local/cuda-12.4/targets/x86_64-linux/lib:$LD_LIBRARY_PATH
# Set environment variables
export CUDA_VISIBLE_DEVICES=1
export WHISPER_MODEL_DIR="/home/uad/agents/tools/mcp-transcriptor/data/models"
@@ -23,6 +29,19 @@ export TRANSCRIPTION_TEMPERATURE="0.0"
export TRANSCRIPTION_USE_TIMESTAMP="false"
export TRANSCRIPTION_FILENAME_PREFIX="test_"
# GPU Auto-Reset Configuration
export GPU_RESET_COOLDOWN_MINUTES=5 # Minimum time between GPU reset attempts
# Job Queue Configuration
export JOB_QUEUE_MAX_SIZE=100
export JOB_METADATA_DIR="/media/raid/agents/tools/mcp-transcriptor/outputs/jobs"
export JOB_RETENTION_DAYS=7
# GPU Health Monitoring
export GPU_HEALTH_CHECK_ENABLED=true
export GPU_HEALTH_CHECK_INTERVAL_MINUTES=10
export GPU_HEALTH_TEST_MODEL="tiny"
# Log start of the script
echo "$(datetime_prefix) Starting whisper server script..."
echo "test: $WHISPER_MODEL_DIR"
@@ -33,7 +52,10 @@ if [ ! -d "$WHISPER_MODEL_DIR" ]; then
exit 1
fi
# Ensure job metadata directory exists
mkdir -p "$JOB_METADATA_DIR"
# Run the Python script with the defined environment variables
#/home/uad/agents/tools/mcp-transcriptor/venv/bin/python /home/uad/agents/tools/mcp-transcriptor/whisper_server.py 2>&1 | tee /home/uad/agents/tools/mcp-transcriptor/mcp.logs
/home/uad/agents/tools/mcp-transcriptor/venv/bin/python -u /home/uad/agents/tools/mcp-transcriptor/whisper_server.py 2>&1 | tee /home/uad/agents/tools/mcp-transcriptor/mcp.logs
/home/uad/agents/tools/mcp-transcriptor/venv/bin/python -u /home/uad/agents/tools/mcp-transcriptor/src/servers/whisper_server.py 2>&1 | tee /home/uad/agents/tools/mcp-transcriptor/mcp.logs

9
src/__init__.py Normal file

@@ -0,0 +1,9 @@
"""
Faster Whisper MCP Transcription Service
High-performance audio transcription service with dual-server architecture
(MCP and REST API) featuring async job queue and GPU health monitoring.
"""
__version__ = "0.2.0"
__author__ = "Whisper MCP Team"

6
src/core/__init__.py Normal file

@@ -0,0 +1,6 @@
"""
Core modules for Whisper transcription service.
Includes model management, transcription logic, job queue, GPU health monitoring,
and GPU reset functionality.
"""

467
src/core/gpu_health.py Normal file

@@ -0,0 +1,467 @@
"""
GPU health monitoring for Whisper transcription service.
Performs real GPU health checks using actual model loading and transcription,
with strict failure handling to prevent silent CPU fallbacks.
Includes circuit breaker pattern to prevent repeated failed checks.
"""
import time
import logging
import threading
from dataclasses import dataclass, asdict
from datetime import datetime
from typing import Optional, List
import torch
from utils.test_audio_generator import generate_test_audio
from utils.circuit_breaker import CircuitBreaker, CircuitBreakerOpen
logger = logging.getLogger(__name__)
# Global circuit breaker for GPU health checks
_gpu_health_circuit_breaker = CircuitBreaker(
name="gpu_health_check",
failure_threshold=3, # Open after 3 consecutive failures
success_threshold=2, # Close after 2 consecutive successes
timeout_seconds=60, # Try again after 60 seconds
half_open_max_calls=1 # Only 1 test call in half-open state
)
# Import reset functionality (after logger initialization)
try:
from core.gpu_reset import (
reset_gpu_drivers,
can_attempt_reset,
record_reset_attempt
)
GPU_RESET_AVAILABLE = True
except ImportError as e:
logger.warning(f"GPU reset functionality not available: {e}")
GPU_RESET_AVAILABLE = False
@dataclass
class GPUHealthStatus:
"""Data class for GPU health check results."""
gpu_available: bool # torch.cuda.is_available()
gpu_working: bool # Model actually loaded and ran on GPU
device_used: str # "cuda" or "cpu"
device_name: str # GPU name if available
memory_total_gb: float # Total GPU memory
memory_available_gb: float # Available GPU memory
test_duration_seconds: float # How long test took
timestamp: str # ISO timestamp
error: Optional[str] = None # Error message if any
def to_dict(self) -> dict:
"""Convert to dictionary for JSON serialization."""
return asdict(self)
def _check_gpu_health_internal(expected_device: str = "auto") -> GPUHealthStatus:
"""
Comprehensive GPU health check using real model + transcription.
Args:
expected_device: Expected device ("auto", "cuda", "cpu")
Returns:
GPUHealthStatus object
Raises:
RuntimeError: If expected_device="cuda" but GPU test fails
Implementation:
1. Generate test audio (1 second)
2. Load tiny model with requested device
3. Transcribe test audio
4. Time the operation
5. Verify model actually ran on GPU (check torch.cuda.memory_allocated)
6. CRITICAL: If expected_device="cuda" but used="cpu" → raise RuntimeError
7. Return detailed status
Performance Expectations:
- GPU (tiny model): 0.3-1.0 seconds
- CPU (tiny model): 3-10 seconds
- If GPU test takes >2 seconds, likely running on CPU
"""
timestamp = datetime.utcnow().isoformat() + "Z"
start_time = time.time()
# Initialize status with defaults
gpu_available = torch.cuda.is_available()
device_name = ""
memory_total_gb = 0.0
memory_available_gb = 0.0
actual_device = "cpu"
gpu_working = False
error_msg = None
# Get GPU info if available
if gpu_available:
try:
device_name = torch.cuda.get_device_name(0)
memory_total_gb = torch.cuda.get_device_properties(0).total_memory / (1024**3)
memory_available_gb = (torch.cuda.get_device_properties(0).total_memory -
torch.cuda.memory_allocated(0)) / (1024**3)
except Exception as e:
logger.warning(f"Failed to get GPU info: {e}")
try:
# Generate test audio
test_audio_path = generate_test_audio(duration_seconds=1.0)
# Import here to avoid circular dependencies
from faster_whisper import WhisperModel
# Determine device to use
test_device = "cpu"
test_compute_type = "int8"
if expected_device == "cuda" or (expected_device == "auto" and gpu_available):
test_device = "cuda"
test_compute_type = "float16"
# Record GPU memory before loading model
gpu_memory_before = 0
if torch.cuda.is_available():
gpu_memory_before = torch.cuda.memory_allocated(0)
# Load tiny model and transcribe
try:
model = WhisperModel(
"tiny",
device=test_device,
compute_type=test_compute_type
)
# Transcribe test audio
segments, info = model.transcribe(test_audio_path, beam_size=1)
# Consume segments (needed to actually run inference)
list(segments)
# Check if GPU was actually used
# faster-whisper uses CTranslate2 which manages GPU memory separately
# from PyTorch, so we check model.model.device instead of torch memory
actual_device = model.model.device
if actual_device == "cuda":
gpu_working = True
else:
gpu_working = False
except Exception as e:
error_msg = f"Model loading/inference failed: {str(e)}"
actual_device = "cpu"
gpu_working = False
logger.error(f"GPU health check failed: {error_msg}")
except Exception as e:
error_msg = f"Health check setup failed: {str(e)}"
logger.error(f"GPU health check error: {error_msg}")
# Calculate test duration
test_duration = time.time() - start_time
# Create status object
status = GPUHealthStatus(
gpu_available=gpu_available,
gpu_working=gpu_working,
device_used=actual_device,
device_name=device_name,
memory_total_gb=round(memory_total_gb, 2),
memory_available_gb=round(memory_available_gb, 2),
test_duration_seconds=round(test_duration, 2),
timestamp=timestamp,
error=error_msg
)
# CRITICAL: Reject if expected CUDA but got CPU
# This service is GPU-only, so also reject device="auto" that falls back to CPU
if expected_device == "cuda" and actual_device == "cpu":
error_message = (
f"GPU device requested but model loaded on CPU. "
f"This indicates GPU driver issues or insufficient memory. "
f"Transcription would be 10-100x slower than expected. "
f"Please check CUDA installation and GPU availability. "
f"Details: {error_msg or 'Unknown cause'}"
)
logger.error(error_message)
raise RuntimeError(error_message)
# CRITICAL: Service requires GPU - reject CPU fallback even with device="auto"
if expected_device == "auto" and actual_device == "cpu":
error_message = (
f"GPU required but not available. This service is configured for GPU-only operation. "
f"CUDA is not available or GPU health check failed. "
f"Please ensure CUDA is properly installed and GPU is accessible. "
f"Details: {error_msg or 'CUDA not available'}"
)
logger.error(error_message)
raise RuntimeError(error_message)
# Log health check result
if gpu_working:
logger.info(
f"GPU health check passed: {device_name}, "
f"test duration: {test_duration:.2f}s"
)
elif gpu_available and not gpu_working:
logger.warning(
f"GPU available but health check failed. "
f"Test duration: {test_duration:.2f}s, Error: {error_msg}"
)
else:
logger.info(f"GPU not available, using CPU. Test duration: {test_duration:.2f}s")
return status
def check_gpu_health(expected_device: str = "auto", use_circuit_breaker: bool = True) -> GPUHealthStatus:
"""
GPU health check with optional circuit breaker protection.
This is the main entry point for GPU health checks. By default, it uses
circuit breaker pattern to prevent repeated failed checks.
Args:
expected_device: Expected device ("auto", "cuda", "cpu")
use_circuit_breaker: Enable circuit breaker protection (default: True)
Returns:
GPUHealthStatus object
Raises:
RuntimeError: If expected_device="cuda" but GPU test fails
CircuitBreakerOpen: If circuit breaker is open (too many recent failures)
"""
if use_circuit_breaker:
try:
return _gpu_health_circuit_breaker.call(_check_gpu_health_internal, expected_device)
except CircuitBreakerOpen as e:
# Circuit is open, fail fast without attempting check
logger.warning(f"GPU health check circuit breaker is OPEN: {e}")
raise RuntimeError(f"GPU health check unavailable: {str(e)}")
else:
return _check_gpu_health_internal(expected_device)
def get_circuit_breaker_stats() -> dict:
"""
Get current circuit breaker statistics.
Returns:
Dictionary with circuit state and failure/success counts
"""
return _gpu_health_circuit_breaker.get_stats()
def reset_circuit_breaker():
"""
Manually reset the GPU health check circuit breaker.
Useful for:
- Testing
- Manual intervention after fixing GPU issues
- Clearing persistent error state
"""
_gpu_health_circuit_breaker.reset()
logger.info("GPU health check circuit breaker manually reset")
def check_gpu_health_with_reset(
expected_device: str = "cuda",
auto_reset: bool = True
) -> GPUHealthStatus:
"""
Check GPU health with automatic reset on failure.
This function wraps check_gpu_health() and adds automatic GPU driver
reset capability with cooldown protection to prevent reset loops.
Behavior:
1. Attempt GPU health check
2. If fails and auto_reset=True:
a. Check cooldown (prevents reset loops)
b. If cooldown active → raise error immediately
c. If cooldown OK → reset drivers → wait 3s → retry
3. If retry fails → raise error (terminate service)
Args:
expected_device: Expected device ("auto", "cuda", "cpu")
auto_reset: Enable automatic reset on failure (default: True)
Returns:
GPUHealthStatus object
Raises:
RuntimeError: If GPU check fails and reset not available/allowed,
or if retry after reset fails
"""
try:
# First attempt
return check_gpu_health(expected_device)
except Exception as first_error:
# GPU health check failed
if not auto_reset:
logger.error(f"GPU health check failed (auto-reset disabled): {first_error}")
raise
if not GPU_RESET_AVAILABLE:
logger.error(
f"GPU health check failed but reset not available: {first_error}"
)
raise RuntimeError(
f"GPU unavailable and reset functionality not available: {first_error}"
)
# Check if reset is allowed (cooldown protection)
if not can_attempt_reset():
error_msg = (
f"GPU health check failed but reset cooldown is active. "
f"This prevents reset loops. Last error: {first_error}. "
f"Service will terminate. If this happens after sleep/wake, "
f"the cooldown should expire soon and next restart will work."
)
logger.error(error_msg)
raise RuntimeError(error_msg)
# Attempt GPU reset
logger.warning("=" * 70)
logger.warning(f"GPU HEALTH CHECK FAILED: {first_error}")
logger.warning("Attempting automatic GPU driver reset...")
logger.warning("=" * 70)
try:
reset_gpu_drivers()
record_reset_attempt()
logger.info("GPU reset completed, waiting 3 seconds for stabilization...")
time.sleep(3)
# Retry GPU health check
logger.info("Retrying GPU health check after reset...")
status = check_gpu_health(expected_device)
logger.warning("=" * 70)
logger.warning("GPU HEALTH CHECK SUCCESS AFTER RESET")
logger.warning(f"Device: {status.device_name}")
logger.warning(f"Memory: {status.memory_available_gb:.2f} GB available")
logger.warning("=" * 70)
return status
except Exception as reset_error:
error_msg = (
f"GPU health check failed after reset attempt. "
f"Original error: {first_error}. "
f"Reset error: {reset_error}. "
f"Service will terminate."
)
logger.error(error_msg)
raise RuntimeError(error_msg)
class HealthMonitor:
"""Background thread for periodic GPU health monitoring."""
def __init__(self, check_interval_minutes: int = 10):
"""
Initialize health monitor.
Args:
check_interval_minutes: Interval between health checks (default: 10)
"""
self._check_interval_seconds = check_interval_minutes * 60
self._thread: Optional[threading.Thread] = None
self._stop_event = threading.Event()
self._latest_status: Optional[GPUHealthStatus] = None
self._history: List[GPUHealthStatus] = []
self._max_history = 100
self._lock = threading.Lock()
def start(self):
"""Start background monitoring thread."""
if self._thread is not None and self._thread.is_alive():
logger.warning("Health monitor already running")
return
logger.info(
f"Starting GPU health monitor "
f"(interval: {self._check_interval_seconds / 60:.1f} minutes)"
)
# Run initial health check
try:
status = check_gpu_health(expected_device="auto")
with self._lock:
self._latest_status = status
self._history.append(status)
except Exception as e:
logger.error(f"Initial GPU health check failed: {e}")
# Start background thread
self._stop_event.clear()
self._thread = threading.Thread(target=self._monitor_loop, daemon=True)
self._thread.start()
def stop(self):
"""Stop background monitoring thread."""
if self._thread is None:
return
logger.info("Stopping GPU health monitor")
self._stop_event.set()
self._thread.join(timeout=5.0)
self._thread = None
def get_latest_status(self) -> Optional[GPUHealthStatus]:
"""Get most recent health check result."""
with self._lock:
return self._latest_status
def get_health_history(self, limit: int = 10) -> List[GPUHealthStatus]:
"""
Get recent health check history.
Args:
limit: Maximum number of results to return
Returns:
List of GPUHealthStatus objects (most recent first)
"""
with self._lock:
return list(reversed(self._history[-limit:]))
def _monitor_loop(self):
"""Background monitoring loop."""
while not self._stop_event.wait(timeout=self._check_interval_seconds):
try:
logger.debug("Running periodic GPU health check")
status = check_gpu_health(expected_device="auto")
with self._lock:
self._latest_status = status
self._history.append(status)
# Trim history if too long
if len(self._history) > self._max_history:
self._history = self._history[-self._max_history:]
# Log warning if performance degraded
if status.gpu_available and not status.gpu_working:
logger.warning(
f"GPU health degraded: {status.error}. "
f"Test duration: {status.test_duration_seconds}s"
)
elif status.gpu_working and status.test_duration_seconds > 2.0:
logger.warning(
f"GPU health check slow: {status.test_duration_seconds}s "
f"(expected <1s). May indicate performance issues."
)
except Exception as e:
logger.error(f"Periodic GPU health check failed: {e}")
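A minimal usage sketch for this module, assuming src/ is on PYTHONPATH and a CUDA GPU is present (check_gpu_health raises RuntimeError otherwise); it exercises the one-off check, the circuit breaker stats, and the background HealthMonitor.

import time
from core.gpu_health import check_gpu_health, get_circuit_breaker_stats, HealthMonitor

try:
    # One-off check; the circuit breaker is enabled by default
    status = check_gpu_health(expected_device="auto")
    print(f"GPU working: {status.gpu_working} ({status.test_duration_seconds}s)")
except RuntimeError as e:
    # Raised on CPU fallback or when the circuit breaker is open
    print(f"Health check failed: {e}")

print("Circuit breaker:", get_circuit_breaker_stats())

# Background monitoring (daemon thread, one check every 10 minutes)
monitor = HealthMonitor(check_interval_minutes=10)
monitor.start()
time.sleep(1)
latest = monitor.get_latest_status()
print("Latest status:", latest.to_dict() if latest else None)
monitor.stop()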

220
src/core/gpu_reset.py Normal file
View File

@@ -0,0 +1,220 @@
"""
GPU driver reset functionality with cooldown protection.
Provides automatic GPU driver reset on CUDA errors with safeguards
to prevent reset loops while allowing recovery from sleep/wake cycles.
"""
import os
import time
import logging
import subprocess
from datetime import datetime, timedelta
from pathlib import Path
from typing import Optional
logger = logging.getLogger(__name__)
# Cooldown file location
RESET_TIMESTAMP_FILE = "/tmp/whisper-gpu-last-reset"
# Default cooldown period (minutes)
DEFAULT_COOLDOWN_MINUTES = 5
def get_cooldown_minutes() -> int:
"""
Get cooldown period from environment variable.
Returns:
Cooldown period in minutes (default: 5)
"""
try:
return int(os.getenv("GPU_RESET_COOLDOWN_MINUTES", str(DEFAULT_COOLDOWN_MINUTES)))
except ValueError:
logger.warning(
f"Invalid GPU_RESET_COOLDOWN_MINUTES value, using default: {DEFAULT_COOLDOWN_MINUTES}"
)
return DEFAULT_COOLDOWN_MINUTES
def get_last_reset_time() -> Optional[datetime]:
"""
Read timestamp of last GPU reset attempt.
Returns:
datetime object of last reset, or None if no previous reset
"""
try:
if not os.path.exists(RESET_TIMESTAMP_FILE):
return None
with open(RESET_TIMESTAMP_FILE, 'r') as f:
timestamp_str = f.read().strip()
return datetime.fromisoformat(timestamp_str)
except Exception as e:
logger.warning(f"Failed to read last reset timestamp: {e}")
return None
def record_reset_attempt() -> None:
"""
Record current time as last GPU reset attempt.
Creates/updates timestamp file with current UTC time.
"""
try:
timestamp = datetime.utcnow().isoformat()
with open(RESET_TIMESTAMP_FILE, 'w') as f:
f.write(timestamp)
logger.info(f"Recorded GPU reset timestamp: {timestamp}")
except Exception as e:
logger.error(f"Failed to record reset timestamp: {e}")
def can_attempt_reset() -> bool:
"""
Check if GPU reset can be attempted based on cooldown period.
Returns:
True if reset is allowed (no recent reset or cooldown expired),
False if cooldown is still active
"""
last_reset = get_last_reset_time()
if last_reset is None:
# No previous reset recorded
logger.debug("No previous GPU reset found, reset allowed")
return True
cooldown_minutes = get_cooldown_minutes()
cooldown_period = timedelta(minutes=cooldown_minutes)
time_since_reset = datetime.utcnow() - last_reset
if time_since_reset < cooldown_period:
remaining = cooldown_period - time_since_reset
logger.warning(
f"GPU reset cooldown active. "
f"Last reset: {last_reset.isoformat()}, "
f"Cooldown: {cooldown_minutes} min, "
f"Remaining: {remaining.total_seconds():.0f}s"
)
return False
logger.info(
f"GPU reset cooldown expired. "
f"Last reset: {last_reset.isoformat()}, "
f"Time since: {time_since_reset.total_seconds():.0f}s"
)
return True
def reset_gpu_drivers() -> None:
"""
Execute GPU driver reset using reset_gpu.sh script.
This function:
1. Locates reset_gpu.sh in project root
2. Executes it with sudo
3. Waits for completion
4. Raises exception if reset fails
Raises:
RuntimeError: If reset script fails or is not found
PermissionError: If sudo permissions not configured
Note:
Requires passwordless sudo for nvidia commands.
See CLAUDE.md for setup instructions.
"""
logger.warning("=" * 60)
logger.warning("INITIATING GPU DRIVER RESET")
logger.warning("=" * 60)
# Find reset_gpu.sh script
script_path = Path(__file__).parent.parent.parent / "reset_gpu.sh"
if not script_path.exists():
error_msg = f"GPU reset script not found: {script_path}"
logger.error(error_msg)
raise RuntimeError(error_msg)
if not os.access(script_path, os.X_OK):
error_msg = f"GPU reset script not executable: {script_path}"
logger.error(error_msg)
raise RuntimeError(error_msg)
logger.info(f"Executing GPU reset script: {script_path}")
logger.warning("This will temporarily interrupt all GPU operations")
try:
# Execute reset script with sudo
result = subprocess.run(
['sudo', str(script_path)],
capture_output=True,
text=True,
timeout=30 # 30 second timeout
)
# Log script output
if result.stdout:
logger.info(f"Reset script output:\n{result.stdout}")
if result.stderr:
logger.warning(f"Reset script stderr:\n{result.stderr}")
# Check exit code
if result.returncode != 0:
error_msg = (
f"GPU reset script failed with exit code {result.returncode}. "
f"This may indicate:\n"
f" 1. Sudo permissions not configured (see CLAUDE.md)\n"
f" 2. GPU hardware issue\n"
f" 3. Driver installation problem\n"
f"Output: {result.stderr or result.stdout}"
)
logger.error(error_msg)
raise RuntimeError(error_msg)
logger.warning("=" * 60)
logger.warning("GPU DRIVER RESET COMPLETED SUCCESSFULLY")
logger.warning("=" * 60)
except subprocess.TimeoutExpired:
error_msg = "GPU reset script timed out after 30 seconds"
logger.error(error_msg)
raise RuntimeError(error_msg)
except FileNotFoundError:
error_msg = (
"sudo command not found. This service requires sudo access "
"for GPU driver reset. Please ensure sudo is installed."
)
logger.error(error_msg)
raise RuntimeError(error_msg)
except Exception as e:
error_msg = f"Unexpected error during GPU reset: {e}"
logger.error(error_msg)
raise RuntimeError(error_msg)
def clear_reset_cooldown() -> None:
"""
Clear reset cooldown by removing timestamp file.
Useful for testing or manual intervention.
"""
try:
if os.path.exists(RESET_TIMESTAMP_FILE):
os.remove(RESET_TIMESTAMP_FILE)
logger.info("GPU reset cooldown cleared")
else:
logger.debug("No cooldown to clear")
except Exception as e:
logger.error(f"Failed to clear reset cooldown: {e}")
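A short sketch of the cooldown-guarded reset flow using only the functions defined above; it assumes reset_gpu.sh exists in the project root and passwordless sudo is configured as described.

from core.gpu_reset import (
    can_attempt_reset,
    reset_gpu_drivers,
    record_reset_attempt,
    get_last_reset_time,
)

if can_attempt_reset():
    try:
        reset_gpu_drivers()      # runs `sudo reset_gpu.sh`; raises RuntimeError on failure
        record_reset_attempt()   # start the cooldown window
    except RuntimeError as e:
        print(f"GPU reset failed: {e}")
else:
    print("Reset cooldown active; last reset at", get_last_reset_time())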

545
src/core/job_queue.py Normal file
View File

@@ -0,0 +1,545 @@
"""
Job queue manager for asynchronous transcription processing.
Provides FIFO job queue with background worker thread and disk persistence.
"""
import os
import json
import time
import queue
import logging
import threading
import uuid
from enum import Enum
from dataclasses import dataclass, asdict
from datetime import datetime
from typing import Optional, List, Dict
from core.gpu_health import check_gpu_health_with_reset
from core.transcriber import transcribe_audio
from utils.audio_processor import validate_audio_file
logger = logging.getLogger(__name__)
class JobStatus(Enum):
"""Job status enumeration."""
QUEUED = "queued" # In queue, waiting
RUNNING = "running" # Currently processing
COMPLETED = "completed" # Successfully finished
FAILED = "failed" # Error occurred
@dataclass
class Job:
"""Represents a transcription job."""
job_id: str
status: JobStatus
created_at: datetime
started_at: Optional[datetime]
completed_at: Optional[datetime]
queue_position: int
# Request parameters
audio_path: str
model_name: str
device: str
compute_type: str
language: Optional[str]
output_format: str
beam_size: int
temperature: float
initial_prompt: Optional[str]
output_directory: Optional[str]
# Results
result_path: Optional[str]
error: Optional[str]
processing_time_seconds: Optional[float]
def to_dict(self) -> dict:
"""Serialize to dictionary for JSON storage."""
return {
"job_id": self.job_id,
"status": self.status.value,
"created_at": self.created_at.isoformat() if self.created_at else None,
"started_at": self.started_at.isoformat() if self.started_at else None,
"completed_at": self.completed_at.isoformat() if self.completed_at else None,
"queue_position": self.queue_position,
"request_params": {
"audio_path": self.audio_path,
"model_name": self.model_name,
"device": self.device,
"compute_type": self.compute_type,
"language": self.language,
"output_format": self.output_format,
"beam_size": self.beam_size,
"temperature": self.temperature,
"initial_prompt": self.initial_prompt,
"output_directory": self.output_directory,
},
"result_path": self.result_path,
"error": self.error,
"processing_time_seconds": self.processing_time_seconds,
}
@classmethod
def from_dict(cls, data: dict) -> 'Job':
"""Deserialize from dictionary."""
params = data.get("request_params", {})
return cls(
job_id=data["job_id"],
status=JobStatus(data["status"]),
created_at=datetime.fromisoformat(data["created_at"]) if data.get("created_at") else datetime.utcnow(),
started_at=datetime.fromisoformat(data["started_at"]) if data.get("started_at") else None,
completed_at=datetime.fromisoformat(data["completed_at"]) if data.get("completed_at") else None,
queue_position=data.get("queue_position", 0),
audio_path=params["audio_path"],
model_name=params.get("model_name", "large-v3"),
device=params.get("device", "auto"),
compute_type=params.get("compute_type", "auto"),
language=params.get("language"),
output_format=params.get("output_format", "txt"),
beam_size=params.get("beam_size", 5),
temperature=params.get("temperature", 0.0),
initial_prompt=params.get("initial_prompt"),
output_directory=params.get("output_directory"),
result_path=data.get("result_path"),
error=data.get("error"),
processing_time_seconds=data.get("processing_time_seconds"),
)
def save_to_disk(self, metadata_dir: str):
"""Save job metadata to {metadata_dir}/{job_id}.json"""
os.makedirs(metadata_dir, exist_ok=True)
filepath = os.path.join(metadata_dir, f"{self.job_id}.json")
try:
with open(filepath, 'w') as f:
json.dump(self.to_dict(), f, indent=2)
except Exception as e:
logger.error(f"Failed to save job {self.job_id} to disk: {e}")
class JobQueue:
"""Manages job queue with background worker."""
def __init__(self,
max_queue_size: int = 100,
metadata_dir: str = "/outputs/jobs"):
"""
Initialize job queue.
Args:
max_queue_size: Maximum number of jobs in queue
metadata_dir: Directory to store job metadata JSON files
"""
self._queue = queue.Queue(maxsize=max_queue_size)
self._jobs: Dict[str, Job] = {}
self._metadata_dir = metadata_dir
self._worker_thread: Optional[threading.Thread] = None
self._stop_event = threading.Event()
self._current_job_id: Optional[str] = None
self._lock = threading.RLock() # Use RLock to allow re-entrant locking
self._max_queue_size = max_queue_size
# Create metadata directory
os.makedirs(metadata_dir, exist_ok=True)
def start(self):
"""
Start background worker thread.
Load existing jobs from disk on startup.
"""
if self._worker_thread is not None and self._worker_thread.is_alive():
logger.warning("Job queue worker already running")
return
logger.info(f"Starting job queue (max size: {self._max_queue_size})")
# Load existing jobs from disk
self._load_jobs_from_disk()
# Start background worker thread
self._stop_event.clear()
self._worker_thread = threading.Thread(target=self._worker_loop, daemon=True)
self._worker_thread.start()
logger.info("Job queue worker started")
def stop(self, wait_for_current: bool = True):
"""
Stop background worker.
Args:
wait_for_current: If True, wait for current job to complete
"""
if self._worker_thread is None:
return
logger.info(f"Stopping job queue (wait_for_current={wait_for_current})")
self._stop_event.set()
if wait_for_current:
self._worker_thread.join(timeout=30.0)
else:
self._worker_thread.join(timeout=1.0)
self._worker_thread = None
logger.info("Job queue worker stopped")
def submit_job(self,
audio_path: str,
model_name: str = "large-v3",
device: str = "auto",
compute_type: str = "auto",
language: Optional[str] = None,
output_format: str = "txt",
beam_size: int = 5,
temperature: float = 0.0,
initial_prompt: Optional[str] = None,
output_directory: Optional[str] = None) -> dict:
"""
Submit a new transcription job.
Returns:
dict: {
"job_id": str,
"status": str,
"queue_position": int,
"created_at": str
}
Raises:
queue.Full: If queue is at max capacity
RuntimeError: If GPU health check fails (when device="cuda")
FileNotFoundError: If audio file doesn't exist
"""
# 1. Validate audio file exists
try:
validate_audio_file(audio_path)
except Exception as e:
raise FileNotFoundError(f"Audio file validation failed: {e}")
# 2. Check GPU health (GPU required for all devices since this is GPU-only service)
# Both device="cuda" and device="auto" require GPU
if device == "cuda" or device == "auto":
try:
logger.info("Running GPU health check before job submission")
# Use expected_device to match what user requested
expected = "cuda" if device == "cuda" else "auto"
health_status = check_gpu_health_with_reset(expected_device=expected, auto_reset=True)
if not health_status.gpu_working:
raise RuntimeError(
f"GPU device required but not available. "
f"GPU check failed: {health_status.error}. "
f"This service is configured for GPU-only operation."
)
logger.info("GPU health check passed")
except RuntimeError as e:
# Re-raise GPU health errors
logger.error(f"Job rejected due to GPU health check failure: {e}")
raise RuntimeError(f"Job rejected: {e}")
elif device == "cpu":
# Reject CPU device explicitly
raise ValueError(
"CPU device requested but this service is configured for GPU-only operation. "
"Please use device='cuda' or device='auto' with a GPU available."
)
# 3. Generate job_id
job_id = str(uuid.uuid4())
# 4. Create Job object
job = Job(
job_id=job_id,
status=JobStatus.QUEUED,
created_at=datetime.utcnow(),
started_at=None,
completed_at=None,
queue_position=0, # Will be calculated
audio_path=audio_path,
model_name=model_name,
device=device,
compute_type=compute_type,
language=language,
output_format=output_format,
beam_size=beam_size,
temperature=temperature,
initial_prompt=initial_prompt,
output_directory=output_directory,
result_path=None,
error=None,
processing_time_seconds=None,
)
# 5. Add to queue (raises queue.Full if full)
try:
self._queue.put_nowait(job)
except queue.Full:
raise queue.Full(
f"Job queue is full (max size: {self._max_queue_size}). "
f"Please try again later."
)
# 6. Add to jobs dict and save to disk
with self._lock:
self._jobs[job_id] = job
self._calculate_queue_positions()
job.save_to_disk(self._metadata_dir)
logger.info(
f"Job {job_id} submitted: {audio_path} "
f"(queue position: {job.queue_position})"
)
# 7. Return job info
return {
"job_id": job_id,
"status": job.status.value,
"queue_position": job.queue_position,
"created_at": job.created_at.isoformat() + "Z"
}
def get_job_status(self, job_id: str) -> dict:
"""
Get current status of a job.
Returns:
dict: Job status information
Raises:
KeyError: If job_id not found
"""
with self._lock:
if job_id not in self._jobs:
raise KeyError(f"Job {job_id} not found")
job = self._jobs[job_id]
return {
"job_id": job.job_id,
"status": job.status.value,
"queue_position": job.queue_position if job.status == JobStatus.QUEUED else None,
"created_at": job.created_at.isoformat() + "Z",
"started_at": job.started_at.isoformat() + "Z" if job.started_at else None,
"completed_at": job.completed_at.isoformat() + "Z" if job.completed_at else None,
"result_path": job.result_path,
"error": job.error,
"processing_time_seconds": job.processing_time_seconds,
}
def get_job_result(self, job_id: str) -> str:
"""
Get transcription result text for completed job.
Returns:
str: Content of transcription file
Raises:
KeyError: If job_id not found
ValueError: If job not completed
FileNotFoundError: If result file missing
"""
with self._lock:
if job_id not in self._jobs:
raise KeyError(f"Job {job_id} not found")
job = self._jobs[job_id]
if job.status != JobStatus.COMPLETED:
raise ValueError(
f"Job {job_id} is not completed yet. "
f"Current status: {job.status.value}"
)
if not job.result_path:
raise FileNotFoundError(f"Job {job_id} has no result path")
# Read result file (outside lock to avoid blocking)
if not os.path.exists(job.result_path):
raise FileNotFoundError(
f"Result file not found: {job.result_path}"
)
with open(job.result_path, 'r', encoding='utf-8') as f:
return f.read()
def list_jobs(self,
status_filter: Optional[JobStatus] = None,
limit: int = 100) -> List[dict]:
"""
List jobs with optional status filter.
Args:
status_filter: Only return jobs with this status
limit: Maximum number of jobs to return
Returns:
List of job status dictionaries
"""
with self._lock:
jobs = list(self._jobs.values())
# Filter by status
if status_filter:
jobs = [j for j in jobs if j.status == status_filter]
# Sort by created_at (newest first)
jobs.sort(key=lambda j: j.created_at, reverse=True)
# Limit results
jobs = jobs[:limit]
# Convert to dict
return [self.get_job_status(j.job_id) for j in jobs]
def _worker_loop(self):
"""
Background worker thread function.
Processes jobs from queue in FIFO order.
"""
logger.info("Job queue worker loop started")
while not self._stop_event.is_set():
try:
# Get job from queue (with timeout to check stop_event)
try:
job = self._queue.get(timeout=1.0)
except queue.Empty:
continue
# Update job status to running
with self._lock:
self._current_job_id = job.job_id
job.status = JobStatus.RUNNING
job.started_at = datetime.utcnow()
job.queue_position = 0
self._calculate_queue_positions()
job.save_to_disk(self._metadata_dir)
logger.info(f"Job {job.job_id} started processing")
# Process job
start_time = time.time()
try:
result = transcribe_audio(
audio_path=job.audio_path,
model_name=job.model_name,
device=job.device,
compute_type=job.compute_type,
language=job.language,
output_format=job.output_format,
beam_size=job.beam_size,
temperature=job.temperature,
initial_prompt=job.initial_prompt,
output_directory=job.output_directory
)
# Parse result
if "saved to:" in result:
job.result_path = result.split("saved to:")[1].strip()
job.status = JobStatus.COMPLETED
logger.info(
f"Job {job.job_id} completed successfully: {job.result_path}"
)
else:
job.status = JobStatus.FAILED
job.error = result
logger.error(f"Job {job.job_id} failed: {result}")
except Exception as e:
job.status = JobStatus.FAILED
job.error = str(e)
logger.error(f"Job {job.job_id} failed with exception: {e}")
finally:
# Update job completion info
job.completed_at = datetime.utcnow()
job.processing_time_seconds = time.time() - start_time
job.save_to_disk(self._metadata_dir)
with self._lock:
self._current_job_id = None
self._calculate_queue_positions()
self._queue.task_done()
logger.info(
f"Job {job.job_id} finished: "
f"status={job.status.value}, "
f"duration={job.processing_time_seconds:.1f}s"
)
except Exception as e:
logger.error(f"Unexpected error in worker loop: {e}")
logger.info("Job queue worker loop stopped")
def _load_jobs_from_disk(self):
"""Load existing job metadata from disk on startup."""
if not os.path.exists(self._metadata_dir):
logger.info("No existing job metadata directory found")
return
logger.info(f"Loading jobs from {self._metadata_dir}")
loaded_count = 0
for filename in os.listdir(self._metadata_dir):
if not filename.endswith(".json"):
continue
filepath = os.path.join(self._metadata_dir, filename)
try:
with open(filepath, 'r') as f:
data = json.load(f)
job = Job.from_dict(data)
# Handle jobs that were running when server stopped
if job.status == JobStatus.RUNNING:
job.status = JobStatus.FAILED
job.error = "Server restarted while job was running"
job.completed_at = datetime.utcnow()
job.save_to_disk(self._metadata_dir)
logger.warning(
f"Job {job.job_id} was running during shutdown, "
f"marking as failed"
)
# Re-queue queued jobs
elif job.status == JobStatus.QUEUED:
try:
self._queue.put_nowait(job)
logger.info(f"Re-queued job {job.job_id} from disk")
except queue.Full:
job.status = JobStatus.FAILED
job.error = "Queue full on server restart"
job.completed_at = datetime.utcnow()
job.save_to_disk(self._metadata_dir)
logger.warning(
f"Job {job.job_id} could not be re-queued (queue full)"
)
self._jobs[job.job_id] = job
loaded_count += 1
except Exception as e:
logger.error(f"Failed to load job from {filepath}: {e}")
logger.info(f"Loaded {loaded_count} jobs from disk")
self._calculate_queue_positions()
def _calculate_queue_positions(self):
"""Update queue_position for all queued jobs."""
queued_jobs = [
j for j in self._jobs.values()
if j.status == JobStatus.QUEUED
]
# Sort by created_at (FIFO)
queued_jobs.sort(key=lambda j: j.created_at)
# Update positions
for i, job in enumerate(queued_jobs, start=1):
job.queue_position = i
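A direct-usage sketch of the queue above (the REST and MCP servers wrap this same flow). The audio path and metadata directory are placeholders, and submit_job runs a GPU health check, so this only succeeds on a machine with a working CUDA GPU.

import time
from core.job_queue import JobQueue, JobStatus

q = JobQueue(max_queue_size=10, metadata_dir="/tmp/whisper-jobs")
q.start()
try:
    info = q.submit_job(audio_path="/path/to/audio.mp3", model_name="tiny", output_format="txt")
    job_id = info["job_id"]

    # Poll until the background worker finishes the job
    while q.get_job_status(job_id)["status"] in (JobStatus.QUEUED.value, JobStatus.RUNNING.value):
        time.sleep(5)

    final = q.get_job_status(job_id)
    if final["status"] == JobStatus.COMPLETED.value:
        print(q.get_job_result(job_id))
    else:
        print("Job failed:", final["error"])
finally:
    q.stop(wait_for_current=True)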

src/core/model_manager.py
View File

@@ -4,7 +4,7 @@ Model Management Module
Responsible for loading, caching, and managing Whisper models
"""
import os; print(os.environ.get("WHISPER_MODEL_DIR"))
import os
import time
import logging
from typing import Dict, Any
@@ -14,6 +14,14 @@ from faster_whisper import WhisperModel, BatchedInferencePipeline
# Log configuration
logger = logging.getLogger(__name__)
# Import GPU health check with reset capability
try:
from core.gpu_health import check_gpu_health_with_reset
GPU_HEALTH_CHECK_AVAILABLE = True
except ImportError:
logger.warning("GPU health check with reset not available")
GPU_HEALTH_CHECK_AVAILABLE = False
# Global model instance cache
model_instances = {}
@@ -60,15 +68,35 @@ def get_whisper_model(model_name: str, device: str, compute_type: str) -> Dict[s
if model_name not in valid_models:
raise ValueError(f"Invalid model name: {model_name}. Valid models: {', '.join(valid_models)}")
# Auto-detect device
# Auto-detect device - GPU REQUIRED (no CPU fallback)
if device == "auto":
device = "cuda" if torch.cuda.is_available() else "cpu"
if not torch.cuda.is_available():
error_msg = (
"GPU required but CUDA is not available. "
"This service is configured for GPU-only operation. "
"Please ensure CUDA is properly installed and GPU is accessible."
)
logger.error(error_msg)
raise RuntimeError(error_msg)
device = "cuda"
# Auto-detect compute type
if compute_type == "auto":
compute_type = "float16" if device == "cuda" else "int8"
# Validate device and compute type
if device not in ["cpu", "cuda"]:
raise ValueError(f"Invalid device: {device}. Valid devices: cpu, cuda")
# CRITICAL: Reject CPU device - this service is GPU-only
if device == "cpu":
error_msg = (
"CPU device requested but this service is configured for GPU-only operation. "
"Please use device='cuda' or device='auto' with a GPU available."
)
logger.error(error_msg)
raise ValueError(error_msg)
if device == "cuda" and not torch.cuda.is_available():
logger.error("CUDA requested but not available")
raise RuntimeError("CUDA not available but explicitly requested")
@@ -90,7 +118,18 @@ def get_whisper_model(model_name: str, device: str, compute_type: str) -> Dict[s
# Test GPU driver before loading model and clean
if device == "cuda":
test_gpu_driver()
# Use GPU health check with reset capability if available
if GPU_HEALTH_CHECK_AVAILABLE:
try:
logger.info("Running GPU health check with auto-reset before model loading")
check_gpu_health_with_reset(expected_device="cuda", auto_reset=True)
except Exception as e:
logger.error(f"GPU health check failed: {e}")
raise RuntimeError(f"GPU not available for model loading: {e}")
else:
# Fallback to simple GPU test
test_gpu_driver()
torch.cuda.empty_cache()
# Instantiate model
@@ -163,8 +202,14 @@ def get_model_info() -> str:
models = [
"tiny", "base", "small", "medium", "large-v1", "large-v2", "large-v3"
]
devices = ["cpu", "cuda"] if torch.cuda.is_available() else ["cpu"]
compute_types = ["float16", "int8"] if torch.cuda.is_available() else ["int8"]
# GPU-only service - CUDA required
if not torch.cuda.is_available():
raise RuntimeError(
"GPU required but CUDA is not available. "
"This service is configured for GPU-only operation."
)
devices = ["cuda"] # CPU not supported in GPU-only mode
compute_types = ["float16", "int8"]
# Supported language list
languages = {
@@ -181,20 +226,21 @@ def get_model_info() -> str:
"available_models": models,
"default_model": "large-v3",
"available_devices": devices,
"default_device": "cuda" if torch.cuda.is_available() else "cpu",
"default_device": "cuda", # GPU-only service
"available_compute_types": compute_types,
"default_compute_type": "float16" if torch.cuda.is_available() else "int8",
"cuda_available": torch.cuda.is_available(),
"default_compute_type": "float16",
"cuda_available": True, # Required for this service
"gpu_only_mode": True, # Indicate this is GPU-only
"supported_languages": languages,
"supported_audio_formats": audio_formats,
"version": "0.1.1"
}
if torch.cuda.is_available():
info["gpu_info"] = {
"name": torch.cuda.get_device_name(0),
"memory_total": f"{torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB",
"memory_available": f"{torch.cuda.get_device_properties(0).total_memory / 1e9 - torch.cuda.memory_allocated() / 1e9:.2f} GB"
}
# GPU info (always present in GPU-only mode)
info["gpu_info"] = {
"name": torch.cuda.get_device_name(0),
"memory_total": f"{torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB",
"memory_available": f"{torch.cuda.get_device_properties(0).total_memory / 1e9 - torch.cuda.memory_allocated() / 1e9:.2f} GB"
}
return json.dumps(info, indent=2)
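A brief sketch of the GPU-only behaviour introduced here: both calls now raise instead of silently falling back to CPU. The model choice is arbitrary and src/ is assumed to be on PYTHONPATH.

import json
from core.model_manager import get_model_info, get_whisper_model

try:
    info = json.loads(get_model_info())   # raises RuntimeError if CUDA is absent
    print("Default device:", info["default_device"], "| GPU:", info["gpu_info"]["name"])
    get_whisper_model("tiny", device="auto", compute_type="auto")   # loads (and caches) the model on CUDA
except (RuntimeError, ValueError) as e:
    print("GPU-only check failed:", e)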

src/core/transcriber.py
View File

@@ -9,9 +9,9 @@ import time
import logging
from typing import Dict, Any, Tuple, List, Optional, Union
from model_manager import get_whisper_model
from audio_processor import validate_audio_file, process_audio
from formatters import format_vtt, format_srt, format_json, format_txt, format_time
from core.model_manager import get_whisper_model
from utils.audio_processor import validate_audio_file, process_audio
from utils.formatters import format_vtt, format_srt, format_json, format_txt, format_time
# Logging configuration
logger = logging.getLogger(__name__)
@@ -76,9 +76,10 @@ def transcribe_audio(
temperature = temperature if temperature is not None else DEFAULT_TEMPERATURE
# Validate audio file
validation_result = validate_audio_file(audio_path)
if validation_result != "ok":
return validation_result
try:
validate_audio_file(audio_path)
except (FileNotFoundError, ValueError, OSError) as e:
return f"Error: {str(e)}"
try:
# Get model instance
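A usage sketch matching the exception-based validation above: validation problems now come back as an "Error: ..." string rather than an exception, so callers inspect the returned text (the success message is the one the job queue parses for "saved to:"). The audio path is a placeholder.

from core.transcriber import transcribe_audio

result = transcribe_audio(
    audio_path="/path/to/audio.mp3",   # placeholder
    model_name="tiny",
    device="cuda",
    compute_type="float16",
    language=None,
    output_format="txt",
    beam_size=5,
    temperature=0.0,
    initial_prompt=None,
    output_directory=None,
)
if result.startswith("Error:"):
    print("Validation or transcription failed:", result)
else:
    print(result)   # e.g. "... saved to: /outputs/audio.txt"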

5
src/servers/__init__.py Normal file
View File

@@ -0,0 +1,5 @@
"""
Server implementations for Whisper transcription service.
Includes MCP server (whisper_server.py) and REST API server (api_server.py).
"""

466
src/servers/api_server.py Normal file
View File

@@ -0,0 +1,466 @@
#!/usr/bin/env python3
"""
FastAPI REST API Server for Whisper Transcription
Provides HTTP REST endpoints for audio transcription
"""
import os
import sys
import logging
import queue as queue_module
import tempfile
from pathlib import Path
from contextlib import asynccontextmanager
from typing import Optional, List
from fastapi import FastAPI, HTTPException, UploadFile, File, Form
from fastapi.responses import JSONResponse, FileResponse
from pydantic import BaseModel, Field
import json
from core.model_manager import get_model_info
from core.job_queue import JobQueue, JobStatus
from core.gpu_health import HealthMonitor, check_gpu_health, get_circuit_breaker_stats, reset_circuit_breaker
from utils.startup import startup_sequence, cleanup_on_shutdown
# Logging configuration
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
# Global instances
job_queue: Optional[JobQueue] = None
health_monitor: Optional[HealthMonitor] = None
@asynccontextmanager
async def lifespan(app: FastAPI):
"""FastAPI lifespan context manager for startup/shutdown"""
global job_queue, health_monitor
# Startup - use common startup logic (without GPU check, handled in main)
logger.info("Initializing job queue and health monitor...")
from utils.startup import initialize_job_queue, initialize_health_monitor
job_queue = initialize_job_queue()
health_monitor = initialize_health_monitor()
yield
# Shutdown - use common cleanup logic
cleanup_on_shutdown(
job_queue=job_queue,
health_monitor=health_monitor,
wait_for_current_job=True
)
# Create FastAPI app
app = FastAPI(
title="Whisper Speech Recognition API",
description="High-performance audio transcription API based on Faster Whisper with async job queue",
version="0.2.0",
lifespan=lifespan
)
# Request/Response Models
class SubmitJobRequest(BaseModel):
audio_path: str = Field(..., description="Path to the audio file on the server")
model_name: str = Field("large-v3", description="Whisper model name")
device: str = Field("auto", description="Execution device (cuda, auto)")
compute_type: str = Field("auto", description="Computation type (float16, int8, auto)")
language: Optional[str] = Field(None, description="Language code (zh, en, ja, etc.)")
output_format: str = Field("txt", description="Output format (vtt, srt, json, txt)")
beam_size: int = Field(5, description="Beam search size")
temperature: float = Field(0.0, description="Sampling temperature")
initial_prompt: Optional[str] = Field(None, description="Initial prompt text")
output_directory: Optional[str] = Field(None, description="Output directory path")
# API Endpoints
@app.get("/")
async def root():
"""Root endpoint with API information"""
return {
"name": "Whisper Speech Recognition API",
"version": "0.2.0",
"description": "Async job queue-based transcription service",
"endpoints": {
"GET /": "API information",
"GET /health": "Health check",
"GET /health/gpu": "GPU health check",
"GET /health/circuit-breaker": "Get circuit breaker stats",
"POST /health/circuit-breaker/reset": "Reset circuit breaker",
"GET /models": "Get available models information",
"POST /jobs": "Submit transcription job (async)",
"GET /jobs/{job_id}": "Get job status",
"GET /jobs/{job_id}/result": "Get job result",
"GET /jobs": "List jobs with optional filtering"
},
"workflow": {
"1": "Submit job via POST /jobs → receive job_id",
"2": "Poll status via GET /jobs/{job_id} → wait for status='completed'",
"3": "Get result via GET /jobs/{job_id}/result → retrieve transcription"
}
}
@app.get("/health")
async def health_check():
"""Health check endpoint"""
return {"status": "healthy", "service": "whisper-transcription"}
@app.get("/models")
async def get_models():
"""Get available Whisper models and configuration information"""
try:
model_info = get_model_info()
return JSONResponse(content=json.loads(model_info))
except Exception as e:
logger.error(f"Failed to get model info: {str(e)}")
raise HTTPException(status_code=500, detail=f"Failed to get model info: {str(e)}")
@app.post("/jobs")
async def submit_job(request: SubmitJobRequest):
"""
Submit a transcription job for async processing.
Returns immediately with job_id. Poll GET /jobs/{job_id} for status.
"""
try:
job_info = job_queue.submit_job(
audio_path=request.audio_path,
model_name=request.model_name,
device=request.device,
compute_type=request.compute_type,
language=request.language,
output_format=request.output_format,
beam_size=request.beam_size,
temperature=request.temperature,
initial_prompt=request.initial_prompt,
output_directory=request.output_directory
)
return JSONResponse(
status_code=200,
content={
**job_info,
"message": f"Job submitted successfully. Poll /jobs/{job_info['job_id']} for status."
}
)
except queue_module.Full:
# Queue is full
logger.warning("Job queue is full, rejecting request")
raise HTTPException(
status_code=503,
detail={
"error": "Queue full",
"message": f"Job queue is full ({job_queue._max_queue_size}/{job_queue._max_queue_size}). Please try again later or contact administrator.",
"queue_size": job_queue._max_queue_size,
"max_queue_size": job_queue._max_queue_size
}
)
except FileNotFoundError as e:
# Invalid audio file
logger.error(f"Invalid audio file: {e}")
raise HTTPException(
status_code=400,
detail={
"error": "Invalid audio file",
"message": str(e),
"audio_path": request.audio_path
}
)
except ValueError as e:
# CPU device rejected
logger.error(f"Invalid device parameter: {e}")
raise HTTPException(
status_code=400,
detail={
"error": "Invalid device",
"message": str(e)
}
)
except RuntimeError as e:
# GPU health check failed
logger.error(f"GPU health check failed: {e}")
raise HTTPException(
status_code=500,
detail={
"error": "GPU unavailable",
"message": f"Job rejected: {str(e)}"
}
)
except Exception as e:
logger.error(f"Failed to submit job: {e}")
raise HTTPException(
status_code=500,
detail={
"error": "Internal error",
"message": f"Failed to submit job: {str(e)}"
}
)
@app.get("/jobs/{job_id}")
async def get_job_status_endpoint(job_id: str):
"""
Get the current status of a job.
Returns job status including queue position, timestamps, and result path when completed.
"""
try:
status = job_queue.get_job_status(job_id)
return JSONResponse(status_code=200, content=status)
except KeyError:
raise HTTPException(
status_code=404,
detail={
"error": "Job not found",
"message": f"Job ID '{job_id}' does not exist or has been cleaned up",
"job_id": job_id
}
)
except Exception as e:
logger.error(f"Failed to get job status: {e}")
raise HTTPException(
status_code=500,
detail={
"error": "Internal error",
"message": f"Failed to get job status: {str(e)}"
}
)
@app.get("/jobs/{job_id}/result")
async def get_job_result_endpoint(job_id: str):
"""
Get the transcription result for a completed job.
Returns the transcription text. Only works for jobs with status='completed'.
"""
try:
result_text = job_queue.get_job_result(job_id)
return JSONResponse(
status_code=200,
content={"job_id": job_id, "result": result_text}
)
except KeyError:
raise HTTPException(
status_code=404,
detail={
"error": "Job not found",
"message": f"Job ID '{job_id}' does not exist or has been cleaned up",
"job_id": job_id
}
)
except ValueError as e:
# Job not completed yet
# Extract current status from error message
status_match = str(e).split("Current status: ")
current_status = status_match[1] if len(status_match) > 1 else "unknown"
raise HTTPException(
status_code=409,
detail={
"error": "Job not completed",
"message": f"Job is not completed yet. Current status: {current_status}. Please wait and poll again.",
"job_id": job_id,
"current_status": current_status
}
)
except FileNotFoundError as e:
# Result file missing
logger.error(f"Result file not found for job {job_id}: {e}")
raise HTTPException(
status_code=500,
detail={
"error": "Result file not found",
"message": str(e),
"job_id": job_id
}
)
except Exception as e:
logger.error(f"Failed to get job result: {e}")
raise HTTPException(
status_code=500,
detail={
"error": "Internal error",
"message": f"Failed to get job result: {str(e)}"
}
)
@app.get("/jobs")
async def list_jobs_endpoint(
status: Optional[str] = None,
limit: int = 100
):
"""
List jobs with optional filtering.
Query parameters:
- status: Filter by status (queued, running, completed, failed)
- limit: Maximum number of results (default: 100)
"""
try:
# Parse status filter
status_filter = None
if status:
try:
status_filter = JobStatus(status)
except ValueError:
raise HTTPException(
status_code=400,
detail={
"error": "Invalid status",
"message": f"Invalid status value: {status}. Must be one of: queued, running, completed, failed"
}
)
jobs = job_queue.list_jobs(status_filter=status_filter, limit=limit)
return JSONResponse(
status_code=200,
content={
"jobs": jobs,
"total": len(jobs),
"filters": {
"status": status,
"limit": limit
}
}
)
except Exception as e:
logger.error(f"Failed to list jobs: {e}")
raise HTTPException(
status_code=500,
detail={
"error": "Internal error",
"message": f"Failed to list jobs: {str(e)}"
}
)
@app.get("/health/gpu")
async def gpu_health_check_endpoint():
"""
Check GPU health by running a quick transcription test.
Returns detailed GPU status including device name, memory, and test duration.
"""
try:
status = check_gpu_health(expected_device="auto")
# Add interpretation
interpretation = "GPU is healthy and working correctly"
if not status.gpu_available:
interpretation = "GPU not available on this system"
elif not status.gpu_working:
interpretation = f"GPU available but not working correctly: {status.error}"
elif status.test_duration_seconds > 2.0:
interpretation = f"GPU working but performance degraded (test took {status.test_duration_seconds:.2f}s, expected <1s)"
return JSONResponse(
status_code=200,
content={
**status.to_dict(),
"interpretation": interpretation
}
)
except Exception as e:
logger.error(f"GPU health check failed: {e}")
return JSONResponse(
status_code=200, # Still return 200 but with error details
content={
"gpu_available": False,
"gpu_working": False,
"device_used": "unknown",
"device_name": "",
"memory_total_gb": 0.0,
"memory_available_gb": 0.0,
"test_duration_seconds": 0.0,
"timestamp": "",
"error": str(e),
"interpretation": f"GPU health check failed: {str(e)}"
}
)
@app.get("/health/circuit-breaker")
async def get_circuit_breaker_status():
"""
Get GPU health check circuit breaker statistics.
Returns current state, failure/success counts, and last failure time.
"""
try:
stats = get_circuit_breaker_stats()
return JSONResponse(status_code=200, content=stats)
except Exception as e:
logger.error(f"Failed to get circuit breaker stats: {e}")
raise HTTPException(status_code=500, detail=str(e))
@app.post("/health/circuit-breaker/reset")
async def reset_circuit_breaker_endpoint():
"""
Manually reset the GPU health check circuit breaker.
Useful after fixing GPU issues or for testing purposes.
"""
try:
reset_circuit_breaker()
return JSONResponse(
status_code=200,
content={"message": "Circuit breaker reset successfully"}
)
except Exception as e:
logger.error(f"Failed to reset circuit breaker: {e}")
raise HTTPException(status_code=500, detail=str(e))
if __name__ == "__main__":
import uvicorn
# Perform startup GPU health check
from utils.startup import perform_startup_gpu_check
perform_startup_gpu_check(
required_device="cuda",
auto_reset=True,
exit_on_failure=True
)
# Get configuration from environment variables
host = os.getenv("API_HOST", "0.0.0.0")
port = int(os.getenv("API_PORT", "8000"))
logger.info(f"Starting Whisper REST API server on {host}:{port}")
uvicorn.run(
app,
host=host,
port=port,
log_level="info"
)
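A client-side sketch of the documented submit → poll → fetch workflow, assuming the server runs on the default API_HOST/API_PORT (localhost:8000) and that the audio path exists on the server; it uses only the standard library.

import json
import time
import urllib.request

BASE = "http://localhost:8000"

def post_json(url: str, payload: dict) -> dict:
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def get_json(url: str) -> dict:
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read())

job = post_json(f"{BASE}/jobs", {"audio_path": "/path/to/audio.mp3", "model_name": "tiny"})
job_id = job["job_id"]

while True:
    status = get_json(f"{BASE}/jobs/{job_id}")
    if status["status"] in ("completed", "failed"):
        break
    time.sleep(5)

if status["status"] == "completed":
    print(get_json(f"{BASE}/jobs/{job_id}/result")["result"])
else:
    print("Job failed:", status["error"])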

334
src/servers/whisper_server.py Normal file
View File

@@ -0,0 +1,334 @@
#!/usr/bin/env python3
"""
Faster Whisper-based Speech Recognition MCP Service
Provides high-performance audio transcription with batch processing acceleration and multiple output formats
"""
import os
import sys
import logging
import json
import base64
import tempfile
from pathlib import Path
from typing import Optional
from mcp.server.fastmcp import FastMCP
from core.model_manager import get_model_info
from core.job_queue import JobQueue, JobStatus
from core.gpu_health import HealthMonitor, check_gpu_health as run_gpu_health_check  # aliased: the MCP tool below reuses the name check_gpu_health
from utils.startup import startup_sequence, cleanup_on_shutdown
# Log configuration
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
# Global instances
job_queue: Optional[JobQueue] = None
health_monitor: Optional[HealthMonitor] = None
# Create FastMCP server instance
mcp = FastMCP(
name="fast-whisper-mcp-server",
version="0.2.0",
dependencies=["faster-whisper>=0.9.0", "torch==2.6.0+cu126", "torchaudio==2.6.0+cu126", "numpy>=1.20.0"]
)
@mcp.tool()
def get_model_info_api() -> str:
"""
Get available Whisper model information and system configuration.
Returns available models, devices, languages, and GPU information.
"""
return get_model_info()
@mcp.tool()
def transcribe_async(
audio_path: str,
model_name: str = "large-v3",
device: str = "auto",
compute_type: str = "auto",
language: Optional[str] = None,
output_format: str = "txt",
beam_size: int = 5,
temperature: float = 0.0,
initial_prompt: Optional[str] = None,
output_directory: Optional[str] = None
) -> str:
"""
Submit an audio file for asynchronous transcription.
IMPORTANT: This tool returns immediately with a job_id. Use get_job_status()
to check progress and get_job_result() to retrieve the transcription.
WORKFLOW FOR LLM AGENTS:
1. Call this tool to submit the job
2. You will receive a job_id and queue_position
3. Poll get_job_status(job_id) every 5-10 seconds to check progress
4. When status="completed", call get_job_result(job_id) to get transcription
For long audio files (>10 minutes), expect processing to take several minutes.
You can check queue_position to estimate wait time (each job ~2-5 minutes).
Args:
audio_path: Path to audio file on server
model_name: Whisper model (tiny, base, small, medium, large-v3)
device: Execution device (cuda, auto) - cpu is rejected
compute_type: Computation type (float16, int8, auto)
language: Language code (en, zh, ja, etc.) or auto-detect
output_format: Output format (txt, vtt, srt, json)
beam_size: Beam search size (larger=better quality, slower)
temperature: Sampling temperature (0.0=greedy)
initial_prompt: Optional prompt to guide transcription
output_directory: Where to save result (uses default if not specified)
Returns:
JSON string with job_id, status, queue_position, and instructions
"""
try:
job_info = job_queue.submit_job(
audio_path=audio_path,
model_name=model_name,
device=device,
compute_type=compute_type,
language=language,
output_format=output_format,
beam_size=beam_size,
temperature=temperature,
initial_prompt=initial_prompt,
output_directory=output_directory
)
return json.dumps({
**job_info,
"message": f"Job submitted successfully. Poll get_job_status('{job_info['job_id']}') for updates."
}, indent=2)
except Exception as e:
error_type = type(e).__name__
if "Full" in error_type or "queue is full" in str(e).lower():
error_code = "QUEUE_FULL"
message = f"Job queue is full. Please try again in a few minutes. Error: {str(e)}"
elif "FileNotFoundError" in error_type or "not found" in str(e).lower():
error_code = "INVALID_AUDIO_FILE"
message = f"Audio file not found or invalid. Error: {str(e)}"
elif "RuntimeError" in error_type or "GPU" in str(e):
error_code = "GPU_UNAVAILABLE"
message = f"GPU unavailable. Error: {str(e)}"
elif "ValueError" in error_type or "CPU" in str(e):
error_code = "INVALID_DEVICE"
message = f"Invalid device parameter. Error: {str(e)}"
else:
error_code = "INTERNAL_ERROR"
message = f"Failed to submit job. Error: {str(e)}"
return json.dumps({
"error": error_code,
"message": message
}, indent=2)
@mcp.tool()
def get_job_status(job_id: str) -> str:
"""
Check the status of a transcription job.
Status values:
- "queued": Job is waiting in queue. Check queue_position.
- "running": Job is currently being processed.
- "completed": Transcription finished. Call get_job_result() to retrieve.
- "failed": Job failed. Check error field for details.
Args:
job_id: Job ID from transcribe_async()
Returns:
JSON string with detailed job status including:
- status, queue_position, timestamps, error (if any)
"""
try:
status = job_queue.get_job_status(job_id)
return json.dumps(status, indent=2)
except KeyError:
return json.dumps({
"error": "JOB_NOT_FOUND",
"message": f"Job ID '{job_id}' does not exist. Please check the job_id."
}, indent=2)
except Exception as e:
return json.dumps({
"error": "INTERNAL_ERROR",
"message": f"Failed to get job status. Error: {str(e)}"
}, indent=2)
@mcp.tool()
def get_job_result(job_id: str) -> str:
"""
Retrieve the transcription result for a completed job.
IMPORTANT: Only call this when get_job_status() returns status="completed".
If the job is not completed, this will return an error.
Args:
job_id: Job ID from transcribe_async()
Returns:
Transcription text as a string
Errors:
- "Job not found" if invalid job_id
- "Job not completed yet" if status is not "completed"
- "Result file not found" if transcription file is missing
"""
try:
result_text = job_queue.get_job_result(job_id)
return result_text # Return raw text, not JSON
except KeyError:
return json.dumps({
"error": "JOB_NOT_FOUND",
"message": f"Job ID '{job_id}' does not exist."
}, indent=2)
except ValueError as e:
# Extract status from error message
status_match = str(e).split("Current status: ")
current_status = status_match[1] if len(status_match) > 1 else "unknown"
return json.dumps({
"error": "JOB_NOT_COMPLETED",
"message": f"Job is not completed yet. Current status: {current_status}. Please wait and check again.",
"current_status": current_status
}, indent=2)
except FileNotFoundError as e:
return json.dumps({
"error": "RESULT_FILE_NOT_FOUND",
"message": f"Result file not found. Error: {str(e)}"
}, indent=2)
except Exception as e:
return json.dumps({
"error": "INTERNAL_ERROR",
"message": f"Failed to get job result. Error: {str(e)}"
}, indent=2)
@mcp.tool()
def list_transcription_jobs(
status_filter: Optional[str] = None,
limit: int = 20
) -> str:
"""
List transcription jobs with optional filtering.
Useful for:
- Checking all your submitted jobs
- Finding completed jobs
- Monitoring queue status
Args:
status_filter: Filter by status (queued, running, completed, failed)
limit: Maximum number of jobs to return (default: 20)
Returns:
JSON string with list of jobs
"""
try:
# Parse status filter
status_obj = None
if status_filter:
try:
status_obj = JobStatus(status_filter)
except ValueError:
return json.dumps({
"error": "INVALID_STATUS",
"message": f"Invalid status: {status_filter}. Must be one of: queued, running, completed, failed"
}, indent=2)
jobs = job_queue.list_jobs(status_filter=status_obj, limit=limit)
return json.dumps({
"jobs": jobs,
"total": len(jobs),
"filters": {
"status": status_filter,
"limit": limit
}
}, indent=2)
except Exception as e:
return json.dumps({
"error": "INTERNAL_ERROR",
"message": f"Failed to list jobs. Error: {str(e)}"
}, indent=2)
@mcp.tool()
def check_gpu_health() -> str:
"""
Test GPU availability and performance by running a quick transcription.
This tool loads the tiny model and transcribes a 1-second test audio file
to verify the GPU is working correctly.
Use this when:
- You want to verify GPU is available before submitting large jobs
- You suspect GPU performance issues
- For monitoring/debugging purposes
Returns:
JSON string with detailed GPU status including:
- gpu_available, gpu_working, device_name, memory_info
- test_duration_seconds (GPU: <1s, CPU: 5-10s)
- interpretation message
Note: If this returns gpu_working=false, transcriptions will be very slow.
"""
try:
# Call the core check via its alias to avoid recursing into this tool
status = run_gpu_health_check(expected_device="auto")
# Add interpretation
interpretation = "GPU is healthy and working correctly"
if not status.gpu_available:
interpretation = "GPU not available on this system"
elif not status.gpu_working:
interpretation = f"GPU available but not working correctly: {status.error}"
elif status.test_duration_seconds > 2.0:
interpretation = f"GPU working but performance degraded (test took {status.test_duration_seconds:.2f}s, expected <1s)"
result = status.to_dict()
result["interpretation"] = interpretation
return json.dumps(result, indent=2)
except Exception as e:
return json.dumps({
"error": "GPU_CHECK_FAILED",
"message": f"GPU health check failed. Error: {str(e)}",
"gpu_available": False,
"gpu_working": False
}, indent=2)
if __name__ == "__main__":
print("starting mcp server for whisper stt transcriptor")
# Execute common startup sequence
job_queue, health_monitor = startup_sequence(
service_name="MCP Whisper Server",
require_gpu=True,
initialize_queue=True,
initialize_monitoring=True
)
try:
mcp.run()
finally:
# Cleanup on shutdown
cleanup_on_shutdown(
job_queue=job_queue,
health_monitor=health_monitor,
wait_for_current_job=True
)
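The three job tools above are meant to be called in sequence: submit with transcribe_async, poll with get_job_status, then fetch with get_job_result once the status is "completed". A minimal client-side sketch of that loop follows; call_tool is a hypothetical stand-in for whatever MCP client helper invokes a tool by name, and the JSON field names assume the payloads shown above (adjust if the actual tool output differs).

import json
import time

def wait_for_transcript(call_tool, audio_path: str, poll_seconds: float = 5.0) -> str:
    # call_tool(name, **kwargs) is a hypothetical MCP client helper.
    job = json.loads(call_tool("transcribe_async", audio_path=audio_path))
    job_id = job["job_id"]  # assumed key, matching the job-status payloads above
    while True:
        status = json.loads(call_tool("get_job_status", job_id=job_id))
        if status["status"] == "completed":
            return call_tool("get_job_result", job_id=job_id)  # raw transcript text
        if status["status"] == "failed":
            raise RuntimeError(f"Transcription failed: {status.get('error')}")
        time.sleep(poll_seconds)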

6
src/utils/__init__.py Normal file
View File

@@ -0,0 +1,6 @@
"""
Utility modules for Whisper transcription service.
Includes audio processing, formatters, test audio generation, input validation,
circuit breaker, and startup logic.
"""

View File

@@ -0,0 +1,68 @@
#!/usr/bin/env python3
"""
Audio Processing Module
Responsible for audio file validation and preprocessing
"""
import os
import logging
from typing import Union, Any
from pathlib import Path
from faster_whisper import decode_audio
from utils.input_validation import (
validate_audio_file as validate_audio_file_secure,
sanitize_error_message
)
# Log configuration
logger = logging.getLogger(__name__)
def validate_audio_file(audio_path: str, allowed_dirs: list = None) -> None:
"""
Validate if an audio file is valid (with security checks).
Args:
audio_path: Path to the audio file
allowed_dirs: Optional list of allowed base directories
Raises:
FileNotFoundError: If audio file doesn't exist
ValueError: If audio file format is unsupported or file is empty
OSError: If file size cannot be checked
Returns:
None: If validation passes
"""
try:
# Use secure validation
validate_audio_file_secure(audio_path, allowed_dirs)
except Exception as e:
# Re-raise with sanitized error messages
error_msg = sanitize_error_message(str(e))
if "not found" in str(e).lower():
raise FileNotFoundError(error_msg)
elif "size" in str(e).lower():
raise OSError(error_msg)
else:
raise ValueError(error_msg)
def process_audio(audio_path: str) -> Union[str, Any]:
"""
Process audio file, perform decoding and preprocessing
Args:
audio_path: Path to the audio file
Returns:
Union[str, Any]: Processed audio data or original file path
"""
# Try to preprocess audio using decode_audio to handle more formats
try:
audio_data = decode_audio(audio_path)
logger.info(f"Successfully preprocessed audio: {os.path.basename(audio_path)}")
return audio_data
except Exception as audio_error:
logger.warning(f"Audio preprocessing failed, will use file path directly: {str(audio_error)}")
return audio_path
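For callers, the intended order is: validate first (the exceptions above already carry sanitized messages), then pass the path to process_audio and accept either decoded audio or the original path. A minimal sketch, assuming both functions are in scope (imported from this module):

def load_audio_or_path(audio_path: str, allowed_dirs: list = None):
    """Validate, then decode; returns decoded audio, or the path if decoding falls back."""
    validate_audio_file(audio_path, allowed_dirs)  # raises FileNotFoundError/ValueError/OSError
    return process_audio(audio_path)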

View File

@@ -0,0 +1,291 @@
#!/usr/bin/env python3
"""
Circuit Breaker Pattern Implementation
Prevents repeated failed attempts and provides fail-fast behavior.
Useful for GPU health checks and other operations that may fail repeatedly.
"""
import time
import logging
import threading
from enum import Enum
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Any, Optional
logger = logging.getLogger(__name__)
class CircuitState(Enum):
"""Circuit breaker states."""
CLOSED = "closed" # Normal operation, requests pass through
OPEN = "open" # Circuit is open, requests fail immediately
HALF_OPEN = "half_open" # Testing if circuit can close
@dataclass
class CircuitBreakerConfig:
"""Configuration for circuit breaker."""
failure_threshold: int = 3 # Failures before opening circuit
success_threshold: int = 2 # Successes before closing from half-open
timeout_seconds: int = 60 # Time before attempting half-open
half_open_max_calls: int = 1 # Max calls to test in half-open state
class CircuitBreaker:
"""
Circuit breaker implementation for preventing repeated failures.
Usage:
breaker = CircuitBreaker(name="gpu_health", failure_threshold=3)
@breaker.call
def check_gpu():
# This function will be protected by circuit breaker
return perform_gpu_check()
# Or use decorator:
@breaker.decorator()
def my_function():
return "result"
"""
def __init__(
self,
name: str,
failure_threshold: int = 3,
success_threshold: int = 2,
timeout_seconds: int = 60,
half_open_max_calls: int = 1
):
"""
Initialize circuit breaker.
Args:
name: Name of the circuit (for logging)
failure_threshold: Number of failures before opening
success_threshold: Number of successes to close from half-open
timeout_seconds: Seconds before transitioning to half-open
half_open_max_calls: Max concurrent calls in half-open state
"""
self.name = name
self.config = CircuitBreakerConfig(
failure_threshold=failure_threshold,
success_threshold=success_threshold,
timeout_seconds=timeout_seconds,
half_open_max_calls=half_open_max_calls
)
self._state = CircuitState.CLOSED
self._failure_count = 0
self._success_count = 0
self._last_failure_time: Optional[datetime] = None
self._half_open_calls = 0
self._lock = threading.RLock()
logger.info(
f"Circuit breaker '{name}' initialized: "
f"failure_threshold={failure_threshold}, "
f"timeout={timeout_seconds}s"
)
@property
def state(self) -> CircuitState:
"""Get current circuit state."""
with self._lock:
self._update_state()
return self._state
@property
def is_closed(self) -> bool:
"""Check if circuit is closed (normal operation)."""
return self.state == CircuitState.CLOSED
@property
def is_open(self) -> bool:
"""Check if circuit is open (failing fast)."""
return self.state == CircuitState.OPEN
@property
def is_half_open(self) -> bool:
"""Check if circuit is half-open (testing)."""
return self.state == CircuitState.HALF_OPEN
def _update_state(self):
"""Update state based on timeout and counters."""
if self._state == CircuitState.OPEN:
# Check if timeout has passed
if self._last_failure_time:
elapsed = datetime.utcnow() - self._last_failure_time
if elapsed.total_seconds() >= self.config.timeout_seconds:
logger.info(
f"Circuit '{self.name}': Transitioning to HALF_OPEN "
f"after {elapsed.total_seconds():.0f}s timeout"
)
self._state = CircuitState.HALF_OPEN
self._half_open_calls = 0
self._success_count = 0
def _on_success(self):
"""Handle successful call."""
with self._lock:
if self._state == CircuitState.HALF_OPEN:
self._success_count += 1
logger.debug(
f"Circuit '{self.name}': Success in HALF_OPEN "
f"({self._success_count}/{self.config.success_threshold})"
)
if self._success_count >= self.config.success_threshold:
logger.info(f"Circuit '{self.name}': Closing circuit after successful test")
self._state = CircuitState.CLOSED
self._failure_count = 0
self._success_count = 0
self._last_failure_time = None
elif self._state == CircuitState.CLOSED:
# Reset failure count on success
self._failure_count = 0
def _on_failure(self, error: Exception):
"""Handle failed call."""
with self._lock:
self._failure_count += 1
self._last_failure_time = datetime.utcnow()
if self._state == CircuitState.HALF_OPEN:
logger.warning(
f"Circuit '{self.name}': Failure in HALF_OPEN, reopening circuit"
)
self._state = CircuitState.OPEN
self._success_count = 0
elif self._state == CircuitState.CLOSED:
logger.debug(
f"Circuit '{self.name}': Failure {self._failure_count}/"
f"{self.config.failure_threshold}"
)
if self._failure_count >= self.config.failure_threshold:
logger.warning(
f"Circuit '{self.name}': Opening circuit after "
f"{self._failure_count} failures. "
f"Will retry in {self.config.timeout_seconds}s"
)
self._state = CircuitState.OPEN
self._success_count = 0
def call(self, func: Callable, *args, **kwargs) -> Any:
"""
Execute function with circuit breaker protection.
Args:
func: Function to execute
*args: Positional arguments
**kwargs: Keyword arguments
Returns:
Function result
Raises:
CircuitBreakerOpen: If circuit is open
Exception: Original exception from func if it fails
"""
with self._lock:
self._update_state()
# Check if circuit is open
if self._state == CircuitState.OPEN:
raise CircuitBreakerOpen(
f"Circuit '{self.name}' is OPEN. "
f"Failing fast to prevent repeated failures. "
f"Last failure: {self._last_failure_time.isoformat() if self._last_failure_time else 'unknown'}. "
f"Will retry in {self.config.timeout_seconds}s"
)
# Check half-open call limit
if self._state == CircuitState.HALF_OPEN:
if self._half_open_calls >= self.config.half_open_max_calls:
raise CircuitBreakerOpen(
f"Circuit '{self.name}' is HALF_OPEN with max calls reached. "
f"Please wait for current test to complete."
)
self._half_open_calls += 1
# Execute function
try:
result = func(*args, **kwargs)
self._on_success()
return result
except Exception as e:
self._on_failure(e)
raise
finally:
# Decrement half-open counter
with self._lock:
if self._state == CircuitState.HALF_OPEN:
self._half_open_calls -= 1
def decorator(self):
"""
Decorator for protecting functions with circuit breaker.
Usage:
breaker = CircuitBreaker("my_service")
@breaker.decorator()
def my_function():
return do_something()
"""
def wrapper(func):
def decorated(*args, **kwargs):
return self.call(func, *args, **kwargs)
return decorated
return wrapper
def reset(self):
"""
Manually reset circuit breaker to closed state.
Useful for:
- Testing
- Manual intervention
- Clearing error state
"""
with self._lock:
logger.info(f"Circuit '{self.name}': Manual reset to CLOSED state")
self._state = CircuitState.CLOSED
self._failure_count = 0
self._success_count = 0
self._last_failure_time = None
self._half_open_calls = 0
def get_stats(self) -> dict:
"""
Get circuit breaker statistics.
Returns:
Dictionary with current state and counters
"""
with self._lock:
self._update_state()
return {
"name": self.name,
"state": self._state.value,
"failure_count": self._failure_count,
"success_count": self._success_count,
"last_failure_time": self._last_failure_time.isoformat() if self._last_failure_time else None,
"config": {
"failure_threshold": self.config.failure_threshold,
"success_threshold": self.config.success_threshold,
"timeout_seconds": self.config.timeout_seconds,
}
}
class CircuitBreakerOpen(Exception):
"""Exception raised when circuit breaker is open."""
pass
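A minimal usage sketch for the breaker above, assuming CircuitBreaker and CircuitBreakerOpen are imported from this module; probe_gpu is a hypothetical flaky operation standing in for a real GPU health probe.

import random

def probe_gpu() -> str:
    # Hypothetical stand-in: fails roughly half the time.
    if random.random() < 0.5:
        raise RuntimeError("simulated CUDA failure")
    return "ok"

breaker = CircuitBreaker(name="gpu_health", failure_threshold=3, timeout_seconds=60)

for attempt in range(10):
    try:
        print(attempt, breaker.call(probe_gpu))
    except CircuitBreakerOpen as exc:
        print(attempt, "fast-fail (circuit OPEN):", exc)  # call was skipped entirely
    except RuntimeError as exc:
        print(attempt, "probe failed:", exc)  # counted toward failure_threshold

print(breaker.get_stats())  # current state and counters, for monitoring/debugging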

View File

@@ -0,0 +1,411 @@
#!/usr/bin/env python3
"""
Input Validation and Path Sanitization Module
Provides robust validation for user inputs with security protections
against path traversal, injection attacks, and other malicious inputs.
"""
import os
import re
import logging
from pathlib import Path
from typing import Optional, List
logger = logging.getLogger(__name__)
# Maximum file size (10GB)
MAX_FILE_SIZE_BYTES = 10 * 1024 * 1024 * 1024
# Allowed audio file extensions
ALLOWED_AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".aac"}
# Allowed output formats
ALLOWED_OUTPUT_FORMATS = {"vtt", "srt", "txt", "json"}
# Model name validation (whitelist)
ALLOWED_MODEL_NAMES = {"tiny", "base", "small", "medium", "large-v1", "large-v2", "large-v3"}
# Device validation
ALLOWED_DEVICES = {"cuda", "auto", "cpu"}
# Compute type validation
ALLOWED_COMPUTE_TYPES = {"float16", "int8", "auto"}
class ValidationError(Exception):
"""Base exception for validation errors."""
pass
class PathTraversalError(ValidationError):
"""Exception for path traversal attempts."""
pass
class InvalidFileTypeError(ValidationError):
"""Exception for invalid file types."""
pass
class FileSizeError(ValidationError):
"""Exception for file size issues."""
pass
def sanitize_error_message(error_msg: str, sanitize_paths: bool = True) -> str:
"""
Sanitize error messages to prevent information leakage.
Args:
error_msg: Original error message
sanitize_paths: Whether to sanitize file paths (default: True)
Returns:
Sanitized error message
"""
if not sanitize_paths:
return error_msg
# Replace absolute paths with relative paths
# Pattern: /home/user/... or /media/... or /var/... or /tmp/...
path_pattern = r'(/(?:home|media|var|tmp|opt|usr)/[^\s:,]+)'
def replace_path(match):
full_path = match.group(1)
try:
# Try to get just the filename
basename = os.path.basename(full_path)
return f"<file:{basename}>"
except Exception:
return "<file:redacted>"
sanitized = re.sub(path_pattern, replace_path, error_msg)
# Also sanitize user names if present
sanitized = re.sub(r'/home/([^/]+)/', '/home/<user>/', sanitized)
return sanitized
def validate_path_safe(file_path: str, allowed_dirs: Optional[List[str]] = None) -> Path:
"""
Validate and sanitize a file path to prevent directory traversal attacks.
Args:
file_path: Path to validate
allowed_dirs: Optional list of allowed base directories
Returns:
Resolved Path object
Raises:
PathTraversalError: If path contains traversal attempts
ValidationError: If path is invalid
"""
if not file_path:
raise ValidationError("File path cannot be empty")
# Convert to Path object
try:
path = Path(file_path)
except Exception as e:
raise ValidationError(f"Invalid path format: {sanitize_error_message(str(e))}")
# Check for path traversal attempts
path_str = str(path)
if ".." in path_str:
logger.warning(f"Path traversal attempt detected: {path_str}")
raise PathTraversalError("Path traversal (..) is not allowed")
# Check for null bytes
if "\x00" in path_str:
logger.warning(f"Null byte in path detected: {path_str}")
raise PathTraversalError("Null bytes in path are not allowed")
# Resolve to absolute path (but don't follow symlinks yet)
try:
resolved_path = path.resolve()
except Exception as e:
raise ValidationError(f"Cannot resolve path: {sanitize_error_message(str(e))}")
# If allowed_dirs specified, ensure path is within one of them
if allowed_dirs:
allowed = False
for allowed_dir in allowed_dirs:
try:
allowed_dir_path = Path(allowed_dir).resolve()
# Check if resolved_path is under allowed_dir
resolved_path.relative_to(allowed_dir_path)
allowed = True
break
except ValueError:
# Not relative to this allowed_dir
continue
if not allowed:
logger.warning(
f"Path outside allowed directories: {path_str}, "
f"allowed: {allowed_dirs}"
)
raise PathTraversalError(
f"Path must be within allowed directories. "
f"Allowed: {[os.path.basename(d) for d in allowed_dirs]}"
)
return resolved_path
def validate_audio_file(
file_path: str,
allowed_dirs: Optional[List[str]] = None,
max_size_bytes: int = MAX_FILE_SIZE_BYTES
) -> Path:
"""
Validate audio file path with security checks.
Args:
file_path: Path to audio file
allowed_dirs: Optional list of allowed base directories
max_size_bytes: Maximum allowed file size
Returns:
Validated Path object
Raises:
ValidationError: If validation fails
PathTraversalError: If path traversal detected
FileNotFoundError: If file doesn't exist
InvalidFileTypeError: If file type not allowed
FileSizeError: If file too large
"""
# Validate and sanitize path
validated_path = validate_path_safe(file_path, allowed_dirs)
# Check file exists
if not validated_path.exists():
raise FileNotFoundError(f"Audio file not found: {validated_path.name}")
# Check it's a file (not directory)
if not validated_path.is_file():
raise ValidationError(f"Path is not a file: {validated_path.name}")
# Check file extension
file_ext = validated_path.suffix.lower()
if file_ext not in ALLOWED_AUDIO_EXTENSIONS:
raise InvalidFileTypeError(
f"Unsupported audio format: {file_ext}. "
f"Supported: {', '.join(sorted(ALLOWED_AUDIO_EXTENSIONS))}"
)
# Check file size
try:
file_size = validated_path.stat().st_size
except Exception as e:
raise ValidationError(f"Cannot check file size: {sanitize_error_message(str(e))}")
if file_size == 0:
raise FileSizeError(f"Audio file is empty: {validated_path.name}")
if file_size > max_size_bytes:
raise FileSizeError(
f"File too large: {file_size / (1024**3):.2f}GB. "
f"Maximum: {max_size_bytes / (1024**3):.2f}GB"
)
# Warn for large files (>1GB)
if file_size > 1024 * 1024 * 1024:
logger.warning(
f"Large file: {file_size / (1024**3):.2f}GB, "
f"may require extended processing time"
)
return validated_path
def validate_output_directory(
dir_path: str,
allowed_dirs: Optional[List[str]] = None,
create_if_missing: bool = True
) -> Path:
"""
Validate output directory path.
Args:
dir_path: Directory path
allowed_dirs: Optional list of allowed base directories
create_if_missing: Create directory if it doesn't exist
Returns:
Validated Path object
Raises:
ValidationError: If validation fails
PathTraversalError: If path traversal detected
"""
# Validate and sanitize path
validated_path = validate_path_safe(dir_path, allowed_dirs)
# Create if requested and doesn't exist
if create_if_missing and not validated_path.exists():
try:
validated_path.mkdir(parents=True, exist_ok=True)
logger.info(f"Created output directory: {validated_path}")
except Exception as e:
raise ValidationError(
f"Cannot create output directory: {sanitize_error_message(str(e))}"
)
# Check it's a directory
if validated_path.exists() and not validated_path.is_dir():
raise ValidationError(f"Path exists but is not a directory: {validated_path.name}")
return validated_path
def validate_model_name(model_name: str) -> str:
"""
Validate Whisper model name.
Args:
model_name: Model name to validate
Returns:
Validated model name
Raises:
ValidationError: If model name invalid
"""
if not model_name:
raise ValidationError("Model name cannot be empty")
model_name = model_name.strip().lower()
if model_name not in ALLOWED_MODEL_NAMES:
raise ValidationError(
f"Invalid model name: {model_name}. "
f"Allowed: {', '.join(sorted(ALLOWED_MODEL_NAMES))}"
)
return model_name
def validate_device(device: str) -> str:
"""
Validate device parameter.
Args:
device: Device name to validate
Returns:
Validated device name
Raises:
ValidationError: If device invalid
"""
if not device:
raise ValidationError("Device cannot be empty")
device = device.strip().lower()
if device not in ALLOWED_DEVICES:
raise ValidationError(
f"Invalid device: {device}. "
f"Allowed: {', '.join(sorted(ALLOWED_DEVICES))}"
)
return device
def validate_compute_type(compute_type: str) -> str:
"""
Validate compute type parameter.
Args:
compute_type: Compute type to validate
Returns:
Validated compute type
Raises:
ValidationError: If compute type invalid
"""
if not compute_type:
raise ValidationError("Compute type cannot be empty")
compute_type = compute_type.strip().lower()
if compute_type not in ALLOWED_COMPUTE_TYPES:
raise ValidationError(
f"Invalid compute type: {compute_type}. "
f"Allowed: {', '.join(sorted(ALLOWED_COMPUTE_TYPES))}"
)
return compute_type
def validate_output_format(output_format: str) -> str:
"""
Validate output format parameter.
Args:
output_format: Output format to validate
Returns:
Validated output format
Raises:
ValidationError: If output format invalid
"""
if not output_format:
raise ValidationError("Output format cannot be empty")
output_format = output_format.strip().lower()
if output_format not in ALLOWED_OUTPUT_FORMATS:
raise ValidationError(
f"Invalid output format: {output_format}. "
f"Allowed: {', '.join(sorted(ALLOWED_OUTPUT_FORMATS))}"
)
return output_format
def validate_numeric_range(
value: float,
min_value: float,
max_value: float,
param_name: str
) -> float:
"""
Validate numeric parameter is within range.
Args:
value: Value to validate
min_value: Minimum allowed value
max_value: Maximum allowed value
param_name: Parameter name for error messages
Returns:
Validated value
Raises:
ValidationError: If value out of range
"""
if value < min_value or value > max_value:
raise ValidationError(
f"{param_name} must be between {min_value} and {max_value}, "
f"got {value}"
)
return value
def validate_beam_size(beam_size: int) -> int:
"""Validate beam size parameter."""
return int(validate_numeric_range(beam_size, 1, 20, "beam_size"))
def validate_temperature(temperature: float) -> float:
"""Validate temperature parameter."""
return validate_numeric_range(temperature, 0.0, 1.0, "temperature")
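A minimal sketch of validating one transcription request with the helpers above, assuming they are imported from this module; the path and parameter values are illustrative only.

try:
    audio = validate_audio_file("/data/audio/meeting.mp3", allowed_dirs=["/data/audio"])
    model = validate_model_name("large-v3")
    device = validate_device("auto")
    compute = validate_compute_type("auto")
    fmt = validate_output_format("txt")
    beam = validate_beam_size(5)
except (ValidationError, FileNotFoundError) as exc:
    # Validators raise ValidationError subclasses (or FileNotFoundError for missing files)
    # with already-sanitized messages that are safe to surface to clients.
    print(f"Rejected request: {exc}")
else:
    print(f"OK: {audio.name} -> model={model}, device={device}, compute={compute}, "
          f"format={fmt}, beam={beam}")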

237
src/utils/startup.py Normal file
View File

@@ -0,0 +1,237 @@
#!/usr/bin/env python3
"""
Common Startup Logic Module
Centralizes startup procedures shared between MCP and API servers,
including GPU health checks, job queue initialization, and health monitoring.
"""
import os
import sys
import logging
from typing import Optional, Tuple
logger = logging.getLogger(__name__)
# Import GPU health check with reset
try:
from core.gpu_health import check_gpu_health_with_reset
GPU_HEALTH_CHECK_AVAILABLE = True
except ImportError as e:
logger.warning(f"GPU health check with reset not available: {e}")
GPU_HEALTH_CHECK_AVAILABLE = False
def perform_startup_gpu_check(
required_device: str = "cuda",
auto_reset: bool = True,
exit_on_failure: bool = True
) -> bool:
"""
Perform startup GPU health check with optional auto-reset.
This function:
1. Checks if GPU health check is available
2. Runs comprehensive GPU health check
3. Attempts auto-reset if check fails and auto_reset=True
4. Optionally exits process if check fails
Args:
required_device: Required device ("cuda", "auto")
auto_reset: Enable automatic GPU driver reset on failure
exit_on_failure: Exit process if GPU check fails
Returns:
True if GPU check passed, False otherwise
Side effects:
May exit process if exit_on_failure=True and check fails
"""
if not GPU_HEALTH_CHECK_AVAILABLE:
logger.warning("GPU health check not available, starting without GPU validation")
if exit_on_failure:
logger.error("GPU health check required but not available. Exiting.")
sys.exit(1)
return False
try:
logger.info("=" * 70)
logger.info("PERFORMING STARTUP GPU HEALTH CHECK")
logger.info("=" * 70)
status = check_gpu_health_with_reset(
expected_device=required_device,
auto_reset=auto_reset
)
logger.info("=" * 70)
logger.info("STARTUP GPU CHECK SUCCESSFUL")
logger.info(f"GPU Device: {status.device_name}")
logger.info(f"Memory Available: {status.memory_available_gb:.2f} GB")
logger.info(f"Test Duration: {status.test_duration_seconds:.2f}s")
logger.info("=" * 70)
return True
except Exception as e:
logger.error("=" * 70)
logger.error("STARTUP GPU CHECK FAILED")
logger.error(f"Error: {e}")
if exit_on_failure:
logger.error("This service requires GPU. Terminating.")
logger.error("=" * 70)
sys.exit(1)
else:
logger.error("Continuing without GPU (may have reduced functionality)")
logger.error("=" * 70)
return False
def initialize_job_queue(
max_queue_size: Optional[int] = None,
metadata_dir: Optional[str] = None
) -> 'JobQueue':
"""
Initialize job queue with environment variable configuration.
Args:
max_queue_size: Override for max queue size (uses env var if None)
metadata_dir: Override for metadata directory (uses env var if None)
Returns:
Initialized JobQueue instance (started)
"""
from core.job_queue import JobQueue
# Get configuration from environment
if max_queue_size is None:
max_queue_size = int(os.getenv("JOB_QUEUE_MAX_SIZE", "100"))
if metadata_dir is None:
metadata_dir = os.getenv(
"JOB_METADATA_DIR",
"/media/raid/agents/tools/mcp-transcriptor/outputs/jobs"
)
logger.info("Initializing job queue...")
job_queue = JobQueue(max_queue_size=max_queue_size, metadata_dir=metadata_dir)
job_queue.start()
logger.info(f"Job queue started (max_size={max_queue_size}, metadata_dir={metadata_dir})")
return job_queue
def initialize_health_monitor(
check_interval_minutes: Optional[int] = None,
enabled: Optional[bool] = None
) -> Optional['HealthMonitor']:
"""
Initialize GPU health monitor with environment variable configuration.
Args:
check_interval_minutes: Override for check interval (uses env var if None)
enabled: Override for enabled status (uses env var if None)
Returns:
Initialized HealthMonitor instance (started), or None if disabled
"""
from core.gpu_health import HealthMonitor
# Get configuration from environment
if enabled is None:
enabled = os.getenv("GPU_HEALTH_CHECK_ENABLED", "true").lower() == "true"
if not enabled:
logger.info("GPU health monitoring disabled")
return None
if check_interval_minutes is None:
check_interval_minutes = int(os.getenv("GPU_HEALTH_CHECK_INTERVAL_MINUTES", "10"))
health_monitor = HealthMonitor(check_interval_minutes=check_interval_minutes)
health_monitor.start()
logger.info(f"GPU health monitor started (interval={check_interval_minutes} minutes)")
return health_monitor
def startup_sequence(
service_name: str = "whisper-transcription",
require_gpu: bool = True,
initialize_queue: bool = True,
initialize_monitoring: bool = True
) -> Tuple[Optional['JobQueue'], Optional['HealthMonitor']]:
"""
Execute complete startup sequence for a Whisper transcription server.
This function performs all common startup tasks:
1. GPU health check with auto-reset
2. Job queue initialization
3. Health monitor initialization
Args:
service_name: Name of the service (for logging)
require_gpu: Whether GPU is required (exit if not available)
initialize_queue: Whether to initialize job queue
initialize_monitoring: Whether to initialize health monitoring
Returns:
Tuple of (job_queue, health_monitor) - either may be None
Side effects:
May exit process if GPU required but unavailable
"""
logger.info(f"Starting {service_name}...")
# Step 1: GPU health check
gpu_ok = perform_startup_gpu_check(
required_device="cuda",
auto_reset=True,
exit_on_failure=require_gpu
)
if not gpu_ok and require_gpu:
# Should not reach here (exit_on_failure should have exited)
logger.error("GPU check failed and GPU is required")
sys.exit(1)
# Step 2: Initialize job queue
job_queue = None
if initialize_queue:
job_queue = initialize_job_queue()
# Step 3: Initialize health monitor
health_monitor = None
if initialize_monitoring:
health_monitor = initialize_health_monitor()
logger.info(f"{service_name} startup sequence completed")
return job_queue, health_monitor
def cleanup_on_shutdown(
job_queue: Optional['JobQueue'] = None,
health_monitor: Optional['HealthMonitor'] = None,
wait_for_current_job: bool = True
) -> None:
"""
Perform cleanup on server shutdown.
Args:
job_queue: JobQueue instance to stop (if any)
health_monitor: HealthMonitor instance to stop (if any)
wait_for_current_job: Wait for current job to complete before stopping
"""
logger.info("Shutting down...")
if job_queue:
job_queue.stop(wait_for_current=wait_for_current_job)
logger.info("Job queue stopped")
if health_monitor:
health_monitor.stop()
logger.info("Health monitor stopped")
logger.info("Shutdown complete")

View File

@@ -0,0 +1,96 @@
"""
Test audio generator for GPU health checks.
Generates realistic test audio with speech using TTS (text-to-speech).
"""
import os
import tempfile
def generate_test_audio(duration_seconds: float = 3.0, frequency: int = 440) -> str:
"""
Generate a test audio file with real speech for GPU health checks.
Args:
duration_seconds: Duration of audio in seconds (default: 3.0)
frequency: Legacy parameter, ignored (kept for backward compatibility)
Returns:
str: Path to temporary audio file
Implementation:
- Generate real speech using gTTS (Google Text-to-Speech)
- Fallback to pyttsx3 if gTTS fails or is unavailable
- Raises RuntimeError if both TTS engines fail
- Save as MP3 format
- Store in system temp directory
- Reuse same file if exists (cache)
"""
# Use a consistent filename in temp directory for caching
temp_dir = tempfile.gettempdir()
audio_path = os.path.join(temp_dir, f"whisper_test_voice_{int(duration_seconds)}s.mp3")
# Return cached file if it exists and is valid
if os.path.exists(audio_path):
try:
# Verify file is readable and not empty
if os.path.getsize(audio_path) > 0:
return audio_path
except Exception:
# If file is corrupted, regenerate it
pass
# Generate speech with different text based on duration
if duration_seconds >= 3:
text = "This is a test of the Whisper speech recognition system. Testing one, two, three."
elif duration_seconds >= 2:
text = "This is a test of the Whisper system."
else:
text = "Testing Whisper."
# Try gTTS first (better quality, requires internet)
gtts_error = None
try:
    from gtts import gTTS
    tts = gTTS(text=text, lang='en', slow=False)
    tts.save(audio_path)
    return audio_path
except Exception as e:
    gtts_error = e
    print(f"gTTS failed ({e}), trying pyttsx3...")
# Fallback to pyttsx3 (offline, lower quality)
try:
    import pyttsx3
    engine = pyttsx3.init()
    engine.save_to_file(text, audio_path)
    engine.runAndWait()
    # Verify file was created; treat an empty or missing file as a failure
    if os.path.exists(audio_path) and os.path.getsize(audio_path) > 0:
        return audio_path
    raise RuntimeError("pyttsx3 produced an empty or missing audio file")
except Exception as e:
    raise RuntimeError(
        f"Failed to generate test audio. Both gTTS and pyttsx3 failed. "
        f"gTTS error: {gtts_error}. pyttsx3 error: {e}. "
        f"Please ensure TTS dependencies are installed: pip install gTTS pyttsx3"
    )
def cleanup_test_audio() -> None:
"""
Remove all cached test audio files from temp directory.
Useful for cleanup after testing or to force regeneration.
"""
temp_dir = tempfile.gettempdir()
# Find all test audio files
for filename in os.listdir(temp_dir):
if (filename.startswith("whisper_test_") and
(filename.endswith(".wav") or filename.endswith(".mp3"))):
filepath = os.path.join(temp_dir, filename)
try:
os.remove(filepath)
except Exception:
# Ignore errors during cleanup
pass
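A short usage sketch, assuming both helpers above are in scope: the first call generates (or reuses) the cached TTS clip, the second removes all cached test clips.

path = generate_test_audio(duration_seconds=3.0)
print(f"Test clip: {path} ({os.path.getsize(path)} bytes)")  # os is imported at module top
cleanup_test_audio()  # removes cached whisper_test_* files from the temp directory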

View File

@@ -0,0 +1,24 @@
[program:whisper-api-server]
command=/home/uad/agents/tools/mcp-transcriptor/venv/bin/python /home/uad/agents/tools/mcp-transcriptor/src/servers/api_server.py
directory=/home/uad/agents/tools/mcp-transcriptor
user=uad
autostart=true
autorestart=true
redirect_stderr=true
stdout_logfile=/home/uad/agents/tools/mcp-transcriptor/logs/transcriptor-api.log
stdout_logfile_maxbytes=50MB
stdout_logfile_backups=10
environment=
PYTHONPATH="/home/uad/agents/tools/mcp-transcriptor/src",
CUDA_VISIBLE_DEVICES="0",
API_HOST="0.0.0.0",
API_PORT="8000",
WHISPER_MODEL_DIR="/home/uad/agents/tools/mcp-transcriptor/models",
TRANSCRIPTION_OUTPUT_DIR="/home/uad/agents/tools/mcp-transcriptor/outputs",
TRANSCRIPTION_BATCH_OUTPUT_DIR="/home/uad/agents/tools/mcp-transcriptor/outputs/batch",
TRANSCRIPTION_MODEL="large-v3",
TRANSCRIPTION_DEVICE="auto",
TRANSCRIPTION_COMPUTE_TYPE="auto",
TRANSCRIPTION_OUTPUT_FORMAT="txt"
stopwaitsecs=10
stopsignal=TERM

View File

@@ -0,0 +1,537 @@
#!/usr/bin/env python3
"""
Test Phase 2: Async Job Queue Integration
Tests the async job queue system for both API and MCP servers.
Validates all new endpoints and error handling.
"""
import os
import sys
import time
import json
import logging
import requests
from pathlib import Path
# Configure logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s [%(levelname)s] %(message)s',
datefmt='%Y-%m-%d %H:%M:%S'
)
logger = logging.getLogger(__name__)
# Add src to path (go up one level from tests/ to root)
sys.path.insert(0, str(Path(__file__).parent.parent / "src"))
# Color codes for terminal output
class Colors:
GREEN = '\033[92m'
RED = '\033[91m'
YELLOW = '\033[93m'
BLUE = '\033[94m'
END = '\033[0m'
BOLD = '\033[1m'
def print_success(msg):
print(f"{Colors.GREEN}{msg}{Colors.END}")
def print_error(msg):
print(f"{Colors.RED}{msg}{Colors.END}")
def print_info(msg):
print(f"{Colors.BLUE} {msg}{Colors.END}")
def print_section(msg):
print(f"\n{Colors.BOLD}{Colors.YELLOW}{'='*70}{Colors.END}")
print(f"{Colors.BOLD}{Colors.YELLOW}{msg}{Colors.END}")
print(f"{Colors.BOLD}{Colors.YELLOW}{'='*70}{Colors.END}\n")
class Phase2Tester:
def __init__(self, api_url="http://localhost:8000"):
self.api_url = api_url
self.test_results = []
def test(self, name, func):
"""Run a test and record result"""
try:
logger.info(f"Testing: {name}")
print_info(f"Testing: {name}")
func()
logger.info(f"PASSED: {name}")
print_success(f"PASSED: {name}")
self.test_results.append((name, True, None))
return True
except AssertionError as e:
logger.error(f"FAILED: {name} - {str(e)}")
print_error(f"FAILED: {name}")
print_error(f" Reason: {str(e)}")
self.test_results.append((name, False, str(e)))
return False
except Exception as e:
logger.error(f"ERROR: {name} - {str(e)}")
print_error(f"ERROR: {name}")
print_error(f" Exception: {str(e)}")
self.test_results.append((name, False, f"Exception: {str(e)}"))
return False
def print_summary(self):
"""Print test summary"""
print_section("TEST SUMMARY")
passed = sum(1 for _, result, _ in self.test_results if result)
failed = len(self.test_results) - passed
for name, result, error in self.test_results:
if result:
print_success(f"{name}")
else:
print_error(f"{name}")
if error:
print(f" {error}")
print(f"\n{Colors.BOLD}Total: {len(self.test_results)} | ", end="")
print(f"{Colors.GREEN}Passed: {passed}{Colors.END} | ", end="")
print(f"{Colors.RED}Failed: {failed}{Colors.END}\n")
return failed == 0
# ========================================================================
# API Server Tests
# ========================================================================
def test_api_root_endpoint(self):
"""Test GET / returns new API information"""
logger.info(f"GET {self.api_url}/")
resp = requests.get(f"{self.api_url}/")
logger.info(f"Response status: {resp.status_code}")
assert resp.status_code == 200, f"Expected 200, got {resp.status_code}"
data = resp.json()
logger.info(f"Response data: {json.dumps(data, indent=2)}")
assert data["version"] == "0.2.0", "Version should be 0.2.0"
assert "POST /jobs" in str(data["endpoints"]), "Should have POST /jobs endpoint"
assert "workflow" in data, "Should have workflow documentation"
def test_api_health_endpoint(self):
"""Test GET /health still works"""
logger.info(f"GET {self.api_url}/health")
resp = requests.get(f"{self.api_url}/health")
logger.info(f"Response status: {resp.status_code}")
assert resp.status_code == 200, f"Expected 200, got {resp.status_code}"
data = resp.json()
logger.info(f"Response data: {data}")
assert data["status"] == "healthy", "Health check should return healthy"
def test_api_models_endpoint(self):
"""Test GET /models still works"""
logger.info(f"GET {self.api_url}/models")
resp = requests.get(f"{self.api_url}/models")
logger.info(f"Response status: {resp.status_code}")
assert resp.status_code == 200, f"Expected 200, got {resp.status_code}"
data = resp.json()
logger.info(f"Available models: {data.get('available_models', [])}")
assert "available_models" in data, "Should return available models"
def test_api_gpu_health_endpoint(self):
"""Test GET /health/gpu returns GPU status"""
logger.info(f"GET {self.api_url}/health/gpu")
resp = requests.get(f"{self.api_url}/health/gpu")
logger.info(f"Response status: {resp.status_code}")
assert resp.status_code == 200, f"Expected 200, got {resp.status_code}"
data = resp.json()
logger.info(f"GPU health: {json.dumps(data, indent=2)}")
assert "gpu_available" in data, "Should have gpu_available field"
assert "gpu_working" in data, "Should have gpu_working field"
assert "interpretation" in data, "Should have interpretation field"
print_info(f" GPU Status: {data.get('interpretation', 'unknown')}")
def test_api_submit_job_invalid_audio(self):
"""Test POST /jobs with invalid audio path returns 400"""
payload = {
"audio_path": "/nonexistent/file.mp3",
"model_name": "tiny",
"output_format": "txt"
}
logger.info(f"POST {self.api_url}/jobs with invalid audio path")
logger.info(f"Payload: {json.dumps(payload, indent=2)}")
resp = requests.post(f"{self.api_url}/jobs", json=payload)
logger.info(f"Response status: {resp.status_code}")
logger.info(f"Response: {resp.json()}")
assert resp.status_code == 400, f"Expected 400, got {resp.status_code}"
data = resp.json()
assert "error" in data["detail"], "Should have error field"
assert data["detail"]["error"] == "Invalid audio file", f"Wrong error type: {data['detail']['error']}"
print_info(f" Error message: {data['detail']['message'][:50]}...")
def test_api_submit_job_cpu_device_rejected(self):
"""Test POST /jobs with device=cpu is rejected (400)"""
# Create a test audio file first
logger.info("Creating test audio file...")
test_audio = self._create_test_audio_file()
logger.info(f"Test audio created at: {test_audio}")
payload = {
"audio_path": test_audio,
"model_name": "tiny",
"device": "cpu",
"output_format": "txt"
}
logger.info(f"POST {self.api_url}/jobs with device=cpu")
logger.info(f"Payload: {json.dumps(payload, indent=2)}")
resp = requests.post(f"{self.api_url}/jobs", json=payload)
logger.info(f"Response status: {resp.status_code}")
logger.info(f"Response: {resp.json()}")
assert resp.status_code == 400, f"Expected 400, got {resp.status_code}"
data = resp.json()
assert "error" in data["detail"], "Should have error field"
assert "Invalid device" in data["detail"]["error"] or "CPU" in data["detail"]["message"], \
"Should reject CPU device"
def test_api_submit_job_success(self):
"""Test POST /jobs with valid audio returns job_id"""
logger.info("Creating test audio file...")
test_audio = self._create_test_audio_file()
logger.info(f"Test audio created at: {test_audio}")
payload = {
"audio_path": test_audio,
"model_name": "tiny",
"device": "auto",
"output_format": "txt"
}
logger.info(f"POST {self.api_url}/jobs with valid audio")
logger.info(f"Payload: {json.dumps(payload, indent=2)}")
resp = requests.post(f"{self.api_url}/jobs", json=payload)
logger.info(f"Response status: {resp.status_code}")
logger.info(f"Response: {json.dumps(resp.json(), indent=2)}")
assert resp.status_code == 200, f"Expected 200, got {resp.status_code}"
data = resp.json()
assert "job_id" in data, "Should return job_id"
assert "status" in data, "Should return status"
assert data["status"] == "queued", f"Status should be queued, got {data['status']}"
assert "queue_position" in data, "Should return queue_position"
assert "message" in data, "Should return message"
logger.info(f"Job submitted successfully: {data['job_id']}")
print_info(f" Job ID: {data['job_id']}")
print_info(f" Queue position: {data['queue_position']}")
# Store job_id for later tests
self.test_job_id = data["job_id"]
def test_api_get_job_status(self):
"""Test GET /jobs/{job_id} returns job status"""
if not hasattr(self, 'test_job_id'):
print_info(" Skipping (no test_job_id from previous test)")
return
resp = requests.get(f"{self.api_url}/jobs/{self.test_job_id}")
assert resp.status_code == 200, f"Expected 200, got {resp.status_code}"
data = resp.json()
assert "job_id" in data, "Should return job_id"
assert "status" in data, "Should return status"
assert data["status"] in ["queued", "running", "completed", "failed"], \
f"Invalid status: {data['status']}"
print_info(f" Status: {data['status']}")
def test_api_get_job_status_not_found(self):
"""Test GET /jobs/{job_id} with invalid ID returns 404"""
fake_job_id = "00000000-0000-0000-0000-000000000000"
resp = requests.get(f"{self.api_url}/jobs/{fake_job_id}")
assert resp.status_code == 404, f"Expected 404, got {resp.status_code}"
data = resp.json()
assert "error" in data["detail"], "Should have error field"
assert data["detail"]["error"] == "Job not found", f"Wrong error: {data['detail']['error']}"
def test_api_get_job_result_not_completed(self):
"""Test GET /jobs/{job_id}/result when job not completed returns 409"""
if not hasattr(self, 'test_job_id'):
print_info(" Skipping (no test_job_id from previous test)")
return
# Check current status
status_resp = requests.get(f"{self.api_url}/jobs/{self.test_job_id}")
current_status = status_resp.json()["status"]
if current_status == "completed":
print_info(" Skipping (job already completed)")
return
resp = requests.get(f"{self.api_url}/jobs/{self.test_job_id}/result")
assert resp.status_code == 409, f"Expected 409, got {resp.status_code}"
data = resp.json()
assert "error" in data["detail"], "Should have error field"
assert data["detail"]["error"] == "Job not completed", f"Wrong error: {data['detail']['error']}"
assert "current_status" in data["detail"], "Should include current_status"
def test_api_list_jobs(self):
"""Test GET /jobs returns job list"""
resp = requests.get(f"{self.api_url}/jobs")
assert resp.status_code == 200, f"Expected 200, got {resp.status_code}"
data = resp.json()
assert "jobs" in data, "Should have jobs field"
assert "total" in data, "Should have total field"
assert isinstance(data["jobs"], list), "Jobs should be a list"
print_info(f" Total jobs: {data['total']}")
def test_api_list_jobs_with_filter(self):
"""Test GET /jobs?status=queued filters by status"""
resp = requests.get(f"{self.api_url}/jobs?status=queued&limit=10")
assert resp.status_code == 200, f"Expected 200, got {resp.status_code}"
data = resp.json()
assert "jobs" in data, "Should have jobs field"
assert "filters" in data, "Should have filters field"
assert data["filters"]["status"] == "queued", "Filter should be applied"
# All returned jobs should be queued
for job in data["jobs"]:
assert job["status"] == "queued", f"Job {job['job_id']} has wrong status: {job['status']}"
def test_api_wait_for_job_completion(self):
"""Test waiting for job to complete and retrieving result"""
if not hasattr(self, 'test_job_id'):
logger.warning("Skipping - no test_job_id from previous test")
print_info(" Skipping (no test_job_id from previous test)")
return
logger.info(f"Waiting for job {self.test_job_id} to complete (max 60s)...")
print_info(" Waiting for job to complete (max 60s)...")
max_wait = 60
start_time = time.time()
while time.time() - start_time < max_wait:
resp = requests.get(f"{self.api_url}/jobs/{self.test_job_id}")
data = resp.json()
status = data["status"]
elapsed = int(time.time() - start_time)
logger.info(f"Job status: {status} (elapsed: {elapsed}s)")
print_info(f" Status: {status} (elapsed: {elapsed}s)")
if status == "completed":
logger.info("Job completed successfully!")
print_success(" Job completed!")
# Now get the result
logger.info("Fetching job result...")
result_resp = requests.get(f"{self.api_url}/jobs/{self.test_job_id}/result")
logger.info(f"Result response status: {result_resp.status_code}")
assert result_resp.status_code == 200, f"Expected 200, got {result_resp.status_code}"
result_data = result_resp.json()
logger.info(f"Result data keys: {result_data.keys()}")
assert "result" in result_data, "Should have result field"
assert len(result_data["result"]) > 0, "Result should not be empty"
actual_text = result_data["result"].strip()
logger.info(f"Transcription result: '{actual_text}'")
print_info(f" Transcription: '{actual_text}'")
return
elif status == "failed":
error_msg = f"Job failed: {data.get('error', 'unknown error')}"
logger.error(error_msg)
raise AssertionError(error_msg)
time.sleep(2)
error_msg = f"Job did not complete within {max_wait}s"
logger.error(error_msg)
raise AssertionError(error_msg)
# ========================================================================
# MCP Server Tests (Import-based)
# ========================================================================
def test_mcp_imports(self):
"""Test MCP server modules can be imported"""
try:
logger.info("Importing MCP server module...")
from servers import whisper_server
logger.info("Checking for new async tools...")
assert hasattr(whisper_server, 'transcribe_async'), "Should have transcribe_async tool"
assert hasattr(whisper_server, 'get_job_status'), "Should have get_job_status tool"
assert hasattr(whisper_server, 'get_job_result'), "Should have get_job_result tool"
assert hasattr(whisper_server, 'list_transcription_jobs'), "Should have list_transcription_jobs tool"
assert hasattr(whisper_server, 'check_gpu_health'), "Should have check_gpu_health tool"
assert hasattr(whisper_server, 'get_model_info_api'), "Should have get_model_info_api tool"
logger.info("All new tools found!")
# Verify old tools are removed
logger.info("Verifying old tools are removed...")
assert not hasattr(whisper_server, 'transcribe'), "Old transcribe tool should be removed"
assert not hasattr(whisper_server, 'batch_transcribe_audio'), "Old batch_transcribe_audio tool should be removed"
logger.info("Old tools successfully removed!")
except ImportError as e:
logger.error(f"Failed to import MCP server: {e}")
raise AssertionError(f"Failed to import MCP server: {e}")
def test_job_queue_integration(self):
"""Test JobQueue integration is working"""
from core.job_queue import JobQueue, JobStatus
# Create a test queue
test_queue = JobQueue(max_queue_size=5, metadata_dir="/tmp/test_job_queue")
try:
# Verify it can be started
test_queue.start()
assert test_queue._worker_thread is not None, "Worker thread should be created"
assert test_queue._worker_thread.is_alive(), "Worker thread should be running"
finally:
# Clean up
test_queue.stop(wait_for_current=False)
def test_health_monitor_integration(self):
"""Test HealthMonitor integration is working"""
from core.gpu_health import HealthMonitor
# Create a test monitor
test_monitor = HealthMonitor(check_interval_minutes=60) # Long interval
try:
# Verify it can be started
test_monitor.start()
assert test_monitor._thread is not None, "Monitor thread should be created"
assert test_monitor._thread.is_alive(), "Monitor thread should be running"
# Check we can get status
status = test_monitor.get_latest_status()
assert status is not None, "Should have initial status"
finally:
# Clean up
test_monitor.stop()
# ========================================================================
# Helper Methods
# ========================================================================
def _create_test_audio_file(self):
"""Get the path to the test audio file"""
# Use relative path from project root
project_root = Path(__file__).parent.parent
test_audio_path = str(project_root / "data" / "test.mp3")
if not os.path.exists(test_audio_path):
raise FileNotFoundError(f"Test audio file not found: {test_audio_path}")
return test_audio_path
def main():
print_section("PHASE 2: ASYNC JOB QUEUE INTEGRATION TESTS")
logger.info("=" * 70)
logger.info("PHASE 2: ASYNC JOB QUEUE INTEGRATION TESTS")
logger.info("=" * 70)
# Check if API server is running
api_url = os.getenv("API_URL", "http://localhost:8000")
logger.info(f"Testing API server at: {api_url}")
print_info(f"Testing API server at: {api_url}")
try:
logger.info("Checking API server health...")
resp = requests.get(f"{api_url}/health", timeout=2)
logger.info(f"Health check status: {resp.status_code}")
if resp.status_code != 200:
logger.error(f"API server not responding correctly at {api_url}")
print_error(f"API server not responding correctly at {api_url}")
print_error("Please start the API server with: ./run_api_server.sh")
return 1
except requests.exceptions.RequestException as e:
logger.error(f"Cannot connect to API server: {e}")
print_error(f"Cannot connect to API server at {api_url}")
print_error("Please start the API server with: ./run_api_server.sh")
return 1
logger.info(f"API server is running at {api_url}")
print_success(f"API server is running at {api_url}")
# Create tester
tester = Phase2Tester(api_url=api_url)
# ========================================================================
# Run API Tests
# ========================================================================
print_section("API SERVER TESTS")
logger.info("Starting API server tests...")
tester.test("API Root Endpoint", tester.test_api_root_endpoint)
tester.test("API Health Endpoint", tester.test_api_health_endpoint)
tester.test("API Models Endpoint", tester.test_api_models_endpoint)
tester.test("API GPU Health Endpoint", tester.test_api_gpu_health_endpoint)
print_section("API JOB SUBMISSION TESTS")
tester.test("Submit Job - Invalid Audio (400)", tester.test_api_submit_job_invalid_audio)
tester.test("Submit Job - CPU Device Rejected (400)", tester.test_api_submit_job_cpu_device_rejected)
tester.test("Submit Job - Success (200)", tester.test_api_submit_job_success)
print_section("API JOB STATUS TESTS")
tester.test("Get Job Status - Success", tester.test_api_get_job_status)
tester.test("Get Job Status - Not Found (404)", tester.test_api_get_job_status_not_found)
tester.test("Get Job Result - Not Completed (409)", tester.test_api_get_job_result_not_completed)
print_section("API JOB LISTING TESTS")
tester.test("List Jobs", tester.test_api_list_jobs)
tester.test("List Jobs - With Filter", tester.test_api_list_jobs_with_filter)
print_section("API JOB COMPLETION TEST")
tester.test("Wait for Job Completion & Get Result", tester.test_api_wait_for_job_completion)
# ========================================================================
# Run MCP Tests
# ========================================================================
print_section("MCP SERVER TESTS")
logger.info("Starting MCP server tests...")
tester.test("MCP Module Imports", tester.test_mcp_imports)
tester.test("JobQueue Integration", tester.test_job_queue_integration)
tester.test("HealthMonitor Integration", tester.test_health_monitor_integration)
# ========================================================================
# Print Summary
# ========================================================================
logger.info("All tests completed, generating summary...")
success = tester.print_summary()
if success:
logger.info("ALL TESTS PASSED!")
print_section("ALL TESTS PASSED! ✓")
return 0
else:
logger.error("SOME TESTS FAILED!")
print_section("SOME TESTS FAILED! ✗")
return 1
if __name__ == "__main__":
sys.exit(main())
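Distilled from the tests above, the async REST workflow a client follows is: submit the job, poll its status, then fetch the result once completed. A minimal sketch (endpoint paths and field names are the ones asserted above; the audio path is illustrative):

import time
import requests

API = "http://localhost:8000"

job = requests.post(f"{API}/jobs", json={
    "audio_path": "/path/to/audio.mp3",  # illustrative; must be a real, allowed file
    "model_name": "tiny",
    "device": "auto",
    "output_format": "txt",
}).json()
job_id = job["job_id"]

while True:
    status = requests.get(f"{API}/jobs/{job_id}").json()["status"]
    if status in ("completed", "failed"):
        break
    time.sleep(2)

if status == "completed":
    print(requests.get(f"{API}/jobs/{job_id}/result").json()["result"])
else:
    print("Job failed")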

287
tests/test_core_components.py Executable file
View File

@@ -0,0 +1,287 @@
#!/usr/bin/env python3
"""
Test script for Phase 1 components.
Tests:
1. Test audio file validation
2. GPU health check
3. Job queue operations
IMPORTANT: This service requires GPU. Tests will fail if GPU is not available.
"""
import sys
import os
import time
import torch
import logging
# Configure logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(levelname)s - %(message)s',
datefmt='%H:%M:%S'
)
# Add src to path (go up one level from tests/ to root)
sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..', 'src'))
from core.gpu_health import check_gpu_health, HealthMonitor
from core.job_queue import JobQueue, JobStatus
def check_gpu_available():
"""
Check if GPU is available. Exit if not.
This service requires GPU and will not run on CPU.
"""
print("\n" + "="*60)
print("GPU REQUIREMENT CHECK")
print("="*60)
if not torch.cuda.is_available():
print("✗ CUDA not available - GPU is required for this service")
print(" This service is configured for GPU-only operation")
print(" Please ensure CUDA is properly installed and GPU is accessible")
print("="*60)
sys.exit(1)
gpu_name = torch.cuda.get_device_name(0)
gpu_memory = torch.cuda.get_device_properties(0).total_memory / (1024**3)
print(f"✓ GPU available: {gpu_name}")
print(f"✓ GPU memory: {gpu_memory:.2f} GB")
print("="*60)
def test_audio_file():
"""Test audio file existence and validity."""
print("\n" + "="*60)
print("TEST 1: Test Audio File")
print("="*60)
try:
# Use the actual test audio file (relative to project root)
project_root = os.path.join(os.path.dirname(__file__), '..')
audio_path = os.path.join(project_root, "data/test.mp3")
# Verify file exists
assert os.path.exists(audio_path), "Audio file not found"
print(f"✓ Test audio file exists: {audio_path}")
# Verify file is not empty
file_size = os.path.getsize(audio_path)
assert file_size > 0, "Audio file is empty"
print(f"✓ Audio file size: {file_size} bytes")
return True
except Exception as e:
print(f"✗ Audio file test failed: {e}")
import traceback
traceback.print_exc()
return False
def test_gpu_health():
"""Test GPU health check."""
print("\n" + "="*60)
print("TEST 2: GPU Health Check")
print("="*60)
try:
# Test with cuda device (enforcing GPU requirement)
print("\nRunning health check with device='cuda'...")
logging.info("Starting GPU health check...")
status = check_gpu_health(expected_device="cuda")
logging.info("GPU health check completed")
print(f"✓ Health check completed")
print(f" - GPU available: {status.gpu_available}")
print(f" - GPU working: {status.gpu_working}")
print(f" - Device used: {status.device_used}")
print(f" - Device name: {status.device_name}")
print(f" - Memory total: {status.memory_total_gb:.2f} GB")
print(f" - Memory available: {status.memory_available_gb:.2f} GB")
print(f" - Test duration: {status.test_duration_seconds:.2f}s")
print(f" - Error: {status.error}")
# Test health monitor
print("\nTesting HealthMonitor...")
monitor = HealthMonitor(check_interval_minutes=1)
monitor.start()
print("✓ Health monitor started")
time.sleep(1)
latest = monitor.get_latest_status()
assert latest is not None, "No status available from monitor"
print(f"✓ Latest status retrieved: {latest.device_used}")
monitor.stop()
print("✓ Health monitor stopped")
return True
except Exception as e:
print(f"✗ GPU health test failed: {e}")
import traceback
traceback.print_exc()
return False
def test_job_queue():
"""Test job queue operations."""
print("\n" + "="*60)
print("TEST 3: Job Queue")
print("="*60)
# Create temp directory for testing
import tempfile
temp_dir = tempfile.mkdtemp(prefix="job_queue_test_")
print(f"Using temp directory: {temp_dir}")
try:
# Initialize job queue
print("\nInitializing job queue...")
job_queue = JobQueue(max_queue_size=10, metadata_dir=temp_dir)
job_queue.start()
print("✓ Job queue started")
# Use the actual test audio file (relative to project root)
project_root = os.path.join(os.path.dirname(__file__), '..')
audio_path = os.path.join(project_root, "data/test.mp3")
# Test job submission
print("\nSubmitting test job...")
logging.info("Submitting transcription job to queue...")
job_info = job_queue.submit_job(
audio_path=audio_path,
model_name="tiny",
device="cuda", # Enforcing GPU requirement
output_format="txt"
)
job_id = job_info["job_id"]
logging.info(f"Job submitted: {job_id}")
print(f"✓ Job submitted: {job_id}")
print(f" - Status: {job_info['status']}")
print(f" - Queue position: {job_info['queue_position']}")
# Test job status retrieval
print("\nRetrieving job status...")
logging.info("About to call get_job_status()...")
status = job_queue.get_job_status(job_id)
logging.info(f"get_job_status() returned: {status['status']}")
print(f"✓ Job status retrieved")
print(f" - Status: {status['status']}")
print(f" - Queue position: {status['queue_position']}")
# Wait for job to process
print("\nWaiting for job to process (max 30 seconds)...", flush=True)
logging.info("Waiting for transcription to complete...")
max_wait = 30
start = time.time()
while time.time() - start < max_wait:
logging.info("Calling get_job_status()...")
status = job_queue.get_job_status(job_id)
print(f" Status: {status['status']}", flush=True)
logging.info(f"Job status: {status['status']}")
if status['status'] in ['completed', 'failed']:
logging.info("Job completed or failed, breaking out of loop")
break
logging.info("Job still running, sleeping 2 seconds...")
time.sleep(2)
final_status = job_queue.get_job_status(job_id)
print(f"\nFinal job status: {final_status['status']}")
if final_status['status'] == 'completed':
print(f"✓ Job completed successfully")
print(f" - Result path: {final_status['result_path']}")
print(f" - Processing time: {final_status['processing_time_seconds']:.2f}s")
# Test result retrieval
print("\nRetrieving job result...")
logging.info("Calling get_job_result()...")
result = job_queue.get_job_result(job_id)
logging.info(f"Result retrieved: {len(result)} characters")
print(f"✓ Result retrieved ({len(result)} characters)")
print(f" Preview: {result[:100]}...")
elif final_status['status'] == 'failed':
print(f"✗ Job failed: {final_status['error']}")
# Test persistence by stopping and restarting
print("\nTesting persistence...")
logging.info("Stopping job queue...")
job_queue.stop(wait_for_current=False)
print("✓ Job queue stopped")
logging.info("Job queue stopped")
logging.info("Restarting job queue...")
job_queue2 = JobQueue(max_queue_size=10, metadata_dir=temp_dir)
job_queue2.start()
print("✓ Job queue restarted")
logging.info("Job queue restarted")
logging.info("Checking job status after restart...")
status_after_restart = job_queue2.get_job_status(job_id)
print(f"✓ Job still exists after restart: {status_after_restart['status']}")
logging.info(f"Job status after restart: {status_after_restart['status']}")
logging.info("Stopping job queue 2...")
job_queue2.stop()
logging.info("Job queue 2 stopped")
# Cleanup
import shutil
shutil.rmtree(temp_dir)
print(f"✓ Cleaned up temp directory")
return final_status['status'] == 'completed'
except Exception as e:
print(f"✗ Job queue test failed: {e}")
import traceback
traceback.print_exc()
return False
def main():
"""Run all tests."""
print("\n" + "="*60)
print("PHASE 1 COMPONENT TESTS")
print("="*60)
# Check GPU availability first - exit if no GPU
check_gpu_available()
results = {
"Test Audio File": test_audio_file(),
"GPU Health Check": test_gpu_health(),
"Job Queue": test_job_queue(),
}
print("\n" + "="*60)
print("TEST SUMMARY")
print("="*60)
for test_name, passed in results.items():
status = "✓ PASSED" if passed else "✗ FAILED"
print(f"{test_name:.<40} {status}")
all_passed = all(results.values())
print("\n" + "="*60)
if all_passed:
print("ALL TESTS PASSED ✓")
else:
print("SOME TESTS FAILED ✗")
print("="*60)
return 0 if all_passed else 1
if __name__ == "__main__":
sys.exit(main())
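The same flow can be driven through the in-process JobQueue API exercised above rather than the REST endpoints. A minimal sketch, assuming src/ is on the import path as in these tests; the paths are illustrative.

import time
from core.job_queue import JobQueue

queue = JobQueue(max_queue_size=10, metadata_dir="/tmp/whisper_jobs")  # illustrative dir
queue.start()
try:
    info = queue.submit_job(audio_path="data/test.mp3", model_name="tiny",
                            device="cuda", output_format="txt")
    while queue.get_job_status(info["job_id"])["status"] not in ("completed", "failed"):
        time.sleep(2)
    if queue.get_job_status(info["job_id"])["status"] == "completed":
        print(queue.get_job_result(info["job_id"]))
finally:
    queue.stop(wait_for_current=False)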

523
tests/test_e2e_integration.py Executable file
View File

@@ -0,0 +1,523 @@
#!/usr/bin/env python3
"""
Test Phase 4: End-to-End Integration Testing
Comprehensive integration tests for the async job queue system.
Tests all scenarios from the DEV_PLAN.md Phase 4 checklist.
"""
import os
import sys
import time
import json
import logging
import requests
import subprocess
import signal
from pathlib import Path
from datetime import datetime
# Configure logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s [%(levelname)s] %(message)s',
datefmt='%Y-%m-%d %H:%M:%S'
)
logger = logging.getLogger(__name__)
# Add src to path (go up one level from tests/ to root)
sys.path.insert(0, str(Path(__file__).parent.parent / "src"))
# Color codes for terminal output
class Colors:
GREEN = '\033[92m'
RED = '\033[91m'
YELLOW = '\033[93m'
BLUE = '\033[94m'
CYAN = '\033[96m'
END = '\033[0m'
BOLD = '\033[1m'
def print_success(msg):
print(f"{Colors.GREEN}{msg}{Colors.END}")
def print_error(msg):
print(f"{Colors.RED}{msg}{Colors.END}")
def print_info(msg):
print(f"{Colors.BLUE} {msg}{Colors.END}")
def print_warning(msg):
print(f"{Colors.YELLOW}{msg}{Colors.END}")
def print_section(msg):
print(f"\n{Colors.BOLD}{Colors.YELLOW}{'='*70}{Colors.END}")
print(f"{Colors.BOLD}{Colors.YELLOW}{msg}{Colors.END}")
print(f"{Colors.BOLD}{Colors.YELLOW}{'='*70}{Colors.END}\n")
class Phase4Tester:
def __init__(self, api_url="http://localhost:8000", test_audio=None):
self.api_url = api_url
# Use relative path from project root if not provided
if test_audio is None:
project_root = Path(__file__).parent.parent
test_audio = str(project_root / "data" / "test.mp3")
self.test_audio = test_audio
self.test_results = []
self.server_process = None
# Verify test audio exists
if not os.path.exists(test_audio):
raise FileNotFoundError(f"Test audio file not found: {test_audio}")
def test(self, name, func):
"""Run a test and record result"""
try:
logger.info(f"Testing: {name}")
print_info(f"Testing: {name}")
func()
logger.info(f"PASSED: {name}")
print_success(f"PASSED: {name}")
self.test_results.append((name, True, None))
return True
except AssertionError as e:
logger.error(f"FAILED: {name} - {str(e)}")
print_error(f"FAILED: {name}")
print_error(f" Reason: {str(e)}")
self.test_results.append((name, False, str(e)))
return False
except Exception as e:
logger.error(f"ERROR: {name} - {str(e)}")
print_error(f"ERROR: {name}")
print_error(f" Exception: {str(e)}")
self.test_results.append((name, False, f"Exception: {str(e)}"))
return False
def start_api_server(self, wait_time=5):
"""Start the API server in background"""
print_info("Starting API server...")
# Script is in project root, one level up from tests/
script_path = Path(__file__).parent.parent / "run_api_server.sh"
# Start server in background
self.server_process = subprocess.Popen(
[str(script_path)],
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
preexec_fn=os.setsid
)
# Wait for server to start
time.sleep(wait_time)
# Verify server is running
try:
response = requests.get(f"{self.api_url}/health", timeout=5)
if response.status_code == 200:
print_success("API server started successfully")
return True
except requests.exceptions.RequestException:
pass
print_error("API server failed to start")
return False
def stop_api_server(self):
"""Stop the API server"""
if self.server_process:
print_info("Stopping API server...")
os.killpg(os.getpgid(self.server_process.pid), signal.SIGTERM)
self.server_process.wait(timeout=10)
print_success("API server stopped")
def wait_for_job_completion(self, job_id, timeout=60, poll_interval=2):
"""Poll job status until completed or failed"""
start_time = time.time()
last_status = None
while time.time() - start_time < timeout:
try:
response = requests.get(f"{self.api_url}/jobs/{job_id}")
assert response.status_code == 200, f"Failed to get job status: {response.status_code}"
status_data = response.json()
current_status = status_data['status']
# Print status changes
if current_status != last_status:
if status_data.get('queue_position') is not None:
print_info(f" Job status: {current_status}, queue position: {status_data['queue_position']}")
else:
print_info(f" Job status: {current_status}")
last_status = current_status
if current_status == "completed":
return status_data
elif current_status == "failed":
raise AssertionError(f"Job failed: {status_data.get('error', 'Unknown error')}")
time.sleep(poll_interval)
except requests.exceptions.RequestException as e:
raise AssertionError(f"Request failed: {e}")
raise AssertionError(f"Job did not complete within {timeout} seconds")
# ========================================================================
# TEST 1: Single Job Submission and Completion
# ========================================================================
def test_single_job_flow(self):
"""Test complete job flow: submit → poll → get result"""
# Submit job
print_info(" Submitting job...")
response = requests.post(f"{self.api_url}/jobs", json={
"audio_path": self.test_audio,
"model_name": "large-v3",
"output_format": "txt"
})
assert response.status_code == 200, f"Job submission failed: {response.status_code}"
job_data = response.json()
assert 'job_id' in job_data, "No job_id in response"
# Status can be 'queued' or 'running' (if queue is empty and job starts immediately)
assert job_data['status'] in ['queued', 'running'], f"Expected status 'queued' or 'running', got '{job_data['status']}'"
job_id = job_data['job_id']
print_success(f" Job submitted: {job_id}")
# Wait for completion
print_info(" Waiting for job completion...")
final_status = self.wait_for_job_completion(job_id)
assert final_status['status'] == 'completed', "Job did not complete"
assert final_status['result_path'] is not None, "No result_path in completed job"
assert final_status['processing_time_seconds'] is not None, "No processing time"
print_success(f" Job completed in {final_status['processing_time_seconds']:.2f}s")
# Get result
print_info(" Retrieving result...")
response = requests.get(f"{self.api_url}/jobs/{job_id}/result")
assert response.status_code == 200, f"Failed to get result: {response.status_code}"
result_text = response.text
assert len(result_text) > 0, "Empty result"
print_success(f" Got result: {len(result_text)} characters")
# ========================================================================
# TEST 2: Multiple Jobs in Queue (FIFO)
# ========================================================================
def test_multiple_jobs_fifo(self):
"""Test multiple jobs are processed in FIFO order"""
job_ids = []
# Submit 3 jobs
print_info(" Submitting 3 jobs...")
for i in range(3):
response = requests.post(f"{self.api_url}/jobs", json={
"audio_path": self.test_audio,
"model_name": "tiny", # Use tiny model for faster processing
"output_format": "txt"
})
assert response.status_code == 200, f"Job {i+1} submission failed"
job_data = response.json()
job_ids.append(job_data['job_id'])
print_info(f" Job {i+1} submitted: {job_data['job_id']}, queue_position: {job_data.get('queue_position', 0)}")
# Wait for all jobs to complete
print_info(" Waiting for all jobs to complete...")
for i, job_id in enumerate(job_ids):
print_info(f" Waiting for job {i+1}/{len(job_ids)}...")
final_status = self.wait_for_job_completion(job_id, timeout=120)
assert final_status['status'] == 'completed', f"Job {i+1} failed"
print_success(f" All {len(job_ids)} jobs completed in FIFO order")
# ========================================================================
# TEST 3: GPU Health Check
# ========================================================================
def test_gpu_health_check(self):
"""Test GPU health check endpoint"""
print_info(" Checking GPU health...")
response = requests.get(f"{self.api_url}/health/gpu")
assert response.status_code == 200, f"GPU health check failed: {response.status_code}"
health_data = response.json()
assert 'gpu_available' in health_data, "Missing gpu_available field"
assert 'gpu_working' in health_data, "Missing gpu_working field"
assert 'device_used' in health_data, "Missing device_used field"
print_info(f" GPU Available: {health_data['gpu_available']}")
print_info(f" GPU Working: {health_data['gpu_working']}")
print_info(f" Device: {health_data['device_used']}")
if health_data['gpu_available']:
assert health_data['device_name'], "GPU available but no device_name"
assert health_data['test_duration_seconds'] < 3, "GPU test took too long (might be using CPU)"
print_success(f" GPU is healthy: {health_data['device_name']}")
else:
print_warning(" GPU not available on this system")
# ========================================================================
# TEST 4: Invalid Audio Path
# ========================================================================
def test_invalid_audio_path(self):
"""Test job submission with invalid audio path"""
print_info(" Submitting job with invalid path...")
response = requests.post(f"{self.api_url}/jobs", json={
"audio_path": "/invalid/path/does/not/exist.mp3",
"model_name": "large-v3"
})
# Should return 400 Bad Request
assert response.status_code == 400, f"Expected 400, got {response.status_code}"
error_data = response.json()
assert 'detail' in error_data or 'error' in error_data, "No error message in response"
print_success(" Invalid path rejected correctly")
# ========================================================================
# TEST 5: Job Not Found
# ========================================================================
def test_job_not_found(self):
"""Test retrieving non-existent job"""
print_info(" Requesting non-existent job...")
fake_job_id = "00000000-0000-0000-0000-000000000000"
response = requests.get(f"{self.api_url}/jobs/{fake_job_id}")
assert response.status_code == 404, f"Expected 404, got {response.status_code}"
print_success(" Non-existent job handled correctly")
# ========================================================================
# TEST 6: Result Before Completion
# ========================================================================
def test_result_before_completion(self):
"""Test getting result for job that hasn't completed"""
print_info(" Submitting job and trying to get result immediately...")
# Submit job
response = requests.post(f"{self.api_url}/jobs", json={
"audio_path": self.test_audio,
"model_name": "large-v3"
})
assert response.status_code == 200
job_id = response.json()['job_id']
# Try to get result immediately (job is still queued/running)
time.sleep(0.5)
response = requests.get(f"{self.api_url}/jobs/{job_id}/result")
# Should return 409 Conflict or similar
assert response.status_code in [409, 400, 404], f"Expected 4xx error, got {response.status_code}"
print_success(" Result request before completion handled correctly")
# Clean up: wait for job to complete
self.wait_for_job_completion(job_id)
# ========================================================================
# TEST 7: List Jobs
# ========================================================================
def test_list_jobs(self):
"""Test listing jobs with filters"""
print_info(" Testing job listing...")
# List all jobs
response = requests.get(f"{self.api_url}/jobs")
assert response.status_code == 200, f"List jobs failed: {response.status_code}"
jobs_data = response.json()
assert 'jobs' in jobs_data, "No jobs array in response"
assert isinstance(jobs_data['jobs'], list), "Jobs is not a list"
print_info(f" Found {len(jobs_data['jobs'])} jobs")
# List only completed jobs
response = requests.get(f"{self.api_url}/jobs?status=completed")
assert response.status_code == 200
completed_jobs = response.json()['jobs']
print_info(f" Found {len(completed_jobs)} completed jobs")
# List with limit
response = requests.get(f"{self.api_url}/jobs?limit=5")
assert response.status_code == 200
limited_jobs = response.json()['jobs']
assert len(limited_jobs) <= 5, "Limit not respected"
print_success(" Job listing works correctly")
# ========================================================================
# TEST 8: Server Restart with Job Persistence
# ========================================================================
def test_server_restart_persistence(self):
"""Test that jobs persist across server restarts"""
print_info(" Testing job persistence across restart...")
# Submit a job
response = requests.post(f"{self.api_url}/jobs", json={
"audio_path": self.test_audio,
"model_name": "tiny"
})
assert response.status_code == 200
job_id = response.json()['job_id']
print_info(f" Submitted job: {job_id}")
# Get job count before restart
response = requests.get(f"{self.api_url}/jobs")
jobs_before = len(response.json()['jobs'])
print_info(f" Jobs before restart: {jobs_before}")
# Restart server
print_info(" Restarting server...")
self.stop_api_server()
time.sleep(2)
assert self.start_api_server(wait_time=8), "Server failed to restart"
# Check jobs after restart
response = requests.get(f"{self.api_url}/jobs")
assert response.status_code == 200
jobs_after = len(response.json()['jobs'])
print_info(f" Jobs after restart: {jobs_after}")
# Check our specific job is still there (this is the key test)
response = requests.get(f"{self.api_url}/jobs/{job_id}")
assert response.status_code == 200, "Job not found after restart"
# Note: Total count may differ due to job retention/cleanup, but persistence works if we can find the job
if jobs_after < jobs_before:
print_warning(f" Job count decreased ({jobs_before} -> {jobs_after}), may be due to cleanup")
print_success(" Jobs persisted correctly across restart")
# ========================================================================
# TEST 9: Health Endpoint
# ========================================================================
def test_health_endpoint(self):
"""Test basic health endpoint"""
print_info(" Checking health endpoint...")
response = requests.get(f"{self.api_url}/health")
assert response.status_code == 200, f"Health check failed: {response.status_code}"
health_data = response.json()
assert health_data['status'] == 'healthy', "Server not healthy"
print_success(" Health endpoint OK")
# ========================================================================
# TEST 10: Models Endpoint
# ========================================================================
def test_models_endpoint(self):
"""Test models information endpoint"""
print_info(" Checking models endpoint...")
response = requests.get(f"{self.api_url}/models")
assert response.status_code == 200, f"Models endpoint failed: {response.status_code}"
models_data = response.json()
assert 'available_models' in models_data, "No available_models field"
assert 'available_devices' in models_data, "No available_devices field"
assert len(models_data['available_models']) > 0, "No models listed"
print_info(f" Available models: {len(models_data['available_models'])}")
print_success(" Models endpoint OK")
def print_summary(self):
"""Print test summary"""
print_section("TEST SUMMARY")
passed = sum(1 for _, result, _ in self.test_results if result)
failed = len(self.test_results) - passed
for name, result, error in self.test_results:
if result:
print_success(f"{name}")
else:
print_error(f"{name}")
if error:
print(f" {error}")
print(f"\n{Colors.BOLD}Total: {len(self.test_results)} | ", end="")
print(f"{Colors.GREEN}Passed: {passed}{Colors.END} | ", end="")
print(f"{Colors.RED}Failed: {failed}{Colors.END}\n")
return failed == 0
def run_all_tests(self, start_server=True):
"""Run all Phase 4 integration tests"""
print_section("PHASE 4: END-TO-END INTEGRATION TESTING")
try:
# Start server if requested
if start_server:
if not self.start_api_server():
print_error("Failed to start API server. Aborting tests.")
return False
else:
# Verify server is already running
try:
response = requests.get(f"{self.api_url}/health", timeout=5)
if response.status_code != 200:
print_error("Server is not responding. Please start it first.")
return False
print_info("Using existing API server")
except requests.exceptions.RequestException:
print_error("Cannot connect to API server. Please start it first.")
return False
# Run tests
print_section("TEST 1: Single Job Submission and Completion")
self.test("Single job flow (submit → poll → get result)", self.test_single_job_flow)
print_section("TEST 2: Multiple Jobs (FIFO Order)")
self.test("Multiple jobs in queue (FIFO)", self.test_multiple_jobs_fifo)
print_section("TEST 3: GPU Health Check")
self.test("GPU health check endpoint", self.test_gpu_health_check)
print_section("TEST 4: Error Handling - Invalid Path")
self.test("Invalid audio path rejection", self.test_invalid_audio_path)
print_section("TEST 5: Error Handling - Job Not Found")
self.test("Non-existent job handling", self.test_job_not_found)
print_section("TEST 6: Error Handling - Result Before Completion")
self.test("Result request before completion", self.test_result_before_completion)
print_section("TEST 7: Job Listing")
self.test("List jobs with filters", self.test_list_jobs)
print_section("TEST 8: Health Endpoint")
self.test("Basic health endpoint", self.test_health_endpoint)
print_section("TEST 9: Models Endpoint")
self.test("Models information endpoint", self.test_models_endpoint)
print_section("TEST 10: Server Restart Persistence")
self.test("Job persistence across server restart", self.test_server_restart_persistence)
# Print summary
success = self.print_summary()
return success
finally:
# Cleanup
if start_server and self.server_process:
self.stop_api_server()
def main():
"""Main test runner"""
import argparse
parser = argparse.ArgumentParser(description='Phase 4 Integration Tests')
parser.add_argument('--url', default='http://localhost:8000', help='API server URL')
# Default to None so Phase4Tester uses relative path
parser.add_argument('--audio', default=None,
help='Path to test audio file (default: <project_root>/data/test.mp3)')
parser.add_argument('--no-start-server', action='store_true',
help='Do not start server (assume it is already running)')
args = parser.parse_args()
tester = Phase4Tester(api_url=args.url, test_audio=args.audio)
success = tester.run_all_tests(start_server=not args.no_start_server)
sys.exit(0 if success else 1)
if __name__ == '__main__':
main()
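As a usage note, the suite can also be driven from another script against an already-running server; a minimal sketch, assuming the tests/ directory is importable as a package from the project root:

from tests.test_e2e_integration import Phase4Tester

tester = Phase4Tester(api_url="http://localhost:8000")  # defaults to <project_root>/data/test.mp3
passed = tester.run_all_tests(start_server=False)       # reuse the running API server
raise SystemExit(0 if passed else 1)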


@@ -1,108 +0,0 @@
#!/usr/bin/env python3
"""
Faster Whisper-based Speech Recognition MCP Service
Provides high-performance audio transcription with batch processing acceleration and multiple output formats
"""
import os
import logging
from mcp.server.fastmcp import FastMCP
from model_manager import get_model_info
from transcriber import transcribe_audio, batch_transcribe
# Log configuration
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
# Create FastMCP server instance
mcp = FastMCP(
name="fast-whisper-mcp-server",
version="0.1.1",
dependencies=["faster-whisper>=0.9.0", "torch==2.6.0+cu126", "torchaudio==2.6.0+cu126", "numpy>=1.20.0"]
)
@mcp.tool()
def get_model_info_api() -> str:
"""
Get available Whisper model information
"""
return get_model_info()
@mcp.tool()
def transcribe(audio_path: str, model_name: str = "large-v3", device: str = "auto",
compute_type: str = "auto", language: str = None, output_format: str = "vtt",
beam_size: int = 5, temperature: float = 0.0, initial_prompt: str = None,
output_directory: str = None) -> str:
"""
Transcribe audio files using Faster Whisper
Args:
audio_path: Path to the audio file
model_name: Model name (tiny, base, small, medium, large-v1, large-v2, large-v3)
device: Execution device (cpu, cuda, auto)
compute_type: Computation type (float16, int8, auto)
language: Language code (such as zh, en, ja, etc., auto-detect by default)
output_format: Output format (vtt, srt, json or txt)
beam_size: Beam search size, larger values may improve accuracy but reduce speed
temperature: Sampling temperature, 0 means greedy decoding
initial_prompt: Initial prompt text, can help the model better understand context
output_directory: Output directory path, defaults to the audio file's directory
Returns:
str: Transcription result in the requested output format (VTT, SRT, JSON, or TXT)
"""
return transcribe_audio(
audio_path=audio_path,
model_name=model_name,
device=device,
compute_type=compute_type,
language=language,
output_format=output_format,
beam_size=beam_size,
temperature=temperature,
initial_prompt=initial_prompt,
output_directory=output_directory
)
@mcp.tool()
def batch_transcribe_audio(audio_folder: str, output_folder: str = None, model_name: str = "large-v3",
device: str = "auto", compute_type: str = "auto", language: str = None,
output_format: str = "vtt", beam_size: int = 5, temperature: float = 0.0,
initial_prompt: str = None, parallel_files: int = 1) -> str:
"""
Batch transcribe audio files in a folder
Args:
audio_folder: Path to the folder containing audio files
output_folder: Output folder path, defaults to a 'transcript' subfolder in audio_folder
model_name: Model name (tiny, base, small, medium, large-v1, large-v2, large-v3)
device: Execution device (cpu, cuda, auto)
compute_type: Computation type (float16, int8, auto)
language: Language code (such as zh, en, ja, etc., auto-detect by default)
output_format: Output format (vtt, srt, json or txt)
beam_size: Beam search size, larger values may improve accuracy but reduce speed
temperature: Sampling temperature, 0 means greedy decoding
initial_prompt: Initial prompt text, can help the model better understand context
parallel_files: Number of files to process in parallel (only effective in CPU mode)
Returns:
str: Batch processing summary, including processing time and success rate
"""
return batch_transcribe(
audio_folder=audio_folder,
output_folder=output_folder,
model_name=model_name,
device=device,
compute_type=compute_type,
language=language,
output_format=output_format,
beam_size=beam_size,
temperature=temperature,
initial_prompt=initial_prompt,
parallel_files=parallel_files
)
if __name__ == "__main__":
print("starting mcp server for whisper stt transcriptor")
mcp.run()