Commit Graph

3 Commits

Author SHA1 Message Date
Alihan
1292f0f09b Add GPU auto-reset, job queue, health monitoring, and test infrastructure
Major features:
- GPU auto-reset on CUDA errors with cooldown protection (handles sleep/wake)
- Async job queue system for long-running transcriptions
- Comprehensive GPU health monitoring with real model tests
- Phase 1 component testing with detailed logging

New modules:
- src/core/gpu_reset.py: GPU driver reset with 5-min cooldown
- src/core/gpu_health.py: Real GPU health checks using model inference
- src/core/job_queue.py: FIFO queue with background worker and persistence
- src/utils/test_audio_generator.py: Test audio generation for GPU checks
- test_phase1.py: Component tests with logging
- reset_gpu.sh: GPU driver reset script

Updates:
- CLAUDE.md: Added GPU auto-reset docs and passwordless sudo setup
- requirements.txt: Updated to PyTorch CUDA 12.4
- Model manager: Integrated GPU health check with reset
- Both servers: Added startup GPU validation with auto-reset
- Startup scripts: Added GPU_RESET_COOLDOWN_MINUTES env var
2025-10-09 23:13:11 +03:00
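
Note on the "cooldown protection" mentioned in commit 1292f0f09b: the log does not show how src/core/gpu_reset.py implements it, so the following is only a minimal sketch of the general idea, under stated assumptions. The class name CooldownGuard, its try_run method, and the injectable clock are hypothetical and do not come from the repository. The 300-second value in the usage note mirrors the 5-minute cooldown named in the commit message.

```python
import time
from typing import Callable, Optional


class CooldownGuard:
    """Allow an action at most once per cooldown window.

    Hypothetical helper for illustration; not the repository's actual API.
    """

    def __init__(
        self,
        cooldown_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock  # injectable so the guard can be tested without sleeping
        self._last_run: Optional[float] = None

    def try_run(self, action: Callable[[], None]) -> bool:
        """Run `action` if the cooldown has elapsed; return whether it ran."""
        now = self._clock()
        if (
            self._last_run is not None
            and now - self._last_run < self.cooldown_seconds
        ):
            # Still inside the cooldown window: skip to avoid a reset loop.
            return False
        # Record the attempt before running, so a failing action also
        # starts the cooldown.
        self._last_run = now
        action()
        return True
```

A caller could wrap the reset step as guard = CooldownGuard(300) followed by guard.try_run(reset_fn), where reset_fn is whatever callable triggers the driver reset (assumed here). Recording the timestamp before the action runs is a deliberate choice in this sketch: a reset that raises still counts against the cooldown, which prevents repeated reset attempts when the GPU is persistently failing.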
Alihan
e7a457e602 Refactor codebase structure with organized src/ directory
- Reorganize source code into src/ directory with logical subdirectories:
  - src/servers/: MCP and REST API server implementations
  - src/core/: Core business logic (transcriber, model_manager)
  - src/utils/: Utility modules (audio_processor, formatters)

- Update all import statements to use proper module paths
- Configure PYTHONPATH in startup scripts and Dockerfile
- Update documentation with new structure and paths
- Update pyproject.toml with package configuration
- Keep DevOps files (scripts, Dockerfile, configs) at root level

All functionality validated and working correctly.
2025-10-07 12:28:03 +03:00
Alihan
2cc9f298a5 Separate MCP & API servers
2025-10-07 11:20:03 +03:00