- Upgrade PyTorch and torchaudio to 2.6.0 with CUDA 12.4 support - Update GPU reset script to gracefully stop/start Ollama via supervisorctl - Add Docker Compose configuration for both API and MCP server modes - Implement comprehensive Docker entrypoint for multi-mode deployment - Add GPU health check cleanup to prevent memory leaks - Fix transcription memory management with proper resource cleanup - Add filename security validation to prevent path traversal attacks - Include .dockerignore for optimized Docker builds - Remove deprecated supervisor configuration 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
11 KiB
Transcriptor API - Filename Validation Bug Fix
Issue Summary
The transcriptor API is rejecting valid audio files due to overly strict path validation. Files with .. (double periods) anywhere in the filename are being rejected as potential path traversal attacks, even when they appear naturally in legitimate filenames.
Current Behavior
Error Observed
{
"detail": {
"error": "Upload failed",
"message": "Audio file validation failed: Path traversal (..) is not allowed"
}
}
HTTP Response
- Status Code: 500
- Endpoint:
POST /transcribe - Request: File upload with filename containing
..
Example Failing Filename
This Weird FPV Drone only takes one kind of Battery... Rekon 35 V2.m4a
^^^
(Three dots, parsed as "..")
Root Cause Analysis
Current Validation Logic (Problematic)
The API is likely checking for .. anywhere in the filename string, which creates false positives:
# CURRENT (WRONG)
if ".." in filename:
raise ValidationError("Path traversal (..) is not allowed")
This rejects legitimate filenames like:
"video...mp4"(ellipsis in title)"Part 1... Part 2.m4a"(ellipsis separator)"Wait... what.mp4"(dramatic pause)
Actual Security Concern
Path traversal attacks use .. as directory separators to navigate up the filesystem:
../../etc/passwd(DANGEROUS)../../../secrets.txt(DANGEROUS)video...mp4(SAFE - just a filename)
Recommended Fix
Option 1: Path Component Validation (Recommended)
Check for .. only when it appears as a complete path component, not as part of the filename text.
import os
from pathlib import Path
def validate_filename(filename: str) -> bool:
"""
Validate filename for path traversal attacks.
Returns True if safe, raises ValidationError if dangerous.
"""
# Normalize the path
normalized = os.path.normpath(filename)
# Check if normalization changed the path (indicates traversal)
if normalized != filename:
raise ValidationError(f"Path traversal detected: {filename}")
# Check for absolute paths
if os.path.isabs(filename):
raise ValidationError(f"Absolute paths not allowed: {filename}")
# Split into components and check for parent directory references
parts = Path(filename).parts
if ".." in parts:
raise ValidationError(f"Parent directory references not allowed: {filename}")
# Check for any path separators (should be basename only)
if os.sep in filename or (os.altsep and os.altsep in filename):
raise ValidationError(f"Path separators not allowed: {filename}")
return True
# Examples:
validate_filename("video.mp4") # ✓ PASS
validate_filename("video...mp4") # ✓ PASS (ellipsis)
validate_filename("This is... a video.m4a") # ✓ PASS
validate_filename("../../../etc/passwd") # ✗ FAIL (traversal)
validate_filename("dir/../file.mp4") # ✗ FAIL (traversal)
validate_filename("/etc/passwd") # ✗ FAIL (absolute)
Option 2: Basename-Only Validation (Simpler)
Only accept basenames (no directory components at all):
import os
def validate_filename(filename: str) -> bool:
"""
Ensure filename contains no path components.
"""
# Extract basename
basename = os.path.basename(filename)
# Must match original (no path components)
if basename != filename:
raise ValidationError(f"Filename must not contain path components: {filename}")
# Additional check: no path separators
if "/" in filename or "\\" in filename:
raise ValidationError(f"Path separators not allowed: {filename}")
return True
# Examples:
validate_filename("video.mp4") # ✓ PASS
validate_filename("video...mp4") # ✓ PASS
validate_filename("../file.mp4") # ✗ FAIL
validate_filename("dir/file.mp4") # ✗ FAIL
Option 3: Regex Pattern Matching (Most Strict)
Use a whitelist approach for allowed characters:
import re
def validate_filename(filename: str) -> bool:
"""
Validate filename using whitelist of safe characters.
"""
# Allow: letters, numbers, spaces, dots, hyphens, underscores
# Length: 1-255 characters
pattern = r'^[a-zA-Z0-9 .\-_]{1,255}\.[a-zA-Z0-9]{2,10}$'
if not re.match(pattern, filename):
raise ValidationError(f"Invalid filename format: {filename}")
# Additional safety: reject if starts/ends with dot
if filename.startswith('.') or filename.endswith('.'):
raise ValidationError(f"Filename cannot start or end with dot: {filename}")
return True
# Examples:
validate_filename("video.mp4") # ✓ PASS
validate_filename("video...mp4") # ✓ PASS
validate_filename("My Video... Part 2.m4a") # ✓ PASS
validate_filename("../file.mp4") # ✗ FAIL (starts with ..)
validate_filename("file<>.mp4") # ✗ FAIL (invalid chars)
Implementation Steps
1. Locate Current Validation Code
Search for files containing the validation logic:
grep -r "Path traversal" /path/to/transcriptor-api
grep -r '".."' /path/to/transcriptor-api
grep -r "normpath\|basename" /path/to/transcriptor-api
2. Update Validation Function
Replace the current naive check with one of the recommended solutions above.
Priority Order:
- Option 1 (Path Component Validation) - Best security/usability balance
- Option 2 (Basename-Only) - Simplest, very secure
- Option 3 (Regex) - Most restrictive, may reject valid files
3. Test Cases
Create comprehensive test suite:
import pytest
def test_valid_filenames():
"""Test filenames that should be accepted."""
valid_names = [
"video.mp4",
"audio.m4a",
"This is... a test.mp4",
"Part 1... Part 2.wav",
"video...multiple...dots.mp3",
"My-Video_2024.mp4",
"song (remix).m4a",
]
for filename in valid_names:
assert validate_filename(filename), f"Should accept: {filename}"
def test_dangerous_filenames():
"""Test filenames that should be rejected."""
dangerous_names = [
"../../../etc/passwd",
"../../secrets.txt",
"../file.mp4",
"/etc/passwd",
"C:\\Windows\\System32\\file.txt",
"dir/../file.mp4",
"file/../../etc/passwd",
]
for filename in dangerous_names:
with pytest.raises(ValidationError):
validate_filename(filename)
def test_edge_cases():
"""Test edge cases."""
edge_cases = [
(".", False), # Current directory
("..", False), # Parent directory
("...", True), # Just dots (valid)
("....", True), # Multiple dots (valid)
(".hidden.mp4", True), # Hidden file (valid on Unix)
("", False), # Empty string
("a" * 256, False), # Too long
]
for filename, should_pass in edge_cases:
if should_pass:
assert validate_filename(filename)
else:
with pytest.raises(ValidationError):
validate_filename(filename)
4. Update Error Response
Provide clearer error messages:
# BAD (current)
{"detail": {"error": "Upload failed", "message": "Audio file validation failed: Path traversal (..) is not allowed"}}
# GOOD (improved)
{
"detail": {
"error": "Invalid filename",
"message": "Filename contains path traversal characters. Please use only the filename without directory paths.",
"filename": "../../etc/passwd",
"suggestion": "Use: passwd.txt"
}
}
Testing the Fix
Manual Testing
-
Test with problematic filename from bug report:
curl -X POST http://192.168.1.210:33767/transcribe \ -F "file=@/path/to/This Weird FPV Drone only takes one kind of Battery... Rekon 35 V2.m4a" \ -F "model=medium"Expected: HTTP 200 (success)
-
Test with actual path traversal:
curl -X POST http://192.168.1.210:33767/transcribe \ -F "file=@/tmp/test.m4a;filename=../../etc/passwd" \ -F "model=medium"Expected: HTTP 400 (validation error)
-
Test with various ellipsis patterns:
"video...mp4"→ Should pass"Part 1... Part 2.m4a"→ Should pass"Wait... what!.mp4"→ Should pass
Automated Testing
# integration_test.py
import requests
def test_ellipsis_filenames():
"""Test files with ellipsis in names."""
test_cases = [
"video...mp4",
"This is... a test.m4a",
"Wait... what.mp3",
]
for filename in test_cases:
response = requests.post(
"http://192.168.1.210:33767/transcribe",
files={"file": (filename, open("test_audio.m4a", "rb"))},
data={"model": "medium"}
)
assert response.status_code == 200, f"Failed for: {filename}"
Security Considerations
What We're Protecting Against
- Path Traversal:
../../../sensitive/file - Absolute Paths:
/etc/passwdorC:\Windows\System32\ - Hidden Paths:
./.git/config
What We're NOT Breaking
- Ellipsis in titles:
"Wait... what.mp4" - Multiple extensions:
"file.tar.gz" - Special characters:
"My Video (2024).mp4"
Additional Hardening (Optional)
def sanitize_and_validate_filename(filename: str) -> str:
"""
Sanitize filename and validate for safety.
Returns cleaned filename or raises error.
"""
# Remove null bytes
filename = filename.replace("\0", "")
# Extract basename (strips any path components)
filename = os.path.basename(filename)
# Limit length
max_length = 255
if len(filename) > max_length:
name, ext = os.path.splitext(filename)
filename = name[:max_length-len(ext)] + ext
# Validate
validate_filename(filename)
return filename
Deployment Checklist
- Update validation function with recommended fix
- Add comprehensive test suite
- Test with real-world filenames (including bug report case)
- Test security: attempt path traversal attacks
- Update API documentation
- Review error messages for clarity
- Deploy to staging environment
- Run integration tests
- Monitor logs for validation failures
- Deploy to production
- Verify bug reporter's file now works
Contact & Context
Bug Report Date: 2025-10-26
Affected Endpoint: POST /transcribe
Error Code: HTTP 500
Client Application: yt-dlp-webui v3
Example Failing Request:
POST http://192.168.1.210:33767/transcribe
Content-Type: multipart/form-data
file: "This Weird FPV Drone only takes one kind of Battery... Rekon 35 V2.m4a"
model: "medium"
Current Behavior: Returns 500 error with path traversal message Expected Behavior: Accepts file and processes transcription
Quick Reference
Files to Check
/path/to/api/validators.pyor similar/path/to/api/upload_handler.py/path/to/api/routes/transcribe.py
Search Commands
# Find validation code
rg "Path traversal" --type py
rg '"\.\."' --type py
rg "ValidationError.*filename" --type py
# Find upload handlers
rg "def.*upload|def.*transcribe" --type py
Priority Fix
Use Option 1 (Path Component Validation) - it provides the best balance of security and usability.