mirror of
https://github.com/PierrunoYT/Kokoro-TTS-Local.git
synced 2025-01-27 02:30:25 +03:00
Kokoro TTS Local
A local implementation of the Kokoro Text-to-Speech model, featuring dynamic module loading, automatic dependency management, and a web interface.
Current Status
✅ WORKING - READY TO USE ✅
The project has been updated with:
- Automatic espeak-ng installation and configuration
- Dynamic module loading from Hugging Face
- Improved error handling and debugging
- Interactive CLI interface
- Cross-platform setup scripts
- Web interface with Gradio
Features
- Local text-to-speech synthesis using the Kokoro model
- Automatic espeak-ng setup using espeakng-loader
- Multiple voice support with easy voice selection
- Phoneme output support and visualization
- Interactive CLI for custom text input
- Voice listing functionality
- Dynamic module loading from Hugging Face
- Comprehensive error handling and logging
- Cross-platform support (Windows, Linux, macOS)
- NEW: Web Interface Features
  - Modern, user-friendly UI
  - Real-time generation progress
  - Multiple output formats (WAV, MP3, AAC)
  - Network sharing capabilities
  - Audio playback and download
  - Voice selection dropdown
  - Detailed process logging
Prerequisites
- Python 3.8 or higher
- Git (for cloning the repository)
- Internet connection (for initial model download)
- FFmpeg (required for MP3/AAC conversion):
- Windows: Install FFmpeg separately and add it to PATH (pydub calls FFmpeg but does not bundle it)
- Linux: sudo apt-get install ffmpeg
- macOS: brew install ffmpeg
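The prerequisites above can be checked before running setup. The following is a minimal sketch, not part of the project: the function name check_prerequisites is hypothetical, and it only tests that the tools are on PATH, not that they work.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in the prerequisites


def check_prerequisites(min_python=MIN_PYTHON):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] < tuple(min_python):
        problems.append(
            f"Python {min_python[0]}.{min_python[1]} or higher is required"
        )
    # shutil.which returns None when the executable is not found on PATH.
    for tool in ("git", "ffmpeg"):
        if shutil.which(tool) is None:
            problems.append(f"{tool} was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

A missing ffmpeg only matters for MP3/AAC output; WAV generation does not need it.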
Dependencies
torch
phonemizer-fork
transformers
scipy
munch
soundfile
huggingface-hub
espeakng-loader
gradio>=4.0.0
pydub # For audio format conversion
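To see which of these packages are still missing from the active environment, a small check based on the standard-library importlib.metadata module may help. This is a sketch under the assumption that the names above are the installed distribution names; missing_packages is a hypothetical helper, not something the project ships.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names copied from the dependency list above.
REQUIRED = [
    "torch", "phonemizer-fork", "transformers", "scipy", "munch",
    "soundfile", "huggingface-hub", "espeakng-loader", "gradio", "pydub",
]


def missing_packages(names=REQUIRED):
    """Return the subset of `names` that is not installed in this environment."""
    missing = []
    for name in names:
        try:
            version(name)  # raises PackageNotFoundError if the distribution is absent
        except PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Missing packages:", missing_packages() or "none")
```

The check does not compare versions, so it will not notice a gradio release older than 4.0.0.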
Setup
Windows
# Clone the repository
git clone https://github.com/PierrunoYT/Kokoro-TTS-Local.git
cd Kokoro-TTS-Local
# Run the setup script
.\setup.ps1
Linux/macOS
# Clone the repository
git clone https://github.com/PierrunoYT/Kokoro-TTS-Local.git
cd Kokoro-TTS-Local
# Run the setup script
chmod +x setup.sh
./setup.sh
# Install FFmpeg (if needed)
# Linux:
sudo apt-get install ffmpeg
# macOS:
brew install ffmpeg
Manual Setup
If you prefer to set up manually:
- Create a virtual environment:
# Windows
python -m venv venv
.\venv\Scripts\activate
# Linux/macOS
python3 -m venv venv
source venv/bin/activate
- Install dependencies:
python -m pip install --upgrade pip
pip install -r requirements.txt
- Install system dependencies:
# Windows
# pydub does not bundle FFmpeg; install it separately and add it to PATH
# Linux
sudo apt-get update
sudo apt-get install espeak-ng ffmpeg
# macOS
brew install espeak ffmpeg
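The activation commands above differ by platform because a virtual environment keeps its interpreter under Scripts on Windows and under bin elsewhere. A script that needs to launch another project script with that interpreter could locate it as sketched below. The helper name venv_python is hypothetical and this is not taken from the project's code.

```python
import os
from pathlib import Path


def venv_python(venv_dir="venv", windows=None):
    """Return the expected path of the Python interpreter inside a virtual environment."""
    if windows is None:
        windows = os.name == "nt"  # detect the current platform when not told explicitly
    venv_dir = Path(venv_dir)
    if windows:
        return venv_dir / "Scripts" / "python.exe"
    return venv_dir / "bin" / "python"
```

The returned path can be passed as the first element of a subprocess.run argument list, for example together with "tts_demo.py".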
Usage
Web Interface
# Start the web interface
python gradio_interface.py
This will:
- Launch a web interface at http://localhost:7860
- Create a public share link (optional)
- Allow you to:
  - Input text to synthesize
  - Select from available voices
  - Choose output format (WAV/MP3/AAC)
  - Monitor generation progress
  - Play or download generated audio
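The output format choice relies on pydub, which hands the actual encoding to FFmpeg. The sketch below shows one way such a conversion step could look; it is not the code in gradio_interface.py. The helper names are hypothetical, and using the "adts" container for AAC is an assumption.

```python
from pathlib import Path

# Maps the formats listed above to keyword arguments for pydub's export().
# Assumption: raw AAC is written through FFmpeg's "adts" container.
EXPORT_ARGS = {
    "wav": {"format": "wav"},
    "mp3": {"format": "mp3"},
    "aac": {"format": "adts", "codec": "aac"},
}


def output_path_for(wav_path, fmt):
    """Return the path the converted file would get, e.g. output.wav -> output.mp3."""
    fmt = fmt.lower()
    if fmt not in EXPORT_ARGS:
        raise ValueError(f"Unsupported format: {fmt!r}")
    return Path(wav_path).with_suffix("." + fmt)


def convert_audio(wav_path, fmt):
    """Convert a WAV file to `fmt` and return the new path (needs pydub and FFmpeg)."""
    out_path = output_path_for(wav_path, fmt)
    if fmt.lower() == "wav":
        return Path(wav_path)  # already WAV, nothing to convert
    # Imported lazily so the path helper above works without pydub installed.
    from pydub import AudioSegment
    audio = AudioSegment.from_wav(str(wav_path))
    audio.export(str(out_path), **EXPORT_ARGS[fmt.lower()])
    return out_path
```

If FFmpeg is not on PATH, the export call fails for MP3 and AAC, which is why FFmpeg appears in the prerequisites.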
Command Line Interface
python tts_demo.py
The script will:
- Download necessary model files from Hugging Face
- Set up espeak-ng automatically using espeakng-loader
- Import required modules dynamically
- Test the phonemizer functionality
- Generate speech from your text with phoneme visualization
- Save the output as 'output.wav' (22050Hz sample rate)
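To confirm the sample rate and length of the generated file, the WAV header can be read with Python's standard-library wave module. This is a small sketch with a hypothetical helper name. The wave module reads PCM-encoded WAV files only, so it assumes the script writes PCM, which is not confirmed here.

```python
import wave


def describe_wav(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav_file:
        rate = wav_file.getframerate()
        frames = wav_file.getnframes()
        channels = wav_file.getnchannels()
    return rate, channels, frames / rate


if __name__ == "__main__":
    import sys
    # Usage: python describe_wav.py output.wav
    print(describe_wav(sys.argv[1] if len(sys.argv) > 1 else "output.wav"))
```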
Project Structure
- models.py: Core model loading and speech generation functionality
  - Model building and initialization with dynamic imports
  - Voice loading and management from Hugging Face
  - Speech generation with phoneme output
  - Voice listing functionality
  - Automatic espeak-ng configuration
  - Error handling and logging
- tts_demo.py: Demo script showing basic usage
  - Command-line interface with argparse
  - Interactive text input mode
  - Voice selection and listing
  - Error handling and user feedback
- gradio_interface.py: Web interface implementation
  - Modern, responsive UI
  - Real-time progress monitoring
  - Multiple output formats
  - Network sharing capabilities
- setup.ps1: Windows PowerShell setup script
  - Environment creation
  - Dependency installation
  - Automatic configuration
- setup.sh: Linux/macOS bash setup script
  - Environment creation
  - Dependency installation
  - Automatic configuration
- requirements.txt: Project dependencies
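As an illustration of the argparse-based command-line interface described for tts_demo.py, the sketch below combines a voice-listing flag with an interactive fallback. It is not the project's actual script. Only --list-voices is named in this README; the --voice and --text flags and the placeholder voice names are assumptions made for the example.

```python
import argparse


def build_parser():
    """Build an illustrative parser; flags other than --list-voices are assumed."""
    parser = argparse.ArgumentParser(description="Kokoro TTS demo (illustrative sketch)")
    parser.add_argument("--list-voices", action="store_true",
                        help="print available voices and exit")
    parser.add_argument("--voice", default=None,
                        help="hypothetical flag: voice name to use")
    parser.add_argument("--text", default=None,
                        help="hypothetical flag: text to speak; omit for interactive mode")
    return parser


def main(argv=None, voices=("voice_a", "voice_b")):
    """Return the action the CLI would take; `voices` is placeholder data."""
    args = build_parser().parse_args(argv)
    if args.list_voices:
        return ("list", list(voices))
    # Fall back to interactive input when no text was given on the command line.
    text = args.text if args.text is not None else input("Enter text: ")
    return ("speak", args.voice or voices[0], text)
```

Returning the chosen action instead of synthesizing audio keeps the sketch independent of the model; a real script would call the speech generation code at that point.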
Model Information
The project uses the Kokoro-82M model from Hugging Face:
- Repository: hexgrad/Kokoro-82M
- Model file: kokoro-v0_19.pth
- Voice files: Located in the voices/ directory
- Supports multiple voice styles (use --list-voices to see available options)
- Automatically downloads required files from Hugging Face
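The automatic download can be pictured with the huggingface_hub library's hf_hub_download function, which fetches a single file from a repository and returns its local cache path. This is a sketch, not the project's models.py. The helper names are hypothetical, and the .pt extension for voice files is an assumption.

```python
REPO_ID = "hexgrad/Kokoro-82M"    # repository named above
MODEL_FILE = "kokoro-v0_19.pth"   # model file named above


def voice_filename(voice_name):
    """Return the in-repo path of a voice file; the .pt extension is an assumption."""
    return f"voices/{voice_name}.pt"


def download_model_files(voice_name):
    """Download the model and one voice file, returning their local cache paths."""
    # Imported lazily; the first call needs an internet connection.
    from huggingface_hub import hf_hub_download
    model_path = hf_hub_download(repo_id=REPO_ID, filename=MODEL_FILE)
    voice_path = hf_hub_download(repo_id=REPO_ID, filename=voice_filename(voice_name))
    return model_path, voice_path
```

Files fetched this way are cached locally, which matches the note that an internet connection is needed only for the initial model download.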
Technical Details
- Sample rate: 22050Hz
- Input: Text in any language (English recommended)
- Output: WAV/MP3/AAC audio file
- Dependencies are automatically managed
- Modules are dynamically loaded from Hugging Face
- Error handling includes stack traces for debugging
- Cross-platform compatibility through setup scripts
Contributing
Feel free to contribute by:
- Opening issues for bugs or feature requests
- Submitting pull requests with improvements
- Helping with documentation
- Testing different voices and reporting issues
- Suggesting new features or optimizations
- Testing on different platforms and reporting results
License
This project is licensed under the Apache 2.0 License.