- Added voice model files and matching patterns to .gitignore so they are not tracked.
- Enhanced README to include details about the new 'uv' package manager for faster dependency management.
- Clarified setup instructions, emphasizing automatic installation of required tools and voice files.
- Updated voice file organization in the documentation to reflect on-demand downloading, improving user understanding of voice availability.
- Deleted multiple voice files including 'af_bella.pt', 'af_nicole.pt', 'af_sarah.pt', 'af_sky.pt', 'af.pt', 'am_adam.pt', 'am_michael.pt', 'bf_emma.pt', 'bf_isabella.pt', 'bm_george.pt', and 'bm_lewis.pt'.
- This cleanup is part of the ongoing effort to streamline voice management and improve project organization.
- Updated setup.ps1 and setup.sh to check for and install the 'uv' package manager if not already present.
- Modified virtual environment creation to use 'uv' for consistency across platforms.
- Improved activation instructions for the virtual environment in both scripts.
- Added FFmpeg installation on Windows and system dependency installation on Linux and macOS so that the tools required for TTS functionality are available.
- Streamlined dependency installation process to utilize 'uv' for package management.
- Streamlined the project structure section by removing redundant details and focusing on key files and their purposes.
- Updated the file structure to include new directories and files, such as the Gradio SSL certificate and various audio output formats.
- Clarified voice categories in the documentation, ensuring accurate representation of available voices.
- Improved overall readability and organization of the README to enhance user understanding of the project layout.
- Changed voice category names in the README from "African" to "American" for clarity and accuracy.
- Ensured that the list of available voices reflects the correct regional designations, enhancing user understanding of voice options.
- Added a comprehensive file structure section to the README, outlining the organization of project files and directories.
- Improved clarity on the purpose of key files, such as the Gradio interface, model implementation, and setup scripts, enhancing user understanding of the project layout.
- Refactored the TTS generation process to initialize the model globally and load voices dynamically, improving efficiency and usability.
- Introduced a new load_and_validate_voice function to ensure requested voices exist before loading, enhancing error handling.
- Updated generate_tts_with_logs to provide real-time logging during speech generation, including phoneme processing and audio saving.
- Improved audio conversion process with better error handling and temporary file management.
- Set default voice to 'af_bella' in the Gradio interface for improved user experience.
- Updated the generate_speech function to download and import the Kokoro module dynamically from Hugging Face, enhancing flexibility and maintainability.
- Improved error handling during speech generation to provide clearer feedback in case of failures.
- Improved voice file organization by storing them in a local 'voices' directory and ensuring its automatic creation if missing.
- Enhanced load_voice function to download missing voice files, defaulting to 'af_bella' for better usability.
- Updated command-line argument defaults in tts_demo.py to align with new voice management features.
- Enhanced README to clarify voice availability and usage instructions, improving user experience.
- Updated voice file retrieval to store voices in a local 'voices' directory, improving organization and accessibility.
- Implemented automatic creation of the voices directory if it doesn't exist, ensuring a smoother user experience.
- Enhanced load_voice function to download missing voice files locally, defaulting to 'af_bella' for better usability.
- Adjusted tqdm import for improved compatibility with Windows consoles and configured it to prevent encoding issues.
- Updated command-line argument defaults in tts_demo.py to reflect changes in voice management.
- Enhanced the list_available_voices function to dynamically download voice files from Hugging Face based on the repository contents.
- Updated the fallback voice list to include additional voice options for better user experience.
- Revised README to clearly list available voices by category, improving clarity for users.
- Added functionality to download additional required files, including config.json, during model initialization in models.py.
- Improved module import process for better clarity and organization.
- Included a test for the phonemizer to ensure functionality.
- Updated README to provide specific instructions for enabling Developer Mode or running Python as Administrator on Windows for optimal performance and symlink support.
- Removed the get_default_voices_path function and replaced it with a more robust list_available_voices function that calls the new list_available_voices in models.py.
- Introduced get_voices_path in models.py to streamline voice file retrieval across platforms.
- Improved voice downloading logic to ensure availability of voice files from Hugging Face.
- Updated README to include instructions for downloading initial models and voices, enhancing user setup experience.
- Revised instructions for launching the Gradio web interface, emphasizing the need to visit the correct URL.
- Added a note that a different port is selected automatically if port 7860 is in use, to help with troubleshooting.
- Added audio format conversion functionality using pydub, supporting WAV, MP3, and AAC formats.
- Improved error handling for voice directory access and audio conversion processes.
- Updated README to reflect new web interface features and installation requirements, including FFmpeg.
- Enhanced the TTS generation function to utilize the correct Python interpreter across platforms.
- Documented new features in the README, including real-time progress monitoring and network sharing capabilities.
- Introduced load_and_validate_voice function to ensure requested voice exists before loading.
- Added command-line options for model path, output file, and language code with default values.
- Implemented progress indicators using tqdm for model and voice loading, as well as speech generation.
- Updated default text handling and ensured proper cleanup of resources after execution.
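
For readers unfamiliar with the voice handling described above, the following is a minimal sketch of the general pattern: create the local 'voices' directory if it is missing, fall back to 'af_bella' when no voice is given, and confirm the requested voice file exists before loading it. It is not the project's code. The names `ensure_voices_dir`, `available_voices`, and `validate_voice` are hypothetical, and the sketch deliberately leaves out the download step from Hugging Face, so a missing file is only reported as an error.

```python
from pathlib import Path
from typing import List, Optional

# Default voice named in the notes above; everything else here is illustrative.
DEFAULT_VOICE = "af_bella"


def ensure_voices_dir(base_dir: Path) -> Path:
    """Create the local 'voices' directory if it is missing and return its path."""
    voices_dir = base_dir / "voices"
    voices_dir.mkdir(parents=True, exist_ok=True)
    return voices_dir


def available_voices(voices_dir: Path) -> List[str]:
    """Return the sorted names of the .pt voice files already present locally."""
    return sorted(p.stem for p in voices_dir.glob("*.pt"))


def validate_voice(name: Optional[str], voices_dir: Path) -> Path:
    """Return the path of the requested voice file, or raise ValueError.

    Hypothetical helper: the real project may download a missing file
    instead of failing; this sketch only checks what is on disk.
    """
    voice_name = name or DEFAULT_VOICE
    voice_path = voices_dir / f"{voice_name}.pt"
    if not voice_path.is_file():
        known = ", ".join(available_voices(voices_dir)) or "none"
        raise ValueError(
            f"Voice '{voice_name}' not found in {voices_dir}. Available: {known}"
        )
    return voice_path
```

Validating before loading keeps the failure close to the user's input: a misspelled voice name produces a message listing the voices that are present, rather than an error from deeper inside model loading.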