# NeuTTS Air

*Created by [Neuphonic](http://neuphonic.com/) - building faster, smaller, on-device voice AI*

State-of-the-art Voice AI has been locked behind web APIs for too long. NeuTTS Air is the world’s first super-realistic, on-device, TTS speech language model with instant voice cloning. Built off a 0.5B LLM backbone, NeuTTS Air brings natural-sounding speech, real-time performance, built-in security and speaker cloning to your local device - unlocking a new category of embedded voice agents, assistants, toys, and compliance-safe apps.

## Key Features

- 🗣 Best-in-class realism for its size - produces natural, ultra-realistic voices that sound human
- 📱 Optimised for on-device deployment - provided in GGML format, ready to run on phones, laptops, or even Raspberry Pis
- 👫 Instant voice cloning - create your own speaker with as little as 3 seconds of audio
- 🚄 Simple LM + codec architecture built off a 0.5B backbone - the sweet spot between speed, size, and quality for real-world applications

## Model Details

NeuTTS Air is built off Qwen 0.5B - a lightweight yet capable language model optimised for text understanding and generation - as well as a powerful combination of technologies designed for efficiency and quality:

- **Audio Codec**: [NeuCodec](https://huggingface.co/neuphonic/neucodec) - our proprietary neural audio codec that achieves exceptional audio quality at low bitrates using a single codebook
- **Format**: Available in GGML format for efficient on-device inference
- **Responsibility**: Watermarked outputs
- **Inference Speed**: Real-time generation on mid-range devices
- **Power Consumption**: Optimised for mobile and embedded devices

## Get Started

1. **Clone Git Repo**

   ```bash
   git clone https://github.com/neuphonic/neutts-air.git
   ```

   ```bash
   cd neutts-air
   ```

2. **Install `espeak` (required dependency)**

   Please refer to the following link for instructions on how to install `espeak`:

   https://github.com/espeak-ng/espeak-ng/blob/master/docs/guide.md

   ```bash
   # Mac OS
   brew install espeak

   # Ubuntu/Debian
   sudo apt install espeak
   ```

   Mac users may need to put the following lines at the top of the neutts.py file.

   ```python
   from phonemizer.backend.espeak.wrapper import EspeakWrapper

   # Use the path to the espeak library on your machine.
   _ESPEAK_LIBRARY = '/opt/homebrew/Cellar/espeak/1.48.04_1/lib/libespeak.1.1.48.dylib'
   EspeakWrapper.set_library(_ESPEAK_LIBRARY)
   ```

3. **Install Python dependencies**

   The requirements file includes the dependencies needed to run the model with PyTorch. When using an ONNX decoder or a GGML model, some dependencies (such as PyTorch) are no longer required.

   Inference is compatible with and tested on `python>=3.11`.

   ```bash
   pip install -r requirements.txt
   ```

4. **(Optional) Install Llama-cpp-python to use the `GGUF` models.**

   ```bash
   pip install llama-cpp-python
   ```

   To run llama-cpp with GPU support (CUDA, MPS), please refer to: https://pypi.org/project/llama-cpp-python/

5. **(Optional) Install onnxruntime to use the `.onnx` decoder.**

   If you want to run the ONNX decoder:

   ```bash
   pip install onnxruntime
   ```

## Basic Example

Run the basic example script to synthesize speech:

```bash
python -m examples.basic_example \
  --input_text "My name is Dave, and um, I'm from London" \
  --ref_audio samples/dave.wav \
  --ref_text samples/dave.txt
```

To specify a particular model repo for the backbone or codec, add the `--backbone` argument. Available backbones are listed in the [NeuTTS-Air huggingface collection](https://huggingface.co/collections/neuphonic/neutts-air-68cc14b7033b4c56197ef350).

Several examples are available, including a Jupyter notebook in the `examples` folder.
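
Because `llama-cpp-python` and `onnxruntime` are optional installs, it can help to check which of them is importable before choosing a GGUF backbone or the ONNX decoder. The following is a minimal sketch using only the standard library. The helper name `available_backends` is hypothetical and not part of NeuTTS Air; `llama_cpp` and `onnxruntime` are the module names those two packages install.

```python
from importlib.util import find_spec

# pip package name -> importable module name for the optional dependencies.
OPTIONAL_MODULES = {
    "llama-cpp-python": "llama_cpp",
    "onnxruntime": "onnxruntime",
}


def available_backends(modules=OPTIONAL_MODULES):
    """Map each package name to True if its module can be found (hypothetical helper)."""
    return {pkg: find_spec(mod) is not None for pkg, mod in modules.items()}


if __name__ == "__main__":
    for pkg, ok in available_backends().items():
        print(f"{pkg}: {'installed' if ok else 'missing'}")
```

If `llama-cpp-python` is reported as missing, a GGUF backbone such as `neuphonic/neutts-air-q4-gguf` will presumably not load until step 4 of Get Started has been completed.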
### Simple One-Code Block Usage

```python
from neuttsair.neutts import NeuTTSAir
import soundfile as sf

tts = NeuTTSAir(
    backbone_repo="neuphonic/neutts-air-q4-gguf",
    backbone_device="cpu",
    codec_repo="neuphonic/neucodec",
    codec_device="cpu",
)

input_text = "My name is Dave, and um, I'm from London."
ref_text_path = "samples/dave.txt"
ref_audio_path = "samples/dave.wav"

# Read the transcript of the reference audio.
with open(ref_text_path, "r") as f:
    ref_text = f.read().strip()

# Encode the reference audio, then synthesise the input text in that voice.
ref_codes = tts.encode_reference(ref_audio_path)
wav = tts.infer(input_text, ref_codes, ref_text)

# Write the generated audio at 24 kHz.
sf.write("test.wav", wav, 24000)
```

## Advanced Examples

### GGML Backbone Example

```bash
python -m examples.basic_example \
  --input_text "My name is Dave, and um, I'm from London" \
  --ref_audio ./samples/dave.wav \
  --ref_text ./samples/dave.txt \
  --backbone neuphonic/neutts-air-q4-gguf
```

### ONNX Decoder Example

Make sure you have installed `onnxruntime`.

```bash
python -m examples.onnx_example \
  --input_text "My name is Dave, and um, I'm from London" \
  --ref_codes samples/dave.pt \
  --ref_text samples/dave.txt
```

To run the model with the ONNX decoder, you need to encode the reference sample first. Please refer to the `encode_reference` example.

#### Encode reference

You only need to provide a reference audio file for the reference encoding.

```bash
python -m examples.encode_reference \
  --ref_audio ./samples/dave.wav \
  --output_path encoded_reference.pt
```

## Prepare References for Cloning

NeuTTS Air requires two inputs:

1. A reference audio sample (`.wav` file)
2. A text string

The model then synthesises the text as speech in the style of the reference audio. This is what enables NeuTTS Air’s instant voice cloning capability.

### Example Reference Files

You can find some ready-to-use samples in the `samples` folder:

- `samples/dave.wav`
- `samples/jo.wav`

### Guidelines for Best Results

For optimal performance, reference audio samples should be:

1. **Mono channel**
2. **16-44 kHz sample rate**
3. **3–15 seconds in length**
4. **Saved as a `.wav` file**
5. **Clean** - minimal to no background noise
6. **Natural, continuous speech** - like a monologue or conversation, with few pauses, so the model can capture tone effectively

## Responsibility

Every audio file generated by NeuTTS Air is watermarked with the [Perth (Perceptual Threshold) Watermarker](https://github.com/resemble-ai/perth).

## Disclaimer

Don't use this model to do bad things… please.

## Developer Requirements

To contribute to this project, install the pre-commit tool:

```bash
pip install pre-commit
```

Then set up the hooks:

```bash
pre-commit install
```
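
The measurable reference-audio guidelines under Prepare References for Cloning (mono, 16-44 kHz, 3–15 seconds) can be checked before cloning. The sketch below is an illustration, not part of NeuTTS Air: `check_reference` is a hypothetical helper, it uses the standard-library `wave` module (which reads PCM `.wav` files only), and it assumes "16-44 kHz" means 16,000 to 44,100 Hz. Background noise and speech naturalness are not checked.

```python
import wave


def check_reference(path, min_sr=16_000, max_sr=44_100, min_sec=3.0, max_sec=15.0):
    """Return a list of guideline violations for a PCM .wav file (empty if none).

    Hypothetical helper; the default bounds are an interpretation of the
    guidelines, not values taken from the NeuTTS Air code.
    """
    with wave.open(path, "rb") as wf:
        channels = wf.getnchannels()
        sample_rate = wf.getframerate()
        duration = wf.getnframes() / sample_rate

    problems = []
    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if not min_sr <= sample_rate <= max_sr:
        problems.append(f"sample rate {sample_rate} Hz is outside {min_sr}-{max_sr} Hz")
    if not min_sec <= duration <= max_sec:
        problems.append(f"duration {duration:.1f} s is outside {min_sec}-{max_sec} s")
    return problems
```

Calling `check_reference("samples/dave.wav")` returns an empty list if the file meets all three checks, and otherwise a list of human-readable messages describing what to fix.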