# Mimic 3

![mimic 3 mark 2](img/mimic3-mark-ii.png)

A fast and local neural text to speech system developed by [Mycroft](https://mycroft.ai/) for the [Mark II](https://mycroft.ai/product/mark-ii/).

* [Available voices](https://github.com/MycroftAI/mimic3-voices)
* [How does it work?](mimic3-tts/#architecture)

## Use Cases

* [Mycroft TTS plugin](#mycroft-tts-plugin)
    * `mycroft-say 'Hello world.'`
* [Web server](#web-server-and-client)
    * `curl -X POST --data 'Hello world.' --output - localhost:59125/api/tts | aplay`
    * Drop-in [replacement for MaryTTS](#marytts-compatibility)
* [Command-line tool](#command-line-tools)
    * `mimic3 'Hello world.' | aplay`
* [Voice for screen reader](mimic3-tts/#speech-dispatcher)
    * `spd-say 'Hello world.'`

## Dependencies

Mimic 3 requires:

* Python 3.7 or higher
* The [Onnx runtime](https://onnxruntime.ai/)
* [gruut](https://github.com/rhasspy/gruut) or [eSpeak-ng](https://github.com/espeak-ng/espeak-ng) or [epitran](https://github.com/dmort27/epitran/) (depending on the voice)

## Installation

### eSpeak

Some voices depend on [eSpeak-ng](https://github.com/espeak-ng/espeak-ng), specifically `libespeak-ng.so`. For those voices, make sure that libespeak-ng is installed with:

``` sh
sudo apt-get install libespeak-ng1
```

On 32-bit ARM platforms (a.k.a. `armv7l` or `armhf`), you will also need some extra libraries:

``` sh
sudo apt-get install libatomic1 libgomp1 libatlas-base-dev
```

### Mycroft TTS Plugin

Install the plugin:

``` sh
mycroft-pip install plugin-tts-mimic3[all]
```

Enable the plugin in your [mycroft.conf](https://mycroft-ai.gitbook.io/docs/using-mycroft-ai/customizations/mycroft-conf) file:

``` sh
mycroft-config set tts.module mimic3_tts_plug
```

or you can manually add the following to `mycroft.conf` with `mycroft-config edit user`:

``` json
"tts": {
  "module": "mimic3_tts_plug"
}
```

See the [plugin's documentation](https://github.com/MycroftAI/plugin-tts-mimic3) for more options.
### Docker image

A pre-built Docker image is available for the following platforms:

* `linux/amd64`
    * For desktops and laptops (`x86_64` CPUs)
* `linux/arm64`
    * For Raspberry Pi 3/4 and Zero 2 with [64-bit Pi OS](https://www.raspberrypi.com/news/raspberry-pi-os-64-bit/)
* `linux/arm/v7`
    * For Raspberry Pi 1/2/3/4 and Zero 2 with 32-bit Pi OS

Install/update with:

``` sh
docker pull mycroftai/mimic3
```

Once installed, check out the following scripts for running:

* [`mimic3`](docker/mimic3)
* [`mimic3-server`](docker/mimic3-server)
* [`mimic3-download`](docker/mimic3-download)

Or you can manually run the web server with:

``` sh
docker run \
       -it \
       -p 59125:59125 \
       -v "${HOME}/.local/share/mimic3:/home/mimic3/.local/share/mimic3" \
       'mycroftai/mimic3'
```

Voices will be automatically downloaded to `${HOME}/.local/share/mimic3/voices`.

### Debian Package

Grab the Debian package from the [latest release](https://github.com/mycroftAI/mimic3/releases) for your platform:

* `mimic3-tts__amd64.deb`
    * For desktops and laptops (`x86_64` CPUs)
* `mimic3-tts__arm64.deb`
    * For Raspberry Pi 3/4 and Zero 2 with [64-bit Pi OS](https://www.raspberrypi.com/news/raspberry-pi-os-64-bit/)
* `mimic3-tts__armhf.deb`
    * For Raspberry Pi 1/2/3/4 and Zero 2 with 32-bit Pi OS

Once downloaded, install the package with (note the `./`):

``` sh
sudo apt install ./mimic3-tts__.deb
```

Once installed, the following commands will be available in `/usr/bin`:

* `mimic3`
* `mimic3-server`
* `mimic3-download`

### Using pip

Install the command-line tool:

``` sh
pip install mimic3-tts[all]
```

Once installed, the following commands will be available:

* `mimic3`
* `mimic3-download`

Install the HTTP web server:

``` sh
pip install mimic3-http[all]
```

Once installed, the following commands will be available:

* `mimic3-server`

Language support can be selectively installed by replacing `all` with:

* `de` - German
* `es` - Spanish
* `fa` - Farsi
* `fr` - French
* `it` - Italian
* `nl` - Dutch
* `ru` - Russian
* `sw` -
Kiswahili

Excluding `[..]` entirely will install support for English only.

### From Source

Clone the repository:

``` sh
git clone https://github.com/MycroftAI/mimic3.git
```

Run the install script:

``` sh
cd mimic3/
./install.sh
```

A virtual environment will be created in `mimic3/.venv` and each of the Python modules will be installed in editable mode (`pip install -e`).

Once installed, the following commands will be available in `.venv/bin`:

* `mimic3`
* `mimic3-server`
* `mimic3-download`

## Voice Keys

Mimic 3 references voices with the format:

* `_/_` for single speaker voices, and
* `_/_#` for multi-speaker voices
    * `` can be a name or number starting at 0
    * Speaker names come from a voice's `speakers.txt` file

![parts of a mimic 3 voice](img/voice_parts.png)

For example, the default [Alan Pope](https://popey.me/) voice key is `en_UK/apope_low`.
The [CMU Arctic voice](https://github.com/MycroftAI/mimic3-voices/tree/master/voices/en_US/cmu-arctic_low) contains multiple speakers, with a commonly used voice being `en_US/cmu-arctic_low#slt`.

Voices are automatically downloaded from [GitHub](https://github.com/MycroftAI/mimic3-voices) and stored in `${HOME}/.local/share/mimic3` (technically `${XDG_DATA_HOME}/mimic3`). You can also [manually download them](#downloading-voices).

## Running

### Command-Line Tools

The `mimic3` command can be used to synthesize audio on the command line:

``` sh
mimic3 --voice 'en_UK/apope_low' 'My hovercraft is full of eels.' > hovercraft_eels.wav
```

See [voice keys](#voice-keys) for how to reference voices and speakers.

See `mimic3 --help` or the [CLI documentation](mimic3-tts/) for more details.

#### Downloading Voices

Mimic 3 automatically downloads voices when they're first used, but you can also download them manually with `mimic3-download`. For example:

``` sh
mimic3-download 'en_US/*'
```

will download all U.S. English voices to `${HOME}/.local/share/mimic3`.

See `mimic3-download --help` for more options.
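If you pass voice keys around in your own scripts, it can help to split them into their parts first. The following is a minimal sketch, not part of Mimic 3: `VoiceKey` and `parse_voice_key` are hypothetical names, and the shape of a key (text before `/`, text after `/`, and an optional speaker after `#`) is inferred only from the examples `en_UK/apope_low` and `en_US/cmu-arctic_low#slt` in the [voice keys](#voice-keys) section.

``` python
from typing import NamedTuple, Optional


class VoiceKey(NamedTuple):
    """Hypothetical container for the parts of a voice key (not a Mimic 3 API)."""

    language: str           # text before "/", e.g. "en_US"
    name: str               # text after "/", e.g. "cmu-arctic_low"
    speaker: Optional[str]  # text after "#" (speaker name or number), if present


def parse_voice_key(key: str) -> VoiceKey:
    """Split a voice key such as 'en_US/cmu-arctic_low#slt' into its parts."""
    # Separate the optional speaker suffix first.
    voice, hash_sep, speaker = key.partition("#")
    # Then separate the language directory from the voice name.
    language, slash, name = voice.partition("/")
    if not slash or not language or not name:
        raise ValueError(f"expected a key like 'en_UK/apope_low', got {key!r}")
    return VoiceKey(language, name, speaker if hash_sep else None)
```

For a single speaker voice the `speaker` field is `None`; for a multi-speaker voice it holds the name or number given after `#` as a string.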
### Web Server and Client

Start a web server with `mimic3-server` and visit `http://localhost:59125` to view the web UI.

![screenshot of web interface](mimic3-http/img/server_screenshot.jpg)

The following endpoints are available:

* `/api/tts`
    * `POST` text or [SSML](#ssml) and receive WAV audio back
    * Use `?voice=` to select a different [voice/speaker](#voice-keys)
    * Set `Content-Type` to `application/ssml+xml` (or use `?ssml=1`) for [SSML](#ssml) input
* `/api/voices`
    * Returns a JSON list of available voices

An [OpenAPI](https://www.openapis.org/) test page is also available at `http://localhost:59125/openapi`.

See `mimic3-server --help` or the [web server documentation](mimic3-http/) for more details.

#### Web Client

The `mimic3` program provides an interface to the Mimic 3 web server when the `--remote` option is given.

Assuming you have started `mimic3-server` and can access `http://localhost:59125`, then:

``` sh
mimic3 --remote --voice 'en_UK/apope_low' 'My hovercraft is full of eels.' > hovercraft_eels.wav
```

If your server is somewhere besides `localhost`, use `mimic3 --remote ...`

See `mimic3 --help` for more options.

## CUDA Acceleration

If you have a GPU with support for CUDA, you can accelerate synthesis with the `--cuda` flag when running `mimic3` or `mimic3-server`. This requires you to install the [onnxruntime-gpu](https://pypi.org/project/onnxruntime-gpu/) Python package.

Using [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) is highly recommended. See [Dockerfile.gpu](Dockerfile.gpu) for an example of how to build a compatible container.

## MaryTTS Compatibility

Use the Mimic 3 web server as a drop-in replacement for [MaryTTS](http://mary.dfki.de/), for example with [Home Assistant](https://www.home-assistant.io/integrations/marytts/).

Make sure to use a compatible [voice key](#voice-keys) like `en_UK/apope_low`.

For Mycroft, you can use this instead of [the plugin](https://github.com/MycroftAI/plugin-tts-mimic3) by running:

``` sh
mycroft-config edit user
```

and then adding the following:

``` json
"tts": {
  "module": "marytts",
  "marytts": {
    "url": "http://localhost:59125",
    "voice": "en_UK/apope_low"
  }
}
```

## SSML

A [subset of SSML](mimic3-tts/#SSML) (Speech Synthesis Markup Language) is supported.

For example:

``` xml
Welcome to the world of speech synthesis. This is a 2 voice.
```

will speak the two sentences with different voices and a 3 second pause in between. The second sentence will also have the number "2" pronounced as "second" (ordinal form).

SSML `` support varies between voice types:

* [gruut](https://github.com/rhasspy/gruut/#ssml)
* [eSpeak-ng](http://espeak.sourceforge.net/ssml.html)
* [epitran](https://github.com/dmort27/epitran/) voices do not currently support ``
* Character-based voices do not currently support ``

## License

See [license file](LICENSE)