diff --git a/README.md b/README.md index 5ee8587..7fd9e73 100644 --- a/README.md +++ b/README.md @@ -5,99 +5,32 @@ A fast and local neural text to speech system developed by [Mycroft](https://mycroft.ai/) for the [Mark II](https://mycroft.ai/product/mark-ii/). * [Available voices](https://github.com/MycroftAI/mimic3-voices) -* [How does it work?](mimic3_tts/#architecture) +* [Documentation](https://mycroft-ai.gitbook.io/docs/mycroft-technologies/mimic-tts/coming-soon-mimic-3) +* [How does it work?](https://mycroft-ai.gitbook.io/docs/mycroft-technologies/mimic-tts/coming-soon-mimic-3#how-it-works) -## Use Cases - -* [Mycroft TTS plugin](#mycroft-tts-plugin) - * `mycroft-say 'Hello world.'` -* [Web server](#web-server-and-client) - * `curl -X POST --data 'Hello world.' --output - localhost:59125/api/tts | aplay` - * Drop-in [replacement for MaryTTS](#marytts-compatibility) -* [Command-line tool](#command-line-tools) - * `mimic3 'Hello world.' | aplay` -* [Voice for screen reader](mimic3_tts/#speech-dispatcher) - * `spd-say 'Hello world.'` - - -## Dependencies - -Mimic 3 requires: - -* Python 3.7 or higher -* The [Onnx runtime](https://onnxruntime.ai/) -* [gruut](https://github.com/rhasspy/gruut) or [eSpeak-ng](https://github.com/espeak-ng/espeak-ng) or [epitran](https://github.com/dmort27/epitran/) (depending on the voice) - - -## Installation - - -### eSpeak - -Some voices depend on [eSpeak-ng](https://github.com/espeak-ng/espeak-ng), specifically `libespeak-ng.so`. For those voices, make sure that libespeak-ng is installed with: - -``` sh -sudo apt-get install libespeak-ng1 -``` - -On 32-bit ARM platforms (a.k.a. 
`armv7l` or `armhf`), you will also need some extra libraries: - -``` sh -sudo apt-get install libatomic1 libgomp1 libatlas-base-dev -``` - +## Quickstart ### Mycroft TTS Plugin -Install the plugin: - ``` sh +# Install system packages +sudo apt-get install libespeak-ng1 + +# Install plugin mycroft-pip install mycroft-plugin-tts-mimic3[all] -``` -Enable the plugin in your [mycroft.conf](https://mycroft-ai.gitbook.io/docs/using-mycroft-ai/customizations/mycroft-conf) file: - -``` sh +# Activate plugin mycroft-config set tts.module mimic3_tts_plug + +# Start mycroft +mycroft-start all ``` -or you can manually add the following to `mycroft.conf` with `mycroft-config edit user`: - -``` json -"tts": { - "module": "mimic3_tts_plug" -} -``` +See [documentation](https://mycroft-ai.gitbook.io/docs/mycroft-technologies/mimic-tts/coming-soon-mimic-3#tts-plugin-for-mycroft-ai) for more details. -See the [plugin's documentation](https://github.com/MycroftAI/plugin-tts-mimic3) for more options. - - -### Docker image - -A pre-built Docker image is available for the following platforms: - -* `linux/amd64` - * For desktops and laptops (`x86_64` CPUs) -* `linux/arm64` - * For Raspberry 3/4 and Zero 2 with [64-bit Pi OS](https://www.raspberrypi.com/news/raspberry-pi-os-64-bit/) -* `linux/arm/v7` - * For Raspberry Pi 1/2/3/4 and Zero 2 with 32-bit Pi OS - -Install/update with: - -``` sh -docker pull mycroftai/mimic3 -``` - -Once installed, check out the following scripts for running: - -* [`mimic3`](docker/mimic3) -* [`mimic3-server`](docker/mimic3-server) -* [`mimic3-download`](docker/mimic3-download) - -Or you can manually run the web server with: +### Web Server ``` sh docker run \ @@ -107,234 +40,43 @@ docker run \ 'mycroftai/mimic3' ``` -Voices will be automatically downloaded to `${HOME}/.local/share/mycroft/mimic3/voices` - - -### Debian Package - -Grab the Debian package from the [latest release](https://github.com/mycroftAI/mimic3/releases) for your platform: - -* 
`mycroft-mimic3-tts__amd64.deb` - * For desktops and laptops (`x86_64` CPUs) -* `mycroft-mimic3-tts__arm64.deb` - * For Raspberry 3/4 and Zero 2 with [64-bit Pi OS](https://www.raspberrypi.com/news/raspberry-pi-os-64-bit/) -* `mycroft-mimic3-tts__armhf.deb` - * For Raspberry Pi 1/2/3/4 and Zero 2 with 32-bit Pi OS - -Once downloaded, install the package with (note the `./`): - -``` sh -sudo apt install ./mycroft-mimic3-tts__.deb -``` - -Once installed, the following commands will be available in `/usr/bin`: - -* `mimic3` -* `mimic3-server` -* `mimic3-download` - - -### Using pip - -Install the command-line tool: - -``` sh -pip install mycroft-mimic3-tts[all] -``` - -Once installed, the following commands will be available: - -* `mimic3` -* `mimic3-download` -* `mimic3-server` - -Language support can be selectively installed by replacing `all` with: - -* `de` - German -* `es` - Spanish -* `fa` - Farsi -* `fr` - French -* `it` - Italian -* `nl` - Dutch -* `ru` - Russian -* `sw` - Kiswahili - -Excluding `[..]` entirely will install support for English only. - - -### From Source - -Clone the repository: - -``` sh -git clone https://github.com/MycroftAI/mimic3.git -``` - -Run the install script: - -``` sh -cd mimic3/ -./install.sh -``` - -A virtual environment will be created in `mimic3/.venv` and each of the Python modules will be installed in editiable mode (`pip install -e`). - -Once installed, the following commands will be available in `.venv/bin`: - -* `mimic3` -* `mimic3-server` -* `mimic3-download` - - -## Voice Keys - -Mimic 3 references voices with the format: - -* `_/_` for single speaker voices, and -* `_/_#` for multi-speaker voices - * `` can be a name or number starting at 0 - * Speaker names come from a voice's `speakers.txt` file - -![parts of a mimic 3 voice](img/voice_parts.png) - -For example, the default [Alan Pope](https://popey.me/) voice key is `en_UK/apope_low`. 
-The [CMU Arctic voice](https://github.com/MycroftAI/mimic3-voices/tree/master/voices/en_US/cmu-arctic_low) contains multiple speakers, with a commonly used voice being `en_US/cmu-arctic_low#slt`. - -Voices are automatically downloaded from [Github](https://github.com/MycroftAI/mimic3-voices) and stored in `${HOME}/.local/share/mycroft/mimic3` (technically `${XDG_DATA_HOME}/mycroft/mimic3`). You can also [manually download them](#downloading-voices). - - -## Running - - -### Command-Line Tools - -The `mimic3` command can be used to synthesize audio on the command line: - -``` sh -mimic3 --voice 'en_UK/apope_low' 'My hovercraft is full of eels.' > hovercraft_eels.wav -``` - -See [voice keys](#voice-keys) for how to reference voices and speakers. - -See `mimic3 --help` or the [CLI documentation](mimic3_tts/) for more details. - - -#### Downloading Voices - -Mimic 3 automatically downloads voices when they're first used, but you can manually download them too with `mimic3-download`. - -For example: - -``` sh -mimic3-download 'en_US/*' -``` - -will download all U.S. English voices to `${HOME}/.local/share/mycroft/mimic3/voices`. - -See `mimic3-download --help` for more options. - - -### Web Server and Client - -Start a web server with `mimic3-server` and visit `http://localhost:59125` to view the web UI. - -To access the server from outside your device, run `mimic3-server --host 0.0.0.0` (see also `--port `). 
- -![screenshot of web interface](img/server_screenshot.jpg) - -The following endpoints are available: - -* `/api/tts` - * `POST` text or [SSML](#ssml) and receive WAV audio back - * Use `?voice=` to select a different [voice/speaker](#voice-keys) - * Set `Content-Type` to `application/ssml+xml` (or use `?ssml=1`) for [SSML](#ssml) input -* `/api/voices` - * Returns a JSON list of available voices - -An [OpenAPI](https://www.openapis.org/) test page is also available at `http://localhost:59125/openapi` - -See `mimic3-server --help` for the [web server documentation](mimic3_http/) for more details. - - -#### Web Client - -The `mimic3` program provides an interface to the Mimic 3 web server when the `--remote` option is given. - -Assuming you have started `mimic3-server` and can access `http://localhost:59125`, then: - -``` sh -mimic3 --remote --voice 'en_UK/apope_low' 'My hovercraft is full of eels.' > hovercraft_eels.wav -``` - -If your server is somewhere besides `localhost`, use `mimic3 --remote ...` - -See `mimic3 --help` for more options. - - -## CUDA Acceleration - -If you have a GPU with support for CUDA, you can accelerate synthesis with the `--cuda` flag when running `mimic3` or `mimic3-server`. This requires you to install the [onnxruntime-gpu](https://pypi.org/project/onnxruntime-gpu/) Python package. - -Using [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) is highly recommended. See [ Dockerfile.gpu](Dockerfile.gpu) for an example of how to build a compatible container. - - -## MaryTTS Compatibility - -Use the Mimic 3 web server as a drop-in replacement for [MaryTTS](http://mary.dfki.de/), for example with [Home Assistant](https://www.home-assistant.io/integrations/marytts/). - -Make sure to use a Mimic 3 [voice key](#voice-keys) like `en_UK/apope_low` instead of a MaryTTS voice name. 
-
-For Mycroft, you can use this instead of [the plugin](https://github.com/MycroftAI/plugin-tts-mimic3) by running:
+Visit [http://localhost:59125](http://localhost:59125) or from another terminal:

``` sh
-mycroft-config edit user
+curl -X POST --data 'Hello world.' --output - localhost:59125/api/tts | aplay
```

-and then adding the following:
+See [documentation](https://mycroft-ai.gitbook.io/docs/mycroft-technologies/mimic-tts/coming-soon-mimic-3#web-server) for more details.

-``` json
-"tts": {
-"module": "marytts",
-"marytts": {
-  "url": "http://localhost:59125",
-  "voice": "en_UK/apope_low"
-}
+
+### Command-Line Tool
+
+``` sh
+# Install system packages
+sudo apt-get install libespeak-ng1
+
+# Create virtual environment
+python3 -m venv .venv
+source .venv/bin/activate
+pip3 install --upgrade pip
+
+pip3 install mycroft-mimic3-tts[all]
```

+Now you can run:

-## SSML
-
-A [subset of SSML](mimic3_tts/#SSML) (Speech Synthesis Markup Language) is supported.
-
-For example:
-
-``` xml
-
-
-
-Welcome to the world of speech synthesis.
-
-
-
-
-
-This is a 2 voice.
-
-
-
+``` sh
+mimic3 'Hello world.' | aplay
```

-will speak the two sentences with different voices and a 3 second second pause in between. The second sentence will also have the number "2" pronounced as "second" (ordinal form).
+For repeated use, start `mimic3-server` once and call `mimic3 --remote ...`; this keeps voice models loaded between runs and is much faster.

-SSML `` support varies between voice types:
+See [documentation](https://mycroft-ai.gitbook.io/docs/mycroft-technologies/mimic-tts/coming-soon-mimic-3#command-line-interface) for more details.
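The server/remote workflow mentioned in the new Command-Line Tool section could be sketched as below (a suggested addition to the quickstart; it assumes Mimic 3 and the `en_UK/apope_low` voice are installed and that the server uses its default port, 59125):

``` sh
# Start the server once in the background; it keeps voice models
# loaded in memory between requests
mimic3-server &

# Later invocations synthesize through the running server instead of
# reloading the model each time (see `mimic3 --help` for connecting
# to a server on another host)
mimic3 --remote --voice 'en_UK/apope_low' 'My hovercraft is full of eels.' > hovercraft_eels.wav
```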
-* [gruut](https://github.com/rhasspy/gruut/#ssml) -* [eSpeak-ng](http://espeak.sourceforge.net/ssml.html) -* [epitran](https://github.com/dmort27/epitran/) voices do not currently support `` -* Character-based voices do not currently support `` + +--- ## License diff --git a/img/server_screenshot.jpg b/img/server_screenshot.jpg index 25226b4..c4c9892 100644 Binary files a/img/server_screenshot.jpg and b/img/server_screenshot.jpg differ diff --git a/mimic3_http/README.md b/mimic3_http/README.md index bd24ddf..1bcc33d 100644 --- a/mimic3_http/README.md +++ b/mimic3_http/README.md @@ -2,56 +2,12 @@ A small HTTP web server for the [Mimic 3](https://github.com/MycroftAI/mimic3) text to speech system. -[Available voices](https://github.com/MycroftAI/mimic3-voices) +* [Available voices](https://github.com/MycroftAI/mimic3-voices) +* [Documentation](https://mycroft-ai.gitbook.io/docs/mycroft-technologies/mimic-tts/coming-soon-mimic-3) ![screenshot of web interface](img/server_screenshot.jpg) -## Running the Server +## License -``` sh -mimic3-server -``` - -This will start a web server at `http://localhost:59125` - -See `mimic3-server --debug` for more options. - - -### Endpoints - -* `/api/tts` - * `POST` text or [SSML](#ssml) and receive WAV audio back - * Use `?voice=` to select a different [voice/speaker](#voice-keys) - * Set `Content-Type` to `application/ssml+xml` (or use `?ssml=1`) for [SSML](#ssml) input -* `/api/voices` - * Returns a JSON list of available voices - -An [OpenAPI](https://www.openapis.org/) test page is also available at `http://localhost:59125/openapi` - - -### CUDA Acceleration - -If you have a GPU with support for CUDA, you can accelerate synthesis with the `--cuda` flag. This requires you to install the [onnxruntime-gpu](https://pypi.org/project/onnxruntime-gpu/) Python package. - -Using [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) is highly recommended. 
See the `Dockerfile.gpu` file in the parent repository for an example of how to build a compatible container. - - -## Running the Client - -Assuming you have started `mimic3-server` and can access `http://localhost:59125`, then: - -``` sh -mimic3 --remote --voice 'en_UK/apope_low' 'My hovercraft is full of eels.' > hovercraft_eels.wav -``` - -If your server is somewhere besides `localhost`, use `mimic3 --remote ...` - -See `mimic3 --help` for more options. - - -## MaryTTS Compatibility - -Use the Mimic 3 web server as a drop-in replacement for [MaryTTS](http://mary.dfki.de/), for example with [Home Assistant](https://www.home-assistant.io/integrations/marytts/). - -Make sure to use a compatible [voice key](#voice-keys) like `en_UK/apope_low`. +Mimic 3 is available under the [AGPL v3 license](LICENSE) diff --git a/mimic3_tts/README.md b/mimic3_tts/README.md index d7cdeaf..3797309 100644 --- a/mimic3_tts/README.md +++ b/mimic3_tts/README.md @@ -1,348 +1,11 @@ # Mimic 3 -A fast and local neural text to speech system for [Mycroft](https://mycroft.ai/) and the [Mark II](https://mycroft.ai/product/mark-ii/). +A fast and local neural text to speech system developed by [Mycroft](https://mycroft.ai/) for the [Mark II](https://mycroft.ai/product/mark-ii/). * [Available voices](https://github.com/MycroftAI/mimic3-voices) -* [Mimic 3 Architecture](#architecture) - - -## Command-Line Tools - - -### mimic3 - - -#### Basic Synthesis - -```sh -mimic3 --voice "" > output.wav -``` - -where `` is a [voice key](https://github.com/MycroftAI/mimic3/#voice-keys) like `en_UK/apope_low`. -`` may contain multiple sentences, which will be combined in the final output WAV file. These can also be [split into separate WAV files](#multiple-wav-output). - - -#### SSML Synthesis - -```sh -mimic3 --ssml --voice "" > output.wav -``` - -where `` is valid [SSML](https://www.w3.org/TR/speech-synthesis11/). Not all SSML features are supported, see [the documentation](#ssml) for details. 
- -If your SSML contains `` tags, add `--mark-file ` to the command-line and use `--interactive` mode. As the marks are encountered, their names will be written on separate lines to the file: - -```sh -mimic3 --ssml --interactive --mark-file - 'Test 1. Test 2.' -``` - - -#### Long Texts - -If your text is very long, and you would like to listen to it as its being synthesized, use `--interactive` mode: - -```sh -mimic3 --interactive < long.txt -``` - -Each input line will be synthesized and played (see `--play-program`). By default, 5 sentences will be kept in an output queue, only blocking synthesis when the queue is full. You can adjust this value with `--result-queue-size`. - -If your long text is fixed-width with blank lines separating paragraphs like those from [Project Gutenberg](https://www.gutenberg.org/), use the `--process-on-blank-line` option so that sentences will not be broken at line boundaries. For example, you can listen to "Alice in Wonderland" like this: - -```sh -curl --output - 'https://www.gutenberg.org/files/11/11-0.txt' | \ - mimic3 --interactive --process-on-blank-line -``` - - -#### Multiple WAV Output - -With `--output-dir` set to a directory, Mimic 3 will output a separate WAV file for each sentence: - -```sh -mimic3 'Test 1. Test 2.' --output-dir /path/to/wavs -``` - -By default, each WAV file will be named using the (slightly modified) text of the sentence. You can have WAV files named using a timestamp instead with `--output-naming time`. For full control of the output naming, the `--csv` command-line flag indicates that each sentence is of the form `id|text` where `id` will be the name of the WAV file. - -```sh -cat << EOF | -s01|The birch canoe slid on the smooth planks. -s02|Glue the sheet to the dark blue background. -s03|It's easy to tell the depth of a well. -s04|These days a chicken leg is a rare dish. -s05|Rice is often served in round bowls. -s06|The juice of lemons makes fine punch. 
-s07|The box was thrown beside the parked truck. -s08|The hogs were fed chopped corn and garbage. -s09|Four hours of steady work faced us. -s10|Large size in stockings is hard to sell. -EOF - mimic3 --csv --output-dir /path/to/wavs -``` - -You can adjust the delimiter with `--csv-delimiter `. - -Additionally, you can use the `--csv-voice` option to specify a different voice or speaker for each line: - -```sh -cat << EOF | -s01|#awb|The birch canoe slid on the smooth planks. -s02|#rms|Glue the sheet to the dark blue background. -s03|#slt|It's easy to tell the depth of a well. -s04|#ksp|These days a chicken leg is a rare dish. -s05|#clb|Rice is often served in round bowls. -s06|#aew|The juice of lemons makes fine punch. -s07|#bdl|The box was thrown beside the parked truck. -s08|#lnh|The hogs were fed chopped corn and garbage. -s09|#jmk|Four hours of steady work faced us. -s10|en_UK/apope_low|Large size in stockings is hard to sell. -EOF - mimic3 --voice 'en_US/cmu-arctic_low' --csv-voice --output-dir /path/to/wavs -``` - -The second contain can contain a `#` or an entirely different voice! - - -#### Interactive Mode - -With `--interactive`, Mimic 3 will switch into interactive mode. After entering a sentence, it will be played with `--play-program`. - -```sh -mimic3 --interactive -Reading text from stdin... -Hello world! -``` - -Use `CTRL+D` or `CTRL+C` to exit. - - -#### Noise and Length Settings - -Synthesis has the following additional parameters: - -* `--noise-scale` and `--noise-w` - * Determine the speaker volatility during synthesis - * 0-1, default is 0.667 and 0.8 respectively -* `--length-scale` - makes the voice speaker slower (> 1) or faster (< 1) - -Individual voices have default settings for these parameters in their `config.json` files (under `inference`). - - -#### List Voices - -```sh -mimic3 --voices -``` - - -#### CUDA Acceleration - -If you have a GPU with support for CUDA, you can accelerate synthesis with the `--cuda` flag. 
This requires you to install the [onnxruntime-gpu](https://pypi.org/project/onnxruntime-gpu/) Python package. - -Using [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) is highly recommended. See the `Dockerfile.gpu` file in the parent repository for an example of how to build a compatible container. - - - -### mimic3-download - -Mimic 3 automatically downloads voices when they're first used, but you can manually download them too with `mimic3-download`. - -For example: - -``` sh -mimic3-download 'en_US/*' -``` - -will download all U.S. English voices to `${HOME}/.local/share/mycroft/mimic3` (technically `${XDG_DATA_HOME}/mimic3`). - -See `mimic3-download --help` for more options. - - -## SSML - -A subset of [SSML](https://www.w3.org/TR/speech-synthesis11/) (Speech Synthesis Markup Language) is supported: - -* `` - wrap around SSML text - * `lang` - set language for document -* `` - sentence (disables automatic sentence breaking) - * `lang` - set language for sentence -* `` / `` - word (disables automatic tokenization) -* `` - set voice of inner text - * `voice` - name or language of voice - * Name format is `tts:voice` (e.g., "glow-speak:en-us_mary_ann") or `tts:voice#speaker_id` (e.g., "coqui-tts:en_vctk#p228") - * If one of the supported languages, a preferred voice is used (override with `--preferred-voice `) -* `` - change speaking attributes - * Supported `attribute` names: - * `volume` - speaking volume - * number in [0, 100] - 0 is silent, 100 is loudest (default) - * +X, -X, +X%, -X% - absolute/percent offset from current volume - * one of "default", "silent", "x-loud", "loud", "medium", "soft", "x-soft" - * `rate` - speaking rate - * number - 1 is default rate, < 1 is slower, > 1 is faster - * X% - 100% is default rate, 50% is half speed, 200% is twice as fast - * one of "default", "x-fast", "fast", "medium", "slow", "x-slow" -* `` - force interpretation of inner text - * `interpret-as` one of "spell-out", "date", "number", "time", or "currency" - 
* `format` - way to format text depending on `interpret-as` - * number - one of "cardinal", "ordinal", "digits", "year" - * date - string with "d" (cardinal day), "o" (ordinal day), "m" (month), or "y" (year) -* `` - Pause for given amount of time - * time - seconds ("123s") or milliseconds ("123ms") -* `` - substitute `alias` for inner text -* `` - supply phonemes for inner text - * See `phonemes.txt` in voice directory for available phonemes - * Phonemes may need to be separated by whitespace - -SSML `` support varies between voice types: - -* [gruut](https://github.com/rhasspy/gruut/#ssml) -* [eSpeak-ng](http://espeak.sourceforge.net/ssml.html) -* Character-based voices do not currently support `` - - -## Speech Dispatcher - -Mimic 3 can be used with the [Orca screen reader](https://help.gnome.org/users/orca/stable/) for Linux via [speech-dispatcher](https://github.com/brailcom/speechd). - -After [installing Mimic 3](https://github.com/MycroftAI/mimic3/#installation), make sure you also have speech-dispatcher installed: - -``` sh -sudo apt-get install speech-dispatcher -``` - -Create the file `/etc/speech-dispatcher/modules/mimic3-generic.conf` with the contents: - -``` text -GenericExecuteSynth "printf %s \'$DATA\' | /path/to/mimic3 --remote --voice \'$VOICE\' --stdout | $PLAY_COMMAND" -AddVoice "en-us" "MALE1" "en_UK/apope_low" -``` - -You will need `sudo` access to do this. Make sure to change `/path/to/mimic3` to wherever you installed Mimic 3. Note that the `--remote` option is used to connect to a local Mimic 3 web server (use `--remote ` if your server is somewhere besides `localhost`). - -To change the voice later, you only need to replace `en_UK/apope_low`. 
- -Next, edit the existing file `/etc/speech-dispatcher/speechd.conf` and ensure the following settings are present: - -``` text -DefaultVoiceType "MALE1" -DefaultModule mimic3-generic -``` - -Restart speech-dispatcher with: - -``` sh -sudo systemctl restart speech-dispatcher -``` - -and test it out with: - -``` sh -spd-say 'Hello from speech dispatcher.' -``` - - -### Systemd Service - -To ensure that Mimic 3 runs at boot, create a systemd service at `$HOME/.config/systemd/user/mimic3.service` with the contents: - -``` text -[Unit] -Description=Run Mimic 3 web server -Documentation=https://github.com/MycroftAI/mimic3 - -[Service] -ExecStart=/path/to/mimic3-server - -[Install] -WantedBy=default.target -``` - -Make sure to change `/path/to/mimic3-server` to wherever you installed Mimic 3. - -Refresh the systemd services: - -``` sh -systemctl --user daemon-reload -``` - -Now try starting the service: - -``` sh -systemctl --user start mimic3 -``` - -If that's successful, ensure it starts at boot: - -``` sh -systemctl --user enable mimic3 -``` - - -## Architecture - -Mimic 3 uses the [VITS](https://arxiv.org/abs/2106.06103), a "Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech". VITS is a combination of the [GlowTTS duration predictor](https://arxiv.org/abs/2005.11129) and the [HiFi-GAN vocoder](https://arxiv.org/abs/2010.05646). - -Our implementation is heavily based on [Jaehyeon Kim's PyTorch model](https://github.com/jaywalnut310/vits), with the addition of [Onnx runtime](https://onnxruntime.ai/) export for speed. - -![mimic 3 architecture](img/mimic3-architecture.png) - - -### Phoneme Ids - -At a high level, Mimic 3 performs two important tasks: - -1. Converting raw text to numeric input for the VITS TTS model, and -2. Using the model to transform numeric input into audio output - -The second step is the same for every voice, but the first step (text to numbers) varies. 
There are currently three implementations of step 1, described below. - - -### gruut Phoneme-based Voices - -Voices that use [gruut](https://github.com/rhasspy/gruut/) for phonemization. - -gruut normalizes text and phonemizes words according to a lexicon, with a pre-trained grapheme-to-phoneme model used to guess unknown word pronunciations. - - -### eSpeak Phoneme-based Voices - -Voices that use [eSpeak-ng](https://github.com/espeak-ng/espeak-ng) for phonemization (via [espeak-phonemizer](https://github.com/rhasspy/espeak-phonemizer)). - -eSpeak-ng normalizes and phonemizes text using internal rules and lexicons. It supports a large number of languages, and can handle many textual forms. - - -### Character-based Voices - -Voices whose "phonemes" are characters from an alphabet, typically with some punctuation. - -For voices whose orthography (writing system) is close enough to its spoken form, character-based voices allow for skipping the phonemization step. However, these voices do not support text normalization, so numbers, dates, etc. must be written out. - - -### Epitran-based Voices - -Voices that use [epitran](https://github.com/dmort27/epitran/) for phonemization. - -epitran uses rules to generate phonetic pronunciations from text. It does not support text normalization, however, so numbers, dates, etc. must be written out. 
- - -### Components of a Voice Model - -Voice models are stored in a directory with a specific layout: - -* `_` (e.g., `en_UK`) - * `_` (e.g., `apope_low`) - * `ALIASES` - alternative names for the voice, one per line (optional) - * `config.json` - training/inference configuration (see [code](https://github.com/MycroftAI/mimic3/blob/master/mimic3-tts/mimic3_tts/config.py) for details) - * `generator.onnx` - exported inference model (see `ids_to_audio` method in [`voice.py`](https://github.com/MycroftAI/mimic3/blob/master/mimic3-tts/mimic3_tts/voice.py)) - * `LICENSE` - text, name, or URL of voice model license - * `phoneme_map.txt` - mapping from source phoneme to destination phoneme(s) (optional) - * `phonemes.txt` - mapping from integer ids to phonemes (`_` = padding, `^` = beginning of utterance, `$` = end of utterance, `#` = word break) - * `README.md` - description of the voice - * `SOURCE` - URL(s) of the dataset(s) this voice was trained on - * `VERSION` - version of the voice in the format "MAJOR.Minor.bugfix" (e.g. "1.0.2") +* [Documentation](https://mycroft-ai.gitbook.io/docs/mycroft-technologies/mimic-tts/coming-soon-mimic-3) ## License -See [license file](LICENSE) +Mimic 3 is available under the [AGPL v3 license](LICENSE) diff --git a/opentts_abc/README.md b/opentts_abc/README.md new file mode 100644 index 0000000..280aece --- /dev/null +++ b/opentts_abc/README.md @@ -0,0 +1,3 @@ +# OpenTTS ABC + +Abstract base classes used by the [Mimic 3](https://github.com/MycroftAI/mimic3) text to speech system.