Centralize documentation

commit 07f3c00d9b by Michael Hansen, 2022-05-06 15:47:01 -04:00
5 changed files with 47 additions and 683 deletions

README.md

@@ -5,99 +5,32 @@
A fast and local neural text to speech system developed by [Mycroft](https://mycroft.ai/) for the [Mark II](https://mycroft.ai/product/mark-ii/).
* [Available voices](https://github.com/MycroftAI/mimic3-voices)
* [Documentation](https://mycroft-ai.gitbook.io/docs/mycroft-technologies/mimic-tts/coming-soon-mimic-3)
* [How does it work?](https://mycroft-ai.gitbook.io/docs/mycroft-technologies/mimic-tts/coming-soon-mimic-3#how-it-works)
## Use Cases
* [Mycroft TTS plugin](#mycroft-tts-plugin)
* `mycroft-say 'Hello world.'`
* [Web server](#web-server-and-client)
* `curl -X POST --data 'Hello world.' --output - localhost:59125/api/tts | aplay`
* Drop-in [replacement for MaryTTS](#marytts-compatibility)
* [Command-line tool](#command-line-tools)
* `mimic3 'Hello world.' | aplay`
* [Voice for screen reader](mimic3_tts/#speech-dispatcher)
* `spd-say 'Hello world.'`
## Dependencies
Mimic 3 requires:
* Python 3.7 or higher
* The [Onnx runtime](https://onnxruntime.ai/)
* [gruut](https://github.com/rhasspy/gruut) or [eSpeak-ng](https://github.com/espeak-ng/espeak-ng) or [epitran](https://github.com/dmort27/epitran/) (depending on the voice)
## Installation
### eSpeak
Some voices depend on [eSpeak-ng](https://github.com/espeak-ng/espeak-ng), specifically `libespeak-ng.so`. For those voices, make sure that libespeak-ng is installed with:
``` sh
sudo apt-get install libespeak-ng1
```
On 32-bit ARM platforms (a.k.a. `armv7l` or `armhf`), you will also need some extra libraries:
``` sh
sudo apt-get install libatomic1 libgomp1 libatlas-base-dev
```
## Quickstart
### Mycroft TTS Plugin
Install the plugin:
``` sh
# Install system packages
sudo apt-get install libespeak-ng1
# Install plugin
mycroft-pip install mycroft-plugin-tts-mimic3[all]
```
Enable the plugin in your [mycroft.conf](https://mycroft-ai.gitbook.io/docs/using-mycroft-ai/customizations/mycroft-conf) file:
``` sh
# Activate plugin
mycroft-config set tts.module mimic3_tts_plug
# Start mycroft
mycroft-start all
```
or you can manually add the following to `mycroft.conf` with `mycroft-config edit user`:
``` json
"tts": {
    "module": "mimic3_tts_plug"
}
```
See [documentation](https://mycroft-ai.gitbook.io/docs/mycroft-technologies/mimic-tts/coming-soon-mimic-3#tts-plugin-for-mycroft-ai) for more details.
See the [plugin's documentation](https://github.com/MycroftAI/plugin-tts-mimic3) for more options.
### Docker image
A pre-built Docker image is available for the following platforms:
* `linux/amd64`
    * For desktops and laptops (`x86_64` CPUs)
* `linux/arm64`
    * For Raspberry Pi 3/4 and Zero 2 with [64-bit Pi OS](https://www.raspberrypi.com/news/raspberry-pi-os-64-bit/)
* `linux/arm/v7`
    * For Raspberry Pi 1/2/3/4 and Zero 2 with 32-bit Pi OS
Install/update with:
``` sh
docker pull mycroftai/mimic3
```
Once installed, check out the following scripts for running:
* [`mimic3`](docker/mimic3)
* [`mimic3-server`](docker/mimic3-server)
* [`mimic3-download`](docker/mimic3-download)
Or you can manually run the web server with:
``` sh
docker run \
@@ -107,234 +40,43 @@ docker run \
'mycroftai/mimic3'
```
Voices will be automatically downloaded to `${HOME}/.local/share/mycroft/mimic3/voices`
### Debian Package
Grab the Debian package from the [latest release](https://github.com/mycroftAI/mimic3/releases) for your platform:
* `mycroft-mimic3-tts_<version>_amd64.deb`
    * For desktops and laptops (`x86_64` CPUs)
* `mycroft-mimic3-tts_<version>_arm64.deb`
    * For Raspberry Pi 3/4 and Zero 2 with [64-bit Pi OS](https://www.raspberrypi.com/news/raspberry-pi-os-64-bit/)
* `mycroft-mimic3-tts_<version>_armhf.deb`
    * For Raspberry Pi 1/2/3/4 and Zero 2 with 32-bit Pi OS
Once downloaded, install the package with (note the `./`):
``` sh
sudo apt install ./mycroft-mimic3-tts_<version>_<platform>.deb
```
Once installed, the following commands will be available in `/usr/bin`:
* `mimic3`
* `mimic3-server`
* `mimic3-download`
### Using pip
Install the command-line tool:
``` sh
pip install mycroft-mimic3-tts[all]
```
Once installed, the following commands will be available:
* `mimic3`
* `mimic3-download`
* `mimic3-server`
Language support can be selectively installed by replacing `all` with:
* `de` - German
* `es` - Spanish
* `fa` - Farsi
* `fr` - French
* `it` - Italian
* `nl` - Dutch
* `ru` - Russian
* `sw` - Kiswahili
Excluding `[..]` entirely will install support for English only.
### From Source
Clone the repository:
``` sh
git clone https://github.com/MycroftAI/mimic3.git
```
Run the install script:
``` sh
cd mimic3/
./install.sh
```
A virtual environment will be created in `mimic3/.venv` and each of the Python modules will be installed in editable mode (`pip install -e`).
Once installed, the following commands will be available in `.venv/bin`:
* `mimic3`
* `mimic3-server`
* `mimic3-download`
## Voice Keys
Mimic 3 references voices with the format:
* `<language>_<region>/<dataset>_<quality>` for single speaker voices, and
* `<language>_<region>/<dataset>_<quality>#<speaker>` for multi-speaker voices
    * `<speaker>` can be a name or number starting at 0
    * Speaker names come from a voice's `speakers.txt` file
![parts of a mimic 3 voice](img/voice_parts.png)
For example, the default [Alan Pope](https://popey.me/) voice key is `en_UK/apope_low`.
The [CMU Arctic voice](https://github.com/MycroftAI/mimic3-voices/tree/master/voices/en_US/cmu-arctic_low) contains multiple speakers, with a commonly used voice being `en_US/cmu-arctic_low#slt`.
Voices are automatically downloaded from [Github](https://github.com/MycroftAI/mimic3-voices) and stored in `${HOME}/.local/share/mycroft/mimic3` (technically `${XDG_DATA_HOME}/mycroft/mimic3`). You can also [manually download them](#downloading-voices).
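The key format above can be split apart with plain shell parameter expansion — a small sketch (this helper is ours for illustration, not part of Mimic 3):

``` sh
# Parse a Mimic 3 voice key into its parts.
# Format: <language>_<region>/<dataset>_<quality>[#<speaker>]
key='en_US/cmu-arctic_low#slt'

speaker="${key#*#}"                 # text after '#', or the whole key if none
[ "$speaker" = "$key" ] && speaker=''
base="${key%%#*}"                   # "en_US/cmu-arctic_low"
locale="${base%%/*}"                # "en_US"
dataset_quality="${base#*/}"        # "cmu-arctic_low"
quality="${dataset_quality##*_}"    # "low"
dataset="${dataset_quality%_*}"     # "cmu-arctic"

echo "$locale $dataset $quality $speaker"
```

For a single-speaker key like `en_UK/apope_low`, the same parsing leaves `speaker` empty.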
## Running
### Command-Line Tools
The `mimic3` command can be used to synthesize audio on the command line:
``` sh
mimic3 --voice 'en_UK/apope_low' 'My hovercraft is full of eels.' > hovercraft_eels.wav
```
See [voice keys](#voice-keys) for how to reference voices and speakers.
See `mimic3 --help` or the [CLI documentation](mimic3_tts/) for more details.
#### Downloading Voices
Mimic 3 automatically downloads voices when they're first used, but you can manually download them too with `mimic3-download`.
For example:
``` sh
mimic3-download 'en_US/*'
```
will download all U.S. English voices to `${HOME}/.local/share/mycroft/mimic3/voices`.
See `mimic3-download --help` for more options.
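The download location above follows the XDG convention, so the directory can be computed with the standard `~/.local/share` fallback — a sketch:

``` sh
# Resolve where voices are stored, honoring XDG_DATA_HOME with its
# standard ~/.local/share fallback.
voices_dir="${XDG_DATA_HOME:-$HOME/.local/share}/mycroft/mimic3/voices"
echo "$voices_dir"
```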
### Web Server and Client
Start a web server with `mimic3-server` and visit `http://localhost:59125` to view the web UI.
To access the server from outside your device, run `mimic3-server --host 0.0.0.0` (see also `--port <PORT>`).
![screenshot of web interface](img/server_screenshot.jpg)
The following endpoints are available:
* `/api/tts`
    * `POST` text or [SSML](#ssml) and receive WAV audio back
    * Use `?voice=` to select a different [voice/speaker](#voice-keys)
    * Set `Content-Type` to `application/ssml+xml` (or use `?ssml=1`) for [SSML](#ssml) input
* `/api/voices`
    * Returns a JSON list of available voices
An [OpenAPI](https://www.openapis.org/) test page is also available at `http://localhost:59125/openapi`
See `mimic3-server --help` or the [web server documentation](mimic3_http/) for more details.
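A request URL for `/api/tts` can be assembled by hand before handing it to `curl` — a sketch (string assembly only; values are illustrative, and a `#<speaker>` suffix must be percent-encoded as `%23` in a URL):

``` sh
# Build the /api/tts request URL for a given voice.
base_url='http://localhost:59125'
voice='en_UK/apope_low'

url="${base_url}/api/tts?voice=${voice}"
echo "$url"
```

With a server running, `curl -X POST --data 'Hello world.' --output out.wav "$url"` would then fetch the audio.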
#### Web Client
The `mimic3` program provides an interface to the Mimic 3 web server when the `--remote` option is given.
Assuming you have started `mimic3-server` and can access `http://localhost:59125`, then:
``` sh
mimic3 --remote --voice 'en_UK/apope_low' 'My hovercraft is full of eels.' > hovercraft_eels.wav
```
If your server is somewhere besides `localhost`, use `mimic3 --remote <URL> ...`
See `mimic3 --help` for more options.
## CUDA Acceleration
If you have a GPU with support for CUDA, you can accelerate synthesis with the `--cuda` flag when running `mimic3` or `mimic3-server`. This requires you to install the [onnxruntime-gpu](https://pypi.org/project/onnxruntime-gpu/) Python package.
Using [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) is highly recommended. See [Dockerfile.gpu](Dockerfile.gpu) for an example of how to build a compatible container.
## MaryTTS Compatibility
Use the Mimic 3 web server as a drop-in replacement for [MaryTTS](http://mary.dfki.de/), for example with [Home Assistant](https://www.home-assistant.io/integrations/marytts/).
Make sure to use a Mimic 3 [voice key](#voice-keys) like `en_UK/apope_low` instead of a MaryTTS voice name.
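For Home Assistant specifically, the `marytts` integration can be pointed at a running Mimic 3 server — a sketch of `configuration.yaml` (option names follow Home Assistant's MaryTTS integration; check the docs for your Home Assistant version):

``` yaml
# configuration.yaml (illustrative)
tts:
  - platform: marytts
    host: localhost
    port: 59125
    voice: en_UK/apope_low
```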
For Mycroft, you can use this instead of [the plugin](https://github.com/MycroftAI/plugin-tts-mimic3) by running:
``` sh
mycroft-config edit user
```
and then adding the following:
``` json
"tts": {
    "module": "marytts",
    "marytts": {
        "url": "http://localhost:59125",
        "voice": "en_UK/apope_low"
    }
}
```
## SSML
A [subset of SSML](mimic3_tts/#SSML) (Speech Synthesis Markup Language) is supported.
For example:
``` xml
<speak>
    <voice name="en_UK/apope_low">
        <s>
            Welcome to the world of speech synthesis.
        </s>
    </voice>
    <break time="3s" />
    <voice name="en_US/cmu-arctic_low#slt">
        <s>
            <prosody volume="soft" rate="150%">
                This is a <say-as interpret-as="number" format="ordinal">2</say-as> voice.
            </prosody>
        </s>
    </voice>
</speak>
```
will speak the two sentences with different voices and a 3-second pause in between. The second sentence will also have the number "2" pronounced as "second" (ordinal form).
SSML `<say-as>` support varies between voice types:
* [gruut](https://github.com/rhasspy/gruut/#ssml)
* [eSpeak-ng](http://espeak.sourceforge.net/ssml.html)
* [epitran](https://github.com/dmort27/epitran/) voices do not currently support `<say-as>`
* Character-based voices do not currently support `<say-as>`
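For scripting, plain text can be wrapped in minimal SSML before piping to `mimic3 --ssml` — a sketch (the helper name is ours, not part of Mimic 3):

``` sh
# Wrap plain text in a minimal <speak>/<voice>/<s> SSML envelope.
ssml_wrap() {
    printf '<speak><voice name="%s"><s>%s</s></voice></speak>' "$1" "$2"
}

ssml_wrap 'en_UK/apope_low' 'Hello world.'
```

The result could then be sent with, e.g., `mimic3 --ssml "$(ssml_wrap 'en_UK/apope_low' 'Hello world.')" | aplay`.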
---
## License

Binary image file changed (58 KiB before and after); not shown.


@@ -2,56 +2,12 @@
A small HTTP web server for the [Mimic 3](https://github.com/MycroftAI/mimic3) text to speech system.
* [Available voices](https://github.com/MycroftAI/mimic3-voices)
* [Documentation](https://mycroft-ai.gitbook.io/docs/mycroft-technologies/mimic-tts/coming-soon-mimic-3)
![screenshot of web interface](img/server_screenshot.jpg)
## Running the Server
``` sh
mimic3-server
```
This will start a web server at `http://localhost:59125`
See `mimic3-server --help` for more options.
### Endpoints
* `/api/tts`
    * `POST` text or [SSML](#ssml) and receive WAV audio back
    * Use `?voice=` to select a different [voice/speaker](#voice-keys)
    * Set `Content-Type` to `application/ssml+xml` (or use `?ssml=1`) for [SSML](#ssml) input
* `/api/voices`
    * Returns a JSON list of available voices
An [OpenAPI](https://www.openapis.org/) test page is also available at `http://localhost:59125/openapi`
### CUDA Acceleration
If you have a GPU with support for CUDA, you can accelerate synthesis with the `--cuda` flag. This requires you to install the [onnxruntime-gpu](https://pypi.org/project/onnxruntime-gpu/) Python package.
Using [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) is highly recommended. See the `Dockerfile.gpu` file in the parent repository for an example of how to build a compatible container.
## Running the Client
Assuming you have started `mimic3-server` and can access `http://localhost:59125`, then:
``` sh
mimic3 --remote --voice 'en_UK/apope_low' 'My hovercraft is full of eels.' > hovercraft_eels.wav
```
If your server is somewhere besides `localhost`, use `mimic3 --remote <URL> ...`
See `mimic3 --help` for more options.
## MaryTTS Compatibility
Use the Mimic 3 web server as a drop-in replacement for [MaryTTS](http://mary.dfki.de/), for example with [Home Assistant](https://www.home-assistant.io/integrations/marytts/).
Make sure to use a compatible [voice key](#voice-keys) like `en_UK/apope_low`.
## License
Mimic 3 is available under the [AGPL v3 license](LICENSE)


@@ -1,348 +1,11 @@
# Mimic 3
A fast and local neural text to speech system developed by [Mycroft](https://mycroft.ai/) for the [Mark II](https://mycroft.ai/product/mark-ii/).
* [Available voices](https://github.com/MycroftAI/mimic3-voices)
* [Mimic 3 Architecture](#architecture)
## Command-Line Tools
### mimic3
#### Basic Synthesis
```sh
mimic3 --voice <voice> "<text>" > output.wav
```
where `<voice>` is a [voice key](https://github.com/MycroftAI/mimic3/#voice-keys) like `en_UK/apope_low`.
`<text>` may contain multiple sentences, which will be combined in the final output WAV file. These can also be [split into separate WAV files](#multiple-wav-output).
#### SSML Synthesis
```sh
mimic3 --ssml --voice <voice> "<ssml>" > output.wav
```
where `<ssml>` is valid [SSML](https://www.w3.org/TR/speech-synthesis11/). Not all SSML features are supported, see [the documentation](#ssml) for details.
If your SSML contains `<mark>` tags, add `--mark-file <file>` to the command-line and use `--interactive` mode. As the marks are encountered, their names will be written on separate lines to the file:
```sh
mimic3 --ssml --interactive --mark-file - '<speak>Test 1. <mark name="here" /> Test 2.</speak>'
```
#### Long Texts
If your text is very long, and you would like to listen to it as it's being synthesized, use `--interactive` mode:
```sh
mimic3 --interactive < long.txt
```
Each input line will be synthesized and played (see `--play-program`). By default, 5 sentences will be kept in an output queue, only blocking synthesis when the queue is full. You can adjust this value with `--result-queue-size`.
If your long text is fixed-width with blank lines separating paragraphs like those from [Project Gutenberg](https://www.gutenberg.org/), use the `--process-on-blank-line` option so that sentences will not be broken at line boundaries. For example, you can listen to "Alice in Wonderland" like this:
```sh
curl --output - 'https://www.gutenberg.org/files/11/11-0.txt' | \
mimic3 --interactive --process-on-blank-line
```
#### Multiple WAV Output
With `--output-dir` set to a directory, Mimic 3 will output a separate WAV file for each sentence:
```sh
mimic3 'Test 1. Test 2.' --output-dir /path/to/wavs
```
By default, each WAV file will be named using the (slightly modified) text of the sentence. You can have WAV files named using a timestamp instead with `--output-naming time`. For full control of the output naming, the `--csv` command-line flag indicates that each sentence is of the form `id|text` where `id` will be the name of the WAV file.
```sh
cat << EOF |
s01|The birch canoe slid on the smooth planks.
s02|Glue the sheet to the dark blue background.
s03|It's easy to tell the depth of a well.
s04|These days a chicken leg is a rare dish.
s05|Rice is often served in round bowls.
s06|The juice of lemons makes fine punch.
s07|The box was thrown beside the parked truck.
s08|The hogs were fed chopped corn and garbage.
s09|Four hours of steady work faced us.
s10|Large size in stockings is hard to sell.
EOF
mimic3 --csv --output-dir /path/to/wavs
```
You can adjust the delimiter with `--csv-delimiter <delimiter>`.
Additionally, you can use the `--csv-voice` option to specify a different voice or speaker for each line:
```sh
cat << EOF |
s01|#awb|The birch canoe slid on the smooth planks.
s02|#rms|Glue the sheet to the dark blue background.
s03|#slt|It's easy to tell the depth of a well.
s04|#ksp|These days a chicken leg is a rare dish.
s05|#clb|Rice is often served in round bowls.
s06|#aew|The juice of lemons makes fine punch.
s07|#bdl|The box was thrown beside the parked truck.
s08|#lnh|The hogs were fed chopped corn and garbage.
s09|#jmk|Four hours of steady work faced us.
s10|en_UK/apope_low|Large size in stockings is hard to sell.
EOF
mimic3 --voice 'en_US/cmu-arctic_low' --csv-voice --output-dir /path/to/wavs
```
The second column can contain a `#<speaker>` or an entirely different voice!
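The `id|text` naming above can be previewed without running synthesis — a sketch that derives the expected WAV file name for each line (naming each file `<id>.wav` is our assumption; check your actual output directory):

``` sh
# Derive the output WAV name for each `id|text` line.
names="$(
    printf '%s\n' \
        's01|The birch canoe slid on the smooth planks.' \
        's02|Glue the sheet to the dark blue background.' |
    cut -d '|' -f 1 |
    sed 's/$/.wav/'
)"
echo "$names"
```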
#### Interactive Mode
With `--interactive`, Mimic 3 will switch into interactive mode. After entering a sentence, it will be played with `--play-program`.
```sh
mimic3 --interactive
Reading text from stdin...
Hello world!<ENTER>
```
Use `CTRL+D` or `CTRL+C` to exit.
#### Noise and Length Settings
Synthesis has the following additional parameters:
* `--noise-scale` and `--noise-w`
    * Determine the speaker volatility during synthesis
    * 0-1, default is 0.667 and 0.8 respectively
* `--length-scale` - makes the voice speak slower (> 1) or faster (< 1)
Individual voices have default settings for these parameters in their `config.json` files (under `inference`).
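Those per-voice defaults can be inspected directly — a sketch against a mocked-up `config.json` (the field names `inference.noise_scale`, `noise_w`, and `length_scale` are assumptions based on the parameter descriptions above):

``` sh
# Write a mock voice config, then read one inference default back out.
cat > /tmp/mimic3_demo_config.json << 'EOF'
{"inference": {"noise_scale": 0.667, "noise_w": 0.8, "length_scale": 1.0}}
EOF

noise_scale="$(python3 -c 'import json; print(json.load(open("/tmp/mimic3_demo_config.json"))["inference"]["noise_scale"])')"
echo "$noise_scale"
```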
#### List Voices
```sh
mimic3 --voices
```
#### CUDA Acceleration
If you have a GPU with support for CUDA, you can accelerate synthesis with the `--cuda` flag. This requires you to install the [onnxruntime-gpu](https://pypi.org/project/onnxruntime-gpu/) Python package.
Using [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) is highly recommended. See the `Dockerfile.gpu` file in the parent repository for an example of how to build a compatible container.
### mimic3-download
Mimic 3 automatically downloads voices when they're first used, but you can manually download them too with `mimic3-download`.
For example:
``` sh
mimic3-download 'en_US/*'
```
will download all U.S. English voices to `${HOME}/.local/share/mycroft/mimic3` (technically `${XDG_DATA_HOME}/mycroft/mimic3`).
See `mimic3-download --help` for more options.
## SSML
A subset of [SSML](https://www.w3.org/TR/speech-synthesis11/) (Speech Synthesis Markup Language) is supported:
* `<speak>` - wrap around SSML text
    * `lang` - set language for document
* `<s>` - sentence (disables automatic sentence breaking)
    * `lang` - set language for sentence
* `<w>` / `<token>` - word (disables automatic tokenization)
* `<voice name="...">` - set voice of inner text
    * `voice` - name or language of voice
        * Name format is `tts:voice` (e.g., "glow-speak:en-us_mary_ann") or `tts:voice#speaker_id` (e.g., "coqui-tts:en_vctk#p228")
        * If one of the supported languages, a preferred voice is used (override with `--preferred-voice <lang> <voice>`)
* `<prosody attribute="value">` - change speaking attributes
    * Supported `attribute` names:
        * `volume` - speaking volume
            * number in [0, 100] - 0 is silent, 100 is loudest (default)
            * +X, -X, +X%, -X% - absolute/percent offset from current volume
            * one of "default", "silent", "x-loud", "loud", "medium", "soft", "x-soft"
        * `rate` - speaking rate
            * number - 1 is default rate, < 1 is slower, > 1 is faster
            * X% - 100% is default rate, 50% is half speed, 200% is twice as fast
            * one of "default", "x-fast", "fast", "medium", "slow", "x-slow"
* `<say-as interpret-as="">` - force interpretation of inner text
    * `interpret-as` one of "spell-out", "date", "number", "time", or "currency"
    * `format` - way to format text depending on `interpret-as`
        * number - one of "cardinal", "ordinal", "digits", "year"
        * date - string with "d" (cardinal day), "o" (ordinal day), "m" (month), or "y" (year)
* `<break time="">` - Pause for given amount of time
    * time - seconds ("123s") or milliseconds ("123ms")
* `<sub alias="">` - substitute `alias` for inner text
* `<phoneme ph="">` - supply phonemes for inner text
    * See `phonemes.txt` in voice directory for available phonemes
    * Phonemes may need to be separated by whitespace
SSML `<say-as>` support varies between voice types:
* [gruut](https://github.com/rhasspy/gruut/#ssml)
* [eSpeak-ng](http://espeak.sourceforge.net/ssml.html)
* Character-based voices do not currently support `<say-as>`
## Speech Dispatcher
Mimic 3 can be used with the [Orca screen reader](https://help.gnome.org/users/orca/stable/) for Linux via [speech-dispatcher](https://github.com/brailcom/speechd).
After [installing Mimic 3](https://github.com/MycroftAI/mimic3/#installation), make sure you also have speech-dispatcher installed:
``` sh
sudo apt-get install speech-dispatcher
```
Create the file `/etc/speech-dispatcher/modules/mimic3-generic.conf` with the contents:
``` text
GenericExecuteSynth "printf %s \'$DATA\' | /path/to/mimic3 --remote --voice \'$VOICE\' --stdout | $PLAY_COMMAND"
AddVoice "en-us" "MALE1" "en_UK/apope_low"
```
You will need `sudo` access to do this. Make sure to change `/path/to/mimic3` to wherever you installed Mimic 3. Note that the `--remote` option is used to connect to a local Mimic 3 web server (use `--remote <URL>` if your server is somewhere besides `localhost`).
To change the voice later, you only need to replace `en_UK/apope_low`.
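The template expansion speech-dispatcher performs can be simulated to check the resulting pipeline — a sketch (`$DATA`, `$VOICE`, and `$PLAY_COMMAND` mirror the variables in the config above; values are illustrative, and `/path/to/mimic3` is a placeholder as in the config):

``` sh
# Substitute the GenericExecuteSynth variables by hand to see the
# command speech-dispatcher would run.
DATA='Hello from speech dispatcher.'
VOICE='en_UK/apope_low'
PLAY_COMMAND='aplay'

cmd="printf %s '$DATA' | /path/to/mimic3 --remote --voice '$VOICE' --stdout | $PLAY_COMMAND"
echo "$cmd"
```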
Next, edit the existing file `/etc/speech-dispatcher/speechd.conf` and ensure the following settings are present:
``` text
DefaultVoiceType "MALE1"
DefaultModule mimic3-generic
```
Restart speech-dispatcher with:
``` sh
sudo systemctl restart speech-dispatcher
```
and test it out with:
``` sh
spd-say 'Hello from speech dispatcher.'
```
### Systemd Service
To ensure that Mimic 3 runs at boot, create a systemd service at `$HOME/.config/systemd/user/mimic3.service` with the contents:
``` text
[Unit]
Description=Run Mimic 3 web server
Documentation=https://github.com/MycroftAI/mimic3
[Service]
ExecStart=/path/to/mimic3-server
[Install]
WantedBy=default.target
```
Make sure to change `/path/to/mimic3-server` to wherever you installed Mimic 3.
Refresh the systemd services:
``` sh
systemctl --user daemon-reload
```
Now try starting the service:
``` sh
systemctl --user start mimic3
```
If that's successful, ensure it starts at boot:
``` sh
systemctl --user enable mimic3
```
## Architecture
Mimic 3 uses [VITS](https://arxiv.org/abs/2106.06103), a "Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech". VITS is a combination of the [GlowTTS duration predictor](https://arxiv.org/abs/2005.11129) and the [HiFi-GAN vocoder](https://arxiv.org/abs/2010.05646).
Our implementation is heavily based on [Jaehyeon Kim's PyTorch model](https://github.com/jaywalnut310/vits), with the addition of [Onnx runtime](https://onnxruntime.ai/) export for speed.
![mimic 3 architecture](img/mimic3-architecture.png)
### Phoneme Ids
At a high level, Mimic 3 performs two important tasks:
1. Converting raw text to numeric input for the VITS TTS model, and
2. Using the model to transform numeric input into audio output
The second step is the same for every voice, but the first step (text to numbers) varies. There are currently three implementations of step 1, described below.
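The first step can be illustrated with a toy lookup against a `phonemes.txt`-style table — a sketch (the "`<id> <phoneme>` per line" format and the phoneme values are assumptions for illustration):

``` sh
# Toy version of step 1: map a phoneme sequence to integer ids
# using a phonemes.txt-style table.
cat > /tmp/phonemes_demo.txt << 'EOF'
0 _
1 ^
2 $
3 h
4 @
EOF

ids="$(echo 'h @' | awk '
    NR == FNR { id[$2] = $1; next }   # first file: load phoneme -> id table
    { s = ""; for (i = 1; i <= NF; i++) s = s (i > 1 ? " " : "") id[$i]; print s }
' /tmp/phonemes_demo.txt -)"
echo "$ids"
```

The resulting id sequence is what step 2 (the ONNX model) would consume.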
### gruut Phoneme-based Voices
Voices that use [gruut](https://github.com/rhasspy/gruut/) for phonemization.
gruut normalizes text and phonemizes words according to a lexicon, with a pre-trained grapheme-to-phoneme model used to guess unknown word pronunciations.
### eSpeak Phoneme-based Voices
Voices that use [eSpeak-ng](https://github.com/espeak-ng/espeak-ng) for phonemization (via [espeak-phonemizer](https://github.com/rhasspy/espeak-phonemizer)).
eSpeak-ng normalizes and phonemizes text using internal rules and lexicons. It supports a large number of languages, and can handle many textual forms.
### Character-based Voices
Voices whose "phonemes" are characters from an alphabet, typically with some punctuation.
For voices whose orthography (writing system) is close enough to its spoken form, character-based voices allow for skipping the phonemization step. However, these voices do not support text normalization, so numbers, dates, etc. must be written out.
### Epitran-based Voices
Voices that use [epitran](https://github.com/dmort27/epitran/) for phonemization.
epitran uses rules to generate phonetic pronunciations from text. It does not support text normalization, however, so numbers, dates, etc. must be written out.
### Components of a Voice Model
Voice models are stored in a directory with a specific layout:
* `<language>_<region>` (e.g., `en_UK`)
    * `<voice-name>_<quality>` (e.g., `apope_low`)
        * `ALIASES` - alternative names for the voice, one per line (optional)
        * `config.json` - training/inference configuration (see [code](https://github.com/MycroftAI/mimic3/blob/master/mimic3-tts/mimic3_tts/config.py) for details)
        * `generator.onnx` - exported inference model (see `ids_to_audio` method in [`voice.py`](https://github.com/MycroftAI/mimic3/blob/master/mimic3-tts/mimic3_tts/voice.py))
        * `LICENSE` - text, name, or URL of voice model license
        * `phoneme_map.txt` - mapping from source phoneme to destination phoneme(s) (optional)
        * `phonemes.txt` - mapping from integer ids to phonemes (`_` = padding, `^` = beginning of utterance, `$` = end of utterance, `#` = word break)
        * `README.md` - description of the voice
        * `SOURCE` - URL(s) of the dataset(s) this voice was trained on
        * `VERSION` - version of the voice in the format "MAJOR.Minor.bugfix" (e.g. "1.0.2")
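The layout above can be sketched on disk for a hypothetical voice (paths only; real voices come from `mimic3-download`):

``` sh
# Create the expected directory skeleton for a demo voice.
voice_dir=/tmp/mimic3_demo_voices/en_UK/apope_low
mkdir -p "$voice_dir"
touch "$voice_dir/config.json" "$voice_dir/generator.onnx" \
      "$voice_dir/LICENSE" "$voice_dir/phonemes.txt" \
      "$voice_dir/README.md" "$voice_dir/SOURCE"
echo '1.0.0' > "$voice_dir/VERSION"
ls "$voice_dir"
```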
## License
Mimic 3 is available under the [AGPL v3 license](LICENSE)

opentts_abc/README.md (new file)

@@ -0,0 +1,3 @@
# OpenTTS ABC
Abstract base classes used by the [Mimic 3](https://github.com/MycroftAI/mimic3) text to speech system.