Centralize documentation
This commit is contained in:
parent 4a781a1a64
commit 07f3c00d9b

5 changed files with 47 additions and 683 deletions

README.md (332 lines changed)

@@ -5,99 +5,32 @@
A fast and local neural text to speech system developed by [Mycroft](https://mycroft.ai/) for the [Mark II](https://mycroft.ai/product/mark-ii/).

* [Available voices](https://github.com/MycroftAI/mimic3-voices)
* [Documentation](https://mycroft-ai.gitbook.io/docs/mycroft-technologies/mimic-tts/coming-soon-mimic-3)
* [How does it work?](https://mycroft-ai.gitbook.io/docs/mycroft-technologies/mimic-tts/coming-soon-mimic-3#how-it-works)
## Use Cases

* [Mycroft TTS plugin](#mycroft-tts-plugin)
    * `mycroft-say 'Hello world.'`
* [Web server](#web-server-and-client)
    * `curl -X POST --data 'Hello world.' --output - localhost:59125/api/tts | aplay`
* Drop-in [replacement for MaryTTS](#marytts-compatibility)
* [Command-line tool](#command-line-tools)
    * `mimic3 'Hello world.' | aplay`
* [Voice for screen reader](mimic3_tts/#speech-dispatcher)
    * `spd-say 'Hello world.'`
## Dependencies

Mimic 3 requires:

* Python 3.7 or higher
* The [Onnx runtime](https://onnxruntime.ai/)
* [gruut](https://github.com/rhasspy/gruut) or [eSpeak-ng](https://github.com/espeak-ng/espeak-ng) or [epitran](https://github.com/dmort27/epitran/) (depending on the voice)
## Installation

### eSpeak

Some voices depend on [eSpeak-ng](https://github.com/espeak-ng/espeak-ng), specifically `libespeak-ng.so`. For those voices, make sure that libespeak-ng is installed with:

``` sh
sudo apt-get install libespeak-ng1
```

On 32-bit ARM platforms (a.k.a. `armv7l` or `armhf`), you will also need some extra libraries:

``` sh
sudo apt-get install libatomic1 libgomp1 libatlas-base-dev
```
## Quickstart

### Mycroft TTS Plugin

Install the plugin:

``` sh
# Install system packages
sudo apt-get install libespeak-ng1

# Install plugin
mycroft-pip install mycroft-plugin-tts-mimic3[all]
```

Enable the plugin in your [mycroft.conf](https://mycroft-ai.gitbook.io/docs/using-mycroft-ai/customizations/mycroft-conf) file:

``` sh
# Activate plugin
mycroft-config set tts.module mimic3_tts_plug

# Start mycroft
mycroft-start all
```

or you can manually add the following to `mycroft.conf` with `mycroft-config edit user`:

``` json
"tts": {
    "module": "mimic3_tts_plug"
}
```

See the [documentation](https://mycroft-ai.gitbook.io/docs/mycroft-technologies/mimic-tts/coming-soon-mimic-3#tts-plugin-for-mycroft-ai) and the [plugin's documentation](https://github.com/MycroftAI/plugin-tts-mimic3) for more details and options.
### Docker image

A pre-built Docker image is available for the following platforms:

* `linux/amd64`
    * For desktops and laptops (`x86_64` CPUs)
* `linux/arm64`
    * For Raspberry Pi 3/4 and Zero 2 with [64-bit Pi OS](https://www.raspberrypi.com/news/raspberry-pi-os-64-bit/)
* `linux/arm/v7`
    * For Raspberry Pi 1/2/3/4 and Zero 2 with 32-bit Pi OS

Install/update with:

``` sh
docker pull mycroftai/mimic3
```

Once installed, check out the following scripts for running:

* [`mimic3`](docker/mimic3)
* [`mimic3-server`](docker/mimic3-server)
* [`mimic3-download`](docker/mimic3-download)

Or you can manually run the web server with:

``` sh
docker run \
@@ -107,234 +40,43 @@ docker run \
    'mycroftai/mimic3'
```

Voices will be automatically downloaded to `${HOME}/.local/share/mycroft/mimic3/voices`
### Debian Package

Grab the Debian package from the [latest release](https://github.com/mycroftAI/mimic3/releases) for your platform:

* `mycroft-mimic3-tts_<version>_amd64.deb`
    * For desktops and laptops (`x86_64` CPUs)
* `mycroft-mimic3-tts_<version>_arm64.deb`
    * For Raspberry Pi 3/4 and Zero 2 with [64-bit Pi OS](https://www.raspberrypi.com/news/raspberry-pi-os-64-bit/)
* `mycroft-mimic3-tts_<version>_armhf.deb`
    * For Raspberry Pi 1/2/3/4 and Zero 2 with 32-bit Pi OS

Once downloaded, install the package with (note the `./`):

``` sh
sudo apt install ./mycroft-mimic3-tts_<version>_<platform>.deb
```

Once installed, the following commands will be available in `/usr/bin`:

* `mimic3`
* `mimic3-server`
* `mimic3-download`
### Using pip

Install the command-line tool:

``` sh
pip install mycroft-mimic3-tts[all]
```

Once installed, the following commands will be available:

* `mimic3`
* `mimic3-download`
* `mimic3-server`

Language support can be selectively installed by replacing `all` with:

* `de` - German
* `es` - Spanish
* `fa` - Farsi
* `fr` - French
* `it` - Italian
* `nl` - Dutch
* `ru` - Russian
* `sw` - Kiswahili

Excluding `[..]` entirely will install support for English only.
### From Source

Clone the repository:

``` sh
git clone https://github.com/MycroftAI/mimic3.git
```

Run the install script:

``` sh
cd mimic3/
./install.sh
```

A virtual environment will be created in `mimic3/.venv` and each of the Python modules will be installed in editable mode (`pip install -e`).

Once installed, the following commands will be available in `.venv/bin`:

* `mimic3`
* `mimic3-server`
* `mimic3-download`
## Voice Keys

Mimic 3 references voices with the format:

* `<language>_<region>/<dataset>_<quality>` for single speaker voices, and
* `<language>_<region>/<dataset>_<quality>#<speaker>` for multi-speaker voices
    * `<speaker>` can be a name or number starting at 0
    * Speaker names come from a voice's `speakers.txt` file

![Key](img/voice_parts.png)

For example, the default [Alan Pope](https://popey.me/) voice key is `en_UK/apope_low`.
The [CMU Arctic voice](https://github.com/MycroftAI/mimic3-voices/tree/master/voices/en_US/cmu-arctic_low) contains multiple speakers, with a commonly used voice being `en_US/cmu-arctic_low#slt`.

Voices are automatically downloaded from [Github](https://github.com/MycroftAI/mimic3-voices) and stored in `${HOME}/.local/share/mycroft/mimic3` (technically `${XDG_DATA_HOME}/mycroft/mimic3`). You can also [manually download them](#downloading-voices).
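For scripting, the key format above can be split into its parts. A minimal sketch (the function and field names here are illustrative, not part of Mimic 3's API):

```python
def parse_voice_key(key: str) -> dict:
    """Split '<language>_<region>/<dataset>_<quality>[#<speaker>]' into parts."""
    voice, _, speaker = key.partition("#")
    locale, _, name = voice.partition("/")
    language, _, region = locale.partition("_")
    dataset, _, quality = name.rpartition("_")
    return {
        "language": language,
        "region": region,
        "dataset": dataset,
        "quality": quality,
        "speaker": speaker or None,  # None for single-speaker keys
    }

print(parse_voice_key("en_US/cmu-arctic_low#slt"))
```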
## Running

### Command-Line Tools

The `mimic3` command can be used to synthesize audio on the command line:

``` sh
mimic3 --voice 'en_UK/apope_low' 'My hovercraft is full of eels.' > hovercraft_eels.wav
```

See [voice keys](#voice-keys) for how to reference voices and speakers.

See `mimic3 --help` or the [CLI documentation](mimic3_tts/) for more details.
#### Downloading Voices

Mimic 3 automatically downloads voices when they're first used, but you can manually download them too with `mimic3-download`.

For example:

``` sh
mimic3-download 'en_US/*'
```

will download all U.S. English voices to `${HOME}/.local/share/mycroft/mimic3/voices`.

See `mimic3-download --help` for more options.
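The `'en_US/*'` argument is a shell-style glob over voice keys. `mimic3-download`'s own matching is internal to the tool; as an illustration, the same idea can be sketched with Python's standard `fnmatch`:

```python
from fnmatch import fnmatch

# A few example voice keys (see the Voice Keys section)
voices = ["en_US/cmu-arctic_low", "en_UK/apope_low", "de_DE/thorsten_low"]

# Keep only the keys that match the glob pattern
matching = [v for v in voices if fnmatch(v, "en_US/*")]
print(matching)
```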
### Web Server and Client

Start a web server with `mimic3-server` and visit `http://localhost:59125` to view the web UI.

To access the server from outside your device, run `mimic3-server --host 0.0.0.0` (see also `--port <PORT>`).

![Screenshot of web interface](img/server_screenshot.jpg)

The following endpoints are available:

* `/api/tts`
    * `POST` text or [SSML](#ssml) and receive WAV audio back
    * Use `?voice=` to select a different [voice/speaker](#voice-keys)
    * Set `Content-Type` to `application/ssml+xml` (or use `?ssml=1`) for [SSML](#ssml) input
* `/api/voices`
    * Returns a JSON list of available voices

An [OpenAPI](https://www.openapis.org/) test page is also available at `http://localhost:59125/openapi`

See `mimic3-server --help` or the [web server documentation](mimic3_http/) for more details.
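A small client for the `/api/tts` endpoint above can be written with just the standard library. A sketch (the helper names are ours, not part of Mimic 3; actually synthesizing requires a running `mimic3-server`):

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

def tts_url(base="http://localhost:59125", voice=None, ssml=False):
    """Build the /api/tts URL with optional voice and ssml query parameters."""
    params = {}
    if voice:
        params["voice"] = voice
    if ssml:
        params["ssml"] = "1"
    query = urlencode(params)
    return f"{base}/api/tts" + (f"?{query}" if query else "")

def synthesize(text, **kwargs):
    """POST text and return WAV bytes (needs a running mimic3-server)."""
    request = Request(tts_url(**kwargs), data=text.encode("utf-8"), method="POST")
    with urlopen(request) as response:
        return response.read()

# Example (uncomment with a server running):
# wav = synthesize("Hello world.", voice="en_UK/apope_low")
```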
#### Web Client

The `mimic3` program provides an interface to the Mimic 3 web server when the `--remote` option is given.

Assuming you have started `mimic3-server` and can access `http://localhost:59125`, then:

``` sh
mimic3 --remote --voice 'en_UK/apope_low' 'My hovercraft is full of eels.' > hovercraft_eels.wav
```

If your server is somewhere besides `localhost`, use `mimic3 --remote <URL> ...`

See `mimic3 --help` for more options.
## CUDA Acceleration

If you have a GPU with support for CUDA, you can accelerate synthesis with the `--cuda` flag when running `mimic3` or `mimic3-server`. This requires you to install the [onnxruntime-gpu](https://pypi.org/project/onnxruntime-gpu/) Python package.

Using [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) is highly recommended. See [Dockerfile.gpu](Dockerfile.gpu) for an example of how to build a compatible container.
## MaryTTS Compatibility

Use the Mimic 3 web server as a drop-in replacement for [MaryTTS](http://mary.dfki.de/), for example with [Home Assistant](https://www.home-assistant.io/integrations/marytts/).

Make sure to use a Mimic 3 [voice key](#voice-keys) like `en_UK/apope_low` instead of a MaryTTS voice name.

For Mycroft, you can use this instead of [the plugin](https://github.com/MycroftAI/plugin-tts-mimic3) by running:

``` sh
mycroft-config edit user
```

and then adding the following:

``` json
"tts": {
    "module": "marytts",
    "marytts": {
        "url": "http://localhost:59125",
        "voice": "en_UK/apope_low"
    }
}
```
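MaryTTS clients typically issue a request to a `/process` endpoint with uppercase query parameters. As a sketch of what such a client sends (the endpoint and parameter names follow common MaryTTS conventions and are assumptions here, not taken from this document):

```python
from urllib.parse import urlencode

def marytts_process_url(text, voice="en_UK/apope_low", base="http://localhost:59125"):
    """Build a MaryTTS-style /process request URL (assumed endpoint/params)."""
    params = {
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",
        "OUTPUT_TYPE": "AUDIO",
        "AUDIO": "WAVE",
        # Use a Mimic 3 voice key here, not a MaryTTS voice name
        "VOICE": voice,
    }
    return f"{base}/process?" + urlencode(params)

print(marytts_process_url("Hello world."))
```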
### Command-Line Tool

``` sh
# Install system packages
sudo apt-get install libespeak-ng1

# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate
pip3 install --upgrade pip

pip3 install mycroft-mimic3-tts[all]
```

Now you can run:

``` sh
mimic3 'Hello world.' | aplay
```

Use `mimic3-server` and `mimic3 --remote ...` for repeated usage (much faster).

See [documentation](https://mycroft-ai.gitbook.io/docs/mycroft-technologies/mimic-tts/coming-soon-mimic-3#command-line-interface) for more details.

## SSML

A [subset of SSML](mimic3_tts/#SSML) (Speech Synthesis Markup Language) is supported.

For example:

``` xml
<speak>
  <voice name="en_UK/apope_low">
    <s>
      Welcome to the world of speech synthesis.
    </s>
  </voice>
  <break time="3s" />
  <voice name="en_US/cmu-arctic_low#slt">
    <s>
      <prosody volume="soft" rate="150%">
        This is a <say-as interpret-as="number" format="ordinal">2</say-as> voice.
      </prosody>
    </s>
  </voice>
</speak>
```

will speak the two sentences with different voices and a three-second pause in between. The second sentence will also have the number "2" pronounced as "second" (ordinal form).

SSML `<say-as>` support varies between voice types:

* [gruut](https://github.com/rhasspy/gruut/#ssml)
* [eSpeak-ng](http://espeak.sourceforge.net/ssml.html)
* [epitran](https://github.com/dmort27/epitran/) voices do not currently support `<say-as>`
* Character-based voices do not currently support `<say-as>`
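Since only a subset of SSML is supported, it can be worth checking that a document is at least well-formed XML before sending it. A small sketch using Python's standard library (not part of Mimic 3):

```python
import xml.etree.ElementTree as ET

SSML = """<speak>
  <voice name="en_UK/apope_low"><s>Welcome.</s></voice>
  <break time="3s" />
</speak>"""

# fromstring raises xml.etree.ElementTree.ParseError on malformed input
root = ET.fromstring(SSML)
print(root.tag)                        # speak
print([child.tag for child in root])   # ['voice', 'break']
```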
---

## License

Binary file not shown. (Before: 58 KiB, After: 58 KiB)
@@ -2,56 +2,12 @@

A small HTTP web server for the [Mimic 3](https://github.com/MycroftAI/mimic3) text to speech system.
* [Available voices](https://github.com/MycroftAI/mimic3-voices)
* [Documentation](https://mycroft-ai.gitbook.io/docs/mycroft-technologies/mimic-tts/coming-soon-mimic-3)

![Screenshot of web interface](img/server_screenshot.jpg)

## Running the Server

``` sh
mimic3-server
```

This will start a web server at `http://localhost:59125`

See `mimic3-server --help` for more options.

### Endpoints

* `/api/tts`
    * `POST` text or [SSML](#ssml) and receive WAV audio back
    * Use `?voice=` to select a different [voice/speaker](#voice-keys)
    * Set `Content-Type` to `application/ssml+xml` (or use `?ssml=1`) for [SSML](#ssml) input
* `/api/voices`
    * Returns a JSON list of available voices

An [OpenAPI](https://www.openapis.org/) test page is also available at `http://localhost:59125/openapi`

### CUDA Acceleration

If you have a GPU with support for CUDA, you can accelerate synthesis with the `--cuda` flag. This requires you to install the [onnxruntime-gpu](https://pypi.org/project/onnxruntime-gpu/) Python package.

Using [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) is highly recommended. See the `Dockerfile.gpu` file in the parent repository for an example of how to build a compatible container.

## Running the Client

Assuming you have started `mimic3-server` and can access `http://localhost:59125`, then:

``` sh
mimic3 --remote --voice 'en_UK/apope_low' 'My hovercraft is full of eels.' > hovercraft_eels.wav
```

If your server is somewhere besides `localhost`, use `mimic3 --remote <URL> ...`

See `mimic3 --help` for more options.

## MaryTTS Compatibility

Use the Mimic 3 web server as a drop-in replacement for [MaryTTS](http://mary.dfki.de/), for example with [Home Assistant](https://www.home-assistant.io/integrations/marytts/).

Make sure to use a compatible [voice key](#voice-keys) like `en_UK/apope_low`.

## License

Mimic 3 is available under the [AGPL v3 license](LICENSE)
@@ -1,348 +1,11 @@

# Mimic 3

A fast and local neural text to speech system developed by [Mycroft](https://mycroft.ai/) for the [Mark II](https://mycroft.ai/product/mark-ii/).

* [Available voices](https://github.com/MycroftAI/mimic3-voices)
* [Mimic 3 Architecture](#architecture)
## Command-Line Tools

### mimic3

#### Basic Synthesis

```sh
mimic3 --voice <voice> "<text>" > output.wav
```

where `<voice>` is a [voice key](https://github.com/MycroftAI/mimic3/#voice-keys) like `en_UK/apope_low`.
`<text>` may contain multiple sentences, which will be combined in the final output WAV file. These can also be [split into separate WAV files](#multiple-wav-output).
#### SSML Synthesis

```sh
mimic3 --ssml --voice <voice> "<ssml>" > output.wav
```

where `<ssml>` is valid [SSML](https://www.w3.org/TR/speech-synthesis11/). Not all SSML features are supported; see [the documentation](#ssml) for details.

If your SSML contains `<mark>` tags, add `--mark-file <file>` to the command line and use `--interactive` mode. As the marks are encountered, their names will be written on separate lines to the file:

```sh
mimic3 --ssml --interactive --mark-file - '<speak>Test 1. <mark name="here" /> Test 2.</speak>'
```
#### Long Texts

If your text is very long, and you would like to listen to it as it's being synthesized, use `--interactive` mode:

```sh
mimic3 --interactive < long.txt
```

Each input line will be synthesized and played (see `--play-program`). By default, 5 sentences will be kept in an output queue, only blocking synthesis when the queue is full. You can adjust this value with `--result-queue-size`.

If your long text is fixed-width with blank lines separating paragraphs like those from [Project Gutenberg](https://www.gutenberg.org/), use the `--process-on-blank-line` option so that sentences will not be broken at line boundaries. For example, you can listen to "Alice in Wonderland" like this:

```sh
curl --output - 'https://www.gutenberg.org/files/11/11-0.txt' | \
    mimic3 --interactive --process-on-blank-line
```
#### Multiple WAV Output

With `--output-dir` set to a directory, Mimic 3 will output a separate WAV file for each sentence:

```sh
mimic3 'Test 1. Test 2.' --output-dir /path/to/wavs
```

By default, each WAV file will be named using the (slightly modified) text of the sentence. You can have WAV files named using a timestamp instead with `--output-naming time`. For full control of the output naming, the `--csv` command-line flag indicates that each sentence is of the form `id|text` where `id` will be the name of the WAV file.

```sh
cat << EOF |
s01|The birch canoe slid on the smooth planks.
s02|Glue the sheet to the dark blue background.
s03|It's easy to tell the depth of a well.
s04|These days a chicken leg is a rare dish.
s05|Rice is often served in round bowls.
s06|The juice of lemons makes fine punch.
s07|The box was thrown beside the parked truck.
s08|The hogs were fed chopped corn and garbage.
s09|Four hours of steady work faced us.
s10|Large size in stockings is hard to sell.
EOF
mimic3 --csv --output-dir /path/to/wavs
```

You can adjust the delimiter with `--csv-delimiter <delimiter>`.

Additionally, you can use the `--csv-voice` option to specify a different voice or speaker for each line:

```sh
cat << EOF |
s01|#awb|The birch canoe slid on the smooth planks.
s02|#rms|Glue the sheet to the dark blue background.
s03|#slt|It's easy to tell the depth of a well.
s04|#ksp|These days a chicken leg is a rare dish.
s05|#clb|Rice is often served in round bowls.
s06|#aew|The juice of lemons makes fine punch.
s07|#bdl|The box was thrown beside the parked truck.
s08|#lnh|The hogs were fed chopped corn and garbage.
s09|#jmk|Four hours of steady work faced us.
s10|en_UK/apope_low|Large size in stockings is hard to sell.
EOF
mimic3 --voice 'en_US/cmu-arctic_low' --csv-voice --output-dir /path/to/wavs
```

The second column can contain a `#<speaker>` or an entirely different voice!
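The `--csv-voice` line format above splits cleanly on the delimiter. Mimic 3 does this internally; the function below is only an illustrative sketch of the format:

```python
def parse_csv_voice_line(line: str, delimiter: str = "|"):
    """Split an 'id|voice|text' line into its three fields.

    maxsplit=2 keeps any later delimiters inside the text field.
    """
    wav_id, voice, text = line.split(delimiter, maxsplit=2)
    return wav_id, voice, text

print(parse_csv_voice_line("s01|#awb|The birch canoe slid on the smooth planks."))
```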
#### Interactive Mode

With `--interactive`, Mimic 3 will switch into interactive mode. After entering a sentence, it will be played with `--play-program`.

```sh
mimic3 --interactive
Reading text from stdin...
Hello world!<ENTER>
```

Use `CTRL+D` or `CTRL+C` to exit.
#### Noise and Length Settings

Synthesis has the following additional parameters:

* `--noise-scale` and `--noise-w`
    * Determine the speaker volatility during synthesis
    * 0-1, defaults are 0.667 and 0.8 respectively
* `--length-scale` - makes the voice speak slower (> 1) or faster (< 1)

Individual voices have default settings for these parameters in their `config.json` files (under `inference`).
#### List Voices

```sh
mimic3 --voices
```
#### CUDA Acceleration

If you have a GPU with support for CUDA, you can accelerate synthesis with the `--cuda` flag. This requires you to install the [onnxruntime-gpu](https://pypi.org/project/onnxruntime-gpu/) Python package.

Using [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) is highly recommended. See the `Dockerfile.gpu` file in the parent repository for an example of how to build a compatible container.
### mimic3-download

Mimic 3 automatically downloads voices when they're first used, but you can manually download them too with `mimic3-download`.

For example:

``` sh
mimic3-download 'en_US/*'
```

will download all U.S. English voices to `${HOME}/.local/share/mycroft/mimic3` (technically `${XDG_DATA_HOME}/mimic3`).

See `mimic3-download --help` for more options.
## SSML

A subset of [SSML](https://www.w3.org/TR/speech-synthesis11/) (Speech Synthesis Markup Language) is supported:

* `<speak>` - wrap around SSML text
    * `lang` - set language for document
* `<s>` - sentence (disables automatic sentence breaking)
    * `lang` - set language for sentence
* `<w>` / `<token>` - word (disables automatic tokenization)
* `<voice name="...">` - set voice of inner text
    * `voice` - name or language of voice
        * Name format is `tts:voice` (e.g., "glow-speak:en-us_mary_ann") or `tts:voice#speaker_id` (e.g., "coqui-tts:en_vctk#p228")
        * If one of the supported languages, a preferred voice is used (override with `--preferred-voice <lang> <voice>`)
* `<prosody attribute="value">` - change speaking attributes
    * Supported `attribute` names:
        * `volume` - speaking volume
            * number in [0, 100] - 0 is silent, 100 is loudest (default)
            * +X, -X, +X%, -X% - absolute/percent offset from current volume
            * one of "default", "silent", "x-loud", "loud", "medium", "soft", "x-soft"
        * `rate` - speaking rate
            * number - 1 is default rate, < 1 is slower, > 1 is faster
            * X% - 100% is default rate, 50% is half speed, 200% is twice as fast
            * one of "default", "x-fast", "fast", "medium", "slow", "x-slow"
* `<say-as interpret-as="">` - force interpretation of inner text
    * `interpret-as` one of "spell-out", "date", "number", "time", or "currency"
    * `format` - way to format text depending on `interpret-as`
        * number - one of "cardinal", "ordinal", "digits", "year"
        * date - string with "d" (cardinal day), "o" (ordinal day), "m" (month), or "y" (year)
* `<break time="">` - Pause for given amount of time
    * time - seconds ("123s") or milliseconds ("123ms")
* `<sub alias="">` - substitute `alias` for inner text
* `<phoneme ph="">` - supply phonemes for inner text
    * See `phonemes.txt` in voice directory for available phonemes
    * Phonemes may need to be separated by whitespace
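The `time` values accepted by `<break>` above ("123s" or "123ms") normalize to seconds with a few lines of code. A sketch (ours, not Mimic 3's parser):

```python
def break_time_to_seconds(value: str) -> float:
    """Convert an SSML <break time=""> value like '3s' or '250ms' to seconds."""
    if value.endswith("ms"):
        return float(value[:-2]) / 1000.0
    if value.endswith("s"):
        return float(value[:-1])
    raise ValueError(f"Unsupported time value: {value!r}")

print(break_time_to_seconds("3s"), break_time_to_seconds("250ms"))
```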
SSML `<say-as>` support varies between voice types:

* [gruut](https://github.com/rhasspy/gruut/#ssml)
* [eSpeak-ng](http://espeak.sourceforge.net/ssml.html)
* Character-based voices do not currently support `<say-as>`
## Speech Dispatcher

Mimic 3 can be used with the [Orca screen reader](https://help.gnome.org/users/orca/stable/) for Linux via [speech-dispatcher](https://github.com/brailcom/speechd).

After [installing Mimic 3](https://github.com/MycroftAI/mimic3/#installation), make sure you also have speech-dispatcher installed:

``` sh
sudo apt-get install speech-dispatcher
```

Create the file `/etc/speech-dispatcher/modules/mimic3-generic.conf` with the contents:

``` text
GenericExecuteSynth "printf %s \'$DATA\' | /path/to/mimic3 --remote --voice \'$VOICE\' --stdout | $PLAY_COMMAND"
AddVoice "en-us" "MALE1" "en_UK/apope_low"
```

You will need `sudo` access to do this. Make sure to change `/path/to/mimic3` to wherever you installed Mimic 3. Note that the `--remote` option is used to connect to a local Mimic 3 web server (use `--remote <URL>` if your server is somewhere besides `localhost`).

To change the voice later, you only need to replace `en_UK/apope_low`.

Next, edit the existing file `/etc/speech-dispatcher/speechd.conf` and ensure the following settings are present:

``` text
DefaultVoiceType "MALE1"
DefaultModule mimic3-generic
```

Restart speech-dispatcher with:

``` sh
sudo systemctl restart speech-dispatcher
```

and test it out with:

``` sh
spd-say 'Hello from speech dispatcher.'
```
### Systemd Service

To ensure that Mimic 3 runs at boot, create a systemd service at `$HOME/.config/systemd/user/mimic3.service` with the contents:

``` text
[Unit]
Description=Run Mimic 3 web server
Documentation=https://github.com/MycroftAI/mimic3

[Service]
ExecStart=/path/to/mimic3-server

[Install]
WantedBy=default.target
```

Make sure to change `/path/to/mimic3-server` to wherever you installed Mimic 3.

Refresh the systemd services:

``` sh
systemctl --user daemon-reload
```

Now try starting the service:

``` sh
systemctl --user start mimic3
```

If that's successful, ensure it starts at boot:

``` sh
systemctl --user enable mimic3
```
## Architecture

Mimic 3 uses [VITS](https://arxiv.org/abs/2106.06103), a "Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech". VITS is a combination of the [GlowTTS duration predictor](https://arxiv.org/abs/2005.11129) and the [HiFi-GAN vocoder](https://arxiv.org/abs/2010.05646).

Our implementation is heavily based on [Jaehyeon Kim's PyTorch model](https://github.com/jaywalnut310/vits), with the addition of [Onnx runtime](https://onnxruntime.ai/) export for speed.

![architecture](img/mimic_3_architecture.png)
### Phoneme Ids

At a high level, Mimic 3 performs two important tasks:

1. Converting raw text to numeric input for the VITS TTS model, and
2. Using the model to transform numeric input into audio output

The second step is the same for every voice, but the first step (text to numbers) varies. The current implementations of step 1 are described below.
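As a toy illustration of step 1 for a character-based voice: the id table below is invented for this sketch (real voices ship theirs in `phonemes.txt`, with `_`, `^`, `$`, and `#` as the special padding, utterance-boundary, and word-break symbols described under "Components of a Voice Model"):

```python
# Invented id table: special symbols first, then characters
SYMBOLS = ["_", "^", "$", "#"] + list("abcdefghijklmnopqrstuvwxyz.!? ")
SYMBOL_TO_ID = {s: i for i, s in enumerate(SYMBOLS)}

def text_to_ids(text: str):
    """Convert raw text to the numeric input a character-based voice expects."""
    ids = [SYMBOL_TO_ID["^"]]  # beginning of utterance
    ids.extend(SYMBOL_TO_ID[c] for c in text.lower() if c in SYMBOL_TO_ID)
    ids.append(SYMBOL_TO_ID["$"])  # end of utterance
    return ids

print(text_to_ids("hi!"))
```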
### gruut Phoneme-based Voices

Voices that use [gruut](https://github.com/rhasspy/gruut/) for phonemization.

gruut normalizes text and phonemizes words according to a lexicon, with a pre-trained grapheme-to-phoneme model used to guess unknown word pronunciations.

### eSpeak Phoneme-based Voices

Voices that use [eSpeak-ng](https://github.com/espeak-ng/espeak-ng) for phonemization (via [espeak-phonemizer](https://github.com/rhasspy/espeak-phonemizer)).

eSpeak-ng normalizes and phonemizes text using internal rules and lexicons. It supports a large number of languages, and can handle many textual forms.

### Character-based Voices

Voices whose "phonemes" are characters from an alphabet, typically with some punctuation.

For voices whose orthography (writing system) is close enough to its spoken form, character-based voices allow for skipping the phonemization step. However, these voices do not support text normalization, so numbers, dates, etc. must be written out.

### Epitran-based Voices

Voices that use [epitran](https://github.com/dmort27/epitran/) for phonemization.

epitran uses rules to generate phonetic pronunciations from text. It does not support text normalization, however, so numbers, dates, etc. must be written out.
### Components of a Voice Model

Voice models are stored in a directory with a specific layout:

* `<language>_<region>` (e.g., `en_UK`)
    * `<voice-name>_<quality>` (e.g., `apope_low`)
        * `ALIASES` - alternative names for the voice, one per line (optional)
        * `config.json` - training/inference configuration (see [code](https://github.com/MycroftAI/mimic3/blob/master/mimic3-tts/mimic3_tts/config.py) for details)
        * `generator.onnx` - exported inference model (see `ids_to_audio` method in [`voice.py`](https://github.com/MycroftAI/mimic3/blob/master/mimic3-tts/mimic3_tts/voice.py))
        * `LICENSE` - text, name, or URL of voice model license
        * `phoneme_map.txt` - mapping from source phoneme to destination phoneme(s) (optional)
        * `phonemes.txt` - mapping from integer ids to phonemes (`_` = padding, `^` = beginning of utterance, `$` = end of utterance, `#` = word break)
        * `README.md` - description of the voice
        * `SOURCE` - URL(s) of the dataset(s) this voice was trained on
        * `VERSION` - version of the voice in the format "MAJOR.Minor.bugfix" (e.g. "1.0.2")

## License

Mimic 3 is available under the [AGPL v3 license](LICENSE)
opentts_abc/README.md (new file, 3 lines)

@@ -0,0 +1,3 @@

# OpenTTS ABC

Abstract base classes used by the [Mimic 3](https://github.com/MycroftAI/mimic3) text to speech system.