More README goodness

2022-03-31 16:50:28 -04:00 · 2022-03-31 16:50:28 -04:00 · c6e218fdae
commit c6e218fdae
parent 02f02efc77
4 changed files with 155 additions and 3 deletions
--- a/README.md
+++ b/README.md
@ -150,6 +150,8 @@ See `mimic3-download --help` for more options.

 Start a web server with `mimic3-server` and visit `http://localhost:59125` to view the web UI.

+![screenshot of web interface](mimic3-http/img/server_screenshot.jpg)
+
 The following endpoints are available:

 * `/api/tts`
@ -178,12 +180,13 @@ mimic3-client --voice 'en_UK/apope_low' 'My hovercraft is full of eels.' > hover
 ## MaryTTS Compatibility

 Use the  Mimic 3 web server as a drop-in replacement for [MaryTTS](http://mary.dfki.de/), for example with [Home Assistant](https://www.home-assistant.io/integrations/marytts/).
-Make sure to use a compatible [voice key](#voice-keys).
+
+Make sure to use a compatible [voice key](#voice-keys) like `en_UK/apope_low`.


 ## SSML

-A [subset of SSML](mimic3-tts/#SSML) is supported.
+A [subset of SSML](mimic3-tts/#SSML) (Speech Synthesis Markup Language) is supported.

 For example:

--- a/mimic3-http/README.md
+++ b/mimic3-http/README.md
@ -4,6 +4,8 @@ A small HTTP web server for the [Mimic 3](https://github.com/MycroftAI/mimic3) t

 [Available voices](https://github.com/MycroftAI/mimic3-voices)

+![screenshot of web interface](img/server_screenshot.jpg)
+

 ## Installation

--- a/mimic3-http/img/server_screenshot.jpg
+++ b/mimic3-http/img/server_screenshot.jpg
--- a/mimic3-tts/README.md
+++ b/mimic3-tts/README.md
@ -11,12 +11,148 @@ A fast and local neural text to speech system for [Mycroft](https://mycroft.ai/)
 ### mimic3


+#### Basic Synthesis
+
+```sh
+mimic3 --voice <voice> "<text>" > output.wav
+```
+
+where `<voice>` is a [voice key](https://github.com/MycroftAI/mimic3/#voice-keys) like `en_UK/apope_low`.
+`<TEXT>` may contain multiple sentences, which will be combined in the final output WAV file. These can also be [split into separate WAV files](#multiple-wav-output).
+
+
+#### SSML Synthesis
+
+```sh
+mimic3 --ssml --voice <voice> "<ssml>" > output.wav
+```
+
+where `<ssml>` is valid [SSML](https://www.w3.org/TR/speech-synthesis11/). Not all SSML features are supported, see [the documentation](#ssml) for details.
+
+If your SSML contains `<mark>` tags, add `--mark-file <file>` to the command-line and use `--interactive` mode. As the marks are encountered, their names will be written on separate lines to the file:
+
+```sh
+mimic3 --ssml --interactive --mark-file - '<speak>Test 1. <mark name="here" /> Test 2.</speak>'
+```
+
+
+#### Long Texts
+
+If your text is very long, and you would like to listen to it as its being synthesized, use `--interactive` mode:
+
+```sh
+mimic3 --interactive < long.txt
+```
+
+Each input line will be synthesized and played (see `--play-program`). By default, 5 sentences will be kept in an output queue, only blocking synthesis when the queue is full. You can adjust this value with `--result-queue-size`.
+
+If your long text is fixed-width with blank lines separating paragraphs like those from [Project Gutenberg](https://www.gutenberg.org/), use the `--process-on-blank-line` option so that sentences will not be broken at line boundaries. For example, you can listen to "Alice in Wonderland" like this:
+
+```sh
+curl --output - 'https://www.gutenberg.org/files/11/11-0.txt' | \
+    mimic3 --interactive --process-on-blank-line
+```
+
+
+#### Multiple WAV Output
+
+With `--output-dir` set to a directory, Larynx will output a separate WAV file for each sentence:
+
+```sh
+mimic3 'Test 1. Test 2.' --output-dir /path/to/wavs
+```
+
+By default, each WAV file will be named using the (slightly modified) text of the sentence. You can have WAV files named using a timestamp instead with `--output-naming time`. For full control of the output naming, the `--csv` command-line flag indicates that each sentence is of the form `id|text` where `id` will be the name of the WAV file.
+
+```sh
+cat << EOF |
+s01|The birch canoe slid on the smooth planks.
+s02|Glue the sheet to the dark blue background.
+s03|It's easy to tell the depth of a well.
+s04|These days a chicken leg is a rare dish.
+s05|Rice is often served in round bowls.
+s06|The juice of lemons makes fine punch.
+s07|The box was thrown beside the parked truck.
+s08|The hogs were fed chopped corn and garbage.
+s09|Four hours of steady work faced us.
+s10|Large size in stockings is hard to sell.
+EOF
+  mimic3 --csv --output-dir /path/to/wavs
+```
+
+You can adjust the delimiter with `--csv-delimiter <delimiter>`.
+
+Additionally, you can use the `--csv-voice` option to specify a different voice or speaker for each line:
+
+```sh
+cat << EOF |
+s01|#awb|The birch canoe slid on the smooth planks.
+s02|#rms|Glue the sheet to the dark blue background.
+s03|#slt|It's easy to tell the depth of a well.
+s04|#ksp|These days a chicken leg is a rare dish.
+s05|#clb|Rice is often served in round bowls.
+s06|#aew|The juice of lemons makes fine punch.
+s07|#bdl|The box was thrown beside the parked truck.
+s08|#lnh|The hogs were fed chopped corn and garbage.
+s09|#jmk|Four hours of steady work faced us.
+s10|en_UK/apope_low|Large size in stockings is hard to sell.
+EOF
+  mimic3 --voice 'en_US/cmu-arctic_low' --csv-voice --output-dir /path/to/wavs
+```
+
+The second contain can contain a `#<speaker>` or an entirely different voice!
+
+
+#### Interactive Mode
+
+With `--interactive`, Mimic 3 will switch into interactive mode. After entering a sentence, it will be played with `--play-program`.
+
+```sh
+mimic3 --interactive
+Reading text from stdin...
+Hello world!<ENTER>
+```
+
+Use `CTRL+D` or `CTRL+C` to exit.
+
+
+#### Noise and Length Settings
+
+Synthesis has the following additional parameters:
+
+* `--noise-scale` and `--noise-w`
+    * Determine the speaker volatility during synthesis
+    * 0-1, default is 0.667 and 0.8 respectively
+* `--length-scale` - makes the voice speaker slower (> 1) or faster (< 1)
+
+Individual voices have default settings for these parameters in their `config.json` files (under `inference`).
+
+
+#### List Voices
+
+```sh
+mimic3 --voices
+```
+
+
 ### mimic3-download

+Mimic 3 automatically downloads voices when they're first used, but you can manually download them too with `mimic3-download`.
+
+For example:
+
+``` sh
+mimic3-download 'en_US/*'
+```
+
+will download all U.S. English voices to `${HOME}/.local/share/mimic3` (technically `${XDG_DATA_HOME}/mimic3`).
+
+See `mimic3-download --help` for more options.
+

 ## SSML

-A subset of [SSML](https://www.w3.org/TR/speech-synthesis11/) is supported:
+A subset of [SSML](https://www.w3.org/TR/speech-synthesis11/) (Speech Synthesis Markup Language) is supported:

 * `<speak>` - wrap around SSML text
    * `lang` - set language for document
@ -35,6 +171,15 @@ A subset of [SSML](https://www.w3.org/TR/speech-synthesis11/) is supported:
 * `<break time="">` - Pause for given amount of time
    * time - seconds ("123s") or milliseconds ("123ms")
 * `<sub alias="">` - substitute `alias` for inner text
+* `<phoneme ph="">` - supply phonemes for inner text
+    * See `phonemes.txt` in voice directory for available phonemes
+    * Phonemes may need to be separated by whitespace
+
+SSML `<say-as>` support varies between voice types:
+
+* [gruut](https://github.com/rhasspy/gruut/#ssml)
+* [eSpeak-ng](http://espeak.sourceforge.net/ssml.html)
+* Character-based voices do not currently support `<say-as>`


 ## Architecture
@ -48,6 +193,8 @@ Our implementation is heavily based on [Jaehyeon Kim's PyTorch model](https://gi

 Voices that use [gruut](https://github.com/rhasspy/gruut/) for phonemization.

+gruut phonemizes words according to a lexicon, with a pre-trained grapheme-to-phoneme model used to guess unknown word pronunciations.
+

 ### eSpeak Phoneme-based Voices