A fast local neural text to speech engine for Mycroft


Latest commit: 0cbfe3fbeb "Bump version" by Michael Hansen, 2022-05-04 19:03:33 -04:00


Mimic 3

A fast and local neural text-to-speech system developed by Mycroft for the Mark II.

Use Cases

Mycroft TTS plugin
- mycroft-say 'Hello world.'
Web server
- curl -X POST --data 'Hello world.' --output - localhost:59125/api/tts | aplay
- Drop-in replacement for MaryTTS
Command-line tool
- mimic3 'Hello world.' | aplay
Voice for screen reader
- spd-say 'Hello world.'

Dependencies

Mimic 3 requires:

Python 3.7 or higher
The ONNX Runtime
gruut or eSpeak-ng or epitran (depending on the voice)

Installation

eSpeak

Some voices depend on eSpeak-ng, specifically libespeak-ng.so. For those voices, make sure that libespeak-ng is installed with:

sudo apt-get install libespeak-ng1

On 32-bit ARM platforms (a.k.a. armv7l or armhf), you will also need some extra libraries:

sudo apt-get install libatomic1 libgomp1 libatlas-base-dev
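To confirm that the required Python version is present and that the eSpeak-ng library can actually be found before selecting an eSpeak-based voice, a quick check is possible with Python's standard library. This is a minimal sketch, not part of Mimic 3: check_mimic3_prereqs is a hypothetical helper, and ctypes.util.find_library only reports what the platform's library search (for example ldconfig on Linux) can see, so it does not verify the extra 32-bit ARM libraries listed above.

```python
import ctypes.util
import sys


def check_mimic3_prereqs() -> dict:
    """Hypothetical helper: report whether basic prerequisites appear to be present."""
    return {
        # Mimic 3 requires Python 3.7 or higher
        "python_ok": sys.version_info >= (3, 7),
        # find_library returns a library name/path string, or None if not found
        "espeak_ng": ctypes.util.find_library("espeak-ng"),
    }


if __name__ == "__main__":
    report = check_mimic3_prereqs()
    print(f"Python 3.7+: {report['python_ok']}")
    print(f"libespeak-ng: {report['espeak_ng'] or 'not found'}")
```

If libespeak-ng is reported as not found, install it with the apt-get command above and run the check again.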

Mycroft TTS Plugin

Install the plugin:

mycroft-pip install mycroft-plugin-tts-mimic3[all]

Enable the plugin in your mycroft.conf file:

mycroft-config set tts.module mimic3_tts_plug

Alternatively, run mycroft-config edit user and manually add the following to mycroft.conf:

"tts": {
  "module": "mimic3_tts_plug"
}

See the plugin's documentation for more options.

Docker image

A pre-built Docker image is available for the following platforms:

linux/amd64
- For desktops and laptops (x86_64 CPUs)
linux/arm64
- For Raspberry Pi 3/4 and Zero 2 with 64-bit Pi OS
linux/arm/v7
- For Raspberry Pi 1/2/3/4 and Zero 2 with 32-bit Pi OS

Install/update with:

docker pull mycroftai/mimic3

Once installed, check out the helper scripts in the repository's docker directory for running it.

Or you can manually run the web server with:

docker run \
       -it \
       -p 59125:59125 \
       -v "${HOME}/.local/share/mycroft/mimic3:/home/mimic3/.local/share/mycroft/mimic3" \
       'mycroftai/mimic3'

Voices will be downloaded automatically to ${HOME}/.local/share/mycroft/mimic3/voices on the host, through the volume mount above.

Debian Package

Grab the Debian package from the latest release for your platform:

mycroft-mimic3-tts_<version>_amd64.deb
- For desktops and laptops (x86_64 CPUs)
mycroft-mimic3-tts_<version>_arm64.deb
- For Raspberry Pi 3/4 and Zero 2 with 64-bit Pi OS
mycroft-mimic3-tts_<version>_armhf.deb
- For Raspberry Pi 1/2/3/4 and Zero 2 with 32-bit Pi OS

Once downloaded, install the package with (note the ./):

sudo apt install ./mycroft-mimic3-tts_<version>_<platform>.deb

Once installed, the following commands will be available in /usr/bin:

mimic3
mimic3-server
mimic3-download

Using pip

Install the command-line tool:

pip install mycroft-mimic3-tts[all]

Once installed, the following commands will be available:

mimic3
mimic3-download
mimic3-server

Language support can be installed selectively by replacing all with one or more of the following (comma-separated, e.g. mycroft-mimic3-tts[de,fr]):

de - German
es - Spanish
fa - Farsi
fr - French
it - Italian
nl - Dutch
ru - Russian
sw - Kiswahili

Excluding [..] entirely will install support for English only.

From Source

Clone the repository:

git clone https://github.com/MycroftAI/mimic3.git

Run the install script:

cd mimic3/
./install.sh

A virtual environment will be created in mimic3/.venv and each of the Python modules will be installed in editable mode (pip install -e).

Once installed, the following commands will be available in .venv/bin:

mimic3
mimic3-server
mimic3-download

Voice Keys

Mimic 3 references voices with the format:

<language>_<region>/<dataset>_<quality> for single speaker voices, and
<language>_<region>/<dataset>_<quality>#<speaker> for multi-speaker voices
- <speaker> can be a name or number starting at 0
- Speaker names come from a voice's speakers.txt file

For example, the default Alan Pope voice key is en_UK/apope_low. The CMU Arctic voice contains multiple speakers, with a commonly used voice being en_US/cmu-arctic_low#slt.
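The key format above can be split into its parts mechanically, which is handy when validating user-supplied keys before passing them to mimic3. This is a sketch, not Mimic 3's own parser: VoiceKey and parse_voice_key are hypothetical names, the quality is assumed to be everything after the last underscore, and the region is treated as optional to keep the parser lenient.

```python
from typing import NamedTuple, Optional


class VoiceKey(NamedTuple):
    language: str            # e.g. "en"
    region: Optional[str]    # e.g. "UK"; None if the key has no _<region> part
    dataset: str             # e.g. "apope" or "cmu-arctic"
    quality: str             # e.g. "low"
    speaker: Optional[str]   # name or 0-based number; None if no #<speaker> given


def parse_voice_key(key: str) -> VoiceKey:
    """Split <language>_<region>/<dataset>_<quality>[#<speaker>] into its parts."""
    voice, _, speaker = key.partition("#")
    lang_region, sep, dataset_quality = voice.partition("/")
    if not sep:
        raise ValueError(f"missing '/' in voice key: {key!r}")
    language, _, region = lang_region.partition("_")
    # Split on the last underscore, so only the final part is taken as the quality
    dataset, _, quality = dataset_quality.rpartition("_")
    if not (language and dataset and quality):
        raise ValueError(f"malformed voice key: {key!r}")
    return VoiceKey(language, region or None, dataset, quality, speaker or None)


print(parse_voice_key("en_US/cmu-arctic_low#slt"))
# VoiceKey(language='en', region='US', dataset='cmu-arctic', quality='low', speaker='slt')
```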

Voices are automatically downloaded from GitHub and stored in ${HOME}/.local/share/mycroft/mimic3 (technically ${XDG_DATA_HOME}/mycroft/mimic3). You can also download them manually.
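The storage location rule in the paragraph above (use ${XDG_DATA_HOME} when it is set, otherwise ${HOME}/.local/share) can be expressed as a small helper, for example to locate voices that have already been downloaded. This is a sketch of the documented behavior, not Mimic 3's own code: mimic3_data_dir is a hypothetical name, and treating an empty XDG_DATA_HOME like an unset one follows the XDG Base Directory specification.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def mimic3_data_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return ${XDG_DATA_HOME}/mycroft/mimic3, falling back to ~/.local/share."""
    env = os.environ if env is None else env
    xdg = env.get("XDG_DATA_HOME")
    if xdg:
        base = Path(xdg)
    else:
        # Unset or empty XDG_DATA_HOME means $HOME/.local/share
        base = Path(env.get("HOME", str(Path.home()))) / ".local" / "share"
    return base / "mycroft" / "mimic3"


# Downloaded voices are kept in the "voices" subdirectory
print(mimic3_data_dir() / "voices")
```

Passing an explicit mapping instead of os.environ makes the helper easy to test without touching the real environment.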

Running

Command-Line Tools

The mimic3 command can be used to synthesize audio on the command line:

mimic3 --voice 'en_UK/apope_low' 'My hovercraft is full of eels.' > hovercraft_eels.wav

See voice keys for how to reference voices and speakers.

See mimic3 --help or the CLI documentation for more details.

Downloading Voices

Mimic 3 automatically downloads voices when they're first used, but you can manually download them too with mimic3-download.

For example:

mimic3-download 'en_US/*'

will download all U.S. English voices to ${HOME}/.local/share/mycroft/mimic3/voices.

See mimic3-download --help for more options.
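The single quotes around 'en_US/*' keep the shell from expanding the * against local files, so the pattern reaches mimic3-download intact. Assuming the pattern is matched against voice keys with shell-style wildcards (an assumption; this README only shows the 'en_US/*' example), the selection behaves like Python's fnmatch. select_voices is a hypothetical helper, and de_DE/example_low is a made-up key used only for illustration.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def select_voices(pattern: str, voice_keys: Iterable[str]) -> List[str]:
    """Return the voice keys that match a shell-style wildcard pattern."""
    return [key for key in voice_keys if fnmatchcase(key, pattern)]


# Two voice keys from this README plus one hypothetical non-English key
known = ["en_UK/apope_low", "en_US/cmu-arctic_low", "de_DE/example_low"]
print(select_voices("en_US/*", known))  # ['en_US/cmu-arctic_low']
print(select_voices("en_*", known))     # ['en_UK/apope_low', 'en_US/cmu-arctic_low']
```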

Web Server and Client

Start a web server with mimic3-server and visit http://localhost:59125 to view the web UI.

The following endpoints are available:

/api/tts
- POST text or SSML and receive WAV audio back
- Use ?voice= to select a different voice/speaker
- Set Content-Type to application/ssml+xml (or use ?ssml=1) for SSML input
/api/voices
- Returns a JSON list of available voices
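The /api/tts call from the curl example can also be made from Python with only the standard library. This is a sketch under assumptions: tts_request and synthesize are hypothetical helper names, and sending plain text as text/plain is assumed to be accepted (the curl example simply posts the raw text). For SSML, the Content-Type from the list above is used.

```python
import urllib.parse
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:59125"  # default mimic3-server address


def tts_request(text: str, voice: Optional[str] = None, ssml: bool = False,
                base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for /api/tts (text or SSML in, WAV out)."""
    # urlencode percent-encodes '/' and '#', so "#slt" is not treated as a URL fragment
    query = urllib.parse.urlencode({"voice": voice}) if voice else ""
    url = f"{base_url}/api/tts" + (f"?{query}" if query else "")
    content_type = "application/ssml+xml" if ssml else "text/plain; charset=utf-8"
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": content_type},
        method="POST",
    )


def synthesize(text: str, voice: Optional[str] = None, ssml: bool = False) -> bytes:
    """Send the request to a running mimic3-server and return the WAV bytes."""
    with urllib.request.urlopen(tts_request(text, voice, ssml)) as response:
        return response.read()


# Usage (requires mimic3-server to be running):
#   wav = synthesize("Hello world.", voice="en_US/cmu-arctic_low#slt")
#   with open("hello.wav", "wb") as f:
#       f.write(wav)
```

Encoding the voice parameter matters for multi-speaker keys: an unencoded # would start a URL fragment, and the speaker part would never reach the server.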

An OpenAPI test page is also available at http://localhost:59125/openapi

See mimic3-server --help or the web server documentation for more details.

Web Client

The mimic3 program provides an interface to the Mimic 3 web server when the --remote option is given.

Assuming you have started mimic3-server and can access http://localhost:59125, then:

mimic3 --remote --voice 'en_UK/apope_low' 'My hovercraft is full of eels.' > hovercraft_eels.wav

If your server is somewhere other than localhost, use mimic3 --remote <URL> ...

See mimic3 --help for more options.
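When calling mimic3 from a Python program rather than a shell, passing an explicit argument list avoids quoting problems with apostrophes in the text or # in voice keys. This is a minimal sketch using only the --voice and --remote options shown above; mimic3_command and synthesize_to_file are hypothetical helpers. The URL is always passed explicitly after --remote so that the text can never be mistaken for it.

```python
import subprocess
from typing import List, Optional


def mimic3_command(text: str, voice: Optional[str] = None,
                   remote: Optional[str] = None) -> List[str]:
    """Build an argv list for the mimic3 CLI (local, or remote via --remote <URL>)."""
    argv = ["mimic3"]
    if remote:
        argv += ["--remote", remote]
    if voice:
        argv += ["--voice", voice]
    argv.append(text)
    return argv


def synthesize_to_file(text: str, path: str, **kwargs) -> None:
    """Run mimic3 and write the WAV audio it prints on stdout to a file."""
    with open(path, "wb") as out:
        subprocess.run(mimic3_command(text, **kwargs), stdout=out, check=True)


# Usage (requires mimic3 on PATH, and a running server for remote use):
#   synthesize_to_file("My hovercraft is full of eels.", "hovercraft_eels.wav",
#                      voice="en_UK/apope_low", remote="http://localhost:59125")
```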

CUDA Acceleration

If you have a GPU with support for CUDA, you can accelerate synthesis with the --cuda flag when running mimic3 or mimic3-server. This requires you to install the onnxruntime-gpu Python package.

Using nvidia-docker is highly recommended. See Dockerfile.gpu for an example of how to build a compatible container.
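Before passing --cuda, you can check whether the installed ONNX Runtime build exposes a CUDA execution provider. In this sketch, cuda_available is a hypothetical helper; onnxruntime.get_available_providers() is ONNX Runtime's own function, but a listed provider only means the build supports CUDA, not that the driver and GPU are set up correctly.

```python
def cuda_available() -> bool:
    """Return True if onnxruntime reports the CUDA execution provider."""
    try:
        import onnxruntime  # provided by onnxruntime or onnxruntime-gpu
    except ImportError:
        return False
    return "CUDAExecutionProvider" in onnxruntime.get_available_providers()


print("CUDA execution provider available:", cuda_available())
```

If this prints False with onnxruntime-gpu installed, the CPU-only onnxruntime package may be the one being imported.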

MaryTTS Compatibility

Use the Mimic 3 web server as a drop-in replacement for MaryTTS, for example with Home Assistant.

Make sure to use a compatible voice key like en_UK/apope_low.
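MaryTTS clients request audio with a MaryTTS-style /process call, with the Mimic 3 voice key passed as the VOICE parameter. This sketch assumes Mimic 3 mirrors MaryTTS's /process endpoint and its INPUT_TEXT, INPUT_TYPE, OUTPUT_TYPE, AUDIO and VOICE query parameters (the names MaryTTS clients send); marytts_process_url is a hypothetical helper that only builds the URL, which is useful for testing the server with a browser or curl.

```python
from urllib.parse import urlencode


def marytts_process_url(text: str, voice: str,
                        base_url: str = "http://localhost:59125") -> str:
    """Build a MaryTTS-style /process URL, as a MaryTTS client would send it."""
    params = {
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",
        "OUTPUT_TYPE": "AUDIO",
        "AUDIO": "WAVE_FILE",
        "VOICE": voice,  # a Mimic 3 voice key, e.g. en_UK/apope_low
    }
    return f"{base_url}/process?{urlencode(params)}"


print(marytts_process_url("Hello world.", "en_UK/apope_low"))
```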

For Mycroft, you can use this instead of the plugin by running:

mycroft-config edit user

and then adding the following:

"tts": {
  "module": "marytts",
  "marytts": {
    "url": "http://localhost:59125",
    "voice": "en_UK/apope_low"
  }
}

SSML

A subset of SSML (Speech Synthesis Markup Language) is supported.

For example:

<speak>
  <voice name="en_UK/apope_low">
    <s>
      Welcome to the world of speech synthesis.
    </s>
  </voice>
  <break time="3s" />
  <voice name="en_US/cmu-arctic_low#slt">
    <s>
      <prosody volume="soft" rate="150%">
        This is a <say-as interpret-as="number" format="ordinal">2</say-as> voice.
      </prosody>
    </s>
  </voice>
</speak>

will speak the two sentences with different voices and a 3-second pause in between. The second sentence will also be spoken softly at 150% of the normal rate, with the number "2" pronounced as "second" (ordinal form).
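Generating SSML like the example above from code is safer with an XML library than with string concatenation, because text containing & or < is escaped automatically. This is a sketch using Python's xml.etree.ElementTree; build_ssml is a hypothetical helper that only covers the <speak>, <voice>, <s> and <break> elements. The resulting string can be POSTed to /api/tts with Content-Type set to application/ssml+xml.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple


def build_ssml(segments: List[Tuple[str, str]], pause: Optional[str] = None) -> str:
    """Wrap (voice_key, sentence) pairs in <speak>/<voice>/<s>, with optional <break>s."""
    speak = ET.Element("speak")
    for i, (voice_key, sentence) in enumerate(segments):
        if i > 0 and pause:
            # Pause between consecutive segments, e.g. pause="3s"
            ET.SubElement(speak, "break", time=pause)
        voice = ET.SubElement(speak, "voice", name=voice_key)
        s = ET.SubElement(voice, "s")
        s.text = sentence  # ElementTree escapes &, < and > when serializing
    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml(
    [("en_UK/apope_low", "Welcome to the world of speech synthesis."),
     ("en_US/cmu-arctic_low#slt", "This is a second voice.")],
    pause="3s",
)
print(ssml)
```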

SSML <say-as> support varies between voice types:

gruut
eSpeak-ng
epitran voices do not currently support <say-as>
Character-based voices do not currently support <say-as>

License

See the LICENSE file.