eSpeak Phonemizer

Uses ctypes and libespeak-ng to transform text into IPA phonemes.

Installation

First, install libespeak-ng (shown here for Debian/Ubuntu):

sudo apt-get install libespeak-ng1

Next, install espeak_phonemizer:

pip install espeak_phonemizer

If installation was successful, you should be able to run:

espeak-phonemizer --version
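
The same check can be scripted. The following is a minimal sketch, not part of the package: the helper name check_install is hypothetical, and it assumes the shared library is discoverable under the name "espeak-ng" and that the command-line tool is on your PATH. A value of None means the component was not found.

```python
import ctypes.util
import shutil


def check_install():
    """Report where the shared library and the CLI were found (None if missing)."""
    return {
        # Assumption: the library registers under the short name "espeak-ng".
        "libespeak-ng": ctypes.util.find_library("espeak-ng"),
        # Assumption: pip placed the espeak-phonemizer script on PATH.
        "espeak-phonemizer": shutil.which("espeak-phonemizer"),
    }


status = check_install()
for name, location in status.items():
    print(f"{name}: {location or 'not found'}")
```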

Basic Phonemization

Pass your text to the standard input of espeak-phonemizer:

echo 'This is a test.' | espeak-phonemizer -v en-us
ðɪs ɪz ɐ tˈɛst
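
If you would rather drive the command from Python, one option is to pipe text to it with subprocess. This is only a sketch of that approach: build_command and phonemize are hypothetical helper names, not functions provided by the package, and the only flag assumed is the -v option shown above.

```python
import subprocess


def build_command(voice="en-us", extra_args=()):
    """Assemble the CLI invocation; -v is the voice flag shown in this README."""
    return ["espeak-phonemizer", "-v", voice, *extra_args]


def phonemize(text, voice="en-us", extra_args=()):
    """Pipe text to the CLI's standard input and return its output lines."""
    result = subprocess.run(
        build_command(voice, extra_args),
        input=text,
        capture_output=True,
        encoding="utf-8",  # the output contains non-ASCII IPA characters
        check=True,  # raise CalledProcessError if the command fails
    )
    return result.stdout.splitlines()
```

Calling phonemize("This is a test.") requires espeak-phonemizer to be installed and on your PATH.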

Separators

Phoneme and word separators can be changed:

echo 'This is a test.' | espeak-phonemizer -v en-us -p '_' -w '#'
ð_ɪ_s#ɪ_z#ɐ#t_ˈɛ_s_t
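
Explicit separators make the output easy to split back into words and phonemes. A minimal sketch, assuming the '_' and '#' separators from the command above (split_phonemes is a hypothetical helper, not part of the package):

```python
def split_phonemes(line, phoneme_sep="_", word_sep="#"):
    """Return a list of words, each a list of phoneme strings."""
    return [
        word.split(phoneme_sep)
        for word in line.strip().split(word_sep)
        if word  # skip empty strings left by stray separators
    ]


words = split_phonemes("ð_ɪ_s#ɪ_z#ɐ#t_ˈɛ_s_t")
print(len(words))  # one entry per word
print(words[0])    # phonemes of the first word
```

Note that the stress marker stays attached to the phoneme that follows it.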

Punctuation and Stress

Some punctuation (.,;:!?) can be kept in the output:

echo 'This: is, a, test.' | espeak-phonemizer -v en-us --keep-punctuation
ðˈɪs: ˈɪz, ˈeɪ, tˈɛst.

Stress markers can also be dropped:

echo 'This is a test.' | espeak-phonemizer -v en-us --no-stress
ðɪs ɪz ɐ tɛst
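
If you already have phonemes with stress and want an unstressed copy, the markers can also be removed in post-processing. The sketch below assumes the markers are the IPA primary (U+02C8) and secondary (U+02CC) stress characters. It illustrates the idea only and is not necessarily how --no-stress is implemented; strip_stress is a hypothetical helper.

```python
# IPA primary (U+02C8) and secondary (U+02CC) stress marks.
STRESS_MARKS = "\u02c8\u02cc"

# Translation table that deletes every stress mark.
_REMOVE_STRESS = str.maketrans("", "", STRESS_MARKS)


def strip_stress(phonemes):
    """Return the phoneme string with stress marks removed."""
    return phonemes.translate(_REMOVE_STRESS)


print(strip_stress("ðɪs ɪz ɐ tˈɛst"))
```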

Delimited Input

The --csv flag enables delimited input, with fields separated by '|' by default (change the delimiter with --csv-delimiter):

echo 's1|This is a test.' | espeak-phonemizer -v en-us --csv
s1|This is a test.|ðɪs ɪz ɐ tˈɛst

Phonemes are added as a final column, allowing you to pass arbitrary metadata through to the output.
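
Because the phonemes are always the last column, the output can be read back without knowing how many metadata columns precede it. A minimal sketch using Python's csv module, assuming the default '|' delimiter and that fields are written without quoting (read_phonemized is a hypothetical helper):

```python
import csv
import io


def read_phonemized(stream, delimiter="|"):
    """Yield (metadata_fields, phonemes) for each non-empty output row."""
    # QUOTE_NONE treats quote characters as ordinary text (an assumption
    # about how the output is written).
    reader = csv.reader(stream, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    for row in reader:
        if row:
            *fields, phonemes = row  # the final column holds the phonemes
            yield fields, phonemes


sample = io.StringIO("s1|This is a test.|ðɪs ɪz ɐ tˈɛst\n")
rows = list(read_phonemized(sample))
for fields, phonemes in rows:
    print(fields, phonemes)
```

For a real file, pass open(path, newline="", encoding="utf-8") in place of the StringIO sample.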

Parallelize with GNU Parallel

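Large delimited files can be phonemized by several processes at once. With --pipepart, GNU Parallel splits the file given with -a into blocks and pipes each block to its own espeak-phonemizer process:

parallel -a /path/to/input.csv --pipepart \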
    espeak-phonemizer -v en-us --csv \
    > /path/to/output.csv