As of 1.2, initial scripts have been added to aid the conversion of FestVox voices to Flite. In general the conversion cannot be automatic. For example, all voice-specific Scheme code needs to be hand converted to C to work in Flite, which can be a major task.
Simple conversion scripts are given as examples of the stages you need to go through. These are designed to work on standard (English) diphone sets and simple limited domain voices. The conversion technique will almost certainly fail for large unit selection voices due to limitations in the C compiler (more discussion below). In 1.4 we also added support for converting clustergen voices (which is a little easier, see the section below).
Conversion is basically taking the description of units (clunit catalogue or diphone index) and constructing some C files that can be compiled to form a usable database. Using the C compiler to generate the object files has the advantage that we do not need to worry about byte order, alignment and object formats as the C compiler for the particular target platform should be able to generate the right code.
Before you start, ensure you have successfully built and run your FestVox voice in Festival. Flite is not designed as a voice building/debugging tool; it is just a delivery vehicle for finalized voices. You should therefore be satisfied with the quality of the voice in Festival before you start converting it for Flite.
The following basic stages are required:
The conversion assumes the environment variable FLITEDIR is set, for example
export FLITEDIR=/home/awb/projects/flite/
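A quick sanity check at this point can save a confusing failure later. The following is a minimal sketch and not part of the Flite distribution: the function name check_flitedir is hypothetical, and the only thing it assumes is that the setup_flite script used in the next step lives in $FLITEDIR/tools/.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Flite): confirm that FLITEDIR
# is set and points at a tree containing tools/setup_flite.
check_flitedir() {
    if [ -z "${FLITEDIR:-}" ]; then
        echo "FLITEDIR is not set" >&2
        return 1
    fi
    if [ ! -f "$FLITEDIR/tools/setup_flite" ]; then
        echo "no tools/setup_flite under $FLITEDIR" >&2
        return 1
    fi
    return 0
}

# Usage, from the FestVox voice directory:
#   check_flitedir && $FLITEDIR/tools/setup_flite
```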
The basic flite conversion takes place within a FestVox voice directory. Thus all of the conversion scripts expect that the standard files are available. The first task is to build some new directories and copy in the build scripts. The scripts are copied rather than linked from the Flite directories as you may need to change these for your particular voices.
$FLITEDIR/tools/setup_flite
This will read etc/voice.defs, which should have been created by the FestVox build process (except in very old versions of FestVox).
If you don’t have an etc/voice.defs you can construct one with festvox/src/general/guess_voice_defs in the FestVox distribution, or write one by hand, making it look like
FV_INST=cmu
FV_LANG=us
FV_NAME=ked_timit
FV_TYPE=clunits
FV_VOICENAME=$FV_INST"_"$FV_LANG"_"$FV_NAME
FV_FULLVOICENAME=$FV_VOICENAME"_"$FV_TYPE
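If you write the file by hand, it is easy to leave a variable out. The sketch below reads a voice.defs file in a subshell and reports any of the six variables shown above that end up empty. The function name check_voice_defs is hypothetical, and it assumes the file is plain shell assignments as in the example; it does not check that the values are sensible.

```shell
#!/bin/sh
# Hypothetical helper: verify that a voice.defs file defines the six
# FV_ variables listed in the example above.
check_voice_defs() {
    defs=${1:-etc/voice.defs}
    # "." searches PATH for names without a slash, so force a path.
    case $defs in
        */*) ;;
        *) defs=./$defs ;;
    esac
    if [ ! -f "$defs" ]; then
        echo "$defs: no such file" >&2
        return 1
    fi
    (
        # Subshell: the FV_ variables do not leak into the caller.
        unset FV_INST FV_LANG FV_NAME FV_TYPE FV_VOICENAME FV_FULLVOICENAME
        . "$defs"
        status=0
        for v in FV_INST FV_LANG FV_NAME FV_TYPE FV_VOICENAME FV_FULLVOICENAME
        do
            eval "val=\${$v:-}"
            if [ -z "$val" ]; then
                echo "$defs: $v is not set" >&2
                status=1
            fi
        done
        exit "$status"
    )
}

# Usage, from the FestVox voice directory:
#   check_voice_defs etc/voice.defs
```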
The main script for building the Flite voice is bin/build_flite, which will eventually build sufficient C code in flite/ that can be compiled with the constructed flite/Makefile to give you a library that can be linked into applications, and also an example flite binary with the constructed voice built in.
You can run all of these stages, except the final make, together by running the build script with no arguments
./bin/build_flite
But as things may not run smoothly, we will go through the stages explicitly.
The first stage is to build the LPC files. This may have already been done as part of the diphone building process (though probably not in the ldom/clunit case). In our experience it is very important that the recordings be of similar power, as mismatched power can often cause overflows in the resulting flite (and sometimes Festival) voices. Thus, for diphone voices, it is important to run the power normalization techniques described in the FestVox document. The Flite LPC build process also builds a parameter file of the ranges of the LPC parameters, which is used in later coding of the files, so even if you have already built your LPC files you should still do this again
./bin/build_flite lpc
For ldom and clunit voices (but not for diphone voices) we also need the Mel-frequency Cepstral Coefficients. These are assumed to have been created already and to be in mcep/, as they are necessary for running the voice in Festival. This stage simply constructs information about the range of the mcep parameters.
./bin/build_flite mcep
The next stage is to construct the STS files. Short Term Signals (STS) are built for each pitch period in the database. These are ASCII files (one for each utterance file in the database) with LPC coefficients and ulaw encoded residuals for each pitch period. They are built using a binary executable built as part of the Flite build (flite/tools/find_sts).
./bin/build_flite sts
Note that the flite code expects waveform files to be in Microsoft RIFF format and cannot deal with files in other formats. Some earlier versions of the Edinburgh Speech Tools used NIST as the default header format. This is likely to cause flite and its related programs not to work, so do ensure your waveform files are in RIFF format (ch_wave -info wav/* will tell you the format). The following will convert all your wave files
mv wav wav.nist
mkdir wav
cd wav.nist
for i in *.wav
do
   ch_wave -otype riff -o ../wav/$i $i
done
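To see which files actually need converting, the first four bytes of each file can be inspected: a RIFF file begins with the characters RIFF. The following is a minimal sketch under that one assumption. The function names are hypothetical, nothing here is part of Flite or the Edinburgh Speech Tools, and it only looks at the header tag, not at the rest of the file.

```shell
#!/bin/sh
# Hypothetical helpers: detect waveform files whose header does not
# start with the four-byte RIFF tag.
is_riff() {
    # dd reads exactly the first four bytes of the file.
    [ "$(dd if="$1" bs=4 count=1 2>/dev/null)" = "RIFF" ]
}

list_non_riff() {
    dir=${1:-wav}
    for f in "$dir"/*.wav
    do
        [ -f "$f" ] || continue      # skip an unmatched glob
        is_riff "$f" || echo "$f"
    done
}

# Usage, from the FestVox voice directory:
#   list_non_riff wav
```

An empty result means every file in the directory already carries a RIFF tag.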
The next stage is to convert the index to the required C format. For diphone voices this takes the dic/*.est index files, for clunit/ldom voices it takes the festival/clunit/VOICE.catalogue and festival/trees/VOICE.tree files. This process uses a binary executable built as part of the Flite build process (flite/tools/flite_sort) to sort the indices into the same sorting order required for flite to run. (Using unix sort may or may not give the same result due to definitions of lexicographic order so we use the very same function in C that will be used in flite to ensure that a consistent order is given.)
./bin/build_flite idx
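The point about lexicographic order can be seen with sort itself: its result depends on the current locale, whereas a byte-wise comparison does not. The sketch below only illustrates that difference. It assumes the comparison used inside flite is byte-wise (as with C's strcmp), which is not stated here, and the function name is hypothetical; it is not a substitute for flite_sort.

```shell
#!/bin/sh
# Hypothetical illustration: force byte-wise ordering by running sort
# in the C locale, independent of the user's LANG/LC_COLLATE settings.
byte_order_sort() {
    LC_ALL=C sort
}

# Usage: in the C locale all upper-case ASCII letters sort before all
# lower-case ones, so this prints A, B, a, b (one per line):
#   printf 'b\nA\na\nB\n' | byte_order_sort
# Plain "sort" on the same input may interleave the cases instead,
# depending on the locale in effect.
```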
All the necessary C files should now have been built in flite/ and you may compile them by
cd flite
make
This should give a library and an executable called flite that can run as
./flite "Hello World"
This assumes a general voice; an ldom voice will only be able to say things in its domain. This flite binary offers the same options as the standard flite binary compiled in the Flite build, but with your voice rather than the distributed voices.
Almost certainly this process will not run smoothly for you. Building voices is still a very hard thing to do and problems will probably exist.
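Since individual stages may fail, one way to see where things stop is to run them one at a time and halt at the first failure. The sketch below chains the lpc, mcep, sts and idx stages described above, skipping mcep for diphone voices. The function name run_flite_stages and the BUILD_FLITE override variable are hypothetical conveniences, not part of the build scripts.

```shell
#!/bin/sh
# Hypothetical wrapper: run the build_flite stages in order and stop
# at the first one that fails.
run_flite_stages() {
    voice_type=$1
    # BUILD_FLITE is an assumed override, defaulting to the real script.
    build=${BUILD_FLITE:-./bin/build_flite}
    case $voice_type in
        diphone) stages="lpc sts idx" ;;       # no mcep for diphones
        *)       stages="lpc mcep sts idx" ;;  # ldom and clunit voices
    esac
    for stage in $stages
    do
        echo "== build_flite $stage"
        if ! "$build" "$stage"; then
            echo "stage $stage failed" >&2
            return 1
        fi
    done
}

# Usage, from the FestVox voice directory:
#   run_flite_stages clunits && (cd flite && make)
```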
This build process does not deal with customization for the given voices. Thus you will need to edit flite/VOICE.c to set intonation ranges and duration stretch for your particular voice.
For example, in our cmu_us_sls_diphone voice (a US English female diphone voice) we had to change the default parameters from
feat_set_float(v->features,"int_f0_target_mean",110.0);
feat_set_float(v->features,"int_f0_target_stddev",15.0);
feat_set_float(v->features,"duration_stretch",1.0);
to
feat_set_float(v->features,"int_f0_target_mean",167.0);
feat_set_float(v->features,"int_f0_target_stddev",25.0);
feat_set_float(v->features,"duration_stretch",1.0);
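If you have F0 values for the speaker, the mean and standard deviation can be estimated rather than guessed. The following is a minimal sketch, assuming a hypothetical plain text file with one F0 value in Hz per line, where 0 marks an unvoiced frame. How you extract such values is outside this build process, and the function name f0_stats is not part of Flite.

```shell
#!/bin/sh
# Hypothetical helper: print "mean stddev" (population standard
# deviation) of the positive values in a one-column text file.
f0_stats() {
    awk '$1 > 0 { n++; sum += $1; sumsq += $1 * $1 }
         END {
             if (n == 0) exit 1            # no voiced frames at all
             mean = sum / n
             var = sumsq / n - mean * mean
             if (var < 0) var = 0          # guard against rounding
             printf "%.1f %.1f\n", mean, sqrt(var)
         }' "$1"
}

# Usage (f0_values.txt is an assumed file name):
#   f0_stats f0_values.txt
```

The two numbers printed are candidate values for int_f0_target_mean and int_f0_target_stddev; they are a starting point to tune by ear, not a replacement for listening.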
Note this conversion is limited. Because it depends on the C compiler to do the final conversion into binary object format (a good idea in general for portability), you can easily generate files too big for the C compiler to deal with. We have spent some time investigating this so the largest possible voices can be converted but it is still too limited for our larger voices. In general the limitation seems to be best quantified by the number of pitch periods in the database. After about 100k pitch periods the files get too big to handle. There are probably solutions to this but we have not yet investigated them. This limitation doesn’t seem to be an issue with the diphone voices as they are typically much smaller than unit selection voices.
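To get a rough idea of where a database sits relative to that figure, the pitchmarks can be counted before converting. The sketch below assumes the pitchmark files are ASCII EST track files in pm/, with one pitchmark per line after an EST_Header_End line; binary track files, or a different layout, would need a different approach. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count pitchmark lines across pm/*.pm, assuming
# ASCII EST tracks (data lines follow the EST_Header_End line).
count_pitch_periods() {
    dir=${1:-pm}
    total=0
    for f in "$dir"/*.pm
    do
        [ -f "$f" ] || continue      # skip an unmatched glob
        n=$(awk 'seen && NF { n++ }
                 /^EST_Header_End/ { seen = 1 }
                 END { print n + 0 }' "$f")
        total=$((total + n))
    done
    echo "$total"
}

# Usage, from the FestVox voice directory:
#   count_pitch_periods pm
```

A result well above 100000 suggests the conversion is likely to hit the compiler limits described above.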
The process of building from a clustergen (cg) voice is also supported. It is assumed the environment variable FLITEDIR is set
export FLITEDIR=/home/awb/projects/flite/
After you build the clustergen voice you can convert by first setting up the skeleton files in the flite/ directory
$FLITEDIR/tools/setup_flite
Assuming etc/voice.defs properly identifies the voice, the cg templates will be copied in.
The conversion itself is actually much faster than a clunit build (there is less to actually convert).
./bin/build_flite cg
This will convert the necessary models into files in the flite/ directory. Then you can compile it with
cd flite
make
./flite_cmu_us_awb "Hello world"
Note that the voice to be converted *must* be a standard clustergen voice with f0, mceps, delta mceps (optionally strengths for mixed excitation) and voicing in its combined coeffs files. The method could be changed to deal with other possibilities, but as it stands it will only work for the default build method.
The generated library libflite_cmu_us_awb.a may be linked with other programs like any other flite voice. The generated binary flite_cmu_us_awb links in only one voice (unlike the flite binary in the full flite distribution).
A single flat file containing the cg voice can also be generated, which can be loaded at run time into the flite binary. You can dump this file from the initially constructed flite binary
./flite_cmu_us_awb -voicedump cmu_us_awb.flitevox
The file cmu_us_awb.flitevox may now be referenced (with pathname/url) on the flite command line and used by the synthesizer
./flite -voice cmu_us_awb.flitevox "Hello World"
As of 1.3 the script for converting the CMU lexicon (as distributed as part of Festival) is included. make_cmulex will use the version of CMULEX unpacked in the current directory to build a new lexicon. Also in 1.3, a more sophisticated compression technique is used to reduce the lexicon size. The lexicon is pruned, removing those words which the letter to sound rule models get correct. The letters and phones are also separately Huffman coded to produce a smaller lexicon.
This is by far the weakest part, as it is the most open ended. There are basic tools in the flite/tools/ directory that include Scheme code to convert various Scheme structures to C, including CART tree conversion and Lisp list conversion. The other major source of help here is the existing language examples in flite/lang/usenglish/.
Adding new language support is far from automatic, but there are core scripts for setting up new Flite support for languages and lexicons. There are also scripts for converting (Festival) phoneset definitions to C and converting Festival lexicons to LTS rules and compressed lexicons in C.
But beyond that you are sort of on your own. The largest gap here is text normalization. We do not yet have a standardized model for text normalization, with well defined models for which we could write conversion scripts.
However here is a step by step attempt to show you what to do when building support for a new language/lexicon.
Suppose we need to create support for Pashto, already have a Festival voice running, and now want it to run in flite. Converting the voice itself (unit selection or clustergen) is fairly robust, but you will also need C libraries for cmu_pashto_lang and cmu_pashto_lex. The first stage is to create the basic template files for these. In the core flite/ source directory type
./tools/make_new_lang_lex pashto
This will create language and lex template files in lang/cmu_pashto_lang/ and lang/cmu_pashto_lex/.
Then in directory lang/cmu_pashto_lang/ type
festival $FLITEDIR/tools/make_phoneset.scm
...
festival> (phonesettoC "cmu_pashto" (car (load "PATHTO/cmu_pashto_transtac_phoneset.scm" t)) "pau" ".")
This will create cmu_pashto_lang_phoneset.[ch]. You must then add these explicitly to the Makefile.
Then in lang/cmu_pashto_lex/ you have to build the C version of the lexicon and letter to sound rules. The core script is in flite/tools/build_lex.
mkdir lex
cd lex
cp -p $FLITEDIR/tools/build_lex .
Edit build_lex to give it the name of your lexicon and the compiled lexicon from your voice.
LEXNAME=cmu_pashto
LEXFILE=lexicon.out
You should (I think) remove the first line “MNCL” from your lexicon.out file. Note that this must be the compiled lexicon, not the list of entries you compiled from, as the script expects the ordering and the syllabification.
./build_lex setup
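If you do remove the MNCL line as suggested above, it needs to happen before the setup step is run. One way to do it without touching any other line is sketched below; the function name strip_mncl is hypothetical, and the file is left unchanged if its first line is anything other than MNCL.

```shell
#!/bin/sh
# Hypothetical helper: write a lexicon file to standard output without
# a leading "MNCL" line; all other lines pass through untouched.
strip_mncl() {
    awk 'NR == 1 && $0 == "MNCL" { next } { print }' "$1"
}

# Usage, in the lex/ directory:
#   strip_mncl lexicon.out > lexicon.tmp && mv lexicon.tmp lexicon.out
```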
Build the letter to sound rules (probably again)
./build_lex lts
Convert the compiled letter to sound rules to C. This converts the decision trees to decision graphs and runs WFST minimization of them to get a more efficient set of structures. This puts the generated C files in c/.
./build_lex lts2c
Now convert the lexical entries themselves
./build_lex lex
Again the generated C files will be put in c/.
Now we generate a Huffman-coded compressed lexicon to reduce the lexicon size, merging frequent letter sequences and phone sequences.
./build_lex compresslex
Then copy the .c and .h files to lang/cmu_pashto_lex/ [something about compressed and non-compressed???]