Flite is a library that we expect to be embedded in other applications. Included with the distribution is a small example executable that allows synthesis of strings of text and text files from the command line.
You may want to look at Bard (http://festvox.org/bard/), an ebook reader with a tight coupling to Flite as its synthesizer. It is the most elaborate use of the Flite API within our suite of programs.
The example flite binary may be suitable for very simple applications. Unlike Festival, its start-up time is very short (less than 25ms on a 500MHz Pentium III), making it practical (on larger machines) to call it each time you need to synthesize something.
flite TEXT OUTPUTTYPE

If TEXT contains a space, it is treated as a literal string of text and converted to speech; if it does not contain a space, TEXT is treated as a file name and the contents of that file are converted to speech. The option -t forces TEXT to be treated as text (not a filename), and -f forces treatment as a file. Thus

flite -t hello

will say the word "hello", while

flite hello

will say the contents of the file hello. Likewise

flite "hello world."

will say the words "hello world", while

flite -f "hello world"

will say the contents of a file named "hello world". If no argument is specified, text is read from standard input.
The second argument OUTPUTTYPE is the name of a file the output is written to. If it is play, the audio is sent directly to the audio device; if it is none, the audio is created but discarded (this is used for benchmarking); if it is stream, the audio is streamed through a callback function (though this is not particularly useful in the command-line version). If OUTPUTTYPE is omitted, play is assumed. You can also set the output type explicitly with the -o flag.
flite -f doc/alice -o alice.wav
All the voices in the distribution are collected into a single simple list in the global variable flite_voice_list. You can select a voice from this list from the command line:

flite -voice awb -f doc/alice -o alice.wav

You can list the voices currently supported in the binary with

flite -lv

The voices that get linked in are those listed in the VOICES variable in main/Makefile. You can change that as you require.
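From C, the same global list can be walked directly. The following is a minimal sketch, assuming the cst_val list accessors (val_car, val_cdr, val_voice) and flite_add_voice behave as in the flite headers; it will not link without the flite library and a built-in voice such as cmu_us_kal:

```c
#include <stdio.h>
#include "flite.h"

cst_voice *register_cmu_us_kal(const char *voxdir);

int main(void)
{
    const cst_val *v;

    flite_init();
    /* Registration creates the voice; flite_add_voice puts it on the list. */
    flite_add_voice(register_cmu_us_kal(NULL));

    /* flite_voice_list is a cst_val list of voice values. */
    for (v = flite_voice_list; v; v = val_cdr(v))
        printf("voice: %s\n", val_voice(val_car(v))->name);

    return 0;
}
```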
Voices may also be dynamically loaded from files as well as built in. The argument to the -voice option may be a pathname to a dumped (Clustergen) voice. This may be a Unix pathname or a URL (only the http and file protocols are supported). For example:

flite -voice file://cmu_us_awb.flitevox -f doc/alice -o alice.wav
flite -voice http://festvox.org/voices/cmu_us_ksp.flitevox -f doc/alice -o alice.wav

Voices are loaded once and added to flite_voice_list. Although these voices are often small (a few megabytes), some time is still required to read them in the first time. The voices are not memory-mapped; they are read into newly created structures. This loading function is currently only supported for Clustergen voices.
Each voice in Flite is held in a structure, a pointer to which is returned by the voice registration function. In the standard distribution, the example diphone voice is cmu_us_kal.

Here is a simple C program that uses the flite library:

#include <stdio.h>
#include <stdlib.h>
#include "flite.h"

cst_voice *register_cmu_us_kal(const char *voxdir);

int main(int argc, char **argv)
{
    cst_voice *v;

    if (argc != 2)
    {
        fprintf(stderr, "usage: flite_test FILE\n");
        exit(-1);
    }
    flite_init();
    v = register_cmu_us_kal(NULL);
    flite_file_to_speech(argv[1], v, "play");

    return 0;
}
Assuming the shell variable FLITEDIR is set to the flite directory, the following will compile the program (with appropriate changes for your platform if necessary):
gcc -Wall -g -o flite_test flite_test.c -I$FLITEDIR/include -L$FLITEDIR/lib -lflite_cmu_us_kal -lflite_usenglish -lflite_cmulex -lflite -lm
Although you are of course welcome to call lower-level functions, there are a few key functions that will satisfy most users of flite.
void flite_init(void);
This must be called before any other flite function can be called. As of Flite 1.1, it actually does nothing at all, but there is no guarantee that this will remain true.
cst_wave *flite_text_to_wave(const char *text,cst_voice *voice);
Returns a waveform (as defined in include/cst_wave.h) synthesized from the given text string by the given voice.
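For example, a sketch of synthesizing to a wave and saving it, assuming cst_wave_save_riff and delete_wave are available from the cst_wave headers and the kal diphone voice is linked in:

```c
#include <stdio.h>
#include "flite.h"

cst_voice *register_cmu_us_kal(const char *voxdir);

int main(void)
{
    cst_voice *v;
    cst_wave *w;

    flite_init();
    v = register_cmu_us_kal(NULL);
    w = flite_text_to_wave("Hello world.", v);

    /* The wave structure exposes the raw samples for inspection. */
    printf("%d samples at %d Hz\n", w->num_samples, w->sample_rate);

    cst_wave_save_riff(w, "hello.wav");  /* save as a RIFF (wav) file */
    delete_wave(w);

    return 0;
}
```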
float flite_file_to_speech(const char *filename, cst_voice *voice, const char *outtype);
Synthesizes all the sentences in the file filename with the given voice. At present, the output type can only reasonably be play or none. If the feature file_start_position is set to an integer value, that point is used as the start position in the file to be synthesized.
float flite_text_to_speech(const char *text, cst_voice *voice, const char *outtype);
Synthesizes the text in the string pointed to by text, with the given voice. outtype may be a filename the generated waveform is written to, or "play", in which case it is sent to the audio device, or "none", in which case it is discarded. The return value is the number of seconds of speech generated.
cst_utterance *flite_synth_text(const char *text,cst_voice *voice);
Synthesizes the given text with the given voice and returns the resulting utterance for further processing and access.
cst_utterance *flite_synth_phones(const char *phones,cst_voice *voice);
Synthesizes the given phone string with the given voice and returns the resulting utterance for further processing and access.
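A sketch of synthesizing a phone string directly. The phone names here are assumptions based on the US English phoneset used by the kal voice, and flite_process_output is assumed from the flite headers as a way to send the resulting waveform to an output:

```c
#include "flite.h"

cst_voice *register_cmu_us_kal(const char *voxdir);

int main(void)
{
    cst_voice *v;
    cst_utterance *utt;

    flite_init();
    v = register_cmu_us_kal(NULL);

    /* "pau" is silence; the other names are US English phones. */
    utt = flite_synth_phones("pau hh ax l ow pau", v);

    /* Send the utterance's waveform to the audio device. */
    flite_process_output(utt, "play", 0);
    delete_utterance(utt);

    return 0;
}
```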
cst_voice *flite_voice_select(const char *name);
Returns a pointer to the voice named name. Will return NULL if there is no match; if name == NULL, the first voice in the voice list is returned. If name is a URL (starting with file: or http:), that file will be accessed and the voice will be downloaded from there.
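For example, a sketch of loading a dumped Clustergen voice by URL. The .flitevox path is a placeholder, and your program must have the appropriate language and lexicon support linked in for the loaded voice:

```c
#include <stdio.h>
#include "flite.h"

int main(void)
{
    cst_voice *v;

    flite_init();
    /* A file: URL loads a dumped Clustergen voice from disk. */
    v = flite_voice_select("file://cmu_us_awb.flitevox");
    if (v == NULL)
    {
        fprintf(stderr, "voice not found\n");
        return 1;
    }
    flite_file_to_speech("doc/alice", v, "play");

    return 0;
}
```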
float flite_ssml_file_to_speech(const char *filename, cst_voice *voice, const char *outtype);
Reads the file as SSML. Not all SSML tags are supported, but many are; unsupported tags are ignored. Voice selection works by naming the internal name of the voice, or the name may be a URL, in which case the voice will be loaded. The audio tag is supported for loading waveform files; again, URLs are supported.
float flite_ssml_text_to_speech(const char *text, cst_voice *voice, const char *outtype);
Like the above, but treats the string text as SSML.
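For example (a sketch; exactly which tags are honored depends on the SSML subset your version of the library supports):

```c
#include "flite.h"

cst_voice *register_cmu_us_kal(const char *voxdir);

int main(void)
{
    cst_voice *v;

    flite_init();
    v = register_cmu_us_kal(NULL);

    /* Unsupported tags are simply ignored. */
    flite_ssml_text_to_speech("<speak>Hello <break/> world.</speak>",
                              v, "play");

    return 0;
}
```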
int flite_voice_add_lex_addenda(cst_voice *v, const cst_string *lexfile);
Loads the pronunciations from lexfile into the lexicon identified in the given voice (which will cause all other voices using that lexicon to also get this new addenda list). An example lexicon file is given in flite/tools/examples.lex. Words may be in double quotes, and an optional part-of-speech tag may be given. A colon separates the headword/POS tag from the list of phonemes. Stress values (if used in the lexicon) must be specified. Bad phonemes are reported on standard output.
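A sketch of the call, with a hypothetical addenda file named my.lex; the entry shown in the comment follows the format described above (quoted headword, optional POS tag, a colon, then phonemes with stress marked on the vowels), but check flite/tools/examples.lex for the exact format your version expects:

```c
#include "flite.h"

cst_voice *register_cmu_us_kal(const char *voxdir);

int main(void)
{
    cst_voice *v;

    flite_init();
    v = register_cmu_us_kal(NULL);

    /* my.lex is hypothetical; it might contain a line such as:
       "flite" n : f l ay1 t
       Bad phonemes are reported on standard output. */
    flite_voice_add_lex_addenda(v, "my.lex");

    flite_text_to_speech("flite", v, "play");

    return 0;
}
```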
In 1.4, support was added for streaming synthesis. Basically, you may provide a callback function that will be called with waveform data as soon as it is available. This can potentially reduce the delay between sending text to the synthesizer and having audio available.
The support is through a call back function of type
int audio_stream_chunk(const cst_wave *w, int start, int size, int last, cst_audio_streaming_info *asi)
If the utterance feature streaming_info is set (it can be set on a voice or on an utterance), the LPC or MLSA resynthesis functions will call the provided function as buffers become available. The LPC and MLSA waveform synthesis functions are used for diphone, limited-domain, unit-selection and Clustergen voices. Note that explicit support is required for streaming, so new waveform synthesis functions may not have this functionality.
An example streaming function is provided in src/audio/au_streaming.c and is used by the example flite main program when stream is given as the output option. (Though in the command-line program the function isn't really useful.)
In order to use streaming you must provide a callback function in your particular thread. This is done by adding features to the voice in your thread. Suppose your function is declared as

int example_audio_stream_chunk(const cst_wave *w, int start, int size, int last, cst_audio_streaming_info *asi)
You can add this function as the streaming function through the statement
cst_audio_streaming_info *asi;
...
asi = new_audio_streaming_info();
asi->asc = example_audio_stream_chunk;
feat_set(voice->features, "streaming_info",
         audio_streaming_info_val(asi));
You may also optionally include your own pointer to any information you additionally want to pass to your function. For example
typedef struct my_callback_struct {
    cst_audiodev *fd;
    int count;
} my_callback_struct;

cst_audio_streaming_info *asi;
my_callback_struct *mcs;
...
mcs = cst_alloc(my_callback_struct, 1);
mcs->fd = NULL;
mcs->count = 1;
asi = new_audio_streaming_info();
asi->asc = example_audio_stream_chunk;
asi->userdata = mcs;
feat_set(voice->features, "streaming_info",
         audio_streaming_info_val(asi));
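A sketch of what the callback body itself might look like, writing each chunk to the audio device. The audio_open, audio_write and audio_close calls and the CST_AUDIO_STREAM_* return codes are assumptions based on the flite audio headers; this mirrors the shape of the example in src/audio/au_streaming.c:

```c
/* Assumes my_callback_struct { cst_audiodev *fd; int count; } is
   defined as above and installed in asi->userdata. */
int example_audio_stream_chunk(const cst_wave *w, int start, int size,
                               int last, cst_audio_streaming_info *asi)
{
    my_callback_struct *mcs = (my_callback_struct *)asi->userdata;

    if (mcs->fd == NULL)  /* first chunk: open the audio device */
        mcs->fd = audio_open(w->sample_rate, w->num_channels,
                             CST_AUDIO_LINEAR16);

    /* w->samples holds 16-bit samples; write only this chunk. */
    audio_write(mcs->fd, &w->samples[start], size * sizeof(short));
    mcs->count++;

    if (last)  /* final chunk: close the device */
    {
        audio_close(mcs->fd);
        mcs->fd = NULL;
    }

    /* Returning CST_AUDIO_STREAM_STOP instead would abort synthesis. */
    return CST_AUDIO_STREAM_CONT;
}
```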
Another example is given in testsuite/by_word_main.c, which shows a callback function that also prints each token as it is being synthesized. The utt field in the cst_audio_streaming_info structure will be set to the current utterance. Please note that the item field in the cst_audio_streaming_info structure is for your convenience and is not set by anyone at all. The previous sentence exists in the documentation so that I can point at it when users fail to read it.