speech synthesis

James J. Besemer jb at cascade-sys.com
Fri Jun 28 13:26:52 EDT 2002


Alistair Campbell wrote:

> I am a newbie with Python so any response to this may be way over my head.
>
> Basically, I am wanting to know whether there are any python modules that
> allow you to write programs which include voice synthesis. [...]
> So, I want to be able to convert
> typed text and stored blocks of text into audible speech using standard
> sound card interfaces.

> I am using Python 2.2 on a Windows OS. I have a Linux OS as well but I am

What I did was download the ViaVoice software from IBM.  This is a large
package that includes speech recognition, although presently I use it ONLY
for text to speech generation.  I'll go out on a limb here and suggest that
Speech Recognition and Analysis is something better done by an auxiliary
package outside of Python, rather than in Python itself.

ViaVoice is presently FREE on Linux, downloadable from IBM's web pages.  The
package also runs on Windows, though IBM markets it there as a retail product,
so it's not free.  They have a 'personal' version for $30 which I *believe*
includes all the necessary parts.  Once you have the retail version, you can
download the text to speech API for free and access the speech generator
programmatically, if you want to go that route (you may have to on Windows).

I have text to speech running on Linux.  I have not yet installed or tried it on
Windows.

ViaVoice installs pretty easily on Linux and works with its standard Audio
devices.  The only glitch I have encountered is that Linux dinks with device
permissions when people log in and out and this sometimes screws up long running
apps.  I need to change that permission thingy.
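
A long running app can at least notice the permission problem before it tries
to talk.  The following is only a minimal sketch: the device path /dev/dsp is
an assumption (substitute whatever audio device your system uses), and the
helper name is made up for illustration.

```python
import os


def audio_device_writable(path="/dev/dsp"):
    """Return True if `path` exists and the current user may write to it.

    The default path is an assumption; substitute your system's audio device.
    """
    return os.path.exists(path) and os.access(path, os.W_OK)


if __name__ == "__main__":
    if not audio_device_writable():
        print("audio device is missing or not writable; check its permissions")
```

Calling this before each utterance lets the app log a useful message instead
of failing silently when the permissions change underneath it.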

For the simple things I am doing I did not attempt to interface Python to the
API.  ViaVoice comes with some command line utilities that perform all the basic
tasks.  On Linux, I simply use os.popen2() to write the text to the command line
program.  This works fine for me.  There is a tiny latency before it starts to
talk, but I am not sure the API would speed that up, as I suspect generating the
speech takes considerable computation.  I did a similar thing to play WAV files
via the Linux "play" command.
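
The pipe-the-text approach can be sketched roughly as follows.  This is not the
code from my wrappers, just an illustration under some assumptions: it uses the
subprocess module (os.popen2() was removed in Python 3), the function names are
made up, and the text to speech command is passed in by the caller because the
name of the ViaVoice command line program is not given here.  Only "play" comes
from the description above.

```python
import subprocess
import sys


def speak(text, command):
    """Write `text` to the standard input of `command` and wait for it.

    `command` is an argument list for whatever text to speech program
    reads text on stdin; its name is deliberately left to the caller.
    Returns the program's exit status.
    """
    result = subprocess.run(command, input=text.encode("utf-8"), check=False)
    return result.returncode


def play_wav(path, player=("play",)):
    """Play a WAV file by handing its path to an external player command."""
    result = subprocess.run(list(player) + [path], check=False)
    return result.returncode


if __name__ == "__main__":
    # Stand-in command that just consumes stdin, so the sketch runs anywhere.
    stand_in = [sys.executable, "-c", "import sys; sys.stdin.read()"]
    print(speak("Dr. Smith lives on Elm Dr.", stand_in))
```

To use it for real, replace the stand-in with the argument list for your text
to speech command.  Both functions block until the external program exits,
which is fine for short utterances.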

The generated speech is quite intelligible, though clearly mechanical.  The
default voice sounds a lot like Stephen Hawking, though it's easy to configure
it for other voices and the install comes with a variety.  Even easier is to
speed up or slow down speech, change pitch and simple things like that.  It
comes with a customizable pronunciation dictionary, but I haven't taken a look
at it as the defaults do pretty well by themselves.  If you have important
proper nouns or technical terms, you may need to work with it.  It does a
pretty good job of understanding the text before pronouncing it, thus it does
the 'right thing' with things like

    Dr. Smith lives on Elm Dr.

As I said, I merely put some simple python wrappers around the out of the box
IBM command line.  You could dig deeper along these lines if you wish or, as I
did, instead focus on the rest of your Python app.

The first step would be to get the IBM software, install it, and get it to work.

You can look at my low-level Python code via

    http://www.cascade-sys.com/~jb/Pythonetics/vv/index.htm

As I said, the IBM software for Linux is available for download.  I forget the
URL.  A Google search for "IBM ViaVoice Linux" (maybe add "download") should
get you close.  IIRC you sort of have to dink around their web pages to find the
exact right download page.  You also have to "register" before they'll let you
have the free code.

I got a copy of the Windows Personal version at FRY's for $30.  It's still in
the shrink wrap, as my associated project is on Linux.




--
James J. Besemer  503-280-0838 voice
http://cascade-sys.com  503-280-0375 fax
mailto:jb at cascade-sys.com
