Natural Speech (was Re: [Edu-sig] CP4E: Programming for Fun...)

Jason Cunliffe <jasonic@nomadics.org>
Sat, 7 Sep 2002 23:46:22 -0400


Terry,

For Python a great place to start is 'Snack'.

http://www.speech.kth.se/snack/

"The Snack Sound Toolkit is designed to be used with a scripting language such
as Tcl/Tk or Python. Using Snack you can create powerful multi-platform audio
applications with just a few lines of code. Snack has commands for basic sound
handling, e.g. sound card and disk I/O. Snack also has primitives for sound
visualization, e.g. waveforms and spectrograms. It was developed mainly to
handle digital recordings of speech, but is just as useful for general audio.
Snack has also successfully been applied to other one-dimensional signals.
The combination of Snack and a scripting language makes it possible to create
sound tools and applications with a minimum of effort. This is due to the rapid
development nature of scripting languages. As a bonus you get an application
that is cross-platform from start. It is also easy to integrate Snack based
applications with existing sound analysis software."

Python Snack manual, version 2.2
http://www.speech.kth.se/snack/man/snack2.2/python-man.html

Spectrograms are crucial to understanding speech and the role of formants.
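
To give a feel for what a spectrogram is, here is a minimal sketch in plain
Python. It does not use Snack; it only illustrates the idea of slicing a signal
into short windowed frames and taking a Fourier transform of each one. The
sample rate, frame size, and two-tone test signal are my own assumptions, chosen
so the tones fall exactly on analysis bins.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitudes of the lower half of the DFT of a real-valued frame."""
    n = len(frame)
    mags = []
    for k in range(n // 2):
        total = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        mags.append(abs(total))
    return mags


def spectrogram(samples, frame_size, hop):
    """One magnitude spectrum per Hann-windowed frame, in time order."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_size - 1))
              for i in range(frame_size)]
    rows = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_size)]
        rows.append(dft_magnitudes(frame))
    return rows


RATE = 8000   # samples per second (assumed)
FRAME = 128   # samples per frame, so each bin is RATE / FRAME = 62.5 Hz wide

# Assumed test signal: 0.1 s of a 500 Hz tone, then 0.1 s of a 1500 Hz tone.
samples = [math.sin(2 * math.pi * 500 * t / RATE) for t in range(800)]
samples += [math.sin(2 * math.pi * 1500 * t / RATE) for t in range(800)]

spec = spectrogram(samples, FRAME, hop=64)

# Strongest frequency in each frame: it jumps from 500 Hz to 1500 Hz.
peak_hz = [max(range(len(row)), key=row.__getitem__) * RATE / FRAME
           for row in spec]
print(peak_hz[0], peak_hz[-1])
```

Each row of `spec` is one column of the picture a tool like Snack would draw.
With real speech the strong bands in each row are the formants, and their
movement from row to row is what makes vowels and transitions visible.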

Be sure to follow up on the Snack links page:
http://www.speech.kth.se/snack/links.html

For example, a paper on the use of Snack was presented at the ICSLP '98
conference.
[ICSLP = International Conference on Spoken Language Processing]
http://www.speech.kth.se/~jocke/publications/icslp98_web.html




Two other important toolkits I recommend to help you do what you want are:

PD = Pure Data and its GEM extensions
http://www.pure-data.org/
http://www.crca.ucsd.edu/~msp/Pd_documentation/index.htm
PD has a terrific community [and mailing list] doing all manner of experimental
sound work, much of it oriented toward live performance. The GEM extensions may
well cover your graphic needs too.
http://www.danks.org/mark/


JSyn - Audio Software Synthesis API and Plugins for Java [works with Jython]
http://www.softsynth.com/jsyn/



Other interactive synthesis tools I suggest for deepening one's grasp of speech
and speech-like sound are:

RGS - Realtime Granular Synthesis by Henry Lowengard
http://www.echonyc.com/~jhhl/software.html
RGS was developed for Amiga. To the best of my knowledge it will still run on a
number of computer platforms using UAE [Universal Amiga Emulator].
A modern Windows version I have used is 'Coagula' by Rasmus Ekman:
http://hem.passagen.se/rasmuse/Coagula.htm


Finally, an unusual book with a novel, eye-opening approach to speech analysis is
"Who is Fourier?"
http://www.lexlrf.org/pub/fourier.html
http://www.amazon.com/exec/obidos/ASIN/0964350408/104-1198719-6481569

Developed by the 'Transnational College of Lex', the book is an extraordinary
and lively introduction to calculus, working with the idea of recorded vocal
sounds and leading up to Fourier transforms to compare and graph patterns of
Japanese vowels. The result on page 160 is stunning. Clearly their technique can
be applied to other languages and sounds.
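
As a rough illustration of that kind of comparison (my own sketch, not the
book's method or data), the plain-Python code below builds two synthetic
vowel-like signals from a pair of sine waves each, takes a Fourier transform,
and reports the two strongest spectral peaks. The "formant" frequencies are
assumed round numbers picked to land on analysis bins, not measurements of
real vowels.

```python
import cmath
import math

RATE = 8000  # samples per second (assumed)
N = 256      # samples analysed, so each bin is RATE / N = 31.25 Hz wide


def synth(freqs_and_amps):
    """Sum of sine waves: a crude stand-in for a steady vowel sound."""
    return [sum(a * math.sin(2 * math.pi * f * t / RATE)
                for f, a in freqs_and_amps)
            for t in range(N)]


def dft_magnitudes(signal):
    """Magnitudes of the lower half of the DFT of a real-valued signal."""
    n = len(signal)
    return [abs(sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]


def strongest_peaks(signal, count=2):
    """Frequencies (Hz) of the `count` largest local maxima, ascending."""
    mags = dft_magnitudes(signal)
    peaks = [k for k in range(1, len(mags) - 1)
             if mags[k] > mags[k - 1] and mags[k] > mags[k + 1]]
    peaks.sort(key=lambda k: mags[k], reverse=True)
    return sorted(k * RATE / N for k in peaks[:count])


# Hypothetical formant pairs (Hz, amplitude), for illustration only.
vowel_a = synth([(750.0, 1.0), (1250.0, 0.5)])
vowel_i = synth([(312.5, 1.0), (2250.0, 0.5)])

peaks_a = strongest_peaks(vowel_a)
peaks_i = strongest_peaks(vowel_i)
print("a-like:", peaks_a)
print("i-like:", peaks_i)
```

Plotting the first peak against the second for each vowel gives a simple map on
which different vowels land in different places. Recorded speech is far messier
than these clean test tones, so treat this only as a starting point.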


hth & please keep me posted about your project
best wishes
./Jason