compare two voices

Sun May 1 03:54:40 EDT 2005

> Jeremy Bowers wrote:
>> No matter how you slice it, this is not a Python problem, this is an
>> intense voice recognition algorithm problem that would make a good
>> PhD thesis.

Qiangning Hong wrote:
> No, my goal is nothing relative to voice recognition.  Sorry that I
> haven't described my question clearly.  We are not teaching English, so
> the voice recognition isn't helpful here.

To repeat what Jeremy wrote - what you are asking for *is* related
to voice recognition.  You want to recognize that two different voices,
with different pitches, pauses, etc., said the same thing.

There is a lot of data in speech.  That's why sound files are bigger
than text files.  Some of it gets interpreted as emotional nuance
or as an accent, while the rest is simply ignored.

> I just want to compare two sound WAVE file, not what the students or
> the teacher really saying.  For example, if the teacher recorded his
> "standard" pronouncation of "god", then the student saying "good" will
> get a higher score than the student saying "evil" ---- because "good"
> sounds more like "god".

Try this: record the word twice and overlay them.  They will be
different.  And that's with the same speaker.  Now try it with your
voice compared with another's.  You can hear just how different they
are.  One will be longer, another deeper, or with the "o" sound
originating in a different part of the mouth.
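
Here is a rough sketch of why a direct overlay is so fragile.  It uses
synthetic sine tones rather than real recordings, and the sample rate,
the frequencies and the rms_difference helper are all my own
assumptions for illustration.  It compares a tone against a copy of
itself that starts 2.5 ms late, and against an unrelated tone.

```python
import math

RATE = 8000  # samples per second; an arbitrary choice for this sketch


def tone(freq_hz, seconds, delay_s=0.0):
    """Return a sine tone as a list of floats, optionally delayed."""
    n = int(RATE * seconds)
    return [math.sin(2 * math.pi * freq_hz * (i / RATE - delay_s))
            for i in range(n)]


def rms_difference(a, b):
    """Root-mean-square of the sample-by-sample difference (the "overlay")."""
    n = min(len(a), len(b))
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a[:n], b[:n])) / n)


reference = tone(200, 0.5)
same_but_late = tone(200, 0.5, delay_s=0.0025)  # same tone, half a period late
different = tone(310, 0.5)                      # an unrelated tone

print("same tone, 2.5 ms late:", round(rms_difference(reference, same_but_late), 3))
print("different tone:        ", round(rms_difference(reference, different), 3))
```

In this toy case the late copy of the very same tone scores as more
different than the unrelated tone does.  Real speech is far messier
than a sine wave, so a plain sample-by-sample score says even less
there.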

At the level you are working at, the computer doesn't know which
parts of the data can be ignored.  It doesn't know how to find the
start of the word (as when a student says "ummm, good").  It doesn't
know how to stretch the timings, nor how to adjust for pitch between,
say, a man's and a woman's voice.
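
For the timing part alone, one textbook technique is dynamic time
warping (DTW).  To be clear, this is my own illustration and not
something anyone in this thread has tried, and it does nothing about
pitch, finding the start of the word, or differences between speakers.
The sequences below are made-up "loudness contours", not features
extracted from real audio.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of numbers."""
    inf = float("inf")
    # cost[i][j] is the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[len(a)][len(b)]


# Toy contours (hypothetical data, not real audio features).
teacher = [0, 1, 3, 4, 3, 1, 0]
slow_student = [0, 0, 1, 1, 3, 3, 4, 4, 3, 3, 1, 1, 0, 0]  # same shape, half speed
other_word = [0, 4, 0, 4, 0, 4, 0]

print(dtw_distance(teacher, slow_student))
print(dtw_distance(teacher, other_word))
```

The slowed-down copy aligns perfectly with the original while the
other contour does not.  Getting from a WAVE file to a sequence that
is actually worth aligning is the hard part, and that is where the
speech recognition work comes in.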

My ex-girlfriend gave me a computer program for learning Swedish.
It included a program to do a simpler version of what you are
asking.  It only compared phonemes, so I could practice the vowels.
Even then its comparison seemed more like a random value than
anything meaningful.

Again, as Jeremy said, you want something harder than what
speech recognition programs do.  They at least are trained
to understand a given speaker, which helps improve the quality
of the recognition.  You don't want that -- that's the
opposite of what you're trying to do.  Speaker-independent
voice recognition is harder than speaker-dependent.

You can implement a solution along the lines you were thinking of,
but as you found, it doesn't work.  A workable solution will
require good speech recognition capability and is still very
much in the research stage (as far as I know; it's not my
field).

If your target language is a major one then there may be some
commercial speech recognition software you can use.  You
could have your reference speaker train the software on the
vocabulary list and have your students try to have the software
recognize the correct word.

If your word list is too short or the recognizer is not set up well
enough, then saying something like "thud" will also be
recognized as being close enough to "good".

Why don't you just have the students hear both the
teacher's voice and the student's just-recorded voice, one
right after the other?  That gives feedback.  Why does
the computer need to judge the correctness?
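
If it helps, here is a minimal sketch of that using the standard
library's wave module.  It joins the two recordings into one file
with a short pause between them, which you can then play with
whatever player you already use.  The function name and the
half-second gap are my own choices, and it assumes both files are
uncompressed PCM with the same channel count, sample width and rate.

```python
import wave


def join_for_playback(teacher_path, student_path, out_path, gap_seconds=0.5):
    """Write teacher audio, a short silence, then student audio to out_path."""
    with wave.open(teacher_path, "rb") as t, wave.open(student_path, "rb") as s:
        fmt = (t.getnchannels(), t.getsampwidth(), t.getframerate())
        if fmt != (s.getnchannels(), s.getsampwidth(), s.getframerate()):
            raise ValueError("recordings must share channels, sample width and rate")
        teacher_frames = t.readframes(t.getnframes())
        student_frames = s.readframes(s.getnframes())

    channels, width, rate = fmt
    # Zero bytes are silence for 16-bit and wider PCM.  8-bit WAVE data is
    # unsigned, so this sketch assumes a sample width of at least 2 bytes.
    silence = b"\x00" * (int(rate * gap_seconds) * channels * width)

    with wave.open(out_path, "wb") as out:
        out.setnchannels(channels)
        out.setsampwidth(width)
        out.setframerate(rate)
        out.writeframes(teacher_frames + silence + student_frames)
```

You would call it as something like
join_for_playback("teacher.wav", "student.wav", "both.wav"), where
the file names are placeholders for your own recordings.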

				Andrew
				dalke at dalkescientific.com