compare two voices

Robert Oschler no-mail-please at nospam.com
Sun May 1 13:32:43 EDT 2005


"Qiangning Hong" <hongqn at gmail.com> wrote in message
news:1114916457.795332.45220 at f14g2000cwb.googlegroups.com...
> I want to make an app to help students study foreign language.  I want
> the following function in it:
>
> The student reads a piece of text to the microphone. The software
> records it and compares it to the wave-file pre-recorded by the
> teacher, and gives out a score to indicate the similarity between them.
>
> This function will help the students pronounce properly, I think.
>
> Is there an existing library (C or Python) to do this?  Or if someone
> can guide me to a ready-to-implement algorithm?
>

How about another approach?

All modern speech recognition systems employ a phonetic alphabet.  It's how
you tell the speech recognition engine exactly how a word sounds.

For each sentence the student is to read, you create a small recognition
context that includes the sentence itself AND subtle phonetic variations of it.

For example (using English):

You want them to say correctly: "The weather is good today".

You create a context containing the original sentence plus alternative
sentences that dither (vary) the original sentence phonetically.  Sample
context:

(*) The weather is good today
Da wedder is god tuday
The weether is good towday

Etc.

Then submit the context to the speech recognition engine and ask the student
to say the sentence.  If the original sentence (*) comes back as the engine's
best choice, then they said it right.  If one of the other choices comes back,
then they made a mistake.
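Here's a minimal Python sketch of that pass/fail check.  It assumes your engine
hands back the phrase it picked from the context as a plain string (or None if
it matched nothing); how you get that string is engine-specific and not shown.
CONTEXT, TARGET and said_it_right are made-up names for illustration only.

```python
from typing import Optional

# Recognition context: the original (*) sentence first, then phonetic dithers.
# Assumption: the engine reports its best choice as one of these strings.
CONTEXT = [
    "The weather is good today",   # the original (*) sentence
    "Da wedder is god tuday",
    "The weether is good towday",
]
TARGET = CONTEXT[0]


def said_it_right(best_choice: Optional[str], target: str = TARGET) -> bool:
    """True only if the engine's best choice is the original sentence."""
    # Any dithered variant, or no match at all (None), counts as a mistake.
    return best_choice == target


if __name__ == "__main__":
    print(said_it_right("The weather is good today"))  # True
    print(said_it_right("Da wedder is god tuday"))     # False
```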

You could even "grade" their performance by tagging each variation with its
closeness to the original, for example:

(*) The weather is good today (100)
Da wedder is god tuday (80)
Ta wegger es gid towday (50)

In the example above, the original sentence gets a 100, the second choice,
which is close, gets an 80, and the last option, which is pretty bad, gets a 50.
With a little effort you could automatically create the "dithered" phonetic
variations and auto-calculate the score or closeness to original too.
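A rough Python sketch of that automation, under heavy assumptions: the RULES
table is a hypothetical list of learner confusions applied to English spelling
rather than to a real phonetic alphabet, and the score is plain character edit
distance (Levenshtein) scaled to 0-100, which is only a crude stand-in for how
different two pronunciations actually sound.  RULES, levenshtein, closeness and
dither are illustrative names, not part of any speech engine's API.

```python
from itertools import combinations

# Hypothetical confusion rules (spelling-level stand-ins for phoneme swaps).
# A real system would write these against the engine's phonetic alphabet.
RULES = [
    ("th", "d"),
    ("Th", "D"),
    ("ea", "ee"),
    ("oo", "o"),
    ("to", "tu"),
]


def levenshtein(a: str, b: str) -> int:
    """Edit distance: each insertion, deletion or substitution costs 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = cur
    return prev[-1]


def closeness(variant: str, original: str) -> int:
    """Score 0-100: 100 means identical, lower means more edits away."""
    longest = max(len(variant), len(original)) or 1
    return round(100 * (1 - levenshtein(variant, original) / longest))


def dither(sentence: str, max_rules: int = 2) -> dict:
    """Apply every combination of up to max_rules rules and score each variant."""
    context = {sentence: 100}  # the original (*) sentence
    for k in range(1, max_rules + 1):
        for combo in combinations(RULES, k):
            variant = sentence
            for old, new in combo:
                variant = variant.replace(old, new)
            if variant != sentence:
                context.setdefault(variant, closeness(variant, sentence))
    return context


if __name__ == "__main__":
    graded = dither("The weather is good today")
    for phrase, score in sorted(graded.items(), key=lambda item: -item[1]):
        print(score, phrase)
```

The resulting phrase list would become the recognition context, and the score
attached to whichever phrase the engine returns is the student's grade.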

Thanks,
Robert
http://www.robodance.com
Robosapien Dance Machine - SourceForge project