compare two voices

Sun May 1 08:47:45 EDT 2005

[Qiangning Hong]

> I just want to compare two sound WAVE files, not what the students or
> the teacher are really saying.  For example, if the teacher recorded his
> "standard" pronunciation of "god", then the student saying "good" will
> get a higher score than the student saying "evil" ---- because "good"
> sounds more like "god".

If I had this problem and was alone, I would likely create one audiogram
(I mean a spectrogram, that is, a spectral frequency analysis over time)
for each voice sample.  Normally, this is presented with frequency on the
vertical axis, time on the horizontal axis, and gray value for the
amplitude at each frequency.  There are a few tools available for doing
this, yet integrating them into another application may require some work.
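
As a rough sketch of what such a spectral analysis over time involves,
here is a naive short-time Fourier transform in pure Python.  It assumes
the samples are already decoded into a list of floats (mono); the names
`spectrogram`, `frame_size` and `hop` are made up for illustration, and a
real application would rather use an FFT library for speed.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Return one list of magnitudes (one per frequency bin) per time frame."""
    # Hann window, to limit leakage between neighbouring frequency bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame_size)]
        magnitudes = []
        # Naive DFT, keeping only the non-redundant half of the spectrum.
        for k in range(frame_size // 2 + 1):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames


# Demo: a 1 kHz tone sampled at 8 kHz falls in bin 1000 * 64 / 8000 = 8.
rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(512)]
spec = spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

Each inner list is one column of the picture described above: the index
is the frequency bin, the value is what would be drawn as a gray level.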

Now, because of differences in voice pitch and elocution speed, the
audiograms would look somewhat alike, yet scaled differently in both
directions.  The problem you now have is to recognise that an image is
"similar" to part of another, so here, I would likely do some research
on various transforms (like Hough's and others of the same kind) that
might ease normalisation prior to comparison.  Image classification
techniques (used a lot in satellite imagery) might help in recognizing
similar textures in audiograms, and so provide clues for matching images.
A few image classification programs have been previously announced
here; I did not look at them yet, but who knows, they may be helpful.

Then, if the above work is done correctly and meaningfully, you now want
to compute correlations between normalised audiograms.  The more
correlated they are, the more alike the original pronunciations likely
were.
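
To make the correlation step a bit more concrete, here is a minimal
sketch.  It assumes each audiogram is a list of time frames, each frame a
list of magnitudes of equal length.  The normalisation shown is only a
crude stand-in (linear stretching of the time axis to a fixed number of
frames); it does nothing about pitch differences, and it is not one of
the transforms mentioned above.  All function names are hypothetical.

```python
import math


def resample_frames(spec, n_frames):
    """Stretch or shrink the time axis to n_frames by linear interpolation."""
    out = []
    for i in range(n_frames):
        pos = i * (len(spec) - 1) / (n_frames - 1) if n_frames > 1 else 0.0
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(spec) - 1)
        frac = pos - lo
        out.append([(1 - frac) * a + frac * b
                    for a, b in zip(spec[lo], spec[hi])])
    return out


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant image carries no shape to correlate
    return cov / math.sqrt(vx * vy)


def similarity(spec_a, spec_b, n_frames=16):
    """Correlate two audiograms after bringing them to a common length."""
    a = resample_frames(spec_a, n_frames)
    b = resample_frames(spec_b, n_frames)
    flat_a = [v for frame in a for v in frame]
    flat_b = [v for frame in b for v in frame]
    return pearson(flat_a, flat_b)


# Toy audiograms with three frequency bins: energy moving low -> high.
teacher = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# Same pattern spoken twice as slowly.
slow_student = [[1, 0, 0], [1, 0, 0], [0, 1, 0],
                [0, 1, 0], [0, 0, 1], [0, 0, 1]]
# Opposite pattern: energy moving high -> low.
other_student = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]

score_slow = similarity(teacher, slow_student)
score_other = similarity(teacher, other_student)
```

In this toy case the slower rendition of the same pattern scores higher
than the reversed one, which is the kind of ranking asked for.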

Now, if I had this problem and could call friends, I would surely phone
one or two of them, who work at companies offering voice recognition
devices or services.  They would likely be reluctant to share advanced
algorithms, as these give them an industrial advantage over competitors.

> I try to use the value returned from rms(add(a, mul(b, -findfactor(a,
> b)))) as the score.  But the result is not good.

Oh, absolutely no chance that such a simple thing would ever work. :-)
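
One way to see it, as a small sketch: the quoted expression fits a single
gain factor between the two waveforms and measures what is left over.
The code below mimics that on plain lists of floats instead of calling
the `audioop` module (which was removed from the standard library in
Python 3.13); `find_factor` and `residual_score` are hypothetical
stand-ins, not the module's own functions.

```python
import math


def rms(xs):
    """Root mean square of a sequence of samples."""
    return math.sqrt(sum(x * x for x in xs) / len(xs))


def find_factor(a, b):
    """Least-squares gain F that minimises rms(a - F * b)."""
    return sum(x * y for x, y in zip(a, b)) / sum(y * y for y in b)


def residual_score(a, b):
    """What remains of a once the best-scaled copy of b is subtracted."""
    f = find_factor(a, b)
    return rms([x - f * y for x, y in zip(a, b)])


period = 16
a = [math.sin(2 * math.pi * n / period) for n in range(64)]
# Same sound at half the volume: the gain factor absorbs the difference.
quieter = [0.5 * x for x in a]
# Same sound starting a quarter period later: no gain factor can help.
shifted = [math.sin(2 * math.pi * (n + 4) / period) for n in range(64)]

score_quieter = residual_score(a, quieter)
score_shifted = residual_score(a, shifted)
```

The residual is essentially zero for the quieter copy, yet as large as
the signal itself for the shifted copy, although both would sound the
same.  Differences in pitch and speed only make matters worse.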

-- 
François Pinard   http://pinard.progiciels-bpi.ca