[SciPy-User] Wave files / PCM question

Sun Nov 7 17:51:51 EST 2010

Hi all,

In a linear PCM encoded wave file, the samples are typically stored 
either as unsigned bytes or signed 16 bit integers. Does anyone know 
(and preferably have a solid reference for) the correct conversion for 
both of these types to floats between -1 and 1?

My assumption would be that no possible values should be wasted, so that 
-1 should correspond to 0 (or -2**15) and +1 should correspond to 255 
(or 2**15-1) for 8 (or 16) bit samples. But this has the odd feature 
that a float value of exactly 0 is not represented, as it would have to 
correspond to a sample value of 127.5 (or -0.5). That doesn't bother me 
too much, at least in the case of the 
unsigned bytes, but in the case of the signed 16 bit ints, it means that 
the zero of the signed 16 bit int doesn't correspond to the zero of the 
float, and that essentially the signedness of the 16 bit int is more or 
less ignored.
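
To make this first option concrete, here is a minimal numpy sketch of 
what I mean. The function name full_range_to_float is just made up for 
illustration, and it assumes the input array already has dtype uint8 or 
int16:

```python
import numpy as np

def full_range_to_float(samples):
    """Map the lowest integer code to -1.0 and the highest to +1.0.

    Hypothetical helper: assumes `samples` has an integer dtype
    (e.g. uint8 or int16) so that np.iinfo can report its range.
    """
    samples = np.asarray(samples)
    info = np.iinfo(samples.dtype)
    lo, hi = float(info.min), float(info.max)
    # Linear map with lo -> -1.0 and hi -> +1.0; the midpoint
    # (127.5 for uint8, -0.5 for int16) would map to 0.0.
    return (2.0 * samples.astype(np.float64) - (hi + lo)) / (hi - lo)
```

With this mapping the int16 value 0 comes out as 1/65535 rather than 
0.0, which is the mismatch I describe above.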

The alternative is that the signedness is used and +/- 1 corresponds to 
+/- (2**15-1), which would mean that the value -2**15 is never used for 16 
bit LPCM, which seems to violate my intuition about how people used to 
design file formats back in the good old days when everything was very 
efficient.
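
Again as a rough sketch of this second option for the 16 bit case (the 
name symmetric_to_float is made up, and I am not claiming any 
particular library does it this way):

```python
import numpy as np

def symmetric_to_float(samples):
    """Map +/-(2**15 - 1) to +/-1.0, so that integer 0 maps to 0.0.

    Hypothetical helper: assumes signed 16 bit input.
    """
    samples = np.asarray(samples, dtype=np.int16)
    # Divide by the largest positive code; -2**15 is left unscaled
    # and therefore falls slightly outside [-1, 1].
    return samples.astype(np.float64) / (2**15 - 1)
```

Here integer 0 maps to float 0.0, but -2**15 lands just below -1, so a 
writer following this convention would never emit it.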

So which is it? Waste -2**15 or violate 0=0? I've found web pages that 
seem to suggest both possibilities, but I'm not sure what the definitive 
reference is for this.

Apologies for the slightly off-topic question, although I am using numpy 
and scipy. :)

Dan