Dealing with binary data...
Tim Peters
tim_one at email.msn.com
Sat Mar 4 14:27:35 EST 2000
[posted & mailed]
[Thomas A. Bryan]
> I'm trying to work with a data file format defined by Fortran programmers.
> I'd like to write some Python to read and write the data. I like
> Python's struct because I can simply specify '<' at the beginning of
> the format string to guarantee a platform independent reader/writer
> for this format.
>
> I'm hitting one problem. The format contains a fixed-size series of
> 4-byte (little-endian) floats. When there isn't enough data to fill
> up the file, each float is padded with the bit pattern of
> ff7f ff7f
> The file format definition explains it as one (little-endian) integer
> 32767 in each of the float's two bytes.
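Unfortunately, that bit pattern doesn't correspond to a finite IEEE-754
float: read as a little-endian single it's 0x7fff7fff, whose exponent bits
are all ones with a nonzero fraction, i.e. a NaN. Python *is* blowing this,
but not where you think <wink>.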
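For anyone who wants to check the bit pattern by hand, here is a minimal sketch that splits the padding bytes into the IEEE-754 single-precision fields. It assumes a modern Python 3 interpreter (bytes literals), not the 1.5.2 used in this thread, and the variable names are purely illustrative.

```python
import struct

raw = b"\xff\x7f\xff\x7f"            # the padding bytes as stored in the file
(bits,) = struct.unpack("<I", raw)   # reinterpret as a little-endian 32-bit unsigned int

sign = bits >> 31                    # 1 sign bit
exponent = (bits >> 23) & 0xFF       # 8 exponent bits
fraction = bits & 0x7FFFFF           # 23 fraction bits

# An all-ones exponent is reserved: fraction == 0 means infinity, anything else is a NaN.
is_finite = exponent != 0xFF
is_nan = exponent == 0xFF and fraction != 0

print(hex(bits), sign, exponent, hex(fraction), is_finite, is_nan)
```

The exponent field comes out as 255, so no finite value can be represented by these four bytes.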
> Here's the problem:
>
> Python 1.5.2 (#1, Apr 18 1999, 16:03:16) [GCC pgcc-2.91.60 19981201
> (egcs-1.1.1 on linux2
> Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
> >>> import struct
> >>> struct.pack('<hh',32767,32767)
> '\377\177\377\177'
> >>> struct.unpack('<f','\377\177\377\177')
> (6.79235465281e+38,)
That's where it's blowing it. This should not yield a normal floating-point
value. Look for the comment
/* XXX This sadly ignores Inf/NaN issues */
in structmodule.c's unpack_float() function. It's unclear what it should do,
though, as the Python language itself ignores the possibility of infs and
NaNs (that's all a platform-dependent crap shoot).
> >>> floatNum = struct.unpack('<f','\377\177\377\177')[0]
> >>> struct.pack('<f',floatNum)
> Traceback (innermost last):
> File "<stdin>", line 1, in ?
> OverflowError: float too large to pack with f format
This is legit. The largest finite IEEE-754 float is about 3.4e+38, and
Python is getting that part right:
    >>> struct.pack('<f', 3.4e38)   # ~= largest finite float
    '\236\311\177\177'
    >>> struct.pack('<f', 3.41e38)  # a little bigger than the largest
    Traceback (innermost last):
      File "<pyshell#22>", line 1, in ?
        struct.pack('<f', 3.41e38)
    OverflowError: float too large to pack with f format
    >>>
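As a rough cross-check of the "about 3.4e+38" figure, the largest finite single can be built from its definition (all fraction bits set, exponent field 254). This is a sketch assuming IEEE-754 single precision and a Python 3 interpreter; the names FLT_MAX, packed_max and overflowed are mine, not anything from the struct module.

```python
import struct

# Largest finite IEEE-754 single: (2 - 2**-23) * 2**127, roughly 3.4028e38.
FLT_MAX = (2.0 - 2.0**-23) * 2.0**127

# It packs cleanly; its bit pattern is 0x7f7fffff, stored little-endian.
packed_max = struct.pack("<f", FLT_MAX)

# A value clearly beyond that limit is refused rather than silently mangled.
try:
    struct.pack("<f", 3.41e38)
    overflowed = False
except OverflowError:
    overflowed = True

print(FLT_MAX, packed_max, overflowed)
```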
> ...
> If this behavior is expected, then I suppose that I'll have to
> unpack each float twice...once into a pair of shorts (to check
> for the "no data" values) and then into a float (if the data is
> present). Then, when I output the data, I'll have to check
> each float. If it's None, pack the two ints. If it's not None,
> pack the float.
You'll have to do *something* to distinguish real floats from padding, but
that's up to you.
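The approach described in the quoted text (check for the "no data" bytes first, use None for missing values) could be sketched roughly as follows. This assumes Python 3 bytes objects and one 4-byte field at a time; PADDING, unpack_field and pack_field are hypothetical names, not part of struct.

```python
import struct

# The "no data" marker: two little-endian shorts, each 32767.
PADDING = struct.pack("<hh", 32767, 32767)

def unpack_field(raw):
    """Return None for a padding field, else the little-endian float."""
    if raw == PADDING:
        return None
    return struct.unpack("<f", raw)[0]

def pack_field(value):
    """Inverse of unpack_field: None becomes the padding bytes."""
    if value is None:
        return PADDING
    return struct.pack("<f", value)

print(unpack_field(PADDING), unpack_field(pack_field(1.5)))
```

Comparing the raw bytes before unpacking sidesteps the question of what the interpreter does with a non-finite bit pattern.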
Another way to detect the dummy values is this:
    dummy = math.frexp(the_unpacked_float)[1] > 128

because, e.g.,

    >>> math.frexp(3.4e38)
    (0.999170198199, 128)
    >>> math.frexp(3.41e38)
    (0.501054467038, 129)
    >>>
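That's cheap, and will catch other cases where the input data is insane too:
any bit pattern with an all-ones exponent field (what IEEE-754 reserves for
infs and NaNs) unpacks here to a magnitude of at least 2**128, so frexp
reports an exponent of 129.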
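Wrapped up as a function, the frexp test might look like the sketch below. One loud assumption: on a modern Python 3 interpreter, struct.unpack hands back a real NaN or infinity for such bit patterns instead of the oversized finite number shown in this thread, and frexp's exponent is not meaningful for those, so the sketch adds explicit isnan/isinf checks. That addition and the name looks_like_padding are mine.

```python
import math

def looks_like_padding(x):
    """True if x could not have come from a finite IEEE-754 single."""
    # Assumption: newer interpreters may deliver a real NaN or infinity here.
    if math.isnan(x) or math.isinf(x):
        return True
    # The frexp test: finite singles never have a frexp exponent above 128.
    return math.frexp(x)[1] > 128

print(looks_like_padding(6.79235465281e+38))  # the value unpacked in the quoted session
print(looks_like_padding(3.4e38))             # close to the largest finite single
```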
good-to-the-last-bit-ly y'rs - tim