struct: type registration?

Serge Orlov Serge.Orlov at gmail.com
Fri Jun 2 03:44:30 EDT 2006


John Machin wrote:
> On 2/06/2006 4:18 AM, Serge Orlov wrote:
> > If you want to parse binary data use pyconstruct
> > <http://pyconstruct.wikispaces.com/>
> >
>
> Looks promising on the legibility and functionality fronts. Can you make
> any comment on the speed?

I don't really know. I have only used it to parse small amounts of data,
and its performance was acceptable. As I understand it, it is currently
implemented as pure Python code that uses struct under the hood. The
biggest concern is the lack of comprehensive documentation; if that
scares you, it's not for you.

> Reason for asking is that Microsoft Excel
> files have this weird "RK" format for expressing common float values in
> 32 bits (refer http://sc.openoffice.org, see under "Documentation"
> heading). I wrote and support the xlrd module (see
> http://cheeseshop.python.org/pypi/xlrd) for reading those files in
> portable pure Python. Below is a function that would plug straight in as
> an example of Giovanni's custom unpacker functions. Some of the files
> can be very large, and reading rather slow.

I *guess* that the *current* implementation of pyconstruct will make
parsing slightly slower, but you would have to try it to find out.
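
One way to "try and find out" is to time each candidate parser on the same
input with the standard timeit module. The following is only a minimal
sketch: best_time_per_call and parse_with_struct are hypothetical names
invented here, and the struct-based parser merely stands in for whatever
function is being compared. A pyconstruct-based parser could be passed in
the same way; none is shown because its API is not assumed here.

```python
import timeit
from struct import unpack


def best_time_per_call(func, arg, number=10000, repeat=3):
    """Return the best observed time in seconds for one call of func(arg).

    Hypothetical helper: runs `number` calls per trial, `repeat` trials,
    and keeps the fastest trial to reduce noise from other processes.
    """
    timer = timeit.Timer(lambda: func(arg))
    return min(timer.repeat(repeat=repeat, number=number)) / number


def parse_with_struct(data):
    """Stand-in parser: decode 8 bytes as a little-endian IEEE 754 double."""
    return unpack('<d', data)[0]


# The little-endian byte pattern of the double 1.0.
sample = b'\x00\x00\x00\x00\x00\x00\xf0\x3f'

if __name__ == '__main__':
    seconds = best_time_per_call(parse_with_struct, sample)
    print('struct: %.3g seconds per call' % seconds)
```

Absolute numbers vary from machine to machine, so only the ratio between
two parsers measured in the same run is meaningful.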

> from struct import unpack
>
> def unpack_RK(rk_str): # arg is 4 bytes
>      flags = ord(rk_str[0])
>      if flags & 2:
>          # There's a SIGNED 30-bit integer in there!
>          i, = unpack('<i', rk_str)
>          i >>= 2 # div by 4 to drop the 2 flag bits
>          if flags & 1:
>              return i / 100.0
>          return float(i)
>      else:
>          # It's the most significant 30 bits
>          # of an IEEE 754 64-bit FP number
>          d, = unpack('<d', '\0\0\0\0' + chr(flags & 252) + rk_str[1:4])
>          if flags & 1:
>              return d / 100.0
>          return d

I had to look up what < means :) Since nothing except this function cares
about the internals of an RK number, you don't need pyconstruct to
parse it at the bit level. The code will be almost what you wrote, except
that you replace unpack('<d', with Construct.LittleFloat64("").parse( and
plug unpack_RK into the pyconstruct framework by deriving from the Field
class. Sure, nobody is going to raise your paycheck because of this
rewrite :) The biggest benefit comes from parsing the whole data file
with pyconstruct, not individual fields.
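
For reference, the quoted unpack_RK relies on Python 2 string semantics
(ord() and chr() on a str). Below is a hedged sketch of the same decoding
logic for Python 3 bytes objects, using only the struct module. It does
not use pyconstruct, and the name unpack_rk is chosen here for
illustration only.

```python
from struct import pack, unpack


def unpack_rk(rk_bytes):
    """Decode a 4-byte Excel RK value into a float (illustrative sketch)."""
    flags = rk_bytes[0]  # indexing bytes yields an int in Python 3
    if flags & 2:
        # Bit 1 set: the upper 30 bits hold a signed integer.
        i, = unpack('<i', rk_bytes)
        value = float(i >> 2)  # arithmetic shift drops the 2 flag bits
    else:
        # Bit 1 clear: the upper 30 bits are the most significant bits
        # of an IEEE 754 double; the missing low 34 bits are zero.
        padded = b'\0\0\0\0' + bytes([flags & 252]) + rk_bytes[1:4]
        value, = unpack('<d', padded)
    if flags & 1:
        # Bit 0 set: the stored value is 100 times the real one.
        return value / 100.0
    return value


if __name__ == '__main__':
    # Integer 12345 with both flag bits set decodes as 12345 / 100.
    print(unpack_rk(pack('<i', (12345 << 2) | 3)))
    # The top four bytes of the double 1.0, with both flag bits clear.
    print(unpack_rk(pack('<d', 1.0)[4:]))
```

The branch structure mirrors the quoted function; the only behavioural
change is that the division by 100 is applied once after both branches
instead of separately in each.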



