[SciPy-User] unpacking binary data from a C structure

Robert Kern robert.kern at gmail.com
Tue Apr 13 10:41:12 EDT 2010


On Tue, Apr 13, 2010 at 08:20, Tom Kuiper <kuiper at jpl.nasa.gov> wrote:
> Dear list,
>
> here's something I find very strange.  I have a C structure defined as:
>
> typedef struct
> {
>  unsigned short spcid;         /* station id - 10, 40, 60, 21 */
>  unsigned short vsrid;         /* vsr1a, vsr1b ... from enum */
>  unsigned short chanid;        /* subchannel id 0,1,2,3 */
>  unsigned short bps;           /* number of bits per sample - 1, 2, 4,
>                                   8, or 16 */
>  unsigned long  srate;         /* number of samples per second in
>                                   kilo-samples per second */
>  unsigned short error;         /* hw err flag, dma error or num_samples
>                                   error, 0 ==> no errors */
>  unsigned short year;          /* time tag - year */
>  unsigned short doy;           /* time tag - day of year */
>  unsigned long  sec;           /* time tag - second of day */
>  double         freq;          /* in Hz */
>  unsigned long  orate;         /* number of statistics samples per
>                                   second */
>  unsigned short nsubchan;      /* number of output sub chans */
> }
> stats_hdr_t;
>
> The format I expected to use with unpack from the Python struct module
> is 'HHHH L HHH L d L H'.
> Here's a real header structure as it appears at the head of a file:
>
>  0000000  000d  0001  0006  0008
>  0000010  4240  000f  0000  0000
>  0000020  0000  07da  0064  4730
>  0000030  0001  0000  0000  0000
>  0000040  d800  d31d  421d  03e8
>  0000050  0000  0000  0000  0002
>
> Decoded as unsigned shorts:
>
>  0000000    13     1     6     8
>  0000010 16960    15     0     0
>  0000020     0  2010   100 18224
>  0000030     1     0     0     0
>  0000040 55296 54045 16925  1000
>  0000050     0     0     0     2
>
> Matching these to the stats_hdr_t with 'unpack' notation:
>
>  0000000     H     H     H     H
>  0000010    L1    L2     H     ?
>  0000020     ?     H     H    L1
>  0000030    L2     ?     ?    D1
>  0000040    D2    D3    D4    L1
>  0000050    L2     ?     ?     H
>
> So the actual format is 'HHHH L H xxxx HH L xxxx d L xxxx H'
> What are all the mystery 4-byte blanks?  This works:
>
> buf = fd.read(50)
> header = unpack_from('=4H LH2x 2x2HL4xdL4xH',buf)
>
> Since unpacking binary data must be a fairly common activity in
> scientific circles, I hope you will have some suggestions.

C compilers can insert padding into the memory layout of a structure
in order to align certain members to certain boundaries, particularly
doubles. This behavior is compiler- and platform-specific.
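
As a rough illustration, the struct module can show the effect directly:
the '@' prefix (the default) uses the platform's native sizes and
alignment, while '=' uses standard sizes with no padding. The sketch
below compares the two for the field list in the post and then decodes
the quoted header. Two things here are assumptions, not facts from the
post: that the dump shows 16-bit words stored in little-endian order,
and the alternative 'Q' format at the end, which reads each unsigned
long as 8 bytes with no padding. The latter is only an inference from
the dump; the native size of 'L' is itself platform-specific.

```python
import struct

# Field codes from the post, in the declaration order of stats_hdr_t.
FIELDS = "4H L 3H L d L H"

# "=" means standard sizes (H=2, L=4, d=8 bytes) and no alignment padding.
standard_size = struct.calcsize("=" + FIELDS)
# "@" (the default) means this platform's native sizes and alignment, so
# pad bytes are inserted between fields much as a C compiler would do.
native_size = struct.calcsize("@" + FIELDS)
print("standard:", standard_size)            # 36
print("native:  ", native_size)              # platform-dependent
print("native L:", struct.calcsize("@L"))    # platform-dependent

# The header from the post, rebuilt from the dump's 16-bit words.
# Assumption: the file stores each word in little-endian byte order.
WORDS = """
000d 0001 0006 0008  4240 000f 0000 0000
0000 07da 0064 4730  0001 0000 0000 0000
d800 d31d 421d 03e8  0000 0000 0000 0002
"""
buf = b"".join(struct.pack("<H", int(w, 16)) for w in WORDS.split())

# Tom's working format, with "<" in place of "=" to pin the byte order.
header = struct.unpack_from("<4H LH2x 2x2HL4xdL4xH", buf)
print(header)

# Hypothetical alternative reading: no padding at all, but each
# unsigned long occupies 8 bytes ("Q" in standard sizes).
header_q = struct.unpack_from("<4H Q 3H Q d Q H", buf)
print(header_q == header)
```

Both formats cover the same 48 bytes and yield the same values, so the
dump alone cannot tell padding apart from a wider unsigned long. If the
C source and compiler flags are available, checking sizeof(unsigned long)
and any packing pragmas there would settle it.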

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


