[SciPy-User] unpacking binary data from a C structure

Scott Ransom sransom at nrao.edu
Tue Apr 13 10:57:38 EDT 2010


On Tuesday 13 April 2010 10:41:12 am Robert Kern wrote:
> On Tue, Apr 13, 2010 at 08:20, Tom Kuiper <kuiper at jpl.nasa.gov> wrote:
> > Dear list,
> >
> > here's something I find very strange.  I have a C structure defined
> > as:
> >
> > typedef struct
> > {
> >  unsigned short spcid;         /* station id - 10, 40, 60, 21 */
> >  unsigned short vsrid;         /* vsr1a, vsr1b ... from enum */
> >  unsigned short chanid;        /* subchannel id 0,1,2,3 */
> >  unsigned short bps;           /* number of bits per sample -
> >                                   1, 2, 4, 8, or 16 */
> >  unsigned long  srate;         /* number of samples per second,
> >                                   in kilo-samples per second */
> >  unsigned short error;         /* hw err flag, dma error or
> >                                   num_samples error,
> >                                   0 ==> no errors */
> >  unsigned short year;          /* time tag - year */
> >  unsigned short doy;           /* time tag - day of year */
> >  unsigned long  sec;           /* time tag - second of day */
> >  double         freq;          /* in Hz */
> >  unsigned long  orate;         /* number of statistics samples
> >                                   per second */
> >  unsigned short nsubchan;      /* number of output sub chans */
> > }
> > stats_hdr_t;
> >
> > The format I expected to pass to unpack from the Python struct
> > module is 'HHHH L HHH L d L H'.  Here's a real header structure as
> > it appears at the head of a file:
> >
> >  0000000  000d  0001  0006  0008
> >  0000010  4240  000f  0000  0000
> >  0000020  0000  07da  0064  4730
> >  0000030  0001  0000  0000  0000
> >  0000040  d800  d31d  421d  03e8
> >  0000050  0000  0000  0000  0002
> >
> > Decoded as unsigned shorts:
> >
> >  0000000    13     1     6     8
> >  0000010 16960    15     0     0
> >  0000020     0  2010   100 18224
> >  0000030     1     0     0     0
> >  0000040 55296 54045 16925  1000
> >  0000050     0     0     0     2
> >
> > Matching these to the stats_hdr_t with 'unpack' notation:
> >
> >  0000000     H     H     H     H
> >  0000010    L1    L2     H     ?
> >  0000020     ?     H     H    L1
> >  0000030    L2     ?     ?    D1
> >  0000040    D2    D3    D4    L1
> >  0000050    L2     ?     ?     H
> >
> > So the actual format is 'HHHH L H xxxx HH L xxxx d L xxxx H'
> > What are all the mystery 4-byte blanks?  This works:
> >
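> > from struct import calcsize, unpack_from
> > fmt = '=4H LH2x 2x2HL4xdL4xH'
> > buf = fd.read(calcsize(fmt))  # the 48 bytes the format describes
> > header = unpack_from(fmt, buf)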
> >
> > Since unpacking binary data must be a fairly common activity in
> > scientific circles, I hope you will have some suggestions.
> 
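
For anyone who wants to poke at this, here is a minimal sketch that
rebuilds the 48 bytes of the dump above and decodes them with the format
string from the post.  It assumes the dump shows little-endian 16-bit
words (so the format uses '<' rather than '='), and the field names are
simply copied from the C structure.  WORDS, FMT and NAMES are names made
up for the example.

```python
import struct

# The 16-bit words from the hex dump, in file order.
WORDS = [
    0x000d, 0x0001, 0x0006, 0x0008,
    0x4240, 0x000f, 0x0000, 0x0000,
    0x0000, 0x07da, 0x0064, 0x4730,
    0x0001, 0x0000, 0x0000, 0x0000,
    0xd800, 0xd31d, 0x421d, 0x03e8,
    0x0000, 0x0000, 0x0000, 0x0002,
]

# Assumption: the file is little-endian, so each word is stored low byte first.
buf = struct.pack("<24H", *WORDS)

# Same layout as the format in the post; '<' pins the byte order so the
# result does not depend on the machine running this sketch.
FMT = "<4H LH2x 2x2HL4xdL4xH"

# Field names taken from stats_hdr_t, in declaration order.
NAMES = ("spcid", "vsrid", "chanid", "bps", "srate", "error",
         "year", "doy", "sec", "freq", "orate", "nsubchan")

header = dict(zip(NAMES, struct.unpack_from(FMT, buf)))
for name in NAMES:
    print(f"{name:<9} {header[name]}")
```

With those assumptions the header decodes to year 2010, day of year 100,
second 83760, srate 1000000, orate 1000, nsubchan 2 and a freq of
32024000000.0 Hz.
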
> C compilers can insert padding into the memory layout of a structure
> in order to align certain members to certain boundaries, particularly
> doubles. This behavior is compiler- and platform-specific.
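
The struct module lets you pick either behavior with the first character
of the format string.  A small sketch (FIELDS is just a name for the
example): '@' follows the native sizes and alignment of the compiler
that built your Python, so its result varies by platform, while '=' uses
standard sizes with no alignment, so any padding has to be spelled out
with 'x'.

```python
import struct

# Field order of stats_hdr_t, written without any explicit padding.
FIELDS = "4H L 3H L d L H"

# '@' (the default): native sizes and native alignment, so pad bytes are
# inserted wherever the local C compiler would insert them.
native_size = struct.calcsize("@" + FIELDS)

# '=': standard sizes (H=2, L=4, d=8) and no alignment at all.
packed_size = struct.calcsize("=" + FIELDS)

# The format from the original post, with the pad bytes written out.
explicit_size = struct.calcsize("=4H LH2x 2x2HL4xdL4xH")

print("native  :", native_size)
print("packed  :", packed_size)
print("explicit:", explicit_size)
```

The two '=' sizes are 36 and 48 bytes everywhere; the '@' size depends
on the machine.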

If the exact order of the elements in the structure is not important, 
you can reduce the padding (but not prevent it entirely) by grouping 
structure elements of the same type together, with the largest ones 
first.  In your case, that would mean putting the double first, then 
all the longs, and then all the shorts.
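
You can see the effect without touching the C code by mocking the
structure up in ctypes, which lays fields out the way the local C
compiler would.  This is only a sketch: StatsHdrSorted is a hypothetical
reordering (same fields, largest types first), payload() is a helper
made up for the example, and the sizes printed depend on your platform.

```python
import ctypes


class StatsHdr(ctypes.Structure):
    """Fields in the order they appear in stats_hdr_t."""
    _fields_ = [
        ("spcid", ctypes.c_ushort),
        ("vsrid", ctypes.c_ushort),
        ("chanid", ctypes.c_ushort),
        ("bps", ctypes.c_ushort),
        ("srate", ctypes.c_ulong),
        ("error", ctypes.c_ushort),
        ("year", ctypes.c_ushort),
        ("doy", ctypes.c_ushort),
        ("sec", ctypes.c_ulong),
        ("freq", ctypes.c_double),
        ("orate", ctypes.c_ulong),
        ("nsubchan", ctypes.c_ushort),
    ]


class StatsHdrSorted(ctypes.Structure):
    """Hypothetical reordering: the same fields, largest types first."""
    _fields_ = sorted(StatsHdr._fields_,
                      key=lambda field: ctypes.sizeof(field[1]),
                      reverse=True)


def payload(struct_type):
    """Bytes taken by the fields themselves, ignoring any padding."""
    return sum(ctypes.sizeof(ctype) for _, ctype in struct_type._fields_)


for struct_type in (StatsHdr, StatsHdrSorted):
    size = ctypes.sizeof(struct_type)
    print(struct_type.__name__, "size:", size,
          "padding:", size - payload(struct_type))
```

The sorted version should never come out larger than the original, but
it can still carry a few pad bytes at the end, which is why reordering
only reduces the problem.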

Scott


-- 
Scott M. Ransom            Address:  NRAO
Phone:  (434) 296-0320               520 Edgemont Rd.
email:  sransom at nrao.edu             Charlottesville, VA 22903 USA
GPG Fingerprint: 06A9 9553 78BE 16DB 407B  FFCA 9BFA B6FF FFD3 2989


