[SciPy-User] unpacking binary data from a C structure
Scott Ransom
sransom at nrao.edu
Tue Apr 13 10:57:38 EDT 2010
On Tuesday 13 April 2010 10:41:12 am Robert Kern wrote:
> On Tue, Apr 13, 2010 at 08:20, Tom Kuiper <kuiper at jpl.nasa.gov> wrote:
> > Dear list,
> >
> > here's something I find very strange. I have a C structure defined
> > as:
> >
> > typedef struct
> > {
> >   unsigned short spcid;    /* station id - 10, 40, 60, 21 */
> >   unsigned short vsrid;    /* vsr1a, vsr1b ... from enum */
> >   unsigned short chanid;   /* subchannel id 0,1,2,3 */
> >   unsigned short bps;      /* number of bits per sample -
> >                               1, 2, 4, 8, or 16 */
> >   unsigned long  srate;    /* number of samples per second in
> >                               kilo-samples per second */
> >   unsigned short error;    /* hw err flag, dma error or
> >                               num_samples error, 0 ==> no errors */
> >   unsigned short year;     /* time tag - year */
> >   unsigned short doy;      /* time tag - day of year */
> >   unsigned long  sec;      /* time tag - second of day */
> >   double         freq;     /* in Hz */
> >   unsigned long  orate;    /* number of statistics samples per
> >                               second */
> >   unsigned short nsubchan; /* number of output sub chans */
> > }
> > stats_hdr_t;
> >
> > The format I expected to need for unpack in the Python struct
> > module is 'HHHH L HHH L d L H'. Here's a real header structure as
> > it appears at the head of a file:
> >
> > 0000000 000d 0001 0006 0008
> > 0000010 4240 000f 0000 0000
> > 0000020 0000 07da 0064 4730
> > 0000030 0001 0000 0000 0000
> > 0000040 d800 d31d 421d 03e8
> > 0000050 0000 0000 0000 0002
> >
> > Decoded as unsigned shorts:
> >
> > 0000000 13 1 6 8
> > 0000010 16960 15 0 0
> > 0000020 0 2010 100 18224
> > 0000030 1 0 0 0
> > 0000040 55296 54045 16925 1000
> > 0000050 0 0 0 2
> >
> > Matching these to the stats_hdr_t with 'unpack' notation:
> >
> > 0000000 H H H H
> > 0000010 L1 L2 H ?
> > 0000020 ? H H L1
> > 0000030 L2 ? ? D1
> > 0000040 D2 D3 D4 L1
> > 0000050 L2 ? ? H
> >
> > So the actual format is 'HHHH L H xxxx HH L xxxx d L xxxx H'
> > What are all the mystery 4-byte blanks? This works:
> >
> > buf = fd.read(50)
> > header = unpack_from('=4H LH2x 2x2HL4xdL4xH',buf)
> >
> > Since unpacking binary data must be a fairly common activity in
> > scientific circles, I hope you will have some suggestions.
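
As a sanity check on the working format string quoted above, here is a minimal sketch that rebuilds the 48 bytes of the dump and unpacks them. It assumes the dump shows little-endian 16-bit words (as od -x would print them on an x86 machine), so it uses '<' where the original used '='. The names words, buf, fmt and FIELDS are only illustrative.

```python
import struct

# 16-bit words copied from the hex dump, in file order.
words = [
    0x000d, 0x0001, 0x0006, 0x0008,
    0x4240, 0x000f, 0x0000, 0x0000,
    0x0000, 0x07da, 0x0064, 0x4730,
    0x0001, 0x0000, 0x0000, 0x0000,
    0xd800, 0xd31d, 0x421d, 0x03e8,
    0x0000, 0x0000, 0x0000, 0x0002,
]
# Assumption: the dump shows little-endian 16-bit words.
buf = struct.pack('<24H', *words)

# '<' selects little-endian byte order with standard sizes
# (H=2, L=4, d=8) and no implicit padding; the 'x' codes skip
# the pad bytes explicitly.
fmt = '<4H LH2x 2x2HL4xdL4xH'
FIELDS = ('spcid', 'vsrid', 'chanid', 'bps', 'srate', 'error',
          'year', 'doy', 'sec', 'freq', 'orate', 'nsubchan')

header = dict(zip(FIELDS, struct.unpack_from(fmt, buf)))
for name in FIELDS:
    print(name, header[name])
```

With these bytes the time tag comes out as year 2010, day 100, second 83760, and srate as 1000000, which suggests the field boundaries in that format string are right for this file.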
>
> C compilers can insert padding into the memory layout of a structure
> in order to align certain members to certain boundaries, particularly
> doubles. This behavior is compiler- and platform-specific.
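
The effect of alignment can be seen from Python itself, because the struct module can use either the platform's native sizes and alignment ('@', the default) or standard sizes with no alignment ('='). The following is a small sketch using the naive format string from the question. The native result depends on the platform and compiler, so no native size is quoted here.

```python
import struct

naive = 'HHHH L HHH L d L H'   # format string from the question

# '@' uses the platform's native sizes and alignment, so pad bytes
# are inserted between fields where the local C ABI requires them.
native_size = struct.calcsize('@' + naive)

# '=' uses standard sizes (H=2, L=4, d=8) and never inserts padding.
standard_size = struct.calcsize('=' + naive)

print('native layout:  ', native_size, 'bytes')
print('standard layout:', standard_size, 'bytes')
print('native unsigned long:', struct.calcsize('@L'), 'bytes')
```

Any difference between the two sizes comes from the native size of unsigned long and from alignment padding, which is why a format string that works on one machine may not work on another.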
If the exact order of the elements in the structure is not important,
you can mitigate this (but not prevent it entirely) by grouping
structure elements of the same type together, with the largest types
first. In your case, that would mean putting the double first, then
all the longs, and then all the shorts together at the end.
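
One way to try this without recompiling anything is with ctypes, which lays out a Structure the way the platform's C compiler would. The sketch below is only illustrative: the class names and the padding() helper are hypothetical, and the reordered layout is just one possible largest-first ordering. The sizes it prints vary by platform.

```python
import ctypes

US, UL, D = ctypes.c_ushort, ctypes.c_ulong, ctypes.c_double

class StatsHdrOriginal(ctypes.Structure):
    # Field order as in the stats_hdr_t declaration.
    _fields_ = [('spcid', US), ('vsrid', US), ('chanid', US), ('bps', US),
                ('srate', UL), ('error', US), ('year', US), ('doy', US),
                ('sec', UL), ('freq', D), ('orate', UL), ('nsubchan', US)]

class StatsHdrReordered(ctypes.Structure):
    # Hypothetical reordering: the double first, then longs, then shorts.
    _fields_ = [('freq', D),
                ('srate', UL), ('sec', UL), ('orate', UL),
                ('spcid', US), ('vsrid', US), ('chanid', US), ('bps', US),
                ('error', US), ('year', US), ('doy', US), ('nsubchan', US)]

def padding(cls):
    """Bytes in the structure that do not belong to any field."""
    payload = sum(ctypes.sizeof(ftype) for _, ftype in cls._fields_)
    return ctypes.sizeof(cls) - payload

for cls in (StatsHdrOriginal, StatsHdrReordered):
    print(cls.__name__, ctypes.sizeof(cls), 'bytes,',
          padding(cls), 'of them padding')
```

The reordered version should never need more padding than the original, but it can still end up with some at the end of the structure, which is the "not prevent it entirely" part.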
Scott
--
Scott M. Ransom              Address:  NRAO
Phone:  (434) 296-0320                 520 Edgemont Rd.
email:  sransom at nrao.edu            Charlottesville, VA 22903 USA
GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989