[SciPy-User] unpacking binary data from a C structure

Tue Apr 13 11:27:55 EDT 2010

On Tue, Apr 13, 2010 at 7:20 AM, Tom Kuiper <kuiper at jpl.nasa.gov> wrote:

> Dear list,
>
> here's something I find very strange.  I have a C structure defined as:
>
> typedef struct
> {
>  unsigned short spcid;         /* station id - 10, 40, 60, 21 */
>  unsigned short vsrid;         /* vsr1a, vsr1b ... from enum */
>  unsigned short chanid;        /* subchannel id 0,1,2,3 */
>  unsigned short bps;           /* number of bits per sample - 1, 2, 4,
>                                   8, or 16 */
>  unsigned long  srate;         /* number of samples per second in
>                                   kilo-samples per second */
>  unsigned short error;         /* hw err flag, dma error or num_samples
>                                   error, 0 ==> no errors */
>  unsigned short year;          /* time tag - year */
>  unsigned short doy;           /* time tag - day of year */
>  unsigned long  sec;           /* time tag - second of day */
>  double         freq;          /* in Hz */
>  unsigned long  orate;         /* number of statistics samples per
>                                   second */
>  unsigned short nsubchan;      /* number of output sub chans */
> }
> stats_hdr_t;
>
> The format I expected for the Python struct module's unpack is
> 'HHHH L HHH L d L H'.
> Here's a real header structure as it appears at the head of a file:
>
>  0000000  000d  0001  0006  0008
>  0000010  4240  000f  0000  0000
>  0000020  0000  07da  0064  4730
>  0000030  0001  0000  0000  0000
>  0000040  d800  d31d  421d  03e8
>  0000050  0000  0000  0000  0002
>
> Decoded as unsigned shorts:
>
>  0000000    13     1     6     8
>  0000010 16960    15     0     0
>  0000020     0  2010   100 18224
>  0000030     1     0     0     0
>  0000040 55296 54045 16925  1000
>  0000050     0     0     0     2
>
> Matching these to the stats_hdr_t with 'unpack' notation:
>
>  0000000     H     H     H     H
>  0000010    L1    L2     H     ?
>  0000020     ?     H     H    L1
>  0000030    L2     ?     ?    D1
>  0000040    D2    D3    D4    L1
>  0000050    L2     ?     ?     H
>
> So the actual format is 'HHHH L H xxxx HH L xxxx d L xxxx H'
> What are all the mystery 4-byte blanks?  This works:
>
> from struct import unpack_from
>
> # fd is the already-open data file
> buf = fd.read(50)
> header = unpack_from('=4H LH2x 2x2HL4xdL4xH',buf)
>
> Since unpacking binary data must be a fairly common activity in
> scientific circles, I hope you will have some suggestions.
>
>
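For reference, the dump above can be rebuilt and decoded in a few lines. This is only a sketch: it assumes the dump shows little-endian 16-bit words (as od -x prints them on an x86 host), and the names WORDS, HEADER_FMT and FIELDS are mine, not from the original program. The format string is the one found by inspection, with the byte order spelled out as '<' instead of '='.

```python
import struct

# The 24 16-bit words from the dump above, in file order.
WORDS = [
    0x000d, 0x0001, 0x0006, 0x0008,
    0x4240, 0x000f, 0x0000, 0x0000,
    0x0000, 0x07da, 0x0064, 0x4730,
    0x0001, 0x0000, 0x0000, 0x0000,
    0xd800, 0xd31d, 0x421d, 0x03e8,
    0x0000, 0x0000, 0x0000, 0x0002,
]

# Assumption: the words are little-endian, so packing them with '<'
# recovers the raw bytes as they sit in the file.
buf = struct.pack("<24H", *WORDS)

# Layout found by inspection. 'x' is one pad byte; '<' selects standard
# sizes (L is 4 bytes) and disables implicit alignment padding.
HEADER_FMT = "<4H L H 4x 2H L 4x d L 4x H"

FIELDS = ("spcid", "vsrid", "chanid", "bps", "srate", "error",
          "year", "doy", "sec", "freq", "orate", "nsubchan")

header = dict(zip(FIELDS, struct.unpack_from(HEADER_FMT, buf)))
for name in FIELDS:
    print(name, header[name])
```

With this format struct.calcsize(HEADER_FMT) is 48, so reading 48 bytes is enough for the header. The decoded values look sensible: srate is 1000000, sec is 83760 (23:16:00 on day 100 of 2010), and freq is 32024000000.0 Hz.
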
I presume you didn't produce the data, but as a rule of thumb C structures
should not be used to write out binary data, because their binary layout
(field sizes, alignment padding, and byte order) depends on the compiler and
platform and so isn't portable. Text, NetCDF, HDF5, or some other standard
data format is preferable, with text being perhaps the most portable. That
said, lots of old data collection programs write out C structures, and no
doubt newer programs do so also.
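
To illustrate how much the layout can move around, here is a small sketch that computes field offsets for stats_hdr_t under three sets of rules. The third one is a guess on my part, not something confirmed anywhere in this thread: that the writing program had 8-byte unsigned longs and packed the structure with no alignment padding. The helper offsets() and the list names are made up for the example.

```python
import struct

NAMES = ["spcid", "vsrid", "chanid", "bps", "srate", "error",
         "year", "doy", "sec", "freq", "orate", "nsubchan"]

# Field codes in declaration order, with 'L' for unsigned long.
DECLARED = ["H", "H", "H", "H", "L", "H", "H", "H", "L", "d", "L", "H"]
# Hypothetical variant: every unsigned long is 8 bytes ('Q').
PACKED64 = ["Q" if code == "L" else code for code in DECLARED]


def offsets(prefix, codes):
    """Return the starting byte offset of each field in `codes`."""
    result = []
    for i, code in enumerate(codes):
        # Size up to and including field i, minus the size of field i.
        end = struct.calcsize(prefix + "".join(codes[: i + 1]))
        result.append(end - struct.calcsize(prefix + code))
    return result


layouts = [
    ("standard sizes, no padding", "=", DECLARED),
    ("native to this machine", "@", DECLARED),     # result varies by platform
    ("8-byte long, packed (a guess)", "<", PACKED64),
]
for label, prefix, codes in layouts:
    total = struct.calcsize(prefix + "".join(codes))
    print(label, "->", total, "bytes")
    for name, off in zip(NAMES, offsets(prefix, codes)):
        print("   ", name, off)
```

Under the guessed layout the header is 48 bytes with srate at offset 8, sec at 22, freq at 30, orate at 38 and nsubchan at 46, which is where those values sit in the dump. If that guess is right, the "mystery" 4-byte blanks are simply the upper halves of 8-byte longs, and error would live at offset 16 rather than 12. Both are zero in the sample, so the dump alone cannot settle that.
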

Chuck