[SciPy-User] unpacking binary data from a C structure

Tue Apr 13 15:24:06 EDT 2010

scipy-user-request at scipy.org wrote:
> Today's Topics:
>
>    1. Re: unpacking binary data from a C structure (Scott Ransom)
>    2. Re: unpacking binary data from a C structure (Jason Grout)
>    3. Re: unpacking binary data from a C structure (Charles R Harris)
>    4. Re: SciPy-User Digest, Vol 80, Issue 22 (Tom Kuiper)
>    5. Re: SciPy-User Digest, Vol 80, Issue 22 (Scott Ransom)
>    6. Re: unpacking binary data from a C structure (Anne Archibald)
>    7. Re: SciPy-User Digest, Vol 80, Issue 22 (Charles R Harris)
>    8. Re: SciPy-User Digest, Vol 80, Issue 22 (Tom Kuiper)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 13 Apr 2010 10:57:38 -0400
> From: Scott Ransom <sransom at nrao.edu>
> Subject: Re: [SciPy-User] unpacking binary data from a C structure
> To: scipy-user at scipy.org
> Message-ID: <201004131057.38774.sransom at nrao.edu>
> Content-Type: Text/Plain;  charset="utf-8"
>
> On Tuesday 13 April 2010 10:41:12 am Robert Kern wrote:
>   
>> On Tue, Apr 13, 2010 at 08:20, Tom Kuiper <kuiper at jpl.nasa.gov> wrote:
>>     
>>> Dear list,
>>>
>>> here's something I find very strange.  I have a C structure defined
>>> as:
>>>
>>> typedef struct
>>> {
>>>  unsigned short spcid;         /* station id - 10, 40, 60, 21 */
>>>  unsigned short vsrid;         /* vsr1a, vsr1b ... from enum */
>>>  unsigned short chanid;        /* subchannel id 0,1,2,3 */
>>>  unsigned short bps;           /* number of bits per sample - 1, 2, 4, 8,
>>>                                   or 16 */
>>>  unsigned long  srate;         /* number of samples per second in
>>>                                   kilo-samples per second */
>>>  unsigned short error;         /* hw err flag, dma error or num_samples
>>>                                   error, 0 ==> no errors */
>>>  unsigned short year;          /* time tag - year */
>>>  unsigned short doy;           /* time tag - day of year */
>>>  unsigned long  sec;           /* time tag - second of day */
>>>  double         freq;          /* in Hz */
>>>  unsigned long  orate;         /* number of statistics samples per
>>>                                   second */
>>>  unsigned short nsubchan;      /* number of output sub chans */
>>> }
>>> stats_hdr_t;
>>>
>>> The format that Python's struct.unpack would be expected to need is
>>> 'HHHH L HHH L d L H'.  Here's a real header structure as it appears
>>> at the head of a file:
>>>
>>>  0000000  000d  0001  0006  0008
>>>  0000010  4240  000f  0000  0000
>>>  0000020  0000  07da  0064  4730
>>>  0000030  0001  0000  0000  0000
>>>  0000040  d800  d31d  421d  03e8
>>>  0000050  0000  0000  0000  0002
>>>
>>> Decoded as unsigned shorts:
>>>
>>>  0000000    13     1     6     8
>>>  0000010 16960    15     0     0
>>>  0000020     0  2010   100 18224
>>>  0000030     1     0     0     0
>>>  0000040 55296 54045 16925  1000
>>>  0000050     0     0     0     2
>>>
>>> Matching these to the stats_hdr_t with 'unpack' notation:
>>>
>>>  0000000     H     H     H     H
>>>  0000010    L1    L2     H     ?
>>>  0000020     ?     H     H    L1
>>>  0000030    L2     ?     ?    D1
>>>  0000040    D2    D3    D4    L1
>>>  0000050    L2     ?     ?     H
>>>
>>> So the actual format is 'HHHH L H xxxx HH L xxxx d L xxxx H'
>>> What are all the mystery 4-byte blanks?  This works:
>>>
>>> buf = fd.read(50)
>>> header = unpack_from('=4H LH2x 2x2HL4xdL4xH',buf)
>>>
>>> Since unpacking binary data must be a fairly common activity in
>>> scientific circles, I hope you will have some suggestions.
>>>       
>> C compilers can insert padding into the memory layout of a structure
>> in order to align certain members to certain boundaries, particularly
>> doubles. This behavior is compiler- and platform-specific.
>>     
>
> If the exact order of the elements in the structure is not important,
> you can mitigate this (but not prevent it entirely) by putting
> common types of structure elements together, with the largest ones
> first.  In your case, that would mean putting the double first, then
> grouping all the longs, and then putting all the shorts together last.
>
> Scott
>
>
> --
> Scott M. Ransom            Address:  NRAO
> Phone:  (434) 296-0320               520 Edgemont Rd.
> email:  sransom at nrao.edu             Charlottesville, VA 22903 USA
> GPG Fingerprint: 06A9 9553 78BE 16DB 407B  FFCA 9BFA B6FF FFD3 2989
>
>
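The effect of member order on padding can be sketched with ctypes. This is only an illustration under stated assumptions: the class names Declared and LargestFirst and the helper interior_gaps are hypothetical, and fixed-width stand-ins are used (uint16 for "unsigned short", uint32 for a 4-byte "unsigned long"), which may not match the compiler that wrote the file. Note that trailing padding can remain even when the interior holes are gone, which fits the "not prevent it entirely" caveat.

```python
import ctypes
import struct

# Assumed fixed-width stand-ins for the C types in stats_hdr_t.
U16, U32, F64 = ctypes.c_uint16, ctypes.c_uint32, ctypes.c_double


class Declared(ctypes.Structure):
    """Members in the order given in the original struct."""
    _fields_ = [
        ("spcid", U16), ("vsrid", U16), ("chanid", U16), ("bps", U16),
        ("srate", U32),
        ("error", U16), ("year", U16), ("doy", U16),
        ("sec", U32),
        ("freq", F64),
        ("orate", U32),
        ("nsubchan", U16),
    ]


class LargestFirst(ctypes.Structure):
    """Same members, grouped by type with the largest type first."""
    _fields_ = [
        ("freq", F64),
        ("srate", U32), ("sec", U32), ("orate", U32),
        ("spcid", U16), ("vsrid", U16), ("chanid", U16), ("bps", U16),
        ("error", U16), ("year", U16), ("doy", U16), ("nsubchan", U16),
    ]


def interior_gaps(cls):
    """Return (member, pad_bytes) for every hole in front of a member."""
    gaps = []
    end = 0
    for name, _ctype in cls._fields_:
        field = getattr(cls, name)
        if field.offset > end:
            gaps.append((name, field.offset - end))
        end = field.offset + field.size
    return gaps


# Size with no padding at all, using the same assumed widths.
PACKED_SIZE = struct.calcsize("<4HI3HIdIH")

for cls in (Declared, LargestFirst):
    print(cls.__name__, ctypes.sizeof(cls), interior_gaps(cls))
print("packed", PACKED_SIZE)
```

On a typical platform the declared order shows a hole in front of sec, while the reordered version has no interior holes and only (possibly) padding at the end.
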
> ------------------------------
>
> Message: 2
> Date: Tue, 13 Apr 2010 10:06:24 -0500
> From: Jason Grout <jason-sage at creativetrax.com>
>   
...
> If you are using gcc, this might be relevant:
>
> http://sig9.com/articles/gcc-packed-structures
>   
I've bookmarked that.  Thanks.
> Date: Tue, 13 Apr 2010 09:27:55 -0600
> From: Charles R Harris <charlesr.harris at gmail.com>
>   
....
> I presume you didn't produce the data,
A colleague did.  I'm encouraging him to re-compile as per the above.
>  but as a rule of thumb C structures
> should not be used to write out binary data, as the binary layout of the
> data won't be portable. Text, NetCDF, HDF5, or some other standard data
> format is preferable, with text being perhaps the most portable. That said,
> lots of old data collection programs write out C structures, and no doubt
> newer programs do so also.
>
> Chuck
>   
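Short of switching to a standard format, one way to make a raw header portable is to fix the byte order and field widths explicitly instead of dumping the in-memory structure. A minimal sketch in Python, assuming a little-endian layout with 32-bit counters; the names HEADER, FIELDS, pack_header and unpack_header are hypothetical and the values are only illustrative:

```python
import struct

# Assumed layout: "<" means little-endian, standard sizes, no padding.
# H = uint16, I = uint32, d = IEEE 754 double.
HEADER = struct.Struct("<4H I 3H I d I H")

FIELDS = (
    "spcid", "vsrid", "chanid", "bps", "srate", "error",
    "year", "doy", "sec", "freq", "orate", "nsubchan",
)


def pack_header(values):
    """Serialize a dict of header values into a fixed 36-byte record."""
    return HEADER.pack(*(values[name] for name in FIELDS))


def unpack_header(blob):
    """Parse a record written by pack_header back into a dict."""
    return dict(zip(FIELDS, HEADER.unpack(blob)))


# Illustrative values only.
example = {
    "spcid": 13, "vsrid": 1, "chanid": 6, "bps": 8, "srate": 1000000,
    "error": 0, "year": 2010, "doy": 100, "sec": 83760,
    "freq": 8.4e9, "orate": 1000, "nsubchan": 2,
}
blob = pack_header(example)
restored = unpack_header(blob)
print(len(blob), restored == example)
```

Because the format string spells out every width and the byte order, the same record should read back identically on any machine, whatever the compiler would have done with the C structure.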
....
> Date: Tue, 13 Apr 2010 11:55:53 -0400
> From: Anne Archibald <peridot.faceted at gmail.com>
>   
...
> There's also a FORTRAN binary format which one program I have to cope
> with uses; the exact layout of those data files depends on the
> compiler (g77 vs. gfortran) as well as the hardware. I'd also add FITS
> to the list of self-describing portable binary formats that Python
> supports well.
>   
Writing the data in FITS was on my TODO list.  Unfortunately, the
amount of data to be written is not known beforehand and is very large;
5-6 GB is not atypical.  I was going to write an intermediate file and
then, once I know how much data I have, convert it to FITS.  I was using
Python to take some sneak peeks at subsets of the data.
> Date: Tue, 13 Apr 2010 09:57:26 -0600
> From: Charles R Harris <charlesr.harris at gmail.com>
>   
...
>> That's probably not because of the padding.  It is likely due to the
>> fact that longs are 4 bytes on 32-bit machines and 8 bytes on 64-bit
>> machines.
>>     
> Unless you are using MSVC on 64-bit Windows,
Never!
> in which case they are still 32 bits, IIRC.
>   
Thanks for the warning.
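
The 8-byte-long explanation can be checked against the header dump quoted earlier. A minimal sketch, assuming the dump shows little-endian 16-bit words and that the structure was written without padding and with 8-byte unsigned longs ('Q' in struct notation); the names WORDS, PACKED_LP64 and WORKAROUND are hypothetical:

```python
import struct

# The 24 16-bit words from the hex dump, in file order.
WORDS = [
    0x000d, 0x0001, 0x0006, 0x0008,
    0x4240, 0x000f, 0x0000, 0x0000,
    0x0000, 0x07da, 0x0064, 0x4730,
    0x0001, 0x0000, 0x0000, 0x0000,
    0xd800, 0xd31d, 0x421d, 0x03e8,
    0x0000, 0x0000, 0x0000, 0x0002,
]
buf = struct.pack("<24H", *WORDS)  # rebuild the raw 48 bytes

# Assumption: no padding, each unsigned long stored as 8 bytes.
PACKED_LP64 = "<4H Q 3H Q d Q H"

# The format with 4-byte longs and pad bytes from the original message,
# with "<" instead of "=" so the sketch does not depend on the host
# byte order.
WORKAROUND = "<4H LH2x 2x2HL4xdL4xH"

header = struct.unpack(PACKED_LP64, buf)
padded = struct.unpack(WORKAROUND, buf)

print(struct.calcsize(PACKED_LP64), struct.calcsize(WORKAROUND))
print(header)
print(header == padded)
```

Both formats cover 48 bytes and give the same tuple for this header, but only because the upper halves of the 8-byte values happen to be zero; under these assumptions the "mystery blanks" would be the high-order bytes of the longs rather than padding.
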

Tom