[SciPy-User] unpacking binary data from a C structure
Tom Kuiper
kuiper at jpl.nasa.gov
Tue Apr 13 15:24:06 EDT 2010
scipy-user-request at scipy.org wrote:
> Today's Topics:
>
> 1. Re: unpacking binary data from a C structure (Scott Ransom)
> 2. Re: unpacking binary data from a C structure (Jason Grout)
> 3. Re: unpacking binary data from a C structure (Charles R Harris)
> 4. Re: SciPy-User Digest, Vol 80, Issue 22 (Tom Kuiper)
> 5. Re: SciPy-User Digest, Vol 80, Issue 22 (Scott Ransom)
> 6. Re: unpacking binary data from a C structure (Anne Archibald)
> 7. Re: SciPy-User Digest, Vol 80, Issue 22 (Charles R Harris)
> 8. Re: SciPy-User Digest, Vol 80, Issue 22 (Tom Kuiper)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 13 Apr 2010 10:57:38 -0400
> From: Scott Ransom <sransom at nrao.edu>
> Subject: Re: [SciPy-User] unpacking binary data from a C structure
> To: scipy-user at scipy.org
> Message-ID: <201004131057.38774.sransom at nrao.edu>
> Content-Type: Text/Plain; charset="utf-8"
>
> On Tuesday 13 April 2010 10:41:12 am Robert Kern wrote:
>
>> On Tue, Apr 13, 2010 at 08:20, Tom Kuiper <kuiper at jpl.nasa.gov> wrote:
>>
>>> Dear list,
>>>
>>> here's something I find very strange. I have a C structure defined
>>> as:
>>>
>>> typedef struct
>>> {
>>>   unsigned short spcid;    /* station id - 10, 40, 60, 21 */
>>>   unsigned short vsrid;    /* vsr1a, vsr1b ... from enum */
>>>   unsigned short chanid;   /* subchannel id 0,1,2,3 */
>>>   unsigned short bps;      /* number of bits per sample - 1, 2, 4, 8,
>>>                               or 16 */
>>>   unsigned long  srate;    /* number of samples per second in
>>>                               kilo-samples per second */
>>>   unsigned short error;    /* hw err flag, dma error or num_samples
>>>                               error, 0 ==> no errors */
>>>   unsigned short year;     /* time tag - year */
>>>   unsigned short doy;      /* time tag - day of year */
>>>   unsigned long  sec;      /* time tag - second of day */
>>>   double         freq;     /* in Hz */
>>>   unsigned long  orate;    /* number of statistics samples per
>>>                               second */
>>>   unsigned short nsubchan; /* number of output sub chans */
>>> }
>>> stats_hdr_t;
>>>
>>> The format I expected to need for the Python struct module's unpack
>>> is 'HHHH L HHH L d L H'. Here's a real header structure as it
>>> appears at the head of a file:
>>>
>>> 0000000 000d 0001 0006 0008
>>> 0000010 4240 000f 0000 0000
>>> 0000020 0000 07da 0064 4730
>>> 0000030 0001 0000 0000 0000
>>> 0000040 d800 d31d 421d 03e8
>>> 0000050 0000 0000 0000 0002
>>>
>>> Decoded as unsigned shorts:
>>>
>>> 0000000 13 1 6 8
>>> 0000010 16960 15 0 0
>>> 0000020 0 2010 100 18224
>>> 0000030 1 0 0 0
>>> 0000040 55296 54045 16925 1000
>>> 0000050 0 0 0 2
>>>
>>> Matching these to the stats_hdr_t with 'unpack' notation:
>>>
>>> 0000000 H H H H
>>> 0000010 L1 L2 H ?
>>> 0000020 ? H H L1
>>> 0000030 L2 ? ? D1
>>> 0000040 D2 D3 D4 L1
>>> 0000050 L2 ? ? H
>>>
>>> So the actual format is 'HHHH L H xxxx HH L xxxx d L xxxx H'
>>> What are all the mystery 4-byte blanks? This works:
>>>
>>> buf = fd.read(50)
>>> header = unpack_from('=4H LH2x 2x2HL4xdL4xH',buf)
>>>
>>> Since unpacking binary data must be a fairly common activity in
>>> scientific circles, I hope you will have some suggestions.
>>>
>> C compilers can insert padding into the memory layout of a structure
>> in order to align certain members to certain boundaries, particularly
>> doubles. This behavior is compiler- and platform-specific.
>>
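To illustrate the padding described above, here is a minimal sketch
using Python's struct module. It compares the size of the expected
format under standard sizes ('=', no alignment) with native sizes and
alignment ('@'). The format string is the one from the original
question. The native result depends on the platform, so only the
standard size is the same everywhere.

```python
import struct

# Field layout expected from the stats_hdr_t declaration in the question.
FIELDS = "HHHH L HHH L d L H"

# '=' means native byte order, standard sizes (H=2, L=4, d=8), no padding.
standard_size = struct.calcsize("=" + FIELDS)

# '@' means native byte order, native sizes and native alignment, so pad
# bytes may be inserted before some members (platform-dependent).
native_size = struct.calcsize("@" + FIELDS)

print("standard size:", standard_size)
print("native size:  ", native_size)
```

Any difference between the two numbers comes from pad bytes or from a
wider native long. Note that struct does not add trailing padding at the
end of a native format, whereas a C compiler may do so for the struct as
a whole.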
>
> If the exact order of the elements in the structure is not important,
> you can mitigate this (but not prevent it entirely) by putting
> common types of structure elements together, with the largest ones
> first. In your case, that would mean grouping all the longs first and
> then putting all the shorts together second.
>
> Scott
>
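A rough way to see the effect of reordering is to mirror the structure
with ctypes and compare sizes. This is only a sketch: StatsHdrSorted is
a hypothetical reordering (widest members first, which also moves the
double to the front), and the numbers printed depend on the platform's
sizes and alignment rules.

```python
import ctypes


class StatsHdr(ctypes.Structure):
    # Same member order as stats_hdr_t in the question.
    _fields_ = [
        ("spcid", ctypes.c_ushort),
        ("vsrid", ctypes.c_ushort),
        ("chanid", ctypes.c_ushort),
        ("bps", ctypes.c_ushort),
        ("srate", ctypes.c_ulong),
        ("error", ctypes.c_ushort),
        ("year", ctypes.c_ushort),
        ("doy", ctypes.c_ushort),
        ("sec", ctypes.c_ulong),
        ("freq", ctypes.c_double),
        ("orate", ctypes.c_ulong),
        ("nsubchan", ctypes.c_ushort),
    ]


class StatsHdrSorted(ctypes.Structure):
    # Hypothetical reordering: same members, widest types first.
    _fields_ = sorted(
        StatsHdr._fields_, key=lambda f: ctypes.sizeof(f[1]), reverse=True
    )


def padding_bytes(cls):
    """Bytes in the structure that do not belong to any member."""
    used = sum(ctypes.sizeof(ctype) for _, ctype in cls._fields_)
    return ctypes.sizeof(cls) - used


for cls in (StatsHdr, StatsHdrSorted):
    print(cls.__name__, ctypes.sizeof(cls), "bytes,",
          padding_bytes(cls), "of them padding")
```

The reordered layout has no gaps between members, but the compiler may
still pad the end of the structure, which is why reordering reduces the
problem without removing it.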
>
> --
> Scott M. Ransom            Address:  NRAO
> Phone:  (434) 296-0320               520 Edgemont Rd.
> email:  sransom at nrao.edu          Charlottesville, VA 22903 USA
> GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989
>
>
> ------------------------------
>
> Message: 2
> Date: Tue, 13 Apr 2010 10:06:24 -0500
> From: Jason Grout <jason-sage at creativetrax.com>
>
...
> If you are using gcc, this might be relevant:
>
> http://sig9.com/articles/gcc-packed-structures
>
I've bookmarked that. Thanks.
> Date: Tue, 13 Apr 2010 09:27:55 -0600
> From: Charles R Harris <charlesr.harris at gmail.com>
>
....
> I presume you didn't produce the data,
A colleague did. I'm encouraging him to re-compile as per the above.
> but as a rule of thumb c structures
> should not be used to write out binary data, as the binary layout of the
> data won't be portable. Text, netcdf, hdf5, or some other standard data
> format is preferable, with text being perhaps the most portable. That said,
> lots of old data collection programs write out c structures, and no doubt
> newer programs do so also.
>
> Chuck
>
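As a small illustration of the text option, here is a sketch that
writes a few header fields as JSON and reads them back. The field names
follow stats_hdr_t; the values are placeholders chosen for
illustration, and JSON is just one possible text format, not something
suggested in the thread.

```python
import json

# Field names follow stats_hdr_t; the values are illustrative placeholders.
header = {"spcid": 13, "vsrid": 1, "chanid": 6, "bps": 8,
          "year": 2010, "doy": 100}

# Text has no padding, no byte order and no platform-specific sizes.
text = json.dumps(header, sort_keys=True)
restored = json.loads(text)

print(text)
```

This is practical for a small header; for the bulk sample data a binary
container such as the ones named above is the more realistic choice.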
....
> Date: Tue, 13 Apr 2010 11:55:53 -0400
> From: Anne Archibald <peridot.faceted at gmail.com>
>
...
> There's also a FORTRAN binary format which one program I have to cope
> with uses; the exact layout of those data files depends on the
> compiler (g77 vs. gfortran) as well as the hardware. I'd also add FITS
> to the list of self-describing portable binary formats that python
> supports well.
>
Writing the data in FITS was on my TODO list. Unfortunately, the
amount of data to be written is not known beforehand and is very large;
5-6 GB is not atypical. My plan was to write an intermediate file and
then, once I know how much data I have, convert it to FITS. I was using
Python to take some sneak peeks at subsets of the data.
> Date: Tue, 13 Apr 2010 09:57:26 -0600
> From: Charles R Harris <charlesr.harris at gmail.com>
>
...
>> That's probably not because of the padding. It is likely due to the
>> fact that longs are 4 bytes on 32-bit machines and 8 bytes on 64-bit
>> machines.
>>
> Unless you are using MSVC on 64 bit windows,
Never!
> in which case they are still 32 bits, IIRC.
>
Thanks for the warning.
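
The point about long being 8 bytes on 64-bit machines can be checked
against the dump in the original question. The sketch below is only one
possible reading, under two assumptions that the thread does not
confirm: the dump shows little-endian 16-bit words, and the writer's
unsigned long was 8 bytes with no padding between members. The byte
order prefix '<' replaces the '=' used in the question so that the
result does not depend on the machine running it.

```python
import struct

# The 24 hex words from the dump, assumed to be little-endian 16-bit words.
WORDS = """
000d 0001 0006 0008
4240 000f 0000 0000
0000 07da 0064 4730
0001 0000 0000 0000
d800 d31d 421d 03e8
0000 0000 0000 0002
"""
buf = struct.pack("<24H", *(int(w, 16) for w in WORDS.split()))

NAMES = ("spcid vsrid chanid bps srate error year doy sec "
         "freq orate nsubchan").split()

# Assumption: unsigned long is 8 bytes (Q) and the members are packed.
packed64 = dict(zip(NAMES, struct.unpack("<4H Q 3H Q d Q H", buf)))

# The format that worked in the question, with '=' replaced by '<'.
with_skips = struct.unpack_from("<4H LH2x 2x2HL4xdL4xH", buf)

for name in NAMES:
    print(name, packed64[name])
```

Under these assumptions both formats return the same twelve values for
this header. In the second format, though, the skipped bytes would be
the upper halves of the 8-byte longs plus the real error field, and the
value reported as error would actually be part of srate, so the two
would disagree whenever error is nonzero.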
Tom