[Tutor] reading binary files

bob gailer bgailer at gmail.com
Wed Feb 4 19:15:35 CET 2009


eShopping wrote:
> Bob
>
> I am trying to read UNFORMATTED files.  The files also occur as 
> formatted files and the format string I provided is the string used to 
> write the formatted version.  I can read the formatted version OK.  I 
> (naively) assumed that the same format string was used for both files, 
> the only differences being whether the FORTRAN WRITE statement 
> indicated unformatted or formatted.

WRITE UNFORMATTED dumps memory to disk with no formatting. That is why we 
must do some analysis of the file to see where the data has been placed, 
how long the floats are, and which "endian" (byte order) is being used.
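A minimal sketch of that analysis, written for Python 3 (where a file opened with "rb" yields bytes). It assumes you already know one value in the file, for instance the 1 in the TIME header, and tries each byte order and float width until the decoded value matches. The helper names are hypothetical, and a known value of 0 would match every guess, so pick a nonzero one.

```python
import struct

def guess_int_order(raw, expected):
    """Return '<' or '>' if the first 4 bytes of raw decode to expected, else None."""
    if len(raw) < 4:
        return None
    for order in ("<", ">"):          # '<' little-endian, '>' big-endian
        if struct.unpack(order + "i", raw[:4])[0] == expected:
            return order
    return None

def guess_float_format(raw, expected):
    """Try 4-byte (f) and 8-byte (d) floats in both byte orders."""
    for fmt in ("<f", ">f", "<d", ">d"):
        size = struct.calcsize(fmt)
        if len(raw) >= size and struct.unpack(fmt, raw[:size])[0] == expected:
            return fmt
    return None

# Simulated bytes standing in for what was read from the real file.
print(guess_int_order(struct.pack(">i", 1), 1))          # >
print(guess_float_format(struct.pack("<d", 1.0), 1.0))   # <d
```

In practice you would pass a slice of the real file, starting at the offset where you believe the value sits.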

I'd like to examine the file myself. We might save a lot of time and 
energy that way. If it is not very large, would you attach it to your 
reply? If it is very large, you could either copy just the first 1000 or 
so bytes, or send the whole thing thru www.yousendit.com.
>
> At 21:41 03/02/2009, bob gailer wrote:
>> First question: are you trying to work with the file written 
>> UNFORMATTED? If so read on.

Well, did you read on? What reactions do you have?

>> eShopping wrote:
>>
>>>>> Data format:
>>>>>
>>>>> TIME      1  F  0.0
>>>>> DISTANCE 10  F  0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0
>>>>>
>>>>> F=float, D=double, L=logical, S=string etc
>>>>>
>>>>>
>>>>> The first part of the file should contain a string (eg "TIME"),
>>>>> an integer (1) and another string (eg "F") so I tried using
>>>>>
>>>>> import struct
>>>>> in_file = open(file_name+".dat","rb")
>>>>> data = in_file.read()
>>>>> items = struct.unpack('sds', data)
>>>>>
>>>>> Now I get the error
>>>>>
>>>>> error: unpack requires a string argument of length 17
>>>>>
>>>>> which has left me completely baffled!
>>>>
>>>> Did you open the file with mode 'b'? If not change that.
>>>>
>>>> You are passing the entire file to unpack when you should be giving 
>>>> it only the first "line". That's why it is complaining about the 
>>>> length. We need to figure out the lengths of the lines.
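A Python 3 sketch of handing unpack exactly one record. Two details explain the error: with no prefix, struct uses native alignment, so 'sds' pads the 1-byte s out to an 8-byte boundary before the double (1 + 7 + 8 + 1 = 17 on platforms that align doubles to 8 bytes); and in struct, d means a C double, not an integer (the 4-byte integer code is i). The sample bytes below are made up.

```python
import struct

fmt = "<sds"                      # '<': little-endian, standard sizes, no padding
size = struct.calcsize(fmt)       # 1 + 8 + 1 = 10 bytes
# Made-up file contents: one record followed by more data.
data = struct.pack(fmt, b"T", 1.0, b"F") + b"more data follows"

first = struct.unpack(fmt, data[:size])     # slice off exactly one record
same = struct.unpack_from(fmt, data, 0)     # or let unpack_from read at an offset
print(size, first)                          # 10 (b'T', 1.0, b'F')
```

unpack_from is convenient when walking through a file record by record, since you only advance the offset.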
>>>>
>>>> Consider the first "line"
>>>>
>>>> TIME      1  F  0.0
>>>>
>>>> There were (I assume) 4 FORTRAN variables written here: character, 
>>>> integer, character, float. Without knowing the lengths of the 
>>>> character variables we are at a loss as to what the struct format 
>>>> should be. Do you know their lengths? Is the last value a float or a 
>>>> double?
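Once the declarations are known, the format string follows directly from them. Purely as an illustration, suppose the header was declared CHARACTER*8, INTEGER*4, CHARACTER*1 and the values REAL*4. Those declarations, and the mapping of the F/D/L type letters from the data description to struct codes (L as a 4-byte integer), are assumptions, not something this thread confirms; strings (S) would also need a known length.

```python
import struct

# Hypothetical declarations: CHARACTER*8 NAME, INTEGER*4 N, CHARACTER*1 KIND.
HEADER_FMT = "<8si1s"            # little-endian, standard sizes: 8 + 4 + 1 = 13 bytes

# Type letters from the data description mapped to struct codes.
# 'L' as a 4-byte integer is an assumption (LOGICAL size is compiler-dependent).
TYPE_CODES = {"F": "f", "D": "d", "L": "i"}

def values_format(count, kind):
    """struct format for `count` values of type letter `kind`, e.g. '<10f'."""
    return "<%d%s" % (count, TYPE_CODES[kind])

header = struct.pack(HEADER_FMT, b"DISTANCE", 10, b"F")    # made-up record
name, count, kind = struct.unpack(HEADER_FMT, header)
kind = kind.decode("ascii")
fmt = values_format(count, kind)
print(name, count, kind, fmt, struct.calcsize(fmt))   # b'DISTANCE' 10 F <10f 40
```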
>>>>
>>>> Try this: print repr(data[:40]) (repr shows non-printable bytes as 
>>>> \x escapes). You should see something like:
>>>>
>>>> TIME...\x01\x00\x00\x00...F...\x00\x00\x00\x00...DISTANCE...\n\x00\x00\x00 
>>>>
>>>>
>>>> where ... means 0 or more intervening stuff. It might be that the 
>>>> \x01 and the \n are in other places, as we also have to deal with 
>>>> "byte order" issues.
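A hex dump shows offsets and exact byte values side by side, which makes field lengths and byte order easier to read than a printed string. This is a hypothetical helper written for Python 3, where iterating over bytes gives integers; the sample bytes are made up, so pass data[:40] (or all of data) from the real file instead.

```python
import struct

def hex_dump(data, width=16):
    """Return one line per `width` bytes: offset, hex values, printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join("%02x" % b for b in chunk)           # bytes iterate as ints
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append("%08x  %-*s  %s" % (offset, width * 3 - 1, hex_part, text_part))
    return lines

# Made-up sample: "TIME", a little-endian 4-byte 1, then "F".
sample = b"TIME" + struct.pack("<i", 1) + b"F"
print("\n".join(hex_dump(sample)))
```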
>>>>
>>>> Please do this and report back your results. And also the FORTRAN 
>>>> variable types if you have access to them.
>>>
>>> Apologies if this is getting a bit messy but the files are at a 
>>> remote location and I forgot to bring copies home.  I don't have 
>>> access to the original FORTRAN program so I tried to emulate 
>>> reading the data using the Python script below.  AFAIK the FORTRAN 
>>> format line for the header is  (1X, 1X, A8, 1X, 1X, I6, 1X, 1X, 
>>> A1).  If the data following is a float it is written using n(1X, 
>>> F6.2) where n is the number of records picked up from the preceding 
>>> header.
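For the formatted file, each descriptor in that format line is a fixed column width (1X is one blank, A8 eight characters, I6 a six-column integer, F6.2 a six-column float with two decimals), so the text can be read back by slicing columns rather than with struct. A sketch, assuming the header and the values each sit on their own line, as in the test program; the function names are hypothetical.

```python
def parse_header(line):
    """Fields of a line written with (1X,1X,A8,1X,1X,I6,1X,1X,A1)."""
    name = line[2:10].strip()      # columns 3-10: A8
    count = int(line[12:18])       # columns 13-18: I6
    kind = line[20:21]             # column 21: A1
    return name, count, kind

def parse_values(line, count, width=7):
    """`count` fields written with n(1X,F6.2): 1 blank + 6 columns = 7 each."""
    return [float(line[i * width:(i + 1) * width]) for i in range(count)]

header = "  %8s  %6d  %1s" % ("DISTANCE", 10, "F")      # as the test program writes it
values = "".join(" %6.2f" % x for x in range(10))
name, count, kind = parse_header(header)
print(name, count, kind)                   # DISTANCE 10 F
print(parse_values(values, count)[:3])     # [0.0, 1.0, 2.0]
```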
>>>
>>> # test program to read binary data
>>>
>>> import struct
>>>
>>> # create dummy data
>>>
>>> data = []
>>> for i in range(0,10):
>>>     data.append(float(i))
>>>
>>> # write data to binary file
>>>
>>> b_file = open("test.bin","wb")
>>>
>>> b_file.write("  %8s  %6d  %1s\n" % ("DISTANCE", len(data), "F"))
>>> for x in data:
>>>     b_file.write(" %6.2f" % x)
>>
>> You are still confusing text vs binary. The above writes text 
>> regardless of the file mode. If the FORTRAN file was written 
>> UNFORMATTED then you are NOT emulating that with the above program. 
>> The character data is read back in just fine, since there is no 
>> translation involved in the writing nor in the reading. The integer 
>> len(data) is being written as its text (character) representation 
>> (translating binary to text) but being read back in without 
>> translation. Also all the floating point data is going out as text.
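To emulate an UNFORMATTED write, the values have to be packed into binary with struct.pack rather than formatted with %. A Python 3 sketch follows. Sequential unformatted files from many compilers (gfortran by default, for example) also wrap each record in 4-byte length markers written before and after it; that framing, the little-endian byte order, and the 8/4/1 header layout are assumptions to check against the real file, not facts about it. An in-memory buffer stands in for test.bin.

```python
import io
import struct

def write_record(f, payload, order="<"):
    """Write one record framed by 4-byte length markers (assumed gfortran-style)."""
    marker = struct.pack(order + "i", len(payload))
    f.write(marker + payload + marker)

def read_record(f, order="<"):
    """Read one framed record; return None at end of file."""
    head = f.read(4)
    if len(head) < 4:
        return None
    (length,) = struct.unpack(order + "i", head)
    payload = f.read(length)
    (tail,) = struct.unpack(order + "i", f.read(4))
    if tail != length:
        raise ValueError("record markers disagree: %d != %d" % (length, tail))
    return payload

values = [float(i) for i in range(10)]
buf = io.BytesIO()                      # stands in for open("test.bin", "wb")
write_record(buf, struct.pack("<8si1s", b"DISTANCE", len(values), b"F"))
write_record(buf, struct.pack("<%df" % len(values), *values))

buf.seek(0)                             # rewind, as if reopening with "rb"
name, count, kind = struct.unpack("<8si1s", read_record(buf))
floats = struct.unpack("<%df" % count, read_record(buf))
print(name, count, kind, floats[:3])    # b'DISTANCE' 10 b'F' (0.0, 1.0, 2.0)
print(read_record(buf))                 # None
```

If the real file turns out to have no markers (for example a direct-access or stream file), drop the framing and unpack the records back to back.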
>>
>> The file looks like this (as Notepad would show it, where b = blank):
>>
>> bbDISTANCEbbbbbb10bbF
>> bbb0.00bbb1.00bbb2.00...
>>
>> If you analyze this with 2s8s2si2s1s you will see 2s matches bb, 8s 
>> matches DISTANCE, 2s matches bb, and i matches bbbb (\x20\x20\x20\x20). 
>> The i tells unpack to shove those 4 bytes unaltered into a Python 
>> integer, resulting in 538976288 (0x20202020). The final 2s and 1s then 
>> pick up 10 and b, which is exactly the tuple you got. You can verify 
>> that:
>>
>> >>> struct.unpack('i', '    ')
>> (538976288,)
>>
>> Please either assure me you understand or are prepared for a more 
>> in-depth tutorial.
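The analysis above can be checked in Python 3, where the data must be bytes, along with one way to read the text header correctly: treat the I6 field as six characters and convert it with int(). Because all four bytes of 0x20202020 are identical, byte order makes no difference to the 538976288.

```python
import struct

# The header line the test program writes, as bytes.
line = ("  %8s  %6d  %1s\n" % ("DISTANCE", 10, "F")).encode("ascii")

# Native 'i' sits at offset 12 and swallows four blanks (0x20 each).
fmt = "2s8s2si2s1s"
print(struct.calcsize(fmt))                 # 19
print(struct.unpack(fmt, line[:19]))        # (b'  ', b'DISTANCE', b'  ', 538976288, b'10', b' ')
print(0x20202020)                           # 538976288

# Reading it as text: take the I6 field as 6 characters, then convert.
text_fmt = "2s8s2s6s2s1s"
_, name, _, count, _, kind = struct.unpack(text_fmt, line[:struct.calcsize(text_fmt)])
print(name, int(count), kind)               # b'DISTANCE' 10 b'F'
```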
>>> b_file.close()
>>>
>>> # read back data from file
>>>
>>> c_file = open("test.bin","rb")
>>>
>>> data = c_file.read()
>>> start, stop = 0, struct.calcsize("2s8s2si2s1s")
>>>
>>> items = struct.unpack("2s8s2si2s1s",data[start:stop])
>>> print items
>>> print data[:40]
>>>
>>> I'm pretty sure that when I tried this at the other PC there were a 
>>> bunch of \x00\x00 characters in the file but they don't appear in 
>>> NotePad  ... anyway, I thought the Python above would unpack the 
>>> data but items appears as
>>>
>>> ('  ', 'DISTANCE', '  ', 538976288, '10', ' ')
>>>
>>> which seems to contain an extra item (538976288)
>>>
>>> Alun Griffiths
>>
>>
>> -- 
>> Bob Gailer
>> Chapel Hill NC
>> 919-636-4239



