[SciPy-User] writing data to binary for fortran

Fri May 14 10:30:37 EDT 2010

On 2010-05-14 03:57 , Francesc Alted wrote:
> On Tuesday 11 May 2010 21:09:32, Gideon wrote:
>> I've previously used the FortranFile.py to read in binary data
>> generated by fortran computations, but now I'd like to write data from
>> NumPy/SciPy to binary which can be read in by a fortran program.  Does
>> anyone have an example of using fortranfile.py to create and write
>> data to binary?  Alternatively, can anyone suggest a way to write
>> numpy arrays to binary in a way that permits me to specify the correct
>> offset (4 bytes on my machine) for fortran to then properly read the
>> data in?
>
> Just to complement the other solutions offered, I'm attaching a BinaryFile
> class that allows you to read/write Fortran files (in general, binary files).
> From its docstring:
>
> """
> BinaryFile: A class for accessing data to/from large binary files
> =================================================================
>
> The data is meant to be read/write sequentially from/to a binary file.
> One can request to read a piece of data with a specific type and shape
> from it.  Also, it supports the notion of Fortran and C ordered data,
> so that the returned data is always well-behaved (C-contiguous and
> aligned).
>
> This class is seeking capable.
> """
>
> It differs from the solutions that others presented here in that it does not
> use the struct module at all, so it is much faster.  For example, when
> using Neil's fortranfile module, one gets:
>
> In [1]: import fortranfile
>
> In [2]: import numpy as np
>
> In [3]: f = fortranfile.FortranFile('/tmp/test.unf',mode='w')
>
> In [5]: time f.writeReals(np.arange(1e7))
> CPU times: user 6.06 s, sys: 0.14 s, total: 6.21 s
> Wall time: 6.41 s
>
> In [7]: f.close()
>
> In [8]: f = fortranfile.FortranFile('/tmp/test.unf',mode='r')
>
> In [9]: time f.readReals()
> CPU times: user 0.64 s, sys: 0.35 s, total: 0.99 s
> Wall time: 1.00 s
> Out[10]:
> array([  0.00000000e+00,   1.00000000e+00,   2.00000000e+00, ...,
>           9.99999700e+06,   9.99999800e+06,   9.99999900e+06], dtype=float32)
>
> while using my binaryfile module gives:
>
> In [1]: import numpy as np
>
> In [2]: from binaryfile import BinaryFile
>
> In [3]: f = BinaryFile('/tmp/test.bin', mode="w+", order='fortran')
>
> In [4]: time f.write(np.arange(1e7))
> CPU times: user 0.04 s, sys: 0.19 s, total: 0.24 s
> Wall time: 0.24 s        # 26x faster than fortranfile
>
> In [6]: f.seek(0)
>
> In [7]: time f.read('f8', (int(1e7),))
> CPU times: user 0.03 s, sys: 0.12 s, total: 0.15 s
> Wall time: 0.15 s       # 6.6 times faster than fortranfile
> Out[8]:
> array([  0.00000000e+00,   1.00000000e+00,   2.00000000e+00, ...,
>           9.99999700e+06,   9.99999800e+06,   9.99999900e+06])
>
> Also, binaryfile supports all the types in NumPy, even strings and records.

Wonderful speed!  But, alas, binaryfile does not produce Fortran
unformatted output.  The format that you've written is what Fortran
calls stream output, a relatively recent addition to the language
(Fortran 2003).  While fortranfile is certainly slow because it uses the
struct module for all writes and reads, that approach lets it read and
write Fortran's record-oriented (not like numpy records) format with a
great deal of flexibility.  It was designed to be able to read data files
created by Fortran simulation codes that may have been produced on
machines with different integer sizes and endian-ness than the machine
doing the reading.  Your binaryfile does not do this, although I do not
doubt that it could be done.  Any improvements that make fortranfile
faster will be gladly accepted!
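
To make the difference concrete: a record in Fortran's sequential
unformatted format is typically laid out as a byte-count marker, then the
data, then the same marker again, whereas stream output is the data alone.
Below is a minimal sketch of that layout using only numpy, with no struct
calls.  It assumes 4-byte little-endian markers (the marker size and byte
order depend on the compiler and platform), and the helper names
write_fortran_record and read_fortran_record are hypothetical; they are not
part of fortranfile or binaryfile.

```python
import io

import numpy as np


def write_fortran_record(fh, array, marker_dtype="<i4"):
    """Write one record: length marker, payload, length marker.

    ASSUMPTION: the marker is a 4-byte little-endian integer; other
    compilers or platforms may use a different size or byte order.
    """
    data = np.asarray(array)
    # Dump the raw bytes in column-major (Fortran) element order.
    payload = data.tobytes(order="F")
    marker = np.array([len(payload)], dtype=marker_dtype).tobytes()
    fh.write(marker)
    fh.write(payload)
    fh.write(marker)


def read_fortran_record(fh, dtype, marker_dtype="<i4"):
    """Read one record and return its payload as a flat array of `dtype`."""
    marker_size = np.dtype(marker_dtype).itemsize
    head = fh.read(marker_size)
    if len(head) < marker_size:
        raise EOFError("no complete record marker left in the file")
    nbytes = int(np.frombuffer(head, dtype=marker_dtype)[0])
    payload = fh.read(nbytes)
    tail = fh.read(marker_size)
    if tail != head:
        raise ValueError("leading and trailing record markers differ")
    return np.frombuffer(payload, dtype=dtype)


# Round trip through an in-memory buffer standing in for a real file.
buf = io.BytesIO()
values = np.arange(5, dtype="<f8")
write_fortran_record(buf, values)
buf.seek(0)
roundtrip = read_fortran_record(buf, "<f8")
```

Because the payload goes out in a single tobytes() call, the cost per
record does not grow with per-element struct packing.  Handling files from
a machine with different conventions would mean passing a different
marker_dtype (for example ">i4" or "<i8") and a matching data dtype.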

-Neil