How to write fast into a file in python?

Steven D'Aprano steve+comp.lang.python at pearwood.info
Fri May 17 13:47:02 EDT 2013


On Fri, 17 May 2013 18:20:33 +0300, Carlos Nepomuceno wrote:

> ### fastwrite5.py ###
> import cStringIO
> size = 50*1024*1024
> value = 0
> filename = 'fastwrite5.dat'
> x = 0
> b = cStringIO.StringIO()
> while x < size:
>     line = '{0}\n'.format(value)
>     b.write(line)
>     value += 1
>     x += len(line)+1

Oh, I forgot to mention: you have a bug in this loop. len(line) already 
includes the newline, so there is no need to add one. As a result, you 
only generate 44MB instead of 50MB.
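
For reference, here is a minimal sketch of the loop with the counting 
fixed. The function name generate_lines and the small size in the usage 
line are made up for illustration, and I have swapped cStringIO for a 
plain list plus str.join() so the same code runs on Python 2 and 3:

```python
def generate_lines(size):
    """Return consecutive integers, one per line, at least `size` characters long."""
    chunks = []
    value = 0
    written = 0
    while written < size:
        line = '{0}\n'.format(value)
        chunks.append(line)
        value += 1
        written += len(line)  # len(line) already counts the '\n'
    return ''.join(chunks)

# Small illustrative size; the original script used 50*1024*1024.
data = generate_lines(1024)
print(len(data))
```

The loop stops at the first line that reaches the target, so the result 
can overshoot the requested size by at most one line.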

> f = open(filename, 'w')
> f.write(b.getvalue())
> f.close()
> b.close()

Here are the results of profiling the above on my computer. Including the 
overhead of the profiler, it takes just over 50 seconds to run.

[steve at ando ~]$ python -m cProfile fastwrite5.py
         17846645 function calls in 53.575 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   30.561   30.561   53.575   53.575 fastwrite5.py:1(<module>)
        1    0.000    0.000    0.000    0.000 {cStringIO.StringIO}
  5948879    5.582    0.000    5.582    0.000 {len}
        1    0.004    0.004    0.004    0.004 {method 'close' of 'cStringIO.StringO' objects}
        1    0.000    0.000    0.000    0.000 {method 'close' of 'file' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
  5948879    9.979    0.000    9.979    0.000 {method 'format' of 'str' objects}
        1    0.103    0.103    0.103    0.103 {method 'getvalue' of 'cStringIO.StringO' objects}
  5948879    7.135    0.000    7.135    0.000 {method 'write' of 'cStringIO.StringO' objects}
        1    0.211    0.211    0.211    0.211 {method 'write' of 'file' objects}
        1    0.000    0.000    0.000    0.000 {open}


As you can see, the time is dominated by the loop and its repeated calls 
to len(), str.format() and StringIO.write(). Actually writing the data to 
the file is a very small percentage of the cumulative time (about 0.2 of 
the 53.6 seconds).
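
The listing above is ordered by name, which makes the hot spots harder to 
pick out than they need to be. Here is a rough sketch of getting the same 
kind of report sorted by internal time, using the standard cProfile and 
pstats modules on a scaled-down version of the loop. The helper name 
build_lines and the count of 10000 are assumptions for illustration, and 
it is written for Python 3 (hence io.StringIO rather than cStringIO):

```python
import cProfile
import io
import pstats

def build_lines(count):
    """Scaled-down stand-in for the fastwrite5 loop (hypothetical helper)."""
    buf = io.StringIO()
    for value in range(count):
        line = '{0}\n'.format(value)
        buf.write(line)
    return buf.getvalue()

profiler = cProfile.Profile()
profiler.enable()
result = build_lines(10000)
profiler.disable()

# Collect the report in a string, sorted by time spent inside each function.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats('tottime').print_stats(5)  # show only the top 5 entries
print(report.getvalue())
```

From the command line, "python -m cProfile -s time fastwrite5.py" gives 
the same ordering without touching the script.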

So, here's another version, this time using a pre-calculated limit. I 
cheated and just copied the result from the fastwrite5 output :-)

# fasterwrite.py
filename = 'fasterwrite.dat'
with open(filename, 'w') as f:
    for i in xrange(5948879):  # Actually only 44MB, not 50MB.
        f.write('%d\n' % i)
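
Rather than copying the limit from an earlier run, it can be calculated. 
A sketch, assuming the goal is the smallest number of lines "0", "1", 
"2", ... (each followed by a newline) whose total length reaches a target 
size. lines_needed is a hypothetical helper, not something from either 
script:

```python
def lines_needed(size):
    """Smallest n such that writing 0..n-1, one per line, gives at least `size` bytes."""
    count = 0   # lines accounted for so far
    total = 0   # bytes accounted for so far
    digits = 1
    while total < size:
        # Number of non-negative integers with exactly `digits` digits.
        group = 10 if digits == 1 else 9 * 10 ** (digits - 1)
        per_line = digits + 1  # the digits plus the newline
        remaining = size - total
        if group * per_line >= remaining:
            # The target is reached inside this group; round up.
            return count + (remaining + per_line - 1) // per_line
        count += group
        total += group * per_line
        digits += 1
    return count

print(lines_needed(50 * 1024 * 1024))
```

For the full 50MB this gives a larger count than the 5948879 used above, 
since that figure comes from the loop that stops early.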


And the profiled run is about twice as fast as fastwrite5 above (29 
seconds versus 54), with only 8 seconds in total spent writing to my HDD.

[steve at ando ~]$ python -m cProfile fasterwrite.py
         5948882 function calls in 28.840 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   20.592   20.592   28.840   28.840 fasterwrite.py:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
  5948879    8.229    0.000    8.229    0.000 {method 'write' of 'file' objects}
        1    0.019    0.019    0.019    0.019 {open}


Without the overhead of the profiler, it is faster still:

[steve at ando ~]$ time python fasterwrite.py

real    0m16.187s
user    0m13.553s
sys     0m0.508s


It is still slower than the heavily optimized dd command, but not 
unreasonably slow for a high-level language:

[steve at ando ~]$ time dd if=fasterwrite.dat of=copy.dat
90781+1 records in
90781+1 records out
46479922 bytes (46 MB) copied, 0.737009 seconds, 63.1 MB/s

real    0m0.786s
user    0m0.071s
sys     0m0.595s




-- 
Steven


