How to write fast into a file in python?

Carlos Nepomuceno carlosnepomuceno at outlook.com
Fri May 17 13:25:37 EDT 2013


Thank you Steve! You are totally right!

It takes about 0.2s for f.write() to return, almost certainly because it's writing to the system file cache (~250MB/s).

Using a slightly different approach I've got:

C:\src\Python>python -m timeit -cvn3 -r3 -s"from fastwrite5r import run" "run()"
raw times: 24 25.1 24.4
3 loops, best of 3: 8 sec per loop
    

This time it took 8s to complete, down from the previous 11.3s.

Are those 3.3s the time taken by the "open, read, parse, compile" steps you mentioned?

If so, the execute step really takes 8s, right?

Why does it take so long to build the string to be written? Can it be made faster?

Thanks in advance!



### fastwrite5r.py ###
def run():
    import cStringIO
    size = 50*1024*1024           # target payload: 50 MB
    value = 0
    filename = 'fastwrite5.dat'
    x = 0                         # bytes generated so far
    b = cStringIO.StringIO()      # buffer everything in memory first
    while x < size:
        line = '{0}\n'.format(value)
        b.write(line)
        value += 1
        x += len(line)+1          # +1 because text mode on Windows writes '\n' as '\r\n'
    f = open(filename, 'w')       # then write the whole buffer in one call
    f.write(b.getvalue())
    f.close()
    b.close()

if __name__ == '__main__':
    run()
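For comparison, the same string can be built by collecting the lines in a list and joining them once at the end, which avoids the per-call overhead of millions of small StringIO writes. This is just a sketch in Python 3 syntax (io.StringIO replaces cStringIO there); the default size and the filename fastwrite5_join.dat are made up for illustration:

```python
def run_join(size=1024 * 1024, filename='fastwrite5_join.dat'):
    # Collect the formatted lines in a list...
    lines = []
    total = 0                 # bytes generated so far
    value = 0
    while total < size:
        line = '%d\n' % value
        lines.append(line)
        total += len(line)
        value += 1
    # ...then concatenate once; ''.join() is linear in the total
    # length, unlike repeated string concatenation.
    data = ''.join(lines)
    with open(filename, 'w') as f:
        f.write(data)         # single write() call to the file
    return total

if __name__ == '__main__':
    print(run_join())
```

Whether this beats the StringIO version in practice would need measuring with timeit, but it removes one method call per line from the inner loop.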





----------------------------------------
> From: steve+comp.lang.python at pearwood.info
> Subject: Re: How to write fast into a file in python?
> Date: Fri, 17 May 2013 16:42:55 +0000
> To: python-list at python.org
>
> On Fri, 17 May 2013 18:20:33 +0300, Carlos Nepomuceno wrote:
>
>> I've got the following results on my desktop PC (Win7/Python2.7.5):
>>
>> C:\src\Python>python -m timeit -cvn3 -r3 "execfile('fastwrite2.py')" raw
>> times: 123 126 125
>> 3 loops, best of 3: 41 sec per loop
>
> Your times here are increased significantly by using execfile. Using
> execfile means that instead of compiling the code once, then executing
> many times, it gets compiled over and over and over and over again. In my
> experience, using exec, execfile or eval makes your code ten or twenty
> times slower:
>
> [steve at ando ~]$ python -m timeit 'x = 100; y = x/3'
> 1000000 loops, best of 3: 0.175 usec per loop
> [steve at ando ~]$ python -m timeit 'exec("x = 100; y = x/3")'
> 10000 loops, best of 3: 37.8 usec per loop
>
>
>> Strangely I just realised that the time it takes to complete such
>> scripts is the same no matter what hard drive I choose to run them. The
>> results are the same for an SSD (main drive) and a HDD.
>
> There's nothing strange here. The time you measure is dominated by three
> things, in reducing order of importance:
>
> * the poor choice of execfile dominates the time taken;
>
> * followed by choice of algorithm;
>
> * followed by the time it actually takes to write to the disk, which is
> probably insignificant compared to the other two, regardless of whether
> you are using a HDD or SSD.
>
> Until you optimize the code, optimizing the media is a waste of time.
>
>
>> I think it's very strange to take 11.3s to write 50MB (4.4MB/s)
>> sequentially on a SSD which is capable of 140MB/s.
>
> It doesn't. It takes 11.3 seconds to open a file, read it into memory,
> parse it, compile it into byte-code, and only then execute it. My
> prediction is that the call to f.write() and f.close() probably take a
> fraction of a second, and nearly all of the rest of the time is taken by
> other calculations.
>
>
>
> --
> Steven
> --
> http://mail.python.org/mailman/listinfo/python-list
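Steve's point about exec() overhead is easy to reproduce with the timeit module. A minimal sketch (Python 3 syntax, so // is used for the integer division):

```python
import timeit

# Statement compiled once by timeit, then executed repeatedly.
direct = timeit.timeit('x = 100; y = x // 3', number=20000)

# exec() re-parses and re-compiles the string on every iteration,
# which dominates the measured time.
execed = timeit.timeit('exec("x = 100; y = x // 3")', number=20000)

print('direct: %.6f s  exec: %.6f s' % (direct, execed))
```

On any CPython build the exec() variant should come out at least an order of magnitude slower, matching Steve's numbers above.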