python disk i/o speed

Bengt Richter bokr at oz.net
Wed Aug 7 20:53:21 EDT 2002


On Wed, 07 Aug 2002 15:52:13 +0000, Martin Franklin <mfranklin1 at gatwick.westerngeco.slb.com> wrote:

>On Wednesday 07 Aug 2002 2:40 pm, Jeff Epler wrote:
>> On Wed, Aug 07, 2002 at 07:21:28AM -0700, nnes wrote:
>> > I generated a file about 7MB long, with 3 numbers on each line. Then I
>> > wrote a program in Python, Java and ANSI C, generating a second file
>> > based on the first one, with 4 numbers; the original 3 plus the sum of
>> > these.
>> > e.g. "2","5","1" ----> "2","5","1","8"
>>
>> [...]
>>
>> > I wondered about the reason for almost 10 times the difference from C
>> > to Python, since the programs should be mostly I/O bound and not CPU
>> > bound. Is there also a way of improving the speed for Python in this
>> > situation? If somebody wants to make comments on the C and the Java
>> > code it would be ok also, since I am not an expert programmer.
>>
>> On any modern machine, reading a 7MB file a second time will not be "I/O
>> bound", because it will be in cache, and should be read at nearly the
>> speed of memcpy(), if not mmap().
>>
>> BTW, here's my attempt at a Python program.  Not having your programs, I
>> can't compare performance:
>>
>> import sys, re
>>
>> pat = re.compile(r'"(\d+)","(\d+)","(\d+)"')
>> for line in sys.stdin:
>>     match = pat.match(line)
>> #   if not match:
>> #       sys.stdout.write(line)
>>     a, b, c = map(int, match.group(1, 2, 3))
>>     sys.stdout.write('"%s","%s","%s","%s"\n' % (a,b,c, a+b+c))
>>
>> Remember that you can shave another ~5% off of Python runtime by using
>> 'python -O'.  Also, you could attempt to measure the startup time, which
>> is likely to be smaller for C, and larger for Python and Java.
>>
>> Jeff
>
>
>And here is my python version:-
>
>
>
>file=open('bigdata.dat', 'rt')
>fout=open('bigdata.out', 'wt')
>
>for line in file:
>    a, b, c=map(int, line.split())
>    d=a+b+c
>    fout.write("%i %i %i %i\n" %(a, b, c, d))
A nit: the OP wanted quotes and commas. Also, borrowing from Jeff's version above, you don't need 'd':
     fout.write('"%s","%s","%s","%s"\n' % (a,b,c, a+b+c))
>fout.close()
>
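Note that line.split() splits on whitespace, so it will not pick apart the OP's quoted, comma-separated lines either. A minimal sketch of a version that does, assuming every line holds exactly three double-quoted integers with no embedded commas (add_sum and convert are hypothetical names of mine, not from anyone's posted code):

```python
def add_sum(line):
    """Turn '"2","5","1"' into '"2","5","1","8"' followed by a newline."""
    # Assumption: exactly three double-quoted integer fields per line.
    fields = line.strip().split(',')
    a, b, c = [int(field.strip('"')) for field in fields]
    return '"%s","%s","%s","%s"\n' % (a, b, c, a + b + c)


def convert(src_path, dst_path):
    """Copy src_path to dst_path, appending the sum to each line."""
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for line in src:
            dst.write(add_sum(line))
```

Something like convert('bigdata.dat', 'bigdata.out') would then stand in for the loop above. I have not timed it against the regex version.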
Don't know whether %i is faster than %s, but I guess it could be.
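Rather than guess, one could measure it. A rough sketch using the standard timeit module; the sample values and the function names fmt_i, fmt_s and compare are just my own choices for illustration, and the numbers will vary by machine and Python version:

```python
import timeit

# Hypothetical sample values; any three small ints would do.
a, b, c = 2, 5, 1


def fmt_i():
    # Integer conversion, as in Martin's version (with quotes and commas added).
    return '"%i","%i","%i","%i"\n' % (a, b, c, a + b + c)


def fmt_s():
    # String conversion, as in Jeff's version.
    return '"%s","%s","%s","%s"\n' % (a, b, c, a + b + c)


def compare(number=100000, repeat=5):
    """Return the best-of-`repeat` time in seconds for each format."""
    return {
        '%i': min(timeit.repeat(fmt_i, number=number, repeat=repeat)),
        '%s': min(timeit.repeat(fmt_s, number=number, repeat=repeat)),
    }


if __name__ == '__main__':
    for name, seconds in sorted(compare().items()):
        print('%s: %.4f s' % (name, seconds))
```

Either way, the formatting is only one part of the per-line cost, so any difference here may not show up much in the total run time.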
>
>And the results from my 'c' version:-
>
>cc speedTest2.c 
>speedTest2.c:19:1: warning: no newline at end of file
[...etc... ]
>Yes, that's right, I could not compile the 'c' version <wink>

Regards,
Bengt Richter


