python disk i/o speed

Wed Aug 7 10:40:10 EDT 2002

On Wed, Aug 07, 2002 at 07:21:28AM -0700, nnes wrote:
> I generated a file about 7MB long, with 3 numbers on each line. Then I
> wrote a program in Python, Java and ANSI C, generating a second file
> based on the first one, with 4 numbers; the original 3 plus the sum of
> these.
> e.g. "2","5","1" ----> "2","5","1","8"
[...]
> I wondered about the reason for the almost 10 times difference from C
> to Python, since the programs should be mostly I/O bound and not CPU
> bound. Is there also a way of improving the speed for Python in this
> situation? If somebody wants to comment on the C and the Java
> code, that would be OK too, since I am not an expert programmer.

On any modern machine, reading a 7MB file a second time will not be "I/O
bound", because it will be in the operating system's file cache, and
should be read at nearly the speed of memcpy(), if not mmap().
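
One way to check the caching effect is to time the same read twice. This
is only a rough sketch: time_read() is a name made up for illustration,
and the temporary file is a hypothetical stand-in for the 7MB input.
Because the stand-in has just been written, even the first read probably
comes from cache; point time_read() at a file that has not been touched
recently to see a cold read.

```python
import os
import tempfile
import time


def time_read(path, chunk_size=1 << 20):
    """Read the whole file in chunks; return (bytes_read, elapsed_seconds)."""
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical stand-in for the input file: about 7MB of 3-number lines.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b'"2","5","1"\n' * 600000)
    try:
        for attempt in (1, 2):
            size, secs = time_read(tmp.name)
            print("read %d: %d bytes in %.4f s" % (attempt, size, secs))
    finally:
        os.unlink(tmp.name)
```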

BTW, here's my attempt at a Python program.  Not having your programs, I
can't compare performance:

import sys, re

# Three quoted integer fields, e.g. "2","5","1"
pat = re.compile(r'"(\d+)","(\d+)","(\d+)"')
for line in sys.stdin:
    match = pat.match(line)
    if not match:
        # Pass lines that don't fit the pattern through unchanged.
        sys.stdout.write(line)
        continue
    a, b, c = map(int, match.group(1, 2, 3))
    sys.stdout.write('"%s","%s","%s","%s"\n' % (a, b, c, a + b + c))
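
If the regex turns out to be a bottleneck, a variant of the loop above
that uses plain string methods might be worth timing. This is an
untested-for-speed sketch, assuming every line holds only quoted integers
separated by commas; add_sum() and process() are hypothetical names, and
I have not measured whether this is actually faster.

```python
import sys


def add_sum(line):
    """Return the line with an extra quoted field holding the sum of its fields."""
    fields = line.rstrip("\r\n").split(",")
    # Assumes each field is a quoted integer such as "2".
    total = sum(int(field.strip('"')) for field in fields)
    return '%s,"%d"\n' % (",".join(fields), total)


def process(lines, out):
    """Write the extended version of every input line to out."""
    for line in lines:
        out.write(add_sum(line))


if __name__ == "__main__":
    # Small demonstration on the example line from the original question.
    process(['"2","5","1"\n'], sys.stdout)  # prints "2","5","1","8"
```

To run it as a filter, call process(sys.stdin, sys.stdout) instead of the
demonstration line.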

Remember that you can shave another ~5% off the Python runtime by running
with 'python -O'.  Also, you could measure the startup time separately,
which is likely to be smaller for C, and larger for Python and Java.
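
Startup time can be estimated by timing a run that does no work. A
minimal sketch, with startup_time() as a made-up helper name: it launches
the given command several times and keeps the best wall-clock time. The
figure includes process-creation overhead, so treat it as an upper bound.

```python
import subprocess
import sys
import time


def startup_time(command, runs=5):
    """Return the best wall-clock time in seconds over `runs` launches of command."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # An interpreter that starts, executes "pass", and exits.
    cmd = [sys.executable, "-c", "pass"]
    print("interpreter startup: %.4f s" % startup_time(cmd))
```

The same function can time a do-nothing C or Java program by passing its
command line as the list, and the result can then be subtracted from each
program's total runtime.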

Jeff