Python IO performance?

Bengt Richter bokr at oz.net
Wed Jun 4 17:06:34 EDT 2003


On 31 May 2003 10:10:12 -0400, aahz at pythoncraft.com (Aahz) wrote:

>In article <m33civ37b2.fsf at localhost.localdomain>,
>Ganesan R  <rganesan at myrealbox.com> wrote:
>>
>>I apologize for bringing up this topic again. I am sure I am missing
>>something obvious. I am a relatively recent perl to python convert
>>and I always had a nagging feeling that processing text files with
>>python was slow. 
>
>Although there has been some speedup in Python for this, Perl's current
>design wins over Python for two reasons (please let me know if my info
>is out of date):
>
>* Perl does not use thread-safe file I/O.  Even running in single-thread
>mode, making calls into the thread-safe OS subsystem slows things down.
I believe the thread-safety slowdown may be real, but the question to me
is why. Certainly the actual machine instruction sequences involved in these
calls take nowhere near the milliseconds of delay observed; they should be
on the order of microseconds, not milliseconds, on modern CPUs. A 7200 rpm disk
rotates at 8 1/3 ms per revolution. That, plus inter-cylinder seek step times etc., defines
a physical limit to the streaming rate of data, once file structure info has been
cached, the initial seek is past, and buffers are full (of course RAID and/or multiple IDE
disks can make things more complicated). So the question becomes: why would optimal
reads be missed, if the CPU can finish its execution blips orders of magnitude faster
than disk revolutions?
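
To put a rough number on "microseconds, not milliseconds", one could time the raw
call path directly. A minimal sketch, assuming a file named "bigfile" that has
already been read once so its first page is in the OS cache (the name and loop
count are made up for illustration):

import os, time

# Time repeated 4k reads of the same OS-cached page, so that only the
# call path is measured, not the disk.
fd = os.open("bigfile", os.O_RDONLY)
n = 10000
t0 = time.time()
for i in range(n):
    os.lseek(fd, 0, 0)       # rewind to the cached first page
    os.read(fd, 4096)        # read(2) served from the page cache
t1 = time.time()
os.close(fd)
print("%.1f us per lseek+read pair" % ((t1 - t0) / n * 1e6))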

My generic guess would be that an artificial delay is caused by some event/scheduling
implementation that e.g. posts events instead of acting on them immediately,
even when they happen in a context where higher priority processes are available
(and anything should be higher priority than no process or the idle process).

E.g., imagine what would happen if a blocked (IO or lock) thread had to wait for the
next OS tick before the scheduler/dispatcher allowed the process to proceed and initiate the
next IO. Even with OS read-ahead buffering of file data in memory, disk controller track
caching, etc., the system has to wait until data is removed before it can put more into the end
of that pipeline, so an artificial delay between removals (reads emptying buffers) will
have an effect. What the effect will be depends on the size of the delay vs. the probability
of an optimal IO being missed because of it. Smaller delay, smaller probability; at the limit
of a large delay, the effect becomes pure periodic pacing of basic IO chunks, one per tick.

The latter speculative theory, applied to an assumed (IIRC) basic Linux time tick of 1 ms and
an IO chunk of 4k, would lead to a reading rate of about 4k per ms.

Interestingly (though perhaps misleadingly ;-), that rate for 3MB is about what
Ganesan observed. I.e.,

real 0m0.716s vs (3MB/4k) * 1ms, or about 0.732s
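
Spelled out (taking 3MB as 3,000,000 bytes and 4k as 4096 bytes, which is the
reading that makes the numbers line up):

# One assumed 4k chunk per assumed 1 ms tick, over a 3,000,000-byte file:
file_size = 3000000                      # bytes ("3MB")
chunk, tick = 4096, 0.001                # bytes per read, seconds per tick
print("%.3f s" % (file_size / float(chunk) * tick))   # -> 0.732 s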

I'm not sure where the 16 ms by which the real time beats the prediction came from.
That would amount to 16*4k of IO that didn't block, I suppose.
If there were 64k of read-ahead, maybe that could account for it somehow
(though it doesn't seem consistent)? Ah well, this is speculation ... ;-)
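
The tick theory would also be directly testable: if each read blocks until the
next tick, per-read latencies should cluster near the tick interval instead of
the microsecond range. A speculative sketch (again with "bigfile" as a
placeholder path):

import time

# Collect per-read latencies; under the tick theory many should land
# near 1 ms rather than in the microsecond range.
latencies = []
f = open("bigfile", "rb")
while True:
    t0 = time.time()
    data = f.read(4096)
    latencies.append(time.time() - t0)
    if not data:
        break
f.close()

slow = [d for d in latencies if d > 0.0005]   # slower than half a tick
print("reads: %d, over 0.5 ms: %d, max: %.6f s"
      % (len(latencies), len(slow), max(latencies)))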

>
>* Perl uses OS-specific FILE* hacks for speed; Python mostly sticks with
>simple, portable calls.
IMO if the latter does not result in max disk throughput for simple IO, there is
room for improvement in the OS event/scheduling/dispatching and/or the interface(s) to it.
Not that that helps Python per se ;-/
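
For what it's worth, the cost of the simple portable approach is easy to compare
against plain read(2) syscalls. A rough sketch, with "bigfile" again a stand-in
for a large test file:

import os, time

def time_lines(path):
    # Python's buffered, line-oriented reads (the "simple, portable" path).
    t0 = time.time()
    f = open(path, "rb")
    for line in f:
        pass
    f.close()
    return time.time() - t0

def time_raw(path, chunk=4096):
    # Plain read(2) syscalls in 4k chunks, as a rough lower bound.
    t0 = time.time()
    fd = os.open(path, os.O_RDONLY)
    while os.read(fd, chunk):
        pass
    os.close(fd)
    return time.time() - t0

print("line loop: %.3f s" % time_lines("bigfile"))
print("raw reads: %.3f s" % time_raw("bigfile"))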

PS. I wonder how the IO time for an mmap'ed file would compare. E.g., accessing every
512th byte in the file ought to trigger the IO (or a bigger step if you know the clustering).
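
A minimal sketch of that experiment, assuming the standard mmap module and a
placeholder file "bigfile"; the 512-byte stride is the guess above, not a
measured cluster size:

import mmap, os, time

f = open("bigfile", "rb")
size = os.fstat(f.fileno()).st_size
m = mmap.mmap(f.fileno(), size, access=mmap.ACCESS_READ)

t0 = time.time()
total = 0
for i in range(0, size, 512):       # touch one byte per 512-byte sector
    total += ord(m[i:i+1])          # the page fault on first touch does the IO
print("%.3f s to fault through %d bytes" % (time.time() - t0, size))

m.close()
f.close()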

BTW, can you mmap /dev/null ?


Regards,
Bengt Richter