[Python-Dev] Parrot -- should life imitate satire?

Tim Peters tim.one@home.com
Sun, 12 Aug 2001 00:05:37 -0400


[Dan Sugalski, on
 Wednesday, August 01, 2001 2:20 AM]
> ...
> Ouch, I'd bet that hurts. Has anyone timed the difference between
> making lots of getc calls and making a few larger reads and managing
> the buffers internally? I can see it going either way, and another
> data point would be useful to have.

There's been lots of discussion of this on Python-Dev, for example in this thread:

    http://aspn.activestate.com/ASPN/Mail/Message/600485

I'll quote the high-order bit:

    My line-at-a-time test case used (rounding to nearest whole integers)
    30 seconds in Python and 6 in Perl.  The result of testing many
    changes to Python's implementation was that the excess 24 seconds
    broke down like so:

    17   spent inside internal MS threadsafe getc() lock/unlock
             routines
     5   uncertain, but evidence suggests much of it due to MS
             malloc/realloc (Perl does its own memory mgmt)
     2   for not copying directly out of the platform FILE*
             implementation struct in a highly optimized loop (like
             Perl does)

    My last checkin to fileobject.c reclaimed 17 seconds on Win98SE
    while remaining threadsafe, via a combination of locking per line
    instead of per character, and invoking realloc much less often
    (only for lines exceeding 200 chars).
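For reference, here's a minimal C sketch of the two ideas in the quoted checkin. It assumes a POSIX C library that provides flockfile(), funlockfile() and getc_unlocked(). It is not the actual fileobject.c code, and the names count_newlines_getc, read_line_locked and SMALL_LINE are made up for illustration. The first function pays for a lock/unlock inside every getc() call; the second takes the lock once per line and only calls malloc/realloc to grow when a line outgrows a 200-byte stack buffer.

```c
#define _POSIX_C_SOURCE 200809L   /* assumed POSIX: flockfile() and friends */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SMALL_LINE 200   /* hypothetical: lines this short never hit malloc */

/* Per-character locking: in a threadsafe C library each getc() call
 * locks and unlocks fp internally. */
static size_t count_newlines_getc(FILE *fp)
{
    size_t lines = 0;
    int c;
    while ((c = getc(fp)) != EOF)
        if (c == '\n')
            lines++;
    return lines;
}

/* Per-line locking: take the stream lock once, read with getc_unlocked(),
 * and keep short lines in a stack buffer.  Returns a malloc'd,
 * '\0'-terminated copy of the line (including its '\n', if any) with its
 * length in *len, or NULL at EOF or when out of memory.  Embedded '\0'
 * bytes survive because the length is tracked separately. */
static char *read_line_locked(FILE *fp, size_t *len)
{
    char small[SMALL_LINE];
    char *buf = small, *out;
    size_t cap = sizeof small, n = 0;
    int c, oom = 0;

    assert(fp != NULL);
    flockfile(fp);
    while ((c = getc_unlocked(fp)) != EOF) {
        if (n == cap) {                    /* rare: line outgrew SMALL_LINE */
            char *bigger = (buf == small) ? malloc(cap * 2)
                                          : realloc(buf, cap * 2);
            if (bigger == NULL) {
                oom = 1;
                break;
            }
            if (buf == small)
                memcpy(bigger, small, n);
            buf = bigger;
            cap *= 2;
        }
        buf[n++] = (char)c;
        if (c == '\n')
            break;
    }
    funlockfile(fp);

    if (oom || n == 0) {                   /* out of memory, or EOF at once */
        if (buf != small)
            free(buf);
        return NULL;
    }
    if (buf == small) {                    /* one allocation for the result */
        out = malloc(n + 1);
        if (out != NULL)
            memcpy(out, small, n);
    } else {
        out = realloc(buf, n + 1);
        if (out == NULL)
            free(buf);
    }
    if (out == NULL)
        return NULL;
    out[n] = '\0';
    *len = n;
    return out;
}
```

Each returned line still costs one malloc for the result, much as Python has to build a string object for every line; the saving is in the per-character locking and the growth reallocs.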

Note that thread overhead is overwhelmingly the biggest hangup.  Python has
two threadsafe input tricks now:

1. On platforms that have flockfile(), funlockfile(), and
   getc_unlocked(), the last is used in a loop bracketed by the first
   two.

2. At least on Windows, which doesn't have those, we use the platform
   fgets() in an excruciating way, tricking it into letting us read
   lines with embedded null bytes.
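A rough sketch of the kind of fgets() trick item 2 describes, assuming fgets() leaves the bytes after its terminating '\0' untouched. The buffer is prefilled with '\n' so the end of the line can be located with memchr() on '\n' instead of strlen(), which would stop at the first embedded null. The helper name fgets_with_nulls is hypothetical and this is not the fileobject.c code; a real version also has to grow the buffer and call fgets() again for long lines.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line with fgets() into buf and return how
 * many bytes belong to the line (including its '\n'), even if the line
 * contains '\0' bytes.  Returns 0 at EOF.  If no newline fits, returns
 * bufsize - 1; a fuller version would grow buf and call fgets() again.
 * Assumes fgets() writes nothing past the '\0' it stores. */
static size_t fgets_with_nulls(char *buf, size_t bufsize, FILE *fp)
{
    char *p;

    assert(bufsize >= 2);
    /* Prefill with '\n'.  fgets() stops after the first real newline, so a
     * '\n' it stores is immediately followed by its terminating '\0'; our
     * filler newlines are followed only by more filler or the buffer end. */
    memset(buf, '\n', bufsize);
    if (fgets(buf, (int)bufsize, fp) == NULL)
        return 0;

    p = memchr(buf, '\n', bufsize);
    if (p == NULL)                         /* buffer full, no newline yet */
        return bufsize - 1;
    if (p + 1 < buf + bufsize && p[1] == '\0')
        return (size_t)(p - buf) + 1;      /* newline came from the file */

    /* Filler newline: the last line ended at EOF without '\n', and the
     * '\0' that fgets() stored sits just before p. */
    assert(p > buf && p[-1] == '\0');
    return (size_t)(p - buf) - 1;
}
```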

Oddly enough, in the timing reports I saw, approach #1 was never faster than
approach #2, and on at least one platform (Tru64, IIRC) was slower.

Of course fgets() is a primitive in std C because they *wanted* to make it
possible for vendors to optimize it (in the ways Perl does), but it appears
very few vendors do optimize it.  On Windows it's the same old
getc()-in-a-loop, but they lock/unlock the stream only once per fgets call
(using internal stream functions that aren't exposed).

The "2 seconds for not copying directly ... like Perl does" I reported above
came from hacking together a thread-unsafe line input routine that used the
same FILE* tricks Perl uses.  That is, thread-unsafe getc-in-a-loop was 2
seconds slower than using thread-unsafe FILE* tricks.  That's significant in
absolute terms, but was lost in the noise compared to the other stuff we
were fighting.
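Dan's original question, many getc() calls versus a few larger reads with the buffers managed internally, can at least be sketched in portable C. Portable code can't copy straight out of the FILE* struct the way Perl does, so this hypothetical LineReader (the name and the 8192-byte CHUNK are made up) pulls a block at a time with fread() and scans it with memchr(), touching stdio and its lock once per block rather than once per character. No timing claim is implied; whether it beats the platform fgets() is exactly the data point being asked about.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK 8192   /* hypothetical read size */

typedef struct {
    FILE *fp;
    char buf[CHUNK];
    size_t pos, end;     /* unread bytes are buf[pos..end) */
} LineReader;

static void lr_init(LineReader *r, FILE *fp)
{
    assert(fp != NULL);
    r->fp = fp;
    r->pos = r->end = 0;
}

/* Copy the next line (including its '\n', if any) into the heap buffer
 * *line, growing it as needed; *cap tracks its capacity and *len gets the
 * line length.  Returns 1 if a line was read, 0 at EOF, -1 if out of
 * memory.  Embedded '\0' bytes are kept; *line is also '\0'-terminated. */
static int lr_getline(LineReader *r, char **line, size_t *cap, size_t *len)
{
    size_t n = 0;
    for (;;) {
        if (r->pos == r->end) {            /* buffer drained: one big read */
            r->end = fread(r->buf, 1, sizeof r->buf, r->fp);
            r->pos = 0;
            if (r->end == 0) {             /* EOF or read error */
                *len = n;
                return n > 0;              /* last line may lack a '\n' */
            }
        }
        const char *start = r->buf + r->pos;
        size_t avail = r->end - r->pos;
        const char *nl = memchr(start, '\n', avail);
        size_t take = nl ? (size_t)(nl - start) + 1 : avail;

        if (n + take + 1 > *cap) {         /* grow geometrically */
            size_t newcap = *cap ? *cap : 64;
            while (newcap < n + take + 1)
                newcap *= 2;
            char *grown = realloc(*line, newcap);
            if (grown == NULL)
                return -1;
            *line = grown;
            *cap = newcap;
        }
        memcpy(*line + n, start, take);
        n += take;
        (*line)[n] = '\0';
        r->pos += take;
        if (nl != NULL) {
            *len = n;
            return 1;
        }
    }
}
```

One design consequence: the reader owns whatever it has buffered, so mixing lr_getline() with other stdio reads on the same FILE* would see the data out of order. That's the price of managing the buffer outside stdio.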