How to make regexes faster? (Python v. OmniMark)

Fri Apr 19 15:49:02 EDT 2002

[Donn Cave]
> Part of the problem is that when you write something like a "grep"
> in Python and in Perl, the Perl program will naturally be written
> like while ($line = <STDIN>) {...}, and the Python program will
> naturally be written like while 1: line = sys.stdin.readline() ...
> That pits a lot of function calls against what must be an inline
> operation.  I think I decided that "I/O", in this practical sense

[Cameron Laird]
> Worse (or at least "more"):  Perl goes out of its way to optimize
>   while ($line = <>) {...}
> 'Least, it has in the past; I haven't looked lately.

Sorry, you both lose <wink>.  In older versions of Python, and on most
platforms, the single biggest speed difference in line-at-a-time input was
due to the fact that Python's I/O is thread-safe but Perl's is not.  Call the
platform getc(), and it doesn't just grab the next char from the buffer, it
runs all over creation slapping locks on internal platform I/O structures
for the duration.  This locking overhead utterly swamps the time needed just
to get the next char.  Perl peeks and pokes the platform stdio _iobufs
directly, without locking, so in the presence of threads "whatever
happens, happens".
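To compare the two Python loop styles from this thread on your own machine, here is a minimal sketch of a "grep" written both ways. It targets modern Python 3. There, file objects go through the io module rather than 2.2's wrapper around C stdio, so the platform locking described above no longer applies directly. The function names and the synthetic test data are illustrative only. io.StringIO stands in for a real file; substitute open(...) to time actual disk input.

```python
import io
import re
import timeit

def grep_readline(pattern, f):
    """Old idiom: one explicit readline() call per line."""
    regex = re.compile(pattern)
    matches = []
    while 1:
        line = f.readline()
        if not line:  # readline() returns "" only at EOF
            break
        if regex.search(line):
            matches.append(line)
    return matches

def grep_iter(pattern, f):
    """2.2+ idiom: iterate over the file object directly."""
    regex = re.compile(pattern)
    return [line for line in f if regex.search(line)]

if __name__ == "__main__":
    # Synthetic input: every 7th line contains "spam".
    data = "".join("line %d %s\n" % (i, "spam" if i % 7 == 0 else "eggs")
                   for i in range(50000))
    # Each timed run gets a fresh in-memory file starting at position 0.
    t_readline = timeit.timeit(lambda: grep_readline("spam", io.StringIO(data)), number=5)
    t_iter = timeit.timeit(lambda: grep_iter("spam", io.StringIO(data)), number=5)
    print("readline loop: %.3fs  iteration: %.3fs" % (t_readline, t_iter))
```

Both functions return the same matching lines. Only the loop mechanics differ, so any timing gap reflects per-line call overhead, not regex cost.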

Recent versions of Python endure some remarkable platform-specific pain to
reduce the platform locking overhead while remaining threadsafe.  The
new-in-2.2 idiom

    for line in fileobject:

also invokes the xreadlines "chunking" mechanism under the covers.
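The "chunking" idea amounts to paying the I/O overhead once per large block instead of once per line. Here is a rough Python 3 sketch of that idea, assuming a binary file and an arbitrary 64 KiB block size. It illustrates the technique only. It is not CPython's xreadlines implementation, and chunked_lines is a hypothetical name.

```python
import io

def chunked_lines(f, chunk_size=64 * 1024):
    """Yield lines from binary file f by reading large blocks and splitting them.

    Illustrates the chunking technique only; not CPython's actual xreadlines code.
    """
    pending = b""
    while True:
        block = f.read(chunk_size)  # one read call per block, not per line
        if not block:
            break
        pending += block
        pieces = pending.split(b"\n")
        pending = pieces.pop()  # last piece may be an incomplete line
        for piece in pieces:
            yield piece + b"\n"
    if pending:
        yield pending  # final line with no trailing newline
```

Usage would look like `with open(path, "rb") as f: for line in chunked_lines(f): ...`, where `path` is a placeholder. The output lines match what plain iteration over the same file would produce.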