How to make regexes faster? (Python v. OmniMark)
Tim Peters
tim.one at comcast.net
Fri Apr 19 15:49:02 EDT 2002
[Donn Cave]
> Part of the problem is that when you write something like a "grep"
> in Python and in Perl, the Perl program will naturally be written
> like while ($line = <STDIN>) {...}, and the Python program will
> naturally be written like while 1: line = sys.stdin.readline() ...
> That pits a lot of function calls against what must be an inline
> operation. I think I decided that "I/O", in this practical sense
[Cameron Laird]
> Worse (or at least "more"): Perl goes out of its way to optimize
> while ($line = <>) {...}
> 'Least, it has in the past; I haven't looked lately.
Sorry, you both lose <wink>. In older versions of Python, and on most
platforms, the single biggest speed difference in line-at-a-time input came
from the fact that Python's I/O is thread-safe but Perl's is not. Call the
platform getc(), and it doesn't just grab the next char from the buffer, it
runs all over creation slapping locks on internal platform I/O structures
for the duration. This locking overhead utterly swamps the time needed just
to get the next char. Perl peeks and pokes the platform stdio _iobufs
directly, without locking, and in the presence of threads then "whatever
happens, happens".
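
To make the cost concrete, here is a rough sketch in Python of the difference between locking around every character and locking once per line. It is a toy model only: LockedBuffer, getc_locked, and the lock_ops counter are made-up names for illustration, not any platform's stdio API and not how Python or Perl is actually implemented.

```python
import threading


class LockedBuffer:
    """Toy stand-in for a buffered stdio stream (hypothetical, for illustration)."""

    def __init__(self, data):
        self.data = data
        self.pos = 0
        self.lock = threading.Lock()
        self.lock_ops = 0  # number of lock acquisitions so far

    def _getc_unlocked(self):
        # Just grab the next char from the buffer; "" signals end of data.
        if self.pos >= len(self.data):
            return ""
        ch = self.data[self.pos]
        self.pos += 1
        return ch

    def getc_locked(self):
        # Models a thread-safe getc(): the lock is taken for every single char.
        with self.lock:
            self.lock_ops += 1
            return self._getc_unlocked()

    def readline_per_char_lock(self):
        # Builds a line out of individually locked character reads.
        chars = []
        while True:
            ch = self.getc_locked()
            if not ch:
                break
            chars.append(ch)
            if ch == "\n":
                break
        return "".join(chars)

    def readline_one_lock(self):
        # Takes the lock once, then reads the whole line without relocking.
        chars = []
        with self.lock:
            self.lock_ops += 1
            while True:
                ch = self._getc_unlocked()
                if not ch:
                    break
                chars.append(ch)
                if ch == "\n":
                    break
        return "".join(chars)


per_char = LockedBuffer("abc\ndef\n")
per_line = LockedBuffer("abc\ndef\n")
print(per_char.readline_per_char_lock(), per_char.lock_ops)
print(per_line.readline_one_lock(), per_line.lock_ops)
```

Both methods return the same line; the only thing that changes is how many times the lock is acquired, which is the overhead being described above.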
Recent versions of Python endure some remarkable platform-specific pain to
reduce the platform locking overhead while remaining thread-safe. The
new-in-2.2 idiom
    for line in fileobject:
also invokes the xreadlines "chunking" mechanism under the covers.
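
For comparison, here are the two loop shapes side by side as a minimal grep-like sketch. It is written for current Python 3, where xreadlines no longer exists and file iteration does its own buffering, so it shows only the idioms and not the 2.2 internals. The function names and the io.StringIO stand-in for a real file are assumptions made for the example.

```python
import io
import re


def grep_readline(pattern, fileobject):
    """Older idiom: an explicit readline() method call for every line."""
    regex = re.compile(pattern)
    matches = []
    while 1:
        line = fileobject.readline()
        if not line:  # empty string means end of file
            break
        if regex.search(line):
            matches.append(line)
    return matches


def grep_iter(pattern, fileobject):
    """Newer idiom: iterate over the file object directly."""
    regex = re.compile(pattern)
    matches = []
    for line in fileobject:
        if regex.search(line):
            matches.append(line)
    return matches


sample = "spam\neggs\nspam and eggs\n"
print(grep_readline("spam", io.StringIO(sample)))
print(grep_iter("spam", io.StringIO(sample)))
```

Both functions return the same matching lines; the second simply leaves the line fetching to the file object instead of spelling out a method call per line.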