How to make regexes faster? (Python v. OmniMark)
Donn Cave
donn at u.washington.edu
Fri Apr 19 13:10:33 EDT 2002
Quoth claird at starbase.neosoft.com (Cameron Laird):
...
| Next, I'd determine whether my test examples are indeed
| regex-bound (it might well be I/O which constrains your
| performance). After that ... well, part of the charm of
| regex-s for some people is that they're so flexible that
| different techniques are superior in different circumstances.
Indeed, it could be partly I/O.
I recently went to a meeting and heard someone mention that he had
written a program in Python, his first, but was thinking of rewriting
it in Perl because he had determined that Perl was 10 times faster
at I/O and regular expressions. At a site that employs hundreds of
at least occasional programmers, he is maybe the fourth person I've seen
show this much interest in Python, so I was kind of chagrined to hear
this announcement and went back to check it out. I was even more
chagrined to find that it was not an unreasonable claim.
Part of the problem is that when you write something like a "grep"
in Python and in Perl, the Perl program will naturally be written
like while ($line = <STDIN>) {...}, and the Python program will
naturally be written like while 1: line = sys.stdin.readline() ...
That pits an attribute lookup and a method call for every line against
what must be an inline operation in Perl. I think I decided that "I/O",
in this practical sense of getting a line of data, might have been about
half the problem.
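One rough way to check how much of a grep-style loop is line reading and how much is regex work is to time the same readline() loop twice, once with matching and once without. The following is only a sketch in current Python, not the program discussed above; the time_grep helper, the sample data, and the \d+ pattern are all made up for illustration.

```python
import io
import re
import time


def time_grep(stream, pattern=None):
    """Read stream with an explicit readline() loop; optionally match each line.

    Returns (elapsed_seconds, lines_read, lines_matched).
    time_grep is a hypothetical helper, not part of any library.
    """
    regex = re.compile(pattern) if pattern is not None else None
    lines = matched = 0
    start = time.perf_counter()
    while 1:
        line = stream.readline()
        if not line:          # empty string means end of input
            break
        lines += 1
        if regex is not None and regex.search(line):
            matched += 1
    return time.perf_counter() - start, lines, matched


# Hypothetical sample data held in memory, so no real file is needed.
sample = "foo 123\nbar\nbaz 42\n" * 10000

# Pass 1: line reading only. Pass 2: line reading plus regex search.
io_only, n_lines, _ = time_grep(io.StringIO(sample))
with_regex, _, n_matched = time_grep(io.StringIO(sample), r"\d+")

print("lines read:        ", n_lines)
print("lines matched:     ", n_matched)
print("read only (s):     ", io_only)
print("read + regex (s):  ", with_regex)
```

If the two timings are close, the loop is dominated by getting lines rather than by the regex. An in-memory stream understates real disk or pipe costs, so the numbers are only a starting point.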
The xreadlines function, available since Python 2.1, did reduce the
disparity a little. This optimization might help a lot in the present
case, if there's a lot of line-by-line I/O and if Python is 2.1 or later.
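For comparison, here is a minimal sketch of the two loop styles side by side. It is written for current Python, where iterating over the file object directly plays the role xreadlines played in 2.1; as far as I recall, the 2.1 spelling was along the lines of "for line in xreadlines.xreadlines(sys.stdin):". The function names and sample text are made up for illustration.

```python
import io
import re


def grep_readline(stream, regex):
    """Explicit loop: one readline() method call per line."""
    out = []
    while 1:
        line = stream.readline()
        if not line:          # empty string means end of input
            break
        if regex.search(line):
            out.append(line)
    return out


def grep_iter(stream, regex):
    """Iterate over the stream itself; no per-line method call in Python code."""
    return [line for line in stream if regex.search(line)]


# Hypothetical sample input; a real grep would pass sys.stdin instead.
text = "alpha 1\nbeta\ngamma 22\n"
pattern = re.compile(r"\d+")

by_readline = grep_readline(io.StringIO(text), pattern)
by_iter = grep_iter(io.StringIO(text), pattern)
print(by_readline == by_iter)
```

Both functions return the same lines; the difference is only in how each line is fetched, which is the part of the cost discussed above.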
Donn Cave, donn at u.washington.edu