Perl speed vs. Python speed

Tim Peters tim_one at email.msn.com
Thu Jan 13 03:48:51 EST 2000


[Tom Culliton, to /F]
> ...
> Ask Tim and he'll recommend using small simple regular
> expressions 9 times out of 10.  At least thats how he's
> responded every time I've asked about problems with some
> excessively clever regular expression here.

Well, there *is* a bit of bias in the stimulus/response here:  by the time
someone writes a regexp so hairy that they feel compelled to post it to the
world in despair of ever getting it to work, the advice to simplify isn't
really hard to come up with <wink>.

[Gordon McMillan]
> Fredrik is emphasizing size of the string, not complexity
> of the expression.

This is putting words in /F's mouth -- although I suspect they were the
words he intended to put there himself.  The whole "speed problem" with
regex vs re is that the latter has higher fixed overhead per call, but is
(often) much faster once it gets going.

> [and here Gordon rudely repeats what I just wrote <wink>]

> Also, a long pattern is not necessarily a complex pattern. A
> humongous set of alternatives is no problem. The problem is
> when people expect to be able to do more than one thing at a
> time with a pattern.
>
> Such as match an entire Python triple quoted string.
>
> Which Tim does.

Hmm.  I'm inclined to call a single string one thing -- unless you pay
exaggerated attention to the three quotes <wink>.  For sure, I don't
hesitate to write extremely involved regexps for tokenization, but those are
always of the form

    alt1 | alt2 | alt3 | ... | altn

where the alt_i are mutually exclusive and so can be developed & reasoned
about in isolation (your "humongous set of alternatives", albeit that n is
rarely more than a dozen).  And I move heaven, earth and three acres of hell
to keep any possibility of backtracking out of each alt_i!  The regexp is
thus doubly defanged, and can't cause conceptual, speed or memory problems
regardless of string size or whether it matches or fails.

The whole trick to outsmarting an NFA engine lies in making sure none pf the
NFA "let's guess what they meant by this ambiguous pattern" machinery ever
gets triggered -- once it does, death eventually follows.

well-really-your-user's-but-it's-easy-to-get-more-
    of-them<wink>-ly y'rs  - tim






More information about the Python-list mailing list