Regexp optimization question

Magnus Lie Hetland mlh at furu.idi.ntnu.no
Sat Apr 24 11:15:45 EDT 2004


In article <kblic.15952$hR1.8421 at fe2.texas.rr.com>, Paul McGuire
wrote: [snip]
>
> pyparsing supports this kind of text skipping, using scanString()
> instead of parseString().

I already have a working implementation in Python -- if this isn't
more efficient (I'm just talking about the tokenization part) I don't
think there would be much gain in switching.

(IIRC, the pyparsing docs say that pyparsing is slow for complex
grammars, at least.)

BTW: I have now done some experiments with Plex with lots of regular
expressions; simply compiling a pattern with 500 alternatives took
forever, whereas re.compile was quite fast.
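
For anyone who wants to repeat the re side of that comparison, here is a
rough sketch. The 500 alternatives are made-up placeholder words
(word0 ... word499), not my real patterns, and the timing will vary by
machine, so no figure is given.

```python
import re
import time

# Hypothetical stand-ins for the 500 alternatives.
alternatives = ["word%d" % i for i in range(500)]
source = "|".join(re.escape(w) for w in alternatives)

re.purge()  # clear re's internal cache so the compile is really timed
start = time.perf_counter()
pattern = re.compile(source)
elapsed = time.perf_counter() - start

print("compiled %d alternatives in %.4f s" % (len(alternatives), elapsed))
```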

So... If I can somehow be content with only getting one match per
position, I guess re is the best solution.
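
A minimal sketch of what "one match per position" means with re,
assuming a tiny made-up word list in place of the real alternatives. The
names words and tokenize are only for illustration.

```python
import re

# Hypothetical word list standing in for the real set of alternatives.
words = ["foo", "foobar", "bar", "baz"]

# re picks the first alternative that matches, not the longest one,
# so sort the longer words first.
ordered = sorted(words, key=len, reverse=True)
pattern = re.compile("|".join(re.escape(w) for w in ordered))

def tokenize(text):
    """Return (start, matched_text) pairs, skipping unmatched text."""
    return [(m.start(), m.group()) for m in pattern.finditer(text)]

tokens = tokenize("xx foobar yy baz")
print(tokens)  # [(3, 'foobar'), (13, 'baz')]
```

Note that "foo" at position 3 and "bar" at position 6 are never
reported: finditer yields a single, non-overlapping match at each
position, which is exactly the limitation I would have to live with.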

Or I could implement something in C (Pyrex)... (Or use something like
pybison.)

-- 
Magnus Lie Hetland              "Wake up!"  - Rage Against The Machine
http://hetland.org              "Shut up!"  - Linkin Park


