Regexp optimization question

Paul McGuire ptmcg at austin.rr._bogus_.com
Fri Apr 23 23:30:24 EDT 2004


> One of the reasons I've avoided existing lexers is that I don't do
> standard tokenization -- I don't partition all of the text into regexp
> tokens. I allow the lexer to skip over text -- somewhat like how
> whitespace is normally handled, except that this can be *any* text --
> and to return the next token that is of any use to the current parsing
> rule.

pyparsing supports this kind of text skipping, using scanString() instead of
parseString().  scanString() is a generator, yielding a tuple for each
match consisting of:
- the matched tokens (returned as a ParseResults object - a sort of
super-list that supports simple list semantics, but whose tokens can also be
named and accessed as dictionary entries or as attributes)
- start location in source string
- end location in source string
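
For example, here is a minimal sketch (the "key=value" grammar and the
names "key" and "value" are my own illustration, not part of the original
question):

    from pyparsing import Word, alphas, nums, Suppress

    # a toy grammar: match "name=number" settings buried in arbitrary text
    setting = Word(alphas)("key") + Suppress("=") + Word(nums)("value")

    text = "skip this noise width=100 more filler height=50 the end"

    for tokens, start, end in setting.scanString(text):
        # tokens is a ParseResults: plain list access (tokens[0]) works,
        # and named results can be read like attributes or dict entries
        print(tokens.key, tokens["value"], "found at", start, "-", end)

scanString() simply skips over the intervening "noise" text, advancing
through the source string and yielding only the spans that match the
grammar.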

Download pyparsing at http://pyparsing.sourceforge.net .

-- Paul
