Regexp optimization question

Greg Ewing greg at cosc.canterbury.ac.nz
Mon Apr 26 00:46:26 EDT 2004


Magnus Lie Hetland wrote:
> One of the reasons I've avoided existing lexers is that I don't do
> standard tokenization -- I don't partition all of the text into regexp
> tokens. I allow the lexer to skip over text -- somewhat like how
> whitespace is normally handled, except that this can be *any* text --
> and to return the next token that is of any use to the current parsing
> rule.
> 
> But who knows -- I may be able to use Plex anyway.

You should be able to do this with Plex by defining a
fall-back rule that matches any single character and
ignores it.

If you make it the last rule, and make sure it only
matches one character, any other rule that also matches
will take precedence over it: Plex prefers the longest
match, and when two rules match the same amount of text,
the one listed earlier in the Lexicon wins.
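
Something along these lines (untested; the token names and
patterns here are invented just for illustration):

    from Plex import *

    letter = Range("AZaz")
    digit = Range("09")
    name = letter + Rep(letter | digit)
    number = Rep1(digit)

    lexicon = Lexicon([
        (name,    'ident'),
        (number,  'int'),
        (AnyChar, IGNORE),   # fall-back: silently skip any other char
    ])

    scanner = Scanner(lexicon, open("input.txt"), "input.txt")
    while 1:
        token, text = scanner.read()
        if token is None:    # end of input
            break
        print token, text

Any rule above the AnyChar one either matches more text
(and wins on length) or matches a single character (and
wins by coming first), so the fall-back only fires when
nothing else applies.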

If you need to match different sets of tokens in
different circumstances, you can do this with states,
and have the parser switch states according to what
it's currently doing.
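
For instance, continuing from the snippet above (again
untested; the "(*" ... "*)" comment delimiters are just an
example of a construct you might want to skip over):

    lexicon = Lexicon([
        (name,      'ident'),
        (Str("(*"), Begin('comment')),  # pattern-triggered state switch
        (AnyChar,   IGNORE),            # fall-back rule, as above
        State('comment', [
            (Str("*)"), Begin('')),     # '' is the default state
            (AnyChar,   IGNORE),        # skip everything in between
        ]),
    ])

The Scanner also has a begin() method, so the parser can
switch states itself (e.g. scanner.begin('comment')) when
the decision depends on parsing context rather than on a
lexical pattern.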

-- 
Greg Ewing, Computer Science Dept,
University of Canterbury,
Christchurch, New Zealand
http://www.cosc.canterbury.ac.nz/~greg
