[Python-Dev] A standard lexer?

Fredrik Lundh <effbot@telia.com>
Sun, 2 Jul 2000 19:55:11 +0200


tim wrote:
> OTOH, arbitrary small integers are not Pythonic.  Your example *generates*
> them in order to guarantee they're unique, which is a bad sign.

this feature itself has been on the todo list for quite a while; the (?P#n)
syntax just exposes the inner workings (the "small integer" is simply
something that fits in a SRE_CODE word).

as you say, it's probably a good idea to hide it a bit better...

> >         for phrase, action in lexicon:
> >             p.append("(?:%s)(?P#%d)" % (phrase, len(p)))
>
> How about instead enhancing existing (?P<name>pattern) notation, to set a
> new match object attribute to name if & when pattern matches?  Then
> arbitrary info associated with a named pattern can be gotten at via dicts
> via the pattern name, & the whole mess should be more readable.

good idea.  and fairly easy to implement, I think.
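
A minimal sketch of how tim's suggestion reads in practice, assuming a later
`re` module in which match objects expose a `lastgroup` attribute (that
attribute is not part of the snapshot discussed here). The lexicon, the group
names and the `tokenize` helper are hypothetical, made up for illustration:

```python
import re

# hypothetical lexicon: (group name, phrase) pairs, for illustration only
lexicon = [
    ("NUMBER", r"\d+"),
    ("NAME", r"[A-Za-z_]\w*"),
    ("OP", r"[-+*/=]"),
    ("SKIP", r"\s+"),
]

# combine the phrases into one alternation of named groups
pattern = re.compile(
    "|".join("(?P<%s>%s)" % (name, phrase) for name, phrase in lexicon)
)

def tokenize(text):
    """Return (name, text) pairs; the name comes from the group that matched."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = pattern.match(text, pos)
        if m is None:
            raise ValueError("unexpected character at position %d" % pos)
        # lastgroup is the name of the last capturing group that matched
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

Anything associated with a phrase (an action, a token class) can then be
looked up in an ordinary dict keyed by the group name, with no generated
integers in the pattern source.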

on the other hand, that means creating more real groups.  and
groups don't come for free...

maybe this functionality should only be available through the scanner
class?  it can compile the patterns separately, and combine the data
structures before passing them to the code generator.  a little bit more
code to write, but less visible oddities.
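
For comparison, a rough sketch of what a scanner-class interface can look
like, using the undocumented `re.Scanner` class found in later CPython
releases. Treat it as an illustration, not as the API of the snapshot
mentioned in this message; the lexicon and the token names are hypothetical:

```python
import re

# re.Scanner is undocumented, so this interface is an assumption about
# later CPython releases; the lexicon below is made up for illustration
scanner = re.Scanner([
    (r"\d+", lambda s, tok: ("NUMBER", int(tok))),
    (r"[A-Za-z_]\w*", lambda s, tok: ("NAME", tok)),
    (r"[-+*/=]", lambda s, tok: ("OP", tok)),
    (r"\s+", None),  # no action: matched text is skipped
])

# scan() returns the list of action results plus any unmatched remainder
tokens, remainder = scanner.scan("x = 42 + y")
```

Each phrase is paired directly with its action, so no marker syntax or extra
named groups appear in the patterns the user writes.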

> On the third hand, I'm really loathe to add more gimmicks to stinking
> regexps.  But, on the fourth hand, no alternative yet has proven popular
> enough to move away from those suckers.
>
> if-you-can't-get-a-new-car-at-least-tune-up-the-old-one-ly y'rs  - tim

hey, SRE is a new car.  same old technology, though.  only smaller ;-)

btw, if someone wants to play with this, I just checked in a new SRE
snapshot.  a little bit of documentation can be found here:
http://hem.passagen.se/eff/2000_07_01_bot-archive.htm#416954

</F>