Working with named groups in re module

Neil Cerutti horpner at yahoo.com
Wed Jan 10 09:53:49 EST 2007


I found some clues on lexing using the re module in Python in an
article by Martin Löwis.

http://www.python.org/community/sigs/retired/parser-sig/towards-standard/

He writes:
  [...]
  A scanner based on regular expressions is usually implemented
  as an alternative of all token definitions. For XPath, a
  fragment of this expressions looks like this:


      (?P<Number>\\d+(\\.\\d*)?|\\.\\d+)|
      (?P<VariableReference>\\$""" + QName + """)|
      (?P<NCName>"""+NCName+""")|
      (?P<QName>"""+QName+""")|
      (?P<LPAREN>\\()|

  Here, each alternative in the regular expression defines a
  named group. Scanning proceeds in the following steps:

     1. Given the complete input, match the regular expression
     with the beginning of the input.
     2. Find out which alternative matched.
     [...]

Item 2 is where I get stuck. There doesn't seem to be an obvious
way to do it, which I understand is a bad thing in Python.
Whatever source code went with the article originally is not
linked from the above page, so I don't know what Martin did.

Here's what I came up with (with a trivial example regex):

  import re
  r = re.compile('(?P<x>x+)|(?P<a>a+)')
  m = r.match('aaxaxx')
  if m:
    for k in r.groupindex:
      # A group that took no part in the match returns None.
      if m.group(k) is not None:
        # Find the token type.
        token = (k, m.group())
        break
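
The snippet above only classifies the first token. Here is a rough
sketch of how the same groupindex lookup might be repeated over the
whole input. The names tokenize and TOKEN_RE are made up for this
example, and it assumes no alternative can match the empty string
(otherwise the loop would never advance).

```python
import re

# Same trivial token regex as in the snippet above.
TOKEN_RE = re.compile('(?P<x>x+)|(?P<a>a+)')

def tokenize(text):
    """Return a list of (token type, lexeme) pairs (hypothetical helper)."""
    tokens = []
    pos = 0
    while pos < len(text):
        # Step 1: match the regex at the current position.
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError('no token matches at position %d' % pos)
        # Step 2: find out which named alternative took part in the match.
        for name in TOKEN_RE.groupindex:
            if m.group(name) is not None:
                tokens.append((name, m.group(name)))
                break
        # Continue scanning right after the matched text.
        pos = m.end()
    return tokens

print(tokenize('aaxaxx'))
# prints [('a', 'aa'), ('x', 'x'), ('a', 'a'), ('x', 'xx')]
```

The inner loop is linear in the number of token types, which is
probably fine for a small grammar.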

I wish I could do something obvious instead, like m.name().
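
As far as I can tell, the closest existing thing is the match
object's lastgroup attribute, which holds the name of the last
matched capturing group (or None if that group has no name or no
group matched). A minimal sketch, assuming every alternative is its
own top-level named group as in the examples here; I have not tried
it against the full XPath expression from the article, and the
num_or_paren pattern is a cut-down stand-in I wrote for this test.

```python
import re

r = re.compile('(?P<x>x+)|(?P<a>a+)')
m = r.match('aaxaxx')
if m:
    # lastgroup names the alternative that matched, with no loop needed.
    token = (m.lastgroup, m.group())

# An unnamed group nested inside a named one closes before the outer
# group does, so lastgroup still reports the outer name.
num_or_paren = re.compile(r'(?P<Number>\d+(\.\d*)?|\.\d+)|(?P<LPAREN>\()')
m2 = num_or_paren.match('3.14')
token2 = (m2.lastgroup, m2.group())

print(token)   # prints ('a', 'aa')
print(token2)  # prints ('Number', '3.14')
```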

-- 
Neil Cerutti
After finding no qualified candidates for the position of principal, the
school board is pleased to announce the appointment of David Steele to the
post. --Philip Streifer


