Working with named groups in re module

Neil Cerutti horpner at yahoo.com
Wed Jan 10 11:14:28 EST 2007


On 2007-01-10, Fredrik Lundh <fredrik at pythonware.com> wrote:
> Neil Cerutti wrote:
>> I found some clues on lexing using the re module in Python in
>> an article by Martin Löwis.
>
>>   Here, each alternative in the regular expression defines a
>>   named group. Scanning proceeds in the following steps:
>>
>>      1. Given the complete input, match the regular expression
>>      with the beginning of the input.
>>      2. Find out which alternative matched.
>
> you can use lastgroup, or lastindex:
>
> http://effbot.org/zone/xml-scanner.htm
>
> there's also a "hidden" ready-made scanner class inside the SRE
> module that works pretty well for simple cases; see:
>
> http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/457664

Thanks for the excellent pointers.
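
For anyone following along, here is a minimal sketch of the "hidden"
scanner class mentioned above. It assumes re.Scanner, which is
undocumented and may change; the token names and the lexicon are
made up for illustration and are not taken from the linked recipe.

```python
import re

# re.Scanner is undocumented; this lexicon is purely illustrative.
# Each entry is (pattern, action). The action is called with the
# scanner object and the matched text.
scanner = re.Scanner([
    (r"[0-9]+", lambda s, tok: ("NUMBER", int(tok))),
    (r"[a-z_]+", lambda s, tok: ("NAME", tok)),
    (r"[-+*/=]", lambda s, tok: ("OP", tok)),
    (r"\s+", None),  # an action of None matches and discards the text
])

# scan() returns the list of action results plus any unscanned tail.
tokens, remainder = scanner.scan("x = 40 + 2")
print(tokens)
print(repr(remainder))
```

A non-empty remainder means the scanner stopped at input that no
pattern in the lexicon matched.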

I got tripped up:

>>> m = re.match('(a+(b*)a+)', 'abbbbaa')
>>> dir(m)
['__copy__', '__deepcopy__', 'end', 'expand', 'group', 'groupdict', 'groups', 'span', 'start']

There are some notable omissions there. That's not much of an
excuse for my not understanding the handy docs, but I guess it
can function as a warning against relying on the interactive
help.

I'd seen the lastgroup definition in the documentation, but I
didn't realize it was exactly what I needed. I didn't think carefully
enough about what "last matched capturing group" actually meant,
given my regex. I don't think I saw "name" there either. ;-)
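
A small sketch of what "last matched" means for the regex above. The
group names outer and inner are hypothetical, added only to show
lastgroup returning a name; both attributes are readable even when
dir() does not list them.

```python
import re

m = re.match('(a+(b*)a+)', 'abbbbaa')

# Group 2 is nested inside group 1, so group 1 is the last one to
# finish matching. lastindex therefore reports 1, not 2.
print(m.lastindex)
# Neither group has a name, so lastgroup is None.
print(m.lastgroup)

# Same pattern with (hypothetical) names attached to the groups.
named = re.match('(?P<outer>a+(?P<inner>b*)a+)', 'abbbbaa')
print(named.lastgroup)
```

With nested groups, the enclosing group is the one reported, which is
why putting each alternative in its own top-level named group works
for lexing.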

  lastgroup 
  
  The name of the last matched capturing group, or None if the
  group didn't have a name, or if no group was matched at all. 
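
Putting that definition to work, here is a rough sketch of the
scanning steps quoted earlier: match at the current position, then
ask lastgroup which alternative matched. The token names, the
patterns, and the tokenize helper are assumptions for illustration,
not code from the article.

```python
import re

# One top-level named group per alternative (names are illustrative).
TOKEN_RE = re.compile(r"""
    (?P<NUMBER>[0-9]+)
  | (?P<NAME>[A-Za-z_]\w*)
  | (?P<OP>[-+*/=])
  | (?P<SKIP>\s+)
""", re.VERBOSE)


def tokenize(text):
    """Return a list of (kind, text) pairs; whitespace is dropped."""
    pos = 0
    tokens = []
    while pos < len(text):
        # Step 1: match the expression at the current position.
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError("unexpected character %r at %d" % (text[pos], pos))
        # Step 2: lastgroup names the alternative that matched.
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


print(tokenize("x = 40 + 2"))
```

Alternatives are tried left to right, so their order matters whenever
two patterns could match at the same position.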

-- 
Neil Cerutti
We dispense with accuracy --sign at New York drug store
