i don't understand this RE example from the documentation

Ben Cartwright bencvt at gmail.com
Mon May 8 19:11:32 EDT 2006


John Salerno wrote:
> John Salerno wrote:
> > Ok, I've been staring at this and figuring it out for a while. I'm close
> > to getting it, but I'm confused by the examples:
> >
> > (?(id/name)yes-pattern|no-pattern)
> > Will try to match with yes-pattern if the group with given id or name
> > exists, and with no-pattern if it doesn't. |no-pattern is optional and
> > can be omitted.
> >
> > For example, (<)?(\w+@\w+(?:\.\w+)+)(?(1)>) is a poor email matching
> > pattern, which will match with '<user@host.com>' as well as
> > 'user@host.com', but not with '<user@host.com'. New in version 2.4.
> >
> > group(1) is the email address pattern, right? So why does the above RE
> > match 'user@host.com'? If the email address exists, does the last part
> > of the RE, (?(1)>), mean that it has to end with a '>'?
>
> I think I got it. The group(1) is referring to the opening '<', not the
> email address. I had seen an earlier example that used group(0), so I
> thought maybe the groups were 0-based.

The groups *are* 0-based.  The 0th group is the whole match, e.g.:

  >>> import re
  >>> m = re.match(r'a(b+)', 'abbbb')
  >>> m.group(0)
  'abbbb'
  >>> m.group(1)
  'bbbb'

And for the pattern you were looking at:

  >>> m = re.match(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)', '<foo@test.com>')
  >>> m.group(0)
  '<foo@test.com>'
  >>> m.group(1)
  '<'
  >>> m.group(2)
  'foo@test.com'
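
To see what the conditional (?(1)>) does, it may help to run the same
pattern against the three strings the documentation mentions. This is
only a rough sketch: the helper name groups_for is made up for
illustration, and it uses re.match, so the match has to start at the
beginning of the string.

```python
import re

# Pattern quoted from the documentation excerpt in this thread.
EMAIL = re.compile(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)')

def groups_for(text):
    """Hypothetical helper: return (group 0, group 1, group 2), or None if no match."""
    m = EMAIL.match(text)
    if m is None:
        return None
    return m.group(0), m.group(1), m.group(2)

for candidate in ('<user@host.com>', 'user@host.com', '<user@host.com'):
    print(candidate, '->', groups_for(candidate))
```

When the opening '<' is absent, group 1 did not participate in the
match (it is None), so the conditional requires nothing and the bare
address matches. When the '<' is present, group 1 is set and the
closing '>' becomes mandatory, which is why the third string fails.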

--Ben



