make RE more clever to avoid inappropriate sre_constants.error: redefinition of group name

Paddy paddy3118 at googlemail.com
Thu Mar 29 18:13:19 EDT 2007


On Mar 29, 3:22 pm, "aspineux" <aspin... at gmail.com> wrote:
> I want to parse
>
> 'foo@bar' or '<foo@bar>' and get the email address foo@bar
>
> the regex is
>
> r'<\w+@\w+>|\w+@\w+'
>
> now, I want to give it a name
>
> r'<(?P<email>\w+@\w+)>|(?P<email>\w+@\w+)'
>
> sre_constants.error: redefinition of group name 'email' as group 2; was group 1
>
> But because I use a |, only one group named 'email' can ever be set!
>
> Any comments?
>
> PS: I know the solution for this case is to use
> r'(?P<lt><)?(?P<email>\w+@\w+)(?(lt)>)'
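
A minimal sketch (assuming Python 3 and the standard re module) that checks both points in the question: reusing the name 'email' in two alternatives is rejected at compile time (re.error is the same exception reported above as sre_constants.error), while the conditional-group pattern from the PS extracts the address from either form. Using fullmatch rather than search is an assumption of this sketch; it makes an unbalanced '<foo@bar' fail instead of matching the bare address inside it.

```python
import re

# Reusing the group name 'email' in both alternatives: the re module
# refuses to compile this and raises re.error, even though only one
# alternative can ever match.
try:
    re.compile(r'<(?P<email>\w+@\w+)>|(?P<email>\w+@\w+)')
except re.error as exc:
    print('rejected:', exc)

# Workaround from the PS: capture an optional '<' as group 'lt', then
# the conditional (?(lt)>) demands a closing '>' only when 'lt' matched.
pattern = re.compile(r'(?P<lt><)?(?P<email>\w+@\w+)(?(lt)>)')

for text in ['foo@bar', '<foo@bar>', '<foo@bar']:
    m = pattern.fullmatch(text)  # the whole string must match
    print(repr(text), '->', m.group('email') if m else None)
```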

Use two group names, one for each alternative form. If you don't care
which alternative matched, do something like the following:

>>> import re
>>> s1 = 'foo@bar'
>>> s2 = '<foo@bar>'
>>> matchobj = re.search(r'<(?P<email1>\w+@\w+)>|(?P<email2>\w+@\w+)', s1)
>>> matchobj.groupdict()['email1'] or matchobj.groupdict()['email2']
'foo@bar'
>>> matchobj = re.search(r'<(?P<email1>\w+@\w+)>|(?P<email2>\w+@\w+)', s2)
>>> matchobj.groupdict()['email1'] or matchobj.groupdict()['email2']
'foo@bar'
>>>
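
If you would rather not spell out every alternative name, the match object's lastgroup attribute gives the name of the last capturing group that matched. Because each alternative in this pattern contains exactly one named group, it identifies which branch was taken. A minimal sketch assuming Python 3 and the same two-group pattern; extract_email and EMAIL_RE are hypothetical names chosen for illustration.

```python
import re

# Hypothetical name: the two-group pattern from the reply,
# one differently named group per alternative.
EMAIL_RE = re.compile(r'<(?P<email1>\w+@\w+)>|(?P<email2>\w+@\w+)')


def extract_email(text):
    """Return the address from 'foo@bar' or '<foo@bar>', or None.

    extract_email is a hypothetical helper used only for illustration.
    """
    m = EMAIL_RE.search(text)
    if m is None:
        return None
    # m.lastgroup names the last capturing group that matched; with one
    # group per alternative, that is the group of the branch taken.
    return m.group(m.lastgroup)


for s in ['foo@bar', '<foo@bar>', 'no address here']:
    print(repr(s), '->', extract_email(s))
```

This relies on each alternative holding exactly one named capturing group; with more groups per branch, the explicit `or` chain above is the safer choice.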

- Paddy.




More information about the Python-list mailing list