make RE more clever to avoid inappropriate: sre_constants.error: redefinition of group name

aspineux aspineux at gmail.com
Fri Mar 30 08:44:27 EDT 2007


On 30 mar, 00:13, "Paddy" <paddy3... at googlemail.com> wrote:
> On Mar 29, 3:22 pm, "aspineux" <aspin... at gmail.com> wrote:
>
>
>
> > I want to parse
>
> > 'foo@bar' or '<foo@bar>' and get the email address foo@bar
>
> > the regex is
>
> > r'<\w+@\w+>|\w+@\w+'
>
> > now, I want to give it a name
>
> > r'<(?P<email>\w+@\w+)>|(?P<email>\w+@\w+)'
>
> > sre_constants.error: redefinition of group name 'email' as group 2;
> > was group 1
>
> > BUT because I use a |, only one of the two groups named 'email' can ever match!
>
> > Any comment ?
>
> > PS: I know the solution for this case is to use
> > r'(?P<lt><)?(?P<email>\w+@\w+)(?(lt)>)'
>
> Use two group names, one for each alternative, and if you don't care
> which one matched, do something like the following:
>
The problem is the way I create this regex :-)

regex={}
regex['email']=r'(?P<email1>\w+@\w+)'

path=r'<%(email)s>|%(email)s' % regex

Once more, the original question is:
Is it normal to get an error when the same group name is used on both
sides of a | ?
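
For reference, here is a minimal sketch that reproduces the error with a
pattern built by substitution, next to the conditional form from the PS. The
variable names (fragment, pattern, conditional) are only illustrative, and
the sketch assumes the addresses contain a literal '@'.

```python
import re

# One named fragment substituted on both sides of the alternation.
fragment = r'(?P<email>\w+@\w+)'
pattern = r'<%(email)s>|%(email)s' % {'email': fragment}

try:
    re.compile(pattern)
    error_message = None
except re.error as exc:
    # The re module rejects the duplicate name at compile time, even though
    # only one side of the | can ever match.
    error_message = str(exc)

# Conditional form: the closing '>' is required only if '<' was matched.
conditional = re.compile(r'(?P<lt><)?(?P<email>\w+@\w+)(?(lt)>)')
bare = conditional.search('foo@bar').group('email')
bracketed = conditional.search('<foo@bar>').group('email')
```

So the check is done on the names alone when the pattern is compiled, and the
conditional form avoids it by having a single group named 'email'.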

>
>
> >>> s1 = 'foo@bare'
> >>> s2 = '<foo@bare>'
> >>> matchobj = re.search(r'<(?P<email1>\w+@\w+)>|(?P<email2>\w+@\w+)', s1)
> >>> matchobj.groupdict()['email1'] or matchobj.groupdict()['email2']
> 'foo@bare'
> >>> matchobj = re.search(r'<(?P<email1>\w+@\w+)>|(?P<email2>\w+@\w+)', s2)
> >>> matchobj.groupdict()['email1'] or matchobj.groupdict()['email2']
> 'foo@bare'
>
> - Paddy.
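
The two-name approach can also be combined with a pattern built by
substitution, as a rough sketch: keep the shared fragment unnamed and put the
names in the template. EMAIL and find_email are hypothetical names used only
for this illustration.

```python
import re

# Shared fragment without a group name, so it can be substituted many times.
EMAIL = r'\w+@\w+'

# Each alternative gets its own group name, so nothing is redefined.
pattern = re.compile(
    r'<(?P<email1>%(email)s)>|(?P<email2>%(email)s)' % {'email': EMAIL}
)

def find_email(text):
    """Return the address from whichever alternative matched, or None."""
    match = pattern.search(text)
    if match is None:
        return None
    groups = match.groupdict()
    # Only one of the two groups participates in a match; the other is None.
    return groups['email1'] or groups['email2']
```

With this, find_email('foo@bare') and find_email('<foo@bare>') both give the
bare address, and a string without an address gives None.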
