make RE more clever to avoid inappropriate: sre_constants.error: redefinition of group name

Paddy paddy3118 at googlemail.com
Fri Mar 30 12:19:49 EDT 2007


On Mar 30, 1:44 pm, "aspineux" <aspin... at gmail.com> wrote:
> On 30 mar, 00:13, "Paddy" <paddy3... at googlemail.com> wrote:
>
> > On Mar 29, 3:22 pm, "aspineux" <aspin... at gmail.com> wrote:
>
> > > I want to parse
>
> > > 'foo@bar' or '<foo@bar>' and get the email address foo@bar
>
> > > the regex is
>
> > > r'<\w+@\w+>|\w+@\w+'
>
> > > now, I want to give it a name
>
> > > r'<(?P<email>\w+@\w+)>|(?P<email>\w+@\w+)'
>
> > > sre_constants.error: redefinition of group name 'email' as group 2;
> > > was group 1
>
> > > BUT because I use a | , I will get only one group named 'email' !
>
> > > Any comment ?
>
> > > PS: I know the solution for this case is to use
> > > r'(?P<lt><)?(?P<email>\w+@\w+)(?(lt)>)'
>
> > use two group names, one for each alternate form and if you are not
> > concerned with whichever matched do something like the following:
>
> The problem is the way I create this regex :-)
>
> regex={}
> regex['email']=r'(?P<email1>\w+@\w+)'
>
> path=r'<%(email)s>|%(email)s' % regex
>
> Once more, the original question is :
> Is it normal to get an error when the same id is used on both sides
> of a | ?
>
>
>
> > >>> import re
> > >>> s1 = 'foo@bare'
> > >>> s2 = '<foo@bare>'
> > >>> matchobj = re.search(r'<(?P<email1>\w+@\w+)>|(?P<email2>\w+@\w+)', s1)
> > >>> matchobj.groupdict()['email1'] or matchobj.groupdict()['email2']
> > 'foo@bare'
> > >>> matchobj = re.search(r'<(?P<email1>\w+@\w+)>|(?P<email2>\w+@\w+)', s2)
> > >>> matchobj.groupdict()['email1'] or matchobj.groupdict()['email2']
> > 'foo@bare'
>
> > - Paddy.
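
For the template-built case quoted above, one way around the clash might be
to generate the named group on demand so that each use of the fragment gets
its own name. This is only a rough sketch: the helper names `named` and
`find_email`, and the constant `EMAIL`, are made up for illustration and
are not part of the re module.

```python
import re

# Bare fragment without any group name (hypothetical constant).
EMAIL = r'\w+@\w+'

def named(name, body=EMAIL):
    """Wrap a regex fragment in a named group (hypothetical helper)."""
    return '(?P<%s>%s)' % (name, body)

# Each alternative gets a distinct group name, so compilation succeeds.
pattern = re.compile('<%s>|%s' % (named('email1'), named('email2')))

def find_email(text):
    """Return the address from whichever alternative matched, else None."""
    m = pattern.search(text)
    if m is None:
        return None
    # Only one of the two groups can be set for a given match.
    return m.group('email1') or m.group('email2')
```

The caller never has to know which alternative matched; the `or` picks the
one group that is not None.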

Groups are numbered left-to-right irrespective of the expression
contents. I am quite happy with the names being merely a pseudonym for
the positional group number, and I don't see a problem with not
allowing multiple occurrences of the same group name.
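
A small sketch of what I mean, using the two-name pattern from earlier in
the thread. The compiled pattern's `groupindex` attribute shows each name
mapped to its positional number, and reusing a name is rejected at compile
time. The variable names here are my own, and I catch `re.error`, which is
how current Pythons expose the exception.

```python
import re

pattern = re.compile(r'<(?P<email1>\w+@\w+)>|(?P<email2>\w+@\w+)')

# Names are just aliases for positions counted left to right.
name_to_number = dict(pattern.groupindex)

m = pattern.search('foo@bar')
# Looking a group up by name or by number gives the same result.
same_by_name_and_number = m.group('email2') == m.group(2)

# Reusing one name on both sides of the | fails when compiling.
try:
    re.compile(r'<(?P<email>\w+@\w+)>|(?P<email>\w+@\w+)')
    message = None
except re.error as exc:
    message = str(exc)
```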
I did see an article about REs and their speed. It seems that if
Python's RE package distinguished between 'grep style' REs and the full
set of Python REs, then there are much faster and more efficient
algorithms available for the grep style subset.

- Paddy.



