regex, split and ()

Alex Martelli aleaxit at yahoo.com
Wed Dec 27 12:32:45 EST 2000


"Jacek Popławski" <jp at ulgo.koti.com.pl> wrote in message
news:slrn94k8rq.18t.jp at localhost.localdomain...
    [snip]
> >>> w=re.compile(r'(<[^<>]*(".*")?>)')
> >>> s="<html> one <br> two <img src=\"<blah\"> three </html>"
> >>> w.split(s)
> ['', '<html>', None, ' one ', '<br>', None, ' two ', '<img src="<blah">',
> '"<blah"', ' three ', '</html>', None, '']
>
> works well, but why does it double everything? Probably because I used a
> second (),

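Right. split() returns the text of every capturing group in the pattern,
so the inner group adds its own entry after each match (None wherever it
did not take part in the match).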

> how to fix it?

Use non-capturing parentheses (?: ... ) for the inner group, for example:

>>> w=re.compile(r'(<[^<>]*(?:".*")?>)')
>>> s="<html> one <br> two <img src=\"<blah\"> three </html>"
>>> w.split(s)
['', '<html>', ' one ', '<br>', ' two ', '<img src="<blah">', ' three ',
'</html>', '']
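
The same comparison can be sketched as a standalone script. This is only
an illustration, assuming a current Python 3 interpreter; the variable
names are made up for the example and do not come from the original post.

```python
import re

s = '<html> one <br> two <img src="<blah"> three </html>'

# Capturing inner group: split() emits one extra entry per match for it
# (None whenever the optional group did not take part in the match).
capturing = re.compile(r'(<[^<>]*(".*")?>)')
with_extras = capturing.split(s)

# Non-capturing inner group: only the outer group's text is emitted.
non_capturing = re.compile(r'(<[^<>]*(?:".*")?>)')
tags_and_text = non_capturing.split(s)

print(capturing.groups, with_extras)
print(non_capturing.groups, tags_and_text)
```

The .groups attribute of a compiled pattern reports how many capturing
groups it has (2 and 1 here), which is a quick way to check how many
extra entries split() will add per match.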



Alex





