small problem with re.sub

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Wed Jan 30 22:49:33 EST 2008


On Thu, 31 Jan 2008 01:01:30 -0200, Astan Chee <stanc at al.com.au> wrote:

> I have HTML text stored as a string. Now I want to go through this
> string, find all 6-digit numbers, and make links from them.
> I'm using re.sub, and for some reason it's not picking up the previously
> matched condition. Am I doing something wrong? This is what my code
> looks like:
> htmlStr = re.sub('(?P<id>\d{6})','<a href=\"http://linky.com/(?P=id).html\">(?P=id)</a>',htmlStr)
> It seems that it replaces it all right, but it replaces it literally. Am I
> not escaping certain characters?

Two errors:
- Use raw strings r"..." to write regular expressions (or double every
  backslash).
- To *substitute* a named group, the syntax is "\g<name>"; you used
  "(?P=name)", which is the syntax to *match* a named group inside a pattern.
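
To illustrate the second point, here is a small sketch (the sample strings are invented for the example): (?P=name) is a backreference that only means something inside a pattern, while in a replacement string it is copied through literally.

```python
import re

# In a pattern, (?P=d) matches the same text that the group named "d" captured.
doubled = re.search(r'(?P<d>\d)(?P=d)', 'a12 b33 c45')
print(doubled.group())

# In a replacement string, (?P=id) has no special meaning and is kept as-is.
literal = re.sub(r'(?P<id>\d{6})', r'[(?P=id)]', 'n 123456')
print(literal)
```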

re.sub(r'(?P<id>\d{6})',r'<a href="http://linky.com/\g<id>.html">\g<id></a>',htmlStr)
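
A minimal, self-contained sketch of the named-group substitution, for anyone who wants to try it. The sample input and the names pattern, replacement and linked are made up for illustration; linky.com is the placeholder host from the original question.

```python
import re

# Hypothetical sample input; the original post does not show the real HTML.
htmlStr = "<p>Tickets 123456 and 654321 are open.</p>"

# (?P<id>...) names the group in the pattern; \g<id> refers to it in the
# replacement. Both strings are raw, so the backslashes reach re unchanged.
pattern = r'(?P<id>\d{6})'
replacement = r'<a href="http://linky.com/\g<id>.html">\g<id></a>'

# re.sub returns a new string; htmlStr itself is not modified.
linked = re.sub(pattern, replacement, htmlStr)
print(linked)
```

Note that re.sub returns the result rather than changing its argument, so it has to be assigned back (as in the original question) if htmlStr should hold the linked text.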

In simple cases like this, it may be easier to just use a group number:

re.sub(r'(\d{6})',r'<a href="http://linky.com/\1.html">\1</a>',htmlStr)
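
A short sketch of the numbered form, again with made-up sample input. It also shows one case where the \g<1> spelling is still needed: when a literal digit directly follows the reference, \10 would be read as group 10, whereas \g<1>0 means group 1 followed by "0".

```python
import re

# Hypothetical sample input for illustration.
htmlStr = "See 123456 for details."

# \1 refers to the first (and here the only) capturing group.
linked = re.sub(r'(\d{6})', r'<a href="http://linky.com/\1.html">\1</a>', htmlStr)
print(linked)

# When a digit directly follows the reference, use \g<1> to avoid ambiguity.
padded = re.sub(r'(\d{6})', r'\g<1>0', htmlStr)
print(padded)
```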

-- 
Gabriel Genellina



