Questions about regex

Rob Williscroft rtw at freenet.co.uk
Sat May 30 10:45:34 EDT 2009


 wrote in news:fe9f707f-aaf3-4ca6-859a-5b0c63904fc0
@s28g2000vbp.googlegroups.com in comp.lang.python:


>      text = re.sub('(\<(/?[^\>]+)\>)', "", text)#remove the HTML
> 

Python has raw string literals, written with an r prefix, which are handy for regexes:

  text = re.sub( r'(\<(/?[^\>]+)\>)', "", text )

In raw strings Python doesn't process backslash escape sequences,
so r'\n' is the 2-character string '\\n' (a backslash followed by an 'n').
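A quick interactive check of the difference (a small sketch; the variable names are only for illustration):

```python
# A normal string literal turns backslash-n into a single newline character.
plain = '\n'
# A raw string literal keeps the backslash and the 'n' as two characters.
raw = r'\n'

print(len(plain))          # 1
print(len(raw))            # 2
print(raw == '\\n')        # True
print(plain == chr(10))    # True
```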

Without that, your pattern string would need to be written as:

  '(\\<(/?[^\\>]+)\\>)'

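IOW, backslashes need to be doubled up, or Python will try to treat them
as string escape sequences before the pattern is passed to re.sub.
('\<' and '\>' happen not to be valid string escapes, so here they
survive unchanged, but newer Python versions warn about such invalid
escapes.)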
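A quick way to convince yourself that the two spellings give the same pattern (a sketch; the sample HTML string is made up for illustration):

```python
import re

# The same pattern written two ways: as a raw string, and as a normal
# string with every backslash doubled.
raw_pattern = r'(\<(/?[^\>]+)\>)'
doubled_pattern = '(\\<(/?[^\\>]+)\\>)'

# Both literals produce exactly the same characters,
# so re.sub sees an identical pattern either way.
print(raw_pattern == doubled_pattern)     # True

html = '<p>Hello <b>world</b></p>'
print(re.sub(raw_pattern, '', html))      # Hello world
print(re.sub(doubled_pattern, '', html))  # Hello world
```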

Also, this looks like a non-Python dialect of regular expression
language (in some dialects, such as GNU grep's, \< and \> are
word-boundary anchors). Python's re module doesn't need < and > to
be escaped.
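In Python's re, a backslash in front of a punctuation character such as < or > just means "match this character literally", so the escapes are harmless but redundant. A small sketch comparing the two forms (the input string is made up for illustration):

```python
import re

html = '<a href="x">link</a> and <br/>'

# Escaped and unescaped versions of the same tag-matching pattern.
escaped = re.findall(r'\<[^\>]+\>', html)
unescaped = re.findall(r'<[^>]+>', html)

print(unescaped)              # ['<a href="x">', '</a>', '<br/>']
print(escaped == unescaped)   # True
```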

http://docs.python.org/library/re.html

The grouping operators, '(' and ')', appear to be unnecessary,
so altogether this one line should probably be:

  text = re.sub( r'</?[^>]+>', '', text )
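Wrapped in a small helper (the name strip_tags is hypothetical, chosen for illustration, and is not part of the re module), the simplified line behaves like this. It also shows why the groups could go: re.sub replaces the whole match whether or not the pattern has groups.

```python
import re

def strip_tags(text):
    """Remove anything that looks like an HTML tag: '<', an optional '/',
    then one or more characters other than '>', then '>'."""
    return re.sub(r'</?[^>]+>', '', text)

sample = '<p>Some <em>emphasised</em> text.</p>'
print(strip_tags(sample))   # Some emphasised text.

# The original, grouped and escaped pattern gives the same result,
# because re.sub replaces the entire match, not just a group.
grouped = re.sub(r'(\<(/?[^\>]+)\>)', '', sample)
print(grouped == strip_tags(sample))   # True
```

This is a quick heuristic rather than an HTML parser: a '>' inside an attribute value, for example, would end the match early.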

Rob.
-- 
http://www.victim-prime.dsl.pipex.com/



More information about the Python-list mailing list