Easy way to remove HTML entities from an HTML document?

Christopher T King squirrel at WPI.EDU
Tue Jul 27 09:23:59 EDT 2004


On Mon, 26 Jul 2004, Robert Oschler wrote:

> I believe the line that reads:
> 
> def converthtml(s):
>       return re.sub(r'&(#?)(.+?);',convert,s)
> 
> Should read:
> 
> def converthtml(s):
>       return re.sub(r'&(#?)(.+?);',convertentity,s)

Oops, you're right, mea culpa :)

> So you can pass a function to re.sub() as the replacement pattern?  Very
> cool, I didn't know that.  I think you could spend a year just learning
> regular expressions and still miss something.

That feature is only mentioned briefly in the online docs, and not at all 
in sre.sub's docstring.  Surprising, since it's indeed a very useful 
feature.
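
To illustrate, here is a minimal sketch of a function used as the
replacement argument.  The regex and the names converthtml and
convertentity come from the quoted code above; the body of convertentity
is only an assumption about what such a callback might look like (written
for a current Python 3, using html.entities.name2codepoint from the
standard library), not the code originally posted.  It handles named and
decimal numeric entities only and leaves anything it doesn't recognize
untouched.

```python
import re
from html.entities import name2codepoint

def convertentity(m):
    """Hypothetical callback: return the character for one matched entity."""
    if m.group(1) == '#':
        # Numeric reference such as &#62; (decimal only in this sketch)
        try:
            return chr(int(m.group(2)))
        except (ValueError, OverflowError):
            return m.group(0)  # not a valid number: leave the text as-is
    # Named entity such as &amp;
    codepoint = name2codepoint.get(m.group(2))
    if codepoint is None:
        return m.group(0)  # unknown name: leave the text as-is
    return chr(codepoint)

def converthtml(s):
    # re.sub calls convertentity once per match, passing the match object,
    # and substitutes whatever string the function returns.
    return re.sub(r'&(#?)(.+?);', convertentity, s)

print(converthtml('a &lt; b &amp;&amp; c &#62; d'))  # prints: a < b && c > d
```

The callback receives the match object rather than the matched string, so
it can inspect the individual groups (here, whether the '#' was present)
before deciding what to return.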



