Easy way to remove HTML entities from an HTML document?
Christopher T King
squirrel at WPI.EDU
Tue Jul 27 09:23:59 EDT 2004
On Mon, 26 Jul 2004, Robert Oschler wrote:
> I believe the line that reads:
>
> def converthtml(s):
>     return re.sub(r'&(#?)(.+?);',convert,s)
>
> Should read:
>
> def converthtml(s):
>     return re.sub(r'&(#?)(.+?);',convertentity,s)
Oops, you're right, mea culpa :)
> So you can pass a function to re.sub() as the replacement pattern? Very
> cool, I didn't know that. I think you could spend a year just learning
> regular expressions and still miss something.
That feature is only mentioned briefly in the online docs, and not at all
in sre.sub's docstring. Surprising, since it's indeed a very useful
feature.
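
For illustration, here is a minimal sketch of the callback form of re.sub() discussed above, written for Python 3. The names converthtml and convertentity and the regular expression come from the quoted code; the body of convertentity is an assumption, since the original function is not shown in this message, and the use of html.entities.name2codepoint is likewise a choice made for this sketch. re.sub() calls the function once per match, passes it the match object, and substitutes whatever string it returns.

```python
import re
from html.entities import name2codepoint

def convertentity(m):
    """Replacement callback: takes a match object, returns the replacement text."""
    is_numeric, body = m.group(1), m.group(2)
    try:
        if is_numeric:
            # Numeric reference: hexadecimal (&#x41;) or decimal (&#65;)
            if body[:1] in ('x', 'X'):
                return chr(int(body[1:], 16))
            return chr(int(body))
        # Named entity such as &amp; or &lt;
        return chr(name2codepoint[body])
    except (KeyError, ValueError, OverflowError):
        # Unknown or malformed entity: leave the original text untouched
        return m.group(0)

def converthtml(s):
    # The second argument is a function rather than a replacement string
    return re.sub(r'&(#?)(.+?);', convertentity, s)

print(converthtml('&lt;b&gt; &amp; &#65;&#x42;'))
```

Returning m.group(0) in the fallback branch keeps unrecognized entities as they were instead of raising an exception partway through the document.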