URLs and ampersands

Richard Brodie R.Brodie at rl.ac.uk
Tue Aug 5 07:43:22 EDT 2008


"Steven D'Aprano" <steve at REMOVE-THIS-cybersource.com.au> wrote in message 
news:00a78f7e$0$20302$c3e8da3 at news.astraweb.com...

> I could just do a string replace, but is there a "right" way to escape
> and unescape URLs?

The right way is to parse your HTML with an HTML parser. URLs are not
exempt from the normal HTML escaping rules (an ampersand inside an href
attribute has to be written as &amp;), although an awful lot of pages
get this wrong.
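
To make the escaping rule concrete, here is a minimal sketch using the
standard library's html module (Python 3). The URL is a made-up example,
not one from the original question, and this only illustrates the rule
itself; it is not a substitute for parsing the page properly.

```python
from html import escape, unescape

# Hypothetical URL with two query parameters, for illustration only.
raw_url = "http://example.com/search?q=python&page=2"

# Inside an HTML attribute the ampersand has to be written as &amp;.
escaped_url = escape(raw_url, quote=True)
link = '<a href="{}">results</a>'.format(escaped_url)

# Unescaping the attribute value gives the original URL back.
round_tripped = unescape(escaped_url)

print(link)
print(round_tripped == raw_url)
```

The browser (or any conforming parser) does the unescaping step for you,
which is why the escaped form in the page source still points at the
plain URL.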

You didn't post any code, so it's hard to tell, but something like
ElementTree or lxml may be a better tool than the ones you are currently using.
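
As a rough sketch of what that could look like with ElementTree (Python 3
syntax): the fragment and URLs below are invented for illustration, and
the fragment is assumed to be well-formed XML, which real-world HTML
often is not. For messy pages, lxml's HTML parser is the more forgiving
choice.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment; note the &amp; in the source text.
fragment = (
    '<p><a href="http://example.com/search?q=python&amp;page=2">'
    "results</a></p>"
)

root = ET.fromstring(fragment)

# The parser hands back attribute values already unescaped.
hrefs = [a.get("href") for a in root.iter("a")]
print(hrefs)

# Set a new URL using a plain ampersand; the serializer escapes it again.
root.find("a").set("href", "http://example.com/search?q=lxml&page=3")
serialized = ET.tostring(root, encoding="unicode")
print(serialized)
```

The point is that your own code only ever sees and writes plain URLs;
escaping and unescaping happen at the parse and serialize boundaries, so
no string replace is needed.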




