URLs and ampersands

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Tue Aug 5 19:21:22 EDT 2008


On Tue, 05 Aug 2008 12:07:39 +0000, Duncan Booth wrote:

> Whenever you put a URL into an HTML file you need to escape it, so
> naturally you will also need to unescape it when it is retrieved from
> the file. However, whatever you use to parse the HTML ought to be
> unescaping text and attributes as part of the parsing process, so you
> shouldn't need a separate function for this.

...

> Even Python's builtin HTMLParser class will do this for you. What parser
> are you using?

A regex.

I know, I know, now I have two problems :-)

It's a quick and dirty hack, not a production piece of code, and I have a
quick and dirty fix by just using url.replace('&amp;', '&').
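
For the record, a minimal sketch of that quick fix next to a more general
alternative. It assumes a current Python 3, where the standard library
provides html.unescape(); the example.com URL and the variable names are
made up purely for illustration.

```python
import html

# Hypothetical URL as it would appear inside an HTML attribute,
# with the ampersand escaped as &amp;
escaped = "http://example.com/search?q=python&amp;page=2"

# The quick and dirty fix: handles only the named &amp; entity.
quick = escaped.replace("&amp;", "&")

# A more general approach: html.unescape() also decodes numeric
# references such as &#38; and other named entities.
general = html.unescape(escaped)

print(quick)
print(general)
```

Both calls give the same result for this input; the difference only shows
up when the page uses some other spelling of the ampersand, such as &#38;.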

Thanks to everybody who replied. I guess I really have to bite the bullet 
and learn how to use a proper HTML parser.
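
As a starting point, here is a rough sketch of what the parser-based
approach might look like. It assumes Python 3, where the HTMLParser class
lives in html.parser (in Python 2 the module was called HTMLParser). The
LinkCollector class and the sample page are hypothetical names of my own,
not anything from the original script.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; the parser has
        # already replaced entity references in each value.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


# Hypothetical page fragment with an escaped ampersand in the URL.
page = '<p><a href="http://example.com/search?q=python&amp;page=2">results</a></p>'

collector = LinkCollector()
collector.feed(page)
collector.close()
print(collector.links)
```

The point is that the URL comes out of the parser already unescaped, so no
separate replace() step is needed.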



-- 
Steven


