Saving search results in a dictionary

Duncan Booth me at privacy.net
Thu Jun 17 10:58:23 EDT 2004


Lukas Holcik <xholcik1 at fi.muni.cz> wrote in
news:Pine.LNX.4.60.0406171557330.16166 at nymfe30.fi.muni.cz: 

> Or how can I replace the html &entities; in a string 
> "blablabla&blablabal&balbalbal" with the chars they mean using
> re.sub? I found out they are stored in a dict [from htmlentitydefs
> import htmlentitydefs]. I thought about this functionality:

You really don't want to use a regex for this. Remember that as well as 
named forms like &amp; you can equally use hex escapes such as &#x26;.
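
To make that concrete, here is a small sketch showing three spellings of
the same character decoding to the same result. It assumes a current
Python 3, where the standard library provides html.unescape; that function
is my substitution and did not exist when this thread was written. The
sample strings are made up for illustration.

```python
from html import unescape

# Three different ways of writing an ampersand in HTML source:
# a named entity, a decimal character reference, and a hex one.
samples = [
    "fish &amp; chips",
    "fish &#38; chips",
    "fish &#x26; chips",
]

# unescape() handles all three forms, which a hand-written regex keyed
# only on the named-entity dict would miss.
decoded = [unescape(s) for s in samples]

for before, after in zip(samples, decoded):
    print(before, "->", after)
```

A regex that only looks names up in the entity dict covers the first line
and silently leaves the other two untouched.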

I suggest you consider parsing your HTML using sgmllib as that will 
automatically handle all the entity definitions without you having to worry 
about them.

Likewise your question about extracting all the links in a single pass is 
much easier to do reliably if you use sgmllib than with a regular 
expression.
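
As a rough sketch of the parser approach: sgmllib is no longer available in
Python 3, so the example below swaps in html.parser.HTMLParser from the
standard library, which plays a similar role. The class name LinkCollector,
the helper parse() and the sample markup are all hypothetical names chosen
for illustration, not anything from the original question.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values and text content in a single pass."""

    def __init__(self):
        # convert_charrefs=True makes the parser decode entities in
        # text before handing it to handle_data().
        super().__init__(convert_charrefs=True)
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; entity references in
        # attribute values have already been decoded by the parser.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)

    def handle_data(self, data):
        self.text.append(data)


def parse(html_source):
    """Return (links, text) for a string of HTML (hypothetical helper)."""
    parser = LinkCollector()
    parser.feed(html_source)
    parser.close()
    return parser.links, "".join(parser.text)


sample = (
    '<p>Tom &amp; Jerry: <a href="/a?x=1&amp;y=2">first</a> and '
    '<a href="/b">second &#x26; last</a></p>'
)
links, text = parse(sample)
print(links)
print(text)
```

The links and the decoded text both come out of one feed() call, and no
entity table has to be consulted by hand.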


