Saving search results in a dictionary
Duncan Booth
me at privacy.net
Thu Jun 17 10:58:23 EDT 2004
Lukas Holcik <xholcik1 at fi.muni.cz> wrote in
news:Pine.LNX.4.60.0406171557330.16166 at nymfe30.fi.muni.cz:
> Or how can I replace the html &entities; in a string
> "blablabla&blablabal&balbalbal" with the chars they mean using
> re.sub? I found out they are stored in a dict [from htmlentitydefs
> import htmlentitydefs]. I thought about this functionality:
You really don't want to use a regex for this. Remember that as well as
forms like &amp; you can equally use hex escapes such as &#x26;.
I suggest you consider parsing your HTML using sgmllib as that will
automatically handle all the entity definitions without you having to worry
about them.
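
To illustrate the parser-based approach, here is a minimal sketch. It
assumes Python 3, where sgmllib no longer exists, so it swaps in
html.parser.HTMLParser from the standard library instead. The names
TextCollector and decode_entities are made up for this example.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect the text of a document with entities already decoded."""

    def __init__(self):
        # convert_charrefs=True makes the parser turn named, decimal and
        # hex references into characters before calling handle_data().
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def decode_entities(markup):
    """Return the text content of markup with character references resolved."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.chunks)


if __name__ == "__main__":
    print(decode_entities("fish &amp; chips &#x26; peas &#38; <b>salt</b>"))
```

Note that this also drops the tags themselves, which may or may not be
what you want; the point is only that the parser deals with every form
of character reference so your own code never has to.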
Likewise your question about extracting all the links in a single pass is
much easier to do reliably if you use sgmllib than with a regular
expression.
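
A rough sketch of link extraction in the same style follows. Again this
assumes Python 3 and uses html.parser.HTMLParser in place of sgmllib;
LinkCollector and extract_links are hypothetical names, not part of any
library.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a single pass."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names and decodes
        # entities inside attribute values before calling this method.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value is not None:
                self.links.append(value)


def extract_links(markup):
    """Return the list of href values found in markup, in document order."""
    parser = LinkCollector()
    parser.feed(markup)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = '<p><a href="a.html?x=1&amp;y=2">A</a> <A HREF=\'b.html\'>B</A></p>'
    for link in extract_links(page):
        print(link)
```

Unlike a regular expression, this copes with upper-case tags, either
style of quoting, and entities inside the URL without extra effort.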