HTML to dictionary

bearophileHUGS at lycos.com
Tue Feb 27 06:11:37 EST 2007


Tina I:
> I have a small, probably trivial even, problem. I have the following HTML:

This is a little data munging problem.
If it's a one-shot problem, you can just load the page in a browser,
copy and paste it as text, and then process the lines of the text in a
simple way (splitting each line at ":" and feeding the stripped
key/value pairs to a dict).
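
For the copy-and-paste route, a minimal sketch might look like the
following. The name text_to_dict and the sample text are made up for
illustration (the original HTML is not quoted here), and it assumes one
"key: value" pair per line.

```python
def text_to_dict(text):
    """Build a dict from lines of the form 'key: value'."""
    result = {}
    for line in text.splitlines():
        # Skip blank lines and anything without a separator.
        if ":" not in line:
            continue
        # Split on the first ":" only, so values may contain colons.
        key, value = line.split(":", 1)
        result[key.strip()] = value.strip()
    return result


# Hypothetical pasted text, not the data from the original question.
pasted = """
Name: Example
Time: 06:11
"""
print(text_to_dict(pasted))
```

Splitting on the first ":" only is a deliberate choice here: a value
such as a time or a URL would otherwise be cut into extra pieces.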

If there are more HTML files, or you want to automate things further, you
can use html2text:
http://www.aaronsw.com/2002/html2text/

A little script like this may help you:

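from html2text import html2text

# the_html_data is the HTML source as a string
txt = html2text(the_html_data)
lines = txt.replace("**", "").strip().splitlines()
# Split on the first ":" only, and skip lines that have no ":" at all
fields = [[part.strip() for part in line.split(":", 1)]
          for line in lines if ":" in line]
print(dict(fields))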

Note that splitlines() is tricky: if you run into problems, you
may want a smarter splitter.
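
One possible reading of "smarter splitter", sketched under assumptions:
split on "\n" only, and treat a line without ":" as the continuation of
the previous value (for example when a long value has been wrapped).
The name split_fields is hypothetical, and a continuation line that
itself contains ":" would still be taken for a new field.

```python
def split_fields(text):
    """Return [key, value] pairs, joining wrapped lines onto the previous value."""
    pairs = []
    # Split on "\n" only; str.splitlines() also breaks on other separators.
    for line in text.replace("\r\n", "\n").split("\n"):
        line = line.strip()
        if not line:
            continue
        if ":" in line:
            # Split on the first ":" only, so values may contain colons.
            key, value = line.split(":", 1)
            pairs.append([key.strip(), value.strip()])
        elif pairs:
            # No separator: assume this line continues the previous value.
            pairs[-1][1] = (pairs[-1][1] + " " + line).strip()
    return pairs
```

The result can be passed to dict() just like the fields list in the
script above.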

Bye,
bearophile



