text representation of HTML
Duncan Booth
duncan.booth at invalid.invalid
Thu Jul 20 11:12:27 EDT 2006
Ksenia Marasanova wrote:
> I am looking for a library that will give me very simple text
> representation of HTML.
> For example
><div><h1>Title</h1><p>This is a <br />test</p></div>
>
> will be transformed to:
>
> Title
>
> This is a
> test
>
> I want to send a plain-text alternative of an HTML email, and would
> prefer to generate it automatically from the HTML source.
> Any hints?
Use htmllib:
>>> import htmllib, formatter, StringIO
>>> def cleanup(s):
...     out = StringIO.StringIO()
...     # Render the HTML as plain text into the 'out' buffer
...     p = htmllib.HTMLParser(
...         formatter.AbstractFormatter(formatter.DumbWriter(out)))
...     p.feed(s)
...     p.close()
...     # htmllib marks each link as "[n]"; list the collected URLs at the end
...     if p.anchorlist:
...         print >>out
...         for idx, anchor in enumerate(p.anchorlist):
...             print >>out, "\n[%d]: %s" % (idx+1, anchor)
...     return out.getvalue()
...
>>> print cleanup('''<div><h1>Title</h1><p>This is a <br
... />test</p></div>''')
Title
This is a
test
>>> print cleanup('''<div><h1>Title</h1><p>This is a <br />test with <a
... href="http://python.org">a link</a> to the Python homepage</p></div>''')
Title
This is a
test with a link[1] to the Python homepage
[1]: http://python.org
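The htmllib module used above exists only in Python 2, and the formatter module was deprecated in Python 3.4 and removed in 3.10. A rough Python 3 equivalent can be sketched with html.parser.HTMLParser. This is an assumption-laden sketch, not a port of htmllib: the names TextExtractor, html_to_text and build_message are hypothetical, the set of "block" tags is an arbitrary choice, and the output layout only approximates what DumbWriter produces. The build_message helper shows one way to attach the result as the plain-text alternative of an HTML email, using the standard email.message.EmailMessage API.

```python
import re
from email.message import EmailMessage
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical minimal HTML-to-text converter (not htmllib's behaviour)."""

    # Assumed set of tags that start a new paragraph in the text output
    BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6",
                  "ul", "ol", "li", "table", "tr", "blockquote"}
    # Content of these tags is dropped entirely
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes &amp; etc.
        self.parts = []     # text fragments in document order
        self.links = []     # collected hrefs, numbered from 1
        self._href = None   # href of the <a> currently open, if any
        self._skip = 0      # nesting depth inside script/style

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip += 1
        elif tag == "br":
            self.parts.append("\n")
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n\n")
        elif tag == "a":
            self._href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS:
            if self._skip:
                self._skip -= 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n\n")
        elif tag == "a" and self._href:
            # Mark the link text with "[n]", like htmllib does
            self.links.append(self._href)
            self.parts.append("[%d]" % len(self.links))
            self._href = None

    def handle_data(self, data):
        if not self._skip:
            # Collapse whitespace runs the way a browser would
            self.parts.append(re.sub(r"\s+", " ", data))


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Strip each line and keep at most one blank line in a row
    out = []
    for line in "".join(parser.parts).splitlines():
        line = line.strip()
        if line or (out and out[-1]):
            out.append(line)
    while out and not out[-1]:
        out.pop()
    text = "\n".join(out)
    if parser.links:
        text += "\n\n" + "\n".join(
            "[%d]: %s" % (i, url) for i, url in enumerate(parser.links, 1))
    return text


def build_message(html):
    # Hypothetical helper: plain-text part first, HTML alternative second
    msg = EmailMessage()
    msg.set_content(html_to_text(html))
    msg.add_alternative(html, subtype="html")
    return msg
```

With the first example from the question, html_to_text returns "Title", a blank line, "This is a" and "test". The message returned by build_message is multipart/alternative, with the text/plain part listed before the text/html part so that mail clients prefer the HTML version when they can display it.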