HTML to Text renderer

Marc Christiansen tolot at jupiter.solar-empire.de
Mon Nov 8 20:12:53 EST 2004


Ian Bicking <ianb at colorstudy.com> wrote:
> Robert Brewer wrote:
>> To clarify: you don't want the HTML tags merely stripped; you want to
>> replace e.g. br with a line break and p with, say, two line breaks?
> 
> Right.  And word wrapping too.  Some other tags would also be 
> interesting: <blockquote>, <pre>, <hr>, <table>, &nbsp;, and something 
> to control alignment (e.g., <p align="">).

Have a look at htmllib.HTMLParser and the formatter module in the
standard Python library (and also read the source of htmllib). Between
them they may provide what you need.
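
For anyone reading this on Python 3: htmllib no longer exists there, and
formatter was removed in 3.10. A rough sketch of the same idea, using
html.parser.HTMLParser and textwrap from the current standard library
instead of the modules named above. TextRenderer and render are made-up
names, and the tag handling is only an assumption about what is wanted:
br gives one line break, p two, blockquote indents, pre is kept verbatim,
hr draws a rule. Tables, alignment and &nbsp; are not handled.

```python
import textwrap
from html.parser import HTMLParser


class TextRenderer(HTMLParser):
    """Hypothetical helper: render simple HTML as word-wrapped plain text."""

    def __init__(self, width=70):
        super().__init__(convert_charrefs=True)
        self.width = width
        self.out = []        # finished chunks, each ending in its separator
        self.current = []    # text fragments of the block in progress
        self.depth = 0       # current <blockquote> nesting level
        self.in_pre = False  # True while inside <pre>

    def _flush(self, sep):
        """Emit the pending text followed by sep; do nothing if it is empty."""
        text = "".join(self.current)
        self.current = []
        if self.in_pre:
            # Preformatted text: keep spacing and line breaks as they are.
            text = text.strip("\n")
        else:
            # Normal text: collapse whitespace, then wrap and indent.
            indent = "    " * self.depth
            text = textwrap.fill(" ".join(text.split()), self.width,
                                 initial_indent=indent,
                                 subsequent_indent=indent)
        if text.strip():
            self.out.append(text + sep)

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self._flush("\n")
        elif tag == "hr":
            self._flush("\n\n")
            self.out.append("-" * self.width + "\n\n")
        elif tag in ("p", "blockquote", "pre"):
            self._flush("\n\n")
            if tag == "blockquote":
                self.depth += 1
            elif tag == "pre":
                self.in_pre = True

    def handle_endtag(self, tag):
        if tag in ("p", "blockquote", "pre"):
            self._flush("\n\n")
            if tag == "blockquote":
                self.depth = max(0, self.depth - 1)
            elif tag == "pre":
                self.in_pre = False

    def handle_data(self, data):
        self.current.append(data)


def render(html, width=70):
    """Convert an HTML fragment to plain text wrapped at width columns."""
    parser = TextRenderer(width)
    parser.feed(html)
    parser.close()
    parser._flush("\n\n")  # emit any text not closed by a block tag
    return "".join(parser.out).rstrip() + "\n"


if __name__ == "__main__":
    print(render("<p>Hello <b>world</b>,<br>second line</p><hr><p>Bye</p>",
                 width=20))
```

Inline tags such as b are simply ignored, so their text flows into the
surrounding paragraph. Anything fancier (tables, p align) would need
more state in the parser.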

HTH
  Marc


