How to convert markup text to plain text in python?

Paul McGuire ptmcg at austin.rr.com
Fri Feb 1 12:20:50 EST 2008


On Feb 1, 10:54 am, Tim Chase <python.l... at tim.thechases.com> wrote:
> >> Well, if all you want to do is remove everything from a "<" to a
> >> ">", you can use
>
> >>   >>> s = "<B>Today</B> is <U>Friday</U>"
> >>   >>> import re
> >>   >>> r = re.compile('<[^>]*>')
> >>   >>> print(r.sub('', s))
> >>   Today is Friday
>
> [Tim's ramblings about pathological cases snipped]

pyparsing includes an example script for stripping tags from HTML
source.  See it on the wiki at http://pyparsing.wikispaces.com/space/showimage/htmlStripper.py.
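If pyparsing isn't installed, a rough standard-library alternative is to let Python 3's html.parser do the tokenizing. This is a swapped-in technique, not the pyparsing script itself. Because the parser understands quoted attribute values, a ">" inside an attribute doesn't break it the way the "<" to ">" regex does. The names TagStripper, strip_tags and regex_strip are hypothetical. Dropping <script>/<style> contents is also an assumption about what "plain text" should mean.

```python
import re
from html.parser import HTMLParser

# The regex from the quoted message: delete everything from "<" to ">".
TAG_RE = re.compile(r"<[^>]*>")


def regex_strip(html):
    """Naive tag removal; breaks on a '>' inside a quoted attribute value."""
    return TAG_RE.sub("", html)


class TagStripper(HTMLParser):
    """Collect the text content of an HTML fragment (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; in text data
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, so <SCRIPT> is matched too
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a script/style element
        if not self._skip_depth:
            self._chunks.append(data)

    def get_text(self):
        return "".join(self._chunks)


def strip_tags(html):
    """Return the plain text of an HTML string; comments are dropped too."""
    parser = TagStripper()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.get_text()


if __name__ == "__main__":
    print(strip_tags("<B>Today</B> is <U>Friday</U>"))  # Today is Friday
    tricky = '<a title="1 > 0">link</a>'
    print(repr(regex_strip(tricky)))  # ' 0">link'
    print(repr(strip_tags(tricky)))   # 'link'
```

On the example from the quoted message both approaches agree. With a ">" inside a quoted attribute, only the parser-based version returns the expected "link".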

-- Paul
