How to convert markup text to plain text in python?
Paul McGuire
ptmcg at austin.rr.com
Fri Feb 1 12:20:50 EST 2008
On Feb 1, 10:54 am, Tim Chase <python.l... at tim.thechases.com> wrote:
> >> Well, if all you want to do is remove everything from a "<" to a
> >> ">", you can use
>
> >> >>> s = "<B>Today</B> is <U>Friday</U>"
> >> >>> import re
> >> >>> r = re.compile('<[^>]*>')
> >> >>> print r.sub('', s)
> >> Today is Friday
>
> [Tim's ramblings about pathological cases snipped]
pyparsing includes an example script for stripping tags from HTML
source. See it on the wiki at http://pyparsing.wikispaces.com/space/showimage/htmlStripper.py.
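For a rough idea of what a parser-based stripper looks like without installing anything, here is a minimal sketch using the standard library's html.parser (Python 3 syntax). To be clear, this is not the htmlStripper.py script mentioned above, and the names TagStripper and strip_tags are made up for illustration.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collect only the text between tags (hypothetical helper class)."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for text outside of tags; tags themselves are ignored
        self.chunks.append(data)


def strip_tags(markup):
    """Return markup with tags removed (illustrative name, not a library API)."""
    parser = TagStripper()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return "".join(parser.chunks)


print(strip_tags("<B>Today</B> is <U>Friday</U>"))  # prints: Today is Friday
```

Because the parser tokenizes tags rather than matching "<" up to the next ">", it should cope better with the awkward inputs that trip up the regex. It still passes the contents of script and style elements through as text, so those would need extra handling.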
-- Paul