Converting HTML to ASCII

Mike Meyer mwm at mired.org
Sat Feb 26 20:28:23 EST 2005


Michael Spencer <mahs at telcopartners.com> writes:

> Mike Meyer wrote:
>
>> It also fails on tags with a ">" in a string in the tag. That's
>> well-formed but ill-used HTML.
>>             <mike
> True enough...however, it doesn't fail too horribly:
>   >>> striptags("""<sometag attribute = '>'>the text</sometag>""")
>   "'>the text"
>   >>>

Depends on your example:

<sometag attribute='>' otherattribute='otherstuff' moreattribute='yet
more stuff'>

and so on. Then again, early browsers actually did the same kind of
parsing as you do, so putting a bare ">" in an attribute value is
discouraged.
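
To make the difference concrete, here is a minimal sketch. The
striptags() under discussion is not shown in this message, so
naive_striptags() below is an assumed stand-in that treats a tag as
"<" up to the next ">". It reproduces the output quoted above, and
shows how much more leaks through with the longer tag:

```python
import re

# Hypothetical stand-in for striptags(): a "tag" is "<" up to the next ">".
TAG = re.compile(r"<[^>]*>")

def naive_striptags(html):
    """Remove anything that looks like <...>, paying no attention to quotes."""
    return TAG.sub("", html)

short = """<sometag attribute = '>'>the text</sometag>"""
longer = ("<sometag attribute='>' otherattribute='otherstuff' "
          "moreattribute='yet\nmore stuff'>")

# Only a stray quote and ">" leak out of the short tag...
print(repr(naive_striptags(short)))
# ...but every attribute after the quoted ">" leaks out of the longer one.
print(repr(naive_striptags(longer)))
```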

> and I think that case could be rectified rather easily, by stripping
> any content up to '>' in the result without breaking anything else.

Yes, but then what happens with:

     <sometag>>text</sometag>

?
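
A rough sketch of that problem, again using an assumed "<" to next
">" pattern rather than the actual striptags(). strip_then_trim() is
a hypothetical rendering of the proposed fix, and strip_quote_aware()
is one possible alternative (not something proposed in this thread)
that lets quoted attribute values contain ">":

```python
import re

# Hypothetical stand-in for the simple stripper.
NAIVE_TAG = re.compile(r"<[^>]*>")

def strip_then_trim(html):
    """Naive strip, then drop everything up to the first '>' left over."""
    text = NAIVE_TAG.sub("", html)
    head, sep, tail = text.partition(">")
    return tail if sep else text

# Assumed alternative: inside a tag, skip over '...' and "..." as units.
QUOTE_AWARE_TAG = re.compile(r"""<(?:[^>'"]|'[^']*'|"[^"]*")*>""")

def strip_quote_aware(html):
    """Remove tags, allowing '>' inside quoted attribute values."""
    return QUOTE_AWARE_TAG.sub("", html)

quoted = "<sometag attribute = '>'>the text</sometag>"
literal = "<sometag>>text</sometag>"

for sample in (quoted, literal):
    # The trim fixes the first sample but eats the literal '>' in the second;
    # the quote-aware pattern leaves both texts intact.
    print(repr(strip_then_trim(sample)), repr(strip_quote_aware(sample)))
```

This still knows nothing about comments, scripts, or unbalanced
quotes, so a real HTML parser remains the safer choice.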

        <mike

-- 
Mike Meyer <mwm at mired.org>			http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.


