Converting HTML to ASCII
Mike Meyer
mwm at mired.org
Sat Feb 26 20:28:23 EST 2005
Michael Spencer <mahs at telcopartners.com> writes:
> Mike Meyer wrote:
>
>> It also fails on tags with a ">" in a string in the tag. That's
>> well-formed but ill-used HTML.
>> <mike
> True enough...however, it doesn't fail too horribly:
> >>> striptags("""<sometag attribute = '>'>the text</sometag>""")
> "'>the text"
> >>>
Depends on your example:
<sometag attribute='>' otherattribute='otherstuff' moreattribute='yet
more stuff'>
and so on. Then again, early browsers actually did the same kind of
parsing as you do, so this type of thing is discouraged.
> and I think that case could be rectified rather easily, by stripping
> any content up to '>' in the result without breaking anything else.
Yes, but then what happens with:
<sometag>>text</sometag>
?
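
A sketch of why the patch-up is risky, and of one alternative. Both functions are illustrations, not code from this thread: naive_with_patch assumes the fix means "drop everything up to the first '>' left in the result", and strip_tags assumes you are willing to lean on the standard library's html.parser (as found in current Python 3) rather than a regular expression.

```python
import re
from html.parser import HTMLParser

def naive_with_patch(html):
    # Hypothetical reading of the proposed fix: strip tags naively, then
    # throw away anything up to and including the first leftover ">".
    stripped = re.sub(r"<[^>]*>", "", html)
    return stripped.split(">", 1)[-1]

class TextOnly(HTMLParser):
    """Collect character data and ignore the tags themselves."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def strip_tags(html):
    # The parser understands quoted attribute values, so a ">" inside
    # quotes does not end the tag early.
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)

quoted = "<sometag attribute='>' otherattribute='otherstuff'>the text</sometag>"
stray = "<sometag>>text</sometag>"

print(repr(naive_with_patch(quoted)))  # 'the text'
print(repr(naive_with_patch(stray)))   # 'text'   (the literal ">" is lost)
print(repr(strip_tags(quoted)))        # 'the text'
print(repr(strip_tags(stray)))         # '>text'
```

The patched version cannot tell debris from a quoted attribute apart from a ">" that really belongs to the text, which is the point of the question above.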
<mike
--
Mike Meyer <mwm at mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.