Possible bug in sgmllib?

Fredrik Nehr frneh at yahoo.com
Fri Oct 6 04:07:33 EDT 2000


The original Fredrik wrote:
> note that the bug is really on the end tag side; sgmllib checks
> that start tags are valid, but accepts almost anything in the end
> tag:
>
> >>> Parser().feed("<foo_bar>data</foo/&%&#bar>")
> start_foo
> do_foo
> end_foo/&%&#bar
>
> </F>
>
Ok, "_" is not a valid character in an SGML element name, but I don't think
the current behavior, which is to drop everything after the first invalid
character and call the method with what's left, is ideal.  The library
doesn't handle start tags with an invalid character in the name very well
either.

ActivePython 2.0, build 201 (ActiveState Tool Corp.)
based on Python 2.0b1 (#1, Sep 22 2000, 12:29:54)
[GCC 2.95.1 19990816 (release)] on sunos5
Type "copyright", "credits" or "license" for more information.
>>> import sgmllib
>>> class Parser(sgmllib.SGMLParser):
...     def unknown_starttag(self, tag, attrs):
...             print tag, attrs
...     def unknown_endtag(self, tag):
...             print tag
...
>>> Parser().feed("<foo_bar>data</foo_bar>")
foo [('_bar', '_bar')]
foo_bar


Perhaps two new methods (invalid_starttag and invalid_endtag) should be
introduced?
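
A rough sketch of how such hooks might behave, written as a standalone toy
class rather than a patch to sgmllib. Everything here is an assumption made
for illustration: the TagChecker class, its events list, and the name rule
(a letter followed by letters, digits, "." or "-"). Only the hook names
invalid_starttag and invalid_endtag come from the proposal above.

```python
import re

# Assumed name rule: a letter, then letters, digits, "." or "-".
VALID_NAME = re.compile(r"[a-zA-Z][-.a-zA-Z0-9]*\Z")
# Capture an optional "/" and the raw name; attributes are skipped, not parsed.
TAG = re.compile(r"<(/?)([^\s>]*)[^>]*>")


class TagChecker:
    """Toy dispatcher (hypothetical, not sgmllib) that reports invalid names."""

    def __init__(self):
        self.events = []

    def feed(self, data):
        for match in TAG.finditer(data):
            closing, name = match.group(1), match.group(2)
            valid = VALID_NAME.match(name) is not None
            if closing:
                handler = self.endtag if valid else self.invalid_endtag
            else:
                handler = self.starttag if valid else self.invalid_starttag
            handler(name)

    def starttag(self, name):
        self.events.append(("start", name))

    def endtag(self, name):
        self.events.append(("end", name))

    # Proposed hooks: receive the whole raw name instead of a truncated one.
    def invalid_starttag(self, name):
        self.events.append(("invalid_start", name))

    def invalid_endtag(self, name):
        self.events.append(("invalid_end", name))


checker = TagChecker()
checker.feed("<foo_bar>data</foo/&%&#bar><b>x</b>")
for event in checker.events:
    print(event)
```

With hooks like these, a subclass could decide for itself whether to ignore
a malformed name, report it, or fall back to the truncated form.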


/Fredrik (the other one)




