Possible bug in sgmllib?

Robert Roy rjroy at takingcontrol.com
Thu Oct 5 15:07:28 EDT 2000


On Thu, 5 Oct 2000 14:07:15 +0200, "Fredrik Nehr" <frneh at yahoo.com>
wrote:

>The start_<tag> and do_<tag> methods don't get called with the correct name
>when the tag contains an underscore; however, the end_<tag> method works as
>expected.
>
>This interactive session shows the possible problem:
>
>ActivePython 1.6, build 100 (ActiveState Tool Corp.)
>based on Python 1.6b1 (#0, Aug 23 2000, 13:42:10) [MSC 32 bit (Intel)] on
>win32
>Copyright (c) Corporation for National Research Initiatives.
>Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam.
>>>> import sgmllib
>>>> class Parser(sgmllib.SGMLParser):
>...     def __getattr__(self, name):
>...             print name
>...             raise AttributeError
>...
>>>> Parser().feed("<foo>data</foo>")
>start_foo
>do_foo
>end_foo
>>>> Parser().feed("<foo_bar>data</foo_bar>")
>start_foo
>do_foo
>end_foo_bar
>>>>
>
>
>
>Regards,
>
>Fredrik Nehr
>
>
>
Generally, underscores are not legal characters in SGML element names.
Note also that sgmllib is scoped to SGML as used in HTML, so it makes
few modifications to the reference concrete syntax.

I quote from http://www.groveware.com/~lee/papers/sgml97b6p/:

"It is worth commenting on three aspects of the above system. First,
many programmers prefer using an underscore to mixed case; this works
moderately well in a fixed width typeface, but looks unbearable in
anything else, as the underscore is generally a full em wide. At any
rate, the underscore is not available as an SGML name character in the
reference concrete syntax. Only the dot and the hyphen, together with
ASCII letters and digits, may be used. If you can change the syntax to
allow the underscore, you can also change it to allow mixed case."

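As a rough illustration of the name rule in the quote, here is a minimal
sketch. It assumes a matcher that accepts only an ASCII letter followed by
letters, digits, dots, or hyphens. NAME_RE and leading_name are hypothetical
names made up for this example, and the pattern is not claimed to be
sgmllib's actual source. A parser that reads start-tag names this way would
stop at the underscore, which would be consistent with the start_foo and
do_foo lookups in the session above.

```python
import re

# Name characters per the quoted passage: ASCII letters and digits, plus
# the dot and the hyphen. The underscore is deliberately absent.
# NAME_RE and leading_name are hypothetical names for this sketch only.
NAME_RE = re.compile(r"[a-zA-Z][-.a-zA-Z0-9]*")


def leading_name(text):
    """Return the longest legal name at the start of text, or None."""
    match = NAME_RE.match(text)
    return match.group(0) if match else None


print(leading_name("foo"))        # foo
print(leading_name("foo_bar"))    # foo (matching stops at the underscore)
print(leading_name("foo-bar.1"))  # foo-bar.1
```

Renaming the element to foo-bar or foo.bar keeps it within these name
characters.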

Bob


