[issue5498] Can SGMLParser properly handle <empty/> tags?

once-off report at bugs.python.org
Tue Mar 17 12:19:36 CET 2009


New submission from once-off <once-off at mailinator.com>:

The attached script (sgml_error.py) is meant to output XML files
unchanged, except that it expands <empty/> tags into an opening and
closing tag pair, such as <empty></empty>.

It seems the SGMLParser class recognizes an empty tag, but does not emit
the closing tag until the NEXT forward slash it sees. As a result,
everything from the forward slash in <empty/> (including the closing
angle bracket) up to the next forward slash is treated as text data.
The sgmllib.py test output below shows this:

Have I missed something here (such as a deliberate design limitation of
the class, or an error on my part), or is this really a bug in the class?

C:\Python24\Lib>python sgmllib.py H:\input.xml
start tag: <root>
data: '\n '
start tag: <tag1>
end tag: </tag1>
data: '\n '
start tag: <tag2>
data: '>\n <tag3>hello<'
end tag: </tag2>
data: 'tag3>\n'
end tag: </root>
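The output above is consistent with the SGML "null end tag" shorthand, in which <tag/data/ is read as <tag>data</tag>: the slash in <tag2/ opens tag2, and the NEXT slash (the one in </tag3>) closes it. A minimal sketch, assuming sgmllib uses a shorthand pattern resembling the hypothetical SHORTTAG regex below (it is written here for illustration, not imported from sgmllib), reproduces the same split:

```python
import re

# Hypothetical pattern for the SGML shorthand <tag/data/ == <tag>data</tag>.
# Assumed to resemble what sgmllib matches; not taken from sgmllib itself.
SHORTTAG = re.compile(r'<([a-zA-Z][-.a-zA-Z0-9]*)/([^/]*)/')

def find_shorttag(text):
    """Return (tag, data) for the first shorthand tag in text, or None."""
    m = SHORTTAG.search(text)
    if m is None:
        return None
    # group(2) is everything between the first and second slash.
    return m.group(1), m.group(2)

sample = "<root>\n <tag1></tag1>\n <tag2/>\n <tag3>hello</tag3>\n</root>\n"
print(find_shorttag(sample))  # -> ('tag2', '>\n <tag3>hello<')
```

The captured data, '>\n <tag3>hello<', is exactly the "data:" line reported between "start tag: <tag2>" and "end tag: </tag2>".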

C:\Python24\Lib>python
ActivePython 2.4.3 Build 12 (ActiveState Software Inc.) based on
Python 2.4.3 (#69, Apr 11 2006, 15:32:42) [MSC v.1310 32 bit (Intel)] on
win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sgml_error

Input:
<root>
 <tag1></tag1>
 <tag2/>
 <tag3>hello</tag3>
</root>

Output:
<root>
 <tag1></tag1>
 <tag2>>
 <tag3>hello<</tag2>tag3>
</root>

Expected:
<root>
 <tag1></tag1>
 <tag2></tag2>
 <tag3>hello</tag3>
</root>
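sgmllib was removed in Python 3, so a present-day version of the same task would use an XML parser instead. A minimal sketch, assuming the input is well-formed XML: parse it with xml.sax and re-serialize it with xml.sax.saxutils.XMLGenerator, whose short_empty_elements=False setting writes every element, including empty ones, with an explicit end tag. The function name expand_empty_tags is hypothetical. Note that XMLGenerator also writes an XML declaration at the top, so the output is not byte-for-byte "unchanged".

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator

def expand_empty_tags(xml_text):
    """Re-serialize well-formed XML, writing <empty/> as <empty></empty>.

    Hypothetical helper; assumes the input is well-formed XML.
    """
    out = io.StringIO()
    # short_empty_elements=False: empty elements get a separate end tag.
    handler = XMLGenerator(out, encoding="utf-8", short_empty_elements=False)
    # The SAX parser reports <tag2/> as a start event followed by an end event,
    # so the generator never sees the "/>" shorthand at all.
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return out.getvalue()

source = "<root>\n <tag1></tag1>\n <tag2/>\n <tag3>hello</tag3>\n</root>\n"
print(expand_empty_tags(source))
```

Because the expansion happens at the event level rather than by scanning for slashes, a following element such as <tag3> is left intact.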

----------
components: Extension Modules, Library (Lib), XML
files: sgml_error.py
messages: 83667
nosy: once-off
severity: normal
status: open
title: Can SGMLParser properly handle <empty/> tags?
type: behavior
versions: 3rd party, Python 2.4, Python 2.5
Added file: http://bugs.python.org/file13348/sgml_error.py

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue5498>
_______________________________________

