[Python-bugs-list] PRIVATE: xmllib.XMLParser.handle_data() seems to handle ']' incorrectly (PR#63)
mkes@ra.rockwell.com
Wed, 25 Aug 1999 08:37:50 -0400 (EDT)
Full_Name: Miroslav Kes
Version: 1.5.2
OS: FreeBSD 3.2
Submission from: (NULL) (205.175.223.11)
Hi!
I have run into the following strange behaviour of the
xmllib.XMLParser.handle_data()
method.
If an XML element's body contains ']', handle_data() seems to treat the
']' as a separator (or what?) and splits the whole text into pieces.
The source code:
-------------------
import xmllib
import sys

class MyXMLParser( xmllib.XMLParser ):
    i = 0

    def __init__( self ):
        xmllib.XMLParser.__init__( self )
        self.i = 0

    def unknown_starttag( self, tag, attributes ):
        print 'start tag: ' + tag + str( attributes )

    def unknown_endtag( self, tag ):
        print 'end tag: ' + tag

    def handle_data( self, data ):
        print self.i
        print type( data )
        print data
        self.i = self.i + 1

    def run( self, filename ):
        self.__init__()
        file = open( filename, 'r' )
        self.feed( file.read())
        self.close()
        file.close()
----------------------------------
the XML file sample:
----------------------------------
<?xml version="1.0"?>
<TEST>
<NAME>
Conversion from web speed [fpm] to motor speed [rpm] wrong (reference
calibration)
</NAME>
</TEST>
---------------------------------
runtime:
---------------------------------
odysseus:/usr/home/mira/engine> python
Python 1.5.2 (#2, May 11 1999, 17:14:37) [GCC 2.7.2.1] on freebsd3
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> import xmlparser
>>> p = xmlparser.MyXMLParser()
>>> p.run("test.xml")
0
<type 'string'>
start tag: TEST{}
1
<type 'string'>
start tag: NAME{}
2
<type 'string'>
Conversion from web speed [fpm
3
<type 'string'>
]
4
<type 'string'>
to motor speed [rpm
5
<type 'string'>
]
6
<type 'string'>
wrong (reference calibration)
end tag: NAME
7
<type 'string'>
end tag: TEST
8
<type 'string'>
>>>
----------------------------------------
I think this is a bug because the XML specification treats ']'
as a valid data character.
W3C spec. - [14] CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
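As a quick sanity check of that production, here is a minimal sketch. The helper name is_chardata is made up for illustration and is not part of xmllib; it only encodes the rule quoted above (no '<', no '&', and no ']]>' sequence), so a lone ']' passes.

```python
import re

# Hypothetical helper, not part of xmllib: tests a string against
# production [14] CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
_NO_MARKUP = re.compile(r'[^<&]*\Z')

def is_chardata(text):
    # The text must contain no '<' or '&' ...
    if _NO_MARKUP.match(text) is None:
        return False
    # ... and must not contain the CDATA-section-close sequence ']]>'.
    return text.find(']]>') < 0

sample = 'Conversion from web speed [fpm] to motor speed [rpm] wrong'
print(is_chardata(sample))
```

By this reading, only the three-character sequence ']]>' is excluded, not ']' on its own.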
If I parse the example above with other XML parsers (MSXML and one Java-based
parser), they read it OK.
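In the meantime, a possible workaround is to collect the pieces passed to handle_data() and join them when the end tag arrives. The following is only a sketch under assumptions: the class name DataJoiner is hypothetical, it does not import xmllib so that it runs on its own, and the callbacks are called by hand with pieces like the ones in the output above (the exact whitespace in each piece is assumed). In real use the same three methods would go into the xmllib.XMLParser subclass.

```python
class DataJoiner:
    """Hypothetical helper: joins the handle_data() pieces of one element."""

    def __init__(self):
        self.pieces = []   # data pieces seen since the last start tag
        self.texts = []    # (tag, joined text) pairs, one per end tag

    def unknown_starttag(self, tag, attributes):
        # A new element begins, so start collecting afresh.
        self.pieces = []

    def handle_data(self, data):
        # Only remember the piece; do not treat it as the whole text.
        self.pieces.append(data)

    def unknown_endtag(self, tag):
        # The element is complete: join everything collected so far.
        self.texts.append((tag, ''.join(self.pieces)))
        self.pieces = []


# Simulate the calls seen in the report (whitespace is assumed).
joiner = DataJoiner()
joiner.unknown_starttag('NAME', {})
for piece in ['Conversion from web speed [fpm', ']',
              ' to motor speed [rpm', ']',
              ' wrong (reference calibration)']:
    joiner.handle_data(piece)
joiner.unknown_endtag('NAME')
print(joiner.texts[0][1])
```

This sketch handles only flat elements like NAME in the sample; nested elements with mixed content would need a stack of piece lists instead of a single one.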
Mira