[XML-SIG] [ pyxml-Bugs-497322 ] sgmlop and unicode charrefs

noreply@sourceforge.net noreply@sourceforge.net
Fri, 28 Dec 2001 05:47:17 -0800


Bugs item #497322, was opened at 2001-12-28 05:47
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=497322&group_id=6473

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Walter Dörwald (doerwalter)
Assigned to: Nobody/Anonymous (nobody)
Summary: sgmlop and unicode charrefs

Initial Comment:
sgmlop has a problem with unicode character references
when the handler doesn't implement handle_charref (and
sgmlop forwards the call to handle_data). The following
test script shows the problem:
----
from xml.parsers import sgmlop

class Handler:
   def handle_data(self, data):
      print "handle_data", repr(data)

class Handler2(Handler):
   def handle_charref(self, data):
      print "handle_charref", repr(data)

p = sgmlop.XMLParser()
p.register(Handler())
p.parse("&#8364;")
p.register(Handler2())
p.parse("&#8364;")
----
The output is the following:
----
handle_data '\xac'
handle_charref '8364'
----
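i.e. parsing works with Handler2, but not with Handler:
character 8364 (0x20AC, the euro sign) reaches handle_data
as the single byte '\xac', which is only the low byte of
0x20AC.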
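
The arithmetic behind that output can be checked by hand. The
snippet below is only an illustration of the symptom in Python 3
syntax; it is not sgmlop's actual C code, and the variable names
are made up for the example:

```python
# Illustration only: reproduces the numbers seen in the output above.
codepoint = 8364                 # the value in the reference "&#8364;"
low_byte = codepoint & 0xFF      # what is left if the value is squeezed into one byte

print(hex(codepoint))            # the full code point, 0x20AC
print(hex(low_byte))             # the byte that handle_data received
print(repr(chr(codepoint)))      # the character the reference actually names
```

The low byte is 0xAC, which matches the '\xac' passed to
handle_data, so an 8-bit string cannot carry this character.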

To fix this bug, sgmlop has to return Unicode objects.
(There is already a patch for that; see #412237, "sgmlop
returns Unicode".)
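
Until sgmlop itself returns Unicode objects, a handler could
decode the reference on its own, since handle_charref does get
the full number. The following is a rough sketch in Python 3
syntax (under Python 2, unichr would take the place of chr). The
class name CharrefForwardingHandler and its chunks list are
hypothetical, and the handling of a leading "x" for hexadecimal
references is an assumption about what the parser passes:

```python
class CharrefForwardingHandler:
    """Hypothetical handler that decodes character references itself."""

    def __init__(self):
        self.chunks = []          # collected text, one entry per callback

    def handle_data(self, data):
        self.chunks.append(data)

    def handle_charref(self, data):
        # The report shows the decimal digits arriving as a string ('8364').
        # Accepting a leading 'x' for hexadecimal references is an assumption.
        if data[:1] in ("x", "X"):
            codepoint = int(data[1:], 16)
        else:
            codepoint = int(data)
        # Forward the decoded character instead of a truncated byte.
        self.handle_data(chr(codepoint))


handler = CharrefForwardingHandler()
handler.handle_charref("8364")    # the value Handler2 received above
print(handler.chunks)
```

This only works around the problem on the handler side; the
fallback inside sgmlop stays as it is.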

