[XML-SIG] XML Error? (Workaround found)

Schollnick, Benjamin Benjamin.Schollnick@usa.xerox.com
Fri, 14 Sep 2001 08:56:34 -0400


Here's a follow-up...

I suspect, but have not conclusively proven, that with my configuration
any character higher than chr(127) may cause the XML parser to fail with
an exception.

Right now, I'm forced to build a sanitization string from chr(128) to
chr(255) and am attempting to use re.sub to filter those characters out.

Is there a simpler way? Why isn't the XML parser dealing with the
high-ASCII codes, especially since they are in the CDATA section?
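A minimal sketch of a simpler filter, assuming the raw document is held as bytes and written for a current Python 3 rather than the Python 2.0 in this thread: a single regex character class replaces the hand-built chr(128) to chr(255) string, and `bytes.translate` with a delete set avoids regex entirely. The sample text is hypothetical. Both options discard every non-ASCII byte, punctuation included.

```python
import re

# Hypothetical raw document bytes containing Windows-1252 curly quotes.
raw = b"Send Don\x92s list to the \x93CAS\x94 team"

# Option 1: one character class covering every byte from 0x80 to 0xFF.
HIGH_BYTES = re.compile(rb"[\x80-\xff]")
cleaned_re = HIGH_BYTES.sub(b"", raw)

# Option 2: no regex; translate() with table=None only deletes the listed bytes.
cleaned_tr = raw.translate(None, bytes(range(0x80, 0x100)))

print(cleaned_re)
print(cleaned_tr)
```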

		- Benjamin

-----Original Message-----
From: Schollnick, Benjamin [mailto:Benjamin.Schollnick@usa.xerox.com]
Sent: Friday, September 14, 2001 8:45 AM
To: 'xml-sig@python.org'
Subject: [XML-SIG] XML Error? (Workaround found)


Folks,

	I'm having some problems here with some XML code...

	I'm using Python v2.0 (#8, Oct 16 2000, 17:27:58) [MSC 32 Bit (Intel)]
	on a Win NT 4 platform...

	This XML causes an xml.sax._exceptions.SAXParseException
(<unknown>:26:38: not well-formed):


Traceback (most recent call last):
  File "C:\develope\docushare_db\test.py", line 85, in ?
    decoded_data    = ds_workspace.decode_ds_propfind (xml_data)
  File "..\docushare\ds_workspace.py", line 402, in decode_ds_propfind
    Docushare_object.transfer_xml_to_ds_workspace ( data )
  File "..\docushare\ds_workspace.py", line 147, in transfer_xml_to_ds_workspace
    xml_obj     = XML_Objectify ( tempfile_name )
  File "E:\ds_reminder\xml_objectify.py", line 218, in __init__
    self._dom = minidom.parseString(self._fh.read())
  File "c:\progra~1\python20\lib\xml\dom\minidom.py", line 475, in parseString
    return _doparse(pulldom.parseString, args, kwargs)
  File "c:\progra~1\python20\lib\xml\dom\minidom.py", line 465, in _doparse
    toktype, rootNode = events.getEvent()
  File "c:\progra~1\python20\lib\xml\dom\pulldom.py", line 187, in getEvent
    self.parser.feed(buf)
  File "c:\program files\python20\lib\xml\sax\expatreader.py", line 85, in feed
    self._err_handler.fatalError(exc)
  File "c:\program files\python20\lib\xml\sax\handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: <unknown>:26:38: not well-formed

Any idea why? I've included the dump of the XML in an attached file
(error.1, standard ASCII) because it appears to contain some "high-ASCII"
characters that are not reproduced in the cut-and-paste I've included
here...
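A minimal reproduction, written for a current Python 3 (an assumption; the thread uses Python 2.0) with a hypothetical one-element document. With no encoding declared, an XML parser must assume UTF-8, and a lone 0x92 byte is not a valid UTF-8 sequence, so expat rejects it as "not well-formed" even inside a CDATA section. CDATA only suspends markup recognition; it does not exempt bytes from the document's character encoding.

```python
import xml.sax

def parses(data: bytes) -> bool:
    """Return True if expat (via xml.sax) accepts the document."""
    try:
        xml.sax.parseString(data, xml.sax.ContentHandler())
        return True
    except xml.sax.SAXParseException as exc:
        print("parse failed:", exc)
        return False

# Hypothetical document: no encoding declaration, so UTF-8 is assumed,
# and the lone 0x92 byte inside the CDATA section is invalid UTF-8.
bad = b'<?xml version="1.0" ?><abstract><![CDATA[Don\x92s list]]></abstract>'
good = bad.replace(b"\x92", b"'")

print(parses(bad))
print(parses(good))
```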

Actually, I just found it... The high-ASCII characters did indeed break the
XML parsing, specifically $92, $93, and $94 (hex, i.e. chr(146), chr(147),
and chr(148)):

	# Strip the three bytes that break the parser (chr(146) to chr(148))
	for bad_char in (chr(146), chr(147), chr(148)):
	    xml_data = xml_data.replace(bad_char, "")
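If deleting the characters loses too much, here is a sketch of a gentler variant, assuming the bytes really are Windows-1252 "smart" punctuation (in that code page, 0x91 to 0x94 are the curly single and double quotes): map them to their ASCII look-alikes in one `bytes.translate` call. The helper name `straighten_quotes` and the sample text are hypothetical, and the code targets a current Python 3.

```python
# Windows-1252 curly quotes (0x91-0x94) and their ASCII replacements.
SMART = b"\x91\x92\x93\x94"
PLAIN = b"''\"\""
QUOTE_TABLE = bytes.maketrans(SMART, PLAIN)

def straighten_quotes(data: bytes) -> bytes:
    """Replace cp1252 curly quotes with ASCII quotes instead of deleting them."""
    return data.translate(QUOTE_TABLE)

sample = b"Send the \x93CAS Sign-off\x94 document; see Don\x92s list"
print(straighten_quotes(sample))
```

A straight `"` substituted into an attribute value delimited by `"` would itself break the markup, so this is only safe for element text and CDATA content.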

Are there any workarounds beyond this, or plans to fix this in the
implementation? (The characters are contained in the CDATA section.)
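One workaround that keeps the text intact, sketched under the assumption that the server is actually sending Windows-1252 (plausible on Windows NT, but not confirmed in this thread): decode the bytes as cp1252 and re-encode them as UTF-8, which is what the parser assumes when no encoding is declared. The helper name `parse_cp1252` and the sample document are hypothetical, and the code targets a current Python 3. If the producer can be changed, declaring the real encoding in the XML declaration is the cleaner fix.

```python
from xml.dom import minidom

def parse_cp1252(raw: bytes):
    """Parse XML bytes that are really Windows-1252 but declare no encoding."""
    # Decode with the real code page, then re-encode as UTF-8, the encoding
    # an XML parser assumes when the declaration names none.
    utf8 = raw.decode("cp1252").encode("utf-8")
    return minidom.parseString(utf8)

raw = b'<?xml version="1.0" ?><abstract><![CDATA[Don\x92s list]]></abstract>'
dom = parse_cp1252(raw)

# CDATA content may arrive as one or more Text/CDATASection nodes; join them.
text = "".join(node.data for node in dom.documentElement.childNodes)
print(text)
```

Python's cp1252 codec raises UnicodeDecodeError on the five bytes that code page leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D), so pass `errors="replace"` to `decode` if the input may contain them.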

		- Benjamin

<?xml version="1.0" ?>
<multistatus>
  <response>
    <href>http://xww.psg-techservices.world.xerox.com/docushare/File-1194</href>
    <propstat>
     <prop>
<children/>
<parents>
<dsref handle="Collection-193"><displayname>Past AI's</displayname></dsref>
</parents>
  <displayname>1999_10_26 AIs</displayname>
  <summary/>
  <description>Action items from 10/26 staff meeting</description>
  <keywords>File</keywords>
<entityowner><dsref handle="User-68"><displayname>Melissa Gydesen</displayname><username>mgydesen</username></dsref></entityowner>
<creationdate>1999-10-26T21:13:21Z</creationdate>
<getlastmodified>Fri, 05 Jan 2001 19:15:36 GMT</getlastmodified>
<modified_by><dsref handle="User-142"><displayname>Margo Forsythe</displayname><username>mforsythe</username></dsref></modified_by>
<parents>
<dsref handle="Collection-193"><displayname>Past AI's</displayname></dsref>
</parents>
<children/>
<subscriptions/>
  <abstract><![CDATA[- Open Action Items System Test Staff Meeting From
10/26/99 Meeting  - Rob Open 10/5 11/9 Log on to the file servers and
cleanup personal files Melissa Open 10/5 11/9 Archive Test Jobs to CD-ROM
Joe / Melissa Open 10/12 11/2 Talk to PSG-IM regarding any processes to
track/maintain software license data - Objective: How can we find out.  -
EVERYONE Open 10/26 10/29 Review Don's common client needs list and provide
feedback EVERYONE Open 10/26 11/1 Send a note to Javier concerning his role
in the next staff meeting Don Open 10/26 10/29 Send the "CAS Sign-off"
document to Brian Middlecoat Ming Open ]]></abstract>
  <document>actionitems102699.doc</document>
  <getcontenttype>application/msword</getcontenttype>
  <author>mgydesen</author>
<max_versions>1</max_versions>
<lockdiscovery/>
<getcontentlength>29184</getcontentlength>
     </prop>
     <status>HTTP/1.1 200 OK</status>
    </propstat>
  </response>
</multistatus>

 <<error.1>>