[XML-SIG] XML Error? (Workaround found)
Schollnick, Benjamin
Benjamin.Schollnick@usa.xerox.com
Fri, 14 Sep 2001 08:56:34 -0400
Here's a follow-up...
I suspect, but have not conclusively proven, that with my configuration any
character higher than chr(127) may cause the XML parser to fail with an
exception.
Right now, I'm forced to build a sanitization string from chr(128) through
chr(255), and am attempting to use re.sub to filter those characters out.
Is there a simpler way? Why isn't the XML parser dealing with the high-ASCII
codes, especially since they are in the CDATA string?
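For what it's worth, the filter described above can be written without building the string by hand. This is a minimal sketch in present-day Python 3 (not the Python 2.0 used here); the names HIGH_BYTES and strip_high_bytes are made up for illustration, and it simply discards every byte from 0x80 to 0xFF, so whatever those bytes stood for is lost.

```python
import re

# Character class covering every byte value from 0x80 through 0xFF.
HIGH_BYTES = re.compile(rb"[\x80-\xff]")

def strip_high_bytes(data: bytes) -> bytes:
    """Return data with all bytes above 0x7F removed (hypothetical helper)."""
    return HIGH_BYTES.sub(b"", data)

cleaned = strip_high_bytes(b"Send the \x93CAS Sign-off\x94 document")
```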
- Benjamin
-----Original Message-----
From: Schollnick, Benjamin [mailto:Benjamin.Schollnick@usa.xerox.com]
Sent: Friday, September 14, 2001 8:45 AM
To: 'xml-sig@python.org'
Subject: [XML-SIG] XML Error? (Workaround found)
Folks,
I'm having some problems here with some XML code...
I'm using Python v2.0 (#8, Oct 16 2000, 17:27:58) [MSC 32 Bit (Intel)]
on a Win NT 4 platform...
This XML causes an xml.sax._exceptions.SAXParseException:
(<unknown>:26:38: not well-formed)
Traceback (most recent call last):
  File "C:\develope\docushare_db\test.py", line 85, in ?
    decoded_data = ds_workspace.decode_ds_propfind (xml_data)
  File "..\docushare\ds_workspace.py", line 402, in decode_ds_propfind
    Docushare_object.transfer_xml_to_ds_workspace ( data )
  File "..\docushare\ds_workspace.py", line 147, in transfer_xml_to_ds_workspace
    xml_obj = XML_Objectify ( tempfile_name )
  File "E:\ds_reminder\xml_objectify.py", line 218, in __init__
    self._dom = minidom.parseString(self._fh.read())
  File "c:\progra~1\python20\lib\xml\dom\minidom.py", line 475, in parseString
    return _doparse(pulldom.parseString, args, kwargs)
  File "c:\progra~1\python20\lib\xml\dom\minidom.py", line 465, in _doparse
    toktype, rootNode = events.getEvent()
  File "c:\progra~1\python20\lib\xml\dom\pulldom.py", line 187, in getEvent
    self.parser.feed(buf)
  File "c:\program files\python20\lib\xml\sax\expatreader.py", line 85, in feed
    self._err_handler.fatalError(exc)
  File "c:\program files\python20\lib\xml\sax\handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: <unknown>:26:38: not well-formed
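One way to track down what sits near a reported position such as 26:38 is to list every byte above 0x7F together with its line and column. A rough sketch in present-day Python 3 follows; find_high_bytes is a hypothetical helper, and its convention (1-based lines, 0-based byte columns) is an assumption that may not line up exactly with how the parser counts.

```python
def find_high_bytes(data: bytes):
    """Return a list of (line, column, byte_value) for each byte >= 0x80."""
    hits = []
    # Split on LF only; a trailing CR (Windows line ending) stays in the line
    # but does not shift the columns of the bytes before it.
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        for column, value in enumerate(line):
            if value >= 0x80:
                hits.append((lineno, column, value))
    return hits

sample = b"<a>\n<b>x\x93y\x94</b>\n</a>"
hits = find_high_bytes(sample)
```

Printing each hit as hex (for example with "%d:%d 0x%02X") makes it easy to compare against the position in the exception message.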
Any idea why? I've included the dump of the XML in an attached file (error.1,
standard ASCII) because it appears to have some "high-ASCII" characters that
are not being reproduced in the cut 'n' paste that I've included here...
Actually, I just found it... The high-ASCII characters did indeed break the
XML parsing: $92, $93, and $94 specifically.
xml_data = string.replace (xml_data, chr(146), "")
xml_data = string.replace (xml_data, chr(147), "")
xml_data = string.replace (xml_data, chr(148), "")
Any workarounds beyond this, or plans to fix this implementation? (The
characters are contained in the CDATA portion.)
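A possible alternative to deleting the bytes, sketched under assumptions: chr(146), chr(147), and chr(148) are the curly apostrophe and the curly double quotes in Windows-1252, and an XML document with no encoding declaration is read as UTF-8 by default, where those lone bytes are invalid even inside CDATA. If the data really is Windows-1252 (not confirmed here), decoding it before parsing keeps the characters. The helper name parse_cp1252_xml is hypothetical, and the code is present-day Python 3, not Python 2.0.

```python
from xml.dom import minidom

def parse_cp1252_xml(raw: bytes):
    """Decode raw as Windows-1252, then parse it (hypothetical helper)."""
    # Assumption: the payload is Windows-1252. Bytes that encoding leaves
    # undefined become U+FFFD instead of raising an error.
    text = raw.decode("cp1252", errors="replace")
    # Passing a str rather than bytes means the parser does not have to
    # guess the byte encoding.
    return minidom.parseString(text)

raw = (b'<?xml version="1.0" ?>'
       b'<abstract><![CDATA[the \x93CAS Sign-off\x94 document]]></abstract>')
doc = parse_cp1252_xml(raw)
# Join all child nodes in case the character data arrives in several pieces.
abstract = "".join(node.data for node in doc.documentElement.childNodes)
```

If the producer of the XML can be changed, having it declare its encoding in the XML declaration, or emit UTF-8, would address the same problem at the source.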
- Benjamin
<?xml version="1.0" ?>
<multistatus>
<response>
<href>http://xww.psg-techservices.world.xerox.com/docushare/File-1194</href>
<propstat>
<prop>
<children/>
<parents>
<dsref handle="Collection-193"><displayname>Past AI's</displayname></dsref>
</parents>
<displayname>1999_10_26 AIs</displayname>
<summary/>
<description>Action items from 10/26 staff meeting</description>
<keywords>File</keywords>
<entityowner><dsref handle="User-68"><displayname>Melissa
Gydesen</displayname><username>mgydesen</username></dsref></entityowner>
<creationdate>1999-10-26T21:13:21Z</creationdate>
<getlastmodified>Fri, 05 Jan 2001 19:15:36 GMT</getlastmodified>
<modified_by><dsref handle="User-142"><displayname>Margo
Forsythe</displayname><username>mforsythe</username></dsref></modified_by>
<parents>
<dsref handle="Collection-193"><displayname>Past AI's</displayname></dsref>
</parents>
<children/>
<subscriptions/>
<abstract><![CDATA[- Open Action Items System Test Staff Meeting From
10/26/99 Meeting - Rob Open 10/5 11/9 Log on to the file servers and
cleanup personal files Melissa Open 10/5 11/9 Archive Test Jobs to CD-ROM
Joe / Melissa Open 10/12 11/2 Talk to PSG-IM regarding any processes to
track/maintain software license data - Objective: How can we find out. -
EVERYONE Open 10/26 10/29 Review Don's common client needs list and provide
feedback EVERYONE Open 10/26 11/1 Send a note to Javier concerning his role
in the next staff meeting Don Open 10/26 10/29 Send the "CAS Sign-off"
document to Brian Middlecoat Ming Open ]]></abstract>
<document>actionitems102699.doc</document>
<getcontenttype>application/msword</getcontenttype>
<author>mgydesen</author>
<max_versions>1</max_versions>
<lockdiscovery/>
<getcontentlength>29184</getcontentlength>
</prop>
<status>HTTP/1.1 200 OK</status>
</propstat>
</response>
</multistatus>
<<error.1>>