[XML-SIG] 'utf8' codec can't decode byte 0xc3 - bug in xmlproc?

Anders Norrbom anders at norrbom.info
Fri Aug 5 10:07:43 CEST 2005


That makes sense. Any idea how to deal with it? Should I flush the buffer somehow?
This is what the code looks like:

from xml.sax import make_parser
from xml.sax.handler import feature_namespaces, feature_validation
from xml.sax.handler import ContentHandler, ErrorHandler, DTDHandler
.
.
.
evalHandler = EvaluateKeyHandler()
parser = make_parser(['_xmlplus.sax.drivers2.drv_xmlproc'])
parser.setFeature(feature_validation, 1)
parser.setFeature(feature_namespaces, 0)
parser.setContentHandler(evalHandler)
parser.setErrorHandler(evalHandler)
f = open(file)
parser.parse(f)
.
.
.
def characters(self, content):
    self.keywordData += content



 >>> xmlproc.version
'0.70'
PyXML-0.8.3
Python 2.3.3
Red Hat Linux 3.3.3-7
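
For comparison, here is a minimal, self-contained sketch of the same accumulate-in-characters() pattern run through the standard library's default expat-based SAX parser, which is the driver reported not to show the error. The class name KeywordCollector and the tiny inline document are made up for illustration; only the keywordData attribute and the characters() method mirror the code above. It is not a fix for xmlproc, just a way to check the handler logic in isolation.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class KeywordCollector(ContentHandler):  # hypothetical stand-in for EvaluateKeyHandler
    def __init__(self):
        ContentHandler.__init__(self)
        self.keywordData = u""

    def characters(self, content):
        # SAX may deliver text in several pieces, so append rather than assign.
        self.keywordData += content


# Illustrative document containing the same "l\xe8vres" bytes (c3 a8) as the hexdump.
doc = u"<k>des l\u00e8vres Bord</k>".encode("utf-8")

handler = KeywordCollector()
xml.sax.parseString(doc, handler)  # uses the default (expat) driver
```

After parsing, handler.keywordData holds the decoded text, with the two bytes c3 a8 turned into a single character.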


Mike Brown wrote:

>Anders wrote:
>  
>
>>I'm having a hard time debugging this error:
>>
>><somefile>:<row>:<char>: character set conversion problem: 'utf8' codec can't decode byte 0xc3 in position 65535: unexpected end of data
>>
>>The file I'm trying to parse with xmlproc contains no illegal UTF-8 byte
>>sequences, and this error does not occur when I switch to pyexpat. This
>>is a hexdump of the row it's complaining about:
>>00020030  64 65 73 20 6c c3 a8 76  72 65 73 20 42 6f 72 64  |des l..vres Bord|
>>As far as I can see, there is nothing wrong with this byte sequence.
>>
>>Has anyone else experienced this problem and found a solution? All help
>>appreciated.
>>    
>>
>
>Apparently it's a buffering issue; the stream it's decoding only consists of 
>2^16 bytes, and the last one is that c3. What does your python code look like?
>What platform/OS is this on, and what versions of Python and PyXML?
>
>  
>
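
The buffering explanation quoted above can be reproduced outside of xmlproc. The sketch below assumes the input is read in 2^16-byte chunks and that the two-byte sequence c3 a8 straddles the chunk boundary; the buffer contents are made up for illustration. Decoding the first chunk on its own fails at position 65535, while an incremental decoder holds on to the dangling c3 until the next chunk arrives. Note that codecs.getincrementaldecoder only exists in Python 2.5 and later, so it is not available on the Python 2.3.3 installation mentioned in this thread, and this is not necessarily how xmlproc itself should be patched.

```python
import codecs

CHUNK = 2 ** 16

# Illustrative buffer: 65535 filler bytes, then "\xe8" encoded as c3 a8, then "vres".
# The c3 lands at index 65535, the last byte of the first chunk.
data = b"a" * 65535 + u"\u00e8".encode("utf-8") + b"vres"
first, rest = data[:CHUNK], data[CHUNK:]

# Decoding the first chunk in isolation fails: it ends in a lone lead byte.
try:
    first.decode("utf-8")
    failed = False
    error_position = None
except UnicodeDecodeError as exc:
    failed = True
    error_position = exc.start

# An incremental decoder buffers the incomplete sequence across calls.
decoder = codecs.getincrementaldecoder("utf-8")()
text = decoder.decode(first) + decoder.decode(rest, final=True)
```

Here error_position matches the "position 65535" in the reported message, and text contains the correctly decoded character.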



