Unicode error in sax parser

Stefan Behnel stefan_ml at behnel.de
Wed Feb 9 04:14:58 EST 2011


Stefan Behnel, 09.02.2011 09:58:
> Rickard Lindberg, 09.02.2011 09:32:
>> On Tue, Feb 8, 2011 at 5:41 PM, Chris Rebert <clp2 at rebertia.com> wrote:
>>>> Here is a bash script to reproduce my error:
>>>
>>> Including the error message and traceback is still helpful, for future
>>> reference.
>>
>> Thanks for pointing it out.
>>
>>>> #!/bin/sh
>>>>
>>>> cat> å.timeline<<EOF
>>> <snip>
>>>> EOF
>>>>
>>>> python<<EOF
>>>> # encoding: utf-8
>>>> from xml.sax import parse
>>>> from xml.sax.handler import ContentHandler
>>>> parse(u"å.timeline", ContentHandler())
>>>> EOF
>>>>
>>>> If I instead do
>>>>
>>>> parse(u"å.timeline".encode("utf-8"), ContentHandler())
>>>>
>>>> the script runs without errors.
>>>>
>>>> Is this a bug or expected behavior?
>>>
>>> Bug; open() figures out the filesystem encoding just fine.
>>> Bug tracker to report the issue to: http://bugs.python.org/
>>>
>>> Workaround:
>>> parse(open(u"å.timeline", 'r'), ContentHandler())
>>
>> When I tried your workaround, I still got this error:
>>
>> Traceback (most recent call last):
>> File "<stdin>", line 4, in <module>
>> File "/usr/lib64/python2.7/site-packages/_xmlplus/sax/__init__.py",
>> line 31, in parse
>> parser.parse(filename_or_stream)
>> File "/usr/lib64/python2.7/site-packages/_xmlplus/sax/expatreader.py",
>> line 109, in parse
>> xmlreader.IncrementalParser.parse(self, source)
>> File "/usr/lib64/python2.7/site-packages/_xmlplus/sax/xmlreader.py",
>> line 119, in parse
>> self.prepareParser(source)
>> File "/usr/lib64/python2.7/site-packages/_xmlplus/sax/expatreader.py",
>> line 121, in prepareParser
>> self._parser.SetBase(source.getSystemId())
>> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in
>> position 0: ordinal not in range(128)
>>
>> The open(..) part works fine, but there still seems to be a problem
>> inside the sax parser.
>
> Did you read my reply?

Sorry, it was me who failed to read your question properly.

Unicode file names don't work reliably in Python, especially not in Py2.x.
Python 3.2 provides many improvements here.

I assume your file system encoding is UTF-8? What does 
sys.getfilesystemencoding() give you?
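
To make that check concrete, here is a minimal sketch. It prints the file
system encoding and then parses from an already opened byte stream wrapped in
an InputSource whose system ID is left unset. The traceback above fails in
SetBase(source.getSystemId()), so the idea is to never hand the parser the
non-ASCII file name at all. Whether this avoids the error on the PyXML
(_xmlplus) installation shown above is an assumption I have not tested. The
helper parse_stream_without_system_id and the ElementCounter handler are
hypothetical names used only for illustration.

```python
import io
import sys
from xml.sax import make_parser
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import InputSource


def parse_stream_without_system_id(stream, handler):
    """Parse XML from a byte stream without giving the parser a file name.

    Hypothetical helper: the InputSource is created with no system ID, so
    the parser has no (possibly non-ASCII) name to pass on to expat.
    """
    source = InputSource()          # system ID stays None
    source.setByteStream(stream)    # parser reads bytes from this object
    parser = make_parser()
    parser.setContentHandler(handler)
    parser.parse(source)


class ElementCounter(ContentHandler):
    """Illustrative handler that counts start tags."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


# The value asked about above; the result depends on platform and locale.
print(sys.getfilesystemencoding())

# In-memory stand-in for the real file, so the sketch runs anywhere.
counter = ElementCounter()
parse_stream_without_system_id(
    io.BytesIO(b"<timeline><event/></timeline>"), counter)
```

With the real file, the stream would come from open(u"å.timeline", "rb")
instead of the BytesIO object. Leaving the system ID unset also means that
relative references inside the document cannot be resolved against the file's
location, which should not matter for a self-contained file.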

Stefan



