sax barfs on unicode filenames

Edward K. Ream edreamleo at charter.net
Wed Oct 4 09:37:49 EDT 2006


Hi.  Presumably this is an easy question, but anyone who understands the sax
docs thinks completely differently than I do :-)

Following the usual cookbook examples, my app parses an open file as
follows::

import xml.sax
import xml.sax.handler

parser = xml.sax.make_parser()
parser.setFeature(xml.sax.handler.feature_external_ges, True)
# Hopefully the parser (expat) can figure out the encoding from the <?xml ...?> declaration.
# saxContentHandler, c, inputFileName and silent are defined elsewhere in Leo.
handler = saxContentHandler(c, inputFileName, silent)
parser.setContentHandler(handler)
parser.parse(theFile)

Here 'theFile' is an open file.  Usually this works just fine, but when the
filename contains u'\u8116' I get the following exception:

Traceback (most recent call last):
  File "c:\prog\tigris-cvs\leo\src\leoFileCommands.py", line 2159, in parse_leo_file
    parser.parse(theFile)
  File "c:\python25\lib\xml\sax\expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "c:\python25\lib\xml\sax\xmlreader.py", line 119, in parse
    self.prepareParser(source)
  File "c:\python25\lib\xml\sax\expatreader.py", line 111, in prepareParser
    self._parser.SetBase(source.getSystemId())
UnicodeEncodeError: 'ascii' codec can't encode character u'\u8116' in position 44: ordinal not in range(128)
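
The last frame hints at what goes wrong: the system ID handed to SetBase is the file's unicode name, and under Python 2.5 it apparently gets converted with the default 'ascii' codec. The "position 44" in the message is just the index of the first offending character in that string. A small Python 3 sketch of that single step, using a made-up file name (only the u'\u8116' character comes from the traceback), reproduces the same kind of error without involving sax at all:

```python
# Hypothetical file name; only the u'\u8116' character comes from the traceback.
file_name = "c:\\prog\\leo\\test\u8116.leo"


def ascii_encode_error(name):
    """Return the UnicodeEncodeError raised by ASCII-encoding name, or None."""
    try:
        name.encode("ascii")
    except UnicodeEncodeError as err:
        return err
    return None


err = ascii_encode_error(file_name)
if err is not None:
    # err.start is the "position" reported in the error message.
    print(err.start, repr(err.object[err.start]))
```
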



Presumably the documentation at:

http://docs.python.org/lib/module-xml.sax.xmlreader.html

would be sufficient for a sax-head, but I have absolutely no idea how to
create an InputSource that can handle non-ASCII filenames.
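
One possible approach, sketched under assumptions rather than offered as the documented fix: instead of passing the open file straight to parser.parse() (which builds the InputSource itself and takes the system ID from the file's name attribute), build the xml.sax.xmlreader.InputSource by hand, give it the open file as its byte stream, and choose the system ID explicitly so it is plain ASCII, here by percent-encoding the name with urllib.parse.quote. The sketch is written for Python 3; ElementCollector is a hypothetical stand-in for Leo's saxContentHandler, and whether a percent-encoded system ID is good enough for resolving relative external entities in your own files is an assumption you would need to check.

```python
import io
import xml.sax
import xml.sax.handler
import xml.sax.xmlreader
from urllib.parse import quote


class ElementCollector(xml.sax.handler.ContentHandler):
    """Hypothetical stand-in for Leo's saxContentHandler: records element names."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def parse_open_file(the_file, file_name, handler):
    """Parse an open binary file with SAX, using an ASCII-only system ID.

    Returns the InputSource so callers can inspect the system ID used.
    """
    parser = xml.sax.make_parser()
    parser.setFeature(xml.sax.handler.feature_external_ges, True)
    parser.setContentHandler(handler)

    source = xml.sax.xmlreader.InputSource()
    # Hand expat the raw bytes; it works out the encoding from the <?xml ...?> declaration.
    source.setByteStream(the_file)
    # Percent-encode non-ASCII characters so the system ID is pure ASCII.
    source.setSystemId(quote(file_name))

    parser.parse(source)
    return source


if __name__ == "__main__":
    data = ('<?xml version="1.0" encoding="utf-8"?>'
            "<leo_file><vnode>\u8116</vnode></leo_file>").encode("utf-8")
    collector = ElementCollector()
    used = parse_open_file(io.BytesIO(data), "c:\\prog\\test\u8116.leo", collector)
    print(collector.names)
    print(used.getSystemId())
```

The design choice here is simply to take control of the system ID instead of letting sax derive it from the file object; the document bytes still come from the already-open file, so nothing is reopened by name.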



Any help would be appreciated.  Thanks!

Edward
--------------------------------------------------------------------
Edward K. Ream   email:  edreamleo at charter.net
Leo: http://webpages.charter.net/edreamleo/front.html
--------------------------------------------------------------------




