sax barfs on unicode filenames

Diez B. Roggisch deets at nospam.web.de
Wed Oct 4 10:23:52 EDT 2006


Edward K. Ream wrote:

> Hi.  Presumably this is an easy question, but anyone who understands the
> sax docs thinks completely differently than I do :-)
> 
> Following the usual cookbook examples, my app parses an open file as
> follows::
> 
>     parser = xml.sax.make_parser()
>     parser.setFeature(xml.sax.handler.feature_external_ges, 1)
>     # Hopefully the content handler can figure out the encoding
>     # from the <?xml> element.
>     handler = saxContentHandler(c, inputFileName, silent)
>     parser.setContentHandler(handler)
>     parser.parse(theFile)
> 
> Here 'theFile' is an open file.  Usually this works just fine, but when

Filenames are expected to be bytestrings. So the unicode string you pass
as the filename gets implicitly converted using the default encoding
(ASCII in a stock Python 2), which fails as soon as the name contains a
non-ASCII character.
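To see why this breaks, here is a small sketch (Python 3; `implicit_convert` is a hypothetical helper, not part of any library) that mimics the implicit conversion Python 2 performs. Encoding with the ASCII codec, the stock default encoding, works for plain names and raises `UnicodeEncodeError` for anything else, while an explicit encoding succeeds.

```python
def implicit_convert(filename, default_encoding="ascii"):
    """Mimic Python 2's implicit unicode -> str conversion of a filename.

    Hypothetical helper: `default_encoding` stands in for
    sys.getdefaultencoding(), which is 'ascii' in a stock Python 2.
    """
    return filename.encode(default_encoding)


print(implicit_convert("plain.xml"))              # ASCII-only name: fine
try:
    implicit_convert("caf\u00e9.xml")             # non-ASCII name
except UnicodeEncodeError as exc:
    # prints: conversion failed: ordinal not in range(128)
    print("conversion failed:", exc.reason)

# Choosing the encoding explicitly avoids the failure.
print(implicit_convert("caf\u00e9.xml", "utf-8"))
```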

You have to encode the unicode string with your filesystem's encoding
(see sys.getfilesystemencoding()) beforehand.
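A minimal sketch of that advice, written for Python 3 under these assumptions: `os.fsencode()` does the filesystem-encoding step explicitly (in Python 2 the rough equivalent is `name.encode(sys.getfilesystemencoding())`), and Python 3 also accepts text filenames directly, so there the step is optional. `ElementCounter` and `parse_file` are hypothetical names; the handler only stands in for the poster's `saxContentHandler`.

```python
import os
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Hypothetical stand-in for saxContentHandler: counts start tags."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


def parse_file(path):
    """Parse the XML file at `path`, a text (unicode) filename.

    The name is encoded explicitly with os.fsencode(), which uses the
    filesystem encoding, instead of relying on an implicit conversion.
    The file is opened in binary mode, so the parser reads raw bytes and
    takes the document encoding from the <?xml ... encoding="..."?> line.
    """
    parser = xml.sax.make_parser()
    handler = ElementCounter()
    parser.setContentHandler(handler)
    with open(os.fsencode(path), "rb") as f:
        parser.parse(f)
    return handler.count
```

For a file named `caf\u00e9.xml` containing `<root><a/><b>caf\u00e9</b></root>`, `parse_file` should report three start tags. Opening the file yourself and handing the parser a byte stream keeps the filename handling in one place, where the encoding is chosen deliberately.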

Diez
