Managing non-ascii filenames in python

"Martin v. Löwis" martin at v.loewis.de
Mon Jul 20 02:27:20 EDT 2009


> I thought the correct way to do this in python would be to scan the
> dir
> files=os.listdir(os.path.dirname( os.path.realpath( __file__ ) ))
> 
> then print the filenames
> for filename in files:
>   print filename
> 
> but as expected the filename is not correct - so correct it using the
> file system's encoding
> 
>   print filename.decode(sys.getfilesystemencoding())
> 
> But I get
> UnicodeEncodeError: 'charmap' codec can't encode character u'\u2014'
> in position 6: character maps to <undefined>

As a starting point, you shouldn't use byte-oriented APIs to access
files on Windows. The byte-oriented API in your code is os.listdir,
which returns byte strings when it is passed a directory name as a
byte string.

So try:

   dirname = os.path.dirname(os.path.realpath(__file__))
   dirname = dirname.decode(sys.getfilesystemencoding())
   files = os.listdir(dirname)

This should give you the files as Unicode strings.
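
As a minimal sketch of the same idea in reusable form (the helper
name list_unicode is made up for illustration, and the bytes check
is only there so the snippet also runs where paths are already
Unicode):

```python
import os
import sys

def list_unicode(dirname):
    """Return the entries of dirname as Unicode strings."""
    # A byte-string path makes os.listdir use the byte-oriented API,
    # so decode it with the file system encoding first.
    if isinstance(dirname, bytes):
        dirname = dirname.decode(sys.getfilesystemencoding())
    return os.listdir(dirname)
```

Calling list_unicode(os.path.dirname(os.path.realpath(__file__)))
would then correspond to the three lines above.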

> I need to be able to write (a representation) to the screen (and I
> don't see why I should not get something as good as DOS shows).

The command window (it's not really DOS anymore) uses the CP_OEMCP
encoding, which is not available in Python. That encoding also does
the transliteration you are seeing, so to get the same transliteration
you would have to write an extension module (or try to get at the
OEMCP encoding through ctypes).
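
A rough sketch of the ctypes route, assuming that asking kernel32 for
the OEM code page number via GetOEMCP and mapping it to a Python
"cpNNN" codec is good enough. The helper name oem_codec_name is made
up, and a Python codec chosen this way is not guaranteed to
transliterate the way the console does:

```python
import codecs
import ctypes
import sys

def oem_codec_name():
    """Return a Python codec name for the OEM code page, or None."""
    if sys.platform != "win32":
        return None  # GetOEMCP only exists on Windows
    # GetOEMCP returns the numeric OEM code page, e.g. 437 or 850.
    name = "cp%d" % ctypes.windll.kernel32.GetOEMCP()
    try:
        codecs.lookup(name)  # check that Python ships this codec
    except LookupError:
        return None
    return name
```

If this returns a name, filename.encode(name, "replace") would be one
way to use it.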

If you can live with a simpler transliteration, try

  print filename.encode(sys.stdout.encoding, "replace")
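
A small sketch of that fallback as a function (console_safe is a
hypothetical name). It assumes sys.stdout.encoding may be unset, for
example when output is piped, and falls back to ASCII in that case.
Decoding the result again gives a Unicode string that contains only
characters the target encoding can represent:

```python
import sys

def console_safe(name, encoding=None):
    """Replace characters the console encoding cannot represent."""
    # sys.stdout.encoding can be None when output is redirected.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    # "replace" substitutes "?" for every unencodable character.
    return name.encode(encoding, "replace").decode(encoding)
```

For example, console_safe(u"a\u2014b", "cp437") gives u"a?b". Passing
"backslashreplace" instead of "replace" to encode would keep the code
point visible as an escape sequence.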

> Write it to an XML file in UTF-8
> 
> and write it to a text file and be able to read it back in.
> Again I was surprised that this was also difficult - it appears that
> the file also wanted ascii.  Should I have to open the file in binary
> for write (I expect so) but then what encoding should I write in?

You need to tell us how precisely you tried to do this. My guess is:
if you now try again, with the filenames being Unicode strings, it
will work fairly well.
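
Without knowing what you tried, here is one possible sketch for the
text file case, assuming one name per line and UTF-8 as the file
encoding. The function names are made up, and io.open does the
encoding and decoding, so the file does not have to be opened in
binary mode:

```python
import io
import os
import shutil
import tempfile

def save_names(path, names):
    """Write Unicode filenames to path, one per line, as UTF-8."""
    with io.open(path, "w", encoding="utf-8") as f:
        for name in names:
            f.write(name + u"\n")

def load_names(path):
    """Read the names back as Unicode strings."""
    with io.open(path, "r", encoding="utf-8") as f:
        return [line.rstrip(u"\n") for line in f]

def roundtrip_demo():
    """Save two non-ASCII names to a temporary file and reload them."""
    names = [u"caf\u00e9.txt", u"a\u2014b.txt"]
    tmpdir = tempfile.mkdtemp()
    try:
        path = os.path.join(tmpdir, "names.txt")
        save_names(path, names)
        return load_names(path)
    finally:
        shutil.rmtree(tmpdir)
```

The one-name-per-line format relies on filenames not containing
newlines, which holds on Windows.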

Regards,
Martin


