Printing Filenames with non-Ascii-Characters

vincent wehren vincent at visualtrans.de
Wed Feb 2 15:21:26 EST 2005


Marian Aldenhövel wrote:
> 
> But wouldn't that be correct in my case?
> 

This is what I get inside Eclipse using pydev when I run:

<code>
import os
dirname = "c:/test"
print dirname
for fname in os.listdir(dirname):
     print fname
     if os.path.isfile(fname):
         print fname
</code>

c:/test
straßenschild.png
test.py
Übersetzung.rtf


This is what I get passing a unicode argument to os.listdir:

<code>
import os
dirname = u"c:/test"
print dirname # will print fine, all ascii subset compatible
for fname in os.listdir(dirname):
     print fname
     if os.path.isfile(fname):
         print fname
</code>

c:/test
Traceback (most recent call last):
   File "C:\Programme\eclipse\workspace\myFirstProject\pythonFile.py", line 5, in ?
     print fname
UnicodeEncodeError: 'ascii' codec can't encode character u'\xdf' in position 4: ordinal not in range(128)

which is probably what you are getting, right?

You are trying to write *Unicode* objects containing characters outside 
the ASCII range (0-127) to a byte-oriented output without telling Python 
which encoding to use. When the output stream does not advertise an 
encoding, as appears to be the case for the console inside Eclipse, 
Python falls back to ascii and never guesses.

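A quick check confirms that os.listdir returns unicode objects when it is 
given a unicode argument:

<code>
import os
dirname = u"c:/test"
print dirname
for fname in os.listdir(dirname):
     print type(fname)
</code>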

c:/test
<type 'unicode'>
<type 'unicode'>
<type 'unicode'>
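
The failure can be reproduced without Eclipse or a directory listing. 
What follows is only a minimal sketch: it reuses the file name from the 
listing above, the helper name try_encode is made up for illustration, 
and it is written so that it also runs on Python 3, where the unicode 
type is simply called str.

```python
# The file name from the listing above, spelled with an escape so the
# source file itself stays pure ASCII. \xdf is the sharp s.
fname = u"stra\xdfenschild.png"


def try_encode(text, encoding):
    """Return the encoded bytes, or the error message if encoding fails."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        return str(exc)


# ASCII has no code for \xdf, so this reports the same kind of error
# as the traceback above.
print(try_encode(fname, "ascii"))

# cp1252 maps \xdf to the single byte 0xDF, so this succeeds.
print(repr(try_encode(fname, "cp1252")))
```

The explicit encode call is exactly the step that print cannot do on its 
own when it does not know what the output expects.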



So finally, encoding each name explicitly:
<code>
import os
dirname = u"c:/test"
print dirname
for fname in os.listdir(dirname):
     print fname.encode("mbcs")
</code>

gives:

c:/test
straßenschild.png
test.py
Übersetzung.rtf

Instead of "mbcs", which should be available on all Windows systems, you 
could have used "cp1252" when working on a German locale; inside Eclipse 
even "utf-16-le" would work, underscoring that the way the 'output 
device' handles encodings is decisive. I know this all seems awkward at 
first, but Python's drive towards uncompromising explicitness pays off 
big time when you're dealing with multilingual data.
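
To make the point about the output device concrete, here is a small 
sketch that is not part of the script above. It encodes one of the names 
with several codecs to show that the bytes differ, so only the codec the 
console expects gives readable output. The helper name encode_for_stream 
and its "ascii" fallback are assumptions chosen for illustration; "mbcs" 
is left out because it only exists on Windows.

```python
import sys

fname = u"\xdcbersetzung.rtf"  # the name from the listing, written with an escape


def encode_for_stream(text, stream, fallback="ascii"):
    """Encode text with the stream's declared encoding, if it has one.

    Characters the codec cannot represent are replaced with '?', so the
    call never raises UnicodeEncodeError.
    """
    encoding = getattr(stream, "encoding", None) or fallback
    return text.encode(encoding, "replace")


# The same name gives different byte sequences under different codecs.
for codec in ("cp1252", "utf-8", "utf-16-le"):
    print("%-10s %r" % (codec, fname.encode(codec)))

# Bytes suited to whatever sys.stdout claims to accept.
print(repr(encode_for_stream(fname, sys.stdout)))
```

Using "replace" trades a traceback for a question mark in the output, 
which may or may not be what you want; drop that argument to get the 
strict behaviour back.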

--
Vincent Wehren






