Printing Filenames with non-Ascii-Characters

vincent wehren vincent at visualtrans.de
Tue Feb 1 16:26:22 EST 2005


Marian Aldenhövel wrote:
> Hi,
> 
> I am very new to Python and have run into the following problem. If I do
> something like
> 
>   dir = os.listdir(somepath)
>   for d in dir:
>      print d
>             
> The program fails for filenames that contain non-ascii characters.
> 
>   'ascii' codec can't encode characters in position 33-34:

If you read this carefully, you'll notice that Python has tried and 
failed to *encode* a decoded (= unicode) string using the 'ascii' 
codec. IOW, d seems to be bound to a unicode string, which is unexpected 
unless the argument passed to os.listdir (somepath) is a unicode 
string, too. (Given a unicode string as its argument, os.listdir 
returns the names as a list of unicode strings.)
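
To see that the message really comes from an encode step, here is a 
minimal sketch. The filename is made up for illustration, and the 
snippet assumes an interpreter that accepts `except ... as` 
(Python 2.6 or later, including Python 3):

```python
# Hypothetical filename containing one non-ASCII character (o-umlaut, U+00F6).
name = u"Aldenh\xf6vel.txt"

try:
    # Encoding text with the 'ascii' codec fails for any code point above 127.
    name.encode("ascii")
    message = None
except UnicodeEncodeError as exc:
    message = str(exc)
    bad_position = exc.start  # index of the first character that could not be encoded

print(message)
```

The message has the same shape as the one you quoted, which is why I 
suspect d is a unicode string being pushed through the 'ascii' codec.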

If you're printing to the console, modern Pythons will try to guess the 
console's encoding (e.g. cp850). If the print failed because the 
characters do not map to the console's encoding, I would expect the 
UnicodeEncodeError to name that codec, not 'ascii' as in the error 
you're seeing.
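
One way to take the guessing out of it is to encode each name yourself 
before printing. The following is only a sketch: to_console_bytes is a 
hypothetical helper, and falling back to 'ascii' when no console 
encoding is reported is my assumption, not something Python prescribes.

```python
import sys


def to_console_bytes(name, encoding=None):
    """Encode a unicode filename for output, replacing unmappable characters.

    Hypothetical helper: 'encoding' defaults to whatever sys.stdout reports,
    and to 'ascii' if nothing is reported (e.g. output is piped).
    """
    enc = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    # "replace" substitutes '?' for characters the codec cannot represent,
    # so the call never raises UnicodeEncodeError.
    return name.encode(enc, "replace")
```

With this, to_console_bytes(u"h\xf6.txt", "cp850") gives the cp850 bytes 
for the name, while passing "ascii" gives "h?.txt" instead of an 
exception.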

How *are* you running the program? In the console (cmd.exe)? Or from 
some IDE?

> 
> I have noticed that this seems to be a very common problem. I have read
> a lot of postings regarding it but not really found a solution. Is there a
> simple one?
> 
> What I specifically do not understand is why Python wants to interpret the
> string as ASCII at all. Where is this setting hidden?

Don't be tempted to ever change the default encoding in site.py (the 
value passed to sys.setdefaultencoding). This is site-specific, meaning 
that if you ever distribute them, programs relying on this setting may 
fail on other people's Python installations.
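
If you just want to see which encodings are in play, you can read them 
without changing anything. A small sketch (encoding_settings is a 
hypothetical name; the three sys calls are standard):

```python
import sys


def encoding_settings():
    """Collect the encodings the interpreter reports, without changing any."""
    return {
        # Codec used for implicit unicode <-> byte string conversions.
        "default": sys.getdefaultencoding(),
        # Codec used to convert filenames to and from the operating system.
        "filesystem": sys.getfilesystemencoding(),
        # Codec print uses for the console; may be None when output is redirected.
        "stdout": getattr(sys.stdout, "encoding", None),
    }


settings = encoding_settings()
print(settings)
```

Comparing the output in cmd.exe and in your IDE should show whether the 
two environments differ.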

--
Vincent Wehren

> 
> I am running Python 2.3.4 on Windows XP and I want to run the program on
> Debian sarge later.
> 
> Ciao, MM


