Printing Filenames with non-ASCII Characters

aurora aurora00 at gmail.com
Tue Feb 1 14:57:22 EST 2005


On Tue, 01 Feb 2005 20:28:11 +0100, Marian Aldenhövel  
<marian at mba-software.de> wrote:

> Hi,
>
> I am very new to Python and have run into the following problem. If I do
> something like
>
>    dir = os.listdir(somepath)
>    for d in dir:
>       print d
>
> The program fails for filenames that contain non-ascii characters.
>
>    'ascii' codec can't encode characters in position 33-34:
>
> I have noticed that this seems to be a very common problem. I have read
> a lot of postings regarding it but not really found a solution. Is there
> a simple one?

The English-language Windows command prompt uses the cp437 character set.
To print the name, use

   print d.encode('cp437')

The issue is that a terminal only understands a certain character set. If
you have a unicode string, like d in your case, you have to encode it
before it can be printed. (We really need a native unicode terminal!!!)
If you don't encode it yourself, Python will do it for you using the
default encoding, which is ASCII, so any string that contains a non-ASCII
character will give you trouble. In my opinion Python is too conservative
in using the 'strict' error handling, which gives users who are unaware of
unicode a lot of woes.
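Note that cp437 only covers 256 characters, so even d.encode('cp437')
raises for, say, Japanese characters. A minimal sketch of a more forgiving
variant, written for modern Python 3 (where str is the unicode type) and
assuming you would rather see '?' than get an exception. The helper name
to_terminal_bytes and the sample filename are hypothetical; the 'replace'
argument is the standard error handler of str.encode, and in Python 2.3
the equivalent call is d.encode('cp437', 'replace').

```python
import sys

def to_terminal_bytes(name, encoding=None):
    """Encode a text filename for a byte-oriented console (hypothetical helper).

    Assumption: when no encoding is given, use the one Python reports for
    stdout, or 'ascii' if none is known. Characters the target encoding
    cannot represent become '?' instead of raising UnicodeEncodeError,
    which is what the default 'strict' error handler does.
    """
    if encoding is None:
        encoding = getattr(sys.stdout, "encoding", None) or "ascii"
    return name.encode(encoding, "replace")


name = "Aldenh\u00f6vel \u65e5\u672c.txt"  # made-up filename with non-ASCII text

# The default 'strict' handler stops at the first unencodable character.
try:
    name.encode("ascii")
except UnicodeEncodeError as exc:
    print("strict ascii failed:", exc.reason)

print(to_terminal_bytes(name, "cp437"))  # b'Aldenh\x94vel ??.txt'
print(to_terminal_bytes(name, "ascii"))  # b'Aldenh?vel ??.txt'
```

cp437 can represent the o-umlaut (byte 0x94) but not the two Japanese
characters, so only those turn into '?'; with plain ASCII the umlaut is
lost as well.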

So how did you get a unicode d to start with? If 'somepath' is unicode,
os.listdir returns a list of unicode strings. So why is somepath unicode?
Either you entered a unicode literal or it came from some other source.
One possible source is an XML parser, which returns strings as unicode.
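The rule that the result type follows the argument type still holds in
Python 3, where the text type is called str and raw names come back as
bytes (in Python 2 the pair is unicode versus str). A small sketch,
assuming Python 3.2 or later for os.fsencode and
tempfile.TemporaryDirectory; the file name example.txt is hypothetical.

```python
import os
import tempfile

# Scratch directory with one hypothetical, ASCII-only file name.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "example.txt"), "w").close()

    # The type of the path argument decides the type of the results.
    text_names = os.listdir(tmp)               # str in   -> list of str
    byte_names = os.listdir(os.fsencode(tmp))  # bytes in -> list of bytes

print(text_names)  # ['example.txt']
print(byte_names)  # [b'example.txt']
```

So if you want text (and the printing issue above), pass a text path; if
you want the raw bytes the file system stores, pass a bytes path.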

Windows NT supports unicode filenames. I'm not sure about Linux; the
results may differ slightly.






>
> What I specifically do not understand is why Python wants to interpret
> the string as ASCII at all. Where is this setting hidden?
>
> I am running Python 2.3.4 on Windows XP and I want to run the program on
> Debian sarge later.
>
> Ciao, MM




More information about the Python-list mailing list