Printing Filenames with Non-ASCII Characters

Marian Aldenhövel marian at mba-software.de
Wed Feb 2 04:12:05 EST 2005


Hi,

Thank you very much, you have collectively cleared up some of the confusion.

> The English Windows command prompt uses the cp437 charset.

To be exact, my Windows is German, but I am not printing to the command-prompt
window. I am using Eclipse with the PyDev plugin as my development platform,
and the output is redirected to the console view in the IDE. I am not sure how
this affects the problem, and I have since tried a plain console too. The
problem stays the same, though.

I wonder what surprises are waiting for me when I first move this to my
Linux box :-). I believe it uses UTF-8 throughout.

> print d.encode('cp437')

So I would have to specify the encoding on every call to print? I am sure to
forget, and I don't like the program dying; in my case garbled output would be
much more acceptable.
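A minimal sketch of the per-call approach, written for Python 3 (where the 2.x `unicode` type is called `str` and the 2.x byte `str` is called `bytes`). Passing `errors="replace"` to `encode()` substitutes `?` for characters the target encoding lacks, so the output is garbled instead of the program dying. The helper name `safe_bytes` is hypothetical.

```python
def safe_bytes(text, encoding="cp437"):
    # Encode for a specific console; unmappable characters become b"?"
    # instead of raising UnicodeEncodeError.
    return text.encode(encoding, errors="replace")

name = "Aldenh\u00f6vel"
print(safe_bytes(name))            # the umlaut exists in cp437, nothing is lost
print(safe_bytes(name, "ascii"))   # the umlaut is replaced by "?"

try:
    name.encode("ascii")           # the default "strict" handler raises
except UnicodeEncodeError as exc:
    print("strict encoding failed:", exc.reason)
```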

Is there some global way of forcing an encoding instead of the default
'ascii'? I have found references to setencoding(), but it seems to have gone
away.
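Rather than changing the interpreter-wide default, one common approach is to wrap the output stream once, so every write is encoded with a chosen encoding and error handler. A Python 3 sketch using `codecs.getwriter`; the function name `tolerant_writer` is hypothetical, and the demo writes to an in-memory `io.BytesIO` rather than to the real console.

```python
import codecs
import io

def tolerant_writer(byte_stream, encoding="cp437"):
    # codecs.getwriter returns a StreamWriter class for `encoding`;
    # errors="replace" turns unencodable characters into "?".
    return codecs.getwriter(encoding)(byte_stream, errors="replace")

buffer = io.BytesIO()
out = tolerant_writer(buffer, "ascii")
out.write("Aldenh\u00f6vel\n")
print(buffer.getvalue())
```

To apply this to the console in Python 3, one could assign `sys.stdout = tolerant_writer(sys.stdout.buffer)`; since Python 3.7, `sys.stdout.reconfigure(encoding="cp437", errors="replace")` does the same without a wrapper.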

> The issue is that a terminal only understands a certain character set.

I have experimented a bit now, and I can make it work using encode(). The
Eclipse console uses a different encoding than my Windows command prompt, by
the way. I am sure this can be configured somewhere, but I do not really care
at the moment.
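To see which encoding a given console expects, the stream's `encoding` attribute is a reasonable first check. A Python 3 sketch; `stream_encoding` is a hypothetical helper, and the value it reports depends on the console and platform.

```python
import io
import sys

def stream_encoding(stream):
    # Text streams report the encoding they will use; some replacement
    # objects (IDE consoles, test harnesses) may lack the attribute,
    # so fall back to None.
    return getattr(stream, "encoding", None)

print("stdout encoding:", stream_encoding(sys.stdout))

# A text wrapper over an in-memory byte buffer, standing in for a console
# configured for cp437.
fake_console = io.TextIOWrapper(io.BytesIO(), encoding="cp437")
print("fake console encoding:", stream_encoding(fake_console))
```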

> If you have a unicode string, like d in your case, you have to encode it before
> it can be printed.

I got that now.

So encode() is a method of a unicode string, right? I come from a background
of statically typed languages, so I am a bit queasy when I am not allowed to
specify types explicitly.

How can I find out, maybe by print()-ing something, what type d actually is?
Just to make sure and to get a better feeling for the system.
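Python reports the type at run time: `type()` returns it and `isinstance()` tests it. A Python 3 sketch; in Python 2 the two types involved would be named `unicode` and `str` rather than `str` and `bytes`.

```python
d = "Aldenh\u00f6vel"          # a text (unicode) string
raw = d.encode("utf-8")        # the same name as encoded bytes

print(type(d))                 # <class 'str'>
print(type(raw))               # <class 'bytes'>
print(repr(raw))               # repr() of bytes shows the raw escape sequences

assert isinstance(d, str) and isinstance(raw, bytes)
```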

Should d at any time not be a unicode string but some other flavour of string,
will encode() still work? Or do I need to write a function myPrint() that
distinguishes them by type and calls encode() only for unicode strings?
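For reference: in Python 2, calling `encode()` on a byte string first decodes it implicitly with the default ASCII codec, so it only works for pure-ASCII bytes; in Python 3, `bytes` has no `encode()` method at all. A type-dispatching `myPrint()` is therefore a reasonable idea. A Python 3 sketch, assuming byte strings are already in the console's encoding; `to_console_bytes` and `my_print` are hypothetical names.

```python
import io

def to_console_bytes(value, encoding="cp437"):
    # Bytes are assumed to be encoded correctly already and pass through;
    # text is encoded; anything else is converted with str() first.
    if isinstance(value, bytes):
        return value
    if not isinstance(value, str):
        value = str(value)
    return value.encode(encoding, errors="replace")

def my_print(value, out, encoding="cp437"):
    # `out` must be a binary stream, e.g. sys.stdout.buffer in Python 3.
    out.write(to_console_bytes(value, encoding) + b"\n")

buf = io.BytesIO()
my_print("Aldenh\u00f6vel", buf)      # text: encoded for cp437
my_print(b"already encoded", buf)     # bytes: passed through unchanged
my_print(42, buf)                     # other objects: str() first
print(buf.getvalue())
```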

> So how did you get a unicoded d to start with?

I had asked myself this question before, after reading the docs for
os.listdir(). But I had no way of finding out what type d really is (see the
question above :-)), so I was dead-reckoning.

Can I force a string to be of a certain type? Like

     nonunicode = d.encode("specialencoding")

How would I do it the other way round? From the encoded representation back
to full unicode?
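Both directions exist: `encode()` goes from text to bytes and `decode()` goes from bytes back to text, given the same encoding name. A Python 3 sketch (in Python 2 the result of `decode()` would be a `unicode` object):

```python
text = "Aldenh\u00f6vel"

encoded = text.encode("cp437")        # text -> bytes in a chosen encoding
decoded = encoded.decode("cp437")     # bytes -> text, same encoding

assert decoded == text

# Decoding with the wrong encoding does not necessarily fail; it may
# silently produce different characters (mojibake).
wrong = text.encode("utf-8").decode("cp437")
assert wrong != text
```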

> If 'somepath' is unicode, os.listdir returns a list of unicode.
> So why is somepath unicode?

> One possible source is an XML parser, which returns strings in unicode.

I get a root directory from XML and walk the filesystem from there. That
explains it.
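The same chain still applies in Python 3 under different type names: an XML parser returns text (`str`), and `os.listdir()` returns names of the same type as its argument, `str` for a `str` path and `bytes` for a `bytes` path. A sketch using a throwaway directory from `tempfile`; the `<config root="...">` document and the file name are made up for the demo.

```python
import os
import tempfile
from xml.etree import ElementTree
from xml.sax.saxutils import quoteattr

with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "example.txt"), "w").close()

    # Hypothetical config naming the start directory; the parser hands
    # back str, much as the 2.x parsers returned unicode.
    config = ElementTree.fromstring("<config root=%s/>" % quoteattr(tmp))
    root = config.get("root")

    text_names = os.listdir(root)                 # str path -> list of str
    byte_names = os.listdir(os.fsencode(root))    # bytes path -> list of bytes

    print([type(n).__name__ for n in text_names])
    print([type(n).__name__ for n in byte_names])
```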

> Windows NT supports unicode filenames. I'm not sure about Linux. The
> results may differ slightly.

I think I will worry about that later. I can create files with German umlauts
in their names on the Linux box, so I am sure I will find a way to get those
names into my Python program.
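On Linux, file names are stored as raw bytes; Python 3 converts them to and from text with the filesystem encoding (usually UTF-8) via `os.fsdecode()` and `os.fsencode()`. A sketch of that round trip; the byte string is only an example of what a Linux box might store.

```python
import os
import sys

raw = b"Aldenh\xc3\xb6vel"            # UTF-8 bytes a Linux box might store
name = os.fsdecode(raw)              # bytes -> str using the filesystem encoding
assert os.fsencode(name) == raw      # and back to the identical bytes

if os.name == "posix":
    # On POSIX, bytes that are invalid in the filesystem encoding are mapped
    # to lone surrogates ("surrogateescape"), so even they round-trip.
    odd = b"bad\xff"
    assert os.fsencode(os.fsdecode(odd)) == odd

print("filesystem encoding:", sys.getfilesystemencoding())
```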

I will not move data between the systems, so there should not be much of
a problem.

Ciao, MM
-- 
Marian Aldenhövel, Rosenhain 23, 53123 Bonn. +49 228 624013.
http://www.marian-aldenhoevel.de
"There is a procedure to follow in these cases, and if followed it can
  pretty well guarantee a generous measure of success, success here
  defined as survival with major extremities remaining attached."