Printing Filenames with non-Ascii-Characters
Marian Aldenhövel
marian at mba-software.de
Wed Feb 2 04:12:05 EST 2005
Hi,
Thank you very much, you have collectively cleared up some of the confusion.
> English windows command prompt uses cp437 charset.
To be exact, my Windows is German, but I am not outputting to the command prompt
window. I am using Eclipse with the PyDev plugin as development platform and
the output is redirected to the console view in the IDE. I am not sure how
this affects the problem and have since tried a vanilla console too. The
problem stays the same, though.
I wonder what surprises are waiting for me when I first move this to my
Linux box :-). I believe it uses UTF-8 throughout.
> print d.encode('cp437')
So I would have to specify the encoding on every call to print? I am sure to
forget, and I don't like the program dying; in my case garbled output would be
much more acceptable.
Is there some global way of forcing an encoding instead of the default
'ascii'? I have found references to setencoding() but this seems to have gone
away.
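
A minimal sketch of one global approach, assuming Python 3 (where strings are
Unicode by default): wrap the output stream once, so that every later print()
encodes with a chosen codec and substitutes '?' for characters the codec cannot
represent instead of raising. The helper name make_tolerant_stream is
hypothetical, and an in-memory buffer stands in for the real console.

```python
import io

def make_tolerant_stream(byte_stream, encoding="cp437"):
    """Wrap a binary stream so that text written to it is encoded with
    `encoding`; unencodable characters are replaced by '?'."""
    return io.TextIOWrapper(byte_stream, encoding=encoding,
                            errors="replace", newline="\n")

# Demo: a BytesIO buffer plays the role of the console.
buffer = io.BytesIO()
out = make_tolerant_stream(buffer)
print("Aldenhövel \u20ac", file=out)   # the euro sign does not exist in cp437
out.flush()
captured = buffer.getvalue()           # ö is encoded, the euro sign becomes '?'
```

In a real program the same wrapper could be applied to sys.stdout.buffer; from
Python 3.7 on, sys.stdout.reconfigure(errors="replace") changes the existing
stream in place.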
> The issue is a terminal only understand certain character set.
I have experimented a bit now and I can make it work using encode(). The
Eclipse console uses a different encoding than my Windows command prompt, by
the way. I am sure this can be configured somewhere but I do not really care
at the moment.
> If you have unicode string, like d in your case, you have to encode it before
> it can be printed.
I got that now.
So encode() is a method of a unicode string, right? I come from a background
of statically typed languages so I am a bit queasy when I am not allowed to
explicitly specify type.
How can I, maybe by print()-ing something, find out what type d actually
is? Just to make sure and get a better feeling for the system?
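
A small sketch of how the type can be inspected, assuming Python 3, where the
text type is called str and the encoded flavour is called bytes (in Python 2
the corresponding names are unicode and str). The variable names are only
illustrative.

```python
d = "Aldenhövel"            # a text (Unicode) string
raw = d.encode("utf-8")     # the same name as an encoded byte string

print(type(d))              # shows the class of the object
print(type(raw))
print(repr(raw))            # repr() shows the bytes without interpreting them

# isinstance() is the usual way to test for a type in code.
is_text = isinstance(d, str)
is_bytes = isinstance(raw, bytes)
```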
Should d at any time not be a unicode string but some other flavour of string,
will encode() still work? Or do I need to write a function myPrint() that
distinguishes them by type and calls encode() only for unicode strings?
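
One possible shape for such a function, as a sketch under the assumption of
Python 3 (text is str, encoded data is bytes, and bytes has no encode()
method). The names to_console_bytes and my_print are hypothetical, and cp437 is
only an example target encoding.

```python
import io

def to_console_bytes(value, encoding="cp437"):
    """Return bytes suitable for a console that expects `encoding`.

    Text strings are encoded, with unencodable characters replaced by '?'.
    Byte strings are assumed to be encoded already and are passed through.
    """
    if isinstance(value, str):
        return value.encode(encoding, errors="replace")
    if isinstance(value, bytes):
        return value
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)

def my_print(value, stream, encoding="cp437"):
    """Write one line to a binary stream, whatever flavour of string it is."""
    stream.write(to_console_bytes(value, encoding) + b"\n")

# Demo: a BytesIO buffer stands in for the console.
console = io.BytesIO()
my_print("Aldenhövel", console)
my_print(b"already encoded", console)
output = console.getvalue()
```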
> So how did you get a unicoded d to start with?
I have asked myself this question before after reading the docs for
os.listdir(). But I have no way of finding out what type d really is (see
question above :-)). So I was dead-reckoning.
Can I force a string to be of a certain type? Like
nonunicode=unicode.encode("specialencoding")
How would I do it the other way round? From encoded representation to full
unicode?
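
A short sketch of both directions, assuming Python 3: encode() turns text into
bytes, and decode() on the bytes gets the text back, provided the same encoding
is named both times. latin-1 and cp437 are used here only as examples.

```python
text = "Aldenhövel"

encoded = text.encode("latin-1")      # text -> encoded bytes
decoded = encoded.decode("latin-1")   # encoded bytes -> text again

# Decoding with a different codec does not fail here, but it yields
# different (garbled) text, because the byte for ö means something else.
garbled = encoded.decode("cp437")
```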
> If 'somepath' is unicode, os.listdir returns a list of unicode.
> So why is somepath unicode?
> One possible source is XML parser, which returns string in unicode.
I get a root-directory from XML and I walk the filesystem from there. That
explains it.
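
The behaviour described in the quote can be checked with a small experiment.
This sketch assumes Python 3, where the rule reads: a str path gives a list of
str, a bytes path gives a list of bytes. The temporary directory and the file
name example.txt are made up for the demonstration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Create one empty file so that the listing is not empty.
    open(os.path.join(root, "example.txt"), "w").close()

    text_names = os.listdir(root)                # str in, list of str out
    byte_names = os.listdir(os.fsencode(root))   # bytes in, list of bytes out
```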
> Windows NT support unicode filename. I'm not sure about Linux. The
> result maybe slightly differ.
I think I will worry about that later. I can create files using German umlauts
on the Linux box. I am sure I will find a way to move those names into my
Python program.
I will not move data between the systems so there will not be much of
a problem.
Ciao, MM
--
Marian Aldenhövel, Rosenhain 23, 53123 Bonn. +49 228 624013.
http://www.marian-aldenhoevel.de
"There is a procedure to follow in these cases, and if followed it can
pretty well guarantee a generous measure of success, success here
defined as survival with major extremities remaining attached."