Ubuntu - Linux - Unicode - encoding

Franz Steinhäusler franz.steinhaeusler at gmx.at
Thu Feb 1 15:05:55 EST 2007


On 1 Feb 2007 08:24:07 -0800, "Paul Boddie" <paul at boddie.org.uk>
wrote:

>On 1 Feb, 16:02, Franz Steinhaeusler <franz.steinhaeus... at gmx.at>
>wrote:
>>
>> The case:
>> I have a file on a Windows XP partition whose contents include German
>> umlauts, and the filename itself has umlauts, like iÜüäßk.txt
>>
>> If I want to append this file to a list, I get somehow latin-1, cannot
>> decode 'utf-8'.
>
>You mean that you expect the filename in UTF-8, but it arrives as
>ISO-8859-1 (Latin1)? How do you get the filename? Via Python standard
>library functions or through a GUI toolkit? What does
>sys.getfilesystemencoding report?

Hello Paul,

I already set the sysencoding to 'latin-1', but apparently the value
is ignored and 'utf-8' is used instead (?)

I get it with
thelist = os.listdir(directory), where directory is a byte string (str),
not unicode.
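
A minimal sketch of one way to handle such names, assuming Python 3 (where os.listdir returns bytes entries when given a bytes path) and assuming the names are either UTF-8 or Latin-1. The helpers decode_filename and list_decoded are hypothetical names for illustration, not part of any library:

```python
import os

def decode_filename(raw, encodings=("utf-8", "latin-1")):
    """Decode a raw filename (bytes) by trying each encoding in turn.

    Assumption: the name is UTF-8 or Latin-1. UTF-8 is tried first
    because arbitrary Latin-1 text is rarely valid UTF-8.
    """
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Only reached if a custom encoding list without latin-1 is passed.
    return raw.decode("ascii", errors="replace")

def list_decoded(directory):
    """List a directory as text, decoding each raw entry individually."""
    # A bytes path makes os.listdir return undecoded bytes entries.
    raw_names = os.listdir(os.fsencode(directory))
    return [decode_filename(name) for name in raw_names]
```

In the Python 2 of this thread, passing a unicode directory to os.listdir made it return unicode names, which avoids touching the default encoding at all.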

>
>[...]
>
>> Why is this setdefaultencoding otherwise not working on Linux?
>
>My impression was that you absolutely should not change the default
>encoding. 

Aha.


>Instead, you should react to encoding information provided
>by your sources of data. For example, sys.stdin.encoding tells you
>about the data from standard input.
>
>> (Also file managers like Nautilus or Krusader cannot display the files
>> correctly).
>
>This sounds like a locale issue...

Hm, so it is a setting in Linux.

>
>> Is there a system-wide Linux language setting (encoding), which I have
>> to install and adjust?
>
>I keep running into this problem when installing various
>distributions. Generally, the locale needs to agree with the encoding
>of the filenames in your filesystem, so that if you've written files
>with UTF-8 filenames, you'll only see them with their proper names if
>the locale you're using is based on UTF-8 - things like en_GB.utf8 and
>de_AT.utf8 would be appropriate. Such locales are often optional
>packages, as I found out very recently, and you may wish to look at
>the language-pack-XX and language-pack-XX-base packages for Ubuntu
>(substituting XX for your chosen language). Once they are installed,
>typing "locale -a" will let you see available locales, and I believe
>that changing /etc/environment and setting the LANG variable there to
>one of the available locales may offer some kind of a solution.

Ah, thank you very much for that enlightenment!
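
To see what Python makes of the current locale, something like the following might help. It is only a sketch, assuming Python 3; encoding_report and looks_utf8 are hypothetical helper names, while the sys and locale calls are standard library functions:

```python
import locale
import sys

def encoding_report():
    """Collect the encodings Python derives from the environment."""
    return {
        # Encoding used to convert filenames to and from text.
        "filesystem": sys.getfilesystemencoding(),
        # Encoding suggested by the locale (LANG, LC_ALL, ...).
        "preferred": locale.getpreferredencoding(False),
        # Encoding of standard input; stdin may be absent or replaced.
        "stdin": getattr(sys.stdin, "encoding", None),
        # Default str/bytes conversion encoding; best left unchanged.
        "default": sys.getdefaultencoding(),
    }

def looks_utf8(name):
    """Return True if an encoding name is some spelling of UTF-8."""
    if name is None:
        return False
    return name.lower().replace("-", "").replace("_", "") == "utf8"

if __name__ == "__main__":
    for key, value in encoding_report().items():
        print(key, value, "(UTF-8)" if looks_utf8(value) else "")
```

If the filenames on disk are UTF-8 but "filesystem" or "preferred" report something else, that would point to the locale mismatch described above.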

>
>Another thing I also discovered very recently, after doing a
>debootstrap installation of Ubuntu, was that various terminals
>wouldn't reproduce non-ASCII characters without an appropriate (UTF-8)
>locale being set up, even though other desktop applications were happy
>to accept and display the characters.

That sounds familiar to me! ;)

> I thought this was a keyboard
>issue, compounded by the exotic nested X server plus User Mode Linux
>solution I was experimenting with, but I think locales were the main
>problem.
>
>Paul

So that is not exactly simple. :)

Thank you very much for this precise answer!
-- 
Franz Steinhaeusler


