Ubuntu - Linux - Unicode - encoding

Paul Boddie paul at boddie.org.uk
Thu Feb 1 11:24:07 EST 2007


On 1 Feb, 16:02, Franz Steinhaeusler <franz.steinhaeus... at gmx.at>
wrote:
>
> The case:
> I have a file on a WindowsXP partition which has as contents german
> umlauts and the filename itself has umlauts like iÜüäßk.txt
>
> If I want to append this file to a list, I get somehow latin-1, cannot
> decode 'utf-8'.

You mean that you expect the filename in UTF-8, but it arrives as
ISO-8859-1 (Latin-1)? How do you get the filename? Via Python standard
library functions or through a GUI toolkit? What does
sys.getfilesystemencoding() report?
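
A quick way to answer those questions is to print what Python itself
believes. The following is only a sketch, written for a current Python 3
interpreter; report_encodings is a made-up helper name, not something
from the standard library.

```python
import locale
import sys


def report_encodings():
    """Collect the encodings Python uses for filenames, the locale and stdin.

    report_encodings is a hypothetical helper name used for illustration.
    """
    return {
        # Encoding used to convert between filenames and Python strings.
        "filesystem": sys.getfilesystemencoding(),
        # Encoding suggested by the current locale settings.
        "locale": locale.getpreferredencoding(False),
        # Encoding of standard input; stdin may be absent in some setups.
        "stdin": getattr(sys.stdin, "encoding", None),
    }


for name, value in report_encodings().items():
    print(name, "->", value)
```

If the filesystem value is not a UTF-8 variant on a system whose
filenames were written as UTF-8, the locale is the first place to look.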

[...]

> Why is this setdefaultencoding otherwise not working on linux?

My impression was that you absolutely should not change the default
encoding. Instead, you should react to encoding information provided
by your sources of data. For example, sys.stdin.encoding tells you
about the data from standard input.
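
As a rough illustration of reacting to the source's encoding instead of
changing the default, the sketch below decodes raw filename bytes with
an explicit list of candidate encodings. It assumes Python 3;
decode_name and the candidate list are hypothetical choices, not part of
any library.

```python
def decode_name(raw, encodings=("utf-8", "iso-8859-1")):
    """Try each candidate encoding in order; return (text, encoding_used).

    decode_name is a hypothetical helper. ISO-8859-1 accepts any byte
    sequence, so it only makes sense as the last candidate.
    """
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings matched")


# The filename from the question, written with escapes so that this
# example does not depend on the encoding of the source file itself.
name = "i\u00dc\u00fc\u00e4\u00dfk.txt"

# Bytes as a Latin-1 filesystem would store them, and as UTF-8.
latin1_bytes = name.encode("iso-8859-1")
utf8_bytes = name.encode("utf-8")

print(decode_name(latin1_bytes))
print(decode_name(utf8_bytes))
```

The Latin-1 bytes are not valid UTF-8, so the first candidate fails and
the fallback is used; that failure is the same kind of error as the
"cannot decode" message in the question.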

> (Also Filemanagers like Nautilus or Krusader cannot display the files
> correctly).

This sounds like a locale issue...

> Is there a system wide linux language setting (encoding), which I have
> to install and adjust?

I keep running into this problem when installing various
distributions. Generally, the locale needs to agree with the encoding
of the filenames in your filesystem: if you've written files with
UTF-8 filenames, you'll only see their proper names when the locale
you're using is based on UTF-8. Things like en_GB.utf8 and de_AT.utf8
would be appropriate. Such locales are often optional packages, as I
found out very recently, and you may wish to look at the
language-pack-XX and language-pack-XX-base packages for Ubuntu
(replacing XX with your language code, such as de). Once they are
installed, typing "locale -a" will list the available locales, and I
believe that setting the LANG variable in /etc/environment to one of
them may offer some kind of a solution.
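
To check which locale a process actually ends up with, something like
the sketch below may help. It assumes Python 3 and the usual POSIX
precedence (LC_ALL over LC_CTYPE over LANG); locale_summary and
is_utf8_locale are hypothetical helper names, and the name test only
inspects the text after the dot in the locale name.

```python
import os


def locale_summary(environ=None):
    """Return the locale variables and the one that takes effect.

    Hypothetical helper. Assumes POSIX precedence for the character
    type category: LC_ALL, then LC_CTYPE, then LANG. Empty values are
    treated as unset.
    """
    environ = os.environ if environ is None else environ
    names = ("LC_ALL", "LC_CTYPE", "LANG")
    settings = {name: environ.get(name) for name in names}
    effective = next((settings[n] for n in names if settings[n]), None)
    return settings, effective


def is_utf8_locale(name):
    """Guess from a locale name such as de_AT.utf8 whether it uses UTF-8."""
    if not name:
        return False
    # Take the codeset between "." and an optional "@modifier".
    codeset = name.partition(".")[2].partition("@")[0]
    return codeset.lower().replace("-", "") == "utf8"


settings, effective = locale_summary()
print("settings:", settings)
print("effective:", effective, "UTF-8 based:", is_utf8_locale(effective))
```

A locale named here still has to appear in the output of "locale -a",
otherwise programs typically fall back to the C locale.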

Another thing I also discovered very recently, after doing a
debootstrap installation of Ubuntu, was that various terminals
wouldn't reproduce non-ASCII characters without an appropriate (UTF-8)
locale being set up, even though other desktop applications were happy
to accept and display the characters. I thought this was a keyboard
issue, compounded by the exotic nested X server plus User Mode Linux
solution I was experimenting with, but I think locales were the main
problem.

Paul



