locale.CODESET / different in python shell and scripts
Nuff Said
nuffsaid at phreaker.net
Thu Apr 29 17:44:22 EDT 2004
On Thu, 29 Apr 2004 22:14:23 +0200, Martin v. Löwis wrote:
> PLEASE invoke
>
> locale.setlocale(locale.LC_ALL, "")
>
> before invoking nl_langinfo. Different C libraries behave differently
> in their nl_langinfo responses if setlocale hasn't been called.
Thanks a lot for your help!
That solved (part of) the problem; now I get 'UTF-8' (which is correct)
when running the following script (with either my self-compiled Python
2.3 or Fedora's Python 2.2):
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
import locale
locale.setlocale(locale.LC_ALL, "")
encoding = locale.nl_langinfo(locale.CODESET)
print encoding
Still, one problem remains:
When I add the following line to the above script
print u"schönes Mädchen".encode(encoding)
the result is:
schönes Mädchen (with my self-compiled Python 2.3)
schönes Mädchen (with Fedora's Python 2.2)
I observed that my Python gives me the correct value, 15, for
len(u"schönes Mädchen"), whereas Fedora's Python says 17 (one more
for each German umlaut, i.e. the length of the UTF-8 representation of
the string; note that the file uses the coding cookie for UTF-8).
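The 15 versus 17 difference can be reproduced in isolation. The
following is a minimal sketch, assuming a Python 3 interpreter (not the
Python 2.2/2.3 discussed here) and a source file saved as UTF-8; the
variable names are illustrative only. It shows that reading the UTF-8
bytes of the string one byte per character (here via Latin-1) yields 17
characters instead of 15. Whether this is what actually happens inside
Fedora's Python 2.2 is an assumption, not something confirmed above.

```python
# Assumes Python 3 and a UTF-8 encoded source file.
text = "schönes Mädchen"

# Encoding to UTF-8 turns each umlaut into two bytes.
utf8_bytes = text.encode("utf-8")

# Hypothetical misreading: treat every UTF-8 byte as one Latin-1 character.
misread = utf8_bytes.decode("latin-1")

print(len(text))        # 15 characters
print(len(utf8_bytes))  # 17 bytes
print(len(misread))     # 17 characters

# Encoding the misread string again produces double-encoded output.
double_encoded = misread.encode("utf-8")
print(len(double_encoded))  # 21 bytes
```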
Maybe Fedora's Python was compiled without Unicode support?
(Is that even possible? I recall something about a UCS2 vs.
UCS4 switch when compiling Python; but without Unicode support?
And if it were possible, shouldn't a Python without Unicode
support disallow strings of the form u"..." or at least show a warning?)
This really drives me nuts because I thought the above approach
was the correct way to ensure that Python scripts can print
non-ASCII characters on any terminal (that is able to display
those characters in some encoding such as UTF-8, ISO-8859-x, ...).
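For illustration, the approach described above could be wrapped up
roughly as follows. This is only a sketch under stated assumptions: it
targets Python 3 rather than the Python 2.2/2.3 used in this post, the
helper names terminal_encoding and encode_for_terminal are hypothetical,
and falling back to "ascii" with "?" replacement is just one possible
choice for terminals that cannot represent a character.

```python
import locale


def terminal_encoding(default="ascii"):
    """Return the codeset of the user's locale, or `default` if unknown."""
    try:
        # Adopt the user's locale first, as advised earlier in the thread.
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        return default
    if hasattr(locale, "nl_langinfo") and hasattr(locale, "CODESET"):
        return locale.nl_langinfo(locale.CODESET) or default
    # nl_langinfo is not available on every platform.
    return locale.getpreferredencoding(False) or default


def encode_for_terminal(text, encoding):
    """Encode text; unrepresentable characters become '?' instead of raising."""
    return text.encode(encoding, errors="replace")
```

With these helpers, encode_for_terminal(text, terminal_encoding())
yields the bytes to write to the terminal. For an ASCII-only codeset the
umlauts degrade to "?" rather than raising UnicodeEncodeError.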
Am I doing something utterly wrong here?
Python can't be that complicated?
Nuff.