locale.CODESET / different in python shell and scripts

Thu Apr 29 17:44:22 EDT 2004

On Thu, 29 Apr 2004 22:14:23 +0200, Martin v. Löwis wrote:
> PLEASE invoke
> 
> locale.setlocale(locale.LC_ALL, "")
> 
> before invoking nl_langinfo. Different C libraries behave differently
> in their nl_langinfo responses if setlocale hasn't been called.

Thanks a lot for your help! 

That solved (part of) the problem; now I get 'UTF-8' (which is correct) 
when running the following script (with either my self-compiled Python 
2.3 or Fedora's Python 2.2):

  #!/usr/bin/env python
  # -*- coding: UTF-8 -*-

  import locale

  locale.setlocale(locale.LC_ALL, "")
  encoding = locale.nl_langinfo(locale.CODESET)
  print encoding

Still, one problem remains: 

When I add the following line to the above script

  print u"schönes Mädchen".encode(encoding)

the result is:

  schönes Mädchen    (with my self-compiled Python 2.3)
  schÃ¶nes MÃ¤dchen  (with Fedora's Python 2.2)

I observed that my Python gives me 15 (the correct value) for
len(u"schönes Mädchen"), whereas Fedora's Python says 17 (one more
for each German umlaut, i.e. the length of the UTF-8 representation of
the string; note that the file uses the coding cookie for UTF-8).
Maybe Fedora's Python was compiled without Unicode support?

  (Is that even possible? I recall something about a UCS2 or
  UCS4 switch when compiling Python; but without Unicode support?
  And if it were possible, shouldn't a Python without Unicode
  support disallow strings of the form u"..." or at least show a warning?)
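
To make the 15 versus 17 difference concrete, here is a minimal sketch.
It assumes (this is a guess, not something confirmed in this thread) that
the older interpreter does not honour the coding cookie and reads the
UTF-8 bytes of the literal as if they were Latin-1 characters. The
variable names are only illustrative.

```python
# -*- coding: utf-8 -*-
# Escapes are used so the result does not depend on how the
# interpreter decodes the source file.
text = u"sch\u00f6nes M\u00e4dchen"

utf8_bytes = text.encode("UTF-8")

char_count = len(text)         # number of characters
byte_count = len(utf8_bytes)   # number of bytes; each umlaut needs two in UTF-8

# Assumption: the UTF-8 bytes get misread as Latin-1 characters.
# Every byte then becomes one character, so the "string" grows
# to the byte length and each umlaut turns into two characters.
misdecoded = utf8_bytes.decode("ISO-8859-1")

# Encoding the misread string for a UTF-8 terminal garbles it further.
double_encoded = misdecoded.encode("UTF-8")
```

Under that assumption, len(misdecoded) equals the byte count rather than
the character count, and printing it on a UTF-8 terminal gives the
garbled form shown above.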

This really drives me nuts because I thought the above approach
was the correct way to ensure that Python scripts can print
non-ASCII characters on any terminal (one that is able to display
those characters in some encoding such as UTF-8, ISO-8859-x, ...).
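
The approach I have in mind can be sketched roughly as follows. This is
only a sketch under assumptions: it is written for a current Python 3
interpreter rather than the 2.2/2.3 versions discussed here, the helper
names detect_terminal_encoding and encode_for_terminal are made up for
illustration, and locale.nl_langinfo exists only on Unix-like platforms.

```python
import locale


def detect_terminal_encoding():
    """Return the codeset of the user's locale, falling back to ASCII."""
    try:
        # Adopt the locale from the environment (LANG, LC_ALL, ...).
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        # The environment names a locale the C library does not know.
        return "ascii"
    # Only meaningful after setlocale has been called.
    return locale.nl_langinfo(locale.CODESET) or "ascii"


def encode_for_terminal(text, encoding):
    """Encode text, substituting '?' for characters the codeset lacks."""
    return text.encode(encoding, "replace")
```

The bytes returned by encode_for_terminal would then be written to the
terminal unchanged. The "replace" error handler is a design choice: on a
terminal that cannot show umlauts the script prints '?' instead of
raising UnicodeEncodeError.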

Is there something I am doing utterly wrong here?
Python can't be that complicated, can it?

Nuff.