[Python-Dev] Shouldn't I be able to print Unicode objects?

M.-A. Lemburg mal@lemburg.com
Tue, 05 Jun 2001 23:00:23 +0200


Skip Montanaro wrote:
> 
>     mal> Please see Lib/site.py for details on how to enable all these
>     mal> goodies -- it's all there, just disabled and meant for super-users
>     mal> only ;-)
> 
> Okay, I found the encoding section.  I changed the encoding variable
> assignment to be
> 
>     encoding = "latin1"
> 
> and now the degree sign print works.  What other side-effects will that have
> besides on printed representations?  It appears I can create (but not see
> properly?)  variable names containing latin1 characters:
> 
> >>> ümlaut = "ümlaut"

Huh? That should not be possible! Python identifiers are still
restricted to ASCII.

>>> ümlaut = 'ümlaut'
  File "<stdin>", line 1
    ümlaut = 'ümlaut'
    ^
SyntaxError: invalid syntax
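The SyntaxError above follows from the Python 2 identifier grammar, which allows only ASCII letters, digits and the underscore. That rule can be sketched as a regular expression. This is a minimal sketch, assuming the grammar from the Python 2 Language Reference ("Identifiers and keywords"). The helper name is_py2_identifier is hypothetical, and reserved keywords are not filtered out.

```python
import re

# Python 2.x identifier grammar:
#   identifier ::= (letter | "_") (letter | digit | "_")*
# where letter and digit are ASCII-only. Keywords are NOT excluded here.
_PY2_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def is_py2_identifier(name):
    """Return True if name matches the ASCII-only Python 2 identifier rule.

    Hypothetical helper for illustration; it does not reject keywords.
    """
    return _PY2_IDENTIFIER.match(name) is not None
```

With this check, is_py2_identifier("mlaut") is true, while is_py2_identifier(u"\u00fcmlaut") is false, because the leading "ü" is not an ASCII letter.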

> >>> print locals().keys()
> ['orca', 'dir', '__doc__', 'rlcompleter', 'missionb', 'version', 'dirpat', 'xmlrpclib', 'belugab', '__builtin__', 'beluga', 'readline', '__name__', 'orcab', 'addpath', 'Writer', 'atexit', 'sys', 'dolphinb', 'mission', 'pprint', 'dolphin', '__builtins__', 'mlaut', 'help']
> 
> I am having trouble printing some strings containing latin1 characters:
> 
> >>> print ümlaut
> mlaut
> >>> type("ümlaut")
> <type 'string'>
> >>> type(string.letters)
> <type 'string'>
> >>> print "ümlaut"
> mlaut
> >>> print string.letters
> abcdefghijklmnopqrstuvwxyzµßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿABCDEFGHIJKLMNOPQRSTUVWXYZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ
> >>> print string.letters[55:]
> üýþÿABCDEFGHIJKLMNOPQRSTUVWXYZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ
> 
> The above was pasted from Python running in a shell session in XEmacs, which
> is certainly latin1-aware.  Why did I have trouble seeing the ü in some
> situations, but not in others?

No idea what's going on there. The encoding setting should not
have any effect on printing ordinary 8-bit strings. It only defines
the default encoding used for coercion and automatic conversion
between Unicode and 8-bit strings, in both directions.
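The conversions that the default encoding governs can be spelled out with explicit encode()/decode() calls. This is a minimal sketch, assuming a modern Python where implicit Unicode/8-bit coercion no longer exists, so each step the default encoding used to perform behind the scenes is written out by hand. DEFAULT_ENCODING, coerce_to_unicode and convert_to_8bit are hypothetical names, and "latin1" mirrors the site.py setting quoted above.

```python
# Hypothetical stand-in for the default encoding set in Lib/site.py;
# "latin1" matches the setting used in the message above.
DEFAULT_ENCODING = "latin1"


def coerce_to_unicode(data, encoding=DEFAULT_ENCODING):
    """Decode 8-bit data, as implicit 8-bit -> Unicode coercion would."""
    return data.decode(encoding)


def convert_to_8bit(text, encoding=DEFAULT_ENCODING):
    """Encode Unicode text, as implicit Unicode -> 8-bit conversion would."""
    return text.encode(encoding)


raw = b"\xfcmlaut"                 # "ümlaut" as Latin-1 bytes
text = coerce_to_unicode(raw)      # decoded using the default encoding
assert text == u"\u00fcmlaut"
assert convert_to_8bit(text) == raw  # round-trips back to the same bytes

# With the stock "ascii" default, the same conversion is rejected.
try:
    convert_to_8bit(text, "ascii")
except UnicodeEncodeError:
    pass
```

Neither helper comes into play when an 8-bit string is printed. Its bytes go to the terminal unchanged, which is why the encoding setting should not matter in that case.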

> Are the ramifications of all this encoding stuff documented somewhere?

The basics are covered in Misc/unicode.txt, on the i18n SIG page,
and in some resources on the web. I'll also give a talk about
Unicode in Bordeaux, which will probably provide some additional
help.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/