beginner - py unicode Q

John Machin sjmachin at lexicon.net
Sat Apr 7 22:42:46 EDT 2007


On Apr 8, 9:51 am, enquiring mind <braind... at braindead.com> wrote:
> I read the posting by Rehceb Rotkiv and response but don't know if it
> relates to my problem in any way.
>
> I only want to write German to the screen/console for little German
> programs/exercises in python.  No file w/r will be used.
>
> #! /usr/bin/env python
> # -*- coding: utf-8 -*-
> # Filename:  7P07png.py
> # SUSE Linux 10 Python 2.4.1 gedit 2.12.0
>
> print 'Ich zähle zwölf weiß Hüte.'
> print 'Wollen Sie'
> verbs = ( 'kömmen' , 'essen' , 'trinken' )
> print verbs[:3]
Note: the [:3] is redundant. "print verbs" would have the same effect.

When you print a container (list, tuple, dict, etc.), Python prints the
repr() of each element. You are seeing repr('kömmen'). This is great for
debugging, to see exactly what you've got (\xc3\xb6 is the utf8
encoding for small o with diaeresis (aka umlaut)) but not so great for
presentation to the user.

To see the difference, insert here:
for v in verbs:
    print v
    print str(v)
    print repr(v)
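
For output meant for the user, one option is to join the elements into a
single string instead of printing the tuple itself. A minimal sketch,
assuming unicode literals and a console whose encoding Python detects
correctly; the names "line" and "numbered" are made up for illustration,
and print is written with parentheses so the same lines also run on
Python 3:

```python
# Sketch only: the verbs come from the original post; `line` and
# `numbered` are hypothetical names used for illustration.
verbs = (u'k\xf6mmen', u'essen', u'trinken')   # u'\xf6' is o with diaeresis

# Join the elements into one string: no quotes, no escapes.
line = u', '.join(verbs)
print(line)

# Or build one numbered entry per element and print each on its own line.
numbered = [u'%d. %s' % (i + 1, v) for i, v in enumerate(verbs)]
for item in numbered:
    print(item)
```

Either way, print is handed a string rather than a tuple, so the repr()
of the elements never gets involved.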

>
> print ' program ends '
>
> console display is: Ich zähle zwölf weiß Hüte.
> Wollen Sie
> ('k\xc3\xb6mmen', 'essen', 'trinken')
> program ends
>
> The first 2 print statements in German print perfectly to screen/console
> but not the 3rd.
>
> I ran it with these lines below from Rehceb Rotkiv's code but it did not
> fix problem.
> import sys
> import codecs

Importing modules without using them is pointless.

>
> I also tried unicode string u'kömmen', but it did not fix problem.
> Any help/direction would be appreciated.  Thanks in advance.
>
> I found this reference section but I am not sure it applies or how to
> use it to solve my problem.:

It doesn't solve your problem. Forget you ever read it.

>
> This built-in setdefaultencoding(name) sets the default codec used to
> encode and decode Unicode and string objects (normally ascii) and is
> meant to be called only from sitecustomize.py at program startup; the
> site module then removes this attribute from sys.  You can call
> reload(sys) to make this attribute available again but this is not a good
> programming practice.
>
> I just thought of this.  I suppose because this is py source code, it
> should not be German but a reference/key to u'strings' to print German
> text to the screen?

It's "German" only to a human who reads the console output and
recognizes the bunches of characters as representing German words/
phrases/sentences. Python and your computer see only utf8 encoding
(which can be used to represent multiple languages all at once on the
same screen or in the same paragraph of a document).

Your console is quite happy rendering utf8 e.g. it printed "Ich zähle
zwölf weiß Hüte" OK, didn't it? Try this:

print "blahblah"
print u"blahblah".encode('utf8')
print u"blahblah"

and see what happens.
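
As a rough guide to what is going on underneath: a unicode string and its
utf8 encoding are different objects. The first counts characters, the
second counts bytes. A small sketch, assuming nothing beyond the standard
encode/decode methods (the names "text" and "encoded" are just for
illustration):

```python
text = u'zw\xf6lf'                 # 5 characters; u'\xf6' is o with diaeresis
encoded = text.encode('utf8')      # the same text as utf8-encoded bytes

print(len(text))                   # number of characters
print(len(encoded))                # number of bytes; the o-umlaut takes two
print(repr(encoded))               # shows the \xc3\xb6 escape seen in the tuple output

# Decoding with the same codec gives the original text back.
assert encoded.decode('utf8') == text
```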

HTH,
John



