Unicode string output

Sun Jan 21 12:44:48 EST 2001

Michael Hudson wrote:
> 
> Michael Ströder <michael at stroeder.com> writes:
> 
> > Michael Hudson wrote:
> > >
> > > "Alexander Kostyrkin" <avkost66 at f4.dion.ne.jp> writes:
> > >
> > > > Surprisingly printing a unicode string that contains a Japanese kanji
> > > > character raises an exception
> > > > For example
> > > >
> > > >     print u"\u55f4"
> > > > UnicodeError: ASCII encoding error: ordinal not in range(128)
> > > >
> > >
> > > print u"\u55f4".encode('kanji')
> >
> > How about this?
> >
> > >>> u"\u55f4".encode('utf-8')
> > '\345\227\264'
> 
> Indeed.  The answer is, I guess, "it depends", hence why Python forces
> you to decide rather than assuming it knows what you're trying to do.

One has to choose an encoding to turn Unicode characters into bytes.
So a more complete answer (a list of possibilities in standard
Python) would be:

u"\u55f4".encode('utf-8')
u"\u55f4".encode('utf-16')
u"\u55f4".encode('utf-16-be')

List to be continued by the Unicode experts here...
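
As a rough sketch of what those calls produce (written in Python 3
syntax, which is an assumption; the thread itself uses the older print
statement), each encoding yields a different byte sequence for the same
character:

```python
# Sketch in Python 3 syntax (an assumption; the thread predates it).
s = u"\u55f4"  # the character from the original question

utf8 = s.encode('utf-8')          # three bytes
utf16 = s.encode('utf-16')        # byte order mark + two bytes, native byte order
utf16_be = s.encode('utf-16-be')  # two bytes, big-endian, no byte order mark

for name, data in [('utf-8', utf8), ('utf-16', utf16), ('utf-16-be', utf16_be)]:
    print(name, len(data), repr(data))

# The ASCII codec cannot represent the character at all, which is
# the kind of error reported at the top of the thread.
try:
    s.encode('ascii')
except UnicodeEncodeError as exc:
    print('ascii failed:', exc.reason)
```

The UTF-8 result is the same three bytes shown in octal earlier in the
thread ('\345\227\264').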

The displaying application then has to decode these bytes back into
Unicode characters, using the same encoding, and display the Kanji
glyphs for those characters.
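
To illustrate the receiving side, here is a minimal sketch (Python 3
syntax assumed, variable names purely illustrative) in which the bytes
are decoded with the same encoding that produced them. Guessing a
different encoding does not necessarily fail; it can quietly give the
wrong characters:

```python
# Sketch, Python 3 syntax assumed; names are illustrative only.
data = u"\u55f4".encode('utf-8')   # what the producer sends

# The consumer must decode with the encoding the producer used.
text = data.decode('utf-8')
print(text == u"\u55f4")

# Decoding with a different encoding raises no error here, but each
# byte is treated as a separate character, so the result is wrong.
wrong = data.decode('latin-1')
print(len(text), len(wrong))
```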

When doing e.g. web programming, using utf-8 might be the best guess.
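
For the web case, a minimal sketch might look like the following. It
assumes Python 3 syntax, and build_response is a hypothetical helper
made up for this example, not part of any library. The only point is
that the charset declared in the Content-Type header matches the
encoding actually used for the body:

```python
# Sketch; build_response is a hypothetical helper, not a library function.
def build_response(text, charset='utf-8'):
    # Encode once, then describe that same encoding in the headers.
    body = text.encode(charset)
    headers = [
        ('Content-Type', 'text/html; charset=%s' % charset),
        ('Content-Length', str(len(body))),  # length in bytes, not characters
    ]
    return headers, body

headers, body = build_response(u"<p>\u55f4</p>")
for name, value in headers:
    print('%s: %s' % (name, value))
```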

Ciao, Michael.