UTF-8 to unicode or latin-1 (and yes, I read the FAQ)

NoelByron at gmx.net
Thu Oct 19 05:56:20 EDT 2006


> >
> > 'K\xc3\xb6ni'.decode('utf-8')     # 'K\xc3\xb6ni' should be 'König',
>
> "Köni", to be precise.

Äh, yes.
;o)

> > contains a german 'umlaut'
> >
> > but failed since python assumes every string to decode to be ASCII?
>
> that should work, and it sure works for me:
>
>  >>> s = 'K\xc3\xb6ni'.decode('utf-8')
>  >>> s
> u'K\xf6ni'
>  >>> print s
> Köni
>
> what did you do, and how did it fail?

First, thank you so much for answering so quickly. I proposed Python for a
project, and it would be very embarrassing for me if I failed to
convert a UTF-8 string to latin-1.

I realized that my problem is not the decoding from UTF-8. The exception
is raised by print when I try to print the unicode string:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in
position 1: ordinal not in range(128)
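
The traceback is consistent with print falling back to the ASCII codec,
which Python 2 does when it cannot determine the output encoding. That
cause is an assumption about the setup, not something confirmed in this
thread. A minimal sketch that reproduces the failure without print and
shows an explicit encode as a workaround (the variable names are
illustrative, and the b'' literals are used so the snippet also runs on
newer Python versions):

```python
# UTF-8 bytes from the thread, decoded to a unicode string (u'K\xf6ni').
text = b'K\xc3\xb6ni'.decode('utf-8')

# Roughly what print attempts when the output encoding is ASCII:
# the o-umlaut has no ASCII representation, so encoding fails.
try:
    text.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Encoding explicitly avoids the implicit ASCII step.
latin1_bytes = text.encode('latin-1')              # one byte per character
ascii_fallback = text.encode('ascii', 'replace')   # unencodable chars become '?'
```

On a latin-1 terminal, writing latin1_bytes to the output should display
the umlaut correctly; the 'replace' variant is only a lossy fallback.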

But that is not a problem at all, since I can now turn my UTF-8 strings
into unicode! Once again the problem was sitting right in front of my
screen. Silly me...
;o)
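
For the conversion named in the subject line, a short sketch of going
from UTF-8 to latin-1 by way of unicode. The helper name utf8_to_latin1
is hypothetical and not taken from this thread:

```python
def utf8_to_latin1(data):
    """Convert UTF-8 encoded bytes to latin-1 encoded bytes."""
    text = data.decode('utf-8')      # bytes -> unicode
    return text.encode('latin-1')    # unicode -> latin-1 bytes


converted = utf8_to_latin1(b'K\xc3\xb6ni')

# Characters outside latin-1 (here the euro sign) cannot be converted
# and raise UnicodeEncodeError unless an error handler is passed.
try:
    utf8_to_latin1(b'\xe2\x82\xac')
    euro_failed = False
except UnicodeEncodeError:
    euro_failed = True
```

Going through unicode as the intermediate form keeps the two codecs
independent, so the same pattern works for any pair of encodings.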

Again, thank you for your reply!

Best regards,
Noel



