a question about unicode in python

Evan Klitzke evan at yelp.com
Wed Jun 13 04:11:01 EDT 2007


On 6/13/07, Andre Engels <andreengels at gmail.com> wrote:
> 2007/6/12, WolfgangZ <wollez at gmx.net>:
> > hzqij schrieb:
> > > I have a Python source file, test.py:
> > >
> > > # -*- coding: UTF-8 -*-
> > >
> > > # s is a unicode string, include chinese
> > > s = u'张三'
> > >
> > > Then I run
> > >
> > > $ python test.py
> > > UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1:
> > > invalid data
> > >
> > > but in the Python interactive interpreter, it works:
> > >
> > > >>> s = u'张三'
> > >
> > > Why?
> > >
> > >
> >
> > Just an idea: does your text editor really support UTF-8? In the mail
> > the string is only displayed as '??', which looks to me as if the mail
> > editor did not send the mail as UTF-8. Try attaching a correct text file.
>
> That must be your mail client, not his text editor or mail client. I
> do see two Chinese characters in the message.

Nonetheless, the email is not UTF-8 encoded (it's encoded in GB2312,
which is much more commonly used in China than UTF-8). It's likely
that the source file is saved as GB2312 as well, in which case its
bytes do not match the UTF-8 coding declaration and the decode fails.
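
Here is a rough sketch of why a GB2312-encoded file would fail under a
UTF-8 coding declaration. It assumes the file really was saved as GB2312
(I haven't seen the actual file), it is written to run on Python 3, and
the helper name decodes_as is made up for illustration:

```python
# -*- coding: utf-8 -*-
# Sketch only: simulates what the interpreter sees if test.py was saved
# as GB2312 but declares UTF-8. That the file is GB2312 is an assumption.

text = u'张三'

# The same two characters produce different bytes in each encoding.
gb_bytes = text.encode('gb2312')    # 2 bytes per character
utf8_bytes = text.encode('utf-8')   # 3 bytes per character


def decodes_as(data, encoding):
    """Return True if data is valid in the given encoding (hypothetical helper)."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# GB2312 bytes are not valid UTF-8, which would match the reported error.
print(decodes_as(gb_bytes, 'utf-8'))
print(decodes_as(gb_bytes, 'gb2312'))
print(decodes_as(utf8_bytes, 'utf-8'))
```

If that is what happened, either re-saving test.py as UTF-8 in the editor
or changing the declaration to "# -*- coding: gb2312 -*-" should make the
error go away.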


-- 
Evan Klitzke <evan at yelp.com>

