can compile function have a bug?

John Machin sjmachin at lexicon.net
Mon Oct 9 05:54:14 EDT 2006


Peter Otten wrote:
> ygao wrote:
>
> >>>> compile('U"中"','c:/test','single')
> > <code object ? at 00F06B60, file "c:/test", line 1>
> >>>> d=compile('U"中"','c:/test','single')
> >>>> d
> > <code object ? at 00F06BA0, file "c:/test", line 1>
> >>>> exec(d)
> > u'\xd6\xd0'
> >>>> U"中"
> > u'\u4e2d'
> >>>>
> >
> > why is the result different?
> > a bug or another reason?
>
> How that particular output came to be I don't know, but you should be able
> to avoid the confusion by either passing a unicode string to compile() or
> specifying the encoding:
>
> >>> exec compile(u'u"中"','c:/test','single')
> u'\u4e2d'
> >>> exec compile('# -*- coding: utf8 -*-\nu"中"','c:/test','single')
> u'\u4e2d'
>
> Peter
>
> PS: In an all-UTF-8 environment I would have /expected/ to see
>
> >>> your_encoding = "utf8"
> >>> identity = "latin1"
> >>> u'\u4e2d'.encode(your_encoding).decode(identity)
> u'\xe4\xb8\xad'
>
> and that's indeed what I get over here:
>
> >>> exec compile('u"中"','c:/test','single')
> u'\xe4\xb8\xad'

But it's not an all-UTF-8 environment; his_encoding = 'gb2312' or one
of its heirs/successors :-)
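
A minimal sketch of that round trip, for anyone who wants to check the
bytes themselves. It is written in Python 3 syntax (the session quoted above
is Python 2), and the helper name mojibake is made up for illustration. It
assumes, as Peter's PS does, that the byte string is decoded as latin1:

```python
# Python 3 sketch; the quoted session above is Python 2.
# "mojibake" is a hypothetical helper name, not part of any library.

char = "\u4e2d"  # the character from the original post


def mojibake(text, source_encoding, assumed_encoding="latin-1"):
    """Encode text as the terminal would, then decode the bytes as latin1."""
    return text.encode(source_encoding).decode(assumed_encoding)


gb = mojibake(char, "gb2312")   # what a gb2312 console would produce
utf = mojibake(char, "utf-8")   # what a UTF-8 console would produce

print(ascii(gb))    # '\xd6\xd0'      (ygao's output)
print(ascii(utf))   # '\xe4\xb8\xad'  (Peter's output)
```

Same character, different console encoding, hence the two different results.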

Cheers,
John



