[Python-Dev] Unicode literals in Python 2.7

Wed Apr 29 10:35:27 CEST 2015

This situation is a bit different from coding cookies. Those are used when
we have bytes from a source file but don't know their encoding. During an
interactive session the tokenizer always knows the encoding of the bytes, so
I would think PyCF_SOURCE_IS_UTF8 should always be set there; then bytes
containing encoded non-ASCII characters would be interpreted correctly.

Why am I talking about PyCF_SOURCE_IS_UTF8? eval(u"u'\u03b1'") -> u'\u03b1',
but eval(u"u'\u03b1'".encode('utf-8')) -> u'\xce\xb1'. I understand that in
the second case eval has no idea how the given bytes are encoded. But the
first case is actually implemented by encoding to UTF-8 and setting
PyCF_SOURCE_IS_UTF8. That's why I'm talking about the flag.
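The two eval results above can be reproduced at the byte level. The sketch
below is written for Python 3 so that it runs on a current interpreter. It
assumes, as the observed u'\xce\xb1' suggests, that Python 2 without
PyCF_SOURCE_IS_UTF8 maps each byte inside a u'' literal to the code point
with the same value (effectively a Latin-1 decode). The helper
compile_source_bytes is hypothetical: it only illustrates decoding console
input with the encoding the console already knows before compiling, which is
roughly what passing a unicode string to eval/compile amounts to in Python 2.

```python
# Sketch (Python 3): why u"α" entered as UTF-8 bytes can become u'\xce\xb1'.
# Assumption: without PyCF_SOURCE_IS_UTF8, Python 2 treats each byte of a
# u'' literal as the code point of the same value (a Latin-1 style decode).

# The bytes the console receives for the character α: b'\xce\xb1'
literal_bytes = u"\u03b1".encode("utf-8")

# Flag set: the bytes are known to be UTF-8 and decode to one character.
with_flag = literal_bytes.decode("utf-8")

# Flag not set: each byte becomes its own code point (two characters).
without_flag = literal_bytes.decode("latin-1")

print(ascii(with_flag))     # '\u03b1'
print(ascii(without_flag))  # '\xce\xb1'


def compile_source_bytes(data, encoding, mode="eval"):
    """Hypothetical helper: decode raw console input with its known
    encoding, then compile the resulting text rather than the bytes."""
    text = data.decode(encoding)
    return compile(text, "<stdin>", mode)


code = compile_source_bytes(b"u'\xce\xb1'", "utf-8")
print(ascii(eval(code)))    # '\u03b1'
```

Decoding first keeps the compiler from having to guess: once the source is
text, the literal's contents are already the intended characters.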

Regards, Drekin

On Wed, Apr 29, 2015 at 9:25 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> On 29 April 2015 at 06:20, Adam Bartoš <drekin at gmail.com> wrote:
> > Hello,
> >
> > is it possible to somehow tell Python 2.7 to compile code entered in
> > the interactive session with the flag PyCF_SOURCE_IS_UTF8 set? I'm
> > considering adding support for Python 2 in my package
> > (https://github.com/Drekin/win-unicode-console) and I have run into
> > the fact that when u"α" is entered in the interactive session, it
> > results in u"\xce\xb1" rather than u"\u03b1". As this seems to be a
> > highly specialized question, I'm asking it here.
>
> As far as I am aware, we don't have the equivalent of a "coding
> cookie" for the interactive interpreter, so if anyone else knows how
> to do it, I'll be learning something too :)
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
>