Encoding of Python 2 string literals

Thu Jul 23 02:13:02 EDT 2015

On Thu, Jul 23, 2015 at 3:58 PM, dieter <dieter at handshake.de> wrote:
> Steven D'Aprano <steve at pearwood.info> writes:
>> On Wed, 22 Jul 2015 08:17 pm, anatoly techtonik wrote:
>>> Is there a way to know encoding of string (bytes) literal
>>> defined in source file? For example, given that source:
>>>
>>>     # -*- coding: utf-8 -*-
>>>     from library import Entry
>>>     Entry("текст")
>>>
>>> Is there any way for Entry() constructor to know that
>>> string "текст" passed into it is the utf-8 string?
>> ...
>> The right way to deal with this is to use an actual Unicode string:
>>
>> Entry(u"текст")
>>
>> and make sure that the file is saved using UTF-8, as the encoding cookie
>> says.
>
> In order to follow this recommendation, is there an easy way to
> learn the value of the "encoding cookie", rather than parsing the
> first two lines of the source file (which may not always be available)?

No; you don't need to. If you use a Unicode string literal (as marked
by the u"..." notation), the Python compiler will decode it for you,
using the encoding declared in the cookie. The string that's passed to
Entry() will simply be a string of Unicode code points, with no
encoding attached. If you then want that in UTF-8, you can encode it
explicitly.
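
As a rough sketch of what that looks like in practice: the Entry class
below is a hypothetical stand-in (the real library.Entry isn't shown in
this thread), and the as_utf8() method is a name made up for
illustration. It assumes the file really is saved as UTF-8, as the
cookie says.

```python
# -*- coding: utf-8 -*-

# Hypothetical stand-in for library.Entry; the real class is not shown
# in the thread, so this only illustrates the idea.
class Entry(object):
    def __init__(self, text):
        # text is a Unicode string: the compiler has already decoded the
        # literal, so no source-encoding information is needed here.
        self.text = text

    def as_utf8(self):
        # Encode explicitly, only at the point where bytes are wanted.
        return self.text.encode("utf-8")


entry = Entry(u"текст")
utf8_bytes = entry.as_utf8()

print(len(entry.text))   # 5 code points
print(len(utf8_bytes))   # 10 bytes; each of these letters takes two in UTF-8
```

The constructor never has to ask what encoding the source file used;
the only place an encoding is named is the explicit encode() call.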

ChrisA