[Python-Dev] eval and triple quoted strings

Tue Jun 18 02:02:26 CEST 2013

2013/6/17 Guido van Rossum <guido at python.org>:
> On Mon, Jun 17, 2013 at 4:40 PM, Benjamin Peterson <benjamin at python.org> wrote:
>> 2013/6/17 Greg Ewing <greg.ewing at canterbury.ac.nz>:
>>> Guido van Rossum wrote:
>>>>
>>>> No. Executing a file containing those exact characters produces a
>>>> string containing only '\n' and exec/eval is meant to behave the same
>>>> way. The string may not have originated from a file, so the universal
>>>> newlines behavior of the io module is irrelevant here -- the parser
>>>> must implement its own equivalent processing, and it does.
>>>
>>>
>>> I'm still not convinced that this is necessary or desirable
>>> behaviour. I can understand the parser doing this as a
>>> workaround before we had universal newlines, but now that
>>> we do, I'd expect any Python string to already have newlines
>>> converted to their canonical representation, and that any CRs
>>> it contains are meant to be there. The parser shouldn't need
>>> to do newline translation a second time.
>>
>> It used to be that way until 2.7. People like to do things like
>>
>>     with open("myfile.py", "rb") as fp:
>>         exec fp.read() in ns
>>
>> which used to fail with CRLF newlines because binary mode doesn't
>> translate them. I think this is actually the correct way to execute Python
>> sources because the parser then handles the somewhat complicated
>> process of decoding Python source for you.
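For reference, here is a minimal Python 3 sketch of the behavior discussed above. It assumes Python 3, where `exec` is a function and `compile()` accepts bytes. The `SOURCE` bytes are hypothetical and stand in for `fp.read()`. Binary mode keeps the CRLF pairs. Text mode translates them only once you already know the encoding. `compile()` reads the coding cookie and normalizes newlines itself. The final `eval()` line shows the same translation applied to a `str`, which is the exec/eval behavior Guido describes.

```python
import io

# Hypothetical source file contents: a latin-1 coding cookie, Windows (CRLF)
# line endings, and a triple-quoted string that spans a line break.
SOURCE = (
    b"# -*- coding: latin-1 -*-\r\n"
    b"name = '\xe9t\xe9'\r\n"
    b"text = '''a\r\nb'''\r\n"
)

# Reading in binary mode (like open(..., "rb")) leaves the CR bytes alone.
raw = io.BytesIO(SOURCE).read()
assert b"\r\n" in raw

# Text mode translates CRLF to "\n" (universal newlines), but only after
# the caller has already picked the right encoding.
text = io.TextIOWrapper(io.BytesIO(SOURCE), encoding="latin-1").read()
assert "\r" not in text

# Passing the bytes to compile() lets the parser read the coding cookie and
# normalize newlines itself: a Python 3 analogue of `exec fp.read() in ns`.
ns = {}
exec(compile(raw, "myfile.py", "exec"), ns)
assert ns["name"] == "\u00e9t\u00e9"  # decoded as latin-1 via the cookie
assert ns["text"] == "a\nb"           # CRLF inside the literal became "\n"

# eval() on a str applies the same newline processing to string literals.
assert eval('"""a\r\nb"""') == "a\nb"
```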
>
> What exactly does the parser handle better than the io module? Is it
> just the coding cookies? I suppose that works as long as the file is
> encoded using an ASCII superset like the Latin-N variants or UTF-8. It
> would fail pretty badly if it was UTF-16 (and yes, that's an
> abominable encoding for other reasons :-).

The coding cookie is the main one. In fact, if you can't parse that,
you don't really know what encoding to open the file with at all.
There are also small things like BOM handling (you have to use the
utf-8-sig encoding with TextIO to get it removed) and defaulting to
UTF-8 (which the io module doesn't do), which are better left to the
parser.
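Here is a rough sketch of those rules using the Python 3 standard library. `tokenize.detect_encoding()` applies the same cookie, BOM and UTF-8-default logic in pure Python. The C parser has its own implementation, so treat this only as an approximation of it. `source_encoding` and `decode_source` are hypothetical helper names used for illustration.

```python
import io
import tokenize


def source_encoding(source: bytes) -> str:
    """Return the encoding Python would use for `source` (hypothetical helper).

    tokenize.detect_encoding() checks for a UTF-8 BOM and a PEP 263 coding
    cookie in the first two lines, and falls back to UTF-8 otherwise.
    """
    encoding, _consumed_lines = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


def decode_source(source: bytes) -> str:
    """Decode `source` the way its own cookie/BOM asks (hypothetical helper)."""
    # When a BOM is present the detected name is "utf-8-sig", which strips it;
    # plain "utf-8" would leave a leading "\ufeff" in the text.
    return source.decode(source_encoding(source))


print(source_encoding(b"x = 1\n"))                              # utf-8
print(source_encoding(b"\xef\xbb\xbfx = 1\n"))                  # utf-8-sig
print(source_encoding(b"# -*- coding: latin-1 -*-\nx = 1\n"))   # iso-8859-1
```

For files on disk, `tokenize.open()` wraps the same detection and returns a text-mode file object.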

--
Regards,
Benjamin