[Python-Dev] eval and triple quoted strings

Tue Jun 18 02:22:18 CEST 2013

It may be possible to implement parsing the codec cookie as a Python codec :-)

Victor
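
[Editor's sketch, not part of the original message.] For readers unfamiliar with the coding cookie and BOM handling discussed in the quoted thread below, here is a minimal illustration using the standard library's tokenize.detect_encoding(). The helper names sniff_encoding and decode_source are hypothetical and exist only for this example. This is not the codec-based approach suggested above, just a way to see what the detection step returns.

```python
import io
import tokenize


def sniff_encoding(source_bytes):
    """Return the encoding name detected for raw Python source bytes.

    tokenize.detect_encoding() looks for a UTF-8 BOM and a PEP 263
    coding cookie in the first two lines, and falls back to UTF-8.
    """
    readline = io.BytesIO(source_bytes).readline
    encoding, _first_lines = tokenize.detect_encoding(readline)
    return encoding


def decode_source(source_bytes):
    """Decode source bytes to str using the detected encoding.

    Hypothetical helper: decoding with 'utf-8-sig' also drops the BOM.
    Newlines are left untouched here.
    """
    return source_bytes.decode(sniff_encoding(source_bytes))


plain = b"x = 1\n"
with_bom = b"\xef\xbb\xbfx = 1\n"
with_cookie = b"# -*- coding: latin-1 -*-\nname = 'caf\xe9'\n"

print(sniff_encoding(plain))
print(sniff_encoding(with_bom))
print(sniff_encoding(with_cookie))
```

tokenize.open() wraps the same detection and returns a text-mode file object, which may be more convenient when the source really is a file on disk.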

2013/6/18 Guido van Rossum <guido at python.org>:
> On Mon, Jun 17, 2013 at 5:02 PM, Benjamin Peterson <benjamin at python.org> wrote:
>> 2013/6/17 Guido van Rossum <guido at python.org>:
>>> On Mon, Jun 17, 2013 at 4:40 PM, Benjamin Peterson <benjamin at python.org> wrote:
>>>> 2013/6/17 Greg Ewing <greg.ewing at canterbury.ac.nz>:
>>>>> Guido van Rossum wrote:
>>>>>>
>>>>>> No. Executing a file containing those exact characters produces a
>>>>>> string containing only '\n' and exec/eval is meant to behave the same
>>>>>> way. The string may not have originated from a file, so the universal
>>>>>> newlines behavior of the io module is irrelevant here -- the parser
>>>>>> must implement its own equivalent processing, and it does.
>>>>>
>>>>>
>>>>> I'm still not convinced that this is necessary or desirable
>>>>> behaviour. I can understand the parser doing this as a
>>>>> workaround before we had universal newlines, but now that
>>>>> we do, I'd expect any Python string to already have newlines
>>>>> converted to their canonical representation, and that any CRs
>>>>> it contains are meant to be there. The parser shouldn't need
>>>>> to do newline translation a second time.
>>>>
>>>> It used to be that way until 2.7. People like to do things like
>>>>
>>>>     with open("myfile.py", "rb") as fp:
>>>>         exec fp.read() in ns
>>>>
>>>> which used to fail with CRLF newlines because binary mode doesn't
>>>> translate them. I think this is actually the correct way to execute Python
>>>> sources because the parser then handles the somewhat complicated
>>>> process of decoding Python source for you.
>>>
>>> What exactly does the parser handle better than the io module? Is it
>>> just the coding cookies? I suppose that works as long as the file is
>>> encoded using an ASCII superset like the Latin-N variants or UTF-8. It
>>> would fail pretty badly if it was UTF-16 (and yes, that's an
>>> abominable encoding for other reasons :-).
>>
>> The coding cookie is the main one. In fact, if you can't parse that,
>> you don't really know what encoding to open the file with at all.
>> There are also small things like BOM handling (you have to use the
>> utf-8-sig encoding with TextIO to get it removed) and defaulting to
>> UTF-8 (which the io module doesn't do), which are better left to the
>> parser.
>
> Maybe there are some lessons here that the TextIO module could learn?
>
> --
> --Guido van Rossum (python.org/~guido)
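
[Editor's sketch, not part of the original thread.] The behaviour debated above can be checked directly in Python 3, assuming a current CPython: the parser normalizes CRLF to '\n' even inside a triple-quoted literal, and exec() accepts raw bytes and honours the coding cookie. The helper name run_source is hypothetical, and the exec call uses the Python 3 function form rather than the Python 2 statement shown in the thread.

```python
def run_source(source_bytes):
    """Execute raw source bytes in a fresh namespace and return it.

    Passing bytes lets the parser do the decoding (coding cookie, BOM)
    and the newline translation itself.
    """
    ns = {}
    exec(source_bytes, ns)
    return ns


# A str that never came from a file: the literal contains CR LF,
# and the parser is expected to hand back a string with only LF.
value = eval("'''a\r\nb'''")

# Raw bytes as they would come from open(..., "rb").read() on a
# CRLF file with a Latin-1 coding cookie.
crlf_source = (
    b"# -*- coding: latin-1 -*-\r\n"
    b"text = '''caf\xe9\r\n"
    b"bar'''\r\n"
)
ns = run_source(crlf_source)

print(repr(value))
print(repr(ns["text"]))
```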