[Python-Dev] Reading Python source file

Tue Nov 17 11:06:17 EST 2015

On 17.11.15 05:05, MRAB wrote:
> As I understand it, *nix expects the shebang to be b'#!', which means that
> the first line should be ASCII-compatible (it's possible that the UTF-8 BOM
> might be present). This kind of suggests that encodings like UTF-16 would
> cause a problem on such systems.
>
> The encoding line also needs to be ASCII-compatible.
>
> I believe that the recent thread "Support of UTF-16 and UTF-32 source
> encodings" also concluded that UTF-16 and UTF-32 shouldn't be supported.
>
> This means that you could treat the first 2 lines as though they were some
> kind of extended ASCII (Latin-1?), the line ending being '\n' or '\r' or
> '\r\n'.
>
> Once you'd identify the encoding, you could decode everything (including
> the shebang line) using that encoding.

Yes, that is what I was going to implement (and I am already halfway there).
My question is whether it is worth complicating the code further to
preserve line-by-line reading. In any case, after reading a first line
that contains neither a coding cookie nor any non-comment tokens, we
need to wait for the second line.
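
A minimal sketch of the approach discussed above, for illustration only. It
assumes the whole file is already in memory as bytes, so it does not preserve
line-by-line reading. The names sniff_encoding and decode_source are
hypothetical helpers, the cookie pattern is the one given in PEP 263, and the
default of UTF-8 is the Python 3 default. A BOM that conflicts with the cookie
and an unknown encoding name are not handled here.

```python
import codecs
import re

# Cookie pattern as given in PEP 263.
COOKIE_RE = re.compile(r'^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)')
# A line that is empty or holds only a comment, i.e. no tokens.
BLANK_RE = re.compile(r'^[ \t\f]*(?:[#\r\n]|$)')


def sniff_encoding(data, default='utf-8'):
    """Return the encoding declared in the first two lines of data (bytes)."""
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    # bytes.splitlines() splits only on '\n', '\r' and '\r\n'.
    for raw in data.splitlines()[:2]:
        # Latin-1 maps every byte to one code point, so this cannot fail.
        line = raw.decode('latin-1')
        match = COOKIE_RE.match(line)
        if match:
            return match.group(1)
        if not BLANK_RE.match(line):
            # The first line has real tokens: the second line is not a cookie.
            break
    return default


def decode_source(data):
    """Decode the whole source, shebang line included, with the found encoding."""
    encoding = sniff_encoding(data)
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data.decode(encoding)
```

For comparison, the standard library's tokenize.detect_encoding() performs a
similar check, but takes a readline callable instead of a bytes object.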

> (What should happen if the encoding line then decoded differently, i.e.
> encoding_line.decode(encoding) != encoding_line.decode('latin-1')?)

The parser should get the line decoded with the specified encoding.
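
To illustrate the case in question, here is a small sketch with a made-up
cookie line that carries non-ASCII text in its comment. The line content and
the choice of cp1251 are assumptions for the example only. The ASCII part,
including the cookie, decodes identically either way, while the rest of the
comment differs, so the parser should be given the second result.

```python
# A hypothetical encoding line with a non-ASCII comment after the cookie.
raw = '# coding: cp1251  привет\n'.encode('cp1251')

as_latin1 = raw.decode('latin-1')    # what the cookie search looked at
as_declared = raw.decode('cp1251')   # what the parser should receive

# The cookie itself is pure ASCII, so it survives both decodings unchanged.
cookie = '# coding: cp1251'
same_cookie = as_latin1.startswith(cookie) and as_declared.startswith(cookie)

# The non-ASCII tail is where the two decodings disagree.
differs = as_latin1 != as_declared
```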