[Python-ideas] Proposal to extend PEP 484 (gradual typing) to support Python 2.7

Fri Jan 22 16:40:21 EST 2016

On Jan 22, 2016, at 13:11, Guido van Rossum <guido at python.org> wrote:
> 
> Interesting. PEP 484 defines an IO generic class, so you can write IO[str] or IO[bytes]. Maybe introducing separate helper functions that open files in text or binary mode can complement this to get a solution?

The runtime types are a little weird here as well.

In 3.x, open returns different types depending on the value, rather than the type, of its inputs. Also, TextIOBase is a subclass of IOBase, even though it isn't a subtype in the LSP sense, so you have to test isinstance(IOBase) and not isinstance(TextIOBase) to know that read() is going to return bytes. That's all a little wonky, but not impossible to deal with.
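A minimal Python 3 sketch of the runtime check described above. The helper name `reads_bytes` is hypothetical; it relies only on the standard `io` ABCs. It also shows that `open()` returns a different concrete class depending on the mode string.

```python
import io
import os
import tempfile


def reads_bytes(stream: object) -> bool:
    """Hypothetical helper: True if stream.read() should return bytes.

    io.TextIOBase is itself a subclass of io.IOBase, so checking IOBase
    alone is not enough; text streams must be excluded explicitly.
    """
    return isinstance(stream, io.IOBase) and not isinstance(stream, io.TextIOBase)


# In-memory streams show the ABC relationships directly.
print(isinstance(io.StringIO(), io.IOBase))  # True, even though read() returns str
print(reads_bytes(io.BytesIO(b"x")))         # True
print(reads_bytes(io.StringIO("x")))         # False

# open() chooses its return class from the *value* of mode.
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, "rb") as fb, open(path, "r") as ft:
        print(type(fb).__name__, reads_bytes(fb))  # BufferedReader True
        print(type(ft).__name__, reads_bytes(ft))  # TextIOWrapper False
finally:
    os.remove(path)
```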

In 2.x, most file-like objects--including file itself, which open returns--don't satisfy either ABC, and most of them can return either type from read.

Having a different function for open-binary instead of a mode flag would solve this, but it seems a little late to be adding that now. You'd have to go through all your 2.x code and change every open to one of the two new functions just to statically type your code, and then change it again for 3.x. Plus, you'd need to do the same thing not just for the builtin open, but for every library that provides an open-like method.
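For comparison, here is a sketch of the separate-helper approach suggested in the quoted message, written with Python 3 annotations. `open_text` and `open_binary` are hypothetical names, not standard-library functions. The point is only that each helper has one fixed return type (`IO[str]` or `IO[bytes]`), which a checker can see without inspecting the mode string.

```python
import os
import tempfile
from typing import IO


def open_text(path: str, mode: str = "r", encoding: str = "utf-8") -> IO[str]:
    """Hypothetical helper: always opens in text mode."""
    if "b" in mode:
        raise ValueError("open_text() does not accept binary modes")
    return open(path, mode, encoding=encoding)


def open_binary(path: str, mode: str = "rb") -> IO[bytes]:
    """Hypothetical helper: always opens in binary mode."""
    if "b" not in mode:
        mode += "b"  # e.g. "w" -> "wb", "r+" -> "r+b"
    return open(path, mode)


fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open_binary(path, "w") as f:
        f.write("caf\u00e9".encode("utf-8"))
    with open_text(path) as f:
        text = f.read()   # checker sees str
    with open_binary(path) as f:
        data = f.read()   # checker sees bytes
finally:
    os.remove(path)

print(text == "caf\u00e9", data)  # True b'caf\xc3\xa9'
```

The cost is the one described above: every existing `open()` call site, and every library's open-like method, would have to be rewritten to use one of the two helpers.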

Maybe this special case is special enough that static type checkers just have to deal with it specially? When the mode flag is a literal, process it; when it's forwarded from another function, it may be possible to get the type from there; otherwise, everything is just unicode|bytes and the type checker can't know any more unless you explicitly tell it (by annotating the variable the result of open is stored in).
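One way to express the literal-mode special case is `typing.overload` combined with `typing.Literal`. `Literal` was only added in Python 3.8 (PEP 586), well after this thread, so treat this as a later-Python sketch of the idea, not something a 2016 checker could use. `open_checked` is a hypothetical wrapper, and its mode lists are deliberately incomplete. When the mode is a literal, the checker picks `IO[str]` or `IO[bytes]`. Otherwise it falls back to `IO[Any]`, and the caller has to annotate the variable to say more.

```python
import os
import tempfile
from typing import IO, Any, Literal, overload

# Deliberately partial lists of mode strings, for illustration only.
TextMode = Literal["r", "w", "a", "r+", "w+", "a+"]
BinaryMode = Literal["rb", "wb", "ab", "r+b", "w+b", "a+b"]


@overload
def open_checked(path: str, mode: TextMode = ...) -> IO[str]: ...
@overload
def open_checked(path: str, mode: BinaryMode) -> IO[bytes]: ...
@overload
def open_checked(path: str, mode: str) -> IO[Any]: ...
def open_checked(path: str, mode: str = "r") -> IO[Any]:
    # At runtime this is just open(); the overloads only inform the checker.
    return open(path, mode)


def read_header(path: str, mode: str) -> bytes:
    # mode is forwarded, not a literal, so the checker only sees IO[Any];
    # annotating the variable is how the caller states what it expects.
    f: IO[bytes] = open_checked(path, mode)
    with f:
        return f.read(4)


fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open_checked(path, "wb") as out:  # literal mode: checker infers IO[bytes]
        out.write(b"\x89PNG rest")
    header = read_header(path, "rb")
finally:
    os.remove(path)

print(header)  # b'\x89PNG'
```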

> 
>> On Fri, Jan 22, 2016 at 12:58 PM, Paul Moore <p.f.moore at gmail.com> wrote:
>> On 22 January 2016 at 19:08, Guido van Rossum <guido at python.org> wrote:
>> > On Fri, Jan 22, 2016 at 10:37 AM, Brett Cannon <brett at python.org> wrote:
>> >>
>> >>
>> >>
>> >> On Thu, 21 Jan 2016 at 10:45 Guido van Rossum <guido at python.org> wrote:
>> >>>
>> >>> On Thu, Jan 21, 2016 at 10:14 AM, Agustín Herranz Cecilia
>> >>> <agustin.herranz at gmail.com> wrote:
>> >>> [...]
>> >>> Yes, this is not directly related to the choice of syntax for
>> >>> annotations. This is intended to help in the process of porting Python 2
>> >>> code to Python 3; it's outside the PEP's scope but related to the original
>> >>> problem. What I have in mind is some type aliases so you could annotate a
>> >>> version-specific type to avoid ambiguity in code that is used on
>> >>> different versions. In the end, what I originally tried to say is that
>> >>> it's good to have a conventional way to name these type aliases.
>> >>>
>> >>> Yes, this is a useful thing to discuss.
>> >>>
>> >>> Maybe we can standardize on the types defined by the 'six' package, which
>> >>> is commonly used for 2-3 straddling code:
>> >>>
>> >>> six.text_type (unicode in PY2, str in PY3)
>> >>> six.binary_type (str in PY2, bytes in PY3)
>> >>>
>> >>> Actually for the latter we might as well use bytes.
>> >>
>> >>
>> >> I agree that `bytes` should cover str/bytes in Python 2 and `bytes` in
>> >> Python 3.
>> >
>> >
>> > OK, that's settled.
>> >
>> >>
>> >> As for the textual type, I say either `text` or `unicode` since they are
>> >> both unambiguous between Python 2 and 3 and get the point across.
>> >
>> >
>> > Then let's call it unicode. I suppose we can add this to typing.py. In PY2,
>> > typing.unicode is just the built-in unicode. In PY3, it's the built-in str.
>> 
>> This thread came to my attention just as I'd been thinking about a
>> related point.
>> 
>> For me, by far the worst Unicode-related porting issue I see is people
>> with a confused view of what type of data reading a file will give.
>> This is because open() returns a different type (byte stream or
>> character stream) depending on its arguments (specifically 'b' in the
>> mode) and it's frustratingly difficult to track this type across
>> function calls - especially in code originally written in a Python 2
>> environment where people *expect* to confuse bytes and strings in this
>> context. So, for example, I see a function read_one_byte which does
>> f.read(1), and works fine in real use when a data file (opened with
>> 'b') is processed, but fails when sys.stdin is used (on Python 3, once
>> someone types a Unicode character).
>> 
>> As far as I know, there's no way for type annotations to capture this
>> distinction, either as they are at present in Python 3 or as being
>> discussed here. But what I'm not sure of is whether it's something
>> that *could* be tracked by a type checker. Of course I'm also not sure
>> I'm right when I say you can't do it right now :-)
>> 
>> Is this something worth including in the discussion, or is it a
>> completely separate topic?
>> Paul
> 
> 
> 
> -- 
> --Guido van Rossum (python.org/~guido)
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/