[Python-ideas] Proposal to extend PEP 484 (gradual typing) to support Python 2.7

Thu Jan 21 13:14:18 EST 2016

On 2016/01/21 at 1:11, Guido van Rossum wrote:
> On Wed, Jan 20, 2016 at 9:42 AM, Andrew Barnert via Python-ideas
> <python-ideas at python.org> wrote:
>
>     On Jan 20, 2016, at 06:27, Agustín Herranz Cecilia
>     <agustin.herranz at gmail.com> wrote:
>     >
>     > - GVR's proposal includes some syntactic sugar for function type
>     comments (" # type: (t_arg1, t_arg2) -> t_ret "). I think it's good,
>     but it must be an alternative to the typing module syntax (PEP 484),
>     not the preferred way (so that people get used to type hints). Is
>     this syntactic sugar compatible with generators? Could the type
>     analyzers differentiate between a Callable and a Generator?
>
>     I'm pretty sure Generator is not the type of a generator function,
>     but of a generator object. So to type a generator function, you
>     just write `(int, int) -> Generator[int]`. Or, the long way,
>     `Function[[int, int], Generator[int]]`.
>
>
> There is no 'Function' -- it existed in mypy before PEP 484 but was 
> replaced by 'Callable'. And you don't annotate a function def with '-> 
> Callable' (unless it returns another function). The Callable type is 
> only needed in the signature of higher-order functions, i.e. functions 
> that take functions for arguments or return a function. For example, a 
> simple map function would be written like this:
>
> def map(f: Callable[[T], S], a: List[T]) -> List[S]:
>     ...
>
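For the map signature above to run, T and S have to be declared as type variables. In this sketch the function is renamed simple_map (a hypothetical name, chosen so it does not shadow the builtin map) and the body is an assumption:

```python
from typing import Callable, List, TypeVar

# T and S must be declared as type variables before they can be used.
T = TypeVar("T")
S = TypeVar("S")


def simple_map(f: Callable[[T], S], a: List[T]) -> List[S]:
    """Apply f to every element of a; f is the higher-order (Callable) argument."""
    return [f(x) for x in a]
```

For simple_map(str, [1, 2]) a checker binds T to int and S to str, so the result type is List[str].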
> As to generators, we just improved how mypy treats generators 
> (https://github.com/JukkaL/mypy/commit/d8f72279344f032e993a3518c667bba813ae041a). 
> The Generator type has *three* parameters: the "yield" type (what's 
> yielded), the "send" type (what you send() into the generator, and 
> what's returned by yield), and the "return" type (what a return 
> statement in the generator returns, i.e. the value for the 
> StopIteration exception). You can also use Iterator if your generator 
> doesn't expect its send() or throw() messages to be called and it 
> isn't returning a value for the benefit of `yield from'.
>
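To make the three Generator parameters concrete, here is a small sketch in which the yield, send and return types all differ. The function accumulate and its "send 0 to stop" protocol are hypothetical:

```python
from typing import Generator, List


def accumulate() -> Generator[int, int, str]:
    """Yield a running total of the ints sent in; sending 0 stops it.

    Generator[int, int, str]: yields int, receives int via send(),
    and returns str (delivered as StopIteration.value).
    """
    total = 0
    while True:
        received = yield total  # the value passed to send(): the "send" type
        if received == 0:
            return "total=%d" % total  # the "return" type
        total += received


def run_accumulate(values: List[int]) -> str:
    """Send each (non-zero) value, then 0, and return the generator's result."""
    gen = accumulate()
    next(gen)  # prime the generator: run it up to the first yield
    for v in values:
        gen.send(v)
    try:
        gen.send(0)
    except StopIteration as stop:
        return stop.value
    raise RuntimeError("generator did not stop")
```

A generator that never uses send() and never returns a value only needs Iterator[int], as described above.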
> For example, here's a simple generator that iterates over a list of 
> strings, skipping alternating values:
>
> def skipper(a: List[str]) -> Iterator[str]:
>     for i, s in enumerate(a):
>         if i%2 == 0:
>             yield s
>
> and here's a coroutine returning a string (I know, it's pathetic, but 
> it's an example :-):
>
> @asyncio.coroutine
> def readchar() -> Generator[Any, None, str]:
>     # Implementation not shown
>     ...
>
> @asyncio.coroutine
> def readline() -> Generator[Any, None, str]:
>     buf = ''
>     while True:
>         c = yield from readchar()
>         if not c: break
>         buf += c
>         if c == '\n': break
>     return buf
>
> Here, in Generator[Any, None, str], the first parameter ('Any') refers 
> to the type yielded -- it actually yields Futures, but we don't care 
> about that (it's an asyncio implementation detail). The second 
> parameter ('None') is the type returned by yield -- again, it's an 
> implementation detail and we might just as well say 'Any' here. The 
> third parameter (here 'str') is the type actually returned by the 
> 'return' statement.
>
> It's illustrative to observe that the signature of readchar() is 
> exactly the same (since it also returns a string). OTOH the return 
> type of e.g. asyncio.sleep() is Generator[Any, None, None], because it 
> doesn't return a value.
>
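@asyncio.coroutine has since been removed from asyncio (Python 3.11), so this sketch reproduces the shape of readchar()/readline() with plain generators and a tiny driver instead of an event loop. Having readchar read from a string iterator, and the run helper, are assumptions made only to keep the example self-contained. The point is that yield from hands back the sub-generator's return value, which is why both functions share the type Generator[Any, None, str]:

```python
from typing import Any, Generator, Iterator


def readchar(source: Iterator[str]) -> Generator[Any, None, str]:
    # Hypothetical stand-in for an async read: yield once (where asyncio
    # would yield a Future), then return the next character, or '' at EOF.
    yield None
    return next(source, "")


def readline(source: Iterator[str]) -> Generator[Any, None, str]:
    buf = ""
    while True:
        c = yield from readchar(source)  # c is readchar's *return* value
        if not c:
            break
        buf += c
        if c == "\n":
            break
    return buf


def run(gen: Generator[Any, None, str]) -> str:
    """Minimal driver: exhaust the generator and return its return value."""
    try:
        while True:
            next(gen)
    except StopIteration as stop:
        return stop.value
```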
> This business is clearly still suboptimal -- we would like to 
> introduce a new type, perhaps named Coroutine, so that you can write 
> Coroutine[T] instead of Generator[Any, None, T]. But that would just 
> be a shorthand. The actual type of a generator object is always some 
> parametrization of Generator.
>
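The shorthand can be approximated with a generic type alias. CoroGen below is a hypothetical name (not the Coroutine type mentioned above), and answer() and result_of() are toy functions:

```python
from typing import Any, Generator, TypeVar

T = TypeVar("T")

# Hypothetical generic alias: CoroGen[T] means Generator[Any, None, T].
CoroGen = Generator[Any, None, T]


def answer() -> CoroGen[int]:
    yield  # stands in for yielding to an event loop
    return 42


def result_of(gen: CoroGen[T]) -> T:
    """Exhaust a generator and return the value of its return statement."""
    try:
        while True:
            next(gen)
    except StopIteration as stop:
        return stop.value
```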
> In any case, whatever we write after the -> (i.e., the return type) is 
> still the type of the value you get when you call the function. If the 
> function is a generator function, the value you get is a generator 
> object, and that's what the return type designates.
>
>     (Of course you can use Callable instead of the more specific
>     Function, or Iterator (or even Iterable) instead of the more
>     specific Generator, if you want to be free to change the
>     implementation to use an iterator class or something later, but
>     normally you'd want the most specific type, I think.)
>
>
> I don't know where you read about Callable vs. Function.
>
> Regarding using Iterator[T] instead of Generator[..., ..., T], you are 
> correct.
>
> Note that you *cannot* define a generator function as returning a 
> *subclass* of Iterator/Generator; there is no way to have a generator 
> function instantiate some other class as its return value. Consider 
> (ignoring generic types):
>
> class MyIterator:
>     def __next__(self): ...
>     def __iter__(self): ...
>     def bar(self): ...
>
> def foo() -> MyIterator:
>     yield
>
> x = foo()
> x.bar() # Boom!
>
> The type checker would assume that x has a method bar() based on the 
> declared return type for foo(), but it doesn't. (There are a few other 
> special cases, in addition to Generator and Iterator; declaring the 
> return type to be Any or object is allowed.)
This was a mistake on my part; I got confused. The generator is just the
return type of the callable: the generator function is the callable, and
calling it returns a generator object.
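
A runtime check makes the point about MyIterator concrete: whatever the annotation says, calling a function that contains yield always produces a plain generator object. This sketch keeps the MyIterator class from the example above (with trivial bodies) and its misleading annotation, which a checker such as mypy is expected to reject; foo_ok is a hypothetical correctly annotated variant:

```python
import types
from typing import Iterator


class MyIterator:
    def __next__(self): ...
    def __iter__(self): ...
    def bar(self): ...


def foo() -> MyIterator:  # wrong: a generator function cannot return MyIterator
    yield


def foo_ok() -> Iterator[None]:  # fine: Iterator (or Generator) is allowed
    yield
```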

>     > - As this is intended for gradually typing Python 2 code in order
>     to port it to Python 3, I think it would be convenient to add some
>     sort of import that is used only for type checking and is imported
>     only by the type analyzer, not at runtime. This could be achieved by
>     prepending "#type: " to the normal import statement, something like:
>     >    # type: import module
>     >    # type: from package import module
>
>     That sounds like a bad idea. If the typing module shadows some
>     global, you won't get any errors, but your code will be misleading
>     to a reader (and even worse if you do from package.module import t).
>     If the cost of the import is too high for Python 2, surely it's
>     also too high for Python 3. And what other reason do you have for
>     skipping it?
>
>
> Exactly. Even though (when using Python 2) all type annotations are in 
> comments, you still must write real imports. (This causes minor 
> annoyances with linters that warn about unused imports, but there are 
> ways to teach them.)
These type-comment 'imports' are not intended to shadow the current
namespace; they are intended to tell the analyzer where to find types
that appear in type comments but are not imported into the current
namespace. This surely complicates the analyzer's task, but it helps
avoid namespace pollution and also saves memory at runtime.

The typical case I've found is when using a third-party library (one that
doesn't have type information) where you create objects with a factory.
The class of those objects isn't needed anywhere else, so it's not
imported into the current namespace; it's needed only for type analysis
and autocompletion.
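
For comparison, a mechanism with a similar effect was later added to the typing module: typing.TYPE_CHECKING is False at runtime but treated as True by type checkers, so imports guarded by it are seen only by the analyzer. This sketch uses fractions.Fraction as a stand-in for a third-party class that is only ever created through a factory; half and half_annotated are hypothetical functions:

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by the type checker only; never executed at runtime,
    # so Fraction is not added to this module's namespace.
    from fractions import Fraction


def half(x):
    # type: (Fraction) -> Fraction
    return x / 2


def half_annotated(x: "Fraction") -> "Fraction":
    # Python 3 form: string annotations are not evaluated at definition time.
    return x / 2
```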

>     > - It must also be addressed how this works in a Python 2 to
>     Python 3 environment, as there are types with the same name, str for
>     example, that work differently in each Python version. If the code
>     is for only one version, it uses the type names of that version.
>
>     That's the same problem that exists at runtime, and people (and
>     tools) already know how to deal with it: use bytes when you mean
>     bytes, unicode when you mean unicode, and str when you mean
>     whatever is "native" to the version you're running under and are
>     willing to deal with it. So now you just have to do the same thing
>     in type hints that you're already doing in constructors,
>     isinstance checks, etc.
>
>
> This is actually still a real problem. But it has no bearing on the 
> choice of syntax for annotations in Python 2 or straddling code.

Yes, this isn't directly related to the choice of syntax for annotations.
It's intended to help in the process of porting Python 2 code to
Python 3; it's outside the PEP's scope but related to the original
problem. What I have in mind is a set of type aliases so that you could
annotate a version-specific type, avoiding ambiguity in code that is used
on different versions. In the end, what I originally tried to say is that
it would be good to have a conventional way to name these type aliases.

These are intended to be used during the porting process, to help
automated tools during a transition period between versions. They are a
way to tell the analyzer that a type has a behavior that may differ from
that of the same-named type on the running Python version.

For example: you start with some working Python 2 code that you want to
keep working. A code analysis tool can infer the types and annotate the
code. It can also check which parts are Py2/Py3 compatible and which are
not, and mark those types with the aliases mentioned above. With this,
and with test suites, you could estimate how much code needs to be
ported. Then you refactor the code to adapt it to Python 3 while keeping
it running on Python 2 (the Python 2-only code could be marked for
automated deletion), and when that's done, drop all the Python 2 code.
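
A minimal sketch of the kind of version-specific aliases described above, assuming they are chosen with a sys.version_info check (which mypy understands). The names Py2Str, Py2Unicode and NativeStr, and the helper decode_legacy, are hypothetical and not part of typing or six:

```python
import sys

if sys.version_info[0] >= 3:
    Py2Str = bytes      # what a Python 2 `str` holds: raw bytes
    Py2Unicode = str    # what a Python 2 `unicode` holds: text
else:
    Py2Str = str
    Py2Unicode = unicode  # only exists on Python 2

NativeStr = str  # "whatever str means on the running version"


def decode_legacy(data):
    # type: (Py2Str) -> Py2Unicode
    """Hypothetical helper: decode bytes coming from Python 2-era code."""
    return data.decode("utf-8")
```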
>
>     Of course many people use libraries like six to help them deal
>     with this, which means that those libraries have to be type-hinted
>     appropriately for both versions (maybe using different stubs for
>     py2 and py3, with the right one selected at pip install time?),
>     but if that's taken care of, user code should just work.
>
>
> Yeah, we could use help. There are some very rudimentary stubs for a 
> few things defined by six 
> (https://github.com/python/typeshed/tree/master/third_party/3/six, 
> https://github.com/python/typeshed/tree/master/third_party/2.7/six) 
> but we need more. There's a PR but it's of bewildering size 
> (https://github.com/python/typeshed/pull/21).
>
I think the process of porting is different from the process of adapting
code to work on both Python 2 and 3. Code with bytes, unicode, and str
(meaning "don't care") is neither Python 2 code nor Python 3 code. Lots
of libraries that are 2/3 compatible are just Python 2 code minimally
adapted to run on Python 3 with six, and are still developed in a
Python 2 style. When the time comes to drop Python 2, the refactoring
needed will be huge. There is also a recent article, "Stop writing code
that break on Python 4", that shows code treating Python 3 as the special
case.

> PS. I have a hard time following the rest of Agustin's comments. The 
> comment-based syntax I proposed for Python 2.7 does support exactly 
> the same functionality as the official PEP 484 syntax; the only thing 
> it doesn't allow is selectively leaving out types for some arguments 
> -- you must use 'Any' to fill those positions. It's not a problem in 
> practice, and it doesn't reduce functionality (omitted argument types 
> are assumed to be Any in PEP 484 too). I should also remark that mypy 
> supports the comment-based syntax in Python 2 mode as well as in 
> Python 3 mode; but when writing Python 3 only code, the non-comment 
> version is strongly preferred. (We plan to eventually produce a tool 
> that converts the comments to standard PEP 484 syntax).
> -- 
> --Guido van Rossum (python.org/~guido)
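
To illustrate the point about omitted argument types: in the comment form every argument needs an entry, so an argument you don't want to type is written as Any, while in the annotation form it can simply be left bare and is treated as Any. Both functions below are hypothetical examples:

```python
from typing import Any


def repeat(text, times):
    # type: (str, Any) -> str
    # `times` is deliberately left untyped, so it is spelled Any.
    return text * times


def repeat_annotated(text: str, times) -> str:
    # Same meaning: an unannotated argument is assumed to be Any.
    return text * times
```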

My original point is that if comment-based function annotations are
going to be added, they should be added for Python 3 too, not only for
the special case of "Python 2.7 and straddling code", even though type
annotations are preferred in Python 3.

I think having the alternative of defining a function's types in a type
comment is a good thing, because annotations can become a mess,
especially with complex types and default parameters, and I don't feel
that the optional part of gradual typing should come at the cost of
readability. Some examples from my own code:

from typing import Any, Callable, Dict, Generator, Iterable, List

# bool_test and identity are helper functions defined elsewhere.

class Field:
    def __init__(self, name: str,
                 extract: Callable[[str], str],
                 validate: Callable[[str], bool]=bool_test,
                 transform: Callable[[str], Any]=identity) -> None:
        ...

class RepeatableField:
    def __init__(self,
                 extract: Callable[[str], str],
                 size: int,
                 fields: List[Field],
                 index_label: str,
                 index_transform: Callable[[int], str]=lambda x: str(x)) -> None:
        ...

def filter_by(field_gen: Iterable[Dict[str, Any]],
              **kwargs) -> Generator[Dict[str, Any], Any, Any]:
    ...

So, to define a comment-based function annotation, two kinds of syntax
should be accepted:
- an 'explicit' one, giving the type of the function in PEP 484
syntax:

     def embezzle(self, account, funds=1000000, *fake_receipts):
         # type: Callable[[str, int, *str], None]
         """Embezzle funds from account using fake receipts."""
         <code goes here>

   as if it were a normal type comment:

     embezzle = get_embezzle_function()  # type: Callable[[str, int, *str], None]

- and another one that 'implicitly' defines the type of the function as
a Callable:

     def embezzle(self, account, funds=1000000, *fake_receipts):
         # type: (str, int, *str) -> None
         """Embezzle funds from account using fake receipts."""
         <code goes here>

Both ways are easily translated back and forth into python3 annotations.
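
As a rough illustration of how mechanical the translation of the implicit form is, here is a sketch that turns a "# type: (...) -> ..." comment plus a list of parameters into a Python 3 annotated def line. It handles only the implicit form, assumes the parameters are already split, follows PEP 484's rule that self/cls are omitted from the comment, and is not how mypy or any other tool actually does it; all function names are hypothetical:

```python
import re
from typing import List, Tuple


def split_top_level(s: str) -> List[str]:
    """Split s on commas that are not nested inside [] or ()."""
    parts: List[str] = []
    current: List[str] = []
    depth = 0
    for ch in s:
        if ch in "[(":
            depth += 1
        elif ch in "])":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        parts.append(tail)
    return parts


def parse_type_comment(comment: str) -> Tuple[List[str], str]:
    """Return (argument types, return type) from '# type: (...) -> T'."""
    m = re.fullmatch(r"#\s*type:\s*\((.*)\)\s*->\s*(.+)", comment.strip())
    if m is None:
        raise ValueError("not a function type comment: %r" % comment)
    return split_top_level(m.group(1)), m.group(2).strip()


def to_annotated_signature(name: str, params: List[str], comment: str) -> str:
    """Build a Python 3 'def' line from parameter names and a type comment."""
    arg_types, return_type = parse_type_comment(comment)
    # PEP 484: the type comment leaves out the type of self/cls.
    skip = 1 if params and params[0] in ("self", "cls") else 0
    typed = params[skip:]
    if len(typed) != len(arg_types):
        raise ValueError("argument count does not match the type comment")
    out = params[:skip]
    for param, arg_type in zip(typed, arg_types):
        pname, has_default, default = param.partition("=")
        # '*str' / '**str' in the comment become plain 'str' on *args/**kwargs.
        annotated = "%s: %s" % (pname.strip(), arg_type.lstrip("*"))
        if has_default:
            annotated += " = " + default.strip()
        out.append(annotated)
    return "def %s(%s) -> %s:" % (name, ", ".join(out), return_type)
```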

Also, comment-based function annotations easily exceed the length of one
line, so it should be defined which syntax is used to break the line, as
discussed in https://github.com/JukkaL/mypy/issues/1102.
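
For long signatures, PEP 484 (as since updated) also allows putting each argument's type comment on its own line, followed by "# type: (...) -> ReturnType" after the signature. This sketch rewrites the Field example from above in that style; bool_test and identity are defined here as hypothetical stand-ins, and the constructor body is an assumption:

```python
from typing import Any, Callable


def bool_test(value):
    # type: (str) -> bool
    return bool(value)  # hypothetical default validator


def identity(value):
    # type: (str) -> Any
    return value  # hypothetical default transform


class Field:
    def __init__(self,
                 name,                  # type: str
                 extract,               # type: Callable[[str], str]
                 validate=bool_test,    # type: Callable[[str], bool]
                 transform=identity     # type: Callable[[str], Any]
                 ):
        # type: (...) -> None
        self.name = name
        self.extract = extract
        self.validate = validate
        self.transform = transform
```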

These things should be in a PEP as the standard way to implement this,
not only for mypy but also for other tools.
Accepting comment-based function annotations in Python 3 is good for
migrating Python 2/3 code, as it helps with refactoring and use (better
autocompletion); making it a Python 2-only feature and not a Python 3 one
would widen the gap between versions.

I hope I've expressed myself better this time; if not, sorry about that.

Agustín Herranz
