[Python-ideas] Proposal: Use mypy syntax for function annotations

Jukka Lehtosalo jlehtosalo at gmail.com
Thu Aug 14 06:06:41 CEST 2014


On Wed, Aug 13, 2014 at 6:39 PM, Andrew Barnert <abarnert at yahoo.com> wrote:

> On Wednesday, August 13, 2014 12:45 PM, Guido van Rossum <guido at python.org>
> wrote:
>
>
> >  def word_count(input: List[str]) -> Dict[str, int]:
> >      result = {}  #type: Dict[str, int]
> >      for line in input:
> >          for word in line.split():
> >              result[word] = result.get(word, 0) + 1
> >      return result
>
>
> I just realized why this bothers me.
>
> This function really, really ought to be taking an Iterable[String]
> (except that we don't have a String ABC). If you hadn't statically typed
> it, it would work just fine with, say, a text file—or, for that matter, a
> binary file. By restricting it to List[str], you've made it a lot less
> usable, for no visible benefit.
>
> And, while this is less serious, I don't think it should be guaranteeing
> that the result is a Dict rather than just some kind of Mapping. If you
> want to change the implementation tomorrow to return some kind of proxy or
> a tree-based sorted mapping, you can't do so without breaking all the code
> that uses your function.
>

I see this as a matter of programming style. In a library module, I'd
usually use types that are about as general as feasible (without making
them overly complex). However, if we have just a simple utility function
that's only used within a single program, declaring everything using
abstract types buys you little, IMHO, but may make things much more
complicated. You can always refactor the code to use more general types if
the need arises. Using simple, concrete types seems to decrease the
cognitive load, but that's just my experience.
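
For comparison, here is a rough sketch of what the more general signature
could look like. The name word_count_general is made up for this
illustration, and it assumes Iterable and Mapping are available from the
typing module:

```python
import io
from typing import Dict, Iterable, Mapping


def word_count_general(lines: Iterable[str]) -> Mapping[str, int]:
    # Hypothetical variant of word_count with more general annotations.
    result = {}  # type: Dict[str, int]
    for line in lines:
        for word in line.split():
            result[word] = result.get(word, 0) + 1
    return result


# Any iterable of str is accepted, for example a list or a text stream.
counts_from_list = word_count_general(['a b', 'b c'])
counts_from_file = word_count_general(io.StringIO('a b\nb c\n'))
```

The body is unchanged; only the annotations differ, which is why
refactoring toward more general types later is usually cheap.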

Also, programmers don't always read documentation/annotations and can abuse
the knowledge of the concrete return type of any function (they can figure
this out easily by using repr()/type()). In general, as long as dynamically
typed programs may call your function, changing the concrete return type of
a library function risks breaking code that makes too many assumptions.
Thus I'd rather use concrete types for function return types -- but of
course everybody is free to not follow this convention.
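
A minimal sketch of how that breakage can happen. The functions
word_count_v1, word_count_v2 and careless_caller are hypothetical, and
types.MappingProxyType merely stands in for "some kind of proxy":

```python
import types
from typing import Dict, List, Mapping


def word_count_v1(lines: List[str]) -> Mapping[str, int]:
    # Declared as Mapping, but the concrete type is visibly a dict.
    result = {}  # type: Dict[str, int]
    for line in lines:
        for word in line.split():
            result[word] = result.get(word, 0) + 1
    return result


def word_count_v2(lines: List[str]) -> Mapping[str, int]:
    # Same declared type, but now a read-only proxy is returned.
    return types.MappingProxyType(word_count_v1(lines))


def careless_caller(counts):
    # Unannotated caller that relies on dict.pop(), which Mapping
    # does not promise.
    return counts.pop('a', 0)


popped = careless_caller(word_count_v1(['a a']))

try:
    careless_caller(word_count_v2(['a a']))
    broke = False
except AttributeError:
    # The proxy has no pop() method, so the caller fails.
    broke = True
```

The declared return type never changed between the two versions, yet the
dynamically typed caller still broke.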


> And if even Guido, in the motivating example for this feature, is
> needlessly restricting the usability and future flexibility of a function,
> I suspect it may be a much bigger problem in practice.
>
>
> This example also shows exactly what's wrong with simple generics: if this
> function takes an Iterable[String], it doesn't just return a
> Mapping[String, int], it returns a Mapping of _the same String type_. If
> your annotations can't express that, any value that passes through this
> function loses type information.
>

If I define a subclass X of str, split() still returns a List[str] rather
than List[X], unless I override something, so a "same String type"
signature wouldn't hold for the above example:

>>> class X(str): pass
...
>>> type(X('x y').split()[0])
<class 'str'>


> And not being able to tell whether the keys in word_count(f) are str or
> bytes *even if you know that f was a text file* seems like a pretty major
> loss.
>

Mypy considers bytes incompatible with str, and vice versa. The annotation
Iterable[str] says that Iterable[bytes] (such as a binary file) would not
be a valid argument. Text files and binary files have different types,
though the return type of open(...) is not inferred correctly right now. It
would be easy to fix this for the most common cases, though.
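
To make the str/bytes distinction concrete, here is a small sketch. The
helper first_words is hypothetical, and io.StringIO and io.BytesIO stand
in for a text file and a binary file:

```python
import io
from typing import Iterable, List


def first_words(lines: Iterable[str]) -> List[str]:
    # Hypothetical helper, annotated to accept text lines only.
    return [line.split()[0] for line in lines if line.strip()]


text_file = io.StringIO('spam eggs\n')     # iterating yields str
binary_file = io.BytesIO(b'spam eggs\n')   # iterating yields bytes

text_words = first_words(text_file)

# A checker that treats bytes as incompatible with str should reject this
# call. Nothing stops it at runtime, and bytes quietly come back instead.
binary_words = first_words(binary_file)
```

The second call is exactly the kind of mistake the Iterable[str]
annotation is meant to catch before the program runs.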

You could use AnyStr to make the example work with bytes as well:

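  from typing import AnyStr, Dict, Iterable

  def word_count(input: Iterable[AnyStr]) -> Dict[AnyStr, int]:
      # AnyStr is a type variable restricted to str and bytes, so the
      # keys have the same string type as the input lines.
      result = {}  # type: Dict[AnyStr, int]
      for line in input:
          for word in line.split():
              result[word] = result.get(word, 0) + 1
      return result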
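
A quick runtime check of the AnyStr version, repeated here so the snippet
is self-contained. The sample inputs are made up; the annotations have no
effect at runtime, so this only shows that the same body handles both
string types:

```python
from typing import AnyStr, Dict, Iterable


def word_count(input: Iterable[AnyStr]) -> Dict[AnyStr, int]:
    result = {}  # type: Dict[AnyStr, int]
    for line in input:
        for word in line.split():
            result[word] = result.get(word, 0) + 1
    return result


# str lines in, str keys out.
str_counts = word_count(['a b', 'b'])

# bytes lines in, bytes keys out.
bytes_counts = word_count([b'a b', b'b'])
```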

Again, if this is just a simple utility function that you use once or
twice, I see no reason to spend a lot of effort in coming up with the most
general signature. Types are an abstraction and they can't express
everything precisely -- there will always be a lot of cases where you can't
express the most general type. However, I think that relatively simple
types work well enough most of the time, and give the most bang for the
buck.

Jukka