[Python-ideas] Type hints for text/binary data in Python 2+3 code

Jukka Lehtosalo jlehtosalo at gmail.com
Sat Mar 26 08:55:11 EDT 2016


On Fri, Mar 25, 2016 at 12:00 AM, Andrey Vlasovskikh <
andrey.vlasovskikh at gmail.com> wrote:

> Upon further investigation of the problem I've come up with an alternative
> idea that looks simpler and yet still capable of finding most text/binary
> conversion errors.
>
...

> ## TL;DR
>
> * Introduce `typing.Text` for text data in Python 2+3
> * `bytes`, `str`, `unicode`, `typing.Text` in type hints mean whatever they
>   mean at runtime for Python 2 or 3
> * Allow `str -> unicode` and `unicode -> str` promotions for Python 2
>

I'm against this, as it would seem to make str and unicode pretty much the
same type in Python 2, and thus Python 2 mode seems much weaker than
necessary. I wrote a more detailed reply in the mypy issue tracker (
https://github.com/python/mypy/issues/1141#issuecomment-201799761). I'm not
copying it all here since much of that is somewhat mypy-specific and
related to the rest of the discussion on that issue, but I'll summarize my
main points here.

I prefer the idea of doing better type checking in Python 2 mode for str
and unicode, though I suspect we need to implement a prototype to decide
whether it will be practical.
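
A minimal sketch of the concern, assuming a hypothetical helper `encode_name` (it is not from the proposal or from mypy). It uses `typing.Text` with a Python 2 compatible type comment. With both promotions allowed in Python 2 mode, a checker would presumably also accept a native `str` argument such as `"caf\xc3\xa9"`, even though that call raises `UnicodeDecodeError` under Python 2.

```python
from typing import Text


def encode_name(name):
    # type: (Text) -> bytes
    """Encode a text value as UTF-8 (hypothetical helper for illustration)."""
    return name.encode("utf-8")


# Passing real text works the same way in Python 2 and Python 3.
encoded = encode_name(u"caf\u00e9")

# Under the proposed str -> unicode promotion, a Python 2 checker would also
# accept encode_name("caf\xc3\xa9"), where the argument is a Python 2 byte
# string. At runtime Python 2 first decodes it implicitly as ASCII, which fails.
```

Stricter checking in Python 2 mode would flag the second call instead of relying on a separate Python 3 run to catch it.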

> * Type checking for Python 2 *and* Python 3 actually finds most text/binary
>   errors
>

This may be true, but I'm worried about usability for Python 2 code bases.
Also, the effort needed to pass type checking in both modes (which is
likely pretty close to the effort of a full Python 3 migration, if the
entire code base is annotated) might be impractical for a large Python 2
code base.

> ## Summary for authors of type checkers
>
> The semantics of types `bytes`, `str`, `unicode`, `typing.Text` and the type
> checking rules for them should match the *runtime behavior* of these types in
> Python 2 and Python 3 depending on Python 2 or 3 modes. Using the runtime
> semantics for the types is easy to understand while it still allows to catch
> most errors. The Python 2+3 compatibility mode is just a sum of Python 2 and
> Python 3 warnings.
>

At least for mypy, the Python 2+3 compatibility mode would likely take
twice as much CPU to run, which is a pretty high cost as type checking
speed is one of the biggest open issues we have right now.

> ## Runtime type compatibility
>
...

> Each cell contains two characters: the result in Python 2 and in Python 3
> respectively. Abbreviations:
>
...

> * `*` — types are compatible, ignoring implicit ASCII conversions
>

Am I reading this right if I understand this as "considered valid during
type checking but may fail at runtime"?
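
If that reading is right, the failure mode can be sketched as follows. Python 3 has no implicit conversion, so this only simulates what Python 2 does for `u"..." + "..."` by decoding the byte string as ASCII explicitly. The function name `py2_style_concat` is made up for the illustration.

```python
from typing import Text


def py2_style_concat(text, data):
    # type: (Text, bytes) -> Text
    """Mimic Python 2's implicit ASCII decoding when mixing unicode and str."""
    return text + data.decode("ascii")


# ASCII-only bytes: the implicit conversion succeeds.
ok = py2_style_concat(u"abc", b"def")

# Non-ASCII bytes: the same expression type checks but fails at runtime.
try:
    py2_style_concat(u"abc", b"\xc3\xa9")
    failed = False
except UnicodeDecodeError:
    failed = True
```

Both calls have the same static types, so a checker that treats the types as compatible cannot tell them apart.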

> For non-ASCII text literals passed to functions that expect `Text` or `str` in
> Python 2 a type checker can analyze the contents of the literal and show
> additional warnings based on this information. For non-ASCII data coming from
> sources other than literals this check would be more complicated.
>

I wonder what the check would look like in the latter case? I can't imagine
how this would work for non-literals.
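
The literal case, by contrast, seems easy to sketch. The following is only an assumption about what such a check might look like, written against Python 3.8+'s `ast` module rather than any real checker's internals; `non_ascii_literals` is a hypothetical name.

```python
import ast


def non_ascii_literals(source):
    """Return (line, value) pairs for string literals with non-ASCII characters."""
    found = []
    for node in ast.walk(ast.parse(source)):
        # ast.Constant covers string literals in Python 3.8 and later.
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            if any(ord(ch) > 127 for ch in node.value):
                found.append((node.lineno, node.value))
    return found


sample = u'f("caf\u00e9")\ng("abc")\n'
hits = non_ascii_literals(sample)
```

Nothing comparable is available for values read from files, sockets or user input, since their contents are unknown until runtime.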

Jukka