[Python-ideas] Move optional data out of pyc files

INADA Naoki songofacandy at gmail.com
Thu Apr 12 06:48:07 EDT 2018


> Finally, loading docstrings and other optional components can be made lazy.
> This was not in my original idea, and this will significantly complicate the
> implementation, but in principle it is possible. This will require larger
> changes in the marshal format and bytecode.

I'm +1 on this idea.

* The new pyc format has a code section (same as the current format) and a
text section. The text section stores UTF-8 strings and is not loaded at
import time.
* Function annotations (only when PEP 563 is used) and docstrings are
stored as integers that point to offsets in the text section.
* When type.__doc__, PyFunction.__doc__, or PyFunction.__annotations__ is an
integer, the text is loaded lazily from the text section.
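
To make the layout above concrete, here is a minimal sketch of a text section addressed by integer offsets. The length-prefixed encoding and the names build_text_section and load_text are assumptions for illustration only; nothing here is an existing or proposed marshal format.

```python
import struct

def build_text_section(strings):
    """Pack strings as length-prefixed UTF-8 (hypothetical layout).

    Returns the section bytes and the offset of each string.
    """
    section = bytearray()
    offsets = []
    for s in strings:
        data = s.encode("utf-8")
        offsets.append(len(section))
        # 4-byte little-endian length, then the UTF-8 payload.
        section += struct.pack("<I", len(data)) + data
    return bytes(section), offsets

def load_text(section, offset):
    """Decode the single string stored at the given offset."""
    (length,) = struct.unpack_from("<I", section, offset)
    start = offset + 4
    return section[start:start + length].decode("utf-8")

section, offsets = build_text_section(["Add two numbers.", "héllo"])
doc = load_text(section, offsets[0])
```

Only the integer offsets would need to live in the code section; the text section itself could stay on disk until some string is actually requested.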

PEP 563 will reduce startup time somewhat, but __annotations__ is still a
dict, so the memory overhead is not negligible.

In [1]: def foo(a: int, b: int) -> int:
   ...:     return a + b
   ...:
   ...:

In [2]: import sys
In [3]: sys.getsizeof(foo)
Out[3]: 136

In [4]: sys.getsizeof(foo.__annotations__)
Out[4]: 240

When PEP 563 is used, there are no side effects while building the
annotations. So the annotations can be serialized as text, like
{"a":"int","b":"int","return":"int"}.
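
As a rough illustration of that serialization, the sketch below round-trips string annotations through JSON. Using json is my assumption here, not part of the proposal, and the annotations are written as explicit strings to mimic what PEP 563 produces.

```python
import json

# Explicit string annotations stand in for what PEP 563
# (from __future__ import annotations) would store.
def foo(a: "int", b: "int") -> "int":
    return a + b

# Compact separators reproduce the text form shown above.
text = json.dumps(foo.__annotations__, separators=(",", ":"))

# Rebuilding the dict needs no evaluation of the annotation
# expressions, only string parsing.
restored = json.loads(text)
```

Because every value is already a plain string, the dict can be rebuilt on first access instead of being created at import time.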

This change will require a new pyc format, and descriptors for
PyFunction.__doc__, PyFunction.__annotations__, and type.__doc__.
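
A pure-Python sketch of how such a descriptor might behave follows. The real change would be in C on the function and type objects; LazyText, FakeFunction, and the TEXT table are hypothetical stand-ins, with the dict playing the role of the text section.

```python
class LazyText:
    """Descriptor that swaps a stored integer offset for its text on first read."""

    def __init__(self, name, loader):
        self.slot = "_" + name      # instance attribute holding offset or text
        self.loader = loader        # callable mapping an offset to a string

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        value = getattr(obj, self.slot, None)
        if isinstance(value, int):
            # Still an offset: load the text once and cache it.
            value = self.loader(value)
            setattr(obj, self.slot, value)
        return value

TEXT = {0: "Add two numbers."}      # stand-in for the pyc text section
loads = []                          # records each load, for demonstration

def load(offset):
    loads.append(offset)
    return TEXT[offset]

class FakeFunction:
    doc = LazyText("doc", load)

    def __init__(self, doc_offset):
        self._doc = doc_offset

f = FakeFunction(0)
first = f.doc
second = f.doc
```

The second read returns the cached string, so the text section is consulted at most once per object.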

Regards,

-- 
INADA Naoki  <songofacandy at gmail.com>

