[Python-ideas] Optimizing global names via dict references

Franklin? Lee leewangzhong+python at gmail.com
Sat Dec 19 09:10:10 EST 2015


(Previous threads from python-dev:
"Re: [Python-Dev] Third milestone of FAT Python"
    https://mail.python.org/pipermail/python-dev/2015-December/142437.html
"[Python-Dev] Idea: Dictionary references"
    (Forked from above thread.)
    https://mail.python.org/pipermail/python-dev/2015-December/142490.html
)

Dictionary References: An Idea in Three Emails

Act 1: Stuff


Table of Contents:

    Act 1: Stuff
    https://mail.python.org/pipermail/python-ideas/2015-December/037511.html
    - Definitions
    - Problem statement
    - The idea
    - The benefits
    - The costs
    - What will hold a RefCell?
    - Add `getref` to `dict`
    - Implementation issues
    - Interpreters other than CPython

    Act 2: More stuff
    https://mail.python.org/pipermail/python-ideas/2015-December/037512.html
    - Extension: Scope ancestry
    - Aside: Classes and scopes do not have the same rules
    - Extension: Read/write refs
    - Extension: Read-only refs
    - Option: Alternative RefCell structure
    - Option: Replace the methods when detaching a RefCell
    - Extension: Dynamic scoping
    - Extension: Thread safety

    Act 3: Sample pseudocode
    https://mail.python.org/pipermail/python-ideas/2015-December/037513.html
    - ScopeDict
    - Scope ancestry
    - Alternative RefCell structure
    - Thread safety


== Definitions ==

"lookup"
    A dictionary lookup.

"exposed RefCell"
    A RefCell with a PyRef other than by its owner.
    A RefCell with a refcount > 1.

"pyreference"
    A CPython ref-counted reference.


== Problem statement ==

I propose a CPython interpreter optimization for global name lookups.

When a function uses a global name, it requires a lookup during execution.

    def maxlen(lists):
        return sum(len(lst)   #globals['len']
            for lst in lists)


This is necessary because:
1. The name might not exist yet.
2. The value might change.

Repeated dict lookups are more expensive than avoiding them. Local
variables are accessed by array index, rather than by dict lookup. (See
http://stackoverflow.com/questions/12590058/python-performance-with-global-variables-vs-local)

Pythonistas often get around this lookup by saving the name in a local variable.

    def maxlen(lists):
        len_ = len             #globals['len'], looked up once
        return sum(len_(lst)   #local variable, no dict lookup
            for lst in lists)


They also use a default argument for that variable, which prevents the
lookup during each call (the name is looked up once, when the function
is defined).

    def maxlen(lists, len=len):
        return sum(len(lst)   #local parameter, no dict lookup
            for lst in lists)
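The speed difference these tricks buy is easy to measure with `timeit`
(absolute numbers vary by machine and Python version; none are claimed
here):

```python
import timeit

def global_len(lists):
    return sum(len(lst) for lst in lists)   # 'len' looked up each iteration

def local_len(lists):
    len_ = len                              # one lookup, then fast access
    return sum(len_(lst) for lst in lists)

data = [[0] * 10 for _ in range(100)]
t_global = timeit.timeit(lambda: global_len(data), number=5000)
t_local = timeit.timeit(lambda: local_len(data), number=5000)
print(f"global: {t_global:.3f}s  local: {t_local:.3f}s")
```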


I think these tricks are theoretically unnecessary. We only need a
single lookup, at compile-time.

There should be no change to Python's semantics. In fact, this should
reinforce the semantics, because the above tricks don't allow `len` to
change; if this idea is implemented, the tricks would only be used to
explicitly disallow changes.
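A small demonstration of that semantic difference (rebinding a builtin
name here is for illustration only):

```python
def pinned(x, len=len):     # default-argument trick: binds len at definition
    return len(x)

def normal(x):
    return len(x)           # the name is looked up on every call

_real_len = len
len = lambda x: -1          # rebind the global name (demo only!)
saw_new, saw_old = normal("abc"), pinned("abc")
len = _real_len             # restore

print(saw_new, saw_old)     # -1 3: only the plain lookup sees the rebinding
```

Under this proposal, `normal` would see the rebinding just as it does
today, only without paying for a dict lookup per call.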


== The idea ==

We use a level of indirection, and allow empty values in the internal
dict (scope.__inner__[key] = NULL).

Let ScopeDict be a subclass of dict. (It doesn't have to be a
subclass. See "ScopeDict should replace dict".)

ScopeDict owns a second dict of RefCells. A normal dict owns
pyreference to its values. A RefCell will have a pointer to that
reference. This means that a RefCell does not need to be notified when
a value is set, since it knows where the dict stores its pointer to
the value. Instead, when the dict resizes, the dict has to update all
of its RefCells.

Upon deletion of the dict, ScopeDict will pass ownership of the value
pyreferences to the RefCells.

That was a simplified view. We can make these optimizations:

1. The RefCells will be held in an array, not a dict. This array will
be synced with the dict's internal hash table, and resizes with it.
This allows a single lookup for both the value and its RefCell (or
lack thereof), because they will be at the same index in different
arrays.

2. RefCells won't be created unless they're requested. Obviously.
2a. The array doesn't need to be created unless a RefCell is created.

3. Any time a ScopeDict would touch a RefCell, it would first check
that the cell is not exposed (that is, that it has no outside
references). If it isn't exposed, the RefCell can be safely deleted.
3a. The refs array can also be deleted at this time if it doesn't hold anything.

4. If a key has no corresponding RefCells, it doesn't need to be
saved. This can be checked on resize and ScopeDict.__del__, and also
can be checked on __delitem__.

[See "Pseudocode: ScopeDict"]


== The benefits ==

Every global lookup is done at compile-time. No need to save a local
copy of a global variable.

Guaranteed to have the most updated value. Even if it didn't exist at
compile-time. In particular, it works even for recursive functions and
wrapped functions, such as

    from functools import lru_cache

    @lru_cache()
    def fib(n):
        if n < 2:
            return n
        return fib(n-1) + fib(n-2)


Access to a global variable during a call is a few C dereferences,
plus a few checks and C function calls.

No change to the semantics of the language.

Possible change in the semantics of CPython: There might be cases
where keys live longer than they used to, given the following:

    1. `del scope[key]` was called.

    2. There was an exposed RefCell for that key.

    3. Later, the RefCell stopped being exposed

    4. `scope` has not resized since then (and thus didn't clean up
its dead cells yet).

The above is already possible in a garbage-collected interpreter. We
could make ScopeDict hold weakrefs to its RefCells, and have a dying
RefCell remove its key if the key has no value, but that would add the
overhead of wrapping each RefCell in a weakref. Is that worth it?
(`getref` would still return a strong reference.)


== The costs ==

Memory:

- An extra pointer to the refs table in each ScopeDict, and an extra
function pointer for the new function.

- An extra array for each ScopeDict, 1/3 to 1/4 the size of the
internal dict table (depends on how big pointers are on the platform).
    Note: Since the refs table is the same size as the internal hash
table, Raymond Hettinger's compact dict idea will also decrease the
size of the refs table. (compact dict idea:
https://mail.python.org/pipermail/python-dev/2012-December/123028.html)

- Extra memory for each RefCell (a Python object). (There will be at
most one RefCell per dict per key.)

- The dict will be bigger because it holds empty entries.


Execution:

- Extra compile time. Functions will do lookups when they're compiled.
But we're saving lookups during execution, so it's only extra if the
function never gets called or never uses the RefCell.

- `getitem`: Must check if the value is NULL. Currently, it only checks the key.

- `delitem`: Instead of removing the key, just set the value to NULL.
In fact, this is probably cheaper.

- Resizing the dict: It will have to copy over all the RefCells (and
delete the unused ones) to a new array, too, and update them.

- Deleting the dict: The dict will have to DecRef its RefCells, and
pass on its value pyreferences to the ones that survive.


Python Refs:

- We're potentially holding keys even when values have been deleted.
This is okay, because most pyreferences to a RefCell replace a
pyreference to a key.


== What will hold a RefCell? ==

Functions will request RefCells for each global variable they use.
They will hold them until they die.
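CPython's compiler already records, per code object, which names a
function loads as globals, so the set of RefCells a function should
request is known as soon as it is compiled:

```python
def use_globals(lst):
    return len(lst) + max(lst)   # two global-name loads

# co_names holds the non-local names a code object uses (it also covers
# attribute names in general; here it's just the two globals). Under
# this proposal, the function would request one RefCell per such name.
print(use_globals.__code__.co_names)   # ('len', 'max')
```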

Code not in functions (e.g. in module scope), or using eval/exec, will
request RefCells during compilation, and they will be used during
execution. They will discard the RefCells when they are done
executing.


== Add `getref` to `dict` ==

I propose that all dicts should have the ability to return RefCells.

As Victor Stinner pointed out in his FAT Python message, using a
subclass for scopes would disallow regular dicts being used as scopes
by eval and exec.
(See end of https://mail.python.org/pipermail/python-dev/2015-December/142397.html)

But I think refs can be useful enough that it could be added to the
dict interface. Meaning `getref` (via another name... `item`?) can be
a new dict method. Upon first use, the dict will generate an empty
refs table, and replace its methods with ones that know how to deal
with refs. This prevents O(n) memory overhead for dicts that don't
need it.
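The lazy upgrade could look roughly like this in pure Python (the
method swap would happen at the C level; `getref` is the name proposed
above, and the indirection itself is only simulated here):

```python
class LazyRefDict(dict):
    """Sketch: the refs table is only allocated on the first getref()
    call, so dicts that never hand out refs pay nothing extra."""
    _refs = None

    def getref(self, key):
        if self._refs is None:
            self._refs = {}     # first use: create the refs table
            # (the real version would also swap in ref-aware C methods)
        if key not in self._refs:
            self._refs[key] = _Ref(self, key)
        return self._refs[key]

class _Ref:
    # Simulated ref: re-does the lookup on each get(); the real RefCell
    # would point directly at the value slot instead.
    def __init__(self, d, key):
        self._d, self._key = d, key
    def get(self):
        return self._d[self._key]
```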


== Implementation issues ==

There are performance _concerns_. I think it can be done with only a
few performance _issues_. From my experience arguing out the idea, the
overhead for both memory and execution should be lower than what it
replaces.[*] So while I'm not promising a free lunch, I believe we can
get close to one.

There are potential issues with dict subclasses that also replace the C
function pointers, since I'm proposing that we replace those functions
dynamically. I think these issues are solvable, but it would require
understanding what subclasses are allowed to do, so that the
replacement functions can properly wrap the originals.


[*] (Except that there's a tricky case in deep multiple inheritance of
scope, which I'd have to figure out. But I don't think it's possible
to do that except in classes, and I don't wanna touch class MRO until
I understand how the resolution works in C.)
[*] (And you can pay a lot of lookups if you keep creating functions
that won't get executed, but it's impossible to do such a thing in a
loop since the lookups will be done once by the parent. Unless you
loop an eval/exec to keep making functions that will never be called,
in which case you're a bad person.)


== Interpreters other than CPython ==

It should be possible to use the same idea in IronPython, Jython, and
PyPy. I think it's a simple enough idea that I am surprised that it's
not already there. In fact, I would not be surprised if PyPy already
uses it. In fact, I would not be (very) surprised if I had gotten the
idea from PyPy.

