[Python-ideas] 'Injecting' objects as function-local constants

Fri Jun 17 09:02:52 CEST 2011

On Fri, Jun 17, 2011 at 3:37 PM, Steven D'Aprano <steve at pearwood.info> wrote:
> What benefit is there in making inject a keyword? Even super isn't a
> keyword.

Given the magic effects they have on the compiler, super and __class__
probably *should* be keywords. The only reason they aren't is that
their effect (automatically defining __class__ as a local when inside
a function in a class scope) is relatively harmless in the event that
super has actually been rebound to refer to something other than the
builtin:

class C:
  def f(self):
    print(locals())
  def g(self):
    __class__
    print(locals())
  def h(self):
    super
    print(locals())

>>> C().f()
{'self': <__main__.C object at 0xde60d0>}
>>> C().g()
{'self': <__main__.C object at 0xde60d0>, '__class__': <class '__main__.C'>}
>>> C().h()
{'self': <__main__.C object at 0xde60d0>, '__class__': <class '__main__.C'>}

> As far as I'm concerned, inject need only be a function in the functools
> module, not even a built-in, let alone a keyword.
>
> Here's a quick and dirty version that comes close to the spirit of inject,
> as I see it. Thanks to Alex Light's earlier version.
>
>
> # Credit to Alex Light.
> from contextlib import contextmanager
> from functools import wraps
>
> def inject(**localArgs):
>    def decorator(func):
>        glbs = func.__globals__
>        @wraps(func)
>        def inner(*args, **kwargs):
>            with _modifyGlobals(glbs, localArgs):
>                ret = func(*args, **kwargs)
>            return ret
>        return inner
>    return decorator

Sorry, I meant to point out why this was a bad idea when Alex first posted it.

The __globals__ reference on a function object refers to the globals
of the module where the function is defined. Modify the contents of
that dictionary and you modify the contents of that module. So this
"injection" approach not only affects the function being decorated,
but every other function in the module. Thread safety is completely
non-existent and cannot be handled locally within the decorated
function.

The reason something like @inject or @def is needed as a language
construct is because the object of the exercise is to define a new
kind of scope (call it "shared locals" for lack of a better name) and
we need the compiler's help to do it properly.

Currently, the closest equivalent to a shared locals scope is the
default argument namespace, which is why people use it that way: the
names are assigned values at function definition time, and they are
automatically copied into the frame locals whenever the function is
invoked. A shared namespace can also be created explicitly by using a
closure or a class, but both of those suffer from serious verbosity
(and hence readability) problems when the design intent you are aiming
to express is a single algorithm with some persistent state.

As noted in Jan's original message, using the default argument
namespace has its own flaws (rebinding of immutable targets not
working properly, cluttering the function signature on introspection,
risk of inadvertent replacement in the call), but if it didn't address
a genuine design need, it wouldn't be so popular. Hence the current
discussion, which reminds me a lot of the PEP 308 (ternary
expressions) discussion. Developers have proven they want this
functionality by coming up with a hack that does it, but the hack is
inherently flawed. Telling them "don't do that" is never going to
work, so the best way to eliminate usage of the hack is to provide a
way to do it *right*. (Relating back to PEP 308: how often do you see
the and/or hack in modern Python code written by anyone that learned
the language post Python 2.4?)

The runtime *semantics* of my implementation sketch (an additional set
of cells stored on the function object that are known to the compiler
and accessed via closure ) are almost certainly the right way to go:
it's a solution that cleanly handles rebinding of immutable targets
and avoids cluttering the externall visible function signature with
additional garbage. The only question is how to tell the compiler
about it, and there are three main options for that:

1. Embedded in the function header, modelled on the handling of
keyword-only arguments:

  def example(arg, **, cache=set(), invocations=0):
    """Record and return arguments seen and count the number of times
the function has been invoked"""
    invocations += 1
    cache.add(arg)
    return arg

  Pros: no bikeshedding about the keyword for the new syntax,
namespace for execution is clearly the same as that for default
arguments (i.e. the containing namespace)
  Cons: look like part of the argument namespace (when they really
aren't), no mnemonic to assist new users in remembering what they're
for, no open questions

2. Inside the function as a new statement type (bikeshed colour
options: @def, @shared, shared)

  def example(arg, **, cache=set(), invocations=0):
    """Record and return arguments seen and count the number of times
the function has been invoked"""
    @def cache=set(), invocations=0
    invocations += 1
    cache.add(arg)
    return arg

  Pros: implementation detail of shared state is hidden inside the
function where it belongs, keyword choice can provide a good mnemonic
for functionality
  Cons: needs new style rules on appropriate placements of @def/shared
statements (similar to nonlocal and global), use of containing
namespace for execution may be surprising
  Open Questions: whether to allow only one line with a tuple of
assignments or multiple lines, whether to allow simple assignments
only or any simple non-flow control statement

3. After the decorators and before the function definition (bikeshed
colour options: @def, @inject, @shared)

  @def cache=set(), invocations=0
  def example(arg)
    """Record and return arguments seen and count the number of times
the function has been invoked"""
    invocations += 1
    cache.add(arg)
    return arg

  Pros: keyword choice can provide a good mnemonic for functionality,
namespace for execution is clearly the same as that for decorator
expressions (i.e. the containing namespace)
  Cons: puts private implementation details ahead of the public
signature information, looks too much like an ordinary decorator
  Open Questions: whether to allow only one line with a tuple of
assignments or multiple lines

I already have too much on my to-do list to champion a PEP for this,
but I'd be happy to help someone else with the mechanics of writing
one and getting it published on python.org (hint, hint Jan!).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia