[Python-Dev] Accessing globals without dict lookup

Guido van Rossum guido@python.org
Sun, 10 Feb 2002 11:20:30 -0500


> I persist in my delusion.  Original text:
> 
>     When you use its getitem method, the PyObject * in the cell is
>     dereferenced, and if a NULL is found, getitem raises KeyError
>     even if the cell exists.
> 
> Since we're doing something with "the PyObject* in the cell", surely "the
> cell" *must* exist.  So what is the "even if the cell exists" trying to say?

It is trying to say "despite the cell's existence".  See the sample code.

> I believe it means to say
> 
>     even if the cell's cellptr is not NULL
> 
> and "the cell's cellptr is not NULL" is quite different from "the cell
> exists".

No, it doesn't try to say that.  But you're right that it's useful to
add that the cell's cellptr is irrelevant to getitem.
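
A minimal sketch of that getitem behavior, in modern Python rather than C. The names Cell, CellDict, objptr and NULL are hypothetical stand-ins for the structures discussed in this thread, not an actual implementation:

```python
NULL = object()  # sentinel playing the role of a C NULL pointer


class Cell:
    """Hypothetical stand-in for the C cell struct {objptr, cellptr}."""

    def __init__(self, objptr=NULL, cellptr=None):
        self.objptr = objptr    # the value, or NULL if the name is unbound
        self.cellptr = cellptr  # link to a builtin's cell; unused by getitem


class CellDict:
    """Sketch of a celldict: maps names to cells rather than to values."""

    def __init__(self):
        self.cells = {}

    def __getitem__(self, key):
        cell = self.cells[key]      # KeyError if there is no cell at all
        if cell.objptr is NULL:     # the cell exists but holds NULL
            raise KeyError(key)     # cellptr is deliberately not consulted
        return cell.objptr

    def __setitem__(self, key, value):
        self.cells.setdefault(key, Cell()).objptr = value

    def __delitem__(self, key):
        cell = self.cells[key]
        if cell.objptr is NULL:
            raise KeyError(key)
        cell.objptr = NULL          # the cell itself stays in place
```

With this sketch, a module cell for "len" that only links to the builtin's cell still raises KeyError on getitem, despite the cell's existence.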

> Another idea: a celldict could contain a "real dict" pointer,
> normally NULL, and pointing to a plain dict when a real dict is
> given.  The celldict constructor would populate the cells from the
> realdict's contents when not NULL.  Then getitem wouldn't have to do
> anything special (realdict==NULL and realdict!=NULL would be the
> same to it).  setitem and delitem would propagate mutations
> immediately into the realdict too when non-NULL.  Since mutations
> are almost certainly much rarer than accesses, this makes the rarer
> operations pay.  The eval loop would always see a celldict.

This works for propagating changes from the celldict to the real dict,
but not the other way around: code that mutates the real dict directly
bypasses the celldict, so the cells go stale.  Example:

  d = {'x': 10}
  def set_x(x):
      d['x'] = x
  exec "...some code that calls set_x()..." in d
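
The same problem can be shown in a runnable form. This is only a sketch in modern Python: MirroredCellDict is a hypothetical model of the proposed celldict-with-realdict, and one-element lists stand in for cells:

```python
class MirroredCellDict:
    """Hypothetical celldict that keeps an optional 'real dict' in sync."""

    def __init__(self, realdict=None):
        self.realdict = realdict
        self.cells = {}
        if realdict is not None:
            # Populate the cells once, from the real dict's contents.
            for key, value in realdict.items():
                self.cells[key] = [value]

    def __getitem__(self, key):
        return self.cells[key][0]

    def __setitem__(self, key, value):
        self.cells.setdefault(key, [None])[0] = value
        if self.realdict is not None:
            self.realdict[key] = value   # propagate celldict -> real dict


d = {'x': 10}

def set_x(x):
    d['x'] = x           # mutates the real dict directly

cd = MirroredCellDict(d)
cd['y'] = 1              # celldict -> real dict: d now has 'y'
set_x(42)                # real dict -> celldict: nothing notifies cd
stale = cd['x']          # still 10, although d['x'] is 42
```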

> > (Another hack probably not worth it right now is to make the module's
> > cell.cellptr point to itself if it's not shadowing a builtin cell --
> > then the first NULL check for cell.cellptr can be avoided in the case
> > of successfully finding a builtin name.)
> 
> I don't think I followed this.  If, e.g., a module's "len" cell is normally
> 
>     {NULL, pointer to __builtin__'s "len" cell}
> 
> under the original scheme, how would that change?
> 
>     {NULL, pointer to this very cell}
> 
> wouldn't make sense.
> 
>     {builtin len, pointer to this very cell}
> 
> would make sense, but then the pointer to self is useless -- except as a
> hint that we copied the value up from the builtins?  But then a change to
> __builtin__.len wouldn't be visible to the module.

I meant that for "len" it would not change, i.e. it would be

    {NULL, pointer to __builtin__'s "len" cell}

but for a global "foo" it would change to

    {value of foo or NULL if foo is undefined, pointer to this very cell}

Then if foo is defined, the code would find the value of foo in the
first cell it tries, and if foo is undefined, it would find a NULL in
the cell and again in the cell it points to (which is the same cell).
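
A sketch of the lookup under this self-linking variant, in modern Python. Cell, NULL and load_global are hypothetical names; the point is only that cellptr can be followed without a NULL check:

```python
NULL = object()  # sentinel playing the role of a C NULL pointer


class Cell:
    """Hypothetical cell whose cellptr defaults to the cell itself."""

    def __init__(self, objptr=NULL, cellptr=None):
        self.objptr = objptr
        # A cell that shadows no builtin links to itself, never to NULL.
        self.cellptr = cellptr if cellptr is not None else self


def load_global(cell, name):
    value = cell.objptr
    if value is NULL:
        value = cell.cellptr.objptr   # cellptr is always a valid cell
        if value is NULL:
            raise NameError(name)
    return value


builtin_len = Cell(objptr=len)
module_len = Cell(cellptr=builtin_len)   # {NULL, pointer to builtin's cell}
module_foo = Cell(objptr=3)              # {value of foo, pointer to itself}
module_bar = Cell()                      # {NULL, pointer to itself}
```

Here load_global(module_len, "len") goes through the builtin's cell, load_global(module_foo, "foo") succeeds on the first cell, and load_global(module_bar, "bar") finds NULL twice in the same cell before raising NameError.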

> > I do.  The C code in compiler.c is already at a level of
> > complexity that nobody understands it in its entirety!  (I don't
> > understand what Jeremy added, and Jeremy has to ask me about the
> > original code. :-( )
> 
> I don't care because I care about something else <wink>: it would
> add to the pressure to refactor this code mercilessly, and that
> would be a Good Thing over the long term.  The current complexity
> isn't inherent, it's an artifact of outgrowing the original
> concrete-syntax-tree direct-to-bytecode one-pass design.  Now we've
> got multiple passes crawling over a now-inappropriate program
> representation, glued together more by "reliable accidents" <wink>
> than sensible design.  That's all curable, and the pressures *to*
> cure it will continue to multiply over time (e.g., it would take a
> certain insanity to even think about folding pychecker-like checks
> into the current architecture).

Actually, the concrete syntax tree was never a very good
representation; it was convenient for the parser to generate, and it
was "okay" (or "good enough") as a basis for code generation and for
everything else.

I agree that it's a good idea to start thinking about changing the
parse tree representation to a proper abstract syntax tree.  Maybe the
normalization that the compiler.py package uses would be a good start?
Except that I've never quite grasped the visitor architecture there. :-(

> I agree it needs more detail, but at the start I'm more interested
> in the normal cases.  I'll reattach my no-holds-barred description
> of resolving normal-case "len" in this scheme.  Perhaps Jeremy could
> do the same for his.  Jeremy is also aiming at speeding things like
> math.pi (global.attribute) as a whole (not just speeding the "math"
> part of it).

One problem with that is that it's hard to know when <global> in
<global>.<attribute> is a module, and when it's something else.  I
guess global analysis could help -- if it's imported ("import math")
it's likely a module, if it's assigned from an expression ("L = []")
or a locally defined function or class, it's likely not a module.  But
"from X import Y" creates a mystery -- X could be a package containing
a module Y, or it could be a module containing a function or class Y.
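
A rough sketch of that kind of global analysis, using the ast module from the modern standard library (an assumption of this illustration; it is not the tooling discussed in this thread). The function name classify_globals and the three labels are hypothetical:

```python
import ast


def classify_globals(source):
    """Guess, per module-level name, whether it is likely bound to a module."""
    guesses = {}
    for node in ast.parse(source).body:      # top-level statements only
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds "a"; "import a.b as c" binds "c".
                name = alias.asname or alias.name.split('.')[0]
                guesses[name] = 'likely module'
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name != '*':
                    # Could be a submodule, a function, or a class.
                    guesses[alias.asname or alias.name] = 'unknown'
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            guesses[node.name] = 'likely not a module'
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    guesses[target.id] = 'likely not a module'
    return guesses


example = "import math\nfrom X import Y\nL = []\ndef f(): pass\n"
result = classify_globals(example)
```

For the example source, math is guessed to be a module, L and f are guessed not to be, and Y stays a mystery.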

> Regurgitatia:
> 
> """
> If I'm reading this right, then in the normal case of resolving "len" in
> 
> def mylen(s):
>     return len(s)
> 
> 1. We test func_cells for NULL and find out it isn't.

This step could be avoided using my trick of an array of dummy cells
or using your trick of a celldict containing an optional reference to
a real dict, so let's skip it.

> 2. A pointer to a cell object is read out of func_cells at a fixed (wrt
>    this function) offset.  This points to len's cell object in the
>    module's celldict.
> 3. The cell object's PyObject* pointer is tested and found to be NULL.
> 4. The cell object's cellptr pointer is tested and found not to be NULL.

This NULL test shouldn't be needed given my trick of linking cells
that do not shadow builtins to themselves.

>    This points to len's cell object in __builtin__'s celldict.
> 5. The cell object's cellptr's PyObject* is tested and found not to be
>    NULL.
> 6. The cell object's cellptr's PyObject* is returned.
> """
> 
> For a module global, the same description applies, but the outcome of #3 is
> not-NULL, and the lookup ends there.
> 
> For global.attr, step #3 yields the global, and then attr lookup is the same
> as today.
> 
> Jeremy, can you do the same level of detail for your scheme?  Skip?
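
The numbered walk-through above, under the original scheme (with the cellptr NULL test still in place), can be sketched in modern Python as follows. Cell, NULL, load_via_func_cells and the sample cells are hypothetical names for illustration:

```python
NULL = object()  # sentinel playing the role of a C NULL PyObject*


class Cell:
    def __init__(self, objptr=NULL, cellptr=None):
        self.objptr = objptr
        self.cellptr = cellptr   # None here models a C NULL cellptr


def load_via_func_cells(func_cells, index, name):
    # Step 1 (testing func_cells for NULL) is assumed to be avoided.
    cell = func_cells[index]                  # step 2: fixed offset
    if cell.objptr is not NULL:               # step 3
        return cell.objptr                    # a module global ends here
    if cell.cellptr is not None:              # step 4
        builtin_cell = cell.cellptr
        if builtin_cell.objptr is not NULL:   # step 5
            return builtin_cell.objptr        # step 6
    raise NameError(name)


builtin_cells = {'len': Cell(objptr=len)}
module_cells = {
    'len': Cell(cellptr=builtin_cells['len']),   # not shadowed by the module
    'foo': Cell(objptr='spam'),                  # an ordinary module global
}
# Cells used by one function, at offsets fixed for that function.
func_cells = [module_cells['len'], module_cells['foo']]
```

Because the module's cell links to the builtin's cell instead of copying its value, rebinding the builtin's cell is immediately visible through func_cells.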

Jeremy is probably still recovering with his family from the
conference.  I know I got sick there and am now stuck with a horrible
cold (the umpteenth one this season).

--Guido van Rossum (home page: http://www.python.org/~guido/)