Python 2 namespace change? (was Re: [Python-Dev] Changing existing class instances)

Guido van Rossum guido@python.org
Thu, 03 Feb 2000 14:45:35 -0500


> No.  The idea is to have "association" objects. We can create
> these directly if we want:
> 
>   a=Association('limit',100)
>   print a.key, a.value # whatever
> 
> The association value is mutable, but the key is not.
> 
> A namespace object is a collection of association objects
> such that no two items have the same key. Internally, this
> would be very much like the current dictionary except that
> instead of an array of dictentries, you'd have an array of
> association object pointers.  Effectively, associations
> are exposed dictentries.
> 
> Externally, a namespace acts more or less like any
> mapping object. For example, when someone does a getitem,
> the namespace object will find the association with the
> desired key and return its value.  In addition, a namespace
> object would provide methods along the lines of:
> 
>   associations()
> 
>     Return a sequence of the associations in the namespace
> 
>   addAssociation(assoc)
> 
>     Add the given association to the namespace.  This
>     creates another reference to the association.
>     Changing the association's value also changes the value
>     in the namespace.
> 
>   getAssociation(key)
> 
>     Get the association associated with the key.
> 
> A setitem on a namespace modifies an existing association
> if there is already an association for the given key.

I presume __setitem__() creates a new association if there isn't one.
I also presume that if an association's value is NULL, it doesn't show
up in keys(), values() and items() and it doesn't exist for has_key()
or __getitem__().

What does a delitem do?  Delete the association or set the value to
NULL?  I suppose the latter.
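
A minimal sketch of the semantics presumed above, written in present-day
Python 3 syntax rather than the Python of this thread. The `_NULL` sentinel
(standing in for a C-level NULL), the dict-backed storage, and the choice to
have delitem set the value to NULL are illustrative assumptions, not part of
the proposal:

```python
_NULL = object()  # hypothetical stand-in for a C-level NULL value


class Association:
    """A key/value pair: the key is fixed, the value is mutable."""

    def __init__(self, key, value=_NULL):
        self._key = key
        self.value = value

    @property
    def key(self):
        return self._key


class Namespace:
    """A mapping built from shareable Association objects."""

    def __init__(self):
        self._assocs = {}  # key -> Association

    def associations(self):
        return list(self._assocs.values())

    def addAssociation(self, assoc):
        self._assocs[assoc.key] = assoc

    def getAssociation(self, key):
        return self._assocs[key]

    def __setitem__(self, key, value):
        assoc = self._assocs.get(key)
        if assoc is None:
            # No association yet: create one.
            self._assocs[key] = Association(key, value)
        else:
            # Modify the existing (possibly shared) association.
            assoc.value = value

    def __contains__(self, key):
        assoc = self._assocs.get(key)
        return assoc is not None and assoc.value is not _NULL

    def __getitem__(self, key):
        if key not in self:
            raise KeyError(key)
        return self._assocs[key].value

    def __delitem__(self, key):
        if key not in self:
            raise KeyError(key)
        # Keep the association, but mark it as unbound.
        self._assocs[key].value = _NULL

    def keys(self):
        return [k for k, a in self._assocs.items() if a.value is not _NULL]


n1 = Namespace()
n1['limit'] = 100
n2 = Namespace()
n2.addAssociation(n1.getAssociation('limit'))
n1['limit'] = 200
print(n2['limit'])    # 200
del n1['limit']
print('limit' in n2)  # False
```

Because delitem only clears the value here, every namespace sharing the
association sees the name disappear at once.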

> For example:
> 
>   n1=namespace()
>   n1['limit']=100
>   n2=namespace()
>   n2.addAssociation(n1.getAssociation('limit'))
>   print n2['limit'] # prints 100
>   n1['limit']=200
>   print n2['limit'] # prints 200
> 
> When a function is compiled that refers to a global
> variable, we get the association from the global namespace
> and store it. The function doesn't need to store the global
> namespace itself, so we don't create a circular reference.

For this to work we would have to change the division of labor
between the function object and the code object.  The code object is
immutable and contains no references to mutable objects; this means
that it can easily be marshalled and unmarshalled.  (Also, when a code
object is compiled or unmarshalled, the globals in which its function
will be defined may not exist yet.)  The function object currently
contains a pointer to the code object and a pointer to the dictionary
with the globals.  (It also contains the default arg values.)

It seems that for associations to work, they need to be placed in the
function object, and the code object somehow needs to reference them
through the function object.  To make this concrete: if a function
references globals a, b, and c, these need to be numbered, and the
bytecodes should look like this:

	LOAD_GLOBAL	0	# a
	STORE_GLOBAL	1	# b
	DEL_GLOBAL	2	# c

(This could be compiled from ``b = a; del c''.)

The code object should also contain a list of global names, ordered
by their ordinals, e.g. ("a", "b", "c").

Then when the function object is created, it looks in that list and
creates a corresponding list of associations, e.g.:

	L = []
	for name in code.co_global_names:
	    L.append(globals.getAssociation(name))

The VM then sticks a pointer to this list into the frame, whenever the
function is called (instead of the globals dict which it sticks there
now), and the LOAD/STORE/DEL_GLOBAL opcodes reference the associations
through this list.
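
The scheme above can be simulated in a few lines of present-day Python 3.
This is only a sketch under stated assumptions: `Code`, `Function`, the
tuple-based "bytecode", and the `_NULL` marker are hypothetical stand-ins
for the real code object, function object, and VM, and built-ins are
ignored:

```python
_NULL = object()  # hypothetical marker for "no value bound"


class Association:
    def __init__(self, key, value=_NULL):
        self.key = key
        self.value = value


class Namespace:
    def __init__(self):
        self._assocs = {}

    def getAssociation(self, name):
        # Create the association on demand if it does not exist yet.
        return self._assocs.setdefault(name, Association(name))

    def __setitem__(self, name, value):
        self.getAssociation(name).value = value

    def __getitem__(self, name):
        assoc = self._assocs.get(name)
        if assoc is None or assoc.value is _NULL:
            raise KeyError(name)
        return assoc.value


class Code:
    """Immutable: holds only global names and (opcode, index) pairs."""

    def __init__(self, co_global_names, instructions):
        self.co_global_names = tuple(co_global_names)
        self.instructions = tuple(instructions)


class Function:
    def __init__(self, code, globals):
        self.code = code
        # Bind names to associations once, when the function is created.
        self.assocs = [globals.getAssociation(name)
                       for name in code.co_global_names]

    def __call__(self):
        stack = []
        for opcode, index in self.code.instructions:
            assoc = self.assocs[index]  # no dictionary lookup here
            if opcode == "LOAD_GLOBAL":
                if assoc.value is _NULL:
                    raise NameError(assoc.key)
                stack.append(assoc.value)
            elif opcode == "STORE_GLOBAL":
                assoc.value = stack.pop()
            elif opcode == "DEL_GLOBAL":
                if assoc.value is _NULL:
                    raise NameError(assoc.key)
                assoc.value = _NULL
            else:
                raise ValueError(opcode)


# Instructions corresponding to ``b = a; del c``.
code = Code(("a", "b", "c"),
            [("LOAD_GLOBAL", 0), ("STORE_GLOBAL", 1), ("DEL_GLOBAL", 2)])
g = Namespace()
g["a"] = 1
g["c"] = 3
f = Function(code, g)
f()
print(g["b"])  # 1
```

Note that `Code` holds no reference to any namespace, so it stays easy to
marshal; only `Function` holds the list of associations.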

Some complications left as exercises:

- The built-in functions (and exceptions, etc.) should also be
referenced via associations; the loop above would become a bit
trickier since it needs to look in two dicts.  (We're assuming that
the code generator doesn't know which names are globals and which are
built-ins.)

- If the association for a name doesn't yet exist, it should be
created.

Note that the semantics are slightly different than currently: the
decision whether a name refers to a global or to a built-in is made
when the function is defined rather than each time when the name is
referenced.  This is a bit cleaner -- in the type-sig we're making
similar assumptions but the decision is made even earlier.
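
For contrast, the current rule (the decision is made each time the name is
referenced) can be observed directly. The sketch below uses present-day
Python 3 syntax; the `ns` dictionary and the replacement `len` are
illustrative assumptions:

```python
# A fresh globals dictionary for the function to live in.
ns = {}
exec("def f():\n    return len('abc')\n", ns)

before = ns['f']()           # 'len' resolves to the built-in

ns['len'] = lambda s: 42     # add a global *after* f was defined
after = ns['f']()            # the same reference now finds the global

del ns['len']
restored = ns['f']()         # and falls back to the built-in again

print(before, after, restored)  # 3 42 3
```

Under the association scheme, the choice between the global and the
built-in would instead be fixed when `f` is defined.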

But, overall the necessary changes to the implementation and to the
semantics (e.g. of the 'for' statement) seem prohibitive to me.

I also think that the namespace implementation will be quite a bit
less efficient than a regular dictionary: currently, a dictionary
entry is a struct of 12 bytes, and the dictionary has an array of
these tightly packed.  Your association objects will be "real"
objects, which means they have a reference count, a type pointer, a
key, and a value, i.e. 16 bytes, without counting the malloc overhead;
this probably comes in addition to the 12 bytes in the dict entry.
(If you want to have the association objects directly in the hash
table, they can't be shared between namespaces, and a namespace
couldn't grow -- when a dict grows its hash table is reallocated.)
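
The arithmetic behind this estimate can be checked with the `struct`
module. The field layouts below are assumptions for a 32-bit platform with
4-byte longs and pointers (hash, key pointer, value pointer for a dict
entry; reference count, type pointer, key, value for an association):

```python
import struct

# "<" selects standard sizes with no padding; "i"/"I" are 4 bytes each.
dictentry_size = struct.calcsize("<iII")     # hash, key ptr, value ptr
association_size = struct.calcsize("<iIII")  # refcount, type ptr, key, value

# Per-name cost if the association comes in addition to the dict entry,
# not counting malloc overhead.
per_name = dictentry_size + association_size

print(dictentry_size, association_size, per_name)  # 12 16 28
```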

> Note that circular references are bad even if we have
> a more powerful gc.

I don't understand or believe this statement.

> For example, by not storing the global 
> namespace in a function, we don't have to worry about the
> global namespace being blown away before a destructor is run
> during process exit.

If we had more powerful gc the global namespace wouldn't have to be
blown away at all (it would gently dissolve when __main__ was deleted
from the interpreter).

> When we use the global variable
> in the function, we simply get the current value from the
> association. We don't have to look it up.
> 
> Namespaces would have other benefits:
> 
>   - improve the semantics of:
> 
>       from spam import foo
> 
>     in that you'd be importing a name binding, not a value

But its semantics will be harder to explain, because they will no
longer be equivalent to

	import spam	# assume there's no spam already
	foo = spam.foo
	del spam

Also, we currently *explain* that only objects are shared and name
bindings are unique per namespace; this would no longer be true so we
would have to explain a much harder rule.  ("If you got your foo
through an import from another module, assigning to it will affect foo
in that other module too; but if you got it through a local
assignment, the effect will be local.")
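
The rule as currently explained can be demonstrated in present-day
Python 3. In this sketch the `spam` module is built in memory with
`types.ModuleType`, and `importer` stands in for the importing module's
namespace; both are illustrative assumptions:

```python
import sys
import types

# Build a throwaway module named "spam" and make it importable.
spam = types.ModuleType("spam")
spam.foo = 1
sys.modules["spam"] = spam

# Run the import and a rebinding in a separate namespace.
importer = {}
exec("from spam import foo\nfoo = 2\n", importer)

print(importer["foo"])  # 2
print(spam.foo)         # 1: the binding in spam is untouched

del sys.modules["spam"]  # clean up
```

With shared associations, the assignment in the importer would also change
`spam.foo`.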

All in all, I think these semantics are messy and unacceptable.  True,
object sharing is hard to explain too (see diagram on Learning Python
page 60), but you'll still have to explain that anyway because it
still exists within a namespace; but now in addition we'd have to
explain that there is an exception to object sharing...  Messy, messy.

>   - Be useful in any application where it's desirable to
>     share a name binding.

I think it's better to explicitly share the namespace -- "foo.bar = 1"
makes it clear that whoever else has a reference to foo will see bar
similarly changed.
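
A small sketch of explicit sharing, in present-day Python 3.
`types.SimpleNamespace` is used here only as a convenient attribute
container; the names `holder_a` and `holder_b` are hypothetical:

```python
import types

foo = types.SimpleNamespace()

# Two parties hold references to the same object.
holder_a = foo
holder_b = foo

holder_a.bar = 1     # the attribute syntax makes the sharing visible
print(holder_b.bar)  # 1
```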

> > > Again, it would also make function global variable access
> > > faster and cleaner in some ways.
> > 
> > But I have other plans for that (if the optional static typing stuff
> > ever gets implemented).
> 
> Well, OK, but I argue that the namespace idea is much simpler
> and more foolproof.

I claim that it's not foolproof at all -- on the contrary, it creates
something that hides in the dark and will bite us in the behind by
surprise, long after we thought we knew there were no monsters under
the bed.  (Yes, I've been re-reading Calvin and Hobbes. :-)

> > > > however it would break a considerable amount of old code,
> > > > I think.
> > >
> > > Really? I wonder. I bet it would break a lot less old
> > > code than other recent changes.
> > 
> > Oh?  Name some changes that broke a lot of code?
> 
> The move to class-based exceptions broke a lot of our code.

It must have been very traumatic that you're still sore over that;
it was introduced in 1.5, over two years ago.

> Maybe we can drop this point. Do you still think
> that the namespace idea would break a lot of code?

Yes.

--Guido van Rossum (home page: http://www.python.org/~guido/)