Modifying the {} and [] tokens

Sat Aug 23 09:56:13 EDT 2003

In article <ckjekvcm6pf4hhq59ro08p20gk6lf9himp at 4ax.com>,
	Geoff Howland <ghowland at lupineNO.SPAMgames.com> writes:

[snip]

> I've explored all of __builtins__ that I have been able to, and
> replacing dict there does nothing either, so I'm beginning to think
> that {}, [], etc are bound to types in the C code only and not
> available at all through Python to be re-typed.
> 
> Is this correct?  Is there no way to change type on these expression
> delimiters?
> 
> I would think that somewhere there is a Python-accessible definition of
> what <type 'dict'> is that could be altered, but so far I can't seem to
> find any reference to it through the Python Language Reference, Python
> in a Nutshell, or any google search.
> 
> Python in a Nutshell specifies that dict(d={}) is essentially d = {},
> but I can't find a description of how this token/expression to type
> binding happens.

The compiler turns {} into the BUILD_MAP opcode, which the eval loop
executes by calling PyDict_New(). This creates a dict object directly,
independently of whatever __builtins__.dict is bound to.
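
A minimal sketch of that independence, assuming a current Python 3
interpreter (where the module is imported as builtins); LoudDict is a
hypothetical subclass made up for the demonstration:

```python
import builtins


class LoudDict(dict):
    """Hypothetical dict subclass used only for this demonstration."""


original_dict = builtins.dict
builtins.dict = LoudDict            # rebind the builtin name
try:
    literal_type = type({})         # literal: built directly by the interpreter
    via_name_type = type(dict())    # name lookup: goes through builtins
finally:
    builtins.dict = original_dict   # always restore the real builtin

print(literal_type is original_dict)
print(via_name_type is LoudDict)
```

Only the call through the name dict picks up the rebinding; the literal
never consults the name at all.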

I discovered this by disassembling some code and looking up the opcode
in the bytecode interpreter eval loop (Python/ceval.c).

>>> import dis
>>> code = compile("{}", "<test>", "single")
>>> dis.dis(code)
  1           0 BUILD_MAP                0
              3 PRINT_EXPR
              4 LOAD_CONST               0 (None)
              7 RETURN_VALUE
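
The listing above comes from the interpreter of the day, and the
surrounding opcodes differ between versions. As a rough sketch, assuming
Python 3.4 or later (for dis.get_instructions), the same check can be
done without depending on the exact listing:

```python
import dis

# Compile the empty dict literal as an expression and collect its opcode names.
code = compile("{}", "<test>", "eval")
opnames = [instruction.opname for instruction in dis.get_instructions(code)]

# The full listing varies by version, so only look for the one opcode.
has_build_map = "BUILD_MAP" in opnames
print(has_build_map)
```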

Reading the Language Reference can help a lot, too.

> It seems like it would be really handy to add features to all
> containers equally though, especially since you should be able to get
> new functionality from any code written even if it didn't know about
> your new features because it could instantiate with the new type.

I think that's precisely the reason why this feature will probably
never make it into Python. I, for one, don't want a module that I import
to change the meaning of a literal just because that module wants to use
a funky form of dict internally.

It's possible it could be encapsulated on a per-file basis, but that
would probably require adding another opcode for each builtin type. 

In the end, it's just not worth the trouble. 99.9% of the time, I think
you will find that subclassing the builtin types and giving the new
classes short names will suffice. I mean, how much of a hardship is it
to do the following:

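class d(dict):
    def __add__(self, other):
        # for illustration: merge into a new d; other wins on shared keys
        result = d(self)
        result.update(other)
        return result

d({1:2}) + d({3:4})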

It's only 3 extra characters each time; everyone who reads it knows what
is going on, knows to look for "class d" to find out how it's different
from a normal dict; it doesn't screw up other people's code; and it even
allows for you to define "class YetAnotherDict(dict)" which has
completely different behavior and use both at the same time.
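
A small sketch of that coexistence. The behavior given to each class
here is hypothetical, picked only to show that two subclasses and plain
literals can live side by side in one module:

```python
class d(dict):
    """Hypothetical short-named subclass: + merges two mappings."""

    def __add__(self, other):
        merged = d(self)
        merged.update(other)
        return merged


class YetAnotherDict(dict):
    """Hypothetical second subclass: missing keys read as 0."""

    def __missing__(self, key):
        return 0


merged = d({1: 2}) + d({3: 4})
counts = YetAnotherDict({"a": 1})
plain = {"a": 1}    # literals are untouched and still build a real dict
```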

If you're still not convinced, ask yourself these questions:

* How would you apply the new subclass?
  - Only on new literals after the subclass definition (and registry of
    the subclass with some special hook)?
  - On every new object that would normally have been the base type?
  - On every previously existing object with the base type?
  - In just the one module? or others which import it? or also in
    modules imported after the one with the subclass?

* How does your choice above work with code compiled on-the-fly with
  eval, exec or execfile?

* How do you deal with multiple subclasses being defined and registered?

* How would you pass in initialization information?
  E.g. say I want to limit the length of lists

  class LimitList(list):
    def __init__(self, maxlength, data=()):
      self.maxlength = maxlength
      list.__init__(self, data)
    def append(self, value):
      if len(self) >= self.maxlength:
        raise ValueError("list at maximum length")
      else:
        list.append(self, value)

* Can I get a real dict again if I wanted to?

* Given your choices above, how can you implement it in such a way that
  you don't interfere with other people's code by accident?
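
To make the initialization question concrete, here is a rough sketch
using a version of LimitList adapted to current Python syntax. The
constructor call has an obvious place for maxlength; the literal does
not:

```python
class LimitList(list):
    """List that refuses to grow past maxlength (adapted from the example above)."""

    def __init__(self, maxlength, data=()):
        list.__init__(self, data)
        self.maxlength = maxlength

    def append(self, value):
        if len(self) >= self.maxlength:
            raise ValueError("list at maximum length")
        list.append(self, value)


limited = LimitList(3, [1, 2])    # maxlength has to be passed explicitly
limited.append(3)
try:
    limited.append(4)
    overflowed = False
except ValueError:
    overflowed = True

literal = [1, 2]                  # a literal has no slot for maxlength
```

Note that this sketch only guards append(); extend(), insert() and
slice assignment would need the same treatment.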

Okay, so the last one isn't really fair, but the answers you give on the
other questions should help define in your mind the kind of behavior you
want. Reading up on Python internals with the Language Reference, the
dis module documentation, and some source code should give you an idea
of how one might go about an implementation and (more importantly) the
compromises one would have to make. 

Then compare these specific features with the current way. Which is
safer? Which is more readable by someone unfamiliar with the code? Which
is more flexible? Which saves the most typing?

> BTW, some of the other comments in my team have been for a desire of
> more inclusive OO, such as [].len() instead of len([]), and this sort
> of thing.  Having them builtin is obviously great and useful, but it
> feels wrong to some people and I'm trying to work on making things
> smoother instead of just forcing them to adapt (which they may not
> choose to do).

My condolences on having to deal with such a team. It seems a little
silly to me to equate OO with Everything-Must-Be-a-Method-Call. But if
they insist, point them at [].__len__(). len([]) just calls [].__len__()
anyway. After writing code like that for a while, they'll probably get
over their method fixation.
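
A quick sketch of the two spellings side by side; Playlist is a
hypothetical class invented for the example:

```python
class Playlist:
    """Hypothetical container used only to show the len() protocol."""

    def __init__(self, tracks):
        self.tracks = list(tracks)

    def __len__(self):
        return len(self.tracks)


playlist = Playlist(["a", "b", "c"])
function_style = len(playlist)        # the spelling most Python code uses
method_style = playlist.__len__()     # the same hook, called directly
empty_list_length = [].__len__()
```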

> -Geoff Howland
> http://ludumdare.com/

-- 
Robert Kern
kern at caltech.edu

"In the fields of hell where the grass grows high
 Are the graves of dreams allowed to die."
  -- Richard Harter