Modifying the {} and [] tokens

Sat Aug 23 21:54:59 EDT 2003

In article <km4fkv8qbbge3mvckv5gsim9amq59qvohb at 4ax.com>,
	Geoff Howland <ghowland at lupineNO.SPAMgames.com> writes:
> On Sat, 23 Aug 2003 13:56:13 +0000 (UTC), kern at taliesen.caltech.edu
> (Robert Kern) wrote:

[snip]

>>In the end, it's just not worth the trouble. 99.9% of the time, I think
>>you will find that subclassing the builtin types and giving the new
>>classes short names will suffice. I mean, how much of a hardship is it
>>to do the following:
>>
>>class d(dict):
>>    def __add__(self, other):
>>        # stuff
>>
>>d({1:2}) + d({3:4})
> 
> It's not actually the hardship of doing this, there is also the
> conversion of everything that doesn't create MyDict by default.  I
> would have to keep converting them to perform the same manipulations,
> or other functions (add is only an important example, others could be
> handy as well).
> 
> Now I will need additional lines to do this, or make the lines that
> exist more complicated.  It's not undue hardship in a lot of ways, but
> it would be nice not to have to and be able to keep the characters
> used to a minimum.  Especially when trying to write sub-50 line scripts
> and such.

I understand this feeling. In the NumPy world, we have the same problem
with subclassing arrays. The builtin functions still return regular
arrays.

One particular type of solution is often possible. For example, making
an __add__ method for dicts probably doesn't require any other new
methods, just the normal dict ones (copy, update). That is, it doesn't
require the other object to be MyDict, just a normal dict. So you can
write:

d({1:2}) + {3:4}

Especially for the situations you are considering, where you are only
adding functionality, not changing the way other methods work, this will
often be a viable way of doing what you want.
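
As a minimal sketch of that idea (the class name d comes from the example
quoted above; letting the right-hand keys win is only an assumption about
what "add" should mean, and the syntax is that of current Python):

```python
class d(dict):
    """dict subclass that adds a '+' operator (illustrative only)."""

    def __add__(self, other):
        # Only the ordinary dict interface is needed on `other`,
        # so a plain dict literal works on the right-hand side.
        result = d(self)       # shallow copy of the left operand
        result.update(other)   # keys in `other` clobber keys in `self`
        return result


combined = d({1: 2}) + {3: 4}
print(combined)
```

Note that {1:2} + d({3:4}) still raises TypeError unless the subclass also
defines __radd__.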

>>If you're still not convinced, ask yourself these questions:
> 
> I actually feel confident I would be happy with my provided answers as
> real functionality.  I don't believe Python will be changed to allow
> that, and that is fine with me.  I just wanted to know if it could be
> done the way Python is now.

The point of this exercise is to clarify the details of what you
actually want, try to square them with the current architecture and each
other, and finally to show why they aren't there now.

>>* How would you apply the new subclass?
>>  - Only on new literals after the subclass definition (and registry of
>>    the subclass with some special hook)?
>>  - On every new object that would normally have been the base type?
> 
> Yes and yes.
> 
> It is possible, if this were possible, to only make this work for
> classes that imported it if they so desired I expect.  Explicitly
> requested only, etc.

Not in any clean way. See below.

>>  - On every previously existing object with the base type?
> 
> No, they're already defined.  Unless my understanding of the way types
> are done is lacking, and they are referencing the same base object
> code, and then maybe yes to this as well, but it would be passive.

The question is "do you really want this?" It would interfere with other
modules in a very invasive way and, according to what you say below, is
not what you want.

>>  - In just the one module? or others which import it? or also in
>>    modules imported after the one with the subclass?
> 
> In the module, and any that import it.  It fails to be useful in
> making things clean-and-neat if in order to make a clean-and-neat
> script you have to first have a bunch of stuff that modifies the
> language norms.

I think this is impossible without taking a performance hit in the
construction of every object with a builtin type. Every internal call to
PyDict_New, PyList_New, etc. would have to check the registry and
somehow know what module it is being called from. There would have to be
a reworking of the internals to give the Py<Object>_New functions that
kind of information. And you get a performance hit even if you don't
subclass builtin types.

And if you do subclass in your module, just for internal use, how can
you import this module into another without affecting it as well? That
is, you only want what the module does, not the redefinition of dicts?

>>* How does your choice above work with code compiled on-the-fly with
>>  eval, exec or execfile?
> 
> It would depend on how Python allowed this change to work.  If it did
> it the way "I imagine" it would, then the change would affect all of
> these circumstances.

I don't think it works the way you imagine. I think there would have to
be even more reworking of the way these work to pass in the information
that the exec'd code should use the redefinitions in the current module.
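
A small sketch of how things stand today, written for current Python (the
name MyDict is hypothetical): rebinding the name dict in the namespace that
exec'd code runs in changes what dict() returns there, but the {} literal
never looks the name up and still builds the builtin type.

```python
class MyDict(dict):
    """Stand-in for a user-defined subclass (hypothetical name)."""


# Run a snippet in a namespace where the *name* `dict` is rebound.
namespace = {"dict": MyDict}
exec("from_literal = {}\nfrom_call = dict()", namespace)

# The literal ignores the rebinding; the explicit call does not.
print(type(namespace["from_literal"]))
print(type(namespace["from_call"]))
```

Making the literal honor a per-module redefinition is the part that would
need the reworking described above.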

>>* How do you deal with multiple subclasses being defined and registered?
> 
> If they re-wrote the same thing and didn't just augment it (or
> overwrote each other), then I would expect them to clobber each other.
> Problems could ensue.

This is where I say "too much rope."

> However, why isn't having {} + {} or [].len() part of the language
> anyway?  If it was, people wouldn't have to keep writing their own.
>
> {} + {} doesn't have to solve ALL cases, which is why I'm assuming it
> isn't already implemented as [] + [] is.  It just has to solve the
> basic case and provide the mechanism for determining the difference.
> Ex, key clobbering add, and an intersect() function to see what will
> be clobbered and do something about it if you care.
> 
> Isn't it better to have some default solution than to have none and
> everyone keeps implementing it themselves anew every time?

Unless the default solution is the one clear way to do it, no. Since it
only takes a few lines to implement in any one way that you might want
to do it, there is very little incentive to make the default way one of
them. You want clobbering; I'll probably want the values added to each
other as in a sparse array.
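
To make the disagreement concrete, here is a rough sketch of both readings
of "add two dicts" (the function names are made up for illustration):

```python
def merge_clobber(a, b):
    """Keys in b replace keys in a."""
    out = dict(a)
    out.update(b)
    return out


def merge_sum(a, b):
    """Values for shared keys are added, as in a sparse array."""
    out = dict(a)
    for key, value in b.items():
        out[key] = out.get(key, 0) + value
    return out


left, right = {1: 2, 5: 1}, {1: 5, 7: 3}
print(merge_clobber(left, right))
print(merge_sum(left, right))
```

Both are a handful of lines, and neither is obviously the one that {} + {}
ought to mean.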

For the same reason, the Python standard library doesn't have a priority
queue implementation. There are so many different ways to write a
priority queue with different behaviors, tradeoffs, and applications.
Since they are all pretty easy to implement in a few lines, there is no
incentive to include a default one in the standard library.
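
For instance, one of the many possible designs, sketched here with a sorted
list kept in order by bisect.insort (the class and its method names are
hypothetical, and this is only one set of tradeoffs: cheap to write, but
pop is O(n)):

```python
import bisect


class PriorityQueue:
    """Smallest priority comes out first; ties fall back to comparing items."""

    def __init__(self):
        self._items = []

    def push(self, priority, item):
        # Keep the list sorted by (priority, item) on every insert.
        bisect.insort(self._items, (priority, item))

    def pop(self):
        # Raises IndexError when the queue is empty.
        return self._items.pop(0)[1]

    def __len__(self):
        return len(self._items)


queue = PriorityQueue()
queue.push(2, "write tests")
queue.push(1, "fix bug")
queue.push(3, "refactor")
first = queue.pop()
print(first, len(queue))
```

Someone else will want stable ordering for ties, or updatable priorities,
or a maximum-first queue, and will write a different few lines.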

> For the [].len() type things, this is obviously a matter of taste, and
> I have always been ok with len([]), but in trying to get team mates to
> work with Python I have to concede that [].len() is more consistent
> and a good thing.  Same for other builtins that apply.

Terry explains the historical reasons very well. I would also add that
the "len(obj) calls obj.__len__()" rule enforces a consistency among
user-defined classes and types. The canonical way to report a length is
to define a __len__ method, and the canonical way to get a length is to
call len(obj). If this were not the case, objects would be defining
.len(), .length(), .getLength(), .number(), .cardinality(), etc.
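
A short sketch of that convention with a made-up class (Deck is not from
any library): defining __len__ is all it takes for len() to work.

```python
class Deck:
    """Toy container that reports its size through the len() protocol."""

    def __init__(self, cards):
        self._cards = list(cards)

    def __len__(self):
        return len(self._cards)


deck = Deck(["ace", "king", "queen"])
print(len(deck))          # same spelling as for lists, dicts, strings
print(bool(Deck([])))     # with no __bool__, truthiness also uses __len__
```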

>>* How would you pass in initialization information?
>>  E.g. say I want to limit the length of lists
>>
>>  class LimitList(list):
>>    def __init__(self, maxlength, data=[]):
>>      self.maxlength = maxlength
>>      list.__init__(self, data)
>>    def append(self, value):
>>      if len(self) == self.maxlength:
>>        raise ValueError("list at maximum length")
>>      else:
>>        list.append(self, value)
> 
> I am actually more interested in new features than modifying current ones,
> but I suppose you would have this ability as well since you have
> access.
> 
> I would say if you break the initialization or any other methods of
> accessing the type in a default way, then you have just given yourself
> broken code.  No one other function will work properly that uses it
> and didn't expect to.

Again, "too much rope."

> You will see it doesn't work, and stop.  I didn't say I wanted to
> change ALL the behavior of the language, I just wanted to add features
> that don't currently exist and I think should.

I, for one, rarely ever subclass the builtin types to add something
orthogonal to the original methods. I'm usually modifying the behavior
of current methods. There's no clean way for an implementation to know
this upon registry and raise an error.
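
As a rough illustration in current Python, using a variant of the LimitList
example quoted above (collect_squares is a hypothetical stand-in for "other
people's code" that assumes ordinary list behavior):

```python
class LimitList(list):
    """List whose append() refuses to grow past maxlength."""

    def __init__(self, maxlength, data=()):
        super().__init__(data)
        self.maxlength = maxlength

    def append(self, value):
        if len(self) >= self.maxlength:
            raise ValueError("list at maximum length")
        super().append(value)


def collect_squares(container, n):
    # Written against plain lists: assumes append() always succeeds.
    for i in range(n):
        container.append(i * i)
    return container


print(collect_squares([], 5))
try:
    collect_squares(LimitList(3), 5)
except ValueError as exc:
    print("unsuspecting code broke:", exc)
```

Nothing in the class definition tells a registry that append() now has
different semantics from list.append().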

>>* Can I get a real dict again if wanted to?
> 
> Save it, don't clobber the original out of existence.  The coder could
> mess this up (more rope).
> 
>>* Given your choices above, how can you implement it in such a way that
>>  you don't interfere with other people's code by accident?
> 
> There could be module limitations, but when I imagined this it was
> globally reaching.  I believe that as long as you add features, or any
> changes you make don't break things, then you will be safe.  If you
> mess with features that exist then you will break things and your
> programs will not work.
> 
> This seems natural enough to me.  You ARE changing a default behavior,
> there are risks, but if you don't change things that other functions
> use (pre-existing methods), then you are safe.

This restriction really limits the usefulness of the feature. Compare
the changes that would have to be made at the architecture level to the
drawbacks of using MyDict({}).

>>Okay, so the last one isn't really fair, but the answers you give on the
>>other questions should help define in your mind the kind of behavior you
>>want. Reading up on Python internals with the Language Reference, the
>>dis module documentation, and some source code should give you an idea
>>of how one might go about an implementation and (more importantly) the
>>compromises one would have to make. 
> 
> It seems the changes would have to be to the Python source, and I don't
> want to make a new Python.  I don't even want to "change" Python, I
> want to augment it.  I think these are reasonable additions, and I'm
> sure I missed the discussions on why they aren't there now.
> 
> There may be very good reasons, and I could end up retracting my
> desired feature set because of them, but I still wanted to see if I
> could make it work for my own code.

At the Python prompt, "import this".

>>Then compare these specific features with the current way. Which is
>>safer? Which is more readable by someone unfamiliar with the code? Which
>>is more flexible? Which saves the most typing?
> 
> I think my answers above are sound.  If it worked the way I hoped it
> might, you could change things on a global scale and you still
> wouldn't break anything.

Well, Python isn't architected the same way that Ruby is. There are
fundamental problems in doing what you want.

>>My condolences on having to deal with such a team. It seems a little
>>silly to me to equate OO with Everything-Must-Be-a-Method-Call. But if
>>they insist, point them at [].__len__(). len([]) just calls [].__len__()
>>anyways. After writing code like that for a while, they'll probably get
>>over their method fixation.
> 
> I never had a problem with this myself, but people are different.  I
> have a great team, everyone has their own opinions.  No need to
> belittle them for differing.
> 
> Some people never get over the whitespacing.  It's just the way the
> people work.  No need to see them as flawed.  :)

No belittling was intended. I only wanted to say that I don't see the
sense in it, and that if I had to work with such a team, I would
be tearing my hair out. I lose enough hair interfacing with FORTRAN,
thank you very much. :-)

Preferring [].len() to len([]) is one thing; claiming it's intrinsically
more OO is another.

> -Geoff Howland
> http://ludumdare.com/

-- 
Robert Kern
kern at caltech.edu

"In the fields of hell where the grass grows high
 Are the graves of dreams allowed to die."
  -- Richard Harter