[Python-ideas] JavaScript-Style Object Creation in Python (using a constructor function instead of a class to create objects)

Steven D'Aprano steve at pearwood.info
Sun May 14 07:14:36 EDT 2017


Some further thoughts...


On Sun, May 14, 2017 at 04:07:44AM +0000, Simon Ramstedt wrote:

> *proposed*:
> 
>     def MyClass(x):
>       self = ParentClass()
>       def my_method(y):
>         z = x + y
>         return z
>       self.my_method = my_method  # that's cumbersome (see comments below)
>       return self

I think I misunderstood this earlier. "x" in this case is intended as a 
parameter to the __init__ method, not as a parameter to the "my_method" 
method. So my earlier objection doesn't apply: this isn't about the 
arguments to my_method, it is about the parameters to __init__.

To my mind, this is very confusing. The function is called MyClass, 
leading me to believe that it returns the class object, but it doesn't, 
it returns a newly instantiated instance.

But... given obj = MyClass(1), say, we get:

assert type(obj) is ParentClass

So MyClass doesn't actually exist anywhere. Worse:

def Spam():
    self = Parent()
    # ...

def Eggs():
    self = Parent()
    # ...


a = Spam()
b = Eggs()

assert type(a) is type(b)

As you point out later in your post:

> (-/+) Checking types: In the proposed example above the returned object
> wouldn't know that it has been created by `MyClass`.

In fact, there is no MyClass class anywhere. That's very strange and 
confusing.


> There are a couple of
> solutions to that, though. The easiest to implement would be to change the
> first line to `self = subclass(ParentClass())` where the subclass function
> looks at the next item in the call stack (i.e. `MyClass`) and makes it the
> type of the object.

You say this is the easiest to implement. Do you have an implementation? 
Does it work for CPython, Jython, IronPython, PyPy, Stackless, Nuitka, 
and other Python implementations? What of Python implementations that 
don't support introspection of the call stack?

(Perhaps we should just make that a required language feature. But 
that's a separate argument.)
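
For concreteness, here is a minimal sketch of what such a subclass() 
helper might look like. It is not taken from the proposal: the cache 
name _class_cache is made up for illustration, and it relies on 
sys._getframe, which the documentation describes as not guaranteed to 
exist in all implementations of Python. It also assumes the parent is 
an ordinary Python class whose instances allow __class__ assignment.

```python
import sys

# Hypothetical cache: one generated class per constructor function,
# keyed by the caller's code object.
_class_cache = {}


def subclass(parent_instance):
    """Retype parent_instance as a subclass named after the calling function."""
    caller = sys._getframe(1)  # frame of the constructor function
    code = caller.f_code
    cls = _class_cache.get(code)
    if cls is None:
        # Build a new class named after the caller, inheriting from the parent.
        cls = type(code.co_name, (type(parent_instance),), {})
        _class_cache[code] = cls
    # Only works when the two classes have compatible instance layouts.
    parent_instance.__class__ = cls
    return parent_instance


class ParentClass:
    pass


def MyClass(x):
    self = subclass(ParentClass())

    def my_method(y):
        return x + y

    self.my_method = my_method
    return self


obj = MyClass(1)
other = MyClass(2)
```

Even in this form there is still no name bound to the generated class: 
MyClass remains the function, and the class is only reachable through 
type(obj).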


> Another solution would be to have a special rule for
> functions with capital first letter returning a single object to append
> itself to the list of types of the returned object. Alternatively there
> could be a special keyword e.g. `classdef` that would be used instead of
> `def` if we wouldn't want to rely on the name.

There needs to be some compiler magic happening here. Whether it is in 
the def keyword, a new built-in "subclass()", or a new keyword 
"classdef", it needs to be built into the interpreter.

That means it's backwards incompatible, and cannot be backported to older 
versions or alternative implementations. That's not necessarily a fatal 
objection, but it does mean that the feature needs to add substantial 
value to the language to make up for the cost of having two ways to do 
the same thing.

> Here are the pros and cons I could come up with for the proposed method:
> 
> (+) Simpler and more explicit.

To elaborate on my earlier answer, I really don't think it is simpler. 
There's a lot of extra compiler magic going on to make it work, and 
we're lacking a reference to the actual class object itself. You can't 
say:

MyClass.attribute = 999  # add a class attribute

because MyClass isn't actually the class, it's a special constructor of 
the class. Not the __init__ method, or the __new__ method, but a factory 
function that somehow, behind the scenes, magically creates the class 
and returns a new instance of it. So to add to the class, you have to 
write:

instance = MyClass()  # hope this has no side-effects
type(instance).attribute = 999
del instance


A couple of other consequences that you might have missed:

(1) No docstring: there's no obvious place to declare the class 
docstring.

(2) Every instance gets a new, unique, my_method() object added to it.

Every time you call the MyClass constructor/factory function, it creates 
a brand new my_method object, and attaches it to the instance, self. 
That's potentially wasteful of memory, but it will work.
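
A small sketch of the difference, using the constructor function from 
the proposal and a conventional class for comparison (the name 
Conventional is made up for this example):

```python
class ParentClass:
    pass


def MyClass(x):
    self = ParentClass()

    def my_method(y):
        return x + y

    self.my_method = my_method  # stored in the instance __dict__
    return self


class Conventional:
    def __init__(self, x):
        self.x = x

    def my_method(self, y):  # stored once, in the class __dict__
        return self.x + y


a, b = MyClass(1), MyClass(1)
c, d = Conventional(1), Conventional(1)

# Each call to the constructor function builds a fresh function object.
closures_are_distinct = a.my_method is not b.my_method

# Instances of a conventional class share one underlying function.
function_is_shared = c.my_method.__func__ is d.my_method.__func__
```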

Well, not quite. As written, it is quite different from the way methods 
are normally handled. In this new suggested syntax, my_method is 
actually a function object on the instance, with no self parameter. 
Since the descriptor protocol is bypassed for objects on self, that 
doesn't matter, it will work. But now we have two ways of writing 
methods:

- if you write methods using the class keyword, they will be stored in 
  the class __dict__ and they MUST have a `self` parameter;

- if you write methods using the constructor function, they will be 
  stored in the instance __dict__, they will be function objects not 
  method objects, they MUST NOT have a `self` parameter, and you cannot 
  use classmethod or staticmethod.

I can see this leading to severe confusion when people mistakenly add a 
self parameter or leave it out.
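
To illustrate the two conventions side by side, here is a sketch with 
made-up names (Thing, in_class, on_instance). A function found in the 
class __dict__ is bound to the instance on attribute access; a function 
stored in the instance __dict__ is returned as-is and receives no 
implicit first argument.

```python
import types


class Thing:
    def in_class(self):
        # Found on the class, so the descriptor protocol binds the instance.
        return self


def on_instance(*args):
    # Reports whatever positional arguments it actually received.
    return args


obj = Thing()
obj.on_instance = on_instance  # stored in the instance __dict__

bound = obj.in_class       # a bound method object
plain = obj.on_instance    # the plain function, unchanged

received_by_class_method = obj.in_class()        # the instance itself
received_by_instance_func = obj.on_instance()    # nothing was passed in
```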

To avoid this, we need yet more magic, or work: instead of writing 

    self.my_method = my_method

you have to write something like:

    type(self).my_method = my_method

except that's not right either! That will mean each time you call the 
constructor function, you replace the my_method for EVERY instance with 
a new closure.
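
A sketch of that failure mode. Here type(self) is ParentClass itself, 
since no new class is created, and the function gains a first parameter 
(called _instance here, a name chosen for this example) because it is 
now looked up on the class and therefore bound:

```python
class ParentClass:
    pass


def MyClass(x):
    self = ParentClass()

    def my_method(_instance, y):
        # x comes from the enclosing call, not from the instance.
        return x + y

    type(self).my_method = my_method  # overwrites the class attribute
    return self


a = MyClass(1)
before = a.my_method(10)   # uses the closure with x == 1

b = MyClass(100)           # replaces my_method for every instance
after = a.my_method(10)    # now uses the closure with x == 100
```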

This is not simple at all.


[...]
> (+) Class/instance level imports would work.

Why do you think that import doesn't work at the class or instance level?
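
For example, a sketch using a made-up Circle class: an import statement 
in a class body binds the name in the class namespace, and an import 
inside a method binds it as a local variable.

```python
class Circle:
    from math import pi  # binds the class attribute Circle.pi

    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return self.pi * self.radius ** 2

    def circumference(self):
        import math  # local to this method call
        return 2 * math.pi * self.radius


unit = Circle(1)
```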



> (-/+) Speed: The `def`-based objects take 0.6 μs to create while the
> `class`-based objects take only 0.4 μs. For method execution however the
> closure takes only 0.15 μs while the proper method takes 0.22 μs (script
> <https://gist.github.com/rmst/78b2b0f56a3d9ec13b1ec6f3bd50aa9c>).

Premature optimization.


> (-) The current syntax for adding a function to an object is cumbersome.

No more cumbersome than any other value:

    obj.attribute = value

works for any value, including functions, provided that it already 
exists (or can be created using a single expression). If not, then the 
syntax is no worse for functions than any other multi-statement object:

# prepare the value
a = []
while condition():
    a.append(something())

# attach it to the object
obj.attribute = a


> That's what is preventing me from actually using the proposed pattern. But
> is this really the only reason for not using it? And if so, wouldn't that
> be a good argument for enabling something like below?
[...]
>       def self.my_method(y):

That's been proposed at least once before, and probably more than that. 
I think the most recent time was earlier this year?

I don't think I'm against that specific language feature. But nor is it 
an obviously compelling feature.


> or alternatively *multiline lambdas*:

Gosh, what a good idea! I wonder why nobody else has suggested a 
multi-statement lambda!

/sarcasm


Multi-statement lambdas have been debated since, oh, Python 1.4 if not 
earlier. If they were easy to add, we would have had them by now.

At the point that your simple proposal to add Javascript-style 
constructors requires multiple other changes to the language to be 
practical:

- either multi-statement lambda, 
- or dotted function targets `def self.name`; 

- a new built-in function subclass() with call stack introspection 
powers,
- or a new keyword classdef,
- or significant magic in the def keyword;

this is not a simple proposal. And the benefits are marginal, at best: 
as far as I can see, the best we can hope for is a tiny performance 
increase by using closures as methods, and some interesting semantic 
differences: 

- classmethod and staticmethod won't work (this is bad);

- per-instance methods: methods are in the instance __dict__, not 
  the class __dict__ (this might be good, or bad);

- bypass the descriptor protocol;

- no self (that's a bad thing);

- mutator methods need to use the nonlocal keyword (bad).
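
As a sketch of the last point, using a made-up Counter constructor 
function: state kept in the enclosing function's local variables can 
only be rebound from a method by declaring it nonlocal.

```python
class ParentClass:
    pass


def Counter(start):
    self = ParentClass()
    count = start  # state lives in the closure, not on the instance

    def increment():
        nonlocal count  # without this, "count += 1" raises UnboundLocalError
        count += 1
        return count

    def value():
        return count  # reading needs no declaration

    self.increment = increment
    self.value = value
    return self


counter = Counter(5)
counter.increment()
counter.increment()
```

Note that the state is invisible from outside: counter has no count 
attribute, only the two functions.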



-- 
Steve

