__init__ is the initialiser

Fri Jan 31 22:42:24 EST 2014

On Fri, 31 Jan 2014 14:52:15 -0500, Ned Batchelder wrote:

> Why can't we call __init__ the constructor and __new__ the allocator?

__new__ constructs the object, and __init__ initialises it. What's wrong 
with calling them the constructor and initialiser? Is this such a 
difficult concept that the average programmer can't learn it?

I've met people who have difficulty with OOP principles, at least at 
first. But once you understand the idea of objects, it isn't that hard to 
understand the idea that:

- first, the object has to be created, or constructed, or allocated 
  if you will;

- only then can it be initialised.

Thus, two methods. __new__ constructs (creates, allocates) a new object; 
__init__ initialises it after the event.
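
A minimal sketch of that two-step sequence. The Point class and the
calls list are hypothetical names made up for illustration; the only
thing being shown is the order in which Python invokes the two methods.

```python
calls = []  # records which method ran, in order


class Point:
    def __new__(cls, x, y):
        calls.append("__new__")
        # object.__new__ is what actually creates the instance.
        return super().__new__(cls)

    def __init__(self, x, y):
        calls.append("__init__")
        # The instance already exists here; we only fill it in.
        self.x = x
        self.y = y


p = Point(1, 2)
print(calls)  # ['__new__', '__init__']
```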

(In hindsight, it was probably a mistake for Python to define two create-
an-object methods, although I expect it was deemed necessary for 
historical reasons. Most other languages make do with a single method, 
Objective-C being an exception with "alloc" and "init" methods.)

Earlier in this post, you wrote:

> But that distinction [between __new__ and __init__] isn't useful in
> most programs.

Well, I don't know about that. I guess it depends on what sort of objects 
you're creating. If you're creating immutable objects, then the 
distinction is vital. If you're subclassing from immutable built-ins, of 
which there are a few, the distinction may be important. If you're using 
the object-pool design pattern, the distinction is also vital. It's not 
*rare* to care about these things.
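
Two of those cases can be sketched briefly. Celsius and Colour are
hypothetical classes invented for this illustration, and the pool below
is the simplest one I can think of (a dict keyed by name), not a
recommended implementation.

```python
class Celsius(float):
    """Subclass of an immutable built-in: the value is fixed in __new__."""

    def __new__(cls, degrees):
        if degrees < -273.15:
            raise ValueError("temperature below absolute zero")
        # By the time __init__ runs, a float's value can no longer change.
        return super().__new__(cls, degrees)


class Colour:
    """Tiny object pool: at most one instance per name."""

    _pool = {}

    def __new__(cls, name):
        try:
            return cls._pool[name]      # reuse the existing object
        except KeyError:
            obj = super().__new__(cls)  # create a new one only when needed
            obj.name = name
            cls._pool[name] = obj
            return obj


print(Colour("red") is Colour("red"))  # True
```

Neither class could get the same behaviour from __init__ alone: by then
the float already has its value, and a new object has already been
created instead of the pooled one.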

> The thing most people mean by "constructor" is "the method that gets
> invoked right at the beginning of the object's lifetime, where you can
> add code to initialize it properly."  That describes __init__.

"Most people". I presume you've done a statistically valid survey then 
*wink*

It *better* describes __new__, because it is *not true* that __init__ 
gets invoked "right at the beginning of the object's lifetime". Before 
__init__ is invoked, the object's lifetime has already begun, inside the 
call to __new__. Excluding metaclass shenanigans, the object lifetime 
goes:

Prior to the object existing:
- static method __new__ called on the class[1]
- __new__ creates the object[2]  <=== start of object lifetime

Within the object's lifetime:
- the rest of the __new__ method runs, which may perform arbitrarily
  complex manipulations of the object;
- __new__ exits, returning the object
- __init__ runs

So __init__ does not occur *right at the beginning*, and it is completely 
legitimate to write your classes using only __new__. You must use __new__ 
for immutable objects, and you may use __new__ for mutable ones. __init__ 
may be used by convention, but it is entirely redundant.
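
A rough sketch of both points, with hypothetical class names. Tracked
records that the object seen by __init__ is the very same one that was
already alive and usable inside __new__; NewOnly is a mutable class with
no __init__ at all.

```python
events = []  # (method name, id of the object, label) in call order


class Tracked:
    def __new__(cls, label):
        obj = super().__new__(cls)  # start of the object's lifetime
        obj.label = label           # the object is already usable here
        events.append(("new", id(obj), obj.label))
        return obj

    def __init__(self, label):
        events.append(("init", id(self), self.label))


class NewOnly:
    """Mutable class written using only __new__."""

    def __new__(cls, items=()):
        obj = super().__new__(cls)
        obj.items = list(items)
        return obj


t = Tracked("a")
n = NewOnly([1, 2])
n.items.append(3)
print(n.items)  # [1, 2, 3]
```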

I do not buy the argument made by some people that Python ought to follow 
whatever (possibly inaccurate or misleading) terminology other languages 
use. Java and Ruby have the exact same argument passing conventions as 
Python, but one calls it "call by value" and the other "call by 
reference", and neither is the same meaning of "call by value/reference" 
as used by Pascal, C, Visual Basic, or other languages. So which 
terminology should Python use? Both C++ and Haskell have "functors", but 
they are completely different things. What Python calls a class method 
is, roughly, what Java calls a static method. We could go on for days, 
just listing differences in terminology.

In Python circles, using "constructor" for __new__ and "initialiser" for 
__init__ is well established. In the context of Python, they make good 
sense: __new__ creates ("constructs") the object, and __init__ 
_init_ialises it. Missing the opportunity to link the method name 
__init__ to *initialise* would be a mistake.

We can decry the fact that computer science has not standardised on a 
sensible set of names for concepts. On the other hand, since the 
semantics of languages differ slightly, it would be more confusing to try 
to force all languages to use the same words for slightly different 
concepts.

The reality is, if you're coming to Python from another language, you're 
going to have to learn a whole lot of new stuff anyway, so having to 
learn a few language-specific terms is just a small incremental cost. And 
if you have no idea about other languages, then it is no harder to learn 
that __new__ / __init__ are the constructor/initialiser than it would be 
to learn that they are the allocator/constructor or preformulator/
postformulator.

I care about using terminology that causes the least cognitive dissonance 
in users' understanding of Python, not about whether they have to learn 
new terms. In the context of Python's object model, "constructor" and 
"initialiser" best describe what __new__ and __init__ do.

[1] Yes, despite being declared with a "cls" parameter, __new__ is 
actually hard-coded as a static method.

[2] By explicitly or implicitly calling object.__new__.
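
Footnote [1] can be checked from the interpreter. This is a small sketch
with a hypothetical Widget class; I have only relied on this behaviour
on CPython, so treat the check on the class __dict__ as an assumption
about that implementation.

```python
class Widget:
    # No @staticmethod decorator, yet it is stored as one.
    def __new__(cls, *args):
        # With no other base class, this resolves to object.__new__.
        return super().__new__(cls)


raw = Widget.__dict__["__new__"]
print(type(raw) is staticmethod)  # True

# Because it is static, the class is not bound automatically and
# must be passed in by hand when calling __new__ directly.
w = Widget.__new__(Widget)
print(isinstance(w, Widget))  # True
```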

-- 
Steven
