__init__ is the initialiser

Sat Feb 1 07:28:01 EST 2014

On 1/31/14 10:42 PM, Steven D'Aprano wrote:
> On Fri, 31 Jan 2014 14:52:15 -0500, Ned Batchelder wrote:
>
>> Why can't we call __init__ the constructor and __new__ the allocator?
>
> __new__ constructs the object, and __init__ initialises it. What's wrong
> with calling them the constructor and initialiser? Is this such a
> difficult concept that the average programmer can't learn it?
>
> I've met people who have difficulty with OOP principles, at least at
> first. But once you understand the idea of objects, it isn't that hard to
> understand the idea that:
>
> - first, the object has to be created, or constructed, or allocated
>    if you will;
>
> - only then can it be initialised.
>
> Thus, two methods. __new__ constructs (creates, allocates) a new object;
> __init__ initialises it after the event.
>
> (In hindsight, it was probably a mistake for Python to define two create-
> an-object methods, although I expect it was deemed necessary for
> historical reasons. Most other languages make do with a single method,
> Objective-C being an exception with "alloc" and "init" methods.)
>
>
>
> Earlier in this post, you wrote:
>
>> But that distinction [between __new__ and __init__] isn't useful in
>> most programs.
>
> Well, I don't know about that. I guess it depends on what sort of objects
> you're creating. If you're creating immutable objects, then the
> distinction is vital. If you're subclassing from immutable built-ins, of
> which there are a few, the distinction may be important. If you're using
> the object-pool design pattern, the distinction is also vital. It's not
> *rare* to care about these things.
>
>
>> The thing most people mean by "constructor" is "the method that gets
>> invoked right at the beginning of the object's lifetime, where you can
>> add code to initialize it properly."  That describes __init__.
>
> "Most people". I presume you've done a statistically valid survey then
> *wink*
>
> It *better* describes __new__, because it is *not true* that __init__
> gets invoked "right at the beginning of the object's lifetime". Before
> __init__ is invoked, the object's lifetime has already begun, inside the
> call to __new__. Excluding metaclass shenanigans, the object lifetime
> goes:
>
>
> Prior to the object existing:
> - static method __new__ called on the class[1]
> - __new__ creates the object[2]  <=== start of object lifetime
>
> Within the object's lifetime:
> - the rest of the __new__ method runs, which may perform arbitrarily
>    complex manipulations of the object;
> - __new__ exits, returning the object
> - __init__ runs
>
>
> So __init__ does not occur *right at the beginning*, and it is completely
> legitimate to write your classes using only __new__. You must use __new__
> for immutable objects, and you may use __new__ for mutable ones. __init__
> may be used by convention, but it is entirely redundant.
>
> I do not buy the argument made by some people that Python ought to follow
> whatever (possibly inaccurate or misleading) terminology other languages
> use. Java and Ruby have the exact same argument passing conventions as
> Python, but one calls it "call by value" and the other "call by
> reference", and neither is the same meaning of "call by value/reference"
> as used by Pascal, C, Visual Basic, or other languages. So which
> terminology should Python use? Both C++ and Haskell have "functors", but
> they are completely different things. What Python calls a class method,
> Java calls a static method. We could go on for days, just listing
> differences in terminology.
>
> In Python circles, using "constructor" for __new__ and "initialiser" for
> __init__ is well-established. In the context of Python, they make good
> sense: __new__ creates ("constructs") the object, and __init__
> _init_ialises it. Missing the opportunity to link the method name
> __init__ to *initialise* would be a mistake.
>
> We can decry the fact that computer science has not standardised on a
> sensible set of names for concepts, but on the other hand since the
> semantics of languages differ slightly, it would be more confusing to try
> to force all languages to use the same words for slightly different
> concepts.
>
> The reality is, if you're coming to Python from another language, you're
> going to have to learn a whole lot of new stuff anyway, so having to
> learn a few language-specific terms is just a small incremental cost. And
> if you have no idea about other languages, then it is no harder to learn
> that __new__ / __init__ are the constructor/initialiser than it would be
> to learn that they are the allocator/constructor or preformulator/
> postformulator.
>
> I care about using the right terminology that will cause the least amount
> of cognitive dissonance to users' understanding of Python, not whether
> they have to learn new terminology, and in the context of Python's object
> model, "constructor" and "initialiser" best describe what __new__ and
> __init__ do.
>
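
For anyone following along, the sequence described in the quoted text can be sketched roughly like this. The class names Point and Even and the events list are made up for illustration; this is only a sketch of the default behaviour, leaving metaclasses out of it:

```python
# Hypothetical log used only to record the order in which things happen.
events = []


class Point:
    def __new__(cls, *args, **kwargs):
        events.append("__new__ called")
        # object.__new__ is what actually creates the instance.
        obj = super().__new__(cls)
        events.append("object created")
        return obj

    def __init__(self, x, y):
        # By the time we get here the object already exists.
        events.append("__init__ called")
        self.x = x
        self.y = y


p = Point(1, 2)


class Even(int):
    """Hypothetical immutable subclass: rounds down to an even number."""

    def __new__(cls, value):
        # An int's value is fixed at creation, so it has to be chosen
        # here; __init__ would be too late to change it.
        return super().__new__(cls, value - value % 2)


n = Even(7)
```

After this runs, events holds the three entries in the order the quoted text lays out, and n compares equal to 6.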

My summary of our two views is this:  I am trying to look at things from 
a typical programmer's point of view.  The existence of __new__ is an 
advanced topic that many programmers never encounter.  Taking a quick 
scan through some large projects (Django, edX, SQLAlchemy, mako), the 
ratio of __new__ implementations to __init__ implementations ranges from 
0% to 1.5%, which falls into "rare" territory for me.  Among programs 
less than 5000 lines long, I'm sure the number is indistinguishable from 
0, though I'm sure someone will question my methodology here as well! :)

You are looking at things from an accurate-down-to-the-last-footnote 
detailed point of view (and have provided some footnotes!).  That's a 
very valuable and important point of view.  It's just not how most 
programmers approach the language.

We are also both trying to reduce cognitive dissonance, but again, you 
are addressing language mavens who understand the footnotes, and I am 
trying to help the in-the-trenches people who have never encountered 
__new__ and are wondering why people are using funny words for the code 
they are writing.

Another difference in our approach: do you name things based on how they 
work under the hood, or how they are used?  I hope we can all agree that 
when writing a user-defined class, the code that in C++ or Java would go 
into the constructor, in Python typically goes in __init__.  When I say 
that __init__ plays the role of constructor, again, I mean from the 
typical programmer's point of view when writing typical user-defined 
classes.
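
To make that concrete, here is a minimal sketch of the kind of class I have in mind. Account and its attributes are hypothetical, picked only for illustration; the point is that all of the setup lives in __init__ and __new__ never appears:

```python
class Account:
    """A typical user-defined class: no __new__ in sight."""

    def __init__(self, owner, balance=0):
        # The code a C++ or Java programmer would put in the
        # constructor goes here.
        self.owner = owner
        self.balance = balance

    def deposit(self, amount):
        self.balance += amount
        return self.balance


acct = Account("alice")
acct.deposit(50)
```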

Finding names for things is hard, and it's impossible to please both 
ends of this spectrum.

-- 
Ned Batchelder, http://nedbatchelder.com
