why cannot assign to function call

Sun Jan 11 13:28:59 EST 2009

Aaron Brady <castironpi at gmail.com> wrote:

> True or not, it requires the reader to know what references are.  And,
> since your definition conflicts with the C++ definition, it's not
> clear that the requirement is good.

I blame C++ for co-opting a perfectly good word with an established
well-understood meaning, and applying it to a half-baked and
inconsistent concept.

> More terminology: is 'a' a variable, a reference, or an object, in
> 'a=2'?
> 
> It's a variable; it refers to an object.
> It's a variable, which refers to an object.
> It's a variable, which is a reference to an object.
> It's a reference to an object.

It's an /identifier/ (2.3); or, if you want a shorter word, a /name/
(also 2.3).  (I'll go with `name'.)  In any given context, the name
might or might not mean something.  Names are assigned meanings by an
/environment/; if the environment assigns a meaning to a name, we say
that the name is /bound/ to that meaning; otherwise it is /unbound/.

The rules for determining the environment at any given point in a
program vary with language.  Python is fairly simple: it is /lexically
scoped/, which means that a great deal of information about the
environment can be determined simply by analysing the program text
statically.  (Compare `dynamic scope', where the environment at any
point in a program's execution depends quite subtly on which functions
are active at that point.  The main difference is that, under lexical
scoping, a function's environment is an extension of the environment in
which it was defined, whereas under dynamic scoping a function's
environment is an extension of the environment of the caller.  Languages
can offer both kinds of scoping simultaneously: both Common Lisp and
Perl do this.)
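A minimal sketch of the difference, assuming Python 3; the names `x', `show_x' and `caller' are made up for illustration.  Under lexical scoping the free occurrence of `x' in show_x is resolved in the environment where show_x was defined, so the caller's local `x' is never consulted.  A dynamically scoped language would return the caller's binding instead.

```python
x = "module-level x"

def show_x():
    # `x` is a free occurrence here: show_x's block does not bind it,
    # so it is looked up in the environment where show_x was defined.
    return x

def caller():
    # This `x` is bound by caller's block and is invisible to show_x.
    x = "caller's x"
    return show_x()

print(caller())  # module-level x
```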

Python is /block-structured/: various syntactic forms (/blocks/)
introduce new environments which are extensions of the environment of
the enclosing block, i.e., within the inner block, a few names are bound
to new meanings, while all the other names retain whatever meaning they
had in the enclosing block.  Names whose meanings are modified are said
to be /bound by the block/.  A name which occurs within the block is a
/bound occurrence/ if the name is bound by the block, or a /free
occurrence/ otherwise.
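CPython records the outcome of this analysis on each compiled function, which makes the bound/free distinction easy to inspect.  A small sketch, assuming CPython 3; the `__code__' attributes used are CPython implementation details, and the function names are hypothetical:

```python
def outer():
    y = 1                  # y is bound by outer's block
    def inner(z):
        w = z + y          # z and w are bound by inner; y occurs free
        return len([w])    # len is free in both functions (a builtin)
    return inner

inner = outer()
code = inner.__code__
print(code.co_varnames)  # ('z', 'w'): names bound by inner's block
print(code.co_freevars)  # ('y',): free, resolved in outer's block
print(code.co_names)     # ('len',): free, resolved globally/in builtins
```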

Python has what one might call `implicit binding': a name appearing
alone (or as part of a destructuring pattern) on the left-hand side of
an assignment which would otherwise be a free occurrence, and in the
absence of an explicit declaration such as `global' or `nonlocal', is
implicitly bound to a fresh variable by the smallest enclosing block.
Python also has explicit binding: e.g., the parameter names of a
function are explicitly bound to new variables by the function.
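A sketch of implicit versus declared binding, assuming Python 3 (`nonlocal' does not exist in Python 2); all the function names are hypothetical.  The last case shows the consequence of `smallest enclosing block': an assignment binds the name for the whole block, so reading it earlier in that block fails rather than falling back to the module variable.

```python
counter = 0

def implicit():
    counter = 1          # assignment implicitly binds a fresh local `counter`
    return counter

def explicit_global():
    global counter       # declaration: `counter` here means the module variable
    counter += 1
    return counter

def make_incrementer():
    n = 0
    def inc():
        nonlocal n       # rebind the variable bound by make_incrementer
        n += 1
        return n
    return inc

def read_then_assign():
    seen = counter       # `counter` is bound by this block (see next line),
    counter = 2          # so this read finds an unassigned local variable
    return seen

implicit()
print(counter)                 # 0: the module-level variable is untouched
explicit_global()
print(counter)                 # 1
inc = make_incrementer()
inc()
print(inc())                   # 2
try:
    read_then_assign()
except UnboundLocalError as exc:
    print(type(exc).__name__)  # UnboundLocalError
```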

So, where were we?  `a = 2'.  The `a' is a name.  It appears alone on
the left-hand side of an assignment: this is therefore a candidate for
implicit binding.  Let's assume that `a' is bound to a variable, either
as a result of this or some other implicit binding, or an explicit
declaration.  (I don't believe Python has meanings other than variables
which might be bound to a name, so this is pretty safe.  Scheme, for
example, puts macros and special syntactic keywords in the same
namespace as variables, so you can, for example, lexically rebind the IF
keyword as a function, should you so wish.  This is probably a bad
idea.)

So, `a' is bound to a variable.  The variable, like all Python
variables, stores a reference.  We don't know what this reference might
be before the assignment, but afterwards, we know that it must be a
reference to an integer object representing the value 2.
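Because a variable stores a reference rather than the object itself, two variables can refer to the same object.  A short illustration, assuming Python 3; a list is used so that the difference between mutating the object and rebinding the variable is visible:

```python
a = [1, 2]
b = a            # b's variable now stores a reference to the same list
print(a is b)    # True
b.append(3)      # mutate the object through one reference...
print(a)         # [1, 2, 3]  ...and the change is visible through the other
b = [9]          # rebinding b stores a new reference; a's variable is unaffected
print(a, b)      # [1, 2, 3] [9]
```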

This is cumbersome to talk about; in informal conversations, one often
talks about `the variable a' or even `the integer a'.  It's important to
realise that such phrases are abbreviations for convenience, and do not
directly correspond to reality.

> Similarly, is '2' an expression, a reference, or an object?
>
> It's an expression; it evaluates to an object.
> It's an expression; it evaluates to a reference to an object.
> It's an expression; it expresses an object.
> It's an expression; it refers to an object.
> It's a reference to an object.
> It's an object.

It's an /integer literal/ (2.4.3).  It's also an expression, because all
literals are expressions.  We can confidently predict that the value of
this expression (i.e., the result of evaluating it) is an integer object
representing the value 2.
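The split between the name, the literal and the object the literal evaluates to is visible in the parse tree.  A sketch assuming Python 3.8 or later, where the `ast' module represents literals as `ast.Constant' nodes (older versions used `ast.Num'):

```python
import ast

tree = ast.parse("a = 2")
assign = tree.body[0]
target, literal = assign.targets[0], assign.value

print(type(target).__name__, target.id)   # Name a: the left-hand side is a name
print(type(literal).__name__)             # Constant: the integer literal
print(isinstance(literal, ast.expr))      # True: the literal is an expression

# Evaluating the literal expression yields an int object representing 2.
value = eval(compile(ast.Expression(body=literal), "<literal>", "eval"))
print(type(value).__name__, value)        # int 2
```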

> In the case of the Socratic, non-bludgeoning, dialogue with a student,
> if the student can be trusted to question intelligently, s/he can be
> expected, on our telling him/er, "'a' is an object", to ask, "What
> object?"  Whether to expect audience interaction, and what
> interaction, is a big component in the choice of method of a
> demonstration.

Declaring `a' to be an object begs many questions, such as `what type
does this object have?'.  This is an unfortunate question, because the
naive answer (e.g., from the above: `it's an integer') comes up against
the problem that one can later say `a = "mumble"', causing one to
declare `it's now a string'; but this contradicts 3.1: `An object's
/type/ is also unchangeable' (with a footnote about how this might not
be the complete truth).  We could try to escape by saying that `a' is
now a /different/ object, but that's strange because its appearance
(i.e., the letter `a') hasn't changed at all.
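What actually happens is that the name is rebound while the int object keeps its type.  A quick check, assuming Python 3 (`first' is an extra, hypothetical name used only to keep hold of the original object):

```python
a = 2
first = a                # keep a second reference to the int object
a = "mumble"             # rebind the name `a`; the int object is untouched
print(type(first).__name__, type(a).__name__)  # int str
print(first is a)        # False: `a` now refers to a different object
print(first)             # 2
```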

There are other problems with naively declaring that `a' is an object.
In particular, if I just type `a' at a fresh Python, it says `name 'a'
is not defined'.  (Among other things, that tells me that `a' is a
name.)  Now the question `which object?' naively yields the answer
`err... none -- no, not None, but none, there isn't one.'.
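The same experiment can be run against an explicit, empty environment.  A sketch assuming Python 3, using a plain dict (the hypothetical `namespace') as the global namespace for `eval' and `exec':

```python
namespace = {}                   # a fresh global environment: `a` is unbound
try:
    eval("a", namespace)
except NameError as exc:
    print(exc)                   # name 'a' is not defined

exec("a = 2", namespace)         # the assignment binds `a` in that environment
print(eval("a", namespace))      # 2
```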

At some level within the compiler, the `a' you type gets represented by
an `object', though whether 

> >>> 300+301+302 is 300+301+302
> False
> 
> There are 10 objects created in the evaluation of this, 11 if
> including 'False'.  They (their integer contents), in order, are: 300,
> 301, 302, then 300, 301, 302, not the same ones, again, then 601, 903,
> 601, 903, and then possibly 'False' *.

Here we enter some philosophical (and implementation-specific)
territory.  Does the evaluation of a literal /create/ an object?  The
answer seems to be `maybe'.

>>> 123456789 is 123456789
True

Hmm.  So at most one of the occurrences created a new object.  I
wonder...

>>> def foo(x):
...   if x is 123456789: print('snap!')
...   return 123456789
...
>>> foo(123456789)
123456789
>>> foo(_)
snap!
123456789

It seems as if foo has a private copy of the magic number, rather than
generating a new one (indeed, two new ones) each time.
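The `private copy' lives in the function's constant table.  A sketch assuming CPython 3, whose compiler merges equal constants within one code object; `co_consts' is an implementation detail and the function name is hypothetical:

```python
def same_constant():
    first = 123456789
    second = 123456789
    return first is second   # both lines load the same entry from co_consts

print(same_constant())                                    # True
print(same_constant.__code__.co_consts.count(123456789))  # 1

# Integers built at run time are not shared in the same way.
print(int("123456789") is int("123456789"))               # False
```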

There is only one False object in the system (like there is only one
None object).  We can be sure that it didn't create that False specially
for us.
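A quick check that the booleans and None behave as singletons, assuming Python 3 (where calling `type(None)' with no arguments hands back the existing None rather than making a new object):

```python
print((1 == 2) is False)      # True: comparisons return the shared False
print(bool(0) is False)       # True: bool() never makes a new False either
print(type(None)() is None)   # True: NoneType() returns the one None
```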

Given this, it doesn't seem right to talk about an expression
necessarily creating objects, especially ones whose value appears
literally.  As another example, with strings this time:

>>> def foo(): return '!mumble'
...
>>> foo() is '!mumble'
False
>>> foo() is foo()
True

(The `!' suppresses the CPython compiler's interning process.)  Tuples
behave similarly.  Lists, obviously, do not -- otherwise we could
observably change the function's behaviour by mutating the list.
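The three cases side by side, plus the run-time counterpart of compile-time interning.  A sketch assuming CPython 3; the function names are hypothetical, and the identity results are CPython implementation details rather than language guarantees:

```python
import sys

def get_str():
    return '!mumble'

def get_tuple():
    return (1, 2, 3)         # a tuple of constants is stored as one constant

def get_list():
    return [1, 2, 3]         # a list display builds a new list on every call

print(get_str() is get_str())      # True
print(get_tuple() is get_tuple())  # True
print(get_list() is get_list())    # False

# Mutating one returned list cannot change what later calls return.
get_list().append(4)
print(get_list())                  # [1, 2, 3]

# Strings built at run time are distinct objects until explicitly interned.
a = "".join(["!", "mumble"])
b = "".join(["!", "mumble"])
print(a is b)                          # False
print(sys.intern(a) is sys.intern(b))  # True
```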

(Lisp makes all of this much more explicit, since the input to the
compiler consists of objects -- constructed by the reader -- rather than
text, and one expects that, for example, self-evaluating and quoted
objects evaluate identically to themselves, rather than creating new
copies.)

> * Based on a run of a build of 'r26:66714' with extra output from
> 'PyInt_FromLong' and command line '-c "300+301+302 is 300+301+302"'.
> Actual output showed 624 int objects created (!) with this function
> alone, ending with: 300 301 302 300 301 302 11788072 8192 11788072 601
> 903 601 903!  Any ideas about those extra three?

My guess is that the first six at least are generated at compile-time,
and then probably merged when the literal table is constructed.  
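On a current CPython the compile-time part is easy to observe, because the compiler folds constant arithmetic and stores each distinct constant once.  A sketch assuming CPython 3; the exact contents of `co_consts' vary between versions, so the printed values are left for you to inspect rather than predicted here:

```python
# Two statements computing the same sum (exec mode, and no `is`, so no
# `is`-with-a-literal warning is triggered).
source = "x = 300 + 301 + 302\ny = 300 + 301 + 302"
code = compile(source, "<example>", "exec")

# CPython folds each sum to 903 while compiling; the constant table shows
# which literal values survive and whether 903 is stored only once.
print(code.co_consts)
print(code.co_consts.count(903))
```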

Some digging about with gdb shows that most of the early integer object
constructions are from random modules imported on startup (even with
`-S').  The `extra three' -- in your case 11788072 8192 11788072 -- are
caused within PyAST_Compile.  The first is by PySTEntry_New,
constructing a new symbol table, and the argument is actually the
address of a _mod structure explaining what we're meant to be compiling.
The second is the value of FREE << SCOPE_OFF, from within
update_symbols, and the last is in PySymtable_Lookup, and again it's
converted the _mod address to an integer.
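The symbol tables built inside PyAST_Compile can also be examined from Python through the standard `symtable' module, which runs the same scope analysis and reports the resulting local/free classification.  A sketch assuming Python 3, with a made-up source snippet:

```python
import symtable

source = """
def outer():
    y = 1
    def inner():
        return y
    return inner
"""

top = symtable.symtable(source, "<example>", "exec")
outer = top.get_children()[0]    # the table for outer's block
inner = outer.get_children()[0]  # the table for inner's block

print(outer.get_name(), outer.lookup("y").is_local())  # outer True
print(inner.get_name(), inner.lookup("y").is_free())   # inner True
```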

This doesn't actually leave me much the wiser.  But basically they're
artifacts of internal compiler wrangling.

-- [mdw]