Names and identifiers

Wed Jun 6 21:23:31 EDT 2018

Disclaimer: I do not see Stefan's original post. I recall that he has set 
some sort of header on his posts which means they are not processed by 
Gmane, but unfortunately I no longer have any of his posts in my cache 
where I can check.

If anyone else is getting Stefan's posts, can you inspect the full 
headers and see if there is a relevant header?

On Wed, 06 Jun 2018 23:43:10 +0300, Marko Rauhamaa wrote:

> ram at zedat.fu-berlin.de (Stefan Ram):
> 
>>   I was asked about the difference between a name and an identifier. I
>>   was not sure.

The Python documentation makes it clear that "name" and "identifier" are 
considered synonyms and there is no difference:

    Identifiers (also referred to as names) are described
    by the following lexical definitions.

https://docs.python.org/3/reference/lexical_analysis.html#identifiers

So as far as Python is concerned, "name" and "identifier" mean precisely 
the same thing.

> Ah, a delicious terminology debate ahead!
> 
> Traditionally, an "identifier" refers to a syntactic (lexical, to be
> exact) unit. It is a sequence of Unicode code points inside Python text.
> Ultimately, then, an identifier is a string that satisfies some lexical
> constraints. For example, the first code point must be a letter.

I agree with this.
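
As a rough illustration of those lexical constraints, here is a minimal 
sketch using str.isidentifier() and the standard keyword module. The 
sample strings and the helper usable_as_name() are made up for this 
example. Note that an underscore is also accepted as the first 
character, and that keywords match the identifier pattern but are 
reserved.

```python
import keyword


def usable_as_name(text):
    """Return True if text is lexically an identifier and not a keyword."""
    return text.isidentifier() and not keyword.iskeyword(text)


# Sample strings chosen for illustration only.
candidates = ["x", "_private", "naïve", "aiuerhguqieh", "1x", "two words", "for"]

for text in candidates:
    print(f"{text!r}: identifier={text.isidentifier()} "
          f"keyword={keyword.iskeyword(text)} usable={usable_as_name(text)}")
```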

> "Name" is latter-day hypercorrect jargon for a variable.

I disagree with this.

"Hypercorrectness" doesn't come into this. If you wanted to say 
"pedantic", I'd agree: there are shades of meaning between "variable" and 
"name" and "name binding", but often (especially in informal language) we 
can gloss over those subtleties. But the differences can lead us astray 
(see below).

There is also legitimate and reasonable disagreement as to whether the 
term "variable" carries too much baggage to be useful in Python, but 
either way, there is no doubt that *names are not variables* in the same 
way that the name "Marko" is not *you*, the person.

We wouldn't want to say that "names are people", or "names are pets", or 
"names are ships", and we shouldn't say that "names are variables". Names 
refer to people, pets, ships and variables. They aren't themselves 
people, pets, ships or variables.

https://en.wikipedia.org/wiki/Use%E2%80%93mention_distinction

Also relevant:

https://en.wikipedia.org/wiki/Map%E2%80%93territory_relation

> It is an
> abstract entity inside the Python runtime engine. It is a memory slot
> that can hold a temporal reference to an object.

That is certainly wrong for Python, as Python variables typically are 
found in namespaces (implemented as dictionaries) not fixed memory slots.

Even if an implementation should use memory slots for variables, that is 
not mandated by the language. It's a mere implementation detail.
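
To make the "namespaces are dictionaries" point concrete, here is a 
small sketch that runs code against an ordinary dict acting as the 
global namespace. The names "spam" and "eggs" are arbitrary. This shows 
one way to observe the behaviour, not how any particular interpreter 
must store its variables.

```python
namespace = {}                        # an ordinary dict used as a global namespace

exec("spam = 1 + 1", namespace)       # binding a name adds a key to the dict
exec("eggs = spam * 10", namespace)   # looking a name up reads from the dict

# exec() also inserts a "__builtins__" entry; filter it out for display.
bound = {k: v for k, v in namespace.items() if k != "__builtins__"}
print(bound)

del namespace["spam"]                 # removing the key unbinds the name
try:
    exec("spam", namespace)
    still_bound = True
except NameError:
    still_bound = False
print(still_bound)
```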

> In Python text, you refer to these memory slots using identifiers
> (lists, dicts and tuples have memory slots as well, but I'll leave those
> out of this discussion). One identifier can refer to more than one
> memory slot. In fact, there is no limit to the number of memory slots
> that are referred to using an identical identifier (for example, you
> could have a parameter of a recursive function).

Not the best example, because of course Python does enforce a limit on 
the number of recursive calls.
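
For reference, a minimal sketch of that limit in action. The function 
recurse() is a throwaway written for this example; the actual limit 
varies between interpreters and can be changed with 
sys.setrecursionlimit().

```python
import sys

limit = sys.getrecursionlimit()   # the interpreter's current recursion limit


def recurse(depth=0):
    """Call itself forever; the interpreter stops it at the limit."""
    return recurse(depth + 1)


try:
    recurse()
    hit_limit = False
except RecursionError:
    hit_limit = True

print(limit, hit_limit)
```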

But putting aside memory limits and similar, I don't think I like the 
terminology you use. I wouldn't say that one identifier can refer to 
multiple variables (absolutely not "memory slot", that's just wrong). If 
that were the case, and you mentioned the name "x", how would the 
interpreter know which variable you meant?

Rather I would say that within a single context (or namespace), each 
identifier refers to no more than one variable. But the same name can be 
used in multiple contexts:

- we can have a built-in "x", a global "x", and a local "x" at 
  the same time;

- the name resolution rules make it clear which one applies in
  context;

- each function invocation results in a new context, so the local
  variable "x" in one call and the local variable "x" in another
  call are different "x"es.
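
A minimal sketch of the last two points, using made-up functions. The 
built-in layer is left out only to avoid modifying the builtins module 
for a demonstration.

```python
x = "global x"                 # a binding in the module (global) namespace


def show_local():
    x = "local x"              # a separate binding, local to this call
    return x


def show_global():
    return x                   # no local x here, so lookup finds the global


def countdown(n):
    x = n                      # each invocation gets its own local x
    inner = countdown(n - 1) if n > 0 else []
    return [x] + inner         # this call's x is untouched by the inner calls


print(show_local(), show_global(), countdown(2))
```

Each call to countdown() binds its own "x", which is why the outer 
calls still see their own values after the inner calls return.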

>>   Shouldn't it say,
>>
>> NameError: identifier 'aiuerhguqieh' is not defined
> 
> That's an odd way to put it.

Since "name" and "identifier" are synonyms, it wouldn't be wrong to put 
it that way, but nor would there be any advantage.

>>   or even,
>>
>> NameError: identifier 'aiuerhguqieh' is not a name
> 
> That's better.

No, that is completely wrong.

Of course "aiuerhguqieh" is a name. It matches the lexical definition of 
a name, and it occurred in the correct place to match the grammar rules 
for where names can occur (otherwise you would have had a SyntaxError). 
As far as the interpreter is concerned, it is a name, and that's all that 
matters. The programmer's intent hardly comes into it.
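
The distinction can be checked directly. In this sketch, compiling the 
text succeeds because it is a valid name in a valid position; only 
evaluating it fails, because nothing is bound to it. The second string 
is an invented counter-example that is rejected before it ever runs.

```python
source = "aiuerhguqieh"

# Compiling succeeds: lexically and grammatically this is a valid name.
code = compile(source, "<demo>", "eval")

# Evaluating it fails only because nothing is bound to that name.
try:
    eval(code, {})
    outcome = "defined"
except NameError as exc:
    outcome = str(exc)

# By contrast, text that is not a valid name is rejected at compile time.
try:
    compile("1aiuerhguqieh", "<demo>", "eval")
    rejected = False
except SyntaxError:
    rejected = True

print(outcome)
print(rejected)
```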

(Maybe the person who typed that was under the misapprehension that they 
could use base-32 numeric literals, and intended it to be the decimal 
number 381626039374006737. Or maybe it was typed by a cat walking over 
the keyboard and there was no semantic intention at all.)
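
Python has no base-32 literals, but int() accepts an explicit base up 
to 36, so the figure above is easy to check:

```python
text = "aiuerhguqieh"

# Interpret the letters as base-32 digits (0-9 then a-v).
value = int(text, 32)
print(value)  # 381626039374006737
```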

> Conceptually, every identifier in every context refers to a memory slot.

That's false.

Of course the value of a variable must, if it is to exist inside a 
digital computer, be located *somewhere* in memory space at some moment 
in time. That's mere location; it isn't important, and conceptually it is 
not part of the model of names and name bindings.

(See what I mean about the unfortunate baggage of the C model for 
variables as memory locations and how it leads people astray?)

The name/identifier is no more conceptually limited to a specific memory 
slot or location than the name/identifier "Marko" refers to the position 
in space you happen to be sitting in right now. When you move, the name 
follows you; it doesn't stick to the location.

Likewise, if your Python interpreter (such as Jython or IronPython) has a 
garbage collector that can move objects, then when an object moves, the 
names referring to it follow the object.

Names in general do not track memory slots (except as a mere accident of 
implementation). If they did, they would dangle when the objects move.

> Only a finite subset of them holds a reference to an object at any given
> time.

> Or to put it in Python-speak, all names exist, but not all of them are
> bound.

Or to put it in terms which are less wrong, all names are names, but only 
some of them are defined.

Names that have never been used only exist as some sort of purely 
Platonic abstraction. They don't *actually* exist in reality.

Since there is no upper bound on the length of Python names (except that 
of memory), we can be pretty sure that the Vast majority of them have 
never been, and never will be, written down in source code anywhere. Even 
if people are still programming in Python in a million years, we could 
not have actually used anything more than a vanishing subset of potential 
names.

To put some hard numbers to it... the set of all possible names includes 
all the combinations of at least 5000 possible Unicode characters up to a 
maximum length of 2**64 (given existing computer limitations). So 
something of the order of 

    5000**(2**64) = 5000**18446744073709551616

or roughly 2**221360928884514619392 possible names, which is 1 followed 
by approximately 66 million trillion zeroes.[1]
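
The arithmetic can be sketched with logarithms, since the number itself 
is far too large to compute. The power-of-two figure above corresponds 
to rounding 5000 down to 4096 = 2**12; using 5000 itself gives about 68 
rather than 66 million trillion zeroes, which does not change the 
conclusion. The variable names are made up for this example.

```python
import math

alphabet = 5000          # lower bound on usable characters, from the text
max_length = 2**64       # assumed maximum name length, from the text

# The full number is far too large to compute, so work with logarithms.
bits_rounded = 12 * max_length                      # treats 5000 as 4096 == 2**12
zeros_rounded = bits_rounded * math.log10(2)        # decimal zeroes for 4096**(2**64)
zeros_for_5000 = max_length * math.log10(alphabet)  # decimal zeroes for 5000**(2**64)

print(bits_rounded)
print(f"{zeros_rounded:.3e}")
print(f"{zeros_for_5000:.3e}")
```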

That is so much larger than the entire universe that the only way we can 
claim they all exist is to posit some sort of Platonic ideal existence. 
They exist only in the abstract, not in reality.

[1] Admittedly the Vast majority would be utterly impractical to use.

-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson