Scoping "bug" ?

Tue Aug 22 06:01:39 EDT 2000

"Kevin Bailey" <noone at nowhere.com> wrote in message
news:39A20A44.56F02F53 at nowhere.com...
> Alex Martelli wrote:
> >
> > As long as you don't mind getting such warnings when they're
> > not really warranted
>
> huh ? There was a line of code in my example that, when run,
> halts the Python interpreter. I'd like to know about such
> lines.

There is absolutely no line of code in your example that
halts the Python interpreter.

There IS a line of code which, when executed, raises an
exception, which, if you wished, could be caught and acted
upon by 'outer layers' of your code.
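
To make the distinction concrete, here is a minimal sketch (the
names 'broken' and 'outcome' are made up for illustration, they
are not from your example) of an outer layer catching such an
exception and carrying on:

```python
def broken():
    # 'count' is assigned later in this function, so it is a local name;
    # reading it before that assignment raises an exception at run time.
    print(count)
    count = 1

try:
    broken()
    outcome = "no exception"
except NameError as err:
    # The unbound-local error is a kind of NameError, so this clause catches it.
    outcome = "caught " + type(err).__name__

# Execution simply continues here; the interpreter has not halted.
print(outcome)
```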

Consider:

def example():
    for i in range(100):
        if i: foo.append(i)
        else: foo=[]
    return foo

This is perfectly correct Python code that returns a list
of 99 elements (the numbers from 1 to 99 included).

How can it be determined that this code IS perfectly correct
and is going to raise no exceptions whatsoever?  Answer:
only by compile-time basic-block analysis.  The compiler
would have to 'know' that range always produces a list
starting with 0, so the else branch is executed first and
sets the local variable foo to a list (an empty list) so
that the .append method calls, when they come, target a
valid object for this method.

But, hmmm, DOES the compiler indeed "know" about range()...?
Suppose your script continues with:

def range(x):
    return [3,2,1,0,1,2,3]

Oops!  Now the function example _does_ cause an exception!
Specifically, a NameError, in Python 1.5.2 (a more specific
one regarding an unbound local name in Python 1.6 and later,
but that's basically the same thing).
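
Put together as one script, this can be sketched as follows.  The
names 'before' and 'after' are just mine for illustration, and
the sketch assumes the re-binding happens at module level in the
same file, which is only one of the ways it could come about:

```python
def example():
    for i in range(100):
        if i:
            foo.append(i)
        else:
            foo = []
    return foo

# With the built-in range, the first item is 0, so foo is bound in time.
before = example()

# Re-bind the global name 'range' after 'example' has been compiled.
def range(x):
    return [3, 2, 1, 0, 1, 2, 3]

try:
    after = example()
except NameError as err:
    # The first item is now 3, so foo.append runs before foo is ever bound.
    after = type(err).__name__

print(len(before), after)
```

The very same compiled function is fine on the first call and
raises on the second; nothing in its own source has changed.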

So, how would the compiler react to the definition of
function 'example'?  Warn about it, or not?  Note that
the modified range function need not be defined in the
same module as example -- somebody somewhere could always
be re-binding the name 'range' in the global namespace
for 'example' to whatever is wished.  So, when the compiler
compiles 'example', it can make no assumptions...

There ARE cases in which it can be _proved_ that a certain
local variable will be unbound -- because it's accessed
early enough in the function, that nothing may possibly
have happened to cause a binding, yet.  They are definitely
the exception, not the rule, among errors having to do with
unbound local variables, and the cost of finding out about
them is (I believe) very close to full basic block analysis,
which is why I dispute that the price/performance ratio of
such diagnosis is worthwhile -- *within Python's fundamental
design choices*.

It's certainly possible to design a language which is ABOUT
compile-time diagnosis of errors, or possible errors.  Indeed,
not only is it possible, but it has been done many, many times.

However, those many languages focused on compile-time diagnosis
of some possible errors are definitely not Python.  I would not
compromise Python's strengths in an attempt to gain a fraction
of the small benefit provided by compile-time diagnosis rather
than run-time diagnosis with very fast turn-around (a compiler
that is so fast you hardly notice when it's run:-).

> > , with no good way to turn them off except
> > by inserting otherwise-useless bindings,
>
> I guess I've misunderstood something. It seemed to me that
> you supported the 'global' directive (which is necessary
> to fix my example.)

If you want to re-bind a global inside a function, you do
need to declare it as global (in Python).  I do not "support"
using any more globals than is strictly necessary, and in
particular re-binding them from inside functions; I think
globals are a necessary evil, and re-binding them an even
rarer need -- when I face such a need, I normally take a step
back and find out where I've gone wrong in my design, as a
better design will probably eliminate it (and it's never too
late to make a design better -- that's what using a fluid
language is mostly about...).

Most of the time, when I do hit an unbound-local-variable
error, it's exactly that -- I forgot to bind the local
variable at the start.  Rarely did I mean to re-bind a
_global_ variable AND forgot to declare the fact: since
re-binding of globals from within functions is something
which I detest, I'm VERY aware of when I'm doing it anyway
(because I need a quick hack right now, and am only going
to refactor things and make them better later -- sometimes
one does work under huge time-pressure to show a working
prototype, for example); so, I don't tend to forget the
'global foo' at the start.

If a Python compiler protested about the above 'example'
function, to shut it up I'd probably have to give local
variable foo an otherwise-useless binding right at the
start of the function, before the loop.  That's how it
usually works in languages which do carp about unbound
local variables, such as Java, in my experience.
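
A rough sketch of what that would look like (the 'values'
parameter is my own stand-in for range(100), so that the two
orderings can be tried side by side):

```python
def example(values):
    foo = None  # otherwise-useless binding, present only to placate a checker
    for i in values:
        if i:
            foo.append(i)
        else:
            foo = []
    return foo

# When a 0 comes first, the dummy binding is overwritten and never matters.
print(example([0, 1, 2, 3]))

# When it does not, the code still fails, only with a different exception.
try:
    example([3, 0])
except AttributeError as err:
    print("still fails, just differently:", err)
```

So the extra binding quiets the hypothetical warning without
making the function any more correct.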

> The alternative is the way I
> assumed it worked - variable assignment is definition.

Even if 'assignment' worked as you assumed, this would
not help in the least with the above 'example' function,
would it?  If there is no global foo, and range happens
not to start at 0, foo.append would still raise the
exception; if there IS a global foo, then it would get
modified by the first few iterations; then, when a 0
comes up in the sequence, what should happen?  Should the
global foo be re-bound to empty, or a local one magically
created and used for later iterations...?

I'd much rather not have to wrestle with such subtleties.

"If a name is bound anywhere within the local block,
then all uses of that name in the block are in the
local namespace": THIS is what I find simple, linear,
and unambiguous.  It's also usually what I want -- see
above for my attitude about globals.
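
As a small sketch of that rule in action (all the names here
are invented for the illustration):

```python
counter = 10  # a global

def read_only():
    # 'counter' is never bound in this block, so it refers to the global.
    return counter + 1

def binds_somewhere():
    # The assignment below makes 'counter' local to the WHOLE block,
    # so this first use fails instead of reading the global.
    value = counter + 1
    counter = value
    return counter

def rebinds_global():
    global counter  # explicit declaration: every use here is the global
    counter = counter + 1
    return counter

print(read_only())
try:
    binds_somewhere()
except NameError as err:
    print(type(err).__name__)
print(rebinds_global())
```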

> > I think it could be
> > arranged. Otherwise, you'd need to add basic-block analysis
> > to the Python compiler, which currently has no other need for
> > it.  The trade-off does not seem particularly worthwhile to me.
>
> It was worthwhile enough for Gnu C/C++. I find it handy, as

The design choices of a statically-typed language are, of
course, EXTREMELY different from those of a dynamically-typed
one; the implementation architecture for those languages
will differ enormously, if each is to be well-tuned for the
specific language it addresses.

The key language-design item in C++, introduced to make it
less likely that uninitialized variables will be used, is to
*declare a variable 'just in time' -- when you're ready to
initialize it* (as opposed to C's older choice that made you
declare every variable at block-start; the newer C standard
has adopted the C++ solution here).  This is a very good
decision *in a language where declarations are mandatory*
for all variables, etc.  Basic-block analysis being needed for
optimization of such a static language, it's good to use it
for diagnostic purposes as well if those can be done in a
reliable way (as I recall, the 'variable used before it's
initialized' warning only comes when compiling WITH the
optimization flag turned on, doesn't it?  No basic block
analysis if optimization is not requested).

But Python does not attach type to a variable (label): the
type resides _in the object_, and a variable can be bound
to many different objects at different times.  This fluidity
is a crucial part of Python.  Clearly, design choices made
for completely different languages, and implementations
thereof, do not translate simply or linearly.
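
A tiny sketch of what I mean by the type residing in the object
(the names 'label' and 'kinds' are arbitrary):

```python
label = 42
kinds = [type(label).__name__]      # the object bound to 'label' is an int

label = "forty-two"                 # same name, re-bound to a string
kinds.append(type(label).__name__)

label = [4, 2]                      # and now re-bound to a list
kinds.append(type(label).__name__)

print(kinds)
```

The name itself never had a type to begin with; only the objects
it was bound to, in turn, did.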

> you implied, not for my own code but, you know, for beginners.
> If Guido has his way, "everyone" will be programming in Python.
> I think it would be useful to them.

It is quite possible that the language/implementation best
suited to a beginner may be different from that more suitable
for general consumption.  One interesting example is the
'DrScheme' implementation of the Scheme programming language:
it's explicitly intended for teaching, and lets the user (or
the instructor) fine-tune the exact language being accepted
and some further points (format of outputs, for example).

I'm not ruling out the possibility that a 'DrPython', intended
for beginners, should have different design parameters (at
least at the implementation level, not necessarily in terms
of language) than a Python intended for general consumption.

Beginners are notoriously averse to case-sensitive languages,
for example.  'DrPython' might address this, while avoiding
the very controversial change of making Python itself case
insensitive, by at least giving warnings any time it finds an
identifier used with different capitalization.

Similarly, 'DrPython' might at least provide a warning, if
not forbid the practice outright, any time it finds a local
identifier 'shadows' (hides) a global one.  And it might
well provide automatic, implicit 'lint' (static checking)
at each compile -- the beginner is likely to be producing
little enough code, that the resulting slow-down is probably
not going to be a problem, while the diagnostics might well
be a good help to him/her.

"What should be in a DrPython development environment, to
help beginners learn the basics while not interfering with
their future migration to `full Python'", is a fascinating
theme.  But, if you're also interested, I suggest we take
it to a separate thread, with its own clearer subject, as
it really has marginal and partial relevance to the specific
problem.  Personally, I consider the single hardest issue
to be integer division: 7/2 _should_ return 3.5, NOT 3, and
I consider it a Python wart that it does return 3.  But how
do you make life easier for beginners on this, if Python
integer division remains truncating, as it must for backward
compatibility?  vp (born visualpython) has a pseudo-module
that turns integer division into non-truncating (producing
floating point), but of course existing Python programs would
break in droves if you just imported that module...

> > A key design choice of Python is to compile *very, very fast*,
>
> If this is even true, it is a bad choice. Make your .pyc's
> and move on. I might be able to believe decisions to speed
> execution but compiling ? Nope.

Compilation time is overhead when you're in the development
cycle -- code, test, recode, retest, etc.  Execution speed
matters, but so does compilation speed.  So, I strongly
disagree that compiling fast "is a bad choice": if I thought
so, I would not be using Python (nor any other scripting
language, since they all rely on that, in different guises).

There are, of course, also language-design decisions that
come from a desire to enhance execution speed.  Singling
out local variables is among them: by identifying local
variables at compile time, their 'lookup' at runtime gets
MUCH faster (a fixed-index slot into a list that was
prepared ahead of time, rather than dictionary-lookup).
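
One way to peek at this split, as a sketch that assumes a
current CPython (where a function's code object is reachable as
'__code__' and carries 'co_varnames' and 'co_names'; the function
and variable names are made up):

```python
scale = 3  # a global name

def scaled_sum(items):
    total = 0
    for item in items:
        total = total + item * scale
    return total

code = scaled_sum.__code__
local_slots = code.co_varnames   # locals, fixed at compile time and accessed by index
other_names = code.co_names      # names left to be looked up by name at run time

print(local_slots)
print(other_names)
```

The locals end up in the first tuple, while 'scale' is only
recorded as a name to be searched for when the function runs.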

> > How many of
> > your errors are intended uses of global variables where you
> > forgot the 'global x' declaration?  For me, surely less than 1%
>
> Fine. What about code where it's possible to use a _local_
> variable without binding it ?

See the above 'example'.  If it's broken, and my unit tests
don't catch such gross breakage, then what chance do my tests
have to catch *really* hard errors?!

That's the basic objection to gearing a language to compile
time diagnosis of errors: such diagnosis offers little extra,
in terms of development solidity, compared to what even very
elementary unit tests MUST catch anyway.  So, the rigidity
does not truly pay for itself in these terms (although it
might have other benefits -- read 'optimization').

What little controlled research has been done on this seems
to bear this out; programs developed in 'scripting' languages
appear to be a little bit more reliable than C++ ones:
http://www.ubka.uni-karlsruhe.de/vvv/ira/2000/5/5.text

If you have any URLs to _empirical_ results showing otherwise,
I'll most gladly receive them.  My current working hypothesis
on the matter is that the extremely fast turnaround of scripting
languages encourages more and better unit testing, which in turn
contributes more to reliability than rigidity and compile-time
checks do; Beck's and others' "Extreme Programming" (although
born in a Smalltalk environment, not Python, it seems to be quite
as applicable here) may be partly seen as an attempt to capitalize
on these strengths (http://c2.com/cgi/wiki?CodeUnitTestFirst and
most of the rest of the Wiki).

> > I would not approve of a 1%
> > slowdown of compilation to diagnose these specific errors
>
> 1% is 1% no matter how many test cases you have. You would
> notice an extra minute on a 2 hour test ?

No, but why would the "improvements" stop there?  If a 1%
slowdown can help me get, say, 0.1% of my errors in this
way, there must be tens of other "improvements" with similar
or better cost/benefit ratios.  So I'd end up with highly
noticeable 100% slowdowns to catch (at compile time) about
10% of my errors (which the tests would catch anyway).

> Now compare this to the time it would take all those beginners
> to find the source of an error like this.

When the beginner gets an exception that says 'Unbound local
name X' for the first time (as will happen, in Python 1.6 and
later), he or she may take some time to understand what it
is about -- but what difference does it make, to the beginner,
whether it comes right *before* the function starts executing
(i.e., while it's being compiled), or right *after* such a
start of execution?  The substance of the error message is
identical, and, if anything, more information is available at
run-time than would be at compile-time, so the error message
should be just as clear.

So, if you're arguing for better/clearer-to-beginners error
messages (or, for motherhood, apple pie, etc), fine; the
specific exception about a local-name being unbound is, I
think, better than the slightly more generic one about some
name being unbound, from all POV's.  But, if we're talking
about making compile-time effort in order to give the same
diagnosis one bananosecond earlier, then you've lost me.

> > Maybe, a separate lint-like utility would be a better approach:
>
> The existence of lint didn't help the decision to keep "uninitialized
> variable" warnings out of gcc.

Apples and oranges, see above.  Anyway, developers tend to
under-use lint (& friends) in the quest for shorter turnaround
times (even by 1% -- every little bit helps:-).  But, see
above about DrPython, there could surely be a beginners-
oriented development environment geared to special needs
_without_ damaging the functionality of Python environments
meant for general uses.

> I really think the frustration of the many outweighs the
> inconvenience of the few.

_Every_ Python developer, not just "the few", needs to be
familiar with namespace issues, and able to interpret and
understand the common exceptions raised by typical errors.

When a variable-name is NOT locally bound, but just used,
and it's not found at runtime, a 'name error' exception
will be raised.  There is no way the compiler can find out
ahead of time what WILL be in the global namespace when
the function gets executed.  So, ANY Python developer must
know what happens in such cases -- period.  Why should it
be all so drastically different if the name which is not
bound at runtime IS subject to local binding under some
execution paths?  A slightly more precise exception, to
distinguish between such 'local names' and others, sure --
Python 1.6 and later give you that.  But, a drastic
change of the diagnostics to be at compile time rather than
runtime?!  WHY?  Where's the huge benefit in 'frustration'
to offset the costs of such a revolution?!
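
For the plain global case, a minimal sketch (with the invented
names 'greet' and 'greeting') of why only run time can tell:

```python
def greet():
    # 'greeting' is never bound in this block, so it is looked up
    # in the global namespace each time the function runs.
    return greeting + ", world"

try:
    first = greet()
except NameError as err:
    first = type(err).__name__

greeting = "hello"   # the global gets bound only after the first call
second = greet()

print(first, second)
```

The function is compiled once, yet whether it raises depends
entirely on what the global namespace holds when it is called.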

Alex