What's the best way to minimize the need of run time checks?

Steven D'Aprano steve+python at pearwood.info
Sat Aug 13 03:37:25 EDT 2016


On Fri, 12 Aug 2016 09:55 pm, BartC wrote:

> On 12/08/2016 12:07, Steven D'Aprano wrote:
>> On Fri, 12 Aug 2016 07:38 pm, BartC wrote:
>>
>>> 'year' has been spelled wrongly
>>
>> How do you know?
> 
> I know because my intention was to create a RECORD, with a specific set
> of members or fields. With records, you usually don't just create
> arbitrary named members as you go along.

Unfortunately your intention counts for nothing. You know what they say
about computers: the damned things don't do what you want, only what you
tell them to do.

The equivalent of a record in Python is not an arbitrary object, but a
tuple, or perhaps a namedtuple if you want to access fields by name.
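
For instance (a toy sketch of my own, not Bart's code), a namedtuple fixes
the set of field names up front, so a misspelled field is rejected the
moment you try to use it:

from collections import namedtuple

Person = namedtuple("Person", ["name", "year"])

p = Person(name="Guido", year=1956)
print(p.year)                    # 1956

Person(name="Guido", yaer=1956)  # TypeError: unexpected keyword argument 'yaer'
p.yaer = 1956                    # AttributeError: namedtuples won't grow new attributes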



>> There are broadly two tactics that a language can take when it comes to
>> attribute (or more widely, variable) names:
>>
>>
>> (1) Editing is cheap, but compiling and testing code is expensive
> 
> No, compiling is cheap too.

Perhaps now, but not always. And not even always now.


>> The first time I ever compiled a full-sized application (not a particular
>> large one either, it was a text editor a little more featureful than
>> Notepad) it took something like nine hours to compile on a Mac SE (this
>> was circa 1990).
> 
> That is completely crazy even for 1990. I was using my own tools then
> and the time to re-compile a 50,000-line application would have been
> measured in seconds. Certainly not minutes anyway (I wouldn't have had
> the patience!).

Long compile times were certainly not unusual in the 1980s and even into the
1990s. Using the most powerful machines available, people would regularly
set up compile jobs to run overnight.

The Macintosh SE was built on a Motorola 68000 CPU with a clock speed of 7.8
MHz. (That was a little slower than equivalent Intel processors at the
time, but the Motorola CPUs had bigger pipelines or more registers or
something, I forget exactly what, that meant that they were actually a tad
faster than Intel despite the lower clock speed.)

Let's compare with something more recent:

http://www.openoffice.org/FAQs/build_faq.html#howlong

Building Open Office on a 1.8GHz Pentium takes about 4 hours, or 1.44e+13
nanoseconds (the reciprocal unit of GHz). Now, I know, and you know, that
you can't just compare clock speeds across entirely different CPUs, but I'm
going to do it anyway. The Pentium was something like 230 times faster than
the Motorola 68000 in my Mac SE, which would suggest that Open Office would
have taken over 900 hours (five weeks plus change) to build on my Mac SE,
if it had existed (it didn't) and if it could have fit on the 20MB hard
drive I had at the time (just the Open Office source code is 400MB, so it
wouldn't have).

Of course Open Office is a little bigger and more complex than that old text
editor: 30K files versus a dozen or so, nine million LOC versus, oh I don't
know, let's say 50 KLOC. That gives a back-of-the-envelope figure of about
180 times "bigger", based on LOC. 900 hours divided by 180 gives five hours,
within the ballpark of my memory.
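
For those playing along at home, the arithmetic (using my rough figures
above, not precise benchmarks) goes like this:

pentium_hz = 1.8e9                  # 1.8 GHz Pentium
mac_se_hz = 7.8e6                   # 7.8 MHz Motorola 68000
speed_ratio = pentium_hz / mac_se_hz         # about 230

build_on_mac_se = 4 * speed_ratio            # about 920 hours for Open Office

size_ratio = 9e6 / 50e3                      # about 180 times "bigger"
print(round(build_on_mac_se / size_ratio))   # about 5 hours for the editor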

From memory, my Mac SE had all of 1MB of RAM, although unlike earlier Macs at
least it had dedicated VRAM for the display. You know how software can
trade off space for time? Yeah, well when you've got less than 1MB
available, you're often trading off time for space.

So I completely reject your assertion that compilation is and always has
been cheap. Overnight builds were common, and I'm not being unreasonable to
talk about a nine-hour compilation on an underpowered entry-level machine
back in 1990 or thereabouts.


[...]
> My example was specifically about attribute names where pre-declaring
> the set of allowed attributes would not be onerous, 

Others disagree: millions of people use languages which eschew mandatory
variable declarations (Ruby, Lua, Python, Javascript, Perl, PHP, etc).


> would be usefully 
> self-documenting, and would allow more errors to be picked up.

Not really - declarations simply switch *when* the error is noticed, not
whether or not it is.
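
For example (my own toy class, but the principle is exactly Bart's
complaint): with an ordinary instance, the misspelling isn't an error at
assignment time at all, and the failure only shows up when the code that
reads the attribute actually runs.

class Record:
    pass

r = Record()
r.yaer = 2016        # the typo silently creates a brand-new attribute

# ... much later, perhaps in a rarely-taken branch ...
print(r.year)        # AttributeError, but only when this line finally runs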


> (Of course the design of Python makes that impractical because it would
> require the byte-code compiler to see inside imported modules before
> execution is commenced.)

That's not the fundamental problem. Most of the time, modules are available
compiled to byte-code (.pyc files), so Python could compile a list of
variable names used. CPython does something similar for functions: each
function has a table of variables, taken from a quick two-pass compilation
process.
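
You can see that per-function table for yourself (co_varnames is a real
attribute of code objects; the function below is just a toy):

def f():
    x = 1
    y = x + 1
    return y

print(f.__code__.co_varnames)   # ('x', 'y') -- the names the compiler found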

But the fundamental problem is that Python has exec. Ultimately, the only
way to tell which variables actually exist is to execute the code. In
Python 2, CPython worked around exec inside functions with difficulty (the
compiler had to give up its fast-locals optimisation for any function
containing a bare exec statement). In Python 3, exec inside a function
simply cannot create or rebind the function's local variables; if you want
to get at the names it assigns, you supply a separate namespace.
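
Here's a minimal sketch (my own toy functions) of that Python 3 behaviour:

def broken():
    exec("value = 1")       # assigns into a throwaway copy of the locals
    return value            # compiled as a global lookup, so... NameError

def works():
    namespace = {}
    exec("value = 1", namespace)   # assignments land in the dict we passed
    return namespace["value"]

print(works())    # 1
broken()          # raises NameError: name 'value' is not defined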

Now, a linter, editor or other external tool is perfectly capable of using
heuristics to recognise what is *likely* to be a variable. It doesn't
matter if your IDE's code completion fails to work here:

exec("value = 1")
x = val[press tab for code completion]

but it would be completely unacceptable for the compiler to flag this as an
error:

exec("value = 1")
x = value + 1
    ^
SyntaxError: no variable called 'value'


It's okay for the linter to get it wrong. It's not okay for the compiler to
get it wrong and refuse to compile legal code.


>> Turn it around though: one of the reasons why languages like Ruby,
>> Javascript, Lua, Python, Perl etc are so ridiculously productive is that
>> you don't have to keep stopping to declare variables just to satisfy the
>> compiler. Need a new variable? Just assign to it and keep going --
>> there's no, or very little, mental task switching.
>>
>> Another reason why such languages are so productive is the availability
>> of an interactive REPL where you can try out code interactively. How
>> clumsy and awkward the REPL would be if you had to keep declaring
>> variables ahead of time.
> 
> I agree when it comes to variables. But it /does/ allow these extra
> errors to creep in that are not detected until that particular fragment
> of code is executed.

Sure, there's a trade-off. With a good editor that has some sort of code
completion, you might make a typo one time in a thousand. Without code
completion, maybe one time in a hundred. So variable/attribute declarations
save you from an error that occurs perhaps one time in a hundred times that
you refer to a name or attribute: 99 times in a hundred, it's just a
nuisance.

That's the value-judgement of the communities of programmers who have
created and use languages like Ruby and Python. Obviously those who create
and use languages like C and Java and Pascal think differently, and those
using languages like Haskell, with its type inference, try to find a
middle-ground.

The programming community as a whole is slowly converging towards that
middle ground: let your tools (compiler or IDE or linter, it doesn't
matter) infer as much information as possible, and require as few explicit
declarations as possible.



-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.



