Python and the need for speed

Fri Apr 14 21:55:47 EDT 2017

On Wednesday, April 12, 2017 at 4:57:10 AM UTC-5, bart... at gmail.com wrote:
> On Wednesday, 12 April 2017 07:48:57 UTC+1, Steven D'Aprano  wrote:
> > On Tue, 11 Apr 2017 21:10:56 -0700, Rick Johnson wrote:
> > >
> > > high level languages like Python should make it
> > > difficult, if not impossible, to write sub-optimal code
> > > (at least in the blatantly obvious cases).
> >
> > Here's another example:
> >
> >     answer = 0
> >     for i in range(10):
> >         answer += 1
> >
> > instead of
> >
> >     answer = 10
> >
> > So... how exactly does the compiler prohibit stupid code?
>
> Actually, an optimising C compiler (not one of mine!)
> probably could reduce that to answer=10. And eliminate even
> that if 'answer' was never used.
>
> But that's not really the issue here. Assume that such a
> loop /is/ doing something more useful. The problems with
> Python (last time I looked anyway) were these:
>
> (1) If answer was a global variable, then it needs a symbol
> table lookup to find out what it is. That would not happen
> in a static language, or a less dynamic one, as you already
> have the address.

Indeed. I have argued many times for syntax that would
resolve variable scope. But apparently the py-devs believe
we only deserve type declarations that do _nothing_ to speed
up code execution (aka: type-hints), instead of type
declarations that could actually speed up the code. Go
figure!

I'm not a fan of forced static typing, but I am a fan of
optional static typing.

The global keyword is one idiosyncrasy of Python that causes
a lot of confusion, especially for noobs, but also for old
hats who expect it to behave intuitively. This keyword does
not define "true globals" (aka: symbols that can be accessed
or reassigned from anywhere), but rather symbols that are
global only with respect to the "current module scope" (aka:
module-level variables). You can only get "true globals" in
Python by going rogue and injecting symbols into the
builtins namespace (__builtin__ in Py2.x) like some mad
scientist, which is not officially supported. Go figure! ;-)
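The module-scope behaviour is easy to demonstrate. What follows is only a sketch, assuming Python 3 and assuming the "going rogue" route is the builtins module; `make_module`, `mod_a`, `mod_b` and `counter` are made-up names for illustration.

```python
import builtins
import types

def make_module(name, source):
    """Build a throwaway module object from source text."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module

A_SOURCE = """
counter = 0
def bump():
    global counter      # 'global' here means "global to module a"
    counter += 1
"""

B_SOURCE = """
def peek():
    return counter      # module b never defines 'counter'
"""

mod_a = make_module("a", A_SOURCE)
mod_b = make_module("b", B_SOURCE)

mod_a.bump()

try:
    mod_b.peek()
    visible = True
except NameError:
    visible = False     # the name did not leak from module a into module b

# Injecting into builtins makes the name visible from every module.
# This is a hack for demonstration, not a recommended practice.
builtins.counter = 99
try:
    leaked = mod_b.peek()
finally:
    del builtins.counter    # clean up after the experiment
```

Module a still sees its own `counter`, because a module-level name shadows a builtin of the same name.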

> And this [global] lookup happens for every loop iteration.

I sure hope not. I have not taken the time to inspect the
inner workings of Python, but if the lookup happens every
time, that would be an awfully inefficient design.
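For anyone who wants to check rather than hope, the dis module shows what CPython compiles the loop to. This is a sketch assuming Python 3; `count_up` is a hypothetical function name, and the exact opcodes differ between versions.

```python
import dis

answer = 0

def count_up():
    """Increment the module-level name 'answer' ten times."""
    global answer
    for i in range(10):
        answer += 1

# Collect the names of the bytecode instructions CPython generated.
opnames = [ins.opname for ins in dis.get_instructions(count_up)]

# 'answer' is read with LOAD_GLOBAL and written back with STORE_GLOBAL,
# and both instructions sit inside the loop body. 'range' is also
# fetched with LOAD_GLOBAL, once, before the loop starts.
print("LOAD_GLOBAL" in opnames, "STORE_GLOBAL" in opnames)
```

So the instruction does run on every pass. As far as I know, recent CPython releases cache the result of the lookup, which softens the cost without removing the instruction.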

> (2) There is also 'range', which could have been redefined
> to mean something else, so /that/ needs a lookup. The byte-
> code compiler can't just assume this loop is to be executed
> 10 times.

Yep, and more evidence that Python has taken dynamics to
such a fundamentalist extreme, that even ISIS is jealous!

    ## START INTERACTIVE SESSION (Py2.x) ##
    >>> def range(*args):
    ...     return "Death to static infidels!"

    >>> for i in range(10):
    ...     print i

    D
    e
    a
    t
    h

    t
    o

    s
    t
    a
    t
    i
    c

    i
    n
    f
    i
    d
    e
    l
    s
    !

    ## END INTERACTIVE SESSION ##

> (3) This was fixed long ago but at one time, even when
> 'range' had been established to be a range, it involved
> constructing a list of items (10 here, but it could be a
> million), and then iterating over the list.

First it was "range", then "xrange", and now "range" again.
Third time's a charm, I suppose. The latest iteration of the
range function design removed the underlying inefficiency.
However, it created a semantic nightmare (more on this
below).
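The difference between the lazy and the list-building behaviour can be measured directly. This is a rough sketch, assuming Python 3, where range() is lazy; the list stands in for what the legacy range() built, and the variable names are mine.

```python
import sys

n = 1_000_000
lazy = range(n)          # Python 3: a small object that computes items on demand
eager = list(range(n))   # roughly what the legacy range() handed back

lazy_size = sys.getsizeof(lazy)
eager_size = sys.getsizeof(eager)   # counts only the list's pointer array
print(lazy_size, eager_size)

# The lazy object still behaves like a sequence: length, indexing and
# membership tests all work without materialising a list.
print(len(lazy), lazy[-1], 500_000 in lazy)
```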

> This might seem crazy, but it might have been acceptable
> for a script language at one time. Not for a general
> purpose one however.

Yeah, this is one of those screw-ups that even GvR's famous
time machine could not fix. Must have been in the shop for a
tune-up...

> (4) Python's integer types being immutable, the +=
> operation means evaluating the result, then creating a new
> integer object and binding 'a' to that new value. (Correct
> me if I'm wrong.)
>
> These are all things the language could point a finger at
> before blaming the user for writing inefficient code.
>
> The problem is also the language encouraging people to use
> high-level but inefficient methods, as the emphasis is on
> productivity and readability** rather than performance. In
> fact most users probably have no idea on what is efficient
> and what isn't.
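Point (4) is easy to check from the interpreter. This is a small sketch, assuming Python 3; the names `alias`, `items` and `items_alias` are mine.

```python
answer = 1000
alias = answer            # a second name bound to the same int object

answer += 1               # builds a new int and rebinds 'answer'

print(answer, alias)      # the old object is untouched
print(answer is alias)

# Contrast with a mutable type, where += modifies the object in place.
items = [1]
items_alias = items
items += [2]
print(items is items_alias, items_alias)
```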

And that's the point I was trying to make earlier. Most
Python programmers are not CS majors; they are just
"technical folk" who need to write a program to do this or
that task. We shouldn't expect them to know about things
like memory management, nor should we expect/burden them to
know that calling range(1000000) in legacy Python, just to
do one million iterations, builds a million-item list first.
Instead, we should provide them with the proper "efficient
tools" to get the job done. So instead of:

    for x in range(1000000): # Legacy python was slow here!
        doSomething(x)

It should have _always_ been:

    1000000.times:
        doSomething()

Or, if you need a counter:

    1000000.times as x:
        doSomething(x)

Or something similar. The exact syntax is irrelevant. What
matters is preventing the Python programmer from writing
inefficient code. Equally important is making a clear
distinction between "building a list of integers" and
"looping N times". The current implementation of range()
blurs that distinction and creates a semantic nightmare.
"EXPLICIT IS BETTER THAN IMPLICIT!" -- I know I heard that
somewhere...
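Existing Python can approximate the "looping N times" half of that distinction, though not with the syntax proposed above. This is a sketch only, assuming Python 3: `times` is a hypothetical helper, not something in the standard library, built on itertools.repeat.

```python
from itertools import repeat

def times(n):
    """Hypothetical helper: iterate n times without producing a counter."""
    return repeat(None, n)

calls = 0
for _ in times(5):
    calls += 1            # the body runs five times and no integers are handed out

# When a counter is actually wanted, range() states that intent.
seen = list(range(5))
```

Whether this reads any better than `for _ in range(5)` is a matter of taste. It only shows that the two intents can be spelled differently.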

> If I wanted to work with the code for character 'A' (ie.
> the integer 65), in another language it might represent it
> as 'A' which is mapped to 65. In Python, 'A' is a string.
> To get the integer code, I have to use ord('A'). To do
> that, it has to look up 'ord', then execute a function
> call... In the meantime the more static language has long
> since finished whatever it was going to do with that code.

I don't see a problem here. Those other languages that
return ASCII values of string chars really piss me off. Ruby
comes to mind:

    ## START INTERACTIVE SESSION (Ruby 1.8.6) ##
    >>> sl = "Hello from Ruby"
    >>> sl[0]
    72
    ## END INTERACTIVE SESSION (Ruby 1.8.6) ##

When indexing a string, 99% of the time a string is what I
want in return, not an integer. But maybe I'm the
exception???
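For comparison, here is how Python handles the same operations. It is a short sketch assuming Python 3, with variable names chosen for illustration.

```python
s = "Hello from Python"

first = s[0]          # indexing a str yields a one-character str
code = ord(first)     # the integer code point has to be asked for explicitly
back = chr(code)      # and chr() converts it back

raw = b"Hello"[0]     # a bytes object, by contrast, indexes to an int
```

So the integer is available when you want it, through ord() or by working with bytes, without being the default for text.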

> (** Although I find code full of class definitions, one-
> liners, decorators and all the other esoterics,
> incomprehensible. I'm sure I'm not the only one, so perhaps
> readability isn't too much of a priority either.)

Agreed on the decorators, and, in Python at least, I much
prefer a named function over the less readable comprehension
-- unless the comprehension is limited to one level. For
example:

    Good: [i+1 for i in range(10)]

    Bad: [(i+1, j) for i in range(10) for j in range(10) if j < 2]

Because "READABILITY COUNTS!". And long, syntactically
complex lines are just not readable.
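As a sketch of what the named-function alternative to the "Bad" line could look like (assuming Python 3; `pairs` and its parameters are names I made up for illustration):

```python
def pairs(limit, j_max):
    """Yield (i + 1, j) for every i and j below limit where j < j_max."""
    for i in range(limit):
        for j in range(limit):
            if j < j_max:
                yield (i + 1, j)

# The nested comprehension from above, and the named equivalent.
one_liner = [(i+1, j) for i in range(10) for j in range(10) if j < 2]
named = list(pairs(10, 2))
```

Both produce the same list. The named version costs a few more lines but gives the loop nesting and the filter a place to be read one step at a time.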

As for classes, since I have a solid background in Java,
classes don't bother me one bit. Actually, I find their
"syntactical scoping" to be quite natural and pleasing to my
eye. Which shouldn't be surprising when you consider that
one of my pet peeves is namespace pollution, especially
module-level namespace pollution.