The Cost of Dynamism (was Re: Python 2.x or 3.x, which is faster?)

BartC bc at freeuk.com
Mon Mar 21 08:34:04 EDT 2016


On 21/03/2016 02:21, Terry Reedy wrote:
> On 3/20/2016 9:15 PM, BartC wrote:
>> http://pastebin.com/dtM8WnFZ
>> This is a test of a character-at-a-time task in Python;
>
> I disagree.  It tests of C code re-written in ludicrously crippled
> Python.  No use of the re module,

You can't use the re module for this kind of test. It would be like 
writing a C compiler in Python like this:

   system("gcc "+filename)

(or whatever the equivalent is in Python) and claiming the compilation 
speeds are due to Python's fast byte-code.

> designed for tasks like this,

(I've tested someone's parser written in Python using regular 
expressions; I seem to remember it was still pretty slow.)

>  > but exactly such tasks are what I often use dynamic languages for.
>
> For instance, there are about 15 clauses like
> ---
> elif c=="?":
> lxsymbol=question_sym
> return
> ---
>
> I believe it would be much faster to combine these in one clause. First
> define simple_symbols = {'?': question_sym, ...}. Then
> elif c in simple_symbols:
> lxsymbol = simple_symbols[c]
> return


I tried that (for 11 clauses), and it actually got a bit slower if the 
one test was placed towards the end! But faster if placed nearer the 
beginning.
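
To make the comparison concrete, here is a minimal sketch of the 
single-dict clause being discussed (the symbol names question_sym etc. 
are made up for illustration; the real lexer's names and token values 
may differ):

```python
# Hypothetical token values -- stand-ins for the lexer's real symbols.
question_sym, comma_sym, semi_sym = "question", "comma", "semi"

# One dict replaces ~15 separate "elif c == ..." clauses.
simple_symbols = {"?": question_sym, ",": comma_sym, ";": semi_sym}

def lex_symbol(c):
    # A single membership test plus one lookup, instead of a linear
    # scan down an if-elif chain.
    if c in simple_symbols:
        return simple_symbols[c]
    return None

token = lex_symbol("?")
```

Where this test sits in the overall if-elif chain still matters, which 
is presumably why moving it around changed the timings.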

I also tweaked the way each identifier name is built up (using a slice 
after the limits of the name are established instead of building a 
character at a time).
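
The slicing tweak might look something like this (a sketch, not the 
actual code; the identifier rules here are assumed to be the usual 
letters/digits/underscore):

```python
def scan_name(src, i):
    # Find the end of the identifier first, then take one slice,
    # rather than appending one character at a time to a growing string.
    start = i
    while i < len(src) and (src[i].isalnum() or src[i] == "_"):
        i += 1
    return src[start:i], i

name, pos = scan_name("foo_bar := 1", 0)
```

Building a string a character at a time creates a new string object per 
character in CPython, so a single slice once the limits are known avoids 
all of that intermediate allocation.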

The "\r" check was removed (in text mode it won't occur); the eof 
check was moved last (as it will only succeed once); and the chr(0) check 
was removed as chr(0) isn't used (that check would have zero cost in a 
jump-table switch or a function table, but it does cost here).

Overall, Python 3's throughput increased from 33Klps to 43Klps (and 
Python 2's from 43Klps to 53Klps).

HOWEVER: PyPy doesn't seem to like those dict lookups: its throughput 
dropped from 105Klps (after those other changes) to 29Klps when the dict 
lookup was used. Odd.

(I haven't tried this on Ubuntu, which seems to have snappier versions 
of both Python 2 and PyPy, but that's a bit of a pain to test.)

> In any case, the O(k), where k is the number of alternatives, linear
> search should be replaced by an O(log k) binary search (nested if-else
> statement) or O(1) hashed search (with a dictionary mapping chars to
> functions).

>> I started off trying to write it in a more efficient way that would suit
>> Python better, but quickly tired of that. I should be able to express
>> the code how I want.
>
> Of course you can.  But you cannot write in a crippled Python subset and
> fairly claim that the result represents idiomatic Python code.

For Python I would have used a table of 0..255 functions, indexed by the 
ord() code of each character. So all 52 letter codes map to the same 
name-handling function. (No Dict is needed at this point.)
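
A sketch of that function-table approach, under the same assumption of 
made-up handler names (the real lexer would dispatch to its actual 
name/number/symbol routines):

```python
import string

# Hypothetical handlers -- stand-ins for the lexer's real routines.
def name_fn(c):
    return "name"

def digit_fn(c):
    return "number"

def other_fn(c):
    return "other"

# A 256-entry table indexed by ord(c). All 52 letter codes map to the
# same name-handling function, so no dict is needed for dispatch.
dispatch = [other_fn] * 256
for ch in string.ascii_letters:
    dispatch[ord(ch)] = name_fn
for ch in string.digits:
    dispatch[ord(ch)] = digit_fn

def lex_char(c):
    # O(1) dispatch: one list index plus one call per character.
    return dispatch[ord(c)](c)
```

The lookup itself is O(1), but each dispatch now costs a Python-level 
function call, which is exactly the overhead worried about below.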

But that's a hell of a lot of infrastructure to set up, only to find 
out that Python's function call overheads mean it's not much faster (or 
maybe it's slower) than using an if-elif chain.

(I've no idea if it might be much faster or not. And yet, having said 
that, I can't resist trying it out! But it'll have to be a bit later.)

-- 
Bartc
