The Cost of Dynamism (was Re: Python 2.x or 3.x, which is faster?)

Steven D'Aprano steve at pearwood.info
Mon Mar 21 11:11:58 EDT 2016


On Mon, 21 Mar 2016 11:34 pm, BartC wrote:

> On 21/03/2016 02:21, Terry Reedy wrote:
>> On 3/20/2016 9:15 PM, BartC wrote:
>>> http://pastebin.com/dtM8WnFZ
>>> This is a test of a character-at-a-time task in Python;
>>
>> I disagree.  It is a test of C code re-written in ludicrously crippled
>> Python.  No use of the re module,
> 
> You can't use the re module for this kind of test. It would be like
> writing a C compiler in Python like this:
> 
>    system("gcc "+filename)
> 
> (or whatever the equivalent is in Python) and claiming the compilation
> speeds are due to Python's fast byte-code.

Of course you can and should use the re module, when necessary. It is as
much a part of Python's standard library as lists, strings, ints and dicts.
Would you refuse to use dicts because they're built into the core language
rather than written in pure Python? They're *part* of what "pure Python"
means.
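
To give a concrete idea of what that looks like, here is a rough sketch
of handing the inner character-by-character loop to the re module. The
token names and patterns below are invented for illustration; they are
not from BartC's benchmark:

    import re

    # Toy lexer: the regex engine does the per-character work in C.
    # Token names and patterns are placeholders.
    TOKEN_RE = re.compile(r"""
          (?P<NUMBER> \d+          )
        | (?P<NAME>   [A-Za-z_]\w* )
        | (?P<SYMBOL> [?,;:=+*/-]  )
        | (?P<SKIP>   \s+          )
        """, re.VERBOSE)

    def tokens(text):
        for m in TOKEN_RE.finditer(text):
            if m.lastgroup != "SKIP":
                yield m.lastgroup, m.group()

    # list(tokens("x = 42 ? y")) gives
    # [('NAME', 'x'), ('SYMBOL', '='), ('NUMBER', '42'),
    #  ('SYMBOL', '?'), ('NAME', 'y')]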


>> designed for tasks like this,
> 
> (I've tested someone's parser written in Python using regular
> expressions, I seem to remember it was still pretty slow.)

This is fairly old now, but here are some comparisons of various Python
parsers:

http://www.dalkescientific.com/writings/diary/archive/2007/11/03/antlr_java.html

Parsing 2505 molecular formulae, the author gets the following times:

Custom hand-written parser using re:  0.18
PLY (pure Python):                    2.33
ANTLR (Java):                         8.73
PyParsing (pure Python):              9.87

I presume the times are in seconds.
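
The fastest entry is a hand-written parser built on re. A rough sketch
of that general style (mine, not the author's code) might look like
this:

    import re

    # Sketch of a hand-written re-based molecular formula parser:
    # an element symbol followed by an optional count, e.g. "C6H12O6".
    ATOM_RE = re.compile(r"([A-Z][a-z]?)(\d*)")

    def parse_formula(formula):
        counts = {}
        pos = 0
        for m in ATOM_RE.finditer(formula):
            if m.start() != pos:
                raise ValueError("unexpected character at %d" % pos)
            pos = m.end()
            symbol, count = m.group(1), int(m.group(2) or 1)
            counts[symbol] = counts.get(symbol, 0) + count
        if pos != len(formula):
            raise ValueError("could not parse all of %r" % formula)
        return counts

    # parse_formula("C6H12O6") -> {'C': 6, 'H': 12, 'O': 6}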



>>  > but exactly such tasks are what I often use dynamic languages for.
>>
>> For instance, there are about 15 clauses like
>> ---
>> elif c=="?":
>>     lxsymbol=question_sym
>>     return
>> ---
>>
>> I believe it would be much faster to combine these in one clause. First
>> define simple_symbols = {'?': question_sym, ...}. Then
>> elif c in simple_symbols:
>>     lxsymbol = simple_symbols[c]
>>     return
> 
> 
> I tried that (for 11 clauses), and it actually got a bit slower if the
> one test was placed towards the end! But faster if placed nearer the
> beginning.

Without seeing exactly what you did, it is difficult to comment on why it
got slower, or whether the slowdown was significant or just "noise".
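
For anyone following along, the two shapes being compared are roughly
these (a minimal sketch; the symbol constants and functions are
invented stand-ins, not BartC's actual lexer):

    # Placeholder symbol constants, standing in for those in the lexer.
    question_sym, comma_sym, semi_sym = "question", "comma", "semi"

    # Shape 1: a chain of equality tests, checked one at a time.
    def lex_simple_elif(c):
        if c == "?":
            return question_sym
        elif c == ",":
            return comma_sym
        elif c == ";":
            return semi_sym
        return None

    # Shape 2: one membership test and one lookup, however many
    # symbols the table holds.
    simple_symbols = {"?": question_sym, ",": comma_sym, ";": semi_sym}

    def lex_simple_dict(c):
        if c in simple_symbols:
            return simple_symbols[c]
        return None

Note that if the combined "c in simple_symbols" test is itself only one
branch of a longer elif chain, its position in that chain still decides
how many string comparisons run before the lookup is tried, which might
account for the placement effect you describe.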

[...]
> Overall, Python 3's throughput increased from 33Klps to 43Klps (and
> Python 2 from 43Klps to 53Klps).
> 
> HOWEVER: PyPy doesn't seem to like those Dict lookups: its throughput
> reduced from 105Klps (after those other changes) to 29Klps when the Dict
> lookup was used. Odd.


If you can replicate that with a smaller, more focused piece of code, I'm
sure that the PyPy people will be very interested to look at that.
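
Something like the following would do as a starting point for such a
report (a rough sketch using timeit; the data and symbol table are made
up, and the Klps figures above are yours, not produced by this
snippet):

    import timeit

    # Isolate the dict-lookup-per-character pattern that appeared to be
    # slow under PyPy.  All names and data here are invented.
    symbols = {c: i for i, c in enumerate("?,;:+*/()")}
    data = "a?b,c;d:e+f" * 10000

    def with_dict(text):
        n = 0
        for c in text:
            if c in symbols:
                n += symbols[c]
        return n

    def with_elif(text):
        n = 0
        for c in text:
            if c == "?": n += 1
            elif c == ",": n += 1
            elif c == ";": n += 1
            elif c == ":": n += 1
            elif c == "+": n += 1
        return n

    for name in ("with_dict", "with_elif"):
        t = timeit.timeit("%s(data)" % name,
                          setup="from __main__ import %s, data" % name,
                          number=100)
        print("%s: %.3f s" % (name, t))

Run under CPython 2, CPython 3 and PyPy, the two timings should show
whether the dict lookup by itself reproduces the slowdown.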




-- 
Steven



