The Cost of Dynamism (was Re: Python 2.x or 3.x, which is faster?)

BartC bc at freeuk.com
Thu Mar 24 14:03:33 EDT 2016


On 24/03/2016 17:13, Ned Batchelder wrote:
> On Thursday, March 24, 2016 at 12:12:55 PM UTC-4, BartC wrote:
>> On 24/03/2016 15:30, Ned Batchelder wrote:
>>> On Thursday, March 24, 2016 at 9:51:11 AM UTC-4, Steven D'Aprano wrote:
>>>> You know what is missing from this conversation?
>>>>
>>>> For one of Bart's critics to actually show faster code.
>>>>
>>>> There's plenty of people telling him off for writing unpythonic and slow
>>>> code, but I haven't seen anyone actually demonstrating that Python is
>>>> faster than his results show.
>>>
>>> As I mentioned before, I'm happy to explain the fuller Python way to
>>> write code, but I don't think Bart wants to learn it, because he is
>>> focused on a different goal than, "write real Python code the best
>>> possible way."
>>>
>>> Here, for example, is a real lexer for JavaScript that I wrote:
>>> https://bitbucket.org/ned/jslex/src
>>>
>>
>> Thanks for that.
>>
>> I don't have any JS to throw at it, but it seems happy with any bits of
>> source code or even just text.
>>
>> Using your short driver program (with the prints commented out), and
>> tested with 'bible.txt' as input (i.e. mostly English words), then your
>> JS lexer was roughly half the speed of the Python version I linked to
>> last week (with the if-elif chains and working with strings).
>
> I have tried to find your code, but cannot find it in the forest of this thread.
> Can you provide a link to it online?  I would be very interested to understand
> the difference in performance.

This is the version I used today:

http://pastebin.com/dtM8WnFZ

(Other versions I've experimented with aren't much faster or slower.)
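
(For anyone who doesn't want to chase the link: it's the version with the 
if-elif chains, working directly on strings. The fragment below is only an 
illustrative sketch of that style, with made-up names; it is not the 
pastebin code itself.)

# Sketch only: a character-dispatch lexer loop in the style described
# (if-elif chains, plain string handling).
def lex(text):
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if c in " \t\r\n":              # skip whitespace
            i += 1
        elif c.isalpha() or c == "_":   # name / keyword
            j = i + 1
            while j < n and (text[j].isalnum() or text[j] == "_"):
                j += 1
            yield ("name", text[i:j])
            i = j
        elif c.isdigit():               # number literal
            j = i + 1
            while j < n and text[j].isdigit():
                j += 1
            yield ("number", text[i:j])
            i = j
        else:                           # single-character token
            yield ("punct", c)
            i += 1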

With lots of C source as input instead, the difference is narrower.

(Note that the two lexers are doing different jobs: one or two things are 
incomplete in mine, such as the final value calculation for floating-point 
literals, and the token counts come out a bit different. But then they are 
tokenizing different languages. A sketch of what that final calculation 
involves follows this note.

Also, I only tested with large monolithic files, to make timing easier.

In terms of getting through the same amount of input, however, I think 
the comparisons aren't too far off.)
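
To be clear about what's missing (this is a sketch only, not my actual 
code): the final calculation means combining the scanned parts of a 
literal such as "123.45e-6" into a numeric value, along these lines:

# Sketch only: combine the scanned parts of a float literal
# ("123", "45", "-6" for "123.45e-6") into a numeric value.
def float_value(int_part, frac_part, exp_part):
    value = float(int_part or "0")
    if frac_part:                        # digits after the '.'
        value += float(frac_part) / 10 ** len(frac_part)
    if exp_part:                         # signed exponent after 'e'
        value *= 10.0 ** int(exp_part)
    return value

# float_value("123", "45", "-6") -> 0.00012345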

This is the test code I used for your JS lexer:

"""A main program for jslex."""

import sys
from jslex import JsLexer

def show_js_tokens(jstext, ws=False):
     line = 1
     lexer = JsLexer()
     n=0
     for name, tok in lexer.lex(jstext):
         n+=1
#        print_it = True
#        if name == 'ws' and not ws:
#            print_it = False
#        if print_it:
#            print "%4d %s: %r" % (line, name, tok)
         line += tok.count("\n")
     print line,n

if __name__ == '__main__':
     file="bigpy"
#    file="sqlite"
#    file="bible.txt"

     show_js_tokens(open(file).read())
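
For timing runs like the above, something along these lines works (a 
minimal sketch, not the measurement code actually used here; time.clock 
is the usual Python 2 choice for CPU time, and the file name is just a 
placeholder):

import time
from jslex import JsLexer

def time_lex(path):
    # Read the whole file up front so only the lexing is timed.
    with open(path) as f:
        text = f.read()
    lexer = JsLexer()
    t0 = time.clock()              # Python 2; time.perf_counter() on 3.x
    n = sum(1 for _ in lexer.lex(text))
    t1 = time.clock()
    print "%s: %d tokens in %.2f seconds" % (path, n, t1 - t0)

if __name__ == '__main__':
    time_lex("bible.txt")          # same input as the English-text test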

-- 
Bartc