The Cost of Dynamism (was Re: Python 2.x or 3.x, which is faster?)

BartC bc at freeuk.com
Mon Mar 21 20:49:20 EDT 2016


On 21/03/2016 23:20, Dennis Lee Bieber wrote:
> On Mon, 21 Mar 2016 17:31:21 +0000, BartC <bc at freeuk.com> declaimed the
> following:

I wasn't going to post it but here it is anyway:

http://pastebin.com/FLbWSdpT

(I've added some spaces for your benefit. This also builds a histogram 
of names so as to do something useful. Note that despite my concerns 
about speed, this module can process itself in around 100ms.)

>>
>> def readtoken(psource):
>> 	global lxsptr, lxsymbol
>
> 	Why is "lxsymbol" a global, and not something returned by the function
> (I can understand your making lxsptr global as you intend to come back in
> with it later).

Ideally there would be a descriptor or handle passed around which 
contains the current state of the tokeniser, and where you stick the 
current token values. But for a speed test, I was worried about 
attribute lookups.

In the first Python version, I used 'nonlocals' (belonging to an 
enclosing function), but they were just as slow as globals!
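For comparison, here is a minimal sketch (all names hypothetical, not the actual pastebin code) of the three ways the scan position could be held: a module-level global, an attribute on a state-handle object, and a nonlocal cell in an enclosing function.

```python
import timeit

lxsptr = 0

def next_global(source):
    # global variable holds the scan position
    global lxsptr
    c = source[lxsptr]
    lxsptr += 1
    return c

class Lexer:
    # state handle: the position lives on the instance,
    # so each access costs an attribute lookup
    def __init__(self):
        self.lxsptr = 0
    def next(self, source):
        c = source[self.lxsptr]
        self.lxsptr += 1
        return c

def make_lexer():
    # closure: the position is a nonlocal cell variable
    lxsptr = 0
    def next_closure(source):
        nonlocal lxsptr
        c = source[lxsptr]
        lxsptr += 1
        return c
    return next_closure

src = "abc" * 100000
print("global:  ", timeit.timeit(lambda: next_global(src), number=100000))
lex = Lexer()
print("handle:  ", timeit.timeit(lambda: lex.next(src), number=100000))
nc = make_lexer()
print("nonlocal:", timeit.timeit(lambda: nc(src), number=100000))
```

Relative timings will vary by interpreter and version; the point is only that all three forms carry a per-access cost in CPython, which is why none of them stood out in the test.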

>> 	lxsubcode = 0
>>
> 	Unused in the rest of the sample

This is a global. Some lxsymbol values will set it, for the rest it's 
neater if it's zeroed.


>> 	while (1):
>
> 	while True:		#At least since Python 2.x... No () needed
>
>> 		c=psource[lxsptr]
>
> 	Is the spacebar broken? How about some whitespace between language
> elements... They don't take up that much memory

(It's not broken but it wouldn't be consistent.)

> 	Given that you state you expect to only be working with 8-bit bytes...
>
>> 		if d<256:
>
> this will always be true

Unfortunately Python 3 doesn't play along. There could be some Unicode 
characters in the string, with values above 255. (And if I used 
byte-sequences, I don't know what would work and what wouldn't.)
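To illustrate the point: a Python 3 `str` is a sequence of Unicode code points, so `ord(c)` can be far above 255, whereas indexing a `bytes` object always yields small ints. A quick sketch:

```python
source = "if x \u2264 y:"       # str may contain non-Latin-1 code points
codes = [ord(c) for c in source]
print(max(codes))              # 8804 for '\u2264', well above 255

# A bytes object, by contrast, indexes directly to byte values,
# which are guaranteed to be < 256
bsource = source.encode("utf-8")
print(bsource[0])              # 105, the byte value of 'i'
print(all(b < 256 for b in bsource))  # always True for bytes
```

So with `str` input the `if d < 256` guard is genuinely needed before indexing a 256-entry table; only a `bytes` source would make it redundant.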


>> 			lxsymbol = disptable[d](psource,c)
>
> 	Looks like you are indexing a 256-element table of functions, using the
> numeric value of the character/byte as the index... Only to then pass your
> entire source string along with the character from it to the function.

No, it passes only a reference to the entire string. The current 
position is in 'lxsptr'. Yes, the mix of parameters and globals is messy. 
All globals might be better (in the original non-Python, 'globals' would 
have module scope, and not be visible outside the tokeniser module 
unless explicitly exported. Semi-global...).

> 	I have no idea what your disptable functions look like but...
>
> 	while psource:
> 		c, psource = psource[0], psource[1:]

I don't think this will work. Slicing creates a hard copy of the rest of 
the string. Performance is going to be n-squared.

(I tried a mock-up of this line, working with a duplicate of the data; 
the time to process a 600-line module doubled. I'm still waiting on the 
6MB data file, and it's been seven minutes so far; it normally takes 7 
seconds.

I was surprised at one time that slices don't create 'views', but I've 
since implemented view-slices and I can appreciate the problems.)
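The quadratic cost is easy to demonstrate: each `psource[1:]` copies every remaining character, so consuming an n-character string that way does roughly n²/2 character copies, while advancing an index copies nothing. A small sketch (for `bytes`, `memoryview` does give a genuine zero-copy view):

```python
import time

def scan_by_slicing(psource):
    # O(n^2): each slice copies the remaining characters
    count = 0
    while psource:
        c, psource = psource[0], psource[1:]
        count += 1
    return count

def scan_by_index(psource):
    # O(n): the string is never copied, only an index advances
    count = 0
    i = 0
    while i < len(psource):
        c = psource[i]
        i += 1
        count += 1
    return count

src = "x" * 20000
t0 = time.perf_counter(); scan_by_slicing(src); t1 = time.perf_counter()
scan_by_index(src);                             t2 = time.perf_counter()
print(f"slicing: {t1 - t0:.3f}s  indexing: {t2 - t1:.3f}s")

# memoryview over bytes really is a zero-copy view
mv = memoryview(b"hello")[1:]
print(bytes(mv))  # b'ello'
```

Increasing the input size makes the gap grow roughly linearly for the index version and quadratically for the slicing version, which matches the minutes-vs-seconds behaviour described above.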

-- 
Bartc



More information about the Python-list mailing list