The Cost of Dynamism (was Re: Pyhon 2.x or 3.x, which is faster?)

Fri Mar 11 20:16:38 EST 2016

On 11/03/2016 21:36, Chris Angelico wrote:
> On Sat, Mar 12, 2016 at 5:57 AM, BartC <bc at freeuk.com> wrote:

>> Functions are dynamic. That is, you don't know the F in F() is actually a function, even if it was defined a few lines previously, because it could have been rebound to something else. That means a runtime check to ensure that it is one.

> All of these are really one thing: When you compile some code, you
> don't know whether F() is still going to be the same object when you
> run it. FAT Python is intended to pick this up;

That's an interesting project. And they seem to understand the problems.

  it recognizes
> constants, and can then perform all the other optimizations you
> describe (since functions are themselves immutable, all you need is an
> identity check on the function object itself, and everything else you
> describe can be done safely).

You mean the compiler assumes it's a particular function, but adds a 
runtime check? (Perhaps by generating two call sequences, one fast, and 
selecting the path at runtime.)

>> * Variables are dynamic. So you can't for example declare (with a new const feature)
'const rows=50'

> Definitely agree with this. Having a way to declare that a name is
> "truly constant" would be extremely handy; there currently isn't a
> way, and I'm not sure whether FAT Python is looking into this or not.
> I think it's more focusing on the situation where something simply has
> never been rebound, which is probably the better optimization; but
> sometimes it'd be nice to be able to assert that something *will not*
> be rebound, to help with self-documenting code. (Note that "constant"
> implies both "never rebound" and "immutable".

The 'const' prefix here is intended to define a named constant (a 
numeric literal with a convenient alias) rather than some 'read-only 
attribute for a conventional variable.

But introducing that into Python would be a can of worms. (A named 
constant needs a compile-time expression on the right-hand-side. The 
compiler needs to be able to see named constants which are in an 
imported module, and which are accessed via attributes, and so on.)

>>   * Without a way of defining compile-time constants, very useful language features such as 'switch' are not practical.
>
> Maybe, but I honestly don't miss 'switch' all that often - and when I
> do, it's usually because I want a range. Consider this
> semi-hypothetical Pike code:
>
> switch (key)
> {
>      case 'A'..'Z': case 'a'..'z': return "Letter";
>      case 0xFF00..0xFFFF: return "Special key";
>      case '.': case ',': case '!': return "Punctuation";
>      default: return "Other";
> }
>
> With a small number of cases, this wouldn't be too bad in if/elif
> form; but as it gets bigger, the value of a switch block becomes
> evident. The normal advice in Python is "use a dispatch dictionary",
> but that would be impractical here. So this would end up being a
> massive if/elif tree, because there's just no better way to do this.

Your example doesn't really need a switch: a range check and a 
predefined list or tuple which maps 0 to 255 to a certain string.

Otherwise a list of functions indexed by the switch key would generally 
do, and might be faster than an if-elsif chain. But there's some untidy 
setup code and the rest would be spread out across various functions.

A proper 'switch' statement takes care of all that setup code, and keeps 
the blocks of code localised. And selection of the correct bit of code 
is very fast, as it's done inside the interpreter with a single 
byte-code (well, load and switch).

(See yet another microbenchmark below.)

>> * Python 3 has decided to have a single 'long' type for integers, instead of separate int (32 or 64-bit) and long types. This gives a noticeable slow-down in programs executing lots of integer code, compared with Python 2.

> That said, though: I suspect the reason nobody's gone and done this is
> not that it wouldn't be any benefit, but that applications doing a lot
> of integer code are usually going to see more benefit from Numpy,
> Cython, or PyPy,

You'd be surprised how much any kind of program relies on adhoc integer 
operations. It doesn't need to work with large int arrays for them to be 
important. (Look at the benchmark below: the inner loop is nearly all 
integer ops.)

> For anything other than individual characters, this is likely to
> perform better. However, I was unable to see this, because *CPython
> does optimize string appends*.

I guess mine doesn't! (I don't think my PC is 30 times slower than yours.)

----------------------------

'Switch' testing benchmark. The little program show below reads a text 
file (I used the entire CPython C sources, 6MB), and counts the number 
of characters of each category in upper, lower, digit and other.

(Note there are other ways to approach this task, but a proper 'lexer' 
usually does more than count. 'Switch' then becomes invaluable.)

On my machine, I got 2.8 - 3.6 seconds (PyPy 0.4 seconds).

I tried my bytecode language, with the same if-else chain, and it was 
0.6 seconds. Then I used 'switch', and it halved to 0.3 seconds.

And the switch here looks at only 3 different groups; typical switch 
statements might have dozens. Switch can make a big difference! However, 
because of the problem with consts, it would be awkward to add to Python 
(it would be unlikely anyway).

(I then tried something else, and got 0.08 seconds ... 
http://pastebin.com/whSbieiW for the (non-Python) code.)

upper=0
lower=0
digits=0
other=0

def lex(data):
	global upper,lower,digits,other
	AA=ord('A')
	ZZ=ord('Z')
	aa=ord('a')
	zz=ord('z')
	ZERO=ord('0')
	NINE=ord('9')

	for c in data:
		k=ord(c)
		if AA<=k<=ZZ:
			upper+=1
		elif aa<=k<=zz:
			lower+=1
		elif ZERO<=k<=NINE:
			digits+=1
		else:
			other+=1

print ("READING")

with open ("big", "r") as myfile:
	data=myfile.read()

print ("LEXING")

lex(data)

print ("Upper",upper)
print ("Lower",lower)
print ("Digits",digits)
print ("Other",other)
print ("Total",upper+lower+digits+other)

-- 
Bartc