The Cost of Dynamism (was Re: Python 2.x or 3.x, which is faster?)

Chris Angelico rosuav at gmail.com
Fri Mar 11 21:20:28 EST 2016


On Sat, Mar 12, 2016 at 12:16 PM, BartC <bc at freeuk.com> wrote:
> On 11/03/2016 21:36, Chris Angelico wrote:
>>
>> constants, and can then perform all the other optimizations you
>> describe (since functions are themselves immutable, all you need is an
>> identity check on the function object itself, and everything else you
>> describe can be done safely).
>
>
> You mean the compiler assumes it's a particular function, but adds a runtime
> check? (Perhaps by generating two call sequences, one fast, and selecting
> the path at runtime.)

Exactly. Fast path, slow path. The cost of checking is, in many cases,
less than the advantage gained.
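
In pure Python terms, the guard looks roughly like this (a conceptual
sketch only; FAT Python installs its guards at the bytecode level, and
these names are made up for illustration):

_expected_len = len   # snapshot the function object when specializing

def count(obj):
    if len is _expected_len:
        # Fast path: len() is still the builtin we specialized
        # against, so the optimized form is known to be safe.
        return obj.__len__()
    # Slow path: len() was rebound or shadowed; make the normal
    # dynamic call.
    return len(obj)

The guard itself is just an identity comparison, which is cheap enough
that the fast path still comes out ahead.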

>>> * Variables are dynamic. So you can't for example declare (with a new
>>> const feature)
>
> 'const rows=50'
>
>> Definitely agree with this. Having a way to declare that a name is
>> "truly constant" would be extremely handy; there currently isn't a
>> way, and I'm not sure whether FAT Python is looking into this or not.
>> I think it's focusing more on the situation where something simply has
>> never been rebound, which is probably the better optimization; but
>> sometimes it'd be nice to be able to assert that something *will not*
>> be rebound, to help with self-documenting code. (Note that "constant"
>> implies both "never rebound" and "immutable".)
>
>
> The 'const' prefix here is intended to define a named constant (a numeric
> literal with a convenient alias) rather than some 'read-only' attribute for
> a conventional variable.
>
> But introducing that into Python would be a can of worms. (A named constant
> needs a compile-time expression on the right-hand side. The compiler needs
> to be able to see named constants which are in an imported module, and which
> are accessed via attributes, and so on.)

Not sure about that. Consider:

MAX_SIZE = 1 << 16

def some_func(some_file):
    data = some_file.read(MAX_SIZE)
    return data

Currently, this disassembles to show that MAX_SIZE is being fetched
with LOAD_GLOBAL. If, instead, it were LOAD_CONST, this would mean
that rebinding MAX_SIZE after function definition would fail; but it
would run faster. That's the advantage of a "declared constant", even
without it being any sort of compile-time constant. As long as it can
be finalized before the function is defined, it can be called
constant.
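
A quick way to verify is with the dis module (exact output varies
between CPython versions):

import dis
dis.dis(some_func)   # the MAX_SIZE fetch shows up as LOAD_GLOBAL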

>> Maybe, but I honestly don't miss 'switch' all that often - and when I
>> do, it's usually because I want a range. Consider this
>> semi-hypothetical Pike code:
>>
>> switch (key)
>> {
>>      case 'A'..'Z': case 'a'..'z': return "Letter";
>>      case 0xFF00..0xFFFF: return "Special key";
>>      case '.': case ',': case '!': return "Punctuation";
>>      default: return "Other";
>> }
>>
>> With a small number of cases, this wouldn't be too bad in if/elif
>> form; but as it gets bigger, the value of a switch block becomes
>> evident. The normal advice in Python is "use a dispatch dictionary",
>> but that would be impractical here. So this would end up being a
>> massive if/elif tree, because there's just no better way to do this.
>
>
> Your example doesn't really need a switch: a range check and a predefined
> list or tuple which maps 0 to 255 to a certain string.

The key here isn't bounded to 0..255. Even if it were, I don't want to have
to represent this in my code with anything other than a range. (If
it's represented internally with a lookup table, that's fine, as long
as it's invisible to my code.) Plus, the value of a switch block comes
from arbitrary code, not toy examples that return constant strings; if
the only uses of switch were like this, sure, we could all use
dictionaries easily.
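
For comparison, the if/elif tree in Python would look something like
this (a sketch, assuming key is an integer character/key code as in
the Pike version):

def describe(key):
    if ord('A') <= key <= ord('Z') or ord('a') <= key <= ord('z'):
        return "Letter"
    elif 0xFF00 <= key <= 0xFFFF:
        return "Special key"
    elif key in (ord('.'), ord(','), ord('!')):
        return "Punctuation"
    else:
        return "Other"

Tolerable at four cases; much less so at forty.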

>> That said, though: I suspect the reason nobody's gone and done this is
>> not that it wouldn't be any benefit, but that applications doing a lot
>> of integer code are usually going to see more benefit from Numpy,
>> Cython, or PyPy,
>
> You'd be surprised how much any kind of program relies on ad-hoc integer
> operations. It doesn't need to work with large int arrays for them to be
> important. (Look at the benchmark below: the inner loop is nearly all
> integer ops.)

Sure, but in real-world code, there are other considerations than just
integer performance. If someone waves a magic wand and improves
machine-word integer performance, great, but there are other calls on
core dev time.

Since it can be done without any externally-visible changes, it's a
"safe optimization" that any Python implementation would be free to
attempt.

> 'Switch' testing benchmark. The little program shown below reads a text file
> (I used the entire CPython C sources, 6MB) and counts the number of
> characters in each category: upper, lower, digit and other.
>
> (Note there are other ways to approach this task, but a proper 'lexer'
> usually does more than count. 'Switch' then becomes invaluable.)

Are you assuming that the files are entirely ASCII? (They're not.) Or
are you simply declaring that all non-ASCII characters count as
"other"?

Once again, you cannot ignore Unicode and pretend that everything's
ASCII, or eight-bit characters, or something. Asking if a character is
upper/lower/digit/other is best done with the unicodedata module.
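
Something like this does the counting Unicode-correctly (a sketch;
'Lu', 'Ll' and 'Nd' are the Unicode general categories for uppercase
letters, lowercase letters and decimal digits):

import unicodedata
from collections import Counter

def count_categories(text):
    counts = Counter()
    for ch in text:
        cat = unicodedata.category(ch)
        if cat == 'Lu':
            counts['upper'] += 1
        elif cat == 'Ll':
            counts['lower'] += 1
        elif cat == 'Nd':
            counts['digit'] += 1
        else:
            counts['other'] += 1
    return counts

The str methods isupper(), islower() and isdigit() are also
Unicode-aware in Python 3, if you'd rather skip the import.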

ChrisA


