[Python-ideas] A partial (wacky?) alternative to assignment expressions

Tim Peters tim.peters at gmail.com
Mon May 14 21:58:04 EDT 2018


[Steven D'Aprano <steve at pearwood.info>]
> I'm hoping that the arguments for assignment expressions will be over by
> Christmas *wink* so as a partial (and hopefully less controversial)
> alternative, what do people think of the idea of flagging certain
> expressions as "pure functions" so the compiler can automatically cache
> results from it?
>
> Let me explain: one of the use-cases for assignment expressions is to
> reduce repetition of code which may be expensive. A toy example:
>
>     func(arg) + func(arg)*2 + func(arg)**2
>
> If func() is a pure function with no side-effects, that is three times
> as costly as it ought to be:
>
>     (f := func(arg)) + f*2 + f**2
>
> Functional languages like Haskell can and do make this optimization all
> the time (or so I am lead to believe), because the compiler knows that
> func must be a pure, side-effect-free function. But the Python
> interpreter cannot do this optimization for us, because it has no way of
> knowing whether func() is a pure function.
>
> Now for the wacky idea: suppose we could tell the interpreter to cache
> the result of some sub-expression, and re-use it within the current
> expression? That would satisfy one use-case for assignment operators,
> and perhaps weaken the need for := operator.
>
> Good idea? Dumb idea?

Despite that Haskell can do optimizations like this, its "let ... in
..." and "... where ..." constructs (which create names for
expressions, for use in another expression or code block) are widely
used anyway.  They don't care about the optimization (they already get
it), but about improving clarity.  In Haskell they'd spell it like,
e.g., (mixing Python with Haskell keywords in UPPERCASE)

    LET fa = func(arg) IN fa + fa*2 + fa**2

which the compiler may (but almost certainly won't) optimize further to

    LET fa = func(arg) IN fa * (3 + fa)

if it knows that fa is of a type for which that makes sense.

In Python today, I expect most people would do it as

    t = func(arg)
    t + 2*t + t*t  # or t*(3+t)

because they also know that multiplying t by itself once is usually
faster than squaring ;-)  And they wouldn't _want_ all the redundant
typing in

     func(arg) + func(arg)*2 + func(arg)**2

anyway.
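For concreteness, here's a small self-check (my sketch, not from the
thread; `func` and `arg` are stand-ins) confirming that hoisting the
call into a local really does evaluate it only once:

```python
# Count how many times func() actually runs.
calls = 0

def func(x):
    global calls
    calls += 1
    return x + 1

arg = 4
t = func(arg)              # evaluated exactly once
result = t + t*2 + t*t     # same value as func(arg) + func(arg)*2 + func(arg)**2

assert calls == 1          # one call, not three
assert result == 5 + 10 + 25
```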

So I'm not saying "good" or "bad", but that it needs a more compelling use case.


> Good idea, but you want the assignment operator regardless?

I'd probably write the example the way "I expect most people would do
it" above even if we do get assignment expressions.


> I don't have a suggestion for syntax yet, so I'm going to make up syntax
> which is *clearly and obviously rubbish*, a mere placeholder, so don't
> bother telling me all the myriad ways it sucks. I know it sucks, that's
> deliberate. Please focus on the *concept*, not the syntax.
>
> We would need to flag which expression can be cached because it is PURE,
> and tag how far the CACHE operates over:
>
>     <BEGIN CACHE>
>         <PURE>
>             func(arg)
>         <END PURE>
>         + func(arg)*2 + func(arg)**2
>     <END CACHE>

That syntax is clearly and obviously rubbish!  It sucks.  You're welcome ;-)


> This would tell the compiler to only evaluate the sub-expression
> "func(arg)" once, cache the result, and re-use it each other time it
> sees that same sub-expression within the surrounding expression.
>
> To be clear: it doesn't matter whether or not the sub-expression
> actually is pure. And it doesn't have to be a function call: it could be
> anything legal in an expression.
>
> If we had this, with appropriately awesome syntax, would that negate the
> usefulness of assignment expressions in your mind?

The use cases I envision for that have no intersection with use cases
I have for assignment expressions, so, no.

My first thought about where it might be handy probably has no
intersection with what you were thinking of either ;-)

    <BEGIN CACHE>
        <PURE>
             math.ceil
             math.floor
        <END PURE>
        def int_away_from_zero(x):
            if x >= 0:
                return math.ceil(x)
            else:
                return math.floor(x)
    <END CACHE>

The body of `int_away_from_zero()` is the way I _want_ to write it.
But in heavily used functions it's expensive to look up "math", then
look up its "ceil" (or "floor") attribute, on every call.  So stuff
like this often abuses default arguments instead:

        def int_away_from_zero(x, mc=math.ceil, mf=math.floor):
            if x >= 0:
                return mc(x)
            else:
                return mf(x)

As the function grows over time, the default arg abuse grows, and the
body of the function gets more obscure as more-&-more "tiny names" are
introduced to save on repeated global and module attribute lookups.
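To put a rough number on it, here's a small `timeit` sketch (mine, not
from the thread) comparing the two lookup paths; on a typical CPython
the default-argument version measures faster, though the exact ratio
varies by build:

```python
import math
import timeit

def slow(x):
    # Every call does a global lookup of "math", then an attribute
    # lookup of "ceil" on the module object.
    return math.ceil(x)

def fast(x, mc=math.ceil):
    # mc was bound once, at function-definition time; inside the body
    # it's a cheap local-variable load.
    return mc(x)

assert slow(2.3) == fast(2.3) == 3

t_slow = timeit.timeit("f(2.3)", globals={"f": slow}, number=100_000)
t_fast = timeit.timeit("f(2.3)", globals={"f": fast}, number=100_000)
```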

Indeed, in many cases I'd like to wrap an entire module in <BEGIN
CACHE> ... <END CACHE>, with oodles of "module.attribute" thingies
in the <PURE> block.  _Most_ of my code gets no benefit from Python's
"extremely dynamic" treatment of module.attribute.  It would be great
if Python could do those lookups once at module import time, then
generate some kind of `LOAD_GLOBAL_FAST index` opcode to fetch the
results whenever they're used anywhere inside the module.
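You can approximate that by hand today, at the cost of more names:
bind the attributes once at module level, so function bodies pay a
plain global lookup instead of a module lookup plus an attribute
lookup. A sketch (the leading-underscore names are my convention, not
anything special):

```python
import math

# Looked up once, when this module is imported.
_ceil = math.ceil
_floor = math.floor

def int_away_from_zero(x):
    # _ceil and _floor are ordinary globals here: one dict lookup each,
    # instead of "find math, then find its attribute" on every call.
    return _ceil(x) if x >= 0 else _floor(x)

assert int_away_from_zero(2.1) == 3
assert int_away_from_zero(-2.1) == -3
```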

Which would doubtless delight all the people struggling to cut
Python's startup time - "Jeez Louise - now he wants Python to do even
_more_ at import time?!" ;-)

There are, e.g., other cases where invariant values of the form `n+1`
or `n-1` are frequently used in a long function, and - cheap as each
one is - it can actually make a time difference if they're
pre-computed outside a loop.  I'm ashamed of how many variables I have
named "np1" and "nm1" :-(
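A toy illustration of that pattern (my own contrived example, not
real code from anywhere): the `n+1` and `n-1` values don't change
across iterations, so they're computed once before the loop.

```python
def scaled(values, n):
    # Hoisted loop invariants - recomputing n+1 and n-1 on every
    # iteration is cheap, but it adds up in a hot loop.
    np1 = n + 1
    nm1 = n - 1
    return [v * np1 + nm1 for v in values]

assert scaled([1, 2], 3) == [6, 10]
```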

So there's some interesting stuff to ponder here!

