[Python-ideas] combining two threads: switch statements and inline functions

Wed Feb 12 08:36:15 CET 2014

From: Skip Montanaro <skip at pobox.com>

Sent: Tuesday, February 11, 2014 9:35 PM

> On Tue, Feb 11, 2014 at 10:06 PM, Haoyi Li <haoyi.sg at gmail.com> wrote:
>>  You can totally write dictionaries with inline functions (i.e. lambdas) in them.
> 
> Lambdas are not what I think of when I think of inline functions.
> Substituting them for named functions gains you nothing. They are just
> (very limited) functions you call in the usual fashion, and which have
> all the usual overhead you associate with calling Python functions.
> 
> I think of inline functions as they exist in C++ (or before that, in
> GCC).

Then I think you're thinking of them wrong.

> If you declare a function to be inline, the compiler is free to
> inline the body of the function's code at the call point and make the
> necessary fix-ups to the prolog and epilog to preserve semantics (as
> if it had not been declared inline), but eliminate call overhead, and
> much of the stack manipulation.

The compiler is _already_ free to do this for any function call where the definition is available in the current translation unit, even if you didn't declare it inline. And, conversely, the compiler is free to insert a regular function call even if you did declare it inline. Effectively, inline is just a hint. Or, as the standard puts it (7.1.2):

> The inline specifier indicates to the implementation that inline substitution of the 
> function body at the point of call is to be preferred to the usual function call 
> mechanism. An implementation is not required to perform this inline substitution 
> at the point of call; however, even if this inline substitution is omitted, the 
> other rules for inline functions defined by 7.1.2 shall still be respected.

There are a few minor differences with respect to the ODR and so forth, but basically, a C++ inline function is just a plain old function with a compiler hint attached to it.

Meanwhile, unlike the proposals in the earlier threads, C++ inline functions, even if they really worked the way you think they do, wouldn't be relevant to what people are talking about in this thread. They don't get dynamic scoping or any other special scoping, can't force a return from the calling function, can't be declared anywhere a normal top-level function couldn't, still have external linkage, etc.

> This works in statically typed languages like C and C++, because at
> compile time the compiler knows everything about the types of the
> functions arguments and return values, as well as the environment in
> which the function and the call point exist. I don't think that kind
> of function inlining would be possible in CPython. At minimum, the
> compiler simply doesn't know the types of the arguments to the
> function call at compile time.

How is that relevant? If anything, dynamic typing makes inlining _easier_. In C++, you need to know the argument types in order to pass them on the stack or in registers: they have different sizes, and some even have to go in different registers (e.g., floats vs. ints on most platforms). In Python, every argument is pushed on the stack in the same way. In CPython, under the covers, they're all just PyObject pointers.
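
To make that concrete, here is a minimal sketch (the names call and add are made up for illustration). One compiled function handles ints, floats and strings, because the call site never needs to know the argument types. The exact opcode names depend on the CPython version, so I'm not showing a listing.

```python
import dis

def call(f, a, b):
    # The compiler emits the same bytecode whatever types a and b turn out to have.
    return f(a, b)

def add(x, y):
    return x + y

# The same compiled code object is used for ints, floats and strings.
results = [call(add, 1, 2), call(add, 1.5, 2.5), call(add, "a", "b")]

# Opcode names vary between CPython versions; print them to see your own.
opnames = [ins.opname for ins in dis.get_instructions(call)]
print(results)
print(opnames)
```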

> Heck, it probably wouldn't even know
> (except in the most trivial of circumstances) that any particular
> function available is the one to inline. 

That, on the other hand, is a real issue. Callables are usually attributes on a module or object, looked up by name at runtime, which means there is nothing to inline until runtime. The only case where you could meaningfully inline a function call would be if the function were defined locally within the calling function.
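
A small sketch of the problem (Greeter and caller are hypothetical names, not anything from the thread). The call site is compiled once, but what it ends up calling can change at any time, so there is no fixed body to substitute at compile time. Module-level names behave the same way.

```python
class Greeter:
    def greet(self):
        return "hello"

def caller(g):
    # 'greet' is looked up on g each time this line runs.
    return g.greet()

g = Greeter()
before = caller(g)

# Rebind the attribute at runtime; caller's compiled code is unchanged.
Greeter.greet = lambda self: "goodbye"
after = caller(g)

print(before, after)
```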

Meanwhile, your focus on optimizing out the cost of a function call is not nearly as relevant to Python as it is to C++. Python code is rarely micro-optimized in the way that C++ code is. C++ is, in fact, designed specifically for this kind of micro-optimization. In Python, looking up the function's name already costs orders of magnitude more than the stack operations that C++ programmers are trying to optimize out. And even in (modern) C++, this isn't nearly as important an optimization as many programmers seem to think: when it really does matter, automated optimization (especially PGO) almost always does a better job than the programmer's hints anyway. This is why most C++ compilers had to add non-standard "force-inline" directives for the rare cases where the programmer really does know better than the optimizer, because generally, he doesn't.
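
If you want to see what a call costs on your own machine, a rough timeit sketch like the following will do (inc is a hypothetical helper, and the loop count is arbitrary). It times the whole call, name lookup included, against the equivalent inline expression. The numbers vary by interpreter version and hardware, so I'm not quoting any.

```python
import timeit

# Setup code run once before each timing; 'inc' and 'x' are illustrative only.
SETUP = """
def inc(x):
    return x + 1
x = 41
"""

# Same arithmetic, once written inline and once behind a function call.
inline_time = timeit.timeit("x + 1", setup=SETUP, number=200_000)
call_time = timeit.timeit("inc(x)", setup=SETUP, number=200_000)

print(f"inline expression: {inline_time:.4f}s")
print(f"function call:     {call_time:.4f}s")
```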

Finally, just like C++, Python could do this optimization without any need for a directive. And, besides the fact that Python is not designed for manual micro-optimization, the more limited scope in which it could work makes the directive less useful anyway. In C++, methods defined inside class definitions are automatically considered to be inline function definitions. Originally, this was a clever hack to get around the ODR while fitting Modula-style class definitions into the traditional C .h/.c model, but it was kept around in C++98 because, idiomatically, methods defined inside a class are generally good candidates for inlining. In the same way, in Python, functions defined inside the function they're called in would generally be good candidates for inlining. And, since those are the only functions that could legally be inlined, why make the programmer add anything else?
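
As a sketch of the kind of candidate I mean (the function names are made up for illustration): the first version defines a helper locally and calls it, and the second is what a hand-inlined version would look like. Nothing outside total_with_helper can rebind square, which is what would make the substitution safe in principle. This is not something CPython is claimed to do here, just what the transformation amounts to.

```python
def total_with_helper(values):
    # 'square' is a local name, so no outside code can rebind it.
    def square(v):
        return v * v
    return sum(square(v) for v in values)

def total_inlined(values):
    # The same computation with the helper's body substituted at the call point.
    return sum(v * v for v in values)

data = [1, 2, 3]
with_helper = total_with_helper(data)
inlined = total_inlined(data)
print(with_helper, inlined)
```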