[Python-Dev] Re: Dynamic nested scopes

Guido van Rossum guido@python.org
Fri, 03 Nov 2000 15:31:51 -0500


>     Guido> If this is deemed a useful feature (for open()), we can make a
>     Guido> rule about which built-ins you cannot override like this and
>     Guido> which ones you can.

[Skip]
> I thought we were all adults...

And consenting as well... :-)

> For Py3k I think it should be sufficient to define the semantics of the
> builtin functions so that if people want to override them they can, but that
> overriding them in incompatible ways is likely to create some problems.
> (They might have to run with a "no optimize" flag to keep the compiler from
> assuming semantics, for instance.)  I see no particular reason to remove the
> current behavior unless there are clear instances where something important
> is not going to work properly.
> 
> Modifying builtins seems to me to be akin to linking a C program with a
> different version of malloc.  As long as the semantics of the new functions
> remain the same as the definition, everyone's happy.  You can have malloc
> leave a logfile behind or keep histograms of allocation sizes.  If someone
> links in a malloc library that only returns a pointer to a region that's
> only half the requested size though, you're likely to run into problems.

Actually, the C standard specifically says you are *not* allowed to
override standard library functions like malloc().

I'm thinking of the example of the rules in Fortran for intrinsic
functions (Fortran's name for built-ins).  Based on what Tim has told
me, I believe that Fortran by default assumes that you're not doing
anything funky with intrinsics (like sin, cos, tan), so it can use a
shortcut, e.g. inline them.  But there are also real functions by
these names in the Fortran standard library, and you can call those by
declaring e.g. "external function sin".  (There may also be an
explicit way to say that you're happy with the intrinsic one.)  I
believe that when you use the external variant, they may be overridden
by the user.

I'm thinking of something similar here for Python.  If the bytecode
compiler knows that the builtins are vanilla, it can generate better
(== more efficient) code for e.g.

   for i in range(10):
       ...

Ditto for expressions like len(x) -- the len() operation is typically
so fast that the cost is dominated by the two dict lookup operations
(first in globals(), then in __builtins__).
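
A minimal sketch of that lookup order, assuming a present-day Python 3
interpreter; the helper name length_of is made up for illustration.
It shows that a global named len is found before the built-in one,
which is why the compiler cannot currently assume len means the
built-in:

```python
def length_of(x):
    # `len` is resolved at call time: first in this module's globals,
    # then in the builtins namespace.
    return len(x)

plain = length_of([1, 2, 3])          # found in builtins

# Shadow the built-in with a global of the same name (hypothetical stub).
length_of.__globals__["len"] = lambda x: -1
shadowed = length_of([1, 2, 3])       # found in globals first

# Remove the global again; the lookup falls through to builtins.
del length_of.__globals__["len"]
restored = length_of([1, 2, 3])
```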

Why am I interested in this?  People interested in speed routinely use
hacks that copy a built-in function into a local variable so that they
don't have dict lookups in their inner loop; it's really silly to have
to do this, and if certain built-ins were recognized by the compiler
it wouldn't be necessary.  There are other cases where this is not so
easy without much more analysis; but the built-ins (to me) seem
low-hanging fruit.  (Search the archives for that term, I've used it
before in this context.)
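
For readers who haven't seen the hack, here is a rough sketch of one
common form of it (binding the built-in as a default argument); the
function names and sample data are hypothetical.  Whether it is
measurably faster depends on the interpreter version, so no timings
are claimed here:

```python
def total_length_global(items):
    total = 0
    for item in items:
        # `len` is looked up by name (globals, then builtins) each time.
        total += len(item)
    return total

def total_length_local(items, len=len):
    # The default argument binds the built-in once, at definition time,
    # so `len` is an ordinary local variable inside the loop.
    total = 0
    for item in items:
        total += len(item)
    return total

data = ["a", "bb", "ccc"]
global_result = total_length_global(data)
local_result = total_length_local(data)
```

Both versions compute the same result; the second merely avoids the
repeated name lookup, at the cost of a misleading extra parameter.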

I assume that it's *really* unlikely that there are people patching
the __builtin__ module to replace the functions that are good inline
candidates (range, len, id, hash and so on).  So I'm interested in
complicating the rules here.  I'd be happy to make an explicit list of
those builtins that should not be messed with, as part of the language
definition.  Programs that *do* mess with these have undefined
semantics.

--Guido van Rossum (home page: http://www.python.org/~guido/)