[Python-ideas] Optimizing builtins

Michael Foord fuzzyman at voidspace.org.uk
Mon Jan 3 15:33:06 CET 2011


On 31/12/2010 21:51, Guido van Rossum wrote:
> On Fri, Dec 31, 2010 at 11:59 AM, Michael Foord
> <fuzzyman at voidspace.org.uk> wrote:
>>
>> On 31 December 2010 18:49, Guido van Rossum <guido at python.org> wrote:
>>> [Changed subject *and* list]
>>>
>>>> 2010/12/31 Maciej Fijalkowski <fijall at gmail.com>
>>>>> How do you know that range is a builtin you're thinking
>>>>> about and not some other object?
>>> On Fri, Dec 31, 2010 at 7:02 AM, Cesare Di Mauro
>>> <cesare.di.mauro at gmail.com> wrote:
>>>> By a special opcode which could do this work. ]:-)
>>> That can't be the answer, because then the question would become "how
>>> does the compiler know it can use the special opcode". This particular
>>> issue (generating special opcodes for certain builtins) has actually
>>> been discussed many times before. Alas, given Python's extremely
>>> dynamic promises it is very hard to do it in a way that is
>>> *guaranteed* not to change the semantics. For example, I could have
>>> replaced builtins['range'] with something else; or I could have
>>> inserted a variable named 'range' into the module's __dict__. (Note
>>> that I am not talking about just creating a global variable named
>>> 'range' in the module; those the compiler could recognize. I am
>>> talking about interceptions that a compiler cannot see, assuming it
>>> compiles each module independently, i.e. without whole-program
>>> optimizations.)
>>>
>>> Now, *in practice* such manipulations are rare
>> Actually range is the one I've seen *most* overridden, not in order to
>> replace functionality but because range is such a useful (or relevant)
>> variable name in all sorts of circumstances...
> No, you're misunderstanding. I was not referring to overriding a
> name using Python's regular syntax for defining names. If you set a
> (global or local) variable named 'range', the compiler is perfectly
> capable of noticing. E.g.:
>
>    range = 42
>    def foo():
>      for i in range(10): print(i)
>

Right, in the same way the compiler notices local and global variable 
use and compiles different bytecode for lookups.
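
A minimal sketch of that difference, using the standard dis module. The
function names and the load_ops helper are made up for illustration, and
exact opcode names vary between CPython versions, so the helper just
collects whatever LOAD_* instructions mention 'range':

```python
import dis


def uses_global():
    # 'range' is never assigned in this function, so the compiler emits
    # a global lookup (which falls back to builtins at run time).
    return range


def uses_local():
    # The assignment makes 'range' a local name for the whole function.
    range = 42
    return range


def load_ops(func):
    """Return the names of the LOAD_* opcodes that refer to 'range'."""
    return [
        ins.opname
        for ins in dis.get_instructions(func)
        if ins.argval == "range" and ins.opname.startswith("LOAD_")
    ]


print(load_ops(uses_global))
print(load_ops(uses_local))
```

On CPython the first list should show a LOAD_GLOBAL variant and the
second a LOAD_FAST variant.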

It's just that accidentally overriding range is the source of my 
favourite "confusing Python error messages" story and I look for any 
opportunity to repeat it.

A few years ago I worked for a company where most of the (very talented) 
developers were new to Python. They called me over to explain what 
"UnboundLocalError" meant and why they were getting it in what looked 
(to them) like perfectly valid code. The code looked something like:

def something(start, stop):
    positions = range(start, stop)

    # more code here...

    range = process(positions)
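
For anyone who has not hit this before, here is a runnable sketch of the
failure. The process function is a hypothetical stand-in for whatever the
real code called. Because range is assigned somewhere in the function
body, the compiler treats it as local throughout the function, so the
first line reads a local that has not been bound yet:

```python
def process(positions):
    # Hypothetical stand-in for the real processing step.
    return list(positions)


def something(start, stop):
    # 'range' is local to this function (see the assignment below),
    # and it has no value yet, so this line raises UnboundLocalError.
    positions = range(start, stop)

    # This assignment is what makes 'range' a local name.
    range = process(positions)
    return range


error = None
try:
    something(0, 3)
except UnboundLocalError as exc:
    error = exc

print(type(error).__name__)
```

Renaming the variable on the last line of something() (to, say,
processed) would leave range as a global/builtin lookup, and the function
would run.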

All the best,

Michael Foord
> While this will of course fail with a TypeError if you try to execute
> it, a (hypothetical) optimizing compiler would have no trouble
> noticing that the 'range' in the for-loop must refer to the global
> variable of that name, not to the builtin of the same name.
>
> I was referring to an innocent module containing a use of the builtin
> range function, e.g.
>
>    # a.py
>    def f():
>      for i in range(10): print(i)
>
> which is imported by another module which manipulates a's globals, for example:
>
>    # b.py
>    import a
>    a.range = 42
>    a.f()
>
> The compiler has no way to notice this when a.py is being compiled.
>
> Variants of "hiding" a mutation like this include:
>
>    a.__dict__['range'] = 42
>
> or
>
>    import builtins
>    builtins.range = 42
>
> and of course for more fun you can make it more dynamic (think
> obfuscated code contests).
>
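
The a.py / b.py scenario can be sketched in a single file. This assumes a
module object built with types.ModuleType and exec as a stand-in for
actually importing a.py, and it swaps in a callable rather than 42 so the
effect is visible without a TypeError. Both choices are illustrative
assumptions, not part of the quoted example:

```python
import types

# Source of the "innocent" module; it only ever uses the builtin range.
source = """
def f():
    return [i for i in range(3)]
"""

a = types.ModuleType("a")      # stand-in for 'import a'
exec(source, a.__dict__)       # compile and run a.py's code in a's namespace

before = a.f()                 # finds the builtin range

# What b.py does: inject a global named 'range' into module a.
# Nothing in a.py's source hinted at this when it was compiled.
a.range = lambda n: "xyz"
after = a.f()                  # same bytecode, different 'range'

del a.range                    # remove the shadowing global again
restored = a.f()               # lookup falls back to the builtin

print(before, after, restored)
```

The function f is compiled once, yet its behaviour changes with the
contents of the module's __dict__ at call time, which is why a per-module
compiler cannot safely assume that range means the builtin.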


-- 
http://www.voidspace.org.uk/

May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing http://www.sqlite.org/different.html



