Best search algorithm to find condition within a range

Chris Angelico rosuav at gmail.com
Fri Apr 10 09:41:54 EDT 2015


On Fri, Apr 10, 2015 at 11:18 PM, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
> On Fri, 10 Apr 2015 03:56 am, Chris Angelico wrote:
>
>> On Fri, Apr 10, 2015 at 3:35 AM, Steven D'Aprano
>> <steve+comp.lang.python at pearwood.info> wrote:
>>> It's not so much the undefined behaviour part that gets me. That's bad
>>> enough. But it's the concept that having the compiler ignore what you
>>> write in the source code, because you've accidentally hit some undefined
>>> part of the spec, *is a feature* rather than a horrible, horrible design
>>> flaw, that blows my mind.
>>
>> In Python, you can create an assert with a side effect. This is
>> something you should never do,
>
> "Never" is a bit strong. There's nothing wrong with:
>
>
> assert log("some message") or condition
>
> for example.
>
>
>> and the compiler is free to optimize that side effect out.
>
> Not really. The compiler is free to not execute the assertion at all, which
> is not the same thing. A Python compiler which determined that log() always
> returned None, and optimized the assertion to "assert condition", would be
> buggy.

All you're saying is that, if the compiler optimizes out the side
effect, it must also remove the assertion. Doesn't change the fact
that it's something it's allowed to do.
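
For instance, a minimal sketch (the log() helper here is purely
illustrative, echoing the example above):

import sys

def log(msg):
    # Side effect: write to stderr. Returns None, which is falsy,
    # so "log(...) or condition" still tests the condition.
    print(msg, file=sys.stderr)

def spam(x):
    assert log("checking x") or x > 0
    return x * 2

spam(5)

Run normally, "checking x" is printed and the condition is tested. Run
with "python -O", the entire assert statement is skipped because
__debug__ is false: the side effect and the assertion disappear
together, exactly as they must.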

> Suppose you write this function in Python:
>
> def value_or_fallback(thingy):
>     print("The value of thingy is %d" % thingy)
>     return int(thingy) if thingy is not None else 42
>
>
> `value_or_fallback` is buggy, because %d of None is an error. In actual
> Python, the semantics of this are completely well defined: a runtime
> TypeError. But in BizarroPython, that operation is undefined behaviour. So
> the compiler can optimize that to:
>
> def value_or_fallback(thingy):
>     print("The value of thingy is %d" % thingy)
>     return int(thingy)
>
> BizarroPython can reason that if `thingy` is None, the first line (the print
> call) is already performing undefined behaviour, so the compiler can do
> anything, including act as if `thingy` is not None.

In other words, the definition of "undefined behaviour" is "something
the programmer will never do". So if one branch of a condition is
undefined, the compiler can rightly assume that you will never execute
that branch, and optimize it out. Here's a C++ example that I actually
ran into at one point, ported brutally to Python:

class TreeNode:
    def __init__(self, payload):
        self.left = self.right = None
        self.payload = payload
    def walk(self, func):
        if self.left: self.left.walk(func)
        func(self.payload)
        if self.right: self.right.walk(func)

(Omitting all the stuff about how they get linked together into a tree
as that's not significant here.)

Now, this works fine. But I thought to myself: there's duplication of
the null-pointer checks - why not put that in just one place? So I
rewrote the walk function like this:

    def walk(self, func):
        if self is None: return
        self.left.walk(func)
        func(self.payload)
        self.right.walk(func)

(It doesn't look like much here, but imagine dozens of places
elsewhere as well that used to have to go "if node: node.walk(f)" and
now just go "node.walk(f)".)

In C++, a null pointer still has its proper type, so calling walk() on
it would indeed come through to this function, unlike in Python where
None is a completely separate type. But it's illegal to call methods
on null pointers, so the compiler quite correctly optimized out the
initial check. But try taking this just on its own, without the
explanatory lead-up; would you ever expect that a method would be
called with self set to None? Of course not. Sure you _could_, but it
doesn't make sense, so you wouldn't.
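
If you really do want the check in just one place, a safer route is a
null-object sentinel whose walk() is a no-op - a sketch (the NullNode
name is illustrative, not from the original code):

class NullNode:
    # Sentinel standing in for "no child"; walking it does nothing.
    def walk(self, func):
        pass

NULL_NODE = NullNode()

class TreeNode:
    def __init__(self, payload):
        self.left = self.right = NULL_NODE
        self.payload = payload
    def walk(self, func):
        self.left.walk(func)
        func(self.payload)
        self.right.walk(func)

No method is ever invoked on a null pointer, so there's nothing for
the optimizer to reason away.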

> To put it another way, the compiler knows that if thingy is None, the print
> call will crash before reaching the return call, so it can ignore the error
> checking around the return. If thingy is None, the error checking will
> never get the chance to run. If thingy isn't None, the error checking is
> superfluous. (If this reasoning seems dubious to you, it's because it is.)

No, it's not dubious reasoning. You've violated a contract with the
compiler by requesting something that's undefined, ergo all bets are
off.

> So far, this isn't too surprising. If we don't think too hard about it, we
> might even consider this to be reasonable behaviour. But now the undefined
> behaviour of `value_or_fallback` infects everything it touches, and allows
> BizarroPython free rein to brutalise your code with a rusty shovel.
>
>
> def unwitting(door_is_open):
>     if door_is_open:
>         walk_on_in()
>     else:
>         ring_bell()
>         # wait for the door to open using the fallback value
>         fallback = value_or_fallback(thingy)
>         wait_for_door_to_open(fallback)

What's thingy here? Is that meant to be None?

> BizarroPython can optimize this entire function to:
>
> def unwitting(door_is_open):
>     walk_on_in()

If you do indeed call value_or_fallback with None as its argument,
then yes, that's quite correct.
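
The fix, of course, is to not invoke undefined behaviour in the first
place. A sketch of the same function with the None case handled before
the format ever runs:

def value_or_fallback(thingy):
    # Handle None up front, so %d never sees an invalid value.
    if thingy is None:
        return 42
    print("The value of thingy is %d" % thingy)
    return int(thingy)

Now there's nothing for BizarroPython to exploit.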

> Now let's start travelling in time:
> [chomp example]
>
> All this is well and good, if your intention was to crash faster. But if
> you're trying to debug the crash, you have lost a valuable debugging hint.
> If you thought you could debug your code by tracing it in your head, or on
> paper, you've got a nasty surprise waiting. You might reason that since
> the bell never gets rung, the bug causing the crash must be somewhere
> *before* the call to ring_bell, but that's not the case. The bug occurs after
> ring_bell, and propagates "backwards in time" to crash the application
> before reaching the buggy code.

This is where stuff gets distinctly weird, yes. Welcome to the
debugging of optimized code. All sorts of "obvious expectations" go
out the window. There's no guarantee that your code will be executed
in any semblance of the order it was written in, and things can affect
each other in ways that you wouldn't have predicted.

>> (At least, I think that's the case. Is it
>> legal for a Python implementation to fully evaluate asserts and then
>> simply not raise the exception?)
>
> The documented semantics of assertions in Python, like most (all?) other
> languages which have them, is that assertions either run or don't run. What
> doesn't happen is for them to run but suppress any exceptions. That would
> be pointless: all the expense of running the test, without the benefit of
> seeing whether the test fails or not.

Of course it's pointless, but I was wondering whether or not it's
legal. I was assuming it wasn't legal, and you're supporting that
belief.

>> Take this example:
>>
>> def spam(lst):
>>     assert lst[0] < lst[1]
>>     # carry on with the rest of the code
>>
>> If lst is a Python list, this is side-effect free. But what if it
>> isn't? What if __getitem__ actually mutates the object? The assertion
>> can be dropped, and the getitem calls not made. Is that a horrible,
>> horrible design flaw in Python, or is it an ill-designed object that
>> does things that programmers should be allowed to assume never happen?
>
> Neither. This is perfectly reasonable behaviour, side-effects or not.
>
> assert can be considered syntactic sugar for this:
>
> # assert expression
> if assertions_enabled:  # in CPython, this flag is spelled __debug__
>     if not expression: raise AssertionError

Yes, it's better defined. Doesn't change the fact that a lot of people
would be surprised if this happened.

And, in fact, *are* surprised when suppressing assertions breaks their
code. But it's not the suppression of assertions that broke their
code; their code was already broken. If your C code triggers any form
of undefined behaviour, *your code is already broken*. The compiler's
optimizations do not change this.
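
A sketch of how that surprise plays out, using a deliberately
ill-behaved container (the PoppingList name is illustrative):

class PoppingList:
    # Deliberately ill-behaved: __getitem__ removes the item it returns.
    def __init__(self, items):
        self.items = list(items)
    def __getitem__(self, index):
        return self.items.pop(index)

def spam(lst):
    assert lst[0] < lst[1]
    # With assertions enabled, the assert has popped two items by now.
    # Under "python -O" the assert never runs and nothing is popped.
    return lst.items

print(spam(PoppingList([1, 5, 9])))  # [5] normally; [1, 5, 9] with -O

Disabling assertions didn't break spam(); spam() was broken from the
start.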

ChrisA


