Undefined behaviour in C [was Re: The Cost of Dynamism]

Chris Angelico rosuav at gmail.com
Fri Mar 25 17:23:41 EDT 2016


On Sat, Mar 26, 2016 at 2:50 AM, Steven D'Aprano <steve at pearwood.info> wrote:
> On Fri, 25 Mar 2016 06:54 am, BartC wrote:
>
>>> In the case of C, the line is limited to working with some specific type
>>> (say, an int32). Even there, if the addition might overflow, the
>>> behaviour is undefined and the compiler can do anything it wants
>>> (including time travel,
>>
>> I'm pretty sure it can't do time travel...
>>
>>   or erasing your hard disk).
>
>
> You are absolutely, and potentially catastrophically, mistaken on that.
>
> Undefined behaviour does not mean "implementation specific behaviour". Nor
> does it mean "something sensible will happen but we don't promise what it
> will be". It means "the compiler can do anything", including ignoring the
> code you actually wrote and substituting its own faster code, which may or
> may not do what your original code did.

C's undefined behaviour is basically "you're not allowed to do this".
The compiler is free to assume that you will never do this, ergo it is
free to write out machine code that is correct only so long as you do
not do this. Let me draw a Python analogy:

1) A well-behaved iterator will return values until it raises
StopIteration, and then raise StopIteration thereafter.
2) An iterator which raises StopIteration and then goes back to
returning values is thus badly written and should not exist (one is
sketched just below this list).
3) A consumer of an iterator is allowed to misbehave in the event that
the iterator returns after raising.
4) Therefore an optimizer is free to *cause* the consumer to misbehave
when given an ill-behaved iterator.
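
To make that concrete, here's a minimal sketch of the kind of
ill-behaved iterator that rule 2 forbids (the class is mine, purely
for illustration):

class Flaky:
    """Raises StopIteration once, then starts returning values again."""
    def __init__(self):
        self.count = 0
    def __iter__(self):
        return self
    def __next__(self):
        self.count += 1
        if self.count == 3:
            raise StopIteration   # claims to be exhausted...
        return self.count         # ...but keeps producing afterwards

print(list(Flaky()))   # [1, 2] -- list() stops at the first
                       # StopIteration and never sees the "revived"
                       # values; another consumer could legitimately
                       # do something quite different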

Consider this code:

def zip_forever(*iters, fillvalue=None):
    """Like zip_longest, but without the silly rule that
    it should terminate once they all finish"""
    while True:
        yield tuple(next(it, fillvalue) for it in iters)

If all its iterators are well-behaved, this is identical (modulo
monkey-patching of names like "tuple" and "next") to:

from itertools import zip_longest

def zip_forever(*iters, fillvalue=None):
    yield from zip_longest(*iters, fillvalue=fillvalue)
    tup = (fillvalue,) * len(iters)
    while True: yield tup
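
For well-behaved input you can't tell the two apart.  A quick sketch
of my own (note that the next()-based version needs real iterators,
not bare iterables):

from itertools import islice

gen = zip_forever(iter([1, 2]), iter("abc"))
print(list(islice(gen, 4)))
# [(1, 'a'), (2, 'b'), (None, 'c'), (None, None)] -- and (None, None)
# forever after, whichever of the two definitions you use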

Is PyPy/FAT Python/some other optimizer permitted to make this change?
The only difference between C's "undefined behaviour" and the way
Python specifies things is the answer to that one question. C says
yes, you're totally allowed to assume the rules were followed; the
end result, in every case that follows the rules, should be
indistinguishable, and any time you *can* detect that something was
optimized away, it's because of a bug somewhere else. Is your code
allowed to misbehave when other code has bugs? That, ultimately, is
the question.
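
To put numbers on it, here's a sketch that reuses the hypothetical
Flaky iterator from above; zf_naive and zf_rewritten are just the two
definitions of zip_forever, renamed so they can coexist:

from itertools import islice, zip_longest

def zf_naive(*iters, fillvalue=None):
    while True:
        yield tuple(next(it, fillvalue) for it in iters)

def zf_rewritten(*iters, fillvalue=None):
    yield from zip_longest(*iters, fillvalue=fillvalue)
    tup = (fillvalue,) * len(iters)
    while True: yield tup

print(list(islice(zf_naive(Flaky()), 5)))
# [(1,), (2,), (None,), (4,), (5,)] -- keeps calling next() and sees
# the iterator come back to life
print(list(islice(zf_rewritten(Flaky()), 5)))
# [(1,), (2,), (None,), (None,), (None,)] -- zip_longest stopped
# listening at the first StopIteration

Only code that breaks rule 2 can see the difference.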

Imagine a tightened-up subset of the Python language that restricts
certain unusual behaviours. (This has the same purpose as PyPy's
RPython, and for all I know, RPython might do exactly this.) It's a
narrowly-used, single-purpose language, and its sole guarantee is that
well-written SubPython code will behave the same way in SubPython as
it does in CPython. It might then, for instance, not permit rebinding
of builtins, nor replacement of function default arguments, without an
explicit "drop_caches()" call. Code could then be far more
aggressively optimized - but behaviour would become undefined the
moment you break one of its rules. That's really all that C has done,
and you honestly don't have to get so panicky at the word "undefined".
It simply means "don't do this".

And let's face it... there's a *lot* of undefined behaviour in CPython
once you play with ctypes. Do we have any language guarantees as to
what will happen if you change the value of the integer 7 to now be 8?
Or will you just say "don't do that"?
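
For the morbidly curious: something along these lines has been known
to do it.  This is only a sketch -- it assumes a 64-bit CPython
before 3.12, leans entirely on the internal layout of PyLongObject
and the small-int cache, and nothing that happens afterwards is
guaranteed:

import ctypes

# Unsupported, implementation-specific poking: on 64-bit CPython < 3.12
# a small int is laid out as [ob_refcnt][ob_type][ob_size][ob_digit...],
# so the first digit sits 24 bytes past the address id() reports.
ctypes.c_uint32.from_address(id(7) + 24).value = 8

print(7)        # 8
print(3 + 4)    # also 8 -- every use of the cached "7" object now lies

So no, there are no guarantees; the answer really is "don't do that".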

ChrisA


