Undefined behaviour in C [was Re: The Cost of Dynamism]

Steven D'Aprano steve at pearwood.info
Sun Mar 27 03:50:58 EDT 2016


On Sun, 27 Mar 2016 01:30 am, Chris Angelico wrote:

> On Sun, Mar 27, 2016 at 1:09 AM, BartC <bc at freeuk.com> wrote:
>> I'm surprised that both C and Python allow statements that apparently do
>> nothing. In both, an example is:
>>
>>   x
>>
>> on a line by itself. This expression is evaluated, but then any result
>> discarded. If there was a genuine use for this (for example, reporting
>> any error with the evaluation), then it would be simple enough to require
>> a keyword in front.
> 
> Tell me, which of these is a statement that "does nothing"?
> 
> foo
> foo.bar
> foo["bar"]
> foo.__call__
> foo()
> int(foo)
> 
> All of them are expressions to be evaluated and the result discarded.

Right. And with the exception of the first, all of them could call arbitrary
code (a property, __getattr__, __getitem__, etc.) and hence could have
side-effects.

But the first one is special. It can do only one of two things: evaluate the
name and silently discard the result, or raise NameError.

Bart's point is that Python *could* (and arguably should) define the first
one, a bare name, as a SyntaxError. If you want to test for the existence
of a name, you would have to write something like (let's say):

ifundef raw_input:
    raw_input = input


One might argue that according to the Zen, this is more Pythonic than the
status quo of a try...except. It's an explicit test of whether a name is
undefined, and it avoids at least some silent errors (where an expression
with no side-effects is evaluated, and the result silently thrown away).
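For comparison, here is the status quo spelled out: the standard 2-to-3
compatibility idiom, in which the bare name is evaluated purely for the
NameError it may raise:

```python
# Status quo: probe for a name's existence with try...except.
try:
    raw_input            # evaluate the bare name; on Python 3 this raises
except NameError:        # NameError, because raw_input no longer exists
    raw_input = input    # fall back to the Python 3 builtin
```

The bare name on the `try` line is exactly the construct under discussion:
its only useful effect is the exception it might raise.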

We could argue the pros and cons of the two approaches, or even a more
radical approach like JavaScript's, where a declared but unassigned name
evaluates to undefined. But one thing is certain: there is a class of error
which occurs only because it is legal to evaluate a bare name and do nothing
with the result.
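To make that class of error concrete, here is a hypothetical sketch (the
function and names are invented for illustration): a forgotten `return`
leaves a bare name on a line by itself, so the expression is evaluated, the
result is silently discarded, and the caller gets None with no error raised
anywhere:

```python
def clean(lines):
    stripped = [line.strip() for line in lines]
    stripped    # oops: meant `return stripped`; the name is legally
                # evaluated and the result silently thrown away

result = clean(["a ", " b"])
print(result)   # prints None, not ['a', 'b']; no exception is ever raised
```

Were bare-name statements a SyntaxError, this bug would be caught at compile
time instead of surfacing later as a mysterious None.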


> Point is, CPython can generally assume that bug-free code will never
> get anywhere *near* the limit of a signed integer. Consequently, C's
> undefined behaviour isn't a problem; it does NOT mean we need to be
> scared of signed integers.

I think you have badly misunderstood the nature of the problem.

My C is a bit rusty, so excuse me if I get the syntax wrong. I have a
function:

void foo(int n) {
    int i = n + 1;
    bar(i);
}

There's a possible overflow of a signed int in there: if n equals MAXINT,
then n + 1 overflows, and signed integer overflow is undefined behaviour.
Now, you might think to yourself:

"Well, that's okay. So long as n is not equal to MAXINT, the overflow will
never occur, which means the undefined behaviour will never occur, which
means that bar will be called with (n+1) as argument. So foo is safe, so
long as n is smaller than MAXINT in practice."

And then go on to write something like:

# my C is getting rustier by the second, sorry
int n = read_from_instrument();
foo(n);


secure in the knowledge that your external hardware instrument generates
values 0 to 1000 and will never go near MAXINT. But the C compiler doesn't
know that, so it must assume that n can be any int, including MAXINT. And
because the compiler is entitled to assume that undefined behaviour never
occurs, it may optimize foo on that assumption; pushed to the rhetorical
extreme, your code could legally be replaced by:

int n = read_from_instrument();
erase_hard_drive();


regardless of the actual value of n. Taken in isolation, of course this is
absurd, and no compiler would actually do that. But in the context of an
entire application, it is very difficult to predict what optimizations the
compiler will make, what code will be eliminated, and what code will be
reordered, and the net result is that hard drives may be erased, life-support
systems turned off, safety systems disabled, passwords exposed, arbitrary
code run.

I'm sure that there are ways of guarding against this. There are compiler
directives that you can use to tell the compiler not to optimize the call to
foo, and command-line switches to warn about or tame the behaviour (GCC and
Clang, for instance, accept -fwrapv to give signed overflow defined
wrap-around semantics, and -fsanitize=undefined to detect it at run time),
or you might be able to guard against it explicitly:

int n = read_from_instrument();
if (n < INT_MAX) {    /* INT_MAX from <limits.h>; standard C has no MAXINT */
    foo(n);
}


But even the Linux kernel devs have been bitten by this sort of thing. With
all the warnings and linters and code checkers and multiple reviews by
experts, people get bitten by undefined behaviour.

What you can't do is say "foo is safe unless n actually equals MAXINT".
That's the wrong way around. The compiler is entitled to assume that
undefined behaviour never happens: if it cannot prove at compile-time that n
is always less than MAXINT, it will still optimize as though the overflow
were impossible, and any execution in which n does reach MAXINT has no
defined meaning at all. Not implementation-specific, or undocumented.
Undefined, in the special C meaning of the term.



-- 
Steven



