Undefined behaviour in C [was Re: The Cost of Dynamism]

Oscar Benjamin oscar.j.benjamin at gmail.com
Sun Mar 27 05:51:20 EDT 2016


On 27 Mar 2016 10:56, "Steven D'Aprano" <steve at pearwood.info> wrote:
>
>
> My C is a bit rusty, so excuse me if I get the syntax wrong. I have a
> function:
>
> void foo(int n) {
>     int i = n + 1;
>     bar(i);
> }
>
> There's a possible overflow of a signed int in there. This is undefined
> behaviour. Now, you might think to yourself:
>
> "Well, that's okay. So long as n is not equal to MAXINT, the overflow will
> never occur, which means the undefined behaviour will never occur, which
> means that bar will be called with (n+1) as argument. So foo is safe, so
> long as n is smaller than MAXINT in practice."
>
> And then go on to write something like:
>
> /* my C is getting rustier by the second, sorry */
> int n = read_from_instrument();
> foo(n);
>
>
> secure in the knowledge that your external hardware instrument generates
> values 0 to 1000 and will never go near INT_MAX. But the C compiler doesn't
> know that, so it has to assume that n can be any int, including INT_MAX.
> Consequently your foo is "meaningless" and your code can legally be
> replaced by:
>
> int n = read_from_instrument();
> erase_hard_drive();

This is incorrect. Provided n does not take the value INT_MAX, the code is
conforming and the standard mandates how it must behave. The compiler is
allowed to make optimisations that assume n never takes that value such
that in the circumstances where n *would* take that value any behaviour is
acceptable. The compiler is not free to say "I don't know if it would take
that value so I'm unconstrained even if it does not".
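
To make that concrete, here's the kind of transformation the optimiser
is allowed to make (a minimal sketch; whether a given compiler actually
performs it depends on the compiler and optimisation level):

    #include <stdbool.h>

    /* For every n where n + 1 does not overflow, this comparison is
     * true; the one remaining case, n == INT_MAX, is undefined, so
     * the compiler may assume it never happens and fold the whole
     * function to "return true". */
    bool incremented_is_larger(int n)
    {
        return n + 1 > n;
    }

For every input whose behaviour the standard defines, the folded
version behaves identically to the original, which is exactly the
constraint the optimiser works under.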

>
> regardless of the actual value of n. Taken in isolation, of course this is
> absurd, and no compiler would actually do that. But in the context of an
> entire application, it is very difficult to predict what optimizations the
> compiler will take, what code will be eliminated, what code will be
> reordered, and the net result is that hard drives may be erased, life
> support systems could be turned off, safety systems can be disabled,
> passwords may be exposed, arbitrary code may be run.
>
> I'm sure that there are ways of guarding against this. There are compiler
> directives that you can use to tell the compiler not to optimize the call
> to foo, or command line switches to give warnings, or you might be able to
> guard against this:
>
> int n = read_from_instrument();
> if (n < INT_MAX) {
>     foo(n);
> }

This is correct. It is now impossible for the addition n+1 to overflow
since we cannot hit that code if n is INT_MAX.
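
You can also put the check inside foo itself, so that every caller is
covered. A sketch (the silently-do-nothing policy on overflow is my
own choice here, not anything from the original code):

    #include <limits.h>

    void bar(int i);  /* defined elsewhere */

    void foo(int n)
    {
        /* The comparison itself cannot overflow, so behaviour is
         * defined for every possible value of n, including INT_MAX. */
        if (n < INT_MAX)
            bar(n + 1);
    }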

> But even the Linux kernel devs have been bitten by this sort of thing. With
> all the warnings and linters and code checkers and multiple reviews by
> experts, people get bitten by undefined behaviour.

I think you're overegging this a bit. Many experienced programmers get
bitten by bugs while working in many languages. C is more troublesome than
many and there is room for improvement, but it's not as dramatic as you
suggest.

> What you can't do is say "foo is safe unless n actually equals INT_MAX".
> That's wrong. foo is unsafe if the C compiler is unable to determine at
> compile-time whether or not n could ever, under any circumstances, be
> INT_MAX. If n might conceivably ever be INT_MAX, then the behaviour of foo
> is undefined. Not implementation-specific, or undocumented. Undefined, in
> the special C meaning of the term.

I think you've misunderstood this: signed addition that does not overflow
is well defined, so the optimiser cannot alter the semantics in that case.
It is free to assume that values that would overflow never occur, and on
that assumption it may alter execution in surprising ways, but that is not
the same as rewriting valid code merely because an invalid value cannot be
proven never to occur. Rather, the onus is on the optimiser to prove that
the optimised code is equivalent to the original for every input whose
behaviour is defined.
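
For completeness: if a program genuinely cannot rule out n == INT_MAX,
the robust fix is to make the overflow check explicit rather than hope
the optimiser is kind. GCC and Clang provide a builtin for this; a
sketch (checked_increment is my name for it, and the builtin is a
compiler extension, not ISO C):

    /* Returns 0 and stores n + 1 in *out on success; returns -1 if
     * the addition would overflow. __builtin_add_overflow computes
     * the sum in infinite precision and reports whether it fits in
     * the result type. */
    int checked_increment(int n, int *out)
    {
        return __builtin_add_overflow(n, 1, out) ? -1 : 0;
    }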

--
Oscar


