Undefined behaviour in C [was Re: The Cost of Dynamism]

Chris Angelico rosuav at gmail.com
Sat Mar 26 10:30:31 EDT 2016


On Sun, Mar 27, 2016 at 1:09 AM, BartC <bc at freeuk.com> wrote:
> I'm surprised that both C and Python allow statements that apparently do
> nothing. In both, an example is:
>
>   x
>
> on a line by itself. This expression is evaluated, but then any result
> discarded. If there was a genuine use for this (for example, reporting any
> error with the evaluation), then it would be simple enough to require a
> keyword in front.

Tell me, which of these is a statement that "does nothing"?

foo
foo.bar
foo["bar"]
foo.__call__
foo()
int(foo)

All of them are expressions to be evaluated and the result discarded.
I'm sure you'll recognize "foo()" as useful code, but to the
interpreter, they're all the same. And any one of them could raise an
exception rather than emit a value; for instance, consider these code
blocks:

# Personally, I prefer doing it the other way, but
# if you have a big Py2 codebase, this will help
# port it to Py3.
try: raw_input
except NameError: raw_input = input

import sys

try: int(sys.argv[1])
except IndexError:
    print("No argument given")
except ValueError:
    print("Not an integer")

In each case, the "dummy evaluation" of an expression is used as a way
of asking "Will this throw?". That's why flagging such statements
belongs squarely in the hands of linters, not the main interpreter;
any expression statement can be useful in some way.
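
To make that concrete, here's a rough sketch in which a bare attribute
access, result discarded, still runs arbitrary code and can raise. The
Sensor class, its "reading" property and the will_it_throw() helper are
all names made up for this example, not anything from a real library.

```python
class Sensor:
    """Hypothetical class: reading the attribute runs code and may raise."""

    def __init__(self, connected):
        self.connected = connected
        self.reads = 0

    @property
    def reading(self):
        # Evaluating `s.reading` calls this, even if the value is discarded.
        self.reads += 1
        if not self.connected:
            raise RuntimeError("sensor not connected")
        return 42


def will_it_throw(sensor):
    # A bare expression statement, used purely to ask "will this throw?"
    try:
        sensor.reading
    except RuntimeError:
        return True
    return False


ok = Sensor(connected=True)
bad = Sensor(connected=False)
print(will_it_throw(ok), will_it_throw(bad), ok.reads)
```

An interpreter that rejected "sensor.reading" as a do-nothing statement
would break this, even though it looks exactly like the bare "foo.bar"
in the list above.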

>> The main reason the C int has undefined behaviour is that it's
>> somewhere between "fixed size two's complement signed integer" and
>> "integer with plenty of room". A C compiler is generally free to use a
>> larger integer than you're expecting, which will cause numeric
>> overflow to not happen. That's (part of[1]) why overflow of signed
>> integers is undefined - it'd be too costly to emulate a smaller
>> integer. So tell me... what happens in CPython if you incref an object
>> more times than the native integer will permit? Are you bothered by
>> this possibility, or do you simply assume that nobody will ever do
>> that?
>
>
> (On a ref-counted scheme I use, with 32-bit counts (I don't think it matters
> if they are signed or not), each reference implies a 16-byte object
> elsewhere. For the count to wrap around back to zero, that would mean 64GB
> of RAM being needed. On a 32-bit system, something else will go wrong first.
>
> Even on 64-bits, it's a possibility I suppose although you might notice
> memory problems sooner.)

C code can claim references to Python objects without having actual
pointers anywhere. A naive object traversal algorithm could claim
temporary references on all the objects it moves past (to ensure they
don't get garbage collected in the middle), and then get stuck in an
infinite loop traversing a reference loop, thus infinitely increasing
reference counts. Yeah, it can happen... I've had bugs like that in my
code...

Point is, CPython can generally assume that bug-free code will never
get anywhere *near* the limit of a signed integer. Consequently, C's
undefined behaviour isn't a problem; it does NOT mean we need to be
scared of signed integers.
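
For what it's worth, the 64GB figure quoted above checks out. This is
just a back-of-envelope sketch using the numbers from that scheme
(32-bit counts, 16 bytes per reference); the constant names are mine,
and the modulo only models an unsigned counter wrapping, not what C
does on signed overflow.

```python
COUNT_BITS = 32      # width of the reference count in the quoted scheme
BYTES_PER_REF = 16   # each reference implies a 16-byte object elsewhere

# Number of increments needed before a counter of that width wraps to zero.
refs_to_wrap = 2 ** COUNT_BITS

# Memory that many references would occupy at 16 bytes apiece.
bytes_needed = refs_to_wrap * BYTES_PER_REF
gib_needed = bytes_needed // 2 ** 30

# Model an unsigned counter: one step past the maximum lands back on zero.
max_count = 2 ** COUNT_BITS - 1
wrapped = (max_count + 1) % 2 ** COUNT_BITS

print(refs_to_wrap, gib_needed, wrapped)
```

That memory bound only holds while every reference really does cost 16
bytes; references claimed by C code without a stored pointer aren't
limited by it.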

ChrisA


