Experiences/guidance on teaching Python as a first programming language

Chris Angelico rosuav at gmail.com
Thu Dec 19 12:02:38 EST 2013


On Fri, Dec 20, 2013 at 3:20 AM, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
> On Wed, 18 Dec 2013 19:51:26 +1100, Chris Angelico wrote:
>
>> On Wed, Dec 18, 2013 at 7:18 PM, Steven D'Aprano <steve at pearwood.info>
>> wrote:
>>> You want to know why programs written in C are so often full of
>>> security holes? One reason is "undefined behaviour". The C language
>>> doesn't give a damn about writing *correct* code, it only cares about
>>> writing *efficient* code. Consequently, one little error, and does the
>>> compiler tell you that you've done something undefined? No, it turns
>>> your validator into a no-op -- silently:
>>
>> I disagree about undefined behaviour causing a large proportion of
>> security holes.
>
> I didn't actually specify "large proportion", that's your words. But
> since you mention crashes:

You implied that it's a significant cause of security holes. I counter
by saying that most security holes come from well-defined behaviour.

> I think you are severely under-estimating the role of undefined behaviour
> in C on security vulnerabilities. I quote from "Silent Elimination of
> Bounds Checks":
>
> "Most of the security vulnerabilities described in my book, Secure Coding
> in C and C++, Second Edition, are the result of exploiting undefined
> behavior in code."
>
> http://www.informit.com/articles/article.aspx?p=2086870

I don't intend to buy the book to find out what he's talking about.
All I know is that the one single most common cause of problems in C,
the buffer overrun, is NOT "exploiting undefined behavior", and nor are
several other common problems (as described in my previous message).

> Earlier this year, four researchers at MIT analysed how undefined
> behaviour is affecting software, and they found that C compilers are
> becoming increasingly aggressive at optimizing such code, resulting in
> more bugs and vulnerabilities. They found 32 previously unknown bugs in
> the Linux kernel, 9 in Postgres and 5 in Python.
>
> http://www.itworld.com/security/380406/how-your-compiler-may-be-compromising-application-security

Yes, those are issues. Not nearly as large as the ones that _don't_
involve your compiler hurting you, except that CPython had proper
memory-usage discipline and didn't have the more glaring bugs.

> I believe that the sheer number of buffer overflows in C is more due to
> the language semantics than the (lack of) skill of the programmers. C the
> language pushes responsibility for safety onto the developer. Even expert
> C programmers cannot always tell what their own code will do. Why else do
> you think there are so many applications for checking C code for buffer
> overflows, memory leaks, buggy code, and so forth? Because even expert C
> programmers cannot detect these things without help, and they don't get
> that help from the language or the compiler.

I agree. The lack of a native string type is fundamental to probably
99% of C program bugs. (Maybe I'm exaggerating, but I reckon it'll be
ball-park.) But at no point do these programs or programmers *exploit*
undefined behaviour. They might run into it when things go wrong, but
by that time, things have already gone wrong. Example:

#include <stdio.h>  /* gets */

int foo()
{
    char buffer[80];
    gets(buffer);
    return buffer[0]=='A';
}

So long as the user enters no more than 79 characters, this function's
perfectly well defined. It's vulnerable because user input can trigger
a problem, but if anyone consciously exploits compiler-specific memory
layouts, it's the attacker, and *NOT* the original code. On the flip
side, this code actually does depend on undefined behaviour:

#include <string.h>  /* memset */

int bar()
{
    char buffer[5];
    char tmp;
    memset(buffer,0,6);
    return tmp;
}

This code is always going to go past its buffer, and if 'tmp' happens
to be the next thing in memory, it'll be happily zeroed. I'm pretty
sure I saw code like this on thedailywtf.com a while back.
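
For contrast, here is a rough Python sketch of the same off-by-one
mistake. The name zero_fill is made up for illustration, and a bytearray
stands in for the char buffer; neither comes from the C code above. The
out-of-range store raises IndexError instead of landing on whatever
happens to sit next in memory:

```python
def zero_fill(buffer, count):
    """Zero the first `count` bytes of buffer, one index at a time."""
    for i in range(count):
        # Index assignment on a bytearray is bounds-checked.
        buffer[i] = 0


if __name__ == "__main__":
    buffer = bytearray(b"xxxxx")  # five bytes, like char buffer[5]
    try:
        zero_fill(buffer, 6)      # one past the end, like memset(buffer,0,6)
    except IndexError as exc:
        print("caught:", exc)
    print(buffer)
```

The buffer keeps its length of five; nothing outside it is touched.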

>> Python is actually *worse* than C in this respect.
>
> You've got to be joking.

Trolling, more than joking, but as usual, there is a grain of truth in
what I say.

>> I know this
>> particular one is reasonably well known now, but how likely is it that
>> you'll still see code like this:
>>
>> def create_file():
>>     f = open(".....", "w")
>>     f.write(".......")
>>     f.write(".......")
>>     f.write(".......")
>>
>> Looks fine, is nice and simple, does exactly what it should. And in
>> (current versions of) CPython, this will close the file before the
>> function returns, so it'd be perfectly safe to then immediately read
>> from that file. But that's undefined behaviour.
>
> No it isn't. I got chastised for (allegedly) conflating undefined and
> implementation-specific behaviour. In this case, whether the file is
> closed or not is clearly implementation-specific behaviour, not
> undefined. An implementation is permitted to delay closing the file. It's
> not permitted to erase your hard drive.

The problem is that delaying closing the file is a potentially major
issue, if the file is about to be reopened. And it _is_ undefined
behaviour, which one particular Python implementation happens to handle
in a very simple and convenient way. What's more, that way matches how
other languages (e.g. C++, Pike) would handle it, so it's going to "feel
right" to people, and it's actually very easy to depend on this without
realizing it.
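
A minimal sketch of a version that does not depend on when the file
object gets collected. The path argument, the file name and the written
strings are placeholders for illustration, not taken from the example
quoted above. The with statement closes the file when the block exits,
so reading it back straight away is safe on any implementation:

```python
import os
import tempfile


def create_file(path):
    # Leaving the with block flushes and closes the file,
    # regardless of how the implementation manages object lifetimes.
    with open(path, "w") as f:
        f.write("first line\n")
        f.write("second line\n")
        f.write("third line\n")


def read_back(path):
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "example.txt")
    create_file(path)
    print(read_back(path))
```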

> Python doesn't have an ISO standard like C, so where the documentation
> doesn't define the semantics of something, CPython behaves as the
> reference implementation. CPython allows you to simultaneously open the
> same file for reading and writing, in which case subsequent reads and
> writes will deterministically depend on the precise timing of when writes
> are written to disk.

Errr, Python does have its standard. It's not an
implementation-defined language. Yes, there are places where CPython
is the de facto standard, but that doesn't mean something's not
undefined.

Delaying the close might be completely insignificant, but it has the
potential to be critical (depending on the exact share modes and
such). And, in the strictest sense of the word, it *is* undefined, and
it *is* depended on.

ChrisA


