Experiences/guidance on teaching Python as a first programming language

Thu Dec 19 11:20:10 EST 2013

On Wed, 18 Dec 2013 19:51:26 +1100, Chris Angelico wrote:

> On Wed, Dec 18, 2013 at 7:18 PM, Steven D'Aprano <steve at pearwood.info>
> wrote:
>> You want to know why programs written in C are so often full of
>> security holes? One reason is "undefined behaviour". The C language
>> doesn't give a damn about writing *correct* code, it only cares about
>> writing *efficient* code. Consequently, one little error, and does the
>> compiler tell you that you've done something undefined? No, it turns
>> your validator into a no-op -- silently:
> 
> I disagree about undefined behaviour causing a large proportion of
> security holes. 

I didn't actually say "large proportion"; those are your words. But 
since you mention crashes:

> Maybe it produces some, but it's more likely to produce
> crashes or inoperative code. 

*Every* crash is a potential security hole. Not only is it a denial of 
service, but a fatal exception[1] is a sign that arbitrary memory has 
been executed as if it were code, or that an illegal instruction was 
executed. 
Every such crash is a potential opportunity for an attacker to run 
arbitrary code. There are only two sorts of bugs: bugs with exploits, and 
bugs that haven't been exploited *yet*.

I think you are severely underestimating the role that undefined 
behaviour in C plays in security vulnerabilities. I quote from "Silent 
Elimination of Bounds Checks":

"Most of the security vulnerabilities described in my book, Secure Coding 
in C and C++, Second Edition, are the result of exploiting undefined 
behavior in code."

http://www.informit.com/articles/article.aspx?p=2086870

Undefined behaviour interferes with the ability of the programmer to 
understand causality with respect to his source code. That makes bugs of 
all sorts more likely, including buffer overflows.

Earlier this year, four researchers at MIT analysed how undefined 
behaviour is affecting software, and they found that C compilers are 
becoming increasingly aggressive at optimizing such code, resulting in 
more bugs and vulnerabilities. They found 32 previously unknown bugs in 
the Linux kernel, 9 in Postgres and 5 in Python.

http://www.itworld.com/security/380406/how-your-compiler-may-be-compromising-application-security

I believe that the sheer number of buffer overflows in C is more due to 
the language semantics than to the (lack of) skill of the programmers. 
C, the language, pushes responsibility for safety onto the developer. Even expert 
C programmers cannot always tell what their own code will do. Why else do 
you think there are so many applications for checking C code for buffer 
overflows, memory leaks, buggy code, and so forth? Because even expert C 
programmers cannot detect these things without help, and they don't get 
that help from the language or the compiler.

[...]
> Apart from the last one (file system atomicity, not a C issue at all),
> every single issue on that page comes back to one thing: fixed-size
> buffers and functions that treat a char pointer as if it were a string.
> In fact, that one fundamental issue - the buffer overrun - comes up
> directly when I search Google for 'most common security holes in c code'

I think that you have missed the point that buffer overflows are often a 
direct consequence of the language. For example:

http://www.kb.cert.org/vuls/id/162289

Quote:

"Some C compilers optimize away pointer arithmetic overflow tests that 
depend on undefined behavior without providing a diagnostic (a warning). 
Applications containing these tests may be vulnerable to buffer overflows 
if compiled with these compilers."

The truly frightening thing about this is that even if the programmer 
tries to write safe code that checks the buffer length, the C compiler is 
*allowed to silently optimize that check away*.
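
For contrast, here is a minimal Python sketch of the same kind of length check (written for this reply, not taken from the CERT note; `read_chunk` is a hypothetical helper). Because Python ints are unbounded, `offset + length` can neither wrap nor overflow, so the check has no "impossible" overflowing case for anything to reason away; it simply means what it says.

```python
def read_chunk(buf, offset, length):
    """Hypothetical bounds-checked slice of a bytes-like buffer."""
    # Python ints are arbitrary precision, so offset + length can never
    # wrap around to a small value the way fixed-width C arithmetic can.
    if offset < 0 or length < 0 or offset + length > len(buf):
        raise ValueError("chunk out of range")
    return buf[offset:offset + length]


if __name__ == "__main__":
    data = b"0123456789"
    print(read_chunk(data, 2, 3))  # b'234'
    try:
        # In 64-bit unsigned C arithmetic, 2 + (2**64 - 1) wraps to 1;
        # here the sum is computed exactly and the request is rejected.
        read_chunk(data, 2, 2**64 - 1)
    except ValueError as exc:
        print("rejected:", exc)
```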

> Python is actually *worse* than C in this respect.

You've got to be joking.

> I know this
> particular one is reasonably well known now, but how likely is it that
> you'll still see code like this:
> 
> def create_file():
>     f = open(".....", "w")
>     f.write(".......")
>     f.write(".......")
>     f.write(".......")
> 
> Looks fine, is nice and simple, does exactly what it should. And in
> (current versions of) CPython, this will close the file before the
> function returns, so it'd be perfectly safe to then immediately read
> from that file. But that's undefined behaviour. 

No, it isn't. I got chastised for (allegedly) conflating undefined and 
implementation-specific behaviour. In this case, whether the file is 
closed or not is clearly implementation-specific behaviour, not 
undefined. An implementation is permitted to delay closing the file. It's 
not permitted to erase your hard drive.
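
A minimal sketch of the portable alternative (not code from the thread): a `with` block closes the file on exit, so the close-before-return behaviour no longer relies on CPython's reference counting and holds on any implementation. The `path` parameter, the `demo` helper, and the line contents are placeholders, since the quoted example elides the real file name and data.

```python
import os
import tempfile


def create_file(path):
    # Leaving the with-block calls f.close() (even if an exception is
    # raised), so the data is flushed before the function returns.
    with open(path, "w") as f:
        f.write("line 1\n")  # placeholder contents
        f.write("line 2\n")
        f.write("line 3\n")


def demo():
    fd, path = tempfile.mkstemp(text=True)
    os.close(fd)  # only the name is needed; create_file reopens it
    try:
        create_file(path)
        # Safe to read back immediately: the file is already closed.
        with open(path) as f:
            return f.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```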

Python doesn't have an ISO standard like C, so where the documentation 
doesn't define the semantics of something, CPython serves as the 
reference implementation. CPython allows you to simultaneously open the 
same file for reading and writing, in which case subsequent reads and 
writes will deterministically depend on the precise timing of when writes 
are written to disk. That's not something which the language can control, 
given the expected semantics of file I/O. The behaviour is defined, but 
it's defined in such a way that what you'll get is deterministic but 
unpredictable -- a bit like dict order, or pseudo-random numbers.
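
A small sketch of that timing dependence, written for this reply rather than taken from the thread: two handles on the same temporary file, where what the reader sees depends only on whether the writer's buffer has been flushed yet. Strictly, `flush()` hands the data to the operating system rather than guaranteeing it is on disk. The helper name `buffered_visibility` is hypothetical, and the result assumes CPython's default buffered I/O, whose buffer is far larger than the 5 bytes written here.

```python
import os
import tempfile


def buffered_visibility():
    """Return what a reader sees before and after the writer flushes."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as writer, open(path, "rb") as reader:
            writer.write(b"hello")  # small write: held in writer's buffer
            before = reader.read()  # nothing has reached the OS yet
            writer.flush()          # hand the buffered bytes to the OS
            after = reader.read()   # reader is still at offset 0
        return before, after
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(buffered_visibility())
```

The outcome is repeatable run after run, yet it changes if you move the `flush()` call, which is the "deterministic but unpredictable" flavour described above.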

A Python implementation is not permitted to optimize away subsequent 
reads, erase your hard drive, or download a copy of Wikipedia from the 
Internet. A C compiler is permitted to do any of these.

(Of course, no competent C compiler would actually download all of 
Wikipedia, since that would be slow. Instead, they would probably only 
download the HTTP headers for the main page.)

[1] I'm talking about low-level exceptions or errors, not Python exceptions.

-- 
Steven