[Tutor] Python vs. C

Danny Yoo dyoo at hkn.eecs.berkeley.edu
Wed Jan 7 16:50:17 EST 2004



On Wed, 7 Jan 2004, Daniel Ehrenberg wrote:

> > So compilation is sort of a red herring here: Python is slower because
> > it's simply doing more for you.  If a Python program were compiled
> > into machine code, it would still be slow because it's doing more work
> > at runtime.
>
> How could that be true if extensions like Psyco and Pyrex help speed?



Hi Daniel,


For reference:


Pyrex is a language extension that allows the mixing of C and Python:

    http://www.cosc.canterbury.ac.nz/~greg/python/Pyrex/

Psyco is an extension that can optimize certain functions:

    http://psyco.sourceforge.net/



I have to admit that I'm not comfortably familiar with either.  But
at least I can try talking about them!  *grin* And if I say something
wrong, at least I can be corrected and learn from my mistakes.


Pyrex does allow us to embed C stuff into Python, but at the expense of
safety.  That is, as soon as we start using Pyrex, all bets are off.
Here's a concrete example:

###
[dyoo at tesuque dyoo]$ cat test_pyrex.pyx
def test_pyrex():
    cdef int numbers[4]
    cdef int i
    i = 0
    while i < 20:
        numbers[i] = i
        i = i + 1
    i = 0
    while i < 20:
        print i
        i = i + 1
###


Again, as soon as we start using native C, we abandon strict
bounds-checks, and introduce the potential for mischief:

###
[dyoo at tesuque dyoo]$ python
Python 2.2.1 (#1, Sep  3 2002, 14:52:01)
[GCC 2.96 20000731 (Red Hat Linux 7.3 2.96-112)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import test_pyrex
>>> test_pyrex.test_pyrex()
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
Segmentation fault (core dumped)
###

So Pyrex does allow for increased speed, but that's mostly because it just
allows us to abandon the runtime checks that Python usually does.  People
who use Pyrex understand the sacrifice, and must simply be very very
careful when they write Pyrex code.
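
For contrast, here is a minimal sketch of the same mistake in plain Python.
It's written for Python 3, and the function name fill_past_end is made up
for the example.  The interpreter checks every index, so the bad store
shows up as an IndexError instead of quietly scribbling over memory:

```python
def fill_past_end():
    """Try to store 20 values into a 4-element list, like the Pyrex example."""
    numbers = [0] * 4
    i = 0
    try:
        while i < 20:
            numbers[i] = i      # Python checks the index on every store
            i = i + 1
    except IndexError:
        # The first out-of-range store is refused; nothing was overwritten.
        return i
    return None

print(fill_past_end())          # index of the first store that was refused
```

That per-store check is exactly the kind of runtime work that the Pyrex
version skips.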



From what I understand, Psyco optimizes away certain features of Python if
it can prove to itself that bad things aren't possible.  For example, if
it sees code like this:

###
for i in range(20):
    print i
###


Psyco can look at this code fragment, and assert to itself that 'i' here
can never be anything except an integer --- not only that, but that it's
going to always be an integer that won't overflow.  So rather than use a
full-fledged Python object, Psyco can instead transform this code so that
it uses native machine integers, something similar to:

/***/
int i;
for(i = 0; i < 20; i++) { ... }
/***/



But it does all this analysis at runtime, when it has enough information
to prove to itself that certain situations are true.  Here's another code
snippet that would confound this kind of optimization:

###
i = 2147483647
while 1:
    print i
    i += 1
###

Psyco had better not try to optimize this by using standard 32-bit hardware
integers, because we really do need full-fledged Python integers to get
correct output.  Otherwise, when 'i' hits 2**31, it'll overflow and give
radically different results from the unoptimized Python code.
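
To see what would go wrong, here is a rough sketch (Python 3) that imitates
a 32-bit signed machine integer with ctypes.c_int32 and compares it with an
ordinary Python integer.  The helper name add_one_int32 is made up for the
example, and this only simulates the wraparound; I'm not claiming this is
how Psyco works inside.

```python
import ctypes

def add_one_int32(n):
    """Add 1 the way a 32-bit signed hardware integer would (it wraps around)."""
    return ctypes.c_int32(n + 1).value

i = 2147483647                  # 2**31 - 1, the largest 32-bit signed value
print(i + 1)                    # Python integer: keeps growing
print(add_one_int32(i))         # simulated machine integer: wraps to negative
```

An optimizer that swapped in the second behavior for the first would change
what the program prints.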


These sorts of things are part of code analysis, and most compilers perform
code analysis statically, before the program is even run once.  Psyco, on
the other hand, does analysis at runtime, with the same end result of
getting the code to run faster.


But Python's features are still so rich that certain kinds of
optimizations are probably going to be difficult to implement.  One
particular place that's difficult is name resolution.  If we have a
function like:

###
def printSomething():
    print str(42)
###

It might seem obvious that this can be reduced to something like the C
code:

/***/
void printSomething() {
    printf("%d\n", 42);
}
/***/


Unfortunately, this optimization, too, isn't always correct.  There are
two things, in particular, that complicate matters.  What if sys.stdout is
reassigned, or what if str() has been redefined?  For it is perfectly
legal in Python to do things like:

###
import sys, StringIO
sys.stdout = StringIO.StringIO()          # printed output now goes to a buffer
str = 'Hello world, this is my string'    # 'str' no longer names the built-in
###


And because these things are allowed in Python, there's very little that
can be statically optimized out of printSomething().  Every time that
printSomething() is called, Python has to make a runtime check to see what
'str' means.
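
Here's a small sketch of that late lookup, written for Python 3.  The names
make_something and shout are made up for the example, and the function
returns its text instead of printing it so the result is easy to inspect.
The same function gives different answers depending on what 'str' is bound
to at the moment of the call:

```python
import builtins

def make_something():
    # 'str' is looked up by name every time this line runs.
    return str(42)

def shout(value):
    """A stand-in replacement for str(), purely for illustration."""
    return "<<" + repr(value) + ">>"

original_str = builtins.str
before = make_something()       # uses the real built-in str

builtins.str = shout            # redefine what the name 'str' means
try:
    after = make_something()    # same function, different behavior
finally:
    builtins.str = original_str # always put the built-in back

restored = make_something()
print(before, after, restored)
```

Nothing about make_something itself changed between the calls, which is why
a compiler can't safely decide ahead of time what str(42) will do.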

What Psyco and other runtime analyzers can probably do, though, is make
two versions of printSomething(): one to handle the common case (if
sys.stdout and str are unmolested), and another to handle the general
case.  But our computer will still need to do some work to toggle between
the two cases.


Both Pyrex and Psyco allow us to improve the performance of our code, but
that doesn't invalidate the claim that Python's rich feature set forces the
system to do more work to support those dynamic features.


Hope this helps!



