Python is faster than C

Josiah Carlson jcarlson at uci.edu
Sat Apr 3 20:49:51 EST 2004


> No, range should return an object that is a list, as far as you can tell
> from Python, but which is represented more efficiently than an array of
> objects internally.  The distinction is between the language level (it
> would be a list, with all operations, etc.) and the implementation
> (there is no reason why all lists should be arrays of PyObjects
> internally).

You can implement such a thing already.  In fact, until recently xrange 
supported basically everything that a list object did, except 
mutation.  The reason it no longer does is that, across multiple 
versions of Python, that behavior was buggy and poorly supported.  If 
you are bored enough, feel free to make your own xrange-like object that 
supports mutation.  Heck, it can even subclass 'list', though it need 
not have any standard list internals.
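A minimal sketch of such an object (the name `LazyRange` and the materialize-on-mutation policy are my own choices, not anything from the standard library): it answers reads straight from the arithmetic of the range, and only builds real list storage the first time someone mutates it.

```python
class LazyRange:
    """Range-like sequence: O(1) memory until first mutation."""

    def __init__(self, n):
        self._n = n          # virtual length
        self._items = None   # real storage, created only on mutation

    def _materialize(self):
        if self._items is None:
            self._items = list(range(self._n))

    def __len__(self):
        return self._n if self._items is None else len(self._items)

    def __getitem__(self, i):
        if self._items is not None:
            return self._items[i]
        if isinstance(i, slice):
            return [self[j] for j in range(*i.indices(self._n))]
        if i < 0:
            i += self._n
        if not 0 <= i < self._n:
            raise IndexError(i)
        return i             # element k of range(n) is just k

    def __setitem__(self, i, value):
        self._materialize()  # fall back to ordinary list storage
        self._items[i] = value

    def __iter__(self):
        if self._items is None:
            return iter(range(self._n))
        return iter(self._items)
```

Subclassing 'list' itself is trickier in CPython, since the C-level list methods reach into the internal array directly, which is presumably why the sketch above wraps rather than subclasses.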


> Another example would be 'a'*999999999: the result is a string, but
> there is no reason that it takes 100MB of memory.  Instead, store it
> into a C structure that contains a pointer to the original string object
> 'a' and the repetition counter, but still give this C structure the
> Python type str, so that the difference doesn't show up and the Python
> language remains simple.  (This is a bit difficult to implement
> currently in CPython, but not impossible.)

Also, you are free to implement such a thing.  I believe the reason the 
current CPython implementation doesn't follow this route (and other 
suggested optimizations) is that it would needlessly complicate the 
implementation of CPython.
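At the Python level, the lazy-repetition idea from the quoted message can be sketched in a few lines (the name `RepeatedStr` is hypothetical; note this gives you a str-*like* object, not something with the actual Python type str, which is the part that would require C-level surgery):

```python
class RepeatedStr:
    """Stores (base, count) instead of the fully expanded string."""

    def __init__(self, base, count):
        self.base = base
        self.count = count

    def __len__(self):
        return len(self.base) * self.count

    def __getitem__(self, i):
        if i < 0:
            i += len(self)
        if not 0 <= i < len(self):
            raise IndexError(i)
        return self.base[i % len(self.base)]  # index into the base pattern

    def __str__(self):
        # Materializes the full string; only call this when you mean it.
        return self.base * self.count

s = RepeatedStr('a', 999999999)
# len(s) and s[12345] work without ever allocating the 999999999 bytes.
```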


> Ideally: If you do  x=range(100); x[50]='hi'  then the interpreter first
> builds this optimized range representation and assigns it to x; and when
> in the next statement you modify this list x it says 'oops! i cannot do
> that with this representation', so it reverts to an array-like
> representation (i.e. it creates all 100 elements) and then changes the
> 50th.  No gain here.  If on the other hand you only ever do 'easy'
> things with your list, like iterate over it or read elements, then it
> can all be done with the range representation, without falling back to
> the array representation.

Why not instead use a dictionary-based approach for special items?  It 
would be far more memory efficient, and wouldn't incur the modification 
penalty.
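One way to read the dictionary-based approach (the name `SparseRange` and the details are my own sketch): keep the cheap range representation and record only the modified indices in a dict, so a single assignment costs O(1) and never forces building all the elements.

```python
class SparseRange:
    """range(n) plus a dict of index -> overriding value."""

    def __init__(self, n):
        self._n = n
        self._overrides = {}

    def __len__(self):
        return self._n

    def _check(self, i):
        if i < 0:
            i += self._n
        if not 0 <= i < self._n:
            raise IndexError(i)
        return i

    def __getitem__(self, i):
        i = self._check(i)
        return self._overrides.get(i, i)  # overridden value, else the index

    def __setitem__(self, i, value):
        i = self._check(i)
        self._overrides[i] = value        # O(1); nothing is materialized

    def __iter__(self):
        get = self._overrides.get
        return (get(i, i) for i in range(self._n))

x = SparseRange(100)
x[50] = 'hi'   # no 100-element array ever built
```

Memory stays proportional to the number of modified slots rather than to the length of the list, which is the whole point; the trade-off is a dict lookup on every read.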


> Again I'm not saying it is trivial to implement it, but that not having
> to expose for optimization purposes 'xrange' and the whole 'iterator'
> part of the language would be worth it, in my opinion.

I think that one of the benefits of offering 'iterator' concepts to 
users, both new and seasoned, is that it allows people to think in ways 
they didn't before.  Allowing people to make those optimizations 'by 
hand', I believe (as an educator and computer scientist), helps them 
grow as programmers (or computer scientists, as the case may be).

Don't get me wrong, it would be terribly convenient for Python to 
abstract away all the nastiness of optimization.  But if/when someone 
did something that had been /very/ fast before and was now awfully 
slow (think the current Queue.Queue object for small vs. large 
numbers of items), they would jump to the conclusion that Python 
is broken in some fundamental way, maybe come here to complain about 
Python being broken, and those who are familiar with the internals 
would say, "you are ruining Python's early optimization by this thing 
that you do".

Which really gets us to the fact that you are asking for the Python 
internals to be optimized.  In fact, while simultaneously saying "don't 
optimize early", you are optimizing early by saying that range should be 
optimized, as should string multiplication, etc.  Goodness man, pick a 
position and stick with it.

  - Josiah
