Python is faster than C

Carl Banks imbosol at aerojockey.invalid
Sun Apr 4 20:14:58 EDT 2004


Armin Rigo wrote:
> Paul Rubin wrote:
>> I think you're saying that instead of having xrange return a special
>> object, range should return that special object instead.  I'm not too
>> clear on the distinction.
> 
> No, range should return an object that is a list, as far as you can tell
> from Python, but which is represented more efficiently than an array of
> objects internally.  The distinction is between the language level (it
> would be a list, with all operations, etc.) and the implementation
> (there is no reason why all lists should be arrays of PyObjects
> internally).
> 
> Another example would be 'a'*999999999: the result is a string, but
> there is no reason that it takes 100MB of memory.  Instead, store it
> into a C structure that contains a pointer to the original string object
> 'a' and the repetition counter, but still give this C structure the
> Python type str, so that the difference doesn't show up and the Python
> language remains simple.  (This is a bit difficult to implement
> currently in CPython, but not impossible.)

Hmm.  What would be really cool is an abstract sequence type with all
kinds of knobs to specify the operations you want to support.  In
other words, rather than saying "I want a vector" or "I want a linked
list," you say, "I want fast indexing, fast sorting, no insertion, and
no resizing," or "I want no indexing, no sorting, fast insertion, fast
appending on both ends".  Then, let the computer choose the actual
implementation.

Better yet, if the compilation is highly optimized, the compiler could
trace the path of the object (best as it can) through the program, and
maybe use profiling data too, to see for itself the how the object is
used, thus freeing the programmer from even specifying the operations.
The programmer could say, "I want a sequence," use it, and the
compiler could choose the best implementation.

The only thing is, I think that would be prohibitively difficult in an
on-the-fly compiled language like Python.  Even having a list with one
or two optimized forms would have drawbacks: it would be hard to
optimize the fast parts with the interpretter second-guessing you all
the time, and it would be hard to write extension modules, since
they'd have to be aware of all the different layouts, or have to
always use an API.

And, frankly, I think having a simple implementation is important,
too.  Complex implementations are more likely to have lots of bugs
that could burden users more than some distinct optimized types would.
In my opinion, Python is doing the right thing by having one internal
form per type.

Given that, and keeping in mind that some of these optimizations
really do help, the question to ask is: do the extra power and
efficiency the optimized version gives you justify the extra burden of
complexity on the programmer?  I believe that the answer is
occasionally yes.

Consider tuples.  Tuples are for the most part an optimized version
of list, but with a few powers list doesn't have, and without a few
powers list has.  It's the same with iterators.

IMO, the the benefit an iterator gives you is far more than the
benefit a tuple gives you.  I would say 'yes' for an iterator, it's
worth a bit of complexity; and probably 'no' for a tuple (although
it's close).


my-two-cents-ly y'rs,

-- 
CARL BANKS                      http://www.aerojockey.com/software
"If you believe in yourself, drink your school, stay on drugs, and
don't do milk, you can get work." 
          -- Parody of Mr. T from a Robert Smigel Cartoon



More information about the Python-list mailing list