[Python-checkins] CVS: python/dist/src/Lib string.py,1.46,1.47

Fredrik Lundh effbot at telia.com
Tue Feb 22 18:21:35 EST 2000


bjorn <bjorn at roguewave.com> wrote:
> Also, I don't understand your argument that ['a', 'b', 'c'].join() would be
> slow.  It shouldn't be any harder to implement that in C, than implementing
> ''.join(['a', 'b', 'c']).

it's not a question of difficulty (I've already written the
code, you know), it's a question of where to put that C
code.

...

silly or not, I'll make a last attempt to explain this:

factoid 1: Python 1.6 will have several string types.

factoid 2: Python has an abstract sequence API, which
is used in lots of places.  any object that implements this
API can be used with "for-in", "map", "list", "filter", etc.
(as well as popular extensions like Numerical Python).

factoid 3: any function that can deal with more than
one sequence type should use this API.
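
to make factoids 2 and 3 concrete, here's a minimal sketch.  the
"Words" class is a made-up example, not something from the library;
it implements nothing but length and item access, and that is
assumed to be enough for the generic sequence machinery (and for
"join") to accept it:

```python
class Words:
    """hypothetical sequence type: only __len__ and __getitem__."""

    def __init__(self, text):
        self._items = text.split()

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        # raises IndexError past the end, which is what ends iteration
        return self._items[index]


words = Words("a b c")

looped = [w for w in words]                         # "for-in"
mapped = list(map(str.upper, words))                # "map" and "list"
filtered = list(filter(lambda w: w != "b", words))  # "filter"
joined = "".join(words)                             # "join" sees the same API
```

nothing in "join" had to know about "Words"; it only walked the
items in order.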

factoid 4: you can implement "join" as a sequence of plain
string concatenations (possibly using "reduce"), but that's
not very efficient -- you'll end up doing lots of extra object
allocations and copies for the intermediate results.

(if you don't believe me, try it)
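
if you want to try it, here's one way.  this is a rough sketch,
not a benchmark: "reduce_join" and "chars_in_intermediates" are
names made up for this example, and the counter is a simplified
model that charges one new string per item added (the real thing
may allocate more, or an interpreter may optimize some of it away):

```python
from functools import reduce


def reduce_join(sep, items):
    """join by repeated concatenation; each step builds a new string."""
    items = list(items)
    if not items:
        return ""
    return reduce(lambda acc, item: acc + sep + item, items)


def chars_in_intermediates(sep, items):
    """simplified model: total size of every result built along the way."""
    copied = 0
    length = 0
    for i, item in enumerate(items):
        if i == 0:
            length = len(item)  # the first item is used as-is
        else:
            length += len(sep) + len(item)
            copied += length    # the whole new string has to be written
    return copied


items = ["a"] * 1000
naive_cost = chars_in_intermediates("", items)  # grows quadratically
final_size = len("".join(items))                # what a single pass writes
```

under that model, joining 1000 one-character strings writes
500499 characters to produce a 1000-character result.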

factoid 5: real-life experience shows that the cost of (4)
is usually much higher than the cost of accessing items of
a sequence object in order -- after all, most sequence objects
are used with things other than "join", so the implementors
tend to spend some time making sure item access is reasonably
fast.

factoid 6: an efficient way to implement "join" is to make sure
the C code knows all relevant details about the internals of
a single given string type.  in that way, it can do all kinds of
tricks to minimize the number of memory allocations and block
copies required to build the output string.
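
the general shape of such an implementation can be sketched in
Python.  this is only an illustration of the idea, not the actual
C code: "join_bytes" is a hypothetical name, and it assumes a
single concrete type (bytes) so that sizes can be computed up
front and the output buffer allocated once:

```python
def join_bytes(sep, items):
    """two-pass join for one concrete type: measure, allocate once, copy."""
    items = list(items)  # walk the sequence once, in order
    if not items:
        return b""

    # pass 1: the final size is known before anything is copied
    total = sum(len(item) for item in items) + len(sep) * (len(items) - 1)
    out = bytearray(total)  # one allocation of the final size

    # pass 2: copy every piece straight into its final position
    pos = 0
    for i, item in enumerate(items):
        if i:
            out[pos:pos + len(sep)] = sep
            pos += len(sep)
        out[pos:pos + len(item)] = item
        pos += len(item)

    # this conversion copies once more; C code that owns the
    # string internals could fill the final object directly
    return bytes(out)
```

note that the sizing and copying steps depend on how the type
stores its data, which is the point of factoid 6.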

think about this for a while, and then consider the following
statement.  does this look silly to you?

    unless we want to do a major redesign of the internals,
    the best way to get good performance is to have ONE
    implementation of "join" FOR EACH string type.

(now, if we need one implementation per string type, which
knows about the internals of that string type,  do you still think
it's a lousy idea to put that implementation *in* the string
type?)
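
for illustration, here's what that looks like from the caller's
side.  the sketch uses "str" and "bytes" purely as stand-ins for
"several string types"; which types the interpreter actually
provides is an assumption of the example:

```python
# each string type brings its own "join", taking any sequence
text = ", ".join(["a", "b", "c"])     # list of str
data = b", ".join((b"a", b"b", b"c"))  # tuple of bytes

# the implementation only knows its own type's internals, so
# handing it items of the other type is an error
try:
    ", ".join([b"a", b"b"])
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

the separator picks the implementation, and the argument can be
any sequence.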

</F>




