Light slices + COW

David wizzardx at gmail.com
Sat May 3 20:10:16 EDT 2008


>  That xpairs() generator is nice, but it's not the best possible code
>  (but you may disagree with me, and you may think that code better than
>  the successive D code that uses two slices). Inside it I can't use a
>  list slicing because it copies subparts of the list, probably becoming
>  too much slow (for example if len(seq) == 1000).

What do you mean by best possible? Most efficient? Most readable? And
why don't you use islice?

eg:

def xpairs(seq):
    len_seq = len(seq)
    for i, e1 in enumerate(seq):
        for e2 in islice(seq, i+1, None):
            yield e1, e2

Here's a version which makes more use of itertools. It should be more
efficient, but it's ugly :-) (this is my first time using itertools).

def xpairs(seq):
    def _subfunc():
        for i in xrange(len(seq)):
            e1 = seq[i]
            yield izip(repeat(e1), islice(seq, i+1, None))
    return chain(*_subfunc())

>  That D code is as fast or faster than the code you can back-translate
>  from Python, this is possible because in D arrays slices are very
>  light, they are a struct of <length, pointer>

D compiles to efficient machine code so Python is at a disadvantage
even if you use the same syntax (see my first example). You can make
the Python version faster, but beware of premature optimization.

>  I think Python 3.0 too may enjoy a similar strategy of light slices +
>  COW for strings, lists and arrays (tuples and strings don't need the
>  COW).

What I'dlike to see is a rope[1] module for Python. I'ts in C++'s STL
library[2], but I haven't found a Python version yet.

[1] http://en.wikipedia.org/wiki/Rope_(computer_science)
[2] http://www.sgi.com/tech/stl/Rope.html

With a Python rope library you could do things like this:

a = '<some extremely long string>'
b = rope(a) # Contains a reference to a
c = b[0:100000] # Get a rope object
d = b[100000:200000] # Get another rope object
e = b + b # Get another rope object
print e # Get the string representation of all the re-assembled sub-sections

# And so on. In the above code there was only 1 copy of the huge
string in memory. The rope objects only contain a tree of
sub-operations (slices, concatenations, references to original
sequences, etc).

This shouldn't be too hard to implement. Does anyone know of an
already-existing 'rope' module?

David.



More information about the Python-list mailing list