Non Continuous Subsequences

Wed Jul 30 12:32:25 EDT 2008

This post is not about practical stuff, so if you have little time,
you may ignore it.

This is a task of the rosettacode.org site:
http://www.rosettacode.org/wiki/Non_Continuous_Subsequences

A subsequence contains some subset of the elements of this sequence,
in the same order. A continuous subsequence is one in which no
elements are missing between the first and last elements of the
subsequence. The task is to enumerate all non-continuous subsequences
for a given sequence.

Translating the Scheme code to Python was easy (and I think this is
quite more readable than the Scheme version):

def ncsub(seq, s=0):
    if seq:
        x = seq[:1]
        xs = seq[1:]
        p2 = s % 2
        p1 = not p2
        return [x + ys for ys in ncsub(xs, s + p1)] + ncsub(xs, s +
p2)
    else:
        return [[]] if s >= 3 else []

Output:

>>> ncsub(range(1, 4))
[[1, 3]]
>>> ncsub(range(1, 5))
[[1, 2, 4], [1, 3, 4], [1, 3], [1, 4], [2, 4]]
>>> ncsub(range(1, 6))
[[1, 2, 3, 5], [1, 2, 4, 5], [1, 2, 4], [1, 2, 5], [1, 3, 4, 5], [1,
3, 4], [1, 3, 5], [1, 3], [1, 4, 5], [1, 4], [1, 5], [2, 3, 5], [2, 4,
5], [2, 4], [2, 5], [3, 5]]

To test its speed I use this:

from sys import argv

def ncsub(seq, s=0):
    if seq:
        x = seq[:1]
        xs = seq[1:]
        p2 = s % 2
        p1 = not p2
        return [x + ys for ys in ncsub(xs, s + p1)] + ncsub(xs, s +
p2)
    else:
        return [[]] if s >= 3 else []

import psyco; psyco.bind(ncsub)
print len( ncsub(range(1, int(argv[1]))) )

On a 2 GHz CPU it needs 6.4s with n=20 (the output contains 524_097
sublists!), and 7.8s without Psyco, so I think the speed isn't bad.

With the libs I have written for D, translating the Python code is not
difficult (with n=20 it runs in 3.72s with the last DMD compiler):

import d.all;

T[][] ncsub(T)(T[] seq, int s=0) {
    if (seq.length) {
        T[] xs = seq[1..$];
        int p2 = s % 2;
        int p1 = !p2;
        return map((T[] ys){return seq[0]~ys;}, ncsub(xs, s+p1)) ~
ncsub(xs, s+p2);
    } else
        return s >= 3 ? [DA!(T)] : null;
}

void main() {
    foreach (m; xrange(4, 7))
        putr(ncsub(range(1, m)));
}

But with normal D the program is short enough anyway (but a slower to
run (4.7s with n=20) and faster to compile, about 0.1s with full
optimizations and about 0.07s without):

import std.stdio: writefln;

T[][] ncsub(T)(T[] seq, int s=0) {
    if (seq.length) {
        T[][] aux;
        foreach (ys; ncsub(seq[1..$], s + !(s % 2)))
            aux ~= seq[0] ~ ys;
        return aux ~ ncsub(seq[1..$], s + s % 2);
    } else
        return s >= 3 ? [new T[0]] : null;
}

void main() {
    writefln(ncsub([1, 2, 3]));
    writefln(ncsub([1, 2, 3, 4]));
    writefln(ncsub([1, 2, 3, 4, 5]));
}

The Scheme version is eager, it comes from the first Haskell version,
that I think is lazy.

In Python the management of lazy iterables feels almost bolted-on
compared to Haskell, for example in Haskell lazy iterables don't
exhaust like in Python (because they are immutable), and while you
create a lazy list you can refer to the items already created.

But in many cases you can create a lazy code in Python too, even if
that may be harder. So I have tried to create a lazy version for
Python, hoping to speed up the code avoiding the creation and joining
of most/all sublists, but I have failed so far (once I have created a
lazy Python version, I can probably create a short lazy version in D
too, my D libs contain most of itertools module too).

In my failed attempts I have used chain() to join the sublists,
islice() to slice their items, and iter() to make the management more
uniform when the input seq is a Python list instead of an xrange, etc.

The:

if seq:

can be replaced by something like:

try:
    x = seq2.next()
except StopIteration:
    ...
else:
    ...

If you have some time to waste you may suggest me how to write a lazy
version in Python :-)

Bye,
bearophile