Enumerating k-segmentations of a sequence

bullockbefriending bard kinch1967 at gmail.com
Wed Nov 26 10:52:05 EST 2008


On Nov 26, 12:15 am, marek.ro... at wp.pl wrote:
> bullockbefriending bard napisa³(a):
>
>
>
> > I'm not sure if my terminology is precise enough, but what I want to
> > do is:
>
> > Given an ordered sequence of n items, enumerate all its possible k-
> > segmentations.
>
> > This is *not* the same as enumerating the k set partitions of the n
> > items because I am only interested in those set partitions which
> > preserve the order of the items. i.e. if the input sequence is (1 2 3
> > 4 5), then ((1 4) (2 3) (5)) is unacceptable, whereas ((1 2) (3) (4
> > 5)) is acceptable. Hence use of term 'segmentations'.
>
> > e.g., for input sequence (1 2 3 4 5) and k = 3, I need a function
> > which enumerates the following 3-segmentations:
>
> > ((1 2 3) (4) (5))
> > ((1 2) (3 4) (5))
> > ((1 2) (3) (4 5))
> > ((1) (2 3 4) (5))
> > ((1) (2 3) (4 5))
> > ((1) (2) (3 4 5))
>
> > The reason I want to do this is to use it in some simple one-
> > dimensional clustering, i.e. each set item (won't be just integers as
> > above) will have a value of interest, and i'll select the particular
> > set partition which minimises Sigma SSD (sum of squared differences
> > from mean) of these values.
>
> > It seems overkill to generate the full list of set partitions
> > [including e.g. ((1 4) (2) (3 5))] because I intend to begin by
> > sorting the input sequence such that element 1 < element 2 < ... <
> > element n.
>
> > Structural Recursion not being my strong point, any ideas on how to go
> > about this would be much appreciated!
>
> I'm not sure if I correctly understood the semantics of what you need.
> Anyway it looks like a nice exercise in nested generators:
>
> def segmentations(sequence, k):
>         assert 1 <= k <= len(sequence)
>         if k == 1:
>                 yield [sequence]
>         else:
>                 for headlen in xrange(1, len(sequence) - k + 2):
>                         head, tail = sequence[:headlen], sequence[headlen:]
>                         for tail_seg in segmentations(tail, k - 1):
>                                 yield [head] + tail_seg
>
> for segmentation in segmentations((1, 2, 3, 4, 5), 3):
>         print segmentation
>
> Regards,
> Marek

Exactly what I needed. Thank you all for examples and also for
explaining how to think about problems of this nature!




More information about the Python-list mailing list