How to generate all substrings of a string

girish at it.usyd.edu.au girish at it.usyd.edu.au
Tue Jul 11 22:44:54 EDT 2006


Quoting girish at it.usyd.edu.au:

> Quoting Bruno Desthuilliers <bdesth.quelquechose at free.quelquepart.fr>:
>
> > girish at it.usyd.edu.au a écrit :
> > > Hi,
> > >  I want to generate all non-empty substrings of a string of length
> > >=2.
> > > Also,
> > > each substring is to be paired with 'string - substring' part and
> vice
> > > versa.
> > >  Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'],
> ['c',
> > > 'ab'], ['b', 'ac'], ['ac', 'b']] etc.
> > >  Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'],
> ['abc',
> > > 'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd',
> > 'c'],
> > > ['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
> > > 'bd'],['bd','ac']]
> > >
> > >  I've tried the following but i cant prevent duplicates and i'm
> > missing
> > > some substrings:
> > >
> > >
> > >>>>colocn = 'abcd'
> > >>>>k = 4
> > >>>>for i in range(k-1):
> > >
> > >             for j in range(1,k):
> > >                 rule1 = [colocn[i:i+j],colocn[:i]+colocn[i+j:]]
> > >                 rule2 = [colocn[:i]+colocn[i+j:],colocn[i:i+j]]
> > >                 rules.append(rule1)
> > >                 rules.append(rule2)
> > >
> > >>>>rules
> > >
> > > [['a', 'bcd'], ['bcd', 'a'], ['ab', 'cd'], ['cd', 'ab'], ['abc',
> 'd'],
> > > ['d', 'abc'], ['b', 'acd'], ['acd', 'b'], ['bc', 'ad'], ['ad',
> 'bc'],
> > > ['bcd', 'a'], ['a', 'bcd'], ['c', 'abd'], ['abd', 'c'], ['cd',
> 'ab'],
> > > ['ab', 'cd'], ['cd', 'ab'], ['ab', 'cd']]
> > >
> > >
> > > Any ideas??
> >
> > Algorithmic problem.
> >
> > First, you need to get all permutations. This is a known algorithm,
> that
> > you'll find examples of on the net. Then for each permutation, list
> > possibles 'pair-splits'.
> >
> > Here's a (probably sub-optimal, but it's getting late here) possible
> > implementation in functional style:
> >
> > def rotate(lst):
> >      yield lst
> >      max = len(lst)
> >      for i in range(1, max):
> >          yield lst[i:] + lst[:-(max-i)]
> >
> > def permute(lst):
> >      if len(lst) > 2:
> >          for rotated in rotate(lst):
> >              head, tail = rotated[0], rotated[1:]
> >              for permuted in permute(tail):
> >                  yield [head] + permuted
> >      elif len(lst) == 2:
> >          yield lst
> >          yield lst[::-1]
> >      else:
> >          yield lst
> >
> > def splits(lst):
> >      for i in range(1, len(lst)):
> >          yield lst[0:i], lst[i:]
> >
> > def allsplits(lst):
> >      for permuted in permute(lst):
> >          for pair in splits(permuted):
> >              yield pair
> >
> > def listsubstrings(thestr):
> >      format = lambda pair: (''.join(pair[0]), ''.join(pair[1]))
> >      return [format(list(pair)) for pair in allsplits(list(thestr))]
> >
> >
> > print listsubstrings("abcd")
> >
>
> thanks Bruno....wanted to avoid permute function but i guess i've no no
> option :((...
> also there is some double counting in this one (all rules outputted
> twice)...i've to find out where...
A Recursive Attempt:
def gen(s):
    sList = [s[:i]+s[i+1:] for i in range(len(s))]
    substringList = sList
    s = sList
    while len(s[0]) != 1:
        substrings = []
        for string in s:
            sList = [string[:i]+string[i+1:] for i in range(len(string))]
            substrings.extend(sList)
        s = set(substrings)
        s = list(s)
        substringList.extend(s)
    print substringList
    return substringList
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > --
> > http://mail.python.org/mailman/listinfo/python-list
> >
>
>
>
>
> ----------------------------------------------------------------
> This message was sent using IMP, the Internet Messaging Program.
> --
> http://mail.python.org/mailman/listinfo/python-list
>




----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.



More information about the Python-list mailing list