Some thougts on cartesian products

Christoph Zwerschke cito at online.de
Sun Jan 22 01:51:08 EST 2006


In Python, it is possible to multiply a string with a number:

 >>> "hello"*3
'hellohellohello'

However, you can't multiply a string with another string:

 >>> 'hello'*'world'
Traceback (most recent call last):
   File "<interactive input>", line 1, in ?
TypeError: can't multiply sequence by non-int

Sometimes I was missing such a feature.
What I expect as the result is the "cartesian product" of the strings.

Here is a simple implementation of new list and string objects that 
explains what I mean. It also implements powers of lists and strings:

class plist(list):
     """list with cartesian product list"""

     def __mul__(self, other):
         if isinstance(other, pstr):
             return plist([s+o for s in self for o in other])
         if hasattr(other, '__getitem__'):
             return plist([[s, o] for s in self for o in other])
         else:
             return list(self)*other

     def __pow__(self, other):
         if isinstance(other, int) and other > 0:
             if other == 1:
                 return self
             return self * self**(other-1)

class pstr(str):
     """str with cartesian product list"""

     def __mul__(self, other):
         if hasattr(other, '__getitem__'):
             return plist([s+o for s in self for o in other])
         else:
             return str(self)*other

     def __pow__(self, other):
         if isinstance(other, int) and other > 0:
             if other == 1:
                 return self
             return self * self**(other-1)

With these new strings you can do the following:

 >>> pstr("ab")*pstr("cd")*pstr("ef")
['ace', 'acf', 'ade', 'adf', 'bce', 'bcf', 'bde', 'bdf']

 >>> print pstr("abcdefgh")*pstr("12345678")
['a1', 'a2', ..., 'a8', 'b1', 'b2', ..., 'b8',
..., ..., ..., 'h1', 'h2', ..., 'h8']

 >>> print pstr("ACGU")**3
['AAA', 'AAC', 'AAG', 'AAU', 'ACA', 'ACC', 'ACG', ...,
..., 'UGC', 'UGG', 'UGU', 'UUA', 'UUC', 'UUG', 'UUU']

I think this can be quite handy at times and save some typing.

If Python strings had this ability, you could even outdo the
117 byte solution in the recent shortest Python coding contest 
(http://www.pycontest.net), as follows:

j=''.join;seven_seg=lambda x:j(j(\
(' |'*'_ '*' |')[ord('|B¬@z”(ÀD°'[int(d)])%e]\
for d in x)+'\n'for e in(4,9,7))

This has only 110 bytes.

Or you could write a simple password cracker like that:

def crack(crypted, alphabet):
     for passwd in alphabet**4:
         if crypt(passwd, crypted[:2]) == crypted:
             return passwd

And call it with alphabet = string.lowercase, for example.

Cartesian products may be generally interesting for iterables:

def imul(iterable1, iterable2):
     """cartesian product of two iterables"""
     for object1 in iterable1:
         for object2 in iterable2:
             if isinstance(object1, basestring) and \
                     isinstance(object2, basestring):
                 yield object1 + object2
             else:
                 yield (object1, object2)

def ipow(iterable, number):
     """cartesian product power of an iterable"""
     if number == 1:
         for object in iterable:
             yield object
     elif number > 1:
         for object1 in iterable:
             for object2 in ipow(iterable, number-1):
                 yield object1 + object2

class istr(str):
     """str with iterable cartesian product"""

     def __mul__(self, other):
         if isinstance(other, str):
             return imul(self, other)
         else:
             return str(self)*other

     def __pow__(self, other):
         return ipow(self, other)

I was wondering if similar functionality could be added in some way to 
Python. I noticed that Python has a lot of aggregate functions that can 
"reduce" given collection objects (like reduce, filter, min, max, sum, 
hash) and functions that keep the same size (like map, sort or zip), but 
few functions that can "inflate" the given objects (like range and 
enumerate). I know, such functions are dangerous because they also 
inflate time and memory consumed by the program. Still, sometimes they 
can make sense, whenever you for some reason simply *have* to walk 
through all the combinations.



More information about the Python-list mailing list