Candidate for a new itertool
bearophileHUGS at lycos.com
Sat Mar 7 21:58:44 EST 2009
Raymond Hettinger, maybe it would be useful to add an optional flag
argument that tells split_on whether to keep the separators or not?
This is the xsplit I usually use:
def xsplit(seq, key=bool, keepkeys=True):
    """xsplit(seq, key=bool, keepkeys=True): given an iterable seq and
    a predicate key, splits the iterable where key(item) is True and
    yields the parts as lists.
    If keepkeys is True then the splitting items are kept at the
    beginning of the sublists (but the first sublist may miss the key
    item).

    >>> list(xsplit([]))
    []
    >>> key = lambda x: 0x80 & x
    >>> l = [1,2,3,0xF0,4,5,6,0xF1,7,8,0xF2,9,10,11,12,13]
    >>> list(xsplit(l, key=key))
    [[1, 2, 3], [240, 4, 5, 6], [241, 7, 8], [242, 9, 10, 11, 12, 13]]
    >>> l = [0xF0,1,2,3,0xF0,4,5,6,0xF1,7,8,0xF2,9,10,11,12,13,0xF0,14,0xF1]
    >>> list(xsplit(l, key=key, keepkeys=False))
    [[1, 2, 3], [4, 5, 6], [7, 8], [9, 10, 11, 12, 13], [14]]
    >>> s1 = "100001000101100001000000010000"
    >>> ["".join(map(str, g)) for g in xsplit(s1, key=int)]
    ['10000', '1000', '10', '1', '10000', '10000000', '10000']
    >>> from itertools import groupby # To compare against groupby
    >>> s2 = "1111100011111100011100101011111"
    >>> ["".join(map(str, g)) for h, g in groupby(s2, key=int)]
    ['11111', '000', '111111', '000', '111', '00', '1', '0', '1', '0', '11111']
    """
    group = []
    for el in seq:
        if key(el):
            if group:
                yield group
                group = []
            if keepkeys:
                group.append(el)
        else:
            group.append(el)
    if group:
        yield group
Maybe it's better to separate or denote the separators in some way?
A possibility:
"X1X23X456X" => "X", "1", "X", "23", "X", "456", "X"
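
A rough sketch of how this first variant might look, reusing the loop from xsplit. The name split_flat is hypothetical, and I assume runs are collected as lists while separators are yielded as bare items:

```python
def split_flat(seq, key=bool):
    """Yield separators and the runs between them as one flat stream.

    split_flat is a hypothetical name used only for this sketch.
    """
    group = []
    for el in seq:
        if key(el):
            if group:          # flush the run collected so far
                yield group
                group = []
            yield el           # the separator is its own item
        else:
            group.append(el)
    if group:                  # trailing run with no closing separator
        yield group

# Join each part back into a string to compare with the example above.
parts = ["".join(p) for p in split_flat("X1X23X456X", key=lambda c: c == "X")]
```

With this input, parts should come out as the seven strings listed above.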
Another possibility:
"X1X23X456X" => ([], "X"), (["1"], "X"), (["2", "3"], "X"),
(["4", "5", "6"], "X")
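
A possible sketch of this pairing variant, with every chunk returned as a list (possibly empty). The name split_pairs is hypothetical, and pairing a leftover chunk with None when the input does not end in a separator is my assumption, not something decided above:

```python
def split_pairs(seq, key=bool):
    """Yield (chunk, separator) pairs; chunk is the list of items
    seen before that separator.

    split_pairs is a hypothetical name used only for this sketch.
    """
    group = []
    for el in seq:
        if key(el):
            yield group, el    # chunk may be empty
            group = []
        else:
            group.append(el)
    if group:
        # Assumption: a trailing chunk without separator is paired with None.
        yield group, None

pairs = list(split_pairs("X1X23X456X", key=lambda c: c == "X"))
```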
Another possibility (True means the item is a separator):
"X1X23X456X" => (True, "X"), (False, ["1"]), (True, "X"), (False,
["2", "3"]), (True, "X"), (False, ["4", "5", "6"]), (True, "X")
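
This flagged variant could be sketched like this. The name split_flagged is hypothetical, and each separator is reported on its own, so successive separators are not merged:

```python
def split_flagged(seq, key=bool):
    """Yield (True, separator) for each separator and (False, run)
    for each non-empty run of other items.

    split_flagged is a hypothetical name used only for this sketch.
    """
    group = []
    for el in seq:
        if key(el):
            if group:
                yield False, group
                group = []
            yield True, el     # one tuple per separator, no merging
        else:
            group.append(el)
    if group:
        yield False, group

flagged = list(split_flagged("X1X23X456X", key=lambda c: c == "X"))
```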
Is it useful to merge successive separators (notice the two adjacent X)?
"X1X23XX456X" => (True, ["X"]), (False, ["1"]), (True, ["X"]), (False,
["2", "3"]), (True, ["X", "X"]), (False, ["4", "5", "6"]), (True,
["X"])
Oops, this is just groupby :-)
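
For the record, a small check that itertools.groupby with a boolean key gives that merged form. The predicate name is_sep is just for illustration:

```python
from itertools import groupby

def is_sep(c):
    # Illustrative predicate: "X" is the separator.
    return c == "X"

# groupby collects consecutive items with the same key value, so
# successive separators end up merged into one group.
merged = [(flag, list(run)) for flag, run in groupby("X1X23XX456X", key=is_sep)]
```

The key has to return a real bool here. With a key like the 0x80 mask in the xsplit doctests, groupby would group by the masked value, not by its truthiness.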
Is a name like isplitter or splitter better for this itertool?
Bye,
bearophile