Sequence splitting

Fri Jul 3 00:40:13 EDT 2009

On Thu, Jul 2, 2009 at 23:34, Brad <schickb at gmail.com> wrote:
> On Jul 2, 9:08 pm, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
>> Brad <schi... at gmail.com> writes:
>> > On Jul 2, 8:14 pm, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
>> > > schickb <schi... at gmail.com> writes:
>> > > > def split(seq, func=None):
>> > > >     if func is None:
>> > > >         func = bool
>> > > >     t, f = [], []
>> > > >     for item in seq:
>> > > >         if func(item):
>> > > >             t.append(item)
>> > > >         else:
>> > > >             f.append(item)
>> > > >     return (t, f)
>>
>> > > untested:
>>
>> > >    def split(seq, func=bool):
>> > >       xs = zip(seq, itertools.imap(func, seq))
>> > >       t = list(x for (x,y) in xs if y)
>> > >       f = list(x for (x,y) in xs if not y)
>> > >       return (t, f)
>>
>> > In my testing that is 3.5x slower than the original solution (and less
>> > clear imo). I fixed my version to take a bool default. Either way, I'm
>> > not really looking for additional ways to do this in Python unless
>> > I've totally missed something. What I am considering is writing it in
>> > C, much like filter.
>>
>> I'm a little skeptical that the C version will help much, if it's
>> evaluating a python function at every list element.
>
> Perhaps true, but it would be a nice convenience (for me) as a built-
> in written in either Python or C. Although the default case of a bool
> function would surely be faster.
>
>> Here's a variant of your version:
>>
>>  def split(seq, func=bool):
>>      t, f = [], []
>>      ta, fa = t.append, f.append
>>      for item in seq:
>>          (ta if func(item) else fa)(item)
>>      return (t, f)
>>
>> This avoids some dict lookups and copying.  I wonder if that helps
>> significantly.
>
> Faster, but in tests of a few short sequences only 1% so.
>
> -Brad

If it is speed that we are after, it's my understanding that map and
filter are faster than iterating with the for statement (and also
faster than list comprehensions).  So here is a rewrite:

def split(seq, func=bool):
    t = filter(func, seq)
    f = filter(lambda x: not func(x), seq)
    return list(t), list(f)

The lambda thing is kinda ugly, but I can't think of anything else.
Also, is it ok to return lists?  Py3k saw a lot of APIs changed to
return iterables instead of lists, so maybe my function should have
'return t, f' as its last statement.
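
One possible way around the lambda, and a sketch of what the 'return t, f' variant might look like: itertools has a filterfalse function in Python 3 (it is spelled ifilterfalse in Python 2), and itertools.tee lets both filters share a single pass over the input. The name split_lazy is made up for this example, and the code assumes Python 3, where filter returns an iterator.

```python
import itertools

def split_lazy(seq, func=bool):
    # tee gives two independent iterators over the same input, so a
    # one-shot iterable such as a generator still works.
    a, b = itertools.tee(seq)
    t = filter(func, a)                 # items where func(item) is true
    f = itertools.filterfalse(func, b)  # items where func(item) is false
    return t, f

evens, odds = split_lazy(range(10), lambda n: n % 2 == 0)
print(list(evens))  # [0, 2, 4, 6, 8]
print(list(odds))   # [1, 3, 5, 7, 9]
```

This still calls func twice per item, and tee has to buffer whatever one iterator has consumed and the other has not, so it is not obviously a win over building the two lists directly.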

-- 
Pablo Torres N.