flatten(), [was Re: map/filter/reduce/lambda opinions and background unscientific mini-survey]

Tom Anderson twic at urchin.earth.li
Tue Jul 5 19:51:28 EDT 2005


On Tue, 5 Jul 2005, Ron Adam wrote:

> Tom Anderson wrote:
>
>> The trouble with these is that they make a lot of temporary lists - 
>> George's version does it with the recursive calls to flatten, and Ron's 
>> with the slicing and concatenating. How about a version which never 
>> makes new lists, only appends the base list? We can use recursion to 
>> root through the lists ...
>
> Ok...  How about a non-recursive flatten in place? ;-)

How about, er, oh, i give up.

> def flatten(seq):
>    i = 0
>    while i!=len(seq):
>        while isinstance(seq[i],list):
>            seq.__setslice__(i,i+1,seq[i])
>        i+=1
>    return seq
>
> seq = [[1,2],[3],[],[4,[5,6]]]
> print flatten(seq)
>
> I think I'll be using the __setslice__ method more often.

Um, can't you just do slice assignment? Make that line:

             seq[i:i+1] = seq[i]
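(To spell that out: slice assignment splices the sublist's items into the parent in place, exactly what `__setslice__` was doing. A one-step sketch, written in modern Python spelling:)

```python
# Replacing the one-element slice [i:i+1] with the sublist's items
# shifts the tail of the list along; no temporary list on our side.
seq = [[1, 2], [3], [], [4, [5, 6]]]
seq[0:1] = seq[0]   # same effect as seq.__setslice__(0, 1, seq[0])
print(seq)          # [1, 2, [3], [], [4, [5, 6]]]
```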

> And the test:

Stupendous and timely work!

> The results on Python 2.3.5: (maybe someone can try it on 2.4)
>
> recursive flatten: 23.6332723852
> flatten in place-non recursive: 22.1817641628
> recursive-no copies: 30.909762833
> smallest recursive: 35.2678756658
> non-recursive flatten in place without copies: 7.8551944451

GAAAAH!

> A 300% improvement!!!
>
> This shows the value of avoiding copies, recursion, and extra function 
> calls.

Specifically, it shows the value of avoiding extra function calls, since 
my zerocopy version is slower than the copy-happy single-function 
versions.

Also, there are some differences between the functions which constitute 
potential hillocks on the playing field - i test flattenability by looking 
for an __iter__ method, whereas other implementations mostly ask 
"isinstance(x, list)?" (less general, and less in the spirit of duck 
typing, IMNERHO). I'd argue that my decomposition into functions is this 
sort of difference, too - a matter of style (good style!), not algorithm. 
So, levelling those differences, and throwing in my non-recursive 
zerocopy foolery, here's my take on it ...

# ----

# here be a quick reimplementation of timeit to time function objects
# no exec for me no siree bob
# all you need to know is that timeit(fn) gives you time taken to run fn

import sys
import time

TIMERS = {
 	"win32": time.clock
}

timer = TIMERS.get(sys.platform, time.time)

def timeit(fn, n=None):
 	if (n == None):
 		t = 0.1
 		n = 1
 		while (t < 1.0):
 			n = max(int((n * min((1.0 / t), 10))), (n + 1))
 			t = _timeit(fn, n)
 	else:
 		t = _timeit(fn, n)
 	return t / n

def _timeit(fn, n):
 	it = xrange(n)
 	t0 = timer()
 	for i in it:
 		fn()
 	t1 = timer()
 	return float((t1 - t0))

# there is real code now
# i've rewritten the functions to use uniform variable names
# and to use isiterable

def isiterable(obj):
 	return hasattr(obj, "__iter__")

def georges_recursive_flatten(seq):
 	return reduce(_accum, seq, [])

def _accum(a, item):
 	if isiterable(item):
 		a.extend(georges_recursive_flatten(item))
 	else:
 		a.append(item)
 	return a

def rons_nonrecursive_flatten(seq):
	a = []
	while seq:
		# guard: splicing in an empty sublist can empty seq entirely
		while seq and isiterable(seq[0]):
			seq = seq[0] + seq[1:]
		if seq:
			a.append(seq.pop(0))
	return a

def toms_recursive_zerocopy_flatten(seq, a=None):
	if a is None:
		# fresh accumulator per top-level call; a mutable default
		# argument would persist (and grow) across calls
		a = []
	if isiterable(seq):
		for item in seq:
			toms_recursive_zerocopy_flatten(item, a)
	else:
		a.append(seq)
	return a

def toms_iterative_zerocopy_flatten(seq):
 	stack = [None]
 	cur = iter(seq)
 	a = []
 	while (cur != None):
 		try:
 			item = cur.next()
 			if (isiterable(item)):
 				stack.append(cur)
 				cur = iter(item)
 			else:
 				a.append(item)
 		except StopIteration:
 			cur = stack.pop()
 	return a

def devans_smallest_recursive_flatten(seq):
 	if (isiterable(seq)):
 		return sum([devans_smallest_recursive_flatten(item) for item in seq], [])
 	else:
 		return [seq]

def rons_nonrecursive_inplace_flatten(seq):
	i = 0
	while i < len(seq):
		# guard: splicing in an empty sublist shortens seq,
		# so i can run off the end
		while i < len(seq) and isiterable(seq[i]):
			seq[i:(i + 1)] = seq[i] # setslice takes iterables!
		i = i + 1
	return seq

flattens = [
 	georges_recursive_flatten,
 	rons_nonrecursive_flatten,
 	toms_recursive_zerocopy_flatten,
 	toms_iterative_zerocopy_flatten,
 	devans_smallest_recursive_flatten,
 	rons_nonrecursive_inplace_flatten
]

seq = [[1,2],[3],[],[4,[5,6]]]

def timeflatten(flatten):
 	return timeit(lambda: flatten(seq))

def funcname(fn):
 	return fn.func_name

print zip(map(funcname, flattens), map(timeflatten, flattens))

# ----
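(One thing the harness above never does is check that the six functions actually agree on their output. A minimal cross-check, with two of the variants respelled for modern Python - reduce lives in functools these days, and the destructive flattens need a fresh copy of the input each time:)

```python
import copy
from functools import reduce

def isiterable(obj):
    return hasattr(obj, "__iter__")

def recursive_flatten(seq):
    # George's reduce-based version, Python 3 spelling
    def accum(a, item):
        if isiterable(item):
            a.extend(recursive_flatten(item))
        else:
            a.append(item)
        return a
    return reduce(accum, seq, [])

def nonrecursive_inplace_flatten(seq):
    # Ron's in-place version; mutates and returns seq
    i = 0
    while i < len(seq):
        while i < len(seq) and isiterable(seq[i]):
            seq[i:i + 1] = seq[i]
        i += 1
    return seq

seq = [[1, 2], [3], [], [4, [5, 6]]]
expected = [1, 2, 3, 4, 5, 6]
assert recursive_flatten(copy.deepcopy(seq)) == expected
assert nonrecursive_inplace_flatten(copy.deepcopy(seq)) == expected
```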

The output (in python 2.3 on a 1.5 GHz G4 pbook with OS X 10.3) is:

[
('georges_recursive_flatten', 0.00015331475192276888), 
('rons_nonrecursive_flatten', 0.00015447115513356376), 
('toms_recursive_zerocopy_flatten', 0.00012239551614106925), 
('toms_iterative_zerocopy_flatten', 0.00035910996630353429), 
('devans_smallest_recursive_flatten', 0.00019606360118084218), 
('rons_nonrecursive_inplace_flatten', 5.8524826144294404e-05)
]

So, my zerocopy flatten is, after all, faster than George's, your, and 
Devan's copy-happy versions, although not by much; my iterative version 
is much, much slower; and the winner is still your in-place flatten, 
which beats my not-too-bad 122 microseconds with a scorching 58!

I guess setslice is a lot faster than i thought. How are python lists 
implemented? Presumably not as straightforward arrays, where inserting a 
bunch of items at the head would mean moving everything else in the list 
along?
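(For what it's worth, CPython lists are in fact plain over-allocated arrays of pointers, so a head splice really does shift everything along - it's O(n) - but on a six-element list that memmove is dwarfed by interpreter overhead, which would explain why setslice still wins here. A quick demonstration of the asymmetry:)

```python
from timeit import timeit

# Inserting at the head shifts every existing element along;
# appending at the tail is amortised O(1). The gap widens as the
# list grows, which is invisible on tiny benchmark inputs.
n = 10000
head = timeit("xs.insert(0, 0)", setup="xs = []", number=n)
tail = timeit("xs.append(0)", setup="xs = []", number=n)
print(head, tail)
```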

We really ought to do this benchmark with a bigger list as input - a few 
thousand elements, at least. But that would mean writing a function to 
generate random nested lists, and that would mean specifying parameters 
for the geometry of its nestedness, and that would mean exploring the 
dependence of the performance of each flatten on each parameter, and that 
would mean staying up until one, so i'm not going to do that.
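(Should anyone else feel like staying up until one: a generator of random nested lists might be sketched as below. The nest_prob and max_depth knobs are invented here as a guess at "the geometry of its nestedness" - purely illustrative, tune to taste.)

```python
import random

def random_nested_list(n, nest_prob=0.3, max_depth=5, seed=None):
    # Build a list of n items; each item is, with probability
    # nest_prob (while depth allows), itself a small nested list,
    # otherwise a plain integer. Seedable for reproducible runs.
    rng = random.Random(seed)
    def build(n, depth):
        out = []
        for _ in range(n):
            if depth < max_depth and rng.random() < nest_prob:
                out.append(build(rng.randint(0, 3), depth + 1))
            else:
                out.append(rng.randint(0, 9))
        return out
    return build(n, 0)

big = random_nested_list(1000, seed=42)
```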

tom

-- 
[Philosophy] is kind of like being driven behind the sofa by Dr Who - scary, but still entertaining. -- Itchyfidget


