[Numpy-discussion] Fast way to convert (nested) list to numpy object array?

Marc Hulsman m.hulsman at tudelft.nl
Thu Jul 3 04:51:31 EDT 2014


Hello,

In my application I use nested, sometimes variable-length lists, e.g.
[[1,2], [1,2,3], ...]. These can also become doubly nested, etc., up to
arbitrary complexity.

I would like to use numpy indexing on the outer list, i.e. I want to
create: array([[1, 2], [1, 2, 3]], dtype=object)
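For a small ragged list, the target array and the kind of outer-list indexing meant here can be sketched as follows. It assumes a NumPy version where passing dtype=object explicitly is allowed for ragged input (true for recent releases). The last line shows the pitfall: when the inner lists happen to have equal length, numpy.array builds a 2-D array instead of a 1-D array of lists.

```python
import numpy as np

k = [[1, 2], [1, 2, 3]]
a = np.array(k, dtype=object)  # ragged at the first level -> 1-D object array
print(a.shape)                 # (2,)

# NumPy indexing on the outer list: fancy indexing and boolean masks.
print(a[[1, 0]][0])            # [1, 2, 3]
lengths = np.array([len(e) for e in a])
print(a[lengths > 2][0])       # [1, 2, 3]

# Pitfall: equal-length inner lists are descended into.
print(np.array([[1, 2], [3, 4]], dtype=object).shape)  # (2, 2)
```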

However, because numpy likes to 'walk' through the nested lists, this
becomes rather slow when the nested lists are large, e.g.:

k = [range(i) for i in range(10000)]
%timeit numpy.array(k)
1 loops, best of 3: 2.11 s per loop

Compare this with shorter lists, e.g.:
k2 = [range(numpy.random.randint(0,10)) for i in range(10000)]
%timeit numpy.array(k2)
100 loops, best of 3: 2.7 ms per loop

As I know beforehand that numpy does not have to descend into these
objects, I would just like to create a 1-dimensional array. I thought
about using fromiter, but this fails with:

ValueError: cannot create object arrays from iterator

A second approach I tried is to create an empty array, and then fill it:
x = numpy.empty(len(k), dtype=object)
%timeit x[:] = k
1000 loops, best of 3: 220 µs per loop

This already works much, much better, but the assignment still takes
time to 'descend' into the objects if they have a fixed size, e.g.:
k3 = [[range(10) for i in range(100)] for i in range(10000)]
%timeit x[:] = k3
10 loops, best of 3: 45.6 ms per loop

A Python loop is even faster in these cases:
%timeit for pos, e in enumerate(k3): x[pos] = e
1000 loops, best of 3: 1.02 ms per loop
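A minimal sketch wrapping that loop into a reusable helper (the name to_object_array is hypothetical, not a NumPy function). It relies on the fact that assigning to a single element of an object array stores the Python object itself, so the nested contents are never examined; the identity check at the end illustrates this. list(range(...)) is used so the example behaves the same under Python 3.

```python
import numpy as np

def to_object_array(seq):
    """Hypothetical helper: put each top-level item of `seq` into a 1-D object array.

    Scalar-index assignment stores a reference to the item, so NumPy
    never inspects (or copies) the nested contents.
    """
    out = np.empty(len(seq), dtype=object)
    for pos, item in enumerate(seq):
        out[pos] = item
    return out

k3 = [[list(range(10)) for _ in range(100)] for _ in range(3)]
x = to_object_array(k3)
print(x.shape)        # (3,)
print(x[0] is k3[0])  # True: the element is the original list, not a copy
```

The cost is one Python-level assignment per outer item, independent of how large or regular the inner lists are.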

This piece of code is quite time-critical in my application, and I
observe slowdowns due to this behaviour. My question is therefore
whether there is a fast way to simply convert a list into a
1-dimensional object array without descending into each object.

More generally, if I create an array with numpy.array(k), would it be
possible to indicate that it should search only 1, 2, ... nested levels
deep into k?
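As a sketch of what "only N levels deep" could mean, the element-wise approach generalizes to a fixed depth. The function to_object_array_nd and its depth parameter are hypothetical, not part of numpy.array. It assumes the first depth levels are rectangular: the shape is probed from the first item at each level, every list on the way down is checked against it, and everything below depth is stored as an opaque Python object.

```python
import numpy as np

def to_object_array_nd(seq, depth=1):
    """Hypothetical helper: object array built from the first `depth` levels of `seq`."""
    # Probe the shape by following the first item at each level.
    shape = []
    level = seq
    for _ in range(depth):
        shape.append(len(level))
        level = level[0] if len(level) else []
    out = np.empty(tuple(shape), dtype=object)
    for idx in np.ndindex(*shape):
        item = seq
        for d, i in enumerate(idx):
            # Every list on the path must match the probed length.
            if len(item) != shape[d]:
                raise ValueError(f"level {d} is ragged; cannot use depth={depth}")
            item = item[i]
        out[idx] = item  # full index -> stores the object itself
    return out

k = [[[0, 1], [2]], [[3], [4, 5, 6]]]
print(to_object_array_nd(k, depth=1).shape)  # (2,)
print(to_object_array_nd(k, depth=2).shape)  # (2, 2)
```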

Thanks for any advice,
Marc