[Numpy-discussion] Structured array creation with list of lists and others

Allan Haldane allanhaldane at gmail.com
Fri Mar 24 12:48:32 EDT 2017


On 03/23/2017 02:16 PM, Kirill Balunov wrote:
> It was the first time I tried to create a structured array in numpy.
> Usually I use pandas for heterogeneous arrays, but it is one more
> dependency to my project.
> 
> It took me some time (really, much more than some), to understand the
> problem with structured array creation. As example:
> 
> I had list of list of this kind:
> b=[[ 1, 10.3, 12.1, 2.12 ],...]
> 
> And tried:
> np.array(b, dtype='i4,f4,f4,f4')
> 
> Which raises some weird exception:
> TypeError: a bytes-like object is required, not 'int'
> 
> Two hours later I found that I need list of tuples. I didn't find any help
> in documentation and could not realize that the problem with the inner
> lists...
> 
> Why there is such restriction - 'list of tuples' to create structured
> array? What is the idea behind that, why not list of lists, or tuple of
> lists or ...?
> 
> Also the exception does not help at all...
> p.s.: It looks like that dtype also accepts only list of tuples. But I can
> not catch the idea for this restrictions.
> 

The problem is that numpy needs to distinguish between multidimensional
arrays and structured elements. A "list of lists" will often trigger
numpy's broadcasting rules, which is not what you want here.

For instance, should numpy interpret your input list as a 2d array of
shape Lx4 containing scalar elements, or a 1d array of length L of
structs with 4 fields?
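
To make the two readings concrete, here is a minimal sketch. The second
row of b is made up for illustration (the original list was elided), and
the variable names as_2d and records are mine:

```python
import numpy as np

# Hypothetical sample rows in the shape described in the question.
b = [[1, 10.3, 12.1, 2.12],
     [2, 20.5, 22.0, 4.25]]

# Reading 1: every level of list nesting is an array dimension,
# so the result is a plain 2d array of shape (L, 4).
as_2d = np.array(b)
print(as_2d.shape)

# Reading 2: each row is one record. Converting the inner lists to
# tuples tells numpy to fill the 4 fields of a struct from each row.
records = np.array([tuple(row) for row in b], dtype='i4,f4,f4,f4')
print(records.shape)
print(records.dtype.names)
```

The only difference between the two calls, apart from the dtype, is the
tuple(row) conversion, which is what marks a row as a single struct.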

In this particular case maybe numpy could, in principle, figure it out
from what you gave it by calculating that the innermost dimension is
the same length as the number of fields. But there are other cases (such
as assignment) where similar ambiguities arise that are harder to
resolve. So to preserve our sanity we want to require that structures be
formatted as tuples all the time.
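
As a rough illustration of the assignment case, assuming the scalar
assignment behavior of recent numpy releases (the array name rec and the
values are only for illustration):

```python
import numpy as np

# Three zero-initialized records with the same dtype as in the question.
rec = np.zeros(3, dtype='i4,f4,f4,f4')

# A tuple is read as one record: its items go to the fields in order.
rec[0] = (1, 0.5, 1.5, 2.5)

# A bare scalar is not a record, so numpy copies it into every field
# of each element it is assigned to.
rec[1:] = 7

print(rec)
```

Because a tuple always means "one struct", numpy does not have to guess
whether the right-hand side is a record or a sequence of values to
spread over several elements.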

I have a draft of potential updated structured array docs you can read here:
https://gist.github.com/ahaldane/7d1873d33d4d0f80ba7a54ccf1052eee

See the section "Assignment from Python Native Types (Tuples)", which
hopefully better warns that tuples are needed. Let me know if you think
something is missing from the draft.

(WARNING: the section about multi-field assignment in the doc draft is
incorrect for current numpy - that's what I'm proposing for the next
release. The rest of the docs are accurate for current numpy)

Agreed that the error message could be changed.

Allan