[Numpy-discussion] question about creating numpy arrays

Benjamin Root ben.root at ou.edu
Thu May 20 10:44:41 EDT 2010


>
> I gave two counterexamples of why.
>

The examples you gave aren't counterexamples.  See below...

On Wed, May 19, 2010 at 7:06 PM, Darren Dale <dsdale24 at gmail.com> wrote:

> On Wed, May 19, 2010 at 4:19 PM,  <josef.pktd at gmail.com> wrote:
> > On Wed, May 19, 2010 at 4:08 PM, Darren Dale <dsdale24 at gmail.com> wrote:
> >> I have a question about creation of numpy arrays from a list of
> >> objects, which bears on the Quantities project and also on masked
> >> arrays:
> >>
> >>>>> import quantities as pq
> >>>>> import numpy as np
> >>>>> a, b = 12*pq.m,1*pq.s
> >>>>> np.array([a, b])
> >> array([ 12.,   1.])
> >>
> >> Why doesn't that create an object array? Similarly:
> >>
>

Consider the use case of a person creating a 1-D numpy array:
 > np.array([12.0, 1.0])
array([ 12.,  1.])

How is Python supposed to tell the difference between
 > np.array([a, b])
and
 > np.array([12.0, 1.0])
?

It can't, and there are plenty of times when one wants to explicitly
initialize a small numpy array with a few discrete variables.
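
If an object array really is what you want, one option is to ask for it
explicitly with dtype=object.  A minimal sketch, assuming plain floats as
stand-ins for the two quantities (I have not checked how Quantities itself
responds to this):

```python
import numpy as np

# Plain floats standing in for the two quantities (an assumption for
# illustration; the real objects would come from the quantities package).
a, b = 12.0, 1.0

# Default behaviour: NumPy infers a numeric dtype from the values.
numeric = np.array([a, b])

# Explicit request: keep each element as a Python object.
boxed = np.array([a, b], dtype=object)

print(numeric.dtype, numeric.shape)
print(boxed.dtype, boxed.shape)
```

Both arrays are 1-D with two elements; only the dtype differs.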



> >>>>> m = np.ma.array([1], mask=[True])
> >>>>> m
> >> masked_array(data = [--],
> >>             mask = [ True],
> >>       fill_value = 999999)
> >>
> >>>>> np.array([m])
> >> array([[1]])
> >>
>

Again, this is expected behavior.  NumPy saw a list containing an array,
so it produced a 2-D array.  Consider the following:

 > np.array([[12, 4, 1], [32, 51, 9]])

I, as a user, expect NumPy to create a 2-D array (2 rows, 3 columns) from
that list of lists.
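
To make the comparison concrete, here is a small sketch that checks the
shapes in both cases.  The variable names are mine, chosen only for
illustration:

```python
import numpy as np

# A list of two 3-element lists becomes a 2-D array.
nested = np.array([[12, 4, 1], [32, 51, 9]])

# A list holding one 1-element masked array is treated the same way:
# one "row" containing one value.
m = np.ma.array([1], mask=[True])
wrapped = np.array([m])

print(nested.shape)
print(wrapped.shape)
print(type(wrapped).__name__)
```

Note that the result of np.array([m]) is a plain ndarray, so the mask is
not carried along.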


> >> This has broader implications than just creating arrays, for example:
> >>
> >>>>> np.sum([m, m])
> >> 2
> >>>>> np.sum([a, b])
> >> 13.0
> >>
>

If you wanted a sum from each object, there are clearer ways to go about
it.  If you have a predetermined number of numpy-compatible objects, say
a, b, c, then you can explicitly call the sum for each one:
 > a_sum = np.sum(a)
 > b_sum = np.sum(b)
 > c_sum = np.sum(c)

Which I think communicates the programmer's intention better than (for a
numpy array, x, composed of a, b, c):
 > # As a numpy user, I would expect a scalar out of this, not an array
 > object_sums = np.sum(x)

If you have an arbitrary number of objects (which is what I suspect you
have), then one could easily produce an array of sums (for a list, x, of
numpy-compatible objects) like so:
 > object_sums = [np.sum(anObject) for anObject in x]

Performance-wise, I would expect it to be about as efficient as having
numpy somehow produce an array of sums from a single call to sum.
Readability-wise, it makes more sense because when you are treating objects
separately, a *list* of them is more intuitive than a numpy.array, which is
more-or-less treated as a single mathematical entity.
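
A short sketch of the difference, using two ordinary arrays as stand-ins
for your objects (the names and values are made up for the example):

```python
import numpy as np

# Two ordinary arrays standing in for the numpy-compatible objects.
a = np.array([1, 2, 3])
b = np.array([10, 20, 30])
x = [a, b]

# One call on the whole list: everything collapses to a single scalar.
total = np.sum(x)

# One call per object: a list with one sum for each object.
object_sums = [np.sum(an_object) for an_object in x]

print(total)
print(object_sums)
```

The single call treats the list as one 2-D array and adds up every
element, while the list comprehension keeps the objects separate.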

I hope that addresses your concerns.

Ben Root