[Python-ideas] Python multi-dimensional array constructor

Todd toddrjen at gmail.com
Wed Oct 19 20:32:54 EDT 2016


On Wed, Oct 19, 2016 at 7:48 PM, Chris Barker <chris.barker at noaa.gov> wrote:

> a few thoughts:
>
> On Wed, Oct 19, 2016 at 12:08 PM, Todd <toddrjen at gmail.com> wrote:
>
>> I have been thinking about how to go about having a multidimensional
>> array constructor in python.  I know that Python doesn't have a built-in
>> multidimensional array class and won't for the foreseeable future.
>>
>
> no but it does have buffers and memoryviews and the extended buffer
> protocol supports "strided" data -- i.e. multi-dimensional arrays. So it
> would be nice to have SOME simple ndarray object in the standard library
> that would wrap such buffers -- it would be nice for working with image
> data, interacting with numpy  arrays, etc.
>
> The "trick" is that once you have the container, you want some
> functionality -- so you add indexing and slicing -- natch. Then maybe some
> simple math? then.... eventually, you are trying to put all of numpy into
> the stdlib, and we already know we don't want to do that.
>
> Though I still think a simple container that only supports indexing and
> slicing would be lovely.
>
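
For reference, here is a minimal sketch of what the buffer protocol already offers today, assuming Python 3.5 or later (needed for tuple indexing on a memoryview).  The variable names are only illustrative:

```python
# Six bytes of flat storage, viewed as a 2x3 array through the buffer protocol.
data = bytearray(range(6))
view = memoryview(data).cast("B", shape=[2, 3])

print(view.shape)     # (2, 3)
print(view.tolist())  # [[0, 1, 2], [3, 4, 5]]
print(view[1, 2])     # 5

# The view shares memory with the underlying bytearray.
view[0, 0] = 9
print(data[0])        # 9
```

As far as I know, multi-dimensional slicing on a memoryview is not implemented, which is the gap a simple container would fill.
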
> That all being said:
>
>> a = [| 0, 1, 2 || 3, 4, 5 |]
>>
>
> I really don't see the advantage of that over:
>
> a = [[0, 1, 2],[3, 4, 5]]
>
> really I don't -- and I'm a heavy numpy user, so I write a lot of those!
>
> If there is a problem with the current options (and I'm not convinced
> there is) it's that it isn't a literal for a multidimensional array, but
> rather a literal for a bunch of nested lists -- the lists themselves are
> created, and so are all the "boxed" values in the array -- only to be
> pulled out and unboxed to be put in the array.
>
>
But as you said, that is not a multidimensional array.  We aren't comparing
"a = [| 0, 1, 2 || 3, 4, 5 |]" and "a = [[0, 1, 2],[3, 4, 5]]", we are
comparing "a = [| 0, 1, 2 || 3, 4, 5 |]" and "a = np.array([[0, 1, 2],[3,
4, 5]])".  That is a bigger difference.


> However, this is only for literals -- if your data are large, then they
> are not going to be in literals, but rather read from a file or something,
> so this is really not much of a limitation.
>

Even if your original data is large, I often need smaller arrays when
processing, for example for broadcasting or as arguments to processing
functions.


>
> However, if you really don't like it, then you can pass a string to
> a constructor function instead:
>
> a = arr_from_string(" | 0, 1, 2 || 3, 4, 5 | ")
>
> yeah, you need to type the extra quotes, but that's not much.
>

Then you need an even longer function call.  Again, that defeats the
purpose of having a literal, which is to make the syntax more concise.


>
> NOTE: I'm pretty sure numpy has something like this already, for folks
> that like the MATLAB style -- though I can't find it at the moment.
>

It is:

r_[[0, 1, 2], [3, 4, 5]]

But this uses indexing behind the scenes, meaning your data is created as
an index then needs to be converted to a list later.  This adds
considerable overhead.  I just tested it and it was somewhere around 20
times slower than "np.array()" in the test.


>
>> b = [| 0, 1, 2 |
>>      | 3, 4, 5 |]
>>
>
> b = [[ 0, 1, 2 ],
>      [ 3, 4, 5 ]]
>
>
>
No, this is the equivalent of:

b = np.array([[ 0, 1, 2 ],
              [ 3, 4, 5 ]])

The whole point of this is to avoid the "np.array" call.


>> You can also create a 2D row array by combining the two:
>>
>> a = [|| 0, 1, 2 ||]
>>
>
> a = [[ 0, 1, 2 ]] or is it: [[[ 0, 1, 2 ]]]
>
> (I can't tell, so maybe your syntax is not so clear???)
>


I am not clear on where the ambiguity lies.  Count the number of "|" symbols.
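
To make the counting rule concrete, here is a rough sketch.  The helper name literal_ndim is hypothetical, and it assumes the number of dimensions is simply the length of the longest run of "|" in the literal:

```python
import re

def literal_ndim(text):
    """Return the length of the longest run of '|' in the proposed literal.

    Hypothetical helper: assumes that length is the number of dimensions,
    and falls back to 1 when the text contains no bars at all.
    """
    runs = re.findall(r"\|+", text)
    return max((len(run) for run in runs), default=1)

print(literal_ndim("[|| 0, 1, 2 ||]"))           # 2
print(literal_ndim("[| 0, 1, 2 || 3, 4, 5 |]"))  # 2
print(literal_ndim("[||| 0, 1, 2 || 3, 4, 5 ||| 6, 7, 8 || 9, 10, 11 |||]"))  # 3
```
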


>
>
>> For higher dimensions, you can just put more lines together:
>>
>> a = [||| 0, 1, 2 || 3, 4, 5 ||| 6, 7, 8 || 9, 10, 11 |||]
>>
>> b = [||| 0, 1, 2
>>       || 3, 4, 5
>>      ||| 6, 7, 8
>>       || 9, 10, 11
>>      |||]
>>
>
> I have no idea what that means!
>


||| is the delimiter for the third dimension, || is the delimiter for the
second dimension.  It is like how newline is used as a delimiter for the
second dimension in CSV files.  So it is equivalent to:

b = np.array([[[0, 1, 2],
               [3, 4, 5]],
              [[6, 7, 8],
               [9, 10, 11]]])
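
The delimiter rule can also be sketched as a small string parser.  This is only an illustration under my own assumptions, not a proposed implementation: it borrows the arr_from_string name from the suggestion above, handles integers only, treats leading and trailing bars as optional, and returns nested lists (which could then be handed to np.array) rather than an array:

```python
import re

def _split_level(text, level):
    """Split text on runs of exactly `level` bars, recursing down to commas."""
    if level == 1:
        # Innermost dimension: comma-separated integers.
        return [int(token) for token in text.split(",")]
    # Match exactly `level` bars, not part of a longer run.
    pattern = r"(?<!\|)\|{%d}(?!\|)" % level
    return [_split_level(part, level - 1) for part in re.split(pattern, text)]

def arr_from_string(text):
    """Parse the proposed bar-delimited syntax into nested lists of ints."""
    # Assumption: the longest run of bars gives the number of dimensions.
    ndim = max((len(run) for run in re.findall(r"\|+", text)), default=1)
    # Leading and trailing bars are treated as optional decoration.
    body = text.strip().strip("|")
    return _split_level(body, ndim)

b = arr_from_string("""||| 0, 1, 2
                        || 3, 4, 5
                       ||| 6, 7, 8
                        || 9, 10, 11
                       |||""")
print(b)  # [[[0, 1, 2], [3, 4, 5]], [[6, 7, 8], [9, 10, 11]]]
```
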



>
>
>> At least in my opinion, this sort of approach really shines when making
>> higher-dimensional arrays.  These would all be equivalent (the | at the
>> beginning and end are just to make it easier to align indentation, they
>> aren't required):
>>
>> a = [|||| 48, 11, 141, 13, -60, -37, 58, -52, -29, 134
>>        || -6, 96, -66, 137, -59, -147, -118, -104, -123, -7
>>       ||| -103, 50, -89, -12,  28, -12, 119, -131, -73, 21
>>        || -58, 105, 25, -138, -106, -118, -29, -49, -63, -56
>>      |||| -43, -34, 101, -115, 41, 121, 3, -117, 101, -145
>>        || 100, -128, 76, 128, -113, -90, 52, -91, -72, -15
>>       ||| 22, -65, -118, 134, -58, 55, -73, -118, -53, -60
>>        || -85, -136, 83, -66, -35, -117, -71, 115, -56, 133
>>      ||||]
>>
>
> It does seem that you are saving some typing when you have high-dim
> arrays, but I really don't see the readability here.
>


If you are used to counting braces, perhaps.  But imagine someone who is
just starting out.  How do you describe how to determine what dimension is
being split?  "It is one more than the total number of sequential left
braces and left parentheses" vs "it is the number of vertical lines".  On
top of that, having to deal with both left and right braces rather than a
single delimiter adds a lot of visual noise.  There is a reason we use
commas rather than, say ">,<" as a delimiter in lists: it is easier to deal
with a single kind of symbol rather than three (or potentially five in the
current case).


>
>
> but anyway, the way to move this kind of thing forward is to use it as a
> new format in an existing lib (like numpy), by passing it as a big string.
> If folks like it and start using it, then there is room for a conversation.
>

The big problem with that is that having to wrap it as a string and pass it
to a function in the numpy namespace loses much of the advantage from
having a literal to begin with.


>
> But I doubt (and I wouldn't support) that anyone would put a literal into
> python for an object that doesn't exist in python...
>
>
Yes, I understand that.  But some projects are already doing that on their
own.  I think having a way for them to do it without losing the list
constructor (which is the approach currently being taken) would be a
benefit.