[Numpy-discussion] Feature request: Alternative representation for arrays with many dimensions

Stephan Hoyer shoyer at gmail.com
Wed Dec 9 18:13:17 EST 2020


On Wed, Dec 9, 2020 at 2:24 PM Fang Zhang <fangzh at umich.edu> wrote:

> By default, the __repr__ and __str__ functions of NumPy arrays summarize
> long arrays (i.e. omit all items but a few at beginning and end of each
> dimension), which is a good thing because when debugging, programmers can
> call print() on arrays with millions of elements without clogging the
> output or taking up too much CPU/memory (unsurprisingly, the string
> representation of an array item usually takes more bytes than its binary
> representation).
>
> However, this mechanic does not help when an array has a lot of short
> dimensions, e.g. np.arange(2 ** 20).reshape((2,) * 20). I often encounter
> such arrays in my work, and every once in a while I would try to print such
> an array without flattening it first (usually because I didn't know what
> shape or even what type the variable I was trying to print is), which has
> caused incidents ranging from losing everything in my scrollback buffer to
> crashing my computer by using too much memory.
>
> I think it may be a good idea to change the way NumPy pretty prints arrays
> with such shapes to avoid this situation. Something like "array([ 0, 1, 2,
> ..., 1048573, 1048574, 1048575]).reshape(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
> 2, 2, 2, 2, 2, 2, 2, 2, 2)" would be good enough for me. The condition to
> trigger such a representation can either be a fixed number of dimensions,
> or when after summarizing the pretty printer would still print more items
> than the threshold (1000 by default). Since the outputs of __repr__ and
> __str__ are meant for human eyes rather than computers, I think this should
> not cause too much of a compatibility problem.
>

+1, this could use improvement. For high-dimensional arrays, the way NumPy
prints is way too verbose.
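The following is a minimal sketch of the flat-plus-reshape fallback proposed above. The helper name flat_reshape_repr is hypothetical and is not an existing NumPy function. It relies only on np.get_printoptions() and on the "threshold" and "edgeitems" options. It also assumes that once summarization is active, an axis is abbreviated only when it is longer than 2 * edgeitems, in which case 2 * edgeitems entries remain.

```python
import math

import numpy as np


def flat_reshape_repr(a, max_items=None):
    """Hypothetical helper: print a flat summary plus .reshape(...) when
    the usual summarized repr would still show more than max_items items."""
    a = np.asarray(a)
    opts = np.get_printoptions()
    limit = opts["threshold"] if max_items is None else max_items
    edge = opts["edgeitems"]
    if a.ndim <= 1:
        return repr(a)
    if a.size > opts["threshold"]:
        # Summarization is active: each axis keeps at most 2 * edgeitems
        # entries (assumed to match NumPy's current printing behavior).
        shown = math.prod(min(n, 2 * edge) for n in a.shape)
    else:
        shown = a.size
    if shown <= limit:
        return repr(a)
    shape = ", ".join(str(n) for n in a.shape)
    return f"{repr(a.ravel())}.reshape({shape})"
```

With default print options, flat_reshape_repr(np.arange(2 ** 20).reshape((2,) * 20)) gives a single short line ending in ".reshape(2, 2, ..., 2)". The usual repr would print all 2 ** 20 items. Arrays that already summarize well are returned unchanged.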

In xarray, when printing NumPy arrays we automatically reduce "edgeitems" to 2
for ndim=3 and to 1 for ndim>3:
https://github.com/pydata/xarray/blob/9802411b35291a6149d850e8e573cde71a93bfbf/xarray/core/formatting.py#L439-L453
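The heuristic can be sketched as follows. This is not xarray's actual code (see the link above for that). The helper name short_repr is hypothetical, and the sketch uses only the np.printoptions context manager (NumPy >= 1.15).

```python
import numpy as np


def short_repr(array):
    """Hypothetical helper: use fewer edge items as ndim grows
    (3 by default, 2 for ndim == 3, 1 for ndim > 3)."""
    array = np.asarray(array)
    if array.ndim < 3:
        edgeitems = 3
    elif array.ndim == 3:
        edgeitems = 2
    else:
        edgeitems = 1
    # np.printoptions is a context manager, so the global print options
    # are restored as soon as the repr has been produced.
    with np.printoptions(edgeitems=edgeitems):
        return repr(array)
```

For example, np.arange(4 ** 6).reshape((4,) * 6) prints all 4096 entries by default, because no length-4 axis is longer than 2 * 3. Under short_repr, every axis is abbreviated to its first and last entry. One caveat applies, assuming an axis is only abbreviated when it is longer than 2 * edgeitems: length-2 axes are never shortened. So on its own, this does not help with np.arange(2 ** 20).reshape((2,) * 20).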

As a last resort, we could consider automatically limiting the maximum
number of displayed lines, adding "..." for clipped lines. It is unlikely,
for example, that anyone ever wants to print more than ~100 lines of text
to the screen, which can easily happen for very high-dimensional arrays.
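Here is a minimal sketch of that line-clipping fallback. The helper name clip_lines is hypothetical, and its default of 100 lines comes from the rough figure above. It keeps the first and last halves of the allowed lines and puts a "..." line between them. It clips an already-built string, so it protects the scrollback buffer but not CPU or memory. A real implementation inside the printer would need to stop generating lines early.

```python
def clip_lines(text, max_lines=100):
    """Hypothetical helper: keep at most max_lines lines of text, replacing
    the middle with a single "..." line when clipping is needed."""
    lines = text.splitlines()
    if len(lines) <= max_lines:
        return text
    head = max_lines // 2
    tail = max_lines - head
    # Slice from len(lines) - tail so that tail == 0 keeps nothing
    # (lines[-0:] would keep everything).
    return "\n".join(lines[:head] + ["..."] + lines[len(lines) - tail:])
```

Usage: clip_lines(repr(arr)) for any array arr whose repr runs to thousands of lines.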


> What do you all think?
>
> Sincerely,
> Fang Zhang

