[Numpy-discussion] Feature request: Alternative representation for arrays with many dimensions

Fang Zhang fangzh at umich.edu
Wed Dec 9 17:22:37 EST 2020


By default, the __repr__ and __str__ methods of NumPy arrays summarize
long arrays (i.e. they omit all but a few items at the beginning and end of
each dimension). This is a good thing: when debugging, programmers can call
print() on arrays with millions of elements without clogging the output or
using too much CPU time and memory (unsurprisingly, the string
representation of an array item usually takes more bytes than its binary
representation).
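A quick way to see the summarization at work is to compare the length of the printed representation with the number of elements. The sketch below uses only `repr`, `np.get_printoptions` and the `np.printoptions` context manager; the array sizes are arbitrary choices for illustration, and it assumes a fresh interpreter with default print options.

```python
import numpy as np

# Default print options: summarize when size > threshold, keeping
# `edgeitems` items at each end of a long dimension.
opts = np.get_printoptions()
print(opts["threshold"], opts["edgeitems"])  # NumPy defaults: 1000 and 3

long_1d = np.arange(1_000_000)
text = repr(long_1d)
# One million elements still produce a one-line representation.
print(len(text), "characters for", long_1d.size, "elements")
print(text)

# The same knobs can be changed temporarily with the printoptions context manager.
with np.printoptions(threshold=10, edgeitems=2):
    print(repr(np.arange(20)))
```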

However, this mechanism does not help when an array has many short
dimensions, e.g. np.arange(2 ** 20).reshape((2,) * 20). Because
summarization only shortens dimensions that are longer than a few items,
every one of its 1,048,576 items is still printed. I often encounter such
arrays in my work, and every once in a while I try to print one without
flattening it first (usually because I don't know the shape, or even the
type, of the variable I am printing). This has caused incidents ranging
from losing everything in my scrollback buffer to crashing my computer by
exhausting its memory.
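The failure mode can be reproduced safely on a smaller array. The sketch below uses 12 axes of length 2 (4096 elements) instead of 20, so that the full output stays manageable; it relies on NumPy's default rule that an axis is elided only when it is longer than 2 * edgeitems.

```python
import numpy as np

# 12 axes of length 2: 4096 elements, well above the default threshold of 1000.
many_short_dims = np.arange(2 ** 12).reshape((2,) * 12)
text = repr(many_short_dims)

# Summarization is switched on, but no axis is longer than 2 * edgeitems
# (6 by default), so nothing is omitted and every element is printed.
print("..." in text)
print(len(text))  # grows with the full element count, not with edgeitems

# The same number of elements in a single dimension is summarized as expected.
print("..." in repr(np.arange(2 ** 12)))
```

Scaling the same shape up to 20 axes multiplies the output by 256, which is how a single print() call can flood a terminal.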

I think it may be a good idea to change the way NumPy pretty-prints arrays
with such shapes to avoid this situation. Something like "array([ 0, 1, 2,
..., 1048573, 1048574, 1048575]).reshape(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2)" would be good enough for me. This
representation could be triggered either by a fixed number of dimensions,
or whenever the pretty printer would still print more items than the
threshold (1000 by default) after summarizing. Since the outputs of
__repr__ and __str__ are meant for human eyes rather than computers, I
think this should not cause much of a compatibility problem.
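Until something like this exists in NumPy itself, the proposed rule can be approximated on the user side. The sketch below is hypothetical: `summarized_item_count`, `compact_repr` and the `max_ndim` parameter are illustrative names, not NumPy API. It assumes NumPy's default summarization behaviour (summarize when size exceeds `threshold`, and elide an axis only when it is longer than 2 * `edgeitems`) and implements both trigger conditions described above.

```python
from typing import Optional

import numpy as np


def summarized_item_count(a: np.ndarray) -> int:
    """Hypothetical helper: how many items the default printer would show for `a`."""
    opts = np.get_printoptions()
    if a.size <= opts["threshold"]:
        return a.size  # no summarization: every item is printed
    edge = opts["edgeitems"]
    count = 1
    for n in a.shape:
        # An axis is only shortened when it is longer than 2 * edgeitems.
        count *= min(n, 2 * edge)
    return count


def compact_repr(a: np.ndarray, max_ndim: Optional[int] = None) -> str:
    """Hypothetical helper: print a flattened summary plus a .reshape(...) suffix.

    Falls back to the normal repr unless the array has more than `max_ndim`
    dimensions or its summarized output would still exceed the threshold.
    """
    threshold = np.get_printoptions()["threshold"]
    too_many_dims = max_ndim is not None and a.ndim > max_ndim
    if a.ndim > 1 and (too_many_dims or summarized_item_count(a) > threshold):
        # Note: ravel() returns a view for contiguous arrays but may copy otherwise.
        return f"{repr(a.ravel())}.reshape{a.shape}"
    return repr(a)


if __name__ == "__main__":
    big = np.arange(2 ** 20).reshape((2,) * 20)
    print(compact_repr(big))  # one short line instead of ~1M printed items
```

Because `repr` of the flattened array is itself valid Python, the combined string stays evaluable for small arrays. One caveat of this user-side approach is that `ravel()` may copy a non-contiguous array; a version inside NumPy's printer could format only the edge items instead.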

What do you all think?

Sincerely,
Fang Zhang

