[Numpy-discussion] Proposal to support __format__

Stephan Hoyer shoyer at gmail.com
Tue Feb 14 20:55:21 EST 2017


On Tue, Feb 14, 2017 at 5:35 PM, Gustav Larsson <larsson at cs.uchicago.edu>
wrote:

>> 1. For object arrays, I would default to calling format on each element
>> (your "map principle") rather than raising an error.
>>
>
> I'm glad you brought this up as a possibility. It might be possible, but
> there are some issues that would need to be resolved. First of all, {} and
> {:} always work and give the same result they currently do. So, this only
> affects the situation where the format spec is non-empty. I think there are
> two main issues:
>
> Heterogeneity: Let's say we have x = np.array([12.3, True, 'string',
> Foo(10)], dtype=np.object). Then, presumably {:.1f} should cause a
> ValueError since the string does not support format type 'f'. This could
> create a lot of ValueError land mines for the user.
>

Things will absolutely break if you try to do complex operations on
heterogeneously typed arrays. I would put the onus on the user in such a
case.
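
To make the failure mode concrete, here is a rough sketch of what the
"map principle" would do on such an array. The helper format_elementwise
is hypothetical (it is not a NumPy function) and only imitates calling
format on each element; Foo(10) from the example is left out to keep it
self-contained.

```python
import numpy as np


def format_elementwise(arr, spec):
    """Hypothetical helper: call format(elem, spec) on every element."""
    return [format(elem, spec) for elem in arr.ravel()]


x = np.array([12.3, True, "string"], dtype=object)

# float and bool both accept the 'f' presentation type.
print(format_elementwise(x[:2], ".1f"))  # ['12.3', '1.0']

# str does not, so a single string element makes the whole call fail.
try:
    format_elementwise(x, ".1f")
except ValueError as exc:
    print("ValueError:", exc)
```

The error only shows up once a non-numeric element is present, which is
the "land mine" being described; catching it is left to the caller.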


> For x[:2] however it should work and produce something like [12.3  1.0].
> Note, the "map principle" still can't be strictly true. Let's say we have
> an array with type object and mostly string-like elements. Then {:5s} will
> still not produce exactly {:5s} element-wise, because the string
> representations need to be repr-based inside the array (otherwise it could
> break for newlines and things like that and produce spaces that make the
> boundary between elements ambiguous). This brings me to the next issue.
>

Indeed, this will be a departure from the behavior without a format string,
which just uses repr. In my mind, this is the strongest argument against
using the map principle here, because there is a discontinuous shift
between providing and not providing a format string.
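
A small illustration of that shift, in plain Python. The bracketed,
space-separated layout below only imitates how an object array is
printed; it is an assumption for illustration, not NumPy's actual
printing code.

```python
items = ["a b", "c"]

# Without a format spec, elements are shown via repr, so the quotes
# mark where each element starts and ends.
via_repr = "[" + " ".join(repr(s) for s in items) + "]"

# With the map principle, format() on a str pads but does not quote,
# so spaces inside an element look the same as separators.
via_format = "[" + " ".join(format(s, "5s") for s in items) + "]"

print(via_repr)    # ['a b' 'c']
print(via_format)  # [a b   c    ]
```

In the second line a reader cannot tell whether the array holds two
elements or three.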


> Str vs. repr: If we have a homogeneous object-array with elements of type
> Foo, and Foo implements __format__, it would be great if this worked. However, one issue
> is that Foo.__format__ might return things like newline (or spaces), which
> would break (or confuse) the printed output (unless it is made incredibly
> smart to support "vertical alignment"). This issue is essentially the same
> as for strings in general, which is why they use repr instead. I can think
> of two solutions: 1) Try to sanitize (or repr-ify) the string returned by
> __format__ somehow; 2) Put the responsibility on the user and simply let
> the rendering break if Foo.__format__ does not play well.
>

I wouldn't do anything fancy here to worry about line breaks. It's
basically impossible to get this right for edge cases, so I would certainly
put the responsibility on the user.
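
For reference, a sketch of the two options from the quoted text. The
class Foo here is a stand-in invented for illustration, and the
bracketed join again only imitates array printing.

```python
class Foo:
    """Hypothetical element type whose __format__ emits a newline."""

    def __init__(self, value):
        self.value = value

    def __format__(self, spec):
        return "Foo:\n" + format(self.value, spec)


elems = [Foo(10), Foo(20)]

# Option 2: pass the result of __format__ through untouched.
# The embedded newlines break the one-line layout.
raw = "[" + " ".join(format(e, "d") for e in elems) + "]"

# Option 1: repr-ify each formatted string so newlines are escaped.
sanitized = "[" + " ".join(repr(format(e, "d")) for e in elems) + "]"

print(raw)
print(sanitized)  # ['Foo:\n10' 'Foo:\n20']
```

Option 1 keeps the output on one line but adds quotes and escapes that
Foo's author never asked for, which is part of why I would not attempt
it.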

On another note, about Python 2 vs 3: I would definitely take the approach
of copying the Python 3 behavior on all versions of NumPy (when feasible)
and not being too concerned about compatibility with format on Python 2.
The future is Python 3.