[Numpy-discussion] defining a NumPy API standard?

Sat Jul 13 03:47:32 EDT 2019

This slide deck from Matthew Rocklin at SciPy 2019 might be relevant:
https://matthewrocklin.com/slides/scipy-2019#/

On Tue, Jun 4, 2019 at 12:06 AM Ralf Gommers <ralf.gommers at gmail.com> wrote:

>
>
> On Mon, Jun 3, 2019 at 7:56 PM Sebastian Berg <sebastian at sipsolutions.net>
> wrote:
>
>> On Sun, 2019-06-02 at 08:42 +0200, Ralf Gommers wrote:
>> >
>> >
>> <snip>
>> > > >
>> > >
>> > > This sounds like a restructuring or factorization of the API, in
>> > > order to make it smaller, and thus easier to learn and use.
>> > > It may start with the docs, by paying more attention to the "core"
>> > > or important functions and methods, and noting the deprecated, or
>> > > not frequently used, or not important functions. This could also
>> > > help the satellite projects, which use NumPy API as an example, and
>> > > may also be influenced by them and their decisions.
>> > >
>> >
>> >  Indeed. It will help restructure our docs. Perhaps not the reference
>> > guide (not sure yet), but definitely the user guide and other high-
>> > level docs we (or third parties) may want to create.
>> >
>>
>> Trying to follow the discussion, there seem to be various ideas? Do I
>> understand it right that the original proposal was much like doing a
>> list of:
>>
>>   * np.ndarray.cumprod: low importance -> prefer np.multiply.accumulate
>>   * np.ravel_multi_index: low importance, but distinct feature
>>
>
> Indeed. Certainly no more than that was my idea.
>
>
>> Maybe with added groups such as "transpose-like" and "reshape-like"
>> functions?
>> This would be based on 1. "Experience" and 2. usage statistics. This
>> seems mostly a task for 2-3 people to then throw out there for
>> discussion.
>> There will be some very difficult/impossible calls, since in the end
>> Nathaniel is right, we do not quite know the question we want to
>> answer. But for a huge part of the API it may not be problematic?
>>
>
> Agreed, won't be problematic.
>
>
>>
>> Then there is an idea of providing better mixins (and tests).
>> This could be made easier by the first idea, for prioritization.
>> Although, the first idea is probably not really necessary to kick this
>> off at all. The interesting parts to me seem likely how to best solve
>> testing of the mixins and numpy-api-duplicators in general.
>>
>> Implementing a growing set of mixins seems likely to be fairly
>> straightforward (although maybe much easier to approach if there is a
>> list from the first project)?
>
>
> Indeed. I think there are actually 3 levels here (at least):
> 1. function name: high/low importance or some such simple classification
> 2. function signature and behavior: is the behavior optimal, what would we
> change, etc.
> 3. making duck arrays and subclasses that rely on all those functions and
> their behavior easier to implement/use
>
> Mixins are a specific answer to (3). And it's unclear if they're the best
> answer (could be, I don't know - please don't start a discussion on that
> here). Either way, working on (3) will be helped by having a better sense
> of (1) and (2).
>
> Also think about effort: (2) is at least an order of magnitude more work
> than (1), and (3) likely even more work than (2).
>
>
>> And, once we have a start, maybe we can rely on the
>> array-like implementors to be the main developers (limiting us mostly
>> to review).
>>
>>
>> The last part would be probably for users and consumers of array-likes.
>> This largely overlaps, but comes closer to the problem of "standard".
>> If we have a list of functions that we tend to see as more or less
>> important, it may be interesting for downstream projects to restrict
>> themselves to it to simplify interoperability, e.g. with dask.
>>
>> Maybe we do not have to draw a strict line though? How plausible would
>> it be to set up a list (best auto-updating) saying nothing but:
>>
>> `np.concatenate` supported by: dask, jax, cupy
>>
>
> That's probably not that hard, and I agree it would be quite useful. The
> namespaces of those libraries are probably not the same, but with
> dir() and some strings and lists you'll get a long way here I think.
>
>
>>
>> I am not sure if this is helpful, but it feels to me that the first
>> part is what Ralf was thinking of? Just to kick off such a "living
>> document".
>
>
> Indeed.
>
>> I could maybe help with providing the second pair of eyes
>> for a first iteration there, Ralf.
>
>
> Awesome, thanks Sebastian.
>
> Cheers,
> Ralf
>
>
>> The last list I would actually find
>> interesting myself, but not sure how easy it would be to approach it?
>>
>> Best,
>>
>> Sebastian
>>
>>
>> > Ralf
>> > _______________________________________________
>> > NumPy-Discussion mailing list
>> > NumPy-Discussion at python.org
>> > https://mail.python.org/mailman/listinfo/numpy-discussion
>>
>
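
For what it's worth, the "supported by" list discussed above could be
sketched roughly as follows. This is only a minimal illustration of the
dir()-based idea, not anything that exists in NumPy: the helper names
`public_names` and `support_table` are made up here, and it assumes each
library mirrors the NumPy namespace at a single module path (for example
`dask.array`, `jax.numpy`, `cupy`). It compares names only and says nothing
about signatures or behavior. Libraries that are not installed are skipped.

```python
import importlib


def public_names(module_name):
    """Return the public callable names of a module, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return {
        name
        for name in dir(module)
        if not name.startswith("_") and callable(getattr(module, name, None))
    }


def support_table(reference, candidates):
    """Map each public callable of `reference` to the candidate labels exposing the same name.

    `candidates` maps a display label to a module path (hypothetical layout,
    e.g. {"dask": "dask.array"}). Candidates that fail to import are ignored.
    """
    reference_names = public_names(reference)
    if reference_names is None:
        raise ImportError(f"reference module {reference!r} is not importable")

    available = {}
    for label, module_name in candidates.items():
        names = public_names(module_name)
        if names is not None:
            available[label] = names

    return {
        name: [label for label, names in available.items() if name in names]
        for name in sorted(reference_names)
    }
```

With NumPy installed, something like
support_table("numpy", {"dask": "dask.array", "jax": "jax.numpy", "cupy": "cupy"})["concatenate"]
would give the installed libraries that expose a `concatenate` of their own.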

-- 
Mark Mikofski, PhD (2005)
*Fiat Lux*