[Numpy-discussion] is __array_ufunc__ ready for prime-time?

josef.pktd at gmail.com josef.pktd at gmail.com
Thu Nov 2 12:52:29 EDT 2017


On Thu, Nov 2, 2017 at 12:23 PM, Ryan May <rmay31 at gmail.com> wrote:

> On Thu, Nov 2, 2017 at 6:46 AM, <josef.pktd at gmail.com> wrote:
>
>>
>>
>> On Wed, Nov 1, 2017 at 6:55 PM, Nathan Goldbaum <nathan12343 at gmail.com>
>> wrote:
>>
>>> I think the biggest issues could be resolved if __array_concatenate__
>>> were finished. Unfortunately I don't feel like I can take that on right now.
>>>
>>> See Ryan May's talk at scipy about using an ndarray subclass for units
>>> and the issues he's run into:
>>>
>>> https://www.youtube.com/watch?v=qCo9bkT9sow
>>>
>>
>>
>> Interesting talk, but I don't see how general library code should know
>> what units the output has.
>> for example if units are some flows per unit of time and we average, sum
>> or integrate over time, then what are the new units? (e.g. pandas time
>> aggregation)
>>
>
> A general library doesn't have to do anything--just not do annoying things
> like isinstance() checks and calling np.asarray() everywhere. Honestly one
> of those is the core of most of the problems I run into. It's currently
> more complicated when doing things in compiled land, but that's
> implementation details, not any kind of fundamental incompatibility.
>
> For basic mathematical operations, units have perfectly well defined
> semantics that many of us encountered in an introductory physics or
> chemistry class:
> - Want to add or subtract two things? They need to have the same units; a
> units library can handle conversion provided they have the same
> dimensionality (e.g. length, time)
> - Multiplication/Division: combine and cancel units (m/s * s -> m)
>
> Everything else we do on a computer with data in some way boils down to:
> add, subtract, multiply, divide.
>
> Average keeps the same units -- it's just a sum and division by a
> unit-less constant
> Integration (in 1-D) involves *two* variables, your data as well as the
> time/space coordinates (or dx or dt); fundamentally it's a multiplication
> by dx and a summation. The resulting units are then e.g. data.units *
> dx.units. This works just like it does in Physics 101 where you integrate
> velocity (i.e. m/s) over time (e.g. s) and get displacement (e.g. m)
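
A minimal sketch of the rules quoted above, assuming a toy `Quantity` class (hypothetical, not any real units library) that stores a float plus exponents of metre and second. It only shows that the mean keeps the input units and that summing `data * dt` gives `data.units * dt.units`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """Hypothetical toy quantity: a float plus exponents of (metre, second)."""
    value: float
    dims: tuple  # (length exponent, time exponent)

    def __add__(self, other):
        # Addition requires identical dimensions (no unit conversion here).
        if self.dims != other.dims:
            raise TypeError("cannot add quantities with different dimensions")
        return Quantity(self.value + other.value, self.dims)

    def __mul__(self, other):
        # A plain number is treated as unit-less.
        if not isinstance(other, Quantity):
            return Quantity(self.value * other, self.dims)
        dims = tuple(a + b for a, b in zip(self.dims, other.dims))
        return Quantity(self.value * other.value, dims)

    def __truediv__(self, other):
        if not isinstance(other, Quantity):
            return Quantity(self.value / other, self.dims)
        dims = tuple(a - b for a, b in zip(self.dims, other.dims))
        return Quantity(self.value / other.value, dims)


def total(quantities):
    """Sum a non-empty list of quantities using only Quantity.__add__."""
    result = quantities[0]
    for q in quantities[1:]:
        result = result + q
    return result


METRE_PER_SECOND = (1, -1)
SECOND = (0, 1)

velocity = [Quantity(v, METRE_PER_SECOND) for v in (2.0, 4.0, 6.0)]
dt = Quantity(0.5, SECOND)

# Average: a sum divided by a unit-less count, so the units stay m/s.
mean_velocity = total(velocity) / len(velocity)

# 1-D integration as a sum of data * dt: (m/s) * s -> m.
displacement = total([v * dt for v in velocity])
```

Here `mean_velocity.dims` stays `(1, -1)` and `displacement.dims` comes out as `(1, 0)`, i.e. plain metres.
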
>
>> What are units of covariance or correlation between two variables with the
>> same units, and what are they between variables with different units?
>>
>
> Well, covariance is subtracting the mean from each variable and
> multiplying the residuals; therefore the units for cov(x, y):
>
> (x.units - x.units) * (y.units - y.units) -> x.units * y.units
>
> Correlation takes covariance and divides by the product of the standard
> deviations, so that's:
>
> (x.units * y.units) / (x.units * y.units) -> dimensionless
>
> Which is what I'd expect for a correlation.
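
The same bookkeeping can be sketched on units alone. This assumes units are encoded as exponent tuples over (metre, second, kelvin); the helper names are hypothetical and only mirror the two formulas quoted above.

```python
from fractions import Fraction

# Units as exponent tuples over (metre, second, kelvin); a hypothetical encoding.
M_PER_S = (1, -1, 0)
KELVIN = (0, 0, 1)


def times(a, b):
    """Units of a product: exponents add."""
    return tuple(i + j for i, j in zip(a, b))


def over(a, b):
    """Units of a quotient: exponents subtract."""
    return tuple(i - j for i, j in zip(a, b))


def root(a):
    """Units of a square root: exponents are halved."""
    return tuple(Fraction(i, 2) for i in a)


def cov_units(x, y):
    # Subtracting the mean keeps each variable's units;
    # multiplying the residuals combines them.
    return times(x, y)


def std_units(x):
    # Standard deviation is the square root of the variance cov(x, x).
    return root(cov_units(x, x))


def corr_units(x, y):
    # Covariance divided by the product of the standard deviations.
    return over(cov_units(x, y), times(std_units(x), std_units(y)))


same = cov_units(M_PER_S, M_PER_S)           # m**2 / s**2
mixed = cov_units(M_PER_S, KELVIN)           # m * K / s
dimensionless = corr_units(M_PER_S, KELVIN)  # all exponents zero
```
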
>
>
>> How do you concatenate and operate arrays with different units?
>>
>
> If all arrays have compatible dimensionality (say meters, inches, miles),
> you convert to one (say the first) and concatenate like normal. If they're
> not compatible, you error out.
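
A rough sketch of that concatenation rule, with plain lists standing in for arrays. The `UNITS` table and the `concatenate` helper are assumptions made for illustration, not NumPy API or the API of any units package.

```python
# Hypothetical unit table: name -> (dimensionality, factor to the base unit).
UNITS = {
    "m": ("length", 1.0),
    "inch": ("length", 0.0254),
    "mile": ("length", 1609.344),
    "s": ("time", 1.0),
}


def concatenate(arrays):
    """Concatenate (values, unit) pairs, converting to the first pair's unit."""
    first_unit = arrays[0][1]
    first_dim, first_factor = UNITS[first_unit]
    out = []
    for values, unit in arrays:
        dim, factor = UNITS[unit]
        if dim != first_dim:
            # Incompatible dimensionality: error out instead of guessing.
            raise ValueError(f"cannot concatenate {unit!r} with {first_unit!r}")
        scale = factor / first_factor
        out.extend(v * scale for v in values)
    return out, first_unit


lengths, unit = concatenate([([1.0, 2.0], "m"), ([100.0], "inch")])
```

With these inputs the result is expressed in metres, and passing a `"s"` array alongside an `"m"` array raises `ValueError`.
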
>
>
>> interpolation or prediction would work with using the existing units.
>>
>
> I'm sure you wrote that thinking units didn't play a role, but the math
> behind those operations works perfectly fine with units, with things
> cancelling out properly to give the same units out as in.
>

Some of it is in my reply to Marten.

Regression and polyfit require an X matrix whose columns have different
units, and then some linear algebra like solve, pinv or svd.

So, while the predicted values have well-defined units, the computation
involves some messier operations, unless you want to forgo linear algebra
in all intermediate steps and reduce it to sum, division and inverse.
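
To make that concrete, here is a small sketch for a degree-2 polyfit, assuming units are tracked as (exponent of x's unit, exponent of y's unit). The names are made up for illustration. Every entry of X and X'X has well-defined units, but they differ from entry to entry, so one unit attached to the whole array cannot describe the intermediate matrices.

```python
DEGREE = 2
Y = (0, 1)  # units of y, as (exponent of x-unit, exponent of y-unit)


def times(a, b):
    """Units of a product: exponents add."""
    return (a[0] + b[0], a[1] + b[1])


# Columns of the design matrix X are 1, x, x**2.
column_units = [(j, 0) for j in range(DEGREE + 1)]

# Coefficient j must carry y / x**j so that every term of the fit has units of y.
coef_units = [(-j, 1) for j in range(DEGREE + 1)]

# Entry (i, j) of X'X is a sum of x**i * x**j; entry i of X'y is a sum of x**i * y.
gram_units = [[times(ci, cj) for cj in column_units] for ci in column_units]
rhs_units = [times(ci, Y) for ci in column_units]

# Dimensional check of the normal equations (X'X) beta = X'y, term by term.
normal_equations_consistent = all(
    times(gram_units[i][j], coef_units[j]) == rhs_units[i]
    for i in range(DEGREE + 1)
    for j in range(DEGREE + 1)
)

# Units of each term in the prediction, and the distinct units inside X'X.
prediction_units = {times(c, b) for c, b in zip(column_units, coef_units)}
distinct_gram_units = {u for row in gram_units for u in row}
```

The prediction comes out in the units of y alone, while X'X holds five different units (x**0 through x**4), which is the part that does not fit a single-unit array passed to solve, pinv or svd.
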

Josef


>
> Ryan
>
> --
> Ryan May
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
>