[Numpy-discussion] improving arange()? introducing fma()?

Benjamin Root ben.v.root at gmail.com
Fri Feb 9 15:43:45 EST 2018


On Fri, Feb 9, 2018 at 12:19 PM, Chris Barker <chris.barker at noaa.gov> wrote:

> On Wed, Feb 7, 2018 at 12:09 AM, Ralf Gommers <ralf.gommers at gmail.com>
> wrote:
>>
>>  It is partly a plea for some development of numerically accurate
>>> functions for computing lat/lon grids from a combination of inputs: bounds,
>>> counts, and resolutions.
>>>
>>
> Can you be more specific about what problems you've run into -- I work
> with lat-lon grids all the time, and have never had a problem.
>
> float32 degrees gives you about 1 meter accuracy or better, so I can see
> how losing a few digits might be an issue, though I would argue that you
> maybe shouldn't use float32 if you are worried about anything close to 1m
> accuracy... -- or shift to a relative coordinate system of some sort.
>

The issue isn't so much the accuracy of the coordinates themselves. I am
only worried about 1km resolution (which is approximately 0.01 degrees at
mid-latitudes). My concern is with consistent *construction* of a
coordinate grid with even spacing. As it stands right now, if I provide a
corner coordinate, a resolution, and the number of pixels, the result is
not terrible (indeed, this is the approach used by gdal/rasterio). If I
have start/end coordinates and the number of pixels, the result is not bad,
either (use linspace). But, if I have start/end coordinates and a
resolution, then determining the number of pixels from that is actually
tricky to get right in the general case, especially with float32 and large
grids, and especially if the bounding box specified isn't exactly divisible
by the resolution.
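
To make the "tricky to get right" bit concrete, here is a minimal sketch
of the failure mode I keep hitting (the variable names are mine, purely
for illustration):

```
import numpy as np

# Grid spec arriving as float32, as it often does from data files.
start = np.float32(-115.0)
stop = np.float32(-44.99)
res = np.float32(0.01)

# Naive: truncate the ratio. With these inputs the float32 ratio lands a
# hair under 7001, so int() silently drops a pixel.
n_naive = int((stop - start) / res) + 1

# Rounding is more forgiving, but it quietly assumes the bounds really
# are an integer number of steps apart.
n_round = int(round((stop - start) / res)) + 1

print(n_naive, n_round)  # 7001 vs. 7002 here
```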


>
> I have been playing around with the decimal package a bit lately,
>>>
>>
> sigh. decimal is so often looked at as a solution to a problem it isn't
> designed for. lat-lon is natively sexagesimal -- maybe we need that dtype
> :-)
>
> what you get from decimal is variable precision -- maybe a binary variable
> precision lib is a better answer -- that would be a good thing to have easy
> access to in numpy, but in this case, if you want better accuracy in a
> computation that will end up in float32, just use float64.
>

I am not concerned about computing distances or anything like that, I am
trying to properly construct my grid. I need consistent results regardless
of which way the grid is specified (start/end/count, start/res/count,
start/end/res). I have found that loading up the grid specs (whether from a
config file or the command line) using the Decimal class allows me to exactly
and consistently represent the grid specification, and gets me most of the
way there. But the problems with arange() are frustrating, and I have to
add extra logic to route around it and over to linspace() instead.
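
Roughly, what I do today looks something like this sketch (the spec values
are just examples):

```
from decimal import Decimal
import numpy as np

# Grid spec as it arrives from a config file or the command line: strings,
# which Decimal represents exactly.
x0, x1, res = Decimal("-115.0"), Decimal("-44.99"), Decimal("0.01")

# Exact decimal arithmetic: the number of gaps comes out as exactly 7001.
n_gaps = (x1 - x0) / res
assert n_gaps == n_gaps.to_integral_value()

# ...but to actually build the coordinates I have to leave Decimal behind
# and reach for linspace() rather than arange().
x = np.linspace(float(x0), float(x1), int(n_gaps) + 1, dtype=np.float32)
```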


>
> and I discovered the concept of "fused multiply-add" operations for
>>> improved accuracy. I have come to realize that fma operations could be used
>>> to greatly improve the accuracy of linspace() and arange().
>>>
>>
> arange() is problematic for non-integer use anyway, by its very definition
> (getting the "end point" correct requires the right step, even without FP
> error).
>
> and would it really help with linspace? it's computing a delta with one
> division in fp, then multiplying it by an integer (represented in fp --
> why? why not keep that an integer till the multiply?).
>

Sorry, that was a left-over from a previous draft of my email after I
discovered that linspace's accuracy was on par with fma(). And while
arange() has inherent problems, it can still be made better than it is now.
In fact, I haven't investigated this thoroughly, but I did recently discover
that some unit tests of mine started to fail after a numpy upgrade, and traced
it back to a reduction in the accuracy of arange() with float32s. So,
something got worse at some point, which means we could still get that
accuracy back if we can figure out what changed.


>
> In particular, I have been needing improved results for computing
>>> latitude/longitude grids, which tend to be done in float32's to save memory
>>> (at least, this is true in data I come across).
>>>
>>
>> If you care about saving memory *and* accuracy, wouldn't it make more
>> sense to do your computations in float64, and convert to float32 at the
>> end?
>>
>
> that does seem to be the easy option :-)
>

Kinda missing the point, isn't it? Isn't that like saying "convert all your
data to float64s prior to calling np.mean()"? That's ridiculous. Instead,
we made np.mean() upcast the inner-loop operation, and even allow an option
to specify the dtype that should be used for the accumulator.
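
For reference, a small illustration of the np.mean() behavior I am pointing
at (just the dtype argument, nothing exotic):

```
import numpy as np

a = np.arange(1_000_000, dtype=np.float32)

# Default accumulation (for float16 input the inner loop is upcast to
# float32 automatically)...
m_default = a.mean()

# ...and you can request a wider accumulator explicitly, without ever
# converting the data itself to float64.
m_wide = a.mean(dtype=np.float64)

print(m_default, m_wide)
```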


>
>
>> Now, to the crux of my problem. It is next to impossible to generate a
>>> non-trivial numpy array of coordinates, even in double precision, without
>>> hitting significant numerical errors.
>>>
>>
> I'm confused, the example you posted doesn't have significant errors...
>

Hmm, "errors" was the wrong word. "Differences between methods" might be
more along the lines of what I was thinking. Remember, I am looking for
consistency.


>
>
>> Which has lead me down the path of using the decimal package (which
>>> doesn't play very nicely with numpy because of the lack of casting rules
>>> for it). Consider the following:
>>> ```
>>> $ cat test_fma.py
>>> from __future__ import print_function
>>> import numpy as np
>>> res = np.float32(0.01)
>>> cnt = 7001
>>> x0 = np.float32(-115.0)
>>> x1 = res * cnt + x0
>>> print("res * cnt + x0 = %.16f" % x1)
>>> x = np.arange(-115.0, -44.99 + (res / 2), 0.01, dtype='float32')
>>> print("len(arange()): %d  arange()[-1]: %16f" % (len(x), x[-1]))
>>> x = np.linspace(-115.0, -44.99, cnt, dtype='float32')
>>> print("linspace()[-1]: %.16f" % x[-1])
>>>
>>> $ python test_fma.py
>>> res * cnt + x0 = -44.9900015648454428
>>> len(arange()): 7002  arange()[-1]:       -44.975044
>>> linspace()[-1]: -44.9900016784667969
>>> ```
>>> arange just produces silly results (puts out an extra element... adding
>>> half of the resolution is typically mentioned as a solution on mailing
>>> lists to get around arange()'s limitations -- I personally don't do this).
>>>
>>
> The real solution is "don't do that" arange is not the right tool for the
> job.
>

Well, it isn't the right tool because as far as I am concerned, it is
useless for anything but integers. Why not fix it to be more suitable for
floating point?


>
> Then there is this:
>
> res * cnt + x0 = -44.9900015648454428
> linspace()[-1]: -44.9900016784667969
>
> that's as good as you are ever going to get with 32 bit floats...
>

Consistency is the key thing. I am fine with one of those values, so long
as that value is what happens no matter which way I specify my grid.


>
> Though I just noticed something about your numbers -- there should be a
> nice even base ten delta if you have 7001 gaps -- but linspace produces N
> points, not N gaps -- so maybe you want:
>
>
> In [17]: l = np.linspace(-115.0, -44.99, 7002)
>
> In [18]: l[:5]
> Out[18]: array([-115.  , -114.99, -114.98, -114.97, -114.96])
>
> In [19]: l[-5:]
> Out[19]: array([-45.03, -45.02, -45.01, -45.  , -44.99])
>
> or, in float32 -- not as pretty:
>
> In [20]: l = np.linspace(-115.0, -44.99, 7002, dtype=np.float32)
>
> In [21]: l[:5]
> Out[21]:
> array([-115.        , -114.98999786, -114.98000336, -114.97000122,
>        -114.95999908], dtype=float32)
>
> In [22]: l[-5:]
> Out[22]: array([-45.02999878, -45.02000046, -45.00999832, -45.        ,
> -44.99000168], dtype=float32)
>
> but still as good as you get with float32, and exactly the same result as
> computing in float64 and converting:
>
> In [25]: l = np.linspace(-115.0, -44.99, 7002).astype(np.float32)
>
> In [26]: l[:5]
> Out[26]:
> array([-115.        , -114.98999786, -114.98000336, -114.97000122,
>        -114.95999908], dtype=float32)
>
> In [27]: l[-5:]
> Out[27]: array([-45.02999878, -45.02000046, -45.00999832, -45.        ,
> -44.99000168], dtype=float32)
>

Argh! I got myself mixed up between specifying pixel corners versus pixel
centers. rasterio has been messing me up on this.


>
>
>>> So, does it make any sense to improve arange by utilizing fma() under
>>> the hood?
>>>
>>
> no -- this is simply not the right use-case for arange() anyway.
>

arange() has accuracy problems, so why not fix it?

>>> l4 = np.arange(-115, -44.99, 0.01, dtype=np.float32)
>>> np.median(np.diff(l4))
0.0099945068
>>> np.float32(0.01)
0.0099999998

There is something significantly wrong here if arange(), which takes a
resolution parameter, can't seem to produce a sequence with the proper
delta.
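
To be concrete about what I am imagining under the hood: compute each
element as a single multiply-add, start + i*step, with one rounding. Since
numpy doesn't expose fma, this sketch fakes it for float32 targets by doing
the arithmetic in float64 and rounding once; not a true fused operation,
but close enough to show the effect:

```
import numpy as np

def fma32_like(a, b, c):
    # Stand-in for fma(a, b, c) with a float32 target: do the multiply-add
    # in float64 and round once.  Not a real fused operation, but close.
    return np.float32(np.float64(a) * np.float64(b) + np.float64(c))

start, step = np.float32(-115.0), np.float32(0.01)

l4 = np.arange(-115, -44.99, 0.01, dtype=np.float32)
l5 = np.array([fma32_like(i, step, start) for i in range(len(l4))],
              dtype=np.float32)

# Compare both against a float64 reference grid.
ideal = -115.0 + np.arange(len(l4)) * 0.01
print(np.abs(l4 - ideal).max())  # arange: the error builds up along the array
print(np.abs(l5 - ideal).max())  # per-element fma-style: within an ulp or so
```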



>
>
>> Also, any plans for making fma() available as a ufunc?
>>>
>>
> could be nice -- especially if used internally.
>
>
>> Notice that most of my examples required knowing the number of grid
>>> points ahead of time. But what if I didn't know that? What if I just have
>>> the bounds and the resolution? Then arange() is the natural fit, but as I
>>> showed, its accuracy is lacking, and you have to do some sort of hack to do
>>> a closed interval.
>>>
>>
> no -- it's not -- if you have the bounds and the resolution, you have an
> over-specified problem. That is:
>
> x_min + (n * delta_x) == x_max
>
> If there is ANY error in either delta_x or x_max (or x_min), then you'll
> get a mismatch, which is why arange is not the answer (you can make the
> algorithm a bit more accurate, I suppose, but there is still fp-limited
> precision -- if you can't exactly represent either delta_x or x_max, then
> you CAN'T use the arange() definition and expect it to work consistently).
>
> The "right" way to do it is to compute N with: round((x_max - x_min) /
> delta), and then use linspace:
>
> linspace(x_min, x_max, N+1)
>
> (note that it's too bad you need to do N+1 -- if I had to do it over
> again, I'd use N as the number of "gaps" rather than the number of points
> -- that's more commonly what people want, if they care at all)
>
> This way, you get a grid with the endpoints as exact as they can be, and
> the deltas as close to each-other as they can be as well.
>
> maybe you can do a better algorithm in linspace to save an ULP, but it's
> hard to imagine when that would matter.
>

Yes, it is overspecified. My problem is that different tools require
different specs (ahem... rasterio/gdal), and I have grid specs coming from
other sources. And I need to produce data on the same grid so that tools
like xarray won't yell at me when I am trying to do an operation between
gridded datasets that should have the same coordinates but are off slightly
because they were computed differently for whatever reason.

I guess I am crying out for some sort of tool that will help the community
stop making the same mistakes. A one-stop shop that'll allow us to specify
a grid in a few different ways and still produce the right thing, and even
do the inverse... provide a coordinate array and get grid specs back in
whatever form we want. Maybe even have options for dealing with pixel
corners vs. pixel centers, too? There are additional fun problems such as
padding out coordinate arrays, which np.pad doesn't really do a great job
with.
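
Something like this strawman is what I have in mind (the name and API are
entirely hypothetical):

```
import numpy as np

def make_grid(start=None, stop=None, count=None, res=None, dtype=np.float32):
    # Hypothetical one-stop grid builder: accept any consistent subset of
    # (start, stop, count, res) and always build the coordinates the same
    # way, via linspace, so every spec path lands on the same grid.
    if count is None:
        count = int(round((stop - start) / res)) + 1
    if stop is None:
        stop = start + (count - 1) * res
    if start is None:
        start = stop - (count - 1) * res
    return np.linspace(start, stop, count, dtype=dtype)

a = make_grid(start=-115.0, stop=-44.99, res=0.01)
b = make_grid(start=-115.0, res=0.01, count=7002)
c = make_grid(start=-115.0, stop=-44.99, count=7002)
print(np.array_equal(a, b), np.array_equal(a, c))  # ideally True, True
```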

Cheers!
Ben Root