[Numpy-discussion] improving arange()? introducing fma()?

Thu Feb 22 14:57:17 EST 2018

On Thu, 2018-02-22 at 14:33 -0500, Benjamin Root wrote:
> Sorry, I have been distracted with xarray improvements the past
> couple of weeks.
> 
> Some thoughts on what has been discussed:
> 
> First, you are right...Decimal is not the right module for this. I
> think instead I should use the 'fractions' module for loading grid
> spec information from strings (command-line, configs, etc). The
> tricky part is getting the yaml reader to use it instead of
> converting to a float under the hood.
> 
> Second, what has been pointed out about the implementation of arange
> actually helps to explain some oddities I have encountered. In some
> situations, I have found that it was better for me to produce the
> reversed sequence, and then reverse that array back and use it.
> 
> Third, it would be nice to do what we can to improve arange()'s
> results. Would we be ok with a PR that uses fma() if it is available,
> but then falls back on a regular multiply and add if it isn't
> available, or are we going to need to implement it ourselves for
> consistency?
> 

I am not sure I like the idea:
  1. It sounds like it might break code
  2. It sounds *not* like a fix, but rather a
     "make it slightly less bad, but it is still awful"

Using fma inside linspace might make linspace a bit more exact
possible, and would be a good thing, though I am not sure we have a
policy yet for something that is only used sometimes, nor am I sure it
actually helps.

It also would be nice to add stable summation to numpy in general (in
whatever way), which maybe is half related but on nobody's specific
todo list.

> 
> Lastly, there definitely needs to be a better tool for grid making.
> The problem appears easy at first, but it is fraught with many
> pitfalls and subtle issues. It is easy to say, "always use
> linspace()", but if the user doesn't have the number of pixels, they
> will need to calculate that using --- gasp! -- floating point
> numbers, which could result in the wrong answer. Or maybe their
> first/last positions were determined by some other calculation, and
> so the resulting grid does not have the expected spacing. Another
> problem that I run into is starting from two different sized grids
> and padding them both to be the same spec -- and getting that to
> match what would come about if I had generated the grid from scratch.
> 

Maybe you are right, but right now I have no clue what that tool would
do :). If we should add it to numpy likely depends on what exactly it
does and how complex it is.
I once wanted to add a "step" argument to linspace, but didn't in the
end, largely because it basically enforced in a very convoluted way
that the step fit exactly to a number of steps (up to floating point
precision) and body was quite sure it was a good idea, since it would
just be useful for a little convenience when you do not want to
calculate the steps.

Best,

Sebastian

> 
> Getting these things right is hard. I am not even certain that my
> existing code for doing this even right. But, what I do know is that
> until we build such a tool, users will continue to incorrectly use
> arange() and linspace(), and waste time trying to re-invent the wheel
> badly, assuming they even notice their mistakes in the first place!
> So, should such a tool go into numpy, given how fundamental it is to
> generate a sequence of floating point numbers, or should we try to
> put it into a package like rasterio or xarray?
> 
> Cheers!
> Ben Root
> 
> 
> 
> On Thu, Feb 22, 2018 at 2:02 PM, Chris Barker <chris.barker at noaa.gov>
> wrote:
> > @Ben: Have you found a solution to your problem? Are there thinks
> > we could do in numpy to make it better?
> > 
> > -CHB
> > 
> > 
> > On Mon, Feb 12, 2018 at 9:33 AM, Chris Barker <chris.barker at noaa.go
> > v> wrote:
> > > I think it's all been said, but a few comments:
> > > 
> > > On Sun, Feb 11, 2018 at 2:19 PM, Nils Becker <nilsc.becker at gmail.
> > > com> wrote:
> > > > Generating equidistantly spaced grids is simply not always
> > > > possible.
> > > > 
> > > 
> > > exactly -- and linspace gives pretty much teh best possible
> > > result, guaranteeing tha tthe start an end points are exact, and
> > > the spacing is within an ULP or two (maybe we could make that
> > > within 1 ULP always, but not sure that's worth it).
> > >  
> > > > The reason is that the absolute spacing of the possible
> > > > floating point numbers depends on their magnitude [1].
> > > > 
> > > 
> > > Also that the exact spacing may not be exactly representable in
> > > FP -- so you have to have at least one space that's a bit off to
> > > get the end points right (or have the endpoints not exact).
> > >  
> > > > If you - for some reason - want the same grid spacing
> > > > everywhere you may choose an appropriate new spacing.
> > > > 
> > > 
> > > well, yeah, but usually you are trying to fit to some other
> > > constraint. I'm still confused as to where these couple of ULPs
> > > actually cause problems, unless you are doing in appropriate FP
> > > comparisons elsewhere.
> > > 
> > > > Curiously, either by design or accident, arange() seems to do
> > > > something similar as was mentioned by Eric. It creates a new
> > > > grid spacing by adding and subtracting the starting point of
> > > > the grid. This often has similar effect as adding and
> > > > subtracting N*dx (e.g. if the grid is symmetric around 0.0).
> > > > Consequently, arange() seems to trade keeping the grid spacing
> > > > constant for a larger error in the grid size and consequently
> > > > in the end point.
> > > > 
> > > 
> > > interesting -- but it actually makes sense -- that is the
> > > definition of arange(), borrowed from range(), which was designed
> > > for integers, and, in fact, pretty much mirroered the classic C
> > > index for loop:
> > > 
> > > 
> > > for (int i=0; i<N; i++) {
> > > ...
> > > 
> > > 
> > > or in python:
> > > 
> > > i = start
> > > while i < stop:
> > >     i += step
> > > 
> > > The problem here is that termination criteria -- i < stop -- that
> > > is the definition of the function, and works just fine for
> > > integers (where it came from), but with FP, even with no error
> > > accumulation, stop may not be exactly representable, so you could
> > > end up with a value for your last item that is about (stop-step), 
> > > or you could end up with a value that is a couple ULPs less than
> > > step -- essentially including the end point when you weren't
> > > supposed to.
> > > 
> > > The truth is, making a floating point range() was simply a bad
> > > idea to begin with -- it's not the way to define a range of
> > > numbers in floating point. Whiuch is why the docs now say "When
> > > using a non-integer step, such as 0.1, the results will often not
> > > be consistent.  It is better to use ``linspace`` for these
> > > cases."
> > > 
> > > Ben wants a way to define grids in a consistent way -- make
> > > sense. And yes, sometimes, the original source you are trying to
> > > match (like GDAL) provides a starting point and step. But with
> > > FP, that is simply problematic. If:
> > > 
> > > start + step*num_steps != stop
> > > 
> > > exactly in floating point, then you'll need to do the math one
> > > way or another to get what you want -- and I'm not sure anyone
> > > but the user knows what they want -- do you want step to be as
> > > exact as possible, or do you want stop to be as exact as
> > > possible?
> > > 
> > > All that being said -- if arange() could be made a tiny bit more
> > > accurate with fma or other numerical technique, why not? it won't
> > > solve the problem, but if someone writes and tests the code (and
> > > it does not require compiler or hardware features that aren't
> > > supported everywhere numpy compiles), then sure. (Same for
> > > linspace, though I'm not sure it's possible)
> > > 
> > > There is one other option: a new function (or option) that makes
> > > a grid from a specification of: start, step, num_points. If that
> > > is really a common use case (that is, you don't care exactly what
> > > the end-point is), then it might be handy to have it as a
> > > utility.
> > > 
> > > We could also have an arange-like function that, rather than <
> > > stop, would do "close to" stop. Someone that understands FP
> > > better than I might be able to compute what the expected error
> > > might be, and find the closest end point within that error. But I
> > > think that's a bad specification -- (stop - start) / step may be
> > > nowhere near an integer -- then what is the function supposed to
> > > do??
> > > 
> > > 
> > > BTW: I kind of wish that linspace specified the number of steps,
> > > rather than the number of points, that is (num+points - 1) that
> > > would save a little bit of logical thinking. So if something new
> > > is made, let's keep that in mind.
> > > 
> > >  
> > > > 1. Comparison to calculations with decimal can be difficult as
> > > > not all simple decimal step sizes are exactly representable as 
> > > > 
> > > > finite floating point numbers.
> > > > 
> > > 
> > > yeah, this is what I mean by inappropriate use of Decimal --
> > > decimal is not inherently "more accurate" than fp -- is just can
> > > represent _decimal_ numbers exactly, which we are all use to --
> > > we want  1 / 10 to be exact, but dont mind that 1 / 3 isn't.
> > > 
> > > Decimal also provided variable precision -- so it can be handy
> > > for that. I kinda wish Python had an arbitrary precision binary
> > > floating point built in...
> > > 
> > > -CHB
> > > 
> > > -- 
> > > 
> > > Christopher Barker, Ph.D.
> > > Oceanographer
> > > 
> > > Emergency Response Division
> > > NOAA/NOS/OR&R            (206) 526-6959   voice
> > > 7600 Sand Point Way NE   (206) 526-6329   fax
> > > Seattle, WA  98115       (206) 526-6317   main reception
> > > 
> > > Chris.Barker at noaa.gov
> > 
> > 
> > 
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20180222/3bfe80a1/attachment.sig>