[Python-ideas] Consider adding clip or clamp function to math

Steven D'Aprano steve at pearwood.info
Mon Aug 1 16:10:44 EDT 2016


On Mon, Aug 01, 2016 at 12:00:11PM -0700, Chris Barker wrote:
> Something to keep in mind:
> 
> the math module is written in C, and will remain that way for the time
> being (see recent discussion on, I think, this list and also the discussion
> when we added math.isclose())
> 
> which means it will be for floats only.

Not necessarily. 

py> import math
py> math.factorial(100)
93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000

Not a float :-)

It means that this clamp() function would have to be implemented in C. 
It *doesn't* mean that it will have to convert its arguments to floats, 
or reject non-float arguments.

As my implementation shows, this should work with any ordered numeric 
type if clamp() calls the Python < and > operators (i.e. the __lt__ and 
__gt__ dunders). Let the objects themselves do any numeric conversions 
*if necessary*, there's no need for clamp() to convert the arguments to 
floats and call the native C double < and > operators.

(I presume that there's a way to call Python operators from C code.)
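For illustration only, here is a minimal pure-Python sketch of that approach, assuming the signature clamp(x, lower, upper). It is not the proposed C implementation, and not necessarily identical to the implementation mentioned above. Because it only uses < and >, ints, Fractions and Decimals pass through without being converted to float:

```python
from decimal import Decimal
from fractions import Fraction


def clamp(x, lower, upper):
    """Clamp x to [lower, upper] using only the < and > operators.

    Hypothetical sketch: no argument is converted to float, so the
    result keeps whatever type the operands already have.
    """
    if x < lower:
        return lower
    elif x > upper:
        return upper
    return x


print(clamp(150, 0, 100))                                # 100
print(clamp(Fraction(1, 3), 0, 1))                       # 1/3 (still a Fraction)
print(clamp(Decimal("2.50"), Decimal("0"), Decimal("1.00")))  # 1.00 (a Decimal)
```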


> My first thought is that not every one-line function needs to be in the
> standard library. However, as this thread shows, there are some
> complications to be considered, so maybe it does make sense to have them
> hashed out.

Indeed.


> Regarding NaN:
> 
> In [4]: nan = float('nan')
> In [6]: nan > 5
> Out[6]: False
> In [7]: 5 > nan
> Out[7]: False

NANs are *unordered* values: they are neither greater than, nor less 
than, any other value.
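A quick check in plain Python shows that unordered behaviour directly (nothing here depends on any proposed clamp()):

```python
import math

nan = float("nan")

# Every ordering comparison involving a NaN is False...
print(nan < 5, nan > 5, 5 < nan, 5 > nan)   # False False False False
# ...and a NaN does not even compare equal to itself.
print(nan == nan, nan != nan)               # False True
print(math.isnan(nan))                      # True
```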


> This follows the IEEE spec -- so the only correct result from
> 
> clip(x, float('nan')) is NaN.

I don't agree that this is the "only correct result".

We only clamp the value if it is less than the lower bound, or greater 
than the upper bound. Otherwise we leave it untouched. So, given:

clamp(x, lower, upper)

we say:

if x < lower: x = lower
elif x > upper: x = upper

If lower or upper are NANs, then neither condition will ever be true, 
and x will never be clamped to a NAN (unless it is already a NAN).

That's why I said that it was an accident of implementation that passing 
a NAN as one of the lower or upper bounds will be equivalent to setting 
the bounds to minus/plus infinity: the value will never be less than 
NAN, or greater than NAN.
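Here is that accident in action, using the same two comparisons (the hypothetical clamp() is redefined so the snippet stands alone). A NAN bound behaves exactly like an infinite one, while the other bound still works:

```python
import math

nan = float("nan")


def clamp(x, lower, upper):
    # Same two comparisons as above: clamp only when x is strictly
    # outside the bounds, otherwise return x untouched.
    if x < lower:
        return lower
    elif x > upper:
        return upper
    return x


print(clamp(-1e300, nan, 10.0))        # -1e+300, same as a -inf lower bound
print(clamp(-1e300, -math.inf, 10.0))  # -1e+300
print(clamp(1e300, 0.0, nan))          # 1e+300, same as a +inf upper bound
print(clamp(20.0, nan, 10.0))          # 10.0, the upper bound still applies
print(clamp(nan, 0.0, 10.0))           # nan, only because x was already NAN
```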

I suppose we could rule that case out: if either bound is a NAN, raise 
an exception. But that will require a conversion to float, which may 
fail. I'd rather just document that passing NANs as bounds will lead to 
implementation-specific behaviour that you cannot rely on. If you
want to specify an unbounded limit, pass None or an infinity with the 
right sign.
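To make the cost of the "raise on NAN bounds" option concrete, consider a hypothetical strict_clamp() (both the name and the use of math.isnan() for the check are assumptions for illustration). math.isnan() converts its argument to float, so bounds that are perfectly well ordered, but cannot be converted to float, get rejected for the wrong reason:

```python
import math


def strict_clamp(x, lower, upper):
    """Hypothetical variant that rejects NAN bounds.

    Checking for NAN via math.isnan() converts each bound to float,
    and that conversion can itself fail.
    """
    for bound in (lower, upper):
        if math.isnan(bound):
            raise ValueError("clamp() bounds must not be NAN")
    if x < lower:
        return lower
    elif x > upper:
        return upper
    return x


print(strict_clamp(5, 0, 10))  # 5

for args in [(5.0, float("nan"), 10.0),  # rejected: NAN bound (intended)
             (5, 0, 10**400),            # int too large to convert to float
             ("m", "a", "z")]:           # ordered, but not a number at all
    try:
        strict_clamp(*args)
    except (ValueError, OverflowError, TypeError) as exc:
        print(type(exc).__name__, exc)
```

A clamp that only uses < and > would accept the last two calls without complaint.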


> Steven D'Aprano wrote:
> > I don't care too much whether the parameters are mandatory or have
> > defaults, so long as it is *possible* to pass something for the lower
> > and upper bounds which mean "unbounded".
> 
> I think the point was that if one of the limits is unbounded, then you can
> just use min or max...
> 
> though I think I agree -- you may have code where the limits are sometimes
> unbounded, and sometimes not -- nice to have a way to have only one code
> path.

That's exactly my thinking. The last thing you want to do is to inspect 
the bounds, then decide whether you need to call min(), max() or 
clamp(). Not only is it a pain, but as Victor inadvertently showed, it's 
easy to get mixed up and call the wrong function.


> > (1) Explicitly pass -INFINITY or +INFINITY as needed;
> 
> that's it then.
> 
> > but which infinity, float or Decimal? If you pass the wrong one, you may have to
> > pay the cost of converting your values to float/Decimal, which could end
> > up expensive if you have a lot of them.
> 
> well, as above, if it's in the math module, it's only float... you could
> add one to the Decimal module, too, I suppose.

I'm pretty sure that a C implementation can be type agnostic and simply 
rely on the Python < and > operators.


> > (2) Pass a NAN as the bounds. With my implementation, that actually
> > works! But it's a surprising accident of implementation, it feels wrong
> > and looks weird,
> 
> and violates IEEE754 -- don't do that.

What part of IEEE-754 do you think it violates? I don't think it 
violates anything. But I agree, don't do that. If you do, you'll get 
whatever the implementation happens to do, no promises or guarantees.


[...]
> > (4) Use None as a placeholder for "no limit". That's my preferred
> > option.
> 
> reasonable enough -- and would make the API a bit easier -- both for
> matching different types, and because there is no literal or pre-existing
> object for Inf.

I agree with that reasoning.
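A minimal sketch of option (4), assuming a signature clamp(x, lower=None, upper=None) (hypothetical, not an agreed API). None means "no limit on this side", so no infinity of any particular type is needed, and code whose limits are sometimes unbounded keeps a single code path:

```python
from decimal import Decimal


def clamp(x, lower=None, upper=None):
    """Hypothetical sketch: None means "unbounded on this side".

    Only < and > are used on real bounds, so the same call works for
    ints, floats, Decimals, ... without choosing a typed infinity.
    """
    if lower is not None and x < lower:
        return lower
    if upper is not None and x > upper:
        return upper
    return x


# One code path, whether or not each limit is present.
for lower, upper in [(0, 10), (None, 10), (0, None), (None, None)]:
    print(lower, upper, clamp(42, lower, upper))
# 0 10 10
# None 10 10
# 0 None 42
# None None 42

print(clamp(Decimal("-3.5"), lower=Decimal("0")))  # 0
```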


-- 
Steve

