[SciPy-dev] scipy.special.hyp2f1 bug?

Thu Sep 17 14:54:14 EDT 2009

On Thu, Sep 17, 2009 at 3:02 AM, Skipper Seabold <jsseabold at gmail.com> wrote:
> Since y'all are looking at the distributions vs. mpmath, it reminded
> me of this.  I haven't filed a bug report yet, but I don't think one
> existed when I found this.
>
> I was working with the connection between the incomplete beta function
> and the hypergeometric2f1 function in scipy.special for GLM in
> statsmodels and I came across this blog post on mpmath about some
> corner cases for the hypergeometric2f1.
>
> <http://fredrik-j.blogspot.com/2009/06/hypergeometric-2f1-incomplete-beta.html>
>
> Two examples of discrepancies.
>
> In [1]: import mpmath
>
> In [2]: from scipy import special
>
> In [3]: mpmath.hyp2f1(3,-1,-1,0.5)
> Out[3]: mpf('2.5')
>
> In [4]: special.hyp2f1(3,-1,-1,.5)
> Out[4]: 8.0

Hi, mpmath author here. Both these values are correct, depending on definition.

The answer returned by mpmath agrees with Mathematica (intentionally),
but I'm not entirely happy about this as Mathematica's definition here
is *weird*. Cephes' interpretation (if that is what special.hyp2f1
uses) is that the zero and singularity in each term cancel each other
out. Mathematica's (and mpmath's) interpretation is that the
hypergeometric series terminates on the first zero numerator --
despite the same term having a zero denominator. Mathematica's
conventions are not even self-consistent: Hypergeometric2F1[3,x,x,1/2]
= 8 (with a symbolic variable x) but it changes its mind as soon as
you plug in -1 instead.

With regard to the incomplete beta integral, it seems that the
interpretations give different endpoint values, but the result is the
same upon subtraction:

>>> from mpmath import hyp2f1, quad
>>> a=-2.0; b=2.0; z1=0.25; z2=0.75
>>> T2 = z2**a/a*hyp2f1(a,1-b,a+1,z2)
>>> T1 = z1**a/a*hyp2f1(a,1-b,a+1,z1)
>>> print T1, T2
-4.0 0.444444444444444
>>> print T2-T1
4.44444444444444
>>> print quad(lambda u: u**(a-1)*(1-u)**(b-1), [z1,z2])
4.44444444444444
>>>
>>> h = 1e-15    # emulate cancellation instead
>>> a=-2.0+h; b=2.0+h; z1=0.25; z2=0.75
>>> T2 = z2**a/a*hyp2f1(a,1-b,a+1,z2)
>>> T1 = z1**a/a*hyp2f1(a,1-b,a+1,z1)
>>> print T1, T2
-3.59999999999999 0.844444444444447   # different endpoint values
>>> print T2-T1
4.44444444444444   # same difference

Which convention makes more sense? I'd be happy to change mpmath
(though I need to check how many derived results are changed). It
would probably be easy to support both conventions in mpmath by adding
a flag.

Fredrik