[SciPy-User] scipy.stats.nanmedian

Fri Jan 22 11:52:50 EST 2010

On Fri, Jan 22, 2010 at 8:46 AM,  <josef.pktd at gmail.com> wrote:
> On Fri, Jan 22, 2010 at 11:09 AM, Keith Goodman <kwgoodman at gmail.com> wrote:
>> On Thu, Jan 21, 2010 at 8:18 PM,  <josef.pktd at gmail.com> wrote:
>>> On Thu, Jan 21, 2010 at 10:01 PM, Keith Goodman <kwgoodman at gmail.com> wrote:
>>>> On Thu, Jan 21, 2010 at 6:41 PM, Pierre GM <pgmdevlist at gmail.com> wrote:
>>>>> On Jan 21, 2010, at 9:28 PM, Keith Goodman wrote:
>>>>>> That's the only way I was able to figure out how to pull 1.0 out of
>>>>>> np.array(1.0). Is there a better way?
>>>>>
>>>>>
>>>>> .item()
>>>>
>>>> Thanks. item() looks better than tolist().
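For reference, here is a minimal sketch of the options mentioned for getting a plain Python number out of a 0-d array, using current NumPy behavior. `a[()]` is an extra option the thread does not mention; it returns a NumPy scalar rather than a Python float.

```python
import numpy as np

a = np.array(1.0)  # a 0-d array

via_item = a.item()      # Python float, as suggested above
via_tolist = a.tolist()  # also a Python float when the array is 0-d
via_float = float(a)     # fine for 0-d arrays
via_index = a[()]        # NumPy scalar (np.float64), not a Python float

print(type(via_item).__name__, via_item)  # float 1.0
```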
>>>>
>>>> I simplified the function:
>>>>
>>>> def nanmedian(x, axis=0):
>>>>    x, axis = _chk_asarray(x,axis)
>>>>    if x.ndim == 0:
>>>>        return float(x.item())
>>>>    x = x.copy()
>>>>    x = np.apply_along_axis(_nanmedian,axis,x)
>>>>    if x.ndim == 0:
>>>>        x = float(x.item())
>>>>    return x
>>>>
>>>> and opened a ticket:
>>>>
>>>> http://projects.scipy.org/scipy/ticket/1098
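The simplified function depends on private scipy.stats helpers. The sketch below is a self-contained version of the same idea. Its assumptions: `_nanmedian_1d` is a hypothetical stand-in for the private `_nanmedian`, and the `axis=None` handling only approximates the private `_chk_asarray`. It is not the patch attached to the ticket.

```python
import numpy as np


def _nanmedian_1d(arr1d):
    """Hypothetical helper: median of a 1-D slice, ignoring NaNs."""
    kept = arr1d[~np.isnan(arr1d)]  # boolean indexing returns a copy
    if kept.size == 0:
        return np.nan  # all-NaN slice
    return np.median(kept)


def nanmedian(x, axis=0):
    # Rough stand-in for scipy.stats' private _chk_asarray.
    x = np.asarray(x)
    if axis is None:
        x, axis = x.ravel(), 0
    if x.ndim == 0:
        return float(x.item())  # .item() pulls the Python number out
    # apply_along_axis hands each 1-D slice to the helper; this helper
    # never writes to its input, so no defensive copy is needed here.
    x = np.apply_along_axis(_nanmedian_1d, axis, x)
    if x.ndim == 0:
        x = float(x.item())  # 1-D input reduces to a 0-d result
    return x
```

With the final `ndim == 0` check, `nanmedian([1, 2])` returns the plain float `1.5`, matching `np.median([1, 2])`.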
>>>
>>>
>>> How about getting rid of apply_along_axis? See attachment.
>>>
>>> I don't know whether it is faster, or by how much, but there is a ticket
>>> reporting that the current version is slow.
>>> No guarantee yet that it is free of hidden bugs or unhandled corner cases.
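The attachment is not preserved in this archive. The sketch below shows one common way to drop `apply_along_axis`, and it may differ from the attached code. It sorts once along the axis (NumPy places NaNs at the end), counts the non-NaN values per slice, and averages the two middle values. Assumptions: real-valued input, cast to float up front, and no empty slices. `nanmedian_sorted` is a hypothetical name.

```python
import numpy as np


def nanmedian_sorted(x, axis=0):
    """Hypothetical vectorized nanmedian: one sort, no apply_along_axis."""
    # Assumes real-valued input; casting to float also keeps the averaging
    # step out of integer/bool arithmetic. Empty slices are not handled.
    x = np.asarray(x, dtype=float)
    if axis is None:
        x, axis = x.ravel(), 0
    if x.ndim == 0:
        return x.item()
    x = np.sort(x, axis=axis)  # NaNs sort to the end of each slice
    n = np.sum(~np.isnan(x), axis=axis, keepdims=True)  # non-NaN count
    # Indices of the two middle values (equal when n is odd). For an
    # all-NaN slice both are 0, which picks up a NaN, as desired.
    lo = np.maximum((n - 1) // 2, 0)
    hi = n // 2
    lo_vals = np.take_along_axis(x, lo, axis=axis)
    hi_vals = np.take_along_axis(x, hi, axis=axis)
    med = np.squeeze((lo_vals + hi_vals) / 2.0, axis=axis)
    if med.ndim == 0:
        return med.item()  # scalar for 1-D input, like np.median
    return med
```

Squeezing the reduced axis means a 1-D input yields a scalar instead of a length-1 array. Whether this beats `apply_along_axis` likely depends on the array shape, as the thread notes for long, narrow arrays.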
>>
>> It is faster. But here is one case it does not handle:
>>
>>>> nanmedian([1, 2])
>>   array([ 1.5])
>>>> np.median([1, 2])
>>   1.5
>>
>> I'm sure it could be fixed. But having to fix it, and the fact that it
>> is a larger change, decreases the likelihood that it will make it into
>> the next version of scipy. One option is to make the small bug fix I
>> suggested (ticket #1098) and add the corresponding unit tests. Then we
>> can take our time to design a better version of nanmedian.
>
> I didn't notice the difference from np.median for this case; I think I was
> going by the answer about output shape in the other thread on the return
> values of splines and interpolation.
>
> If I change the last 3 lines to
>    if nanmed.size == 1:
>       return nanmed.item()
>    return nanmed
>
> then I get agreement with numpy for the following test cases
>
> print nanmedian(1), np.median(1)
> print nanmedian(np.array(1)), np.median(1)
> print nanmedian(np.array([1])), np.median(np.array([1]))
> print nanmedian(np.array([[1]])), np.median(np.array([[1]]))
> print nanmedian(np.array([1,2])), np.median(np.array([1,2]))
> print nanmedian(np.array([[1,2]])), np.median(np.array([[1,2]]),axis=0)
> print nanmedian([1]), np.median([1])
> print nanmedian([[1]]), np.median([[1]])
> print nanmedian([1,2]), np.median([1,2])
> print nanmedian([[1,2]]), np.median([[1,2]],axis=0)
> print nanmedian([1j,2]), np.median([1j,2])
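The print comparisons above can be turned into assertions, so a regression fails loudly instead of needing to be read by eye. This is a minimal sketch that assumes a NumPy recent enough to ship `np.nanmedian`, which was added after this thread, and uses it as the function under test. You can pass any candidate implementation to `check`. `np.nanmedian` defaults to `axis=None`, so the two `[[1, 2]]` cases pass `axis=0` explicitly to mirror the comparison with `np.median(..., axis=0)`. The complex case is left out of this sketch.

```python
import numpy as np

# (input, kwargs) pairs mirroring the print statements above.
cases = [
    (1, {}),
    (np.array(1), {}),
    (np.array([1]), {}),
    (np.array([[1]]), {}),
    (np.array([1, 2]), {}),
    (np.array([[1, 2]]), {"axis": 0}),
    ([1], {}),
    ([[1]], {}),
    ([1, 2], {}),
    ([[1, 2]], {"axis": 0}),
]


def check(candidate):
    """Assert that candidate agrees with np.median on every case."""
    for value, kwargs in cases:
        got = candidate(value, **kwargs)
        want = np.median(value, **kwargs)
        # Same kind of result (scalar vs. array) and the same values.
        assert np.ndim(got) == np.ndim(want), (value, got, want)
        assert np.allclose(got, want), (value, got, want)


check(np.nanmedian)
```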
>
> Am I still missing any cases?
>
> The vectorized version should be faster for this case
> http://projects.scipy.org/scipy/ticket/740
> but maybe not for long and narrow arrays.

Here is an odd one:

>> nanmedian(True)
   1.0
>> nanmedian([True])
   0.5  # <--- strange

>> np.median(True)
   1.0
>> np.median([True])
   1.0
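Because the attached vectorized code is not in the archive, the cause of the `0.5` is uncertain. One plausible mechanism assumes the code averages the two middle sorted values in the input's own dtype. For a one-element bool array, both "middle" values are `True`. NumPy's bool addition acts as a logical OR, so the sum stays `True`, and `True / 2` is `0.5`. Casting to float before averaging avoids this. The snippet below only illustrates that mechanism and is not taken from the attachment.

```python
import numpy as np

x = np.sort(np.array([True]))  # one-element bool array
n = x.size
lo, hi = (n - 1) // 2, n // 2  # both 0 when n == 1

# Averaging in bool dtype: True + True is a logical OR -> True, then True / 2.
in_bool = (x[lo] + x[hi]) / 2
print(in_bool)  # 0.5, the strange value above

# Averaging after casting to float gives the expected median.
xf = x.astype(float)
in_float = (xf[lo] + xf[hi]) / 2
print(in_float)  # 1.0, same as np.median([True])
```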