[SciPy-User] scipy.stats.nanmedian

Keith Goodman kwgoodman at gmail.com
Fri Jan 22 11:58:21 EST 2010


On Fri, Jan 22, 2010 at 8:52 AM, Keith Goodman <kwgoodman at gmail.com> wrote:
> On Fri, Jan 22, 2010 at 8:46 AM,  <josef.pktd at gmail.com> wrote:
>> On Fri, Jan 22, 2010 at 11:09 AM, Keith Goodman <kwgoodman at gmail.com> wrote:
>>> On Thu, Jan 21, 2010 at 8:18 PM,  <josef.pktd at gmail.com> wrote:
>>>> On Thu, Jan 21, 2010 at 10:01 PM, Keith Goodman <kwgoodman at gmail.com> wrote:
>>>>> On Thu, Jan 21, 2010 at 6:41 PM, Pierre GM <pgmdevlist at gmail.com> wrote:
>>>>>> On Jan 21, 2010, at 9:28 PM, Keith Goodman wrote:
>>>>>>> That's the only way I was able to figure out how to pull 1.0 out of
>>>>>>> np.array(1.0). Is there a better way?
>>>>>>
>>>>>>
>>>>>> .item()
>>>>>
>>>>> Thanks. item() looks better than tolist().
>>>>>
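
For anyone curious, here is what the two methods actually return for a 0-d
array (a quick interactive check, not from the original exchange):

>> a = np.array(1.0)
>> a.item()
   1.0
>> type(a.item()), type(a.tolist())
   (<type 'float'>, <type 'float'>)

Both hand back a plain Python float here; item() just states the intent
more directly.
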
>>>>> I simplified the function:
>>>>>
>>>>> def nanmedian(x, axis=0):
>>>>>    x, axis = _chk_asarray(x,axis)
>>>>>    if x.ndim == 0:
>>>>>        return float(x.item())
>>>>>    x = x.copy()
>>>>>    x = np.apply_along_axis(_nanmedian,axis,x)
>>>>>    if x.ndim == 0:
>>>>>        x = float(x.item())
>>>>>    return x
>>>>>
>>>>> and opened a ticket:
>>>>>
>>>>> http://projects.scipy.org/scipy/ticket/1098
>>>>
>>>>
>>>> How about getting rid of apply_along_axis? See the attachment.
>>>>
>>>> I don't know whether, or by how much, it is faster, but there is a
>>>> ticket reporting that the current version is slow.
>>>> No guarantee yet against hidden bugs or corner cases.
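
The attachment isn't reproduced here, so here is a rough sketch of what a
vectorized nanmedian can look like: sort so the NaNs go to the end along
the axis, then pick the middle element(s) using the per-column count of
non-NaN values. This is my own guess at the approach (and nanmedian_sketch
is a made-up name), not necessarily what the attached version does:

import numpy as np

def nanmedian_sketch(x, axis=0):
    # Sorting pushes NaNs to the end; the median is then read off from
    # the count of non-NaN values in each column.
    x = np.asanyarray(x)
    if axis is None:
        x = x.ravel()
        axis = 0
    x = np.atleast_1d(x)
    if axis < 0:
        axis += x.ndim
    xs = np.sort(x, axis=axis)              # NaNs sort to the end
    n = (~np.isnan(xs)).sum(axis=axis)      # valid count per column
    lo = np.maximum((n - 1) // 2, 0)        # lower middle index
    hi = n // 2                             # upper middle index
    idx = list(np.indices(lo.shape))        # index arrays for the other axes
    idx.insert(axis, lo)
    low = xs[tuple(idx)]
    idx[axis] = hi
    high = xs[tuple(idx)]
    med = 0.5 * (low + high)
    med = np.where(n == 0, np.nan, med)     # all-NaN columns give NaN
    if med.size == 1:
        return med.item()
    return med

No corner-case guarantees for this sketch either; it is only meant to show
the sort-and-index idea without apply_along_axis.
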
>>>
>>> It is faster. But here is one case it does not handle:
>>>
>>>>> nanmedian([1, 2])
>>>   array([ 1.5])
>>>>> np.median([1, 2])
>>>   1.5
>>>
>>> I'm sure it could be fixed. But needing that fix, and being a larger
>>> change, makes it less likely to land in the next version of scipy. One
>>> option is to make the small bug fix I suggested (ticket #1098) and add
>>> the corresponding unit tests. Then we can take our time designing a
>>> better version of nanmedian.
>>
>> I didn't see the difference from np.median for this case; I think I was
>> going by the shape answer from the other thread about the return values
>> of splines and interpolation.
>>
>> If I change the last 3 lines to
>>    if nanmed.size == 1:
>>        return nanmed.item()
>>    return nanmed
>>
>> then I get agreement with numpy for the following test cases
>>
>> print nanmedian(1), np.median(1)
>> print nanmedian(np.array(1)), np.median(1)
>> print nanmedian(np.array([1])), np.median(np.array([1]))
>> print nanmedian(np.array([[1]])), np.median(np.array([[1]]))
>> print nanmedian(np.array([1,2])), np.median(np.array([1,2]))
>> print nanmedian(np.array([[1,2]])), np.median(np.array([[1,2]]),axis=0)
>> print nanmedian([1]), np.median([1])
>> print nanmedian([[1]]), np.median([[1]])
>> print nanmedian([1,2]), np.median([1,2])
>> print nanmedian([[1,2]]), np.median([[1,2]],axis=0)
>> print nanmedian([1j,2]), np.median([1j,2])
>>
>> Am I still missing any cases?
>>
>> The vectorized version should be faster for this case
>> http://projects.scipy.org/scipy/ticket/740
>> but maybe not for long and narrow arrays.
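
For whoever wants numbers, a rough timing harness (nothing is claimed here;
nanmedian_sketch is the sketch earlier in this message, nanmedian is the
current scipy.stats version, and the shapes are arbitrary):

import time
import numpy as np
from scipy.stats import nanmedian

x = np.random.rand(10000, 10)   # apply_along_axis loops over columns,
x[x > 0.9] = np.nan             # so also try shapes like (10, 10000)

t0 = time.time()
for i in range(10):
    nanmedian(x, axis=0)
print 'apply_along_axis version:', time.time() - t0

t0 = time.time()
for i in range(10):
    nanmedian_sketch(x, axis=0)
print 'vectorized sketch:       ', time.time() - t0
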
>
> Here is an odd one:
>
>>> nanmedian(True)
>   1.0
>>> nanmedian([True])
>   0.5  # <--- strange
>
>>> np.median(True)
>   1.0
>>> np.median([True])
>   1.0

Another one:

>> x = np.random.randn(3,4,5)
>> nanmedian(x)
ValueError: shape mismatch: objects cannot be broadcast to a single shape

At the very least, we should add a full set of unit tests for nanmedian.
One reason the current unit tests did not catch the problem I ran into is
that

>> np.array(2.0) == 2.0
   True

So nanmedian was returning np.array(2.0) while np.median was returning
2.0, and since the two compare as equal, the unit test still passed.
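
A check along these lines would have caught it (just a sketch of the idea,
not the actual test-suite code; assert_equal is from numpy.testing and
nanmedian is the function under discussion):

import numpy as np
from numpy.testing import assert_equal

def test_nanmedian_returns_scalar():
    x = np.array([1.0, np.nan, 3.0])
    res = nanmedian(x)
    assert_equal(res, 2.0)   # the value check alone passes for np.array(2.0)
    # the extra check that catches a 0-d array slipping through:
    assert np.isscalar(res), 'expected a scalar, got %r' % type(res)
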


