[SciPy-Dev] Limit number of iterations in sigma clipping

Sri Krishna kitchi.srikrishna at gmail.com
Sun Nov 29 04:59:37 EST 2015


On 18 October 2015 at 21:12, <josef.pktd at gmail.com> wrote:

>
>
>
> On Sun, Oct 18, 2015 at 11:23 AM, Ralf Gommers <ralf.gommers at gmail.com>
> wrote:
>
>>
>>
>> On Sun, Oct 18, 2015 at 1:18 PM, Sri Krishna <kitchi.srikrishna at gmail.com
>> > wrote:
>>
>>> On 18 October 2015 at 15:05, Ralf Gommers <ralf.gommers at gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>>
>>>> On Fri, Oct 16, 2015 at 6:21 AM, Sri Krishna <
>>>> kitchi.srikrishna at gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> The current sigma clipping function has no iters keyword.
>>>>>
>>>>
>>>> For clarity: this is stats.sigmaclip
>>>>
>>>>
>>>>> I feel it would be useful to include it in the function, with a
>>>>> default like `iters=None` so that if people do want to use the keyword they
>>>>> will be able to do so.
>>>>>
>>>>> With the defaults of `iters = None` it shouldn't break anyone's
>>>>> workflow and will correspond to the current default behaviour.
>>>>>
>>>>> If there is any interest in this, I can file a pull request soon.
>>>>>
>>>>
>>>> Makes sense to me to add this.
>>>>
>>>> I just saw that AstroPy has a much more extensive sigma clipping
>>>> function. Replacing stats.sigmaclip with
>>>> http://docs.astropy.org/en/latest/api/astropy.stats.sigma_clip.html
>>>> would be the way to go I think. In a backwards-compatible fashion of
>>>> course, so the `sigma` keyword from that function should be left out.
>>>>
>>>
>>> Well, I am working on a pull request on the astropy sigma clipping
>>> routine to incorporate the scipy function - The scipy function can be upto
>>> 100x faster than the astropy function in benchmarks that I've run, and I'm
>>> not convinced that it is worthwhile loosing the speed for features. So once
>>> the PR is merged, the astropy function will have the option to use the
>>> scipy function as well as the existing astropy function,
>>>
>>
>> Ouch, 100x is a lot.
>>
>>
>>> As far as I can tell, the speed improvements in the
>>> scipy.stats.sigmaclip function come because we don't use masked arrays, and
>>> the sigma clipped data is discarded every iteration. We could potentially
>>> use another function instead of the mean (i.e., median or a user supplied
>>> function analogous to astropy) without too much of a speed hit.
>>>
>>> The only three features we need to port from the astropy function are
>>> the `cenfunc` and `varfunc` keywords to define a custom function to
>>> calculate mean/standard deviation and the `axis` keyword. I'm not sure we
>>> need to use masked arrays because of the previously mentioned performance
>>> implications.
>>>
>>> So should I work on a pull request to adapt the astropy function into
>>> scipy in a backwards compatible manner?
>>>
>>
>> All the keyword you mentioned make sense to add I think, so yes please.
>>
>> And a bit of bikeshedding: the names used by AstroPy aren't optimal imho.
>> Maybe:
>> - iters --> numiter
>>
>
> statsmodels uses maxiter in cases like this because it can stop early
>
>
>> - cenfunc --> centerfunc
>>
>
> I think this should correspond to the robust ANOVA function, IIRC
>
>
> names have power and are not bikesheds
>
>
> (default color choices ain't no bikesheds either)
>
>
>
>
I've taken my time about it, but finally got around to finishing the PR here
<https://github.com/scipy/scipy/pull/5559>.

Do let me know if I need to add/remove/throw away/fix anything! I hope I've
done everything correctly - This is the first time I'm submitting a PR to
scipy.

Thanks,
Krishna
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/scipy-dev/attachments/20151129/ec8a9b96/attachment.html>


More information about the SciPy-Dev mailing list