From ilhanpolat at gmail.com Wed Oct 3 15:25:42 2018 From: ilhanpolat at gmail.com (Ilhan Polat) Date: Wed, 3 Oct 2018 21:25:42 +0200 Subject: [SciPy-Dev] Scipy "misc" and "interpolate" function deprecations Message-ID: Hi everyone, As given in the docstrings of many functions http://scipy.github.io/devdocs/misc.html we have deprecated the functions in the misc module and the PR is ready. https://github.com/scipy/scipy/pull/9325 Evgeni also reminded the "interpolate" funcs which are deprecated even earlier and also to give a heads-up to the mailing list. If you have any objections about the schedule or any particular detail please let me know so that we can postpone this to 1.3.0 if need be in a timely manner. Best, ilhan -------------- next part -------------- An HTML attachment was scrubbed... URL: From email.the.odonnells at gmail.com Tue Oct 9 03:39:02 2018 From: email.the.odonnells at gmail.com (Kane & Anna O'Donnell) Date: Tue, 9 Oct 2018 20:39:02 +1300 Subject: [SciPy-Dev] Improving spatial.distance code and performance In-Reply-To: <30c6c7dd-fbe6-4cc2-feac-1d5dbbb96b26@gmail.com> References: <30c6c7dd-fbe6-4cc2-feac-1d5dbbb96b26@gmail.com> Message-ID: OK, after a bit of playing around, it actually looks like faiss supports much of what I was intending (including auto-tuning etc.). It's only the l2 norm / dot product, but I figure those cover most use cases anyway. So maybe I'll submit some PRs to piece-by-piece migrate scipy.spatial to cython etc. On Sun, 16 Sep 2018 at 20:26, Kane & Anna O'Donnell < email.the.odonnells at gmail.com> wrote: > Sorry, flann and faiss were just examples (I haven't actually researched > different libraries in depth). > > > ... approximate distances are probably best left to another package it > looks like to me ... If you want to get really fancy, I'd lean towards a > separate package. > > OK, I want to try things which it sounds like scipy won't support, so, > decision made: I'll aim to create a new package. If I actually do it, and > if it's actually 'good', then there'll be a better discussion point for > integrating it (if at all). > > This older discussion on Flann may be relevant: > https://mail.python.org/pipermail/scipy-dev/2011-May/thread.html. It says > Flann only does Euclidean; not sure if that has changed since then. > Regarding a dependency: https://github.com/mariusmuja/flann is basically > inactivate for the last years; we wouldn't depend on it but could consider > vendoring it. However, probably not worth it if it's for one method inside > euclidean only. > > Faiss is still actively developed: > https://github.com/facebookresearch/faiss, and looks like a much better > option than Flann. However, something fast-moving like that which itself > depends on BLAS and has GPU code in it too is not something we'd like to > depend on nor want to vendor. > > Either way, Flann/Faiss is not about a 1:1 Cython translation, but about > new features. We've got the distance metrics; approximate distances are > probably best left to another package it looks like to me. > > On Mon, Sep 10, 2018 at 10:05 AM Mark Alexander Mikofski < > mikofski at berkeley.edu> wrote: > >> I'm very interested to see how a successful cython/performance PR >> progresses from a reviewers standpoint. >> >> On Sun, Sep 9, 2018, 5:10 PM Tyler Reddy >> wrote: >> >>> Good to see some activity / interest in spatial. 
>>> >>> Definitely agree with Ralf's github comments re: using smaller / more >>> tractable PRs -- it really is tough to sit down at night with 30 minutes of >>> free time or whatever and look at a massive diff & not want to give up. >>> >>> I like the idea of making small / targeted / clearly demonstrated >>> performance improvements without overhauling the entire infrastructure >>> first, but maybe that's a controversial view if it just causes too much >>> heterogeneity. >>> >>> Presumably the affected code is all thoroughly covered by unit tests? >>> That's an important pre-requisite to have the confidence to really start >>> making changes, esp. with older low-level code like that. >>> >>> On Thu, 6 Sep 2018 at 02:29, Kane & Anna O'Donnell < >>> email.the.odonnells at gmail.com> wrote: >>> >>>> Hi all, >>>> >>>> TLDR; I'm wondering about a) porting the spatial.distance code to >>>> Cython, and b) adding some performance optimizations, and I'd like the dev >>>> community's input/feedback. >>>> >>>> For context (though I don't really need you to read these to give the >>>> feedback I'm looking for), >>>> >>>> - original issue/proposal: https://github.com/scipy/scipy/issues/9205 >>>> - PR: https://github.com/scipy/scipy/pull/9218 >>>> >>>> Before submitting the PR, I naively thought it was going to be nice >>>> Cython or similar. Turns out it's some pretty old code, that I found pretty >>>> hard to wrap my head around and understand. I eventually figured it out >>>> after spending ages debugging a nastily hidden 'bug', and demonstrated the >>>> performance optimizations, but it prompted the discussion about whether it >>>> was best to port everything to Cython first. >>>> >>>> *Existing stuff to Cython* >>>> >>>> Doing so shouldn't be too hard, and it shouldn't change any >>>> functionality, except to replace the distance functions with their Cython >>>> ones (instead of the current approach, where the distance functions are >>>> actually numpy things, and there's not supported access to the underlying C >>>> stuff). A few possible 'bugs' (as above) should hopefully become non-issues >>>> too. So, it'd be a win for performance (e.g. all the distance functions >>>> will be much faster), and code quality, and future maintainability and >>>> development. However, things to think about: >>>> >>>> - should I just replace like-for-like, or consider some nicer OOP stuff >>>> like e.g. sklearn's approach (which is already Cython)? >>>> https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/neighbors/dist_metrics.pyx >>>> (I'm guessing the reason they rolled their own was because they found scipy >>>> lacking, as above.) In fact, we could largely just copy that file over. Not >>>> sure about the interplay between scipy and scikit learn though. >>>> >>> > It wouldn't be the first time that code is moved from scikit-learn to > scipy, that could make sense. Would be good to see if that makes sense from > scikit-learn's point of view. > > >> - what's the best way to structure a PR? One metric at a time, or the >>>> whole caboodle? >>>> >>> > How about one metric first (easier to review), and then the rest in one go? > > >>>> *Adding performance optimizations and future functionality* >>>> >>>> As per the links, this was initially about providing a nifty >>>> performance hack. It should still be pretty easy to implement. Personally, >>>> I think it makes sense to implement after the Cythonization - unless the >>>> community are against that. 
>>>> >>>> However, there are other possibilities: >>>> >>>> - using 'indices' within calculations. E.g. when doing pdist, it might >>>> pay to use a tree of some description. I also proposed another 'index' to >>>> optimize the 'bail early' approach further (which would, I think, actually >>>> work well with trees too). This would involve more API changes, possibly >>>> significant. >>>> >>> - using approximate searches (e.g. Faiss). My understanding is that >>>> depending on other libraries probably isn't really an option, so I'm not >>>> sure what means. >>>> - maybe other cool approaches like https://github.com/droyed/eucl_dist >>>> - providing a way to 'tune' distance computations to be optimal to your >>>> particular dataset and constraints (e.g. my 'bail early' optimization might >>>> be a lot faster or a bit slower, depending on your data ... or you might be >>>> OK with approximate matching with a 'low' error rate, etc.) >>>> >>> > I think some of these are an option; they'd need to be applicable to all > distance metrics though and not just euclidean or a small subset. In the > mailing list thread I linked to above there was some discussion as well > about using kdtree/balltree. > > >>>> I guess what I'd like to see is a single place where users can get >>>> access to everything related to distance metrics and their uses, including >>>> all sorts of optimizations etc. (possibly for different hardware, and >>>> possibly distributed). To do that well is a pretty big undertaking, and I >>>> don't know whether it's suited to scipy - e.g. maybe scipy doesn't really >>>> care about distance stuff, or only wants to stick with 'simple' distance >>>> metric cases (e.g. a few thousand vectors, etc.). So, maybe it'd be better >>>> to start a completely new python package - which would probably be a lot >>>> easier to develop as I'd have a lot more flexibility (e.g. to depend on >>>> other packages, and not have to worry about breaking the scipy API etc.). >>>> On the other hand (as discussed in the latest comment on the PR), that >>>> might not be best - it might never get used/maintained etc. >>>> >>> > If you want to get really fancy, I'd lean towards a separate package. The > idea of your current PR is in scopy for scipy.spatial though. We'd also be > happy to link to a separate package from the scipy docs. > > Cheers, > Ralf > > > >>>> So, what does the community think is the best approach? I've got too >>>> little context of what scipy is and what it's aiming for, and I don't want >>>> to head off on the wrong tack. Comments on any of the other implied >>>> questions would also be appreciated. >>>> >>>> Thanks, >>>> >>>> kodonnell >>>> >>>> >>>> _______________________________________________ >>>> SciPy-Dev mailing list >>>> SciPy-Dev at python.org >>>> https://mail.python.org/mailman/listinfo/scipy-dev >>>> >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at python.org >>> https://mail.python.org/mailman/listinfo/scipy-dev >>> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at python.org >> https://mail.python.org/mailman/listinfo/scipy-dev >> > > > _______________________________________________ > SciPy-Dev mailing listSciPy-Dev at python.orghttps://mail.python.org/mailman/listinfo/scipy-dev > > > -------------- next part -------------- An HTML attachment was scrubbed... 
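The "bail early" optimization discussed in the thread above amounts to stopping a distance computation as soon as the running sum of squared differences already exceeds a caller-supplied threshold. A rough Python sketch of the idea (hypothetical helper name, not the code from the PR):

import math

def euclidean_bail_early(u, v, threshold):
    # Accumulate squared differences, but give up as soon as the partial sum
    # exceeds threshold**2: the pair can then be discarded without finishing
    # the computation.
    limit = threshold * threshold
    acc = 0.0
    for a, b in zip(u, v):
        diff = a - b
        acc += diff * diff
        if acc > limit:
            return None  # known to be farther apart than `threshold`
    return math.sqrt(acc)

Whether this wins depends on the data and on how tight the threshold is, which is why the thread also discusses making such optimizations tunable.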
URL: From phillip.m.feldman at gmail.com Sun Oct 14 14:19:32 2018 From: phillip.m.feldman at gmail.com (Phillip Feldman) Date: Sun, 14 Oct 2018 11:19:32 -0700 Subject: [SciPy-Dev] Poisson Disk Sampling Message-ID: Does anyone have code that does efficient subrandom sampling of the surface of a sphere? I'm looking, e.g., for an implementation of the algorithm in https://www.cs.ubc.ca/~rbridson/docs/bridson-siggraph07-poissondisk.pdf, or something similar. Thanks! Phillip -------------- next part -------------- An HTML attachment was scrubbed... URL: From andyfaff at gmail.com Sun Oct 14 16:56:23 2018 From: andyfaff at gmail.com (Andrew Nelson) Date: Sun, 14 Oct 2018 22:56:23 +0200 Subject: [SciPy-Dev] Poisson Disk Sampling In-Reply-To: References: Message-ID: Is http://mathworld.wolfram.com/SpherePointPicking.html relevant? This seems relatively simple. On Sun., 14 Oct. 2018, 20:20 Phillip Feldman, wrote: > Does anyone have code that does efficient subrandom sampling of the > surface of a sphere? I'm looking, e.g., for an implementation of the > algorithm in > https://www.cs.ubc.ca/~rbridson/docs/bridson-siggraph07-poissondisk.pdf, > or something similar. > > Thanks! > > Phillip > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tyler.je.reddy at gmail.com Sun Oct 14 19:31:33 2018 From: tyler.je.reddy at gmail.com (Tyler Reddy) Date: Sun, 14 Oct 2018 16:31:33 -0700 Subject: [SciPy-Dev] Poisson Disk Sampling In-Reply-To: References: Message-ID: Yeah, that's an excellent page--many of the approaches are implemented on Stack Overflow too -- I used a few of them while developing SphericalVoronoi. On Sun, 14 Oct 2018 at 13:57, Andrew Nelson wrote: > Is http://mathworld.wolfram.com/SpherePointPicking.html relevant? This > seems relatively simple. > > On Sun., 14 Oct. 2018, 20:20 Phillip Feldman, > wrote: > >> Does anyone have code that does efficient subrandom sampling of the >> surface of a sphere? I'm looking, e.g., for an implementation of the >> algorithm in >> https://www.cs.ubc.ca/~rbridson/docs/bridson-siggraph07-poissondisk.pdf, >> or something similar. >> >> Thanks! >> >> Phillip >> >> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at python.org >> https://mail.python.org/mailman/listinfo/scipy-dev >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at gmail.com Sun Oct 14 19:46:40 2018 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Sun, 14 Oct 2018 19:46:40 -0400 Subject: [SciPy-Dev] Poisson Disk Sampling In-Reply-To: References: Message-ID: On 10/14/18, Andrew Nelson wrote: > Is http://mathworld.wolfram.com/SpherePointPicking.html relevant? This > seems relatively simple. Phillip is asking about *subrandom* samples, also known as low-discrepancy or quasi-random samples; see https://en.wikipedia.org/wiki/Low-discrepancy_sequence. Warren > > On Sun., 14 Oct. 2018, 20:20 Phillip Feldman, > wrote: > >> Does anyone have code that does efficient subrandom sampling of the >> surface of a sphere? 
I'm looking, e.g., for an implementation of the >> algorithm in >> https://www.cs.ubc.ca/~rbridson/docs/bridson-siggraph07-poissondisk.pdf, >> or something similar. >> >> Thanks! >> >> Phillip >> >> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at python.org >> https://mail.python.org/mailman/listinfo/scipy-dev >> > From tyler.je.reddy at gmail.com Sun Oct 14 23:00:57 2018 From: tyler.je.reddy at gmail.com (Tyler Reddy) Date: Sun, 14 Oct 2018 20:00:57 -0700 Subject: [SciPy-Dev] Poisson Disk Sampling In-Reply-To: References: Message-ID: Ah, maybe that's another story then On Sun, 14 Oct 2018 at 16:47, Warren Weckesser wrote: > On 10/14/18, Andrew Nelson wrote: > > Is http://mathworld.wolfram.com/SpherePointPicking.html relevant? This > > seems relatively simple. > > > Phillip is asking about *subrandom* samples, also known as > low-discrepancy or quasi-random samples; see > https://en.wikipedia.org/wiki/Low-discrepancy_sequence. > > Warren > > > > > > On Sun., 14 Oct. 2018, 20:20 Phillip Feldman, < > phillip.m.feldman at gmail.com> > > wrote: > > > >> Does anyone have code that does efficient subrandom sampling of the > >> surface of a sphere? I'm looking, e.g., for an implementation of the > >> algorithm in > >> https://www.cs.ubc.ca/~rbridson/docs/bridson-siggraph07-poissondisk.pdf > , > >> or something similar. > >> > >> Thanks! > >> > >> Phillip > >> > >> > >> > >> _______________________________________________ > >> SciPy-Dev mailing list > >> SciPy-Dev at python.org > >> https://mail.python.org/mailman/listinfo/scipy-dev > >> > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From oneday2one at icloud.com Mon Oct 15 05:53:57 2018 From: oneday2one at icloud.com (Jon Stein) Date: Mon, 15 Oct 2018 09:53:57 +0000 (GMT) Subject: [SciPy-Dev] two new scipy.stats requests code included: Message-ID: <34111503-1aff-46d5-a4ea-8bb8fd912b94@me.com> Scipy-dev, Two additions to the scipy.stats module are missing and needed: One addition is needed for a one sample z-test including confidence interval when the population mean and standard deviation are known:

def ztest(array_A, population_mean, population_stdv, level_of_confidence=0.95):
    z_statistic = (array_A.mean() - population_mean) / (population_stdv / math.sqrt(len(array_A)))
    p_value = st.norm.cdf(z_statistic)
    standard_error = population_stdv / math.sqrt(len(array_A))
    margin_of_error = st.norm.ppf(level_of_confidence) * standard_error
    MoE = margin_of_error
    return('z statistic =', z_statistic, 'p-value =', p_value, array_A.mean() - MoE, array_A.mean() + MoE)

And one addition is needed for a one-sample z-test for a categorical sample (*not quantitative*):

def ztest_1sample_categorical(sample_proportion, population_proportion, sample_size):
    sp, pp = sample_proportion, population_proportion
    z = (sp - pp) / math.sqrt((pp * (1 - pp)) / sample_size)
    p = st.norm.cdf(z)
    return('z statistic =', z, 'p value =', p)

Let me know what you think. Jon Stein -------------- next part -------------- An HTML attachment was scrubbed...
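One way the z-test proposed above could return a named tuple instead of a flat tuple of labels and numbers (a sketch only, with hypothetical names ZtestResult and ztest_1samp; it also uses a two-sided p-value and confidence interval rather than the one-sided st.norm.cdf above):

import math
from collections import namedtuple
from scipy import stats

ZtestResult = namedtuple('ZtestResult', ['statistic', 'pvalue', 'ci_lower', 'ci_upper'])

def ztest_1samp(sample, popmean, popstd, confidence=0.95):
    # One-sample z-test when the population standard deviation is known.
    n = len(sample)
    mean = sum(sample) / n
    se = popstd / math.sqrt(n)
    z = (mean - popmean) / se
    p = 2 * stats.norm.sf(abs(z))                       # two-sided p-value
    margin = stats.norm.ppf(0.5 + confidence / 2) * se  # half-width of the CI
    return ZtestResult(z, p, mean - margin, mean + margin)

Which sidedness to expose, and whether a confidence interval belongs in a test result at all, is part of what the replies below discuss.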
URL: From pmhobson at gmail.com Mon Oct 15 12:45:30 2018 From: pmhobson at gmail.com (Paul Hobson) Date: Mon, 15 Oct 2018 09:45:30 -0700 Subject: [SciPy-Dev] two new scipy.stats requests code included: In-Reply-To: <34111503-1aff-46d5-a4ea-8bb8fd912b94@me.com> References: <34111503-1aff-46d5-a4ea-8bb8fd912b94@me.com> Message-ID: Hey Jon, To incorporate this into scipy, you'll need to open a pull request on GitHub: https://github.com/scipy/scipy I'm not a scipy contributor, but I can tell you that you'll also need to include tests that preferably use a (small) published dataset and confirm that your function reproduce the published results. Also, I don't think your return statements are behaving the way you think they are. I believe that the preference is now to return a NamedTuple. Hope that helps, -Paul On Mon, Oct 15, 2018 at 2:54 AM Jon Stein wrote: > Scipy-dev, > > Two additions to the scipy.stats module are missing and needed: > > One addition is needed for a one sample z-test including confidence > interval when the population mean and standard deviation are known: > > def ztest(array_A, population_mean, population_stdv, level_of_confidence~*example: > .95*): > z_statistic = (array_A.mean() - population_stdv) / (population_stdv / > math.sqrt(len(array_A))) > p_value = (st.norm.cdf(z_stat)) > standard_error = population_stdv / math.sqrt(len(array_A)) > margin_of_error = st.norm.ppf(level_of_confidence) * standard_error > MoE = margin_of_error > return('z statistic =', z_statistic, 'p-value =', p_value, > array_A.mean() - MoE, array_A.mean() + MoE) > > And one addition is needed for a one-sample z-test for a categorical > sample (*not quantitative*): > > def ztest_1sample_categorical(sample_proportion, population_proportion, > sample_size): > sp, pp = sample_proportion, population_proportion > z = (sp - pp) / math.sqrt((pp * (1 - pp)) / sample_size) > p = st.norm.cdf(z) > return('z statistic =', z, 'p value =', p) > > Let me know what you think. > Jon Stein > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon Oct 15 13:10:07 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 15 Oct 2018 13:10:07 -0400 Subject: [SciPy-Dev] two new scipy.stats requests code included: In-Reply-To: References: <34111503-1aff-46d5-a4ea-8bb8fd912b94@me.com> Message-ID: On Mon, Oct 15, 2018 at 12:45 PM Paul Hobson wrote: > Hey Jon, > > To incorporate this into scipy, you'll need to open a pull request on > GitHub: > https://github.com/scipy/scipy > > I'm not a scipy contributor, but I can tell you that you'll also need to > include tests that preferably use a (small) published dataset and confirm > that your function reproduce the published results. > > Also, I don't think your return statements are behaving the way you think > they are. I believe that the preference is now to return a NamedTuple. 
> > Hope that helps, > -Paul > > > > On Mon, Oct 15, 2018 at 2:54 AM Jon Stein wrote: > >> Scipy-dev, >> >> Two additions to the scipy.stats module are missing and needed: >> >> One addition is needed for a one sample z-test including confidence >> interval when the population mean and standard deviation are known: >> >> def ztest(array_A, population_mean, population_stdv, level_of_confidence=0.95): >> z_statistic = (array_A.mean() - population_mean) / (population_stdv / >> math.sqrt(len(array_A))) >> p_value = st.norm.cdf(z_statistic) >> standard_error = population_stdv / math.sqrt(len(array_A)) >> margin_of_error = st.norm.ppf(level_of_confidence) * standard_error >> MoE = margin_of_error >> return('z statistic =', z_statistic, 'p-value =', p_value, >> array_A.mean() - MoE, array_A.mean() + MoE) >> >> And one addition is needed for a one-sample z-test for a categorical >> sample (*not quantitative*): >> >> def ztest_1sample_categorical(sample_proportion, population_proportion, >> sample_size): >> sp, pp = sample_proportion, population_proportion >> z = (sp - pp) / math.sqrt((pp * (1 - pp)) / sample_size) >> p = st.norm.cdf(z) >> return('z statistic =', z, 'p value =', p) >> >> Let me know what you think. >> Jon Stein >> > I think some discussion and decisions are needed for whether and how to add this. None of the hypothesis tests currently returns a confidence interval. Tuples are a pain because we cannot just return additional results without breaking backwards compatibility. Both ztests are based on summary statistics, for which scipy.stats already has some cases. Adding special cases like ztest_1sample_categorical opens up a large set of statistical functions that could similarly be added, e.g. for Poisson rates. Additionally some tests have a choice of methods across stats packages, e.g. using pp corresponds to a score test (variance under the Null). An alternative is to use the variance based on sp, which corresponds to a Wald test. In the statsmodels version there is an extra option, but it doesn't have the correct default. For a two sample version for comparing proportions, the number of options and available methods becomes much larger. (Development for this in statsmodels is slow because I only find time every once in a while to review or prepare PRs https://github.com/statsmodels/statsmodels/pull/4829 ) I think some overlap in basic statistics functions between scipy.stats and statsmodels is useful. However, the question where to draw the boundary is always open. Josef > >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at python.org >> https://mail.python.org/mailman/listinfo/scipy-dev >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From oneday2one at icloud.com Mon Oct 15 20:25:44 2018 From: oneday2one at icloud.com (Jon Stein) Date: Tue, 16 Oct 2018 00:25:44 +0000 (GMT) Subject: [SciPy-Dev] two new scipy.stats requests code included: Message-ID: <6be6b0e2-d1d1-4540-9656-da76c3d9ce06@me.com> Thank you for both your replies. I am very grateful. There is no basic z-test or confidence interval in StatsModels or Scipy.stats. I was forced to create this to pass two recent statistics courses I took online with Stanford and Carnegie Mellon. The first is for quantitative data. The second is for qualitative (categorical) data.
def ztest(array1, array2_mean, array2_std, confidence_desired):
    z = (array1.mean() - array2_mean) / (array2_std / math.sqrt(len(array1)))
    p = st.norm.cdf(z)
    standard_error = array2_std / math.sqrt(len(array1))
    Margin_of_Error = st.norm.ppf(confidence_desired) * standard_error
    return(z, p, array1.mean() - Margin_of_Error, array1.mean() + Margin_of_Error)

def ztest_categorical(proportion1, proportion2, proportion1_sample_size):
    z = (proportion1 - proportion2) / math.sqrt(proportion2 * (1 - proportion2) / proportion1_sample_size)
    p = st.norm.cdf(z)
    return(z, p)

Let me know what you think. Jon Stein

On Oct 15, 2018, at 01:11 PM, josef.pktd at gmail.com wrote: On Mon, Oct 15, 2018 at 12:45 PM Paul Hobson wrote: Hey Jon, To incorporate this into scipy, you'll need to open a pull request on GitHub: https://github.com/scipy/scipy I'm not a scipy contributor, but I can tell you that you'll also need to include tests that preferably use a (small) published dataset and confirm that your function reproduce the published results. Also, I don't think your return statements are behaving the way you think they are. I believe that the preference is now to return a NamedTuple. Hope that helps, -Paul On Mon, Oct 15, 2018 at 2:54 AM Jon Stein wrote: Scipy-dev, Two additions to the scipy.stats module are missing and needed: One addition is needed for a one sample z-test including confidence interval when the population mean and standard deviation are known: def ztest(array_A, population_mean, population_stdv, level_of_confidence=0.95): z_statistic = (array_A.mean() - population_mean) / (population_stdv / math.sqrt(len(array_A))) p_value = st.norm.cdf(z_statistic) standard_error = population_stdv / math.sqrt(len(array_A)) margin_of_error = st.norm.ppf(level_of_confidence) * standard_error MoE = margin_of_error return('z statistic =', z_statistic, 'p-value =', p_value, array_A.mean() - MoE, array_A.mean() + MoE) And one addition is needed for a one-sample z-test for a categorical sample (*not quantitative*): def ztest_1sample_categorical(sample_proportion, population_proportion, sample_size): sp, pp = sample_proportion, population_proportion z = (sp - pp) / math.sqrt((pp * (1 - pp)) / sample_size) p = st.norm.cdf(z) return('z statistic =', z, 'p value =', p) Let me know what you think. Jon Stein I think some discussion and decisions are needed for whether and how to add this. None of the hypothesis tests currently returns a confidence interval. Tuples are a pain because we cannot just return additional results without breaking backwards compatibility. Both ztests are based on summary statistics, for which scipy.stats already has some cases. Adding special cases like ztest_1sample_categorical opens up a large set of statistical functions that could similarly be added, e.g. for Poisson rates. Additionally some tests have a choice of methods across stats packages, e.g. using pp corresponds to a score test (variance under the Null). An alternative is to use the variance based on sp, which corresponds to a Wald test. In the statsmodels version there is an extra option, but it doesn't have the correct default. For a two sample version for comparing proportions, the number of options and available methods becomes much larger.
(Development for this in statsmodels is slow because I only find time every once in a while to review or prepare PRs https://github.com/statsmodels/statsmodels/pull/4829 ) I think some overlap in basic statistics functions between scipy.stats and statsmodels is useful. However, the question where to draw the boundary is always open. Josef ? _______________________________________________ SciPy-Dev mailing list SciPy-Dev at python.org https://mail.python.org/mailman/listinfo/scipy-dev _______________________________________________ SciPy-Dev mailing list SciPy-Dev at python.org https://mail.python.org/mailman/listinfo/scipy-dev _______________________________________________ SciPy-Dev mailing list SciPy-Dev at python.org https://mail.python.org/mailman/listinfo/scipy-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Wed Oct 17 20:42:32 2018 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 17 Oct 2018 17:42:32 -0700 Subject: [SciPy-Dev] Poisson Disk Sampling In-Reply-To: References: Message-ID: This article describes a new quasirandom scheme that is easy and efficient to implement, and works nicely on the surface of a sphere through transformation: http://extremelearning.com.au/unreasonable-effectiveness-of-quasirandom-sequences/ The transformation should be applicable to any (quasi)random scheme that generates numbers uniformly over [0,1]^2. On Sun, Oct 14, 2018 at 11:20 AM Phillip Feldman < phillip.m.feldman at gmail.com> wrote: > Does anyone have code that does efficient subrandom sampling of the > surface of a sphere? I'm looking, e.g., for an implementation of the > algorithm in > https://www.cs.ubc.ca/~rbridson/docs/bridson-siggraph07-poissondisk.pdf, > or something similar. > > Thanks! > > Phillip > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From phillip.m.feldman at gmail.com Thu Oct 18 00:35:24 2018 From: phillip.m.feldman at gmail.com (Phillip Feldman) Date: Wed, 17 Oct 2018 21:35:24 -0700 Subject: [SciPy-Dev] Poisson Disk Sampling In-Reply-To: References: Message-ID: This is indeed very interesting. Thanks! P.S. I don't know of a clean mapping between [0, 1]^2 and the surface of the sphere. (This is a problem that cartographers have struggled with for a few hundred years). But, there is a simple mapping from [-1, 1]^3 to the surface of the sphere, so I will explore that. On Wed, Oct 17, 2018 at 5:43 PM Robert Kern wrote: > This article describes a new quasirandom scheme that is easy and efficient > to implement, and works nicely on the surface of a sphere through > transformation: > > > http://extremelearning.com.au/unreasonable-effectiveness-of-quasirandom-sequences/ > > The transformation should be applicable to any (quasi)random scheme that > generates numbers uniformly over [0,1]^2. > > On Sun, Oct 14, 2018 at 11:20 AM Phillip Feldman < > phillip.m.feldman at gmail.com> wrote: > >> Does anyone have code that does efficient subrandom sampling of the >> surface of a sphere? I'm looking, e.g., for an implementation of the >> algorithm in >> https://www.cs.ubc.ca/~rbridson/docs/bridson-siggraph07-poissondisk.pdf, >> or something similar. >> >> Thanks! 
>> >> Phillip >> >> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at python.org >> https://mail.python.org/mailman/listinfo/scipy-dev >> > > > -- > Robert Kern > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Thu Oct 18 00:43:52 2018 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 17 Oct 2018 21:43:52 -0700 Subject: [SciPy-Dev] Poisson Disk Sampling In-Reply-To: References: Message-ID: On Wed, Oct 17, 2018 at 9:36 PM Phillip Feldman wrote: > This is indeed very interesting. Thanks! > > P.S. I don't know of a clean mapping between [0, 1]^2 and the surface of > the sphere. (This is a problem that cartographers have struggled with for > a few hundred years). But, there is a simple mapping from [-1, 1]^3 to the > surface of the sphere, so I will explore that. > See the section "Quasirandom Points on a sphere" in that article for the details. > On Wed, Oct 17, 2018 at 5:43 PM Robert Kern wrote: > >> This article describes a new quasirandom scheme that is easy and >> efficient to implement, and works nicely on the surface of a sphere through >> transformation: >> >> >> http://extremelearning.com.au/unreasonable-effectiveness-of-quasirandom-sequences/ >> >> The transformation should be applicable to any (quasi)random scheme that >> generates numbers uniformly over [0,1]^2. >> >> On Sun, Oct 14, 2018 at 11:20 AM Phillip Feldman < >> phillip.m.feldman at gmail.com> wrote: >> >>> Does anyone have code that does efficient subrandom sampling of the >>> surface of a sphere? I'm looking, e.g., for an implementation of the >>> algorithm in >>> https://www.cs.ubc.ca/~rbridson/docs/bridson-siggraph07-poissondisk.pdf, >>> or something similar. >>> >>> Thanks! >>> >>> Phillip >>> >>> >>> >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at python.org >>> https://mail.python.org/mailman/listinfo/scipy-dev >>> >> >> >> -- >> Robert Kern >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at python.org >> https://mail.python.org/mailman/listinfo/scipy-dev >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From phillip.m.feldman at gmail.com Thu Oct 18 01:32:49 2018 From: phillip.m.feldman at gmail.com (Phillip Feldman) Date: Wed, 17 Oct 2018 22:32:49 -0700 Subject: [SciPy-Dev] Poisson Disk Sampling In-Reply-To: References: Message-ID: I should have read the whole thing. The equal-area projection does indeed do the job. (Conformality is unnecessary for this application). Thanks again! On Wed, Oct 17, 2018 at 9:44 PM Robert Kern wrote: > On Wed, Oct 17, 2018 at 9:36 PM Phillip Feldman < > phillip.m.feldman at gmail.com> wrote: > >> This is indeed very interesting. Thanks! >> >> P.S. I don't know of a clean mapping between [0, 1]^2 and the surface of >> the sphere. (This is a problem that cartographers have struggled with for >> a few hundred years). But, there is a simple mapping from [-1, 1]^3 to the >> surface of the sphere, so I will explore that. >> > > See the section "Quasirandom Points on a sphere" in that article for the > details. 
> > >> On Wed, Oct 17, 2018 at 5:43 PM Robert Kern >> wrote: >> >>> This article describes a new quasirandom scheme that is easy and >>> efficient to implement, and works nicely on the surface of a sphere through >>> transformation: >>> >>> >>> http://extremelearning.com.au/unreasonable-effectiveness-of-quasirandom-sequences/ >>> >>> The transformation should be applicable to any (quasi)random scheme that >>> generates numbers uniformly over [0,1]^2. >>> >>> On Sun, Oct 14, 2018 at 11:20 AM Phillip Feldman < >>> phillip.m.feldman at gmail.com> wrote: >>> >>>> Does anyone have code that does efficient subrandom sampling of the >>>> surface of a sphere? I'm looking, e.g., for an implementation of the >>>> algorithm in >>>> https://www.cs.ubc.ca/~rbridson/docs/bridson-siggraph07-poissondisk.pdf, >>>> or something similar. >>>> >>>> Thanks! >>>> >>>> Phillip >>>> >>>> >>>> >>>> _______________________________________________ >>>> SciPy-Dev mailing list >>>> SciPy-Dev at python.org >>>> https://mail.python.org/mailman/listinfo/scipy-dev >>>> >>> >>> >>> -- >>> Robert Kern >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at python.org >>> https://mail.python.org/mailman/listinfo/scipy-dev >>> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at python.org >> https://mail.python.org/mailman/listinfo/scipy-dev >> > > > -- > Robert Kern > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefanv at berkeley.edu Thu Oct 18 01:44:42 2018 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Wed, 17 Oct 2018 22:44:42 -0700 Subject: [SciPy-Dev] Poisson Disk Sampling In-Reply-To: References: Message-ID: <16685b45d10.27ae.acf34a9c767d7bb498a799333be0433e@fastmail.com> We have a few other schemes here: https://github.com/fperez/spheredwi/blob/master/src/point_dist.py And charged particles can be found in DiPy. Not the sampling you mention, but perhaps helpful in that context. St?fan On October 17, 2018 17:43:27 Robert Kern wrote: > This article describes a new quasirandom scheme that is easy and efficient > to implement, and works nicely on the surface of a sphere through > transformation: > > http://extremelearning.com.au/unreasonable-effectiveness-of-quasirandom-sequences/ > > The transformation should be applicable to any (quasi)random scheme that > generates numbers uniformly over [0,1]^2. > > On Sun, Oct 14, 2018 at 11:20 AM Phillip Feldman > wrote: > Does anyone have code that does efficient subrandom sampling of the surface > of a sphere? I'm looking, e.g., for an implementation of the algorithm in > https://www.cs.ubc.ca/~rbridson/docs/bridson-siggraph07-poissondisk.pdf, or > something similar. > > Thanks! > > Phillip > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > > > -- > Robert Kern > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev -------------- next part -------------- An HTML attachment was scrubbed... 
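A compact sketch of the approach Robert links to and Phillip settles on: generate a low-discrepancy sequence in [0, 1]^2 (the R2 sequence described in that article) and push it onto the sphere with the Lambert equal-area mapping. The function name is hypothetical, for illustration only:

import numpy as np

def quasirandom_sphere(n):
    # R2 low-discrepancy points in [0, 1]^2, mapped area-preservingly onto
    # the unit sphere: longitude = 2*pi*u, z = 2*v - 1.
    g = 1.32471795724474602596           # plastic constant
    k = np.arange(1, n + 1)
    u = (k / g) % 1.0
    v = (k / g ** 2) % 1.0
    theta = 2.0 * np.pi * u              # longitude
    z = 2.0 * v - 1.0                    # uniform in [-1, 1]
    r = np.sqrt(1.0 - z ** 2)
    return np.column_stack((r * np.cos(theta), r * np.sin(theta), z))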
URL: From ali.cetin at outlook.com Thu Oct 18 06:40:50 2018 From: ali.cetin at outlook.com (Ali Cetin) Date: Thu, 18 Oct 2018 10:40:50 +0000 Subject: [SciPy-Dev] Finding signal upcrossings and declustered peaks Message-ID: Hi, this is my first attempt to write to this mailing list, so if I'm doing something wrong please bear with me. I've been working with extreme value analysis during my PhD and also in my current job. What we often do is to find signal upcrossings (such as mean and zero upcrossings) and find declustered peaks. That is, find the largest peak between two upcrossings, or the two largest peaks between two upcrossings etc... I'm proposing to add two new functions to the scipy.signal submodule: - find_upcross; takes in signal and returns index of upcrossings wrt user defined upcrossing level - find_peaks_dc; takes in signal and returns index of peaks (n largest) (find_peaks_dc is not necessarly easily incorporated into find_peaks, so it may be cleaner to have a separate function). Any thought on this? Cheers, Ali -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Thu Oct 18 15:59:02 2018 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Thu, 18 Oct 2018 19:59:02 +0000 Subject: [SciPy-Dev] Finding signal upcrossings and declustered peaks In-Reply-To: References: Message-ID: On Thu, Oct 18, 2018 at 10:41 AM Ali Cetin wrote: > Hi, > > this is my first attempt to write to this mailing list, so if I'm doing > something wrong please bear with me. > Hi Ali, no worries you got it all right. welcome:) > > I've been working with extreme value analysis during my PhD and also in my > current job. What we often do is to find signal upcrossings (such as mean > and zero upcrossings) and find declustered peaks. That is, find the largest > peak between two upcrossings, or the two largest peaks between two > upcrossings etc... > > I'm proposing to add two new functions to the scipy.signal submodule: > - find_upcross; takes in signal and returns index of upcrossings wrt > user defined upcrossing level > - find_peaks_dc; takes in signal and returns index of peaks (n largest) > > (find_peaks_dc is not necessarly easily incorporated into find_peaks, so > it may be cleaner to have a separate function). > > Any thought on this? > Do you have any references for the algorithms? Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ali.cetin at outlook.com Thu Oct 18 16:52:55 2018 From: ali.cetin at outlook.com (Ali Cetin) Date: Thu, 18 Oct 2018 20:52:55 +0000 Subject: [SciPy-Dev] Finding signal upcrossings and declustered peaks In-Reply-To: References: , Message-ID: ________________________________ From: SciPy-Dev on behalf of Ralf Gommers Sent: Thursday, October 18, 2018 21:59 To: SciPy Developers List Subject: Re: [SciPy-Dev] Finding signal upcrossings and declustered peaks On Thu, Oct 18, 2018 at 10:41 AM Ali Cetin > wrote: Hi, this is my first attempt to write to this mailing list, so if I'm doing something wrong please bear with me. Hi Ali, no worries you got it all right. welcome:) I've been working with extreme value analysis during my PhD and also in my current job. What we often do is to find signal upcrossings (such as mean and zero upcrossings) and find declustered peaks. That is, find the largest peak between two upcrossings, or the two largest peaks between two upcrossings etc... 
I'm proposing to add two new functions to the scipy.signal submodule: - find_upcross; takes in signal and returns index of upcrossings wrt user defined upcrossing level - find_peaks_dc; takes in signal and returns index of peaks (n largest) (find_peaks_dc is not necessarly easily incorporated into find_peaks, so it may be cleaner to have a separate function). Any thought on this? Do you have any references for the algorithms? Well, the "algorithms" are rather straight forward and heuristic. - find_upcross is often referred to as zero-crossing (https://en.wikipedia.org/wiki/Zero_crossing) (a special case). It is essentially detecting sign changes. Upcrossing count and rates are important statistics in time series (extreme value) analysis. Side note: Zero-crossing rate is by def the ratio between the second and zeroth moment of the power spectrum of the signal. - find_peaks_dc finds all peaks above an upcrossing level (or threshold) between two consecutive upcrossings, i.e. batches of peaks. Then the n-largest peaks are selected from each batch. Declustering is a technique often used to break down statistical dependency between peaks when performing extreme value analysis, and thus be able to use simpler distributions to describe them. As n-> inf find_peaks_dc -> find_peaks ? (Albeit in many practical situations, inf < 5). (Note that find_peaks_dc depends on find_upcross). The code are just a few lines and mostly numpy array operations. Ali -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniele at grinta.net Thu Oct 18 16:48:52 2018 From: daniele at grinta.net (Daniele Nicolodi) Date: Thu, 18 Oct 2018 14:48:52 -0600 Subject: [SciPy-Dev] Strange code in scipy.signal.decimate Message-ID: <795609c2-501f-6932-7013-2f2e2f38b996@grinta.net> Hello, I was having a look at the scipy.signal.decimate() function and I noticed some strange looking code: elif ftype == 'iir': if n is None: n = 8 system = dlti(*cheby1(n, 0.05, 0.8 / q)) b, a = system.num, system.den This is used to setup the anti aliasing low pass filter. What I don't understand is the dance to obtain the IIR numerator and denominator coefficients. Couldn't the above be simply as the code below? elif ftype == 'iir': if n is None: n = 8 b, a = cheby1(n, 0.05, 0.8 / q) Am I missing something? Thanks! Cheers, Dan From pmhobson at gmail.com Thu Oct 18 17:28:47 2018 From: pmhobson at gmail.com (Paul Hobson) Date: Thu, 18 Oct 2018 14:28:47 -0700 Subject: [SciPy-Dev] Strange code in scipy.signal.decimate In-Reply-To: <795609c2-501f-6932-7013-2f2e2f38b996@grinta.net> References: <795609c2-501f-6932-7013-2f2e2f38b996@grinta.net> Message-ID: Dan, I might be missing something, but dlti returns a subclass of LinearTimeInvariant. Point is, it's not a tuple, but a class with multiple attributes and method. Simple tuple unpacking won't likely work unless the authors of LinearTimeInvariant really went out of their way to make it so. -Paul On Thu, Oct 18, 2018 at 1:58 PM Daniele Nicolodi wrote: > Hello, > > I was having a look at the scipy.signal.decimate() function and I > noticed some strange looking code: > > elif ftype == 'iir': > if n is None: > n = 8 > system = dlti(*cheby1(n, 0.05, 0.8 / q)) > b, a = system.num, system.den > > This is used to setup the anti aliasing low pass filter. What I don't > understand is the dance to obtain the IIR numerator and denominator > coefficients. Couldn't the above be simply as the code below? 
> > elif ftype == 'iir': > if n is None: > n = 8 > b, a = cheby1(n, 0.05, 0.8 / q) > > Am I missing something? > > Thanks! > > Cheers, > Dan > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniele at grinta.net Thu Oct 18 17:49:00 2018 From: daniele at grinta.net (Daniele Nicolodi) Date: Thu, 18 Oct 2018 15:49:00 -0600 Subject: [SciPy-Dev] Strange code in scipy.signal.decimate In-Reply-To: References: <795609c2-501f-6932-7013-2f2e2f38b996@grinta.net> Message-ID: <72c45861-1e27-d2c1-28b6-4f88cb208064@grinta.net> On 18-10-2018 15:28, Paul Hobson wrote: > Dan, > > I might be missing something, but dlti returns a subclass of > LinearTimeInvariant. Point is, it's not a tuple, but a class with > multiple attributes and method. Simple tuple unpacking won't likely work > unless the authors of LinearTimeInvariant really went out of their way > to make it so. The point is not to go through the `dlti` class at all. Please look at the replacement code I posted: it works. Cheers, Dan > -Paul > > On Thu, Oct 18, 2018 at 1:58 PM Daniele Nicolodi > wrote: > > Hello, > > I was having a look at the scipy.signal.decimate() function and I > noticed some strange looking code: > > ? ? elif ftype == 'iir': > ? ? ? ? if n is None: > ? ? ? ? ? ? n = 8 > ? ? ? ? system = dlti(*cheby1(n, 0.05, 0.8 / q)) > ? ? ? ? b, a = system.num, system.den > > This is used to setup the anti aliasing low pass filter. What I don't > understand is the dance to obtain the IIR numerator and denominator > coefficients. Couldn't the above be simply as the code below? > > ? ? elif ftype == 'iir': > ? ? ? ? if n is None: > ? ? ? ? ? ? n = 8 > ? ? ? ? b, a = cheby1(n, 0.05, 0.8 / q) > > Am I missing something? > > Thanks! > > Cheers, > Dan > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > From pmhobson at gmail.com Thu Oct 18 18:30:24 2018 From: pmhobson at gmail.com (Paul Hobson) Date: Thu, 18 Oct 2018 15:30:24 -0700 Subject: [SciPy-Dev] Strange code in scipy.signal.decimate In-Reply-To: <72c45861-1e27-d2c1-28b6-4f88cb208064@grinta.net> References: <795609c2-501f-6932-7013-2f2e2f38b996@grinta.net> <72c45861-1e27-d2c1-28b6-4f88cb208064@grinta.net> Message-ID: If you make the changes and run the test suite, do the pertinent tests pass? -Paul On Thu, Oct 18, 2018 at 2:49 PM Daniele Nicolodi wrote: > On 18-10-2018 15:28, Paul Hobson wrote: > > Dan, > > > > I might be missing something, but dlti returns a subclass of > > LinearTimeInvariant. Point is, it's not a tuple, but a class with > > multiple attributes and method. Simple tuple unpacking won't likely work > > unless the authors of LinearTimeInvariant really went out of their way > > to make it so. > > The point is not to go through the `dlti` class at all. > > Please look at the replacement code I posted: it works. 
> > Cheers, > Dan > > > -Paul > > > > On Thu, Oct 18, 2018 at 1:58 PM Daniele Nicolodi > > wrote: > > > > Hello, > > > > I was having a look at the scipy.signal.decimate() function and I > > noticed some strange looking code: > > > > elif ftype == 'iir': > > if n is None: > > n = 8 > > system = dlti(*cheby1(n, 0.05, 0.8 / q)) > > b, a = system.num, system.den > > > > This is used to setup the anti aliasing low pass filter. What I don't > > understand is the dance to obtain the IIR numerator and denominator > > coefficients. Couldn't the above be simply as the code below? > > > > elif ftype == 'iir': > > if n is None: > > n = 8 > > b, a = cheby1(n, 0.05, 0.8 / q) > > > > Am I missing something? > > > > Thanks! > > > > Cheers, > > Dan > > _______________________________________________ > > SciPy-Dev mailing list > > SciPy-Dev at python.org > > https://mail.python.org/mailman/listinfo/scipy-dev > > > > > > > > _______________________________________________ > > SciPy-Dev mailing list > > SciPy-Dev at python.org > > https://mail.python.org/mailman/listinfo/scipy-dev > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at gmail.com Thu Oct 18 19:07:10 2018 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Thu, 18 Oct 2018 19:07:10 -0400 Subject: [SciPy-Dev] Strange code in scipy.signal.decimate In-Reply-To: <72c45861-1e27-d2c1-28b6-4f88cb208064@grinta.net> References: <795609c2-501f-6932-7013-2f2e2f38b996@grinta.net> <72c45861-1e27-d2c1-28b6-4f88cb208064@grinta.net> Message-ID: On 10/18/18, Daniele Nicolodi wrote: > On 18-10-2018 15:28, Paul Hobson wrote: >> Dan, >> >> I might be missing something, but dlti returns a subclass of >> LinearTimeInvariant. Point is, it's not a tuple, but a class with >> multiple attributes and method. Simple tuple unpacking won't likely work >> unless the authors of LinearTimeInvariant really went out of their way >> to make it so. > > The point is not to go through the `dlti` class at all. > > Please look at the replacement code I posted: it works. > There have been several incremental changes to that function over the years. I suspect the last person to change it simply did not notice that the code could be simplified. Your proposed change looks good. Warren > Cheers, > Dan > >> -Paul >> >> On Thu, Oct 18, 2018 at 1:58 PM Daniele Nicolodi > > wrote: >> >> Hello, >> >> I was having a look at the scipy.signal.decimate() function and I >> noticed some strange looking code: >> >> elif ftype == 'iir': >> if n is None: >> n = 8 >> system = dlti(*cheby1(n, 0.05, 0.8 / q)) >> b, a = system.num, system.den >> >> This is used to setup the anti aliasing low pass filter. What I don't >> understand is the dance to obtain the IIR numerator and denominator >> coefficients. Couldn't the above be simply as the code below? >> >> elif ftype == 'iir': >> if n is None: >> n = 8 >> b, a = cheby1(n, 0.05, 0.8 / q) >> >> Am I missing something? >> >> Thanks! 
>> >> Cheers, >> Dan >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at python.org >> https://mail.python.org/mailman/listinfo/scipy-dev >> >> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at python.org >> https://mail.python.org/mailman/listinfo/scipy-dev >> > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > From eric.antonio.quintero at gmail.com Thu Oct 18 21:18:06 2018 From: eric.antonio.quintero at gmail.com (Eric Quintero) Date: Thu, 18 Oct 2018 21:18:06 -0400 Subject: [SciPy-Dev] Strange code in scipy.signal.decimate In-Reply-To: <67d883853eeb47388c1514192d93bb68@MWHPR03MB2637.namprd03.prod.outlook.com> References: <795609c2-501f-6932-7013-2f2e2f38b996@grinta.net> <72c45861-1e27-d2c1-28b6-4f88cb208064@grinta.net> <67d883853eeb47388c1514192d93bb68@MWHPR03MB2637.namprd03.prod.outlook.com> Message-ID: Warren is correct; there were some changes to how the `dlti` objects were being internally passed around on PR 7835, and this extra instantiation flew under the radar. A PR to simplify the code would be very welcome. -Eric Q. > On Oct 18, 2018, at 7:07 PM, Warren Weckesser wrote: > > On 10/18/18, Daniele Nicolodi wrote: >> On 18-10-2018 15:28, Paul Hobson wrote: >>> Dan, >>> >>> I might be missing something, but dlti returns a subclass of >>> LinearTimeInvariant. Point is, it's not a tuple, but a class with >>> multiple attributes and method. Simple tuple unpacking won't likely work >>> unless the authors of LinearTimeInvariant really went out of their way >>> to make it so. >> >> The point is not to go through the `dlti` class at all. >> >> Please look at the replacement code I posted: it works. >> > > > There have been several incremental changes to that function over the > years. I suspect the last person to change it simply did not notice > that the code could be simplified. Your proposed change looks good. > > Warren > > >> Cheers, >> Dan >> >>> -Paul >>> >>> On Thu, Oct 18, 2018 at 1:58 PM Daniele Nicolodi >> > wrote: >>> >>> Hello, >>> >>> I was having a look at the scipy.signal.decimate() function and I >>> noticed some strange looking code: >>> >>> elif ftype == 'iir': >>> if n is None: >>> n = 8 >>> system = dlti(*cheby1(n, 0.05, 0.8 / q)) >>> b, a = system.num, system.den >>> >>> This is used to setup the anti aliasing low pass filter. What I don't >>> understand is the dance to obtain the IIR numerator and denominator >>> coefficients. Couldn't the above be simply as the code below? >>> >>> elif ftype == 'iir': >>> if n is None: >>> n = 8 >>> b, a = cheby1(n, 0.05, 0.8 / q) >>> >>> Am I missing something? >>> >>> Thanks! 
>>> >>> Cheers, >>> Dan >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at python.org >>> https://mail.python.org/mailman/listinfo/scipy-dev >>> >>> >>> >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at python.org >>> https://mail.python.org/mailman/listinfo/scipy-dev >>> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at python.org >> https://mail.python.org/mailman/listinfo/scipy-dev >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev From ilhanpolat at gmail.com Fri Oct 19 07:56:51 2018 From: ilhanpolat at gmail.com (Ilhan Polat) Date: Fri, 19 Oct 2018 13:56:51 +0200 Subject: [SciPy-Dev] Finding signal upcrossings and declustered peaks In-Reply-To: References: Message-ID: Are these covered by the recent find_peaks functions or brand new ones? On Thu, Oct 18, 2018, 22:53 Ali Cetin wrote: > > > ------------------------------ > *From:* SciPy-Dev on > behalf of Ralf Gommers > *Sent:* Thursday, October 18, 2018 21:59 > *To:* SciPy Developers List > *Subject:* Re: [SciPy-Dev] Finding signal upcrossings and declustered > peaks > > > > On Thu, Oct 18, 2018 at 10:41 AM Ali Cetin wrote: > > Hi, > > this is my first attempt to write to this mailing list, so if I'm doing > something wrong please bear with me. > > > Hi Ali, no worries you got it all right. welcome:) > > > I've been working with extreme value analysis during my PhD and also in my > current job. What we often do is to find signal upcrossings (such as mean > and zero upcrossings) and find declustered peaks. That is, find the largest > peak between two upcrossings, or the two largest peaks between two > upcrossings etc... > > I'm proposing to add two new functions to the scipy.signal submodule: > - find_upcross; takes in signal and returns index of upcrossings wrt > user defined upcrossing level > - find_peaks_dc; takes in signal and returns index of peaks (n largest) > > (find_peaks_dc is not necessarly easily incorporated into find_peaks, so > it may be cleaner to have a separate function). > > Any thought on this? > > > Do you have any references for the algorithms? > > Well, the "algorithms" are rather straight forward and heuristic. > > - find_upcross is often referred to as zero-crossing ( > https://en.wikipedia.org/wiki/Zero_crossing) (a special case). It is > essentially detecting sign changes. Upcrossing count and rates are > important statistics in time series (extreme value) analysis. Side note: > Zero-crossing rate is by def the ratio between the second and zeroth moment > of the power spectrum of the signal. > > - find_peaks_dc finds all peaks above an upcrossing level (or > threshold) between two consecutive upcrossings, i.e. batches of peaks. Then > the n-largest peaks are selected from each batch. Declustering is a > technique often used to break down statistical dependency between peaks > when performing extreme value analysis, and thus be able to use simpler > distributions to describe them. As n-> inf find_peaks_dc -> find_peaks ? > (Albeit in many practical situations, inf < 5). (Note that find_peaks_dc > depends on find_upcross). > > The code are just a few lines and mostly numpy array operations. 
> > Ali > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ali.cetin at outlook.com Fri Oct 19 15:28:45 2018 From: ali.cetin at outlook.com (Ali Cetin) Date: Fri, 19 Oct 2018 19:28:45 +0000 Subject: [SciPy-Dev] Finding signal upcrossings and declustered peaks In-Reply-To: References: , Message-ID: Are these covered by the recent find_peaks functions or brand new ones? As far as I can see, the current find_peaks don't cover it. Also, finding crossings (up and down) is not "peak finding", but necessary to find declustered peaks. On Thu, Oct 18, 2018, 22:53 Ali Cetin > wrote: ________________________________ From: SciPy-Dev > on behalf of Ralf Gommers > Sent: Thursday, October 18, 2018 21:59 To: SciPy Developers List Subject: Re: [SciPy-Dev] Finding signal upcrossings and declustered peaks On Thu, Oct 18, 2018 at 10:41 AM Ali Cetin > wrote: Hi, this is my first attempt to write to this mailing list, so if I'm doing something wrong please bear with me. Hi Ali, no worries you got it all right. welcome:) I've been working with extreme value analysis during my PhD and also in my current job. What we often do is to find signal upcrossings (such as mean and zero upcrossings) and find declustered peaks. That is, find the largest peak between two upcrossings, or the two largest peaks between two upcrossings etc... I'm proposing to add two new functions to the scipy.signal submodule: - find_upcross; takes in signal and returns index of upcrossings wrt user defined upcrossing level - find_peaks_dc; takes in signal and returns index of peaks (n largest) (find_peaks_dc is not necessarly easily incorporated into find_peaks, so it may be cleaner to have a separate function). Any thought on this? Do you have any references for the algorithms? Well, the "algorithms" are rather straight forward and heuristic. - find_upcross is often referred to as zero-crossing (https://en.wikipedia.org/wiki/Zero_crossing) (a special case). It is essentially detecting sign changes. Upcrossing count and rates are important statistics in time series (extreme value) analysis. Side note: Zero-crossing rate is by def the ratio between the second and zeroth moment of the power spectrum of the signal. - find_peaks_dc finds all peaks above an upcrossing level (or threshold) between two consecutive upcrossings, i.e. batches of peaks. Then the n-largest peaks are selected from each batch. Declustering is a technique often used to break down statistical dependency between peaks when performing extreme value analysis, and thus be able to use simpler distributions to describe them. As n-> inf find_peaks_dc -> find_peaks ? (Albeit in many practical situations, inf < 5). (Note that find_peaks_dc depends on find_upcross). The code are just a few lines and mostly numpy array operations. Ali _______________________________________________ SciPy-Dev mailing list SciPy-Dev at python.org https://mail.python.org/mailman/listinfo/scipy-dev -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.gommers at gmail.com Fri Oct 19 18:14:17 2018 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Fri, 19 Oct 2018 22:14:17 +0000 Subject: [SciPy-Dev] Finding signal upcrossings and declustered peaks In-Reply-To: References: Message-ID: On Fri, Oct 19, 2018 at 7:28 PM Ali Cetin wrote: > > Are these covered by the recent find_peaks functions or brand new ones? > > As far as I can see, the current find_peaks don't cover it. Also, finding > crossings (up and down) is not "peak finding", but necessary to find > declustered peaks. > > On Thu, Oct 18, 2018, 22:53 Ali Cetin wrote: > > > > ------------------------------ > *From:* SciPy-Dev on > behalf of Ralf Gommers > *Sent:* Thursday, October 18, 2018 21:59 > *To:* SciPy Developers List > *Subject:* Re: [SciPy-Dev] Finding signal upcrossings and declustered > peaks > > > > On Thu, Oct 18, 2018 at 10:41 AM Ali Cetin wrote: > > Hi, > > this is my first attempt to write to this mailing list, so if I'm doing > something wrong please bear with me. > > > Hi Ali, no worries you got it all right. welcome:) > > > I've been working with extreme value analysis during my PhD and also in my > current job. What we often do is to find signal upcrossings (such as mean > and zero upcrossings) and find declustered peaks. That is, find the largest > peak between two upcrossings, or the two largest peaks between two > upcrossings etc... > > I'm proposing to add two new functions to the scipy.signal submodule: > - find_upcross; takes in signal and returns index of upcrossings wrt > user defined upcrossing level > - find_peaks_dc; takes in signal and returns index of peaks (n largest) > > (find_peaks_dc is not necessarly easily incorporated into find_peaks, so > it may be cleaner to have a separate function). > > Any thought on this? > > > Do you have any references for the algorithms? > > Well, the "algorithms" are rather straight forward and heuristic. > > - find_upcross is often referred to as zero-crossing ( > https://en.wikipedia.org/wiki/Zero_crossing > ) > (a special case). It is essentially detecting sign changes. Upcrossing > count and rates are important statistics in time series (extreme value) > analysis. Side note: Zero-crossing rate is by def the ratio between the > second and zeroth moment of the power spectrum of the signal. > > - find_peaks_dc finds all peaks above an upcrossing level (or > threshold) between two consecutive upcrossings, i.e. batches of peaks. Then > the n-largest peaks are selected from each batch. Declustering is a > technique often used to break down statistical dependency between peaks > when performing extreme value analysis, and thus be able to use simpler > distributions to describe them. As n-> inf find_peaks_dc -> find_peaks ? > (Albeit in many practical situations, inf < 5). (Note that find_peaks_dc > depends on find_upcross). > > The code are just a few lines and mostly numpy array operations. > > It sounds like you have implementations ready, could you link to them? (put in a git branch or a gist for example). If it's just a few lines of code and not based on a publication, then I'm not sure we'd want to add these. We are interested in further peak finding improvements, however it should be clear for any new functions that they're an improvement over what we currently have. Otherwise I'm afraid we keep adding separate functions that all do a small subset of the spectrum of what users are interested in. E.g., from your description it's not clear how to treat zero crossings in the presence of noise. 
Cheers, Ralf > Ali > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ali.cetin at outlook.com Sun Oct 21 15:50:44 2018 From: ali.cetin at outlook.com (Ali Cetin) Date: Sun, 21 Oct 2018 19:50:44 +0000 Subject: [SciPy-Dev] Finding signal upcrossings and declustered peaks In-Reply-To: References: , Message-ID: ________________________________ From: SciPy-Dev on behalf of Ralf Gommers Sent: Saturday, October 20, 2018 00:14 To: SciPy Developers List Subject: Re: [SciPy-Dev] Finding signal upcrossings and declustered peaks On Fri, Oct 19, 2018 at 7:28 PM Ali Cetin > wrote: Are these covered by the recent find_peaks functions or brand new ones? As far as I can see, the current find_peaks don't cover it. Also, finding crossings (up and down) is not "peak finding", but necessary to find declustered peaks. On Thu, Oct 18, 2018, 22:53 Ali Cetin > wrote: ________________________________ From: SciPy-Dev > on behalf of Ralf Gommers > Sent: Thursday, October 18, 2018 21:59 To: SciPy Developers List Subject: Re: [SciPy-Dev] Finding signal upcrossings and declustered peaks On Thu, Oct 18, 2018 at 10:41 AM Ali Cetin > wrote: Hi, this is my first attempt to write to this mailing list, so if I'm doing something wrong please bear with me. Hi Ali, no worries you got it all right. welcome:) I've been working with extreme value analysis during my PhD and also in my current job. What we often do is to find signal upcrossings (such as mean and zero upcrossings) and find declustered peaks. That is, find the largest peak between two upcrossings, or the two largest peaks between two upcrossings etc... I'm proposing to add two new functions to the scipy.signal submodule: - find_upcross; takes in signal and returns index of upcrossings wrt user defined upcrossing level - find_peaks_dc; takes in signal and returns index of peaks (n largest) (find_peaks_dc is not necessarly easily incorporated into find_peaks, so it may be cleaner to have a separate function). Any thought on this? Do you have any references for the algorithms? Well, the "algorithms" are rather straight forward and heuristic. - find_upcross is often referred to as zero-crossing (https://en.wikipedia.org/wiki/Zero_crossing) (a special case). It is essentially detecting sign changes. Upcrossing count and rates are important statistics in time series (extreme value) analysis. Side note: Zero-crossing rate is by def the ratio between the second and zeroth moment of the power spectrum of the signal. - find_peaks_dc finds all peaks above an upcrossing level (or threshold) between two consecutive upcrossings, i.e. batches of peaks. Then the n-largest peaks are selected from each batch. Declustering is a technique often used to break down statistical dependency between peaks when performing extreme value analysis, and thus be able to use simpler distributions to describe them. As n-> inf find_peaks_dc -> find_peaks ? (Albeit in many practical situations, inf < 5). (Note that find_peaks_dc depends on find_upcross). The code are just a few lines and mostly numpy array operations. It sounds like you have implementations ready, could you link to them? (put in a git branch or a gist for example). 
If it's just a few lines of code and not based on a publication, then I'm not sure we'd want to add these. We are interested in further peak finding improvements, however it should be clear for any new functions that they're an improvement over what we currently have. Otherwise I'm afraid we keep adding separate functions that all do a small subset of the spectrum of what users are interested in. E.g., from your description it's not clear how to treat zero crossings in the presence of noise. Cheers, Ralf I think I misunderstood what you ment by reference; peak declustering methods dont necessarily have a reference paper, as the methods are rather self-explanatory. However, scientific papers and textbooks that use these methods are plenty! (Peaks declustering is indeed a very common technique in extreme value analysis. Just google "peak over threshold declustering") These are some of them: - Coles, An Introduction to Statistical Modeling of Extreme Values, (https://www.springer.com/us/book/9781852334598) - Davison, A. C., & Smith, R. L. (1990). Models for exceedances over high thresholds. Journal of the Royal Statistical Society. Series B (Methodological), 393-442. - Ferro, C. A. T. and Segers, J. (2003) Inference for clusters of extreme values. Journal of the Royal Statistical Society B, 65, 545--556. I also note that peak declustering methods are available in R. (https://www.rdocumentation.org/packages/extRemes/versions/1.65/topics/decluster.runs) Yes, I have written a "small" package that can find up-crossings and one particular method for peaks declustering (https://github.com/4Subsea/evapy). However, the functions in this package are rather limited in scope, limited to 1D arrays (performance optimized), and depends heavily on NumPy and SciPy. I was thinking about expanding the functionality anyway, and thought that I might do that by contributing to SciPy. I propose to take an agile approach on this: * I'm almost done re-writing base signal up/down-crossing module. I can make a pull-request to scipy in the coming days. (It will add similar functionality as argrelmax, argrelmin -> argupcross, argdowncross etc.) If you don't like it, we can stop it there. * next step may be to add peaks declustering methods (whether by extending the current find_peaks or a new function dedicated for peaks declustering methods.) Cheers, Ali Ali _______________________________________________ SciPy-Dev mailing list SciPy-Dev at python.org https://mail.python.org/mailman/listinfo/scipy-dev _______________________________________________ SciPy-Dev mailing list SciPy-Dev at python.org https://mail.python.org/mailman/listinfo/scipy-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sun Oct 21 16:48:27 2018 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 22 Oct 2018 09:48:27 +1300 Subject: [SciPy-Dev] Finding signal upcrossings and declustered peaks In-Reply-To: References: Message-ID: On Mon, Oct 22, 2018 at 8:51 AM Ali Cetin wrote: > > > ------------------------------ > *From:* SciPy-Dev on > behalf of Ralf Gommers > *Sent:* Saturday, October 20, 2018 00:14 > *To:* SciPy Developers List > *Subject:* Re: [SciPy-Dev] Finding signal upcrossings and declustered > peaks > > > > On Fri, Oct 19, 2018 at 7:28 PM Ali Cetin wrote: > > > Are these covered by the recent find_peaks functions or brand new ones? > > As far as I can see, the current find_peaks don't cover it. 
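As a rough illustration of the declustering step in that proposal (not the actual pull request; the helper name below is made up), the n largest peaks in each batch of peaks between consecutive upcrossings can be picked out along these lines:

import numpy as np
from scipy.signal import find_peaks

def declustered_peaks(x, level=0.0, n=1):
    """Illustrative sketch: indices of the n largest peaks in each batch of
    peaks between consecutive upcrossings of `level`."""
    x = np.asarray(x)
    above = x > level
    up = np.nonzero(~above[:-1] & above[1:])[0]        # upcrossing indices
    peaks, _ = find_peaks(x, height=level)             # all peaks above the level
    bounds = np.r_[up, len(x) - 1]                     # close the last batch at the end
    out = []
    for start, stop in zip(bounds[:-1], bounds[1:]):
        batch = peaks[(peaks > start) & (peaks <= stop)]
        out.extend(batch[np.argsort(x[batch])[-n:]])   # keep the n largest of the batch
    return np.sort(np.array(out, dtype=int))

# a rippled sine has several local peaks per excursion above zero;
# declustering with n=1 keeps only one peak per excursion
t = np.linspace(0, 4 * np.pi, 400)
sig = np.sin(t) + 0.3 * np.sin(5 * t)
print(declustered_peaks(sig, level=0.0, n=1))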
Also, finding > crossings (up and down) is not "peak finding", but necessary to find > declustered peaks. > > On Thu, Oct 18, 2018, 22:53 Ali Cetin wrote: > > > > ------------------------------ > *From:* SciPy-Dev on > behalf of Ralf Gommers > *Sent:* Thursday, October 18, 2018 21:59 > *To:* SciPy Developers List > *Subject:* Re: [SciPy-Dev] Finding signal upcrossings and declustered > peaks > > > > On Thu, Oct 18, 2018 at 10:41 AM Ali Cetin wrote: > > Hi, > > this is my first attempt to write to this mailing list, so if I'm doing > something wrong please bear with me. > > > Hi Ali, no worries you got it all right. welcome:) > > > I've been working with extreme value analysis during my PhD and also in my > current job. What we often do is to find signal upcrossings (such as mean > and zero upcrossings) and find declustered peaks. That is, find the largest > peak between two upcrossings, or the two largest peaks between two > upcrossings etc... > > I'm proposing to add two new functions to the scipy.signal submodule: > - find_upcross; takes in signal and returns index of upcrossings wrt > user defined upcrossing level > - find_peaks_dc; takes in signal and returns index of peaks (n largest) > > (find_peaks_dc is not necessarly easily incorporated into find_peaks, so > it may be cleaner to have a separate function). > > Any thought on this? > > > Do you have any references for the algorithms? > > Well, the "algorithms" are rather straight forward and heuristic. > > - find_upcross is often referred to as zero-crossing ( > https://en.wikipedia.org/wiki/Zero_crossing > ) > (a special case). It is essentially detecting sign changes. Upcrossing > count and rates are important statistics in time series (extreme value) > analysis. Side note: Zero-crossing rate is by def the ratio between the > second and zeroth moment of the power spectrum of the signal. > > - find_peaks_dc finds all peaks above an upcrossing level (or > threshold) between two consecutive upcrossings, i.e. batches of peaks. Then > the n-largest peaks are selected from each batch. Declustering is a > technique often used to break down statistical dependency between peaks > when performing extreme value analysis, and thus be able to use simpler > distributions to describe them. As n-> inf find_peaks_dc -> find_peaks ? > (Albeit in many practical situations, inf < 5). (Note that find_peaks_dc > depends on find_upcross). > > The code are just a few lines and mostly numpy array operations. > > > It sounds like you have implementations ready, could you link to them? > (put in a git branch or a gist for example). > > If it's just a few lines of code and not based on a publication, then I'm > not sure we'd want to add these. We are interested in further peak finding > improvements, however it should be clear for any new functions that they're > an improvement over what we currently have. Otherwise I'm afraid we keep > adding separate functions that all do a small subset of the spectrum of > what users are interested in. E.g., from your description it's not clear > how to treat zero crossings in the presence of noise. > > Cheers, > Ralf > > I think I misunderstood what you ment by reference; peak declustering > methods dont necessarily have a reference paper, as the methods are rather > self-explanatory. However, scientific papers and textbooks that use these > methods are plenty! (Peaks declustering is indeed a very common technique > in extreme value analysis. 
Just google "peak over threshold declustering") > These are some of them: > - Coles, An Introduction to Statistical Modeling of Extreme Values, ( > https://www.springer.com/us/book/9781852334598) > - Davison, A. C., & Smith, R. L. (1990). Models for exceedances over > high thresholds. Journal of the Royal Statistical Society. Series B > (Methodological), 393-442. > - Ferro, C. A. T. and Segers, J. (2003) Inference for clusters of > extreme values. Journal of the Royal Statistical Society B, 65, 545--556. > > I also note that peak declustering methods are available in R. ( > https://www.rdocumentation.org/packages/extRemes/versions/1.65/topics/decluster.runs > ) > Thanks, that all helps! > Yes, I have written a "small" package that can find up-crossings and one > particular method for peaks declustering (https://github.com/4Subsea/evapy). > However, the functions in this package are rather limited in scope, limited > to 1D arrays (performance optimized), and depends heavily on NumPy and > SciPy. I was thinking about expanding the functionality anyway, and thought > that I might do that by contributing to SciPy. > > I propose to take an agile approach on this: > > - I'm almost done re-writing base signal up/down-crossing module. I > can make a pull-request to scipy in the coming days. (It will add similar > functionality as argrelmax, argrelmin -> argupcross, argdowncross etc.) If > you don't like it, we can stop it there. > > This sounds good, always easier to talk about a feature when there's already code. Cheers, Ralf > - next step may be to add peaks declustering methods (whether by > extending the current find_peaks or a new function dedicated for peaks > declustering methods.) > > Cheers, > Ali > > > Ali > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Oct 22 14:06:53 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 22 Oct 2018 12:06:53 -0600 Subject: [SciPy-Dev] NumPy 1.15.3 release Message-ID: Hi All, On behalf of the NumPy team, I am pleased to announce the release of NumPy 1.15.3. This is a bugfix release for bugs and regressions reported following the 1.15.2 release. The most noticeable fix is probably for the memory leak encountered when slicing classes derived from Numpy. The Python versions supported by this release are 2.7, 3.4-3.7. Wheels for this release can be downloaded from PyPI , source archives are available from Github . Compatibility Note ================== The NumPy 1.15.x OS X wheels released on PyPI no longer contain 32-bit binaries. That will also be the case in future releases. See `#11625 `__ for the related discussion. Those needing 32-bit support should look elsewhere or build from source. Contributors ============ A total of 7 people contributed to this release. People with a "+" by their names contributed a patch for the first time. 
* Allan Haldane * Charles Harris * Jeroen Demeyer * Kevin Sheppard * Matthew Bowden + * Matti Picus * Tyler Reddy Pull requests merged ==================== A total of 12 pull requests were merged for this release. * `#12080 `__: MAINT: Blacklist some MSVC complex functions. * `#12083 `__: TST: Add azure CI testing to 1.15.x branch. * `#12084 `__: BUG: test_path() now uses Path.resolve() * `#12085 `__: TST, MAINT: Fix some failing tests on azure-pipelines mac and... * `#12187 `__: BUG: Fix memory leak in mapping.c * `#12188 `__: BUG: Allow boolean subtract in histogram * `#12189 `__: BUG: Fix in-place permutation * `#12190 `__: BUG: limit default for get_num_build_jobs() to 8 * `#12191 `__: BUG: OBJECT_to_* should check for errors * `#12192 `__: DOC: Prepare for NumPy 1.15.3 release. * `#12237 `__: BUG: Fix MaskedArray fill_value type conversion. * `#12238 `__: TST: Backport azure-pipeline testing fixes for Mac Cheers, Charles Harris -------------- next part -------------- An HTML attachment was scrubbed... URL: From ali.cetin at outlook.com Mon Oct 22 14:53:09 2018 From: ali.cetin at outlook.com (Ali Cetin) Date: Mon, 22 Oct 2018 18:53:09 +0000 Subject: [SciPy-Dev] Finding signal upcrossings and declustered peaks In-Reply-To: References: , Message-ID: ________________________________ From: SciPy-Dev on behalf of Ralf Gommers Sent: Sunday, October 21, 2018 22:48 To: SciPy Developers List Subject: Re: [SciPy-Dev] Finding signal upcrossings and declustered peaks On Mon, Oct 22, 2018 at 8:51 AM Ali Cetin > wrote: ________________________________ From: SciPy-Dev > on behalf of Ralf Gommers > Sent: Saturday, October 20, 2018 00:14 To: SciPy Developers List Subject: Re: [SciPy-Dev] Finding signal upcrossings and declustered peaks On Fri, Oct 19, 2018 at 7:28 PM Ali Cetin > wrote: Are these covered by the recent find_peaks functions or brand new ones? As far as I can see, the current find_peaks don't cover it. Also, finding crossings (up and down) is not "peak finding", but necessary to find declustered peaks. On Thu, Oct 18, 2018, 22:53 Ali Cetin > wrote: ________________________________ From: SciPy-Dev > on behalf of Ralf Gommers > Sent: Thursday, October 18, 2018 21:59 To: SciPy Developers List Subject: Re: [SciPy-Dev] Finding signal upcrossings and declustered peaks On Thu, Oct 18, 2018 at 10:41 AM Ali Cetin > wrote: Hi, this is my first attempt to write to this mailing list, so if I'm doing something wrong please bear with me. Hi Ali, no worries you got it all right. welcome:) I've been working with extreme value analysis during my PhD and also in my current job. What we often do is to find signal upcrossings (such as mean and zero upcrossings) and find declustered peaks. That is, find the largest peak between two upcrossings, or the two largest peaks between two upcrossings etc... I'm proposing to add two new functions to the scipy.signal submodule: - find_upcross; takes in signal and returns index of upcrossings wrt user defined upcrossing level - find_peaks_dc; takes in signal and returns index of peaks (n largest) (find_peaks_dc is not necessarly easily incorporated into find_peaks, so it may be cleaner to have a separate function). Any thought on this? Do you have any references for the algorithms? Well, the "algorithms" are rather straight forward and heuristic. - find_upcross is often referred to as zero-crossing (https://en.wikipedia.org/wiki/Zero_crossing) (a special case). It is essentially detecting sign changes. 
Upcrossing count and rates are important statistics in time series (extreme value) analysis. Side note: Zero-crossing rate is by def the ratio between the second and zeroth moment of the power spectrum of the signal. - find_peaks_dc finds all peaks above an upcrossing level (or threshold) between two consecutive upcrossings, i.e. batches of peaks. Then the n-largest peaks are selected from each batch. Declustering is a technique often used to break down statistical dependency between peaks when performing extreme value analysis, and thus be able to use simpler distributions to describe them. As n-> inf find_peaks_dc -> find_peaks ? (Albeit in many practical situations, inf < 5). (Note that find_peaks_dc depends on find_upcross). The code are just a few lines and mostly numpy array operations. It sounds like you have implementations ready, could you link to them? (put in a git branch or a gist for example). If it's just a few lines of code and not based on a publication, then I'm not sure we'd want to add these. We are interested in further peak finding improvements, however it should be clear for any new functions that they're an improvement over what we currently have. Otherwise I'm afraid we keep adding separate functions that all do a small subset of the spectrum of what users are interested in. E.g., from your description it's not clear how to treat zero crossings in the presence of noise. Cheers, Ralf I think I misunderstood what you ment by reference; peak declustering methods dont necessarily have a reference paper, as the methods are rather self-explanatory. However, scientific papers and textbooks that use these methods are plenty! (Peaks declustering is indeed a very common technique in extreme value analysis. Just google "peak over threshold declustering") These are some of them: - Coles, An Introduction to Statistical Modeling of Extreme Values, (https://www.springer.com/us/book/9781852334598) - Davison, A. C., & Smith, R. L. (1990). Models for exceedances over high thresholds. Journal of the Royal Statistical Society. Series B (Methodological), 393-442. - Ferro, C. A. T. and Segers, J. (2003) Inference for clusters of extreme values. Journal of the Royal Statistical Society B, 65, 545--556. I also note that peak declustering methods are available in R. (https://www.rdocumentation.org/packages/extRemes/versions/1.65/topics/decluster.runs) Thanks, that all helps! Yes, I have written a "small" package that can find up-crossings and one particular method for peaks declustering (https://github.com/4Subsea/evapy). However, the functions in this package are rather limited in scope, limited to 1D arrays (performance optimized), and depends heavily on NumPy and SciPy. I was thinking about expanding the functionality anyway, and thought that I might do that by contributing to SciPy. I propose to take an agile approach on this: * I'm almost done re-writing base signal up/down-crossing module. I can make a pull-request to scipy in the coming days. (It will add similar functionality as argrelmax, argrelmin -> argupcross, argdowncross etc.) If you don't like it, we can stop it there. This sounds good, always easier to talk about a feature when there's already code. Cheers, Ralf Hello Ralf, just made a pull request with base functionality required. * next step may be to add peaks declustering methods (whether by extending the current find_peaks or a new function dedicated for peaks declustering methods.) 
Cheers, Ali Ali _______________________________________________ SciPy-Dev mailing list SciPy-Dev at python.org https://mail.python.org/mailman/listinfo/scipy-dev _______________________________________________ SciPy-Dev mailing list SciPy-Dev at python.org https://mail.python.org/mailman/listinfo/scipy-dev _______________________________________________ SciPy-Dev mailing list SciPy-Dev at python.org https://mail.python.org/mailman/listinfo/scipy-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: From nobodyinperson at gmx.de Wed Oct 24 03:46:02 2018 From: nobodyinperson at gmx.de (=?UTF-8?Q?Yann_B=c3=bcchau?=) Date: Wed, 24 Oct 2018 09:46:02 +0200 Subject: [SciPy-Dev] Unable to build RPM package via setup.py bdist_rpm Message-ID: <3dd08eac-2640-0961-487f-a3a908893273@gmx.de> Hello everyone, After countless attempts and a good portion of frustration I am now asking for help in this mailinglist. I would like to build a numpy RPM package. There is the |setup.py bdist_rpm| command which does that. My final goal is to build numpy for SailfishOS (mobile operating system) myself. (There are numpy packages on OpenRepos.net, but they are outdated and I need an up-to-date numpy to package an up-to-date matplotlib?) Eventually, |python3 setup.py bdist_rpm| always fails with: |executing numpy/core/code_generators/generate_numpy_api.py Running from numpy source directory. /usr/lib/python3.6/distutils/dist.py:261: UserWarning: Unknown distribution option: 'define_macros' warnings.warn(msg) error: [Errno 2] No such file or directory: 'numpy/core/code_generators/../src/multiarray/arraytypes.c.src' | I was able to boil the problem down to these files missing in the source tarball |build/bdist_rpm*/rpm/SOURCES/numpy-1.15.3.tar.gz|: |numpy/linalg/umath_linalg.c.src numpy/core/src/multiarray/arraytypes.c.src numpy/core/src/multiarray/scalartypes.c.src numpy/core/src/umath/_umath_tests.c.src numpy/core/src/umath/loops.h.src numpy/core/src/multiarray/nditer_templ.c.src numpy/core/src/umath/_operand_flag_tests.c.src numpy/core/src/umath/_struct_ufunc_tests.c.src numpy/core/src/umath/_rational_tests.c.src numpy/core/src/umath/scalarmath.c.src numpy/core/src/multiarray/_multiarray_tests.c.src numpy/core/src/multiarray/lowlevel_strided_loops.c.src numpy/core/src/umath/funcs.inc.src numpy/core/src/umath/loops.c.src numpy/core/src/multiarray/einsum.c.src | Interestingly, when I run |python3 setup.py sdist|, the created tarball under |dist/numpy-1.15.3.tar.gz| contains these files. So my question is: What is going wrong here? The |bdist_rpm| log shows that it is |running sdist|, but these |*.src|-files are not included in this run. What is different in the |bdist_rpm| run? I also was not able to just run |rpmbuild| directly due to strange errors like |error: Macro %__python has empty body| and the like. The |python3 setup.py install| mechanism works well both on my Ubuntu 18.04 and my Jolla Phone with SailfishOS. Just the |bdist_rpm| part does not work. I would be very delighted if someone could help me with this. Cheers, Yann ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From tyler.je.reddy at gmail.com Wed Oct 24 19:55:04 2018 From: tyler.je.reddy at gmail.com (Tyler Reddy) Date: Wed, 24 Oct 2018 16:55:04 -0700 Subject: [SciPy-Dev] SciPy 1.2.0 release schedule Message-ID: Hi all, It is almost 6 months after the 1.1.0 release on May 5, so probably time to plan the 1.2.0 release. 
It would be a good idea to look over the PRs with a 1.2.0 milestone , and tag anything else that should have this milestone appropriately. I'd like to propose the following schedule: Nov. 5: branch 1.2.x Nov. 8: rc1 Nov. 21: rc2 (if needed) Nov. 30: final release Thoughts? Best wishes, Tyler -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Fri Oct 26 17:37:06 2018 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 27 Oct 2018 10:37:06 +1300 Subject: [SciPy-Dev] SciPy 1.2.0 release schedule In-Reply-To: References: Message-ID: On Thu, Oct 25, 2018 at 12:55 PM Tyler Reddy wrote: > Hi all, > > It is almost 6 months after the 1.1.0 release on May 5, so probably time > to plan the 1.2.0 release. It would be a good idea to look over the PRs > with a 1.2.0 milestone > , > and tag anything else that should have this milestone appropriately. > > I'd like to propose the following schedule: > > Nov. 5: branch 1.2.x > Nov. 8: rc1 > Nov. 21: rc2 (if needed) > Nov. 30: final release > > Thoughts? > This looks like a good schedule to me. We'll probably struggle to get some PRs marked for 1.2.0 merged, but that's always the case. Tyler, if you send me your PyPI username I can give you permissions to create releases. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From mikofski at berkeley.edu Fri Oct 26 22:03:03 2018 From: mikofski at berkeley.edu (Mark Alexander Mikofski) Date: Fri, 26 Oct 2018 19:03:03 -0700 Subject: [SciPy-Dev] SciPy 1.2.0 release schedule In-Reply-To: References: Message-ID: Hi Tyler and others, Thanks for managing the v1.2 release. I think PR #8431, Cython optimize zeros API, is ready, hopefully, to merge. It's been through several rounds of reviews and I think I've accommodated all of the recommendations, all tests are passing, and there's been strong support. Anyone please take a look. https://github.com/scipy/scipy/pull/8431 Thanks, Mark On Fri, Oct 26, 2018, 2:38 PM Ralf Gommers wrote: > > > On Thu, Oct 25, 2018 at 12:55 PM Tyler Reddy > wrote: > >> Hi all, >> >> It is almost 6 months after the 1.1.0 release on May 5, so probably time >> to plan the 1.2.0 release. It would be a good idea to look over the PRs >> with a 1.2.0 milestone >> , >> and tag anything else that should have this milestone appropriately. >> >> I'd like to propose the following schedule: >> >> Nov. 5: branch 1.2.x >> Nov. 8: rc1 >> Nov. 21: rc2 (if needed) >> Nov. 30: final release >> >> Thoughts? >> > > This looks like a good schedule to me. We'll probably struggle to get some > PRs marked for 1.2.0 merged, but that's always the case. > > Tyler, if you send me your PyPI username I can give you permissions to > create releases. > > Cheers, > Ralf > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sat Oct 27 15:41:22 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 27 Oct 2018 15:41:22 -0400 Subject: [SciPy-Dev] matrix_sqrt for singular symmetric square matrices Message-ID: I needed a more general version of a matrix square root for GAM in statsmodels. https://github.com/statsmodels/statsmodels/pull/5296 Is there already something like this in numpy/scipy land? Would there be interest in adding something like this? Improvements to my implementation? 
(threshold is not in terms of rcond, because I have more intuition about small eigen values.) (I still don't consider myself to be a linalg expert.) Josef def matrix_sqrt(mat, inverse=False, full=False, nullspace=False, threshold=1e-15): """matrix square root for symmetric matrices Usage is for decomposing a covariance function S into a square root R such that R' R = S if inverse is False, or R' R = pinv(S) if inverse is True Parameters ---------- mat : array_like, 2-d square symmetric square matrix for which square root or inverse square root is computed. There is no checking for whether the matrix is symmetric. A warning is issued if some singular values are negative, i.e. below the negative of the threshold. inverse : bool If False (default), then the matrix square root is returned. If inverse is True, then the matrix square root of the inverse matrix is returned. full : bool If full is False (default, then the square root has reduce number of rows if the matrix is singular, i.e. has singular values below the threshold. nullspace: bool If nullspace is true, then the matrix square root of the null space of the matrix is returned. threshold : float Singular values below the threshold are dropped. Returns ------- msqrt : ndarray matrix square root or square root of inverse matrix. """ # see also scipy.linalg null_space u, s, v = np.linalg.svd(mat) if np.any(s < -threshold): import warnings warnings.warn('some singular values are negative') if not nullspace: mask = s > threshold s[s < threshold] = 0 else: mask = s < threshold s[s > threshold] = 0 sqrt_s = np.sqrt(s[mask]) if inverse: sqrt_s = 1 / np.sqrt(s[mask]) if full: b = np.dot(u[:, mask], np.dot(np.diag(sqrt_s), v[mask])) else: b = np.dot(np.diag(sqrt_s), v[mask]) return b -------------- next part -------------- An HTML attachment was scrubbed... URL: From phillip.m.feldman at gmail.com Sat Oct 27 16:01:29 2018 From: phillip.m.feldman at gmail.com (Phillip Feldman) Date: Sat, 27 Oct 2018 13:01:29 -0700 Subject: [SciPy-Dev] matrix_sqrt for singular symmetric square matrices In-Reply-To: References: Message-ID: A matrix square-root would be very useful. As noted here, the matrix square-root is typically non-unique: https://en.wikipedia.org/wiki/Square_root_of_a_matrix Since NumPy has methods for eigenvalues, calculating a matrix square-root via diagonalization should be straightforward. Phillip On Sat, Oct 27, 2018 at 12:42 PM wrote: > I needed a more general version of a matrix square root for GAM in > statsmodels. > https://github.com/statsmodels/statsmodels/pull/5296 > > Is there already something like this in numpy/scipy land? > Would there be interest in adding something like this? > Improvements to my implementation? > (threshold is not in terms of rcond, because I have more intuition about > small eigen values.) > > (I still don't consider myself to be a linalg expert.) > > Josef > > def matrix_sqrt(mat, inverse=False, full=False, nullspace=False, threshold=1e-15): > """matrix square root for symmetric matrices > > Usage is for decomposing a covariance function S into a square root R > such that > > R' R = S if inverse is False, or > R' R = pinv(S) if inverse is True > > Parameters > ---------- > mat : array_like, 2-d square > symmetric square matrix for which square root or inverse square > root is computed. > There is no checking for whether the matrix is symmetric. > A warning is issued if some singular values are negative, i.e. > below the negative of the threshold. 
> inverse : bool > If False (default), then the matrix square root is returned. > If inverse is True, then the matrix square root of the inverse > matrix is returned. > full : bool > If full is False (default, then the square root has reduce number > of rows if the matrix is singular, i.e. has singular values below > the threshold. > nullspace: bool > If nullspace is true, then the matrix square root of the null space > of the matrix is returned. > threshold : float > Singular values below the threshold are dropped. > > Returns > ------- > msqrt : ndarray > matrix square root or square root of inverse matrix. > > """ > # see also scipy.linalg null_space > u, s, v = np.linalg.svd(mat) > if np.any(s < -threshold): > import warnings > warnings.warn('some singular values are negative') > > if not nullspace: > mask = s > threshold > s[s < threshold] = 0 > else: > mask = s < threshold > s[s > threshold] = 0 > > sqrt_s = np.sqrt(s[mask]) > if inverse: > sqrt_s = 1 / np.sqrt(s[mask]) > > if full: > b = np.dot(u[:, mask], np.dot(np.diag(sqrt_s), v[mask])) > else: > b = np.dot(np.diag(sqrt_s), v[mask]) > return b > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sat Oct 27 16:16:58 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 27 Oct 2018 16:16:58 -0400 Subject: [SciPy-Dev] matrix_sqrt for singular symmetric square matrices In-Reply-To: References: Message-ID: On Sat, Oct 27, 2018 at 4:01 PM Phillip Feldman wrote: > A matrix square-root would be very useful. As noted here, the matrix > square-root is typically non-unique: > > https://en.wikipedia.org/wiki/Square_root_of_a_matrix > Which basis is chosen is irrelevant in most use cases that I know. So the non-uniqueness can be broken in a algorithm specific way. In the full rank case I have only seen the use of either cholesky or svd. statsmodels includes now the code from a python "rotations" package that can be used when we want to have interpretable components as in principal component or factor analysis. However, in my current application the matrix sqrt eventually ends up in quadratic forms or we go back to the original space for the interpretation. Josef > > > Since NumPy has methods for eigenvalues, calculating a matrix square-root > via diagonalization should be straightforward. > > Phillip > > On Sat, Oct 27, 2018 at 12:42 PM wrote: > >> I needed a more general version of a matrix square root for GAM in >> statsmodels. >> https://github.com/statsmodels/statsmodels/pull/5296 >> >> Is there already something like this in numpy/scipy land? >> Would there be interest in adding something like this? >> Improvements to my implementation? >> (threshold is not in terms of rcond, because I have more intuition about >> small eigen values.) >> >> (I still don't consider myself to be a linalg expert.) >> >> Josef >> >> def matrix_sqrt(mat, inverse=False, full=False, nullspace=False, threshold=1e-15): >> """matrix square root for symmetric matrices >> >> Usage is for decomposing a covariance function S into a square root R >> such that >> >> R' R = S if inverse is False, or >> R' R = pinv(S) if inverse is True >> >> Parameters >> ---------- >> mat : array_like, 2-d square >> symmetric square matrix for which square root or inverse square >> root is computed. >> There is no checking for whether the matrix is symmetric. 
>> A warning is issued if some singular values are negative, i.e. >> below the negative of the threshold. >> inverse : bool >> If False (default), then the matrix square root is returned. >> If inverse is True, then the matrix square root of the inverse >> matrix is returned. >> full : bool >> If full is False (default, then the square root has reduce number >> of rows if the matrix is singular, i.e. has singular values below >> the threshold. >> nullspace: bool >> If nullspace is true, then the matrix square root of the null space >> of the matrix is returned. >> threshold : float >> Singular values below the threshold are dropped. >> >> Returns >> ------- >> msqrt : ndarray >> matrix square root or square root of inverse matrix. >> >> """ >> # see also scipy.linalg null_space >> u, s, v = np.linalg.svd(mat) >> if np.any(s < -threshold): >> import warnings >> warnings.warn('some singular values are negative') >> >> if not nullspace: >> mask = s > threshold >> s[s < threshold] = 0 >> else: >> mask = s < threshold >> s[s > threshold] = 0 >> >> sqrt_s = np.sqrt(s[mask]) >> if inverse: >> sqrt_s = 1 / np.sqrt(s[mask]) >> >> if full: >> b = np.dot(u[:, mask], np.dot(np.diag(sqrt_s), v[mask])) >> else: >> b = np.dot(np.diag(sqrt_s), v[mask]) >> return b >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at python.org >> https://mail.python.org/mailman/listinfo/scipy-dev >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Sat Oct 27 16:52:42 2018 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 27 Oct 2018 22:52:42 +0200 Subject: [SciPy-Dev] matrix_sqrt for singular symmetric square matrices In-Reply-To: References: Message-ID: <4dab76941a12e5d7c946ceb23203da38b9643708.camel@iki.fi> la, 2018-10-27 kello 15:41 -0400, josef.pktd at gmail.com kirjoitti: > I needed a more general version of a matrix square root for GAM in > statsmodels. > > > Is there already something like this in numpy/scipy land? > Would there be interest in adding something like this? > Improvements to my implementation? > (threshold is not in terms of rcond, because I have more intuition > about small eigen values.) For matrix square root there is: https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.sqrtm.html which IIRC uses a good algorithm. But matrix square root R is the solution to R^2 = S --- the solution to L' L = S is given by (conjugate of) Cholesky decomposition, https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.linalg.cholesky.html if I understand what you want correctly. Pauli > > (I still don't consider myself to be a linalg expert.) > > Josef > > def matrix_sqrt(mat, inverse=False, full=False, nullspace=False, > threshold=1e-15): > """matrix square root for symmetric matrices > > Usage is for decomposing a covariance function S into a square > root R > such that > > R' R = S if inverse is False, or > R' R = pinv(S) if inverse is True > > Parameters > ---------- > mat : array_like, 2-d square > symmetric square matrix for which square root or inverse > square > root is computed. > There is no checking for whether the matrix is symmetric. > A warning is issued if some singular values are negative, > i.e. > below the negative of the threshold. > inverse : bool > If False (default), then the matrix square root is returned. 
> If inverse is True, then the matrix square root of the > inverse > matrix is returned. > full : bool > If full is False (default, then the square root has reduce > number > of rows if the matrix is singular, i.e. has singular values > below > the threshold. > nullspace: bool > If nullspace is true, then the matrix square root of the null > space > of the matrix is returned. > threshold : float > Singular values below the threshold are dropped. > > Returns > ------- > msqrt : ndarray > matrix square root or square root of inverse matrix. > > """ > # see also scipy.linalg null_space > u, s, v = np.linalg.svd(mat) > if np.any(s < -threshold): > import warnings > warnings.warn('some singular values are negative') > > if not nullspace: > mask = s > threshold > s[s < threshold] = 0 > else: > mask = s < threshold > s[s > threshold] = 0 > > sqrt_s = np.sqrt(s[mask]) > if inverse: > sqrt_s = 1 / np.sqrt(s[mask]) > > if full: > b = np.dot(u[:, mask], np.dot(np.diag(sqrt_s), v[mask])) > else: > b = np.dot(np.diag(sqrt_s), v[mask]) > return b > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev From pav at iki.fi Sat Oct 27 17:08:32 2018 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 27 Oct 2018 23:08:32 +0200 Subject: [SciPy-Dev] matrix_sqrt for singular symmetric square matrices In-Reply-To: <4dab76941a12e5d7c946ceb23203da38b9643708.camel@iki.fi> References: <4dab76941a12e5d7c946ceb23203da38b9643708.camel@iki.fi> Message-ID: la, 2018-10-27 kello 22:52 +0200, Pauli Virtanen kirjoitti: > la, 2018-10-27 kello 15:41 -0400, josef.pktd at gmail.com kirjoitti: > > I needed a more general version of a matrix square root for GAM in > > statsmodels. > > > > > > Is there already something like this in numpy/scipy land? > > Would there be interest in adding something like this? > > Improvements to my implementation? > > (threshold is not in terms of rcond, because I have more intuition > > about small eigen values.) > > For matrix square root there is: > https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.sqrtm.html > which IIRC uses a good algorithm. > > But matrix square root R is the solution to R^2 = S --- the solution > to > L' L = S is given by (conjugate of) Cholesky decomposition, > https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.linalg.cholesky.html > if I understand what you want correctly. If I now read the mail properly, for singular matrices (for which Cholesky won't work), I guess LDL decomposition could also give a starting point https://scipy.github.io/devdocs/generated/scipy.linalg.ldl.html > > (I still don't consider myself to be a linalg expert.) > > > > Josef > > > > def matrix_sqrt(mat, inverse=False, full=False, nullspace=False, > > threshold=1e-15): > > """matrix square root for symmetric matrices > > > > Usage is for decomposing a covariance function S into a square > > root R > > such that > > > > R' R = S if inverse is False, or > > R' R = pinv(S) if inverse is True > > > > Parameters > > ---------- > > mat : array_like, 2-d square > > symmetric square matrix for which square root or inverse > > square > > root is computed. > > There is no checking for whether the matrix is symmetric. > > A warning is issued if some singular values are negative, > > i.e. > > below the negative of the threshold. > > inverse : bool > > If False (default), then the matrix square root is > > returned. 
> > If inverse is True, then the matrix square root of the > > inverse > > matrix is returned. > > full : bool > > If full is False (default, then the square root has reduce > > number > > of rows if the matrix is singular, i.e. has singular values > > below > > the threshold. > > nullspace: bool > > If nullspace is true, then the matrix square root of the > > null > > space > > of the matrix is returned. > > threshold : float > > Singular values below the threshold are dropped. > > > > Returns > > ------- > > msqrt : ndarray > > matrix square root or square root of inverse matrix. > > > > """ > > # see also scipy.linalg null_space > > u, s, v = np.linalg.svd(mat) > > if np.any(s < -threshold): > > import warnings > > warnings.warn('some singular values are negative') > > > > if not nullspace: > > mask = s > threshold > > s[s < threshold] = 0 > > else: > > mask = s < threshold > > s[s > threshold] = 0 > > > > sqrt_s = np.sqrt(s[mask]) > > if inverse: > > sqrt_s = 1 / np.sqrt(s[mask]) > > > > if full: > > b = np.dot(u[:, mask], np.dot(np.diag(sqrt_s), v[mask])) > > else: > > b = np.dot(np.diag(sqrt_s), v[mask]) > > return b > > _______________________________________________ > > SciPy-Dev mailing list > > SciPy-Dev at python.org > > https://mail.python.org/mailman/listinfo/scipy-dev > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev From josef.pktd at gmail.com Sat Oct 27 18:40:58 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 27 Oct 2018 18:40:58 -0400 Subject: [SciPy-Dev] matrix_sqrt for singular symmetric square matrices In-Reply-To: References: <4dab76941a12e5d7c946ceb23203da38b9643708.camel@iki.fi> Message-ID: On Sat, Oct 27, 2018 at 5:10 PM Pauli Virtanen wrote: > la, 2018-10-27 kello 22:52 +0200, Pauli Virtanen kirjoitti: > > la, 2018-10-27 kello 15:41 -0400, josef.pktd at gmail.com kirjoitti: > > > I needed a more general version of a matrix square root for GAM in > > > statsmodels. > > > > > > > > > Is there already something like this in numpy/scipy land? > > > Would there be interest in adding something like this? > > > Improvements to my implementation? > > > (threshold is not in terms of rcond, because I have more intuition > > > about small eigen values.) > > > > For matrix square root there is: > > > https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.sqrtm.html > > which IIRC uses a good algorithm. > I could never figure out a use for those. AFAIU (9 or 10 years ago) they don't use the transpose R' R = S they use R R = S > > > > But matrix square root R is the solution to R^2 = S --- the solution > > to > > L' L = S is given by (conjugate of) Cholesky decomposition, > > > https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.linalg.cholesky.html > > if I understand what you want correctly. > > If I now read the mail properly, for singular matrices (for which > Cholesky won't work), I guess LDL decomposition could also give a > starting point > https://scipy.github.io/devdocs/generated/scipy.linalg.ldl.html > > The matrix is in most cases singular in my current usage for penalized splines. (The null space is the subspace that is not penalized.) statsmodels uses cholesky in almost all full rank cases. Josef > > > > (I still don't consider myself to be a linalg expert.) 
> > > > > > Josef > > > > > > def matrix_sqrt(mat, inverse=False, full=False, nullspace=False, > > > threshold=1e-15): > > > """matrix square root for symmetric matrices > > > > > > Usage is for decomposing a covariance function S into a square > > > root R > > > such that > > > > > > R' R = S if inverse is False, or > > > R' R = pinv(S) if inverse is True > > > > > > Parameters > > > ---------- > > > mat : array_like, 2-d square > > > symmetric square matrix for which square root or inverse > > > square > > > root is computed. > > > There is no checking for whether the matrix is symmetric. > > > A warning is issued if some singular values are negative, > > > i.e. > > > below the negative of the threshold. > > > inverse : bool > > > If False (default), then the matrix square root is > > > returned. > > > If inverse is True, then the matrix square root of the > > > inverse > > > matrix is returned. > > > full : bool > > > If full is False (default, then the square root has reduce > > > number > > > of rows if the matrix is singular, i.e. has singular values > > > below > > > the threshold. > > > nullspace: bool > > > If nullspace is true, then the matrix square root of the > > > null > > > space > > > of the matrix is returned. > > > threshold : float > > > Singular values below the threshold are dropped. > > > > > > Returns > > > ------- > > > msqrt : ndarray > > > matrix square root or square root of inverse matrix. > > > > > > """ > > > # see also scipy.linalg null_space > > > u, s, v = np.linalg.svd(mat) > > > if np.any(s < -threshold): > > > import warnings > > > warnings.warn('some singular values are negative') > > > > > > if not nullspace: > > > mask = s > threshold > > > s[s < threshold] = 0 > > > else: > > > mask = s < threshold > > > s[s > threshold] = 0 > > > > > > sqrt_s = np.sqrt(s[mask]) > > > if inverse: > > > sqrt_s = 1 / np.sqrt(s[mask]) > > > > > > if full: > > > b = np.dot(u[:, mask], np.dot(np.diag(sqrt_s), v[mask])) > > > else: > > > b = np.dot(np.diag(sqrt_s), v[mask]) > > > return b > > > _______________________________________________ > > > SciPy-Dev mailing list > > > SciPy-Dev at python.org > > > https://mail.python.org/mailman/listinfo/scipy-dev > > > > _______________________________________________ > > SciPy-Dev mailing list > > SciPy-Dev at python.org > > https://mail.python.org/mailman/listinfo/scipy-dev > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Sun Oct 28 03:59:06 2018 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sun, 28 Oct 2018 08:59:06 +0100 Subject: [SciPy-Dev] matrix_sqrt for singular symmetric square matrices In-Reply-To: <20181028075637.d2mhsekvxw6sswi7@phare.normalesup.org> References: <20181028075637.d2mhsekvxw6sswi7@phare.normalesup.org> Message-ID: <20181028075906.p5zi3mhmrkwxduzl@phare.normalesup.org> On Sun, Oct 28, 2018 at 08:56:37AM +0100, Gael Varoquaux wrote: > using '@' to denote the matrix product, and S = np.diag(s) is the > diagonal matrix of eigenvalues. 
The matrix square root is then given by: > sqrt(M) = U' @ np.diag(np.sqrt(s)) @ U I forgot to say: this is the definition used by Joseph, in his original post (so basically, I am backing his choice), with the only difference that I would not use an SVD, but and "eigh", which should be faster and more stable for SPD matrices (and non SPD matrices do not have a square root). G From gael.varoquaux at normalesup.org Sun Oct 28 03:56:37 2018 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sun, 28 Oct 2018 08:56:37 +0100 Subject: [SciPy-Dev] matrix_sqrt for singular symmetric square matrices In-Reply-To: References: Message-ID: <20181028075637.d2mhsekvxw6sswi7@phare.normalesup.org> On Sat, Oct 27, 2018 at 04:16:58PM -0400, josef.pktd at gmail.com wrote: > A matrix square-root would be very useful.? As noted here, the matrix > square-root is typically non-unique: > https://en.wikipedia.org/wiki/Square_root_of_a_matrix > Which basis is chosen is irrelevant in most use cases that I know. In some case, the following considerations matter: SDP matrices (symmetric definite positive matrices) form an algebraic structure linked to a group. The square root matrix on this group should be also SDP, hence it should be also symmetric. The easiest way to build it is using the eigen-value decomposition of the original matrix: M = U' @ S @ U using '@' to denote the matrix product, and S = np.diag(s) is the diagonal matrix of eigenvalues. The matrix square root is then given by: sqrt(M) = U' @ np.diag(np.sqrt(s)) @ U Unlike with using a Cholesky decomposition to obtain a square-root matrix, the operation defined above is smooth. Such considerations are typically important in signal processing, to manipulate covariance matrices for whitening and averaging. Ga?l From ilhanpolat at gmail.com Sun Oct 28 09:11:03 2018 From: ilhanpolat at gmail.com (Ilhan Polat) Date: Sun, 28 Oct 2018 14:11:03 +0100 Subject: [SciPy-Dev] matrix_sqrt for singular symmetric square matrices In-Reply-To: <20181028075906.p5zi3mhmrkwxduzl@phare.normalesup.org> References: <20181028075637.d2mhsekvxw6sswi7@phare.normalesup.org> <20181028075906.p5zi3mhmrkwxduzl@phare.normalesup.org> Message-ID: This is covered by LDLt decomposition with an extra step of taking the square root of each block in D. The economy mode of this would be removing rows/cols 0 blocks from D and L. For the second case I think a polar decomposition would be a better approach. Calling these factors a square root might take you out of the common terminology though On Sun, Oct 28, 2018 at 9:00 AM Gael Varoquaux < gael.varoquaux at normalesup.org> wrote: > On Sun, Oct 28, 2018 at 08:56:37AM +0100, Gael Varoquaux wrote: > > using '@' to denote the matrix product, and S = np.diag(s) is the > > diagonal matrix of eigenvalues. The matrix square root is then given by: > > > sqrt(M) = U' @ np.diag(np.sqrt(s)) @ U > > I forgot to say: this is the definition used by Joseph, in his original > post (so basically, I am backing his choice), with the only difference > that I would not use an SVD, but and "eigh", which should be faster and > more stable for SPD matrices (and non SPD matrices do not have a square > root). > > G > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From josef.pktd at gmail.com Sun Oct 28 10:02:07 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 28 Oct 2018 10:02:07 -0400 Subject: [SciPy-Dev] matrix_sqrt for singular symmetric square matrices In-Reply-To: References: <20181028075637.d2mhsekvxw6sswi7@phare.normalesup.org> <20181028075906.p5zi3mhmrkwxduzl@phare.normalesup.org> Message-ID: On Sun, Oct 28, 2018 at 9:11 AM Ilhan Polat wrote: > This is covered by LDLt decomposition with an extra step of taking the > square root of each block in D. The economy mode of this would be removing > rows/cols 0 blocks from D and L. For the second case I think a polar > decomposition would be a better approach. > > Calling these factors a square root might take you out of the common > terminology though > It's common terminology in statistics and econometrics with notation like R = S^{1/2} or Q = S^{-1/2} (In a related case where statsmodels uses cholesky for regression we have boring but descriptive names like cholsigma and cholsigmainv ) It would be fine if there is a "linalg" name that is discoverable. The point would be to have a function for when cholesky doesn't work and that I don't have to figure out each time I need it. I had also written a similar function as `matrix_half` in the past and inlined it several times, but it's the first time that I tried to get a reduced number of rows or columns for it. My usecase is similar to np.linalg.pinv that I use very often because it is convenient and simple even if working with the SVD directly would be computationally more efficient for getting additional results. Josef > > On Sun, Oct 28, 2018 at 9:00 AM Gael Varoquaux < > gael.varoquaux at normalesup.org> wrote: > >> On Sun, Oct 28, 2018 at 08:56:37AM +0100, Gael Varoquaux wrote: >> > using '@' to denote the matrix product, and S = np.diag(s) is the >> > diagonal matrix of eigenvalues. The matrix square root is then given by: >> >> > sqrt(M) = U' @ np.diag(np.sqrt(s)) @ U >> >> I forgot to say: this is the definition used by Joseph, in his original >> post (so basically, I am backing his choice), with the only difference >> that I would not use an SVD, but and "eigh", which should be faster and >> more stable for SPD matrices (and non SPD matrices do not have a square >> root). >> >> G >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at python.org >> https://mail.python.org/mailman/listinfo/scipy-dev >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sun Oct 28 10:08:26 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 28 Oct 2018 10:08:26 -0400 Subject: [SciPy-Dev] matrix_sqrt for singular symmetric square matrices In-Reply-To: References: <20181028075637.d2mhsekvxw6sswi7@phare.normalesup.org> <20181028075906.p5zi3mhmrkwxduzl@phare.normalesup.org> Message-ID: On Sun, Oct 28, 2018 at 10:02 AM wrote: > > > On Sun, Oct 28, 2018 at 9:11 AM Ilhan Polat wrote: > >> This is covered by LDLt decomposition with an extra step of taking the >> square root of each block in D. The economy mode of this would be removing >> rows/cols 0 blocks from D and L. For the second case I think a polar >> decomposition would be a better approach. 
>> >> Calling these factors a square root might take you out of the common >> terminology though >> > > It's common terminology in statistics and econometrics with notation like > R = S^{1/2} or Q = S^{-1/2} > > (In a related case where statsmodels uses cholesky for regression we have > boring but descriptive names like > cholsigma and cholsigmainv ) > > It would be fine if there is a "linalg" name that is discoverable. > The point would be to have a function for when cholesky doesn't work and > that I don't have to figure out each time I need it. > I had also written a similar function as `matrix_half` in the past and > inlined it several times, but it's the first time that I tried to get a > reduced number of rows or columns for it. > As extra: This time I also needed to figure out the null space version for matrix_sqrt. I don't use it yet, but the R package (mgcv) that I compare with uses it for additional penalization. Josef > > My usecase is similar to np.linalg.pinv that I use very often because it > is convenient and simple even if working with the SVD directly would be > computationally more efficient for getting additional results. > > Josef > > >> >> On Sun, Oct 28, 2018 at 9:00 AM Gael Varoquaux < >> gael.varoquaux at normalesup.org> wrote: >> >>> On Sun, Oct 28, 2018 at 08:56:37AM +0100, Gael Varoquaux wrote: >>> > using '@' to denote the matrix product, and S = np.diag(s) is the >>> > diagonal matrix of eigenvalues. The matrix square root is then given >>> by: >>> >>> > sqrt(M) = U' @ np.diag(np.sqrt(s)) @ U >>> >>> I forgot to say: this is the definition used by Joseph, in his original >>> post (so basically, I am backing his choice), with the only difference >>> that I would not use an SVD, but and "eigh", which should be faster and >>> more stable for SPD matrices (and non SPD matrices do not have a square >>> root). >>> >>> G >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at python.org >>> https://mail.python.org/mailman/listinfo/scipy-dev >>> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at python.org >> https://mail.python.org/mailman/listinfo/scipy-dev >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Sun Oct 28 10:12:43 2018 From: pav at iki.fi (Pauli Virtanen) Date: Sun, 28 Oct 2018 15:12:43 +0100 Subject: [SciPy-Dev] matrix_sqrt for singular symmetric square matrices In-Reply-To: <20181028075637.d2mhsekvxw6sswi7@phare.normalesup.org> References: <20181028075637.d2mhsekvxw6sswi7@phare.normalesup.org> Message-ID: <5520e7cc6fa3eea0da449a51ca3e24804f132786.camel@iki.fi> su, 2018-10-28 kello 08:56 +0100, Gael Varoquaux kirjoitti: > On Sat, Oct 27, 2018 at 04:16:58PM -0400, josef.pktd at gmail.com wrote: > > A matrix square-root would be very useful. As noted here, the > > matrix > > square-root is typically non-unique: > > https://en.wikipedia.org/wiki/Square_root_of_a_matrix > > Which basis is chosen is irrelevant in most use cases that I know. > > In some case, the following considerations matter: > > SDP matrices (symmetric definite positive matrices) form an algebraic > structure linked to a group. The square root matrix on this group > should > be also SDP, hence it should be also symmetric. Right indeed if R=R' the two equations are the same. scipy.linalg.sqrtm computes the principal matrix square root, which then should be SDP too. It can handle singular cases. 
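To make the two routes concrete, a minimal sketch (not code from this thread; the rank tolerance and the reduced "economy" factor are illustrative choices only, and the matrix_sqrt/matrix_half names mentioned earlier are not existing scipy functions) could look like:

import numpy as np
from scipy import linalg

# Singular symmetric PSD test matrix: rank 2 in dimension 4.
rng = np.random.RandomState(0)
A = rng.randn(4, 2)
M = A @ A.T

# Route 1: principal square root from the Schur-based solver
# (may be less clear-cut for exactly singular input, as discussed here).
S_schur = linalg.sqrtm(M)

# Route 2: symmetric square root from eigh; note that eigh returns
# M = U @ diag(w) @ U.T, so the transposes sit on the other side
# compared with the notation earlier in the thread.
w, U = linalg.eigh(M)
w = np.clip(w, 0.0, None)            # round-off can leave tiny negative eigenvalues
S_eigh = (U * np.sqrt(w)) @ U.T      # same as U @ np.diag(np.sqrt(w)) @ U.T

# Reduced ("economy") factor with one column per retained eigenvalue,
# so R has fewer columns but still satisfies R @ R.T == M.
keep = w > w.max() * 1e-12           # illustrative rank tolerance
R = U[:, keep] * np.sqrt(w[keep])

print(np.allclose(S_eigh @ S_eigh, M))   # True
print(np.allclose(R @ R.T, M))           # True
print(R.shape)                           # (4, 2)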
I would guess the algorithm in the presence of eigenvalues on the negative real line gives "continuous" behavior, i.e., you get the principal square root of a matrix with eigenvalues pushed to either side of the branch cut in some (uncontrolled) way giving either of the +/- i sqrt(|z|) and the result probably is close to Hermitian still in the same way as the eigenvalue decomposition. So for the question whether there's something to add to scipy here, I'm not so sure --- computation of the principal sqrtm, Cholesky, LDLt, and eigenvalue decomposition is there. Pauli > The easiest way to build > it is using the eigen-value decomposition of the original matrix: > > M = U' @ S @ U > > using '@' to denote the matrix product, and S = np.diag(s) is the > diagonal matrix of eigenvalues. The matrix square root is then given > by: > > sqrt(M) = U' @ np.diag(np.sqrt(s)) @ U > > Unlike with using a Cholesky decomposition to obtain a square-root > matrix, the operation defined above is smooth. > > Such considerations are typically important in signal processing, to > manipulate covariance matrices for whitening and averaging. > > Gaël > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev From gael.varoquaux at normalesup.org Mon Oct 29 06:34:14 2018 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Mon, 29 Oct 2018 11:34:14 +0100 Subject: [SciPy-Dev] matrix_sqrt for singular symmetric square matrices In-Reply-To: <5520e7cc6fa3eea0da449a51ca3e24804f132786.camel@iki.fi> References: <20181028075637.d2mhsekvxw6sswi7@phare.normalesup.org> <5520e7cc6fa3eea0da449a51ca3e24804f132786.camel@iki.fi> Message-ID: <20181029103414.fuyh3ygxwfdpsqcp@phare.normalesup.org> On Sun, Oct 28, 2018 at 03:12:43PM +0100, Pauli Virtanen wrote: > Right indeed if R=R' the two equations are the same. scipy.linalg.sqrtm > computes the principal matrix square root, which then should be SDP > too. Indeed, it probably works for our needs. I had forgotten about this function. Thank you. > So for the question whether there's something to add to scipy here, I'm > not so sure --- computation of the principal sqrtm, Cholesky, LDLt, and > eigenvalue decomposition is there. Me neither. The question is whether using eigh is more numerically stable than the Schur decomposition used in scipy.linalg.sqrtm. I do not know, I must admit. Gaël From rlucente at pipeline.com Mon Oct 29 18:01:09 2018 From: rlucente at pipeline.com (rlucente at pipeline.com) Date: Mon, 29 Oct 2018 18:01:09 -0400 Subject: [SciPy-Dev] Using Spark to scale SciPy? Message-ID: <047c01d46fd2$e6514c90$b2f3e5b0$@pipeline.com> I ran into a blog post titled Prediction at Scale with scikit-learn and PySpark Pandas UDFs by Michael Heilman https://medium.com/civis-analytics/prediction-at-scale-with-scikit-learn-and-pyspark-pandas-udfs-51d5ebfb2cd8 It seems to do a good job because it makes statements like One issue is that passing data between a) Java-based Spark execution processes, which send data between machines and can perform transformations super-efficiently, and b) a Python process (e.g., for predicting with scikit-learn) incurs some overhead due to serialization and inter-process communication. The article goes on to mention approaches to mitigate the above issues.
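To give a feel for the Arrow-backed pandas-UDF pattern the post describes, a rough sketch (only loosely modeled on the linked gist; the local-mode session, column names, and toy model below are invented for illustration, and Spark 2.3+ with PyArrow installed is assumed) might be:

import numpy as np
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf, PandasUDFType
from sklearn.linear_model import LogisticRegression

spark = (SparkSession.builder
         .master("local[2]")
         .appName("sklearn-scoring")
         .getOrCreate())
spark.conf.set("spark.sql.execution.arrow.enabled", "true")   # Arrow-backed batches

# Toy model fitted on the driver; a real workflow would load a pre-trained one.
rng = np.random.RandomState(0)
X = rng.normal(size=(1000, 2))
y = (X.sum(axis=1) > 0).astype(int)
model = LogisticRegression(solver="lbfgs").fit(X, y)

# A Spark DataFrame holding the two (made-up) feature columns.
sdf = spark.createDataFrame(pd.DataFrame(X, columns=["f0", "f1"]))

# Scalar pandas UDF: each batch arrives as pandas Series, so the
# scikit-learn call is vectorized instead of applied row by row.
@pandas_udf("double", PandasUDFType.SCALAR)
def score(f0, f1):
    features = np.column_stack([f0.values, f1.values])
    return pd.Series(model.predict_proba(features)[:, 1])

sdf.withColumn("score", score("f0", "f1")).show(5)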
For example: Having UDFs expect Pandas Series also saves converting between Python and NumPy floating point representations for scikit-learn There was actually a SciPy talk on this Performing Dimension Reduction at Scale with Applications to Public Sentiment Models | SciPy 2018 https://www.youtube.com/watch?v=31YeSfDklfc The other good thing is that it has code https://gist.github.com/mheilman/6ce261549b55bf4997ec102ad4e8d643#file-pyspark_pandas_udf_sklearn-ipynb I was wondering what approaches people have used to scale SciPy? The Bit Plumber -------------- next part -------------- An HTML attachment was scrubbed... URL: From rth.yurchak at pm.me Tue Oct 30 19:56:15 2018 From: rth.yurchak at pm.me (Roman Yurchak) Date: Tue, 30 Oct 2018 23:56:15 +0000 Subject: [SciPy-Dev] building scipy for WebAssembly Message-ID: Hello, I am currently working on building scipy for WebAssembly as part of the pyodide project (https://github.com/iodide-project/pyodide) and I was hoping for some feedback on that process. There is a preliminary build in https://github.com/iodide-project/pyodide/pull/211 Currently, this build uses scipy 0.17.1 as, from what I understood, that was one of the last versions that only included f77 without any f90 (https://github.com/scipy/scipy/issues/2829#issuecomment-223764054). In the WebAssembly environment there is currently no reliably working Fortran compiler (https://github.com/iodide-project/pyodide/issues/184), and f90 cannot be converted to C with f2c unlike f77. If one wanted to (experimentally) compile the latest version of scipy without a Fortran compiler, what would be your suggestions? i.e. - are there any alternatives to f2c that you think might work for f90 - or any automatic converter from f90 to f77 (so that f2c could be used)? I did search but maybe I missed something. Alternatively, if that's really not realistic, could someone please comment on the rate of adoption for f90/f95 in the scipy code base? The decision to support f90 was taken before the 0.18 release in 2016 but I'm not sure what impact it had on the code base. In other words, maybe there is a later version than 0.17.1 that might (mostly) work? Another point is linking BLAS/LAPACK. Currently reference BLAS and CLAPACK are linked statically as I haven't managed to do this dynamically yet. The issue is the package size: LAPACK, which is quite large, gets repeatedly included in around ~10 different .so modules, resulting in a 170MB package (after compression), as opposed to a ~30MB compressed package without BLAS/LAPACK. That is quite problematic when one is expected to download the dependencies at each page load (excluding caching). I'm not sure if there are other distributions of scipy that use static linking of LAPACK, or other things worth trying to reduce the package size, short of trying to make dynamic linking work or to detect and strip unused symbols? Also in scipy/linalg/setup.py I was wondering why/how the ATLAS_INFO macro defined the existence of `scipy.linalg._clapack`? For instance, when using CLAPACK (with libf2c), would the following be correct? lapack_opt = {'libraries': ['f2c', 'blas', 'lapack'], 'include_dirs': [], 'library_dirs': [''], 'language': 'f77', 'define_macros': [('NO_ATLAS_INFO', 1), ('HAVE_CBLAS', None)]} which does not appear to build `scipy.linalg._clapack`. Finally, in later scipy versions, https://scipy.github.io/devdocs/building/linux.html suggests that with "export LAPACK="
one can build scipy without LAPACK, but then _flapack is still a mandatory import in scipy.linalg.__init__ (via .lapack) as far as I can tell. Is scipy.linalg (and anything that depends on it) expected to produce import errors in the latter case then? In general, any comments or suggestions would be appreciated, Thank you, Roman From tom.w.augspurger at gmail.com Wed Oct 31 15:26:51 2018 From: tom.w.augspurger at gmail.com (Tom Augspurger) Date: Wed, 31 Oct 2018 12:26:51 -0700 Subject: [SciPy-Dev] Using Spark to scale SciPy? In-Reply-To: <047c01d46fd2$e6514c90$b2f3e5b0$@pipeline.com> References: <047c01d46fd2$e6514c90$b2f3e5b0$@pipeline.com> Message-ID: On Mon, Oct 29, 2018 at 3:17 PM wrote: > I ran into a blog post titled > > > > Prediction at Scale with scikit-learn and PySpark Pandas UDFs by Michael > Heilman > > > > > https://medium.com/civis-analytics/prediction-at-scale-with-scikit-learn-and-pyspark-pandas-udfs-51d5ebfb2cd8 > > > > It seems to do a good job because it makes statements like > > > > One issue is that passing data between > > a) Java-based Spark execution processes, which send data between machines > and can perform transformations super-efficiently, > > and > > b) a Python process (e.g., for predicting with scikit-learn) > > incurs some overhead due to serialization and inter-process communication. > > > > The article goes on to mention approaches to mitigate the above issues. > > > > For example: Having UDFs expect Pandas Series also saves converting > between Python and NumPy floating point representations for scikit-learn > > > > There was actually a SciPy talk on this > > Performing Dimension Reduction at Scale with Applications to Public > Sentiment Models | SciPy 2018 > > https://www.youtube.com/watch?v=31YeSfDklfc > > > > The other good thing is that it has code > > > > > https://gist.github.com/mheilman/6ce261549b55bf4997ec102ad4e8d643#file-pyspark_pandas_udf_sklearn-ipynb > > > > I was wondering what approaches people have used to scale SciPy? > http://examples.dask.org/machine-learning/parallel-prediction.html has an example similar to the Civis blog post, but uses Dask instead of Spark. > The Bit Plumber > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL:
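As a counterpart to the Spark approach above, a minimal sketch of chunk-wise parallel prediction with Dask (an illustrative variant, not the code from the linked Dask example; the chunk sizes and toy model are made up):

import dask
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy model standing in for a real fitted estimator.
rng = np.random.RandomState(0)
X_train = rng.normal(size=(1000, 10))
y_train = (X_train[:, 0] > 0).astype(int)
model = LogisticRegression(solver="lbfgs").fit(X_train, y_train)

# Pretend each chunk is a partition living on a worker; here they are random.
chunks = [rng.normal(size=(50_000, 10)) for _ in range(20)]

# Score every chunk in parallel; Dask schedules the scikit-learn calls
# across threads, processes or cluster workers depending on the scheduler.
lazy = [dask.delayed(model.predict)(chunk) for chunk in chunks]
predictions = np.concatenate(dask.compute(*lazy))
print(predictions.shape)   # (1000000,)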