From ilhanpolat at gmail.com Wed Oct 3 15:25:42 2018 From: ilhanpolat at gmail.com (Ilhan Polat) Date: Wed, 3 Oct 2018 21:25:42 +0200 Subject: [SciPy-Dev] Scipy "misc" and "interpolate" function deprecations Message-ID: Hi everyone, As given in the docstrings of many functions http://scipy.github.io/devdocs/misc.html we have deprecated the functions in the misc module and the PR is ready. https://github.com/scipy/scipy/pull/9325 Evgeni also reminded the "interpolate" funcs which are deprecated even earlier and also to give a heads-up to the mailing list. If you have any objections about the schedule or any particular detail please let me know so that we can postpone this to 1.3.0 if need be in a timely manner. Best, ilhan -------------- next part -------------- An HTML attachment was scrubbed... URL: From email.the.odonnells at gmail.com Tue Oct 9 03:39:02 2018 From: email.the.odonnells at gmail.com (Kane & Anna O'Donnell) Date: Tue, 9 Oct 2018 20:39:02 +1300 Subject: [SciPy-Dev] Improving spatial.distance code and performance In-Reply-To: <30c6c7dd-fbe6-4cc2-feac-1d5dbbb96b26@gmail.com> References: <30c6c7dd-fbe6-4cc2-feac-1d5dbbb96b26@gmail.com> Message-ID: OK, after a bit of playing around, it actually looks like faiss supports much of what I was intending (including auto-tuning etc.). It's only the l2 norm / dot product, but I figure those cover most use cases anyway. So maybe I'll submit some PRs to piece-by-piece migrate scipy.spatial to cython etc. On Sun, 16 Sep 2018 at 20:26, Kane & Anna O'Donnell < email.the.odonnells at gmail.com> wrote: > Sorry, flann and faiss were just examples (I haven't actually researched > different libraries in depth). > > > ... approximate distances are probably best left to another package it > looks like to me ... If you want to get really fancy, I'd lean towards a > separate package. > > OK, I want to try things which it sounds like scipy won't support, so, > decision made: I'll aim to create a new package. If I actually do it, and > if it's actually 'good', then there'll be a better discussion point for > integrating it (if at all). > > This older discussion on Flann may be relevant: > https://mail.python.org/pipermail/scipy-dev/2011-May/thread.html. It says > Flann only does Euclidean; not sure if that has changed since then. > Regarding a dependency: https://github.com/mariusmuja/flann is basically > inactivate for the last years; we wouldn't depend on it but could consider > vendoring it. However, probably not worth it if it's for one method inside > euclidean only. > > Faiss is still actively developed: > https://github.com/facebookresearch/faiss, and looks like a much better > option than Flann. However, something fast-moving like that which itself > depends on BLAS and has GPU code in it too is not something we'd like to > depend on nor want to vendor. > > Either way, Flann/Faiss is not about a 1:1 Cython translation, but about > new features. We've got the distance metrics; approximate distances are > probably best left to another package it looks like to me. > > On Mon, Sep 10, 2018 at 10:05 AM Mark Alexander Mikofski < > mikofski at berkeley.edu> wrote: > >> I'm very interested to see how a successful cython/performance PR >> progresses from a reviewers standpoint. >> >> On Sun, Sep 9, 2018, 5:10 PM Tyler Reddy >> wrote: >> >>> Good to see some activity / interest in spatial. 
>>> >>> Definitely agree with Ralf's github comments re: using smaller / more >>> tractable PRs -- it really is tough to sit down at night with 30 minutes of >>> free time or whatever and look at a massive diff & not want to give up. >>> >>> I like the idea of making small / targeted / clearly demonstrated >>> performance improvements without overhauling the entire infrastructure >>> first, but maybe that's a controversial view if it just causes too much >>> heterogeneity. >>> >>> Presumably the affected code is all thoroughly covered by unit tests? >>> That's an important pre-requisite to have the confidence to really start >>> making changes, esp. with older low-level code like that. >>> >>> On Thu, 6 Sep 2018 at 02:29, Kane & Anna O'Donnell < >>> email.the.odonnells at gmail.com> wrote: >>> >>>> Hi all, >>>> >>>> TLDR; I'm wondering about a) porting the spatial.distance code to >>>> Cython, and b) adding some performance optimizations, and I'd like the dev >>>> community's input/feedback. >>>> >>>> For context (though I don't really need you to read these to give the >>>> feedback I'm looking for), >>>> >>>> - original issue/proposal: https://github.com/scipy/scipy/issues/9205 >>>> - PR: https://github.com/scipy/scipy/pull/9218 >>>> >>>> Before submitting the PR, I naively thought it was going to be nice >>>> Cython or similar. Turns out it's some pretty old code, that I found pretty >>>> hard to wrap my head around and understand. I eventually figured it out >>>> after spending ages debugging a nastily hidden 'bug', and demonstrated the >>>> performance optimizations, but it prompted the discussion about whether it >>>> was best to port everything to Cython first. >>>> >>>> *Existing stuff to Cython* >>>> >>>> Doing so shouldn't be too hard, and it shouldn't change any >>>> functionality, except to replace the distance functions with their Cython >>>> ones (instead of the current approach, where the distance functions are >>>> actually numpy things, and there's not supported access to the underlying C >>>> stuff). A few possible 'bugs' (as above) should hopefully become non-issues >>>> too. So, it'd be a win for performance (e.g. all the distance functions >>>> will be much faster), and code quality, and future maintainability and >>>> development. However, things to think about: >>>> >>>> - should I just replace like-for-like, or consider some nicer OOP stuff >>>> like e.g. sklearn's approach (which is already Cython)? >>>> https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/neighbors/dist_metrics.pyx >>>> (I'm guessing the reason they rolled their own was because they found scipy >>>> lacking, as above.) In fact, we could largely just copy that file over. Not >>>> sure about the interplay between scipy and scikit learn though. >>>> >>> > It wouldn't be the first time that code is moved from scikit-learn to > scipy, that could make sense. Would be good to see if that makes sense from > scikit-learn's point of view. > > >> - what's the best way to structure a PR? One metric at a time, or the >>>> whole caboodle? >>>> >>> > How about one metric first (easier to review), and then the rest in one go? > > >>>> *Adding performance optimizations and future functionality* >>>> >>>> As per the links, this was initially about providing a nifty >>>> performance hack. It should still be pretty easy to implement. Personally, >>>> I think it makes sense to implement after the Cythonization - unless the >>>> community are against that. 
>>>> >>>> However, there are other possibilities: >>>> >>>> - using 'indices' within calculations. E.g. when doing pdist, it might >>>> pay to use a tree of some description. I also proposed another 'index' to >>>> optimize the 'bail early' approach further (which would, I think, actually >>>> work well with trees too). This would involve more API changes, possibly >>>> significant. >>>> >>> - using approximate searches (e.g. Faiss). My understanding is that >>>> depending on other libraries probably isn't really an option, so I'm not >>>> sure what means. >>>> - maybe other cool approaches like https://github.com/droyed/eucl_dist >>>> - providing a way to 'tune' distance computations to be optimal to your >>>> particular dataset and constraints (e.g. my 'bail early' optimization might >>>> be a lot faster or a bit slower, depending on your data ... or you might be >>>> OK with approximate matching with a 'low' error rate, etc.) >>>> >>> > I think some of these are an option; they'd need to be applicable to all > distance metrics though and not just euclidean or a small subset. In the > mailing list thread I linked to above there was some discussion as well > about using kdtree/balltree. > > >>>> I guess what I'd like to see is a single place where users can get >>>> access to everything related to distance metrics and their uses, including >>>> all sorts of optimizations etc. (possibly for different hardware, and >>>> possibly distributed). To do that well is a pretty big undertaking, and I >>>> don't know whether it's suited to scipy - e.g. maybe scipy doesn't really >>>> care about distance stuff, or only wants to stick with 'simple' distance >>>> metric cases (e.g. a few thousand vectors, etc.). So, maybe it'd be better >>>> to start a completely new python package - which would probably be a lot >>>> easier to develop as I'd have a lot more flexibility (e.g. to depend on >>>> other packages, and not have to worry about breaking the scipy API etc.). >>>> On the other hand (as discussed in the latest comment on the PR), that >>>> might not be best - it might never get used/maintained etc. >>>> >>> > If you want to get really fancy, I'd lean towards a separate package. The > idea of your current PR is in scopy for scipy.spatial though. We'd also be > happy to link to a separate package from the scipy docs. > > Cheers, > Ralf > > > >>>> So, what does the community think is the best approach? I've got too >>>> little context of what scipy is and what it's aiming for, and I don't want >>>> to head off on the wrong tack. Comments on any of the other implied >>>> questions would also be appreciated. >>>> >>>> Thanks, >>>> >>>> kodonnell >>>> >>>> >>>> _______________________________________________ >>>> SciPy-Dev mailing list >>>> SciPy-Dev at python.org >>>> https://mail.python.org/mailman/listinfo/scipy-dev >>>> >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at python.org >>> https://mail.python.org/mailman/listinfo/scipy-dev >>> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at python.org >> https://mail.python.org/mailman/listinfo/scipy-dev >> > > > _______________________________________________ > SciPy-Dev mailing listSciPy-Dev at python.orghttps://mail.python.org/mailman/listinfo/scipy-dev > > > -------------- next part -------------- An HTML attachment was scrubbed... 
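The "bail early" optimization discussed in the thread above amounts to stopping a distance computation as soon as the running sum of squared differences already exceeds a caller-supplied threshold. A rough Python sketch of the idea (hypothetical helper name, not the code from the PR):

import math

def euclidean_bail_early(u, v, threshold):
    # Accumulate squared differences, but give up as soon as the partial sum
    # exceeds threshold**2: the pair can then be discarded without finishing
    # the computation.
    limit = threshold * threshold
    acc = 0.0
    for a, b in zip(u, v):
        diff = a - b
        acc += diff * diff
        if acc > limit:
            return None  # known to be farther apart than `threshold`
    return math.sqrt(acc)

Whether this wins depends on the data and on how tight the threshold is, which is why the thread also discusses making such optimizations tunable.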
URL: From phillip.m.feldman at gmail.com Sun Oct 14 14:19:32 2018 From: phillip.m.feldman at gmail.com (Phillip Feldman) Date: Sun, 14 Oct 2018 11:19:32 -0700 Subject: [SciPy-Dev] Poisson Disk Sampling Message-ID: Does anyone have code that does efficient subrandom sampling of the surface of a sphere? I'm looking, e.g., for an implementation of the algorithm in https://www.cs.ubc.ca/~rbridson/docs/bridson-siggraph07-poissondisk.pdf, or something similar. Thanks! Phillip -------------- next part -------------- An HTML attachment was scrubbed... URL: From andyfaff at gmail.com Sun Oct 14 16:56:23 2018 From: andyfaff at gmail.com (Andrew Nelson) Date: Sun, 14 Oct 2018 22:56:23 +0200 Subject: [SciPy-Dev] Poisson Disk Sampling In-Reply-To: References: Message-ID: Is http://mathworld.wolfram.com/SpherePointPicking.html relevant? This seems relatively simple. On Sun., 14 Oct. 2018, 20:20 Phillip Feldman, wrote: > Does anyone have code that does efficient subrandom sampling of the > surface of a sphere? I'm looking, e.g., for an implementation of the > algorithm in > https://www.cs.ubc.ca/~rbridson/docs/bridson-siggraph07-poissondisk.pdf, > or something similar. > > Thanks! > > Phillip > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tyler.je.reddy at gmail.com Sun Oct 14 19:31:33 2018 From: tyler.je.reddy at gmail.com (Tyler Reddy) Date: Sun, 14 Oct 2018 16:31:33 -0700 Subject: [SciPy-Dev] Poisson Disk Sampling In-Reply-To: References: Message-ID: Yeah, that's an excellent page--many of the approaches are implemented on Stack Overflow too -- I used a few of them while developing SphericalVoronoi. On Sun, 14 Oct 2018 at 13:57, Andrew Nelson wrote: > Is http://mathworld.wolfram.com/SpherePointPicking.html relevant? This > seems relatively simple. > > On Sun., 14 Oct. 2018, 20:20 Phillip Feldman, > wrote: > >> Does anyone have code that does efficient subrandom sampling of the >> surface of a sphere? I'm looking, e.g., for an implementation of the >> algorithm in >> https://www.cs.ubc.ca/~rbridson/docs/bridson-siggraph07-poissondisk.pdf, >> or something similar. >> >> Thanks! >> >> Phillip >> >> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at python.org >> https://mail.python.org/mailman/listinfo/scipy-dev >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at gmail.com Sun Oct 14 19:46:40 2018 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Sun, 14 Oct 2018 19:46:40 -0400 Subject: [SciPy-Dev] Poisson Disk Sampling In-Reply-To: References: Message-ID: On 10/14/18, Andrew Nelson wrote: > Is http://mathworld.wolfram.com/SpherePointPicking.html relevant? This > seems relatively simple. Phillip is asking about *subrandom* samples, also known as low-discrepancy or quasi-random samples; see https://en.wikipedia.org/wiki/Low-discrepancy_sequence. Warren > > On Sun., 14 Oct. 2018, 20:20 Phillip Feldman, > wrote: > >> Does anyone have code that does efficient subrandom sampling of the >> surface of a sphere? 
I'm looking, e.g., for an implementation of the >> algorithm in >> https://www.cs.ubc.ca/~rbridson/docs/bridson-siggraph07-poissondisk.pdf, >> or something similar. >> >> Thanks! >> >> Phillip >> >> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at python.org >> https://mail.python.org/mailman/listinfo/scipy-dev >> > From tyler.je.reddy at gmail.com Sun Oct 14 23:00:57 2018 From: tyler.je.reddy at gmail.com (Tyler Reddy) Date: Sun, 14 Oct 2018 20:00:57 -0700 Subject: [SciPy-Dev] Poisson Disk Sampling In-Reply-To: References: Message-ID: Ah, maybe that's another story then On Sun, 14 Oct 2018 at 16:47, Warren Weckesser wrote: > On 10/14/18, Andrew Nelson wrote: > > Is http://mathworld.wolfram.com/SpherePointPicking.html relevant? This > > seems relatively simple. > > > Phillip is asking about *subrandom* samples, also known as > low-discrepancy or quasi-random samples; see > https://en.wikipedia.org/wiki/Low-discrepancy_sequence. > > Warren > > > > > > On Sun., 14 Oct. 2018, 20:20 Phillip Feldman, < > phillip.m.feldman at gmail.com> > > wrote: > > > >> Does anyone have code that does efficient subrandom sampling of the > >> surface of a sphere? I'm looking, e.g., for an implementation of the > >> algorithm in > >> https://www.cs.ubc.ca/~rbridson/docs/bridson-siggraph07-poissondisk.pdf > , > >> or something similar. > >> > >> Thanks! > >> > >> Phillip > >> > >> > >> > >> _______________________________________________ > >> SciPy-Dev mailing list > >> SciPy-Dev at python.org > >> https://mail.python.org/mailman/listinfo/scipy-dev > >> > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From oneday2one at icloud.com Mon Oct 15 05:53:57 2018 From: oneday2one at icloud.com (Jon Stein) Date: Mon, 15 Oct 2018 09:53:57 +0000 (GMT) Subject: [SciPy-Dev] two new scipy.stats requests code included: Message-ID: <34111503-1aff-46d5-a4ea-8bb8fd912b94@me.com> Scipy-dev, Two additions to the scipy.stats module are missing and needed: One addition is needed for a one sample z-test including confidence interval when the population mean and standard deviation are known:

def ztest(array_A, population_mean, population_stdv, level_of_confidence=0.95):
    z_statistic = (array_A.mean() - population_mean) / (population_stdv / math.sqrt(len(array_A)))
    p_value = st.norm.cdf(z_statistic)
    standard_error = population_stdv / math.sqrt(len(array_A))
    margin_of_error = st.norm.ppf(level_of_confidence) * standard_error
    MoE = margin_of_error
    return('z statistic =', z_statistic, 'p-value =', p_value, array_A.mean() - MoE, array_A.mean() + MoE)

And one addition is needed for a one-sample z-test for a categorical sample (*not quantitative*):

def ztest_1sample_categorical(sample_proportion, population_proportion, sample_size):
    sp, pp = sample_proportion, population_proportion
    z = (sp - pp) / math.sqrt((pp * (1 - pp)) / sample_size)
    p = st.norm.cdf(z)
    return('z statistic =', z, 'p value =', p)

Let me know what you think. Jon Stein -------------- next part -------------- An HTML attachment was scrubbed...
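One way the z-test proposed above could return a named tuple instead of a flat tuple of labels and numbers (a sketch only, with hypothetical names ZtestResult and ztest_1samp; it also uses a two-sided p-value and confidence interval rather than the one-sided st.norm.cdf above):

import math
from collections import namedtuple
from scipy import stats

ZtestResult = namedtuple('ZtestResult', ['statistic', 'pvalue', 'ci_lower', 'ci_upper'])

def ztest_1samp(sample, popmean, popstd, confidence=0.95):
    # One-sample z-test when the population standard deviation is known.
    n = len(sample)
    mean = sum(sample) / n
    se = popstd / math.sqrt(n)
    z = (mean - popmean) / se
    p = 2 * stats.norm.sf(abs(z))                       # two-sided p-value
    margin = stats.norm.ppf(0.5 + confidence / 2) * se  # half-width of the CI
    return ZtestResult(z, p, mean - margin, mean + margin)

Which sidedness to expose, and whether a confidence interval belongs in a test result at all, is part of what the replies below discuss.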
URL: From pmhobson at gmail.com Mon Oct 15 12:45:30 2018 From: pmhobson at gmail.com (Paul Hobson) Date: Mon, 15 Oct 2018 09:45:30 -0700 Subject: [SciPy-Dev] two new scipy.stats requests code included: In-Reply-To: <34111503-1aff-46d5-a4ea-8bb8fd912b94@me.com> References: <34111503-1aff-46d5-a4ea-8bb8fd912b94@me.com> Message-ID: Hey Jon, To incorporate this into scipy, you'll need to open a pull request on GitHub: https://github.com/scipy/scipy I'm not a scipy contributor, but I can tell you that you'll also need to include tests that preferably use a (small) published dataset and confirm that your function reproduce the published results. Also, I don't think your return statements are behaving the way you think they are. I believe that the preference is now to return a NamedTuple. Hope that helps, -Paul On Mon, Oct 15, 2018 at 2:54 AM Jon Stein wrote: > Scipy-dev, > > Two additions to the scipy.stats module are missing and needed: > > One addition is needed for a one sample z-test including confidence > interval when the population mean and standard deviation are known: > > def ztest(array_A, population_mean, population_stdv, level_of_confidence~*example: > .95*): > z_statistic = (array_A.mean() - population_stdv) / (population_stdv / > math.sqrt(len(array_A))) > p_value = (st.norm.cdf(z_stat)) > standard_error = population_stdv / math.sqrt(len(array_A)) > margin_of_error = st.norm.ppf(level_of_confidence) * standard_error > MoE = margin_of_error > return('z statistic =', z_statistic, 'p-value =', p_value, > array_A.mean() - MoE, array_A.mean() + MoE) > > And one addition is needed for a one-sample z-test for a categorical > sample (*not quantitative*): > > def ztest_1sample_categorical(sample_proportion, population_proportion, > sample_size): > sp, pp = sample_proportion, population_proportion > z = (sp - pp) / math.sqrt((pp * (1 - pp)) / sample_size) > p = st.norm.cdf(z) > return('z statistic =', z, 'p value =', p) > > Let me know what you think. > Jon Stein > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon Oct 15 13:10:07 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 15 Oct 2018 13:10:07 -0400 Subject: [SciPy-Dev] two new scipy.stats requests code included: In-Reply-To: References: <34111503-1aff-46d5-a4ea-8bb8fd912b94@me.com> Message-ID: On Mon, Oct 15, 2018 at 12:45 PM Paul Hobson wrote: > Hey Jon, > > To incorporate this into scipy, you'll need to open a pull request on > GitHub: > https://github.com/scipy/scipy > > I'm not a scipy contributor, but I can tell you that you'll also need to > include tests that preferably use a (small) published dataset and confirm > that your function reproduce the published results. > > Also, I don't think your return statements are behaving the way you think > they are. I believe that the preference is now to return a NamedTuple. 
> > Hope that helps, > -Paul > > > > On Mon, Oct 15, 2018 at 2:54 AM Jon Stein wrote: > >> Scipy-dev, >> >> Two additions to the scipy.stats module are missing and needed: >> >> One addition is needed for a one sample z-test including confidence >> interval when the population mean and standard deviation are known: >> >> def ztest(array_A, population_mean, population_stdv, level_of_confidence=0.95): >> z_statistic = (array_A.mean() - population_mean) / (population_stdv / >> math.sqrt(len(array_A))) >> p_value = st.norm.cdf(z_statistic) >> standard_error = population_stdv / math.sqrt(len(array_A)) >> margin_of_error = st.norm.ppf(level_of_confidence) * standard_error >> MoE = margin_of_error >> return('z statistic =', z_statistic, 'p-value =', p_value, >> array_A.mean() - MoE, array_A.mean() + MoE) >> >> And one addition is needed for a one-sample z-test for a categorical >> sample (*not quantitative*): >> >> def ztest_1sample_categorical(sample_proportion, population_proportion, >> sample_size): >> sp, pp = sample_proportion, population_proportion >> z = (sp - pp) / math.sqrt((pp * (1 - pp)) / sample_size) >> p = st.norm.cdf(z) >> return('z statistic =', z, 'p value =', p) >> >> Let me know what you think. >> Jon Stein >> > I think some discussion and decisions are needed for whether and how to add this. None of the hypothesis tests currently returns a confidence interval. Tuples are a pain because we cannot just return additional results without breaking backwards compatibility. Both ztests are based on summary statistics, for which scipy.stats already has some cases. Adding special cases like ztest_1sample_categorical opens up a large set of statistical functions that could similarly be added, e.g. for Poisson rates. Additionally some tests have a choice of methods across stats packages, e.g. using pp corresponds to a score test (variance under the Null). An alternative is to use the variance based on sp, which corresponds to a Wald test. In the statsmodels version there is an extra option, but it doesn't have the correct default. For a two sample version for comparing proportions, the number of options and available methods becomes much larger. (Development for this in statsmodels is slow because I only find time every once in a while to review or prepare PRs https://github.com/statsmodels/statsmodels/pull/4829 ) I think some overlap in basic statistics functions between scipy.stats and statsmodels is useful. However, the question where to draw the boundary is always open. Josef > >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at python.org >> https://mail.python.org/mailman/listinfo/scipy-dev >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From oneday2one at icloud.com Mon Oct 15 20:25:44 2018 From: oneday2one at icloud.com (Jon Stein) Date: Tue, 16 Oct 2018 00:25:44 +0000 (GMT) Subject: [SciPy-Dev] two new scipy.stats requests code included: Message-ID: <6be6b0e2-d1d1-4540-9656-da76c3d9ce06@me.com> Thank you for both your replies. I am very grateful. There is no basic z-test or confidence interval in StatsModels or Scipy.stats. I was forced to create this to pass two recent statistics courses I took online with Stanford and Carnegie Mellon. The first is for quantitative data. The second is for qualitative (categorical) data.
def ztest(array1, array2_mean, array2_std, confidence_desired):
    z = (array1.mean() - array2_mean) / (array2_std / math.sqrt(len(array1)))
    p = st.norm.cdf(z)
    standard_error = array2_std / math.sqrt(len(array1))
    Margin_of_Error = st.norm.ppf(confidence_desired) * standard_error
    return(z, p, array1.mean() - Margin_of_Error, array1.mean() + Margin_of_Error)

def ztest_categorical(proportion1, proportion2, proportion1_sample_size):
    z = (proportion1 - proportion2) / math.sqrt(proportion2 * (1 - proportion2) / proportion1_sample_size)
    p = st.norm.cdf(z)
    return(z, p)

Let me know what you think. Jon Stein

On Oct 15, 2018, at 01:11 PM, josef.pktd at gmail.com wrote: On Mon, Oct 15, 2018 at 12:45 PM Paul Hobson wrote: Hey Jon, To incorporate this into scipy, you'll need to open a pull request on GitHub: https://github.com/scipy/scipy I'm not a scipy contributor, but I can tell you that you'll also need to include tests that preferably use a (small) published dataset and confirm that your function reproduce the published results. Also, I don't think your return statements are behaving the way you think they are. I believe that the preference is now to return a NamedTuple. Hope that helps, -Paul On Mon, Oct 15, 2018 at 2:54 AM Jon Stein wrote: Scipy-dev, Two additions to the scipy.stats module are missing and needed: One addition is needed for a one sample z-test including confidence interval when the population mean and standard deviation are known: def ztest(array_A, population_mean, population_stdv, level_of_confidence=0.95): z_statistic = (array_A.mean() - population_mean) / (population_stdv / math.sqrt(len(array_A))) p_value = st.norm.cdf(z_statistic) standard_error = population_stdv / math.sqrt(len(array_A)) margin_of_error = st.norm.ppf(level_of_confidence) * standard_error MoE = margin_of_error return('z statistic =', z_statistic, 'p-value =', p_value, array_A.mean() - MoE, array_A.mean() + MoE) And one addition is needed for a one-sample z-test for a categorical sample (*not quantitative*): def ztest_1sample_categorical(sample_proportion, population_proportion, sample_size): sp, pp = sample_proportion, population_proportion z = (sp - pp) / math.sqrt((pp * (1 - pp)) / sample_size) p = st.norm.cdf(z) return('z statistic =', z, 'p value =', p) Let me know what you think. Jon Stein I think some discussion and decisions are needed for whether and how to add this. None of the hypothesis tests currently returns a confidence interval. Tuples are a pain because we cannot just return additional results without breaking backwards compatibility. Both ztests are based on summary statistics, for which scipy.stats already has some cases. Adding special cases like ztest_1sample_categorical opens up a large set of statistical functions that could similarly be added, e.g. for Poisson rates. Additionally some tests have a choice of methods across stats packages, e.g. using pp corresponds to a score test (variance under the Null). An alternative is to use the variance based on sp, which corresponds to a Wald test. In the statsmodels version there is an extra option, but it doesn't have the correct default. For a two sample version for comparing proportions, the number of options and available methods becomes much larger.
(Development for this in statsmodels is slow because I only find time every once in a while to review or prepare PRs https://github.com/statsmodels/statsmodels/pull/4829 ) I think some overlap in basic statistics functions between scipy.stats and statsmodels is useful. However, the question where to draw the boundary is always open. Josef ? _______________________________________________ SciPy-Dev mailing list SciPy-Dev at python.org https://mail.python.org/mailman/listinfo/scipy-dev _______________________________________________ SciPy-Dev mailing list SciPy-Dev at python.org https://mail.python.org/mailman/listinfo/scipy-dev _______________________________________________ SciPy-Dev mailing list SciPy-Dev at python.org https://mail.python.org/mailman/listinfo/scipy-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Wed Oct 17 20:42:32 2018 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 17 Oct 2018 17:42:32 -0700 Subject: [SciPy-Dev] Poisson Disk Sampling In-Reply-To: References: Message-ID: This article describes a new quasirandom scheme that is easy and efficient to implement, and works nicely on the surface of a sphere through transformation: http://extremelearning.com.au/unreasonable-effectiveness-of-quasirandom-sequences/ The transformation should be applicable to any (quasi)random scheme that generates numbers uniformly over [0,1]^2. On Sun, Oct 14, 2018 at 11:20 AM Phillip Feldman < phillip.m.feldman at gmail.com> wrote: > Does anyone have code that does efficient subrandom sampling of the > surface of a sphere? I'm looking, e.g., for an implementation of the > algorithm in > https://www.cs.ubc.ca/~rbridson/docs/bridson-siggraph07-poissondisk.pdf, > or something similar. > > Thanks! > > Phillip > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From phillip.m.feldman at gmail.com Thu Oct 18 00:35:24 2018 From: phillip.m.feldman at gmail.com (Phillip Feldman) Date: Wed, 17 Oct 2018 21:35:24 -0700 Subject: [SciPy-Dev] Poisson Disk Sampling In-Reply-To: References: Message-ID: This is indeed very interesting. Thanks! P.S. I don't know of a clean mapping between [0, 1]^2 and the surface of the sphere. (This is a problem that cartographers have struggled with for a few hundred years). But, there is a simple mapping from [-1, 1]^3 to the surface of the sphere, so I will explore that. On Wed, Oct 17, 2018 at 5:43 PM Robert Kern wrote: > This article describes a new quasirandom scheme that is easy and efficient > to implement, and works nicely on the surface of a sphere through > transformation: > > > http://extremelearning.com.au/unreasonable-effectiveness-of-quasirandom-sequences/ > > The transformation should be applicable to any (quasi)random scheme that > generates numbers uniformly over [0,1]^2. > > On Sun, Oct 14, 2018 at 11:20 AM Phillip Feldman < > phillip.m.feldman at gmail.com> wrote: > >> Does anyone have code that does efficient subrandom sampling of the >> surface of a sphere? I'm looking, e.g., for an implementation of the >> algorithm in >> https://www.cs.ubc.ca/~rbridson/docs/bridson-siggraph07-poissondisk.pdf, >> or something similar. >> >> Thanks! 
>> >> Phillip >> >> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at python.org >> https://mail.python.org/mailman/listinfo/scipy-dev >> > > > -- > Robert Kern > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Thu Oct 18 00:43:52 2018 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 17 Oct 2018 21:43:52 -0700 Subject: [SciPy-Dev] Poisson Disk Sampling In-Reply-To: References: Message-ID: On Wed, Oct 17, 2018 at 9:36 PM Phillip Feldman wrote: > This is indeed very interesting. Thanks! > > P.S. I don't know of a clean mapping between [0, 1]^2 and the surface of > the sphere. (This is a problem that cartographers have struggled with for > a few hundred years). But, there is a simple mapping from [-1, 1]^3 to the > surface of the sphere, so I will explore that. > See the section "Quasirandom Points on a sphere" in that article for the details. > On Wed, Oct 17, 2018 at 5:43 PM Robert Kern wrote: > >> This article describes a new quasirandom scheme that is easy and >> efficient to implement, and works nicely on the surface of a sphere through >> transformation: >> >> >> http://extremelearning.com.au/unreasonable-effectiveness-of-quasirandom-sequences/ >> >> The transformation should be applicable to any (quasi)random scheme that >> generates numbers uniformly over [0,1]^2. >> >> On Sun, Oct 14, 2018 at 11:20 AM Phillip Feldman < >> phillip.m.feldman at gmail.com> wrote: >> >>> Does anyone have code that does efficient subrandom sampling of the >>> surface of a sphere? I'm looking, e.g., for an implementation of the >>> algorithm in >>> https://www.cs.ubc.ca/~rbridson/docs/bridson-siggraph07-poissondisk.pdf, >>> or something similar. >>> >>> Thanks! >>> >>> Phillip >>> >>> >>> >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at python.org >>> https://mail.python.org/mailman/listinfo/scipy-dev >>> >> >> >> -- >> Robert Kern >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at python.org >> https://mail.python.org/mailman/listinfo/scipy-dev >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From phillip.m.feldman at gmail.com Thu Oct 18 01:32:49 2018 From: phillip.m.feldman at gmail.com (Phillip Feldman) Date: Wed, 17 Oct 2018 22:32:49 -0700 Subject: [SciPy-Dev] Poisson Disk Sampling In-Reply-To: References: Message-ID: I should have read the whole thing. The equal-area projection does indeed do the job. (Conformality is unnecessary for this application). Thanks again! On Wed, Oct 17, 2018 at 9:44 PM Robert Kern wrote: > On Wed, Oct 17, 2018 at 9:36 PM Phillip Feldman < > phillip.m.feldman at gmail.com> wrote: > >> This is indeed very interesting. Thanks! >> >> P.S. I don't know of a clean mapping between [0, 1]^2 and the surface of >> the sphere. (This is a problem that cartographers have struggled with for >> a few hundred years). But, there is a simple mapping from [-1, 1]^3 to the >> surface of the sphere, so I will explore that. >> > > See the section "Quasirandom Points on a sphere" in that article for the > details. 
> > >> On Wed, Oct 17, 2018 at 5:43 PM Robert Kern >> wrote: >> >>> This article describes a new quasirandom scheme that is easy and >>> efficient to implement, and works nicely on the surface of a sphere through >>> transformation: >>> >>> >>> http://extremelearning.com.au/unreasonable-effectiveness-of-quasirandom-sequences/ >>> >>> The transformation should be applicable to any (quasi)random scheme that >>> generates numbers uniformly over [0,1]^2. >>> >>> On Sun, Oct 14, 2018 at 11:20 AM Phillip Feldman < >>> phillip.m.feldman at gmail.com> wrote: >>> >>>> Does anyone have code that does efficient subrandom sampling of the >>>> surface of a sphere? I'm looking, e.g., for an implementation of the >>>> algorithm in >>>> https://www.cs.ubc.ca/~rbridson/docs/bridson-siggraph07-poissondisk.pdf, >>>> or something similar. >>>> >>>> Thanks! >>>> >>>> Phillip >>>> >>>> >>>> >>>> _______________________________________________ >>>> SciPy-Dev mailing list >>>> SciPy-Dev at python.org >>>> https://mail.python.org/mailman/listinfo/scipy-dev >>>> >>> >>> >>> -- >>> Robert Kern >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at python.org >>> https://mail.python.org/mailman/listinfo/scipy-dev >>> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at python.org >> https://mail.python.org/mailman/listinfo/scipy-dev >> > > > -- > Robert Kern > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefanv at berkeley.edu Thu Oct 18 01:44:42 2018 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Wed, 17 Oct 2018 22:44:42 -0700 Subject: [SciPy-Dev] Poisson Disk Sampling In-Reply-To: References: Message-ID: <16685b45d10.27ae.acf34a9c767d7bb498a799333be0433e@fastmail.com> We have a few other schemes here: https://github.com/fperez/spheredwi/blob/master/src/point_dist.py And charged particles can be found in DiPy. Not the sampling you mention, but perhaps helpful in that context. St?fan On October 17, 2018 17:43:27 Robert Kern wrote: > This article describes a new quasirandom scheme that is easy and efficient > to implement, and works nicely on the surface of a sphere through > transformation: > > http://extremelearning.com.au/unreasonable-effectiveness-of-quasirandom-sequences/ > > The transformation should be applicable to any (quasi)random scheme that > generates numbers uniformly over [0,1]^2. > > On Sun, Oct 14, 2018 at 11:20 AM Phillip Feldman > wrote: > Does anyone have code that does efficient subrandom sampling of the surface > of a sphere? I'm looking, e.g., for an implementation of the algorithm in > https://www.cs.ubc.ca/~rbridson/docs/bridson-siggraph07-poissondisk.pdf, or > something similar. > > Thanks! > > Phillip > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > > > -- > Robert Kern > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev -------------- next part -------------- An HTML attachment was scrubbed... 
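A compact sketch of the approach Robert links to and Phillip settles on: generate a low-discrepancy sequence in [0, 1]^2 (the R2 sequence described in that article) and push it onto the sphere with the Lambert equal-area mapping. The function name is hypothetical, for illustration only:

import numpy as np

def quasirandom_sphere(n):
    # R2 low-discrepancy points in [0, 1]^2, mapped area-preservingly onto
    # the unit sphere: longitude = 2*pi*u, z = 2*v - 1.
    g = 1.32471795724474602596           # plastic constant
    k = np.arange(1, n + 1)
    u = (k / g) % 1.0
    v = (k / g ** 2) % 1.0
    theta = 2.0 * np.pi * u              # longitude
    z = 2.0 * v - 1.0                    # uniform in [-1, 1]
    r = np.sqrt(1.0 - z ** 2)
    return np.column_stack((r * np.cos(theta), r * np.sin(theta), z))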
URL: From ali.cetin at outlook.com Thu Oct 18 06:40:50 2018 From: ali.cetin at outlook.com (Ali Cetin) Date: Thu, 18 Oct 2018 10:40:50 +0000 Subject: [SciPy-Dev] Finding signal upcrossings and declustered peaks Message-ID: Hi, this is my first attempt to write to this mailing list, so if I'm doing something wrong please bear with me. I've been working with extreme value analysis during my PhD and also in my current job. What we often do is to find signal upcrossings (such as mean and zero upcrossings) and find declustered peaks. That is, find the largest peak between two upcrossings, or the two largest peaks between two upcrossings etc... I'm proposing to add two new functions to the scipy.signal submodule: - find_upcross; takes in signal and returns index of upcrossings wrt user defined upcrossing level - find_peaks_dc; takes in signal and returns index of peaks (n largest) (find_peaks_dc is not necessarly easily incorporated into find_peaks, so it may be cleaner to have a separate function). Any thought on this? Cheers, Ali -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Thu Oct 18 15:59:02 2018 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Thu, 18 Oct 2018 19:59:02 +0000 Subject: [SciPy-Dev] Finding signal upcrossings and declustered peaks In-Reply-To: References: Message-ID: On Thu, Oct 18, 2018 at 10:41 AM Ali Cetin wrote: > Hi, > > this is my first attempt to write to this mailing list, so if I'm doing > something wrong please bear with me. > Hi Ali, no worries you got it all right. welcome:) > > I've been working with extreme value analysis during my PhD and also in my > current job. What we often do is to find signal upcrossings (such as mean > and zero upcrossings) and find declustered peaks. That is, find the largest > peak between two upcrossings, or the two largest peaks between two > upcrossings etc... > > I'm proposing to add two new functions to the scipy.signal submodule: > - find_upcross; takes in signal and returns index of upcrossings wrt > user defined upcrossing level > - find_peaks_dc; takes in signal and returns index of peaks (n largest) > > (find_peaks_dc is not necessarly easily incorporated into find_peaks, so > it may be cleaner to have a separate function). > > Any thought on this? > Do you have any references for the algorithms? Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ali.cetin at outlook.com Thu Oct 18 16:52:55 2018 From: ali.cetin at outlook.com (Ali Cetin) Date: Thu, 18 Oct 2018 20:52:55 +0000 Subject: [SciPy-Dev] Finding signal upcrossings and declustered peaks In-Reply-To: References: , Message-ID: ________________________________ From: SciPy-Dev on behalf of Ralf Gommers Sent: Thursday, October 18, 2018 21:59 To: SciPy Developers List Subject: Re: [SciPy-Dev] Finding signal upcrossings and declustered peaks On Thu, Oct 18, 2018 at 10:41 AM Ali Cetin > wrote: Hi, this is my first attempt to write to this mailing list, so if I'm doing something wrong please bear with me. Hi Ali, no worries you got it all right. welcome:) I've been working with extreme value analysis during my PhD and also in my current job. What we often do is to find signal upcrossings (such as mean and zero upcrossings) and find declustered peaks. That is, find the largest peak between two upcrossings, or the two largest peaks between two upcrossings etc... 
I'm proposing to add two new functions to the scipy.signal submodule: - find_upcross; takes in signal and returns index of upcrossings wrt user defined upcrossing level - find_peaks_dc; takes in signal and returns index of peaks (n largest) (find_peaks_dc is not necessarly easily incorporated into find_peaks, so it may be cleaner to have a separate function). Any thought on this? Do you have any references for the algorithms? Well, the "algorithms" are rather straight forward and heuristic. - find_upcross is often referred to as zero-crossing (https://en.wikipedia.org/wiki/Zero_crossing) (a special case). It is essentially detecting sign changes. Upcrossing count and rates are important statistics in time series (extreme value) analysis. Side note: Zero-crossing rate is by def the ratio between the second and zeroth moment of the power spectrum of the signal. - find_peaks_dc finds all peaks above an upcrossing level (or threshold) between two consecutive upcrossings, i.e. batches of peaks. Then the n-largest peaks are selected from each batch. Declustering is a technique often used to break down statistical dependency between peaks when performing extreme value analysis, and thus be able to use simpler distributions to describe them. As n-> inf find_peaks_dc -> find_peaks ? (Albeit in many practical situations, inf < 5). (Note that find_peaks_dc depends on find_upcross). The code are just a few lines and mostly numpy array operations. Ali -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniele at grinta.net Thu Oct 18 16:48:52 2018 From: daniele at grinta.net (Daniele Nicolodi) Date: Thu, 18 Oct 2018 14:48:52 -0600 Subject: [SciPy-Dev] Strange code in scipy.signal.decimate Message-ID: <795609c2-501f-6932-7013-2f2e2f38b996@grinta.net> Hello, I was having a look at the scipy.signal.decimate() function and I noticed some strange looking code: elif ftype == 'iir': if n is None: n = 8 system = dlti(*cheby1(n, 0.05, 0.8 / q)) b, a = system.num, system.den This is used to setup the anti aliasing low pass filter. What I don't understand is the dance to obtain the IIR numerator and denominator coefficients. Couldn't the above be simply as the code below? elif ftype == 'iir': if n is None: n = 8 b, a = cheby1(n, 0.05, 0.8 / q) Am I missing something? Thanks! Cheers, Dan From pmhobson at gmail.com Thu Oct 18 17:28:47 2018 From: pmhobson at gmail.com (Paul Hobson) Date: Thu, 18 Oct 2018 14:28:47 -0700 Subject: [SciPy-Dev] Strange code in scipy.signal.decimate In-Reply-To: <795609c2-501f-6932-7013-2f2e2f38b996@grinta.net> References: <795609c2-501f-6932-7013-2f2e2f38b996@grinta.net> Message-ID: Dan, I might be missing something, but dlti returns a subclass of LinearTimeInvariant. Point is, it's not a tuple, but a class with multiple attributes and method. Simple tuple unpacking won't likely work unless the authors of LinearTimeInvariant really went out of their way to make it so. -Paul On Thu, Oct 18, 2018 at 1:58 PM Daniele Nicolodi wrote: > Hello, > > I was having a look at the scipy.signal.decimate() function and I > noticed some strange looking code: > > elif ftype == 'iir': > if n is None: > n = 8 > system = dlti(*cheby1(n, 0.05, 0.8 / q)) > b, a = system.num, system.den > > This is used to setup the anti aliasing low pass filter. What I don't > understand is the dance to obtain the IIR numerator and denominator > coefficients. Couldn't the above be simply as the code below? 
> > elif ftype == 'iir': > if n is None: > n = 8 > b, a = cheby1(n, 0.05, 0.8 / q) > > Am I missing something? > > Thanks! > > Cheers, > Dan > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniele at grinta.net Thu Oct 18 17:49:00 2018 From: daniele at grinta.net (Daniele Nicolodi) Date: Thu, 18 Oct 2018 15:49:00 -0600 Subject: [SciPy-Dev] Strange code in scipy.signal.decimate In-Reply-To: References: <795609c2-501f-6932-7013-2f2e2f38b996@grinta.net> Message-ID: <72c45861-1e27-d2c1-28b6-4f88cb208064@grinta.net> On 18-10-2018 15:28, Paul Hobson wrote: > Dan, > > I might be missing something, but dlti returns a subclass of > LinearTimeInvariant. Point is, it's not a tuple, but a class with > multiple attributes and method. Simple tuple unpacking won't likely work > unless the authors of LinearTimeInvariant really went out of their way > to make it so. The point is not to go through the `dlti` class at all. Please look at the replacement code I posted: it works. Cheers, Dan > -Paul > > On Thu, Oct 18, 2018 at 1:58 PM Daniele Nicolodi > wrote: > > Hello, > > I was having a look at the scipy.signal.decimate() function and I > noticed some strange looking code: > > ? ? elif ftype == 'iir': > ? ? ? ? if n is None: > ? ? ? ? ? ? n = 8 > ? ? ? ? system = dlti(*cheby1(n, 0.05, 0.8 / q)) > ? ? ? ? b, a = system.num, system.den > > This is used to setup the anti aliasing low pass filter. What I don't > understand is the dance to obtain the IIR numerator and denominator > coefficients. Couldn't the above be simply as the code below? > > ? ? elif ftype == 'iir': > ? ? ? ? if n is None: > ? ? ? ? ? ? n = 8 > ? ? ? ? b, a = cheby1(n, 0.05, 0.8 / q) > > Am I missing something? > > Thanks! > > Cheers, > Dan > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > From pmhobson at gmail.com Thu Oct 18 18:30:24 2018 From: pmhobson at gmail.com (Paul Hobson) Date: Thu, 18 Oct 2018 15:30:24 -0700 Subject: [SciPy-Dev] Strange code in scipy.signal.decimate In-Reply-To: <72c45861-1e27-d2c1-28b6-4f88cb208064@grinta.net> References: <795609c2-501f-6932-7013-2f2e2f38b996@grinta.net> <72c45861-1e27-d2c1-28b6-4f88cb208064@grinta.net> Message-ID: If you make the changes and run the test suite, do the pertinent tests pass? -Paul On Thu, Oct 18, 2018 at 2:49 PM Daniele Nicolodi wrote: > On 18-10-2018 15:28, Paul Hobson wrote: > > Dan, > > > > I might be missing something, but dlti returns a subclass of > > LinearTimeInvariant. Point is, it's not a tuple, but a class with > > multiple attributes and method. Simple tuple unpacking won't likely work > > unless the authors of LinearTimeInvariant really went out of their way > > to make it so. > > The point is not to go through the `dlti` class at all. > > Please look at the replacement code I posted: it works. 
> > Cheers, > Dan > > > -Paul > > > > On Thu, Oct 18, 2018 at 1:58 PM Daniele Nicolodi > > wrote: > > > > Hello, > > > > I was having a look at the scipy.signal.decimate() function and I > > noticed some strange looking code: > > > > elif ftype == 'iir': > > if n is None: > > n = 8 > > system = dlti(*cheby1(n, 0.05, 0.8 / q)) > > b, a = system.num, system.den > > > > This is used to setup the anti aliasing low pass filter. What I don't > > understand is the dance to obtain the IIR numerator and denominator > > coefficients. Couldn't the above be simply as the code below? > > > > elif ftype == 'iir': > > if n is None: > > n = 8 > > b, a = cheby1(n, 0.05, 0.8 / q) > > > > Am I missing something? > > > > Thanks! > > > > Cheers, > > Dan > > _______________________________________________ > > SciPy-Dev mailing list > > SciPy-Dev at python.org > > https://mail.python.org/mailman/listinfo/scipy-dev > > > > > > > > _______________________________________________ > > SciPy-Dev mailing list > > SciPy-Dev at python.org > > https://mail.python.org/mailman/listinfo/scipy-dev > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at gmail.com Thu Oct 18 19:07:10 2018 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Thu, 18 Oct 2018 19:07:10 -0400 Subject: [SciPy-Dev] Strange code in scipy.signal.decimate In-Reply-To: <72c45861-1e27-d2c1-28b6-4f88cb208064@grinta.net> References: <795609c2-501f-6932-7013-2f2e2f38b996@grinta.net> <72c45861-1e27-d2c1-28b6-4f88cb208064@grinta.net> Message-ID: On 10/18/18, Daniele Nicolodi wrote: > On 18-10-2018 15:28, Paul Hobson wrote: >> Dan, >> >> I might be missing something, but dlti returns a subclass of >> LinearTimeInvariant. Point is, it's not a tuple, but a class with >> multiple attributes and method. Simple tuple unpacking won't likely work >> unless the authors of LinearTimeInvariant really went out of their way >> to make it so. > > The point is not to go through the `dlti` class at all. > > Please look at the replacement code I posted: it works. > There have been several incremental changes to that function over the years. I suspect the last person to change it simply did not notice that the code could be simplified. Your proposed change looks good. Warren > Cheers, > Dan > >> -Paul >> >> On Thu, Oct 18, 2018 at 1:58 PM Daniele Nicolodi > > wrote: >> >> Hello, >> >> I was having a look at the scipy.signal.decimate() function and I >> noticed some strange looking code: >> >> elif ftype == 'iir': >> if n is None: >> n = 8 >> system = dlti(*cheby1(n, 0.05, 0.8 / q)) >> b, a = system.num, system.den >> >> This is used to setup the anti aliasing low pass filter. What I don't >> understand is the dance to obtain the IIR numerator and denominator >> coefficients. Couldn't the above be simply as the code below? >> >> elif ftype == 'iir': >> if n is None: >> n = 8 >> b, a = cheby1(n, 0.05, 0.8 / q) >> >> Am I missing something? >> >> Thanks! 
>> >> Cheers, >> Dan >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at python.org >> https://mail.python.org/mailman/listinfo/scipy-dev >> >> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at python.org >> https://mail.python.org/mailman/listinfo/scipy-dev >> > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > From eric.antonio.quintero at gmail.com Thu Oct 18 21:18:06 2018 From: eric.antonio.quintero at gmail.com (Eric Quintero) Date: Thu, 18 Oct 2018 21:18:06 -0400 Subject: [SciPy-Dev] Strange code in scipy.signal.decimate In-Reply-To: <67d883853eeb47388c1514192d93bb68@MWHPR03MB2637.namprd03.prod.outlook.com> References: <795609c2-501f-6932-7013-2f2e2f38b996@grinta.net> <72c45861-1e27-d2c1-28b6-4f88cb208064@grinta.net> <67d883853eeb47388c1514192d93bb68@MWHPR03MB2637.namprd03.prod.outlook.com> Message-ID: Warren is correct; there were some changes to how the `dlti` objects were being internally passed around on PR 7835, and this extra instantiation flew under the radar. A PR to simplify the code would be very welcome. -Eric Q. > On Oct 18, 2018, at 7:07 PM, Warren Weckesser wrote: > > On 10/18/18, Daniele Nicolodi wrote: >> On 18-10-2018 15:28, Paul Hobson wrote: >>> Dan, >>> >>> I might be missing something, but dlti returns a subclass of >>> LinearTimeInvariant. Point is, it's not a tuple, but a class with >>> multiple attributes and method. Simple tuple unpacking won't likely work >>> unless the authors of LinearTimeInvariant really went out of their way >>> to make it so. >> >> The point is not to go through the `dlti` class at all. >> >> Please look at the replacement code I posted: it works. >> > > > There have been several incremental changes to that function over the > years. I suspect the last person to change it simply did not notice > that the code could be simplified. Your proposed change looks good. > > Warren > > >> Cheers, >> Dan >> >>> -Paul >>> >>> On Thu, Oct 18, 2018 at 1:58 PM Daniele Nicolodi >> > wrote: >>> >>> Hello, >>> >>> I was having a look at the scipy.signal.decimate() function and I >>> noticed some strange looking code: >>> >>> elif ftype == 'iir': >>> if n is None: >>> n = 8 >>> system = dlti(*cheby1(n, 0.05, 0.8 / q)) >>> b, a = system.num, system.den >>> >>> This is used to setup the anti aliasing low pass filter. What I don't >>> understand is the dance to obtain the IIR numerator and denominator >>> coefficients. Couldn't the above be simply as the code below? >>> >>> elif ftype == 'iir': >>> if n is None: >>> n = 8 >>> b, a = cheby1(n, 0.05, 0.8 / q) >>> >>> Am I missing something? >>> >>> Thanks! 
>>> >>> Cheers, >>> Dan >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at python.org >>> https://mail.python.org/mailman/listinfo/scipy-dev >>> >>> >>> >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at python.org >>> https://mail.python.org/mailman/listinfo/scipy-dev >>> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at python.org >> https://mail.python.org/mailman/listinfo/scipy-dev >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev From ilhanpolat at gmail.com Fri Oct 19 07:56:51 2018 From: ilhanpolat at gmail.com (Ilhan Polat) Date: Fri, 19 Oct 2018 13:56:51 +0200 Subject: [SciPy-Dev] Finding signal upcrossings and declustered peaks In-Reply-To: References: Message-ID: Are these covered by the recent find_peaks functions or brand new ones? On Thu, Oct 18, 2018, 22:53 Ali Cetin wrote: > > > ------------------------------ > *From:* SciPy-Dev on > behalf of Ralf Gommers > *Sent:* Thursday, October 18, 2018 21:59 > *To:* SciPy Developers List > *Subject:* Re: [SciPy-Dev] Finding signal upcrossings and declustered > peaks > > > > On Thu, Oct 18, 2018 at 10:41 AM Ali Cetin wrote: > > Hi, > > this is my first attempt to write to this mailing list, so if I'm doing > something wrong please bear with me. > > > Hi Ali, no worries you got it all right. welcome:) > > > I've been working with extreme value analysis during my PhD and also in my > current job. What we often do is to find signal upcrossings (such as mean > and zero upcrossings) and find declustered peaks. That is, find the largest > peak between two upcrossings, or the two largest peaks between two > upcrossings etc... > > I'm proposing to add two new functions to the scipy.signal submodule: > - find_upcross; takes in signal and returns index of upcrossings wrt > user defined upcrossing level > - find_peaks_dc; takes in signal and returns index of peaks (n largest) > > (find_peaks_dc is not necessarly easily incorporated into find_peaks, so > it may be cleaner to have a separate function). > > Any thought on this? > > > Do you have any references for the algorithms? > > Well, the "algorithms" are rather straight forward and heuristic. > > - find_upcross is often referred to as zero-crossing ( > https://en.wikipedia.org/wiki/Zero_crossing) (a special case). It is > essentially detecting sign changes. Upcrossing count and rates are > important statistics in time series (extreme value) analysis. Side note: > Zero-crossing rate is by def the ratio between the second and zeroth moment > of the power spectrum of the signal. > > - find_peaks_dc finds all peaks above an upcrossing level (or > threshold) between two consecutive upcrossings, i.e. batches of peaks. Then > the n-largest peaks are selected from each batch. Declustering is a > technique often used to break down statistical dependency between peaks > when performing extreme value analysis, and thus be able to use simpler > distributions to describe them. As n-> inf find_peaks_dc -> find_peaks ? > (Albeit in many practical situations, inf < 5). (Note that find_peaks_dc > depends on find_upcross). > > The code are just a few lines and mostly numpy array operations. 
> > Ali > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ali.cetin at outlook.com Fri Oct 19 15:28:45 2018 From: ali.cetin at outlook.com (Ali Cetin) Date: Fri, 19 Oct 2018 19:28:45 +0000 Subject: [SciPy-Dev] Finding signal upcrossings and declustered peaks In-Reply-To: References: , Message-ID: Are these covered by the recent find_peaks functions or brand new ones? As far as I can see, the current find_peaks don't cover it. Also, finding crossings (up and down) is not "peak finding", but necessary to find declustered peaks. On Thu, Oct 18, 2018, 22:53 Ali Cetin > wrote: ________________________________ From: SciPy-Dev > on behalf of Ralf Gommers > Sent: Thursday, October 18, 2018 21:59 To: SciPy Developers List Subject: Re: [SciPy-Dev] Finding signal upcrossings and declustered peaks On Thu, Oct 18, 2018 at 10:41 AM Ali Cetin > wrote: Hi, this is my first attempt to write to this mailing list, so if I'm doing something wrong please bear with me. Hi Ali, no worries you got it all right. welcome:) I've been working with extreme value analysis during my PhD and also in my current job. What we often do is to find signal upcrossings (such as mean and zero upcrossings) and find declustered peaks. That is, find the largest peak between two upcrossings, or the two largest peaks between two upcrossings etc... I'm proposing to add two new functions to the scipy.signal submodule: - find_upcross; takes in signal and returns index of upcrossings wrt user defined upcrossing level - find_peaks_dc; takes in signal and returns index of peaks (n largest) (find_peaks_dc is not necessarly easily incorporated into find_peaks, so it may be cleaner to have a separate function). Any thought on this? Do you have any references for the algorithms? Well, the "algorithms" are rather straight forward and heuristic. - find_upcross is often referred to as zero-crossing (https://en.wikipedia.org/wiki/Zero_crossing) (a special case). It is essentially detecting sign changes. Upcrossing count and rates are important statistics in time series (extreme value) analysis. Side note: Zero-crossing rate is by def the ratio between the second and zeroth moment of the power spectrum of the signal. - find_peaks_dc finds all peaks above an upcrossing level (or threshold) between two consecutive upcrossings, i.e. batches of peaks. Then the n-largest peaks are selected from each batch. Declustering is a technique often used to break down statistical dependency between peaks when performing extreme value analysis, and thus be able to use simpler distributions to describe them. As n-> inf find_peaks_dc -> find_peaks ? (Albeit in many practical situations, inf < 5). (Note that find_peaks_dc depends on find_upcross). The code are just a few lines and mostly numpy array operations. Ali _______________________________________________ SciPy-Dev mailing list SciPy-Dev at python.org https://mail.python.org/mailman/listinfo/scipy-dev -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.gommers at gmail.com Fri Oct 19 18:14:17 2018 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Fri, 19 Oct 2018 22:14:17 +0000 Subject: [SciPy-Dev] Finding signal upcrossings and declustered peaks In-Reply-To: References: Message-ID: On Fri, Oct 19, 2018 at 7:28 PM Ali Cetin wrote: > > Are these covered by the recent find_peaks functions or brand new ones? > > As far as I can see, the current find_peaks don't cover it. Also, finding > crossings (up and down) is not "peak finding", but necessary to find > declustered peaks. > > On Thu, Oct 18, 2018, 22:53 Ali Cetin wrote: > > > > ------------------------------ > *From:* SciPy-Dev on > behalf of Ralf Gommers > *Sent:* Thursday, October 18, 2018 21:59 > *To:* SciPy Developers List > *Subject:* Re: [SciPy-Dev] Finding signal upcrossings and declustered > peaks > > > > On Thu, Oct 18, 2018 at 10:41 AM Ali Cetin wrote: > > Hi, > > this is my first attempt to write to this mailing list, so if I'm doing > something wrong please bear with me. > > > Hi Ali, no worries you got it all right. welcome:) > > > I've been working with extreme value analysis during my PhD and also in my > current job. What we often do is to find signal upcrossings (such as mean > and zero upcrossings) and find declustered peaks. That is, find the largest > peak between two upcrossings, or the two largest peaks between two > upcrossings etc... > > I'm proposing to add two new functions to the scipy.signal submodule: > - find_upcross; takes in signal and returns index of upcrossings wrt > user defined upcrossing level > - find_peaks_dc; takes in signal and returns index of peaks (n largest) > > (find_peaks_dc is not necessarly easily incorporated into find_peaks, so > it may be cleaner to have a separate function). > > Any thought on this? > > > Do you have any references for the algorithms? > > Well, the "algorithms" are rather straight forward and heuristic. > > - find_upcross is often referred to as zero-crossing ( > https://en.wikipedia.org/wiki/Zero_crossing > ) > (a special case). It is essentially detecting sign changes. Upcrossing > count and rates are important statistics in time series (extreme value) > analysis. Side note: Zero-crossing rate is by def the ratio between the > second and zeroth moment of the power spectrum of the signal. > > - find_peaks_dc finds all peaks above an upcrossing level (or > threshold) between two consecutive upcrossings, i.e. batches of peaks. Then > the n-largest peaks are selected from each batch. Declustering is a > technique often used to break down statistical dependency between peaks > when performing extreme value analysis, and thus be able to use simpler > distributions to describe them. As n-> inf find_peaks_dc -> find_peaks ? > (Albeit in many practical situations, inf < 5). (Note that find_peaks_dc > depends on find_upcross). > > The code are just a few lines and mostly numpy array operations. > > It sounds like you have implementations ready, could you link to them? (put in a git branch or a gist for example). If it's just a few lines of code and not based on a publication, then I'm not sure we'd want to add these. We are interested in further peak finding improvements, however it should be clear for any new functions that they're an improvement over what we currently have. Otherwise I'm afraid we keep adding separate functions that all do a small subset of the spectrum of what users are interested in. E.g., from your description it's not clear how to treat zero crossings in the presence of noise. 
Cheers, Ralf > Ali > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ali.cetin at outlook.com Sun Oct 21 15:50:44 2018 From: ali.cetin at outlook.com (Ali Cetin) Date: Sun, 21 Oct 2018 19:50:44 +0000 Subject: [SciPy-Dev] Finding signal upcrossings and declustered peaks In-Reply-To: References: , Message-ID: ________________________________ From: SciPy-Dev on behalf of Ralf Gommers Sent: Saturday, October 20, 2018 00:14 To: SciPy Developers List Subject: Re: [SciPy-Dev] Finding signal upcrossings and declustered peaks On Fri, Oct 19, 2018 at 7:28 PM Ali Cetin > wrote: Are these covered by the recent find_peaks functions or brand new ones? As far as I can see, the current find_peaks don't cover it. Also, finding crossings (up and down) is not "peak finding", but necessary to find declustered peaks. On Thu, Oct 18, 2018, 22:53 Ali Cetin > wrote: ________________________________ From: SciPy-Dev > on behalf of Ralf Gommers > Sent: Thursday, October 18, 2018 21:59 To: SciPy Developers List Subject: Re: [SciPy-Dev] Finding signal upcrossings and declustered peaks On Thu, Oct 18, 2018 at 10:41 AM Ali Cetin > wrote: Hi, this is my first attempt to write to this mailing list, so if I'm doing something wrong please bear with me. Hi Ali, no worries you got it all right. welcome:) I've been working with extreme value analysis during my PhD and also in my current job. What we often do is to find signal upcrossings (such as mean and zero upcrossings) and find declustered peaks. That is, find the largest peak between two upcrossings, or the two largest peaks between two upcrossings etc... I'm proposing to add two new functions to the scipy.signal submodule: - find_upcross; takes in signal and returns index of upcrossings wrt user defined upcrossing level - find_peaks_dc; takes in signal and returns index of peaks (n largest) (find_peaks_dc is not necessarly easily incorporated into find_peaks, so it may be cleaner to have a separate function). Any thought on this? Do you have any references for the algorithms? Well, the "algorithms" are rather straight forward and heuristic. - find_upcross is often referred to as zero-crossing (https://en.wikipedia.org/wiki/Zero_crossing) (a special case). It is essentially detecting sign changes. Upcrossing count and rates are important statistics in time series (extreme value) analysis. Side note: Zero-crossing rate is by def the ratio between the second and zeroth moment of the power spectrum of the signal. - find_peaks_dc finds all peaks above an upcrossing level (or threshold) between two consecutive upcrossings, i.e. batches of peaks. Then the n-largest peaks are selected from each batch. Declustering is a technique often used to break down statistical dependency between peaks when performing extreme value analysis, and thus be able to use simpler distributions to describe them. As n-> inf find_peaks_dc -> find_peaks ? (Albeit in many practical situations, inf < 5). (Note that find_peaks_dc depends on find_upcross). The code are just a few lines and mostly numpy array operations. It sounds like you have implementations ready, could you link to them? (put in a git branch or a gist for example). 
If it's just a few lines of code and not based on a publication, then I'm not sure we'd want to add these. We are interested in further peak finding improvements, however it should be clear for any new functions that they're an improvement over what we currently have. Otherwise I'm afraid we keep adding separate functions that all do a small subset of the spectrum of what users are interested in. E.g., from your description it's not clear how to treat zero crossings in the presence of noise. Cheers, Ralf I think I misunderstood what you ment by reference; peak declustering methods dont necessarily have a reference paper, as the methods are rather self-explanatory. However, scientific papers and textbooks that use these methods are plenty! (Peaks declustering is indeed a very common technique in extreme value analysis. Just google "peak over threshold declustering") These are some of them: - Coles, An Introduction to Statistical Modeling of Extreme Values, (https://www.springer.com/us/book/9781852334598) - Davison, A. C., & Smith, R. L. (1990). Models for exceedances over high thresholds. Journal of the Royal Statistical Society. Series B (Methodological), 393-442. - Ferro, C. A. T. and Segers, J. (2003) Inference for clusters of extreme values. Journal of the Royal Statistical Society B, 65, 545--556. I also note that peak declustering methods are available in R. (https://www.rdocumentation.org/packages/extRemes/versions/1.65/topics/decluster.runs) Yes, I have written a "small" package that can find up-crossings and one particular method for peaks declustering (https://github.com/4Subsea/evapy). However, the functions in this package are rather limited in scope, limited to 1D arrays (performance optimized), and depends heavily on NumPy and SciPy. I was thinking about expanding the functionality anyway, and thought that I might do that by contributing to SciPy. I propose to take an agile approach on this: * I'm almost done re-writing base signal up/down-crossing module. I can make a pull-request to scipy in the coming days. (It will add similar functionality as argrelmax, argrelmin -> argupcross, argdowncross etc.) If you don't like it, we can stop it there. * next step may be to add peaks declustering methods (whether by extending the current find_peaks or a new function dedicated for peaks declustering methods.) Cheers, Ali Ali _______________________________________________ SciPy-Dev mailing list SciPy-Dev at python.org https://mail.python.org/mailman/listinfo/scipy-dev _______________________________________________ SciPy-Dev mailing list SciPy-Dev at python.org https://mail.python.org/mailman/listinfo/scipy-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sun Oct 21 16:48:27 2018 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 22 Oct 2018 09:48:27 +1300 Subject: [SciPy-Dev] Finding signal upcrossings and declustered peaks In-Reply-To: References: Message-ID: On Mon, Oct 22, 2018 at 8:51 AM Ali Cetin wrote: > > > ------------------------------ > *From:* SciPy-Dev on > behalf of Ralf Gommers > *Sent:* Saturday, October 20, 2018 00:14 > *To:* SciPy Developers List > *Subject:* Re: [SciPy-Dev] Finding signal upcrossings and declustered > peaks > > > > On Fri, Oct 19, 2018 at 7:28 PM Ali Cetin wrote: > > > Are these covered by the recent find_peaks functions or brand new ones? > > As far as I can see, the current find_peaks don't cover it. 
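As a rough illustration of the declustering step in that proposal (not the actual pull request; the helper name below is made up), the n largest peaks in each batch of peaks between consecutive upcrossings can be picked out along these lines:

import numpy as np
from scipy.signal import find_peaks

def declustered_peaks(x, level=0.0, n=1):
    """Illustrative sketch: indices of the n largest peaks in each batch of
    peaks between consecutive upcrossings of `level`."""
    x = np.asarray(x)
    above = x > level
    up = np.nonzero(~above[:-1] & above[1:])[0]        # upcrossing indices
    peaks, _ = find_peaks(x, height=level)             # all peaks above the level
    bounds = np.r_[up, len(x) - 1]                     # close the last batch at the end
    out = []
    for start, stop in zip(bounds[:-1], bounds[1:]):
        batch = peaks[(peaks > start) & (peaks <= stop)]
        out.extend(batch[np.argsort(x[batch])[-n:]])   # keep the n largest of the batch
    return np.sort(np.array(out, dtype=int))

# a rippled sine has several local peaks per excursion above zero;
# declustering with n=1 keeps only one peak per excursion
t = np.linspace(0, 4 * np.pi, 400)
sig = np.sin(t) + 0.3 * np.sin(5 * t)
print(declustered_peaks(sig, level=0.0, n=1))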
Also, finding > crossings (up and down) is not "peak finding", but necessary to find > declustered peaks. > > On Thu, Oct 18, 2018, 22:53 Ali Cetin wrote: > > > > ------------------------------ > *From:* SciPy-Dev on > behalf of Ralf Gommers > *Sent:* Thursday, October 18, 2018 21:59 > *To:* SciPy Developers List > *Subject:* Re: [SciPy-Dev] Finding signal upcrossings and declustered > peaks > > > > On Thu, Oct 18, 2018 at 10:41 AM Ali Cetin wrote: > > Hi, > > this is my first attempt to write to this mailing list, so if I'm doing > something wrong please bear with me. > > > Hi Ali, no worries you got it all right. welcome:) > > > I've been working with extreme value analysis during my PhD and also in my > current job. What we often do is to find signal upcrossings (such as mean > and zero upcrossings) and find declustered peaks. That is, find the largest > peak between two upcrossings, or the two largest peaks between two > upcrossings etc... > > I'm proposing to add two new functions to the scipy.signal submodule: > - find_upcross; takes in signal and returns index of upcrossings wrt > user defined upcrossing level > - find_peaks_dc; takes in signal and returns index of peaks (n largest) > > (find_peaks_dc is not necessarly easily incorporated into find_peaks, so > it may be cleaner to have a separate function). > > Any thought on this? > > > Do you have any references for the algorithms? > > Well, the "algorithms" are rather straight forward and heuristic. > > - find_upcross is often referred to as zero-crossing ( > https://en.wikipedia.org/wiki/Zero_crossing > ) > (a special case). It is essentially detecting sign changes. Upcrossing > count and rates are important statistics in time series (extreme value) > analysis. Side note: Zero-crossing rate is by def the ratio between the > second and zeroth moment of the power spectrum of the signal. > > - find_peaks_dc finds all peaks above an upcrossing level (or > threshold) between two consecutive upcrossings, i.e. batches of peaks. Then > the n-largest peaks are selected from each batch. Declustering is a > technique often used to break down statistical dependency between peaks > when performing extreme value analysis, and thus be able to use simpler > distributions to describe them. As n-> inf find_peaks_dc -> find_peaks ? > (Albeit in many practical situations, inf < 5). (Note that find_peaks_dc > depends on find_upcross). > > The code are just a few lines and mostly numpy array operations. > > > It sounds like you have implementations ready, could you link to them? > (put in a git branch or a gist for example). > > If it's just a few lines of code and not based on a publication, then I'm > not sure we'd want to add these. We are interested in further peak finding > improvements, however it should be clear for any new functions that they're > an improvement over what we currently have. Otherwise I'm afraid we keep > adding separate functions that all do a small subset of the spectrum of > what users are interested in. E.g., from your description it's not clear > how to treat zero crossings in the presence of noise. > > Cheers, > Ralf > > I think I misunderstood what you ment by reference; peak declustering > methods dont necessarily have a reference paper, as the methods are rather > self-explanatory. However, scientific papers and textbooks that use these > methods are plenty! (Peaks declustering is indeed a very common technique > in extreme value analysis. 
Just google "peak over threshold declustering") > These are some of them: > - Coles, An Introduction to Statistical Modeling of Extreme Values, ( > https://www.springer.com/us/book/9781852334598) > - Davison, A. C., & Smith, R. L. (1990). Models for exceedances over > high thresholds. Journal of the Royal Statistical Society. Series B > (Methodological), 393-442. > - Ferro, C. A. T. and Segers, J. (2003) Inference for clusters of > extreme values. Journal of the Royal Statistical Society B, 65, 545--556. > > I also note that peak declustering methods are available in R. ( > https://www.rdocumentation.org/packages/extRemes/versions/1.65/topics/decluster.runs > ) > Thanks, that all helps! > Yes, I have written a "small" package that can find up-crossings and one > particular method for peaks declustering (https://github.com/4Subsea/evapy). > However, the functions in this package are rather limited in scope, limited > to 1D arrays (performance optimized), and depends heavily on NumPy and > SciPy. I was thinking about expanding the functionality anyway, and thought > that I might do that by contributing to SciPy. > > I propose to take an agile approach on this: > > - I'm almost done re-writing base signal up/down-crossing module. I > can make a pull-request to scipy in the coming days. (It will add similar > functionality as argrelmax, argrelmin -> argupcross, argdowncross etc.) If > you don't like it, we can stop it there. > > This sounds good, always easier to talk about a feature when there's already code. Cheers, Ralf > - next step may be to add peaks declustering methods (whether by > extending the current find_peaks or a new function dedicated for peaks > declustering methods.) > > Cheers, > Ali > > > Ali > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Oct 22 14:06:53 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 22 Oct 2018 12:06:53 -0600 Subject: [SciPy-Dev] NumPy 1.15.3 release Message-ID: Hi All, On behalf of the NumPy team, I am pleased to announce the release of NumPy 1.15.3. This is a bugfix release for bugs and regressions reported following the 1.15.2 release. The most noticeable fix is probably for the memory leak encountered when slicing classes derived from Numpy. The Python versions supported by this release are 2.7, 3.4-3.7. Wheels for this release can be downloaded from PyPI , source archives are available from Github . Compatibility Note ================== The NumPy 1.15.x OS X wheels released on PyPI no longer contain 32-bit binaries. That will also be the case in future releases. See `#11625 `__ for the related discussion. Those needing 32-bit support should look elsewhere or build from source. Contributors ============ A total of 7 people contributed to this release. People with a "+" by their names contributed a patch for the first time. 
* Allan Haldane * Charles Harris * Jeroen Demeyer * Kevin Sheppard * Matthew Bowden + * Matti Picus * Tyler Reddy Pull requests merged ==================== A total of 12 pull requests were merged for this release. * `#12080 `__: MAINT: Blacklist some MSVC complex functions. * `#12083 `__: TST: Add azure CI testing to 1.15.x branch. * `#12084 `__: BUG: test_path() now uses Path.resolve() * `#12085 `__: TST, MAINT: Fix some failing tests on azure-pipelines mac and... * `#12187 `__: BUG: Fix memory leak in mapping.c * `#12188 `__: BUG: Allow boolean subtract in histogram * `#12189 `__: BUG: Fix in-place permutation * `#12190 `__: BUG: limit default for get_num_build_jobs() to 8 * `#12191 `__: BUG: OBJECT_to_* should check for errors * `#12192 `__: DOC: Prepare for NumPy 1.15.3 release. * `#12237 `__: BUG: Fix MaskedArray fill_value type conversion. * `#12238 `__: TST: Backport azure-pipeline testing fixes for Mac Cheers, Charles Harris -------------- next part -------------- An HTML attachment was scrubbed... URL: From ali.cetin at outlook.com Mon Oct 22 14:53:09 2018 From: ali.cetin at outlook.com (Ali Cetin) Date: Mon, 22 Oct 2018 18:53:09 +0000 Subject: [SciPy-Dev] Finding signal upcrossings and declustered peaks In-Reply-To: References: , Message-ID: ________________________________ From: SciPy-Dev on behalf of Ralf Gommers Sent: Sunday, October 21, 2018 22:48 To: SciPy Developers List Subject: Re: [SciPy-Dev] Finding signal upcrossings and declustered peaks On Mon, Oct 22, 2018 at 8:51 AM Ali Cetin > wrote: ________________________________ From: SciPy-Dev > on behalf of Ralf Gommers > Sent: Saturday, October 20, 2018 00:14 To: SciPy Developers List Subject: Re: [SciPy-Dev] Finding signal upcrossings and declustered peaks On Fri, Oct 19, 2018 at 7:28 PM Ali Cetin > wrote: Are these covered by the recent find_peaks functions or brand new ones? As far as I can see, the current find_peaks don't cover it. Also, finding crossings (up and down) is not "peak finding", but necessary to find declustered peaks. On Thu, Oct 18, 2018, 22:53 Ali Cetin > wrote: ________________________________ From: SciPy-Dev > on behalf of Ralf Gommers > Sent: Thursday, October 18, 2018 21:59 To: SciPy Developers List Subject: Re: [SciPy-Dev] Finding signal upcrossings and declustered peaks On Thu, Oct 18, 2018 at 10:41 AM Ali Cetin > wrote: Hi, this is my first attempt to write to this mailing list, so if I'm doing something wrong please bear with me. Hi Ali, no worries you got it all right. welcome:) I've been working with extreme value analysis during my PhD and also in my current job. What we often do is to find signal upcrossings (such as mean and zero upcrossings) and find declustered peaks. That is, find the largest peak between two upcrossings, or the two largest peaks between two upcrossings etc... I'm proposing to add two new functions to the scipy.signal submodule: - find_upcross; takes in signal and returns index of upcrossings wrt user defined upcrossing level - find_peaks_dc; takes in signal and returns index of peaks (n largest) (find_peaks_dc is not necessarly easily incorporated into find_peaks, so it may be cleaner to have a separate function). Any thought on this? Do you have any references for the algorithms? Well, the "algorithms" are rather straight forward and heuristic. - find_upcross is often referred to as zero-crossing (https://en.wikipedia.org/wiki/Zero_crossing) (a special case). It is essentially detecting sign changes. 
Upcrossing count and rates are important statistics in time series (extreme value) analysis. Side note: Zero-crossing rate is by def the ratio between the second and zeroth moment of the power spectrum of the signal. - find_peaks_dc finds all peaks above an upcrossing level (or threshold) between two consecutive upcrossings, i.e. batches of peaks. Then the n-largest peaks are selected from each batch. Declustering is a technique often used to break down statistical dependency between peaks when performing extreme value analysis, and thus be able to use simpler distributions to describe them. As n-> inf find_peaks_dc -> find_peaks ? (Albeit in many practical situations, inf < 5). (Note that find_peaks_dc depends on find_upcross). The code are just a few lines and mostly numpy array operations. It sounds like you have implementations ready, could you link to them? (put in a git branch or a gist for example). If it's just a few lines of code and not based on a publication, then I'm not sure we'd want to add these. We are interested in further peak finding improvements, however it should be clear for any new functions that they're an improvement over what we currently have. Otherwise I'm afraid we keep adding separate functions that all do a small subset of the spectrum of what users are interested in. E.g., from your description it's not clear how to treat zero crossings in the presence of noise. Cheers, Ralf I think I misunderstood what you ment by reference; peak declustering methods dont necessarily have a reference paper, as the methods are rather self-explanatory. However, scientific papers and textbooks that use these methods are plenty! (Peaks declustering is indeed a very common technique in extreme value analysis. Just google "peak over threshold declustering") These are some of them: - Coles, An Introduction to Statistical Modeling of Extreme Values, (https://www.springer.com/us/book/9781852334598) - Davison, A. C., & Smith, R. L. (1990). Models for exceedances over high thresholds. Journal of the Royal Statistical Society. Series B (Methodological), 393-442. - Ferro, C. A. T. and Segers, J. (2003) Inference for clusters of extreme values. Journal of the Royal Statistical Society B, 65, 545--556. I also note that peak declustering methods are available in R. (https://www.rdocumentation.org/packages/extRemes/versions/1.65/topics/decluster.runs) Thanks, that all helps! Yes, I have written a "small" package that can find up-crossings and one particular method for peaks declustering (https://github.com/4Subsea/evapy). However, the functions in this package are rather limited in scope, limited to 1D arrays (performance optimized), and depends heavily on NumPy and SciPy. I was thinking about expanding the functionality anyway, and thought that I might do that by contributing to SciPy. I propose to take an agile approach on this: * I'm almost done re-writing base signal up/down-crossing module. I can make a pull-request to scipy in the coming days. (It will add similar functionality as argrelmax, argrelmin -> argupcross, argdowncross etc.) If you don't like it, we can stop it there. This sounds good, always easier to talk about a feature when there's already code. Cheers, Ralf Hello Ralf, just made a pull request with base functionality required. * next step may be to add peaks declustering methods (whether by extending the current find_peaks or a new function dedicated for peaks declustering methods.) 
Cheers, Ali Ali _______________________________________________ SciPy-Dev mailing list SciPy-Dev at python.org https://mail.python.org/mailman/listinfo/scipy-dev _______________________________________________ SciPy-Dev mailing list SciPy-Dev at python.org https://mail.python.org/mailman/listinfo/scipy-dev _______________________________________________ SciPy-Dev mailing list SciPy-Dev at python.org https://mail.python.org/mailman/listinfo/scipy-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: From nobodyinperson at gmx.de Wed Oct 24 03:46:02 2018 From: nobodyinperson at gmx.de (=?UTF-8?Q?Yann_B=c3=bcchau?=) Date: Wed, 24 Oct 2018 09:46:02 +0200 Subject: [SciPy-Dev] Unable to build RPM package via setup.py bdist_rpm Message-ID: <3dd08eac-2640-0961-487f-a3a908893273@gmx.de> Hello everyone, After countless attempts and a good portion of frustration I am now asking for help in this mailinglist. I would like to build a numpy RPM package. There is the |setup.py bdist_rpm| command which does that. My final goal is to build numpy for SailfishOS (mobile operating system) myself. (There are numpy packages on OpenRepos.net, but they are outdated and I need an up-to-date numpy to package an up-to-date matplotlib?) Eventually, |python3 setup.py bdist_rpm| always fails with: |executing numpy/core/code_generators/generate_numpy_api.py Running from numpy source directory. /usr/lib/python3.6/distutils/dist.py:261: UserWarning: Unknown distribution option: 'define_macros' warnings.warn(msg) error: [Errno 2] No such file or directory: 'numpy/core/code_generators/../src/multiarray/arraytypes.c.src' | I was able to boil the problem down to these files missing in the source tarball |build/bdist_rpm*/rpm/SOURCES/numpy-1.15.3.tar.gz|: |numpy/linalg/umath_linalg.c.src numpy/core/src/multiarray/arraytypes.c.src numpy/core/src/multiarray/scalartypes.c.src numpy/core/src/umath/_umath_tests.c.src numpy/core/src/umath/loops.h.src numpy/core/src/multiarray/nditer_templ.c.src numpy/core/src/umath/_operand_flag_tests.c.src numpy/core/src/umath/_struct_ufunc_tests.c.src numpy/core/src/umath/_rational_tests.c.src numpy/core/src/umath/scalarmath.c.src numpy/core/src/multiarray/_multiarray_tests.c.src numpy/core/src/multiarray/lowlevel_strided_loops.c.src numpy/core/src/umath/funcs.inc.src numpy/core/src/umath/loops.c.src numpy/core/src/multiarray/einsum.c.src | Interestingly, when I run |python3 setup.py sdist|, the created tarball under |dist/numpy-1.15.3.tar.gz| contains these files. So my question is: What is going wrong here? The |bdist_rpm| log shows that it is |running sdist|, but these |*.src|-files are not included in this run. What is different in the |bdist_rpm| run? I also was not able to just run |rpmbuild| directly due to strange errors like |error: Macro %__python has empty body| and the like. The |python3 setup.py install| mechanism works well both on my Ubuntu 18.04 and my Jolla Phone with SailfishOS. Just the |bdist_rpm| part does not work. I would be very delighted if someone could help me with this. Cheers, Yann ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From tyler.je.reddy at gmail.com Wed Oct 24 19:55:04 2018 From: tyler.je.reddy at gmail.com (Tyler Reddy) Date: Wed, 24 Oct 2018 16:55:04 -0700 Subject: [SciPy-Dev] SciPy 1.2.0 release schedule Message-ID: Hi all, It is almost 6 months after the 1.1.0 release on May 5, so probably time to plan the 1.2.0 release. 
It would be a good idea to look over the PRs with a 1.2.0 milestone , and tag anything else that should have this milestone appropriately. I'd like to propose the following schedule: Nov. 5: branch 1.2.x Nov. 8: rc1 Nov. 21: rc2 (if needed) Nov. 30: final release Thoughts? Best wishes, Tyler -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Fri Oct 26 17:37:06 2018 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 27 Oct 2018 10:37:06 +1300 Subject: [SciPy-Dev] SciPy 1.2.0 release schedule In-Reply-To: References: Message-ID: On Thu, Oct 25, 2018 at 12:55 PM Tyler Reddy wrote: > Hi all, > > It is almost 6 months after the 1.1.0 release on May 5, so probably time > to plan the 1.2.0 release. It would be a good idea to look over the PRs > with a 1.2.0 milestone > , > and tag anything else that should have this milestone appropriately. > > I'd like to propose the following schedule: > > Nov. 5: branch 1.2.x > Nov. 8: rc1 > Nov. 21: rc2 (if needed) > Nov. 30: final release > > Thoughts? > This looks like a good schedule to me. We'll probably struggle to get some PRs marked for 1.2.0 merged, but that's always the case. Tyler, if you send me your PyPI username I can give you permissions to create releases. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From mikofski at berkeley.edu Fri Oct 26 22:03:03 2018 From: mikofski at berkeley.edu (Mark Alexander Mikofski) Date: Fri, 26 Oct 2018 19:03:03 -0700 Subject: [SciPy-Dev] SciPy 1.2.0 release schedule In-Reply-To: References: Message-ID: Hi Tyler and others, Thanks for managing the v1.2 release. I think PR #8431, Cython optimize zeros API, is ready, hopefully, to merge. It's been through several rounds of reviews and I think I've accommodated all of the recommendations, all tests are passing, and there's been strong support. Anyone please take a look. https://github.com/scipy/scipy/pull/8431 Thanks, Mark On Fri, Oct 26, 2018, 2:38 PM Ralf Gommers wrote: > > > On Thu, Oct 25, 2018 at 12:55 PM Tyler Reddy > wrote: > >> Hi all, >> >> It is almost 6 months after the 1.1.0 release on May 5, so probably time >> to plan the 1.2.0 release. It would be a good idea to look over the PRs >> with a 1.2.0 milestone >> , >> and tag anything else that should have this milestone appropriately. >> >> I'd like to propose the following schedule: >> >> Nov. 5: branch 1.2.x >> Nov. 8: rc1 >> Nov. 21: rc2 (if needed) >> Nov. 30: final release >> >> Thoughts? >> > > This looks like a good schedule to me. We'll probably struggle to get some > PRs marked for 1.2.0 merged, but that's always the case. > > Tyler, if you send me your PyPI username I can give you permissions to > create releases. > > Cheers, > Ralf > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sat Oct 27 15:41:22 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 27 Oct 2018 15:41:22 -0400 Subject: [SciPy-Dev] matrix_sqrt for singular symmetric square matrices Message-ID: I needed a more general version of a matrix square root for GAM in statsmodels. https://github.com/statsmodels/statsmodels/pull/5296 Is there already something like this in numpy/scipy land? Would there be interest in adding something like this? Improvements to my implementation? 
(threshold is not in terms of rcond, because I have more intuition about small eigen values.) (I still don't consider myself to be a linalg expert.) Josef def matrix_sqrt(mat, inverse=False, full=False, nullspace=False, threshold=1e-15): """matrix square root for symmetric matrices Usage is for decomposing a covariance function S into a square root R such that R' R = S if inverse is False, or R' R = pinv(S) if inverse is True Parameters ---------- mat : array_like, 2-d square symmetric square matrix for which square root or inverse square root is computed. There is no checking for whether the matrix is symmetric. A warning is issued if some singular values are negative, i.e. below the negative of the threshold. inverse : bool If False (default), then the matrix square root is returned. If inverse is True, then the matrix square root of the inverse matrix is returned. full : bool If full is False (default, then the square root has reduce number of rows if the matrix is singular, i.e. has singular values below the threshold. nullspace: bool If nullspace is true, then the matrix square root of the null space of the matrix is returned. threshold : float Singular values below the threshold are dropped. Returns ------- msqrt : ndarray matrix square root or square root of inverse matrix. """ # see also scipy.linalg null_space u, s, v = np.linalg.svd(mat) if np.any(s < -threshold): import warnings warnings.warn('some singular values are negative') if not nullspace: mask = s > threshold s[s < threshold] = 0 else: mask = s < threshold s[s > threshold] = 0 sqrt_s = np.sqrt(s[mask]) if inverse: sqrt_s = 1 / np.sqrt(s[mask]) if full: b = np.dot(u[:, mask], np.dot(np.diag(sqrt_s), v[mask])) else: b = np.dot(np.diag(sqrt_s), v[mask]) return b -------------- next part -------------- An HTML attachment was scrubbed... URL: From phillip.m.feldman at gmail.com Sat Oct 27 16:01:29 2018 From: phillip.m.feldman at gmail.com (Phillip Feldman) Date: Sat, 27 Oct 2018 13:01:29 -0700 Subject: [SciPy-Dev] matrix_sqrt for singular symmetric square matrices In-Reply-To: References: Message-ID: A matrix square-root would be very useful. As noted here, the matrix square-root is typically non-unique: https://en.wikipedia.org/wiki/Square_root_of_a_matrix Since NumPy has methods for eigenvalues, calculating a matrix square-root via diagonalization should be straightforward. Phillip On Sat, Oct 27, 2018 at 12:42 PM wrote: > I needed a more general version of a matrix square root for GAM in > statsmodels. > https://github.com/statsmodels/statsmodels/pull/5296 > > Is there already something like this in numpy/scipy land? > Would there be interest in adding something like this? > Improvements to my implementation? > (threshold is not in terms of rcond, because I have more intuition about > small eigen values.) > > (I still don't consider myself to be a linalg expert.) > > Josef > > def matrix_sqrt(mat, inverse=False, full=False, nullspace=False, threshold=1e-15): > """matrix square root for symmetric matrices > > Usage is for decomposing a covariance function S into a square root R > such that > > R' R = S if inverse is False, or > R' R = pinv(S) if inverse is True > > Parameters > ---------- > mat : array_like, 2-d square > symmetric square matrix for which square root or inverse square > root is computed. > There is no checking for whether the matrix is symmetric. > A warning is issued if some singular values are negative, i.e. > below the negative of the threshold. 
> inverse : bool > If False (default), then the matrix square root is returned. > If inverse is True, then the matrix square root of the inverse > matrix is returned. > full : bool > If full is False (default, then the square root has reduce number > of rows if the matrix is singular, i.e. has singular values below > the threshold. > nullspace: bool > If nullspace is true, then the matrix square root of the null space > of the matrix is returned. > threshold : float > Singular values below the threshold are dropped. > > Returns > ------- > msqrt : ndarray > matrix square root or square root of inverse matrix. > > """ > # see also scipy.linalg null_space > u, s, v = np.linalg.svd(mat) > if np.any(s < -threshold): > import warnings > warnings.warn('some singular values are negative') > > if not nullspace: > mask = s > threshold > s[s < threshold] = 0 > else: > mask = s < threshold > s[s > threshold] = 0 > > sqrt_s = np.sqrt(s[mask]) > if inverse: > sqrt_s = 1 / np.sqrt(s[mask]) > > if full: > b = np.dot(u[:, mask], np.dot(np.diag(sqrt_s), v[mask])) > else: > b = np.dot(np.diag(sqrt_s), v[mask]) > return b > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sat Oct 27 16:16:58 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 27 Oct 2018 16:16:58 -0400 Subject: [SciPy-Dev] matrix_sqrt for singular symmetric square matrices In-Reply-To: References: Message-ID: On Sat, Oct 27, 2018 at 4:01 PM Phillip Feldman wrote: > A matrix square-root would be very useful. As noted here, the matrix > square-root is typically non-unique: > > https://en.wikipedia.org/wiki/Square_root_of_a_matrix > Which basis is chosen is irrelevant in most use cases that I know. So the non-uniqueness can be broken in a algorithm specific way. In the full rank case I have only seen the use of either cholesky or svd. statsmodels includes now the code from a python "rotations" package that can be used when we want to have interpretable components as in principal component or factor analysis. However, in my current application the matrix sqrt eventually ends up in quadratic forms or we go back to the original space for the interpretation. Josef > > > Since NumPy has methods for eigenvalues, calculating a matrix square-root > via diagonalization should be straightforward. > > Phillip > > On Sat, Oct 27, 2018 at 12:42 PM wrote: > >> I needed a more general version of a matrix square root for GAM in >> statsmodels. >> https://github.com/statsmodels/statsmodels/pull/5296 >> >> Is there already something like this in numpy/scipy land? >> Would there be interest in adding something like this? >> Improvements to my implementation? >> (threshold is not in terms of rcond, because I have more intuition about >> small eigen values.) >> >> (I still don't consider myself to be a linalg expert.) >> >> Josef >> >> def matrix_sqrt(mat, inverse=False, full=False, nullspace=False, threshold=1e-15): >> """matrix square root for symmetric matrices >> >> Usage is for decomposing a covariance function S into a square root R >> such that >> >> R' R = S if inverse is False, or >> R' R = pinv(S) if inverse is True >> >> Parameters >> ---------- >> mat : array_like, 2-d square >> symmetric square matrix for which square root or inverse square >> root is computed. >> There is no checking for whether the matrix is symmetric. 
>> A warning is issued if some singular values are negative, i.e. >> below the negative of the threshold. >> inverse : bool >> If False (default), then the matrix square root is returned. >> If inverse is True, then the matrix square root of the inverse >> matrix is returned. >> full : bool >> If full is False (default, then the square root has reduce number >> of rows if the matrix is singular, i.e. has singular values below >> the threshold. >> nullspace: bool >> If nullspace is true, then the matrix square root of the null space >> of the matrix is returned. >> threshold : float >> Singular values below the threshold are dropped. >> >> Returns >> ------- >> msqrt : ndarray >> matrix square root or square root of inverse matrix. >> >> """ >> # see also scipy.linalg null_space >> u, s, v = np.linalg.svd(mat) >> if np.any(s < -threshold): >> import warnings >> warnings.warn('some singular values are negative') >> >> if not nullspace: >> mask = s > threshold >> s[s < threshold] = 0 >> else: >> mask = s < threshold >> s[s > threshold] = 0 >> >> sqrt_s = np.sqrt(s[mask]) >> if inverse: >> sqrt_s = 1 / np.sqrt(s[mask]) >> >> if full: >> b = np.dot(u[:, mask], np.dot(np.diag(sqrt_s), v[mask])) >> else: >> b = np.dot(np.diag(sqrt_s), v[mask]) >> return b >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at python.org >> https://mail.python.org/mailman/listinfo/scipy-dev >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Sat Oct 27 16:52:42 2018 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 27 Oct 2018 22:52:42 +0200 Subject: [SciPy-Dev] matrix_sqrt for singular symmetric square matrices In-Reply-To: References: Message-ID: <4dab76941a12e5d7c946ceb23203da38b9643708.camel@iki.fi> la, 2018-10-27 kello 15:41 -0400, josef.pktd at gmail.com kirjoitti: > I needed a more general version of a matrix square root for GAM in > statsmodels. > > > Is there already something like this in numpy/scipy land? > Would there be interest in adding something like this? > Improvements to my implementation? > (threshold is not in terms of rcond, because I have more intuition > about small eigen values.) For matrix square root there is: https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.sqrtm.html which IIRC uses a good algorithm. But matrix square root R is the solution to R^2 = S --- the solution to L' L = S is given by (conjugate of) Cholesky decomposition, https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.linalg.cholesky.html if I understand what you want correctly. Pauli > > (I still don't consider myself to be a linalg expert.) > > Josef > > def matrix_sqrt(mat, inverse=False, full=False, nullspace=False, > threshold=1e-15): > """matrix square root for symmetric matrices > > Usage is for decomposing a covariance function S into a square > root R > such that > > R' R = S if inverse is False, or > R' R = pinv(S) if inverse is True > > Parameters > ---------- > mat : array_like, 2-d square > symmetric square matrix for which square root or inverse > square > root is computed. > There is no checking for whether the matrix is symmetric. > A warning is issued if some singular values are negative, > i.e. > below the negative of the threshold. > inverse : bool > If False (default), then the matrix square root is returned. 
> If inverse is True, then the matrix square root of the > inverse > matrix is returned. > full : bool > If full is False (default, then the square root has reduce > number > of rows if the matrix is singular, i.e. has singular values > below > the threshold. > nullspace: bool > If nullspace is true, then the matrix square root of the null > space > of the matrix is returned. > threshold : float > Singular values below the threshold are dropped. > > Returns > ------- > msqrt : ndarray > matrix square root or square root of inverse matrix. > > """ > # see also scipy.linalg null_space > u, s, v = np.linalg.svd(mat) > if np.any(s < -threshold): > import warnings > warnings.warn('some singular values are negative') > > if not nullspace: > mask = s > threshold > s[s < threshold] = 0 > else: > mask = s < threshold > s[s > threshold] = 0 > > sqrt_s = np.sqrt(s[mask]) > if inverse: > sqrt_s = 1 / np.sqrt(s[mask]) > > if full: > b = np.dot(u[:, mask], np.dot(np.diag(sqrt_s), v[mask])) > else: > b = np.dot(np.diag(sqrt_s), v[mask]) > return b > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev From pav at iki.fi Sat Oct 27 17:08:32 2018 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 27 Oct 2018 23:08:32 +0200 Subject: [SciPy-Dev] matrix_sqrt for singular symmetric square matrices In-Reply-To: <4dab76941a12e5d7c946ceb23203da38b9643708.camel@iki.fi> References: <4dab76941a12e5d7c946ceb23203da38b9643708.camel@iki.fi> Message-ID: la, 2018-10-27 kello 22:52 +0200, Pauli Virtanen kirjoitti: > la, 2018-10-27 kello 15:41 -0400, josef.pktd at gmail.com kirjoitti: > > I needed a more general version of a matrix square root for GAM in > > statsmodels. > > > > > > Is there already something like this in numpy/scipy land? > > Would there be interest in adding something like this? > > Improvements to my implementation? > > (threshold is not in terms of rcond, because I have more intuition > > about small eigen values.) > > For matrix square root there is: > https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.sqrtm.html > which IIRC uses a good algorithm. > > But matrix square root R is the solution to R^2 = S --- the solution > to > L' L = S is given by (conjugate of) Cholesky decomposition, > https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.linalg.cholesky.html > if I understand what you want correctly. If I now read the mail properly, for singular matrices (for which Cholesky won't work), I guess LDL decomposition could also give a starting point https://scipy.github.io/devdocs/generated/scipy.linalg.ldl.html > > (I still don't consider myself to be a linalg expert.) > > > > Josef > > > > def matrix_sqrt(mat, inverse=False, full=False, nullspace=False, > > threshold=1e-15): > > """matrix square root for symmetric matrices > > > > Usage is for decomposing a covariance function S into a square > > root R > > such that > > > > R' R = S if inverse is False, or > > R' R = pinv(S) if inverse is True > > > > Parameters > > ---------- > > mat : array_like, 2-d square > > symmetric square matrix for which square root or inverse > > square > > root is computed. > > There is no checking for whether the matrix is symmetric. > > A warning is issued if some singular values are negative, > > i.e. > > below the negative of the threshold. > > inverse : bool > > If False (default), then the matrix square root is > > returned. 
> > If inverse is True, then the matrix square root of the > > inverse > > matrix is returned. > > full : bool > > If full is False (default, then the square root has reduce > > number > > of rows if the matrix is singular, i.e. has singular values > > below > > the threshold. > > nullspace: bool > > If nullspace is true, then the matrix square root of the > > null > > space > > of the matrix is returned. > > threshold : float > > Singular values below the threshold are dropped. > > > > Returns > > ------- > > msqrt : ndarray > > matrix square root or square root of inverse matrix. > > > > """ > > # see also scipy.linalg null_space > > u, s, v = np.linalg.svd(mat) > > if np.any(s < -threshold): > > import warnings > > warnings.warn('some singular values are negative') > > > > if not nullspace: > > mask = s > threshold > > s[s < threshold] = 0 > > else: > > mask = s < threshold > > s[s > threshold] = 0 > > > > sqrt_s = np.sqrt(s[mask]) > > if inverse: > > sqrt_s = 1 / np.sqrt(s[mask]) > > > > if full: > > b = np.dot(u[:, mask], np.dot(np.diag(sqrt_s), v[mask])) > > else: > > b = np.dot(np.diag(sqrt_s), v[mask]) > > return b > > _______________________________________________ > > SciPy-Dev mailing list > > SciPy-Dev at python.org > > https://mail.python.org/mailman/listinfo/scipy-dev > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev From josef.pktd at gmail.com Sat Oct 27 18:40:58 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 27 Oct 2018 18:40:58 -0400 Subject: [SciPy-Dev] matrix_sqrt for singular symmetric square matrices In-Reply-To: References: <4dab76941a12e5d7c946ceb23203da38b9643708.camel@iki.fi> Message-ID: On Sat, Oct 27, 2018 at 5:10 PM Pauli Virtanen wrote: > la, 2018-10-27 kello 22:52 +0200, Pauli Virtanen kirjoitti: > > la, 2018-10-27 kello 15:41 -0400, josef.pktd at gmail.com kirjoitti: > > > I needed a more general version of a matrix square root for GAM in > > > statsmodels. > > > > > > > > > Is there already something like this in numpy/scipy land? > > > Would there be interest in adding something like this? > > > Improvements to my implementation? > > > (threshold is not in terms of rcond, because I have more intuition > > > about small eigen values.) > > > > For matrix square root there is: > > > https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.sqrtm.html > > which IIRC uses a good algorithm. > I could never figure out a use for those. AFAIU (9 or 10 years ago) they don't use the transpose R' R = S they use R R = S > > > > But matrix square root R is the solution to R^2 = S --- the solution > > to > > L' L = S is given by (conjugate of) Cholesky decomposition, > > > https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.linalg.cholesky.html > > if I understand what you want correctly. > > If I now read the mail properly, for singular matrices (for which > Cholesky won't work), I guess LDL decomposition could also give a > starting point > https://scipy.github.io/devdocs/generated/scipy.linalg.ldl.html > > The matrix is in most cases singular in my current usage for penalized splines. (The null space is the subspace that is not penalized.) statsmodels uses cholesky in almost all full rank cases. Josef > > > > (I still don't consider myself to be a linalg expert.) 
> > > > > > Josef > > > > > > def matrix_sqrt(mat, inverse=False, full=False, nullspace=False, > > > threshold=1e-15): > > > """matrix square root for symmetric matrices > > > > > > Usage is for decomposing a covariance function S into a square > > > root R > > > such that > > > > > > R' R = S if inverse is False, or > > > R' R = pinv(S) if inverse is True > > > > > > Parameters > > > ---------- > > > mat : array_like, 2-d square > > > symmetric square matrix for which square root or inverse > > > square > > > root is computed. > > > There is no checking for whether the matrix is symmetric. > > > A warning is issued if some singular values are negative, > > > i.e. > > > below the negative of the threshold. > > > inverse : bool > > > If False (default), then the matrix square root is > > > returned. > > > If inverse is True, then the matrix square root of the > > > inverse > > > matrix is returned. > > > full : bool > > > If full is False (default, then the square root has reduce > > > number > > > of rows if the matrix is singular, i.e. has singular values > > > below > > > the threshold. > > > nullspace: bool > > > If nullspace is true, then the matrix square root of the > > > null > > > space > > > of the matrix is returned. > > > threshold : float > > > Singular values below the threshold are dropped. > > > > > > Returns > > > ------- > > > msqrt : ndarray > > > matrix square root or square root of inverse matrix. > > > > > > """ > > > # see also scipy.linalg null_space > > > u, s, v = np.linalg.svd(mat) > > > if np.any(s < -threshold): > > > import warnings > > > warnings.warn('some singular values are negative') > > > > > > if not nullspace: > > > mask = s > threshold > > > s[s < threshold] = 0 > > > else: > > > mask = s < threshold > > > s[s > threshold] = 0 > > > > > > sqrt_s = np.sqrt(s[mask]) > > > if inverse: > > > sqrt_s = 1 / np.sqrt(s[mask]) > > > > > > if full: > > > b = np.dot(u[:, mask], np.dot(np.diag(sqrt_s), v[mask])) > > > else: > > > b = np.dot(np.diag(sqrt_s), v[mask]) > > > return b > > > _______________________________________________ > > > SciPy-Dev mailing list > > > SciPy-Dev at python.org > > > https://mail.python.org/mailman/listinfo/scipy-dev > > > > _______________________________________________ > > SciPy-Dev mailing list > > SciPy-Dev at python.org > > https://mail.python.org/mailman/listinfo/scipy-dev > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Sun Oct 28 03:59:06 2018 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sun, 28 Oct 2018 08:59:06 +0100 Subject: [SciPy-Dev] matrix_sqrt for singular symmetric square matrices In-Reply-To: <20181028075637.d2mhsekvxw6sswi7@phare.normalesup.org> References: <20181028075637.d2mhsekvxw6sswi7@phare.normalesup.org> Message-ID: <20181028075906.p5zi3mhmrkwxduzl@phare.normalesup.org> On Sun, Oct 28, 2018 at 08:56:37AM +0100, Gael Varoquaux wrote: > using '@' to denote the matrix product, and S = np.diag(s) is the > diagonal matrix of eigenvalues. 
The matrix square root is then given by: > sqrt(M) = U' @ np.diag(np.sqrt(s)) @ U I forgot to say: this is the definition used by Joseph, in his original post (so basically, I am backing his choice), with the only difference that I would not use an SVD, but and "eigh", which should be faster and more stable for SPD matrices (and non SPD matrices do not have a square root). G From gael.varoquaux at normalesup.org Sun Oct 28 03:56:37 2018 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sun, 28 Oct 2018 08:56:37 +0100 Subject: [SciPy-Dev] matrix_sqrt for singular symmetric square matrices In-Reply-To: References: Message-ID: <20181028075637.d2mhsekvxw6sswi7@phare.normalesup.org> On Sat, Oct 27, 2018 at 04:16:58PM -0400, josef.pktd at gmail.com wrote: > A matrix square-root would be very useful.? As noted here, the matrix > square-root is typically non-unique: > https://en.wikipedia.org/wiki/Square_root_of_a_matrix > Which basis is chosen is irrelevant in most use cases that I know. In some case, the following considerations matter: SDP matrices (symmetric definite positive matrices) form an algebraic structure linked to a group. The square root matrix on this group should be also SDP, hence it should be also symmetric. The easiest way to build it is using the eigen-value decomposition of the original matrix: M = U' @ S @ U using '@' to denote the matrix product, and S = np.diag(s) is the diagonal matrix of eigenvalues. The matrix square root is then given by: sqrt(M) = U' @ np.diag(np.sqrt(s)) @ U Unlike with using a Cholesky decomposition to obtain a square-root matrix, the operation defined above is smooth. Such considerations are typically important in signal processing, to manipulate covariance matrices for whitening and averaging. Ga?l From ilhanpolat at gmail.com Sun Oct 28 09:11:03 2018 From: ilhanpolat at gmail.com (Ilhan Polat) Date: Sun, 28 Oct 2018 14:11:03 +0100 Subject: [SciPy-Dev] matrix_sqrt for singular symmetric square matrices In-Reply-To: <20181028075906.p5zi3mhmrkwxduzl@phare.normalesup.org> References: <20181028075637.d2mhsekvxw6sswi7@phare.normalesup.org> <20181028075906.p5zi3mhmrkwxduzl@phare.normalesup.org> Message-ID: This is covered by LDLt decomposition with an extra step of taking the square root of each block in D. The economy mode of this would be removing rows/cols 0 blocks from D and L. For the second case I think a polar decomposition would be a better approach. Calling these factors a square root might take you out of the common terminology though On Sun, Oct 28, 2018 at 9:00 AM Gael Varoquaux < gael.varoquaux at normalesup.org> wrote: > On Sun, Oct 28, 2018 at 08:56:37AM +0100, Gael Varoquaux wrote: > > using '@' to denote the matrix product, and S = np.diag(s) is the > > diagonal matrix of eigenvalues. The matrix square root is then given by: > > > sqrt(M) = U' @ np.diag(np.sqrt(s)) @ U > > I forgot to say: this is the definition used by Joseph, in his original > post (so basically, I am backing his choice), with the only difference > that I would not use an SVD, but and "eigh", which should be faster and > more stable for SPD matrices (and non SPD matrices do not have a square > root). > > G > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From josef.pktd at gmail.com Sun Oct 28 10:02:07 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 28 Oct 2018 10:02:07 -0400 Subject: [SciPy-Dev] matrix_sqrt for singular symmetric square matrices In-Reply-To: References: <20181028075637.d2mhsekvxw6sswi7@phare.normalesup.org> <20181028075906.p5zi3mhmrkwxduzl@phare.normalesup.org> Message-ID: On Sun, Oct 28, 2018 at 9:11 AM Ilhan Polat wrote: > This is covered by LDLt decomposition with an extra step of taking the > square root of each block in D. The economy mode of this would be removing > rows/cols 0 blocks from D and L. For the second case I think a polar > decomposition would be a better approach. > > Calling these factors a square root might take you out of the common > terminology though > It's common terminology in statistics and econometrics with notation like R = S^{1/2} or Q = S^{-1/2} (In a related case where statsmodels uses cholesky for regression we have boring but descriptive names like cholsigma and cholsigmainv ) It would be fine if there is a "linalg" name that is discoverable. The point would be to have a function for when cholesky doesn't work and that I don't have to figure out each time I need it. I had also written a similar function as `matrix_half` in the past and inlined it several times, but it's the first time that I tried to get a reduced number of rows or columns for it. My usecase is similar to np.linalg.pinv that I use very often because it is convenient and simple even if working with the SVD directly would be computationally more efficient for getting additional results. Josef > > On Sun, Oct 28, 2018 at 9:00 AM Gael Varoquaux < > gael.varoquaux at normalesup.org> wrote: > >> On Sun, Oct 28, 2018 at 08:56:37AM +0100, Gael Varoquaux wrote: >> > using '@' to denote the matrix product, and S = np.diag(s) is the >> > diagonal matrix of eigenvalues. The matrix square root is then given by: >> >> > sqrt(M) = U' @ np.diag(np.sqrt(s)) @ U >> >> I forgot to say: this is the definition used by Joseph, in his original >> post (so basically, I am backing his choice), with the only difference >> that I would not use an SVD, but and "eigh", which should be faster and >> more stable for SPD matrices (and non SPD matrices do not have a square >> root). >> >> G >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at python.org >> https://mail.python.org/mailman/listinfo/scipy-dev >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sun Oct 28 10:08:26 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 28 Oct 2018 10:08:26 -0400 Subject: [SciPy-Dev] matrix_sqrt for singular symmetric square matrices In-Reply-To: References: <20181028075637.d2mhsekvxw6sswi7@phare.normalesup.org> <20181028075906.p5zi3mhmrkwxduzl@phare.normalesup.org> Message-ID: On Sun, Oct 28, 2018 at 10:02 AM wrote: > > > On Sun, Oct 28, 2018 at 9:11 AM Ilhan Polat wrote: > >> This is covered by LDLt decomposition with an extra step of taking the >> square root of each block in D. The economy mode of this would be removing >> rows/cols 0 blocks from D and L. For the second case I think a polar >> decomposition would be a better approach. 
>> >> Calling these factors a square root might take you out of the common >> terminology though >> > > It's common terminology in statistics and econometrics with notation like > R = S^{1/2} or Q = S^{-1/2} > > (In a related case where statsmodels uses cholesky for regression we have > boring but descriptive names like > cholsigma and cholsigmainv ) > > It would be fine if there is a "linalg" name that is discoverable. > The point would be to have a function for when cholesky doesn't work and > that I don't have to figure out each time I need it. > I had also written a similar function as `matrix_half` in the past and > inlined it several times, but it's the first time that I tried to get a > reduced number of rows or columns for it. > As extra: This time I also needed to figure out the null space version for matrix_sqrt. I don't use it yet, but the R package (mgcv) that I compare with uses it for additional penalization. Josef > > My usecase is similar to np.linalg.pinv that I use very often because it > is convenient and simple even if working with the SVD directly would be > computationally more efficient for getting additional results. > > Josef > > >> >> On Sun, Oct 28, 2018 at 9:00 AM Gael Varoquaux < >> gael.varoquaux at normalesup.org> wrote: >> >>> On Sun, Oct 28, 2018 at 08:56:37AM +0100, Gael Varoquaux wrote: >>> > using '@' to denote the matrix product, and S = np.diag(s) is the >>> > diagonal matrix of eigenvalues. The matrix square root is then given >>> by: >>> >>> > sqrt(M) = U' @ np.diag(np.sqrt(s)) @ U >>> >>> I forgot to say: this is the definition used by Joseph, in his original >>> post (so basically, I am backing his choice), with the only difference >>> that I would not use an SVD, but and "eigh", which should be faster and >>> more stable for SPD matrices (and non SPD matrices do not have a square >>> root). >>> >>> G >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at python.org >>> https://mail.python.org/mailman/listinfo/scipy-dev >>> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at python.org >> https://mail.python.org/mailman/listinfo/scipy-dev >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Sun Oct 28 10:12:43 2018 From: pav at iki.fi (Pauli Virtanen) Date: Sun, 28 Oct 2018 15:12:43 +0100 Subject: [SciPy-Dev] matrix_sqrt for singular symmetric square matrices In-Reply-To: <20181028075637.d2mhsekvxw6sswi7@phare.normalesup.org> References: <20181028075637.d2mhsekvxw6sswi7@phare.normalesup.org> Message-ID: <5520e7cc6fa3eea0da449a51ca3e24804f132786.camel@iki.fi> su, 2018-10-28 kello 08:56 +0100, Gael Varoquaux kirjoitti: > On Sat, Oct 27, 2018 at 04:16:58PM -0400, josef.pktd at gmail.com wrote: > > A matrix square-root would be very useful. As noted here, the > > matrix > > square-root is typically non-unique: > > https://en.wikipedia.org/wiki/Square_root_of_a_matrix > > Which basis is chosen is irrelevant in most use cases that I know. > > In some case, the following considerations matter: > > SDP matrices (symmetric definite positive matrices) form an algebraic > structure linked to a group. The square root matrix on this group > should > be also SDP, hence it should be also symmetric. Right indeed if R=R' the two equations are the same. scipy.linalg.sqrtm computes the principal matrix square root, which then should be SDP too. It can handle singular cases. 
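To make the two routes concrete, a minimal sketch (not code from this thread; the rank tolerance and the reduced "economy" factor are illustrative choices only, and the matrix_sqrt/matrix_half names mentioned earlier are not existing scipy functions) could look like:

import numpy as np
from scipy import linalg

# Singular symmetric PSD test matrix: rank 2 in dimension 4.
rng = np.random.RandomState(0)
A = rng.randn(4, 2)
M = A @ A.T

# Route 1: principal square root from the Schur-based solver
# (may be less clear-cut for exactly singular input, as discussed here).
S_schur = linalg.sqrtm(M)

# Route 2: symmetric square root from eigh; note that eigh returns
# M = U @ diag(w) @ U.T, so the transposes sit on the other side
# compared with the notation earlier in the thread.
w, U = linalg.eigh(M)
w = np.clip(w, 0.0, None)            # round-off can leave tiny negative eigenvalues
S_eigh = (U * np.sqrt(w)) @ U.T      # same as U @ np.diag(np.sqrt(w)) @ U.T

# Reduced ("economy") factor with one column per retained eigenvalue,
# so R has fewer columns but still satisfies R @ R.T == M.
keep = w > w.max() * 1e-12           # illustrative rank tolerance
R = U[:, keep] * np.sqrt(w[keep])

print(np.allclose(S_eigh @ S_eigh, M))   # True
print(np.allclose(R @ R.T, M))           # True
print(R.shape)                           # (4, 2)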
I would guess the algorithm in the presence of eigenvalues on the negative real line gives "continuous" behavior, i.e., you get the principal square root of a matrix with eigenvalues pushed to either side of the branch cut in some (uncontrolled) way giving either of the +/- i sqrt(|z|) and the result probably is close to Hermitian still in the same way as the eigenvalue decomposition. So for the question whether there's something to add to scipy here, I'm not so sure --- computation of the principal sqrtm, Cholesky, LDLt, and eigenvalue decomposition is there. Pauli > The easiest way to build > it is using the eigen-value decomposition of the original matrix: > > M = U' @ S @ U > > using '@' to denote the matrix product, and S = np.diag(s) is the > diagonal matrix of eigenvalues. The matrix square root is then given > by: > > sqrt(M) = U' @ np.diag(np.sqrt(s)) @ U > > Unlike with using a Cholesky decomposition to obtain a square-root > matrix, the operation defined above is smooth. > > Such considerations are typically important in signal processing, to > manipulate covariance matrices for whitening and averaging. > > Gaël > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev From gael.varoquaux at normalesup.org Mon Oct 29 06:34:14 2018 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Mon, 29 Oct 2018 11:34:14 +0100 Subject: [SciPy-Dev] matrix_sqrt for singular symmetric square matrices In-Reply-To: <5520e7cc6fa3eea0da449a51ca3e24804f132786.camel@iki.fi> References: <20181028075637.d2mhsekvxw6sswi7@phare.normalesup.org> <5520e7cc6fa3eea0da449a51ca3e24804f132786.camel@iki.fi> Message-ID: <20181029103414.fuyh3ygxwfdpsqcp@phare.normalesup.org> On Sun, Oct 28, 2018 at 03:12:43PM +0100, Pauli Virtanen wrote: > Right indeed if R=R' the two equations are the same. scipy.linalg.sqrtm > computes the principal matrix square root, which then should be SDP > too. Indeed, it probably works for our needs. I had forgotten about this function. Thank you. > So for the question whether there's something to add to scipy here, I'm > not so sure --- computation of the principal sqrtm, Cholesky, LDLt, and > eigenvalue decomposition is there. Me neither. The question is whether using eigh is more numerically stable than the Schur decomposition used in scipy.linalg.sqrtm. I do not know, I must admit. Gaël From rlucente at pipeline.com Mon Oct 29 18:01:09 2018 From: rlucente at pipeline.com (rlucente at pipeline.com) Date: Mon, 29 Oct 2018 18:01:09 -0400 Subject: [SciPy-Dev] Using Spark to scale SciPy? Message-ID: <047c01d46fd2$e6514c90$b2f3e5b0$@pipeline.com> I ran into a blog post titled Prediction at Scale with scikit-learn and PySpark Pandas UDFs by Michael Heilman https://medium.com/civis-analytics/prediction-at-scale-with-scikit-learn-and-pyspark-pandas-udfs-51d5ebfb2cd8 It seems to do a good job because it makes statements like One issue is that passing data between a) Java-based Spark execution processes, which send data between machines and can perform transformations super-efficiently, and b) a Python process (e.g., for predicting with scikit-learn) incurs some overhead due to serialization and inter-process communication. The article goes on to mention approaches to mitigate the above issues.
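To give a feel for the Arrow-backed pandas-UDF pattern the post describes, a rough sketch (only loosely modeled on the linked gist; the local-mode session, column names, and toy model below are invented for illustration, and Spark 2.3+ with PyArrow installed is assumed) might be:

import numpy as np
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf, PandasUDFType
from sklearn.linear_model import LogisticRegression

spark = (SparkSession.builder
         .master("local[2]")
         .appName("sklearn-scoring")
         .getOrCreate())
spark.conf.set("spark.sql.execution.arrow.enabled", "true")   # Arrow-backed batches

# Toy model fitted on the driver; a real workflow would load a pre-trained one.
rng = np.random.RandomState(0)
X = rng.normal(size=(1000, 2))
y = (X.sum(axis=1) > 0).astype(int)
model = LogisticRegression(solver="lbfgs").fit(X, y)

# A Spark DataFrame holding the two (made-up) feature columns.
sdf = spark.createDataFrame(pd.DataFrame(X, columns=["f0", "f1"]))

# Scalar pandas UDF: each batch arrives as pandas Series, so the
# scikit-learn call is vectorized instead of applied row by row.
@pandas_udf("double", PandasUDFType.SCALAR)
def score(f0, f1):
    features = np.column_stack([f0.values, f1.values])
    return pd.Series(model.predict_proba(features)[:, 1])

sdf.withColumn("score", score("f0", "f1")).show(5)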
For example: Having UDFs expect Pandas Series also saves converting between Python and NumPy floating point representations for scikit-learn There was actually a SciPy talk on this Performing Dimension Reduction at Scale with Applications to Public Sentiment Models | SciPy 2018 https://www.youtube.com/watch?v=31YeSfDklfc The other good thing is that it has code https://gist.github.com/mheilman/6ce261549b55bf4997ec102ad4e8d643#file-pyspark_pandas_udf_sklearn-ipynb I was wondering what approaches people have used to scale SciPy? The Bit Plumber -------------- next part -------------- An HTML attachment was scrubbed... URL: From rth.yurchak at pm.me Tue Oct 30 19:56:15 2018 From: rth.yurchak at pm.me (Roman Yurchak) Date: Tue, 30 Oct 2018 23:56:15 +0000 Subject: [SciPy-Dev] building scipy for WebAssembly Message-ID: Hello, I am currently working on building scipy for WebAssembly as part of the pyodide project (https://github.com/iodide-project/pyodide) and I was hoping for some feedback on that process. There is a preliminary build in https://github.com/iodide-project/pyodide/pull/211 Currently, this build uses scipy 0.17.1 as, from what I understood, that was one of the last versions that only included f77 without any f90 (https://github.com/scipy/scipy/issues/2829#issuecomment-223764054). In the WebAssembly environment there is currently no reliably working Fortran compiler (https://github.com/iodide-project/pyodide/issues/184), and f90 cannot be converted to C with f2c unlike f77. If one wanted to (experimentally) compile the latest version of scipy without a Fortran compiler, what would be your suggestions? i.e. - are there any alternatives to f2c that you think might work for f90 - or any automatic converter from f90 to f77 (so that f2c could be used)? I did search but maybe I missed something. Alternatively, if that's really not realistic, could someone please comment on the rate of adoption for f90/f95 in the scipy code base? The decision to support f90 was taken before the 0.18 release in 2016 but I'm not sure what impact it had on the code base. In other words, maybe there is a later version than 0.17.1 that might (mostly) work? Another point is linking BLAS/LAPACK. Currently reference BLAS and CLAPACK are linked statically as I haven't managed to do this dynamically yet. The issue is the package size: LAPACK, which is quite large, gets repeatedly included in around ~10 different .so modules, resulting in a 170MB package (after compression), as opposed to a ~30MB compressed package without BLAS/LAPACK. That is quite problematic when one is expected to download the dependencies at each page load (excluding caching). I'm not sure if there are other distributions of scipy that use static linking of LAPACK, or other things worth trying to reduce the package size, short of trying to make dynamic linking work or to detect and strip unused symbols? Also in scipy/linalg/setup.py I was wondering why/how the ATLAS_INFO macro defined the existence of `scipy.linalg._clapack`? For instance, when using CLAPACK (with libf2c), would the following be correct? lapack_opt = {'libraries': ['f2c', 'blas', 'lapack'], 'include_dirs': [], 'library_dirs': [''], 'language': 'f77', 'define_macros': [('NO_ATLAS_INFO', 1), ('HAVE_CBLAS', None)]} which does not appear to build `scipy.linalg._clapack`. Finally, in later scipy versions, https://scipy.github.io/devdocs/building/linux.html suggests that with "export LAPACK="
one can build scipy without LAPACK, but then _flapack is still a mandatory import in scipy.linalg.__init__ (via .lapack) as far as I can tell. Is scipy.linalg (and anything that depends on it) expected to produce import errors in the latter case then? In general, any comments or suggestions would be appreciated, Thank you, Roman From tom.w.augspurger at gmail.com Wed Oct 31 15:26:51 2018 From: tom.w.augspurger at gmail.com (Tom Augspurger) Date: Wed, 31 Oct 2018 12:26:51 -0700 Subject: [SciPy-Dev] Using Spark to scale SciPy? In-Reply-To: <047c01d46fd2$e6514c90$b2f3e5b0$@pipeline.com> References: <047c01d46fd2$e6514c90$b2f3e5b0$@pipeline.com> Message-ID: On Mon, Oct 29, 2018 at 3:17 PM wrote: > I ran into a blog post titled > > > > Prediction at Scale with scikit-learn and PySpark Pandas UDFs by Michael > Heilman > > > > > https://medium.com/civis-analytics/prediction-at-scale-with-scikit-learn-and-pyspark-pandas-udfs-51d5ebfb2cd8 > > > > It seems to do a good job because it makes statements like > > > > One issue is that passing data between > > a) Java-based Spark execution processes, which send data between machines > and can perform transformations super-efficiently, > > and > > b) a Python process (e.g., for predicting with scikit-learn) > > incurs some overhead due to serialization and inter-process communication. > > > > The article goes on to mention approaches to mitigate the above issues. > > > > For example: Having UDFs expect Pandas Series also saves converting > between Python and NumPy floating point representations for scikit-learn > > > > There was actually a SciPy talk on this > > Performing Dimension Reduction at Scale with Applications to Public > Sentiment Models | SciPy 2018 > > https://www.youtube.com/watch?v=31YeSfDklfc > > > > The other good thing is that it has code > > > > > https://gist.github.com/mheilman/6ce261549b55bf4997ec102ad4e8d643#file-pyspark_pandas_udf_sklearn-ipynb > > > > I was wondering what approaches people have used to scale SciPy? > http://examples.dask.org/machine-learning/parallel-prediction.html has an example similar to the Civis blog post, but uses Dask instead of Spark. > The Bit Plumber > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL:
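As a counterpart to the Spark approach above, a minimal sketch of chunk-wise parallel prediction with Dask (an illustrative variant, not the code from the linked Dask example; the chunk sizes and toy model are made up):

import dask
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy model standing in for a real fitted estimator.
rng = np.random.RandomState(0)
X_train = rng.normal(size=(1000, 10))
y_train = (X_train[:, 0] > 0).astype(int)
model = LogisticRegression(solver="lbfgs").fit(X_train, y_train)

# Pretend each chunk is a partition living on a worker; here they are random.
chunks = [rng.normal(size=(50_000, 10)) for _ in range(20)]

# Score every chunk in parallel; Dask schedules the scikit-learn calls
# across threads, processes or cluster workers depending on the scheduler.
lazy = [dask.delayed(model.predict)(chunk) for chunk in chunks]
predictions = np.concatenate(dask.compute(*lazy))
print(predictions.shape)   # (1000000,)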