From ralf.gommers at gmail.com Sun Mar 1 14:35:28 2020 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 1 Mar 2020 20:35:28 +0100 Subject: [SciPy-Dev] Disable rpaths in shared objects when cross compiling In-Reply-To: <20200227214628.b6wmzrlvxjl7wrkd@Kepler> References: <20200227214237.ddm7cqhq77i4j24a@Kepler> <20200227214628.b6wmzrlvxjl7wrkd@Kepler> Message-ID: On Thu, Feb 27, 2020 at 10:46 PM Greg Anders wrote: > Hi all, > > I'm cross compiling Scipy in Yocto Linux for an embedded platform. I'm > able to compile Scipy, but Yocto is giving me warnings because the > shared objects are linked with the -rpath flag using absolute paths on > the build host. I'd like to disable the -rpath flag when linking the > shared objects, but I'm not sure how to do that. > > I've dug through the numpy.distutils code and I found the following in > the CCompiler_customize_cmd function: > > if allow('rpath'): > self.set_runtime_library_dirs(cmd.rpath) > > So it *looks* like if I can pass 'rpath' into the optional `ignore` > parameter to the `customize_cmd` function, numpy will not use the -rpath > flag when linking. > > However, it's not clear to me how to _use_ the `ignore` parameter of > `customize_cmd`. All usages of `customize_cmd` in build_ext.py only use > a single parameter, so `ignore` is always set to the default value of an > empty tuple. > > Is there an easier way to disable rpaths than patching numpy.distutils > to pass `ignore=('rpath',)` as a second parameter to `customize_cmd`? > Hi Greg, the not so great news is that cross-compiling is in general not well-supported by distutils. I don't have an answer to your specific question, but I do want to point you to an issue where people were discussing Yocto cross-compiling: https://github.com/scipy/scipy/issues/8571. Hopefully that's of help. Cheers, Ralf > Thanks!
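The `ignore`/`allow` mechanism Greg quotes can be illustrated with a toy reproduction. This is a sketch of the pattern only, with simplified names and a plain dict standing in for the compiler object, not the actual numpy.distutils source:

```python
# Toy reproduction of the ignore/allow pattern quoted from
# numpy.distutils' CCompiler_customize_cmd (simplified, illustrative only).
class FakeCmd:
    # Stands in for the distutils build command; rpath is an absolute
    # build-host path of the kind Yocto warns about.
    rpath = ["/build/host/lib"]

def customize_cmd(compiler_opts, cmd, ignore=()):
    # An attribute is honored only if the command defines it AND it is
    # not listed in `ignore` -- so ignore=('rpath',) skips the -rpath flag.
    def allow(attr):
        return getattr(cmd, attr, None) is not None and attr not in ignore

    if allow('rpath'):
        compiler_opts['runtime_library_dirs'] = cmd.rpath
    return compiler_opts

with_rpath = customize_cmd({}, FakeCmd())
without_rpath = customize_cmd({}, FakeCmd(), ignore=('rpath',))
```

With `ignore=('rpath',)` the runtime library dirs are never set, which is the effect Greg is after; the open question in the thread is only how to thread that argument through the stock `build_ext` call sites.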
> > Greg > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From christoph.baumgarten at gmail.com Sun Mar 1 16:35:51 2020 From: christoph.baumgarten at gmail.com (Christoph Baumgarten) Date: Sun, 1 Mar 2020 22:35:51 +0100 Subject: [SciPy-Dev] Review of PR 11119 and 10796 Message-ID: Hi, two of my PRs have been open for a while, and it would be great if someone has time for a review: Cramer-von-Mises test: https://github.com/scipy/scipy/pull/11119 Exact p-values of the Wilcoxon test: https://github.com/scipy/scipy/pull/10796 Thanks Christoph -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at gmail.com Sun Mar 1 17:16:40 2020 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Sun, 1 Mar 2020 17:16:40 -0500 Subject: [SciPy-Dev] Review of PR 11119 and 10796 In-Reply-To: References: Message-ID: Hi Christoph, Thanks for the gentle reminder. I'll take a look this evening. Warren On 3/1/20, Christoph Baumgarten wrote: > Hi, > > two of my PRs have been open for a while, and it would be great if someone > has time for a review: > > Cramer-von-Mises test: https://github.com/scipy/scipy/pull/11119 > > Exact p-values of the Wilcoxon test: > https://github.com/scipy/scipy/pull/10796 > > Thanks > > Christoph > From wsw.raczek at gmail.com Mon Mar 2 04:40:08 2020 From: wsw.raczek at gmail.com (Władysław Raczek) Date: Mon, 2 Mar 2020 10:40:08 +0100 Subject: [SciPy-Dev] What for the new contributor? Message-ID: Hi everybody, I'm a third-year student of Theoretical Computer Science. This year we have to contribute to an open-source project, and I've chosen SciPy because it seems to be interesting, and it also seems to have a lot of things to do.
I've looked rather briefly through the list of issues, and I've come to think that each issue is either a documentation issue, or some pretty complicated stuff. Maybe someone here has any thoughts on how a new person in the project (like me) can contribute, e.g. You have some issues in mind, or You yourself opened some issue You consider suitable for a new contributor, and it didn't get a PR yet, or smth. else :) Would be truly grateful for help! Regards, Vladyslav Rachek P.S. This isn't about being afraid of doing "complicated stuff" but rather about help with suggesting about how to get to that "stuff" from beginner's level, doing useful things on the way :) -------------- next part -------------- An HTML attachment was scrubbed... URL: From rlucas7 at vt.edu Tue Mar 3 17:01:45 2020 From: rlucas7 at vt.edu (rlucas7 at vt.edu) Date: Tue, 3 Mar 2020 17:01:45 -0500 Subject: [SciPy-Dev] What for the new contributor? In-Reply-To: References: Message-ID: <4081F1EA-E440-4D81-9C45-3127CF21DADB@vt.edu> > > On Mar 2, 2020, at 4:40 AM, Władysław Raczek wrote: > > Hi everybody, > Welcome. > I'm a third-year student of Theoretical Computer Science. This year we have to contribute to an open-source project, and I've chosen SciPy because it seems to be interesting, and it also seems to have a lot of things to do. I've looked rather briefly through the list of issues, and I've come to think that each issue is either a documentation issue, or some pretty complicated stuff. Did you search for the issues with the tag "good first issue"? Those are usually a good place to start. If you aren't super familiar with github open source dev work, a doc change is a good first PR to get the feel for the workflow without muddying the water with other stuff (besides the docstring changes). > Maybe someone here has any thoughts on how a new person in the project (like me) can contribute, e.g.
You have some issues in mind, or You yourself opened some issue You consider suitable for a new contributor, and it didn't get a PR yet, or smth. else :) SciPy has a bunch of different subpackages, many of the regulars here are more familiar with specific packages than others. It might help if you identify a specific subpackage, e.g. are you more familiar/comfortable with optimization, or with special functions, or statistics, etc. Of course some folks have a broader scope and may be able to help on the overall SciPy but they are fewer. > Would be truly grateful for help! We are grateful for your enthusiasm and future efforts in SciPy. > > Regards, > Vladyslav Rachek > > P.S. This isn't about being afraid of doing "complicated stuff" but rather about help with suggesting about how to get to that "stuff" from beginner's level, doing useful things on the way :) > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: From larson.eric.d at gmail.com Thu Mar 5 15:13:21 2020 From: larson.eric.d at gmail.com (Eric Larson) Date: Thu, 5 Mar 2020 15:13:21 -0500 Subject: [SciPy-Dev] SciPy Contribution: Gammatone Filters in scipy.signal In-Reply-To: References: Message-ID: Seems like a reasonable addition to me. At first I thought it might be a little bit specialized (despite having worked with them a little bit myself). However, there are a reasonable number of Google hits in a search, and it appears to be a part of the MATLAB "audio" toolbox, suggesting it probably does have sufficiently wide-ranging uses to warrant inclusion in SciPy. My 2c, Eric On Fri, Feb 28, 2020 at 10:12 AM Todd wrote: > I would be extremely interested in having gammatone filters implemented. > I have been wanting it for a long time, but haven't gotten around to doing > it myself yet.
But I don't speak for the scipy core developers, they would > need to weigh in on this. > > On Thu, Feb 27, 2020 at 11:16 AM Shashaank Narayanan < > shashaank.n at columbia.edu> wrote: > >> Hi, >> >> Newbie here. I just wanted to follow up on the Gammatone filters for >> signal processing contribution idea that I posted recently. I would also be >> interested in contributing to SciPy by doing maintenance/bug fixes for the >> scipy.signal module. >> >> >> Thanks, >> Shashaank >> >> On Thu, Jan 30, 2020 at 5:38 PM Shashaank Narayanan < >> shashaank.n at columbia.edu> wrote: >> >>> Hello SciPy Team, >>> >>> I am new to this mailing list, and I am interested in contributing to >>> SciPy. I would like to suggest a new feature to be added to the >>> scipy.signal module: gammatone filters. Gammatone filters are becoming >>> increasingly popular in the fields of digital signal processing and music >>> analysis as it effectively models the auditory filters of the human >>> auditory system. Currently, there are very few implementations of gammatone >>> filters available for Python, and these implementations are not generalized >>> to basic finite impulse response (FIR) and infinite impulse response (IIR) >>> filters like SciPy has. >>> >>> I have written my own gammatone FIR filter using NumPy based on Malcolm >>> Slaney's 1993 paper on the topic ( >>> https://engineering.purdue.edu/~malcolm/apple/tr35/PattersonsEar.pdf). >>> This paper was used for Matlab's implementation of gammatone filters. I am >>> in the process of writing a gammatone IIR filter with NumPy and SciPy. >>> Please let me know if this feature will fit with the scipy.signal module. >>> Appreciate your time and guidance. 
>>> >>> >>> Thanks, >>> Shashaank >>> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at python.org >> https://mail.python.org/mailman/listinfo/scipy-dev >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shashaank.n at columbia.edu Fri Mar 6 14:38:45 2020 From: shashaank.n at columbia.edu (Shashaank Narayanan) Date: Fri, 6 Mar 2020 14:38:45 -0500 Subject: [SciPy-Dev] SciPy Contribution: Gammatone Filters in scipy.signal In-Reply-To: References: Message-ID: I've been looking into the new code contribution process on the SciPy website. If someone gives me the green light and is willing to review my code, I can begin creating the development environment on my machine and creating unit tests/benchmarks for the new functions. I hope that these gammatone filters can be included in SciPy, because they have a wide range of applications from neural auditory science to signal processing, and there is currently no efficient implementation for Python so far. - Shashaank On Thu, Mar 5, 2020 at 3:13 PM Eric Larson wrote: > Seems like a reasonable addition to me. At first I thought it might be a > little bit specialized (despite having worked with them a little bit > myself). However, there are a reasonable number of Google hits in a search, > and it appears to be a part of the MATLAB "audio" toolbox, suggesting it > probably does have sufficiently wide-ranging uses to warrant inclusion in > SciPy. > > My 2c, > Eric > > > On Fri, Feb 28, 2020 at 10:12 AM Todd wrote: > >> I would be extremely interested in having gammatone filters implemented. >> I have been wanting it for a long time, but haven't gotten around to >> doing it myself yet. But I don't speak for the scipy core developers, they >> would need to weigh in on this. 
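For readers following along, the gammatone impulse response under discussion can be sketched in a few lines of NumPy. The function name, the peak normalization, and the Glasberg-Moore ERB bandwidth choice below are assumptions of this sketch, not Slaney's reference implementation and not a scipy.signal API:

```python
import numpy as np

def gammatone_ir(fc, fs, duration=0.05, order=4):
    """Sample a gammatone impulse response:
    g(t) = t**(order-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t),
    with the bandwidth b tied to the equivalent rectangular
    bandwidth (ERB) of the auditory filter at center frequency fc."""
    n = int(round(duration * fs))
    t = np.arange(n) / fs
    erb = 24.7 + fc / 9.265        # Glasberg & Moore ERB approximation
    b = 1.019 * erb                # common bandwidth scaling factor
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.abs(g).max()     # peak-normalize for plotting/comparison

ir = gammatone_ir(fc=1000.0, fs=16000.0)  # 50 ms response at 1 kHz
```

An FIR gammatone filter is then just a convolution with `ir`; the IIR variant Shashaank mentions factors this response into cascaded second-order sections instead.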
>> >> On Thu, Feb 27, 2020 at 11:16 AM Shashaank Narayanan < >> shashaank.n at columbia.edu> wrote: >> >>> Hi, >>> >>> Newbie here. I just wanted to follow up on the Gammatone filters for >>> signal processing contribution idea that I posted recently. I would also be >>> interested in contributing to SciPy by doing maintenance/bug fixes for the >>> scipy.signal module. >>> >>> >>> Thanks, >>> Shashaank >>> >>> On Thu, Jan 30, 2020 at 5:38 PM Shashaank Narayanan < >>> shashaank.n at columbia.edu> wrote: >>> >>>> Hello SciPy Team, >>>> >>>> I am new to this mailing list, and I am interested in contributing to >>>> SciPy. I would like to suggest a new feature to be added to the >>>> scipy.signal module: gammatone filters. Gammatone filters are becoming >>>> increasingly popular in the fields of digital signal processing and music >>>> analysis as it effectively models the auditory filters of the human >>>> auditory system. Currently, there are very few implementations of gammatone >>>> filters available for Python, and these implementations are not generalized >>>> to basic finite impulse response (FIR) and infinite impulse response (IIR) >>>> filters like SciPy has. >>>> >>>> I have written my own gammatone FIR filter using NumPy based on Malcolm >>>> Slaney's 1993 paper on the topic ( >>>> https://engineering.purdue.edu/~malcolm/apple/tr35/PattersonsEar.pdf). >>>> This paper was used for Matlab's implementation of gammatone filters. I am >>>> in the process of writing a gammatone IIR filter with NumPy and SciPy. >>>> Please let me know if this feature will fit with the scipy.signal module. >>>> Appreciate your time and guidance. 
>>>> >>>> >>>> Thanks, >>>> Shashaank >>>> >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at python.org >>> https://mail.python.org/mailman/listinfo/scipy-dev >>> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at python.org >> https://mail.python.org/mailman/listinfo/scipy-dev >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sat Mar 7 02:36:43 2020 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 7 Mar 2020 08:36:43 +0100 Subject: [SciPy-Dev] SciPy Contribution: Gammatone Filters in scipy.signal In-Reply-To: References: Message-ID: On Fri, Mar 6, 2020 at 9:36 PM Shashaank Narayanan wrote: > I've been looking into the new code contribution process on the SciPy > website. If someone gives me the green light and is willing to review my > code, I can begin creating the development environment on my machine and > creating unit tests/benchmarks for the new functions. I hope that these > gammatone filters can be included in SciPy, because they have a wide range > of applications from neural auditory science to signal processing, and > there is currently no efficient implementation for Python so far. > Hi Shashaank, Eric and Todd are both interested and no one has brought up a concern, so let's call it good - please go for it! Cheers, Ralf > - Shashaank > > On Thu, Mar 5, 2020 at 3:13 PM Eric Larson > wrote: > >> Seems like a reasonable addition to me. At first I thought it might be a >> little bit specialized (despite having worked with them a little bit >> myself). 
However, there are a reasonable number of Google hits in a search, >> and it appears to be a part of the MATLAB "audio" toolbox, suggesting it >> probably does have sufficiently wide-ranging uses to warrant inclusion in >> SciPy. >> >> My 2c, >> Eric >> >> >> On Fri, Feb 28, 2020 at 10:12 AM Todd wrote: >> >>> I would be extremely interested in having gammatone filters implemented. >>> I have been wanting it for a long time, but haven't gotten around to >>> doing it myself yet. But I don't speak for the scipy core developers, they >>> would need to weigh in on this. >>> >>> On Thu, Feb 27, 2020 at 11:16 AM Shashaank Narayanan < >>> shashaank.n at columbia.edu> wrote: >>> >>>> Hi, >>>> >>>> Newbie here. I just wanted to follow up on the Gammatone filters for >>>> signal processing contribution idea that I posted recently. I would also be >>>> interested in contributing to SciPy by doing maintenance/bug fixes for the >>>> scipy.signal module. >>>> >>>> >>>> Thanks, >>>> Shashaank >>>> >>>> On Thu, Jan 30, 2020 at 5:38 PM Shashaank Narayanan < >>>> shashaank.n at columbia.edu> wrote: >>>> >>>>> Hello SciPy Team, >>>>> >>>>> I am new to this mailing list, and I am interested in contributing to >>>>> SciPy. I would like to suggest a new feature to be added to the >>>>> scipy.signal module: gammatone filters. Gammatone filters are becoming >>>>> increasingly popular in the fields of digital signal processing and music >>>>> analysis as it effectively models the auditory filters of the human >>>>> auditory system. Currently, there are very few implementations of gammatone >>>>> filters available for Python, and these implementations are not generalized >>>>> to basic finite impulse response (FIR) and infinite impulse response (IIR) >>>>> filters like SciPy has. >>>>> >>>>> I have written my own gammatone FIR filter using NumPy based on >>>>> Malcolm Slaney's 1993 paper on the topic ( >>>>> https://engineering.purdue.edu/~malcolm/apple/tr35/PattersonsEar.pdf). 
>>>>> This paper was used for Matlab's implementation of gammatone filters. I am >>>>> in the process of writing a gammatone IIR filter with NumPy and SciPy. >>>>> Please let me know if this feature will fit with the scipy.signal module. >>>>> Appreciate your time and guidance. >>>>> >>>>> >>>>> Thanks, >>>>> Shashaank >>>>> >>>> _______________________________________________ >>>> SciPy-Dev mailing list >>>> SciPy-Dev at python.org >>>> https://mail.python.org/mailman/listinfo/scipy-dev >>>> >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at python.org >>> https://mail.python.org/mailman/listinfo/scipy-dev >>> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at python.org >> https://mail.python.org/mailman/listinfo/scipy-dev >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sun Mar 8 13:02:24 2020 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 8 Mar 2020 18:02:24 +0100 Subject: [SciPy-Dev] changing default for median_absolute_deviation In-Reply-To: References: Message-ID: On Mon, Feb 17, 2020 at 3:37 AM Lucas Roberts wrote: > Hi scipy-dev, > > Looking for guidance if the change warrants ignoring deprecation policy or > if I should create a deprecation warning for existing behavior to change > defaults in median_absolute_deviation() > Hi Lucas, I think you can't do either of those things. Ignoring deprecation policy should only be done if something is a clear bug. In the absence of that, there's no good reason to silently (or even with a warning) return a different numerical value. Here, the choice seems to be between leaving as is or deprecating the function and introducing it with a different name. 
> PR: > https://github.com/scipy/scipy/pull/11431 > > [CONTEXT] > In this PR: > https://github.com/scipy/scipy/pull/9637 > the median_absolute_deviation() function was added to scipy.stats > the function takes in a scale argument that allows for a robust estimator > of the scale (robust to outliers). The choice of scale constant assuming > normal input data seemed reasonable to me at the time. > However, in this issue: > https://github.com/scipy/scipy/issues/11090 > a few thought otherwise and we came to the conclusion that the default > scale should be 1 which would also ensure internal consistency with the > stats.iqr() function (similar signature and functionality). > > I've opened a PR to change the defaults here: > https://github.com/scipy/scipy/pull/11431 > and pinged those on the 11090 issue. > > [ON DEPRECATION]: > > [REASON FOR] > The main reasons to not follow deprecation policy are: > 1. Following deprecation policy would maintain existing confusing behavior > for up to 1 year from now. > 2. Function released in 1.3.0 so somewhat new > 3. Seems the default is confusing users and is non-obvious and > inconsistent with iqr() default > (3) is the main reason to do something here probably, but skipping a deprecation warning still isn't warranted. > [REASON NOT] > 1. Some users may depend on existing default > 2. Defaults with normal scaling exist in several places (cf. 11090 issue > comments) > 3. Deprecation warning needs to be done to give fair warning of the > change. > > [COMMENTS] > I haven't much experience here with deprecations, when to do it vs > consider a defect, so I would appreciate any guidance. > Defect = incorrect behavior (rather than confusing behavior). > If we think the deprecation cycle should follow, should I leave the PR and > open a separate deprecation PR? > https://github.com/scipy/scipy/pull/11431 > Alternately I could revert the changes and convert it to a deprecation warning PR. > I'll comment on PR and issue in more detail.
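The two defaults under discussion differ only by a multiplicative constant; a small NumPy illustration (the value 1.4826 is the usual approximation of 1/Phi^-1(3/4), the factor that makes the MAD consistent with the standard deviation for normal data):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])   # data with one gross outlier

# Unscaled MAD -- the proposed default (scale=1), consistent with stats.iqr():
mad = np.median(np.abs(x - np.median(x)))

# Normal-consistent MAD -- the 1.3.0 default, rescaled so that for
# Gaussian data the result estimates the standard deviation:
mad_normal = 1.4826 * mad
```

Here `mad` is 1.0 and `mad_normal` is 1.4826; the outlier barely moves either value, unlike `np.std(x)`.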
Cheers, Ralf > Thanks in advance. > -- > -Lucas > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tyler.je.reddy at gmail.com Mon Mar 9 17:31:09 2020 From: tyler.je.reddy at gmail.com (Tyler Reddy) Date: Mon, 9 Mar 2020 15:31:09 -0600 Subject: [SciPy-Dev] Welcome Peter Larson to Core Team! Message-ID: I'm pleased to announce the addition of Peter Larson ( https://github.com/pmla ) to the SciPy Core Developer team! Welcome! Best wishes, Tyler -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefanv at berkeley.edu Mon Mar 9 17:33:18 2020 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Mon, 09 Mar 2020 14:33:18 -0700 Subject: [SciPy-Dev] Welcome Peter Larson to Core Team! In-Reply-To: References: Message-ID: On Mon, Mar 9, 2020, at 14:31, Tyler Reddy wrote: > I'm pleased to announce the addition of Peter Larson ( https://github.com/pmla ) to the SciPy Core Developer team! Welcome! Welcome, Peter, it's good to have you on board! Stéfan -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Mon Mar 9 18:00:28 2020 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 9 Mar 2020 23:00:28 +0100 Subject: [SciPy-Dev] Welcome Peter Larson to Core Team! In-Reply-To: References: Message-ID: On Mon, Mar 9, 2020 at 10:33 PM Stefan van der Walt wrote: > On Mon, Mar 9, 2020, at 14:31, Tyler Reddy wrote: > > I'm pleased to announce the addition of Peter Larson ( > https://github.com/pmla ) to the SciPy Core Developer team! Welcome! > > > Welcome, Peter, it's good to have you on board! > +1, welcome Peter! Peter's improvements to scipy.spatial ( https://github.com/scipy/scipy/pulls/pmla) over the past 9 months have been awesome!
Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at gmail.com Mon Mar 9 20:14:17 2020 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Mon, 9 Mar 2020 20:14:17 -0400 Subject: [SciPy-Dev] Welcome Peter Larson to Core Team! In-Reply-To: References: Message-ID: On 3/9/20, Tyler Reddy wrote: > I'm pleased to announce the addition of Peter Larson ( > https://github.com/pmla ) to the SciPy Core Developer team! Welcome! > Welcome, Peter. Thanks for all the great work done so far. Looking forward to more! Warren > Best wishes, > Tyler > From charlesr.harris at gmail.com Mon Mar 9 21:15:40 2020 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 9 Mar 2020 19:15:40 -0600 Subject: [SciPy-Dev] Welcome Peter Larson to Core Team! In-Reply-To: References: Message-ID: On Mon, Mar 9, 2020 at 3:32 PM Tyler Reddy wrote: > I'm pleased to announce the addition of Peter Larson ( > https://github.com/pmla ) to the SciPy Core Developer team! Welcome! > > Best wishes, > Tyler > Welcome Peter. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From evgeny.burovskiy at gmail.com Tue Mar 10 00:31:00 2020 From: evgeny.burovskiy at gmail.com (Evgeni Burovski) Date: Tue, 10 Mar 2020 07:31:00 +0300 Subject: [SciPy-Dev] Welcome Peter Larson to Core Team! In-Reply-To: References: Message-ID: Welcome Peter! Cheers, Evgeni Tue, 10 Mar 2020, 4:16 Charles R Harris : > > > On Mon, Mar 9, 2020 at 3:32 PM Tyler Reddy > wrote: > >> I'm pleased to announce the addition of Peter Larson ( >> https://github.com/pmla ) to the SciPy Core Developer team! Welcome! >> >> Best wishes, >> Tyler >> > > Welcome Peter. > > Chuck > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From ilhanpolat at gmail.com Tue Mar 10 09:22:03 2020 From: ilhanpolat at gmail.com (Ilhan Polat) Date: Tue, 10 Mar 2020 14:22:03 +0100 Subject: [SciPy-Dev] Welcome Peter Larson to Core Team! In-Reply-To: References: Message-ID: Welcome to the team Peter! On Tue, Mar 10, 2020, 05:32 Evgeni Burovski wrote: > Welcome Peter! > > Cheers, > Evgeni > > Tue, 10 Mar 2020, 4:16 Charles R Harris : > >> >> >> On Mon, Mar 9, 2020 at 3:32 PM Tyler Reddy >> wrote: >> >>> I'm pleased to announce the addition of Peter Larson ( >>> https://github.com/pmla ) to the SciPy Core Developer team! Welcome! >>> >>> Best wishes, >>> Tyler >>> >> >> Welcome Peter. >> >> Chuck >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at python.org >> https://mail.python.org/mailman/listinfo/scipy-dev >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lorentzen.ch at gmail.com Wed Mar 11 16:34:39 2020 From: lorentzen.ch at gmail.com (Christian Lorentzen) Date: Wed, 11 Mar 2020 21:34:39 +0100 Subject: [SciPy-Dev] Tweedie distributions in scipy.stats Message-ID: <4ef6c60d-7d2f-199c-30c0-7106e6761ee5@googlemail.com> Dear Scipy Developers and mailing list Readers I'd like to address the issue [1] to implement Tweedie distributions [2] in scipy.stats. Purpose The family of Tweedie distributions contains many known distributions like the Poisson and the Gamma distribution, but also distributions between them, aka the compound Poisson-Gamma distribution, see [3]. These are often appropriate for insurance claims and other fields, where one has a (Poisson) random count process of events and every event has a (Gamma) random size/amount. The distribution would enable simulations, maximum likelihood estimation of all parameters, choice and visualization of distributions, etc.
Implementation I started PR [4] for Wright's generalized Bessel functions as a private function in scipy.special. Once this is ready, the pdf follows immediately. For the range of interest of Y ~ compound Poisson-Gamma distribution, the distribution of Y has a point mass at zero and is otherwise continuous for Y>0. As already discussed in the issue [1], Tweedie might best fit as `rv_generic`. As such, it would be the first one, all others are either `rv_discrete` or `rv_continuous`. Without a template, I would need guidance on how to implement a new rv_generic. References: [1] https://github.com/scipy/scipy/issues/11291 [2] https://en.wikipedia.org/wiki/Tweedie_distribution [3] https://en.wikipedia.org/wiki/Compound_Poisson_distribution [4] https://github.com/scipy/scipy/pull/11313 I'm looking forward to your feedback, thoughts and insights. Kind regards, Christian -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Wed Mar 11 17:01:46 2020 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 11 Mar 2020 16:01:46 -0500 Subject: [SciPy-Dev] Tweedie distributions in scipy.stats In-Reply-To: <4ef6c60d-7d2f-199c-30c0-7106e6761ee5@googlemail.com> References: <4ef6c60d-7d2f-199c-30c0-7106e6761ee5@googlemail.com> Message-ID: On Wed, Mar 11, 2020 at 3:35 PM Christian Lorentzen wrote: > Dear Scipy Developers and mailing list Readers > > I'd like to address the issue [1] to implement Tweedie distributions [2] > in scipy.stats. > > Purpose > The family of Tweedie distributions contains many known distributions like > the Poisson and the Gamma distribution, but also distributions between > them, aka the compound Poisson-Gamma distribution, see [3]. These are often > appropriate for insurance claims and other fields, where one has a > (Poisson) random count process of events and every event has a (Gamma) > random size/amount.
> The distribution would enable simulations, maximum likelihood estimation > of all parameters, choice and visualization of distributions, etc. > > Implementation > I started PR [4] for Wright's generalized Bessel functions as a private > function in scipy.special. > Once this is ready, the pdf follows immediately. > For the range of interest of Y ~ compound Poisson-Gamma distribution, the > distribution of Y has a point mass at zero and is otherwise continuous for > Y>0. > As already discussed in the issue [1], Tweedie might best fit as > rv_generic. > As such, it would be the first one, all others are either rv_discrete or > rv_continuous. > Without a template, I would need guidance how to implement a new > rv_generic. > FWIW, `rv_generic` isn't really intended to be a concrete class. It was only intended to be a base class implementing the common parts needed by `rv_continuous` and `rv_discrete`. Nothing "fits into" `rv_generic`, per se. The Tweedie distributions, for some parameters at least, may not fit into the `scipy.stats` infrastructure at all. We have no infrastructure for continuous-with-point-mass distributions. `rv_generic` is still built under the assumption that it's going to be implementing *either* a continuous *or* a discrete distribution. I recommend implementing the functionality that you need outside of scipy following whatever API solves your problems best. Then we can evaluate if there is infrastructure that can be built that would help the second continuous-with-point-mass distribution that we might want next. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed...
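For reference, the Wright generalized Bessel function behind PR [4] is the series Phi(a, b, x) = sum_k x**k / (k! * Gamma(a*k + b)). A naive truncated sum shows the shape of the computation; this is illustrative only (the helper name is made up, and the PR implements a numerically careful version with asymptotics for large arguments):

```python
import math

def wright_bessel_naive(a, b, x, terms=60):
    """Truncated series for Phi(a, b, x) = sum_k x**k / (k! * Gamma(a*k + b)).
    Fine for small x; a real implementation needs care for large x."""
    return sum(x ** k / (math.factorial(k) * math.gamma(a * k + b))
               for k in range(terms))

# Sanity check: for a = 0 the series collapses to exp(x) / Gamma(b),
# so Phi(0, 1, 1) should equal e.
val = wright_bessel_naive(0.0, 1.0, 1.0)
```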
URL: From josef.pktd at gmail.com Wed Mar 11 17:41:52 2020 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 11 Mar 2020 17:41:52 -0400 Subject: [SciPy-Dev] Tweedie distributions in scipy.stats In-Reply-To: References: <4ef6c60d-7d2f-199c-30c0-7106e6761ee5@googlemail.com> Message-ID: On Wed, Mar 11, 2020 at 5:02 PM Robert Kern wrote: > On Wed, Mar 11, 2020 at 3:35 PM Christian Lorentzen < > lorentzen.ch at gmail.com> wrote: > >> Dear Scipy Developers and mailing list Readers >> >> I'd like to address the issue [1] to implement Tweedie distributions [2] >> in scipy.stats. >> >> Purpose >> The family of Tweedie distributions contains many known distributions >> like the Poisson and the Gamma distribution, but also distributions between >> them, aka the compound Poisson-Gamma distribution, see [3]. These are often >> appropriate for insurance claims and other fields, where one has a >> (Poisson) random count process of events and every event has a (Gamma) >> random size/amount. >> The distribution would enable simulations, maximum likelihood estimation >> of all parameters, choice and visualization of distributions, etc. >> >> Implementation >> I started PR [4] for Wright's generalized Bessel functions as a private >> function in scipy.special. >> Once this is ready, the pdf follows immediately. >> For the range of interest of Y ~ compound Poisson-Gamma distribution, the >> distribution of Y has a point mass at zero and is otherwise continuous for >> Y>0. >> As already discussed in the issue [1], Tweedie might best fit as >> rv_generic. >> As such, it would be the first one, all others are either rv_discrete or >> rv_continuous. >> Without a template, I would need guidance how to implement a new >> rv_generic. >> > FWIW, `rv_generic` isn't really intended to be a concrete class. It was > only intended to be a base class implementing the common parts needed by > `rv_continuous` and `rv_discrete`. Nothing "fits into" `rv_generic`, per > se.
The Tweedie distributions, for some parameters at least, may not fit > into the `scipy.stats` infrastructure at all. We have no infrastructure for > continuous-with-point-mass distributions. `rv_generic` is still built under > the assumption that it's going to be implementing *either* a continuous > *or* a discrete distribution. > > I recommend implementing the functionality that you need outside of scipy > following whatever API solves your problems best. Then we can evaluate if > there is infrastructure that can be built that would help the second > continuous-with-point-mass distribution that we might want next. > a long long time ago, I started a ParametricMixture model for this https://github.com/statsmodels/statsmodels/blob/master/statsmodels/sandbox/distributions/otherdist.py (until I gave up on distributions) An alternative as temporary solution would be to add some methods/functions like logpdf to scipy.stats, so statsmodels and sklearn can reuse those. Josef > > -- > Robert Kern > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lorentzen.ch at gmail.com Fri Mar 13 11:16:20 2020 From: lorentzen.ch at gmail.com (Christian Lorentzen) Date: Fri, 13 Mar 2020 16:16:20 +0100 Subject: [SciPy-Dev] Tweedie distributions in scipy.stats In-Reply-To: References: <4ef6c60d-7d2f-199c-30c0-7106e6761ee5@googlemail.com> Message-ID: Thank you for your feedback. If it were possible to mix distributions together, e.g. rv1 + rv2, the compound poisson could be represented. AFAIK, PyMC3 supports that with "Mixture" distributions [1]. So I see three options: 1. Implement only a log likelihood function as Josef suggests. In which package to put it? Scipy, Statsmodels, PyMC3? 2. Ask PyMC3 developers for this distribution. 3. Ask Statsmodel developers for this distribution. 
@Josef: I hereby ask you;-) The tricky part: I intend to calculate the log likelihood via Wright's generalized Bessel function, and PR [2] implements this as a private special function. Can that function stay in scipy, or should it move to the other packages in that case? [1] https://docs.pymc.io/api/distributions/mixture.html [2] https://github.com/scipy/scipy/pull/11313 Kind regards Christian On 11.03.20 22:41, josef.pktd at gmail.com wrote: > > > On Wed, Mar 11, 2020 at 5:02 PM Robert Kern > wrote: > > On Wed, Mar 11, 2020 at 3:35 PM Christian Lorentzen > > wrote: > > Dear Scipy Developers and mailing list Readers > > I'd like to address the issue [1] to implement Tweedie > distributions [2] in scipy.stats. > > Purpose > The family of Tweedie distributions contains many known > distributions like the Poisson and the Gamma distribution, but > also distributions between them, aka compound poisson gamma > distribution, see [3]. These are often appropriate for > insurance claims and other fields, where one has a (Poisson) > random count process of events and every event has a (Gamma) > random size/amount. > The distribution would enable simulations, maximum likelihood > estimation of all parameters, choice and visualization of > distributions, etc. > > Implementation > I started PR [4] for Wright's generalized Bessel functions as a > private function in scipy.special. > Once this is ready, the pdf follows immediately. > For the range of interest of Y ~ compound poisson gamma > distribution, the distribution of Y has a point mass at zero > and is otherwise continuous for > Y>0. > As already discussed in the issue [1], Tweedie might best fit > as |rv_generic|. > As such, it would be the first one, all others are either > |rv_discrete| or |rv_continuous|. > Without a template, I would need guidance how to implement a > new rv_generic. > > FWIW, `rv_generic` isn't really intended to be a concrete class.
> It was only intended to be a base class implementing the common > parts needed by `rv_continuous` and `rv_discrete`. Nothing "fits > into" `rv_generic`, per se. The Tweedie distributions, for some > parameters at least, may not fit into the `scipy.stats` > infrastructure at all. We have no infrastructure for > continuous-with-point-mass distributions. `rv_generic` is still > built under the assumption that it's going to be implementing > /either/?a continuous /or/?a discrete distribution. > > I recommend implementing the functionality that you need outside > of scipy following whatever API solves your problems best. Then we > can evaluate if there is infrastructure that can be built that > would help the second continuous-with-point-mass distribution that > we might want next. > > > a long long time ago, I started a ParametricMixture model for this > https://github.com/statsmodels/statsmodels/blob/master/statsmodels/sandbox/distributions/otherdist.py > > (until I gave up on distributions) > > An alternative as temporary solution would be to add some > methods/functions like logpdf to scipy.stats, so statsmodels and > sklearn can reuse those. > > Josef > > -- > Robert Kern > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri Mar 13 11:46:49 2020 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 13 Mar 2020 11:46:49 -0400 Subject: [SciPy-Dev] Tweedie distributions in scipy.stats In-Reply-To: References: <4ef6c60d-7d2f-199c-30c0-7106e6761ee5@googlemail.com> Message-ID: On Fri, Mar 13, 2020 at 11:16 AM Christian Lorentzen wrote: > Thank you for your feedback. 
> > If it were possible to mix distributions together, e.g. rv1 + rv2, the > compound poisson could be represented. AFAIK, PyMC3 supports that with > "Mixture" distributions [1]. So I see three options: > > 1. Implement only a log likelihood function as Josef suggests. In > which package to put it? Scipy, Statsmodels, PyMC3? > 2. Ask PyMC3 developers for this distribution. > 3. Ask Statsmodels developers for this distribution. > @Josef: I hereby ask you;-) > > The tricky part: As I intend to calculate the log likelihood via Wright's > generalized Bessel function and PR [2] implements this as a private special > function, can this function stay in scipy or should it move to the other > packages in that case. > > [1] https://docs.pymc.io/api/distributions/mixture.html > [2] https://github.com/scipy/scipy/pull/11313 > IMO, scipy would be a good central location for the logpdf and the associated Wright function. All packages like sklearn and statsmodels depend on scipy but not on each other. If Wright in special works out, then adding just the logpdf would be a good short-term solution. Adding a new base distribution class, e.g. for distribution mixtures, will be a lot more work and won't happen fast. Because statsmodels currently has only an approximate Tweedie logpdf, we would add any improved version if it doesn't go into scipy. We can also add it to compat.scipy until it is in all scipy versions that we support. scipy.stats now has multivariate distributions that don't fit into the old univariate distribution setup. Similarly, I think it would be possible to add other distributions that don't fit into the existing class hierarchy. That would be an intermediate step without adding full support for generic mixture distributions, or discrete-continuous combinations.
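Josef's short-term suggestion of a plain logpdf can be prototyped outside scipy today from the compound Poisson-gamma series for 1 < p < 2; the Wright-Bessel route of PR [2] would replace the explicit truncation below. `tweedie_logpdf` and its `n_max` truncation are a hypothetical sketch, not an existing scipy function:

```python
import numpy as np
from scipy import stats
from scipy.special import logsumexp

def tweedie_logpdf(y, mu, phi, p, n_max=200):
    """Log density of Tweedie(mu, phi, p), 1 < p < 2, by truncating the Poisson-gamma series."""
    lam = mu**(2.0 - p) / (phi * (2.0 - p))
    alpha = (2.0 - p) / (p - 1.0)
    scale = phi * (p - 1.0) * mu**(p - 1.0)
    y = np.atleast_1d(np.asarray(y, dtype=float))
    out = np.full(y.shape, -np.inf)
    out[y == 0] = -lam                     # log of the point mass exp(-lam) at zero
    pos = y > 0
    n = np.arange(1, n_max + 1)
    # f(y) = sum_n Poisson(n; lam) * Gamma(y; n*alpha, scale) for y > 0
    terms = (stats.poisson.logpmf(n, lam)
             + stats.gamma.logpdf(y[pos][:, None], n * alpha, scale=scale))
    out[pos] = logsumexp(terms, axis=1)
    return out

# sanity check: the continuous part plus the atom should integrate to one
grid = np.linspace(1e-6, 30.0, 30_001)
dens = np.exp(tweedie_logpdf(grid, mu=1.0, phi=1.0, p=1.5))
total = dens.sum() * (grid[1] - grid[0]) + np.exp(-2.0)
```

The truncation at `n_max` is exactly the upper-tail issue raised later in this thread; for moderate lam it is harmless, but a scipy-quality implementation would need a principled error bound.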
Josef Josef > > > Kind regards > Christian > On 11.03.20 22:41, josef.pktd at gmail.com wrote: > > > > On Wed, Mar 11, 2020 at 5:02 PM Robert Kern wrote: > >> On Wed, Mar 11, 2020 at 3:35 PM Christian Lorentzen < >> lorentzen.ch at gmail.com> wrote: >> >>> Dear Scipy Developers and mailing list Readers >>> >>> I'd like to address the issue [1] to implement Tweedie distributions [2] >>> in scipy.stats. >>> >>> Purpose >>> The family of Tweedie distributions contains many known distributions >>> like the Poisson and the Gamma distribution, but also distributions between >>> them, aka compound poisson gamma distribution, see [3]. These are often >>> appropriate for insurance claims and other fields, where one has a >>> (Poisson) random count process of events and every event has a (Gamma) >>> random size/amount. >>> The distribution would enable simulations, maximum likelihood estimation >>> of all parameters, choice and visualization of distributions, etc. >>> >>> Implementation >>> I started PR [4] for Wrights generalized Bessel functions as a private >>> function in scipy.special. >>> Once this is ready, the pdf follows immediately. >>> For the range of interest of Y ~ compound poisson gamma distribution, >>> the distribution of Y has a point mass at zero and is otherwise continuous >>> for Y>0. >>> As already discussed in the issue [1], Tweedie might best fit as >>> rv_generic. >>> As such, it would be the first one, all others are either rv_discrete >>> or rv_continuous. >>> Without a template, I would need guidance how to implement a new >>> rv_generic. >>> >> FWIW, `rv_generic` isn't really intended to be a concrete class. It was >> only intended to be a base class implementing the common parts needed by >> `rv_continuous` and `rv_discrete`. Nothing "fits into" `rv_generic`, per >> se. The Tweedie distributions, for some parameters at least, may not fit >> into the `scipy.stats` infrastructure at all. 
We have no infrastructure for >> continuous-with-point-mass distributions. `rv_generic` is still built under >> the assumption that it's going to be implementing *either* a continuous >> *or* a discrete distribution. >> >> I recommend implementing the functionality that you need outside of scipy >> following whatever API solves your problems best. Then we can evaluate if >> there is infrastructure that can be built that would help the second >> continuous-with-point-mass distribution that we might want next. >> > > a long long time ago, I started a ParametricMixture model for this > > https://github.com/statsmodels/statsmodels/blob/master/statsmodels/sandbox/distributions/otherdist.py > > (until I gave up on distributions) > > An alternative as temporary solution would be to add some > methods/functions like logpdf to scipy.stats, so statsmodels and sklearn > can reuse those. > > Josef > > > >> >> -- >> Robert Kern >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at python.org >> https://mail.python.org/mailman/listinfo/scipy-dev >> > > _______________________________________________ > SciPy-Dev mailing listSciPy-Dev at python.orghttps://mail.python.org/mailman/listinfo/scipy-dev > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri Mar 13 12:04:11 2020 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 13 Mar 2020 12:04:11 -0400 Subject: [SciPy-Dev] Tweedie distributions in scipy.stats In-Reply-To: References: <4ef6c60d-7d2f-199c-30c0-7106e6761ee5@googlemail.com> Message-ID: Aside: compound Poisson is a convolution of distributions and not a finite mixture. Allowing an infinite mixing distribution like Poisson creates numerical problems in the upper tail that are not easy to solve in general. 
In most cases, computation have to be truncated at the upper tail, but then the problem is to figure out the truncation threshold for a required precision. My guess is that this would be a lot of work to get it to scipy standards. I was looking at the general case for convolution and compound poisson a long time ago, mainly using fft to get the pdf and cdf from the characteristic function,, the cf is relatively simple to compute for convolutions. The references in extreme value and risk applications that I looked at, was emphasizing tail precision and ways how to work around it, or comparing different methods in how precise they are. fft was fast, but I only eyeballed the truncation threshold for my examples. Josef On Fri, Mar 13, 2020 at 11:46 AM wrote: > > > On Fri, Mar 13, 2020 at 11:16 AM Christian Lorentzen < > lorentzen.ch at gmail.com> wrote: > >> Thank you for your feedback. >> >> If it were possible to mix distributions together, e.g. rv1 + rv2, the >> compound poisson could be represented. AFAIK, PyMC3 supports that with >> "Mixture" distributions [1]. So I see three options: >> >> 1. Implement only a log likelihood function as Josef suggests. In >> which package to put it? Scipy, Statsmodels, PyMC3? >> 2. Ask PyMC3 developers for this distribution. >> 3. Ask Statsmodel developers for this distribution. >> @Josef: I hereby ask you;-) >> >> The tricky part: As I intend to calculate the log likelihood via Wrights >> generalized Bessel function and PR [2] implements this as a private special >> function, can this function stay in scipy or should it move to the other >> packages in that case. >> >> [1] https://docs.pymc.io/api/distributions/mixture.html >> [2] https://github.com/scipy/scipy/pull/11313 >> > > IMO, scipy would be a good central location for logpdf and associated > wright. > All packages like sklearn and statsmodels depend on scipy but not on each > other. 
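The characteristic-function route described in this message can be sketched in a few lines: the cf of a compound Poisson with Gamma(alpha, scale) jumps is exp(lam*((1 - i*t*scale)**(-alpha) - 1)), and an inverse FFT recovers the distribution on a grid. The grid choices (`ymax`, `N`) below are the eyeballed truncation mentioned above, not a principled bound:

```python
import numpy as np

lam, alpha, scale = 2.0, 3.0, 1.0      # Poisson rate and gamma jump parameters
N, ymax = 4096, 60.0                   # grid resolution and (eyeballed) truncation
h = ymax / N
t = 2.0 * np.pi * np.fft.fftfreq(N, d=h)
# cf of the compound Poisson: E[exp(itY)] = exp(lam * (cf_gamma(t) - 1))
cf = np.exp(lam * ((1.0 - 1j * t * scale) ** (-alpha) - 1.0))
# invert: p(y_k) ~ (1/(N*h)) * sum_j cf(t_j) exp(-i t_j y_k) on the grid y_k = k*h
dens = np.fft.fft(cf).real / (N * h)
y = np.arange(N) * h
mass = dens.sum() * h                  # total probability, exactly cf(0) = 1
mean = (y * dens).sum() * h            # approximates lam * alpha * scale = 6
atom = dens[0] * h                     # bin 0 captures the point mass exp(-lam)
```

Note how the point mass at zero falls entirely into the first bin: sampling the cf on a frequency grid periodizes the distribution, so the atom at a grid point is recovered exactly while tail mass beyond `ymax` wraps around, which is the truncation-threshold problem Josef describes.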
> > If wright in special works out, then adding just logpdf would be a good > short term solution. > Adding a new base distribution class e.g. for distribution mixtures will > be a lot more work and won't happen fast. > > Because statsmodels has currently only an approximate tweedie logpdf, we > would add any improved version if it doesn't go into scipy. > We can add it also to compat.scipy until it is in all scipy versions that > we support. > > > scipy.stats has now multivariate distributions that don't fit into the old > univariate distribution setup. > Similarly, I think it would be possible to add other distributions that > don't fit into the existing class hierarchy. > That would be an intermediate step without adding full support for generic > mixture distributions, or discrete-continuous combinations. > > Josef > > > Josef > >> >> >> Kind regards >> Christian >> On 11.03.20 22:41, josef.pktd at gmail.com wrote: >> >> >> >> On Wed, Mar 11, 2020 at 5:02 PM Robert Kern >> wrote: >> >>> On Wed, Mar 11, 2020 at 3:35 PM Christian Lorentzen < >>> lorentzen.ch at gmail.com> wrote: >>> >>>> Dear Scipy Developers and mailing list Readers >>>> >>>> I'd like to address the issue [1] to implement Tweedie distributions >>>> [2] in scipy.stats. >>>> >>>> Purpose >>>> The family of Tweedie distributions contains many known distributions >>>> like the Poisson and the Gamma distribution, but also distributions between >>>> them, aka compound poisson gamma distribution, see [3]. These are often >>>> appropriate for insurance claims and other fields, where one has a >>>> (Poisson) random count process of events and every event has a (Gamma) >>>> random size/amount. >>>> The distribution would enable simulations, maximum likelihood >>>> estimation of all parameters, choice and visualization of distributions, >>>> etc. >>>> >>>> Implementation >>>> I started PR [4] for Wrights generalized Bessel functions as a private >>>> function in scipy.special. 
>>>> Once this is ready, the pdf follows immediately. >>>> For the range of interest of Y ~ compound poisson gamma distribution, >>>> the distribution of Y has a point mass at zero and is otherwise continuous >>>> for Y>0. >>>> As already discussed in the issue [1], Tweedie might best fit as >>>> rv_generic. >>>> As such, it would be the first one, all others are either rv_discrete >>>> or rv_continuous. >>>> Without a template, I would need guidance how to implement a new >>>> rv_generic. >>>> >>> FWIW, `rv_generic` isn't really intended to be a concrete class. It was >>> only intended to be a base class implementing the common parts needed by >>> `rv_continuous` and `rv_discrete`. Nothing "fits into" `rv_generic`, per >>> se. The Tweedie distributions, for some parameters at least, may not fit >>> into the `scipy.stats` infrastructure at all. We have no infrastructure for >>> continuous-with-point-mass distributions. `rv_generic` is still built under >>> the assumption that it's going to be implementing *either* a continuous >>> *or* a discrete distribution. >>> >>> I recommend implementing the functionality that you need outside of >>> scipy following whatever API solves your problems best. Then we can >>> evaluate if there is infrastructure that can be built that would help the >>> second continuous-with-point-mass distribution that we might want next. >>> >> >> a long long time ago, I started a ParametricMixture model for this >> >> https://github.com/statsmodels/statsmodels/blob/master/statsmodels/sandbox/distributions/otherdist.py >> >> (until I gave up on distributions) >> >> An alternative as temporary solution would be to add some >> methods/functions like logpdf to scipy.stats, so statsmodels and sklearn >> can reuse those. 
>> >> Josef >> >> >> >>> >>> -- >>> Robert Kern >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at python.org >>> https://mail.python.org/mailman/listinfo/scipy-dev >>> >> >> _______________________________________________ >> SciPy-Dev mailing listSciPy-Dev at python.orghttps://mail.python.org/mailman/listinfo/scipy-dev >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at python.org >> https://mail.python.org/mailman/listinfo/scipy-dev >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri Mar 13 12:14:27 2020 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 13 Mar 2020 12:14:27 -0400 Subject: [SciPy-Dev] Tweedie distributions in scipy.stats In-Reply-To: References: <4ef6c60d-7d2f-199c-30c0-7106e6761ee5@googlemail.com> Message-ID: On Fri, Mar 13, 2020 at 12:04 PM wrote: > Aside: > compound Poisson is a convolution of distributions and not a finite > mixture. > Allowing an infinite mixing distribution like Poisson creates numerical > problems in the upper tail that are not easy to solve in general. > In most cases, computation have to be truncated at the upper tail, but > then the problem is to figure out the truncation threshold for a required > precision. > My guess is that this would be a lot of work to get it to scipy standards. > > I was looking at the general case for convolution and compound poisson a > long time ago, mainly using fft to get the pdf and cdf from the > characteristic function,, the cf is relatively simple to compute for > convolutions. The references in extreme value and risk applications that I > looked at, was emphasizing tail precision and ways how to work around it, > or comparing different methods in how precise they are. > fft was fast, but I only eyeballed the truncation threshold for my > examples. 
> I thought of representing tweedie for the computation as a mixture between a mass point/discrete distribution and a distribution for the continuous part, so we can handle the two parts separately. Following this, it might be possible to add a zero-truncated tweedie distribution as a continuous distribution subclass in scipy. Then we could just add a simple mixture of the mass point at zero and the zero-truncated tweedie. The same idea is used in hurdle models for count data, which have a mass point at zero and a zero-truncated Poisson or other count distributions for points > 0. > > Josef > > > On Fri, Mar 13, 2020 at 11:46 AM wrote: > >> >> >> On Fri, Mar 13, 2020 at 11:16 AM Christian Lorentzen < >> lorentzen.ch at gmail.com> wrote: >> >>> Thank you for your feedback. >>> >>> If it were possible to mix distributions together, e.g. rv1 + rv2, the >>> compound poisson could be represented. AFAIK, PyMC3 supports that with >>> "Mixture" distributions [1]. So I see three options: >>> >>> 1. Implement only a log likelihood function as Josef suggests. In >>> which package to put it? Scipy, Statsmodels, PyMC3? >>> 2. Ask PyMC3 developers for this distribution. >>> 3. Ask Statsmodel developers for this distribution. >>> @Josef: I hereby ask you;-) >>> >>> The tricky part: As I intend to calculate the log likelihood via Wrights >>> generalized Bessel function and PR [2] implements this as a private special >>> function, can this function stay in scipy or should it move to the other >>> packages in that case. >>> >>> [1] https://docs.pymc.io/api/distributions/mixture.html >>> [2] https://github.com/scipy/scipy/pull/11313 >>> >> >> IMO, scipy would be a good central location for logpdf and associated >> wright. >> All packages like sklearn and statsmodels depend on scipy but not on each >> other. >> >> If wright in special works out, then adding just logpdf would be a good >> short term solution. >> Adding a new base distribution class e.g. 
for distribution mixtures will >> be a lot more work and won't happen fast. >> >> Because statsmodels has currently only an approximate tweedie logpdf, we >> would add any improved version if it doesn't go into scipy. >> We can add it also to compat.scipy until it is in all scipy versions that >> we support. >> >> >> scipy.stats has now multivariate distributions that don't fit into the >> old univariate distribution setup. >> Similarly, I think it would be possible to add other distributions that >> don't fit into the existing class hierarchy. >> That would be an intermediate step without adding full support for >> generic mixture distributions, or discrete-continuous combinations. >> >> Josef >> >> >> Josef >> >>> >>> >>> Kind regards >>> Christian >>> On 11.03.20 22:41, josef.pktd at gmail.com wrote: >>> >>> >>> >>> On Wed, Mar 11, 2020 at 5:02 PM Robert Kern >>> wrote: >>> >>>> On Wed, Mar 11, 2020 at 3:35 PM Christian Lorentzen < >>>> lorentzen.ch at gmail.com> wrote: >>>> >>>>> Dear Scipy Developers and mailing list Readers >>>>> >>>>> I'd like to address the issue [1] to implement Tweedie distributions >>>>> [2] in scipy.stats. >>>>> >>>>> Purpose >>>>> The family of Tweedie distributions contains many known distributions >>>>> like the Poisson and the Gamma distribution, but also distributions between >>>>> them, aka compound poisson gamma distribution, see [3]. These are often >>>>> appropriate for insurance claims and other fields, where one has a >>>>> (Poisson) random count process of events and every event has a (Gamma) >>>>> random size/amount. >>>>> The distribution would enable simulations, maximum likelihood >>>>> estimation of all parameters, choice and visualization of distributions, >>>>> etc. >>>>> >>>>> Implementation >>>>> I started PR [4] for Wrights generalized Bessel functions as a private >>>>> function in scipy.special. >>>>> Once this is ready, the pdf follows immediately. 
>>>>> For the range of interest of Y ~ compound poisson gamma distribution, >>>>> the distribution of Y has a point mass at zero and is otherwise continuous >>>>> for Y>0. >>>>> As already discussed in the issue [1], Tweedie might best fit as >>>>> rv_generic. >>>>> As such, it would be the first one, all others are either rv_discrete >>>>> or rv_continuous. >>>>> Without a template, I would need guidance how to implement a new >>>>> rv_generic. >>>>> >>>> FWIW, `rv_generic` isn't really intended to be a concrete class. It was >>>> only intended to be a base class implementing the common parts needed by >>>> `rv_continuous` and `rv_discrete`. Nothing "fits into" `rv_generic`, per >>>> se. The Tweedie distributions, for some parameters at least, may not fit >>>> into the `scipy.stats` infrastructure at all. We have no infrastructure for >>>> continuous-with-point-mass distributions. `rv_generic` is still built under >>>> the assumption that it's going to be implementing *either* a >>>> continuous *or* a discrete distribution. >>>> >>>> I recommend implementing the functionality that you need outside of >>>> scipy following whatever API solves your problems best. Then we can >>>> evaluate if there is infrastructure that can be built that would help the >>>> second continuous-with-point-mass distribution that we might want next. >>>> >>> >>> a long long time ago, I started a ParametricMixture model for this >>> >>> https://github.com/statsmodels/statsmodels/blob/master/statsmodels/sandbox/distributions/otherdist.py >>> >>> (until I gave up on distributions) >>> >>> An alternative as temporary solution would be to add some >>> methods/functions like logpdf to scipy.stats, so statsmodels and sklearn >>> can reuse those. 
>>> >>> Josef >>> >>> >>> >>>> >>>> -- >>>> Robert Kern >>>> _______________________________________________ >>>> SciPy-Dev mailing list >>>> SciPy-Dev at python.org >>>> https://mail.python.org/mailman/listinfo/scipy-dev >>>> >>> >>> _______________________________________________ >>> SciPy-Dev mailing listSciPy-Dev at python.orghttps://mail.python.org/mailman/listinfo/scipy-dev >>> >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at python.org >>> https://mail.python.org/mailman/listinfo/scipy-dev >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri Mar 13 12:25:37 2020 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 13 Mar 2020 12:25:37 -0400 Subject: [SciPy-Dev] Tweedie distributions in scipy.stats In-Reply-To: References: <4ef6c60d-7d2f-199c-30c0-7106e6761ee5@googlemail.com> Message-ID: On Fri, Mar 13, 2020 at 12:14 PM wrote: > > > On Fri, Mar 13, 2020 at 12:04 PM wrote: > >> Aside: >> compound Poisson is a convolution of distributions and not a finite >> mixture. >> Allowing an infinite mixing distribution like Poisson creates numerical >> problems in the upper tail that are not easy to solve in general. >> In most cases, computation have to be truncated at the upper tail, but >> then the problem is to figure out the truncation threshold for a required >> precision. >> My guess is that this would be a lot of work to get it to scipy standards. >> >> I was looking at the general case for convolution and compound poisson a >> long time ago, mainly using fft to get the pdf and cdf from the >> characteristic function,, the cf is relatively simple to compute for >> convolutions. The references in extreme value and risk applications that I >> looked at, was emphasizing tail precision and ways how to work around it, >> or comparing different methods in how precise they are. 
>> fft was fast, but I only eyeballed the truncation threshold for my >> examples. >> > > I thought of representing tweedie for the computation as a mixture between > a mass point/discrete distribution and a distribution for the continuous > part, so we can handle the two parts separately. > > Following this, it might be possible to add a zero-truncated tweedie > distribution as a continuous distribution subclass in scipy. > Then we could just add a simple mixture of the mass point at zero and the > zero-truncated tweedie. > The same idea is used in hurdle models for count data, which have a mass > point at zero and a zero-truncated Poisson or other count distributions for > points > 0. > > > I haven't looked at Tweedie in a few years. AFAIK/AFAIR statsmodels only supports power parameter p in the open interval (1, 2), where it has the nice Poisson-Gamma distribution. > > >> >> Josef >> >> >> On Fri, Mar 13, 2020 at 11:46 AM wrote: >> >>> >>> >>> On Fri, Mar 13, 2020 at 11:16 AM Christian Lorentzen < >>> lorentzen.ch at gmail.com> wrote: >>> >>>> Thank you for your feedback. >>>> >>>> If it were possible to mix distributions together, e.g. rv1 + rv2, the >>>> compound poisson could be represented. AFAIK, PyMC3 supports that with >>>> "Mixture" distributions [1]. So I see three options: >>>> >>>> 1. Implement only a log likelihood function as Josef suggests. In >>>> which package to put it? Scipy, Statsmodels, PyMC3? >>>> 2. Ask PyMC3 developers for this distribution. >>>> 3. Ask Statsmodel developers for this distribution. >>>> @Josef: I hereby ask you;-) >>>> >>>> The tricky part: As I intend to calculate the log likelihood via >>>> Wrights generalized Bessel function and PR [2] implements this as a private >>>> special function, can this function stay in scipy or should it move to the >>>> other packages in that case. 
>>>> >>>> [1] https://docs.pymc.io/api/distributions/mixture.html >>>> [2] https://github.com/scipy/scipy/pull/11313 >>>> >>> >>> IMO, scipy would be a good central location for logpdf and associated >>> wright. >>> All packages like sklearn and statsmodels depend on scipy but not on >>> each other. >>> >>> If wright in special works out, then adding just logpdf would be a good >>> short term solution. >>> Adding a new base distribution class e.g. for distribution mixtures will >>> be a lot more work and won't happen fast. >>> >>> Because statsmodels has currently only an approximate tweedie logpdf, we >>> would add any improved version if it doesn't go into scipy. >>> We can add it also to compat.scipy until it is in all scipy versions >>> that we support. >>> >>> >>> scipy.stats has now multivariate distributions that don't fit into the >>> old univariate distribution setup. >>> Similarly, I think it would be possible to add other distributions that >>> don't fit into the existing class hierarchy. >>> That would be an intermediate step without adding full support for >>> generic mixture distributions, or discrete-continuous combinations. >>> >>> Josef >>> >>> >>> Josef >>> >>>> >>>> >>>> Kind regards >>>> Christian >>>> On 11.03.20 22:41, josef.pktd at gmail.com wrote: >>>> >>>> >>>> >>>> On Wed, Mar 11, 2020 at 5:02 PM Robert Kern >>>> wrote: >>>> >>>>> On Wed, Mar 11, 2020 at 3:35 PM Christian Lorentzen < >>>>> lorentzen.ch at gmail.com> wrote: >>>>> >>>>>> Dear Scipy Developers and mailing list Readers >>>>>> >>>>>> I'd like to address the issue [1] to implement Tweedie distributions >>>>>> [2] in scipy.stats. >>>>>> >>>>>> Purpose >>>>>> The family of Tweedie distributions contains many known distributions >>>>>> like the Poisson and the Gamma distribution, but also distributions between >>>>>> them, aka compound poisson gamma distribution, see [3]. 
These are often >>>>>> appropriate for insurance claims and other fields, where one has a >>>>>> (Poisson) random count process of events and every event has a (Gamma) >>>>>> random size/amount. >>>>>> The distribution would enable simulations, maximum likelihood >>>>>> estimation of all parameters, choice and visualization of distributions, >>>>>> etc. >>>>>> >>>>>> Implementation >>>>>> I started PR [4] for Wrights generalized Bessel functions as a >>>>>> private function in scipy.special. >>>>>> Once this is ready, the pdf follows immediately. >>>>>> For the range of interest of Y ~ compound poisson gamma distribution, >>>>>> the distribution of Y has a point mass at zero and is otherwise continuous >>>>>> for Y>0. >>>>>> As already discussed in the issue [1], Tweedie might best fit as >>>>>> rv_generic. >>>>>> As such, it would be the first one, all others are either rv_discrete >>>>>> or rv_continuous. >>>>>> Without a template, I would need guidance how to implement a new >>>>>> rv_generic. >>>>>> >>>>> FWIW, `rv_generic` isn't really intended to be a concrete class. It >>>>> was only intended to be a base class implementing the common parts needed >>>>> by `rv_continuous` and `rv_discrete`. Nothing "fits into" `rv_generic`, per >>>>> se. The Tweedie distributions, for some parameters at least, may not fit >>>>> into the `scipy.stats` infrastructure at all. We have no infrastructure for >>>>> continuous-with-point-mass distributions. `rv_generic` is still built under >>>>> the assumption that it's going to be implementing *either* a >>>>> continuous *or* a discrete distribution. >>>>> >>>>> I recommend implementing the functionality that you need outside of >>>>> scipy following whatever API solves your problems best. Then we can >>>>> evaluate if there is infrastructure that can be built that would help the >>>>> second continuous-with-point-mass distribution that we might want next. 
>>>>> >>>> >>>> a long long time ago, I started a ParametricMixture model for this >>>> >>>> https://github.com/statsmodels/statsmodels/blob/master/statsmodels/sandbox/distributions/otherdist.py >>>> >>>> (until I gave up on distributions) >>>> >>>> An alternative as temporary solution would be to add some >>>> methods/functions like logpdf to scipy.stats, so statsmodels and sklearn >>>> can reuse those. >>>> >>>> Josef >>>> >>>> >>>> >>>>> >>>>> -- >>>>> Robert Kern >>>>> _______________________________________________ >>>>> SciPy-Dev mailing list >>>>> SciPy-Dev at python.org >>>>> https://mail.python.org/mailman/listinfo/scipy-dev >>>>> >>>> >>>> _______________________________________________ >>>> SciPy-Dev mailing listSciPy-Dev at python.orghttps://mail.python.org/mailman/listinfo/scipy-dev >>>> >>>> _______________________________________________ >>>> SciPy-Dev mailing list >>>> SciPy-Dev at python.org >>>> https://mail.python.org/mailman/listinfo/scipy-dev >>>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Fri Mar 13 12:45:36 2020 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 13 Mar 2020 12:45:36 -0400 Subject: [SciPy-Dev] Tweedie distributions in scipy.stats In-Reply-To: References: <4ef6c60d-7d2f-199c-30c0-7106e6761ee5@googlemail.com> Message-ID: On Fri, Mar 13, 2020 at 12:15 PM wrote: > > > On Fri, Mar 13, 2020 at 12:04 PM wrote: > >> Aside: >> compound Poisson is a convolution of distributions and not a finite >> mixture. >> Allowing an infinite mixing distribution like Poisson creates numerical >> problems in the upper tail that are not easy to solve in general. >> In most cases, computation have to be truncated at the upper tail, but >> then the problem is to figure out the truncation threshold for a required >> precision. >> My guess is that this would be a lot of work to get it to scipy standards. 
>> >> I was looking at the general case for convolution and compound poisson a >> long time ago, mainly using fft to get the pdf and cdf from the >> characteristic function,, the cf is relatively simple to compute for >> convolutions. The references in extreme value and risk applications that I >> looked at, was emphasizing tail precision and ways how to work around it, >> or comparing different methods in how precise they are. >> fft was fast, but I only eyeballed the truncation threshold for my >> examples. >> > > I thought of representing tweedie for the computation as a mixture between > a mass point/discrete distribution and a distribution for the continuous > part, so we can handle the two parts separately. > > Following this, it might be possible to add a zero-truncated tweedie > distribution as a continuous distribution subclass in scipy. > Then we could just add a simple mixture of the mass point at zero and the > zero-truncated tweedie. > That could certainly work. It seems like handling that smoothly may be a pain for the user; you'd have to coordinate the effect of the parameters on both the size of the point mass and the continuous part, as well as the mixture. My recommendation is to implement this in its own package, using whatever frameworks you find help you solve your data analysis problems. Then we can figure out where it ought to finally live and how to extend the existing frameworks to handle this case best. The code doesn't have to start out in scipy.stats in order to make use of the scipy.stats framework. Please do continue to put the necessary special functions into scipy.special; that framework is a little harder to use outside of scipy.special. If you need my vote of support for that on that PR, you have it. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... 
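[Editorial sketch] The point mass at zero that this thread keeps returning to falls straight out of the compound construction: when the Poisson count is zero, the Gamma sum is empty. A minimal simulation sketch (the function name and parameters are illustrative, not an existing scipy API):

```python
import numpy as np

def compound_poisson_gamma(lam, shape, scale, size, seed=None):
    """Sample Y = X_1 + ... + X_N with N ~ Poisson(lam), X_i ~ Gamma(shape, scale).

    Illustrative sampler, not an existing scipy API.  An empty sum
    (N == 0) gives Y == 0 exactly -- the point mass at zero that makes
    the compound Poisson-Gamma case neither discrete nor continuous.
    """
    rng = np.random.default_rng(seed)
    n = rng.poisson(lam, size=size)
    # The sum of n iid Gamma(shape, scale) variates is Gamma(n * shape, scale);
    # draw with max(n, 1) so the shape parameter stays positive, then zero
    # out the n == 0 entries afterwards.
    y = rng.gamma(np.maximum(n, 1) * shape, scale)
    y[n == 0] = 0.0
    return y

y = compound_poisson_gamma(lam=1.0, shape=2.0, scale=0.5, size=100_000, seed=0)
p_zero = (y == 0).mean()   # empirical mass at zero, close to exp(-lam)
```

The empirical mass at zero approaches exp(-lam), which is exactly the part of the distribution that does not fit the existing rv_continuous machinery.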
URL: From willtironedev at gmail.com Sat Mar 14 11:31:25 2020 From: willtironedev at gmail.com (Will Tirone) Date: Sat, 14 Mar 2020 11:31:25 -0400 Subject: [SciPy-Dev] PR #11612 Message-ID: Hi everyone, First time contributor to Scipy here. I'm trying to fix a function, although for some reason it's failing the scipy.spatial distance testing ( https://github.com/scipy/scipy/pull/11612). I've read all the docs I can find and looked through the testing functions and I cannot seem to figure it out. I just followed an example of a different test for a similar function, although mine's still failing. I hate to keep asking my reviewer really basic questions, but is anyone available to take a look and see if 1) my fix isn't working or 2) I'm testing it incorrectly? And let me know if this isn't the usual format for sending stuff to the mailing list, first time doing this as well. Thanks in advance, really appreciate the community support! -Will -------------- next part -------------- An HTML attachment was scrubbed... URL: From wsw.raczek at gmail.com Sun Mar 15 07:50:15 2020 From: wsw.raczek at gmail.com (=?UTF-8?B?V8WCYWR5c8WCYXcgUmFjemVr?=) Date: Sun, 15 Mar 2020 12:50:15 +0100 Subject: [SciPy-Dev] Medfilt with signal size less than kernel_size Message-ID: Hi everyone, I'm a new contributor here. Recently I've looked into issue https://github.com/scipy/scipy/issues/11503, and it turned out everything was fine (i.e., according to the documentation), but then grlee77 suggested that maybe it's worth raising a UserWarning for such cases. What do you think? It doesn't look so complicated, and I would be happy to do it as my first contribution. Regards, Vladyslav Rachek -------------- next part -------------- An HTML attachment was scrubbed...
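[Editorial sketch] The UserWarning grlee77 suggested could look like the following; `_check_medfilt_sizes` is a hypothetical helper and its message text is illustrative, not the actual scipy.signal code:

```python
import warnings
import numpy as np

def _check_medfilt_sizes(volume, kernel_size):
    """Warn when an input axis is shorter than the median-filter kernel.

    Hypothetical helper sketching grlee77's suggestion; scipy.signal.medfilt
    does not currently contain this check.
    """
    volume = np.asarray(volume)
    # a scalar kernel_size applies to every axis
    sizes = np.broadcast_to(np.atleast_1d(kernel_size), (volume.ndim,))
    for axis, (n, k) in enumerate(zip(volume.shape, sizes)):
        if n < k:
            warnings.warn(
                f"kernel_size {k} exceeds input size {n} along axis {axis}; "
                "the result is dominated by the implicit zero padding.",
                UserWarning,
            )

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    _check_medfilt_sizes(np.ones(3), kernel_size=5)
```

The warning keeps the documented zero-padded behaviour unchanged while making the surprising case visible to users.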
URL: From ralf.gommers at gmail.com Sun Mar 15 08:07:06 2020 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 15 Mar 2020 13:07:06 +0100 Subject: [SciPy-Dev] Medfilt with signal size less than kernel_size In-Reply-To: References: Message-ID: On Sun, Mar 15, 2020 at 12:50 PM Władysław Raczek wrote: > Hi everyone, > > I'm new contributor here. Recently I've looked into issue > https://github.com/scipy/scipy/issues/11503, and it turned out everything > was fine (e.g. according to documentation), but then grlee77 suggested that > maybe it's worth raising UserWarning for such cases. > What do you think? It doesn't look so complicated, and I would be happy to > do it as my first contribution. > > Hi Władysław, that does seem like a good idea, please go for it! Best, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilhanpolat at gmail.com Sun Mar 15 08:46:21 2020 From: ilhanpolat at gmail.com (Ilhan Polat) Date: Sun, 15 Mar 2020 13:46:21 +0100 Subject: [SciPy-Dev] Degraded performance via Cython LAPACK wrappers Message-ID: Dear all, I am trying to increase the performance of a few functions in linalg module by cythonizing trivial but time-consuming tasks. After not getting too much performance, I decided to consult the scipy folks about it. I have the following cython MVE specifically working on fortran contiguous arrays for simplicity.
@cython.boundscheck(False) @cython.wraparound(False) @cython.cdivision(True) cpdef void zzz(double[::1, :] a, double[::1, :]b) nogil: cdef int n, nrhs n = a.shape[0] nrhs = b.shape[1] lda = n cdef int *ipiv = <int *> malloc(n * sizeof(int)) cdef int info = 0 if not ipiv: raise MemoryError() # dgesv(int *n, int *nrhs, d *a, int *lda, int *ipiv, d *b, int *ldb, int *info) dgesv(&n, &nrhs, &a[0,0], &lda, &ipiv[0], &b[0,0], &lda, &info) free(ipiv) This is supposed to be much faster than the regular sp.linalg.solve(a, b) call since it bypasses a lot of checks and the relevant lapack flavor detection etc. However using the following data n = 150 a = np.empty([n, n], dtype=float, order='F') a[:, :] = np.random.rand(n, n) + np.eye(n)*5 b = np.asfortranarray(np.random.rand(n, 1)) I see that for about 800 > n > 50 the results are almost identical which is pretty surprising as cython code should be spending a lot of time somewhere. I haven't been successful enabling the line trace in cython code hence can anyone please tell me where I am losing performance and how I can remedy it? Best, -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilhanpolat at gmail.com Sun Mar 15 09:18:26 2020 From: ilhanpolat at gmail.com (Ilhan Polat) Date: Sun, 15 Mar 2020 14:18:26 +0100 Subject: [SciPy-Dev] Degraded performance via Cython LAPACK wrappers In-Reply-To: References: Message-ID: Sorry gmail ate my code fragment here is the imports for the first part cimport cython import numpy as np cimport numpy as cnp from libc.stdlib cimport malloc, free cimport libc.limits from scipy.linalg.cython_lapack cimport dgtsv, dgesv, dgetrf cdef int MEMORY_ERROR = libc.limits.INT_MAX On Sun, Mar 15, 2020 at 1:46 PM Ilhan Polat wrote: > Dear all, > > I am trying to increase the performance of a few functions in linalg > module by cythonizing trivial but time-consuming tasks. After not getting > too much performance, I decided to consult the scipy folks about it.
> > I have the following cython MVE specifically working on fortran contiguous > arrays for simplicity. > > @cython.boundscheck(False) > @cython.wraparound(False) > @cython.cdivision(True) > cpdef void zzz(double[::1, :] a, double[::1, :]b) nogil: > cdef int n, nrhs > n = a.shape[0] > nrhs = b.shape[1] > lda = n > > cdef int *ipiv = malloc(n * sizeof(int)) > cdef int info = 0 > > if not ipiv: > raise MemoryError() > # dgesv(int *n, int *nrhs, d *a, int *lda, int *ipiv, d *b, int *ldb, > int *info) > dgesv(&n, &nrhs, &a[0,0], &lda, &ipiv[0], &b[0,0], &lda, &info) > free(ipiv) > > > This is supposed to be much faster than the regular sp.linalg.solve(a, b) > call since it bypasses a lot of checks and the relevant lapack flavor > detection etc. > > However using the following data > > n = 150 > a = np.empty([n, n], dtype=float, order='F') > a[:, :] = np.random.rand(n, n) + np.eye(n)*5 > b = np.asfortranarray(np.random.rand(n, 1)) > > I see that for about 800 > n > 50 the results are almost identical which > is pretty surprising as cython code should be spending a lot of time > somewhere. > > I haven't been successful enabling the line trace in cython code hence can > anyone please tell me where I am losing performance and how I can remedy it? > > Best, > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rlucas7 at vt.edu Sun Mar 15 19:00:22 2020 From: rlucas7 at vt.edu (rlucas7 at vt.edu) Date: Sun, 15 Mar 2020 19:00:22 -0400 Subject: [SciPy-Dev] Tweedie distributions in scipy.stats In-Reply-To: References: Message-ID: <90626C84-DCFE-420C-BE6F-508052C66210@vt.edu> > On Mar 13, 2020, at 12:46 PM, Robert Kern wrote: > > ? >> On Fri, Mar 13, 2020 at 12:15 PM wrote: > >> >> >>> On Fri, Mar 13, 2020 at 12:04 PM wrote: >>> Aside: >>> compound Poisson is a convolution of distributions and not a finite mixture. 
>>> Allowing an infinite mixing distribution like Poisson creates numerical problems in the upper tail that are not easy to solve in general. >>> In most cases, computation have to be truncated at the upper tail, but then the problem is to figure out the truncation threshold for a required precision. >>> My guess is that this would be a lot of work to get it to scipy standards. >>> >>> I was looking at the general case for convolution and compound poisson a long time ago, mainly using fft to get the pdf and cdf from the characteristic function,, the cf is relatively simple to compute for convolutions. The references in extreme value and risk applications that I looked at, was emphasizing tail precision and ways how to work around it, or comparing different methods in how precise they are. >>> fft was fast, but I only eyeballed the truncation threshold for my examples. >> I think I also looked at that at my previous employer, I think the reference I had used is this one https://eprints.usq.edu.au/3888/1/Dunn_Smyth_Stats_and_Comp_v18n1.pdf Hopefully that helps. >> I thought of representing tweedie for the computation as a mixture between a mass point/discrete distribution and a distribution for the continuous part, so we can handle the two parts separately. I came to the same conclusion after thinking about this a bit over the last few days. >> Following this, it might be possible to add a zero-truncated tweedie distribution as a continuous distribution subclass in scipy. >> Then we could just add a simple mixture of the mass point at zero and the zero-truncated tweedie. The difference in zero inflated poisson is that it can be handled directly within the rv_discrete framework (I think). An rv_continuous with a zero point mass would handle a tweedie with 1 < p < 2. > That could certainly work.
It seems like handling that smoothly may be a pain for the user; you'd have to coordinate the effect of the parameters on both the size of the point mass and the continuous part, as well as the mixture. > > My recommendation is to implement this in its own package, using whatever frameworks you find help you solve your data analysis problems. Then we can figure out where it ought to finally live and how to extend the existing frameworks to handle this case best. Thanks for suggesting this, Robert, this is a wise strategy, this will enable us to work out something that would generalize outside of the specifics of only the tweedie distribution. > The code doesn't have to start out in scipy.stats in order to make use of the scipy.stats framework. Please do continue to put the necessary special functions into scipy.special; that framework is a little harder to use outside of scipy.special. If you need my vote of support for that on that PR, you have it. I found this from another statsmodels developer that may be helpful to use as reference https://github.com/thequackdaddy/tweedie/blob/master/tweedie/tweedie_dist.py Hopefully you find it helpful. > > -- > Robert Kern > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Mon Mar 16 12:21:28 2020 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 16 Mar 2020 17:21:28 +0100 Subject: [SciPy-Dev] welcome Lucas Roberts to the SciPy core team Message-ID: Hi all, On behalf of the SciPy developers I'd like to welcome Lucas Roberts as a member of the core team. Lucas has been contributing for well over a year, making nice improvements and fixes to scipy.stats and scipy.special. He has also done a lot of valuable reviews of stats PRs.
Here is an overview of his SciPy PRs: https://github.com/scipy/scipy/pulls/rlucas7 I'm looking forward to Lucas' continued contributions! Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at gmail.com Mon Mar 16 12:24:13 2020 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Mon, 16 Mar 2020 12:24:13 -0400 Subject: [SciPy-Dev] welcome Lucas Roberts to the SciPy core team In-Reply-To: References: Message-ID: On 3/16/20, Ralf Gommers wrote: > Hi all, > > On behalf of the SciPy developers I'd like to welcome Lucas Roberts as a > member of the core team. Lucas has been contributing for well over a year, > making nice improvements and fixes to scipy.stats and scipy.special. He has > also done a lot of valuable reviews of stats PRs. Here is an overview of > his SciPy PRs: https://github.com/scipy/scipy/pulls/rlucas7 > > I'm looking forward to Lucas' continued contributions! Welcome, Lucas. Thanks for the great work so far, looking forward to more! Warren > > Cheers, > Ralf > From stefanv at berkeley.edu Mon Mar 16 12:27:54 2020 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Mon, 16 Mar 2020 18:27:54 +0200 Subject: [SciPy-Dev] welcome Lucas Roberts to the SciPy core team In-Reply-To: References: Message-ID: <170e42c4f10.27ae.acf34a9c767d7bb498a799333be0433e@fastmail.com> On March 16, 2020 18:22:08 Ralf Gommers wrote: > On behalf of the SciPy developers I'd like to welcome Lucas Roberts as a > member of the core team. Lucas has been contributing for well over a year, > making nice improvements and fixes to scipy.stats and scipy.special. He has > also done a lot of valuable reviews of stats PRs. Here is an overview of > his SciPy PRs: https://github.com/scipy/scipy/pulls/rlucas7 > > I'm looking forward to Lucas' continued contributions! Welcome, Lucas, and thank you for your contributions so far and going ahead! Best regards, Stéfan -------------- next part -------------- An HTML attachment was scrubbed...
URL: From mhaberla at calpoly.edu Mon Mar 16 12:51:02 2020 From: mhaberla at calpoly.edu (Matt Haberland) Date: Mon, 16 Mar 2020 09:51:02 -0700 Subject: [SciPy-Dev] welcome Lucas Roberts to the SciPy core team In-Reply-To: References: Message-ID: Welcome, Lucas! Keep up the good work! On Mon, Mar 16, 2020 at 9:24 AM Warren Weckesser wrote: > On 3/16/20, Ralf Gommers wrote: > > Hi all, > > > > On behalf of the SciPy developers I'd like to welcome Lucas Roberts as a > > member of the core team. Lucas has been contributing for well over a > year, > > making nice improvements and fixes to scipy.stats and scipy.special. He > has > > also done a lot of valuable reviews of stats PRs. Here is an overview of > > his SciPy PRs: https://github.com/scipy/scipy/pulls/rlucas7 > > > > I'm looking to Lucas' continued contributions! > > > Welcome, Lucas. Thanks for the great work so far, looking forward to more! > > Warren > > > > > > Cheers, > > Ralf > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -- Matt Haberland Assistant Professor BioResource and Agricultural Engineering 08A-3K, Cal Poly -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeguerra at ucdavis.edu Mon Mar 16 12:56:16 2020 From: jeguerra at ucdavis.edu (Jorge Guerra) Date: Mon, 16 Mar 2020 10:56:16 -0600 Subject: [SciPy-Dev] Multithreading sparse matrix to dense vector(s) multiplication Message-ID: Hello everyone, I'm quite new to the SciPy development community. I've identified a potential improvement that would be very helpful: in scipy/sparse/sparsetools/csr.h @ lines 1092 and 1129 in the csr_matvec() and mat_vecs() methods I would like to parallelize the outer for loop over the rows of the input sparse matrix. The current serial code looks just about optimal, but should take advantage of a multicore machine. This is a place where SciPy is deficient. 
Going through the compilation, SciPy depends on pThreads. However, my proposed change would be trivial by including OpenMP, but I don't necessarily want to add to the overall dependencies. Would anyone more familiar with pThreads be willing to collaborate with me to accomplish this proposed task? Or perhaps offer advice on an alternative to get multithreaded sparse matrix to dense vector multiplication implemented? Thank you all! Jorge -- *Jorge E. Guerra, PhD.* *Research Scientist, CIMMS/NOAA* *Earth Systems Research Laboratory, Boulder, CO* -------------- next part -------------- An HTML attachment was scrubbed... URL: From rth.yurchak at gmail.com Mon Mar 16 13:24:51 2020 From: rth.yurchak at gmail.com (Roman Yurchak) Date: Mon, 16 Mar 2020 18:24:51 +0100 Subject: [SciPy-Dev] Multithreading sparse matrix to dense vector(s) multiplication In-Reply-To: References: Message-ID: Hello Jorge, FYI there have been previous discussions about related topics in https://github.com/scipy/scipy/issues/10201 Also see https://github.com/scipy/scipy/issues/10239 for considerations about supporting OpenMP. I'm not sure if there are any plans for supporting it in the future. Can't help with the pThreads implementation, but generally I think multi-threading sparse operations if possible would be quite useful. -- Roman On 16/03/2020 17:56, Jorge Guerra wrote: > Hello everyone, > > I'm quite new to the SciPy development community. I've identified a > potential improvement that would be very helpful: > > in scipy/sparse/sparsetools/csr.h @ lines 1092 and 1129 in the > csr_matvec() and mat_vecs() methods I would like to parallelize the > outer for loop over the rows of the input sparse matrix. The current > serial code looks just about optimal, but should take advantage of a > multicore machine. This is a place where SciPy is deficient. > > Going through the compilation, SciPy depends on pThreads.
However, my > proposed change would be trivial by including OpenMP, but I don't > necessarily want to add to the overall dependencies. > > Would anyone more familiar with pThreads be willing to collaborate with > me to accomplish this proposed task? Or perhaps advice on an alternative > to get multithreaded sparse matrix to dense vector multiplication > implemented? > > Thank you all! > > Jorge > > -- > /Jorge E. Guerra, PhD./ > /Research Scientist, CIMMS/NOAA/ > /Earth Systems Research Laboratory, Boulder, CO/ > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > From ralf.gommers at gmail.com Mon Mar 16 13:44:14 2020 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 16 Mar 2020 18:44:14 +0100 Subject: [SciPy-Dev] Multithreading sparse matrix to dense vector(s) multiplication In-Reply-To: References: Message-ID: On Mon, Mar 16, 2020 at 6:25 PM Roman Yurchak wrote: > Hello Jorge, > > FYI there have been previous discussions about related topics in > https://github.com/scipy/scipy/issues/10201 > > Also see https://github.com/scipy/scipy/issues/10239 for considerations > about supporting OpenMP. I'm not sure if there are any plans for > supporting it in the future. > OpenMP is indeed not a good option, however we do have both C++ multithreading code in cKDTree and scipy.fft, and Python multiprocessing code in the likes of differential_evolution. Those patterns can be followed for other functions (in all cases, using a workers=1 keyword). An issue for matvec et al. may be that they can be called from operators, those of course cannot get a keyword. I don't see a good way around that, since using multiple threads by default is not a good idea (this messes with composability, multiprocessing may happen at a higher level already, like with dask or scikit-learn). 
Cheers, Ralf > Can't help with the pThreads implementation, but generally I think > multi-threading sparse operations if possible would be quite useful. > > -- > Roman > > On 16/03/2020 17:56, Jorge Guerra wrote: > > Hello everyone, > > > > I'm quite new to the SciPy development community. I've identified a > > potential improvement that would be very helpful: > > > > in scipy/sparse/sparsetools/csr.h @ lines 1092 and 1129 in the > > csr_matvec() and mat_vecs() methods I would like to parallelize the > > outer for loop over the rows of the input sparse matrix. The current > > serial code looks just about optimal, but should take advantage of a > > multicore machine. This is a place where SciPy is deficient. > > > > Going through the compilation, SciPy depends on pThreads. However, my > > proposed change would be trivial by including OpenMP, but I don't > > necessarily want to add to the overall dependencies. > > > > Would anyone more familiar with pThreads be willing to collaborate with > > me to accomplish this proposed task? Or perhaps advice on an alternative > > to get multithreaded sparse matrix to dense vector multiplication > > implemented? > > > > Thank you all! > > > > Jorge > > > > -- > > /Jorge E. Guerra, PhD./ > > /Research Scientist, CIMMS/NOAA/ > > /Earth Systems Research Laboratory, Boulder, CO/ > > > > _______________________________________________ > > SciPy-Dev mailing list > > SciPy-Dev at python.org > > https://mail.python.org/mailman/listinfo/scipy-dev > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... 
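[Editorial sketch] The row-blocking Jorge proposes can be prototyped in pure Python with the `workers=` keyword convention Ralf mentions. This only demonstrates the partitioning; whether the blocks actually run concurrently depends on the underlying matvec releasing the GIL, which the sketch does not address:

```python
import numpy as np
import scipy.sparse as sp
from concurrent.futures import ThreadPoolExecutor

def matvec_rowblocks(A, x, workers=4):
    """Row-parallel A @ x for a CSR matrix, split into `workers` blocks.

    Sketch of the partitioning under discussion; the function name and
    signature are illustrative, not an existing scipy API.  Each thread
    writes a disjoint slice of the output array.
    """
    A = A.tocsr()
    bounds = np.linspace(0, A.shape[0], workers + 1).astype(int)
    out = np.empty(A.shape[0], dtype=np.result_type(A.dtype, x.dtype))

    def one_block(i):
        lo, hi = bounds[i], bounds[i + 1]
        out[lo:hi] = A[lo:hi] @ x   # disjoint output slice per thread

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces completion and propagates any worker exception
        list(pool.map(one_block, range(workers)))
    return out

A = sp.random(500, 300, density=0.05, format="csr", random_state=0)
x = np.random.default_rng(0).standard_normal(300)
result = matvec_rowblocks(A, x, workers=4)
```

Correctness is easy to check against the serial product; measuring an actual speedup would need the memory-bandwidth considerations raised in the paper Gregory links.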
URL: From grlee77 at gmail.com Mon Mar 16 14:44:03 2020 From: grlee77 at gmail.com (Gregory Lee) Date: Mon, 16 Mar 2020 14:44:03 -0400 Subject: [SciPy-Dev] Multithreading sparse matrix to dense vector(s) multiplication In-Reply-To: References: Message-ID: On Mon, Mar 16, 2020 at 12:56 PM Jorge Guerra wrote: > Hello everyone, > > I'm quite new to the SciPy development community. I've identified a > potential improvement that would be very helpful: > > in scipy/sparse/sparsetools/csr.h @ lines 1092 and 1129 in the > csr_matvec() and mat_vecs() methods I would like to parallelize the outer > for loop over the rows of the input sparse matrix. The current serial code > looks just about optimal, but should take advantage of a multicore machine. > This is a place where SciPy is deficient. > > Welcome, Have you tried this to verify that you do indeed get a performance improvement with multithreading? I seem to recall sparse matrix-vector multiplication is more difficult to accelerate than dense matrix-vector multiplication because it tends to be memory bound. A quick search just now turned up the following paper that discusses some of the difficulties involved: http://www.cslab.ece.ntua.gr/~nkoziris/papers/pdp08understanding.pdf > Going through the compilation, SciPy depends on pThreads. However, my > proposed change would be trivial by including OpenMP, but I don't > necessarily want to add to the overall dependencies. > > Would anyone more familiar with pThreads be willing to collaborate with > me to accomplish this proposed task? Or perhaps advice on an alternative to > get multithreaded sparse matrix to dense vector multiplication implemented? > > Thank you all! > > Jorge > > -- > *Jorge E.
Guerra, PhD.* > *Research Scientist, CIMMS/NOAA* > *Earth Systems Research Laboratory, Boulder, CO* > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeguerra at ucdavis.edu Mon Mar 16 18:46:35 2020 From: jeguerra at ucdavis.edu (Jorge Guerra) Date: Mon, 16 Mar 2020 16:46:35 -0600 Subject: [SciPy-Dev] Multithreading sparse matrix to dense vector(s) multiplication In-Reply-To: References: Message-ID: Thank you for the responses. I can see the problem with overloading the multiplication "*" operator and the dot() method. As for the algorithm itself, what I'm proposing would only allow for multiple rows of the sparse matrix to be computed on simultaneously without addressing any of the underlying issues identified in the reference http://www.cslab.ece.ntua.gr/~nkoziris/papers/pdp08understanding.pdf. I'm aware that this would not be an optimal solution, but naively it should provide some improvement in my opinion. At least it is worth investigating. I know Matlab is able to do this with multithreading with the use of an external library... although I have not been able to reverse engineer which and how that is implemented. Maybe it is in their use of SuiteSparse, but I'm not sure. I'm going to work on a verification and I'll have to learn pThreads it seems. Thank you all, Jorge On Mon, Mar 16, 2020 at 12:44 PM Gregory Lee wrote: > > > On Mon, Mar 16, 2020 at 12:56 PM Jorge Guerra > wrote: > >> Hello everyone, >> >> I'm quite new to the SciPy development community. I've identified a >> potential improvement that would be very helpful: >> >> in scipy/sparse/sparsetools/csr.h @ lines 1092 and 1129 in the >> csr_matvec() and mat_vecs() methods I would like to parallelize the outer >> for loop over the rows of the input sparse matrix. 
The current serial code >> looks just about optimal, but should take advantage of a multicore machine. >> This is a place where SciPy is deficient. >> >> > Welcome, > > Have you tried this to verify that you do indeed get a performance > improvement with multithreading? I seem to recall sparse matrix-vector > mulitiplication is more difficult to accelerate than dense matrix-vector > multiplication because it tends to be memory bound. A quick search just now > turned up the following paper that discusses some of the difficulties > involved: > http://www.cslab.ece.ntua.gr/~nkoziris/papers/pdp08understanding.pdf > > > >> Going through the compilation, SciPy depends on pThreads. However, my >> proposed change would be trivial by including OpenMP, but I don't >> necessarily want to add to the overall dependencies. >> >> Would anyone more familiar with pThreads be willing to collaborate with >> me to accomplish this proposed task? Or perhaps advice on an alternative to >> get multithreaded sparse matrix to dense vector multiplication implemented? >> >> Thank you all! >> >> Jorge >> >> -- >> *Jorge E. Guerra, PhD.* >> *Research Scientist, CIMMS/NOAA* >> *Earth Systems Research Laboratory, Boulder, CO* >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at python.org >> https://mail.python.org/mailman/listinfo/scipy-dev >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -- *Jorge E. Guerra, PhD.* *Research Scientist, CIMMS/NOAA* *Earth Systems Research Laboratory, Boulder, CO* -------------- next part -------------- An HTML attachment was scrubbed... URL: From evgeny.burovskiy at gmail.com Mon Mar 16 19:51:48 2020 From: evgeny.burovskiy at gmail.com (Evgeni Burovski) Date: Tue, 17 Mar 2020 02:51:48 +0300 Subject: [SciPy-Dev] welcome Lucas Roberts to the SciPy core team In-Reply-To: References: Message-ID: Welcome Lucas! 
On Mon, Mar 16, 2020 at 7:21 PM Ralf Gommers wrote: > > Hi all, > > On behalf of the SciPy developers I'd like to welcome Lucas Roberts as a member of the core team. Lucas has been contributing for well over a year, making nice improvements and fixes to scipy.stats and scipy.special. He has also done a lot of valuable reviews of stats PRs. Here is an overview of his SciPy PRs: https://github.com/scipy/scipy/pulls/rlucas7 > > I'm looking to Lucas' continued contributions! > > Cheers, > Ralf > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev From charlesr.harris at gmail.com Tue Mar 17 13:32:35 2020 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 17 Mar 2020 11:32:35 -0600 Subject: [SciPy-Dev] NumPy 1.18.2 released. Message-ID: Hi All, On behalf of the NumPy team I am pleased to announce that NumPy 1.18.2 has been released.This small release contains a fix for a performance regression in numpy/random and several bug/maintenance updates. The Python versions supported in this release are 3.5-3.8. Downstream developers should use Cython >= 0.29.15 for Python 3.8 support and OpenBLAS >= 3.7 to avoid errors on the Skylake architecture. Wheels for this release can be downloaded from PyPI , source archives and release notes are available from Github . *Contributors* A total of 5 people contributed to this release. People with a "+" by their names contributed a patch for the first time. - Charles Harris - Ganesh Kathiresan + - Matti Picus - Sebastian Berg - przemb + *Pull requests merged* A total of 7 pull requests were merged for this release. - https://github.com/numpy/numpy/pull/15675: TST: move _no_tracing to testing._private - https://github.com/numpy/numpy/pull/15676: MAINT: Large overhead in some random functions - https://github.com/numpy/numpy/pull/15677: TST: Do not create gfortran link in azure Mac testing. 
- https://github.com/numpy/numpy/pull/15679: BUG: Added missing error check in ndarray.__contains__ - https://github.com/numpy/numpy/pull/15722: MAINT: use list-based APIs to call subprocesses - https://github.com/numpy/numpy/pull/15729: REL: Prepare for 1.18.2 release. - https://github.com/numpy/numpy/pull/15734: BUG: fix logic error when nm fails on 32-bit Cheers, Charles Harris -------------- next part -------------- An HTML attachment was scrubbed... URL: From andyfaff at gmail.com Tue Mar 17 21:21:17 2020 From: andyfaff at gmail.com (Andrew Nelson) Date: Wed, 18 Mar 2020 12:21:17 +1100 Subject: [SciPy-Dev] Shim for RandomState/Generator Message-ID: Hi all, default random number generation is heading towards use of np.random.Generator, with RandomState being preserved for legacy usage. It would be nice to start using Generators, but in order to do that in scipy we would need to be able to write code that worked with RandomState or Generator. Unfortunately there are a few methods that have changed their name, `randint` --> `integers` and `rand/random_sample` --> `random` spring to mind. I was wondering if we could add a shim to go in `scipy._lib._util` that would permit code to be written as if it were using the Generator front end (at least most of it), but could be using either RandomState or Generator as a backend. An example for the shim code would be:

```
from scipy._lib._util import check_random_state
from numpy.random import Generator, RandomState


class RG(object):
    """
    Shim object for working across RandomState and Generator.
    RandomState is legacy, Generator is the future.
    """
    def __init__(self, seed):
        rng = check_random_state(seed)
        # methods = [f for f in dir(np.random.default_rng()) if not f.startswith('_')]
        # methods.remove('integers')
        # methods.remove('random')
        methods = ['beta', 'binomial', 'bytes', 'chisquare', 'choice',
                   'dirichlet', 'exponential', 'f', 'gamma', 'geometric',
                   'gumbel', 'hypergeometric', 'laplace', 'logistic',
                   'lognormal', 'logseries', 'multinomial',
                   'multivariate_normal', 'negative_binomial',
                   'noncentral_chisquare', 'noncentral_f', 'normal',
                   'pareto', 'permutation', 'poisson', 'power', 'rayleigh',
                   'shuffle', 'standard_cauchy', 'standard_exponential',
                   'standard_gamma', 'standard_normal', 'standard_t',
                   'triangular', 'uniform', 'vonmises', 'wald', 'weibull',
                   'zipf']
        for method in methods:
            setattr(self, method, getattr(rng, method))

        if isinstance(rng, RandomState):
            setattr(self, 'integers', rng.randint)
            setattr(self, 'random', rng.random_sample)
        elif isinstance(rng, Generator):
            setattr(self, 'integers', rng.integers)
            setattr(self, 'random', rng.random)
            setattr(self, 'multivariate_hypergeometric',
                    rng.multivariate_hypergeometric)

        self.rng = rng

    @property
    def state(self):
        if isinstance(self.rng, RandomState):
            return self.rng.get_state()
        elif isinstance(self.rng, Generator):
            return self.rng.bit_generator.state

    @state.setter
    def state(self, state):
        if isinstance(self.rng, RandomState):
            self.rng.set_state(state)
        elif isinstance(self.rng, Generator):
            self.rng.bit_generator.state = state
```

-- _____________________________________ Dr. Andrew Nelson _____________________________________ -------------- next part -------------- An HTML attachment was scrubbed...
URL: From robert.kern at gmail.com Tue Mar 17 21:48:13 2020 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 17 Mar 2020 21:48:13 -0400 Subject: [SciPy-Dev] Shim for RandomState/Generator In-Reply-To: References: Message-ID: On Tue, Mar 17, 2020 at 9:22 PM Andrew Nelson wrote: > Hi all, > default random number generation is heading towards use of > np.random.Generator, with RandomState being preserved for legacy usage. It > would be nice to start using Generator's, but in order to do that in scipy > we would need to be able to write code that worked with RandomState or > Generator. > Unfortunately there are a few methods that have changed their name, > `randint `--> `integers` and `rand/random_sample` --> `random` spring to > mind > FWIW, `random` exists in RandomState, too, so it's just a matter of using the existing common name for that one. `randint` --> `integers` is the only sticky one that I can recall. > I was wondering if we could add a shim to go in `scipy._lib._util` that > would permit code to be written as if it were using the Generator front end > (at least most of it), but could be using either RandomState or Generator > as a backend. > Instead of a wrapper object, which is inevitably going to be passed around and thus introducing a third API for code to be aware of, I would recommend having a set of functions for the few methods that have changed names. I.e.

def rng_integers(gen, low, high=None, ...):
    if isinstance(gen, Generator):
        return gen.integers(low, high=high, ...)
    else:
        return gen.randint(low, high=high, ...)

Yes, this also constitutes a third API, but it's one that can't "escape". It only affects functions that call these functions because it doesn't introduce a new object with its own lifetime to consider. It's also limited to a couple of functions. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed...
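For what it's worth, the dispatch Robert describes can be filled in and exercised directly. A minimal sketch; the `size` parameter and the demo seeds are illustrative, and this is not a proposed scipy API:

```python
import numpy as np
from numpy.random import Generator, RandomState


def rng_integers(gen, low, high=None, size=None):
    # Generator spells this method `integers`, RandomState spells it
    # `randint`; both draw from the half-open interval [low, high)
    # (or [0, low) when high is None).
    if isinstance(gen, Generator):
        return gen.integers(low, high=high, size=size)
    return gen.randint(low, high=high, size=size)


# Calling code no longer needs to know which back end it was handed:
a = rng_integers(RandomState(12345), 10, size=5)
b = rng_integers(np.random.default_rng(12345), 10, size=5)
print(a, b)
```

The two back ends use different bit streams, so `a` and `b` will generally differ even with the same seed; only the interface is unified.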
URL: From clemens.brunner at uni-graz.at Wed Mar 18 03:58:03 2020 From: clemens.brunner at uni-graz.at (Brunner, Clemens (clemens.brunner@uni-graz.at)) Date: Wed, 18 Mar 2020 07:58:03 +0000 Subject: [SciPy-Dev] Alternative return value for io.loadmat Message-ID: <0C48819D-3FDA-48E0-B53E-F560DCBD8F4F@uni-graz.at> Hello! Loading .mat files with scipy.io.loadmat returns a rather complex dict object with nested struct arrays. If the .mat file contains cell arrays, the resulting structure is particularly hard to parse for humans (see e.g. https://github.com/scipy/scipy/issues/7895). For this reason, people have come up with conversion routines that simplify the result of io.loadmat (e.g. http://blog.nephics.com/2019/08/28/better-loadmat-for-scipy). Simple example using the following MATLAB struct (download the .mat file here: https://github.com/scipy/scipy/files/1314287/matlab.zip):

>> s = struct('mycell', {{'a', 'b', 'c'}});

Accessing the value 'a' with scipy.io.loadmat:

>>> mat["s"]["mycell"][0, 0][0, 0][0]
'a'

The solution from the blog post provides easier access to the same value because the nested struct arrays get flattened:

>>> mat1["s"]["mycell"][0]
'a'

In https://github.com/scipy/scipy/issues/7895 we propose to add such an alternative return value structure to scipy.io.loadmat. Do you think this is a useful addition that should be included? If so, what would be the best way to integrate it? Should scipy.io.loadmat get a new keyword argument that influences what is returned (i.e. the default behavior should return the current value, whereas setting the kwarg to a specific value would return the alternative value)? Or should there be a new function that gets the current value and converts it to the alternative structure? Thanks!
Clemens From ralf.gommers at gmail.com Sat Mar 21 18:20:20 2020 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 21 Mar 2020 23:20:20 +0100 Subject: [SciPy-Dev] PR #11612 In-Reply-To: References: Message-ID: On Sat, Mar 14, 2020 at 4:31 PM Will Tirone wrote: > Hi everyone, > > First time contributor to Scipy here. I'm trying to fix a function, > although for some reason it's failing the scipy.spatial distance testing ( > https://github.com/scipy/scipy/pull/11612). I've read all the docs I can > find and looked through the testing functions and I cannot seem to figure > it out. I just followed an example of a different test for a similar > function, although mine's still failing. I hate to keep asking my reviewer > really basic questions, but is anyone available to take a look and see if > 1) my fix isn't working or 2) I'm testing it incorrectly? > > And let me know if this isn't the usual format for sending stuff to the > mailing list, first time doing this as well. > Hi Will, welcome and thanks for sticking with it. This is a very valid question, you're doing everything right here. Cheers, Ralf > Thanks in advance, really appreciate the community support! > -Will > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mojca.miklavec.lists at gmail.com Wed Mar 25 10:20:21 2020 From: mojca.miklavec.lists at gmail.com (Mojca Miklavec) Date: Wed, 25 Mar 2020 15:20:21 +0100 Subject: [SciPy-Dev] Improving scaling factors in IIR SOS sections in scipy.signal Message-ID: Dear signal processing gurus, I'm using python for playing around, testing and quickly prototyping IIR filters for a piece of code which is otherwise written in C[++] & FPGA, but I miss one thing in zpk2sos, namely the complete scaling factor ends up in the first sos section, rather than being "evenly distributed" across all the sections. For floating point operations this doesn't make much of a difference, but not applying the scaling inside a sos section with integer arithmetics is potentially more problematic to work with (and I would like to have support for this to be able to compare calculations & results). Example:

>>> signal.butter(8, 3E-4, output='sos')
array([[ 2.42594124e-27,  4.85188247e-27,  2.42594124e-27,  1.00000000e+00, -1.99815208e+00,  9.98152971e-01],
       [ 1.00000000e+00,  2.00000000e+00,  1.00000000e+00,  1.00000000e+00, -1.99843306e+00,  9.98433944e-01],
       [ 1.00000000e+00,  2.00000000e+00,  1.00000000e+00,  1.00000000e+00, -1.99895244e+00,  9.98953323e-01],
       [ 1.00000000e+00,  2.00000000e+00,  1.00000000e+00,  1.00000000e+00, -1.99963144e+00,  9.99632331e-01]])

I would like to see the 2.4259e-27 part to be distributed among the four sos sections (rather than ending up with [1 2 1] everywhere else), which is also what Matlab returns. Since I heavily prefer using Python compared to Matlab (I don't even have the licence myself), I wanted to ask for your opinion before I start writing the code. Would such a change be acceptable upstream? And what would be the best way to handle "backward compatibility"? (I've never contributed to Python before, but I assume it's doable :) A bunch of unrelated questions: (1) I was playing with signal.sosfiltfilt(...) to apply an IIR filter to the input signal.
I didn't check the source code, but the results look as if the filter was working partially "backwards". That is: it seems to be using some points from the "future" to determine the result, similar to how a FIR filter would take some points from the past + some points from the future to calculate the current point. I can send some screenshot, but I'm obviously missing something important here. Is there another approach to exactly simulate out[i] = b[0] * in[i] + b[1] * in[i-1] + b[2] * in[i-2] - a[1] * out[i-1] - a[2] * out[i-2] for the output? (2) I would like to simulate the filter with integer (fixed point) arithmetics. For a low-pass filter with either very low or very high cut-off frequency it's way too easy to either get into integer overflows, or be forced into cutting a lot of significant bits off. What's the least painful way to perform multiplications with a slightly higher bit count? Sadly something like dtype=np.int128 doesn't seem to exist :( I'm currently having some troubles with LP filters with cut-off frequency very close to the Nyquist frequency, and I'm thinking of perhaps improving the signal package first before investigating the problem any further, as that would help me simulate and compare where things go wrong. (3) Matlab seems to have some support for converting coefficients from sos sections into integers, and also tells you how many bits are needed to perform fixed-point arithmetics. It offers one specific way of rounding the coefficients to keep the calculations stable (I'm sorry, I don't remember what exactly it says, a friend showed it to me a while ago, but we no longer have access to that computer with Matlab installed). Does anyone know anything about this? I suspect it tries to round complex numbers close to the unit circle in a way that ensures the points remain inside the circle, but that's pure speculation. (It would be nice to be able to offer something similar on the python side.) 
(4) I have an ad-hoc simulation for filters with integer arithmetics. But it might be nice to add this support to scipy.signal as well, as this functionality might be generally useful. I can work on this, but I wouldn't mind a little bit of guidance on the subject if I start working in that direction. Anyway, I would start by fixing the coefficients first, and leave this one for the end. Thank you very much, Mojca PS: I'm relatively new to the subject of filtering digital signals, so please bear with me :) From 3ukip0s02 at sneakemail.com Wed Mar 25 13:06:13 2020 From: 3ukip0s02 at sneakemail.com (3ukip0s02 at sneakemail.com) Date: Wed, 25 Mar 2020 17:06:13 +0000 Subject: [SciPy-Dev] Improving scaling factors in IIR SOS sections in scipy.signal Message-ID: <5880-1585155973-556647@sneakemail.com> There was some discussion about this at the time, but I don't think there has been any work on implementation. See https://github.com/scipy/scipy/pull/3717#issuecomment-45398832 and replies. (I don't think closed issues come up in Google searches as easily) From warren.weckesser at gmail.com Sun Mar 29 02:03:39 2020 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Sun, 29 Mar 2020 02:03:39 -0400 Subject: [SciPy-Dev] Azure CI failures Message-ID: Some Azure tests for a couple recent PRs are failing with the message ##[error]TF24668: The following team project collection is stopped: scipy-org. Start the collection and then try again. Administrator Reason: abuse The message says "Administrator Reason: abuse". Does anyone know what this is about? 
Warren From ralf.gommers at gmail.com Sun Mar 29 07:01:00 2020 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 29 Mar 2020 13:01:00 +0200 Subject: [SciPy-Dev] Azure CI failures In-Reply-To: References: Message-ID: On Sun, Mar 29, 2020 at 8:03 AM Warren Weckesser wrote: > Some Azure tests for a couple recent PRs are failing with the message > > ##[error]TF24668: The following team project collection is > stopped: scipy-org. Start the collection and then try again. > Administrator Reason: abuse > > The message says "Administrator Reason: abuse". Does anyone know what > this is about? > Very odd, never seen this before and searching for that string only gives me a single result [1] which is unrelated (it's about setting an env var we don't use). I logged into Azure Devops, there's nothing there that tells us more. Right now it looks like PRs are erroring, while builds from master triggered by merge commits are succeeding. Looks like an Azure bug - let's give it a day or two. Cheers, Ralf [1] https://developercommunity.visualstudio.com/content/problem/892178/tf24668-the-following-team-project-collection-is-s-2.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sun Mar 29 12:40:32 2020 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 29 Mar 2020 18:40:32 +0200 Subject: [SciPy-Dev] Alternative return value for io.loadmat In-Reply-To: <0C48819D-3FDA-48E0-B53E-F560DCBD8F4F@uni-graz.at> References: <0C48819D-3FDA-48E0-B53E-F560DCBD8F4F@uni-graz.at> Message-ID: On Wed, Mar 18, 2020 at 8:59 AM Brunner, Clemens ( clemens.brunner at uni-graz.at) wrote: > Hello! > > Loading .mat files with scipy.io.loadmat returns a rather complex dict > object with nested struct arrays. If the .mat file contains cell arrays, > the resulting structure is particularly hard to parse for humans (see e.g. > https://github.com/scipy/scipy/issues/7895). 
For this reason, people have > come up with conversion routines that simplify the result of io.loadmat > (e.g. http://blog.nephics.com/2019/08/28/better-loadmat-for-scipy). > > Simple example using the following MATLAB struct (download the .mat file > here: https://github.com/scipy/scipy/files/1314287/matlab.zip): > > >> s = struct('mycell', {{'a', 'b', 'c'}}); > > Accessing the value 'a' with scipy.io.loadmat: > > >>> mat["s"]["mycell"][0, 0][0, 0][0] > 'a' > > The solution from the blog post provides easier access to the same value > because the nested struct arrays get flattened: > > >>> mat1["s"]["mycell"][0] > 'a' > > In https://github.com/scipy/scipy/issues/7895 we propose to add such an > alternative return value structure to scipy.io.loadmat. > > Do you think this is a useful addition that should be included? There's enough feedback from users and people writing workarounds that I think it's clear that there's value in supporting this within SciPy. If so, what would be the best way to integrate it? Should scipy.io.loadmat > get a new keyword argument that influences what is returned (i.e. the > default behavior should return the current value, whereas setting the kwarg > to a specific value would return the alternative value)? Or should there be > a new function that gets the current value and converts it to the > alternative structure? > I answered this on the issue. Thanks for working on this Clemens! Cheers, Ralf > Thanks! > Clemens > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... 
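The indexing pain discussed in this thread comes from loadmat wrapping each MATLAB cell in a 1x1 object array. A toy version of the flattening the blog post performs can be sketched without a real .mat file; the helper name and the hand-built arrays below are purely illustrative, not the scipy API:

```python
import numpy as np


def simplify_cells(obj):
    # Recursively unwrap nested object arrays: 1x1 wrappers disappear,
    # larger cell arrays become plain lists. Sketch of the idea only.
    if isinstance(obj, np.ndarray) and obj.dtype == object:
        flat = [simplify_cells(el) for el in obj.ravel()]
        return flat[0] if len(flat) == 1 else flat
    return obj


# Hand-built stand-in for what loadmat returns for the cell {{'a','b','c'}}:
inner = np.empty((1, 3), dtype=object)
inner[0, :] = ['a', 'b', 'c']
cell = np.empty((1, 1), dtype=object)
cell[0, 0] = inner

print(cell[0, 0][0, 0])      # the awkward nested indexing: 'a'
print(simplify_cells(cell))  # ['a', 'b', 'c']
```

A real implementation would also need to handle struct (record) arrays and mat_struct objects, which is where most of the design questions in the issue live.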
URL: From ralf.gommers at gmail.com Mon Mar 30 06:40:34 2020 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 30 Mar 2020 12:40:34 +0200 Subject: [SciPy-Dev] Azure CI failures In-Reply-To: References: Message-ID: On Sun, Mar 29, 2020 at 1:01 PM Ralf Gommers wrote: > > > On Sun, Mar 29, 2020 at 8:03 AM Warren Weckesser < > warren.weckesser at gmail.com> wrote: > >> Some Azure tests for a couple recent PRs are failing with the message >> >> ##[error]TF24668: The following team project collection is >> stopped: scipy-org. Start the collection and then try again. >> Administrator Reason: abuse >> >> The message says "Administrator Reason: abuse". Does anyone know what >> this is about? >> > > Very odd, never seen this before and searching for that string only gives > me a single result [1] which is unrelated (it's about setting an env var we > don't use). > > I logged into Azure Devops, there's nothing there that tells us more. > > Right now it looks like PRs are erroring, while builds from master > triggered by merge commits are succeeding. Looks like an Azure bug - let's > give it a day or two. > Tyler pinged the right people in https://github.com/scipy/scipy/pull/11748#issuecomment-605725911 (thanks!). Ralf > > [1] > https://developercommunity.visualstudio.com/content/problem/892178/tf24668-the-following-team-project-collection-is-s-2.html > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tyler.je.reddy at gmail.com Mon Mar 30 12:05:22 2020 From: tyler.je.reddy at gmail.com (Tyler Reddy) Date: Mon, 30 Mar 2020 10:05:22 -0600 Subject: [SciPy-Dev] Azure CI failures In-Reply-To: References: Message-ID: Our host should be unblocked now. 
On Mon, 30 Mar 2020 at 04:41, Ralf Gommers wrote: > > > On Sun, Mar 29, 2020 at 1:01 PM Ralf Gommers > wrote: > >> >> >> On Sun, Mar 29, 2020 at 8:03 AM Warren Weckesser < >> warren.weckesser at gmail.com> wrote: >> >>> Some Azure tests for a couple recent PRs are failing with the message >>> >>> ##[error]TF24668: The following team project collection is >>> stopped: scipy-org. Start the collection and then try again. >>> Administrator Reason: abuse >>> >>> The message says "Administrator Reason: abuse". Does anyone know what >>> this is about? >>> >> >> Very odd, never seen this before and searching for that string only gives >> me a single result [1] which is unrelated (it's about setting an env var we >> don't use). >> >> I logged into Azure Devops, there's nothing there that tells us more. >> >> Right now it looks like PRs are erroring, while builds from master >> triggered by merge commits are succeeding. Looks like an Azure bug - let's >> give it a day or two. >> > > Tyler pinged the right people in > https://github.com/scipy/scipy/pull/11748#issuecomment-605725911 > (thanks!). > > Ralf > > >> >> [1] >> https://developercommunity.visualstudio.com/content/problem/892178/tf24668-the-following-team-project-collection-is-s-2.html >> >> _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From larson.eric.d at gmail.com Tue Mar 31 10:31:54 2020 From: larson.eric.d at gmail.com (Eric Larson) Date: Tue, 31 Mar 2020 10:31:54 -0400 Subject: [SciPy-Dev] Improving scaling factors in IIR SOS sections in scipy.signal In-Reply-To: References: Message-ID: > > Would such a change be acceptable upstream? And what would be the best way > to handle "backward compatibility"? > I was originally thinking that a new kwarg for the SOS-generation functions would be okay. 
But this would involve modifying potentially many functions, and at the same time seems like a separable step. In other words, once you get the `sos` array from the generation function (or even construct one yourself), you can directly figure out the gain factors of the sections (whatever they happen to be), compute the total gain, and then redistribute it among the sections however you want. Maybe a new `sos_even = distribute_gain(sos)` would be good. Then if there are other gain-distribution methods that are relevant eventually we could add a `method='even' | 'first'` kwarg. Perhaps we'd even want this `method` kwarg from the start so you could test that round-trip calls with `'even'` and then `'first'` give back the original `sos` you got from the construction function. I expect that this would only be a few lines (loop plus NumPy arithmetic), but it sounds generally useful enough to include, especially if MATLAB (and Octave) do this redistribution by default. (1) I was playing with signal.sosfiltfilt(...) to apply an IIR filter > to the input signal. I didn't check the source code, but the results > look as if the filter was working partially "backwards". > An IIR filter can use the same parts that a FIR filter can (i.e., as many values of the input as it wants) plus its own previous output values. If you want the IIR filter to do only one or the other, you restrict the numerator or denominator of the transfer function. Since you say you're a newcomer to digital filtering, I'd recommend reading up a bit on z-transform transfer function basics to get a handle on this if you're interested. (2) I would like to simulate the filter with integer (fixed point) > arithmetics. > ... (4) I have an ad-hoc simulation for filters with integer arithmetics. > But it might be nice to add this support to scipy.signal as well, as > this functionality might be generally useful. 
I can work on this, but > I wouldn't mind a little bit of guidance on the subject if I start > working in that direction. Anyway, I would start by fixing the > coefficients first, and leave this one for the end. After a little bit of searching the most relevant thing I could find is "demodel" for modeling fixed-point in Python, but I'm not sure how up to date it is. This sort of functionality in general seems like it would be useful, but perhaps better suited to its own package, at least to start, as I suspect it will require a lot of (fixed-point) domain-specific knowledge and testing to get right. For it to be included in SciPy we'd want to have someone with this domain expertise (or willingness to acquire enough of it) to commit to helping maintain the code -- I don't think any current maintainers have it (please correct me if I'm wrong!), and we don't want to be stuck in a position of maintaining code we don't understand :) (3) Matlab seems to have some support for converting coefficients from > sos sections into integers, and also tells you how many bits are > needed to perform fixed-point arithmetics. It offers one specific way > of rounding the coefficients to keep the calculations stable > I don't know anything about this, but in general a workable path has been: 1. Look at textbooks, MATLAB, or Octave for references (cannot look at their code due to license restrictions) for a given algorithm 2. Look for a BSD-compatible implementation that exists in Python (maybe not polished but could be adapted to SciPy) 3. If none exists, implement the algorithm in Python/SciPy from scratch 4. Test against values produced in MATLAB/Octave. Eric -------------- next part -------------- An HTML attachment was scrubbed... 
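The `distribute_gain` idea proposed above really is only a few lines. The sketch below assumes every section's leading numerator coefficient b0 is positive (true for the Butterworth examples in this thread); the function name and the "even" behaviour follow the suggestion and are not an existing scipy function:

```python
import numpy as np


def distribute_gain(sos):
    # Rescale the numerator of each second-order section so that every
    # section carries an equal share of the cascade's overall gain; the
    # product of the sections (the transfer function) is unchanged.
    # Sketch only: assumes all b0 coefficients are positive.
    sos = np.array(sos, dtype=float)
    b0 = sos[:, 0]
    share = np.prod(b0) ** (1.0 / len(sos))
    sos[:, :3] *= share / b0[:, None]
    return sos


# Two toy sections with all of the gain lumped into the first one:
sos = np.array([[2.0e-06, 4.0e-06, 2.0e-06, 1.0, -1.90, 0.910],
                [1.0,     2.0,     1.0,     1.0, -1.99, 0.992]])
even = distribute_gain(sos)
print(even[:, 0])  # both sections now carry the same b0
```

A round-trip check (redistribute, then lump the gain back into the first section) would make a natural unit test for the `method='even' | 'first'` idea.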
URL: From mojca.miklavec.lists at gmail.com Tue Mar 31 17:27:58 2020 From: mojca.miklavec.lists at gmail.com (Mojca Miklavec) Date: Tue, 31 Mar 2020 23:27:58 +0200 Subject: [SciPy-Dev] Improving scaling factors in IIR SOS sections in scipy.signal In-Reply-To: References: Message-ID: Hi, On Tue, 31 Mar 2020 at 16:32, Eric Larson wrote: >> >> Would such a change be acceptable upstream? And what would be the best way to handle "backward compatibility"? > > I was originally thinking that a new kwarg for the SOS-generation functions would be okay. But this would involve modifying potentially many functions, and at the same time seems like a separable step. In other words, once you get the `sos` array from the generation function (or even construct one yourself), you can directly figure out the gain factors of the sections (whatever they happen to be), compute the total gain, and then redistribute it among the sections however you want. > > Maybe a new `sos_even = distribute_gain(sos)` would be good. Then if there are other gain-distribution methods that are relevant eventually we could add a `method='even' | 'first'` kwarg. Perhaps we'd even want this `method` kwarg from the start so you could test that round-trip calls with `'even'` and then `'first'` give back the original `sos` you got from the construction function. > > I expect that this would only be a few lines (loop plus NumPy arithmetic), but it sounds generally useful enough to include, especially if MATLAB (and Octave) do this redistribution by default. In the meantime I managed to get access to a machine with Matlab installed and also tested what Octave does. 
I get the "correct" / full coefficients with fdatool (a graphical tool; see below for results), while

> pkg load signal
> [z,p,k] = butter(4, 3e-4);
> sos = zp2sos(z,p,k);
> sos
sos =

   4.9253e-14   9.8505e-14   4.9253e-14   1.0000e+00  -1.9983e+00   9.9826e-01
   1.0000e+00   2.0000e+00   1.0000e+00   1.0000e+00  -1.9993e+00   9.9928e-01

sadly gives me the results equivalent to python :( With fdatool I can pick between "double precision", "single precision" and "fixed point" (with some additional settings), but let's say that ending up with double-precision is good enough for now. This is output from fdatool after some very basic point-and-click-ing:

% Discrete-Time IIR Filter (real)
% -------------------------------
% Filter Structure   : Direct-Form II, Second-Order Sections
% Number of Sections : 2
% Stable             : Yes
% Linear Phase       : No

SOS Matrix:
1  2  1  1  -1.9927241098599966   0.99281261642961327
1  2  1  1  -1.9826478027009247   0.98273586173273308

Scale Values:
0.000022126642404234759
0.000022014757952111739

You can see that 4.9e-14 is split into two sections (2.2e-7), with each section serving as its own standalone sos section with correct gain.

>> (1) I was playing with signal.sosfiltfilt(...) to apply an IIR filter >> to the input signal. I didn't check the source code, but the results >> look as if the filter was working partially "backwards". > > An IIR filter can use the same parts that a FIR filter can (i.e., as many values of the input as it wants) plus its own previous output values.

Sure, but generally FIR filters are (very) long and the result would be something like

y[t] = a[-n] * x[t - n] + a[-n + 1] * x[t - n + 1] + ... + a[n] * x[t + n]

with y[t] depending on values from the future. While one can have an IIR filter of even the first or the second order with

y[t] = b[0] * x[t] + b[1] * x[t - 1] + b[2] * x[t - 2] - a[1] * y[t - 1] - a[2] * y[t - 2]

which should depend exclusively on the values from the past.
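As a side note on (1): `sosfiltfilt` is the zero-phase variant, which runs the filter forward and then backward over the signal, so its output genuinely depends on "future" samples; the one-pass causal filter is `sosfilt`. The causal second-order recurrence written out above can be simulated directly (plain-Python sketch, direct form I, zero initial conditions):

```python
def biquad(b, a, x):
    # y[t] = b[0]*x[t] + b[1]*x[t-1] + b[2]*x[t-2]
    #                  - a[1]*y[t-1] - a[2]*y[t-2]
    # Only past inputs and outputs are used (a[0] assumed to be 1).
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xt in x:
        yt = b[0]*xt + b[1]*x1 + b[2]*x2 - a[1]*y1 - a[2]*y2
        x2, x1 = x1, xt
        y2, y1 = y1, yt
        y.append(yt)
    return y


# An identity section passes the signal through unchanged:
print(biquad([1, 0, 0], [1, 0, 0], [1.0, 2.0, 3.0]))  # [1.0, 2.0, 3.0]
```

Matching this against scipy means comparing with `sosfilt` (optionally seeded with `sosfilt_zi` initial conditions), not with `sosfiltfilt`.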
What I didn't particularly like was the following behaviour:

sos = signal.butter(4, 3e-4, output='sos')
plt.plot(signal.sosfiltfilt(sos, np.concatenate([np.zeros(2000), np.ones(100000)])))
plt.plot(signal.sosfiltfilt(sos, np.concatenate([np.zeros(10000), np.ones(100000)])))
plt.show()

I wanted both to produce the same shape. But I believe I now found the solution at https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.sosfiltfilt.html with

zi = x[:4].mean() * sosfilt_zi(sos)
y2, zo = sosfilt(sos, x, zi=zi)

When I compared the results with my own filter implementation and they came out wildly different, it was because I had forgotten to look up the proper way to correctly initialize the filter. (The result without initialisation looked more like a FIR filter.)

>> (2) I would like to simulate the filter with integer (fixed point) >> arithmetics. >> >> ... >> >> (4) I have an ad-hoc simulation for filters with integer arithmetics. >> But it might be nice to add this support to scipy.signal as well, as >> this functionality might be generally useful. I can work on this, but >> I wouldn't mind a little bit of guidance on the subject if I start >> working in that direction. Anyway, I would start by fixing the >> coefficients first, and leave this one for the end. > After a little bit of searching the most relevant thing I could find is "demodel" for modeling fixed-point in Python, but I'm not sure how up to date it is. This sort of functionality in general seems like it would be useful, but perhaps better suited to its own package, at least to start, as I suspect it will require a lot of (fixed-point) domain-specific knowledge and testing to get right.
For it to be included in SciPy we'd want to have someone with this domain expertise (or willingness to acquire enough of it) to commit to helping maintain the code -- I don't think any current maintainers have it (please correct me if I'm wrong!), and we don't want to be stuck in a position of maintaining code we don't understand :) After staring at the Matlab interface for a while (today I had the first chance to access a machine) and seeing the gazillion of different options I tend to agree with this. There's probably a very large gap between my own ad-hoc implementation to get the job done (which is already working using just a few lines of code), and the feature-complete variant. >> (3) Matlab seems to have some support for converting coefficients from >> sos sections into integers, and also tells you how many bits are >> needed to perform fixed-point arithmetics. It offers one specific way >> of rounding the coefficients to keep the calculations stable > I don't know anything about this, but in general a workable path has been: > > 1. Look at textbooks, MATLAB, or Octave for references (cannot look at their code due to license restrictions) for a given algorithm > 2. Look for a BSD-compatible implementation that exists in Python (maybe not polished but could be adapted to SciPy) > 3. If none exists, implement the algorithm in Python/SciPy from scratch > 4. Test against values produced in MATLAB/Octave. I now found for example Chapter 7 (Quantized Filter Analysis) in B. A. Shenoi: Introduction to Digital Signal Processing and Filter Design, and I guess there are a lot more books on this topic. I'll look through it, but it's certainly tons more work compared to just "fixing" the gain factors, so maybe too much for now. But just for completeness, and to answer my own question, here are the rounding options offered by Matlab:

- Ceiling
- Nearest
- Nearest (convergent)
- Round
- Zero
- Floor

(there's a gazillion of other options to fine-tune the results).
The above book says: "The convergent operation is the same as rounding except that in the case when the number is exactly halfway, it is rounded down if the penultimate bit is zero and rounded up if it is one." I'll come up with something for gain correction first and ask for feedback once I have some code to show. Mojca
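For reference, the "nearest (convergent)" rule quoted from Shenoi is what is more commonly called round-half-to-even, which happens to be the tie-breaking rule of Python 3's built-in round(). A minimal fixed-point quantizer using it (illustrative only, not a proposed API):

```python
def quantize_convergent(coeffs, frac_bits):
    # Quantize coefficients to integers in a fixed-point format with
    # `frac_bits` fractional bits, using convergent rounding.
    # Python 3's round() implements exactly this half-to-even rule.
    scale = 1 << frac_bits
    return [round(c * scale) for c in coeffs]


# Halfway values round toward the even integer:
print([round(v) for v in (0.5, 1.5, 2.5, 3.5)])  # [0, 2, 2, 4]
# e.g. a pair of denominator coefficients at 15 fractional bits:
print(quantize_convergent([-1.9983, 0.99826], 15))
```

Note that ties are only exact for dyadic values; for arbitrary decimal coefficients the usual float-representation caveats apply.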