From reddykaushik18 at gmail.com Tue Jan 1 11:01:16 2019
From: reddykaushik18 at gmail.com (K. Kaushik Reddy)
Date: Tue, 1 Jan 2019 21:31:16 +0530
Subject: [SciPy-Dev] Greetings,
Message-ID:

Hey there,

I am a second-year undergraduate CS student at Amrita School of Engineering, Bengaluru, India. I would like to take up the project idea suggested in your wiki, namely "Enhance the Randomized Numerical Linear Algebra functionality". I am a data science enthusiast and a Python developer, and I am familiar with NumPy and Pandas as well. I also have experience working with the Linux kernel. Linear Algebra was one of my courses during my first year, so I'm good with the basics.

I think the most natural way to get started would be making some small yet related contributions to the project. Do you have such small fixes that could be assigned to me? Besides that, I would really appreciate any suggestions or information about how the project could be completed. I would really love to work on SciPy for GSoC 2019 and am passionate about contributing to the SciPy community. I look forward to your reply.

Regards,
K. Kaushik Reddy.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ralf.gommers at gmail.com Tue Jan 1 17:48:57 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Tue, 1 Jan 2019 14:48:57 -0800
Subject: [SciPy-Dev] Greetings,
In-Reply-To:
References:
Message-ID:

On Tue, Jan 1, 2019 at 8:01 AM K. Kaushik Reddy wrote:

> Hey there,
>
> I am a second-year undergraduate CS student at Amrita School of
> Engineering, Bengaluru, India. I would like to take up the project idea
> suggested in your wiki, namely "Enhance the Randomized Numerical Linear
> Algebra functionality". I am a data science enthusiast and a Python
> developer, and I am familiar with NumPy and Pandas as well. I also have
> experience working with the Linux kernel. Linear Algebra was one of my
> courses during my first year, so I'm good with the basics.
>
> I think the most natural way to get started would be making some small
> yet related contributions to the project. Do you have such small fixes
> that could be assigned to me?
> Besides that, I would really appreciate any suggestions or information
> about how the project could be completed. I would really love to work on
> SciPy for GSoC 2019 and am passionate about contributing to the SciPy
> community. I look forward to your reply.
>

Hi Kaushik, thanks for your interest. Note that you're looking at last year's ideas list; we don't have one for this year and haven't yet discussed whether we'll participate in GSoC (we'll likely do so though).

For things to get started with, you can have a look at the issues labeled "good first issue". You can also have a look at the roadmap sections of http://scipy.github.io/devdocs/ (note, a "detailed roadmap" section will appear there in an hour or so) for ideas that match your interest and expertise.

Cheers,
Ralf

> Regards,
> K. Kaushik Reddy.
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at python.org
> https://mail.python.org/mailman/listinfo/scipy-dev
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From christoph.baumgarten at gmail.com Wed Jan 2 04:34:37 2019
From: christoph.baumgarten at gmail.com (Christoph Baumgarten)
Date: Wed, 2 Jan 2019 10:34:37 +0100
Subject: [SciPy-Dev] Deprecate planck distribution?
Message-ID:

Hi all,

happy new year! I noted that the Planck distribution is a geometric distribution with a different parametrization, see Issue #9359:

import numpy as np
from scipy.stats import planck, geom

a = 0.5
k = np.arange(20)
sum(abs(geom.pmf(k, 1-np.exp(-a), loc=-1) - planck.pmf(k, a)))  # 1.30e-18

I don't know if there is a specific reason to have the Planck distribution in addition to the geometric. If not, I would propose to deprecate it.

Any views? Thanks

Christoph

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
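The correspondence Christoph demonstrates is the parameter mapping p = 1 - exp(-a) between the two families; a slightly expanded version of his check, a minimal sketch against the same scipy.stats API with nothing assumed beyond it:

    import numpy as np
    from scipy.stats import planck, geom

    a = 0.5              # planck's shape parameter
    p = -np.expm1(-a)    # equivalent geometric success probability, 1 - exp(-a)
    k = np.arange(20)

    # pmf and cdf agree once geom is shifted to start at k = 0 (loc=-1)
    print(np.max(np.abs(planck.pmf(k, a) - geom.pmf(k, p, loc=-1))))  # ~0 up to rounding
    print(np.max(np.abs(planck.cdf(k, a) - geom.cdf(k, p, loc=-1))))  # ~0 up to rounding

    # the mapping also inverts cleanly: a = -log(1 - p)
    print(-np.log1p(-p))  # recovers 0.5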
From rajiv.vaidyanathan4 at gmail.com Wed Jan 2 08:08:18 2019
From: rajiv.vaidyanathan4 at gmail.com (Rajiv Vaidyanathan)
Date: Wed, 2 Jan 2019 18:38:18 +0530
Subject: [SciPy-Dev] Contributing to scipy
Message-ID:

Hi developers,

I am Rajiv from India and I'm new to the scipy community. I want to contribute to the codebase and was looking for newcomer issues. But all the new user issues seem to be taken. Is there any issue which I can work on so that I can get adapted to the codebase?

Thanking you.

Regards,
Rajiv

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From robert.kern at gmail.com Wed Jan 2 15:07:54 2019
From: robert.kern at gmail.com (Robert Kern)
Date: Wed, 2 Jan 2019 12:07:54 -0800
Subject: [SciPy-Dev] Deprecate planck distribution?
In-Reply-To:
References:
Message-ID:

On Wed, Jan 2, 2019 at 1:36 AM Christoph Baumgarten < christoph.baumgarten at gmail.com> wrote:
>
> Hi all,
>
> happy new year!
>
> I noted that the Planck distribution is a geometric distribution with a different parametrization, see Issue #9359:
>
> import numpy as np
> from scipy.stats import planck, geom
>
> a = 0.5
> k = np.arange(20)
> sum(abs(geom.pmf(k, 1-np.exp(-a), loc=-1) - planck.pmf(k, a))) # 1.30e-18
>
> I don't know if there is a specific reason to have the Planck distribution in addition to the geometric. If not, I would propose to deprecate it.
>
> Any views? Thanks

If we were to turn back time, and the question was whether to *add* the Planck distribution given that we had the geometric distribution, I would probably be convinced by this. However, given that the Planck distribution has already been added, I don't think that it's worth removing it. The marginal cost of having this alternate parameterization is likely less than the cost of anyone changing their code.

The collection of probability distributions is also a place where some nontrivial duplication actually has some positive value. People typically come to `scipy.stats` with a distribution (with a name and specific parameterization conventions) already in mind. Having more than one parameterization available helps people recognize the distribution that they want; having an alternate present doesn't impair the search task, while not having the one they are looking for (or burying it in the Notes of the docstring of the canonical version) can make the search task much harder. It's a common complaint that `scipy.stats` doesn't expose certain common parameterizations of distributions, so we should probably be working to expand the collection of parameterizations rather than collapsing them.

--
Robert Kern

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ralf.gommers at gmail.com Wed Jan 2 22:54:21 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Wed, 2 Jan 2019 19:54:21 -0800
Subject: [SciPy-Dev] Contributing to scipy
In-Reply-To:
References:
Message-ID:

On Wed, Jan 2, 2019 at 5:08 AM Rajiv Vaidyanathan < rajiv.vaidyanathan4 at gmail.com> wrote:

> Hi developers,
>
> I am Rajiv from India and I'm new to the scipy community. I want to
> contribute to the codebase and was looking for newcomer issues. But all
> the new user issues seem to be taken. Is there any issue which I can work
> on so that I can get adapted to the codebase?
>

Hi Rajiv, welcome. We can point you in the right direction if you tell us what parts/submodules of scipy you're interested in. For context on getting started this may be helpful: https://github.com/scipy/scipy/issues/9030.

Two PRs that are stalled but easy to complete are:
https://github.com/scipy/scipy/pull/8359
https://github.com/scipy/scipy/pull/8744

If for one or both you think you can address the remaining issues, you could take over the branch from which the PR was sent, rebase on master, add your own new commit(s) to fix things up, and create a new PR.

Cheers,
Ralf

> Thanking you.
>
> Regards,
> Rajiv
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at python.org
> https://mail.python.org/mailman/listinfo/scipy-dev
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ralf.gommers at gmail.com Thu Jan 3 01:58:16 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Wed, 2 Jan 2019 22:58:16 -0800
Subject: [SciPy-Dev] SciPy GSoC'15 participation?
In-Reply-To: <25b2b04a-7fb9-2d82-54a4-26f2cc88fa39@toybox.ca>
References: <25b2b04a-7fb9-2d82-54a4-26f2cc88fa39@toybox.ca>
Message-ID:

Hi all,

It's the time of the year when GSoC kicks off again. The email below from Terri, the lead PSF organizer, explains some of the deadlines and things that are different from last year.

So: do we want to participate again this year? We already have one serious proposal from a student in preparation, about scipy.fftpack (see earlier thread on this list).

If we do want to participate, do we have volunteers for mentors and a sub-org admin? I've done the latter for the last few years, but this year I really do not have the time (if you're interested: it doesn't take much time compared to mentoring).
Cheers, Ralf ---------- Forwarded message --------- From: Terri Oda Date: Tue, Jan 1, 2019 at 11:02 PM Subject: [GSoC-mentors] Python in GSoC 2019! To: Happy new year, everyone! As you may have seen, we're starting to prepare Python's application for GSoC 2019, and we need your ideas. We've got a few eager students already asking what you'd like them to work on. The website has been updated for 2019: http://python-gsoc.org New and notable: - GSoC org applications open early this year, starting January 15th! Google moves the dates around every year so students in different countries with different schedules get opportunities to participate more easily when the times line up with their scholastic year, and this is one of the early years. - We're asking for sub-orgs to get as many ideas ready as they can by Feb 4th so that we have lots of ideas ready for when Google judges our umbrella org ideas page. We need well-formed ideas if we want to get accepted, and now's a great time to catch the eye of the most eager students! Once you've got some ideas ready, you can make a pull request to get yourself added to the page here: https://github.com/python-gsoc/python-gsoc.github.io - John has set up a Slack channel for Python GSoC. It's bridged in to link to the IRC channel, but may be a more familiar interface/mobile app for people who aren't regular IRC users. I know, it's not open source, but we didn't have much luck with Zulip and while Matrix has been good it's not quite solving the usability problem we have with the students, so we're trying out the more popular Slack. We'll see how it works this year and if it's worth keeping, so if this is a thing you want us to support, please use it! (And if you see problems, please report them to gsoc-admins at python.org so we can get them fixed.) You can snag a Slack invite here: https://join.slack.com/t/python-gsoc/shared_invite/enQtNDg0NDgwNjMyNTE2LTNlOGM1MWY2MzRlMjNhMGM2OWJjYzE3ODRmMmM0MjFjNGJmNGRiYzI4ZDc1ODgxOTYzMDQyNzBiNGFlYWVjZTY - We've got space for some new sub-orgs this year! If you know of any projects that might want to try out GSoC with us this year, let us know, or tell them to email gsoc-admins at python.org (or join irc/matrix/slack, or whatever) to chat with us! Terri _______________________________________________ GSoC-mentors mailing list GSoC-mentors at python.org https://mail.python.org/mailman/listinfo/gsoc-mentors -------------- next part -------------- An HTML attachment was scrubbed... URL: From ali.cetin at outlook.com Thu Jan 3 09:22:14 2019 From: ali.cetin at outlook.com (Ali Cetin) Date: Thu, 3 Jan 2019 14:22:14 +0000 Subject: [SciPy-Dev] Deprecate planck distribution? In-Reply-To: References: , Message-ID: ________________________________ From: SciPy-Dev on behalf of Robert Kern Sent: Wednesday, January 2, 2019 21:07 To: SciPy Developers List Subject: Re: [SciPy-Dev] Deprecate planck distribution? On Wed, Jan 2, 2019 at 1:36 AM Christoph Baumgarten > wrote: > > Hi all, > > happy new year! > > I noted that the Planck distribution is a geometric distribution with a different parametrization, see Issue #9359: > > import numpy as np > from scipy.stats import planck, geom > > a = 0.5 > k = np.arange(20) > sum(abs(geom.pmf(k, 1-np.exp(-a), loc=-1) - planck.pmf(k, a))) # 1.30e-18 > > I don't know if there is a specific reason to have the Planck distribution in addition to the geometric. If not, I would propose to deprecate it. > > Any views? 
> Thanks

If we were to turn back time, and the question was whether to *add* the Planck distribution given that we had the geometric distribution, I would probably be convinced by this. However, given that the Planck distribution has already been added, I don't think that it's worth removing it. The marginal cost of having this alternate parameterization is likely less than the cost of anyone changing their code.

The collection of probability distributions is also a place where some nontrivial duplication actually has some positive value. People typically come to `scipy.stats` with a distribution (with a name and specific parameterization conventions) already in mind. Having more than one parameterization available helps people recognize the distribution that they want; having an alternate present doesn't impair the search task, while not having the one they are looking for (or burying it in the Notes of the docstring of the canonical version) can make the search task much harder. It's a common complaint that `scipy.stats` doesn't expose certain common parameterizations of distributions, so we should probably be working to expand the collection of parameterizations rather than collapsing them.

Robert Kern

I agree with Robert on this one. If you want to go down that rat hole, you will quickly find that most distribution functions are mere special cases and/or alternative parameterizations of a few general classes of distributions. If the concern is code management, then it could be argued that an effort should be made on abstracting distribution functions from these more general classes. However, personally, I prefer transparency and consistency with established literature when it comes to parametrization.

That's my two cents on the issue.

Cheers,
Ali Cetin

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mike.watson at sheffield.ac.uk Thu Jan 3 13:08:30 2019
From: mike.watson at sheffield.ac.uk (Michael Watson)
Date: Thu, 3 Jan 2019 18:08:30 +0000
Subject: [SciPy-Dev] add johnson SL distribution
Message-ID:

Hi all, happy new year,

We have the SB and SU Johnson distributions implemented but not the SL distribution; it doesn't look like much work to add it in if it's appropriate. I'm doing some work with these distributions and ultimately would like to implement functions to fit by moments and by quantiles too. There are existing implementations that are distributed under the BSD licence here:

https://uk.mathworks.com/matlabcentral/fileexchange/46123-johnson-curve-toolbox

so it doesn't seem like a big job from my point of view, and I'll be doing it anyway.

It would also be my first contribution, so if it would be better to start with another issue (I saw a list and 2 stalled PRs in another email) rather than trying to add functionality, just say so and I can look at contributing in other ways first.

Mike

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From haberland at ucla.edu Thu Jan 3 15:30:33 2019
From: haberland at ucla.edu (Matt Haberland)
Date: Thu, 3 Jan 2019 12:30:33 -0800
Subject: [SciPy-Dev] add johnson SL distribution
In-Reply-To:
References:
Message-ID:

I am not personally familiar with the Johnson family of distributions, but the SL does seem to complete the set.

The license for the Matlab implementation does seem to be BSD 3-clause and thus compatible with SciPy.

Seems like a reasonable first issue, but certainly finishing stalled PRs would be helpful, too!
Matt Haberland

On Thu, Jan 3, 2019 at 10:09 AM Michael Watson wrote:

> Hi all, happy new year,
> We have the SB and SU Johnson distributions implemented but not the SL
> distribution, it doesn't look like much work to add it in if it's
> appropriate, I'm doing some work with these distributions and ultimately
> would like to implement functions to fit by moments and by quantiles too.
> there are existing implementations that are distributed under the BSD
> licence here:
>
> https://uk.mathworks.com/matlabcentral/fileexchange/46123-johnson-curve-toolbox
>
> so it doesn't seem like a big job from my point of view and I'll be doing
> it anyway.
>
> it would also be my first contribution so if it would be better to start
> with another issue (I saw a list and 2 stalled PRs in another email) then
> try to add functionality just say and I can look at contributing other ways
> first.
> Mike
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at python.org
> https://mail.python.org/mailman/listinfo/scipy-dev
>

--
Matt Haberland
Assistant Adjunct Professor in the Program in Computing
Department of Mathematics
6617A Math Sciences Building, UCLA

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From josef.pktd at gmail.com Thu Jan 3 15:54:41 2019
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Thu, 3 Jan 2019 15:54:41 -0500
Subject: Re: [SciPy-Dev] add johnson SL distribution
In-Reply-To:
References:
Message-ID:

On Thu, Jan 3, 2019 at 3:31 PM Matt Haberland wrote:

> I am not personally familiar with the Johnson family of distributions,
> but the SL does seem to complete the set.
>
> The license for the Matlab implementation does seem to be BSD 3-clause
> and thus compatible with SciPy.
>
> Seems like a reasonable first issue, but certainly finishing stalled PRs
> would be helpful, too!
>
> Matt Haberland
>
> On Thu, Jan 3, 2019 at 10:09 AM Michael Watson <
> mike.watson at sheffield.ac.uk> wrote:
>
>> Hi all, happy new year,
>> We have the SB and SU Johnson distributions implemented but not the SL
>> distribution, it doesn't look like much work to add it in if it's
>> appropriate, I'm doing some work with these distributions and ultimately
>> would like to implement functions to fit by moments and by quantiles too.
>> there are existing implementations that are distributed under the BSD
>> licence here:
>>
>> https://uk.mathworks.com/matlabcentral/fileexchange/46123-johnson-curve-toolbox
>>
>> so it doesn't seem like a big job from my point of view and I'll be doing
>> it anyway.
>>
>> it would also be my first contribution so if it would be better to start
>> with another issue (I saw a list and 2 stalled PRs in another email) then
>> try to add functionality just say and I can look at contributing other ways
>> first.
>>

In general to adding new distributions

The speed of getting a new distribution in depends a lot on how well it fits into the general distribution pattern and whether all core methods are available as closed form expression or by using scipy.special functions. If that is the case, then adding a new distribution is easy. If that is not the case, then it can be difficult to get a good version merged. One difficult case is if the pdf is only available as computationally expensive numerical approximation.

The distributions have in general only the fit method using maximum likelihood estimation of parameters (which might reduce to method of moments in special cases).
Based on a quick search it looks like JohnsonSL is just the log-normal distribution (as loc-scale family which is available in scipy) Josef > Mike >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at python.org >> https://mail.python.org/mailman/listinfo/scipy-dev >> > > > -- > Matt Haberland > Assistant Adjunct Professor in the Program in Computing > Department of Mathematics > 6617A Math Sciences Building, UCLA > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Thu Jan 3 15:57:26 2019 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 3 Jan 2019 15:57:26 -0500 Subject: [SciPy-Dev] add johnson SL distribution In-Reply-To: References: Message-ID: On Thu, Jan 3, 2019 at 3:54 PM wrote: > > > On Thu, Jan 3, 2019 at 3:31 PM Matt Haberland wrote: > >> I am not personally familiar with the Johnson family of distributions >> , >> but the SL does seem to complete the set. >> >> The license for the Matlab implementation does seem to be BSD 3-clause >> and thus >> compatible with SciPy. >> >> Seems like a reasonable first issue, but certainly finishing stalled PRs >> would be helpful, too! >> >> Matt Haberland >> >> On Thu, Jan 3, 2019 at 10:09 AM Michael Watson < >> mike.watson at sheffield.ac.uk> wrote: >> >>> Hi all, happy new year, >>> We have the SB and SU Johnson distributions implemented but not the SL >>> distribution, it doesn't look like much work to add it in if it's >>> appropriate, I'm doing some work with these distributions and ultimately >>> would like to implement functions to fit by moments and by quantiles too. >>> there are existing implementations that are distributed under the BSD >>> licence here: >>> >>> >>> https://uk.mathworks.com/matlabcentral/fileexchange/46123-johnson-curve-toolbox >>> >>> so it doesn't seem like a big job from my point of view and I'll be >>> doing it anyway. >>> >>> it would also be my first contribution so if it would be better to start >>> with another issue (I saw a list and 2 stalled PRs in another email) then >>> try to add functionality just say and I can look at contributing other ways >>> first. >>> >> > In general to adding new distributions > > The speed of getting a new distribution in depends a lot on how well it > fits into the general distribution pattern and whether all core methods are > available as closed form expression or by using scipy.special functions. > If that is the case, then adding a new distribution is easy. > If that is not the case, then it can be difficult to get a good version > merged. One difficult case is if the pdf is only available as > computationally expensive numerical approximation. > > The distributions have in general only the fit method using maximum > likelihood estimation of parameters (which might reduce to method of > moments in special cases). 
> Based on a quick search it looks like JohnsonSL is just the log-normal
> distribution (as loc-scale family which is available in scipy)
>

scipy lognorm is a 3 parameter family, maybe there should also be a 4 parameter family

> Josef
>
>> Mike
>>> _______________________________________________
>>> SciPy-Dev mailing list
>>> SciPy-Dev at python.org
>>> https://mail.python.org/mailman/listinfo/scipy-dev
>>>
>>
>> --
>> Matt Haberland
>> Assistant Adjunct Professor in the Program in Computing
>> Department of Mathematics
>> 6617A Math Sciences Building, UCLA
>> _______________________________________________
>> SciPy-Dev mailing list
>> SciPy-Dev at python.org
>> https://mail.python.org/mailman/listinfo/scipy-dev
>>

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
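Josef's identification can be checked numerically. Below is a minimal sketch, assuming the usual SL definition that z = gamma + delta*log((x - xi)/lam) is standard normal; the helper function here is illustrative, not existing scipy API:

    import numpy as np
    from scipy.stats import lognorm, norm

    def johnson_sl_pdf(x, gamma, delta, xi, lam):
        # Johnson SL density, assuming z = gamma + delta*log((x - xi)/lam) ~ N(0, 1)
        y = (x - xi) / lam
        return delta / (lam * y) * norm.pdf(gamma + delta * np.log(y))

    gamma, delta, xi, lam = 0.3, 1.7, 1.0, 2.0
    x = np.linspace(1.1, 10, 50)

    # the same curve as scipy's lognorm with s = 1/delta, loc = xi,
    # scale = lam * exp(-gamma/delta), i.e. a shifted log-normal
    ln = lognorm(s=1/delta, loc=xi, scale=lam * np.exp(-gamma / delta))
    print(np.max(np.abs(johnson_sl_pdf(x, gamma, delta, xi, lam) - ln.pdf(x))))  # ~0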
From josef.pktd at gmail.com Thu Jan 3 16:29:22 2019
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Thu, 3 Jan 2019 16:29:22 -0500
Subject: Re: [SciPy-Dev] Deprecate planck distribution?
In-Reply-To:
References:
Message-ID:

On Thu, Jan 3, 2019 at 9:22 AM Ali Cetin wrote:

> ------------------------------
> *From:* SciPy-Dev on behalf of Robert Kern
> *Sent:* Wednesday, January 2, 2019 21:07
> *To:* SciPy Developers List
> *Subject:* Re: [SciPy-Dev] Deprecate planck distribution?
>
> On Wed, Jan 2, 2019 at 1:36 AM Christoph Baumgarten <
> christoph.baumgarten at gmail.com> wrote:
> >
> > Hi all,
> >
> > happy new year!
> >
> > I noted that the Planck distribution is a geometric distribution with a
> > different parametrization, see Issue #9359:
> >
> > import numpy as np
> > from scipy.stats import planck, geom
> >
> > a = 0.5
> > k = np.arange(20)
> > sum(abs(geom.pmf(k, 1-np.exp(-a), loc=-1) - planck.pmf(k, a))) # 1.30e-18
> >
> > I don't know if there is a specific reason to have the Planck
> > distribution in addition to the geometric. If not, I would propose to
> > deprecate it.
> >
> > Any views? Thanks
>
> I agree with Robert on this one. If you want to go down that rat hole, you
> will quickly find that most distribution functions are mere special cases
> and/or alternative parameterizations of a few general classes of
> distributions. If the concern is code management, then it could be argued
> that an effort should be made on abstracting distribution functions from
> these more general classes. However, personally, I prefer transparency and
> consistency with established literature when it comes to parametrization.
>

I think there is a good reason for implementing special cases instead of only general cases, because then computational simplifications can be used; e.g. using only a general distribution with several extra parameters is cumbersome and requires a lot more work for the user, e.g. in setting all the extra parameters to their special-case values. This is not the case for pure reparameterizations, which still have the same number of parameters.

The main straitjacket in the scipy.stats distribution case in terms of parameterization is that all continuous distributions use the loc-scale (plus possibly shape) parameterization. I think there are enough maintainers now (where I don't count myself) that it would be feasible to add other distribution classes that don't have to follow the loc-scale parameterization, or that could be intermediate classes for groups of similar distributions.

For example, I think something similar to the frozen distribution class could be added that is just a Reparameterization class, i.e. it internally delegates to a standard scipy distribution, but uses a parameterization and parameter transformation that is more common and more familiar to users. Another advantage of reparameterization classes would be that estimation is often easier or more interpretable in a different parameterization. E.g. statsmodels uses the negative binomial in the mean-dispersion parameterization instead of the common negbin parameterization. Another advantage of that is that the Hessian and the covariance of the parameter estimates often have a nicer shape in a different parameterization.

An example of an intermediate class would be common support for distributions that are created by a transformation of another, mainly the normal, distribution. This includes the Johnson system of distributions in the other open thread on the list.

(Just some thoughts, I'm currently not in this neighborhood of stats.)

Josef

> That's my two cents on the issue.
>
> Cheers,
> Ali Cetin
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at python.org
> https://mail.python.org/mailman/listinfo/scipy-dev
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
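A minimal sketch of the kind of Reparameterization wrapper Josef describes, for the statsmodels-style mean-dispersion negative binomial; the class name and the (mu, alpha) convention here are illustrative, not existing scipy or statsmodels API:

    import numpy as np
    from scipy import stats

    class NegBinMeanDispersion:
        """Negative binomial with mean mu and dispersion alpha
        (var = mu + alpha*mu**2), delegating to scipy.stats.nbinom."""

        def __init__(self, mu, alpha):
            n = 1.0 / alpha       # scipy's "number of successes" parameter
            p = n / (n + mu)      # scipy's success probability
            self._frozen = stats.nbinom(n, p)

        def __getattr__(self, name):
            # pmf, cdf, rvs, mean, var, ... all pass through to the frozen dist
            return getattr(self._frozen, name)

    d = NegBinMeanDispersion(mu=5.0, alpha=0.5)
    print(d.mean(), d.var())  # 5.0 and 5 + 0.5*5**2 = 17.5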
From reddykaushik18 at gmail.com Fri Jan 4 04:32:55 2019
From: reddykaushik18 at gmail.com (K. Kaushik Reddy)
Date: Fri, 4 Jan 2019 15:02:55 +0530
Subject: [SciPy-Dev] Thanks Ralph. Reference books needed for scipy projects.
Message-ID:

Hi everyone,

For a couple of days I have been working out a few project samples which could be tried out for SciPy with broader criteria. It would be helpful to me (in my work) if you can suggest any books or resources which could be used as a reference while trying out / going for any SciPy project.

Also, thank you Ralf for the reply mail. I thought the project "Enhance the Randomized Numerical Linear Algebra functionality" was not tried in 2018, so I thought of working on it this year. Anyway, I will come up with some cool projects for the SciPy community very soon.

Best,
K. Kaushik Reddy.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From charlesr.harris at gmail.com Fri Jan 4 21:35:55 2019
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 4 Jan 2019 19:35:55 -0700
Subject: [SciPy-Dev] NumPy 1.16.0rc2 released
Message-ID:

Hi All,

On behalf of the NumPy team I'm pleased to announce the release of NumPy 1.16.0rc2. This is the last NumPy release to support Python 2.7 and will be maintained as a long term release with bug fixes until 2020. This release has seen a lot of refactoring and features many bug fixes, improved code organization, and better cross platform compatibility. Not all of these improvements will be visible to users, but they should help make maintenance easier going forward. Highlights are

- Experimental support for overriding numpy functions in downstream projects.
- The matmul function is now a ufunc and can be overridden using __array_ufunc__.
- Improved support for the ARM and POWER architectures.
- Improved support for AIX and PyPy.
- Improved interoperation with ctypes.
- Improved support for PEP 3118.

The supported Python versions are 2.7 and 3.5-3.7, support for 3.4 has been dropped. The wheels on PyPI are linked with OpenBLAS v0.3.4+, which should fix the known threading issues found in previous OpenBLAS versions. Downstream developers building this release should use Cython >= 0.29.2 and, if linking OpenBLAS, OpenBLAS > v0.3.4. Wheels for this release can be downloaded from PyPI, source archives are available from Github.

*Contributors*

A total of 112 people contributed to this release. People with a "+" by their names contributed a patch for the first time. * Alan Fontenot + * Allan Haldane * Alon Hershenhorn + * Alyssa Quek + * Andreas Nussbaumer + * Anner + * Anthony Sottile + * Antony Lee * Ayappan P + * Bas van Schaik + * C.A.M. Gerlach + * Charles Harris * Chris Billington * Christian Clauss * Christoph Gohlke * Christopher Pezley + * Daniel B Allan + * Daniel Smith * Dawid Zych + * Derek Kim + * Dima Pasechnik + * Edgar Giovanni Lepe + * Elena Mokeeva + * Elliott Sales de Andrade + * Emil Hessman + * Eric Schles + * Eric Wieser * Giulio Benetti + * Guillaume Gautier + * Guo Ci * Heath Henley + * Isuru Fernando + * J. Lewis Muir + * Jack Vreeken + * Jaime Fernandez * James Bourbeau * Jeff VanOss * Jeffrey Yancey + * Jeremy Chen + * Jeremy Manning + * Jeroen Demeyer * John Darbyshire + * John Kirkham * John Zwinck * Jonas Jensen + * Joscha Reimer + * Juan Azcarreta + * Julian Taylor * Kevin Sheppard * Krzysztof Chomski + * Kyle Sunden * Lars Grüter * Lilian Besson + * MSeifert04 * Mark Harfouche * Marten van Kerkwijk * Martin Thoma * Matt Harrigan + * Matthew Bowden + * Matthew Brett * Matthias Bussonnier * Matti Picus * Max Aifer + * Michael Hirsch, Ph.D + * Michael James Jamie Schnaitter + * MichaelSaah + * Mike Toews * Minkyu Lee + * Mircea Akos Bruma + * Mircea-Akos Brumă + * Moshe Looks + * Muhammad Kasim + * Nathaniel J. Smith * Nikita Titov + * Paul Müller + * Paul van Mulbregt * Pauli Virtanen * Pierre Glaser + * Pim de Haan * Ralf Gommers * Robert Kern * Robin Aggleton + * Rohit Pandey + * Roman Yurchak + * Ryan Soklaski * Sebastian Berg * Sho Nakamura + * Simon Gibbons * Stan Seibert + * Stefan Otte * Stefan van der Walt * Stephan Hoyer * Stuart Archibald * Taylor Smith + * Tim Felgentreff + * Tim Swast + * Tim Teichmann + * Toshiki Kataoka * Travis Oliphant * Tyler Reddy * Uddeshya Singh + * Warren Weckesser * Weitang Li + * Wenjamin Petrenko + * William D. 
Irons * Yannick Jadoul + * Yaroslav Halchenko * Yug Khanna + * Yuji Kanagawa + * Yukun Guo + * ankokumoyashi + * lerbuke + Cheers, Charles Harris -------------- next part -------------- An HTML attachment was scrubbed... URL: From christoph.baumgarten at gmail.com Sat Jan 5 02:08:58 2019 From: christoph.baumgarten at gmail.com (Christoph Baumgarten) Date: Sat, 5 Jan 2019 08:08:58 +0100 Subject: [SciPy-Dev] Deprecate planck distribution? In-Reply-To: References: Message-ID: My main concern about planck is that I am not aware that this is a known distribution name. I found Planck's law ( https://en.wikipedia.org/wiki/Planck%27s_law) but I don't recognize the distribution implemented in SciPy. Does anyone know the distribution under that name? It is also called discrete exponential in scipy: normally, the geometric distribution is called the discrete analogue of the exponential (no memory property), so this could be confusing for users. The implementation of geom in SciPy is based on geometric in NumPy, my guess is that it has a better sampling method than the one of planck based on the ppf. We can also leave the different parametrization in stats and explain it in the docstring. Christoph On Thu, Jan 3, 2019 at 10:30 PM wrote: > Send SciPy-Dev mailing list submissions to > scipy-dev at python.org > > To subscribe or unsubscribe via the World Wide Web, visit > https://mail.python.org/mailman/listinfo/scipy-dev > or, via email, send a message with subject or body 'help' to > scipy-dev-request at python.org > > You can reach the person managing the list at > scipy-dev-owner at python.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of SciPy-Dev digest..." > > > Today's Topics: > > 1. Re: add johnson SL distribution (josef.pktd at gmail.com) > 2. Re: Deprecate planck distribution? (josef.pktd at gmail.com) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Thu, 3 Jan 2019 15:57:26 -0500 > From: josef.pktd at gmail.com > To: SciPy Developers List > Subject: Re: [SciPy-Dev] add johnson SL distribution > Message-ID: > xNSuRn4b6okWQ at mail.gmail.com> > Content-Type: text/plain; charset="utf-8" > > On Thu, Jan 3, 2019 at 3:54 PM wrote: > > > > > > > On Thu, Jan 3, 2019 at 3:31 PM Matt Haberland > wrote: > > > >> I am not personally familiar with the Johnson family of distributions > >> < > https://books.google.com/books?id=_LvgBwAAQBAJ&pg=PA197&lpg=PA197&dq=johns+su+sb+sl+distributions&source=bl&ots=LBowBmYTse&sig=9KPViyvSlLAFp9EYqi-ejTYgQ30&hl=en&sa=X&ved=2ahUKEwjE6cnvt9LfAhWG458KHdrQAmkQ6AEwDXoECAIQAQ#v=onepage&q=johns%20su%20sb%20sl%20distributions&f=false > >, > >> but the SL does seem to complete the set. > >> > >> The license for the Matlab implementation does seem to be BSD 3-clause > >> and thus > >> compatible with SciPy. > >> > >> Seems like a reasonable first issue, but certainly finishing stalled PRs > >> would be helpful, too! > >> > >> Matt Haberland > >> > >> On Thu, Jan 3, 2019 at 10:09 AM Michael Watson < > >> mike.watson at sheffield.ac.uk> wrote: > >> > >>> Hi all, happy new year, > >>> We have the SB and SU Johnson distributions implemented but not the SL > >>> distribution, it doesn't look like much work to add it in if it's > >>> appropriate, I'm doing some work with these distributions and > ultimately > >>> would like to implement functions to fit by moments and by quantiles > too. 
> >>> there are existing implementations that are distributed under the BSD > >>> licence here: > >>> > >>> > >>> > https://uk.mathworks.com/matlabcentral/fileexchange/46123-johnson-curve-toolbox > >>> > >>> so it doesn't seem like a big job from my point of view and I'll be > >>> doing it anyway. > >>> > >>> it would also be my first contribution so if it would be better to > start > >>> with another issue (I saw a list and 2 stalled PRs in another email) > then > >>> try to add functionality just say and I can look at contributing other > ways > >>> first. > >>> > >> > > In general to adding new distributions > > > > The speed of getting a new distribution in depends a lot on how well it > > fits into the general distribution pattern and whether all core methods > are > > available as closed form expression or by using scipy.special functions. > > If that is the case, then adding a new distribution is easy. > > If that is not the case, then it can be difficult to get a good version > > merged. One difficult case is if the pdf is only available as > > computationally expensive numerical approximation. > > > > The distributions have in general only the fit method using maximum > > likelihood estimation of parameters (which might reduce to method of > > moments in special cases). > > > > Based on a quick search it looks like JohnsonSL is just the log-normal > > distribution (as loc-scale family which is available in scipy) > > > > scipy lognorm is a 3 parameter family, maybe there should also be a 4 > parameter family > > > > > > Josef > > > > > >> Mike > >>> _______________________________________________ > >>> SciPy-Dev mailing list > >>> SciPy-Dev at python.org > >>> https://mail.python.org/mailman/listinfo/scipy-dev > >>> > >> > >> > >> -- > >> Matt Haberland > >> Assistant Adjunct Professor in the Program in Computing > >> Department of Mathematics > >> 6617A Math Sciences Building, UCLA > >> _______________________________________________ > >> SciPy-Dev mailing list > >> SciPy-Dev at python.org > >> https://mail.python.org/mailman/listinfo/scipy-dev > >> > > > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: < > http://mail.python.org/pipermail/scipy-dev/attachments/20190103/5b18e5d7/attachment-0001.html > > > > ------------------------------ > > Message: 2 > Date: Thu, 3 Jan 2019 16:29:22 -0500 > From: josef.pktd at gmail.com > To: SciPy Developers List > Subject: Re: [SciPy-Dev] Deprecate planck distribution? > Message-ID: > HNQvX9b572p4SWMF765D6sJYw at mail.gmail.com> > Content-Type: text/plain; charset="utf-8" > > On Thu, Jan 3, 2019 at 9:22 AM Ali Cetin wrote: > > > > > > > ------------------------------ > > *From:* SciPy-Dev > on > > behalf of Robert Kern > > *Sent:* Wednesday, January 2, 2019 21:07 > > *To:* SciPy Developers List > > *Subject:* Re: [SciPy-Dev] Deprecate planck distribution? > > > > On Wed, Jan 2, 2019 at 1:36 AM Christoph Baumgarten < > > christoph.baumgarten at gmail.com> wrote: > > > > > > Hi all, > > > > > > happy new year! > > > > > > I noted that the Planck distribution is a geometric distribution with a > > different parametrization, see Issue #9359: > > > > > > import numpy as np > > > from scipy.stats import planck, geom > > > > > > a = 0.5 > > > k = np.arange(20) > > > sum(abs(geom.pmf(k, 1-np.exp(-a), loc=-1) - planck.pmf(k, a))) # > 1.30e-18 > > > > > > I don't know if there is a specific reason to have the Planck > > distribution in addition to the geometric. If not, I would propose to > > deprecate it. 
> > > > > > Any views? Thanks > > > > If we were to turn back time, and the question was whether to *add* the > > Planck distribution given that we had the geometric distribution, I would > > probably be convinced by this. However, given that the Planck > distribution > > has already been added, I don't think that it's worth removing it. The > > marginal cost to having this alternate parameterization is likely less > than > > the cost of anyone changing their code. > > > > The collection of probability distributions are also a place where some > > nontrivial duplication actually has some positive value. People typically > > come to `scipy.stats` with a distribution (with a name and specific > > parameterization conventions) already in mind. Having more than one > > parameterization available helps people recognize the distribution that > > they want; having an alternate present doesn't impair the search task > while > > not having one they are looking for (or burying it in the Notes of the > > docstring of the canonical version) can make the search task much harder. > > It's a common complaint that `scipy.stats` doesn't expose certain common > > parameterizations of distributions, so we should probably be working to > > expand the collection of parameterizations rather than collapsing them. > > > > > > Robert Kern > > > > I agree with Robert on this one. If you want to go down that rat hole, > you > > will quickly find that most distribution functions are mere special cases > > and/or alternative parameterizations of a few general classes of > > distributions. If the concern is code management, then it could be argued > > that an effort should be made on abstracting distribution functions from > > these more general classes. However, personally, I prefer transparency > and > > consistency with established literature when it comes to parametrization. > > > > I think there is a good reason for implementing special cases instead of > only general cases because then computational simplifications can be used, > e.g. using only general distribution with several extra parameters is > cumbersome and requires a lot more work for the user, e.g. in setting all > the extra parameters to their special case values. > > This is not the case for pure reparameterization that still have the same > number of parameters. > > The main straight jacket in the scipy.stats distribution case in terms of > parameterization is that all continuous distributions use the loc-scale > (plus possibly shape) parameterization. > I think there are enough maintainers now (where I don't count myself), that > it would be feasible to add other distribution classes that don't have to > follow the loc-scale parameterization, or that could be intermediate > classes for groups of similar distributions. > > For example, I think something similar to the frozen distribution class > could be added that is just a Reparameterization class, i.e. internally > delegates to a standard scipy distribution, but uses a parameterization and > parameter transformation that is more common and more familiar to users. > Another advantage of reparameterization classes would be that estimation is > often easier or more interpretable in a different parameterization. E.g. > statsmodels uses negativebinomial in the mean-dispersion parameterization > instead of the common negbin parameterization. > Another advantage of that is that the hessian, covariance of the parameter > estimates has often a nicer shape in different parameterization. 
> > A example for a intermediate class would be common support for distribution
> > that are created by a transformation of another, mainly normal
> > distribution.
> > This includes the Johnson system of distribution in the other open thread
> > on the list.
> >
> > (Just some thoughts, I'm currently not in this neighborhood of stats.)
> >
> > Josef
> >
> > > That's my two cents on the issue.
> > >
> > > Cheers,
> > > Ali Cetin
> > > _______________________________________________
> > > SciPy-Dev mailing list
> > > SciPy-Dev at python.org
> > > https://mail.python.org/mailman/listinfo/scipy-dev
> >
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: < http://mail.python.org/pipermail/scipy-dev/attachments/20190103/f9e0f17f/attachment.html >
>
> ------------------------------
>
> Subject: Digest Footer
>
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at python.org
> https://mail.python.org/mailman/listinfo/scipy-dev
>
> ------------------------------
>
> End of SciPy-Dev Digest, Vol 183, Issue 6
> *****************************************

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From robert.kern at gmail.com Sat Jan 5 02:54:18 2019
From: robert.kern at gmail.com (Robert Kern)
Date: Fri, 4 Jan 2019 23:54:18 -0800
Subject: [SciPy-Dev] Deprecate planck distribution?
In-Reply-To:
References:
Message-ID:

On Fri, Jan 4, 2019 at 11:09 PM Christoph Baumgarten < christoph.baumgarten at gmail.com> wrote:

> My main concern about planck is that I am not aware that this is a known
> distribution name. I found Planck's law (
> https://en.wikipedia.org/wiki/Planck%27s_law) but I don't recognize the
> distribution implemented in SciPy.
>

I believe if you work out the details, you derive the spectral energy density distribution known as Planck's Law from the underlying geometric particle-count distributions of particles at each frequency.

> Does anyone know the distribution under that name?
>

Travis did: https://github.com/scipy/scipy/commit/f1ad8198f2e967a8ca109d4f98f2bfe550b593a4

Here's one recent use (though I strongly expect that they picked up on the name because of scipy rather than independently knowing it under that name) (and on the gripping hand, still indicates a use in the wild that would be real code breakage if we removed it): https://ieeexplore.ieee.org/abstract/document/8052152

> It is also called discrete exponential in scipy: normally, the geometric
> distribution is called the discrete analogue of the exponential (no memory
> property), so this could be confusing for users.
>

That said, the `planck` parameterization is more related to the (canonical parameterization of the) continuous exponential distribution than the `geom` parameterization. It's worth noting the relationship in both of those docstrings, though.

> The implementation of geom in SciPy is based on geometric in NumPy, my
> guess is that it has a better sampling method than the one of planck based
> on the ppf.
>

Indeed, since there is an equivalence, there is the opportunity for a more direct implementation of `_rvs()`.

> We can also leave the different parametrization in stats and explain it in
> the docstring.
>

I would prefer this.

--
Robert Kern

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
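The more direct `_rvs()` Robert mentions could use the same parameter mapping on top of NumPy's geometric sampler; a minimal sketch of the idea (an illustration, not scipy's actual internals):

    import numpy as np
    from scipy.stats import planck

    def planck_rvs(lam, size):
        p = -np.expm1(-lam)  # equivalent geometric success probability, 1 - exp(-lam)
        # numpy's geometric sampler has support {1, 2, ...}; shift it to {0, 1, ...}
        return np.random.geometric(p, size=size) - 1

    lam = 0.5
    sample = planck_rvs(lam, size=100000)
    print(sample.mean(), planck.mean(lam))  # both close to 1/(exp(lam) - 1)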
From ralf.gommers at gmail.com Sat Jan 5 04:19:55 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Sat, 5 Jan 2019 01:19:55 -0800
Subject: [SciPy-Dev] Thanks Ralph. Reference books needed for scipy projects.
In-Reply-To:
References:
Message-ID:

On Fri, Jan 4, 2019 at 1:33 AM K. Kaushik Reddy wrote:

> Hi everyone,
>
> For a couple of days I have been working out a few project samples which
> could be tried out for SciPy with broader criteria. It would be helpful to
> me (in my work) if you can suggest any books or resources which could be
> used as a reference while trying out / going for any SciPy project.
>

That's a bit too general of a question. SciPy covers so many topics, there's no one book that covers it. The SciPy docs (http://scipy.github.io/devdocs/) are your best guide; many functions include references for further reading.

Cheers,
Ralf

> Also, thank you Ralf for the reply mail. I thought the project "Enhance the
> Randomized Numerical Linear Algebra functionality" was not tried in 2018,
> so I thought of working on it this year.
> Anyway, I will come up with some cool projects for the SciPy community
> very soon.
>
> Best,
> K. Kaushik Reddy.
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at python.org
> https://mail.python.org/mailman/listinfo/scipy-dev
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ralf.gommers at gmail.com Sat Jan 5 04:31:03 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Sat, 5 Jan 2019 01:31:03 -0800
Subject: [SciPy-Dev] add johnson SL distribution
In-Reply-To:
References:
Message-ID:

On Thu, Jan 3, 2019 at 10:08 AM Michael Watson wrote:

> Hi all, happy new year,
> We have the SB and SU Johnson distributions implemented but not the SL
> distribution, it doesn't look like much work to add it in if it's
> appropriate, I'm doing some work with these distributions and ultimately
> would like to implement functions to fit by moments and by quantiles too.
> there are existing implementations that are distributed under the BSD
> licence here:
>
> https://uk.mathworks.com/matlabcentral/fileexchange/46123-johnson-curve-toolbox
>
> so it doesn't seem like a big job from my point of view and I'll be doing
> it anyway.
>

+1 would be a nice addition.

Cheers,
Ralf

> it would also be my first contribution so if it would be better to start
> with another issue (I saw a list and 2 stalled PRs in another email) then
> try to add functionality just say and I can look at contributing other ways
> first.
> Mike
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at python.org
> https://mail.python.org/mailman/listinfo/scipy-dev
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From evgeny.burovskiy at gmail.com Sat Jan 5 11:16:35 2019
From: evgeny.burovskiy at gmail.com (Evgeni Burovski)
Date: Sat, 5 Jan 2019 19:16:35 +0300
Subject: [SciPy-Dev] Fwd: [scipy/scipy] Move misc.doccer to _lib.doccer (#9652)
In-Reply-To:
References:
Message-ID:

Hi,

As a part of an ongoing cleanup of the scipy.misc namespace, here's a PR https://github.com/scipy/scipy/pull/9652 to deprecate the scipy.misc.doccer module. With this PR,

- the module itself is moved to scipy._lib.doccer, and all internal uses import from scipy._lib.
- imports from scipy.misc.doccer emit DeprecationWarnings for now, and will stop working in a (yet unspecified) future release.

The recommendation for user code which uses `import scipy.misc.doccer` is to copy-paste the doccer.py file into your project. Thoughts?
Cheers,

Evgeni

---------- Forwarded message ---------
From: Ralf Gommers
Date: Sat, Jan 5, 2019 at 9:06 AM
Subject: Re: [scipy/scipy] Move misc.doccer to _lib.doccer (#9652)
To: scipy/scipy
Cc: Evgeni Burovski , Author < author at noreply.github.com>

+1 for this deprecation. Could you propose it on the mailing list though? I doubt there are many users, but nevertheless we should propose all deprecations there.

--
You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or mute the thread.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ssouravsingh12 at gmail.com Sun Jan 6 10:03:49 2019
From: ssouravsingh12 at gmail.com (Sourav Singh)
Date: Sun, 6 Jan 2019 20:33:49 +0530
Subject: [SciPy-Dev] Discussion on backend system for scipy.fftpack
Message-ID:

Hello,

I am writing this mail to start a discussion on the backend system for scipy.fftpack. I am currently looking into ways of implementing a backend system and have found the Keras library's approach to backends to be elegant. The Keras library maintains a separate module for each backend, each of which has its own classes and functions for operations and session creation. The default backend can be set up through a configuration file or a flag of some kind.

So far we have numpy.fft, scipy.fftpack, pyFFTW and cupy (I am not sure about having this) for FFT ops. I would like to discuss the decisions on the kind of backend system that would be required further, so I can design a document containing class diagrams and such.

Thanks for taking the time to read my email and have a great day!

Regards,
Sourav

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
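For concreteness, here is a minimal sketch of the kind of Keras-style registry Sourav describes; every name in it is hypothetical, not an agreed-upon scipy design. Backends are objects exposing a common FFT interface, with a switchable module-level default:

    import numpy as np
    import scipy.fftpack

    _backends = {"numpy": np.fft, "fftpack": scipy.fftpack}
    _current = "numpy"

    def register_backend(name, module):
        # any object exposing the agreed-upon functions (fft, ifft, ...) qualifies
        _backends[name] = module

    def set_backend(name):
        global _current
        if name not in _backends:
            raise ValueError("unknown backend %r" % (name,))
        _current = name

    def fft(x, n=None):
        # dispatch to whichever backend is currently selected
        return _backends[_current].fft(x, n)

    set_backend("fftpack")
    print(np.allclose(fft(np.arange(8.0)), np.fft.fft(np.arange(8.0))))  # True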
From ralf.gommers at gmail.com Wed Jan 9 01:20:48 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Tue, 8 Jan 2019 22:20:48 -0800
Subject: [SciPy-Dev] Fwd: [scipy/scipy] Move misc.doccer to _lib.doccer (#9652)
In-Reply-To:
References:
Message-ID:

On Sat, Jan 5, 2019 at 8:17 AM Evgeni Burovski wrote:

> Hi,
>
> As a part of an ongoing cleanup of the scipy.misc namespace, here's a PR
> https://github.com/scipy/scipy/pull/9652 to deprecate the scipy.misc.doccer
> module. With this PR,
> - the module itself is moved to scipy._lib.doccer, and all internal uses
> import from scipy._lib.
> - imports from scipy.misc.doccer emit DeprecationWarnings for now, and
> will stop working in a (yet unspecified) future release.
>
> The recommendation for user code which uses `import scipy.misc.doccer`
> is to copy-paste the doccer.py file into your project. Thoughts?
>

That recommendation seems reasonable. It's not really SciPy's job to provide docstring formatting utilities.

Cheers,
Ralf

> Cheers,
>
> Evgeni
>
> ---------- Forwarded message ---------
> From: Ralf Gommers
> Date: Sat, Jan 5, 2019 at 9:06 AM
> Subject: Re: [scipy/scipy] Move misc.doccer to _lib.doccer (#9652)
> To: scipy/scipy
> Cc: Evgeni Burovski , Author < author at noreply.github.com>
>
> +1 for this deprecation. Could you propose it on the mailing list though?
> I doubt there are many users, but nevertheless we should propose all
> deprecations there.
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at python.org
> https://mail.python.org/mailman/listinfo/scipy-dev
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From larson.eric.d at gmail.com Wed Jan 9 10:45:29 2019
From: larson.eric.d at gmail.com (Eric Larson)
Date: Wed, 9 Jan 2019 10:45:29 -0500
Subject: Re: [SciPy-Dev] SciPy GSoC'15 participation?
In-Reply-To:
References: <25b2b04a-7fb9-2d82-54a4-26f2cc88fa39@toybox.ca>
Message-ID:

I say let's go for it. We can always apply to be a sub-org, and if no project/student combination seems suitable given the mentor volunteer(s), I don't think there is any problem with not accepting any. The fftpack project sounds reasonable. I can at least be a secondary member, possibly a primary depending on my time this spring and summer.

I can also be the SciPy sub-org admin, since I'm usually one for another sub-org anyway.

Eric

On Thu, Jan 3, 2019 at 1:58 AM Ralf Gommers wrote:

> Hi all,
>
> It's the time of the year when GSoC kicks off again. The email below from
> Terri, the lead PSF organizer, explains some of the deadlines and things
> that are different from last year.
>
> So: do we want to participate again this year? We already have one serious
> proposal from a student in preparation, about scipy.fftpack (see earlier
> thread on this list).
>
> If we do want to participate, do we have volunteers for mentors and a
> sub-org admin? I've done the latter for the last few years, but this year I
> really do not have the time (if you're interested: it doesn't take much
> time compared to mentoring).
>
> Cheers,
> Ralf
>
> ---------- Forwarded message ---------
> From: Terri Oda
> Date: Tue, Jan 1, 2019 at 11:02 PM
> Subject: [GSoC-mentors] Python in GSoC 2019!
> To:
>
> Happy new year, everyone!
>
> As you may have seen, we're starting to prepare Python's application for
> GSoC 2019, and we need your ideas. We've got a few eager students
> already asking what you'd like them to work on.
>
> The website has been updated for 2019: http://python-gsoc.org
>
> New and notable:
>
> - GSoC org applications open early this year, starting January 15th!
> Google moves the dates around every year so students in different
> countries with different schedules get opportunities to participate more
> easily when the times line up with their scholastic year, and this is
> one of the early years.
>
> - We're asking for sub-orgs to get as many ideas ready as they can by
> Feb 4th so that we have lots of ideas ready for when Google judges our
> umbrella org ideas page. We need well-formed ideas if we want to get
> accepted, and now's a great time to catch the eye of the most eager
> students! Once you've got some ideas ready, you can make a pull request
> to get yourself added to the page here:
> https://github.com/python-gsoc/python-gsoc.github.io
>
> - John has set up a Slack channel for Python GSoC. It's bridged in to
> link to the IRC channel, but may be a more familiar interface/mobile app
> for people who aren't regular IRC users. I know, it's not open source,
> but we didn't have much luck with Zulip and while Matrix has been good
> it's not quite solving the usability problem we have with the students,
> so we're trying out the more popular Slack. We'll see how it works this
> year and if it's worth keeping, so if this is a thing you want us to
> support, please use it! (And if you see problems, please report them to
> gsoc-admins at python.org so we can get them fixed.)
>
> You can snag a Slack invite here:
> https://join.slack.com/t/python-gsoc/shared_invite/enQtNDg0NDgwNjMyNTE2LTNlOGM1MWY2MzRlMjNhMGM2OWJjYzE3ODRmMmM0MjFjNGJmNGRiYzI4ZDc1ODgxOTYzMDQyNzBiNGFlYWVjZTY
>
> - We've got space for some new sub-orgs this year! If you know of any
> projects that might want to try out GSoC with us this year, let us know,
> or tell them to email gsoc-admins at python.org (or join irc/matrix/slack,
> or whatever) to chat with us!
>
> Terri
>
> _______________________________________________
> GSoC-mentors mailing list
> GSoC-mentors at python.org
> https://mail.python.org/mailman/listinfo/gsoc-mentors
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at python.org
> https://mail.python.org/mailman/listinfo/scipy-dev
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From charlesr.harris at gmail.com Wed Jan 9 22:20:27 2019
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Wed, 9 Jan 2019 20:20:27 -0700
Subject: Re: [SciPy-Dev] SciPy GSoC'15 participation?
In-Reply-To:
References: <25b2b04a-7fb9-2d82-54a4-26f2cc88fa39@toybox.ca>
Message-ID:

On Wed, Jan 9, 2019 at 8:46 AM Eric Larson wrote:

> I say let's go for it. We can always apply to be a sub-org, and if no
> project/student combination seems suitable given the mentor
> volunteer(s), I don't think there is any problem with not accepting any.
> The fftpack project sounds reasonable. I can at least be a secondary
> member, possibly a primary depending on my time this spring and summer.
>
> I can also be the SciPy sub-org admin, since I'm usually one for another
> sub-org anyway.
>

I suspect the fft project is going to be a significant amount of work, especially if you get a student inexperienced with large chunks of C code and unfamiliar with the FFT.

Chuck

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ralf.gommers at gmail.com Wed Jan 9 22:35:17 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Wed, 9 Jan 2019 19:35:17 -0800
Subject: Re: [SciPy-Dev] SciPy GSoC'15 participation?
In-Reply-To:
References: <25b2b04a-7fb9-2d82-54a4-26f2cc88fa39@toybox.ca>
Message-ID:

On Wed, Jan 9, 2019 at 7:45 AM Eric Larson wrote:

> I say let's go for it. We can always apply to be a sub-org, and if no
> project/student combination seems suitable given the mentor
> volunteer(s), I don't think there is any problem with not accepting any.
>

True

> The fftpack project sounds reasonable. I can at least be a secondary
> member, possibly a primary depending on my time this spring and summer.
>
> I can also be the SciPy sub-org admin, since I'm usually one for another
> sub-org anyway.
>

Awesome, thanks!

Ralf

> Eric
>
> On Thu, Jan 3, 2019 at 1:58 AM Ralf Gommers wrote:
>
>> Hi all,
>>
>> It's the time of the year when GSoC kicks off again. The email below from
>> Terri, the lead PSF organizer, explains some of the deadlines and things
>> that are different from last year.
>>
>> So: do we want to participate again this year? We already have one
>> serious proposal from a student in preparation, about scipy.fftpack (see
>> earlier thread on this list).
>>
>> If we do want to participate, do we have volunteers for mentors and a
>> sub-org admin? I've done the latter for the last few years, but this year I
>> really do not have the time (if you're interested: it doesn't take much
>> time compared to mentoring).
>>
>> Cheers,
>> Ralf
>>
>> ---------- Forwarded message ---------
>> From: Terri Oda
>> Date: Tue, Jan 1, 2019 at 11:02 PM
>> Subject: [GSoC-mentors] Python in GSoC 2019!
>>
>> _______________________________________________
>> GSoC-mentors mailing list
>> GSoC-mentors at python.org
>> https://mail.python.org/mailman/listinfo/gsoc-mentors
>> _______________________________________________
>> SciPy-Dev mailing list
>> SciPy-Dev at python.org
>> https://mail.python.org/mailman/listinfo/scipy-dev
>>
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at python.org
> https://mail.python.org/mailman/listinfo/scipy-dev
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From haberland at ucla.edu Sat Jan 12 21:14:08 2019
From: haberland at ucla.edu (Matt Haberland)
Date: Sat, 12 Jan 2019 18:14:08 -0800
Subject: [SciPy-Dev] FAIL scipy/stats/tests/test_stats.py::TestIQR::test_scale ?
Message-ID:

Sorry if I missed something about this, but all builds have been experiencing a failure on Travis CI with Python 3.7 lately:

FAIL scipy/stats/tests/test_stats.py::TestIQR::test_scale

Thoughts?

-------------- next part --------------
An HTML attachment was scrubbed...
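For anyone wanting to reproduce the failure locally, the single test node Matt quotes can be run on its own, assuming a development build of scipy is importable:

    import pytest

    # run only the failing test node reported above, with verbose output
    pytest.main(["scipy/stats/tests/test_stats.py::TestIQR::test_scale", "-v"])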
> > You can snag a Slack invite here: > > https://join.slack.com/t/python-gsoc/shared_invite/enQtNDg0NDgwNjMyNTE2LTNlOGM1MWY2MzRlMjNhMGM2OWJjYzE3ODRmMmM0MjFjNGJmNGRiYzI4ZDc1ODgxOTYzMDQyNzBiNGFlYWVjZTY > > - We've got space for some new sub-orgs this year! If you know of any > projects that might want to try out GSoC with us this year, let us know, > or tell them to email gsoc-admins at python.org (or join irc/matrix/slack, > or whatever) to chat with us! > > Terri > > > > > > > _______________________________________________ > GSoC-mentors mailing list > GSoC-mentors at python.org > https://mail.python.org/mailman/listinfo/gsoc-mentors > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Jan 9 22:20:27 2019 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 9 Jan 2019 20:20:27 -0700 Subject: [SciPy-Dev] SciPy GSoC'19 participation? In-Reply-To: References: <25b2b04a-7fb9-2d82-54a4-26f2cc88fa39@toybox.ca> Message-ID: On Wed, Jan 9, 2019 at 8:46 AM Eric Larson wrote: > I say let's go for it. We can always apply to be a sub-org, and if no > project/student combination seems suitable given the mentor > volunteer(s), I don't think there is any problem with not accepting any. > The fftpack project sounds reasonable. I can at least be a secondary > member, possibly a primary depending on my time this spring and summer. > > I can also be the SciPy sub-org admin, since I'm usually one for another > sub-org anyway. > > I suspect the fft project is going to be a significant amount of work, especially if you get a student inexperienced with large chunks of C code and unfamiliar with the fft. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Wed Jan 9 22:35:17 2019 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Wed, 9 Jan 2019 19:35:17 -0800 Subject: [SciPy-Dev] SciPy GSoC'19 participation? In-Reply-To: References: <25b2b04a-7fb9-2d82-54a4-26f2cc88fa39@toybox.ca> Message-ID: On Wed, Jan 9, 2019 at 7:45 AM Eric Larson wrote: > I say let's go for it. We can always apply to be a sub-org, and if no > project/student combination seems suitable given the mentor > volunteer(s), I don't think there is any problem with not accepting any. > True The fftpack project sounds reasonable. I can at least be a secondary > member, possibly a primary depending on my time this spring and summer. > > I can also be the SciPy sub-org admin, since I'm usually one for another > sub-org anyway. > Awesome, thanks! Ralf > Eric > > > On Thu, Jan 3, 2019 at 1:58 AM Ralf Gommers > wrote: > >> Hi all, >> >> It's the time of the year where GSoC kicks off again. Below email from >> Terri, the lead PSF organizer, explains some of the deadlines and things >> that are different from last year. >> >> So: do we want to participate again this year? We already have one >> serious proposal from a student in preparation, about scipy.fftpack (see >> earlier thread on this list). >> >> If we do want to participate, do we have volunteers for mentors and a >> sub-org admin? I've done the latter for the last years, but this year I >> really do not have the time (if you're interested: it doesn't take much >> time compared to mentoring).
>> >> Cheers, >> Ralf >> >> >> ---------- Forwarded message --------- >> From: Terri Oda >> Date: Tue, Jan 1, 2019 at 11:02 PM >> Subject: [GSoC-mentors] Python in GSoC 2019! >> To: >> >> >> Happy new year, everyone! >> >> As you may have seen, we're starting to prepare Python's application for >> GSoC 2019, and we need your ideas. We've got a few eager students >> already asking what you'd like them to work on. >> >> The website has been updated for 2019: http://python-gsoc.org >> >> New and notable: >> >> - GSoC org applications open early this year, starting January 15th! >> Google moves the dates around every year so students in different >> countries with different schedules get opportunities to participate more >> easily when the times line up with their scholastic year, and this is >> one of the early years. >> >> - We're asking for sub-orgs to get as many ideas ready as they can by >> Feb 4th so that we have lots of ideas ready for when Google judges our >> umbrella org ideas page. We need well-formed ideas if we want to get >> accepted, and now's a great time to catch the eye of the most eager >> students! Once you've got some ideas ready, you can make a pull request >> to get yourself added to the page here: >> https://github.com/python-gsoc/python-gsoc.github.io >> >> - John has set up a Slack channel for Python GSoC. It's bridged in to >> link to the IRC channel, but may be a more familiar interface/mobile app >> for people who aren't regular IRC users. I know, it's not open source, >> but we didn't have much luck with Zulip and while Matrix has been good >> it's not quite solving the usability problem we have with the students, >> so we're trying out the more popular Slack. We'll see how it works this >> year and if it's worth keeping, so if this is a thing you want us to >> support, please use it! (And if you see problems, please report them to >> gsoc-admins at python.org so we can get them fixed.) >> >> You can snag a Slack invite here: >> >> https://join.slack.com/t/python-gsoc/shared_invite/enQtNDg0NDgwNjMyNTE2LTNlOGM1MWY2MzRlMjNhMGM2OWJjYzE3ODRmMmM0MjFjNGJmNGRiYzI4ZDc1ODgxOTYzMDQyNzBiNGFlYWVjZTY >> >> - We've got space for some new sub-orgs this year! If you know of any >> projects that might want to try out GSoC with us this year, let us know, >> or tell them to email gsoc-admins at python.org (or join irc/matrix/slack, >> or whatever) to chat with us! >> >> Terri >> >> >> >> >> >> >> _______________________________________________ >> GSoC-mentors mailing list >> GSoC-mentors at python.org >> https://mail.python.org/mailman/listinfo/gsoc-mentors >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at python.org >> https://mail.python.org/mailman/listinfo/scipy-dev >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From haberland at ucla.edu Sat Jan 12 21:14:08 2019 From: haberland at ucla.edu (Matt Haberland) Date: Sat, 12 Jan 2019 18:14:08 -0800 Subject: [SciPy-Dev] FAIL scipy/stats/tests/test_stats.py::TestIQR::test_scale ? Message-ID: Sorry if I missed something about this, but all builds have been experiencing a failure on Travis CI Python version 3.7 lately: FAIL scipy/stats/tests/test_stats.py::TestIQR::test_scale Thoughts? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Sun Jan 13 22:29:54 2019 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 13 Jan 2019 20:29:54 -0700 Subject: [SciPy-Dev] NumPy 1.16.0 released. Message-ID: Hi All, On behalf of the NumPy team I'm pleased to announce the release of NumPy 1.16.0. This is the last NumPy release to support Python 2.7 and will be maintained as a long term release with bug fixes until 2020. This release has seen a lot of refactoring and features many bug fixes, improved code organization, and better cross platform compatibility. Not all of these improvements will be visible to users, but they should help make maintenance easier going forward. Highlights are Experimental support for overriding numpy functions in downstream projects. - The matmul function is now a ufunc and can be overridden using __array_ufunc__. - Improved support for the ARM, POWER, and SPARC architectures. - Improved support for AIX and PyPy. - Improved interoperation with ctypes. - Improved support for PEP 3118. The supported Python versions are 2.7 and 3.5-3.7, support for 3.4 has been dropped. The wheels on PyPI are linked with OpenBLAS v0.3.4+, which should fix the known threading issues found in previous OpenBLAS versions. Downstream developers building this release should use Cython >= 0.29.2 and, if linking OpenBLAS, OpenBLAS > v0.3.4. Wheels for this release can be downloaded from PyPI , source archives are available from Github . *Contributors* A total of 113 people contributed to this release. People with a "+" by their names contributed a patch for the first time. * Alan Fontenot + * Allan Haldane * Alon Hershenhorn + * Alyssa Quek + * Andreas Nussbaumer + * Anner + * Anthony Sottile + * Antony Lee * Ayappan P + * Bas van Schaik + * C.A.M. Gerlach + * Charles Harris * Chris Billington * Christian Clauss * Christoph Gohlke * Christopher Pezley + * Daniel B Allan + * Daniel Smith * Dawid Zych + * Derek Kim + * Dima Pasechnik + * Edgar Giovanni Lepe + * Elena Mokeeva + * Elliott Sales de Andrade + * Emil Hessman + * Eric Larson * Eric Schles + * Eric Wieser * Giulio Benetti + * Guillaume Gautier + * Guo Ci * Heath Henley + * Isuru Fernando + * J. Lewis Muir + * Jack Vreeken + * Jaime Fernandez * James Bourbeau * Jeff VanOss * Jeffrey Yancey + * Jeremy Chen + * Jeremy Manning + * Jeroen Demeyer * John Darbyshire + * John Kirkham * John Zwinck * Jonas Jensen + * Joscha Reimer + * Juan Azcarreta + * Julian Taylor * Kevin Sheppard * Krzysztof Chomski + * Kyle Sunden * Lars Gr?ter * Lilian Besson + * MSeifert04 * Mark Harfouche * Marten van Kerkwijk * Martin Thoma * Matt Harrigan + * Matthew Bowden + * Matthew Brett * Matthias Bussonnier * Matti Picus * Max Aifer + * Michael Hirsch, Ph.D + * Michael James Jamie Schnaitter + * MichaelSaah + * Mike Toews * Minkyu Lee + * Mircea Akos Bruma + * Mircea-Akos Brum? + * Moshe Looks + * Muhammad Kasim + * Nathaniel J. Smith * Nikita Titov + * Paul M?ller + * Paul van Mulbregt * Pauli Virtanen * Pierre Glaser + * Pim de Haan * Ralf Gommers * Robert Kern * Robin Aggleton + * Rohit Pandey + * Roman Yurchak + * Ryan Soklaski * Sebastian Berg * Sho Nakamura + * Simon Gibbons * Stan Seibert + * Stefan Otte * Stefan van der Walt * Stephan Hoyer * Stuart Archibald * Taylor Smith + * Tim Felgentreff + * Tim Swast + * Tim Teichmann + * Toshiki Kataoka * Travis Oliphant * Tyler Reddy * Uddeshya Singh + * Warren Weckesser * Weitang Li + * Wenjamin Petrenko + * William D. 
Irons * Yannick Jadoul + * Yaroslav Halchenko * Yug Khanna + * Yuji Kanagawa + * Yukun Guo + * @ankokumoyashi + * @lerbuke + Cheers, Charles Harris -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre.haessig at crans.org Mon Jan 14 03:40:36 2019 From: pierre.haessig at crans.org (Pierre Haessig) Date: Mon, 14 Jan 2019 09:40:36 +0100 Subject: [SciPy-Dev] ANN: SciPy 1.2.0 In-Reply-To: References: Message-ID: <1df8cef1-04f9-0342-5d18-132f7be7a2b9@crans.org> Hi, Nice new stuff, thanks! Le 18/12/2018 à 17:57, Tyler Reddy a écrit : > ``welch()`` and ``csd()`` methods in `scipy.signal` now support > calculation > of a median average PSD, using ``average='mean'`` keyword Is this a typo? Should it be average='median' keyword? Best, Pierre From pierre.haessig at crans.org Mon Jan 14 09:00:34 2019 From: pierre.haessig at crans.org (Pierre Haessig) Date: Mon, 14 Jan 2019 15:00:34 +0100 Subject: [SciPy-Dev] References for weibull_min and weibull_max distributions Message-ID: Hello, I just submitted a small PR to clarify the docstring of exponweib distribution in scipy.stats (https://github.com/scipy/scipy/pull/9679). However, in the process, I got a bit confused with weibull_min and weibull_max. It seems that up to Scipy 0.19, it was specified as an alias to Frechet left distribution (https://docs.scipy.org/doc/scipy-0.19.0/reference/generated/scipy.stats.weibull_min.html). However this is not mentioned anymore since 1.0. Also, it seems that weibull_min corresponds to the usual Weibull distribution, but its docstring doesn't say it explicitly. Also, I find no references on the web for those Weibull min/max. Would it be appropriate, in the long term, to simply have a Weibull distribution? Best, Pierre -- Pierre Haessig assistant professor at CentraleSupélec, IETR CentraleSupélec Avenue de la Boulaie - CS 47601 35576 Cesson-Sévigné Cedex France +33 299 84 45 76 From larson.eric.d at gmail.com Mon Jan 14 10:11:25 2019 From: larson.eric.d at gmail.com (Eric Larson) Date: Mon, 14 Jan 2019 10:11:25 -0500 Subject: [SciPy-Dev] FAIL scipy/stats/tests/test_stats.py::TestIQR::test_scale ? In-Reply-To: References: Message-ID: It seems to be due to a change in NumPy, I opened an issue upstream: https://github.com/numpy/numpy/issues/12737 Eric On Sat, Jan 12, 2019 at 9:14 PM Matt Haberland wrote: > Sorry if I missed something about this, but all builds have been > experiencing a failure on Travis CI Python version 3.7 lately: > > FAIL scipy/stats/tests/test_stats.py::TestIQR::test_scale > > Thoughts? > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jfoxrabinovitz at gmail.com Mon Jan 14 14:46:48 2019 From: jfoxrabinovitz at gmail.com (Joseph Fox-Rabinovitz) Date: Mon, 14 Jan 2019 14:46:48 -0500 Subject: [SciPy-Dev] FAIL scipy/stats/tests/test_stats.py::TestIQR::test_scale ? In-Reply-To: References: Message-ID: Given the state of the issue, I can write up a patch shortly, since the original was my PR to begin with.
Regards, - Joe On Mon, Jan 14, 2019 at 10:12 AM Eric Larson wrote: > It seems to be due to a change in NumPy, I opened an issue upstream: > > https://github.com/numpy/numpy/issues/12737 > > Eric > > > On Sat, Jan 12, 2019 at 9:14 PM Matt Haberland wrote: > >> Sorry if I missed something about this, but all builds have been >> experiencing a failure on Travis CI Python version 3.7 lately: >> >> FAIL scipy/stats/tests/test_stats.py::TestIQR::test_scale >> >> Thoughts? >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at python.org >> https://mail.python.org/mailman/listinfo/scipy-dev >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at gmail.com Tue Jan 15 16:46:44 2019 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Tue, 15 Jan 2019 16:46:44 -0500 Subject: [SciPy-Dev] References for weibull_min and weibull_max distributions In-Reply-To: References: Message-ID: Hi Pierre, My answers to your questions are below... On 1/14/19, Pierre Haessig wrote: > Hello, > > I just submitted a small PR to clarify the docstring of exponweib > distribution in scipy.stats (https://github.com/scipy/scipy/pull/9679). > > However, in the process, I got a bit confused with weibull_min and > weibull_max. It seems that up to Scipy 0.19, it was specified as an > alias to Frechet left distribution > (https://docs.scipy.org/doc/scipy-0.19.0/reference/generated/scipy.stats.weibull_min.html). > However this is not mentioned anymore since 1.0. For a long time, weibull_min and weibull_max were aliases of frechet_l and frechet_r. The problem was that the implementations in frechet_l and frechet_r were not what anyone calls the Fréchet distribution these days. They were, in fact, what are almost universally called the Weibull distribution. So in SciPy 1.0.0, those implementations were moved to the weibull_min and weibull_max names, and the names frechet_r and frechet_l were deprecated (see https://github.com/scipy/scipy/pull/7838 for the details of the changes). The names frechet_l and frechet_r still exist, but if you use any of their methods, you will get a deprecation warning. (Unfortunately, it looks like I neglected to make a note of this deprecation in the 1.0.0 release notes.) The distribution that is generally known as *the* Fréchet distribution is also known as the inverse Weibull distribution (see, for example, https://en.wikipedia.org/wiki/Fr%C3%A9chet_distribution). It is implemented in SciPy as scipy.stats.invweibull. > Also, it seems that > weibull_min corresponds to the usual Weibull distribution, but its > docstring doesn't say it explicitly. Also, I find no references on the > web for those Weibull min/max. Would it be appropriate, in the long > term, to simply have a Weibull distribution? The extreme value distributions arise as the limiting distribution of taking the extreme value (i.e. maximum or minimum) of a large number of samples from some underlying distribution. For a certain class of underlying distributions, if you take the maximum, in the (appropriately renormalized) limit you get the distribution that SciPy calls weibull_max, and if you take the minimum, you get weibull_min. (These distributions are related: if F(x, c) is the CDF of weibull_min with shape parameter c, then the CDF of weibull_max is 1 - F(-x, c).)
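That relation is easy to check numerically -- a minimal sketch (the shape parameter c = 1.5 and the evaluation grid are arbitrary choices, used only for illustration):

import numpy as np
from scipy.stats import weibull_min, weibull_max

c = 1.5                           # arbitrary shape parameter
x = np.linspace(-5.0, -0.01, 50)  # weibull_max has support x < 0
err = np.abs(weibull_max.cdf(x, c) - (1 - weibull_min.cdf(-x, c))).max()
print(err)  # ~1e-16: the two CDFs agree to machine precision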
The issue, then, is which one should be considered the "usual" Weibull distribution? The answer is not obvious. For example, the distribution described in the wikipedia article on the Weibull distribution (https://en.wikipedia.org/wiki/Weibull_distribution) corresponds to weibull_min. This is also the distribution from which numpy.random.weibull draws samples. On the other hand, in the book "An Introduction to Statistical Modeling of Extreme Values" by Stuart Coles, and in the book "Modelling Extremal Events" by Embrechts, Klüppelberg and Mikosch (two widely used texts on extreme value theory), the distribution that is called the Weibull distribution corresponds to SciPy's weibull_max. So I think we are better off *not* picking one to be called the "usual" Weibull distribution. The current names accurately describe the basis of the two flavors of the distribution. However, we should improve the documentation to include this information about the min/max distinction in their docstrings. We should do the same for gumbel_l and gumbel_r. I'd be happy to make this change, but I probably won't get to it in the near future, so I'd be even happier if someone created a pull request that added this information to the docstrings of weibull_min and weibull_max. Similar updates for gumbel_l and gumbel_r could be made at the same time or in a separate pull request. (The original implementations of these extreme value distributions date back to before my involvement with SciPy, so I can't say why the Weibull distribution used the suffixes _min and _max while the other distributions with two conventions used _l and _r, and I don't know why we don't have the two versions for the inverse Weibull--a.k.a. Fréchet--distribution.) Warren From stefanv at berkeley.edu Fri Jan 18 12:55:34 2019 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Fri, 18 Jan 2019 09:55:34 -0800 Subject: [SciPy-Dev] ANN: scikit-image 0.14.2 Message-ID: <20190118175534.hjhcnkhrkjjnckfp@carbo> Announcement: scikit-image 0.14.2 ================================= This release handles an incompatibility between scikit-image and NumPy 1.16.0, released on January 13th 2019. It contains the following changes from 0.14.1: API changes ----------- - ``skimage.measure.regionprops`` no longer removes singleton dimensions from label images (#3284).
To recover the old behavior, replace ``regionprops(label_image)`` calls with ``regionprops(np.squeeze(label_image))`` Bug fixes --------- - Address deprecation of NumPy ``_validate_lengths`` (backport of #3556) - Correctly handle the maximum number of lines in Hough transforms (backport of #3514) - Correctly implement early stopping criterion for rank kernel noise filter (backport of #3503) - Fix ``skimage.measure.regionprops`` for 1x1 inputs (backport of #3284) Enhancements ------------ - Rewrite of ``local_maxima`` with flood-fill (backport of #3022, #3447) Build Process & Testing ----------------------- - Dedicate a ``--pre`` build in appveyor (backport of #3222) - Avoid Travis-CI failure regarding ``skimage.lookfor`` (backport of #3477) - Stop using the ``pytest.fixtures`` decorator (#3558) - Filter out DeprecationPendingWarning for matrix subclass (#3637) - Fix matplotlib test warnings and circular import (#3632) Contributors & Reviewers ------------------------ - François Boulogne - Emmanuelle Gouillart - Lars Grüter - Mark Harfouche - Juan Nunez-Iglesias - Egor Panfilov - Stefan van der Walt From tyler.je.reddy at gmail.com Wed Jan 23 18:26:59 2019 From: tyler.je.reddy at gmail.com (Tyler Reddy) Date: Wed, 23 Jan 2019 15:26:59 -0800 Subject: [SciPy-Dev] ppc64le linux in CI Message-ID: In a recent PR ( https://github.com/scipy/scipy/pull/9684 ) it was suggested to bring this discussion to the mailing list. In short, do we want to have ppc64le linux testing in our CI? This would follow suit from NumPy, since IBM has made the nodes available on Travis, but could have drawbacks too re: CI times and maintenance burden. Thoughts? Tyler -------------- next part -------------- An HTML attachment was scrubbed... URL: From newville at cars.uchicago.edu Wed Jan 23 20:24:57 2019 From: newville at cars.uchicago.edu (Matt Newville) Date: Wed, 23 Jan 2019 19:24:57 -0600 Subject: [SciPy-Dev] curve_fit() should require initial values for parameters Message-ID: Hi All, First, I apologize in advance if this sounds un-appreciative of the efforts made in scipy and scipy.optimize. I am a very big fan, and very appreciative of the work done here. With lmfit we have tried to take the "rough edges" from optimization and curve-fitting with python, but we're very much in favor of building wrappers on top of the core of scipy.optimize. Still, many people use `curve_fit` because it comes built-in with scipy, is well advertised, and is well-suited for simple uses. It is clearly aimed at novices and tries to hide many of the complexities of using optimization routines for curve fitting. I try to help people with questions about using `curve_fit` and `scipy.optimize` as well as `lmfit`. In trying to help people using `curve_fit`, I've noticed a disturbing trend. When a novice or even experienced user asks for help using `curve_fit` because a fit "doesn't work" there is a very good chance that they did not specify `p0` for the initial values and that the default behavior of setting all starting values to 1.0 will cause their fit to fail to converge, or really to even start. This failure is understandable to an experienced user, but apparently not to the novice. Curve-fitting problems are by nature local solvers and are always sensitive to initial values. For some problems, parameter values of 1.0 are simply inappropriate, and the numerical derivatives for some parameters near values of 1.0 will be 0. Indeed, there is no value that is
FWIW, in lmfit, we simply refuse to let a user run a curve-fitting problem without initial values. I believe that most other curve-fitting interfaces also require initial values for all parameters. Unfortunately, faced with no initial parameter values, `curve_fit` silently chooses initial values. It doesn't try to make an informed decision, it simply chooses '1.0', which can easily be so far off as to prevent a solution from being found. When this happens, `curve_fit` gives no information to the user of what the problem is. Indeed it allows initial values to not be set, giving the impression that they are not important. This impression is wrong: initial values are always important. `curve_fit` is mistaken in having a default starting value. I've made a Pull Request at https://github.com/scipy/scipy/pull/9701 to fix this misbehavior, so that `curve_fit()` requires starting values. It was suggested there that this topic should be discussed here. I'm happy to do so. It was suggested in the github Issue that forcing the user to give initial values was "annoying". It was also suggested that a deprecation cycle would be required. I should say that I don't actually use `curve_fit()` myself, I'm just trying to help make this commonly used routine be completely wrong less often. Cheers, --Matt Newville -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Wed Jan 23 23:12:19 2019 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 23 Jan 2019 20:12:19 -0800 Subject: [SciPy-Dev] curve_fit() should require initial values for parameters In-Reply-To: References: Message-ID: For what it's worth, I agree that we should remove the default. There's no generic value that accidentally works often enough to justify the time wasted and novice confusion when it fails. I also agree that it is going to take a reasonably long deprecation cycle. On Wed, Jan 23, 2019 at 5:26 PM Matt Newville wrote: > Hi All, > > First, I apologize in advance if this sounds un-appreciative of the > efforts made in scipy and scipy.optimize. I am a very big fan, and very > appreciative of the work done here. With lmfit we have tried to take the > "rough edges" from optimization and curve-fitting with python, but we're > very much in favor of building wrappers on top of the core of > scipy.optimize. > > Still, many people use `curve_fit` because it comes built-in with scipy, > is well advertised, and is well-suited for simple uses. It is clearly > aimed at novices and tries to hide many of the complexities of using > optimization routines for curve fitting. I try to help people with > questions about using `curve_fit` and `scipy.optimize` as well as `lmfit`. > In trying to help people using `curve_fit`, I've noticed a disturbing > trend. When a novice or even experienced user asks for help using > `curve_fit` because a fit "doesn't work" there is a very good chance that > they did not specify `p0` for the initial values and that the default > behavior of setting all starting values to 1.0 will cause their fit to fail > to converge, or really to even start. > > This failure is understandable to an experienced user, but apparently not > to the novice. Curve-fitting problems are by nature local solvers and are > always sensitive to initial values. For some problems, parameter values of > 1.0 are simply inappropriate, and the numerical derivatives for some > parameters near values of 1.0 will be 0. 
Indeed, there is no value that is > always a reasonable starting value for all parameters. FWIW, in lmfit, we > simply refuse to let a user run a curve-fitting problem without initial > values. I believe that most other curve-fitting interfaces also require > initial values for all parameters. > > Unfortunately, faced with no initial parameter values, `curve_fit` > silently chooses initial values. It doesn't try to make an informed > decision, it simply chooses '1.0', which can easily be so far off as to > prevent a solution from being found. When this happens, `curve_fit` gives > no information to the user of what the problem is. Indeed it allows > initial values to not be set, giving the impression that they are not > important. This impression is wrong: initial values are always important. > `curve_fit` is mistaken in having a default starting value. > > I've made a Pull Request at https://github.com/scipy/scipy/pull/9701 to > fix this misbehavior, so that `curve_fit()` requires starting values. It > was suggested there that this topic should be discussed here. I'm happy to > do so. It was suggested in the github Issue that forcing the user to give > initial values was "annoying". It was also suggested that a deprecation > cycle would be required. I should say that I don't actually use > `curve_fit()` myself, I'm just trying to help make this commonly used > routine be completely wrong less often. > > Cheers, > > --Matt Newville > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefanv at berkeley.edu Wed Jan 23 23:41:43 2019 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Wed, 23 Jan 2019 20:41:43 -0800 Subject: [SciPy-Dev] curve_fit() should require initial values for parameters In-Reply-To: References: Message-ID: <20190124044143.ug7zhhxvx65yxsyc@carbo> Hi Matt, On Wed, 23 Jan 2019 19:24:57 -0600, Matt Newville wrote: > First, I apologize in advance if this sounds un-appreciative of the efforts > made in scipy and scipy.optimize. Nothing you said in your email sounds unappreciative to me, for what it's worth. > Unfortunately, faced with no initial parameter values, `curve_fit` silently > chooses initial values. It doesn't try to make an informed decision, it > simply chooses '1.0', which can easily be so far off as to prevent a > solution from being found. When this happens, `curve_fit` gives no > information to the user of what the problem is. Indeed it allows initial > values to not be set, giving the impression that they are not important. > This impression is wrong: initial values are always important. `curve_fit` > is mistaken in having a default starting value. > > I've made a Pull Request at https://github.com/scipy/scipy/pull/9701 to fix > this misbehavior, so that `curve_fit()` requires starting values. It was > suggested there that this topic should be discussed here. I'm happy to do > so. It was suggested in the github Issue that forcing the user to give > initial values was "annoying". It was also suggested that a deprecation > cycle would be required. I should say that I don't actually use > `curve_fit()` myself, I'm just trying to help make this commonly used > routine be completely wrong less often. Thank you for highlighting this issue. In general, magic is to be avoided, where possible. In this case, I would suggest: 1. 
For one or two releases, changing the function to warn when no default is specified. Then, turn that warning into an error that also explains good rules of thumb for calculating initial values (and mentions the 'auto' method in #2 below). 2. Provide an easy mechanism to get sensible defaults. I.e., allow specific values to be specified, but also allow estimation methods, such as 'auto' (just an example, this would probably be more specific) or 'ones' (to restore old behavior, and to prevent existing users from having to modify their code drastically). Stéfan From andyfaff at gmail.com Wed Jan 23 23:50:45 2019 From: andyfaff at gmail.com (Andrew Nelson) Date: Thu, 24 Jan 2019 15:50:45 +1100 Subject: [SciPy-Dev] curve_fit() should require initial values for parameters In-Reply-To: <20190124044143.ug7zhhxvx65yxsyc@carbo> References: <20190124044143.ug7zhhxvx65yxsyc@carbo> Message-ID: > > In terms of proposed usage emitting warnings is relatively straightforward. I guess p0 should become a new positional argument as part of this change (or some other name). How does one introduce a new positional argument whilst remaining back compatible? Using *args? -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefanv at berkeley.edu Thu Jan 24 02:07:38 2019 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Wed, 23 Jan 2019 23:07:38 -0800 Subject: [SciPy-Dev] curve_fit() should require initial values for parameters In-Reply-To: References: <20190124044143.ug7zhhxvx65yxsyc@carbo> Message-ID: <20190124070738.fzvgyc3tg7kfjcm3@carbo> On Thu, 24 Jan 2019 15:50:45 +1100, Andrew Nelson wrote: > > > > In terms of proposed usage emitting warnings is relatively > straightforward. I guess p0 should become a new positional argument as part > of this change (or some other name). How does one introduce a new > positional argument whilst remaining back compatible? Using *args? That's harder work; perhaps just keep it a kwd arg for now and make sure it gets specified? Soon, we'll have https://www.python.org/dev/peps/pep-3102/ ! Stéfan From nilsc.becker at gmail.com Thu Jan 24 05:03:40 2019 From: nilsc.becker at gmail.com (Nils Geib) Date: Thu, 24 Jan 2019 11:03:40 +0100 Subject: [SciPy-Dev] curve_fit() should require initial values for parameters In-Reply-To: <20190124070738.fzvgyc3tg7kfjcm3@carbo> References: <20190124044143.ug7zhhxvx65yxsyc@carbo> <20190124070738.fzvgyc3tg7kfjcm3@carbo> Message-ID: Just one small remark: in my experience many users utilize curve_fit to fit linear models (models that are linear in the fit parameters, e.g. f(x) = a * exp(-x) + b * exp(-x^2)). In this case the linear least squares problem can be solved uniquely without providing initial values for a and b. Even though curve_fit employs a general nonlinear least squares solver it should be able to find the global minimum for virtually all starting values (although I have not comprehensively tested it). On the other hand, when the model is truly nonlinear, the initial values of curve_fit fail with very high probability. One may argue that using curve_fit for such cases is not the right choice. However, scipy does not provide a convenient interface such as curve_fit for linear models (as far as I know). And users who are not aware that a nonlinear least squares solver requires good initial values may also not be aware of the difference between linear and nonlinear models.
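For example, a minimal sketch of the linear route for exactly that model (the synthetic data, the true coefficients a = 2, b = -1 and the noise level are made up for illustration):

import numpy as np

rng = np.random.RandomState(0)
x = np.linspace(0.0, 3.0, 40)
y = 2.0 * np.exp(-x) - 1.0 * np.exp(-x**2) + 0.01 * rng.standard_normal(x.size)
# The model is linear in (a, b): build the design matrix and solve once,
# with no initial values required.
A = np.column_stack([np.exp(-x), np.exp(-x**2)])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef)  # recovers approximately [2.0, -1.0]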
(When I teach I regularly encounter the misconception that models that are nonlinear in the independent variable, e.g., x, require a nonlinear solver). Long story short: when deprecating the initial choice for p0 it may be worthwhile to also consider providing a convenient interface for lsq_linear that is similar to curve_fit and does not require initial values. Cheers Nils -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Thu Jan 24 11:26:09 2019 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 24 Jan 2019 11:26:09 -0500 Subject: [SciPy-Dev] curve_fit() should require initial values for parameters In-Reply-To: References: Message-ID: On Wed, Jan 23, 2019 at 11:12 PM Robert Kern wrote: > For what it's worth, I agree that we should remove the default. There's no > generic value that accidentally works often enough to justify the time > wasted and novice confusion when it fails. > > I also agree that it is going to take a reasonably long deprecation cycle. > > On Wed, Jan 23, 2019 at 5:26 PM Matt Newville > wrote: > >> Hi All, >> >> First, I apologize in advance if this sounds un-appreciative of the >> efforts made in scipy and scipy.optimize. I am a very big fan, and very >> appreciative of the work done here. With lmfit we have tried to take the >> "rough edges" from optimization and curve-fitting with python, but we're >> very much in favor of building wrappers on top of the core of >> scipy.optimize. >> >> Still, many people use `curve_fit` because it comes built-in with scipy, >> is well advertised, and is well-suited for simple uses. It is clearly >> aimed at novices and tries to hide many of the complexities of using >> optimization routines for curve fitting. I try to help people with >> questions about using `curve_fit` and `scipy.optimize` as well as `lmfit`. >> In trying to help people using `curve_fit`, I've noticed a disturbing >> trend. When a novice or even experienced user asks for help using >> `curve_fit` because a fit "doesn't work" there is a very good chance that >> they did not specify `p0` for the initial values and that the default >> behavior of setting all starting values to 1.0 will cause their fit to fail >> to converge, or really to even start. >> >> This failure is understandable to an experienced user, but apparently not >> to the novice. Curve-fitting problems are by nature local solvers and are >> always sensitive to initial values. For some problems, parameter values of >> 1.0 are simply inappropriate, and the numerical derivatives for some >> parameters near values of 1.0 will be 0. Indeed, there is no value that is >> always a reasonable starting value for all parameters. FWIW, in lmfit, we >> simply refuse to let a user run a curve-fitting problem without initial >> values. I believe that most other curve-fitting interfaces also require >> initial values for all parameters. >> >> Unfortunately, faced with no initial parameter values, `curve_fit` >> silently chooses initial values. It doesn't try to make an informed >> decision, it simply chooses '1.0', which can easily be so far off as to >> prevent a solution from being found. When this happens, `curve_fit` gives >> no information to the user of what the problem is. Indeed it allows >> initial values to not be set, giving the impression that they are not >> important. This impression is wrong: initial values are always important. >> `curve_fit` is mistaken in having a default starting value.
>> >> I've made a Pull Request at https://github.com/scipy/scipy/pull/9701 to >> fix this misbehavior, so that `curve_fit()` requires starting values. It >> was suggested there that this topic should be discussed here. I'm happy to >> do so. It was suggested in the github Issue that forcing the user to give >> initial values was "annoying". It was also suggested that a deprecation >> cycle would be required. I should say that I don't actually use >> `curve_fit()` myself, I'm just trying to help make this commonly used >> routine be completely wrong less often. >> > I think making initial values compulsory is too much of a break with tradition. IMO, a warning and better documentation would be more appropriate. https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html does not show an example with starting values. curve_fit could issue a warning if p0 is not specified, or warn if convergence fails and p0 was not specified. I think it should also be possible to improve the default starting values, e.g. if the function fails or if bounds are provided. I'm not a user of curve_fit, but I guess there might be a strong selection bias in use cases when helping out users that run into problems. I have no idea what the overall selection of use cases is. A brief skimming of some github searches shows that many users don't specify the initial values. (A semi-random search result https://github.com/jmunroe/phys2820-fall2018/blob/e270b1533130b2b7acd0ec5da3edd9262b792da6/Lecture.13-Data-Analysis-and-Curve-Fitting.ipynb ) (Asides: The feedback I get about statsmodels are almost only for cases that "don't work", e.g. badly conditioned data, bad scaling of the data, corner case. Based on this feedback statsmodels optimization looks pretty bad, but this does not reflect that it works well in, say, 90% of the cases. However, unlike curve_fit, statsmodels has mostly models with predefined nonlinear functions which makes it easier to fit. ) > >> Cheers, >> >> --Matt Newville >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at python.org >> https://mail.python.org/mailman/listinfo/scipy-dev >> > > > -- > Robert Kern > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From newville at cars.uchicago.edu Thu Jan 24 13:15:32 2019 From: newville at cars.uchicago.edu (Matt Newville) Date: Thu, 24 Jan 2019 12:15:32 -0600 Subject: [SciPy-Dev] curve_fit() should require initial values for parameters In-Reply-To: References: Message-ID: On Thu, Jan 24, 2019 at 10:26 AM wrote: > > > On Wed, Jan 23, 2019 at 11:12 PM Robert Kern > wrote: > >> For what it's worth, I agree that we should remove the default. There's >> no generic value that accidentally works often enough to justify the time >> wasted and novice confusion when it fails. >> >> I also agree that it is going to take a reasonably long deprecation cycle. >> >> On Wed, Jan 23, 2019 at 5:26 PM Matt Newville >> wrote: >> >>> Hi All, >>> >>> First, I apologize in advance if this sounds un-appreciative of the >>> efforts made in scipy and scipy.optimize. I am a very big fan, and very >>> appreciative of the work done here. With lmfit we have tried to take the >>> "rough edges" from optimization and curve-fitting with python, but we're >>> very much in favor of building wrappers on top of the core of >>> scipy.optimize. 
>>> >>> Still, many people use `curve_fit` because it comes built-in with scipy, >>> is well advertised, and is well-suited for simple uses. It is clearly >>> aimed at novices and tries to hide many of the complexities of using >>> optimization routines for curve fitting. I try to help people with >>> questions about using `curve_fit` and `scipy.optimize` as well as `lmfit`. >>> In trying to help people using `curve_fit`, I've noticed a disturbing >>> trend. When a novice or even experienced user asks for help using >>> `curve_fit` because a fit "doesn't work" there is a very good chance that >>> they did not specify `p0` for the initial values and that the default >>> behavior of setting all starting values to 1.0 will cause their fit to fail >>> to converge, or really to even start. >>> >>> This failure is understandable to an experienced user, but apparently >>> not to the novice. Curve-fitting problems are by nature local solvers and >>> are always sensitive to initial values. For some problems, parameter values >>> of 1.0 are simply inappropriate, and the numerical derivatives for some >>> parameters near values of 1.0 will be 0. Indeed, there is no value that is >>> always a reasonable starting value for all parameters. FWIW, in lmfit, we >>> simply refuse to let a user run a curve-fitting problem without initial >>> values. I believe that most other curve-fitting interfaces also require >>> initial values for all parameters. >>> >>> Unfortunately, faced with no initial parameter values, `curve_fit` >>> silently chooses initial values. It doesn't try to make an informed >>> decision, it simply chooses '1.0', which can easily be so far off as to >>> prevent a solution from being found. When this happens, `curve_fit` gives >>> no information to the user of what the problem is. Indeed it allows >>> initial values to not be set, giving the impression that they are not >>> important. This impression is wrong: initial values are always important. >>> `curve_fit` is mistaken in having a default starting value. >>> >>> I've made a Pull Request at https://github.com/scipy/scipy/pull/9701 to >>> fix this misbehavior, so that `curve_fit()` requires starting values. It >>> was suggested there that this topic should be discussed here. I'm happy to >>> do so. It was suggested in the github Issue that forcing the user to give >>> initial values was "annoying". It was also suggested that a deprecation >>> cycle would be required. I should say that I don't actually use >>> `curve_fit()` myself, I'm just trying to help make this commonly used >>> routine be completely wrong less often. >>> >> > > I think making initial values compulsory is too much of a break with > tradition. > Well, it may be a break with the tradition of using scipy.optimize.curve_fit, but I do not think it is a break with the tradition of curve fitting. Indeed, what curve_fit does when a user leaves `p0=None` is *not* to leave the initial values unspecified -- the underlying optimization routine would simply not accept that -- but rather to silently select values that are all '1.0'. I am not aware of any other curve-fitting code or use of non-linear optimization that does this. So, in a sense it is a "traditional". It is also wrong. IMO, a warning and better documentation would be more appropriate. > > https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html > does not show an example with starting values. 
> curve_fit could issue a warning if p0 is not specified, or warn if > convergence fails and p0 was not specified. > > I think it should also be possible to improve the default starting values, > e.g. if the function fails or if bounds are provided. > I think trying to guess starting values would require an understanding of the function calculating the model, and is not generally solvable. For sure, if one knows the function, initial guesses are possible. Lmfit has this capability for a few commonly used model functions and those initial guess are often very good. But it cannot be done in general. > I'm not a user of curve_fit, but I guess there might be a strong selection > bias in use cases when helping out users that run into problems. > I have no idea what the overall selection of use cases is. A brief > skimming of some github searches shows that many users don't specify the > initial values. > (A semi-random search result > https://github.com/jmunroe/phys2820-fall2018/blob/e270b1533130b2b7acd0ec5da3edd9262b792da6/Lecture.13-Data-Analysis-and-Curve-Fitting.ipynb > ) > > (Asides: > The feedback I get about statsmodels are almost only for cases that "don't > work", e.g. badly conditioned data, bad scaling of the data, corner case. > Based on this feedback statsmodels optimization looks pretty bad, but this > does not reflect that it works well in, say, 90% of the cases. > However, unlike curve_fit, statsmodels has mostly models with predefined > nonlinear functions which makes it easier to fit. > ) > > I do not know what the usage of `curve_fit` is. Apparently some users get tripped up by not specifying initial values. But that is in the nature of curve fitting -- initial values are necessary. Claiming that they do not matter or are an unnecessary burden is just not correct. It seems like the first step is to change `curve_fit` to raise a warning or print a message (not sure which is preferred) when `p0` is `None`, but continue guessing `1.0`, at least for the time being. Eventually, this could be changed to raise an exception if `p0` is `None`. Perhaps a middle step would be to change it to not guess `1.0` but a number comprised of a uniformly selected random number between [-1, 1] for the mantissa and a uniformly selected random integer between [-20, 20] for the exponent, as long as bounds are respected? Cheers, --Matt Newville -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefanv at berkeley.edu Thu Jan 24 13:45:44 2019 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Thu, 24 Jan 2019 10:45:44 -0800 Subject: [SciPy-Dev] curve_fit() should require initial values for parameters In-Reply-To: References: Message-ID: <20190124184544.s33zdmtecorrogtz@carbo> Hi Josef, On Thu, 24 Jan 2019 11:26:09 -0500, josef.pktd at gmail.com wrote: > I think making initial values compulsory is too much of a break with > tradition. > IMO, a warning and better documentation would be more appropriate. > https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html > does not show an example with starting values. > curve_fit could issue a warning if p0 is not specified, or warn if > convergence fails and p0 was not specified. Isn't the greater danger that convergence succeeds, with p0 unspecified, and the resulting model not being at all what the user had in mind? > I think it should also be possible to improve the default starting values, > e.g. if the function fails or if bounds are provided. This is the type of magic I hope we can avoid. 
Having different execution paths based on some vaguely defined notion of perceived failure seems dangerous at best. > I'm not a user of curve_fit, but I guess there might be a strong selection > bias in use cases when helping out users that run into problems. I agree; and I think this can be accomplished by better documentation, helpful warnings, and assisting the user in choosing correct parameters. Best regards, St?fan From josef.pktd at gmail.com Thu Jan 24 13:51:14 2019 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 24 Jan 2019 13:51:14 -0500 Subject: [SciPy-Dev] curve_fit() should require initial values for parameters In-Reply-To: References: Message-ID: On Thu, Jan 24, 2019 at 1:16 PM Matt Newville wrote: > > > On Thu, Jan 24, 2019 at 10:26 AM wrote: > >> >> >> On Wed, Jan 23, 2019 at 11:12 PM Robert Kern >> wrote: >> >>> For what it's worth, I agree that we should remove the default. There's >>> no generic value that accidentally works often enough to justify the time >>> wasted and novice confusion when it fails. >>> >>> I also agree that it is going to take a reasonably long deprecation >>> cycle. >>> >>> On Wed, Jan 23, 2019 at 5:26 PM Matt Newville < >>> newville at cars.uchicago.edu> wrote: >>> >>>> Hi All, >>>> >>>> First, I apologize in advance if this sounds un-appreciative of the >>>> efforts made in scipy and scipy.optimize. I am a very big fan, and very >>>> appreciative of the work done here. With lmfit we have tried to take the >>>> "rough edges" from optimization and curve-fitting with python, but we're >>>> very much in favor of building wrappers on top of the core of >>>> scipy.optimize. >>>> >>>> Still, many people use `curve_fit` because it comes built-in with >>>> scipy, is well advertised, and is well-suited for simple uses. It is >>>> clearly aimed at novices and tries to hide many of the complexities of >>>> using optimization routines for curve fitting. I try to help people with >>>> questions about using `curve_fit` and `scipy.optimize` as well as `lmfit`. >>>> In trying to help people using `curve_fit`, I've noticed a disturbing >>>> trend. When a novice or even experienced user asks for help using >>>> `curve_fit` because a fit "doesn't work" there is a very good chance that >>>> they did not specify `p0` for the initial values and that the default >>>> behavior of setting all starting values to 1.0 will cause their fit to fail >>>> to converge, or really to even start. >>>> >>>> This failure is understandable to an experienced user, but apparently >>>> not to the novice. Curve-fitting problems are by nature local solvers and >>>> are always sensitive to initial values. For some problems, parameter values >>>> of 1.0 are simply inappropriate, and the numerical derivatives for some >>>> parameters near values of 1.0 will be 0. Indeed, there is no value that is >>>> always a reasonable starting value for all parameters. FWIW, in lmfit, we >>>> simply refuse to let a user run a curve-fitting problem without initial >>>> values. I believe that most other curve-fitting interfaces also require >>>> initial values for all parameters. >>>> >>>> Unfortunately, faced with no initial parameter values, `curve_fit` >>>> silently chooses initial values. It doesn't try to make an informed >>>> decision, it simply chooses '1.0', which can easily be so far off as to >>>> prevent a solution from being found. When this happens, `curve_fit` gives >>>> no information to the user of what the problem is. 
Indeed it allows >>>> initial values to not be set, giving the impression that they are not >>>> important. This impression is wrong: initial values are always important. >>>> `curve_fit` is mistaken in having a default starting value. >>>> >>>> I've made a Pull Request at https://github.com/scipy/scipy/pull/9701 >>>> to fix this misbehavior, so that `curve_fit()` requires starting values. >>>> It was suggested there that this topic should be discussed here. I'm happy >>>> to do so. It was suggested in the github Issue that forcing the user to >>>> give initial values was "annoying". It was also suggested that a >>>> deprecation cycle would be required. I should say that I don't >>>> actually use `curve_fit()` myself, I'm just trying to help make this commonly >>>> used routine be completely wrong less often. >>>> >>> >> >> I think making initial values compulsory is too much of a break with >> tradition. >> > > Well, it may be a break with the tradition of using > scipy.optimize.curve_fit, but I do not think it is a break with the > tradition of curve fitting. > Indeed, what curve_fit does when a user leaves `p0=None` is *not* to leave > the initial values unspecified -- the underlying optimization routine would > simply not accept that -- but rather to silently select values that are all > '1.0'. I am not aware of any other curve-fitting code or use of > non-linear optimization that does this. So, in a sense it is a > "traditional". It is also wrong. > scipy.stats distribution fit also default to ones if not overridden by the specific distribution. statsmodels only has a few models with arbitrary user functions, but I usually try to set a default that works at least in some common cases. def fitstart(self): #might not make sense for more general functions return np.zeros(self.exog.shape[1]) curve_fit was added to scipy as a convenience function, in contrast to the "serious" optimizers. For that I think putting in more effort to reduce the work by users is useful. (Note, I was never a fan of using `inspect` which is needed to know how many default starting values to create.) > > IMO, a warning and better documentation would be more appropriate. >> >> https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html >> does not show an example with starting values. >> curve_fit could issue a warning if p0 is not specified, or warn if >> convergence fails and p0 was not specified. >> >> I think it should also be possible to improve the default starting >> values, e.g. if the function fails or if bounds are provided. >> > > I think trying to guess starting values would require an understanding of > the function calculating the model, and is not generally solvable. > For sure, if one knows the function, initial guesses are possible. Lmfit > has this capability for a few commonly used model functions and those > initial guess are often very good. But it cannot be done in general. > > >> I'm not a user of curve_fit, but I guess there might be a strong >> selection bias in use cases when helping out users that run into problems. >> I have no idea what the overall selection of use cases is. A brief >> skimming of some github searches shows that many users don't specify the >> initial values. >> (A semi-random search result >> https://github.com/jmunroe/phys2820-fall2018/blob/e270b1533130b2b7acd0ec5da3edd9262b792da6/Lecture.13-Data-Analysis-and-Curve-Fitting.ipynb >> ) >> >> (Asides: >> The feedback I get about statsmodels are almost only for cases that >> "don't work", e.g. 
badly conditioned data, bad scaling of the data, corner >> case. Based on this feedback statsmodels optimization looks pretty bad, but >> this does not reflect that it works well in, say, 90% of the cases. >> However, unlike curve_fit, statsmodels has mostly models with predefined >> nonlinear functions which makes it easier to fit. >> ) >> >> > I do not know what the usage of `curve_fit` is. Apparently some users get > tripped up by not specifying initial values. But that is in the nature of > curve fitting -- initial values are necessary. Claiming that they do not > matter or are an unnecessary burden is just not correct. > > It seems like the first step is to change `curve_fit` to raise a warning > or print a message (not sure which is preferred) when `p0` is `None`, but > continue guessing `1.0`, at least for the time being. Eventually, this > could be changed to raise an exception if `p0` is `None`. > > Perhaps a middle step would be to change it to not guess `1.0` but a > number comprised of a uniformly selected random number between [-1, 1] for > the mantissa and a uniformly selected random integer between [-20, 20] for > the exponent, as long as bounds are respected? > > Cheers, > > --Matt Newville > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Thu Jan 24 14:19:09 2019 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 24 Jan 2019 14:19:09 -0500 Subject: [SciPy-Dev] curve_fit() should require initial values for parameters In-Reply-To: <20190124184544.s33zdmtecorrogtz@carbo> References: <20190124184544.s33zdmtecorrogtz@carbo> Message-ID: On Thu, Jan 24, 2019 at 1:46 PM Stefan van der Walt wrote: > Hi Josef, > > On Thu, 24 Jan 2019 11:26:09 -0500, josef.pktd at gmail.com wrote: > > I think making initial values compulsory is too much of a break with > > tradition. > > IMO, a warning and better documentation would be more appropriate. > > > https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html > > does not show an example with starting values. > > curve_fit could issue a warning if p0 is not specified, or warn if > > convergence fails and p0 was not specified. > > Isn't the greater danger that convergence succeeds, with p0 unspecified, > and the resulting model not being at all what the user had in mind? > Unless the optimization problem is globally convex, the user always needs to check the results. > > > I think it should also be possible to improve the default starting > values, > > e.g. if the function fails or if bounds are provided. > > This is the type of magic I hope we can avoid. Having different > execution paths based on some vaguely defined notion of perceived > failure seems dangerous at best. > I there is no guarantee for a global optimum, it's still what either the program or the user has to do. E.g. for statsmodels (very rough guess on numbers) 90% of the cases work fine 10% of the cases the data is not appropriate, singular, ill conditioned or otherwise "not nice" 10% of the cases the optimizer has problems and does not converge. In this last case either the program or the user needs to work more: We can try different optimizers, e.g. start with nelder-mead before switching to a gradient optimizer. Or, switch to global optimizer from scipy, if the underlying model is complex and might not be well behaved. 
> Cheers,
>
> --Matt Newville

From josef.pktd at gmail.com  Thu Jan 24 14:19:09 2019
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Thu, 24 Jan 2019 14:19:09 -0500
Subject: [SciPy-Dev] curve_fit() should require initial values for parameters
In-Reply-To: <20190124184544.s33zdmtecorrogtz@carbo>
References: <20190124184544.s33zdmtecorrogtz@carbo>
Message-ID:

On Thu, Jan 24, 2019 at 1:46 PM Stefan van der Walt wrote:

> Hi Josef,
>
> On Thu, 24 Jan 2019 11:26:09 -0500, josef.pktd at gmail.com wrote:
> > I think making initial values compulsory is too much of a break with
> > tradition.
> > IMO, a warning and better documentation would be more appropriate.
> > https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html
> > does not show an example with starting values.
> > curve_fit could issue a warning if p0 is not specified, or warn if
> > convergence fails and p0 was not specified.
>
> Isn't the greater danger that convergence succeeds, with p0 unspecified,
> and the resulting model not being at all what the user had in mind?

Unless the optimization problem is globally convex, the user always needs
to check the results.

> > I think it should also be possible to improve the default starting
> > values, e.g. if the function fails or if bounds are provided.
>
> This is the type of magic I hope we can avoid. Having different
> execution paths based on some vaguely defined notion of perceived
> failure seems dangerous at best.

If there is no guarantee of a global optimum, it's still what either the
program or the user has to do.

E.g. for statsmodels (a very rough guess on the numbers):
90% of the cases work fine;
10% of the cases the data is not appropriate, singular, ill conditioned or
otherwise "not nice";
10% of the cases the optimizer has problems and does not converge.

In this last case either the program or the user needs to work more:
we can try different optimizers, e.g. start with nelder-mead before
switching to a gradient optimizer; or switch to a global optimizer from
scipy, if the underlying model is complex and might not be well behaved;
or use the poor man's global optimizer: try out many different random or
semi-random starting values.
(And if all fails, go back to the drawing board and try to find a
parameterization that is better behaved.)

statsmodels is switching optimizers in some cases, but in most cases it is
up to the user to change the optimizer after a convergence failure.
However, we did select the default optimizers based on which scipy
optimizer seems to work well for the various cases.
Stata is also switching optimizers in some cases, and AFAIR has in some
cases an option to "try harder".
statsmodels is still missing an automatic "try harder" option that
automatically switches optimizers on convergence failure.

> > I'm not a user of curve_fit, but I guess there might be a strong
> > selection bias in use cases when helping out users that run into problems.
>
> I agree; and I think this can be accomplished by better documentation,
> helpful warnings, and assisting the user in choosing correct parameters.

The main question for me is whether the warnings and improved
documentation are enough, or whether curve_fit needs to force every user
to specify the starting values.

i.e. I think
"Try automatic first, and if that does not succeed, then the user has to
think again"
is more convenient than
"you have to think about your problem first, don't just hit the button".

Josef

> Best regards,
> Stéfan

From stefanv at berkeley.edu  Thu Jan 24 15:01:06 2019
From: stefanv at berkeley.edu (Stefan van der Walt)
Date: Thu, 24 Jan 2019 12:01:06 -0800
Subject: [SciPy-Dev] curve_fit() should require initial values for parameters
In-Reply-To:
References: <20190124184544.s33zdmtecorrogtz@carbo>
Message-ID: <20190124200106.5eq44beu6sekwb7s@carbo>

On Thu, 24 Jan 2019 14:19:09 -0500, josef.pktd at gmail.com wrote:
> i.e. I think
> "Try automatic first, and if that does not succeed, then the user has to
> think again"
> is more convenient than
> "you have to think about your problem first, don't just hit the button".

My question is: can you ever know with certainty when you did not
succeed? In many cases, you'll probably think you did fine?

But consider, e.g., an alternative implementation that tries 5 different
fitting methods and then picks the solution with the lowest error. That,
while not computationally optimal, is a process you can explain clearly.

In this case, how do you communicate to the user through which steps
their data went to obtain the returned result? This is what I mean by
avoiding the magic: making sure the user knows where their results came
from. I am not opposed to convenience, as long as there is clear
communication.

Stéfan

From josef.pktd at gmail.com  Thu Jan 24 15:35:14 2019
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Thu, 24 Jan 2019 15:35:14 -0500
Subject: [SciPy-Dev] curve_fit() should require initial values for parameters
In-Reply-To: <20190124200106.5eq44beu6sekwb7s@carbo>
References: <20190124184544.s33zdmtecorrogtz@carbo>
 <20190124200106.5eq44beu6sekwb7s@carbo>
Message-ID:

On Thu, Jan 24, 2019 at 3:01 PM Stefan van der Walt wrote:
> On Thu, 24 Jan 2019 14:19:09 -0500, josef.pktd at gmail.com wrote:
> > i.e. I think
> > "Try automatic first, and if that does not succeed, then the user has
> > to think again"
> > is more convenient than
> > "you have to think about your problem first, don't just hit the button".
>
> My question is: can you ever know with certainty when you did not
> succeed? In many cases, you'll probably think you did fine?
>
> But consider, e.g., an alternative implementation that tries 5 different
> fitting methods and then picks the solution with the lowest error. That,
> while not computationally optimal, is a process you can explain clearly.
>
> In this case, how do you communicate to the user through which steps
> their data went to obtain the returned result? This is what I mean by
> avoiding the magic: making sure the user knows where their results came
> from. I am not opposed to convenience, as long as there is clear
> communication.

Trying out different starting values and checking convergence is similar
to what global optimizers like basinhopping do.

curve_fit has optional returns which are not mentioned in the docstring;
there is no description of `full_output`.

Any systematic method, like "try again if convergence failed", can be
described.
The actual search path -- which methods have been tried -- can be included
in some `fit_history` in the `full_output`.
(E.g. statsmodels includes in most models some extra information about the
last optimization method used, including which starting values were used,
which optimizer, and similar. AFAIR, we don't return information about
earlier, preliminary optimization steps used to get good starting values
for the final optimization.)

I never tried whether leastsq can be used inside basinhopping, and I have
not tried the new global optimizers yet. There would be options to make it
work for more cases without requiring users to specify starting values,
and possibly to switch to fancier optimizers.

(I'm mainly arguing this position because curve_fit was initially sold as
a function that is more convenient to use than leastsq, i.e. it should put
less demand on the user than just using the underlying optimizers
directly.)

Josef

> Stéfan

From stefanv at berkeley.edu  Thu Jan 24 15:54:06 2019
From: stefanv at berkeley.edu (Stefan van der Walt)
Date: Thu, 24 Jan 2019 12:54:06 -0800
Subject: [SciPy-Dev] curve_fit() should require initial values for parameters
In-Reply-To:
References: <20190124184544.s33zdmtecorrogtz@carbo>
 <20190124200106.5eq44beu6sekwb7s@carbo>
Message-ID: <20190124205406.jv6whpdurb6cho4y@carbo>

On Thu, 24 Jan 2019 15:35:14 -0500, josef.pktd at gmail.com wrote:
> Any systematic method, like "try again if convergence failed", can be
> described.
> The actual search path, which methods have been tried, can be included in
> some `fit_history` in the `full_output`.

I could go along with that, if the user then has a clear path towards
restarting the procedure using only the latest method + initial
parameters.

And then, as you say, the function could be marketed as more of a black
box solution. I don't usually recommend that people use black boxes, but I
guess I can be convinced that they have a place in certain use cases.
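A rough sketch of the kind of bookkeeping being discussed -- a
hypothetical wrapper, not an existing scipy API (`curve_fit_verbose` and
`fit_history` are made-up names):

    import numpy as np
    from scipy.optimize import curve_fit

    def curve_fit_verbose(f, xdata, ydata, p0_candidates, **kws):
        # try several starting values, record every attempt, and return
        # the best result together with the full history
        best, fit_history = None, []
        for p0 in p0_candidates:
            try:
                popt, pcov = curve_fit(f, xdata, ydata, p0=p0, **kws)
                err = np.sum((f(xdata, *popt) - ydata) ** 2)
                fit_history.append({'p0': p0, 'popt': popt, 'error': err})
            except RuntimeError:  # curve_fit raises this on non-convergence
                fit_history.append({'p0': p0, 'popt': None, 'error': np.inf})
            if best is None or fit_history[-1]['error'] < best['error']:
                best = fit_history[-1]
        return best, fit_history

The history would then give the user the clear restart path: the winning
starting values are right there in the record.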
Best regards,
Stéfan

From djpine at gmail.com  Thu Jan 24 16:21:33 2019
From: djpine at gmail.com (David J Pine)
Date: Fri, 25 Jan 2019 06:21:33 +0900
Subject: [SciPy-Dev] curve_fit() should require initial values for parameters
In-Reply-To:
References: <20190124044143.ug7zhhxvx65yxsyc@carbo>
 <20190124070738.fzvgyc3tg7kfjcm3@carbo>
Message-ID:

Five years ago I submitted a linear fitting routine for SciPy so that
people could have an alternative to curve_fit when they wanted to fit a
linear model. Here it is:

https://github.com/djpine/linfit

Some people felt it was redundant with the linear regression routines in
scipy.stats. However, the linear regression routine in scipy.stats does
not offer the same capabilities as curve_fit for including error
estimation (data weighting), so it may not meet the needs of users who
typically use curve_fit. The above routine is meant to remedy that.

pine at nyu.edu

On Thu, Jan 24, 2019 at 7:04 PM Nils Geib wrote:

> Just one small remark: in my experience many users utilize curve_fit to
> fit linear models (models that are linear in the fit parameters, e.g.
> f(x) = a * exp(-x) + b * exp(-x^2)). In this case the linear least
> squares problem can be solved uniquely without providing initial values
> for a and b. Even though curve_fit employs a general nonlinear least
> squares solver, it should be able to find the global minimum for
> virtually all starting values (although I have not comprehensively
> tested it). On the other hand, when the model is truly nonlinear, the
> default initial values of curve_fit fail with very high probability.
>
> One may argue that using curve_fit for such cases is not the right
> choice. However, scipy does not provide a convenient interface such as
> curve_fit for linear models (as far as I know). And users who are not
> aware that a nonlinear least squares solver requires good initial values
> may also not be aware of the difference between linear and nonlinear
> models. (When I teach, I regularly encounter the misconception that
> models that are nonlinear in the independent variable, e.g. x, require a
> nonlinear solver.)
>
> Long story short: when deprecating the initial choice for p0, it may be
> worthwhile to also consider providing a convenient interface for
> lsq_linear that is similar to curve_fit and does not require initial
> values.
>
> Cheers
> Nils
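To make that point concrete: for a model that is linear in its parameters,
the least-squares solution needs no starting values at all. A minimal
sketch using the example model quoted above (illustrative only, not an
existing or proposed interface):

    import numpy as np

    x = np.linspace(0, 5, 50)
    y = 2.0 * np.exp(-x) + 0.5 * np.exp(-x**2)   # plus noise, in practice

    # f(x) = a*exp(-x) + b*exp(-x^2): build the design matrix, solve directly
    A = np.column_stack([np.exp(-x), np.exp(-x**2)])
    (a, b), residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)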
From stefanv at berkeley.edu  Thu Jan 24 20:01:37 2019
From: stefanv at berkeley.edu (Stefan van der Walt)
Date: Thu, 24 Jan 2019 17:01:37 -0800
Subject: [SciPy-Dev] curve_fit() should require initial values for parameters
In-Reply-To:
References: <20190124044143.ug7zhhxvx65yxsyc@carbo>
 <20190124070738.fzvgyc3tg7kfjcm3@carbo>
Message-ID: <20190125010137.lmovf4kbvv3odcnu@carbo>

On Fri, 25 Jan 2019 06:21:33 +0900, David J Pine wrote:
> Five years ago I submitted a linear fitting routine for SciPy so that
> people could have an alternative to curve_fit when they wanted to fit a
> linear model. Here it is:
>
> https://github.com/djpine/linfit

If we implement a higher-level `optimize`-like API, it may be worth
switching out between implementations such as these.

Interesting reading along these lines: "Data analysis recipes: Fitting a
model to data" by Hogg et al., https://arxiv.org/abs/1008.4686, which also
discusses rejection of data outliers. SciPy doesn't currently have any
rejection methods included, as far as I am aware.

Stéfan

From jose-marcio.martins at mines-paristech.fr  Fri Jan 25 05:10:44 2019
From: jose-marcio.martins at mines-paristech.fr (Jose-Marcio Martins da Cruz)
Date: Fri, 25 Jan 2019 11:10:44 +0100
Subject: [SciPy-Dev] ANN: SMIL - Small Morphological Image Library
Message-ID: <1cf239d6-731b-becd-16e2-f23bfd9c1385@mines-paristech.fr>

Hi all,

I'm the maintainer of SMIL and I'd like to eventually integrate SMIL into
SciPy or scikit-image.

Announcement: SMIL 0.9.1
========================

I'm pleased to announce SMIL - Simple Morphological Image Library - v. 0.9.1.

SMIL is a library with all basic and some advanced mathematical morphology
features, which can be extended with plugins and user modules.

Among its features, it can handle 2D and 3D images and can exchange data
with NumPy.

It's been developed in C++ and has a Python interface thanks to SWIG.

SMIL is a product of CMM, the research Center of Mathematical Morphology
of Mines-Paristech, where the discipline of Mathematical Morphology was
created in the 60's by Jean Serra and Georges Matheron.

SMIL is distributed with a GPL license.

We use SMIL in our research and teaching activities in the field.

You can find SMIL - binaries and documentation - at our web site:

http://smil.cmm.mines-paristech.fr

or the source code at:

https://github.com/ensmp-cmm/smil

Thanks

Jose-Marcio

--
---------------------------------------------------------------
Jose Marcio MARTINS DA CRUZ, Ph.D.
Ecole des Mines de Paris
CMM - Centre de Morphologie Mathématique
---------------------------------------------------------------
Spam : http://amzn.to/LEscRu ou http://bit.ly/SpamJM
---------------------------------------------------------------

From deil.christoph at googlemail.com  Fri Jan 25 06:47:12 2019
From: deil.christoph at googlemail.com (Christoph Deil)
Date: Fri, 25 Jan 2019 12:47:12 +0100
Subject: [SciPy-Dev] curve_fit() should require initial values for parameters
In-Reply-To:
References:
Message-ID: <125E9660-A357-4B38-A2A0-F64967B8EC23@googlemail.com>

Dear Matt, Josef, all,

Here's my 2 cents.

> On 24. Jan 2019, at 17:26, josef.pktd at gmail.com wrote:
>
> IMO, a warning and better documentation would be more appropriate.
> https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html
> does not show an example with starting values.
> curve_fit could issue a warning if p0 is not specified, or warn if
> convergence fails and p0 was not specified.

I think a better docstring for curve_fit would be good, mentioning the
importance of specifying p0 and that the default is ones.
Also +1 to add a mention of lmfit somewhere in the scipy.optimize docs,
recommending it to users as a high-level package that builds directly on
scipy.optimize, providing extra functionality and conveniences.

-1 on making the change to require p0 to be passed, and starting a
deprecation process to eventually give an error on this in a few years.
There are cases and existing scripts that work just fine with the current
default. IMO they should continue to work (even without a warning) in
scipy 1.x, just like they do in the current scipy 1.1.

> I think it should also be possible to improve the default starting
> values, e.g. if the function fails or if bounds are provided.

-1 on adding complex logic and guess-and-try code for p0 to curve_fit.
scipy.optimize is pretty low-level; adding this kind of "convenience",
which is hard to explain, document, and maintain, seems a bit out of place
to me.
Also, any change in how p0 is chosen will likely change the results that
come out for some use cases, so for the same reason mentioned above
(stability within the scipy 1.x series) I'm -1 on changing this.

Christoph

From ilhanpolat at gmail.com  Fri Jan 25 10:11:56 2019
From: ilhanpolat at gmail.com (Ilhan Polat)
Date: Fri, 25 Jan 2019 16:11:56 +0100
Subject: [SciPy-Dev] curve_fit() should require initial values for parameters
In-Reply-To: <125E9660-A357-4B38-A2A0-F64967B8EC23@googlemail.com>
References: <125E9660-A357-4B38-A2A0-F64967B8EC23@googlemail.com>
Message-ID:

As I commented on the GitHub issue, we are not supposed to educate users
through keywords and default values in a passive manner. A clear "you need
to choose p0 wisely to use this tool" message in the documentation would
be warning enough.

From phillip.m.feldman at gmail.com  Fri Jan 25 12:16:10 2019
From: phillip.m.feldman at gmail.com (Phillip Feldman)
Date: Fri, 25 Jan 2019 09:16:10 -0800
Subject: [SciPy-Dev] ANN: SMIL - Small Morphological Image Library
In-Reply-To: <1cf239d6-731b-becd-16e2-f23bfd9c1385@mines-paristech.fr>
References: <1cf239d6-731b-becd-16e2-f23bfd9c1385@mines-paristech.fr>
Message-ID:

The documentation page (at
http://smil.cmm.mines-paristech.fr/wiki/doku.php/doc/start) is currently
empty.
From Jose-Marcio.Martins at mines-paristech.fr  Fri Jan 25 13:58:56 2019
From: Jose-Marcio.Martins at mines-paristech.fr (Jose-Marcio Martins da Cruz)
Date: Fri, 25 Jan 2019 19:58:56 +0100
Subject: [SciPy-Dev] ANN: SMIL - Small Morphological Image Library
In-Reply-To:
References: <1cf239d6-731b-becd-16e2-f23bfd9c1385@mines-paristech.fr>
Message-ID: <4629d318-6bad-4b31-27d6-84fbc4428de4@mines-paristech.fr>

On 1/25/19 6:16 PM, Phillip Feldman wrote:
> The documentation page (at
> http://smil.cmm.mines-paristech.fr/wiki/doku.php/doc/start) is currently
> empty.

Please follow the links in the left menu: all links are OK; only "Running
SMIL under Python" still lacks some content.
From phillip.m.feldman at gmail.com  Fri Jan 25 15:39:45 2019
From: phillip.m.feldman at gmail.com (Phillip Feldman)
Date: Fri, 25 Jan 2019 12:39:45 -0800
Subject: [SciPy-Dev] ANN: SMIL - Small Morphological Image Library
In-Reply-To: <4629d318-6bad-4b31-27d6-84fbc4428de4@mines-paristech.fr>
References: <1cf239d6-731b-becd-16e2-f23bfd9c1385@mines-paristech.fr>
 <4629d318-6bad-4b31-27d6-84fbc4428de4@mines-paristech.fr>
Message-ID:

When I click on the Tutorials link, I get the message "This topic does not
exist yet". It would also be good to have 3-4 sentences of top-level
explanation about what this package does and why someone would want to use
it.
From newville at cars.uchicago.edu  Sun Jan 27 21:06:18 2019
From: newville at cars.uchicago.edu (Matt Newville)
Date: Sun, 27 Jan 2019 20:06:18 -0600
Subject: [SciPy-Dev] curve_fit() should require initial values for parameters
In-Reply-To:
References: <20190124184544.s33zdmtecorrogtz@carbo>
Message-ID:

Hi All,

On Thu, Jan 24, 2019 at 1:20 PM wrote:

> On Thu, Jan 24, 2019 at 1:46 PM Stefan van der Walt wrote:
>
>> Hi Josef,
>>
>> On Thu, 24 Jan 2019 11:26:09 -0500, josef.pktd at gmail.com wrote:
>> > I think making initial values compulsory is too much of a break with
>> > tradition.
>> > IMO, a warning and better documentation would be more appropriate.
>> > https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html
>> > does not show an example with starting values.
>> > curve_fit could issue a warning if p0 is not specified, or warn if
>> > convergence fails and p0 was not specified.
>>
>> Isn't the greater danger that convergence succeeds, with p0 unspecified,
>> and the resulting model not being at all what the user had in mind?
>
> Unless the optimization problem is globally convex, the user always
> needs to check the results.
>
>> > I think it should also be possible to improve the default starting
>> > values, e.g. if the function fails or if bounds are provided.
>>
>> This is the type of magic I hope we can avoid. Having different
>> execution paths based on some vaguely defined notion of perceived
>> failure seems dangerous at best.
>
> If there is no guarantee of a global optimum, it's still what either the
> program or the user has to do.
>
> E.g. for statsmodels (a very rough guess on the numbers):
> 90% of the cases work fine;
> 10% of the cases the data is not appropriate, singular, ill conditioned
> or otherwise "not nice";
> 10% of the cases the optimizer has problems and does not converge.
>
> In this last case either the program or the user needs to work more:
> we can try different optimizers, e.g. start with nelder-mead before
> switching to a gradient optimizer; or switch to a global optimizer from
> scipy, if the underlying model is complex and might not be well behaved;
> or use the poor man's global optimizer: try out many different random or
> semi-random starting values.
> (And if all fails, go back to the drawing board and try to find a
> parameterization that is better behaved.)
>
> statsmodels is switching optimizers in some cases, but in most cases it
> is up to the user to change the optimizer after a convergence failure.
> However, we did select the default optimizers based on which scipy
> optimizer seems to work well for the various cases.
> Stata is also switching optimizers in some cases, and AFAIR has in some
> cases an option to "try harder".
> statsmodels is still missing an automatic "try harder" option that
> automatically switches optimizers on convergence failure.
>
>> > I'm not a user of curve_fit, but I guess there might be a strong
>> > selection bias in use cases when helping out users that run into
>> > problems.
>>
>> I agree; and I think this can be accomplished by better documentation,
>> helpful warnings, and assisting the user in choosing correct parameters.
>
> The main question for me is whether the warnings and improved
> documentation are enough, or whether curve_fit needs to force every user
> to specify the starting values.

I may not be understanding what you say about statsmodels. Is that using
or related to `curve_fit()`? Perhaps it works well in many cases for you
because of the limited range of the probability distribution functions
being fitted?

My view on this starts from the fact that initial values are actually
required in non-linear optimization. In a sense, not "forcing every user
to specify starting values" and silently replacing `None` with
`np.ones(n_variables)` is misinforming the user. I cannot think of any
reason to recommend this behavior. It will certainly fail spectacularly
sometimes. I would not try to guess (or probably believe anyone else's
guess ;)) how often this would happen, but I can tell you that for
essentially all of the fitting I do, and that my applications do for other
users, giving initial values of 1 for all parameters would fail in such a
way as to not move past the initial values (that is, "not work" in a way
that might easily confuse a novice). Again, I do not use `curve_fit()`,
but clearly `p0=None` fails often enough to cause confusion.

> i.e. I think
> "Try automatic first, and if that does not succeed, then the user has to
> think again"
> is more convenient than
> "you have to think about your problem first, don't just hit the button".

The problem I have with this is that there really is not an option to "try
automatic first". There is "try `np.ones(n_variables)` first". This, or
any other value, is really not a defensible choice for starting values.
Starting values always depend on the function used and the data being fit.

The user of `curve_fit` already has to provide data (about which they
presumably know something) and write a function that models that data. I
think that qualifies as "has to think about their problem". They should be
able to make some guess ("prior belief") of the parameter values.
Hopefully they will run their modelling function with some sensible values
for the parameters before running `curve_fit`, to make sure that their
function runs correctly.

Currently `curve_fit` converts `p0=None` to `np.ones(n_variables)` without
warning or explanation. Again, I do not use `curve_fit()` myself; I find
several aspects of it unpleasant. But this behavior strikes me as utterly
wrong and a disservice to the scipy ecosystem. I do not think that a
documentation change is sufficient. I can believe a deprecation time would
be reasonable, but I would hope this behavior could be removed.
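To illustrate the "fail in such a way as to not move past the initial
values" failure mode, here is a representative sketch (the specific
numbers are made up for illustration):

    import numpy as np
    from scipy.optimize import curve_fit

    def gaussian(x, amp, cen, wid):
        return amp * np.exp(-(x - cen)**2 / (2 * wid**2))

    x = np.linspace(0, 100, 201)
    y = gaussian(x, 5.0, 60.0, 3.0) + np.random.normal(0, 0.2, x.size)

    # with the silent default p0 = [1, 1, 1], the model is ~0 over nearly
    # all of [0, 100], the gradient with respect to `cen` is ~0, and the
    # fit typically stalls at or near the starting values (or errors out)
    popt, pcov = curve_fit(gaussian, x, y)

    # with data-appropriate starting values, the fit converges readily
    popt, pcov = curve_fit(gaussian, x, y, p0=(y.max(), 50.0, 5.0))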
--Matt Newville

From jose-marcio.martins at mines-paristech.fr  Mon Jan 28 05:25:03 2019
From: jose-marcio.martins at mines-paristech.fr (Jose-Marcio Martins da Cruz)
Date: Mon, 28 Jan 2019 11:25:03 +0100
Subject: [SciPy-Dev] ANN: SMIL - Small Morphological Image Library
In-Reply-To:
References: <1cf239d6-731b-becd-16e2-f23bfd9c1385@mines-paristech.fr>
 <4629d318-6bad-4b31-27d6-84fbc4428de4@mines-paristech.fr>
Message-ID: <21bcb708-2027-ee35-3578-b9dfcc2f017d@mines-paristech.fr>

On 25/01/2019 21:39, Phillip Feldman wrote:
> When I click on the Tutorials link, I get the message "This topic does
> not exist yet". It would also be good to have 3-4 sentences of top-level
> explanation about what this package does and why someone would want to
> use it.

Thanks for the suggestion. The Python part of the documentation is still
under construction. The documentation of the C++ part is OK, thanks to
Doxygen. The Python part is being built based on it, with the same
structure, but that isn't enough. Hopefully, all this will be done as soon
as possible.

The source code is still being improved with more advanced algorithms,
like stochastic watershed, bilateral filtering, and so on.
--
---------------------------------------------------------------
Jose Marcio MARTINS DA CRUZ, Ph.D.
Ecole des Mines de Paris
CMM - Centre de Morphologie Mathématique
---------------------------------------------------------------
Spam : http://amzn.to/LEscRu ou http://bit.ly/SpamJM
---------------------------------------------------------------

From stefanv at berkeley.edu  Mon Jan 28 18:44:38 2019
From: stefanv at berkeley.edu (Stefan van der Walt)
Date: Mon, 28 Jan 2019 15:44:38 -0800
Subject: [SciPy-Dev] ANN: SMIL - Small Morphological Image Library
In-Reply-To: <1cf239d6-731b-becd-16e2-f23bfd9c1385@mines-paristech.fr>
References: <1cf239d6-731b-becd-16e2-f23bfd9c1385@mines-paristech.fr>
Message-ID: <20190128234438.iuugwpzx5nlvw4ni@carbo>

Hi Jose-Marcio,

On Fri, 25 Jan 2019 11:10:44 +0100, Jose-Marcio Martins da Cruz wrote:
> I'm pleased to announce SMIL - Simple Morphological Image Library -
> v. 0.9.1

Neat and compact little library; congratulations on the release!

> SMIL is distributed with a GPL license.

Note that the website currently states BSD.

Is there any reason you are not using standard NumPy arrays to represent
data, or at least returning those from the Python API? E.g., the following
code would feel very foreign to someone used to Python:

    >>> im3D = sp.Image(im2.getWidth(), im2.getHeight(), 3)
    >>> im3D << 0
    >>> sp.copy(im1, 0, 256, im3D)
    >>> sp.copy(im3, im3D, 0, 0, 2)

Usage of the `lena` image is somewhat frowned upon nowadays, so you may
want to consider replacing it with another image. In skimage, we use
astronaut Eileen Collins:

https://github.com/scikit-image/scikit-image/blob/v0.14.2/skimage/data/astronaut.png

Best regards,
Stéfan

From ilhanpolat at gmail.com  Tue Jan 29 11:53:37 2019
From: ilhanpolat at gmail.com (Ilhan Polat)
Date: Tue, 29 Jan 2019 17:53:37 +0100
Subject: [SciPy-Dev] curve_fit() should require initial values for parameters
In-Reply-To:
References: <20190124184544.s33zdmtecorrogtz@carbo>
Message-ID:

> The problem I have with this is that there really is not an option to
> "try automatic first". There is "try `np.ones(n_variables)` first". This,
> or any other value, is really not a defensible choice for starting
> values. Starting values always depend on the function used and the data
> being fit.

Why not? Ones are as good as any other choice. I don't know anything about
the curve fit I will get in the end, so I don't need to pretend that I
know a good starting value. Maybe for 3-parameter functions, fine, I can
come up with an argument, but you surely don't expect me to know the
starting point if I am fitting a 7-parameter function involving an
esoteric structure. At that point I am completely ignorant about anything
about this function. So not knowing where to start is not due to my
noviceness about the tools, but by definition. My search might even turn
out to be convex, so the initial value won't matter.

> Currently `curve_fit` converts `p0=None` to `np.ones(n_variables)`
> without warning or explanation. Again, I do not use `curve_fit()` myself.
> I find several aspects of it unpleasant.

It is documented in the p0 argument docs. I am using this function quite
often; that's why I don't like extra required arguments. It's annoying to
enter some random array just to please the API when I know that I am just
taking a shot in the dark.

I am pretty confident that if we force this argument, most of the people
you want to educate will enter np.zeros(n). Then they will get an even
weirder error; then they'll try np.ones(n) but misremember n, and they'll
get yet another error about the number of function parameters -- which has
already tripped them up twice.
This curve_fit function is one of those functions that you don't run just
once and be done with, but over and over again until you give up or are
satisfied. Hence defaults matter a lot from a UX perspective. "If you have
an initial value in mind, fine, enter it; otherwise let me do my thing" is
much better than "I don't care about your quick experiment, give me some
values or I will keep tripping up".

> But this behavior strikes me as utterly wrong and a disservice to the
> scipy ecosystem. I do not think that a documentation change is
> sufficient.

Maybe a bit overzealous?
From Jose-Marcio.Martins at mines-paristech.fr  Tue Jan 29 14:17:32 2019
From: Jose-Marcio.Martins at mines-paristech.fr (Jose-Marcio Martins da Cruz)
Date: Tue, 29 Jan 2019 20:17:32 +0100
Subject: [SciPy-Dev] ANN: SMIL - Small Morphological Image Library
In-Reply-To: <20190128234438.iuugwpzx5nlvw4ni@carbo>
References: <1cf239d6-731b-becd-16e2-f23bfd9c1385@mines-paristech.fr>
 <20190128234438.iuugwpzx5nlvw4ni@carbo>
Message-ID:

Hi Stefan,

On 1/29/19 12:44 AM, Stefan van der Walt wrote:
> Neat and compact little library; congratulations on the release!

Thanks,

> Note that the website currently states BSD.

Sorry, you're right!

> Is there any reason you are not using standard NumPy arrays to represent
> data, or at least returning those from the Python API?

For the moment, you can go back and forth from NumPy. See below. Some
internal users are asking for this, but...

The library was optimized to have better performance with data in the
usual matrix structure. We could surely rewrite all this, as was done by
OpenCV, but there would surely be a performance loss. NumPy data
structures are very flexible, at the cost of some complexity. But we're
thinking about how to implement it in some clever way.

Sorry, this part isn't yet documented on the web site:

* to create an image from numpy data : im.fromNumArray(npArr)
* to get a numpy pointer from an Image : npArr = im.getNumArray()

Related to this, there are floating point images. In Morphology, we don't
need them, as only ordered sets matter, not the real values. Some people
(medical or materials) handling 3D images from scanners convert them to 32
bits before doing Morpho operations on them. But we need to have an answer
to this.

> Usage of the `lena` image is somewhat frowned upon nowadays, so you may
> want to consider replacing it with another image. In skimage, we use
> astronaut Eileen Collins.

Thanks. This is an interesting image. I'll add some examples with this.

Thanks a lot for your comments.

Best regards

José-Marcio
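Based on the two methods named above, the round trip would look something
like this (an untested sketch; the module name `smilPython` and the exact
call signatures are assumptions):

    import numpy as np
    import smilPython as sp    # module name is an assumption

    npArr = np.zeros((256, 256), dtype=np.uint8)

    im = sp.Image(256, 256)
    im.fromNumArray(npArr)     # fill a SMIL image from numpy data

    view = im.getNumArray()    # numpy view of the SMIL image buffer
    view[...] = 255            # changes should be visible to SMIL directly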
From tyler.je.reddy at gmail.com  Tue Jan 29 14:47:36 2019
From: tyler.je.reddy at gmail.com (Tyler Reddy)
Date: Tue, 29 Jan 2019 11:47:36 -0800
Subject: [SciPy-Dev] SciPy 1.0 paper writing proposal
In-Reply-To:
References: <20180126074429.3csbkpsg7hbme22g@fastmail.com>
 <20180130195725.GC20306@lakota>
Message-ID:

Just a reminder that the manuscript for the SciPy 1.0 paper is starting to
look more mature now, and help with cleanup / improvement tasks is always
appreciated. There's a checklist of some revisions
(https://github.com/scipy/scipy-articles/issues/65), but maybe we don't
need all of those, depending on what people think.

Some people have expressed concerns about the density of the manuscript,
which was somewhat unavoidable for a project of this size. We have well
over double the normally-suggested citation limit for Scientific Reports,
etc. That may be ok in the end, depending on editorial decisions.
Constructive suggestions for dealing with that (smart use of supporting
information, etc.) are also welcome.

Best wishes,
Tyler

On Tue, 20 Mar 2018 at 12:57, Warren Weckesser wrote:

> On Tue, Mar 20, 2018 at 11:24 AM, Ralf Gommers wrote:
>
>> On Sun, Mar 18, 2018 at 1:07 PM, Scott Sievert wrote:
>>
>>> What updates are there on this? Ralf said mid-April was a submission
>>> target, but I haven't seen anything yet. I'm more than happy to write
>>> some on my contribution.
>>
>> We've decided on the document outline
>> (https://github.com/scipy/scipy-articles/pull/14) and the first section
>> drafts have started to appear (thanks Tyler and Matt!):
>> https://github.com/scipy/scipy-articles/issues/13
>> https://github.com/scipy/scipy-articles/pull/15
>>
>> However, you're right that we're a bit slow. Finishing a first draft by
>> mid-April could still be feasible if people start writing asap, but we
>> won't make a submission by then.
>>
>> https://github.com/scipy/scipy-articles/issues/9 contains the sections
>> and the people who volunteered to author them. There are still some
>> sections that need a volunteer; especially the technical sections could
>> be written by people other than the main developer/maintainer -- if
>> you're confident you could (co-)write a section, feel free to comment
>> there and jump in.
>>
>> Cheers,
>> Ralf
>
> I'll be able to help with writing on `stats` and `signal` (and possibly
> other areas, if needed).
>
> Warren

From andyfaff at gmail.com  Tue Jan 29 15:54:17 2019
From: andyfaff at gmail.com (Andrew Nelson)
Date: Wed, 30 Jan 2019 07:54:17 +1100
Subject: [SciPy-Dev] Better __repr__ for scipy.stats distributions
Message-ID:

I'm using various rv_continuous distributions as priors for a Bayesian
analysis package I'm writing. Part of the GUI functionality is to produce
code fragments from the GUI state, to encourage scientific
reproducibility: i.e. set up the analysis objects in a GUI, then output
code that could be run from a script.

I'm making a lot of use of __repr__ to produce text that can be used to
build the objects. However, I'm stalled at recreating the scipy.stats
distributions, and would like to propose that the __repr__ of these
classes be improved to the point where the repr can be used to reproduce
the distribution.

e.g. instead of:

    >>> from scipy.stats import norm
    >>> print(repr(norm(0, 1)))
    <scipy.stats._distn_infrastructure.rv_frozen object at 0x...>

one would get:

    norm(0, 1)

This would require that all the initialisation variables be accessible
from the object (which I think was discussed recently, if I remember
rightly).

Would others find this useful?
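A sketch of what that could look like, using rv_frozen's existing `dist`,
`args` and `kwds` attributes (illustrative only, not a finished patch):

    def _frozen_repr(self):
        # reconstruct e.g. "norm(0, 1)" from the frozen distribution's state
        args = [repr(a) for a in self.args]
        args += ['%s=%r' % (k, v) for k, v in self.kwds.items()]
        return '%s(%s)' % (self.dist.name, ', '.join(args))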
Andrew.

--
_____________________________________
Dr. Andrew Nelson
_____________________________________

From stefanv at berkeley.edu  Tue Jan 29 16:30:03 2019
From: stefanv at berkeley.edu (Stefan van der Walt)
Date: Tue, 29 Jan 2019 13:30:03 -0800
Subject: [SciPy-Dev] Better __repr__ for scipy.stats distributions
In-Reply-To:
References:
Message-ID: <20190129213003.vigtuzae6d22jo6d@carbo>

On Wed, 30 Jan 2019 07:54:17 +1100, Andrew Nelson wrote:
> ... would like to propose that the __repr__ of
> these classes be improved to the point where the repr can be used to
> reproduce the distribution.
>
> e.g. instead of:
>
>     >>> from scipy.stats import norm
>     >>> print(repr(norm(0, 1)))
>     <scipy.stats._distn_infrastructure.rv_frozen object at 0x...>
>
> one would get:
>     norm(0, 1)

That seems like a very reasonable request. `scikit-learn` does something
similar:

    In [6]: br = linear_model.BayesianRidge()

    In [7]: br
    Out[7]:
    BayesianRidge(alpha_1=1e-06, alpha_2=1e-06, compute_score=False,
                  copy_X=True, fit_intercept=True, lambda_1=1e-06,
                  lambda_2=1e-06, n_iter=300, normalize=False, tol=0.001,
                  verbose=False)

The only concern I can see may be about the internal state of the RNG,
which keeps changing after values are generated.

Stéfan

From stefanv at berkeley.edu  Tue Jan 29 20:29:42 2019
From: stefanv at berkeley.edu (Stefan van der Walt)
Date: Tue, 29 Jan 2019 17:29:42 -0800
Subject: [SciPy-Dev] curve_fit() should require initial values for parameters
In-Reply-To:
References: <20190124184544.s33zdmtecorrogtz@carbo>
Message-ID: <20190130012942.rzb4btcasmrr5pcr@carbo>

On Tue, 29 Jan 2019 17:53:37 +0100, Ilhan Polat wrote:
> > The problem I have with this is that there really is not an option to
> > "try automatic first". There is "try `np.ones(n_variables)` first".
> > This, or any other value, is really not a defensible choice for
> > starting values. Starting values always depend on the function used
> > and the data being fit.
>
> Why not? Ones are as good as any other choice.

This entire thread is about why np.ones is not a good choice. It is not
"as good as any other" in any particular case, and is patently wrong for
most problems. Sure, each problem has its own "better" and "worse" initial
values, so if you are trying to optimize over all possible cases, it may
be as good a choice as any other. But the question here is exactly about
whether that is a sensible thing to do. Matt argues that it is not, and I
find his argument convincing.

The current API leads the user to believe that, without specifying p0,
they are likely to get a reasonable answer out. We can help our users by
signaling to them that this is simply not the case.

Best regards,
Stéfan

From blairuk at gmail.com  Wed Jan 30 02:46:55 2019
From: blairuk at gmail.com (Blair Azzopardi)
Date: Wed, 30 Jan 2019 07:46:55 +0000
Subject: [SciPy-Dev] Build failures (PR 9523)
Message-ID:

Hi all

I recently refreshed my PR with the latest master and have some build
failures that I'm not sure how to handle.

One appears to be an issue with a new Azure pipeline check. This produces
some cryptic messages, including PowerShell failure codes.

https://github.com/scipy/scipy/pull/9523/checks?check_run_id=56360311

Another is a typical spurious error on Travis, where it fails to download
a Python 3.5 tarball. I tested the link and the tarball exists, so it
might have been caused by an intermittent network issue.

https://travis-ci.org/scipy/scipy/jobs/486160253

Any suggestions? It would be useful if a PR owner were able to re-trigger
the Travis build somehow.
Perhaps that's not possible.

Kind regards
Blair

From ilhanpolat at gmail.com  Wed Jan 30 02:47:22 2019
From: ilhanpolat at gmail.com (Ilhan Polat)
Date: Wed, 30 Jan 2019 08:47:22 +0100
Subject: [SciPy-Dev] curve_fit() should require initial values for parameters
In-Reply-To: <20190130012942.rzb4btcasmrr5pcr@carbo>
References: <20190124184544.s33zdmtecorrogtz@carbo>
 <20190130012942.rzb4btcasmrr5pcr@carbo>
Message-ID:

Sorry, I can't see how forcing users to enter values they have no idea
about would educate them about the nonconvexity of the problem. This is
properly documented in the argument docs, and we can expand that; if they
don't read the docs, we are done. My opinion is that we are trying to
solve a problem that doesn't exist.

From newville at cars.uchicago.edu  Wed Jan 30 11:04:13 2019
From: newville at cars.uchicago.edu (Matt Newville)
Date: Wed, 30 Jan 2019 10:04:13 -0600
Subject: [SciPy-Dev] curve_fit() should require initial values for parameters
In-Reply-To:
References: <20190124184544.s33zdmtecorrogtz@carbo>
Message-ID:

Hi Ilhan,

On Tue, Jan 29, 2019 at 10:54 AM Ilhan Polat wrote:

> > The problem I have with this is that there really is not an option to
> > "try automatic first". There is "try `np.ones(n_variables)` first".
> > This, or any other value, is really not a defensible choice for
> > starting values. Starting values always depend on the function used
> > and the data being fit.
>
> Why not? Ones are as good as any other choice.

Well, I agree that `np.ones(n_variables)` is as good as any other choice:
all default choices are horrible and not defensible. Mathematically,
algorithmically, and conceptually, initial values ARE REQUIRED for
non-linear least squares optimization. The codes underlying `curve_fit()`
(including `leastsq` and the mess that is `least_squares`) do not permit
the user to not provide initial values. The algorithms used simply do not
make sense without initial values.
Programmatically, an optional keyword argument to a function, say with
a default value of `None`, implies that there is a sensible default for
that value.  Thus, default fitting tolerances might be 1.e-7 (or
perhaps the square root of machine precision), or the default value for
"method to calculate the jacobian" might be `None` to mean "calculate
by finite differences".  For these optional inputs, the function (say,
`curve_fit()`) has a sensible default value that will work
independently from the other input.

That notion of an "independent, sensible default value" is not ever
possible for the initial values `p0` for `curve_fit`.  Sensible initial
values always depend on the data to be fit and the function modeling
the data.  Change the data values dramatically and `p0` must change.
Change the definition of the function (or even the order of the
arguments), and `p0` must change.  There is not and cannot be a
sensible default.

Telling the user that `p0` is optional and can be `None` (as the
current documentation does clearly state) is utterly and profoundly
wrong.  It is mathematically indefensible.  It is horrible API design.
It harms the integrity of `scipy.optimize` to tell the user this.

> I don't know anything about the curve fit I will get in the end. So I
> don't need to pretend that I know a good starting value.

It is not possible to do curve-fitting or non-linear least-squares
minimization when you "don't know anything".  The user MUST provide
data to be modeled and MUST provide a function to model that data.  It
is "pretending" to think that this is sufficient.  The user also must
provide initial values.

> Maybe for 3 parameter functions, fine, I can come up with an
> argument, but you surely don't expect me to know the starting point
> if I am fitting a 7 parameter func involving esoteric structure. At
> that point I am completely ignorant about anything about this
> function. So not knowing where to start is not due to my noviceness
> about the tools, but by definition. My search might even turn out to
> be convex, so the initial value won't matter.

It is not possible to do curve-fitting or non-linear least-squares
minimization when one is completely ignorant of the function.

> > Currently `curve_fit` converts `p0=None` to `np.ones(n_variables)`
> > without warning or explanation. Again, I do not use `curve_fit()`
> > myself. I find several aspects of it unpleasant.
>
> It is documented in the p0 argument docs. I am using this function
> quite often. That's why I don't like extra required arguments. It's
> annoying to enter some random array just to please the API where I
> know that I am just taking a shot in the dark.

It is not an extra keyword argument.  It is required input for the
problem.  `curve_fit()` is converting your feigned (or perhaps
obstinate) ignorance to a set of starting values for you.  But starting
values simply cannot be independent of the input model function or
input data.

> I am pretty confident that if we force this argument, most of the
> people you want to educate will enter np.zeros(n). Then they will get
> an even weirder error; then they'll try np.ones(n) but misremember n,
> and then they get yet another error, having to recall the number of
> function parameters which has already tripped them up twice. This
> curve_fit function is one of those functions that you don't run just
> once and be done with, but over and over again until you give up or
> are satisfied. Hence defaults matter a lot from a UX perspective. "If
> you have an initial value in mind, fine, enter it, otherwise let me
> do my thing" is much better than "I don't care about your quick
> experiment, give me some values or I will keep tripping up".

I refuse to speculate on what "most users" will do, and I also refuse
to accept your speculation on this without evidence.  There are a great
many applications of curve-fitting for which `np.ones(n_variables)` and
`np.zeros(n_variables)` will completely fail -- the fit will never move
from the starting point.  For the kinds of fits done in the programs I
support, either of these would mean that essentially all fits would
never move from the starting point, as at least one parameter being 0
or 1 would essentially always mean the model function was 0 over the
full data range.

But, again, I don't use `curve_fit` but other tools built on top of
`scipy.optimize`.  Generally, the model function and data together
imply sensible or at least guessable default values, but these cannot
be independent of model or data.  In lmfit we do not permit the user to
omit starting values -- default parameter values are `None`, which will
quickly raise a ValueError.  I don't recall ever being asked to change
this, because it should be obvious to all users that each parameter
requires an initial value.  Where appropriate and possible, we do
provide methods for model functions to make initial guesses based on
data.  But again, the starting values always depend on model function
and data.

Default arguments *do* matter from a UX perspective when defaults are
sensible.  `curve_fit` has three required positional arguments: a model
function (that must be annoying to have to provide), "y" data to be fit
(well, I guess I have that), and "x" data.  Why are those all required?
Why not allow `func=None` to be a function that calculates a sine wave?
Why not allow `y=None` to mean `np.ones(1000)`?  Why not allow `x=None`
to mean `np.arange(len(y))`?  Wouldn't that be friendlier to the user?

> > But this behavior strikes me as utterly wrong and a disservice to
> > the scipy ecosystem. I do not think that a documentation change is
> > sufficient.
>
> Maybe a bit overzealous?

Nope, just trying to help `curve_fit()` work better.

--Matt
From theodore.goetz at gmail.com Wed Jan 30 11:46:52 2019
From: theodore.goetz at gmail.com (Johann Goetz)
Date: Wed, 30 Jan 2019 08:46:52 -0800
Subject: [SciPy-Dev] curve_fit() should require initial values for parameters

I apologize for the anecdotal nature of my experience, but I have used
curve_fit() since it was first introduced, and I just took a tour of
almost all the times I've used it (well over 100 times, all told).  Not
once did I fail to provide initial parameters -- even if I merely set
them all to ones, which it turns out I did more than I care to admit.
This covers "production-level" code used by others as well as one-off
throw-away Jupyter notebooks.  It may be that it never occurred to me
that p0 was optional.

Anyways, I'm all for correctness and quality over
backward-compatibility, even in the libraries and modules I consume, so
I'd vote for making p0 required.

--
Johann

On Wed, Jan 30, 2019 at 8:05 AM Matt Newville wrote:
> Hi Ilhan,
> [... full quote of Matt's message above trimmed ...]
From tyler.je.reddy at gmail.com Wed Jan 30 12:39:43 2019
From: tyler.je.reddy at gmail.com (Tyler Reddy)
Date: Wed, 30 Jan 2019 09:39:43 -0800
Subject: [SciPy-Dev] Build failures (PR 9523)

re: Azure failures, it looks like every job is being sent to Linux
containers, which is the opposite of what NumPy currently sees -- every
job sent to Windows containers.  Anyway, we're trying to ping someone
at Microsoft for advice.

On Tue, 29 Jan 2019 at 23:48, Blair Azzopardi wrote:
> Hi all
> [... quote trimmed ...]

From ilhanpolat at gmail.com Wed Jan 30 13:47:43 2019
From: ilhanpolat at gmail.com (Ilhan Polat)
Date: Wed, 30 Jan 2019 19:47:43 +0100
Subject: [SciPy-Dev] curve_fit() should require initial values for parameters

I am a frequent user of this function.  I also occasionally teach
control theory, I use this professionally, and I provided a proper use
case as a "user".  You already admitted that you don't use it.  So
excuse me when I say that I have more experience than you with this
function's API (please read this twice: the function API, not the
underlying theory).  How can I even provide evidence on this completely
subjective matter?  Here are all the issues related to curve_fit:

https://github.com/scipy/scipy/search?q=curve_fit&type=Issues

As far as I know there was no complaint so far.  Hence it is really not
that big of a deal that would warrant such a tone.  I don't know what
to add other than what I already provided.  You cannot make educated
guesses about initial points on nonconvex searches.  As you mentioned,
we are as blind as the np.ones(n) choice.  That's just being pedantic
about the API.  Making the function API more annoying than it has to be
is, for me, the wrong choice -- even unpythonic, if you are among that
kind of crowd; a wrongness with which MATLAB and similar software
continuously annoy us through their clunky UIs.  Users are not ours to
educate.  And if something is None, it means the code will make up some
values -- not the correct ones, as clearly documented, just some
values.  If you have better ones, provide them.

Having said that, I am getting a lot of "horrible, utterly wrong,
obstinate, disservice" etc. that makes me uncomfortable, and for me
it's past the point of discussion.  I am not good at the interwebz, so
I'd better stop here.  We are talking about a simple function argument
being required or optional -- an argument that is essentially a made-up
array.
Thus I would like to reserve these words for more important occasions.
It's just a Python library I am contributing to, so I don't want to be
involved in this particular issue any more.  Since I am clearly biased
on one side, I leave it to the other members to decide; I am fine with
any outcome.

Best,

On Wed, Jan 30, 2019 at 5:05 PM Matt Newville wrote:
> Hi Ilhan,
> [... full quote of Matt's message above trimmed ...]

From josef.pktd at gmail.com Wed Jan 30 14:27:29 2019
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Wed, 30 Jan 2019 14:27:29 -0500
Subject: [SciPy-Dev] curve_fit() should require initial values for parameters

On Wed, Jan 30, 2019 at 1:48 PM Ilhan Polat wrote:
> I am a frequent user of this function.
> [... quote trimmed ...]

Except that I am not a user of curve_fit, I agree completely with
Ilhan.

Actually, I think `ones` is one of the most reasonable default choices.
Most cases are not set in an arbitrary space of functions.  The
parameterization is often chosen to have nice positive values for
interpretability.  For example, I think that all (or almost all)
parameters in scipy.stats distributions are positive or nonnegative
(except for loc, where any number is possible).

Based on a brief browsing of recent Stack Overflow questions, it looks
like there are many possible problems with curve_fit, which is an
inherent issue with nonlinear optimization in general.  But specifying
starting values when the results don't look right should be an obvious
solution to users (especially with an improved docstring for
curve_fit).
Many of the other problems on Stack Overflow seem to be that users
don't use a good parameterization of their function.  Starting values
are just one possible source of problems, and a user needs to be
willing to investigate those when the first try doesn't work. (*)

In the rest of scipy.optimize (and in related functions in statsmodels)
there are no default starting values, also because no information about
the number of parameters (or the length of the parameter vector) is
available.  curve_fit's use of inspect and args was designed to make
automatic starting values possible.

"If at first you don't succeed, try, try again."

(I'm strongly in favor of trying "defaults" first, and if that doesn't
work, then dig into or debug likely candidates. in loose analogy of
test driven development instead of up-front design.)

Currently no user is prevented from specifying starting values.  After
the change, everyone is forced to add this additional step, just
because some users are surprised that nonlinear optimization doesn't
always work (out of the box).

Josef

On Wed, Jan 30, 2019 at 1:48 PM Ilhan Polat wrote:
> [... quote trimmed ...]
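As a sketch of that "try defaults first" workflow (the model and data
are invented for illustration; whether the default p0 succeeds will of
course depend on the problem):

    import numpy as np
    from scipy.optimize import curve_fit

    def model(x, a, b):
        return a * np.exp(-b * x)

    np.random.seed(0)
    x = np.linspace(0, 4, 50)
    y = 2.5 * np.exp(-1.3 * x) + 0.02 * np.random.normal(size=x.size)

    # First try: curve_fit falls back to p0 = np.ones(2).
    popt_default, _ = curve_fit(model, x, y)

    # If the result looks wrong (plot it!), supply a data-driven guess.
    popt_guided, _ = curve_fit(model, x, y, p0=[y[0], 1.0])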
From josef.pktd at gmail.com Wed Jan 30 15:22:06 2019
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Wed, 30 Jan 2019 15:22:06 -0500
Subject: [SciPy-Dev] curve_fit() should require initial values for parameters

On Wed, Jan 30, 2019 at 2:27 PM <josef.pktd at gmail.com> wrote:
> [... quote of my previous message trimmed ...]
> (I'm strongly in favor of trying "defaults" first, and if that doesn't
> work, then dig into or debug likely candidates. in loose analogy of
> test driven development instead of up-front design.)
This finally reminded me that I do have a similar example with default
starting values, although with fmin and not leastsq.

scipy had around 94 distributions when I started, and maybe around 65
(IIRC) continuous distributions with a fit method.  That was too many
to go through every distribution individually.  Essentially the only
information available in general is the number of parameters.

The way I worked initially was to start with some generic defaults and
then work my way through the failing cases.  Nice cases like the normal
and similar distributions work out of the box (scale=1 is actually a
good default choice; mean=1 is arbitrary but no problem).  Later we
added fitstart for individual distributions, with reasonable starting
values that can be inferred from the data.  Some distributions don't
have well-behaved loglikelihood functions, and AFAIK they still don't
work "out of the box".

Each stage of refinement requires more work for the cases that have
failed all previous stages.  However, the number of remaining cases is
shrinking, so the heavy work is mostly for nasty cases.

(It's pretty much the same in statsmodels.  We have a large number of
models and very simple default starting parameters.  In some cases or
for some datasets this works fine.  Other cases failed or still fail,
and I spent several months overall improving starting values and
numerical stability for those, not always with full success.
But I don't "waste" that time on cases that work fine out of the box,
i.e. with simple starting values.)

Josef

> [... remainder of quote trimmed ...]
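For a friendly case, a sketch of that workflow (the gamma parameters
are invented; `fit` accepts optional starting values positionally for
the shape parameters and as `loc`/`scale` keywords):

    import numpy as np
    from scipy import stats

    data = stats.gamma.rvs(2.0, scale=3.0, size=1000, random_state=0)

    # Generic default starting values often suffice for nice cases:
    shape, loc, scale = stats.gamma.fit(data)

    # For nastier cases, explicit starting values are the escape hatch:
    shape, loc, scale = stats.gamma.fit(data, 2.0, loc=0.0, scale=3.0)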
From robert.kern at gmail.com Wed Jan 30 15:24:05 2019
From: robert.kern at gmail.com (Robert Kern)
Date: Wed, 30 Jan 2019 12:24:05 -0800
Subject: [SciPy-Dev] curve_fit() should require initial values for parameters

On Wed, Jan 30, 2019 at 10:48 AM Ilhan Polat wrote:
> I am a frequent user of this function.
> [... quote trimmed ...]
>
> As far as I know there was no complaint so far.

StackOverflow is likely a better place to look for evidence:

https://stackoverflow.com/questions/27097957/my-use-of-scipy-curve-fit-does-not-seem-to-work-well
https://stackoverflow.com/questions/23828226/scipy-curve-fit-does-not-seem-to-change-the-initial-parameters
https://stackoverflow.com/questions/53509550/error-in-scipy-curve-fit-for-more-than-two-parameters

> Hence it is really not that big of a deal that would warrant such a
> tone.  I don't know what to add other than what I already provided.
> You cannot make educated guesses about initial points on nonconvex
> searches.  As you mentioned, we are as blind as the np.ones(n)
> choice.

That is certainly not the case.  It is impossible to do so generally
for every arbitrary nonconvex problem, which is why we can't automate
it, but there are tons of problems where one has a reasonable idea from
priors, examining the plotted data, or heuristic algorithms (e.g.
finding the peak location and amplitude with a peak finder).

--
Robert Kern
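A sketch of that peak-finder idea (the model and data are invented;
this uses `scipy.signal.find_peaks`, which assumes a recent SciPy):

    import numpy as np
    from scipy.signal import find_peaks
    from scipy.optimize import curve_fit

    def gaussian(x, amp, cen, wid):
        return amp * np.exp(-(x - cen)**2 / (2 * wid**2))

    np.random.seed(0)
    x = np.linspace(0, 10, 500)
    y = gaussian(x, 4.0, 3.2, 0.4) + 0.05 * np.random.normal(size=x.size)

    # Let a peak finder produce the data-driven starting values.
    peaks, props = find_peaks(y, height=0.5)
    tallest = peaks[np.argmax(props["peak_heights"])]
    p0 = [y[tallest], x[tallest], 0.5]
    popt, pcov = curve_fit(gaussian, x, y, p0=p0)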
From stefanv at berkeley.edu Wed Jan 30 15:25:38 2019
From: stefanv at berkeley.edu (Stefan van der Walt)
Date: Wed, 30 Jan 2019 12:25:38 -0800
Subject: [SciPy-Dev] curve_fit() should require initial values for parameters
Message-ID: <20190130202538.k2sb3o63numeqxjp@carbo>

On Wed, 30 Jan 2019 14:27:29 -0500, josef.pktd at gmail.com wrote:
> (I'm strongly in favor of trying "defaults" first, and if that doesn't
> work, then dig into or debug likely candidates. in loose analogy of
> test driven development instead of up-front design.)

It seems unlikely that we will reach full agreement in this thread,
given the differing experiences and philosophies at play.  But that's
probably OK if we can all agree to modify the documentation to be
clearer about the risks of the preset values for p0, how to select
better values, and how to handle failure modes.

This won't 100% address Matt's concerns, but it will go a long way
toward keeping users out of trouble, without having to make breaking
changes to the API.

What do you think?

Best regards,
Stéfan

From josef.pktd at gmail.com Wed Jan 30 15:37:07 2019
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Wed, 30 Jan 2019 15:37:07 -0500
Subject: [SciPy-Dev] curve_fit() should require initial values for parameters

On Wed, Jan 30, 2019 at 3:25 PM Stefan van der Walt wrote:
> It seems unlikely that we will reach full agreement in this thread,
> [... quote trimmed ...]
> What do you think?

I fully agree with that part.  I think the docstrings and the tutorial
are the places to "educate" users.  I pointed out early in this thread
that the current documentation has no examples with starting values and
does not emphasize their importance.  Also, more explicit warnings on
failure would be an obvious and not very intrusive ex-post reminder,
IMO.

I'm mainly arguing that forcing users to come up with random or
meaningful starting values up-front is not in the (initial) "spirit" of
curve_fit.

Josef

From ilhanpolat at gmail.com Wed Jan 30 15:39:11 2019
From: ilhanpolat at gmail.com (Ilhan Polat)
Date: Wed, 30 Jan 2019 21:39:11 +0100
Subject: [SciPy-Dev] curve_fit() should require initial values for parameters

For the SO questions: on the 1st one, agreed; on the 2nd one, look at
the given values; for the 3rd one, have a look at the first line and
also the comments -- that's actually a counter-argument.

> That is certainly not the case.  It is impossible to do so generally
> for every arbitrary nonconvex problem, which is why we can't automate
> it, but there are tons of problems where one has a reasonable idea
> from priors, examining the plotted data, or heuristic algorithms
> (e.g. finding the peak location and amplitude with a peak finder).

Yes, exactly my point.  If you have a better idea, put it in; p0 is not
going to reject your guess and insist on 1.s.  But I can easily give
you tons of problems where you cannot give a proper guess.
That is simply impossible, because it defeats the purpose of using this
tool in the first place.

On Wed, Jan 30, 2019 at 9:25 PM Robert Kern wrote:
> [... full quote of Robert's message above trimmed ...]

From deak.andris at gmail.com Wed Jan 30 16:38:03 2019
From: deak.andris at gmail.com (Andras Deak)
Date: Wed, 30 Jan 2019 22:38:03 +0100
Subject: [SciPy-Dev] curve_fit() should require initial values for parameters

One reason why I think this has become a surprisingly controversial
point is that, on the one hand, curve_fit is a convenience function and
it works in a lot of cases as-is (and breaking APIs is difficult), but
on the other hand the guesswork involved in the default p0 choice goes
against the "in the face of ambiguity, refuse the temptation to guess"
of native Python.  There are probably plenty of other cases where scipy
makes pragmatic choices that might otherwise seem "unpythonic", and for
such a high-level front end I believe we're better off with the current
API, provided the documentation is made clear enough about the dangers
of not passing a p0.  But I can see why this arbitrary choice for an
important input parameter can be seen as a problem.

András

On Wed, Jan 30, 2019 at 9:39 PM Ilhan Polat wrote:
> [... quote trimmed ...]
From newville at cars.uchicago.edu  Thu Jan 31 09:30:09 2019
From: newville at cars.uchicago.edu (Matt Newville)
Date: Thu, 31 Jan 2019 08:30:09 -0600
Subject: [SciPy-Dev] curve_fit() should require initial values for parameters
In-Reply-To: <20190130202538.k2sb3o63numeqxjp@carbo>
References: <20190124184544.s33zdmtecorrogtz@carbo>
 <20190130202538.k2sb3o63numeqxjp@carbo>
Message-ID:

On Wed, Jan 30, 2019 at 2:26 PM Stefan van der Walt wrote:

> It seems unlikely that we will reach full agreement in this thread,
> given the differing experiences and philosophies at play. [...]
>
> What do you think?
Well, I'm not sure that agreement here should be the sole driver for what
scipy developers do. There will be disagreements in design philosophy, and
someone needs to be willing and able to make decisions in such situations.
I do not know who is making such decisions or reviewing changes in
`scipy.optimize`, but it appears to me that this has suffered for a while,
leaving conceptual, interface, and organizational messes. I thought I
would try to help by cleaning up one of the most egregious and simplest of
these.

The documentation for `curve_fit` does currently state that `p0=None` is
converted to `np.ones(n_variables)`. It appears that some view this as
sufficient, and that these folks consider some automated assignment of
initial values useful, even while acknowledging that it cannot be correct
in general.

The argument for requiring initial values might be summarized as "initial
values are actually required". The argument against might be summarized as
"we don't want to change the current behavior".

Anyway, I am perfectly willing to lose this argument (I do not use
`curve_fit` myself, and do not feel compelled to support its use), and the
decision is not mine to make. I do hope someone sensible is making these
decisions.

Cheers,

--Matt
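The documented behavior Matt refers to is easy to check directly; a small
sketch (the linear model here is arbitrary, chosen only so both fits
converge):

import numpy as np
from scipy.optimize import curve_fit

def f(x, a, b):
    return a * x + b

x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0

# The number of parameters is found by introspecting f's signature,
# and p0=None is then replaced by np.ones(2), so these two calls
# should be equivalent.
popt_default, _ = curve_fit(f, x, y)
popt_ones, _ = curve_fit(f, x, y, p0=np.ones(2))
print(np.allclose(popt_default, popt_ones))  # True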
From josef.pktd at gmail.com  Thu Jan 31 10:04:41 2019
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Thu, 31 Jan 2019 10:04:41 -0500
Subject: [SciPy-Dev] curve_fit() should require initial values for parameters
In-Reply-To:
References: <20190124184544.s33zdmtecorrogtz@carbo>
 <20190130202538.k2sb3o63numeqxjp@carbo>
Message-ID:

On Thu, Jan 31, 2019 at 9:30 AM Matt Newville wrote:

> I do not know who is making such decisions or reviewing changes in
> `scipy.optimize`, but it appears to me that this has suffered for a
> while, leaving conceptual, interface, and organizational messes. [...]

I was only the stats reviewer for curve_fit, and never had any real stake
in the API. Looking at the last heavily discussed change in curve_fit that
I was involved in, I found this
https://github.com/scipy/scipy/pull/3098#issuecomment-29837264
and a few follow-up comments.

Josef

From charlesr.harris at gmail.com  Thu Jan 31 19:07:21 2019
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Thu, 31 Jan 2019 17:07:21 -0700
Subject: [SciPy-Dev] NumPy 1.16.1 release
Message-ID:

Hi All,

On behalf of the NumPy team I'm pleased to announce the release of NumPy
1.16.1. This release fixes bugs reported against the 1.16.0 release and
also backports several enhancements from master that seem appropriate for
a release series that is the last to support Python 2.7. The supported
Python versions are 2.7 and 3.5-3.7. The wheels on PyPI are linked with
OpenBLAS v0.3.4+, which should fix the known threading issues found in
previous OpenBLAS versions. Downstream developers building this release
should use Cython >= 0.29.2 and, if using OpenBLAS, OpenBLAS > v0.3.4.

If you are installing using pip, you may encounter a problem where older
installed versions of NumPy that pip did not delete become mixed with the
current version, resulting in an *ImportError*. That problem is
particularly common on Debian-derived distributions due to a modified pip.
The fix is to make sure all previous NumPy versions installed by pip have
been removed. See #12736 for discussion of the issue. Note that previously
this problem resulted in an *AttributeError*.

Wheels for this release can be downloaded from PyPI, and source archives
and release notes are available from Github.

*Enhancements*

* #12767: ENH: add mm->q floordiv
* #12768: ENH: port np.core.overrides to C for speed
* #12769: ENH: Add np.ctypeslib.as_ctypes_type(dtype), improve
  `np.ctypeslib.as_ctypes`
* #12773: ENH: add "max difference" messages to
  np.testing.assert_array_equal...
* #12820: ENH: Add mm->qm divmod
* #12890: ENH: add _dtype_ctype to namespace for freeze analysis

*Contributors*

A total of 16 people contributed to this release. People with a "+" by
their names contributed a patch for the first time.

* Antoine Pitrou
* Arcesio Castaneda Medina +
* Charles Harris
* Chris Markiewicz +
* Christoph Gohlke
* Christopher J. Markiewicz +
* Daniel Hrisca +
* EelcoPeacs +
* Eric Wieser
* Kevin Sheppard
* Matti Picus
* OBATA Akio +
* Ralf Gommers
* Sebastian Berg
* Stephan Hoyer
* Tyler Reddy

Cheers,

Charles Harris
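For anyone hit by the mixed-install *ImportError* described above, a quick
diagnostic sketch (not part of the announcement) to confirm that, after
removing the stale pip installs, only one NumPy remains visible:

import numpy
print(numpy.__version__)  # should report 1.16.1 after cleanup
print(numpy.__file__)     # the single remaining install location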