From evgeny.burovskiy at gmail.com Thu Sep 1 04:39:19 2016
From: evgeny.burovskiy at gmail.com (Evgeni Burovski)
Date: Thu, 1 Sep 2016 11:39:19 +0300
Subject: [SciPy-Dev] scipy 0.18.1 release
Message-ID:

Hi,

It seems to me that we have enough material for a bugfix release 0.18.1.

Here's the milestone contents:
https://github.com/scipy/scipy/milestone/31

At the moment, remaining issues without a fix are:

* A regression in stats.circmean,
  https://github.com/scipy/scipy/issues/6420; there is a PR which needs
  finishing, https://github.com/scipy/scipy/pull/6424. The PR just needs
  a test.

* A regression in stats.ks_2samp,
  https://github.com/scipy/scipy/issues/6435.
  From the discussion in the issue, it seems that
  https://github.com/scipy/scipy/pull/5938 changed behavior more than
  expected, and figuring out the "correct" thing to do needs more effort
  than anyone can spend in the short term. So the suggestion is to undo
  gh-5938 for 0.18.1.

If someone thinks something else needs to be included please speak up,
here in this thread or comment on a relevant GH issue.

I think we should aim to release 0.18.1 by the end of September. Thoughts?

Cheers,

Evgeni

From ralf.gommers at gmail.com Fri Sep 2 23:31:14 2016
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Sat, 3 Sep 2016 15:31:14 +1200
Subject: [SciPy-Dev] scipy 0.18.1 release
In-Reply-To:
References:
Message-ID:

On Thu, Sep 1, 2016 at 8:39 PM, Evgeni Burovski wrote:

> I think we should aim to release 0.18.1 by the end of September. Thoughts?

Sounds good to me.

Ralf

From ralf.gommers at gmail.com Sat Sep 3 20:13:56 2016
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Sun, 4 Sep 2016 12:13:56 +1200
Subject: [SciPy-Dev] Fwd: PyData Community Cookbook - August Update
In-Reply-To:
References:
Message-ID:

Hi all,

Is anyone interested in writing or contributing to a chapter about SciPy
for a PyData Community Cookbook? We're a bit late (but so are most
people), so ideally we get this organized within a day or two. It can be
a single-author or multi-author effort. The projects that submitted an
abstract so far all seem to have 2 or 3 authors:
https://github.com/pydata/pydata-cookbook.

I'm happy to contribute, or if there's a lot of interest leave it to
others. Would like to see it happen though - SciPy should really not be
missing in this book.

Cheers,
Ralf

---------- Forwarded message ----------
From: Andy Ray Terrel
Date: Sun, Aug 21, 2016 at 12:14 AM
Subject: PyData Community Cookbook - August Update
To:
Cc: pydata-cookbook at numfocus.org

Hello everyone,

You are receiving this email because you were either invited or committed
to join our project. Please feel free to forward this message to a more
appropriate list or person. For questions please email
pydata-cookbook at numfocus.org.

Katy Huff and I are starting a project to build a cookbook of advanced
material for the PyData community.
The cookbook will be published by Addison-Wesley. We have invited a
number of contributors to see if such a project would have some interest
and received overwhelmingly positive feedback.

The book will cover several major topics, organized as such, with some
sample packages:

- IDE: IPython/Jupyter
- Data Structures / Numerics: NumPy, Pandas, Xray, PyTables
- Viz: Matplotlib, Bokeh, Seaborn, yt
- Algorithms / Science: SciPy, Scikit-learn, Scikit-image, statsmodels,
  sympy, gensim
- Performance / Scale: Cython, Numexpr, Numba, Dask, pyspark

We expect each submission to be about 15 - 20 pages describing an example
of the power of each library. While we have reached out to the projects
about putting each submission together, we are happy to accept chapters
for libraries we did not initially identify.

To facilitate the book we have put together a repository for collecting
and reviewing submissions at https://github.com/pydata/pydata-cookbook .
We are asking for submissions in rst but would appreciate any other files,
such as jupyter notebooks or code, for a digital appendix as well.

If you have read this far and are interested in contributing, the
proposed schedule is the following:

Sept 1: Submit a pull request with a title, abstract and author list for
        the submission.
Nov 15: Submit a completed chapter.
Dec 31: Reviews for chapters finished.
Jan 31: All chapter revisions due.

Thanks for your time!

--
Andy R. Terrel, PhD
President, NumFOCUS
andy at numfocus.org

From charlesr.harris at gmail.com Sat Sep 3 21:26:27 2016
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sat, 3 Sep 2016 19:26:27 -0600
Subject: [SciPy-Dev] Fwd: PyData Community Cookbook - August Update
In-Reply-To:
References:
Message-ID:

On Sat, Sep 3, 2016 at 6:13 PM, Ralf Gommers wrote:

> Hi all,
>
> Is anyone interested in writing or contributing to a chapter about SciPy
> for a PyData Community Cookbook? We're a bit late (but so are most
> people), so ideally we get this organized within a day or two. It can be
> a single-author or multi-author effort. The projects that submitted an
> abstract so far all seem to have 2 or 3 authors:
> https://github.com/pydata/pydata-cookbook.
>
> I'm happy to contribute, or if there's a lot of interest leave it to
> others. Would like to see it happen though - SciPy should really not be
> missing in this book.

I had several projects that might have been interesting, but the code was
written for work and I would have needed to get permission from my former
employer to publish it...

Chuck

From warren.weckesser at gmail.com Sat Sep 3 21:57:12 2016
From: warren.weckesser at gmail.com (Warren Weckesser)
Date: Sat, 3 Sep 2016 21:57:12 -0400
Subject: [SciPy-Dev] Fwd: PyData Community Cookbook - August Update
In-Reply-To:
References:
Message-ID:

On Sat, Sep 3, 2016 at 8:13 PM, Ralf Gommers wrote:

> Hi all,
>
> Is anyone interested in writing or contributing to a chapter about SciPy
> for a PyData Community Cookbook? We're a bit late (but so are most
> people), so ideally we get this organized within a day or two. It can be
> a single-author or multi-author effort. The projects that submitted an
> abstract so far all seem to have 2 or 3 authors:
> https://github.com/pydata/pydata-cookbook.
>
> I'm happy to contribute, or if there's a lot of interest leave it to
> others. Would like to see it happen though - SciPy should really not be
> missing in this book.

I would be happy to contribute.

Andy said "We expect each submission to be about 15 - 20 pages describing
an example of the power of each library." SciPy has a pretty diverse
collection of subpackages, so the first question I have is whether we try
to find one big example that uses several of the subpackages, or instead
provide several examples, each focused primarily on one of the
subpackages.

Warren

From ralf.gommers at gmail.com Sat Sep 3 22:07:31 2016
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Sun, 4 Sep 2016 14:07:31 +1200
Subject: [SciPy-Dev] Fwd: PyData Community Cookbook - August Update
In-Reply-To:
References:
Message-ID:

On Sun, Sep 4, 2016 at 1:57 PM, Warren Weckesser wrote:

> I would be happy to contribute.
>
> Andy said "We expect each submission to be about 15 - 20 pages describing
> an example of the power of each library." SciPy has a pretty diverse
> collection of subpackages, so the first question I have is whether we try
> to find one big example that uses several of the subpackages, or instead
> provide several examples, each focused primarily on one of the
> subpackages.

Either way could work I think. It's related to the choice of which
subpackages to focus on. I'd suggest a selection out of the more polished
and actively developed higher-level ones: interpolate, optimize, signal,
sparse, spatial, stats. Things like linalg, special, fftpack are used
mostly under the hood, and it is hard to make nice standalone examples
from them.

Maybe two examples of 10 pages each that use 2-3 subpackages?

Ralf

From ralf.gommers at gmail.com Sun Sep 4 16:00:12 2016
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Mon, 5 Sep 2016 08:00:12 +1200
Subject: [SciPy-Dev] Moving SciPy project organization forward
Message-ID:

Hi all,

Over the next weeks/months I should have quite a bit of bandwidth, which
I'd like to use to push forward a few things related to project
organization that we've mostly discussed before but never really
finalized. Below is a summary of my view on where we are and where we
need to go. To discuss individual topics I'll send separate follow-up
emails with separate subjects.

What we have:
- Reasonably good docs for new contributors and core devs.
- Established community process for communication and decision making.
- A healthy group of active contributors and core devs.
- A project that's overall in good shape.

What we need:
- Set up a mailing list for the core team.
- Draft and accept a governance model & docs.
- Draft and accept a Code of Conduct (CoC) or "community guidelines" [1]
  or similar.
- Agree on an FSA with NumFOCUS.
- Merge the roadmap PR.
- Agree on when to do a 1.0 release, and what still has to happen for
  that.

Mailing list
------------
This is not so we do more in private, but just to be more efficient in
the small number of conversations that already happen in private (like
giving out commit rights). The core team has grown large enough that we
tend to forget people or use an old email address.
I will set this one up within a week, probably using Google Groups (if
anyone has a better idea, please speak up). My plan is to add all active
core devs, and send other people with commit rights a personal email
about this to ask if they want to be on it (because not everyone follows
this list).

Governance model & docs
-----------------------
We had some discussions and a hangout on that last year [2]. In the
hangout we decided to give people some time to read up on provided info
on how this worked in other projects, the Karl Fogel book [4], etc. I was
supposed to organize a follow-up, but failed to do so until now. I will
send a separate email about this shortly.

CoC or similar
--------------
This is pretty standard to have nowadays, can be welcoming to new people,
and is now required of all projects that join NumFOCUS. I'll send a
separate email about this later on - probably right after we've merged a
governance document.

Fiscal Sponsorship Agreement (FSA)
----------------------------------
We need this so we have all our ducks in a row to start dealing with
funding/donations in a better way. We did get donations in the past, and
even used that money to support the MingwPy project to help improve our
Fortran-on-Windows situation. This was OK as a one-off for a very small
amount of money, but this needs to be organized better. We also do get
people who come up with ideas that would improve SciPy a lot and that we
could have supported with some money, had we had the infrastructure for
that in place. Let's come back to this as well once we have governance
and CoC documents (those are prerequisites).

Roadmap PR (gh-2908)
--------------------
Let's just pick a date at which the thing [3] has to be merged, in the
state it is in then (updated for comments received). And then start
improving it with new PRs if needed. Leaving the PR open until everything
is perfect hasn't quite worked so far. I'll propose a date 2 weeks from
now on the PR itself.

SciPy 1.0
---------
We've discussed this several times before. We're now at the point where
most of the major gaps have been filled though, so it looks to me like
it's time to just pick a date for it (either after or instead of 0.19.0)
and then fix up the last things that we think we really need for a 1.0
release. I'll send a separate email about this shortly.

Ralf

[1] https://github.com/scikit-image/scikit-image-web/pull/36
[2] https://mail.scipy.org/pipermail/scipy-dev/2015-April/020636.html
[3] https://github.com/scipy/scipy/pull/2908
[4] http://producingoss.com/

From ralf.gommers at gmail.com Sun Sep 4 16:07:05 2016
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Mon, 5 Sep 2016 08:07:05 +1200
Subject: [SciPy-Dev] SciPy governance model
Message-ID:

Hi all,

Starting with the summary of my email of earlier today: I'd like to push
on with agreeing on a governance model and document. We had some
discussions and a hangout on that last year [1]. In the hangout we
decided to give people some time to read up on provided info on how this
worked in other projects, the Karl Fogel book [4], etc. I was supposed to
organize a follow-up, but failed to do so until now. I will send a
separate email about this shortly.

We're now at a point where most other major projects in the scientific
Python ecosystem have a formal governance model. Many are modeled after
the Jupyter one [2], which defines a BDFL, a steering council and
contributors, and a voting system to make decisions (simplified, there's
much more - the whole document is worth reading). NumPy chose another
model, with a steering council and consensus-based decision making [3].
From the outside, the Jupyter model seems to be working for them. The
NumPy model hasn't been exercised too much yet, but should work well too.
There are variations on those two models in use as well (like the Jupyter
model minus the BDFL).

An email discussion on this topic without a concrete proposal or summary
document is likely to go on for a long time and may not converge easily.
A video conference is also tricky, with dates/timezones and discussion
often going off on tangents. So I have the following proposal:

- We start by drafting an extended summary of the various models and
  getting a sense of which model the core team prefers.
- We then work out that model, and bring it back to this list for
  discussion/finetuning/acceptance.
- "We" here is the group of people who indicate, on this list or to me
  off-list, that they want to participate by the end of this week.

Thoughts? Volunteers?

Ralf

[1] https://mail.scipy.org/pipermail/scipy-dev/2015-April/020636.html
[2] Jupyter governance doc: https://github.com/jupyter/governance
[3] NumPy governance doc:
    http://docs.scipy.org/doc/numpy-dev/dev/governance/index.html
[4] Producing Open Source software (Karl Fogel): http://producingoss.com/

From fboulogne at sciunto.org Sun Sep 4 16:09:47 2016
From: fboulogne at sciunto.org (François Boulogne)
Date: Sun, 4 Sep 2016 22:09:47 +0200
Subject: [SciPy-Dev] Moving SciPy project organization forward
In-Reply-To:
References:
Message-ID:

Hi Ralf,

Thanks for this great and promising announcement.

> Mailing list
> ------------
> This is not so we do more in private, but just to be more efficient in
> the small number of conversations that already happens in private
> (like giving out commit rights). The core team has grown large enough
> that we tend to forget people or use an old email address. I will set
> this one up within a week, probably using Google Groups (if anyone has
> a better idea, please speak up). My plan is to add all active core
> devs, and send other people with commit rights a personal email about
> this to ask if they want to be on it (because not everyone follows
> this list).

Stéfan is moving the scikit-image ML to a list hosted on Python.org. I
don't know the details, but I guess it's a place to look at, and anything
other than GAFAM would be appreciated.

Best,

--
François Boulogne.
http://www.sciunto.org
GPG: 32D5F22F

From ralf.gommers at gmail.com Sun Sep 4 16:22:37 2016
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Mon, 5 Sep 2016 08:22:37 +1200
Subject: [SciPy-Dev] Timing of SciPy 1.0
Message-ID:

Hi all,

Starting with the summary of my email of earlier today: I'd like to pick
a time for when we release SciPy 1.0, and what is still essential to do
for that version number.

We've discussed this a couple of times before [1,2]. We're now at the
point where most of the major gaps have been filled though, so it looks
to me like it's time to just pick a date for it (either after or instead
of 0.19.0) and then fix up the last things that we think we really need
for a 1.0 release.

Here are the things that I see as essential:
- Getting project organization in order: governance and CoC at least
  (see my other email of today).
- scipy.signal: clean up the messes in wavelets and B-splines.
- scipy.signal: unified filter API [3]
- scipy.spatial: remove Python implementation of KDTree, just keep
  cKDTree
- scipy.interpolate: not sure of the details, but I think there are some
  new interpolator classes and a spline PR that aren't quite finished?
- Remove some deprecated items (weave is the biggest one), and decide now
  if there's anything else we need to deprecate.
- Merge or close more PRs. We've stabilized them at around 120-130 open
  ones, but that's not really good enough if there are (almost) finished
  PRs that no one has looked at in a year. This may be the single biggest
  task.

We shouldn't make the above list too long, otherwise we won't get there.
Really, SciPy is production quality software (with a few dusty corners),
so we should limit ourselves to listing what is essential here.

Timing: I suspect that we want to deprecate some more things, and that 4
months is a little too short to get to the point where we want to be. So
I would propose to still do a 0.19.0, make sure that all deprecations are
in there, merge PRs quite aggressively for 0.19.0 as well, and then plan
1.0 as the next release (can be shorter than 6 months after 0.19.0). So
maybe Nov/Dec for 0.19.0 and say March '17 for 1.0.

Thoughts?

Ralf

[1] https://github.com/scipy/scipy/pull/2908
[2] https://mail.scipy.org/pipermail/scipy-dev/2013-September/019238.html
[3] https://github.com/scipy/scipy/issues/6137

From ralf.gommers at gmail.com Sun Sep 4 16:50:13 2016
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Mon, 5 Sep 2016 08:50:13 +1200
Subject: [SciPy-Dev] Moving SciPy project organization forward
In-Reply-To:
References:
Message-ID:

On Mon, Sep 5, 2016 at 8:09 AM, François Boulogne wrote:

> Stéfan is moving the scikit-image ML to a list hosted on Python.org. I
> don't know the details, but I guess it's a place to look at, and anything
> other than GAFAM would be appreciated.

Keep in mind that this is a private list for ~15-20 people, so ease of
setup and maintenance is key. If we were to move the main scipy-user and
scipy-dev lists, then yes, I would definitely prefer python.org.

Ralf

From rlucente at pipeline.com Sun Sep 4 17:00:08 2016
From: rlucente at pipeline.com (Robert Lucente)
Date: Sun, 4 Sep 2016 17:00:08 -0400 (GMT-04:00)
Subject: [SciPy-Dev] cKDTree
Message-ID: <10843293.1473022808690.JavaMail.wam@mswamui-backed.atl.sa.earthlink.net>

Please note that I am a newbie and just a lurker.

I noticed in a recent email that cKDTree was mentioned.

Q: What is the relationship, if any, between SciPy and scikit-learn when
it comes to cKDTree?

The reason I ask is the following 2 links:

https://jakevdp.github.io/blog/2013/04/29/benchmarking-nearest-neighbor-searches-in-python/

https://github.com/scikit-learn/scikit-learn/issues/3682

From gael.varoquaux at normalesup.org Sun Sep 4 17:12:03 2016
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Sun, 4 Sep 2016 23:12:03 +0200
Subject: [SciPy-Dev] cKDTree
In-Reply-To: <10843293.1473022808690.JavaMail.wam@mswamui-backed.atl.sa.earthlink.net>
References: <10843293.1473022808690.JavaMail.wam@mswamui-backed.atl.sa.earthlink.net>
Message-ID: <20160904211203.GD372820@phare.normalesup.org>

I did benchmarks a while ago, and the scikit-learn KDTree tends to be
faster in higher-dimensional situations:
http://gael-varoquaux.info/programming/scikit-learn-014-release-features-and-benchmarks.html

Note that it's not a general rule: they implement different heuristics,
and hence explore different tradeoffs.

G

On Sun, Sep 04, 2016 at 05:00:08PM -0400, Robert Lucente wrote:
> Please note that I am a newbie and just a lurker.

> I noticed in a recent email that cKDTree was mentioned.

> Q: What is the relationship, if any, between SciPy and scikit-learn when
> it comes to cKDTree?

> The reason I ask is the following 2 links:

> https://jakevdp.github.io/blog/2013/04/29/benchmarking-nearest-neighbor-searches-in-python/

> https://github.com/scikit-learn/scikit-learn/issues/3682

--
    Gael Varoquaux
    Researcher, INRIA Parietal
    NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
    Phone: ++ 33-1-69-08-79-68
    http://gael-varoquaux.info    http://twitter.com/GaelVaroquaux

From ericq at caltech.edu Sun Sep 4 19:17:41 2016
From: ericq at caltech.edu (Eric Q)
Date: Sun, 04 Sep 2016 23:17:41 +0000
Subject: [SciPy-Dev] Timing of SciPy 1.0
In-Reply-To:
References:
Message-ID:

The timeline you suggest and the goals for signal sound good to me. I've
been pretty tied up lately, and am traveling for the next two weeks, but
after that I intend to attack the filtering API (especially second order
section support).

Eric Q.

On Sun, Sep 4, 2016 at 16:22 Ralf Gommers wrote:

> - scipy.signal: clean up the messes in wavelets and B-splines.
> - scipy.signal: unified filter API [3]
>
> Timing: I suspect that we want to deprecate some more things, and that 4
> months is a little too short to get to the point where we want to be. So I
> would propose to still do a 0.19.0, make sure that all deprecations are in
> there, merge PRs quite aggressively for 0.19.0 as well, and then plan 1.0
> as the next release (can be shorter than 6 months after 0.19.0). So maybe
> Nov/Dec for 0.19.0 and say March '17 for 1.0.
>
> [3] https://github.com/scipy/scipy/issues/6137

From davidmenhur at gmail.com Mon Sep 5 03:46:24 2016
From: davidmenhur at gmail.com (Daπid)
Date: Mon, 5 Sep 2016 09:46:24 +0200
Subject: [SciPy-Dev] cKDTree
In-Reply-To: <10843293.1473022808690.JavaMail.wam@mswamui-backed.atl.sa.earthlink.net>
References: <10843293.1473022808690.JavaMail.wam@mswamui-backed.atl.sa.earthlink.net>
Message-ID:

On 4 September 2016 at 23:00, Robert Lucente wrote:
> Please note that I am a newbie and just a lurker.
>
> I noticed in a recent email that cKDTree was mentioned.
>
> Q: What is the relationship, if any, between SciPy and scikit-learn when
> it comes to cKDTree?
>
> The reason I ask is the following 2 links:
>
> https://jakevdp.github.io/blog/2013/04/29/benchmarking-nearest-neighbor-searches-in-python/
>
> https://github.com/scikit-learn/scikit-learn/issues/3682

Note that these benchmarks are from 2013 and 2014. SciPy's KDTree has
seen its performance recently improved, twice. Scikit-learn's last update
to its KDTree was in 2015. So, we need to run the benchmarks again.

/David.

From matthew.brett at gmail.com Mon Sep 5 13:04:07 2016
From: matthew.brett at gmail.com (Matthew Brett)
Date: Mon, 5 Sep 2016 10:04:07 -0700
Subject: [SciPy-Dev] SciPy governance model
In-Reply-To:
References:
Message-ID:

Hi,

On Sun, Sep 4, 2016 at 1:07 PM, Ralf Gommers wrote:
> Hi all,
>
> Starting with the summary of my email of earlier today: I'd like to push on
> with agreeing on a governance model and document. We had some discussions
> and a hangout on that last year [1]. In the hangout we decided to give
> people some time to read up on provided info on how this worked in other
> projects, the Karl Fogel book, etc. I was supposed to organize a follow-up,
> but failed to do so until now. I will send a separate email about this
> shortly.
>
> We're now at a point where most other major projects in the scientific
> Python ecosystem have a formal governance model.
Many are modeled after the > Jupyter one [2], which defines a BDFL, a steering council and contributors, > and a voting system to make decisions (simplified, there's much more - the > whole document is worth reading). NumPy chose another model, with a > steering council and consensus-based decision making [3]. From the outside, > the Jupyter model seems to be working for them. The NumPy model hasn't been > exercised too much yet, but should work well too. There are variations on > those two models in use as well (like the Jupyter model minus the BDFL). > > An email discussion on this topic without a concrete proposal or summary > document is likely to go on for a long time and may not converge easily. A > video conference is also tricky, with dates/timezones and discussion often > going off on tangents. So I have the following proposal: > > - We start by drafting an extended summary of the various models and getting > a sense of which model the core team prefers. > - We then work out that model, and bring it back to this list for > discussion/finetuning/acceptance. > - "We" here is the group of people who indicate, on this list or to me > off-list, that they want to participate by the end of this week. I'm happy to help. I have some time this coming week. I know that Stefan vdW is thinking hard about these issues at the moment for scikit-image, so we may be able to collaborate with scikit-image on some of these discussions. The Jupyter (BDFL) model got picked up by Pandas [1] and MPL [2]. The numpy model is designed for the situation where there was not an obvious candidate to be project leader. So, I strongly suspect that our choice of model will come down to whether we can agree on a project leader, or agree on a way to choose one.
The obvious candidates would, I guess, be in this list:

    git shortlog -ns --since "5 years ago" | head -5

      1789  Pauli Virtanen
      1528  Ralf Gommers
       770  Evgeni Burovski
       604  Alex Griffing
       402  Warren Weckesser

I have the impression that you (Ralf) and Pauli have also been the most active in reviewing and merging pull requests over that time. Can I humbly suggest that you 5 discuss amongst yourselves who among you would like to be project leader? If there's only one of you who wants to do that job, then the decision process about governance is much easier. If there are several of you who want to do the job, it's still easier, because we can just work out a voting process to select one of you. I think we should consider the numpy governance model only if none of you wants to be leader. Thanks for bringing up the discussion, Matthew [1] https://github.com/pydata/pandas-governance [2] https://github.com/matplotlib/governance/pull/1 From evgeny.burovskiy at gmail.com Mon Sep 5 15:09:42 2016 From: evgeny.burovskiy at gmail.com (Evgeni Burovski) Date: Mon, 5 Sep 2016 22:09:42 +0300 Subject: [SciPy-Dev] Timing of SciPy 1.0 In-Reply-To: References: Message-ID: > Starting with the summary of my email of earlier today: I'd like to pick a > time for when we release SciPy 1.0, and what is still essential to do for > that version number. I'd think we should treat 1.0 as *almost* a usual release. It's not that by calling a release 1.0 instead of 0.20 or something we are going to freeze everything. It seems that we already have a healthy deprecation policy, so we might consider extending the deprecation periods, but that seems to be more or less it. > Here are the things that I see as essential: > - Getting project organization in order: governance and CoC at least (see my > other email of today). > - scipy.signal: clean up the messes in wavelets and B-splines.
> - scipy.signal: unified filter API [3] > - scipy.spatial: remove Python implementation of KDTree, just keep cKDTree > - scipy.interpolate: not sure of the details, but I think there are some new > interpolator classes and a spline PR that aren't quite finished? Not sure there are unfinished new interpolator classes apart from Pauli's rational interpolation PR. Which I'd label as a very-nice-to-have, but not a blocker. The spline PR is nearly ready. It mainly needs some more review and user testing (and a small amount of work to tweak the interface to be consistent with newer additions from 2016). The PR itself is fairly straightforward, but it just touches *a lot* of legacy code, so it's potentially disruptive. > - Remove some deprecated items (weave is the biggest one), and decide now if > there's anything else we need to deprecate. > - Merge or close more PRs. We've stabilized them at around 120-130 open > ones, but that's not really good enough if there are (almost) finished PRs > that no one has looked at in a year. This may be the single biggest task. +1 for this. I also agree it's better to keep this list short. I would, however, add to this list of high-priority items also Pauli's LowLevelFunction, https://github.com/scipy/scipy/pull/6509. > We shouldn't make the above list too long, otherwise we won't get there. > Really, SciPy is production quality software (with a few dusty corners), so > we should limit ourselves to listing what is essential here. > > Timing: I suspect that we want to deprecate some more things, and that 4 > months is a little too short to get to the point where we want to be. So I > would propose to still do a 0.19.0, make sure that all deprecations are in > there, merge PRs quite aggressively for 0.19.0 as well, and then plan 1.0 as > the next release (can be shorter than 6 months after 0.19.0). So maybe > Nov/Dec for 0.19.0 and say March '17 for 1.0. Either this, or skip 0.19.0 and just release 1.0 in March or so. 
The only thing I'd watch out for is that we likely want to avoid having three active branches (maintenance/0.19.x, maintenance/1.0.x and master) for an extended period of time. IOW, if we do release 0.19.0, we need to keep in mind 0.19.1 before branching 1.x. Or maybe we can ask the matplotlib people about managing their 1.5.x, 2.0.x and master branches. Cheers, Evgeni From evgeny.burovskiy at gmail.com Mon Sep 5 15:21:49 2016 From: evgeny.burovskiy at gmail.com (Evgeni Burovski) Date: Mon, 5 Sep 2016 22:21:49 +0300 Subject: [SciPy-Dev] Moving SciPy project organization forward In-Reply-To: References: Message-ID: > Mailing list > ------------ > This is not so we do more in private, but just to be more efficient in the > small number of conversations that already happens in private (like giving > out commit rights). The core team has grown large enough that we tend to > forget people or use an old email address. I will set this one up within a > week, probably using Google Groups (if anyone has a better idea, please > speak up). My plan is to add all active core devs, and send other people > with commit rights a personal email about this to ask if they want to be on > it (because not everyone follows this list). +1 A fun note in passing: discussing the commit rights on such a list would actually make things more open, since a new dev would get access to past discussions including the one about giving them commit rights :-). (And no, I'm not suggesting this is a problem.) Maybe we should also consider setting up some communication channels with higher bandwidth --- hangouts or conference calls. Yes, time zones are a problem. And yes, the conference call last year went off on more tangents than it needed to. But maybe if we manage to make these conference calls somewhat more regular, we'd be able to have more manageable agendas for each of them. I wonder how it worked back in the days of the numpy doc marathon: were there regular conf calls, and did they work?
Evgeni From evgeny.burovskiy at gmail.com Mon Sep 5 15:42:05 2016 From: evgeny.burovskiy at gmail.com (Evgeni Burovski) Date: Mon, 5 Sep 2016 22:42:05 +0300 Subject: [SciPy-Dev] SciPy governance model In-Reply-To: References: Message-ID: > So, I strongly suspect that our choice of model will come down to > whether we can agree on a project leader, or agree on a way to chose > one. The obvious candidates would I guess be in this list: > > git shortlog -ns --since "5 years ago" | head -5 > > 1789 Pauli Virtanen > 1528 Ralf Gommers > 770 Evgeni Burovski > 604 Alex Griffing > 402 Warren Weckesser > > I have the impression that you (Ralf) and Pauli have also been the > most active in reviewing and merging pull requests over that time. As far as I'm concerned, I have always considered these two individuals the project leads. > I think we should consider the numpy governance model, only if > there are none of you who want to be leader. Agreed --- it's better to have, at each point in time, a person with a deciding vote or veto. Or two people. And for the really low-probability case where the two cannot agree, they together designate a third one. (Not sure we need to worry about that possibility though.) If we want to be fancy and can't find a better alternative, we can consider an EU presidency model where the deciding vote position rotates among the dev team on a time basis. From ralf.gommers at gmail.com Mon Sep 5 15:45:06 2016 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Tue, 6 Sep 2016 07:45:06 +1200 Subject: [SciPy-Dev] Moving SciPy project organization forward In-Reply-To: References:

Message-ID: On Tue, Sep 6, 2016 at 7:21 AM, Evgeni Burovski wrote: > > Mailing list > > ------------ > > This is not so we do more in private, but just to be more efficient in > the > > small number of conversations that already happens in private (like > giving > > out commit rights). The core team has grown large enough that we tend to > > forget people or use an old email address. I will set this one up > within a > > week, probably using Google Groups (if anyone has a better idea, please > > speak up). My plan is to add all active core devs, and send other people > > with commit rights a personal email about this to ask if they want to be > on > > it (because not everyone follows this list). > > +1 > > A fun note in passing: discussing the commit rights on such a list > would actually make things more open, since a new dev would get access > to past discussions including the one about giving them commit rights > :-). > (And no, I'm not suggesting this is a problem.) > Agreed, that's a good thing. It gives new devs a picture of how decisions were made in the past. > Maybe we should also consider setting up some communication channels > with higher bandwidth --- handouts or conference calls. Yes, time > zones is a problem. And yes, the conference call last year went into > more tangents than it could have. But maybe if we manage to make these > conference calls somewhat more regular, we'd be able to have more > manageable agendas for each of them. > Yes, that could work. If we have a clear agenda and don't have to discuss changing the world each time, it may be productive. What were you thinking, like once every 6-8 weeks or so? > > I wonder how it worked back in times of the numpy doc marathon, were > there regular conf calls, did they work? > There were, IIRC once a month. Those were reasonably productive I think, but that was a small group (4-5 people) and a single topic. So hard to compare. 
Cheers, Ralf From ralf.gommers at gmail.com Mon Sep 5 15:56:27 2016 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Tue, 6 Sep 2016 07:56:27 +1200 Subject: [SciPy-Dev] Timing of SciPy 1.0 In-Reply-To: References:

Message-ID: On Tue, Sep 6, 2016 at 7:09 AM, Evgeni Burovski wrote: > > Starting with the summary of my email of earlier today: I'd like to pick > a > > time for when we release SciPy 1.0, and what is still essential to do for > > that version number. > > I'd think we should treat 1.0 as *almost* a usual release. It's not > that by calling a release 1.0 instead of 0.20 or something we are > going to freeze everything. It seems that we already have a healthy > deprecation policy, so we might consider extending the deprecation > periods, but that seems to be more or less it. > Indeed, that's what I was thinking. Thanks for spelling it out:) > > > > Here are the things that I see as essential: > > - Getting project organization in order: governance and CoC at least > (see my > > other email of today). > > - scipy.signal: clean up the messes in wavelets and B-splines. > > - scipy.signal: unified filter API [3] > > - scipy.spatial: remove Python implementation of KDTree, just keep > cKDTree > > - scipy.interpolate: not sure of the details, but I think there are some > new > > interpolator classes and a spline PR that aren't quite finished? > > Not sure there are unfinished new interpolator classes apart from > Pauli's rational interpolation PR. Which I'd label as a > very-nice-to-have, but not a blocker. > > The spline PR is nearly ready. It mainly needs some more review and > user testing (and a small amount of work to tweak the interface to be > consistent with newer additions from 2016). The PR itself is fairly > straightforward, but it just touches *a lot* of legacy code, so it's > potentially disruptive. > > > > - Remove some deprecated items (weave is the biggest one), and decide > now if > > there's anything else we need to deprecate. > > - Merge or close more PRs. We've stabilized them at around 120-130 open > > ones, but that's not really good enough if there are (almost) finished > PRs > > that no one has looked at in a year. 
This may be the single biggest > task. > > +1 for this. > > I also agree it's better to keep this list short. I would, however, > add to this list of high-priority items also Pauli's LowLevelFunction, > https://github.com/scipy/scipy/pull/6509. > That would be good indeed. > > > > We shouldn't make the above list too long, otherwise we won't get there. > > Really, SciPy is production quality software (with a few dusty corners), > so > > we should limit ourselves to listing what is essential here. > > > > Timing: I suspect that we want to deprecate some more things, and that 4 > > months is a little too short to get to the point where we want to be. > So I > > would propose to still do a 0.19.0, make sure that all deprecations are > in > > there, merge PRs quite aggressively for 0.19.0 as well, and then plan > 1.0 as > > the next release (can be shorter than 6 months after 0.19.0). So maybe > > Nov/Dec for 0.19.0 and say March '17 for 1.0. > > Either this, or skip 0.19.0 and just release 1.0 in March or so. > That works too, if there are no deprecations to be made of stuff we want to remove by 1.0. I'm not sure that there are. The signal splines would be nice to get rid of, but having them sit around deprecated is also not a problem. > The only thing I'd watch out for is that we likely want to avoid > having three active branches (maintenance/0.19.x, maintenance/1.0.x > and master) for an extended period of time. IOW, if we do release > 0.19.0, we need keep in mind 0.19.1 before branching 1.x. > Or maybe we can ask matplotlib people about managing their 1.5.x, > 2.0.x and master branches. > If we don't make radical/breaking changes for 1.0 (which is not the plan I think), there's also no strong reason to keep the last pre-1.0 branch as a long-lived maintenance branch I'd say. Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.gommers at gmail.com Mon Sep 5 16:34:47 2016 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Tue, 6 Sep 2016 08:34:47 +1200 Subject: [SciPy-Dev] Fwd: PyData Community Cookbook - August Update In-Reply-To: References: Message-ID: On Sun, Sep 4, 2016 at 1:26 PM, Charles R Harris wrote: > > > On Sat, Sep 3, 2016 at 6:13 PM, Ralf Gommers > wrote: > >> Hi all, >> >> Is anyone interested to write or contribute to a chapter about SciPy for >> a PyData Community Cookbook? We're a bit late (but so are most people), so >> ideally we get this organized within a day or two. It can be a >> single-author or multi-author effort. The projects that submitted an >> abstract so far all seem to do 2 or 3 authors: >> https://github.com/pydata/pydata-cookbook. >> >> I'm happy to contribute, or if there's a lot of interest leave it to >> others. Would like to see it happen though - SciPy should really not be >> missing in this book. >> >> > I had several projects that might have been interesting, but the code was > written for work and I would have needed to get permission from my former > employer to publish it... > Just to be sure, is that a yes, a maybe or a no to contributing? Warren is in (thanks!). Anyone else? This would be a great way also for newer contributors to help promote SciPy and participate in a book editing process. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From bennet at umich.edu Mon Sep 5 17:04:07 2016 From: bennet at umich.edu (Bennet Fauber) Date: Mon, 5 Sep 2016 17:04:07 -0400 Subject: [SciPy-Dev] Fwd: PyData Community Cookbook - August Update In-Reply-To: References: Message-ID: I am probably not able to provide any original contributions, but if you want someone of middling competence with Python/SciPy to look over examples and try to get things going from what you produce, I would be happy to help. Maybe that's the target audience? 
I was also a copy editor and would be willing to copy edit, if another set of editorial eyes would be useful. -- bennet From evgeny.burovskiy at gmail.com Mon Sep 5 17:21:58 2016 From: evgeny.burovskiy at gmail.com (Evgeni Burovski) Date: Tue, 6 Sep 2016 00:21:58 +0300 Subject: [SciPy-Dev] Fwd: PyData Community Cookbook - August Update In-Reply-To: References: Message-ID: Count me in, too.
From charlesr.harris at gmail.com Mon Sep 5 17:54:50 2016 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 5 Sep 2016 15:54:50 -0600 Subject: [SciPy-Dev] Fwd: PyData Community Cookbook - August Update In-Reply-To: References: Message-ID: On Mon, Sep 5, 2016 at 2:34 PM, Ralf Gommers wrote: > Just to be sure, is that a yes, a maybe or a no to contributing? > Somewhat reluctant no, I don't think I will have the time and don't want it hanging over my head. Chuck
From jakevdp at cs.washington.edu Mon Sep 5 22:14:36 2016 From: jakevdp at cs.washington.edu (Jacob Vanderplas) Date: Mon, 5 Sep 2016 19:14:36 -0700 Subject: [SciPy-Dev] cKDTree In-Reply-To: References: <10843293.1473022808690.JavaMail.wam@mswamui-backed.atl.sa.earthlink.net> Message-ID: From my own casual benchmarks, the new scipy cKDTree is much faster than any of the scikit-learn options, though it still only supports axis-aligned Euclidean-like metrics (where sklearn's BallTree supports dozens of additional metrics). The cKDTree also has a limited range of query types compared to scikit-learn's trees. Jake Jake VanderPlas Senior Data Science Fellow Director of Research in Physical Sciences University of Washington eScience Institute
From stefanv at berkeley.edu Tue Sep 6 04:40:05 2016 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Tue, 06 Sep 2016 01:40:05 -0700 Subject: [SciPy-Dev] SciPy governance model In-Reply-To: References: Message-ID: <1473151205.1482512.716887281.63889567@webmail.messagingengine.com> On Mon, Sep 5, 2016, at 12:42, Evgeni Burovski wrote: > If we want to be fancy and can't find a better alternative, we can > consider a EU presidency model where the deciding vote position is > rotating among the dev team on a time basis. As Matthew mentioned, I am indeed very interested in this conversation, since we're also investigating potential governance models for scikit-image. My feeling is that having a clear leader in place is important, so I'm also leaning away from the numpy model towards one where responsibilities are more explicitly assigned. Exactly how to best make that assignment is still unclear to me. Stéfan From larson.eric.d at gmail.com Tue Sep 6 09:37:59 2016 From: larson.eric.d at gmail.com (Eric Larson) Date: Tue, 6 Sep 2016 09:37:59 -0400 Subject: [SciPy-Dev] SciPy governance model In-Reply-To: <1473151205.1482512.716887281.63889567@webmail.messagingengine.com> References: <1473151205.1482512.716887281.63889567@webmail.messagingengine.com> Message-ID: > > My feeling is that having a clear leader in place is important, so I'm > also leaning away from the numpy model towards one where > responsibilities are more explicitly assigned. Exactly how to best make > that assignment is still unclear to me. > +1 for BD(FL) / leader-style from me, too. I like Matthew's suggestion that the top 5 active folks discuss among themselves to see which of them are actually interested in taking on that role. Eric
From larson.eric.d at gmail.com Tue Sep 6 10:13:45 2016 From: larson.eric.d at gmail.com (Eric Larson) Date: Tue, 6 Sep 2016 10:13:45 -0400 Subject: [SciPy-Dev] Moving SciPy project organization forward In-Reply-To: References:

Message-ID: > > Maybe we should also consider setting up some communication channels >> with higher bandwidth --- handouts or conference calls. Yes, time >> zones is a problem. And yes, the conference call last year went into >> more tangents than it could have. But maybe if we manage to make these >> conference calls somewhat more regular, we'd be able to have more >> manageable agendas for each of them. >> > > Yes, that could work. If we have a clear agenda and don't have to discuss > changing the world each time, it may be productive. What were you thinking, > like once every 6-8 weeks or so? > That sounds like a reasonable timing estimate, as it gets us a few meetings in between releases without adding too much burden (for me at least). Eric From sylvain.corlay at gmail.com Tue Sep 6 14:03:31 2016 From: sylvain.corlay at gmail.com (Sylvain Corlay) Date: Tue, 6 Sep 2016 20:03:31 +0200 Subject: [SciPy-Dev] cKDTree In-Reply-To: References: <10843293.1473022808690.JavaMail.wam@mswamui-backed.atl.sa.earthlink.net> Message-ID: Would you guys consider it in scope for scipy to have implementations of nearest-neighbor search methods that are faster than KDTree? Some methods are fairly simple, e.g. the principal axis tree, which uses the principal direction of the dataset to split it into smaller subsets. As soon as the intrinsic dimensionality is significantly smaller than the dimension of the space, it is significantly faster. Besides, computing only an (approximate) principal axis is much faster than doing an actual PCA. On Tue, Sep 6, 2016 at 4:14 AM, Jacob Vanderplas wrote: > From my own casual benchmarks, the new scipy cKDTree is much faster than > any of the scikit-learn options, though it still only supports axis-aligned > euclidean-like metrics (where sklearn's BallTree supports dozens of > additional metrics).
The cKDTree also has a limited range of query types > compared to scikit-learn's trees, > Jake From jakevdp at cs.washington.edu Tue Sep 6 14:08:41 2016 From: jakevdp at cs.washington.edu (Jacob Vanderplas) Date: Tue, 6 Sep 2016 11:08:41 -0700 Subject: [SciPy-Dev] cKDTree In-Reply-To: References: <10843293.1473022808690.JavaMail.wam@mswamui-backed.atl.sa.earthlink.net>

Message-ID: As for adding more functionality along those lines, I would advocate creation of a new package or perhaps a scikit. As we've seen with scikit-learn, useful features can filter-up into scipy (e.g. sparse.csgraph) and the development of new features within an independent package is *much* easier than development from within a scipy PR. Jake Jake VanderPlas Senior Data Science Fellow Director of Research in Physical Sciences University of Washington eScience Institute On Tue, Sep 6, 2016 at 11:03 AM, Sylvain Corlay wrote: > Would you guys consider in scope for scipy to have implementation of > faster nearest neighbor search methods than KdTree? > > Some methods are fairly simple... e.g principal axis tree which use the > principal direction of the dataset to split the dataset into smaller > subsets. As soon as intrinsic dimensionality is significantly smaller than > the dimension of the space, it is significantly faster. > > Besides, only having to compute the (an approximate) principal axis is > much faster than doing an actual PCA. > > On Tue, Sep 6, 2016 at 4:14 AM, Jacob Vanderplas < > jakevdp at cs.washington.edu> wrote: > >> From my own casual benchmarks, the new scipy cKDTree is much faster than >> any of the scikit-learn options, though it still only supports axis-aligned >> euclidean-like metrics (where sklearn's BallTree supports dozens of >> additional metrics). The cKDTree also has a limited range of query types >> compared to scikit-learn's trees, >> Jake >> >> Jake VanderPlas >> Senior Data Science Fellow >> Director of Research in Physical Sciences >> University of Washington eScience Institute >> >> On Mon, Sep 5, 2016 at 12:46 AM, Da?id wrote: >> >>> On 4 September 2016 at 23:00, Robert Lucente >>> wrote: >>> > Please note that I am a newbie and just a lurker. >>> > >>> > I noticed in a recent email that cKDTree was mentioned. >>> > >>> > Q: What is the relationship if any between SciPy an scikit-learn when >>> it comes to cKDTree? 
>>> > >>> > The reason that I ask are the following 2 links >>> > >>> > https://jakevdp.github.io/blog/2013/04/29/benchmarking-neare >>> st-neighbor-searches-in-python/ >>> > >>> > https://github.com/scikit-learn/scikit-learn/issues/3682 >>> >>> Note that these benchmarks are from 2013 and 2014. Scipy's KDTree has >>> seen its performance recently improved, twice. Scikit's last update to >>> its KDTree was in 2015. So, we need to run the benchmarks again. >>> >>> /David. >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at scipy.org >>> https://mail.scipy.org/mailman/listinfo/scipy-dev >>> >> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> https://mail.scipy.org/mailman/listinfo/scipy-dev >> >> > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > https://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sylvain.corlay at gmail.com Tue Sep 6 14:23:05 2016 From: sylvain.corlay at gmail.com (Sylvain Corlay) Date: Tue, 6 Sep 2016 20:23:05 +0200 Subject: [SciPy-Dev] cKDTree In-Reply-To: References: <10843293.1473022808690.JavaMail.wam@mswamui-backed.atl.sa.earthlink.net>

Message-ID:

I understand (especially about the scikits), although I was surprised by some recently added features in scipy, such as differential evolution.

DE is more of a "recipe" that has been studied empirically and for which there is no theoretical convergence rate. Even though it may "work well" for certain problems, this raises issues such as defining on what basis it should even be "improved", and what the foundation for a change should be. (Recently a result on convergence in probability (with no error rate) on a bounded domain has been published, but no result exists on the speed of convergence afaik...)

Evolutionary optimization is definitely cool and may work strikingly well for certain problems, but I was surprised that it got elected for inclusion in scipy. DE would have been a nice seed for a scikit-evolution.

For those interested in stochastic algorithms for global multivariate optimization for which we have proven convergence results and an abundant literature, we have at least the following two categories of methods:

- *MCMC* (Markov Chain Monte Carlo), which comprises simulated annealing, for which there are nice results of convergence in distribution.
- *Robbins-Monro* methods, which comprise stochastic gradient methods, for which we have almost-sure convergence and central-limit-type theorems.

This sort of raises the question of the scope of scipy. Should scipy go through the same sort of "big split" as Jupyter, or more à la d3js? Is the amount of specialized knowledge required to understand a method part of what defines the line of division between what should and should not be in scipy?

Sylvain

On Tue, Sep 6, 2016 at 8:08 PM, Jacob Vanderplas wrote: > As for adding more functionality along those lines, I would advocate > creation of a new package or perhaps a scikit. As we've seen with > scikit-learn, useful features can filter-up into scipy (e.g.
> sparse.csgraph) and the development of new features within an independent > package is *much* easier than development from within a scipy PR. > Jake > > Jake VanderPlas > Senior Data Science Fellow > Director of Research in Physical Sciences > University of Washington eScience Institute > > On Tue, Sep 6, 2016 at 11:03 AM, Sylvain Corlay > wrote: > >> Would you guys consider in scope for scipy to have implementation of >> faster nearest neighbor search methods than KdTree? >> >> Some methods are fairly simple... e.g principal axis tree which use the >> principal direction of the dataset to split the dataset into smaller >> subsets. As soon as intrinsic dimensionality is significantly smaller than >> the dimension of the space, it is significantly faster. >> >> Besides, only having to compute the (an approximate) principal axis is >> much faster than doing an actual PCA. >> >> On Tue, Sep 6, 2016 at 4:14 AM, Jacob Vanderplas < >> jakevdp at cs.washington.edu> wrote: >> >>> From my own casual benchmarks, the new scipy cKDTree is much faster than >>> any of the scikit-learn options, though it still only supports axis-aligned >>> euclidean-like metrics (where sklearn's BallTree supports dozens of >>> additional metrics). The cKDTree also has a limited range of query types >>> compared to scikit-learn's trees, >>> Jake >>> >>> Jake VanderPlas >>> Senior Data Science Fellow >>> Director of Research in Physical Sciences >>> University of Washington eScience Institute >>> >>> On Mon, Sep 5, 2016 at 12:46 AM, Da?id wrote: >>> >>>> On 4 September 2016 at 23:00, Robert Lucente >>>> wrote: >>>> > Please note that I am a newbie and just a lurker. >>>> > >>>> > I noticed in a recent email that cKDTree was mentioned. >>>> > >>>> > Q: What is the relationship if any between SciPy an scikit-learn when >>>> it comes to cKDTree? 
>>>> > >>>> > The reason that I ask are the following 2 links >>>> > >>>> > https://jakevdp.github.io/blog/2013/04/29/benchmarking-neare >>>> st-neighbor-searches-in-python/ >>>> > >>>> > https://github.com/scikit-learn/scikit-learn/issues/3682 >>>> >>>> Note that these benchmarks are from 2013 and 2014. Scipy's KDTree has >>>> seen its performance recently improved, twice. Scikit's last update to >>>> its KDTree was in 2015. So, we need to run the benchmarks again. >>>> >>>> /David. >>>> _______________________________________________ >>>> SciPy-Dev mailing list >>>> SciPy-Dev at scipy.org >>>> https://mail.scipy.org/mailman/listinfo/scipy-dev >>>> >>> >>> >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at scipy.org >>> https://mail.scipy.org/mailman/listinfo/scipy-dev >>> >>> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> https://mail.scipy.org/mailman/listinfo/scipy-dev >> >> > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > https://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Tue Sep 6 16:40:35 2016 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 6 Sep 2016 20:40:35 +0000 (UTC) Subject: [SciPy-Dev] cKDTree References: <10843293.1473022808690.JavaMail.wam@mswamui-backed.atl.sa.earthlink.net>

Message-ID:

Tue, 06 Sep 2016 20:23:05 +0200, Sylvain Corlay wrote:
[clip]
> This sort of raises the question of the scope of scipy. Should scipy
> go through the same sort of "big split" as Jupyter or more à la d3js? Is
> amount of specialized knowledge required to understand a methods part of
> what defines the line of division in what should be or not be in scipy?

I'm not sure a "big split" will help with defining the scope --- only changing the name of "scipy.spatial" to "scikit-spatial" will likely not help in deciding what its scope is. Splitting may help with speeding up the release cycle, but if the team stays the same, it sounds likely to multiply the amount of release work. Rebuild times and similar matters of development convenience are probably not a reason to split, as I think it's fast enough in practice currently.

The decision to accept something has so far generally been conditional on the following:

(i) the method is applicable in many fields and "generally agreed" to be useful,
(ii) it fits the topic of the subpackage, and does not require extensive support frameworks to operate,
(iii) the implementation looks sound and unlikely to need much tweaking in the future, and
(iv) someone wants to do it.

I don't remember being involved with DE, and can't immediately comment on how the PCA trees look here.

I think the original suggestion of scikits being something that will eventually, after maturing, end up in Scipy has generally not been the case in practice. To my memory, many of the recent new features were written specifically for Scipy from the beginning. (OTOH, I admit I did not go and look through merged pull requests with this eye, so maybe this claim is not accurate.)

I also don't really feel that the implied idea of Scipy as a "repository of algorithms" where you hand off code, developed in some other project on PyPI, for someone else to maintain is workable in practice --- that doesn't sound like a living open source project.
I would rather mentally substitute "scipy.interpolate" with "scikit-interpolate" and work accordingly to that picture. -- Pauli Virtanen From ralf.gommers at gmail.com Tue Sep 6 17:21:35 2016 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Wed, 7 Sep 2016 09:21:35 +1200 Subject: [SciPy-Dev] Fwd: PyData Community Cookbook - August Update In-Reply-To: References:

Message-ID: Thanks Warren, Evgeni. Sent a WIP PR with names and abstract to get the ball rolling: https://github.com/pydata/pydata-cookbook/pull/7 Cheers, Ralf On Tue, Sep 6, 2016 at 9:21 AM, Evgeni Burovski wrote: > Count me in, too. > On Sep 5, 2016 11:34 PM, "Ralf Gommers" wrote: > >> >> >> On Sun, Sep 4, 2016 at 1:26 PM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> >>> >>> On Sat, Sep 3, 2016 at 6:13 PM, Ralf Gommers >>> wrote: >>> >>>> Hi all, >>>> >>>> Is anyone interested to write or contribute to a chapter about SciPy >>>> for a PyData Community Cookbook? We're a bit late (but so are most people), >>>> so ideally we get this organized within a day or two. It can be a >>>> single-author or multi-author effort. The projects that submitted an >>>> abstract so far all seem to do 2 or 3 authors: >>>> https://github.com/pydata/pydata-cookbook. >>>> >>>> I'm happy to contribute, or if there's a lot of interest leave it to >>>> others. Would like to see it happen though - SciPy should really not be >>>> missing in this book. >>>> >>>> >>> I had several projects that might have been interesting, but the code >>> was written for work and I would have needed to get permission from my >>> former employer to publish it... >>> >> >> Just to be sure, is that a yes, a maybe or a no to contributing? >> >> Warren is in (thanks!). Anyone else? This would be a great way also for >> newer contributors to help promote SciPy and participate in a book editing >> process. >> >> Cheers, >> Ralf >> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> https://mail.scipy.org/mailman/listinfo/scipy-dev >> >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > https://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.gommers at gmail.com Tue Sep 6 17:25:22 2016 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Wed, 7 Sep 2016 09:25:22 +1200 Subject: [SciPy-Dev] Fwd: PyData Community Cookbook - August Update In-Reply-To: References:

Message-ID: On Tue, Sep 6, 2016 at 9:04 AM, Bennet Fauber wrote: > I am probably not able to provide any original contributions, but if > you want someone of middling competence with Python/SciPy to look over > examples and try to get things going from what you produce, I would be > happy to help. Maybe that's the target audience? > > I was also a copy editor and would be willing to copy edit, if another > set of editorial eyes would be useful. > Hi Bennet, thanks for the offer - help with editing and feedback on the content would be great. The target audience is indeed medium to advanced users. I've started an abstract PR at https://github.com/pydata/pydata-cookbook/pull/7. If you can comment there that would be helpful, if I have your GitHub handle I can include you in any follow-up content. Cheers, Ralf > -- bennet > > > > On Mon, Sep 5, 2016 at 4:34 PM, Ralf Gommers > wrote: > > > > > > On Sun, Sep 4, 2016 at 1:26 PM, Charles R Harris < > charlesr.harris at gmail.com> > > wrote: > >> > >> > >> > >> On Sat, Sep 3, 2016 at 6:13 PM, Ralf Gommers > >> wrote: > >>> > >>> Hi all, > >>> > >>> Is anyone interested to write or contribute to a chapter about SciPy > for > >>> a PyData Community Cookbook? We're a bit late (but so are most > people), so > >>> ideally we get this organized within a day or two. It can be a > single-author > >>> or multi-author effort. The projects that submitted an abstract so far > all > >>> seem to do 2 or 3 authors: https://github.com/pydata/pydata-cookbook. > >>> > >>> I'm happy to contribute, or if there's a lot of interest leave it to > >>> others. Would like to see it happen though - SciPy should really not be > >>> missing in this book. > >>> > >> > >> I had several projects that might have been interesting, but the code > was > >> written for work and I would have needed to get permission from my > former > >> employer to publish it... > > > > > > Just to be sure, is that a yes, a maybe or a no to contributing? 
> > > > Warren is in (thanks!). Anyone else? This would be a great way also for > > newer contributors to help promote SciPy and participate in a book > editing > > process. > > > > Cheers, > > Ralf > > > > > > _______________________________________________ > > SciPy-Dev mailing list > > SciPy-Dev at scipy.org > > https://mail.scipy.org/mailman/listinfo/scipy-dev > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > https://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Tue Sep 6 19:19:53 2016 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 6 Sep 2016 23:19:53 +0000 (UTC) Subject: [SciPy-Dev] cKDTree References: <10843293.1473022808690.JavaMail.wam@mswamui-backed.atl.sa.earthlink.net>

Message-ID:

Tue, 06 Sep 2016 20:23:05 +0200, Sylvain Corlay wrote:
[clip]
> This sort of raises the question of the scope of scipy. Should scipy go
> through the same sort of "big split" as Jupyter or more à la d3js? Is
> amount of specialized knowledge required to understand a methods part of
> what defines the line of division in what should be or not be in scipy?

To write a bit more on this, although I think it's difficult to give hard rules on what "generally useful and generally agreed to work" means, I would perhaps weigh the following against each other:

- Is the method used/useful in different domains in practice? How much domain-specific background knowledge is needed to use it properly?

- Consider the stuff already in the module. Is what you are adding an omission? Does it solve a problem that you'd expect the module to be able to solve? Does it supplement an existing feature in a significant way?

- Consider the equivalence class of similar methods / features usually expected. Among them, what would in principle be the minimal set so that there's not a glaring omission in the offered features remaining? How much stuff would that be? Does including a representative one of them cover most use cases? Would it in principle sound reasonable to include everything from the minimal set in the module?

- Is what you are adding something that is well understood in the literature? If not, how sure are you that it will turn out well? Does the method perform well compared to other similar ones?

- Note that the twice-a-year release cycle and backward-compat policy make correcting things later on more difficult.

The scopes of the submodules also vary, so it's probably best to consider each as if it were a separate project --- "numerical evaluation of special functions" is relatively well-defined, but "commonly needed optimization algorithms" less so.
On a meta-level, it's probably also bad to be too restrictive on the scope, as telling people to go away can result in just that.

-- Pauli Virtanen

From greg.dooper at gmail.com Wed Sep 7 00:56:52 2016 From: greg.dooper at gmail.com (Greg Dooper) Date: Tue, 6 Sep 2016 22:56:52 -0600 Subject: [SciPy-Dev] misc.bytescale bug In-Reply-To: References:

Message-ID:

Hello,

It seems like there is a bug in misc.bytescale. Or I am not understanding what the intended behavior should be. What I think is a bug shows up when you try to bytescale an array with both cmin/cmax parameters and low/high parameters.

I think it's a pretty simple fix that I would like to tackle.

What is the next step for me? Creating an issue on github with some example code and data?

Thanks for the input.

-Greg Dooper

From warren.weckesser at gmail.com Wed Sep 7 01:10:48 2016 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Wed, 7 Sep 2016 01:10:48 -0400 Subject: [SciPy-Dev] misc.bytescale bug In-Reply-To: References:

Message-ID:

On Wed, Sep 7, 2016 at 12:56 AM, Greg Dooper wrote:

> Hello,
> It seems like there is a bug in misc.bytescale. Or I am not understanding
> what the intended behavior should be. What I think is a bug shows up when
> you try to bytescale an array with both cmin/cmax parameters and low/high
> parameters.
>
> I think it's a pretty simple fix that I would like to tackle.
>
> What is the next step for me? Creating an issue on github with some
> example code and data?
>

Yes, that would be great. Be sure to include an example that demonstrates the problem.

Warren

> Thanks for the input.
>
> -Greg Dooper

From greg.dooper at gmail.com Thu Sep 8 01:37:46 2016 From: greg.dooper at gmail.com (Greg Dooper) Date: Wed, 7 Sep 2016 23:37:46 -0600 Subject: [SciPy-Dev] misc.bytescale bug In-Reply-To: References:

Message-ID:

I have a pull request up for this bug (https://github.com/scipy/scipy/pull/6554). I should probably squash some of the commits together, but I'd rather wait on feedback first.

-Greg

On Tue, Sep 6, 2016 at 11:10 PM, Warren Weckesser <warren.weckesser at gmail.com> wrote:

> On Wed, Sep 7, 2016 at 12:56 AM, Greg Dooper wrote:
>
>> Hello,
>> It seems like there is a bug in misc.bytescale. Or I am not understanding
>> what the intended behavior should be. What I think is a bug shows up when
>> you try to bytescale an array with both cmin/cmax parameters and low/high
>> parameters.
>>
>> I think it's a pretty simple fix that I would like to tackle.
>>
>> What is the next step for me? Creating an issue on github with some
>> example code and data?
>>
>
> Yes, that would be great. Be sure to include an example that demonstrates
> the problem.
>
> Warren
>
>> Thanks for the input.
>>
>> -Greg Dooper

From ralf.gommers at gmail.com Thu Sep 8 04:11:51 2016 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Thu, 8 Sep 2016 20:11:51 +1200 Subject: [SciPy-Dev] cKDTree In-Reply-To: References: <10843293.1473022808690.JavaMail.wam@mswamui-backed.atl.sa.earthlink.net>

Message-ID: On Wed, Sep 7, 2016 at 11:19 AM, Pauli Virtanen wrote: > Tue, 06 Sep 2016 20:23:05 +0200, Sylvain Corlay kirjoitti: > [clip] > > This sort of raises the question of the scope of scipy. Should scipy go > > through the same sort of "big split" as Jupyter or more ? la d3js? Is > > amount of specialized knowledge required to understand a methods part of > > what defines the line of division in what should be or not be in scipy? > > To write a bit more on this, although I think it's difficult to give hard > rules on what "generally useful and generally agreed to work" means, I > would perhaps weigh the following against each other: > > - Is the method used/useful in different domains in practice? > How much domain-specific background knowledge is needed to use it > properly? > > - Consider the stuff already in the module. Is what you are adding > an omission? Does it solve a problem that you'd expect the module > be able to solve? Does it supplement an existing feature in > a significant way? > > - Consider the equivalence class of similar methods / features usually > expected. Among them, what would in principle be the minimal set so > that there's not a glaring omission in the offered features remaining? > How much stuff would that be? Does including a representative one of > them cover most use cases? Would it in principle sound reasonable to > include everything from the minimal set in the module? > > - Is what you are adding something that is well understood in the > literature? If not, how sure are you that it will turn out well? > Does the method perform well compared to other similar ones? > > - Note that the twice-a-year release cycle and backward-compat > policy makes correcting things later on more difficult. 
> > The scopes of the submodules also vary, so it's probably best to consider > each as if a separate project --- "numerical evaluation of special > functions" is relatively well-defined, but "commonly needed optimization > algorithms" less so. > > On a meta-level, it's probably also bad to be too restrictive on the > scope, as telling people to go away can result to just that. > Thanks Pauli, this and your other mail are the best summary yet of how to judge suitability for inclusion. I propose to stick this almost verbatim in the developer docs. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Thu Sep 8 04:18:56 2016 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Thu, 8 Sep 2016 20:18:56 +1200 Subject: [SciPy-Dev] cKDTree In-Reply-To: References: <10843293.1473022808690.JavaMail.wam@mswamui-backed.atl.sa.earthlink.net>

Message-ID: On Wed, Sep 7, 2016 at 6:23 AM, Sylvain Corlay wrote: > I understand (especially about the scikits), although I was surprised by > some recently added features in scipy, such as differential evolution > > DE is more of a "recipe" which has been studied empirically for which > there is no theoretical convergence rate. Even though it may "work well" > for certain problems, it causes some issues such as defining on which basis > it should even be "improved", and what should be the foundation for a > change. (Recently a result on convergence in probability (with no error > rate) on a bounded domain has been published, but no result exists on the > speed of convergence afaik...) > > Evolutionary optimization is definitely cool and may work strikingly well > for certain problems, but I was surprised that it got elected to inclusion > in scipy. DE would have been a nice seed for a scikit-evolution. > > For those interested in stochastic algorithms for global multivariate > optimization for which we have proven convergence results and an abundant > literature, we have at least the two following categories of methods > > - *MCMC* (Markov Chain Monte Carlo) which comprises simulated annealing, > for which there are nice results of convergence in distribution. > - *Robbins-Monro* methods, which comprise stochastic gradient methods, > for which we have almost-sure convergence and central-limit - type theorems. > I think it's fair to summarize the reason for the current status and inclusion of DE as: theoretical convergence rates and papers are nice, but code that actually works on a good set of benchmarks is way more important. Some years ago we had only bad global optimizers. We did have simulated annealing, but it just didn't work for most problems. No one stepped up to improve it, so we deprecated and removed it. DE was benchmarked quite thoroughly (on the benchmark functions from benchmarks/benchmarks/test_go_benchmark_functions.py) and came out looking good. 
It solved problems that the only other good optimizer we had (basinhopping) did not do well on. So that means there was value in adding it. Cheers, Ralf > This sort of raises the question of the scope of scipy. Should scipy go > through the same sort of "big split" as Jupyter or more ? la d3js? Is > amount of specialized knowledge required to understand a methods part of > what defines the line of division in what should be or not be in scipy? > > Sylvain > > On Tue, Sep 6, 2016 at 8:08 PM, Jacob Vanderplas < > jakevdp at cs.washington.edu> wrote: > >> As for adding more functionality along those lines, I would advocate >> creation of a new package or perhaps a scikit. As we've seen with >> scikit-learn, useful features can filter-up into scipy (e.g. >> sparse.csgraph) and the development of new features within an independent >> package is *much* easier than development from within a scipy PR. >> Jake >> >> Jake VanderPlas >> Senior Data Science Fellow >> Director of Research in Physical Sciences >> University of Washington eScience Institute >> >> On Tue, Sep 6, 2016 at 11:03 AM, Sylvain Corlay > > wrote: >> >>> Would you guys consider in scope for scipy to have implementation of >>> faster nearest neighbor search methods than KdTree? >>> >>> Some methods are fairly simple... e.g principal axis tree which use the >>> principal direction of the dataset to split the dataset into smaller >>> subsets. As soon as intrinsic dimensionality is significantly smaller than >>> the dimension of the space, it is significantly faster. >>> >>> Besides, only having to compute the (an approximate) principal axis is >>> much faster than doing an actual PCA. 
>>> >>> On Tue, Sep 6, 2016 at 4:14 AM, Jacob Vanderplas < >>> jakevdp at cs.washington.edu> wrote: >>> >>>> From my own casual benchmarks, the new scipy cKDTree is much faster >>>> than any of the scikit-learn options, though it still only supports >>>> axis-aligned euclidean-like metrics (where sklearn's BallTree supports >>>> dozens of additional metrics). The cKDTree also has a limited range of >>>> query types compared to scikit-learn's trees, >>>> Jake >>>> >>>> Jake VanderPlas >>>> Senior Data Science Fellow >>>> Director of Research in Physical Sciences >>>> University of Washington eScience Institute >>>> >>>> On Mon, Sep 5, 2016 at 12:46 AM, Da?id wrote: >>>> >>>>> On 4 September 2016 at 23:00, Robert Lucente >>>>> wrote: >>>>> > Please note that I am a newbie and just a lurker. >>>>> > >>>>> > I noticed in a recent email that cKDTree was mentioned. >>>>> > >>>>> > Q: What is the relationship if any between SciPy an scikit-learn >>>>> when it comes to cKDTree? >>>>> > >>>>> > The reason that I ask are the following 2 links >>>>> > >>>>> > https://jakevdp.github.io/blog/2013/04/29/benchmarking-neare >>>>> st-neighbor-searches-in-python/ >>>>> > >>>>> > https://github.com/scikit-learn/scikit-learn/issues/3682 >>>>> >>>>> Note that these benchmarks are from 2013 and 2014. Scipy's KDTree has >>>>> seen its performance recently improved, twice. Scikit's last update to >>>>> its KDTree was in 2015. So, we need to run the benchmarks again. >>>>> >>>>> /David. 
>>>>> _______________________________________________ >>>>> SciPy-Dev mailing list >>>>> SciPy-Dev at scipy.org >>>>> https://mail.scipy.org/mailman/listinfo/scipy-dev >>>>> >>>> >>>> >>>> _______________________________________________ >>>> SciPy-Dev mailing list >>>> SciPy-Dev at scipy.org >>>> https://mail.scipy.org/mailman/listinfo/scipy-dev >>>> >>>> >>> >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at scipy.org >>> https://mail.scipy.org/mailman/listinfo/scipy-dev >>> >>> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> https://mail.scipy.org/mailman/listinfo/scipy-dev >> >> > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > https://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sylvain.corlay at gmail.com Thu Sep 8 06:02:35 2016 From: sylvain.corlay at gmail.com (Sylvain Corlay) Date: Thu, 8 Sep 2016 12:02:35 +0200 Subject: [SciPy-Dev] cKDTree In-Reply-To: References: <10843293.1473022808690.JavaMail.wam@mswamui-backed.atl.sa.earthlink.net>

Message-ID: Hi Ralf, On Thu, Sep 8, 2016 at 10:18 AM, Ralf Gommers wrote: > > On Wed, Sep 7, 2016 at 6:23 AM, Sylvain Corlay > wrote: > >> I understand (especially about the scikits), although I was surprised by >> some recently added features in scipy, such as differential evolution >> >> DE is more of a "recipe" which has been studied empirically for which >> there is no theoretical convergence rate. Even though it may "work well" >> for certain problems, it causes some issues such as defining on which basis >> it should even be "improved", and what should be the foundation for a >> change. (Recently a result on convergence in probability (with no error >> rate) on a bounded domain has been published, but no result exists on the >> speed of convergence afaik...) >> > >> Evolutionary optimization is definitely cool and may work strikingly well >> for certain problems, but I was surprised that it got elected to inclusion >> in scipy. DE would have been a nice seed for a scikit-evolution. >> >> For those interested in stochastic algorithms for global multivariate >> optimization for which we have proven convergence results and an abundant >> literature, we have at least the two following categories of methods >> >> - *MCMC* (Markov Chain Monte Carlo) which comprises simulated >> annealing, for which there are nice results of convergence in distribution. >> - *Robbins-Monro* methods, which comprise stochastic gradient methods, >> for which we have almost-sure convergence and central-limit - type theorems. >> > > I think it's fair to summarize the reason for the current status and > inclusion of DE as: theoretical convergence rates and papers are nice, but > code that actually works on a good set of benchmarks is way more important. > > Some years ago we had only bad global optimizers. We did have simulated > annealing, but it just didn't work for most problems. No one stepped up to > improve it, so we deprecated and removed it. 
> > DE was benchmarked quite thoroughly (on the benchmark functions from > benchmarks/benchmarks/test_go_benchmark_functions.py) and came out > looking good. It solved problems that the only other good optimizer we had > (basinhopping) did not do well on. So that means there was value in adding > it. > > Cheers, > Ralf > >

It does raise the question of what counts as an "improvement" for a DE optimizer, since it is only empirical. Is an improvement something that makes it work better on a fixed set of examples? Is it something that brings it closer to the seminal article describing it, or to a later, more general description of this approach?

DE was added to Scipy at almost the same time as a first implementation of a simple LP solver. In the case of the LP solver, bug reports can easily be sorted: I have this reasonably-sized and well-conditioned LP problem. The solver says it is infeasible, while this set of values is in the feasible set -> clear bug.

I find it a bit worrying that theoretical bounds or rates were never a consideration when a method was elected to be part of Scipy. I would have thought that some scientific foundation was a requirement. Just imagine: I have a new uniformly filling sequence, but no proof that it is pseudo-random, and I don't even know the discrepancy of the sequence, but a bunch of examples for which "it works"... Well, I doubt that anyone would want to use it for Monte-Carlo simulation / cryptography etc...

In any case, I find this question of the need for a scientific justification worth answering in general - especially in the context of the discussions on scope and the 1.0 release.

Sylvain

From Sylvain.Gubian at pmi.com Thu Sep 8 08:24:56 2016 From: Sylvain.Gubian at pmi.com (Gubian, Sylvain) Date: Thu, 8 Sep 2016 12:24:56 +0000 Subject: [SciPy-Dev] SciPy-Dev Digest, Vol 155, Issue 12 In-Reply-To: References: Message-ID:

Dear All,

Following the discussion regarding DE and simulated annealing, I have made a PR proposing a new implementation of simulated annealing (a generalized one): https://github.com/scipy/scipy/pull/6197 (already kindly commented on by Ralf and Andrew). The homemade (parallelized) benchmarks are showing very good results, especially in terms of success rate. Since then, Andrew has fixed the benchmark code for python3.

In order for this enhancement proposal to be considered, what do you recommend I do? The idea is to benchmark it with the newly fixed benchmark (using the PR branch to have this new optimizer available). I presented a poster at EuroSciPy in Erlangen describing the performance of this approach and had a nice talk with Olivier Grisel about it. Any thoughts on how to move this forward?

Thanks a lot in advance.

Sylvain.

-----Original Message----- From: SciPy-Dev [mailto:scipy-dev-bounces at scipy.org] On Behalf Of scipy-dev-request at scipy.org Sent: Thursday, 8 September 2016 10:19 To: scipy-dev at scipy.org Subject: SciPy-Dev Digest, Vol 155, Issue 12 Send SciPy-Dev mailing list submissions to scipy-dev at scipy.org To subscribe or unsubscribe via the World Wide Web, visit https://mail.scipy.org/mailman/listinfo/scipy-dev or, via email, send a message with subject or body 'help' to scipy-dev-request at scipy.org You can reach the person managing the list at scipy-dev-owner at scipy.org When replying, please edit your Subject line so it is more specific than "Re: Contents of SciPy-Dev digest..." Today's Topics: 1. Re: misc.bytescale bug (Greg Dooper) 2. Re: cKDTree (Ralf Gommers) 3.
Re: cKDTree (Ralf Gommers) ---------------------------------------------------------------------- Message: 1 Date: Wed, 7 Sep 2016 23:37:46 -0600 From: Greg Dooper To: SciPy Developers List Subject: Re: [SciPy-Dev] misc.bytescale bug Message-ID: Content-Type: text/plain; charset="utf-8" I have a pull request up for this bug ( https://github.com/scipy/scipy/pull/6554). I should probably squash some of the commits together, but I'd rather wait on feedback first. -Greg On Tue, Sep 6, 2016 at 11:10 PM, Warren Weckesser < warren.weckesser at gmail.com> wrote: > > > On Wed, Sep 7, 2016 at 12:56 AM, Greg Dooper > wrote: > >> Hello, >> It seems like there is a bug in misc.bytescale. Or I am not >> understanding what the intended behavior should be. What I think is a >> bug shows up when you try to bytescale an array with both cmin/cmax >> parameters and low/high parameters. >> >> I think it's a pretty simple fix that I would like to tackle. >> >> What is the next step for me? Creating an issue on github with some >> example code and data? >> > > Yes, that would be great. Be sure to include an example that > demonstrates the problem. > > Warren > > > >> Thanks for the input. >> >> -Greg Dooper >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> https://mail.scipy.org/mailman/listinfo/scipy-dev >> >> > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > https://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: ------------------------------ Message: 2 Date: Thu, 8 Sep 2016 20:11:51 +1200 From: Ralf Gommers To: SciPy Developers List Subject: Re: [SciPy-Dev] cKDTree Message-ID: Content-Type: text/plain; charset="utf-8" On Wed, Sep 7, 2016 at 11:19 AM, Pauli Virtanen wrote: > Tue, 06 Sep 2016 20:23:05 +0200, Sylvain Corlay kirjoitti: > [clip] > > This sort of raises the question of the scope of scipy. Should scipy > > go through the same sort of "big split" as Jupyter or more ? la > > d3js? Is amount of specialized knowledge required to understand a > > methods part of what defines the line of division in what should be or not be in scipy? > > To write a bit more on this, although I think it's difficult to give > hard rules on what "generally useful and generally agreed to work" > means, I would perhaps weigh the following against each other: > > - Is the method used/useful in different domains in practice? > How much domain-specific background knowledge is needed to use it > properly? > > - Consider the stuff already in the module. Is what you are adding > an omission? Does it solve a problem that you'd expect the module > be able to solve? Does it supplement an existing feature in > a significant way? > > - Consider the equivalence class of similar methods / features usually > expected. Among them, what would in principle be the minimal set so > that there's not a glaring omission in the offered features remaining? > How much stuff would that be? Does including a representative one of > them cover most use cases? Would it in principle sound reasonable to > include everything from the minimal set in the module? > > - Is what you are adding something that is well understood in the > literature? If not, how sure are you that it will turn out well? > Does the method perform well compared to other similar ones? > > - Note that the twice-a-year release cycle and backward-compat > policy makes correcting things later on more difficult. 
> > The scopes of the submodules also vary, so it's probably best to > consider each as if a separate project --- "numerical evaluation of > special functions" is relatively well-defined, but "commonly needed > optimization algorithms" less so. > > On a meta-level, it's probably also bad to be too restrictive on the > scope, as telling people to go away can result to just that. > Thanks Pauli, this and your other mail are the best summary yet of how to judge suitability for inclusion. I propose to stick this almost verbatim in the developer docs. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 3 Date: Thu, 8 Sep 2016 20:18:56 +1200 From: Ralf Gommers To: SciPy Developers List Subject: Re: [SciPy-Dev] cKDTree Message-ID: Content-Type: text/plain; charset="utf-8" On Wed, Sep 7, 2016 at 6:23 AM, Sylvain Corlay wrote: > I understand (especially about the scikits), although I was surprised > by some recently added features in scipy, such as differential > evolution > > DE is more of a "recipe" which has been studied empirically for which > there is no theoretical convergence rate. Even though it may "work well" > for certain problems, it causes some issues such as defining on which > basis it should even be "improved", and what should be the foundation > for a change. (Recently a result on convergence in probability (with > no error > rate) on a bounded domain has been published, but no result exists on > the speed of convergence afaik...) > > Evolutionary optimization is definitely cool and may work strikingly > well for certain problems, but I was surprised that it got elected to > inclusion in scipy. DE would have been a nice seed for a scikit-evolution. 
> > For those interested in stochastic algorithms for global multivariate > optimization for which we have proven convergence results and an > abundant literature, we have at least the two following categories of > methods > > - *MCMC* (Markov Chain Monte Carlo) which comprises simulated > annealing, for which there are nice results of convergence in distribution. > - *Robbins-Monro* methods, which comprise stochastic gradient > methods, for which we have almost-sure convergence and central-limit - type theorems. > I think it's fair to summarize the reason for the current status and inclusion of DE as: theoretical convergence rates and papers are nice, but code that actually works on a good set of benchmarks is way more important. Some years ago we had only bad global optimizers. We did have simulated annealing, but it just didn't work for most problems. No one stepped up to improve it, so we deprecated and removed it. DE was benchmarked quite thoroughly (on the benchmark functions from benchmarks/benchmarks/test_go_benchmark_functions.py) and came out looking good. It solved problems that the only other good optimizer we had (basinhopping) did not do well on. So that means there was value in adding it. Cheers, Ralf > This sort of raises the question of the scope of scipy. Should scipy > go through the same sort of "big split" as Jupyter or more ? la d3js? > Is amount of specialized knowledge required to understand a methods > part of what defines the line of division in what should be or not be in scipy? > > Sylvain > > On Tue, Sep 6, 2016 at 8:08 PM, Jacob Vanderplas < > jakevdp at cs.washington.edu> wrote: > >> As for adding more functionality along those lines, I would advocate >> creation of a new package or perhaps a scikit. As we've seen with >> scikit-learn, useful features can filter-up into scipy (e.g. >> sparse.csgraph) and the development of new features within an >> independent package is *much* easier than development from within a scipy PR. 
>> Jake >> >> Jake VanderPlas >> Senior Data Science Fellow >> Director of Research in Physical Sciences University of Washington >> eScience Institute >> >> On Tue, Sep 6, 2016 at 11:03 AM, Sylvain Corlay >> > > wrote: >> >>> Would you guys consider in scope for scipy to have implementation of >>> faster nearest neighbor search methods than KdTree? >>> >>> Some methods are fairly simple... e.g principal axis tree which use >>> the principal direction of the dataset to split the dataset into >>> smaller subsets. As soon as intrinsic dimensionality is >>> significantly smaller than the dimension of the space, it is significantly faster. >>> >>> Besides, only having to compute the (an approximate) principal axis >>> is much faster than doing an actual PCA. >>> >>> On Tue, Sep 6, 2016 at 4:14 AM, Jacob Vanderplas < >>> jakevdp at cs.washington.edu> wrote: >>> >>>> From my own casual benchmarks, the new scipy cKDTree is much faster >>>> than any of the scikit-learn options, though it still only supports >>>> axis-aligned euclidean-like metrics (where sklearn's BallTree >>>> supports dozens of additional metrics). The cKDTree also has a >>>> limited range of query types compared to scikit-learn's trees, >>>> Jake >>>> >>>> Jake VanderPlas >>>> Senior Data Science Fellow >>>> Director of Research in Physical Sciences University of >>>> Washington eScience Institute >>>> >>>> On Mon, Sep 5, 2016 at 12:46 AM, Da?id wrote: >>>> >>>>> On 4 September 2016 at 23:00, Robert Lucente >>>>> >>>>> wrote: >>>>> > Please note that I am a newbie and just a lurker. >>>>> > >>>>> > I noticed in a recent email that cKDTree was mentioned. >>>>> > >>>>> > Q: What is the relationship if any between SciPy an scikit-learn >>>>> when it comes to cKDTree? 
>>>>> > >>>>> > The reason that I ask are the following 2 links >>>>> > >>>>> > https://jakevdp.github.io/blog/2013/04/29/benchmarking-neare >>>>> st-neighbor-searches-in-python/ >>>>> > >>>>> > https://github.com/scikit-learn/scikit-learn/issues/3682 >>>>> >>>>> Note that these benchmarks are from 2013 and 2014. Scipy's KDTree >>>>> has seen its performance recently improved, twice. Scikit's last >>>>> update to its KDTree was in 2015. So, we need to run the benchmarks again. >>>>> >>>>> /David. >>>>> _______________________________________________ >>>>> SciPy-Dev mailing list >>>>> SciPy-Dev at scipy.org >>>>> https://mail.scipy.org/mailman/listinfo/scipy-dev >>>>> >>>> >>>> >>>> _______________________________________________ >>>> SciPy-Dev mailing list >>>> SciPy-Dev at scipy.org >>>> https://mail.scipy.org/mailman/listinfo/scipy-dev >>>> >>>> >>> >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at scipy.org >>> https://mail.scipy.org/mailman/listinfo/scipy-dev >>> >>> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> https://mail.scipy.org/mailman/listinfo/scipy-dev >> >> > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > https://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Subject: Digest Footer _______________________________________________ SciPy-Dev mailing list SciPy-Dev at scipy.org https://mail.scipy.org/mailman/listinfo/scipy-dev ------------------------------ End of SciPy-Dev Digest, Vol 155, Issue 12 ****************************************** From pav at iki.fi Thu Sep 8 14:26:41 2016 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 8 Sep 2016 18:26:41 +0000 (UTC) Subject: [SciPy-Dev] cKDTree References: <10843293.1473022808690.JavaMail.wam@mswamui-backed.atl.sa.earthlink.net>

Message-ID: Hi, Thu, 08 Sep 2016 12:02:35 +0200, Sylvain Corlay wrote: > Just imagine: I have a new uniformly filling sequence, but no proof that > it is pseudo-random, and I don't even know the discrepancy of the > sequence, > but a bunch of examples for which "it works"... Well, I doubt that > anyone would want to use it for Monte-Carlo simulation / cryptography > etc... Are you presenting this as a fair analogy to DE and the literature about it? > In any case, I find this question on the need of a scientific > justification worthy to be answered in general - especially in the > context of the discussions on scope and the 1.0 release. Yes, justification that an approach works and is regarded as useful is required. This is not really the place to do completely original research. If I understand you correctly, you are saying that you would not generally recommend DE as a global optimization method, because there are superior choices? Or, are you saying it's essentially a crock? -- Pauli Virtanen From ralf.gommers at gmail.com Thu Sep 8 14:52:17 2016 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Fri, 9 Sep 2016 06:52:17 +1200 Subject: [SciPy-Dev] cKDTree In-Reply-To: References: <10843293.1473022808690.JavaMail.wam@mswamui-backed.atl.sa.earthlink.net>

Message-ID: On Thu, Sep 8, 2016 at 10:02 PM, Sylvain Corlay wrote: > Hi Ralf, > > > On Thu, Sep 8, 2016 at 10:18 AM, Ralf Gommers > wrote: > >> >> On Wed, Sep 7, 2016 at 6:23 AM, Sylvain Corlay >> wrote: >> >>> I understand (especially about the scikits), although I was surprised by >>> some recently added features in scipy, such as differential evolution >>> >>> DE is more of a "recipe" which has been studied empirically for which >>> there is no theoretical convergence rate. Even though it may "work well" >>> for certain problems, it causes some issues such as defining on which basis >>> it should even be "improved", and what should be the foundation for a >>> change. (Recently a result on convergence in probability (with no error >>> rate) on a bounded domain has been published, but no result exists on the >>> speed of convergence afaik...) >>> >> >>> Evolutionary optimization is definitely cool and may work strikingly >>> well for certain problems, but I was surprised that it got elected to >>> inclusion in scipy. DE would have been a nice seed for a scikit-evolution. >>> >>> For those interested in stochastic algorithms for global multivariate >>> optimization for which we have proven convergence results and an abundant >>> literature, we have at least the two following categories of methods >>> >>> - *MCMC* (Markov Chain Monte Carlo) which comprises simulated >>> annealing, for which there are nice results of convergence in distribution. >>> - *Robbins-Monro* methods, which comprise stochastic gradient methods, >>> for which we have almost-sure convergence and central-limit - type theorems. >>> >> >> I think it's fair to summarize the reason for the current status and >> inclusion of DE as: theoretical convergence rates and papers are nice, but >> code that actually works on a good set of benchmarks is way more important. >> >> Some years ago we had only bad global optimizers. We did have simulated >> annealing, but it just didn't work for most problems. 
No one stepped up to >> improve it, so we deprecated and removed it. >> >> DE was benchmarked quite thoroughly (on the benchmark functions from >> benchmarks/benchmarks/test_go_benchmark_functions.py) and came out >> looking good. It solved problems that the only other good optimizer we had >> (basinhopping) did not do well on. So that means there was value in adding >> it. >> >> Cheers, >> Ralf >> >> > It does raise the question on what to define as an "improvement" for a DE > optimizer since it is only empirical. Is an improvement something that > makes it work better on a fixed set of examples? > Unless the set of examples is too small or the improvement is tailored for those examples (which requires a bit of judgement by the reviewers), yes. > Is it something that makes it closer to the seminal article describing it > or a later more general description of this approach? > That has to be judged on a case by case basis I'd say. > > I find this a bit worrying that it was never question of theoretical > bounds or rates for a method to be elected as part of Scipy. I would have > thought that some scientific foundation was a requirement. > There is a large body of peer-reviewed work for DE, and Google Scholar tells me that the original Storn and Price paper has 11652 citations. So that very very clearly meets any requirement for a scientific foundation we have. I'm not sure if your questioning of reasons for including DE has anything to do with your original question of adding new methods to KDTree, but on that topic: I would suggest to sketch the picture of pros and cons for Jake's suggestion of a scikit. How wide would the scope of that scikit potentially be, is there a smaller subset of methods that would make sense for scipy. Yu Feng and Sturla Molden have been improving cKDTree pretty much every release, so it's certainly not code that is defined as maintenance-only. Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
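As a rough illustration of the kind of empirical comparison described in this message, the sketch below runs scipy.optimize.differential_evolution and scipy.optimize.basinhopping on a single multimodal test function. The Rastrigin function, the bounds, the starting point, and the seed are assumptions chosen for illustration; they are not taken from the scipy benchmark suite mentioned above, and a single function says little about general performance.

```python
import numpy as np
from scipy.optimize import basinhopping, differential_evolution


def rastrigin(x):
    """Rastrigin test function (illustrative choice); global minimum 0 at x = 0."""
    x = np.asarray(x, dtype=float)
    return 10.0 * x.size + np.sum(x**2 - 10.0 * np.cos(2.0 * np.pi * x))


# Search box for the population-based optimizer (assumed, conventional range).
bounds = [(-5.12, 5.12)] * 2

# Differential evolution needs only bounds, no starting point.
de_result = differential_evolution(rastrigin, bounds, seed=0, tol=1e-8)

# Basinhopping starts from a single point and hops between local minima.
bh_result = basinhopping(rastrigin, x0=[3.0, -3.0], niter=100, seed=0)

print("differential_evolution:", de_result.x, de_result.fun)
print("basinhopping:          ", bh_result.x, bh_result.fun)
```

Comparing success rates over many functions and many random starts, rather than one run, is closer to what a benchmark suite would do.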
URL: From pav at iki.fi Thu Sep 8 15:12:39 2016 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 8 Sep 2016 19:12:39 +0000 (UTC) Subject: [SciPy-Dev] cKDTree References: <10843293.1473022808690.JavaMail.wam@mswamui-backed.atl.sa.earthlink.net>

Message-ID: Tue, 06 Sep 2016 20:23:05 +0200, Sylvain Corlay wrote: > I understand (especially about the scikits), although I was surprised by > some recently added features in scipy, such as differential evolution Also, it could be helpful to list these other added features that you think are problematic. -- Pauli Virtanen From sylvain.corlay at gmail.com Thu Sep 8 17:12:54 2016 From: sylvain.corlay at gmail.com (Sylvain Corlay) Date: Thu, 8 Sep 2016 23:12:54 +0200 Subject: [SciPy-Dev] cKDTree In-Reply-To: References: <10843293.1473022808690.JavaMail.wam@mswamui-backed.atl.sa.earthlink.net>

Message-ID: Hi Pauli, Ralf, On Thu, Sep 8, 2016 at 8:26 PM, Pauli Virtanen wrote: > > > Just imagine: I have a new uniformly filling sequence, but no proof that > > it is pseudo-random, and I don't even know the discrepancy of the > > sequence, > > but a bunch of examples for which "it works"... Well, I doubt that > > anyone would want to use it for Monte-Carlo simulation / cryptography > > etc... > > Are you presenting this as a fair analogy to DE and the literature > about it? > > > In any case, I find this question on the need of a scientific > > justification worthy to be answered in general - especially in the > > context of the discussions on scope and the 1.0 release. > > Yes, justification that an approach works and that is regarded as useful > is required. This is not really the place to do completely original > research. > > If I understand you correctly, you are saying that you would not > generally recommend DE as a global optimization method, because there are > superior choices? Or, are you saying it's essentially crock? That is not really it. I was mostly pointing to DE as an example which is sort of in a gray area, in order to ask the questions of scope, criteria for inclusion into scipy etc. When it was included into scipy, it caught my attention since I had worked on flavors of DE in the past (it is used as an alternative to Lloyd's algorithm for optimal quantization). There is indeed some literature about the applications in this area. So I did find it useful, and found that it does work well for the problems I used it for. (However the sort of generic implementation proposed today in scipy would not have been a good fit in this case.) My understanding of scipy's scope is that it is a collection of routines that are robust reference implementations of well established numerical methods that are of general interest regardless of the area of application. Linear algebra, interpolation, quadrature, random number generation, convex optimization, specialized optimizers for dense LP and QP etc... In each one of these areas, if you need something more specialized, you should probably use a specialized library or implement something ad-hoc for your use case. Evolutionary optimization algorithms don't seem to fall into this category for the reasons that we discussed earlier. It is mostly a set of heuristics. It is cool, inspired by nature, etc. (however, a number of citations is probably not a substitute for a mathematical proof...) The other methods that I listed for stochastic optimization would have been more natural candidates to fall into the "category" that I roughly described above, in that they are extremely well established and backed by theory. I imagine that the inclusion of DE into Scipy could have been questioned at the time, but that now that it is in there, it should probably not be removed without a good alternative. Finally, I am still curious about what can be considered a bug or a feature in the case of a method like this. On the subject of the use of a faster flavor of KdTree as I was proposing, I was only gauging interest. The long discussion on DE on this specific thread is mostly coincidental. My main goal was to use it as an example for the question of the scope - also to ask about the "big split" idea. If there were to be a split, a potential scipy-incubator organization with proposals for inclusion as scipy subprojects would make sense then... Sylvain -------------- next part -------------- An HTML attachment was scrubbed...
URL: From ralf.gommers at gmail.com Fri Sep 9 18:25:55 2016 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 10 Sep 2016 10:25:55 +1200 Subject: [SciPy-Dev] SciPy governance model In-Reply-To: References: <1473151205.1482512.716887281.63889567@webmail.messagingengine.com> Message-ID: On Wed, Sep 7, 2016 at 1:37 AM, Eric Larson wrote: > > My feeling is that having a clear leader in place is important, so I'm >> also leaning away from the numpy model towards one where >> responsibilities are more explicitly assigned. Exactly how to best make >> that assignment is still unclear to me. >> > > +1 for BD(FL) / leader-style from me, too. I like Matthew's suggestion of > the top 5 active folks discuss to see which of them are actually interested > in taking on that role. > Thanks for the feedback everyone. Looks like everyone likes this suggestion so far, so we'll give that a try. And Matthew, thanks for the offer to help with drafting some docs. I'll contact you off-list about that. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sat Sep 10 01:44:55 2016 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 10 Sep 2016 17:44:55 +1200 Subject: [SciPy-Dev] New simulated annealing implementation (PR 6197) Message-ID: On Fri, Sep 9, 2016 at 12:24 AM, Gubian, Sylvain wrote: > Dear All, > > Following the discussion regarding DE and simulating annealing, I have > made a PR for proposing a new implementation for simulating annealing (that > is a generalized one). > https://github.com/scipy/scipy/pull/6197 (already kindly commented by > Ralf and Andrew). > The homemade (parallelized) benchmarks are showing very good results > especially in terms of successful rate. > Since, Andrew has fixed the benchmark code for python3. In order to > consider this enhancement proposal, what do you guys recommend me to do. 
> The idea is to benchmark it with the new fixed benchmark (using the PR > branch for having this new optimizer). I presented a poster in EuroSciPy in > Erlangen describing the performance of this approach and had a nice talk > with Olivier Grisel about it. > Any thoughts to move this forward? > > Thanks a lot in advance. > Hi Sylvain, thanks for reminding us of your PR. Now that the issue with the benchmarks is fixed, I'd say you can indeed just add your gensa to the existing benchmarks. If those look good as well (I'd expect so, given the amount of work you put in your own benchmark), let's try to get your PR merged. If you have your poster somewhere public, it would be nice to link it from the PR. I'd be interested to have a look at it. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sat Sep 10 06:34:19 2016 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 10 Sep 2016 22:34:19 +1200 Subject: [SciPy-Dev] cKDTree In-Reply-To: References: <10843293.1473022808690.JavaMail.wam@mswamui-backed.atl.sa.earthlink.net>

Message-ID: On Fri, Sep 9, 2016 at 9:12 AM, Sylvain Corlay wrote: > Hi Pauli, Ralf, > > On Thu, Sep 8, 2016 at 8:26 PM, Pauli Virtanen wrote: >> >> > Just imagine: I have a new uniformly filling sequence, but no proof that >> > it is pseudo-random, and I don't even know the discrepancy of the >> > sequence, >> > but a bunch of examples for which "it works"... Well, I doubt that >> > anyone would want to use it for Monte-Carlo simulation / cryptography >> > etc... >> >> Are you presenting this as a fair analogy to DE and the literature >> about it? >> >> > In any case, I find this question on the need of a scientific >> > justification worthy to be answered in general - especially in the >> > context of the discussions on scope and the 1.0 release. >> >> Yes, justification that an approach works and that is regarded as useful >> is required. This is not really the place to do completely original >> research. >> >> If I understand you correctly, you are saying that you would not >> generally recommend DE as a global optimization method, because there are >> superior choices? Or, are you saying it's essentially crock? > > > That is not really it. I was mostly pointing DE as an example which is > sort of in a gray area, in order to ask the questions of scope, criterion > for inclusion into scipy etc. > > When it was included into scipy, it triggered my attention since I had > worked on flavors of DE in the past (It is used as an alternative to lloyd > algorithms for optimal quantization). There is indeed some literature about > the applications in this area. So I did find it useful, and found that it > does work well for the problems I used it for. (However the sort of generic > implementation proposed today in scipy would not have been a good fit in > this case.) > > My understanding of what scipy's scope is, is that it is a collection of > routines that are robust reference implementations > "reference implementation" has a bit of a negative implication for me. 
Like we have reference BLAS/LAPACK, which are references for correctness but you should really be using something better for pretty much any application. That is definitely not the intention for anything in Scipy, nor is it the case for most modules. If we have an algorithm or data structure, we want to strike a good balance between features, maintainability, usability and performance. > of well established numerical methods that are of general interest > regardless of the area of applications. Linear algebra, interpolation, > quadrature, random number generation, convex optimization, specialized > optimizers for dense LP and QP etc... In each ones of these areas, if you > need something more specialized, you should probably used a specialized > library or implement something ad-hoc to your use case. > For linear algebra there's really not that much more specialized that we want to not have in scope. Statistical distributions, special functions and distance metrics are other examples of where we go for comprehensive coverage. It can also depend on what other packages are out there, for example for hypothesis tests we don't accept much anymore because statsmodels is a better package for more comprehensive coverage. > > Evolutionary optimization algorithms don't seem to fall into this category > for the reasons that we discussed earlier. It is mostly a set of > heuristics. It is cool, inspired by nature, etc. (however, a number of > citations is probably not a substitute for a mathematical proof...) > We're just going to have to disagree about this one. You seem to attach an unusually high value to a "mathematical proof" (not that that's a black or white thing either), and a low value to things like realistic benchmarks and solving users' problems. It's not about number of citations either. 
The AMPGO (http://infinity77.net/global_optimization/ampgo.html, https://github.com/andyfaff/ampgo) paper has only 14 citations, I see, but we'd probably add it if someone submits a good-quality PR. > The other methods that I listed for stochastic optimization would have > been more natural candidates to fall into the "category" that I roughly > described above, in that they are extremely well established and backed by > theory. I imagine that the inclusion of DE into Scipy could have been > questioned at the time, > I still don't see any reason for having had to question that inclusion. > > but that now that it is in there, it should probably not be removed > without a good alternative. Finally, I am still curious about what can be > considered a bug or a feature in the case of a method like this. > I answered the feature part already, but I guess you didn't like the answer much :) > On the subject of the use of a faster flavor of KdTree as I was proposing, > I was only gauging interest. The long discussion on DE on this specific > thread is mostly coincidental. My main goal was to use it as an example for > the question of the scope > I hope this discussion helped a bit. But if you want a fixed definition of scope and a recipe to apply so that you can know without discussion if a new feature will be in scope for Scipy - that's not possible I'm afraid. There's always some judgment involved. > - also to ask about the "big split" idea. If there was to be a split, a > potential scipy-incubator organization with proposals for inclusion as > scipy subprojects would make sense then... > There are no plans for splitting up SciPy. It was considered several times, but it's really not the best way to spend our time. It's possible we would get more development on some modules, but that's not guaranteed, and the maintenance and release overhead goes up significantly. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed...
URL: From pav at iki.fi Sat Sep 10 08:34:02 2016 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 10 Sep 2016 12:34:02 +0000 (UTC) Subject: [SciPy-Dev] cKDTree References: <10843293.1473022808690.JavaMail.wam@mswamui-backed.atl.sa.earthlink.net>

Message-ID: Hi, Thu, 08 Sep 2016 23:12:54 +0200, Sylvain Corlay wrote: [clip] > On the subject of the use of a faster flavor of KdTree as I was > proposing, I was only gauging interest. The long discussion on DE on > this specific thread is mostly coincidental. My main goal was to use it > as an example for the question of the scope - also to ask about the > "big split" idea. If there was to be a split, a potential > scipy-incubator organization with proposals for inclusion as > scipy subprojects would make sense then... Now that we're still derailed: Several years back, there was a scipy.sandbox package which was nominally aligned with things intended for inclusion, but it was scrapped in the end: http://blog.jarrodmillman.com/2007/12/end-of-scipy-sandbox.html Some of these did end up as scipy subpackages, but many also were scrapped or turned into separate projects. This was of course before Git/Github --- nowadays I think the role is mainly served by PRs, and for more complicated stuff, separate ad-hoc repositories plus issue tickets. Of course, scipy.sandbox also had overlap with completely separate projects. With the "staging area", the main question is whether the amount of significant new features proposed is such that there should be a separate "staging area", aside from PRs etc. Or, would such a staging area encourage their creation? Or would it be useful for creating completely new projects? It's not really clear to me that this is the case. In a more decentralized "big split" approach, "Scipy" probably would mainly stand as some sort of umbrella organization for more or less separate projects. Does such a central authority need to exist, apart from the sense of maintaining the specific projects? Is it better in some sense than the completely decentralized scikit approach?
-- Pauli Virtanen From sylvain.corlay at gmail.com Sat Sep 10 10:31:24 2016 From: sylvain.corlay at gmail.com (Sylvain Corlay) Date: Sat, 10 Sep 2016 16:31:24 +0200 Subject: [SciPy-Dev] cKDTree In-Reply-To: References: <10843293.1473022808690.JavaMail.wam@mswamui-backed.atl.sa.earthlink.net>

Message-ID: Hey, On Sat, Sep 10, 2016 at 2:34 PM, Pauli Virtanen wrote: > Hi, > > Thu, 08 Sep 2016 23:12:54 +0200, Sylvain Corlay wrote: > [clip] > > On the subject of the use of a faster flavor of KdTree as I was > > proposing, I was only gauging interest. The long discussion on DE on > > this specific thread is mostly coincidental. My main goal was to use it > > as an example for the question of the scope - also to ask about the > > "big split" idea. If there was to be a split, a potential > > scipy-incubator organization with proposals for inclusion as > > scipy subprojects would make sense then... > > Now that we're still derailed: > > Several years back, there was a scipy.sandbox package which was nominally > aligned with things intended for inclusion, but it was scrapped in the > end: http://blog.jarrodmillman.com/2007/12/end-of-scipy-sandbox.html > Some of these did end up as scipy subpackages, but many were also > scrapped or turned into separate projects. > > This was of course before Git/GitHub; nowadays I think the role is > mainly served by PRs, and for more complicated stuff, separate ad-hoc > repositories plus issue tickets. Of course, scipy.sandbox also > overlapped with completely separate projects. > > With the "staging area", the main question is whether the volume of > significant new features being proposed is large enough to warrant a > separate "staging area", aside from PRs etc. Or, would such a staging area > encourage their creation? Or would it be useful for creating completely > new projects? It's not really clear to me that this is the case. > > In a more decentralized "big split" approach, "Scipy" probably would > mainly stand as some sort of umbrella organization for more or less > separate projects. 
Does such a central authority need to exist, apart from > the sense of maintaining the specific projects? Is it better in some > sense than the completely decentralized scikit approach? > The benefits of (and reasons for) a split would be the same as usual: managing growth through a separation of concerns and focus, with lines of division defined by both technological and scientific criteria. Keeping things under a single umbrella would show the intention of keeping the different components consistent in certain respects. Experiments that imply significant changes, like a new type of language bindings, would have fewer implications for packaging since they would remain separate. Cheers, Sylvain From milesdowe at gmail.com Sat Sep 10 16:35:06 2016 From: milesdowe at gmail.com (Miles Dowe) Date: Sat, 10 Sep 2016 20:35:06 +0000 Subject: [SciPy-Dev] scipy.io.wavfile to read byte array directly? Message-ID: Hi all, I am interested in adding a function to the scipy.io.wavfile module. Rather than requiring that read() only be performed on a file, I'd like to add a read() function where a byte array of WAV data can be provided directly. Here's some background on the motivation. I am a student at the University of Washington and I have been working with a former student's machine learning algorithm. The aim of the algorithm is to detect human laughter and it utilizes SciPy and NumPy. We're aiming to create a service-oriented architecture maintained in AWS and our audio data is stored within S3. I've been experimenting with the Boto3 library, which returns a byte array, and I'd like to provide that data directly to the machine learning script (instead of writing to the disk and reading from it). I'd like to hear your thoughts, and I may experiment with this idea while waiting to hear whether the community approves.
Thank you for your time, Miles -- Miles From joe at neoturbine.net Sat Sep 10 16:53:22 2016 From: joe at neoturbine.net (Joseph Booker) Date: Sat, 10 Sep 2016 16:53:22 -0400 Subject: [SciPy-Dev] scipy.io.wavfile to read byte array directly? In-Reply-To: References: Message-ID: Miles, Are you aware of io.BytesIO? It wraps a byte string in a file-like object that you can pass to read(). I don't know the performance implications of using a wrapper, but I'd expect loading the data to take negligible time compared to training your ML model. -- Joseph On Sep 10, 2016 4:35 PM, "Miles Dowe" wrote: > Hi all, > > I was interested in the creation of a function for the scipy.io.wavfile > utility. Rather than requiring that read() only be performed on a file, I'd > like to add a read() function where a byte array of WAV data can be > provided directly. > > Here's some background behind this motivation. I am a student with the > University of Washington and I have been working with a former student's > machine learning algorithm. The aim of the algorithm is to detect human > laughter and it utilizes SciPy and NumPy. > > We're aiming to create a service-oriented architecture maintained in AWS > and our audio data is stored within S3. I've been experimenting with the > Boto3 library, which returns a byte array, and I'd like to provide that > data directly to the machine learning script (instead of writing to the > disk and reading from it). > > I'd like to hear your thoughts and might experiment with this idea until > approval is expressed by the community. > > Thank you for your time, > > > Miles > -- > Miles > -------------- next part -------------- An HTML attachment was scrubbed... 
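A minimal sketch of the io.BytesIO route suggested in this thread, assuming the WAV payload is already held in memory as a bytes object (for example after an S3 download; that step is not shown here, and all variable names are illustrative). scipy.io.wavfile.read accepts an open file-like object as well as a filename, so wrapping the bytes should avoid the round trip through disk:

```python
import io

import numpy as np
from scipy.io import wavfile

# Build a small WAV payload in memory so the sketch is self-contained.
# In the scenario from the thread, these bytes would instead come from
# the storage client (assumption: it hands back a plain bytes object).
rate = 8000  # samples per second
t = np.arange(rate) / rate  # one second of sample times
samples = (0.5 * 32767 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)

buf = io.BytesIO()
wavfile.write(buf, rate, samples)  # write() also accepts a file-like object
wav_bytes = buf.getvalue()

# The actual technique: wrap the bytes in a file-like object and read it.
read_rate, data = wavfile.read(io.BytesIO(wav_bytes))

print(read_rate, data.dtype, data.shape)
```

The wrapper only removes the disk I/O; the parsing work done by read() is unchanged, so whether this matters for a given pipeline would need to be measured.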
URL: From ralf.gommers at gmail.com Sat Sep 10 17:46:45 2016 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 11 Sep 2016 09:46:45 +1200 Subject: [SciPy-Dev] scipy.io.wavfile to read byte array directly? In-Reply-To: