From josef.pktd at gmail.com Sun Apr 1 11:36:09 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 1 Apr 2012 11:36:09 -0400 Subject: [SciPy-Dev] cephes_smirnov never returns on mips/sparc/... In-Reply-To: References: <20120330183500.GG22956@onerussian.com> <20120330203758.GH22956@onerussian.com> <20120331004538.GI22956@onerussian.com> <20120331015056.GL22956@onerussian.com> <20120331024518.GN22956@onerussian.com> <20120331151516.GO22956@onerussian.com> Message-ID: On Sat, Mar 31, 2012 at 12:02 PM, wrote: > On Sat, Mar 31, 2012 at 11:15 AM, Yaroslav Halchenko > wrote: >> Probably you are right Josef -- especially since I am only distantly familiar >> with the KS test -- but let's keep the dialog open a bit longer ;) : >> >>> But what's the point in fitting ksone? >> >> for me it was just that it has .fit() ;) You might recall (I believe I >> appeared on the list long ago with similar whining and that is how we got >> introduced to each other) our evil/silly function in PyMVPA >> match_distributions which simply tries to choose the best matching distribution >> given the data -- that is how ksone got involved > > I remember, and if I remember correctly, then I recommended using a > blacklist of distributions to avoid. > > The last time I looked at the source of pymvpa, you used all > distributions in the fit and then reported the best fitting ones. At > the bottom of this ranking there should be some distributions that > will (almost) never be a good match because fit doesn't work for them. > The only time you see how bad they are is in extreme cases like going > off to neverland. > >> >>> > if starting values are the most sensible -- then yeap -- them ;) >>> > if I ask to 'fit' something, getting some fit is better than getting no >>> > fit (as NaNs in output suggest) >> >>> getting the starting values back doesn't mean that you have "some" fit.
>> >>> If my brief playing with it today is correct, then the starting values >>> don't make sense, for example you have points outside of the support >>> of the distribution with estimated parameters (if you have negative >>> values in the sample) >> >>> NaN would be better, then at least you know it doesn't make sense. >> >> 1. to me the big question became: what ARE the logical values here? > > if you look at my second message above, you see some examples, where > fit returns numbers. > I didn't check how good they are. > >> >> followed the docstring/example on >> http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ksone.html >> -- got NaNs >> >> then given that >> >> In [44]: ksone.a, ksone.b >> Out[44]: (0.0, inf) >> >> I still failed to get any sensible fit() for positive values or even for >> its own creation, e.g. >> >> ss.ksone.fit(ss.ksone(5).rvs(size=100)) > >>>> rv = stats.ksone(50).rvs(size=1000) >>>> plt.hist(rv, bins=30, normed=True, cumulative=True) >>>> x = np.linspace(0, rv.max(), 1000); plt.plot(x, stats.ksone(50).cdf(x)) >>>> plt.show() > >>>> stats.ksone.fit(rv, 100, loc=-0.01, scale=1) > > (181.94347728444751, -3.8554246919087482e-05, 1.9277121337713585) >>>> stats.ksone.fit(rv, 10, loc=-0.01, scale=1) > (13.999896396912176, -0.010783712808254388, 0.57818285700694405) > >> >> results in a bulk of warnings and then (1.0, nan, nan). >> >> Looking in detail -- rvs is happily generating NaNs (especially for small n's). >> >> b. Also the range of sensible values of the parameter n isn't specified >> anywhere for KS test newbies like me, which I guess adds to the confusion: >> >>> support of the sample would help. I have no idea about good starting >>> values for the shape parameter (n is sample size for kstest) >> >> aha -- so the 'demo' value of 0.9 indeed makes no sense ;) Might be >> worth adjusting somehow? >> >> 2.
>> >> BTW -- trying to familiarize myself with the distribution plotted its >> pdf, e.g.: >> >> x = np.linspace(0, 3, 1000); plt.plot(x, ksone(10).pdf(x)) >> >> and it looks weirdish: http://www.onerussian.com/tmp/ksone-ns.png in that it is >> not smooth and my algebra-forgotten eyes do not see obvious points with >> no 2nd derivative of cdf given on >> http://en.wikipedia.org/wiki/Kolmogorov_Smirnov > > IIRC (no time to check again right now): > ksone is, I think, a small-sample distribution, > kstwobign is the distribution of the max/sup of a Brownian Bridge, > which is the asymptotic distribution for Kolmogorov-Smirnov I needed to check the source: the C source says smirnov is the distribution for the one-sided test, and I had forgotten that I had added the one-sided option to kstest. The algorithm is also used by R: "The formula of Birnbaum & Tingey (1951) is used for the one-sample one-sided case." http://www.jstor.org/stable/2236929 ks_2samp is still missing the one-sided options Josef Anderson-Darling is most of the time more powerful than KS https://github.com/aarchiba/kuiper no license > > as a distribution we are mainly interested in cdf and ppf (both look > reasonably good in a plot), and mainly in the right tail > ksone looks like a piecewise approximation, where they didn't care > much about the lower part. > > (I'm a bit rushed right now so there might be parts missing in my reply) > > Josef > >> >> Also why ksone.b is inf -- shouldn't it be 1? >> >> -- >> =------------------------------------------------------------------= >> Keep in touch                                     www.onerussian.com >> Yaroslav Halchenko
www.ohloh.net/accounts/yarikoptic >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev From lists at onerussian.com Sun Apr 1 18:37:00 2012 From: lists at onerussian.com (Yaroslav Halchenko) Date: Sun, 1 Apr 2012 18:37:00 -0400 Subject: [SciPy-Dev] cephes_smirnov never returns on mips/sparc/... In-Reply-To: References: <20120331004538.GI22956@onerussian.com> <20120331015056.GL22956@onerussian.com> <20120331024518.GN22956@onerussian.com> <20120331151516.GO22956@onerussian.com> Message-ID: <20120401223700.GQ22956@onerussian.com> Dear Anne, Josef has referred me to your kuiper repository, which does not list any copyright/license for the code you made available. Would you mind clarifying the situation and maybe releasing the code under the same license as scipy -- BSD 3-clause?, e.g.: Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: a. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. b. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. c. Neither the name of the Enthought nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. Thanks in advance for the reply On Sun, 01 Apr 2012, josef.pktd at gmail.com wrote: > Josef > Anderson-Darling is most of the time more powerful than KS > https://github.com/aarchiba/kuiper no license -- =------------------------------------------------------------------= Keep in touch www.onerussian.com Yaroslav Halchenko www.ohloh.net/accounts/yarikoptic From warren.weckesser at enthought.com Wed Apr 4 17:30:35 2012 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Wed, 4 Apr 2012 16:30:35 -0500 Subject: [SciPy-Dev] SciPy 2012 - The Eleventh Annual Conference on Scientific Computing with Python Message-ID: SciPy 2012, the eleventh annual Conference on Scientific Computing with Python, will be held July 16-21, 2012, in Austin, Texas. At this conference, novel scientific applications and libraries related to data acquisition, analysis, dissemination and visualization using Python are presented. Attended by leading figures from both academia and industry, it is an excellent opportunity to experience the cutting edge of scientific software development. The conference is preceded by two days of tutorials, during which community experts provide training on several scientific Python packages. Following the main conference will be two days of coding sprints. We invite you to give a talk or present a poster at SciPy 2012.
The list of topics that are appropriate for the conference includes (but is not limited to): - new Python libraries for science and engineering; - applications of Python in solving scientific or computational problems; - high performance, parallel and GPU computing with Python; - use of Python in science education. Specialized Tracks Two specialized tracks run in parallel to the main conference: - High Performance Computing with Python Whether your algorithm is distributed, threaded, memory intensive or latency bound, Python is making headway into the problem. We are looking for performance driven designs and applications in Python. Candidates include the use of Python within a parallel application, new architectures, and ways of making traditional applications execute more efficiently. - Visualization They say a picture is worth a thousand words--we're interested in both! Python provides numerous visualization tools that allow scientists to show off their work, and we want to know about any new tools and techniques out there. Come show off your latest graphics, whether it's an old library with a slick new feature, a new library out to challenge the status quo, or simply a beautiful result. Domain-specific Mini-symposia Mini-symposia on the following topics are also being organized: - Computational bioinformatics - Meteorology and climatology - Astronomy and astrophysics - Geophysics Talks, papers and posters We invite you to take part by submitting a talk or poster abstract. Instructions are on the conference website: http://conference.scipy.org/scipy2012/talks.php Selected talks are included as papers in the peer-reviewed conference proceedings, to be published online. Tutorials Tutorials will be given July 16-17. We invite instructors to submit proposals for half-day tutorials on topics relevant to scientific computing with Python. See http://conference.scipy.org/scipy2012/tutorials.php for information about submitting a tutorial proposal.
To encourage tutorials of the highest quality, the instructor (or team of instructors) is given a $1,000 stipend for each half-day tutorial. Student/Community Scholarships We anticipate providing funding for students and for active members of the SciPy community who otherwise might not be able to attend the conference. See http://conference.scipy.org/scipy2012/student.php for scholarship application guidelines. Be a Sponsor The SciPy conference could not run without the generous support of the institutions and corporations who share our enthusiasm for Python as a tool for science. Please consider sponsoring SciPy 2012. For more information, see http://conference.scipy.org/scipy2012/sponsor/index.php Important dates: Monday, April 30: Talk abstracts and tutorial proposals due. Monday, May 7: Accepted tutorials announced. Monday, May 13: Accepted talks announced. Monday, June 18: Early registration ends. (Price increases after this date.) Sunday, July 8: Online registration ends. Monday-Tuesday, July 16 - 17: Tutorials Wednesday-Thursday, July 18 - July 19: Conference Friday-Saturday, July 20 - July 21: Sprints We look forward to seeing you all in Austin this year! The SciPy 2012 Team http://conference.scipy.org/scipy2012/organizers.php -------------- next part -------------- An HTML attachment was scrubbed... URL: From dturgut at gmail.com Fri Apr 6 02:10:54 2012 From: dturgut at gmail.com (Deniz Turgut) Date: Fri, 6 Apr 2012 02:10:54 -0400 Subject: [SciPy-Dev] stats.ttest_ind doc Message-ID: docs [1] say that ttest_ind: "Calculates the T-test for the means of TWO INDEPENDENT samples of scores. This is a two-sided test for the null hypothesis that 2 independent samples have identical average (expected) values." This function performs a t-test for two independent samples with *identical variances*. There is also a form of the t-test for independent samples with different variances, also known as Welch's t-test [2].
I think it is better to include the 'identical variance' assumption in the doc to avoid confusion. PS: I could do the edit. I registered but it looks like I need edit rights. And apparently this is the place to ask for it. My username: avaris [1] http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html [2] http://en.wikipedia.org/wiki/Welch's_t_test From gael.varoquaux at normalesup.org Fri Apr 6 05:29:21 2012 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Fri, 6 Apr 2012 11:29:21 +0200 Subject: [SciPy-Dev] stats.ttest_ind doc In-Reply-To: References: Message-ID: <20120406092921.GA1766@phare.normalesup.org> On Fri, Apr 06, 2012 at 02:10:54AM -0400, Deniz Turgut wrote: > PS: I could do the edit. I registered but it looks like I need edit > rights. And apparently this is the place to ask for it. My username: > avaris I have given you edit rights. Gael From lists at hilboll.de Wed Apr 11 12:53:25 2012 From: lists at hilboll.de (Andreas H.) Date: Wed, 11 Apr 2012 18:53:25 +0200 Subject: [SciPy-Dev] wrapped FITPACK's `sphere.f` Message-ID: <268004a6823eb3d9418d1256cb045411.squirrel@srv2.s4y.tournesol-consulting.eu> Hi, I just wrapped FITPACK's `sphere.f` for scipy.interpolate, to make interpolation/smoothing in spherical coordinates possible. Here's the pull request: https://github.com/scipy/scipy/pull/192 Sorry for not discussing this here earlier, I just now found out about the "Contributing to SciPy" guide. Cheers, Andreas. From eugeneai at irnok.net Thu Apr 12 03:18:31 2012 From: eugeneai at irnok.net (Evgeny Cherkashin) Date: Thu, 12 Apr 2012 16:18:31 +0900 Subject: [SciPy-Dev] Could anyone compile a win32 Py-2.7.2 module for SciPy-0.11dev as is? Message-ID: Hi! I need some of the latest signal processing features developed in SciPy-0.11dev to merge into my project for Windows users, but I am not able to compile it myself for Win32 and Python-2.7.2. Can anyone help me and send me a compiled module.
It is not necessary for the module to be a stable one. Thank you and best regards, Evgeny -------------- next part -------------- An HTML attachment was scrubbed... URL: From denis-bz-py at t-online.de Thu Apr 12 13:04:47 2012 From: denis-bz-py at t-online.de (denis) Date: Thu, 12 Apr 2012 17:04:47 +0000 (UTC) Subject: [SciPy-Dev] testbench for some non-derivative optimizers in NLopt and scipy.optimize Message-ID: Folks, fwiw, https://github.com/denis-bz/opt has testbenches for some of the non-derivative optimizers in NLopt and scipy.optimize. From opt-py.Readme -- These few test functions are *very* noisy, sensitive to random seed: fmin 0.4 neval 369 LN_BOBYQA fmin 3 neval 580 LN_BOBYQA fmin 0.06 neval 423 LN_BOBYQA fmin 12 neval 385 LN_BOBYQA fmin 2 neval 325 LN_BOBYQA ... ==> we need test functions. Comments, any birds of a feather ? cheers -- denis From ralf.gommers at googlemail.com Sat Apr 14 04:56:54 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 14 Apr 2012 10:56:54 +0200 Subject: [SciPy-Dev] Contributing to SciPy guide Message-ID: Hi all, It has been pointed out by a number of people that it's not so easy to get started with contributing to SciPy, and better documentation may help here. So I've written a guide for this. It would be great to get some feedback especially from people who've found it difficult to find this information before. And if you haven't contributed before but were thinking about doing so, perhaps this is a good opportunity to get started! Pull Request: https://github.com/scipy/scipy/pull/191 Rendered guide: https://github.com/rgommers/scipy/blob/howto-contribute/doc/HOWTO_CONTRIBUTE.rst.txt Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim at cerazone.net Sat Apr 14 07:59:35 2012 From: tim at cerazone.net (Tim Cera) Date: Sat, 14 Apr 2012 07:59:35 -0400 Subject: [SciPy-Dev] Contributing to SciPy guide In-Reply-To: References: Message-ID: This is a fantastic document.
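[denis's point about noise sensitivity is easy to reproduce with a toy function. The sketch below is my own construction, not code from the `opt` repository: a quadratic plus seed-dependent noise, minimized with scipy's Nelder-Mead `fmin`, printing the `fmin`/`neval` pair in the style of his Readme.]

```python
import numpy as np
from scipy import optimize

def make_noisy(seed, sigma=0.1):
    """A noisy quadratic: sum(x**2) plus gaussian noise drawn per evaluation."""
    rng = np.random.RandomState(seed)
    def f(x):
        return np.sum(np.asarray(x) ** 2) + sigma * rng.standard_normal()
    return f

# The minimum found, and the number of evaluations spent, vary with the seed:
for seed in (0, 1, 2):
    xopt, fopt, niter, neval, flag = optimize.fmin(
        make_noisy(seed), x0=[2.0, 2.0], disp=False, full_output=True)
    print("seed %d: fmin %.3g  neval %d" % (seed, fopt, neval))
```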
Thanks! The only thing that occurred to me that was missing was a section on licensing issues. Kindest regards, Tim On Apr 14, 2012 4:57 AM, "Ralf Gommers" wrote: > Hi all, > > It has been pointed out by a number of people that it's not so easy to get > started with contributing to SciPy, and better documentation may help here. > So I've written a guide for this. It would be great to get some feedback > especially from people who've found it difficult to find this information > before. And if you haven't contributed before but were thinking about doing > so, perhaps this is a good opportunity to get started! > > Pull Request: https://github.com/scipy/scipy/pull/191 > Rendered guide: > https://github.com/rgommers/scipy/blob/howto-contribute/doc/HOWTO_CONTRIBUTE.rst.txt > > Cheers, > Ralf > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sun Apr 15 04:41:11 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 15 Apr 2012 10:41:11 +0200 Subject: [SciPy-Dev] Contributing to SciPy guide In-Reply-To: References: Message-ID: On Sat, Apr 14, 2012 at 1:59 PM, Tim Cera wrote: > This is a fantastic document. Thanks! The only thing that occurred to me > that was missing was a section on licensing issues. > Thanks Tim, that's a very good point. I added some info on licensing issues and a link to http://www.scipy.org/License_Compatibility Ralf > On Apr 14, 2012 4:57 AM, "Ralf Gommers" > wrote: > >> Hi all, >> >> It has been pointed out by a number of people that it's not so easy to >> get started with contributing to SciPy, and better documentation may help >> here. So I've written a guide for this. It would be great to get some >> feedback especially from people who've found it difficult to find this >> information before. 
And if you haven't contributed before but were thinking >> about doing so, perhaps this is a good opportunity to get started! >> >> Pull Request: https://github.com/scipy/scipy/pull/191 >> Rendered guide: >> https://github.com/rgommers/scipy/blob/howto-contribute/doc/HOWTO_CONTRIBUTE.rst.txt >> >> Cheers, >> Ralf >> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sun Apr 15 07:09:36 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 15 Apr 2012 13:09:36 +0200 Subject: [SciPy-Dev] Trac unlock? Message-ID: Hi, SciPy Trac now seems permanently locked. Can someone who knows how please kick it so it works again after a few tries (or even on the first try)? Thanks, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Sun Apr 15 07:16:18 2012 From: pav at iki.fi (Pauli Virtanen) Date: Sun, 15 Apr 2012 13:16:18 +0200 Subject: [SciPy-Dev] Trac unlock? In-Reply-To: References: Message-ID: Hi, 15.04.2012 13:09, Ralf Gommers kirjoitti: > SciPy Trac now seems permanently locked. Can someone who knows how > please kick it so it works again after a few tries (or even on the first > try)? Seems to be working again. Pauli From ralf.gommers at googlemail.com Sun Apr 15 08:02:14 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 15 Apr 2012 14:02:14 +0200 Subject: [SciPy-Dev] Trac unlock? In-Reply-To: References: Message-ID: On Sun, Apr 15, 2012 at 1:16 PM, Pauli Virtanen wrote: > Hi, > > 15.04.2012 13:09, Ralf Gommers kirjoitti: > > SciPy Trac now seems permanently locked. 
Can someone who knows how > > please kick it so it works again after a few tries (or even on the first > > try)? > > Seems to be working again. > > Thanks. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexandre.gramfort at inria.fr Sun Apr 15 08:28:32 2012 From: alexandre.gramfort at inria.fr (Alexandre Gramfort) Date: Sun, 15 Apr 2012 14:28:32 +0200 Subject: [SciPy-Dev] Contributing to SciPy guide In-Reply-To: References: Message-ID: Dear Ralf, thanks for this helpful document. What I think is missing is info and link to pages that explain how to get started with a dev version of scipy. Many users have a released version and have never compiled it although they could be contributors. FAQ could be, how to work with a dev version of scipy while keeping the last release to switch between both? You can find such info on the web but it might be worth centralizing them. If it already exists, please let me know and forget this message. my 2c, Alex On Sat, Apr 14, 2012 at 10:56 AM, Ralf Gommers wrote: > Hi all, > > It has been pointed out by a number of people that it's not so easy to get > started with contributing to SciPy, and better documentation may help here. > So I've written a guide for this. It would be great to get some feedback > especially from people who've found it difficult to find this information > before. And if you haven't contributed before but were thinking about doing > so, perhaps this is a good opportunity to get started! > > Pull Request: https://github.com/scipy/scipy/pull/191 > Rendered guide: > https://github.com/rgommers/scipy/blob/howto-contribute/doc/HOWTO_CONTRIBUTE.rst.txt > > Cheers, > Ralf > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From lists at hilboll.de Sun Apr 15 08:34:21 2012 From: lists at hilboll.de (Andreas H.) 
Date: Sun, 15 Apr 2012 14:34:21 +0200 Subject: [SciPy-Dev] Contributing to SciPy guide In-Reply-To: References: Message-ID: <4F8AC04D.30608@hilboll.de> > thanks for this helpful document. What I think is missing is info and > link to pages that explain how to get started with a dev version of > scipy. Many users have a released version and have never compiled it > although they could be contributors. FAQ could be, how to work with a > dev version of scipy while keeping the last release to switch between > both? You can find such info on the web but it might be worth > centralizing them. If it already exists, please let me know and forget > this message. Good point, I think. I'm using a virtualenv for development, isolated from my system's site-packages. If you're interested, I can write a short paragraph about my setup and include it in the DOC. Cheers, Andreas. From alexandre.gramfort at inria.fr Sun Apr 15 08:35:58 2012 From: alexandre.gramfort at inria.fr (Alexandre Gramfort) Date: Sun, 15 Apr 2012 14:35:58 +0200 Subject: [SciPy-Dev] Contributing to SciPy guide In-Reply-To: <4F8AC04D.30608@hilboll.de> References: <4F8AC04D.30608@hilboll.de> Message-ID: > Good point, I think. I'm using a virtualenv for development, isolated > from my system's site-packages. If you're interested, I can write a > short paragraph about my setup and include it in the DOC. that would be really useful. Alex From eigenspaces at gmail.com Sun Apr 15 16:13:10 2012 From: eigenspaces at gmail.com (Patrick "Kai" Baker) Date: Sun, 15 Apr 2012 16:13:10 -0400 Subject: [SciPy-Dev] My Intro Message-ID: I have never worked on an open source project before so I'm not exactly sure what I'm doing. But I'm real excited about doing this. I have an advanced degree in mathematics and an emphasis on mathematical physics. I love numerical analysis and coding so I'm real excited about joining the SciPy team.
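[Andreas's isolated-virtualenv setup for a development scipy alongside the released one might look roughly like the sketch below. The paths are hypothetical and not from the thread; `python3 -m venv` is the modern stand-in for the `virtualenv` tool of the time, and the network-dependent clone/install step is shown commented out.]

```shell
# Create an environment isolated from the system site-packages
# and activate it (hypothetical directory name scipy-dev):
python3 -m venv scipy-dev
. scipy-dev/bin/activate

# Inside the env, python now resolves to scipy-dev/bin/python:
python -c 'import sys; print(sys.prefix)'

# A development scipy would then be installed into this env only,
# leaving the released system scipy untouched (needs network):
# git clone https://github.com/scipy/scipy.git
# cd scipy && pip install -e .

deactivate   # switch back to the released scipy
```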
At some point, once I get a handle on things, I would like to contribute to the documentation as well. I guess I will hang out, figure out what's going on, read the archives, and check out the source code. Real excited about this! Regards, Kai -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sun Apr 15 17:10:51 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 15 Apr 2012 23:10:51 +0200 Subject: [SciPy-Dev] Contributing to SciPy guide In-Reply-To: References: <4F8AC04D.30608@hilboll.de> Message-ID: On Sun, Apr 15, 2012 at 2:35 PM, Alexandre Gramfort < alexandre.gramfort at inria.fr> wrote: > > Good point, I think. I'm using a virtualenv for development, isolated > > from my system's site-packages. If you're interested, I can write a > > short paragraph about my setup and include it in the DOC. > > that would be really useful. > Sounds good. If you could describe the virtualenv setup Andreas, and I'll add a description of using an in-place build, plus when to use which, then that should cover the basics. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From b.telenczuk at biologie.hu-berlin.de Mon Apr 16 10:43:45 2012 From: b.telenczuk at biologie.hu-berlin.de (Bartosz Telenczuk) Date: Mon, 16 Apr 2012 16:43:45 +0200 Subject: [SciPy-Dev] ANN: SpikeSort 0.12 Message-ID: <3550033E-D8A5-45BA-991A-BE25D3CA72EC@biologie.hu-berlin.de> We are pleased to announce the first official release of SpikeSort, a new spike sorting library based on a dynamic and interactive language (Python). SpikeSort 0.12 is available for download at http://spikesort.org SpikeSort aims to be both flexible and user-friendly. It achieves that by defining a set of components that can be mix-and-matched to fit specific needs. These components are based on standard scientific libraries, so they can be easily re-used in custom applications.
Main features: * user-friendly and customisable, * interactive command-line interface in Python, * visualization widgets, * k-means and gaussian-mixture-models (GMM) clustering algorithms and manual cluster cutting, * support for multi-channel data (for example, from tetrodes), * support for binary datasets and HDF5 files (support for other formats planned in a future version). SpikeSort includes >60 tests, making it one of the best-tested spike sorting packages on the market. The distribution also contains full documentation, sample scripts and sample data, so that you can start playing with it almost immediately. SpikeSort is offered free of charge under a liberal Open Source license (two-clause BSD license) allowing for non-commercial and commercial use. The project was partially supported by the Deutsche Forschungsgemeinschaft (SFB 618 ``Theoretische Biologie'', project B4). The sample data is available by courtesy of Stuart Baker and the Wellcome Trust, who provided funding for data collection. Yours, Bartosz Telenczuk Dmytro Bielievtsov From ralf.gommers at googlemail.com Mon Apr 16 13:43:08 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 16 Apr 2012 19:43:08 +0200 Subject: [SciPy-Dev] My Intro In-Reply-To: References: Message-ID: On Sun, Apr 15, 2012 at 10:13 PM, Patrick "Kai" Baker wrote: > I have never worked on an open source project before so I'm not exactly > sure what I'm doing. But I'm real excited about doing this. I have an > advanced degree in mathematics and an emphasis on mathematical physics. I > love numerical analysis and coding so I'm real excited about joining the > SciPy team. At some point, once I get a handle on things, I would like to > contribute to the documentation as well. I guess I will hang out, figure > out what's going on, read the archives, and check out the source code. Real > excited about this! > Hi Kai, welcome! If you have any specific question or run into some issues getting started, let us know.
Looking forward to your contributions. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Mon Apr 16 15:49:42 2012 From: cournape at gmail.com (David Cournapeau) Date: Mon, 16 Apr 2012 20:49:42 +0100 Subject: [SciPy-Dev] My Intro In-Reply-To: References: Message-ID: Hi Kai, On Sun, Apr 15, 2012 at 9:13 PM, Patrick "Kai" Baker wrote: > I have never worked on an open source project before so I'm not exactly > sure what I'm doing. But I'm real excited about doing this. I have an > advanced degree in mathematics and an emphasis on mathematical physics. I > love numerical analysis and coding so I'm real excited about joining the > SciPy team. At some point, once I get a handle on things, I would like to > contribute to the documentation as well. I guess I will hang out, figure > out what's going on, read the archives, and check out the source code. Real > excited about this! > > Regards, > Kai > Welcome to the scipy community ! As a newcomer into the community, you are actually in the best position to tell us what's missing for people like you who want to start contributing (FAQ, documents, etc.) cheers, David -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Tue Apr 17 17:28:59 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 17 Apr 2012 23:28:59 +0200 Subject: [SciPy-Dev] testbench for some non-derivative optimizers in NLopt and scipy.optimize In-Reply-To: References: Message-ID: On Thu, Apr 12, 2012 at 7:04 PM, denis wrote: > Folks, > fwiw, https://github.com/denis-bz/opt has testbenches > for some of the non-derivative optimizers in NLopt and scipy.optimize. > From opt-py.Readme -- > > These few test functions are *very* noisy, sensitive to random seed: > fmin 0.4 neval 369 LN_BOBYQA > fmin 3 neval 580 LN_BOBYQA > fmin 0.06 neval 423 LN_BOBYQA > fmin 12 neval 385 LN_BOBYQA > fmin 2 neval 325 LN_BOBYQA > ...
> > ==> we need test functions. > > > Comments, any birds of a feather ? > This seems to be telling me something interesting, but not sure exactly what: https://github.com/denis-bz/opt/blob/master/nlopt/test/Powellsincos-dim10.png It looks like there are a lot more tests for NLopt than for scipy.optimize. Are you planning to keep developing this test bench? Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Tue Apr 17 17:39:10 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 17 Apr 2012 23:39:10 +0200 Subject: [SciPy-Dev] Contributing to SciPy guide In-Reply-To: References: <4F8AC04D.30608@hilboll.de> Message-ID: On Sun, Apr 15, 2012 at 11:10 PM, Ralf Gommers wrote: > > > On Sun, Apr 15, 2012 at 2:35 PM, Alexandre Gramfort < > alexandre.gramfort at inria.fr> wrote: > >> > Good point, I think. I'm using a virtualenv for development, isolated >> > from my system's site-packages. If you're interested, I can write a >> > short paragraph about my setup and include it in the DOC. >> >> that would be really useful. >> > > Sounds good. If you could describe the virtualenv setup Andreas, and I'll > add a description of using an in-place build, plus when to use which, then > that should cover the basics. > This should also be described in http://docs.scipy.org/doc/numpy/dev/gitwash/, but unfortunately it doesn't. The instructions on sending patches are also really outdated, they still recommend attaching patches to a Trac ticket. In case you've read that doc, *please don't do that*. Send a pull request on Github instead. @Matthew: should this also be fixed in https://github.com/matthew-brett/gitwash? Ralf -------------- next part -------------- An HTML attachment was scrubbed...
URL: From eigenspaces at gmail.com Wed Apr 18 15:04:36 2012 From: eigenspaces at gmail.com (Patrick "Kai" Baker) Date: Wed, 18 Apr 2012 15:04:36 -0400 Subject: [SciPy-Dev] My Intro In-Reply-To: References: Message-ID: Thanks you two. Well, I got scipy installed and got git set up. Now I'm real confused about what I can do. Documentation is one of them, which I will want to do once I become more familiar with the software. But as far as coding goes, I'm not sure what needs to be done. Is there anyplace I can go with a to-do list of things which needs to be done? I looked around and haven't found anything like that so far. On Mon, Apr 16, 2012 at 3:49 PM, David Cournapeau wrote: > Hi Kai, > > On Sun, Apr 15, 2012 at 9:13 PM, Patrick "Kai" Baker < > eigenspaces at gmail.com> wrote: > >> I have never worked on an open source project before so I'm not exactly >> sure what I'm doing. But I'm real excited about doing this. I have an >> advanced degree in mathematics and an emphasis on mathematical physics. I >> love numerical analysis and coding so I'm real excited about joining the >> SciPy team. At some point, once I get a handle on things, I will like to >> contribute to the documentation as well. I guess I will hang out, figure >> out what's going on, read the archives, and check out the source code. Real >> excited about this! >> >> Regards, >> Kai >> > > Welcome to the scipy community ! > > As a newcommer into the community, you are actually in the best position > to tell us what's missing for people like you who want to start > contributing (FAQ, documents, etc?) > > cheers, > > David > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From denis at laxalde.org Wed Apr 18 16:21:30 2012 From: denis at laxalde.org (Denis Laxalde) Date: Wed, 18 Apr 2012 16:21:30 -0400 Subject: [SciPy-Dev] PR #195: optimize.root - interface to root finding algorithms Message-ID: <20120418162130.653aaca0@laxalde.org> Hi, In pull request #195, I am proposing the addition of a unified interface `optimize.root` for root finding algorithms for multivariate functions, similar to `optimize.minimize` and `optimize.minimize_scalar` (already merged). Available solvers are: - hybr: MINPACK's hybrid Powell algorithm (fsolve) - lm: MINPACK's Levenberg-Marquardt algorithm (leastsq) - and solvers from optimize.nonlin (broyden1, krylov, etc.) See . Comments welcome. Thanks, Denis. From denis at laxalde.org Wed Apr 18 16:26:04 2012 From: denis at laxalde.org (Denis Laxalde) Date: Wed, 18 Apr 2012 16:26:04 -0400 Subject: [SciPy-Dev] PR #196: simplification of optimization wrappers Message-ID: <20120418162604.227beb44@laxalde.org> Hi, In pull request #196, I am proposing to simplify the signature of optimization wrappers (minimize, minimize_scalar, etc.) to get something like: x, info = minimize(fun, x0, [jac, constraints], options) This boils down to eliminating the full_output and retall parameters from respective function. Besides, the info dictionnary would always be returned. Comments welcome - https://github.com/scipy/scipy/pull/196 Thanks, -- Denis From pav at iki.fi Wed Apr 18 16:33:09 2012 From: pav at iki.fi (Pauli Virtanen) Date: Wed, 18 Apr 2012 22:33:09 +0200 Subject: [SciPy-Dev] My Intro In-Reply-To: References: Message-ID: Hi, 18.04.2012 21:04, Patrick "Kai" Baker kirjoitti: > Thanks you two. Well, I got scipy installed and got git set up. Now I'm > real confused about what I can do. Documentation is one of them, which I > will want to do once I become more familiar with the software. But as > far as coding goes, I'm not sure what needs to be done. 
Is there > anyplace I can go with a to-do list of things which needs to be done? I > looked around and haven't found anything like that so far. A to-do list is not written down anywhere (but that could be a part of the module status summary for the maintainers to write). We however do have a long list of bugs and issues to fix: http://projects.scipy.org/scipy/query?status=apply&status=needs_decision&status=needs_info&status=needs_review&status=needs_work&status=new&status=reopened&group=component&max=100000&order=priority Some of these are easier to do, others are difficult, and some require "deeper" knowledge of how the things work. There are also some feature requests mixed in. Pauli From pav at iki.fi Wed Apr 18 16:38:19 2012 From: pav at iki.fi (Pauli Virtanen) Date: Wed, 18 Apr 2012 22:38:19 +0200 Subject: [SciPy-Dev] PR #196: simplification of optimization wrappers In-Reply-To: <20120418162604.227beb44@laxalde.org> References: <20120418162604.227beb44@laxalde.org> Message-ID: Hi, 18.04.2012 22:26, Denis Laxalde kirjoitti: > In pull request #196, I am proposing to simplify the signature of > optimization wrappers (minimize, minimize_scalar, etc.) to get > something like: > > x, info = minimize(fun, x0, [jac, constraints], options) How about going even further, and not even returning `x`. Rather, stuff it inside `info`: sol = minimize(fun, x0, [jac, constraints], options) x = sol.x Just change the solution object to a dict subclass with attribute accessors, and you're done. And maybe even add `def __array__(self): return self.x`, so you can do `asarray(sol)`? Pauli From ralf.gommers at googlemail.com Wed Apr 18 16:59:49 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Wed, 18 Apr 2012 22:59:49 +0200 Subject: [SciPy-Dev] My Intro In-Reply-To: References: Message-ID: On Wed, Apr 18, 2012 at 10:33 PM, Pauli Virtanen wrote: > Hi, > > 18.04.2012 21:04, Patrick "Kai" Baker kirjoitti: > > Thanks you two. Well, I got scipy installed and got git set up. 
Now I'm > > real confused about what I can do. Documentation is one of them, which I > > will want to do once I become more familiar with the software. But as > > far as coding goes, I'm not sure what needs to be done. Is there > > anyplace I can go with a to-do list of things which needs to be done? I > > looked around and haven't found anything like that so far. > > A to-do list is not written down anywhere (but that could be a part of > the module status summary for the maintainers to write). > > We however do have a long list of bugs and issues to fix: > > > http://projects.scipy.org/scipy/query?status=apply&status=needs_decision&status=needs_info&status=needs_review&status=needs_work&status=new&status=reopened&group=component&max=100000&order=priority > > Some of these are easier to do, others are difficult, and some require > "deeper" knowledge of how the things work. There are also some feature > requests mixed in. Browsing those tickets may indeed give you a good impression of what things there are to do. If you have preferences/interests, such as a particular topic/module (optimization/interpolation/stats/etc) or language (Python/Cython/C/C++/Fortran), we can give you more specific suggestions. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From eigenspaces at gmail.com Wed Apr 18 17:11:26 2012 From: eigenspaces at gmail.com (Patrick "Kai" Baker) Date: Wed, 18 Apr 2012 17:11:26 -0400 Subject: [SciPy-Dev] My Intro In-Reply-To: References: Message-ID: That's excellent. Debugging sounds like a good start. I know C/C++ the most since they're my first languages. But I also know Python and would like to gain more working knowledge of it. As far as mathematics, I'm most familiar with analysis: calculus, differential equations, special functions, etc. 
Kai On Wed, Apr 18, 2012 at 4:59 PM, Ralf Gommers wrote: > > > On Wed, Apr 18, 2012 at 10:33 PM, Pauli Virtanen wrote: > >> Hi, >> >> 18.04.2012 21:04, Patrick "Kai" Baker kirjoitti: >> > Thanks you two. Well, I got scipy installed and got git set up. Now I'm >> > real confused about what I can do. Documentation is one of them, which I >> > will want to do once I become more familiar with the software. But as >> > far as coding goes, I'm not sure what needs to be done. Is there >> > anyplace I can go with a to-do list of things which needs to be done? I >> > looked around and haven't found anything like that so far. >> >> A to-do list is not written down anywhere (but that could be a part of >> the module status summary for the maintainers to write). >> >> We however do have a long list of bugs and issues to fix: >> >> >> http://projects.scipy.org/scipy/query?status=apply&status=needs_decision&status=needs_info&status=needs_review&status=needs_work&status=new&status=reopened&group=component&max=100000&order=priority >> >> Some of these are easier to do, others are difficult, and some require >> "deeper" knowledge of how the things work. There are also some feature >> requests mixed in. > > > Browsing those tickets may indeed give you a good impression of what > things there are to do. If you have preferences/interests, such as a > particular topic/module (optimization/interpolation/stats/etc) or language > (Python/Cython/C/C++/Fortran), we can give you more specific suggestions. > > Ralf > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gustavo.goretkin at gmail.com Wed Apr 18 20:23:55 2012 From: gustavo.goretkin at gmail.com (Gustavo Goretkin) Date: Wed, 18 Apr 2012 20:23:55 -0400 Subject: [SciPy-Dev] PR #196: simplification of optimization wrappers In-Reply-To: References: <20120418162604.227beb44@laxalde.org> Message-ID: If I recall correctly, a similar approach is used in CVXOPT. On Apr 18, 2012 4:38 PM, "Pauli Virtanen" wrote: > Hi, > > 18.04.2012 22:26, Denis Laxalde kirjoitti: > > In pull request #196, I am proposing to simplify the signature of > > optimization wrappers (minimize, minimize_scalar, etc.) to get > > something like: > > > > x, info = minimize(fun, x0, [jac, constraints], options) > > How about going even further, and not even returning `x`. Rather, stuff > it inside `info`: > > sol = minimize(fun, x0, [jac, constraints], options) > x = sol.x > > Just change the solution object to a dict subclass with attribute > accessors, and you're done. > > And maybe even add `def __array__(self): return self.x`, so you can do > `asarray(sol)`? > > Pauli > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Apr 18 20:47:49 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 18 Apr 2012 20:47:49 -0400 Subject: [SciPy-Dev] PR #196: simplification of optimization wrappers In-Reply-To: References: <20120418162604.227beb44@laxalde.org> Message-ID: On Wed, Apr 18, 2012 at 8:23 PM, Gustavo Goretkin wrote: > If I recall correctly, a similar approach is used in CVXOPT. > > On Apr 18, 2012 4:38 PM, "Pauli Virtanen" wrote: >> >> Hi, >> >> 18.04.2012 22:26, Denis Laxalde kirjoitti: >> > In pull request #196, I am proposing to simplify the signature of >> > optimization wrappers (minimize, minimize_scalar, etc.) to get >> > something like: >> > >> > ? 
x, info = minimize(fun, x0, [jac, constraints], options) >> >> How about going even further, and not even returning `x`. Rather, stuff >> it inside `info`: >> >> ? ?sol = minimize(fun, x0, [jac, constraints], options) >> ? ?x = sol.x >> >> Just change the solution object to a dict subclass with attribute >> accessors, and you're done. >> >> And maybe even add `def __array__(self): return self.x`, so you can do >> `asarray(sol)`? What's the overhead of retall, especially fmin with a few thousand iterations? The rest of info looks all calculated as a byproduct, so there shouldn't be much extra cost, is there? I don't think I ever used or looked at retall. Josef >> >> ? ? ? ?Pauli >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From denis at laxalde.org Wed Apr 18 21:27:39 2012 From: denis at laxalde.org (Denis Laxalde) Date: Wed, 18 Apr 2012 21:27:39 -0400 Subject: [SciPy-Dev] PR #196: simplification of optimization wrappers In-Reply-To: References: <20120418162604.227beb44@laxalde.org> Message-ID: <20120418212739.2be61924@laxalde.org> josef.pktd at gmail.com wrote: > What's the overhead of retall, especially fmin with a few thousand > iterations? The retall parameter is replaced by the field 'return_all' in the options dictionary which, if True, will lead to an 'allvecs' field in info. So there's no extra overhead as it remains optional. > The rest of info looks all calculated as a byproduct, so > there shouldn't be much extra cost, is there? No, I don't think so. 
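[Editorial aside: the dict-subclass-with-attribute-access idea Pauli floats above can be sketched in a few lines. The class name `Result` is hypothetical (it is not necessarily what SciPy eventually shipped); the point is only that a plain `dict` subclass gives both `sol['x']` and `sol.x` for free.]

```python
class Result(dict):
    """Sketch of a dict subclass with attribute accessors.

    The name ``Result`` is hypothetical -- this is not SciPy code,
    just an illustration of the pattern discussed in the thread.
    """

    def __getattr__(self, name):
        # __getattr__ is called only when normal attribute lookup fails,
        # so real dict methods like .keys() keep working
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__


sol = Result(x=[1.25, -0.5], success=True, nfev=42)
print(sol.x)        # attribute access
print(sol["nfev"])  # ordinary dict access still works
```

Pauli's further suggestion, `def __array__(self): return self.x`, would additionally let `numpy.asarray(sol)` return the solution vector; it is left out here to keep the sketch dependency-free.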
-- Denis From denis at laxalde.org Wed Apr 18 21:34:37 2012 From: denis at laxalde.org (Denis Laxalde) Date: Wed, 18 Apr 2012 21:34:37 -0400 Subject: [SciPy-Dev] PR #196: simplification of optimization wrappers In-Reply-To: References: <20120418162604.227beb44@laxalde.org> Message-ID: <20120418213437.66e7e216@laxalde.org> Pauli Virtanen wrote: > How about going even further, and not even returning `x`. Rather, > stuff it inside `info`: > > sol = minimize(fun, x0, [jac, constraints], options) > x = sol.x Ok, if no one objects, I'm fine with this as well. Actually, x is already in info['solution']. -- Denis From josef.pktd at gmail.com Wed Apr 18 22:09:53 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 18 Apr 2012 22:09:53 -0400 Subject: [SciPy-Dev] PR #196: simplification of optimization wrappers In-Reply-To: <20120418212739.2be61924@laxalde.org> References: <20120418162604.227beb44@laxalde.org> <20120418212739.2be61924@laxalde.org> Message-ID: On Wed, Apr 18, 2012 at 9:27 PM, Denis Laxalde wrote: > josef.pktd at gmail.com wrote: >> What's the overhead of retall, especially fmin with a few thousand >> iterations? > > The retall parameter is replaced by the field 'return_all' in the > options dictionary which, if True, will lead to an 'allvecs' field in > info. So there's no extra overhead as it remains optional. good, I don't see any problem then. > >> The rest of info looks all calculated as a byproduct, so >> there shouldn't be much extra cost, is there? > > No, I don't think so. Another question: I just saw that the options use a mutable keyword, dict. Are we running into problems? It might be safer to set it to None instead of an empty dict, given that an empty dict doesn't make a more informative signature either. (I saw a reminder on this on planet python today. 
http://reinout.vanrees.org/weblog/2012/04/18/default-parameters.html ) Josef > > -- > Denis > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev From lists at hilboll.de Thu Apr 19 05:52:30 2012 From: lists at hilboll.de (Andreas H.) Date: Thu, 19 Apr 2012 11:52:30 +0200 Subject: [SciPy-Dev] Contributing to SciPy guide In-Reply-To: References: <4F8AC04D.30608@hilboll.de> Message-ID: <4F8FE05E.9090407@hilboll.de> Am Di 17 Apr 2012 23:39:10 CEST schrieb Ralf Gommers: > > > On Sun, Apr 15, 2012 at 11:10 PM, Ralf Gommers > > wrote: > > > > On Sun, Apr 15, 2012 at 2:35 PM, Alexandre Gramfort > > > wrote: > > > Good point, I think. I'm using a virtualenv for development, > isolated > > from my system's site-packages. If you're interested, I can > write a > > short paragraph about my setup and include it in the DOC. > > that would be really useful. > > > Sounds good. If you could describe the virtualenv setup Andreas, > and I'll add a description of using an in-place build, plus when > to use which, then that should cover the basics. > > > This should also be described in > http://docs.scipy.org/doc/numpy/dev/gitwash/, but unfortunately it > doesn't. I wrote up a short tutorial about using virtualenv and placed it here: https://gist.github.com/2419961 Should I incorporate it into the gitwash docs and send a PR to numpy? Andreas. From gyromagnetic at gmail.com Thu Apr 19 08:55:02 2012 From: gyromagnetic at gmail.com (Gyro Funch) Date: Thu, 19 Apr 2012 06:55:02 -0600 Subject: [SciPy-Dev] PR #196: simplification of optimization wrappers In-Reply-To: <20120418213437.66e7e216@laxalde.org> References: <20120418162604.227beb44@laxalde.org> <20120418213437.66e7e216@laxalde.org> Message-ID: On 2012-04-18 7:34 PM, Denis Laxalde wrote: > Pauli Virtanen wrote: >> How about going even further, and not even returning `x`. 
Rather, >> stuff it inside `info`: >> >> sol = minimize(fun, x0, [jac, constraints], options) >> x = sol.x > > Ok, if no one objects, I'm fine with this as well. Actually, x is > already in info['solution']. > Hi, I am a user of various SciPy optimization functions. Since I am interested primarily in 'x', I'm not sure of the benefit of burying this information in a dictionary. Perhaps this can be clarified. Thanks. -gyro From denis at laxalde.org Thu Apr 19 10:28:09 2012 From: denis at laxalde.org (Denis Laxalde) Date: Thu, 19 Apr 2012 10:28:09 -0400 Subject: [SciPy-Dev] PR #196: simplification of optimization wrappers In-Reply-To: References: <20120418162604.227beb44@laxalde.org> <20120418212739.2be61924@laxalde.org> Message-ID: <20120419102809.211398b1@laxalde.org> josef.pktd at gmail.com wrote: > I just saw that the options use a mutable keyword, dict. > Are we running into problems? It might be safer to set it to None > instead of an empty dict, given that an empty dict doesn't make a more > informative signature either. I'm not sure to see the problem but, AFAICT, setting its default value to None would not prevent an existing options dictionary (or any mutable object) that would be passed as an argument to be modified. It seems there might a problem iff the default value is not an empty dictionary. At least, {} as a default value ensures that dictionary methods always work. -- Denis From njs at pobox.com Thu Apr 19 10:31:53 2012 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 19 Apr 2012 15:31:53 +0100 Subject: [SciPy-Dev] PR #196: simplification of optimization wrappers In-Reply-To: <20120419102809.211398b1@laxalde.org> References: <20120418162604.227beb44@laxalde.org> <20120418212739.2be61924@laxalde.org> <20120419102809.211398b1@laxalde.org> Message-ID: On Thu, Apr 19, 2012 at 3:28 PM, Denis Laxalde wrote: > josef.pktd at gmail.com wrote: >> I just saw that the options use a mutable keyword, dict. >> Are we running into problems? 
It might be safer to set it to None >> instead of an empty dict, given that an empty dict doesn't make a more >> informative signature either. > > I'm not sure to see the problem but, AFAICT, setting its default value > to None would not prevent an existing options dictionary (or any > mutable object) that would be passed as an argument to be modified. It > seems there might a problem iff the default value is not an empty > dictionary. > At least, {} as a default value ensures that dictionary methods > always work. {} as a default value is fine iff you are careful to treat the passed-in dictionary as read-only. E.g. this is bad def dosomething(options={}): options.setdefault("quickly", True) if options["quickly"]: ... -- Nathaniel From denis at laxalde.org Thu Apr 19 12:30:16 2012 From: denis at laxalde.org (Denis Laxalde) Date: Thu, 19 Apr 2012 12:30:16 -0400 Subject: [SciPy-Dev] PR #196: simplification of optimization wrappers In-Reply-To: References: <20120418162604.227beb44@laxalde.org> <20120418213437.66e7e216@laxalde.org> Message-ID: <20120419123016.0496d18f@laxalde.org> Gyro Funch wrote: > I am a user of various SciPy optimization functions. Since I am > interested primarily in 'x', I'm not sure of the benefit of burying > this information in a dictionary. Perhaps this can be clarified. 'x' is not that ??buried??, you can easily access it as sol.x. So, if you only want 'x', do: x = minimize(fun, x0, [jac, constraints], options).x (Note that there's no 'x' attribute defined. It is currently named 'solution'.) 
-- Denis From denis at laxalde.org Thu Apr 19 15:08:34 2012 From: denis at laxalde.org (Denis Laxalde) Date: Thu, 19 Apr 2012 15:08:34 -0400 Subject: [SciPy-Dev] PR #196: simplification of optimization wrappers In-Reply-To: References: <20120418162604.227beb44@laxalde.org> <20120418212739.2be61924@laxalde.org> <20120419102809.211398b1@laxalde.org> Message-ID: <20120419150834.17933167@laxalde.org> Nathaniel Smith wrote: > >> I just saw that the options use a mutable keyword, dict. > >> Are we running into problems? It might be safer to set it to None > >> instead of an empty dict, given that an empty dict doesn't make a more > >> informative signature either. > > > > I'm not sure to see the problem but, AFAICT, setting its default value > > to None would not prevent an existing options dictionary (or any > > mutable object) that would be passed as an argument to be modified. It > > seems there might a problem iff the default value is not an empty > > dictionary. > > At least, {} as a default value ensures that dictionary methods > > always work. > > {} as a default value is fine iff you are careful to treat the > passed-in dictionary as read-only. E.g. this is bad > > def dosomething(options={}): > options.setdefault("quickly", True) > if options["quickly"]: ... AFAICT, that dictionary is not modified. Yet I am now convinced that it is safer to use None. -- Denis From ralf.gommers at googlemail.com Fri Apr 20 15:34:36 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Fri, 20 Apr 2012 21:34:36 +0200 Subject: [SciPy-Dev] My Intro In-Reply-To: References: Message-ID: On Wed, Apr 18, 2012 at 11:11 PM, Patrick "Kai" Baker wrote: > That's excellent. Debugging sounds like a good start. Great. > I know C/C++ the most since they're my first languages. But I also know > Python and would like to gain more working knowledge of it. As far as > mathematics, I'm most familiar with analysis: calculus, differential > equations, special functions, etc. 
> Special functions (scipy.special) are mostly written in C, its tests are in Python of course. Pauli can perhaps give you more pointers for special, but here's already a ticket with an issue that needs investigating (it came up again today on scipy-user): http://projects.scipy.org/scipy/ticket/1453 scipy.integrate may also fit with your analysis background. Possible ticket to start: http://projects.scipy.org/scipy/ticket/1572 Here's one that needs investigation in interpolate: http://projects.scipy.org/scipy/ticket/1642 The above modules have more issues that require digging into compiled code that issues with pure Python code I'd say. For pure Python code, stats may be a good module: http://projects.scipy.org/scipy/ticket/1183 http://projects.scipy.org/scipy/ticket/1493 http://projects.scipy.org/scipy/ticket/913 The above are all bugs. There are also many tickets with enhancement suggestions, looking at some of those could be interesting too. Cheers, Ralf > > Kai > > On Wed, Apr 18, 2012 at 4:59 PM, Ralf Gommers > wrote: > >> >> >> On Wed, Apr 18, 2012 at 10:33 PM, Pauli Virtanen wrote: >> >>> Hi, >>> >>> 18.04.2012 21:04, Patrick "Kai" Baker kirjoitti: >>> > Thanks you two. Well, I got scipy installed and got git set up. Now I'm >>> > real confused about what I can do. Documentation is one of them, which >>> I >>> > will want to do once I become more familiar with the software. But as >>> > far as coding goes, I'm not sure what needs to be done. Is there >>> > anyplace I can go with a to-do list of things which needs to be done? I >>> > looked around and haven't found anything like that so far. >>> >>> A to-do list is not written down anywhere (but that could be a part of >>> the module status summary for the maintainers to write). 
>>> >>> We however do have a long list of bugs and issues to fix: >>> >>> >>> http://projects.scipy.org/scipy/query?status=apply&status=needs_decision&status=needs_info&status=needs_review&status=needs_work&status=new&status=reopened&group=component&max=100000&order=priority >>> >>> Some of these are easier to do, others are difficult, and some require >>> "deeper" knowledge of how the things work. There are also some feature >>> requests mixed in. >> >> >> Browsing those tickets may indeed give you a good impression of what >> things there are to do. If you have preferences/interests, such as a >> particular topic/module (optimization/interpolation/stats/etc) or language >> (Python/Cython/C/C++/Fortran), we can give you more specific suggestions. >> >> Ralf >> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From eigenspaces at gmail.com Fri Apr 20 23:29:37 2012 From: eigenspaces at gmail.com (Patrick "Kai" Baker) Date: Fri, 20 Apr 2012 23:29:37 -0400 Subject: [SciPy-Dev] My Intro In-Reply-To: References: Message-ID: Wow, a lot of great stuff I can't wait to delve in to next week! (I work this weekend - yeah, it sucks!) Once I get the hang of things I'll look in to the other areas of development. I'm not only interested in gaining experience in what I already know but also expand upon it. Kai On Fri, Apr 20, 2012 at 3:34 PM, Ralf Gommers wrote: > > > On Wed, Apr 18, 2012 at 11:11 PM, Patrick "Kai" Baker < > eigenspaces at gmail.com> wrote: > >> That's excellent. Debugging sounds like a good start. > > > Great. > > >> I know C/C++ the most since they're my first languages. 
But I also know >> Python and would like to gain more working knowledge of it. As far as >> mathematics, I'm most familiar with analysis: calculus, differential >> equations, special functions, etc. >> > > Special functions (scipy.special) are mostly written in C, its tests are > in Python of course. Pauli can perhaps give you more pointers for special, > but here's already a ticket with an issue that needs investigating (it came > up again today on scipy-user): http://projects.scipy.org/scipy/ticket/1453 > > scipy.integrate may also fit with your analysis background. Possible > ticket to start: http://projects.scipy.org/scipy/ticket/1572 > > Here's one that needs investigation in interpolate: > http://projects.scipy.org/scipy/ticket/1642 > > The above modules have more issues that require digging into compiled code > that issues with pure Python code I'd say. For pure Python code, stats may > be a good module: > http://projects.scipy.org/scipy/ticket/1183 > http://projects.scipy.org/scipy/ticket/1493 > http://projects.scipy.org/scipy/ticket/913 > > The above are all bugs. There are also many tickets with enhancement > suggestions, looking at some of those could be interesting too. > > Cheers, > Ralf > > > >> >> Kai >> >> On Wed, Apr 18, 2012 at 4:59 PM, Ralf Gommers < >> ralf.gommers at googlemail.com> wrote: >> >>> >>> >>> On Wed, Apr 18, 2012 at 10:33 PM, Pauli Virtanen wrote: >>> >>>> Hi, >>>> >>>> 18.04.2012 21:04, Patrick "Kai" Baker kirjoitti: >>>> > Thanks you two. Well, I got scipy installed and got git set up. Now >>>> I'm >>>> > real confused about what I can do. Documentation is one of them, >>>> which I >>>> > will want to do once I become more familiar with the software. But as >>>> > far as coding goes, I'm not sure what needs to be done. Is there >>>> > anyplace I can go with a to-do list of things which needs to be done? >>>> I >>>> > looked around and haven't found anything like that so far. 
>>>> >>>> A to-do list is not written down anywhere (but that could be a part of >>>> the module status summary for the maintainers to write). >>>> >>>> We however do have a long list of bugs and issues to fix: >>>> >>>> >>>> http://projects.scipy.org/scipy/query?status=apply&status=needs_decision&status=needs_info&status=needs_review&status=needs_work&status=new&status=reopened&group=component&max=100000&order=priority >>>> >>>> Some of these are easier to do, others are difficult, and some require >>>> "deeper" knowledge of how the things work. There are also some feature >>>> requests mixed in. >>> >>> >>> Browsing those tickets may indeed give you a good impression of what >>> things there are to do. If you have preferences/interests, such as a >>> particular topic/module (optimization/interpolation/stats/etc) or language >>> (Python/Cython/C/C++/Fortran), we can give you more specific suggestions. >>> >>> Ralf >>> >>> >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-dev >>> >>> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sat Apr 21 04:52:30 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 21 Apr 2012 10:52:30 +0200 Subject: [SciPy-Dev] Contributing to SciPy guide In-Reply-To: <4F8FE05E.9090407@hilboll.de> References: <4F8AC04D.30608@hilboll.de> <4F8FE05E.9090407@hilboll.de> Message-ID: On Thu, Apr 19, 2012 at 11:52 AM, Andreas H. 
wrote: > Am Di 17 Apr 2012 23:39:10 CEST schrieb Ralf Gommers: > > > > > > On Sun, Apr 15, 2012 at 11:10 PM, Ralf Gommers > > > > wrote: > > > > > > > > On Sun, Apr 15, 2012 at 2:35 PM, Alexandre Gramfort > > > > > wrote: > > > > > Good point, I think. I'm using a virtualenv for development, > > isolated > > > from my system's site-packages. If you're interested, I can > > write a > > > short paragraph about my setup and include it in the DOC. > > > > that would be really useful. > > > > > > Sounds good. If you could describe the virtualenv setup Andreas, > > and I'll add a description of using an in-place build, plus when > > to use which, then that should cover the basics. > > > > > > This should also be described in > > http://docs.scipy.org/doc/numpy/dev/gitwash/, but unfortunately it > > doesn't. > > I wrote up a short tutorial about using virtualenv and placed it here: > > https://gist.github.com/2419961 > > Should I incorporate it into the gitwash docs and send a PR to numpy? That sounds good. I can link to it from the FAQ of the contributors guide then. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From vanforeest at gmail.com Sat Apr 21 15:56:18 2012 From: vanforeest at gmail.com (nicky van foreest) Date: Sat, 21 Apr 2012 21:56:18 +0200 Subject: [SciPy-Dev] scipy.stats Message-ID: Hi, While reading the code and the examples of scipy.stats I came across the attribute rv.dist. Specifically, line 227 of the scipy.stats source on the gibhub server mentions: >>> x = np.linspace(0, np.minimum(rv.dist.b, 3)) The problem is that the meaning of this attribute is nowhere explained. Is this a bug? 
Nicky From josef.pktd at gmail.com Sat Apr 21 16:17:28 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 21 Apr 2012 16:17:28 -0400 Subject: [SciPy-Dev] scipy.stats In-Reply-To: References: Message-ID: On Sat, Apr 21, 2012 at 3:56 PM, nicky van foreest wrote: > Hi, > > While reading the code and the examples of scipy.stats I came across > the attribute rv.dist. Specifically, line 227 of the scipy.stats > source on the gibhub server mentions: > >>>> x = np.linspace(0, np.minimum(rv.dist.b, 3)) > > The problem is that the meaning of this attribute is nowhere > explained. Is this a bug? Do you have the github link to the source? It's not clear which module you are talking about. (klick on line number and you get the link to that specific line) It would safe me some time searching for this. Josef > > Nicky > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev From vanforeest at gmail.com Sat Apr 21 16:22:32 2012 From: vanforeest at gmail.com (nicky van foreest) Date: Sat, 21 Apr 2012 22:22:32 +0200 Subject: [SciPy-Dev] scipy.stats In-Reply-To: References: Message-ID: Hopefully this is the correct link: https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L227 btw, I am compiling a list of points about the scipy.stats code. Once finished what should I do with it? Would it be best to send it to this list? Should I send the points one by one, or better as one file? I am very enthousiastic about the fact that the code is now easily accessible via the web. On 21 April 2012 22:17, wrote: > On Sat, Apr 21, 2012 at 3:56 PM, nicky van foreest wrote: >> Hi, >> >> While reading the code and the examples of scipy.stats I came across >> the attribute rv.dist. 
Specifically, line 227 of the scipy.stats >> source on the gibhub server mentions: >> >>>>> x = np.linspace(0, np.minimum(rv.dist.b, 3)) >> >> The problem is that the meaning of this attribute is nowhere >> explained. Is this a bug? > > Do you have the github link to the source? It's not clear which module > you are talking about. > (klick on line number and you get the link to that specific line) > It would safe me some time searching for this. > > Josef > > >> >> Nicky >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev From josef.pktd at gmail.com Sat Apr 21 16:39:45 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 21 Apr 2012 16:39:45 -0400 Subject: [SciPy-Dev] scipy.stats In-Reply-To: References: Message-ID: On Sat, Apr 21, 2012 at 4:22 PM, nicky van foreest wrote: > Hopefully this is the correct link: > > https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L227 much easier answer a link https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L453 dist is an attribute of the frozen class. the frozen class delegates to the real class and has that attached as `dist` so this is the same as the upper bound .b attribute of the class. > > btw, I am compiling a list of points about the scipy.stats code. Once > finished what should I do with it? Would it be best to send it to this > list? Should I send the points one by one, or better as one file? > > I am very enthousiastic about the fact that the code is now easily > accessible via the web. pull request on github? depends on what the "points" are. If you find bugs, individual tickets would be useful. If you find several/many smaller things, then one would make it easier to go over all of them. 
Josef > > > On 21 April 2012 22:17, ? wrote: >> On Sat, Apr 21, 2012 at 3:56 PM, nicky van foreest wrote: >>> Hi, >>> >>> While reading the code and the examples of scipy.stats I came across >>> the attribute rv.dist. Specifically, line 227 of the scipy.stats >>> source on the gibhub server mentions: >>> >>>>>> x = np.linspace(0, np.minimum(rv.dist.b, 3)) >>> >>> The problem is that the meaning of this attribute is nowhere >>> explained. Is this a bug? >> >> Do you have the github link to the source? It's not clear which module >> you are talking about. >> (klick on line number and you get the link to that specific line) >> It would safe me some time searching for this. >> >> Josef >> >> >>> >>> Nicky >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-dev >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev From vanforeest at gmail.com Sat Apr 21 16:56:21 2012 From: vanforeest at gmail.com (nicky van foreest) Date: Sat, 21 Apr 2012 22:56:21 +0200 Subject: [SciPy-Dev] scipy.stats In-Reply-To: References: Message-ID: On 21 April 2012 22:39, wrote: > On Sat, Apr 21, 2012 at 4:22 PM, nicky van foreest wrote: >> Hopefully this is the correct link: >> >> https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L227 > > much easier > > answer a link https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L453 > > dist is an attribute of the frozen class. the frozen class delegates > to the real class and has that attached as `dist` > > so this is the same as the upper bound .b attribute of the class. Sure. I have seen this in the code. 
The point I wanted to make is that for a user who doesn't (want to) read the source the text in the example is somewhat confusing. Shouldn't these doc strings be targeted at plain users (with `plain' I have no derogatory intentions)? > > >> >> btw, I am compiling a list of points about the scipy.stats code. Once >> finished what should I do with it? Would it be best to send it to this >> list? Should I send the points one by one, or better as one file? >> >> I am very enthousiastic about the fact that the code is now easily >> accessible via the web. > > pull request on github? > > depends on what the "points" are. > If you find bugs, individual tickets would be useful. If you find > several/many smaller things, then one would make it easier to go over > all of them. > > Josef > >> >> >> On 21 April 2012 22:17, ? wrote: >>> On Sat, Apr 21, 2012 at 3:56 PM, nicky van foreest wrote: >>>> Hi, >>>> >>>> While reading the code and the examples of scipy.stats I came across >>>> the attribute rv.dist. Specifically, line 227 of the scipy.stats >>>> source on the gibhub server mentions: >>>> >>>>>>> x = np.linspace(0, np.minimum(rv.dist.b, 3)) >>>> >>>> The problem is that the meaning of this attribute is nowhere >>>> explained. Is this a bug? >>> >>> Do you have the github link to the source? It's not clear which module >>> you are talking about. >>> (klick on line number and you get the link to that specific line) >>> It would safe me some time searching for this. 
>>> >>> Josef >>> >>> >>>> >>>> Nicky >>>> _______________________________________________ >>>> SciPy-Dev mailing list >>>> SciPy-Dev at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-dev >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-dev >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev From vanforeest at gmail.com Sat Apr 21 17:06:06 2012 From: vanforeest at gmail.com (nicky van foreest) Date: Sat, 21 Apr 2012 23:06:06 +0200 Subject: [SciPy-Dev] scipy.stats In-Reply-To: References: Message-ID: >> btw, I am compiling a list of points about the scipy.stats code. Once >> finished what should I do with it? Would it be best to send it to this >> list? Should I send the points one by one, or better as one file? >> >> I am very enthousiastic about the fact that the code is now easily >> accessible via the web. > > pull request on github? I have seen this request coming by some time ago. I am happy to see that it has been implemented. > > depends on what the "points" are. > If you find bugs, individual tickets would be useful. If you find > several/many smaller things, then one would make it easier to go over > all of them. I don't always know whether these points are bugs, or due to my misunderstanding of the code or the example text. Hence, it is not always clear to me whether these points are bugs. 
Here are two examples that confuse me: In [11]: from scipy.stats import uniform In [12]: U = uniform(loc = 3, scale = 5) In [13]: U.mean() Out[13]: 5.5 In [14]: U.moment(1) Out[14]: 0.5 In [15]: U.moment(8) Out[15]: array(0.11111111111111112) First point: why in line 14 is U.moment(1) not equal to U.mean()? I checked the code on line https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L358 to see why, and this explains the result. However, from the doc-string on line https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L129 I would expect to see that U.moment(1) = U.mean(). Second point: From the code I understand that U.moment(1) returns a float, and that the U.moment(8) returns an array. From a user's perspective I find this inconsistent, however. So, are these points real bugs? From josef.pktd at gmail.com Sat Apr 21 17:12:05 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 21 Apr 2012 17:12:05 -0400 Subject: [SciPy-Dev] scipy.stats In-Reply-To: References: Message-ID: On Sat, Apr 21, 2012 at 4:56 PM, nicky van foreest wrote: > On 21 April 2012 22:39, ? wrote: >> On Sat, Apr 21, 2012 at 4:22 PM, nicky van foreest wrote: >>> Hopefully this is the correct link: >>> >>> https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L227 >> >> much easier >> >> answer a link https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L453 >> >> dist is an attribute of the frozen class. the frozen class delegates >> to the real class and has that attached as `dist` >> >> so this is the same as the upper bound .b attribute of the class. > > Sure. I have seen this in the code. The point I wanted to make is that > for a user who doesn't (want to) read the source the text in the > example is somewhat confusing. Shouldn't these doc strings be targeted > at plain users (with `plain' I have no derogatory intentions)? As a user I would think it's just an upper bound for the plot. 
But I'm not a good measure for users new to the distributions. It would be possible to inject a number into the docstring, by changing the template https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L1159 Josef > >> >> >>> >>> btw, I am compiling a list of points about the scipy.stats code. Once >>> finished what should I do with it? Would it be best to send it to this >>> list? Should I send the points one by one, or better as one file? >>> >>> I am very enthousiastic about the fact that the code is now easily >>> accessible via the web. >> >> pull request on github? >> >> depends on what the "points" are. >> If you find bugs, individual tickets would be useful. If you find >> several/many smaller things, then one would make it easier to go over >> all of them. >> >> Josef >> >>> >>> >>> On 21 April 2012 22:17, ? wrote: >>>> On Sat, Apr 21, 2012 at 3:56 PM, nicky van foreest wrote: >>>>> Hi, >>>>> >>>>> While reading the code and the examples of scipy.stats I came across >>>>> the attribute rv.dist. Specifically, line 227 of the scipy.stats >>>>> source on the gibhub server mentions: >>>>> >>>>>>>> x = np.linspace(0, np.minimum(rv.dist.b, 3)) >>>>> >>>>> The problem is that the meaning of this attribute is nowhere >>>>> explained. Is this a bug? >>>> >>>> Do you have the github link to the source? It's not clear which module >>>> you are talking about. >>>> (klick on line number and you get the link to that specific line) >>>> It would safe me some time searching for this. 
>>>> >>>> Josef >>>> >>>> >>>>> >>>>> Nicky >>>>> _______________________________________________ >>>>> SciPy-Dev mailing list >>>>> SciPy-Dev at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/scipy-dev >>>> _______________________________________________ >>>> SciPy-Dev mailing list >>>> SciPy-Dev at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-dev >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-dev >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev From vanforeest at gmail.com Sat Apr 21 17:12:19 2012 From: vanforeest at gmail.com (nicky van foreest) Date: Sat, 21 Apr 2012 23:12:19 +0200 Subject: [SciPy-Dev] scipy.stats In-Reply-To: References: Message-ID: > In [11]: from scipy.stats import uniform > > In [12]: U = uniform(loc = 3, scale = 5) > > In [13]: U.mean() > Out[13]: 5.5 > > In [14]: U.moment(1) > Out[14]: 0.5 > > In [15]: U.moment(8) > Out[15]: array(0.11111111111111112) > > First point: why in line 14 is U.moment(1) ?not equal to U.mean()? I > checked the code on line > https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L358 > to see why, and this explains the result. However, from the doc-string > on line https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L129 > I would expect to see that U.moment(1) = U.mean(). Interestingly, http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.moment.html says that moment() does compute the central moment. However, I need the real moments, i.e., E (X^n) = \int x^n dF(x) where F is the distribution function of the R.V. X. 
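[Editorial aside: the raw moments Nicky asks for are defined by E[X^n] = \int x^n dF(x). A minimal pure-Python sketch of that definition for the uniform(loc=3, scale=5) case from the thread, computed both by numerical integration and from the closed form; the helper names are mine for illustration, not scipy.stats APIs.]

```python
# Raw (non-central) moments E[X^n] = \int x^n dF(x) of a uniform
# distribution on [loc, loc + scale], computed two ways: by a midpoint-rule
# integration of x**n * pdf(x), and from the closed form
#     E[X^n] = (b**(n+1) - a**(n+1)) / ((n + 1) * (b - a)).
# Standalone illustration only; the helper names are not scipy.stats APIs.

def uniform_raw_moment_numeric(n, loc=3.0, scale=5.0, steps=100_000):
    """Approximate E[X^n] by integrating x**n * pdf(x) over [loc, loc + scale]."""
    a, b = loc, loc + scale
    h = (b - a) / steps
    pdf = 1.0 / (b - a)  # constant density on [a, b]
    return sum((a + (i + 0.5) * h) ** n * pdf * h for i in range(steps))

def uniform_raw_moment_exact(n, loc=3.0, scale=5.0):
    """Closed-form raw moment of uniform(a, b) with a = loc, b = loc + scale."""
    a, b = loc, loc + scale
    return (b ** (n + 1) - a ** (n + 1)) / ((n + 1) * (b - a))

if __name__ == "__main__":
    for n in (1, 2, 8):
        # the first raw moment should equal the mean of uniform(3, 8), i.e. 5.5
        print(n, uniform_raw_moment_numeric(n), uniform_raw_moment_exact(n))
```

For n = 1 both routes give 5.5, which is what U.mean() returns in the thread and what a loc/scale-aware U.moment(1) should also return.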
From josef.pktd at gmail.com Sat Apr 21 17:27:04 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 21 Apr 2012 17:27:04 -0400 Subject: [SciPy-Dev] scipy.stats In-Reply-To: References: Message-ID: On Sat, Apr 21, 2012 at 5:12 PM, nicky van foreest wrote: >> In [11]: from scipy.stats import uniform >> >> In [12]: U = uniform(loc = 3, scale = 5) >> >> In [13]: U.mean() >> Out[13]: 5.5 >> >> In [14]: U.moment(1) >> Out[14]: 0.5 >> >> In [15]: U.moment(8) >> Out[15]: array(0.11111111111111112) >> >> First point: why in line 14 is U.moment(1) not equal to U.mean()? I >> checked the code on line >> https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L358 >> to see why, and this explains the result. However, from the doc-string >> on line https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L129 >> I would expect to see that U.moment(1) = U.mean(). Looks like a bug. And I don't think the test suite checks whether loc and scale are handled correctly in all code paths. > > Interestingly, http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.moment.html this is the empirical moment, a stats function for data, not for the distribution. The non-central moment for data is just (data**k).mean() if we don't care about ddof. Do we need a function? > says that moment() does compute the central moment. However, I need > the real moments, i.e., E (X^n) = \int x^n dF(x) where F is the > distribution function of the R.V. X. the distribution method moment is the non-central (raw) moment.
(It was a bit inconsistent when I went through this, and I think I decided everywhere on raw moments.) Josef > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev From vanforeest at gmail.com Sat Apr 21 17:31:12 2012 From: vanforeest at gmail.com (nicky van foreest) Date: Sat, 21 Apr 2012 23:31:12 +0200 Subject: [SciPy-Dev] scipy.stats In-Reply-To: References: Message-ID: > In [11]: from scipy.stats import uniform > > In [12]: U = uniform(loc = 3, scale = 5) > > In [13]: U.mean() > Out[13]: 5.5 > > In [14]: U.moment(1) > Out[14]: 0.5 Might this problem be due to the fact that my Ubuntu machine does not support the latest version of scipy.stats? From josef.pktd at gmail.com Sat Apr 21 17:45:34 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 21 Apr 2012 17:45:34 -0400 Subject: [SciPy-Dev] scipy.stats In-Reply-To: References: Message-ID: On Sat, Apr 21, 2012 at 5:31 PM, nicky van foreest wrote: >> In [11]: from scipy.stats import uniform >> >> In [12]: U = uniform(loc = 3, scale = 5) >> >> In [13]: U.mean() >> Out[13]: 5.5 >> >> In [14]: U.moment(1) >> Out[14]: 0.5 > > Might this problem be due to the fact that my Ubuntu machine does not > support the latest version of scipy.stats? I'm still on 0.9 and there is no loc, scale option >>> import scipy >>> scipy.__version__ '0.9.0' >>> stats.uniform.moment(loc = 3, scale = 5) Traceback (most recent call last): File "", line 1, in TypeError: moment() got an unexpected keyword argument 'loc' >>> U.moment(1) 0.5 >>> stats.uniform.moment(1) 0.5 Obviously I'm not up to date in this. Can you try the distribution directly instead of the frozen distribution?
Josef > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev From vanforeest at gmail.com Sat Apr 21 17:58:56 2012 From: vanforeest at gmail.com (nicky van foreest) Date: Sat, 21 Apr 2012 23:58:56 +0200 Subject: [SciPy-Dev] scipy.stats In-Reply-To: References: Message-ID: On 21 April 2012 23:45, wrote: > On Sat, Apr 21, 2012 at 5:31 PM, nicky van foreest wrote: >>> In [11]: from scipy.stats import uniform >>> >>> In [12]: U = uniform(loc = 3, scale = 5) >>> >>> In [13]: U.mean() >>> Out[13]: 5.5 >>> >>> In [14]: U.moment(1) >>> Out[14]: 0.5 >> >> Might this problem be due to the fact that my Ubuntu machine does not >> support the latest version of scipy.stats? > > I'm still on 0.9 and there is no loc, scale option > >>>> import scipy >>>> scipy.__version__ > '0.9.0' >>>> stats.uniform.moment(loc = 3, scale = 5) > Traceback (most recent call last): > File "", line 1, in > TypeError: moment() got an unexpected keyword argument 'loc' >>>> U.moment(1) > 0.5 >>>> stats.uniform.moment(1) > 0.5 > > Obviously I'm not up to date in this. > > can you try the distribution directly instead of the frozen distribution? I am having the same problems as you, although: In [9]: uniform.moment Type: instancemethod Base Class: String Form: > Namespace: Interactive File: /usr/lib/python2.7/dist-packages/scipy/stats/distributions.py Definition: uniform.moment(self, n, *args) Docstring: n'th order non-central moment of distribution Parameters ---------- n: int, n>=1 order of moment arg1, arg2, arg3,... : array-like The shape parameter(s) for the distribution (see docstring of the instance object for more information) BTW: all of these points stem from my intention to add some function to scipy stats to compute/approximate the convolution of two r.v.s. One such method proposes to use moment matching...
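[Editorial aside: for the moment-matching idea Nicky mentions, the raw moments of a sum of independent random variables follow directly from the binomial theorem, E[(X+Y)^n] = \sum_k C(n,k) E[X^k] E[Y^{n-k}]. A small self-contained sketch under that independence assumption; the function name is hypothetical, not a scipy API.]

```python
from math import comb

def sum_raw_moments(mx, my):
    """Raw moments of X + Y for independent X and Y.

    mx[k] = E[X^k], my[k] = E[Y^k] for k = 0..n (so mx[0] == my[0] == 1).
    Uses the binomial expansion
        E[(X+Y)^n] = sum_k C(n, k) * E[X^k] * E[Y^(n-k)],
    which factors like this only because independence splits the expectation.
    """
    n = min(len(mx), len(my)) - 1
    return [sum(comb(j, k) * mx[k] * my[j - k] for k in range(j + 1))
            for j in range(n + 1)]

# Example: X, Y ~ uniform(0, 1), whose raw moments are E[X^k] = 1 / (k + 1).
m_unif = [1.0 / (k + 1) for k in range(5)]
m_sum = sum_raw_moments(m_unif, m_unif)
print(m_sum[1], m_sum[2])  # mean of X + Y, and its second raw moment
```

An approximating distribution can then be fitted to the first few entries of m_sum; for two uniform(0, 1) variables this gives mean 1 and second raw moment 7/6, consistent with Var(X+Y) = 1/6.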
From josef.pktd at gmail.com Sat Apr 21 20:41:12 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 21 Apr 2012 20:41:12 -0400 Subject: [SciPy-Dev] scipy.stats In-Reply-To: References: Message-ID: On Sat, Apr 21, 2012 at 5:58 PM, nicky van foreest wrote: > On 21 April 2012 23:45, ? wrote: >> On Sat, Apr 21, 2012 at 5:31 PM, nicky van foreest wrote: >>>> In [11]: from scipy.stats import uniform >>>> >>>> In [12]: U = uniform(loc = 3, scale = 5) >>>> >>>> In [13]: U.mean() >>>> Out[13]: 5.5 >>>> >>>> In [14]: U.moment(1) >>>> Out[14]: 0.5 >>> >>> Might this problem be due to the fact that my ubuntu machine does not >>> support the latest version of scipy.stats? >> >> I'm stil on 0.9 and there is no loc, scale option >> >>>>> import scipy >>>>> scipy.__version__ >> '0.9.0' >>>>> stats.uniform.moment(loc = 3, scale = 5) >> Traceback (most recent call last): >> ?File "", line 1, in >> TypeError: moment() got an unexpected keyword argument 'loc' >>>>> U.moment(1) >> 0.5 >>>>> stats.uniform.moment(1) >> 0.5 >> >> Obviously I'm not up to date in this. >> >> can you try the distribution directly instead of the frozen distribution? > > I am having the same problems as you, although: > > In [9]: ? uniform.moment > Type: ? ? ? ? ? instancemethod > Base Class: ? ? > String Form: ? ? > > Namespace: ? ? ?Interactive > File: ? ? ? ? ? /usr/lib/python2.7/dist-packages/scipy/stats/distributions.py > Definition: ? ? uniform.moment(self, n, *args) > Docstring: > ? ?n'th order non-central moment of distribution > > ? ?Parameters > ? ?---------- > ? ?n: int, n>=1 > ? ? ? ?order of moment > > ? ?arg1, arg2, arg3,... : array-like > ? ? ? ?The shape parameter(s) for the distribution (see docstring of the > ? ? ? 
?instance object for more information) >>> stats.uniform.moment(1, loc=3, scale=1) 3.5 >>> stats.uniform(loc=3, scale=1).moment(1) 3.5 >>> import scipy >>> scipy.__version__ '0.10.0b2' version mismatch with new documentation In the older versions of scipy, the frozen distributions didn't always pass on the keywords, loc and scale, so they were just quietly ignored. It looks like now moment handles loc and scale >>> stats.uniform.mean(1, loc=3, scale=5) 5.5 >>> stats.uniform.moment(1, loc=3, scale=5) 5.5 >>> stats.uniform.ppf([0, 1], loc=3, scale=5) array([ 3., 8.]) >>> stats.uniform.ppf([0, 1], loc=3, scale=5).mean() 5.5 Josef > > > BTW: all of these points stem from my intention to add some function > to scipy stats to compute/approximate the convolution of two r.v.s. > One such method proposes to use moment matching... > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev From vanforeest at gmail.com Sun Apr 22 13:28:53 2012 From: vanforeest at gmail.com (nicky van foreest) Date: Sun, 22 Apr 2012 19:28:53 +0200 Subject: [SciPy-Dev] scipy.stats In-Reply-To: References: Message-ID: On 22 April 2012 02:41, wrote: > On Sat, Apr 21, 2012 at 5:58 PM, nicky van foreest wrote: >> On 21 April 2012 23:45, ? wrote: >>> On Sat, Apr 21, 2012 at 5:31 PM, nicky van foreest wrote: >>>>> In [11]: from scipy.stats import uniform >>>>> >>>>> In [12]: U = uniform(loc = 3, scale = 5) >>>>> >>>>> In [13]: U.mean() >>>>> Out[13]: 5.5 >>>>> >>>>> In [14]: U.moment(1) >>>>> Out[14]: 0.5 >>>> >>>> Might this problem be due to the fact that my ubuntu machine does not >>>> support the latest version of scipy.stats? 
>>> >>> I'm still on 0.9 and there is no loc, scale option >>> >>>>>> import scipy >>>>>> scipy.__version__ >>> '0.9.0' >>>>>> stats.uniform.moment(loc = 3, scale = 5) >>> Traceback (most recent call last): >>> File "", line 1, in >>> TypeError: moment() got an unexpected keyword argument 'loc' >>>>>> U.moment(1) >>> 0.5 >>>>>> stats.uniform.moment(1) >>> 0.5 >>> >>> Obviously I'm not up to date in this. >>> >>> can you try the distribution directly instead of the frozen distribution? >> >> I am having the same problems as you, although: >> >> In [9]: uniform.moment >> Type: instancemethod >> Base Class: >> String Form: > > >> Namespace: Interactive >> File: /usr/lib/python2.7/dist-packages/scipy/stats/distributions.py >> Definition: uniform.moment(self, n, *args) >> Docstring: >> n'th order non-central moment of distribution >> >> Parameters >> ---------- >> n: int, n>=1 >> order of moment >> >> arg1, arg2, arg3,... : array-like >> The shape parameter(s) for the distribution (see docstring of the >> instance object for more information) > >>>>> stats.uniform.moment(1, loc=3, scale=1) >> 3.5 >>>>> stats.uniform(loc=3, scale=1).moment(1) >> 3.5 >> >>>>> import scipy >>>>> scipy.__version__ >> '0.10.0b2' >> >> >> version mismatch with new documentation >> >> In the older versions of scipy, the frozen distributions didn't always >> pass on the keywords, loc and scale, so they were just quietly >> ignored. >> >> It looks like now moment handles loc and scale >> >>>>> stats.uniform.mean(1, loc=3, scale=5) >> 5.5 >>>>> stats.uniform.moment(1, loc=3, scale=5) >> 5.5 >> >>>>> stats.uniform.ppf([0, 1], loc=3, scale=5) >> array([ 3., 8.]) >>>>> stats.uniform.ppf([0, 1], loc=3, scale=5).mean() >> 5.5 >> >> >> Josef OK. So this is clear now. Let's try to wrap up the above couple of mails. The problem with the moments is solved. The point about rv.dist.b in the examples text is not solved, in my opinion. Is it OK to make a ticket for this?
I'll try to figure out how to do this. Hopefully this is covered in the doc of Ralf. BTW. I can also try to repair a few of the tickets. For instance, I came across a similar problem like the one in http://projects.scipy.org/scipy/ticket/1493. I think I have a nice solution for this. From josef.pktd at gmail.com Sun Apr 22 13:38:11 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 22 Apr 2012 13:38:11 -0400 Subject: [SciPy-Dev] scipy.stats In-Reply-To: References: Message-ID: On Sun, Apr 22, 2012 at 1:28 PM, nicky van foreest wrote: > On 22 April 2012 02:41, ? wrote: >> On Sat, Apr 21, 2012 at 5:58 PM, nicky van foreest wrote: >>> On 21 April 2012 23:45, ? wrote: >>>> On Sat, Apr 21, 2012 at 5:31 PM, nicky van foreest wrote: >>>>>> In [11]: from scipy.stats import uniform >>>>>> >>>>>> In [12]: U = uniform(loc = 3, scale = 5) >>>>>> >>>>>> In [13]: U.mean() >>>>>> Out[13]: 5.5 >>>>>> >>>>>> In [14]: U.moment(1) >>>>>> Out[14]: 0.5 >>>>> >>>>> Might this problem be due to the fact that my ubuntu machine does not >>>>> support the latest version of scipy.stats? >>>> >>>> I'm stil on 0.9 and there is no loc, scale option >>>> >>>>>>> import scipy >>>>>>> scipy.__version__ >>>> '0.9.0' >>>>>>> stats.uniform.moment(loc = 3, scale = 5) >>>> Traceback (most recent call last): >>>> ?File "", line 1, in >>>> TypeError: moment() got an unexpected keyword argument 'loc' >>>>>>> U.moment(1) >>>> 0.5 >>>>>>> stats.uniform.moment(1) >>>> 0.5 >>>> >>>> Obviously I'm not up to date in this. >>>> >>>> can you try the distribution directly instead of the frozen distribution? >>> >>> I am having the same problems as you, although: >>> >>> In [9]: ? uniform.moment >>> Type: ? ? ? ? ? instancemethod >>> Base Class: ? ? >>> String Form: ? ?>> > >>> Namespace: ? ? ?Interactive >>> File: ? ? ? ? ? /usr/lib/python2.7/dist-packages/scipy/stats/distributions.py >>> Definition: ? ? uniform.moment(self, n, *args) >>> Docstring: >>> ? 
?n'th order non-central moment of distribution >>> >>> ? ?Parameters >>> ? ?---------- >>> ? ?n: int, n>=1 >>> ? ? ? ?order of moment >>> >>> ? ?arg1, arg2, arg3,... : array-like >>> ? ? ? ?The shape parameter(s) for the distribution (see docstring of the >>> ? ? ? ?instance object for more information) >> >>>>> stats.uniform.moment(1, loc=3, scale=1) >> 3.5 >>>>> stats.uniform(loc=3, scale=1).moment(1) >> 3.5 >> >>>>> import scipy >>>>> scipy.__version__ >> '0.10.0b2' >> >> >> version mismatch with new documentation >> >> In the older versions of scipy, the frozen distributions didn't always >> pass on the keywords, loc and scale, so they were just quietly >> ignored. >> >> It looks like now moment handles loc and scale >> >>>>> stats.uniform.mean(1, loc=3, scale=5) >> 5.5 >>>>> stats.uniform.moment(1, loc=3, scale=5) >> 5.5 >> >>>>> stats.uniform.ppf([0, 1], loc=3, scale=5) >> array([ 3., ?8.]) >>>>> stats.uniform.ppf([0, 1], loc=3, scale=5).mean() >> 5.5 >> >> >> Josef > > Ok. So this is clear now. . Let's try to wrap up the above couple of > mails. The problem with the moments is solved. The point about > rv.dist.b in the examples text is not in my opinion. Is it ok to make > a ticket for this? I'll try to figure out how to do this. Hopefully > this is covered in the doc of Ralf. Fine, the template would need a new place holder, the value can then be inserted when the doc string is created in the line that I linked to, I think. > > BTW. I can also try to repair a few of the tickets. For instance, I > came across a similar problem like the one in > http://projects.scipy.org/scipy/ticket/1493. I think I have a nice > solution for this. That would be a good ticket to have a solution for. 
Thanks, Josef > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev From vanforeest at gmail.com Sun Apr 22 13:41:59 2012 From: vanforeest at gmail.com (nicky van foreest) Date: Sun, 22 Apr 2012 19:41:59 +0200 Subject: [SciPy-Dev] scipy.stats In-Reply-To: References: Message-ID: >> Ok. So this is clear now. . Let's try to wrap up the above couple of >> mails. The problem with the moments is solved. The point about >> rv.dist.b in the examples text is not in my opinion. Is it ok to make >> a ticket for this? I'll try to figure out how to do this. Hopefully >> this is covered in the doc of Ralf. I tried to submit a ticket at http://projects.scipy.org/scipy but I get an error that the database is locked. Should I just wait and see? >> BTW. I can also try to repair a few of the tickets. For instance, I >> came across a similar problem like the one in >> http://projects.scipy.org/scipy/ticket/1493. I think I have a nice >> solution for this. > > That would be a good ticket to have a solution for. I give it a try right now. From josef.pktd at gmail.com Sun Apr 22 13:44:03 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 22 Apr 2012 13:44:03 -0400 Subject: [SciPy-Dev] scipy.stats In-Reply-To: References: Message-ID: On Sun, Apr 22, 2012 at 1:41 PM, nicky van foreest wrote: >>> Ok. So this is clear now. . Let's try to wrap up the above couple of >>> mails. The problem with the moments is solved. The point about >>> rv.dist.b in the examples text is not in my opinion. Is it ok to make >>> a ticket for this? I'll try to figure out how to do this. Hopefully >>> this is covered in the doc of Ralf. > > I tried to submit a ticket at http://projects.scipy.org/scipy ?but I > get an error that the database is locked. Should I just wait and see? Try again, and again, ... and hope it doesn't take more than a few minutes. Josef > >>> BTW. 
I can also try to repair a few of the tickets. For instance, I >>> came across a similar problem like the one in >>> http://projects.scipy.org/scipy/ticket/1493. I think I have a nice >>> solution for this. >> >> That would be a good ticket to have a solution for. > > I give it a try right now. > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev From vanforeest at gmail.com Sun Apr 22 14:29:08 2012 From: vanforeest at gmail.com (nicky van foreest) Date: Sun, 22 Apr 2012 20:29:08 +0200 Subject: [SciPy-Dev] scipy.stats: algorithm for ticket 1493 Message-ID: Hi, I saw Ralf's suggestions on issues of scipy.stats to work on. In another project I had to solve a similar problem to the one mentioned in http://projects.scipy.org/scipy/ticket/1493. I'd like to propose the algorithm below to tackle this. I am looking forward to your feedback. Once the algorithm seems to work impeccably, I'll try to convert it such that it can be incorporated in scipy.stats and add test cases. Let's first test the example of the ticket: In [21]: from scipy.stats import invnorm In [22]: from scipy import optimize In [23]: invnorm.xb = 10 In [24]: sol = invnorm.ppf(0.8455, 7.24000019602, scale= 2.51913630166) In [25]: print sol 25.1878345282 In [26]: print invnorm.cdf(sol, 7.24000019602, scale=2.51913630166) 0.8455 In [27]: It is weird that this leads to a solution, as the solution is not in the search interval. Trying xb = 5 leads to the reported error. Now a solution: # Proposed solution. # A quintessential property in the algorithm is that the cdf is a # non-decreasing function.
def findppf(q): # search until xb is large enough left, right = invnorm.xa, invnorm.xb while invnorm.cdf(right, 7.24000019602, scale=2.51913630166) < q: left = right right *= 2 return optimize.brentq(lambda x: \ invnorm.cdf(x, 7.24000019602, scale=2.51913630166) - q,\ left, right) # Perhaps increasing with a larger factor than 2 is faster, as brentq # converges very fast. Taking `right` too large is probably not a real # problem. On the other hand, when `right` increases too fast, cdf(right) may # become numerically equal to 1. So, what would be a good factor? sol = findppf(0.8455) print sol print invnorm.cdf(sol, 7.24000019602, scale=2.51913630166) This code (I ran it in a file, not in ipython) gives the right results, at least for these values. Nicky From vanforeest at gmail.com Sun Apr 22 14:42:28 2012 From: vanforeest at gmail.com (nicky van foreest) Date: Sun, 22 Apr 2012 20:42:28 +0200 Subject: [SciPy-Dev] scipy.stats: algorithm for ticket 1493 In-Reply-To: References: Message-ID: I just realized, xa may be too large... hence we should search such that cdf(left) < q < cdf(right). *Assuming* that xa < 0 and xb > 0 the following should be better def findppf(q): # search until cdf(left) < q < cdf(right) left, right = invnorm.xa, invnorm.xb while invnorm.cdf(left, 7.24000019602, scale=2.51913630166) > q: right = left left *= 2 while invnorm.cdf(right, 7.24000019602, scale=2.51913630166) < q: left = right right *= 2 return optimize.brentq(lambda x: \ invnorm.cdf(x, 7.24000019602, scale=2.51913630166) - q,\ left, right) Should a test on xa < 0 and xb>0 be added? From josef.pktd at gmail.com Sun Apr 22 20:27:36 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 22 Apr 2012 20:27:36 -0400 Subject: [SciPy-Dev] scipy.stats: algorithm for ticket 1493 In-Reply-To: References: Message-ID: On Sun, Apr 22, 2012 at 2:42 PM, nicky van foreest wrote: > I just realized, xa may be too large...
hence we should search such
> that cdf(left) < q < cdf(right).
>
> *Assuming* that xa < 0 and xb > 0 the following should be better
>
> def findppf(q):
>     # search until cdf(left) < q < cdf(right)
>     left, right = invnorm.xa, invnorm.xb
>     while invnorm.cdf(left, 7.24000019602, scale=2.51913630166) > q:
>         right = left
>         left *= 2
>     while invnorm.cdf(right, 7.24000019602, scale=2.51913630166) < q:
>         left = right
>         right *= 2
>     return optimize.brentq(lambda x: \
>                            invnorm.cdf(x, 7.24000019602,
>                                        scale=2.51913630166) - q,\
>                            left, right)
>
> Should a test on xa < 0 and xb > 0 be added?

For xa, xb it doesn't matter whether they are larger or smaller than
zero, so I don't think we need a special check.

It looks good in a few more example cases.

The difficult cases will be where the cdf also doesn't exist and we need
to get it through integrate.quad, but I don't remember which
distribution is a good case.
There is a test case in the test suite, where I tried to roundtrip
close to the 0, 1 boundary before running into failures with some
distributions:
https://github.com/scipy/scipy/blob/master/scipy/stats/tests/test_continuous_basic.py#L307

To try out how well your solution works, the roundtrip could be done
with, for example, q = [1e-8, 1-1e-8] and see at which distribution it
breaks and why (if any).

Note: I removed the scale in your example, because internal _ppf works
on the standard distribution, loc=0, scale=1.
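That loc/scale note can be checked directly. Here is a small sketch of the relationship, written for current scipy (where invnorm has been renamed invgauss — the rename is this sketch's assumption, not part of the original messages), using the ticket's numbers:

```python
from scipy import stats

# Sketch: scipy's generic .ppf applies loc and scale around the internal
# standard-distribution quantile, i.e. ppf(q, loc, scale) = loc + scale * ppf_std(q).
q, mu = 0.8455, 7.24000019602
loc, scale = 0.0, 2.51913630166

standard = stats.invgauss.ppf(q, mu)                      # loc=0, scale=1
shifted = stats.invgauss.ppf(q, mu, loc=loc, scale=scale)
print(shifted, loc + scale * standard)                    # the two agree
```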
loc and scale are added
generically in .ppf

Thanks,

Josef


from scipy import stats, optimize

def findppf(dist, q, *args):
    # search until cdf(left) < q < cdf(right)
    left, right = dist.xa, dist.xb

    counter = 0
    while dist.cdf(left, *args) > q:
        right = left
        left *= 2
        counter += 1
        print counter, left, right

    while dist.cdf(right, *args) < q:
        left = right
        right *= 2
        counter += 1
        print counter, left, right

    return optimize.brentq(lambda x: dist.cdf(x, *args) - q, left, right)

print
print 'invgauss'
s = 7.24000019602
sol = findppf(stats.invgauss, 0.8455, s)
print sol
sol = findppf(stats.invgauss, 1-1e-8, s)
print 'roundtrip', 1-1e-8, sol, stats.invgauss.cdf(sol, s)
print 1e-30, stats.invgauss.cdf(findppf(stats.invgauss, 1e-30, s), s)

print '\nt'
print findppf(stats.t, 1-1e-8, s), stats.t.ppf(1-1e-8, s)
print findppf(stats.t, 1e-8, s), stats.t.ppf(1e-8, s)
print '\ncauchy'
print findppf(stats.cauchy, 1e-8), stats.cauchy.ppf(1e-8)
print '\nf'
print findppf(stats.f, 1-1e-8, 2, 10), stats.f.ppf(1-1e-8, 2, 10)

> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev

From josef.pktd at gmail.com Sun Apr 22 20:44:31 2012
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Sun, 22 Apr 2012 20:44:31 -0400
Subject: [SciPy-Dev] scipy.stats: algorithm to for ticket 1493
In-Reply-To: 
References: 
Message-ID: 

On Sun, Apr 22, 2012 at 8:27 PM, wrote:
> On Sun, Apr 22, 2012 at 2:42 PM, nicky van foreest wrote:
>> I just realized, xa may be too large... hence we should search such
>> that cdf(left) < q < cdf(right).
>>
>> *Assuming* that xa < 0 and xb > 0 the following should be better
>>
>> def findppf(q):
>>     # search until cdf(left) < q < cdf(right)
>>     left, right = invnorm.xa, invnorm.xb
>>     while invnorm.cdf(left, 7.24000019602, scale=2.51913630166) > q:
>>         right = left
>>         left *= 2
>>     while invnorm.cdf(right, 7.24000019602, scale=2.51913630166) < q:
?left = right >> ? ? ? ?right *= 2 >> ? ?return optimize.brentq(lambda x: \ >> ? ? ? ? ? ? ? ? ? ? ? ? ? invnorm.cdf(x, 7.24000019602, >> scale=2.51913630166) - q,\ >> ? ? ? ? ? ? ? ? ? ? ? ? ? left, right) >> >> Should a test on xa < 0 and xb>0 be added? > > for xa, xb it doesn't matter whether they are larger or smaller than > zero, so I don't think we need a special check > > it looks good in a few more example cases. > > The difficult cases will be where cdf also doesn't exist and we need > to get it through integrate.quad, but I don't remember which > distribution is a good case. > There is a testcase in the test suite, where I tried to roundtrip > close to the 0, 1 boundary before running into failures with some > distributions > https://github.com/scipy/scipy/blob/master/scipy/stats/tests/test_continuous_basic.py#L307 > > to try out how well tyour solution works, the roundtrip could be done > with, for example, q= [1e-8, 1-1e-8] and see at which distribution it > breaks and why (if any) I might not have been clear. I think the patch is good, and a good improvement over the current situation with fixed xa, xb. I just think that we are not able to reach the q=0, q=1 boundaries, since for some distributions we will run into other numerical problems. And I'm curious how far we can get with this. I don't have much of an opinion about the factor, times 2 or larger. Similarly, I don't know whether the default xa and xb are good. I changed them for a few distributions, but only where I saw obvious improvements. (There is a similar expansion of the trial space in the discrete distributions where I also was just guessing how fast to go and when to stop.) Josef > > Note: I removed the scale in your example, because internal _ppf works > on the standard distribution, loc=0, scale=1. loc and scale are added > generically in .ppf > > Thanks, > > Josef > > > from scipy import stats, optimize > > def findppf(dist, q, *args): > ? 
?# search ?until cdf(left) < q < cdf(right) > > ? ?left, right = dist.xa, dist.xb > > ? ?counter = 0 > ? ?while dist.cdf(left, *args) > q: > ? ? ? ?right = left > ? ? ? ?left *= 2 > ? ? ? ?counter += 1 > ? ? ? ?print counter, left, right > > ? ?while dist.cdf(right, *args) < q: > ? ? ? ?left = right > ? ? ? ?right *= 2 > ? ? ? ?counter += 1 > ? ? ? ?print counter, left, right > > ? ?return optimize.brentq(lambda x: dist.cdf(x, *args) - q, left, right) > > print > print 'invgauss' > s = 7.24000019602 > sol = ?findppf(stats.invgauss, 0.8455, s) > print sol > sol = findppf(stats.invgauss, 1-1e-8, s) > print 'roundtrip', 1-1e-8, sol, stats.invgauss.cdf(sol, s) > print 1e-30, stats.invgauss.cdf(findppf(stats.invgauss, 1e-30, s), s) > > print '\nt' > print ?findppf(stats.t, 1-1e-8, s), stats.t.ppf(1-1e-8, s) > print ?findppf(stats.t, 1e-8, s), stats.t.ppf(1e-8, s) > print '\ncauchy' > print ?findppf(stats.cauchy, 1e-8), stats.cauchy.ppf(1e-8) > print '\nf' > print findppf(stats.f, 1-1e-8, 2, 10), stats.f.ppf(1-1e-8, 2, 10) > > > > >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev From josef.pktd at gmail.com Mon Apr 23 09:21:31 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 23 Apr 2012 09:21:31 -0400 Subject: [SciPy-Dev] scipy.stats: algorithm to for ticket 1493 In-Reply-To: References: Message-ID: On Sun, Apr 22, 2012 at 8:44 PM, wrote: > On Sun, Apr 22, 2012 at 8:27 PM, ? wrote: >> On Sun, Apr 22, 2012 at 2:42 PM, nicky van foreest wrote: >>> I just realized, xa may be too large... hence we should search such >>> that cdf(left) < q < cdf(right). >>> >>> *Assuming* that xa < 0 and xb > 0 the following should be better >>> >>> def findppf(q): >>> ? ?# search ?until cdf(left) < q < cdf(right) >>> ? ?left, right = invnorm.xa, invnorm.xb >>> ? ?while invnorm.cdf(left, 7.24000019602, scale=2.51913630166) > q: >>> ? ? ? ?right = left >>> ? ? ? 
left *= 2
>>>     while invnorm.cdf(right, 7.24000019602, scale=2.51913630166) < q:
>>>         left = right
>>>         right *= 2
>>>     return optimize.brentq(lambda x: \
>>>                            invnorm.cdf(x, 7.24000019602,
>>>                                        scale=2.51913630166) - q,\
>>>                            left, right)
>>>
>>> Should a test on xa < 0 and xb>0 be added?
>>
>> for xa, xb it doesn't matter whether they are larger or smaller than
>> zero, so I don't think we need a special check
>>
>> it looks good in a few more example cases.
>>
>> The difficult cases will be where cdf also doesn't exist and we need
>> to get it through integrate.quad, but I don't remember which
>> distribution is a good case.
>> There is a test case in the test suite, where I tried to roundtrip
>> close to the 0, 1 boundary before running into failures with some
>> distributions
>> https://github.com/scipy/scipy/blob/master/scipy/stats/tests/test_continuous_basic.py#L307
>>
>> to try out how well your solution works, the roundtrip could be done
>> with, for example, q = [1e-8, 1-1e-8] and see at which distribution it
>> breaks and why (if any)
>
> I might not have been clear. I think the patch is good, and a good
> improvement over the current situation with fixed xa, xb.
> I just think that we are not able to reach the q=0, q=1 boundaries,
> since for some distributions we will run into other numerical
> problems. And I'm curious how far we can get with this.
>
> I don't have much of an opinion about the factor, times 2 or larger.
> Similarly, I don't know whether the default xa and xb are good. I
> changed them for a few distributions, but only where I saw obvious
> improvements.
> (There is a similar expansion of the trial space in the discrete
> distributions where I also was just guessing how fast to go and when
> to stop.)

I thought that using some adaptive heuristic to expand the trial
interval might end up doing something similar to fsolve. However, your
solution is faster than using fsolve.
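The two approaches being compared can be reproduced with a small stand-alone sketch. This is a reconstruction for current scipy, not the code from the attachment: the class-level xa/xb attributes no longer exist there, so the initial bracket is passed in explicitly, and invgauss is the renamed invnorm.

```python
from scipy import stats, optimize

def findppf_brentq(dist, q, *args, left=-10.0, right=10.0):
    """Expand the bracket geometrically until cdf(left) <= q <= cdf(right),
    then invert the cdf with the bracketed root finder brentq."""
    while dist.cdf(left, *args) > q:
        left, right = 2 * left, left
    while dist.cdf(right, *args) < q:
        left, right = right, 2 * right
    return optimize.brentq(lambda x: dist.cdf(x, *args) - q, left, right)

def findppf_fsolve(dist, q, *args, start=1.0):
    """The same inversion with the derivative-based fsolve, for comparison.
    fsolve needs a reasonable starting point; brentq only needs a bracket."""
    return optimize.fsolve(lambda x: dist.cdf(x, *args) - q, start)[0]

s = 7.24000019602
x = findppf_brentq(stats.invgauss, 0.8455, s)
print(x, stats.invgauss.cdf(x, s))   # cdf(x) recovers 0.8455
y = findppf_fsolve(stats.invgauss, 0.8455, s, start=5.0)
print(y)
```

For fat-tailed cases like cauchy the bracket has to grow through many doublings, but brentq still converges in a handful of iterations once the bracket is found, which is consistent with the timing observation in the thread.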
Just a quick timing in the adjusted script: brentq is much faster than fsolve for invgauss, and only a little bit slower for cauchy (fat tails) Josef > > Josef > >> >> Note: I removed the scale in your example, because internal _ppf works >> on the standard distribution, loc=0, scale=1. loc and scale are added >> generically in .ppf >> >> Thanks, >> >> Josef >> >> >> from scipy import stats, optimize >> >> def findppf(dist, q, *args): >> ? ?# search ?until cdf(left) < q < cdf(right) >> >> ? ?left, right = dist.xa, dist.xb >> >> ? ?counter = 0 >> ? ?while dist.cdf(left, *args) > q: >> ? ? ? ?right = left >> ? ? ? ?left *= 2 >> ? ? ? ?counter += 1 >> ? ? ? ?print counter, left, right >> >> ? ?while dist.cdf(right, *args) < q: >> ? ? ? ?left = right >> ? ? ? ?right *= 2 >> ? ? ? ?counter += 1 >> ? ? ? ?print counter, left, right >> >> ? ?return optimize.brentq(lambda x: dist.cdf(x, *args) - q, left, right) >> >> print >> print 'invgauss' >> s = 7.24000019602 >> sol = ?findppf(stats.invgauss, 0.8455, s) >> print sol >> sol = findppf(stats.invgauss, 1-1e-8, s) >> print 'roundtrip', 1-1e-8, sol, stats.invgauss.cdf(sol, s) >> print 1e-30, stats.invgauss.cdf(findppf(stats.invgauss, 1e-30, s), s) >> >> print '\nt' >> print ?findppf(stats.t, 1-1e-8, s), stats.t.ppf(1-1e-8, s) >> print ?findppf(stats.t, 1e-8, s), stats.t.ppf(1e-8, s) >> print '\ncauchy' >> print ?findppf(stats.cauchy, 1e-8), stats.cauchy.ppf(1e-8) >> print '\nf' >> print findppf(stats.f, 1-1e-8, 2, 10), stats.f.ppf(1-1e-8, 2, 10) >> >> >> >> >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-dev -------------- next part -------------- # -*- coding: utf-8 -*- """ Created on Sun Apr 22 19:39:34 2012 Author: Nicky van Foreest, Josef Perktold """ from scipy import stats, optimize def findppf(dist, q, *args): # search until cdf(left) < q < cdf(right) left, right = dist.xa, dist.xb #counter = 0 while 
dist.cdf(left, *args) > q:
        right = left
        left *= 2
        #counter += 1
        #print counter, left, right

    while dist.cdf(right, *args) < q:
        left = right
        right *= 2
        #counter += 1
        #print counter, left, right

    return optimize.brentq(lambda x: dist.cdf(x, *args) - q, left, right)


def findppf2(dist, q, *args):
    # search until cdf(left) < q < cdf(right)
    left, right = dist.xa, dist.xb
    if dist.cdf(left, *args) > q:
        start = left
    elif dist.cdf(right, *args) < q:
        start = right
    else:
        return optimize.brentq(lambda x: dist.cdf(x, *args) - q, left, right)
    return optimize.fsolve(lambda x: dist.cdf(x, *args) - q, start)


print
print 'invgauss'
s = 7.24000019602
sol = findppf(stats.invgauss, 0.8455, s)
print sol
sol = findppf(stats.invgauss, 1-1e-8, s)
print 'roundtrip', 1-1e-8, sol, stats.invgauss.cdf(sol, s)
print 1e-30, stats.invgauss.cdf(findppf(stats.invgauss, 1e-30, s), s)

print '\nt'
print findppf(stats.t, 1-1e-8, s), stats.t.ppf(1-1e-8, s)
print findppf(stats.t, 1e-8, s), stats.t.ppf(1e-8, s)
print '\ncauchy'
print findppf(stats.cauchy, 1e-8), stats.cauchy.ppf(1e-8)
print findppf2(stats.cauchy, 1e-8), stats.cauchy.ppf(1e-8)
print '\nf'
print findppf(stats.f, 1-1e-8, 2, 10), stats.f.ppf(1-1e-8, 2, 10)

import time
n_repl = 200
s = 10
t0 = time.time()
for _ in xrange(n_repl):
    findppf(stats.invgauss, 1 - 1e-8, s)
    findppf(stats.cauchy, 1e-8)
t1 = time.time()
for _ in xrange(n_repl):
    findppf2(stats.invgauss, 1 - 1e-8, s)
    findppf2(stats.cauchy, 1e-8)
t2 = time.time()
print 'time', t1-t0, t2-t1

From vanforeest at gmail.com Mon Apr 23 14:58:24 2012
From: vanforeest at gmail.com (nicky van foreest)
Date: Mon, 23 Apr 2012 20:58:24 +0200
Subject: [SciPy-Dev] scipy.stats: algorithm to for ticket 1493
In-Reply-To: 
References: 
Message-ID: 

>> for xa, xb it doesn't matter whether they are larger or smaller than
>>> zero, so I don't think we need a special check

I think it does, for suppose that in the algo left = xa = 0.5 (because
the user has been fiddling with xa) and cdf(xa) > q.
Then setting left = 2*left will only worsen the problem. Or do I miss
something?

>>> it looks good in a few more example cases.

I found another small bug, please see the included code.

>>>
>>> The difficult cases will be where cdf also doesn't exist and we need
>>> to get it through integrate.quad, but I don't remember which
>>> distribution is a good case.

This case is harder indeed. (I assume that by 'not exist' you mean that
there is no closed-form expression for the cdf, like for the normal
distribution.) Computing the ppf would involve calling quad a lot of
times. This is wasteful, especially since the computation of cdf(b)
includes the computation of cdf(a) for a < b, supposing that quad runs
from -np.inf to b. We could repair this by computing cdf(b) = cdf(a) +
quad(f, a, b), assuming that cdf(a) has been computed already.
(Perhaps I am not clear enough here. If so, let me know.)

>> I just think that we are not able to reach the q=0, q=1 boundaries,
>> since for some distributions we will run into other numerical
>> problems. And I'm curious how far we can get with this.

I completely forgot to include a test on the obvious cases q >= 1. -
np.finfo(float).eps and q <= np.finfo(float).eps. It is now in the
attached file.

>> Similarly, I don't know whether the default xa and xb are good. I
>> changed them for a few distributions, but only where I saw obvious
>> improvements.

I also have no clue what would be good values in general. The choices
seem reasonable from a practical point of view...

>>> Note: I removed the scale in your example, because internal _ppf works
>>> on the standard distribution, loc=0, scale=1. loc and scale are added
>>> generically in .ppf

Thanks. I also included **kwds so that I can pass scale = 10 or
something like this. Once all works as it should, I'll try to convert
the code such that it fits nicely in distributions.py.

The simultaneous updating of left and right in the previous algo is
wrong.
Suppose for instance that cdf(left) < cdf(right) < q. Then both
left and right would `move to the left'. This is clearly wrong. The
included code should be better.

With regard to the values of xb and xa: can an `ordinary' user change
these? If so, then the ppf finder should include some protection, in my
opinion. If not, the user will get an error that brentq does not have
the right limits, but this error might be somewhat unexpected. (What
has brentq to do with finding the ppf?) Of course, looking at the code
this is clear, but I expect most users will not do so.

The code contains two choices about how to handle xa and xb. Do you
have any preference?

Thanks for your feedback. Very helpful.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: findppf.py
Type: application/octet-stream
Size: 1792 bytes
Desc: not available
URL: 

From josef.pktd at gmail.com Mon Apr 23 16:04:57 2012
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Mon, 23 Apr 2012 16:04:57 -0400
Subject: [SciPy-Dev] scipy.stats: algorithm to for ticket 1493
In-Reply-To: 
References: 
Message-ID: 

On Mon, Apr 23, 2012 at 2:58 PM, nicky van foreest wrote:
>>> for xa, xb it doesn't matter whether they are larger or smaller than
>>>> zero, so I don't think we need a special check
>
> I think it does, for suppose that in the algo left = xa = 0.5 (because
> the user has been fiddling with xa) and cdf(xa) > q. Then setting
later today > >>>> >>>> The difficult cases will be where cdf also doesn't exist and we need >>>> to get it through integrate.quad, but I don't remember which >>>> distribution is a good case. > > This case is harder indeed. (I assume you mean by 'not exist' that > there is no closed form expression for the cdf, like the normal > distribution). Computing the ppf would involve calling quad a lot of > times. This is wasteful especially since the computation of cdf(b) > includes the computation of cdf(a) for a < b, supposing that quad runs > from -np.inf to b. We could repair this by computing cdf(b) = cdf(a) + > quad(f, a, b), assuming that cdf(a) has been computed already. > (perhaps I am not clear enough here. If so, let me know.) not exists = not defined as _cdf method could also be scipy.special if there are no closed form expressions quad should run from dist.a to x, I guess, dist.a might be -inf > >>> I just think that we are not able to reach the q=0, q=1 boundaries, >>> since for some distributions we will run into other numerical >>> problems. And I'm curious how far we can get with this. > > I completely missed to include a test on the obvious cases q >= 1. - > np.finfo(float).eps and q <= np.finfo(float).eps. It is now in the > attached file. >>> findppf(stats.expon, 1e-30) -6.3593574850511882e-13 lower bound q can be small and won't run into problems with being 0, until 1e-300? The right answer should be dist.b for q=numerically 1, lower support point is dist.a but I don't see when we would need it. > >>> Similarly, I don't know whether the default xa and xb are good. I >>> changed them for a few distributions, but only where I saw obvious >>> improvements. > > I also have no clue what would be good values in general. The choices > seems reasonable from a practical point of view... > >>>> Note: I removed the scale in your example, because internal _ppf works >>>> on the standard distribution, loc=0, scale=1. 
loc and scale are added
>>>> generically in .ppf
>
> Thanks. I included also **kwds so that I can pass scale = 10 or
> something like this. Once all works as it should, I'll try to convert
> the code such that it fits nicely in distributions.py.

With self instead of dist, it should already have the signature about
right; no **kwds, I assume.

>
> The simultaneous updating of left and right in the previous algo is
> wrong. Suppose for instance that cdf(left) < cdf(right) < q. Then both
> left and right would `move to the left'. This is clearly wrong. The
> included code should be better.

would move to the *right*?

I thought the original was a nice trick: we can shift both left and
right, since we know the answer has to be in that direction; the part
of the range we cut off cannot contain it.

Or do I miss the point?

>
> With regard to the values of xb and xa. Can a `ordinary' user change
> these? If so, then the ppf finder should include some protection in my
> opinion. If not, the user will get an error that brentq has not the
> right limits, but this error might be somewhat unexpected. (What has
> brentq to do with finding the ppf?) Of course, looking at the code
> this is clear, but I expect most users will no do so.

I don't think `ordinary' users should touch xa, xb, but they could.
Except for getting around the limitation in this ticket there is no
reason to change xa, xb, so we could make them private _xa, _xb
instead.

>
> The code contains two choices about how to handle xa and xb. Do you
> have any preference?

I don't really like choice 1, because it removes the use of the
predefined xa, xb. On the other hand, with this extension, xa and xb
wouldn't really be necessary anymore.

Another possibility would be to try/except brentq with xa, xb first
and get most cases, and switch to your version if needed. I'm not sure
xa, xb are defined well enough that it's worth going this route,
though.

Thanks for working on this,

Josef

>
> Thanks for your feedback. Very helpful.
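That last idea — try brentq on the default bracket first, and expand only when brentq complains — could look roughly like the sketch below. It is written against the public cdf of current scipy (where xa/xb no longer exist as attributes), so the default bracket here is a hypothetical stand-in, not scipy's actual internals:

```python
from scipy import stats, optimize

def ppf_with_fallback(dist, q, *args, xa=-10.0, xb=10.0, **kwds):
    """Try brentq on the default bracket [xa, xb] first; if the root is not
    bracketed there (ValueError), expand the bracket geometrically and retry."""
    f = lambda x: dist.cdf(x, *args, **kwds) - q
    try:
        return optimize.brentq(f, xa, xb)
    except ValueError:  # f(xa) and f(xb) have the same sign
        left, right = xa, xb
        while dist.cdf(left, *args, **kwds) > q:
            left, right = 2 * left, left
        while dist.cdf(right, *args, **kwds) < q:
            left, right = right, 2 * right
        return optimize.brentq(f, left, right)

# The ticket-1493 case: the quantile (~25.19) lies outside [-10, 10],
# so the first brentq raises and the fallback kicks in.
x = ppf_with_fallback(stats.invgauss, 0.8455, 7.24000019602, scale=2.51913630166)
print(x)
```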
> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From vanforeest at gmail.com Mon Apr 23 16:18:01 2012 From: vanforeest at gmail.com (nicky van foreest) Date: Mon, 23 Apr 2012 22:18:01 +0200 Subject: [SciPy-Dev] scipy.stats: algorithm to for ticket 1493 In-Reply-To: References: Message-ID: I'll get back to this tomorrow evening. I promised to finish something today. With respect to computing the sum of two random variables: this turns out to be quite a challenge. To resolve this I decided to formulate it as a possible assignment for students... Thus, this takes somewhat longer than I expected. On the other hand, hopefully it leads to some more scipy converts... On 23 April 2012 22:04, wrote: > On Mon, Apr 23, 2012 at 2:58 PM, nicky van foreest wrote: >>>> for xa, xb it doesn't matter whether they are larger or smaller than >>>>> zero, so I don't think we need a special check >> >> I think it does, for suppose that in the algo left = xa = 0.5 (because >> the user has been fiddling with xa) and cdf(xa) > q. Then ?setting >> left = 2*left will only worsen the problem. Or do I miss something? > > True, however I don't think we have any predefined xa and xb that both > are strictly positive or negative values. > pareto is the only distribution bounded away from zero that I know and > it has xa = -10 > >> >>>>> it looks good in a few more example cases. >> >> I found another small bug, please see the included code. > > later today > >> >>>>> >>>>> The difficult cases will be where cdf also doesn't exist and we need >>>>> to get it through integrate.quad, but I don't remember which >>>>> distribution is a good case. >> >> This case is harder indeed. (I assume you mean by 'not exist' that >> there is no closed form expression for the cdf, like the normal >> distribution). Computing the ppf would involve calling quad a lot of >> times. 
This is wasteful especially since the computation of cdf(b) >> includes the computation of cdf(a) for a < b, supposing that quad runs >> from -np.inf to b. We could repair this by computing cdf(b) = cdf(a) + >> quad(f, a, b), assuming that cdf(a) has been computed already. >> (perhaps I am not clear enough here. If so, let me know.) > > not exists = not defined as _cdf method ?could also be scipy.special > if there are no closed form expressions > > quad should run from dist.a to x, I guess, dist.a might be -inf > >> >>>> I just think that we are not able to reach the q=0, q=1 boundaries, >>>> since for some distributions we will run into other numerical >>>> problems. And I'm curious how far we can get with this. >> >> I completely missed to include a test on the obvious cases q >= 1. - >> np.finfo(float).eps and q <= np.finfo(float).eps. It is now in the >> attached file. > >>>> findppf(stats.expon, 1e-30) > -6.3593574850511882e-13 > > lower bound q can be small and won't run into problems with being 0, > until 1e-300? > > The right answer should be dist.b for q=numerically 1, lower support > point is dist.a but I don't see when we would need it. > >> >>>> Similarly, I don't know whether the default xa and xb are good. I >>>> changed them for a few distributions, but only where I saw obvious >>>> improvements. >> >> I also have no clue what would be good values in general. The choices >> seems reasonable from a practical point of view... >> >>>>> Note: I removed the scale in your example, because internal _ppf works >>>>> on the standard distribution, loc=0, scale=1. loc and scale are added >>>>> generically in .ppf >> >> Thanks. I included also **kwds so that I can pass scale = 10 or >> something like this. Once all works as it should, I'll try to convert >> the code such that it fits nicely in distributions.py. 
> > with self instead of dist, it should already have the signature about > right, no **kwds I assume > >> >> The simultaneous updating of left and right in the previous algo is >> wrong. Suppose for instance that cdf(left) < cdf(right) < q. Then both >> left and right would `move to the left'. This is clearly wrong. The >> included code should be better. > > would move to the *right* ? > > I thought the original was a nice trick, we can shift both left and > right since we know it has to be in that direction, the cut of range > cannot contain the answer. > > Or do I miss the point? > >> >> With regard to the values of xb and xa. Can a `ordinary' user change >> these? If so, then the ppf finder should include some protection in my >> opinion. If not, the user will get an error that brentq has not the >> right limits, but this error might be somewhat unexpected. (What has >> brentq to do with finding the ppf?) Of course, looking at the code >> this is clear, but I expect most users will no do so. > > I don't think ?`ordinary' users should touch xa, xb, but they could. > Except for getting around the limitation in this ticket there is no > reason to change xa, xb, so we could make them private _xa, _xb > instead. > > >> >> The code contains two choices about how to handle xa and xb. Do you >> have any preference? > > I don't really like choice 1, because it removes the use of the > predefined xa, xb. On the other hand, with this extension, xa and xb > wouldn't be really necessary anymore. > > another possibility would be to try except brentq with xa, xb first > and get most cases, and switch to your version if needed. I'm not sure > xa, xb are defined well enough that it's worth to go this route, > though. > > Thanks for working on this, > > Josef > >> >> Thanks for your feedback. Very helpful. 
>> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev From warren.weckesser at enthought.com Wed Apr 25 08:20:19 2012 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Wed, 25 Apr 2012 07:20:19 -0500 Subject: [SciPy-Dev] Cython 0.16 Message-ID: Is there currently a constraint on the version of cython that can be used in scipy? If so, can we bump it up to the latest version (0.16)? I would like to take advantage of the fused types in a extension module that I'm working on. Warren -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Wed Apr 25 08:45:02 2012 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 25 Apr 2012 13:45:02 +0100 Subject: [SciPy-Dev] Cython 0.16 In-Reply-To: References: Message-ID: On Wed, Apr 25, 2012 at 13:20, Warren Weckesser wrote: > Is there currently a constraint on the version of cython that can be used in > scipy?? If so, can we bump it up to the latest version (0.16)?? I would like > to take advantage of the fused types in a extension module that I'm working > on. The current policy is to generate the C sources and check them in too so that downstream builders don't have a dependency on Cython, only people who are actually modifying those Cython sources. So it's just a matter of making sure that scipy developers are on the same page. I have no objection. I would probably make a comment at the top of the file to specify that it requires Cython 0.16. 
-- Robert Kern From thouis at gmail.com Wed Apr 25 11:20:49 2012 From: thouis at gmail.com (Thouis (Ray) Jones) Date: Wed, 25 Apr 2012 17:20:49 +0200 Subject: [SciPy-Dev] Cython 0.16 In-Reply-To: References: Message-ID: On Wed, Apr 25, 2012 at 14:45, Robert Kern wrote: > On Wed, Apr 25, 2012 at 13:20, Warren Weckesser > wrote: >> Is there currently a constraint on the version of cython that can be used in >> scipy?? If so, can we bump it up to the latest version (0.16)?? I would like >> to take advantage of the fused types in a extension module that I'm working >> on. > > The current policy is to generate the C sources and check them in too > so that downstream builders don't have a dependency on Cython, only > people who are actually modifying those Cython sources. So it's just a > matter of making sure that scipy developers are on the same page. I > have no objection. I would probably make a comment at the top of the > file to specify that it requires Cython 0.16. Is there a way to have Cython check the version at compile time (Cython -> C) from within a .pyx file? I looked through the documentation and didn't find anything. Ray Jones From robert.kern at gmail.com Wed Apr 25 11:29:55 2012 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 25 Apr 2012 16:29:55 +0100 Subject: [SciPy-Dev] Cython 0.16 In-Reply-To: References: Message-ID: On Wed, Apr 25, 2012 at 16:20, Thouis (Ray) Jones wrote: > On Wed, Apr 25, 2012 at 14:45, Robert Kern wrote: >> On Wed, Apr 25, 2012 at 13:20, Warren Weckesser >> wrote: >>> Is there currently a constraint on the version of cython that can be used in >>> scipy?? If so, can we bump it up to the latest version (0.16)?? I would like >>> to take advantage of the fused types in a extension module that I'm working >>> on. >> >> The current policy is to generate the C sources and check them in too >> so that downstream builders don't have a dependency on Cython, only >> people who are actually modifying those Cython sources. 
So it's just a >> matter of making sure that scipy developers are on the same page. I >> have no objection. I would probably make a comment at the top of the >> file to specify that it requires Cython 0.16. > > Is there a way to have Cython check the version at compile time > (Cython -> C) from within a .pyx file? ?I looked through the > documentation and didn't find anything. No. -- Robert Kern From pav at iki.fi Wed Apr 25 12:46:14 2012 From: pav at iki.fi (Pauli Virtanen) Date: Wed, 25 Apr 2012 18:46:14 +0200 Subject: [SciPy-Dev] Cython 0.16 In-Reply-To: References: Message-ID: 25.04.2012 14:20, Warren Weckesser kirjoitti: > Is there currently a constraint on the version of cython that can be > used in scipy? If so, can we bump it up to the latest version (0.16)? > I would like to take advantage of the fused types in a extension module > that I'm working on. +1 from me for moving to 0.16 (Yes, fused types https://github.com/pv/scipy-work/commits/enh/interpnd-fused-types ) From vanforeest at gmail.com Wed Apr 25 14:12:54 2012 From: vanforeest at gmail.com (nicky van foreest) Date: Wed, 25 Apr 2012 20:12:54 +0200 Subject: [SciPy-Dev] scipy.stats: some questions/points about distributions.py + reply on ticket 1493 Message-ID: Hi Josef, Sorry for not responding earlier... too many obligations. Before I get back to your earlier mail, I have some naive questions about distributions.py. I hope you don't mind that I fire them at you. 1: https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L436 Is this code "dead"? Within distributions.py it is not called. Nearly the same code is written here: https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L1180 2: I have a similar point about: https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L358 What is the use of this code? It is not called anywhere. 
Besides this, from our discussion about ticket 1493, this function returns the centralized moments, while the "real" moment E(X^n) should be returned. Hence, the code is also not correct, i.e., not in line with the documentation. 3: Suppose we would turn xa and xb into private attributes _xa and _xb, then I suppose that https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L883 requires updating. 4: I have a hard time understanding the working (and goal) of https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L883 Where is the right place to ask for some clarification? Or should I just think harder? 5: The definition of arr in https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L60 does not add much (although it saves some characters at some points of the code), but makes it harder to read the code for novices like me. (I spent some time searching for a numpy function called arr, only to find out later that it was just a shorthand only used in the distributions.py module). Would it be a problem to replace such code by the proper numpy function? 6: https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L538 contains a typo. It should be Weisstein. 7: https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L625 This code gives me an even harder time than _argsreduce. I have to admit that I simply don't know what this code is trying to prevent/check/repair. Would you mind giving a hint? Nicky From josef.pktd at gmail.com Wed Apr 25 15:04:49 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 25 Apr 2012 15:04:49 -0400 Subject: [SciPy-Dev] scipy.stats: some questions/points about distributions.py + reply on ticket 1493 In-Reply-To: References: Message-ID: On Wed, Apr 25, 2012 at 2:12 PM, nicky van foreest wrote: > Hi Josef, > > Sorry for not responding earlier... too many obligations.
> > Before I get back to your earlier mail, I have some naive questions > about distributions.py. I hope you don't mind that I fire them at you. > > 1: > > https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L436 I never looked at this. It's not used anywhere. > > Is this code "dead"? Within distributions.py it is not called. Nearly > the same code is written here: > > https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L1180 This is what is used for the generic ppf. > > > 2: > > I have a similar point about: > > https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L358 > > What is the use of this code? It is not called anywhere. Besides this, > from our discussion about ticket 1493, this function returns the > centralized moments, while the "real" moment E(X^n) should be > returned. Hence, the code is also not correct, i.e., not in line with > the documentation. I think this and skew, kurtosis are internal functions for fit_start, getting starting values for fit from the data, even if it's not used. in general: For the calculations it might sometimes be nicer to calculate central moments, and then convert them to non-central or the other way around. I have some helper functions for this in statsmodels and it is similarly used https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L1745 (That's new code that I'm not so familiar with.) > > 3: > > Suppose we would turn xa and xb into private attributes _xa and _xb, > then I suppose that > > https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L883 > > requires updating. Yes, but no big loss I think, given that it won't be needed anymore > > > 4: > > I have a hard time understanding the working (and goal) of > > https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L883 This: xb : float, optional Upper bound for fixed point calculation for generic ppf.
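[Editor's note] Josef's answer can be made concrete: the generic ppf essentially inverts the cdf numerically on the fixed bracket [xa, xb]. A minimal stand-alone sketch of that idea, using math.erf for the normal cdf and plain bisection where scipy's real generic ppf calls brentq; ppf_generic, its defaults, and the bracket test are illustrative, not scipy's actual code:

```python
import math

def norm_cdf(x):
    # standard normal cdf via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ppf_generic(cdf, q, xa=-10.0, xb=10.0):
    """Invert cdf on the fixed bracket [xa, xb] by bisection.

    Mirrors the role of xa/xb in the generic ppf: if the true quantile
    lies outside [xa, xb], the inversion cannot succeed.
    """
    if not (cdf(xa) <= q <= cdf(xb)):
        raise ValueError("q lies outside [cdf(xa), cdf(xb)]")
    for _ in range(200):
        mid = 0.5 * (xa + xb)
        if cdf(mid) < q:
            xa = mid
        else:
            xb = mid
    return 0.5 * (xa + xb)

print(ppf_generic(norm_cdf, 0.975))  # close to 1.95996...
```

For quantiles whose answer falls outside the fixed bracket (e.g. a heavy-tailed shape, or q numerically 0 or 1), the bracket test fails and the only remedy is moving xa/xb, which is exactly the limitation behind ticket 1493.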
> > Where is the right place to ask for some clarification? Or should I > just think harder? > > 5: > > The definition of arr in > > https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L60 > > does not add much (although it saves some characters at some points of > the code), but makes it harder to read the code for novices like me. > (I spent some time searching for a numpy function called arr, only to > find out later that it was just a shorthand only used in the > distributions.py module). Would it be a problem to replace such code by > the proper numpy function? But then these novices would just read some piece of code instead of going through all 7000 lines looking for imports and redefinitions. And I suffered the same way. :) I don't have any problem with cleaning this up. I never checked if in some cases with lots of generic loops the namespace lookup would significantly increase the runtime. > > 6: > > https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L538 > > contains a typo. It should be Weisstein. should be fixed then > > 7: > > https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L625 > > > This code gives me an even harder time than _argsreduce. I have to > admit that I simply don't know what this code is trying to > prevent/check/repair. Would you mind giving a hint? What's _argsreduce? https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L625 This has been rewritten by Per Brodtkorb. It is used in most methods to get the goodargs with which the distribution specific method is called. example ppf https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L1524 first we are building the conditions for valid, good arguments. boundaries are filled, invalid arguments get nans. What's left over are the goodargs, the values of the method arguments for which we need to calculate the actual results. So we need to broadcast and select those arguments.
-> argsreduce The distribution specific or generic ._ppf is then called with 1d arrays (of the same shape IIRC) of goodargs. then we can "place" the calculated values into the results arrays, next to the nans and boundaries. I hope that helps Thanks, Josef > > Nicky > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev From vanforeest at gmail.com Wed Apr 25 15:21:21 2012 From: vanforeest at gmail.com (nicky van foreest) Date: Wed, 25 Apr 2012 21:21:21 +0200 Subject: [SciPy-Dev] scipy.stats: algorithm for ticket 1493 In-Reply-To: References: Message-ID: >>>>> The difficult cases will be where cdf also doesn't exist and we need >>>>> to get it through integrate.quad, but I don't remember which >>>>> distribution is a good case. >> >> This case is harder indeed. (I assume you mean by 'not exist' that >> there is no closed form expression for the cdf, like the normal >> distribution). Computing the ppf would involve calling quad a lot of >> times. This is wasteful especially since the computation of cdf(b) >> includes the computation of cdf(a) for a < b, supposing that quad runs >> from -np.inf to b. We could repair this by computing cdf(b) = cdf(a) + >> quad(f, a, b), assuming that cdf(a) has been computed already. >> (perhaps I am not clear enough here. If so, let me know.) > > not exists = not defined as _cdf method; could also be scipy.special > if there are no closed form expressions I see, sure. >>>> I just think that we are not able to reach the q=0, q=1 boundaries, >>>> since for some distributions we will run into other numerical >>>> problems. And I'm curious how far we can get with this. >> >> I completely missed to include a test on the obvious cases q >= 1. - >> np.finfo(float).eps and q <= np.finfo(float).eps. It is now in the >> attached file.
> >>>> findppf(stats.expon, 1e-30) > -6.3593574850511882e-13 This result shows actually that xa and xb are necessary to include in the specification of the distribution. The exponential distribution is (usually) defined only on [0, \infty) not on the negative numbers. The result above is negative though. This is of course a simple consequence of calling brentq. From a user's perspective, though, I would become very suspicious about this negative result. > The right answer should be dist.b for q=numerically 1, lower support > point is dist.a but I don't see when we would need it. I agree, provided xa and xb are always properly defined. But then, (just to be nitpicking), the definition of expon does not set xa and xb explicitly. Hence xa = -10, and this is somewhat undesirable, given the negative value above. >> >> The simultaneous updating of left and right in the previous algo is >> wrong. Suppose for instance that cdf(left) < cdf(right) < q. Then both >> left and right would `move to the left'. This is clearly wrong. The >> included code should be better. > > would move to the *right* ? Sure. > > I thought the original was a nice trick, we can shift both left and > right since we know it has to be in that direction, the cut-off range > cannot contain the answer. > > Or do I miss the point? No, you are right. When I wrote this at first, I also thought about the point you bring up here. Then, I was somewhat dissatisfied with calling the while loop twice (suppose the left bound requires updating, then certainly the second while loop (to update the right bound) is unnecessary, and calling cdf(right) is useless). While trying to fix this, I forgot about my initial ideas... > >> >> With regard to the values of xb and xa. Can an `ordinary' user change >> these? If so, then the ppf finder should include some protection in my >> opinion. If not, the user will get an error that brentq does not have the >> right limits, but this error might be somewhat unexpected.
(What has >> brentq to do with finding the ppf?) Of course, looking at the code >> this is clear, but I expect most users will not do so. > > I don't think `ordinary' users should touch xa, xb, but they could. > Except for getting around the limitation in this ticket there is no > reason to change xa, xb, so we could make them private _xa, _xb > instead. I think that would be better. Thus, the developer that subclasses rv_continuous should set _xa and _xb properly. >> The code contains two choices about how to handle xa and xb. Do you >> have any preference? > > I don't really like choice 1, because it removes the use of the > predefined xa, xb. On the other hand, with this extension, xa and xb > wouldn't be really necessary anymore. In view of your example with findppf(expon(1e-30)) I prefer to use _xa and _xb. > > another possibility would be to try except brentq with xa, xb first > and get most cases, and switch to your version if needed. I'm not sure > xa, xb are defined well enough that it's worth to go this route, > though. I think that this makes the most sense. The definition of the class should include sensible values of xa and xb. All in all, I would like to make the following proposal to resolve the ticket in a generic way. 1) xa and xb should become private class members _xa and _xb 2) _xa and _xb should be given proper values in the class definition, e.g. expon._xa = 0 and expon._xb = 30., since exp(-30) = 9.35e-14. 3) given a quantile q in the ppf function, include a test on _cdf(_xa) <= q <= _cdf(_xb). If this fails, raise an exception with some text that tells that either _cdf(_xa) > q or _cdf(_xb) < q. Given your comments I actually favor all this searching for left and right not that much anymore. It is generic, but it places the responsibility of good code at the wrong place.
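[Editor's note] The expanding-bracket search debated in this thread can be sketched stand-alone. Here the unit-exponential cdf stands in for dist.cdf and plain bisection stands in for brentq; findppf, the bracket defaults, and the growth factor 10 are illustrative choices, not the code from the attached file:

```python
import math

def expon_cdf(x):
    # cdf of the unit exponential: 0 for x <= 0, 1 - exp(-x) otherwise
    return 1.0 - math.exp(-x) if x > 0.0 else 0.0

def findppf(cdf, q, left=-10.0, right=10.0):
    # grow the bracket in the direction the quantile must lie;
    # the cut-off part of the range cannot contain the answer
    while cdf(left) > q:
        left, right = left - 10.0 * (right - left), left
    while cdf(right) < q:
        left, right = right, right + 10.0 * (right - left)
    # root-find cdf(x) = q on the bracket (brentq in the real code)
    for _ in range(200):
        mid = 0.5 * (left + right)
        if cdf(mid) < q:
            left = mid
        else:
            right = mid
    return 0.5 * (left + right)

print(findppf(expon_cdf, 0.5))  # ~= ln 2 = 0.6931...
```

For q = 1e-30 this toy converges to (numerically) zero; a brentq-based version working on a smooth _cdf can land slightly below the support, which is the negative findppf(stats.expon, 1e-30) result quoted in this thread and Nicky's argument for per-distribution _xa/_xb.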
Nicky From josef.pktd at gmail.com Wed Apr 25 15:49:12 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 25 Apr 2012 15:49:12 -0400 Subject: [SciPy-Dev] scipy.stats: algorithm for ticket 1493 In-Reply-To: References: Message-ID: On Wed, Apr 25, 2012 at 3:21 PM, nicky van foreest wrote: >>>>>> The difficult cases will be where cdf also doesn't exist and we need >>>>>> to get it through integrate.quad, but I don't remember which >>>>>> distribution is a good case. >>> >>> This case is harder indeed. (I assume you mean by 'not exist' that >>> there is no closed form expression for the cdf, like the normal >>> distribution). Computing the ppf would involve calling quad a lot of >>> times. This is wasteful especially since the computation of cdf(b) >>> includes the computation of cdf(a) for a < b, supposing that quad runs >>> from -np.inf to b. We could repair this by computing cdf(b) = cdf(a) + >>> quad(f, a, b), assuming that cdf(a) has been computed already. >>> (perhaps I am not clear enough here. If so, let me know.) >> >> not exists = not defined as _cdf method; could also be scipy.special >> if there are no closed form expressions > > I see, sure. > >>>>> I just think that we are not able to reach the q=0, q=1 boundaries, >>>>> since for some distributions we will run into other numerical >>>>> problems. And I'm curious how far we can get with this. >>> >>> I completely missed to include a test on the obvious cases q >= 1. - >>> np.finfo(float).eps and q <= np.finfo(float).eps. It is now in the >>> attached file. > >>>> findppf(stats.expon, 1e-30) > -6.3593574850511882e-13 This result shows actually that xa and xb are necessary to include in the specification of the distribution. The exponential distribution is (usually) defined only on [0, \infty) not on the negative numbers. The result above is negative though. This is of course a simple > consequence of calling brentq.
From a user's perspective, though, I > would become very suspicious about this negative result. good argument to clean up xa, xb > >> The right answer should be dist.b for q=numerically 1, lower support >> point is dist.a but I don't see when we would need it. > > I agree, provided xa and xb are always properly defined. But then, > (just to be nitpicking), the definition of expon does not set xa and > xb explicitly. Hence xa = -10, and this is somewhat undesirable, given > the negative value above. > >>> >>> The simultaneous updating of left and right in the previous algo is >>> wrong. Suppose for instance that cdf(left) < cdf(right) < q. Then both >>> left and right would `move to the left'. This is clearly wrong. The >>> included code should be better. >> >> would move to the *right* ? > > Sure. > >> >> I thought the original was a nice trick, we can shift both left and >> right since we know it has to be in that direction, the cut-off range >> cannot contain the answer. >> >> Or do I miss the point? > > No, you are right. When I wrote this at first, I also thought about > the point you bring up here. Then, I was somewhat dissatisfied with > calling the while loop twice (suppose the left bound requires > updating, then certainly the second while loop (to update the right > bound) is unnecessary, and calling cdf(right) is useless). While > trying to fix this, I forgot about my initial ideas... > >> >>> >>> With regard to the values of xb and xa. Can an `ordinary' user change >>> these? If so, then the ppf finder should include some protection in my >>> opinion. If not, the user will get an error that brentq does not have the >>> right limits, but this error might be somewhat unexpected. (What has >>> brentq to do with finding the ppf?) Of course, looking at the code >>> this is clear, but I expect most users will not do so. >> >> I don't think `ordinary' users should touch xa, xb, but they could.
>> Except for getting around the limitation in this ticket there is no >> reason to change xa, xb, so we could make them private _xa, _xb >> instead. > > I think that would be better. Thus, the developer that subclasses > rv_continuous should set _xa and _xb properly. > >>> The code contains two choices about how to handle xa and xb. Do you >>> have any preference? >> >> I don't really like choice 1, because it removes the use of the >> predefined xa, xb. On the other hand, with this extension, xa and xb >> wouldn't be really necessary anymore. > > In view of your example with findppf(expon(1e-30)) I prefer to use _xa and _xb. > >> >> another possibility would be to try except brentq with xa, xb first >> and get most cases, and switch to your version if needed. I'm not sure >> xa, xb are defined well enough that it's worth to go this route, >> though. > > I think that this makes the most sense. The definition of the class > should include sensible values of xa and xb. > > All in all, I would like to make the following proposal to resolve the > ticket in a generic way. > > 1) xa and xb should become private class members _xa and _xb > 2) _xa and _xb should be given proper values in the class definition, > e.g. expon._xa = 0 and expon._xb = 30., since exp(-30) = 9.35e-14. > 3) given a quantile q in the ppf function, include a test on _cdf(_xa) > <= q <= _cdf(_xb). If this fails, raise an exception with some text > that tells that either _cdf(_xa) > q or _cdf(_xb) < q. > > Given your comments I actually favor all this searching for left and > right not that much anymore. It is generic, but it places the > responsibility of good code at the wrong place. 3) I prefer your expanding search over raising an exception to the user. Note also that your 3) is inconsistent with 1). If a user-visible exception is raised, then the user needs to change xa or xb, so it shouldn't be private. That's the current situation (except for a more cryptic message).
2) I'm all in favor, especially for one-sided bound distributions, where it should be easy to go through those. There might be a few where the bound moves with the shape, but the only one I remember is genextreme and that has an explicit _ppf. So I would prefer 1), 2) and your new enhanced generic _ppf. Josef > > Nicky > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev From vanforeest at gmail.com Wed Apr 25 16:03:17 2012 From: vanforeest at gmail.com (nicky van foreest) Date: Wed, 25 Apr 2012 22:03:17 +0200 Subject: [SciPy-Dev] scipy.stats: some questions/points about distributions.py + reply on ticket 1493 In-Reply-To: References: Message-ID: >> 1: >> >> https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L436 > > I never looked at this. It's not used anywhere. > >> >> Is this code "dead"? Within distributions.py it is not called. Nearly >> the same code is written here: >> >> https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L1180 > > This is what is used for the generic ppf. Yes, sure. Sorry for confusing you. L1180 makes good sense. But since L1180 is there, there appears to be no good reason to include the code at L436. >> 2: >> >> I have a similar point about: >> >> https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L358 >> >> What is the use of this code? It is not called anywhere. Besides this, >> from our discussion about ticket 1493, this function returns the >> centralized moments, while the "real" moment E(X^n) should be >> returned. Hence, the code is also not correct, i.e., not in line with >> the documentation. > > I think this and skew, kurtosis are internal functions for fit_start, > getting starting values for fit from the data, even if it's not used.
> in general: For the calculations it might sometimes be nicer to > calculate central moments, and then convert them to non-central or the > other way around. I have some helper functions for this in statsmodels > and it is similarly used > > https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L1745 > > (That's new code that I'm not so familiar with.) I actually saw this code, and have my doubts about whether this is the best way to compute the non-central moments. Suppose that the computation of the central moment involves quad(). Then indeed the computations at these lines don't require a new call to quad(). However, there is a (slow) python for loop involved, the power function ** is called multiple times, and { n \choose k} is computed. (BTW, can I safely assume you use LaTeX?). Calling quad() on x**k to compute E(X^k) might be just as fast, although I did not test this hunch. Anyway, quad(lambda x: x**k * _pdf(x)) reads much easier. > >> >> 3: >> >> Suppose we would turn xa and xb into private attributes _xa and _xb, >> then I suppose that >> >> https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L883 >> >> requires updating. > > Yes, but no big loss I think, given that it won't be needed anymore Oops. Your other mail convinced me to use _xa and _xb.... See my other mail. >> 5: >> >> The definition of arr in >> >> https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L60 >> >> does not add much (although it saves some characters at some points of >> the code), but makes it harder to read the code for novices like me. >> (I spent some time searching for a numpy function called arr, only to >> find out later that it was just a shorthand only used in the >> distributions.py module). Would it be a problem to replace such code by >> the proper numpy function? > > But then these novices would just read some piece of code instead of > going through all 7000 lines looking for imports and redefinitions.
> And I suffered the same way. :) I suppose you did :-) > > I don't have any problem with cleaning this up. I never checked if in > some cases with lots of generic loops the namespace lookup would > significantly increase the runtime. Should it? I am not an expert on this, but I read in Langtangen's book that importing functions like so: from numpy import array, and so on, does not add much to the calling time of functions. However, if I am mistaken, please forget this point. > >> >> 6: >> >> https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L538 >> >> contains a typo. It should be Weisstein. > > should be fixed then Should this become a ticket, or is it too minor? > >> >> 7: >> >> https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L625 >> >> >> This code gives me an even harder time than _argsreduce. I have to >> admit that I simply don't know what this code is trying to >> prevent/check/repair. Would you mind giving a hint? > > What's _argsreduce? Sorry, I meant __argcheck(). The code at https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L1195 is not very simple to understand, at least not for me. > > https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L625 > > This has been rewritten by Per Brodtkorb. It is used in most methods to get the goodargs with which the > distribution specific method is called. > > example ppf https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L1524 > > first we are building the conditions for valid, good arguments. > boundaries are filled, invalid arguments get nans. > What's left over are the goodargs, the values of the method arguments > for which we need to calculate the actual results. > So we need to broadcast and select those arguments. -> argsreduce > The distribution specific or generic ._ppf is then called with 1d > arrays (of the same shape IIRC) of goodargs.
> > then we can "place" the calculated values into the results arrays, > next to the nans and boundaries. > > I hope that helps I'll try to understand it again. Thanks for your hints. > > Thanks, > > Josef > >> >> Nicky >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev From josef.pktd at gmail.com Wed Apr 25 23:15:33 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 25 Apr 2012 23:15:33 -0400 Subject: [SciPy-Dev] scipy.stats: algorithm for ticket 1493 In-Reply-To: References: Message-ID: On Wed, Apr 25, 2012 at 3:49 PM, wrote: > On Wed, Apr 25, 2012 at 3:21 PM, nicky van foreest wrote: >>>>>>> The difficult cases will be where cdf also doesn't exist and we need >>>>>>> to get it through integrate.quad, but I don't remember which >>>>>>> distribution is a good case. >>>> >>>> This case is harder indeed. (I assume you mean by 'not exist' that >>>> there is no closed form expression for the cdf, like the normal >>>> distribution). Computing the ppf would involve calling quad a lot of >>>> times. This is wasteful especially since the computation of cdf(b) >>>> includes the computation of cdf(a) for a < b, supposing that quad runs >>>> from -np.inf to b. We could repair this by computing cdf(b) = cdf(a) + >>>> quad(f, a, b), assuming that cdf(a) has been computed already. >>>> (perhaps I am not clear enough here. If so, let me know.) >>> >>> not exists = not defined as _cdf method; could also be scipy.special >>> if there are no closed form expressions >> >> I see, sure. >> >>>>>> I just think that we are not able to reach the q=0, q=1 boundaries, >>>>>> since for some distributions we will run into other numerical >>>>>> problems. And I'm curious how far we can get with this.
>>>> >>>> I completely missed to include a test on the obvious cases q >= 1. - >>>> np.finfo(float).eps and q <= np.finfo(float).eps. It is now in the >>>> attached file. >>> >>>>>> findppf(stats.expon, 1e-30) >>> -6.3593574850511882e-13 >> >> This result shows actually that xa and xb are necessary to include in >> the specification of the distribution. The exponential distribution is >> (usually) defined only on [0, \infty) not on the negative numbers. The >> result above is negative though. This is of course a simple >> consequence of calling brentq. From a user's perspective, though, I >> would become very suspicious about this negative result. > > good argument to clean up xa, xb > >> >>> The right answer should be dist.b for q=numerically 1, lower support >>> point is dist.a but I don't see when we would need it. >> >> I agree, provided xa and xb are always properly defined. But then, >> (just to be nitpicking), the definition of expon does not set xa and >> xb explicitly. Hence xa = -10, and this is somewhat undesirable, given >> the negative value above. >> >>>> >>>> The simultaneous updating of left and right in the previous algo is >>>> wrong. Suppose for instance that cdf(left) < cdf(right) < q. Then both >>>> left and right would `move to the left'. This is clearly wrong. The >>>> included code should be better. >>> >>> would move to the *right* ? >> >> Sure. >> >>> >>> I thought the original was a nice trick, we can shift both left and >>> right since we know it has to be in that direction, the cut of range >>> cannot contain the answer. >>> >>> Or do I miss the point? >> >> No, you are right. When I wrote this at first, I also thought about >> the point you bring up here. Then, I was somewhat dissatisfied with >> calling the while loop twice (suppose the left bound requires >> updating, then certainly the second while loop (to update the right >> bound) is unnecessary, and calling cdf(right) is useless). 
While >> trying to fix this, I forgot about my initial ideas... >> >>> >>>> >>>> With regard to the values of xb and xa. Can an `ordinary' user change >>>> these? If so, then the ppf finder should include some protection in my >>>> opinion. If not, the user will get an error that brentq does not have the >>>> right limits, but this error might be somewhat unexpected. (What has >>>> brentq to do with finding the ppf?) Of course, looking at the code >>>> this is clear, but I expect most users will not do so. >>> >>> I don't think `ordinary' users should touch xa, xb, but they could.
If this fails, return an exception with some text >> that tells that either _cdf(_xa) > q or _cdf(_xb) < q. >> >> Given your comments I actually favor all this searching for left and >> right not that much anymore. It is generic, but it places the >> responsibility of good code at the wrong place. > > 3) I prefer your expanding the search to raising an exception to the > user. Note also that your 3) is inconsistent with 1). If a user > visible exception is raised, then the user needs to change xa or xb, > so it shouldn't be private. That's the current situation (except for a > more cryptic message). > > 2) I'm all in favor, especially for one-side bound distributions, > where it should be easy to go through those. There might be a few > where the bound moves with the shape, but the only one I remember is > genextreme and that has an explicit _ppf > > So I would prefer 1), 2) and your new enhanced generic _ppf forgot to mention the main reason that I like your expanding search space is that the shape of the distribution can change a lot. 
Even if we set xa, xb to reasonable values for likely shape parameters they won't be good enough for others, as in the original ticket >>> stats.invgauss.stats(2) (array(2.0), array(8.0)) >>> stats.invgauss.stats(7) (array(7.0), array(343.0)) >>> stats.invgauss.stats(20) (array(20.0), array(8000.0)) >>> stats.invgauss.stats(100) (array(100.0), array(1000000.0)) >>> stats.invgauss.cdf(1000, 100) 0.98335562794321207 >>> findppf(stats.invgauss, 0.99, 100) 1926.520850319389 >>> findppf(stats.invgauss, 0.999, 100) 13928.012903371644 >>> findppf(stats.invgauss, 0.999, 1) 8.3548649291400938 --------- to get a rough idea: for xa, xb and a finite bound either left or right, all have generic xa=-10 or xb=10 >>> dist_cont = [getattr(stats.distributions, dname) for dname in dir(stats.distributions) if isinstance(getattr(stats.distributions, dname), stats.distributions.rv_continuous)] >>> left = [(d.name, d.a, d.xa) for d in dist_cont if not np.isneginf(d.a)] >>> pprint(left) [('alpha', 0.0, -10.0), ('anglit', -0.78539816339744828, -10.0), ('arcsine', 0.0, -10.0), ('beta', 0.0, -10.0), ('betaprime', 0.0, -10.0), ('bradford', 0.0, -10.0), ('burr', 0.0, -10.0), ('chi', 0.0, -10.0), ('chi2', 0.0, -10.0), ('cosine', -3.1415926535897931, -10.0), ('erlang', 0.0, -10.0), ('expon', 0.0, 0), ('exponpow', 0.0, -10.0), ('exponweib', 0.0, -10.0), ('f', 0.0, -10.0), ('fatiguelife', 0.0, -10.0), ('fisk', 0.0, -10.0), ('foldcauchy', 0.0, -10.0), ('foldnorm', 0.0, -10.0), ('frechet_r', 0.0, -10.0), ('gamma', 0.0, -10.0), ('gausshyper', 0.0, -10.0), ('genexpon', 0.0, -10.0), ('gengamma', 0.0, -10.0), ('genhalflogistic', 0.0, -10.0), ('genpareto', 0.0, -10.0), ('gilbrat', 0.0, -10.0), ('gompertz', 0.0, -10.0), ('halfcauchy', 0.0, -10.0), ('halflogistic', 0.0, -10.0), ('halfnorm', 0.0, -10.0), ('invgamma', 0.0, -10.0), ('invgauss', 0.0, -10.0), ('invnorm', 0.0, -10.0), ('invweibull', 0, -10.0), ('johnsonb', 0.0, -10.0), ('ksone', 0.0, -10.0), ('kstwobign', 0.0, -10.0), ('levy', 0.0, -10.0), 
('loglaplace', 0.0, -10.0), ('lognorm', 0.0, -10.0), ('lomax', 0.0, -10.0), ('maxwell', 0.0, -10.0), ('mielke', 0.0, -10.0), ('nakagami', 0.0, -10.0), ('ncf', 0.0, -10.0), ('ncx2', 0.0, -10.0), ('pareto', 1.0, -10.0), ('powerlaw', 0.0, -10.0), ('powerlognorm', 0.0, -10.0), ('rayleigh', 0.0, -10.0), ('rdist', -1.0, -10.0), ('recipinvgauss', 0.0, -10.0), ('rice', 0.0, -10.0), ('semicircular', -1.0, -10.0), ('triang', 0.0, -10.0), ('truncexpon', 0.0, -10.0), ('uniform', 0.0, -10.0), ('wald', 0.0, -10.0), ('weibull_min', 0.0, -10.0), ('wrapcauchy', 0.0, -10.0)] >>> right = [(d.name, d.b, d.xb) for d in dist_cont if not np.isposinf(d.b)] >>> pprint(right) [('anglit', 0.78539816339744828, 10.0), ('arcsine', 1.0, 10.0), ('beta', 1.0, 10.0), ('betaprime', 500.0, 10.0), ('bradford', 1.0, 10.0), ('cosine', 3.1415926535897931, 10.0), ('frechet_l', 0.0, 10.0), ('gausshyper', 1.0, 10.0), ('johnsonb', 1.0, 10.0), ('levy_l', 0.0, 10.0), ('powerlaw', 1.0, 10.0), ('rdist', 1.0, 10.0), ('semicircular', 1.0, 10.0), ('triang', 1.0, 10.0), ('uniform', 1.0, 10.0), ('weibull_max', 0.0, 10.0), ('wrapcauchy', 6.2831853071795862, 10.0)] only pareto has both limits on the same side of zero >>> pprint ([(d.name, d.a, d.b) for d in dist_cont if d.a*d.b>0]) [('pareto', 1.0, inf)] genextreme, and maybe one or two others, are missing because finite a, b are set in _argcheck vonmises is for circular and doesn't behave properly only two distributions define non-generic xa or xb >>> pprint ([(d.name, d.a, d.b, d.xa, d.xb) for d in dist_cont if not d.xa*d.xb==-100]) [('foldcauchy', 0.0, inf, -10.0, 1000), ('recipinvgauss', 0.0, inf, -10.0, 50)] a pull request setting correct xa, xb would be very welcome Josef > > Josef > >> >> Nicky >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev From josef.pktd at gmail.com Wed Apr 25 23:54:58 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: 
Wed, 25 Apr 2012 23:54:58 -0400 Subject: [SciPy-Dev] scipy.stats: some questions/points about distributions.py + reply on ticket 1493 In-Reply-To: References: Message-ID: On Wed, Apr 25, 2012 at 4:03 PM, nicky van foreest wrote: >>> 1: >>> >>> https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L436 >> >> I never looked at this. It's not used anywhere. >> >>> >>> Is this code "dead"? Within distributions.py it is not called. Nearly >>> the same code is written here: >>> >>> https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L1180 >> >> This is what is used for the generic ppf. > > Yes, sure. Sorry for confusing you. L1180 makes good sense. But since > L1180 is there, there appears to be no good reason to include the code > at L436. I guess not, but because I never looked at it carefully, I don't know if it might be useful for anything. > > >>> 2: >>> >>> I have a similar point about: >>> >>> https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L358 >>> >>> What is the use of this code? It is not called anywhere. Besides this, >>> from our discussion about ticket 1493, this function returns the >>> central moments, while the "real" moment E(X^n) should be >>> returned. Hence, the code is also not correct, i.e., not in line with >>> the documentation. >> >> I think this and skew, kurtosis are internal functions for fit_start, >> getting starting values for fit from the data, even if it's not used. >> In general: for the calculations it might sometimes be nicer to >> calculate central moments and then convert them to non-central, or the >> other way around. I have some helper functions for this in statsmodels, >> and it is similarly used in >> >> https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L1745 >> >> (That's new code that I'm not so familiar with.) > > I actually saw this code, and have my doubts about whether this is the best way to compute the non-central moments. 
Suppose that the > computation of the central moment involves quad(). Then indeed the > computations at these lines don't require a new call to quad(). > However, there is a (slow) Python for loop involved, the power > function ** is called multiple times, and { n \choose k} is computed. > (BTW, can I safely assume you use LaTeX?) Calling quad() on x**k to > compute E(X^k) might be just as fast, although I did not test this > hunch. Anyway, quad(lambda x: x**k * _pdf(x)) reads much more easily. L1745 are necessary now because moments are still non-central, but allow for non-default loc and scale. _munp still does the raw quad(lambda x: x**k * _pdf(x)) or similar if it is not specifically defined, i.e. can be calculated from explicit _stats. As an aside: whenever I try to go through the generic _stats and moments, I have a hard time following what is going on in all the different cases. I don't see where we could save much. I didn't go through the math. > >> >>> >>> 3: >>> >>> Suppose we would turn xa and xb into private attributes _xa and _xb, >>> then I suppose that >>> >>> https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L883 >>> >>> requires updating. >> >> Yes, but no big loss I think, given that it won't be needed anymore > > Oops. Your other mail convinced me to use _xa and _xb.... See my other mail. Not needed as a public method; see the other thread, which would also need adjustments to the docstring. > >>> 5: >>> >>> The definition of arr in >>> >>> https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L60 >>> >>> does not add much (although it saves some characters at some points of >>> the code), but makes it harder to read the code for novices like me. >>> (I spent some time searching for a numpy function called arr, only to >>> find out later that it was just a shorthand used only in the >>> distributions.py module.) Would it be a problem to replace such code by >>> the proper numpy function? 
>> But then these novices would just read some piece of code instead of >> going through all 7000 lines looking for imports and redefinitions. >> And I suffered the same way. :) > > I suppose you did :-) > >> >> I don't have any problem with cleaning this up. I never checked whether in >> some cases with lots of generic loops the namespace lookup would >> significantly increase the runtime. > > Should it? I am not an expert on this, but I read in Langtangen's book > that importing functions like so: from numpy import array, and so on, > does not add much to the calling time of functions. However, if I am > mistaken, please forget this point. from numpy import asarray or from numpy import asarray as arr doesn't make any difference, but I usually like full namespaces, np.asarray > >> >>> >>> 6: >>> >>> https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L538 >>> >>> contains a typo. It should be Weisstein. >> >> should be fixed then > > Should this become a ticket, or is it too minor? too minor for a ticket, but sneaking it into a pull request, or making a separate pull request (to increase your Karma), would be useful > >> >>> >>> 7: >>> >>> https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L625 >>> >>> >>> This code gives me an even harder time than _argsreduce. I have to >>> admit that I simply don't know what this code is trying to >>> prevent/check/repair. Would you mind giving a hint? >> >> what's _argsreduce? > > Sorry, I meant __argcheck(). The code at > > https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L1195 > > is not very simple to understand, at least not for me. AFAICS, it's just a joint condition that all args are strictly positive (the default condition): args[0] > 0 & args[1] > 0 & .... My guess is that the arr, asarray, is not necessary and should be handled already in the main methods. 
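The default condition described here (all shape parameters strictly positive, combined elementwise) can be spelled out in a few lines. generic_argcheck is a hypothetical name used only for illustration; the actual method is rv_continuous._argcheck:

```python
import numpy as np

def generic_argcheck(*args):
    # Joint condition described above:
    # args[0] > 0 & args[1] > 0 & ..., broadcast elementwise.
    cond = 1
    for arg in args:
        cond = np.logical_and(cond, np.asarray(arg) > 0)
    return cond
```

Scalar arguments give a single boolean, array arguments an elementwise mask, which the main methods can then combine with the support conditions before argsreduce selects the goodargs.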
Josef > >> >> https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L625 >> >> This has been rewritten by Per Brodtkorb. >> It is used in most methods to get the goodargs with which the >> distribution specific method is called. >> >> example ppf https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L1524 >> >> first we are building the conditions for valid, good arguments. >> boundaries are filled, invalid arguments get nans. >> What's left over are the goodargs, the values of the method arguments >> for which we need to calculate the actual results. >> So we need to broadcast and select those arguments. -> argsreduce >> The distribution specific or generic ._ppf is then called with 1d >> arrays (of the same shape IIRC) of goodargs. >> >> then we can "place" the calculated values into the results arrays, >> next to the nans and boundaries. >> >> I hope that helps > > I'll try to understand it again. > > Thanks for your hints. > >> >> Thanks, >> >> Josef >> >>> >>> Nicky >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-dev >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev From sturla at molden.no Thu Apr 26 08:08:16 2012 From: sturla at molden.no (Sturla Molden) Date: Thu, 26 Apr 2012 14:08:16 +0200 Subject: [SciPy-Dev] scipy.sparse and OpenMP In-Reply-To: References: Message-ID: <4F993AB0.1060506@molden.no> With respect to OpenMP, it is worth noting that with MinGW (gcc on Windows) it requires an LGPL pthreads library. So it would mean we have to build SciPy with MSVC on Windows to avoid the LGPL licence taint. 
Sturla On 05.03.2012 11:15, Maximilian Nickel wrote: > Hi everyone, > I've been working with fairly large sparse matrices on a > multiprocessor system lately and noticed that scipy.sparse is > single-threaded. Since I needed faster computations, I've quickly > added some OpenMP #pragma directives in scipy/sparse/sparsetools to > the functions that I've been using in order to enable multithreading, > which worked out nicely. I wondered if you would be interested in a > more complete OpenMP-enabled version of scipy.sparse.setuptools. I've > attached the patch with the quick-and-dirty changes that I made so far > to this mail, to give you an idea. > > Best regards > Max > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev From njs at pobox.com Thu Apr 26 08:18:59 2012 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 26 Apr 2012 13:18:59 +0100 Subject: [SciPy-Dev] scipy.sparse and OpenMP In-Reply-To: <4F993AB0.1060506@molden.no> References: <4F993AB0.1060506@molden.no> Message-ID: On Thu, Apr 26, 2012 at 1:08 PM, Sturla Molden wrote: > With respect to OpenMP, it is worth noting that with MinGW (gcc on > Windows) it requires an LGPL pthreads library. So it would mean we have > to build SciPy with MSVC on Windows to avoid the LGPL licence taint. Is there an LGPL license taint? So long as pthreads is dynamically linked this shouldn't place any requirements on scipy users or distributors. (And if there is a taint, wouldn't we have the same problem on Linux? pthreads there is also LGPL, along with the rest of libc.) 
- N From warren.weckesser at enthought.com Thu Apr 26 12:20:45 2012 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Thu, 26 Apr 2012 11:20:45 -0500 Subject: [SciPy-Dev] SciPy 2012 - The Eleventh Annual Conference on Scientific Computing with Python In-Reply-To: References: Message-ID: Dear all, (Sorry if you receive this announcement multiple times.) Registration for SciPy 2012, the eleventh annual Conference on Scientific Computing with Python, is open! Go to https://conference.scipy.org/scipy2012/register/index.php We would like to remind you that the submissions for talks, posters and tutorials are open *until April 30th*, which is just around the corner. For more information see: http://conference.scipy.org/scipy2012/tutorials.php http://conference.scipy.org/scipy2012/talks/index.php For talks or posters, all we need is an abstract. Tutorials require more significant preparation. If you are preparing a tutorial, please send a brief note to Jonathan Rocher (jrocher at enthought.com) to indicate your intent. We look forward to seeing many of you this summer. Kind regards, The SciPy 2012 organizers scipy2012 at scipy.org On Wed, Apr 4, 2012 at 4:30 PM, Warren Weckesser < warren.weckesser at enthought.com> wrote: > SciPy 2012, the eleventh annual Conference on Scientific Computing with > Python, will be held July 16-21, 2012, in Austin, Texas. > > At this conference, novel scientific applications and libraries related to > data acquisition, analysis, dissemination and visualization using Python > are presented. Attended by leading figures from both academia and industry, > it is an excellent opportunity to experience the cutting edge of scientific > software development. > > The conference is preceded by two days of tutorials, during which > community experts provide training on several scientific Python packages. > Following the main conference will be two days of coding sprints. > > We invite you to give a talk or present a poster at SciPy 2012. 
> > The list of topics that are appropriate for the conference includes (but > is not limited to): > > - new Python libraries for science and engineering; > - applications of Python in solving scientific or computational > problems; > - high performance, parallel and GPU computing with Python; > - use of Python in science education. > > > > Specialized Tracks > > Two specialized tracks run in parallel to the main conference: > > - High Performance Computing with Python > Whether your algorithm is distributed, threaded, memory intensive or > latency bound, Python is making headway into the problem. We are looking > for performance-driven designs and applications in Python. Candidates > include the use of Python within a parallel application, new architectures, > and ways of making traditional applications execute more efficiently. > > > - Visualization > They say a picture is worth a thousand words--we're interested in > both! Python provides numerous visualization tools that allow scientists > to show off their work, and we want to know about any new tools and > techniques out there. Come show off your latest graphics, whether it's an > old library with a slick new feature, a new library out to challenge the > status quo, or simply a beautiful result. > > > > Domain-specific Mini-symposia > > Mini-symposia on the following topics are also being organized: > > - Computational bioinformatics > - Meteorology and climatology > - Astronomy and astrophysics > - Geophysics > > > > Talks, papers and posters > > We invite you to take part by submitting a talk or poster abstract. > Instructions are on the conference website: > > > http://conference.scipy.org/scipy2012/talks.php > > Selected talks are included as papers in the peer-reviewed conference > proceedings, to be published online. > > > Tutorials > > Tutorials will be given July 16-17. We invite instructors to submit > proposals for half-day tutorials on topics relevant to scientific computing > with Python. 
See > > http://conference.scipy.org/scipy2012/tutorials.php > > for information about submitting a tutorial proposal. To encourage > tutorials of the highest quality, the instructor (or team of instructors) > is given a $1,000 stipend for each half day tutorial. > > > Student/Community Scholarships > > We anticipate providing funding for students and for active members of the > SciPy community who otherwise might not be able to attend the conference. > See > > http://conference.scipy.org/scipy2012/student.php > > for scholarship application guidelines. > > > Be a Sponsor > > The SciPy conference could not run without the generous support of the > institutions and corporations who share our enthusiasm for Python as a tool > for science. Please consider sponsoring SciPy 2012. For more information, > see > > http://conference.scipy.org/scipy2012/sponsor/index.php > > > Important dates: > > Monday, April 30: Talk abstracts and tutorial proposals due. > Monday, May 7: Accepted tutorials announced. > Monday, May 13: Accepted talks announced. > > Monday, June 18: Early registration ends. (Price increases after this > date.) > Sunday, July 8: Online registration ends. > > Monday-Tuesday, July 16 - 17: Tutorials > Wednesday-Thursday, July 18 - July 19: Conference > Friday-Saturday, July 20 - July 21: Sprints > > We look forward to seeing you all in Austin this year! > > The SciPy 2012 Team > http://conference.scipy.org/scipy2012/organizers.php > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Thu Apr 26 13:05:18 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 26 Apr 2012 19:05:18 +0200 Subject: [SciPy-Dev] Cython 0.16 In-Reply-To: References: Message-ID: On Wed, Apr 25, 2012 at 6:46 PM, Pauli Virtanen wrote: > 25.04.2012 14:20, Warren Weckesser wrote: > > Is there currently a constraint on the version of cython that can be > > used in scipy? 
If so, can we bump it up to the latest version (0.16)? > > I would like to take advantage of the fused types in an extension module > > that I'm working on. > > +1 from me for moving to 0.16 > > +1 Using the latest released version is always OK I think. Regenerating all C files with the latest Cython before a release is probably also a good idea. Ralf > (Yes, fused types > https://github.com/pv/scipy-work/commits/enh/interpnd-fused-types ) > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vanforeest at gmail.com Thu Apr 26 17:19:35 2012 From: vanforeest at gmail.com (nicky van foreest) Date: Thu, 26 Apr 2012 23:19:35 +0200 Subject: [SciPy-Dev] scipy.stats: some questions/points about distributions.py + reply on ticket 1493 In-Reply-To: References: Message-ID: > L1745 are necessary now because moments are still non-central, but > allow for non-default loc and scale. > _munp still does the raw quad(lambda x: x**k * _pdf(x)) or similar if > it is not specifically defined, i.e. can be calculated from explicit > _stats. > > As an aside: whenever I try to go through the generic _stats and moments I > have a hard time following what is going on in all the different > cases. > > I don't see where we could save much. I didn't go through the math. Perhaps we should postpone the problems with the moments computations for the moment. I'll come across this point in due time and bug you again about it. > from numpy import asarray > or > from numpy import asarray as arr > doesn't make any difference, but I usually like full namespaces > np.asarray About replacing arr by asarray: do I get it right that you will change this? Or should I try making a pull request for this? > https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L538 >>>> >>>> contains a typo. It should be Weisstein. 
>>> >>> should be fixed then >> >> Should this become a ticket, or is it too minor? > > too minor for a ticket, but sneaking it into a pull request, or making > a separate pull request (to increase your Karma) would be useful I just got a local version of scipy and wrote a patch for this. For today I'll try to turn it into a pull request. As this is my first attempt, I have no idea how long it will take to set everything up. The spin off will be, hopefully, that I can add to the code at other places too. I suppose you will see the patch appearing. Nicky From vanforeest at gmail.com Thu Apr 26 17:20:56 2012 From: vanforeest at gmail.com (nicky van foreest) Date: Thu, 26 Apr 2012 23:20:56 +0200 Subject: [SciPy-Dev] scipy.stats: algorithm to for ticket 1493 In-Reply-To: References: Message-ID: > 3) I prefer your expanding the search to raising an exception to the > user. Note also that your 3) is inconsistent with 1). If a user > visible exception is raised, then the user needs to change xa or xb, > so it shouldn't be private. That's the current situation (except for a > more cryptic message). > > 2) I'm all in favor, especially for one-side bound distributions, > where it should be easy to go through those. There might be a few > where the bound moves with the shape, but the only one I remember is > genextreme and that has an explicit _ppf > > So I would prefer 1), 2) and your new enhanced generic _ppf Ok. I am convinced now. I'll try to write this in a good and generic way. From mark.pundurs at nokia.com Mon Apr 30 11:51:54 2012 From: mark.pundurs at nokia.com (Pundurs Mark (Nokia-LC/Chicago)) Date: Mon, 30 Apr 2012 10:51:54 -0500 Subject: [SciPy-Dev] SciPy Dev Wiki bug Trac broken - how to report bug? Message-ID: <8A18D8FA4293104C9A710494FD6C273CB7C83581@hq-ex-mb03.ad.navteq.com> http://projects.scipy.org/scipy/newticket returns an error - with only the unhelpful suggestion to open a ticket: ************** Oops... 
Trac detected an internal error:

OperationalError: database is locked

There was an internal error in Trac. It is recommended that you inform your local Trac administrator and give him all the information he needs to reproduce the issue. To that end, you could Create a ticket. The action that triggered the error was:

POST: /newticket
**************

Is there any other means to register the following bug report?

scipy.cluster.hierarchy.ClusterNode.pre_order returns IndexError for non-root node (scipy 0.9.0)

To reproduce the error, run the following script:

import random, numpy
from scipy.cluster.hierarchy import linkage, to_tree

datalist = []
for i in range(8000):
    datalist.append(random.random())
datalist = numpy.array(datalist)
datalist = numpy.reshape(datalist, (datalist.shape[0], 1))
Z = linkage(datalist)
root_node_ref = to_tree(Z)
left_root_node_ref = root_node_ref.left
left_root_node_ref.pre_order()

The result is:

Traceback (most recent call last):
  File "C:\ReproduceError-pre_order.py", line 12, in <module>
    left_root_node_ref.pre_order()
  File "C:\Python27\lib\site-packages\scipy\cluster\hierarchy.py", line 732, in pre_order
    if not lvisited[ndid]:
IndexError: index out of bounds

One possible solution (successfully tested with the preceding script) is to change pre_order in hierarchy.py as follows:

    n = self.count
    curNode = [None] * (2 * n)
    # following two lines changed: dictionaries instead of lists
    lvisited = {}
    rvisited = {}
    curNode[0] = self
    k = 0
    preorder = []
    while k >= 0:
        nd = curNode[k]
        ndid = nd.id
        if nd.is_leaf():
            preorder.append(func(nd))
            k = k - 1
        else:
            # following line changed: check existence of dictionary key rather than value of list item
            if ndid not in lvisited.keys():
                curNode[k + 1] = nd.left
                lvisited[ndid] = True
                k = k + 1
            # following line changed: check existence of dictionary key rather than value of list item
            elif ndid not in rvisited.keys():
                curNode[k + 1] = nd.right
                rvisited[ndid] = True
                k = k + 1
            else:
                k = k - 1
    return preorder

Mark Pundurs
Data Analyst - Traffic 
Location & Commerce Chicago The information contained in this communication may be CONFIDENTIAL and is intended only for the use of the recipient(s) named above. If you are not the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication, or any of its contents, is strictly prohibited. If you have received this communication in error, please notify the sender and delete/destroy the original message and any copy of it from your computer or paper files. From charlesr.harris at gmail.com Mon Apr 30 12:04:21 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 30 Apr 2012 10:04:21 -0600 Subject: [SciPy-Dev] SciPy Dev Wiki bug Trac broken - how to report bug? In-Reply-To: <8A18D8FA4293104C9A710494FD6C273CB7C83581@hq-ex-mb03.ad.navteq.com> References: <8A18D8FA4293104C9A710494FD6C273CB7C83581@hq-ex-mb03.ad.navteq.com> Message-ID: On Mon, Apr 30, 2012 at 9:51 AM, Pundurs Mark (Nokia-LC/Chicago) < mark.pundurs at nokia.com> wrote: > http://projects.scipy.org/scipy/newticket returns an error - with only > the unhelpful suggestion to open a ticket: > > ************** > Oops... > Trac detected an internal error: > OperationalError: database is locked > There was an internal error in Trac. It is recommended that you inform > your local Trac administrator and give him all the information he needs to > reproduce the issue. > > To that end, you could Create a ticket. > > The action that triggered the error was: > POST: /newticket > ************** > > Is there any other means to register the following bug report? > > This happens pretty regularly, the best thing is to try again later and if the problem persists, ping the list. We are looking for something a bit more reliable... Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From junkshops at gmail.com Mon Apr 30 20:12:51 2012 From: junkshops at gmail.com (Junkshops) Date: Mon, 30 Apr 2012 17:12:51 -0700 Subject: [SciPy-Dev] Independent T-tests with unequal variances Message-ID: <4F9F2A83.1070206@gmail.com> Hello all, I hope, as an utter newb poking my nose into this list, that I'm not giving the author of Miss Manner's Book of Netiquette the vapours. This is a follow up to Deniz Turgut's recent email: http://article.gmane.org/gmane.comp.python.scientific.devel/16291/ "There is also a form of t-test for independent samples with different variances, also known as Welch's t-test [2]. I think it is better to include the 'identical variance' assumption in the doc to avoid confusion." I was just caught by this problem and heartily agree with Deniz's views. However, I didn't see any explicit plans to add Welch's test to scipy/stats/stats.py, and I needed an implementation of the test and so implemented it. A diff against scipy 0.10.1 is attached if anyone might find it useful. Cheers, Gavin -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: stats.py.diff URL: From junkshops at gmail.com Mon Apr 30 21:11:47 2012 From: junkshops at gmail.com (Junkshops) Date: Mon, 30 Apr 2012 18:11:47 -0700 Subject: [SciPy-Dev] Independent T-tests with unequal variances In-Reply-To: <4F9F2A83.1070206@gmail.com> References: <4F9F2A83.1070206@gmail.com> Message-ID: <4F9F3853.9050600@gmail.com> Well, this is embarrassing. I appear to have sent the wrong diff with a sign error in it. The correct one is attached. My apologies, Gavin On 4/30/2012 5:12 PM, Junkshops wrote: > Hello all, > > I hope, as an utter newb poking my nose into this list, that I'm not > giving the author of Miss Manner's Book of Netiquette the vapours. 
> > This is a follow up to Deniz Turgut's recent email: > http://article.gmane.org/gmane.comp.python.scientific.devel/16291/ > > "There is also a form of t-test for independent samples with different > variances, also known as Welch's t-test [2]. I think it is better to > include the 'identical variance' assumption in the doc to avoid > confusion." > > I was just caught by this problem and heartily agree with Deniz's > views. However, I didn't see any explicit plans to add Welch's test to > scipy/stats/stats.py, and I needed an implementation of the test and > so implemented it. A diff against scipy 0.10.1 is attached if anyone > might find it useful. > > Cheers, > > Gavin > > -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: stats.py.diff URL: From warren.weckesser at enthought.com Mon Apr 30 21:12:52 2012 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Mon, 30 Apr 2012 20:12:52 -0500 Subject: [SciPy-Dev] SciPy 2012 Abstract and Tutorial Deadlines Extended Message-ID: SciPy 2012 Conference Deadlines Extended Didn't quite finish your abstract or tutorial yet? Good news: the SciPy 2012 organizers have extended the deadline until Friday, May 4. Proposals for tutorials and abstracts for talks and posters are now due by midnight (Austin time, CDT), May 4. For the many of you who have already submitted an abstract or tutorial: thanks! If you need to make corrections to an abstract or tutorial that you have already submitted, you may resubmit it by the same deadline. The SciPy 2012 Organizers -------------- next part -------------- An HTML attachment was scrubbed... URL:
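The Welch's t-test discussed in the thread above is compact enough to sketch in a few lines. The following is only an illustrative implementation (not the attached stats.py.diff patch), using the Welch-Satterthwaite approximation for the degrees of freedom:

```python
import numpy as np
from scipy import stats

def welch_ttest(a, b):
    # Independent two-sample t-test without the equal-variance
    # assumption (Welch's t-test); the degrees of freedom come
    # from the Welch-Satterthwaite approximation.
    a, b = np.asarray(a, float), np.asarray(b, float)
    va, vb = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    p = 2 * stats.t.sf(abs(t), df)  # two-sided p-value
    return t, df, p
```

Later scipy releases expose the same test directly as stats.ttest_ind(a, b, equal_var=False).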