From schut at sarvision.nl  Mon Sep  3 08:32:01 2012
From: schut at sarvision.nl (Vincent Schut)
Date: Mon, 03 Sep 2012 14:32:01 +0200
Subject: [SciPy-Dev] ckdtree pull request
Message-ID:

Hi all,

there is this pull request with improvements for ckdtree
(https://github.com/scipy/scipy/pull/262) that fixes quite some memory
leaks for me. It saves me from the need to frequently restart my remote
processing scripts, that use ckdtrees, and otherwise leak memory like a
sieve... May I humbly request - as just a user who knows his place -
this to be merged?

Best,
Vincent Schut.

From patvarilly at gmail.com  Mon Sep  3 08:37:22 2012
From: patvarilly at gmail.com (Patrick Varilly)
Date: Mon, 3 Sep 2012 13:37:22 +0100
Subject: [SciPy-Dev] ckdtree pull request
In-Reply-To:
References:
Message-ID:

Dear Vincent,

I'm very happy to hear that the code has helped you. The pull request
is almost ready (I wrote most of it), and I've been delinquent in
putting together the last few finishing touches so that it can be
considered for a merge (rolling in some benchmarking code and merging a
few final contributed fixes). Other than those, the code is ready to
go. I've been very busy over the past few weeks, but will do my best to
get these last issues sorted out soon.

All the best,

Patrick

On Mon, Sep 3, 2012 at 1:32 PM, Vincent Schut wrote:

> Hi all,
>
> there is this pull request with improvements for ckdtree
> (https://github.com/scipy/scipy/pull/262) that fixes quite some memory
> leaks for me. It saves me from the need to frequently restart my remote
> processing scripts, that use ckdtrees, and otherwise leak memory like a
> sieve... May I humbly request - as just a user who knows his place -
> this to be merged?
>
> Best,
> Vincent Schut.
>
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
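For readers outside the thread: the leak Vincent describes shows up when cKDTree objects are built and queried repeatedly in a long-running process. The following is only an illustrative sketch of that usage pattern - the function name and random data are made up here, not Vincent's actual processing code:

```python
# Minimal sketch of the usage pattern under discussion: repeatedly
# building and querying scipy.spatial.cKDTree objects, where the
# pre-PR-262 code could leak memory in long-running processes.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.RandomState(0)

def nearest_neighbor_distances(points, queries):
    """Distance and index of the nearest point in `points` for each query."""
    tree = cKDTree(points)            # a fresh tree is built on every call
    dist, idx = tree.query(queries, k=1)
    return dist, idx

points = rng.rand(1000, 3)
queries = rng.rand(10, 3)
dist, idx = nearest_neighbor_distances(points, queries)

# Cross-check the result against a brute-force distance computation.
brute = np.sqrt(((queries[:, None, :] - points[None, :, :]) ** 2).sum(-1))
assert np.allclose(dist, brute.min(axis=1))
assert (idx == brute.argmin(axis=1)).all()
```

Per Vincent's report, calling a function like this many times could grow memory without bound before the fix; with the merged PR the trees are freed properly.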
URL:

From schut at sarvision.nl  Mon Sep  3 08:47:43 2012
From: schut at sarvision.nl (Vincent Schut)
Date: Mon, 03 Sep 2012 14:47:43 +0200
Subject: [SciPy-Dev] ckdtree pull request
In-Reply-To:
References:
Message-ID:

On 09/03/12 14:37, Patrick Varilly wrote:
> Dear Vincent,
>
> I'm very happy to hear that the code has helped you. The pull request
> is almost ready (I wrote most of it), and I've been delinquent in
> putting together the last few finishing touches so that it can be
> considered for a merge (rolling in some benchmarking code and merging a
> few final contributed fixes). Other than those, the code is ready to
> go. I've been very busy over the past few weeks, but will do my best to
> get these last issues sorted out soon.
>
> All the best,
>
> Patrick

Hi Patrick,

thanks, sounds good! I can confirm that for me the code works, and the
memory leaks are gone. As I will soon need to install my processing
software on some new cluster computers, and I prefer to be able to just
install scipy from the main repo instead of having to merge it with
Sturla's and yours, that's why I asked.
I have added myself as a watcher to your PR, so I suppose I should get
informed automatically now when something changes or the PR gets merged.

Thanks (also to Sturla) for the good work!

Best,
Vincent.

> On Mon, Sep 3, 2012 at 1:32 PM, Vincent Schut
> wrote:
>
>     Hi all,
>
>     there is this pull request with improvements for ckdtree
>     (https://github.com/scipy/scipy/pull/262) that fixes quite some memory
>     leaks for me. It saves me from the need to frequently restart my remote
>     processing scripts, that use ckdtrees, and otherwise leak memory like a
>     sieve... May I humbly request - as just a user who knows his place -
>     this to be merged?
>
>     Best,
>     Vincent Schut.
> > _______________________________________________
> > SciPy-Dev mailing list
> > SciPy-Dev at scipy.org
> > http://mail.scipy.org/mailman/listinfo/scipy-dev
> >
>
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev
>

From sturla at molden.no  Tue Sep  4 07:39:10 2012
From: sturla at molden.no (Sturla Molden)
Date: Tue, 04 Sep 2012 13:39:10 +0200
Subject: [SciPy-Dev] ckdtree pull request
In-Reply-To:
References:
Message-ID: <5045E85E.1020708@molden.no>

We are waiting for Pat Varilly to merge the last fix I did for Windows
64 (it has weird integer size issues) into his branch. He has not
replied in several weeks, so maybe I should open a new PR?

Sturla

On 03.09.2012 14:32, Vincent Schut wrote:
> Hi all,
>
> there is this pull request with improvements for ckdtree
> (https://github.com/scipy/scipy/pull/262) that fixes quite some memory
> leaks for me. It saves me from the need to frequently restart my remote
> processing scripts, that use ckdtrees, and otherwise leak memory like a
> sieve... May I humbly request - as just a user who knows his place -
> this to be merged?
>
> Best,
> Vincent Schut.
>
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev

From sturla at molden.no  Tue Sep  4 08:21:16 2012
From: sturla at molden.no (Sturla Molden)
Date: Tue, 04 Sep 2012 14:21:16 +0200
Subject: [SciPy-Dev] ckdtree pull request
In-Reply-To:
References:
Message-ID: <5045F23C.80201@molden.no>

On 03.09.2012 14:47, Vincent Schut wrote:
> thanks, sounds good! I can confirm that for me the code works, and the
> memory leaks are gone.

That is good to hear :)

The cKDTree code in SciPy master has a few places where I expected that
memory leaks could occur (particularly if there were exceptions) or
integers could overflow (particularly on Win64).
I removed all that I could see (there might still be more bugs, though),
and we tried not to introduce any in Patrick's new code.

Sturla

From sturla at molden.no  Tue Sep  4 08:22:44 2012
From: sturla at molden.no (Sturla Molden)
Date: Tue, 04 Sep 2012 14:22:44 +0200
Subject: [SciPy-Dev] ckdtree pull request
In-Reply-To: <5045E85E.1020708@molden.no>
References: <5045E85E.1020708@molden.no>
Message-ID: <5045F294.5080905@molden.no>

Never mind this, I didn't see Patrick's reply.

On 04.09.2012 13:39, Sturla Molden wrote:
> We are waiting for Pat Varilly to merge the last fix I did for Windows
> 64 (it has weird integer size issues) into his branch. He has not
> replied in several weeks, so maybe I should open a new PR?
>
> Sturla
>
> On 03.09.2012 14:32, Vincent Schut wrote:
>> Hi all,
>>
>> there is this pull request with improvements for ckdtree
>> (https://github.com/scipy/scipy/pull/262) that fixes quite some memory
>> leaks for me. It saves me from the need to frequently restart my remote
>> processing scripts, that use ckdtrees, and otherwise leak memory like a
>> sieve... May I humbly request - as just a user who knows his place -
>> this to be merged?
>>
>> Best,
>> Vincent Schut.
>>
>> _______________________________________________
>> SciPy-Dev mailing list
>> SciPy-Dev at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-dev
>
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev

From fperez.net at gmail.com  Fri Sep  7 04:38:55 2012
From: fperez.net at gmail.com (Fernando Perez)
Date: Fri, 7 Sep 2012 01:38:55 -0700
Subject: [SciPy-Dev] John Hunter's memorial service: Oct 1, 2012
Message-ID:

Hi all,

I have just received the following information from John's family
regarding the memorial service:

John's memorial service will be held on Monday, October 1, 2012, at
11 a.m. at Rockefeller Chapel at the University of Chicago. The exact
address is 5850 S.
Woodlawn Ave, Chicago, IL 60615. The service is open to the public.

The service will be fully planned and scripted with no room for people
to eulogize; however, we will have a reception after the service, hosted
by Tradelink, where people can talk.

Regards,

f

From fperez.net at gmail.com  Fri Sep  7 14:21:20 2012
From: fperez.net at gmail.com (Fernando Perez)
Date: Fri, 7 Sep 2012 11:21:20 -0700
Subject: [SciPy-Dev] John Hunter's memorial service: Oct 1, 2012
In-Reply-To:
References:
Message-ID:

I just received the official announcement, please note the RSVP
requirement to Miriam at msierig at gmail.com.

John Davidson Hunter, III
1968-2012

[image: Inline image 1]

Our family invites you to join us to celebrate and remember the life of

John Hunter

Memorial Service
Rockefeller Chapel
5850 South Woodlawn
Chicago, IL 60637

Monday October 1, 2012
11am

Service will be followed by a reception where family and friends may
gather to share memories of John.

Please RSVP to Miriam at msierig at gmail.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: jdhj.jpg Type: image/jpeg Size: 370050 bytes Desc: not available URL: From ralf.gommers at gmail.com Sat Sep 8 12:21:29 2012 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 8 Sep 2012 18:21:29 +0200 Subject: [SciPy-Dev] ANN: SciPy 0.11.0 release candidate 2 In-Reply-To: <55FCB433-349E-468C-B59D-315C24A36BBA@samueljohn.de> References: <55FCB433-349E-468C-B59D-315C24A36BBA@samueljohn.de> Message-ID: On Tue, Aug 14, 2012 at 4:53 PM, Samuel John wrote: > > ====================================================================== > FAIL: test_stats.test_ttest_ind > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/case.py", > line 197, in runTest > self.test(*self.arg) > File > "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/stats/tests/test_stats.py", > line 1556, in test_ttest_ind > assert_array_almost_equal([t,p],(tr,pr)) > File > "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/testing/utils.py", > line 800, in assert_array_almost_equal > header=('Arrays are not almost equal to %d decimals' % decimal)) > File > "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/testing/utils.py", > line 636, in assert_array_compare > raise AssertionError(msg) > AssertionError: > Arrays are not almost equal to 6 decimals > > (mismatch 50.0%) > x: array([ 1.09127469, 0.4998416 ]) > y: array([ 1.09127469, 0.27647819]) > > ====================================================================== > FAIL: test_stats.test_ttest_ind_with_uneq_var > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > 
"/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/case.py",
> line 197, in runTest
>     self.test(*self.arg)
>   File
> "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/stats/tests/test_stats.py",
> line 1596, in test_ttest_ind_with_uneq_var
>     assert_array_almost_equal([t,p], [tr, pr])
>   File
> "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/testing/utils.py",
> line 800, in assert_array_almost_equal
>     header=('Arrays are not almost equal to %d decimals' % decimal))
>   File
> "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/testing/utils.py",
> line 636, in assert_array_compare
>     raise AssertionError(msg)
> AssertionError:
> Arrays are not almost equal to 6 decimals
>
> (mismatch 50.0%)
>  x: array([-0.68649513,  0.81407518])
>  y: array([-0.68649513,  0.53619491])

These are a little odd. It gets the t-statistic right, so the problem is
not in stats. The issue seems to be in special.stdtr(), whose test isn't
failing for you. It only has a test for df=0 though.

The first failure comes down to giving a different answer from:

>>> special.stdtr(198, -1.09127) * 2
0.27648...

Do you get the same for these?

>>> special.stdtr(1, 0)  # this is tested
0.5
>>> special.stdtr(1, 1)  # this isn't
0.75000000000000022
>>> special.stdtr(1, 2)
0.85241638234956674

Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
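The relationship Ralf is exercising by hand can be written as a small self-contained check: scipy.special.stdtr(df, t) is the Student t CDF, so it should agree with scipy.stats.t.cdf(t, df), and the two-sided p-value in the failing ttest is 2 * stdtr(df, -|t|). A sketch (the helper name `two_sided_p` is ours; on a broken build like Samuel's, the asserts would fail):

```python
# Cross-check of scipy.special.stdtr, the Student t CDF, against
# scipy.stats.t.cdf, plus the two-sided p-value computation from the
# failing test_ttest_ind report.
import numpy as np
from scipy import special, stats

def two_sided_p(t_stat, df):
    """Two-sided p-value of a t-statistic with df degrees of freedom."""
    return 2 * special.stdtr(df, -abs(t_stat))

# On a healthy build, stdtr and stats.t.cdf agree everywhere.
for df in (1, 10, 198):
    for t in (-2.0, 0.0, 1.0, 2.0):
        assert np.allclose(special.stdtr(df, t), stats.t.cdf(t, df))

# The value Ralf quotes for the first failure: t = 1.09127, df = 198.
p = two_sided_p(1.09127, 198)
assert abs(p - 0.27648) < 1e-4
```

Note that stats.t.cdf is itself built on stdtr internally, so this is a consistency check rather than an independent oracle; the fixed reference value 0.27648 from Ralf's message is the independent part.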
URL:

From ralf.gommers at gmail.com  Sat Sep  8 12:24:36 2012
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Sat, 8 Sep 2012 18:24:36 +0200
Subject: [SciPy-Dev] [SciPy-User] ANN: SciPy 0.11.0 release candidate 2
In-Reply-To:
References: <50297AFE.3020806@uci.edu>
Message-ID:

On Tue, Aug 14, 2012 at 7:12 PM, Ralf Gommers wrote:

> On Tue, Aug 14, 2012 at 12:09 AM, Christoph Gohlke wrote:
>
>> test_qz_double_sort is now failing in all msvc9/MKL builds (Python 2.6
>> to 3.2, 32 and 64 bit):
>
> This turned out to be something minor, fixed by
> https://github.com/scipy/scipy/pull/292.
>
> This issue has held up the release for too long already, so unless someone
> has time to get it resolved this week, I propose the following:
> 1. Figure out if the problem is the sort function or something else.
> 2. If it's sort, disable it. Otherwise remove the qz function from the
> 0.11.x branch.

The problem is only in the sort keyword, here's a PR to disable it for
the time being: https://github.com/scipy/scipy/pull/303

See ticket 1717 for more details.

Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ralf.gommers at gmail.com  Sat Sep  8 12:31:37 2012
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Sat, 8 Sep 2012 18:31:37 +0200
Subject: [SciPy-Dev] [SciPy-User] ANN: SciPy 0.11.0 release candidate 2
In-Reply-To: <502A5424.2070408@comcast.net>
References: <50294CEB.4070103@comcast.net> <50297B0F.2090400@comcast.net> <502A5424.2070408@comcast.net>
Message-ID:

On Tue, Aug 14, 2012 at 3:35 PM, John Hassler wrote:

> On 8/14/2012 7:21 AM, Pauli Virtanen wrote:
> > Ralf Gommers gmail.com> writes:
> > [clip]
> >> Does anyone have an idea about that test_singular failure?
> > That's very likely some problem with the underlying LAPACK library.
> > I think the problem solved is close to a numerical instability.
> >
> > The failing comparison compares eigenvalues computed by
> >
> >     eig(A, B)
> >     eig(A, B, left=False, right=False)
> >
> > which differ solely in passing 'N' vs. 'V' to DGGEV. The eigenvalue
> > property of the former is also checked and seems to pass. Interestingly,
> > the result obtained from the two seems to differ (therefore, the latter
> > is probably wrong), which appears to point to a LAPACK issue.
> >
> > Here, it would be interesting to know if the problem occurs with
> > the official Scipy binaries, or something else.
>
> I installed rc2 on Python 2.7.3. Same problem. I get the test_singular
> error on some, but not all, of the runs. Both are win32-superpack from
> http://sourceforge.net/projects/scipy/files/scipy/0.11.0rc2/.
>
> The error occurs on less than half but more than 1/3 (based on a very
> small sample) of the runs on both 2.7 and 3.2.
>
> I've been working on computers for more than 50 years. Somehow, I had
> developed the delusion that they were deterministic .....
> john

What are we going to do about this one? I'm tempted to open a ticket for
it and mark it as knownfail on Windows for now, since it's a corner case.
Ralf > ------------- Python 2.7 -------------------- > >>> import scipy > >>> scipy.test() > Running unit tests for scipy > NumPy version 1.6.2 > NumPy is installed in C:\Python27\lib\site-packages\numpy > SciPy version 0.11.0rc2 > SciPy is installed in C:\Python27\lib\site-packages\scipy > Python version 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit > (Intel)] > nose version 0.11.2 > > ====================================================================== > FAIL: test_decomp.TestEig.test_singular > Test singular pair > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > "C:\Python27\lib\site-packages\nose-0.11.2-py2.7.egg\nose\case.py", line > 186, in runTest > self.test(*self.arg) > File > "C:\Python27\lib\site-packages\scipy\linalg\tests\test_decomp.py", line > 201, in test_singular > self._check_gen_eig(A, B) > File > "C:\Python27\lib\site-packages\scipy\linalg\tests\test_decomp.py", line > 188, in _check_gen_eig > err_msg=msg) > File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line > 800, in assert_array_almost_equal > header=('Arrays are not almost equal to %d decimals' % decimal)) > File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line > 636, in assert_array_compare > raise AssertionError(msg) > AssertionError: > Arrays are not almost equal to 6 decimals > > array([[22, 34, 31, 31, 17], > [45, 45, 42, 19, 29], > [39, 47, 49, 26, 34], > [27, 31, 26, 21, 15], > [38, 44, 44, 24, 30]]) > array([[13, 26, 25, 17, 24], > [31, 46, 40, 26, 37], > [26, 40, 19, 25, 25], > [16, 25, 27, 14, 23], > [24, 35, 18, 21, 22]]) > (mismatch 25.0%) > x: array([ -2.45037885e-01 +0.00000000e+00j, > 5.17637463e-16 -4.01120590e-08j, > 5.17637463e-16 +4.01120590e-08j, 2.00000000e+00 > +0.00000000e+00j]) > y: array([ -3.74550285e-01 +0.00000000e+00j, > -5.17716907e-17 -1.15230800e-08j, > -5.17716907e-17 +1.15230800e-08j, 2.00000000e+00 > +0.00000000e+00j]) > > 
---------------------------------------------------------------------- > Ran 5490 tests in 103.250s > > FAILED (KNOWNFAIL=14, SKIP=36, failures=1) > > >>> > > -------------- Python 3.2 -------------- > >>> scipy.test() > Running unit tests for scipy > NumPy version 1.6.2 > NumPy is installed in C:\Python32\lib\site-packages\numpy > SciPy version 0.11.0rc2 > SciPy is installed in C:\Python32\lib\site-packages\scipy > Python version 3.2.3 (default, Apr 11 2012, 07:15:24) [MSC v.1500 32 bit > (Intel)] > nose version 1.0.0 > > ====================================================================== > FAIL: test_decomp.TestEig.test_singular > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > "C:\Python32\lib\site-packages\nose-1.0.0-py3.2.egg\nose\case.py", line > 188, in runTest > self.test(*self.arg) > File > "C:\Python32\lib\site-packages\scipy\linalg\tests\test_decomp.py", line > 201, in test_singular > self._check_gen_eig(A, B) > File > "C:\Python32\lib\site-packages\scipy\linalg\tests\test_decomp.py", line > 188, in _check_gen_eig > err_msg=msg) > File "C:\Python32\lib\site-packages\numpy\testing\utils.py", line > 800, in assert_array_almost_equal > header=('Arrays are not almost equal to %d decimals' % decimal)) > File "C:\Python32\lib\site-packages\numpy\testing\utils.py", line > 636, in assert_array_compare > raise AssertionError(msg) > AssertionError: > Arrays are not almost equal to 6 decimals > > array([[22, 34, 31, 31, 17], > [45, 45, 42, 19, 29], > [39, 47, 49, 26, 34], > [27, 31, 26, 21, 15], > [38, 44, 44, 24, 30]]) > array([[13, 26, 25, 17, 24], > [31, 46, 40, 26, 37], > [26, 40, 19, 25, 25], > [16, 25, 27, 14, 23], > [24, 35, 18, 21, 22]]) > (mismatch 25.0%) > x: array([ -2.450e-01 +0.000e+00j, 5.176e-16 -4.011e-08j, 5.176e-16 > +4.011e-08j, > 2.000e+00 +0.000e+00j]) > y: array([ -3.746e-01 +0.000e+00j, -5.177e-17 -1.152e-08j, -5.177e-17 > +1.152e-08j, > 2.000e+00 +0.000e+00j]) > > 
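Pauli's "eigenvalue property" check can be made concrete: for a generalized pair, every (w, v) returned by eig(A, B) should satisfy A v = w B v, and the eigenvalues-only call should agree with the full one. A sketch on a random, deliberately well-conditioned pair - not the singular (A, B) matrices from test_singular, which is exactly where the LAPACK nondeterminism bites:

```python
# Simplified version of the eigenvalue-property check discussed above:
# verify A @ v = w * (B @ v) for each pair from eig(A, B), and that the
# 'N' and 'V' DGGEV paths report the same spectrum.
import numpy as np
from scipy.linalg import eig

rng = np.random.RandomState(42)
A = rng.rand(5, 5)
B = rng.rand(5, 5) + 5 * np.eye(5)   # shifted diagonal keeps B safely nonsingular

# Full call: eigenvalues and right eigenvectors ('V' path).
w, vr = eig(A, B)

# Defining property, column by column: A v_i = w_i B v_i.
residual = A @ vr - (B @ vr) * w
assert np.allclose(residual, 0, atol=1e-10)

# Eigenvalues-only call ('N' path) should give the same spectrum.
w_only = eig(A, B, left=False, right=False)
assert np.allclose(np.sort_complex(w), np.sort_complex(w_only))
```

With the singular test pair, the spectrum contains nearly defective eigenvalues, so the last comparison can differ from run to run on the affected LAPACK builds - which is what the failing test observes.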
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ralf.gommers at gmail.com  Sat Sep  8 13:12:41 2012
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Sat, 8 Sep 2012 19:12:41 +0200
Subject: [SciPy-Dev] extended functionality of max/min ufuncs
In-Reply-To:
References:
Message-ID:

On Thu, Aug 23, 2012 at 4:28 PM, Tom Grydeland wrote:

> Hi all,
>
> I'm working on a system to translate (EXELIS/ITTVIS/RSI) IDL to
> Python/Numpy. The results are close to usable for many purposes, but
> some simple constructs are causing snags I was hoping you could help
> me with.
>
> IDL's MAX function supports a second (output) argument which will
> contain the index of the maximum value. One can write
>
>     PRINT, MAX(x, id), id
>
> to get the value and index of the maximum element. In Numpy, I could do
>
>     id = np.argmax(x); print x[id], id
>
> to get the same results in two statements. However, I can give MAX an
> extra keyword /NAN to tell it to ignore NaN and infinities. In Numpy,
> I have np.nanmax, but no np.nanargmax.
>
> In addition, MAX has the output keywords MIN=value and
> SUBSCRIPT_MIN=index to return the minimum value and its index from the
> same function call. Similarly, MIN has output keywords MAX=value and
> SUBSCRIPT_MAX=index, so I can get max/min values and their indices
> (ignoring NaNs) in a single call -- in Numpy I would need two calls,
> two indexing operations, one temporary array of values and one of
> indices.
>
> Would it be hard to add similar functionality to Numpy's min() and
> max() functions? Is it desirable?

You may get more responses on the numpy mailing list. I think adding
nanargmin/nanargmax is fine, and shouldn't be hard. Putting min/argmin
keywords into max and vice versa looks weird to me.

Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
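To illustrate why Ralf says it "shouldn't be hard": a NaN-ignoring argmax can be built from plain NumPy operations. Modern NumPy ships np.nanargmax; the hand-rolled version below (the name `nanargmax_sketch` is ours) only illustrates the idea:

```python
# Sketch of a NaN-ignoring argmax built from plain NumPy ops, along the
# lines of the nanargmax Ralf suggests adding. Modern NumPy already
# provides np.nanargmax; this is purely illustrative.
import numpy as np

def nanargmax_sketch(a):
    """Index of the maximum of `a`, ignoring NaNs."""
    a = np.asarray(a, dtype=float)
    if np.isnan(a).all():
        raise ValueError("all values are NaN")
    masked = np.where(np.isnan(a), -np.inf, a)   # NaNs can never win
    return int(np.argmax(masked))

x = np.array([1.0, np.nan, 3.0, np.nan, 2.0])
i = nanargmax_sketch(x)
assert i == 2 and x[i] == 3.0
assert i == np.nanargmax(x)   # agrees with the NumPy built-in
```

The /NAN analogue for values rather than indices is np.nanmax, which Tom already has; combining both in one call, as IDL's output keywords do, would be the part that needs new API.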
URL: From matthew.brett at gmail.com Sat Sep 8 13:33:22 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 8 Sep 2012 18:33:22 +0100 Subject: [SciPy-Dev] [SciPy-User] ANN: SciPy 0.11.0 release candidate 2 In-Reply-To: References: <50294CEB.4070103@comcast.net> <50297B0F.2090400@comcast.net> <502A5424.2070408@comcast.net> Message-ID: Hi, On Sat, Sep 8, 2012 at 5:31 PM, Ralf Gommers wrote: > > > On Tue, Aug 14, 2012 at 3:35 PM, John Hassler wrote: >> >> >> On 8/14/2012 7:21 AM, Pauli Virtanen wrote: >> > Ralf Gommers gmail.com> writes: >> > [clip] >> >> Does anyone have an idea about that test_singular failure? >> > That's very likely some problem with the underlying LAPACK library. >> > I think the problem solved is close to a numerical instability. >> > >> > The failing comparison compares eigenvalues computed by >> > >> > eig(A, B) >> > eig(A, B, left=False, right=False) >> > >> > which differ solely in passing 'N' vs. 'V' to DGGEV. The eigenvalue >> > property of the former is also checked and seems to pass. Interestingly, >> > the result obtained from the two seems to differ (therefore, the latter >> > is probably wrong), which appears to point to a LAPACK issue. >> > >> > Here, it would be interesting to know if the problem occurs with >> > the official Scipy binaries, or something else. >> > >> >> I installed rc2 on Python 2.7.3. Same problem. I get the test_singular >> error on some, but not all, of the runs. Both are win32-superpack from >> http://sourceforge.net/projects/scipy/files/scipy/0.11.0rc2/. >> >> The error occurs on less than half but more than 1/3 (based on a very >> small sample) of the runs on both 2.7 and 3.2. >> >> I've been working on computers for more than 50 years. Somehow, I had >> developed the delusion that they were deterministic ..... >> john > > > What are we going to do about this one? I'm tempted to open a ticket for it > and mark it as knownfail on Windows for now, since it's a corner case. 
I have noticed that windows SVD appears to give different answers from
repeated runs on the same matrix, differing in terms of sign flips,
but valid SVDs. I've no idea why, but I had to adjust the tests in
our code to allow for this.

I guess we should make sure the returned results are correct, and fail
otherwise. But maybe we do not require two runs to give the same
answer. Could that explain the problem?

Best,

Matthew

From josef.pktd at gmail.com  Sat Sep  8 13:46:48 2012
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Sat, 8 Sep 2012 13:46:48 -0400
Subject: [SciPy-Dev] [SciPy-User] ANN: SciPy 0.11.0 release candidate 2
In-Reply-To:
References: <50294CEB.4070103@comcast.net> <50297B0F.2090400@comcast.net> <502A5424.2070408@comcast.net>
Message-ID:

On Sat, Sep 8, 2012 at 1:33 PM, Matthew Brett wrote:
> Hi,
>
> On Sat, Sep 8, 2012 at 5:31 PM, Ralf Gommers wrote:
>>
>> On Tue, Aug 14, 2012 at 3:35 PM, John Hassler wrote:
>>>
>>> On 8/14/2012 7:21 AM, Pauli Virtanen wrote:
>>> > Ralf Gommers gmail.com> writes:
>>> > [clip]
>>> >> Does anyone have an idea about that test_singular failure?
>>> > That's very likely some problem with the underlying LAPACK library.
>>> > I think the problem solved is close to a numerical instability.
>>> >
>>> > The failing comparison compares eigenvalues computed by
>>> >
>>> > eig(A, B)
>>> > eig(A, B, left=False, right=False)
>>> >
>>> > which differ solely in passing 'N' vs. 'V' to DGGEV. The eigenvalue
>>> > property of the former is also checked and seems to pass. Interestingly,
>>> > the result obtained from the two seems to differ (therefore, the latter
>>> > is probably wrong), which appears to point to a LAPACK issue.
>>> >
>>> > Here, it would be interesting to know if the problem occurs with
>>> > the official Scipy binaries, or something else.
>>> >
>>>
>>> I installed rc2 on Python 2.7.3. Same problem. I get the test_singular
Both are win32-superpack from >>> http://sourceforge.net/projects/scipy/files/scipy/0.11.0rc2/. >>> >>> The error occurs on less than half but more than 1/3 (based on a very >>> small sample) of the runs on both 2.7 and 3.2. >>> >>> I've been working on computers for more than 50 years. Somehow, I had >>> developed the delusion that they were deterministic ..... >>> john >> >> >> What are we going to do about this one? I'm tempted to open a ticket for it >> and mark it as knownfail on Windows for now, since it's a corner case. > > I have noticed that windows SVD appears to give different answers from > repeated runs on the same matrix, differing in terms of sign flips, > but valid SVDs. I've no idea why, but I had to adjust the tests in > our code to allow for this. > > I guess we should make sure the returned results are correct, and fail > otherwise. But maybe we do not require two runs to give the same > answer. Could that explain the problem? I'm only paying partial attention and not up-to-date, just a few tries: with b1 running (I removed a crashing qz sort for float test) (py27b) E:\Josef\testing\tox\py27b\Scripts>python -c "import scipy.linalg; scipy.linalg.test()" the tests always pass runinng the test specifically, I get the test failure each time (py27b) E:\Josef\testing\tox\py27b\Scripts>nosetests -v "E:\Josef\testing\tox\py27b\Lib\site-packages\scipy-0.11.0b1-py2.7-win32.egg\scipy\linalg\tests\test_decomp.py":TestEig.test_singular (mismatch 25.0%) x: array([ -3.74550285e-01 +0.00000000e+00j, -5.17716907e-17 -1.15230800e-08j, -5.17716907e-17 +1.15230800e-08j, 2.00000000e+00 +0.00000000e+00j]) y: array([ -2.45037885e-01 +0.00000000e+00j, 5.17637463e-16 -4.01120590e-08j, 5.17637463e-16 +4.01120590e-08j, 2.00000000e+00 +0.00000000e+00j]) running the example (just checking eigenvalues), I get different answers if the A,B matrices are int or float and then not always the same >>> import scipy >>> scipy.__version__ '0.9.0' >>> linalg.eigvals(a,b) array([ 
2.00000000 +0.j, -0.00000080 +0.j, 0.00000080 +0.j, -0.35915547 +0.j, nan nanj]) >>> linalg.eigvals(a.astype(float),b) array([ 2.00000000+0.j , -0.00000000+0.00000018j, -0.00000000-0.00000018j, nan nanj, 0.57825572+0.j ]) >>> linalg.eigvals(a.astype(float),b.astype(float)) array([ 2.00000000+0.j , -0.00000000+0.00000018j, -0.00000000-0.00000018j, nan nanj, 0.57825572+0.j ]) >>> linalg.eigvals(a, b) array([ 2.00000000 +0.j, -0.00000080 +0.j, 0.00000080 +0.j, -0.35915547 +0.j, nan nanj]) >>> linalg.eigvals(a+0j, b +0j) array([ 2.00000000+0.j , -0.00000000-0.00000002j, 0.00000000+0.00000002j, 0.39034698+0.j , nan nanj]) >>> linalg.eigvals(a.astype(float),b.astype(float)) array([ 2.00000000 +0.j, -0.00000080 +0.j, 0.00000080 +0.j, -0.35915547 +0.j, nan nanj]) >>> linalg.eig(a, b)[0] array([ 2.00000000 +0.j, -0.00000080 +0.j, 0.00000080 +0.j, -0.35915547 +0.j, nan nanj]) >>> linalg.eig(a.astype(float),b.astype(float))[0] array([ 2.00000000+0.j , -0.00000000+0.00000018j, -0.00000000-0.00000018j, nan nanj, 0.57825572+0.j >>> a.dtype, b.dtype (dtype('int32'), dtype('int32')) Windows 7, python 32bit on 64bit machine Josef > > Best, > > Matthew > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev From ralf.gommers at gmail.com Sat Sep 8 14:40:08 2012 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 8 Sep 2012 20:40:08 +0200 Subject: [SciPy-Dev] SciPy.org Website Move and Redesign In-Reply-To: <20120831194825.GL24647@phare.normalesup.org> References: <3C6CC713CBD949D2A51AB3C9859B493F@gmail.com> <20120831194825.GL24647@phare.normalesup.org> Message-ID: On Fri, Aug 31, 2012 at 9:48 PM, Gael Varoquaux < gael.varoquaux at normalesup.org> wrote: > On Fri, Aug 31, 2012 at 12:21:53PM -0700, Fernando Perez wrote: > > > http://www.loria.fr/~rougier/tmp/scipy.png > > > http://www.loria.fr/~rougier/tmp/scipy.svg > > > FWIW, I like a lot the two central ones in the bottom row, 
thanks! > > My favorite one is the second one from the left on the bottom row. > Seconded, that one looks quite good! Ralf > > Thanks, Nicolas, you are our artist! > > Gael > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sat Sep 8 15:38:21 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 8 Sep 2012 15:38:21 -0400 Subject: [SciPy-Dev] [SciPy-User] ANN: SciPy 0.11.0 release candidate 2 In-Reply-To: References: <50294CEB.4070103@comcast.net> <50297B0F.2090400@comcast.net> <502A5424.2070408@comcast.net> Message-ID: On Sat, Sep 8, 2012 at 1:46 PM, wrote: > On Sat, Sep 8, 2012 at 1:33 PM, Matthew Brett wrote: >> Hi, >> >> On Sat, Sep 8, 2012 at 5:31 PM, Ralf Gommers wrote: >>> >>> >>> On Tue, Aug 14, 2012 at 3:35 PM, John Hassler wrote: >>>> >>>> >>>> On 8/14/2012 7:21 AM, Pauli Virtanen wrote: >>>> > Ralf Gommers gmail.com> writes: >>>> > [clip] >>>> >> Does anyone have an idea about that test_singular failure? >>>> > That's very likely some problem with the underlying LAPACK library. >>>> > I think the problem solved is close to a numerical instability. >>>> > >>>> > The failing comparison compares eigenvalues computed by >>>> > >>>> > eig(A, B) >>>> > eig(A, B, left=False, right=False) >>>> > >>>> > which differ solely in passing 'N' vs. 'V' to DGGEV. The eigenvalue >>>> > property of the former is also checked and seems to pass. Interestingly, >>>> > the result obtained from the two seems to differ (therefore, the latter >>>> > is probably wrong), which appears to point to a LAPACK issue. >>>> > >>>> > Here, it would be interesting to know if the problem occurs with >>>> > the official Scipy binaries, or something else. >>>> > >>>> >>>> I installed rc2 on Python 2.7.3. Same problem. 
I get the test_singular >>>> error on some, but not all, of the runs. Both are win32-superpack from >>>> http://sourceforge.net/projects/scipy/files/scipy/0.11.0rc2/. >>>> >>>> The error occurs on less than half but more than 1/3 (based on a very >>>> small sample) of the runs on both 2.7 and 3.2. >>>> >>>> I've been working on computers for more than 50 years. Somehow, I had >>>> developed the delusion that they were deterministic ..... >>>> john >>> >>> >>> What are we going to do about this one? I'm tempted to open a ticket for it >>> and mark it as knownfail on Windows for now, since it's a corner case. >> >> I have noticed that windows SVD appears to give different answers from >> repeated runs on the same matrix, differing in terms of sign flips, >> but valid SVDs. I've no idea why, but I had to adjust the tests in >> our code to allow for this. >> >> I guess we should make sure the returned results are correct, and fail >> otherwise. But maybe we do not require two runs to give the same >> answer. Could that explain the problem? > > I'm only paying partial attention and not up-to-date, just a few tries: > after installing rc2 into my 2.7.1 virtualenv, I cannot replicate any errors, also not running the singular test directly (py27b) E:\Josef\testing\tox\py27b\Scripts>easy_install "C:\...\scipy-0.11.0rc2-sse3.exe" (py27b) E:\Josef\testing\tox\py27b\Scripts>python Python 2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> import scipy >>> scipy.__version__ '0.11.0rc2' >>> scipy.__file__ 'E:\\Josef\\testing\\tox\\py27b\\lib\\site-packages\\scipy-0.11.0rc2-py2.7-win32.egg\\scipy\\__init__.pyc' >>> scipy.test() ... 
Ran 5486 tests in 56.394s OK (KNOWNFAIL=14, SKIP=42) ------------------------- If I just unzip the superpack instead of installing the sse3, I get only one failure ====================================================================== FAIL: test_basic.TestNorm.test_stable ---------------------------------------------------------------------- Traceback (most recent call last): File "E:\Josef\testing\tox\py27b\lib\site-packages\nose-1.1.2-py2.7.egg\nose\case.py", line 197, in runTest self.test(*self.arg) File "E:\Josef\testing\tox\py27b\lib\site-packages\scipy\linalg\tests\test_basic.py", line 592, in test_stable assert_almost_equal(norm(a) - 1e4, 0.0, err_msg=msg) File "E:\Josef\testing\tox\py27b\lib\site-packages\numpy-1.6.2-py2.7-win32.egg\numpy\testing\utils.py", line 468, in assert_almost_equal raise AssertionError(msg) AssertionError: Arrays are not almost equal to 7 decimals : Result should equal either 0.0 or 0.5 (depending on implementation of snrm2). ACTUAL: 0.4990234375 DESIRED: 0.0 maybe some sse incompatibilities. (I figured out how to install scipy into a virtualenv without installing it into my main python first.) Josef From vanforeest at gmail.com Sun Sep 9 13:13:44 2012 From: vanforeest at gmail.com (nicky van foreest) Date: Sun, 9 Sep 2012 19:13:44 +0200 Subject: [SciPy-Dev] stats cleanup, attempt Message-ID: Hi, Slowly but surely I am making some progress with respect to understanding the stats library. Hence I feel I can start with making some contributions too. As a first attempt I would like to propose some clean up first. There is some code from https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L415 to https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L430 This is not used in distributions.py, and if I recall from a previous mail from Josef it is not used at all anymore. Can I remove it? BTW: should I post a mail about proposals like this, or just remove it, and send a pull request with some explanation?
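One low-tech way to back up a "not used anywhere" claim before deleting code is to search the package source for references to the candidate names. A rough sketch of such a check (the helper and the names below are made up for illustration, not taken from distributions.py):

```python
import os
import re

def find_references(root, names):
    # Walk a package tree and record which .py files mention each of the
    # candidate "dead" names, so unused-ness is checked rather than assumed.
    hits = {name: [] for name in names}
    for dirpath, _, filenames in os.walk(root):
        for fn in filenames:
            if not fn.endswith('.py'):
                continue
            path = os.path.join(dirpath, fn)
            with open(path) as f:
                text = f.read()
            for name in names:
                if re.search(r'\b' + re.escape(name) + r'\b', text):
                    hits[name].append(path)
    return hits
```

Names that come back with no hits outside their own definition site are good candidates for the kind of removal proposed above.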
Nicky From ralf.gommers at gmail.com Sun Sep 9 16:04:32 2012 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 9 Sep 2012 22:04:32 +0200 Subject: [SciPy-Dev] ckdtree pull request In-Reply-To: References: Message-ID: On Mon, Sep 3, 2012 at 2:47 PM, Vincent Schut wrote: > On 09/03/12 14:37, Patrick Varilly wrote: > > Dear Vincent, > > > > I'm very happy to hear that the code has helped you. The pull request > > is almost ready (I wrote most of it), and I've been delinquent in > > putting together the last few finishing touches so that it can be > > considered for a merge (rolling in some benchmarking code and merging a > > few final contributed fixes). Other than those, the code is ready to > > go. I've been very busy over the past few weeks, but will do my best to > > get these last issues sorted out soon. > > > > All the best, > > > > Patrick > > Hi Patrick, > > thanks, sounds good! I can confirm that for me the code works, and the > memory leaks are gone. As I will soon need to install my processing > software on some new cluster computers, and I prefer to be able to just > install scipy from the main repo instead of having to merge it with > Sturla's and yours, that's why I asked. > I have added myself as a watcher to your PR, so I suppose I should get > informed automatically now when something changes or the PR gets merged. > > Thanks (also to Sturla) for the good work! > This PR is merged now. Thanks Patrick and Sturla! Ralf > > > > > On Mon, Sep 3, 2012 at 1:32 PM, Vincent Schut > > wrote: > > > > Hi all, > > > > there is this pull request with improvements for ckdtree > > (https://github.com/scipy/scipy/pull/262) that fixes quite some > memory > > leaks for me. It saves me from the need to frequently restart my > remote > > processing scripts, that use ckdtrees, and otherwise leak memory > like a > > sieve... May I humbly request - as just a user who knows his place - > > this to be merged? > > > > Best, > > Vincent Schut. 
> > > > _______________________________________________ > > SciPy-Dev mailing list > > SciPy-Dev at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-dev > > > > > > > > > > _______________________________________________ > > SciPy-Dev mailing list > > SciPy-Dev at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-dev > > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas.haslwanter at alumni.ethz.ch Sun Sep 9 16:34:21 2012 From: thomas.haslwanter at alumni.ethz.ch (thomash) Date: Sun, 9 Sep 2012 20:34:21 +0000 (UTC) Subject: [SciPy-Dev] Savitzky-Golay for scipy.signal Message-ID: In my line of work, the Savitzky-Golay filter is one of the most useful filters, and I think it would be worth including it into scipy.signal. "rgommer" has already mentioned that he would support that notion, and I was wondering if it would be ok with the other developers if I go ahead with it? thomash From warren.weckesser at enthought.com Sun Sep 9 16:39:34 2012 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Sun, 9 Sep 2012 15:39:34 -0500 Subject: [SciPy-Dev] Savitzky-Golay for scipy.signal In-Reply-To: References: Message-ID: On Sun, Sep 9, 2012 at 3:34 PM, thomash wrote: > In my line of work, the Savitzky-Golay filter is one of the most useful > filters, > and I think it would be worth including it into scipy.signal. > "rgommer" has already mentioned that he would support that notion, and I > was > wondering if it would be ok with the other developers if I go ahead with > it? > thomash > You're email is well-timed, because I've been working on an implementation today, and in fact, I was working on unit tests when your email arrived. I'll have a pull request on github very soon (definitely by the end of the day). 
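For readers following the thread who have not met the filter: the core of Savitzky-Golay smoothing is a linear least-squares polynomial fit over a sliding window, which collapses to a fixed convolution. A minimal sketch of that idea (the function names are mine, not the proposed scipy.signal API, and edge handling is deliberately omitted):

```python
import numpy as np

def sg_coeffs(window_size, order, deriv=0):
    # Fitting a degree-`order` polynomial over the window by least squares
    # reduces to fixed weights: row `deriv` of the pseudo-inverse of the
    # window's Vandermonde matrix.
    half = window_size // 2
    A = np.array([[i ** j for j in range(order + 1)]
                  for i in range(-half, half + 1)], dtype=float)
    return np.linalg.pinv(A)[deriv]

def sg_smooth(y, window_size, order):
    # Slide the fit over the data as a convolution; edge samples are left
    # to whatever strategy one prefers (mirroring, refitting, ...).
    c = sg_coeffs(window_size, order)
    return np.convolve(y, c[::-1], mode='same')
```

For window_size=5, order=2 this reproduces the classic smoothing weights (-3, 12, 17, 12, -3)/35, and in the interior the filter passes any polynomial up to the chosen order through unchanged.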
Warren > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sun Sep 9 17:27:29 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 9 Sep 2012 17:27:29 -0400 Subject: [SciPy-Dev] stats cleanup, attempt In-Reply-To: References: Message-ID: On Sun, Sep 9, 2012 at 1:13 PM, nicky van foreest wrote: > Hi, > > Slowly but surely I am making some progress with respect to > understanding the stats library. Hence I feel can start with making > some contributions too. As a first attempt I would like to propose > some clean up first. There is some code from > > https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L415 > > to > > https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L430 > > This is not used in distributions.py, and if I recall from a previous > mail from Josef it not used at all anymore. Can I remove it? I don't see why not. > > BTW: should I post a mail about proposals like this, or just remove > it, and send a pull request with some explanation? Just going for the pull request is fine if it doesn't look really controversial. Also, with separate cleanup commits within a pull request, it is still possible to be selective there. ------ A comment about "dead" code: Sometimes code is currently dead because the original author didn't finish implementing a feature or didn't get it to work, or because it became dead after many refactorings. Sometimes it's best to throw it out, sometimes it's waiting for someone to finish. (old story) I found a lot of code in distributions that didn't work, and I spent time fixing and finishing it, but I wouldn't have been able to come up with the ideas and code by myself (as a beginner with numpy/scipy).
I don't think there is much like that left in scipy.stats (outside of _support.py), so cleaning up will be good. (But I wouldn't want any "cleanup crew" to go through my statsmodels sandbox and delete everything that doesn't work or is not used (yet). :) Glad to hear you are back with scipy.stats. Josef > > Nicky > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev From kyle.mandli at gmail.com Sun Sep 9 20:14:40 2012 From: kyle.mandli at gmail.com (Kyle Mandli) Date: Sun, 9 Sep 2012 19:14:40 -0500 Subject: [SciPy-Dev] SciPy.org Website Move and Redesign In-Reply-To: References: <3C6CC713CBD949D2A51AB3C9859B493F@gmail.com> Message-ID: My replies to Pauli are below. I also like the bottom second from the left graphic, thanks for those! On Wed, Aug 29, 2012 at 1:58 PM, Pauli Virtanen wrote: > Kyle Mandli gmail.com> writes: >> I kind of dropped the ball on this as I have been traveling >> for the past month. I think we are ready to move >> forward on this and at least get the pieces in place. >> I am still not clear exactly what the current status of >> the scipy.org-new and scipy.github.com repos on github are. >> Can anyone speak to this? > > The status of scipy.org-new repository is that it is incomplete, but > future work should build on it, as it has some basic infrastructure > in place. (The conference/ directory in that repo can be ignored/removed, > as I believe it is outdated.) > > The scipy.github.com contains the HTML output built from scipy.org-new. This sounds like exactly what we had in mind! We may want to just go ahead and rename these repositories to mirror what ipython and matplotlib are using for names (this would apply to scipy.org-new). Is the content then in scipy.org-new relatively up to date with the old site? 
We could also start to accept github issues for things to bring over and people can then knock off missing pieces until we build back up to where we left off. Also, this approach would probably require us to stop all new additions of content to the old site and encourage people to post to the new one. >> On a related topic, I have been searching out someone who may be able to do some redesign work on the site but >> was not successful (at least a free option). Any suggestions >> for someone with some design experience that >> could update the look of the site a bit please let us know. > > Well, the only requirement is that it should look better and more > organized than the current scipy.org page. That probably does not > require 1337 design chops, just some CSS/HTML skills :) > > Also, somehow the typography on scipy.github.com is off, the text > looks ugly and is difficult to read. > > The main point IMO is just to come up with some visual consistency > and navigation scheme between the main site, the documentation, etc. > This doesn't have to be perfect, the main thing is just to think up > some sort of a consistent framework. Agreed. As one doctor once said though, "I am a doctor, not a designer", at least that's probably what he would have said. > > The second thing related to content would be to think what to > do with the different Scipys: > > - "Scipy the community site" > - "Scipy the library" > > Well, anyway, the above is just my suggestion on what could be done. > If you can think of some other ways to improve the situation, > please don't let this hold you up. This issue is interesting and definitely bears further discussion in the community. Maybe the conference series wants to consider this as well?
> > -- > Pauli Virtanen > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev From warren.weckesser at enthought.com Sun Sep 9 22:20:58 2012 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Sun, 9 Sep 2012 21:20:58 -0500 Subject: [SciPy-Dev] Savitzky-Golay for scipy.signal In-Reply-To: References: Message-ID: On Sun, Sep 9, 2012 at 3:39 PM, Warren Weckesser < warren.weckesser at enthought.com> wrote: > > On Sun, Sep 9, 2012 at 3:34 PM, thomash wrote: > >> In my line of work, the Savitzky-Golay filter is one of the most useful >> filters, >> and I think it would be worth including it into scipy.signal. >> "rgommer" has already mentioned that he would support that notion, and I >> was >> wondering if it would be ok with the other developers if I go ahead with >> it? >> thomash >> > > > You're email is well-timed, because I've been working on an implementation > today, and in fact, I was working on unit tests when your email arrived. > I'll have a pull request on github very soon (definitely by the end of the > day). > > The pull request is here: https://github.com/scipy/scipy/pull/304 It is a work-in-progress--add comments and suggestions in the pull request. Warren Warren > > > >> >> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From vanforeest at gmail.com Tue Sep 11 09:46:23 2012 From: vanforeest at gmail.com (nicky van foreest) Date: Tue, 11 Sep 2012 15:46:23 +0200 Subject: [SciPy-Dev] stats some further clean up Message-ID: Hi, distributions.py contains some #hashed documentation from lines https://github.com/nokfi/scipy/blob/master/scipy/stats/distributions.py#L509 to https://github.com/nokfi/scipy/blob/master/scipy/stats/distributions.py#L598 I would prefer to remove it as some parts should better fit in the documentation at http://docs.scipy.org/doc/scipy/reference/tutorial/stats/continuous.html#continuous-random-variables while a lot is already covered in the standard doc strings of rv_continuous and rv_discrete. Should I just go ahead, or first make a somewhat more specified list of what I intend to do? nicky From josef.pktd at gmail.com Tue Sep 11 10:03:39 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 11 Sep 2012 10:03:39 -0400 Subject: [SciPy-Dev] stats some further clean up In-Reply-To: References: Message-ID: On Tue, Sep 11, 2012 at 9:46 AM, nicky van foreest wrote: > Hi, > > distributions.py contains some #hashed documentation from lines > > https://github.com/nokfi/scipy/blob/master/scipy/stats/distributions.py#L509 > > to > > https://github.com/nokfi/scipy/blob/master/scipy/stats/distributions.py#L598 > > I would prefer to remove it as some parts should better fit in the > documentation at > http://docs.scipy.org/doc/scipy/reference/tutorial/stats/continuous.html#continuous-random-variables > while a lot is already covered in the standard doc strings of > rv_continuous and rv_discrete. Should I just go ahead, or first make a > somewhat more specified list of what I intend to do? Do we need module maintainer notes that are too much for the user facing documentation? I often keep some under triple quotes at the top of a module. You're the latest to figure out distributions.py and might know what additional information is helpful. 
Josef > > nicky > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev From chris.felton at gmail.com Tue Sep 11 16:55:06 2012 From: chris.felton at gmail.com (Christopher Felton) Date: Tue, 11 Sep 2012 15:55:06 -0500 Subject: [SciPy-Dev] Savitzky-Golay for scipy.signal In-Reply-To: References: Message-ID: On 9/9/2012 9:20 PM, Warren Weckesser wrote: > On Sun, Sep 9, 2012 at 3:39 PM, Warren Weckesser < > warren.weckesser at enthought.com> wrote: > >> >> On Sun, Sep 9, 2012 at 3:34 PM, thomash wrote: >> >>> In my line of work, the Savitzky-Golay filter is one of the most useful >>> filters, >>> and I think it would be worth including it into scipy.signal. >>> "rgommer" has already mentioned that he would support that notion, and I >>> was >>> wondering if it would be ok with the other developers if I go ahead with >>> it? >>> thomash >>> >> >> >> You're email is well-timed, because I've been working on an implementation >> today, and in fact, I was working on unit tests when your email arrived. >> I'll have a pull request on github very soon (definitely by the end of the >> day). >> >> > Where would the SG filter live in the scipy.signal namespace? There has been talk of reorganizing scipy.signal. Might consider the possible reorg or where it might make sense for it to live. Or is it simply in the .signal namespace? Regards, Chris From travis at vaught.net Tue Sep 11 17:16:16 2012 From: travis at vaught.net (Travis Vaught) Date: Tue, 11 Sep 2012 16:16:16 -0500 Subject: [SciPy-Dev] scipy domains In-Reply-To: <1AAE1367-051E-4130-A53E-8392C88F93AE@vaught.net> References: <1AAE1367-051E-4130-A53E-8392C88F93AE@vaught.net> Message-ID: On Sep 10, 2012, at 12:06 PM, Travis Vaught wrote: > Forwarding to scipy-user, since I've not had any thoughts out of scipy-dev on this. > > Maybe this is relevant to the naming discussion. > > Anyone? 
> > > Begin forwarded message: > >> From: Travis Vaught >> Subject: scipy domains >> Date: September 5, 2012 9:31:49 AM CDT >> To: "Scipy-Dev at Scipy. Org" >> >> All, >> >> In a fit of nostalgia, I began to renew the domains scipy.com and scipy.net this morning. >> >> I've no idea whether they have any real use other than brand protection, but I'm more than willing to fund the renewals for as long as I have the means to do so. (Note: even though I'm the listed administrator of these domains, I consider Enthought -- with which I'm no longer formally affiliated -- to be their capable caretaker). >> >> In the renewal process, the company from which I register the domains 'suggested' that there are other domains I might be interested in registering. Namely, scipy.info, scipy.us, and scipy.biz. Is there any use for these? I'm happy to add them to the bill if someone can make a case for their use/protection. >> >> Any thoughts are appreciated. >> >> Best, >> >> Travis > Alright, I'm feeling a bit neglected on this, so I'll just renew the scipy.net and scipy.com domains and leave it at that. If anyone needs to admin the domains going forward, let me know. Best, Travis -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at enthought.com Tue Sep 11 22:16:28 2012 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Tue, 11 Sep 2012 21:16:28 -0500 Subject: [SciPy-Dev] Savitzky-Golay for scipy.signal In-Reply-To: References: Message-ID: On Tue, Sep 11, 2012 at 3:55 PM, Christopher Felton wrote: > On 9/9/2012 9:20 PM, Warren Weckesser wrote: > > On Sun, Sep 9, 2012 at 3:39 PM, Warren Weckesser < > > warren.weckesser at enthought.com> wrote: > > > >> > >> On Sun, Sep 9, 2012 at 3:34 PM, thomash < > thomas.haslwanter at alumni.ethz.ch>wrote: > >> > >>> In my line of work, the Savitzky-Golay filter is one of the most useful > >>> filters, > >>> and I think it would be worth including it into scipy.signal. 
> >>> "rgommer" has already mentioned that he would support that notion, and > I > >>> was > >>> wondering if it would be ok with the other developers if I go ahead > with > >>> it? > >>> thomash > >>> > >> > >> > >> You're email is well-timed, because I've been working on an > implementation > >> today, and in fact, I was working on unit tests when your email arrived. > >> I'll have a pull request on github very soon (definitely by the end of > the > >> day). > >> > >> > > > > Where would the SG filter live in the scipy.signal > namespace? > > There has been talk of reorganizing scipy.signal. > Might consider the possible reorg or where it might > make sense for it to live. Or is it simply in the > .signal namespace? > > In the pull request (https://github.com/scipy/scipy/pull/304), I added the functions to the scipy.signal namespace. Historically, the scipy packages tend towards flat namespaces. Warren > Regards, > Chris > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vanforeest at gmail.com Wed Sep 12 15:51:03 2012 From: vanforeest at gmail.com (nicky van foreest) Date: Wed, 12 Sep 2012 21:51:03 +0200 Subject: [SciPy-Dev] continuous distributions documentation Message-ID: Hi, I updated my git version of https://github.com/scipy/scipy/blob/master/doc/source/tutorial/stats/continuous.lyx somewhat. I converted it to latex to convert it, with pandoc, to rst. However, the result is dramatic. Does anybody know how the lyx file was converted to rst so as to get a decent result? BTW, the latex file looks ok, so the problem resides in the conversion to rst. 
thanks Nicky From cimrman3 at ntc.zcu.cz Wed Sep 12 17:12:39 2012 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Wed, 12 Sep 2012 23:12:39 +0200 Subject: [SciPy-Dev] ANN: SfePy 2012.3 Message-ID: <5050FAC7.9020603@ntc.zcu.cz> I am pleased to announce release 2012.3 of SfePy. Description ----------- SfePy (simple finite elements in Python) is a software for solving systems of coupled partial differential equations by the finite element method. The code is based on NumPy and SciPy packages. It is distributed under the new BSD license. Home page: http://sfepy.org Downloads, mailing list, wiki: http://code.google.com/p/sfepy/ Git (source) repository, issue tracker: http://github.com/sfepy Highlights of this release -------------------------- - several new terms - material parameters can be defined per region using region names - base function values can be defined per element - support for global options For full release notes see http://docs.sfepy.org/doc/release_notes.html#id1 (rather long and technical). Best regards, Robert Cimrman and Contributors (*) (*) Contributors to this release (alphabetical order): Alec Kalinin, Vladim?r Luke? From thomas.haslwanter at alumni.ethz.ch Thu Sep 13 11:38:03 2012 From: thomas.haslwanter at alumni.ethz.ch (thomash) Date: Thu, 13 Sep 2012 15:38:03 +0000 (UTC) Subject: [SciPy-Dev] Savitzky-Golay for scipy.signal References: Message-ID: Warren Weckesser enthought.com> writes: > > > On Sun, Sep 9, 2012 at 3:39 PM, Warren Weckesser enthought.com> wrote: > > On Sun, Sep 9, 2012 at 3:34 PM, thomash alumni.ethz.ch> wrote: > In my line of work, the Savitzky-Golay filter is one of the most useful filters, > and I think it would be worth including it into scipy.signal. > "rgommer" has already mentioned that he would support that notion, and I was > wondering if it would be ok with the other developers if I go ahead with it? 
> thomash > > > > Your email is well-timed, because I've been working on an implementation today, and in fact, I was working on unit tests when your email arrived. I'll have a pull request on github very soon (definitely by the end of the day). > > > > > The pull request is here: https://github.com/scipy/scipy/pull/304 It is a work-in-progress--add comments and suggestions in the pull request. Warren > > > > Warren > > > _______________________________________________ > SciPy-Dev mailing list SciPy-Dev at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-dev > > > > > > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > Warren, have a look at the following code: it is more concise, and it behaves much more nicely at the edges (just give it a try with calculating the second derivative with your code, and with the code included here). Let me know what you think. thomas

# ---------Code Thomas, start --------------
# -*- coding: utf-8 -*-
r"""Smooth (and optionally differentiate) data with a Savitzky-Golay filter.

The Savitzky-Golay filter removes high frequency noise from data. It has the
advantage of preserving the original shape and features of the signal better
than other types of filtering approaches, such as moving average techniques.

Parameters
----------
y : array_like, shape (N,)
    the values of the time history of the signal.
window_size : int
    the length of the window. Must be an odd integer number.
order : int
    the order of the polynomial used in the filtering. Must be less
    than `window_size` - 1.
deriv : int
    the order of the derivative to compute (default = 0 means only smoothing)
rate :
    sampling rate (in Hz; only used for derivatives)

Returns
-------
ys : ndarray, shape (N)
    the smoothed signal (or its n-th derivative).

Notes
-----
The Savitzky-Golay is a type of low-pass filter, particularly suited for
smoothing noisy data.
The main idea behind this approach is to make for each point a least-squares
fit with a polynomial of high order over an odd-sized window centered at the
point. The data at the beginning / end of the sample are determined from the
best polynomial fit to the first / last datapoints. This makes the code a bit
more complicated, but avoids wild artefacts at the beginning and the end.

"Cutoff-frequencies": for smoothing (deriv=0), the frequency where the
amplitude is reduced by 10% is approximately given by

    f_cutoff = sampling_rate / (1.5 * look)

for the first derivative (deriv=1), the frequency where the amplitude is
reduced by 10% is approximately given by

    f_cutoff = sampling_rate / (4 * look)

Examples
--------
t = np.linspace(-4, 4, 500)
y = np.exp( -t**2 ) + np.random.normal(0, 0.05, t.shape)
ysg = savitzky_golay(y, window_size=31, order=4)
import matplotlib.pyplot as plt
plt.plot(t, y, label='Noisy signal')
plt.plot(t, np.exp(-t**2), 'k', lw=1.5, label='Original signal')
plt.plot(t, ysg, 'r', label='Filtered signal')
plt.legend()
plt.show()

References
----------
.. [1] A. Savitzky, M. J. E. Golay, Smoothing and Differentiation of Data by
   Simplified Least Squares Procedures. Analytical Chemistry, 1964, 36 (8),
   pp 1627-1639.
.. [2] Numerical Recipes 3rd Edition: The Art of Scientific Computing
   W.H. Press, S.A. Teukolsky, W.T. Vetterling, B.P. Flannery
   Cambridge University Press ISBN-13: 9780521880688
.. [3] Siegmund Brandt, Datenanalyse, pp 435
"""
"""
Author: Thomas Haslwanter
Version: 1.0
Date: 25-July-2012
"""

import numpy as np
from numpy import dot
from math import factorial

def savgol(x, window_size=3, order=2, deriv=0, rate=1):
    ''' Savitzky-Golay filter '''
    # Check the input
    try:
        window_size = np.abs(int(window_size))
        order = np.abs(int(order))
    except ValueError:
        raise ValueError("window_size and order have to be of type int")
    if window_size > len(x):
        raise TypeError("Not enough data points!")
    if window_size % 2 != 1 or window_size < 1:
        raise TypeError("window_size must be a positive odd number")
    if window_size < order + 1:
        raise TypeError("window_size is too small for the polynomial's order")
    if order <= deriv:
        raise TypeError("The 'deriv' of the polynomial is too high.")

    # Calculate some required parameters
    order_range = range(order+1)
    half_window = (window_size - 1) // 2
    num_data = len(x)

    # Construct Vandermonde matrix, its inverse, and the Savitzky-Golay coefficients
    a = [[ii**jj for jj in order_range] for ii in range(-half_window, half_window+1)]
    pa = np.linalg.pinv(a)
    sg_coeff = pa[deriv] * rate**deriv * factorial(deriv)

    # Get the coefficients for the fits at the beginning and at the end
    # of the data
    coefs = np.array(order_range)**np.sign(deriv)
    coef_mat = np.zeros((order+1, order+1))
    row = 0
    for ii in range(deriv, order+1):
        coef = coefs[ii]
        for jj in range(1, deriv):
            coef *= (coefs[ii]-jj)
        coef_mat[row, row+deriv] = coef
        row += 1
    coef_mat *= rate**deriv

    # Add the first and last point half_window times
    firstvals = np.ones(half_window) * x[0]
    lastvals = np.ones(half_window) * x[-1]
    x_calc = np.concatenate((firstvals, x, lastvals))

    y = np.convolve(sg_coeff[::-1], x_calc, mode='full')

    # chop away intermediate data
    y = y[window_size-1:window_size+num_data-1]

    # filtering for the first and last few datapoints
    y[0:half_window] = dot(dot(dot(a[0:half_window], coef_mat), pa),
                           x[0:window_size])
    y[len(y)-half_window:len(y)] = dot(dot(dot(a[half_window+1:window_size],
                                               coef_mat), pa),
                                       x[len(x)-window_size:len(x)])

    return y
# --------- code thomas, stop ------------

From vanforeest at gmail.com Thu Sep 13 14:42:42 2012 From: vanforeest at gmail.com (nicky van foreest) Date: Thu, 13 Sep 2012 20:42:42 +0200 Subject: [SciPy-Dev] continuous distributions documentation In-Reply-To: References: Message-ID: Just in case anybody is still thinking about this point: I already solved it in another way. On 12 September 2012 21:51, nicky van foreest wrote: > Hi, > > I updated my git version of > https://github.com/scipy/scipy/blob/master/doc/source/tutorial/stats/continuous.lyx > somewhat. I converted it to latex to convert it, with pandoc, to rst. > However, the result is dramatic. Does anybody know how the lyx file > was converted to rst so as to get a decent result? BTW, the latex file > looks ok, so the problem resides in the conversion to rst. > > thanks > > Nicky From vanforeest at gmail.com Thu Sep 13 14:46:51 2012 From: vanforeest at gmail.com (nicky van foreest) Date: Thu, 13 Sep 2012 20:46:51 +0200 Subject: [SciPy-Dev] stats some further clean up In-Reply-To: References: Message-ID: Hi Josef, On 11 September 2012 16:03, wrote: > On Tue, Sep 11, 2012 at 9:46 AM, nicky van foreest wrote: >> Hi, >> >> distributions.py contains some #hashed documentation from lines >> >> https://github.com/nokfi/scipy/blob/master/scipy/stats/distributions.py#L509 >> >> to >> >> https://github.com/nokfi/scipy/blob/master/scipy/stats/distributions.py#L598 >> >> I would prefer to remove it as some parts should better fit in the >> documentation at >> http://docs.scipy.org/doc/scipy/reference/tutorial/stats/continuous.html#continuous-random-variables >> while a lot is already covered in the standard doc strings of >> rv_continuous and rv_discrete. Should I just go ahead, or first make a >> somewhat more specified list of what I intend to do?
> > Do we need module maintainer notes that are too much for the user > facing documentation? I don't quite get what you mean by this. Is it still relevant in view of my latest pull request about the documentation? > I often keep some under triple quotes at the top of a module. Sure, but the lines I referred to simply overlap with other docstrings. Hence, from the DRY principle, it is better to remove them IMO. Nicky > > You're the latest to figure out distributions.py and might know what > additional information is helpful. > > Josef > >> >> nicky >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev From vanforeest at gmail.com Thu Sep 13 17:21:24 2012 From: vanforeest at gmail.com (nicky van foreest) Date: Thu, 13 Sep 2012 23:21:24 +0200 Subject: [SciPy-Dev] distributions.py Message-ID: Hi, Now that I understand github (Thanks to Ralf for his explanations in Dutch) and got some simple stuff out of the way in distributions.py I would like to tackle a somewhat harder issue. The function argsreduce is, as far as I can see, too generic. I did some tests to see whether its most generic output, as described by its docstring, is actually swallowed by the callers of argsreduce, but this appears not to be the case. My motivation to simplify the code in distributions.py (and clean it up) is partly based on making it simpler to understand for myself, but also to others. Since github makes code browsing a much nicer experience, perhaps more people will take a look at what's under the hood. But then the code should also be accessible and clean. Are there any reasons not to pursue this path, and focus on more important problems of the stats library?
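For context, the job argsreduce does can be sketched in a few lines of broadcasting. This is a simplified illustration of the pattern (not the actual scipy implementation): broadcast every shape parameter against a boolean mask of valid entries, then return flat arrays of only the valid values.

```python
import numpy as np

def argsreduce_sketch(cond, *args):
    # Broadcast every shape parameter against the boolean mask of "valid"
    # entries and return 1-d arrays containing only those entries, so the
    # distribution methods can work on flat arrays of equal length.
    bcast = np.broadcast_arrays(cond, *[np.asarray(a) for a in args])
    cond_b, params = bcast[0], bcast[1:]
    return [p[cond_b] for p in params]
```

For example, a scalar parameter broadcast against a length-3 mask comes back as a 1-d array with one entry per valid position.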
Nicky From josef.pktd at gmail.com Thu Sep 13 18:48:40 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 13 Sep 2012 18:48:40 -0400 Subject: [SciPy-Dev] distributions.py In-Reply-To: References: Message-ID: On Thu, Sep 13, 2012 at 5:21 PM, nicky van foreest wrote: > Hi, > > Now that I understand github (Thanks to Ralf for his explanations in > Dutch) and got some simple stuff out of the way in distributions.py I > would like to tackle a somewhat harder issue. The function argsreduce > is, as far as I can see, too generic. I did some tests to see whether > its most generic output, as described by its docstring, is actually > swallowed by the callers of argsreduce, but this appears not to be the > case. Being generic is not a disadvantage (per se) if it's fast: https://github.com/scipy/scipy/commit/4abdc10487d453b56f761598e8e013816b01a665 (and being a one-liner is not a disadvantage either) Josef > > My motivation to simplify the code in distributions.py (and clean it > up) is partly based on making it simpler to understand for myself, but > also to others. The fact that github makes code browsing a much nicer > experience, perhaps more people will take a look at what's under the > hood. But then the code should also be accessible and clean. Are there > any reasons not to pursue this path, and focus on more important > problems of the stats library? > > Nicky > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev From vanforeest at gmail.com Fri Sep 14 16:21:46 2012 From: vanforeest at gmail.com (nicky van foreest) Date: Fri, 14 Sep 2012 22:21:46 +0200 Subject: [SciPy-Dev] distributions.py expect Message-ID: Hi, I am trying to implement the expect method in rv_frozen.
To understand
the normal working of the expect method I tried the following:

from scipy.stats import geom, norm, gamma

print norm.expect(loc=3, scale=5)
print gamma.expect(None, 4.5)
print gamma.expect(lambda x: x, 4.5)
print geom.expect(lambda x: x, 1./3)

This is the result:

3.0
4.5
4.5
Traceback (most recent call last):
  File "expecttest.py", line 6, in <module>
    print geom.expect(lambda x: x, 1./3)
  File "/home/nicky/prog/scipy/scipy/stats/distributions.py", line 6375, in expect
    self._argcheck(*args)  # (re)generate scalar self.a and self.b
TypeError: _argcheck() argument after * must be a sequence, not float

So the first examples work, but the rv_discrete example doesn't. One
thing is that _argcheck is not called in rv_continuous while it is
in rv_discrete. Removing this line results in another error:

3.0
4.5
4.5
Traceback (most recent call last):
  File "expecttest.py", line 6, in <module>
    print geom.expect(lambda x: x, 1./3)
  File "/home/nicky/prog/scipy/scipy/stats/distributions.py", line 6394, in expect
    low, upp = self._ppf(0.001, *args), self._ppf(0.999, *args)
TypeError: _ppf() argument after * must be a sequence, not float

What would actually be the right way to call the expect method for
rv_discrete? Or, the other way around, should this method be changed
so that my example with geom works?

----

I extended rv_frozen with this method, and this appears to work well
for continuous rvs, but it also fails for discrete rvs.

def expect(self, *args, **kwds):
    args += self.args
    kwds.update(self.kwds)
    return self.dist.expect(*args, **kwds)

Nicky

From josef.pktd at gmail.com Fri Sep 14 16:45:02 2012
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Fri, 14 Sep 2012 16:45:02 -0400
Subject: [SciPy-Dev] distributions.py expect
In-Reply-To: 
References: 
Message-ID: 

On Fri, Sep 14, 2012 at 4:21 PM, nicky van foreest wrote:
> Hi,
>
> I am trying to implement the expect method in rv_frozen.
To understand > the normal working of the expect method I tried the following: > > from scipy.stats import geom, norm, gamma > > print norm.expect(loc = 3,scale =5) > print gamma.expect(None, 4.5) > print gamma.expect(lambda x: x, 4.5) > print geom.expect(lambda x: x, 1./3) print geom.expect(lambda x: x, (1./3,)) args as tuple ? I will look at it later. The two versions should have a consistent signature. Josef > > This is the result: > > 3.0 > 4.5 > 4.5 > Traceback (most recent call last): > File "expecttest.py", line 6, in > print geom.expect(lambda x: x, 1./3) > File "/home/nicky/prog/scipy/scipy/stats/distributions.py", line > 6375, in expect > self._argcheck(*args) # (re)generate scalar self.a and self.b > TypeError: _argcheck() argument after * must be a sequence, not float > > So the first examples work, but the rv_discrete example doesn't. One > thing is that the _argcheck is not called in rv_continuous while it is > in rv_discrete. Removing this line results in another error: > > 3.0 > 4.5 > 4.5 > Traceback (most recent call last): > File "expecttest.py", line 6, in > print geom.expect(lambda x: x, 1./3) > File "/home/nicky/prog/scipy/scipy/stats/distributions.py", line > 6394, in expect > low, upp = self._ppf(0.001, *args), self._ppf(0.999, *args) > TypeError: _ppf() argument after * must be a sequence, not float > > What would be actually the right way to call the expect method for > rv_discrete? Or perhaps the other way around, should this method be > changed so that my example with geom works? > > ---- > > I extended rv_frozen with this method, and this appears to work well > for continuous rvs, but also fails for discrete rvs. 
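For reference: against a recent SciPy, the discrete call does go through once the shape parameters are passed as the args tuple, which is the consistent-signature form Josef suggests above. A minimal sketch (exact signatures in the 2012 tree may differ):

```python
from scipy.stats import gamma, geom

# Shape parameters go in as the `args` tuple for both the continuous
# and the discrete expect, so the two signatures line up:
e_gamma = gamma.expect(lambda x: x, args=(4.5,))   # mean of gamma(a=4.5)
e_geom = geom.expect(lambda x: x, args=(1. / 3,))  # mean of geom(p=1/3)

print(e_gamma)  # close to 4.5
print(e_geom)   # close to 1/p = 3
```

With args given as a tuple, geom.expect no longer trips over _argcheck, because the shape parameters are unpacked the same way in both code paths.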
> > def expect(self, *args, **kwds): > args += self.args > kwds.update(self.kwds) > return self.dist.expect(*args, **kwds) > > Nicky > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev From ralf.gommers at gmail.com Fri Sep 14 16:49:03 2012 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Fri, 14 Sep 2012 22:49:03 +0200 Subject: [SciPy-Dev] distributions.py In-Reply-To: References: Message-ID: On Fri, Sep 14, 2012 at 12:48 AM, wrote: > On Thu, Sep 13, 2012 at 5:21 PM, nicky van foreest > wrote: > > Hi, > > > > Now that I understand github (Thanks to Ralf for his explanations in > > Dutch) and got some simple stuff out of the way in distributions.py I > > would like to tackle a somewhat harder issue. The function argsreduce > > is, as far as I can see, too generic. I did some tests to see whether > > its most generic output, as described by its docstring, is actually > > swallowed by the callers of argsreduce, but this appears not to be the > > case. > > being generic is not a disadvantage (per se) if it's fast > > https://github.com/scipy/scipy/commit/4abdc10487d453b56f761598e8e013816b01a665 > (and a being a one liner is not a disadvantage either) > > Josef > > > > > My motivation to simplify the code in distributions.py (and clean it > > up) is partly based on making it simpler to understand for myself, but > > also to others. The fact that github makes code browsing a much nicer > > experience, perhaps more people will take a look at what's under the > > hood. But then the code should also be accessible and clean. Are there > > any reasons not to pursue this path, and focus on more important > > problems of the stats library? > Not sure that argsreduce is the best place to start (see Josef's reply), but there should be things that can be done to make the code easier to read. 
For
example, this code is used in ~10 methods of rv_continuous:

    loc, scale = map(kwds.get, ['loc', 'scale'])
    args, loc, scale = self._fix_loc_scale(args, loc, scale)
    x, loc, scale = map(asarray, (x, loc, scale))
    args = tuple(map(asarray, args))

Some refactoring may be in order. The same is true of the rest of the
implementation of many of those methods. Some are exactly the same
except for calls to the corresponding underscored method (example:
logsf() and logcdf() are identical except for calls to _logsf() and
_logcdf(), and one nonsensical multiplication).

Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From vanderplas at astro.washington.edu Fri Sep 14 16:56:50 2012
From: vanderplas at astro.washington.edu (Jake Vanderplas)
Date: Fri, 14 Sep 2012 13:56:50 -0700
Subject: [SciPy-Dev] distributions.py
In-Reply-To: 
References: 
Message-ID: <50539A12.3000101@astro.washington.edu>

On 09/14/2012 01:49 PM, Ralf Gommers wrote:
>
> On Fri, Sep 14, 2012 at 12:48 AM, wrote:
>
> On Thu, Sep 13, 2012 at 5:21 PM, nicky van foreest
> wrote:
> > Hi,
> >
> > Now that I understand github (Thanks to Ralf for his explanations in
> > Dutch) and got some simple stuff out of the way in distributions.py I
> > would like to tackle a somewhat harder issue. The function argsreduce
> > is, as far as I can see, too generic. I did some tests to see whether
> > its most generic output, as described by its docstring, is actually
> > swallowed by the callers of argsreduce, but this appears not to be the
> > case.
>
> being generic is not a disadvantage (per se) if it's fast
> https://github.com/scipy/scipy/commit/4abdc10487d453b56f761598e8e013816b01a665
> (and a being a one liner is not a disadvantage either)
>
> Josef
>
> >
> > My motivation to simplify the code in distributions.py (and clean it
> > up) is partly based on making it simpler to understand for myself, but
> > also to others.
The fact that github makes code browsing a much > nicer > > experience, perhaps more people will take a look at what's under the > > hood. But then the code should also be accessible and clean. Are > there > > any reasons not to pursue this path, and focus on more important > > problems of the stats library? > > > Not sure that argsreduce is the best place to start (see Josef's > reply), but there should be things that can be done to make the code > easier to read. For example, this code is used in ~10 methods of > rv_continuous: > > loc,scale=map(kwds.get,['loc','scale']) > args, loc, scale = self._fix_loc_scale(args, loc, scale) > x,loc,scale = map(asarray,(x,loc,scale)) > args = tuple(map(asarray,args)) > > Some refactoring may be in order. The same is true of the rest of the > implementation of many of those methods. Some are exactly the same > except for calls to the corresponding underscored method (example: > logsf() and logcdf() are identical except for calls to _logsf() and > _logcdf(), and one nonsensical multiplication). > > Ralf > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev I would say that the most important improvement needed in distributions is in the documentation. A new user would look at the doc string of, say, scipy.stats.norm, and have no idea how to proceed. Here's the current example from the docstring of scipy.stats.norm: Examples -------- >>> from scipy.stats import norm >>> numargs = norm.numargs >>> [ ] = [0.9,] * numargs >>> rv = norm() >>> x = np.linspace(0, np.minimum(rv.dist.b, 3)) >>> h = plt.plot(x, rv.pdf(x)) I don't even know what that means... and it doesn't compile. Also, what is b? how would I enter mu and sigma to make a normal distribution? It's all pretty opaque. Jake -------------- next part -------------- An HTML attachment was scrubbed... 
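For comparison, an example of the kind Jake is asking for could look roughly like this (a sketch, not the docstring SciPy ships): mu and sigma of a normal distribution enter through the generic loc and scale arguments.

```python
import numpy as np
from scipy.stats import norm

# A normal distribution with mean mu=3 and standard deviation sigma=5;
# mu and sigma map onto the generic loc/scale parameters.
rv = norm(loc=3, scale=5)

# Evaluate the pdf on a grid covering the central 98% of the mass,
# instead of relying on the opaque support bound rv.dist.b.
x = np.linspace(rv.ppf(0.01), rv.ppf(0.99), 100)
y = rv.pdf(x)
```

Plotting is then just plt.plot(x, y), with no need for numargs or the support attribute b at all.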
URL: From josef.pktd at gmail.com Fri Sep 14 17:01:35 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 14 Sep 2012 17:01:35 -0400 Subject: [SciPy-Dev] distributions.py In-Reply-To: References: Message-ID: On Fri, Sep 14, 2012 at 4:49 PM, Ralf Gommers wrote: > > > On Fri, Sep 14, 2012 at 12:48 AM, wrote: >> >> On Thu, Sep 13, 2012 at 5:21 PM, nicky van foreest >> wrote: >> > Hi, >> > >> > Now that I understand github (Thanks to Ralf for his explanations in >> > Dutch) and got some simple stuff out of the way in distributions.py I >> > would like to tackle a somewhat harder issue. The function argsreduce >> > is, as far as I can see, too generic. I did some tests to see whether >> > its most generic output, as described by its docstring, is actually >> > swallowed by the callers of argsreduce, but this appears not to be the >> > case. >> >> being generic is not a disadvantage (per se) if it's fast >> >> https://github.com/scipy/scipy/commit/4abdc10487d453b56f761598e8e013816b01a665 >> (and a being a one liner is not a disadvantage either) >> >> Josef >> >> > >> > My motivation to simplify the code in distributions.py (and clean it >> > up) is partly based on making it simpler to understand for myself, but >> > also to others. The fact that github makes code browsing a much nicer >> > experience, perhaps more people will take a look at what's under the >> > hood. But then the code should also be accessible and clean. Are there >> > any reasons not to pursue this path, and focus on more important >> > problems of the stats library? > > > Not sure that argsreduce is the best place to start (see Josef's reply), but > there should be things that can be done to make the code easier to read. 
For > example, this code is used in ~10 methods of rv_continuous: > > loc,scale=map(kwds.get,['loc','scale']) > args, loc, scale = self._fix_loc_scale(args, loc, scale) > x,loc,scale = map(asarray,(x,loc,scale)) > args = tuple(map(asarray,args)) > > Some refactoring may be in order. The same is true of the rest of the > implementation of many of those methods. Some are exactly the same except > for calls to the corresponding underscored method (example: logsf() and > logcdf() are identical except for calls to _logsf() and _logcdf(), and one > nonsensical multiplication). however when comparing across methods pdf, cdf, sf, ppf, (not with the log version) then there are small differences how bounds are handled, and the details can be tricky. Josef > > Ralf > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From ralf.gommers at gmail.com Sat Sep 15 05:47:07 2012 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 15 Sep 2012 11:47:07 +0200 Subject: [SciPy-Dev] distributions.py In-Reply-To: References: Message-ID: On Fri, Sep 14, 2012 at 11:01 PM, wrote: > On Fri, Sep 14, 2012 at 4:49 PM, Ralf Gommers > wrote: > > > > > > On Fri, Sep 14, 2012 at 12:48 AM, wrote: > >> > >> On Thu, Sep 13, 2012 at 5:21 PM, nicky van foreest < > vanforeest at gmail.com> > >> wrote: > >> > Hi, > >> > > >> > Now that I understand github (Thanks to Ralf for his explanations in > >> > Dutch) and got some simple stuff out of the way in distributions.py I > >> > would like to tackle a somewhat harder issue. The function argsreduce > >> > is, as far as I can see, too generic. I did some tests to see whether > >> > its most generic output, as described by its docstring, is actually > >> > swallowed by the callers of argsreduce, but this appears not to be the > >> > case. 
> >> > >> being generic is not a disadvantage (per se) if it's fast > >> > >> > https://github.com/scipy/scipy/commit/4abdc10487d453b56f761598e8e013816b01a665 > >> (and a being a one liner is not a disadvantage either) > >> > >> Josef > >> > >> > > >> > My motivation to simplify the code in distributions.py (and clean it > >> > up) is partly based on making it simpler to understand for myself, but > >> > also to others. The fact that github makes code browsing a much nicer > >> > experience, perhaps more people will take a look at what's under the > >> > hood. But then the code should also be accessible and clean. Are there > >> > any reasons not to pursue this path, and focus on more important > >> > problems of the stats library? > > > > > > Not sure that argsreduce is the best place to start (see Josef's reply), > but > > there should be things that can be done to make the code easier to read. > For > > example, this code is used in ~10 methods of rv_continuous: > > > > loc,scale=map(kwds.get,['loc','scale']) > > args, loc, scale = self._fix_loc_scale(args, loc, scale) > > x,loc,scale = map(asarray,(x,loc,scale)) > > args = tuple(map(asarray,args)) > > > > Some refactoring may be in order. The same is true of the rest of the > > implementation of many of those methods. Some are exactly the same except > > for calls to the corresponding underscored method (example: logsf() and > > logcdf() are identical except for calls to _logsf() and _logcdf(), and > one > > nonsensical multiplication). > > however when comparing across methods pdf, cdf, sf, ppf, (not with the > log version) then there are small differences how bounds are handled, > and the details can be tricky. > Right, and the way it's written it's very hard to figure out those differences. It would help if the common parts were refactored out, making the differences visible, and that comments were added to explain the differences. 
Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sat Sep 15 05:59:58 2012 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 15 Sep 2012 11:59:58 +0200 Subject: [SciPy-Dev] distributions.py In-Reply-To: <50539A12.3000101@astro.washington.edu> References: <50539A12.3000101@astro.washington.edu> Message-ID: On Fri, Sep 14, 2012 at 10:56 PM, Jake Vanderplas < vanderplas at astro.washington.edu> wrote: > On 09/14/2012 01:49 PM, Ralf Gommers wrote: > > > > On Fri, Sep 14, 2012 at 12:48 AM, wrote: > >> On Thu, Sep 13, 2012 at 5:21 PM, nicky van foreest >> wrote: >> > Hi, >> > >> > Now that I understand github (Thanks to Ralf for his explanations in >> > Dutch) and got some simple stuff out of the way in distributions.py I >> > would like to tackle a somewhat harder issue. The function argsreduce >> > is, as far as I can see, too generic. I did some tests to see whether >> > its most generic output, as described by its docstring, is actually >> > swallowed by the callers of argsreduce, but this appears not to be the >> > case. >> >> being generic is not a disadvantage (per se) if it's fast >> >> https://github.com/scipy/scipy/commit/4abdc10487d453b56f761598e8e013816b01a665 >> (and a being a one liner is not a disadvantage either) >> >> Josef >> >> > >> > My motivation to simplify the code in distributions.py (and clean it >> > up) is partly based on making it simpler to understand for myself, but >> > also to others. The fact that github makes code browsing a much nicer >> > experience, perhaps more people will take a look at what's under the >> > hood. But then the code should also be accessible and clean. Are there >> > any reasons not to pursue this path, and focus on more important >> > problems of the stats library? >> > > Not sure that argsreduce is the best place to start (see Josef's reply), > but there should be things that can be done to make the code easier to > read. 
For example, this code is used in ~10 methods of rv_continuous: > > loc,scale=map(kwds.get,['loc','scale']) > args, loc, scale = self._fix_loc_scale(args, loc, scale) > x,loc,scale = map(asarray,(x,loc,scale)) > args = tuple(map(asarray,args)) > > Some refactoring may be in order. The same is true of the rest of the > implementation of many of those methods. Some are exactly the same except > for calls to the corresponding underscored method (example: logsf() and > logcdf() are identical except for calls to _logsf() and _logcdf(), and one > nonsensical multiplication). > > Ralf > > > > _______________________________________________ > SciPy-Dev mailing listSciPy-Dev at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-dev > > I would say that the most important improvement needed in distributions > is in the documentation. > A new user would look at the doc string of, say, scipy.stats.norm, and have > no idea how to proceed. Here's the current example from the docstring of > scipy.stats.norm: > > Examples > -------- > >>> from scipy.stats import norm > >>> numargs = norm.numargs > >>> [ ] = [0.9,] * numargs > >>> rv = norm() > > >>> x = np.linspace(0, np.minimum(rv.dist.b, 3)) > >>> h = plt.plot(x, rv.pdf(x)) > > I don't even know what that means... and it doesn't compile. Also, what > is b? how would I enter mu and sigma to make a normal distribution? It's > all pretty opaque. > True, the examples are confusing. The reason is that they're generated from a template, and it's pretty much impossible to get clear and concise examples that way. It would be better to write custom examples for the most-used distributions, and refer to those from the others. Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From vanforeest at gmail.com Sat Sep 15 17:03:46 2012
From: vanforeest at gmail.com (nicky van foreest)
Date: Sat, 15 Sep 2012 23:03:46 +0200
Subject: [SciPy-Dev] distributions.py
In-Reply-To: 
References: <50539A12.3000101@astro.washington.edu>
Message-ID: 

Hi,

While reading distributions.py I made a kind of private trac list of
stuff that might need refactoring. As a matter of fact, all issues
discussed in the mails above are already on my list. To summarize
(please don't take the list below as a complaint, but just as factual;
I am very happy that all this exists):

1: the documentation is not clear, too concise, and fragmented;
actually a bit messy.

2: there is code overlap in the check work (the lines Ralf mentioned),
making it hard to find out the differences (but the differences in the
check work are method dependent, so I don't quite know how to tackle
that in an elegant way).

3: the docs say that _argcheck needs to be rewritten in case users
build their own distribution. But then the minimal requirement in my
opinion is that _argcheck is simple to understand, and not overly
generic as it is right now. (I also have examples where its output,
while in line with its docstring, results in errors.) As far as I can
see its core can simply be replaced by np.all(cond) (I did not test
this, though).

4: distributions.py is very big, too big for me actually. I recall
that my first attempt at finding out how the stats stuff worked was to
see how expon was implemented. No clue that this resided in
distributions.py.

What I would like to see, although that would require a considerable
amount of work, is an architecture like this:

1) rv_generic.py containing generic stuff;
2) rv_continuous.py and rv_discrete.py, each importing rv_generic;
3) each distribution covered in a separate file, like expon.py,
norm.py, etc., importing rv_continuous.py or rv_discrete.py, whichever
is appropriate. Each docstring can/should contain some generic part
(like now) and a specific part, with working examples and clear
explanations. The most important are normal, expon, binom, geom,
poisson, and perhaps some others. This would also enable others to
help extend the documentation and examples;
4) I would like to move the math parts of continuous.rst to the
docstring in the related distribution file. Since mathjax gives such
nice results on screen, there is also no reason not to include the
mathematical facts in the docstring of the distribution itself. In
fact, most (all?) distributions already have a short math description,
but this overlaps with continuous.rst.

I wouldn't mind chopping up distributions.py into the separate
distributions and merging it with the maths of continuous.rst. I can
tackle approximately one distribution per day, hence reduce this
mind-numbing work to roughly 15 minutes a day (correction work on
exams is much worse :-) ). But I don't know how much this proposal
will affect the automatic generation of documentation. For the rest I
don't think this will affect the code a lot.

Nicky

On 15 September 2012 11:59, Ralf Gommers wrote:
>
> On Fri, Sep 14, 2012 at 10:56 PM, Jake Vanderplas
> wrote:
>>
>> On 09/14/2012 01:49 PM, Ralf Gommers wrote:
>>
>> On Fri, Sep 14, 2012 at 12:48 AM, wrote:
>>>
>>> On Thu, Sep 13, 2012 at 5:21 PM, nicky van foreest
>>> wrote:
>>> > Hi,
>>> >
>>> > Now that I understand github (Thanks to Ralf for his explanations in
>>> > Dutch) and got some simple stuff out of the way in distributions.py I
>>> > would like to tackle a somewhat harder issue. The function argsreduce
>>> > is, as far as I can see, too generic. I did some tests to see whether
>>> > its most generic output, as described by its docstring, is actually
>>> > swallowed by the callers of argsreduce, but this appears not to be the
>>> > case.
>>> >>> being generic is not a disadvantage (per se) if it's fast >>> >>> https://github.com/scipy/scipy/commit/4abdc10487d453b56f761598e8e013816b01a665 >>> (and a being a one liner is not a disadvantage either) >>> >>> Josef >>> >>> > >>> > My motivation to simplify the code in distributions.py (and clean it >>> > up) is partly based on making it simpler to understand for myself, but >>> > also to others. The fact that github makes code browsing a much nicer >>> > experience, perhaps more people will take a look at what's under the >>> > hood. But then the code should also be accessible and clean. Are there >>> > any reasons not to pursue this path, and focus on more important >>> > problems of the stats library? >> >> >> Not sure that argsreduce is the best place to start (see Josef's reply), >> but there should be things that can be done to make the code easier to read. >> For example, this code is used in ~10 methods of rv_continuous: >> >> loc,scale=map(kwds.get,['loc','scale']) >> args, loc, scale = self._fix_loc_scale(args, loc, scale) >> x,loc,scale = map(asarray,(x,loc,scale)) >> args = tuple(map(asarray,args)) >> >> Some refactoring may be in order. The same is true of the rest of the >> implementation of many of those methods. Some are exactly the same except >> for calls to the corresponding underscored method (example: logsf() and >> logcdf() are identical except for calls to _logsf() and _logcdf(), and one >> nonsensical multiplication). >> >> Ralf >> >> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> I would say that the most important improvement needed in distributions is >> in the documentation. >> >> A new user would look at the doc string of, say, scipy.stats.norm, and >> have no idea how to proceed. 
Here's the current example from the docstring >> of scipy.stats.norm: >> >> Examples >> -------- >> >>> from scipy.stats import norm >> >>> numargs = norm.numargs >> >>> [ ] = [0.9,] * numargs >> >>> rv = norm() >> >> >>> x = np.linspace(0, np.minimum(rv.dist.b, 3)) >> >>> h = plt.plot(x, rv.pdf(x)) >> >> I don't even know what that means... and it doesn't compile. Also, what >> is b? how would I enter mu and sigma to make a normal distribution? It's >> all pretty opaque. > > > True, the examples are confusing. The reason is that they're generated from > a template, and it's pretty much impossible to get clear and concise > examples that way. It would be better to write custom examples for the > most-used distributions, and refer to those from the others. > > Ralf > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From josef.pktd at gmail.com Sat Sep 15 17:23:36 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 15 Sep 2012 17:23:36 -0400 Subject: [SciPy-Dev] distributions.py In-Reply-To: References: <50539A12.3000101@astro.washington.edu> Message-ID: On Sat, Sep 15, 2012 at 5:03 PM, nicky van foreest wrote: > Hi, > > While reading distributions.py I made a kind of private trac list, of > stuff that might need refactoring, As a matter of fact, all issues > discussed in the mails above are already on my list. To summarize > (Please don't take the list below as a complaint, but just factual. I > am very happy that all this exists.) > > 1: the documentation is not clear, too concise, and fragmented; > actually a bit messy. 
> > 2: there is code overlap in the check work (The lines Ralf mentioned) > making it hard to find out the differences (but the differences in the > check work are method dependent so I don't quite know how to tackle > that in an elegant way), > > 3: the docs say that _argscheck need to be rewritten in case users > build their own distribution. But then the minimal requirement in my > opinion is that argscheck is simple to understand, and not overly > generic as it is right now. (I also have examples that its output, > while in line with its doc string, results in errors.) As far as I can > see its core can simply be replaced by np.all(cond) (I did not test > this though). > > 4: distributions.py is very big, too big for me actually. I recall > that my first attempt at finding out how the stats stuff worked was to > see how expon was implemented. No clue that this resided in > distributions.py. > > What I would like to see, although that would require a considerable > amount of work, is an architecture like this. > 1 rv_generic.py containing generic stuff > 2) rv_continous.py and rv_discrete.py, each imports rv_generic. > 3) each distribution is covered in a separate file. like expon.py, > norm, py, etc, and imports rv_continuous.py or rv_discrete.py, > whatever appropriate. I think splitting into continuous and discrete is helpful. But I don't like splitting off the distributions, 90 files for distributions with 10 to 20 lines of real code each sounds a lot of files when we need to look for anything. Actually, I find the large file easy to use, using a search string, and it makes it easy to compare across distributions. Finding the generic parts can be difficult. Josef Each docstring can/should contain some generic > part (like now) and a specific part, with working examples, and clear > explanations. The most important are normal, expon, binom, geom, > poisson, and perhaps some others. This would also enable others to > help extend the documentation, examples.... 
> 4) I would like to move the math parts in continuous.rst to the doc > string in the related distribution file. Since mathjax gives such > nice results on screen, there is also no reason not to include the > mathematical facts in the doc string of the distribution itself. In > fact, most (all?) distributions already have a short math description, > but this is in overlap with continuous.rst. The main distinction for scipy usually is that docstrings should be readable in the interpreter as informative strings without being heavy on latex, while tutorial, and so on are mainly targeted to html. Josef > > I wouldn't mind chopping up distributions.py into the separate > distributions, and merge it with the maths of continuous.rst. I can > tackle approx one distribution per day roughly, hence reduce this > mind-numbing work to roughly 15 minutes a day (correction work on > exams is much worse :-) ). But I don't know how much this proposal > will affect the automatic generation of documentation. For the rest I > don't think this will affect the code a lot. > > > > NIcky > > > > > > On 15 September 2012 11:59, Ralf Gommers wrote: >> >> >> On Fri, Sep 14, 2012 at 10:56 PM, Jake Vanderplas >> wrote: >>> >>> On 09/14/2012 01:49 PM, Ralf Gommers wrote: >>> >>> >>> >>> On Fri, Sep 14, 2012 at 12:48 AM, wrote: >>>> >>>> On Thu, Sep 13, 2012 at 5:21 PM, nicky van foreest >>>> wrote: >>>> > Hi, >>>> > >>>> > Now that I understand github (Thanks to Ralf for his explanations in >>>> > Dutch) and got some simple stuff out of the way in distributions.py I >>>> > would like to tackle a somewhat harder issue. The function argsreduce >>>> > is, as far as I can see, too generic. I did some tests to see whether >>>> > its most generic output, as described by its docstring, is actually >>>> > swallowed by the callers of argsreduce, but this appears not to be the >>>> > case. 
>>>> >>>> being generic is not a disadvantage (per se) if it's fast >>>> >>>> https://github.com/scipy/scipy/commit/4abdc10487d453b56f761598e8e013816b01a665 >>>> (and a being a one liner is not a disadvantage either) >>>> >>>> Josef >>>> >>>> > >>>> > My motivation to simplify the code in distributions.py (and clean it >>>> > up) is partly based on making it simpler to understand for myself, but >>>> > also to others. The fact that github makes code browsing a much nicer >>>> > experience, perhaps more people will take a look at what's under the >>>> > hood. But then the code should also be accessible and clean. Are there >>>> > any reasons not to pursue this path, and focus on more important >>>> > problems of the stats library? >>> >>> >>> Not sure that argsreduce is the best place to start (see Josef's reply), >>> but there should be things that can be done to make the code easier to read. >>> For example, this code is used in ~10 methods of rv_continuous: >>> >>> loc,scale=map(kwds.get,['loc','scale']) >>> args, loc, scale = self._fix_loc_scale(args, loc, scale) >>> x,loc,scale = map(asarray,(x,loc,scale)) >>> args = tuple(map(asarray,args)) >>> >>> Some refactoring may be in order. The same is true of the rest of the >>> implementation of many of those methods. Some are exactly the same except >>> for calls to the corresponding underscored method (example: logsf() and >>> logcdf() are identical except for calls to _logsf() and _logcdf(), and one >>> nonsensical multiplication). >>> >>> Ralf >>> >>> >>> >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-dev >>> >>> I would say that the most important improvement needed in distributions is >>> in the documentation. >>> >>> A new user would look at the doc string of, say, scipy.stats.norm, and >>> have no idea how to proceed. 
Here's the current example from the docstring >>> of scipy.stats.norm: >>> >>> Examples >>> -------- >>> >>> from scipy.stats import norm >>> >>> numargs = norm.numargs >>> >>> [ ] = [0.9,] * numargs >>> >>> rv = norm() >>> >>> >>> x = np.linspace(0, np.minimum(rv.dist.b, 3)) >>> >>> h = plt.plot(x, rv.pdf(x)) >>> >>> I don't even know what that means... and it doesn't compile. Also, what >>> is b? how would I enter mu and sigma to make a normal distribution? It's >>> all pretty opaque. >> >> >> True, the examples are confusing. The reason is that they're generated from >> a template, and it's pretty much impossible to get clear and concise >> examples that way. It would be better to write custom examples for the >> most-used distributions, and refer to those from the others. >> >> Ralf >> >> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev From vanderplas at astro.washington.edu Sun Sep 16 11:58:59 2012 From: vanderplas at astro.washington.edu (Jake Vanderplas) Date: Sun, 16 Sep 2012 08:58:59 -0700 Subject: [SciPy-Dev] distributions.py In-Reply-To: References: <50539A12.3000101@astro.washington.edu> Message-ID: <5055F743.6030400@astro.washington.edu> Nicky, This is great - thanks so much for being willing to take it on! I will plan to help where I can. One comment: I like your idea of splitting the generic code to a separate file. But I'd hesitate to create a separate file for each distribution: that's a lot of files. In my opinion, a good compromise would be to create one file for continuous distributions, and one for discrete. All of this could be in a new "scipy.stats.distributions" submodule, for the sake of code organization. 
Also, I'd add one more item to your list: make sure all code is PEP8 compliant. Sometimes the PEP8 guidelines can seem a bit cumbersome, but they do make browsing and understanding code much easier. Thanks again for all your work on this - it's a very valuable contribution. Jake On 09/15/2012 02:03 PM, nicky van foreest wrote: > Hi, > > While reading distributions.py I made a kind of private trac list, of > stuff that might need refactoring, As a matter of fact, all issues > discussed in the mails above are already on my list. To summarize > (Please don't take the list below as a complaint, but just factual. I > am very happy that all this exists.) > > 1: the documentation is not clear, too concise, and fragmented; > actually a bit messy. > > 2: there is code overlap in the check work (The lines Ralf mentioned) > making it hard to find out the differences (but the differences in the > check work are method dependent so I don't quite know how to tackle > that in an elegant way), > > 3: the docs say that _argscheck need to be rewritten in case users > build their own distribution. But then the minimal requirement in my > opinion is that argscheck is simple to understand, and not overly > generic as it is right now. (I also have examples that its output, > while in line with its doc string, results in errors.) As far as I can > see its core can simply be replaced by np.all(cond) (I did not test > this though). > > 4: distributions.py is very big, too big for me actually. I recall > that my first attempt at finding out how the stats stuff worked was to > see how expon was implemented. No clue that this resided in > distributions.py. > > What I would like to see, although that would require a considerable > amount of work, is an architecture like this. > 1 rv_generic.py containing generic stuff > 2) rv_continous.py and rv_discrete.py, each imports rv_generic. > 3) each distribution is covered in a separate file. 
like expon.py, > norm, py, etc, and imports rv_continuous.py or rv_discrete.py, > whatever appropriate. Each docstring can/should contain some generic > part (like now) and a specific part, with working examples, and clear > explanations. The most important are normal, expon, binom, geom, > poisson, and perhaps some others. This would also enable others to > help extend the documentation, examples.... > 4) I would like to move the math parts in continuous.rst to the doc > string in the related distribution file. Since mathjax gives such > nice results on screen, there is also no reason not to include the > mathematical facts in the doc string of the distribution itself. In > fact, most (all?) distributions already have a short math description, > but this is in overlap with continuous.rst. > > I wouldn't mind chopping up distributions.py into the separate > distributions, and merge it with the maths of continuous.rst. I can > tackle approx one distribution per day roughly, hence reduce this > mind-numbing work to roughly 15 minutes a day (correction work on > exams is much worse :-) ). But I don't know how much this proposal > will affect the automatic generation of documentation. For the rest I > don't think this will affect the code a lot. > > > > NIcky > > > > > > On 15 September 2012 11:59, Ralf Gommers wrote: >> >> On Fri, Sep 14, 2012 at 10:56 PM, Jake Vanderplas >> wrote: >>> On 09/14/2012 01:49 PM, Ralf Gommers wrote: >>> >>> >>> >>> On Fri, Sep 14, 2012 at 12:48 AM, wrote: >>>> On Thu, Sep 13, 2012 at 5:21 PM, nicky van foreest >>>> wrote: >>>>> Hi, >>>>> >>>>> Now that I understand github (Thanks to Ralf for his explanations in >>>>> Dutch) and got some simple stuff out of the way in distributions.py I >>>>> would like to tackle a somewhat harder issue. The function argsreduce >>>>> is, as far as I can see, too generic. 
I did some tests to see whether >>>>> its most generic output, as described by its docstring, is actually >>>>> swallowed by the callers of argsreduce, but this appears not to be the >>>>> case. >>>> being generic is not a disadvantage (per se) if it's fast >>>> >>>> https://github.com/scipy/scipy/commit/4abdc10487d453b56f761598e8e013816b01a665 >>>> (and a being a one liner is not a disadvantage either) >>>> >>>> Josef >>>> >>>>> My motivation to simplify the code in distributions.py (and clean it >>>>> up) is partly based on making it simpler to understand for myself, but >>>>> also to others. The fact that github makes code browsing a much nicer >>>>> experience, perhaps more people will take a look at what's under the >>>>> hood. But then the code should also be accessible and clean. Are there >>>>> any reasons not to pursue this path, and focus on more important >>>>> problems of the stats library? >>> >>> Not sure that argsreduce is the best place to start (see Josef's reply), >>> but there should be things that can be done to make the code easier to read. >>> For example, this code is used in ~10 methods of rv_continuous: >>> >>> loc,scale=map(kwds.get,['loc','scale']) >>> args, loc, scale = self._fix_loc_scale(args, loc, scale) >>> x,loc,scale = map(asarray,(x,loc,scale)) >>> args = tuple(map(asarray,args)) >>> >>> Some refactoring may be in order. The same is true of the rest of the >>> implementation of many of those methods. Some are exactly the same except >>> for calls to the corresponding underscored method (example: logsf() and >>> logcdf() are identical except for calls to _logsf() and _logcdf(), and one >>> nonsensical multiplication). >>> >>> Ralf >>> >>> >>> >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-dev >>> >>> I would say that the most important improvement needed in distributions is >>> in the documentation. 
>>> >>> A new user would look at the doc string of, say, scipy.stats.norm, and >>> have no idea how to proceed. Here's the current example from the docstring >>> of scipy.stats.norm: >>> >>> Examples >>> -------- >>>>>> from scipy.stats import norm >>>>>> numargs = norm.numargs >>>>>> [ ] = [0.9,] * numargs >>>>>> rv = norm() >>>>>> x = np.linspace(0, np.minimum(rv.dist.b, 3)) >>>>>> h = plt.plot(x, rv.pdf(x)) >>> I don't even know what that means... and it doesn't compile. Also, what >>> is b? how would I enter mu and sigma to make a normal distribution? It's >>> all pretty opaque. >> >> True, the examples are confusing. The reason is that they're generated from >> a template, and it's pretty much impossible to get clear and concise >> examples that way. It would be better to write custom examples for the >> most-used distributions, and refer to those from the others. >> >> Ralf >> >> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev From vanforeest at gmail.com Sun Sep 16 14:17:33 2012 From: vanforeest at gmail.com (nicky van foreest) Date: Sun, 16 Sep 2012 20:17:33 +0200 Subject: [SciPy-Dev] distributions.py In-Reply-To: <5055F743.6030400@astro.washington.edu> References: <50539A12.3000101@astro.washington.edu> <5055F743.6030400@astro.washington.edu> Message-ID: > One comment: I like your idea of splitting the generic code to a > separate file. But I'd hesitate to create a separate file for each > distribution: that's a lot of files. In my opinion, a good compromise > would be to create one file for continuous distributions, and one for > discrete. All of this could be in a new "scipy.stats.distributions" > submodule, for the sake of code organization. I just responded to Josef. 
This proposal makes the most sense I guess. > > Also, I'd add one more item to your list: make sure all code is PEP8 > compliant. Sometimes the PEP8 guidelines can seem a bit cumbersome, but > they do make browsing and understanding code much easier. I'll check the pep8 documentation. I guess that improving the documentation is most important for the moment. Once this is done, we can go on with splitting distributions.py into two or three smaller files. Nicky > > Thanks again for all your work on this - it's a very valuable contribution. > Jake > > On 09/15/2012 02:03 PM, nicky van foreest wrote: >> Hi, >> >> While reading distributions.py I made a kind of private trac list, of >> stuff that might need refactoring, As a matter of fact, all issues >> discussed in the mails above are already on my list. To summarize >> (Please don't take the list below as a complaint, but just factual. I >> am very happy that all this exists.) >> >> 1: the documentation is not clear, too concise, and fragmented; >> actually a bit messy. >> >> 2: there is code overlap in the check work (The lines Ralf mentioned) >> making it hard to find out the differences (but the differences in the >> check work are method dependent so I don't quite know how to tackle >> that in an elegant way), >> >> 3: the docs say that _argscheck need to be rewritten in case users >> build their own distribution. But then the minimal requirement in my >> opinion is that argscheck is simple to understand, and not overly >> generic as it is right now. (I also have examples that its output, >> while in line with its doc string, results in errors.) As far as I can >> see its core can simply be replaced by np.all(cond) (I did not test >> this though). >> >> 4: distributions.py is very big, too big for me actually. I recall >> that my first attempt at finding out how the stats stuff worked was to >> see how expon was implemented. No clue that this resided in >> distributions.py. 
>> >> What I would like to see, although that would require a considerable >> amount of work, is an architecture like this. >> 1 rv_generic.py containing generic stuff >> 2) rv_continous.py and rv_discrete.py, each imports rv_generic. >> 3) each distribution is covered in a separate file. like expon.py, >> norm, py, etc, and imports rv_continuous.py or rv_discrete.py, >> whatever appropriate. Each docstring can/should contain some generic >> part (like now) and a specific part, with working examples, and clear >> explanations. The most important are normal, expon, binom, geom, >> poisson, and perhaps some others. This would also enable others to >> help extend the documentation, examples.... >> 4) I would like to move the math parts in continuous.rst to the doc >> string in the related distribution file. Since mathjax gives such >> nice results on screen, there is also no reason not to include the >> mathematical facts in the doc string of the distribution itself. In >> fact, most (all?) distributions already have a short math description, >> but this is in overlap with continuous.rst. >> >> I wouldn't mind chopping up distributions.py into the separate >> distributions, and merge it with the maths of continuous.rst. I can >> tackle approx one distribution per day roughly, hence reduce this >> mind-numbing work to roughly 15 minutes a day (correction work on >> exams is much worse :-) ). But I don't know how much this proposal >> will affect the automatic generation of documentation. For the rest I >> don't think this will affect the code a lot. 
>> >> >> >> NIcky >> >> >> >> >> >> On 15 September 2012 11:59, Ralf Gommers wrote: >>> >>> On Fri, Sep 14, 2012 at 10:56 PM, Jake Vanderplas >>> wrote: >>>> On 09/14/2012 01:49 PM, Ralf Gommers wrote: >>>> >>>> >>>> >>>> On Fri, Sep 14, 2012 at 12:48 AM, wrote: >>>>> On Thu, Sep 13, 2012 at 5:21 PM, nicky van foreest >>>>> wrote: >>>>>> Hi, >>>>>> >>>>>> Now that I understand github (Thanks to Ralf for his explanations in >>>>>> Dutch) and got some simple stuff out of the way in distributions.py I >>>>>> would like to tackle a somewhat harder issue. The function argsreduce >>>>>> is, as far as I can see, too generic. I did some tests to see whether >>>>>> its most generic output, as described by its docstring, is actually >>>>>> swallowed by the callers of argsreduce, but this appears not to be the >>>>>> case. >>>>> being generic is not a disadvantage (per se) if it's fast >>>>> >>>>> https://github.com/scipy/scipy/commit/4abdc10487d453b56f761598e8e013816b01a665 >>>>> (and a being a one liner is not a disadvantage either) >>>>> >>>>> Josef >>>>> >>>>>> My motivation to simplify the code in distributions.py (and clean it >>>>>> up) is partly based on making it simpler to understand for myself, but >>>>>> also to others. The fact that github makes code browsing a much nicer >>>>>> experience, perhaps more people will take a look at what's under the >>>>>> hood. But then the code should also be accessible and clean. Are there >>>>>> any reasons not to pursue this path, and focus on more important >>>>>> problems of the stats library? >>>> >>>> Not sure that argsreduce is the best place to start (see Josef's reply), >>>> but there should be things that can be done to make the code easier to read. 
>>>> For example, this code is used in ~10 methods of rv_continuous: >>>> >>>> loc,scale=map(kwds.get,['loc','scale']) >>>> args, loc, scale = self._fix_loc_scale(args, loc, scale) >>>> x,loc,scale = map(asarray,(x,loc,scale)) >>>> args = tuple(map(asarray,args)) >>>> >>>> Some refactoring may be in order. The same is true of the rest of the >>>> implementation of many of those methods. Some are exactly the same except >>>> for calls to the corresponding underscored method (example: logsf() and >>>> logcdf() are identical except for calls to _logsf() and _logcdf(), and one >>>> nonsensical multiplication). >>>> >>>> Ralf >>>> >>>> >>>> >>>> _______________________________________________ >>>> SciPy-Dev mailing list >>>> SciPy-Dev at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-dev >>>> >>>> I would say that the most important improvement needed in distributions is >>>> in the documentation. >>>> >>>> A new user would look at the doc string of, say, scipy.stats.norm, and >>>> have no idea how to proceed. Here's the current example from the docstring >>>> of scipy.stats.norm: >>>> >>>> Examples >>>> -------- >>>>>>> from scipy.stats import norm >>>>>>> numargs = norm.numargs >>>>>>> [ ] = [0.9,] * numargs >>>>>>> rv = norm() >>>>>>> x = np.linspace(0, np.minimum(rv.dist.b, 3)) >>>>>>> h = plt.plot(x, rv.pdf(x)) >>>> I don't even know what that means... and it doesn't compile. Also, what >>>> is b? how would I enter mu and sigma to make a normal distribution? It's >>>> all pretty opaque. >>> >>> True, the examples are confusing. The reason is that they're generated from >>> a template, and it's pretty much impossible to get clear and concise >>> examples that way. It would be better to write custom examples for the >>> most-used distributions, and refer to those from the others. 
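Along the lines Ralf suggests, a custom example for the normal distribution might look like the sketch below. This is only an illustration of the existing frozen-distribution API (mu and sigma map onto the `loc` and `scale` keywords), not a proposed docstring text:

```python
import numpy as np
from scipy.stats import norm

# What the generated example never says: a normal distribution with mean
# mu and standard deviation sigma is built via the loc and scale keywords.
# (The "b" in the template example is just the upper support bound, which
# is +inf for norm.)
mu, sigma = 2.0, 0.5
rv = norm(loc=mu, scale=sigma)   # "frozen" distribution

x = np.linspace(mu - 3 * sigma, mu + 3 * sigma, 101)
pdf = rv.pdf(x)                  # density values on the grid
p_mid = rv.cdf(mu)               # CDF at the mean is 0.5 by symmetry
```

Unlike the template-generated example, this compiles and answers the "how do I enter mu and sigma" question directly.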
>>> >>> Ralf >>> >>> >>> >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-dev >>> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev From warren.weckesser at enthought.com Sun Sep 16 14:34:04 2012 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Sun, 16 Sep 2012 13:34:04 -0500 Subject: [SciPy-Dev] distributions.py In-Reply-To: References: <50539A12.3000101@astro.washington.edu> <5055F743.6030400@astro.washington.edu> Message-ID: On Sun, Sep 16, 2012 at 1:17 PM, nicky van foreest wrote: > > One comment: I like your idea of splitting the generic code to a > > separate file. But I'd hesitate to create a separate file for each > > distribution: that's a lot of files. In my opinion, a good compromise > > would be to create one file for continuous distributions, and one for > > discrete. All of this could be in a new "scipy.stats.distributions" > > submodule, for the sake of code organization. > > I just responded to Josef. This proposal makes the most sense I guess. > > > > > Also, I'd add one more item to your list: make sure all code is PEP8 > > compliant. Sometimes the PEP8 guidelines can seem a bit cumbersome, but > > they do make browsing and understanding code much easier. > > I'll check the pep8 documentation. > > I guess that improving the documentation is most important for the > moment. Once this is done, we can go on with splitting > distributions.py into two or three smaller files. > > FWIW, I'm strongly in favor of the following: * Split distributions.py into three pieces (generic, discrete, continuous). 
* Fix the screwy docstrings of the distributions (see the example Jake showed in a previous email). * PEP8. (Use the pep8 program to check the code. I just got 1884 "errors" when I ran 'pep8 --repeat distributions.py | wc -l'.) Warren Nicky > > > > > Thanks again for all your work on this - it's a very valuable > contribution. > > Jake > > > > On 09/15/2012 02:03 PM, nicky van foreest wrote: > >> Hi, > >> > >> While reading distributions.py I made a kind of private trac list, of > >> stuff that might need refactoring, As a matter of fact, all issues > >> discussed in the mails above are already on my list. To summarize > >> (Please don't take the list below as a complaint, but just factual. I > >> am very happy that all this exists.) > >> > >> 1: the documentation is not clear, too concise, and fragmented; > >> actually a bit messy. > >> > >> 2: there is code overlap in the check work (The lines Ralf mentioned) > >> making it hard to find out the differences (but the differences in the > >> check work are method dependent so I don't quite know how to tackle > >> that in an elegant way), > >> > >> 3: the docs say that _argscheck need to be rewritten in case users > >> build their own distribution. But then the minimal requirement in my > >> opinion is that argscheck is simple to understand, and not overly > >> generic as it is right now. (I also have examples that its output, > >> while in line with its doc string, results in errors.) As far as I can > >> see its core can simply be replaced by np.all(cond) (I did not test > >> this though). > >> > >> 4: distributions.py is very big, too big for me actually. I recall > >> that my first attempt at finding out how the stats stuff worked was to > >> see how expon was implemented. No clue that this resided in > >> distributions.py. > >> > >> What I would like to see, although that would require a considerable > >> amount of work, is an architecture like this. 
> >> 1 rv_generic.py containing generic stuff > >> 2) rv_continous.py and rv_discrete.py, each imports rv_generic. > >> 3) each distribution is covered in a separate file. like expon.py, > >> norm, py, etc, and imports rv_continuous.py or rv_discrete.py, > >> whatever appropriate. Each docstring can/should contain some generic > >> part (like now) and a specific part, with working examples, and clear > >> explanations. The most important are normal, expon, binom, geom, > >> poisson, and perhaps some others. This would also enable others to > >> help extend the documentation, examples.... > >> 4) I would like to move the math parts in continuous.rst to the doc > >> string in the related distribution file. Since mathjax gives such > >> nice results on screen, there is also no reason not to include the > >> mathematical facts in the doc string of the distribution itself. In > >> fact, most (all?) distributions already have a short math description, > >> but this is in overlap with continuous.rst. > >> > >> I wouldn't mind chopping up distributions.py into the separate > >> distributions, and merge it with the maths of continuous.rst. I can > >> tackle approx one distribution per day roughly, hence reduce this > >> mind-numbing work to roughly 15 minutes a day (correction work on > >> exams is much worse :-) ). But I don't know how much this proposal > >> will affect the automatic generation of documentation. For the rest I > >> don't think this will affect the code a lot. 
> >> > >> > >> > >> NIcky > >> > >> > >> > >> > >> > >> On 15 September 2012 11:59, Ralf Gommers > wrote: > >>> > >>> On Fri, Sep 14, 2012 at 10:56 PM, Jake Vanderplas > >>> wrote: > >>>> On 09/14/2012 01:49 PM, Ralf Gommers wrote: > >>>> > >>>> > >>>> > >>>> On Fri, Sep 14, 2012 at 12:48 AM, wrote: > >>>>> On Thu, Sep 13, 2012 at 5:21 PM, nicky van foreest < > vanforeest at gmail.com> > >>>>> wrote: > >>>>>> Hi, > >>>>>> > >>>>>> Now that I understand github (Thanks to Ralf for his explanations in > >>>>>> Dutch) and got some simple stuff out of the way in distributions.py > I > >>>>>> would like to tackle a somewhat harder issue. The function > argsreduce > >>>>>> is, as far as I can see, too generic. I did some tests to see > whether > >>>>>> its most generic output, as described by its docstring, is actually > >>>>>> swallowed by the callers of argsreduce, but this appears not to be > the > >>>>>> case. > >>>>> being generic is not a disadvantage (per se) if it's fast > >>>>> > >>>>> > https://github.com/scipy/scipy/commit/4abdc10487d453b56f761598e8e013816b01a665 > >>>>> (and a being a one liner is not a disadvantage either) > >>>>> > >>>>> Josef > >>>>> > >>>>>> My motivation to simplify the code in distributions.py (and clean it > >>>>>> up) is partly based on making it simpler to understand for myself, > but > >>>>>> also to others. The fact that github makes code browsing a much > nicer > >>>>>> experience, perhaps more people will take a look at what's under the > >>>>>> hood. But then the code should also be accessible and clean. Are > there > >>>>>> any reasons not to pursue this path, and focus on more important > >>>>>> problems of the stats library? > >>>> > >>>> Not sure that argsreduce is the best place to start (see Josef's > reply), > >>>> but there should be things that can be done to make the code easier > to read. 
> >>>> For example, this code is used in ~10 methods of rv_continuous: > >>>> > >>>> loc,scale=map(kwds.get,['loc','scale']) > >>>> args, loc, scale = self._fix_loc_scale(args, loc, scale) > >>>> x,loc,scale = map(asarray,(x,loc,scale)) > >>>> args = tuple(map(asarray,args)) > >>>> > >>>> Some refactoring may be in order. The same is true of the rest of the > >>>> implementation of many of those methods. Some are exactly the same > except > >>>> for calls to the corresponding underscored method (example: logsf() > and > >>>> logcdf() are identical except for calls to _logsf() and _logcdf(), > and one > >>>> nonsensical multiplication). > >>>> > >>>> Ralf > >>>> > >>>> > >>>> > >>>> _______________________________________________ > >>>> SciPy-Dev mailing list > >>>> SciPy-Dev at scipy.org > >>>> http://mail.scipy.org/mailman/listinfo/scipy-dev > >>>> > >>>> I would say that the most important improvement needed in > distributions is > >>>> in the documentation. > >>>> > >>>> A new user would look at the doc string of, say, scipy.stats.norm, and > >>>> have no idea how to proceed. Here's the current example from the > docstring > >>>> of scipy.stats.norm: > >>>> > >>>> Examples > >>>> -------- > >>>>>>> from scipy.stats import norm > >>>>>>> numargs = norm.numargs > >>>>>>> [ ] = [0.9,] * numargs > >>>>>>> rv = norm() > >>>>>>> x = np.linspace(0, np.minimum(rv.dist.b, 3)) > >>>>>>> h = plt.plot(x, rv.pdf(x)) > >>>> I don't even know what that means... and it doesn't compile. Also, > what > >>>> is b? how would I enter mu and sigma to make a normal distribution? > It's > >>>> all pretty opaque. > >>> > >>> True, the examples are confusing. The reason is that they're generated > from > >>> a template, and it's pretty much impossible to get clear and concise > >>> examples that way. It would be better to write custom examples for the > >>> most-used distributions, and refer to those from the others. 
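One way the four repeated lines could be factored out is sketched below. The helper name `_promote_args` and the defaults loc=0, scale=1 are assumptions for illustration only; this is not scipy's actual API, and a real refactoring would also have to fold in the validation done by `_fix_loc_scale`:

```python
import numpy as np

def _promote_args(x, args, kwds):
    """Hypothetical helper: collect the loc/scale extraction and the
    asarray conversions that are currently pasted into ~10 methods of
    rv_continuous into a single place."""
    loc = np.asarray(kwds.get('loc', 0.0))
    scale = np.asarray(kwds.get('scale', 1.0))
    x = np.asarray(x)
    args = tuple(np.asarray(a) for a in args)
    return x, args, loc, scale

# Each method body would then start with one call instead of four lines:
x, args, loc, scale = _promote_args([0.5, 1.5], (2.0,), {'scale': 3.0})
```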
> >>> > >>> Ralf > >>> > >>> > >>> > >>> _______________________________________________ > >>> SciPy-Dev mailing list > >>> SciPy-Dev at scipy.org > >>> http://mail.scipy.org/mailman/listinfo/scipy-dev > >>> > >> _______________________________________________ > >> SciPy-Dev mailing list > >> SciPy-Dev at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-dev > > > > > > _______________________________________________ > > SciPy-Dev mailing list > > SciPy-Dev at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-dev > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sun Sep 16 14:39:25 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 16 Sep 2012 14:39:25 -0400 Subject: [SciPy-Dev] distributions.py In-Reply-To: References: <50539A12.3000101@astro.washington.edu> Message-ID: On Sat, Sep 15, 2012 at 5:03 PM, nicky van foreest wrote: > Hi, > > While reading distributions.py I made a kind of private trac list, of > stuff that might need refactoring, As a matter of fact, all issues > discussed in the mails above are already on my list. To summarize > (Please don't take the list below as a complaint, but just factual. I > am very happy that all this exists.) > > 1: the documentation is not clear, too concise, and fragmented; > actually a bit messy. > > 2: there is code overlap in the check work (The lines Ralf mentioned) > making it hard to find out the differences (but the differences in the > check work are method dependent so I don't quite know how to tackle > that in an elegant way), I looked in the past at how the conditions are built, and I gave up trying to unify them after a short time.
pdf is zero outside of support
cdf, sf are zero or one outside of support
ppf, isf produce nan if not in [0,1]
boundary points are either included or treated explicitly
all produce nan if the shape parameter is invalid.

Reading the conditions for all corner cases might cause headaches :) > > 3: the docs say that _argscheck need to be rewritten in case users > build their own distribution. But then the minimal requirement in my > opinion is that argscheck is simple to understand, and not overly > generic as it is right now. (I also have examples that its output, > while in line with its doc string, results in errors.) As far as I can > see its core can simply be replaced by np.all(cond) (I did not test > this though). np.all(cond) will not work. From the code comment: "Returns condition array of 1's where arguments are correct and 0's where they are not." _argcheck is an *elementwise* check for valid parameters. Furthermore, in some cases _argcheck needs to set a, b if those depend on shape parameters (no ``def __init__()``). Josef > > 4: distributions.py is very big, too big for me actually. I recall > that my first attempt at finding out how the stats stuff worked was to > see how expon was implemented. No clue that this resided in > distributions.py. > > What I would like to see, although that would require a considerable > amount of work, is an architecture like this. > 1 rv_generic.py containing generic stuff > 2) rv_continous.py and rv_discrete.py, each imports rv_generic. > 3) each distribution is covered in a separate file. like expon.py, > norm, py, etc, and imports rv_continuous.py or rv_discrete.py, > whatever appropriate. Each docstring can/should contain some generic > part (like now) and a specific part, with working examples, and clear > explanations. The most important are normal, expon, binom, geom, > poisson, and perhaps some others. This would also enable others to > help extend the documentation, examples....
> 4) I would like to move the math parts in continuous.rst to the doc > string in the related distribution file. Since mathjax gives such > nice results on screen, there is also no reason not to include the > mathematical facts in the doc string of the distribution itself. In > fact, most (all?) distributions already have a short math description, > but this is in overlap with continuous.rst. > > I wouldn't mind chopping up distributions.py into the separate > distributions, and merge it with the maths of continuous.rst. I can > tackle approx one distribution per day roughly, hence reduce this > mind-numbing work to roughly 15 minutes a day (correction work on > exams is much worse :-) ). But I don't know how much this proposal > will affect the automatic generation of documentation. For the rest I > don't think this will affect the code a lot. > > > > NIcky > > > > > > On 15 September 2012 11:59, Ralf Gommers wrote: >> >> >> On Fri, Sep 14, 2012 at 10:56 PM, Jake Vanderplas >> wrote: >>> >>> On 09/14/2012 01:49 PM, Ralf Gommers wrote: >>> >>> >>> >>> On Fri, Sep 14, 2012 at 12:48 AM, wrote: >>>> >>>> On Thu, Sep 13, 2012 at 5:21 PM, nicky van foreest >>>> wrote: >>>> > Hi, >>>> > >>>> > Now that I understand github (Thanks to Ralf for his explanations in >>>> > Dutch) and got some simple stuff out of the way in distributions.py I >>>> > would like to tackle a somewhat harder issue. The function argsreduce >>>> > is, as far as I can see, too generic. I did some tests to see whether >>>> > its most generic output, as described by its docstring, is actually >>>> > swallowed by the callers of argsreduce, but this appears not to be the >>>> > case. 
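Josef's point about the elementwise contract can be seen with a toy `_argcheck` for a distribution whose shape parameter must satisfy a > 0. This is an illustrative stand-in, not scipy's code:

```python
import numpy as np

def _argcheck_toy(a):
    # Elementwise validity mask, per the convention quoted from the code
    # comment: 1 (True) where the shape parameter is valid, 0 (False)
    # where it is not.
    a = np.asarray(a)
    return a > 0

cond = _argcheck_toy(np.array([2.0, -1.0, 0.5]))
# cond marks only the middle entry as invalid.  Collapsing it with
# np.all(cond) would reject the whole vectorized call and lose the
# per-element information needed to put nan only in the invalid slots.
```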
>>>> >>>> being generic is not a disadvantage (per se) if it's fast >>>> >>>> https://github.com/scipy/scipy/commit/4abdc10487d453b56f761598e8e013816b01a665 >>>> (and a being a one liner is not a disadvantage either) >>>> >>>> Josef >>>> >>>> > >>>> > My motivation to simplify the code in distributions.py (and clean it >>>> > up) is partly based on making it simpler to understand for myself, but >>>> > also to others. The fact that github makes code browsing a much nicer >>>> > experience, perhaps more people will take a look at what's under the >>>> > hood. But then the code should also be accessible and clean. Are there >>>> > any reasons not to pursue this path, and focus on more important >>>> > problems of the stats library? >>> >>> >>> Not sure that argsreduce is the best place to start (see Josef's reply), >>> but there should be things that can be done to make the code easier to read. >>> For example, this code is used in ~10 methods of rv_continuous: >>> >>> loc,scale=map(kwds.get,['loc','scale']) >>> args, loc, scale = self._fix_loc_scale(args, loc, scale) >>> x,loc,scale = map(asarray,(x,loc,scale)) >>> args = tuple(map(asarray,args)) >>> >>> Some refactoring may be in order. The same is true of the rest of the >>> implementation of many of those methods. Some are exactly the same except >>> for calls to the corresponding underscored method (example: logsf() and >>> logcdf() are identical except for calls to _logsf() and _logcdf(), and one >>> nonsensical multiplication). >>> >>> Ralf >>> >>> >>> >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-dev >>> >>> I would say that the most important improvement needed in distributions is >>> in the documentation. >>> >>> A new user would look at the doc string of, say, scipy.stats.norm, and >>> have no idea how to proceed. 
Here's the current example from the docstring >>> of scipy.stats.norm: >>> >>> Examples >>> -------- >>> >>> from scipy.stats import norm >>> >>> numargs = norm.numargs >>> >>> [ ] = [0.9,] * numargs >>> >>> rv = norm() >>> >>> >>> x = np.linspace(0, np.minimum(rv.dist.b, 3)) >>> >>> h = plt.plot(x, rv.pdf(x)) >>> >>> I don't even know what that means... and it doesn't compile. Also, what >>> is b? how would I enter mu and sigma to make a normal distribution? It's >>> all pretty opaque. >> >> >> True, the examples are confusing. The reason is that they're generated from >> a template, and it's pretty much impossible to get clear and concise >> examples that way. It would be better to write custom examples for the >> most-used distributions, and refer to those from the others. >> >> Ralf >> >> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev From fboulogne at sciunto.org Sun Sep 16 15:11:49 2012 From: fboulogne at sciunto.org (=?ISO-8859-1?Q?Fran=E7ois_Boulogne?=) Date: Sun, 16 Sep 2012 21:11:49 +0200 Subject: [SciPy-Dev] distributions.py In-Reply-To: References: <50539A12.3000101@astro.washington.edu> <5055F743.6030400@astro.washington.edu> Message-ID: <50562475.1010204@sciunto.org> Le 16/09/2012 20:34, Warren Weckesser a écrit : > > > > FWIW, I'm strongly in favor of the following: > * Split distributions.py into three pieces (generic, discrete, > continuous). > * Fix the screwy docstrings of the distributions (see the example Jake > showed in a previous email). > * PEP8. (Use the pep8 program to check the code. I just got 1884 > "errors" when I ran 'pep8 --repeat distributions.py | wc -l'.) FYI, I just made a pull request on a docstring. :) https://github.com/scipy/scipy/pull/312 -- François Boulogne.
https://www.sciunto.org From vanforeest at gmail.com Sun Sep 16 15:28:42 2012 From: vanforeest at gmail.com (nicky van foreest) Date: Sun, 16 Sep 2012 21:28:42 +0200 Subject: [SciPy-Dev] distributions.py In-Reply-To: References: <50539A12.3000101@astro.washington.edu> Message-ID: Hi Josef, > I think splitting into continuous and discrete is helpful. > > But I don't like splitting off the distributions, 90 files for > distributions with 10 to 20 lines of real code each sounds like a lot of > files when we need to look for anything. > > Actually, I find the large file easy to use, using a search string, > and it makes it easy to compare across distributions. Finding the > generic parts can be difficult. Ok. This makes sense to me too. > > Josef > > Each docstring can/should contain some generic >> part (like now) and a specific part, with working examples, and clear >> explanations. The most important are normal, expon, binom, geom, >> poisson, and perhaps some others. This would also enable others to >> help extend the documentation, examples.... >> 4) I would like to move the math parts in continuous.rst to the doc >> string in the related distribution file. Since mathjax gives such >> nice results on screen, there is also no reason not to include the >> mathematical facts in the doc string of the distribution itself. In >> fact, most (all?) distributions already have a short math description, >> but this is in overlap with continuous.rst. > > The main distinction for scipy usually is that docstrings should be > readable in the interpreter as informative strings without being heavy > on latex, while the tutorial and so on are mainly targeted at html. I forgot about reading the docstrings in ipython for instance. You're right. Nicky > > Josef > >> >> I wouldn't mind chopping up distributions.py into the separate >> distributions, and merge it with the maths of continuous.rst.
I can >> tackle approx one distribution per day roughly, hence reduce this >> mind-numbing work to roughly 15 minutes a day (correction work on >> exams is much worse :-) ). But I don't know how much this proposal >> will affect the automatic generation of documentation. For the rest I >> don't think this will affect the code a lot. >> >> NIcky From vanforeest at gmail.com Sun Sep 16 15:33:29 2012 From: vanforeest at gmail.com (nicky van foreest) Date: Sun, 16 Sep 2012 21:33:29 +0200 Subject: [SciPy-Dev] distributions.py In-Reply-To: References: <50539A12.3000101@astro.washington.edu> <5055F743.6030400@astro.washington.edu> Message-ID: On 16 September 2012 20:34, Warren Weckesser wrote: > > > On Sun, Sep 16, 2012 at 1:17 PM, nicky van foreest > wrote: >> >> > One comment: I like your idea of splitting the generic code to a >> > separate file. But I'd hesitate to create a separate file for each >> > distribution: that's a lot of files. In my opinion, a good compromise >> > would be to create one file for continuous distributions, and one for >> > discrete. All of this could be in a new "scipy.stats.distributions" >> > submodule, for the sake of code organization. >> >> I just responded to Josef. This proposal makes the most sense I guess. >> >> > >> > Also, I'd add one more item to your list: make sure all code is PEP8 >> > compliant. Sometimes the PEP8 guidelines can seem a bit cumbersome, but >> > they do make browsing and understanding code much easier. >> >> I'll check the pep8 documentation. >> >> I guess that improving the documentation is most important for the >> moment. Once this is done, we can go on with splitting >> distributions.py into two or three smaller files.
>> > > > FWIW, I'm strongly in favor of the following: > * Split distributions.py into three pieces (generic, discrete, continuous). I'll put that in my local to do list. > * Fix the screwy docstrings of the distributions (see the example Jake > showed in a previous email). As said in the mail to Jake, I think it is best to update the docstrings first. That confused me the most when I started using scipy.stats. > * PEP8. (Use the pep8 program to check the code. I just got 1884 "errors" > when I ran 'pep8 --repeat distributions.py | wc -l'.) I test the code I contribute with pep8.py. Nicky > > Warren > > >> Nicky >> >> > >> > Thanks again for all your work on this - it's a very valuable >> > contribution. >> > Jake >> > >> > On 09/15/2012 02:03 PM, nicky van foreest wrote: >> >> Hi, >> >> >> >> While reading distributions.py I made a kind of private trac list of >> >> stuff that might need refactoring. As a matter of fact, all issues >> >> discussed in the mails above are already on my list. To summarize >> >> (Please don't take the list below as a complaint, but just factual. I >> >> am very happy that all this exists.) >> >> >> >> 1: the documentation is not clear, too concise, and fragmented; >> >> actually a bit messy. >> >> >> >> 2: there is code overlap in the check work (The lines Ralf mentioned) >> >> making it hard to find out the differences (but the differences in the >> >> check work are method dependent so I don't quite know how to tackle >> >> that in an elegant way), >> >> >> >> 3: the docs say that _argcheck needs to be rewritten in case users >> >> build their own distribution. But then the minimal requirement in my >> >> opinion is that _argcheck is simple to understand, and not overly >> >> generic as it is right now. (I also have examples that its output, >> >> while in line with its doc string, results in errors.) As far as I can >> >> see its core can simply be replaced by np.all(cond) (I did not test >> >> this though).
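The np.all(cond) suggestion in item 3 above can be probed with a short sketch; as josef points out later in the thread, _argcheck returns an *elementwise* condition array, and collapsing it with np.all() would lose the per-element information. The array `a` below is a hypothetical shape-parameter input, not code taken from scipy:

```python
import numpy as np

# Hypothetical shape-parameter input; the constraint (as for gamma) is
# a > 0, checked per element, mirroring what _argcheck does.
a = np.array([0.5, -1.0, 2.0])

cond = a > 0              # elementwise: array([ True, False,  True])
collapsed = np.all(cond)  # a single False -- per-element info is gone

# With the elementwise array, invalid entries can be masked to nan while
# the valid ones are still evaluated; np.all would reject the whole call.
result = np.where(cond, np.exp(-a), np.nan)
print(result)  # [~0.6065, nan, ~0.1353]
```

This illustrates why the elementwise form matters: a single call can mix valid and invalid parameter values, and only the invalid positions should come back as nan.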
>> >> >> >> 4: distributions.py is very big, too big for me actually. I recall >> >> that my first attempt at finding out how the stats stuff worked was to >> >> see how expon was implemented. No clue that this resided in >> >> distributions.py. >> >> >> >> What I would like to see, although that would require a considerable >> >> amount of work, is an architecture like this. >> >> 1) rv_generic.py containing generic stuff >> >> 2) rv_continuous.py and rv_discrete.py, each imports rv_generic. >> >> 3) each distribution is covered in a separate file, like expon.py, >> >> norm.py, etc., and imports rv_continuous.py or rv_discrete.py, >> >> whatever appropriate. Each docstring can/should contain some generic >> >> part (like now) and a specific part, with working examples, and clear >> >> explanations. The most important are normal, expon, binom, geom, >> >> poisson, and perhaps some others. This would also enable others to >> >> help extend the documentation, examples.... >> >> 4) I would like to move the math parts in continuous.rst to the doc >> >> string in the related distribution file. Since mathjax gives such >> >> nice results on screen, there is also no reason not to include the >> >> mathematical facts in the doc string of the distribution itself. In >> >> fact, most (all?) distributions already have a short math description, >> >> but this is in overlap with continuous.rst. >> >> >> >> I wouldn't mind chopping up distributions.py into the separate >> >> distributions, and merge it with the maths of continuous.rst. I can >> >> tackle approx one distribution per day roughly, hence reduce this >> >> mind-numbing work to roughly 15 minutes a day (correction work on >> >> exams is much worse :-) ). But I don't know how much this proposal >> >> will affect the automatic generation of documentation. For the rest I >> >> don't think this will affect the code a lot.
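The per-distribution layout sketched above can already be mimicked with scipy's public subclassing API: each file would define one subclass of rv_continuous (or rv_discrete) and override only the underscored methods. A minimal sketch, with made-up names (expon_gen, expon_sketch) rather than the actual contents of distributions.py:

```python
import numpy as np
from scipy.stats import rv_continuous

# One distribution per file: subclass rv_continuous and override only the
# underscored methods; the generic machinery supplies the rest.
class expon_gen(rv_continuous):
    def _pdf(self, x):
        return np.exp(-x)

# a=0.0 sets the lower bound of the support.
expon_sketch = expon_gen(a=0.0, name='expon_sketch')

# cdf, ppf, rvs, and loc/scale handling are all inherited generically:
print(expon_sketch.cdf(1.0))  # ~0.6321, i.e. 1 - exp(-1)
print(expon_sketch.ppf(0.5))  # ~0.6931, i.e. ln(2)
```

The inherited cdf here is computed by numerical integration of _pdf, and ppf by root-finding, which is exactly why a per-distribution file can stay at 10-20 lines of real code.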
>> >> NIcky From vanforeest at gmail.com Sun Sep 16 15:36:33 2012 From: vanforeest at gmail.com (nicky van foreest) Date: Sun, 16 Sep 2012 21:36:33 +0200 Subject: [SciPy-Dev] distributions.py In-Reply-To: References: <50539A12.3000101@astro.washington.edu> Message-ID: > I looked in the past how the conditions are build, and I gave up > trying to unify them after a short time. > pdf is zero outside of support > cdf, sf is zero or one outside of support > ppf, isf produces nan if not in [0,1] > > boundary points are either included or treated explicitly > all produce nan if shape parameter is invalid. > > reading the conditions for all corner cases might cause headaches :) The more I think about it, the more I tend to agree. There are many distributions, with lots of properties. However, there may be some simple lines that the conditions have in common.
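The boundary conventions listed in josef's mail can be checked directly in an interpreter; the behavior below is what a recent scipy.stats exhibits (values assumed, not quoted from the thread):

```python
import numpy as np
from scipy.stats import expon, gamma

# pdf is zero outside the support (expon lives on [0, inf)):
print(expon.pdf(-1.0))   # 0.0
# cdf and sf are clamped to 0 or 1 outside the support:
print(expon.cdf(-1.0))   # 0.0
print(expon.sf(-1.0))    # 1.0
# ppf and isf produce nan for arguments outside [0, 1]:
print(expon.ppf(1.5))    # nan
# an invalid shape parameter produces nan (gamma requires a > 0):
print(gamma.pdf(1.0, -1.0))  # nan
```

Each line corresponds to one of the conventions in the list above, which is why unifying the condition code across methods is harder than it first looks.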
I'll check this after I completed work on the documentation, and the split of distributions.py. >> 3: the docs say that _argscheck need to be rewritten in case users >> build their own distribution. But then the minimal requirement in my >> opinion is that argscheck is simple to understand, and not overly >> generic as it is right now. (I also have examples that its output, >> while in line with its doc string, results in errors.) As far as I can >> see its core can simply be replaced by np.all(cond) (I did not test >> this though). > > np.all(cond) will not work > > from code comment: > "Returns condition array of 1's where arguments are correct and 0's > where they are not." > > _argcheck is *elementwise* check for valid parameters > furthermore, in some cases _argcheck needs to set a, b if those depend > on shape parameters. > > > no ``def __init__()`` > I'll save this comment in my todo list, and turn to it later. Nicky > Josef > >> >> 4: distributions.py is very big, too big for me actually. I recall >> that my first attempt at finding out how the stats stuff worked was to >> see how expon was implemented. No clue that this resided in >> distributions.py. >> >> What I would like to see, although that would require a considerable >> amount of work, is an architecture like this. >> 1 rv_generic.py containing generic stuff >> 2) rv_continous.py and rv_discrete.py, each imports rv_generic. >> 3) each distribution is covered in a separate file. like expon.py, >> norm, py, etc, and imports rv_continuous.py or rv_discrete.py, >> whatever appropriate. Each docstring can/should contain some generic >> part (like now) and a specific part, with working examples, and clear >> explanations. The most important are normal, expon, binom, geom, >> poisson, and perhaps some others. This would also enable others to >> help extend the documentation, examples.... >> 4) I would like to move the math parts in continuous.rst to the doc >> string in the related distribution file. 
Since mathjax gives such >> nice results on screen, there is also no reason not to include the >> mathematical facts in the doc string of the distribution itself. In >> fact, most (all?) distributions already have a short math description, >> but this is in overlap with continuous.rst. From josef.pktd at gmail.com Sun Sep 16 15:37:23 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 16 Sep 2012 15:37:23 -0400 Subject: [SciPy-Dev] distributions.py In-Reply-To: References: <50539A12.3000101@astro.washington.edu> <5055F743.6030400@astro.washington.edu> Message-ID: On Sun, Sep 16, 2012 at 3:33 PM, nicky van foreest wrote: > On 16 September 2012 20:34, Warren Weckesser > wrote: >> >> >> On Sun, Sep 16, 2012 at 1:17 PM, nicky van foreest >> wrote: >>> >>> > One comment: I like your idea of splitting the generic code to a >>> > separate file. But I'd hesitate to create a separate file for each >>> > distribution: that's a lot of files.
In my opinion, a good compromise >>> > would be to create one file for continuous distributions, and one for >>> > discrete. All of this could be in a new "scipy.stats.distributions" >>> > submodule, for the sake of code organization. >>> >>> I just responded to Josef. This proposal makes the most sense I guess. >>> >>> > >>> > Also, I'd add one more item to your list: make sure all code is PEP8 >>> > compliant. Sometimes the PEP8 guidelines can seem a bit cumbersome, but >>> > they do make browsing and understanding code much easier. >>> >>> I'll check the pep8 documentation. >>> >>> I guess that improving the documentation is most important for the >>> moment. Once this is done, we can go on with splitting >>> distributions.py into two or three smaller files. >>> >> >> >> FWIW, I'm strongly in favor of the following: >> * Split distributions.py into three pieces (generic, discrete, continuous). > > I'll put that in my local to do list. > >> * Fix the screwy docstrings of the distributions (see the example Jake >> showed in a previous email). > > As said in the mail to Jake, I think it is best to update the > docstrings first. That confused me the most when I started using > scipy.stats the most. One related old idea (never implemented): Splitting up the distributions pdf docs in tutorial into separate pages for individual distributions, make them nicer with code and graphs and link them from the docstring of the distribution. This would keep the docstring itself from blowing up, but we could get the full html reference if we need to. Josef > >> * PEP8. (Use the pep8 program to check the code. I just got 1884 "errors" >> when I ran 'pep8 --repeat distributions.py | wc -l'.) > > I test the code I contribute with pep8.py. > > > Nicky >> >> Warren >> >> >>> Nicky >>> >>> > >>> > Thanks again for all your work on this - it's a very valuable >>> > contribution. 
>>> > Jake >>> > >>> > On 09/15/2012 02:03 PM, nicky van foreest wrote: >>> >> Hi, >>> >> >>> >> While reading distributions.py I made a kind of private trac list, of >>> >> stuff that might need refactoring, As a matter of fact, all issues >>> >> discussed in the mails above are already on my list. To summarize >>> >> (Please don't take the list below as a complaint, but just factual. I >>> >> am very happy that all this exists.) >>> >> >>> >> 1: the documentation is not clear, too concise, and fragmented; >>> >> actually a bit messy. >>> >> >>> >> 2: there is code overlap in the check work (The lines Ralf mentioned) >>> >> making it hard to find out the differences (but the differences in the >>> >> check work are method dependent so I don't quite know how to tackle >>> >> that in an elegant way), >>> >> >>> >> 3: the docs say that _argscheck need to be rewritten in case users >>> >> build their own distribution. But then the minimal requirement in my >>> >> opinion is that argscheck is simple to understand, and not overly >>> >> generic as it is right now. (I also have examples that its output, >>> >> while in line with its doc string, results in errors.) As far as I can >>> >> see its core can simply be replaced by np.all(cond) (I did not test >>> >> this though). >>> >> >>> >> 4: distributions.py is very big, too big for me actually. I recall >>> >> that my first attempt at finding out how the stats stuff worked was to >>> >> see how expon was implemented. No clue that this resided in >>> >> distributions.py. >>> >> >>> >> What I would like to see, although that would require a considerable >>> >> amount of work, is an architecture like this. >>> >> 1 rv_generic.py containing generic stuff >>> >> 2) rv_continous.py and rv_discrete.py, each imports rv_generic. >>> >> 3) each distribution is covered in a separate file. like expon.py, >>> >> norm, py, etc, and imports rv_continuous.py or rv_discrete.py, >>> >> whatever appropriate. 
Each docstring can/should contain some generic >>> >> part (like now) and a specific part, with working examples, and clear >>> >> explanations. The most important are normal, expon, binom, geom, >>> >> poisson, and perhaps some others. This would also enable others to >>> >> help extend the documentation, examples.... >>> >> 4) I would like to move the math parts in continuous.rst to the doc >>> >> string in the related distribution file. Since mathjax gives such >>> >> nice results on screen, there is also no reason not to include the >>> >> mathematical facts in the doc string of the distribution itself. In >>> >> fact, most (all?) distributions already have a short math description, >>> >> but this is in overlap with continuous.rst. >>> >> >>> >> I wouldn't mind chopping up distributions.py into the separate >>> >> distributions, and merge it with the maths of continuous.rst. I can >>> >> tackle approx one distribution per day roughly, hence reduce this >>> >> mind-numbing work to roughly 15 minutes a day (correction work on >>> >> exams is much worse :-) ). But I don't know how much this proposal >>> >> will affect the automatic generation of documentation. For the rest I >>> >> don't think this will affect the code a lot. >>> >> >>> >> >>> >> >>> >> NIcky >>> >> >>> >> >>> >> >>> >> >>> >> >>> >> On 15 September 2012 11:59, Ralf Gommers >>> >> wrote: >>> >>> >>> >>> On Fri, Sep 14, 2012 at 10:56 PM, Jake Vanderplas >>> >>> wrote: >>> >>>> On 09/14/2012 01:49 PM, Ralf Gommers wrote: >>> >>>> >>> >>>> >>> >>>> >>> >>>> On Fri, Sep 14, 2012 at 12:48 AM, wrote: >>> >>>>> On Thu, Sep 13, 2012 at 5:21 PM, nicky van foreest >>> >>>>> >>> >>>>> wrote: >>> >>>>>> Hi, >>> >>>>>> >>> >>>>>> Now that I understand github (Thanks to Ralf for his explanations >>> >>>>>> in >>> >>>>>> Dutch) and got some simple stuff out of the way in distributions.py >>> >>>>>> I >>> >>>>>> would like to tackle a somewhat harder issue. 
The function >>> >>>>>> argsreduce >>> >>>>>> is, as far as I can see, too generic. I did some tests to see >>> >>>>>> whether >>> >>>>>> its most generic output, as described by its docstring, is actually >>> >>>>>> swallowed by the callers of argsreduce, but this appears not to be >>> >>>>>> the >>> >>>>>> case. >>> >>>>> being generic is not a disadvantage (per se) if it's fast >>> >>>>> >>> >>>>> >>> >>>>> https://github.com/scipy/scipy/commit/4abdc10487d453b56f761598e8e013816b01a665 >>> >>>>> (and a being a one liner is not a disadvantage either) >>> >>>>> >>> >>>>> Josef >>> >>>>> >>> >>>>>> My motivation to simplify the code in distributions.py (and clean >>> >>>>>> it >>> >>>>>> up) is partly based on making it simpler to understand for myself, >>> >>>>>> but >>> >>>>>> also to others. The fact that github makes code browsing a much >>> >>>>>> nicer >>> >>>>>> experience, perhaps more people will take a look at what's under >>> >>>>>> the >>> >>>>>> hood. But then the code should also be accessible and clean. Are >>> >>>>>> there >>> >>>>>> any reasons not to pursue this path, and focus on more important >>> >>>>>> problems of the stats library? >>> >>>> >>> >>>> Not sure that argsreduce is the best place to start (see Josef's >>> >>>> reply), >>> >>>> but there should be things that can be done to make the code easier >>> >>>> to read. >>> >>>> For example, this code is used in ~10 methods of rv_continuous: >>> >>>> >>> >>>> loc,scale=map(kwds.get,['loc','scale']) >>> >>>> args, loc, scale = self._fix_loc_scale(args, loc, scale) >>> >>>> x,loc,scale = map(asarray,(x,loc,scale)) >>> >>>> args = tuple(map(asarray,args)) >>> >>>> >>> >>>> Some refactoring may be in order. The same is true of the rest of the >>> >>>> implementation of many of those methods. 
Some are exactly the same >>> >>>> except >>> >>>> for calls to the corresponding underscored method (example: logsf() >>> >>>> and >>> >>>> logcdf() are identical except for calls to _logsf() and _logcdf(), >>> >>>> and one >>> >>>> nonsensical multiplication). >>> >>>> >>> >>>> Ralf >>> >>>> >>> >>>> >>> >>>> >>> >>>> _______________________________________________ >>> >>>> SciPy-Dev mailing list >>> >>>> SciPy-Dev at scipy.org >>> >>>> http://mail.scipy.org/mailman/listinfo/scipy-dev >>> >>>> >>> >>>> I would say that the most important improvement needed in >>> >>>> distributions is >>> >>>> in the documentation. >>> >>>> >>> >>>> A new user would look at the doc string of, say, scipy.stats.norm, >>> >>>> and >>> >>>> have no idea how to proceed. Here's the current example from the >>> >>>> docstring >>> >>>> of scipy.stats.norm: >>> >>>> >>> >>>> Examples >>> >>>> -------- >>> >>>>>>> from scipy.stats import norm >>> >>>>>>> numargs = norm.numargs >>> >>>>>>> [ ] = [0.9,] * numargs >>> >>>>>>> rv = norm() >>> >>>>>>> x = np.linspace(0, np.minimum(rv.dist.b, 3)) >>> >>>>>>> h = plt.plot(x, rv.pdf(x)) >>> >>>> I don't even know what that means... and it doesn't compile. Also, >>> >>>> what >>> >>>> is b? how would I enter mu and sigma to make a normal distribution? >>> >>>> It's >>> >>>> all pretty opaque. >>> >>> >>> >>> True, the examples are confusing. The reason is that they're generated >>> >>> from >>> >>> a template, and it's pretty much impossible to get clear and concise >>> >>> examples that way. It would be better to write custom examples for the >>> >>> most-used distributions, and refer to those from the others. 
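[Editor's note] The loc/scale boilerplate Ralf quotes above could plausibly be collected into a single helper. A minimal sketch, with hypothetical names and defaults standing in for what `_fix_loc_scale` actually validates in scipy (this is an illustration of the refactoring idea, not scipy's implementation):

```python
import numpy as np

def _promote_args(x, args, kwds):
    # Hypothetical helper gathering the boilerplate repeated across
    # ~10 rv_continuous methods: pull loc/scale out of kwds, then
    # promote everything to arrays. Defaults stand in for the real
    # validation done by _fix_loc_scale.
    loc = kwds.get('loc', 0.0)
    scale = kwds.get('scale', 1.0)
    x, loc, scale = map(np.asarray, (x, loc, scale))
    args = tuple(map(np.asarray, args))
    return x, args, loc, scale

# each method body would then start with a single call:
x, args, loc, scale = _promote_args([1.0, 2.0], (0.5,), {'loc': 1})
```

Each public method (`pdf`, `cdf`, `sf`, ...) would then open with one call instead of four repeated lines.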
>>> >>> >>> Ralf From vanforeest at gmail.com Sun Sep 16 16:03:06 2012 From: vanforeest at gmail.com (nicky van foreest) Date: Sun, 16 Sep 2012 22:03:06 +0200 Subject: [SciPy-Dev] distributions.py In-Reply-To: References: <50539A12.3000101@astro.washington.edu> <5055F743.6030400@astro.washington.edu> Message-ID: I'll start a new thread on the documentation. From vanforeest at gmail.com Sun Sep 16 17:10:03 2012 From: vanforeest at gmail.com (nicky van foreest) Date: Sun, 16 Sep 2012 23:10:03 +0200 Subject: [SciPy-Dev] stats.distributions.py documentation Message-ID: Hi, Below are two proposals to handle the documentation of the scipy distributions. The first is to add a set of examples to each distribution, see the list at the end of the mail as an example. However, I actually wonder whether it wouldn't be better to put this stuff in the stats tutorial. (I recently updated this, but given the list below, it is still not complete.)
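[Editor's note] A fully runnable version of the kind of example under discussion might look like this (an editor's sketch with loc and scale spelled out, rather than the numargs-based template the thread is criticizing):

```python
import numpy as np
from scipy.stats import norm

# P(X <= 0) for X ~ Normal(mu=1, sigma=2), written out explicitly
p = norm.cdf(0.0, loc=1, scale=2)

# the same computation through a "frozen" distribution object
rv = norm(loc=1, scale=2)
assert np.isclose(p, rv.cdf(0.0))
```

Unlike the templated example, this copy-pastes into a terminal and runs.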
The list below is a bit long... too long perhaps. I actually get the feeling that, given the enormous diversity of the distributions, it may not be possible to automatically generate a set of simple examples that work for each and every distribution. Such examples then would involve the usage of x.dist.b, and so on, and this is not particularly revealing to first (and second) time users. A possible resolution is to include just one or two generic examples in the example doc string (e.g., dist.rvs(size = (2,3)) ), and refer to the tutorial for the rest. The tutorial then should show extensive examples for each method of the norm distribution. I assume that then any user of other distributions can figure out how to proceed for his/her own distribution. The second possibility would be to follow Josef's suggestion: --snip snip Splitting up the distributions pdf docs in tutorial into separate pages for individual distributions, make them nicer with code and graphs and link them from the docstring of the distribution. This would keep the docstring itself from blowing up, but we could get the full html reference if we need to. --snip snip This idea offers a lot of opportunities. In a previous mail I mentioned that I don't quite like that the documentation is spread over multiple documents. There are doc strings in distributions.py (leading to a bloated file), and there is continuous.rst. Part of the implementation can be understood from the doc-string, typically the density function, but not the rest; this requires continuous.rst. Besides this, in case some specific distribution requires extra explanation/examples, this will have to be put in the doc-string, making distributions.py longer still. Thus, to take up Josef's suggestion, what about a documentation file organised like this: # some tag to tell that these are the docs for the norm distribution # eg.
# norm_gen Normal Distribution ---------------------------- Notes ^^^^^^^ # should be used by the interpreter The probability density function for `norm` is:: norm.pdf(x) = exp(-x**2/2)/sqrt(2*pi) Simple Examples ^^^^^^^^^^^^^^^^^^^^ # used by the interpreter >>> norm.rvs( size = (2,3) ) Extensive Examples ^^^^^^^^^^^^^^^^^^^^^^^^ # Not used by the interpreter, but certainly by an html viewer, containing graphs, hard/specific examples. Mathematical Details ^^^^^^^^^^^^^^^^^^^^^^ Stuff from continuous.rst # dist2_gen Distribution number 2 ----------------------------------------- etc. It shouldn't be too hard to parse such a document, and couple each piece of documentation to a distribution in distributions.py (or am I mistaken?) as we use the class name as the tag in the documentation file. The doc-string for a distribution in distributions.py can then be removed. Nicky Example for the examples section of the docstring of norm. Notes ----- The probability density function for `norm` is:: norm.pdf(x) = exp(-x**2/2)/sqrt(2*pi) #%(example)s Examples -------- Setting the mean and standard deviation: >>> from scipy.stats import norm >>> norm.cdf(0.0) >>> norm.cdf(0., 1) # set mu = loc = 1 >>> norm.cdf(0., 1, 2) # mu = loc = 1, scale = sigma = 2 >>> norm.cdf(0., loc = 1, scale = 2) # mu = loc = 1, scale = sigma = 2 Frozen rvs >>> norm(1., 2.).cdf(0) >>> x = norm(scale = 2.) >>> x.cdf(0.0) Moments >>> norm(loc = 2).stats() >>> norm.mean() >>> norm.moment(2, scale = 3.) >>> x.std() >>> x.var() Random number generation >>> norm.rvs(3, 1, size = (2,3)) # loc = 3, scale = 1, array of shape (2,3) >>> norm.rvs(3, 1, size = [2,3]) >>> x.rvs(3) # array with 3 random deviates >>> x.rvs([3,4]) # array of shape (3,4) with deviates Expectations >>> norm.expect(lambda x: x, loc = 1) # 1.00000 >>> norm.expect(lambda x: x**2, loc = 1., scale = 2.)
# second moment Support of the distribution >>> norm.a # left limit, -np.inf here >>> norm.b # right limit, np.inf here Plot of the cdf >>> import numpy as np >>> import matplotlib.pyplot as plt >>> x = np.linspace(0, 3) >>> P = norm.cdf(x) >>> plt.plot(x, P) >>> plt.show() From vanderplas at astro.washington.edu Mon Sep 17 19:30:56 2012 From: vanderplas at astro.washington.edu (Jacob VanderPlas) Date: Mon, 17 Sep 2012 16:30:56 -0700 Subject: [SciPy-Dev] Two speed/enhancement PRs Message-ID: <5057B2B0.7050202@astro.washington.edu> Hello, I currently have two small open PRs which are basically ready for merge, and have been sitting inactive for a while. I'd love a few people to read-through so we can merge: https://github.com/scipy/scipy/pull/289 This PR speeds up the pseudoinverse calculation, adds a fast hermitian pseudoinverse, and adds a `return_rank` option on all pseudoinverse methods. https://github.com/scipy/scipy/pull/311 This PR speeds up single-row access in CSR matrices, and allows for fast generic slicing of CSR matrix rows. Thanks! Jake From ralf.gommers at gmail.com Tue Sep 18 14:58:23 2012 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Tue, 18 Sep 2012 20:58:23 +0200 Subject: [SciPy-Dev] Adding solvers to scipy.integrate [Was: A step toward merging odeint and ode] In-Reply-To: <1345537333.15902.7.camel@amilo.coursju> References: <1344768110.26417.6.camel@amilo.coursju> <1345020138.19573.11.camel@amilo.coursju> <1345114002.3855.5.camel@amilo.coursju> <1345537333.15902.7.camel@amilo.coursju> Message-ID: On Tue, Aug 21, 2012 at 10:22 AM, Fabrice Silva wrote: > On Monday 20 August 2012 at 22:04 +0200, Ralf Gommers wrote: > > > https://github.com/FabricioS/scipy/commit/f867f2b8133d3f6ea47d449bd760a77a7c90394e > > > This is probably not worth the cost for existing users imho. It is a > > backwards compatibility break that doesn't really add anything except > > for some consistency (right?). > > Hi Ralf, > Ok concerning this point. > > Hi Fabrice, sorry for the slow reply.
> > In addition, I have been looking to suggest additional solvers, > essentially simpler schemes, that would thus allow easily switching > between "complex" (lsode, vode, cvode) and basic schemes (Euler, > Nicholson, etc...) > Sounds like a good idea. I think though that for each solver added there should be a gain in performance for some problems, whether that's speed, accuracy, robustness or something else. Adding a solver for educational purposes or completeness only wouldn't be all that useful. > I came across some code on the Montana Univ.'s Computer Science dpt: > http://wiki.cs.umt.edu/classes/cs477/index.php/Creating_ODE_Solver_Objects > and asked Jesse Johnson (the person responsible for that class) what is the > license for that code. Here is his answer: > > Anything that you find on those pages, you may use. However, > I'm not sure how to go about officially giving the code a > particular license. Can I add a license to the wiki, stating > that it applies to all the code therein? > > PS It is fantastic you're doing this. I've often thought that > scipy.ode could use some improvements. > > He is cc'ed on this mail; could anyone concerned about scipy license > requirements and more generally in code licensing answer him? > Hi Jesse, thanks a lot for contributing your code. As for the licensing question, I think the best would be to either add a license at the bottom of each page, or in the code itself. Adding a page somewhere on your wiki saying "license X applies to all code in this wiki" is in principle enough, but it may be hard to find for readers. And if they can't find a license, they have to assume they can't reuse your code. As for the license itself, it would be good to be explicit ("public domain" or "BSD" for example). Disclaimer: I'm not a lawyer. Best, Ralf -------------- next part -------------- An HTML attachment was scrubbed...
URL: From tim at cerazone.net Thu Sep 20 00:50:11 2012 From: tim at cerazone.net (Cera, Tim) Date: Thu, 20 Sep 2012 00:50:11 -0400 Subject: [SciPy-Dev] Scipy docstrings Message-ID: A whole bunch of conflicts and merges showed up on the docstring editor at http://docs.scipy.org/scipy/merge/ Can't 'Accept Merges' because all of the merges that I looked at did not follow the docstring standard. There isn't a way to 'Reject Merges', so what I have done in the past is accept, then revert. Could this be handled in some other way? Kindest regards, Tim -------------- next part -------------- An HTML attachment was scrubbed... URL: From orion at cora.nwra.com Thu Sep 20 19:33:20 2012 From: orion at cora.nwra.com (Orion Poplawski) Date: Thu, 20 Sep 2012 17:33:20 -0600 Subject: [SciPy-Dev] Fwd: Package: scipy-0.11.0-0.1.rc2.fc18 Tag: f18-updates-candidate Status: failed Built by: orion In-Reply-To: <20120920210154.D759123187@bastion01.phx2.fedoraproject.org> References: <20120920210154.D759123187@bastion01.phx2.fedoraproject.org> Message-ID: <505BA7C0.6050605@cora.nwra.com> This is a plea for some help. We've been having trouble getting scipy to pass all of the tests in the Fedora 18 build with python 3.3 (although it seems to build okay in Fedora 19). Below are the logs of the build. There appears to be some kind of memory corruption that manifests itself a little differently on 32-bit vs. 64-bit. I really have no idea myself how to pursue debugging this, though I'm happy to provide any more needed information. 
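[Editor's note] One generic aid for localizing crashes like the one Orion describes: Python 3.3 ships the faulthandler module, which makes a segfaulting test run dump a Python-level traceback, narrowing down which test corrupts memory. This is a general debugging suggestion, not a scipy-specific fix:

```python
import faulthandler

# Dump a Python traceback to stderr if the process receives SIGSEGV,
# SIGFPE, SIGABRT or SIGBUS during the test run.
faulthandler.enable()
```

It can also be switched on without touching the test harness by running with `python -X faulthandler` or setting `PYTHONFAULTHANDLER=1`.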
Thanks, Orion -------- Original Message -------- Subject: Package: scipy-0.11.0-0.1.rc2.fc18 Tag: f18-updates-candidate Status: failed Built by: orion Date: Thu, 20 Sep 2012 21:01:54 +0000 (UTC) From: Fedora Koji Build System To: ausil at fedoraproject.org, jspaleta at fedoraproject.org, voronov at fedoraproject.org, torwangjl at fedoraproject.org, alagunambi at fedoraproject.org, urkle at fedoraproject.org, orion at fedoraproject.org Package: scipy-0.11.0-0.1.rc2.fc18 Tag: f18-updates-candidate Status: failed Built by: orion ID: 350761 Started: Thu, 20 Sep 2012 20:39:45 UTC Finished: Thu, 20 Sep 2012 21:01:32 UTC scipy-0.11.0-0.1.rc2.fc18 (350761) failed on buildvm-33.phx2.fedoraproject.org (x86_64), buildvm-35.phx2.fedoraproject.org (i386), buildvm-34.phx2.fedoraproject.org (noarch): BuildError: error building package (arch x86_64), mock exited with status 1; see build.log for more information SRPMS: scipy-0.11.0-0.1.rc2.fc18.src.rpm Failed tasks: ------------- Task 4509076 on buildvm-33.phx2.fedoraproject.org Task Type: buildArch (scipy-0.11.0-0.1.rc2.fc18.src.rpm, x86_64) logs: http://koji.fedoraproject.org/koji/getfile?taskID=4509076&name=build.log http://koji.fedoraproject.org/koji/getfile?taskID=4509076&name=mock_output.log http://koji.fedoraproject.org/koji/getfile?taskID=4509076&name=root.log http://koji.fedoraproject.org/koji/getfile?taskID=4509076&name=state.log Task 4509077 on buildvm-35.phx2.fedoraproject.org Task Type: buildArch (scipy-0.11.0-0.1.rc2.fc18.src.rpm, i686) logs: http://koji.fedoraproject.org/koji/getfile?taskID=4509077&name=build.log http://koji.fedoraproject.org/koji/getfile?taskID=4509077&name=mock_output.log http://koji.fedoraproject.org/koji/getfile?taskID=4509077&name=root.log http://koji.fedoraproject.org/koji/getfile?taskID=4509077&name=state.log Task 4509063 on buildvm-34.phx2.fedoraproject.org Task Type: build (f18-candidate, /scipy:cb69bd06f0d930fbe8840d89b918b617e28af63f) Closed tasks: ------------- Task 4509064 on 
buildvm-30.phx2.fedoraproject.org Task Type: buildSRPMFromSCM (/scipy:cb69bd06f0d930fbe8840d89b918b617e28af63f) logs: http://koji.fedoraproject.org/koji/getfile?taskID=4509064&name=build.log http://koji.fedoraproject.org/koji/getfile?taskID=4509064&name=checkout.log http://koji.fedoraproject.org/koji/getfile?taskID=4509064&name=mock_output.log http://koji.fedoraproject.org/koji/getfile?taskID=4509064&name=root.log http://koji.fedoraproject.org/koji/getfile?taskID=4509064&name=state.log Task Info: http://koji.fedoraproject.org/koji/taskinfo?taskID=4509063 Build Info: http://koji.fedoraproject.org/koji/buildinfo?buildID=350761 From ndbecker2 at gmail.com Fri Sep 21 10:49:51 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Fri, 21 Sep 2012 10:49:51 -0400 Subject: [SciPy-Dev] Fwd: Package: scipy-0.11.0-0.1.rc2.fc18 Tag: f18-updates-candidate Status: failed Built by: orion References: <20120920210154.D759123187@bastion01.phx2.fedoraproject.org> <505BA7C0.6050605@cora.nwra.com> Message-ID: As far as how to proceed, I had a problem once with unuran numerical issues on an unreleased fedora. I used mock to build on the unreleased version, and mock has a chroot feature to get a shell and debug. Orion Poplawski wrote: > This is a plea for some help. We've been having trouble getting scipy to pass > all of the tests in the Fedora 18 build with python 3.3 (although it seems to > build okay in Fedora 19). Below are the logs of the build. There appears to > be some kind of memory corruption that manifests itself a little differently > on 32-bit vs. 64-bit. I really have no idea myself how to pursue debugging > this, though I'm happy to provide any more needed information. 
> > Thanks, > > Orion > > -------- Original Message -------- > Subject: Package: scipy-0.11.0-0.1.rc2.fc18 Tag: f18-updates-candidate Status: > failed Built by: orion > Date: Thu, 20 Sep 2012 21:01:54 +0000 (UTC) > From: Fedora Koji Build System > To: ausil at fedoraproject.org, jspaleta at fedoraproject.org, > voronov at fedoraproject.org, torwangjl at fedoraproject.org, > alagunambi at fedoraproject.org, urkle at fedoraproject.org, orion at fedoraproject.org > > Package: scipy-0.11.0-0.1.rc2.fc18 > Tag: f18-updates-candidate > Status: failed > Built by: orion > ID: 350761 > Started: Thu, 20 Sep 2012 20:39:45 UTC > Finished: Thu, 20 Sep 2012 21:01:32 UTC > > > scipy-0.11.0-0.1.rc2.fc18 (350761) failed on buildvm-33.phx2.fedoraproject.org > (x86_64), buildvm-35.phx2.fedoraproject.org (i386), > buildvm-34.phx2.fedoraproject.org (noarch): > BuildError: error building package (arch x86_64), mock exited with status > 1; see build.log for more information > SRPMS: > scipy-0.11.0-0.1.rc2.fc18.src.rpm > > Failed tasks: > ------------- > > Task 4509076 on buildvm-33.phx2.fedoraproject.org > Task Type: buildArch (scipy-0.11.0-0.1.rc2.fc18.src.rpm, x86_64) > logs: > http://koji.fedoraproject.org/koji/getfile?taskID=4509076&name=build.log > http://koji.fedoraproject.org/koji/getfile?taskID=4509076&name=mock_output.log > http://koji.fedoraproject.org/koji/getfile?taskID=4509076&name=root.log > http://koji.fedoraproject.org/koji/getfile?taskID=4509076&name=state.log > > Task 4509077 on buildvm-35.phx2.fedoraproject.org > Task Type: buildArch (scipy-0.11.0-0.1.rc2.fc18.src.rpm, i686) > logs: > http://koji.fedoraproject.org/koji/getfile?taskID=4509077&name=build.log > http://koji.fedoraproject.org/koji/getfile?taskID=4509077&name=mock_output.log > http://koji.fedoraproject.org/koji/getfile?taskID=4509077&name=root.log > http://koji.fedoraproject.org/koji/getfile?taskID=4509077&name=state.log > > Task 4509063 on buildvm-34.phx2.fedoraproject.org > Task Type: build (f18-candidate, 
> /scipy:cb69bd06f0d930fbe8840d89b918b617e28af63f) > > > Closed tasks: > ------------- > > Task 4509064 on buildvm-30.phx2.fedoraproject.org > Task Type: buildSRPMFromSCM (/scipy:cb69bd06f0d930fbe8840d89b918b617e28af63f) > logs: > http://koji.fedoraproject.org/koji/getfile?taskID=4509064&name=build.log > http://koji.fedoraproject.org/koji/getfile?taskID=4509064&name=checkout.log > http://koji.fedoraproject.org/koji/getfile?taskID=4509064&name=mock_output.log > http://koji.fedoraproject.org/koji/getfile?taskID=4509064&name=root.log > http://koji.fedoraproject.org/koji/getfile?taskID=4509064&name=state.log > > > > Task Info: http://koji.fedoraproject.org/koji/taskinfo?taskID=4509063 > Build Info: http://koji.fedoraproject.org/koji/buildinfo?buildID=350761 From ralf.gommers at gmail.com Fri Sep 21 15:04:22 2012 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Fri, 21 Sep 2012 21:04:22 +0200 Subject: [SciPy-Dev] Scipy docstrings In-Reply-To: References: Message-ID: On Thu, Sep 20, 2012 at 6:50 AM, Cera, Tim wrote: > A whole bunch of conflicts and merges showed up on the docstring editor at > http://docs.scipy.org/scipy/merge/ > > Can't 'Accept Merges' because all of the merges that I looked at did not > follow the docstring standard. There isn't a way to 'Reject Merges', so > what I have done in the past is accept, then revert. Could this be handled > in some other way? > The question is where they come from. Certainly not from git master. Looks like a doc wiki bug. I wouldn't recommend spending effort merging and reverting before that's understood, otherwise it could easily happen again. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim at cerazone.net Fri Sep 21 15:25:39 2012 From: tim at cerazone.net (Cera, Tim) Date: Fri, 21 Sep 2012 15:25:39 -0400 Subject: [SciPy-Dev] Scipy docstrings In-Reply-To: References: Message-ID: Thanks. I looked at the scipy commits yesterday and came to the same conclusion. 
Kindest regards, Tim -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Fri Sep 21 15:27:14 2012 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Fri, 21 Sep 2012 21:27:14 +0200 Subject: [SciPy-Dev] stats.distributions.py documentation In-Reply-To: References: Message-ID: On Sun, Sep 16, 2012 at 11:10 PM, nicky van foreest wrote: > Hi, > > Below are two proposals to handle the documentation of the scipy > distributions. > > The first is to add a set of examples to each distribution, see the > list at the end of the mail as an example. However, I actually wonder > whether it wouldn't be better to put this stuff in the stats tutorial. > (I recently updated this, but given the list below, it is still not > complete.) The list below is a bit long... too long perhaps. > > I actualy get the feeling that, given the enormous diversity of the > distributions, it may not be possible to automatically generate a set > of simple examples that work for each and every distributions. Such > examples then would involve the usage of x.dist.b, and so on, and this > is not particularly revealing to first (and second) time users. > This is exactly what the problem is currently. > A possible resolution is to include just one or two generic examples > in the example doc string (e.g., dist.rvs(size = (2,3)) ), and refer > to the tutorial for the rest. The tutorial then should show extensive > examples for each method of the norm distribution. I assume that then > any user of other distributions can figure out how to proceed for > his/her own distribution. > This is a huge amount of work, and the generic example still won't run if you copy-paste it into a terminal. 
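[Editor's note] Ralf's copy-paste objection can be made concrete: the number of required shape parameters varies per distribution, which is exactly what a templated example cannot spell out. A small sketch of the difference:

```python
import math
from scipy.stats import gamma, norm

# norm takes no shape parameters; gamma requires one ('a').
assert norm.numargs == 0
assert gamma.numargs == 1

# With a=2 the gamma pdf is x*exp(-x)/Gamma(2) = x*exp(-x),
# so pdf(1) equals exp(-1).
val = gamma(2.0).pdf(1.0)
assert abs(val - math.exp(-1.0)) < 1e-12
```

A generic `[ ] = [0.9,] * numargs` template papers over this difference, which is why it confuses new users.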
> > The second possibility would be to follow Josef's suggestion: > --snip snip > Splitting up the distributions pdf docs in tutorial into separate > pages for individual distributions, make them nicer with code and > graphs and link them from the docstring of the distribution. > Linking to the tutorial from the docstrings is a good idea, but the docstrings themselves should be enough to get started. > > This would keep the docstring itself from blowing up, but we could get > the full html reference if we need to. > > --snip snip > > This idea offers a lot of opportunities. In a previous mail I > mentioned that I don't quite like that the documentation is spread > over multiple documents. There are doc strings in distributions.py > (leading to a bloated file), It's not that bad imho. The typical docstring looks like: """A beta prime continuous random variable. %(before_notes)s Notes ----- The probability density function for `betaprime` is:: betaprime.pdf(x, a, b) = gamma(a+b) / (gamma(a)*gamma(b)) * x**(a-1) * (1+x)**(-a-b) for ``x > 0``, ``a > 0``, ``b > 0``. %(example)s """ It can't be much shorter than that. and there is continuous.rst. Part of the > implementation can be understood from the doc-string, typically, the > density function, but not the rest; The pdf and support are given, that's enough to define the distribution. So that should stay. It doesn't mean we have to copy the whole wikipedia page for each distribution. > this requires continuous.rst. > Besides this, in case some specific distribution requires extra > explanation/examples, this will have to be put in the doc-string, making > distributions.py longer still. Thus, to take up Josef's suggestion, > what about a documentation file organised like this: >
> # norm_gen > > Normal Distribution > ---------------------------- > > Notes > ^^^^^^^ > # should be used by the interpreter > The probability density function for `norm` is:: > > norm.pdf(x) = exp(-x**2/2)/sqrt(2*pi) > > Simple Examples > ^^^^^^^^^^^^^^^^^^^^ > # used for by interpreter > >>> norm.rvs( size = (2,3) ) > > Extensive Examples > ^^^^^^^^^^^^^^^^^^^^^^^^ > # Not used by the interpreter, but certainly by a html viewer, > containing graphs, hard/specific examples. > > Mathematical Details > ^^^^^^^^^^^^^^^^^^^^^^ > > Stuff from continuous.rst > > # dist2_gen > Distribution number 2 > ----------------------------------------- > etc > > It shouldn't be too hard to parse such a document, and couple each > piece of documentation to a distribution in distributions.py (or am I > mistaken?) as we use the class name as the tag in the documentation > file. The doc-string for a distribution in distributions.py can then > be removed, > > Nicky > > Example for the examples section of the docstring of norm. > This example is good. Perhaps the frozen distribution needs a few words of explanation. I suggest to do a few more of these for common distributions, and link to the norm() docstring from less common distributions. Other than that, I wouldn't change anything about the docstrings. Built docs could be reworked more thoroughly. Ralf > > Notes > ----- > The probability density function for `norm` is:: > > norm.pdf(x) = exp(-x**2/2)/sqrt(2*pi) > > #%(example)s > > Examples > -------- > > Setting the mean and standard deviation: > > >>> from scipy.stats import norm > >>> norm.cdf(0.0) > >>> norm.cdf(0., 1) # set mu = loc = 1 > >>> norm.cdf(0., 1, 2) # mu = loc = 1, scale = sigma = 2 > >>> norm.cdf(0., loc = 1, scale = 2) # mu = loc = 1, scale = > sigma = 2 > > Frozen rvs > > >>> norm(1., 2.).cdf(0) > >>> x = norm(scale = 2.) > >>> x.cdf(0.0) > > Moments > > >>> norm(loc = 2).stats() > >>> norm.mean() > >>> norm.moment(2, scale = 3.) 
> >>> x.std() > >>> x.var() > > Random number generation > > >>> norm.rvs(3, 1, size = (2,3)) # loc = 3, scale =1, array of > shape (2,3) > >>> norm.rvs(3, 1, size = [2,3]) > >>> x.rvs(3) # array with 3 random deviates > >>> x.rvs([3,4]) # array of shape (3,4) with deviates > > Expectations > > >>> norm.expect(lambda x: x, loc = 1) # 1.00000 > >>> norm.expect(lambda x: x**2, loc = 1., scale = 2.) # second > moment > > Support of the distribution > > >>> norm.a # left limit, -np.inf here > >>> norm.b # right limit, np.inf here > > Plot of the cdf > > >>> import numpy as np > >>> x = np.linspace(0, 3) > >>> P = norm.cdf(x) > >>> plt.plot(x,P) > >>> plt.show() > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sat Sep 22 04:50:49 2012 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 22 Sep 2012 10:50:49 +0200 Subject: [SciPy-Dev] ANN: SciPy 0.11.0 release candidate 2 In-Reply-To: References: <55FCB433-349E-468C-B59D-315C24A36BBA@samueljohn.de> Message-ID: On Sat, Sep 8, 2012 at 6:21 PM, Ralf Gommers wrote: > > > On Tue, Aug 14, 2012 at 4:53 PM, Samuel John wrote: > >> >> ====================================================================== >> FAIL: test_stats.test_ttest_ind >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> File >> "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/case.py", >> line 197, in runTest >> self.test(*self.arg) >> File >> "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/stats/tests/test_stats.py", >> line 1556, in test_ttest_ind >> assert_array_almost_equal([t,p],(tr,pr)) >> File >> 
"/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/testing/utils.py", >> line 800, in assert_array_almost_equal >> header=('Arrays are not almost equal to %d decimals' % decimal)) >> File >> "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/testing/utils.py", >> line 636, in assert_array_compare >> raise AssertionError(msg) >> AssertionError: >> Arrays are not almost equal to 6 decimals >> >> (mismatch 50.0%) >> x: array([ 1.09127469, 0.4998416 ]) >> y: array([ 1.09127469, 0.27647819]) >> >> ====================================================================== >> FAIL: test_stats.test_ttest_ind_with_uneq_var >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> File >> "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nose/case.py", >> line 197, in runTest >> self.test(*self.arg) >> File >> "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/stats/tests/test_stats.py", >> line 1596, in test_ttest_ind_with_uneq_var >> assert_array_almost_equal([t,p], [tr, pr]) >> File >> "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/testing/utils.py", >> line 800, in assert_array_almost_equal >> header=('Arrays are not almost equal to %d decimals' % decimal)) >> File >> "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/testing/utils.py", >> line 636, in assert_array_compare >> raise AssertionError(msg) >> AssertionError: >> Arrays are not almost equal to 6 decimals >> >> (mismatch 50.0%) >> x: array([-0.68649513, 0.81407518]) >> y: array([-0.68649513, 0.53619491]) >> > > These are a little odd. It gets the t-statistic right, so the problem is > not in stats. 
The issue seems to be in special.stdtr(), whose test isn't > failing for you. It only has a test for df=0 though. > > The first failure comes down to giving a different answer from: > >>> special.stdtr(198, -1.09127) * 2 > 0.27648... > > Do you get the same for these? > >>> special.stdtr(1, 0) # this is tested > 0.5 > >>> special.stdtr(1, 1) # this isn't > 0.75000000000000022 > >>> special.stdtr(1, 2) > 0.85241638234956674 > I opened ticket 1734 for this, and https://github.com/scipy/scipy/pull/322 adds the above to the stdtr test. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sat Sep 22 05:01:20 2012 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 22 Sep 2012 11:01:20 +0200 Subject: [SciPy-Dev] TestNorm.test_stable failure [was ANN: SciPy 0.11.0 release candidate 2] Message-ID: On Sat, Sep 8, 2012 at 9:38 PM, wrote: > > ------------------------- > If I just unzip the superpack instead of installing the sse3, I get > only one failure > > ====================================================================== > FAIL: test_basic.TestNorm.test_stable > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > "E:\Josef\testing\tox\py27b\lib\site-packages\nose-1.1.2-py2.7.egg\nose\case.py", > line 197, in runTest > self.test(*self.arg) > File > "E:\Josef\testing\tox\py27b\lib\site-packages\scipy\linalg\tests\test_basic.py", > line 592, in > test_stable > assert_almost_equal(norm(a) - 1e4, 0.0, err_msg=msg) > File > "E:\Josef\testing\tox\py27b\lib\site-packages\numpy-1.6.2-py2.7-win32.egg\numpy\testing\utils.py", > line 468, in assert_almost_equal > raise AssertionError(msg) > AssertionError: > Arrays are not almost equal to 7 decimals > : Result should equal either 0.0 or 0.5 (depending on implementation of > snrm2). > ACTUAL: 0.4990234375 > DESIRED: 0.0 > > maybe some sse incompatibilities. >
We fixed the test for allowing 0.0 or 0.5, but it can also just break like this. It does so at least for some people on OS X 10.8 and on Windows. Since the problem seems to be in ATLAS / Accelerate, should we leave it as is or not? Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sat Sep 22 05:38:09 2012 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 22 Sep 2012 11:38:09 +0200 Subject: [SciPy-Dev] [SciPy-User] ANN: SciPy 0.11.0 release candidate 2 In-Reply-To: References: <50294CEB.4070103@comcast.net> <50297B0F.2090400@comcast.net> <502A5424.2070408@comcast.net> Message-ID: On Sat, Sep 8, 2012 at 7:46 PM, wrote: > On Sat, Sep 8, 2012 at 1:33 PM, Matthew Brett > wrote: > > Hi, > > > > On Sat, Sep 8, 2012 at 5:31 PM, Ralf Gommers > wrote: > >> > >> > >> On Tue, Aug 14, 2012 at 3:35 PM, John Hassler > wrote: > >>> > >>> > >>> On 8/14/2012 7:21 AM, Pauli Virtanen wrote: > >>> > Ralf Gommers gmail.com> writes: > >>> > [clip] > >>> >> Does anyone have an idea about that test_singular failure? > >>> > That's very likely some problem with the underlying LAPACK library. > >>> > I think the problem solved is close to a numerical instability. > >>> > > >>> > The failing comparison compares eigenvalues computed by > >>> > > >>> > eig(A, B) > >>> > eig(A, B, left=False, right=False) > >>> > > >>> > which differ solely in passing 'N' vs. 'V' to DGGEV. The eigenvalue > >>> > property of the former is also checked and seems to pass. > Interestingly, > >>> > the result obtained from the two seems to differ (therefore, the > latter > >>> > is probably wrong), which appears to point to a LAPACK issue. > >>> > > >>> > Here, it would be interesting to know if the problem occurs with > >>> > the official Scipy binaries, or something else. > >>> > > >>> > >>> I installed rc2 on Python 2.7.3. Same problem. I get the > test_singular > >>> error on some, but not all, of the runs. 
Both are win32-superpack from > >>> http://sourceforge.net/projects/scipy/files/scipy/0.11.0rc2/. > >>> > >>> The error occurs on less than half but more than 1/3 (based on a very > >>> small sample) of the runs on both 2.7 and 3.2. > >>> > >>> I've been working on computers for more than 50 years. Somehow, I had > >>> developed the delusion that they were deterministic ..... > >>> john > >> > >> > >> What are we going to do about this one? I'm tempted to open a ticket > for it > >> and mark it as knownfail on Windows for now, since it's a corner case. > > > > I have noticed that windows SVD appears to give different answers from > > repeated runs on the same matrix, differing in terms of sign flips, > > but valid SVDs. I've no idea why, but I had to adjust the tests in > > our code to allow for this. > > > > I guess we should make sure the returned results are correct, and fail > > otherwise. But maybe we do not require two runs to give the same > > answer. Could that explain the problem? 
> > I'm only paying partial attention and not up-to-date, just a few tries: > > with b1 running (I removed a crashing qz sort for float test) > (py27b) E:\Josef\testing\tox\py27b\Scripts>python -c "import > scipy.linalg; scipy.linalg.test()" > the tests always pass > > runinng the test specifically, I get the test failure each time > (py27b) E:\Josef\testing\tox\py27b\Scripts>nosetests -v > > "E:\Josef\testing\tox\py27b\Lib\site-packages\scipy-0.11.0b1-py2.7-win32.egg\scipy\linalg\tests\test_decomp.py":TestEig.test_singular > > (mismatch 25.0%) > x: array([ -3.74550285e-01 +0.00000000e+00j, > -5.17716907e-17 -1.15230800e-08j, > -5.17716907e-17 +1.15230800e-08j, 2.00000000e+00 > +0.00000000e+00j]) > y: array([ -2.45037885e-01 +0.00000000e+00j, > 5.17637463e-16 -4.01120590e-08j, > 5.17637463e-16 +4.01120590e-08j, 2.00000000e+00 > +0.00000000e+00j]) > > running the example (just checking eigenvalues), I get different > answers if the A,B matrices are int or float and then not always the > same > > >>> import scipy > >>> scipy.__version__ > '0.9.0' > > >>> linalg.eigvals(a,b) > array([ 2.00000000 +0.j, -0.00000080 +0.j, 0.00000080 +0.j, > -0.35915547 +0.j, nan nanj]) > >>> linalg.eigvals(a.astype(float),b) > array([ 2.00000000+0.j , -0.00000000+0.00000018j, > -0.00000000-0.00000018j, nan nanj, > 0.57825572+0.j ]) > >>> linalg.eigvals(a.astype(float),b.astype(float)) > array([ 2.00000000+0.j , -0.00000000+0.00000018j, > -0.00000000-0.00000018j, nan nanj, > 0.57825572+0.j ]) > > > >>> linalg.eigvals(a, b) > array([ 2.00000000 +0.j, -0.00000080 +0.j, 0.00000080 +0.j, > -0.35915547 +0.j, nan nanj]) > >>> linalg.eigvals(a+0j, b +0j) > array([ 2.00000000+0.j , -0.00000000-0.00000002j, > 0.00000000+0.00000002j, 0.39034698+0.j , > nan nanj]) > >>> linalg.eigvals(a.astype(float),b.astype(float)) > array([ 2.00000000 +0.j, -0.00000080 +0.j, 0.00000080 +0.j, > -0.35915547 +0.j, nan nanj]) > > >>> linalg.eig(a, b)[0] > array([ 2.00000000 +0.j, -0.00000080 +0.j, 0.00000080 +0.j, > 
-0.35915547 +0.j, nan nanj]) > >>> linalg.eig(a.astype(float),b.astype(float))[0] > array([ 2.00000000+0.j , -0.00000000+0.00000018j, > -0.00000000-0.00000018j, nan nanj, 0.57825572+0.j > > >>> a.dtype, b.dtype > (dtype('int32'), dtype('int32')) > > Windows 7, python 32bit on 64bit machine Opened http://projects.scipy.org/scipy/ticket/1735 for this. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ruediger.kessel at gmail.com Sat Sep 22 12:14:48 2012 From: ruediger.kessel at gmail.com (=?ISO-8859-1?Q?R=FCdiger_Kessel?=) Date: Sat, 22 Sep 2012 12:14:48 -0400 Subject: [SciPy-Dev] TestNorm.test_stable failure [was ANN: SciPy 0.11.0 release candidate 2] In-Reply-To: References: Message-ID: Hi, test_basic.TestNorm.test_stable is very sensitive to the implementation of the basic BLAS routine snrm2(), as I have explained before. It's good for telling implementations apart, but it is not very robust. A pure float32 implementation of snrm2() will give 10000.0 for norm(a). An internal double implementation with rounding to float32 at the end gives 10000.5 for norm(a). A mixed implementation with rounding to float32 before rescaling gives 10000.4990234375 for norm(a). I have not looked into the implementation in ATLAS. I just played with my python implementation to see if I can find an explanation. I think the problem here is that for matrix a some data was chosen where a pure float32 implementation of snrm2() will not give a precise result. So any implementation which gives something in the range from 10000.0 to 10000.5 needs to be considered correct. Maybe it would be better to choose data where rounded-double and float32 implementations will give the same result, like: a = array(range(90000,100000), dtype=float32)/10.0 norm(a) should give 950434.0 for any implementation.
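The first two behaviours can be reproduced in a few lines of NumPy, assuming a test vector of the form [1e4, 1, 1, ..., 1] with 10000 trailing ones (which matches the reported values; this is a sketch, not the ATLAS code):

```python
import numpy as np

# hypothetical test vector in the spirit of test_stable: norm is close to 1e4
a = np.array([1e4] + [1] * 10000, dtype=np.float32)

# pure float32 accumulation: each "+ 1.0" falls below float32 resolution at 1e8
acc = np.float32(0.0)
for x in a:
    acc = np.float32(acc + x * x)
naive32 = np.float32(np.sqrt(acc))

# internal double accumulation, rounded to float32 only at the end
rounded64 = np.float32(np.sqrt(np.sum(a.astype(np.float64) ** 2)))

print(naive32, rounded64)  # 10000.0 vs 10000.5
```

The float32 spacing near 1e8 is 8, so the squared ones simply vanish in the naive sum, while the double sum 100010000 yields sqrt ~10000.49999, which rounds up to the float32 value 10000.5.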
Ruediger 2012/9/22 Ralf Gommers : > > > On Sat, Sep 8, 2012 at 9:38 PM, wrote: >> >> >> ------------------------- >> If I just unzip the superpack instead of installing the sse3, I get >> only one failure >> >> ====================================================================== >> FAIL: test_basic.TestNorm.test_stable >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> File >> "E:\Josef\testing\tox\py27b\lib\site-packages\nose-1.1.2-py2.7.egg\nose\case.py", >> line 197, in runTest >> self.test(*self.arg) >> File >> "E:\Josef\testing\tox\py27b\lib\site-packages\scipy\linalg\tests\test_basic.py", >> line 592, in >> test_stable >> assert_almost_equal(norm(a) - 1e4, 0.0, err_msg=msg) >> File >> "E:\Josef\testing\tox\py27b\lib\site-packages\numpy-1.6.2-py2.7-win32.egg\numpy\testing\utils.py", >> line 468, in assert_almost_equal >> raise AssertionError(msg) >> AssertionError: >> Arrays are not almost equal to 7 decimals >> : Result should equal either 0.0 or 0.5 (depending on implementation of >> snrm2). >> ACTUAL: 0.4990234375 >> DESIRED: 0.0 >> >> maybe some sse incompatibilities. > > > This one keeps on coming back. We fixed the test for allowing 0.0 or 0.5, > but it can also just break like this. It does so at least for some people on > OS X 10.8 and on Windows. Since the problem seems to be in ATLAS / > Accelerate, should we leave it as is or not? > > Ralf > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From ewm at redtetrahedron.org Sat Sep 22 12:45:11 2012 From: ewm at redtetrahedron.org (Eric Moore) Date: Sat, 22 Sep 2012 12:45:11 -0400 Subject: [SciPy-Dev] Make orthogonal eval_* ufuncs? Message-ID: <505DEB17.8080800@redtetrahedron.org> Hi list, There are some issues with the eval_* routines for orthogonal polynomials. 1. They appear to support out parameters, but they don't. 2. 
They choke when handed lists rather than arrays. (This is ticket #1435) 3. special.binom is exported but doesn't appear in the docs on scipy.org. I started to fix these issues this morning by turning them into ufuncs and calling hyp2f1, Gamma and lgam from cephes.h and hyp1f1_wrap from specfun_wrappers.h directly. However the various error handling routines defined in _cephesmodule.c (line 1185+) are actually called in cephes/mtherr.c. This means that I can't actually do it that way, unless I move some things around. I'd propose moving all of the eval_* functions defined in orthogonal_eval.pyx to _cephesmodule.c and turning them all into ufuncs. Thoughts? Eric #1435: Problems when calling special.legendre function over a list for high order http://projects.scipy.org/scipy/ticket/1435 From ralf.gommers at gmail.com Sat Sep 22 13:01:05 2012 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 22 Sep 2012 19:01:05 +0200 Subject: [SciPy-Dev] TestNorm.test_stable failure [was ANN: SciPy 0.11.0 release candidate 2] In-Reply-To: References: Message-ID: 57711ac8b On Sat, Sep 22, 2012 at 6:14 PM, R?diger Kessel wrote: > Hi, > > test_basic.TestNorm.test_stable is very sensitive against the > implementation of the basic BLAS routine snrm2() as I have explained > before. > Its good to tell implementations apart, but it is not very robust. > > A pure float32 implementation of snrm2() will give 10000.0 for norm(a). > An internal double implementation with rounding to float32 at the end > gives 10000.5 for norm(a). > A mixed implementation with rounding to float32 before rescaling gives > 10000.4990234375 for norm(a). > > I have not looked into the implementation in ATLAS. I just played with > my python implementation to see if I can find an explanation. > > I think the problem here is that for matrix a some data was chosen > where a pure float32 implementation of snrm2() will not give a precise > result. 
So any implementation with gives something in the range from > 10000.0 to 10000.5 need to be considered as correct. > > Maybe it would be better to chose data where rounded double and > float32 implementation will give the same result like: > > a = array(range(90000,100000), dtype=float32)/10.0 > > norm(a) should give 950434.0 for any implementation. > I think you mean 950433.5, which is what float64 gives (approximately). On win32 both scipy and numpy give 950433.5, on OS X scipy gives 950433.5 and numpy 950433.56. The original purpose of the test was to check that the scipy version was more stable than the numpy one, so the input you suggest may work. Thanks, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ruediger.kessel at gmail.com Sat Sep 22 15:31:45 2012 From: ruediger.kessel at gmail.com (=?ISO-8859-1?Q?R=FCdiger_Kessel?=) Date: Sat, 22 Sep 2012 15:31:45 -0400 Subject: [SciPy-Dev] TestNorm.test_stable failure [was ANN: SciPy 0.11.0 release candidate 2] In-Reply-To: References: Message-ID: Don't we talk about snrm2() for float32? So it should always return float32 and not float64, doesn't it? Ruediger 2012/9/22 Ralf Gommers : > 57711ac8b > > > > On Sat, Sep 22, 2012 at 6:14 PM, R?diger Kessel > wrote: >> >> Hi, >> >> test_basic.TestNorm.test_stable is very sensitive against the >> implementation of the basic BLAS routine snrm2() as I have explained >> before. >> Its good to tell implementations apart, but it is not very robust. >> >> A pure float32 implementation of snrm2() will give 10000.0 for norm(a). >> An internal double implementation with rounding to float32 at the end >> gives 10000.5 for norm(a). >> A mixed implementation with rounding to float32 before rescaling gives >> 10000.4990234375 for norm(a). >> >> I have not looked into the implementation in ATLAS. I just played with >> my python implementation to see if I can find an explanation. 
>> >> I think the problem here is that for matrix a some data was chosen >> where a pure float32 implementation of snrm2() will not give a precise >> result. So any implementation with gives something in the range from >> 10000.0 to 10000.5 need to be considered as correct. >> >> Maybe it would be better to chose data where rounded double and >> float32 implementation will give the same result like: >> >> a = array(range(90000,100000), dtype=float32)/10.0 >> >> norm(a) should give 950434.0 for any implementation. > > > I think you mean 950433.5, which is what float64 gives (approximately). On > win32 both scipy and numpy give 950433.5, on OS X scipy gives 950433.5 and > numpy 950433.56. The original purpose of the test was to check that the > scipy version was more stable than the numpy one, so the input you suggest > may work. > > Thanks, > Ralf > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From ralf.gommers at gmail.com Sun Sep 23 05:33:54 2012 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 23 Sep 2012 11:33:54 +0200 Subject: [SciPy-Dev] tomorrow 0.11.0 final Message-ID: Hi all, The 0.11.0 release cycle has taken quite a bit longer than previous cycles unfortunately, but I think now the 0.11.x branch is ready. Final commit marking the test TestEig.test_singular that was failing on Windows for some people as knownfail in 0.11.x is at https://github.com/scipy/scipy/pull/323. The sort keyword to linalg.qz, which has been giving problems on several platforms, has been disabled in both master and 0.11.x. I consider the issues with arpack and single precision linalg routines on OS X 10.8 out of scope for 0.11.0. It would be great if someone who's on that OS could investigate, but it shouldn't block this release. The few changes since RC2 don't require a new RC imo, so I plan to tag the release tomorrow. 
If there's any commits I missed that absolutely need to be backported, please let me know. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sat Sep 22 17:16:12 2012 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 22 Sep 2012 23:16:12 +0200 Subject: [SciPy-Dev] TestNorm.test_stable failure [was ANN: SciPy 0.11.0 release candidate 2] In-Reply-To: References: Message-ID: On Sat, Sep 22, 2012 at 9:31 PM, R?diger Kessel wrote: > Don't we talk about snrm2() for float32? So it should always return > float32 and not float64, doesn't it? > Indeed. Maybe my message wasn't clear enough, but I did show float32 values: >>> a = array(range(90000,100000), dtype=float32)/10.0 >>> linalg.norm(a) 950433.5 >>> np.linalg.norm(a) 950433.56 > > Ruediger > > 2012/9/22 Ralf Gommers : > > 57711ac8b > > > > > > > > On Sat, Sep 22, 2012 at 6:14 PM, R?diger Kessel < > ruediger.kessel at gmail.com> > > wrote: > >> > >> Hi, > >> > >> test_basic.TestNorm.test_stable is very sensitive against the > >> implementation of the basic BLAS routine snrm2() as I have explained > >> before. > >> Its good to tell implementations apart, but it is not very robust. > >> > >> A pure float32 implementation of snrm2() will give 10000.0 for norm(a). > >> An internal double implementation with rounding to float32 at the end > >> gives 10000.5 for norm(a). > >> A mixed implementation with rounding to float32 before rescaling gives > >> 10000.4990234375 for norm(a). > >> > >> I have not looked into the implementation in ATLAS. I just played with > >> my python implementation to see if I can find an explanation. > >> > >> I think the problem here is that for matrix a some data was chosen > >> where a pure float32 implementation of snrm2() will not give a precise > >> result. So any implementation with gives something in the range from > >> 10000.0 to 10000.5 need to be considered as correct. 
> >> > >> Maybe it would be better to chose data where rounded double and > >> float32 implementation will give the same result like: > >> > >> a = array(range(90000,100000), dtype=float32)/10.0 > >> > >> norm(a) should give 950434.0 for any implementation. > > > > > > I think you mean 950433.5, which is what float64 gives (approximately). > On > > win32 both scipy and numpy give 950433.5, on OS X scipy gives 950433.5 > and > > numpy 950433.56. The original purpose of the test was to check that the > > scipy version was more stable than the numpy one, so the input you > suggest > > may work. > > > > Thanks, > > Ralf > > > > > > > > _______________________________________________ > > SciPy-Dev mailing list > > SciPy-Dev at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-dev > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From orion at cora.nwra.com Mon Sep 24 13:02:48 2012 From: orion at cora.nwra.com (Orion Poplawski) Date: Mon, 24 Sep 2012 11:02:48 -0600 Subject: [SciPy-Dev] Fwd: Package: scipy-0.11.0-0.1.rc2.fc18 Tag: f18-updates-candidate Status: failed Built by: orion In-Reply-To: <505BA7C0.6050605@cora.nwra.com> References: <20120920210154.D759123187@bastion01.phx2.fedoraproject.org> <505BA7C0.6050605@cora.nwra.com> Message-ID: <50609238.8040005@cora.nwra.com> On 09/20/2012 05:33 PM, Orion Poplawski wrote: > This is a plea for some help. We've been having trouble getting scipy to pass > all of the tests in the Fedora 18 build with python 3.3 (although it seems to > build okay in Fedora 19). Below are the logs of the build. There appears to > be some kind of memory corruption that manifests itself a little differently > on 32-bit vs. 64-bit. I really have no idea myself how to pursue debugging > this, though I'm happy to provide any more needed information. 
> Well, I've at least been able to determine that it happens in scipy/linalg/test_decomp.py test_dsbevx(), though it doesn't fail the same way each time: Compare dsbevx eigenvalues and eigenvectors ... Segmentation fault $ python3 /builddir/build/BUILDROOT/scipy-0.11.0-0.1.rc2.fc18.i386/usr/lib/python3.3/site-packages/scipy/linalg/tests/test_decomp.py ..............E.....E.........................................................................................................................................................................................................................................................................E ====================================================================== ERROR: Compare dsbevx eigenvalues and eigenvectors ---------------------------------------------------------------------- Traceback (most recent call last): File "/builddir/build/BUILDROOT/scipy-0.11.0-0.1.rc2.fc18.i386/usr/lib/python3.3/site-packages/scipy/linalg/tests/test_decomp.py", line 373, in test_dsbevx evec_ = evec[:,argsort(w)] IndexError: index (1) out of range (0<=index<0) in dimension 1 ====================================================================== ERROR: Compare zhbevx eigenvalues and eigenvectors ---------------------------------------------------------------------- Traceback (most recent call last): File "/builddir/build/BUILDROOT/scipy-0.11.0-0.1.rc2.fc18.i386/usr/lib/python3.3/site-packages/scipy/linalg/tests/test_decomp.py", line 395, in test_zhbevx evec_ = evec[:,argsort(w)] IndexError: index (1) out of range (0<=index<0) in dimension 1 ====================================================================== ERROR: test_decomp.test_lapack_misaligned ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python3.3/site-packages/nose/case.py", line 198, in runTest self.test(*self.arg) File "/usr/lib/python3.3/site-packages/numpy/testing/decorators.py", line 214, in knownfailer
raise KnownFailureTest(msg) numpy.testing.noseclasses.KnownFailureTest: Ticket #1152, triggers a segfault in rare cases. ---------------------------------------------------------------------- Ran 287 tests in 1.915s FAILED (errors=3) $ python3 /builddir/build/BUILDROOT/scipy-0.11.0-0.1.rc2.fc18.i386/usr/lib/python3.3/site-packages/scipy/linalg/tests/test_decomp.py ..............Segmentation fault $ python3 /builddir/build/BUILDROOT/scipy-0.11.0-0.1.rc2.fc18.i386/usr/lib/python3.3/site-packages/scipy/linalg/tests/test_decomp.py ..............E.....E....................................................................................................................Segmentation fault Valgrind reports lots of messages like: ==7641== Invalid write of size 8 ==7641== at 0x73AF360: ATL_dswap_xp0yp0aXbX (in /usr/lib/atlas/libatlas.so.3.0) ==7641== by 0x73AF1A2: ATL_dswap (in /usr/lib/atlas/libatlas.so.3.0) ==7641== by 0x722FEC7: atl_f77wrap_dswap_ (in /usr/lib/atlas/libptf77blas.so.3.0) ==7641== by 0x7238C27: dswap_ (in /usr/lib/atlas/libptf77blas.so.3.0) ==7641== by 0x79C405A: dsteqr_ (in /usr/lib/atlas/liblapack.so.3.0) ==7641== by 0x79A8DB4: dsbevx_ (in /usr/lib/atlas/liblapack.so.3.0) ==7641== by 0x7F56A85: f2py_rout_flapack_dsbevx (flapackmodule.c:25876) ==7641== by 0x7F5ECB9: fortran_call (fortranobject.c:346) ==7641== by 0x406CBB0: PyObject_Call (in /usr/lib/libpython3.3m.so.1.0) ==7641== by 0x412096C: PyEval_EvalFrameEx (in /usr/lib/libpython3.3m.so.1.0) ==7641== by 0x41236E4: PyEval_EvalFrameEx (in /usr/lib/libpython3.3m.so.1.0) ==7641== by 0x4125713: PyEval_EvalCodeEx (in /usr/lib/libpython3.3m.so.1.0) ==7641== Address 0x4c5e228 is 0 bytes after a block of size 16 alloc'd ==7641== at 0x4029E3F: malloc (vg_replace_malloc.c:270) ==7641== by 0x40B43D8: PyMem_Malloc (in /usr/lib/libpython3.3m.so.1.0) ==7641== by 0x6D124C1: ??? 
(in /usr/lib/python3.3/site-packages/numpy/core/multiarray.cpython-33m.so) ==7641== by 0x7F5F238: array_from_pyobj (fortranobject.c:630) ==7641== by 0x7F5693F: f2py_rout_flapack_dsbevx (flapackmodule.c:25864) ==7641== by 0x7F5ECB9: fortran_call (fortranobject.c:346) ==7641== by 0x406CBB0: PyObject_Call (in /usr/lib/libpython3.3m.so.1.0) ==7641== by 0x412096C: PyEval_EvalFrameEx (in /usr/lib/libpython3.3m.so.1.0) ==7641== by 0x41236E4: PyEval_EvalFrameEx (in /usr/lib/libpython3.3m.so.1.0) ==7641== by 0x4125713: PyEval_EvalCodeEx (in /usr/lib/libpython3.3m.so.1.0) ==7641== by 0x41233A1: PyEval_EvalFrameEx (in /usr/lib/libpython3.3m.so.1.0) ==7641== by 0x4125713: PyEval_EvalCodeEx (in /usr/lib/libpython3.3m.so.1.0) Also, looking at the spec, it was running the python3 checks first. I switched that to see what I would see with python2 and it doesn't fail, although I get plenty of warnings/notices: + python -c 'import scipy; scipy.test('\''full'\'')' ..............................................................................................................................................................................................................................K........................................................................................................K..................................................................K..K...............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................! .......... 
[... test-progress output elided: '.' = pass, 'K' = known failure, 'S' = skip. The RuntimeWarnings emitted during the run follow (exact duplicates dropped). ...]

/builddir/build/BUILDROOT/scipy-0.11.0-0.1.rc2.fc18.i386/usr/lib/python2.7/site-packages/scipy/stats/distributions.py:4163: RuntimeWarning: overflow encountered in exp
  return exp(c*x-exp(x)-gamln(c))
/builddir/build/BUILDROOT/scipy-0.11.0-0.1.rc2.fc18.i386/usr/lib/python2.7/site-packages/scipy/stats/distributions.py:4645: RuntimeWarning: invalid value encountered in sqrt
  vals = 2*(bt+1.0)*sqrt(b-2.0)/((b-3.0)*sqrt(b))
/builddir/build/BUILDROOT/scipy-0.11.0-0.1.rc2.fc18.i386/usr/lib/python2.7/site-packages/scipy/stats/distributions.py:6787: RuntimeWarning: divide by zero encountered in log
  lvals = where(vals==0.0,0.0,log(vals))
/builddir/build/BUILDROOT/scipy-0.11.0-0.1.rc2.fc18.i386/usr/lib/python2.7/site-packages/scipy/optimize/optimize.py:1605: RuntimeWarning: invalid value encountered in double_scalars
  tmp2 = (x - v)*(fx - fw)
/builddir/build/BUILDROOT/scipy-0.11.0-0.1.rc2.fc18.i386/usr/lib/python2.7/site-packages/scipy/optimize/optimize.py:1606: RuntimeWarning: invalid value encountered in double_scalars
  p = (x - v)*tmp2 - (x - w)*tmp1;
/builddir/build/BUILDROOT/scipy-0.11.0-0.1.rc2.fc18.i386/usr/lib/python2.7/site-packages/scipy/optimize/optimize.py:447: RuntimeWarning: invalid value encountered in subtract
  and max(abs(fsim[0] - fsim[1:])) <= ftol):
/builddir/build/BUILDROOT/scipy-0.11.0-0.1.rc2.fc18.i386/usr/lib/python2.7/site-packages/scipy/optimize/optimize.py:2105: RuntimeWarning: invalid value encountered in double_scalars
  if (fx2 - fval) > delta:
/builddir/build/BUILDROOT/scipy-0.11.0-0.1.rc2.fc18.i386/usr/lib/python2.7/site-packages/scipy/optimize/optimize.py:2113: RuntimeWarning: invalid value encountered in double_scalars
  if (2.0*(fx - fval) <= ftol*(abs(fx) + abs(fval)) + 1e-20): break
/builddir/build/BUILDROOT/scipy-0.11.0-0.1.rc2.fc18.i386/usr/lib/python2.7/site-packages/scipy/stats/distributions.py:2421: RuntimeWarning: overflow encountered in double_scalars
  k = gd*g2c*g2cd - g1c**2 * g1cd**2
/builddir/build/BUILDROOT/scipy-0.11.0-0.1.rc2.fc18.i386/usr/lib/python2.7/site-packages/scipy/stats/distributions.py:2421: RuntimeWarning: invalid value encountered in double_scalars
  k = gd*g2c*g2cd - g1c**2 * g1cd**2
/builddir/build/BUILDROOT/scipy-0.11.0-0.1.rc2.fc18.i386/usr/lib/python2.7/site-packages/scipy/stats/distributions.py:1924: RuntimeWarning: invalid value encountered in double_scalars
  Lhat = muhat - Shat*mu
/builddir/build/BUILDROOT/scipy-0.11.0-0.1.rc2.fc18.i386/usr/lib/python2.7/site-packages/scipy/stats/distributions.py:1222: RuntimeWarning: divide by zero encountered in log
  return log(self._pdf(x, *args))
/builddir/build/BUILDROOT/scipy-0.11.0-0.1.rc2.fc18.i386/usr/lib/python2.7/site-packages/scipy/stats/distributions.py:2332: RuntimeWarning: divide by zero encountered in divide
  return where(b > 1, a/(b-1.0), inf)
/builddir/build/BUILDROOT/scipy-0.11.0-0.1.rc2.fc18.i386/usr/lib/python2.7/site-packages/scipy/stats/distributions.py:2334: RuntimeWarning: divide by zero encountered in divide
  return where(b > 2, a*(a+1.0)/((b-2.0)*(b-1.0)), inf)
/builddir/build/BUILDROOT/scipy-0.11.0-0.1.rc2.fc18.i386/usr/lib/python2.7/site-packages/scipy/stats/distributions.py:1676: RuntimeWarning: invalid value encountered in subtract
  mu2 = mu2p - mu*mu
/builddir/build/BUILDROOT/scipy-0.11.0-0.1.rc2.fc18.i386/usr/lib/python2.7/site-packages/scipy/optimize/optimize.py:1604: RuntimeWarning: invalid value encountered in double_scalars
  tmp1 = (x - w)*(fx - fv)
/builddir/build/BUILDROOT/scipy-0.11.0-0.1.rc2.fc18.i386/usr/lib/python2.7/site-packages/scipy/optimize/optimize.py:1873: RuntimeWarning: invalid value encountered in double_scalars
  w = xb - ((xb - xc)*tmp2 - (xb - xa)*tmp1) / denom
/builddir/build/BUILDROOT/scipy-0.11.0-0.1.rc2.fc18.i386/usr/lib/python2.7/site-packages/scipy/stats/distributions.py:2980: RuntimeWarning: divide by zero encountered in double_scalars
  g1 = 2*(v2+2*v1-2)/(v2-6)*sqrt((2*v2-4)/(v1*(v2+v1-2)))
/builddir/build/BUILDROOT/scipy-0.11.0-0.1.rc2.fc18.i386/usr/lib/python2.7/site-packages/scipy/stats/distributions.py:2980: RuntimeWarning: invalid value encountered in sqrt
  g1 = 2*(v2+2*v1-2)/(v2-6)*sqrt((2*v2-4)/(v1*(v2+v1-2)))
/builddir/build/BUILDROOT/scipy-0.11.0-0.1.rc2.fc18.i386/usr/lib/python2.7/site-packages/scipy/stats/distributions.py:3192: RuntimeWarning: divide by zero encountered in divide
  val = (-1.0/c)**n * sum(comb(n,k)*(-1)**k / (1.0-c*k),axis=0)
/builddir/build/BUILDROOT/scipy-0.11.0-0.1.rc2.fc18.i386/usr/lib/python2.7/site-packages/scipy/stats/distributions.py:4583: RuntimeWarning: overflow encountered in double_scalars
  var -= nc*nc*df* val1**2 / 2.0 / val2**2
/builddir/build/BUILDROOT/scipy-0.11.0-0.1.rc2.fc18.i386/usr/lib/python2.7/site-packages/scipy/stats/distributions.py:1222: RuntimeWarning: invalid value encountered in log
  return log(self._pdf(x, *args))
/builddir/build/BUILDROOT/scipy-0.11.0-0.1.rc2.fc18.i386/usr/lib/python2.7/site-packages/scipy/stats/distributions.py:5025: RuntimeWarning: divide by zero encountered in divide
  return where(x < c, 2*x/c, 2*(1-x)/(1-c))

[... remaining test-progress output elided ...]

I don't know if any of these are normal.

-- 
Orion Poplawski
Technical Manager                     303-415-9701 x222
NWRA, Boulder Office                  FAX: 303-415-9702
3380 Mitchell Lane                    orion at nwra.com
Boulder, CO 80301                     http://www.nwra.com

From mailforshao at gmail.com  Tue Sep 25 04:57:34 2012
From: mailforshao at gmail.com (Ilaudy)
Date: Tue, 25 Sep 2012 08:57:34 +0000 (UTC)
Subject: [SciPy-Dev] Scipy.spatial.Delaunay fails to produce the tetrahedra for all particles
Message-ID:

Dear developer,

I am using scipy.spatial.Delaunay to do some analysis. I found that the Delaunay model cannot produce the tetrahedra for my simulated 3D particles, which are at the center of a halo. Many of them are not included in tri.vertices. For example, I select about ~2300 particles in the center and do the triangulation, but tri.vertices only has a shape of (484, 4)! I think this is a problem with Qhull's accuracy settings. Can I find out where to tune this? Any suggestions are welcome. I have reported this at http://projects.scipy.org/scipy/ticket/1728.
You can find the 3D particles' positions in the attached .txt file.

Thanks,
ilaudy

From ralf.gommers at gmail.com  Wed Sep 26 16:23:01 2012
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Wed, 26 Sep 2012 22:23:01 +0200
Subject: [SciPy-Dev] ANN: SciPy 0.11.0 release
Message-ID:

Hi,

I am pleased to announce the availability of SciPy 0.11.0. For this release many new features have been added, and over 120 tickets and pull requests have been closed. Also noteworthy is that the number of contributors for this release has risen to over 50. We hope to see this number continue to increase!

The highlights of this release are:

  - A new module, sparse.csgraph, has been added which provides a number of common sparse graph algorithms.
  - New unified interfaces to the existing optimization and root finding functions have been added.

Sources and binaries can be found at http://sourceforge.net/projects/scipy/files/scipy/0.11.0/; the release notes are copied below.

Thanks to everyone who contributed to this release,
Ralf

==========================
SciPy 0.11.0 Release Notes
==========================

.. contents::

SciPy 0.11.0 is the culmination of 8 months of hard work. It contains many new features, numerous bug fixes, improved test coverage and better documentation. Highlights of this release are:

  - A new module has been added which provides a number of common sparse graph algorithms.
  - New unified interfaces to the existing optimization and root finding functions have been added.

All users are encouraged to upgrade to this release, as there are a large number of bug fixes and optimizations. Our development attention will now shift to bug-fix releases on the 0.11.x branch, and to adding new features on the master branch.

This release requires Python 2.4-2.7 or 3.1-3.2 and NumPy 1.5.1 or greater.
New features
============

Sparse Graph Submodule
----------------------

The new submodule :mod:`scipy.sparse.csgraph` implements a number of efficient graph algorithms for graphs stored as sparse adjacency matrices. Available routines are:

  - :func:`connected_components` - determine connected components of a graph
  - :func:`laplacian` - compute the laplacian of a graph
  - :func:`shortest_path` - compute the shortest path between points on a positive graph
  - :func:`dijkstra` - use Dijkstra's algorithm for shortest path
  - :func:`floyd_warshall` - use the Floyd-Warshall algorithm for shortest path
  - :func:`breadth_first_order` - compute a breadth-first order of nodes
  - :func:`depth_first_order` - compute a depth-first order of nodes
  - :func:`breadth_first_tree` - construct the breadth-first tree from a given node
  - :func:`depth_first_tree` - construct a depth-first tree from a given node
  - :func:`minimum_spanning_tree` - construct the minimum spanning tree of a graph

``scipy.optimize`` improvements
-------------------------------

The optimize module has received a lot of attention this release. In addition to added tests, documentation improvements, bug fixes and code clean-up, the following improvements were made:

  - A unified interface to minimizers of univariate and multivariate functions has been added.
  - A unified interface to root finding algorithms for multivariate functions has been added.
  - The L-BFGS-B algorithm has been updated to version 3.0.

Unified interfaces to minimizers
````````````````````````````````

Two new functions ``scipy.optimize.minimize`` and ``scipy.optimize.minimize_scalar`` were added to provide a common interface to minimizers of multivariate and univariate functions respectively.
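A minimal sketch of the new unified interface (written against the ``minimize`` API as it exists in released SciPy; the built-in Rosenbrock test function ``scipy.optimize.rosen`` is used for brevity):

```python
import numpy as np
from scipy.optimize import minimize, minimize_scalar, rosen

# Multivariate: minimize the Rosenbrock function with Nelder-Mead.
res = minimize(rosen, x0=np.array([1.3, 0.7]), method='Nelder-Mead')
# res is a result object; res.x is close to the known minimum [1, 1].

# Univariate: minimize (x - 2)**2; Brent's method is the default.
res_s = minimize_scalar(lambda x: (x - 2.0) ** 2)
# res_s.x is close to 2.0.
```

Switching solvers then amounts to changing the ``method`` string rather than calling a different function.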
For multivariate functions, ``scipy.optimize.minimize`` provides an interface to methods for unconstrained optimization (`fmin`, `fmin_powell`, `fmin_cg`, `fmin_ncg`, `fmin_bfgs` and `anneal`) or constrained optimization (`fmin_l_bfgs_b`, `fmin_tnc`, `fmin_cobyla` and `fmin_slsqp`).

For univariate functions, ``scipy.optimize.minimize_scalar`` provides an interface to methods for unconstrained and bounded optimization (`brent`, `golden`, `fminbound`). This makes it easier to compare and switch between solvers.

Unified interface to root finding algorithms
````````````````````````````````````````````

The new function ``scipy.optimize.root`` provides a common interface to root finding algorithms for multivariate functions, embedding the `fsolve`, `leastsq` and `nonlin` solvers.

``scipy.linalg`` improvements
-----------------------------

New matrix equation solvers
```````````````````````````

Solvers for the Sylvester equation (``scipy.linalg.solve_sylvester``), discrete and continuous Lyapunov equations (``scipy.linalg.solve_lyapunov``, ``scipy.linalg.solve_discrete_lyapunov``) and discrete and continuous algebraic Riccati equations (``scipy.linalg.solve_continuous_are``, ``scipy.linalg.solve_discrete_are``) have been added to ``scipy.linalg``. These solvers are often used in the field of linear control theory.

QZ and QR Decomposition
```````````````````````

It is now possible to calculate the QZ, or Generalized Schur, decomposition using ``scipy.linalg.qz``. This function wraps the LAPACK routines sgges, dgges, cgges, and zgges.

The function ``scipy.linalg.qr_multiply``, which allows efficient computation of the matrix product of Q (from a QR decomposition) and a vector, has been added.

Pascal matrices
```````````````

A function for creating Pascal matrices, ``scipy.linalg.pascal``, was added.
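As an illustration of the matrix-equation solvers introduced above, a small self-contained check of ``solve_sylvester`` (the random matrices and the diagonal shift are chosen here purely for demonstration):

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.RandomState(0)
# Shifting the diagonals keeps the spectra of A and -B disjoint,
# so the Sylvester equation has a unique solution.
a = rng.rand(3, 3) + 3.0 * np.eye(3)
b = rng.rand(2, 2) + 3.0 * np.eye(2)
q = rng.rand(3, 2)

x = solve_sylvester(a, b, q)  # solves A X + X B = Q for X
residual = np.abs(a.dot(x) + x.dot(b) - q).max()
# residual is at machine-precision level
```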
Sparse matrix construction and operations
-----------------------------------------

Two new functions, ``scipy.sparse.diags`` and ``scipy.sparse.block_diag``, were added to easily construct diagonal and block-diagonal sparse matrices respectively.

``scipy.sparse.csc_matrix`` and ``csr_matrix`` now support the operations ``sin``, ``tan``, ``arcsin``, ``arctan``, ``sinh``, ``tanh``, ``arcsinh``, ``arctanh``, ``rint``, ``sign``, ``expm1``, ``log1p``, ``deg2rad``, ``rad2deg``, ``floor``, ``ceil`` and ``trunc``. Previously, these operations had to be performed by operating on the matrices' ``data`` attribute.

LSMR iterative solver
---------------------

LSMR, an iterative method for solving (sparse) linear and linear least-squares systems, was added as ``scipy.sparse.linalg.lsmr``.

Discrete Sine Transform
-----------------------

Bindings for the discrete sine transform functions have been added to ``scipy.fftpack``.

``scipy.interpolate`` improvements
----------------------------------

For interpolation in spherical coordinates, the three classes ``scipy.interpolate.SmoothSphereBivariateSpline``, ``scipy.interpolate.LSQSphereBivariateSpline``, and ``scipy.interpolate.RectSphereBivariateSpline`` have been added.

Binned statistics (``scipy.stats``)
-----------------------------------

The stats module has gained functions to do binned statistics, which are a generalization of histograms, in 1-D, 2-D and multiple dimensions: ``scipy.stats.binned_statistic``, ``scipy.stats.binned_statistic_2d`` and ``scipy.stats.binned_statistic_dd``.

Deprecated features
===================

``scipy.sparse.cs_graph_components`` has been made a part of the sparse graph submodule, and renamed to ``scipy.sparse.csgraph.connected_components``. Calling the former routine will result in a deprecation warning.

``scipy.misc.radon`` has been deprecated. A more full-featured radon transform can be found in scikits-image.

``scipy.io.save_as_module`` has been deprecated.
A better way to save multiple Numpy arrays is the ``numpy.savez`` function.

The `xa` and `xb` parameters for all distributions in ``scipy.stats.distributions`` already weren't used; they have now been deprecated.

Backwards incompatible changes
==============================

Removal of ``scipy.maxentropy``
-------------------------------

The ``scipy.maxentropy`` module, which was deprecated in the 0.10.0 release, has been removed. Logistic regression in scikits.learn is a good and modern alternative for this functionality.

Minor change in behavior of ``splev``
-------------------------------------

The spline evaluation function now behaves similarly to ``interp1d`` for size-1 arrays. Previous behavior::

    >>> from scipy.interpolate import splev, splrep, interp1d
    >>> x = [1,2,3,4,5]
    >>> y = [4,5,6,7,8]
    >>> tck = splrep(x, y)
    >>> splev([1], tck)
    4.
    >>> splev(1, tck)
    4.

Corrected behavior::

    >>> splev([1], tck)
    array([ 4.])
    >>> splev(1, tck)
    array(4.)

This also affects the ``UnivariateSpline`` classes.

Behavior of ``scipy.integrate.complex_ode``
-------------------------------------------

The behavior of the ``y`` attribute of ``complex_ode`` is changed. Previously, it expressed the complex-valued solution in the form::

    z = ode.y[::2] + 1j * ode.y[1::2]

Now, it is directly the complex-valued solution::

    z = ode.y

Minor change in behavior of T-tests
-----------------------------------

The T-tests ``scipy.stats.ttest_ind``, ``scipy.stats.ttest_rel`` and ``scipy.stats.ttest_1samp`` have been changed so that 0 / 0 now returns NaN instead of 1.

Other changes
=============

The SuperLU sources in ``scipy.sparse.linalg`` have been updated to version 4.3 from upstream.

The function ``scipy.signal.bode``, which calculates magnitude and phase data for a continuous-time system, has been added.

The two-sample T-test ``scipy.stats.ttest_ind`` gained an option to compare samples with unequal variances, i.e. Welch's T-test.
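The new unequal-variance option can be exercised as follows (the synthetic samples below are purely illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.RandomState(42)
a = rng.normal(loc=0.0, scale=1.0, size=50)  # same mean ...
b = rng.normal(loc=0.0, scale=5.0, size=30)  # ... but much larger variance

# Classic Student's t-test assumes equal population variances.
t_eq, p_eq = stats.ttest_ind(a, b)
# Welch's variant (equal_var=False, new in 0.11) drops that assumption.
t_w, p_w = stats.ttest_ind(a, b, equal_var=False)
```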
``scipy.misc.logsumexp`` now takes an optional ``axis`` keyword argument.

Authors
=======

This release contains work by the following people (contributed at least one patch to this release, names in alphabetical order):

* Jeff Armstrong
* Chad Baker
* Brandon Beacher +
* behrisch +
* borishim +
* Matthew Brett
* Lars Buitinck
* Luis Pedro Coelho +
* Johann Cohen-Tanugi
* David Cournapeau
* dougal +
* Ali Ebrahim +
* endolith +
* Björn Forsman +
* Robert Gantner +
* Sebastian Gassner +
* Christoph Gohlke
* Ralf Gommers
* Yaroslav Halchenko
* Charles Harris
* Jonathan Helmus +
* Andreas Hilboll +
* Marc Honnorat +
* Jonathan Hunt +
* Maxim Ivanov +
* Thouis (Ray) Jones
* Christopher Kuster +
* Josh Lawrence +
* Denis Laxalde +
* Travis Oliphant
* Joonas Paalasmaa +
* Fabian Pedregosa
* Josef Perktold
* Gavin Price +
* Jim Radford +
* Andrew Schein +
* Skipper Seabold
* Jacob Silterra +
* Scott Sinclair
* Alexis Tabary +
* Martin Teichmann
* Matt Terry +
* Nicky van Foreest +
* Jacob Vanderplas
* Patrick Varilly +
* Pauli Virtanen
* Nils Wagner +
* Darryl Wally +
* Stefan van der Walt
* Liming Wang +
* David Warde-Farley +
* Warren Weckesser
* Sebastian Werk +
* Mike Wimmer +
* Tony S Yu +

A total of 55 people contributed to this release. People with a "+" by their names contributed a patch for the first time.

From pav at iki.fi  Sun Sep 30 09:42:18 2012
From: pav at iki.fi (Pauli Virtanen)
Date: Sun, 30 Sep 2012 16:42:18 +0300
Subject: [SciPy-Dev] Scipy.spatial.Delaunay fails to produce the tetrahedra for all particles
In-Reply-To: 
References: 
Message-ID:

Dear Ilaudy,

25.09.2012 11:57, Ilaudy kirjoitti:
> I am using scipy.spatial.Delaunay to do some analysis. I found that the
> Delaunay model cannot produce the tetrahedra for my simulated 3D
> particles, which are at the center of a halo. Many of them are not
> included in tri.vertices.
> For example, I select about ~2300 particles in the center and do the
> triangulation, but tri.vertices only has a shape of (484, 4)! I think
> this is a problem with Qhull's accuracy settings. Can I find out where
> to tune this? Any suggestions are welcome.

Yes, this is an issue due to Qhull not including some vertices in the triangulation because of precision issues. This can probably be avoided by supplying additional options to Qhull. For instance, "QJ" should force it to produce a triangulation including all points.

Unfortunately, there is currently no way to pass custom options to scipy.spatial.Delaunay, but adding that is in the plans.

-- 
Pauli Virtanen

From helmrp at yahoo.com  Sun Sep 30 14:01:48 2012
From: helmrp at yahoo.com (The Helmbolds)
Date: Sun, 30 Sep 2012 11:01:48 -0700 (PDT)
Subject: [SciPy-Dev] Docstrings for the optimize functions
In-Reply-To: 
References: 
Message-ID: <1349028108.98033.YahooMailNeo@web31801.mail.mud.yahoo.com>

Problem: As we update the scipy.optimize docstrings, how should we deal with the difference between the "new style" or 'minimize' calling sequence and its return dictionary, and the "old style" or 'pre-minimize' calling sequence and return values?

Background and Discussion: The docstring for the over-arching 'minimize' function is OK as far as it goes. However, while it mentions that the different methods differ in the number and/or kind of parameters in both their calling sequences and in their 'Results' dictionaries, it (quite properly) does not go into the nitty-gritty details of these items for each 'method'. Yet those details should be described somewhere, especially for new users.

Suggested Approach: I propose the following approach to documenting the individual 'methods' while we are in this "transition" period:

1.
In the Summary (or Extended Summary) of the method, include words to the effect that: Although this documentation for the most part describes the "old style" calling sequence and return values, it is strongly recommended that new code invoking this method use the "new style" or 'minimize' calling sequence and return values, and that serious consideration be given to updating existing code to the "new style". To ease the transition, one or more examples illustrating both the "old style" and the "new style" calling sequences and return values are provided in the Examples section.

2. Leave the docstring's sections on Parameters, Returns, Other Parameters, Raises, Notes, and References written as though the "old style" calling sequence is being used. Add remarks on the "new style" to these sections only when clarity requires it. For example, it may be helpful to add a sentence to the Parameters and Returns sections stating that: The Examples section shows how to adapt this to the "new style" calling sequence and return values.

3. In the Examples section, first illustrate how to use the "old style" version. Then add an example illustrating how exactly the same problem would be handled when the "new style" 'minimize' calling sequence is used. In that example, discuss just as much of the details of the method's "options dictionary", "constraints dictionary sequence", and "Results dictionary" as seems necessary to guide a new user. In other words, adhere to the principle that "All documentation should be as brief as possible -- but no briefer!!!"

Bob H
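To make point 3 concrete, here is a hedged sketch of what such a paired example might look like (illustrative only, not proposed docstring text; it uses SciPy's built-in Rosenbrock function ``scipy.optimize.rosen`` and the released ``minimize`` API):

```python
import numpy as np
from scipy.optimize import fmin, minimize, rosen

x0 = np.array([1.3, 0.7, 0.8])

# "Old style": the solver-specific function returns the solution array directly.
xopt_old = fmin(rosen, x0, disp=False)

# "New style": one unified entry point, selected by `method`; it returns a
# result object whose `x` attribute holds the solution.
res = minimize(rosen, x0, method='Nelder-Mead')
xopt_new = res.x
```

Both calls solve the same problem; the difference the docstrings need to explain is only the calling sequence and the shape of the return value.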