From fabian.pedregosa at inria.fr Mon Feb 1 04:28:20 2010 From: fabian.pedregosa at inria.fr (Fabian Pedregosa) Date: Mon, 01 Feb 2010 10:28:20 +0100 Subject: [SciPy-dev] Ball Tree code updated (ticket 1048) In-Reply-To: <58df6dc21001061448l4dc9ef05x478437a7837d3c32@mail.gmail.com> References: <58df6dc21001061448l4dc9ef05x478437a7837d3c32@mail.gmail.com> Message-ID: <4B669EB4.9070903@inria.fr> Jake VanderPlas wrote: > Hello, > I have had comments from a few people over the last two months on the > Ball Tree code that I submitted (ticket 1048). I cleaned up the code > a bit and posted the changes on the tracker. Any other comments would > be appreciated! > -Jake > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > Hi! Your code works great, with impressive speed improvements in high dimensions. I could just find a minor thing: leaf_size in the docstring should read leafsize. If this turns out to be too specific for scipy, would you mind that it get's included in scikit.learn [1] (BSD-license)? Thanks, fabian [1] http://scikit-learn.sourceforge.net From david at silveregg.co.jp Thu Feb 4 01:27:57 2010 From: david at silveregg.co.jp (David Cournapeau) Date: Thu, 04 Feb 2010 15:27:57 +0900 Subject: [SciPy-dev] Comments on API for Matlab's eigs equivalent (computing a few eigenvalues only) Message-ID: <4B6A68ED.6070007@silveregg.co.jp> Hi, I have played a bit with adding support for computing a few eigenvalues of full symmetric matrices. I would like some comments on the current API: import numpy as np from scipy.linalg import eigs x = np.random.randn(10, 10) x = np.dot(x.T, x) # Retrieve the 3 biggest eigenvalues eigs(x, 3)[0] # Retrieve the 3 smallest eigenvalues eigs(x, -3)[0] # Retrieve the 3 biggest eigenvalues eigs(x, [0, 3], mode="index")[0] # Retrieve the 2nd and 3rd biggest eigs(x, [1, 3], mode="index")[0] # Retrieve all the eigenvalues in the range [1.5, 3.5[ eigs(x, [1.5, 3.5], mode="range")[0] One thing which does not feel right is that that the range in the "index" mode is exactly inverted compared to the output (i.e. if you ask for the range [0, 3], you get the last three items from what you would get if you asked for the full range [0, 10]), but this is because I kept compatibility with Octave (always showing from biggest to smallest). It would be easy to always do the contrary (from smallest to biggest) - besides consistency, it has the advantages that it is the actual order you get back from the underlying LAPACK function. Also, I needed to modify the f2py file for the related LAPACK functions, effectively changing their API in a backward incompatible way. Are the low-level f2py wrappers considered public API ? cheers, David From charlesr.harris at gmail.com Thu Feb 4 01:37:59 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 3 Feb 2010 23:37:59 -0700 Subject: [SciPy-dev] Comments on API for Matlab's eigs equivalent (computing a few eigenvalues only) In-Reply-To: <4B6A68ED.6070007@silveregg.co.jp> References: <4B6A68ED.6070007@silveregg.co.jp> Message-ID: On Wed, Feb 3, 2010 at 11:27 PM, David Cournapeau wrote: > Hi, > > I have played a bit with adding support for computing a few eigenvalues > of full symmetric matrices. 
I would like some comments on the current API: > > import numpy as np > from scipy.linalg import eigs > x = np.random.randn(10, 10) > x = np.dot(x.T, x) > # Retrieve the 3 biggest eigenvalues > eigs(x, 3)[0] > # Retrieve the 3 smallest eigenvalues > eigs(x, -3)[0] > # Retrieve the 3 biggest eigenvalues > eigs(x, [0, 3], mode="index")[0] > # Retrieve the 2nd and 3rd biggest > eigs(x, [1, 3], mode="index")[0] > # Retrieve all the eigenvalues in the range [1.5, 3.5[ > eigs(x, [1.5, 3.5], mode="range")[0] > > One thing which does not feel right is that that the range in the > "index" mode is exactly inverted compared to the output (i.e. if you ask > Why not use separate keywords for index and range. Changing the meaning of an argument using another keyword is just weird. I've seen it used elsewhere, but still... At some point you might even want the largest three in a range. So something like eigs(x, index=3) eigs(x, range=[1,4]) Why the index on the end? Is the return a list? > for the range [0, 3], you get the last three items from what you would > get if you asked for the full range [0, 10]), but this is because I kept > compatibility with Octave (always showing from biggest to smallest). It > Why follow Octave, isn't Octave like Matlab? Matlab functions are a mess. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at silveregg.co.jp Thu Feb 4 01:49:42 2010 From: david at silveregg.co.jp (David Cournapeau) Date: Thu, 04 Feb 2010 15:49:42 +0900 Subject: [SciPy-dev] Comments on API for Matlab's eigs equivalent (computing a few eigenvalues only) In-Reply-To: References: <4B6A68ED.6070007@silveregg.co.jp> Message-ID: <4B6A6E06.7060102@silveregg.co.jp> Charles R Harris wrote: > > > On Wed, Feb 3, 2010 at 11:27 PM, David Cournapeau > wrote: > > Hi, > > I have played a bit with adding support for computing a few eigenvalues > of full symmetric matrices. I would like some comments on the > current API: > > import numpy as np > from scipy.linalg import eigs > x = np.random.randn(10, 10) > x = np.dot(x.T, x) > # Retrieve the 3 biggest eigenvalues > eigs(x, 3)[0] > # Retrieve the 3 smallest eigenvalues > eigs(x, -3)[0] > # Retrieve the 3 biggest eigenvalues > eigs(x, [0, 3], mode="index")[0] > # Retrieve the 2nd and 3rd biggest > eigs(x, [1, 3], mode="index")[0] > # Retrieve all the eigenvalues in the range [1.5, 3.5[ > eigs(x, [1.5, 3.5], mode="range")[0] > > One thing which does not feel right is that that the range in the > "index" mode is exactly inverted compared to the output (i.e. if you ask > > > Why not use separate keywords for index and range. Then I am not sure how to handle the "give me the k biggest/smallest" case, which is the most common I think (that's the only one I care personally :) ). Maybe this just warrants several functions ? But then there is the issue of naming (supporting non-symmetric matrices would be nice, but requires a totally different implementation, as LAPACK does not support it AFAIK - the easiest way would be to use ARPACK ATM). > Changing the meaning > of an argument using another keyword is just weird. Agreed - this is just the best I could came up with one function to support all cases while keeping the common case simple. But I feel a bit like I outsmarted myself here. > Why the index on the end? Is the return a list? 
Yes, it also returns eigenvectors > > for the range [0, 3], you get the last three items from what you would > get if you asked for the full range [0, 10]), but this is because I kept > compatibility with Octave (always showing from biggest to smallest). It > > > Why follow Octave, isn't Octave like Matlab? Matlab functions are a mess. Actually, there is another reason I forgot to mention: that's how eigen in scipy.sparse does as well (in decreasing order). I think that the ARPACK wrappers are not that well thought out, though. cheers, David From charlesr.harris at gmail.com Thu Feb 4 02:06:20 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 4 Feb 2010 00:06:20 -0700 Subject: [SciPy-dev] Comments on API for Matlab's eigs equivalent (computing a few eigenvalues only) In-Reply-To: <4B6A6E06.7060102@silveregg.co.jp> References: <4B6A68ED.6070007@silveregg.co.jp> <4B6A6E06.7060102@silveregg.co.jp> Message-ID: On Wed, Feb 3, 2010 at 11:49 PM, David Cournapeau wrote: > Charles R Harris wrote: > > > > > > On Wed, Feb 3, 2010 at 11:27 PM, David Cournapeau > > wrote: > > > > Hi, > > > > I have played a bit with adding support for computing a few > eigenvalues > > of full symmetric matrices. I would like some comments on the > > current API: > > > > import numpy as np > > from scipy.linalg import eigs > > x = np.random.randn(10, 10) > > x = np.dot(x.T, x) > > # Retrieve the 3 biggest eigenvalues > > eigs(x, 3)[0] > > # Retrieve the 3 smallest eigenvalues > > eigs(x, -3)[0] > > # Retrieve the 3 biggest eigenvalues > > eigs(x, [0, 3], mode="index")[0] > > # Retrieve the 2nd and 3rd biggest > > eigs(x, [1, 3], mode="index")[0] > > # Retrieve all the eigenvalues in the range [1.5, 3.5[ > > eigs(x, [1.5, 3.5], mode="range")[0] > > > > One thing which does not feel right is that that the range in the > > "index" mode is exactly inverted compared to the output (i.e. if you > ask > > > > > > Why not use separate keywords for index and range. > > Then I am not sure how to handle the "give me the k biggest/smallest" > case, which is the most common I think (that's the only one I care > personally :) ). > > That would be eigs(x, index=3) or eigs(x, index=-3) respectively, the default value of both range and index would be None, which could possibly return all eigenvalues. I'm not sure that index is the best word but I can't think of a better at the moment. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From dagss at student.matnat.uio.no Thu Feb 4 03:12:04 2010 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Thu, 04 Feb 2010 09:12:04 +0100 Subject: [SciPy-dev] Comments on API for Matlab's eigs equivalent (computing a few eigenvalues only) In-Reply-To: <4B6A6E06.7060102@silveregg.co.jp> References: <4B6A68ED.6070007@silveregg.co.jp> <4B6A6E06.7060102@silveregg.co.jp> Message-ID: <4B6A8154.2040705@student.matnat.uio.no> David Cournapeau wrote: > Charles R Harris wrote: >> >> On Wed, Feb 3, 2010 at 11:27 PM, David Cournapeau > > wrote: >> >> Hi, >> >> I have played a bit with adding support for computing a few eigenvalues >> of full symmetric matrices. 
I would like some comments on the >> current API: >> >> import numpy as np >> from scipy.linalg import eigs >> x = np.random.randn(10, 10) >> x = np.dot(x.T, x) >> # Retrieve the 3 biggest eigenvalues >> eigs(x, 3)[0] >> # Retrieve the 3 smallest eigenvalues >> eigs(x, -3)[0] >> # Retrieve the 3 biggest eigenvalues >> eigs(x, [0, 3], mode="index")[0] >> # Retrieve the 2nd and 3rd biggest >> eigs(x, [1, 3], mode="index")[0] >> # Retrieve all the eigenvalues in the range [1.5, 3.5[ >> eigs(x, [1.5, 3.5], mode="range")[0] >> >> One thing which does not feel right is that that the range in the >> "index" mode is exactly inverted compared to the output (i.e. if you ask >> >> >> Why not use separate keywords for index and range. > > Then I am not sure how to handle the "give me the k biggest/smallest" > case, which is the most common I think (that's the only one I care > personally :) ). > > Maybe this just warrants several functions ? But then there is the issue > of naming (supporting non-symmetric matrices would be nice, but requires > a totally different implementation, as LAPACK does not support it AFAIK > - the easiest way would be to use ARPACK ATM). > >> Changing the meaning >> of an argument using another keyword is just weird. > > Agreed - this is just the best I could came up with one function to > support all cases while keeping the common case simple. But I feel a bit > like I outsmarted myself here. > >> Why the index on the end? Is the return a list? > > Yes, it also returns eigenvectors > >> for the range [0, 3], you get the last three items from what you would >> get if you asked for the full range [0, 10]), but this is because I kept >> compatibility with Octave (always showing from biggest to smallest). It >> >> >> Why follow Octave, isn't Octave like Matlab? Matlab functions are a mess. > > Actually, there is another reason I forgot to mention: that's how eigen > in scipy.sparse does as well (in decreasing order). I think that the > ARPACK wrappers are not that well thought out, though. I feel there's also a certain convention of listing the largest eigenvalues first; "let \lambda_i be the eigenvalues...assume \lambda_2 < \lambda_1 ...". I know I'd assume that the largest eigenvalue came out first, and I've barely used either MATLAB or Octave. (Also with SVD one would typically put the largest singular values first, and so on.) Just a data-point, I don't really care either way... -- Dag Sverre From sturla at molden.no Thu Feb 4 05:40:52 2010 From: sturla at molden.no (Sturla Molden) Date: Thu, 4 Feb 2010 11:40:52 +0100 Subject: [SciPy-dev] compilation with fort77 In-Reply-To: <4B5CEA49.3040604@silveregg.co.jp> References: <4B5A054C.7060308@gmx.de> <4B5CEA49.3040604@silveregg.co.jp> Message-ID: >> >> >> >> the problem is that fort77 cannot deal with those variable sized >> input arguments: >> >> SUBROUTINE mvnun(d, n, lower, upper, means, covar, maxpts, >> & abseps, releps, value, inform) >> ... >> integer n, d, infin(d), maxpts, inform, tmpinf >> double precision lower(d), upper(d), releps, abseps, >> & error, value, stdev(d), rho(d*(d-1)/2), >> & covar(d,d), >> & nlower(d), nupper(d), means(d,n), tmpval >> integer i, j >> >> Could some Fortran expert please help me to make it fort77 >> compatible? The code posted here (I haven't looked in SVN) looks like valid Fortran 77 to me. Dummy arguments can have variable size. f2c might not accept it because variable-size arrays are not supported in C89, and the Fortran to C conversion is rather primitive. 
Sturla > > This file is using some F90/F95 - I am not sure what the policy is on > this point, but don't think we want to guarantee that we will never > use > fortran > F77. > > Can't you get gfortran running on your platform ? Cross-compiling > gfortran for your platform should not be difficult > > David > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev From pav+sp at iki.fi Thu Feb 4 06:30:25 2010 From: pav+sp at iki.fi (Pauli Virtanen) Date: Thu, 4 Feb 2010 11:30:25 +0000 (UTC) Subject: [SciPy-dev] compilation with fort77 References: <4B5A054C.7060308@gmx.de> <4B5CEA49.3040604@silveregg.co.jp> Message-ID: Thu, 04 Feb 2010 11:40:52 +0100, Sturla Molden wrote: >>> the problem is that fort77 cannot deal with those variable sized input >>> arguments: >>> >>> SUBROUTINE mvnun(d, n, lower, upper, means, covar, maxpts, >>> & abseps, releps, value, inform) >>> ... >>> integer n, d, infin(d), maxpts, inform, tmpinf double precision >>> lower(d), upper(d), releps, abseps, >>> & error, value, stdev(d), rho(d*(d-1)/2), & >>> covar(d,d), >>> & nlower(d), nupper(d), means(d,n), tmpval >>> integer i, j >>> >>> Could some Fortran expert please help me to make it fort77 compatible? > > The code posted here (I haven't looked in SVN) looks like valid Fortran > 77 to me. Dummy arguments can have variable size. f2c might not accept > it because variable-size arrays are not supported in C89, and the > Fortran to C conversion is rather primitive. Note that infin, stdev, rho, nlower, and nupper are not dummy variables, so it's not valid F77. -- Pauli Virtanen From dwf at cs.toronto.edu Thu Feb 4 14:31:14 2010 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Thu, 4 Feb 2010 14:31:14 -0500 Subject: [SciPy-dev] Comments on API for Matlab's eigs equivalent (computing a few eigenvalues only) In-Reply-To: <4B6A8154.2040705@student.matnat.uio.no> References: <4B6A68ED.6070007@silveregg.co.jp> <4B6A6E06.7060102@silveregg.co.jp> <4B6A8154.2040705@student.matnat.uio.no> Message-ID: <750A37B6-052E-4F91-A63A-97EC5D422AD5@cs.toronto.edu> On 4-Feb-10, at 3:12 AM, Dag Sverre Seljebotn wrote: > I feel there's also a certain convention of listing the largest > eigenvalues first; "let \lambda_i be the eigenvalues...assume > \lambda_2 > < \lambda_1 ...". +1, though I've seen them listed from smallest to largest as well. David From rob.clewley at gmail.com Thu Feb 4 15:01:56 2010 From: rob.clewley at gmail.com (Rob Clewley) Date: Thu, 4 Feb 2010 15:01:56 -0500 Subject: [SciPy-dev] Comments on API for Matlab's eigs equivalent (computing a few eigenvalues only) In-Reply-To: <750A37B6-052E-4F91-A63A-97EC5D422AD5@cs.toronto.edu> References: <4B6A68ED.6070007@silveregg.co.jp> <4B6A6E06.7060102@silveregg.co.jp> <4B6A8154.2040705@student.matnat.uio.no> <750A37B6-052E-4F91-A63A-97EC5D422AD5@cs.toronto.edu> Message-ID: On Thu, Feb 4, 2010 at 2:31 PM, David Warde-Farley wrote: > On 4-Feb-10, at 3:12 AM, Dag Sverre Seljebotn wrote: > >> I feel there's also a certain convention of listing the largest >> eigenvalues first; "let \lambda_i be the eigenvalues...assume >> \lambda_2 >> < \lambda_1 ...". > > +1, though I've seen them listed from smallest to largest as well. > > David I'm +1 too, but are these going to be ordered by norm of the eigenvalue if they are complex, or by the real part only? I suggest by norm, and the docstring needs to make the choice clear. This functionality has to be consistent for all uses of eig()! 
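To make the two conventions concrete, a minimal numpy sketch (the matrix is
just a toy example, and note that numpy's eig returns the eigenvalues in no
particular order):

>>> import numpy as np
>>> a = np.array([[0., -2., 0.],
...               [2.,  0., 0.],
...               [0.,  0., 1.]])   # eigenvalues 2j, -2j, 1
>>> w, v = np.linalg.eig(a)
>>> w[np.argsort(-np.abs(w))]       # largest norm first
array([ 0.+2.j,  0.-2.j,  1.+0.j])
>>> w[np.argsort(-w.real)]          # largest real part first
array([ 1.+0.j,  0.+2.j,  0.-2.j])

The two orders disagree as soon as the spectrum is complex, so whatever
convention is picked, the docstring has to spell it out.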
-Rob From peridot.faceted at gmail.com Thu Feb 4 15:18:02 2010 From: peridot.faceted at gmail.com (Anne Archibald) Date: Thu, 4 Feb 2010 15:18:02 -0500 Subject: [SciPy-dev] Comments on API for Matlab's eigs equivalent (computing a few eigenvalues only) In-Reply-To: References: <4B6A68ED.6070007@silveregg.co.jp> <4B6A6E06.7060102@silveregg.co.jp> <4B6A8154.2040705@student.matnat.uio.no> <750A37B6-052E-4F91-A63A-97EC5D422AD5@cs.toronto.edu> Message-ID: On 4 February 2010 15:01, Rob Clewley wrote: > On Thu, Feb 4, 2010 at 2:31 PM, David Warde-Farley wrote: >> On 4-Feb-10, at 3:12 AM, Dag Sverre Seljebotn wrote: >> >>> I feel there's also a certain convention of listing the largest >>> eigenvalues first; "let \lambda_i be the eigenvalues...assume >>> \lambda_2 >>> < \lambda_1 ...". >> >> +1, though I've seen them listed from smallest to largest as well. >> >> David > > I'm +1 too, but are these going to be ordered by norm of the > eigenvalue if they are complex, or by the real part only? I suggest by > norm, and the docstring needs to make the choice clear. This > functionality has to be consistent for all uses of eig()! I have to say using the norm for complex eigenvalues is asking for trouble, since it puts -2 between 1 and 3. Not that this is relevant for a symmetric (Hermitian) eigensolver, which will always have real eigenvalues. Anne From rob.clewley at gmail.com Thu Feb 4 19:39:37 2010 From: rob.clewley at gmail.com (Rob Clewley) Date: Thu, 4 Feb 2010 19:39:37 -0500 Subject: [SciPy-dev] Comments on API for Matlab's eigs equivalent (computing a few eigenvalues only) In-Reply-To: References: <4B6A68ED.6070007@silveregg.co.jp> <4B6A6E06.7060102@silveregg.co.jp> <4B6A8154.2040705@student.matnat.uio.no> <750A37B6-052E-4F91-A63A-97EC5D422AD5@cs.toronto.edu> Message-ID: > Not that this is relevant > for a symmetric (Hermitian) eigensolver, which will always have real > eigenvalues. > > Anne Yep, sorry I failed to notice this wasn't an extension to eig, but about a new function eigs only for symmetric matrices. My bad. From warren.weckesser at enthought.com Thu Feb 4 19:43:46 2010 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Thu, 04 Feb 2010 18:43:46 -0600 Subject: [SciPy-dev] Comments on API for Matlab's eigs equivalent (computing a few eigenvalues only) In-Reply-To: References: <4B6A68ED.6070007@silveregg.co.jp> <4B6A6E06.7060102@silveregg.co.jp> <4B6A8154.2040705@student.matnat.uio.no> <750A37B6-052E-4F91-A63A-97EC5D422AD5@cs.toronto.edu> Message-ID: <4B6B69C2.7000907@enthought.com> Anne Archibald wrote: > On 4 February 2010 15:01, Rob Clewley wrote: > >> On Thu, Feb 4, 2010 at 2:31 PM, David Warde-Farley wrote: >> >>> On 4-Feb-10, at 3:12 AM, Dag Sverre Seljebotn wrote: >>> >>> >>>> I feel there's also a certain convention of listing the largest >>>> eigenvalues first; "let \lambda_i be the eigenvalues...assume >>>> \lambda_2 >>>> < \lambda_1 ...". >>>> >>> +1, though I've seen them listed from smallest to largest as well. >>> >>> David >>> >> I'm +1 too, but are these going to be ordered by norm of the >> eigenvalue if they are complex, or by the real part only? I suggest by >> norm, and the docstring needs to make the choice clear. This >> functionality has to be consistent for all uses of eig()! >> > > I have to say using the norm for complex eigenvalues is asking for > trouble, since it puts -2 between 1 and 3. Not that this is relevant > for a symmetric (Hermitian) eigensolver, which will always have real > eigenvalues. 
> > Anne > David C, You said this was for symmetric matrices, but do you envision later allowing nonsymmetric matrices? If not, then perhaps the name of the function should be 'eigsh', following the precedent set by numpy.linalg and scipy.linalg. On the other hand, if the intent is to eventually handle nonsymmetric matrices, then it would be nice to provide an API that is as flexible as (but definitetly not the same as) Matlab's eigs function: http://www.mathworks.com/access/helpdesk/help/techdoc/ref/eigs.html In particular, the choice of ordering by magnitude or by real part is convenient. Warren > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From cournape at gmail.com Thu Feb 4 20:27:07 2010 From: cournape at gmail.com (David Cournapeau) Date: Fri, 5 Feb 2010 10:27:07 +0900 Subject: [SciPy-dev] Comments on API for Matlab's eigs equivalent (computing a few eigenvalues only) In-Reply-To: <4B6B69C2.7000907@enthought.com> References: <4B6A68ED.6070007@silveregg.co.jp> <4B6A6E06.7060102@silveregg.co.jp> <4B6A8154.2040705@student.matnat.uio.no> <750A37B6-052E-4F91-A63A-97EC5D422AD5@cs.toronto.edu> <4B6B69C2.7000907@enthought.com> Message-ID: <5b8d13221002041727w131d1854sde448b76fcd9155c@mail.gmail.com> On Fri, Feb 5, 2010 at 9:43 AM, Warren Weckesser wrote: > David C, > > You said this was for symmetric matrices, but do you envision later > allowing nonsymmetric matrices? Yes. I have only implemented symmetric/Hermitian because that's the only solver that handles getting only a few eigenvalues in LAPACK. Matlab own eigs function uses ARPACK for both symmetric/unsymmetric cases, which is not good according to one of the Lapack developer (http://mail.scipy.org/pipermail/scipy-dev/2007-March/006778.html). But we could use ARPACK for the general case in eigs. > > If not, then perhaps the name of the function should be 'eigsh', > following the precedent set by numpy.linalg and scipy.linalg. I wonder whether those eigh functions are a good idea: I fear that most people will always use eig - maybe one could use the underlying eig*h solver in eig* if the matrix is detected as being symmetric ? I am not really knowledgeable about those issues, though. For example, I don't know whether the symmetric aspect should be checked exactly, or if it is better to use a symmetric solver even if there are very small errors in say A'-A. > > In particular, the choice of ordering by magnitude or by real part is > convenient. It seems that one would need to implement the non-symmetric capabilities to sort this out. I fear that those options are solver specific, though - maybe the solution is to have two levels of API, one low-level and one high level. 
cheers,

David

From josef.pktd at gmail.com  Thu Feb  4 21:24:41 2010
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Thu, 4 Feb 2010 21:24:41 -0500
Subject: [SciPy-dev] Comments on API for Matlab's eigs equivalent (computing a few eigenvalues only)
In-Reply-To: <5b8d13221002041727w131d1854sde448b76fcd9155c@mail.gmail.com>
References: <4B6A68ED.6070007@silveregg.co.jp> <4B6A6E06.7060102@silveregg.co.jp> <4B6A8154.2040705@student.matnat.uio.no> <750A37B6-052E-4F91-A63A-97EC5D422AD5@cs.toronto.edu> <4B6B69C2.7000907@enthought.com> <5b8d13221002041727w131d1854sde448b76fcd9155c@mail.gmail.com>
Message-ID: <1cd32cbb1002041824s646b6a37g7a0e4c761975401e@mail.gmail.com>

On Thu, Feb 4, 2010 at 8:27 PM, David Cournapeau wrote:
> On Fri, Feb 5, 2010 at 9:43 AM, Warren Weckesser
> wrote:
>
>> David C,
>>
>> You said this was for symmetric matrices, but do you envision later
>> allowing nonsymmetric matrices?
>
> Yes. I have only implemented symmetric/Hermitian because that's the
> only solver that handles getting only a few eigenvalues in LAPACK.
> Matlab own eigs function uses ARPACK for both symmetric/unsymmetric
> cases, which is not good according to one of the Lapack developer
> (http://mail.scipy.org/pipermail/scipy-dev/2007-March/006778.html).
> But we could use ARPACK for the general case in eigs.
>
>>
>> If not, then perhaps the name of the function should be 'eigsh',
>> following the precedent set by numpy.linalg and scipy.linalg.
>
> I wonder whether those eigh functions are a good idea: I fear that
> most people will always use eig - maybe one could use the underlying
> eig*h solver in eig* if the matrix is detected as being symmetric ? I
> am not really knowledgeable about those issues, though. For example, I
> don't know whether the symmetric aspect should be checked exactly, or
> if it is better to use a symmetric solver even if there are very
> small errors in say A'-A.
>
>>
>> In particular, the choice of ordering by magnitude or by real part is
>> convenient.
>
> It seems that one would need to implement the non-symmetric
> capabilities to sort this out. I fear that those options are solver
> specific, though - maybe the solution is to have two levels of API,
> one low-level and one high level.
> > cheers, > > David > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > the current version of scipy.linalg.eigh sorts from smallest to largest >>> scipy.linalg.eigh(x)[0] array([ 4.04457427e-03, 1.84073286e-01, 6.74875960e-01, 3.23328824e+00, 4.00741304e+00, 6.98333680e+00, 1.45314842e+01, 1.49260377e+01, 2.47166702e+01, 2.98955755e+01]) >>> scipy.linalg.eigh(x, eigvals=(0,3))[0] array([ 0.00404457, 0.18407329, 0.67487596, 3.23328824]) >>> scipy.linalg.eigh(x, eigvals=(len(x)-3, len(x)-1))[0] array([ 14.92603769, 24.71667021, 29.89557547]) >>> x = np.random.randn(10, 10) + 1j*np.random.randn(10, 10) >>> x = np.dot(x.T, x) >>> scipy.linalg.eigh(x, eigvals=(0,3))[0] array([-36.82039662, -18.20362967, -12.21716065, -9.79752274]) >>> scipy.linalg.eigh(x, eigvals=(len(x)-3, len(x)-1))[0] array([ 14.44815917, 28.8394026 , 42.45471058]) Josef From charlesr.harris at gmail.com Thu Feb 4 21:54:21 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 4 Feb 2010 19:54:21 -0700 Subject: [SciPy-dev] Comments on API for Matlab's eigs equivalent (computing a few eigenvalues only) In-Reply-To: <1cd32cbb1002041824s646b6a37g7a0e4c761975401e@mail.gmail.com> References: <4B6A68ED.6070007@silveregg.co.jp> <4B6A6E06.7060102@silveregg.co.jp> <4B6A8154.2040705@student.matnat.uio.no> <750A37B6-052E-4F91-A63A-97EC5D422AD5@cs.toronto.edu> <4B6B69C2.7000907@enthought.com> <5b8d13221002041727w131d1854sde448b76fcd9155c@mail.gmail.com> <1cd32cbb1002041824s646b6a37g7a0e4c761975401e@mail.gmail.com> Message-ID: On Thu, Feb 4, 2010 at 7:24 PM, wrote: > On Thu, Feb 4, 2010 at 8:27 PM, David Cournapeau > wrote: > > On Fri, Feb 5, 2010 at 9:43 AM, Warren Weckesser > > wrote: > > > >> David C, > >> > >> You said this was for symmetric matrices, but do you envision later > >> allowing nonsymmetric matrices? > > > > Yes. I have only implemented symmetric/Hermitian because that's the > > only solver that handles getting only a few eigenvalues in LAPACK. > > Matlab own eigs function uses ARPACK for both symmetric/unsymmetric > > cases, which is not good according to one of the Lapack developer > > (http://mail.scipy.org/pipermail/scipy-dev/2007-March/006778.html). > > But we could use ARPACK for the general case in eigs. > > > >> > >> If not, then perhaps the name of the function should be 'eigsh', > >> following the precedent set by numpy.linalg and scipy.linalg. > > > > I wonder whether those eigh functions are a good idea: I fear that > > most people will always use eig - maybe one could use the underlying > > eig*h solver in eig* if the matrix is detected as being symmetric ? I > > am not really knowledgeable about those issues, though. For example, I > > don't know whether the symmetric aspect should be checked exactly, or > > if it is better to use a symmetric solver even if there are very > > small errors in say A'-A. > > > >> > >> In particular, the choice of ordering by magnitude or by real part is > >> convenient. > > > > It seems that one would need to implement the non-symmetric > > capabilities to sort this out. I fear that those options are solver > > specific, though - maybe the solution is to have two levels of API, > > one low-level and one high level. 
> >
> > cheers,
> >
> > David
> > _______________________________________________
> > SciPy-Dev mailing list
> > SciPy-Dev at scipy.org
> > http://mail.scipy.org/mailman/listinfo/scipy-dev
> >
>
> the current version of scipy.linalg.eigh sorts from smallest to largest
>
> >>> scipy.linalg.eigh(x)[0]
> array([  4.04457427e-03,   1.84073286e-01,   6.74875960e-01,
>          3.23328824e+00,   4.00741304e+00,   6.98333680e+00,
>          1.45314842e+01,   1.49260377e+01,   2.47166702e+01,
>          2.98955755e+01])
> >>> scipy.linalg.eigh(x, eigvals=(0,3))[0]
> array([ 0.00404457,  0.18407329,  0.67487596,  3.23328824])
> >>> scipy.linalg.eigh(x, eigvals=(len(x)-3, len(x)-1))[0]
> array([ 14.92603769,  24.71667021,  29.89557547])
>
>
> >>> x = np.random.randn(10, 10) + 1j*np.random.randn(10, 10)
> >>> x = np.dot(x.T, x)
> >>> scipy.linalg.eigh(x, eigvals=(0,3))[0]
> array([-36.82039662, -18.20362967, -12.21716065,  -9.79752274])
> >>> scipy.linalg.eigh(x, eigvals=(len(x)-3, len(x)-1))[0]
> array([ 14.44815917,  28.8394026 ,  42.45471058])
>

Yeah. It's my fault ;) It didn't use to sort at all.

Chuck

From josef.pktd at gmail.com  Thu Feb  4 22:08:42 2010
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Thu, 4 Feb 2010 22:08:42 -0500
Subject: [SciPy-dev] Comments on API for Matlab's eigs equivalent (computing a few eigenvalues only)
In-Reply-To:
References: <4B6A68ED.6070007@silveregg.co.jp> <4B6A6E06.7060102@silveregg.co.jp> <4B6A8154.2040705@student.matnat.uio.no> <750A37B6-052E-4F91-A63A-97EC5D422AD5@cs.toronto.edu> <4B6B69C2.7000907@enthought.com> <5b8d13221002041727w131d1854sde448b76fcd9155c@mail.gmail.com> <1cd32cbb1002041824s646b6a37g7a0e4c761975401e@mail.gmail.com>
Message-ID: <1cd32cbb1002041908i42e37c28sd4e54ee1d2bedadb@mail.gmail.com>

On Thu, Feb 4, 2010 at 9:54 PM, Charles R Harris wrote:
>
>
> On Thu, Feb 4, 2010 at 7:24 PM,  wrote:
>>
>> On Thu, Feb 4, 2010 at 8:27 PM, David Cournapeau
>> wrote:
>> > On Fri, Feb 5, 2010 at 9:43 AM, Warren Weckesser
>> > wrote:
>> >
>> >> David C,
>> >>
>> >> You said this was for symmetric matrices, but do you envision later
>> >> allowing nonsymmetric matrices?
>> >
>> > Yes. I have only implemented symmetric/Hermitian because that's the
>> > only solver that handles getting only a few eigenvalues in LAPACK.
>> > Matlab own eigs function uses ARPACK for both symmetric/unsymmetric
>> > cases, which is not good according to one of the Lapack developer
>> > (http://mail.scipy.org/pipermail/scipy-dev/2007-March/006778.html).
>> > But we could use ARPACK for the general case in eigs.
>> >
>> >>
>> >> If not, then perhaps the name of the function should be 'eigsh',
>> >> following the precedent set by numpy.linalg and scipy.linalg.
>> >
>> > I wonder whether those eigh functions are a good idea: I fear that
>> > most people will always use eig - maybe one could use the underlying
>> > eig*h solver in eig* if the matrix is detected as being symmetric ? I
>> > am not really knowledgeable about those issues, though. For example, I
>> > don't know whether the symmetric aspect should be checked exactly, or
>> > if it is better to use a symmetric solver even if there are very
>> > small errors in say A'-A.
>> >
>> >>
>> >> In particular, the choice of ordering by magnitude or by real part is
>> >> convenient.
>> >
>> > It seems that one would need to implement the non-symmetric
>> > capabilities to sort this out. I fear that those options are solver
>> > specific, though - maybe the solution is to have two levels of API,
>> > one low-level and one high level.
>> >
>> > cheers,
>> >
>> > David
>> > _______________________________________________
>> > SciPy-Dev mailing list
>> > SciPy-Dev at scipy.org
>> > http://mail.scipy.org/mailman/listinfo/scipy-dev
>> >
>>
>> the current version of scipy.linalg.eigh sorts from smallest to largest
>>
>> >>> scipy.linalg.eigh(x)[0]
>> array([  4.04457427e-03,   1.84073286e-01,   6.74875960e-01,
>>          3.23328824e+00,   4.00741304e+00,   6.98333680e+00,
>>          1.45314842e+01,   1.49260377e+01,   2.47166702e+01,
>>          2.98955755e+01])
>> >>> scipy.linalg.eigh(x, eigvals=(0,3))[0]
>> array([ 0.00404457,  0.18407329,  0.67487596,  3.23328824])
>> >>> scipy.linalg.eigh(x, eigvals=(len(x)-3, len(x)-1))[0]
>> array([ 14.92603769,  24.71667021,  29.89557547])
>>
>>
>> >>> x = np.random.randn(10, 10) + 1j*np.random.randn(10, 10)
>> >>> x = np.dot(x.T, x)
>> >>> scipy.linalg.eigh(x, eigvals=(0,3))[0]
>> array([-36.82039662, -18.20362967, -12.21716065,  -9.79752274])
>> >>> scipy.linalg.eigh(x, eigvals=(len(x)-3, len(x)-1))[0]
>> array([ 14.44815917,  28.8394026 ,  42.45471058])
>>
>
> Yeah. It's my fault ;) It didn't use to sort at all.

I thought that's how Tiziano added them, I didn't see a sort when I
briefly browsed the source.
But I'm looking at lapack only from the outside.

Josef

>
> Chuck
>
>
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev
>
>

From david at silveregg.co.jp  Thu Feb  4 22:45:42 2010
From: david at silveregg.co.jp (David Cournapeau)
Date: Fri, 05 Feb 2010 12:45:42 +0900
Subject: [SciPy-dev] Why does orth use svd instead of QR ?
Message-ID: <4B6B9466.4010507@silveregg.co.jp>

Hi,

I wanted to know if there was a rationale for using svd to
orthonormalize the columns of a matrix (in scipy.linalg). QR-based
methods are likely to be much faster, and I thought this was the
standard, numerically-stable method to orthonormalize a basis ? If the
reason is to deal with rank-deficient matrices, maybe we could add an
option to choose between them ?
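For the common full-column-rank case the QR route is essentially a
one-liner. A rough sketch of what such an option could look like - the
keyword name and the rank cutoff below are only illustrative, not a
proposed API:

import numpy as np

def orth_sketch(a, method="svd"):
    # Orthonormal basis for the range of a (illustration only).
    if method == "qr":
        # Fast path: assumes a has full column rank.
        q, r = np.linalg.qr(a)
        return q
    # Rank-revealing path, i.e. what orth does today via the svd.
    u, s, vh = np.linalg.svd(a, full_matrices=False)
    tol = max(a.shape) * np.finfo(float).eps * s.max()  # one common cutoff
    return u[:, s > tol]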
cheers,

David

From charlesr.harris at gmail.com  Fri Feb  5 00:37:55 2010
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Thu, 4 Feb 2010 22:37:55 -0700
Subject: [SciPy-dev] Comments on API for Matlab's eigs equivalent (computing a few eigenvalues only)
In-Reply-To: <1cd32cbb1002041908i42e37c28sd4e54ee1d2bedadb@mail.gmail.com>
References: <4B6A68ED.6070007@silveregg.co.jp> <4B6A8154.2040705@student.matnat.uio.no> <750A37B6-052E-4F91-A63A-97EC5D422AD5@cs.toronto.edu> <4B6B69C2.7000907@enthought.com> <5b8d13221002041727w131d1854sde448b76fcd9155c@mail.gmail.com> <1cd32cbb1002041824s646b6a37g7a0e4c761975401e@mail.gmail.com> <1cd32cbb1002041908i42e37c28sd4e54ee1d2bedadb@mail.gmail.com>
Message-ID:

On Thu, Feb 4, 2010 at 8:08 PM,  wrote:
> On Thu, Feb 4, 2010 at 9:54 PM, Charles R Harris
> wrote:
> >
> >
> > On Thu, Feb 4, 2010 at 7:24 PM,  wrote:
> >>
> >> On Thu, Feb 4, 2010 at 8:27 PM, David Cournapeau
> >> wrote:
> >> > On Fri, Feb 5, 2010 at 9:43 AM, Warren Weckesser
> >> > wrote:
> >> >
> >> >> David C,
> >> >>
> >> >> You said this was for symmetric matrices, but do you envision later
> >> >> allowing nonsymmetric matrices?
> >> >
> >> > Yes. I have only implemented symmetric/Hermitian because that's the
> >> > only solver that handles getting only a few eigenvalues in LAPACK.
> >> > Matlab own eigs function uses ARPACK for both symmetric/unsymmetric
> >> > cases, which is not good according to one of the Lapack developer
> >> > (http://mail.scipy.org/pipermail/scipy-dev/2007-March/006778.html).
> >> > But we could use ARPACK for the general case in eigs.
> >> >
> >> >>
> >> >> If not, then perhaps the name of the function should be 'eigsh',
> >> >> following the precedent set by numpy.linalg and scipy.linalg.
> >> >
> >> > I wonder whether those eigh functions are a good idea: I fear that
> >> > most people will always use eig - maybe one could use the underlying
> >> > eig*h solver in eig* if the matrix is detected as being symmetric ? I
> >> > am not really knowledgeable about those issues, though. For example, I
> >> > don't know whether the symmetric aspect should be checked exactly, or
> >> > if it is better to use a symmetric solver even if there are very
> >> > small errors in say A'-A.
> >> >
> >> >>
> >> >> In particular, the choice of ordering by magnitude or by real part is
> >> >> convenient.
> >> >
> >> > It seems that one would need to implement the non-symmetric
> >> > capabilities to sort this out. I fear that those options are solver
> >> > specific, though - maybe the solution is to have two levels of API,
> >> > one low-level and one high level.
> >> >
> >> > cheers,
> >> >
> >> > David
> >> > _______________________________________________
> >> > SciPy-Dev mailing list
> >> > SciPy-Dev at scipy.org
> >> > http://mail.scipy.org/mailman/listinfo/scipy-dev
> >> >
> >>
> >> the current version of scipy.linalg.eigh sorts from smallest to largest
> >>
> >> >>> scipy.linalg.eigh(x)[0]
> >> array([  4.04457427e-03,   1.84073286e-01,   6.74875960e-01,
> >>          3.23328824e+00,   4.00741304e+00,   6.98333680e+00,
> >>          1.45314842e+01,   1.49260377e+01,   2.47166702e+01,
> >>          2.98955755e+01])
> >> >>> scipy.linalg.eigh(x, eigvals=(0,3))[0]
> >> array([ 0.00404457,  0.18407329,  0.67487596,  3.23328824])
> >> >>> scipy.linalg.eigh(x, eigvals=(len(x)-3, len(x)-1))[0]
> >> array([ 14.92603769,  24.71667021,  29.89557547])
> >>
> >>
> >> >>> x = np.random.randn(10, 10) + 1j*np.random.randn(10, 10)
> >> >>> x = np.dot(x.T, x)
> >> >>> scipy.linalg.eigh(x, eigvals=(0,3))[0]
> >> array([-36.82039662, -18.20362967, -12.21716065,  -9.79752274])
> >> >>> scipy.linalg.eigh(x, eigvals=(len(x)-3, len(x)-1))[0]
> >> array([ 14.44815917,  28.8394026 ,  42.45471058])
> >>
> >
> > Yeah. It's my fault ;) It didn't use to sort at all.
>
> I thought that's how Tiziano added them, I didn't see a sort when I
> briefly browsed the source.
> But I'm looking at lapack only from the outside.
>

I was thinking of the numpy version. But now that I think about it, it was
a test that needed the eigenvalues sorted because the routine didn't sort
them. I seem to be developing one of those trick memories...

Chuck

From charlesr.harris at gmail.com  Fri Feb  5 00:58:23 2010
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Thu, 4 Feb 2010 22:58:23 -0700
Subject: [SciPy-dev] Why does orth use svd instead of QR ?
In-Reply-To: <4B6B9466.4010507@silveregg.co.jp>
References: <4B6B9466.4010507@silveregg.co.jp>
Message-ID:

On Thu, Feb 4, 2010 at 8:45 PM, David Cournapeau wrote:

> Hi,
>
> I wanted to know if there was a rationale for using svd to
> orthonormalize the columns of a matrix (in scipy.linalg).
QR-based > methods are likely to be much faster, and I thought this was the > standard, numerically-stable method to orthonormalize a basis ? If the > reason is to deal with rank-deficient matrices, maybe we could add an > option to choose between them ? > > QR with column rotation would deal with rank-deficient matrices and routines for that are available in LAPACK . The SVD was probably used because it was available. The diagonal elements of the R matrix can somewhat take the place of the singular values when column rotation is used. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.l.goldsmith at gmail.com Fri Feb 5 02:04:26 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Thu, 4 Feb 2010 23:04:26 -0800 Subject: [SciPy-dev] WOT: Experiences running 64 bit Vista on a 32 bit machine Message-ID: <45d1ab481002042304o375b7c69rb79d1af5b2de37d9@mail.gmail.com> Please forgive the widely OT request for input: I'm thinking about trying to run 64 bit Vista on my 32 bit machine (Pentium Dual-Core T4300 @ 2x2.10Ghz); following http://windows.microsoft.com/en-US/windows-vista/32-bit-and-64-bit-Windows-frequently-asked-questions#How-do-I-tell... I've confirmed that I'm "64 bit capable." Has anyone reading this far ;-) had a particularly positive/negative experience doing this? Also, how do I determine the 64 bit linux capability of my hardware? Thanks! DG -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at silveregg.co.jp Fri Feb 5 02:15:50 2010 From: david at silveregg.co.jp (David Cournapeau) Date: Fri, 05 Feb 2010 16:15:50 +0900 Subject: [SciPy-dev] WOT: Experiences running 64 bit Vista on a 32 bit machine In-Reply-To: <45d1ab481002042304o375b7c69rb79d1af5b2de37d9@mail.gmail.com> References: <45d1ab481002042304o375b7c69rb79d1af5b2de37d9@mail.gmail.com> Message-ID: <4B6BC5A6.1000006@silveregg.co.jp> David Goldsmith wrote: > Please forgive the widely OT request for input: I'm thinking about > trying to run 64 bit Vista on my 32 bit machine (Pentium Dual-Core T4300 > @ 2x2.10Ghz); following > http://windows.microsoft.com/en-US/windows-vista/32-bit-and-64-bit-Windows-frequently-asked-questions#How-do-I-tell... > I've confirmed that I'm "64 bit capable." Has anyone reading this far > ;-) had a particularly positive/negative experience doing this? Also, > how do I determine the 64 bit linux capability of my hardware? If windows 64 runs, linux 64 will as well; as far as OS are concerned, 64 bits mode is like a new architecture (like ppc vs intel). When the PC starts, the kernel of whatever OS you are running will refuse to run if it is 64 bits and you run on a 32 bits machine. The details are OS/Boot loader dependent, but most installers will refuse to install a 64 bits installers on a 32bits-only machine anyway. You can also run a 64 bits VM inside a 32 bits, assuming your CPU is 64 bits capable (so you can run 64 bits Ubuntu in vmware on top of 32 bits Ubuntu for example). I strongly suggest using windows 7 instead of Vista if you can. cheers, David From david at silveregg.co.jp Fri Feb 5 02:18:02 2010 From: david at silveregg.co.jp (David Cournapeau) Date: Fri, 05 Feb 2010 16:18:02 +0900 Subject: [SciPy-dev] Why does orth use svd instead of QR ? 
In-Reply-To:
References: <4B6B9466.4010507@silveregg.co.jp>
Message-ID: <4B6BC62A.8090109@silveregg.co.jp>

Charles R Harris wrote:
>
> On Thu, Feb 4, 2010 at 8:45 PM, David Cournapeau
> wrote:
>
>     Hi,
>
>     I wanted to know if there was a rationale for using svd to
>     orthonormalize the columns of a matrix (in scipy.linalg). QR-based
>     methods are likely to be much faster, and I thought this was the
>     standard, numerically-stable method to orthonormalize a basis ? If the
>     reason is to deal with rank-deficient matrices, maybe we could add an
>     option to choose between them ?
>
>
> QR with column rotation would deal with rank-deficient matrices and
> routines for that are available in LAPACK
> . The SVD was probably used
> because it was available. The diagonal elements of the R matrix can
> somewhat take the place of the singular values when column rotation is used.

So would it be ok to use this column-rotated QR in place of svd for
every case in orth ? I would have to check that QR with column rotation
is still significantly faster than svd, but I would be surprised if it
were not the case. QR has also the advantage of being implemented in
PLASMA already contrary to eigen/svd solvers.
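A quick sanity check of the speed difference can be sketched with the
numpy wrappers alone - the shape below is arbitrary, and this times the
plain factorizations, not the column-pivoted variant:

import numpy as np
from timeit import timeit

a = np.random.randn(2000, 100)
print(timeit(lambda: np.linalg.svd(a, full_matrices=False), number=10))
print(timeit(lambda: np.linalg.qr(a), number=10))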
cheers,

David

From charlesr.harris at gmail.com  Fri Feb  5 02:47:09 2010
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 5 Feb 2010 00:47:09 -0700
Subject: [SciPy-dev] Why does orth use svd instead of QR ?
In-Reply-To: <4B6BC62A.8090109@silveregg.co.jp>
References: <4B6B9466.4010507@silveregg.co.jp> <4B6BC62A.8090109@silveregg.co.jp>
Message-ID:

On Fri, Feb 5, 2010 at 12:18 AM, David Cournapeau wrote:

> Charles R Harris wrote:
> >
> > On Thu, Feb 4, 2010 at 8:45 PM, David Cournapeau
> > wrote:
> >
> >     Hi,
> >
> >     I wanted to know if there was a rationale for using svd to
> >     orthonormalize the columns of a matrix (in scipy.linalg). QR-based
> >     methods are likely to be much faster, and I thought this was the
> >     standard, numerically-stable method to orthonormalize a basis ? If the
> >     reason is to deal with rank-deficient matrices, maybe we could add an
> >     option to choose between them ?
> >
> >
> > QR with column rotation would deal with rank-deficient matrices and
> > routines for that are available in LAPACK
> > . The SVD was probably used
> > because it was available. The diagonal elements of the R matrix can
> > somewhat take the place of the singular values when column rotation is
> > used.
>
> So would it be ok to use this column-rotated QR in place of svd for
> every case in orth ? I would have to check that QR with column rotation
> is still significantly faster than svd, but I would be surprised if it
> were not the case. QR has also the advantage of being implemented in
> PLASMA already contrary to eigen/svd solvers.
>

I don't know how the two methods compare in practice. SVD algorithms
generally use iterated QR reductions in their implementation, so QR
reductions can't be worse numerically. But the SVD probably provides a
better metric for rank determination. A google search turns up some
literature on the subject that I can't access from home.

Chuck

From gael.varoquaux at normalesup.org  Fri Feb  5 03:10:12 2010
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Fri, 5 Feb 2010 09:10:12 +0100
Subject: [SciPy-dev] Why does orth use svd instead of QR ?
In-Reply-To: <4B6BC62A.8090109@silveregg.co.jp>
References: <4B6B9466.4010507@silveregg.co.jp> <4B6BC62A.8090109@silveregg.co.jp>
Message-ID: <20100205081012.GA20893@phare.normalesup.org>

On Fri, Feb 05, 2010 at 04:18:02PM +0900, David Cournapeau wrote:
> So would it be ok to use this column-rotated QR in place of svd for
> every case in orth ? I would have to check that QR with column rotation
> is still significantly faster than svd, but I would be surprised if it
> were not the case. QR has also the advantage of being implemented in
> PLASMA already contrary to eigen/svd solvers.

Out of curiosity, what's PLASMA, in this context?

Gaël

From charlesr.harris at gmail.com  Fri Feb  5 03:12:06 2010
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 5 Feb 2010 01:12:06 -0700
Subject: [SciPy-dev] Why does orth use svd instead of QR ?
In-Reply-To:
References: <4B6B9466.4010507@silveregg.co.jp> <4B6BC62A.8090109@silveregg.co.jp>
Message-ID:

On Fri, Feb 5, 2010 at 12:47 AM, Charles R Harris wrote:

>
>
> On Fri, Feb 5, 2010 at 12:18 AM, David Cournapeau wrote:
>
>> Charles R Harris wrote:
>> >
>> >
>> > On Thu, Feb 4, 2010 at 8:45 PM, David Cournapeau
>> > wrote:
>> >
>> >     Hi,
>> >
>> >     I wanted to know if there was a rationale for using svd to
>> >     orthonormalize the columns of a matrix (in scipy.linalg). QR-based
>> >     methods are likely to be much faster, and I thought this was the
>> >     standard, numerically-stable method to orthonormalize a basis ? If
>> the
>> >     reason is to deal with rank-deficient matrices, maybe we could add
>> an
>> >     option to choose between them ?
>> >
>> >
>> > QR with column rotation would deal with rank-deficient matrices and
>> > routines for that are available in LAPACK
>> > . The SVD was probably used
>> > because it was available. The diagonal elements of the R matrix can
>> > somewhat take the place of the singular values when column rotation is
>> used.
>>
>> So would it be ok to use this column-rotated QR in place of svd for
>> every case in orth ? I would have to check that QR with column rotation
>> is still significantly faster than svd, but I would be surprised if it
>> were not the case.
QR has also the advantage of being implemented in PLASMA >>> already contrary to eigen/svd solvers, >>> >>> >> I don't know how the two methods compare in practice. SVD algorithms >> generally use iterated QR reductions in their implementation, so QR >> reductions can't be worse numerically. But the SVD probably provides a >> better metric for rank determination. A google search turns up some >> literature on the subject that I can't access from home. >> >> > OK, here's a good reference. > A quick look seems to indicate that the SVD is the way to go. > > I take that back. The QR algorithms are faster, but the SVD is more robust. In practice the LAPACK QR algorithm with column pivoting works well for most things, but there are even faster versions. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at silveregg.co.jp Fri Feb 5 03:33:49 2010 From: david at silveregg.co.jp (David Cournapeau) Date: Fri, 05 Feb 2010 17:33:49 +0900 Subject: [SciPy-dev] Why does orth use svd instead of QR ? In-Reply-To: <20100205081012.GA20893@phare.normalesup.org> References: <4B6B9466.4010507@silveregg.co.jp> <4B6BC62A.8090109@silveregg.co.jp> <20100205081012.GA20893@phare.normalesup.org> Message-ID: <4B6BD7ED.9000205@silveregg.co.jp> Gael Varoquaux wrote: > On Fri, Feb 05, 2010 at 04:18:02PM +0900, David Cournapeau wrote: >> So would be it ok to use this column-rotated QR in place of svd for >> every case in orth ? I would have to check that QR with column rotation >> is still significantly faster than svd, but I would surprised if if were >> not the case. QR has also the advantage of being implemented in PLASMA >> already contrary to eigen/svd solvers, > > Out of curiosity, what's PLASMA, in this context? http://icl.cs.utk.edu/projectsfiles/plasma/html/README.html """ The main purpose of PLASMA is to address the performance shortcomings of the LAPACK and ScaLAPACK libraries on multicore processors and multi-socket systems of multicore processors. PLASMA provides routines to solve dense general systems of linear equations, symmetric positive definite systems of linear equations and linear least squares problems, using LU, Cholesky, QR and LQ factorizations. Real arithmetic and complex arithmetic are supported in both single precision and double precision. """ It is under BSD license, and as a bonus point, may be compiled easily on windows (MS is one of the sponsor of the project). The main drawback I can see is that it requires a serial BLAS, and ATLAS cannot be switched dynamically between serial and parallel (you have to relink). I am hoping to provide a basic set of wrappers for scipy, cheers, David From charlesr.harris at gmail.com Fri Feb 5 03:37:43 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 5 Feb 2010 01:37:43 -0700 Subject: [SciPy-dev] Why does orth use svd instead of QR ? 
In-Reply-To: <4B6BD6FF.6070201@silveregg.co.jp> References: <4B6B9466.4010507@silveregg.co.jp> <4B6BC62A.8090109@silveregg.co.jp> <4B6BD6FF.6070201@silveregg.co.jp> Message-ID: On Fri, Feb 5, 2010 at 1:29 AM, David Cournapeau wrote: > Charles R Harris wrote: > > > > > > On Fri, Feb 5, 2010 at 12:47 AM, Charles R Harris > > > wrote: > > > > > > > > On Fri, Feb 5, 2010 at 12:18 AM, David Cournapeau > > > wrote: > > > > Charles R Harris wrote: > > > > > > > > > On Thu, Feb 4, 2010 at 8:45 PM, David Cournapeau > > > > > > >> wrote: > > > > > > Hi, > > > > > > I wanted to know if there was a rationale for using svd to > > > orthonormalize the columns of a matrix (in scipy.linalg). > > QR-based > > > methods are likely to be much faster, and I thought this > > was the > > > standard, numerically-stable method to orthonormalize a > > basis ? If the > > > reason is to deal with rank-deficient matrices, maybe we > > could add an > > > option to choose between them ? > > > > > > > > > QR with column rotation would deal with rank-deficient > > matrices and > > > routines for that are available in LAPACK > > > . The SVD was > > probably used > > > because it was available. The diagonal elements of the R > > matrix can > > > somewhat take the place of the singular values when column > > rotation is used. > > > > So would be it ok to use this column-rotated QR in place of svd > for > > every case in orth ? I would have to check that QR with column > > rotation > > is still significantly faster than svd, but I would surprised if > > if were > > not the case. QR has also the advantage of being implemented in > > PLASMA > > already contrary to eigen/svd solvers, > > > > > > I don't know how the two methods compare in practice. SVD algorithms > > generally use iterated QR reductions in their implementation, so QR > > reductions can't be worse numerically. But the SVD probably provides > > a better metric for rank determination. A google search turns up > > some literature on the subject that I can't access from home. > > > > > > OK, here's a good reference > > . A quick > > look seems to indicate that the SVD is the way to go. > > AFAIK, SVD is indeed the way to go, but do we really need this for the > orth function ? I am wrong to think that orthonormalizing a matrix of > linearly independent vectors is the most common usage for orth ? The > difference in terms of speed is really significant (for example, svd of > a 2000x100 matrix takes ~1.9 second vs 0.1 s for QR). > > This looks a bit like sorting: quicksort is almost always fastest, but no quarantee. The other methods are safer but slower. Maybe the way to go is use a keyword to choose between methods. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Fri Feb 5 03:45:42 2010 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Fri, 5 Feb 2010 09:45:42 +0100 Subject: [SciPy-dev] Why does orth use svd instead of QR ? In-Reply-To: <4B6BD7ED.9000205@silveregg.co.jp> References: <4B6B9466.4010507@silveregg.co.jp> <4B6BC62A.8090109@silveregg.co.jp> <20100205081012.GA20893@phare.normalesup.org> <4B6BD7ED.9000205@silveregg.co.jp> Message-ID: <20100205084542.GC20893@phare.normalesup.org> On Fri, Feb 05, 2010 at 05:33:49PM +0900, David Cournapeau wrote: > I am hoping to provide a basic set of wrappers for scipy, Whoho! 
Gaël

From charlesr.harris at gmail.com  Fri Feb  5 03:54:23 2010
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 5 Feb 2010 01:54:23 -0700
Subject: [SciPy-dev] Why does orth use svd instead of QR ?
In-Reply-To: <4B6BD7ED.9000205@silveregg.co.jp>
References: <4B6B9466.4010507@silveregg.co.jp> <4B6BC62A.8090109@silveregg.co.jp>
	<20100205081012.GA20893@phare.normalesup.org>
	<4B6BD7ED.9000205@silveregg.co.jp>
Message-ID:
	

On Fri, Feb 5, 2010 at 1:33 AM, David Cournapeau wrote:

> Gael Varoquaux wrote:
> > On Fri, Feb 05, 2010 at 04:18:02PM +0900, David Cournapeau wrote:
> >> So would be it ok to use this column-rotated QR in place of svd for
> >> every case in orth ? I would have to check that QR with column rotation
> >> is still significantly faster than svd, but I would surprised if if were
> >> not the case. QR has also the advantage of being implemented in PLASMA
> >> already contrary to eigen/svd solvers,
> >
> > Out of curiosity, what's PLASMA, in this context?
>
> http://icl.cs.utk.edu/projectsfiles/plasma/html/README.html
>
> """
> The main purpose of PLASMA is to address the performance shortcomings of
> the LAPACK and ScaLAPACK libraries on multicore processors and
> multi-socket systems of multicore processors. PLASMA provides routines
> to solve dense general systems of linear equations, symmetric positive
> definite systems of linear equations and linear least squares problems,
> using LU, Cholesky, QR and LQ factorizations. Real arithmetic and
> complex arithmetic are supported in both single precision and double
> precision.
> """
>
> It is under BSD license, and as a bonus point, may be compiled easily on
> windows (MS is one of the sponsor of the project). The main drawback I
> can see is that it requires a serial BLAS, and ATLAS cannot be switched
> dynamically between serial and parallel (you have to relink).
>
> I am hoping to provide a basic set of wrappers for scipy,
>
>
Looks like the column pivoting QR algorithm isn't there. I'm not sure what
orth is supposed to be used for, but if there is no danger of rank
deficiency, then the usual QR algorithm should work just fine.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From d.l.goldsmith at gmail.com  Fri Feb  5 04:23:27 2010
From: d.l.goldsmith at gmail.com (David Goldsmith)
Date: Fri, 5 Feb 2010 01:23:27 -0800
Subject: [SciPy-dev] WOT: Experiences running 64 bit Vista on a 32 bit
	machine
In-Reply-To: <4B6BC5A6.1000006@silveregg.co.jp>
References: <45d1ab481002042304o375b7c69rb79d1af5b2de37d9@mail.gmail.com>
	<4B6BC5A6.1000006@silveregg.co.jp>
Message-ID: <45d1ab481002050123p3a3abf5do2198c22d0911c708@mail.gmail.com>

On Thu, Feb 4, 2010 at 11:15 PM, David Cournapeau wrote:

> David Goldsmith wrote:
> > Please forgive the widely OT request for input: I'm thinking about
> > trying to run 64 bit Vista on my 32 bit machine (Pentium Dual-Core T4300
> > @ 2x2.10Ghz); following
> >
> http://windows.microsoft.com/en-US/windows-vista/32-bit-and-64-bit-Windows-frequently-asked-questions#How-do-I-tell.
> ..
> > I've confirmed that I'm "64 bit capable."  Has anyone reading this far
> > ;-) had a particularly positive/negative experience doing this?  Also,
> > how do I determine the 64 bit linux capability of my hardware?
>
> If windows 64 runs, linux 64 will as well; as far as OS are concerned,
> 64 bits mode is like a new architecture (like ppc vs intel).
When the PC > starts, the kernel of whatever OS you are running will refuse to run if > it is 64 bits and you run on a 32 bits machine. The details are OS/Boot > loader dependent, but most installers will refuse to install a 64 bits > installers on a 32bits-only machine anyway. You can also run a 64 bits > VM inside a 32 bits, assuming your CPU is 64 bits capable (so you can > run 64 bits Ubuntu in vmware on top of 32 bits Ubuntu for example). > > I strongly suggest using windows 7 instead of Vista if you can. > OK, thank you very much for the input. Do you have first-hand experience w/ that vmware configuration? My main issue of concern, IIRC the responses to a post I made about a year-and-a-half ago, is a single object being able to address more than 4GB of memory... Thanks again, DG > > cheers, > > David > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at silveregg.co.jp Fri Feb 5 04:45:51 2010 From: david at silveregg.co.jp (David Cournapeau) Date: Fri, 05 Feb 2010 18:45:51 +0900 Subject: [SciPy-dev] Why does orth use svd instead of QR ? In-Reply-To: References: <4B6B9466.4010507@silveregg.co.jp> <4B6BC62A.8090109@silveregg.co.jp> <4B6BD6FF.6070201@silveregg.co.jp> Message-ID: <4B6BE8CF.1020207@silveregg.co.jp> Charles R Harris wrote: > > Maybe the way to > go is use a keyword to choose between methods. 'safe' vs 'fast' ? :) I wonder whether it would be ok to change the default to QR, though. I will look at the pivoted QR thing, cheers, David From josef.pktd at gmail.com Fri Feb 5 06:59:05 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 5 Feb 2010 06:59:05 -0500 Subject: [SciPy-dev] Comments on API for Matlab's eigs equivalent (computing a few eigenvalues only) In-Reply-To: <1cd32cbb1002041824s646b6a37g7a0e4c761975401e@mail.gmail.com> References: <4B6A68ED.6070007@silveregg.co.jp> <4B6A6E06.7060102@silveregg.co.jp> <4B6A8154.2040705@student.matnat.uio.no> <750A37B6-052E-4F91-A63A-97EC5D422AD5@cs.toronto.edu> <4B6B69C2.7000907@enthought.com> <5b8d13221002041727w131d1854sde448b76fcd9155c@mail.gmail.com> <1cd32cbb1002041824s646b6a37g7a0e4c761975401e@mail.gmail.com> Message-ID: <1cd32cbb1002050359g14c4b4d5m5e38c6f92349669b@mail.gmail.com> On Thu, Feb 4, 2010 at 9:24 PM, wrote: > On Thu, Feb 4, 2010 at 8:27 PM, David Cournapeau wrote: >> On Fri, Feb 5, 2010 at 9:43 AM, Warren Weckesser >> wrote: >> >>> David C, >>> >>> You said this was for symmetric matrices, but do you envision later >>> allowing nonsymmetric matrices? >> >> Yes. I have only implemented symmetric/Hermitian because that's the >> only solver that handles getting only a few eigenvalues in LAPACK. >> Matlab own eigs function uses ARPACK for both symmetric/unsymmetric >> cases, which is not good according to one of the Lapack developer >> (http://mail.scipy.org/pipermail/scipy-dev/2007-March/006778.html). >> But we could use ARPACK for the general case in eigs. >> >>> >>> If not, then perhaps the name of the function should be 'eigsh', >>> following the precedent set by numpy.linalg and scipy.linalg. >> >> I wonder whether those eigh functions are a good idea: I fear that >> most people will always use eig - maybe one could use the underlying >> eig*h solver in eig* if the matrix is detected as being symmetric ? I >> am not really knowledgeable about those issues, though. 
For example, I
>> don't know whether the symmetric aspect should be checked exactly, or
>> if it is better to use a symmetric solver even if there are very
>> small errors in say A'-A.
>>
>>>
>>> In particular, the choice of ordering by magnitude or by real part is
>>> convenient.
>>
>> It seems that one would need to implement the non-symmetric
>> capabilities to sort this out. I fear that those options are solver
>> specific, though - maybe the solution is to have two levels of API,
>> one low-level and one high level.
>>
>> cheers,
>>
>> David
>> _______________________________________________
>> SciPy-Dev mailing list
>> SciPy-Dev at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-dev
>>
>
> the current version of scipy.linalg.eigh sorts from smallest to largest
>
>>>> scipy.linalg.eigh(x)[0]
> array([  4.04457427e-03,   1.84073286e-01,   6.74875960e-01,
>          3.23328824e+00,   4.00741304e+00,   6.98333680e+00,
>          1.45314842e+01,   1.49260377e+01,   2.47166702e+01,
>          2.98955755e+01])
>>>> scipy.linalg.eigh(x, eigvals=(0,3))[0]
> array([ 0.00404457,  0.18407329,  0.67487596,  3.23328824])
>>>> scipy.linalg.eigh(x, eigvals=(len(x)-3, len(x)-1))[0]
> array([ 14.92603769,  24.71667021,  29.89557547])
>
>
>>>> x = np.random.randn(10, 10) + 1j*np.random.randn(10, 10)
>>>> x = np.dot(x.T, x)
>>>> scipy.linalg.eigh(x, eigvals=(0,3))[0]
> array([-36.82039662, -18.20362967, -12.21716065,  -9.79752274])
>>>> scipy.linalg.eigh(x, eigvals=(len(x)-3, len(x)-1))[0]
> array([ 14.44815917,  28.8394026 ,  42.45471058])

How does the new implementation relate to the existing implementation
of selecting just a few eigenvalues in a range that is possible with
the current scipy.linalg.eigh ?

Tiziano added this with the integration of symeig into scipy.linalg.

Josef

> Josef
>

From cournape at gmail.com  Fri Feb  5 07:59:23 2010
From: cournape at gmail.com (David Cournapeau)
Date: Fri, 5 Feb 2010 21:59:23 +0900
Subject: [SciPy-dev] Comments on API for Matlab's eigs equivalent
	(computing a few eigenvalues only)
In-Reply-To: <1cd32cbb1002050359g14c4b4d5m5e38c6f92349669b@mail.gmail.com>
References: <4B6A68ED.6070007@silveregg.co.jp>
	<4B6A6E06.7060102@silveregg.co.jp>
	<4B6A8154.2040705@student.matnat.uio.no>
	<750A37B6-052E-4F91-A63A-97EC5D422AD5@cs.toronto.edu>
	<4B6B69C2.7000907@enthought.com>
	<5b8d13221002041727w131d1854sde448b76fcd9155c@mail.gmail.com>
	<1cd32cbb1002041824s646b6a37g7a0e4c761975401e@mail.gmail.com>
	<1cd32cbb1002050359g14c4b4d5m5e38c6f92349669b@mail.gmail.com>
Message-ID: <5b8d13221002050459l7b582b7eo503ded93fee80adf@mail.gmail.com>

On Fri, Feb 5, 2010 at 8:59 PM, wrote:

>
> How does the new implementation relate to the existing implementation
> of selecting just a few eigenvalues in a range that is possible with
> the current scipy.linalg.eigh ?

Mostly a different interface (the underlying lapack function is the
same). What bothers me with the current API for eigen/svd
decompositions is the lack of consistency. The current eigh also does
not enable to look for eigenvalues in a value range (e.g. all eigen
values between 2 and 3), and I intend to add support for non-symmetric
eigenvalues as well.

cheers,

David

From charlesr.harris at gmail.com  Fri Feb  5 11:53:25 2010
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 5 Feb 2010 09:53:25 -0700
Subject: [SciPy-dev] Why does orth use svd instead of QR ?
In-Reply-To: <4B6BE8CF.1020207@silveregg.co.jp>
References: <4B6B9466.4010507@silveregg.co.jp> <4B6BC62A.8090109@silveregg.co.jp>
	<4B6BD6FF.6070201@silveregg.co.jp> <4B6BE8CF.1020207@silveregg.co.jp>
Message-ID:
	

On Fri, Feb 5, 2010 at 2:45 AM, David Cournapeau wrote:

> Charles R Harris wrote:
> >
> > Maybe the way to
> > go is use a keyword to choose between methods.
>
> 'safe' vs 'fast' ? :)
>
> I wonder whether it would be ok to change the default to QR, though. I
> will look at the pivoted QR thing,
>
>
That sounds OK. The documentation should probably mention that the
orthogonal columns produced by the two methods will be different, otherwise
we will probably get some bug reports.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From charlesr.harris at gmail.com  Fri Feb  5 13:15:51 2010
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 5 Feb 2010 11:15:51 -0700
Subject: [SciPy-dev] Why does orth use svd instead of QR ?
In-Reply-To: 
References: <4B6B9466.4010507@silveregg.co.jp> <4B6BC62A.8090109@silveregg.co.jp>
	<4B6BD6FF.6070201@silveregg.co.jp> <4B6BE8CF.1020207@silveregg.co.jp>
	
Message-ID:
	

On Fri, Feb 5, 2010 at 9:53 AM, Charles R Harris wrote:

>
>
> On Fri, Feb 5, 2010 at 2:45 AM, David Cournapeau wrote:
>
>> Charles R Harris wrote:
>> >
>> > Maybe the way to
>> > go is use a keyword to choose between methods.
>>
>> 'safe' vs 'fast' ? :)
>>
>> I wonder whether it would be ok to change the default to QR, though. I
>> will look at the pivoted QR thing,
>>
>>
> That sounds OK. The documentation should probably mention that the
> orthogonal columns produced by the two methods will be different, otherwise
> we will probably get some bug reports.
>
>
I don't know if you read the paper, but the fastest and safest QR
factorizations seem to be the DGEQPX and DGEQPY algorithms in ACM Algorithm
782. It looks like one might need to get permission from the ACM to use the
algorithms in a BSD licensed library.

Submittal of an algorithm for publication in one of the ACM Transactions
implies that unrestricted use of the algorithm within a computer is
permissible. General permission to copy and distribute the algorithm without
fee is granted provided that the copies are not made or distributed for
direct commercial advantage. The ACM copyright notice and the title of the
publication and its date appear, and notice is given that copying is by
permission of the Association for Computing Machinery. To copy otherwise,
or to republish, requires a fee and/or specific permission.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From smattacus at gmail.com  Fri Feb  5 14:51:24 2010
From: smattacus at gmail.com (Sean Mattingly)
Date: Fri, 5 Feb 2010 11:51:24 -0800
Subject: [SciPy-dev] WOT: Experiences running 64 bit Vista on a 32 bit
	machine
In-Reply-To: <45d1ab481002050123p3a3abf5do2198c22d0911c708@mail.gmail.com>
References: <45d1ab481002042304o375b7c69rb79d1af5b2de37d9@mail.gmail.com>
	<4B6BC5A6.1000006@silveregg.co.jp>
	<45d1ab481002050123p3a3abf5do2198c22d0911c708@mail.gmail.com>
Message-ID: <856175f81002051151s1d610d8cia9116f41a1431a2d@mail.gmail.com>

Hi David,

I've installed Linux x64, Vista x64, and Windows 7 x64 on my desktop
computer which has this kind of processor:

http://www.newegg.com/Product/Product.aspx?Item=N82E16819115029

Looking around a bit on the web, I can find instances of computers with the
same processor you have with 64 bit OS's already installed (one quick way
to double-check the CPU from Python itself is sketched just below).
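A minimal, Linux-only sketch of such a check — it relies only on
/proc/cpuinfo, and the "lm" (long mode) flag it looks for is the x86 marker
for a 64-bit-capable CPU, so take it as an illustration rather than a
portable test:

# Rough check for a 64-bit-capable x86 CPU (Linux only, illustrative).
# The "lm" (long mode) flag means the CPU can run in 64-bit mode even
# if the currently installed OS and Python are 32-bit builds.
def cpu_is_64bit_capable():
    try:
        with open('/proc/cpuinfo') as f:
            for line in f:
                if line.startswith('flags'):
                    return 'lm' in line.split(':', 1)[1].split()
    except IOError:
        pass  # not Linux, or /proc unavailable; this probe can't tell
    return False

print(cpu_is_64bit_capable())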
If I had a computer with that processor, I wouldn't think twice about putting a 64 bit OS on it. I doubt that, with the 64 bit VM on a 32 bit host OS on a 64 bit CPU, you'd be able to address more than the ~ 2.8 gigs of memory or so which is the limit for 32 bit machines running windows (after hardware and whatever takes its chunk of the 4g visible). Indeed, finding some threads on this: http://superuser.com/questions/15434/how-does-vmware-guest-os-memory-usage-work There's some links to forum discussion on that page where people have tried that, but the gist of it is that the VM can be told to use more than the amount of ram physically provided, but it's paging memory, and not physically accessing the unused memory. What you're asking about, if it can use more than the 4g limit, would require the VM making its own OS - like interface with the hardware...which means it's not a VM anymore! Also, I want to agree with an earlier message that you go with Windows 7. I had a dual - boot configuration of Vista and Linux (Openbox on Ubuntu), and Vista was just giving me all sorts of problems. I replaced the Vista installation with Windows 7 about 2 weeks ago, so these weren't "early rollout" issues. 7 is a much smoother OS; under the hood, it's running a lot of the good things from Vista anyways, so there's no reason to go with Vista. Really, IMO, going with Vista would be like taking windows ME over XP SP2, at this point. And the driver support is much better, as some manufacturers and software developers have just stopped caring about Vista. Finally, if you decide to install Vista or Windows 7 and you have more than one hard drive partition, feel free to ping me with questions, because Microsoft made some...interesting...design decisions in the install process there. Hope some of this helps... - Sean On Fri, Feb 5, 2010 at 1:23 AM, David Goldsmith wrote: > On Thu, Feb 4, 2010 at 11:15 PM, David Cournapeau wrote: > >> David Goldsmith wrote: >> > Please forgive the widely OT request for input: I'm thinking about >> > trying to run 64 bit Vista on my 32 bit machine (Pentium Dual-Core T4300 >> > @ 2x2.10Ghz); following >> > >> http://windows.microsoft.com/en-US/windows-vista/32-bit-and-64-bit-Windows-frequently-asked-questions#How-do-I-tell. >> .. >> > I've confirmed that I'm "64 bit capable." Has anyone reading this far >> > ;-) had a particularly positive/negative experience doing this? Also, >> > how do I determine the 64 bit linux capability of my hardware? >> >> If windows 64 runs, linux 64 will as well; as far as OS are concerned, >> 64 bits mode is like a new architecture (like ppc vs intel). When the PC >> starts, the kernel of whatever OS you are running will refuse to run if >> it is 64 bits and you run on a 32 bits machine. The details are OS/Boot >> loader dependent, but most installers will refuse to install a 64 bits >> installers on a 32bits-only machine anyway. You can also run a 64 bits >> VM inside a 32 bits, assuming your CPU is 64 bits capable (so you can >> run 64 bits Ubuntu in vmware on top of 32 bits Ubuntu for example). >> >> I strongly suggest using windows 7 instead of Vista if you can. >> > > OK, thank you very much for the input. Do you have first-hand experience > w/ that vmware configuration? My main issue of concern, IIRC the responses > to a post I made about a year-and-a-half ago, is a single object being able > to address more than 4GB of memory... 
Thanks again, > > DG > > >> >> cheers, >> >> David >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.l.goldsmith at gmail.com Fri Feb 5 15:07:25 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Fri, 5 Feb 2010 12:07:25 -0800 Subject: [SciPy-dev] WOT: Experiences running 64 bit Vista on a 32 bit machine In-Reply-To: <856175f81002051151s1d610d8cia9116f41a1431a2d@mail.gmail.com> References: <45d1ab481002042304o375b7c69rb79d1af5b2de37d9@mail.gmail.com> <4B6BC5A6.1000006@silveregg.co.jp> <45d1ab481002050123p3a3abf5do2198c22d0911c708@mail.gmail.com> <856175f81002051151s1d610d8cia9116f41a1431a2d@mail.gmail.com> Message-ID: <45d1ab481002051207s7afce8f8paf6cf0681a613457@mail.gmail.com> On Fri, Feb 5, 2010 at 11:51 AM, Sean Mattingly wrote: > Hi David, > > I've installed Linux x64, Vista x64, and Windows 7 x64 on my desktop > computer which has this kind of processor: > > http://www.newegg.com/Product/Product.aspx?Item=N82E16819115029 > > Looking around a bit on the web, I can find instances of computers with the > same processor you have with 64 bit OS's already installed. If I had a > computer with that processor, I wouldn't think twice about putting a 64 bit > OS on it. > > I doubt that, with the 64 bit VM on a 32 bit host OS on a 64 bit CPU, you'd > be able to address more than the ~ 2.8 gigs of memory or so which is the > limit for 32 bit machines running windows (after hardware and whatever takes > its chunk of the 4g visible). Indeed, finding some threads on this: > > > http://superuser.com/questions/15434/how-does-vmware-guest-os-memory-usage-work > > There's some links to forum discussion on that page where people have tried > that, but the gist of it is that the VM can be told to use more than the > amount of ram physically provided, but it's paging memory, and not > physically accessing the unused memory. What you're asking about, if it can > use more than the 4g limit, would require the VM making its own OS - like > interface with the hardware...which means it's not a VM anymore! > > Also, I want to agree with an earlier message that you go with Windows 7. I > had a dual - boot configuration of Vista and Linux (Openbox on Ubuntu), and > Vista was just giving me all sorts of problems. I replaced the Vista > installation with Windows 7 about 2 weeks ago, so these weren't "early > rollout" issues. 7 is a much smoother OS; under the hood, it's running a lot > of the good things from Vista anyways, so there's no reason to go with > Vista. Really, IMO, going with Vista would be like taking windows ME over XP > SP2, at this point. And the driver support is much better, as some > manufacturers and software developers have just stopped caring about Vista. > > Finally, if you decide to install Vista or Windows 7 and you have more than > one hard drive partition, feel free to ping me with questions, because > Microsoft made some...interesting...design decisions in the install process > there. > > Hope some of this helps... > - Sean It helps a lot, thanks, but just to be clear, mine is dual core 32 bit hardware, yes? So the info about the 64 bit VM on a 32 bit host OS on a 64 bit CPU isn't exactly applicable, is it? 
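For what it's worth, the one thing Python itself can settle here is whether
the interpreter you are running is a 32- or 64-bit build — which is what
decides if a single object can ever grow past 4GB. A small standard-library
sketch (nothing scipy-specific; both calls below are stock Python):

import struct
import sys

# Pointer width of the running interpreter: 32 on a 32-bit build,
# 64 on a 64-bit build. Only a 64-bit build can hold a single object
# larger than 4 GB, no matter how much RAM the machine has.
print(8 * struct.calcsize("P"))

# sys.maxsize (Python 2.6+) tells the same story: it is 2**31 - 1 on
# 32-bit builds and 2**63 - 1 on 64-bit builds.
print(sys.maxsize > 2**32)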
Right now, I'm leaning toward keeping my Windows install as is and trying a 64-bit linux install in a dual boot config., as the only thing presently I need the extra memory addressing for is Python-related software. :-) Thanks again. DG PS: now that I've netted a couple "volunteer instructors," if the linux majority would prefer it if I took this exchange off list... -------------- next part -------------- An HTML attachment was scrubbed... URL: From luszczek at eecs.utk.edu Fri Feb 5 23:05:12 2010 From: luszczek at eecs.utk.edu (Piotr Luszczek) Date: Sat, 6 Feb 2010 04:05:12 +0000 (UTC) Subject: [SciPy-dev] Why does orth use svd instead of QR ? References: <4B6B9466.4010507@silveregg.co.jp> <4B6BC62A.8090109@silveregg.co.jp> <20100205081012.GA20893@phare.normalesup.org> <4B6BD7ED.9000205@silveregg.co.jp> Message-ID: David Cournapeau silveregg.co.jp> writes: > > Gael Varoquaux wrote: > > On Fri, Feb 05, 2010 at 04:18:02PM +0900, David Cournapeau wrote: > >> So would be it ok to use this column-rotated QR in place of svd for > >> every case in orth ? I would have to check that QR with column rotation > >> is still significantly faster than svd, but I would surprised if if were > >> not the case. QR has also the advantage of being implemented in PLASMA > >> already contrary to eigen/svd solvers, > > > > Out of curiosity, what's PLASMA, in this context? > http://icl.cs.utk.edu/projectsfiles/plasma/html/README.html > It is under BSD license, and as a bonus point, may be compiled easily on > windows (MS is one of the sponsor of the project). The main drawback I > can see is that it requires a serial BLAS, and ATLAS cannot be switched > dynamically between serial and parallel (you have to relink). > > I am hoping to provide a basic set of wrappers for scipy, We (PLASMA team) are looking into this problem. In principle it should be possible to just use parallel ATLAS and call internal functions of ATLAS to get at the serial interface. We might add this feature to PLASMA some time soon. Piotr From luszczek at eecs.utk.edu Fri Feb 5 23:21:35 2010 From: luszczek at eecs.utk.edu (Piotr Luszczek) Date: Sat, 6 Feb 2010 04:21:35 +0000 (UTC) Subject: [SciPy-dev] Why does orth use svd instead of QR ? References: <4B6B9466.4010507@silveregg.co.jp> <4B6BC62A.8090109@silveregg.co.jp> <20100205081012.GA20893@phare.normalesup.org> <4B6BD7ED.9000205@silveregg.co.jp> Message-ID: Charles R Harris gmail.com> writes: > > > On Fri, Feb 5, 2010 at 1:33 AM, David Cournapeau silveregg.co.jp> wrote: > Gael Varoquaux wrote: > > On Fri, Feb 05, 2010 at 04:18:02PM +0900, David Cournapeau wrote: > >> So would be it ok to use this column-rotated QR in place of svd for > >> every case in orth ? I would have to check that QR with column rotation > >> is still significantly faster than svd, but I would surprised if if were > >> not the case. QR has also the advantage of being implemented in PLASMA > >> already contrary to eigen/svd solvers, > > > > Out of curiosity, what's PLASMA, in this context? > http://icl.cs.utk.edu/projectsfiles/plasma/html/README.html > It is under BSD license, and as a bonus point, may be compiled easily on > windows (MS is one of the sponsor of the project). The main drawback I > can see is that it requires a serial BLAS, and ATLAS cannot be switched > dynamically between serial and parallel (you have to relink). > I am hoping to provide a basic set of wrappers for scipy, > > > Looks like the column pivoting QR algorithm isn't there. 
I'm not sure what orth is supposed to be used for, but if there is no danger of
rank deficiency, then the usual QR algorithm should work just fine. Chuck

Indeed, there is no rank-revealing QR in PLASMA (I'm assuming this
is what you're after). And there are no immediate plans for it.

Piotr

PS. I'm part of the PLASMA team.

From opossumnano at gmail.com  Sun Feb  7 03:14:29 2010
From: opossumnano at gmail.com (Tiziano Zito)
Date: Sun, 7 Feb 2010 09:14:29 +0100
Subject: [SciPy-dev] Comments on API for Matlab's eigs equivalent
	(computing a few eigenvalues only)
In-Reply-To: <5b8d13221002050459l7b582b7eo503ded93fee80adf@mail.gmail.com>
References: <4B6A6E06.7060102@silveregg.co.jp>
	<4B6A8154.2040705@student.matnat.uio.no>
	<750A37B6-052E-4F91-A63A-97EC5D422AD5@cs.toronto.edu>
	<4B6B69C2.7000907@enthought.com>
	<5b8d13221002041727w131d1854sde448b76fcd9155c@mail.gmail.com>
	<1cd32cbb1002041824s646b6a37g7a0e4c761975401e@mail.gmail.com>
	<1cd32cbb1002050359g14c4b4d5m5e38c6f92349669b@mail.gmail.com>
	<5b8d13221002050459l7b582b7eo503ded93fee80adf@mail.gmail.com>
Message-ID: <20100207081429.GB3065@localhost>

> > How does the new implementation relate to the existing implementation
> > of selecting just a few eigenvalues in a range that is possible with
> > the current scipy.linalg.eigh ?
>
> Mostly a different interface (the underlying lapack function is the
> same). What bothers me with the current API for eigen/svd
> decompositions is the lack of consistency. The current eigh also does
> not enable to look for eigenvalues in a value range (e.g. all eigen
> values between 2 and 3), and I intend to add support for non-symmetric
> eigenvalues as well.

Is the new eigs function going to supersede eigh? I did not know
that LAPACK allowed selecting a range of eigenvalues by value. Which
are the LAPACK routines you are referring to?

I agree that a consistent interface for eigen/svd decomposition is
good, I am on the other hand a little unsure that linalg needs an
eigs function together with an eigh function. Why not simply change the
eigh interface in a backward compatible way?

ciao,
tiziano

From tmp50 at ukr.net  Sun Feb  7 04:37:12 2010
From: tmp50 at ukr.net (Dmitrey)
Date: Sun, 07 Feb 2010 11:37:12 +0200
Subject: [SciPy-dev] Don't you mind me to put a minor fix into some
	scipy.optimize docstrings?
Message-ID:

Hello,
I've been informed scipy.optimize docstrings still point to scikits.openopt
(thus google search yields deprecated webpages), so if you don't mind I'll
point it to mere openopt.
Regards, D.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From cournape at gmail.com  Sun Feb  7 07:05:31 2010
From: cournape at gmail.com (David Cournapeau)
Date: Sun, 7 Feb 2010 21:05:31 +0900
Subject: [SciPy-dev] Comments on API for Matlab's eigs equivalent
	(computing a few eigenvalues only)
In-Reply-To: <20100207081429.GB3065@localhost>
References: <4B6A6E06.7060102@silveregg.co.jp>
	<750A37B6-052E-4F91-A63A-97EC5D422AD5@cs.toronto.edu>
	<4B6B69C2.7000907@enthought.com>
	<5b8d13221002041727w131d1854sde448b76fcd9155c@mail.gmail.com>
	<1cd32cbb1002041824s646b6a37g7a0e4c761975401e@mail.gmail.com>
	<1cd32cbb1002050359g14c4b4d5m5e38c6f92349669b@mail.gmail.com>
	<5b8d13221002050459l7b582b7eo503ded93fee80adf@mail.gmail.com>
	<20100207081429.GB3065@localhost>
Message-ID: <5b8d13221002070405j116fd747n8ff52f84e677fd30@mail.gmail.com>

On Sun, Feb 7, 2010 at 5:14 PM, Tiziano Zito wrote:
>> > How does the new implementation relate to the existing implementation
>> > of selecting just a few eigenvalues in a range that is possible with
>> > the current scipy.linalg.eigh ?
>>
>> Mostly a different interface (the underlying lapack function is the
>> same). What bothers me with the current API for eigen/svd
>> decompositions is the lack of consistency. The current eigh also does
>> not enable to look for eigenvalues in a value range (e.g. all eigen
>> values between 2 and 3), and I intend to add support for non-symmetric
>> eigenvalues as well.
>
> Is the new eigs function going to supersede eigh? I did not know
> that LAPACK allowed selecting a range of eigenvalues by value. Which
> are the LAPACK routines you are referring to?

The ones used in eigh, e.g. dsyevr. The range mode may be an addition
in lapack 3.0, I am not sure:

"""
*  DSYEVR computes selected eigenvalues and, optionally, eigenvectors
*  of a real symmetric matrix A.  Eigenvalues and eigenvectors can be
*  selected by specifying either a range of values or a range of
*  indices for the desired eigenvalues.
"""

> I agree that a consistent interface for eigen/svd decomposition is
> good, I am on the other hand a little unsure that linalg needs an
> eigs function together with an eigh function. Why not simply change the
> eigh interface in a backward compatible way?

Because I intend to support non-symmetric matrices (using arpack for
the unsymmetric case).

David

From warren.weckesser at enthought.com  Sun Feb  7 20:57:14 2010
From: warren.weckesser at enthought.com (Warren Weckesser)
Date: Sun, 07 Feb 2010 19:57:14 -0600
Subject: [SciPy-dev] Ticket #1105 -- patch submitted for
	scipy.signal.waveforms.chirp()
Message-ID: <4B6F6F7A.70506@enthought.com>

I just added a patch to ticket #1105; a summary of the changes is given
in the ticket. If any scipy.signal users (especially chirp() users)
have a chance, please take a look and let me know what you think. The
docstrings could use polishing, but I'd like to get some feedback before
doing more work on it.

Warren

From cournape at gmail.com  Mon Feb  8 20:45:00 2010
From: cournape at gmail.com (David Cournapeau)
Date: Tue, 9 Feb 2010 10:45:00 +0900
Subject: [SciPy-dev] Why does orth use svd instead of QR ?
In-Reply-To: 
References: <4B6B9466.4010507@silveregg.co.jp> <4B6BC62A.8090109@silveregg.co.jp>
	<20100205081012.GA20893@phare.normalesup.org>
	<4B6BD7ED.9000205@silveregg.co.jp> 
Message-ID: <5b8d13221002081745t7eee8405g72b3d1bc2da10740@mail.gmail.com>

On Sat, Feb 6, 2010 at 1:05 PM, Piotr Luszczek wrote:
>
> We (PLASMA team) are looking into this problem.
In principle > it should be possible to just use parallel ATLAS and call > internal functions of ATLAS to get at the serial interface. > We might add this feature to PLASMA some time soon. Great, thank you for the information. cheers, David From stefan at sun.ac.za Tue Feb 9 04:31:12 2010 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 9 Feb 2010 11:31:12 +0200 Subject: [SciPy-dev] Changes to trunk/scipy/optimize Message-ID: <9457e7c81002090131v148a86d1w1eef996c9f3c19a2@mail.gmail.com> Hi all, I noticed the following changeset coming through. Do you think docstrings are the right place for advertising external packages? If not, should they be moved to the module docstring, or removed entirely? Regards St?fan ---------- Forwarded message ---------- Author: dmitrey.kroshko Date: 2010-02-09 02:41:34 -0600 (Tue, 09 Feb 2010) New Revision: 6221 Modified: trunk/scipy/optimize/anneal.py trunk/scipy/optimize/cobyla.py trunk/scipy/optimize/lbfgsb.py trunk/scipy/optimize/minpack.py trunk/scipy/optimize/nnls.py trunk/scipy/optimize/optimize.py trunk/scipy/optimize/slsqp.py trunk/scipy/optimize/tnc.py Log: scikits.openopt replaced by mere openopt Modified: trunk/scipy/optimize/anneal.py =================================================================== --- trunk/scipy/optimize/anneal.py 2010-02-08 15:18:27 UTC (rev 6220) +++ trunk/scipy/optimize/anneal.py 2010-02-09 08:41:34 UTC (rev 6221) @@ -217,6 +217,8 @@ fixed_point -- scalar fixed-point finder + OpenOpt -- Python package with more optimization solvers + """ x0 = asarray(x0) lower = asarray(lower) Modified: trunk/scipy/optimize/cobyla.py =================================================================== --- trunk/scipy/optimize/cobyla.py 2010-02-08 15:18:27 UTC (rev 6220) +++ trunk/scipy/optimize/cobyla.py 2010-02-09 08:41:34 UTC (rev 6221) @@ -46,8 +46,6 @@ See also: - scikits.openopt, which offers a unified syntax to call this and other solvers - fmin, fmin_powell, fmin_cg, fmin_bfgs, fmin_ncg -- multivariate local optimizers leastsq -- nonlinear least squares minimizer @@ -65,6 +63,9 @@ fixed_point -- scalar fixed-point finder + OpenOpt -- a tool which offers a unified syntax to call this and + other solvers with possibility of automatic differentiation + """ err = "cons must be a sequence of callable functions or a single"\ " callable function." Modified: trunk/scipy/optimize/lbfgsb.py =================================================================== --- trunk/scipy/optimize/lbfgsb.py 2010-02-08 15:18:27 UTC (rev 6220) +++ trunk/scipy/optimize/lbfgsb.py 2010-02-09 08:41:34 UTC (rev 6221) @@ -119,8 +119,6 @@ ACM Transactions on Mathematical Software, Vol 23, Num. 4, pp. 550 - 560. 
See also: - scikits.openopt, which offers a unified syntax to call this and other solvers - fmin, fmin_powell, fmin_cg, fmin_bfgs, fmin_ncg -- multivariate local optimizers leastsq -- nonlinear least squares minimizer @@ -138,6 +136,9 @@ fixed_point -- scalar fixed-point finder + OpenOpt -- a tool which offers a unified syntax to call this and + other solvers with possibility of automatic differentiation + """ n = len(x0) Modified: trunk/scipy/optimize/minpack.py =================================================================== --- trunk/scipy/optimize/minpack.py 2010-02-08 15:18:27 UTC (rev 6220) +++ trunk/scipy/optimize/minpack.py 2010-02-09 08:41:34 UTC (rev 6221) @@ -102,8 +102,6 @@ See Also -------- - scikits.openopt : offers a unified syntax to call this and other solvers - fmin, fmin_powell, fmin_cg, fmin_bfgs, fmin_ncg : multivariate local optimizers leastsq : nonlinear least squares minimizer @@ -118,6 +116,9 @@ fixed_point : scalar and vector fixed-point finder + OpenOpt : a tool which offers a unified syntax to call this and + other solvers with possibility of automatic differentiation + """ if not warning : msg = "The warning keyword is deprecated. Use the warnings module." @@ -263,7 +264,6 @@ See Also -------- - scikits.openopt: offers a unified syntax to call this and other solvers fmin, fmin_powell, fmin_cg, fmin_bfgs, fmin_ncg: multivariate local optimizers fmin_l_bfgs_b, fmin_tnc, fmin_cobyla: constrained multivariate optimizers anneal, brute: global optimizers @@ -272,6 +272,9 @@ brentq, brenth, ridder, bisect, newton: one-dimensional root-finding fixed_point: scalar and vector fixed-point finder curve_fit: find parameters for a curve-fitting problem. + OpenOpt : a tool which offers a unified syntax to call this and + other solvers with possibility of automatic differentiation + """ if not warning : msg = "The warning keyword is deprecated. Use the warnings module." Modified: trunk/scipy/optimize/nnls.py =================================================================== --- trunk/scipy/optimize/nnls.py 2010-02-08 15:18:27 UTC (rev 6220) +++ trunk/scipy/optimize/nnls.py 2010-02-09 08:41:34 UTC (rev 6221) @@ -16,6 +16,8 @@ wrapper around NNLS.F code below nnls/ directory + Check OpenOpt for more LLSP solvers + """ A,b = map(asarray_chkfinite, (A,b)) Modified: trunk/scipy/optimize/optimize.py =================================================================== --- trunk/scipy/optimize/optimize.py 2010-02-08 15:18:27 UTC (rev 6220) +++ trunk/scipy/optimize/optimize.py 2010-02-09 08:41:34 UTC (rev 6221) @@ -156,7 +156,8 @@ Uses a Nelder-Mead simplex algorithm to find the minimum of function of one or more variables. - + Check OpenOpt - a tool which offers a unified syntax to call + this and other solvers with possibility of automatic differentiation. """ fcalls, func = wrap_function(func, args) x0 = asfarray(x0).flatten() @@ -694,8 +695,8 @@ *See Also*: - scikits.openopt : SciKit which offers a unified syntax to call - this and other solvers. + OpenOpt : a tool which offers a unified syntax to call + this and other solvers with possibility of automatic differentiation. """ x0 = asarray(x0).squeeze() @@ -862,7 +863,8 @@ using the nonlinear conjugate gradient algorithm of Polak and Ribiere See Wright, and Nocedal 'Numerical Optimization', 1999, pg. 120-122. - + Check OpenOpt - a tool which offers a unified syntax to call + this and other solvers with possibility of automatic differentiation. 
""" x0 = asarray(x0).flatten() if maxiter is None: @@ -1018,8 +1020,7 @@ If True, return a list of results at each iteration. :Notes: - 1. scikits.openopt offers a unified syntax to call this and other solvers. - 2. Only one of `fhess_p` or `fhess` need to be given. If `fhess` + 1. Only one of `fhess_p` or `fhess` need to be given. If `fhess` is provided, then `fhess_p` will be ignored. If neither `fhess` nor `fhess_p` is provided, then the hessian product will be approximated using finite differences on `fprime`. `fhess_p` @@ -1027,6 +1028,8 @@ given, finite-differences on `fprime` are used to compute it. See Wright, and Nocedal 'Numerical Optimization', 1999, pg. 140. + 2. Check OpenOpt - a tool which offers a unified syntax to call + this and other solvers with possibility of automatic differentiation. """ x0 = asarray(x0).flatten() @@ -1179,8 +1182,9 @@ Finds a local minimizer of the scalar function `func` in the interval x1 < xopt < x2 using Brent's method. (See `brent` for auto-bracketing). + Check OpenOpt - a tool which offers a unified syntax to call + this and other solvers with possibility of automatic differentiation. - """ # Test bounds are of correct form @@ -1722,7 +1726,8 @@ Uses a modification of Powell's method to find the minimum of a function of N variables. - + Check OpenOpt - a tool which offers a unified syntax to call + this and other solvers with possibility of automatic differentiation. """ # we need to use a mutable object here that we can update in the # wrapper function Modified: trunk/scipy/optimize/slsqp.py =================================================================== --- trunk/scipy/optimize/slsqp.py 2010-02-08 15:18:27 UTC (rev 6220) +++ trunk/scipy/optimize/slsqp.py 2010-02-09 08:41:34 UTC (rev 6221) @@ -146,6 +146,11 @@ for examples see :ref:`in the tutorial ` + See also + -------- + OpenOpt - a tool which offers a unified syntax to call this + and other solvers with possibility of automatic differentiation. + """ exit_modes = { -1 : "Gradient evaluation required (g & a)", Modified: trunk/scipy/optimize/tnc.py =================================================================== --- trunk/scipy/optimize/tnc.py 2010-02-08 15:18:27 UTC (rev 6220) +++ trunk/scipy/optimize/tnc.py 2010-02-09 08:41:34 UTC (rev 6221) @@ -164,8 +164,6 @@ Return code as defined in the RCSTRINGS dict. :SeeAlso: - - scikits.openopt, which offers a unified syntax to call this and other solvers - - fmin, fmin_powell, fmin_cg, fmin_bfgs, fmin_ncg : multivariate local optimizers @@ -184,6 +182,9 @@ - fixed_point : scalar fixed-point finder + - OpenOpt : a tool which offers a unified syntax to call this and + other solvers with possibility of automatic differentiation. + """ x0 = asarray(x0, dtype=float).tolist() n = len(x0) _______________________________________________ Scipy-svn mailing list Scipy-svn at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-svn -------------- next part -------------- An HTML attachment was scrubbed... URL: From millman at berkeley.edu Tue Feb 9 05:14:45 2010 From: millman at berkeley.edu (Jarrod Millman) Date: Tue, 9 Feb 2010 02:14:45 -0800 Subject: [SciPy-dev] Changes to trunk/scipy/optimize In-Reply-To: <9457e7c81002090131v148a86d1w1eef996c9f3c19a2@mail.gmail.com> References: <9457e7c81002090131v148a86d1w1eef996c9f3c19a2@mail.gmail.com> Message-ID: 2010/2/9 St?fan van der Walt : > I noticed the following changeset coming through. Do you think docstrings > are the right place for advertising external packages?? 
If not, should they > be moved to the module docstring, or removed entirely? My vote would be to remove them. I think the 'See Also' section should only point to numpy/scipy/scikits code. We can point to external packages on the website. -- Jarrod Millman Helen Wills Neuroscience Institute 10 Giannini Hall, UC Berkeley http://cirl.berkeley.edu/ From ralf.gommers at googlemail.com Tue Feb 9 05:15:08 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 9 Feb 2010 18:15:08 +0800 Subject: [SciPy-dev] Changes to trunk/scipy/optimize In-Reply-To: <9457e7c81002090131v148a86d1w1eef996c9f3c19a2@mail.gmail.com> References: <9457e7c81002090131v148a86d1w1eef996c9f3c19a2@mail.gmail.com> Message-ID: 2010/2/9 St?fan van der Walt > Hi all, > > I noticed the following changeset coming through. Do you think docstrings > are the right place for advertising external packages? If not, should they > be moved to the module docstring, or removed entirely? > > > First, thanks to Dimitry for asking on this list before making the change. That said, docstrings do not seem like the right place for this, module docstring and/or tutorial seem like more appropriate places. I do not see any reason to remove mention of OpenOpt completely, many external packages are mentioned in our docs and OpenOpt is clearly relevant. That it moved from scikits to some other location does not matter. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From tmp50 at ukr.net Tue Feb 9 05:19:50 2010 From: tmp50 at ukr.net (Dmitrey) Date: Tue, 09 Feb 2010 12:19:50 +0200 Subject: [SciPy-dev] Changes to trunk/scipy/optimize In-Reply-To: <9457e7c81002090131v148a86d1w1eef996c9f3c19a2@mail.gmail.com> Message-ID: I had asked it 2 days ago before today's commit http://permalink.gmane.org/gmane.comp.python.scientific.devel/12740 why couldn't you answer earlier? as for mention scikits.openopt, it was allowed that time I had asked for it (about 2 years ago). In any way, of course, you can undo the changes / remove it completely / do anything else what you want. D. --- ???????? ????????? --- ?? ????: St?fan van der Walt ????: SciPy Developers List ????: 9 ???????, 11:31:12 ????: [SciPy-dev] Changes to trunk/scipy/optimize Hi all, I noticed the following changeset coming through. Do you think docstrings are the right place for advertising external packages? If not, should they be moved to the module docstring, or removed entirely? 
Regards
Stéfan

---------- Forwarded message ----------


_______________________________________________
SciPy-Dev mailing list
SciPy-Dev at scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From millman at berkeley.edu  Tue Feb  9 06:05:41 2010
From: millman at berkeley.edu (Jarrod Millman)
Date: Tue, 9 Feb 2010 03:05:41 -0800
Subject: [SciPy-dev] Changes to trunk/scipy/optimize
In-Reply-To: 
References: <9457e7c81002090131v148a86d1w1eef996c9f3c19a2@mail.gmail.com>
	
Message-ID:
	

2010/2/9 Dmitrey :
> I had asked it 2 days ago before today's commit
> http://permalink.gmane.org/gmane.comp.python.scientific.devel/12740
> why couldn't you answer earlier?

Thanks for raising the issue to the list. Sorry, I missed your
original email and I am sure Stefan must have missed it as well.
Please don't take this discussion as in any way an attempt to
discourage people from using OpenOpt or suggesting you did anything
wrong. I know that OpenOpt is extremely useful and you've done a
great job developing it.

> as for mention scikits.openopt, it was allowed that time I had asked for it
> (about 2 years ago).

Just to be clear, I think Stefan was asking a more general question
that arose due to this specific instance. The OpenOpt situation is a
bit unique since it was originally a Google SoC project for SciPy, but
after funding ran out became a successful stand-alone project of its
own.

Now the question is where is the best place for us to reference
external, but relevant and useful Python projects. My personal
feeling is that it shouldn't be in the 'see also' section of our
docstrings, but we don't have any official policy on that yet. So we
need to have a general discussion about what the general policy should
be.
Personally I would say that the primary place to point to external packages is in the topical software section of website. For instance, OpenOpt is pointed to here: http://www.scipy.org/Topical_Software#head-d21a11d2d173826993e03eb937fac7e6347e6d5f I also think it would be fine to occasionally use external packages in the tutorials if deemed useful. But, in general, I would expect external packages to have their own tutorials. I would prefer to limit the docstrings to just our core projects (numpy and scipy for certain and perhaps the scikits as well). If we don't limit the docstrings in this way, I can see us either 1) getting in the situation where it isn't clear how much more we should add to the docstrings for the sake of completeness or 2) inadvertently getting into political battles by appearing to favor certain external projects while not mentioning others. I am very interested in hearing what everyone else thinks about this issue. However, I think it would be most useful to discuss this in general, rather than with a focus on openopt. So if we decide not to reference external packages in scipy and it turns out that we reference several others in addition to openopt, then we should apply the same standard to all the cases. Best, Jarrod From luethi at vaw.baug.ethz.ch Tue Feb 9 07:22:28 2010 From: luethi at vaw.baug.ethz.ch (Martin =?ISO-8859-1?Q?L=FCthi?=) Date: Tue, 09 Feb 2010 13:22:28 +0100 Subject: [SciPy-dev] Changes to trunk/scipy/optimize In-Reply-To: References: <9457e7c81002090131v148a86d1w1eef996c9f3c19a2@mail.gmail.com> Message-ID: <87mxziegzf.wl%luethi@vaw.baug.ethz.ch> Hi At Tue, 9 Feb 2010 03:05:41 -0800, Jarrod Millman wrote: > Now the question is where is the best place for us to reference > external, but relevant and useful Python projects. My personal > feeling is that it shouldn't be in the 'see also' section of our > docstrings, but we don't have any official policy on that yet. So we > need to have a general discussion about what the general policy should > be. I completely understand your reasoning. However, I would still find it useful if there were a docstring section named "similar packages", mentioning either of a) relevant packages that are closely painfully integrated with scipy and are generally regarded as useful b) a link to the relevant URL of the topical software guide. Option (a) might lead to some "political" problems, which I cannot imagine being difficult. Option (b) would be useful in any case. While checking out code the Topical Software Pages or the Scipy Wiki are not the immediately obvious places to look for more relevant information. Best, Martin From josef.pktd at gmail.com Tue Feb 9 09:31:35 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 9 Feb 2010 09:31:35 -0500 Subject: [SciPy-dev] Changes to trunk/scipy/optimize In-Reply-To: <87mxziegzf.wl%luethi@vaw.baug.ethz.ch> References: <9457e7c81002090131v148a86d1w1eef996c9f3c19a2@mail.gmail.com> <87mxziegzf.wl%luethi@vaw.baug.ethz.ch> Message-ID: <1cd32cbb1002090631q1d2623cbwda6216f3e1b34a97@mail.gmail.com> 2010/2/9 Martin L?thi : > Hi > > At Tue, 9 Feb 2010 03:05:41 -0800, > Jarrod Millman wrote: >> Now the question is where is the best place for us to reference >> external, but relevant and useful Python projects. ?My personal >> feeling is that it shouldn't be in the 'see also' section of our >> docstrings, but we don't have any official policy on that yet. ?So we >> need to have a general discussion about what the general policy should >> be. 
> > I completely understand your reasoning. However, I would still find it useful > if there were a docstring section named "similar packages", mentioning either of > > a) relevant packages that are closely painfully integrated with scipy and are > ? generally regarded as useful > > b) a link to the relevant URL of the topical software guide. > > Option (a) might lead to some "political" problems, which I cannot imagine > being difficult. Option (b) would be useful in any case. While checking out > code the Topical Software Pages or the Scipy Wiki are not the immediately > obvious places to look for more relevant information. I think this would be useful on the module or scipy subpackage level but not for the docstrings for individual functions or classes. In my opinion, the docstring of function should provide only closely related functions and classes for quick lookup. Extra information should go into a different section, for example in the statsmodels docs, I included a page with related packages. For the specific case of scipy.optimize, I think also some of the internal See Also can be removed, since many of the docstrings are almost a table of content of scipy.optimize, which also can be directly seen on the subpackage level. If I remember correctly, in some parts of scipy some weeding of umbrella SeeAlso has already occured. On Wiki versus documentation pages, I think it would be helpful to introduce additional descriptive sections on the scipy.subpackage level, which could include specific Topical Software. I'm using the numpy/scipy docs mainly through htmlhelp (.chm) where the tree view makes browsing very fast (similar to matlab help), maybe the html pages need more links than this. Josef > > Best, Martin > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From robert.kern at gmail.com Tue Feb 9 10:30:44 2010 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 9 Feb 2010 09:30:44 -0600 Subject: [SciPy-dev] Changes to trunk/scipy/optimize In-Reply-To: <1cd32cbb1002090631q1d2623cbwda6216f3e1b34a97@mail.gmail.com> References: <9457e7c81002090131v148a86d1w1eef996c9f3c19a2@mail.gmail.com> <87mxziegzf.wl%luethi@vaw.baug.ethz.ch> <1cd32cbb1002090631q1d2623cbwda6216f3e1b34a97@mail.gmail.com> Message-ID: <3d375d731002090730i7bc7da2eybda44eee094b1dde@mail.gmail.com> On Tue, Feb 9, 2010 at 08:31, wrote: > 2010/2/9 Martin L?thi : >> Hi >> >> At Tue, 9 Feb 2010 03:05:41 -0800, >> Jarrod Millman wrote: >>> Now the question is where is the best place for us to reference >>> external, but relevant and useful Python projects. ?My personal >>> feeling is that it shouldn't be in the 'see also' section of our >>> docstrings, but we don't have any official policy on that yet. ?So we >>> need to have a general discussion about what the general policy should >>> be. >> >> I completely understand your reasoning. However, I would still find it useful >> if there were a docstring section named "similar packages", mentioning either of >> >> a) relevant packages that are closely painfully integrated with scipy and are >> ? generally regarded as useful >> >> b) a link to the relevant URL of the topical software guide. >> >> Option (a) might lead to some "political" problems, which I cannot imagine >> being difficult. Option (b) would be useful in any case. While checking out >> code the Topical Software Pages or the Scipy Wiki are not the immediately >> obvious places to look for more relevant information. 
> > I think this would be useful on the module or scipy subpackage level
> but not for the docstrings of individual functions or classes. In my
> opinion, the docstring of a function should provide only closely related
> functions and classes for quick lookup. Extra information should go
> into a different section; for example, in the statsmodels docs I
> included a page with related packages.

+1

> For the specific case of scipy.optimize, I think some of the
> internal See Also sections can also be removed, since many of the docstrings are
> almost a table of contents of scipy.optimize, which can also be
> seen directly on the subpackage level. If I remember correctly,
> some weeding of umbrella See Also sections has already
> occurred in parts of scipy.

+1

> On Wiki versus documentation pages, I think it would be helpful to
> introduce additional descriptive sections on the scipy subpackage
> level, which could include specific Topical Software.

+0.5

-- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

From jsseabold at gmail.com Tue Feb 9 11:03:07 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Tue, 9 Feb 2010 11:03:07 -0500 Subject: [SciPy-dev] Changes to trunk/scipy/optimize In-Reply-To: <1cd32cbb1002090631q1d2623cbwda6216f3e1b34a97@mail.gmail.com> References: <9457e7c81002090131v148a86d1w1eef996c9f3c19a2@mail.gmail.com> <87mxziegzf.wl%luethi@vaw.baug.ethz.ch> <1cd32cbb1002090631q1d2623cbwda6216f3e1b34a97@mail.gmail.com> Message-ID:

On Tue, Feb 9, 2010 at 9:31 AM, wrote:
> 2010/2/9 Martin Lüthi :
>> Hi
>>
>> At Tue, 9 Feb 2010 03:05:41 -0800,
>> Jarrod Millman wrote:
>>> Now the question is where is the best place for us to reference
>>> external, but relevant and useful Python projects. My personal
>>> feeling is that it shouldn't be in the 'see also' section of our
>>> docstrings, but we don't have any official policy on that yet. So we
>>> need to have a general discussion about what the general policy should
>>> be.
>>
>> I completely understand your reasoning. However, I would still find it useful
>> if there were a docstring section named "similar packages", mentioning either of
>>
>> a) relevant packages that are closely/painlessly integrated with scipy and are
>>    generally regarded as useful
>>
>> b) a link to the relevant URL of the topical software guide.
>>
>> Option (a) might lead to some "political" problems, which I can imagine
>> being difficult. Option (b) would be useful in any case. While checking out
>> code, the Topical Software Pages or the Scipy Wiki are not the immediately
>> obvious places to look for more relevant information.
>

+1 to (b). While I think (a) would be of more use, it could be difficult to keep such sections clean, short, and relevant, but having them in some other place that's readily accessible is a good compromise.

I've been hearing some very mild pushback against using Python from others recently, because it's a "dark horse." Whether or not this is a fair moniker is irrelevant, but I think part of this is because it doesn't feel like a totally integrated environment for problem solving across packages (the fact that Python is much more general than this seems to get lost...). The question then is, is scipy the place to act as a gateway for scientific problem solving writ large? I'd say yes...
> I think this would be useful on the module or scipy subpackage level
> but not for the docstrings of individual functions or classes. In my
> opinion, the docstring of a function should provide only closely related
> functions and classes for quick lookup. Extra information should go
> into a different section; for example, in the statsmodels docs I
> included a page with related packages.
>

+1 Fair compromise.

> For the specific case of scipy.optimize, I think some of the
> internal See Also sections can also be removed, since many of the docstrings are
> almost a table of contents of scipy.optimize, which can also be
> seen directly on the subpackage level. If I remember correctly,
> some weeding of umbrella See Also sections has already
> occurred in parts of scipy.
>
> On Wiki versus documentation pages, I think it would be helpful to
> introduce additional descriptive sections on the scipy subpackage
> level, which could include specific Topical Software.
>
> I'm using the numpy/scipy docs mainly through htmlhelp (.chm), where
> the tree view makes browsing very fast (similar to the Matlab help); maybe
> the html pages need more links than this.
>
> Josef
>
>
>>
>> Best, Martin

Skipper

From matthew.brett at gmail.com Tue Feb 9 13:24:30 2010 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 9 Feb 2010 10:24:30 -0800 Subject: [SciPy-dev] Changes to trunk/scipy/optimize In-Reply-To: <3d375d731002090730i7bc7da2eybda44eee094b1dde@mail.gmail.com> References: <9457e7c81002090131v148a86d1w1eef996c9f3c19a2@mail.gmail.com> <87mxziegzf.wl%luethi@vaw.baug.ethz.ch> <1cd32cbb1002090631q1d2623cbwda6216f3e1b34a97@mail.gmail.com> <3d375d731002090730i7bc7da2eybda44eee094b1dde@mail.gmail.com> Message-ID: <1e2af89e1002091024hbfc2d01q6ef3e7a7eadfaae2@mail.gmail.com>

Hi,

>> I think this would be useful on the module or scipy subpackage level
>> but not for the docstrings of individual functions or classes. In my
>> opinion, the docstring of a function should provide only closely related
>> functions and classes for quick lookup. Extra information should go
>> into a different section; for example, in the statsmodels docs I
>> included a page with related packages.
>
> +1

That seems a sensible suggestion.

Matthew

From rob.clewley at gmail.com Tue Feb 9 15:49:07 2010 From: rob.clewley at gmail.com (Rob Clewley) Date: Tue, 9 Feb 2010 15:49:07 -0500 Subject: [SciPy-dev] Changes to trunk/scipy/optimize In-Reply-To: <87mxziegzf.wl%luethi@vaw.baug.ethz.ch> References: <9457e7c81002090131v148a86d1w1eef996c9f3c19a2@mail.gmail.com> <87mxziegzf.wl%luethi@vaw.baug.ethz.ch> Message-ID:

> I completely understand your reasoning. However, I would still find it useful
> if there were a docstring section named "similar packages", mentioning either of
>
> a) relevant packages that are closely/painlessly integrated with scipy and are
>    generally regarded as useful
>
> b) a link to the relevant URL of the topical software guide.
>
> Option (a) might lead to some "political" problems, which I can imagine
> being difficult. Option (b) would be useful in any case. While checking out
> code, the Topical Software Pages or the Scipy Wiki are not the immediately
> obvious places to look for more relevant information.

+1 for (b). It achieves the same thing as (a) with only minimal additional inconvenience to the user but much greater convenience to developers of the external codes and to the maintainers of the scipy code.
There is already a one-stop shop for this info at the Topical Software wiki page, and this just has to be better advertised IMO.

-Rob

From david at silveregg.co.jp Tue Feb 9 21:12:14 2010 From: david at silveregg.co.jp (David Cournapeau) Date: Wed, 10 Feb 2010 11:12:14 +0900 Subject: [SciPy-dev] Latex and docstrings Message-ID: <4B7215FE.5030803@silveregg.co.jp>

Hi,

I noticed that some of the docstrings I have written for DCT have been changed to latex format. While I have no issue with having latex in the documentation, I thought the consensus was to use it sparingly in docstrings? For example, the DCT-I formula used to be (fixed width font assumed):

    for 0 <= k < N,

                                       N-1
    y[k] = x[0] + (-1)**k x[N-1] + 2 * sum x[n]*cos(pi*k*n/(N-1))
                                       n=0

But now, it is:

    y_k = x_0 + (-1)^k x_{N-1} + 2\\sum_{n=1}^{N-2} x_n
    \\cos\\left({\\pi nk\\over N-1}\\right), \\qquad 0 \\le k < N.

I much prefer the former (the latter is unreadable in a terminal IMO). I have of course no issue with putting the latex formula in the scipy docs,

thanks,

David

From robertlayton at gmail.com Tue Feb 9 23:38:56 2010 From: robertlayton at gmail.com (Robert Layton) Date: Wed, 10 Feb 2010 15:38:56 +1100 Subject: [SciPy-dev] Ticket #467 pending decision for 3 months Message-ID: <585dc5e21002092038j3678d516ma01c79c10c5831d3@mail.gmail.com>

I submitted a fix for ticket #467 a while ago, which is quite a simple fix. As scipy's mean and std functions are now passing through to numpy, there is little reason to test them as part of scipy (the appropriate tests should be in numpy). Even if it's decided that the tests should be retained, there's a patch for that (r) as well.

Thoughts?

From stefan at sun.ac.za Wed Feb 10 02:44:36 2010 From: stefan at sun.ac.za (Stéfan van der Walt) Date: Wed, 10 Feb 2010 09:44:36 +0200 Subject: [SciPy-dev] Changes to trunk/scipy/optimize In-Reply-To: References: <9457e7c81002090131v148a86d1w1eef996c9f3c19a2@mail.gmail.com> Message-ID: <9457e7c81002092344i72886a26gcee3460782e4647e@mail.gmail.com>

2010/2/9 Dmitrey :
> I had asked it 2 days ago before today's commit
> http://permalink.gmane.org/gmane.comp.python.scientific.devel/12740
> why couldn't you answer earlier?

Sorry, Dmitrey, I didn't see your earlier message. I included a link to OpenOpt in the optimize module docstring, and removed the See Also sections entirely (they were difficult to maintain, so I would have removed them either way). I hope you find this solution to your satisfaction.

Kind regards Stéfan

From stefan at sun.ac.za Wed Feb 10 03:18:27 2010 From: stefan at sun.ac.za (Stéfan van der Walt) Date: Wed, 10 Feb 2010 10:18:27 +0200 Subject: [SciPy-dev] Ticket #467 pending decision for 3 months In-Reply-To: <585dc5e21002092038j3678d516ma01c79c10c5831d3@mail.gmail.com> References: <585dc5e21002092038j3678d516ma01c79c10c5831d3@mail.gmail.com> Message-ID: <9457e7c81002100018k69022e86n967f052dc6dc13c8@mail.gmail.com>

Hi Robert

On 10 February 2010 06:38, Robert Layton wrote:
> I submitted a fix for ticket #467 a while ago, which is quite a simple fix.
>
> As scipy's mean and std functions are now passing through to numpy, there is
> little reason to test them as part of scipy (the appropriate tests should be
> in numpy).
> Even if it's decided that the tests should be retained, there's a patch for
> that (r) as well.

Sorry for not paying attention to this earlier.
I think we should remove tests that only validate numpy's behaviour, so your `without_numpy.patch' looks good. Unfortunately, it doesn't apply cleanly; would you have a chance to look at it again?

Thanks, Stéfan

From pav+sp at iki.fi Wed Feb 10 04:48:51 2010 From: pav+sp at iki.fi (Pauli Virtanen) Date: Wed, 10 Feb 2010 09:48:51 +0000 (UTC) Subject: [SciPy-dev] Latex and docstrings References: <4B7215FE.5030803@silveregg.co.jp> Message-ID:

Wed, 10 Feb 2010 11:12:14 +0900, David Cournapeau wrote: [clip]
> for 0 <= k < N,
>
>                                        N-1
>     y[k] = x[0] + (-1)**k x[N-1] + 2 * sum x[n]*cos(pi*k*n/(N-1))
>                                        n=0
>
> But now, it is:
>
>     y_k = x_0 + (-1)^k x_{N-1} + 2\\sum_{n=1}^{N-2} x_n
>     \\cos\\left({\\pi nk\\over N-1}\\right), \\qquad 0 \\le k < N.
>
> I much prefer the former (the latter is unreadable in a terminal IMO). I
> have of course no issue with putting the latex formula in the scipy docs,

Someone could try to write a "simple math" plugin for Sphinx that converted poor man's math notation to Latex.

Cheers, Pauli

From stefan at sun.ac.za Wed Feb 10 05:24:16 2010 From: stefan at sun.ac.za (Stéfan van der Walt) Date: Wed, 10 Feb 2010 12:24:16 +0200 Subject: [SciPy-dev] Latex and docstrings In-Reply-To: References: <4B7215FE.5030803@silveregg.co.jp> Message-ID: <9457e7c81002100224s261b648u6c5a81412aa55448@mail.gmail.com>

On 10 February 2010 11:48, Pauli Virtanen wrote:
>> I much prefer the former (the latter is unreadable in a terminal IMO). I
>> have of course no issue with putting the latex formula in the scipy docs,
>
> Someone could try to write a "simple math" plugin for Sphinx that
> converted poor man's math notation to Latex.

We could look at AsciiMathML for inspiration:

http://asciimathml.sourceforge.net/

Regards Stéfan

From dagss at student.matnat.uio.no Wed Feb 10 06:30:16 2010 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Wed, 10 Feb 2010 12:30:16 +0100 Subject: [SciPy-dev] Latex and docstrings In-Reply-To: <9457e7c81002100224s261b648u6c5a81412aa55448@mail.gmail.com> References: <4B7215FE.5030803@silveregg.co.jp> <9457e7c81002100224s261b648u6c5a81412aa55448@mail.gmail.com> Message-ID: <4B7298C8.30000@student.matnat.uio.no>

Stéfan van der Walt wrote:
> On 10 February 2010 11:48, Pauli Virtanen wrote:
>
>>> I much prefer the former (the latter is unreadable in a terminal IMO). I
>>> have of course no issue with putting the latex formula in the scipy docs,
>>>
>> Someone could try to write a "simple math" plugin for Sphinx that
>> converted poor man's math notation to Latex.
>>
>
> We could look at AsciiMathML for inspiration:
>
> http://asciimathml.sourceforge.net/
>

+1, that looks great. One should ban expressions like "a/b/c/d" though -- apparently AsciiMathML typesets that as (a/b)/(c/d), while in Python that would be a/(b*c*d).

Dag Sverre

From josef.pktd at gmail.com Wed Feb 10 10:44:55 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 10 Feb 2010 10:44:55 -0500 Subject: [SciPy-dev] Ticket #467 pending decision for 3 months In-Reply-To: <9457e7c81002100018k69022e86n967f052dc6dc13c8@mail.gmail.com> References: <585dc5e21002092038j3678d516ma01c79c10c5831d3@mail.gmail.com> <9457e7c81002100018k69022e86n967f052dc6dc13c8@mail.gmail.com> Message-ID: <1cd32cbb1002100744i47c5b87bv91f46f6e65b597f5@mail.gmail.com>

2010/2/10 Stéfan van der Walt :
> Hi Robert
>
> On 10 February 2010 06:38, Robert Layton wrote:
>> I submitted a fix for ticket #467 a while ago, which is quite a simple fix.
>>
>> As scipy's mean and std functions are now passing through to numpy, there is
>> little reason to test them as part of scipy (the appropriate tests should be
>> in numpy).
>> Even if it's decided that the tests should be retained, there's a patch for
>> that (r) as well.
>
> Sorry for not paying attention to this earlier. I think we should
> remove tests that only validate numpy's behaviour, so your
> `without_numpy.patch' looks good. Unfortunately, it doesn't apply
> cleanly; would you have a chance to look at it again?

this was sitting in my drafts folder since November

'''
Sorry for not replying earlier; I have seen your patches before but I don't know what I would prefer.

I agree that numpy functions should be tested in numpy. On the other hand, the stats tests already include additional test matrices that can be used to check the precision of the numpy functions. And I would like to

(As an example, numpy random is mostly tested in scipy.stats since the pdf, pmf and cdf of the distributions are available there.)
''''

The point for stats is that I didn't find any precision test in the numpy test suite for mean, var and so on.

When I wrote the anova tests using the NIST reference cases, numpy.mean did pretty badly for the badly scaled test cases. I never checked the NIST test cases specifically for mean, var, ...

I still don't know where precision tests should be, but the outcome would be very useful. In ANOVA I ended up calculating the mean twice (if the dataset is badly scaled) to pass the NIST test.

Josef
(google is not very informative about donkeys and piles of hay)

>
> Thanks,
> Stéfan
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev
>

From josef.pktd at gmail.com Wed Feb 10 11:03:42 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 10 Feb 2010 11:03:42 -0500 Subject: [SciPy-dev] Ticket #467 pending decision for 3 months In-Reply-To: <1cd32cbb1002100744i47c5b87bv91f46f6e65b597f5@mail.gmail.com> References: <585dc5e21002092038j3678d516ma01c79c10c5831d3@mail.gmail.com> <9457e7c81002100018k69022e86n967f052dc6dc13c8@mail.gmail.com> <1cd32cbb1002100744i47c5b87bv91f46f6e65b597f5@mail.gmail.com> Message-ID: <1cd32cbb1002100803i4d3b73fcle72db01f6c3c7b37@mail.gmail.com>

On Wed, Feb 10, 2010 at 10:44 AM, wrote:
> 2010/2/10 Stéfan van der Walt :
>> Hi Robert
>>
>> On 10 February 2010 06:38, Robert Layton wrote:
>>> I submitted a fix for ticket #467 a while ago, which is quite a simple fix.
>>>
>>> As scipy's mean and std functions are now passing through to numpy, there is
>>> little reason to test them as part of scipy (the appropriate tests should be
>>> in numpy).
>>> Even if it's decided that the tests should be retained, there's a patch for
>>> that (r) as well.
>>
>> Sorry for not paying attention to this earlier. I think we should
>> remove tests that only validate numpy's behaviour, so your
>> `without_numpy.patch' looks good. Unfortunately, it doesn't apply
>> cleanly; would you have a chance to look at it again?
>
> this was sitting in my drafts folder since November
>
> '''
> Sorry for not replying earlier; I have seen your patches before but I
> don't know
> what I would prefer.
>
> I agree that numpy functions should be tested in numpy. On the other hand,
> the stats tests already include additional test matrices that can be used
> to check the precision of the numpy functions.
> And I would like to
>
> (As an example, numpy random is mostly tested in scipy.stats since the
> pdf, pmf and cdf of the distributions are available there.)
> ''''
>
> The point for stats is that I didn't find any precision test in the
> numpy test suite for mean, var and so on.
>
> When I wrote the anova tests using the NIST reference cases,
> numpy.mean did pretty badly for the badly scaled test cases. I never
> checked the NIST test cases specifically for mean, var, ...
>
> I still don't know where precision tests should be, but the outcome
> would be very useful. In ANOVA I ended up calculating the mean twice
> (if the dataset is badly scaled) to pass the NIST test.
>
> Josef
> (google is not very informative about donkeys and piles of hay)
>

Speaking of piles of hay:
http://projects.scipy.org/scipy/ticket/999 (O(n log(n))) in python
http://projects.scipy.org/scipy/ticket/893 (O(n**2)) (I think) in cython

I don't have much experience guessing which would perform better and no time for comparing them. One of the two should go into scipy, since both are much better than the current implementation. Which?

Josef

>
>
>>
>> Thanks,
>> Stéfan
>> _______________________________________________
>> SciPy-Dev mailing list
>> SciPy-Dev at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-dev
>>
>

From robertlayton at gmail.com Wed Feb 10 16:54:03 2010 From: robertlayton at gmail.com (Robert Layton) Date: Thu, 11 Feb 2010 08:54:03 +1100 Subject: [SciPy-dev] SciPy-Dev Digest, Vol 76, Issue 14 In-Reply-To: References: Message-ID: <585dc5e21002101354v187b7e04m7f089ba1d2e3f4e4@mail.gmail.com>

>2010/2/10 Stéfan van der Walt :
>> Hi Robert
>>
>> On 10 February 2010 06:38, Robert Layton wrote:
>>> I submitted a fix for ticket #467 a while ago, which is quite a simple fix.
>>>
>>> As scipy's mean and std functions are now passing through to numpy, there is
>>> little reason to test them as part of scipy (the appropriate tests should be
>>> in numpy).
>>> Even if it's decided that the tests should be retained, there's a patch for
>>> that (r) as well.
>>
>> Sorry for not paying attention to this earlier. I think we should
>> remove tests that only validate numpy's behaviour, so your
>> `without_numpy.patch' looks good. Unfortunately, it doesn't apply
>> cleanly; would you have a chance to look at it again?
>
>this was sitting in my drafts folder since November
>
>'''
>Sorry for not replying earlier; I have seen your patches before but I
>don't know
>what I would prefer.
>
>I agree that numpy functions should be tested in numpy. On the other hand,
>the stats tests already include additional test matrices that can be used
>to check the precision of the numpy functions. And I would like to
>
>(As an example, numpy random is mostly tested in scipy.stats since the
>pdf, pmf and cdf of the distributions are available there.)
>''''
>
>The point for stats is that I didn't find any precision test in the
>numpy test suite for mean, var and so on.
>
>When I wrote the anova tests using the NIST reference cases,
>numpy.mean did pretty badly for the badly scaled test cases. I never
>checked the NIST test cases specifically for mean, var, ...
>
>I still don't know where precision tests should be, but the outcome
>would be very useful. In ANOVA I ended up calculating the mean twice
>(if the dataset is badly scaled) to pass the NIST test.
>
>Josef
>(google is not very informative about donkeys and piles of hay)
>

I have updated the patch and it now applies to the current SVN version.
As I stated in the ticket, there is a problem with my build environment with a different test (test_ltisys.TestSS2TF.test_basic(0, 3, 3) ... ** On entry to DGEEV parameter number 5 had an illegal value), so I can't verify that the test works, but it does build and install.

Thanks,
Robert

From robert.kern at gmail.com Wed Feb 10 17:48:28 2010 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 10 Feb 2010 16:48:28 -0600 Subject: [SciPy-dev] Web space for docstring-linked PDFs? Message-ID: <3d375d731002101448x6c817043sd9eb6351455c643b@mail.gmail.com>

I would like to make the PDFs of the ODRPACK User's Guide available somewhere on (docs.)scipy.org in order to provide a stable URL for them. The URL currently in the scipy.odr docstring is now broken. I currently have them on my own web space, but it would be nice to have them available somewhere stable on scipy.org (although not as a wiki attachment). I am now getting an email every month or so telling me that the link is broken. Is there a reasonable place in the docs.scipy.org infrastructure where I can place these static files such that they will be backed up and eventually migrated along with the rest of the infrastructure as it evolves?

-- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

From stefan at sun.ac.za Thu Feb 11 01:41:45 2010 From: stefan at sun.ac.za (Stéfan van der Walt) Date: Thu, 11 Feb 2010 08:41:45 +0200 Subject: [SciPy-dev] Web space for docstring-linked PDFs? In-Reply-To: <3d375d731002101448x6c817043sd9eb6351455c643b@mail.gmail.com> References: <3d375d731002101448x6c817043sd9eb6351455c643b@mail.gmail.com> Message-ID: <9457e7c81002102241n57ffff69w26c1592dbd411437@mail.gmail.com>

On 11 February 2010 00:48, Robert Kern wrote:
> I would like to make the PDFs of the ODRPACK User's Guide available
> somewhere on (docs.)scipy.org in order to provide a stable URL for
> them. The URL currently in the scipy.odr docstring is now broken. I
> currently have them on my own web space, but it would be nice to have
> them available somewhere stable on scipy.org (although not as a wiki
> attachment). I am now getting an email every month or so telling me
> that the link is broken. Is there a reasonable place in the
> docs.scipy.org infrastructure where I can place these static files
> such that they will be backed up and eventually migrated along with
> the rest of the infrastructure as it evolves?

If the files are not too big, let's add them to the SVN repo under either

/trunk/doc/frontpage or the new
/scipy.org

The new scipy site can be previewed at new.scipy.org.

Regards Stéfan

From dwf at cs.toronto.edu Thu Feb 11 02:20:24 2010 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Thu, 11 Feb 2010 02:20:24 -0500 Subject: [SciPy-dev] Web space for docstring-linked PDFs? In-Reply-To: <3d375d731002101448x6c817043sd9eb6351455c643b@mail.gmail.com> References: <3d375d731002101448x6c817043sd9eb6351455c643b@mail.gmail.com> Message-ID: <93CB8BF7-4DEC-4FAA-A2B3-7F54385ED863@cs.toronto.edu>

On 10-Feb-10, at 5:48 PM, Robert Kern wrote:
> I would like to make the PDFs of the ODRPACK User's Guide available
> somewhere on (docs.)scipy.org in order to provide a stable URL for
> them. The URL currently in the scipy.odr docstring is now broken.
I
> currently have them on my own web space, but it would be nice to have
> them available somewhere stable on scipy.org (although not as a wiki
> attachment). I am now getting an email every month or so telling me
> that the link is broken. Is there a reasonable place in the
> docs.scipy.org infrastructure where I can place these static files
> such that they will be backed up and eventually migrated along with
> the rest of the infrastructure as it evolves?

I don't have an answer but I think it's an important question. A number of times in the scipy.org-Sphinx effort I've been worried about where to stick old wiki attachments and how to link to them so that those links will be fairly stable for years to come. Wiki attachments are nice in this way but the wiki has other problems...

David

From robert.kern at gmail.com Thu Feb 11 10:37:16 2010 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 11 Feb 2010 09:37:16 -0600 Subject: [SciPy-dev] Web space for docstring-linked PDFs? In-Reply-To: <9457e7c81002102241n57ffff69w26c1592dbd411437@mail.gmail.com> References: <3d375d731002101448x6c817043sd9eb6351455c643b@mail.gmail.com> <9457e7c81002102241n57ffff69w26c1592dbd411437@mail.gmail.com> Message-ID: <3d375d731002110737i5b0d6cebo7fdfbeb52f6cdfd7@mail.gmail.com>

2010/2/11 Stéfan van der Walt :
> On 11 February 2010 00:48, Robert Kern wrote:
>> I would like to make the PDFs of the ODRPACK User's Guide available
>> somewhere on (docs.)scipy.org in order to provide a stable URL for
>> them. The URL currently in the scipy.odr docstring is now broken. I
>> currently have them on my own web space, but it would be nice to have
>> them available somewhere stable on scipy.org (although not as a wiki
>> attachment). I am now getting an email every month or so telling me
>> that the link is broken. Is there a reasonable place in the
>> docs.scipy.org infrastructure where I can place these static files
>> such that they will be backed up and eventually migrated along with
>> the rest of the infrastructure as it evolves?
>
> If the files are not too big, let's add them to the SVN repo under either
>
> /trunk/doc/frontpage or the new
> /scipy.org
>
> The new scipy site can be previewed at new.scipy.org.

How big is too big?
>
> % ls -lh
> total 900K
> -rw-r--r-- 1 rkern rkern 156K May 27  2009 odr_ams.pdf
> -rw-r--r-- 1 rkern rkern 551K May 27  2009 odrpack_guide.pdf
> -rw-r--r-- 1 rkern rkern 167K May 27  2009 odr_vcv.pdf
>
> The really important one is the 551K file, though all of them are useful.

Even here, where our internet connectivity is something not discussed in civilised conversation, 500K is quite manageable, so I'd say go ahead (if it were up to me).

Regards Stéfan

From d.l.goldsmith at gmail.com Thu Feb 11 19:20:57 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Thu, 11 Feb 2010 16:20:57 -0800 Subject: [SciPy-dev] Not Re: Removing datetime support. :-) Message-ID: <45d1ab481002111620s103360e6m35606a0336e4986a@mail.gmail.com>

In the doc Wiki, how does one make a three-line, a.k.a. "identically equal to", sign? Thanks!

DG

From robert.kern at gmail.com Thu Feb 11 19:21:59 2010 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 11 Feb 2010 18:21:59 -0600 Subject: [SciPy-dev] Not Re: Removing datetime support. :-) In-Reply-To: <45d1ab481002111620s103360e6m35606a0336e4986a@mail.gmail.com> References: <45d1ab481002111620s103360e6m35606a0336e4986a@mail.gmail.com> Message-ID: <3d375d731002111621jc31d987v9211aec3e021b9c4@mail.gmail.com>

On Thu, Feb 11, 2010 at 18:20, David Goldsmith wrote:
> In the doc Wiki, how does one make a three-line, a.k.a. "identically equal
> to", sign? Thanks!

Preferably, you don't.

-- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

From david.kirkby at onetel.net Thu Feb 11 19:35:06 2010 From: david.kirkby at onetel.net (Dr. David Kirkby) Date: Fri, 12 Feb 2010 00:35:06 +0000 Subject: [SciPy-dev] Not Re: Removing datetime support. :-) In-Reply-To: <45d1ab481002111620s103360e6m35606a0336e4986a@mail.gmail.com> References: <45d1ab481002111620s103360e6m35606a0336e4986a@mail.gmail.com> Message-ID: <4B74A23A.7050801@onetel.net>

David Goldsmith wrote:
> In the doc Wiki, how does one make a three-line, a.k.a. "identically
> equal to", sign? Thanks!
>
> DG

On some Wikis, an exclamation mark will remove formatting. Try !===

Dave

From jh at physics.ucf.edu Sat Feb 13 13:30:18 2010 From: jh at physics.ucf.edu (Joe Harrington) Date: Sat, 13 Feb 2010 13:30:18 -0500 Subject: [SciPy-dev] 2-review system on doc wiki Message-ID:

Chuck Harris recently posted on the numpy-discussion list a request for numpy 2.0 requirements. I suggested that we include reviewed docs in 2.0.

The problem is that the docstring review is stuck. We need to implement both a technical and a presentation review, but we currently lack the labor to do the job. Pauli Virtanen has been busy with other commitments, so there has been little progress on doc wiki changes. So, this is also a call for a Django programmer who can add a second review capability to the doc wiki.

If nobody steps forward, then we'll have to abandon the idea. I think this would be a shame, because there are many docstrings that are technically complete but impenetrable, and others that are well presented but technically incomplete. It won't be hard to fix these, but we need a system to find them.

Any takers? Ideally, it should be someone who has written docs in our system and who has Django experience, but likely Django or similar experience is more important for this.

Thanks,

--jh--

Prof.
Joseph Harrington
Planetary Sciences Group
Department of Physics MAP 414
4000 Central Florida Blvd.
University of Central Florida
Orlando, FL 32816-2385
jh at physics.ucf.edu
planets.ucf.edu

From millman at berkeley.edu Sat Feb 13 15:39:30 2010 From: millman at berkeley.edu (Jarrod Millman) Date: Sat, 13 Feb 2010 14:39:30 -0600 Subject: [SciPy-dev] 2-review system on doc wiki In-Reply-To: References: Message-ID:

On Sat, Feb 13, 2010 at 12:30 PM, Joe Harrington wrote:
> Chuck Harris recently posted on the numpy-discussion list a request
> for numpy 2.0 requirements. I suggested that we include reviewed docs
> in 2.0.

NumPy 2.0 is being released to address a recent ABI change (and is not going to include any major new functionality). It will be released in 3-4 weeks, so this is unfortunately way out of scope for this release. However, I am all for continuing the doc project and happy to see you are still moving forward with it.

Thanks,

-- Jarrod Millman Helen Wills Neuroscience Institute 10 Giannini Hall, UC Berkeley http://cirl.berkeley.edu/

From bsouthey at gmail.com Sat Feb 13 20:22:19 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Sat, 13 Feb 2010 19:22:19 -0600 Subject: [SciPy-dev] 2-review system on doc wiki In-Reply-To: References: Message-ID:

On Sat, Feb 13, 2010 at 12:30 PM, Joe Harrington wrote:
> Chuck Harris recently posted on the numpy-discussion list a request
> for numpy 2.0 requirements. I suggested that we include reviewed docs
> in 2.0.

I think you need to provide a plan on how this can be done, although clearly it will not be the 2.0 release but perhaps somewhere in the 2.x series.

>
> The problem is that the docstring review is stuck. We need to
> implement both a technical and a presentation review, but we currently
> lack the labor to do the job. Pauli Virtanen has been busy with other
> commitments, so there has been little progress on doc wiki changes.

What do you actually mean by technical review and presentation review? I think that we need some sort of checklist that people can go through for each part. Really, just numpy alone is a big task, so it would be nice to get people to proof other people's work.

> So, this is also a call for a Django programmer who can add a second
> review capability to the doc wiki.
>
> If nobody steps forward, then we'll have to abandon the idea. I think
> this would be a shame, because there are many docstrings that are
> technically complete but impenetrable, and others that are well
> presented but technically incomplete. It won't be hard to fix these,
> but we need a system to find them.

Can you provide examples that illustrate these two problems?

I am prepared to try to do something if I can understand what to do and I can get the time. But the Django side is beyond me as I am trying to learn it for my own project.

>
> Any takers? Ideally, it should be someone who has written docs in our
> system and who has Django experience, but likely Django or similar
> experience is more important for this.
>
> Thanks,
>
> --jh--

Bruce

From d.l.goldsmith at gmail.com Sat Feb 13 23:39:04 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Sat, 13 Feb 2010 20:39:04 -0800 Subject: [SciPy-dev] 2-review system on doc wiki In-Reply-To: References: Message-ID: <45d1ab481002132039r4823c067je07d3d19089088d7@mail.gmail.com>

On Sat, Feb 13, 2010 at 5:22 PM, Bruce Southey wrote:
> On Sat, Feb 13, 2010 at 12:30 PM, Joe Harrington
> wrote:
> > Chuck Harris recently posted on the numpy-discussion list a request
> > for numpy 2.0 requirements. I suggested that we include reviewed docs
> > in 2.0.
>
> I think you need to provide a plan on how this can be done, although
> clearly it will not be the 2.0 release but perhaps somewhere in
> the 2.x series.
>
> >
> > The problem is that the docstring review is stuck. We need to
> > implement both a technical and a presentation review, but we currently
> > lack the labor to do the job. Pauli Virtanen has been busy with other
> > commitments, so there has been little progress on doc wiki changes.
>
> What do you actually mean by technical review and presentation review?
> I think that we need some sort of checklist that people can go through

http://docs.scipy.org/numpy/Questions+Answers/#reviewer-guidelines

First promulgated 2009-07-22; updated 2009-09-28, 2009-10-01.

Unfortunately, as I say there, asking for a precise "checklist" for the "presentation" review is _perhaps_ intractable (too much inherent subjectivity IMO), but I don't feel that should preclude it being done.

DG

> for each part. Really, just numpy alone is a big task, so it would be
> nice to get people to proof other people's work.
>
> > So, this is also a call for a Django programmer who can add a second
> > review capability to the doc wiki.
> >
> > If nobody steps forward, then we'll have to abandon the idea. I think
> > this would be a shame, because there are many docstrings that are
> > technically complete but impenetrable, and others that are well
> > presented but technically incomplete. It won't be hard to fix these,
> > but we need a system to find them.
>
> Can you provide examples that illustrate these two problems?
>
> I am prepared to try to do something if I can understand what to do
> and I can get the time. But the Django side is beyond me as I am
> trying to learn it for my own project.
>
> >
> > Any takers? Ideally, it should be someone who has written docs in our
> > system and who has Django experience, but likely Django or similar
> > experience is more important for this.
> >
> > Thanks,
> >
> > --jh--
>
> Bruce
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev
>

From charlesr.harris at gmail.com Sun Feb 14 01:24:09 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 13 Feb 2010 23:24:09 -0700 Subject: [SciPy-dev] Buildbots showing red Message-ID:

*All* the buildbots are showing errors.
Here are some:

======================================================================
ERROR: test_view_to_flexible_dtype (test_core.TestMaskedView)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/numpybb/Buildbot/numpy/b13/numpy-install/lib/python2.4/site-packages/numpy/ma/tests/test_core.py", line 3333, in test_view_to_flexible_dtype
test = a[0].view([('A', float), ('B', float)])
File "../numpy-install/lib/python2.4/site-packages/numpy/ma/core.py", line 2866, in view
File "../numpy-install/lib/python2.4/site-packages/numpy/ma/core.py", line 2786, in __array_finalize__
TypeError: attribute 'shape' of 'numpy.generic' objects is not writable

======================================================================
ERROR: test_view_to_subdtype (test_core.TestMaskedView)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/numpybb/Buildbot/numpy/b13/numpy-install/lib/python2.4/site-packages/numpy/ma/tests/test_core.py", line 3354, in test_view_to_subdtype
test = a[0].view((float, 2))
File "../numpy-install/lib/python2.4/site-packages/numpy/ma/core.py", line 2866, in view
File "../numpy-install/lib/python2.4/site-packages/numpy/ma/core.py", line 2786, in __array_finalize__
TypeError: attribute 'shape' of 'numpy.generic' objects is not writable

======================================================================
FAIL: test_buffer_hashlib (test_regression.TestRegression)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/numpybb/Buildbot/numpy/b13/numpy-install/lib/python2.4/site-packages/numpy/core/tests/test_regression.py", line 1255, in test_buffer_hashlib
assert_equal(md5(x).hexdigest(), '2a1dd1e1e59d0a384c26951e316cd7e6')
File "../numpy-install/lib/python2.4/site-packages/numpy/testing/utils.py", line 305, in assert_equal
AssertionError: Items are not equal:
ACTUAL: '1264d4a9f74dc462700fd163e3ff09a6'
DESIRED: '2a1dd1e1e59d0a384c26951e316cd7e6'

Chuck

From bsouthey at gmail.com Sun Feb 14 10:12:03 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Sun, 14 Feb 2010 09:12:03 -0600 Subject: [SciPy-dev] 2-review system on doc wiki In-Reply-To: <45d1ab481002132039r4823c067je07d3d19089088d7@mail.gmail.com> References: <45d1ab481002132039r4823c067je07d3d19089088d7@mail.gmail.com> Message-ID:

On Sat, Feb 13, 2010 at 10:39 PM, David Goldsmith wrote:
> On Sat, Feb 13, 2010 at 5:22 PM, Bruce Southey wrote:
>> On Sat, Feb 13, 2010 at 12:30 PM, Joe Harrington
>> wrote:
>> > Chuck Harris recently posted on the numpy-discussion list a request
>> > for numpy 2.0 requirements. I suggested that we include reviewed docs
>> > in 2.0.
>>
>> I think you need to provide a plan on how this can be done, although
>> clearly it will not be the 2.0 release but perhaps somewhere in
>> the 2.x series.
>>
>> >
>> > The problem is that the docstring review is stuck. We need to
>> > implement both a technical and a presentation review, but we currently
>> > lack the labor to do the job. Pauli Virtanen has been busy with other
>> > commitments, so there has been little progress on doc wiki changes.
>>
>> What do you actually mean by technical review and presentation review?
>> I think that we need some sort of checklist that people can go through
>
> http://docs.scipy.org/numpy/Questions+Answers/#reviewer-guidelines
>
> First promulgated 2009-07-22; updated 2009-09-28, 2009-10-01.
>
> Unfortunately, as I say there, asking for a precise "checklist" for the
> "presentation" review is _perhaps_ intractable (too much inherent
> subjectivity IMO), but I don't feel that should preclude it being done.
>
> DG

I never knew that existed, because it is rather different from:
http://docs.scipy.org/numpy/Front%20Page/#roles-reviewing

But my question is what is 'technical review' and what is 'presentation review'? Without clarifying those, any steps are rather pointless. From the page you linked, these two things are combined. Questions like 'is it clear?' and 'is it helpful?' are not what I would call presentation but technical.

To me, presentation review should only address whether it meets the docstring standard (http://projects.scipy.org/numpy/wiki/CodingStyleGuidelines#docstring-standard) and is displayed correctly.

I just think that the 'bar' here is set too high for a volunteer project. Also I think that this 'new version' is asking too much, especially when people have been working under a rather different approach. Also there is no conflict resolution between all the steps involved.

Bruce

From stefan at sun.ac.za Sun Feb 14 11:07:43 2010 From: stefan at sun.ac.za (Stéfan van der Walt) Date: Sun, 14 Feb 2010 18:07:43 +0200 Subject: [SciPy-dev] 2-review system on doc wiki In-Reply-To: References: <45d1ab481002132039r4823c067je07d3d19089088d7@mail.gmail.com> Message-ID: <9457e7c81002140807g348a5a5naa6f13deb82b92c5@mail.gmail.com>

On 14 February 2010 17:12, Bruce Southey wrote:
> I just think that the 'bar' here is set too high for a volunteer
> project. Also I think that this 'new version' is asking too much,
> especially when people have been working under a rather different
> approach. Also there is no conflict resolution between all the steps
> involved.

This whole issue looks much more complicated than it is. We simply need two (possibly overlapping) groups of people to answer the following questions:

1) Is the docstring technically accurate? (Examples correct, docstring format followed, etc.)

2) Is the docstring well written and easily understandable? (Language use, simplicity)

However, I think the reviewing process is much too daunting as it is, so we'll need to simplify everything by providing:

- An easy way to reach numpy / scipy docstrings for editing (currently requires >1 click)
- An easy way to review (maybe an automated weekly post to the list with a prioritised request for review)
- Simpler explanations of the differences between docstring and reference guide editing
- Easy integration of patches into the source code (everything is in place, but we need to make the process clear)

Regards Stéfan

From tmp50 at ukr.net Sun Feb 14 11:45:23 2010 From: tmp50 at ukr.net (Dmitrey) Date: Sun, 14 Feb 2010 18:45:23 +0200 Subject: [SciPy-dev] Changes to trunk/scipy/optimize In-Reply-To: <9457e7c81002092344i72886a26gcee3460782e4647e@mail.gmail.com> Message-ID:

Well, if anyone doesn't mind, I'll add one more line, "and automatic differentiation", to the docstring; I guess it's quite essential.

Regards, D.

From: Stéfan van der Walt

Sorry, Dmitrey, I didn't see your earlier message. I included a link to OpenOpt in the optimize module docstring, and removed the See Also sections entirely (they were difficult to maintain, so I would have removed them either way).
I hope you find this solution to your satisfaction.

Kind regards Stéfan
___________________________

From jh at physics.ucf.edu Sun Feb 14 14:07:59 2010 From: jh at physics.ucf.edu (Joe Harrington) Date: Sun, 14 Feb 2010 14:07:59 -0500 Subject: [SciPy-dev] 2-review system on doc wiki In-Reply-To: (scipy-dev-request@scipy.org) References: Message-ID:

On 14 February 2010 17:12, Bruce Southey wrote:
> I just think that the 'bar' here is set too high for a volunteer
> project. Also I think that this 'new version' is asking too much,
> especially when people have been working under a rather different
> approach. Also there is no conflict resolution between all the steps
> involved.

Sorry for the length here. Hopefully this clarifies a lot of questions. See, in particular, example 3 if you're not convinced we need this.

I agree with Stefan that this really isn't that complicated. David and I have discussed the two-review system here, in doc telecons, at the SciPy09 conference, and in its proceedings; this is nothing new. The motivation is simple: I read a number of the reviewed pages and found problems that should not have passed review. The plan is a slight modification of our one-review plan. David pointed to that already (thanks, David).

There are no differences of approach other than the change in the review system. Since only a tiny fraction (8%) of the pages has undergone any level of review, and only 4% have passed review, the change will not cause a major upset to what we are doing.

As always, we resolve conflicts by discussion and use of the comment field on each page.

We are aiming at a product of equal or greater quality to similar manuals for software such as IDL or Matlab. Whether this can all be done by volunteers is an irrelevant question. I expect that the number of reviewers will be much smaller than the number of writers. We will identify and vet technical and presentation reviewers, and if necessary we can seek funds to pay them. Of course, we'll try the volunteer way first. I hope that we can find volunteer technical reviewers from among the developers. Presentation reviewers will likely have substantial technical writing experience; we have a list of a few potentials already. A professional copy editor will proof the doc the first time we have fully-reviewed pages and hopefully for each major release thereafter, but that's a future problem.

I give some examples and clarification on the review roles below.

EXAMPLES

1. numpy.core.umath.sqrt does not define the "out" argument (a technical omission) and uses language such as "branch cut" and "continuous from above on it" that will confuse the majority of readers who have not taken a course in complex variables, such as high-school students and perhaps many of their teachers (presentation review). This could be solved with an external reference, which is missing, or even just a rewording of the sentence, like:

In the terminology of complex-variable calculus (ref), sqrt has a branch cut [-inf, 0) and is continuous from above on it.

This is what I call "introducing an expert section". It signifies to our target audience (one level below the likely users of a function) that we're about to go over their heads, where to go to come up to speed, and otherwise not to sweat it if they don't get it. (Actually, in this particular case, it's not clear to me why we need to document the analytic properties of taking roots.
There's *lots* more one could say about roots, and trig functions, and.... We should leave that to the textbooks.)

2. Most routines are missing pointers to relevant pages of the numpy.doc package that discuss things like "along an axis" or "out". In many cases, that's because these pages didn't exist when the function docstrings were written.

3. From scipy, some of the ready-for-review pages in scipy.stats are likely technically good, but are totally impenetrable to anyone without several semesters' equivalent college education in statistics. While you may need that level of description to use all the tests to their fullest, a beginner should be able to do things like plot, evaluate, and integrate standard PDFs within a few minutes of starting to read the docs there. If two stats experts wrote all the pages and reviewed each others' writing, such improvements would never be suggested. Yet, a single presentation-oriented reviewer might not catch technical errors. That's why we need two types of reviewers.

TECHNICAL REVIEW

A technical review ensures that all the features, API points, underlying methods that affect the results, and limitations of the item are noted properly in the docstring. It implies familiarity with (or at least a good, hard look at) the source code and the general topic (e.g., fitting, stats, etc.). In the ideal case, an expert should be able to take the doc and write a more-or-less equivalent routine. This review also should check that internal cross-references are complete and that external references are sufficient (and long-lived).

PRESENTATION REVIEW

A presentation review ensures that our target audience - which we long ago defined at one level *below* that of a likely user of a given routine - can read and understand all but the expert parts of the document, that the doc follows the docstring format, that it is as clear as reasonably possible, that, if expert sections are needed, they are properly introduced as such, and that the examples are the right ones to have and that they work, etc.

--jh--

From josef.pktd at gmail.com Sun Feb 14 14:56:25 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 14 Feb 2010 14:56:25 -0500 Subject: [SciPy-dev] 2-review system on doc wiki In-Reply-To: References: Message-ID: <1cd32cbb1002141156s37147298md272f15d356fd934@mail.gmail.com>

On Sun, Feb 14, 2010 at 2:07 PM, Joe Harrington wrote:
> On 14 February 2010 17:12, Bruce Southey wrote:
>> I just think that the 'bar' here is set too high for a volunteer
>> project. Also I think that this 'new version' is asking too much,
>> especially when people have been working under a rather different
>> approach. Also there is no conflict resolution between all the steps
>> involved.
>
> Sorry for the length here. Hopefully this clarifies a lot of
> questions. See, in particular, example 3 if you're not convinced we
> need this.
>
> I agree with Stefan that this really isn't that complicated. David
> and I have discussed the two-review system here, in doc telecons, at
> the SciPy09 conference, and in its proceedings; this is nothing new.
> The motivation is simple: I read a number of the reviewed pages and
> found problems that should not have passed review. The plan is a
> slight modification of our one-review plan. David pointed to that
> already (thanks, David).
>
> There are no differences of approach other than the change in the
> review system.
Since only a tiny fraction (8%) of the pages has
> undergone any level of review, and only 4% have passed review, the
> change will not cause a major upset to what we are doing.
>
> As always, we resolve conflicts by discussion and use of the comment
> field on each page.
>
> We are aiming at a product of equal or greater quality to similar
> manuals for software such as IDL or Matlab. Whether this can all be
> done by volunteers is an irrelevant question. I expect that the
> number of reviewers will be much smaller than the number of writers.
> We will identify and vet technical and presentation reviewers, and if
> necessary we can seek funds to pay them. Of course, we'll try the
> volunteer way first. I hope that we can find volunteer technical
> reviewers from among the developers. Presentation reviewers will
> likely have substantial technical writing experience; we have a list
> of a few potentials already. A professional copy editor will proof
> the doc the first time we have fully-reviewed pages and hopefully for
> each major release thereafter, but that's a future problem.
>
> I give some examples and clarification on the review roles below.
>
> EXAMPLES
>
> 1. numpy.core.umath.sqrt does not define the "out" argument (a technical
> omission) and uses language such as "branch cut" and "continuous from above on
> it" that will confuse the majority of readers who have not taken a
> course in complex variables, such as high-school students and perhaps
> many of their teachers (presentation review). This could be solved
> with an external reference, which is missing, or even just a rewording
> of the sentence, like:
>
> In the terminology of complex-variable calculus (ref), sqrt has a
> branch cut [-inf, 0) and is continuous from above on it.
>
> This is what I call "introducing an expert section". It signifies to
> our target audience (one level below the likely users of a function)
> that we're about to go over their heads, where to go to come up to
> speed, and otherwise not to sweat it if they don't get it. (Actually,
> in this particular case, it's not clear to me why we need to document
> the analytic properties of taking roots. There's *lots* more one
> could say about roots, and trig functions, and.... We should leave
> that to the textbooks.)
>
> 2. Most routines are missing pointers to relevant pages of the
> numpy.doc package that discuss things like "along an axis" or "out".
> In many cases, that's because these pages didn't exist when the
> function docstrings were written.
>
> 3. From scipy, some of the ready-for-review pages in scipy.stats are
> likely technically good, but are totally impenetrable to anyone
> without several semesters' equivalent college education in statistics.
> While you may need that level of description to use all the tests to
> their fullest, a beginner should be able to do things like plot,
> evaluate, and integrate standard PDFs within a few minutes of starting
> to read the docs there. If two stats experts wrote all the pages and
> reviewed each others' writing, such improvements would never be
> suggested. Yet, a single presentation-oriented reviewer might not
> catch technical errors. That's why we need two types of reviewers.

I agree with all the reviewing proposals, but I have two qualifications for specialized parts in scipy.

Most scipy subpackages have tutorials, and if a user needs an introduction then it is necessary to read the tutorial. I went through them when I looked at the parts of scipy that I didn't know.
A basic introduction cannot be included in every docstring. I think the presentation review for accessibility with less prior knowledge should also focus more on the tutorials. For the scipy.stats.distributions, I tried to do this in the stats tutorial (although my plots seem to have disappeared again).

For some functions it will be difficult to determine what one level below the likely user actually means. I recently fixed a problem with
http://docs.scipy.org/scipy/docs/scipy.signal.signaltools.hilbert/
but I still don't know what the use of the analytical signal is or what it really means. I don't think a random user will bump into signal.hilbert, for example. And for a basic introduction Wikipedia is more informative than a docstring can be.

Josef

>
> TECHNICAL REVIEW
>
> A technical review ensures that all the features, API points,
> underlying methods that affect the results, and limitations of the
> item are noted properly in the docstring. It implies familiarity with
> (or at least a good, hard look at) the source code and the general
> topic (e.g., fitting, stats, etc.). In the ideal case, an expert
> should be able to take the doc and write a more-or-less equivalent
> routine. This review also should check that internal cross-references
> are complete and that external references are sufficient (and
> long-lived).
>
> PRESENTATION REVIEW
>
> A presentation review ensures that our target audience - which we long
> ago defined at one level *below* that of a likely user of a given
> routine - can read and understand all but the expert parts of the
> document, that the doc follows the docstring format, that it is as
> clear as reasonably possible, that, if expert sections are needed,
> they are properly introduced as such, and that the examples are the right
> ones to have and that they work, etc.
>
> --jh--
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev
>

From stefan at sun.ac.za Sun Feb 14 15:48:14 2010 From: stefan at sun.ac.za (Stéfan van der Walt) Date: Sun, 14 Feb 2010 22:48:14 +0200 Subject: [SciPy-dev] Changes to trunk/scipy/optimize In-Reply-To: References: <9457e7c81002092344i72886a26gcee3460782e4647e@mail.gmail.com> Message-ID: <9457e7c81002141248q3b213e15off65a76c617b7242@mail.gmail.com>

2010/2/14 Dmitrey :
> Well, if anyone doesn't mind, I'll add one more line, "and automatic
> differentiation", to the docstring; I guess it's quite essential.

Sure, go ahead.

Regards Stéfan

From warren.weckesser at enthought.com Sun Feb 14 19:06:18 2010 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Sun, 14 Feb 2010 18:06:18 -0600 Subject: [SciPy-dev] Ticket #1105 -- patch submitted for scipy.signal.waveforms.chirp() In-Reply-To: <4B6F6F7A.70506@enthought.com> References: <4B6F6F7A.70506@enthought.com> Message-ID: <4B788FFA.4060709@enthought.com>

Warren Weckesser wrote:
> I just added a patch to ticket #1105; a summary of the changes is given
> in the ticket. If any scipy.signal users (especially chirp() users)
> have a chance, please take a look and let me know what you think. The
> docstrings could use polishing, but I'd like to get some feedback before
> doing more work on it.
>
> Warren
>
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev
>

I created a new patch today; any feedback would be appreciated.
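If anyone wants to exercise the patch quickly, here is a minimal sketch of the kind of call I have in mind (the keyword names follow the existing scipy.signal.chirp signature; the exact keywords in the patched version may differ slightly, so treat this as illustration rather than the final API):

    import numpy as np
    from scipy.signal import chirp

    # Two seconds sampled at 1 kHz.
    t = np.linspace(0, 2, 2000)

    # Sweep from 10 Hz at t=0 to 100 Hz at t=1; 'quadratic' and
    # 'logarithmic' are among the other methods the patch exercises.
    w = chirp(t, f0=10.0, t1=1.0, f1=100.0, method='linear')

Plotting w against t (e.g. with matplotlib) shows the instantaneous frequency rising as specified.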
Examples of signals created by the patched functions are available here: http://www.scipy.org/NewChirp Cheers, Warren From d.l.goldsmith at gmail.com Mon Feb 15 00:30:58 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Sun, 14 Feb 2010 21:30:58 -0800 Subject: [SciPy-dev] 2-review system on doc wiki In-Reply-To: <1cd32cbb1002141156s37147298md272f15d356fd934@mail.gmail.com> References: <1cd32cbb1002141156s37147298md272f15d356fd934@mail.gmail.com> Message-ID: <45d1ab481002142130i3306d68pbb3771265a42c8b6@mail.gmail.com> On Sun, Feb 14, 2010 at 11:56 AM, wrote: > > For some functions it will be difficult to determine what one level > below the likely user actually means. I recently fixed a problem with > Maybe, maybe not: if in doubt, one could query the list "who uses this function?" Reply: "I do." (Hopefully someone pipes up.) Next query: "what do you do/how do you use it?" Reply: "I'm a physicist; I use it for my research on quantum gravity." Conclusion: "one level below" equals a physics graduate student (or maybe post-doc). Another (hopefully less plausible) example: "Who uses 'dot'?" A tsunami of replies: "I do." Follow-up: "What do you do?" Lowest level of replies: a high school student taking AP Physics. Conclusion: "one level below" equals a high school student taking non-AP physics. Not so hard - "loud" on the list, perhaps, but not so hard. DG > http://docs.scipy.org/scipy/docs/scipy.signal.signaltools.hilbert/ > but I still don't know what the use of the analytical signal is or > what it really means. I don't think a random user will bump into > signal.hilbert, for example. And for a basic introduction Wikipedia is > more informative than a docstring can be. > > Josef > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jh at physics.ucf.edu Mon Feb 15 13:34:44 2010 From: jh at physics.ucf.edu (Joe Harrington) Date: Mon, 15 Feb 2010 13:34:44 -0500 Subject: [SciPy-dev] 2-review system on doc wiki In-Reply-To: (scipy-dev-request@scipy.org) References: Message-ID: On Sun, Feb 14, 2010 at 11:56 AM, wrote: > Most scipy subpackages have tutorials, and if a user needs an > introduction then it is necessary to read the tutorial. Agreed that a lot can go into the tutorials; we need clearer guidelines on what should go in tutorial vs. function docstrings. Perhaps more importantly, we need to educate people about the existence of these tutorials, perhaps in the see-also to each routine, or at least in the module docstrings. > And for a basic introduction Wikipedia is > more informative than a docstring can be. Agreed that it is our job to inform about what the routines are and how to use them to do likely tasks, not to teach people the math behind them. But, it is pretty standard to include enough of that math (such as a formula) to demonstrate what the routine does. This is particularly important where there is more than one way it's defined, such as the normalization for Fourier transforms (does it go on the forward transform, the inverse, or split between them). 
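To make the normalization point concrete, here is a small sketch using the present-day numpy.fft interface; note that the norm="ortho" keyword is assumed from the current API and postdates this thread:

import numpy as np

x = np.random.randn(8)
# NumPy's default convention puts the whole 1/N factor on the inverse
# transform, so fft followed by ifft round-trips without extra scaling.
X = np.fft.fft(x)
print(np.allclose(np.fft.ifft(X), x))          # True
# With norm="ortho", a 1/sqrt(N) factor is split between the forward
# and inverse transforms, making each transform unitary.
Xo = np.fft.fft(x, norm="ortho")
print(np.allclose(Xo, X / np.sqrt(len(x))))    # True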
--jh-- From njs at pobox.com Tue Feb 16 15:18:49 2010 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 16 Feb 2010 12:18:49 -0800 Subject: [SciPy-dev] code for incremental least squares Message-ID: <961fa2b41002161218q1a2c5242o6ae7f2f39d05a179@mail.gmail.com> Hello all, I have a need for solving some very large least squares regression problems -- but fortunately my model matrix can be generated incrementally, so there are some tricks to do the important calculations without actually loading the whole thing into memory. The resulting code is rather general, and might be of wider interest, so I thought I'd send a note here and see what people thought about upstreaming it somehow... Specifically, what it implements is: -- Least squares regression for very large numbers of observations -- Using either Cholesky or QR methods -- Multivariate or univariate -- Full statistical tests (t tests, linear hypothesis F tests, univariate and multivariate) -- "Grouped" weighted least squares (i.e., if your observations fall into a small number of different classes, and are homoskedastic within each class, but not across classes, it can handle that), with simple EM estimation of the weights ("feasible generalized least squares") -- The Cholesky solver's initial "accumulation" step has a parallelization-friendly API, and can take advantage of sparseness in the model matrix Most of the code is unencumbered, but I cribbed some of the calculations for multivariate statistics (Wilks' lambda and all that) from R, so those functions are GPL'ed (but could presumably be rewritten if anyone cares). Also the sparse Cholesky method should probably take advantage of the GPL'ed scikits.sparse.cholmod, but that isn't actually implemented yet (it just uses sparse matrices for the accumulation step, and then converts to a dense matrix at the end). So, any thoughts on what would be best to do with this? I'd be happy to contribute it to SciPy or whatever. In some ways it overlaps with scikits.statsmodels, but of course the API is quite different. -- Nathaniel From stefan at sun.ac.za Tue Feb 16 15:24:08 2010 From: stefan at sun.ac.za (Stéfan van der Walt) Date: Tue, 16 Feb 2010 22:24:08 +0200 Subject: [SciPy-dev] code for incremental least squares In-Reply-To: <961fa2b41002161218q1a2c5242o6ae7f2f39d05a179@mail.gmail.com> References: <961fa2b41002161218q1a2c5242o6ae7f2f39d05a179@mail.gmail.com> Message-ID: <9457e7c81002161224r383a5e33sd04f8d1e195cc7bd@mail.gmail.com> Hi Nathan On 16 February 2010 22:18, Nathaniel Smith wrote: > I have a need for solving some very large least squares regression > problems -- but fortunately my model matrix can be generated > incrementally, so there are some tricks to do the important > calculations without actually loading the whole thing into memory. The > resulting code is rather general, and might be of wider interest, so I > thought I'd send a note here and see what people thought about > upstreaming it somehow... This sounds interesting! Could you expand on the incremental generation of the model matrix, and how it is made general? 
Thanks Stéfan From josef.pktd at gmail.com Tue Feb 16 15:37:55 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 16 Feb 2010 15:37:55 -0500 Subject: [SciPy-dev] code for incremental least squares In-Reply-To: <9457e7c81002161224r383a5e33sd04f8d1e195cc7bd@mail.gmail.com> References: <961fa2b41002161218q1a2c5242o6ae7f2f39d05a179@mail.gmail.com> <9457e7c81002161224r383a5e33sd04f8d1e195cc7bd@mail.gmail.com> Message-ID: <1cd32cbb1002161237q13a3d27ct1cccc1f2927bc25e@mail.gmail.com> 2010/2/16 Stéfan van der Walt : > Hi Nathan > > On 16 February 2010 22:18, Nathaniel Smith wrote: >> I have a need for solving some very large least squares regression >> problems -- but fortunately my model matrix can be generated >> incrementally, so there are some tricks to do the important >> calculations without actually loading the whole thing into memory. The >> resulting code is rather general, and might be of wider interest, so I >> thought I'd send a note here and see what people thought about >> upstreaming it somehow... > > This sounds interesting! Could you expand on the incremental > generation of the model matrix, and how it is made general? I have the same thought: are you increasing by observation or by explanatory variable? For either case, I would be very interested to see this kind of incremental least squares in statsmodels. If you are able to license your parts as BSD, then I will look at it for sure. We have some plans for this but not yet implemented. Josef > > Thanks > Stéfan > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From njs at pobox.com Tue Feb 16 15:49:12 2010 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 16 Feb 2010 12:49:12 -0800 Subject: [SciPy-dev] Best interface for computing the logarithm of the determinant? Message-ID: <961fa2b41002161249g66dde100s251cf8874d2a996e@mail.gmail.com> So when you have a matrix whose determinant you want, it's often wise to compute the logarithm of the determinant instead of the determinant itself, because determinants involve lots and lots of multiplications and the result might otherwise underflow/overflow. Therefore, in scikits.sparse, I'd like to provide an API for doing this (and this is well-supported by the underlying libraries). But the problem is that for a general matrix, the determinant may be zero or negative. Obviously we can deal with this, but what's the best API? I'd like to use one consistently across the different factorizations in scikits.sparse, and perhaps eventually in numpy as well. Some options: 1) Split off the sign into a separate return value ('sign' may be 1, -1, 0): sign, value = logdet(A) actual_determinant = sign * exp(value) 2) Allow complex/infinite return values, even when A is a real matrix: logdet(-eye(3)) == pi*1j logdet(zeros((3, 3))) == -Inf 3) "Scientific notation" (This is what UMFPACK's API does): return a mantissa and base-10 exponent: mantissa, exponent = logdet(A) actual_determinant = mantissa * 10 ** exponent 4) Have separate functions for computing the sign, and the log of the absolute value (This is what GSL does, though it seems pointlessly inefficient): sign = sgndet(A) value = logdet(A) actual_determinant = sign * exp(value) These are all kind of ugly looking, unfortunately, but that seems unavoidable, unless someone has a clever idea. Any preferences? 
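As a minimal sketch of what option 1 could look like for real matrices (the helper name logdet_option1 is invented for illustration; it is built on scipy.linalg.lu_factor, and NumPy later added essentially this interface as numpy.linalg.slogdet):

import numpy as np
from scipy.linalg import lu_factor

def logdet_option1(A):
    # For A = P L U, det(A) = (-1)**(number of pivot row swaps) * prod(diag(U)).
    lu, piv = lu_factor(A)
    d = np.diag(lu)
    nswaps = np.sum(piv != np.arange(len(piv)))
    sign = (-1.0) ** nswaps * np.prod(np.sign(d))
    if sign == 0:
        return 0.0, -np.inf
    return sign, np.sum(np.log(np.abs(d)))

sign, value = logdet_option1(-np.eye(3))
# sign == -1.0 and value == 0.0, so sign * exp(value) recovers det == -1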
-- Nathaniel From warren.weckesser at enthought.com Tue Feb 16 16:01:42 2010 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Tue, 16 Feb 2010 15:01:42 -0600 Subject: [SciPy-dev] Best interface for computing the logarithm of the determinant? In-Reply-To: <961fa2b41002161249g66dde100s251cf8874d2a996e@mail.gmail.com> References: <961fa2b41002161249g66dde100s251cf8874d2a996e@mail.gmail.com> Message-ID: <4B7B07B6.7040802@enthought.com> Nathaniel Smith wrote: > So when you have a matrix whose determinant you want, it's often wise > to compute the logarithm of the determinant instead of the determinant > itself, because determinants involve lots and lots of multiplications > and the result might otherwise underflow/overflow. Therefore, in > scikits.sparse, I'd like to provide an API for doing this (and this is > well-supported by the underlying libraries). > > But the problem is that for a general matrix, the determinant may be > zero or negative. For a general matrix, the determinant may be complex. Or are you only considering real matrices? Warren From jsseabold at gmail.com Tue Feb 16 16:10:12 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Tue, 16 Feb 2010 16:10:12 -0500 Subject: [SciPy-dev] maxentropy docs / parent class methods Message-ID: The docs on the methods for the maxentropy model class that are inherited from basemodel are not picked up by sphinx. It seems that most of the .rst files explicitly list the methods, but since basemodel is not intended to be public (and some subclasses overwrite the parent class methods), is there a better way than to start listing the basemodel methods? I started to make the changes, but I don't think this is the right way forward. Other thoughts? http://docs.scipy.org/scipy/docs/scipy-docs/maxentropy.rst/ Cheers, Skipper From njs at pobox.com Tue Feb 16 16:16:05 2010 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 16 Feb 2010 13:16:05 -0800 Subject: [SciPy-dev] Best interface for computing the logarithm of the determinant? In-Reply-To: <4B7B07B6.7040802@enthought.com> References: <961fa2b41002161249g66dde100s251cf8874d2a996e@mail.gmail.com> <4B7B07B6.7040802@enthought.com> Message-ID: <961fa2b41002161316o63db3d74o9403a9df0284d329@mail.gmail.com> On Tue, Feb 16, 2010 at 1:01 PM, Warren Weckesser wrote: > Nathaniel Smith wrote: >> So when you have a matrix whose determinant you want, it's often wise >> to compute the logarithm of the determinant instead of the determinant >> itself, because determinants involve lots and lots of multiplications >> and the result might otherwise underflow/overflow. Therefore, in >> scikits.sparse, I'd like to provide an API for doing this (and this is >> well-supported by the underlying libraries). >> >> But the problem is that for a general matrix, the determinant may be >> zero or negative. > > For a general matrix, the determinant may be complex. Or are you only > considering real matrices? Sorry, yes. For a complex matrix, obviously you expect a complex determinant and there is no problem. But it may be confusing for people to get a complex number out when they are computing the (real) determinant of a real matrix; for instance, np.log on real-value inputs never returns a complex number, even if this means it must return NaN. Or perhaps this would be just fine, that's why it's one of the options I mentioned :-). It is a good point though that whatever API we pick should not be too annoying for people with complex matrices, since we want to use the same API for both real and complex. 
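The np.log behavior mentioned above is easy to check; real input stays real even when the mathematical answer is complex:

import numpy as np

print(np.log(-1.0))       # nan (with a RuntimeWarning)
print(np.log(-1.0 + 0j))  # 3.141592653589793j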
-- Nathaniel From dagss at student.matnat.uio.no Tue Feb 16 16:19:58 2010 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Tue, 16 Feb 2010 22:19:58 +0100 Subject: [SciPy-dev] Best interface for computing the logarithm of the determinant? In-Reply-To: <961fa2b41002161249g66dde100s251cf8874d2a996e@mail.gmail.com> References: <961fa2b41002161249g66dde100s251cf8874d2a996e@mail.gmail.com> Message-ID: <4B7B0BFE.8060109@student.matnat.uio.no> Nathaniel Smith wrote: > So when you have a matrix whose determinant you want, it's often wise > to compute the logarithm of the determinant instead of the determinant > itself, because determinants involve lots and lots of multiplications > and the result might otherwise underflow/overflow. Therefore, in > scikits.sparse, I'd like to provide an API for doing this (and this is > well-supported by the underlying libraries). > > But the problem is that for a general matrix, the determinant may be > zero or negative. Obviously we can deal with this, but what's the best > API? I'd like to use one consistently across the different > factorizations in scikits.sparse, and perhaps eventually in numpy as > well. > > Some options: > > 1) Split off the sign into a separate return value ('sign' may be 1, -1, 0): > sign, value = logdet(A) > actual_determinant = sign * exp(value) > > 2) Allow complex/infinite return values, even when A is a real matrix: > logdet(-eye(3)) == pi*1j > logdet(zeros((3, 3))) == -Inf I'm +1 for this one. It is easy to interpret, and if one knows that the result is positive (like after a successful CHOLMOD LL^T) it is easy enough to take the real part by appending .real. Also it makes more sense for code that may be dealing with both complex and real matrices compared to 1). > > 3) "Scientific notation" (This is what UMFPACK's API does): return a > mantissa and base-10 exponent: > mantissa, exponent = logdet(A) > actual_determinant = mantissa * 10 ** exponent This is my least preferred option, because "10" has nothing to do with floats. > > 4) Have separate functions for computing the sign, and the log of the > absolute value (This is what GSL does, though it seems pointlessly > inefficient): > sign = sgndet(A) > value = logdet(A) > actual_determinant = sign * exp(value) Well, it doesn't have to be inefficient, you cache the return value internally on id(A) and possibly weak reference callbacks to remove the cached values...but pretty ugly, yeah. -- Dag Sverre From josef.pktd at gmail.com Tue Feb 16 16:29:12 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 16 Feb 2010 16:29:12 -0500 Subject: [SciPy-dev] Best interface for computing the logarithm of the determinant? In-Reply-To: <4B7B0BFE.8060109@student.matnat.uio.no> References: <961fa2b41002161249g66dde100s251cf8874d2a996e@mail.gmail.com> <4B7B0BFE.8060109@student.matnat.uio.no> Message-ID: <1cd32cbb1002161329s29f8da19l298ce62fd726125e@mail.gmail.com> On Tue, Feb 16, 2010 at 4:19 PM, Dag Sverre Seljebotn wrote: > Nathaniel Smith wrote: >> So when you have a matrix whose determinant you want, it's often wise >> to compute the logarithm of the determinant instead of the determinant >> itself, because determinants involve lots and lots of multiplications >> and the result might otherwise underflow/overflow. Therefore, in >> scikits.sparse, I'd like to provide an API for doing this (and this is >> well-supported by the underlying libraries). >> >> But the problem is that for a general matrix, the determinant may be >> zero or negative. 
Obviously we can deal with this, but what's the best >> API? I'd like to use one consistently across the different >> factorizations in scikits.sparse, and perhaps eventually in numpy as >> well. >> >> Some options: >> >> 1) Split off the sign into a separate return value ('sign' may be 1, -1, 0): >> sign, value = logdet(A) >> actual_determinant = sign * exp(value) >> >> 2) Allow complex/infinite return values, even when A is a real matrix: >> logdet(-eye(3)) == pi*1j >> logdet(zeros((3, 3))) == -Inf > > I'm +1 for this one. It is easy to interpret, and if one knows that the > result is positive (like after a successful CHOLMOD LL^T) it is easy > enough to take the real part by appending .real. > > Also it makes more sense for code that may be dealing with both complex > and real matrices compared to 1). I'm also in favor of this, but would prefer if promotion to complex only occurs if the result requires it. np.real_if_close might be safer than .real Josef > >> >> 3) "Scientific notation" (This is what UMFPACK's API does): return a >> mantissa and base-10 exponent: >> mantissa, exponent = logdet(A) >> actual_determinant = mantissa * 10 ** exponent > > This is my least preferred option, because "10" has nothing to do with > floats. > >> >> 4) Have separate functions for computing the sign, and the log of the >> absolute value (This is what GSL does, though it seems pointlessly >> inefficient): >> sign = sgndet(A) >> value = logdet(A) >> actual_determinant = sign * exp(value) > > Well, it doesn't have to be inefficient, you cache the return value > internally on id(A) and possibly weak reference callbacks to remove the > cached values...but pretty ugly, yeah. > > -- > Dag Sverre > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From josef.pktd at gmail.com Tue Feb 16 16:41:23 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 16 Feb 2010 16:41:23 -0500 Subject: [SciPy-dev] maxentropy docs / parent class methods In-Reply-To: References: Message-ID: <1cd32cbb1002161341r412704c8y65a3eed7000fce8f@mail.gmail.com> On Tue, Feb 16, 2010 at 4:10 PM, Skipper Seabold wrote: > The docs on the methods for the maxentropy model class that are > inherited from basemodel are not picked up by sphinx. It seems that > most of the .rst files explicitly list the methods, but since > basemodel is not intended to be public (and some subclasses overwrite > the parent class methods), is there a better way than to start listing > the basemodel methods? > > I started to make the changes, but I don't think this is the right way > forward. Other thoughts? > > http://docs.scipy.org/scipy/docs/scipy-docs/maxentropy.rst/ Try to reference directly the basemodel.xxx method in the autoclass for model .. autoclass:: model .. autosummary:: :toctree: generated/ ..... 
model.dual basemodel.fit model.grad model.log Otherwise, it's better to have too much than too little information in the docs Josef > > Cheers, > > Skipper > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From njs at pobox.com Tue Feb 16 16:56:39 2010 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 16 Feb 2010 13:56:39 -0800 Subject: [SciPy-dev] code for incremental least squares In-Reply-To: <1cd32cbb1002161237q13a3d27ct1cccc1f2927bc25e@mail.gmail.com> References: <961fa2b41002161218q1a2c5242o6ae7f2f39d05a179@mail.gmail.com> <9457e7c81002161224r383a5e33sd04f8d1e195cc7bd@mail.gmail.com> <1cd32cbb1002161237q13a3d27ct1cccc1f2927bc25e@mail.gmail.com> Message-ID: <961fa2b41002161356vd281f75i46092d4a6e79c559@mail.gmail.com> On Tue, Feb 16, 2010 at 12:37 PM, wrote: > 2010/2/16 Stéfan van der Walt : >> Hi Nathan >> >> On 16 February 2010 22:18, Nathaniel Smith wrote: >>> I have a need for solving some very large least squares regression >>> problems -- but fortunately my model matrix can be generated >>> incrementally, so there are some tricks to do the important >>> calculations without actually loading the whole thing into memory. The >>> resulting code is rather general, and might be of wider interest, so I >>> thought I'd send a note here and see what people thought about >>> upstreaming it somehow... >> >> This sounds interesting! Could you expand on the incremental >> generation of the model matrix, and how it is made general? > > I have the same thought: are you increasing by observation or by > explanatory variable? By observation. (This is good since hopefully you have more observations than explanatory variables!) Actually generating the model matrix in an incremental way is your problem; but as you generate each 'strip' of the model matrix, you just hand it to the code's 'update' method and then forget about it. The basic idea inside 'update' is pretty elementary... if you have a model matrix X = np.row_stack([X1, X2, X3, ...]) then what we need for least squares calculations is X'X = X1'X1 + X2'X2 + X3'X3 + ... and we can compute this sum incrementally as the X_i matrices arrive. Also, it is obviously easy to parallelize the matrix products (and potentially the generation of the strips themselves, depending on your situation), and those seem to be the bottleneck. There's some linear algebra in working out how to calculate the residual sum of squares and products without actually calculating the residuals, you need to also accumulate X'y, weight handling, etc., but no real magic. For the QR code I use a recurrence relation I stole from some slides by Simon Wood to compute R and Q'y incrementally; probably a real incremental QR (e.g., "AS274", which is what R's biglm package uses) would be better, but this one is easy to implement in terms of non-incremental QR. > For either case, I would be very interested to see this kind of > incremental least squares in statsmodels. If you are able to license > your parts as BSD, then I will look at it for sure. I am. > We have some plans for this but not yet implemented. Oh? Do tell... Anyway, attaching the code so you can see the details for yourselves. It won't quite run as is, since it uses some utility routines I haven't included, but you should be able to get the idea. -- Nathaniel -------------- next part -------------- A non-text attachment was scrubbed... 
Name: incremental_qr.py Type: text/x-python Size: 4845 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: incremental_ls.py Type: text/x-python Size: 28500 bytes Desc: not available URL: From jsseabold at gmail.com Tue Feb 16 17:05:44 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Tue, 16 Feb 2010 17:05:44 -0500 Subject: [SciPy-dev] maxentropy docs / parent class methods In-Reply-To: <1cd32cbb1002161341r412704c8y65a3eed7000fce8f@mail.gmail.com> References: <1cd32cbb1002161341r412704c8y65a3eed7000fce8f@mail.gmail.com> Message-ID: On Tue, Feb 16, 2010 at 4:41 PM, wrote: > On Tue, Feb 16, 2010 at 4:10 PM, Skipper Seabold wrote: >> The docs on the methods for the maxentropy model class that are >> inherited from basemodel are not picked up by sphinx. It seems that >> most of the .rst files explicitly list the methods, but since >> basemodel is not intended to be public (and some subclasses overwrite >> the parent class methods), is there a better way than to start listing >> the basemodel methods? >> >> I started to make the changes, but I don't think this is the right way >> forward. Other thoughts? >> >> http://docs.scipy.org/scipy/docs/scipy-docs/maxentropy.rst/ > > Try to reference directly the basemodel.xxx method in the autoclass for model > > .. autoclass:: model > > .. autosummary:: > :toctree: generated/ > > ..... > model.dual > basemodel.fit > model.grad > model.log > > Otherwise, it's better to have too much than too little information in the docs > That works, but it's still the basemodel namespace under model. I was looking to statsmodels to see how we autogenerate the inherited methods. I might play around with a local build so I don't clutter up the logs on the wiki. It's changed so that they show up for now while I clean up a little more but is probably not a final solution. Skipper From arokem at berkeley.edu Tue Feb 16 17:11:15 2010 From: arokem at berkeley.edu (Ariel Rokem) Date: Tue, 16 Feb 2010 14:11:15 -0800 Subject: [SciPy-dev] bugfix in optimize.leastsq Message-ID: <43958ee61002161411h27d3ee36y47ac263ccbe6f97b@mail.gmail.com> Hi everyone, I just submitted this ticket: http://projects.scipy.org/scipy/ticket/1115 with a bug-fix in optimize.leastsq (a typo: "warning" s.b. "warnings"). Could someone take a look and tell me if it is correct/apply it? Best - Ariel -- Ariel Rokem Helen Wills Neuroscience Institute University of California, Berkeley http://argentum.ucbso.berkeley.edu/ariel -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue Feb 16 17:56:32 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 16 Feb 2010 17:56:32 -0500 Subject: [SciPy-dev] code for incremental least squares In-Reply-To: <961fa2b41002161356vd281f75i46092d4a6e79c559@mail.gmail.com> References: <961fa2b41002161218q1a2c5242o6ae7f2f39d05a179@mail.gmail.com> <9457e7c81002161224r383a5e33sd04f8d1e195cc7bd@mail.gmail.com> <1cd32cbb1002161237q13a3d27ct1cccc1f2927bc25e@mail.gmail.com> Message-ID: <1cd32cbb1002161456p2a5e2414t80e901165b3b2422@mail.gmail.com> On Tue, Feb 16, 2010 at 4:56 PM, Nathaniel Smith wrote: > On Tue, Feb 16, 2010 at 12:37 PM, 
wrote: >> 2010/2/16 Stéfan van der Walt : >>> Hi Nathan >>> >>> On 16 February 2010 22:18, Nathaniel Smith wrote: >>>> I have a need for solving some very large least squares regression >>>> problems -- but fortunately my model matrix can be generated >>>> incrementally, so there are some tricks to do the important >>>> calculations without actually loading the whole thing into memory. The >>>> resulting code is rather general, and might be of wider interest, so I >>>> thought I'd send a note here and see what people thought about >>>> upstreaming it somehow... >>> >>> This sounds interesting! Could you expand on the incremental >>> generation of the model matrix, and how it is made general? >> >> I have the same thought: are you increasing by observation or by >> explanatory variable? > > By observation. (This is good since hopefully you have more > observations than explanatory variables!) Actually generating the > model matrix in an incremental way is your problem; but as you > generate each 'strip' of the model matrix, you just hand it to the > code's 'update' method and then forget about it. > > The basic idea inside 'update' is pretty elementary... if you have a > model matrix > X = np.row_stack([X1, X2, X3, ...]) > then what we need for least squares calculations is > X'X = X1'X1 + X2'X2 + X3'X3 + ... > and we can compute this sum incrementally as the X_i matrices arrive. > Also, it is obviously easy to parallelize the matrix products (and > potentially the generation of the strips themselves, depending on your > situation), and those seem to be the bottleneck. > > There's some linear algebra in working out how to calculate the > residual sum of squares and products without actually calculating the > residuals, you need to also accumulate X'y, weight handling, etc., but > no real magic. > > For the QR code I use a recurrence relation I stole from some slides > by Simon Wood to compute R and Q'y incrementally; probably a real > incremental QR (e.g., "AS274", which is what R's biglm package uses) > would be better, but this one is easy to implement in terms of > non-incremental QR. > >> For either case, I would be very interested to see this kind of >> incremental least squares in statsmodels. If you are able to license >> your parts as BSD, then I will look at it for sure. > > I am. > >> We have some plans for this but not yet implemented. > > Oh? Do tell... pandas has expanding ols (besides moving ols), which is similar in the idea that, for each new observation (or group of observations, as seems to be your case), the ols estimate is calculated in a recursive way. However your code looks more efficient because of the incremental QR updating, and that you update more summary/sufficient statistics, but I just had a brief look and I don't remember the details of pandas. My application is also more in time series, where validation of the estimator requires continuous updating of the estimate. In pandas' case and in my case it is more the computational efficiency than the memory requirements that makes incremental estimation attractive. For this part I was looking more into using the Kalman filter, but, although I have seen papers that use matrix decomposition to do more efficient updating, my knowledge of recursive QR, or Cholesky, or ?? is not great. Actually, having more observations than explanatory variables is not necessarily the only standard case anymore. 
In machine learning and in econometrics in a "Data-Rich Environment", the number of observations and variables might be of the same order. But none of the applications that I know of in econometrics would struggle with memory problems. There are many interesting cases, for example in forecasting, where checking the forecast performance requires variable selection and re-estimation in each period. I will have a much closer look, but my impression is that the basic structure is not so different from the model structure in statsmodels that it wouldn't fit in. Also, I think that some of your functions might also be useful for other models, and maybe also for pandas. And I will be learning more about how to work with QR directly. Thanks, Josef > > Anyway, attaching the code so you can see the details for yourselves. > It won't quite run as is, since it uses some utility routines I > haven't included, but you should be able to get the idea. > > -- Nathaniel > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > From josef.pktd at gmail.com Tue Feb 16 18:04:44 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 16 Feb 2010 18:04:44 -0500 Subject: [SciPy-dev] Best interface for computing the logarithm of the determinant? In-Reply-To: <1cd32cbb1002161329s29f8da19l298ce62fd726125e@mail.gmail.com> References: <961fa2b41002161249g66dde100s251cf8874d2a996e@mail.gmail.com> <4B7B0BFE.8060109@student.matnat.uio.no> <1cd32cbb1002161329s29f8da19l298ce62fd726125e@mail.gmail.com> Message-ID: <1cd32cbb1002161504h3c24f22q55cb7ed4ac749b35@mail.gmail.com> On Tue, Feb 16, 2010 at 4:29 PM, wrote: > On Tue, Feb 16, 2010 at 4:19 PM, Dag Sverre Seljebotn > wrote: >> Nathaniel Smith wrote: >>> So when you have a matrix whose determinant you want, it's often wise >>> to compute the logarithm of the determinant instead of the determinant >>> itself, because determinants involve lots and lots of multiplications >>> and the result might otherwise underflow/overflow. Therefore, in >>> scikits.sparse, I'd like to provide an API for doing this (and this is >>> well-supported by the underlying libraries). >>> >>> But the problem is that for a general matrix, the determinant may be >>> zero or negative. Obviously we can deal with this, but what's the best >>> API? I'd like to use one consistently across the different >>> factorizations in scikits.sparse, and perhaps eventually in numpy as >>> well. >>> >>> Some options: >>> >>> 1) Split off the sign into a separate return value ('sign' may be 1, -1, 0): >>> sign, value = logdet(A) >>> actual_determinant = sign * exp(value) >>> >>> 2) Allow complex/infinite return values, even when A is a real matrix: >>> logdet(-eye(3)) == pi*1j >>> logdet(zeros((3, 3))) == -Inf >> >> I'm +1 for this one. It is easy to interpret, and if one knows that the >> result is positive (like after a successful CHOLMOD LL^T) it is easy >> enough to take the real part by appending .real. >> >> Also it makes more sense for code that may be dealing with both complex >> and real matrices compared to 1). > > I'm also in favor of this, but would prefer if promotion to complex > only occurs if the result requires it. 
> np.real_if_close might be safer than .real > > Josef > >> >>> 3) "Scientific notation" (This is what UMFPACK's API does): return a >>> mantissa and base-10 exponent: >>> mantissa, exponent = logdet(A) >>> actual_determinant = mantissa * 10 ** exponent >> >> This is my least preferred option, because "10" has nothing to do with >> floats. >> >>> >>> 4) Have separate functions for computing the sign, and the log of the >>> absolute value (This is what GSL does, though it seems pointlessly >>> inefficient): >>> sign = sgndet(A) >>> value = logdet(A) >>> actual_determinant = sign * exp(value) >> >> Well, it doesn't have to be inefficient, you cache the return value >> internally on id(A) and possibly weak reference callbacks to remove the >> cached values...but pretty ugly, yeah. An alternative would be to provide 2 public functions where logdet() would have the same behavior as np.log(np.linalg.det()) and any convenient representation for an internal function. For the multivariate normal distribution (log-likelihood), logdet would be very useful. Is it possible to get an efficient implementation in scipy, or are the functions only available in CHOLMOD? Josef >> >> -- >> Dag Sverre >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> > From peridot.faceted at gmail.com Tue Feb 16 18:14:02 2010 From: peridot.faceted at gmail.com (Anne Archibald) Date: Tue, 16 Feb 2010 18:14:02 -0500 Subject: [SciPy-dev] Best interface for computing the logarithm of the determinant? In-Reply-To: <4B7B0BFE.8060109@student.matnat.uio.no> References: <961fa2b41002161249g66dde100s251cf8874d2a996e@mail.gmail.com> <4B7B0BFE.8060109@student.matnat.uio.no> Message-ID: On 16 February 2010 16:19, Dag Sverre Seljebotn wrote: > Nathaniel Smith wrote: >> So when you have a matrix whose determinant you want, it's often wise >> to compute the logarithm of the determinant instead of the determinant >> itself, because determinants involve lots and lots of multiplications >> and the result might otherwise underflow/overflow. Therefore, in >> scikits.sparse, I'd like to provide an API for doing this (and this is >> well-supported by the underlying libraries). >> >> But the problem is that for a general matrix, the determinant may be >> zero or negative. Obviously we can deal with this, but what's the best >> API? I'd like to use one consistently across the different >> factorizations in scikits.sparse, and perhaps eventually in numpy as >> well. >> >> Some options: >> >> 1) Split off the sign into a separate return value ('sign' may be 1, -1, 0): >> sign, value = logdet(A) >> actual_determinant = sign * exp(value) I kind of like this one, but it could definitely become a pain. For what it's worth, for complex numbers it means you get a polar representation of the complex number (complex number of magnitude 1 and real log of the magnitude). >> 2) Allow complex/infinite return values, even when A is a real matrix: >> logdet(-eye(3)) == pi*1j >> logdet(zeros((3, 3))) == -Inf > > I'm +1 for this one. It is easy to interpret, and if one knows that the > result is positive (like after a successful CHOLMOD LL^T) it is easy > enough to take the real part by appending .real. > > Also it makes more sense for code that may be dealing with both complex > and real matrices compared to 1). I don't really like this very much, since it can be really startling to suddenly have complex numbers appear out of nowhere. But at least it's consistent. 
>> 3) "Scientific notation" (This is what UMFPACK's API does): return a >> mantissa and base-10 exponent: >> mantissa, exponent = logdet(A) >> actual_determinant = mantissa * 10 ** exponent > > This is my least preferred option, because "10" has nothing to do with > floats. This lends itself well to a multiply-and-rescale-when-necessary approach, and avoids transcendental functions entirely; if the 10 offends, you could use 2 or 16, I suppose. But this kind of count-every-flop approach is not really appropriate for scipy/numpy. >> 4) Have separate functions for computing the sign, and the log of the >> absolute value (This is what GSL does, though it seems pointlessly >> inefficient): >> sign = sgndet(A) >> value = logdet(A) >> actual_determinant = sign * exp(value) > > Well, it doesn't have to be inefficient, you cache the return value > internally on id(A) and possibly weak reference callbacks to remove the > cached values...but pretty ugly, yeah. Since dets are (potentially) expensive operations, no, I don't like this either. Of course, one might want a sgndet function in addition to the usual API. Anne From josef.pktd at gmail.com Tue Feb 16 23:51:40 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 16 Feb 2010 23:51:40 -0500 Subject: [SciPy-dev] code for incremental least squares In-Reply-To: <1cd32cbb1002161456p2a5e2414t80e901165b3b2422@mail.gmail.com> References: <961fa2b41002161218q1a2c5242o6ae7f2f39d05a179@mail.gmail.com> <9457e7c81002161224r383a5e33sd04f8d1e195cc7bd@mail.gmail.com> <1cd32cbb1002161237q13a3d27ct1cccc1f2927bc25e@mail.gmail.com> <961fa2b41002161356vd281f75i46092d4a6e79c559@mail.gmail.com> <1cd32cbb1002161456p2a5e2414t80e901165b3b2422@mail.gmail.com> Message-ID: <1cd32cbb1002162051t2290ac6y2a8bcf243de5127@mail.gmail.com> On Tue, Feb 16, 2010 at 5:56 PM, wrote: > On Tue, Feb 16, 2010 at 4:56 PM, Nathaniel Smith wrote: >> On Tue, Feb 16, 2010 at 12:37 PM, wrote: >>> 2010/2/16 Stéfan van der Walt : >>>> Hi Nathan >>>> >>>> On 16 February 2010 22:18, Nathaniel Smith wrote: >>>>> I have a need for solving some very large least squares regression >>>>> problems -- but fortunately my model matrix can be generated >>>>> incrementally, so there are some tricks to do the important >>>>> calculations without actually loading the whole thing into memory. The >>>>> resulting code is rather general, and might be of wider interest, so I >>>>> thought I'd send a note here and see what people thought about >>>>> upstreaming it somehow... >>>> >>>> This sounds interesting! Could you expand on the incremental >>>> generation of the model matrix, and how it is made general? >>> >>> I have the same thought: are you increasing by observation or by >>> explanatory variable? >> >> By observation. (This is good since hopefully you have more >> observations than explanatory variables!) Actually generating the >> model matrix in an incremental way is your problem; but as you >> generate each 'strip' of the model matrix, you just hand it to the >> code's 'update' method and then forget about it. >> >> The basic idea inside 'update' is pretty elementary... if you have a >> model matrix >> X = np.row_stack([X1, X2, X3, ...]) >> then what we need for least squares calculations is >> X'X = X1'X1 + X2'X2 + X3'X3 + ... >> and we can compute this sum incrementally as the X_i matrices arrive. 
>> Also, it is obviously easy to parallelize the matrix products (and >> potentially the generation of the strips themselves, depending on your >> situation), and those seem to be the bottleneck. >> >> There's some linear algebra in working out how to calculate the >> residual sum of squares and products without actually calculating the >> residuals, you need to also accumulate X'y, weight handling, etc., but >> no real magic. >> >> For the QR code I use a recurrence relation I stole from some slides >> by Simon Wood to compute R and Q'y incrementally; probably a real >> incremental QR (e.g., "AS274", which is what R's biglm package uses) >> would be better, but this one is easy to implement in terms of >> non-incremental QR. >> >>> For either case, I would be very interested to see this kind of >>> incremental least squares in statsmodels. If you are able to license >>> your parts as BSD, then I will look at it for sure. >> >> I am. >> >>> We have some plans for this but not yet implemented. >> >> Oh? Do tell... > > pandas has expanding ols (besides moving ols), which is similar in the > idea that, for each new observation (or group of observations, as seems > to be your case), the ols estimate is calculated in a recursive > way. > > However your code looks more efficient because of the incremental QR > updating, and that you update more summary/sufficient statistics, but > I just had a brief look and I don't remember the details of pandas. > > My application is also more in time series, where validation of the > estimator requires continuous updating of the estimate. In pandas' case > and in my case it is more the computational efficiency than the memory > requirements that makes incremental estimation attractive. > For this part I was looking more into using the Kalman filter, but, > although I have seen papers that use matrix decomposition to do more > efficient updating, my knowledge of recursive QR, or Cholesky, or ?? is > not great. > > Actually, having more observations than explanatory variables is not > necessarily the only standard case anymore. In machine learning and in > econometrics in a "Data-Rich Environment", the number of observations > and variables might be of the same order. But none of the applications > that I know of in econometrics would struggle with memory problems. > There are many interesting cases, for example in forecasting, where > checking the forecast performance requires variable selection and > re-estimation in each period. > > I will have a much closer look, but my impression is that the basic > structure is not so different from the model structure in statsmodels > that it wouldn't fit in. Also, I think that some of your functions > might also be useful for other models, and maybe also for pandas. And > I will be learning more about how to work with QR directly. > > Thanks, > > Josef > >> >> Anyway, attaching the code so you can see the details for yourselves. >> It won't quite run as is, since it uses some utility routines I >> haven't included, but you should be able to get the idea. just a BTW: I found this comment in incremental_ls # R and Scipy disagree by about 3 orders of magnitude here: # pf(4704.7767809675142416, 4, 12, lower.tail=FALSE) == 4.68e-19 # 1 - f.cdf(4704.7767809675142416, 4, 12) == 1.11e-16 # which I guess is most likely Scipy having limited resolution in the # tails. 
>>> from scipy import stats >>> stats.f.sf(4704.7767809675142416, 4, 12) 4.6848221938640787e-019 some of our tails are also pretty good Josef >> >> -- Nathaniel >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> > From njs at pobox.com Wed Feb 17 00:31:16 2010 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 16 Feb 2010 21:31:16 -0800 Subject: [SciPy-dev] code for incremental least squares In-Reply-To: <1cd32cbb1002162051t2290ac6y2a8bcf243de5127@mail.gmail.com> References: <961fa2b41002161218q1a2c5242o6ae7f2f39d05a179@mail.gmail.com> <9457e7c81002161224r383a5e33sd04f8d1e195cc7bd@mail.gmail.com> <1cd32cbb1002161237q13a3d27ct1cccc1f2927bc25e@mail.gmail.com> <961fa2b41002161356vd281f75i46092d4a6e79c559@mail.gmail.com> <1cd32cbb1002161456p2a5e2414t80e901165b3b2422@mail.gmail.com> <1cd32cbb1002162051t2290ac6y2a8bcf243de5127@mail.gmail.com> Message-ID: <961fa2b41002162131s7e1815e8gb338bfbef0268346@mail.gmail.com> On Tue, Feb 16, 2010 at 8:51 PM, wrote: > I found this comment in incremental_ls > > # R and Scipy disagree by about 3 orders of magnitude here: > # pf(4704.7767809675142416, 4, 12, lower.tail=FALSE) == 4.68e-19 > # 1 - f.cdf(4704.7767809675142416, 4, 12) == 1.11e-16 > # which I guess is most likely Scipy having limited resolution in the > # tails. > >>>> from scipy import stats >>>> stats.f.sf(4704.7767809675142416, 4, 12) > 4.6848221938640787e-019 > > some of our tails are also pretty good Oh, awesome, I'd missed .sf() -- thanks for the pointer! It's aesthetically annoying to plot p-values on a graph and always have weird discontinuities where they underflow... 
I'm glad to help (especially if I can show off things that work). There are still some strange things left in the range smaller than 1e-8, but every once in a while we catch one and stats.distributions is slowly improving. Filing a ticket for these cases is useful; sometimes it's easy to improve when we find the loss in precision, sometimes it's buried in the C or Fortran code in scipy.special and we have to live with 1e-8 or whichever precision we get. (Although it won't make much difference to the statistical significance of any test.) Josef > > -- Nathaniel > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From njs at pobox.com Wed Feb 17 00:58:57 2010 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 16 Feb 2010 21:58:57 -0800 Subject: [SciPy-dev] code for incremental least squares In-Reply-To: <1cd32cbb1002161456p2a5e2414t80e901165b3b2422@mail.gmail.com> References: <961fa2b41002161218q1a2c5242o6ae7f2f39d05a179@mail.gmail.com> <9457e7c81002161224r383a5e33sd04f8d1e195cc7bd@mail.gmail.com> <1cd32cbb1002161237q13a3d27ct1cccc1f2927bc25e@mail.gmail.com> <961fa2b41002161356vd281f75i46092d4a6e79c559@mail.gmail.com> <1cd32cbb1002161456p2a5e2414t80e901165b3b2422@mail.gmail.com> Message-ID: <961fa2b41002162158u77f1e776xd732be0548ff9f05@mail.gmail.com> On Tue, Feb 16, 2010 at 2:56 PM, wrote: > On Tue, Feb 16, 2010 at 4:56 PM, Nathaniel Smith wrote: > pandas has expanding ols (besides moving ols), which is similar in the > idea that, for each new observation (or group of observations, as seems > to be your case), the ols estimate is calculated in a recursive > way. Yes, and similar code can be used -- certainly you can calculate both expanding and moving ols using what I've been calling "the cholesky approach". (CHOLMOD actually includes code for "update" and "downdate" operations, i.e., adding/removing rows to your model matrix directly on the decomposition, without saving X'X separately and re-decomposing. Of course, if your model matrix isn't sparse then CHOLMOD is irrelevant.) > However your code looks more efficient because of the incremental QR > updating, and that you update more summary/sufficient statistics, but > I just had a brief look and I don't remember the details of pandas. I actually don't use the QR approach much, because it is much slower than the Cholesky approach. In principle the QR version might be more numerically stable (so far I've been lucky and haven't had problems), and it is useful in some other cases (e.g. the thoroughly magic spline fitting routines in R package "mgcv" have a mode that lets you fit very large data sets by doing the QR decomposition in advance with something like incremental_qr, and then pass in the decomposition to the magic fitting functions), so it's worth keeping around. But I don't know if it's possible to do a QR "downdate" for the "moving ols" case to toss out observations at the trailing edge. > My application is also more in time series, where validation of the > estimator requires continuous updating of the estimate. In pandas' case > and in my case it is more the computational efficiency than the memory > requirements that makes incremental estimation attractive. > For this part I was looking more into using the Kalman filter, but, > although I have seen papers that use matrix decomposition to do more > efficient updating, my knowledge of recursive QR, or Cholesky, or ?? is > not great. 
If your goal is robust, fast, incremental OLS for general consumption on potentially misbehaved data, then I'd definitely recommend looking into true incremental QR algorithms, like that "AS274" I mentioned. It looks like the biglm folks figured out how to do incremental sandwich estimation too: see slide 32-ish: http://faculty.washington.edu/tlumley/tutorials/user-biglm.pdf > Actually, having more observations than explanatory variables is not > necessarily the only standard case anymore. In machine learning and in > econometrics in a "Data-Rich Environment", the number of observations > and variables might be of the same order. Yes, but then you need some regularization to make the problem well-defined -- simple OLS will blow up. (Of course, penalized least squares may work fine, and is just as tractable as OLS; I think the same techniques work, though I haven't checked the details.) > But none of the applications > that I know of in econometrics would struggle with memory problems. Yes, my data sets are from EEG (brain waves), so I get 250 samples per second (or more), and I'm trying to estimate whole impulse response curves (so I have predictors like "event with property x1 occurred 100 ms ago", "event with property x1 occurred 104 ms ago", ...), which leads to thousands of predictors and millions of observations = model matrices of tens of gigabytes. It's a somewhat unusual situation to be in, but I guess data sets grow fast enough that it may become more common... -- Nathaniel From stefan at sun.ac.za Wed Feb 17 02:07:23 2010 From: stefan at sun.ac.za (Stéfan van der Walt) Date: Wed, 17 Feb 2010 09:07:23 +0200 Subject: [SciPy-dev] code for incremental least squares In-Reply-To: <961fa2b41002161356vd281f75i46092d4a6e79c559@mail.gmail.com> References: <961fa2b41002161218q1a2c5242o6ae7f2f39d05a179@mail.gmail.com> <9457e7c81002161224r383a5e33sd04f8d1e195cc7bd@mail.gmail.com> <1cd32cbb1002161237q13a3d27ct1cccc1f2927bc25e@mail.gmail.com> <961fa2b41002161356vd281f75i46092d4a6e79c559@mail.gmail.com> Message-ID: <9457e7c81002162307q3dc59492y9905dcc525a49575@mail.gmail.com> Hi Nathaniel On 16 February 2010 23:56, Nathaniel Smith wrote: > The basic idea inside 'update' is pretty elementary... if you have a > model matrix > X = np.row_stack([X1, X2, X3, ...]) > then what we need for least squares calculations is > X'X = X1'X1 + X2'X2 + X3'X3 + ... > and we can compute this sum incrementally as the X_i matrices arrive. Forming the product A^T A is often a bad idea from a numerical perspective. In Chapter 3 of Åke Björck's "Numerical Methods for Least Squares Problems", he talks about "update" problems. He mentions that "...the solution should be accurate up to the limitation of data and conditioning of the problem; i.e., a stable numerical method must be used." He describes the Kalman-based update method (which bears some resemblance to yours), but says that "the main disadvantage...is its serious sensitivity to roundoff errors. The updating algorithms based on orthogonal transformations developed in the following sections are generally to be preferred." He then goes into more detail on updating the QR and Gram-Schmidt decompositions. Not sure if that helps, but it may be worth reading that chapter. 
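For readers following along, here is a minimal sketch of the X'X accumulation idea described earlier in the thread; the IncrementalOLS name and its methods are invented for illustration and are not Nathaniel's actual API. The conditioning caveat just quoted applies, since cond(X'X) = cond(X)**2, which is exactly why the orthogonal (QR) updates are preferred for ill-conditioned problems:

import numpy as np
from scipy.linalg import cho_factor, cho_solve

class IncrementalOLS:
    # Accumulate X'X and X'y one strip at a time, then solve once.
    def __init__(self, ncols):
        self.xtx = np.zeros((ncols, ncols))
        self.xty = np.zeros(ncols)

    def update(self, X_i, y_i):
        # Each strip contributes X_i'X_i and X_i'y_i and can then be
        # discarded, so the full model matrix never has to fit in memory.
        self.xtx += np.dot(X_i.T, X_i)
        self.xty += np.dot(X_i.T, y_i)

    def coef(self):
        # Solve the accumulated normal equations via Cholesky.
        return cho_solve(cho_factor(self.xtx), self.xty)

ols = IncrementalOLS(3)
for _ in range(10):  # ten strips of 1000 observations each
    X_i = np.random.randn(1000, 3)
    y_i = np.dot(X_i, [1.0, 2.0, 3.0]) + np.random.randn(1000)
    ols.update(X_i, y_i)
print(ols.coef())    # close to [1, 2, 3]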
> For the QR code I use a recurrence relation I stole from some slides > by Simon Wood to compute R and Q'y incrementally; probably a real > incremental QR (e.g., "AS274", which is what R's biglm package uses) > would be better, but this one is easy to implement in terms of > non-incremental QR. Incremental QR is something we should implement in scipy.linalg, for sure. Regards Stéfan From charlesr.harris at gmail.com Wed Feb 17 02:46:25 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 17 Feb 2010 00:46:25 -0700 Subject: [SciPy-dev] code for incremental least squares In-Reply-To: <961fa2b41002161356vd281f75i46092d4a6e79c559@mail.gmail.com> References: <961fa2b41002161218q1a2c5242o6ae7f2f39d05a179@mail.gmail.com> <9457e7c81002161224r383a5e33sd04f8d1e195cc7bd@mail.gmail.com> <1cd32cbb1002161237q13a3d27ct1cccc1f2927bc25e@mail.gmail.com> <961fa2b41002161356vd281f75i46092d4a6e79c559@mail.gmail.com> Message-ID: On Tue, Feb 16, 2010 at 2:56 PM, Nathaniel Smith wrote: > On Tue, Feb 16, 2010 at 12:37 PM, wrote: > > 2010/2/16 Stéfan van der Walt : > >> Hi Nathan > >> > >> On 16 February 2010 22:18, Nathaniel Smith wrote: > >>> I have a need for solving some very large least squares regression > >>> problems -- but fortunately my model matrix can be generated > >>> incrementally, so there are some tricks to do the important > >>> calculations without actually loading the whole thing into memory. The > >>> resulting code is rather general, and might be of wider interest, so I > >>> thought I'd send a note here and see what people thought about > >>> upstreaming it somehow... > >> > >> This sounds interesting! Could you expand on the incremental > >> generation of the model matrix, and how it is made general? > > > > I have the same thought: are you increasing by observation or by > > explanatory variable? > > By observation. (This is good since hopefully you have more > observations than explanatory variables!) Actually generating the > model matrix in an incremental way is your problem; but as you > generate each 'strip' of the model matrix, you just hand it to the > code's 'update' method and then forget about it. > > The basic idea inside 'update' is pretty elementary... if you have a > model matrix > X = np.row_stack([X1, X2, X3, ...]) > then what we need for least squares calculations is > X'X = X1'X1 + X2'X2 + X3'X3 + ... > and we can compute this sum incrementally as the X_i matrices arrive. > Also, it is obviously easy to parallelize the matrix products (and > potentially the generation of the strips themselves, depending on your > situation), and those seem to be the bottleneck. > > Did you look into Kalman filters? They probably aren't the most efficient approach but they give you incremental solutions along the way if you want them. There are also various factored versions available which avoid the final decomposition. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From dagss at student.matnat.uio.no Wed Feb 17 02:49:59 2010 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Wed, 17 Feb 2010 08:49:59 +0100 Subject: [SciPy-dev] Best interface for computing the logarithm of the determinant? 
In-Reply-To: <1cd32cbb1002161504h3c24f22q55cb7ed4ac749b35@mail.gmail.com> References: <961fa2b41002161249g66dde100s251cf8874d2a996e@mail.gmail.com> <4B7B0BFE.8060109@student.matnat.uio.no> <1cd32cbb1002161329s29f8da19l298ce62fd726125e@mail.gmail.com> <1cd32cbb1002161504h3c24f22q55cb7ed4ac749b35@mail.gmail.com> Message-ID: <4B7B9FA7.4060806@student.matnat.uio.no> josef.pktd at gmail.com wrote: > On Tue, Feb 16, 2010 at 4:29 PM, wrote: > >> On Tue, Feb 16, 2010 at 4:19 PM, Dag Sverre Seljebotn >> wrote: >> >>> Nathaniel Smith wrote: >>> >>>> So when you have a matrix whose determinant you want, it's often wise >>>> to compute the logarithm of the determinant instead of the determinant >>>> itself, because determinants involve lots and lots of multiplications >>>> and the result might otherwise underflow/overflow. Therefore, in >>>> scikits.sparse, I'd like to provide an API for doing this (and this is >>>> well-supported by the underlying libraries). >>>> >>>> But the problem is that for a general matrix, the determinant may be >>>> zero or negative. Obviously we can deal with this, but what's the best >>>> API? I'd like to use one consistently across the different >>>> factorizations in scikits.sparse, and perhaps eventually in numpy as >>>> well. >>>> >>>> Some options: >>>> >>>> 1) Split off the sign into a separate return value ('sign' may be 1, -1, 0): >>>> sign, value = logdet(A) >>>> actual_determinant = sign * exp(value) >>>> >>>> 2) Allow complex/infinite return values, even when A is a real matrix: >>>> logdet(-eye(3)) == pi*1j >>>> logdet(zeros((3, 3))) == -Inf >>>> >>> I'm +1 for this one. It is easy to interpret, and if one knows that the >>> result is positive (like after a successful CHOLMOD LL^T) it is easy >>> enough to take the real part by appending .real. >>> >>> Also it makes more sense for code that may be dealing with both complex >>> and real matrices compared to 1). >>> >> I'm also in favor of this, but would prefer if promotion to complex >> only occurs if the result requires it. >> np.real_if_close might be safer than .real >> >> Josef >> >> >>>> 3) "Scientific notation" (This is what UMFPACK's API does): return a >>>> mantissa and base-10 exponent: >>>> mantissa, exponent = logdet(A) >>>> actual_determinant = mantissa * 10 ** exponent >>>> >>> This is my least preferred option, because "10" has nothing to do with >>> floats. >>> >>> >>>> 4) Have separate functions for computing the sign, and the log of the >>>> absolute value (This is what GSL does, though it seems pointlessly >>>> inefficient): >>>> sign = sgndet(A) >>>> value = logdet(A) >>>> actual_determinant = sign * exp(value) >>>> >>> Well, it doesn't have to be inefficient, you cache the return value >>> internally on id(A) and possibly weak reference callbacks to remove the >>> cached values...but pretty ugly, yeah. >>> > > An alternative would be to provide 2 public functions where logdet() > would have the same behavior as np.log(np.linalg.det()) and any convenient > representation for an internal function. > > For the multivariate normal distribution (log-likelihood), logdet would be > very useful. Is it possible to get an efficient implementation in > scipy, or are the functions only available in CHOLMOD? > You mean like this? 2 * np.sum(np.log(np.diagonal(np.linalg.cholesky(A)))) (Using scipy.linalg.cho_solve would be very slightly faster I guess.) Of course, in practice you usually get the Cholesky factor once and then reuse it both for solving and for getting the determinant. 
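As a quick check of the identity above, on a randomly generated symmetric positive definite matrix:

import numpy as np

A = np.random.randn(6, 6)
A = np.dot(A, A.T) + 6 * np.eye(6)   # symmetric positive definite
L = np.linalg.cholesky(A)
logdet = 2 * np.sum(np.log(np.diag(L)))
# Agrees with the direct computation, without ever collapsing the
# determinant's magnitude into a single product:
print(np.allclose(logdet, np.log(np.linalg.det(A))))   # True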
So what you want is a symbolic "solution" object which allows solving, getting determinant etc. scikits.sparse has this for Cholesky factors but SciPy doesn't. I don't think SciPy should get it though -- SciPy usually has a "lower-level" approach. Therefore I've written "oomatrix.py" (polymorphic matrices) which does this in a higher-level way for my own purposes (not available online yet...will have to see when I get time to polish it...):

A, B = ...numpy arrays...
M = oomatrix.matrix(A) # Create immutable matrix object from array
C = M.solve_right(B, algorithm='cholesky') # does cholesky and caches it
M.log_determinant() # from cached decomposition
P, L = M.cholesky()

Alternatively, constructing oomatrix.matrix(A, sparse=True) will transparently switch to using CHOLMOD as backend. Dag Sverre From njs at pobox.com Wed Feb 17 03:32:45 2010 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 17 Feb 2010 00:32:45 -0800 Subject: [SciPy-dev] code for incremental least squares In-Reply-To: <9457e7c81002162307q3dc59492y9905dcc525a49575@mail.gmail.com> References: <961fa2b41002161218q1a2c5242o6ae7f2f39d05a179@mail.gmail.com> <9457e7c81002161224r383a5e33sd04f8d1e195cc7bd@mail.gmail.com> <1cd32cbb1002161237q13a3d27ct1cccc1f2927bc25e@mail.gmail.com> <961fa2b41002161356vd281f75i46092d4a6e79c559@mail.gmail.com> <9457e7c81002162307q3dc59492y9905dcc525a49575@mail.gmail.com> Message-ID: <961fa2b41002170032u72d6a22ajafd95af99da6f57d@mail.gmail.com> 2010/2/16 Stéfan van der Walt : > Forming the product A^T A is often a bad idea from a numerical > perspective. Right. > In Chapter 3 of Åke Björck's "Numerical Methods for Least > Squares Problems", he talks about "update" problems. He mentions that > "...the solution should be accurate up to the limitation of data and > conditioning of the problem; i.e., a stable numerical method must be > used." > > He describes the Kalman-based update method (which bears some > resemblance to yours), but says that "the main disadvantage...is its > serious sensitivity to roundoff errors. The updating algorithms based > on orthogonal transformations developed in the following sections are > generally to be preferred." He then goes into more detail on updating > the QR and Gram-Schmidt decompositions. > > Not sure if that helps, but it may be worth reading that chapter. Definitely -- thanks for the reference! Looks like the library has a copy available, too... (Not that I'll necessarily bother for my own use, since like I said, my problems seem to be fairly well conditioned and the sparse X'X + Cholesky approach is very fast, and also generalizes easily to mixed effect models, which I'm currently working on implementing.) >> For the QR code I use a recurrence relation I stole from some slides >> by Simon Wood to compute R and Q'y incrementally; probably a real >> incremental QR (e.g., "AS274", which is what R's biglm package uses) >> would be better, but this one is easy to implement in terms of >> non-incremental QR. > > Incremental QR is something we should implement in scipy.linalg, for sure. +1 -- Nathaniel From tom.grydeland at gmail.com Wed Feb 17 05:22:17 2010 From: tom.grydeland at gmail.com (Tom Grydeland) Date: Wed, 17 Feb 2010 11:22:17 +0100 Subject: [SciPy-dev] Latex and docstrings In-Reply-To: <4B7215FE.5030803@silveregg.co.jp> References: <4B7215FE.5030803@silveregg.co.jp> Message-ID: On Wed, Feb 10, 2010 at 3:12 AM, David Cournapeau wrote: > Hi, > > I noticed that some of the docstrings I have written for DCT have been > changed to latex format. 
While I have no issue with having latex in the > documentation, I thought the consensus was to use them sparingly in > docstrings ? I am probably the one to "blame" here. I know I have edited the DCT docstrings. What is considered "sparingly" is obviously different from one person to the next, and I have taken it to mean roughly "where pure text is insufficient", and (along the discussion re numpy.fft and friends ) okay for module-level docstrings but avoid it for functions. Since I have written a lot of latex, I might have a lower threshold than others here.

> For example, the DCT-I formula used to be (fixed width font assumed):
>
> for 0 <= k < N,
>
>                                    N-1
> y[k] = x[0] + (-1)**k x[N-1] + 2 * sum x[n]*cos(pi*k*n/(N-1))
>                                    n=0
>
> But now, it is:
>
> y_k = x_0 + (-1)^k x_{N-1} + 2\\sum_{n=1}^{N-2} x_n
>         \\cos\\left({\\pi nk\\over N-1}\\right),
>         \\qquad 0 \\le k < N.
>
> I much prefer the former (the latter is unreadable in a terminal IMO). I
> have of course no issue in putting the latex formula in the scipy docs,

Similarly, the former is unreadable or incorrect in the web interface where the latter is useful: http://docs.scipy.org/scipy/docs/scipy.fftpack.realtransforms.dct/ If you change it back, please observe that the limits on the summation are different for the two versions. > David Regards, -- Tom Grydeland From ralf.gommers at googlemail.com Wed Feb 17 06:16:21 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Wed, 17 Feb 2010 19:16:21 +0800 Subject: [SciPy-dev] maxentropy docs / parent class methods In-Reply-To: References: <1cd32cbb1002161341r412704c8y65a3eed7000fce8f@mail.gmail.com> Message-ID: On Wed, Feb 17, 2010 at 6:05 AM, Skipper Seabold wrote: > On Tue, Feb 16, 2010 at 4:41 PM, wrote: > > On Tue, Feb 16, 2010 at 4:10 PM, Skipper Seabold > wrote: > >> The docs on the methods for the maxentropy model class that are > >> inherited from basemodel are not picked up by sphinx. It seems that > >> most of the .rst files explicitly list the methods, but since > >> basemodel is not intended to be public (and some subclasses overwrite > >> the parent class methods), is there a better way than to start listing > >> the basemodel methods? > >> > >> I started to make the changes, but I don't think this is the right way > >> forward. Other thoughts? > >> > >> http://docs.scipy.org/scipy/docs/scipy-docs/maxentropy.rst/ > >
> > Try to reference directly the basemodel.xxx method in the autoclass for model
> >
> > .. autoclass:: model
> >
> > .. autosummary::
> >    :toctree: generated/
> >
> > .....
> >    model.dual
> >    basemodel.fit
> >    model.grad
> >    model.log
> >
> > Otherwise, it's better to have too much than too little information in the docs > > > > That works, but it's still the basemodel namespace under model. I was > looking to statsmodels to see how we autogenerate the inherited > methods. I might play around with a local build so I don't clutter up > the logs on the wiki. It's changed so that they show up for now while > I clean up a little more but is probably not a final solution. > > Those methods do not belong just to 'model' but also to 'bigmodel' and 'conditionalmodel'. So listing them under 'model' is a bit arbitrary. I think the correct way to do this is to list them under an .. autoclass:: basemodel, with a note that this class contains shared functionality and should not be instantiated directly. Also, I do not see in the code that basemodel is not public. 
Why did you conclude this? If a user wants to implement a new model, shouldn't he inherit from basemodel? In that case it should be public. If that's not the case, an __all__ list should be added to the module to indicate what is public and what is not. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Wed Feb 17 07:21:05 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Wed, 17 Feb 2010 20:21:05 +0800 Subject: [SciPy-dev] Latex and docstrings In-Reply-To: References: <4B7215FE.5030803@silveregg.co.jp> Message-ID: On Wed, Feb 17, 2010 at 6:22 PM, Tom Grydeland wrote: > On Wed, Feb 10, 2010 at 3:12 AM, David Cournapeau > wrote: > > Hi, > > > > I noticed that some of the docstrings I have written for DCT have been > > changed to latex format. While I have no issue with having latex in the > > documentation, I thought the consensus was to use them sparingly in > > docstrings ? > > I am probably the one to "blame" here. I know I have edited the DCT > docstrings. > > What is considered "sparingly" is obviously different from one person > to the next, and I have taken it to mean roughly "where pure text is > insufficient", and (along the discussion re numpy.fft and friends > ) okay for module-level > docstrings but avoid it for functions. > That's about right. In function docstrings latex should only sparingly be used in the Notes section (see http://projects.scipy.org/numpy/wiki/CodingStyleGuidelines). This policy has worked pretty well so far; the problem for this particular docstring is that the Notes section is huge with lots of maths.

> > > For example, the DCT-I formula used to be (fixed width font assumed):
> >
> > for 0 <= k < N,
> >
> >                                    N-1
> > y[k] = x[0] + (-1)**k x[N-1] + 2 * sum x[n]*cos(pi*k*n/(N-1))
> >                                    n=0
> >
> > But now, it is:
> >
> > y_k = x_0 + (-1)^k x_{N-1} + 2\\sum_{n=1}^{N-2} x_n
> >         \\cos\\left({\\pi nk\\over N-1}\\right),
> >         \\qquad 0 \\le k < N.
> >
> > I much prefer the former (the latter is unreadable in a terminal IMO). I
> > have of course no issue in putting the latex formula in the scipy docs,
>
> Similarly, the former is unreadable or incorrect in the web interface
> where the latter is useful:
> http://docs.scipy.org/scipy/docs/scipy.fftpack.realtransforms.dct/
>
> If you change it back, please observe that the limits on the summation
> are different for the two versions.

Fixing this docstring for both terminal and html/pdf does not seem possible, short of removing content or writing a new Sphinx plugin. For now I would be in favor of keeping the latex, because going to the html doc for terminal users is doable. Users of the html docs on the other hand are unlikely to all know their way around a terminal. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Feb 17 09:16:31 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 17 Feb 2010 09:16:31 -0500 Subject: [SciPy-dev] Best interface for computing the logarithm of the determinant? 
In-Reply-To: <4B7B9FA7.4060806@student.matnat.uio.no> References: <961fa2b41002161249g66dde100s251cf8874d2a996e@mail.gmail.com> <4B7B0BFE.8060109@student.matnat.uio.no> <1cd32cbb1002161329s29f8da19l298ce62fd726125e@mail.gmail.com> <1cd32cbb1002161504h3c24f22q55cb7ed4ac749b35@mail.gmail.com> <4B7B9FA7.4060806@student.matnat.uio.no> Message-ID: <1cd32cbb1002170616r60779c28p3ac6258f6c74b6d9@mail.gmail.com> On Wed, Feb 17, 2010 at 2:49 AM, Dag Sverre Seljebotn wrote: > josef.pktd at gmail.com wrote: >> On Tue, Feb 16, 2010 at 4:29 PM, wrote: >> >>> On Tue, Feb 16, 2010 at 4:19 PM, Dag Sverre Seljebotn >>> wrote: >>> >>>> Nathaniel Smith wrote: >>>> >>>>> So when you have a matrix whose determinant you want, it's often wise >>>>> to compute the logarithm of the determinant instead of the determinant >>>>> itself, because determinants involve lots and lots of multiplications >>>>> and the result might otherwise underflow/overflow. Therefore, in >>>>> scikits.sparse, I'd like to provide an API for doing this (and this is >>>>> well-supported by the underlying libraries). >>>>> >>>>> But the problem is that for a general matrix, the determinant may be >>>>> zero or negative. Obviously we can deal with this, but what's the best >>>>> API? I'd like to use one consistently across the different >>>>> factorizations in scikits.sparse, and perhaps eventually in numpy as >>>>> well. >>>>> >>>>> Some options: >>>>> >>>>> 1) Split off the sign into a separate return value ('sign' may be 1, -1, 0): >>>>> sign, value = logdet(A) >>>>> actual_determinant = sign * exp(value) >>>>> >>>>> 2) Allow complex/infinite return values, even when A is a real matrix: >>>>> logdet(-eye(3)) == pi*1j >>>>> logdet(zeros((3, 3))) == -Inf >>>>> >>>> I'm +1 for this one. It is easy to interpret, and if one knows that the >>>> result is positive (like after a successful CHOLMOD LL^T) it is easy >>>> enough to take the real part by appending .real. >>>> >>>> Also it makes more sense for code that may be dealing with both complex >>>> and real matrices compared to 1). >>>> >>> I'm also in favor of this, but would prefer if promotion to complex >>> only occurs if the result requires it. >>> np.real_if_close might be safer than .real >>> >>> Josef >>> >>> >>>>> 3) "Scientific notation" (This is what UMFPACK's API does): return a >>>>> mantissa and base-10 exponent: >>>>> mantissa, exponent = logdet(A) >>>>> actual_determinant = mantissa * 10 ** exponent >>>>> >>>> This is my least preferred option, because "10" has nothing to do with >>>> floats. >>>> >>>> >>>>> 4) Have separate functions for computing the sign, and the log of the >>>>> absolute value (This is what GSL does, though it seems pointlessly >>>>> inefficient): >>>>> sign = sgndet(A) >>>>> value = logdet(A) >>>>> actual_determinant = sign * exp(value) >>>>> >>>> Well, it doesn't have to be inefficient, you can cache the return value >>>> internally on id(A) and possibly weak reference callbacks to remove the >>>> cached values...but pretty ugly, yeah. >>>> >> >> An alternative would be to provide 2 public functions where logdet() >> would have the same behavior as np.log(np.det()) and any convenient >> representation for internal functions. >> >> for multivariate normal distribution (loglikelihood) logdet would be >> very useful. Is it possible to get an efficient implementation in >> scipy or are the functions only available in CHOLMOD ? >> > You mean like this? 
> > 2 * np.sum(np.log(np.diagonal(np.linalg.cholesky(A))))

Thanks, I didn't know it is as simple as this.

> > (Using scipy.linalg.cho_solve would be very slightly faster I guess.) > > Of course, in practice you usually get the Cholesky factor once and then > reuse it both for solving and for getting the determinant. So what you > want is a symbolic "solution" object which allows solving, getting > determinant etc. scikits.sparse has this for Cholesky factors but SciPy > doesn't. > > I don't think SciPy should get it though -- SciPy usually has a > "lower-level" approach. Therefore I've written "oomatrix.py" > (polymorphic matrices) which does this in a higher-level way for my own > purposes (not available online yet...will have to see when I get time to > polish it...): I like to keep some one-liner functions around just to tell or remind me how to do it. scipy.stats currently doesn't need it (maybe when a multivariate normal class is included, and I just saw that it might be possible to improve stats.kde by storing the determinant instead of calculating it repeatedly.) In statsmodels, we currently store the cholesky of the pinv of the covariance matrix, but statsmodels is still mostly least squares based where we don't need the determinant, and we are still expanding MLE. So, these linear algebra tricks will come in handy. Josef

> > A, B = ...numpy arrays...
> > M = oomatrix.matrix(A) # Create immutable matrix object from array
> > C = M.solve_right(B, algorithm='cholesky') # does cholesky and caches it
> > M.log_determinant() # from cached decomposition
> > P, L = M.cholesky()
> >
> > Alternatively, constructing oomatrix.matrix(A, sparse=True) will > transparently switch to using CHOLMOD as backend. > > Dag Sverre > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From jsseabold at gmail.com Wed Feb 17 09:46:52 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Wed, 17 Feb 2010 09:46:52 -0500 Subject: [SciPy-dev] maxentropy docs / parent class methods In-Reply-To: References: <1cd32cbb1002161341r412704c8y65a3eed7000fce8f@mail.gmail.com> Message-ID: On Wed, Feb 17, 2010 at 6:16 AM, Ralf Gommers wrote: > > > On Wed, Feb 17, 2010 at 6:05 AM, Skipper Seabold > wrote: >> >> On Tue, Feb 16, 2010 at 4:41 PM, wrote: >> > On Tue, Feb 16, 2010 at 4:10 PM, Skipper Seabold >> > wrote: >> >> The docs on the methods for the maxentropy model class that are >> >> inherited from basemodel are not picked up by sphinx. It seems that >> >> most of the .rst files explicitly list the methods, but since >> >> basemodel is not intended to be public (and some subclasses overwrite >> >> the parent class methods), is there a better way than to start listing >> >> the basemodel methods? >> >> >> >> I started to make the changes, but I don't think this is the right way >> >> forward. Other thoughts? >> >> >> >> http://docs.scipy.org/scipy/docs/scipy-docs/maxentropy.rst/ >> >
>> > Try to reference directly the basemodel.xxx method in the autoclass for model
>> >
>> > .. autoclass:: model
>> >
>> > .. autosummary::
>> >    :toctree: generated/
>> >
>> > .....
>> >    model.dual
>> >    basemodel.fit
>> >    model.grad
>> >    model.log
>> >
>> > Otherwise, it's better to have too much than too little information in the docs >> > >> >> That works, but it's still the basemodel namespace under model. I was >> looking to statsmodels to see how we autogenerate the inherited >> methods.
I might play around with a local build so I don't clutter up >> the logs on the wiki. It's changed so that they show up for now while >> I clean up a little more but is probably not a final solution. >> > Those methods do not belong just to 'model' but also to 'bigmodel' and > 'conditionalmodel'. So listing them under 'model' is a bit arbitrary. I > think the correct way to do this is to list them under an .. autoclass:: > basemodel, with a note that this class contains shared functionality and > should not be instantiated directly. > Agreed. I will try this and make sure the note that is already in the docstring shows up. > Also, I do not see in the code that basemodel is not public. Why did you > conclude this? If a user wants to implement a new model, shouldn't he > inherit from basemodel? In that case it should be public. If that's not the > case, an __all__ list should be added to the module to indicate what is > public and what is not. > I guess I am not using public/private right. I was thinking private == can't be instantiated, but what you say is correct (I will be using it for generalized maxent, etc.), so I guess it's public. Thanks, Skipper From josef.pktd at gmail.com Wed Feb 17 10:10:05 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 17 Feb 2010 10:10:05 -0500 Subject: [SciPy-dev] code for incremental least squares In-Reply-To: <961fa2b41002170032u72d6a22ajafd95af99da6f57d@mail.gmail.com> References: <961fa2b41002161218q1a2c5242o6ae7f2f39d05a179@mail.gmail.com> <9457e7c81002161224r383a5e33sd04f8d1e195cc7bd@mail.gmail.com> <1cd32cbb1002161237q13a3d27ct1cccc1f2927bc25e@mail.gmail.com> <961fa2b41002161356vd281f75i46092d4a6e79c559@mail.gmail.com> <9457e7c81002162307q3dc59492y9905dcc525a49575@mail.gmail.com> <961fa2b41002170032u72d6a22ajafd95af99da6f57d@mail.gmail.com> Message-ID: <1cd32cbb1002170710l8365a7bic20c33a0cc5f1379@mail.gmail.com> On Wed, Feb 17, 2010 at 3:32 AM, Nathaniel Smith wrote: > 2010/2/16 Stéfan van der Walt : >> Forming the product A^T A is often a bad idea from a numerical >> perspective. > > Right. > >> In Chapter 3 of Åke Björck's "Numerical Methods for Least >> Squares Problems", he talks about "update" problems. He mentions that >> "...the solution should be accurate up to the limitation of data and >> conditioning of the problem; i.e., a stable numerical method must be >> used." >> >> He describes the Kalman-based update method (which bears some >> resemblance to yours), but says that "the main disadvantage...is its >> serious sensitivity to roundoff errors. The updating algorithms based >> on orthogonal transformations developed in the following sections are >> generally to be preferred." He then goes into more detail on updating >> the QR and Gram-Schmidt decompositions. >> >> Not sure if that helps, but it may be worth reading that chapter. > > Definitely -- thanks for the reference! Looks like the library has a > copy available, too... > > (Not that I'll necessarily bother for my own use, since like I said, > my problems seem to be fairly well conditioned and the sparse X'X + > Cholesky approach is very fast, and also generalizes easily to mixed > effect models, which I'm currently working on implementing.) > >>> For the QR code I use a recurrence relation I stole from some slides >>> by Simon Wood to compute R and Q'y incrementally; probably a real >>> incremental QR (e.g., "AS274", which is what R's biglm package uses) >>> would be better, but this one is easy to implement in terms of >>> non-incremental QR. 
>> >> Incremental QR is something we should implement in scipy.linalg, for sure. > > +1 Just a follow-up question for the experts. In the description of some estimators and the optimization routines used, I have read that the updating is done on the cholesky or QR decomposition of the inverse matrix ( (x'x)^{-1} or inverse Hessian). Except for the simple least squares problem, the inverse is often more important than the original X'X. Are there algorithms available for this, and is there an advantage for either way? From what I have seen in incremental_ls, the updating works on QR of x'x and then the inverse of the QR decomposition is taken, assuming a well-behaved non-singular X'X. Nathaniel, I will need more time to work through this but will come back to it soon. "from njs.util import block_diagonal_stack" seems to be the only import that is missing Thanks, Josef > > -- Nathaniel > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From charlesr.harris at gmail.com Wed Feb 17 11:03:23 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 17 Feb 2010 09:03:23 -0700 Subject: [SciPy-dev] code for incremental least squares In-Reply-To: <1cd32cbb1002170710l8365a7bic20c33a0cc5f1379@mail.gmail.com> References: <961fa2b41002161218q1a2c5242o6ae7f2f39d05a179@mail.gmail.com> <9457e7c81002161224r383a5e33sd04f8d1e195cc7bd@mail.gmail.com> <1cd32cbb1002161237q13a3d27ct1cccc1f2927bc25e@mail.gmail.com> <961fa2b41002161356vd281f75i46092d4a6e79c559@mail.gmail.com> <9457e7c81002162307q3dc59492y9905dcc525a49575@mail.gmail.com> <961fa2b41002170032u72d6a22ajafd95af99da6f57d@mail.gmail.com> <1cd32cbb1002170710l8365a7bic20c33a0cc5f1379@mail.gmail.com> Message-ID: On Wed, Feb 17, 2010 at 8:10 AM, wrote: > On Wed, Feb 17, 2010 at 3:32 AM, Nathaniel Smith wrote: > > 2010/2/16 Stéfan van der Walt : > >> Forming the product A^T A is often a bad idea from a numerical > >> perspective. > > > > Right. > > > >> In Chapter 3 of Åke Björck's "Numerical Methods for Least > >> Squares Problems", he talks about "update" problems. He mentions that > >> "...the solution should be accurate up to the limitation of data and > >> conditioning of the problem; i.e., a stable numerical method must be > >> used." > >> > >> He describes the Kalman-based update method (which bears some > >> resemblance to yours), but says that "the main disadvantage...is its > >> serious sensitivity to roundoff errors. The updating algorithms based > >> on orthogonal transformations developed in the following sections are > >> generally to be preferred." He then goes into more detail on updating > >> the QR and Gram-Schmidt decompositions. > >> > >> Not sure if that helps, but it may be worth reading that chapter. > > > > Definitely -- thanks for the reference! Looks like the library has a > > copy available, too... > > > > (Not that I'll necessarily bother for my own use, since like I said, > > my problems seem to be fairly well conditioned and the sparse X'X + > > Cholesky approach is very fast, and also generalizes easily to mixed > > effect models, which I'm currently working on implementing.) > > > >>> For the QR code I use a recurrence relation I stole from some slides > >>> by Simon Wood to compute R and Q'y incrementally; probably a real > >>> incremental QR (e.g., "AS274", which is what R's biglm package uses) > >>> would be better, but this one is easy to implement in terms of > >>> non-incremental QR. 
> >> > >> Incremental QR is something we should implement in scipy.linalg, for > sure. > > > > +1 > > Just a follow up question for the experts. > > In the description of some estimators and the optimization routines > used, I have read that the updating is done on the cholesky or QR > decomposition of the inverse matrix ( (x'x)^{-1} or inverse Hessian). > Except for simple least squares problem the inverse is often more > important than the the original X'X. > > Are there algorithms available for this, and is there an advantage for > either way? From what I have seen in incremental_ls, the updating > works on QR of x'x and then the inverse of the QR decomposition is > taken. Assuming a well behaved non-singular X'X. > > Both versions are available for Kalman filters. Using the covariance (inverse) is the usual approach, but the other version is used and goes under the name Kalman information filter. There are various ways to minimise roundoff error depending on how the updates are done. For high precision there are implementations that work with factorizations, either the Cholesky factorization or what amounts to QR. There is a large literature on the subject going back to the 60's. Kalman filters are Bayesian and require a prior to get started, but such isn't hard to come by, zero with a large covariance suffices. However, that might make them less suitable for an application where all the data is available and dynamic estimates of the parameters aren't needed. But the numerical parts may offer some useful bits. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Wed Feb 17 11:05:03 2010 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 17 Feb 2010 10:05:03 -0600 Subject: [SciPy-dev] Latex and docstrings In-Reply-To: References: <4B7215FE.5030803@silveregg.co.jp> Message-ID: <3d375d731002170805h13928409lc95af1dcff0ca9c5@mail.gmail.com> On Wed, Feb 17, 2010 at 04:22, Tom Grydeland wrote: > On Wed, Feb 10, 2010 at 3:12 AM, David Cournapeau wrote: >> Hi, >> >> I noticed that some of the docstrings I have written for DCT have been >> changed to latex format. While I have no issue with having latex in the >> documentation, I thought the consensus was to use them sparingly in >> docstrings ? > > I am probably the one to "blame" here. ?I know I have edited the DCT docstrings. > > What is considered "sparingly" is obviously different from one person > to the next, and I have taken it to mean roughly "where pure text is > insufficient", and (along the discussion re numpy.fft and friends > ) okay for module-level > docstrings but avoid it for functions. ?Since I have written a lot of > latex, I might have a lower threshold than others here. > >> For example, the dct I formula used to be (fixed width font assumed): >> >> for 0 <= k < N, >> >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?N-1 >> y[k] = x[0] + (-1)**k x[N-1] + 2 * sum x[n]*cos(pi*k*n/(N-1)) >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?n=0 >> >> But now, it is: >> >> y_k = x_0 + (-1)^k x_{N-1} + 2\\sum_{n=1}^{N-2} x_n >> ? ? ? ? \\cos\\left({\\pi nk\\over N-1}\\right), >> ? ? ? ? \\qquad 0 \\le k < N. >> >> I much prefer the former (the latter is unreadable in a terminal IMO). 
I >> have of course no issue in putting the latex formula in the scipy docs, > > Similarly, the former is unreadable or incorrect in the web interface > where the latter is useful: > http://docs.scipy.org/scipy/docs/scipy.fftpack.realtransforms.dct/ The former can be returned to readable in the web interface simply by using the "::" literal marker at the end of the previous line. E.g.

"""
There are several definitions of the DCT-I; we use the following
(for ``norm=None``)::

                                       N-2
    y[k] = x[0] + (-1)**k x[N-1] + 2 * sum x[n]*cos(pi*k*n/(N-1))
                                       n=0
"""

Additionally, the font on the edit box should be monospaced to assist editors in editing the docstrings to look good in plain text. I'll look into changing that. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From jdh2358 at gmail.com Wed Feb 17 11:15:45 2010 From: jdh2358 at gmail.com (John Hunter) Date: Wed, 17 Feb 2010 10:15:45 -0600 Subject: [SciPy-dev] Latex and docstrings In-Reply-To: References: <4B7215FE.5030803@silveregg.co.jp> Message-ID: <88e473831002170815wb788b34qa5389d5af33355a3@mail.gmail.com> On Wed, Feb 17, 2010 at 6:21 AM, Ralf Gommers wrote: > Fixing this docstring for both terminal and html/pdf does not seem possible, > short of removing content or writing a new Sphinx plugin. For now I would be > in favor of keeping the latex, because going to the html doc for terminal > users is doable. Users of the html docs on the other hand are unlikely to > all know their way around a terminal. In mpl we have a related issue -- we render rest tables for the html and pdf, but ascii tables for the interactive docs (eg "help" in python). We achieve this by sacrificing the readability of the source docstrings, eg for Text:

def __init__(self,
     ....
     **kwargs
     ):
    """
    Create a :class:`~matplotlib.text.Text` instance at *x*, *y*
    with string *text*.

    Valid kwargs are
    %(Text)s
    """

and have a module for inserting formatted strings into the docstrings

docstring.interpd.update(Text = artist.kwdoc(Text))

This module respects an rc setting which determines whether to render in "hardcopy" mode or not. numpy could adopt a similar system in which equations are interpolated in as either ascii or latex depending on a runtime setting. As an example, compare the text rendered help for Text vs the html rendered at http://matplotlib.sourceforge.net/api/artist_api.html#matplotlib.text.Text

Help on Text in module matplotlib.text object:

class Text(matplotlib.artist.Artist)
 | Handle storing and drawing of text in window or data coordinates.
 |
 | Method resolution order:
 | Text
 | matplotlib.artist.Artist
 | __builtin__.object
 |
 | Methods defined here:
 |
 | __init__(self, x=0, y=0, text='', color=None, verticalalignment='bottom', horizontalalignment='left', multialignment=None, fontproperties=None, rotation=None, linespacing=None, rotation_mode=None, path_effects=None, **kwargs)
 |     Create a :class:`~matplotlib.text.Text` instance at *x*, *y*
 |     with string *text*. 
| | Valid kwargs are | agg_filter: unknown | alpha: float (0.0 transparent through 1.0 opaque) | animated: [True | False] | axes: an :class:`~matplotlib.axes.Axes` instance | backgroundcolor: any matplotlib color | bbox: rectangle prop dict | clip_box: a :class:`matplotlib.transforms.Bbox` instance | clip_on: [True | False] From charlesr.harris at gmail.com Wed Feb 17 11:16:50 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 17 Feb 2010 09:16:50 -0700 Subject: [SciPy-dev] Latex and docstrings In-Reply-To: References: <4B7215FE.5030803@silveregg.co.jp> Message-ID: On Wed, Feb 17, 2010 at 5:21 AM, Ralf Gommers wrote: > > > On Wed, Feb 17, 2010 at 6:22 PM, Tom Grydeland wrote: > >> On Wed, Feb 10, 2010 at 3:12 AM, David Cournapeau >> wrote: >> > Hi, >> > >> > I noticed that some of the docstrings I have written for DCT have been >> > changed to latex format. While I have no issue with having latex in the >> > documentation, I thought the consensus was to use them sparingly in >> > docstrings ? >> >> I am probably the one to "blame" here. I know I have edited the DCT >> docstrings. >> >> What is considered "sparingly" is obviously different from one person >> to the next, and I have taken it to mean roughly "where pure text is >> insufficient", and (along the discussion re numpy.fft and friends >> ) okay for module-level >> docstrings but avoid it for functions. >> > > That's about right. In function docstrings latex should only sparingly be > used in the Notes section (see > http://projects.scipy.org/numpy/wiki/CodingStyleGuidelines). This policy > has worked pretty well so far, the problem for this particular docstring is > that the Notes section is huge with lots of maths. > >> >> > For example, the dct I formula used to be (fixed width font assumed): >> > >> > for 0 <= k < N, >> > >> > N-1 >> > y[k] = x[0] + (-1)**k x[N-1] + 2 * sum x[n]*cos(pi*k*n/(N-1)) >> > n=0 >> > >> > But now, it is: >> > >> > y_k = x_0 + (-1)^k x_{N-1} + 2\\sum_{n=1}^{N-2} x_n >> > \\cos\\left({\\pi nk\\over N-1}\\right), >> > \\qquad 0 \\le k < N. >> > >> > I much prefer the former (the latter is unreadable in a terminal IMO). I >> > have of course no issue in putting the latex formula in the scipy docs, >> >> Similarly, the former is unreadable or incorrect in the web interface >> where the latter is useful: >> http://docs.scipy.org/scipy/docs/scipy.fftpack.realtransforms.dct/ >> >> If you change it back, please observe that the limits on the summation >> are different for the two versions. > > > Fixing this docstring for both terminal and html/pdf does not seem > possible, short of removing content or writing a new Sphinx plugin. For now > I would be in favor of keeping the latex, because going to the html doc for > terminal users is doable. Users of the html docs on the other hand are > unlikely to all know their way around a terminal. > > I don't think the docstrings should be tutorials, rather, they should state what the function does in simple terms. A place for more extended explanation might be desirable at some point for teaching purposes but I don't think it belongs in the docstrings. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From njs at pobox.com Wed Feb 17 12:35:48 2010 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 17 Feb 2010 09:35:48 -0800 Subject: [SciPy-dev] code for incremental least squares In-Reply-To: <1cd32cbb1002170710l8365a7bic20c33a0cc5f1379@mail.gmail.com> References: <961fa2b41002161218q1a2c5242o6ae7f2f39d05a179@mail.gmail.com> <9457e7c81002161224r383a5e33sd04f8d1e195cc7bd@mail.gmail.com> <1cd32cbb1002161237q13a3d27ct1cccc1f2927bc25e@mail.gmail.com> <961fa2b41002161356vd281f75i46092d4a6e79c559@mail.gmail.com> <9457e7c81002162307q3dc59492y9905dcc525a49575@mail.gmail.com> <961fa2b41002170032u72d6a22ajafd95af99da6f57d@mail.gmail.com> Message-ID: <961fa2b41002170935i57937a7coe4c00b5b69b6ce4c@mail.gmail.com> On Wed, Feb 17, 2010 at 7:10 AM, wrote: > In the description of some estimators and the optimization routines > used, I have read that the updating is done on the cholesky or QR > decomposition of the inverse matrix ( (x'x)^{-1} or inverse Hessian). > Except for the simple least squares problem, the inverse is often more > important than the original X'X. > > Are there algorithms available for this, and is there an advantage for > either way? From what I have seen in incremental_ls, the updating > works on QR of x'x and then the inverse of the QR decomposition is > taken, assuming a well-behaved non-singular X'X. I'm not sure what you mean here -- when using QR for OLS, you take the QR decomposition of X, not X'X (this is what gives the improved numerical stability -- if X is ill-conditioned, then X'X is ill-conditioned squared, so you want to avoid it). See: http://en.wikipedia.org/wiki/Linear_least_squares#Orthogonal_decomposition_methods You do need to explicitly calculate inv(X'X) at the end in order to get standard error estimates, and it's nice if you can re-use any decompositions you have lying around to make this faster (for the QR case, inv(X'X) = inv(R'Q'QR) = inv(R'R), which can take advantage of R's triangular structure). But I'm not sure what you're referring to when you say "some estimators", or why those routines want to incrementally compute the inverse Hessian, so I can't help you there. -- Nathaniel From gokhansever at gmail.com Wed Feb 17 13:44:23 2010 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Wed, 17 Feb 2010 12:44:23 -0600 Subject: [SciPy-dev] Two points Message-ID: <49d6b3501002171044g4c4f455n79b7c570a4b39c0a@mail.gmail.com> Hi, Currently http://svn.scipy.org/ is pointing to code.enthought.com. Wouldn't it be nice for it to show a page like svn.python.org does, with NumPy and SciPy dev instructions + a browser and the actual svn base? Also, the http://conference.scipy.org/SciPy2009/slides link is down. Could someone bring it back up? Thanks. -- Gökhan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From josef.pktd at gmail.com Wed Feb 17 14:01:37 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 17 Feb 2010 14:01:37 -0500 Subject: [SciPy-dev] code for incremental least squares In-Reply-To: <961fa2b41002170935i57937a7coe4c00b5b69b6ce4c@mail.gmail.com> References: <961fa2b41002161218q1a2c5242o6ae7f2f39d05a179@mail.gmail.com> <9457e7c81002161224r383a5e33sd04f8d1e195cc7bd@mail.gmail.com> <1cd32cbb1002161237q13a3d27ct1cccc1f2927bc25e@mail.gmail.com> <961fa2b41002161356vd281f75i46092d4a6e79c559@mail.gmail.com> <9457e7c81002162307q3dc59492y9905dcc525a49575@mail.gmail.com> <961fa2b41002170032u72d6a22ajafd95af99da6f57d@mail.gmail.com> <1cd32cbb1002170710l8365a7bic20c33a0cc5f1379@mail.gmail.com> <961fa2b41002170935i57937a7coe4c00b5b69b6ce4c@mail.gmail.com> Message-ID: <1cd32cbb1002171101v3313d740yabbba6c0eddb7add@mail.gmail.com> On Wed, Feb 17, 2010 at 12:35 PM, Nathaniel Smith wrote: > On Wed, Feb 17, 2010 at 7:10 AM, ? wrote: >> In the description of some estimators and the optimization routines >> used, I have read that the updating is done on the cholesky or QR >> decomposition of the inverse matrix ( (x'x)^{-1} or inverse Hessian). >> Except for simple least squares problem the inverse is often more >> important than the the original X'X. >> >> Are there algorithms available for this, and is there an advantage for >> either way? From what I have seen in incremental_ls, the updating >> works on QR of x'x and then the inverse of the QR decomposition is >> taken. Assuming a well behaved non-singular X'X. Maybe, I'm getting ahead of myself and mixing oranges and apples. One of the estimators that I have been looking at recently is a F-GLS with a general covariance matrix http://en.wikipedia.org/wiki/Generalized_least_squares as in mixed/random effects models or panel data. I was trying to figure out how to work more efficiently with the matrix decompositions to calculate the covariance matrix of the parameter estimates. > > I'm not sure what you mean here -- when using QR for OLS, you take the > QR decomposition of X, not X'X (this is what gives the improved > numerical stability -- if X is ill-conditioned, then X'X is > ill-conditioned squared, so you want to avoid it). I didn't realize QR is on X and not on X'X. I think, I was looking more at XtXIncrementalLS than at QRIncrementalLS. Because of the block structure of the heteroscedasticity, you update separately by groups, if my reading is correct. I assume now XtXIncrementalLS is your second implementation that doesn't use QR. I think, I understand now the basic structure of the QR method for OLS. See: > ?http://en.wikipedia.org/wiki/Linear_least_squares#Orthogonal_decomposition_methods > You do need to explicitly calculate inv(X'X) at the end in order to > get standard error estimates, and it's nice if you can re-use any > decompositions you have lying around to make this faster (for the QR > case, inv(X'X) = inv(R'Q'QR) = inv(R'R), which can take advantage of > R's triangular structure). I saw this, and it reminded me of the possibility to update the decomposition of the inverse directly. This part, I really have to look at later when I work on GLS again. > > But I'm not sure what you're referring to when you say "some > estimators", or why those routines want to incrementally compute the > inverse Hessian, so I can't help you there. most likely apples and oranges again. I was also working on time series models where the optimization problem is non-linear, but the underlying structure is linear, eg. 
arma with maximum likelihood estimation. Since I have only a vague idea what the low level linear algebra does, I better stop until I find time to figure this out. I just learn it as the need or the opportunity arises. (Similar to my fight with fft a while ago.) Thanks, Josef > > -- Nathaniel > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From jh at physics.ucf.edu Wed Feb 17 14:22:14 2010 From: jh at physics.ucf.edu (Joe Harrington) Date: Wed, 17 Feb 2010 14:22:14 -0500 Subject: [SciPy-dev] Latex and docstrings In-Reply-To: (scipy-dev-request@scipy.org) References: Message-ID: On Wed, 17 Feb 2010 09:16:50 -0700, Charles R Harris wrote: > I don't think the docstrings should be tutorials, rather, they should state > what the function does in simple terms. A place for more extended > explanation might be desirable at some point for teaching purposes but I > don't think it belongs in the docstrings. This is rather irrelevant to the discussion at hand. ASCII or LaTeX formulae belong in any docstring implementing something best described by one, particularly if there are differences in how the concept is implemented in different numerical packages. For example, in an FFT, is the normalization part of the forward transform, the inverse, or both? A formula defines exactly what is implemented in the code, and is usually the tersest way of presenting that (which I think is what you're after). If you have concerns about the general content of the docstrings, it would be helpful if you could post them on another thread and give some specific examples of what you are concerned about. --jh-- From dwf at cs.toronto.edu Wed Feb 17 15:03:53 2010 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Wed, 17 Feb 2010 15:03:53 -0500 Subject: [SciPy-dev] cephes on Mac OS X ppc64? Message-ID: <74914F3D-16B5-4F8D-9A88-574687036C83@cs.toronto.edu> I noticed that for some reason, cephes gets built as a universal binary with 3 architectures: i386, ppc, and x86_64. Is there any reason why it's not building ppc64? (I ask because I'm trying to use NumPy/SciPy/etc. in 64-bit mode on a G5). David From robert.kern at gmail.com Wed Feb 17 15:10:03 2010 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 17 Feb 2010 14:10:03 -0600 Subject: [SciPy-dev] cephes on Mac OS X ppc64? In-Reply-To: <74914F3D-16B5-4F8D-9A88-574687036C83@cs.toronto.edu> References: <74914F3D-16B5-4F8D-9A88-574687036C83@cs.toronto.edu> Message-ID: <3d375d731002171210o6154bddetbb125eb9bdde389@mail.gmail.com> On Wed, Feb 17, 2010 at 14:03, David Warde-Farley wrote: > I noticed that for some reason, cephes gets built as a universal > binary with 3 architectures: i386, ppc, and x86_64. Is there any > reason why it's not building ppc64? (I ask because I'm trying to use > NumPy/SciPy/etc. in 64-bit mode on a G5). I think we test each architecture with the compiler(s) before adding them to the list. It is possible that either your gcc or your gfortran does not support ppc64 or we are failing to recognize that it does. Or we might've dropped support for it. I think David Cournapeau was the last person to touch the architecture selection code in numpy.distutils. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From mohitiiit at gmail.com Wed Feb 17 16:14:15 2010 From: mohitiiit at gmail.com (Mohit Goyal) Date: Thu, 18 Feb 2010 02:44:15 +0530 Subject: [SciPy-dev] Reg: gsoc 2010 Message-ID: <9e78df6d1002171314r3e0a1f35s96d4b51c3e9d23dc@mail.gmail.com> I am a master's student doing research in data engineering. I know Python and have done projects on data mining and web mining. I want to contribute to scipy via gsoc 2010. I am interested in working on data-related projects in scipy, like improving data sources and integrating them into numpy/scipy, adding data sets and examples to scipy, data analysis, etc. Please guide me. Thank You. -- The Game is Not Over Because i haven't Won Yet. -Mohit Goyal -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Wed Feb 17 16:21:04 2010 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 17 Feb 2010 22:21:04 +0100 Subject: [SciPy-dev] Reg: gsoc 2010 In-Reply-To: <9e78df6d1002171314r3e0a1f35s96d4b51c3e9d23dc@mail.gmail.com> References: <9e78df6d1002171314r3e0a1f35s96d4b51c3e9d23dc@mail.gmail.com> Message-ID: <20100217212104.GA20266@phare.normalesup.org> On Thu, Feb 18, 2010 at 02:44:15AM +0530, Mohit Goyal wrote: > I am a master's student doing research in data engineering. I know Python > and have done projects on data mining and web mining. > I want to contribute to scipy via gsoc 2010. I am interested in working > on data-related projects in scipy, like improving data sources and > integrating them into numpy/scipy, adding data sets and examples to > scipy, data analysis, etc. Please guide me. Hey, Would you be interested in contributing to scikit-learn, which aims to provide reference machine learning and data mining algorithms in Python, with high-quality implementations, documentation and examples? The project has been asleep a bit, but is now picking up speed, as my research group has been able to hire a very talented engineer to contribute full-time. We are right now starting to get the ball rolling: http://scikit-learn.sourceforge.net/ http://fseoane.net/blog/2010/scikit-learn-01/ We should be able to define goals according to your interests, and provide tutoring both on the scientific aspects (which algorithms) and on the practical aspects to end up with successful code integration. Cheers, Gaël From dwf at cs.toronto.edu Wed Feb 17 16:26:05 2010 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Wed, 17 Feb 2010 16:26:05 -0500 Subject: [SciPy-dev] Reg: gsoc 2010 In-Reply-To: <9e78df6d1002171314r3e0a1f35s96d4b51c3e9d23dc@mail.gmail.com> References: <9e78df6d1002171314r3e0a1f35s96d4b51c3e9d23dc@mail.gmail.com> Message-ID: <4015AEDE-C7D0-49F0-B557-FA18331C878C@cs.toronto.edu> On 17-Feb-10, at 4:14 PM, Mohit Goyal wrote: > I want to contribute to scipy via gsoc 2010. I am interested in > working on > data-related projects in scipy, like improving data sources Can you explain what you mean? Everything in SciPy is 'data-related' in one way or another. Do you mean I/O code? > and integrating them into numpy/scipy, adding data sets and > examples to scipy, More examples and documentation are always a welcome addition, but I don't know about 'datasets'. > data analysis etc. You'll have to be more specific than 'data analysis'. Again, that encompasses nearly everything one could conceivably do with SciPy. 
David From dwf at cs.toronto.edu Wed Feb 17 16:48:02 2010 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Wed, 17 Feb 2010 16:48:02 -0500 Subject: [SciPy-dev] cephes on Mac OS X ppc64? In-Reply-To: <3d375d731002171210o6154bddetbb125eb9bdde389@mail.gmail.com> References: <74914F3D-16B5-4F8D-9A88-574687036C83@cs.toronto.edu> <3d375d731002171210o6154bddetbb125eb9bdde389@mail.gmail.com> Message-ID: <87BA7EE3-14EC-43C3-8BB4-A14E146AB49D@cs.toronto.edu> On 17-Feb-10, at 3:10 PM, Robert Kern wrote: > I think we test each architecture with the compiler(s) before adding > them to the list. It is possible that either your gcc or your gfortran > does not support ppc64 or we are failing to recognize that it does. Or > we might've dropped support for it. I think David Cournapeau was the > last person to touch the architecture selection code in > numpy.distutils. Hmm. Interesting. Doing some digging in the build log I encountered stuff like this: /usr/local/bin/gfortran -Wall -arch ppc -arch i686 -arch x86_64 -Wall - undefined dynamic_lookup -bundle build/temp.macosx-10.5-universal-2.6/ build/src.macosx-10.5-universal-2.6/specfunmodule.o build/ temp.macosx-10.5-universal-2.6/build/src.macosx-10.5-universal-2.6/ fortranobject.o -Lbuild/temp.macosx-10.5-universal-2.6 -lsc_specfun - lgfortran -o build/lib.macosx-10.5-universal-2.6/special/specfun.so Looking at the .o's and .so's generated, gcc is definitely generating ppc64-capable binaries. gfortran (it's the AT&T one to which you usually point people) *seems* perfectly capable of generating ppc64 binaries (it doesn't complain when run with -arch ppc64). I've upgraded to the latest gfortran from the AT&T R Tools page and tried mucking with FFLAGS, F90FLAGS, F77FLAGS, etc. No dice. Anyone have ideas for anything else I should try? David From pgmdevlist at gmail.com Wed Feb 17 16:51:19 2010 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 17 Feb 2010 16:51:19 -0500 Subject: [SciPy-dev] Reg: gsoc 2010 In-Reply-To: <9e78df6d1002171314r3e0a1f35s96d4b51c3e9d23dc@mail.gmail.com> References: <9e78df6d1002171314r3e0a1f35s96d4b51c3e9d23dc@mail.gmail.com> Message-ID: <10ED3C13-C633-4B64-B2FC-0AD62EDFD447@gmail.com> On Feb 17, 2010, at 4:14 PM, Mohit Goyal wrote: > I am a masters student doing research in data engineering. I know python and have done projects on data mining and web mining. > I want to contribute to scipy via gsoc 2010. I am interested in working on data related projects in scipy like improving data sources and integrate it into all the numpy/scipy , adding data sets and examples to scipy, data analysis etc. Please guide me. > > Thank You. In addition to the previous answers, I would strongly suggest you to lurk around the mailing list for a while, in order to get familiarized with the hot topics (stuff we need to get done but didn't have time to address yet). That way, you can refine your proposal and not make it sound too generic (spam-like). From robert.kern at gmail.com Wed Feb 17 17:12:09 2010 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 17 Feb 2010 16:12:09 -0600 Subject: [SciPy-dev] cephes on Mac OS X ppc64? 
In-Reply-To: <87BA7EE3-14EC-43C3-8BB4-A14E146AB49D@cs.toronto.edu> References: <74914F3D-16B5-4F8D-9A88-574687036C83@cs.toronto.edu> <3d375d731002171210o6154bddetbb125eb9bdde389@mail.gmail.com> <87BA7EE3-14EC-43C3-8BB4-A14E146AB49D@cs.toronto.edu> Message-ID: <3d375d731002171412xdf8e9d3ieb810de5bd1e602d@mail.gmail.com> On Wed, Feb 17, 2010 at 15:48, David Warde-Farley wrote: > > On 17-Feb-10, at 3:10 PM, Robert Kern wrote: > >> I think we test each architecture with the compiler(s) before adding >> them to the list. It is possible that either your gcc or your gfortran >> does not support ppc64 or we are failing to recognize that it does. Or >> we might've dropped support for it. I think David Cournapeau was the >> last person to touch the architecture selection code in >> numpy.distutils. > > Hmm. Interesting. Doing some digging in the build log I encountered > stuff like this: > > /usr/local/bin/gfortran -Wall -arch ppc -arch i686 -arch x86_64 -Wall - > undefined dynamic_lookup -bundle build/temp.macosx-10.5-universal-2.6/ > build/src.macosx-10.5-universal-2.6/specfunmodule.o build/ > temp.macosx-10.5-universal-2.6/build/src.macosx-10.5-universal-2.6/ > fortranobject.o -Lbuild/temp.macosx-10.5-universal-2.6 -lsc_specfun - > lgfortran -o build/lib.macosx-10.5-universal-2.6/special/specfun.so > > Looking at the .o's and .so's generated, gcc is definitely generating > ppc64-capable binaries. gfortran (it's the AT&T one to which you > usually point people) *seems* perfectly capable of generating ppc64 > binaries (it doesn't complain when run with -arch ppc64). > > I've upgraded to the latest gfortran from the AT&T R Tools page and > tried mucking with FFLAGS, F90FLAGS, F77FLAGS, etc. No dice. > > Anyone have ideas for anything else I should try? Check earlier in the log for where numpy.distutils is checking the Fortran configuration. It would also be worth locating the code in numpy/distutils/fcompiler/gnu.py that picks the architecture flags and putting in some print statements to see what it's doing. Using the environment variable DISTUTILS_DEBUG=1 may help. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From robert.kern at gmail.com Wed Feb 17 17:27:35 2010 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 17 Feb 2010 16:27:35 -0600 Subject: [SciPy-dev] Two points In-Reply-To: <49d6b3501002171044g4c4f455n79b7c570a4b39c0a@mail.gmail.com> References: <49d6b3501002171044g4c4f455n79b7c570a4b39c0a@mail.gmail.com> Message-ID: <3d375d731002171427x332a1b01h98cde15abd3ae83e@mail.gmail.com> On Wed, Feb 17, 2010 at 12:44, G?khan Sever wrote: > Hi, > > Currently http://svn.scipy.org/ pointing to code.enthought.com. Wouldn't be > nice it to show a page like svn.python.org does and show NumPy and SciPy dev > instructions + browser and the actual svn base? It now points to http://www.scipy.org/Developer_Zone/. Thanks for reporting this. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From millman at berkeley.edu Wed Feb 17 17:44:53 2010 From: millman at berkeley.edu (Jarrod Millman) Date: Wed, 17 Feb 2010 14:44:53 -0800 Subject: [SciPy-dev] Reg: gsoc 2010 In-Reply-To: <4015AEDE-C7D0-49F0-B557-FA18331C878C@cs.toronto.edu> References: <9e78df6d1002171314r3e0a1f35s96d4b51c3e9d23dc@mail.gmail.com> <4015AEDE-C7D0-49F0-B557-FA18331C878C@cs.toronto.edu> Message-ID: On Wed, Feb 17, 2010 at 1:26 PM, David Warde-Farley wrote: > On 17-Feb-10, at 4:14 PM, Mohit Goyal wrote: >> I want to contribute to scipy via gsoc 2010. I am ?interested in >> working on >> data related projects in scipy like improving ?data sources > > Can you explain what you mean? Everything in SciPy is 'data-related' > in one way or another. Do you mean I/O code? I think he was thinking about improving and integrating this: http://projects.scipy.org/numpy/browser/trunk/numpy/lib/_datasource.py Of course, he would need to flesh out the project a bit. Jarrod From david at silveregg.co.jp Wed Feb 17 19:34:32 2010 From: david at silveregg.co.jp (David Cournapeau) Date: Thu, 18 Feb 2010 09:34:32 +0900 Subject: [SciPy-dev] cephes on Mac OS X ppc64? In-Reply-To: <87BA7EE3-14EC-43C3-8BB4-A14E146AB49D@cs.toronto.edu> References: <74914F3D-16B5-4F8D-9A88-574687036C83@cs.toronto.edu> <3d375d731002171210o6154bddetbb125eb9bdde389@mail.gmail.com> <87BA7EE3-14EC-43C3-8BB4-A14E146AB49D@cs.toronto.edu> Message-ID: <4B7C8B18.1070408@silveregg.co.jp> David Warde-Farley wrote: > On 17-Feb-10, at 3:10 PM, Robert Kern wrote: > >> I think we test each architecture with the compiler(s) before adding >> them to the list. It is possible that either your gcc or your gfortran >> does not support ppc64 or we are failing to recognize that it does. Or >> we might've dropped support for it. I think David Cournapeau was the >> last person to touch the architecture selection code in >> numpy.distutils. > > Hmm. Interesting. Doing some digging in the build log I encountered > stuff like this: > > /usr/local/bin/gfortran -Wall -arch ppc -arch i686 -arch x86_64 -Wall - > undefined dynamic_lookup -bundle build/temp.macosx-10.5-universal-2.6/ > build/src.macosx-10.5-universal-2.6/specfunmodule.o build/ > temp.macosx-10.5-universal-2.6/build/src.macosx-10.5-universal-2.6/ > fortranobject.o -Lbuild/temp.macosx-10.5-universal-2.6 -lsc_specfun - > lgfortran -o build/lib.macosx-10.5-universal-2.6/special/specfun.so If you look at numpy/distutils/fcompiler/gnu.py, the list of tested arch does not contain ppc64. I don't know why it is missing here; can you try rebuilding scipy with ppc64 added as a target in the function _universal_flags ? David From ralf.gommers at googlemail.com Wed Feb 17 20:01:15 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 18 Feb 2010 09:01:15 +0800 Subject: [SciPy-dev] Latex and docstrings In-Reply-To: <3d375d731002170805h13928409lc95af1dcff0ca9c5@mail.gmail.com> References: <4B7215FE.5030803@silveregg.co.jp> <3d375d731002170805h13928409lc95af1dcff0ca9c5@mail.gmail.com> Message-ID: On Thu, Feb 18, 2010 at 12:05 AM, Robert Kern wrote: > > Similarly, the former is unreadable or incorrect in the web interface > > where the latter is useful: > > http://docs.scipy.org/scipy/docs/scipy.fftpack.realtransforms.dct/ > > The former can be returned to readable in the web interface simply by > using the "::" literal marker at the end of the previous line. > > Missed that one. Straightforward and looks perfectly fine in html. Thanks Robert. I changed the docstring back to ascii. 
> Additionally, the font on the edit box should be monospaced to assist > editors in editing the docstrings to look good in plain text. I'll > look into changing that. > > The font is monospaced already. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Wed Feb 17 20:08:25 2010 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 17 Feb 2010 19:08:25 -0600 Subject: [SciPy-dev] Latex and docstrings In-Reply-To: References: <4B7215FE.5030803@silveregg.co.jp> <3d375d731002170805h13928409lc95af1dcff0ca9c5@mail.gmail.com> Message-ID: <3d375d731002171708s2e545ac5g40194829a443c262@mail.gmail.com> On Wed, Feb 17, 2010 at 19:01, Ralf Gommers wrote: > > > On Thu, Feb 18, 2010 at 12:05 AM, Robert Kern wrote: >> >> > Similarly, the former is unreadable or incorrect in the web interface >> > where the latter is useful: >> > http://docs.scipy.org/scipy/docs/scipy.fftpack.realtransforms.dct/ >> >> The former can be returned to readable in the web interface simply by >> using the "::" literal marker at the end of the previous line. >> > Missed that one. Straightforward and looks perfectly fine in html. Thanks > Robert. > > I changed the docstring back to ascii. Thanks! >> Additionally, the font on the edit box should be monospaced to assist >> editors in editing the docstrings to look good in plain text. I'll >> look into changing that. >> > The font is monospaced already. Hmm, not to me. Might be my browser. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From ralf.gommers at googlemail.com Wed Feb 17 20:18:05 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 18 Feb 2010 09:18:05 +0800 Subject: [SciPy-dev] Latex and docstrings In-Reply-To: <88e473831002170815wb788b34qa5389d5af33355a3@mail.gmail.com> References: <4B7215FE.5030803@silveregg.co.jp> <88e473831002170815wb788b34qa5389d5af33355a3@mail.gmail.com> Message-ID: On Thu, Feb 18, 2010 at 12:15 AM, John Hunter wrote: > On Wed, Feb 17, 2010 at 6:21 AM, Ralf Gommers > wrote: > > > Fixing this docstring for both terminal and html/pdf does not seem > possible, > > short of removing content or writing a new Sphinx plugin. For now I would > be > > in favor of keeping the latex, because going to the html doc for terminal > > users is doable. Users of the html docs on the other hand are unlikely to > > all know their way around a terminal. > > In mpl we have a related issue -- we render rest tables for the html > and pdf, but ascii tables for the interactive docs (eg "help" in > python). > > We achieve this by sacrificing the readability of the source > docstrings, eg for Text:
>
> def __init__(self,
>      ....
>      **kwargs
>      ):
>     """
>     Create a :class:`~matplotlib.text.Text` instance at *x*, *y*
>     with string *text*.
>
>     Valid kwargs are
>     %(Text)s
>     """
>
> and have a module for inserting formatted strings into the docstrings > > docstring.interpd.update(Text = artist.kwdoc(Text)) > > This module respects an rc setting which determines whether to render > in "hardcopy" mode or not. numpy could adopt a similar system in > which equations are interpolated in as either ascii or latex depending > on a runtime setting. String substitutions are handy when you need to share content between several objects, but imho it is overkill for each equation you want to typeset.
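To make it concrete: a per-equation version of the pattern John describes would have to look something like the following. This is a hypothetical sketch only -- the table names, the interp_equations decorator and the hardcopy switch are all made up, none of it is an existing numpy or mpl API:

    EQ_ASCII = {'dct1': "y[k] = x[0] + (-1)**k x[N-1]"
                        " + 2 * sum(x[n]*cos(pi*k*n/(N-1)), n=1..N-2)"}
    EQ_LATEX = {'dct1': r".. math:: y_k = x_0 + (-1)^k x_{N-1}"
                        r" + 2\sum_{n=1}^{N-2} x_n \cos(\pi nk/(N-1))"}

    def interp_equations(func, hardcopy=False):
        # substitute the ascii or the latex form of each equation,
        # depending on a (hypothetical) runtime setting
        func.__doc__ = func.__doc__ % (EQ_LATEX if hardcopy else EQ_ASCII)
        return func

    @interp_equations
    def dct(x):
        """Return the DCT-I of x.

        Notes
        -----
        %(dct1)s
        """

And you would need an entry in both tables for every equation in every docstring, which doesn't scale.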
Something like AsciiMathML would still be the ideal solution here. It also won't work with the doc editor unless you make it aware of how the substitution system works. Which is not trivial. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From dwf at cs.toronto.edu Wed Feb 17 20:19:57 2010 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Wed, 17 Feb 2010 20:19:57 -0500 Subject: [SciPy-dev] cephes on Mac OS X ppc64? In-Reply-To: <4B7C8B18.1070408@silveregg.co.jp> References: <74914F3D-16B5-4F8D-9A88-574687036C83@cs.toronto.edu> <3d375d731002171210o6154bddetbb125eb9bdde389@mail.gmail.com> <87BA7EE3-14EC-43C3-8BB4-A14E146AB49D@cs.toronto.edu> <4B7C8B18.1070408@silveregg.co.jp> Message-ID: <79F24F8A-279B-400C-B7B0-65FF881E9E73@cs.toronto.edu> On 17-Feb-10, at 7:34 PM, David Cournapeau wrote: > If you look at numpy/distutils/fcompiler/gnu.py, the list of tested > arch > does not contain ppc64. I don't know why it is missing here; can you > try > rebuilding scipy with ppc64 added as a target in the function > _universal_flags ? Hey David, I just saw that, and am attempting a build with ppc64 added. I have this command failing now, claiming undefined symbols for a whole bunch of things that look like Python API symbols: /usr/local/bin/gfortran -Wall -arch i386 -arch ppc -arch x86_64 -arch ppc64 build/temp.macosx-10.5-universal-2.6/build/src.macosx-10.5- universal-2.6/scipy/fftpack/_fftpackmodule.o build/temp.macosx-10.5- universal-2.6/scipy/fftpack/src/zfft.o build/temp.macosx-10.5- universal-2.6/scipy/fftpack/src/drfft.o build/temp.macosx-10.5- universal-2.6/scipy/fftpack/src/zrfft.o build/temp.macosx-10.5- universal-2.6/scipy/fftpack/src/zfftnd.o build/temp.macosx-10.5- universal-2.6/build/src.macosx-10.5-universal-2.6/scipy/fftpack/src/ dct.o build/temp.macosx-10.5-universal-2.6/build/src.macosx-10.5- universal-2.6/fortranobject.o -Lbuild/temp.macosx-10.5-universal-2.6 - ldfftpack -lfftpack -lgfortran -o build/lib.macosx-10.5-universal-2.6/ scipy/fftpack/_fftpack.so Partial error log (these and others are repeated for each architecture it seems): Undefined symbols for architecture i386: "_PyModule_GetDict", referenced from: _init_fftpack in _fftpackmodule.o "_Py_FindMethod", referenced from: _fortran_getattr in fortranobject.o "_PyExc_AttributeError", referenced from: _PyExc_AttributeError$non_lazy_ptr in fortranobject.o "_PyString_FromString", referenced from: _init_fftpack in _fftpackmodule.o _init_fftpack in _fftpackmodule.o _fortran_getattr in fortranobject.o _fortran_getattr in fortranobject.o "_PyCObject_FromVoidPtr", referenced from: _fortran_getattr in fortranobject.o "_PyMem_Free", referenced from: _fortran_dealloc in fortranobject.o _fortran_dealloc in fortranobject.o "_PyObject_Type", referenced from: _array_from_pyobj in fortranobject.o "_PyType_IsSubtype", referenced from: _int_from_pyobj in _fftpackmodule.o _array_from_pyobj in fortranobject.o "_PyCObject_Type", referenced from: _PyCObject_Type$non_lazy_ptr in _fftpackmodule.o "_PyDict_DelItemString", referenced from: _fortran_setattr in fortranobject.o "_PyType_Type", referenced from: _PyType_Type$non_lazy_ptr in _fftpackmodule.o "_PyComplex_Type", referenced from: _PyComplex_Type$non_lazy_ptr in _fftpackmodule.o "_PyCObject_AsVoidPtr", referenced from: _init_fftpack in _fftpackmodule.o .... Incidentally, if I check with "file" on the mentioned .o's and .so's, they are all quad-architecture as desired. 
It seems that this is not a problem with the extra architecture but somehow some Python dependencies are not being met (for ANY architecture). Note that my Python was built 4-way universal using the distributed build script: dwf at morrislab:~/src/scipy-svn$ file `which python` /Library/Frameworks/Python.framework/Versions/2.6/bin/python: Mach-O universal binary with 4 architectures /Library/Frameworks/Python.framework/Versions/2.6/bin/python (for architecture i386): Mach-O executable i386 /Library/Frameworks/Python.framework/Versions/2.6/bin/python (for architecture ppc7400): Mach-O executable ppc /Library/Frameworks/Python.framework/Versions/2.6/bin/python (for architecture ppc64): Mach-O 64-bit executable ppc64 /Library/Frameworks/Python.framework/Versions/2.6/bin/python (for architecture x86_64): Mach-O 64-bit executable x86_64 Any idea why this is failing? Thanks, David From david at silveregg.co.jp Wed Feb 17 20:38:30 2010 From: david at silveregg.co.jp (David Cournapeau) Date: Thu, 18 Feb 2010 10:38:30 +0900 Subject: [SciPy-dev] cephes on Mac OS X ppc64? In-Reply-To: <79F24F8A-279B-400C-B7B0-65FF881E9E73@cs.toronto.edu> References: <74914F3D-16B5-4F8D-9A88-574687036C83@cs.toronto.edu> <3d375d731002171210o6154bddetbb125eb9bdde389@mail.gmail.com> <87BA7EE3-14EC-43C3-8BB4-A14E146AB49D@cs.toronto.edu> <4B7C8B18.1070408@silveregg.co.jp> <79F24F8A-279B-400C-B7B0-65FF881E9E73@cs.toronto.edu> Message-ID: <4B7C9A16.5040403@silveregg.co.jp> David Warde-Farley wrote: > On 17-Feb-10, at 7:34 PM, David Cournapeau wrote: > >> If you look at numpy/distutils/fcompiler/gnu.py, the list of tested >> arch >> does not contain ppc64. I don't know why it is missing here; can you >> try >> rebuilding scipy with ppc64 added as a target in the function >> _universal_flags ? > > Hey David, > > I just saw that, and am attempting a build with ppc64 added. > > I have this command failing now, claiming undefined symbols for a > whole bunch of things that look like Python API symbols: Unset any flags you set before (CFLAGS, LDFLAGS, etc...). The error you see is caused by -shared missing from LDFLAGS, and gfortran trying to build an executable instead of a library, cheers, David From dwf at cs.toronto.edu Wed Feb 17 22:11:17 2010 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Wed, 17 Feb 2010 22:11:17 -0500 Subject: [SciPy-dev] cephes on Mac OS X ppc64? In-Reply-To: <4B7C9A16.5040403@silveregg.co.jp> References: <74914F3D-16B5-4F8D-9A88-574687036C83@cs.toronto.edu> <3d375d731002171210o6154bddetbb125eb9bdde389@mail.gmail.com> <87BA7EE3-14EC-43C3-8BB4-A14E146AB49D@cs.toronto.edu> <4B7C8B18.1070408@silveregg.co.jp> <79F24F8A-279B-400C-B7B0-65FF881E9E73@cs.toronto.edu> <4B7C9A16.5040403@silveregg.co.jp> Message-ID: <6D6E5840-6E37-45B8-9D64-B014586324DE@cs.toronto.edu> On 17-Feb-10, at 8:38 PM, David Cournapeau wrote: > Unset any flags you set before (CFLAGS, LDFLAGS, etc...). The error > you > see is caused by -shared missing from LDFLAGS, and gfortran trying to > build an executable instead of a library, Oops. I unset FFLAGS/F90FLAGS/F77FLAGS but forgot about LDFLAGS. Sorry about that. The build works fine once "ppc64" appears in that list. There are some stupid minor test failures where the results are different but the difference bounds for assert_array_almost_equal are probably too small, but cephes is building correctly now. I put it in a ticket and attached a patch at http://projects.scipy.org/numpy/ticket/1399 just so it's kept track of.
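For readers following along, the gist of that one-line fix is roughly the following (an illustrative sketch only, not the actual numpy.distutils code; numpy/distutils/fcompiler/gnu.py probes each candidate before emitting its -arch flag):

import subprocess

def universal_arch_flags(fc='gfortran',
                         candidates=('i686', 'x86_64', 'ppc', 'ppc64')):
    # Keep only the -arch values this compiler actually accepts.
    flags = []
    devnull = open('/dev/null', 'w')
    for arch in candidates:
        try:
            rc = subprocess.call([fc, '-arch', arch, '-v'],
                                 stdout=devnull, stderr=subprocess.STDOUT)
        except OSError:       # compiler not installed at all
            return []
        if rc == 0:
            flags.extend(['-arch', arch])
    return flags

The patch on ticket 1399 amounts to making 'ppc64' one of the probed candidates.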
Also, I noticed a duplicate of my last ppc64-related bug showed up in http://projects.scipy.org/numpy/ticket/1272 but I don't have privileges necessary to mark it as a duplicate. Thanks a lot, David P.S. Is there a difference between i386 and i686 as far as the Apple gcc is concerned? I usually see it named 'i386' but in the source for numpy/distutils/fcompiler/gnu.py it's listed as i686. From robert.kern at gmail.com Wed Feb 17 23:49:37 2010 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 17 Feb 2010 22:49:37 -0600 Subject: [SciPy-dev] cephes on Mac OS X ppc64? In-Reply-To: <6D6E5840-6E37-45B8-9D64-B014586324DE@cs.toronto.edu> References: <74914F3D-16B5-4F8D-9A88-574687036C83@cs.toronto.edu> <3d375d731002171210o6154bddetbb125eb9bdde389@mail.gmail.com> <87BA7EE3-14EC-43C3-8BB4-A14E146AB49D@cs.toronto.edu> <4B7C8B18.1070408@silveregg.co.jp> <79F24F8A-279B-400C-B7B0-65FF881E9E73@cs.toronto.edu> <4B7C9A16.5040403@silveregg.co.jp> <6D6E5840-6E37-45B8-9D64-B014586324DE@cs.toronto.edu> Message-ID: <3d375d731002172049l43cd2474r8fe2019254d93f57@mail.gmail.com> On Wed, Feb 17, 2010 at 21:11, David Warde-Farley wrote: > P.S. Is there a difference between i386 and i686 as far as the Apple > gcc is concerned? I usually see it named 'i386' but in the source for > numpy/distutils/fcompiler/gnu.py it's listed as i686. As far as the -arch flag is concerned, they are aliases. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From stefan at sun.ac.za Thu Feb 18 01:57:33 2010 From: stefan at sun.ac.za (Stéfan van der Walt) Date: Thu, 18 Feb 2010 08:57:33 +0200 Subject: [SciPy-dev] Latex and docstrings In-Reply-To: <3d375d731002171708s2e545ac5g40194829a443c262@mail.gmail.com> References: <4B7215FE.5030803@silveregg.co.jp> <3d375d731002170805h13928409lc95af1dcff0ca9c5@mail.gmail.com> <3d375d731002171708s2e545ac5g40194829a443c262@mail.gmail.com> Message-ID: <9457e7c81002172257y26111097yff090445b38ffc0b@mail.gmail.com> On 18 February 2010 03:08, Robert Kern wrote: >> The font is monospaced already. > > Hmm, not to me. Might be my browser. Firefox uses monospace, but Safari and Chrome don't. With Chrome you can right click on the edit box and change its CSS properties to have a monospace font, but I'm sure there is an easier way. Cheers Stéfan From tom.grydeland at gmail.com Thu Feb 18 03:58:12 2010 From: tom.grydeland at gmail.com (Tom Grydeland) Date: Thu, 18 Feb 2010 09:58:12 +0100 Subject: [SciPy-dev] Latex and docstrings In-Reply-To: <4B7215FE.5030803@silveregg.co.jp> References: <4B7215FE.5030803@silveregg.co.jp> Message-ID: On Wed, Feb 10, 2010 at 3:12 AM, David Cournapeau wrote: > Hi, > > I noticed that some of the docstrings I have written for DCT have been > changed to latex format. While I have no issue with having latex in the > documentation, I thought the consensus was to use them sparingly in > docstrings ? I knew I had seen this discussion before: http://mail.scipy.org/pipermail/scipy-dev/2009-February/010975.html Now, I'm not suggesting that my efforts in this are of any significance whatsoever, but it is annoying to have contributed something _upon_request_, and then to have the exact same thing reverted, even to applause. If you don't want it, don't ask for it.
-- Tom Grydeland From josef.pktd at gmail.com Thu Feb 18 10:22:38 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 18 Feb 2010 10:22:38 -0500 Subject: [SciPy-dev] Latex and docstrings In-Reply-To: References: <4B7215FE.5030803@silveregg.co.jp> Message-ID: <1cd32cbb1002180722j7dfae867hc15975b32b050da7@mail.gmail.com> On Thu, Feb 18, 2010 at 3:58 AM, Tom Grydeland wrote: > On Wed, Feb 10, 2010 at 3:12 AM, David Cournapeau wrote: >> Hi, >> >> I noticed that some of the docstrings I have written for DCT have been >> changed to latex format. While I have no issue with having latex in the >> documentation, I thought the consensus was to use them sparingly in >> docstrings ? > > I knew I had seen this discussion before: > > http://mail.scipy.org/pipermail/scipy-dev/2009-February/010975.html > > Now, I'm not suggesting that my efforts in this are of any > significance whatsoever, but it is annoying to have contributed > something _upon_request_, and then having the exact same thing > reverted, even to applause. > > If you don't want it, don't ask for it. Before all the nice Latex disappears, can we rescue it and start a fft tutorial page in scipy with the collection of all the formulas that are currently in the docstrings? I'd rather read rendered Latex than ASCII math expressions. Josef > > -- > Tom Grydeland > ? > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From jsseabold at gmail.com Thu Feb 18 13:18:17 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Thu, 18 Feb 2010 13:18:17 -0500 Subject: [SciPy-dev] maxentropy docs / parent class methods In-Reply-To: References: <1cd32cbb1002161341r412704c8y65a3eed7000fce8f@mail.gmail.com> Message-ID: On Wed, Feb 17, 2010 at 6:16 AM, Ralf Gommers wrote: > Those methods do not belong just to 'model' but also to 'bigmodel' and > 'conditionalmodel'. So listing them under 'model' is a bit arbitrary. I > think the correct way to do this is to list them under an .. autoclass:: > basemodel, with a note that this class contains shared functionality and > should not be instantiated directly. I used autosummary to generate the methods of basemodel. The one thing I don't understand is why the autoclass directive doesn't work unless I give the whole path to the class. I tried adding .. currentmodule:: scipy.maxentropy (or just .. module::) .. autoclass:: basemodel etc. But it still didn't seem to respect the current module and show the class docstrings. Not a big deal I guess. Skipper From josef.pktd at gmail.com Thu Feb 18 13:29:31 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 18 Feb 2010 13:29:31 -0500 Subject: [SciPy-dev] maxentropy docs / parent class methods In-Reply-To: References: <1cd32cbb1002161341r412704c8y65a3eed7000fce8f@mail.gmail.com> Message-ID: <1cd32cbb1002181029o47c9a50bk678a81e001abec85@mail.gmail.com> On Thu, Feb 18, 2010 at 1:18 PM, Skipper Seabold wrote: > On Wed, Feb 17, 2010 at 6:16 AM, Ralf Gommers > wrote: > >> Those methods do not belong just to 'model' but also to 'bigmodel' and >> 'conditionalmodel'. So listing them under 'model' is a bit arbitrary. I >> think the correct way to do this is to list them under an .. autoclass:: >> basemodel, with a note that this class contains shared functionality and >> should not be instantiated directly. > > I used autosummary to generate the methods of basemodel. 
> > The one thing I don't understand is why the autoclass directive > doesn't work unless I give the whole path to the class. ?I tried > adding > > .. currentmodule:: scipy.maxentropy (or just .. module::) > > .. autoclass:: basemodel does it require the full module path ? >>> scipy.maxentropy.basemodel Josef > > etc. > > But it still didn't seem to respect the current module and show the > class docstrings. ?Not a big deal I guess. > > Skipper > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From jsseabold at gmail.com Thu Feb 18 13:35:02 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Thu, 18 Feb 2010 13:35:02 -0500 Subject: [SciPy-dev] maxentropy docs / parent class methods In-Reply-To: <1cd32cbb1002181029o47c9a50bk678a81e001abec85@mail.gmail.com> References: <1cd32cbb1002161341r412704c8y65a3eed7000fce8f@mail.gmail.com> <1cd32cbb1002181029o47c9a50bk678a81e001abec85@mail.gmail.com> Message-ID: On Thu, Feb 18, 2010 at 1:29 PM, wrote: > On Thu, Feb 18, 2010 at 1:18 PM, Skipper Seabold wrote: >> On Wed, Feb 17, 2010 at 6:16 AM, Ralf Gommers >> wrote: >> >>> Those methods do not belong just to 'model' but also to 'bigmodel' and >>> 'conditionalmodel'. So listing them under 'model' is a bit arbitrary. I >>> think the correct way to do this is to list them under an .. autoclass:: >>> basemodel, with a note that this class contains shared functionality and >>> should not be instantiated directly. >> >> I used autosummary to generate the methods of basemodel. >> >> The one thing I don't understand is why the autoclass directive >> doesn't work unless I give the whole path to the class. ?I tried >> adding >> >> .. currentmodule:: scipy.maxentropy (or just .. module::) >> >> .. autoclass:: basemodel > > does it require the full module path ? > >>>> scipy.maxentropy.basemodel > > Well, that would make more sense, but still doesn't work. .. currentmodule:: scipy.maxentropy.maxentropy .. autoclass:: basemodel Skipper From josef.pktd at gmail.com Thu Feb 18 14:05:48 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 18 Feb 2010 14:05:48 -0500 Subject: [SciPy-dev] maxentropy docs / parent class methods In-Reply-To: References: <1cd32cbb1002161341r412704c8y65a3eed7000fce8f@mail.gmail.com> <1cd32cbb1002181029o47c9a50bk678a81e001abec85@mail.gmail.com> Message-ID: <1cd32cbb1002181105s25239dafg1001409a1f3e28da@mail.gmail.com> On Thu, Feb 18, 2010 at 1:35 PM, Skipper Seabold wrote: > On Thu, Feb 18, 2010 at 1:29 PM, ? wrote: >> On Thu, Feb 18, 2010 at 1:18 PM, Skipper Seabold wrote: >>> On Wed, Feb 17, 2010 at 6:16 AM, Ralf Gommers >>> wrote: >>> >>>> Those methods do not belong just to 'model' but also to 'bigmodel' and >>>> 'conditionalmodel'. So listing them under 'model' is a bit arbitrary. I >>>> think the correct way to do this is to list them under an .. autoclass:: >>>> basemodel, with a note that this class contains shared functionality and >>>> should not be instantiated directly. >>> >>> I used autosummary to generate the methods of basemodel. >>> >>> The one thing I don't understand is why the autoclass directive >>> doesn't work unless I give the whole path to the class. ?I tried >>> adding >>> >>> .. currentmodule:: scipy.maxentropy (or just .. module::) >>> >>> .. autoclass:: basemodel >> >> does it require the full module path ? >> >>>>> scipy.maxentropy.basemodel >> >> > > Well, that would make more sense, but still doesn't work. > > .. 
currentmodule:: scipy.maxentropy.maxentropy > > .. autoclass:: basemodel no idea what's special about autoclass, autosummary works to get the list of classes (as used in stats.rst), e.g. .. module:: scipy.maxentropy .. autosummary:: :toctree: generated/ model basemodel bigmodel conditionalmodel .. autoclass:: scipy.maxentropy.basemodel Josef > Skipper > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From farmckon at gmail.com Thu Feb 18 14:32:54 2010 From: farmckon at gmail.com (Far McKon) Date: Thu, 18 Feb 2010 14:32:54 -0500 Subject: [SciPy-dev] Build error for scipy Message-ID: Hey, I'm working on a MacBookPro, and trying to get scipy and numpy happy. I am working from instructions at: http://www.scipy.org/Installing_SciPy/Mac_OS_X I just 'setup.py' built and installed numpy, no problems. When I try to build scipy via 'sudo setup.py build' I get the error stream below. I checked, and indeed, the file 'npymath.ini' does not exist on my machine. Help? What went wrong? Do I have to do some weird config change and rebuild numpy? thanks in advance for any help, - Far McKon == Error Stream ===== Warning: No configuration returned, assuming unavailable. blas_opt_info: FOUND: extra_link_args = ['-Wl,-framework', '-Wl,Accelerate'] define_macros = [('NO_ATLAS_INFO', 3)] extra_compile_args = ['-msse3', '-I/System/Library/Frameworks/vecLib.framework/Headers'] lapack_opt_info: FOUND: extra_link_args = ['-Wl,-framework', '-Wl,Accelerate'] define_macros = [('NO_ATLAS_INFO', 3)] extra_compile_args = ['-msse3'] umfpack_info: libraries umfpack not found in /System/Library/Frameworks/Python.framework/Versions/2.5/lib libraries umfpack not found in /usr/local/lib libraries umfpack not found in /usr/lib libraries umfpack not found in /opt/local/lib libraries umfpack not found in /sw/lib /Library/Python/2.5/site-packages/numpy-1.4.0-py2.5-macosx-10.5-i386.egg/numpy/distutils/system_info.py:452: UserWarning: UMFPACK sparse solver (http://www.cise.ufl.edu/research/sparse/umfpack/) not found. Directories to search for the libraries can be specified in the numpy/distutils/site.cfg file (section [umfpack]) or by setting the UMFPACK environment variable.
warnings.warn(self.notfounderror.__doc__) NOT AVAILABLE Traceback (most recent call last): File "setup.py", line 160, in setup_package() File "setup.py", line 152, in setup_package configuration=configuration ) File "/Library/Python/2.5/site-packages/numpy-1.4.0-py2.5-macosx-10.5-i386.egg/numpy/distutils/core.py", line 152, in setup config = configuration() File "setup.py", line 118, in configuration config.add_subpackage('scipy') File "/Library/Python/2.5/site-packages/numpy-1.4.0-py2.5-macosx-10.5-i386.egg/numpy/distutils/misc_util.py", line 957, in add_subpackage caller_level = 2) File "/Library/Python/2.5/site-packages/numpy-1.4.0-py2.5-macosx-10.5-i386.egg/numpy/distutils/misc_util.py", line 926, in get_subpackage caller_level = caller_level + 1) File "/Library/Python/2.5/site-packages/numpy-1.4.0-py2.5-macosx-10.5-i386.egg/numpy/distutils/misc_util.py", line 863, in _get_configuration_from_setup_py config = setup_module.configuration(*args) File "scipy/setup.py", line 20, in configuration config.add_subpackage('special') File "/Library/Python/2.5/site-packages/numpy-1.4.0-py2.5-macosx-10.5-i386.egg/numpy/distutils/misc_util.py", line 957, in add_subpackage caller_level = 2) File "/Library/Python/2.5/site-packages/numpy-1.4.0-py2.5-macosx-10.5-i386.egg/numpy/distutils/misc_util.py", line 926, in get_subpackage caller_level = caller_level + 1) File "/Library/Python/2.5/site-packages/numpy-1.4.0-py2.5-macosx-10.5-i386.egg/numpy/distutils/misc_util.py", line 863, in _get_configuration_from_setup_py config = setup_module.configuration(*args) File "scipy/special/setup.py", line 45, in configuration extra_info=get_info("npymath") File "/Library/Python/2.5/site-packages/numpy-1.4.0-py2.5-macosx-10.5-i386.egg/numpy/distutils/misc_util.py", line 2068, in get_info pkg_info = get_pkg_info(pkgname, dirs) File "/Library/Python/2.5/site-packages/numpy-1.4.0-py2.5-macosx-10.5-i386.egg/numpy/distutils/misc_util.py", line 2020, in get_pkg_info return read_config(pkgname, dirs) File "/Library/Python/2.5/site-packages/numpy-1.4.0-py2.5-macosx-10.5-i386.egg/numpy/distutils/npy_pkg_config.py", line 343, in read_config v = _read_config_imp(pkg_to_filename(pkgname), dirs) File "/Library/Python/2.5/site-packages/numpy-1.4.0-py2.5-macosx-10.5-i386.egg/numpy/distutils/npy_pkg_config.py", line 317, in _read_config_imp meta, vars, sections, reqs = _read_config(filenames) File "/Library/Python/2.5/site-packages/numpy-1.4.0-py2.5-macosx-10.5-i386.egg/numpy/distutils/npy_pkg_config.py", line 301, in _read_config meta, vars, sections, reqs = parse_config(f, dirs) File "/Library/Python/2.5/site-packages/numpy-1.4.0-py2.5-macosx-10.5-i386.egg/numpy/distutils/npy_pkg_config.py", line 273, in parse_config raise PkgNotFound("Could not find file(s) %s" % str(filenames)) numpy.distutils.npy_pkg_config.PkgNotFound: Could not find file(s) ['/Library/Python/2.5/site-packages/numpy-1.4.0-py2.5-macosx-10.5-i386.egg/numpy/core/lib/npy-pkg-config/npymath.ini'] From david at silveregg.co.jp Thu Feb 18 19:34:18 2010 From: david at silveregg.co.jp (David Cournapeau) Date: Fri, 19 Feb 2010 09:34:18 +0900 Subject: [SciPy-dev] Latex and docstrings In-Reply-To: <1cd32cbb1002180722j7dfae867hc15975b32b050da7@mail.gmail.com> References: <4B7215FE.5030803@silveregg.co.jp> <1cd32cbb1002180722j7dfae867hc15975b32b050da7@mail.gmail.com> Message-ID: <4B7DDC8A.5050400@silveregg.co.jp> josef.pktd at gmail.com wrote: > Before all the nice Latex disappears, can we rescue it and start a fft > tutorial page in scipy with the collection of all the 
formulas that > are currently in the docstrings? I never suggested to remove the latex from the doc, only from the docstring. I agree rendered latex is better than ASCII when put in the scipy.fftpack doc (or tutorial) - I did not know how to do it, that's why I did not remove the latex for now. cheers, David From josef.pktd at gmail.com Fri Feb 19 01:10:22 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 19 Feb 2010 01:10:22 -0500 Subject: [SciPy-dev] Latex and docstrings In-Reply-To: <4B7DDC8A.5050400@silveregg.co.jp> References: <4B7215FE.5030803@silveregg.co.jp> <1cd32cbb1002180722j7dfae867hc15975b32b050da7@mail.gmail.com> <4B7DDC8A.5050400@silveregg.co.jp> Message-ID: <1cd32cbb1002182210w5e6bb614kca47dc247beee080@mail.gmail.com> On Thu, Feb 18, 2010 at 7:34 PM, David Cournapeau wrote: > josef.pktd at gmail.com wrote: > >> Before all the nice Latex disappears, can we rescue it and start a fft >> tutorial page in scipy with the collection of all the formulas that >> are currently in the docstrings? > > I never suggested to remove the latex from the doc, only from the > docstring. I agree rendered latex is better than ASCII when put in the > scipy.fftpack doc (or tutorial) - I did not know how to do it, that's > why I did not remove the latex for now. Almost none of the scipy sub packages has module level docs, for most it's just at most a few paragraphs and then the list of routines. The tutorials are a mixture of definitions, background material and examples, e.g. linalg and signal contain mostly definitions and explanations. I created a stub tutorial page for fftpack, http://docs.scipy.org/scipy/docs/scipy-docs/tutorial/fftpack.rst/ and copied the last correct latex version for dct. I latex math doesn't render, so I changed the first two equations to the same pattern as the equations are in the signal tutorial which works. I also copied the first paragraph from the numpy fft module description., Also I'm not sure what the section structure should be, but it's setup for anyone to correct and add information. Josef > > cheers, > > David > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From josef.pktd at gmail.com Fri Feb 19 01:16:42 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 19 Feb 2010 01:16:42 -0500 Subject: [SciPy-dev] Latex and docstrings In-Reply-To: <1cd32cbb1002182210w5e6bb614kca47dc247beee080@mail.gmail.com> References: <4B7215FE.5030803@silveregg.co.jp> <1cd32cbb1002180722j7dfae867hc15975b32b050da7@mail.gmail.com> <4B7DDC8A.5050400@silveregg.co.jp> <1cd32cbb1002182210w5e6bb614kca47dc247beee080@mail.gmail.com> Message-ID: <1cd32cbb1002182216g99f001dwa1e66e42d2ffa27a@mail.gmail.com> On Fri, Feb 19, 2010 at 1:10 AM, wrote: > On Thu, Feb 18, 2010 at 7:34 PM, David Cournapeau wrote: >> josef.pktd at gmail.com wrote: >> >>> Before all the nice Latex disappears, can we rescue it and start a fft >>> tutorial page in scipy with the collection of all the formulas that >>> are currently in the docstrings? >> >> I never suggested to remove the latex from the doc, only from the >> docstring. I agree rendered latex is better than ASCII when put in the >> scipy.fftpack doc (or tutorial) - I did not know how to do it, that's >> why I did not remove the latex for now. > > Almost none of the scipy sub packages has module level docs, for most > it's just at most a few paragraphs and then the list of routines. 
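As an aside for readers hunting for those formulas: the main one in question, scipy.fftpack's unnormalized type-II DCT, reads in LaTeX as follows (transcribed from the standard definition, so worth double-checking against the docstring before pasting into the tutorial page):

    y_k = 2 \sum_{n=0}^{N-1} x_n \cos\left( \frac{\pi k (2n+1)}{2N} \right),
    \qquad 0 \le k < N.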
> > The tutorials are a mixture of definitions, background material and > examples, e.g. linalg and signal contain mostly definitions and > explanations. > > I created a stub tutorial page for fftpack, > http://docs.scipy.org/scipy/docs/scipy-docs/tutorial/fftpack.rst/ > and copied the last correct latex version for dct. > I latex math doesn't render, so I changed the first two equations to > the same pattern as the equations are in the signal tutorial which > works. I also copied the first paragraph from the numpy fft module > description., > > Also I'm not sure what the section structure should be, but it's setup > for anyone to correct and add information. And I haven't linked it yet from the tutorial front page, so will not yet be included in the docs (as long as it's mostly a stub page) > > Josef > > >> >> cheers, >> >> David >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> > From d.l.goldsmith at gmail.com Fri Feb 19 01:23:22 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Thu, 18 Feb 2010 22:23:22 -0800 Subject: [SciPy-dev] Latex and docstrings In-Reply-To: <1cd32cbb1002182210w5e6bb614kca47dc247beee080@mail.gmail.com> References: <4B7215FE.5030803@silveregg.co.jp> <1cd32cbb1002180722j7dfae867hc15975b32b050da7@mail.gmail.com> <4B7DDC8A.5050400@silveregg.co.jp> <1cd32cbb1002182210w5e6bb614kca47dc247beee080@mail.gmail.com> Message-ID: <45d1ab481002182223m669b4b07ga267487895706d46@mail.gmail.com> Thanks, Josef! DG On Thu, Feb 18, 2010 at 10:10 PM, wrote: > On Thu, Feb 18, 2010 at 7:34 PM, David Cournapeau > wrote: > > josef.pktd at gmail.com wrote: > > > >> Before all the nice Latex disappears, can we rescue it and start a fft > >> tutorial page in scipy with the collection of all the formulas that > >> are currently in the docstrings? > > > > I never suggested to remove the latex from the doc, only from the > > docstring. I agree rendered latex is better than ASCII when put in the > > scipy.fftpack doc (or tutorial) - I did not know how to do it, that's > > why I did not remove the latex for now. > > Almost none of the scipy sub packages has module level docs, for most > it's just at most a few paragraphs and then the list of routines. > > The tutorials are a mixture of definitions, background material and > examples, e.g. linalg and signal contain mostly definitions and > explanations. > > I created a stub tutorial page for fftpack, > http://docs.scipy.org/scipy/docs/scipy-docs/tutorial/fftpack.rst/ > and copied the last correct latex version for dct. > I latex math doesn't render, so I changed the first two equations to > the same pattern as the equations are in the signal tutorial which > works. I also copied the first paragraph from the numpy fft module > description., > > Also I'm not sure what the section structure should be, but it's setup > for anyone to correct and add information. > > Josef > > > > > > cheers, > > > > David > > _______________________________________________ > > SciPy-Dev mailing list > > SciPy-Dev at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-dev > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tom.grydeland at gmail.com Fri Feb 19 02:45:32 2010 From: tom.grydeland at gmail.com (Tom Grydeland) Date: Fri, 19 Feb 2010 08:45:32 +0100 Subject: [SciPy-dev] Latex and docstrings In-Reply-To: <1cd32cbb1002182210w5e6bb614kca47dc247beee080@mail.gmail.com> References: <4B7215FE.5030803@silveregg.co.jp> <1cd32cbb1002180722j7dfae867hc15975b32b050da7@mail.gmail.com> <4B7DDC8A.5050400@silveregg.co.jp> <1cd32cbb1002182210w5e6bb614kca47dc247beee080@mail.gmail.com> Message-ID: On Fri, Feb 19, 2010 at 7:10 AM, wrote: > I created a stub tutorial page for fftpack, > http://docs.scipy.org/scipy/docs/scipy-docs/tutorial/fftpack.rst/ > and copied the last correct latex version for dct. > I latex math doesn't render, so I changed the first two equations to > the same pattern as the equations are in the signal tutorial which > works. I also copied the first paragraph from the numpy fft module > description., Thanks Josef! If the DCT matter is right for a tutorial, I believe a lot more of the numpy.fft module description would fit also. Do you want me to fix the latex errors? > Josef -- Tom Grydeland From njs at pobox.com Fri Feb 19 03:08:50 2010 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 19 Feb 2010 00:08:50 -0800 Subject: [SciPy-dev] Best interface for computing the logarithm of the determinant? In-Reply-To: <961fa2b41002161249g66dde100s251cf8874d2a996e@mail.gmail.com> References: <961fa2b41002161249g66dde100s251cf8874d2a996e@mail.gmail.com> Message-ID: <961fa2b41002190008v81f5cd1o57f20b6d27211623@mail.gmail.com> Thanks for your comments, all. Since it occurs to me that this is a general need, not just for sparse matrices, and it would be very annoying to settle on one API for scikits.sparse and then have another show up in one of the main packages later, I've just submitted a patch for option (1) to numpy: http://projects.scipy.org/numpy/ticket/1402 And we'll see what happens with that :-) On Tue, Feb 16, 2010 at 12:49 PM, Nathaniel Smith wrote: > So when you have a matrix whose determinant you want, it's often wise > to compute the logarithm of the determinant instead of the determinant > itself, because determinants involve lots and lots of multiplications > and the result might otherwise underflow/overflow. Therefore, in > scikits.sparse, I'd like to provide an API for doing this (and this is > well-supported by the underlying libraries). > > But the problem is that for a general matrix, the determinant may be > zero or negative. Obviously we can deal with this, but what's the best > API? I'd like to use one consistently across the different > factorizations in scikits.sparse, and perhaps eventually in numpy as > well.
> > Some options: > > 1) Split off the sign into a separate return value ('sign' may be 1, -1, 0): > sign, value = logdet(A) > actual_determinant = sign * exp(value) > > 2) Allow complex/infinite return values, even when A is a real matrix: > logdet(eye(3)) == pi*1j > logdet(zeros((3, 3))) == -Inf > > 3) "Scientific notation" (This is what UMFPACK's API does): return a > mantissa and base-10 exponent: > mantissa, exponent = logdet(A) > actual_determinant = mantissa * 10 ** exponent > > 4) Have separate functions for computing the sign, and the log of the > absolute value (This is what GSL does, though it seems pointlessly > inefficient): > sign = sgndet(A) > value = logdet(A) > actual_determinant = sign * exp(value) > > These are all kind of ugly looking, unfortunately, but that seems > unavoidable, unless someone has a clever idea. > > Any preferences? > > -- Nathaniel > From josef.pktd at gmail.com Fri Feb 19 07:24:28 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 19 Feb 2010 07:24:28 -0500 Subject: [SciPy-dev] Latex and docstrings In-Reply-To: References: <4B7215FE.5030803@silveregg.co.jp> <1cd32cbb1002180722j7dfae867hc15975b32b050da7@mail.gmail.com> <4B7DDC8A.5050400@silveregg.co.jp> <1cd32cbb1002182210w5e6bb614kca47dc247beee080@mail.gmail.com> Message-ID: <1cd32cbb1002190424p681f3111ic3718d8b5d260053@mail.gmail.com> On Fri, Feb 19, 2010 at 2:45 AM, Tom Grydeland wrote: > On Fri, Feb 19, 2010 at 7:10 AM, wrote: > >> I created a stub tutorial page for fftpack, >> http://docs.scipy.org/scipy/docs/scipy-docs/tutorial/fftpack.rst/ >> and copied the last correct latex version for dct. >> I latex math doesn't render, so I changed the first two equations to >> the same pattern as the equations are in the signal tutorial which >> works. I also copied the first paragraph from the numpy fft module >> description., > > Thanks Josef! > > If the DCT matter is right for a tutorial, I believe a lot more of the > numpy.fft module description would fit also. > > Do you want me to fix the latex errors? Yes, please, and make any adjustment that you find useful. One formatting note: I think the header ordering is === ----- ^^^^^ ~~~ , if my inference from other rst pages is correct. Because I don't know what the section hierarchy might be, I didn't change to the correct underlines for headers of subsections. I didn't want to copy too much from the numpy docs since we can also just refer to it. For my reading, an abridged explanation referring to the numpy docs, plus a more complete collection of the extra parts that are in scipy, would be useful. But we haven't had much discussion yet on doc duplication between scipy and numpy (which is also relevant for the descriptions of the distributions in numpy.random and scipy.stats.distributions) Thanks, Josef > >> Josef > > -- > Tom Grydeland >
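Returning to the logdet API options above: for a dense real matrix, option (1) can be computed along these lines (an illustrative sketch, not the patch attached to ticket 1402 -- the sign and the log of the magnitude are read off an LU factorization, so no overflow-prone product is ever formed):

import numpy as np
from scipy.linalg import lu_factor

def logdet(A):
    # det(A) = det(P) * prod(diag(U)); return (sign, log|det(A)|).
    lu, piv = lu_factor(A)
    d = np.diag(lu)                          # diagonal of U
    if np.any(d == 0):
        return 0.0, -np.inf                  # singular matrix
    # each piv[i] != i is one row interchange, flipping the sign
    nswaps = np.sum(piv != np.arange(len(piv)))
    sign = (-1.0) ** nswaps * np.prod(np.sign(d))
    return sign, np.sum(np.log(np.abs(d)))

Here logdet(np.eye(3)) returns (1.0, 0.0), and a matrix whose determinant would underflow to zero in a plain det() call still gets a finite log-magnitude.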
> _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From tom.grydeland at gmail.com Fri Feb 19 08:05:05 2010 From: tom.grydeland at gmail.com (Tom Grydeland) Date: Fri, 19 Feb 2010 14:05:05 +0100 Subject: [SciPy-dev] Latex and docstrings In-Reply-To: <1cd32cbb1002190424p681f3111ic3718d8b5d260053@mail.gmail.com> References: <4B7215FE.5030803@silveregg.co.jp> <1cd32cbb1002180722j7dfae867hc15975b32b050da7@mail.gmail.com> <4B7DDC8A.5050400@silveregg.co.jp> <1cd32cbb1002182210w5e6bb614kca47dc247beee080@mail.gmail.com> <1cd32cbb1002190424p681f3111ic3718d8b5d260053@mail.gmail.com> Message-ID: > On Fri, Feb 19, 2010 at 2:45 AM, Tom Grydeland wrote: >> Do you want me to fix the latex errors? On Fri, Feb 19, 2010 at 1:24 PM, wrote: > Yes, please, and make any adjustment that you find usefule. Okay, they are fixed, and I unified all the references cited also. > I didn't want to copy too much from the numpy docs since we can also > just refer to it. For my reading an abridged explanation refering to > the numpy docs, and a more complete collection of the extra parts that > in scipy would be useful. But we haven't had much discussion yet on > doc duplication between scipy and numpy (which is also relevant for > the descriptions of the distributions in numpy.random and > scipy.stats.distributions) I'm fine either way, but I think for a tutorial page for fftpack it seems odd to define the DCT in that level of detail and then refer elsewhere for the DFT definition. > Josef -- Tom Grydeland From josef.pktd at gmail.com Fri Feb 19 09:24:52 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 19 Feb 2010 09:24:52 -0500 Subject: [SciPy-dev] Latex and docstrings In-Reply-To: References: <4B7215FE.5030803@silveregg.co.jp> <1cd32cbb1002180722j7dfae867hc15975b32b050da7@mail.gmail.com> <4B7DDC8A.5050400@silveregg.co.jp> <1cd32cbb1002182210w5e6bb614kca47dc247beee080@mail.gmail.com> <1cd32cbb1002190424p681f3111ic3718d8b5d260053@mail.gmail.com> Message-ID: <1cd32cbb1002190624q75bca956oc1cc1d44a8821ebc@mail.gmail.com> On Fri, Feb 19, 2010 at 8:05 AM, Tom Grydeland wrote: >> On Fri, Feb 19, 2010 at 2:45 AM, Tom Grydeland wrote: >>> Do you want me to fix the latex errors? > > On Fri, Feb 19, 2010 at 1:24 PM, ? wrote: >> Yes, please, and make any adjustment that you find usefule. > > Okay, they are fixed, and I unified all the references cited also. > >> I didn't want to copy too much from the numpy docs since we can also >> just refer to it. For my reading an abridged explanation refering to >> the numpy docs, and a more complete collection of the extra parts that >> in scipy would be useful. But we haven't had much discussion yet on >> doc duplication between scipy and numpy (which is also relevant for >> the descriptions of the distributions in numpy.random and >> scipy.stats.distributions) > > I'm fine either way, but I think for a tutorial page for fftpack it > seems odd to define the DCT in that level of detail and then refer > elsewhere for the DFT definition. I don't have a strong opinion either. The numpy.fft module docstring is short enough (when rendered on the webpage) that most of it can be repeated. My initial thought was that fft in scipy could be explained more in terms of fft2 and fftn, with examples for those. (I just recently struggled with fftn and there is not much example code available.) I think, the definitions and formulas should be included for all transforms in any case. 
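Apropos the shortage of fftn example code josef mentions, a seed example of the kind the tutorial could carry (only public scipy.fftpack functions are used; the equivalences shown follow from the separability of the multi-dimensional DFT):

import numpy as np
from scipy.fftpack import fft, fft2, fftn, ifftn

x = np.random.rand(4, 6)
X = fftn(x)                                  # DFT over all axes

print np.allclose(ifftn(X), x)               # round trip recovers x
print np.allclose(X, fft2(x))                # for 2-D input fftn == fft2
print np.allclose(X, fft(fft(x, axis=1), axis=0))   # nested 1-D ffts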
Josef > >> Josef > > -- > Tom Grydeland > ? > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From nwagner at iam.uni-stuttgart.de Fri Feb 19 11:41:48 2010 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Fri, 19 Feb 2010 17:41:48 +0100 Subject: [SciPy-dev] New scipy.test() errors in trunk Message-ID: ====================================================================== ERROR: Regression test for #880: empty array for zi crashes. ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/nwagner/local/lib64/python2.6/site-packages/scipy/signal/tests/test_signaltools.py", line 458, in test_empty_zi y, zf = lfilter(b, a, x, zi=zi) File "/home/nwagner/local/lib64/python2.6/site-packages/scipy/signal/signaltools.py", line 603, in lfilter return sigtools._linear_filter(b, a, x, axis, zi) NotImplementedError: input type 'object' not supported ====================================================================== ERROR: test_rank1 (test_signaltools.TestLinearFilterDecimal) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/nwagner/local/lib64/python2.6/site-packages/scipy/signal/tests/test_signaltools.py", line 360, in test_rank1 assert_array_almost_equal(lfilter(b, a, x), y_r) File "/home/nwagner/local/lib64/python2.6/site-packages/scipy/signal/signaltools.py", line 601, in lfilter return sigtools._linear_filter(b, a, x, axis) NotImplementedError: input type 'object' not supported ====================================================================== ERROR: test_rank2 (test_signaltools.TestLinearFilterDecimal) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/nwagner/local/lib64/python2.6/site-packages/scipy/signal/tests/test_signaltools.py", line 401, in test_rank2 y = lfilter(b, a, x, axis = 0) File "/home/nwagner/local/lib64/python2.6/site-packages/scipy/signal/signaltools.py", line 601, in lfilter return sigtools._linear_filter(b, a, x, axis) NotImplementedError: input type 'object' not supported ====================================================================== ERROR: test_rank2_init_cond_a0 (test_signaltools.TestLinearFilterDecimal) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/nwagner/local/lib64/python2.6/site-packages/scipy/signal/tests/test_signaltools.py", line 435, in test_rank2_init_cond_a0 y, zf = lfilter(b, a, x, axis = 0, zi = np.ones((1, 3))) File "/home/nwagner/local/lib64/python2.6/site-packages/scipy/signal/signaltools.py", line 603, in lfilter return sigtools._linear_filter(b, a, x, axis, zi) NotImplementedError: input type 'object' not supported ====================================================================== ERROR: test_rank2_init_cond_a1 (test_signaltools.TestLinearFilterDecimal) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/nwagner/local/lib64/python2.6/site-packages/scipy/signal/tests/test_signaltools.py", line 419, in test_rank2_init_cond_a1 y, zf = lfilter(b, a, x, axis = 1, zi = np.ones((4, 1))) File "/home/nwagner/local/lib64/python2.6/site-packages/scipy/signal/signaltools.py", line 603, in lfilter return sigtools._linear_filter(b, a, x, axis, zi) NotImplementedError: input type 'object' not supported 
====================================================================== ERROR: test_rank3 (test_signaltools.TestLinearFilterDecimal) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/nwagner/local/lib64/python2.6/site-packages/scipy/signal/tests/test_signaltools.py", line 447, in test_rank3 y = lfilter(b, a, x) File "/home/nwagner/local/lib64/python2.6/site-packages/scipy/signal/signaltools.py", line 601, in lfilter return sigtools._linear_filter(b, a, x, axis) NotImplementedError: input type 'object' not supported ====================================================================== ERROR: test_mpmath.test_hyp2f1_real_some ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/nwagner/local/lib64/python2.6/site-packages/nose-0.11.2.dev-py2.6.egg/nose/case.py", line 183, in runTest self.test(*self.arg) File "/home/nwagner/local/lib64/python2.6/site-packages/numpy/testing/decorators.py", line 146, in skipper_func return f(*args, **kwargs) File "/home/nwagner/local/lib64/python2.6/site-packages/scipy/special/tests/test_mpmath.py", line 117, in test_hyp2f1_real_some except (TypeError, mpmath.libhyper.NoConvergence): AttributeError: 'module' object has no attribute 'libhyper' ====================================================================== ERROR: test_discrete_basic.test_discrete_basic(, (0.29999999999999999,), array([0, 0, 0, ..., 1, 0, 0]), 0.01, 'bernoulli chisquare') ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/nwagner/local/lib64/python2.6/site-packages/nose-0.11.2.dev-py2.6.egg/nose/case.py", line 183, in runTest self.test(*self.arg) File "/home/nwagner/local/lib64/python2.6/site-packages/scipy/stats/tests/test_discrete_basic.py", line 252, in check_discrete_chisquare freq,hsupp = np.histogram(rvs,histsupp,new=True) TypeError: histogram() got an unexpected keyword argument 'new' ====================================================================== ERROR: test_discrete_basic.test_discrete_basic(, (5, 0.40000000000000002), array([2, 2, 2, ..., 4, 1, 3]), 0.01, 'binom chisquare') ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/nwagner/local/lib64/python2.6/site-packages/nose-0.11.2.dev-py2.6.egg/nose/case.py", line 183, in runTest self.test(*self.arg) File "/home/nwagner/local/lib64/python2.6/site-packages/scipy/stats/tests/test_discrete_basic.py", line 252, in check_discrete_chisquare freq,hsupp = np.histogram(rvs,histsupp,new=True) TypeError: histogram() got an unexpected keyword argument 'new' ====================================================================== ERROR: test_discrete_basic.test_discrete_basic(, (1.3999999999999999, 19), array([0, 0, 0, ..., 2, 0, 0]), 0.01, 'boltzmann chisquare') ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/nwagner/local/lib64/python2.6/site-packages/nose-0.11.2.dev-py2.6.egg/nose/case.py", line 183, in runTest self.test(*self.arg) File "/home/nwagner/local/lib64/python2.6/site-packages/scipy/stats/tests/test_discrete_basic.py", line 252, in check_discrete_chisquare freq,hsupp = np.histogram(rvs,histsupp,new=True) TypeError: histogram() got an unexpected keyword argument 'new' ====================================================================== ERROR: test_discrete_basic.test_discrete_basic(, 
(0.80000000000000004,), array([ 0, 0, 0, ..., 4, -1, 0]), 0.01, 'dlaplace chisquare') ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/nwagner/local/lib64/python2.6/site-packages/nose-0.11.2.dev-py2.6.egg/nose/case.py", line 183, in runTest self.test(*self.arg) File "/home/nwagner/local/lib64/python2.6/site-packages/scipy/stats/tests/test_discrete_basic.py", line 252, in check_discrete_chisquare freq,hsupp = np.histogram(rvs,histsupp,new=True) TypeError: histogram() got an unexpected keyword argument 'new' ====================================================================== ERROR: test_discrete_basic.test_discrete_basic(, (0.5,), array([1, 1, 2, ..., 6, 1, 2]), 0.01, 'geom chisquare') ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/nwagner/local/lib64/python2.6/site-packages/nose-0.11.2.dev-py2.6.egg/nose/case.py", line 183, in runTest self.test(*self.arg) File "/home/nwagner/local/lib64/python2.6/site-packages/scipy/stats/tests/test_discrete_basic.py", line 252, in check_discrete_chisquare freq,hsupp = np.histogram(rvs,histsupp,new=True) TypeError: histogram() got an unexpected keyword argument 'new' ====================================================================== ERROR: test_discrete_basic.test_discrete_basic(, (30, 12, 6), array([1, 1, 4, ..., 3, 2, 2]), 0.01, 'hypergeom chisquare') ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/nwagner/local/lib64/python2.6/site-packages/nose-0.11.2.dev-py2.6.egg/nose/case.py", line 183, in runTest self.test(*self.arg) File "/home/nwagner/local/lib64/python2.6/site-packages/scipy/stats/tests/test_discrete_basic.py", line 252, in check_discrete_chisquare freq,hsupp = np.histogram(rvs,histsupp,new=True) TypeError: histogram() got an unexpected keyword argument 'new' ====================================================================== ERROR: test_discrete_basic.test_discrete_basic(, (21, 3, 12), array([2, 3, 2, ..., 2, 2, 1]), 0.01, 'hypergeom chisquare') ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/nwagner/local/lib64/python2.6/site-packages/nose-0.11.2.dev-py2.6.egg/nose/case.py", line 183, in runTest self.test(*self.arg) File "/home/nwagner/local/lib64/python2.6/site-packages/scipy/stats/tests/test_discrete_basic.py", line 252, in check_discrete_chisquare freq,hsupp = np.histogram(rvs,histsupp,new=True) TypeError: histogram() got an unexpected keyword argument 'new' ====================================================================== ERROR: test_discrete_basic.test_discrete_basic(, (21, 18, 11), array([ 9, 8, 9, ..., 10, 10, 10]), 0.01, 'hypergeom chisquare') ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/nwagner/local/lib64/python2.6/site-packages/nose-0.11.2.dev-py2.6.egg/nose/case.py", line 183, in runTest self.test(*self.arg) File "/home/nwagner/local/lib64/python2.6/site-packages/scipy/stats/tests/test_discrete_basic.py", line 252, in check_discrete_chisquare freq,hsupp = np.histogram(rvs,histsupp,new=True) TypeError: histogram() got an unexpected keyword argument 'new' ====================================================================== ERROR: test_discrete_basic.test_discrete_basic(, (0.59999999999999998,), array([1, 1, 1, ..., 1, 1, 4]), 0.01, 'logser chisquare') 
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/nwagner/local/lib64/python2.6/site-packages/nose-0.11.2.dev-py2.6.egg/nose/case.py", line 183, in runTest
    self.test(*self.arg)
  File "/home/nwagner/local/lib64/python2.6/site-packages/scipy/stats/tests/test_discrete_basic.py", line 252, in check_discrete_chisquare
    freq,hsupp = np.histogram(rvs,histsupp,new=True)
TypeError: histogram() got an unexpected keyword argument 'new'

[... identical "histogram() got an unexpected keyword argument 'new'"
errors for nbinom (twice), planck, poisson, randint, skellam and zipf
snipped ...]

======================================================================
FAIL: test_lambertw.test_values
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/nwagner/local/lib64/python2.6/site-packages/nose-0.11.2.dev-py2.6.egg/nose/case.py", line 183, in runTest
    self.test(*self.arg)
  File "/home/nwagner/local/lib64/python2.6/site-packages/scipy/special/tests/test_lambertw.py", line 80, in test_values
    FuncData(w, data, (0,1), 2, rtol=1e-10, atol=1e-13).check()
  File "/home/nwagner/local/lib64/python2.6/site-packages/scipy/special/tests/testutils.py", line 187, in check
    assert False, "\n".join(msg)
AssertionError:
Max |adiff|: 1.77636e-15
Max |rdiff|: 2.06237e-16
Bad results for the following points (in output 0):
  (-0.40000000000000002+0.40000000000000002j) 0j => (nan+nan*j) != (-0.10396515323290657+0.61899273315171632j)  (rdiff 0.0)
  (-0.44800000000000001+0.40000000000000002j) 0j => (nan+nan*j) != (-0.11855133765652383+0.66570534313583418j)  (rdiff 0.0)
  (-0.44800000000000001-0.40000000000000002j) 0j => (nan+nan*j) != (-0.11855133765652383-0.66570534313583418j)  (rdiff 0.0)

======================================================================
FAIL: test_mpmath.test_hyp2f1_some_points_2
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/nwagner/local/lib64/python2.6/site-packages/nose-0.11.2.dev-py2.6.egg/nose/case.py", line 183, in runTest
    self.test(*self.arg)
  File "/home/nwagner/local/lib64/python2.6/site-packages/numpy/testing/decorators.py", line 146, in skipper_func
    return f(*args, **kwargs)
  File "/home/nwagner/local/lib64/python2.6/site-packages/scipy/special/tests/test_mpmath.py", line 106, in test_hyp2f1_some_points_2
    FuncData(sc.hyp2f1, dataset, (0,1,2,3), 4, rtol=1e-10).check()
  File "/home/nwagner/local/lib64/python2.6/site-packages/scipy/special/tests/testutils.py", line 187, in check
    assert False, "\n".join(msg)
AssertionError:
Max |adiff|: 1.48367e-22
Max |rdiff|: 0.989172
Bad results for the following points (in output 0):
  112.0 5.0999999999999996 -0.90000000000000002 -0.99999000000000005 => -1.6241361047970947e-24 != -1.499910195656332e-22  (rdiff 0.98917177768708742)

----------------------------------------------------------------------
Ran 4332 tests in 76.236s

FAILED (KNOWNFAIL=11, SKIP=17, errors=24, failures=2)

From josef.pktd at gmail.com  Fri Feb 19 11:49:10 2010
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Fri, 19 Feb 2010 11:49:10 -0500
Subject: [SciPy-dev] New scipy.test() errors in trunk
In-Reply-To:
References:
Message-ID: <1cd32cbb1002190849i3082e2c7h709ae53f07c274d9@mail.gmail.com>

Are you running scipy against numpy trunk?

np.histogram(rvs,histsupp,new=True)

I think this is a very recent change that will need adjustments.
However, it might be difficult to write code that works with numpy 1.3
and numpy 1.5, with or without the "new" keyword. I need to check when
the default for np.histogram changed.

Josef

On Fri, Feb 19, 2010 at 11:41 AM, Nils Wagner wrote:
> ======================================================================
> ERROR: Regression test for #880: empty array for zi crashes.
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/home/nwagner/local/lib64/python2.6/site-packages/scipy/signal/tests/test_signaltools.py", line 458, in test_empty_zi
>     y, zf = lfilter(b, a, x, zi=zi)
>   File "/home/nwagner/local/lib64/python2.6/site-packages/scipy/signal/signaltools.py", line 603, in lfilter
>     return sigtools._linear_filter(b, a, x, axis, zi)
> NotImplementedError: input type 'object' not supported
>
> [... five more "NotImplementedError: input type 'object' not supported"
> errors from test_signaltools.TestLinearFilterDecimal (test_rank1,
> test_rank2, test_rank2_init_cond_a0, test_rank2_init_cond_a1,
> test_rank3) snipped ...]
>
> ======================================================================
> ERROR: test_mpmath.test_hyp2f1_real_some
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/home/nwagner/local/lib64/python2.6/site-packages/nose-0.11.2.dev-py2.6.egg/nose/case.py", line 183, in runTest
>     self.test(*self.arg)
>   File "/home/nwagner/local/lib64/python2.6/site-packages/numpy/testing/decorators.py", line 146, in skipper_func
>     return f(*args, **kwargs)
>   File "/home/nwagner/local/lib64/python2.6/site-packages/scipy/special/tests/test_mpmath.py", line 117, in test_hyp2f1_real_some
>     except (TypeError, mpmath.libhyper.NoConvergence):
> AttributeError: 'module' object has no attribute 'libhyper'
>
> [... the sixteen "histogram() got an unexpected keyword argument 'new'"
> errors (bernoulli, binom, boltzmann, dlaplace, geom, hypergeom x3,
> logser, nbinom x2, planck, poisson, randint, skellam, zipf) and the
> lambertw and hyp2f1 failures quoted here are identical to the output
> shown above and are snipped ...]
>
> ----------------------------------------------------------------------
> Ran 4332 tests in 76.236s
>
> FAILED (KNOWNFAIL=11, SKIP=17, errors=24, failures=2)
>
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev
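[Editorial sketch, not code from the thread: the compatibility problem
Josef describes above is often handled with a call-site fallback, so that
a single code path works whether or not np.histogram still accepts the
"new" keyword. The helper name hist_compat is made up for this
illustration.]

    import numpy as np

    def hist_compat(a, bins):
        # numpy trunk (r8116, see below) removed the "new" keyword, so
        # passing it there raises TypeError; older numpy releases still
        # accept new=True and select the new bin-edge semantics.
        try:
            return np.histogram(a, bins, new=True)
        except TypeError:
            return np.histogram(a, bins)

    # stand-ins for the rvs/histsupp arrays used in the failing test
    rvs = np.random.randint(0, 5, size=1000)
    freq, hsupp = hist_compat(rvs, np.arange(6))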
From nwagner at iam.uni-stuttgart.de  Fri Feb 19 11:51:44 2010
From: nwagner at iam.uni-stuttgart.de (Nils Wagner)
Date: Fri, 19 Feb 2010 17:51:44 +0100
Subject: [SciPy-dev] New scipy.test() errors in trunk
In-Reply-To: <1cd32cbb1002190849i3082e2c7h709ae53f07c274d9@mail.gmail.com>
References: <1cd32cbb1002190849i3082e2c7h709ae53f07c274d9@mail.gmail.com>
Message-ID:

On Fri, 19 Feb 2010 11:49:10 -0500 josef.pktd at gmail.com wrote:
> Are you running scipy against numpy trunk?

Yes.

> np.histogram(rvs,histsupp,new=True)
>
> I think this is a very recent change that will need adjustments.

See

r8116 | dhuard | 2010-02-16 19:52:08 +0100 (Tue, 16 Feb 2010) | 1 line
removed old behavior for the histogram function.

> However, it might be difficult to write code that works with numpy 1.3
> and numpy 1.5, with or without the "new" keyword. I need to check when
> the default for np.histogram changed.
>
> Josef

Nils

From josef.pktd at gmail.com  Fri Feb 19 11:54:29 2010
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Fri, 19 Feb 2010 11:54:29 -0500
Subject: [SciPy-dev] New scipy.test() errors in trunk
In-Reply-To: <1cd32cbb1002190849i3082e2c7h709ae53f07c274d9@mail.gmail.com>
References: <1cd32cbb1002190849i3082e2c7h709ae53f07c274d9@mail.gmail.com>
Message-ID: <1cd32cbb1002190854j4469e8f1s4e498c4d8f8550a7@mail.gmail.com>

On Fri, Feb 19, 2010 at 11:49 AM, wrote:
> Are you running scipy against numpy trunk?
>
> np.histogram(rvs,histsupp,new=True)
>
> I think this is a very recent change that will need adjustments.
> However, it might be difficult to write code that works with numpy 1.3
> and numpy 1.5, with or without the "new" keyword. I need to check when
> the default for np.histogram changed.

No, this won't matter, since scipy trunk requires at least numpy 1.4.

Thanks for reporting,

Josef

> [... Nils's full test log, quoted once more in the original message,
> snipped; it is identical to the report at the top of this thread ...]
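[Editorial sketch: the alternative to a try/except fallback is to branch
once on the installed numpy version, which is what "scipy trunk requires
at least numpy 1.4" amounts to. LooseVersion from distutils was the stock
comparison helper at the time; the 1.4 cutoff and the _NUMPY_GE_14 name
are only illustrative, since exactly when the "new" default changed is
what Josef says he still needs to check.]

    import numpy as np
    from distutils.version import LooseVersion

    _NUMPY_GE_14 = LooseVersion(np.__version__) >= LooseVersion('1.4')

    def chisquare_freqs(rvs, histsupp):
        # assume numpy >= 1.4 already defaults to the new semantics;
        # older releases need them requested explicitly
        if _NUMPY_GE_14:
            return np.histogram(rvs, histsupp)
        return np.histogram(rvs, histsupp, new=True)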
From d.l.goldsmith at gmail.com  Fri Feb 19 16:30:34 2010
From: d.l.goldsmith at gmail.com (David Goldsmith)
Date: Fri, 19 Feb 2010 13:30:34 -0800
Subject: [SciPy-dev] Latex and docstrings
In-Reply-To: <1cd32cbb1002190624q75bca956oc1cc1d44a8821ebc@mail.gmail.com>
References: <4B7215FE.5030803@silveregg.co.jp>
	<1cd32cbb1002180722j7dfae867hc15975b32b050da7@mail.gmail.com>
	<4B7DDC8A.5050400@silveregg.co.jp>
	<1cd32cbb1002182210w5e6bb614kca47dc247beee080@mail.gmail.com>
	<1cd32cbb1002190424p681f3111ic3718d8b5d260053@mail.gmail.com>
	<1cd32cbb1002190624q75bca956oc1cc1d44a8821ebc@mail.gmail.com>
Message-ID: <45d1ab481002191330g195bc72fuf6f5a7d3f6c4bed9@mail.gmail.com>

Hi, folks! OK, at this point, can someone(s) please summarize precisely
where changes have been made as a result of this thread? (I don't need
to know what the changes were, of course, 'cause I can get those from
the Wiki.) Thanks!

DG

On Fri, Feb 19, 2010 at 6:24 AM, wrote:
> On Fri, Feb 19, 2010 at 8:05 AM, Tom Grydeland wrote:
>> On Fri, Feb 19, 2010 at 2:45 AM, Tom Grydeland wrote:
>>> Do you want me to fix the latex errors?
>
> On Fri, Feb 19, 2010 at 1:24 PM, wrote:
>> Yes, please, and make any adjustment that you find useful.
>
> Okay, they are fixed, and I unified all the references cited also.
>
>> I didn't want to copy too much from the numpy docs since we can also
>> just refer to it. For my reading an abridged explanation referring to
>> the numpy docs, and a more complete collection of the extra parts that
>> are in scipy, would be useful. But we haven't had much discussion yet
>> on doc duplication between scipy and numpy (which is also relevant for
>> the descriptions of the distributions in numpy.random and
>> scipy.stats.distributions)
>
> I'm fine either way, but I think for a tutorial page for fftpack it
> seems odd to define the DCT in that level of detail and then refer
> elsewhere for the DFT definition.

I don't have a strong opinion either.
The numpy.fft module docstring is short enough (when rendered on the
webpage) that most of it can be repeated. My initial thought was that
fft in scipy could be explained more in terms of fft2 and fftn, with
examples for those. (I just recently struggled with fftn and there is
not much example code available.)
I think the definitions and formulas should be included for all
transforms in any case.

Josef
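[Editorial sketch: since Josef remarks above that there is little example
code for fftn, here is a minimal illustration of scipy.fftpack.fftn's
shape and axes arguments; the input array is an arbitrary stand-in.]

    import numpy as np
    from scipy.fftpack import fftn, ifftn

    x = np.random.rand(4, 6)

    X  = fftn(x)                  # n-dimensional transform over all axes
    Xp = fftn(x, shape=(8, 8))    # zero-pad each axis to length 8 first
    Xc = fftn(x, axes=(0,))       # transform along axis 0 only

    # the fftn/ifftn round trip recovers the input to rounding error
    print(np.allclose(ifftn(X), x))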
From d.l.goldsmith at gmail.com  Fri Feb 19 16:32:23 2010
From: d.l.goldsmith at gmail.com (David Goldsmith)
Date: Fri, 19 Feb 2010 13:32:23 -0800
Subject: [SciPy-dev] Latex and docstrings
In-Reply-To: <45d1ab481002191330g195bc72fuf6f5a7d3f6c4bed9@mail.gmail.com>
References: <4B7215FE.5030803@silveregg.co.jp>
	<1cd32cbb1002180722j7dfae867hc15975b32b050da7@mail.gmail.com>
	<4B7DDC8A.5050400@silveregg.co.jp>
	<1cd32cbb1002182210w5e6bb614kca47dc247beee080@mail.gmail.com>
	<1cd32cbb1002190424p681f3111ic3718d8b5d260053@mail.gmail.com>
	<1cd32cbb1002190624q75bca956oc1cc1d44a8821ebc@mail.gmail.com>
	<45d1ab481002191330g195bc72fuf6f5a7d3f6c4bed9@mail.gmail.com>
Message-ID: <45d1ab481002191332n56400b34jfb83745e74875f1a@mail.gmail.com>

Never mind, I remembered I can get this from the Wiki also. SFTN.

DG

On Fri, Feb 19, 2010 at 1:30 PM, David Goldsmith wrote:
> Hi, folks! OK, at this point, can someone(s) please summarize precisely
> where changes have been made as a result of this thread? (I don't need
> to know what the changes were, of course, 'cause I can get those from
> the Wiki.) Thanks!
>
> DG
>
> [... the rest of the quoted thread, identical to the message above,
> snipped ...]

From warren.weckesser at enthought.com  Sun Feb 21 12:28:09 2010
From: warren.weckesser at enthought.com (Warren Weckesser)
Date: Sun, 21 Feb 2010 11:28:09 -0600
Subject: [SciPy-dev] 0.8 release date?
Message-ID: <4B816D29.3070704@enthought.com>

Is there currently a target date (even a tentative one) for the release
of 0.8?
Warren

From josef.pktd at gmail.com  Sun Feb 21 12:37:33 2010
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Sun, 21 Feb 2010 12:37:33 -0500
Subject: [SciPy-dev] 0.8 release date?
In-Reply-To: <4B816D29.3070704@enthought.com>
References: <4B816D29.3070704@enthought.com>
Message-ID: <1cd32cbb1002210937n1c44760y8c80988ca263924d@mail.gmail.com>

On Sun, Feb 21, 2010 at 12:28 PM, Warren Weckesser wrote:
> Is there currently a target date (even a tentative one) for the release
> of 0.8?

And related, does scipy trunk stay numpy 1.4 compatible or will there
be a numpy 2.0 requirement? scipy 0.8 will require >= numpy 1.4?

Josef

> Warren

From xavier.gnata at gmail.com  Sun Feb 21 13:57:31 2010
From: xavier.gnata at gmail.com (Xavier Gnata)
Date: Sun, 21 Feb 2010 19:57:31 +0100
Subject: [SciPy-dev] 0.8 release date?
In-Reply-To: <1cd32cbb1002210937n1c44760y8c80988ca263924d@mail.gmail.com>
References: <4B816D29.3070704@enthought.com>
	<1cd32cbb1002210937n1c44760y8c80988ca263924d@mail.gmail.com>
Message-ID: <4B81821B.2020401@gmail.com>

On 02/21/2010 06:37 PM, josef.pktd at gmail.com wrote:
> And related, does scipy trunk stay numpy 1.4 compatible or will there
> be a numpy 2.0 requirement?
> scipy 0.8 will require >= numpy 1.4?
>
> Josef

Great news from the numpy list: "The test suite passes now on Pythons
2.4 - 3.1." with the current numpy svn trunk.

If there's a numpy 2.0 requirement for 0.8, are there plans to have a
draft support of python3.X?

Xavier

From david at silveregg.co.jp  Sun Feb 21 20:31:11 2010
From: david at silveregg.co.jp (David Cournapeau)
Date: Mon, 22 Feb 2010 10:31:11 +0900
Subject: [SciPy-dev] 0.8 release date?
In-Reply-To: <4B816D29.3070704@enthought.com>
References: <4B816D29.3070704@enthought.com>
Message-ID: <4B81DE5F.305@silveregg.co.jp>

Warren Weckesser wrote:
> Is there currently a target date (even a tentative one) for the release
> of 0.8?

Not that I know of, but I think it would be good to have one. I would
also like to have a binary release for scipy which is compatible with
whatever version will be used for numpy based on the 1.4.x branch,
something like:

  - Scipy 0.8.0 and Numpy 2.0.
  - Scipy 0.7.2 and Numpy 1.4.x

This only concerns binaries - at the source level, I think both numpy
1.4.x and 2.0 can be used against scipy trunk (at least the scipy trunk
as recent as last week works well with 1.4.x, that's what I am using for
my own work).

The big question is whether we should release scipy 0.8.0 without caring
about py3k compatibility at all, in which case we could aim at a release
in April, and go into bug fixing mode only from now.

cheers,

David

From david at silveregg.co.jp  Sun Feb 21 20:34:35 2010
From: david at silveregg.co.jp (David Cournapeau)
Date: Mon, 22 Feb 2010 10:34:35 +0900
Subject: [SciPy-dev] 0.8 release date?
In-Reply-To: <1cd32cbb1002210937n1c44760y8c80988ca263924d@mail.gmail.com>
References: <4B816D29.3070704@enthought.com>
	<1cd32cbb1002210937n1c44760y8c80988ca263924d@mail.gmail.com>
Message-ID: <4B81DF2B.7040008@silveregg.co.jp>

josef.pktd at gmail.com wrote:
> On Sun, Feb 21, 2010 at 12:28 PM, Warren Weckesser wrote:
>> Is there currently a target date (even a tentative one) for the release
>> of 0.8?
> And related, does scipy trunk stay numpy 1.4 compatible or will there
> be a numpy 2.0 requirement?

Scipy trunk requires at least the changes Pauli, Chuck and I have been
working on for improving scipy.special.

At the source level, numpy 2.0 is compatible with numpy 1.4.0 AFAIK, and
scipy does not use datetime, so the scipy trunk should not require numpy
2.0. IOW, the story would be:

  - Scipy 0.7.2 (created from the 0.7.x svn branch): compatible with
    numpy >= 1.3.0 (including 2.0)
  - Scipy 0.8.0 (from trunk): compatible with numpy >= 1.4.0.

cheers,

David

From charlesr.harris at gmail.com  Sun Feb 21 20:59:55 2010
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sun, 21 Feb 2010 18:59:55 -0700
Subject: [SciPy-dev] 0.8 release date?
In-Reply-To: <4B81DF2B.7040008@silveregg.co.jp>
References: <4B816D29.3070704@enthought.com>
	<1cd32cbb1002210937n1c44760y8c80988ca263924d@mail.gmail.com>
	<4B81DF2B.7040008@silveregg.co.jp>
Message-ID:

On Sun, Feb 21, 2010 at 6:34 PM, David Cournapeau wrote:
> josef.pktd at gmail.com wrote:
>> And related, does scipy trunk stay numpy 1.4 compatible or will there
>> be a numpy 2.0 requirement?
>
> Scipy trunk requires at least the changes Pauli, Chuck and I have been
> working on for improving scipy.special.

I haven't done much, if anything, on scipy.special. There are two things
I could contribute:

1) Fix the orthogonal polynomials to use the new Chebyshev feature in
   numpy >= 1.4.

2) Add the generalized Remez (Hermitian complex) algorithm. I'm not sure
   where it should go, maybe in the optimize category? The other
   possibility might be interpolate. I wish we had a 'fitting' category
   for such things.

I'm going to be darn busy from March through April, however.

Chuck
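[Editorial sketch of the numpy >= 1.4 Chebyshev class Chuck refers to;
how scipy.special's orthogonal polynomials would actually be rewired to
use it is not spelled out in the thread, so this only shows the class
itself.]

    import numpy as np
    from numpy.polynomial import Chebyshev

    # T_3 written directly in the Chebyshev basis:
    # coefficients [0, 0, 0, 1] mean 0*T0 + 0*T1 + 0*T2 + 1*T3
    T3 = Chebyshev([0, 0, 0, 1])

    x = np.linspace(-1, 1, 5)
    print(T3(x))       # Clenshaw evaluation, avoids power-basis round-off
    print(T3.roots())  # zeros of T_3: cos((2k+1)*pi/6) for k = 0, 1, 2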
URL: From loredo at astro.cornell.edu Sun Feb 21 21:35:01 2010 From: loredo at astro.cornell.edu (Tom Loredo) Date: Sun, 21 Feb 2010 21:35:01 -0500 Subject: [SciPy-dev] Segfault on OS X 10.6.2/Py-2.6.4 with SciPy FFT tests Message-ID: <1266806101.4b81ed55d759b@astrosun2.astro.cornell.edu> Hi folks- I'm trying to install NumPy & SciPy on 10.6.2 (Snow Leopard) with 64-bit Python 2.6.4 (a new framework build, not Apple's 2.6.1). I've built FFTW and AMD/UMFPACK and they pass their respective tests. NumPy-1.4.0 builds fine and passes all tests (except for 3 Knowns and 1 Skip). I've verified it's finding FFTW & AMD/UMF. SciPy-0.7.1 builds fine and imports without issue, but quickly into the tests has several failures & then a segfault crash. I tried installing a recent SVN SciPy (0.8.0.dev6220) but it crashes with a similar segfault. In both cases it's during the FFT tests. I removed both packages from site-packages and re-installed, this time having removed FFTW (verifying no fftw lib was found), and again there is a crash in the FFT tests, presumably pointing to a problem with numpy/scipy rather than FFTW. I've copied below the tail end of the output from scipy.test(verbose=10) from both SciPy versions (without FFTW). There's an indication that an object is being modified after its memory is freed. I haven't done malloc-level debugging so I'm not sure how to help at this point. If anyone can have a look or provide more explicit pointers, I'd be grateful! Thanks, Tom ~~~~~~~~~~~~ scipy-0.7.1: test_definition (test_basic.TestFft) ... ok test_djbfft (test_basic.TestFft) ... ok test_n_argument_real (test_basic.TestFft) ... ok test_axes_argument (test_basic.TestFftn) ... ok test_definition (test_basic.TestFftn) ... ok test_shape_argument (test_basic.TestFftn) ... ok test_shape_argument_more (test_basic.TestFftn) ... ok test_shape_axes_argument (test_basic.TestFftn) ... ok test_shape_axes_argument2 (test_basic.TestFftn) ... ok test_definition (test_basic.TestIfft) ... FAIL test_djbfft (test_basic.TestIfft) ... FAIL test_random_complex (test_basic.TestIfft) ... FAIL test_random_real (test_basic.TestIfft) ... Python(15804,0x7fff7065dbe0) malloc: *** error for object 0x105376448: incorrect checksum for freed object - object was probably modified after being freed. *** set a breakpoint in malloc_error_break to debug Abort trap ~~~~~~~~~~~~ scipy-0.8.0.dev6220 test_definition (test_basic.TestDoubleFFT) ... ok test_djbfft (test_basic.TestDoubleFFT) ... ok test_n_argument_real (test_basic.TestDoubleFFT) ... ok test_definition (test_basic.TestDoubleIFFT) ... Python(29515,0x7fff7065dbe0) malloc: *** error for object 0x1012fbf28: incorrect checksum for freed object - object was probably modified after being freed. *** set a breakpoint in malloc_error_break to debug Abort trap ------------------------------------------------- This mail sent through IMP: http://horde.org/imp/ From josef.pktd at gmail.com Sun Feb 21 21:54:33 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 21 Feb 2010 21:54:33 -0500 Subject: [SciPy-dev] 0.8 release date? In-Reply-To: <4B81DE5F.305@silveregg.co.jp> References: <4B816D29.3070704@enthought.com> <4B81DE5F.305@silveregg.co.jp> Message-ID: <1cd32cbb1002211854u393cbd88mcad4418b34fafe6b@mail.gmail.com> On Sun, Feb 21, 2010 at 8:31 PM, David Cournapeau wrote: > Warren Weckesser wrote: >> Is there currently a target date (even a tentative one) for the release >> of 0.8? > > Not that I know of, but I think it would be good to have one. 
I would > also like to have a binary release for scipy which is compatible with > whatever version will be used for numpy based on 1.4.x branch, something > like: > ? ? ? ?- Scipy 0.8.0 and Numpy 2.0. > ? ? ? ?- Scipy 0.7.2 and Numpy 1.4.x > > This only concerns binaries - at the source level, I think both numpy > 1.4.x and 2.0 can be used against scipy trunk (at least the scipy trunk > as recent as last week works well with 1.4.x, that's what I am using for > my own work). > > The big question is whether we should release scipy 0.8.0 without caring > about py3k compatibility at all, in which case we could aim at a release > in April, and go into bug fixing mode only from now. Improving py3k compatibility would then require numpy 2.0 instead of numpy 1.4, does it? In terms of compatibilities, for me the main question is how soon other packages, e.g. matplotlib, h5py, tables, ... , will be able to provide binary distributions for numpy 2.0. Is it difficult for those packages to adjust, or does it basically only require recompiling? If there is a time lag, I would prefer if we can get a numpy 1.4 compatible scipy 0.8 release. In scipy.stats, I have two things that I really want in scipy 0.8. (spearmanr rewrite and removing pdf_approx), the rest can wait if I don't find the time (and no one else does either). Josef > > cheers, > > David > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From arkapravobhaumik at gmail.com Mon Feb 22 09:09:46 2010 From: arkapravobhaumik at gmail.com (Arkapravo Bhaumik) Date: Mon, 22 Feb 2010 19:39:46 +0530 Subject: [SciPy-dev] Scipy meets pychecker Message-ID: Dear Friends *Has anyone tried to 'test' scipy using pychecker ?* What results have you got ? Please reply, preferably with examples etc. Much appreciated Regards Arkapravo -------------- next part -------------- An HTML attachment was scrubbed... URL: From rmay31 at gmail.com Mon Feb 22 09:41:15 2010 From: rmay31 at gmail.com (Ryan May) Date: Mon, 22 Feb 2010 08:41:15 -0600 Subject: [SciPy-dev] 0.8 release date? In-Reply-To: References: <4B816D29.3070704@enthought.com> <1cd32cbb1002210937n1c44760y8c80988ca263924d@mail.gmail.com> <4B81DF2B.7040008@silveregg.co.jp> Message-ID: On Sun, Feb 21, 2010 at 7:59 PM, Charles R Harris wrote: > > > On Sun, Feb 21, 2010 at 6:34 PM, David Cournapeau > wrote: >> >> josef.pktd at gmail.com wrote: >> > On Sun, Feb 21, 2010 at 12:28 PM, Warren Weckesser >> > wrote: >> >> Is there currently a target date (even a tentative one) for the release >> >> of 0.8? >> > >> > And related, does scipy trunk stay numpy 1.4 compatible or will there >> > be a numpy 2.0 requirement? >> >> Scipy trunk requires at least the changes Pauli, Chuck and me have been >> working on for improving scipy.special. >> > > I haven't done much, if anything, on scipy.special. There are two things I > could contribute: > > 1) Fix the orthogonal polynomials to use the new Chebyshev feature in numpy >>=1.4 > > 2) Add the generalized Remez (hermitean complex) algorithm. I'm not sure > where it should go, maybe in the optimize category? The other possibility > might be interpolate. I wish we had a 'fitting' category for such things. 
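The Chebyshev feature mentioned in point 1 is the numpy.polynomial package that appeared in numpy 1.4. A minimal sketch of evaluating a Chebyshev polynomial through it, assuming numpy >= 1.4 -- illustrative only, the actual rewiring of scipy.special's orthogonal polynomials on top of this is the open task:

    import numpy as np
    from numpy.polynomial.chebyshev import Chebyshev, chebval

    x = np.linspace(-1.0, 1.0, 5)

    # T_3 as a series in the Chebyshev basis: a single unit coefficient
    # on the degree-3 basis polynomial.
    t3 = Chebyshev([0, 0, 0, 1])
    print(t3(x))

    # The same evaluation without constructing a class instance.
    print(chebval(x, [0, 0, 0, 1]))

    # Cross-check against the explicit formula T_3(x) = 4x^3 - 3x.
    print(4.0 * x**3 - 3.0 * x)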
There's also Continuous Wavelet Transform code that's been sitting here for almost a year: http://projects.scipy.org/scipy/ticket/922 which points to: http://github.com/lesserwhirls/scipy-cwt Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma From mohitiiit at gmail.com Mon Feb 22 13:46:59 2010 From: mohitiiit at gmail.com (Mohit Goyal) Date: Tue, 23 Feb 2010 00:16:59 +0530 Subject: [SciPy-dev] Reg: google summer of code 2010 Message-ID: <9e78df6d1002221046i18b2ac2fu3415444e6bb0a851@mail.gmail.com> I am a masters student doing research in data engineering. I know python and have done projects on data mining and web mining past 3 years. I want to contribute to scipy via gsoc 2010. I am interested in working on data related projects in scipy like improving data sources and integrate it into all the numpy/scipy , adding data sets and examples to scipy, data analysis etc. Somebody suggest how to proceed or info about the concerned persons. Thank You. -- The Game is Not Over Because i haven't Won Yet. -Mohit Goyal -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at enthought.com Mon Feb 22 15:39:37 2010 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Mon, 22 Feb 2010 14:39:37 -0600 Subject: [SciPy-dev] 0.8 release date? In-Reply-To: <4B81DE5F.305@silveregg.co.jp> References: <4B816D29.3070704@enthought.com> <4B81DE5F.305@silveregg.co.jp> Message-ID: <4B82EB89.4060605@enthought.com> David Cournapeau wrote: > Warren Weckesser wrote: > >> Is there currently a target date (even a tentative one) for the release >> of 0.8? >> > > Not that I know of, but I think it would be good to have one. Definitely. > I would > also like to have a binary release for scipy which is compatible with > whatever version will be used for numpy based on 1.4.x branch, something > like: > - Scipy 0.8.0 and Numpy 2.0. > - Scipy 0.7.2 and Numpy 1.4.x > > This only concerns binaries - at the source level, I think both numpy > 1.4.x and 2.0 can be used against scipy trunk (at least the scipy trunk > as recent as last week works well with 1.4.x, that's what I am using for > my own work). > > The big question is whether we should release scipy 0.8.0 without caring > about py3k compatibility at all, in which case we could aim at a release > in April, and go into bug fixing mode only from now. > > "Bug fixing only from now" might be a bit short notice for people. So far, I've contributed just a few patches, but I plan on doing a lot more work over the next month or so--mostly in scipy.signal--and some of this will involve adding new functions and enhancing old ones. Here's a suggestion that provides some more time for enhancements: release 0.8 in time for the SciPy 2010 conference. The conference begins June 28. Suppose we work backwards from there. How much in advance of the conference should the final release be? Let's say four weeks, so we aim for a June 1 release. If the schedule on the "Milestone 0.8.0" web page (http://projects.scipy.org/scipy/milestone/0.8.0) is still reasonable, then the 0.8 branch freeze should be mid-May. That leaves about 12 weeks of development time. Considering that the "MakingReleases" wiki page says there will be a stable release every three months, I don't think anyone will complain that this is not enough time. :) However, some people may want the release sooner. What do you think? 
Warren From pav at iki.fi Mon Feb 22 16:18:10 2010 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 22 Feb 2010 23:18:10 +0200 Subject: [SciPy-dev] 0.8 release date? In-Reply-To: <4B81DE5F.305@silveregg.co.jp> References: <4B816D29.3070704@enthought.com> <4B81DE5F.305@silveregg.co.jp> Message-ID: <1266873490.5645.46.camel@idol> ma, 2010-02-22 kello 10:31 +0900, David Cournapeau kirjoitti: > Warren Weckesser wrote: > > Is there currently a target date (even a tentative one) for the release > > of 0.8? > > Not that I know of, but I think it would be good to have one. I would > also like to have a binary release for scipy which is compatible with > whatever version will be used for numpy based on 1.4.x branch, something > like: > - Scipy 0.8.0 and Numpy 2.0. > - Scipy 0.7.2 and Numpy 1.4.x > > This only concerns binaries - at the source level, I think both numpy > 1.4.x and 2.0 can be used against scipy trunk (at least the scipy trunk > as recent as last week works well with 1.4.x, that's what I am using for > my own work). Do we want even to release a separate 0.7.x? I'd say this is not very well spent work, unless (i) 0.8 will still take a long time, and (ii) there have been substantial bug fixes. Of course, it might be confusing to users to have separate 0.8.x binaries for Numpy 1.4 and 2.0. Still, I'd rather add a suffix to the version than have the minor number be different since that's also confusing... > The big question is whether we should release scipy 0.8.0 without caring > about py3k compatibility at all, in which case we could aim at a release > in April, and go into bug fixing mode only from now. I don't have an clear idea how much work it will take to port Scipy to Py3. I'm kind of hoping that it won't be too much: Scipy is mostly just using the Numpy APIs which haven't really changed. Also, apart from scipy.io, it mostly does not mess with strings too much. One issue is also whether f2py works at all on Py3. All of its 0 unit tests pass in Numpy SVN on Py3, so YMMV :) Pauli From dwf at cs.toronto.edu Mon Feb 22 17:03:52 2010 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Mon, 22 Feb 2010 17:03:52 -0500 Subject: [SciPy-dev] Reg: google summer of code 2010 In-Reply-To: <9e78df6d1002221046i18b2ac2fu3415444e6bb0a851@mail.gmail.com> References: <9e78df6d1002221046i18b2ac2fu3415444e6bb0a851@mail.gmail.com> Message-ID: <044343FA-45ED-4996-BCEA-07FEE7D56BD0@cs.toronto.edu> On 22-Feb-10, at 1:46 PM, Mohit Goyal wrote: > Somebody suggest how to proceed or info about the concerned persons. There were several replies to your last post, if you'd like to continue the conversation there. David From cournape at gmail.com Mon Feb 22 17:48:28 2010 From: cournape at gmail.com (David Cournapeau) Date: Tue, 23 Feb 2010 07:48:28 +0900 Subject: [SciPy-dev] 0.8 release date? In-Reply-To: <1266873490.5645.46.camel@idol> References: <4B816D29.3070704@enthought.com> <4B81DE5F.305@silveregg.co.jp> <1266873490.5645.46.camel@idol> Message-ID: <5b8d13221002221448p22571c1chb21a0470edac2ab9@mail.gmail.com> On Tue, Feb 23, 2010 at 6:18 AM, Pauli Virtanen wrote: > ma, 2010-02-22 kello 10:31 +0900, David Cournapeau kirjoitti: >> Warren Weckesser wrote: >> > Is there currently a target date (even a tentative one) for the release >> > of 0.8? >> >> Not that I know of, but I think it would be good to have one. I would >> also like to have a binary release for scipy which is compatible with >> whatever version will be used for numpy based on 1.4.x branch, something >> like: >> ? ? ? 
- Scipy 0.8.0 and Numpy 2.0. >> ? ? ? - Scipy 0.7.2 and Numpy 1.4.x >> >> This only concerns binaries - at the source level, I think both numpy >> 1.4.x and 2.0 can be used against scipy trunk (at least the scipy trunk >> as recent as last week works well with 1.4.x, that's what I am using for >> my own work). > > Do we want even to release a separate 0.7.x? I'd say this is not very > well spent work, unless (i) 0.8 will still take a long time, and (ii) > there have been substantial bug fixes. I think a 0.7.2 is worthwhile for two reasons: - it can work with numpy 1.3.0 as well - I think having two scipy binaries at the same version for two different numpy is too confusing. My experience is that many people will not look into any information, and just say it does not work. Versions are easier to explain. > > One issue is also whether f2py works at all on Py3. All of its 0 unit > tests pass in Numpy SVN on Py3, so YMMV :) Well, there is one way to know for sure: scipy could be considered as a test suite for f2py :) David From guyer at nist.gov Tue Feb 23 09:36:36 2010 From: guyer at nist.gov (Jonathan Guyer) Date: Tue, 23 Feb 2010 09:36:36 -0500 Subject: [SciPy-dev] 0.8 release date? In-Reply-To: <5b8d13221002221448p22571c1chb21a0470edac2ab9@mail.gmail.com> References: <4B816D29.3070704@enthought.com> <4B81DE5F.305@silveregg.co.jp> <1266873490.5645.46.camel@idol> <5b8d13221002221448p22571c1chb21a0470edac2ab9@mail.gmail.com> Message-ID: <961E3E70-C8B0-4E28-A0C4-C36FE92EF96B@nist.gov> On Feb 22, 2010, at 5:48 PM, David Cournapeau wrote: > - I think having two scipy binaries at the same version for two > different numpy is too confusing. My experience is that many people > will not look into any information, and just say it does not work. > Versions are easier to explain. +1 From stefan at sun.ac.za Tue Feb 23 12:24:08 2010 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 23 Feb 2010 19:24:08 +0200 Subject: [SciPy-dev] SciPy2010 Call for Papers Message-ID: <9457e7c81002230924g1d5be5bet36fe0aee6aa03886@mail.gmail.com> ========================== SciPy 2010 Call for Papers ========================== SciPy 2010, the 9th Python in Science Conference, will be held from June 28th - July 3rd, 2010 in Austin, Texas. At this conference, novel applications and breakthroughs made in the pursuit of science using Python are presented. Attended by leading figures from both academia and industry, it is an excellent opportunity to experience the cutting edge of scientific software development. The conference is preceded by two days of paid tutorials, during which community experts provide training on several scientific Python packages. We invite you to take part by submitting a talk abstract on the conference website at: http://conference.scipy.org Talk/Paper Submission ===================== We solicit talks and accompanying papers (either formal academic or magazine-style articles) that discuss topics regarding scientific computing using Python, including applications, teaching, development and research. Papers are included in the peer-reviewed conference proceedings, published online. Please note that submissions primarily aimed at the promotion of a commercial product or service will not be considered. 
Important dates for authors include: * 11 April: Talk abstracts due * 20 April: Notification of acceptance * 13 June: Papers due * 15 August: Publication of proceedings Further detail will be made available on http://conference.scipy.org Conference Dates ================ * Friday, 10 May: Early registration ends * Monday-Tuesday, 28-29 June: Tutorials * Wednesday-Thursday, June 30-July 1: Conference * Friday-Saturday, July 2-3: Coding Sprints Executive Committee =================== * Conference: Jarrod Millman & Eric Jones * Program: Stefan van der Walt & Ondrej Certik * Student Sponsorship: Travis Oliphant For more information on Python, visit http://www.python.org. From oliphant at enthought.com Tue Feb 23 18:46:02 2010 From: oliphant at enthought.com (Travis Oliphant) Date: Tue, 23 Feb 2010 17:46:02 -0600 Subject: [SciPy-dev] Fwd: [Info] Fwd: Reg: gsoc 2010 References: <83ae36ae1002231543va73547ej68503e90226c4f0c@mail.gmail.com> Message-ID: Here is a message we received. Who is handling GSOC mentoring this year? -Travis Begin forwarded message: > From: Aaron River > Date: February 23, 2010 5:43:45 PM CST > To: info at enthought.com > Subject: [Info] Fwd: Reg: gsoc 2010 > Reply-To: info at enthought.com > > -------- Original Message -------- > Subject: Reg: gsoc 2010 > Date: Mon, 22 Feb 2010 23:56:17 +0530 > From: Mohit Goyal > To: scipy-dev-owner at scipy.org > > I am a masters student doing research in data engineering. I know > python > and have done projects on data mining and web mining. > I want to contribute to scipy via gsoc 2010. I am interested in > working > on data related projects in scipy like improving data sources and > integrate it into all the numpy/scipy , adding data sets and > examples to > scipy, data analysis etc. Please tell me how to proceed. > > Thank You. > > -- > The Game is Not Over Because i haven't Won Yet. > -Mohit Goyal > _______________________________________________ > Info mailing list > Info at enthought.com > https://mail.enthought.com/mailman/listinfo/info -- Travis Oliphant Enthought Inc. 1-512-536-1057 http://www.enthought.com oliphant at enthought.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From bouloumag at gmail.com Tue Feb 23 20:26:02 2010 From: bouloumag at gmail.com (Darcoux Christine) Date: Tue, 23 Feb 2010 20:26:02 -0500 Subject: [SciPy-dev] implement reorthogonalization method in gmres Message-ID: <477ac73a1002231726h18b1rcd5f884e3c0cc036@mail.gmail.com> It seems that the vectors produced by the Gram-Schmidt process may not be orthogonal (probably due to the presence of roundoff errors or when the matrix is ill conditionned). I would suggest to add an option to test for loss of orthogonality and reorthogonalize if needed. See http://www4.ncsu.edu/~ctk/newton/SOLVERS/nsoli.m for an example. regards, Christine -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgmdevlist at gmail.com Tue Feb 23 22:33:26 2010 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 23 Feb 2010 22:33:26 -0500 Subject: [SciPy-dev] Fwd: [Info] Fwd: Reg: gsoc 2010 In-Reply-To: References: <83ae36ae1002231543va73547ej68503e90226c4f0c@mail.gmail.com> Message-ID: <13001F62-CB0D-4173-9C1B-26AB2FAC7F2D@gmail.com> On Feb 23, 2010, at 6:46 PM, Travis Oliphant wrote: > > Here is a message we received. Who is handling GSOC mentoring this year? Wow, that's some heavy carpet-emailing. The list already received a couple of emails from him. 
Interestingly, the second was just a copy of the first one we'd already answered... From dwf at cs.toronto.edu Tue Feb 23 23:41:42 2010 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Tue, 23 Feb 2010 23:41:42 -0500 Subject: [SciPy-dev] Fwd: [Info] Fwd: Reg: gsoc 2010 In-Reply-To: <13001F62-CB0D-4173-9C1B-26AB2FAC7F2D@gmail.com> References: <83ae36ae1002231543va73547ej68503e90226c4f0c@mail.gmail.com> <13001F62-CB0D-4173-9C1B-26AB2FAC7F2D@gmail.com> Message-ID: On 23-Feb-10, at 10:33 PM, Pierre GM wrote: > Wow, that's some heavy carpet-emailing. The list already received a > couple of emails from him. Interestingly, the second was just a copy > of the first one we'd already answered... I wonder if he realizes that replies will go to the list and he has to stay subscribed to get them? David From jsseabold at gmail.com Tue Feb 23 23:43:14 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Tue, 23 Feb 2010 23:43:14 -0500 Subject: [SciPy-dev] Fwd: [Info] Fwd: Reg: gsoc 2010 In-Reply-To: References: <83ae36ae1002231543va73547ej68503e90226c4f0c@mail.gmail.com> <13001F62-CB0D-4173-9C1B-26AB2FAC7F2D@gmail.com> Message-ID: On Tue, Feb 23, 2010 at 11:41 PM, David Warde-Farley wrote: > On 23-Feb-10, at 10:33 PM, Pierre GM wrote: > >> Wow, that's some heavy carpet-emailing. The list already received a >> couple of emails from him. Interestingly, the second was just a copy >> of the first one we'd already answered... > > I wonder if he realizes that replies will go to the list and he has to > stay subscribed to get them? > This was my thought as well. Do you have to be subscribed to post? Skipper From josef.pktd at gmail.com Tue Feb 23 23:53:20 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 23 Feb 2010 23:53:20 -0500 Subject: [SciPy-dev] Fwd: [Info] Fwd: Reg: gsoc 2010 In-Reply-To: References: <83ae36ae1002231543va73547ej68503e90226c4f0c@mail.gmail.com> <13001F62-CB0D-4173-9C1B-26AB2FAC7F2D@gmail.com> Message-ID: <1cd32cbb1002232053g2cb6ce93k960dacf99beeee5@mail.gmail.com> On Tue, Feb 23, 2010 at 11:43 PM, Skipper Seabold wrote: > On Tue, Feb 23, 2010 at 11:41 PM, David Warde-Farley wrote: >> On 23-Feb-10, at 10:33 PM, Pierre GM wrote: >> >>> Wow, that's some heavy carpet-emailing. The list already received a >>> couple of emails from him. Interestingly, the second was just a copy >>> of the first one we'd already answered... >> >> I wonder if he realizes that replies will go to the list and he has to >> stay subscribed to get them? >> > > This was my thought as well. ?Do you have to be subscribed to post? No, only signing up is required, subscription is not necessary to mail to the list from other places, e.g. google groups interface or ... But in these mailing list interfaces it's also possible/obvious to see the messages. Josef > > Skipper > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From pav+sp at iki.fi Wed Feb 24 07:07:55 2010 From: pav+sp at iki.fi (Pauli Virtanen) Date: Wed, 24 Feb 2010 12:07:55 +0000 (UTC) Subject: [SciPy-dev] implement reorthogonalization method in gmres References: <477ac73a1002231726h18b1rcd5f884e3c0cc036@mail.gmail.com> Message-ID: Tue, 23 Feb 2010 20:26:02 -0500, Darcoux Christine wrote: > It seems that the vectors produced by the Gram-Schmidt process may not > be orthogonal (probably due to the presence of roundoff errors or when > the matrix is ill conditionned). 
I would suggest to add an option to > test for loss of orthogonality and reorthogonalize if needed. See > http://www4.ncsu.edu/~ctk/newton/SOLVERS/nsoli.m for an example. From the literature, it appears that orthogonality of the Krylov vectors is not very important for GMRES convergence, see e.g. A. Greenbaum, M. Rozložník, Z. Strakoš, BIT Numerical Mathematics, 37, 706 (1997). http://www.springerlink.com/content/277614q38kj6q376/ The GMRES implementations in Scipy use the modified Gram-Schmidt procedure, so they should be OK in this respect. Do you have some evidence suggesting that reorthogonalization is a significant improvement for some problems? I'll accept patches providing such a flag in any case, though. Best regards, Pauli From vanforeest at gmail.com Wed Feb 24 17:43:44 2010 From: vanforeest at gmail.com (nicky van foreest) Date: Wed, 24 Feb 2010 23:43:44 +0100 Subject: [SciPy-dev] implement reorthogonalization method in gmres In-Reply-To: References: <477ac73a1002231726h18b1rcd5f884e3c0cc036@mail.gmail.com> Message-ID: Hi, I am not at all a specialist on this, but I recall that Numerical Recipes states that the Gram-Schmidt process gives terrible results, and should be avoided nearly always. Instead, singular value decomposition should be used. To avoid the problem below, would it be an idea to replace the GS algo by SVD? Nicky On 24 February 2010 13:07, Pauli Virtanen wrote: > Tue, 23 Feb 2010 20:26:02 -0500, Darcoux Christine wrote: >> It seems that the vectors produced by the Gram-Schmidt process may not >> be orthogonal (probably due to the presence of roundoff errors or when >> the matrix is ill conditionned). I would suggest to add an option to >> test for loss of orthogonality and reorthogonalize if needed. See >> http://www4.ncsu.edu/~ctk/newton/SOLVERS/nsoli.m for an example. > > From the literature, it appears that orthogonality of the Krylov vectors > is not very important for GMRES convergence, see e.g. > >     A. Greenbaum, M. Rozložník, Z. Strakoš >     BIT Numerical Mathematics, 37, 706 (1997). >     http://www.springerlink.com/content/277614q38kj6q376/ > > The GMRES implementations in Scipy use the modified Gram-Schmidt > procedure, so they should be OK in this respect. > > Do you have some evidence suggesting that reorthogonalization is a > significant improvement for some problems? I'll accept patches providing > such a flag in any case, though. > > Best regards, > Pauli > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From pav at iki.fi Wed Feb 24 19:26:40 2010 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 25 Feb 2010 02:26:40 +0200 Subject: [SciPy-dev] implement reorthogonalization method in gmres In-Reply-To: References: <477ac73a1002231726h18b1rcd5f884e3c0cc036@mail.gmail.com> Message-ID: <1267057599.24576.0.camel@idol> Wed, 2010-02-24 at 23:43 +0100, nicky van foreest wrote: > I am not at all a specialist on this, but I recall that Numerical > Recipes states that the Gram-Schmidt process gives terrible results, > and should be avoided nearly always. Instead, singular value > decomposition should be used. To avoid the problem below, would it be > an idea to replace the GS algo by SVD? One could also QR factorize using Householder transforms, which should be more accurate than GS, and is probably less expensive than SVD; or reorthogonalize. But as far as I understand, this would not really improve the performance of GMRES: as shown in the article I linked to (and also in others), the quality of the orthogonalization is not expected to matter for GMRES -- modified Gram-Schmidt is good enough. The point is that breakdown of orthogonality happens simultaneously with the solution reaching its maximal accuracy, so it will not essentially affect convergence or accuracy. I'd like to have a reference or a test case at hand stating otherwise before changing things here. As far as I understand, however, the situation with orthogonality is different if you want to solve eigenvalue problems with Arnoldi stuff. But that's in the domain of ARPACK, and I would trust without looking that the ARPACK people know what they are doing. Cheers, Pauli
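The loss of orthogonality under discussion is easy to demonstrate numerically. A minimal NumPy sketch comparing classical and modified Gram-Schmidt on an ill-conditioned matrix (helper names invented here; this is illustrative only, not the code path inside scipy's GMRES):

    import numpy as np

    def cgs(a):
        # Classical Gram-Schmidt: orthogonalize each column of `a`
        # against the already-computed basis vectors all at once.
        q = np.zeros_like(a)
        for k in range(a.shape[1]):
            v = a[:, k] - np.dot(q[:, :k], np.dot(q[:, :k].T, a[:, k]))
            q[:, k] = v / np.linalg.norm(v)
        return q

    def mgs(a):
        # Modified Gram-Schmidt: subtract one projection at a time,
        # always from the current, partially reduced vector.
        q = a.copy()
        for k in range(q.shape[1]):
            q[:, k] /= np.linalg.norm(q[:, k])
            for j in range(k + 1, q.shape[1]):
                q[:, j] -= np.dot(q[:, k], q[:, j]) * q[:, k]
        return q

    a = np.vander(np.linspace(0.0, 1.0, 12))   # badly conditioned on purpose
    for factorize in (cgs, mgs):
        q = factorize(a)
        err = np.linalg.norm(np.dot(q.T, q) - np.eye(12))
        print("%s: ||Q'Q - I|| = %.1e" % (factorize.__name__, err))

On a matrix this ill-conditioned the classical variant typically loses several more digits of orthogonality than the modified one, which is why MGS is the usual default in Krylov codes.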
From mohitiiit4u at gmail.com Thu Feb 25 03:01:43 2010 From: mohitiiit4u at gmail.com (Mohit Goyal) Date: Thu, 25 Feb 2010 13:31:43 +0530 Subject: [SciPy-dev] GSOC project discussion on improving datasource and its integration. Message-ID: I am masters students in data engineering and interested in contributing to scipy. I need some directions to how to start with it. I know Python and have worked on projects including data mining, web mining and web crawlers etc. I want to contribute to scipy by improving datasource and its integration: handling malformed URLs, proxy settings, and support for other file extensions. http://projects.scipy.org/scipy/wiki/SummerofCodeIdeas http://projects.scipy.org/scipy/numpy/browser/trunk/numpy/lib/_datasource.py Thank You. -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Thu Feb 25 10:45:14 2010 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 25 Feb 2010 09:45:14 -0600 Subject: [SciPy-dev] GSOC project discussion on improving datasource and its integration. In-Reply-To: References: Message-ID: <3d375d731002250745o593cd0eai6cf38f926185d530@mail.gmail.com> On Thu, Feb 25, 2010 at 02:01, Mohit Goyal wrote: > > I am masters students in data engineering and interested in contributing to > scipy. I need some directions to how to start with it. Yes. We know. We have seen your previous messages and some people have responded. Please go back to those threads and respond to those responses. Please do not post another introduction like this again. We appreciate your interest and enthusiasm, but we would also appreciate some improvement with how you communicate with the list. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From mohitiiit at gmail.com Thu Feb 25 12:22:44 2010 From: mohitiiit at gmail.com (Mohit Goyal) Date: Thu, 25 Feb 2010 22:52:44 +0530 Subject: [SciPy-dev] GSOC project discussion on improving datasource and its integration. In-Reply-To: <3d375d731002250745o593cd0eai6cf38f926185d530@mail.gmail.com> References: <3d375d731002250745o593cd0eai6cf38f926185d530@mail.gmail.com> Message-ID: <9e78df6d1002250922n339f180bu21a85d7e67e9efbe@mail.gmail.com> I am extremely sorry for inconvenience caused. I am subscribed to the list but i am not receiving any replies, mails etc from this list. That's why i sent it twice. Thank You. Best Regards, Mohit Goyal.
On Thu, Feb 25, 2010 at 9:15 PM, Robert Kern wrote: > On Thu, Feb 25, 2010 at 02:01, Mohit Goyal wrote: > > > > I am masters students in data engineering and interested in contributing > to > > scipy. I need some directions to how to start with it. > > Yes. We know. We have seen your previous messages and some people have > responded. Please go back to those threads and respond to those > responses. Please do not post another introduction like this again. We > appreciate your interest and enthusiasm, but we would also appreciate > some improvement with how you communicate with the list. > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > -- The Game is Not Over Because i haven't Won Yet. -Mohit Goyal -------------- next part -------------- An HTML attachment was scrubbed... URL: From smattacus at gmail.com Thu Feb 25 13:40:46 2010 From: smattacus at gmail.com (Sean Mattingly) Date: Thu, 25 Feb 2010 10:40:46 -0800 Subject: [SciPy-dev] GSOC project discussion on improving datasource and its integration. In-Reply-To: <9e78df6d1002250922n339f180bu21a85d7e67e9efbe@mail.gmail.com> References: <3d375d731002250745o593cd0eai6cf38f926185d530@mail.gmail.com> <9e78df6d1002250922n339f180bu21a85d7e67e9efbe@mail.gmail.com> Message-ID: <856175f81002251040v6f932a1cn72bf97294307a6dd@mail.gmail.com> This mailing list, like most others of its form, has an archive. This page has your original thread and the replies: http://mail.scipy.org/pipermail/scipy-dev/2010-February/thread.html Or, to go straight to your original post: http://mail.scipy.org/pipermail/scipy-dev/2010-February/013874.html You can click through the replies there. Finally, there's a few scipy mailing lists, so it's possible you're subscribed to a different one than you sent this to: http://www.scipy.org/Mailing_Lists Hope that helps. - Sean On Thu, Feb 25, 2010 at 9:22 AM, Mohit Goyal wrote: > > I am extremely sorry for inconvenience caused. I am subscribed to the list > but i am not receiving any replies, mails etc from this list. That's why i > sent it twice. > > Thank You. > > Best Regards, > Mohit Goyal. > > > > > On Thu, Feb 25, 2010 at 9:15 PM, Robert Kern wrote: > >> On Thu, Feb 25, 2010 at 02:01, Mohit Goyal wrote: >> > >> > I am masters students in data engineering and interested in contributing >> to >> > scipy. I need some directions to how to start with it. >> >> Yes. We know. We have seen your previous messages and some people have >> responded. Please go back to those threads and respond to those >> responses. Please do not post another introduction like this again. We >> appreciate your interest and enthusiasm, but we would also appreciate >> some improvement with how you communicate with the list. >> >> -- >> Robert Kern >> >> "I have come to believe that the whole world is an enigma, a harmless >> enigma that is made terrible by our own mad attempt to interpret it as >> though it had an underlying truth." >> -- Umberto Eco >> > > > > -- > The Game is Not Over Because i haven't Won Yet. > -Mohit Goyal > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bouloumag at gmail.com Thu Feb 25 14:16:39 2010 From: bouloumag at gmail.com (Darcoux Christine) Date: Thu, 25 Feb 2010 14:16:39 -0500 Subject: [SciPy-dev] implement reorthogonalization method in gmres In-Reply-To: <1267057599.24576.0.camel@idol> References: <477ac73a1002231726h18b1rcd5f884e3c0cc036@mail.gmail.com> <1267057599.24576.0.camel@idol> Message-ID: <477ac73a1002251116m16f43a32sbe02b3e65fd6a9b2@mail.gmail.com> Dear Pauli, Thank you for your quick answer with a pointer to the literature. The result in the paper you cite assumes that the matrix is well-conditioned, and I'm not sure I can count on that in a general-purpose code which many people will use. The books "Solving nonlinear equations with Newton's method" and "Iterative methods for linear and nonlinear equations" (some parts are available from Google Books) by C. T. Kelley give a good presentation on the importance of orthogonality. They also show a simple 3x3 example where paying attention to orthogonality is very important. Checking for orthogonality and reorthogonalizing the way the author of these books does it costs almost nothing (see the link to the code in my previous post). Matlab's "official" GMRES code uses Householder reflections to build the basis and gets orthogonality automatically, but at double the cost of Gram-Schmidt. The Sandia Trilinos code uses classical Gram-Schmidt for good parallel performance and reorthogonalizes at every iteration, so also costs double. The way proposed by Kelley costs nothing because the reorthogonalization happens rarely, so I do not see any reason not to do it. Of course, this could be added as an option so users will decide if they want to use it. Best Regards, Christine 2010/2/24, Pauli Virtanen : > Wed, 2010-02-24 at 23:43 +0100, nicky van foreest wrote: >> I am not at all a specialist on this, but I recall that Numerical >> Recipes states that the Gram-Schmidt process gives terrible results, >> and should be avoided nearly always. > > There's a balance here: more accurate methods tend to require more work, > and the orthogonalization is probably the most expensive part of GMRES. > So one should use a method that is good enough. > >> Instead, singular value >> decomposition should be used. To avoid the problem below, would it be >> an idea to replace the GS algo by SVD? > > One could also QR factorize using Householder transforms, which should > be more accurate than GS, and is probably less expensive than SVD; or > reorthogonalize. > > But as far as I understand, this would not really improve the > performance of GMRES: as shown in the article I linked to (and also in > others), the quality of the orthogonalization is not expected to matter > for GMRES -- modified Gram-Schmidt is good enough. The point is that > breakdown of orthogonality happens simultaneously with the solution > reaching its maximal accuracy, so it will not essentially affect > convergence or accuracy. I'd like to have a reference or a test case at > hand stating otherwise before changing things here. > > As far as I understand, however, the situation with orthogonality is > different if you want to solve eigenvalue problems with Arnoldi stuff. > But that's in the domain of ARPACK, and I would trust without looking > that the ARPACK people know what they are doing. > > Cheers, > Pauli > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev >
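The selective test from Kelley's nsoli.m that Christine refers to is only a few lines; a rough Python transcription of the idea (function and variable names invented here -- a sketch, not scipy code):

    import numpy as np

    def mgs_pass(basis, w):
        # One modified Gram-Schmidt sweep of w against the columns of basis.
        h = np.zeros(basis.shape[1])
        for j in range(basis.shape[1]):
            h[j] = np.dot(basis[:, j], w)
            w = w - h[j] * basis[:, j]
        return w, h

    def orthogonalize(basis, w):
        norm_before = np.linalg.norm(w)
        w, h = mgs_pass(basis, w)
        norm_after = np.linalg.norm(w)
        # Kelley's criterion: if the norm collapsed by roughly three
        # orders of magnitude, norm_before + 0.001*norm_after rounds to
        # norm_before in floating point -- severe cancellation occurred,
        # so do one more sweep and accumulate the coefficients.
        if norm_before + 0.001 * norm_after == norm_before:
            w, h_extra = mgs_pass(basis, w)
            h += h_extra
        return w, h

    # tiny smoke test
    basis = np.linalg.qr(np.random.randn(6, 3))[0]
    w, h = orthogonalize(basis, np.random.randn(6))
    print(np.max(np.abs(np.dot(basis.T, w))))   # ~ machine epsilon

Since the second sweep only fires when cancellation is detected, the average cost stays close to that of plain MGS, which is the "costs almost nothing" point above.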
From pav at iki.fi Thu Feb 25 15:18:51 2010 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 25 Feb 2010 22:18:51 +0200 Subject: [SciPy-dev] implement reorthogonalization method in gmres In-Reply-To: <477ac73a1002251116m16f43a32sbe02b3e65fd6a9b2@mail.gmail.com> References: <477ac73a1002231726h18b1rcd5f884e3c0cc036@mail.gmail.com> <1267057599.24576.0.camel@idol> <477ac73a1002251116m16f43a32sbe02b3e65fd6a9b2@mail.gmail.com> Message-ID: <1267129131.5836.37.camel@idol> Thu, 2010-02-25 at 14:16 -0500, Darcoux Christine wrote: > Thank you for your quick answer with a pointer to the literature. > > The result in the paper you cite assumes that the matrix is > well-conditioned, and I'm not sure I can count on that in a > general-purpose code which many people will use. Yes, it's a good point that the situation might not be the same for near-singular matrices. Perhaps someone looked into that... > The books "Solving nonlinear equations with Newton's method" and > "Iterative methods for linear and nonlinear equations" (some parts are > available from Google Books) by C. T. Kelley give a good presentation > on the importance of orthogonality. They also show a simple 3x3 example > where paying attention to orthogonality is very important. I note that the book is from ca. 1995, whereas the papers are newer. For the particular example in the book, [ 0.001 0 0; 0 0.0011 0; 0 0 1e4 ] MGS (which we currently have) required an additional step at the final stage of the convergence, as compared to GS + selective reorthogonalization. Up to the last step, the convergence histories were identical. > Checking for orthogonality and reorthogonalizing the way the author > of these books does it costs almost nothing (see the link to the code in > my previous post). Matlab's "official" GMRES code uses Householder > reflections to build the basis and gets orthogonality automatically, > but at double the cost of Gram-Schmidt. The Sandia Trilinos code uses > classical Gram-Schmidt for good parallel performance and > reorthogonalizes at every iteration, so also costs double. > The way proposed by Kelley costs nothing because the > reorthogonalization happens rarely, so I do not see any reason not to > do it. Of course, this could be added as an option so users will > decide if they want to use it. Well, as I said, if someone implements it + writes tests, it will get in. If you are not going to implement it, please submit an enhancement request at http://projects.scipy.org/scipy to keep track of the idea. I don't have the time to work on this right now, though, but later is possible. Out of interest: do you have an actual matrix at hand where you know that MGS + selective reorthogonalization will give significantly better results than plain MGS, or a reference that shows an example like that? -- Pauli Virtanen
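For anyone who wants to try that question on the 3x3 example quoted above, the matrix can be pushed through the existing solver directly; a sketch using scipy.sparse.linalg.gmres, where the callback is used only to count iterations (exactly what the callback receives varies between versions, so it is ignored here):

    import numpy as np
    from scipy.sparse.linalg import gmres

    a = np.diag([0.001, 0.0011, 1e4])   # the example matrix from the book
    b = np.ones(3)

    n_calls = [0]
    def count(arg):
        # Invoked once per iteration; the argument itself is not used.
        n_calls[0] += 1

    x, info = gmres(a, b, tol=1e-12, maxiter=50, callback=count)
    print("info=%d, callback calls=%d, residual=%.1e"
          % (info, n_calls[0], np.linalg.norm(b - np.dot(a, x))))

Comparing the iteration counts with and without a (hypothetical, not yet existing) reorthogonalization flag would be one way to produce the test case asked for.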
From josef.pktd at gmail.com Fri Feb 26 09:00:05 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 26 Feb 2010 09:00:05 -0500 Subject: [SciPy-dev] spam filter for scipy-tickets or scipy-tickets-bounces@scipy.org Message-ID: <1cd32cbb1002260600l464665cft1ec8128d8fdc03fd@mail.gmail.com> Is there a spam filter on the scipy ticket mailing list? Or am I the only one to get spam through it? Josef From rmay31 at gmail.com Fri Feb 26 09:39:22 2010 From: rmay31 at gmail.com (Ryan May) Date: Fri, 26 Feb 2010 08:39:22 -0600 Subject: [SciPy-dev] spam filter for scipy-tickets or scipy-tickets-bounces@scipy.org In-Reply-To: <1cd32cbb1002260600l464665cft1ec8128d8fdc03fd@mail.gmail.com> References: <1cd32cbb1002260600l464665cft1ec8128d8fdc03fd@mail.gmail.com> Message-ID: On Fri, Feb 26, 2010 at 8:00 AM, wrote: > Is there a spam filter on the scipy ticket mailing list? > > Or am I the only one to get spam through it? No, I've been seeing it too. On scipy-svn as well. Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma From ariver at enthought.com Fri Feb 26 17:38:39 2010 From: ariver at enthought.com (Aaron River) Date: Fri, 26 Feb 2010 16:38:39 -0600 Subject: [SciPy-Dev] [SciPy-dev] GSOC project discussion on improving datasource and its integration. In-Reply-To: <9e78df6d1002250922n339f180bu21a85d7e67e9efbe@mail.gmail.com> References: <3d375d731002250745o593cd0eai6cf38f926185d530@mail.gmail.com> <9e78df6d1002250922n339f180bu21a85d7e67e9efbe@mail.gmail.com> Message-ID: Hello Mohit, I'm the "IT-Guy" over here at Enthought, where these lists are hosted. In order to make sure we didn't have something breaking on our end that might be keeping you from getting emails from the list, I reviewed our logs and reconciled them with the list archives. All of your emails to the lists from the 17th up to today were sent from "mohitiiit", and all replies to those emails were sent on the same day, and to that same address. Your first and only email from "mohitiiit4u" was sent today to scipy-dev. You did receive Robert's reply to "mohitiiit4u", to which you replied. But, your reply came from "mohitiiit". I see you are now subscribed to scipy-dev with both of your addresses. Perhaps you are expecting the emails to be arriving at "mohitiiit4u"? I hope this helps resolve any issues you're having with the lists. Let me know if I can assist you any further. Thanks, -- Aaron River ariver at enthought.com On Thu, Feb 25, 2010 at 11:22, Mohit Goyal wrote: > > I am extremely sorry for inconvenience caused. I am subscribed to the list > but i am not receiving any replies, mails etc from this list. That's why i > sent it twice. > > Thank You. > > Best Regards, > Mohit Goyal. > > > > On Thu, Feb 25, 2010 at 9:15 PM, Robert Kern wrote: >> >> On Thu, Feb 25, 2010 at 02:01, Mohit Goyal wrote: >>> >>> I am masters students in data engineering and interested in contributing >>> to >>> scipy. I need some directions to how to start with it. >> >> Yes. We know. We have seen your previous messages and some people have >> responded. Please go back to those threads and respond to those >> responses. Please do not post another introduction like this again. We >> appreciate your interest and enthusiasm, but we would also appreciate
>> >> -- >> Robert Kern >> >> "I have come to believe that the whole world is an enigma, a harmless >> enigma that is made terrible by our own mad attempt to interpret it as >> though it had an underlying truth." >> -- Umberto Eco > > > > -- > The Game is Not Over Because i haven't Won Yet. > -Mohit Goyal > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > From ralf.gommers at googlemail.com Sat Feb 27 09:02:36 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 27 Feb 2010 22:02:36 +0800 Subject: [SciPy-Dev] build issue on OS X 10.6 - fails on fftpack Message-ID: After switching from Apple Python to the python.org version this is the first time I tried to build scipy. I see the same problem when building scipy trunk against numpy trunk or 0.7.x against numpy 1.3. Compilers: $ gcc --version i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5646) (dot 1) $ gcc-4.0 --version i686-apple-darwin10-gcc-4.0.1 (GCC) 4.0.1 (Apple Inc. build 5493) $ gfortran --version GNU Fortran (GCC) 4.2.3 (obtained from http://r.research.att.com/tools/) Python: $ python Python 2.6.4 (r264:75821M, Oct 27 2009, 19:48:32) [GCC 4.0.1 (Apple Inc. build 5493)] on darwin Below is the relevant part (I think) of the build output. The "missing required architecture x86_64 in file" complaints happen for numpy builds as well, and seem to be harmless there. Any ideas? Thanks, Ralf $ LDFLAGS="-lgfortran -arch x86_64" FFLAGS="-arch x86_64" python setup.py build Warning: No configuration returned, assuming unavailable.blas_opt_info: FOUND: extra_link_args = ['-Wl,-framework', '-Wl,Accelerate'] define_macros = [('NO_ATLAS_INFO', 3)] extra_compile_args = ['-faltivec', '-I/System/Library/Frameworks/vecLib.framework/Headers'] lapack_opt_info: FOUND: extra_link_args = ['-Wl,-framework', '-Wl,Accelerate'] define_macros = [('NO_ATLAS_INFO', 3)] extra_compile_args = ['-faltivec'] umfpack_info: libraries umfpack not found in /Library/Frameworks/Python.framework/Versions/2.6/lib libraries umfpack not found in /usr/local/lib libraries umfpack not found in /usr/lib /Users/rgommers/Code/numpy/numpy/distutils/system_info.py:459: UserWarning: UMFPACK sparse solver (http://www.cise.ufl.edu/research/sparse/umfpack/) not found. Directories to search for the libraries can be specified in the numpy/distutils/site.cfg file (section [umfpack]) or by setting the UMFPACK environment variable. 
warnings.warn(self.notfounderror.__doc__) NOT AVAILABLE running build building 'scipy.fftpack._fftpack' extension compiling C sources C compiler: gcc-4.0 -arch ppc -arch i386 -isysroot /Developer/SDKs/MacOSX10.4u.sdk -fno-strict-aliasing -fno-common -dynamic -DNDEBUG -g -O3 creating build/temp.macosx-10.3-fat-2.6/build creating build/temp.macosx-10.3-fat-2.6/build/src.macosx-10.3-fat-2.6 creating build/temp.macosx-10.3-fat-2.6/build/src.macosx-10.3-fat-2.6/scipy creating build/temp.macosx-10.3-fat-2.6/build/src.macosx-10.3-fat-2.6/scipy/fftpack creating build/temp.macosx-10.3-fat-2.6/build/src.macosx-10.3-fat-2.6/scipy/fftpack/src compile options: '-Iscipy/fftpack/src -Ibuild/src.macosx-10.3-fat-2.6 -I/Users/rgommers/Code/numpy/numpy/core/include -I/Library/Frameworks/Python.framework/Versions/2.6/include/python2.6 -c' gcc-4.0: scipy/fftpack/src/zrfft.c gcc-4.0: build/src.macosx-10.3-fat-2.6/scipy/fftpack/src/dct.c gcc-4.0: scipy/fftpack/src/zfftnd.c gcc-4.0: build/src.macosx-10.3-fat-2.6/fortranobject.c gcc-4.0: build/src.macosx-10.3-fat-2.6/scipy/fftpack/_fftpackmodule.c gcc-4.0: scipy/fftpack/src/drfft.c gcc-4.0: scipy/fftpack/src/zfft.c /usr/local/bin/gfortran -Wall -lgfortran -arch x86_64 build/temp.macosx-10.3-fat-2.6/build/src.macosx-10.3-fat-2.6/scipy/fftpack/_fftpackmodule.o build/temp.macosx-10.3-fat-2.6/scipy/fftpack/src/zfft.o build/temp.macosx-10.3-fat-2.6/scipy/fftpack/src/drfft.o build/temp.macosx-10.3-fat-2.6/scipy/fftpack/src/zrfft.o build/temp.macosx-10.3-fat-2.6/scipy/fftpack/src/zfftnd.o build/temp.macosx-10.3-fat-2.6/build/src.macosx-10.3-fat-2.6/scipy/fftpack/src/dct.o build/temp.macosx-10.3-fat-2.6/build/src.macosx-10.3-fat-2.6/fortranobject.o -L/usr/local/lib/gcc/i686-apple-darwin8/4.2.3/x86_64 -Lbuild/temp.macosx-10.3-fat-2.6 -ldfftpack -lfftpack -lgfortran -o build/lib.macosx-10.3-fat-2.6/scipy/fftpack/_fftpack.so ld: warning: in build/temp.macosx-10.3-fat-2.6/build/src.macosx-10.3-fat-2.6/scipy/fftpack/_fftpackmodule.o, missing required architecture x86_64 in file ld: warning: in build/temp.macosx-10.3-fat-2.6/scipy/fftpack/src/zfft.o, missing required architecture x86_64 in file ld: warning: in build/temp.macosx-10.3-fat-2.6/scipy/fftpack/src/drfft.o, missing required architecture x86_64 in file ld: warning: in build/temp.macosx-10.3-fat-2.6/scipy/fftpack/src/zrfft.o, missing required architecture x86_64 in file ld: warning: in build/temp.macosx-10.3-fat-2.6/scipy/fftpack/src/zfftnd.o, missing required architecture x86_64 in file ld: warning: in build/temp.macosx-10.3-fat-2.6/build/src.macosx-10.3-fat-2.6/scipy/fftpack/src/dct.o, missing required architecture x86_64 in file ld: warning: in build/temp.macosx-10.3-fat-2.6/build/src.macosx-10.3-fat-2.6/fortranobject.o, missing required architecture x86_64 in file Undefined symbols: "_MAIN__", referenced from: _main in libgfortranbegin.a(fmain.o) ld: symbol(s) not found collect2: ld returned 1 exit status ld: warning: in build/temp.macosx-10.3-fat-2.6/build/src.macosx-10.3-fat-2.6/scipy/fftpack/_fftpackmodule.o, missing required architecture x86_64 in file ld: warning: in build/temp.macosx-10.3-fat-2.6/scipy/fftpack/src/zfft.o, missing required architecture x86_64 in file ld: warning: in build/temp.macosx-10.3-fat-2.6/scipy/fftpack/src/drfft.o, missing required architecture x86_64 in file ld: warning: in build/temp.macosx-10.3-fat-2.6/scipy/fftpack/src/zrfft.o, missing required architecture x86_64 in file ld: warning: in build/temp.macosx-10.3-fat-2.6/scipy/fftpack/src/zfftnd.o, missing required architecture x86_64 
in file ld: warning: in build/temp.macosx-10.3-fat-2.6/build/src.macosx-10.3-fat-2.6/scipy/fftpack/src/dct.o, missing required architecture x86_64 in file ld: warning: in build/temp.macosx-10.3-fat-2.6/build/src.macosx-10.3-fat-2.6/fortranobject.o, missing required architecture x86_64 in file Undefined symbols: "_MAIN__", referenced from: _main in libgfortranbegin.a(fmain.o) ld: symbol(s) not found collect2: ld returned 1 exit status error: Command "/usr/local/bin/gfortran -Wall -lgfortran -arch x86_64 build/temp.macosx-10.3-fat-2.6/build/src.macosx-10.3-fat-2.6/scipy/fftpack/_fftpackmodule.o build/temp.macosx-10.3-fat-2.6/scipy/fftpack/src/zfft.o build/temp.macosx-10.3-fat-2.6/scipy/fftpack/src/drfft.o build/temp.macosx-10.3-fat-2.6/scipy/fftpack/src/zrfft.o build/temp.macosx-10.3-fat-2.6/scipy/fftpack/src/zfftnd.o build/temp.macosx-10.3-fat-2.6/build/src.macosx-10.3-fat-2.6/scipy/fftpack/src/dct.o build/temp.macosx-10.3-fat-2.6/build/src.macosx-10.3-fat-2.6/fortranobject.o -L/usr/local/lib/gcc/i686-apple-darwin8/4.2.3/x86_64 -Lbuild/temp.macosx-10.3-fat-2.6 -ldfftpack -lfftpack -lgfortran -o build/lib.macosx-10.3-fat-2.6/scipy/fftpack/_fftpack.so" failed with exit status 1 -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at silveregg.co.jp Sun Feb 28 20:50:29 2010 From: david at silveregg.co.jp (David Cournapeau) Date: Mon, 01 Mar 2010 10:50:29 +0900 Subject: [SciPy-Dev] build issue on OS X 10.6 - fails on fftpack In-Reply-To: References: Message-ID: <4B8B1D65.107@silveregg.co.jp> Ralf Gommers wrote: > $ LDFLAGS="-lgfortran -arch x86_64" FFLAGS="-arch x86_64" python Whenever you redefine one of the CFLAGS/FFLAGS/CXXFLAGS/LDFLAGS, it *overwrites* the default value (contrary to the usual behavior of having its value appended to the default value). here, by overwriting LDFLAGS, you remove all the link options necessary to build a bundle, and the linker believes you want to build an executable (hence the missing main symbol message, and all the unresolved symbols from libpython.dylib). You can see which flags to use in the scipy paver script (note in particular that the gfortran library should be linked statically, to avoid adding a dependency on gfortran for scipy binaries - I used to run the test suite in a Mac OS X chroot check this). David From ralf.gommers at googlemail.com Sun Feb 28 21:56:26 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 1 Mar 2010 10:56:26 +0800 Subject: [SciPy-Dev] build issue on OS X 10.6 - fails on fftpack In-Reply-To: <4B8B1D65.107@silveregg.co.jp> References: <4B8B1D65.107@silveregg.co.jp> Message-ID: On Mon, Mar 1, 2010 at 9:50 AM, David Cournapeau wrote: > Ralf Gommers wrote: > > > $ LDFLAGS="-lgfortran -arch x86_64" FFLAGS="-arch x86_64" python > > Whenever you redefine one of the CFLAGS/FFLAGS/CXXFLAGS/LDFLAGS, it > *overwrites* the default value (contrary to the usual behavior of having > its value appended to the default value). > I followed the instructions under "Obtaining and Building NumPy and SciPy on this page: http://www.scipy.org/Installing_SciPy/Mac_OS_X . There it says that the above command *extends* the build flags. So that info is wrong? Or maybe I'm still not understanding something. > > here, by overwriting LDFLAGS, you remove all the link options necessary > to build a bundle, and the linker believes you want to build an > executable (hence the missing main symbol message, and all the > unresolved symbols from libpython.dylib). 
> To be clear, I'm just trying to build scipy on my own machine, so I can test the numpy binary installer against it. I'm not trying to build a scipy installer. Although figuring out how to do that would be good as well. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at silveregg.co.jp Sun Feb 28 22:03:01 2010 From: david at silveregg.co.jp (David Cournapeau) Date: Mon, 01 Mar 2010 12:03:01 +0900 Subject: [SciPy-Dev] build issue on OS X 10.6 - fails on fftpack In-Reply-To: References: <4B8B1D65.107@silveregg.co.jp> Message-ID: <4B8B2E65.1030708@silveregg.co.jp> Ralf Gommers wrote: > > > On Mon, Mar 1, 2010 at 9:50 AM, David Cournapeau > wrote: > > Ralf Gommers wrote: > > > $ LDFLAGS="-lgfortran -arch x86_64" FFLAGS="-arch x86_64" python > > Whenever you redefine one of the CFLAGS/FFLAGS/CXXFLAGS/LDFLAGS, it > *overwrites* the default value (contrary to the usual behavior of having > its value appended to the default value). > > > I followed the instructions under "Obtaining and Building NumPy and > SciPy on this page: http://www.scipy.org/Installing_SciPy/Mac_OS_X . > There it says that the above command *extends* the build flags. So that > info is wrong? Or maybe I'm still not understanding something. Both are true :) If you build with numscons, then the flags are extended. If you build with distutils, the flags are overriden. The wiki could be made clearer there. > > > here, by overwriting LDFLAGS, you remove all the link options necessary > to build a bundle, and the linker believes you want to build an > executable (hence the missing main symbol message, and all the > unresolved symbols from libpython.dylib). > > > To be clear, I'm just trying to build scipy on my own machine The flag issue is a distutils issue, and happens whenever you call setup.py. You should not try to edit flags if you build with distutils, really - the overriding support implies that it should be used as last resort, and otherwise, everything should be handled within distutils. cheers, David