From ralf.gommers at gmail.com Fri Jan 2 10:09:58 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Fri, 2 Jan 2015 16:09:58 +0100 Subject: [SciPy-Dev] adding exponential window to signal Message-ID: Hi, https://github.com/scipy/scipy/pull/4348 proposes to add a new window function to scipy.signal, this one to be exact: https://en.wikipedia.org/wiki/Window_function#Exponential_or_Poisson_window It seems fairly straightforward, but if there are users of such a window function then please have a look that the API is what you'd expect. Any other comments also welcome of course. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From andyfaff at gmail.com Wed Jan 7 20:57:04 2015 From: andyfaff at gmail.com (Andrew Nelson) Date: Thu, 8 Jan 2015 12:57:04 +1100 Subject: [SciPy-Dev] Halting optimize.minimize Message-ID: Dear devs, often one might want to halt a minimization, especially if the minimization is taking a long time. This could be done by raising an Exception in the objective function or callback function. e.g. def callback(xk): if something_is_not_to_our_liking(xk): raise HaltError('we wanted to stop') However, on halting one would not get the best solution found so far. Could the optimize.minimize functions be adapted to stop a minimizer? This could be done in a couple of ways: 1) the minimizer could catch something like a HaltError, raised either in the objective function or in the callback function. When it's caught the minimizer should return the best solution so far.. I'm not sure how this would work with the internals of each minimizer. 2) the callback could return False if the minimizer has to stop, or True to keep going. Again, I'm not sure how this would work with the internals of each minimizer. I'd also like to be able to halt a 'leastsq' optimization, in a similar fashion. What would be the best way of going about this? cheers, Andrew. -- _____________________________________ Dr. Andrew Nelson _____________________________________ -------------- next part -------------- An HTML attachment was scrubbed... URL: From rmcgibbo at gmail.com Wed Jan 7 21:33:23 2015 From: rmcgibbo at gmail.com (Robert McGibbon) Date: Wed, 7 Jan 2015 18:33:23 -0800 Subject: [SciPy-Dev] Halting optimize.minimize In-Reply-To: References: Message-ID: +1 for something like this. In some of my applications, I want to remove variables from the optimization when they get sufficiently close to zero, so some control flow is needed to halt the optimization and then restart it with a different (reduced) set of independent variables. -Robert On Wed, Jan 7, 2015 at 5:57 PM, Andrew Nelson wrote: > Dear devs, > often one might want to halt a minimization, especially if the > minimization is taking a long time. > This could be done by raising an Exception in the objective function or > callback function. e.g. > > def callback(xk): > if something_is_not_to_our_liking(xk): > raise HaltError('we wanted to stop') > > However, on halting one would not get the best solution found so far. > Could the optimize.minimize functions be adapted to stop a minimizer? This > could be done in a couple of ways: > 1) the minimizer could catch something like a HaltError, raised either in > the objective function or in the callback function. When it's caught the > minimizer should return the best solution so far.. I'm not sure how this > would work with the internals of each minimizer. > 2) the callback could return False if the minimizer has to stop, or True > to keep going. 
Again, I'm not sure how this would work with the internals > of each minimizer. > > I'd also like to be able to halt a 'leastsq' optimization, in a similar > fashion. What would be the best way of going about this? > > cheers, > Andrew. > > > > -- > _____________________________________ > Dr. Andrew Nelson > > > _____________________________________ > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From andyfaff at gmail.com Thu Jan 8 06:59:47 2015 From: andyfaff at gmail.com (Andrew Nelson) Date: Thu, 8 Jan 2015 22:59:47 +1100 Subject: [SciPy-Dev] Halting optimize.minimize In-Reply-To: References: Message-ID: I submitted https://github.com/scipy/scipy/pull/4384, to achieve this. If a callback function returns True the minimization halts. However, it won't work for TNC minimizer as the callback is called from somewhere in the Fortran and I know no Fortran (thank goodness). On 8 January 2015 at 13:33, Robert McGibbon wrote: > +1 for something like this. In some of my applications, I want to remove > variables from the optimization when they get sufficiently close to zero, > so some control flow is needed to halt the optimization and then restart it > with a different (reduced) set of independent variables. > > -Robert > > On Wed, Jan 7, 2015 at 5:57 PM, Andrew Nelson wrote: > >> Dear devs, >> often one might want to halt a minimization, especially if the >> minimization is taking a long time. >> This could be done by raising an Exception in the objective function or >> callback function. e.g. >> >> def callback(xk): >> if something_is_not_to_our_liking(xk): >> raise HaltError('we wanted to stop') >> >> However, on halting one would not get the best solution found so far. >> Could the optimize.minimize functions be adapted to stop a minimizer? This >> could be done in a couple of ways: >> 1) the minimizer could catch something like a HaltError, raised either in >> the objective function or in the callback function. When it's caught the >> minimizer should return the best solution so far.. I'm not sure how this >> would work with the internals of each minimizer. >> 2) the callback could return False if the minimizer has to stop, or True >> to keep going. Again, I'm not sure how this would work with the internals >> of each minimizer. >> >> I'd also like to be able to halt a 'leastsq' optimization, in a similar >> fashion. What would be the best way of going about this? >> >> cheers, >> Andrew. >> >> >> >> -- >> _____________________________________ >> Dr. Andrew Nelson >> >> >> _____________________________________ >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -- _____________________________________ Dr. Andrew Nelson _____________________________________ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From newville at cars.uchicago.edu Thu Jan 8 14:34:27 2015 From: newville at cars.uchicago.edu (Matt Newville) Date: Thu, 8 Jan 2015 13:34:27 -0600 Subject: [SciPy-Dev] SciPy-Dev Digest, Vol 135, Issue 2 In-Reply-To: References: Message-ID: Hi Andrew, leastsq() can be halted by having your objective function return None. I think that doesn't work for the scalar minimizers, though I didn't try them all. Returning None from the objective function is not very fine-grained (you wouldn't know why it happened), but it seems easier to have the wrappers around the Fortran code look for a NULL return value than for a HaltError exception. It seems like a sensible, if crude, way to say "Stop now", and might be worth considering for all the minimizing functions. --Matt -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Sun Jan 11 12:50:47 2015 From: pav at iki.fi (Pauli Virtanen) Date: Sun, 11 Jan 2015 19:50:47 +0200 Subject: [SciPy-Dev] ANN: Scipy 0.15.0 release Message-ID: <54B2B7F7.4030708@iki.fi> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Dear all, We are pleased to announce the Scipy 0.15.0 release. The 0.15.0 release contains bugfixes and new features, the most important of which are mentioned in the excerpt from the release notes below. Source tarballs, binaries, and full release notes are available at https://sourceforge.net/projects/scipy/files/scipy/0.15.0/ Best regards, Pauli Virtanen ========================== SciPy 0.15.0 Release Notes ========================== SciPy 0.15.0 is the culmination of 6 months of hard work. It contains several new features, numerous bug-fixes, improved test coverage and better documentation. There have been a number of deprecations and API changes in this release, which are documented below. All users are encouraged to upgrade to this release, as there are a large number of bug-fixes and optimizations. Moreover, our development attention will now shift to bug-fix releases on the 0.15.x branch, and on adding new features on the master branch. This release requires Python 2.6, 2.7 or 3.2-3.4 and NumPy 1.5.1 or greater. New features ============ Linear Programming Interface - ---------------------------- The new function `scipy.optimize.linprog` provides a generic linear programming interface similar to the way `scipy.optimize.minimize` provides a generic interface to nonlinear programming optimizers. Currently the only method supported is *simplex*, which provides a two-phase, dense-matrix-based simplex algorithm. Callback functions are supported, allowing the user to monitor the progress of the algorithm. Differential evolution, a global optimizer - ------------------------------------------ A new `scipy.optimize.differential_evolution` function has been added to the ``optimize`` module. Differential Evolution is an algorithm used for finding the global minimum of multivariate functions. It is stochastic in nature (does not use gradient methods), and can search large areas of candidate space, but often requires larger numbers of function evaluations than conventional gradient-based techniques. ``scipy.signal`` improvements - ----------------------------- The function `scipy.signal.max_len_seq` was added, which computes a Maximum Length Sequence (MLS) signal. ``scipy.integrate`` improvements - -------------------------------- It is now possible to use `scipy.integrate` routines to integrate multivariate ctypes functions, thus avoiding callbacks to Python and providing better performance.
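A minimal sketch of the ctypes usage described above, assuming a compiled shared library testlib.so that exports double f(int n, double *xx), which is one of the signatures the quad ctypes interface accepts; the library name and symbol are made up for illustration:

import ctypes
from scipy import integrate

# Hypothetical compiled C library exporting:
#     double f(int n, double *xx)
# where xx[0] is the integration variable and xx[1:] hold extra arguments.
lib = ctypes.CDLL('./testlib.so')
lib.f.restype = ctypes.c_double
lib.f.argtypes = (ctypes.c_int, ctypes.POINTER(ctypes.c_double))

# Extra arguments are forwarded through args exactly as for a Python
# callable; quad invokes the C function directly, with no Python callback.
result, abserr = integrate.quad(lib.f, 0.0, 1.0, args=(2.0,))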
``scipy.linalg`` improvements - ----------------------------- The function `scipy.linalg.orthogonal_procrustes` for solving the Procrustes linear algebra problem was added. BLAS level 2 functions ``her``, ``syr``, ``her2`` and ``syr2`` are now wrapped in ``scipy.linalg``. ``scipy.sparse`` improvements - ----------------------------- `scipy.sparse.linalg.svds` can now take a ``LinearOperator`` as its main input. ``scipy.special`` improvements - ------------------------------ Values of ellipsoidal harmonic (i.e. Lamé) functions and associated normalization constants can now be computed using ``ellip_harm``, ``ellip_harm_2``, and ``ellip_normal``. New convenience functions ``entr``, ``rel_entr``, ``kl_div``, ``huber``, and ``pseudo_huber`` were added. ``scipy.sparse.csgraph`` improvements - ------------------------------------- Routines ``reverse_cuthill_mckee`` and ``maximum_bipartite_matching`` for computing reorderings of sparse graphs were added. ``scipy.stats`` improvements - ---------------------------- Added a Dirichlet multivariate distribution, `scipy.stats.dirichlet`. The new function `scipy.stats.median_test` computes Mood's median test. The new function `scipy.stats.combine_pvalues` implements Fisher's and Stouffer's methods for combining p-values. `scipy.stats.describe` returns a namedtuple rather than a tuple, allowing users to access results by index or by name. Deprecated features =================== The `scipy.weave` module is deprecated. It was the only module never ported to Python 3.x, and is not recommended to be used for new code - use Cython instead. In order to support existing code, ``scipy.weave`` has been packaged separately: https://github.com/scipy/weave. It is a pure Python package, and can easily be installed with ``pip install weave``. `scipy.special.bessel_diff_formula` is deprecated. It is a private function, and therefore will be removed from the public API in a following release. ``scipy.stats.nanmean``, ``nanmedian`` and ``nanstd`` functions are deprecated in favor of their numpy equivalents. Backwards incompatible changes ============================== scipy.ndimage - ------------- The functions `scipy.ndimage.minimum_positions`, `scipy.ndimage.maximum_positions` and `scipy.ndimage.extrema` return positions as ints instead of floats. scipy.integrate - --------------- The format of banded Jacobians in `scipy.integrate.ode` solvers is changed. Note that the previous documentation of this feature was erroneous. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iEYEARECAAYFAlSyt/cACgkQ6BQxb7O0pWA8SACfXmpUsJcXT5espj71OYpeaj5b JJwAoL10ud3q1f51A5Ij4lgqMeZGnHlj =ZmOl -----END PGP SIGNATURE----- From pierre.haessig at crans.org Tue Jan 13 07:20:42 2015 From: pierre.haessig at crans.org (Pierre Haessig) Date: Tue, 13 Jan 2015 13:20:42 +0100 Subject: [SciPy-Dev] error in docstring of pdist Message-ID: <54B50D9A.5000904@crans.org> Hello, It's my first use of the distance function from scipy.spatial, and I wonder if there is an error in the docstring of pdist: http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.spatial.distance.pdist.html (same thing for cdist) [...] Returns ------- Y : ndarray Returns a condensed distance matrix Y.
For each :math:`i` and :math:`j` (where :math:`i < j < n`), the metric ``dist(u=X[i], v=X[j])`` is computed and stored in entry ``ij``. [...] Is this indeed the entry i*j ? Because the docstring of squareform says : [...] The X[i, j] and X[j, i] values are set to v[{n \choose 2}-{n-i \choose 2} + (j-u-1)] [...] best, Pierre Subject: [SciPy-Dev] error in docstring of pdist In-Reply-To: References: <54B50D9A.5000904@crans.org> Message-ID: Hi Pierre, On Tue, Jan 13, 2015 at 1:20 PM, Pierre Haessig wrote: > Hello, > > It's my first use of the distance function from scipy.spatial, and I > wonder if there is an error in the docstring of pdist: > > > http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.spatial.distance.pdist.html > (same thing for cdist) > > [...] > Returns > ------- > Y : ndarray > Returns a condensed distance matrix Y. For > each :math:`i` and :math:`j` (where :math:`i < j < n`), the > metric ``dist(u=X[i], v=X[j])`` is computed and stored in entry > ``ij``. > [...] > > Is this indeed the entry i*j ? Because the docstring of squareform says : > I don't think that i*j is meant here but rather the typical mathematical matrix notation, in tex $X_{ij}$. I believe that you're right and this is also incorrect, i.e., it should be something along the lines of what you quote from the squareform documentation: > > [...] > The X[i, j] and X[j, i] values are set to v[{n \choose 2}-{n-i \choose > 2} + (j-u-1)] > [...] > I do fail to see what `u` is in this context, however. > > best, > Pierre > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > So I think the docs should be improved. Care to submit a PR? Cheers, Moritz -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre.haessig at crans.org Tue Jan 13 13:24:01 2015 From: pierre.haessig at crans.org (Pierre Haessig) Date: Tue, 13 Jan 2015 19:24:01 +0100 Subject: [SciPy-Dev] error in docstring of pdist In-Reply-To: References: <54B50D9A.5000904@crans.org> Message-ID: <54B562C1.8020908@crans.org> Hi Moritz, Thanks for the feedback. The Matlab doc for the equivalent pdist function is indeed clearer: http://www.mathworks.com/help/stats/pdist.html I believe you're right: the [ij] means [i,j] and this is wrong. It should be more like the complex formula given in the squareform help (by the way, Matlab gives an easier-to-read formula "D((i-1)*(m-i/2)+j-i)". I should check whether it's equivalent after converting to 0-based indexing...) best, Pierre -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericq at caltech.edu Sat Jan 17 23:51:15 2015 From: ericq at caltech.edu (Eric Quintero) Date: Sat, 17 Jan 2015 20:51:15 -0800 Subject: [SciPy-Dev] CSD in scipy.signal? Message-ID: <25502383-0F7C-45ED-A432-BF8436340462@caltech.edu> Hi all, I would like to write cross-spectral density (and, by extension, coherence) methods for scipy.signal, as a complement to Welch's PSD method already there. I haven't contributed to scipy before, and the website encourages discussion of new features on this list, so here I am. In addition, when computing PSDs in my field, we sometimes prefer to take the median of the segments, rather than the mean, in order to reduce the effect of transients in the data when trying to evaluate a stationary noise floor. I would like to add this option as a kwarg. Thanks for your time, Eric Q.
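A rough sketch of the median-of-segments idea, built only from existing scipy.signal pieces; the helper name median_psd is made up, and real code would need the detrending, scaling and overlap options that welch already exposes:

import numpy as np
from scipy import signal

def median_psd(x, fs, nperseg=256):
    # Split the data into 50%-overlapping segments, as Welch's method does.
    step = nperseg // 2
    psds = []
    for start in range(0, len(x) - nperseg + 1, step):
        f, pxx = signal.periodogram(x[start:start + nperseg], fs,
                                    window='hann')
        psds.append(pxx)
    # Median across segments instead of the mean: a single transient
    # corrupts only a few segments, so it barely moves the estimate.
    return f, np.median(psds, axis=0)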
From alex.grigorievskiy at gmail.com Sun Jan 18 08:51:02 2015 From: alex.grigorievskiy at gmail.com (Alexander Grigorievskiy) Date: Sun, 18 Jan 2015 15:51:02 +0200 Subject: [SciPy-Dev] Least-Squares Linear Solver ( scipy.linalg.lstsq ) not optimal Message-ID: <54BBBA46.2070802@gmail.com> Hello Scipy, I have been investigating least-squares solvers recently, and I found out that the current implementation inside scipy.linalg.lstsq is not optimal. Of course, I may be wrong, but here are my arguments: 1) Internally it calls the LAPACK function "gelss" which uses SVD (as written in the LAPACK documentation). There are, however, two more routines which solve the same problem: "gelsy" (uses complete orthogonal factorization) and "gelsd" (uses a divide-and-conquer SVD algorithm). In the LAPACK documentation it is written that "gelsd" works faster than "gelss". http://www.netlib.org/lapack/lug/node27.html 2) In Numpy there is the same function "lstsq" and they call "gelsd" there. 3) I ran my own tests analyzing the speed of four functions: "gelss" (the current one used in scipy.linalg.lstsq) "gelsy" (present in LAPACK, uses complete orthogonal factorization) "gelsd" (present in LAPACK, uses divide-and-conquer SVD) numpy.lstsq (uses "gelsd" from lapack_lite) I have imported the missing functions (from LAPACK) in Scipy by including them into the file scipy/linalg/flapack.pyf.src, recompiling scipy, and creating almost the same function as "lstsq" but calling different LAPACK functions. The file with the test results is in the attachment. Based on this, the fastest method is "gelsy", shortly followed by "gelsd". I still want to run a couple more tests regarding how fast the methods scale if you change only one dimension of the matrix, or you can propose some other tests. I also monitored accuracies, but they are almost the same; I can send the plot if anyone is interested. I would like to propose to change the current implementation to call "gelsy" or "gelsd" from LAPACK, along with corresponding modifications. Best Regards, Alexander Grigorevskiy, PhD student, Aalto University. -------------- next part -------------- A non-text attachment was scrubbed... Name: ls_test_1_speeds.png Type: image/png Size: 51036 bytes Desc: not available URL: From josef.pktd at gmail.com Sun Jan 18 09:10:54 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 18 Jan 2015 09:10:54 -0500 Subject: [SciPy-Dev] Least-Squares Linear Solver ( scipy.linalg.lstsq ) not optimal In-Reply-To: <54BBBA46.2070802@gmail.com> References: <54BBBA46.2070802@gmail.com> Message-ID: On Sun, Jan 18, 2015 at 8:51 AM, Alexander Grigorievskiy < alex.grigorievskiy at gmail.com> wrote: > Hello Scipy, > > I have been elaborating least-squares solvers recently I found out that > the current implementation > which is inside scipy.linalg.lstsq is not optimal. Of course, I may be > wrong but here are my arguments: > > 1) Intrinsically it calls the LAPACK function "gelss" which uses SVD (as > written in the LAPACK documentation). > There are, however, two more routines which solve the same problem > "gelsy" (uses complete orthogonal factorization) > and "gelsd" (uses divide-and-conquer SVD algorithm). In the LAPACK > documentation it is written that "gelsd" work faster then "gelss". > http://www.netlib.org/lapack/lug/node27.html > > 2) In Numpy there is the same function "lstsq" and they call "gelsd" there.
> > 3) I run my own tests analyzing the speed of four functions: > "gelss" (the current one used in scipy.linalg.lstsq) > "gelsy" (present in LAPACK uses complete orthogonal factorization) > "gelsd" (present in LAPACK uses divide-and-conquer SVD) > numpy.lstsq ( uses "gelsd" from lapack_lite) > > I have imported the missing functions (from LAPACK) in Scipy by > including them into the file scipy/linalg/flapack.pyf.src, > recompiling scipy, and creating almost the same function as "lstsq" but > calling different LAPACK functions. > > The file with the test results is in the attachment. > Based on this the fastest method is "gelsy" shortly followed by "gelsd". > I want still to run couple more tests regarding how > fast the methods scale if you change only one dimension of the matrix, > or you can propose some other tests. > I also monitored accuracies but they are almost the same, I can send the > plot if anyone is interested. > > I would like to propose to change the current implementation to call > "gelsy" or "gelsd" from LAPACK along with corresponding modifications. > In terms of backwards compatibility, it is also necessary to check the behavior in bad cases, for example singular matrices and ill-conditioned and near singular cases. For the latter the NIST cases would provide a good check. I don't know how well the current lstsq is doing since I never checked. Using scipy pinv (SVD based ?) for linear regression works pretty well in those cases. Josef > > Best Regards, > Alexander Grigorevskiy, PhD student, Aalto University. > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Sun Jan 18 14:22:54 2015 From: pav at iki.fi (Pauli Virtanen) Date: Sun, 18 Jan 2015 21:22:54 +0200 Subject: [SciPy-Dev] ANN: Scipy 0.15.1 Message-ID: <54BC080E.7040109@iki.fi> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Dear all, We are pleased to announce the Scipy 0.15.1 release. Scipy 0.15.1 contains only bugfixes. The module ``scipy.linalg.calc_lwork`` removed in Scipy 0.15.0 is restored. This module is not a part of Scipy's public API, and although it is available again in Scipy 0.15.1, using it is deprecated and it may be removed again in a future Scipy release. Source tarballs, binaries, and full release notes are available at https://sourceforge.net/projects/scipy/files/scipy/0.15.1/ Best regards, Pauli Virtanen ========================== SciPy 0.15.1 Release Notes ========================== SciPy 0.15.1 is a bug-fix release with no new features compared to 0.15.0. Issues fixed - ------------ * `#4413 `__: BUG: Tests too strict, f2py doesn't have to overwrite this array * `#4417 `__: BLD: avoid using NPY_API_VERSION to check not using deprecated... 
* `#4418 `__: Restore and deprecate scipy.linalg.calc_work -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iEYEARECAAYFAlS8CA4ACgkQ6BQxb7O0pWCmOQCgzg9AXDaqRaK5/QBWopIrv2OA WkEAn0ltDfDHFpw0zMzB9mUscAAb2xnE =JrGj -----END PGP SIGNATURE----- From sturla.molden at gmail.com Sun Jan 18 17:50:17 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Sun, 18 Jan 2015 23:50:17 +0100 Subject: [SciPy-Dev] Least-Squares Linear Solver ( scipy.linalg.lstsq ) not optimal In-Reply-To: <54BBBA46.2070802@gmail.com> References: <54BBBA46.2070802@gmail.com> Message-ID: The "fastest" lapack least-squares solver (apart from using Cholesky, which is even faster), is *gels which uses QR or LQ factorization. If the data is rank n x p in C order *gels can fit least squares with LQ, which thus avoids the transpose to n x p in Fortran order. *gelss et al. can only work with data rank n x p in Fortran order. Which SVD-based solver is the faster depends on the hardware and the LAPACK library. Correctness also beats speed. It is wrong to say that the fastest least-squares solver is the better, in which case we should be using Cholesky factorization. Sturla On 18/01/15 14:51, Alexander Grigorievskiy wrote: > Hello Scipy, > > I have been elaborating least-squares solvers recently I found out that > the current implementation > which is inside scipy.linalg.lstsq is not optimal. Of course, I may be > wrong but here are my arguments: > > 1) Intrinsically it calls the LAPACK function "gelss" which uses SVD (as > written in the LAPACK documentation). > There are, however, two more routines which solve the same problem > "gelsy" (uses complete orthogonal factorization) > and "gelsd" (uses divide-and-conquer SVD algorithm). In the LAPACK > documentation it is written that "gelsd" work faster then "gelss". > http://www.netlib.org/lapack/lug/node27.html > > 2) In Numpy there is the same function "lstsq" and they call "gelsd" there. > > 3) I run my own tests analyzing the speed of four functions: > "gelss" (the current one used in scipy.linalg.lstsq) > "gelsy" (present in LAPACK uses complete orthogonal factorization) > "gelsd" (present in LAPACK uses divide-and-conquer SVD) > numpy.lstsq ( uses "gelsd" from lapack_lite) > > I have imported the missing functions (from LAPACK) in Scipy by > including them into the file scipy/linalg/flapack.pyf.src, > recompiling scipy, and creating almost the same function as "lstsq" but > calling different LAPACK functions. > > The file with the test results is in the attachment. > Based on this the fastest method is "gelsy" shortly followed by "gelsd". > I want still to run couple more tests regarding how > fast the methods scale if you change only one dimension of the matrix, > or you can propose some other tests. > I also monitored accuracies but they are almost the same, I can send the > plot if anyone is interested. > > I would like to propose to change the current implementation to call > "gelsy" or "gelsd" from LAPACK along with corresponding modifications. > > Best Regards, > Alexander Grigorevskiy, PhD student, Aalto University. 
> > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From sturla.molden at gmail.com Sun Jan 18 18:17:00 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Mon, 19 Jan 2015 00:17:00 +0100 Subject: [SciPy-Dev] Least-Squares Linear Solver ( scipy.linalg.lstsq ) not optimal In-Reply-To: References: <54BBBA46.2070802@gmail.com> Message-ID: On 18/01/15 15:10, josef.pktd at gmail.com wrote: > I don't know how well the current lstsq is doing since I never checked. > Using scipy pinv (SVD based ?) for linear regression works pretty well > in those cases. pinv uses SVD, but there is a big difference: You are refering to using SVD on the scatter matrix X'X, whereas SciPy uses SVD on the data matrix X for the least-squares solver. Directly forming the scatter matrix X'X should be avoided. The condition number of X'X is the square of the condition number of X. The only thing you achieve by forming X'X is accumulating rounding errors. And if you need pinv because X'X is too ill-conditioned, you could probably still get away with using QR or LQ on X becuause it might not be. The only thing you have achieved is reduced numerical accuracy. Also if you use pinv to deal with ill-conditioning, it is close to brain-dead to compute X'X. SciPy takes the safest and most accurat approach, which is SVD on the data matrix X to fit the least-squares solution (which is what *gelss does internally). Sturla From josef.pktd at gmail.com Sun Jan 18 18:40:26 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 18 Jan 2015 18:40:26 -0500 Subject: [SciPy-Dev] Least-Squares Linear Solver ( scipy.linalg.lstsq ) not optimal In-Reply-To: References: <54BBBA46.2070802@gmail.com> Message-ID: On Sun, Jan 18, 2015 at 6:17 PM, Sturla Molden wrote: > On 18/01/15 15:10, josef.pktd at gmail.com wrote: > > > > I don't know how well the current lstsq is doing since I never checked. > > Using scipy pinv (SVD based ?) for linear regression works pretty well > > in those cases. > > pinv uses SVD, but there is a big difference: You are refering to using > SVD on the scatter matrix X'X, whereas SciPy uses SVD on the data matrix > X for the least-squares solver. > No, I meant svd(x), see below. > > Directly forming the scatter matrix X'X should be avoided. The condition > number of X'X is the square of the condition number of X. The only thing > you achieve by forming X'X is accumulating rounding errors. > And if you need pinv because X'X is too ill-conditioned, you could > probably still get away with using QR or LQ on X becuause it might not > be. The only thing you have achieved is reduced numerical accuracy. > > Also if you use pinv to deal with ill-conditioning, it is close to > brain-dead to compute X'X. > > SciPy takes the safest and most accurat approach, which is SVD on the > data matrix X to fit the least-squares solution (which is what *gelss > does internally). > brain-dead ? (numerical mafia?) Numerically significant precision problems doesn't mean that they are "statistically" important. :) If you have to worry about numerical precision in statistical analysis, then (most of the time) you are screwed already much earlier and you better rethink your choice of models or statistical method. inv(x'x) is perfectly fine, the warning in the old statistical literature starts at a condition number of 30 ! (I think I picked a condition number of a few 1000 to print the warning in statsmodels.) 
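Sturla's conditioning point is easy to check numerically: in the 2-norm, cond(X'X) is exactly cond(X) squared for full-rank X. A small illustration (the near-collinear column below is artificial, just to make the effect visible):

import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(100, 5)
X[:, 4] = X[:, 3] + 1e-6 * rng.rand(100)  # two nearly collinear columns

print(np.linalg.cond(X))           # large condition number
print(np.linalg.cond(X.T.dot(X)))  # roughly the square of the above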
However, statsmodels uses pinv(x) which uses svd(x), and allows by default for singular x. The alternative method uses QR which fails on singular x. np.linalg.pinv is doing reasonably ok on the toughest NIST problem, but it's a piece of cake if we standardize the x values beforehand. scipy.linalg.pinv was doing a little bit better at default settings. Josef > > > Sturla > > > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alex.grigorievskiy at gmail.com Mon Jan 19 03:53:47 2015 From: alex.grigorievskiy at gmail.com (Alexander Grigorievskiy) Date: Mon, 19 Jan 2015 10:53:47 +0200 Subject: [SciPy-Dev] Least-Squares Linear Solver ( scipy.linalg.lstsq ) not optimal In-Reply-To: References: <54BBBA46.2070802@gmail.com> Message-ID: <54BCC61B.7070501@gmail.com> Hi Sturla, Josef Thanks for your comments. On 01/19/2015 12:50 AM, Sturla Molden wrote: > The "fastest" lapack least-squares solver (apart from using Cholesky, > which is even faster), is *gels which uses QR or LQ factorization. If > the data is rank n x p in C order *gels can fit least squares with LQ, > which thus avoids the transpose to n x p in Fortran order. > > *gelss et al. can only work with data rank n x p in Fortran order. > > Which SVD-based solver is the faster depends on the hardware and the > LAPACK library. > > Correctness also beats speed. It is wrong to say that the fastest > least-squares solver is the better, in which case we should be using > Cholesky factorization. > > > Sturla > In the LAPACK documentation it is explicitly written that "gels" can solve only full rank problems. "gelss", "gelsd" ,"gelsy" can solve not full rank, but both "gelsy" and "gelsd" are faster than "gelss" (currently used), however they might require some more space. http://www.netlib.org/lapack/lug/node27.html My experiments show the same. By the way were you able to see the attachment in my first letter? In the experiments I solved the system Ax=b, where A is (m*n). I varied m, I took n = 2/3*m, and rank r=1/2*m. Then I found x, measured time and compared solutions (ny max norm) between difference methods to monitor the accuracy. I agree with the fact the the speed may depend on hardware and LAPACK library. But I use the standard modern computer and standard LAPACK, which is probably the most frequent use case for SciPy. I guess the goal of SciPy is not to optimize for ScaLAPACK, MAGMA and so forth. I have also monitored the accuracy, as I wrote it is practically the same for all methods < 2*10e-16. I apply the graph of the accuracies to this letter (sorry for the legend it is doubled), where the max-norm difference between various methods is shown. > brain-dead ? (numerical mafia?) > > Numerically significant precision problems doesn't mean that they are > "statistically" important. > :) > > If you have to worry about numerical precision in statistical > analysis, then (most of the time) you are screwed already much earlier > and you better rethink your choice of models or statistical method. > > inv(x'x) is perfectly fine, the warning in the old statistical > literature starts at a condition number of 30 ! > (I think I picked a condition number of a few 1000 to print the > warning in statsmodels.) > > > However, statsmodels uses pinv(x) which uses svd(x), and allows by > default for singular x. 
The alternative method uses QR which fails on > singular x. > > np.linalg.pinv is doing reasonably ok on the toughest NIST problem, > but it's a piece of cake if we standardize the x values beforehand. > scipy.linalg.pinv was doing a little bit better at default settings. > > > Josef > Josef, I think you can solve the least-squares by SVD and pseudo-inverse, but this is not a direct solution. So, first you find the SVD, then pseudo-inverse and then solve least-squares, rather then directly solving least-squares (although maybe by similar methods). I am not saying that this is wrong, but then what for there are separate functions to solve least-squares in LAPACK, SciPy and NumPy? My desire is to improve the diretc least-square solver lstsq by calling the appropriate LAPACK function. Best regards, Alexander Grigorevskiy, PhD student, Aalto University -------------- next part -------------- A non-text attachment was scrubbed... Name: ls_test_1_accuracies.png Type: image/png Size: 69600 bytes Desc: not available URL: From josef.pktd at gmail.com Mon Jan 19 08:51:12 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 19 Jan 2015 08:51:12 -0500 Subject: [SciPy-Dev] Least-Squares Linear Solver ( scipy.linalg.lstsq ) not optimal In-Reply-To: <54BCC61B.7070501@gmail.com> References: <54BBBA46.2070802@gmail.com> <54BCC61B.7070501@gmail.com> Message-ID: On Mon, Jan 19, 2015 at 3:53 AM, Alexander Grigorievskiy < alex.grigorievskiy at gmail.com> wrote: > Hi Sturla, Josef > > Thanks for your comments. > > On 01/19/2015 12:50 AM, Sturla Molden wrote: > > The "fastest" lapack least-squares solver (apart from using Cholesky, > > which is even faster), is *gels which uses QR or LQ factorization. If > > the data is rank n x p in C order *gels can fit least squares with LQ, > > which thus avoids the transpose to n x p in Fortran order. > > > > *gelss et al. can only work with data rank n x p in Fortran order. > > > > Which SVD-based solver is the faster depends on the hardware and the > > LAPACK library. > > > > Correctness also beats speed. It is wrong to say that the fastest > > least-squares solver is the better, in which case we should be using > > Cholesky factorization. > > > > > > Sturla > > > > In the LAPACK documentation it is explicitly written that "gels" can > solve only full rank problems. "gelss", "gelsd" ,"gelsy" can solve not > full rank, > but both "gelsy" and "gelsd" are faster than "gelss" (currently used), > however they might require > some more space. > http://www.netlib.org/lapack/lug/node27.html > > My experiments show the same. By the way were you able to see the > attachment > in my first letter? > > In the experiments I solved the system Ax=b, where A is (m*n). I varied > m, I took n = 2/3*m, and rank r=1/2*m. > Then I found x, measured time and compared solutions (ny max norm) > between difference methods to monitor the accuracy. > I didn't read carefully enough and missed the rank part in the bottom of the graph. The speed improvements do look impressive, and the Lapack node seems to indicate that it is a more modern version for solving this. Our most common case would be fixed number of columns and increasing number of rows. > > I agree with the fact the the speed may depend on hardware and LAPACK > library. But I use > the standard modern computer and standard LAPACK, which is probably the > most > frequent use case for SciPy. I guess the goal of SciPy is not to > optimize for ScaLAPACK, MAGMA and so forth. 
> > I have also monitored the accuracy, as I wrote it is practically the > same for all methods < 2*10e-16. > I apply the graph of the accuracies to this letter (sorry for the legend > it is doubled), where the max-norm difference > between various methods is shown. > > > brain-dead ? (numerical mafia?) > > > > Numerically significant precision problems doesn't mean that they are > > "statistically" important. > > :) > > > > If you have to worry about numerical precision in statistical > > analysis, then (most of the time) you are screwed already much earlier > > and you better rethink your choice of models or statistical method. > > > > inv(x'x) is perfectly fine, the warning in the old statistical > > literature starts at a condition number of 30 ! > > (I think I picked a condition number of a few 1000 to print the > > warning in statsmodels.) > > > > > > However, statsmodels uses pinv(x) which uses svd(x), and allows by > > default for singular x. The alternative method uses QR which fails on > > singular x. > > > > np.linalg.pinv is doing reasonably ok on the toughest NIST problem, > > but it's a piece of cake if we standardize the x values beforehand. > > scipy.linalg.pinv was doing a little bit better at default settings. > > > > > > Josef > > > > Josef, I think you can solve the least-squares by SVD and > pseudo-inverse, but this is not a direct solution. > So, first you find the SVD, then pseudo-inverse and then solve > least-squares, rather then directly solving least-squares > (although maybe by similar methods). I am not saying that this is wrong, > but then what for there are separate functions > to solve least-squares in LAPACK, SciPy and NumPy? > My desire is to improve the diretc least-square solver lstsq by calling > the appropriate LAPACK function. > This was mainly an aside in my initial message to illustrate that I checked the behavior of an svd version in the singular and near-singular case, and it is relatively easy to understand what regularization is used in those cases. Without checking I wouldn't know what the "complete orthogonal factorization" in gelsy is doing. another aside which is not really relevant for the issue In statsmodels, in the main use case OLS we also need additional results like inv(x'x) (or a numerically stable version of it) and rank, and even pinv is already a bit high-level for this and using raw svd might save us some calculations. In other cases we want to solve for different right hand sides which we might not know in advance, pinv is my or our preferred solution. The least common case is straightforward linalg.lstsq where we only need the solution to one set of linear equations. We also use np.linalg.solve (which according to the docstring used dgesv) for simple inv(a) dot b problems in matrix equations. Overall, I think more benchmarks and evaluating different Lapack solutions like you did is useful. (Especially for users like my who have only a rough idea about the linalg jungle, i.e. myriad possibilities to do the "same" thing.) I suggest you submit a PR to scipy and the linalg developers/maintainers will look at the details. Josef > > Best regards, > Alexander Grigorevskiy, PhD student, Aalto University > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sturla.molden at gmail.com Mon Jan 19 13:23:15 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Mon, 19 Jan 2015 18:23:15 +0000 (UTC) Subject: [SciPy-Dev] Least-Squares Linear Solver ( scipy.linalg.lstsq ) not optimal References: <54BBBA46.2070802@gmail.com> Message-ID: <947197713443384373.749165sturla.molden-gmail.com@news.gmane.org> wrote: > Numerically significant precision problems doesn't mean that they are > "statistically" important. scipy.linalg.lstsq is not only for statistics. It is a linear algebra solver. Sturla From ralf.gommers at gmail.com Mon Jan 19 15:42:20 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 19 Jan 2015 21:42:20 +0100 Subject: [SciPy-Dev] Least-Squares Linear Solver ( scipy.linalg.lstsq ) not optimal In-Reply-To: References: <54BBBA46.2070802@gmail.com> <54BCC61B.7070501@gmail.com> Message-ID: On Mon, Jan 19, 2015 at 2:51 PM, wrote: > I suggest you submit a PR to scipy and the linalg developers/maintainers > will look at the details. > +1 from me on this. I actually suggest to split it into 2 PRs: one that wraps the LAPACK functions (which is quick to review), and one for the change to lstsq which requires more careful review and testing. Nice performance comparison by the way. Adding that to the PR summary would also be useful. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon Jan 19 15:55:14 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 19 Jan 2015 15:55:14 -0500 Subject: [SciPy-Dev] Least-Squares Linear Solver ( scipy.linalg.lstsq ) not optimal In-Reply-To: References: <54BBBA46.2070802@gmail.com> <54BCC61B.7070501@gmail.com> Message-ID: On Mon, Jan 19, 2015 at 3:42 PM, Ralf Gommers wrote: > > > On Mon, Jan 19, 2015 at 2:51 PM, wrote: > > > >> I suggest you submit a PR to scipy and the linalg developers/maintainers >> will look at the details. >> > > +1 from me on this. I actually suggest to split it into 2 PRs: one that > wraps the LAPACK functions (which is quick to review), and one for the > change to lstsq which requires more careful review and testing. > > Nice performance comparison by the way. Adding that to the PR summary > would also be useful. > Also, the performance in the first plot looks so good that, IMO, this should go into scipy.linalg even if it is/were not a substitute for the current lstsq, it could always be a lstsq2. Josef > > Cheers, > Ralf > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Mon Jan 19 16:38:37 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Mon, 19 Jan 2015 22:38:37 +0100 Subject: [SciPy-Dev] Least-Squares Linear Solver ( scipy.linalg.lstsq ) not optimal In-Reply-To: References: <54BBBA46.2070802@gmail.com> <54BCC61B.7070501@gmail.com> Message-ID: On 19/01/15 21:55, josef.pktd at gmail.com wrote: > Also, the performance in the first plot looks so good that, IMO, this > should go into scipy.linalg even if it is/were not a substitute for the > current lstsq, it could always be a lstsq2. One could use a keyword argument to select the LAPACK driver. Personally I also think SciPy should expose *GGGLM because it solves the generalized least squares problem. 
Sturla From njs at pobox.com Mon Jan 19 18:20:51 2015 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 19 Jan 2015 23:20:51 +0000 Subject: [SciPy-Dev] Least-Squares Linear Solver ( scipy.linalg.lstsq ) not optimal In-Reply-To: References: <54BBBA46.2070802@gmail.com> <54BCC61B.7070501@gmail.com> Message-ID: On Mon, Jan 19, 2015 at 8:55 PM, wrote: > > On Mon, Jan 19, 2015 at 3:42 PM, Ralf Gommers > wrote: >> >> >> >> On Mon, Jan 19, 2015 at 2:51 PM, wrote: >> >> >>> >>> I suggest you submit a PR to scipy and the linalg developers/maintainers >>> will look at the details. >> >> >> +1 from me on this. I actually suggest to split it into 2 PRs: one that >> wraps the LAPACK functions (which is quick to review), and one for the >> change to lstsq which requires more careful review and testing. >> >> Nice performance comparison by the way. Adding that to the PR summary >> would also be useful. > > > Also, the performance in the first plot looks so good that, IMO, this should > go into scipy.linalg even if it is/were not a substitute for the current > lstsq, it could always be a lstsq2. Obviously this should be available, but does switching lstsq even matter? IIRC lstsq is going to be pretty slow regardless of what core algorithm it uses, b/c of the issue with returning residuals. (Plus it returns singular values, which I don't know if these other algorithms provide.) -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From sturla.molden at gmail.com Mon Jan 19 19:12:07 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 20 Jan 2015 01:12:07 +0100 Subject: [SciPy-Dev] Least-Squares Linear Solver ( scipy.linalg.lstsq ) not optimal In-Reply-To: References: <54BBBA46.2070802@gmail.com> <54BCC61B.7070501@gmail.com> Message-ID: On 20/01/15 00:20, Nathaniel Smith wrote: > Obviously this should be available, but does switching lstsq even > matter? IIRC lstsq is going to be pretty slow regardless of what core > algorithm it uses, b/c of the issue with returning residuals. (Plus it > returns singular values, which I don't know if these other algorithms > provide.) *gelsd returns singular values. It does the same as *gelss but uses a divide-and-conquer algorithm. *gelsd was designed to be a faster alternative to *gelss on vector processors like Cray C-90. Presumably *gelsd might also be more efficient on Intel processors with SIMD instructions like SSE* and AVX. *gelsx and *gelsy are QR-based alternatives to *gels and do not return singular values. Unlike *gels they require the condition number (RCOND) as input, which in my opinion limits their usability. AFAIK the main difference between *gelsx and *gelsy is the BLAS level. In summary: *gels is normally useful as a faster alternative to *gelss, i.e. whenever X is positive definite, which is usually the case. *gels can also handle p x n data without transposition to n x p. *gelsd is a SIMD-optimized alternative to *gelss. *gelsx is useful as a faster alternative to *gels if we know RCOND in advance. *gelsy is useful as a faster alternative to *gely if we can benefit from a higher BLAS level. *ggglm is similar to *gels but solves the generalized least squares problem. It is by far the slowest of the least-squares solvers in LAPACK. 
Sturla From sturla.molden at gmail.com Mon Jan 19 19:24:18 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 20 Jan 2015 01:24:18 +0100 Subject: [SciPy-Dev] Least-Squares Linear Solver ( scipy.linalg.lstsq ) not optimal In-Reply-To: References: <54BBBA46.2070802@gmail.com> <54BCC61B.7070501@gmail.com> Message-ID: On 20/01/15 00:20, Nathaniel Smith wrote: > Obviously this should be available, but does switching lstsq even > matter? IIRC lstsq is going to be pretty slow regardless of what core > algorithm it uses We should also consider that calling lstsq from Python is speed-limited by allocation of a temporary work array and data transposition (in f2py). This overhead will not go away if we use a different lapack driver underneath. Sturla From alex.grigorievskiy at gmail.com Tue Jan 20 04:56:17 2015 From: alex.grigorievskiy at gmail.com (Alexander Grigorievskiy) Date: Tue, 20 Jan 2015 11:56:17 +0200 Subject: [SciPy-Dev] Least-Squares Linear Solver ( scipy.linalg.lstsq ) not optimal In-Reply-To: References: <54BBBA46.2070802@gmail.com> <54BCC61B.7070501@gmail.com> Message-ID: <54BE2641.4090203@gmail.com> Hi everyone, I think that: 1) "gels" is not applicable here since it requires a full (column or row) rank matrix as input, which might not be the case for lstsq. 2) "gelsx" is substituted by "gelsy" and is kept only for backward compatibility. http://www.netlib.org/lapack/lug/node27.html 3) Indeed, "gelsy" requires the condition number RCOND. Actually, currently I pass -1 as the condition number by analogy with "gelss", where it is assumed that machine precision is used if (RCOND < 0). Now I checked the documentation and found out that "gelsy" does not have this default behavior (although experiments show that it works ok). However, I think we can pass machine precision from Python if the user did not provide the condition number. I need to test this a bit more. It is still the fastest algorithm in most cases. 4) "gelsd" is the faster version of "gelss" (used currently). The interface is the same. 5) I am not touching "ggglm" here. If I have time I can export it to SciPy as well; I agree that it is better to have it. 6) > We should also consider that calling lstsq from Python is > speed-limited by allocation of a temporary work array and data > transposition (in f2py). This overhead will not go away if we use a > different lapack driver underneath. In my tests this time is already taken into account, and it is approximately the same for all the drivers. So, I think what I am going to do is create 2 pull requests, as was proposed before. In the first one I will create SciPy wrappers for "gelsd" and "gelsy". The latter I need to test a little bit more because of the revealed issue with the condition number. In the second one I am going to modify lstsq so that the default call is to "gelsd". Why "gelsd"? Because it is more similar to the current one, "gelss", i.e. they both use SVD, and it also returns singular values. So there is going to be some continuity. Also, I am going to add a text parameter to the function by which you could select an alternative driver: "gelsy" and "gelss". "gelsy" is useful when you are really interested in speed, and "gelss", as written in the LAPACK docs, consumes less memory than "gelsd", which also might be relevant sometimes. The parameter is going to have a default value, so the existing calls to lstsq are not affected. I think this more or less takes all the remarks and comments into account.
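To make the proposed driver selection concrete, a sketch of how the keyword could look; the parameter name lapack_driver is illustrative rather than settled, and the body glosses over the workspace queries, rcond handling and driver-specific outputs a real implementation needs:

from scipy.linalg import get_lapack_funcs

def lstsq(a, b, cond=None, lapack_driver='gelsd'):
    # Illustrative dispatch only: 'gelsd' and 'gelsy' are not wrapped in
    # scipy 0.15, so this assumes the wrappers from the first pull request.
    drivers = ('gelsd', 'gelsy', 'gelss')
    if lapack_driver not in drivers:
        raise ValueError('lapack_driver must be one of %s' % (drivers,))
    lapack_func, = get_lapack_funcs((lapack_driver,), (a, b))
    # A real implementation would query lwork, pass cond/rcond, and unpack
    # the driver-specific outputs (x, residues, rank, singular values).
    return lapack_func(a, b)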
P.S I have conducted couple more experiments in which I varied only one dimension of the matrix: m or n. The results are in the attachment. Regards, Alex Grigorevskiy, PhD student, Aalto University. On 01/20/2015 02:12 AM, Sturla Molden wrote: > On 20/01/15 00:20, Nathaniel Smith wrote: > >> Obviously this should be available, but does switching lstsq even >> matter? IIRC lstsq is going to be pretty slow regardless of what core >> algorithm it uses, b/c of the issue with returning residuals. (Plus it >> returns singular values, which I don't know if these other algorithms >> provide.) > *gelsd returns singular values. It does the same as *gelss but uses a > divide-and-conquer algorithm. *gelsd was designed to be a faster > alternative to *gelss on vector processors like Cray C-90. Presumably > *gelsd might also be more efficient on Intel processors with SIMD > instructions like SSE* and AVX. > > *gelsx and *gelsy are QR-based alternatives to *gels and do not return > singular values. Unlike *gels they require the condition number (RCOND) > as input, which in my opinion limits their usability. AFAIK the main > difference between *gelsx and *gelsy is the BLAS level. > > > In summary: > > *gels is normally useful as a faster alternative to *gelss, i.e. > whenever X is positive definite, which is usually the case. > > *gels can also handle p x n data without transposition to n x p. > > *gelsd is a SIMD-optimized alternative to *gelss. > > *gelsx is useful as a faster alternative to *gels if we know RCOND in > advance. > > *gelsy is useful as a faster alternative to *gely if we can benefit from > a higher BLAS level. > > *ggglm is similar to *gels but solves the generalized least squares > problem. It is by far the slowest of the least-squares solvers in LAPACK. > > > Sturla > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev -------------- next part -------------- A non-text attachment was scrubbed... Name: ls_test_2_speeds.png Type: image/png Size: 50560 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ls_test_3_speeds.png Type: image/png Size: 52865 bytes Desc: not available URL: From josef.pktd at gmail.com Tue Jan 20 08:56:55 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 20 Jan 2015 08:56:55 -0500 Subject: [SciPy-Dev] Least-Squares Linear Solver ( scipy.linalg.lstsq ) not optimal In-Reply-To: <54BE2641.4090203@gmail.com> References: <54BBBA46.2070802@gmail.com> <54BCC61B.7070501@gmail.com> <54BE2641.4090203@gmail.com> Message-ID: On Tue, Jan 20, 2015 at 4:56 AM, Alexander Grigorievskiy < alex.grigorievskiy at gmail.com> wrote: > Hi, everyone > > I think that > > 1) "gels" is not applicable here since it requires the full (column or > row) rank matrix as input > which might not be the case for lstsq. > > 2) "gelsx" is substituted by "gelsy" and is kept only for backward > compatibility. > http://www.netlib.org/lapack/lug/node27.html > > 3) Indeed, "gelsy" requires the condition number RCOND. Actually, currently > I pass -1 as the condition number by analogy with "gelss" where it > is assumed > that machine precision is used if (RCOND < 0). Now I checked the > documentation > and found out that "gelsy" does not have this default behavior > (although experiments show that it works ok) > However, I think we can pass machine precision form python if the > user did not provided the > condition number. 
I need to test is a bit more. > It is still the fastest algorithm in most cases. > > 4) "gelsd" is the faster version of "gelss"(used currently). The > interface is the same. > > 5) I am touching "ggglm" here. If I have time I can export it to SciPy > as well. I agree that it is better to have it > > 6) > > We should also consider that calling lstsq from Python is also > > speed-limited by allocation of a temporary work array and data > > transposition (in f2py). This overhead will not go away if we use a > > different lapack driver underneath. > In my test this time is already taken into account, and it is > approximately the same > for all the drivers. > > > So, I think what I am going to do is > > Create 2 Pull Requests as was proposed before. > In the first one I will create SciPy wrappers for "gelsd" and "gelsy". > The latter I need to test I little bit more > because of the revealed issue with condition number. > > In the second one I am going to modify lstsq where the default call is > to "gelsd". Why "gelsd"? > Because it is more similar to the current one "gelss" i.e. they both use > SVD, and it also > returns singular values. So, there is going to be some succession or > continuity. > Also, I am going to add a text parameter to the function by which you > could select an alternative > driver: "gelsy" and "gelss". "gelsy" is useful when you really > interested in speed, and "gelss" > as written in LAPACK docs consumes less memory then "gelsd" which also > might be relevant sometimes. > The parameter is going to have a default value, so the existing calls to > lstsq are not affected. > > I think those more or less takes all the remarks and comments into account. > sounds good to me, as a user > > P.S I have conducted couple more experiments in which I varied only one > dimension of the matrix: m or n. > The results are in the attachment. > Can you add another case for m increasing and n, k fixed with larger m to see the different trend behavior in the first graph? e.g. m increasing to at least 100,000, n fixed at 500 or less, and rank fixed closer to n. (assuming I didn't mix up m or n) Josef > > Regards, > Alex Grigorevskiy, PhD student, Aalto University. > > > On 01/20/2015 02:12 AM, Sturla Molden wrote: > > On 20/01/15 00:20, Nathaniel Smith wrote: > > > >> Obviously this should be available, but does switching lstsq even > >> matter? IIRC lstsq is going to be pretty slow regardless of what core > >> algorithm it uses, b/c of the issue with returning residuals. (Plus it > >> returns singular values, which I don't know if these other algorithms > >> provide.) > > *gelsd returns singular values. It does the same as *gelss but uses a > > divide-and-conquer algorithm. *gelsd was designed to be a faster > > alternative to *gelss on vector processors like Cray C-90. Presumably > > *gelsd might also be more efficient on Intel processors with SIMD > > instructions like SSE* and AVX. > > > > *gelsx and *gelsy are QR-based alternatives to *gels and do not return > > singular values. Unlike *gels they require the condition number (RCOND) > > as input, which in my opinion limits their usability. AFAIK the main > > difference between *gelsx and *gelsy is the BLAS level. > > > > > > In summary: > > > > *gels is normally useful as a faster alternative to *gelss, i.e. > > whenever X is positive definite, which is usually the case. > > > > *gels can also handle p x n data without transposition to n x p. > > > > *gelsd is a SIMD-optimized alternative to *gelss. 
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sturla.molden at gmail.com  Tue Jan 20 12:01:29 2015
From: sturla.molden at gmail.com (Sturla Molden)
Date: Tue, 20 Jan 2015 17:01:29 +0000 (UTC)
Subject: [SciPy-Dev] Least-Squares Linear Solver ( scipy.linalg.lstsq ) not optimal
References: <54BBBA46.2070802@gmail.com> <54BCC61B.7070501@gmail.com> <54BE2641.4090203@gmail.com>
Message-ID: <1926630893443465905.254848sturla.molden-gmail.com@news.gmane.org>

Alexander Grigorievskiy wrote:

> 3) Indeed, "gelsy" requires the condition number RCOND. Currently I
> pass -1 as the condition number, by analogy with "gelss", where it is
> assumed that machine precision is used if (RCOND < 0). [...]
> However, I think we can pass machine precision from Python if the
> user did not provide the condition number. I need to test it a bit
> more. It is still the fastest algorithm in most cases.

Forgive me if I am ignorant here:

Shouldn't you do an SVD first and compute RCOND from the singular values
before you call *gelsy?

Granted, using an arbitrary RCOND value might work, but it is not how
*gelsy is supposed to be used, AFAIK.

Sturla

From josef.pktd at gmail.com  Tue Jan 20 12:40:27 2015
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Tue, 20 Jan 2015 12:40:27 -0500
Subject: [SciPy-Dev] Least-Squares Linear Solver ( scipy.linalg.lstsq ) not optimal
In-Reply-To: <1926630893443465905.254848sturla.molden-gmail.com@news.gmane.org>
References: <54BBBA46.2070802@gmail.com> <54BCC61B.7070501@gmail.com> <54BE2641.4090203@gmail.com> <1926630893443465905.254848sturla.molden-gmail.com@news.gmane.org>
Message-ID:

On Tue, Jan 20, 2015 at 12:01 PM, Sturla Molden wrote:

> Shouldn't you do an SVD first and compute RCOND from the singular values
> before you call *gelsy?
>
> Granted, using an arbitrary RCOND value might work, but it is not how
> *gelsy is supposed to be used, AFAIK.

Isn't rcond a choice variable for the threshold of considering nearly
singular as singular, as in pinv? (And not something that is calculated
from the data.)
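For illustration, here is roughly what that pinv-style thresholding looks
like in numpy (a rough sketch with a random rank-deficient matrix and an
arbitrary cutoff; this mimics pinv, it is not the LAPACK code path):

    import numpy as np

    a = np.random.randn(6, 4).dot(np.random.randn(4, 5))  # rank-4, 6x5
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    rcond = 1e-10              # user-chosen cutoff, relative to s[0]
    keep = s > rcond * s[0]    # singular values below the cutoff are dropped
    a_pinv = (vt[keep].T / s[keep]).dot(u[:, keep].T)
    assert np.allclose(a_pinv, np.linalg.pinv(a, rcond=rcond))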
There are similar rcond choices, predefined or user-given, in other numpy
or scipy linalg routines.

mostly guessing

Josef

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From josef.pktd at gmail.com  Tue Jan 20 12:47:32 2015
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Tue, 20 Jan 2015 12:47:32 -0500
Subject: [SciPy-Dev] Least-Squares Linear Solver ( scipy.linalg.lstsq ) not optimal
In-Reply-To:
References: <54BBBA46.2070802@gmail.com> <54BCC61B.7070501@gmail.com> <54BE2641.4090203@gmail.com> <1926630893443465905.254848sturla.molden-gmail.com@news.gmane.org>
Message-ID:

On Tue, Jan 20, 2015 at 12:40 PM, <josef.pktd at gmail.com> wrote:

> [...]
> Isn't rcond a choice variable for the threshold of considering nearly
> singular as singular, as in pinv?
>
> mostly guessing

or not guessing:

rcond

rcond is used to determine the effective rank of A, which is defined as
the order of the largest leading triangular submatrix R11 in the QR
factorization with pivoting of A, whose estimated condition number <
1/rcond.

https://software.intel.com/en-us/node/521113

Josef
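That QR-with-pivoting definition can be checked directly from scipy (a
rough sketch; the cutoff value is arbitrary, and the diagonal ratio is
only a cheap stand-in for the estimated condition number of R11):

    import numpy as np
    from scipy.linalg import qr

    a = np.random.randn(30, 6)
    a[:, -1] = a[:, 0]               # force exact rank 5
    q, r, piv = qr(a, pivoting=True)
    d = np.abs(np.diag(r))           # non-increasing with column pivoting
    rcond = 1e-10                    # threshold in the LAPACK sense
    eff_rank = int(np.sum(d > rcond * d[0]))
    print(eff_rank)                  # 5: the largest well-conditioned R11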
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From josef.pktd at gmail.com  Tue Jan 20 12:49:52 2015
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Tue, 20 Jan 2015 12:49:52 -0500
Subject: [SciPy-Dev] Least-Squares Linear Solver ( scipy.linalg.lstsq ) not optimal
In-Reply-To:
References: <54BBBA46.2070802@gmail.com> <54BCC61B.7070501@gmail.com> <54BE2641.4090203@gmail.com> <1926630893443465905.254848sturla.molden-gmail.com@news.gmane.org>
Message-ID:

On Tue, Jan 20, 2015 at 12:47 PM, <josef.pktd at gmail.com> wrote:

> [...]
> rcond is used to determine the effective rank of A, which is defined as
> the order of the largest leading triangular submatrix R11 in the QR
> factorization with pivoting of A, whose estimated condition number <
> 1/rcond.
>
> https://software.intel.com/en-us/node/521113

and a bit further down:

Default value for this element is rcond = 100*EPSILON(1.0_WP).

for whatever EPSILON and 1._WP are (I didn't check)

Josef

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sturla.molden at gmail.com  Tue Jan 20 12:51:37 2015
From: sturla.molden at gmail.com (Sturla Molden)
Date: Tue, 20 Jan 2015 18:51:37 +0100
Subject: [SciPy-Dev] Least-Squares Linear Solver ( scipy.linalg.lstsq ) not optimal
In-Reply-To:
References: <54BBBA46.2070802@gmail.com> <54BCC61B.7070501@gmail.com> <54BE2641.4090203@gmail.com> <1926630893443465905.254848sturla.molden-gmail.com@news.gmane.org>
Message-ID:

On 20/01/15 18:47, josef.pktd at gmail.com wrote:

> rcond
>
> rcond is used to determine the effective rank of A, which is defined as
> the order of the largest leading triangular submatrix R11 in the QR
> factorization with pivoting of A, whose estimated condition number <
> 1/rcond.
> https://software.intel.com/en-us/node/521113

Normally rcond (reciprocal conditioning number) means the ratio of the
smallest singular value to the largest, i.e. a matrix is singular when
rcond is zero. But perhaps it means something else in LAPACK?

Sturla

From josef.pktd at gmail.com  Tue Jan 20 12:59:08 2015
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Tue, 20 Jan 2015 12:59:08 -0500
Subject: [SciPy-Dev] Least-Squares Linear Solver ( scipy.linalg.lstsq ) not optimal
In-Reply-To:
References: <54BBBA46.2070802@gmail.com> <54BCC61B.7070501@gmail.com> <54BE2641.4090203@gmail.com> <1926630893443465905.254848sturla.molden-gmail.com@news.gmane.org>
Message-ID:

On Tue, Jan 20, 2015 at 12:51 PM, Sturla Molden wrote:

> Normally rcond (reciprocal conditioning number) means the ratio of the
> smallest singular value to the largest, i.e. a matrix is singular when
> rcond is zero. But perhaps it means something else in LAPACK?

yes, that's what it means; however, the full name should be
`rcond_threshold` AFAICS. See for example
http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.linalg.pinv.html
which has both cond and rcond.

Josef
(I'm the naming police for statsmodels, to try to minimize these
confusions.)

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ewm at redtetrahedron.org  Tue Jan 20 13:00:39 2015
From: ewm at redtetrahedron.org (Eric Moore)
Date: Tue, 20 Jan 2015 13:00:39 -0500
Subject: [SciPy-Dev] Least-Squares Linear Solver ( scipy.linalg.lstsq ) not optimal
In-Reply-To:
References: <54BBBA46.2070802@gmail.com> <54BCC61B.7070501@gmail.com> <54BE2641.4090203@gmail.com> <1926630893443465905.254848sturla.molden-gmail.com@news.gmane.org>
Message-ID:

No, that's what it means. The subroutine calculates exactly that (I'd
imagine, didn't check) and compares it to the value you specify. If the
calculated value is less than your specified value, the matrix is then
considered to be singular.

It looks like gelsy discards the smallest singular values until the
truncated matrix is no longer singular.

Eric

On Tuesday, January 20, 2015, Sturla Molden wrote:

> [...]
> But perhaps it means something else in LAPACK?

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From josef.pktd at gmail.com  Tue Jan 20 13:05:54 2015
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Tue, 20 Jan 2015 13:05:54 -0500
Subject: [SciPy-Dev] Least-Squares Linear Solver ( scipy.linalg.lstsq ) not optimal
In-Reply-To:
References: <54BBBA46.2070802@gmail.com> <54BCC61B.7070501@gmail.com> <54BE2641.4090203@gmail.com> <1926630893443465905.254848sturla.molden-gmail.com@news.gmane.org>
Message-ID:

On Tue, Jan 20, 2015 at 1:00 PM, Eric Moore wrote:

> It looks like gelsy discards the smallest singular values until the
> truncated matrix is no longer singular.

An SVD (or pinv) based least-squares solution also needs to do this.
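In numpy terms, the two meanings sit side by side like this (a rough
sketch with a random test matrix; the 100*eps cutoff is the gelsy default
quoted above from the Intel docs):

    import numpy as np

    a = np.random.randn(100, 8)
    a[:, -1] = a[:, 0]                  # exact linear dependence: rank 7
    s = np.linalg.svd(a, compute_uv=False)
    true_rcond = s[-1] / s[0]           # the "true" reciprocal condition number
    cutoff = 100 * np.finfo(float).eps  # an rcond *threshold*, LAPACK-style
    eff_rank = int(np.sum(s > cutoff * s[0]))
    print(true_rcond, eff_rank)         # roughly 1e-16, and 7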
the current scipy.linalg.lstsq has a `cond` argument
http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.linalg.lstsq.html

From the quick reading I'm not sure whether the SVD version and the
pivoting-QR version use the same singular values / condition number in
the comparison.

Josef

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sturla.molden at gmail.com  Tue Jan 20 13:06:21 2015
From: sturla.molden at gmail.com (Sturla Molden)
Date: Tue, 20 Jan 2015 19:06:21 +0100
Subject: [SciPy-Dev] Least-Squares Linear Solver ( scipy.linalg.lstsq ) not optimal
In-Reply-To:
References: <54BBBA46.2070802@gmail.com> <54BCC61B.7070501@gmail.com> <54BE2641.4090203@gmail.com> <1926630893443465905.254848sturla.molden-gmail.com@news.gmane.org>
Message-ID:

On 20/01/15 18:49, josef.pktd at gmail.com wrote:

> Default value for this element is rcond = 100*EPSILON(1.0_WP).
>
> for whatever EPSILON and 1._WP are (I didn't check)

EPSILON(1.0_WP) is the smallest number E such that 1.0 + E > 1.0 in
"working precision" (hence: _WP). Working precision is a concept in
Fortran used for floating-point constants. It means that when the
constant is used in an expression, the compiler will select the correct
precision depending on the other arguments. EPSILON() is an intrinsic
Fortran function.

Sturla

From sturla.molden at gmail.com  Tue Jan 20 13:12:37 2015
From: sturla.molden at gmail.com (Sturla Molden)
Date: Tue, 20 Jan 2015 19:12:37 +0100
Subject: [SciPy-Dev] Least-Squares Linear Solver ( scipy.linalg.lstsq ) not optimal
In-Reply-To:
References: <54BBBA46.2070802@gmail.com> <54BCC61B.7070501@gmail.com> <54BE2641.4090203@gmail.com> <1926630893443465905.254848sturla.molden-gmail.com@news.gmane.org>
Message-ID:

On 20/01/15 19:00, Eric Moore wrote:

> No, that's what it means. The subroutine calculates exactly that (I'd
> imagine, didn't check) and compares it to the value you specify. If the
> calculated value is less than your specified value, the matrix is then
> considered to be singular.

I see, so it is the smallest RCOND allowed, and not the true RCOND.

Sturla

From ralf.gommers at gmail.com  Wed Jan 21 01:12:06 2015
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Wed, 21 Jan 2015 07:12:06 +0100
Subject: [SciPy-Dev] CSD in scipy.signal?
In-Reply-To: <25502383-0F7C-45ED-A432-BF8436340462@caltech.edu>
References: <25502383-0F7C-45ED-A432-BF8436340462@caltech.edu>
Message-ID:

Hi Eric,

On Sun, Jan 18, 2015 at 5:51 AM, Eric Quintero wrote:

> Hi all,
>
> I would like to write cross-spectral density (and, by extension,
> coherence) methods for scipy.signal, as a complement to the Welch PSD
> method already there. I haven't contributed to scipy before, and the
> website encourages discussion of new features on this list, so here I am.

This sounds interesting. We'd definitely like to have a more comprehensive
set of spectral density methods in scipy.signal. Do you have a particular
(set of) algorithm(s) in mind?

> In addition, when computing PSDs in my field, we sometimes prefer to take
> the median of the segments, rather than the mean, in order to reduce the
> effect of transients in the data when trying to evaluate a stationary
> noise floor. I would like to add this option as a kwarg.

If that's common in your field, then adding an option for it makes sense.
I'd suggest sending a separate PR for that, and keep in mind to add the
new keyword at the end to keep backwards compatibility.

Cheers,
Ralf
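For concreteness, a rough numpy-only sketch of the kind of segment-averaged
cross-spectral density estimator being discussed, including the mean/median
switch (the function name, signature and scaling are illustrative only, not
a proposed scipy API; overlap, detrending and one-sided scaling are left
out):

    import numpy as np

    def csd_sketch(x, y, fs=1.0, nperseg=256, average='mean'):
        # Welch-style estimate: Hann-window consecutive segments of x and
        # y, form conj(X)*Y per segment, then combine across segments.
        win = np.hanning(nperseg)
        nseg = min(len(x), len(y)) // nperseg
        segs = []
        for i in range(nseg):
            sl = slice(i * nperseg, (i + 1) * nperseg)
            X = np.fft.rfft(win * x[sl])
            Y = np.fft.rfft(win * y[sl])
            segs.append(np.conj(X) * Y)
        segs = np.array(segs)
        if average == 'median':
            # median of real and imaginary parts separately -- one
            # plausible way to make 'median' meaningful for complex data
            pxy = np.median(segs.real, axis=0) + 1j * np.median(segs.imag, axis=0)
        else:
            pxy = segs.mean(axis=0)
        pxy /= fs * (win * win).sum()   # density normalization
        f = np.fft.rfftfreq(nperseg, d=1.0 / fs)
        return f, pxy

    fs = 1024.0
    t = np.arange(8192) / fs
    x = np.sin(2 * np.pi * 100 * t) + np.random.randn(t.size)
    y = np.sin(2 * np.pi * 100 * t + 0.5) + np.random.randn(t.size)
    f, pxy = csd_sketch(x, y, fs=fs, average='median')

With y equal to x this reduces to a (two-sided) PSD estimate, and coherence
then follows as |Pxy|**2 / (Pxx * Pyy).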
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From alex.grigorievskiy at gmail.com  Wed Jan 21 03:56:51 2015
From: alex.grigorievskiy at gmail.com (Alexander Grigorievskiy)
Date: Wed, 21 Jan 2015 10:56:51 +0200
Subject: [SciPy-Dev] Least-Squares Linear Solver ( scipy.linalg.lstsq ) not optimal
In-Reply-To:
References: <54BBBA46.2070802@gmail.com> <54BCC61B.7070501@gmail.com> <54BE2641.4090203@gmail.com>
Message-ID: <54BF69D3.8030402@gmail.com>

On 01/20/2015 03:56 PM, josef.pktd at gmail.com wrote:

> > P.S. I have conducted a couple more experiments in which I varied
> > only one dimension of the matrix: m or n. The results are in the
> > attachment.
>
> Can you add another case for m increasing and n, k fixed, with larger m,
> to see the different trend behavior in the first graph?
> e.g. m increasing to at least 100,000, n fixed at 500 or less, and
> rank fixed closer to n.
> (assuming I didn't mix up m and n)
>
> Josef

I have done this experiment (see attachment), and it confirms that
"gelsd" should be the default method. "gelsy" is not efficient here,
but it was when the matrix is more square. SVD-based methods scale
better here. I have also done the same experiment for the transposed
case (small m, large n); the result is the same.

> On 20/01/15 19:00, Eric Moore wrote:
>
> > No, that's what it means. The subroutine calculates exactly that (I'd
> > imagine, didn't check) and compares it to the value you specify. If
> > the calculated value is less than your specified value, the matrix is
> > then considered to be singular.
>
> I see, so it is the smallest RCOND allowed, and not the true RCOND.
>
> Sturla

Yes, RCOND is supposed to be provided by the user and defines the
largest condition number allowed. (Actually, one should provide the
inverse of the condition number - some small value.)
Based on that, the subroutines determine which singular values to
nullify or, in the case of the QR algorithm, when to stop and assume
the remaining lower-right submatrix is zero.
Most of the time, however, users do not care about this, and the
subroutines automatically select some value, e.g. machine precision.
But I have not found this default behavior for "gelsy", therefore I
proposed to assign some small value, i.e. machine precision, from
Python.

Regards,
Alex Grigorievskiy.
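A sketch of what that Python-side fallback could look like (illustrative
only, not code from the pull request; the 100*eps factor mirrors the gelsy
default quoted from the Intel docs earlier in the thread):

    import numpy as np

    def resolve_rcond(a, rcond=None):
        # if the caller gives no condition-number threshold, fall back to
        # a machine-precision-based default before calling LAPACK
        if rcond is None:
            rcond = 100 * np.finfo(a.dtype).eps
        return rcond

    a = np.random.randn(50, 10)
    print(resolve_rcond(a))         # ~2.2e-14 for float64
    print(resolve_rcond(a, 1e-8))   # an explicit user value passes through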
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ls_test_4_speeds.png
Type: image/png
Size: 65500 bytes
Desc: not available
URL:

From sturla.molden at gmail.com  Wed Jan 21 15:30:03 2015
From: sturla.molden at gmail.com (Sturla Molden)
Date: Wed, 21 Jan 2015 20:30:03 +0000 (UTC)
Subject: [SciPy-Dev] Least-Squares Linear Solver ( scipy.linalg.lstsq ) not optimal
References: <54BBBA46.2070802@gmail.com> <54BCC61B.7070501@gmail.com> <54BE2641.4090203@gmail.com> <54BF69D3.8030402@gmail.com>
Message-ID: <1628135832443564804.052054sturla.molden-gmail.com@news.gmane.org>

Personally I have nothing against using *gelsd instead of *gelss. Nearly
all users of SciPy have CPUs with some sort of SIMD instructions these
days. The only difference is the name of the routine. NumPy would also
benefit from using divide-and-conquer SVD.

Sturla

Alexander Grigorievskiy wrote:

> I have done this experiment (see attachment), and it confirms that
> "gelsd" should be the default method. [...]
>
> But I have not found this default behavior for "gelsy", therefore I
> proposed to assign some small value, i.e. machine precision, from
> Python.
>
> Regards,
> Alex Grigorievskiy.
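scipy.linalg.lstsq currently goes through *gelss, while numpy.linalg.lstsq
already uses *gelsd, so timing the two on the same problem gives a rough
feel for the difference (a sketch; absolute numbers depend on the
BLAS/LAPACK build and on the bookkeeping each wrapper does):

    import time
    import numpy as np
    import scipy.linalg

    a = np.random.randn(5000, 300)
    b = np.random.randn(5000)

    for solver in (np.linalg.lstsq, scipy.linalg.lstsq):
        t0 = time.time()
        solver(a, b)                      # discard the results, time only
        print(solver.__module__, time.time() - t0)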
From m.hofsaess at gmail.com  Fri Jan 23 21:33:53 2015
From: m.hofsaess at gmail.com (Martin Hofsäß)
Date: Sat, 24 Jan 2015 03:33:53 +0100
Subject: [SciPy-Dev] scipy minimize with bound
Message-ID:

Hi all,

I want to use the minimize function with bounds, but I get an error:

ValueError: _lbfgsb.setulb() 13rd argument (iprint) can't be converted to int

I call the function like this:

spo.minimize(residuals,(3.9,40.,0.2),args=(k,puu,pvv,pww,cuw,stat),method='L-BFGS-B',options={'disp':'True','maxiter':100},tol=5.0e-7,bounds=((0,None),(0,None),(0,None)))

TNC and SLSQP worked with bounds.

I'm using linux 64bit python 2.7.3 with scipy 0.14.0.

Thanks for your help.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From pav at iki.fi  Sat Jan 24 07:05:14 2015
From: pav at iki.fi (Pauli Virtanen)
Date: Sat, 24 Jan 2015 14:05:14 +0200
Subject: [SciPy-Dev] scipy minimize with bound
In-Reply-To:
References:
Message-ID:

24.01.2015, 04:33, Martin Hofsäß wrote:

> I want to use the minimize function with bounds, but I get an error:
>
> ValueError: _lbfgsb.setulb() 13rd argument (iprint) can't be converted
> to int
>
> I call the function like this:
>
> spo.minimize(residuals,(3.9,40.,0.2),args=(k,puu,pvv,pww,cuw,stat),method='L-BFGS-B',options={'disp':'True','maxiter':100},tol=5.0e-7,bounds=((0,None),(0,None),(0,None)))

Should have 'disp': True and not 'disp': 'True'

From m.hofsaess at gmail.com  Mon Jan 26 03:00:49 2015
From: m.hofsaess at gmail.com (Martin Hofsäß)
Date: Mon, 26 Jan 2015 09:00:49 +0100
Subject: [SciPy-Dev] scipy minimize with bound
In-Reply-To:
References:
Message-ID:

Thanks, it worked now.

2015-01-24 13:05 GMT+01:00 Pauli Virtanen:

> Should have 'disp': True and not 'disp': 'True'
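For reference, a runnable version of the corrected call, with disp passed
as a boolean; the residuals function below is only a stand-in so that the
snippet executes, since the real objective from the thread is not shown:

    import numpy as np
    import scipy.optimize as spo

    def residuals(p):
        # stand-in objective; the original residuals(...) also takes
        # extra args (k, puu, pvv, pww, cuw, stat) not reproduced here
        return np.sum((np.asarray(p) - 1.0) ** 2)

    res = spo.minimize(residuals, (3.9, 40., 0.2),
                       method='L-BFGS-B',
                       options={'disp': True, 'maxiter': 100},  # boolean disp
                       tol=5.0e-7,
                       bounds=((0, None), (0, None), (0, None)))
    print(res.x)  # ends up near (1, 1, 1)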
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From contrebasse at gmail.com  Mon Jan 26 05:30:57 2015
From: contrebasse at gmail.com (Joseph Martinot-Lagarde)
Date: Mon, 26 Jan 2015 10:30:57 +0000 (UTC)
Subject: [SciPy-Dev] Update NumPy for Matlab Users
Message-ID:

Hi,

I noticed some problems in the wiki page "NumPy for Matlab Users" [1],
but I can't edit it myself. Here is what I found:

- For symbolic calculation, swiginac is the proposed solution, but it
has not been updated on pypi since 2007, and its home page is down. I
guess that the standard is now sympy.

- There are multiple dead links on the page. This can easily be seen in
the general information about the page [2].

- In the pros and cons of using array or matrix in numpy, regarding
dot(): the form dot(dot(A,B),C) should be replaced by the more readable
A.dot(B).dot(C) (a short example follows after this message). Also, this
will have to change when the '@' operator is introduced in Python 3.5.

- In "Customizing Your Environment", numpy is imported as "num", while
the commonly used name is "np".

This page is really a great help for Matlab users; keeping it up to date
is really important!

Joseph

[1] http://wiki.scipy.org/NumPy_for_Matlab_Users
[2] http://wiki.scipy.org/NumPy_for_Matlab_Users?action=info&general=1
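Picking up Joseph's dot() point, a minimal illustration of the equivalent
spellings (random matrices, just to show they agree):

    import numpy as np

    A = np.random.randn(3, 4)
    B = np.random.randn(4, 5)
    C = np.random.randn(5, 2)

    r1 = np.dot(np.dot(A, B), C)   # the nested form currently on the wiki
    r2 = A.dot(B).dot(C)           # reads left to right, as suggested
    assert np.allclose(r1, r2)
    # from Python 3.5 on, the same product is simply: A @ B @ C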
From sdpan21 at gmail.com  Tue Jan 27 12:28:40 2015
From: sdpan21 at gmail.com (dp docs)
Date: Tue, 27 Jan 2015 22:58:40 +0530
Subject: [SciPy-Dev] Regarding involvement in Scipy projects
Message-ID:

Dear Developers,

I believe that I have good knowledge of the C/C++ and Python programming
languages. I have read the project ideas on the ideas page, and most of
them seem interesting to me, but I have no clue where I should start. Can
anyone please provide a step-by-step guideline so that I can get involved
successfully in the development of this open-source project? Please also
provide a link to the source code that I would need to modify or
refactor, in case it is better to start with refactoring. I shall be very
grateful for your effort.

Thanks,
Durgesh Pandey.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jakevdp at cs.washington.edu  Tue Jan 27 12:32:06 2015
From: jakevdp at cs.washington.edu (Jacob Vanderplas)
Date: Tue, 27 Jan 2015 09:32:06 -0800
Subject: [SciPy-Dev] Regarding involvement in Scipy projects
In-Reply-To:
References:
Message-ID:

Hi Durgesh,
Here is some info about how to contribute to scipy:
https://github.com/scipy/scipy/blob/master/HACKING.rst.txt
Jake

Jake VanderPlas
Director of Research - Physical Sciences
eScience Institute, University of Washington
http://www.vanderplas.com

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From aron at ahmadia.net  Tue Jan 27 12:32:42 2015
From: aron at ahmadia.net (Aron Ahmadia)
Date: Tue, 27 Jan 2015 12:32:42 -0500
Subject: [SciPy-Dev] Regarding involvement in Scipy projects
In-Reply-To:
References:
Message-ID:

Nice link Jake!

On Tue, Jan 27, 2015 at 12:32 PM, Jacob Vanderplas <jakevdp at cs.washington.edu> wrote:

> Hi Durgesh,
> Here is some info about how to contribute to scipy:
> https://github.com/scipy/scipy/blob/master/HACKING.rst.txt
> Jake

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sdpan21 at gmail.com  Tue Jan 27 12:35:16 2015
From: sdpan21 at gmail.com (dp docs)
Date: Tue, 27 Jan 2015 23:05:16 +0530
Subject: [SciPy-Dev] Regarding involvement in Scipy projects
In-Reply-To:
References:
Message-ID:

Thanks, Jacob.

Durgesh Pandey.

On Tue, Jan 27, 2015 at 11:02 PM, Aron Ahmadia wrote:

> Nice link Jake!
> [...]
-------------- next part --------------
An HTML attachment was scrubbed...
URL: