From matthieu.brucher at gmail.com Sun Dec 1 03:01:01 2013
From: matthieu.brucher at gmail.com (Matthieu Brucher)
Date: Sun, 1 Dec 2013 09:01:01 +0100
Subject: [SciPy-User] "Genetic Algorithm" method support in Python/SciPy
In-Reply-To:
References:
Message-ID:

Hi David,

I think one of the best packages is pyevolve.

Cheers,

Matthieu

2013/12/1 David Goldsmith :
> Hi, folks. Does SciPy have a sub-package for so-called Genetic Algorithm
> work? If not in SciPy, does anyone know of a Python package for this?
> Thanks!
>
> DG

--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Music band: http://liliejay.com/

From d.l.goldsmith at gmail.com Sun Dec 1 13:50:13 2013
From: d.l.goldsmith at gmail.com (David Goldsmith)
Date: Sun, 1 Dec 2013 10:50:13 -0800
Subject: [SciPy-User] "Genetic Algorithm" method support in Python/SciPy
Message-ID:

Thanks, Matthieu!

DG
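A minimal pyevolve sketch of the kind Matthieu recommends, based on
pyevolve's canonical getting-started example (the class names
G1DList/GSimpleGA and the call signatures are recalled from its docs,
not verified against a specific release, so treat them as assumptions):

    from pyevolve import G1DList, GSimpleGA

    def eval_func(chromosome):
        # Toy fitness function: maximize the sum of the genes.
        score = 0.0
        for value in chromosome:
            score += value
        return score

    genome = G1DList.G1DList(20)       # genome: a 1D list of 20 genes
    genome.evaluator.set(eval_func)    # attach the fitness function
    ga = GSimpleGA.GSimpleGA(genome)   # simple GA engine, default operators
    ga.evolve(freq_stats=10)           # run, printing stats every 10 generations
    print ga.bestIndividual()          # best solution found

(Python 2 print syntax, matching the era of this thread.)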
From nouiz at nouiz.org Mon Dec 2 16:18:31 2013
From: nouiz at nouiz.org (Frédéric Bastien)
Date: Mon, 2 Dec 2013 16:18:31 -0500
Subject: [SciPy-User] scipy.sparse.[vh]stack and a_sparse_matrix.__setitem__(ndarray, value) broken
Message-ID:

Hi,

I need to work around some bugs a user reported on the Theano mailing
list.

emails:
https://groups.google.com/forum/?fromgroups=#!topic/theano-users/Hu9ve3AIag8

work around:
https://github.com/Theano/Theano/pull/1636

The 3 problems are:

1) a_sparse_matrix.__setitem__(ndarray, value) no longer works when
the ndarray contains only 2 values.

Fix: cast the ndarray to a tuple.

2) scipy.sparse.vstack(block, format=self.format, dtype=self.dtype)
does not cast the blocks to the requested dtype.

Fix: check whether the dtype is right; if not, call astype(dtype).

3) Same as 2 for hstack.

Frédéric

From pav at iki.fi Mon Dec 2 17:13:23 2013
From: pav at iki.fi (Pauli Virtanen)
Date: Tue, 03 Dec 2013 00:13:23 +0200
Subject: [SciPy-User] scipy.sparse.[vh]stack and a_sparse_matrix.__setitem__(ndarray, value) broken
In-Reply-To:
References:
Message-ID:

Hi,

02.12.2013 23:18, Frédéric Bastien wrote:
[clip]
> 1) a_sparse_matrix.__setitem__(ndarray, value) no longer works when
> the ndarray contains only 2 values.
>
> Fix: cast the ndarray to a tuple.

That it worked the same way as a tuple was a bug, actually.
The current behavior is correct:

>>> from scipy.sparse import csr_matrix
>>> import numpy as np
>>> x = np.arange(5*5).reshape(5,5)
>>> y = csr_matrix(x)
>>> x[np.array([1,3])]
array([[ 5,  6,  7,  8,  9],
       [15, 16, 17, 18, 19]])
>>> y[np.array([1,3])].todense()
matrix([[ 5,  6,  7,  8,  9],
        [15, 16, 17, 18, 19]])
>>> y[np.array([1,3])] = 5
>>> y[np.array([1,3])].todense()
matrix([[5, 5, 5, 5, 5],
        [5, 5, 5, 5, 5]])

Now, you could perhaps argue for bug-for-bug backward compatibility,
but unfortunately this is not a realistic option in the current state
of scipy.sparse.

> 2) scipy.sparse.vstack(block, format=self.format, dtype=self.dtype)
> does not cast the blocks to the requested dtype.
>
> Fix: check whether the dtype is right; if not, call astype(dtype).
>
> 3) Same as 2 for hstack.

These are probably due to the CSR/CSC fast path added recently to
hstack/vstack in scipy master. Please report this to the Scipy issue
tracker, so we remember it.

--
Pauli Virtanen

From ondrej.certik at gmail.com Mon Dec 2 19:17:01 2013
From: ondrej.certik at gmail.com (Ondřej Čertík)
Date: Mon, 2 Dec 2013 17:17:01 -0700
Subject: [SciPy-User] Vectorized spherical Bessel functions
Message-ID:

Hi,

I need to apply the spherical Bessel function (values) to a vector.
The current functions accept a scalar and return two arrays of values
and derivatives, as follows:

>>> from scipy.special import sph_jn
>>> sph_jn(0, 5.)
(array([-0.19178485]), array([ 0.09508941]))

So in order to vectorize it, I use:

def j0(x):
    res = empty(len(x), dtype="double")
    for i in range(len(x)):
        res[i] = sph_jn(0, x[i])[0][0]
    return res

which is really slow for larger vectors... Any ideas how to quickly
get an array of values?

I can use Cython, etc., but I was wondering whether there is some
obvious way to do this from Python using current SciPy.

Ondrej

From guziy.sasha at gmail.com Mon Dec 2 19:44:30 2013
From: guziy.sasha at gmail.com (Oleksandr Huziy)
Date: Mon, 2 Dec 2013 19:44:30 -0500
Subject: [SciPy-User] Vectorized spherical Bessel functions
In-Reply-To:
References:
Message-ID:

Hi:

have you tried numpy.vectorize?
In [3]: import numpy as np

In [4]: jn_vect = np.vectorize(sph_jn)

In [9]: jn_vect(0, [0.1, 0.2, 0.3, 0.5])
Out[9]:
(array([ 0.99833417,  0.99334665,  0.98506736,  0.95885108]),
 array([-0.03330001, -0.06640038, -0.09910289, -0.16253703]))

In [10]: jn_vect([0] * 4, [0.1, 0.2, 0.3, 0.5])
Out[10]:
(array([ 0.99833417,  0.99334665,  0.98506736,  0.95885108]),
 array([-0.03330001, -0.06640038, -0.09910289, -0.16253703]))

Cheers

2013/12/2 Ondřej Čertík :
> Hi,
>
> I need to apply the spherical Bessel function (values) to a vector.
> [clip]

--
Sasha

From ondrej.certik at gmail.com Mon Dec 2 23:20:54 2013
From: ondrej.certik at gmail.com (Ondřej Čertík)
Date: Mon, 2 Dec 2013 21:20:54 -0700
Subject: [SciPy-User] Vectorized spherical Bessel functions
In-Reply-To:
References:
Message-ID:

Hi Oleksandr,

On Mon, Dec 2, 2013 at 5:44 PM, Oleksandr Huziy wrote:
> Hi:
>
> have you tried numpy.vectorize?
> [clip]

Unfortunately, the performance of vectorize() is described in its
docstring:

    The `vectorize` function is provided primarily for convenience,
    not for performance. The implementation is essentially a for loop.

So it doesn't fix the problem that it's slow. Thanks for the tip
though --- at least it has a nice syntax, so I'll be using that.

The j0(x) function is just sin(x)/x, so compared to the intrinsic
sin(x) it's just slow. It looks like the only faster option is
something like Cython or Numba.

Ondrej
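In the meantime, the order-0 case can be vectorized by hand, since, as
noted above, j0(x) = sin(x)/x; a minimal sketch for array input, with
the x -> 0 limit j0(0) = 1 handled explicitly (illustrative only,
order 0 and real arguments):

    import numpy as np

    def sph_j0(x):
        # Spherical Bessel function of the first kind, order 0:
        # j0(x) = sin(x)/x, with j0(0) = 1 as the limiting value.
        x = np.asarray(x, dtype=float)
        out = np.ones_like(x)
        nz = x != 0
        out[nz] = np.sin(x[nz]) / x[nz]
        return out

This runs at numpy speed on the whole array instead of looping over
sph_jn one element at a time.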
From pav at iki.fi Tue Dec 3 04:48:06 2013
From: pav at iki.fi (Pauli Virtanen)
Date: Tue, 3 Dec 2013 09:48:06 +0000 (UTC)
Subject: [SciPy-User] Vectorized spherical Bessel functions
References:
Message-ID:

Ondřej Čertík writes:
[clip]
> I can use Cython, etc., but I was wondering whether there is some
> obvious way to do this from Python using current SciPy.

I'm afraid the long-term solution is to roll up your sleeves
and implement ufuncs that call CSPHJY, SPHJ, SPHY, CSPHIK,
SPHI, SPHIK.

Nowadays, this is fairly simple to do, take a look at:

    generate_ufuncs.py
    specfun_wrappers.h
    specfun_wrappers.c

--
Pauli Virtanen

From lorenzo.isella at gmail.com Tue Dec 3 05:01:54 2013
From: lorenzo.isella at gmail.com (Lorenzo Isella)
Date: Tue, 03 Dec 2013 11:01:54 +0100
Subject: [SciPy-User] Packing Algorithm in Python
Message-ID:

Dear All,
I hope this is not too off-topic. Essentially, I am struggling with a
problem about maximizing a packing fraction.
To fix the ideas: I have a long list of 3D boxes identified by their
sizes [a_i, b_i, c_i] (width, depth, height; each one is a discrete
number) which I need to put inside a large, infinitely deep container
identified by [x, y, inf], with x >= a_i, y >= b_i for every i.
The goal is to minimize the height of the highest box in the
container. Later on I may consider the packing of boxes which are
spherical, cylindrical, etc., but this is more than enough to start
with.
Is anybody aware of a freely available Python implementation of an
algorithm to achieve this (possibly relying on numpy/scipy)?
Many thanks

Lorenzo

From djpine at gmail.com Tue Dec 3 05:19:06 2013
From: djpine at gmail.com (David J Pine)
Date: Tue, 3 Dec 2013 11:19:06 +0100
Subject: [SciPy-User] adding linear fitting routine
Message-ID:

I would like to get some feedback and generate some discussion about a
least squares fitting routine I submitted last Friday [please see
"adding linear fitting routine" (29 Nov 2013)]. I know that everybody
is very busy, but it would be helpful to get some feedback and, I hope,
eventually to get this routine added to one of the basic numpy/scipy
libraries.

David Pine

From athanastasiou at gmail.com Tue Dec 3 05:21:43 2013
From: athanastasiou at gmail.com (Athanasios Anastasiou)
Date: Tue, 3 Dec 2013 10:21:43 +0000
Subject: [SciPy-User] Packing Algorithm in Python
In-Reply-To:
References:
Message-ID:

Hello

I am not aware of a Python implementation specifically, but Burr Tools
could help you with your application (http://burrtools.sourceforge.net/).
You can set up a packing problem and the algorithm will return all
possible packing assemblies, which I suppose you could then feed to a
quick Python script to find an optimum according to your criteria. The
impressive thing about Burr is that it will work with elementary
objects that can even contain holes or be of irregular shape.

Other than this, since packing is an NP-hard problem, you can start
with a given box and develop a graph-based approach with back-tracking.
Every side of your initial box (or of boxes already in the container)
is a potential "port" where other boxes can be attached, provided that
they don't violate the boundaries of boxes placed in previous steps or
the boundaries of your container. When you have run out of combinations
(including because you are too close to your container's boundaries),
you either backtrack and try a different box side or stop the search.
(That's the basic idea; obviously it does not take into account
symmetry, so it might count some solutions twice, which would waste
computational time.)

Hope this helps.
All the best
Athanasios

On 3 Dec 2013 10:02, "Lorenzo Isella" wrote:
> Dear All,
> I hope this is not too off-topic. Essentially, I am struggling with a
> problem about maximizing a packing fraction.
> [clip]
> Many thanks
>
> Lorenzo
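Burr Tools aside, a first-cut baseline for the box variant of the
problem is easy to write in pure Python; below is a naive greedy
"shelf" heuristic (sort tallest first, fill rows along x, rows into
layers, layers stacked in z). It is a sketch for comparison purposes
only -- greedy, axis-aligned, no rotations, and generally far from
optimal:

    def pack_boxes(boxes, X, Y):
        # boxes: sequence of (a, b, c) sizes; container base is X by Y,
        # infinitely deep in z.  Returns (total_height, placements) with
        # placements[i] = ((a, b, c), (x, y, z)) in packing order.
        boxes = sorted(boxes, key=lambda s: s[2], reverse=True)  # tallest first
        placements = []
        z = layer_h = 0.0     # bottom and height of the current layer
        row_y = row_d = 0.0   # front edge and depth of the current row
        cur_x = 0.0           # next free x position in the current row
        for a, b, c in boxes:
            if a > X or b > Y:
                raise ValueError("box footprint exceeds container base")
            if cur_x + a > X:                # row full: start a new row
                row_y, cur_x, row_d = row_y + row_d, 0.0, 0.0
            if row_y + b > Y:                # layer full: start a new layer
                z, layer_h = z + layer_h, 0.0
                row_y, row_d, cur_x = 0.0, 0.0, 0.0
            placements.append(((a, b, c), (cur_x, row_y, z)))
            cur_x += a
            row_d = max(row_d, b)
            layer_h = max(layer_h, c)
        return z + layer_h, placements

Any serious attempt (e.g., the back-tracking search sketched above)
should beat this, but it gives a quick upper bound on the achievable
height.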
From jsseabold at gmail.com Tue Dec 3 09:31:01 2013
From: jsseabold at gmail.com (Skipper Seabold)
Date: Tue, 3 Dec 2013 14:31:01 +0000
Subject: [SciPy-User] Repeated Measure ANOVA
In-Reply-To:
References:
Message-ID:

On Wed, Nov 6, 2013 at 1:49 PM, Horea Christian wrote:
> Hi guys, I would like to compare reaction times for a series of
> experimental conditions. My data comes from ~100 trial repetitions
> over 10 participants (yielding ~1000 trials). I was told that just
> doing an ANOVA on this dataset would be improper, because the 1000
> measurements are not truly independent -- and that the proper way to
> do this is called a repeated measures ANOVA.
>
> I have tried to look for a scipy function for this and found nothing.
> In a relevant discussion a participant pointed the following out:
>
>> "Repeated measures" ANOVA is just a misnomer for using the
>> "randomized block design" as a substitute for not knowing MANOVA or
>> Hotelling's T-square test, and as such leads to conclusions that are
>> very hard to interpret. The real value of repeated measures ANOVA in
>> medical literature is often to inform the reader that the authors
>> don't understand the statistics they use ;-)
>
> I would like to know whether I'm looking for the right thing at all,
> and if yes, how I could accomplish this with scipy.

Repeated measures ANOVA is waiting for a champion. I don't think it's
going to be entirely trivial to get it right, and I just don't have the
bandwidth right now to put in any (unpaid) time on this, though maybe
it'll fall out of our ongoing panel data work in statsmodels (it's
unclear to me right now).

https://github.com/statsmodels/statsmodels/issues/749
https://github.com/statsmodels/statsmodels/pull/786
https://github.com/statsmodels/statsmodels/issues/646

Skipper

From nouiz at nouiz.org Tue Dec 3 13:29:29 2013
From: nouiz at nouiz.org (Frédéric Bastien)
Date: Tue, 3 Dec 2013 13:29:29 -0500
Subject: [SciPy-User] scipy.sparse.[vh]stack and a_sparse_matrix.__setitem__(ndarray, value) broken
In-Reply-To:
References:
Message-ID:

On Mon, Dec 2, 2013 at 5:13 PM, Pauli Virtanen wrote:
> Hi,
>
> 02.12.2013 23:18, Frédéric Bastien wrote:
> [clip]
>> 1) a_sparse_matrix.__setitem__(ndarray, value) no longer works when
>> the ndarray contains only 2 values.
>>
>> Fix: cast the ndarray to a tuple.
>
> That it worked the same way as a tuple was a bug, actually.
> The current behavior is correct:
>
>>>> from scipy.sparse import csr_matrix
>>>> import numpy as np
>>>> x = np.arange(5*5).reshape(5,5)
>>>> y = csr_matrix(x)
>>>> x[np.array([1,3])]
> array([[ 5,  6,  7,  8,  9],
>        [15, 16, 17, 18, 19]])
>>>> y[np.array([1,3])].todense()
> matrix([[ 5,  6,  7,  8,  9],
>         [15, 16, 17, 18, 19]])
>>>> y[np.array([1,3])] = 5
>>>> y[np.array([1,3])].todense()
> matrix([[5, 5, 5, 5, 5],
>         [5, 5, 5, 5, 5]])
>
> Now, you could perhaps argue for bug-for-bug backward compatibility,
> but unfortunately this is not a realistic option in the current state
> of scipy.sparse.

Thanks for the fix! I didn't realize this was a bug fix at the same
time. I don't want bug-for-bug backward compatibility.

>> 2) scipy.sparse.vstack(block, format=self.format, dtype=self.dtype)
>> does not cast the blocks to the requested dtype.
>>
>> Fix: check whether the dtype is right; if not, call astype(dtype).
>>
>> 3) Same as 2 for hstack.
>
> These are probably due to the CSR/CSC fast path added recently to
> hstack/vstack in scipy master. Please report this to the Scipy issue
> tracker, so we remember it.

Done. But a user told me he had scipy 0.13.1 and still had this
problem. I tested with that version and I don't see it, so it is
probably only in the development version, as you say.

Here is the issue: https://github.com/scipy/scipy/issues/3111

thanks

Fred
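Until the fix lands, the workaround Frédéric describes can be wrapped
in a small helper; a sketch (the explicit astype is the workaround --
it covers the case where vstack's fast path ignores the requested
dtype):

    import scipy.sparse as sp

    def vstack_cast(blocks, format=None, dtype=None):
        # Stack, then cast explicitly in case scipy.sparse.vstack's
        # CSR/CSC fast path ignored the requested dtype (see above).
        out = sp.vstack(blocks, format=format)
        if dtype is not None and out.dtype != dtype:
            out = out.astype(dtype)
        return out

The same idea applies to hstack.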
From ondrej.certik at gmail.com Tue Dec 3 14:12:22 2013
From: ondrej.certik at gmail.com (Ondřej Čertík)
Date: Tue, 3 Dec 2013 12:12:22 -0700
Subject: [SciPy-User] Vectorized spherical Bessel functions
In-Reply-To:
References:
Message-ID:

On Tue, Dec 3, 2013 at 2:48 AM, Pauli Virtanen wrote:
> Ondřej Čertík writes:
> [clip]
>> I can use Cython, etc., but I was wondering whether there is some
>> obvious way to do this from Python using current SciPy.
>
> I'm afraid the long-term solution is to roll up your sleeves
> and implement ufuncs that call CSPHJY, SPHJ, SPHY, CSPHIK,
> SPHI, SPHIK.
>
> Nowadays, this is fairly simple to do, take a look at:
>
>     generate_ufuncs.py
>     specfun_wrappers.h
>     specfun_wrappers.c

I see, so it's a bug. I reported it:

https://github.com/scipy/scipy/issues/3113

I'll see if I have time; indeed I should be able to figure it out.

Ondrej

From nouiz at nouiz.org Tue Dec 3 14:50:37 2013
From: nouiz at nouiz.org (Frédéric Bastien)
Date: Tue, 3 Dec 2013 14:50:37 -0500
Subject: [SciPy-User] Theano 0.6 released
Message-ID:

What's New
----------

We recommend that everybody update to this version.

Highlights (since 0.6rc5):
 * Last release with support for Python 2.4 and 2.5.
 * We will try to release more frequently.
 * Fix crash/installation problems.
 * Use less memory for conv3d2d.

0.6rc4 skipped for a technical reason.

Highlights (since 0.6rc3):
 * Python 3.3 compatibility with buildbot test for it.
 * Full advanced indexing support.
 * Better Windows 64 bit support.
 * New profiler.
 * Better error messages that help debugging.
 * Better support for newer NumPy versions (remove useless warning/crash).
 * Faster optimization/compilation for big graphs.
 * Moved the Conv3d2d implementation into Theano.
 * Better SymPy/Theano bridge: make a Theano op from a SymPy expression
   and use the SymPy C code generator.
 * Bug fixes.

Changes from 0.6rc5:
 * Fix crash when specifying march in the cxxflags Theano flag.
   (Frederic B., reported by FiReTiTi)
 * Code cleanup. (Jorg Bornschein)
 * Fix Canopy installation on Windows when it was installed for all
   users: Raingo
 * Fix Theano tests due to a scipy change. (Frederic B.)
 * Work around bug introduced in scipy dev 0.14. (Frederic B.)
 * Fix Theano tests following bugfix in SciPy. (Frederic B., reported
   by Ziyuan Lin)
 * Add Theano flag cublas.lib. (Misha Denil)
 * Make conv3d2d work more in-place (so less memory usage).
   (Frederic B., reported by Jean-Philippe Ouellet)

See https://pypi.python.org/pypi/Theano for more details.

Download and Install
--------------------

You can download Theano from http://pypi.python.org/pypi/Theano

Installation instructions are available at
http://deeplearning.net/software/theano/install.html

Description
-----------

Theano is a Python library that allows you to define, optimize, and
efficiently evaluate mathematical expressions involving
multi-dimensional arrays. It is built on top of NumPy. Theano features:

 * tight integration with NumPy: a similar interface to NumPy's.
   numpy.ndarrays are also used internally in Theano-compiled functions.
 * transparent use of a GPU: perform data-intensive computations up to
   140x faster than on a CPU (support for float32 only).
 * efficient symbolic differentiation: Theano can compute derivatives
   for functions of one or many inputs.
 * speed and stability optimizations: avoid nasty bugs when computing
   expressions such as log(1 + exp(x)) for large values of x.
 * dynamic C code generation: evaluate expressions faster.
 * extensive unit-testing and self-verification: includes tools for
   detecting and diagnosing bugs and/or potential problems.

Theano has been powering large-scale computationally intensive
scientific research since 2007, but it is also approachable enough to
be used in the classroom (IFT6266 at the University of Montreal).

Resources
---------

About Theano:
http://deeplearning.net/software/theano/

Theano-related projects:
http://github.com/Theano/Theano/wiki/Related-projects

About NumPy:
http://numpy.scipy.org/

About SciPy:
http://www.scipy.org/

Machine Learning Tutorial with Theano on Deep Architectures:
http://deeplearning.net/tutorial/

Acknowledgments
---------------

I would like to thank all contributors of Theano. For this particular
release (since 0.5), many people have helped, notably:

Frederic Bastien
Pascal Lamblin
Ian Goodfellow
Olivier Delalleau
Razvan Pascanu
abalkin
Arnaud Bergeron
Nicolas Bouchard +
Jeremiah Lowin +
Matthew Rocklin
Eric Larsen +
James Bergstra
David Warde-Farley
John Salvatier +
Vivek Kulkarni +
Yann N. Dauphin
Ludwig Schmidt-Hackenberg +
Gabe Schwartz +
Rami Al-Rfou' +
Guillaume Desjardins
Caglar +
Sigurd Spieckermann +
Steven Pigeon +
Bogdan Budescu +
Jey Kottalam +
Mehdi Mirza +
Alexander Belopolsky +
Ethan Buchman +
Jason Yosinski
Nicolas Pinto +
Sina Honari +
Ben McCann +
Graham Taylor
Hani Almousli
Ilya Dyachenko +
Jan Schlüter +
Jorg Bornschein +
Micky Latowicki +
Yaroslav Halchenko +
Eric Hunsberger +
Amir Elaguizy +
Hannes Schulz +
Huy Nguyen +
Ilan Schnell +
Li Yao
Misha Denil +
Robert Kern +
Sebastian Berg +
Vincent Dumoulin +
Wei Li +
XterNalz +

A total of 51 people contributed to this release. People with a "+" by
their names contributed a patch for the first time.

Also, thank you to all NumPy and Scipy developers, as Theano builds on
their strengths.
All questions/comments are always welcome on the Theano mailing-lists
( http://deeplearning.net/software/theano/#community )

From guziy.sasha at gmail.com Tue Dec 3 15:42:48 2013
From: guziy.sasha at gmail.com (Oleksandr Huziy)
Date: Tue, 3 Dec 2013 15:42:48 -0500
Subject: [SciPy-User] Matplotlib 1.3.1: plot(matrix("1, 2, 3")) -> RuntimeError: maximum recursion depth exceeded
In-Reply-To:
References:
Message-ID:

Hi:

It is supposed to be fixed on github:
https://github.com/matplotlib/matplotlib/commit/cee4ba990c7e209561e4deec75452e9dc97c5a30

Try installing it from there using pip.

cheers

2013/11/17 Klaus:
> Hi,
>
> I am working with python 2.7.5 using
>
> - numpy.__version__: 1.7.1
> - matplotlib.__version__: 1.3.1
>
> When I start "ipython2 --pylab" and execute the following code
>
>     x = matrix("1,2,3")
>     plot(x)
>
> I get the error message
>
>     [...]
>     /usr/lib/python2.7/site-packages/matplotlib/units.pyc in
>     get_converter(self, x)
>         146             except AttributeError:
>         147                 # not a masked_array
>     --> 148                 converter = self.get_converter(xravel[0])
>         149             return converter
>         150
>     /usr/lib/python2.7/site-packages/numpy/matrixlib/defmatrix.py in
>     __getitem__(self, index)
>         303
>         304         try:
>     --> 305             out = N.ndarray.__getitem__(self, index)
>         306         finally:
>         307             self._getitem = False
>     RuntimeError: maximum recursion depth exceeded
>
> In the older matplotlib version 1.3.0 this error was not present.
>
> Any help is highly appreciated!

--
Sasha

From pmhobson at gmail.com Tue Dec 3 17:46:53 2013
From: pmhobson at gmail.com (Paul Hobson)
Date: Tue, 3 Dec 2013 14:46:53 -0800
Subject: [SciPy-User] log normal distribution random number array generation
In-Reply-To: <1383319029.73853.YahooMailNeo@web142305.mail.bf1.yahoo.com>
References: <1383319029.73853.YahooMailNeo@web142305.mail.bf1.yahoo.com>
Message-ID:

Jose,

For lognorm.rvs, mu and sigma translate to loc and scale, respectively.
The same is true for norm.rvs.

-paul

On Fri, Nov 1, 2013 at 8:17 AM, José Luis Mietta wrote:
> Hi experts!
>
> I want to generate a random number array of size=N using a log-normal
> distribution. From http://en.wikipedia.org/wiki/Log-normal_distribution
> I want to use the parameters mu and sigma.
>
> I know that I must do:
>
> from scipy.stats import lognorm
> new_array = lognorm.rvs(......, size=N)
>
> What must I set as parameters (loc, s, scale, etc.) to use the mu and
> sigma distribution parameters?
>
> In the same way: what must I do in
> new_array = norm.rvs(......, size=N)
> to generate an array of random numbers using a Gaussian distribution
> with parameters mu and sigma?
>
> Waiting for your answers.
>
> Thanks a lot!
From josef.pktd at gmail.com Tue Dec 3 18:08:36 2013
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Tue, 3 Dec 2013 18:08:36 -0500
Subject: [SciPy-User] log normal distribution random number array generation
In-Reply-To:
References: <1383319029.73853.YahooMailNeo@web142305.mail.bf1.yahoo.com>
Message-ID:

On Tue, Dec 3, 2013 at 5:46 PM, Paul Hobson wrote:
> Jose,
>
> For lognorm.rvs, mu and sigma translate to loc and scale,
> respectively. The same is true for norm.rvs.

For the lognorm, mu and sigma are often used as parameters of the
underlying normal distribution, not directly as the lognormal loc and
scale:

"If log(x) is normally distributed with mean mu and variance sigma**2,
then x is log-normally distributed with shape parameter sigma and
scale parameter exp(mu)."
http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.lognorm.html

Josef

> -paul
>
> [clip]

From robert.kern at gmail.com Wed Dec 4 05:03:37 2013
From: robert.kern at gmail.com (Robert Kern)
Date: Wed, 4 Dec 2013 10:03:37 +0000
Subject: [SciPy-User] log normal distribution random number array generation
In-Reply-To:
References: <1383319029.73853.YahooMailNeo@web142305.mail.bf1.yahoo.com>
Message-ID:

On Tue, Dec 3, 2013 at 11:08 PM, wrote:
>
> For the lognorm, mu and sigma are often used as parameters of the
> underlying normal distribution, not directly as the lognormal loc and
> scale:
>
> "If log(x) is normally distributed with mean mu and variance sigma**2,
> then x is log-normally distributed with shape parameter sigma and
> scale parameter exp(mu)."
> http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.lognorm.html

Specifically, in order to translate any standard convention to lognorm,
you must keep the default loc=0. Most standard conventions for the
log-normal distribution do not shift the location at all, just the
scale and a shape, as explained above.

--
Robert Kern
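Putting the two answers together: to draw log-normal samples
parameterized by the mu and sigma of the underlying normal, pass sigma
as the shape, exp(mu) as the scale, and leave loc at its default of 0.
A short sketch with made-up mu and sigma:

    import numpy as np
    from scipy.stats import lognorm, norm

    mu, sigma, N = 0.5, 0.8, 10000   # example values, not from the thread

    x = lognorm.rvs(sigma, loc=0, scale=np.exp(mu), size=N)  # log-normal draws
    y = norm.rvs(loc=mu, scale=sigma, size=N)                # Gaussian draws

    # Sanity check: log(x) should look like N(mu, sigma**2), i.e. like y.
    print np.log(x).mean(), np.log(x).std()   # ~0.5, ~0.8
    print y.mean(), y.std()                   # ~0.5, ~0.8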
From daniele at grinta.net Wed Dec 4 06:13:59 2013
From: daniele at grinta.net (Daniele Nicolodi)
Date: Wed, 04 Dec 2013 12:13:59 +0100
Subject: [SciPy-User] adding linear fitting routine
In-Reply-To:
References:
Message-ID: <529F0E77.1040106@grinta.net>

On 03/12/2013 11:19, David J Pine wrote:
> I would like to get some feedback and generate some discussion about a
> least squares fitting routine I submitted last Friday [please see
> "adding linear fitting routine" (29 Nov 2013)]. I know that everybody
> is very busy, but it would be helpful to get some feedback and, I
> hope, eventually to get this routine added to one of the basic
> numpy/scipy libraries.

I think that adding a least squares fitting routine which handles
uncertainties correctly and computes the covariance matrix is a good
idea. I have wanted to do that myself for quite a while.

However, I think that a generalization to arbitrary degree polynomials
would be much more useful. A linfit function may be added as a
convenience wrapper. Actually, it would be nice to have something that
works on arbitrary orthogonal bases, but it may be difficult to design
a general interface for such a thing.

Regarding your pull request, I don't really think that your code can be
much faster than the general purpose least squares fitting already in
scipy or numpy, modulo some bug somewhere. You justify that by saying
that your solution is faster because it does not invert a matrix, but
this is exactly what you are doing, except that you do not write the
math in a matrix formalism.

Furthermore, I didn't have a very close look, but I don't understand
what the `relsigma` parameter is supposed to do, and I would rename the
`sigmay` parameter `yerr`.

Cheers,
Daniele

From djpine at gmail.com Wed Dec 4 07:43:52 2013
From: djpine at gmail.com (David J Pine)
Date: Wed, 4 Dec 2013 13:43:52 +0100
Subject: [SciPy-User] adding linear fitting routine
In-Reply-To: <529F0E77.1040106@grinta.net>
References: <529F0E77.1040106@grinta.net>
Message-ID:

Daniele,

Thank you for your feedback. Regarding the points you raise:

1. Generalization to arbitrary degree polynomials. This already exists
in numpy.polyfit. One limitation of polyfit is that it does not
currently allow the user to provide absolute uncertainties in the data,
but there has been some discussion of adding this capability.

2. Generalization to arbitrary orthogonal bases. There currently exist
in numpy fitting routines for various polynomial bases, including
chebfit, legfit, lagfit, hermfit, and hermefit. I am not aware of a
fitting routine in numpy/scipy that works on arbitrary bases.

3. Speed. As far as I know there is no bug in any of the tested
software. The unit test test_linfit.py (https://github.com/djpine/linfit)
times the various fitting routines. You can run it yourself (and check
the code--maybe you can spot some errors), but on my laptop I get the
results printed at the end of this message.

linfit does not call a matrix inversion routine. Instead it calculates
the best fit slope and y-intercept directly. By contrast, polyfit does
call a matrix inversion routine (numpy.linalg.lstsq), which has a
certain amount of overhead that linfit avoids. This may be why polyfit
is slower than linfit.

4. relsigma. Other than using no weighting at all, there are basically
two ways that people weight data in a least squares fit.

    (a) Provide explicit absolute estimates of the errors
(uncertainties) for each data point. This is what physical scientists
often do.
Setting relsigma=False tells linfit to use this method of weighting the
data. If the error estimates are accurate, then the covariance matrix
provides estimates of the uncertainties in the fitting parameters (the
slope & y-intercept).

    (b) Provide relative estimates of the errors (uncertainties) for
each data point (it's assumed that the absolute errors are not known,
but the relative uncertainties between different data points are
known). This is what social scientists often do. When only the
relative uncertainties are known, the covariance matrix needs to be
rescaled in order to obtain accurate estimates of the uncertainties in
the fitting parameters. Setting relsigma=True tells linfit to use this
method of weighting the data.

5. Renaming the `sigmay` parameter `yerr`. Either choice is fine with
me, but I used `sigmay` to be (mostly) consistent with
scipy.optimize.curve_fit.

---------------------------------
Results of timing tests from test_linfit.py

test_linfit.py ....
Compare linfit to scipy.linalg.lstsq with relative individually weighted data points
      10 data points: linfit is faster than scipy.linalg.lstsq by 1.26 times
     100 data points: linfit is faster than scipy.linalg.lstsq by 2.33 times
    1000 data points: linfit is faster than scipy.linalg.lstsq by 12 times
   10000 data points: linfit is faster than scipy.linalg.lstsq by 31.8 times
.
Compare linfit to scipy.linalg.lstsq with unweighted data points
      10 data points: linfit is faster than scipy.linalg.lstsq by 2.4 times
     100 data points: linfit is faster than scipy.linalg.lstsq by 2.5 times
    1000 data points: linfit is faster than scipy.linalg.lstsq by 2.9 times
   10000 data points: linfit is faster than scipy.linalg.lstsq by 3.5 times
  100000 data points: linfit is faster than scipy.linalg.lstsq by 4.4 times
 1000000 data points: linfit is faster than scipy.linalg.lstsq by 4.6 times
.
Compare linfit to scipy.stats.linregress with unweighted data points
      10 data points: linfit is faster than scipy.stats.linregress by 5.2 times
     100 data points: linfit is faster than scipy.stats.linregress by 5.1 times
    1000 data points: linfit is faster than scipy.stats.linregress by 4.7 times
   10000 data points: linfit is faster than scipy.stats.linregress by 2.9 times
  100000 data points: linfit is faster than scipy.stats.linregress by 1.8 times
 1000000 data points: linfit is faster than scipy.stats.linregress by 1.1 times
.
Compare linfit to polyfit with relative individually weighted data points
      10 data points: linfit is faster than numpy.polyfit by 2.6 times
     100 data points: linfit is faster than numpy.polyfit by 2.5 times
    1000 data points: linfit is faster than numpy.polyfit by 4.4 times
   10000 data points: linfit is faster than numpy.polyfit by 3.1 times
  100000 data points: linfit is faster than numpy.polyfit by 3.5 times
 1000000 data points: linfit is faster than numpy.polyfit by 1.9 times
.
Compare linfit to polyfit with unweighted data points
      10 data points: linfit is faster than numpy.polyfit by 3 times
     100 data points: linfit is faster than numpy.polyfit by 3.5 times
    1000 data points: linfit is faster than numpy.polyfit by 4.3 times
   10000 data points: linfit is faster than numpy.polyfit by 6 times
  100000 data points: linfit is faster than numpy.polyfit by 9.5 times
 1000000 data points: linfit is faster than numpy.polyfit by 7.1 times
.....
----------------------------------------------------------------------

On Wed, Dec 4, 2013 at 12:13 PM, Daniele Nicolodi wrote:
> On 03/12/2013 11:19, David J Pine wrote:
> [clip]
>
> I think that adding a least squares fitting routine which handles
> uncertainties correctly and computes the covariance matrix is a good
> idea. I have wanted to do that myself for quite a while.
>
> However, I think that a generalization to arbitrary degree polynomials
> would be much more useful. A linfit function may be added as a
> convenience wrapper. Actually, it would be nice to have something that
> works on arbitrary orthogonal bases, but it may be difficult to design
> a general interface for such a thing.
>
> Regarding your pull request, I don't really think that your code can
> be much faster than the general purpose least squares fitting already
> in scipy or numpy, modulo some bug somewhere. You justify that by
> saying that your solution is faster because it does not invert a
> matrix, but this is exactly what you are doing, except that you do not
> write the math in a matrix formalism.
>
> Furthermore, I didn't have a very close look, but I don't understand
> what the `relsigma` parameter is supposed to do, and I would rename
> the `sigmay` parameter `yerr`.
>
> Cheers,
> Daniele
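For readers following the "direct algebra versus matrix inversion"
point: the closed-form weighted least squares solution for a straight
line is just a handful of weighted sums (see e.g. Bevington, or
Numerical Recipes ch. 15). A sketch of that algebra -- illustrative,
not David's actual linfit code:

    import numpy as np

    def linfit_direct(x, y, sigmay=None):
        # Weighted least squares for y = a + b*x with weights
        # w = 1/sigmay**2.  Returns the intercept a, the slope b,
        # and their covariance matrix.
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        if sigmay is None:
            w = np.ones_like(x)
        else:
            w = 1.0 / np.asarray(sigmay, dtype=float) ** 2
        S, Sx, Sy = w.sum(), (w * x).sum(), (w * y).sum()
        Sxx, Sxy = (w * x * x).sum(), (w * x * y).sum()
        delta = S * Sxx - Sx * Sx
        a = (Sxx * Sy - Sx * Sxy) / delta               # intercept
        b = (S * Sxy - Sx * Sy) / delta                 # slope
        cov = np.array([[Sxx, -Sx], [-Sx, S]]) / delta  # covariance of (a, b)
        return a, b, cov

Mathematically this solves the same 2x2 normal equations a general
solver would; the speed difference under discussion comes from
skipping the general-purpose machinery, not from different math.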
From daniele at grinta.net Wed Dec 4 07:55:51 2013
From: daniele at grinta.net (Daniele Nicolodi)
Date: Wed, 04 Dec 2013 13:55:51 +0100
Subject: [SciPy-User] adding linear fitting routine
In-Reply-To:
References: <529F0E77.1040106@grinta.net>
Message-ID: <529F2657.2050808@grinta.net>

On 04/12/2013 13:43, David J Pine wrote:
> linfit does not call a matrix inversion routine. Instead it calculates
> the best fit slope and y-intercept directly. By contrast, polyfit does
> call a matrix inversion routine (numpy.linalg.lstsq), which has a
> certain amount of overhead that linfit avoids. This may be why polyfit
> is slower than linfit.

A least squares fit is a matrix inversion. What you do is a matrix
inversion, except that the notation you use does not make this clear.
What you can discuss is the method you use for the inversion. I would
have to have a closer look at the test...

> 4. relsigma. Other than using no weighting at all, there are basically
> two ways that people weight data in a least squares fit.
>
>     (a) Provide explicit absolute estimates of the errors
> (uncertainties) for each data point. This is what physical scientists
> often do. Setting relsigma=False tells linfit to use this method of
> weighting the data. If the error estimates are accurate, then the
> covariance matrix provides estimates of the uncertainties in the
> fitting parameters (the slope & y-intercept).
>
>     (b) Provide relative estimates of the errors (uncertainties) for
> each data point (it's assumed that the absolute errors are not known,
> but the relative uncertainties between different data points are
> known). This is what social scientists often do. When only the
> relative uncertainties are known, the covariance matrix needs to be
> rescaled in order to obtain accurate estimates of the uncertainties
> in the fitting parameters. Setting relsigma=True tells linfit to use
> this method of weighting the data.

This is not really clear from the docstring (plus the parameter is
optional but no default value is specified in the docstring), and it is
made even less obvious by the name of the parameter used to specify the
uncertainties.

I would prefer two independent and mutually exclusive parameters for
the two cases; 'sigma' and 'relsigma' are one option if you want to be
compatible with the (ugly, IMHO) parameter name used by curve_fit.

Cheers,
Daniele

From daniele at grinta.net Wed Dec 4 08:04:00 2013
From: daniele at grinta.net (Daniele Nicolodi)
Date: Wed, 04 Dec 2013 14:04:00 +0100
Subject: [SciPy-User] adding linear fitting routine
In-Reply-To:
References: <529F0E77.1040106@grinta.net>
Message-ID: <529F2840.9000100@grinta.net>

On 04/12/2013 13:43, David J Pine wrote:
> 1. Generalization to arbitrary degree polynomials. This already exists
> in numpy.polyfit. One limitation of polyfit is that it does not
> currently allow the user to provide absolute uncertainties in the
> data, but there has been some discussion of adding this capability.

This is a huge limitation, IMHO. Furthermore, polyfit() only fits
complete polynomials up to a given degree, not polynomials with
arbitrary terms (it is not possible to fit y = d * x**3, only
y = a + b * x + c * x**2 + d * x**3).

Cheers,
Daniele

From davidmenhur at gmail.com Wed Dec 4 08:20:14 2013
From: davidmenhur at gmail.com (Daπid)
Date: Wed, 4 Dec 2013 14:20:14 +0100
Subject: [SciPy-User] [SciPy-Dev] adding linear fitting routine
In-Reply-To:
References:
Message-ID:

On 3 December 2013 11:19, David J Pine wrote:
> I would like to get some feedback and generate some discussion about a
> least squares fitting routine I submitted last Friday

On the wishlist level, I would like to see complete model fitting,
considering errors in both axes and correlation, and an option for
robust fitting. See details, for example, here:
http://arxiv.org/abs/1008.4686

I haven't really needed it myself, so I haven't taken the time to
implement it yet.

/David.

From djpine at gmail.com Wed Dec 4 08:58:28 2013
From: djpine at gmail.com (David Pine)
Date: Wed, 4 Dec 2013 14:58:28 +0100
Subject: [SciPy-User] adding linear fitting routine
In-Reply-To: <529F2657.2050808@grinta.net>
References: <529F0E77.1040106@grinta.net> <529F2657.2050808@grinta.net>
Message-ID:

Daniele,

On Dec 4, 2013, at 1:55 PM, Daniele Nicolodi wrote:

> On 04/12/2013 13:43, David J Pine wrote:
>> linfit does not call a matrix inversion routine. Instead it
>> calculates the best fit slope and y-intercept directly. By contrast,
>> polyfit does call a matrix inversion routine (numpy.linalg.lstsq),
>> which has a certain amount of overhead that linfit avoids. This may
>> be why polyfit is slower than linfit.
>
> A least squares fit is a matrix inversion. What you do is a matrix
> inversion, except that the notation you use does not make this clear.
> What you can discuss is the method you use for the inversion. I would
> have to have a closer look at the test...

I assure you that I understand the mathematics. Specifically, I
understand that you can view the mathematics used in linfit as
implementing matrix inversion. That is not the point.
The point is that polyfit calls a matrix inversion routine, which
invokes computational machinery that is slow compared to just doing
the algebra directly, without calling a matrix inversion routine. I
hope this is clear.

>> 4. relsigma. Other than using no weighting at all, there are
>> basically two ways that people weight data in a least squares fit.
>>
>> [clip]
>
> This is not really clear from the docstring (plus the parameter is
> optional but no default value is specified in the docstring), and it
> is made even less obvious by the name of the parameter used to
> specify the uncertainties.

It's specified in the function definition:

def linfit(x, y, sigmay=None, relsigma=True, cov=False, chisq=False, residuals=False)

which is the way it's always done in the online numpy and scipy
documentation. However, I can additionally specify it in the docstring
under the parameter definition.

> I would prefer two independent and mutually exclusive parameters for
> the two cases; 'sigma' and 'relsigma' are one option if you want to
> be compatible with the (ugly, IMHO) parameter name used by curve_fit.

Here I disagree. sigmay is an array of error values. relsigma is a
boolean that simply tells linfit whether to treat the sigmay values as
relative (relsigma=True, the default) or as absolute (relsigma=False).

David

From josef.pktd at gmail.com Wed Dec 4 09:24:28 2013
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Wed, 4 Dec 2013 09:24:28 -0500
Subject: [SciPy-User] adding linear fitting routine
In-Reply-To:
References: <529F0E77.1040106@grinta.net> <529F2657.2050808@grinta.net>
Message-ID:

On Wed, Dec 4, 2013 at 8:58 AM, David Pine wrote:
> Daniele,
>
> On Dec 4, 2013, at 1:55 PM, Daniele Nicolodi wrote:
>
> [clip]
>
> I assure you that I understand the mathematics. Specifically, I
> understand that you can view the mathematics used in linfit as
> implementing matrix inversion. That is not the point.
> The point is that polyfit calls a matrix inversion routine, which
> invokes computational machinery that is slow compared to just doing
> the algebra directly, without calling a matrix inversion routine. I
> hope this is clear.
>
> [clip]

linfit looks like an enhanced version of linregress, which also has
only one regressor but doesn't have weights:
http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.linregress.html

relsigma is similar to the new `absolute_sigma` in curve_fit:
https://github.com/scipy/scipy/pull/3098

I think linregress could be rewritten to include these improvements.

Otherwise I keep out of any fitting debates, because I think `odr` is
better for handling measurement errors in the x variables, statsmodels
is better for everything else (mainly linear only so far), and `lmfit`
for nonlinear LS. There might be a case for stripped down convenience
functions or special case functions.

Josef
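As a concrete version of the statsmodels route Josef mentions, WLS
with weights 1/sigma**2 returns parameters, standard errors, and
p-values in one pass; a sketch with synthetic data (the numbers are
made up):

    import numpy as np
    import statsmodels.api as sm

    x = np.linspace(0.0, 10.0, 50)
    sigmay = 0.5 * np.ones_like(x)                     # absolute 1-sigma errors
    y = 1.0 + 2.0 * x + sigmay * np.random.randn(50)   # synthetic data

    X = sm.add_constant(x)                             # design matrix [1, x]
    res = sm.WLS(y, X, weights=1.0 / sigmay ** 2).fit()
    print res.params    # [intercept, slope]
    print res.bse       # standard errors of the estimates
    print res.pvalues   # p-values

Note the caveat above: WLS estimates the scale from the residuals, so
this matches the relsigma=True/absolute_sigma=False convention.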
From djpine at gmail.com Wed Dec 4 11:29:55 2013
From: djpine at gmail.com (David Pine)
Date: Wed, 4 Dec 2013 17:29:55 +0100
Subject: [SciPy-User] adding linear fitting routine
In-Reply-To:
References: <529F0E77.1040106@grinta.net> <529F2657.2050808@grinta.net>
Message-ID: <093C8355-C530-408B-A55D-BA7105688DFD@gmail.com>

On Dec 4, 2013, at 3:24 PM, josef.pktd at gmail.com wrote:

> linfit looks like an enhanced version of linregress, which also has
> only one regressor but doesn't have weights:
> http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.linregress.html

The problem with this is that the statistical tests that linregress
uses--the r-value, p-value, & stderr--are not really compatible with
the weighted chi-squared fitting that linfit does. The r-value,
p-value, & stderr are statistical tests that are used mostly in the
social sciences (see
http://en.wikipedia.org/wiki/Coefficient_of_determination). Looking at
linregress, it's clear that it was written with that community in mind.

By contrast, linfit (and curve_fit) uses the chi-squared measure of
goodness of fit, which is explicitly made to be used with weighted
data. In my opinion, trying to satisfy the needs of both communities
with one function will result in inefficient code and confusion in
both user communities. linfit naturally goes with the curve_fit and
polyfit functions, and is implemented consistently with those fitting
routines. linregress is really a different animal, with statistical
tests normally used with unweighted data, and I suspect that the
community that uses it will be put off by the "improvements" made by
linfit.

> relsigma is similar to the new `absolute_sigma` in curve_fit:
> https://github.com/scipy/scipy/pull/3098

That's right. linfit implements essentially the same functionality
that is being implemented in curve_fit.

> I think linregress could be rewritten to include these improvements.
>
> Otherwise I keep out of any fitting debates, because I think `odr` is
> better for handling measurement errors in the x variables,
> statsmodels is better for everything else (mainly linear only so
> far), and `lmfit` for nonlinear LS. There might be a case for
> stripped down convenience functions or special case functions.
>
> Josef

From david.sousarj at yahoo.com.br Wed Dec 4 11:58:02 2013
From: david.sousarj at yahoo.com.br (davidsousarj)
Date: Wed, 4 Dec 2013 08:58:02 -0800 (PST)
Subject: [SciPy-User] Non-linear parameter optimization without least-squares
Message-ID: <1386176282390-18956.post@n7.nabble.com>

Hi,

I am working with python 2.7 using the latest versions of scipy/numpy.
I need to find the best parameters to minimize a function that is like
this:

f(x) = A*x + c*e^(B*x)

where A and B are parameters and c is a constant. The function is
non-linear, and I used to use the method scipy.optimize.leastsq to
perform this optimization:

xi = np.array([list])
yi = np.array([list])
p = [A0, B0]

def error(params, xi, yi):
    y0 = f(params, xi)
    return yi - y0

best_p, ok = scipy.optimize.leastsq(error, p, args=(xi, yi))
print best_p

But now I want to optimize the parameters with a different objective
function, not the sum of squared deviations. If I want to use, for
example, the sum of the absolute values of the errors, what function of
scipy should I use?

Thank you.

--
View this message in context: http://scipy-user.10969.n7.nabble.com/Non-linear-parameter-optimization-without-least-squares-tp18956.html
Sent from the Scipy-User mailing list archive at Nabble.com.
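One answer to the question above: swap leastsq for a scalar minimizer
over the summed absolute error; scipy.optimize.minimize with the
Nelder-Mead simplex copes with the non-smooth objective (the absolute
value has no derivative at zero residual). A sketch with synthetic
data -- c, the data, and the starting guess are all made up:

    import numpy as np
    from scipy.optimize import minimize

    c = 1.0  # the known constant in the model

    def f(params, x):
        A, B = params
        return A * x + c * np.exp(B * x)

    def abs_error(params, x, y):
        # Sum of absolute deviations instead of the sum of squares.
        return np.abs(y - f(params, x)).sum()

    xi = np.linspace(0.0, 2.0, 30)
    yi = f([1.5, 0.7], xi) + 0.1 * np.random.randn(30)   # synthetic data

    res = minimize(abs_error, [1.0, 1.0], args=(xi, yi), method='Nelder-Mead')
    best_A, best_B = res.x
    print best_A, best_B

On older scipy without optimize.minimize,
scipy.optimize.fmin(abs_error, [1.0, 1.0], args=(xi, yi)) is the
equivalent call.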
From josef.pktd at gmail.com Wed Dec 4 12:03:35 2013
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Wed, 4 Dec 2013 12:03:35 -0500
Subject: [SciPy-User] adding linear fitting routine
In-Reply-To: <093C8355-C530-408B-A55D-BA7105688DFD@gmail.com>
References: <529F0E77.1040106@grinta.net> <529F2657.2050808@grinta.net> <093C8355-C530-408B-A55D-BA7105688DFD@gmail.com>
Message-ID:

On Wed, Dec 4, 2013 at 11:29 AM, David Pine wrote:
> On Dec 4, 2013, at 3:24 PM, josef.pktd at gmail.com wrote:
>
>> linfit looks like an enhanced version of linregress, which also has
>> only one regressor but doesn't have weights:
>> http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.linregress.html
>
> The problem with this is that the statistical tests that linregress
> uses--the r-value, p-value, & stderr--are not really compatible with
> the weighted chi-squared fitting that linfit does. The r-value,
> p-value, & stderr are statistical tests that are used mostly in the
> social sciences (see
> http://en.wikipedia.org/wiki/Coefficient_of_determination). Looking
> at linregress, it's clear that it was written with that community in
> mind.
>
> [clip]

Except for setting absolute_sigma to True or relsigma to False, and
returning redchisq instead of rsquared, there is no real difference.
It's still just weighted least squares with fixed or estimated scale.
(In statsmodels we have most of the same statistics returned after WLS
as after OLS. However, allowing for a fixed scale is still not built
in.)

You still return the cov of the parameter estimates, so users can
still calculate std_err and pvalue themselves in `linfit`.

In my interpretation of the discussions around curve_fit, it seems to
me that it is now a version that both communities can use. The only
problem I see is that linfit/linregress get a bit ugly if there are
many optional returns.

Josef

>> relsigma is similar to the new `absolute_sigma` in curve_fit:
>> https://github.com/scipy/scipy/pull/3098
>
> That's right. linfit implements essentially the same functionality
> that is being implemented in curve_fit.
>
>> I think linregress could be rewritten to include these improvements.
>>
>> Otherwise I keep out of any fitting debates, because I think `odr`
>> is better for handling measurement errors in the x variables,
>> statsmodels is better for everything else (mainly linear only so
>> far), and `lmfit` for nonlinear LS.
> > Josef > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From djpine at gmail.com Wed Dec 4 12:47:35 2013 From: djpine at gmail.com (David J Pine) Date: Wed, 4 Dec 2013 18:47:35 +0100 Subject: [SciPy-User] adding linear fitting routine In-Reply-To: References: <529F0E77.1040106@grinta.net> <529F2657.2050808@grinta.net> <093C8355-C530-408B-A55D-BA7105688DFD@gmail.com> Message-ID: Josef, Ok, so what would you propose? That we essentially replace linregress with linfit, and then let people calculate std_err and pvalue themselves from the covariance matrix that `linfit` returns? Or something else? By the way, that's what I chose to do for the estimates of the uncertainties in the fitting parameters--to let the user calculate the uncertainties in the fitting parameters from the square roots of the diagonal elements of the covariance matrix. In my opinion, that results in a cleaner, less cluttered function. David On Wed, Dec 4, 2013 at 6:03 PM, wrote: > On Wed, Dec 4, 2013 at 11:29 AM, David Pine wrote: > > > > On Dec 4, 2013, at 3:24 PM, josef.pktd at gmail.com wrote: > > > > > > linfit looks like an enhanced version of linregress, which also has > > only one regressor, but doesn't have weights > > > http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.linregress.html > > > > > > The problem with this is that the statistical tests that linregress > > uses--the r-value, p-value, & stderr-- are not really compatible with the > > weighted chi-squared fitting that linfit does. The r-value, p-value, & > > stderr are statistical tests that are used mostly in the social sciences > > (see http://en.wikipedia.org/wiki/Coefficient_of_determination). > Looking at > > linregress, it's clear that it was written with that community in mind. > > > > By contrast, linfit (and curve_fit) use the chi-squared measure of > goodness > > of fit, which is explicitly made to be used with weighted data. In my > > opinion, trying to satisfy the needs of both communities with one > function > > will result in inefficient code and confusion in both user communities. > > linfit naturally goes with the curve_fit and polyfit functions, and is > > implemented consistent with those fitting routines. linregress is > really a > > different animal, with statistical tests normally used with unweighted > data, > > and I suspect that the community that uses it will be put off by the > > "improvements" made by linfit. > > except for setting absolute_sigma to True or relsigma to False and > returning redchisq instead of rsquared, there is no real difference. > It's still just weighted least squares with fixed or estimated scale. > (In statsmodels we have most of the same statistics returned after WLS > as after OLS. However, allowing for a fixed scale is still not built > in.) > > You still return the cov of the parameter estimates, so users can > still calculate std_err and pvalue themselves in `linfit`. > > In my interpretation of the discussions around curve_fit, it seems to > me that it is now a version that both communities can use. > The only problem I see is that linfit/linregress get a bit ugly if > there are many optional returns. > > Josef > > > > > > > relsigma is similar to the new `absolute_sigma` in curve_fit > > https://github.com/scipy/scipy/pull/3098 > > > > > > That's right.
linfit implements essentially the same functionality that > is > > being implemented in curve_fit > > > > > > I think linregress could be rewritten to include these improvements. > > > > Otherwise I keep out of any fitting debates, because I think `odr` is > > better for handling measurement errors in the x variables, and > > statsmodels is better for everything else (mainly linear only so far) > > and `lmfit` for nonlinear LS. > > There might be a case for stripped down convenience functions or > > special case functions. > > > > Josef > > > > > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From djpine at gmail.com Wed Dec 4 12:53:36 2013 From: djpine at gmail.com (David J Pine) Date: Wed, 4 Dec 2013 18:53:36 +0100 Subject: [SciPy-User] adding linear fitting routine In-Reply-To: References: <529F0E77.1040106@grinta.net> <529F2657.2050808@grinta.net> <093C8355-C530-408B-A55D-BA7105688DFD@gmail.com> Message-ID: Josef, Actually, I just rechecked curve_fit and it returns only the optimal fitting parameters and the covariance matrix. I could pare down linfit so that it returns only those quantities and leave it to the user to calculate chi-squared and the residuals. I suppose that's the cleanest way to go. David On Wed, Dec 4, 2013 at 6:47 PM, David J Pine wrote: > Josef, > > Ok, so what would you propose? That we essentially replace linregress > with linfit, and then let people calculate std_err and pvalue themselves > from the covariance matrix that `linfit` returns? or something else? By > the way, that's what I chose to do for the estimates of the uncertainties > in the fitting parameters--to let the user calculate the uncertainties in > the fitting parameters from square roots the diagonal elements of the > covariance matrix. In my opinion, that results in a cleaner less cluttered > function. > > David > > David > > > On Wed, Dec 4, 2013 at 6:03 PM, wrote: >> >> On Wed, Dec 4, 2013 at 11:29 AM, David Pine wrote: >> > >> > On Dec 4, 2013, at 3:24 PM, josef.pktd at gmail.com wrote: >> > >> > >> > linfit looks like an enhanced version of linregress, which also has >> > only one regressor, but doesn't have weights >> > >> http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.linregress.html >> > >> > >> > The problem with this is that the statistical tests that linregress >> > uses--the r-value, p-value, & stderr-- are not really compatible with >> the >> > weighted chi-squared fitting that linfit does. The r-value, p-value, & >> > stderr are statistical tests that are used mostly in the social sciences >> > (see http://en.wikipedia.org/wiki/Coefficient_of_determination). >> Looking at >> > linregress, it's clear that it was written with that community in mind. >> > >> > By contrast, linfit (and curve_fit) use the chi-squared measure of >> goodness >> > of fit, which is explicitly made to be used with weighted data. In my >> > opinion, trying to satisfy the needs of both communities with one >> function >> > will result in inefficient code and confusion in both user communities. >> > linfit naturally goes with the curve_fit and polyfit functions, and is >> > implemented consistent with those fitting routines.
linregress is >> really a >> > different animal, with statistical tests normally used with unweighted >> data, >> > and I suspect that the community that uses it will be put off by the >> > "improvements" made by linfit. >> >> except for setting absolute_sigma to True or relsigma to False and >> returning redchisq instead of rsquared, there is no real difference. >> It's still just weighted least squares with fixed or estimated scale. >> (In statsmodels we have most of the same statistics returned after WLS >> as after OLS. However, allowing for a fixed scale is still not built >> in.) >> >> You still return the cov of the parameter estimates, so users can >> still calculate std_err and pvalue themselves in `linfit`. >> >> In my interpretation of the discussions around curve_fit, it seems to >> me that it is now a version that both communities can use. >> The only problem I see is that linfit/linregress get a bit ugly if >> there are many optional returns. >> >> Josef >> >> > >> > >> > relsigma is similar to the new `absolute_sigma` in curve_fit >> > https://github.com/scipy/scipy/pull/3098 >> > >> > >> > That's right. linfit implements essentially the same functionality >> that is >> > being implemented in curve_fit >> > >> > >> > >> > I think linregress could be rewritten to include these improvements. >> > >> > Otherwise I keep out of any fitting debates, because I think `odr` is >> > better for handling measurement errors in the x variables, and >> > statsmodels is better for everything else (mainly linear only so far) >> > and `lmfit` for nonlinear LS. >> > There might be a case for stripped down convenience functions or >> > special case functions. >> > >> > Josef >> > >> > >> > >> > _______________________________________________ >> > SciPy-User mailing list >> > SciPy-User at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Dec 4 13:15:03 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 4 Dec 2013 13:15:03 -0500 Subject: [SciPy-User] adding linear fitting routine In-Reply-To: References: <529F0E77.1040106@grinta.net> <529F2657.2050808@grinta.net> <093C8355-C530-408B-A55D-BA7105688DFD@gmail.com> Message-ID: On Wed, Dec 4, 2013 at 12:53 PM, David J Pine wrote: > Josef, > > Actually, I just rechecked curve_fit and it returns only the optimal fitting > parameters and the covariance matrix. I could pare down linfit so that it > returns only those quantities and leave it to the user to calculate > chi0squared and the residuals. I suppose that's the cleanest way to go. > > David > > > On Wed, Dec 4, 2013 at 6:47 PM, David J Pine wrote: >> >> Josef, >> >> Ok, so what would you propose? That we essentially replace linregress >> with linfit, and then let people calculate std_err and pvalue themselves >> from the covariance matrix that `linfit` returns? or something else? By >> the way, that's what I chose to do for the estimates of the uncertainties in >> the fitting parameters--to let the user calculate the uncertainties in the >> fitting parameters from square roots the diagonal elements of the covariance >> matrix. In my opinion, that results in a cleaner less cluttered function. Please reply inline so we have the sub-threads together. 
two thoughts:

- I'm getting more and more averse to functions that return "numbers". scipy.optimize minimize returns a dictionary. In statsmodels we return a special class instance that can lazily calculate all the extra things a user might want. And where we don't do that yet, like in the hypothesis tests, we want to change it soon. The two main problems with returning numbers are that the return cannot be changed in a backwards-compatible way, and, second, if we want to offer the user additional optional results, then we need return_this, return_that, return_something_else, ....

- The main usage of stats.linregress that I have seen, in random looks at various packages, is just to get a quick fit of a line without (m)any extras. In this case just returning the parameters and maybe some other minimal cheap extras is fine.

I don't know if we want linfit to provide a one-stop shopping center, or just to provide some minimal results and leave the rest to the user.

(In statsmodels I also often don't know what I should do. I follow the scipy tradition and return some numbers, only to change my mind later when I see what additional results could be easily calculated within the function, but I don't get access to the required calculations.)

Josef > >>> >>> David >>> >>> David >>> >>> >>> On Wed, Dec 4, 2013 at 6:03 PM, wrote: >>>> >>>> On Wed, Dec 4, 2013 at 11:29 AM, David Pine wrote: >>>> > >>>> > On Dec 4, 2013, at 3:24 PM, josef.pktd at gmail.com wrote: >>>> > >>>> > >>>> > linfit looks like an enhanced version of linregress, which also has >>>> > only one regressor, but doesn't have weights >>>> > >>>> > http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.linregress.html >>>> > >>>> > >>>> > The problem with this is that the statistical tests that linregress >>>> > uses--the r-value, p-value, & stderr-- are not really compatible with >>>> > the >>>> > weighted chi-squared fitting that linfit does. The r-value, p-value, >>>> > & >>>> > stderr are statistical tests that are used mostly in the social >>>> > sciences >>>> > (see http://en.wikipedia.org/wiki/Coefficient_of_determination). >>>> > Looking at >>>> > linregress, it's clear that it was written with that community in mind. >>>> > >>>> > By contrast, linfit (and curve_fit) use the chi-squared measure of >>>> > goodness >>>> > of fit, which is explicitly made to be used with weighted data. In my >>>> > opinion, trying to satisfy the needs of both communities with one >>>> > function >>>> > will result in inefficient code and confusion in both user communities. >>>> > linfit naturally goes with the curve_fit and polyfit functions, and is >>>> > implemented consistent with those fitting routines. linregress is >>>> > really a >>>> > different animal, with statistical tests normally used with unweighted >>>> > data, >>>> > and I suspect that the community that uses it will be put off by the >>>> > "improvements" made by linfit. >>>> >>>> except for setting absolute_sigma to True or relsigma to False and >>>> returning redchisq instead of rsquared, there is no real difference. >>>> It's still just weighted least squares with fixed or estimated scale. >>>> (In statsmodels we have most of the same statistics returned after WLS >>>> as after OLS. However, allowing for a fixed scale is still not built >>>> in.) >>>> >>>> You still return the cov of the parameter estimates, so users can >>>> still calculate std_err and pvalue themselves in `linfit`. >>>> >>>> In my interpretation of the discussions around curve_fit, it seems to >>>> me that it is now a version that both communities can use.
>>> The only problem I see is that linfit/linregress get a bit ugly if >>> there are many optional returns. >>> >>> Josef >>> >>> > >>> > >>> > relsigma is similar to the new `absolute_sigma` in curve_fit >>> > https://github.com/scipy/scipy/pull/3098 >>> > >>> > >>> > That's right. linfit implements essentially the same functionality >>> > that is >>> > being implemented in curve_fit >>> > >>> > >>> > >>> > I think linregress could be rewritten to include these improvements. >>> > >>> > Otherwise I keep out of any fitting debates, because I think `odr` is >>> > better for handling measurement errors in the x variables, and >>> > statsmodels is better for everything else (mainly linear only so far) >>> > and `lmfit` for nonlinear LS. >>> > There might be a case for stripped down convenience functions or >>> > special case functions. >>> > >>> > Josef >>> > >>> > >>> > >>> > _______________________________________________ >>> > SciPy-User mailing list >>> > SciPy-User at scipy.org >>> > http://mail.scipy.org/mailman/listinfo/scipy-user >>> > >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From newville at cars.uchicago.edu Wed Dec 4 13:24:12 2013 From: newville at cars.uchicago.edu (Matt Newville) Date: Wed, 4 Dec 2013 12:24:12 -0600 Subject: [SciPy-User] adding linear fitting routine In-Reply-To: References: <529F0E77.1040106@grinta.net> <529F2657.2050808@grinta.net> <093C8355-C530-408B-A55D-BA7105688DFD@gmail.com> Message-ID: Hi David, Josef, On Wed, Dec 4, 2013 at 12:15 PM, wrote: > On Wed, Dec 4, 2013 at 12:53 PM, David J Pine wrote: >> Josef, >> >> Actually, I just rechecked curve_fit and it returns only the optimal fitting >> parameters and the covariance matrix. I could pare down linfit so that it >> returns only those quantities and leave it to the user to calculate >> chi0squared and the residuals. I suppose that's the cleanest way to go. >> >> David >> >> >> On Wed, Dec 4, 2013 at 6:47 PM, David J Pine wrote: >>> >>> Josef, >>> >>> Ok, so what would you propose? That we essentially replace linregress >>> with linfit, and then let people calculate std_err and pvalue themselves >>> from the covariance matrix that `linfit` returns? or something else? By >>> the way, that's what I chose to do for the estimates of the uncertainties in >>> the fitting parameters--to let the user calculate the uncertainties in the >>> fitting parameters from square roots the diagonal elements of the covariance >>> matrix. In my opinion, that results in a cleaner less cluttered function. > > Please reply inline so we have the sub-threads together. > > two thoughts: > > - I'm getting more and more averse to functions that return "numbers" > scipy.optimize minimize returns a dictionary > In statsmodels we return a special class instance, that can > calculate lazily all the extra things a user might want. > And were we don't do that yet like in hypothesis test, we want to > change it soon. > The two main problems with returning numbers are that it cannot be > changed in a backwards compatible way, and, second, if we want to > offer a user to calculate additional optional results, then we need > return_this, return_that, return_something_else, .... 
> > - The main usage of stats.linregress that I have seen in random looks > at various packages, is just to get quick fit of a line without (m)any > extras. In this case just returning the parameters and maybe some > other minimal cheap extras is fine. > > > I don't know if we want linfit to provide a one-stop shopping center, > or just to provide some minimal results and leave the rest to the > user. > > (In statsmodels I also often don't know what I should do. I follow the > scipy tradition and return some numbers, only to change my mind later > when I see what additional results could be easily calculated within > the function, but I don't get access to the required calculations.) > > Josef > >>> >>> David >>> >>> David >>> >>> >>> On Wed, Dec 4, 2013 at 6:03 PM, wrote: >>>> >>>> On Wed, Dec 4, 2013 at 11:29 AM, David Pine wrote: >>>> > >>>> > On Dec 4, 2013, at 3:24 PM, josef.pktd at gmail.com wrote: >>>> > >>>> > >>>> > linfit looks like an enhanced version of linregress, which also has >>>> > only one regressor, but doesn't have weights >>>> > >>>> > http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.linregress.html >>>> > >>>> > >>>> > The problem with this is that the statistical tests that linregress >>>> > uses--the r-value, p-value, & stderr-- are not really compatible with >>>> > the >>>> > weighted chi-squared fitting that linfit does. The r-value, p-value, >>>> > & >>>> > stderr are statistical tests that are used mostly in the social >>>> > sciences >>>> > (see http://en.wikipedia.org/wiki/Coefficient_of_determination). >>>> > Looking at >>>> > linregress, it's clear that it was written with that community in mind. >>>> > >>>> > By contrast, linfit (and curve_fit) use the chi-squared measure of >>>> > goodness >>>> > of fit, which is explicitly made to be used with weighted data. In my >>>> > opinion, trying to satisfy the needs of both communities with one >>>> > function >>>> > will result in inefficient code and confusion in both user communities. >>>> > linfit naturally goes with the curve_fit and polyfit functions, and is >>>> > implemented consistent with those fitting routines. linregress is >>>> > really a >>>> > different animal, with statistical tests normally used with unweighted >>>> > data, >>>> > and I suspect that the community that uses it will be put off by the >>>> > "improvements" made by linfit. >>>> >>>> except for setting absolute_sigma to True or relsigma to False and >>>> returning redchisq instead of rsquared, there is no real difference. >>>> It's still just weighted least squares with fixed or estimated scale. >>>> (In statsmodels we have most of the same statistics returned after WLS >>>> as after OLS. However, allowing for a fixed scale is still not built >>>> in.) >>>> >>>> You still return the cov of the parameter estimates, so users can >>>> still calculate std_err and pvalue themselves in `linfit`. >>>> >>>> In my interpretation of the discussions around curve_fit, it seems to >>>> me that it is now a version that both communities can use. >>>> The only problem I see is that linfit/linregress get a bit ugly if >>>> there are many optional returns. >>>> >>>> Josef >>>> >>>> > >>>> > >>>> > relsigma is similar to the new `absolute_sigma` in curve_fit >>>> > https://github.com/scipy/scipy/pull/3098 >>>> > >>>> > >>>> > That's right. linfit implements essentially the same functionality >>>> > that is >>>> > being implemented in curve_fit >>>> > >>>> > >>>> > >>>> > I think linregress could be rewritten to include these improvements. 
>>>> > >>>> > Otherwise I keep out of any fitting debates, because I think `odr` is >>>> > better for handling measurement errors in the x variables, and >>>> > statsmodels is better for everything else (mainly linear only so far) >>>> > and `lmfit` for nonlinear LS. >>>> > There might be a case for stripped down convenience functions or >>>> > special case functions. >>>> > >>>> > Josef >>>> > >>>> > >>>> > >>>> > _______________________________________________ >>>> > SciPy-User mailing list >>>> > SciPy-User at scipy.org >>>> > http://mail.scipy.org/mailman/listinfo/scipy-user >>>> > >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user >

I'd very much like to see having linfit() available in scipy. Would it be reasonable to add David's linfit() as "the more complete" version, and refactor linregress() to use linfit() and return its current return tuple derived from the linfit results? Perhaps that's what Josef is suggesting too. FWIW, I would generally prefer getting a dictionary of results as a return value instead of a tuple with more than 2 items.

--Matt Newville

From rnelsonchem at gmail.com Wed Dec 4 13:42:57 2013 From: rnelsonchem at gmail.com (Ryan Nelson) Date: Wed, 4 Dec 2013 13:42:57 -0500 Subject: [SciPy-User] Non-linear parameter optimization without least-squares In-Reply-To: <1386176282390-18956.post@n7.nabble.com> References: <1386176282390-18956.post@n7.nabble.com> Message-ID:

If you know that leastsq squares and sums the return value from your error function, perhaps you could just modify the return value:

    def error(params, xi, yi):
        y0 = f(params, xi)
        return (np.abs(yi - y0))**0.5

This is probably really bad from a statistical point of view, but I guess it does what you want. I don't know if any of the other functions will use the absolute deviation.

Ryan

On Wed, Dec 4, 2013 at 11:58 AM, davidsousarj wrote:
> Hi, I am working with Python 2.7 using the latest versions of
> scipy/numpy. I need to find the best parameters to minimize a function
> that looks like this: f(x) = A*x + c*e^(B*x), where A and B are
> parameters and c is a constant. The function is non-linear, and I used
> to use scipy.optimize.leastsq to perform this optimization:
>
>     xi = np.array([list])
>     yi = np.array([list])
>     p = [A0, B0]
>
>     def error(params, xi, yi):
>         y0 = f(params, xi)
>         return yi - y0
>
>     best_p, ok = scipy.optimize.leastsq(error, p, args=(xi, yi))
>     print best_p
>
> But now I want to optimize the parameters with a different objective,
> not the sum of squared deviations. If I want to use, for example, the
> sum of the absolute values of the errors, what function of scipy would
> I use? Thank you.
> ------------------------------
> View this message in context: Non-linear parameter optimization without least-squares
> Sent from the Scipy-User mailing list archive at Nabble.com.
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>

-------------- next part -------------- An HTML attachment was scrubbed... URL:
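A quick self-contained check of the trick described in the message above, on made-up straight-line data with one outlier (note that |r|**0.5 is not differentiable where a residual is exactly zero, so this can be numerically fragile):

    import numpy as np
    from scipy.optimize import leastsq

    xi = np.arange(10.0)
    yi = 3.0 * xi + 1.0
    yi[5] += 50.0  # a single large outlier

    def residual(params, xi, yi):
        A, B = params
        return yi - (A * xi + B)

    def lad_residual(params, xi, yi):
        # leastsq minimizes sum(r_i**2); feeding it |r|**0.5 makes that
        # sum equal to sum(|r|), i.e. a least-absolute-deviations fit.
        return np.abs(residual(params, xi, yi)) ** 0.5

    p_ls, _ = leastsq(residual, [1.0, 0.0], args=(xi, yi))
    p_lad, _ = leastsq(lad_residual, [1.0, 0.0], args=(xi, yi))
    print(p_ls, p_lad)  # the LAD fit stays much closer to slope 3, intercept 1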
From djpine at gmail.com Wed Dec 4 14:13:51 2013 From: djpine at gmail.com (David J Pine) Date: Wed, 4 Dec 2013 20:13:51 +0100 Subject: [SciPy-User] adding linear fitting routine In-Reply-To: References: <529F0E77.1040106@grinta.net> <529F2657.2050808@grinta.net> <093C8355-C530-408B-A55D-BA7105688DFD@gmail.com> Message-ID: I guess my preference would be to have linfit() be as similar to curve_fit() in outputs (and inputs in so far as it makes sense), and then if we decide we prefer another way of doing either the inputs or the outputs, then to do them in concert. I think there is real value in making the user interfaces of linfit() and curve_fit() consistent--it makes the user's experience so much less confusing. As of right now, I am agnostic about whether or not the function returns a dictionary of results--although I am unsure of what you have in mind. How would you structure a dictionary of results? David On Wed, Dec 4, 2013 at 7:24 PM, Matt Newville wrote: > Hi David, Josef, > > On Wed, Dec 4, 2013 at 12:15 PM, wrote: > > On Wed, Dec 4, 2013 at 12:53 PM, David J Pine wrote: > >> Josef, > >> > >> Actually, I just rechecked curve_fit and it returns only the optimal > fitting > >> parameters and the covariance matrix. I could pare down linfit so that > it > >> returns only those quantities and leave it to the user to calculate > >> chi0squared and the residuals. I suppose that's the cleanest way to go. > >> > >> David > >> > >> > >> On Wed, Dec 4, 2013 at 6:47 PM, David J Pine wrote: > >>> > >>> Josef, > >>> > >>> Ok, so what would you propose? That we essentially replace linregress > >>> with linfit, and then let people calculate std_err and pvalue > themselves > >>> from the covariance matrix that `linfit` returns? or something else? > By > >>> the way, that's what I chose to do for the estimates of the > uncertainties in > >>> the fitting parameters--to let the user calculate the uncertainties in > the > >>> fitting parameters from square roots the diagonal elements of the > covariance > >>> matrix. In my opinion, that results in a cleaner less cluttered > function. > > > > Please reply inline so we have the sub-threads together. > > > > two thoughts: > > > > - I'm getting more and more averse to functions that return "numbers" > > scipy.optimize minimize returns a dictionary > > In statsmodels we return a special class instance, that can > > calculate lazily all the extra things a user might want. > > And were we don't do that yet like in hypothesis test, we want to > > change it soon. > > The two main problems with returning numbers are that it cannot be > > changed in a backwards compatible way, and, second, if we want to > > offer a user to calculate additional optional results, then we need > > return_this, return_that, return_something_else, .... > > > > - The main usage of stats.linregress that I have seen in random looks > > at various packages, is just to get quick fit of a line without (m)any > > extras. In this case just returning the parameters and maybe some > > other minimal cheap extras is fine. > > > > > > I don't know if we want linfit to provide a one-stop shopping center, > > or just to provide some minimal results and leave the rest to the > > user. > > > > (In statsmodels I also often don't know what I should do. I follow the > > scipy tradition and return some numbers, only to change my mind later > > when I see what additional results could be easily calculated within > > the function, but I don't get access to the required calculations.)
> > > > Josef > > > >>> > >>> David > >>> > >>> David > >>> > >>> > >>> On Wed, Dec 4, 2013 at 6:03 PM, wrote: > >>>> > >>>> On Wed, Dec 4, 2013 at 11:29 AM, David Pine wrote: > >>>> > > >>>> > On Dec 4, 2013, at 3:24 PM, josef.pktd at gmail.com wrote: > >>>> > > >>>> > > >>>> > linfit looks like an enhanced version of linregress, which also has > >>>> > only one regressor, but doesn't have weights > >>>> > > >>>> > > http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.linregress.html > >>>> > > >>>> > > >>>> > The problem with this is that the statistical tests that linregress > >>>> > uses--the r-value, p-value, & stderr-- are not really compatible > with > >>>> > the > >>>> > weighted chi-squared fitting that linfit does. The r-value, > p-value, > >>>> > & > >>>> > stderr are statistical tests that are used mostly in the social > >>>> > sciences > >>>> > (see http://en.wikipedia.org/wiki/Coefficient_of_determination). > >>>> > Looking at > >>>> > linregress, it's clear that it was written with that community in > mind. > >>>> > > >>>> > By contrast, linfit (and curve_fit) use the chi-squared measure of > >>>> > goodness > >>>> > of fit, which is explicitly made to be used with weighted data. In > my > >>>> > opinion, trying to satisfy the needs of both communities with one > >>>> > function > >>>> > will result in inefficient code and confusion in both user > communities. > >>>> > linfit naturally goes with the curve_fit and polyfit functions, and > is > >>>> > implemented consistent with those fitting routines. linregress is > >>>> > really a > >>>> > different animal, with statistical tests normally used with > unweighted > >>>> > data, > >>>> > and I suspect that the community that uses it will be put off by the > >>>> > "improvements" made by linfit. > >>>> > >>>> except for setting absolute_sigma to True or relsigma to False and > >>>> returning redchisq instead of rsquared, there is no real difference. > >>>> It's still just weighted least squares with fixed or estimated scale. > >>>> (In statsmodels we have most of the same statistics returned after WLS > >>>> as after OLS. However, allowing for a fixed scale is still not built > >>>> in.) > >>>> > >>>> You still return the cov of the parameter estimates, so users can > >>>> still calculate std_err and pvalue themselves in `linfit`. > >>>> > >>>> In my interpretation of the discussions around curve_fit, it seems to > >>>> me that it is now a version that both communities can use. > >>>> The only problem I see is that linfit/linregress get a bit ugly if > >>>> there are many optional returns. > >>>> > >>>> Josef > >>>> > >>>> > > >>>> > > >>>> > relsigma is similar to the new `absolute_sigma` in curve_fit > >>>> > https://github.com/scipy/scipy/pull/3098 > >>>> > > >>>> > > >>>> > That's right. linfit implements essentially the same functionality > >>>> > that is > >>>> > being implemented in curve_fit > >>>> > > >>>> > > >>>> > > >>>> > I think linregress could be rewritten to include these improvements. > >>>> > > >>>> > Otherwise I keep out of any fitting debates, because I think `odr` > is > >>>> > better for handling measurement errors in the x variables, and > >>>> > statsmodels is better for everything else (mainly linear only so > far) > >>>> > and `lmfit` for nonlinear LS. > >>>> > There might be a case for stripped down convenience functions or > >>>> > special case functions. 
> >>>> > > >>>> > Josef > >>>> > > >>>> > > >>>> > > >>>> > _______________________________________________ > >>>> > SciPy-User mailing list > >>>> > SciPy-User at scipy.org > >>>> > http://mail.scipy.org/mailman/listinfo/scipy-user > >>>> > > >>>> _______________________________________________ > >>>> SciPy-User mailing list > >>>> SciPy-User at scipy.org > >>>> http://mail.scipy.org/mailman/listinfo/scipy-user > >>> > >>> > >> > >> > >> _______________________________________________ > >> SciPy-User mailing list > >> SciPy-User at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-user > >> > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > I'd very much like to see having linfit() available in scipy. > Would it be reasonable to add David's linfit() as "the more complete" > version, and refactor linregress() to use linfit() and return it's > current return tuple derived from the linfit results? Perhaps that > what Josef is suggesting too. FWIW, I would generally prefer getting > a dictionary of results as a return value instead of a tuple with more > than 2 items. > > --Matt Newville > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From newville at cars.uchicago.edu Wed Dec 4 15:15:30 2013 From: newville at cars.uchicago.edu (Matt Newville) Date: Wed, 4 Dec 2013 14:15:30 -0600 Subject: [SciPy-User] adding linear fitting routine In-Reply-To: References: <529F0E77.1040106@grinta.net> <529F2657.2050808@grinta.net> <093C8355-C530-408B-A55D-BA7105688DFD@gmail.com> Message-ID: Hi David, On Wed, Dec 4, 2013 at 1:13 PM, David J Pine wrote: > I guess my preference would be to write have linfit() be as similar to > curve_fit() in outputs (and inputs in so far as it makes sense), and then if > we decide we prefer another way of doing either the inputs or the outputs, > then to do them in concert. I think there is real value in making the user > interfaces of linfit() and curve_fit() consistent--it make the user's > experience so much less confusing. As of right now, I am agnostic about > whether or not the function returns a dictionary of results--although I am > unsure of what you have in mind. How would you structure a dictionary of > results? > Using return (pbest, covar) seems reasonable. But, if you returned a dictionary, you could include a chi-square statistic and a residuals array. scipy.optimize.leastsq() returns 5 items: (pbest, covar, infodict, mesg, ier) with infodict being a dict with items 'nfev', 'fvec', 'fjac', 'ipvt', and 'qtf'. I think it's too late to change it, but it would have been nicer (IMHO) if it had returned a single dict instead: return {'best_values': pbest, 'covar': covar, 'nfev': infodict['nfev'], 'fvec': infodict['fvec'], 'fjac': infodict['fjac'], 'ipvt': infodict['ipvt'], 'qtf': infodict['qtf'], 'mesg': mesg, 'ier': ier} Similarly, linregress() returns a 5 element tuple. The problem with these is that you end up with long assignments slope, intercept, r_value, p_value, stderr = scipy.stats.linregress(xdata, ydata) in fact, you sort of have to do this, even for a quick and dirty result when slope and intercept are all that would be used later on. 
The central problem is these 5 returned values are now in your local namespace, but they are not really independent values. Instead, you could think about regression = scipy.stats.linregress(xdata, ydata) and get to any of the values from computing the regression you want. In short, if you had linfit() return a dictionary of values, you could put many statistics in it, and people who wanted to ignore some of them would be able to do so. FWIW, a named tuple would be fine alternative. I don't know if backward compatibility would prevent that in scipy. Anyway, it's just a suggestion.... --Matt From lthiberiol at gmail.com Wed Dec 4 15:19:32 2013 From: lthiberiol at gmail.com (Luiz Thiberio Rangel) Date: Wed, 4 Dec 2013 18:19:32 -0200 Subject: [SciPy-User] "Segmentation fault (core dumped)" when running scipy.spatial.distance.squareform Message-ID: Hi everyone, I am facing a problem when I try to manage some realy big data. When I try the same code using a small dataset it works prefectly. The code is: >>> import pandas as pd>>> from scipy.spatial import distance >>> content = pd.read_table('proteobacteria-gene_content.tab')>>> content = content.set_index('Species_name')>>> content = content.T>>> contentIndex: 83803 entries, Proteo_1 to Proteo_83803Columns: 468 entries, Methylomonas_methanica to Glaciecola_sp dtypes: int64(468) >>> j_distances = distance.pdist(content, metric='jaccard')>>> distance_matrix = distance.squareform(j_distances)Segmentation fault (core dumped) The j_distances array contains 3,511,429,503 float64 elements, and ocupy nearly 26Gb of space. I already tried to decrease the j_distance size using the "astype('float16')" function but the error goes on. Can anybody help me with it? (The numpy version is 1.6.1 and the scipy version is 0.13.1. I am running it in a 12.04 ubuntu server with 1Tb RAM.) Luiz Thib?rio Rangel -------------- next part -------------- An HTML attachment was scrubbed... URL: From djpine at gmail.com Wed Dec 4 17:00:31 2013 From: djpine at gmail.com (David J Pine) Date: Wed, 4 Dec 2013 23:00:31 +0100 Subject: [SciPy-User] adding linear fitting routine In-Reply-To: References: <529F0E77.1040106@grinta.net> <529F2657.2050808@grinta.net> <093C8355-C530-408B-A55D-BA7105688DFD@gmail.com> Message-ID: Ok, here are my thoughts about how to do the returns. They are informed by (1) the speed of linfit and (2) the above discussion. (1) Speed. linfit runs fastest when the residuals are not calculated. Calculating the residuals generally slows down linfit by a factor of 2 to 3 -- and it's the only thing that really slows it down. After that, all additional calculations consume negligible time. The residuals are calculated if: (a) residuals=True or chisq=True, or (b) if cov=True AND relsigma=True. Note that (b) means that the residuals are not calculated when cov=True AND relsigma=False (and residuals=False or chisq=False). (2) The consensus of the discussion seems to be that when a lot of things are returned by linfit, it's better to return everything as a dictionary. So here is what I propose: If the user only wants the optimal fitting parameters, or the user wants only the optimal fitting parameters and the covariance matrix, these can be returns as arrays. Otherwise, everything is returned, and returned as a dictionary. If we adopted this, then the only question is what the default setting would be, say return_all=False or return_all=True. I guess I would opt for return_all=False, the less verbose return option. 
Adopting these way of doing things would simplify the arguments of linfit, which would now look like linfit(x, y, sigmay=None, relsigma=True, return_all=False) I would also modify linfit to calculate the r-value, p-value, and the stderr, which would all be returned in dictionary format when return_all=True. How does this sound? David On Wed, Dec 4, 2013 at 9:15 PM, Matt Newville wrote: > Hi David, > > On Wed, Dec 4, 2013 at 1:13 PM, David J Pine wrote: > > I guess my preference would be to write have linfit() be as similar to > > curve_fit() in outputs (and inputs in so far as it makes sense), and > then if > > we decide we prefer another way of doing either the inputs or the > outputs, > > then to do them in concert. I think there is real value in making the > user > > interfaces of linfit() and curve_fit() consistent--it make the user's > > experience so much less confusing. As of right now, I am agnostic about > > whether or not the function returns a dictionary of results--although I > am > > unsure of what you have in mind. How would you structure a dictionary of > > results? > > > > Using return (pbest, covar) seems reasonable. But, if you > returned a dictionary, you could include a chi-square statistic and a > residuals array. > > scipy.optimize.leastsq() returns 5 items: (pbest, covar, infodict, mesg, > ier) > with infodict being a dict with items 'nfev', 'fvec', 'fjac', 'ipvt', > and 'qtf'. I think it's too late to change it, but it would have > been nicer (IMHO) if it had returned a single dict instead: > > return {'best_values': pbest, 'covar': covar, 'nfev': > infodict['nfev'], 'fvec': infodict['fvec'], > 'fjac': infodict['fjac'], 'ipvt': infodict['ipvt'], > 'qtf': infodict['qtf'], 'mesg': mesg, 'ier': ier} > > Similarly, linregress() returns a 5 element tuple. The problem with > these is that you end up with long assignments > slope, intercept, r_value, p_value, stderr = > scipy.stats.linregress(xdata, ydata) > > in fact, you sort of have to do this, even for a quick and dirty > result when slope and intercept are all that would be used later on. > The central problem is these 5 returned values are now in your local > namespace, but they are not really independent values. Instead, you > could think about > regression = scipy.stats.linregress(xdata, ydata) > > and get to any of the values from computing the regression you want. > In short, if you > had linfit() return a dictionary of values, you could put many > statistics in it, and people who wanted to ignore some of them would > be able to do so. > > FWIW, a named tuple would be fine alternative. I don't know if > backward compatibility would prevent that in scipy. Anyway, it's > just a suggestion.... > > --Matt > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniele at grinta.net Wed Dec 4 17:29:43 2013 From: daniele at grinta.net (Daniele Nicolodi) Date: Wed, 04 Dec 2013 23:29:43 +0100 Subject: [SciPy-User] adding linear fitting routine In-Reply-To: References: <529F0E77.1040106@grinta.net> <529F2657.2050808@grinta.net> <093C8355-C530-408B-A55D-BA7105688DFD@gmail.com> Message-ID: <529FACD7.90801@grinta.net> On 04/12/2013 23:00, David J Pine wrote: > Otherwise, everything is returned, and returned as a dictionary. I'll repeat myself: a named tuple is the way to go, not a dictionary. 
Cheers, Daniele

From josef.pktd at gmail.com Wed Dec 4 17:42:43 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 4 Dec 2013 17:42:43 -0500 Subject: [SciPy-User] adding linear fitting routine In-Reply-To: <529FACD7.90801@grinta.net> References: <529F0E77.1040106@grinta.net> <529F2657.2050808@grinta.net> <093C8355-C530-408B-A55D-BA7105688DFD@gmail.com> <529FACD7.90801@grinta.net> Message-ID:

On Wed, Dec 4, 2013 at 5:29 PM, Daniele Nicolodi wrote: > On 04/12/2013 23:00, David J Pine wrote: >> Otherwise, everything is returned, and returned as a dictionary. > > I'll repeat myself: a named tuple is the way to go, not a dictionary.

namedtuples have the disadvantage that users still use tuple unpacking, which again breaks backwards compatibility if any return is changed in the future.

Josef

> > Cheers, > Daniele > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user

From daniele at grinta.net Wed Dec 4 18:00:47 2013 From: daniele at grinta.net (Daniele Nicolodi) Date: Thu, 05 Dec 2013 00:00:47 +0100 Subject: [SciPy-User] adding linear fitting routine In-Reply-To: References: <529F2657.2050808@grinta.net> <093C8355-C530-408B-A55D-BA7105688DFD@gmail.com> <529FACD7.90801@grinta.net> Message-ID: <529FB41F.9040109@grinta.net>

On 04/12/2013 23:42, josef.pktd at gmail.com wrote: > On Wed, Dec 4, 2013 at 5:29 PM, Daniele Nicolodi wrote: >> On 04/12/2013 23:00, David J Pine wrote: >>> Otherwise, everything is returned, and returned as a dictionary. >> >> I'll repeat myself: a named tuple is the way to go, not a dictionary. > > namedtuples have the disadvantage that users still use tuple unpacking, > which again breaks backwards compatibility if any return is changed > in the future.

I frankly don't see how someone would want to extend the interface of a linear fitting routine to return more information in the future. I think all the information required to design the interface right is already available.

Cheers, Daniele

From gdmcbain at freeshell.org Wed Dec 4 18:02:05 2013 From: gdmcbain at freeshell.org (Geordie McBain) Date: Thu, 5 Dec 2013 10:02:05 +1100 Subject: [SciPy-User] Non-linear parameter optimization without least-squares In-Reply-To: References: <1386176282390-18956.post@n7.nabble.com> Message-ID:

2013/12/5 Ryan Nelson : > If you know that leastsq squares and sums the return value from your error > function, perhaps you could just modify the return value. > > def error(params, xi, yi): > y0 = f(params, xi) > return (np.abs(yi - y0))**0.5 > > This is probably really bad from a statistical point of view, but I guess it > does what you want. I don't know if any of the other functions will use the > absolute deviation.

Least absolute deviation is a special case of quantile regression; I don't know of any function in SciPy to do this, but there is statsmodels.regression.quantile_regression.QuantReg. http://statsmodels.sourceforge.net/devel/generated/statsmodels.regression.quantile_regression.QuantReg.html
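For example, a minimal sketch of a least-absolute-deviations fit with QuantReg, assuming a statsmodels version that exposes it from the top-level API (the data below are made up; q=0.5 requests the median, i.e. the LAD fit):

    import numpy as np
    import statsmodels.api as sm

    # made-up straight-line data with heavy-tailed noise
    x = np.linspace(0.0, 10.0, 100)
    y = 2.0 * x + 1.0 + np.random.standard_t(1, size=100)

    X = sm.add_constant(x)              # design matrix with an intercept column
    res = sm.QuantReg(y, X).fit(q=0.5)  # q=0.5 -> median regression (LAD)
    print(res.params)                   # [intercept, slope]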
-- G. D. McBain
Theory of Lift - Introductory Computational Aerodynamics in MATLAB/Octave Out now - http://www.wileyeurope.com/remtitle.cgi?111995228X

From djpine at gmail.com Wed Dec 4 18:21:53 2013 From: djpine at gmail.com (David J Pine) Date: Thu, 5 Dec 2013 00:21:53 +0100 Subject: [SciPy-User] adding linear fitting routine In-Reply-To: <529FB41F.9040109@grinta.net> References: <529F2657.2050808@grinta.net> <093C8355-C530-408B-A55D-BA7105688DFD@gmail.com> <529FACD7.90801@grinta.net> <529FB41F.9040109@grinta.net> Message-ID:

Of course, that's the point of designing for backwards compatibility--you don't see the need for more information when you write the code, otherwise you would include it. But as code gets used, you sometimes see things you didn't see before. So it's good to write code that allows for unforeseen changes.

On Thu, Dec 5, 2013 at 12:00 AM, Daniele Nicolodi wrote: > On 04/12/2013 23:42, josef.pktd at gmail.com wrote: > > On Wed, Dec 4, 2013 at 5:29 PM, Daniele Nicolodi > wrote: > >> On 04/12/2013 23:00, David J Pine wrote: > >>> Otherwise, everything is returned, and returned as a dictionary. > >> > >> I'll repeat myself: a named tuple is the way to go, not a dictionary. > > > > namedtuples have the disadvantage that users still use tuple unpacking, > > which again breaks backwards compatibility if any return is changed > > in the future. > > I frankly don't see how someone would want to extend the interface of a > linear fitting routine to return more information in the future. I > think all the information required to design the interface right is > already available. > > Cheers, > Daniele > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user -------------- next part -------------- An HTML attachment was scrubbed... URL:

From daniele at grinta.net Wed Dec 4 18:58:56 2013 From: daniele at grinta.net (Daniele Nicolodi) Date: Thu, 05 Dec 2013 00:58:56 +0100 Subject: [SciPy-User] adding linear fitting routine In-Reply-To: References: <093C8355-C530-408B-A55D-BA7105688DFD@gmail.com> <529FACD7.90801@grinta.net> <529FB41F.9040109@grinta.net> Message-ID: <529FC1C0.5030901@grinta.net>

On 05/12/2013 00:21, David J Pine wrote: > Of course, that's the point of designing for backwards > compatibility--you don't see the need for more information when you > write the code, otherwise you would include it. But as code gets used, > you sometimes see things you didn't see before. So it's good to write > code that allows for unforeseen changes.

If this is the reasoning, all functions or methods should return dictionaries.

PS: is it so hard to stop top-posting?

Cheers, Daniele

From alan.isaac at gmail.com Wed Dec 4 19:25:14 2013 From: alan.isaac at gmail.com (Alan G Isaac) Date: Wed, 04 Dec 2013 19:25:14 -0500 Subject: [SciPy-User] adding linear fitting routine In-Reply-To: <529FC1C0.5030901@grinta.net> References: <093C8355-C530-408B-A55D-BA7105688DFD@gmail.com> <529FACD7.90801@grinta.net> <529FB41F.9040109@grinta.net> <529FC1C0.5030901@grinta.net> Message-ID: <529FC7EA.3070408@gmail.com>

> On 05/12/2013 00:21, David J Pine wrote: >> Of course, that's the point of designing for backwards >> compatibility--you don't see the need for more information when you >> write the code, otherwise you would include it. But as code gets used, >> you sometimes see things you didn't see before. So it's good to write >> code that allows for unforeseen changes.
On 12/4/2013 6:58 PM, Daniele Nicolodi wrote: > If this is the reasoning, all functions or methods should return > dictionaries.

Indeed, we have just seen (in this thread, if I recall correctly) a lament that some optimization functions were not written initially to return dictionaries. It seems to me that the case for a named tuple will rely primarily on its being "lightweight". The case for a dictionary will rely on its flexibility. In most settings that I imagine, flexibility will be the greater concern. But not all. So for a specific function or method, it seems useful to explain why the trade-offs favor one or the other.

fwiw, Alan Isaac

From josef.pktd at gmail.com Wed Dec 4 19:26:28 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 4 Dec 2013 19:26:28 -0500 Subject: [SciPy-User] adding linear fitting routine In-Reply-To: <529FC1C0.5030901@grinta.net> References: <093C8355-C530-408B-A55D-BA7105688DFD@gmail.com> <529FACD7.90801@grinta.net> <529FB41F.9040109@grinta.net> <529FC1C0.5030901@grinta.net> Message-ID:

On Wed, Dec 4, 2013 at 6:58 PM, Daniele Nicolodi wrote: > On 05/12/2013 00:21, David J Pine wrote: >> Of course, that's the point of designing for backwards >> compatibility--you don't see the need for more information when you >> write the code, otherwise you would include it. But as code gets used, >> you sometimes see things you didn't see before. So it's good to write >> code that allows for unforeseen changes. > > If this is the reasoning, all functions or methods should return > dictionaries.

some functions are targeted narrowly enough that we don't expect many changes. I wouldn't know what else numpy.sum could return. (numpy.nanmean also does the count of the non-nans but doesn't return it.) we copied numpy.linalg.pinv into statsmodels because it doesn't give us the singular values. scipy.linalg got the change to optionally return the rank, with the new keyword `return_rank`.

Sometimes the reply on issues in scipy is that we cannot add to the return or change it because it's not backwards compatible. I would be happy if I could change the returns of stats.linregress.

In the case of linfit or curve_fit, there are many possible additional returns that we might want to add if the demand is large enough. Last time there was a question, I argued against curve_fit returning the std_err, i.e. np.sqrt(np.diag(pcov)) to keep it as just a minimal fitting function.

I'm not a big fan of dictionaries because I don't like to type [" "] instead of just a dot.

Josef

> > PS: is it so hard to stop top-posting? > > Cheers, > Daniele > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user

From djpine at gmail.com Thu Dec 5 03:27:23 2013 From: djpine at gmail.com (David J Pine) Date: Thu, 5 Dec 2013 09:27:23 +0100 Subject: [SciPy-User] adding linear fitting routine In-Reply-To: References: <093C8355-C530-408B-A55D-BA7105688DFD@gmail.com> <529FACD7.90801@grinta.net> <529FB41F.9040109@grinta.net> <529FC1C0.5030901@grinta.net> Message-ID:

On Thu, Dec 5, 2013 at 1:26 AM, wrote: > On Wed, Dec 4, 2013 at 6:58 PM, Daniele Nicolodi > wrote: > > On 05/12/2013 00:21, David J Pine wrote: > >> Of course, that's the point of designing for backwards > >> compatibility--you don't see the need for more information when you > >> write the code, otherwise you would include it. But as code gets used, > >> you sometimes see things you didn't see before.
So it's good to write > >> code that allows for unforeseen changes. > > > > If this is the reasoning, all functions or methods should return > > dictionaries. > > some functions are reasonable targeted that we don't expect many changes. > I wouldn't know what else numpy.sum could return. > (numpy.nanmean also does the count of the non-nans but doesn't return it.) > we copied numpy.linalg.pinv into statsmodels because it doesn't give > as the singular values. > scipy.linalg got the change to optionally return the rank, with new > keyword `return_rank` > > Sometimes the reply on issues in scipy is that we cannot add to the > return or change it because it's not backwards compatible. I would be > happy if I could change the returns of stats.linregress. > > In the case of linfit or curve_fit, there are many possible additional > returns that we might want to add if the demand is large enough. > Last time there was a question, I argued against curve_fit returning > the std_err, i.e. np.sqrt(np.diag(pcov)) to keep it as just a minimal > fitting function. > > I'm not a big fan of dictionaries because I don't like to type [" "] > instead of just a dot. > > Josef >

After all of this discussion, I find myself wanting to opt for a simple, clean set of returns, namely the fitting parameters as a 2-element array and the covariance matrix as a 2x2 array.

Then I would just include in the docstring instructions about how to calculate the uncertainties in the fitting parameters (std_err), the r-value, chi-squared, etc.

Alternatively, we could have linfit always return the fitting parameters and the covariance matrix as described above, and then a dictionary, with all the ancillary outputs, that could be returned if a 'return_all' switch was set to True. That way, with return_all=False, linfit could be used in a fast, lean mode, and with return_all=True, users could get all the other stuff in a dictionary to which later additions could be made as demand dictated. -------------- next part -------------- An HTML attachment was scrubbed... URL:

From lists at hilboll.de Thu Dec 5 04:33:37 2013 From: lists at hilboll.de (Andreas Hilboll) Date: Thu, 05 Dec 2013 10:33:37 +0100 Subject: [SciPy-User] adding linear fitting routine In-Reply-To: References: <529FACD7.90801@grinta.net> <529FB41F.9040109@grinta.net> <529FC1C0.5030901@grinta.net> Message-ID: <52A04871.6010104@hilboll.de>

On 05.12.2013 09:27, David J Pine wrote: > > > > On Thu, Dec 5, 2013 at 1:26 AM, > wrote: > > On Wed, Dec 4, 2013 at 6:58 PM, Daniele Nicolodi > wrote: > > > On 05/12/2013 00:21, David J Pine wrote: > > >> Of course, that's the point of designing for backwards > > >> compatibility--you don't see the need for more information when you > > >> write the code, otherwise you would include it. But as code gets > used, > > >> you sometimes see things you didn't see before. So it's good to > write > > >> code that allows for unforeseen changes. > > > > > > If this is the reasoning, all functions or methods should return > > > dictionaries. > > > > some functions are reasonable targeted that we don't expect many > > changes. > > I wouldn't know what else numpy.sum could return. > > (numpy.nanmean also does the count of the non-nans but doesn't > > return it.) > > we copied numpy.linalg.pinv into statsmodels because it doesn't give > > as the singular values.
> scipy.linalg got the change to optionally return the rank, with new > keyword `return_rank` > > Sometimes the reply on issues in scipy is that we cannot add to the > return or change it because it's not backwards compatible. I would be > happy if I could change the returns of stats.linregress. > > In the case of linfit or curve_fit, there are many possible additional > returns that we might want to add if the demand is large enough. > Last time there was a question, I argued against curve_fit returning > the std_err, i.e. np.sqrt(np.diag(pcov)) to keep it as just a minimal > fitting function. > > I'm not a big fan of dictionaries because I don't like to type [" "] > instead of just a dot. > > Josef > > > After all of this discussion, I find myself wanting to opt for a simple, > clean, set of returns, namely the fitting parameters as a 2-element > array and the covariance matrix as a 2x2 array. > > Then I would just include in the docstring instructions about how to > calculate the uncertainties in the fitting parameters (std_err), the > r-value, chi-squared, etc. Even though that's simple, I personally find it inconvenient. Plus, if the user has to calculate uncertainties by herself, that's maybe not really error-prone (because simple and explained in the docstring), but still un-tested. And the more tested code there is, the better. > Alternatively, we could have linfit always return the fitting parameters > and the covariance matrix as described above, and then a dictionary, > with all the ancillary outputs, that could be returned if a 'return_all' > switch was set to True. That way, with return_all=False, linfit could > be used in a fast, lean mode, and with return_all=True, users could get > all the other stuff in a dictionary to which later additions could be > made as demand dictated. +1 Andreas. From djpine at gmail.com Thu Dec 5 05:27:17 2013 From: djpine at gmail.com (David J Pine) Date: Thu, 5 Dec 2013 11:27:17 +0100 Subject: [SciPy-User] adding linear fitting routine In-Reply-To: <52A04871.6010104@hilboll.de> References: <529FACD7.90801@grinta.net> <529FB41F.9040109@grinta.net> <529FC1C0.5030901@grinta.net> <52A04871.6010104@hilboll.de> Message-ID: On Thu, Dec 5, 2013 at 10:33 AM, Andreas Hilboll wrote: > On 05.12.2013 09:27, David J Pine wrote: > > > > > > > > On Thu, Dec 5, 2013 at 1:26 AM, > > wrote: > > > > On Wed, Dec 4, 2013 at 6:58 PM, Daniele Nicolodi > > wrote: > > > On 05/12/2013 00:21, David J Pine wrote: > > >> Of course, that's the point of designing for backwards > > >> compatibility--you don't see the need for more information when > you > > >> write the code, otherwise you would include it. But as code gets > > used, > > >> you sometimes see things you didn't see before. So it's good to > > write > > >> code that allows for unforeseen changes. > > > > > > If this is the reasoning, all functions or methods should return > > > dictionaries. > > > > some functions are reasonable targeted that we don't expect many > > changes. > > I wouldn't know what else numpy.sum could return. > > (numpy.nanmean also does the count of the non-nans but doesn't > > return it.) > > we copied numpy.linalg.pinv into statsmodels because it doesn't give > > as the singular values. > > scipy.linalg got the change to optionally return the rank, with new > > keyword `return_rank` > > > > Sometimes the reply on issues in scipy is that we cannot add to the > > return or change it because it's not backwards compatible. I would be > > happy if I could change the returns of stats.linregress. 
> > > > In the case of linfit or curve_fit, there are many possible > additional > > returns that we might want to add if the demand is large enough. > > Last time there was a question, I argued against curve_fit returning > > the std_err, i.e. np.sqrt(np.diag(pcov)) to keep it as just a minimal > > fitting function. > > > > I'm not a big fan of dictionaries because I don't like to type [" > "] > > instead of just a dot. > > > > Josef > > > > > > After all of this discussion, I find myself wanting to opt for a simple, > > clean, set of returns, namely the fitting parameters as a 2-element > > array and the covariance matrix as a 2x2 array. > > > > Then I would just include in the docstring instructions about how to > > calculate the uncertainties in the fitting parameters (std_err), the > > r-value, chi-squared, etc. > > Even though that's simple, I personally find it inconvenient. Plus, if > the user has to calculate uncertainties by herself, that's maybe not > really error-prone (because simple and explained in the docstring), but > still un-tested. And the more tested code there is, the better. > > > Alternatively, we could have linfit always return the fitting parameters > > and the covariance matrix as described above, and then a dictionary, > > with all the ancillary outputs, that could be returned if a 'return_all' > > switch was set to True. That way, with return_all=False, linfit could > > be used in a fast, lean mode, and with return_all=True, users could get > > all the other stuff in a dictionary to which later additions could be > > made as demand dictated. > > +1 > > Andreas. > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > Good point about tested calculations. So I take that as a vote for including the ancillary outputs in a dictionary! David -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsseabold at gmail.com Thu Dec 5 05:43:27 2013 From: jsseabold at gmail.com (Skipper Seabold) Date: Thu, 5 Dec 2013 10:43:27 +0000 Subject: [SciPy-User] adding linear fitting routine In-Reply-To: References: <093C8355-C530-408B-A55D-BA7105688DFD@gmail.com> <529FACD7.90801@grinta.net> <529FB41F.9040109@grinta.net> <529FC1C0.5030901@grinta.net> Message-ID: On Thu, Dec 5, 2013 at 12:26 AM, wrote: > On Wed, Dec 4, 2013 at 6:58 PM, Daniele Nicolodi wrote: >> On 05/12/2013 00:21, David J Pine wrote: >>> Of course, that's the point of designing for backwards >>> compatibility--you don't see the need for more information when you >>> write the code, otherwise you would include it. But as code gets used, >>> you sometimes see things you didn't see before. So it's good to write >>> code that allows for unforeseen changes. >> >> If this is the reasoning, all functions or methods should return >> dictionaries. > > some functions are reasonable targeted that we don't expect many changes. > I wouldn't know what else numpy.sum could return. > (numpy.nanmean also does the count of the non-nans but doesn't return it.) > we copied numpy.linalg.pinv into statsmodels because it doesn't give > as the singular values. > scipy.linalg got the change to optionally return the rank, with new > keyword `return_rank` > > Sometimes the reply on issues in scipy is that we cannot add to the > return or change it because it's not backwards compatible. I would be > happy if I could change the returns of stats.linregress. 
> > In the case of linfit or curve_fit, there are many possible additional > returns that we might want to add if the demand is large enough. > Last time there was a question, I argued against curve_fit returning > the std_err, i.e. np.sqrt(np.diag(pcov)) to keep it as just a minimal > fitting function. > > I'm not a big fan of dictionaries because I don't like to type [" "] > instead of just a dot. > Maybe this is an opportunity to start introducing the Bunch pattern into scipy. From what I remember, the tuple returns were encouraged because scipy is a "library," though, of course, this leads to all the problems already pointed out. And it seems silly to just stick to this policy for its own sake. I've been using R quite a bit recently for some work, and they often return R lists, which are essentially what we get from a Bunch. It's quite nice, and I'm now finding tuples of returns to be pretty rough. Skipper From evgeny.burovskiy at gmail.com Thu Dec 5 06:48:27 2013 From: evgeny.burovskiy at gmail.com (Evgeni Burovski) Date: Thu, 5 Dec 2013 11:48:27 +0000 Subject: [SciPy-User] adding linear fitting routine In-Reply-To: References: <093C8355-C530-408B-A55D-BA7105688DFD@gmail.com> <529FACD7.90801@grinta.net> <529FB41F.9040109@grinta.net> <529FC1C0.5030901@grinta.net> Message-ID: IMO it would be good to have some degree of uniformity between `minimize`, `curve_fit` and this new routine. Both in terms of return types (`optimize.Result`?) and in terms of keyword arguments (`full_output` etc). On Thu, Dec 5, 2013 at 10:43 AM, Skipper Seabold wrote: > On Thu, Dec 5, 2013 at 12:26 AM, wrote: > > On Wed, Dec 4, 2013 at 6:58 PM, Daniele Nicolodi > wrote: > >> On 05/12/2013 00:21, David J Pine wrote: > >>> Of course, that's the point of designing for backwards > >>> compatibility--you don't see the need for more information when you > >>> write the code, otherwise you would include it. But as code gets used, > >>> you sometimes see things you didn't see before. So it's good to write > >>> code that allows for unforeseen changes. > >> > >> If this is the reasoning, all functions or methods should return > >> dictionaries. > > > > some functions are reasonable targeted that we don't expect many changes. > > I wouldn't know what else numpy.sum could return. > > (numpy.nanmean also does the count of the non-nans but doesn't return > it.) > > we copied numpy.linalg.pinv into statsmodels because it doesn't give > > as the singular values. > > scipy.linalg got the change to optionally return the rank, with new > > keyword `return_rank` > > > > Sometimes the reply on issues in scipy is that we cannot add to the > > return or change it because it's not backwards compatible. I would be > > happy if I could change the returns of stats.linregress. > > > > In the case of linfit or curve_fit, there are many possible additional > > returns that we might want to add if the demand is large enough. > > Last time there was a question, I argued against curve_fit returning > > the std_err, i.e. np.sqrt(np.diag(pcov)) to keep it as just a minimal > > fitting function. > > > > I'm not a big fan of dictionaries because I don't like to type [" "] > > instead of just a dot. > > > > Maybe this is an opportunity to start introducing the Bunch pattern > into scipy. From what I remember, the tuple returns were encouraged > because scipy is a "library," though, of course, this leads to all the > problems already pointed out. And it seems silly to just stick to this > policy for its own sake. 
From af.charles.pierre at gmail.com Thu Dec 5 08:47:47 2013
From: af.charles.pierre at gmail.com (Charles Pierre)
Date: Thu, 5 Dec 2013 14:47:47 +0100
Subject: [SciPy-User] lstsq/Scipy and python multiprocessing
Message-ID: 

I was trying to do some simple multivariate regression using
sklearn.linear_model and the multiprocessing module when I found this
really confusing behavior.

For some reason, the linear regression seems to be broken for particular
input vectors when using multiprocessing. Using the same training set
without multiprocessing yields correct values ...

Here is a piece of code that demonstrates this weird behavior:

import multiprocessing
from sklearn import linear_model

def test_without_multi(input_x, input_y):
    clf = linear_model.LinearRegression(normalize=True)
    clf.fit(input_x, input_y, n_jobs=1)
    print clf.coef_

def test_with_multi(input_x, input_y):
    process = multiprocessing.Process(target=test_without_multi,
                                      args=(input_x, input_y))
    process.start()
    process.join()

if __name__ == '__main__':
    input_x = [[0,0],[0,0],[0,0],[0,0],[0,0],[0,0],[0,0],[0,0],[1,1350]]
    input_y = [2,1,1,2,3,1,3,2,1]
    test_without_multi(input_x, input_y)
    test_with_multi(input_x, input_y)

Does anyone know what is happening ?

From alan.isaac at gmail.com Thu Dec 5 09:13:27 2013
From: alan.isaac at gmail.com (Alan G Isaac)
Date: Thu, 05 Dec 2013 09:13:27 -0500
Subject: [SciPy-User] adding linear fitting routine
References: <529FACD7.90801@grinta.net> <529FB41F.9040109@grinta.net> <529FC1C0.5030901@grinta.net>
Message-ID: <52A08A07.3010909@gmail.com>

On 12/5/2013 5:43 AM, Skipper Seabold wrote:
> Maybe this is an opportunity to start introducing the Bunch pattern
> into scipy.

Pursuing this train of thought one step further,
I like results objects that can do lazy evaluation
of expensive results.

Alan Isaac

From newville at cars.uchicago.edu Thu Dec 5 09:49:24 2013
From: newville at cars.uchicago.edu (Matt Newville)
Date: Thu, 5 Dec 2013 08:49:24 -0600
Subject: [SciPy-User] adding linear fitting routine
In-Reply-To: <52A08A07.3010909@gmail.com>
References: <529FACD7.90801@grinta.net> <529FB41F.9040109@grinta.net> <529FC1C0.5030901@grinta.net> <52A08A07.3010909@gmail.com>
Message-ID: 

On Thu, Dec 5, 2013 at 8:13 AM, Alan G Isaac wrote:
> Pursuing this train of thought one step further,
> I like results objects that can do lazy evaluation
> of expensive results.

Yes, I think this would be very helpful. Returning a "Result" that
was an otherwise empty class instance that could possibly have methods
to calculate derived values would be a nice approach. A nice
possibility is that curve_fit() could simply extend the Results class
returned from minimize(), so that the minimize() values were available
if needed.

I don't like the 'full_output' options that change the number or
quantity of output values.
Consistency in return values between
different functions with related purpose would be very, very helpful, but
consistency in return values for a single function seems like it
should be a requirement. Of course, breaking existing APIs is also
bad, but I would suggest not adding any more 'full_output' options.

That said, in the case of linfit(), returning just two values
(best_values, covariance) seems completely acceptable to me.

--Matt Newville

From flying-sheep at web.de Thu Dec 5 10:09:46 2013
From: flying-sheep at web.de (Philipp A.)
Date: Thu, 5 Dec 2013 16:09:46 +0100
Subject: [SciPy-User] Optimization does nothing
Message-ID: 

Hi,

I'm trying to run scipy's optimization on a negative log-likelihood
function, but it immediately "converges" with a gradient of 1e+23.

Here's the notebook if you could have a look:
http://nbviewer.ipython.org/gist/flying-sheep/7806554 (just ignore the
data, code is after that)

As you can see, Matlab's fmincon found a much better solution, and there's
no bug in my log-likelihood function (I use scipy 0.14's multivariate
normal log-probability density function, and it indeed rates Matlab's
optimum better).

What should I modify to make the optimizer do its job?

Best regards, Philipp

From guziy.sasha at gmail.com Thu Dec 5 10:10:18 2013
From: guziy.sasha at gmail.com (Oleksandr Huziy)
Date: Thu, 5 Dec 2013 10:10:18 -0500
Subject: [SciPy-User] lstsq/Scipy and python multiprocessing
Message-ID: 

Hi:

This code gives me the same answer in both cases.

[ -2.85439413e+14 2.11436602e+11]
[ -2.85439413e+14 2.11436602e+11]

sklearn.__version__ = '0.14.1'
multiprocessing.__version__ = '0.70a1'

Cheers
--
Sasha

From af.charles.pierre at gmail.com Thu Dec 5 10:15:51 2013
From: af.charles.pierre at gmail.com (Charles Pierre)
Date: Thu, 5 Dec 2013 16:15:51 +0100
Subject: Re: [SciPy-User] lstsq/Scipy and python multiprocessing
Message-ID: 

Hi,

Thanks for trying out the code. In my case, I get:

[ -4.37499997e-01 -3.24074065e-04]
[ -2.85439412e+14 2.11436597e+11]

I am still confused on why I don't get the same result for both methods...
From guziy.sasha at gmail.com Thu Dec 5 10:23:35 2013
From: guziy.sasha at gmail.com (Oleksandr Huziy)
Date: Thu, 5 Dec 2013 10:23:35 -0500
Subject: Re: [SciPy-User] lstsq/Scipy and python multiprocessing
Message-ID: 

My scipy version is '0.12.0'. I do not know, maybe it is a bug in sklearn,
scipy or multiprocessing. What are the versions you use?

--
Sasha

From af.charles.pierre at gmail.com Thu Dec 5 10:29:22 2013
From: af.charles.pierre at gmail.com (Charles Pierre)
Date: Thu, 5 Dec 2013 16:29:22 +0100
Subject: Re: [SciPy-User] lstsq/Scipy and python multiprocessing
Message-ID: 

Same as you,

sklearn.__version__ = '0.14.1'
multiprocessing.__version__ = '0.70a1'
scipy.__version__ = '0.12.0'

From guziy.sasha at gmail.com Thu Dec 5 10:35:10 2013
From: guziy.sasha at gmail.com (Oleksandr Huziy)
Date: Thu, 5 Dec 2013 10:35:10 -0500
Subject: Re: [SciPy-User] lstsq/Scipy and python multiprocessing
Message-ID: 

What about numpy and python?

Python 2.7.5+
numpy: '1.7.1'

--
Sasha

From af.charles.pierre at gmail.com Thu Dec 5 10:42:39 2013
From: af.charles.pierre at gmail.com (Charles Pierre)
Date: Thu, 5 Dec 2013 16:42:39 +0100
Subject: Re: [SciPy-User] lstsq/Scipy and python multiprocessing
Message-ID: 

python: 2.7.3
numpy: 1.7.0b2

I am gonna try updating.

From djpine at gmail.com Fri Dec 6 02:21:21 2013
From: djpine at gmail.com (David J Pine)
Date: Fri, 6 Dec 2013 08:21:21 +0100
Subject: Re: [SciPy-User] adding linear fitting routine
In-Reply-To: <6D454EB9-44AE-4A5D-9B55-3080364B9A26@gmail.com>
References: <6D454EB9-44AE-4A5D-9B55-3080364B9A26@gmail.com>
Message-ID: 
Of course, breaking existing APIs is also > bad, but I would suggest not adding any more 'full_output' options. > > That said, in the case of linfit(), returning just two values > (best_values, covariance) seems completely acceptable to me. > > --Matt Newville > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > The Bunch class idea is new to me. I looked it up ( http://stackoverflow.com/questions/16262670/understanding-the-bunch-pattern-and-self-dict) and tried it out in linfit(). It works quite nicely, having the advantages of a dictionary but with a cleaner syntax. If this way of doing the output were implemented, then linfit() would have two persistent outputs, fit (an array containing the slope and y-intercept) and cvm (the 2x2 covariance matrix of the fitting parameters), and the optional output info where info.rchisq would be the value of reduced chi-squared, info.resids would be the residuals, info.rval would be the r-value, etc. It isn't the usual way of doing things, but it's clean and simple. I rather like it. What does everyone else think? -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Fri Dec 6 02:25:11 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Fri, 6 Dec 2013 08:25:11 +0100 Subject: [SciPy-User] adding linear fitting routine In-Reply-To: References: <6D454EB9-44AE-4A5D-9B55-3080364B9A26@gmail.com> Message-ID: On Fri, Dec 6, 2013 at 8:21 AM, David J Pine wrote: > > > On Thu, Dec 5, 2013 at 3:49 PM, Matt Newville wrote: > >> On Thu, Dec 5, 2013 at 8:13 AM, Alan G Isaac >> wrote: >> > On 12/5/2013 5:43 AM, Skipper Seabold wrote: >> >> Maybe this is an opportunity to start introducing the Bunch pattern >> >> into scipy. >> > >> > Pursuing this train of thought one step further, >> > I like results objects that can do lazy evaluation >> > of expensive results. >> > >> > Alan Isaac >> > >> >> Yes, I think this would be very helpful. Returning a "Result" that >> was an otherwise empty class instance that could possibly have methods >> to calculate derived values would be a nice approach. A nice >> possibility is that curve_fit() could simply extend the Results class >> returned from minimize(), so that the minimize() values were available >> if needed. >> >> I don't like the 'full_output' options that change the number or >> quantity of output values. Consistency in return values between >> different functions with related purpose would very, very helpful, but >> consistency in return values for a single function seems like it >> should be a requirement. Of course, breaking existing APIs is also >> bad, but I would suggest not adding any more 'full_output' options. >> >> That said, in the case of linfit(), returning just two values >> (best_values, covariance) seems completely acceptable to me. >> >> --Matt Newville >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > The Bunch class idea is new to me. I looked it up ( > http://stackoverflow.com/questions/16262670/understanding-the-bunch-pattern-and-self-dict) > and tried it out in linfit(). It works quite nicely, having the advantages > of a dictionary but with a cleaner syntax. 
If this way of doing the output > were implemented, then linfit() would have two persistent outputs, fit (an > array containing the slope and y-intercept) and cvm (the 2x2 covariance > matrix of the fitting parameters), and the optional output info where > info.rchisq would be the value of reduced chi-squared, info.resids would be > the residuals, info.rval would be the r-value, etc. It isn't the usual way > of doing things, but it's clean and simple. I rather like it. What does > everyone else think? > +1 would be a useful improvement imho. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From newville at cars.uchicago.edu Fri Dec 6 10:28:23 2013 From: newville at cars.uchicago.edu (Matt Newville) Date: Fri, 6 Dec 2013 09:28:23 -0600 Subject: [SciPy-User] adding linear fitting routine In-Reply-To: References: <6D454EB9-44AE-4A5D-9B55-3080364B9A26@gmail.com> Message-ID: On Fri, Dec 6, 2013 at 1:21 AM, David J Pine wrote: > > > On Thu, Dec 5, 2013 at 3:49 PM, Matt Newville > wrote: >> >> On Thu, Dec 5, 2013 at 8:13 AM, Alan G Isaac wrote: >> > On 12/5/2013 5:43 AM, Skipper Seabold wrote: >> >> Maybe this is an opportunity to start introducing the Bunch pattern >> >> into scipy. >> > >> > Pursuing this train of thought one step further, >> > I like results objects that can do lazy evaluation >> > of expensive results. >> > >> > Alan Isaac >> > >> >> Yes, I think this would be very helpful. Returning a "Result" that >> was an otherwise empty class instance that could possibly have methods >> to calculate derived values would be a nice approach. A nice >> possibility is that curve_fit() could simply extend the Results class >> returned from minimize(), so that the minimize() values were available >> if needed. >> >> I don't like the 'full_output' options that change the number or >> quantity of output values. Consistency in return values between >> different functions with related purpose would very, very helpful, but >> consistency in return values for a single function seems like it >> should be a requirement. Of course, breaking existing APIs is also >> bad, but I would suggest not adding any more 'full_output' options. >> >> That said, in the case of linfit(), returning just two values >> (best_values, covariance) seems completely acceptable to me. >> >> --Matt Newville >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > The Bunch class idea is new to me. I looked it up > (http://stackoverflow.com/questions/16262670/understanding-the-bunch-pattern-and-self-dict) > and tried it out in linfit(). It works quite nicely, having the advantages > of a dictionary but with a cleaner syntax. If this way of doing the output > were implemented, then linfit() would have two persistent outputs, fit (an > array containing the slope and y-intercept) and cvm (the 2x2 covariance > matrix of the fitting parameters), and the optional output info where > info.rchisq would be the value of reduced chi-squared, info.resids would be > the residuals, info.rval would be the r-value, etc. It isn't the usual way > of doing things, but it's clean and simple. I rather like it. What does > everyone else think? > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > +1. 
I might suggest using longer names: 'covariance' instead of 'cvm', and perhaps 'slope' and 'intercept' instead of (or in addition to) a 2-element array (again, order is probably obvious, but only probably). But having linfit() included would be great. -- --Matt Newville From josef.pktd at gmail.com Fri Dec 6 11:19:06 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 6 Dec 2013 11:19:06 -0500 Subject: [SciPy-User] adding linear fitting routine In-Reply-To: References: <6D454EB9-44AE-4A5D-9B55-3080364B9A26@gmail.com> Message-ID: On Fri, Dec 6, 2013 at 10:28 AM, Matt Newville wrote: > On Fri, Dec 6, 2013 at 1:21 AM, David J Pine wrote: >> >> >> On Thu, Dec 5, 2013 at 3:49 PM, Matt Newville >> wrote: >>> >>> On Thu, Dec 5, 2013 at 8:13 AM, Alan G Isaac wrote: >>> > On 12/5/2013 5:43 AM, Skipper Seabold wrote: >>> >> Maybe this is an opportunity to start introducing the Bunch pattern >>> >> into scipy. >>> > >>> > Pursuing this train of thought one step further, >>> > I like results objects that can do lazy evaluation >>> > of expensive results. >>> > >>> > Alan Isaac >>> > >>> >>> Yes, I think this would be very helpful. Returning a "Result" that >>> was an otherwise empty class instance that could possibly have methods >>> to calculate derived values would be a nice approach. A nice >>> possibility is that curve_fit() could simply extend the Results class >>> returned from minimize(), so that the minimize() values were available >>> if needed. >>> >>> I don't like the 'full_output' options that change the number or >>> quantity of output values. Consistency in return values between >>> different functions with related purpose would very, very helpful, but >>> consistency in return values for a single function seems like it >>> should be a requirement. Of course, breaking existing APIs is also >>> bad, but I would suggest not adding any more 'full_output' options. >>> >>> That said, in the case of linfit(), returning just two values >>> (best_values, covariance) seems completely acceptable to me. >>> >>> --Matt Newville >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >> The Bunch class idea is new to me. I looked it up >> (http://stackoverflow.com/questions/16262670/understanding-the-bunch-pattern-and-self-dict) >> and tried it out in linfit(). It works quite nicely, having the advantages >> of a dictionary but with a cleaner syntax. If this way of doing the output >> were implemented, then linfit() would have two persistent outputs, fit (an >> array containing the slope and y-intercept) and cvm (the 2x2 covariance >> matrix of the fitting parameters), and the optional output info where >> info.rchisq would be the value of reduced chi-squared, info.resids would be >> the residuals, info.rval would be the r-value, etc. It isn't the usual way >> of doing things, but it's clean and simple. I rather like it. What does >> everyone else think? >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > +1. I might suggest using longer names: 'covariance' instead of > 'cvm', and perhaps 'slope' and 'intercept' instead of (or in addition > to) a 2-element array (again, order is probably obvious, but only > probably). But having linfit() included would be great. 
From josef.pktd at gmail.com Fri Dec 6 11:19:06 2013
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Fri, 6 Dec 2013 11:19:06 -0500
Subject: Re: [SciPy-User] adding linear fitting routine
Message-ID: 

On Fri, Dec 6, 2013 at 10:28 AM, Matt Newville wrote:
> +1. I might suggest using longer names: 'covariance' instead of
> 'cvm', and perhaps 'slope' and 'intercept' instead of (or in addition
> to) a 2-element array.

I like Bunches or results classes, but I don't expect to be much of a
user of linfit and don't vote.

There is still the question of whether and how to add lazy evaluation.
(In statsmodels we calculate most things lazily, but then attach it to
the results instance so any further use of final or intermediate
results doesn't have to be recalculated.)

Josef

From ralf.gommers at gmail.com Sun Dec 8 05:06:55 2013
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Sun, 8 Dec 2013 11:06:55 +0100
Subject: [SciPy-User] ANN: Scipy 0.13.2 release
Message-ID: 

Hi,

I'm happy to announce the availability of the scipy 0.13.2 release. This
is a bugfix only release; it contains fixes for ndimage and optimize, and
most importantly was compiled with Cython 0.19.2 to fix memory leaks in
code using Cython fused types.

Source tarballs, binaries and release notes can be found at
http://sourceforge.net/projects/scipy/files/scipy/0.13.2/

Cheers,
Ralf

==========================
SciPy 0.13.2 Release Notes
==========================

SciPy 0.13.2 is a bug-fix release with no new features compared to 0.13.1.

Issues fixed
------------

- 3096: require Cython 0.19, earlier versions have memory leaks in fused types
- 3079: ``ndimage.label`` fix swapped 64-bitness test
- 3108: ``optimize.fmin_slsqp`` constraint violation

From argriffi at ncsu.edu Sun Dec 8 16:22:32 2013
From: argriffi at ncsu.edu (alex)
Date: Sun, 8 Dec 2013 16:22:32 -0500
Subject: [SciPy-User] Optimization does nothing
Message-ID: 

Hi Philipp,

I'm glad to see that this new scipy function is being used! Here's a
code snippet that will reproduce your matlab answer:

-----

import numpy as np

from scipy.stats import multivariate_normal
from scipy import optimize
import scipy.linalg

from mydata import data1, data2

def neg_ll(X):
    lam, x2, y2, x1, y1 = X
    ll_total = 0
    for x, y, data in ((x1, y1, data1), (x2, y2, data2)):
        prec = np.array([
            [x*x, x*y*np.cos(lam)],
            [x*y*np.cos(lam), y*y]])
        cov = scipy.linalg.pinvh(prec)
        ll = multivariate_normal.logpdf(data, np.mean(data, 0), cov).sum()
        ll_total += ll
    return -ll_total

guess = np.ones(5)
optx, opty, info = optimize.fmin_l_bfgs_b(neg_ll, guess, approx_grad=True)

print 'scipy optimize info:'
print optx
print opty
print info

-----

Best,
Alex
From flying-sheep at web.de Mon Dec 9 06:30:36 2013
From: flying-sheep at web.de (Philipp A.)
Date: Mon, 9 Dec 2013 12:30:36 +0100
Subject: [SciPy-User] Optimization does nothing
Message-ID: 

Hi Alex,

Thank you very much for helping me here! Could you please tell me why you
chose pinvh here, and what's the difference between it, numpy.linalg.inv
and scipy.linalg.inv?

Best regards, Philipp

From argriffi at ncsu.edu Mon Dec 9 11:48:13 2013
From: argriffi at ncsu.edu (alex)
Date: Mon, 9 Dec 2013 11:48:13 -0500
Subject: Re: [SciPy-User] Optimization does nothing
Message-ID: 

> Hi Alex,
>
> Thank you very much for helping me here! Could you please tell me why you
> chose pinvh here, and what's the difference between it, numpy.linalg.inv
> and scipy.linalg.inv?

pinvh
http://docs.scipy.org/doc/scipy-dev/reference/generated/scipy.linalg.pinvh.html
gets the pseudo-inverse of a hermitian matrix, treating
nearly-singular matrices in a way that is compatible with the way that
they are treated by scipy's multivariate_normal logpdf. When the
covariance matrix is nearly singular, scipy's multivariate_normal uses
http://en.wikipedia.org/wiki/Multivariate_normal_distribution#Degenerate_case
which you probably don't want, now that I think about it.

If you set your angle to zero in your optimal solution, I think that
you will get a better log likelihood using scipy's multivariate
normal, because it is treating the distribution as degenerate, and the
optimization is not finding it because it is only looking for a local
min. I don't know how Matlab will treat it. Because computing the
rank of a matrix is numerically touchy, in your case it is best not to
check linalg.det == 0 but rather to supply the return_rank=True option
to pinvh. In your case this will give a rank<2 exactly when the
multivariate normal logpdf would have used a degenerate distribution,
and you can return np.inf as the negative log likelihood.

Maybe it would make sense to add some option to multivariate_normal
that tells it to complain about nearly singular matrices instead of
silently switching to a degenerate distribution, or to have the logpdf
optionally return the matrix rank that was used internally.
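A sketch of that return_rank suggestion (the singular precision matrix
here is just a made-up example):

import numpy as np
from scipy.linalg import pinvh

prec = np.array([[1.0, 1.0],
                 [1.0, 1.0]])              # rank-deficient precision matrix
cov, rank = pinvh(prec, return_rank=True)
if rank < prec.shape[0]:
    neg_ll = np.inf   # treat a degenerate covariance as infinitely unlikely
else:
    neg_ll = 0.0      # ...otherwise evaluate the real log-likelihood here
print(rank, neg_ll)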
From flying-sheep at web.de Mon Dec 9 14:13:24 2013
From: flying-sheep at web.de (Philipp A.)
Date: Mon, 9 Dec 2013 20:13:24 +0100
Subject: Re: [SciPy-User] Optimization does nothing
Message-ID: 

Thanks for the in-depth explanation, and for noticing that the parameters
were switched with respect to the data! (lam, x2, y2, x1, y1 vs data1,
data2)

From andrew.collette at gmail.com Mon Dec 9 19:28:46 2013
From: andrew.collette at gmail.com (Andrew Collette)
Date: Mon, 9 Dec 2013 17:28:46 -0700
Subject: [SciPy-User] ANN: HDF5 for Python 2.2.1
Message-ID: 

Announcing HDF5 for Python (h5py) 2.2.1
=======================================

The h5py team is happy, in a sense, to announce the availability of
h5py 2.2.1. This release fixes a critical bug reported by Jim Parker on
December 7th, which affects code using HDF5 compound types.

We recommend that all users of h5py 2.2.0 upgrade to avoid crashes or
possible data corruption.

About h5py, downloads, documentation: http://www.h5py.org

Scope of bug
------------

The issue affects a feature introduced in h5py 2.2.0, in which HDF5
compound datasets may be updated in-place, by specifying a field name or
names when writing to the dataset:

>>> dataset['field_name'] = value

Under certain conditions, h5py can supply uninitialized memory to the
HDF5 conversion machinery, leading (in the case reported) to a
segmentation fault. It is also possible for other fields of the type to
be corrupted.

This issue affects only code which updates a subset of the fields in the
compound type. Programs reading from a compound type, writing all
fields, or using other datatypes, are not affected; nor are versions of
h5py prior to 2.2.0.
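For concreteness, the affected pattern looks like this in context (the
file, dataset and field names here are made up):

import numpy as np
import h5py

dt = np.dtype([('a', np.float64), ('b', np.int32)])
with h5py.File('example.h5', 'w') as f:
    dset = f.create_dataset('table', (10,), dtype=dt)
    # field-wise, in-place update of a compound dataset: the code path
    # introduced in h5py 2.2.0 and fixed in 2.2.1
    dset['a'] = np.arange(10.0)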
More information
----------------

Github issue: https://github.com/h5py/h5py/issues/372
Original thread: https://groups.google.com/forum/#!topic/h5py/AbUOZ1MXf3U

Thanks also to Christoph Gohlke for making Windows installers available
on very short notice, after a glitch in the h5py build system.

From flying-sheep at web.de Tue Dec 10 11:39:15 2013
From: flying-sheep at web.de (Philipp A.)
Date: Tue, 10 Dec 2013 17:39:15 +0100
Subject: [SciPy-User] Scipy docs servers are SLOW
Message-ID: 

Hi, the following pic shows how slow the doc servers are to respond
sometimes:

[image: screenshot of browser request timings]

The second bar means "establishing connection" and the third one means
"waiting [for the server]", which takes sometimes 1, sometimes 10, and
sometimes >40 seconds.

Is there some way to fix this?

From ognen at enthought.com Tue Dec 10 11:40:53 2013
From: ognen at enthought.com (Ognen Duzlevski)
Date: Tue, 10 Dec 2013 10:40:53 -0600
Subject: Re: [SciPy-User] Scipy docs servers are SLOW
Message-ID: 

Yes, move the docs server to Amazon.

From flying-sheep at web.de Tue Dec 10 12:06:32 2013
From: flying-sheep at web.de (Philipp A.)
Date: Tue, 10 Dec 2013 18:06:32 +0100
Subject: Re: [SciPy-User] Scipy docs servers are SLOW
Message-ID: 

Or to pythonhosted (simply upload via PyPI), or to readthedocs, or to
github pages. I think everything is static pages anyway, so all of those
should be possible. Apologies if I'm wrong.

From cournape at gmail.com Fri Dec 13 05:02:05 2013
From: cournape at gmail.com (David Cournapeau)
Date: Fri, 13 Dec 2013 10:02:05 +0000
Subject: [SciPy-User] [SciPy-Dev] ANN: Scipy 0.13.2 release
Message-ID: 

Hi Ralf,

Thanks a lot for the quick fix release. I can confirm it builds and tests
correctly on windows, rhel5 and osx (both 32 and 64 bits).

cheers,
David
From James.R.Anderson at utah.edu Fri Dec 13 13:47:13 2013
From: James.R.Anderson at utah.edu (James Anderson)
Date: Fri, 13 Dec 2013 18:47:13 +0000
Subject: [SciPy-User] Scipy.spatial: Interval data structure?
Message-ID: <945D27AF3E34704989FA4D330E11A2AE310FCD42@X-MB6.xds.umail.utah.edu>

I'm looking for a reasonably fast interval tree structure in Python, and
scipy.spatial seems like the right place for this structure to live. I
need the basic queries of "all intervals intersecting a point" and "all
intervals intersecting an interval".

The only other interval tree I've found is in Banyan, but that requires
compilation, and most of my users are on Windows, so I try to restrict my
dependencies to the ones they can easily download from Gohlke's site.

Right now I'm thinking of coding my own interval tree in pure Python, but
before I do that, is there an existing way of doing this with Scipy I am
missing?

Thanks,
James

From njs at pobox.com Fri Dec 13 14:23:56 2013
From: njs at pobox.com (Nathaniel Smith)
Date: Fri, 13 Dec 2013 11:23:56 -0800
Subject: Re: [SciPy-User] Scipy.spatial: Interval data structure?
Message-ID: 

On Fri, Dec 13, 2013 at 10:47 AM, James Anderson wrote:
> I'm looking for a reasonably fast interval tree structure in Python, and
> scipy.spatial seems like the right place for this structure to live.

I don't know of any implementation in scipy. bx-python has an interval
tree implementation, that I think has a pure-Python version.

For a hack implementing an interval-tree-like structure using the stdlib
sqlite module, there's also this trick:
http://www.logarithmic.net/pfh/blog/01235197474
which is what rERPy uses:
https://github.com/rerpy/rerpy/blob/master/rerpy/events.py
But not sure this is the most useful approach unless you need some of the
other advantages of sqlite.
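Before reaching for a tree at all, both queries have a trivial O(n)
baseline that is handy as a correctness reference for any fancier
structure (a sketch, assuming closed intervals):

def intersecting_point(intervals, x):
    # all (lo, hi) intervals containing the point x
    return [(lo, hi) for (lo, hi) in intervals if lo <= x <= hi]

def intersecting_interval(intervals, lo2, hi2):
    # two closed intervals overlap iff each one starts before the other ends
    return [(lo, hi) for (lo, hi) in intervals if lo <= hi2 and lo2 <= hi]

intervals = [(0, 5), (3, 9), (10, 12)]
print(intersecting_point(intervals, 4))         # [(0, 5), (3, 9)]
print(intersecting_interval(intervals, 8, 11))  # [(3, 9), (10, 12)]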
Before spending a lot of effort on writing new code you might see if cgohlke would be willing to add banyan to his list, or if the banyan developers are interested in distributing .whl's for windows... -n From James.R.Anderson at utah.edu Fri Dec 13 14:31:37 2013 From: James.R.Anderson at utah.edu (James Anderson) Date: Fri, 13 Dec 2013 19:31:37 +0000 Subject: [SciPy-User] Scipy.spatial: Interval data structure? In-Reply-To: References: <945D27AF3E34704989FA4D330E11A2AE310FCD42@X-MB6.xds.umail.utah.edu> Message-ID: <945D27AF3E34704989FA4D330E11A2AE310FCD82@X-MB6.xds.umail.utah.edu> Thanks, I may have found a solution with "rtree" on Gohlke's site. It allows searching for ranges in 2D and 3D. The downside is not having a Python 3 build. However it has the exact functionality I needed. I'm testing it out now. -----Original Message----- From: scipy-user-bounces at scipy.org [mailto:scipy-user-bounces at scipy.org] On Behalf Of Nathaniel Smith Sent: Friday, December 13, 2013 11:24 AM To: SciPy Users List Subject: Re: [SciPy-User] Scipy.spatial: Interval data structure? On Fri, Dec 13, 2013 at 10:47 AM, James Anderson wrote: > I'm looking for a reasonably fast interval tree structure in Python > and Scipy.Spatial seems like the right place for this structure to > live. I need the basic queries of "All intervals intersecting a > point" and "All intervals intersecting an interval" > > The only other interval tree I've found is in Banyan, but that > requires compilation and most of my users are on Windows so I try to > restrict my dependencies to the ones they can easily download from Gohlke's site. > > Right now I'm thinking of coding my own interval tree in pure Python, > but before I do that is there an existing way of doing this with Scipy > I am missing? I don't know of any implementation in scipy. bx-python has an interval tree implementation, that I think has a pure-Python version. For a hack implementing an interval-tree-like structure using the stdlib sqlite module, there's also this trick: http://www.logarithmic.net/pfh/blog/01235197474 which is what rERPy uses: https://github.com/rerpy/rerpy/blob/master/rerpy/events.py But not sure this is the most useful approach unless you need some of the other advantages of sqlite. Before spending a lot of effort on writing new code you might see if cgohlke would be willing to add banyan to his list, or if the banyan developers are interested in distributing .whl's for windows... -n _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user From tmp50 at ukr.net Sun Dec 15 06:51:01 2013 From: tmp50 at ukr.net (Dmitrey) Date: Sun, 15 Dec 2013 13:51:01 +0200 Subject: [SciPy-User] [ANN] OpenOpt suite v 0.52 Message-ID: <1387108021.728479007.a89e227c@frv46.ukr.net> Hi all, I'm glad to inform you about the new OpenOpt Suite release 0.52 (2013-Dec-15): - Minor interalg speedup - oofun expression - MATLAB solvers fmincon and fsolve have been connected - Several MATLAB ODE solvers have been connected - New ODE solvers, parameters abstol and reltol - New GLP solver: direct - Some minor bugfixes and improvements Regards, D. -------------- next part -------------- An HTML attachment was scrubbed... URL: From hong at topbug.net Mon Dec 16 04:47:19 2013 From: hong at topbug.net (Hong Xu) Date: Mon, 16 Dec 2013 01:47:19 -0800 Subject: [SciPy-User] Error bound of scipy.linalg.eigh?
Message-ID: <52AECC27.8030709@topbug.net> Hi all, It seems that there is no parameter to set the error bound for the scipy.linalg.eigh routine. Does anyone have any idea on this? Thanks! Hong From pav at iki.fi Mon Dec 16 14:04:26 2013 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 16 Dec 2013 21:04:26 +0200 Subject: [SciPy-User] Error bound of scipy.linalg.eigh? In-Reply-To: <52AECC27.8030709@topbug.net> References: <52AECC27.8030709@topbug.net> Message-ID: 16.12.2013 11:47, Hong Xu wrote: > It seems that there is no parameter to set the error bound for the > scipy.linalg.eigh routine. Does anyone have any idea on this? LAPACK usually determines the error bounds by itself. You can call the underlying LAPACK routines via scipy.linalg.lapack, if you really need to set some of the parameters. -- Pauli Virtanen From jrocher at enthought.com Mon Dec 16 20:46:53 2013 From: jrocher at enthought.com (Jonathan Rocher) Date: Mon, 16 Dec 2013 19:46:53 -0600 Subject: [SciPy-User] [ANN] Release of ETS 4.4: Traits, Chaco, and more... Message-ID: [Apologies for the cross-post] Dear fellow developers, Enthought is pleased to announce the release of multiple major projects of ETS: - Traits 4.4.0, - Chaco 4.4.1, - TraitsUI 4.4.0, - Envisage 4.4.0, - Pyface 4.4.0, - Codetools 4.2.0, - ETS 4.4.1 These packages are at the core of the Enthought Tool Suite (ETS, http://code.enthought.com/projects), a collection of free, open-source components developed by Enthought and our partners to construct custom scientific applications. ETS includes a wide variety of components, including: - an extensible application framework (Envisage) - application building blocks (Traits, TraitsUI, Enaml, Pyface, Codetools) - 2-D and 3-D graphics libraries (Chaco, Mayavi, Enable) - scientific and math libraries (Scimath) - developer tools (Apptools) You can install any of these packages from Canopy's package manager, Canopy's (or EPD's) enpkg command, PyPI (using pip or easy_install), or build them from source code on github. For more details about installation, see the ETS installation page. *Contributors* =========== This set of releases was a 9-month effort of all Enthought developers as well as: - Yves Delley - Pieter Aarnoutse - Jordan Ilott - Matthieu Dartiailh - Ian Delaney - Gregor Thalhammer Many thanks to them! *General release notes* =================== 1. The major new feature in this Traits release is a new adaptation mechanism in the ``traits.adaptation`` package. The new mechanism is intended to replace the older traits.protocols package. Code written against ``traits.protocols`` will continue to work, although the ``traits.protocols`` API has been deprecated, and a warning will be logged on first use of ``traits.protocols``. See the 'Advanced Topics' section of the user manual for more details. 2. These new releases of TraitsUI, Envisage, Pyface and Codetools include an update to this new adaptation mechanism. 3. All ETS projects are now on TravisCI, making it easier to contribute to them. 4. As of this release, the only Python versions that are actively supported are 2.6 and 2.7. As we are moving toward future-proofing ETS, more code that supported Python 2.5 will be removed in the coming months. 5. We will retire chaco-users at enthought.com since it is lightly used, and we now recommend that all Chaco users send questions, requests and comments to enthought-dev at enthought.com or to StackOverflow (tag "enthought" and possibly "chaco"). More details about the release of each project are given below.
Please see the CHANGES.txt file inside each project for full details of the changes. Happy coding! The ETS developers *Specific release notes* =================== Traits 4.4.0 release notes --------------------------------- The Traits library enhances Python by adding optional type-checking and an event notification system, making it an ideal platform for writing data-driven applications. It forms the foundation of the Enthought Tool Suite. In addition to the above-mentioned rework of the adaptation mechanism, the release also includes improved support for using Cython with `HasTraits` classes, some new helper utilities for writing unit tests for Traits events, and a variety of bug fixes, stability enhancements, and internal code improvements. Chaco 4.4.1 release notes ----------------------------------- Chaco is a Python package for building efficient, interactive and custom 2-D plots and visualizations. While Chaco generates attractive static plots, it works particularly well for interactive data visualization and exploration. This release introduces many improvements and bug fixes, including fixes to the generation of image files from plots, improvements to the ArrayPlotData to change multiple arrays at a time, and improvements to multiple elements of the plots such as tick labels and text overlays. TraitsUI 4.4.0 release notes ------------------------------------ The TraitsUI project contains a toolkit-independent GUI abstraction layer, which is used to support the "visualization" features of the Traits package. TraitsUI allows developers to write against the TraitsUI API (views, items, editors, etc.), and let TraitsUI and the selected toolkit and back-end take care of the details of displaying them. In addition to the above-mentioned update to the new Traits 4.4.0 adaptation mechanism, there have also been a number of improvements to drag and drop support for the Qt backend and some modernization of the use of WxPython to support Wx 2.9. This release also includes a number of bug-fixes and minor functionality enhancements. Envisage 4.4.0 release notes -------------------------------------- Envisage is a Python-based framework for building extensible applications, providing a standard mechanism for features to be added to an application, whether by the original developer or by someone else. In addition to the above-mentioned update to the new Traits 4.4.0 adaptation mechanism, this release also adds a new method to retrieve a service that is required by the application and provides documentation and test updates. Pyface 4.4.0 release notes ----------------------------------- The pyface project provides a toolkit-independent library of Traits-aware widgets and GUI components, which are used to support the "visualization" features of Traits. The biggest change in this release is support for the new adaptation mechanism in Traits 4.4.0. This release also includes Tasks support for Enaml 0.8 and a number of other minor changes, improvements and bug-fixes. Codetools release notes ------------------------------- The codetools project includes packages that simplify meta-programming and help the programmer separate data from code in Python. This library provides classes for performing dependency-analysis on blocks of Python code, and Traits-enhanced execution contexts that can be used as execution namespaces. 
In addition to the above-mentioned update to the new Traits 4.4.0 adaptation mechanism, this release also includes a number of modernizations of the code base, including the consistent use of absolute imports, and a new execution manager for deferring events from Contexts. -- Jonathan Rocher, PhD Scientific software developer Enthought, Inc. jrocher at enthought.com 1-512-536-1057 http://www.enthought.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.c.dixon at leeds.ac.uk Wed Dec 18 07:44:55 2013 From: m.c.dixon at leeds.ac.uk (Mark Dixon) Date: Wed, 18 Dec 2013 12:44:55 +0000 (GMT) Subject: [SciPy-User] scipy and python 3.3 Message-ID: Hi, I'm dipping my toe in python3-land and trying to build a few packages on a 64-bit Intel CentOS 6.5 box, but I have problems getting scipy 0.13.2 to pass some of its tests. Should I be opening tickets about these, or am I being really dumb? * python 3.3.2 * nose 1.3.0 * numpy 1.8.0 (with PTATLAS=None) * atlas 3.10.1 (made into a full LAPACK with netlib LAPACK 3.4.2) * gcc 4.4.7 (as shipped with CentOS 6.5) (I've built each of these from source, apart from gcc) When I run scipy's tests, I get 3 failures in scipy.special (full output with verbose=2 appended below): test_basic.test_xlogy test_lambertw.test_values test_lambertw.test_ufunc I don't get any failures with the same stack built against python 2.7.6. numpy passes its tests with either stack. Any pointers would be greatly appreciated, please! Cheers, Mark ====================================================================== FAIL: test_basic.test_xlogy ---------------------------------------------------------------------- Traceback (most recent call last): File "/scratch/bob/p3/pylibs/lib/python3.3/site-packages/nose-1.3.0-py3.3.egg/nose/case.py", line 198, in runTest self.test(*self.arg) File "/scratch/bob/p3/pylibs/lib/python3.3/site-packages/scipy/special/tests/test_basic.py", line 2736, in test_xlogy assert_func_equal(special.xlogy, w2, z2, rtol=1e-13, atol=1e-13) File "/scratch/bob/p3/pylibs/lib/python3.3/site-packages/scipy/special/_testutils.py", line 87, in assert_func_equal fdata.check() File "/scratch/bob/p3/pylibs/lib/python3.3/site-packages/scipy/special/_testutils.py", line 292, in check assert_(False, "\n".join(msg)) File "/scratch/bob/p3/pylibs/lib/python3.3/site-packages/numpy/testing/utils.py", line 44, in assert_ raise AssertionError(msg) AssertionError: Max |adiff|: 712.557 Max |rdiff|: 1028 Bad results (3 out of 6) for the following points (in output 0): 0j (nan+0j) => (-0+0j) != (nan+nanj) (rdiff 0.0) (1+0j) (2+0j) => (-711.8625285635226+1.5707963267948752j) != (0.6931471805599453+0j) (rdiff 1028.0030375952847) (1+0j) 1j => (-711.8625285635226+1.5707963267948752j) != 1.5707963267948966j (rdiff 453.18576089112065) ====================================================================== FAIL: test_lambertw.test_values ---------------------------------------------------------------------- Traceback (most recent call last): File "/scratch/bob/p3/pylibs/lib/python3.3/site-packages/nose-1.3.0-py3.3.egg/nose/case.py", line 198, in runTest self.test(*self.arg) File "/scratch/bob/p3/pylibs/lib/python3.3/site-packages/scipy/special/tests/test_lambertw.py", line 21, in test_values assert_equal(lambertw(inf,1).real, inf) File "/scratch/bob/p3/pylibs/lib/python3.3/site-packages/numpy/testing/utils.py", line 304, in assert_equal raise AssertionError(msg) AssertionError: Items are not equal: ACTUAL: nan DESIRED: inf
====================================================================== FAIL: test_lambertw.test_ufunc ---------------------------------------------------------------------- Traceback (most recent call last): File "/scratch/bob/p3/pylibs/lib/python3.3/site-packages/numpy/testing/utils.py", line 581, in chk_same_position assert_array_equal(x_id, y_id) File "/scratch/bob/p3/pylibs/lib/python3.3/site-packages/numpy/testing/utils.py", line 718, in assert_array_equal verbose=verbose, header='Arrays are not equal') File "/scratch/bob/p3/pylibs/lib/python3.3/site-packages/numpy/testing/utils.py", line 644, in assert_array_compare raise AssertionError(msg) AssertionError: Arrays are not equal (mismatch 66.66666666666666%) x: array([False, True, True], dtype=bool) y: array([False, False, False], dtype=bool) During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/scratch/bob/p3/pylibs/lib/python3.3/site-packages/nose-1.3.0-py3.3.egg/nose/case.py", line 198, in runTest self.test(*self.arg) File "/scratch/bob/p3/pylibs/lib/python3.3/site-packages/scipy/special/tests/test_lambertw.py", line 93, in test_ufunc lambertw(r_[0., e, 1.]), r_[0., 1., 0.567143290409783873]) File "/scratch/bob/p3/pylibs/lib/python3.3/site-packages/numpy/testing/utils.py", line 811, in assert_array_almost_equal header=('Arrays are not almost equal to %d decimals' % decimal)) File "/scratch/bob/p3/pylibs/lib/python3.3/site-packages/numpy/testing/utils.py", line 607, in assert_array_compare chk_same_position(x_isnan, y_isnan, hasval='nan') File "/scratch/bob/p3/pylibs/lib/python3.3/site-packages/numpy/testing/utils.py", line 587, in chk_same_position raise AssertionError(msg) AssertionError: Arrays are not almost equal to 6 decimals x and y nan location mismatch: x: array([ 0.+0.j, nan+0.j, nan+0.j]) y: array([ 0. , 1. , 0.567]) ---------------------------------------------------------------------- From parrenin at ujf-grenoble.fr Thu Dec 19 04:55:46 2013 From: parrenin at ujf-grenoble.fr (=?ISO-8859-1?Q?Fr=E9d=E9ric_Parrenin?=) Date: Thu, 19 Dec 2013 10:55:46 +0100 Subject: [SciPy-User] leastsq and multiprocessing Message-ID: Dear all, Following these posts: http://stackoverflow.com/questions/10489134/multithreaded-calls-to-the-objective-function-of-scipy-optimize-leastsq It seems it is possible to make leastsq take advantage of multiple processors. I was wondering: given that the tendency of processors is to have more and more cores nowadays, why is this not done by default in leastsq? Best regards, Frédéric Parrenin -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Thu Dec 19 07:24:54 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 19 Dec 2013 07:24:54 -0500 Subject: [SciPy-User] leastsq and multiprocessing In-Reply-To: References: Message-ID: On Thu, Dec 19, 2013 at 4:55 AM, Frédéric Parrenin wrote: > Dear all, > > Following these posts: > > http://stackoverflow.com/questions/10489134/multithreaded-calls-to-the-objective-function-of-scipy-optimize-leastsq > It seems it is possible to make leastsq take advantage of multiple processors. > > I was wondering: given that the tendency of processors is to have more and > more cores nowadays, why is this not done by default in leastsq? > I think parallelizing leastsq would almost always be the wrong place to parallelize.
Even the loop over j that Pauli mentions is in the user function, and leastsq cannot assume that this works, since there are many applications where the calculations for different j's are not independent of each other. Using parallelization in the wrong spot can hurt performance instead of improving it. https://groups.google.com/d/msg/pystatsmodels/3X1LlY9U3Yc/7FDXWEADBUIJ Josef > > Best regards, > > Frédéric Parrenin > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From newville at cars.uchicago.edu Thu Dec 19 12:38:48 2013 From: newville at cars.uchicago.edu (Matt Newville) Date: Thu, 19 Dec 2013 11:38:48 -0600 Subject: [SciPy-User] leastsq and multiprocessing In-Reply-To: References: Message-ID: On Thu, Dec 19, 2013 at 6:24 AM, wrote: > > > > On Thu, Dec 19, 2013 at 4:55 AM, Frédéric Parrenin > wrote: >> >> Dear all, >> >> Following these posts: >> >> http://stackoverflow.com/questions/10489134/multithreaded-calls-to-the-objective-function-of-scipy-optimize-leastsq >> It seems it is possible to make leastsq take advantage of multiple processors. >> >> I was wondering: given that the tendency of processors is to have more and >> more cores nowadays, why is this not done by default in leastsq? > > > I think parallelizing leastsq would almost always be the wrong place to > parallelize. I am slightly reluctant to speak up, but I think this may not be true. For calls to leastsq() with finite-difference Jacobians, MINPACK's lmdif() calls fdjac2() in each iteration. This subroutine then calls the user's objective function N times (for N variables) in a simple loop with slightly different values for the variables. Although these calls share a work array this is an implementation detail and the array elements per variable are actually independent. This loop of N calls to the objective function per iteration would be a good candidate for a multiprocessing pool, and doing so could give a substantial speedup for problems with more than a couple variables and where the calculation of the objective function is the bottleneck (which is typical for all but simple examples). Currently, scipy's leastsq() simply calls the Fortran lmdif() (for finite-diff Jacobian). I think replacing fdjac2() with a multiprocessing version would require reimplementing both lmdif() and fdjac2(), probably using cython. If calls to MINPACK's lmpar() and qrfac() could be left untouched, this translation does not look too insane -- the two routines lmdif() and fdjac2() themselves are not that complicated. It would be a fair amount of work, and I cannot volunteer to do this myself any time soon. But, I do think it actually would improve the speed of leastsq() for many use cases. Hoping this will inspire someone..... --Matt From jeremy at jeremysanders.net Fri Dec 20 07:43:47 2013 From: jeremy at jeremysanders.net (Jeremy Sanders) Date: Fri, 20 Dec 2013 13:43:47 +0100 Subject: [SciPy-User] leastsq and multiprocessing References: Message-ID: Matt Newville wrote: > Currently, scipy's leastsq() simply calls the Fortran lmdif() (for > finite-diff Jacobian). I think replacing fdjac2() with a > multiprocessing version would require reimplementing both lmdif() and > fdjac2(), probably using cython.
If calls to MINPACK's lmpar() and > qrfac() could be left untouched, this translation does not look too > insane -- the two routines lmdif() and fdjac2() themselves are not > that complicated. It would be a fair amount of work, and I cannot > volunteer to do this myself any time soon. But, I do think it > actually would improve the speed of leastsq() for many use cases. Computing the Jacobian using multiprocessing definitely helps the speed. I wrote the unrated answer (xioxox) there which shows how to do it in Python. Jeremy From newville at cars.uchicago.edu Fri Dec 20 08:09:00 2013 From: newville at cars.uchicago.edu (Matt Newville) Date: Fri, 20 Dec 2013 07:09:00 -0600 Subject: [SciPy-User] leastsq and multiprocessing In-Reply-To: References: Message-ID: Jeremy, On Fri, Dec 20, 2013 at 6:43 AM, Jeremy Sanders wrote: > Matt Newville wrote: > >> Currently, scipy's leastsq() simply calls the Fortran lmdif() (for >> finite-diff Jacobian). I think replacing fdjac2() with a >> multiprocessing version would require reimplementing both lmdif() and >> fdjac2(), probably using cython. If calls to MINPACK's lmpar() and >> qrfac() could be left untouched, this translation does not look too >> insane -- the two routines lmdif() and fdjac2() themselves are not >> that complicated. It would be a fair amount of work, and I cannot >> volunteer to do this myself any time soon. But, I do think it >> actually would improve the speed of leastsq() for many use cases. > > Computing the Jacobian using multiprocessing definitely helps the > speed. I wrote the unrated answer (xioxox) there which shows how to do it in > Python. > > Jeremy > Sorry, I hadn't read the stackoverflow discussion carefully enough. You're right that this is the same basic approach, and your suggestion is much easier to implement. I think having helper functions to automatically provide this functionality would be really great. -- --Matt From schut at sarvision.nl Fri Dec 20 08:46:55 2013 From: schut at sarvision.nl (Vincent Schut) Date: Fri, 20 Dec 2013 14:46:55 +0100 Subject: [SciPy-User] leastsq and multiprocessing In-Reply-To: References: Message-ID: On 12/19/2013 10:55 AM, Frédéric Parrenin wrote: > Dear all, > > Following these posts: > http://stackoverflow.com/questions/10489134/multithreaded-calls-to-the-objective-function-of-scipy-optimize-leastsq > It seems it is possible to make leastsq take advantage of multiple processors. > > I was wondering: given that the tendency of processors is to have more > and more cores nowadays, why is this not done by default in leastsq? > > Best regards, > > Frédéric Parrenin > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > Folks, whatever the outcome of the discussion, please always make multithreading/-processing a configurable option. Like: I want to be able to turn it off and have numpy/scipy use only 1 thread/core. Most of my programs already use multicore processing, in which case having each process doing 'second level' multicore stuff internally would be very counterproductive. Like, I have 48 cores, and thus 48 subprocesses of my program spawned doing calculations. When each of those also tries to spawn 48 threads/processes to optimize some leastsq problem, in the worst case I'll have 2304 (48*48) threads fighting for 48 CPUs... E.g. I also always have the num threads for openblas set to 1. Best, Vincent.
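Putting the two points in this thread together -- parallelize the finite-difference Jacobian yourself via Dfun, and keep the worker count an explicit knob so it can always be forced back to 1 -- might look roughly like this. A minimal sketch only: residuals() is a toy placeholder and the pool handling is deliberately simple, not a drop-in implementation.

import numpy as np
from multiprocessing import Pool
from scipy.optimize import leastsq

x = np.linspace(0.0, 1.0, 1000)

def residuals(p):
    # stand-in for an expensive objective function
    return np.exp(-p[0] * x) + p[1] - np.cos(x)

def _shifted(args):
    # evaluate the residuals with one parameter bumped by h
    p, i, h = args
    q = np.array(p, dtype=float)
    q[i] += h
    return residuals(q)

def make_jacobian(nprocs=1, h=1e-8):
    pool = Pool(nprocs) if nprocs > 1 else None
    def jac(p, *args):
        f0 = residuals(p)
        tasks = [(p, i, h) for i in range(len(p))]
        cols = pool.map(_shifted, tasks) if pool else [_shifted(t) for t in tasks]
        # shape (m, n), which is what leastsq expects with the default col_deriv=0
        return np.column_stack([(fi - f0) / h for fi in cols])
    return jac

if __name__ == '__main__':
    popt, ier = leastsq(residuals, [1.0, 0.5], Dfun=make_jacobian(nprocs=4))

With nprocs=1 no pool is created at all, so a program that already parallelizes at a higher level loses nothing.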
From matt at plot.ly Fri Dec 20 19:07:27 2013 From: matt at plot.ly (Matt Sundquist) Date: Fri, 20 Dec 2013 16:07:27 -0800 Subject: [SciPy-User] Plotly Beta: web-based, publication-quality graphing for Python and IPython Message-ID: Hi SciPy users, My name is Matt, and I'm part of Plotly. We're working on a scientific graphing library for Python (fork here) that allows you to create interactive, web-based graphs in IPython and your browser. Our gallery of Notebooks is here. We would love to be a useful resource for the SciPy community. As we're quite new (just a few months into our beta), we benefit from and very much appreciate help and expert feedback. We would love your opinions, advice, and thoughts. Plotly lets you style interactive, publication-quality graphs. You can make Plotly graphs with NumPy, pandas, Datetime, and LaTeX. Plotly has bubble charts, box plots, line charts, scatter plots, histograms, 2D histograms, and heatmaps. Plotly supports log axes, error bars, date axes, multiple axes, and subplots. The gallery, here, has examples. You can edit with code or the GUI in Plotly, share a graph online, via a download, or as part of a NB. You can also embed with an iframe (Washington Post example). Data and graphs live together, and can be shared with collaborators (like a Google Doc). For an example, here is a graph made with Python, styled with this NB. Plotly is set up like GitHub. You control privacy and sharing. It's free for public use, you can fork the APIs (and we welcome pull requests), and has a premium subscription for heavy private use. Thanks a bunch. It would mean a lot to hear your thoughts, advice, and feedback. All my best, Matt -------------- next part -------------- An HTML attachment was scrubbed... URL: From flying-sheep at web.de Sat Dec 21 07:07:43 2013 From: flying-sheep at web.de (Philipp A.) Date: Sat, 21 Dec 2013 13:07:43 +0100 Subject: [SciPy-User] Plotly Beta: web-based, publication-quality graphing for Python and IPython In-Reply-To: References: Message-ID: looks very interesting and useful in combination with ipython notebooks, thanks! 2013/12/21 Matt Sundquist > Hi SciPy users, > > My name is Matt, and I'm part of Plotly. We're working > on a scientific graphing library for Python > (fork here) that allows you to > create interactive, web-based graphs in IPython and your browser. Our > gallery of Notebooks is here. > > We would love to be a useful resource for the SciPy community. As we're > quite new (just a few months into our beta), we benefit from and very much > appreciate help and expert feedback. We would love your opinions, advice, > and thoughts. > > Plotly lets you style interactive, publication-quality graphs. You can > make Plotly graphs with NumPy, pandas, Datetime, and LaTeX. Plotly has > bubble charts, box plots, line charts, scatter plots, histograms, 2D > histograms, and heatmaps. Plotly supports log axes, error bars, date axes, > multiple axes, and subplots. The gallery, here, > has examples. > > You can edit with code or the GUI in Plotly, share a graph online, via a > download, or as part of a NB. You can also embed with an iframe (Washington > Post example). > > Data and graphs live together, and can be shared with collaborators (like > a Google Doc). For an example, here is a graph made > with Python, styled with this NB. > > > Plotly is set up like GitHub. You control privacy and sharing.
It's free > for public use, you can fork the APIs (and we welcome pull requests), and > has a premium subscription for heavy private use. > > Thanks a bunch. It would mean a lot to hear your thoughts, advice, and > feedback. > > All my best, > Matt > > > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From laughingrice at gmail.com Sun Dec 22 11:59:25 2013 From: laughingrice at gmail.com (laughingrice) Date: Sun, 22 Dec 2013 08:59:25 -0800 (PST) Subject: [SciPy-User] Problems with weave under windows Message-ID: <1387731565361-19020.post@n7.nabble.com> I've been fighting with getting weave working under windows 8.1 using Canopy (scipy 0.13.2-1) Turned out that compilation errors were Microsoft complaining that the command line is too long. Changing line 95 in scipy/weave/catalog.py from return base + sha256(expr).hexdigest() to return base + sha256(expr).hexdigest()[:-30] or doing the same in line 126 of scipy/weave/platform_info.py chk_sum = check_sum(exe_path) to chk_sum = check_sum(exe_path)[:-30] Solved the problem for me (a combination of them also worked removing fewer characters in each, although this would depend on user name length as well) This is with both visual studio 2010 and 2012 (had to set VS90COMNTOOLS to point to either VS100COMNTOOLS or VS110COMNTOOLS for weave to find vcvarsall.bat as well). Anyone else see this problem and have a better solution? Thanks -- View this message in context: http://scipy-user.10969.n7.nabble.com/Problems-with-weave-under-windows-tp19020.html Sent from the Scipy-User mailing list archive at Nabble.com. From ralf.gommers at gmail.com Mon Dec 23 10:31:29 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 23 Dec 2013 16:31:29 +0100 Subject: [SciPy-User] scipy and python 3.3 In-Reply-To: References: Message-ID: On Wed, Dec 18, 2013 at 1:44 PM, Mark Dixon wrote: > Hi, > > I'm dipping my toe in python3-land and trying to build a few packages on a > 64-bit Intel CentOS 6.5 box, but I have problems getting scipy 0.13.2 to > pass some of its tests. > > Should I be opening tickets about these, or am I being really dumb? > > * python 3.3.2 > * nose 1.3.0 > * numpy 1.8.0 (with PTATLAS=None) > * atlas 3.10.1 (made into a full LAPACK with netlib LAPACK 3.4.2) > * gcc 4.4.7 (as shipped with CentOS 6.5) > > (I've built each of these from source, apart from gcc) > > When I run scipy's tests, I get 3 failures in scipy.special (full output with verbose=2 > appended below): > > test_basic.test_xlogy > test_lambertw.test_values > test_lambertw.test_ufunc > > I don't get any failures with the same stack built against python 2.7.6. > numpy passes its tests with either stack. > > Any pointers would be greatly appreciated, please! > Hi Mark, I've seen similar failures before but can't reproduce them with python 3.3 or 3.4 right now. Probably compiler or atlas-version specific. Would be useful if you opened a ticket for these. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.c.dixon at leeds.ac.uk Mon Dec 23 10:38:46 2013 From: m.c.dixon at leeds.ac.uk (Mark Dixon) Date: Mon, 23 Dec 2013 15:38:46 +0000 (GMT) Subject: [SciPy-User] scipy and python 3.3 In-Reply-To: References: Message-ID: On Mon, 23 Dec 2013, Ralf Gommers wrote: ...
> Hi Mark, I've seen similar failures before but can't reproduce them with > python 3.3 or 3.4 right now. Probably compiler or atlas-version > specific. Would be useful if you opened a ticket for these. ... Hi Ralf, Thanks for that:- I'll open a ticket after we reopen in the new year :) All the best, Mark From juanlu001 at gmail.com Mon Dec 23 13:46:53 2013 From: juanlu001 at gmail.com (Juan Luis Cano) Date: Mon, 23 Dec 2013 19:46:53 +0100 Subject: [SciPy-User] setuptools messing with sdists using numpy.distutils and Fortran libraries Message-ID: <52B8851D.8010004@gmail.com> I'm trying to build a Python package using some Fortran libraries, and also started using setuptools at a certain point because I wanted to take advantage of "setup.py develop". However, despite it being stated that one should just import setuptools before numpy.distutils imports: http://mail.scipy.org/pipermail/numpy-discussion/2013-September/067784.html I found that "setup.py sdist" works differently depending on whether setuptools has been imported or not - i.e. not the same files get excluded or included. In particular, I had problems with the .pyf files of a Fortran library I created, included using config.add_extension. Without setuptools it works, and the .pyf files get included, but with setuptools they are missing. This results in failed installations later on. In case you want the specific example, here is the diff between the working and non-working setup.py: https://github.com/Pybonacci/poliastro/compare/0.1.x...master#diff-29 The use of setuptools is not crucial for this project, but I'm interested in knowing what's going on here. Thanks in advance! From ralf.gommers at gmail.com Mon Dec 23 15:39:13 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 23 Dec 2013 21:39:13 +0100 Subject: [SciPy-User] setuptools messing with sdists using numpy.distutils and Fortran libraries In-Reply-To: <52B8851D.8010004@gmail.com> References: <52B8851D.8010004@gmail.com> Message-ID: On Mon, Dec 23, 2013 at 7:46 PM, Juan Luis Cano wrote: > I'm trying to build a Python package using some Fortran libraries, and > also started using setuptools at a certain point because I wanted to > take advantage of "setup.py develop". However, despite it being stated that > one should just import setuptools before numpy.distutils imports: > > http://mail.scipy.org/pipermail/numpy-discussion/2013-September/067784.html > > I found that "setup.py sdist" works differently depending on whether setuptools has > been imported or not - i.e. not the same files get excluded or included. > > In particular, I had problems with the .pyf files of a Fortran library I > created, included using config.add_extension. Without setuptools it > works, and the .pyf files get included, but with setuptools they are > missing. This results in failed installations later on. > You could special-case "develop" in this manner to not have setuptools mess up other commands: https://github.com/scipy/scipy/blob/master/setup.py#L205 > In case you want the specific example, here is the diff between the > working and non-working setup.py: > > https://github.com/Pybonacci/poliastro/compare/0.1.x...master#diff-29 > > The use of setuptools is not crucial for this project, but I'm > interested in knowing what's going on here. > Both setuptools and numpy.distutils monkeypatch the behavior of distutils commands, so there's very little logic to what happens. Different commands typically break in different ways.
You can try to debug it to understand but if you value your sanity, you should try to avoid that :) Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From juanlu001 at gmail.com Mon Dec 23 16:16:14 2013 From: juanlu001 at gmail.com (Juan Luis Cano) Date: Mon, 23 Dec 2013 22:16:14 +0100 Subject: [SciPy-User] setuptools messing with sdists using numpy.distutils and Fortran libraries In-Reply-To: References: <52B8851D.8010004@gmail.com> Message-ID: <52B8A81E.9030803@gmail.com> On 12/23/2013 09:39 PM, Ralf Gommers wrote: > > > > On Mon, Dec 23, 2013 at 7:46 PM, Juan Luis Cano > wrote: > > I'm trying to build a Python package using some Fortran libraries, and > also started using setuptools at a certain point because I wanted to > take advantage of "setup.py develop". However, despite it being > stated that > one should just import setuptools before numpy.distutils imports: > > http://mail.scipy.org/pipermail/numpy-discussion/2013-September/067784.html > > I found that "setup.py sdist" works differently depending on whether setuptools has > been imported or not - i.e. not the same files get excluded or > included. > > In particular, I had problems with the .pyf files of a Fortran > library I > created, included using config.add_extension. Without setuptools it > works, and the .pyf files get included, but with setuptools they are > missing. This results in failed installations later on. > > > You could special-case "develop" in this manner to not have setuptools > mess up other commands: > https://github.com/scipy/scipy/blob/master/setup.py#L205 > > In case you want the specific example, here is the diff between the > working and non-working setup.py: > > https://github.com/Pybonacci/poliastro/compare/0.1.x...master#diff-29 > > The use of setuptools is not crucial for this project, but I'm > interested in knowing what's going on here. > > > Both setuptools and numpy.distutils monkeypatch the behavior of > distutils commands, so there's very little logic to what happens. > Different commands typically break in different ways. You can try to > debug it to understand but if you value your sanity, you should try to > avoid that :) Yeah, at least for today I value it :) Thank you very much! Cheers Juan Luis -------------- next part -------------- An HTML attachment was scrubbed... URL: From takowl at gmail.com Tue Dec 24 12:18:07 2013 From: takowl at gmail.com (Thomas Kluyver) Date: Tue, 24 Dec 2013 17:18:07 +0000 Subject: [SciPy-User] setuptools messing with sdists using numpy.distutils and Fortran libraries In-Reply-To: <52B8851D.8010004@gmail.com> References: <52B8851D.8010004@gmail.com> Message-ID: On 23 December 2013 18:46, Juan Luis Cano wrote: > and also started using setuptools at a certain point because I wanted to > take advantage of "setup.py develop". > For IPython, we actually went down the rabbit hole and added a 'symlink' command which is like 'develop', but doesn't use setuptools. We'd found that using 'develop' with Python 2 and 3 at the same time created conflicts in the entry points, so we rewrote it to use simple launcher scripts. The scripts are installed normally; only the package is symlinked into site-packages. If you install after symlinking, though, it tries to install into the source tree, so we now have an 'unsymlink' command as well.
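The rough shape of the idea, for anyone curious, is just this (a sketch of the mechanism, not IPython's actual code; 'mypackage' is a hypothetical package directory):

import os
import sysconfig

pkg = os.path.abspath('mypackage')
link = os.path.join(sysconfig.get_paths()['purelib'], 'mypackage')
if not os.path.exists(link):
    # edits under ./mypackage are now picked up on import, no reinstall needed
    os.symlink(pkg, link)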
The implementation is not supposed to be generic, but if anyone else wants to take a look, the code is here: https://github.com/ipython/ipython/blob/master/setupbase.py#L385 Thomas -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Dec 25 12:21:27 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 25 Dec 2013 12:21:27 -0500 Subject: [SciPy-User] setuptools messing with sdists using numpy.distutils and Fortran libraries In-Reply-To: <52B8851D.8010004@gmail.com> References: <52B8851D.8010004@gmail.com> Message-ID: On Mon, Dec 23, 2013 at 1:46 PM, Juan Luis Cano wrote: > I'm trying to build a Python package using some Fortran libraries, and > also started using setuptools at a certain point because I wanted to > take advantage of "setup.py develop". However, despite it being stated that > one should just import setuptools before numpy.distutils imports: > > http://mail.scipy.org/pipermail/numpy-discussion/2013-September/067784.html > > I found that "setup.py sdist" works differently depending on whether setuptools has > been imported or not - i.e. not the same files get excluded or included. > > In particular, I had problems with the .pyf files of a Fortran library I > created, included using config.add_extension. Without setuptools it > works, and the .pyf files get included, but with setuptools they are > missing. This results in failed installations later on. > I don't have much idea about including fortran.
> > But did you try to include *.pyf in MANIFEST.in? > That's in my experience often a source of missing or extra files being > included in the sdist or installed files > > Josef > I didn't, because I found it weird to specify these files in two different places (setup.py and MANIFEST.in). I just checked and it solves the issue. Good to know! Juan Luis -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Mon Dec 30 11:12:56 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 30 Dec 2013 17:12:56 +0100 Subject: [SciPy-User] Problems with weave under windows In-Reply-To: <1387731565361-19020.post@n7.nabble.com> References: <1387731565361-19020.post@n7.nabble.com> Message-ID: On Sun, Dec 22, 2013 at 5:59 PM, laughingrice wrote: > I've been fighting with getting weave working under windows 8.1 using > Canopy > (scipy 0.13.2-1) > Turned out that compilation errors were Microsoft complaining that the > command line is too long. > > Changing line 95 in scipy/weave/catalog.py from > return base + sha256(expr).hexdigest() > to > return base + sha256(expr).hexdigest()[:-30] > > or doing the same in line 126 of scipy/weave/platform_info.py > chk_sum = check_sum(exe_path) > to > chk_sum = check_sum(exe_path)[:-30] > > Solved the problem for me (a combination of them also worked removing fewer > characters in each, although this would depend on user name length as well) > > This is with both visual studio 2010 and 2012 (had to set VS90COMNTOOLS to > point to either VS100COMNTOOLS or VS110COMNTOOLS for weave to find > vcvarsall.bat as well). > > Anyone else see this problem and have a better solution? > Not yet. This function was changed in scipy 0.13.0, so it seems we broke something. Can you check what the length limit for the command is on your system? According to http://stackoverflow.com/questions/3205027/maximum-length-of-command-line-string it should be 2048 chars or more. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL:
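For anyone who wants to probe the effective limit on their own machine, something along these lines should do (a sketch only; the exact numbers and the failure mode will vary with the Windows version and how the process is launched):

import subprocess

for n in (2000, 8000, 32000, 40000):
    try:
        # 'rem' ignores its arguments, so only the command length matters here
        subprocess.check_call(['cmd', '/c', 'rem', 'x' * n])
        print(n, 'ok')
    except Exception as err:
        print(n, 'failed:', err)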