From aarchiba at physics.mcgill.ca Wed Jun 1 16:07:28 2011 From: aarchiba at physics.mcgill.ca (Anne Archibald) Date: Wed, 1 Jun 2011 16:07:28 -0400 Subject: [SciPy-User] efficient computation of point cloud nearest neighbors In-Reply-To: <4DE508E1.1000904@molden.no> References: <20110529181538.GA13056@phare.normalesup.org> <4DE508E1.1000904@molden.no> Message-ID: On 31 May 2011 11:27, Sturla Molden wrote: > Den 30.05.2011 20:20, skrev Anne Archibald: >> If this is not fast enough, it might be worth trying a two-tree query >> - that is, putting both the query points and the potential neighbours >> in kd-trees. Then there's an algorithm that saves a lot of tree >> traversal by using the spatial structure of the query points. (In this >> case the two trees are even the same.) Such an algorithm is even >> implemented, but unfortunately only in the pure python KDTree. If the >> OP really needs this to be fast, then the best thing to do would >> probably be to port KDTree.query_tree to cython. The algorithm is a >> little messy but not too complicated. > > In this case we just need one kd-tree. Instead of starting from the > root, we begin with the leaf containing the query point and work our way > downwards. We then find a better branching point from which to start > than the root. That is not messy at all :-) But we can sometimes do better - all the leaves in a leaf node will have very similar neighbour sets, for example, so in principle one can avoid traversing (part of) the tree once for each. I'm not sure how much speedup is really possible, though; since there are kn neighbours to be listed, you're never going to beat O(kn), and the simple query-everything approach is only O(kn log n) or so. > Another thing to note is that making the kd-tree is very fast whereas > searching it is slow. So using multiprocessing is an option. cKDTrees cannot currently be copied, but it would be simple to implement. This would save a bit of time when multiprocessing. That said, they are also immutable, so multiple threads/processes can happily operate on the same one. Anne > Sturla > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From sturla at molden.no Wed Jun 1 17:47:40 2011 From: sturla at molden.no (Sturla Molden) Date: Wed, 01 Jun 2011 23:47:40 +0200 Subject: [SciPy-User] efficient computation of point cloud nearest neighbors In-Reply-To: References: <20110529181538.GA13056@phare.normalesup.org> <4DE508E1.1000904@molden.no> Message-ID: <4DE6B37C.6050302@molden.no> Den 01.06.2011 22:07, skrev Anne Archibald: > cKDTrees cannot currently be copied, but it would be simple to > implement. This would save a bit of time when multiprocessing. There is not much to save here, constructing a kd-tree is very fast compared to searching it. At least it is in the common case of finding k nearest neightbours for each point in a cloud of n points. > That > said, they are also immutable, so multiple threads/processes can > happily operate on the same one. Shared memory do not have the same base address in different processes, so all pointers are invalid. Thus the kd-tree must be built with integer offsets instead of pointers, like my first Python version did. You'll get the same problem if you want to serialize a kd-tree: pointers must be saved as offsets. os.fork() will copy-on-write on Linux though. 
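(A minimal sketch of the simple "query everything" approach discussed in this thread: one cKDTree built over the whole cloud and queried once per point. This is the O(kn log n) baseline that the two-tree ideas above try to improve on; the array size and k are arbitrary here.)

import numpy as np
from scipy.spatial import cKDTree

pts = np.random.rand(100000, 3)    # the point cloud; size chosen for illustration
tree = cKDTree(pts)                # building the tree is cheap compared to querying it

# ask for k+1 neighbours because the nearest "neighbour" of each point is itself
dist, idx = tree.query(pts, k=6)
neighbours = idx[:, 1:]            # drop the self-match in column 0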
Sturla From sturla at molden.no Wed Jun 1 18:17:10 2011 From: sturla at molden.no (Sturla Molden) Date: Thu, 02 Jun 2011 00:17:10 +0200 Subject: [SciPy-User] efficient computation of point cloud nearest neighbors In-Reply-To: <4DE6B37C.6050302@molden.no> References: <20110529181538.GA13056@phare.normalesup.org> <4DE508E1.1000904@molden.no> <4DE6B37C.6050302@molden.no> Message-ID: <4DE6BA66.9060705@molden.no> Den 01.06.2011 23:47, skrev Sturla Molden: > > os.fork() will copy-on-write on Linux though. > On Windows we can allocate shared memory as private copy-on-write pages. It is not as useful as Linux' os.fork(), as pointers cannot be shared, but still a nice way to save some RAM if we want to share (and write-protect) large ndarrays. I might add this option to my shared memory ndarrays one day, but not today :) Sturla From William.T.Bridgman at nasa.gov Thu Jun 2 08:17:55 2011 From: William.T.Bridgman at nasa.gov (Bridgman, William T.) Date: Thu, 2 Jun 2011 08:17:55 -0400 Subject: [SciPy-User] Easy way to detect data boundary in integrate.odeint? Message-ID: <72B29452-7630-450B-870C-80FFAC42AA98@nasa.gov> Hello, I'm building streamlines from a 3-D vector array and keep having the problem that if the point I'm propagating reaches the boundary of the data it will sometimes reverse, re-traversing the dataset (really bad), or just repeatedly add points at the boundary (annoying). Is there an easy way to terminate this behavior? I've implemented a version calling odeint in a loop where I check if the output position is still in my data volume with each integration step, but this is notoriously slow. Is there any flag or sentinel value I can wrap around my data cube that would tell odeint to terminate when the integration hits the data boundary? I can't find any in the scipy docs and I've found a few queries on the discussion list which are close to the topic but apparently never actually implemented. Thanks, Tom -- Dr. William T."Tom" Bridgman Scientific Visualization Studio Global Science & Technology, Inc. NASA/Goddard Space Flight Center Email: William.T.Bridgman at nasa.gov Code 610.3 Phone: 301-286-1346 Greenbelt, MD 20771 FAX: 301-286-1634 http://svs.gsfc.nasa.gov/ From josef.pktd at gmail.com Thu Jun 2 10:35:50 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 2 Jun 2011 10:35:50 -0400 Subject: [SciPy-User] random variable for truncated multivariate normal and t distributions Message-ID: (another random question) I would like to generate random variables for multivariate normal and t distributions that are truncated on a rectangular area (element wise upper and lower bounds). Is there anything better available than sampling from the un-truncated distributions and throwing away the out of bounds samples ? Josef From sturla at molden.no Thu Jun 2 11:29:56 2011 From: sturla at molden.no (Sturla Molden) Date: Thu, 02 Jun 2011 17:29:56 +0200 Subject: [SciPy-User] random variable for truncated multivariate normal and t distributions In-Reply-To: References: Message-ID: <4DE7AC74.6080907@molden.no> Den 02.06.2011 16:35, skrev josef.pktd at gmail.com: > (another random question) > > I would like to generate random variables for multivariate normal and > t distributions that are truncated on a rectangular area (element wise > upper and lower bounds). > > Is there anything better available than sampling from the un-truncated > distributions and throwing away the out of bounds samples ? Perhaps Metropolis-Hastings? 
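(The plain rejection sampler Josef mentions, drawing from the untruncated normal and keeping what falls inside the box, fits in a few lines. A rough sketch only; the helper name and bounds are made up, and it can spin for a long time when the acceptance rate is low, which is exactly where Metropolis-Hastings or Gibbs becomes attractive.)

import numpy as np

def truncated_mvn_rejection(mean, cov, lower, upper, size):
    # draw blocks from the untruncated distribution, keep the draws that
    # fall inside the rectangle [lower, upper], repeat until enough accepted
    out, n = [], 0
    while n < size:
        draw = np.random.multivariate_normal(mean, cov, size=size)
        keep = draw[((draw >= lower) & (draw <= upper)).all(axis=1)]
        out.append(keep)
        n += len(keep)
    return np.vstack(out)[:size]

x = truncated_mvn_rejection(np.zeros(3), np.eye(3), -np.ones(3), np.ones(3), 10000)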
Sturla From bsouthey at gmail.com Thu Jun 2 12:24:55 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 02 Jun 2011 11:24:55 -0500 Subject: [SciPy-User] random variable for truncated multivariate normal and t distributions In-Reply-To: References: Message-ID: <4DE7B957.2060305@gmail.com> On 06/02/2011 09:35 AM, josef.pktd at gmail.com wrote: > (another random question) > > I would like to generate random variables for multivariate normal and > t distributions that are truncated on a rectangular area (element wise > upper and lower bounds). > > Is there anything better available than sampling from the un-truncated > distributions and throwing away the out of bounds samples ? > > Josef > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user You have seen R's tmvtnorm? cran.r-project.org/web/packages/tmvtnorm Bruce From josef.pktd at gmail.com Thu Jun 2 12:54:23 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 2 Jun 2011 12:54:23 -0400 Subject: [SciPy-User] random variable for truncated multivariate normal and t distributions In-Reply-To: <4DE7B957.2060305@gmail.com> References: <4DE7B957.2060305@gmail.com> Message-ID: On Thu, Jun 2, 2011 at 12:24 PM, Bruce Southey wrote: > On 06/02/2011 09:35 AM, josef.pktd at gmail.com wrote: >> (another random question) >> >> I would like to generate random variables for multivariate normal and >> t distributions that are truncated on a rectangular area (element wise >> upper and lower bounds). >> >> Is there anything better available than sampling from the un-truncated >> distributions and throwing away the out of bounds samples ? >> >> Josef >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > You have seen R's tmvtnorm? > cran.r-project.org/web/packages/tmvtnorm No, I didn't know about this, they use rejection sampling or Gibbs sampling. If these are the alternatives, then I will stick with rejection sampling. I'm not starting to learn the implementation details of simulating with MCMC, Metropolis-Hastings or Gibbs, and leave it to the pymc developers and to Wes. rtmvnorm has a big Warning label about the Gibbs sampler, although, for MonteCarlo integration, any serial correlation in the sampler won't be very relevant. Thanks Sturla and Bruce, Josef > > Bruce > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From rob.clewley at gmail.com Thu Jun 2 15:35:02 2011 From: rob.clewley at gmail.com (Rob Clewley) Date: Thu, 2 Jun 2011 15:35:02 -0400 Subject: [SciPy-User] Easy way to detect data boundary in integrate.odeint? In-Reply-To: <72B29452-7630-450B-870C-80FFAC42AA98@nasa.gov> References: <72B29452-7630-450B-870C-80FFAC42AA98@nasa.gov> Message-ID: William, > I'm building streamlines from a 3-D vector array and keep having the > problem that if the point I'm propagating reaches the boundary of the > data it will sometimes reverse, re-traversing the dataset (really > bad), or just repeatedly add points at the boundary (annoying). > > Is there an easy way to terminate this behavior? No, this feature is not present in scipy's odeint wrapper. > I've implemented a version calling odeint in a loop where I check if > the output position is still in my data volume with each integration > step, but this is notoriously slow. 
If speed has become a major issue for you, I recommend my PyDSTool package for faster integration with easily set up event detection. Both will take place at the level of the C code the package creates automatically from your specifications. So it will be very fast. Feel free to contact me about setting up if you get stuck with the limited documentation online (just google it). Best, Rob -- Robert Clewley, Ph.D. Assistant Professor Neuroscience Institute and Department of Mathematics and Statistics Georgia State University PO Box 5030 Atlanta, GA 30302, USA tel: 404-413-6420 fax: 404-413-5446 http://www2.gsu.edu/~matrhc http://neuroscience.gsu.edu/rclewley.html From sturla at molden.no Thu Jun 2 19:11:20 2011 From: sturla at molden.no (Sturla Molden) Date: Fri, 03 Jun 2011 01:11:20 +0200 Subject: [SciPy-User] random variable for truncated multivariate normal and t distributions In-Reply-To: References: <4DE7B957.2060305@gmail.com> Message-ID: <4DE81898.1070701@molden.no> Den 02.06.2011 18:54, skrev josef.pktd at gmail.com: > > If these are the alternatives, then I will stick with rejection sampling. > I'm not starting to learn the implementation details of simulating > with MCMC, Metropolis-Hastings or Gibbs, and leave it to the pymc > developers and to Wes. Metropolis-Hastings is a form of rejection sampling. It's just a way to reduce the number of rejections, particularly when the sample space is large. > rtmvnorm has a big Warning label about the Gibbs sampler, although, > for MonteCarlo integration, any serial correlation in the sampler > won't be very relevant. You will get serial correlation with MCMC, but remember they are still samples from the stationary distribution of the Markov chain. You can still use these samples to compute mean, standard deviation, KDE, numerical integrals, etc. Sturla From josef.pktd at gmail.com Fri Jun 3 09:20:36 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 3 Jun 2011 09:20:36 -0400 Subject: [SciPy-User] affine transformation - what's going on ? Message-ID: I'm puzzling for hours already what's going on, and I don't understand where my thinko or bug is. I *think* an affine transformation should return the same count in an inequality. x is (nobs, 3) a is (3) mu and A define an affine transformation Why do the following two not give the same result? The first is about 0.19, the second 0.169 print (x From warren.weckesser at enthought.com Fri Jun 3 10:20:33 2011 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Fri, 3 Jun 2011 09:20:33 -0500 Subject: [SciPy-User] affine transformation - what's going on ? In-Reply-To: References: Message-ID: On Fri, Jun 3, 2011 at 8:20 AM, wrote: > I'm puzzling for hours already what's going on, and I don't understand > where my thinko or bug is. > > I *think* an affine transformation should return the same count in an > inequality. > x is (nobs, 3) > a is (3) > mu and A define an affine transformation > > Why do the following two not give the same result? The first is about > 0.19, the second 0.169 > > print (x > print (affine(x, mu, A) < affine(a, mu, A)).all(-1).mean() > > An affine transformation will not necessarily preserve the ordering of the components of two vectors. Here's a counterexample: In [92]: Q = array([[2, -1.5],[0,0.5]]) In [93]: Q Out[93]: array([[ 2. , -1.5], [ 0. 
, 0.5]]) In [94]: x1 = array([1.0, 1.0]) In [95]: x2 = array([0.9, 0.1]) In [96]: x1 > x2 Out[96]: array([ True, True], dtype=bool) In [97]: dot(Q, x1) Out[97]: array([ 0.5, 0.5]) In [98]: dot(Q, x2) Out[98]: array([ 1.65, 0.05]) In [99]: dot(Q,x1) > dot(Q,x2) Out[99]: array([False, True], dtype=bool) Warren > > full script below and in attachment > ------------- > import numpy as np > > def affine(x, mu, A): > return np.dot(x-mu, A.T) > > cov3 = np.array([[ 1. , 0.5 , 0.75], > [ 0.5 , 1.5 , 0.6 ], > [ 0.75, 0.6 , 2. ]]) > > mu = np.array([-1, 0.0, 2.0]) > > A = np.array([[ 1.22955725, -0.25615776, -0.38423664], > [-0. , 0.87038828, -0.26111648], > [-0. , -0. , 0.70710678]]) > > x = np.random.multivariate_normal(mu, cov3, size=1000000) > print x.shape > > a = np.array([ 0. , 0.5, 1. ]) > > print (x print (affine(x, mu, A) < affine(a, mu, A)).all(-1).mean() > > ''' > with 100000 > (100000, 3) > 0.19185 > 0.16837 > > with 1000000 > > (1000000, 3) > 0.191597 > 0.168814 > ''' > ------------------ > > context: I'm transforming multivariate normal distributed random > variables, and my cdf's don't match up. > > Can anyone help figuring out where my thinking or my calculations are > wrong? > > Josef > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From e.antero.tammi at gmail.com Fri Jun 3 11:15:06 2011 From: e.antero.tammi at gmail.com (eat) Date: Fri, 3 Jun 2011 18:15:06 +0300 Subject: [SciPy-User] affine transformation - what's going on ? In-Reply-To: References: Message-ID: Hi, On Fri, Jun 3, 2011 at 4:20 PM, wrote: > I'm puzzling for hours already what's going on, and I don't understand > where my thinko or bug is. > > I *think* an affine transformation should return the same count in an > inequality. > x is (nobs, 3) > a is (3) > mu and A define an affine transformation > > Why do the following two not give the same result? The first is about > 0.19, the second 0.169 > > print (x > print (affine(x, mu, A) < affine(a, mu, A)).all(-1).mean() > > > full script below and in attachment > ------------- > import numpy as np > > def affine(x, mu, A): > return np.dot(x-mu, A.T) > > cov3 = np.array([[ 1. , 0.5 , 0.75], > [ 0.5 , 1.5 , 0.6 ], > [ 0.75, 0.6 , 2. ]]) > > mu = np.array([-1, 0.0, 2.0]) > > A = np.array([[ 1.22955725, -0.25615776, -0.38423664], > [-0. , 0.87038828, -0.26111648], > [-0. , -0. , 0.70710678]]) > > x = np.random.multivariate_normal(mu, cov3, size=1000000) > print x.shape > > a = np.array([ 0. , 0.5, 1. ]) > > print (x print (affine(x, mu, A) < affine(a, mu, A)).all(-1).mean() > > ''' > with 100000 > (100000, 3) > 0.19185 > 0.16837 > > with 1000000 > > (1000000, 3) > 0.191597 > 0.168814 > ''' > ------------------ > > context: I'm transforming multivariate normal distributed random > variables, and my cdf's don't match up. > > Can anyone help figuring out where my thinking or my calculations are > wrong? > >From my rusty memory of linear algebra, only orthogonal transformations will do that what you are (apparently) looking for. 
So, given def affine(x, mu, A): U, S, V= np.linalg.svd(A.T) return np.dot(x- mu, np.dot(U, U.T)) it will produce: In []: run try_affine (1000000, 3) 0.191657 0.191657 My 2 cents, eat > > Josef > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri Jun 3 11:48:50 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 3 Jun 2011 11:48:50 -0400 Subject: [SciPy-User] affine transformation - what's going on ? In-Reply-To: References: Message-ID: On Fri, Jun 3, 2011 at 10:20 AM, Warren Weckesser wrote: > > > On Fri, Jun 3, 2011 at 8:20 AM, wrote: >> >> I'm puzzling for hours already what's going on, and I don't understand >> where my thinko or bug is. >> >> I *think* an affine transformation should return the same count in an >> inequality. >> x is (nobs, 3) >> a is (3) >> mu and A define an affine transformation >> >> Why do the following two not give the same result? The first is about >> 0.19, the second 0.169 >> >> print (x> >> print (affine(x, mu, A) < affine(a, mu, A)).all(-1).mean() >> > > > An affine transformation will not necessarily preserve > the ordering of the components of two vectors. > > Here's a counterexample: > > In [92]: Q = array([[2, -1.5],[0,0.5]]) > > In [93]: Q > Out[93]: > array([[ 2. , -1.5], > ?????? [ 0. ,? 0.5]]) > > In [94]: x1 = array([1.0, 1.0]) > > In [95]: x2 = array([0.9, 0.1]) > > In [96]: x1 > x2 > Out[96]: array([ True,? True], dtype=bool) > > In [97]: dot(Q, x1) > Out[97]: array([ 0.5,? 0.5]) > > In [98]: dot(Q, x2) > Out[98]: array([ 1.65,? 0.05]) > > In [99]: dot(Q,x1) > dot(Q,x2) > Out[99]: array([False,? True], dtype=bool) Thanks Warren, nice (nasty for my thinking) example, positive definite and everything. I completely forgot about monotonicity. I had to check it's not a trick with negative eigenvalues. This means we can transform a multivariate normal distributed random variable to the standardized N(0, eye) form but cannot use it for calculating cdfs. http://en.wikipedia.org/wiki/Multivariate_normal_distribution#Affine_transformation (Now I understand why Brentz and Getz only standardize by the variance for the mvn cdf) (Last time I had to ask stackoverflow when my uncorrelated t-distributed random variables were not independent) Josef > > > > Warren > > >> >> full script below and in attachment >> ------------- >> import numpy as np >> >> def affine(x, mu, A): >> ? ?return np.dot(x-mu, A.T) >> >> cov3 = np.array([[ 1. ?, ?0.5 , ?0.75], >> ? ? ? ? ? ? ? ? ? [ 0.5 , ?1.5 , ?0.6 ], >> ? ? ? ? ? ? ? ? ? [ 0.75, ?0.6 , ?2. ?]]) >> >> mu = np.array([-1, 0.0, 2.0]) >> >> A = np.array([[ 1.22955725, -0.25615776, -0.38423664], >> ? ? ? ? ? ? ? [-0. ? ? ? ?, ?0.87038828, -0.26111648], >> ? ? ? ? ? ? ? [-0. ? ? ? ?, -0. ? ? ? ?, ?0.70710678]]) >> >> x = np.random.multivariate_normal(mu, cov3, size=1000000) >> print x.shape >> >> a = np.array([ 0. , ?0.5, ?1. ]) >> >> print (x> print (affine(x, mu, A) < affine(a, mu, A)).all(-1).mean() >> >> ''' >> with 100000 >> (100000, 3) >> 0.19185 >> 0.16837 >> >> with 1000000 >> >> (1000000, 3) >> 0.191597 >> 0.168814 >> ''' >> ------------------ >> >> context: I'm transforming multivariate normal distributed random >> variables, and my cdf's don't match up. >> >> Can anyone help figuring out where my thinking or my calculations are >> wrong? 
>> >> Josef >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From josef.pktd at gmail.com Fri Jun 3 11:50:41 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 3 Jun 2011 11:50:41 -0400 Subject: [SciPy-User] affine transformation - what's going on ? In-Reply-To: References: Message-ID: On Fri, Jun 3, 2011 at 11:15 AM, eat wrote: > Hi, > > On Fri, Jun 3, 2011 at 4:20 PM, wrote: >> >> I'm puzzling for hours already what's going on, and I don't understand >> where my thinko or bug is. >> >> I *think* an affine transformation should return the same count in an >> inequality. >> x is (nobs, 3) >> a is (3) >> mu and A define an affine transformation >> >> Why do the following two not give the same result? The first is about >> 0.19, the second 0.169 >> >> print (x> >> print (affine(x, mu, A) < affine(a, mu, A)).all(-1).mean() >> >> >> full script below and in attachment >> ------------- >> import numpy as np >> >> def affine(x, mu, A): >> ? ?return np.dot(x-mu, A.T) >> >> cov3 = np.array([[ 1. ?, ?0.5 , ?0.75], >> ? ? ? ? ? ? ? ? ? [ 0.5 , ?1.5 , ?0.6 ], >> ? ? ? ? ? ? ? ? ? [ 0.75, ?0.6 , ?2. ?]]) >> >> mu = np.array([-1, 0.0, 2.0]) >> >> A = np.array([[ 1.22955725, -0.25615776, -0.38423664], >> ? ? ? ? ? ? ? [-0. ? ? ? ?, ?0.87038828, -0.26111648], >> ? ? ? ? ? ? ? [-0. ? ? ? ?, -0. ? ? ? ?, ?0.70710678]]) >> >> x = np.random.multivariate_normal(mu, cov3, size=1000000) >> print x.shape >> >> a = np.array([ 0. , ?0.5, ?1. ]) >> >> print (x> print (affine(x, mu, A) < affine(a, mu, A)).all(-1).mean() >> >> ''' >> with 100000 >> (100000, 3) >> 0.19185 >> 0.16837 >> >> with 1000000 >> >> (1000000, 3) >> 0.191597 >> 0.168814 >> ''' >> ------------------ >> >> context: I'm transforming multivariate normal distributed random >> variables, and my cdf's don't match up. >> >> Can anyone help figuring out where my thinking or my calculations are >> wrong? > > From my rusty memory of linear algebra, only orthogonal transformations will > do that what you are (apparently) looking for. So, given > def affine(x, mu, A): > ? ? U, S, V= np.linalg.svd(A.T) > > ? ? return np.dot(x- mu, np.dot(U, U.T)) > > it will produce: > In []: run try_affine > (1000000, 3) > 0.191657 > 0.191657 Thanks, my linear algebra has sometimes blind spots. After seeing Warrens answer, I started to google for the conditions that affine transformation are monotonic, but I didn't find anything. Josef > My 2 cents, > eat >> >> Josef >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From William.T.Bridgman at nasa.gov Fri Jun 3 12:13:02 2011 From: William.T.Bridgman at nasa.gov (Bridgman, William T.) Date: Fri, 3 Jun 2011 12:13:02 -0400 Subject: [SciPy-User] Easy way to detect data boundary in integrate.odeint? In-Reply-To: References: Message-ID: <79A56B19-7DAF-4517-8AB0-5D49425E84A3@nasa.gov> Rob, Thanks for the pointer to this package. Is there any reports on its robustness in use with the newest scipy & numpy? 
I'm probably too close to deadline to use with this project, but will keep it in mind for future projects. Thanks, Tom On Jun 3, 2011, at 11:48 AM, scipy-user-request at scipy.org wrote: > Message: 1 > Date: Thu, 2 Jun 2011 15:35:02 -0400 > From: Rob Clewley > Subject: Re: [SciPy-User] Easy way to detect data boundary in > integrate.odeint? > To: SciPy Users List > Message-ID: > Content-Type: text/plain; charset=ISO-8859-1 > > William, > >> I'm building streamlines from a 3-D vector array and keep having the >> problem that if the point I'm propagating reaches the boundary of the >> data it will sometimes reverse, re-traversing the dataset (really >> bad), or just repeatedly add points at the boundary (annoying). >> >> Is there an easy way to terminate this behavior? > > No, this feature is not present in scipy's odeint wrapper. > >> I've implemented a version calling odeint in a loop where I check if >> the output position is still in my data volume with each integration >> step, but this is notoriously slow. > > If speed has become a major issue for you, I recommend my PyDSTool > package for faster integration with easily set up event detection. > Both will take place at the level of the C code the package creates > automatically from your specifications. So it will be very fast. Feel > free to contact me about setting up if you get stuck with the limited > documentation online (just google it). > > Best, > Rob > > -- > Robert Clewley, Ph.D. > Assistant Professor > Neuroscience Institute and > Department of Mathematics and Statistics > Georgia State University > PO Box 5030 > Atlanta, GA 30302, USA > > tel: 404-413-6420 fax: 404-413-5446 > http://www2.gsu.edu/~matrhc > http://neuroscience.gsu.edu/rclewley.html -- Dr. William T."Tom" Bridgman Scientific Visualization Studio Global Science & Technology, Inc. NASA/Goddard Space Flight Center Email: William.T.Bridgman at nasa.gov Code 610.3 Phone: 301-286-1346 Greenbelt, MD 20771 FAX: 301-286-1634 http://svs.gsfc.nasa.gov/ From silva at lma.cnrs-mrs.fr Sat Jun 4 10:53:22 2011 From: silva at lma.cnrs-mrs.fr (Fabrice Silva) Date: Sat, 04 Jun 2011 16:53:22 +0200 Subject: [SciPy-User] ODE: providing function and gradient together Message-ID: <1307199202.8667.6.camel@amilo.coursju> Hello, I am trying to solve some ODE providing the function describing the dynamics and its gradient. The trouble is they are somehow expensive to evaluate (due to some time-varying parameters that need to be updated) and some computations are done twice (once in the function, another one in the gradient). I was wondering if anyone know an integration facility that would accept a single function returning the evaluation of the dynamics function and the evaluation of the gradient together. Any idea? Regards -- Fabrice Silva From rob.clewley at gmail.com Sat Jun 4 14:20:24 2011 From: rob.clewley at gmail.com (Rob Clewley) Date: Sat, 4 Jun 2011 14:20:24 -0400 Subject: [SciPy-User] Easy way to detect data boundary in integrate.odeint? In-Reply-To: <79A56B19-7DAF-4517-8AB0-5D49425E84A3@nasa.gov> References: <79A56B19-7DAF-4517-8AB0-5D49425E84A3@nasa.gov> Message-ID: Hi Tom, I am not using the latest numpy/scipy versions myself, I have been using 1.4.1/0.7.0 respectively. There are just a very small number of test errors with the very latest versions, basically a strange Bus Error (on Macs at least) for certain symbolic calculations, which I don't yet understand. Otherwise you should be fine using the latest ones. 
Do let me know if you encounter any specific issues on your platform and setup. Best, Rob On Fri, Jun 3, 2011 at 12:13 PM, Bridgman, William T. wrote: > Rob, > > Thanks for the pointer to this package. > > Is there any reports on its robustness in use with the newest scipy & > numpy? > > I'm probably too close to deadline to use with this project, but will > keep it in mind for future projects. > > Thanks, > Tom > On Jun 3, 2011, at 11:48 AM, scipy-user-request at scipy.org wrote: >> Message: 1 >> Date: Thu, 2 Jun 2011 15:35:02 -0400 >> From: Rob Clewley >> Subject: Re: [SciPy-User] Easy way to detect data boundary in >> ? ? ? ?integrate.odeint? >> To: SciPy Users List >> Message-ID: >> Content-Type: text/plain; charset=ISO-8859-1 >> >> William, >> >>> I'm building streamlines from a 3-D vector array and keep having the >>> problem that if the point I'm propagating reaches the boundary of the >>> data it will sometimes reverse, re-traversing the dataset (really >>> bad), or just repeatedly add points at the boundary (annoying). >>> >>> Is there an easy way to terminate this behavior? >> >> No, this feature is not present in scipy's odeint wrapper. >> >>> I've implemented a version calling odeint in a loop where I check if >>> the output position is still in my data volume with each integration >>> step, but this is notoriously slow. >> >> If speed has become a major issue for you, I recommend my PyDSTool >> package for faster integration with easily set up event detection. >> Both will take place at the level of the C code the package creates >> automatically from your specifications. So it will be very fast. Feel >> free to contact me about setting up if you get stuck with the limited >> documentation online (just google it). >> >> Best, >> Rob >> >> -- >> Robert Clewley, Ph.D. >> Assistant Professor >> Neuroscience Institute and >> Department of Mathematics and Statistics >> Georgia State University >> PO Box 5030 >> Atlanta, GA 30302, USA >> >> tel: 404-413-6420 fax: 404-413-5446 >> http://www2.gsu.edu/~matrhc >> http://neuroscience.gsu.edu/rclewley.html > > -- > Dr. William T."Tom" Bridgman ? ? ? ? ? ? ? Scientific Visualization > Studio > Global Science & Technology, Inc. ? ? ? ? ?NASA/Goddard Space Flight > Center > Email: William.T.Bridgman at nasa.gov ? ? ? ? Code 610.3 > Phone: 301-286-1346 ? ? ? ? ? ? ? ? ? ? ? ?Greenbelt, MD 20771 > FAX: ? 301-286-1634 ? ? ? ? ? ? ? ? ? ? ? ?http://svs.gsfc.nasa.gov/ > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Robert Clewley, Ph.D. Assistant Professor Neuroscience Institute and Department of Mathematics and Statistics Georgia State University PO Box 5030 Atlanta, GA 30302, USA tel: 404-413-6420 fax: 404-413-5446 http://www2.gsu.edu/~matrhc http://neuroscience.gsu.edu/rclewley.html From kwgoodman at gmail.com Sat Jun 4 14:44:57 2011 From: kwgoodman at gmail.com (Keith Goodman) Date: Sat, 4 Jun 2011 11:44:57 -0700 Subject: [SciPy-User] [ANN] Bottleneck 0.5.0beta Message-ID: I don't know if there are any Bottleneck users out there but I do know that the Bottleneck 0.4 release was a mess (0.4.0, 0.4.1, 0.4.2, 0.4.3). So this time around I've made a beta release of Bottleneck 0.5: https://github.com/downloads/kwgoodman/bottleneck/Bottleneck-0.5.0beta.tar.gz Reports of success or failure of bottleneck.test() are appreciated. 
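(For anyone willing to kick the tyres of the beta, something along these lines should do: bn.test() is the pass/fail report asked for above, and the other call signatures are assumptions based on the function list in the release notes that follow.)

import numpy as np
import bottleneck as bn

bn.test()                           # run the test suite and report success/failure

a = np.random.rand(100000)
med = bn.move_median(a, window=25)  # new in 0.5: moving window median
small = bn.partsort(a, 10)[:10]     # the 10 smallest values, in no particular order
sumsq = bn.ss(a)                    # faster version of scipy.stats.ss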
*Release date: Not yet released, in development* The fifth release of bottleneck adds four new functions, comes in a single source distribution instead of separate 32 and 64 bit versions, and fixes a bug in nanmedian: **New functions** - move_median(), moving window median - partsort(), partial sort - argpartsort() - ss(), sum of squares, faster version of scipy.stats.ss **Changes** - Single source distribution instead of separate 32 and 64 bit versions - nanmax and nanmin now follow Numpy 1.6 (not 1.5.1) when input is all NaN **Bug fixes** - #14 Support python 2.5 by importing `with` statement - #22 nanmedian wrong for particular ordering of NaN and non-NaN elements From cgohlke at uci.edu Sat Jun 4 15:52:29 2011 From: cgohlke at uci.edu (Christoph Gohlke) Date: Sat, 04 Jun 2011 12:52:29 -0700 Subject: [SciPy-User] [ANN] Bottleneck 0.5.0beta In-Reply-To: References: Message-ID: <4DEA8CFD.4030505@uci.edu> On 6/4/2011 11:44 AM, Keith Goodman wrote: > I don't know if there are any Bottleneck users out there but I do know > that the Bottleneck 0.4 release was a mess (0.4.0, 0.4.1, 0.4.2, > 0.4.3). So this time around I've made a beta release of Bottleneck > 0.5: > > https://github.com/downloads/kwgoodman/bottleneck/Bottleneck-0.5.0beta.tar.gz > > Reports of success or failure of bottleneck.test() are appreciated. > > *Release date: Not yet released, in development* > > The fifth release of bottleneck adds four new functions, comes in a > single source distribution instead of separate 32 and 64 bit versions, > and fixes a bug in nanmedian: > > **New functions** > > - move_median(), moving window median > - partsort(), partial sort > - argpartsort() > - ss(), sum of squares, faster version of scipy.stats.ss > > **Changes** > > - Single source distribution instead of separate 32 and 64 bit versions > - nanmax and nanmin now follow Numpy 1.6 (not 1.5.1) when input is all NaN > > **Bug fixes** > > - #14 Support python 2.5 by importing `with` statement > - #22 nanmedian wrong for particular ordering of NaN and non-NaN elements Hi Keith, the code currently fails to compile with msvc9 on Windows. A patch is attached. bottleneck.test() passes all 80 tests in ~30s. In move_median.c, _size_t is defined as 64 bit npy_int64 even on 32 bit systems. Is that intended? Christoph -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: msvc9.diff URL: From kwgoodman at gmail.com Sat Jun 4 16:48:27 2011 From: kwgoodman at gmail.com (Keith Goodman) Date: Sat, 4 Jun 2011 13:48:27 -0700 Subject: [SciPy-User] [ANN] Bottleneck 0.5.0beta In-Reply-To: <4DEA8CFD.4030505@uci.edu> References: <4DEA8CFD.4030505@uci.edu> Message-ID: On Sat, Jun 4, 2011 at 12:52 PM, Christoph Gohlke wrote: > > > On 6/4/2011 11:44 AM, Keith Goodman wrote: >> >> I don't know if there are any Bottleneck users out there but I do know >> that the Bottleneck 0.4 release was a mess (0.4.0, 0.4.1, 0.4.2, >> 0.4.3). So this time around I've made a beta release of Bottleneck >> 0.5: >> >> >> https://github.com/downloads/kwgoodman/bottleneck/Bottleneck-0.5.0beta.tar.gz >> >> Reports of success or failure of bottleneck.test() are appreciated. 
>> >> *Release date: Not yet released, in development* >> >> The fifth release of bottleneck adds four new functions, comes in a >> single source distribution instead of separate 32 and 64 bit versions, >> and fixes a bug in nanmedian: >> >> **New functions** >> >> - move_median(), moving window median >> - partsort(), partial sort >> - argpartsort() >> - ss(), sum of squares, faster version of scipy.stats.ss >> >> **Changes** >> >> - Single source distribution instead of separate 32 and 64 bit versions >> - nanmax and nanmin now follow Numpy 1.6 (not 1.5.1) when input is all NaN >> >> **Bug fixes** >> >> - #14 Support python 2.5 by importing `with` statement >> - #22 nanmedian wrong for particular ordering of NaN and non-NaN elements > > > Hi Keith, > > the code currently fails to compile with msvc9 on Windows. A patch is > attached. > > bottleneck.test() passes all 80 tests in ~30s. > > In move_median.c, _size_t is defined as 64 bit npy_int64 even on 32 bit > systems. Is that intended? Thank you, Christoph. You changed inline to __inline in the C code. I read that __inline is vendor specific and not a C99 keyword. Does anyone know if __inline inlines the code with gcc? You also changed: # Is the OS 32 or 64 bits? -if np.int_ == np.int32: +if tuple.__itemsize__ == 4: bits = '32' -elif np.int_ == np.int64: +elif tuple.__itemsize__ == 8: bits = '64' else: raise ValueError("Your OS does not appear to be 32 or 64 bits.") Will that always work for Numpy? If so I use it in several places and will make the change. As for the npy_int64 question, I don't know. I am confused about dtypes in move_median. The C code only uses float64 for data values yet it works fine for int dtype. I guess cython is doing the casting for me somewhere. I thought I'd have to have separate versions of the C code for each dtype. Are there problems with using npy_int64 of 32 bit systems? From rob.clewley at gmail.com Sat Jun 4 17:09:56 2011 From: rob.clewley at gmail.com (Rob Clewley) Date: Sat, 4 Jun 2011 17:09:56 -0400 Subject: [SciPy-User] Call for collaborators in NSF grant proposal for scientific software for modeling complex dynamical systems with Python Message-ID: Dear colleagues, I am planning to put in a grant application to the NSF?s Software Infrastructure for Sustained Innovation (SI^2) program, due by July 18th. The solicitation encourages a multi-disciplinary team, and although a Program Officer was supportive of my idea in itself, he encouraged me to find one or two others from fields outside my own (computational/mathematical neuroscience) who would be interested in pushing the idea into other disciplines with a longer term view of development. I understand that there isn?t much time, but maybe some of you familiar with dynamical systems modeling, model optimization, and complex systems, might be interested in discussing further with me offline. I have a broad draft of a proposal ready, and your ideas to expand on it further with a vision that resonates with mine could still be easily incorporated in time for the deadline. I wish to contribute towards a solution to the broad problem faced in many areas of modern modeling: ?We don?t know how to do data-driven science.? Specifically, my interest is in developing better computational tools to explore and diagnose hypothesized mechanisms in high-dimensional nonlinear dynamical systems. For the most part, my interest lies with models defined by (ordinary) differential equations, rather than discrete mappings or automata. 
You may be using standard simulators or specific technology for your field, and I am not proposing to build a more efficient simulator for large-scale models. I want to develop pioneering new tools for data-driven modeling that interfaces with a simulator and traditional analysis tools (maybe ones that are specific to your scientific field, but are likely to include bifurcation analysis, qualitative geometric analysis, fast-slow multi-scale reductions, statistical analysis). The new tools are expected to introduce algorithms incorporating qualitative reasoning and heuristics to help a user better perform their difficult work. (My opinion is that computer-assisted approaches explicitly involving a supervising expert user can make much more of a short term impact than fully automated approaches, which I think are currently too ambitious.) What I am seeking from a collaborator (a faculty member at a US research institution eligible to be a co-PI) is a resonant interest in forward-looking questions about understanding and engineering large-scale dynamic models based on a modular building-block approach (i.e., both analysis and synthesis). You would be interested in Python-based tools in a similar spirit to mine (see below), which would be directly relevant to your personal scientific interests in a different disciplinary area. E.g., this could be coming from areas such as climate modeling, geophysics, biochemistry, genomics, biomechanics, astrophysics, all of which share problems of managing complexity in large, nonlinear models involving mixed levels of representation and multiple scales. My proposal springboards off several ideas already prototyped in my PyDSTool dynamical systems software environment (http://pydstool.sourceforge.net), but the aims of the proposal do _not_ need to be focused solely on development within the PyDSTool framework itself. I currently have an NSF award for a related project, not focused on the software tools themselves, and some publications that have come from it. I also have two students working on applications of the software in this project. Some of the developments are fueling this new grant proposal. You can read more about my research here: http://www2.gsu.edu/~matrhc/. You can read more about the NSF solicitation here: http://www.nsf.gov/pubs/2011/nsf11539/nsf11539.htm. I can send you my proposal draft if you contact me with your interests (rclewley AT gsu DOT edu) and I am available to talk on the phone or skype next week. Thanks for your attention, and I look forward to talking with you! Regards, Rob -- Robert Clewley, Ph.D. Assistant Professor Neuroscience Institute and Department of Mathematics and Statistics Georgia State University PO Box 5030 Atlanta, GA 30302, USA tel: 404-413-6420 fax: 404-413-5446 http://www2.gsu.edu/~matrhc http://neuroscience.gsu.edu/rclewley.html From kwgoodman at gmail.com Sat Jun 4 20:10:35 2011 From: kwgoodman at gmail.com (Keith Goodman) Date: Sat, 4 Jun 2011 17:10:35 -0700 Subject: [SciPy-User] [ANN] Bottleneck 0.5.0beta In-Reply-To: References: <4DEA8CFD.4030505@uci.edu> Message-ID: On Sat, Jun 4, 2011 at 1:48 PM, Keith Goodman wrote: > On Sat, Jun 4, 2011 at 12:52 PM, Christoph Gohlke wrote: >> >> >> On 6/4/2011 11:44 AM, Keith Goodman wrote: >>> >>> I don't know if there are any Bottleneck users out there but I do know >>> that the Bottleneck 0.4 release was a mess (0.4.0, 0.4.1, 0.4.2, >>> 0.4.3). 
So this time around I've made a beta release of Bottleneck >>> 0.5: >>> >>> >>> https://github.com/downloads/kwgoodman/bottleneck/Bottleneck-0.5.0beta.tar.gz >>> >>> Reports of success or failure of bottleneck.test() are appreciated. >>> >>> *Release date: Not yet released, in development* >>> >>> The fifth release of bottleneck adds four new functions, comes in a >>> single source distribution instead of separate 32 and 64 bit versions, >>> and fixes a bug in nanmedian: >>> >>> **New functions** >>> >>> - move_median(), moving window median >>> - partsort(), partial sort >>> - argpartsort() >>> - ss(), sum of squares, faster version of scipy.stats.ss >>> >>> **Changes** >>> >>> - Single source distribution instead of separate 32 and 64 bit versions >>> - nanmax and nanmin now follow Numpy 1.6 (not 1.5.1) when input is all NaN >>> >>> **Bug fixes** >>> >>> - #14 Support python 2.5 by importing `with` statement >>> - #22 nanmedian wrong for particular ordering of NaN and non-NaN elements >> >> >> Hi Keith, >> >> the code currently fails to compile with msvc9 on Windows. A patch is >> attached. >> >> bottleneck.test() passes all 80 tests in ~30s. >> >> In move_median.c, _size_t is defined as 64 bit npy_int64 even on 32 bit >> systems. Is that intended? > > Thank you, Christoph. > > You changed inline to __inline in the C code. I read that __inline is > vendor specific and not a C99 keyword. Does anyone know if __inline > inlines the code with gcc? > > You also changed: > > ?# Is the OS 32 or 64 bits? > -if np.int_ == np.int32: > +if tuple.__itemsize__ == 4: > ? ? bits = '32' > -elif np.int_ == np.int64: > +elif tuple.__itemsize__ == 8: > ? ? bits = '64' > ?else: > ? ? raise ValueError("Your OS does not appear to be 32 or 64 bits.") > > Will that always work for Numpy? If so I use it in several places and > will make the change. > > As for the npy_int64 question, I don't know. I am confused about > dtypes in move_median. The C code only uses float64 for data values > yet it works fine for int dtype. I guess cython is doing the casting > for me somewhere. I thought I'd have to have separate versions of the > C code for each dtype. > > Are there problems with using npy_int64 of 32 bit systems? Second beta with Christoph's bug fixes for windows compilers: https://github.com/downloads/kwgoodman/bottleneck/Bottleneck-0.5.0beta2.tar.gz From josef.pktd at gmail.com Sun Jun 5 15:43:18 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 5 Jun 2011 15:43:18 -0400 Subject: [SciPy-User] scipy.stats one-sided two-sided less, greater, signed ? Message-ID: What should be the policy on one-sided versus two-sided? The main reason right now for looking at this is http://projects.scipy.org/scipy/ticket/1394 which specifies a "one-sided" alternative and provides both lower and upper tail. I would prefer that we follow the alternative patterns similar to R currently only kstest has alternative : 'two_sided' (default), 'less' or 'greater' but this should be added to other tests where it makes sense R fisher.exact """alternative indicates the alternative hypothesis and must be one of "two.sided", "greater" or "less". You can specify just the initial letter. Only used in the 2 by 2 case.""" mannwhitneyu reports a one-sided test without actually specifying which alternative is used (I thought I remembered other cases like this but don't find any right now) related: in many cases in the two-sided tests the test statistic has a sign that indicates in which tail the test-statistic falls. 
This is useful in ttests for example, because the one-sided tests can be backed out from the two-sided tests. (With symmetric distributions one-sided p-value is just half of the two-sided pvalue) In the discussion of https://github.com/scipy/scipy/pull/8 I argued that this might mislead users to interpret a two-sided result as a one-sided result. However, I doubt now that this is a strong argument against not reporting the signed test statistic. After going through scipy.stats.stats, it looks like we always report the signed test statistic. The test statistic in ks_2samp is in all cases defined as a max value and doesn't have a sign in R either, so adding a sign there would break with the standard definition. one-sided option for ks_2samp would just require to find the distribution of the test statistics D+, D- --- So my proposal for the general pattern (with exceptions for special reasons) would be * add/offer alternative : 'two_sided' (default), 'less' or 'greater' http://projects.scipy.org/scipy/ticket/1394 for now, and adjustments of existing tests in the future (adding the option can be mostly done in a backwards compatible way and for symmetric distributions like ttest it's just a convenience) mannwhitneyu seems to be the only "weird" one * report signed test statistic for two-sided alternative (when a signed test statistic exists): which is the status quo in stats.stats, but I didn't know that this is actually pretty consistent across tests. Opinions ? Josef From gael.varoquaux at normalesup.org Sun Jun 5 17:04:28 2011 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sun, 5 Jun 2011 23:04:28 +0200 Subject: [SciPy-User] Call for collaborators in NSF grant proposal for scientific software for modeling complex dynamical systems with Python In-Reply-To: References: Message-ID: <20110605210428.GA25957@phare.normalesup.org> Hi Rob, Just a quick suggestion, as I come back from traveling and am unpiling my e-mail. It seems to me that it would be interesting for your project to bring in someone from control theory that deals with infering the behavior of a dynamical system from observations. The reason I suggest this is that these guys tend to have a fairly different culture then the standard 'modeling science' --physics, chemistry, computational neuroscience-- guys. My 2 cents, Ga?l From wkerzendorf at googlemail.com Mon Jun 6 06:40:10 2011 From: wkerzendorf at googlemail.com (Wolfgang Kerzendorf) Date: Mon, 06 Jun 2011 20:40:10 +1000 Subject: [SciPy-User] ND interpolation with Qhull Message-ID: <4DECAE8A.3010507@gmail.com> Dear all, I'm interested in learning how the LinearNDInterpolator actually works. So I read up on qhull, convex hulls and delauney triangulation: I understand that one can use qhull to construct the convex hull in a d-dimensional space. If I want the delauney triangulation of n points in d dimensions I just need to project these points on a paraboloid in d+1 dimensional space build the convex hull and reproject this onto d-dimensions. So now that I have the triangles I just need to find the triangle containing the point to be interpolated. And that is where I'm a little bit unclear: How do I find the point? I know that in the barycentric coordinate system I have three coefficents and if the sum of two of them is less than 1 to reproduce my point, then I found my triangle. But this requires me to go through all triangles. I'm sure there is a faster way (which is probably used by scipy). 
Once I have the triangle I just determine the two coefficients (in two dimensions) and add the vectors up to get the interpolation? Help is much appreciated Wolfgang From pav at iki.fi Mon Jun 6 07:14:26 2011 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 6 Jun 2011 11:14:26 +0000 (UTC) Subject: [SciPy-User] ND interpolation with Qhull References: <4DECAE8A.3010507@gmail.com> Message-ID: Mon, 06 Jun 2011 20:40:10 +1000, Wolfgang Kerzendorf wrote: [clip] > I understand that one can use qhull to construct the convex hull in a > d-dimensional space. If I want the delauney triangulation of n points > in d dimensions I just need to project these points on a paraboloid > in d+1 dimensional space build the convex hull and reproject this > onto d-dimensions. Qhull actually does that for you -- you can ask it to directly provide a Delaunay triangulation. But yes, that's how it works it out internally. [clip] > I know that in the barycentric coordinate system I have three > coefficents and if the sum of two of them is less than 1 to reproduce my > point, then I found my triangle. You also need to require that the two coordinates are in the range [0, 1]. > But this requires me to go through all triangles. I'm sure there > is a faster way (which is probably used by scipy). You can read the code to find out how it works: https://github.com/scipy/scipy/blob/master/scipy/spatial/qhull.pyx#L771 Basically, you do a directed walk in the triangle neigbourhood graph. There are two things you can do: first, you walk on the convex hull in (d+1)-dim to the correct direction until you see the target point over the horizon, and then you walk towards the target point in d-dim. Or, you just do the directed walk in d-dim. Qhull itself uses the "walk towards the horizon" approach, but in practice this doesn't seem to be much better than the directed walk. If the walk ends up in a degenerate triangle in the triangulation, these approaches either fail or enter an infinite cycle, so you need to fall back to brute force search through all the triangles. Getting degenerate triangles in the triangulation seems difficult to avoid in practice (people like to use these routines for data on rectangular grids), so you just have to live with them. Also, when you do interpolation, you start the walk from the point at which you were last at, because the next point to interpolate is usually close to the last one. > Once I have the triangle I just determine the two coefficients (in two > dimensions) and add the vectors up to get the interpolation? You can use the three barycentric coordinates [c3 = 1 - c1 - c2] to compute the weights you want. Barycentric interpolation is simply v = c1 v1 + c2 v3 + c3 v3 But if you want *smooth* spline interpolation rather than linear, things get hairy, as you need to ensure that C1 continuity is satisfied at the triangle boundaries. In 2D these conditions are not too difficult to satisfy, but things start to get substantially more hairy in higher dimensions. First, there are more matching conditions, and satisfying them is more difficult. Second, you need higher-degree splines, and so have more free parameters -- so in 3D you need to estimate not only gradients but also the Hessians from the set of data points, etc etc. As far as I know, a general solution for N dimensions is not known so far. Instead, you have a number of cooking recipes in 3D and 4D. 
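(Putting the linear pieces together in 2-D: find_simplex for the point-location walk described above, the per-simplex affine transform for the barycentric coordinates, then v = c1 v1 + c2 v2 + c3 v3. This is only a sketch of roughly what LinearNDInterpolator does in compiled code; corner points are added so the queries stay inside the hull, since points outside it, where find_simplex returns -1, would need masking.)

import numpy as np
from scipy.spatial import Delaunay

corners = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
points = np.vstack([corners, np.random.rand(30, 2)])   # scattered sample sites
values = np.sin(6 * points[:, 0]) * points[:, 1]       # data at those sites

tri = Delaunay(points)                            # Delaunay triangulation via Qhull
xi = np.array([[0.4, 0.6], [0.2, 0.9]])           # query points

s = tri.find_simplex(xi)                          # walk to the containing triangle

T = tri.transform[s]                              # per-simplex affine transform, (nq, ndim+1, ndim)
b = np.einsum('qij,qj->qi', T[:, :2, :], xi - T[:, 2, :])
bary = np.column_stack([b, 1.0 - b.sum(axis=1)])  # c1, c2 and c3 = 1 - c1 - c2

verts = tri.vertices[s]          # triangle corner indices (named simplices in later scipy)
vi = (bary * values[verts]).sum(axis=1)           # barycentric (linear) interpolation

# cross-check against the built-in interpolator:
# from scipy.interpolate import LinearNDInterpolator
# LinearNDInterpolator(points, values)(xi)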
To my understanding, in higher dimensions, natural neighbor interpolation becomes easier to implement than spline interpolation, and IIRC, if done correctly, it does provide global C1 smoothness. However, for natural neighbor the computational complexity goes up fast with dimensions, as you in the end need to do local modifications to the triangulation. In principle, one could reuse Qhull here, but this is not done in Scipy yet. -- Pauli Virtanen From rob.clewley at gmail.com Mon Jun 6 11:44:08 2011 From: rob.clewley at gmail.com (Rob Clewley) Date: Mon, 6 Jun 2011 11:44:08 -0400 Subject: [SciPy-User] Call for collaborators in NSF grant proposal for scientific software for modeling complex dynamical systems with Python In-Reply-To: <20110605210428.GA25957@phare.normalesup.org> References: <20110605210428.GA25957@phare.normalesup.org> Message-ID: Hi Gael, Thanks for your input. I am definitely interested in control theory folks, and for more than just that reason. I am on the lookout for such people, but I have yet to match up with someone. Do you have any specific suggestions, by any chance? Best, Rob On Sun, Jun 5, 2011 at 5:04 PM, Gael Varoquaux wrote: > Hi Rob, > > Just a quick suggestion, as I come back from traveling and am unpiling my > e-mail. It seems to me that it would be interesting for your project to > bring in someone from control theory that deals with infering the > behavior of a dynamical system from observations. The reason I suggest > this is that these guys tend to have a fairly different culture then the > standard 'modeling science' --physics, chemistry, computational > neuroscience-- guys. > > My 2 cents, > > Ga?l > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Robert Clewley, Ph.D. Assistant Professor Neuroscience Institute and Department of Mathematics and Statistics Georgia State University PO Box 5030 Atlanta, GA 30302, USA tel: 404-413-6420 fax: 404-413-5446 http://www2.gsu.edu/~matrhc http://neuroscience.gsu.edu/rclewley.html From gael.varoquaux at normalesup.org Mon Jun 6 11:45:55 2011 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Mon, 6 Jun 2011 17:45:55 +0200 Subject: [SciPy-User] Call for collaborators in NSF grant proposal for scientific software for modeling complex dynamical systems with Python In-Reply-To: References: <20110605210428.GA25957@phare.normalesup.org> Message-ID: <20110606154555.GC28093@phare.normalesup.org> On Mon, Jun 06, 2011 at 11:44:08AM -0400, Rob Clewley wrote: > Thanks for your input. I am definitely interested in control theory > folks, and for more than just that reason. I am on the lookout for > such people, but I have yet to match up with someone. Do you have any > specific suggestions, by any chance? No, unfortunately. The only people I know that do control theory are French (and thus not eligible for your call) and don't do Python. They've been looking at neuroscience thought :). Gael From bsouthey at gmail.com Mon Jun 6 14:34:45 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 06 Jun 2011 13:34:45 -0500 Subject: [SciPy-User] scipy.stats one-sided two-sided less, greater, signed ? In-Reply-To: References: Message-ID: <4DED1DC5.8090503@gmail.com> On 06/05/2011 02:43 PM, josef.pktd at gmail.com wrote: > What should be the policy on one-sided versus two-sided? 
Yes :-) > The main reason right now for looking at this is > http://projects.scipy.org/scipy/ticket/1394 which specifies a > "one-sided" alternative and provides both lower and upper tail. That refers to the Fisher's test rather than the more 'traditional' one-sided tests. Each value of the Fisher's test has special meanings about the value or probability of the 'first cell' under the null hypothesis. So it is necessary to provide those three values. > I would prefer that we follow the alternative patterns similar to R > > currently only kstest has alternative : 'two_sided' (default), > 'less' or 'greater' > but this should be added to other tests where it makes sense I think that these Kolmogorov-Smirnov tests are not the traditional meaning either. It is a little mind-boggling to try to think about cdfs! > R fisher.exact > """alternative indicates the alternative hypothesis and must be one > of "two.sided", "greater" or "less". You can specify just the initial > letter. Only used in the 2 by 2 case.""" > > mannwhitneyu reports a one-sided test without actually specifying > which alternative is used (I thought I remembered other cases like > this but don't find any right now) > > related: > in many cases in the two-sided tests the test statistic has a sign > that indicates in which tail the test-statistic falls. > This is useful in ttests for example, because the one-sided tests can > be backed out from the two-sided tests. (With symmetric distributions > one-sided p-value is just half of the two-sided pvalue) > > In the discussion of https://github.com/scipy/scipy/pull/8 I argued > that this might mislead users to interpret a two-sided result as a > one-sided result. However, I doubt now that this is a strong argument > against not reporting the signed test statistic. (I do not follow pull requests so is there a relevant ticket?) > After going through scipy.stats.stats, it looks like we always report > the signed test statistic. > > The test statistic in ks_2samp is in all cases defined as a max value > and doesn't have a sign in R either, so adding a sign there would > break with the standard definition. > one-sided option for ks_2samp would just require to find the > distribution of the test statistics D+, D- > > --- > > So my proposal for the general pattern (with exceptions for special > reasons) would be > > * add/offer alternative : 'two_sided' (default), 'less' or 'greater' > http://projects.scipy.org/scipy/ticket/1394 for now, > and adjustments of existing tests in the future (adding the option can > be mostly done in a backwards compatible way and for symmetric > distributions like ttest it's just a convenience) > mannwhitneyu seems to be the only "weird" one > > * report signed test statistic for two-sided alternative (when a > signed test statistic exists): which is the status quo in > stats.stats, but I didn't know that this is actually pretty consistent > across tests. > > Opinions ? > > Josef > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user I think that there is some valid misunderstanding here (as I was in the same situation) regarding what is meant here. My understanding is that under a one-sided hypothesis, all the values of the null hypothesis only exist in one tail of the test distribution. In contrast the values of null distribution exist in both tails with a two-sided hypothesis. 
Yet that interpretation does not have the same meaning as the tails in the Fisher or Kolmogorov-Smirnov tests. I never paid much attention to the frequency based tests but it does not surprise if there are no one-sided tests. Most are rank-based so it is rather hard to do in a simply manner - actually I am not even sure how to use a permutation test. Bruce From josef.pktd at gmail.com Mon Jun 6 15:34:12 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 6 Jun 2011 15:34:12 -0400 Subject: [SciPy-User] scipy.stats one-sided two-sided less, greater, signed ? In-Reply-To: <4DED1DC5.8090503@gmail.com> References: <4DED1DC5.8090503@gmail.com> Message-ID: On Mon, Jun 6, 2011 at 2:34 PM, Bruce Southey wrote: > On 06/05/2011 02:43 PM, josef.pktd at gmail.com wrote: >> What should be the policy on one-sided versus two-sided? > Yes :-) > >> The main reason right now for looking at this is >> http://projects.scipy.org/scipy/ticket/1394 which specifies a >> "one-sided" alternative and provides both lower and upper tail. > That refers to the Fisher's test rather than the more 'traditional' > one-sided tests. Each value of the Fisher's test has special meanings > about the value or probability of the 'first cell' under the null > hypothesis. ?So it is necessary to provide those three values. > >> I would prefer that we follow the alternative patterns similar to R >> >> currently only kstest has ? ?alternative : 'two_sided' (default), >> 'less' or 'greater' >> but this should be added to other tests where it makes sense > I think that these Kolmogorov-Smirnov ?tests are not the traditional > meaning either. It is a little mind-boggling to try to think about cdfs! > >> R fisher.exact >> """alternative ? ? ? ?indicates the alternative hypothesis and must be one >> of "two.sided", "greater" or "less". You can specify just the initial >> letter. Only used in the 2 by 2 case.""" >> >> mannwhitneyu reports a one-sided test without actually specifying >> which alternative is used ?(I thought I remembered other cases like >> this but don't find any right now) >> >> related: >> in many cases in the two-sided tests the test statistic has a sign >> that indicates in which tail the test-statistic falls. >> This is useful in ttests for example, because the one-sided tests can >> be backed out from the two-sided tests. (With symmetric distributions >> one-sided p-value is just half of the two-sided pvalue) >> >> In the discussion of https://github.com/scipy/scipy/pull/8 ?I argued >> that this might mislead users to interpret a two-sided result as a >> one-sided result. However, I doubt now that this is a strong argument >> against not reporting the signed test statistic. > (I do not follow pull requests so is there a relevant ticket?) > >> After going through scipy.stats.stats, it looks like we always report >> the signed test statistic. >> >> The test statistic in ks_2samp is in all cases defined as a max value >> and doesn't have a sign in R either, so adding a sign there would >> break with the standard definition. 
>> one-sided option for ks_2samp would just require to find the >> distribution of the test statistics D+, D- >> >> --- >> >> So my proposal for the general pattern (with exceptions for special >> reasons) would be >> >> * add/offer alternative : 'two_sided' (default), 'less' or 'greater' >> http://projects.scipy.org/scipy/ticket/1394 ?for now, >> and adjustments of existing tests in the future (adding the option can >> be mostly done in a backwards compatible way and for symmetric >> distributions like ttest it's just a convenience) >> mannwhitneyu seems to be the only "weird" one >> >> * report signed test statistic for two-sided alternative (when a >> signed test statistic exists): ?which is the status quo in >> stats.stats, but I didn't know that this is actually pretty consistent >> across tests. >> >> Opinions ? >> >> Josef >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > I think that there is some valid misunderstanding here (as I was in the > same situation) regarding what is meant here. My understanding is that > under a one-sided hypothesis, all the values of the null hypothesis only > exist in one tail of the test distribution. In contrast the values of > null distribution exist in both tails with a two-sided hypothesis. Yet > that interpretation does not have the same meaning as the tails in the > Fisher or Kolmogorov-Smirnov tests. The tests have a clear Null Hypothesis (equality) and Alternative Hypothesis (not equal or directional, less or greater). So the "alternative" should be clearly specified in the function argument, as in R. Whether this corresponds to left and right tails of the distribution is an "implementation detail" which holds for ttests but not for kstest/ks_2samp. kstest/ks2sample H0: cdf1 == cdf2 and H1: cdf1 != cdf2 or H1: cdf1 < cdf2 or H1: cdf1 > cdf2 (looks similar to comparing two survival curves in Kaplan-Meier ?) fisher_exact (2 by 2) H0: odds-ratio == 1 and H1: odds-ratio != 1 or H1: odds-ratio < 1 or H1: odds-ratio > 1 I know the kolmogorov-smirnov tests, but for fisher exact and contingency tables I rely on R from R-help: For 2 by 2 tables, the null of conditional independence is equivalent to the hypothesis that the odds ratio equals one. <...> The alternative for a one-sided test is based on the odds ratio, so alternative = "greater" is a test of the odds ratio being bigger than or. Two-sided tests are based on the probabilities of the tables, and take as ?more extreme? all tables with probabilities less than or equal to that of the observed table, the p-value being the sum of such probabilities. Josef > > I never paid much attention to the frequency based tests but it does not > surprise if there are no one-sided tests. Most are rank-based so it is > rather hard to do in a simply manner - actually I am not even sure how > to use a permutation test. > > Bruce > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From klonuo at gmail.com Tue Jun 7 05:32:02 2011 From: klonuo at gmail.com (Klonuo Umom) Date: Tue, 07 Jun 2011 11:32:02 +0200 Subject: [SciPy-User] [matplotlib] xgrid on values of x-variable? 
Message-ID: <20110607113201.7B02.B1C76292@gmail.com> Hi, can someone please assist, I speand an hour looking for this feature: I have very simple XY graph, and I want to display X grid only, and only on values of X variable, which are lets say [10, 11, 12, 15, 20] Thanks in advance From scott.sinclair.za at gmail.com Tue Jun 7 06:20:16 2011 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Tue, 7 Jun 2011 12:20:16 +0200 Subject: [SciPy-User] [matplotlib] xgrid on values of x-variable? In-Reply-To: <20110607113201.7B02.B1C76292@gmail.com> References: <20110607113201.7B02.B1C76292@gmail.com> Message-ID: On 7 June 2011 11:32, Klonuo Umom wrote: > I have very simple XY graph, and I want to display X grid only, and only > on values of X variable, which are lets say [10, 11, 12, 15, 20] This is a question for the Matplotlib list (https://lists.sourceforge.net/lists/listinfo/matplotlib-users). In any case, this should do what you want: import numpy as np import matplotlib.pyplot as plt fig = plt.figure() ax = fig.add_subplot(111) ax.plot(range(5)) ticks = [1.2, 2.3, 3.1, 4] ax.xaxis.set_ticks(ticks) ax.xaxis.grid() plt.show() Cheers, Scott From klonuo at gmail.com Tue Jun 7 06:34:11 2011 From: klonuo at gmail.com (Klonuo Umom) Date: Tue, 07 Jun 2011 12:34:11 +0200 Subject: [SciPy-User] [matplotlib] xgrid on values of x-variable? In-Reply-To: References: <20110607113201.7B02.B1C76292@gmail.com> Message-ID: <20110607123409.7B06.B1C76292@gmail.com> On 07.06.2011 12:20:16 Scott Sinclair wrote: > On 7 June 2011 11:32, Klonuo Umom wrote: > > I have very simple XY graph, and I want to display X grid only, and only > > on values of X variable, which are lets say [10, 11, 12, 15, 20] > > This is a question for the Matplotlib list > (https://lists.sourceforge.net/lists/listinfo/matplotlib-users). Yeah I know, sorry I couldn't find how to register on SF mailing list, but nevermind - problem solved :) > > In any case, this should do what you want: > > import numpy as np > import matplotlib.pyplot as plt > > fig = plt.figure() > ax = fig.add_subplot(111) > > ax.plot(range(5)) > > ticks = [1.2, 2.3, 3.1, 4] > ax.xaxis.set_ticks(ticks) > ax.xaxis.grid() > So, ticks dictate the grid... Logically. Although I spent so much time searching the manual and googling... Thanks From kwgoodman at gmail.com Tue Jun 7 11:32:24 2011 From: kwgoodman at gmail.com (Keith Goodman) Date: Tue, 7 Jun 2011 08:32:24 -0700 Subject: [SciPy-User] [job] Python Job at Hedge Fund Message-ID: We are looking for help to predict tomorrow's stock returns. The challenge is model selection in the presence of noisy data. The tools are ubuntu, python, cython, c, numpy, scipy, la, bottleneck, git. A quantitative background and experience or interest in model selection, machine learning, and software development are a plus. This is a full time position in Berkeley, California, two blocks from UC Berkeley. If you are interested send a CV or similar (or questions) to '.'.join(['htiek','scitylanayelekreb at namdoog','moc'][::-1])[::-1] From andrew.maclean at gmail.com Mon Jun 6 18:07:48 2011 From: andrew.maclean at gmail.com (Andrew MacLean) Date: Mon, 6 Jun 2011 15:07:48 -0700 (PDT) Subject: [SciPy-User] Incorrect values from scipy.sparse.linalg.lobpcg using large matrices Message-ID: <13918a53-881a-44df-b75a-68f4576b8721@d1g2000yqe.googlegroups.com> Hi, I have been using lobpcg (scipy.sparse.linalg.lobpcg) to solve symmetric generalized eigenvalue problems with large, sparse stiffness and mass matrices, say 'A' and 'B'. 
The problem is of the form Av = λBV. They are both Hermitian, and 'B' is
positive definite, and both are of size 70 000 x 70 000. 'A' has about 5
million non-zero values and 'B' has about 1.6 million. 'A' has condition
number on the order of 10^9 and 'B' has a condition number on the order of
10^6. I have stored them both as "csc" type sparse matrices from the
scipy.sparse library.

The part of my code using lobpcg is fairly simple:

--------------------------------------------------------------------------------------------------------
from scipy.sparse.linalg import lobpcg
from scipy import rand

X = rand(A.shape[0], 20)

W, V = lobpcg(A, X, B = B, largest = False, maxiter = 40)
-------------------------------------------------------------------------------------------------------

I tested lobpcg on a "scaled down" version of my problem, with 'A' and 'B'
matrices of size 10 000 x 10 000, and got the correct values (using
maxiter = 20), as confirmed by using the "eigs" function in Matlab. I used
it here to find the smallest 10 eigenvalues, and here is a table of my
results, showing that the eigenvalues computed from lobpcg in Python are
very close to those using eigs in Matlab:

https://docs.google.com/leaf?id=0Bz-X2kbPhoUFMTQ0MzM2MGMtNjgwZi00N2U0LTk0YjMtMGM5NzZkODk0NGM1&sort=name&layout=list&num=50

With full sized 'A' and 'B' matrices, I could not get the correct values,
and it became clear that increasing the maximum number of iterations
indefinitely would not work (see graph below). I made a graph for the 20th
smallest eigenvalue vs. the number of iterations. I compared 4 different
guesses for X, 3 of which were just random matrices (as in the code above),
and a 4th orthonormalized one.

https://docs.google.com/leaf?id=0Bz-X2kbPhoUFYTM4OTIxZGQtZmE0Yi00MTMyLWE0MmYtYzUyOTU1Mzg2MzQ3&sort=name&layout=list&num=50

It appears that it will take a very large number of iterations to get the
correct eigenvalues. I also find it strange how starting with an
orthonormalized guess for X does not appear to change anything. As well, I
tested lobpcg by using eigenvectors generated by eigs in Matlab for X, and
lobpcg returned the correct values.

Thanks for any suggestions on this,

Andrew

From pav at iki.fi  Tue Jun  7 11:50:13 2011
From: pav at iki.fi (Pauli Virtanen)
Date: Tue, 7 Jun 2011 15:50:13 +0000 (UTC)
Subject: [SciPy-User] Incorrect values from scipy.sparse.linalg.lobpcg using large matrices
References: <13918a53-881a-44df-b75a-68f4576b8721@d1g2000yqe.googlegroups.com>
Message-ID: 

Mon, 06 Jun 2011 15:07:48 -0700, Andrew MacLean wrote:
> I have been using lobpcg (scipy.sparse.linalg.lobpcg) to solve symmetric
> generalized eigenvalue problems with large, sparse stiffness and mass
> matrices, say 'A' and 'B'. The problem is of the form Av = λBV.

Which version of Scipy? In Scipy 0.9, some bugs in lobpcg that appeared on
certain platforms were fixed. If you are using Scipy < 0.9, it's possible
you are hitting these.

-- 
Pauli Virtanen

From villamil at brandeis.edu  Tue Jun  7 15:20:53 2011
From: villamil at brandeis.edu (villamil)
Date: Tue, 7 Jun 2011 12:20:53 -0700 (PDT)
Subject: [SciPy-User] [SciPy-user] sparse matrices - scipy
Message-ID: <31792885.post@talk.nabble.com>

I just recently started using python a couple of weeks ago, and I have an
application with sparse matrices, so I found I need the Scipy package for this.
So I have a sparse matrix S, and I want to do operations on its rows and columns: -find the count of the nonzero entries in each row S[i,:] -add all the entries in each column S[:,j] Is there a way to do this, or do I need to access all the elements?, Is there one particular format csc, csr, lil, coo, dok for which this is easier? Thank you -- View this message in context: http://old.nabble.com/sparse-matrices---scipy-tp31792885p31792885.html Sent from the Scipy-User mailing list archive at Nabble.com. From ralf.gommers at googlemail.com Tue Jun 7 17:40:15 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 7 Jun 2011 23:40:15 +0200 Subject: [SciPy-User] scipy.stats one-sided two-sided less, greater, signed ? In-Reply-To: References: <4DED1DC5.8090503@gmail.com> Message-ID: On Mon, Jun 6, 2011 at 9:34 PM, wrote: > On Mon, Jun 6, 2011 at 2:34 PM, Bruce Southey wrote: > > On 06/05/2011 02:43 PM, josef.pktd at gmail.com wrote: > >> What should be the policy on one-sided versus two-sided? > > Yes :-) > > > >> The main reason right now for looking at this is > >> http://projects.scipy.org/scipy/ticket/1394 which specifies a > >> "one-sided" alternative and provides both lower and upper tail. > > That refers to the Fisher's test rather than the more 'traditional' > > one-sided tests. Each value of the Fisher's test has special meanings > > about the value or probability of the 'first cell' under the null > > hypothesis. So it is necessary to provide those three values. > > > >> I would prefer that we follow the alternative patterns similar to R > >> > >> currently only kstest has alternative : 'two_sided' (default), > >> 'less' or 'greater' > >> but this should be added to other tests where it makes sense > > I think that these Kolmogorov-Smirnov tests are not the traditional > > meaning either. It is a little mind-boggling to try to think about cdfs! > > > >> R fisher.exact > >> """alternative indicates the alternative hypothesis and must be > one > >> of "two.sided", "greater" or "less". You can specify just the initial > >> letter. Only used in the 2 by 2 case.""" > >> > >> mannwhitneyu reports a one-sided test without actually specifying > >> which alternative is used (I thought I remembered other cases like > >> this but don't find any right now) > >> > >> related: > >> in many cases in the two-sided tests the test statistic has a sign > >> that indicates in which tail the test-statistic falls. > >> This is useful in ttests for example, because the one-sided tests can > >> be backed out from the two-sided tests. (With symmetric distributions > >> one-sided p-value is just half of the two-sided pvalue) > >> > >> In the discussion of https://github.com/scipy/scipy/pull/8 I argued > >> that this might mislead users to interpret a two-sided result as a > >> one-sided result. However, I doubt now that this is a strong argument > >> against not reporting the signed test statistic. > > (I do not follow pull requests so is there a relevant ticket?) > > > >> After going through scipy.stats.stats, it looks like we always report > >> the signed test statistic. > >> > >> The test statistic in ks_2samp is in all cases defined as a max value > >> and doesn't have a sign in R either, so adding a sign there would > >> break with the standard definition. 
> >> one-sided option for ks_2samp would just require to find the > >> distribution of the test statistics D+, D- > >> > >> --- > >> > >> So my proposal for the general pattern (with exceptions for special > >> reasons) would be > >> > >> * add/offer alternative : 'two_sided' (default), 'less' or 'greater' > >> http://projects.scipy.org/scipy/ticket/1394 for now, > >> and adjustments of existing tests in the future (adding the option can > >> be mostly done in a backwards compatible way and for symmetric > >> distributions like ttest it's just a convenience) > >> mannwhitneyu seems to be the only "weird" one > This would actually make the fisher_exact implementation more consistent, since only one p-value is returned in all cases. I just don't like the R naming much; alternative="greater" does not convey to me that this is a one-sided test using the upper tail. How about: test : {"two-tailed", "lower-tail", "upper-tail"} with two-tailed the default? Ralf > >> > >> * report signed test statistic for two-sided alternative (when a > >> signed test statistic exists): which is the status quo in > >> stats.stats, but I didn't know that this is actually pretty consistent > >> across tests. > >> > >> Opinions ? > >> > >> Josef > >> _______________________________________________ > >> SciPy-User mailing list > >> SciPy-User at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-user > > I think that there is some valid misunderstanding here (as I was in the > > same situation) regarding what is meant here. My understanding is that > > under a one-sided hypothesis, all the values of the null hypothesis only > > exist in one tail of the test distribution. In contrast the values of > > null distribution exist in both tails with a two-sided hypothesis. Yet > > that interpretation does not have the same meaning as the tails in the > > Fisher or Kolmogorov-Smirnov tests. > > The tests have a clear Null Hypothesis (equality) and Alternative > Hypothesis (not equal or directional, less or greater). > So the "alternative" should be clearly specified in the function > argument, as in R. > > Whether this corresponds to left and right tails of the distribution > is an "implementation detail" which holds for ttests but not for > kstest/ks_2samp. > > kstest/ks2sample H0: cdf1 == cdf2 and H1: cdf1 != cdf2 or H1: > cdf1 < cdf2 or H1: cdf1 > cdf2 > (looks similar to comparing two survival curves in Kaplan-Meier ?) > > fisher_exact (2 by 2) H0: odds-ratio == 1 and H1: odds-ratio != 1 or > H1: odds-ratio < 1 or H1: odds-ratio > 1 > > I know the kolmogorov-smirnov tests, but for fisher exact and > contingency tables I rely on R > > from R-help: > For 2 by 2 tables, the null of conditional independence is equivalent > to the hypothesis that the odds ratio equals one. <...> The > alternative for a one-sided test is based on the odds ratio, so > alternative = "greater" is a test of the odds ratio being bigger than > or. > Two-sided tests are based on the probabilities of the tables, and take > as ?more extreme? all tables with probabilities less than or equal to > that of the observed table, the p-value being the sum of such > probabilities. > > Josef > > > > > > I never paid much attention to the frequency based tests but it does not > > surprise if there are no one-sided tests. Most are rank-based so it is > > rather hard to do in a simply manner - actually I am not even sure how > > to use a permutation test. 
> > > > Bruce > > > > > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsouthey at gmail.com Tue Jun 7 22:37:58 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Tue, 7 Jun 2011 21:37:58 -0500 Subject: [SciPy-User] scipy.stats one-sided two-sided less, greater, signed ? In-Reply-To: References: <4DED1DC5.8090503@gmail.com> Message-ID: On Tue, Jun 7, 2011 at 4:40 PM, Ralf Gommers wrote: > > > On Mon, Jun 6, 2011 at 9:34 PM, wrote: >> >> On Mon, Jun 6, 2011 at 2:34 PM, Bruce Southey wrote: >> > On 06/05/2011 02:43 PM, josef.pktd at gmail.com wrote: >> >> What should be the policy on one-sided versus two-sided? >> > Yes :-) >> > >> >> The main reason right now for looking at this is >> >> http://projects.scipy.org/scipy/ticket/1394 which specifies a >> >> "one-sided" alternative and provides both lower and upper tail. >> > That refers to the Fisher's test rather than the more 'traditional' >> > one-sided tests. Each value of the Fisher's test has special meanings >> > about the value or probability of the 'first cell' under the null >> > hypothesis. ?So it is necessary to provide those three values. >> > >> >> I would prefer that we follow the alternative patterns similar to R >> >> >> >> currently only kstest has ? ?alternative : 'two_sided' (default), >> >> 'less' or 'greater' >> >> but this should be added to other tests where it makes sense >> > I think that these Kolmogorov-Smirnov ?tests are not the traditional >> > meaning either. It is a little mind-boggling to try to think about cdfs! >> > >> >> R fisher.exact >> >> """alternative ? ? ? ?indicates the alternative hypothesis and must be >> >> one >> >> of "two.sided", "greater" or "less". You can specify just the initial >> >> letter. Only used in the 2 by 2 case.""" >> >> >> >> mannwhitneyu reports a one-sided test without actually specifying >> >> which alternative is used ?(I thought I remembered other cases like >> >> this but don't find any right now) >> >> >> >> related: >> >> in many cases in the two-sided tests the test statistic has a sign >> >> that indicates in which tail the test-statistic falls. >> >> This is useful in ttests for example, because the one-sided tests can >> >> be backed out from the two-sided tests. (With symmetric distributions >> >> one-sided p-value is just half of the two-sided pvalue) >> >> >> >> In the discussion of https://github.com/scipy/scipy/pull/8 ?I argued >> >> that this might mislead users to interpret a two-sided result as a >> >> one-sided result. However, I doubt now that this is a strong argument >> >> against not reporting the signed test statistic. >> > (I do not follow pull requests so is there a relevant ticket?) >> > >> >> After going through scipy.stats.stats, it looks like we always report >> >> the signed test statistic. >> >> >> >> The test statistic in ks_2samp is in all cases defined as a max value >> >> and doesn't have a sign in R either, so adding a sign there would >> >> break with the standard definition. 
>> >> one-sided option for ks_2samp would just require to find the >> >> distribution of the test statistics D+, D- >> >> >> >> --- >> >> >> >> So my proposal for the general pattern (with exceptions for special >> >> reasons) would be >> >> >> >> * add/offer alternative : 'two_sided' (default), 'less' or 'greater' >> >> http://projects.scipy.org/scipy/ticket/1394 ?for now, >> >> and adjustments of existing tests in the future (adding the option can >> >> be mostly done in a backwards compatible way and for symmetric >> >> distributions like ttest it's just a convenience) >> >> mannwhitneyu seems to be the only "weird" one > > This would actually make the fisher_exact implementation more consistent, > since only one p-value is returned in all cases. I just don't like the R > naming much; alternative="greater" does not convey to me that this is a > one-sided test using the upper tail. How about: > ??? test : {"two-tailed", "lower-tail", "upper-tail"} > with two-tailed the default? > > Ralf > > >> >> >> >> >> * report signed test statistic for two-sided alternative (when a >> >> signed test statistic exists): ?which is the status quo in >> >> stats.stats, but I didn't know that this is actually pretty consistent >> >> across tests. >> >> >> >> Opinions ? >> >> >> >> Josef >> >> _______________________________________________ >> >> SciPy-User mailing list >> >> SciPy-User at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > I think that there is some valid misunderstanding here (as I was in the >> > same situation) regarding what is meant here. My understanding is that >> > under a one-sided hypothesis, all the values of the null hypothesis only >> > exist in one tail of the test distribution. In contrast the values of >> > null distribution exist in both tails with a two-sided hypothesis. Yet >> > that interpretation does not have the same meaning as the tails in the >> > Fisher or Kolmogorov-Smirnov tests. >> >> The tests have a clear Null Hypothesis (equality) and Alternative >> Hypothesis (not equal or directional, less or greater). >> So the "alternative" should be clearly specified in the function >> argument, as in R. >> >> Whether this corresponds to left and right tails of the distribution >> is an "implementation detail" which holds for ttests but not for >> kstest/ks_2samp. >> >> kstest/ks2sample ? H0: cdf1 == cdf2 ?and H1: ?cdf1 != cdf2 or H1: >> cdf1 < cdf2 or H1: ?cdf1 > cdf2 >> (looks similar to comparing two survival curves in Kaplan-Meier ?) >> >> fisher_exact (2 by 2) ?H0: odds-ratio == 1 and H1: odds-ratio != 1 or >> H1: odds-ratio < 1 or H1: odds-ratio > 1 >> >> I know the kolmogorov-smirnov tests, but for fisher exact and >> contingency tables I rely on R >> >> from R-help: >> For 2 by 2 tables, the null of conditional independence is equivalent >> to the hypothesis that the odds ratio equals one. <...> The >> alternative for a one-sided test is based on the odds ratio, so >> alternative = "greater" is a test of the odds ratio being bigger than >> or. >> Two-sided tests are based on the probabilities of the tables, and take >> as ?more extreme? all tables with probabilities less than or equal to >> that of the observed table, the p-value being the sum of such >> probabilities. >> >> Josef >> >> >> > >> > I never paid much attention to the frequency based tests but it does not >> > surprise if there are no one-sided tests. 
Most are rank-based so it is >> > rather hard to do in a simply manner - actually I am not even sure how >> > to use a permutation test. >> > >> > Bruce >> > >> > >> > >> > _______________________________________________ >> > SciPy-User mailing list >> > SciPy-User at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > But that is NOT the correct interpretation here! I tried to explain to you that this is the not the usual idea one-sided vs two-sided tests. For example: http://www.msu.edu/~fuw/teaching/Fu_ch10_2_categorical.ppt "The test holds the marginal totals fixed and computes the hypergeometric probability that n11 is at least as large as the observed value" "The output consists of three p-values: Left: Use this when the alternative to independence is that there is negative association between the variables. That is, the observations tend to lie in lower left and upper right. Right: Use this when the alternative to independence is that there is positive association between the variables. That is, the observations tend to lie in upper left and lower right. 2-Tail: Use this when there is no prior alternative. " There is also the book "Categorical data analysis: using the SAS system By Maura E. Stokes, Charles S. Davis, Gary G. Koch" that came up via Google that also refers to the n11 cell. http://www.langsrud.com/fisher.htm "The output consists of three p-values: Left: Use this when the alternative to independence is that there is negative association between the variables. That is, the observations tend to lie in lower left and upper right. Right: Use this when the alternative to independence is that there is positive association between the variables. That is, the observations tend to lie in upper left and lower right. 2-Tail: Use this when there is no prior alternative. NOTE: Decide to use Left, Right or 2-Tail before collecting (or looking at) the data." But you will get a different p-value if you switch rows and columns because of the dependence on the n11 cell. If you do that then the p-values switch between left and right sides as these now refer to different hypotheses regarding that first cell. Bruce From schut at sarvision.nl Wed Jun 8 03:41:32 2011 From: schut at sarvision.nl (Vincent Schut) Date: Wed, 08 Jun 2011 09:41:32 +0200 Subject: [SciPy-User] [job] Python Job at Hedge Fund In-Reply-To: References: Message-ID: On 06/07/2011 05:32 PM, Keith Goodman wrote: > We are looking for help to predict tomorrow's stock returns. > > The challenge is model selection in the presence of noisy data. The > tools are ubuntu, python, cython, c, numpy, scipy, la, bottleneck, > git. > > A quantitative background and experience or interest in model > selection, machine learning, and software development are a plus. > > This is a full time position in Berkeley, California, two blocks from > UC Berkeley. > > If you are interested send a CV or similar (or questions) to > '.'.join(['htiek','scitylanayelekreb at namdoog','moc'][::-1])[::-1] No interest (it's slightly out of my commuting range) nor questions, but this is by far the best email address obfuscation I have seen so far :-) VS. 
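A minimal sketch (not from the thread itself) of the left- and right-sided
p-values discussed above: with the margins of a 2 by 2 table held fixed,
the (1,1) cell follows a hypergeometric distribution, so the one-sided
probabilities can be computed with scipy.stats.hypergeom. The helper name
and the example table are made up for illustration:

from scipy import stats

def fisher_exact_one_sided(table):
    # one-sided Fisher exact p-values for a 2x2 table [[a, b], [c, d]],
    # defined through the (1,1) cell as in the discussion above
    (a, b), (c, d) = table
    M = a + b + c + d          # grand total
    n = a + c                  # column 1 total
    N = a + b                  # row 1 total
    p_left = stats.hypergeom.cdf(a, M, n, N)       # P(X <= a), alternative: odds ratio < 1
    p_right = stats.hypergeom.sf(a - 1, M, n, N)   # P(X >= a), alternative: odds ratio > 1
    return p_left, p_right

print(fisher_exact_one_sided([[10, 10], [5, 24]]))

Swapping rows or columns of the input changes which tail each value refers
to, which is the dependence on the first cell described above.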
From josh.holbrook at gmail.com  Wed Jun  8 04:04:31 2011
From: josh.holbrook at gmail.com (Joshua Holbrook)
Date: Wed, 8 Jun 2011 00:04:31 -0800
Subject: [SciPy-User] [job] Python Job at Hedge Fund
In-Reply-To: 
References: 
Message-ID: 

My comment:

>> This is a full time position in Berkeley, California, two blocks from
>> UC Berkeley.

I'm moving quite close to there actually (N. Oakland)! If I didn't already
have commitments I'd apply. Heck, if things don't work out maybe I'll send
you my CV anyway. ;)

>> >> If you are interested send a CV or similar (or questions) to
>> >> '.'.join(['htiek','scitylanayelekreb at namdoog','moc'][::-1])[::-1]
>>
>> No interest (it's slightly out of my commuting range) nor questions, but
>> this is by far the best email address obfuscation I have seen so far :-)

I agree with this. Very clever! Best of luck!

--Josh

From meesters at aesku.com  Wed Jun  8 05:58:46 2011
From: meesters at aesku.com (Meesters, Christian)
Date: Wed, 8 Jun 2011 09:58:46 +0000
Subject: [SciPy-User] curve fitting with fixed parameters
Message-ID: <8E882955B5BEA54BA86AB84407D7BBE3048385@AESKU-EXCH01.AESKU.local>

Hi,

Recently I started a thread "curve_fit - fitting a sum of 'functions'".
Thanks for all the ideas: I am working to get proper weights for the actual
function I would like to fit.

Along the road I stumbled on yet another problem: Perhaps the wording in the
subject line is a bit sloppy. However, I would like to fit a rather complex
function and actually the problem would be underdetermined, but luckily I
have known parameters. Well, I could put them in the function to fit using
the global keyword, but that seems a bit awkward ...

Is there a way to set some parameters of a fit as 'fixed', say with
scipy.optimize.curve_fit or scipy.optimize.leastsq? (If I put a particular
known parameter in p0 of curve_fit, the function ends up in a false local
minimum. Only if I hard-code that parameter within the function to fit do I
get the correct result, but this parameter is different from dataset to
dataset.)

Any ideas / pointers for me?

Christian
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From JRadinger at gmx.at  Wed Jun  8 06:52:17 2011
From: JRadinger at gmx.at (Johannes Radinger)
Date: Wed, 08 Jun 2011 12:52:17 +0200
Subject: [SciPy-User] How to fit a curve/function?
In-Reply-To: 
References: <20110607113201.7B02.B1C76292@gmail.com>
Message-ID: <20110608105217.222500@gmx.net>

Hello,

I've got the following function describing a generic animal dispersal kernel:

def pdf(x,s1,s2):
    return (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(s2*math.sqrt(2*math.pi))*numpy.exp(-((x-0)**(2)/(2*s2**(2)))))

On the other hand I've got data from literature with which I want to fit
the function so that I get s1, s2 and x.
Usually the data in the literature are as follows:

Example 1: 50% of the animals are between -270m and +270m and 90% are
between -500m and +500m

Example 2: 84% are between -5000m and +5000m, and 73% are between -1000m
and +1000m

As far as I understand, an integration of the function is needed to solve
for s1 and s2, as all the literature data give percentages (area under the
curve). Can that be used to fit the curve, or can that at least give ranges
for s1 and s2?

/Johannes

--
NEU: FreePhone - kostenlos mobil telefonieren!
Jetzt informieren: http://www.gmx.net/de/go/freephone From josef.pktd at gmail.com Wed Jun 8 06:56:42 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 8 Jun 2011 06:56:42 -0400 Subject: [SciPy-User] scipy.stats one-sided two-sided less, greater, signed ? In-Reply-To: References: <4DED1DC5.8090503@gmail.com> Message-ID: On Tue, Jun 7, 2011 at 10:37 PM, Bruce Southey wrote: > On Tue, Jun 7, 2011 at 4:40 PM, Ralf Gommers > wrote: >> >> >> On Mon, Jun 6, 2011 at 9:34 PM, wrote: >>> >>> On Mon, Jun 6, 2011 at 2:34 PM, Bruce Southey wrote: >>> > On 06/05/2011 02:43 PM, josef.pktd at gmail.com wrote: >>> >> What should be the policy on one-sided versus two-sided? >>> > Yes :-) >>> > >>> >> The main reason right now for looking at this is >>> >> http://projects.scipy.org/scipy/ticket/1394 which specifies a >>> >> "one-sided" alternative and provides both lower and upper tail. >>> > That refers to the Fisher's test rather than the more 'traditional' >>> > one-sided tests. Each value of the Fisher's test has special meanings >>> > about the value or probability of the 'first cell' under the null >>> > hypothesis. ?So it is necessary to provide those three values. >>> > >>> >> I would prefer that we follow the alternative patterns similar to R >>> >> >>> >> currently only kstest has ? ?alternative : 'two_sided' (default), >>> >> 'less' or 'greater' >>> >> but this should be added to other tests where it makes sense >>> > I think that these Kolmogorov-Smirnov ?tests are not the traditional >>> > meaning either. It is a little mind-boggling to try to think about cdfs! >>> > >>> >> R fisher.exact >>> >> """alternative ? ? ? ?indicates the alternative hypothesis and must be >>> >> one >>> >> of "two.sided", "greater" or "less". You can specify just the initial >>> >> letter. Only used in the 2 by 2 case.""" >>> >> >>> >> mannwhitneyu reports a one-sided test without actually specifying >>> >> which alternative is used ?(I thought I remembered other cases like >>> >> this but don't find any right now) >>> >> >>> >> related: >>> >> in many cases in the two-sided tests the test statistic has a sign >>> >> that indicates in which tail the test-statistic falls. >>> >> This is useful in ttests for example, because the one-sided tests can >>> >> be backed out from the two-sided tests. (With symmetric distributions >>> >> one-sided p-value is just half of the two-sided pvalue) >>> >> >>> >> In the discussion of https://github.com/scipy/scipy/pull/8 ?I argued >>> >> that this might mislead users to interpret a two-sided result as a >>> >> one-sided result. However, I doubt now that this is a strong argument >>> >> against not reporting the signed test statistic. >>> > (I do not follow pull requests so is there a relevant ticket?) >>> > >>> >> After going through scipy.stats.stats, it looks like we always report >>> >> the signed test statistic. >>> >> >>> >> The test statistic in ks_2samp is in all cases defined as a max value >>> >> and doesn't have a sign in R either, so adding a sign there would >>> >> break with the standard definition. 
>>> >> one-sided option for ks_2samp would just require to find the >>> >> distribution of the test statistics D+, D- >>> >> >>> >> --- >>> >> >>> >> So my proposal for the general pattern (with exceptions for special >>> >> reasons) would be >>> >> >>> >> * add/offer alternative : 'two_sided' (default), 'less' or 'greater' >>> >> http://projects.scipy.org/scipy/ticket/1394 ?for now, >>> >> and adjustments of existing tests in the future (adding the option can >>> >> be mostly done in a backwards compatible way and for symmetric >>> >> distributions like ttest it's just a convenience) >>> >> mannwhitneyu seems to be the only "weird" one >> >> This would actually make the fisher_exact implementation more consistent, >> since only one p-value is returned in all cases. I just don't like the R >> naming much; alternative="greater" does not convey to me that this is a >> one-sided test using the upper tail. How about: >> ??? test : {"two-tailed", "lower-tail", "upper-tail"} >> with two-tailed the default? I think matlab uses (in general) larger and smaller, the advantage of less/smaller and greater/larger is that it directly refers to the alternative hypothesis, while the meaning in terms of tails is not always clear (in kstest and I guess some others the test statistics is just reversed and uses the same tail in both cases) so greater smaller is mostly "future proof" across tests, while reference to the tail can only be used where this is an unambiguous statement. but see below >> >> Ralf >> >> >>> >>> >> >>> >> * report signed test statistic for two-sided alternative (when a >>> >> signed test statistic exists): ?which is the status quo in >>> >> stats.stats, but I didn't know that this is actually pretty consistent >>> >> across tests. >>> >> >>> >> Opinions ? >>> >> >>> >> Josef >>> >> _______________________________________________ >>> >> SciPy-User mailing list >>> >> SciPy-User at scipy.org >>> >> http://mail.scipy.org/mailman/listinfo/scipy-user >>> > I think that there is some valid misunderstanding here (as I was in the >>> > same situation) regarding what is meant here. My understanding is that >>> > under a one-sided hypothesis, all the values of the null hypothesis only >>> > exist in one tail of the test distribution. In contrast the values of >>> > null distribution exist in both tails with a two-sided hypothesis. Yet >>> > that interpretation does not have the same meaning as the tails in the >>> > Fisher or Kolmogorov-Smirnov tests. >>> >>> The tests have a clear Null Hypothesis (equality) and Alternative >>> Hypothesis (not equal or directional, less or greater). >>> So the "alternative" should be clearly specified in the function >>> argument, as in R. >>> >>> Whether this corresponds to left and right tails of the distribution >>> is an "implementation detail" which holds for ttests but not for >>> kstest/ks_2samp. >>> >>> kstest/ks2sample ? H0: cdf1 == cdf2 ?and H1: ?cdf1 != cdf2 or H1: >>> cdf1 < cdf2 or H1: ?cdf1 > cdf2 >>> (looks similar to comparing two survival curves in Kaplan-Meier ?) >>> >>> fisher_exact (2 by 2) ?H0: odds-ratio == 1 and H1: odds-ratio != 1 or >>> H1: odds-ratio < 1 or H1: odds-ratio > 1 >>> >>> I know the kolmogorov-smirnov tests, but for fisher exact and >>> contingency tables I rely on R >>> >>> from R-help: >>> For 2 by 2 tables, the null of conditional independence is equivalent >>> to the hypothesis that the odds ratio equals one. 
<...> The >>> alternative for a one-sided test is based on the odds ratio, so >>> alternative = "greater" is a test of the odds ratio being bigger than >>> or. >>> Two-sided tests are based on the probabilities of the tables, and take >>> as ?more extreme? all tables with probabilities less than or equal to >>> that of the observed table, the p-value being the sum of such >>> probabilities. >>> >>> Josef >>> >>> >>> > >>> > I never paid much attention to the frequency based tests but it does not >>> > surprise if there are no one-sided tests. Most are rank-based so it is >>> > rather hard to do in a simply manner - actually I am not even sure how >>> > to use a permutation test. >>> > >>> > Bruce >>> > >>> > >>> > >>> > _______________________________________________ >>> > SciPy-User mailing list >>> > SciPy-User at scipy.org >>> > http://mail.scipy.org/mailman/listinfo/scipy-user >>> > >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > But that is NOT the correct interpretation ?here! > I tried to explain to you that this is the not the usual idea > one-sided vs two-sided tests. > For example: > http://www.msu.edu/~fuw/teaching/Fu_ch10_2_categorical.ppt > "The test holds the marginal totals fixed and computes the > hypergeometric probability that n11 is at least as large as the > observed value" this still sounds like a less/greater test to me > "The output consists of three p-values: > Left: Use this when the alternative to independence is that there is > negative association between the variables. ?That is, the observations > tend to lie in lower left and upper right. > Right: Use this when the alternative to independence is that there is > positive association between the variables. That is, the observations > tend to lie in upper left and lower right. > 2-Tail: Use this when there is no prior alternative. > " > There is also the book "Categorical data analysis: using the SAS > system ?By Maura E. Stokes, Charles S. Davis, Gary G. Koch" that came > up via Google that also refers to the n11 cell. > > http://www.langsrud.com/fisher.htm I was trying to read the Agresti paper referenced there but it has too much detail to get through in 15 minutes :) > "The output consists of three p-values: > > ? ?Left: Use this when the alternative to independence is that there > is negative association between the variables. > ? ?That is, the observations tend to lie in lower left and upper right. > ? ?Right: Use this when the alternative to independence is that there > is positive association between the variables. > ? ?That is, the observations tend to lie in upper left and lower right. > ? ?2-Tail: Use this when there is no prior alternative. > > NOTE: Decide to use Left, Right or 2-Tail before collecting (or > looking at) the data." > > But you will get a different p-value if you switch rows and columns > because of the dependence on the n11 cell. If you do that then the > p-values switch between left and right sides as these now refer to > different hypotheses regarding that first cell. switching row and columns doesn't change the p-value in R reversing columns changes the definition of less and greater, reverses them The problem with 2 by 2 contingency tables with given marginals, i.e. 
row and column totals, is that we only have one free entry. Any test on one entry, e.g. element 0,0, pins down all the other ones and (many) tests then become equivalent. http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_freq_a0000000658.htm some math got lost """ For <2 by 2> tables, one-sided -values for Fisher?s exact test are defined in terms of the frequency of the cell in the first row and first column of the table, the (1,1) cell. Denoting the observed (1,1) cell frequency by , the left-sided -value for Fisher?s exact test is the probability that the (1,1) cell frequency is less than or equal to . For the left-sided -value, the set includes those tables with a (1,1) cell frequency less than or equal to . A small left-sided -value supports the alternative hypothesis that the probability of an observation being in the first cell is actually less than expected under the null hypothesis of independent row and column variables. Similarly, for a right-sided alternative hypothesis, is the set of tables where the frequency of the (1,1) cell is greater than or equal to that in the observed table. A small right-sided -value supports the alternative that the probability of the first cell is actually greater than that expected under the null hypothesis. Because the (1,1) cell frequency completely determines the table when the marginal row and column sums are fixed, these one-sided alternatives can be stated equivalently in terms of other cell probabilities or ratios of cell probabilities. The left-sided alternative is equivalent to an odds ratio less than 1, where the odds ratio equals (). Additionally, the left-sided alternative is equivalent to the column 1 risk for row 1 being less than the column 1 risk for row 2, . Similarly, the right-sided alternative is equivalent to the column 1 risk for row 1 being greater than the column 1 risk for row 2, . See Agresti (2007) for details. R C Tables """ I'm not a user of Fisher's exact test (and I have a hard time keeping the different statements straight), so if left/right or lower/upper makes more sense to users, then I don't complain. To me they are all just independence tests with possible one-sided alternatives that one distribution dominates the other. (with the same pattern as ks_2samp or ttest_2samp) Josef > > > Bruce > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Wed Jun 8 07:00:04 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 8 Jun 2011 07:00:04 -0400 Subject: [SciPy-User] curve fitting with fixed parameters In-Reply-To: <8E882955B5BEA54BA86AB84407D7BBE3048385@AESKU-EXCH01.AESKU.local> References: <8E882955B5BEA54BA86AB84407D7BBE3048385@AESKU-EXCH01.AESKU.local> Message-ID: On Wed, Jun 8, 2011 at 5:58 AM, Meesters, Christian wrote: > Hi, > > Recently I started a thread "curve_fit - fitting a sum of 'functions'". > Thanks for all the ideas: I am working to get proper weights for the actual > function I would like to fit. > > Along the road I stumbled on yet another problem: Perhaps the wording in the > subject line is a bit sloppy. However, I would like to fit a rather complex > function and actually the problem would be underdetermined, but luckily I > have known parameters. Well, I could put them in the function to fit using > the global keyword, but that seems a bit awkward ... 
> > Is there a way to set some parameters of a fit as 'fixed', say with > scipy.optimize.curve_fit or scipy.optimize.leastsq? (If I put a particular > known parameter in p0 of curve_fit, the function ends up in a falls local > minimum. Only if a hard code that parameter in within the function to fit I > get the correct result, but this parameter needs is different from dataset > to dataset.) > > Any ideas / pointers for me? write a nested function or partial function or a class that fixes the given parameter in the outer/class scope and maximize the function that has the value fixed. Josef > > Christian > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From meesters at aesku.com Wed Jun 8 07:07:18 2011 From: meesters at aesku.com (Meesters, Christian) Date: Wed, 8 Jun 2011 11:07:18 +0000 Subject: [SciPy-User] curve fitting with fixed parameters In-Reply-To: References: <8E882955B5BEA54BA86AB84407D7BBE3048385@AESKU-EXCH01.AESKU.local>, Message-ID: <8E882955B5BEA54BA86AB84407D7BBE304A401@AESKU-EXCH01.AESKU.local> > Any ideas / pointers for me? > write a nested function or partial function or a class that fixes the > given parameter in the outer/class scope and maximize the function > that has the value fixed. > > Josef Of course! Thank you. Christian From josef.pktd at gmail.com Wed Jun 8 07:10:38 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 8 Jun 2011 07:10:38 -0400 Subject: [SciPy-User] How to fit a curve/function? In-Reply-To: <20110608105217.222500@gmx.net> References: <20110607113201.7B02.B1C76292@gmail.com> <20110608105217.222500@gmx.net> Message-ID: On Wed, Jun 8, 2011 at 6:52 AM, Johannes Radinger wrote: > Hello, > > I've got following function describing any kind of animal dispersal kernel: > > def pdf(x,s1,s2): > ? ?return (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(s2*math.sqrt(2*math.pi))*numpy.exp(-((x-0)**(2)/(2*s2**(2))))) > > On the other hand I've got data from literature with which I want to fit the function so that I get s1, s2 and x. > Ususally the data in the literature are as follows: > > Example 1: 50% of the animals are between -270m and +270m and 90% ?are between -500m and + 500m > > Example 2: 84% is between - 5000 m and +5000m, and 73% are between -1000m and +1000m > > So far as I understand an integration of the function is needed to solve for s1 and s2 as all the literature data give percentage (area under the curve) Can that be used to fit the curve or can that create ranges for s1 and s2. I don't see a way around integration. If you have exactly 2 probabilities, then you can you a solver like scipy.optimize.fsolve to match the probabilites eg. 0.5 = integral pdf from -270 to 270 0.9 = integral pdf from -500 to 500 If you have more than 2 probabilities, then using optimization of a weighted function of the moment conditions would be better. Josef > > /Johannes > > -- > NEU: FreePhone - kostenlos mobil telefonieren! > Jetzt informieren: http://www.gmx.net/de/go/freephone > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From JRadinger at gmx.at Wed Jun 8 07:21:25 2011 From: JRadinger at gmx.at (Johannes Radinger) Date: Wed, 08 Jun 2011 13:21:25 +0200 Subject: [SciPy-User] How to fit a curve/function? 
In-Reply-To: 
References: <20110607113201.7B02.B1C76292@gmail.com> <20110608105217.222500@gmx.net>
Message-ID: <20110608112125.200760@gmx.net>

-------- Original-Nachricht --------
> Datum: Wed, 8 Jun 2011 07:10:38 -0400
> Von: josef.pktd at gmail.com
> An: SciPy Users List 
> Betreff: Re: [SciPy-User] How to fit a curve/function?

> On Wed, Jun 8, 2011 at 6:52 AM, Johannes Radinger 
> wrote:
> > Hello,
> >
> > I've got following function describing any kind of animal dispersal
> kernel:
> >
> > def pdf(x,s1,s2):
> >    return
> (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(s2*math.sqrt(2*math.pi))*numpy.exp(-((x-0)**(2)/(2*s2**(2)))))
> >
> > On the other hand I've got data from literature with which I want to fit
> the function so that I get s1, s2 and x.
> > Ususally the data in the literature are as follows:
> >
> > Example 1: 50% of the animals are between -270m and +270m and 90% are
> between -500m and + 500m
> >
> > Example 2: 84% is between - 5000 m and +5000m, and 73% are between
> -1000m and +1000m
> >
> > So far as I understand an integration of the function is needed to solve
> for s1 and s2 as all the literature data give percentage (area under the
> curve) Can that be used to fit the curve or can that create ranges for s1
> and s2.
>
> I don't see a way around integration.
>
> If you have exactly 2 probabilities, then you can you a solver like
> scipy.optimize.fsolve to match the probabilites
> eg.
> 0.5 = integral pdf from -270 to 270
> 0.9 = integral pdf from -500 to 500
>
> If you have more than 2 probabilities, then using optimization of a
> weighted function of the moment conditions would be better.
>
> Josef

Thank you for that point... just a simple question: In the case of 2
probabilities is it possible to solve for 3 parameters (s1, s2 and p)? Is
there a way to do that as well?

/Johannes

> >
> > /Johannes
> >
> > --
> > NEU: FreePhone - kostenlos mobil telefonieren!
> > Jetzt informieren: http://www.gmx.net/de/go/freephone
> > _______________________________________________
> > SciPy-User mailing list
> > SciPy-User at scipy.org
> > http://mail.scipy.org/mailman/listinfo/scipy-user
> >
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user

-- 
NEU: FreePhone - kostenlos mobil telefonieren!
Jetzt informieren: http://www.gmx.net/de/go/freephone
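A minimal sketch (not from the thread) of the fsolve approach Josef
describes, written against the kernel above. It uses the percentages from
Example 2 and fixes the mixing weight p beforehand, since two conditions
can only pin down two of the three parameters; the value p = 0.75 and the
starting guesses are arbitrary choices for illustration:

import numpy as np
from scipy import integrate, optimize

p = 0.75   # held fixed: two conditions cannot determine all of s1, s2 and p

def pdf(x, s1, s2):
    # the same two-component Gaussian kernel as above, written with numpy only
    return (p / np.sqrt(2 * np.pi * s1**2) * np.exp(-x**2 / (2 * s1**2))
            + (1 - p) / (s2 * np.sqrt(2 * np.pi)) * np.exp(-x**2 / (2 * s2**2)))

def conditions(params):
    s1, s2 = params
    prob1, _ = integrate.quad(pdf, -1000, 1000, args=(s1, s2))
    prob2, _ = integrate.quad(pdf, -5000, 5000, args=(s1, s2))
    # Example 2: 73% within +-1000 m, 84% within +-5000 m
    return [prob1 - 0.73, prob2 - 0.84]

s1, s2 = optimize.fsolve(conditions, x0=[500.0, 8000.0])
print(s1, s2)

With more than two reported percentages, the same residuals could instead be
stacked and minimized in a weighted least-squares sense, as Josef suggests.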
From josef.pktd at gmail.com  Wed Jun  8 07:34:01 2011
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Wed, 8 Jun 2011 07:34:01 -0400
Subject: [SciPy-User] How to fit a curve/function?
In-Reply-To: <20110608112125.200760@gmx.net>
References: <20110607113201.7B02.B1C76292@gmail.com> <20110608105217.222500@gmx.net> <20110608112125.200760@gmx.net>
Message-ID: 

On Wed, Jun 8, 2011 at 7:21 AM, Johannes Radinger  wrote:
>
> -------- Original-Nachricht --------
>> Datum: Wed, 8 Jun 2011 07:10:38 -0400
>> Von: josef.pktd at gmail.com
>> An: SciPy Users List 
>> Betreff: Re: [SciPy-User] How to fit a curve/function?
>
>> On Wed, Jun 8, 2011 at 6:52 AM, Johannes Radinger 
>> wrote:
>> > Hello,
>> >
>> > I've got following function describing any kind of animal dispersal
>> kernel:
>> >
>> > def pdf(x,s1,s2):
>> >    return
>> (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(s2*math.sqrt(2*math.pi))*numpy.exp(-((x-0)**(2)/(2*s2**(2)))))
>> >
>> > On the other hand I've got data from literature with which I want to fit
>> the function so that I get s1, s2 and x.
>> > Ususally the data in the literature are as follows:
>> >
>> > Example 1: 50% of the animals are between -270m and +270m and 90% are
>> between -500m and + 500m
>> >
>> > Example 2: 84% is between - 5000 m and +5000m, and 73% are between
>> -1000m and +1000m
>> >
>> > So far as I understand an integration of the function is needed to solve
>> for s1 and s2 as all the literature data give percentage (area under the
>> curve) Can that be used to fit the curve or can that create ranges for s1
>> and s2.
>>
>> I don't see a way around integration.
>>
>> If you have exactly 2 probabilities, then you can you a solver like
>> scipy.optimize.fsolve to match the probabilites
>> eg.
>> 0.5 = integral pdf from -270 to 270 >> 0.9 = integral pdf from -500 to 500 >> >> If you have more than 2 probabilities, then using optimization of a >> weighted function of the moment conditions would be better. >> >> Josef > > Thank you for that point... just a simple question: In the case of 2 probabilities is it possible to solve for 3 parameters (s1, s2 and p)? Is there a way to do that as well? No, not in general, with 3 parameters and only two conditions you can pin down only 2 parameters. The third parameters can be picked arbitrarily (or using some prior), but it might not make sense. Josef > > /Johannes > >> >> > >> > /Johannes >> > >> > -- >> > NEU: FreePhone - kostenlos mobil telefonieren! >> > Jetzt informieren: http://www.gmx.net/de/go/freephone >> > _______________________________________________ >> > SciPy-User mailing list >> > SciPy-User at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > -- > NEU: FreePhone - kostenlos mobil telefonieren! > Jetzt informieren: http://www.gmx.net/de/go/freephone > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From ckkart at hoc.net Wed Jun 8 07:38:19 2011 From: ckkart at hoc.net (Christian K.) Date: Wed, 8 Jun 2011 11:38:19 +0000 (UTC) Subject: [SciPy-User] curve fitting with fixed parameters References: <8E882955B5BEA54BA86AB84407D7BBE3048385@AESKU-EXCH01.AESKU.local> Message-ID: Meesters, Christian aesku.com> writes: > > Hi, > Recently I started a thread "curve_fit - fitting a sum of 'functions'". Thanks for all the ideas: I am working to get proper weights for the actual function I would like to fit. Have a look at peak-o-mat (http://lorentz.sourceforge.net). It is an interactive fitting program, written in python/wxpython. Fitting is done using scipy.odr. You may speciy weights, mark parameters as fixed, etc. Any python expression may be used as model. The documentation is sparse and the latest file release is quite old, so better use the source on svn. Regards, Christian From villamil at brandeis.edu Tue Jun 7 11:50:44 2011 From: villamil at brandeis.edu (villamil) Date: Tue, 7 Jun 2011 08:50:44 -0700 (PDT) Subject: [SciPy-User] [SciPy-user] sparse matrices - scipy Message-ID: <31792885.post@talk.nabble.com> I just recently started using python a couple of weeks ago, and I have an application with sparse matrices, so I found I need the Scipy package for this. So I have a sparse matrix S, and I want to do operations on its rows and columns: -find the count of the nonzero entries in each row S[i,:] -add all the entries in each column S[:,j] Is there a way to do this, or do I need to access all the elements?, Is there one particular format csc, csr, lil, coo, dok for which this is easier? Thank you -- View this message in context: http://old.nabble.com/sparse-matrices---scipy-tp31792885p31792885.html Sent from the Scipy-User mailing list archive at Nabble.com. From phubaba at gmail.com Tue Jun 7 12:53:19 2011 From: phubaba at gmail.com (phubaba) Date: Tue, 7 Jun 2011 09:53:19 -0700 (PDT) Subject: [SciPy-User] [SciPy-user] fast small matrix multiplication with cython? 
In-Reply-To: References: Message-ID: <31793732.post@talk.nabble.com> Hello Skipper, is there any chance you could explain the fast recursion algorithm or supply the cython code you used to implement it? Thanks, Rob jseabold wrote: > > On Thu, Dec 9, 2010 at 4:33 PM, Skipper Seabold > wrote: >> On Wed, Dec 8, 2010 at 11:28 PM, ? wrote: >>>> >>>> It looks like I don't save too much time with just Python/scipy >>>> optimizations. ?Apparently ~75% of the time is spent in l-bfgs-b, >>>> judging by its user time output and the profiler's CPU time output(?). >>>> ?Non-cython versions: >>>> >>>> Brief and rough profiling on my laptop for ARMA(2,2) with 1000 >>>> observations. ?Optimization uses fmin_l_bfgs_b with m = 12 and iprint >>>> = 0. >>> >>> Completely different idea: How costly are the numerical derivatives in >>> l-bfgs-b? >>> With l-bfgs-b, you should be able to replace the derivatives with the >>> complex step derivatives that calculate the loglike function value and >>> the derivatives in one iteration. >>> >> >> I couldn't figure out how to use it without some hacks. ?The >> fmin_l_bfgs_b will call both f and fprime as (x, *args), but >> approx_fprime or approx_fprime_cs need actually approx_fprime(x, func, >> args=args) and call func(x, *args). ?I changed fmin_l_bfgs_b to make >> the call like this for the gradient, and I get (different computer) >> >> >> Using approx_fprime_cs >> ----------------------------------- >> ? ? ? ? 861609 function calls (861525 primitive calls) in 3.337 CPU >> seconds >> >> ? Ordered by: internal time >> >> ? ncalls ?tottime ?percall ?cumtime ?percall filename:lineno(function) >> ? ? ? 70 ? ?1.942 ? ?0.028 ? ?3.213 ? ?0.046 kalmanf.py:504(loglike) >> ? 840296 ? ?1.229 ? ?0.000 ? ?1.229 ? ?0.000 {numpy.core._dotblas.dot} >> ? ? ? 56 ? ?0.038 ? ?0.001 ? ?0.038 ? ?0.001 >> {numpy.linalg.lapack_lite.zgesv} >> ? ? ?270 ? ?0.025 ? ?0.000 ? ?0.025 ? ?0.000 {sum} >> ? ? ? 90 ? ?0.019 ? ?0.000 ? ?0.019 ? ?0.000 >> {numpy.linalg.lapack_lite.dgesdd} >> ? ? ? 46 ? ?0.013 ? ?0.000 ? ?0.014 ? ?0.000 >> function_base.py:494(asarray_chkfinite) >> ? ? ?162 ? ?0.012 ? ?0.000 ? ?0.014 ? ?0.000 arima.py:117(_transparams) >> >> >> Using approx_grad = True >> --------------------------------------- >> ? ? ? ? 1097454 function calls (1097370 primitive calls) in 3.615 CPU >> seconds >> >> ? Ordered by: internal time >> >> ? ncalls ?tottime ?percall ?cumtime ?percall filename:lineno(function) >> ? ? ? 90 ? ?2.316 ? ?0.026 ? ?3.489 ? ?0.039 kalmanf.py:504(loglike) >> ?1073757 ? ?1.164 ? ?0.000 ? ?1.164 ? ?0.000 {numpy.core._dotblas.dot} >> ? ? ?270 ? ?0.025 ? ?0.000 ? ?0.025 ? ?0.000 {sum} >> ? ? ? 90 ? ?0.020 ? ?0.000 ? ?0.020 ? ?0.000 >> {numpy.linalg.lapack_lite.dgesdd} >> ? ? ?182 ? ?0.014 ? ?0.000 ? ?0.016 ? ?0.000 arima.py:117(_transparams) >> ? ? ? 46 ? ?0.013 ? ?0.000 ? ?0.014 ? ?0.000 >> function_base.py:494(asarray_chkfinite) >> ? ? ? 46 ? ?0.008 ? ?0.000 ? ?0.023 ? ?0.000 decomp_svd.py:12(svd) >> ? ? ? 23 ? ?0.004 ? ?0.000 ? ?0.004 ? ?0.000 {method 'var' of >> 'numpy.ndarray' objects} >> >> >> Definitely less function calls and a little faster, but I had to write >> some hacks to get it to work. >> > > This is more like it! 
With fast recursions in Cython: > > 15186 function calls (15102 primitive calls) in 0.750 CPU seconds > > Ordered by: internal time > > ncalls tottime percall cumtime percall filename:lineno(function) > 18 0.622 0.035 0.625 0.035 > kalman_loglike.pyx:15(kalman_loglike) > 270 0.024 0.000 0.024 0.000 {sum} > 90 0.019 0.000 0.019 0.000 > {numpy.linalg.lapack_lite.dgesdd} > 156 0.013 0.000 0.013 0.000 {numpy.core._dotblas.dot} > 46 0.013 0.000 0.014 0.000 > function_base.py:494(asarray_chkfinite) > 110 0.008 0.000 0.010 0.000 arima.py:118(_transparams) > 46 0.008 0.000 0.023 0.000 decomp_svd.py:12(svd) > 23 0.004 0.000 0.004 0.000 {method 'var' of > 'numpy.ndarray' objects} > 26 0.004 0.000 0.004 0.000 tsatools.py:109(lagmat) > 90 0.004 0.000 0.042 0.000 arima.py:197(loglike_css) > 81 0.004 0.000 0.004 0.000 > {numpy.core.multiarray._fastCopyAndTranspose} > > I can live with this for now. > > Skipper > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- View this message in context: http://old.nabble.com/fast-small-matrix-multiplication-with-cython--tp30391922p31793732.html Sent from the Scipy-User mailing list archive at Nabble.com. From pholvey at gmail.com Tue Jun 7 15:23:46 2011 From: pholvey at gmail.com (Patrick Holvey) Date: Tue, 7 Jun 2011 15:23:46 -0400 Subject: [SciPy-User] optimize.fmin_cg and optimize.fmin_bgfs optimize to 0 Message-ID: Good afternoon everyone, I've got an analytic gradient for an energy potential that I'm using to minimize the energy for atom positions (the Keating potential on an system of silica atoms). I'd previously used fmin_cg without an analytical gradient (the gradient was estimated) but, if I'm going to be looking at larger systems, estimated gradients slow the optimization to a crawl. So I've found and coded the analytic gradient. However, when I use the gradient in the optimization, all of the atom positions shoot right to the origin (so they're all at 0,0,0) after just 2 function calls and 1 gradient call, which seems very odd to me. So I tried fmin_bgfs with the gradient and the same thing happened. Does anyone have any experience with analytic gradients where this has happened to them? I'm confused as to whether the problem is in my gradient implementation or in how I'm passing the gradient or what. For your reference, I've included my current implementation of the WWW algorithm for silica to this email. Any and all help is always appreciated, as I've been stuck on this for far too long. I'm still learning the finer points of python programming (I wasn't a comp sci major in undergrad) so any general pointers are also appreciated. Thanks so much, Most sincerely, Patrick Holvey -- Patrick Holvey Graduate Student Dept. of Materials Science and Engineering Johns Hopkins University pholvey1 at jhu.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: autosimplewwwV4.py Type: text/x-python Size: 27377 bytes Desc: not available URL: From andrew.maclean at gmail.com Tue Jun 7 18:18:47 2011 From: andrew.maclean at gmail.com (Andrew MacLean) Date: Tue, 7 Jun 2011 15:18:47 -0700 (PDT) Subject: [SciPy-User] Incorrect values from scipy.sparse.linalg.lobpcg using large matrices In-Reply-To: References: <13918a53-881a-44df-b75a-68f4576b8721@d1g2000yqe.googlegroups.com> Message-ID: <7473d1be-62b8-4861-83df-1fde9fcac785@16g2000yqy.googlegroups.com> Version 0.7, so that might be possible. I will look into trying this in Scipy 0.9. However, I have also tried this using the most up to date Matlab version of lobpcg (available from http://www.mathworks.com/matlabcentral/fileexchange/), and that also gave values that were about 1000 times too large. On Jun 7, 11:50?am, Pauli Virtanen wrote: > Mon, 06 Jun 2011 15:07:48 -0700, Andrew MacLean wrote: > > > I have been using lobpcg (scipy.sparse.linalg.lobpcg) to solve symmetric > > generalized eigenvalue problems with large, sparse stiffness and mass > > matrices, say 'A' and 'B'. The problem is of the form Av = ?BV. > > Which version of Scipy? In Scipy 0.9, some bugs in lobpcg that appeared > on certain platforms were fixed. If you are using Scipy < 0.9, it's > possible you are hitting these. > > -- > Pauli Virtanen > > _______________________________________________ > SciPy-User mailing list > SciPy-U... at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user From andrew.maclean at gmail.com Tue Jun 7 23:30:26 2011 From: andrew.maclean at gmail.com (Andrew MacLean) Date: Tue, 7 Jun 2011 20:30:26 -0700 (PDT) Subject: [SciPy-User] [SciPy-user] sparse matrices - scipy In-Reply-To: <31792885.post@talk.nabble.com> References: <31792885.post@talk.nabble.com> Message-ID: If you are just trying to find the number of non-zero values in a particular row, a command like S[i,:].size or for a column S[:,j].size should work. Here, S could be of type csc, csr, lil or probably also dok as these all support indexing and slicing. csc is best for column slicing, and csr is best for row slicing, so you could also use different types. csc and csr types do not support assignment though, while lil and dok do. For adding all the entries in each column, I think the csc type would be best. A code like S[:,j].sum() should work (see http://docs.scipy.org/doc/scipy-0.9.0/reference/generated/scipy.sparse.csc_matrix.sum.html#scipy.sparse.csc_matrix.sum). On Jun 7, 3:20?pm, villamil wrote: > I just recently started using python a couple of weeks ago, and I have an > application with sparse matrices, so I found I need the Scipy package for > this. > So I have a sparse matrix S, and I want to do operations on its rows and > columns: > -find the count of the nonzero entries in each row ?S[i,:] > -add all the entries in each column ?S[:,j] > > Is there a way to do this, or do I need to access all the elements?, ? > Is there one particular format csc, csr, lil, coo, dok for which this is > easier? > > Thank you > -- > View this message in context:http://old.nabble.com/sparse-matrices---scipy-tp31792885p31792885.html > Sent from the Scipy-User mailing list archive at Nabble.com. > > _______________________________________________ > SciPy-User mailing list > SciPy-U... 
at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user From pav at iki.fi Wed Jun 8 07:06:56 2011 From: pav at iki.fi (Pauli Virtanen) Date: Wed, 08 Jun 2011 13:06:56 +0200 Subject: [SciPy-User] ND interpolation with Qhull In-Reply-To: <30eea06c-0b3a-48c5-a1ee-ff2b14c55678@k17g2000vbn.googlegroups.com> References: <4DECAE8A.3010507@gmail.com> <30eea06c-0b3a-48c5-a1ee-ff2b14c55678@k17g2000vbn.googlegroups.com> Message-ID: <1307531216.22983.30.camel@talisman> Hi, ke, 2011-06-08 kello 03:47 -0700, denis kirjoitti: > A trick for smoothing barycentric interpolation is to warp big > coefficients toward 1, > so that each vertex Vj attracts nearby X more strongly: > > In: X = sum cj Vj, inside hull of Vj > Zj = value at Vj > Out: sum( warp(cj) Zj ) / sum( warp(cj) ) > instead of sum( cj Zj ) Yeah, you can smooth things inside the triangle. However, you will still get only C0 continuity, as there are discontinuities in the gradient at the triangle boundaries. Of course, whether this matters depends on the problem. Pauli From JRadinger at gmx.at Wed Jun 8 09:41:01 2011 From: JRadinger at gmx.at (Johannes Radinger) Date: Wed, 08 Jun 2011 15:41:01 +0200 Subject: [SciPy-User] How to fit a curve/function? In-Reply-To: References: <20110607113201.7B02.B1C76292@gmail.com> <20110608105217.222500@gmx.net> Message-ID: <20110608134101.259520@gmx.net> -------- Original-Nachricht -------- > Datum: Wed, 8 Jun 2011 07:10:38 -0400 > Von: josef.pktd at gmail.com > An: SciPy Users List > Betreff: Re: [SciPy-User] How to fit a curve/function? > On Wed, Jun 8, 2011 at 6:52 AM, Johannes Radinger > wrote: > > Hello, > > > > I've got following function describing any kind of animal dispersal > kernel: > > > > def pdf(x,s1,s2): > > ? ?return > (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(s2*math.sqrt(2*math.pi))*numpy.exp(-((x-0)**(2)/(2*s2**(2))))) > > > > On the other hand I've got data from literature with which I want to fit > the function so that I get s1, s2 and x. > > Ususally the data in the literature are as follows: > > > > Example 1: 50% of the animals are between -270m and +270m and 90% ?are > between -500m and + 500m > > > > Example 2: 84% is between - 5000 m and +5000m, and 73% are between > -1000m and +1000m > > > > So far as I understand an integration of the function is needed to solve > for s1 and s2 as all the literature data give percentage (area under the > curve) Can that be used to fit the curve or can that create ranges for s1 > and s2. > > I don't see a way around integration. > > If you have exactly 2 probabilities, then you can you a solver like > scipy.optimize.fsolve to match the probabilites > eg. > 0.5 = integral pdf from -270 to 270 > 0.9 = integral pdf from -500 to 500 > > If you have more than 2 probabilities, then using optimization of a > weighted function of the moment conditions would be better. > > Josef Hello again I tried following, but without success so far. What do I have to do excactly... import numpy from scipy import stats from scipy import integrate from scipy.optimize import fsolve import math p=0.3 def pdf(x,s1,s2): return (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(s2*math.sqrt(2*math.pi))*numpy.exp(-((x-0)**(2)/(2*s2**(2))))) def equ(s1,s2): 0.5==integrate.quad(pdf,-270,270,args=(s1,s2)) 0.9==integrate.quad(pdf,-500,500,args=(s1,s2)) result=fsolve(equ, 1,500) print result /Johannes > > > > > /Johannes > > > > -- > > NEU: FreePhone - kostenlos mobil telefonieren! 
> > Jetzt informieren: http://www.gmx.net/de/go/freephone > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user -- NEU: FreePhone - kostenlos mobil telefonieren!
Jetzt informieren: http://www.gmx.net/de/go/freephone From pav at iki.fi Wed Jun 8 10:00:16 2011 From: pav at iki.fi (Pauli Virtanen) Date: Wed, 8 Jun 2011 14:00:16 +0000 (UTC) Subject: [SciPy-User] optimize.fmin_cg and optimize.fmin_bgfs optimize to 0 References: Message-ID: Tue, 07 Jun 2011 15:23:46 -0400, Patrick Holvey wrote: [clip] > However, when > I use the gradient in the optimization, all of the atom positions shoot > right to the origin (so they're all at 0,0,0) after just 2 function > calls and 1 gradient call, which seems very odd to me. So I tried > fmin_bgfs with the gradient and the same thing happened. Does anyone > have any experience with analytic gradients where this has happened to > them? I'm confused as to whether the problem is in my gradient > implementation or in how I'm passing the gradient or what. Your Box.Forces(self, xyz) method modifies the input `xyz` argument. This you should not do --- the optimizer expects that you do not alter the current position this way. Try replacing vectorfield=xyz with vectorfield = numpy.zeros_like(xyz) or put xyz = xyz.copy() in the beginning of the routine. From kwgoodman at gmail.com Wed Jun 8 10:00:52 2011 From: kwgoodman at gmail.com (Keith Goodman) Date: Wed, 8 Jun 2011 07:00:52 -0700 Subject: [SciPy-User] [job] Python Job at Hedge Fund In-Reply-To: References: Message-ID: On Wed, Jun 8, 2011 at 12:41 AM, Vincent Schut wrote: > On 06/07/2011 05:32 PM, Keith Goodman wrote: >> We are looking for help to predict tomorrow's stock returns. >> >> The challenge is model selection in the presence of noisy data. The >> tools are ubuntu, python, cython, c, numpy, scipy, la, bottleneck, >> git. >> >> A quantitative background and experience or interest in model >> selection, machine learning, and software development are a plus. >> >> This is a full time position in Berkeley, California, two blocks from >> UC Berkeley. >> >> If you are interested send a CV or similar (or questions) to >> '.'.join(['htiek','scitylanayelekreb at namdoog','moc'][::-1])[::-1] > > No interest (it's slightly out of my commuting range) nor questions, but > this is by far the best email address obfuscation I have seen so far :-) Ha. I also thought about using: >> x = [c for c in x] >> rs = np.random.RandomState([1,2,3]) >> rs.shuffle(x) >> ''.join(x) 'oauoeyphjlot.nrdmorb at oerrg' Would that have cut down on the number of resumes? Not from this list. Give it a try. From josef.pktd at gmail.com Wed Jun 8 10:12:58 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 8 Jun 2011 10:12:58 -0400 Subject: [SciPy-User] How to fit a curve/function? In-Reply-To: <20110608134101.259520@gmx.net> References: <20110607113201.7B02.B1C76292@gmail.com> <20110608105217.222500@gmx.net> <20110608134101.259520@gmx.net> Message-ID: On Wed, Jun 8, 2011 at 9:41 AM, Johannes Radinger wrote: > > -------- Original-Nachricht -------- >> Datum: Wed, 8 Jun 2011 07:10:38 -0400 >> Von: josef.pktd at gmail.com >> An: SciPy Users List >> Betreff: Re: [SciPy-User] How to fit a curve/function? > >> On Wed, Jun 8, 2011 at 6:52 AM, Johannes Radinger >> wrote: >> > Hello, >> > >> > I've got following function describing any kind of animal dispersal >> kernel: >> > >> > def pdf(x,s1,s2): >> > ? ?return >> (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(s2*math.sqrt(2*math.pi))*numpy.exp(-((x-0)**(2)/(2*s2**(2))))) >> > >> > On the other hand I've got data from literature with which I want to fit >> the function so that I get s1, s2 and x. 
>> > Ususally the data in the literature are as follows: >> > >> > Example 1: 50% of the animals are between -270m and +270m and 90% ?are >> between -500m and + 500m >> > >> > Example 2: 84% is between - 5000 m and +5000m, and 73% are between >> -1000m and +1000m >> > >> > So far as I understand an integration of the function is needed to solve >> for s1 and s2 as all the literature data give percentage (area under the >> curve) Can that be used to fit the curve or can that create ranges for s1 >> and s2. >> >> I don't see a way around integration. >> >> If you have exactly 2 probabilities, then you can you a solver like >> scipy.optimize.fsolve to match the probabilites >> eg. >> 0.5 = integral pdf from -270 to 270 >> 0.9 = integral pdf from -500 to 500 >> >> If you have more than 2 probabilities, then using optimization of a >> weighted function of the moment conditions would be better. >> >> Josef > > > > Hello again > > I tried following, but without success so far. What do I have to do excactly... > > import numpy > from scipy import stats > from scipy import integrate > from scipy.optimize import fsolve > import math > > p=0.3 > > def pdf(x,s1,s2): > ? ?return (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(s2*math.sqrt(2*math.pi))*numpy.exp(-((x-0)**(2)/(2*s2**(2))))) > > def equ(s1,s2): > ? ?0.5==integrate.quad(pdf,-270,270,args=(s1,s2)) > ? ?0.9==integrate.quad(pdf,-500,500,args=(s1,s2)) > > result=fsolve(equ, 1,500) > > print result equ needs to return the deviation of the equations (I changed some details for s1 just to try it) import numpy from scipy import stats from scipy import integrate from scipy.optimize import fsolve import math p=0.3 def pdf(x,s1,s2): return (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(math.sqrt(2*math.pi*s2**2))*numpy.exp(-((x-0)**(2)/(2*s2**(2))))) def equ(arg): s1,s2 = numpy.abs(arg) cond1 = 0.5 - integrate.quad(pdf,-270,270,args=(s1,s2))[0] cond2 = 0.9 - integrate.quad(pdf,-500,500,args=(s1,s2))[0] return [cond1, cond2] result=fsolve(equ, [200., 1200]) print result but in the results I get the parameters are very close to each other [-356.5283675 353.82544075] the pdf looks just like a mixture of 2 normals both with loc=0, then maybe the cdf of norm can be used directly >>> from scipy import stats >>> stats.norm.cdf(270, scale=350) - stats.norm.cdf(-270, scale=350) 0.55954705470577526 >>> >>> stats.norm.cdf(270, scale=354) - stats.norm.cdf(-270, scale=354) 0.55436474670960978 >>> stats.norm.cdf(500, scale=354) - stats.norm.cdf(-500, scale=354) 0.84217642881921018 Josef > > > /Johannes >> >> > >> > /Johannes >> > >> > -- >> > NEU: FreePhone - kostenlos mobil telefonieren! >> > Jetzt informieren: http://www.gmx.net/de/go/freephone >> > _______________________________________________ >> > SciPy-User mailing list >> > SciPy-User at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > -- > NEU: FreePhone - kostenlos mobil telefonieren! 
> Jetzt informieren: http://www.gmx.net/de/go/freephone > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From meesters at aesku.com Wed Jun 8 10:26:46 2011 From: meesters at aesku.com (Meesters, Christian) Date: Wed, 8 Jun 2011 14:26:46 +0000 Subject: [SciPy-User] curve fitting with fixed parameters In-Reply-To: References: <8E882955B5BEA54BA86AB84407D7BBE3048385@AESKU-EXCH01.AESKU.local>, Message-ID: <8E882955B5BEA54BA86AB84407D7BBE304A4C3@AESKU-EXCH01.AESKU.local> Nice. Thank you. But just recently we decided to start a bigger project and translate the Python snippets to C++. So, an additional abstraction level would not be a good idea in this case. ________________________________________ From: scipy-user-bounces at scipy.org [scipy-user-bounces at scipy.org] on behalf of Christian K. [ckkart at hoc.net] Sent: Wednesday, June 08, 2011 1:38 PM To: scipy-user at scipy.org Subject: Re: [SciPy-User] curve fitting with fixed parameters Meesters, Christian aesku.com> writes: > > Hi, > Recently I started a thread "curve_fit - fitting a sum of 'functions'". Thanks for all the ideas: I am working to get proper weights for the actual function I would like to fit. Have a look at peak-o-mat (http://lorentz.sourceforge.net). It is an interactive fitting program, written in python/wxpython. Fitting is done using scipy.odr. You may speciy weights, mark parameters as fixed, etc. Any python expression may be used as model. The documentation is sparse and the latest file release is quite old, so better use the source on svn. Regards, Christian _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user From JRadinger at gmx.at Wed Jun 8 10:27:43 2011 From: JRadinger at gmx.at (Johannes Radinger) Date: Wed, 08 Jun 2011 16:27:43 +0200 Subject: [SciPy-User] How to fit a curve/function? In-Reply-To: References: <20110607113201.7B02.B1C76292@gmail.com> <20110608105217.222500@gmx.net> <20110608134101.259520@gmx.net> Message-ID: <20110608142743.77890@gmx.net> -------- Original-Nachricht -------- > Datum: Wed, 8 Jun 2011 10:12:58 -0400 > Von: josef.pktd at gmail.com > An: SciPy Users List > Betreff: Re: [SciPy-User] How to fit a curve/function? > On Wed, Jun 8, 2011 at 9:41 AM, Johannes Radinger > wrote: > > > > -------- Original-Nachricht -------- > >> Datum: Wed, 8 Jun 2011 07:10:38 -0400 > >> Von: josef.pktd at gmail.com > >> An: SciPy Users List > >> Betreff: Re: [SciPy-User] How to fit a curve/function? > > > >> On Wed, Jun 8, 2011 at 6:52 AM, Johannes Radinger > >> wrote: > >> > Hello, > >> > > >> > I've got following function describing any kind of animal dispersal > >> kernel: > >> > > >> > def pdf(x,s1,s2): > >> > ? ?return > >> > (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(s2*math.sqrt(2*math.pi))*numpy.exp(-((x-0)**(2)/(2*s2**(2))))) > >> > > >> > On the other hand I've got data from literature with which I want to > fit > >> the function so that I get s1, s2 and x. 
> >> > Ususally the data in the literature are as follows: > >> > > >> > Example 1: 50% of the animals are between -270m and +270m and 90% > ?are > >> between -500m and + 500m > >> > > >> > Example 2: 84% is between - 5000 m and +5000m, and 73% are between > >> -1000m and +1000m > >> > > >> > So far as I understand an integration of the function is needed to > solve > >> for s1 and s2 as all the literature data give percentage (area under > the > >> curve) Can that be used to fit the curve or can that create ranges for > s1 > >> and s2. > >> > >> I don't see a way around integration. > >> > >> If you have exactly 2 probabilities, then you can you a solver like > >> scipy.optimize.fsolve to match the probabilites > >> eg. > >> 0.5 = integral pdf from -270 to 270 > >> 0.9 = integral pdf from -500 to 500 > >> > >> If you have more than 2 probabilities, then using optimization of a > >> weighted function of the moment conditions would be better. > >> > >> Josef > > > > > > > > Hello again > > > > I tried following, but without success so far. What do I have to do > excactly... > > > > import numpy > > from scipy import stats > > from scipy import integrate > > from scipy.optimize import fsolve > > import math > > > > p=0.3 > > > > def pdf(x,s1,s2): > > ? ?return > (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(s2*math.sqrt(2*math.pi))*numpy.exp(-((x-0)**(2)/(2*s2**(2))))) > > > > def equ(s1,s2): > > ? ?0.5==integrate.quad(pdf,-270,270,args=(s1,s2)) > > ? ?0.9==integrate.quad(pdf,-500,500,args=(s1,s2)) > > > > result=fsolve(equ, 1,500) > > > > print result > > equ needs to return the deviation of the equations (I changed some > details for s1 just to try it) > > import numpy > from scipy import stats > from scipy import integrate > from scipy.optimize import fsolve > import math > > p=0.3 > > def pdf(x,s1,s2): > return > (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(math.sqrt(2*math.pi*s2**2))*numpy.exp(-((x-0)**(2)/(2*s2**(2))))) > > def equ(arg): > s1,s2 = numpy.abs(arg) > cond1 = 0.5 - integrate.quad(pdf,-270,270,args=(s1,s2))[0] > cond2 = 0.9 - integrate.quad(pdf,-500,500,args=(s1,s2))[0] > return [cond1, cond2] > > result=fsolve(equ, [200., 1200]) > > print result > > but in the results I get the parameters are very close to each other > [-356.5283675 353.82544075] > > the pdf looks just like a mixture of 2 normals both with loc=0, then > maybe the cdf of norm can be used directly Thank you for that hint... First yes these are 2 superimposed normals but for other reasons I want to use the original formula instead of the stats.functions... anyway there is still a thing...the locator s1 and s2 are like the scale parameter of stats.norm so the are both + and -. For fsolve above it seems that I get only one parameter (s1 or s2) but for the positive and negative side of the distribution. So in actually there are four parameters -s1, +s1, -s2, +s2. How can I solve that? Maybe I can restrict the fsolve to look for the two values only in the positive range... any guesses? /J > > >>> from scipy import stats > >>> stats.norm.cdf(270, scale=350) - stats.norm.cdf(-270, scale=350) > 0.55954705470577526 > >>> > >>> stats.norm.cdf(270, scale=354) - stats.norm.cdf(-270, scale=354) > 0.55436474670960978 > >>> stats.norm.cdf(500, scale=354) - stats.norm.cdf(-500, scale=354) > 0.84217642881921018 > > Josef > > > > > > /Johannes > >> > >> > > >> > /Johannes > >> > > >> > -- > >> > NEU: FreePhone - kostenlos mobil telefonieren! 
> >> > Jetzt informieren: http://www.gmx.net/de/go/freephone > >> > _______________________________________________ > >> > SciPy-User mailing list > >> > SciPy-User at scipy.org > >> > http://mail.scipy.org/mailman/listinfo/scipy-user > >> > > >> _______________________________________________ > >> SciPy-User mailing list > >> SciPy-User at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > > -- > > NEU: FreePhone - kostenlos mobil telefonieren! > > Jetzt informieren: http://www.gmx.net/de/go/freephone > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user -- NEU: FreePhone - kostenlos mobil telefonieren!
Jetzt informieren: http://www.gmx.net/de/go/freephone From villamil at brandeis.edu Wed Jun 8 10:29:26 2011 From: villamil at brandeis.edu (villamil) Date: Wed, 8 Jun 2011 07:29:26 -0700 (PDT) Subject: [SciPy-User] [SciPy-user] sparse matrices - scipy In-Reply-To: References: <31792885.post@talk.nabble.com> Message-ID: <31801164.post@talk.nabble.com> That's exactly what I needed, and it wasn't very hard too. Thank you. Andrew MacLean-3 wrote: > > If you are just trying to find the number of non-zero values in a > particular row, a command like S[i,:].size or for a column S[:,j].size > should work. Here, S could be of type csc, csr, lil or probably also > dok as these all support indexing and slicing. csc is best for column > slicing, and csr is best for row slicing, so you could also use > different types. csc and csr types do not support assignment though, > while lil and dok do. > > For adding all the entries in each column, I think the csc type would > be best. A code like S[:,j].sum() should work (see > http://docs.scipy.org/doc/scipy-0.9.0/reference/generated/scipy.sparse.csc_matrix.sum.html#scipy.sparse.csc_matrix.sum). > > > On Jun 7, 3:20?pm, villamil wrote: >> I just recently started using python a couple of weeks ago, and I have an >> application with sparse matrices, so I found I need the Scipy package for >> this. >> So I have a sparse matrix S, and I want to do operations on its rows and >> columns: >> -find the count of the nonzero entries in each row ?S[i,:] >> -add all the entries in each column ?S[:,j] >> >> Is there a way to do this, or do I need to access all the elements?, ? >> Is there one particular format csc, csr, lil, coo, dok for which this is >> easier? >> >> Thank you >> -- >> View this message in >> context:http://old.nabble.com/sparse-matrices---scipy-tp31792885p31792885.html >> Sent from the Scipy-User mailing list archive at Nabble.com. >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-U... at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- View this message in context: http://old.nabble.com/sparse-matrices---scipy-tp31792885p31801164.html Sent from the Scipy-User mailing list archive at Nabble.com. From schut at sarvision.nl Wed Jun 8 10:31:37 2011 From: schut at sarvision.nl (Vincent Schut) Date: Wed, 08 Jun 2011 16:31:37 +0200 Subject: [SciPy-User] [job] Python Job at Hedge Fund In-Reply-To: References: Message-ID: On 06/08/2011 04:00 PM, Keith Goodman wrote: > On Wed, Jun 8, 2011 at 12:41 AM, Vincent Schut wrote: >> On 06/07/2011 05:32 PM, Keith Goodman wrote: >>> We are looking for help to predict tomorrow's stock returns. >>> >>> The challenge is model selection in the presence of noisy data. The >>> tools are ubuntu, python, cython, c, numpy, scipy, la, bottleneck, >>> git. >>> >>> A quantitative background and experience or interest in model >>> selection, machine learning, and software development are a plus. >>> >>> This is a full time position in Berkeley, California, two blocks from >>> UC Berkeley. >>> >>> If you are interested send a CV or similar (or questions) to >>> '.'.join(['htiek','scitylanayelekreb at namdoog','moc'][::-1])[::-1] >> >> No interest (it's slightly out of my commuting range) nor questions, but >> this is by far the best email address obfuscation I have seen so far :-) > > Ha. 
I also thought about using: > >>> x = [c for c in x] >>> rs = np.random.RandomState([1,2,3]) >>> rs.shuffle(x) >>> ''.join(x) > 'oauoeyphjlot.nrdmorb at oerrg' > > Would that have cut down on the number of resumes? Not from this list. > Give it a try. rs = np.random.RandomState([1,2,3]) xs = 'oauoeyphjlot.nrdmorb at oerrg' xs = [c for c in xs] x = np.asarray(xs) i = range(len(xs)) rs.shuffle(i) x[i] = xs print ''.join(x) sorry it took so long, cooking lasagna in the meantime... :-) VS. From josef.pktd at gmail.com Wed Jun 8 10:33:45 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 8 Jun 2011 10:33:45 -0400 Subject: [SciPy-User] How to fit a curve/function? In-Reply-To: <20110608142743.77890@gmx.net> References: <20110607113201.7B02.B1C76292@gmail.com> <20110608105217.222500@gmx.net> <20110608134101.259520@gmx.net> <20110608142743.77890@gmx.net> Message-ID: On Wed, Jun 8, 2011 at 10:27 AM, Johannes Radinger wrote: > > -------- Original-Nachricht -------- >> Datum: Wed, 8 Jun 2011 10:12:58 -0400 >> Von: josef.pktd at gmail.com >> An: SciPy Users List >> Betreff: Re: [SciPy-User] How to fit a curve/function? > >> On Wed, Jun 8, 2011 at 9:41 AM, Johannes Radinger >> wrote: >> > >> > -------- Original-Nachricht -------- >> >> Datum: Wed, 8 Jun 2011 07:10:38 -0400 >> >> Von: josef.pktd at gmail.com >> >> An: SciPy Users List >> >> Betreff: Re: [SciPy-User] How to fit a curve/function? >> > >> >> On Wed, Jun 8, 2011 at 6:52 AM, Johannes Radinger >> >> wrote: >> >> > Hello, >> >> > >> >> > I've got following function describing any kind of animal dispersal >> >> kernel: >> >> > >> >> > def pdf(x,s1,s2): >> >> > ? ?return >> >> >> (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(s2*math.sqrt(2*math.pi))*numpy.exp(-((x-0)**(2)/(2*s2**(2))))) >> >> > >> >> > On the other hand I've got data from literature with which I want to >> fit >> >> the function so that I get s1, s2 and x. >> >> > Ususally the data in the literature are as follows: >> >> > >> >> > Example 1: 50% of the animals are between -270m and +270m and 90% >> ?are >> >> between -500m and + 500m >> >> > >> >> > Example 2: 84% is between - 5000 m and +5000m, and 73% are between >> >> -1000m and +1000m >> >> > >> >> > So far as I understand an integration of the function is needed to >> solve >> >> for s1 and s2 as all the literature data give percentage (area under >> the >> >> curve) Can that be used to fit the curve or can that create ranges for >> s1 >> >> and s2. >> >> >> >> I don't see a way around integration. >> >> >> >> If you have exactly 2 probabilities, then you can you a solver like >> >> scipy.optimize.fsolve to match the probabilites >> >> eg. >> >> 0.5 = integral pdf from -270 to 270 >> >> 0.9 = integral pdf from -500 to 500 >> >> >> >> If you have more than 2 probabilities, then using optimization of a >> >> weighted function of the moment conditions would be better. >> >> >> >> Josef >> > >> > >> > >> > Hello again >> > >> > I tried following, but without success so far. What do I have to do >> excactly... >> > >> > import numpy >> > from scipy import stats >> > from scipy import integrate >> > from scipy.optimize import fsolve >> > import math >> > >> > p=0.3 >> > >> > def pdf(x,s1,s2): >> > ? ?return >> (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(s2*math.sqrt(2*math.pi))*numpy.exp(-((x-0)**(2)/(2*s2**(2))))) >> > >> > def equ(s1,s2): >> > ? ?0.5==integrate.quad(pdf,-270,270,args=(s1,s2)) >> > ? 
?0.9==integrate.quad(pdf,-500,500,args=(s1,s2)) >> > >> > result=fsolve(equ, 1,500) >> > >> > print result >> >> equ needs to return the deviation of the equations (I changed some >> details for s1 just to try it) >> >> import numpy >> from scipy import stats >> from scipy import integrate >> from scipy.optimize import fsolve >> import math >> >> p=0.3 >> >> def pdf(x,s1,s2): >> ? ? return >> (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(math.sqrt(2*math.pi*s2**2))*numpy.exp(-((x-0)**(2)/(2*s2**(2))))) >> >> def equ(arg): >> ? ? s1,s2 = numpy.abs(arg) >> ? ? cond1 = 0.5 - integrate.quad(pdf,-270,270,args=(s1,s2))[0] >> ? ? cond2 = 0.9 - integrate.quad(pdf,-500,500,args=(s1,s2))[0] >> ? ? return [cond1, cond2] >> >> result=fsolve(equ, [200., 1200]) >> >> print result >> >> but in the results I get the parameters are very close to each other >> [-356.5283675 ? 353.82544075] >> >> the pdf looks just like a mixture of 2 normals both with loc=0, then >> maybe the cdf of norm can be used directly > > > Thank you for that hint... First yes these are 2 superimposed normals but for other reasons I want to use the original formula instead of the stats.functions... > > anyway there is still a thing...the locator s1 and s2 are like the scale parameter of stats.norm so the are both + and -. For fsolve above it seems that I get only one parameter (s1 or s2) but for the positive and negative side of the distribution. So in actually there are four parameters -s1, +s1, -s2, +s2. How can I solve that? Maybe I can restrict the fsolve to look for the two values only in the positive range... It doesn't really matter, if the scale only shows up in quadratic terms, or as in my initial change I added a absolute value, so whether it's positive or negative, it's still only one value, and we interprete it as postive scale s1 = sqrt(s1**2) Josef > > any guesses? > > /J > >> >> >>> from scipy import stats >> >>> stats.norm.cdf(270, scale=350) - stats.norm.cdf(-270, scale=350) >> 0.55954705470577526 >> >>> >> >>> stats.norm.cdf(270, scale=354) - stats.norm.cdf(-270, scale=354) >> 0.55436474670960978 >> >>> stats.norm.cdf(500, scale=354) - stats.norm.cdf(-500, scale=354) >> 0.84217642881921018 >> >> Josef >> > >> > >> > /Johannes >> >> >> >> > >> >> > /Johannes >> >> > >> >> > -- >> >> > NEU: FreePhone - kostenlos mobil telefonieren! >> >> > Jetzt informieren: http://www.gmx.net/de/go/freephone >> >> > _______________________________________________ >> >> > SciPy-User mailing list >> >> > SciPy-User at scipy.org >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > >> >> _______________________________________________ >> >> SciPy-User mailing list >> >> SciPy-User at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> > -- >> > NEU: FreePhone - kostenlos mobil telefonieren! >> > Jetzt informieren: http://www.gmx.net/de/go/freephone >> > _______________________________________________ >> > SciPy-User mailing list >> > SciPy-User at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > -- > NEU: FreePhone - kostenlos mobil telefonieren! 
> Jetzt informieren: http://www.gmx.net/de/go/freephone > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From JRadinger at gmx.at Wed Jun 8 10:56:25 2011 From: JRadinger at gmx.at (Johannes Radinger) Date: Wed, 08 Jun 2011 16:56:25 +0200 Subject: [SciPy-User] How to fit a curve/function? In-Reply-To: References: <20110607113201.7B02.B1C76292@gmail.com> <20110608105217.222500@gmx.net> <20110608134101.259520@gmx.net> <20110608142743.77890@gmx.net> Message-ID: <20110608145625.162510@gmx.net> -------- Original-Nachricht -------- > Datum: Wed, 8 Jun 2011 10:33:45 -0400 > Von: josef.pktd at gmail.com > An: SciPy Users List > Betreff: Re: [SciPy-User] How to fit a curve/function? > On Wed, Jun 8, 2011 at 10:27 AM, Johannes Radinger > wrote: > > > > -------- Original-Nachricht -------- > >> Datum: Wed, 8 Jun 2011 10:12:58 -0400 > >> Von: josef.pktd at gmail.com > >> An: SciPy Users List > >> Betreff: Re: [SciPy-User] How to fit a curve/function? > > > >> On Wed, Jun 8, 2011 at 9:41 AM, Johannes Radinger > >> wrote: > >> > > >> > -------- Original-Nachricht -------- > >> >> Datum: Wed, 8 Jun 2011 07:10:38 -0400 > >> >> Von: josef.pktd at gmail.com > >> >> An: SciPy Users List > >> >> Betreff: Re: [SciPy-User] How to fit a curve/function? > >> > > >> >> On Wed, Jun 8, 2011 at 6:52 AM, Johannes Radinger > >> >> wrote: > >> >> > Hello, > >> >> > > >> >> > I've got following function describing any kind of animal > dispersal > >> >> kernel: > >> >> > > >> >> > def pdf(x,s1,s2): > >> >> > ? ?return > >> >> > >> > (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(s2*math.sqrt(2*math.pi))*numpy.exp(-((x-0)**(2)/(2*s2**(2))))) > >> >> > > >> >> > On the other hand I've got data from literature with which I want > to > >> fit > >> >> the function so that I get s1, s2 and x. > >> >> > Ususally the data in the literature are as follows: > >> >> > > >> >> > Example 1: 50% of the animals are between -270m and +270m and 90% > >> ?are > >> >> between -500m and + 500m > >> >> > > >> >> > Example 2: 84% is between - 5000 m and +5000m, and 73% are between > >> >> -1000m and +1000m > >> >> > > >> >> > So far as I understand an integration of the function is needed to > >> solve > >> >> for s1 and s2 as all the literature data give percentage (area under > >> the > >> >> curve) Can that be used to fit the curve or can that create ranges > for > >> s1 > >> >> and s2. > >> >> > >> >> I don't see a way around integration. > >> >> > >> >> If you have exactly 2 probabilities, then you can you a solver like > >> >> scipy.optimize.fsolve to match the probabilites > >> >> eg. > >> >> 0.5 = integral pdf from -270 to 270 > >> >> 0.9 = integral pdf from -500 to 500 > >> >> > >> >> If you have more than 2 probabilities, then using optimization of a > >> >> weighted function of the moment conditions would be better. > >> >> > >> >> Josef > >> > > >> > > >> > > >> > Hello again > >> > > >> > I tried following, but without success so far. What do I have to do > >> excactly... > >> > > >> > import numpy > >> > from scipy import stats > >> > from scipy import integrate > >> > from scipy.optimize import fsolve > >> > import math > >> > > >> > p=0.3 > >> > > >> > def pdf(x,s1,s2): > >> > ? ?return > >> > (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(s2*math.sqrt(2*math.pi))*numpy.exp(-((x-0)**(2)/(2*s2**(2))))) > >> > > >> > def equ(s1,s2): > >> > ? 
?0.5==integrate.quad(pdf,-270,270,args=(s1,s2)) > >> > ? ?0.9==integrate.quad(pdf,-500,500,args=(s1,s2)) > >> > > >> > result=fsolve(equ, 1,500) > >> > > >> > print result > >> > >> equ needs to return the deviation of the equations (I changed some > >> details for s1 just to try it) > >> > >> import numpy > >> from scipy import stats > >> from scipy import integrate > >> from scipy.optimize import fsolve > >> import math > >> > >> p=0.3 > >> > >> def pdf(x,s1,s2): > >> ? ? return > >> > (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(math.sqrt(2*math.pi*s2**2))*numpy.exp(-((x-0)**(2)/(2*s2**(2))))) > >> > >> def equ(arg): > >> ? ? s1,s2 = numpy.abs(arg) > >> ? ? cond1 = 0.5 - integrate.quad(pdf,-270,270,args=(s1,s2))[0] > >> ? ? cond2 = 0.9 - integrate.quad(pdf,-500,500,args=(s1,s2))[0] > >> ? ? return [cond1, cond2] > >> > >> result=fsolve(equ, [200., 1200]) thank you for your last reply...seems that the parameters of the two normals are nearly identical... anyway just two small addtional questions: 1)in fsolve(equ, [200., 1200]) the 200 and 1200 are kind of start values so far as I understand...how should these be choosen? what is recommended? 2) How can that be solve if I have I third condition (overfitted) can that be used as well or how does the alternative look like? /johannes > >> > >> print result > >> > >> but in the results I get the parameters are very close to each other > >> [-356.5283675 ? 353.82544075] > >> > >> the pdf looks just like a mixture of 2 normals both with loc=0, then > >> maybe the cdf of norm can be used directly > > > > > > Thank you for that hint... First yes these are 2 superimposed normals > but for other reasons I want to use the original formula instead of the > stats.functions... > > > > anyway there is still a thing...the locator s1 and s2 are like the scale > parameter of stats.norm so the are both + and -. For fsolve above it seems > that I get only one parameter (s1 or s2) but for the positive and negative > side of the distribution. So in actually there are four parameters -s1, > +s1, -s2, +s2. How can I solve that? Maybe I can restrict the fsolve to look > for the two values only in the positive range... > > It doesn't really matter, if the scale only shows up in quadratic > terms, or as in my initial change I added a absolute value, so whether > it's positive or negative, it's still only one value, and we > interprete it as postive scale > > s1 = sqrt(s1**2) > > Josef > > > > > any guesses? > > > > /J > > > >> > >> >>> from scipy import stats > >> >>> stats.norm.cdf(270, scale=350) - stats.norm.cdf(-270, scale=350) > >> 0.55954705470577526 > >> >>> > >> >>> stats.norm.cdf(270, scale=354) - stats.norm.cdf(-270, scale=354) > >> 0.55436474670960978 > >> >>> stats.norm.cdf(500, scale=354) - stats.norm.cdf(-500, scale=354) > >> 0.84217642881921018 > >> > >> Josef > >> > > >> > > >> > /Johannes > >> >> > >> >> > > >> >> > /Johannes > >> >> > > >> >> > -- > >> >> > NEU: FreePhone - kostenlos mobil telefonieren! > >> >> > Jetzt informieren: http://www.gmx.net/de/go/freephone > >> >> > _______________________________________________ > >> >> > SciPy-User mailing list > >> >> > SciPy-User at scipy.org > >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user > >> >> > > >> >> _______________________________________________ > >> >> SciPy-User mailing list > >> >> SciPy-User at scipy.org > >> >> http://mail.scipy.org/mailman/listinfo/scipy-user > >> > > >> > -- > >> > NEU: FreePhone - kostenlos mobil telefonieren! 
> >> > Jetzt informieren: http://www.gmx.net/de/go/freephone > >> > _______________________________________________ > >> > SciPy-User mailing list > >> > SciPy-User at scipy.org > >> > http://mail.scipy.org/mailman/listinfo/scipy-user > >> > > >> _______________________________________________ > >> SciPy-User mailing list > >> SciPy-User at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > > -- > > NEU: FreePhone - kostenlos mobil telefonieren! > > Jetzt informieren: http://www.gmx.net/de/go/freephone > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user -- NEU: FreePhone - kostenlos mobil telefonieren! Jetzt informieren: http://www.gmx.net/de/go/freephone From josef.pktd at gmail.com Wed Jun 8 11:37:00 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 8 Jun 2011 11:37:00 -0400 Subject: [SciPy-User] How to fit a curve/function? In-Reply-To: <20110608145625.162510@gmx.net> References: <20110607113201.7B02.B1C76292@gmail.com> <20110608105217.222500@gmx.net> <20110608134101.259520@gmx.net> <20110608142743.77890@gmx.net> <20110608145625.162510@gmx.net> Message-ID: On Wed, Jun 8, 2011 at 10:56 AM, Johannes Radinger wrote: > > -------- Original-Nachricht -------- >> Datum: Wed, 8 Jun 2011 10:33:45 -0400 >> Von: josef.pktd at gmail.com >> An: SciPy Users List >> Betreff: Re: [SciPy-User] How to fit a curve/function? > >> On Wed, Jun 8, 2011 at 10:27 AM, Johannes Radinger >> wrote: >> > >> > -------- Original-Nachricht -------- >> >> Datum: Wed, 8 Jun 2011 10:12:58 -0400 >> >> Von: josef.pktd at gmail.com >> >> An: SciPy Users List >> >> Betreff: Re: [SciPy-User] How to fit a curve/function? >> > >> >> On Wed, Jun 8, 2011 at 9:41 AM, Johannes Radinger >> >> wrote: >> >> > >> >> > -------- Original-Nachricht -------- >> >> >> Datum: Wed, 8 Jun 2011 07:10:38 -0400 >> >> >> Von: josef.pktd at gmail.com >> >> >> An: SciPy Users List >> >> >> Betreff: Re: [SciPy-User] How to fit a curve/function? >> >> > >> >> >> On Wed, Jun 8, 2011 at 6:52 AM, Johannes Radinger >> >> >> wrote: >> >> >> > Hello, >> >> >> > >> >> >> > I've got following function describing any kind of animal >> dispersal >> >> >> kernel: >> >> >> > >> >> >> > def pdf(x,s1,s2): >> >> >> > ? ?return >> >> >> >> >> >> (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(s2*math.sqrt(2*math.pi))*numpy.exp(-((x-0)**(2)/(2*s2**(2))))) >> >> >> > >> >> >> > On the other hand I've got data from literature with which I want >> to >> >> fit >> >> >> the function so that I get s1, s2 and x. >> >> >> > Ususally the data in the literature are as follows: >> >> >> > >> >> >> > Example 1: 50% of the animals are between -270m and +270m and 90% >> >> ?are >> >> >> between -500m and + 500m >> >> >> > >> >> >> > Example 2: 84% is between - 5000 m and +5000m, and 73% are between >> >> >> -1000m and +1000m >> >> >> > >> >> >> > So far as I understand an integration of the function is needed to >> >> solve >> >> >> for s1 and s2 as all the literature data give percentage (area under >> >> the >> >> >> curve) Can that be used to fit the curve or can that create ranges >> for >> >> s1 >> >> >> and s2. >> >> >> >> >> >> I don't see a way around integration. 
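(In this particular case the integration even has a closed form: the kernel is just a mixture of two zero-mean normals, so the interval probabilities can be written directly with the normal cdf, as suggested further down with stats.norm.cdf, which avoids running quad inside the solver. A rough, untested sketch of that variant; the helper names are made up and the starting values are only for illustration:

import numpy as np
from scipy import stats
from scipy.optimize import fsolve

p = 0.3

def interval_prob(b, s1, s2):
    # P(-b < X < b) for the two-component mixture, no quadrature needed
    return (p * (2 * stats.norm.cdf(b, scale=s1) - 1)
            + (1 - p) * (2 * stats.norm.cdf(b, scale=s2) - 1))

def equ_cdf(arg):
    s1, s2 = np.abs(arg)
    return [0.5 - interval_prob(270., s1, s2),
            0.9 - interval_prob(500., s1, s2)]

print fsolve(equ_cdf, [200., 1200.])
)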
>> >> >> >> >> >> If you have exactly 2 probabilities, then you can you a solver like >> >> >> scipy.optimize.fsolve to match the probabilites >> >> >> eg. >> >> >> 0.5 = integral pdf from -270 to 270 >> >> >> 0.9 = integral pdf from -500 to 500 >> >> >> >> >> >> If you have more than 2 probabilities, then using optimization of a >> >> >> weighted function of the moment conditions would be better. >> >> >> >> >> >> Josef >> >> > >> >> > >> >> > >> >> > Hello again >> >> > >> >> > I tried following, but without success so far. What do I have to do >> >> excactly... >> >> > >> >> > import numpy >> >> > from scipy import stats >> >> > from scipy import integrate >> >> > from scipy.optimize import fsolve >> >> > import math >> >> > >> >> > p=0.3 >> >> > >> >> > def pdf(x,s1,s2): >> >> > ? ?return >> >> >> (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(s2*math.sqrt(2*math.pi))*numpy.exp(-((x-0)**(2)/(2*s2**(2))))) >> >> > >> >> > def equ(s1,s2): >> >> > ? ?0.5==integrate.quad(pdf,-270,270,args=(s1,s2)) >> >> > ? ?0.9==integrate.quad(pdf,-500,500,args=(s1,s2)) >> >> > >> >> > result=fsolve(equ, 1,500) >> >> > >> >> > print result >> >> >> >> equ needs to return the deviation of the equations (I changed some >> >> details for s1 just to try it) >> >> >> >> import numpy >> >> from scipy import stats >> >> from scipy import integrate >> >> from scipy.optimize import fsolve >> >> import math >> >> >> >> p=0.3 >> >> >> >> def pdf(x,s1,s2): >> >> ? ? return >> >> >> (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(math.sqrt(2*math.pi*s2**2))*numpy.exp(-((x-0)**(2)/(2*s2**(2))))) >> >> >> >> def equ(arg): >> >> ? ? s1,s2 = numpy.abs(arg) >> >> ? ? cond1 = 0.5 - integrate.quad(pdf,-270,270,args=(s1,s2))[0] >> >> ? ? cond2 = 0.9 - integrate.quad(pdf,-500,500,args=(s1,s2))[0] >> >> ? ? return [cond1, cond2] >> >> >> >> result=fsolve(equ, [200., 1200]) > > thank you for your last reply...seems that the parameters of the two normals are nearly identical... anyway just two small addtional questions: > > 1)in fsolve(equ, [200., 1200]) the 200 and 1200 are kind of start values so far as I understand...how should these be choosen? what is recommended? There is no general solution for choosing starting values, in your case it should be possible to >>> q = np.array([0.5, 0.9]) >>> cr = x/stats.norm.ppf(0.5 + q/2.) >>> x = [270, 500] >>> q = np.array([0.5, 0.9]) >>> x = [270, 500] >>> cr = x/stats.norm.ppf(0.5 + q/2.) >>> stats.norm.cdf(500, scale=cr[1]) - stats.norm.cdf(-500, scale=cr[1]) 0.89999999999999991 >>> stats.norm.cdf(q[0], scale=cr[1]) - stats.norm.cdf(-q[0], scale=cr[0]) 0.0011545021185267457 >>> stats.norm.cdf(q[0], scale=cr[0]) - stats.norm.cdf(-q[0], scale=cr[0]) 0.000996601515122153 >>> stats.norm.cdf(x[0], scale=cr[0]) - stats.norm.cdf(-x[0], scale=cr[0]) 0.5 >>> sol = fsolve(equ, np.sort(cr)) there are some numerical problems finding the solution (???) 
>>> equ(sol) array([-0.05361093, 0.05851309]) >>> from pprint import pprint >>> pprint(fsolve(equ, np.sort(cr), xtol=1e-10, full_output=1)) (array([ 354.32616549, 354.69918062]), {'fjac': array([[-0.7373189 , -0.67554484], [ 0.67554484, -0.7373189 ]]), 'fvec': array([-0.05361093, 0.05851309]), 'nfev': 36, 'qtf': array([ 1.40019135e-07, -7.93593929e-02]), 'r': array([ -5.21390161e-04, -1.21700831e-03, 3.88274320e-07])}, 5, 'The iteration is not making good progress, as measured by the \n improvement from the last ten iterations.') > > 2) How can that be solve if I have I third condition (overfitted) can that be used as well or how does the alternative look like? use optimize.leastsq on equ (I never tried this for this case) use fmin on the sum of squared errors if the intervals for the probabilities are non-overlapping (interval data), then there is an optimal weighting matrix, (but my code for that in the statsmodels.sandbox is not verified). Josef > > /johannes > >> >> >> >> print result >> >> >> >> but in the results I get the parameters are very close to each other >> >> [-356.5283675 ? 353.82544075] >> >> >> >> the pdf looks just like a mixture of 2 normals both with loc=0, then >> >> maybe the cdf of norm can be used directly >> > >> > >> > Thank you for that hint... First yes these are 2 superimposed normals >> but for other reasons I want to use the original formula instead of the >> stats.functions... >> > >> > anyway there is still a thing...the locator s1 and s2 are like the scale >> parameter of stats.norm so the are both + and -. For fsolve above it seems >> that I get only one parameter (s1 or s2) but for the positive and negative >> side of the distribution. So in actually there are four parameters -s1, >> +s1, -s2, +s2. How can I solve that? Maybe I can restrict the fsolve to look >> for the two values only in the positive range... >> >> It doesn't really matter, if the scale only shows up in quadratic >> terms, or as in my initial change I added a absolute value, so whether >> it's positive or negative, it's still only one value, and we >> interprete it as postive scale >> >> s1 = sqrt(s1**2) >> >> Josef >> >> > >> > any guesses? >> > >> > /J >> > >> >> >> >> >>> from scipy import stats >> >> >>> stats.norm.cdf(270, scale=350) - stats.norm.cdf(-270, scale=350) >> >> 0.55954705470577526 >> >> >>> >> >> >>> stats.norm.cdf(270, scale=354) - stats.norm.cdf(-270, scale=354) >> >> 0.55436474670960978 >> >> >>> stats.norm.cdf(500, scale=354) - stats.norm.cdf(-500, scale=354) >> >> 0.84217642881921018 >> >> >> >> Josef >> >> > >> >> > >> >> > /Johannes >> >> >> >> >> >> > >> >> >> > /Johannes >> >> >> > >> >> >> > -- >> >> >> > NEU: FreePhone - kostenlos mobil telefonieren! >> >> >> > Jetzt informieren: http://www.gmx.net/de/go/freephone >> >> >> > _______________________________________________ >> >> >> > SciPy-User mailing list >> >> >> > SciPy-User at scipy.org >> >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >> > >> >> >> _______________________________________________ >> >> >> SciPy-User mailing list >> >> >> SciPy-User at scipy.org >> >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > >> >> > -- >> >> > NEU: FreePhone - kostenlos mobil telefonieren! 
>> >> > Jetzt informieren: http://www.gmx.net/de/go/freephone >> >> > _______________________________________________ >> >> > SciPy-User mailing list >> >> > SciPy-User at scipy.org >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > >> >> _______________________________________________ >> >> SciPy-User mailing list >> >> SciPy-User at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> > -- >> > NEU: FreePhone - kostenlos mobil telefonieren! >> > Jetzt informieren: http://www.gmx.net/de/go/freephone >> > _______________________________________________ >> > SciPy-User mailing list >> > SciPy-User at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > -- > NEU: FreePhone - kostenlos mobil telefonieren! > Jetzt informieren: http://www.gmx.net/de/go/freephone > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Wed Jun 8 11:37:52 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 8 Jun 2011 11:37:52 -0400 Subject: [SciPy-User] How to fit a curve/function? In-Reply-To: References: <20110607113201.7B02.B1C76292@gmail.com> <20110608105217.222500@gmx.net> <20110608134101.259520@gmx.net> <20110608142743.77890@gmx.net> <20110608145625.162510@gmx.net> Message-ID: On Wed, Jun 8, 2011 at 11:37 AM, wrote: > On Wed, Jun 8, 2011 at 10:56 AM, Johannes Radinger wrote: >> >> -------- Original-Nachricht -------- >>> Datum: Wed, 8 Jun 2011 10:33:45 -0400 >>> Von: josef.pktd at gmail.com >>> An: SciPy Users List >>> Betreff: Re: [SciPy-User] How to fit a curve/function? >> >>> On Wed, Jun 8, 2011 at 10:27 AM, Johannes Radinger >>> wrote: >>> > >>> > -------- Original-Nachricht -------- >>> >> Datum: Wed, 8 Jun 2011 10:12:58 -0400 >>> >> Von: josef.pktd at gmail.com >>> >> An: SciPy Users List >>> >> Betreff: Re: [SciPy-User] How to fit a curve/function? >>> > >>> >> On Wed, Jun 8, 2011 at 9:41 AM, Johannes Radinger >>> >> wrote: >>> >> > >>> >> > -------- Original-Nachricht -------- >>> >> >> Datum: Wed, 8 Jun 2011 07:10:38 -0400 >>> >> >> Von: josef.pktd at gmail.com >>> >> >> An: SciPy Users List >>> >> >> Betreff: Re: [SciPy-User] How to fit a curve/function? >>> >> > >>> >> >> On Wed, Jun 8, 2011 at 6:52 AM, Johannes Radinger >>> >> >> wrote: >>> >> >> > Hello, >>> >> >> > >>> >> >> > I've got following function describing any kind of animal >>> dispersal >>> >> >> kernel: >>> >> >> > >>> >> >> > def pdf(x,s1,s2): >>> >> >> > ? ?return >>> >> >> >>> >> >>> (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(s2*math.sqrt(2*math.pi))*numpy.exp(-((x-0)**(2)/(2*s2**(2))))) >>> >> >> > >>> >> >> > On the other hand I've got data from literature with which I want >>> to >>> >> fit >>> >> >> the function so that I get s1, s2 and x. 
>>> >> >> > Ususally the data in the literature are as follows: >>> >> >> > >>> >> >> > Example 1: 50% of the animals are between -270m and +270m and 90% >>> >> ?are >>> >> >> between -500m and + 500m >>> >> >> > >>> >> >> > Example 2: 84% is between - 5000 m and +5000m, and 73% are between >>> >> >> -1000m and +1000m >>> >> >> > >>> >> >> > So far as I understand an integration of the function is needed to >>> >> solve >>> >> >> for s1 and s2 as all the literature data give percentage (area under >>> >> the >>> >> >> curve) Can that be used to fit the curve or can that create ranges >>> for >>> >> s1 >>> >> >> and s2. >>> >> >> >>> >> >> I don't see a way around integration. >>> >> >> >>> >> >> If you have exactly 2 probabilities, then you can you a solver like >>> >> >> scipy.optimize.fsolve to match the probabilites >>> >> >> eg. >>> >> >> 0.5 = integral pdf from -270 to 270 >>> >> >> 0.9 = integral pdf from -500 to 500 >>> >> >> >>> >> >> If you have more than 2 probabilities, then using optimization of a >>> >> >> weighted function of the moment conditions would be better. >>> >> >> >>> >> >> Josef >>> >> > >>> >> > >>> >> > >>> >> > Hello again >>> >> > >>> >> > I tried following, but without success so far. What do I have to do >>> >> excactly... >>> >> > >>> >> > import numpy >>> >> > from scipy import stats >>> >> > from scipy import integrate >>> >> > from scipy.optimize import fsolve >>> >> > import math >>> >> > >>> >> > p=0.3 >>> >> > >>> >> > def pdf(x,s1,s2): >>> >> > ? ?return >>> >> >>> (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(s2*math.sqrt(2*math.pi))*numpy.exp(-((x-0)**(2)/(2*s2**(2))))) >>> >> > >>> >> > def equ(s1,s2): >>> >> > ? ?0.5==integrate.quad(pdf,-270,270,args=(s1,s2)) >>> >> > ? ?0.9==integrate.quad(pdf,-500,500,args=(s1,s2)) >>> >> > >>> >> > result=fsolve(equ, 1,500) >>> >> > >>> >> > print result >>> >> >>> >> equ needs to return the deviation of the equations (I changed some >>> >> details for s1 just to try it) >>> >> >>> >> import numpy >>> >> from scipy import stats >>> >> from scipy import integrate >>> >> from scipy.optimize import fsolve >>> >> import math >>> >> >>> >> p=0.3 >>> >> >>> >> def pdf(x,s1,s2): >>> >> ? ? return >>> >> >>> (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(math.sqrt(2*math.pi*s2**2))*numpy.exp(-((x-0)**(2)/(2*s2**(2))))) >>> >> >>> >> def equ(arg): >>> >> ? ? s1,s2 = numpy.abs(arg) >>> >> ? ? cond1 = 0.5 - integrate.quad(pdf,-270,270,args=(s1,s2))[0] >>> >> ? ? cond2 = 0.9 - integrate.quad(pdf,-500,500,args=(s1,s2))[0] >>> >> ? ? return [cond1, cond2] >>> >> >>> >> result=fsolve(equ, [200., 1200]) >> >> thank you for your last reply...seems that the parameters of the two normals are nearly identical... anyway just two small addtional questions: >> >> 1)in fsolve(equ, [200., 1200]) the 200 and 1200 are kind of start values so far as I understand...how should these be choosen? what is recommended? > > There is no general solution for choosing starting values, in your > case it should be possible to > >>>> q = np.array([0.5, 0.9]) >>>> cr = x/stats.norm.ppf(0.5 + q/2.) >>>> x = [270, 500] >>>> q = np.array([0.5, 0.9]) >>>> x = [270, 500] >>>> cr = x/stats.norm.ppf(0.5 + q/2.) 
>>>> stats.norm.cdf(500, scale=cr[1]) - stats.norm.cdf(-500, scale=cr[1]) > 0.89999999999999991 ------- I forgot to remove the typos >>>> stats.norm.cdf(q[0], scale=cr[1]) - stats.norm.cdf(-q[0], scale=cr[0]) > 0.0011545021185267457 >>>> stats.norm.cdf(q[0], scale=cr[0]) - stats.norm.cdf(-q[0], scale=cr[0]) > 0.000996601515122153 --------- >>>> stats.norm.cdf(x[0], scale=cr[0]) - stats.norm.cdf(-x[0], scale=cr[0]) > 0.5 >>>> sol = fsolve(equ, np.sort(cr)) > > there are some numerical problems finding the solution (???) > >>>> equ(sol) > array([-0.05361093, ?0.05851309]) >>>> from pprint import pprint >>>> pprint(fsolve(equ, np.sort(cr), xtol=1e-10, full_output=1)) > (array([ 354.32616549, ?354.69918062]), > ?{'fjac': array([[-0.7373189 , -0.67554484], > ? ? ? [ 0.67554484, -0.7373189 ]]), > ?'fvec': array([-0.05361093, ?0.05851309]), > ?'nfev': 36, > ?'qtf': array([ ?1.40019135e-07, ?-7.93593929e-02]), > ?'r': array([ -5.21390161e-04, ?-1.21700831e-03, ? 3.88274320e-07])}, > ?5, > ?'The iteration is not making good progress, as measured by the \n > improvement from the last ten iterations.') > >> >> 2) How can that be solve if I have I third condition (overfitted) can that be used as well or how does the alternative look like? > > use optimize.leastsq on equ (I never tried this for this case) > use fmin on the sum of squared errors > > if the intervals for the probabilities are non-overlapping (interval > data), then there is an optimal weighting matrix, (but my code for > that in the statsmodels.sandbox is not verified). > > Josef > > >> >> /johannes >> >>> >> >>> >> print result >>> >> >>> >> but in the results I get the parameters are very close to each other >>> >> [-356.5283675 ? 353.82544075] >>> >> >>> >> the pdf looks just like a mixture of 2 normals both with loc=0, then >>> >> maybe the cdf of norm can be used directly >>> > >>> > >>> > Thank you for that hint... First yes these are 2 superimposed normals >>> but for other reasons I want to use the original formula instead of the >>> stats.functions... >>> > >>> > anyway there is still a thing...the locator s1 and s2 are like the scale >>> parameter of stats.norm so the are both + and -. For fsolve above it seems >>> that I get only one parameter (s1 or s2) but for the positive and negative >>> side of the distribution. So in actually there are four parameters -s1, >>> +s1, -s2, +s2. How can I solve that? Maybe I can restrict the fsolve to look >>> for the two values only in the positive range... >>> >>> It doesn't really matter, if the scale only shows up in quadratic >>> terms, or as in my initial change I added a absolute value, so whether >>> it's positive or negative, it's still only one value, and we >>> interprete it as postive scale >>> >>> s1 = sqrt(s1**2) >>> >>> Josef >>> >>> > >>> > any guesses? >>> > >>> > /J >>> > >>> >> >>> >> >>> from scipy import stats >>> >> >>> stats.norm.cdf(270, scale=350) - stats.norm.cdf(-270, scale=350) >>> >> 0.55954705470577526 >>> >> >>> >>> >> >>> stats.norm.cdf(270, scale=354) - stats.norm.cdf(-270, scale=354) >>> >> 0.55436474670960978 >>> >> >>> stats.norm.cdf(500, scale=354) - stats.norm.cdf(-500, scale=354) >>> >> 0.84217642881921018 >>> >> >>> >> Josef >>> >> > >>> >> > >>> >> > /Johannes >>> >> >> >>> >> >> > >>> >> >> > /Johannes >>> >> >> > >>> >> >> > -- >>> >> >> > NEU: FreePhone - kostenlos mobil telefonieren! 
>>> >> >> > Jetzt informieren: http://www.gmx.net/de/go/freephone >>> >> >> > _______________________________________________ >>> >> >> > SciPy-User mailing list >>> >> >> > SciPy-User at scipy.org >>> >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> >> > >>> >> >> _______________________________________________ >>> >> >> SciPy-User mailing list >>> >> >> SciPy-User at scipy.org >>> >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> > >>> >> > -- >>> >> > NEU: FreePhone - kostenlos mobil telefonieren! >>> >> > Jetzt informieren: http://www.gmx.net/de/go/freephone >>> >> > _______________________________________________ >>> >> > SciPy-User mailing list >>> >> > SciPy-User at scipy.org >>> >> > http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> > >>> >> _______________________________________________ >>> >> SciPy-User mailing list >>> >> SciPy-User at scipy.org >>> >> http://mail.scipy.org/mailman/listinfo/scipy-user >>> > >>> > -- >>> > NEU: FreePhone - kostenlos mobil telefonieren! >>> > Jetzt informieren: http://www.gmx.net/de/go/freephone >>> > _______________________________________________ >>> > SciPy-User mailing list >>> > SciPy-User at scipy.org >>> > http://mail.scipy.org/mailman/listinfo/scipy-user >>> > >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> -- >> NEU: FreePhone - kostenlos mobil telefonieren! >> Jetzt informieren: http://www.gmx.net/de/go/freephone >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > From josef.pktd at gmail.com Wed Jun 8 11:54:15 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 8 Jun 2011 11:54:15 -0400 Subject: [SciPy-User] How to fit a curve/function? In-Reply-To: References: <20110607113201.7B02.B1C76292@gmail.com> <20110608105217.222500@gmx.net> <20110608134101.259520@gmx.net> <20110608142743.77890@gmx.net> <20110608145625.162510@gmx.net> Message-ID: On Wed, Jun 8, 2011 at 11:37 AM, wrote: > On Wed, Jun 8, 2011 at 11:37 AM, ? wrote: >> On Wed, Jun 8, 2011 at 10:56 AM, Johannes Radinger wrote: >>> >>> -------- Original-Nachricht -------- >>>> Datum: Wed, 8 Jun 2011 10:33:45 -0400 >>>> Von: josef.pktd at gmail.com >>>> An: SciPy Users List >>>> Betreff: Re: [SciPy-User] How to fit a curve/function? >>> >>>> On Wed, Jun 8, 2011 at 10:27 AM, Johannes Radinger >>>> wrote: >>>> > >>>> > -------- Original-Nachricht -------- >>>> >> Datum: Wed, 8 Jun 2011 10:12:58 -0400 >>>> >> Von: josef.pktd at gmail.com >>>> >> An: SciPy Users List >>>> >> Betreff: Re: [SciPy-User] How to fit a curve/function? >>>> > >>>> >> On Wed, Jun 8, 2011 at 9:41 AM, Johannes Radinger >>>> >> wrote: >>>> >> > >>>> >> > -------- Original-Nachricht -------- >>>> >> >> Datum: Wed, 8 Jun 2011 07:10:38 -0400 >>>> >> >> Von: josef.pktd at gmail.com >>>> >> >> An: SciPy Users List >>>> >> >> Betreff: Re: [SciPy-User] How to fit a curve/function? >>>> >> > >>>> >> >> On Wed, Jun 8, 2011 at 6:52 AM, Johannes Radinger >>>> >> >> wrote: >>>> >> >> > Hello, >>>> >> >> > >>>> >> >> > I've got following function describing any kind of animal >>>> dispersal >>>> >> >> kernel: >>>> >> >> > >>>> >> >> > def pdf(x,s1,s2): >>>> >> >> > ? 
?return >>>> >> >> >>>> >> >>>> (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(s2*math.sqrt(2*math.pi))*numpy.exp(-((x-0)**(2)/(2*s2**(2))))) >>>> >> >> > >>>> >> >> > On the other hand I've got data from literature with which I want >>>> to >>>> >> fit >>>> >> >> the function so that I get s1, s2 and x. >>>> >> >> > Ususally the data in the literature are as follows: >>>> >> >> > >>>> >> >> > Example 1: 50% of the animals are between -270m and +270m and 90% >>>> >> ?are >>>> >> >> between -500m and + 500m >>>> >> >> > >>>> >> >> > Example 2: 84% is between - 5000 m and +5000m, and 73% are between >>>> >> >> -1000m and +1000m >>>> >> >> > >>>> >> >> > So far as I understand an integration of the function is needed to >>>> >> solve >>>> >> >> for s1 and s2 as all the literature data give percentage (area under >>>> >> the >>>> >> >> curve) Can that be used to fit the curve or can that create ranges >>>> for >>>> >> s1 >>>> >> >> and s2. >>>> >> >> >>>> >> >> I don't see a way around integration. >>>> >> >> >>>> >> >> If you have exactly 2 probabilities, then you can you a solver like >>>> >> >> scipy.optimize.fsolve to match the probabilites >>>> >> >> eg. >>>> >> >> 0.5 = integral pdf from -270 to 270 >>>> >> >> 0.9 = integral pdf from -500 to 500 >>>> >> >> >>>> >> >> If you have more than 2 probabilities, then using optimization of a >>>> >> >> weighted function of the moment conditions would be better. >>>> >> >> >>>> >> >> Josef >>>> >> > >>>> >> > >>>> >> > >>>> >> > Hello again >>>> >> > >>>> >> > I tried following, but without success so far. What do I have to do >>>> >> excactly... >>>> >> > >>>> >> > import numpy >>>> >> > from scipy import stats >>>> >> > from scipy import integrate >>>> >> > from scipy.optimize import fsolve >>>> >> > import math >>>> >> > >>>> >> > p=0.3 >>>> >> > >>>> >> > def pdf(x,s1,s2): >>>> >> > ? ?return >>>> >> >>>> (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(s2*math.sqrt(2*math.pi))*numpy.exp(-((x-0)**(2)/(2*s2**(2))))) >>>> >> > >>>> >> > def equ(s1,s2): >>>> >> > ? ?0.5==integrate.quad(pdf,-270,270,args=(s1,s2)) >>>> >> > ? ?0.9==integrate.quad(pdf,-500,500,args=(s1,s2)) >>>> >> > >>>> >> > result=fsolve(equ, 1,500) >>>> >> > >>>> >> > print result >>>> >> >>>> >> equ needs to return the deviation of the equations (I changed some >>>> >> details for s1 just to try it) >>>> >> >>>> >> import numpy >>>> >> from scipy import stats >>>> >> from scipy import integrate >>>> >> from scipy.optimize import fsolve >>>> >> import math >>>> >> >>>> >> p=0.3 >>>> >> >>>> >> def pdf(x,s1,s2): >>>> >> ? ? return >>>> >> >>>> (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(math.sqrt(2*math.pi*s2**2))*numpy.exp(-((x-0)**(2)/(2*s2**(2))))) >>>> >> >>>> >> def equ(arg): >>>> >> ? ? s1,s2 = numpy.abs(arg) >>>> >> ? ? cond1 = 0.5 - integrate.quad(pdf,-270,270,args=(s1,s2))[0] >>>> >> ? ? cond2 = 0.9 - integrate.quad(pdf,-500,500,args=(s1,s2))[0] >>>> >> ? ? return [cond1, cond2] >>>> >> >>>> >> result=fsolve(equ, [200., 1200]) >>> >>> thank you for your last reply...seems that the parameters of the two normals are nearly identical... anyway just two small addtional questions: >>> >>> 1)in fsolve(equ, [200., 1200]) the 200 and 1200 are kind of start values so far as I understand...how should these be choosen? what is recommended? 
>> >> There is no general solution for choosing starting values, in your >> case it should be possible to >> >>>>> q = np.array([0.5, 0.9]) >>>>> cr = x/stats.norm.ppf(0.5 + q/2.) >>>>> x = [270, 500] >>>>> q = np.array([0.5, 0.9]) >>>>> x = [270, 500] >>>>> cr = x/stats.norm.ppf(0.5 + q/2.) >>>>> stats.norm.cdf(500, scale=cr[1]) - stats.norm.cdf(-500, scale=cr[1]) >> 0.89999999999999991 > ------- > I forgot to remove the typos >>>>> stats.norm.cdf(q[0], scale=cr[1]) - stats.norm.cdf(-q[0], scale=cr[0]) >> 0.0011545021185267457 >>>>> stats.norm.cdf(q[0], scale=cr[0]) - stats.norm.cdf(-q[0], scale=cr[0]) >> 0.000996601515122153 > --------- >>>>> stats.norm.cdf(x[0], scale=cr[0]) - stats.norm.cdf(-x[0], scale=cr[0]) >> 0.5 >>>>> sol = fsolve(equ, np.sort(cr)) >> >> there are some numerical problems finding the solution (???) >> >>>>> equ(sol) >> array([-0.05361093, ?0.05851309]) >>>>> from pprint import pprint >>>>> pprint(fsolve(equ, np.sort(cr), xtol=1e-10, full_output=1)) >> (array([ 354.32616549, ?354.69918062]), >> ?{'fjac': array([[-0.7373189 , -0.67554484], >> ? ? ? [ 0.67554484, -0.7373189 ]]), >> ?'fvec': array([-0.05361093, ?0.05851309]), >> ?'nfev': 36, >> ?'qtf': array([ ?1.40019135e-07, ?-7.93593929e-02]), >> ?'r': array([ -5.21390161e-04, ?-1.21700831e-03, ? 3.88274320e-07])}, >> ?5, >> ?'The iteration is not making good progress, as measured by the \n >> improvement from the last ten iterations.') >> >>> >>> 2) How can that be solve if I have I third condition (overfitted) can that be used as well or how does the alternative look like? >> >> use optimize.leastsq on equ (I never tried this for this case) something is strange with the curvature in this problem, leastsq thinks the two scales are (essentially) identical, but the solution is not zero >>> ss = optimize.leastsq(equ, np.sort(cr)) >>> ss (array([ 354.5985618 , 354.59952267]), 1) Josef >> use fmin on the sum of squared errors >> >> if the intervals for the probabilities are non-overlapping (interval >> data), then there is an optimal weighting matrix, (but my code for >> that in the statsmodels.sandbox is not verified). >> >> Josef >> >> >>> >>> /johannes >>> >>>> >> >>>> >> print result >>>> >> >>>> >> but in the results I get the parameters are very close to each other >>>> >> [-356.5283675 ? 353.82544075] >>>> >> >>>> >> the pdf looks just like a mixture of 2 normals both with loc=0, then >>>> >> maybe the cdf of norm can be used directly >>>> > >>>> > >>>> > Thank you for that hint... First yes these are 2 superimposed normals >>>> but for other reasons I want to use the original formula instead of the >>>> stats.functions... >>>> > >>>> > anyway there is still a thing...the locator s1 and s2 are like the scale >>>> parameter of stats.norm so the are both + and -. For fsolve above it seems >>>> that I get only one parameter (s1 or s2) but for the positive and negative >>>> side of the distribution. So in actually there are four parameters -s1, >>>> +s1, -s2, +s2. How can I solve that? Maybe I can restrict the fsolve to look >>>> for the two values only in the positive range... >>>> >>>> It doesn't really matter, if the scale only shows up in quadratic >>>> terms, or as in my initial change I added a absolute value, so whether >>>> it's positive or negative, it's still only one value, and we >>>> interprete it as postive scale >>>> >>>> s1 = sqrt(s1**2) >>>> >>>> Josef >>>> >>>> > >>>> > any guesses? 
>>>> > >>>> > /J >>>> > >>>> >> >>>> >> >>> from scipy import stats >>>> >> >>> stats.norm.cdf(270, scale=350) - stats.norm.cdf(-270, scale=350) >>>> >> 0.55954705470577526 >>>> >> >>> >>>> >> >>> stats.norm.cdf(270, scale=354) - stats.norm.cdf(-270, scale=354) >>>> >> 0.55436474670960978 >>>> >> >>> stats.norm.cdf(500, scale=354) - stats.norm.cdf(-500, scale=354) >>>> >> 0.84217642881921018 >>>> >> >>>> >> Josef >>>> >> > >>>> >> > >>>> >> > /Johannes >>>> >> >> >>>> >> >> > >>>> >> >> > /Johannes >>>> >> >> > >>>> >> >> > -- >>>> >> >> > NEU: FreePhone - kostenlos mobil telefonieren! >>>> >> >> > Jetzt informieren: http://www.gmx.net/de/go/freephone >>>> >> >> > _______________________________________________ >>>> >> >> > SciPy-User mailing list >>>> >> >> > SciPy-User at scipy.org >>>> >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >> >> > >>>> >> >> _______________________________________________ >>>> >> >> SciPy-User mailing list >>>> >> >> SciPy-User at scipy.org >>>> >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >> > >>>> >> > -- >>>> >> > NEU: FreePhone - kostenlos mobil telefonieren! >>>> >> > Jetzt informieren: http://www.gmx.net/de/go/freephone >>>> >> > _______________________________________________ >>>> >> > SciPy-User mailing list >>>> >> > SciPy-User at scipy.org >>>> >> > http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >> > >>>> >> _______________________________________________ >>>> >> SciPy-User mailing list >>>> >> SciPy-User at scipy.org >>>> >> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> > >>>> > -- >>>> > NEU: FreePhone - kostenlos mobil telefonieren! >>>> > Jetzt informieren: http://www.gmx.net/de/go/freephone >>>> > _______________________________________________ >>>> > SciPy-User mailing list >>>> > SciPy-User at scipy.org >>>> > http://mail.scipy.org/mailman/listinfo/scipy-user >>>> > >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> -- >>> NEU: FreePhone - kostenlos mobil telefonieren! >>> Jetzt informieren: http://www.gmx.net/de/go/freephone >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> > From yyc at solvcon.net Wed Jun 8 17:09:31 2011 From: yyc at solvcon.net (Yung-Yu Chen) Date: Wed, 8 Jun 2011 17:09:31 -0400 Subject: [SciPy-User] ANN: SOLVCON 0.0.7 Message-ID: Hello, I am pleased to announce version 0.0.7 of SOLVCON. SOLVCON is a Python-based, multi-physics software framework for solving first-order hyperbolic PDEs. The source tarball can be downloaded at http://bitbucket.org/yungyuc/solvcon/downloads . More information can be found at http://solvcon.net/ . In this release, SOLVCON starts to support using incenters or centroids for constructing basic Conservation Elements (BCEs) of the CESE method. Incenters are only enabled for simplex cells. Three more examples for supersonic flows are also added, in addition to the new capability. New features: - A set of building scripts for dependencies of SOLVCON is written in ``ground/`` directory. A Python script ``ground/get`` download all depended source tarballs according to ``ground/get.ini``. A make file ``ground/Makefile`` directs the building with targets ``binary``, ``python``, ``vtk``. The targets must be built in order. 
An environment variable ``$SCPREFIX`` can be set when making to specify the destination of installation. The make file will create a shell script ``$SCROOT/bin/scvars.sh`` exporting necessary environment variables for using the customized runtime. ``$SCROOT`` is the installing destination (i.e., ``$SCPREFIX``), and is set in the shell script as well. - The center of a cell can now be calculated as an incenter. Use of incenter or centroid is controlled by a keyword parameter ``use_incenter`` of ``solvcon.block.Block`` constructor. This enables incenter-based CESE implementation that will benefit calculating Navier-Stokes equations in the future. - More examples for compressible inviscid flows are provided. Bug-fix: - A bug in coordiate transformation for wall boundary conditions of gas dynamics module (``solvcon.kerpak.gasdyn``). with regards, Yung-Yu Chen -- Yung-Yu Chen http://solvcon.net/yyc/ +1 (614) 859 2436 -------------- next part -------------- An HTML attachment was scrubbed... URL: From schut at sarvision.nl Thu Jun 9 04:39:55 2011 From: schut at sarvision.nl (Vincent Schut) Date: Thu, 09 Jun 2011 10:39:55 +0200 Subject: [SciPy-User] [job] Python Job at Hedge Fund In-Reply-To: References: Message-ID: >> Ha. I also thought about using: >> >>>> x = [c for c in x] >>>> rs = np.random.RandomState([1,2,3]) >>>> rs.shuffle(x) >>>> ''.join(x) >> 'oauoeyphjlot.nrdmorb at oerrg' >> >> Would that have cut down on the number of resumes? Not from this list. >> Give it a try. > > rs = np.random.RandomState([1,2,3]) > xs = 'oauoeyphjlot.nrdmorb at oerrg' > xs = [c for c in xs] > x = np.asarray(xs) > i = range(len(xs)) > rs.shuffle(i) > x[i] = xs > print ''.join(x) > > sorry it took so long, cooking lasagna in the meantime... :-) > > VS. or, slightly more elegant: rs = np.random.RandomState([1,2,3]) xs = 'oauoeyphjlot.nrdmorb at oerrg' xs = [c for c in xs] xs = np.asarray(xs) i = np.arange(len(xs)) rs.shuffle(i) print ''.join(xs[np.argsort(i)]) From matthieu.brucher at gmail.com Thu Jun 9 09:17:58 2011 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Thu, 9 Jun 2011 15:17:58 +0200 Subject: [SciPy-User] Conversion from 32bits IEEE floats to IBM floats Message-ID: Hi, I wondered if anyone had a conversion routine for 32 bits IEEE floats in an array to IBM floats (stored in a 4 bytes integers). I have a routine for doing the opposite, but not IEEE->IBM. There are codes in C or other languages, but the trick is starting from an array (don't know if it can be reinterpreted as an integer array easily). Matthieu -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: From lohani at iitk.ac.in Thu Jun 9 09:42:02 2011 From: lohani at iitk.ac.in (Vivek Lohani) Date: Thu, 9 Jun 2011 19:12:02 +0530 Subject: [SciPy-User] Eigen not working Message-ID: <821c0eb66547e970e1d84b824c02a267.squirrel@webmail.iitk.ac.in> Hi, I was trying to implement the code for diagonalizing sparse matrices through scipy.sparse.linalg.eigen but it turns out that for N=9 (which leads to 2^N X 2^N matrix) and the subsequent input paramter= f, which sets J=-1, I am not getting the correct output for -0.71=5) on my computer in a region -0.6 From sparkliang at gmail.com Thu Jun 9 10:39:07 2011 From: sparkliang at gmail.com (Spark Liang) Date: Thu, 9 Jun 2011 22:39:07 +0800 Subject: [SciPy-User] curve_fit cannot accept a function with list or array as parameters? 
Message-ID: Hi, I'm using scipy.optimize.curve_fit to fit two sets of data (x, y). But I found that curve_fit cannot accept a function with list or numpy.ndarray as parameters. For example, one of my function is : def testfunc(x, beta) a = beta[0] b = beta[1] c = beta[2] d = beta[3] return a+b*x+c*x**2+d*x**4 In my program, I create the parameters guess: c = [1, 2, 3]. When I using curve_fit as: popt, pcov = curve_fit(testfunc, x, y, p0=c). It threw the errors: TypeError: testfunc() takes exactly 2 arguments (4 given). How to resolve the problems ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Thu Jun 9 10:47:36 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 9 Jun 2011 10:47:36 -0400 Subject: [SciPy-User] curve_fit cannot accept a function with list or array as parameters? In-Reply-To: References: Message-ID: On Thu, Jun 9, 2011 at 10:39 AM, Spark Liang wrote: > Hi, I'm using scipy.optimize.curve_fit to fit two sets of data (x, y). But I > found that curve_fit cannot accept a function with list or numpy.ndarray as > parameters. > For example, one of my function is : > def testfunc(x, beta) > ?????? a = beta[0] > ?????? b = beta[1] > ?????? c = beta[2] > ?????? d = beta[3] > ?????? return a+b*x+c*x**2+d*x**4 > In my program, I create the parameters guess: c = [1, 2, 3].? When I using > curve_fit as: popt, pcov = curve_fit(testfunc, x, y, p0=c). It threw the > errors:? TypeError: testfunc() takes exactly 2 arguments (4 given). > How to resolve the problems ? add a * and it will unpack the iterable, array or list def testfunc(x, *beta) Josef > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From pav at iki.fi Thu Jun 9 10:47:50 2011 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 9 Jun 2011 14:47:50 +0000 (UTC) Subject: [SciPy-User] Eigen not working References: <821c0eb66547e970e1d84b824c02a267.squirrel@webmail.iitk.ac.in> Message-ID: Thu, 09 Jun 2011 19:12:02 +0530, Vivek Lohani wrote: > I was trying to implement the code for diagonalizing sparse matrices > through scipy.sparse.linalg.eigen but it turns out that for N=9 (which > leads to > 2^N X 2^N matrix) and the subsequent input paramter= f, which sets > J=-1, > I am not getting the correct output for -0.71 for lower values of N(>=5) on my computer in a region -0.6 unable to understand what is going wrong because i have a numpy routine > too which does the job correctly. Try upgrading to Scipy 0.9 (recommended), or specify a larger `maxiter`. In Scipy 0.8 and earlier, the Arpack eigenvalue routines essentially left convergence checking to the user, so if `maxiter` (default 20*n) is too small, they return non-converged results. -- Pauli Virtanen From JRadinger at gmx.at Thu Jun 9 11:52:46 2011 From: JRadinger at gmx.at (Johannes Radinger) Date: Thu, 09 Jun 2011 17:52:46 +0200 Subject: [SciPy-User] How to fit a curve/function? In-Reply-To: References: <20110607113201.7B02.B1C76292@gmail.com> <20110608105217.222500@gmx.net> <20110608134101.259520@gmx.net> <20110608142743.77890@gmx.net> <20110608145625.162510@gmx.net> Message-ID: <20110609155246.27670@gmx.net> Hello again... i try no to fit a curve using integrals as conditions. the scipy manual says that integrations to infinite are possible with Inf, I tried following but I fail (saying inf is not defined): cond2 = 5.0/10/2 - integrate.quad(pdf,35000,Inf,args=(s1,s2))[0] what causes the problem? 
Do I use quad/Inf in a wrong way? The error is: NameError: global name 'Inf' is not defined /Johannes -------- Original-Nachricht -------- > Datum: Wed, 8 Jun 2011 11:37:52 -0400 > Von: josef.pktd at gmail.com > An: SciPy Users List > Betreff: Re: [SciPy-User] How to fit a curve/function? > On Wed, Jun 8, 2011 at 11:37 AM, wrote: > > On Wed, Jun 8, 2011 at 10:56 AM, Johannes Radinger > wrote: > >> > >> -------- Original-Nachricht -------- > >>> Datum: Wed, 8 Jun 2011 10:33:45 -0400 > >>> Von: josef.pktd at gmail.com > >>> An: SciPy Users List > >>> Betreff: Re: [SciPy-User] How to fit a curve/function? > >> > >>> On Wed, Jun 8, 2011 at 10:27 AM, Johannes Radinger > >>> wrote: > >>> > > >>> > -------- Original-Nachricht -------- > >>> >> Datum: Wed, 8 Jun 2011 10:12:58 -0400 > >>> >> Von: josef.pktd at gmail.com > >>> >> An: SciPy Users List > >>> >> Betreff: Re: [SciPy-User] How to fit a curve/function? > >>> > > >>> >> On Wed, Jun 8, 2011 at 9:41 AM, Johannes Radinger > > >>> >> wrote: > >>> >> > > >>> >> > -------- Original-Nachricht -------- > >>> >> >> Datum: Wed, 8 Jun 2011 07:10:38 -0400 > >>> >> >> Von: josef.pktd at gmail.com > >>> >> >> An: SciPy Users List > >>> >> >> Betreff: Re: [SciPy-User] How to fit a curve/function? > >>> >> > > >>> >> >> On Wed, Jun 8, 2011 at 6:52 AM, Johannes Radinger > > >>> >> >> wrote: > >>> >> >> > Hello, > >>> >> >> > > >>> >> >> > I've got following function describing any kind of animal > >>> dispersal > >>> >> >> kernel: > >>> >> >> > > >>> >> >> > def pdf(x,s1,s2): > >>> >> >> > ? ?return > >>> >> >> > >>> >> > >>> > (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(s2*math.sqrt(2*math.pi))*numpy.exp(-((x-0)**(2)/(2*s2**(2))))) > >>> >> >> > > >>> >> >> > On the other hand I've got data from literature with which I > want > >>> to > >>> >> fit > >>> >> >> the function so that I get s1, s2 and x. > >>> >> >> > Ususally the data in the literature are as follows: > >>> >> >> > > >>> >> >> > Example 1: 50% of the animals are between -270m and +270m and > 90% > >>> >> ?are > >>> >> >> between -500m and + 500m > >>> >> >> > > >>> >> >> > Example 2: 84% is between - 5000 m and +5000m, and 73% are > between > >>> >> >> -1000m and +1000m > >>> >> >> > > >>> >> >> > So far as I understand an integration of the function is > needed to > >>> >> solve > >>> >> >> for s1 and s2 as all the literature data give percentage (area > under > >>> >> the > >>> >> >> curve) Can that be used to fit the curve or can that create > ranges > >>> for > >>> >> s1 > >>> >> >> and s2. > >>> >> >> > >>> >> >> I don't see a way around integration. > >>> >> >> > >>> >> >> If you have exactly 2 probabilities, then you can you a solver > like > >>> >> >> scipy.optimize.fsolve to match the probabilites > >>> >> >> eg. > >>> >> >> 0.5 = integral pdf from -270 to 270 > >>> >> >> 0.9 = integral pdf from -500 to 500 > >>> >> >> > >>> >> >> If you have more than 2 probabilities, then using optimization > of a > >>> >> >> weighted function of the moment conditions would be better. > >>> >> >> > >>> >> >> Josef > >>> >> > > >>> >> > > >>> >> > > >>> >> > Hello again > >>> >> > > >>> >> > I tried following, but without success so far. What do I have to > do > >>> >> excactly... > >>> >> > > >>> >> > import numpy > >>> >> > from scipy import stats > >>> >> > from scipy import integrate > >>> >> > from scipy.optimize import fsolve > >>> >> > import math > >>> >> > > >>> >> > p=0.3 > >>> >> > > >>> >> > def pdf(x,s1,s2): > >>> >> > ? 
?return > >>> >> > >>> > (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(s2*math.sqrt(2*math.pi))*numpy.exp(-((x-0)**(2)/(2*s2**(2))))) > >>> >> > > >>> >> > def equ(s1,s2): > >>> >> > ? ?0.5==integrate.quad(pdf,-270,270,args=(s1,s2)) > >>> >> > ? ?0.9==integrate.quad(pdf,-500,500,args=(s1,s2)) > >>> >> > > >>> >> > result=fsolve(equ, 1,500) > >>> >> > > >>> >> > print result > >>> >> > >>> >> equ needs to return the deviation of the equations (I changed some > >>> >> details for s1 just to try it) > >>> >> > >>> >> import numpy > >>> >> from scipy import stats > >>> >> from scipy import integrate > >>> >> from scipy.optimize import fsolve > >>> >> import math > >>> >> > >>> >> p=0.3 > >>> >> > >>> >> def pdf(x,s1,s2): > >>> >> ? ? return > >>> >> > >>> > (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(math.sqrt(2*math.pi*s2**2))*numpy.exp(-((x-0)**(2)/(2*s2**(2))))) > >>> >> > >>> >> def equ(arg): > >>> >> ? ? s1,s2 = numpy.abs(arg) > >>> >> ? ? cond1 = 0.5 - integrate.quad(pdf,-270,270,args=(s1,s2))[0] > >>> >> ? ? cond2 = 0.9 - integrate.quad(pdf,-500,500,args=(s1,s2))[0] > >>> >> ? ? return [cond1, cond2] > >>> >> > >>> >> result=fsolve(equ, [200., 1200]) > >> > >> thank you for your last reply...seems that the parameters of the two > normals are nearly identical... anyway just two small addtional questions: > >> > >> 1)in fsolve(equ, [200., 1200]) the 200 and 1200 are kind of start > values so far as I understand...how should these be choosen? what is > recommended? > > > > There is no general solution for choosing starting values, in your > > case it should be possible to > > > >>>> q = np.array([0.5, 0.9]) > >>>> cr = x/stats.norm.ppf(0.5 + q/2.) > >>>> x = [270, 500] > >>>> q = np.array([0.5, 0.9]) > >>>> x = [270, 500] > >>>> cr = x/stats.norm.ppf(0.5 + q/2.) > >>>> stats.norm.cdf(500, scale=cr[1]) - stats.norm.cdf(-500, scale=cr[1]) > > 0.89999999999999991 > ------- > I forgot to remove the typos > >>>> stats.norm.cdf(q[0], scale=cr[1]) - stats.norm.cdf(-q[0], > scale=cr[0]) > > 0.0011545021185267457 > >>>> stats.norm.cdf(q[0], scale=cr[0]) - stats.norm.cdf(-q[0], > scale=cr[0]) > > 0.000996601515122153 > --------- > >>>> stats.norm.cdf(x[0], scale=cr[0]) - stats.norm.cdf(-x[0], > scale=cr[0]) > > 0.5 > >>>> sol = fsolve(equ, np.sort(cr)) > > > > there are some numerical problems finding the solution (???) > > > >>>> equ(sol) > > array([-0.05361093, ?0.05851309]) > >>>> from pprint import pprint > >>>> pprint(fsolve(equ, np.sort(cr), xtol=1e-10, full_output=1)) > > (array([ 354.32616549, ?354.69918062]), > > ?{'fjac': array([[-0.7373189 , -0.67554484], > > ? ? ? [ 0.67554484, -0.7373189 ]]), > > ?'fvec': array([-0.05361093, ?0.05851309]), > > ?'nfev': 36, > > ?'qtf': array([ ?1.40019135e-07, ?-7.93593929e-02]), > > ?'r': array([ -5.21390161e-04, ?-1.21700831e-03, ? 3.88274320e-07])}, > > ?5, > > ?'The iteration is not making good progress, as measured by the \n > > improvement from the last ten iterations.') > > > >> > >> 2) How can that be solve if I have I third condition (overfitted) can > that be used as well or how does the alternative look like? > > > > use optimize.leastsq on equ (I never tried this for this case) > > use fmin on the sum of squared errors > > > > if the intervals for the probabilities are non-overlapping (interval > > data), then there is an optimal weighting matrix, (but my code for > > that in the statsmodels.sandbox is not verified). 
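(Roughly what that leastsq route could look like with a third condition; an untested sketch, and the extra interval, 95% of the animals within 700 m, is invented here purely to illustrate the overdetermined case:

import numpy
from scipy import integrate, optimize

p = 0.3

def pdf(x, s1, s2):
    return (p/(numpy.sqrt(2*numpy.pi*s1**2))*numpy.exp(-x**2/(2*s1**2))
            + (1-p)/(numpy.sqrt(2*numpy.pi*s2**2))*numpy.exp(-x**2/(2*s2**2)))

# interval half-widths and probabilities; the third pair is made up
bounds = [270., 500., 700.]
probs = [0.5, 0.9, 0.95]

def resid(arg):
    s1, s2 = numpy.abs(arg)
    return [q - integrate.quad(pdf, -b, b, args=(s1, s2))[0]
            for b, q in zip(bounds, probs)]

# leastsq minimizes the sum of squared deviations of the conditions
sol, ier = optimize.leastsq(resid, [200., 1200.])
print sol

The fmin variant mentioned above would just minimize numpy.sum(numpy.asarray(resid(arg))**2) instead.)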
> > > > Josef > > > > > >> > >> /johannes > >> > >>> >> > >>> >> print result > >>> >> > >>> >> but in the results I get the parameters are very close to each > other > >>> >> [-356.5283675 ? 353.82544075] > >>> >> > >>> >> the pdf looks just like a mixture of 2 normals both with loc=0, > then > >>> >> maybe the cdf of norm can be used directly > >>> > > >>> > > >>> > Thank you for that hint... First yes these are 2 superimposed > normals > >>> but for other reasons I want to use the original formula instead of > the > >>> stats.functions... > >>> > > >>> > anyway there is still a thing...the locator s1 and s2 are like the > scale > >>> parameter of stats.norm so the are both + and -. For fsolve above it > seems > >>> that I get only one parameter (s1 or s2) but for the positive and > negative > >>> side of the distribution. So in actually there are four parameters > -s1, > >>> +s1, -s2, +s2. How can I solve that? Maybe I can restrict the fsolve > to look > >>> for the two values only in the positive range... > >>> > >>> It doesn't really matter, if the scale only shows up in quadratic > >>> terms, or as in my initial change I added a absolute value, so whether > >>> it's positive or negative, it's still only one value, and we > >>> interprete it as postive scale > >>> > >>> s1 = sqrt(s1**2) > >>> > >>> Josef > >>> > >>> > > >>> > any guesses? > >>> > > >>> > /J > >>> > > >>> >> > >>> >> >>> from scipy import stats > >>> >> >>> stats.norm.cdf(270, scale=350) - stats.norm.cdf(-270, > scale=350) > >>> >> 0.55954705470577526 > >>> >> >>> > >>> >> >>> stats.norm.cdf(270, scale=354) - stats.norm.cdf(-270, > scale=354) > >>> >> 0.55436474670960978 > >>> >> >>> stats.norm.cdf(500, scale=354) - stats.norm.cdf(-500, > scale=354) > >>> >> 0.84217642881921018 > >>> >> > >>> >> Josef > >>> >> > > >>> >> > > >>> >> > /Johannes > >>> >> >> > >>> >> >> > > >>> >> >> > /Johannes > >>> >> >> > > >>> >> >> > -- > >>> >> >> > NEU: FreePhone - kostenlos mobil telefonieren! > >>> >> >> > Jetzt informieren: http://www.gmx.net/de/go/freephone > >>> >> >> > _______________________________________________ > >>> >> >> > SciPy-User mailing list > >>> >> >> > SciPy-User at scipy.org > >>> >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user > >>> >> >> > > >>> >> >> _______________________________________________ > >>> >> >> SciPy-User mailing list > >>> >> >> SciPy-User at scipy.org > >>> >> >> http://mail.scipy.org/mailman/listinfo/scipy-user > >>> >> > > >>> >> > -- > >>> >> > NEU: FreePhone - kostenlos mobil telefonieren! > >>> >> > Jetzt informieren: http://www.gmx.net/de/go/freephone > >>> >> > _______________________________________________ > >>> >> > SciPy-User mailing list > >>> >> > SciPy-User at scipy.org > >>> >> > http://mail.scipy.org/mailman/listinfo/scipy-user > >>> >> > > >>> >> _______________________________________________ > >>> >> SciPy-User mailing list > >>> >> SciPy-User at scipy.org > >>> >> http://mail.scipy.org/mailman/listinfo/scipy-user > >>> > > >>> > -- > >>> > NEU: FreePhone - kostenlos mobil telefonieren! > >>> > Jetzt informieren: http://www.gmx.net/de/go/freephone > >>> > _______________________________________________ > >>> > SciPy-User mailing list > >>> > SciPy-User at scipy.org > >>> > http://mail.scipy.org/mailman/listinfo/scipy-user > >>> > > >>> _______________________________________________ > >>> SciPy-User mailing list > >>> SciPy-User at scipy.org > >>> http://mail.scipy.org/mailman/listinfo/scipy-user > >> > >> -- > >> NEU: FreePhone - kostenlos mobil telefonieren! 
> >> Jetzt informieren: http://www.gmx.net/de/go/freephone > >> _______________________________________________ > >> SciPy-User mailing list > >> SciPy-User at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-user > >> > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user -- NEU: FreePhone - kostenlos mobil telefonieren! Jetzt informieren: http://www.gmx.net/de/go/freephone From JRadinger at gmx.at Thu Jun 9 11:52:46 2011 From: JRadinger at gmx.at (Johannes Radinger) Date: Thu, 09 Jun 2011 17:52:46 +0200 Subject: [SciPy-User] How to fit a curve/function? In-Reply-To: References: <20110607113201.7B02.B1C76292@gmail.com> <20110608105217.222500@gmx.net> <20110608134101.259520@gmx.net> <20110608142743.77890@gmx.net> <20110608145625.162510@gmx.net> Message-ID: <20110609155246.27670@gmx.net> Hello again... i try no to fit a curve using integrals as conditions. the scipy manual says that integrations to infinite are possible with Inf, I tried following but I fail (saying inf is not defined): cond2 = 5.0/10/2 - integrate.quad(pdf,35000,Inf,args=(s1,s2))[0] what causes the problem? Do I use quad/Inf in a wrong way? The error is: NameError: global name 'Inf' is not defined /Johannes -------- Original-Nachricht -------- > Datum: Wed, 8 Jun 2011 11:37:52 -0400 > Von: josef.pktd at gmail.com > An: SciPy Users List > Betreff: Re: [SciPy-User] How to fit a curve/function? > On Wed, Jun 8, 2011 at 11:37 AM, wrote: > > On Wed, Jun 8, 2011 at 10:56 AM, Johannes Radinger > wrote: > >> > >> -------- Original-Nachricht -------- > >>> Datum: Wed, 8 Jun 2011 10:33:45 -0400 > >>> Von: josef.pktd at gmail.com > >>> An: SciPy Users List > >>> Betreff: Re: [SciPy-User] How to fit a curve/function? > >> > >>> On Wed, Jun 8, 2011 at 10:27 AM, Johannes Radinger > >>> wrote: > >>> > > >>> > -------- Original-Nachricht -------- > >>> >> Datum: Wed, 8 Jun 2011 10:12:58 -0400 > >>> >> Von: josef.pktd at gmail.com > >>> >> An: SciPy Users List > >>> >> Betreff: Re: [SciPy-User] How to fit a curve/function? > >>> > > >>> >> On Wed, Jun 8, 2011 at 9:41 AM, Johannes Radinger > > >>> >> wrote: > >>> >> > > >>> >> > -------- Original-Nachricht -------- > >>> >> >> Datum: Wed, 8 Jun 2011 07:10:38 -0400 > >>> >> >> Von: josef.pktd at gmail.com > >>> >> >> An: SciPy Users List > >>> >> >> Betreff: Re: [SciPy-User] How to fit a curve/function? > >>> >> > > >>> >> >> On Wed, Jun 8, 2011 at 6:52 AM, Johannes Radinger > > >>> >> >> wrote: > >>> >> >> > Hello, > >>> >> >> > > >>> >> >> > I've got following function describing any kind of animal > >>> dispersal > >>> >> >> kernel: > >>> >> >> > > >>> >> >> > def pdf(x,s1,s2): > >>> >> >> > ? ?return > >>> >> >> > >>> >> > >>> > (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(s2*math.sqrt(2*math.pi))*numpy.exp(-((x-0)**(2)/(2*s2**(2))))) > >>> >> >> > > >>> >> >> > On the other hand I've got data from literature with which I > want > >>> to > >>> >> fit > >>> >> >> the function so that I get s1, s2 and x. 
> >>> >> >> > Ususally the data in the literature are as follows: > >>> >> >> > > >>> >> >> > Example 1: 50% of the animals are between -270m and +270m and > 90% > >>> >> ?are > >>> >> >> between -500m and + 500m > >>> >> >> > > >>> >> >> > Example 2: 84% is between - 5000 m and +5000m, and 73% are > between > >>> >> >> -1000m and +1000m > >>> >> >> > > >>> >> >> > So far as I understand an integration of the function is > needed to > >>> >> solve > >>> >> >> for s1 and s2 as all the literature data give percentage (area > under > >>> >> the > >>> >> >> curve) Can that be used to fit the curve or can that create > ranges > >>> for > >>> >> s1 > >>> >> >> and s2. > >>> >> >> > >>> >> >> I don't see a way around integration. > >>> >> >> > >>> >> >> If you have exactly 2 probabilities, then you can you a solver > like > >>> >> >> scipy.optimize.fsolve to match the probabilites > >>> >> >> eg. > >>> >> >> 0.5 = integral pdf from -270 to 270 > >>> >> >> 0.9 = integral pdf from -500 to 500 > >>> >> >> > >>> >> >> If you have more than 2 probabilities, then using optimization > of a > >>> >> >> weighted function of the moment conditions would be better. > >>> >> >> > >>> >> >> Josef > >>> >> > > >>> >> > > >>> >> > > >>> >> > Hello again > >>> >> > > >>> >> > I tried following, but without success so far. What do I have to > do > >>> >> excactly... > >>> >> > > >>> >> > import numpy > >>> >> > from scipy import stats > >>> >> > from scipy import integrate > >>> >> > from scipy.optimize import fsolve > >>> >> > import math > >>> >> > > >>> >> > p=0.3 > >>> >> > > >>> >> > def pdf(x,s1,s2): > >>> >> > ? ?return > >>> >> > >>> > (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(s2*math.sqrt(2*math.pi))*numpy.exp(-((x-0)**(2)/(2*s2**(2))))) > >>> >> > > >>> >> > def equ(s1,s2): > >>> >> > ? ?0.5==integrate.quad(pdf,-270,270,args=(s1,s2)) > >>> >> > ? ?0.9==integrate.quad(pdf,-500,500,args=(s1,s2)) > >>> >> > > >>> >> > result=fsolve(equ, 1,500) > >>> >> > > >>> >> > print result > >>> >> > >>> >> equ needs to return the deviation of the equations (I changed some > >>> >> details for s1 just to try it) > >>> >> > >>> >> import numpy > >>> >> from scipy import stats > >>> >> from scipy import integrate > >>> >> from scipy.optimize import fsolve > >>> >> import math > >>> >> > >>> >> p=0.3 > >>> >> > >>> >> def pdf(x,s1,s2): > >>> >> ? ? return > >>> >> > >>> > (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(math.sqrt(2*math.pi*s2**2))*numpy.exp(-((x-0)**(2)/(2*s2**(2))))) > >>> >> > >>> >> def equ(arg): > >>> >> ? ? s1,s2 = numpy.abs(arg) > >>> >> ? ? cond1 = 0.5 - integrate.quad(pdf,-270,270,args=(s1,s2))[0] > >>> >> ? ? cond2 = 0.9 - integrate.quad(pdf,-500,500,args=(s1,s2))[0] > >>> >> ? ? return [cond1, cond2] > >>> >> > >>> >> result=fsolve(equ, [200., 1200]) > >> > >> thank you for your last reply...seems that the parameters of the two > normals are nearly identical... anyway just two small addtional questions: > >> > >> 1)in fsolve(equ, [200., 1200]) the 200 and 1200 are kind of start > values so far as I understand...how should these be choosen? what is > recommended? > > > > There is no general solution for choosing starting values, in your > > case it should be possible to > > > >>>> q = np.array([0.5, 0.9]) > >>>> cr = x/stats.norm.ppf(0.5 + q/2.) > >>>> x = [270, 500] > >>>> q = np.array([0.5, 0.9]) > >>>> x = [270, 500] > >>>> cr = x/stats.norm.ppf(0.5 + q/2.) 
> >>>> stats.norm.cdf(500, scale=cr[1]) - stats.norm.cdf(-500, scale=cr[1]) > > 0.89999999999999991 > ------- > I forgot to remove the typos > >>>> stats.norm.cdf(q[0], scale=cr[1]) - stats.norm.cdf(-q[0], > scale=cr[0]) > > 0.0011545021185267457 > >>>> stats.norm.cdf(q[0], scale=cr[0]) - stats.norm.cdf(-q[0], > scale=cr[0]) > > 0.000996601515122153 > --------- > >>>> stats.norm.cdf(x[0], scale=cr[0]) - stats.norm.cdf(-x[0], > scale=cr[0]) > > 0.5 > >>>> sol = fsolve(equ, np.sort(cr)) > > > > there are some numerical problems finding the solution (???) > > > >>>> equ(sol) > > array([-0.05361093, ?0.05851309]) > >>>> from pprint import pprint > >>>> pprint(fsolve(equ, np.sort(cr), xtol=1e-10, full_output=1)) > > (array([ 354.32616549, ?354.69918062]), > > ?{'fjac': array([[-0.7373189 , -0.67554484], > > ? ? ? [ 0.67554484, -0.7373189 ]]), > > ?'fvec': array([-0.05361093, ?0.05851309]), > > ?'nfev': 36, > > ?'qtf': array([ ?1.40019135e-07, ?-7.93593929e-02]), > > ?'r': array([ -5.21390161e-04, ?-1.21700831e-03, ? 3.88274320e-07])}, > > ?5, > > ?'The iteration is not making good progress, as measured by the \n > > improvement from the last ten iterations.') > > > >> > >> 2) How can that be solve if I have I third condition (overfitted) can > that be used as well or how does the alternative look like? > > > > use optimize.leastsq on equ (I never tried this for this case) > > use fmin on the sum of squared errors > > > > if the intervals for the probabilities are non-overlapping (interval > > data), then there is an optimal weighting matrix, (but my code for > > that in the statsmodels.sandbox is not verified). > > > > Josef > > > > > >> > >> /johannes > >> > >>> >> > >>> >> print result > >>> >> > >>> >> but in the results I get the parameters are very close to each > other > >>> >> [-356.5283675 ? 353.82544075] > >>> >> > >>> >> the pdf looks just like a mixture of 2 normals both with loc=0, > then > >>> >> maybe the cdf of norm can be used directly > >>> > > >>> > > >>> > Thank you for that hint... First yes these are 2 superimposed > normals > >>> but for other reasons I want to use the original formula instead of > the > >>> stats.functions... > >>> > > >>> > anyway there is still a thing...the locator s1 and s2 are like the > scale > >>> parameter of stats.norm so the are both + and -. For fsolve above it > seems > >>> that I get only one parameter (s1 or s2) but for the positive and > negative > >>> side of the distribution. So in actually there are four parameters > -s1, > >>> +s1, -s2, +s2. How can I solve that? Maybe I can restrict the fsolve > to look > >>> for the two values only in the positive range... > >>> > >>> It doesn't really matter, if the scale only shows up in quadratic > >>> terms, or as in my initial change I added a absolute value, so whether > >>> it's positive or negative, it's still only one value, and we > >>> interprete it as postive scale > >>> > >>> s1 = sqrt(s1**2) > >>> > >>> Josef > >>> > >>> > > >>> > any guesses? 
> >>> > > >>> > /J > >>> > > >>> >> > >>> >> >>> from scipy import stats > >>> >> >>> stats.norm.cdf(270, scale=350) - stats.norm.cdf(-270, > scale=350) > >>> >> 0.55954705470577526 > >>> >> >>> > >>> >> >>> stats.norm.cdf(270, scale=354) - stats.norm.cdf(-270, > scale=354) > >>> >> 0.55436474670960978 > >>> >> >>> stats.norm.cdf(500, scale=354) - stats.norm.cdf(-500, > scale=354) > >>> >> 0.84217642881921018 > >>> >> > >>> >> Josef > >>> >> > > >>> >> > > >>> >> > /Johannes > >>> >> >> > >>> >> >> > > >>> >> >> > /Johannes > >>> >> >> > > >>> >> >> > -- > >>> >> >> > NEU: FreePhone - kostenlos mobil telefonieren! > >>> >> >> > Jetzt informieren: http://www.gmx.net/de/go/freephone > >>> >> >> > _______________________________________________ > >>> >> >> > SciPy-User mailing list > >>> >> >> > SciPy-User at scipy.org > >>> >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user > >>> >> >> > > >>> >> >> _______________________________________________ > >>> >> >> SciPy-User mailing list > >>> >> >> SciPy-User at scipy.org > >>> >> >> http://mail.scipy.org/mailman/listinfo/scipy-user > >>> >> > > >>> >> > -- > >>> >> > NEU: FreePhone - kostenlos mobil telefonieren! > >>> >> > Jetzt informieren: http://www.gmx.net/de/go/freephone > >>> >> > _______________________________________________ > >>> >> > SciPy-User mailing list > >>> >> > SciPy-User at scipy.org > >>> >> > http://mail.scipy.org/mailman/listinfo/scipy-user > >>> >> > > >>> >> _______________________________________________ > >>> >> SciPy-User mailing list > >>> >> SciPy-User at scipy.org > >>> >> http://mail.scipy.org/mailman/listinfo/scipy-user > >>> > > >>> > -- > >>> > NEU: FreePhone - kostenlos mobil telefonieren! > >>> > Jetzt informieren: http://www.gmx.net/de/go/freephone > >>> > _______________________________________________ > >>> > SciPy-User mailing list > >>> > SciPy-User at scipy.org > >>> > http://mail.scipy.org/mailman/listinfo/scipy-user > >>> > > >>> _______________________________________________ > >>> SciPy-User mailing list > >>> SciPy-User at scipy.org > >>> http://mail.scipy.org/mailman/listinfo/scipy-user > >> > >> -- > >> NEU: FreePhone - kostenlos mobil telefonieren! > >> Jetzt informieren: http://www.gmx.net/de/go/freephone > >> _______________________________________________ > >> SciPy-User mailing list > >> SciPy-User at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-user > >> > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user -- NEU: FreePhone - kostenlos mobil telefonieren! Jetzt informieren: http://www.gmx.net/de/go/freephone From josef.pktd at gmail.com Thu Jun 9 12:07:47 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 9 Jun 2011 12:07:47 -0400 Subject: [SciPy-User] How to fit a curve/function? In-Reply-To: <20110609155246.27670@gmx.net> References: <20110607113201.7B02.B1C76292@gmail.com> <20110608105217.222500@gmx.net> <20110608134101.259520@gmx.net> <20110608142743.77890@gmx.net> <20110608145625.162510@gmx.net> <20110609155246.27670@gmx.net> Message-ID: On Thu, Jun 9, 2011 at 11:52 AM, Johannes Radinger wrote: > Hello again... > > i try no to fit a curve using integrals as conditions. > the scipy manual says that integrations to infinite are possible with Inf, > > I tried following but I fail (saying inf is not defined): > > cond2 = 5.0/10/2 - integrate.quad(pdf,35000,Inf,args=(s1,s2))[0] > > what causes the problem? Do I use quad/Inf in a wrong way? 
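For reference, a minimal sketch of the call with an infinite upper limit, reusing the two-part mixture pdf from earlier in the thread; the mixture weight p follows the thread, and the scale values here are only placeholders so the call runs. The key point is that the upper limit has to be numpy's inf (np.inf), since plain Python defines no bare Inf name:

import numpy as np
from scipy import integrate

p = 0.3  # mixture weight, as used earlier in the thread

def pdf(x, s1, s2):
    # two superimposed zero-mean normals, as posted earlier in the thread
    return (p / np.sqrt(2 * np.pi * s1**2) * np.exp(-x**2 / (2. * s1**2))
            + (1 - p) / np.sqrt(2 * np.pi * s2**2) * np.exp(-x**2 / (2. * s2**2)))

s1, s2 = 300.0, 1500.0   # made-up scales, just for illustration
# np.inf (not a bare Inf) as the upper integration limit
tail = integrate.quad(pdf, 35000, np.inf, args=(s1, s2))[0]
cond2 = 5.0/10/2 - tail
print(tail)
print(cond2)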
numpy.inf inf doesn't exist in python itself Josef > > The error is: > NameError: global name 'Inf' is not defined > > /Johannes > > -------- Original-Nachricht -------- >> Datum: Wed, 8 Jun 2011 11:37:52 -0400 >> Von: josef.pktd at gmail.com >> An: SciPy Users List >> Betreff: Re: [SciPy-User] How to fit a curve/function? > >> On Wed, Jun 8, 2011 at 11:37 AM, ? wrote: >> > On Wed, Jun 8, 2011 at 10:56 AM, Johannes Radinger >> wrote: >> >> >> >> -------- Original-Nachricht -------- >> >>> Datum: Wed, 8 Jun 2011 10:33:45 -0400 >> >>> Von: josef.pktd at gmail.com >> >>> An: SciPy Users List >> >>> Betreff: Re: [SciPy-User] How to fit a curve/function? >> >> >> >>> On Wed, Jun 8, 2011 at 10:27 AM, Johannes Radinger >> >>> wrote: >> >>> > >> >>> > -------- Original-Nachricht -------- >> >>> >> Datum: Wed, 8 Jun 2011 10:12:58 -0400 >> >>> >> Von: josef.pktd at gmail.com >> >>> >> An: SciPy Users List >> >>> >> Betreff: Re: [SciPy-User] How to fit a curve/function? >> >>> > >> >>> >> On Wed, Jun 8, 2011 at 9:41 AM, Johannes Radinger >> >> >>> >> wrote: >> >>> >> > >> >>> >> > -------- Original-Nachricht -------- >> >>> >> >> Datum: Wed, 8 Jun 2011 07:10:38 -0400 >> >>> >> >> Von: josef.pktd at gmail.com >> >>> >> >> An: SciPy Users List >> >>> >> >> Betreff: Re: [SciPy-User] How to fit a curve/function? >> >>> >> > >> >>> >> >> On Wed, Jun 8, 2011 at 6:52 AM, Johannes Radinger >> >> >>> >> >> wrote: >> >>> >> >> > Hello, >> >>> >> >> > >> >>> >> >> > I've got following function describing any kind of animal >> >>> dispersal >> >>> >> >> kernel: >> >>> >> >> > >> >>> >> >> > def pdf(x,s1,s2): >> >>> >> >> > ? ?return >> >>> >> >> >> >>> >> >> >>> >> (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(s2*math.sqrt(2*math.pi))*numpy.exp(-((x-0)**(2)/(2*s2**(2))))) >> >>> >> >> > >> >>> >> >> > On the other hand I've got data from literature with which I >> want >> >>> to >> >>> >> fit >> >>> >> >> the function so that I get s1, s2 and x. >> >>> >> >> > Ususally the data in the literature are as follows: >> >>> >> >> > >> >>> >> >> > Example 1: 50% of the animals are between -270m and +270m and >> 90% >> >>> >> ?are >> >>> >> >> between -500m and + 500m >> >>> >> >> > >> >>> >> >> > Example 2: 84% is between - 5000 m and +5000m, and 73% are >> between >> >>> >> >> -1000m and +1000m >> >>> >> >> > >> >>> >> >> > So far as I understand an integration of the function is >> needed to >> >>> >> solve >> >>> >> >> for s1 and s2 as all the literature data give percentage (area >> under >> >>> >> the >> >>> >> >> curve) Can that be used to fit the curve or can that create >> ranges >> >>> for >> >>> >> s1 >> >>> >> >> and s2. >> >>> >> >> >> >>> >> >> I don't see a way around integration. >> >>> >> >> >> >>> >> >> If you have exactly 2 probabilities, then you can you a solver >> like >> >>> >> >> scipy.optimize.fsolve to match the probabilites >> >>> >> >> eg. >> >>> >> >> 0.5 = integral pdf from -270 to 270 >> >>> >> >> 0.9 = integral pdf from -500 to 500 >> >>> >> >> >> >>> >> >> If you have more than 2 probabilities, then using optimization >> of a >> >>> >> >> weighted function of the moment conditions would be better. >> >>> >> >> >> >>> >> >> Josef >> >>> >> > >> >>> >> > >> >>> >> > >> >>> >> > Hello again >> >>> >> > >> >>> >> > I tried following, but without success so far. What do I have to >> do >> >>> >> excactly... 
>> >>> >> > >> >>> >> > import numpy >> >>> >> > from scipy import stats >> >>> >> > from scipy import integrate >> >>> >> > from scipy.optimize import fsolve >> >>> >> > import math >> >>> >> > >> >>> >> > p=0.3 >> >>> >> > >> >>> >> > def pdf(x,s1,s2): >> >>> >> > ? ?return >> >>> >> >> >>> >> (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(s2*math.sqrt(2*math.pi))*numpy.exp(-((x-0)**(2)/(2*s2**(2))))) >> >>> >> > >> >>> >> > def equ(s1,s2): >> >>> >> > ? ?0.5==integrate.quad(pdf,-270,270,args=(s1,s2)) >> >>> >> > ? ?0.9==integrate.quad(pdf,-500,500,args=(s1,s2)) >> >>> >> > >> >>> >> > result=fsolve(equ, 1,500) >> >>> >> > >> >>> >> > print result >> >>> >> >> >>> >> equ needs to return the deviation of the equations (I changed some >> >>> >> details for s1 just to try it) >> >>> >> >> >>> >> import numpy >> >>> >> from scipy import stats >> >>> >> from scipy import integrate >> >>> >> from scipy.optimize import fsolve >> >>> >> import math >> >>> >> >> >>> >> p=0.3 >> >>> >> >> >>> >> def pdf(x,s1,s2): >> >>> >> ? ? return >> >>> >> >> >>> >> (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(math.sqrt(2*math.pi*s2**2))*numpy.exp(-((x-0)**(2)/(2*s2**(2))))) >> >>> >> >> >>> >> def equ(arg): >> >>> >> ? ? s1,s2 = numpy.abs(arg) >> >>> >> ? ? cond1 = 0.5 - integrate.quad(pdf,-270,270,args=(s1,s2))[0] >> >>> >> ? ? cond2 = 0.9 - integrate.quad(pdf,-500,500,args=(s1,s2))[0] >> >>> >> ? ? return [cond1, cond2] >> >>> >> >> >>> >> result=fsolve(equ, [200., 1200]) >> >> >> >> thank you for your last reply...seems that the parameters of the two >> normals are nearly identical... anyway just two small addtional questions: >> >> >> >> 1)in fsolve(equ, [200., 1200]) the 200 and 1200 are kind of start >> values so far as I understand...how should these be choosen? what is >> recommended? >> > >> > There is no general solution for choosing starting values, in your >> > case it should be possible to >> > >> >>>> q = np.array([0.5, 0.9]) >> >>>> cr = x/stats.norm.ppf(0.5 + q/2.) >> >>>> x = [270, 500] >> >>>> q = np.array([0.5, 0.9]) >> >>>> x = [270, 500] >> >>>> cr = x/stats.norm.ppf(0.5 + q/2.) >> >>>> stats.norm.cdf(500, scale=cr[1]) - stats.norm.cdf(-500, scale=cr[1]) >> > 0.89999999999999991 >> ------- >> I forgot to remove the typos >> >>>> stats.norm.cdf(q[0], scale=cr[1]) - stats.norm.cdf(-q[0], >> scale=cr[0]) >> > 0.0011545021185267457 >> >>>> stats.norm.cdf(q[0], scale=cr[0]) - stats.norm.cdf(-q[0], >> scale=cr[0]) >> > 0.000996601515122153 >> --------- >> >>>> stats.norm.cdf(x[0], scale=cr[0]) - stats.norm.cdf(-x[0], >> scale=cr[0]) >> > 0.5 >> >>>> sol = fsolve(equ, np.sort(cr)) >> > >> > there are some numerical problems finding the solution (???) >> > >> >>>> equ(sol) >> > array([-0.05361093, ?0.05851309]) >> >>>> from pprint import pprint >> >>>> pprint(fsolve(equ, np.sort(cr), xtol=1e-10, full_output=1)) >> > (array([ 354.32616549, ?354.69918062]), >> > ?{'fjac': array([[-0.7373189 , -0.67554484], >> > ? ? ? [ 0.67554484, -0.7373189 ]]), >> > ?'fvec': array([-0.05361093, ?0.05851309]), >> > ?'nfev': 36, >> > ?'qtf': array([ ?1.40019135e-07, ?-7.93593929e-02]), >> > ?'r': array([ -5.21390161e-04, ?-1.21700831e-03, ? 3.88274320e-07])}, >> > ?5, >> > ?'The iteration is not making good progress, as measured by the \n >> > improvement from the last ten iterations.') >> > >> >> >> >> 2) How can that be solve if I have I third condition (overfitted) can >> that be used as well or how does the alternative look like? 
>> > >> > use optimize.leastsq on equ (I never tried this for this case) >> > use fmin on the sum of squared errors >> > >> > if the intervals for the probabilities are non-overlapping (interval >> > data), then there is an optimal weighting matrix, (but my code for >> > that in the statsmodels.sandbox is not verified). >> > >> > Josef >> > >> > >> >> >> >> /johannes >> >> >> >>> >> >> >>> >> print result >> >>> >> >> >>> >> but in the results I get the parameters are very close to each >> other >> >>> >> [-356.5283675 ? 353.82544075] >> >>> >> >> >>> >> the pdf looks just like a mixture of 2 normals both with loc=0, >> then >> >>> >> maybe the cdf of norm can be used directly >> >>> > >> >>> > >> >>> > Thank you for that hint... First yes these are 2 superimposed >> normals >> >>> but for other reasons I want to use the original formula instead of >> the >> >>> stats.functions... >> >>> > >> >>> > anyway there is still a thing...the locator s1 and s2 are like the >> scale >> >>> parameter of stats.norm so the are both + and -. For fsolve above it >> seems >> >>> that I get only one parameter (s1 or s2) but for the positive and >> negative >> >>> side of the distribution. So in actually there are four parameters >> -s1, >> >>> +s1, -s2, +s2. How can I solve that? Maybe I can restrict the fsolve >> to look >> >>> for the two values only in the positive range... >> >>> >> >>> It doesn't really matter, if the scale only shows up in quadratic >> >>> terms, or as in my initial change I added a absolute value, so whether >> >>> it's positive or negative, it's still only one value, and we >> >>> interprete it as postive scale >> >>> >> >>> s1 = sqrt(s1**2) >> >>> >> >>> Josef >> >>> >> >>> > >> >>> > any guesses? >> >>> > >> >>> > /J >> >>> > >> >>> >> >> >>> >> >>> from scipy import stats >> >>> >> >>> stats.norm.cdf(270, scale=350) - stats.norm.cdf(-270, >> scale=350) >> >>> >> 0.55954705470577526 >> >>> >> >>> >> >>> >> >>> stats.norm.cdf(270, scale=354) - stats.norm.cdf(-270, >> scale=354) >> >>> >> 0.55436474670960978 >> >>> >> >>> stats.norm.cdf(500, scale=354) - stats.norm.cdf(-500, >> scale=354) >> >>> >> 0.84217642881921018 >> >>> >> >> >>> >> Josef >> >>> >> > >> >>> >> > >> >>> >> > /Johannes >> >>> >> >> >> >>> >> >> > >> >>> >> >> > /Johannes >> >>> >> >> > >> >>> >> >> > -- >> >>> >> >> > NEU: FreePhone - kostenlos mobil telefonieren! >> >>> >> >> > Jetzt informieren: http://www.gmx.net/de/go/freephone >> >>> >> >> > _______________________________________________ >> >>> >> >> > SciPy-User mailing list >> >>> >> >> > SciPy-User at scipy.org >> >>> >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> >>> >> >> > >> >>> >> >> _______________________________________________ >> >>> >> >> SciPy-User mailing list >> >>> >> >> SciPy-User at scipy.org >> >>> >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >>> >> > >> >>> >> > -- >> >>> >> > NEU: FreePhone - kostenlos mobil telefonieren! >> >>> >> > Jetzt informieren: http://www.gmx.net/de/go/freephone >> >>> >> > _______________________________________________ >> >>> >> > SciPy-User mailing list >> >>> >> > SciPy-User at scipy.org >> >>> >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> >>> >> > >> >>> >> _______________________________________________ >> >>> >> SciPy-User mailing list >> >>> >> SciPy-User at scipy.org >> >>> >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >>> > >> >>> > -- >> >>> > NEU: FreePhone - kostenlos mobil telefonieren! 
>> >>> > Jetzt informieren: http://www.gmx.net/de/go/freephone >> >>> > _______________________________________________ >> >>> > SciPy-User mailing list >> >>> > SciPy-User at scipy.org >> >>> > http://mail.scipy.org/mailman/listinfo/scipy-user >> >>> > >> >>> _______________________________________________ >> >>> SciPy-User mailing list >> >>> SciPy-User at scipy.org >> >>> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >> >> -- >> >> NEU: FreePhone - kostenlos mobil telefonieren! >> >> Jetzt informieren: http://www.gmx.net/de/go/freephone >> >> _______________________________________________ >> >> SciPy-User mailing list >> >> SciPy-User at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >> > >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > -- > NEU: FreePhone - kostenlos mobil telefonieren! > Jetzt informieren: http://www.gmx.net/de/go/freephone > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From robert.kern at gmail.com Thu Jun 9 17:55:14 2011 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 9 Jun 2011 16:55:14 -0500 Subject: [SciPy-User] Conversion from 32bits IEEE floats to IBM floats In-Reply-To: References: Message-ID: On Thu, Jun 9, 2011 at 08:17, Matthieu Brucher wrote: > Hi, > I wondered if anyone had a conversion routine for 32 bits IEEE floats in an > array to IBM floats (stored in a 4 bytes integers). I have a routine for > doing the opposite, but not IEEE->IBM. > There are codes in C or other languages, but the trick is starting from an > array (don't know if it can be reinterpreted as an integer array easily). This is almost certainly not the most elegant, but it seems to work for me: import numpy as np def ibm2ieee(ibm): """ Converts an IBM floating point number into IEEE format. """ sign = ibm >> 31 & 0x01 exponent = ibm >> 24 & 0x7f mantissa = ibm & 0x00ffffff mantissa = (mantissa * np.float32(1.0)) / pow(2, 24) ieee = (1 - 2 * sign) * mantissa * np.power(np.float32(16.0), exponent - 64) return ieee def ieee2ibm(ieee): ieee = ieee.astype(np.float32) expmask = 0x7f800000 signmask = 0x80000000 mantmask = 0x7fffff asint = ieee.view('i4') signbit = asint & signmask exponent = ((asint & expmask) >> 23) - 127 # The IBM 7-bit exponent is to the base 16 and the mantissa is presumed to # be entirely to the right of the radix point. In contrast, the IEEE # exponent is to the base 2 and there is an assumed 1-bit to the left of the # radix point. exp16 = ((exponent+1) // 4) exp_remainder = (exponent+1) % 4 exp16 += exp_remainder != 0 downshift = np.where(exp_remainder, 4-exp_remainder, 0) ibm_exponent = np.clip(exp16 + 64, 0, 127) expbits = ibm_exponent << 24 # Add the implicit initial 1-bit to the 23-bit IEEE mantissa to get the # 24-bit IBM mantissa. Downshift it by the remainder from the exponent's # division by 4. It is allowed to have up to 3 leading 0s. ibm_mantissa = ((asint & mantmask) | 0x800000) >> downshift # Special-case 0.0 ibm_mantissa = np.where(ieee, ibm_mantissa, 0) expbits = np.where(ieee, expbits, 0) return signbit | expbits | ibm_mantissa -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? 
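A quick round-trip check is one way to exercise the two helpers. This is only a usage sketch, assuming ibm2ieee() and ieee2ibm() are defined exactly as in the message above (they were written against the numpy of that era):

import numpy as np

vals = np.array([0.0, 1.0, -1.0, 0.1, 3.14159, -12345.678], dtype=np.float32)

ibm_words = ieee2ibm(vals)        # IBM bit patterns stored in an integer array
round_trip = ibm2ieee(ibm_words)  # decode back to IEEE floats

# IBM single precision can drop up to 3 low mantissa bits, so the round trip
# should agree to roughly single precision, not bit-for-bit
print(np.max(np.abs(round_trip - vals)))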
-- Umberto Eco From jsseabold at gmail.com Thu Jun 9 18:25:20 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Thu, 9 Jun 2011 18:25:20 -0400 Subject: [SciPy-User] [SciPy-user] fast small matrix multiplication with cython? In-Reply-To: <31793732.post@talk.nabble.com> References: <31793732.post@talk.nabble.com> Message-ID: On Tue, Jun 7, 2011 at 12:53 PM, phubaba wrote: > > Hello Skipper, > > is there any chance you could explain the fast recursion algorithm or supply > the cython code you used to implement it? > > Thanks, > Rob Missed this. For posterity, this was discussed here. https://groups.google.com/group/pystatsmodels/browse_thread/thread/b27668230a65d1b9 Skipper From sparkliang at gmail.com Thu Jun 9 22:57:14 2011 From: sparkliang at gmail.com (Spark Liang) Date: Fri, 10 Jun 2011 10:57:14 +0800 Subject: [SciPy-User] curve_fit cannot accept a function with list or array as parameters? In-Reply-To: References: Message-ID: It works! Josef, thank you! Spark On Thu, Jun 9, 2011 at 10:47 PM, wrote: > On Thu, Jun 9, 2011 at 10:39 AM, Spark Liang wrote: > > Hi, I'm using scipy.optimize.curve_fit to fit two sets of data (x, y). > But I > > found that curve_fit cannot accept a function with list or numpy.ndarray > as > > parameters. > > For example, one of my function is : > > def testfunc(x, beta) > > a = beta[0] > > b = beta[1] > > c = beta[2] > > d = beta[3] > > return a+b*x+c*x**2+d*x**4 > > In my program, I create the parameters guess: c = [1, 2, 3]. When I > using > > curve_fit as: popt, pcov = curve_fit(testfunc, x, y, p0=c). It threw the > > errors: TypeError: testfunc() takes exactly 2 arguments (4 given). > > How to resolve the problems ? > > add a * and it will unpack the iterable, array or list > def testfunc(x, *beta) > > Josef > > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthieu.brucher at gmail.com Fri Jun 10 03:22:41 2011 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Fri, 10 Jun 2011 09:22:41 +0200 Subject: [SciPy-User] Conversion from 32bits IEEE floats to IBM floats In-Reply-To: References: Message-ID: Thanks a lot Robert! I didn't know about the view() method, interesting. Matthieu 2011/6/9 Robert Kern > On Thu, Jun 9, 2011 at 08:17, Matthieu Brucher > wrote: > > Hi, > > I wondered if anyone had a conversion routine for 32 bits IEEE floats in > an > > array to IBM floats (stored in a 4 bytes integers). I have a routine for > > doing the opposite, but not IEEE->IBM. > > There are codes in C or other languages, but the trick is starting from > an > > array (don't know if it can be reinterpreted as an integer array easily). > > This is almost certainly not the most elegant, but it seems to work for me: > > import numpy as np > > def ibm2ieee(ibm): > """ Converts an IBM floating point number into IEEE format. 
""" > > sign = ibm >> 31 & 0x01 > > exponent = ibm >> 24 & 0x7f > > mantissa = ibm & 0x00ffffff > mantissa = (mantissa * np.float32(1.0)) / pow(2, 24) > > ieee = (1 - 2 * sign) * mantissa * np.power(np.float32(16.0), exponent - > 64) > > return ieee > > def ieee2ibm(ieee): > ieee = ieee.astype(np.float32) > expmask = 0x7f800000 > signmask = 0x80000000 > mantmask = 0x7fffff > asint = ieee.view('i4') > signbit = asint & signmask > exponent = ((asint & expmask) >> 23) - 127 > # The IBM 7-bit exponent is to the base 16 and the mantissa is presumed > to > # be entirely to the right of the radix point. In contrast, the IEEE > # exponent is to the base 2 and there is an assumed 1-bit to the left of > the > # radix point. > exp16 = ((exponent+1) // 4) > exp_remainder = (exponent+1) % 4 > exp16 += exp_remainder != 0 > downshift = np.where(exp_remainder, 4-exp_remainder, 0) > ibm_exponent = np.clip(exp16 + 64, 0, 127) > expbits = ibm_exponent << 24 > # Add the implicit initial 1-bit to the 23-bit IEEE mantissa to get the > # 24-bit IBM mantissa. Downshift it by the remainder from the exponent's > # division by 4. It is allowed to have up to 3 leading 0s. > ibm_mantissa = ((asint & mantmask) | 0x800000) >> downshift > # Special-case 0.0 > ibm_mantissa = np.where(ieee, ibm_mantissa, 0) > expbits = np.where(ieee, expbits, 0) > return signbit | expbits | ibm_mantissa > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: From alin.traore at yahoo.fr Fri Jun 10 07:19:45 2011 From: alin.traore at yahoo.fr (Ali N Traore) Date: Fri, 10 Jun 2011 12:19:45 +0100 (BST) Subject: [SciPy-User] demande de stage Message-ID: <80207.31933.qm@web24103.mail.ird.yahoo.com> bonjour, je me nomme ali traoreje suis ?tudiant ? la licence 1 mpsi j'ai eu mon bac en s?rie C avec la mention Bien et je voudrai bien participer ? l?am?lioration de mon niveau en math?matique et participer a des d?couvertes scientifique a travers le monde. mon email est: alin.traore at yahoo.fr -------------- next part -------------- An HTML attachment was scrubbed... URL: From sparkliang at gmail.com Fri Jun 10 08:01:24 2011 From: sparkliang at gmail.com (Spark Liang) Date: Fri, 10 Jun 2011 20:01:24 +0800 Subject: [SciPy-User] How to fix a parameter when using curve_fit ? Message-ID: Hi, would someone be so kind to tell me how to fix some parameters when using curve_fit ? I googled it and found that I may use scipy.odr or mpfit, but they seem rather complicated. I also searched the maillist, someone said it can be done by by writing a nested function or a class. but how to do it? -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri Jun 10 08:35:27 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 10 Jun 2011 08:35:27 -0400 Subject: [SciPy-User] How to fix a parameter when using curve_fit ? 
In-Reply-To: References: Message-ID: On Fri, Jun 10, 2011 at 8:01 AM, Spark Liang wrote: > Hi, would someone be so kind to tell me how to fix some parameters when > using curve_fit ? > I googled it and found that I may use scipy.odr or mpfit, but they seem > rather complicated. > I also searched the maillist, someone said it can be done by by writing a > nested function or a class. but how to do it? a full version is at http://docs.python.org/library/functools.html in the example for functools.partial I usually prefer to use a class, where you set the fixed parameters in the __init__ and access it with self.a(ttributename) inside the function. Josef > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From x.piter at gmail.com Fri Jun 10 08:53:02 2011 From: x.piter at gmail.com (Piter_) Date: Fri, 10 Jun 2011 14:53:02 +0200 Subject: [SciPy-User] How to fix a parameter when using curve_fit ? In-Reply-To: References: Message-ID: Hi. In matlab I was doing it using global variables, but is has to be better way with nested functions. The idea is to rewrite your fitted function in the way that only not fixed parameters are fed to optimization routine, but fixed variables are still available to it. Hope it helps a bit. Best. Petro. On Fri, Jun 10, 2011 at 2:01 PM, Spark Liang wrote: > Hi, would someone be so kind to tell me how to fix some parameters when > using curve_fit ? > I googled it and found that I may use scipy.odr or mpfit, but they seem > rather complicated. > I also searched the maillist, someone said it can be done by by writing a > nested function or a class. but how to do it? > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From j.meidam at gmail.com Fri Jun 10 09:02:21 2011 From: j.meidam at gmail.com (Jeroen Meidam) Date: Fri, 10 Jun 2011 14:02:21 +0100 Subject: [SciPy-User] Odeint: not following input time Message-ID: <4DF215DD.2010704@astro.cf.ac.uk> Hello, I have been searching for a solution on other forums, and this mailing list as well, but without luck. So I decided to post my problem. When I run a simple odeint example, I get nice expected results. When I try my own odeint implementation I get strange results. What's happening in my own implementation of odeint is that the integrator seems to ignore my input time-line. I tested this by putting a print statement inside the function list. Lets take the working example program (code1 at bottom of message): I printed the time inside f(), to check what time it actually uses. It prints the expected 0 to 100 time sequence. Now when I try this on my own code (code2, which I completely dumbed down to the most simple case imaginable, all derivatives are zero): I again print the time, but now I get: time: 0.0 time: 0.000293556243968 time: 0.000587112487935 time: 2.93614955216 time: 5.87171199184 time: 8.80727443152 time: 38.1628988283 time: 67.5185232251 time: 96.8741476218 time: 390.43039159 time: 683.986635557 time: 977.542879525 time: 3913.1053192 Which makes no sense at all to me. First of all it exceeds my time limit and second, it only prints a few values. Does anyone know what could cause this? ***** code1 ****** from scipy.integrate import odeint from pylab import plot, axis, show # Define the initial conditions for each of the four ODEs inic = [1,0,0,1] # Times to evaluate the ODEs. 
800 times from 0 to 100 (inclusive). t = linspace(0, 100, 800) # The derivative function. def f(z,time): """ Compute the derivate of 'z' at time 'time'. 'z' is a list of four elements. """ print time wx = sqrt(2.) wy = 1 return [ z[2], z[3], -wx * z[0], -wy * z[1] ] # Compute the ODE res = odeint(f, inic, t) # Plot the results plot(res[:,0], res[:,1]) axis('equal') show() ***************** ***** code2 ****** import numpy as np from scipy.integrate import odeint import matplotlib from matplotlib.pyplot import figure, show, rc, grid import pylab as pl # Initial conditions r0 = 0.0025**(-2./3) #omega0 is just a number phi0 = 0.0 pphi0 = r0**(1./2) pr = 0.0 T = 1200 N = 500 t = np.linspace(0,T,N) p = [M,nu] init_cond = [r0,phi0,pr0,pphi0] def vectorfield(state,time,p): """ Arguments: state : vector of the state variables (reduced) t : reduced time p : vector of the parameters """ print 'time:', time dr_dt = 0. dphi_dt = 0. dpr_dt = 0. dpphi_dt = 0. return [ dr_dt,dphi_dt,dpr_dt,dpphi_dt ] sol = odeint( vectorfield, init_cond, t, args=(p,) ) #followed by some plotting stuff, which is not important for now ****************** From JRadinger at gmx.at Fri Jun 10 09:22:52 2011 From: JRadinger at gmx.at (Johannes Radinger) Date: Fri, 10 Jun 2011 15:22:52 +0200 Subject: [SciPy-User] How to fit a curve/function? In-Reply-To: References: <20110607113201.7B02.B1C76292@gmail.com> <20110608105217.222500@gmx.net> <20110608134101.259520@gmx.net> <20110608142743.77890@gmx.net> <20110608145625.162510@gmx.net> <20110609155246.27670@gmx.net> Message-ID: <20110610132252.27640@gmx.net> -------- Original-Nachricht -------- > Datum: Thu, 9 Jun 2011 12:07:47 -0400 > Von: josef.pktd at gmail.com > An: SciPy Users List > Betreff: Re: [SciPy-User] How to fit a curve/function? > On Thu, Jun 9, 2011 at 11:52 AM, Johannes Radinger > wrote: > > Hello again... > > > > i try no to fit a curve using integrals as conditions. > > the scipy manual says that integrations to infinite are possible with > Inf, > > > > I tried following but I fail (saying inf is not defined): > > > > cond2 = 5.0/10/2 - integrate.quad(pdf,35000,Inf,args=(s1,s2))[0] > > > > what causes the problem? Do I use quad/Inf in a wrong way? > > numpy.inf inf doesn't exist in python itself Oh thank you for that! I was just confused because in the manual Inf is used itself in the example...anyway it works now.. just additoinal questions: 1) How can I fit the curve with other kind of data. Like, if got something like a histogram (counts per category of x). I want to fit now the function already mentioned with these data... 2) Can I get a value how good the fit is? /Johannes > > Josef > > > > > The error is: > > NameError: global name 'Inf' is not defined > > > > /Johannes > > > > -------- Original-Nachricht -------- > >> Datum: Wed, 8 Jun 2011 11:37:52 -0400 > >> Von: josef.pktd at gmail.com > >> An: SciPy Users List > >> Betreff: Re: [SciPy-User] How to fit a curve/function? > > > >> On Wed, Jun 8, 2011 at 11:37 AM, ? wrote: > >> > On Wed, Jun 8, 2011 at 10:56 AM, Johannes Radinger > >> wrote: > >> >> > >> >> -------- Original-Nachricht -------- > >> >>> Datum: Wed, 8 Jun 2011 10:33:45 -0400 > >> >>> Von: josef.pktd at gmail.com > >> >>> An: SciPy Users List > >> >>> Betreff: Re: [SciPy-User] How to fit a curve/function? 
> >> >> > >> >>> On Wed, Jun 8, 2011 at 10:27 AM, Johannes Radinger > > >> >>> wrote: > >> >>> > > >> >>> > -------- Original-Nachricht -------- > >> >>> >> Datum: Wed, 8 Jun 2011 10:12:58 -0400 > >> >>> >> Von: josef.pktd at gmail.com > >> >>> >> An: SciPy Users List > >> >>> >> Betreff: Re: [SciPy-User] How to fit a curve/function? > >> >>> > > >> >>> >> On Wed, Jun 8, 2011 at 9:41 AM, Johannes Radinger > >> > >> >>> >> wrote: > >> >>> >> > > >> >>> >> > -------- Original-Nachricht -------- > >> >>> >> >> Datum: Wed, 8 Jun 2011 07:10:38 -0400 > >> >>> >> >> Von: josef.pktd at gmail.com > >> >>> >> >> An: SciPy Users List > >> >>> >> >> Betreff: Re: [SciPy-User] How to fit a curve/function? > >> >>> >> > > >> >>> >> >> On Wed, Jun 8, 2011 at 6:52 AM, Johannes Radinger > >> > >> >>> >> >> wrote: > >> >>> >> >> > Hello, > >> >>> >> >> > > >> >>> >> >> > I've got following function describing any kind of animal > >> >>> dispersal > >> >>> >> >> kernel: > >> >>> >> >> > > >> >>> >> >> > def pdf(x,s1,s2): > >> >>> >> >> > ? ?return > >> >>> >> >> > >> >>> >> > >> >>> > >> > (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(s2*math.sqrt(2*math.pi))*numpy.exp(-((x-0)**(2)/(2*s2**(2))))) > >> >>> >> >> > > >> >>> >> >> > On the other hand I've got data from literature with which > I > >> want > >> >>> to > >> >>> >> fit > >> >>> >> >> the function so that I get s1, s2 and x. > >> >>> >> >> > Ususally the data in the literature are as follows: > >> >>> >> >> > > >> >>> >> >> > Example 1: 50% of the animals are between -270m and +270m > and > >> 90% > >> >>> >> ?are > >> >>> >> >> between -500m and + 500m > >> >>> >> >> > > >> >>> >> >> > Example 2: 84% is between - 5000 m and +5000m, and 73% are > >> between > >> >>> >> >> -1000m and +1000m > >> >>> >> >> > > >> >>> >> >> > So far as I understand an integration of the function is > >> needed to > >> >>> >> solve > >> >>> >> >> for s1 and s2 as all the literature data give percentage > (area > >> under > >> >>> >> the > >> >>> >> >> curve) Can that be used to fit the curve or can that create > >> ranges > >> >>> for > >> >>> >> s1 > >> >>> >> >> and s2. > >> >>> >> >> > >> >>> >> >> I don't see a way around integration. > >> >>> >> >> > >> >>> >> >> If you have exactly 2 probabilities, then you can you a > solver > >> like > >> >>> >> >> scipy.optimize.fsolve to match the probabilites > >> >>> >> >> eg. > >> >>> >> >> 0.5 = integral pdf from -270 to 270 > >> >>> >> >> 0.9 = integral pdf from -500 to 500 > >> >>> >> >> > >> >>> >> >> If you have more than 2 probabilities, then using > optimization > >> of a > >> >>> >> >> weighted function of the moment conditions would be better. > >> >>> >> >> > >> >>> >> >> Josef > >> >>> >> > > >> >>> >> > > >> >>> >> > > >> >>> >> > Hello again > >> >>> >> > > >> >>> >> > I tried following, but without success so far. What do I have > to > >> do > >> >>> >> excactly... > >> >>> >> > > >> >>> >> > import numpy > >> >>> >> > from scipy import stats > >> >>> >> > from scipy import integrate > >> >>> >> > from scipy.optimize import fsolve > >> >>> >> > import math > >> >>> >> > > >> >>> >> > p=0.3 > >> >>> >> > > >> >>> >> > def pdf(x,s1,s2): > >> >>> >> > ? ?return > >> >>> >> > >> >>> > >> > (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(s2*math.sqrt(2*math.pi))*numpy.exp(-((x-0)**(2)/(2*s2**(2))))) > >> >>> >> > > >> >>> >> > def equ(s1,s2): > >> >>> >> > ? ?0.5==integrate.quad(pdf,-270,270,args=(s1,s2)) > >> >>> >> > ? 
?0.9==integrate.quad(pdf,-500,500,args=(s1,s2)) > >> >>> >> > > >> >>> >> > result=fsolve(equ, 1,500) > >> >>> >> > > >> >>> >> > print result > >> >>> >> > >> >>> >> equ needs to return the deviation of the equations (I changed > some > >> >>> >> details for s1 just to try it) > >> >>> >> > >> >>> >> import numpy > >> >>> >> from scipy import stats > >> >>> >> from scipy import integrate > >> >>> >> from scipy.optimize import fsolve > >> >>> >> import math > >> >>> >> > >> >>> >> p=0.3 > >> >>> >> > >> >>> >> def pdf(x,s1,s2): > >> >>> >> ? ? return > >> >>> >> > >> >>> > >> > (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(math.sqrt(2*math.pi*s2**2))*numpy.exp(-((x-0)**(2)/(2*s2**(2))))) > >> >>> >> > >> >>> >> def equ(arg): > >> >>> >> ? ? s1,s2 = numpy.abs(arg) > >> >>> >> ? ? cond1 = 0.5 - integrate.quad(pdf,-270,270,args=(s1,s2))[0] > >> >>> >> ? ? cond2 = 0.9 - integrate.quad(pdf,-500,500,args=(s1,s2))[0] > >> >>> >> ? ? return [cond1, cond2] > >> >>> >> > >> >>> >> result=fsolve(equ, [200., 1200]) > >> >> > >> >> thank you for your last reply...seems that the parameters of the two > >> normals are nearly identical... anyway just two small addtional > questions: > >> >> > >> >> 1)in fsolve(equ, [200., 1200]) the 200 and 1200 are kind of start > >> values so far as I understand...how should these be choosen? what is > >> recommended? > >> > > >> > There is no general solution for choosing starting values, in your > >> > case it should be possible to > >> > > >> >>>> q = np.array([0.5, 0.9]) > >> >>>> cr = x/stats.norm.ppf(0.5 + q/2.) > >> >>>> x = [270, 500] > >> >>>> q = np.array([0.5, 0.9]) > >> >>>> x = [270, 500] > >> >>>> cr = x/stats.norm.ppf(0.5 + q/2.) > >> >>>> stats.norm.cdf(500, scale=cr[1]) - stats.norm.cdf(-500, > scale=cr[1]) > >> > 0.89999999999999991 > >> ------- > >> I forgot to remove the typos > >> >>>> stats.norm.cdf(q[0], scale=cr[1]) - stats.norm.cdf(-q[0], > >> scale=cr[0]) > >> > 0.0011545021185267457 > >> >>>> stats.norm.cdf(q[0], scale=cr[0]) - stats.norm.cdf(-q[0], > >> scale=cr[0]) > >> > 0.000996601515122153 > >> --------- > >> >>>> stats.norm.cdf(x[0], scale=cr[0]) - stats.norm.cdf(-x[0], > >> scale=cr[0]) > >> > 0.5 > >> >>>> sol = fsolve(equ, np.sort(cr)) > >> > > >> > there are some numerical problems finding the solution (???) > >> > > >> >>>> equ(sol) > >> > array([-0.05361093, ?0.05851309]) > >> >>>> from pprint import pprint > >> >>>> pprint(fsolve(equ, np.sort(cr), xtol=1e-10, full_output=1)) > >> > (array([ 354.32616549, ?354.69918062]), > >> > ?{'fjac': array([[-0.7373189 , -0.67554484], > >> > ? ? ? [ 0.67554484, -0.7373189 ]]), > >> > ?'fvec': array([-0.05361093, ?0.05851309]), > >> > ?'nfev': 36, > >> > ?'qtf': array([ ?1.40019135e-07, ?-7.93593929e-02]), > >> > ?'r': array([ -5.21390161e-04, ?-1.21700831e-03, ? > 3.88274320e-07])}, > >> > ?5, > >> > ?'The iteration is not making good progress, as measured by the \n > >> > improvement from the last ten iterations.') > >> > > >> >> > >> >> 2) How can that be solve if I have I third condition (overfitted) > can > >> that be used as well or how does the alternative look like? > >> > > >> > use optimize.leastsq on equ (I never tried this for this case) > >> > use fmin on the sum of squared errors > >> > > >> > if the intervals for the probabilities are non-overlapping (interval > >> > data), then there is an optimal weighting matrix, (but my code for > >> > that in the statsmodels.sandbox is not verified). 
> >> > > >> > Josef > >> > > >> > > >> >> > >> >> /johannes > >> >> > >> >>> >> > >> >>> >> print result > >> >>> >> > >> >>> >> but in the results I get the parameters are very close to each > >> other > >> >>> >> [-356.5283675 ? 353.82544075] > >> >>> >> > >> >>> >> the pdf looks just like a mixture of 2 normals both with loc=0, > >> then > >> >>> >> maybe the cdf of norm can be used directly > >> >>> > > >> >>> > > >> >>> > Thank you for that hint... First yes these are 2 superimposed > >> normals > >> >>> but for other reasons I want to use the original formula instead of > >> the > >> >>> stats.functions... > >> >>> > > >> >>> > anyway there is still a thing...the locator s1 and s2 are like > the > >> scale > >> >>> parameter of stats.norm so the are both + and -. For fsolve above > it > >> seems > >> >>> that I get only one parameter (s1 or s2) but for the positive and > >> negative > >> >>> side of the distribution. So in actually there are four parameters > >> -s1, > >> >>> +s1, -s2, +s2. How can I solve that? Maybe I can restrict the > fsolve > >> to look > >> >>> for the two values only in the positive range... > >> >>> > >> >>> It doesn't really matter, if the scale only shows up in quadratic > >> >>> terms, or as in my initial change I added a absolute value, so > whether > >> >>> it's positive or negative, it's still only one value, and we > >> >>> interprete it as postive scale > >> >>> > >> >>> s1 = sqrt(s1**2) > >> >>> > >> >>> Josef > >> >>> > >> >>> > > >> >>> > any guesses? > >> >>> > > >> >>> > /J > >> >>> > > >> >>> >> > >> >>> >> >>> from scipy import stats > >> >>> >> >>> stats.norm.cdf(270, scale=350) - stats.norm.cdf(-270, > >> scale=350) > >> >>> >> 0.55954705470577526 > >> >>> >> >>> > >> >>> >> >>> stats.norm.cdf(270, scale=354) - stats.norm.cdf(-270, > >> scale=354) > >> >>> >> 0.55436474670960978 > >> >>> >> >>> stats.norm.cdf(500, scale=354) - stats.norm.cdf(-500, > >> scale=354) > >> >>> >> 0.84217642881921018 > >> >>> >> > >> >>> >> Josef > >> >>> >> > > >> >>> >> > > >> >>> >> > /Johannes > >> >>> >> >> > >> >>> >> >> > > >> >>> >> >> > /Johannes > >> >>> >> >> > > >> >>> >> >> > -- > >> >>> >> >> > NEU: FreePhone - kostenlos mobil telefonieren! > >> >>> >> >> > Jetzt informieren: http://www.gmx.net/de/go/freephone > >> >>> >> >> > _______________________________________________ > >> >>> >> >> > SciPy-User mailing list > >> >>> >> >> > SciPy-User at scipy.org > >> >>> >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user > >> >>> >> >> > > >> >>> >> >> _______________________________________________ > >> >>> >> >> SciPy-User mailing list > >> >>> >> >> SciPy-User at scipy.org > >> >>> >> >> http://mail.scipy.org/mailman/listinfo/scipy-user > >> >>> >> > > >> >>> >> > -- > >> >>> >> > NEU: FreePhone - kostenlos mobil telefonieren! > >> >>> >> > Jetzt informieren: http://www.gmx.net/de/go/freephone > >> >>> >> > _______________________________________________ > >> >>> >> > SciPy-User mailing list > >> >>> >> > SciPy-User at scipy.org > >> >>> >> > http://mail.scipy.org/mailman/listinfo/scipy-user > >> >>> >> > > >> >>> >> _______________________________________________ > >> >>> >> SciPy-User mailing list > >> >>> >> SciPy-User at scipy.org > >> >>> >> http://mail.scipy.org/mailman/listinfo/scipy-user > >> >>> > > >> >>> > -- > >> >>> > NEU: FreePhone - kostenlos mobil telefonieren! 
> >> >>> > Jetzt informieren: http://www.gmx.net/de/go/freephone > >> >>> > _______________________________________________ > >> >>> > SciPy-User mailing list > >> >>> > SciPy-User at scipy.org > >> >>> > http://mail.scipy.org/mailman/listinfo/scipy-user > >> >>> > > >> >>> _______________________________________________ > >> >>> SciPy-User mailing list > >> >>> SciPy-User at scipy.org > >> >>> http://mail.scipy.org/mailman/listinfo/scipy-user > >> >> > >> >> -- > >> >> NEU: FreePhone - kostenlos mobil telefonieren! > >> >> Jetzt informieren: http://www.gmx.net/de/go/freephone > >> >> _______________________________________________ > >> >> SciPy-User mailing list > >> >> SciPy-User at scipy.org > >> >> http://mail.scipy.org/mailman/listinfo/scipy-user > >> >> > >> > > >> _______________________________________________ > >> SciPy-User mailing list > >> SciPy-User at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > > -- > > NEU: FreePhone - kostenlos mobil telefonieren! > > Jetzt informieren: http://www.gmx.net/de/go/freephone > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user -- NEU: FreePhone - kostenlos mobil telefonieren! Jetzt informieren: http://www.gmx.net/de/go/freephone From josef.pktd at gmail.com Fri Jun 10 09:39:12 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 10 Jun 2011 09:39:12 -0400 Subject: [SciPy-User] How to fit a curve/function? In-Reply-To: <20110610132252.27640@gmx.net> References: <20110607113201.7B02.B1C76292@gmail.com> <20110608105217.222500@gmx.net> <20110608134101.259520@gmx.net> <20110608142743.77890@gmx.net> <20110608145625.162510@gmx.net> <20110609155246.27670@gmx.net> <20110610132252.27640@gmx.net> Message-ID: On Fri, Jun 10, 2011 at 9:22 AM, Johannes Radinger wrote: > > -------- Original-Nachricht -------- >> Datum: Thu, 9 Jun 2011 12:07:47 -0400 >> Von: josef.pktd at gmail.com >> An: SciPy Users List >> Betreff: Re: [SciPy-User] How to fit a curve/function? > >> On Thu, Jun 9, 2011 at 11:52 AM, Johannes Radinger >> wrote: >> > Hello again... >> > >> > i try no to fit a curve using integrals as conditions. >> > the scipy manual says that integrations to infinite are possible with >> Inf, >> > >> > I tried following but I fail (saying inf is not defined): >> > >> > cond2 = 5.0/10/2 - integrate.quad(pdf,35000,Inf,args=(s1,s2))[0] >> > >> > what causes the problem? Do I use quad/Inf in a wrong way? >> >> numpy.inf ? ?inf doesn't exist in python itself > > Oh thank you for that! I was just confused because in the manual Inf is used itself in the example...anyway it works now.. > > just additoinal questions: > > 1) How can I fit the curve with other kind of data. Like, if got something > like a histogram (counts per category of x). I want to fit now the function already mentioned with these data... if x is continuous and the counts are for bin intervals, then the same idea as with the simple case works. It has been discussed a few times on the mailing list, and I have a binned estimator for this in the statsmodels.sandbox that you could use or use as recipe. 
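As a rough sketch of that idea (not the statsmodels estimator, just the same matching approach applied to bins): integrate the mixture pdf over each bin and fit the scales so the model's bin probabilities match the observed frequencies. The pdf and weight p follow the thread; the bin edges and counts below are made up for illustration:

import numpy as np
from scipy import integrate, optimize, stats

p = 0.3  # mixture weight, as in the thread

def pdf(x, s1, s2):
    # the two-scale mixture of zero-mean normals from the thread
    return (p / np.sqrt(2 * np.pi * s1**2) * np.exp(-x**2 / (2. * s1**2))
            + (1 - p) / np.sqrt(2 * np.pi * s2**2) * np.exp(-x**2 / (2. * s2**2)))

# made-up histogram: bin edges (in metres) and observed counts per bin
edges = np.array([-2000., -500., -100., 100., 500., 2000.])
counts = np.array([10., 40., 100., 40., 10.])
obs_freq = counts / counts.sum()

def binprobs(params):
    s1, s2 = np.abs(params)
    return np.array([integrate.quad(pdf, lo, hi, args=(s1, s2))[0]
                     for lo, hi in zip(edges[:-1], edges[1:])])

def resid(params):
    # mismatch between observed bin frequencies and the model's bin probabilities
    return obs_freq - binprobs(params)

s1_fit, s2_fit = np.abs(optimize.leastsq(resid, [200., 1200.])[0])
print(s1_fit)
print(s2_fit)

# rough goodness of fit: chi-square against expected counts
# (expected counts rescaled to the same total; needs roughly >= 5 per bin)
bp = binprobs([s1_fit, s2_fit])
expected = counts.sum() * bp / bp.sum()
print(stats.chisquare(counts, f_exp=expected))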
(scikits.statsmodels.distributions.estimators.fitbinned uses maximum likelihood estimation with random draws from bins with multinomial, looking at it again after a long time I'm not quite sure anymore why the multinomial is in there. There is also a fitbinnedgmm which is matching moments, but I'm also not sure what the status is, since I started to rewrite the gmm stuff a long time ago.) > > 2) Can I get a value how good the fit is? I think for binned data, the chisquare test in scipy.stats should work out of the box (with at least 5 expected counts per bin). I haven't thought yet about the goodness-of-fit problem for the binned mle version. Josef > > /Johannes > >> >> Josef >> >> > >> > The error is: >> > NameError: global name 'Inf' is not defined >> > >> > /Johannes >> > >> > -------- Original-Nachricht -------- >> >> Datum: Wed, 8 Jun 2011 11:37:52 -0400 >> >> Von: josef.pktd at gmail.com >> >> An: SciPy Users List >> >> Betreff: Re: [SciPy-User] How to fit a curve/function? >> > >> >> On Wed, Jun 8, 2011 at 11:37 AM, ? wrote: >> >> > On Wed, Jun 8, 2011 at 10:56 AM, Johannes Radinger >> >> wrote: >> >> >> >> >> >> -------- Original-Nachricht -------- >> >> >>> Datum: Wed, 8 Jun 2011 10:33:45 -0400 >> >> >>> Von: josef.pktd at gmail.com >> >> >>> An: SciPy Users List >> >> >>> Betreff: Re: [SciPy-User] How to fit a curve/function? >> >> >> >> >> >>> On Wed, Jun 8, 2011 at 10:27 AM, Johannes Radinger >> >> >> >>> wrote: >> >> >>> > >> >> >>> > -------- Original-Nachricht -------- >> >> >>> >> Datum: Wed, 8 Jun 2011 10:12:58 -0400 >> >> >>> >> Von: josef.pktd at gmail.com >> >> >>> >> An: SciPy Users List >> >> >>> >> Betreff: Re: [SciPy-User] How to fit a curve/function? >> >> >>> > >> >> >>> >> On Wed, Jun 8, 2011 at 9:41 AM, Johannes Radinger >> >> >> >> >>> >> wrote: >> >> >>> >> > >> >> >>> >> > -------- Original-Nachricht -------- >> >> >>> >> >> Datum: Wed, 8 Jun 2011 07:10:38 -0400 >> >> >>> >> >> Von: josef.pktd at gmail.com >> >> >>> >> >> An: SciPy Users List >> >> >>> >> >> Betreff: Re: [SciPy-User] How to fit a curve/function? >> >> >>> >> > >> >> >>> >> >> On Wed, Jun 8, 2011 at 6:52 AM, Johannes Radinger >> >> >> >> >>> >> >> wrote: >> >> >>> >> >> > Hello, >> >> >>> >> >> > >> >> >>> >> >> > I've got following function describing any kind of animal >> >> >>> dispersal >> >> >>> >> >> kernel: >> >> >>> >> >> > >> >> >>> >> >> > def pdf(x,s1,s2): >> >> >>> >> >> > ? ?return >> >> >>> >> >> >> >> >>> >> >> >> >>> >> >> >> (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(s2*math.sqrt(2*math.pi))*numpy.exp(-((x-0)**(2)/(2*s2**(2))))) >> >> >>> >> >> > >> >> >>> >> >> > On the other hand I've got data from literature with which >> I >> >> want >> >> >>> to >> >> >>> >> fit >> >> >>> >> >> the function so that I get s1, s2 and x. 
>> >> >>> >> >> > Ususally the data in the literature are as follows: >> >> >>> >> >> > >> >> >>> >> >> > Example 1: 50% of the animals are between -270m and +270m >> and >> >> 90% >> >> >>> >> ?are >> >> >>> >> >> between -500m and + 500m >> >> >>> >> >> > >> >> >>> >> >> > Example 2: 84% is between - 5000 m and +5000m, and 73% are >> >> between >> >> >>> >> >> -1000m and +1000m >> >> >>> >> >> > >> >> >>> >> >> > So far as I understand an integration of the function is >> >> needed to >> >> >>> >> solve >> >> >>> >> >> for s1 and s2 as all the literature data give percentage >> (area >> >> under >> >> >>> >> the >> >> >>> >> >> curve) Can that be used to fit the curve or can that create >> >> ranges >> >> >>> for >> >> >>> >> s1 >> >> >>> >> >> and s2. >> >> >>> >> >> >> >> >>> >> >> I don't see a way around integration. >> >> >>> >> >> >> >> >>> >> >> If you have exactly 2 probabilities, then you can you a >> solver >> >> like >> >> >>> >> >> scipy.optimize.fsolve to match the probabilites >> >> >>> >> >> eg. >> >> >>> >> >> 0.5 = integral pdf from -270 to 270 >> >> >>> >> >> 0.9 = integral pdf from -500 to 500 >> >> >>> >> >> >> >> >>> >> >> If you have more than 2 probabilities, then using >> optimization >> >> of a >> >> >>> >> >> weighted function of the moment conditions would be better. >> >> >>> >> >> >> >> >>> >> >> Josef >> >> >>> >> > >> >> >>> >> > >> >> >>> >> > >> >> >>> >> > Hello again >> >> >>> >> > >> >> >>> >> > I tried following, but without success so far. What do I have >> to >> >> do >> >> >>> >> excactly... >> >> >>> >> > >> >> >>> >> > import numpy >> >> >>> >> > from scipy import stats >> >> >>> >> > from scipy import integrate >> >> >>> >> > from scipy.optimize import fsolve >> >> >>> >> > import math >> >> >>> >> > >> >> >>> >> > p=0.3 >> >> >>> >> > >> >> >>> >> > def pdf(x,s1,s2): >> >> >>> >> > ? ?return >> >> >>> >> >> >> >>> >> >> >> (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(s2*math.sqrt(2*math.pi))*numpy.exp(-((x-0)**(2)/(2*s2**(2))))) >> >> >>> >> > >> >> >>> >> > def equ(s1,s2): >> >> >>> >> > ? ?0.5==integrate.quad(pdf,-270,270,args=(s1,s2)) >> >> >>> >> > ? ?0.9==integrate.quad(pdf,-500,500,args=(s1,s2)) >> >> >>> >> > >> >> >>> >> > result=fsolve(equ, 1,500) >> >> >>> >> > >> >> >>> >> > print result >> >> >>> >> >> >> >>> >> equ needs to return the deviation of the equations (I changed >> some >> >> >>> >> details for s1 just to try it) >> >> >>> >> >> >> >>> >> import numpy >> >> >>> >> from scipy import stats >> >> >>> >> from scipy import integrate >> >> >>> >> from scipy.optimize import fsolve >> >> >>> >> import math >> >> >>> >> >> >> >>> >> p=0.3 >> >> >>> >> >> >> >>> >> def pdf(x,s1,s2): >> >> >>> >> ? ? return >> >> >>> >> >> >> >>> >> >> >> (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(math.sqrt(2*math.pi*s2**2))*numpy.exp(-((x-0)**(2)/(2*s2**(2))))) >> >> >>> >> >> >> >>> >> def equ(arg): >> >> >>> >> ? ? s1,s2 = numpy.abs(arg) >> >> >>> >> ? ? cond1 = 0.5 - integrate.quad(pdf,-270,270,args=(s1,s2))[0] >> >> >>> >> ? ? cond2 = 0.9 - integrate.quad(pdf,-500,500,args=(s1,s2))[0] >> >> >>> >> ? ? return [cond1, cond2] >> >> >>> >> >> >> >>> >> result=fsolve(equ, [200., 1200]) >> >> >> >> >> >> thank you for your last reply...seems that the parameters of the two >> >> normals are nearly identical... 
anyway just two small addtional >> questions: >> >> >> >> >> >> 1)in fsolve(equ, [200., 1200]) the 200 and 1200 are kind of start >> >> values so far as I understand...how should these be choosen? what is >> >> recommended? >> >> > >> >> > There is no general solution for choosing starting values, in your >> >> > case it should be possible to >> >> > >> >> >>>> q = np.array([0.5, 0.9]) >> >> >>>> cr = x/stats.norm.ppf(0.5 + q/2.) >> >> >>>> x = [270, 500] >> >> >>>> q = np.array([0.5, 0.9]) >> >> >>>> x = [270, 500] >> >> >>>> cr = x/stats.norm.ppf(0.5 + q/2.) >> >> >>>> stats.norm.cdf(500, scale=cr[1]) - stats.norm.cdf(-500, >> scale=cr[1]) >> >> > 0.89999999999999991 >> >> ------- >> >> I forgot to remove the typos >> >> >>>> stats.norm.cdf(q[0], scale=cr[1]) - stats.norm.cdf(-q[0], >> >> scale=cr[0]) >> >> > 0.0011545021185267457 >> >> >>>> stats.norm.cdf(q[0], scale=cr[0]) - stats.norm.cdf(-q[0], >> >> scale=cr[0]) >> >> > 0.000996601515122153 >> >> --------- >> >> >>>> stats.norm.cdf(x[0], scale=cr[0]) - stats.norm.cdf(-x[0], >> >> scale=cr[0]) >> >> > 0.5 >> >> >>>> sol = fsolve(equ, np.sort(cr)) >> >> > >> >> > there are some numerical problems finding the solution (???) >> >> > >> >> >>>> equ(sol) >> >> > array([-0.05361093, ?0.05851309]) >> >> >>>> from pprint import pprint >> >> >>>> pprint(fsolve(equ, np.sort(cr), xtol=1e-10, full_output=1)) >> >> > (array([ 354.32616549, ?354.69918062]), >> >> > ?{'fjac': array([[-0.7373189 , -0.67554484], >> >> > ? ? ? [ 0.67554484, -0.7373189 ]]), >> >> > ?'fvec': array([-0.05361093, ?0.05851309]), >> >> > ?'nfev': 36, >> >> > ?'qtf': array([ ?1.40019135e-07, ?-7.93593929e-02]), >> >> > ?'r': array([ -5.21390161e-04, ?-1.21700831e-03, >> 3.88274320e-07])}, >> >> > ?5, >> >> > ?'The iteration is not making good progress, as measured by the \n >> >> > improvement from the last ten iterations.') >> >> > >> >> >> >> >> >> 2) How can that be solve if I have I third condition (overfitted) >> can >> >> that be used as well or how does the alternative look like? >> >> > >> >> > use optimize.leastsq on equ (I never tried this for this case) >> >> > use fmin on the sum of squared errors >> >> > >> >> > if the intervals for the probabilities are non-overlapping (interval >> >> > data), then there is an optimal weighting matrix, (but my code for >> >> > that in the statsmodels.sandbox is not verified). >> >> > >> >> > Josef >> >> > >> >> > >> >> >> >> >> >> /johannes >> >> >> >> >> >>> >> >> >> >>> >> print result >> >> >>> >> >> >> >>> >> but in the results I get the parameters are very close to each >> >> other >> >> >>> >> [-356.5283675 ? 353.82544075] >> >> >>> >> >> >> >>> >> the pdf looks just like a mixture of 2 normals both with loc=0, >> >> then >> >> >>> >> maybe the cdf of norm can be used directly >> >> >>> > >> >> >>> > >> >> >>> > Thank you for that hint... First yes these are 2 superimposed >> >> normals >> >> >>> but for other reasons I want to use the original formula instead of >> >> the >> >> >>> stats.functions... >> >> >>> > >> >> >>> > anyway there is still a thing...the locator s1 and s2 are like >> the >> >> scale >> >> >>> parameter of stats.norm so the are both + and -. For fsolve above >> it >> >> seems >> >> >>> that I get only one parameter (s1 or s2) but for the positive and >> >> negative >> >> >>> side of the distribution. So in actually there are four parameters >> >> -s1, >> >> >>> +s1, -s2, +s2. How can I solve that? 
Maybe I can restrict the >> fsolve >> >> to look >> >> >>> for the two values only in the positive range... >> >> >>> >> >> >>> It doesn't really matter, if the scale only shows up in quadratic >> >> >>> terms, or as in my initial change I added a absolute value, so >> whether >> >> >>> it's positive or negative, it's still only one value, and we >> >> >>> interprete it as postive scale >> >> >>> >> >> >>> s1 = sqrt(s1**2) >> >> >>> >> >> >>> Josef >> >> >>> >> >> >>> > >> >> >>> > any guesses? >> >> >>> > >> >> >>> > /J >> >> >>> > >> >> >>> >> >> >> >>> >> >>> from scipy import stats >> >> >>> >> >>> stats.norm.cdf(270, scale=350) - stats.norm.cdf(-270, >> >> scale=350) >> >> >>> >> 0.55954705470577526 >> >> >>> >> >>> >> >> >>> >> >>> stats.norm.cdf(270, scale=354) - stats.norm.cdf(-270, >> >> scale=354) >> >> >>> >> 0.55436474670960978 >> >> >>> >> >>> stats.norm.cdf(500, scale=354) - stats.norm.cdf(-500, >> >> scale=354) >> >> >>> >> 0.84217642881921018 >> >> >>> >> >> >> >>> >> Josef >> >> >>> >> > >> >> >>> >> > >> >> >>> >> > /Johannes >> >> >>> >> >> >> >> >>> >> >> > >> >> >>> >> >> > /Johannes >> >> >>> >> >> > >> >> >>> >> >> > -- >> >> >>> >> >> > NEU: FreePhone - kostenlos mobil telefonieren! >> >> >>> >> >> > Jetzt informieren: http://www.gmx.net/de/go/freephone >> >> >>> >> >> > _______________________________________________ >> >> >>> >> >> > SciPy-User mailing list >> >> >>> >> >> > SciPy-User at scipy.org >> >> >>> >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >>> >> >> > >> >> >>> >> >> _______________________________________________ >> >> >>> >> >> SciPy-User mailing list >> >> >>> >> >> SciPy-User at scipy.org >> >> >>> >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >>> >> > >> >> >>> >> > -- >> >> >>> >> > NEU: FreePhone - kostenlos mobil telefonieren! >> >> >>> >> > Jetzt informieren: http://www.gmx.net/de/go/freephone >> >> >>> >> > _______________________________________________ >> >> >>> >> > SciPy-User mailing list >> >> >>> >> > SciPy-User at scipy.org >> >> >>> >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >>> >> > >> >> >>> >> _______________________________________________ >> >> >>> >> SciPy-User mailing list >> >> >>> >> SciPy-User at scipy.org >> >> >>> >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >>> > >> >> >>> > -- >> >> >>> > NEU: FreePhone - kostenlos mobil telefonieren! >> >> >>> > Jetzt informieren: http://www.gmx.net/de/go/freephone >> >> >>> > _______________________________________________ >> >> >>> > SciPy-User mailing list >> >> >>> > SciPy-User at scipy.org >> >> >>> > http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >>> > >> >> >>> _______________________________________________ >> >> >>> SciPy-User mailing list >> >> >>> SciPy-User at scipy.org >> >> >>> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >> >> >> >> -- >> >> >> NEU: FreePhone - kostenlos mobil telefonieren! >> >> >> Jetzt informieren: http://www.gmx.net/de/go/freephone >> >> >> _______________________________________________ >> >> >> SciPy-User mailing list >> >> >> SciPy-User at scipy.org >> >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >> >> >> > >> >> _______________________________________________ >> >> SciPy-User mailing list >> >> SciPy-User at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> > -- >> > NEU: FreePhone - kostenlos mobil telefonieren! 
>> > Jetzt informieren: http://www.gmx.net/de/go/freephone >> > _______________________________________________ >> > SciPy-User mailing list >> > SciPy-User at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > -- > NEU: FreePhone - kostenlos mobil telefonieren! > Jetzt informieren: http://www.gmx.net/de/go/freephone > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Fri Jun 10 09:40:36 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 10 Jun 2011 09:40:36 -0400 Subject: [SciPy-User] How to fit a curve/function? In-Reply-To: References: <20110607113201.7B02.B1C76292@gmail.com> <20110608105217.222500@gmx.net> <20110608134101.259520@gmx.net> <20110608142743.77890@gmx.net> <20110608145625.162510@gmx.net> <20110609155246.27670@gmx.net> <20110610132252.27640@gmx.net> Message-ID: On Fri, Jun 10, 2011 at 9:39 AM, wrote: > On Fri, Jun 10, 2011 at 9:22 AM, Johannes Radinger wrote: >> >> -------- Original-Nachricht -------- >>> Datum: Thu, 9 Jun 2011 12:07:47 -0400 >>> Von: josef.pktd at gmail.com >>> An: SciPy Users List >>> Betreff: Re: [SciPy-User] How to fit a curve/function? >> >>> On Thu, Jun 9, 2011 at 11:52 AM, Johannes Radinger >>> wrote: >>> > Hello again... >>> > >>> > i try no to fit a curve using integrals as conditions. >>> > the scipy manual says that integrations to infinite are possible with >>> Inf, >>> > >>> > I tried following but I fail (saying inf is not defined): >>> > >>> > cond2 = 5.0/10/2 - integrate.quad(pdf,35000,Inf,args=(s1,s2))[0] >>> > >>> > what causes the problem? Do I use quad/Inf in a wrong way? >>> >>> numpy.inf ? ?inf doesn't exist in python itself >> >> Oh thank you for that! I was just confused because in the manual Inf is used itself in the example...anyway it works now.. >> >> just additoinal questions: >> >> 1) How can I fit the curve with other kind of data. Like, if got something >> like a histogram (counts per category of x). I want to fit now the function already mentioned with these data... > > if x is continuous and the counts are for bin intervals, then the same > idea as with the simple case works. It has been discussed a few times > on the mailing list, and I have a binned estimator for this in the > statsmodels.sandbox that you could use or use as recipe. > (scikits.statsmodels.distributions.estimators.fitbinned ?uses maximum > likelihood estimation with random draws from bins with multinomial, > looking at it again after a long time I'm not quite sure anymore why > the multinomial is in there. There is also a fitbinnedgmm which is > matching moments, but I'm also not sure what the status is, since I > started to rewrite the gmm stuff a long time ago.) I forgot to add: gmm stands for generalized method of moments, not gaussian mixture models. Josef > >> >> 2) Can I get a value how good the fit is? > > I think for binned data, the chisquare test in scipy.stats should work > out of the box (with at least 5 expected counts per bin). > I haven't thought yet about the goodness-of-fit problem for the binned > mle version. 
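(Aside: a minimal sketch of the binned-fitting idea described just above. This is not the statsmodels sandbox code; the bin edges and counts are invented for illustration. The two scale parameters are estimated by minimizing a multinomial negative log-likelihood over the bin probabilities, and stats.chisquare is then used as the goodness-of-fit check mentioned above.)

import numpy as np
from scipy import integrate, optimize, stats

p = 0.3

def pdf(x, s1, s2):
    return (p / np.sqrt(2 * np.pi * s1**2) * np.exp(-x**2 / (2 * s1**2))
            + (1 - p) / np.sqrt(2 * np.pi * s2**2) * np.exp(-x**2 / (2 * s2**2)))

# hypothetical histogram: bin edges and observed counts per bin
edges = np.array([-1500., -500., -270., 0., 270., 500., 1500.])
counts = np.array([35, 60, 105, 105, 60, 35])
n = counts.sum()

def binprobs(s1, s2):
    # probability mass of each bin under the current parameters
    return np.array([integrate.quad(pdf, lo, hi, args=(s1, s2))[0]
                     for lo, hi in zip(edges[:-1], edges[1:])])

def negloglike(params):
    s1, s2 = np.abs(params)
    probs = binprobs(s1, s2)
    probs = probs / probs.sum()      # renormalize to the observed range
    return -np.sum(counts * np.log(probs))

s1_hat, s2_hat = np.abs(optimize.fmin(negloglike, [200., 1200.]))

# goodness of fit: expected counts should be at least 5 in every bin
probs_hat = binprobs(s1_hat, s2_hat)
expected = n * probs_hat / probs_hat.sum()
chi2, pval = stats.chisquare(counts, expected, ddof=2)  # 2 fitted parameters
print s1_hat, s2_hat, chi2, pval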
> > Josef > > >> >> /Johannes >> >>> >>> Josef >>> >>> > >>> > The error is: >>> > NameError: global name 'Inf' is not defined >>> > >>> > /Johannes >>> > >>> > -------- Original-Nachricht -------- >>> >> Datum: Wed, 8 Jun 2011 11:37:52 -0400 >>> >> Von: josef.pktd at gmail.com >>> >> An: SciPy Users List >>> >> Betreff: Re: [SciPy-User] How to fit a curve/function? >>> > >>> >> On Wed, Jun 8, 2011 at 11:37 AM, ? wrote: >>> >> > On Wed, Jun 8, 2011 at 10:56 AM, Johannes Radinger >>> >> wrote: >>> >> >> >>> >> >> -------- Original-Nachricht -------- >>> >> >>> Datum: Wed, 8 Jun 2011 10:33:45 -0400 >>> >> >>> Von: josef.pktd at gmail.com >>> >> >>> An: SciPy Users List >>> >> >>> Betreff: Re: [SciPy-User] How to fit a curve/function? >>> >> >> >>> >> >>> On Wed, Jun 8, 2011 at 10:27 AM, Johannes Radinger >>> >>> >> >>> wrote: >>> >> >>> > >>> >> >>> > -------- Original-Nachricht -------- >>> >> >>> >> Datum: Wed, 8 Jun 2011 10:12:58 -0400 >>> >> >>> >> Von: josef.pktd at gmail.com >>> >> >>> >> An: SciPy Users List >>> >> >>> >> Betreff: Re: [SciPy-User] How to fit a curve/function? >>> >> >>> > >>> >> >>> >> On Wed, Jun 8, 2011 at 9:41 AM, Johannes Radinger >>> >> >>> >> >>> >> wrote: >>> >> >>> >> > >>> >> >>> >> > -------- Original-Nachricht -------- >>> >> >>> >> >> Datum: Wed, 8 Jun 2011 07:10:38 -0400 >>> >> >>> >> >> Von: josef.pktd at gmail.com >>> >> >>> >> >> An: SciPy Users List >>> >> >>> >> >> Betreff: Re: [SciPy-User] How to fit a curve/function? >>> >> >>> >> > >>> >> >>> >> >> On Wed, Jun 8, 2011 at 6:52 AM, Johannes Radinger >>> >> >>> >> >>> >> >> wrote: >>> >> >>> >> >> > Hello, >>> >> >>> >> >> > >>> >> >>> >> >> > I've got following function describing any kind of animal >>> >> >>> dispersal >>> >> >>> >> >> kernel: >>> >> >>> >> >> > >>> >> >>> >> >> > def pdf(x,s1,s2): >>> >> >>> >> >> > ? ?return >>> >> >>> >> >> >>> >> >>> >> >>> >> >>> >>> >> >>> (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(s2*math.sqrt(2*math.pi))*numpy.exp(-((x-0)**(2)/(2*s2**(2))))) >>> >> >>> >> >> > >>> >> >>> >> >> > On the other hand I've got data from literature with which >>> I >>> >> want >>> >> >>> to >>> >> >>> >> fit >>> >> >>> >> >> the function so that I get s1, s2 and x. >>> >> >>> >> >> > Ususally the data in the literature are as follows: >>> >> >>> >> >> > >>> >> >>> >> >> > Example 1: 50% of the animals are between -270m and +270m >>> and >>> >> 90% >>> >> >>> >> ?are >>> >> >>> >> >> between -500m and + 500m >>> >> >>> >> >> > >>> >> >>> >> >> > Example 2: 84% is between - 5000 m and +5000m, and 73% are >>> >> between >>> >> >>> >> >> -1000m and +1000m >>> >> >>> >> >> > >>> >> >>> >> >> > So far as I understand an integration of the function is >>> >> needed to >>> >> >>> >> solve >>> >> >>> >> >> for s1 and s2 as all the literature data give percentage >>> (area >>> >> under >>> >> >>> >> the >>> >> >>> >> >> curve) Can that be used to fit the curve or can that create >>> >> ranges >>> >> >>> for >>> >> >>> >> s1 >>> >> >>> >> >> and s2. >>> >> >>> >> >> >>> >> >>> >> >> I don't see a way around integration. >>> >> >>> >> >> >>> >> >>> >> >> If you have exactly 2 probabilities, then you can you a >>> solver >>> >> like >>> >> >>> >> >> scipy.optimize.fsolve to match the probabilites >>> >> >>> >> >> eg. 
>>> >> >>> >> >> 0.5 = integral pdf from -270 to 270 >>> >> >>> >> >> 0.9 = integral pdf from -500 to 500 >>> >> >>> >> >> >>> >> >>> >> >> If you have more than 2 probabilities, then using >>> optimization >>> >> of a >>> >> >>> >> >> weighted function of the moment conditions would be better. >>> >> >>> >> >> >>> >> >>> >> >> Josef >>> >> >>> >> > >>> >> >>> >> > >>> >> >>> >> > >>> >> >>> >> > Hello again >>> >> >>> >> > >>> >> >>> >> > I tried following, but without success so far. What do I have >>> to >>> >> do >>> >> >>> >> excactly... >>> >> >>> >> > >>> >> >>> >> > import numpy >>> >> >>> >> > from scipy import stats >>> >> >>> >> > from scipy import integrate >>> >> >>> >> > from scipy.optimize import fsolve >>> >> >>> >> > import math >>> >> >>> >> > >>> >> >>> >> > p=0.3 >>> >> >>> >> > >>> >> >>> >> > def pdf(x,s1,s2): >>> >> >>> >> > ? ?return >>> >> >>> >> >>> >> >>> >>> >> >>> (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(s2*math.sqrt(2*math.pi))*numpy.exp(-((x-0)**(2)/(2*s2**(2))))) >>> >> >>> >> > >>> >> >>> >> > def equ(s1,s2): >>> >> >>> >> > ? ?0.5==integrate.quad(pdf,-270,270,args=(s1,s2)) >>> >> >>> >> > ? ?0.9==integrate.quad(pdf,-500,500,args=(s1,s2)) >>> >> >>> >> > >>> >> >>> >> > result=fsolve(equ, 1,500) >>> >> >>> >> > >>> >> >>> >> > print result >>> >> >>> >> >>> >> >>> >> equ needs to return the deviation of the equations (I changed >>> some >>> >> >>> >> details for s1 just to try it) >>> >> >>> >> >>> >> >>> >> import numpy >>> >> >>> >> from scipy import stats >>> >> >>> >> from scipy import integrate >>> >> >>> >> from scipy.optimize import fsolve >>> >> >>> >> import math >>> >> >>> >> >>> >> >>> >> p=0.3 >>> >> >>> >> >>> >> >>> >> def pdf(x,s1,s2): >>> >> >>> >> ? ? return >>> >> >>> >> >>> >> >>> >>> >> >>> (p/(math.sqrt(2*math.pi*s1**2))*numpy.exp(-((x-0)**(2)/(2*s1**(2)))))+((1-p)/(math.sqrt(2*math.pi*s2**2))*numpy.exp(-((x-0)**(2)/(2*s2**(2))))) >>> >> >>> >> >>> >> >>> >> def equ(arg): >>> >> >>> >> ? ? s1,s2 = numpy.abs(arg) >>> >> >>> >> ? ? cond1 = 0.5 - integrate.quad(pdf,-270,270,args=(s1,s2))[0] >>> >> >>> >> ? ? cond2 = 0.9 - integrate.quad(pdf,-500,500,args=(s1,s2))[0] >>> >> >>> >> ? ? return [cond1, cond2] >>> >> >>> >> >>> >> >>> >> result=fsolve(equ, [200., 1200]) >>> >> >> >>> >> >> thank you for your last reply...seems that the parameters of the two >>> >> normals are nearly identical... anyway just two small addtional >>> questions: >>> >> >> >>> >> >> 1)in fsolve(equ, [200., 1200]) the 200 and 1200 are kind of start >>> >> values so far as I understand...how should these be choosen? what is >>> >> recommended? >>> >> > >>> >> > There is no general solution for choosing starting values, in your >>> >> > case it should be possible to >>> >> > >>> >> >>>> q = np.array([0.5, 0.9]) >>> >> >>>> cr = x/stats.norm.ppf(0.5 + q/2.) >>> >> >>>> x = [270, 500] >>> >> >>>> q = np.array([0.5, 0.9]) >>> >> >>>> x = [270, 500] >>> >> >>>> cr = x/stats.norm.ppf(0.5 + q/2.) 
>>> >> >>>> stats.norm.cdf(500, scale=cr[1]) - stats.norm.cdf(-500, >>> scale=cr[1]) >>> >> > 0.89999999999999991 >>> >> ------- >>> >> I forgot to remove the typos >>> >> >>>> stats.norm.cdf(q[0], scale=cr[1]) - stats.norm.cdf(-q[0], >>> >> scale=cr[0]) >>> >> > 0.0011545021185267457 >>> >> >>>> stats.norm.cdf(q[0], scale=cr[0]) - stats.norm.cdf(-q[0], >>> >> scale=cr[0]) >>> >> > 0.000996601515122153 >>> >> --------- >>> >> >>>> stats.norm.cdf(x[0], scale=cr[0]) - stats.norm.cdf(-x[0], >>> >> scale=cr[0]) >>> >> > 0.5 >>> >> >>>> sol = fsolve(equ, np.sort(cr)) >>> >> > >>> >> > there are some numerical problems finding the solution (???) >>> >> > >>> >> >>>> equ(sol) >>> >> > array([-0.05361093, ?0.05851309]) >>> >> >>>> from pprint import pprint >>> >> >>>> pprint(fsolve(equ, np.sort(cr), xtol=1e-10, full_output=1)) >>> >> > (array([ 354.32616549, ?354.69918062]), >>> >> > ?{'fjac': array([[-0.7373189 , -0.67554484], >>> >> > ? ? ? [ 0.67554484, -0.7373189 ]]), >>> >> > ?'fvec': array([-0.05361093, ?0.05851309]), >>> >> > ?'nfev': 36, >>> >> > ?'qtf': array([ ?1.40019135e-07, ?-7.93593929e-02]), >>> >> > ?'r': array([ -5.21390161e-04, ?-1.21700831e-03, >>> 3.88274320e-07])}, >>> >> > ?5, >>> >> > ?'The iteration is not making good progress, as measured by the \n >>> >> > improvement from the last ten iterations.') >>> >> > >>> >> >> >>> >> >> 2) How can that be solve if I have I third condition (overfitted) >>> can >>> >> that be used as well or how does the alternative look like? >>> >> > >>> >> > use optimize.leastsq on equ (I never tried this for this case) >>> >> > use fmin on the sum of squared errors >>> >> > >>> >> > if the intervals for the probabilities are non-overlapping (interval >>> >> > data), then there is an optimal weighting matrix, (but my code for >>> >> > that in the statsmodels.sandbox is not verified). >>> >> > >>> >> > Josef >>> >> > >>> >> > >>> >> >> >>> >> >> /johannes >>> >> >> >>> >> >>> >> >>> >> >>> >> print result >>> >> >>> >> >>> >> >>> >> but in the results I get the parameters are very close to each >>> >> other >>> >> >>> >> [-356.5283675 ? 353.82544075] >>> >> >>> >> >>> >> >>> >> the pdf looks just like a mixture of 2 normals both with loc=0, >>> >> then >>> >> >>> >> maybe the cdf of norm can be used directly >>> >> >>> > >>> >> >>> > >>> >> >>> > Thank you for that hint... First yes these are 2 superimposed >>> >> normals >>> >> >>> but for other reasons I want to use the original formula instead of >>> >> the >>> >> >>> stats.functions... >>> >> >>> > >>> >> >>> > anyway there is still a thing...the locator s1 and s2 are like >>> the >>> >> scale >>> >> >>> parameter of stats.norm so the are both + and -. For fsolve above >>> it >>> >> seems >>> >> >>> that I get only one parameter (s1 or s2) but for the positive and >>> >> negative >>> >> >>> side of the distribution. So in actually there are four parameters >>> >> -s1, >>> >> >>> +s1, -s2, +s2. How can I solve that? Maybe I can restrict the >>> fsolve >>> >> to look >>> >> >>> for the two values only in the positive range... >>> >> >>> >>> >> >>> It doesn't really matter, if the scale only shows up in quadratic >>> >> >>> terms, or as in my initial change I added a absolute value, so >>> whether >>> >> >>> it's positive or negative, it's still only one value, and we >>> >> >>> interprete it as postive scale >>> >> >>> >>> >> >>> s1 = sqrt(s1**2) >>> >> >>> >>> >> >>> Josef >>> >> >>> >>> >> >>> > >>> >> >>> > any guesses? 
>>> >> >>> > >>> >> >>> > /J >>> >> >>> > >>> >> >>> >> >>> >> >>> >> >>> from scipy import stats >>> >> >>> >> >>> stats.norm.cdf(270, scale=350) - stats.norm.cdf(-270, >>> >> scale=350) >>> >> >>> >> 0.55954705470577526 >>> >> >>> >> >>> >>> >> >>> >> >>> stats.norm.cdf(270, scale=354) - stats.norm.cdf(-270, >>> >> scale=354) >>> >> >>> >> 0.55436474670960978 >>> >> >>> >> >>> stats.norm.cdf(500, scale=354) - stats.norm.cdf(-500, >>> >> scale=354) >>> >> >>> >> 0.84217642881921018 >>> >> >>> >> >>> >> >>> >> Josef >>> >> >>> >> > >>> >> >>> >> > >>> >> >>> >> > /Johannes >>> >> >>> >> >> >>> >> >>> >> >> > >>> >> >>> >> >> > /Johannes >>> >> >>> >> >> > >>> >> >>> >> >> > -- >>> >> >>> >> >> > NEU: FreePhone - kostenlos mobil telefonieren! >>> >> >>> >> >> > Jetzt informieren: http://www.gmx.net/de/go/freephone >>> >> >>> >> >> > _______________________________________________ >>> >> >>> >> >> > SciPy-User mailing list >>> >> >>> >> >> > SciPy-User at scipy.org >>> >> >>> >> >> > http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> >>> >> >> > >>> >> >>> >> >> _______________________________________________ >>> >> >>> >> >> SciPy-User mailing list >>> >> >>> >> >> SciPy-User at scipy.org >>> >> >>> >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> >>> >> > >>> >> >>> >> > -- >>> >> >>> >> > NEU: FreePhone - kostenlos mobil telefonieren! >>> >> >>> >> > Jetzt informieren: http://www.gmx.net/de/go/freephone >>> >> >>> >> > _______________________________________________ >>> >> >>> >> > SciPy-User mailing list >>> >> >>> >> > SciPy-User at scipy.org >>> >> >>> >> > http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> >>> >> > >>> >> >>> >> _______________________________________________ >>> >> >>> >> SciPy-User mailing list >>> >> >>> >> SciPy-User at scipy.org >>> >> >>> >> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> >>> > >>> >> >>> > -- >>> >> >>> > NEU: FreePhone - kostenlos mobil telefonieren! >>> >> >>> > Jetzt informieren: http://www.gmx.net/de/go/freephone >>> >> >>> > _______________________________________________ >>> >> >>> > SciPy-User mailing list >>> >> >>> > SciPy-User at scipy.org >>> >> >>> > http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> >>> > >>> >> >>> _______________________________________________ >>> >> >>> SciPy-User mailing list >>> >> >>> SciPy-User at scipy.org >>> >> >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> >> >>> >> >> -- >>> >> >> NEU: FreePhone - kostenlos mobil telefonieren! >>> >> >> Jetzt informieren: http://www.gmx.net/de/go/freephone >>> >> >> _______________________________________________ >>> >> >> SciPy-User mailing list >>> >> >> SciPy-User at scipy.org >>> >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> >> >>> >> > >>> >> _______________________________________________ >>> >> SciPy-User mailing list >>> >> SciPy-User at scipy.org >>> >> http://mail.scipy.org/mailman/listinfo/scipy-user >>> > >>> > -- >>> > NEU: FreePhone - kostenlos mobil telefonieren! >>> > Jetzt informieren: http://www.gmx.net/de/go/freephone >>> > _______________________________________________ >>> > SciPy-User mailing list >>> > SciPy-User at scipy.org >>> > http://mail.scipy.org/mailman/listinfo/scipy-user >>> > >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> -- >> NEU: FreePhone - kostenlos mobil telefonieren! 
>> Jetzt informieren: http://www.gmx.net/de/go/freephone >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > From matthew.brett at gmail.com Fri Jun 10 10:58:26 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 10 Jun 2011 14:58:26 +0000 Subject: [SciPy-User] demande de stage In-Reply-To: <80207.31933.qm@web24103.mail.ird.yahoo.com> References: <80207.31933.qm@web24103.mail.ird.yahoo.com> Message-ID: Hi, 2011/6/10 Ali N Traore : > bonjour, > ?je me nomme ali traore je suis ?tudiant ? la licence 1 mpsi ?j'ai eu mon > bac en s?rie C avec la mention Bien et je voudrai bien participer > ??l?am?lioration?de mon niveau en?math?matique?et participer a > des?d?couvertes?scientifique a travers le monde. > mon email est: alin.traore at yahoo.fr Welcome to the community - thanks for the email. Is there any particular part of mathematics, engineering etc that you are interested to work on? Best, Matthew From villamil at brandeis.edu Fri Jun 10 11:36:03 2011 From: villamil at brandeis.edu (villamil) Date: Fri, 10 Jun 2011 08:36:03 -0700 (PDT) Subject: [SciPy-User] [SciPy-user] sparse matrices - scipy In-Reply-To: References: <31792885.post@talk.nabble.com> Message-ID: <31819076.post@talk.nabble.com> Thank you, Andrew. This was exactly what I needed, and also simple. Andrew MacLean-3 wrote: > > If you are just trying to find the number of non-zero values in a > particular row, a command like S[i,:].size or for a column S[:,j].size > should work. Here, S could be of type csc, csr, lil or probably also > dok as these all support indexing and slicing. csc is best for column > slicing, and csr is best for row slicing, so you could also use > different types. csc and csr types do not support assignment though, > while lil and dok do. > > For adding all the entries in each column, I think the csc type would > be best. A code like S[:,j].sum() should work (see > http://docs.scipy.org/doc/scipy-0.9.0/reference/generated/scipy.sparse.csc_matrix.sum.html#scipy.sparse.csc_matrix.sum). > > > On Jun 7, 3:20?pm, villamil wrote: >> I just recently started using python a couple of weeks ago, and I have an >> application with sparse matrices, so I found I need the Scipy package for >> this. >> So I have a sparse matrix S, and I want to do operations on its rows and >> columns: >> -find the count of the nonzero entries in each row ?S[i,:] >> -add all the entries in each column ?S[:,j] >> >> Is there a way to do this, or do I need to access all the elements?, ? >> Is there one particular format csc, csr, lil, coo, dok for which this is >> easier? >> >> Thank you >> -- >> View this message in >> context:http://old.nabble.com/sparse-matrices---scipy-tp31792885p31792885.html >> Sent from the Scipy-User mailing list archive at Nabble.com. >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-U... at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- View this message in context: http://old.nabble.com/sparse-matrices---scipy-tp31792885p31819076.html Sent from the Scipy-User mailing list archive at Nabble.com. 
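(Aside: a small self-contained sketch of the two sparse-matrix operations discussed in the thread above, nonzero counts per row and sums per column. The example matrix is made up; taking np.diff of the csr indptr is one common way to get all per-row nonzero counts at once.)

import numpy as np
from scipy import sparse

# made-up example matrix
A = np.array([[1, 0, 0, 2],
              [0, 0, 3, 0],
              [4, 5, 0, 0]])
S = sparse.csr_matrix(A)

# nonzero count per row of a csr matrix: indptr[i+1] - indptr[i]
row_nnz = np.diff(S.indptr)
print row_nnz            # -> [2 1 2]

# or for a single row via slicing
print S[1, :].nnz        # -> 1

# column sums; .sum works for any sparse format and returns a 1 x ncols matrix
print S.sum(axis=0)      # -> [[5 5 3 2]]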
From warren.weckesser at enthought.com Fri Jun 10 11:49:59 2011 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Fri, 10 Jun 2011 10:49:59 -0500 Subject: [SciPy-User] Odeint: not following input time In-Reply-To: <4DF215DD.2010704@astro.cf.ac.uk> References: <4DF215DD.2010704@astro.cf.ac.uk> Message-ID: On Fri, Jun 10, 2011 at 8:02 AM, Jeroen Meidam wrote: > Hello, > > I have been searching for a solution on other forums, and this mailing > list as well, but without luck. So I decided to post my problem. > > When I run a simple odeint example, I get nice expected results. When I > try my own odeint implementation I get strange results. What's happening > in my own implementation of odeint is that the integrator seems to > ignore my input time-line. I tested this by putting a print statement > inside the function list. > > Lets take the working example program (code1 at bottom of message): > I printed the time inside f(), to check what time it actually uses. It > prints the expected 0 to 100 time sequence. > (Look again: the times printed from within the function are not the same as the values in 't', and if you change 'wx' and 'wy', it will likely print different times. ) > Now when I try this on my own code (code2, which I completely dumbed > down to the most simple case imaginable, all derivatives are zero): > I again print the time, but now I get: > time: 0.0 > time: 0.000293556243968 > time: 0.000587112487935 > time: 2.93614955216 > time: 5.87171199184 > time: 8.80727443152 > time: 38.1628988283 > time: 67.5185232251 > time: 96.8741476218 > time: 390.43039159 > time: 683.986635557 > time: 977.542879525 > time: 3913.1053192 > Which makes no sense at all to me. > First of all it exceeds my time limit and second, it only prints a few > values. > > Does anyone know what could cause this? > > Jeroen, This is normal. The odeint function (which is a wrapper for the Fortran library LSODA) uses adaptive step sizes. Internally, it tries to take as large a step as it can while keeping the estimate of the error within some bound. For well-behaved, slowly varying functions, it is able to take very large steps. When you print the time from within your function, you are seeing the times at which the algorithm is evaluating the right-hand side of your equation. In general, these times will not be at the times requested in the call to odeint. Interpolation is used to fill in the data at the times requested. The error associated with interpolation is no worse than the error associated with the underlying solver algorithm, so this is not a problem. If you look at the solution 'sol', you'll see that it has shape (500, 4). That's one row for each time requested, and all the rows are the same; i.e. the solution is constant, as expected. Warren > > ***** code1 ****** > from scipy.integrate import odeint > from pylab import plot, axis, show > > # Define the initial conditions for each of the four ODEs > inic = [1,0,0,1] > > # Times to evaluate the ODEs. 800 times from 0 to 100 (inclusive). > t = linspace(0, 100, 800) > > # The derivative function. > def f(z,time): > """ Compute the derivate of 'z' at time 'time'. > 'z' is a list of four elements. > """ > print time > wx = sqrt(2.) 
> wy = 1 > return [ z[2], > z[3], > -wx * z[0], > -wy * z[1] ] > > # Compute the ODE > res = odeint(f, inic, t) > > # Plot the results > plot(res[:,0], res[:,1]) > axis('equal') > show() > ***************** > > ***** code2 ****** > import numpy as np > from scipy.integrate import odeint > import matplotlib > from matplotlib.pyplot import figure, show, rc, grid > import pylab as pl > > # Initial conditions > r0 = 0.0025**(-2./3) #omega0 is just a number > phi0 = 0.0 > pphi0 = r0**(1./2) > pr = 0.0 > > T = 1200 > N = 500 > > t = np.linspace(0,T,N) > > p = [M,nu] > init_cond = [r0,phi0,pr0,pphi0] > > def vectorfield(state,time,p): > """ > Arguments: > state : vector of the state variables (reduced) > t : reduced time > p : vector of the parameters > """ > print 'time:', time > > dr_dt = 0. > dphi_dt = 0. > dpr_dt = 0. > dpphi_dt = 0. > > return [ dr_dt,dphi_dt,dpr_dt,dpphi_dt ] > > sol = odeint( vectorfield, init_cond, t, args=(p,) ) > > #followed by some plotting stuff, which is not important for now > > > ****************** > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From villamil at brandeis.edu Fri Jun 10 11:56:28 2011 From: villamil at brandeis.edu (villamil) Date: Fri, 10 Jun 2011 08:56:28 -0700 (PDT) Subject: [SciPy-User] [SciPy-user] question about modules and installed packages with Scipy, Python, Ubuntu Message-ID: <31819260.post@talk.nabble.com> Hello, I installed the scipy package with Synaptic Manager in Ubuntu, but when I run a script with Python-IDLE that imports this package, I get an error that says the package is not found. Next thing I tried is running the script with iPython, for which I don't get that error. The problem is that I'm just getting started and I don't feel quite as comfortable working in iPython. So my question is: does anyone know why Python-IDLE is showing that error? Oh, and I also think but am not sure that it has to do with the fact that I installed version 2.7 after I was using 2.6 for both Python and Python-IDLE. If this is the cause, does anyone know how to fix this? Do I have to uninstall the old versions? Thank you. Diego -- View this message in context: http://old.nabble.com/question-about-modules-and-installed-packages-with-Scipy%2C-Python%2C-Ubuntu-tp31819260p31819260.html Sent from the Scipy-User mailing list archive at Nabble.com. From josef.pktd at gmail.com Fri Jun 10 12:16:11 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 10 Jun 2011 12:16:11 -0400 Subject: [SciPy-User] [SciPy-user] question about modules and installed packages with Scipy, Python, Ubuntu In-Reply-To: <31819260.post@talk.nabble.com> References: <31819260.post@talk.nabble.com> Message-ID: On Fri, Jun 10, 2011 at 11:56 AM, villamil wrote: > > Hello, > I installed the scipy package with Synaptic Manager in Ubuntu, but when I > run a script with Python-IDLE that imports this package, I get an error that > says the package is not found. > Next thing I tried is running the script with iPython, for which I don't get > that error. ?The problem is that I'm just getting started and I don't feel > quite as comfortable working in iPython. > So my question is: does anyone know why Python-IDLE is showing that error? > Oh, and I also think but am not sure that it has to do with the fact that I > installed version 2.7 after I was using 2.6 for both Python and Python-IDLE. 
> If this is the cause, does anyone know how to fix this? ?Do I have to > uninstall the old versions? You need to open the IDLE that comes with the version of python where you have scipy installed. I'm on Windows and have 4 IDLE shortcuts set up, that point to and start with one of python 2.5, 2.6., 2.7. and 3.2 There is no problem having parallel versions of python installed, but it's always necessary to check which python is used. (Although, I don't know the details for Linux) Josef > > Thank you. > Diego > -- > View this message in context: http://old.nabble.com/question-about-modules-and-installed-packages-with-Scipy%2C-Python%2C-Ubuntu-tp31819260p31819260.html > Sent from the Scipy-User mailing list archive at Nabble.com. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Fri Jun 10 12:24:54 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 10 Jun 2011 12:24:54 -0400 Subject: [SciPy-User] [SciPy-user] question about modules and installed packages with Scipy, Python, Ubuntu In-Reply-To: References: <31819260.post@talk.nabble.com> Message-ID: On Fri, Jun 10, 2011 at 12:16 PM, wrote: > On Fri, Jun 10, 2011 at 11:56 AM, villamil wrote: >> >> Hello, >> I installed the scipy package with Synaptic Manager in Ubuntu, but when I >> run a script with Python-IDLE that imports this package, I get an error that >> says the package is not found. >> Next thing I tried is running the script with iPython, for which I don't get >> that error. ?The problem is that I'm just getting started and I don't feel >> quite as comfortable working in iPython. >> So my question is: does anyone know why Python-IDLE is showing that error? >> Oh, and I also think but am not sure that it has to do with the fact that I >> installed version 2.7 after I was using 2.6 for both Python and Python-IDLE. >> If this is the cause, does anyone know how to fix this? ?Do I have to >> uninstall the old versions? > > You need to open the IDLE that comes with the version of python where > you have scipy installed. > > I'm on Windows and have 4 IDLE shortcuts set up, that point to and > start with one of python 2.5, 2.6., 2.7. and 3.2 > > There is no problem having parallel versions of python installed, but > it's always necessary to check which python is used. > > (Although, I don't know the details for Linux) And if I'm allowed to extend the question, using nautilus as file explorer in debian, how can I add IDLE or other programs to the right click list in "open with other application ..." ? Josef > > Josef > > > > >> >> Thank you. >> Diego >> -- >> View this message in context: http://old.nabble.com/question-about-modules-and-installed-packages-with-Scipy%2C-Python%2C-Ubuntu-tp31819260p31819260.html >> Sent from the Scipy-User mailing list archive at Nabble.com. 
>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > From cweisiger at msg.ucsf.edu Fri Jun 10 12:30:43 2011 From: cweisiger at msg.ucsf.edu (Chris Weisiger) Date: Fri, 10 Jun 2011 09:30:43 -0700 Subject: [SciPy-User] [SciPy-user] question about modules and installed packages with Scipy, Python, Ubuntu In-Reply-To: References: <31819260.post@talk.nabble.com> Message-ID: On Fri, Jun 10, 2011 at 9:24 AM, wrote: > > And if I'm allowed to extend the question, using nautilus as file > explorer in debian, how can I add IDLE or other programs to the right > click list in "open with other application ..." ? > > A bit off-topic, but a quick googling for "nautilus "open with other application"" turned up this: http://linux.about.com/library/gnome/blgnome6n4c.htm -Chris -------------- next part -------------- An HTML attachment was scrubbed... URL: From sparkliang at gmail.com Fri Jun 10 13:13:58 2011 From: sparkliang at gmail.com (Spark Liang) Date: Sat, 11 Jun 2011 01:13:58 +0800 Subject: [SciPy-User] How to fix a parameter when using curve_fit ? In-Reply-To: References: Message-ID: thanks, Petro. I remember that matlab has such options in optimize toolsbox On Fri, Jun 10, 2011 at 8:53 PM, Piter_ wrote: > Hi. > In matlab I was doing it using global variables, but is has to be > better way with nested functions. > The idea is to rewrite your fitted function in the way that only not > fixed parameters are fed to optimization routine, > but fixed variables are still available to it. > Hope it helps a bit. > Best. > Petro. > > > > > > On Fri, Jun 10, 2011 at 2:01 PM, Spark Liang wrote: > > Hi, would someone be so kind to tell me how to fix some parameters when > > using curve_fit ? > > I googled it and found that I may use scipy.odr or mpfit, but they seem > > rather complicated. > > I also searched the maillist, someone said it can be done by by writing a > > nested function or a class. but how to do it? > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sparkliang at gmail.com Fri Jun 10 13:16:36 2011 From: sparkliang at gmail.com (Spark Liang) Date: Sat, 11 Jun 2011 01:16:36 +0800 Subject: [SciPy-User] How to fix a parameter when using curve_fit ? In-Reply-To: References: Message-ID: thanks, Josef. I have done it by using mpfit, though it's not very easy. On Fri, Jun 10, 2011 at 8:35 PM, wrote: > On Fri, Jun 10, 2011 at 8:01 AM, Spark Liang wrote: > > Hi, would someone be so kind to tell me how to fix some parameters when > > using curve_fit ? > > I googled it and found that I may use scipy.odr or mpfit, but they seem > > rather complicated. > > I also searched the maillist, someone said it can be done by by writing a > > nested function or a class. but how to do it? > > a full version is at http://docs.python.org/library/functools.html in > the example for functools.partial > > I usually prefer to use a class, where you set the fixed parameters in > the __init__ and access it with self.a(ttributename) inside the > function. 
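(Aside: a short sketch of the two approaches mentioned just above for holding a parameter fixed while using curve_fit, functools.partial and a small class. The model, data and fixed value are invented for illustration; note that p0 is passed explicitly because curve_fit cannot inspect the argument list of a partial or a callable instance.)

import numpy as np
from functools import partial
from scipy.optimize import curve_fit

def model(x, a, b, c):
    return a * np.exp(-b * x) + c

xdata = np.linspace(0, 4, 50)
ydata = model(xdata, 2.5, 1.3, 0.5) + 0.05 * np.random.randn(50)

# 1) functools.partial: fix c=0.5 and fit only a and b
model_fixed_c = partial(model, c=0.5)
popt, pcov = curve_fit(model_fixed_c, xdata, ydata, p0=[1., 1.])

# 2) a class: store the fixed value in __init__, expose only the free
#    parameters in __call__
class ModelFixedC(object):
    def __init__(self, c):
        self.c = c
    def __call__(self, x, a, b):
        return model(x, a, b, self.c)

popt2, pcov2 = curve_fit(ModelFixedC(0.5), xdata, ydata, p0=[1., 1.])

print popt
print popt2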
> > Josef > > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri Jun 10 13:36:20 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 10 Jun 2011 13:36:20 -0400 Subject: [SciPy-User] [SciPy-user] question about modules and installed packages with Scipy, Python, Ubuntu In-Reply-To: References: <31819260.post@talk.nabble.com> Message-ID: On Fri, Jun 10, 2011 at 12:30 PM, Chris Weisiger wrote: > On Fri, Jun 10, 2011 at 9:24 AM, wrote: >> >> And if I'm allowed to extend the question, using nautilus as file >> explorer in debian, how can I add IDLE or other programs to the right >> click list in "open with other application ..." ? >> > > A bit off-topic, but a quick googling for "nautilus "open with other > application"" turned up this: > http://linux.about.com/library/gnome/blgnome6n4c.htm Thanks, should have been obvious, but I didn't see that "use custom" is clickable. (it works, after figuring out that I need to install idle separately, and find it's location). Josef > > -Chris > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From villamil at brandeis.edu Fri Jun 10 13:48:03 2011 From: villamil at brandeis.edu (villamil) Date: Fri, 10 Jun 2011 10:48:03 -0700 (PDT) Subject: [SciPy-User] [SciPy-user] question about modules and installed packages with Scipy, Python, Ubuntu In-Reply-To: References: <31819260.post@talk.nabble.com> Message-ID: <31819931.post@talk.nabble.com> That makes sense, thank you. josef.pktd wrote: > > On Fri, Jun 10, 2011 at 11:56 AM, villamil wrote: >> >> Hello, >> I installed the scipy package with Synaptic Manager in Ubuntu, but when I >> run a script with Python-IDLE that imports this package, I get an error >> that >> says the package is not found. >> Next thing I tried is running the script with iPython, for which I don't >> get >> that error. ?The problem is that I'm just getting started and I don't >> feel >> quite as comfortable working in iPython. >> So my question is: does anyone know why Python-IDLE is showing that >> error? >> Oh, and I also think but am not sure that it has to do with the fact that >> I >> installed version 2.7 after I was using 2.6 for both Python and >> Python-IDLE. >> If this is the cause, does anyone know how to fix this? ?Do I have to >> uninstall the old versions? > > You need to open the IDLE that comes with the version of python where > you have scipy installed. > > I'm on Windows and have 4 IDLE shortcuts set up, that point to and > start with one of python 2.5, 2.6., 2.7. and 3.2 > > There is no problem having parallel versions of python installed, but > it's always necessary to check which python is used. > > (Although, I don't know the details for Linux) > > Josef > > > > >> >> Thank you. >> Diego >> -- >> View this message in context: >> http://old.nabble.com/question-about-modules-and-installed-packages-with-Scipy%2C-Python%2C-Ubuntu-tp31819260p31819260.html >> Sent from the Scipy-User mailing list archive at Nabble.com. 
>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- View this message in context: http://old.nabble.com/question-about-modules-and-installed-packages-with-Scipy%2C-Python%2C-Ubuntu-tp31819260p31819931.html Sent from the Scipy-User mailing list archive at Nabble.com. From marquett at iap.fr Fri Jun 10 11:37:32 2011 From: marquett at iap.fr (Jean-Baptiste Marquette) Date: Sat, 11 Jun 2011 01:37:32 +1000 Subject: [SciPy-User] scipy.test() fails on Mac OS 10.6.7: Message-ID: Dear all, I just reinstalled scipy from git on my MacBook Pro, and ran scipy.test() which failed on some symbols not found. Corresponding log is attached. Any help welcome, thanks. Cheers, Jean-Baptiste -------------- next part -------------- A non-text attachment was scrubbed... Name: scipytest.log Type: application/octet-stream Size: 24888 bytes Desc: not available URL: From gennadiy.rishkin at gmail.com Sat Jun 11 13:59:32 2011 From: gennadiy.rishkin at gmail.com (Gennadiy Rishkin) Date: Sat, 11 Jun 2011 18:59:32 +0100 Subject: [SciPy-User] Accessing Data in Matrix Market File Message-ID: Hi I have a matrix market file and I'm able to read it using: h = mmread('file.mtx') I can get to tthe data via: h.data How do I access the oher fields, such as the coordinates? Gennadiy -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sun Jun 12 04:25:28 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 12 Jun 2011 10:25:28 +0200 Subject: [SciPy-User] scipy.test() fails on Mac OS 10.6.7: In-Reply-To: References: Message-ID: On Fri, Jun 10, 2011 at 5:37 PM, Jean-Baptiste Marquette wrote: > Dear all, > > I just reinstalled scipy from git on my MacBook Pro, and ran scipy.test() > which failed on some symbols not found. Corresponding log is attached. > > Any help welcome, thanks. > > Can you give us the build command you used, compiler versions and a build log? If you used gfortran, did you get it from http://r.research.att.com/tools/? Error below, for future reference. 
Ralf ====================================================================== ERROR: Failure: ImportError (dlopen(/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/fftpack/_fftpack.so, 2): Symbol not found: _cfftb_ Referenced from: /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/fftpack/_fftpack.so Expected in: flat namespace in /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/fftpack/_fftpack.so) ---------------------------------------------------------------------- Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/nose-1.0.0-py2.6.egg/nose/loader.py", line 390, in loadTestsFromName addr.filename, addr.module) File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/nose-1.0.0-py2.6.egg/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/nose-1.0.0-py2.6.egg/nose/importer.py", line 86, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/fftpack/__init__.py", line 10, in from basic import * File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/fftpack/basic.py", line 11, in import _fftpack ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/fftpack/_fftpack.so, 2): Symbol not found: _cfftb_ Referenced from: /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/fftpack/_fftpack.so Expected in: flat namespace in /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/fftpack/_fftpack.so -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sun Jun 12 06:20:40 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 12 Jun 2011 12:20:40 +0200 Subject: [SciPy-User] scipy.stats one-sided two-sided less, greater, signed ? In-Reply-To: References: <4DED1DC5.8090503@gmail.com> Message-ID: On Wed, Jun 8, 2011 at 12:56 PM, wrote: > On Tue, Jun 7, 2011 at 10:37 PM, Bruce Southey wrote: > > On Tue, Jun 7, 2011 at 4:40 PM, Ralf Gommers > > wrote: > >> > >> > >> On Mon, Jun 6, 2011 at 9:34 PM, wrote: > >>> > >>> On Mon, Jun 6, 2011 at 2:34 PM, Bruce Southey > wrote: > >>> > On 06/05/2011 02:43 PM, josef.pktd at gmail.com wrote: > >>> >> What should be the policy on one-sided versus two-sided? > >>> > Yes :-) > >>> > > >>> >> The main reason right now for looking at this is > >>> >> http://projects.scipy.org/scipy/ticket/1394 which specifies a > >>> >> "one-sided" alternative and provides both lower and upper tail. > >>> > That refers to the Fisher's test rather than the more 'traditional' > >>> > one-sided tests. Each value of the Fisher's test has special meanings > >>> > about the value or probability of the 'first cell' under the null > >>> > hypothesis. So it is necessary to provide those three values. > >>> > > >>> >> I would prefer that we follow the alternative patterns similar to R > >>> >> > >>> >> currently only kstest has alternative : 'two_sided' (default), > >>> >> 'less' or 'greater' > >>> >> but this should be added to other tests where it makes sense > >>> > I think that these Kolmogorov-Smirnov tests are not the traditional > >>> > meaning either. 
It is a little mind-boggling to try to think about > cdfs! > >>> > > >>> >> R fisher.exact > >>> >> """alternative indicates the alternative hypothesis and must > be > >>> >> one > >>> >> of "two.sided", "greater" or "less". You can specify just the > initial > >>> >> letter. Only used in the 2 by 2 case.""" > >>> >> > >>> >> mannwhitneyu reports a one-sided test without actually specifying > >>> >> which alternative is used (I thought I remembered other cases like > >>> >> this but don't find any right now) > >>> >> > >>> >> related: > >>> >> in many cases in the two-sided tests the test statistic has a sign > >>> >> that indicates in which tail the test-statistic falls. > >>> >> This is useful in ttests for example, because the one-sided tests > can > >>> >> be backed out from the two-sided tests. (With symmetric > distributions > >>> >> one-sided p-value is just half of the two-sided pvalue) > >>> >> > >>> >> In the discussion of https://github.com/scipy/scipy/pull/8 I > argued > >>> >> that this might mislead users to interpret a two-sided result as a > >>> >> one-sided result. However, I doubt now that this is a strong > argument > >>> >> against not reporting the signed test statistic. > >>> > (I do not follow pull requests so is there a relevant ticket?) > >>> > > >>> >> After going through scipy.stats.stats, it looks like we always > report > >>> >> the signed test statistic. > >>> >> > >>> >> The test statistic in ks_2samp is in all cases defined as a max > value > >>> >> and doesn't have a sign in R either, so adding a sign there would > >>> >> break with the standard definition. > >>> >> one-sided option for ks_2samp would just require to find the > >>> >> distribution of the test statistics D+, D- > >>> >> > >>> >> --- > >>> >> > >>> >> So my proposal for the general pattern (with exceptions for special > >>> >> reasons) would be > >>> >> > >>> >> * add/offer alternative : 'two_sided' (default), 'less' or 'greater' > >>> >> http://projects.scipy.org/scipy/ticket/1394 for now, > >>> >> and adjustments of existing tests in the future (adding the option > can > >>> >> be mostly done in a backwards compatible way and for symmetric > >>> >> distributions like ttest it's just a convenience) > >>> >> mannwhitneyu seems to be the only "weird" one > >> > >> This would actually make the fisher_exact implementation more > consistent, > >> since only one p-value is returned in all cases. I just don't like the R > >> naming much; alternative="greater" does not convey to me that this is a > >> one-sided test using the upper tail. How about: > >> test : {"two-tailed", "lower-tail", "upper-tail"} > >> with two-tailed the default? > > I think matlab uses (in general) larger and smaller, the advantage of > less/smaller and greater/larger is that it directly refers to the > alternative hypothesis, while the meaning in terms of tails is not > always clear (in kstest and I guess some others the test statistics is > just reversed and uses the same tail in both cases) > > so greater smaller is mostly "future proof" across tests, while > reference to the tail can only be used where this is an unambiguous > statement. but see below > > I think I understand your terminology a bit better now, and consistency across all tests is important. 
So I've updated the Fisher's exact patch to use alternative={'two-sided', 'less', greater'} and sent a pull request: https://github.com/scipy/scipy/pull/32 Cheers, Ralf > > > >> > >> Ralf > >> > >> > >>> > >>> >> > >>> >> * report signed test statistic for two-sided alternative (when a > >>> >> signed test statistic exists): which is the status quo in > >>> >> stats.stats, but I didn't know that this is actually pretty > consistent > >>> >> across tests. > >>> >> > >>> >> Opinions ? > >>> >> > >>> >> Josef > >>> >> _______________________________________________ > >>> >> SciPy-User mailing list > >>> >> SciPy-User at scipy.org > >>> >> http://mail.scipy.org/mailman/listinfo/scipy-user > >>> > I think that there is some valid misunderstanding here (as I was in > the > >>> > same situation) regarding what is meant here. My understanding is > that > >>> > under a one-sided hypothesis, all the values of the null hypothesis > only > >>> > exist in one tail of the test distribution. In contrast the values of > >>> > null distribution exist in both tails with a two-sided hypothesis. > Yet > >>> > that interpretation does not have the same meaning as the tails in > the > >>> > Fisher or Kolmogorov-Smirnov tests. > >>> > >>> The tests have a clear Null Hypothesis (equality) and Alternative > >>> Hypothesis (not equal or directional, less or greater). > >>> So the "alternative" should be clearly specified in the function > >>> argument, as in R. > >>> > >>> Whether this corresponds to left and right tails of the distribution > >>> is an "implementation detail" which holds for ttests but not for > >>> kstest/ks_2samp. > >>> > >>> kstest/ks2sample H0: cdf1 == cdf2 and H1: cdf1 != cdf2 or H1: > >>> cdf1 < cdf2 or H1: cdf1 > cdf2 > >>> (looks similar to comparing two survival curves in Kaplan-Meier ?) > >>> > >>> fisher_exact (2 by 2) H0: odds-ratio == 1 and H1: odds-ratio != 1 or > >>> H1: odds-ratio < 1 or H1: odds-ratio > 1 > >>> > >>> I know the kolmogorov-smirnov tests, but for fisher exact and > >>> contingency tables I rely on R > >>> > >>> from R-help: > >>> For 2 by 2 tables, the null of conditional independence is equivalent > >>> to the hypothesis that the odds ratio equals one. <...> The > >>> alternative for a one-sided test is based on the odds ratio, so > >>> alternative = "greater" is a test of the odds ratio being bigger than > >>> or. > >>> Two-sided tests are based on the probabilities of the tables, and take > >>> as ?more extreme? all tables with probabilities less than or equal to > >>> that of the observed table, the p-value being the sum of such > >>> probabilities. > >>> > >>> Josef > >>> > >>> > >>> > > >>> > I never paid much attention to the frequency based tests but it does > not > >>> > surprise if there are no one-sided tests. Most are rank-based so it > is > >>> > rather hard to do in a simply manner - actually I am not even sure > how > >>> > to use a permutation test. 
> >>> > > >>> > Bruce > >>> > > >>> > > >>> > > >>> > _______________________________________________ > >>> > SciPy-User mailing list > >>> > SciPy-User at scipy.org > >>> > http://mail.scipy.org/mailman/listinfo/scipy-user > >>> > > >>> _______________________________________________ > >>> SciPy-User mailing list > >>> SciPy-User at scipy.org > >>> http://mail.scipy.org/mailman/listinfo/scipy-user > >> > >> > >> _______________________________________________ > >> SciPy-User mailing list > >> SciPy-User at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-user > >> > >> > > > > But that is NOT the correct interpretation here! > > I tried to explain to you that this is the not the usual idea > > one-sided vs two-sided tests. > > For example: > > http://www.msu.edu/~fuw/teaching/Fu_ch10_2_categorical.ppt > > "The test holds the marginal totals fixed and computes the > > hypergeometric probability that n11 is at least as large as the > > observed value" > > this still sounds like a less/greater test to me > > > > "The output consists of three p-values: > > Left: Use this when the alternative to independence is that there is > > negative association between the variables. That is, the observations > > tend to lie in lower left and upper right. > > Right: Use this when the alternative to independence is that there is > > positive association between the variables. That is, the observations > > tend to lie in upper left and lower right. > > 2-Tail: Use this when there is no prior alternative. > > " > > There is also the book "Categorical data analysis: using the SAS > > system By Maura E. Stokes, Charles S. Davis, Gary G. Koch" that came > > up via Google that also refers to the n11 cell. > > > > http://www.langsrud.com/fisher.htm > > I was trying to read the Agresti paper referenced there but it has too > much detail to get through in 15 minutes :) > > > "The output consists of three p-values: > > > > Left: Use this when the alternative to independence is that there > > is negative association between the variables. > > That is, the observations tend to lie in lower left and upper right. > > Right: Use this when the alternative to independence is that there > > is positive association between the variables. > > That is, the observations tend to lie in upper left and lower right. > > 2-Tail: Use this when there is no prior alternative. > > > > NOTE: Decide to use Left, Right or 2-Tail before collecting (or > > looking at) the data." > > > > But you will get a different p-value if you switch rows and columns > > because of the dependence on the n11 cell. If you do that then the > > p-values switch between left and right sides as these now refer to > > different hypotheses regarding that first cell. > > switching row and columns doesn't change the p-value in R > reversing columns changes the definition of less and greater, reverses them > > The problem with 2 by 2 contingency tables with given marginals, i.e. > row and column totals, is that we only have one free entry. Any test > on one entry, e.g. element 0,0, pins down all the other ones and > (many) tests then become equivalent. > > > http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_freq_a0000000658.htm > some math got lost > """ > For <2 by 2> tables, one-sided -values for Fisher?s exact test are > defined in terms of the frequency of the cell in the first row and > first column of the table, the (1,1) cell. 
Denoting the observed (1,1) > cell frequency by , the left-sided -value for Fisher?s exact test is > the probability that the (1,1) cell frequency is less than or equal to > . For the left-sided -value, the set includes those tables with a > (1,1) cell frequency less than or equal to . A small left-sided -value > supports the alternative hypothesis that the probability of an > observation being in the first cell is actually less than expected > under the null hypothesis of independent row and column variables. > > Similarly, for a right-sided alternative hypothesis, is the set of > tables where the frequency of the (1,1) cell is greater than or equal > to that in the observed table. A small right-sided -value supports the > alternative that the probability of the first cell is actually greater > than that expected under the null hypothesis. > > Because the (1,1) cell frequency completely determines the table when > the marginal row and column sums are fixed, these one-sided > alternatives can be stated equivalently in terms of other cell > probabilities or ratios of cell probabilities. The left-sided > alternative is equivalent to an odds ratio less than 1, where the odds > ratio equals (). Additionally, the left-sided alternative is > equivalent to the column 1 risk for row 1 being less than the column 1 > risk for row 2, . Similarly, the right-sided alternative is equivalent > to the column 1 risk for row 1 being greater than the column 1 risk > for row 2, . See Agresti (2007) for details. > R C Tables > """ > > I'm not a user of Fisher's exact test (and I have a hard time keeping > the different statements straight), so if left/right or lower/upper > makes more sense to users, then I don't complain. > > To me they are all just independence tests with possible one-sided > alternatives that one distribution dominates the other. (with the same > pattern as ks_2samp or ttest_2samp) > > Josef > > > > > > > Bruce > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sun Jun 12 07:30:39 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 12 Jun 2011 07:30:39 -0400 Subject: [SciPy-User] scipy.stats one-sided two-sided less, greater, signed ? In-Reply-To: References: <4DED1DC5.8090503@gmail.com> Message-ID: On Sun, Jun 12, 2011 at 6:20 AM, Ralf Gommers wrote: > > > On Wed, Jun 8, 2011 at 12:56 PM, wrote: >> >> On Tue, Jun 7, 2011 at 10:37 PM, Bruce Southey wrote: >> > On Tue, Jun 7, 2011 at 4:40 PM, Ralf Gommers >> > wrote: >> >> >> >> >> >> On Mon, Jun 6, 2011 at 9:34 PM, wrote: >> >>> >> >>> On Mon, Jun 6, 2011 at 2:34 PM, Bruce Southey >> >>> wrote: >> >>> > On 06/05/2011 02:43 PM, josef.pktd at gmail.com wrote: >> >>> >> What should be the policy on one-sided versus two-sided? >> >>> > Yes :-) >> >>> > >> >>> >> The main reason right now for looking at this is >> >>> >> http://projects.scipy.org/scipy/ticket/1394 which specifies a >> >>> >> "one-sided" alternative and provides both lower and upper tail. >> >>> > That refers to the Fisher's test rather than the more 'traditional' >> >>> > one-sided tests. 
Each value of the Fisher's test has special >> >>> > meanings >> >>> > about the value or probability of the 'first cell' under the null >> >>> > hypothesis. ?So it is necessary to provide those three values. >> >>> > >> >>> >> I would prefer that we follow the alternative patterns similar to R >> >>> >> >> >>> >> currently only kstest has ? ?alternative : 'two_sided' (default), >> >>> >> 'less' or 'greater' >> >>> >> but this should be added to other tests where it makes sense >> >>> > I think that these Kolmogorov-Smirnov ?tests are not the traditional >> >>> > meaning either. It is a little mind-boggling to try to think about >> >>> > cdfs! >> >>> > >> >>> >> R fisher.exact >> >>> >> """alternative ? ? ? ?indicates the alternative hypothesis and must >> >>> >> be >> >>> >> one >> >>> >> of "two.sided", "greater" or "less". You can specify just the >> >>> >> initial >> >>> >> letter. Only used in the 2 by 2 case.""" >> >>> >> >> >>> >> mannwhitneyu reports a one-sided test without actually specifying >> >>> >> which alternative is used ?(I thought I remembered other cases like >> >>> >> this but don't find any right now) >> >>> >> >> >>> >> related: >> >>> >> in many cases in the two-sided tests the test statistic has a sign >> >>> >> that indicates in which tail the test-statistic falls. >> >>> >> This is useful in ttests for example, because the one-sided tests >> >>> >> can >> >>> >> be backed out from the two-sided tests. (With symmetric >> >>> >> distributions >> >>> >> one-sided p-value is just half of the two-sided pvalue) >> >>> >> >> >>> >> In the discussion of https://github.com/scipy/scipy/pull/8 ?I >> >>> >> argued >> >>> >> that this might mislead users to interpret a two-sided result as a >> >>> >> one-sided result. However, I doubt now that this is a strong >> >>> >> argument >> >>> >> against not reporting the signed test statistic. >> >>> > (I do not follow pull requests so is there a relevant ticket?) >> >>> > >> >>> >> After going through scipy.stats.stats, it looks like we always >> >>> >> report >> >>> >> the signed test statistic. >> >>> >> >> >>> >> The test statistic in ks_2samp is in all cases defined as a max >> >>> >> value >> >>> >> and doesn't have a sign in R either, so adding a sign there would >> >>> >> break with the standard definition. >> >>> >> one-sided option for ks_2samp would just require to find the >> >>> >> distribution of the test statistics D+, D- >> >>> >> >> >>> >> --- >> >>> >> >> >>> >> So my proposal for the general pattern (with exceptions for special >> >>> >> reasons) would be >> >>> >> >> >>> >> * add/offer alternative : 'two_sided' (default), 'less' or >> >>> >> 'greater' >> >>> >> http://projects.scipy.org/scipy/ticket/1394 ?for now, >> >>> >> and adjustments of existing tests in the future (adding the option >> >>> >> can >> >>> >> be mostly done in a backwards compatible way and for symmetric >> >>> >> distributions like ttest it's just a convenience) >> >>> >> mannwhitneyu seems to be the only "weird" one >> >> >> >> This would actually make the fisher_exact implementation more >> >> consistent, >> >> since only one p-value is returned in all cases. I just don't like the >> >> R >> >> naming much; alternative="greater" does not convey to me that this is a >> >> one-sided test using the upper tail. How about: >> >> ??? test : {"two-tailed", "lower-tail", "upper-tail"} >> >> with two-tailed the default? 
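A minimal sketch of the remark quoted above, that a one-sided t-test can be backed out of the signed two-sided result (the data here are made up purely for illustration):

    import numpy as np
    from scipy import stats

    rng = np.random.RandomState(0)        # made-up example data
    x = rng.normal(0.5, 1.0, size=50)
    y = rng.normal(0.0, 1.0, size=50)

    t, p_two = stats.ttest_ind(x, y)      # signed statistic, two-sided p-value
    # one-sided alternative H1: mean(x) > mean(y), recovered from the two-sided test
    p_greater = p_two / 2.0 if t > 0 else 1.0 - p_two / 2.0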
>> >> I think matlab uses (in general) larger and smaller, the advantage of >> less/smaller and greater/larger is that it directly refers to the >> alternative hypothesis, while the meaning in terms of tails is not >> always clear (in kstest and I guess some others the test statistics is >> just reversed and uses the same tail in both cases) >> >> so greater smaller is mostly "future proof" across tests, while >> reference to the tail can only be used where this is an unambiguous >> statement. but see below >> > I think I understand your terminology a bit better now, and consistency > across all tests is important. So I've updated the Fisher's exact patch to > use alternative={'two-sided', 'less', greater'} and sent a pull request: > https://github.com/scipy/scipy/pull/32 looks good to me, I added some comments to the pull request. Josef > > Cheers, > Ralf > >> >> >> >> >> >> Ralf >> >> >> >> >> >>> >> >>> >> >> >>> >> * report signed test statistic for two-sided alternative (when a >> >>> >> signed test statistic exists): ?which is the status quo in >> >>> >> stats.stats, but I didn't know that this is actually pretty >> >>> >> consistent >> >>> >> across tests. >> >>> >> >> >>> >> Opinions ? >> >>> >> >> >>> >> Josef >> >>> >> _______________________________________________ >> >>> >> SciPy-User mailing list >> >>> >> SciPy-User at scipy.org >> >>> >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >>> > I think that there is some valid misunderstanding here (as I was in >> >>> > the >> >>> > same situation) regarding what is meant here. My understanding is >> >>> > that >> >>> > under a one-sided hypothesis, all the values of the null hypothesis >> >>> > only >> >>> > exist in one tail of the test distribution. In contrast the values >> >>> > of >> >>> > null distribution exist in both tails with a two-sided hypothesis. >> >>> > Yet >> >>> > that interpretation does not have the same meaning as the tails in >> >>> > the >> >>> > Fisher or Kolmogorov-Smirnov tests. >> >>> >> >>> The tests have a clear Null Hypothesis (equality) and Alternative >> >>> Hypothesis (not equal or directional, less or greater). >> >>> So the "alternative" should be clearly specified in the function >> >>> argument, as in R. >> >>> >> >>> Whether this corresponds to left and right tails of the distribution >> >>> is an "implementation detail" which holds for ttests but not for >> >>> kstest/ks_2samp. >> >>> >> >>> kstest/ks2sample ? H0: cdf1 == cdf2 ?and H1: ?cdf1 != cdf2 or H1: >> >>> cdf1 < cdf2 or H1: ?cdf1 > cdf2 >> >>> (looks similar to comparing two survival curves in Kaplan-Meier ?) >> >>> >> >>> fisher_exact (2 by 2) ?H0: odds-ratio == 1 and H1: odds-ratio != 1 or >> >>> H1: odds-ratio < 1 or H1: odds-ratio > 1 >> >>> >> >>> I know the kolmogorov-smirnov tests, but for fisher exact and >> >>> contingency tables I rely on R >> >>> >> >>> from R-help: >> >>> For 2 by 2 tables, the null of conditional independence is equivalent >> >>> to the hypothesis that the odds ratio equals one. <...> The >> >>> alternative for a one-sided test is based on the odds ratio, so >> >>> alternative = "greater" is a test of the odds ratio being bigger than >> >>> or. >> >>> Two-sided tests are based on the probabilities of the tables, and take >> >>> as ?more extreme? all tables with probabilities less than or equal to >> >>> that of the observed table, the p-value being the sum of such >> >>> probabilities. 
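A minimal sketch of what this alternative keyword looks like in use, assuming the interface from the Fisher's exact pull request discussed above is available (it follows the same pattern kstest already uses); the 2 by 2 table is made up for illustration:

    import numpy as np
    from scipy import stats

    table = [[8, 2], [1, 5]]              # made-up 2x2 contingency table

    # one-sided alternatives stated in terms of the odds ratio, as in R's fisher.test
    oddsratio, p_greater = stats.fisher_exact(table, alternative='greater')
    oddsratio, p_less = stats.fisher_exact(table, alternative='less')

    # kstest already accepts the same kind of alternative argument
    d, p_one_sided = stats.kstest(np.random.randn(100), 'norm', alternative='less')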
>> >>> >> >>> Josef >> >>> >> >>> >> >>> > >> >>> > I never paid much attention to the frequency based tests but it does >> >>> > not >> >>> > surprise if there are no one-sided tests. Most are rank-based so it >> >>> > is >> >>> > rather hard to do in a simply manner - actually I am not even sure >> >>> > how >> >>> > to use a permutation test. >> >>> > >> >>> > Bruce >> >>> > >> >>> > >> >>> > >> >>> > _______________________________________________ >> >>> > SciPy-User mailing list >> >>> > SciPy-User at scipy.org >> >>> > http://mail.scipy.org/mailman/listinfo/scipy-user >> >>> > >> >>> _______________________________________________ >> >>> SciPy-User mailing list >> >>> SciPy-User at scipy.org >> >>> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >> >> >> >> _______________________________________________ >> >> SciPy-User mailing list >> >> SciPy-User at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >> >> >> > >> > But that is NOT the correct interpretation ?here! >> > I tried to explain to you that this is the not the usual idea >> > one-sided vs two-sided tests. >> > For example: >> > http://www.msu.edu/~fuw/teaching/Fu_ch10_2_categorical.ppt >> > "The test holds the marginal totals fixed and computes the >> > hypergeometric probability that n11 is at least as large as the >> > observed value" >> >> this still sounds like a less/greater test to me >> >> >> > "The output consists of three p-values: >> > Left: Use this when the alternative to independence is that there is >> > negative association between the variables. ?That is, the observations >> > tend to lie in lower left and upper right. >> > Right: Use this when the alternative to independence is that there is >> > positive association between the variables. That is, the observations >> > tend to lie in upper left and lower right. >> > 2-Tail: Use this when there is no prior alternative. >> > " >> > There is also the book "Categorical data analysis: using the SAS >> > system ?By Maura E. Stokes, Charles S. Davis, Gary G. Koch" that came >> > up via Google that also refers to the n11 cell. >> > >> > http://www.langsrud.com/fisher.htm >> >> I was trying to read the Agresti paper referenced there but it has too >> much detail to get through in 15 minutes :) >> >> > "The output consists of three p-values: >> > >> > ? ?Left: Use this when the alternative to independence is that there >> > is negative association between the variables. >> > ? ?That is, the observations tend to lie in lower left and upper right. >> > ? ?Right: Use this when the alternative to independence is that there >> > is positive association between the variables. >> > ? ?That is, the observations tend to lie in upper left and lower right. >> > ? ?2-Tail: Use this when there is no prior alternative. >> > >> > NOTE: Decide to use Left, Right or 2-Tail before collecting (or >> > looking at) the data." >> > >> > But you will get a different p-value if you switch rows and columns >> > because of the dependence on the n11 cell. If you do that then the >> > p-values switch between left and right sides as these now refer to >> > different hypotheses regarding that first cell. >> >> switching row and columns doesn't change the p-value in R >> reversing columns changes the definition of less and greater, reverses >> them >> >> The problem with 2 by 2 contingency tables with given marginals, i.e. >> row and column totals, is that we only have one free entry. Any test >> on one entry, e.g. 
element 0,0, pins down all the other ones and >> (many) tests then become equivalent. >> >> >> http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_freq_a0000000658.htm >> some math got lost >> """ >> For <2 by 2> tables, one-sided -values for Fisher?s exact test are >> defined in terms of the frequency of the cell in the first row and >> first column of the table, the (1,1) cell. Denoting the observed (1,1) >> cell frequency by , the left-sided -value for Fisher?s exact test is >> the probability that the (1,1) cell frequency is less than or equal to >> . For the left-sided -value, the set includes those tables with a >> (1,1) cell frequency less than or equal to . A small left-sided -value >> supports the alternative hypothesis that the probability of an >> observation being in the first cell is actually less than expected >> under the null hypothesis of independent row and column variables. >> >> Similarly, for a right-sided alternative hypothesis, is the set of >> tables where the frequency of the (1,1) cell is greater than or equal >> to that in the observed table. A small right-sided -value supports the >> alternative that the probability of the first cell is actually greater >> than that expected under the null hypothesis. >> >> Because the (1,1) cell frequency completely determines the table when >> the marginal row and column sums are fixed, these one-sided >> alternatives can be stated equivalently in terms of other cell >> probabilities or ratios of cell probabilities. The left-sided >> alternative is equivalent to an odds ratio less than 1, where the odds >> ratio equals (). Additionally, the left-sided alternative is >> equivalent to the column 1 risk for row 1 being less than the column 1 >> risk for row 2, . Similarly, the right-sided alternative is equivalent >> to the column 1 risk for row 1 being greater than the column 1 risk >> for row 2, . See Agresti (2007) for details. >> R C Tables >> """ >> >> I'm not a user of Fisher's exact test (and I have a hard time keeping >> the different statements straight), so if left/right or lower/upper >> makes more sense to users, then I don't complain. >> >> To me they are all just independence tests with possible one-sided >> alternatives that one distribution dominates the other. (with the same >> pattern as ks_2samp or ttest_2samp) >> >> Josef >> >> > >> > >> > Bruce >> > _______________________________________________ >> > SciPy-User mailing list >> > SciPy-User at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From pierre.raybaut at gmail.com Sun Jun 12 08:19:35 2011 From: pierre.raybaut at gmail.com (Pierre Raybaut) Date: Sun, 12 Jun 2011 14:19:35 +0200 Subject: [SciPy-User] ANN: Spyder v2.0.12 Message-ID: Hi all, I am pleased to announced that Spyder v2.0.12 has just been released (changelog available here: http://code.google.com/p/spyderlib/wiki/ChangeLog). This is the last maintenance release of version 2.0, until the forthcoming v2.1 release which is scheduled for the end of the month (see the roadmap here: http://code.google.com/p/spyderlib/wiki/Roadmap). 
Spyder (previously known as Pydee) is a free open-source Python development environment providing MATLAB-like features in a simple and light-weighted software, available for Windows XP/Vista/7, GNU/Linux and MacOS X: http://spyderlib.googlecode.com/. Spyder is also a library (spyderlib) providing *pure-Python* (PyQt/PySide) editors widgets: * source code editor: * efficient syntax highlighting (Python, C/C++, Fortran, html/css, gettext, ...) * code completion and calltips (powered by `rope`) * real-time code analysis (powered by `pyflakes`) * etc. (occurrence highlighting, ...) * NumPy array editor * Dictionnary editor * and many more widgets ("find in files", "text import wizard", ...) For those interested by these powerful widgets, note that: * spyderlib v2.0 and v2.1 are compatible with PyQt >= v4.4 (API #1) * spyderlib v2.2 is compatible with both PyQt >= v4.6 (API #2) and PySide (already available through the main source repository) (Spyder IDE itself -even v2.2- is not fully compatible with PySide -- only the "light" version is currently working) Cheers, Pierre From bsouthey at gmail.com Sun Jun 12 09:36:10 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Sun, 12 Jun 2011 08:36:10 -0500 Subject: [SciPy-User] scipy.stats one-sided two-sided less, greater, signed ? In-Reply-To: References: <4DED1DC5.8090503@gmail.com> Message-ID: On Sun, Jun 12, 2011 at 5:20 AM, Ralf Gommers wrote: > > > On Wed, Jun 8, 2011 at 12:56 PM, wrote: >> >> On Tue, Jun 7, 2011 at 10:37 PM, Bruce Southey wrote: >> > On Tue, Jun 7, 2011 at 4:40 PM, Ralf Gommers >> > wrote: >> >> >> >> >> >> On Mon, Jun 6, 2011 at 9:34 PM, wrote: >> >>> >> >>> On Mon, Jun 6, 2011 at 2:34 PM, Bruce Southey >> >>> wrote: >> >>> > On 06/05/2011 02:43 PM, josef.pktd at gmail.com wrote: >> >>> >> What should be the policy on one-sided versus two-sided? >> >>> > Yes :-) >> >>> > >> >>> >> The main reason right now for looking at this is >> >>> >> http://projects.scipy.org/scipy/ticket/1394 which specifies a >> >>> >> "one-sided" alternative and provides both lower and upper tail. >> >>> > That refers to the Fisher's test rather than the more 'traditional' >> >>> > one-sided tests. Each value of the Fisher's test has special >> >>> > meanings >> >>> > about the value or probability of the 'first cell' under the null >> >>> > hypothesis. ?So it is necessary to provide those three values. >> >>> > >> >>> >> I would prefer that we follow the alternative patterns similar to R >> >>> >> >> >>> >> currently only kstest has ? ?alternative : 'two_sided' (default), >> >>> >> 'less' or 'greater' >> >>> >> but this should be added to other tests where it makes sense >> >>> > I think that these Kolmogorov-Smirnov ?tests are not the traditional >> >>> > meaning either. It is a little mind-boggling to try to think about >> >>> > cdfs! >> >>> > >> >>> >> R fisher.exact >> >>> >> """alternative ? ? ? ?indicates the alternative hypothesis and must >> >>> >> be >> >>> >> one >> >>> >> of "two.sided", "greater" or "less". You can specify just the >> >>> >> initial >> >>> >> letter. Only used in the 2 by 2 case.""" >> >>> >> >> >>> >> mannwhitneyu reports a one-sided test without actually specifying >> >>> >> which alternative is used ?(I thought I remembered other cases like >> >>> >> this but don't find any right now) >> >>> >> >> >>> >> related: >> >>> >> in many cases in the two-sided tests the test statistic has a sign >> >>> >> that indicates in which tail the test-statistic falls. 
>> >>> >> This is useful in ttests for example, because the one-sided tests >> >>> >> can >> >>> >> be backed out from the two-sided tests. (With symmetric >> >>> >> distributions >> >>> >> one-sided p-value is just half of the two-sided pvalue) >> >>> >> >> >>> >> In the discussion of https://github.com/scipy/scipy/pull/8 ?I >> >>> >> argued >> >>> >> that this might mislead users to interpret a two-sided result as a >> >>> >> one-sided result. However, I doubt now that this is a strong >> >>> >> argument >> >>> >> against not reporting the signed test statistic. >> >>> > (I do not follow pull requests so is there a relevant ticket?) >> >>> > >> >>> >> After going through scipy.stats.stats, it looks like we always >> >>> >> report >> >>> >> the signed test statistic. >> >>> >> >> >>> >> The test statistic in ks_2samp is in all cases defined as a max >> >>> >> value >> >>> >> and doesn't have a sign in R either, so adding a sign there would >> >>> >> break with the standard definition. >> >>> >> one-sided option for ks_2samp would just require to find the >> >>> >> distribution of the test statistics D+, D- >> >>> >> >> >>> >> --- >> >>> >> >> >>> >> So my proposal for the general pattern (with exceptions for special >> >>> >> reasons) would be >> >>> >> >> >>> >> * add/offer alternative : 'two_sided' (default), 'less' or >> >>> >> 'greater' >> >>> >> http://projects.scipy.org/scipy/ticket/1394 ?for now, >> >>> >> and adjustments of existing tests in the future (adding the option >> >>> >> can >> >>> >> be mostly done in a backwards compatible way and for symmetric >> >>> >> distributions like ttest it's just a convenience) >> >>> >> mannwhitneyu seems to be the only "weird" one >> >> >> >> This would actually make the fisher_exact implementation more >> >> consistent, >> >> since only one p-value is returned in all cases. I just don't like the >> >> R >> >> naming much; alternative="greater" does not convey to me that this is a >> >> one-sided test using the upper tail. How about: >> >> ??? test : {"two-tailed", "lower-tail", "upper-tail"} >> >> with two-tailed the default? >> >> I think matlab uses (in general) larger and smaller, the advantage of >> less/smaller and greater/larger is that it directly refers to the >> alternative hypothesis, while the meaning in terms of tails is not >> always clear (in kstest and I guess some others the test statistics is >> just reversed and uses the same tail in both cases) >> >> so greater smaller is mostly "future proof" across tests, while >> reference to the tail can only be used where this is an unambiguous >> statement. but see below >> > I think I understand your terminology a bit better now, and consistency > across all tests is important. So I've updated the Fisher's exact patch to > use alternative={'two-sided', 'less', greater'} and sent a pull request: > https://github.com/scipy/scipy/pull/32 > > Cheers, > Ralf > >> >> >> >> >> >> Ralf >> >> >> >> >> >>> >> >>> >> >> >>> >> * report signed test statistic for two-sided alternative (when a >> >>> >> signed test statistic exists): ?which is the status quo in >> >>> >> stats.stats, but I didn't know that this is actually pretty >> >>> >> consistent >> >>> >> across tests. >> >>> >> >> >>> >> Opinions ? 
>> >>> >> >> >>> >> Josef >> >>> >> _______________________________________________ >> >>> >> SciPy-User mailing list >> >>> >> SciPy-User at scipy.org >> >>> >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >>> > I think that there is some valid misunderstanding here (as I was in >> >>> > the >> >>> > same situation) regarding what is meant here. My understanding is >> >>> > that >> >>> > under a one-sided hypothesis, all the values of the null hypothesis >> >>> > only >> >>> > exist in one tail of the test distribution. In contrast the values >> >>> > of >> >>> > null distribution exist in both tails with a two-sided hypothesis. >> >>> > Yet >> >>> > that interpretation does not have the same meaning as the tails in >> >>> > the >> >>> > Fisher or Kolmogorov-Smirnov tests. >> >>> >> >>> The tests have a clear Null Hypothesis (equality) and Alternative >> >>> Hypothesis (not equal or directional, less or greater). >> >>> So the "alternative" should be clearly specified in the function >> >>> argument, as in R. >> >>> >> >>> Whether this corresponds to left and right tails of the distribution >> >>> is an "implementation detail" which holds for ttests but not for >> >>> kstest/ks_2samp. >> >>> >> >>> kstest/ks2sample ? H0: cdf1 == cdf2 ?and H1: ?cdf1 != cdf2 or H1: >> >>> cdf1 < cdf2 or H1: ?cdf1 > cdf2 >> >>> (looks similar to comparing two survival curves in Kaplan-Meier ?) >> >>> >> >>> fisher_exact (2 by 2) ?H0: odds-ratio == 1 and H1: odds-ratio != 1 or >> >>> H1: odds-ratio < 1 or H1: odds-ratio > 1 >> >>> >> >>> I know the kolmogorov-smirnov tests, but for fisher exact and >> >>> contingency tables I rely on R >> >>> >> >>> from R-help: >> >>> For 2 by 2 tables, the null of conditional independence is equivalent >> >>> to the hypothesis that the odds ratio equals one. <...> The >> >>> alternative for a one-sided test is based on the odds ratio, so >> >>> alternative = "greater" is a test of the odds ratio being bigger than >> >>> or. >> >>> Two-sided tests are based on the probabilities of the tables, and take >> >>> as ?more extreme? all tables with probabilities less than or equal to >> >>> that of the observed table, the p-value being the sum of such >> >>> probabilities. >> >>> >> >>> Josef >> >>> >> >>> >> >>> > >> >>> > I never paid much attention to the frequency based tests but it does >> >>> > not >> >>> > surprise if there are no one-sided tests. Most are rank-based so it >> >>> > is >> >>> > rather hard to do in a simply manner - actually I am not even sure >> >>> > how >> >>> > to use a permutation test. >> >>> > >> >>> > Bruce >> >>> > >> >>> > >> >>> > >> >>> > _______________________________________________ >> >>> > SciPy-User mailing list >> >>> > SciPy-User at scipy.org >> >>> > http://mail.scipy.org/mailman/listinfo/scipy-user >> >>> > >> >>> _______________________________________________ >> >>> SciPy-User mailing list >> >>> SciPy-User at scipy.org >> >>> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >> >> >> >> _______________________________________________ >> >> SciPy-User mailing list >> >> SciPy-User at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >> >> >> > >> > But that is NOT the correct interpretation ?here! >> > I tried to explain to you that this is the not the usual idea >> > one-sided vs two-sided tests. 
>> > For example: >> > http://www.msu.edu/~fuw/teaching/Fu_ch10_2_categorical.ppt >> > "The test holds the marginal totals fixed and computes the >> > hypergeometric probability that n11 is at least as large as the >> > observed value" >> >> this still sounds like a less/greater test to me >> >> >> > "The output consists of three p-values: >> > Left: Use this when the alternative to independence is that there is >> > negative association between the variables. ?That is, the observations >> > tend to lie in lower left and upper right. >> > Right: Use this when the alternative to independence is that there is >> > positive association between the variables. That is, the observations >> > tend to lie in upper left and lower right. >> > 2-Tail: Use this when there is no prior alternative. >> > " >> > There is also the book "Categorical data analysis: using the SAS >> > system ?By Maura E. Stokes, Charles S. Davis, Gary G. Koch" that came >> > up via Google that also refers to the n11 cell. >> > >> > http://www.langsrud.com/fisher.htm >> >> I was trying to read the Agresti paper referenced there but it has too >> much detail to get through in 15 minutes :) >> >> > "The output consists of three p-values: >> > >> > ? ?Left: Use this when the alternative to independence is that there >> > is negative association between the variables. >> > ? ?That is, the observations tend to lie in lower left and upper right. >> > ? ?Right: Use this when the alternative to independence is that there >> > is positive association between the variables. >> > ? ?That is, the observations tend to lie in upper left and lower right. >> > ? ?2-Tail: Use this when there is no prior alternative. >> > >> > NOTE: Decide to use Left, Right or 2-Tail before collecting (or >> > looking at) the data." >> > >> > But you will get a different p-value if you switch rows and columns >> > because of the dependence on the n11 cell. If you do that then the >> > p-values switch between left and right sides as these now refer to >> > different hypotheses regarding that first cell. >> >> switching row and columns doesn't change the p-value in R >> reversing columns changes the definition of less and greater, reverses >> them >> >> The problem with 2 by 2 contingency tables with given marginals, i.e. >> row and column totals, is that we only have one free entry. Any test >> on one entry, e.g. element 0,0, pins down all the other ones and >> (many) tests then become equivalent. >> >> >> http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_freq_a0000000658.htm >> some math got lost >> """ >> For <2 by 2> tables, one-sided -values for Fisher?s exact test are >> defined in terms of the frequency of the cell in the first row and >> first column of the table, the (1,1) cell. Denoting the observed (1,1) >> cell frequency by , the left-sided -value for Fisher?s exact test is >> the probability that the (1,1) cell frequency is less than or equal to >> . For the left-sided -value, the set includes those tables with a >> (1,1) cell frequency less than or equal to . A small left-sided -value >> supports the alternative hypothesis that the probability of an >> observation being in the first cell is actually less than expected >> under the null hypothesis of independent row and column variables. >> >> Similarly, for a right-sided alternative hypothesis, is the set of >> tables where the frequency of the (1,1) cell is greater than or equal >> to that in the observed table. 
A small right-sided -value supports the >> alternative that the probability of the first cell is actually greater >> than that expected under the null hypothesis. >> >> Because the (1,1) cell frequency completely determines the table when >> the marginal row and column sums are fixed, these one-sided >> alternatives can be stated equivalently in terms of other cell >> probabilities or ratios of cell probabilities. The left-sided >> alternative is equivalent to an odds ratio less than 1, where the odds >> ratio equals (). Additionally, the left-sided alternative is >> equivalent to the column 1 risk for row 1 being less than the column 1 >> risk for row 2, . Similarly, the right-sided alternative is equivalent >> to the column 1 risk for row 1 being greater than the column 1 risk >> for row 2, . See Agresti (2007) for details. >> R C Tables >> """ >> >> I'm not a user of Fisher's exact test (and I have a hard time keeping >> the different statements straight), so if left/right or lower/upper >> makes more sense to users, then I don't complain. >> >> To me they are all just independence tests with possible one-sided >> alternatives that one distribution dominates the other. (with the same >> pattern as ks_2samp or ttest_2samp) >> >> Josef >> >> > >> > >> > Bruce >> > _______________________________________________ >> > SciPy-User mailing list >> > SciPy-User at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > This is just wrong and plain ignorant! Please read the references and stats books about what the tails actually mean! You really need all three tests because these have different meanings that you do not know in advance which you need. Bruce From josef.pktd at gmail.com Sun Jun 12 09:56:51 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 12 Jun 2011 09:56:51 -0400 Subject: [SciPy-User] scipy.stats one-sided two-sided less, greater, signed ? In-Reply-To: References: <4DED1DC5.8090503@gmail.com> Message-ID: On Sun, Jun 12, 2011 at 9:36 AM, Bruce Southey wrote: > On Sun, Jun 12, 2011 at 5:20 AM, Ralf Gommers > wrote: >> >> >> On Wed, Jun 8, 2011 at 12:56 PM, wrote: >>> >>> On Tue, Jun 7, 2011 at 10:37 PM, Bruce Southey wrote: >>> > On Tue, Jun 7, 2011 at 4:40 PM, Ralf Gommers >>> > wrote: >>> >> >>> >> >>> >> On Mon, Jun 6, 2011 at 9:34 PM, wrote: >>> >>> >>> >>> On Mon, Jun 6, 2011 at 2:34 PM, Bruce Southey >>> >>> wrote: >>> >>> > On 06/05/2011 02:43 PM, josef.pktd at gmail.com wrote: >>> >>> >> What should be the policy on one-sided versus two-sided? >>> >>> > Yes :-) >>> >>> > >>> >>> >> The main reason right now for looking at this is >>> >>> >> http://projects.scipy.org/scipy/ticket/1394 which specifies a >>> >>> >> "one-sided" alternative and provides both lower and upper tail. >>> >>> > That refers to the Fisher's test rather than the more 'traditional' >>> >>> > one-sided tests. Each value of the Fisher's test has special >>> >>> > meanings >>> >>> > about the value or probability of the 'first cell' under the null >>> >>> > hypothesis. ?So it is necessary to provide those three values. >>> >>> > >>> >>> >> I would prefer that we follow the alternative patterns similar to R >>> >>> >> >>> >>> >> currently only kstest has ? 
?alternative : 'two_sided' (default), >>> >>> >> 'less' or 'greater' >>> >>> >> but this should be added to other tests where it makes sense >>> >>> > I think that these Kolmogorov-Smirnov ?tests are not the traditional >>> >>> > meaning either. It is a little mind-boggling to try to think about >>> >>> > cdfs! >>> >>> > >>> >>> >> R fisher.exact >>> >>> >> """alternative ? ? ? ?indicates the alternative hypothesis and must >>> >>> >> be >>> >>> >> one >>> >>> >> of "two.sided", "greater" or "less". You can specify just the >>> >>> >> initial >>> >>> >> letter. Only used in the 2 by 2 case.""" >>> >>> >> >>> >>> >> mannwhitneyu reports a one-sided test without actually specifying >>> >>> >> which alternative is used ?(I thought I remembered other cases like >>> >>> >> this but don't find any right now) >>> >>> >> >>> >>> >> related: >>> >>> >> in many cases in the two-sided tests the test statistic has a sign >>> >>> >> that indicates in which tail the test-statistic falls. >>> >>> >> This is useful in ttests for example, because the one-sided tests >>> >>> >> can >>> >>> >> be backed out from the two-sided tests. (With symmetric >>> >>> >> distributions >>> >>> >> one-sided p-value is just half of the two-sided pvalue) >>> >>> >> >>> >>> >> In the discussion of https://github.com/scipy/scipy/pull/8 ?I >>> >>> >> argued >>> >>> >> that this might mislead users to interpret a two-sided result as a >>> >>> >> one-sided result. However, I doubt now that this is a strong >>> >>> >> argument >>> >>> >> against not reporting the signed test statistic. >>> >>> > (I do not follow pull requests so is there a relevant ticket?) >>> >>> > >>> >>> >> After going through scipy.stats.stats, it looks like we always >>> >>> >> report >>> >>> >> the signed test statistic. >>> >>> >> >>> >>> >> The test statistic in ks_2samp is in all cases defined as a max >>> >>> >> value >>> >>> >> and doesn't have a sign in R either, so adding a sign there would >>> >>> >> break with the standard definition. >>> >>> >> one-sided option for ks_2samp would just require to find the >>> >>> >> distribution of the test statistics D+, D- >>> >>> >> >>> >>> >> --- >>> >>> >> >>> >>> >> So my proposal for the general pattern (with exceptions for special >>> >>> >> reasons) would be >>> >>> >> >>> >>> >> * add/offer alternative : 'two_sided' (default), 'less' or >>> >>> >> 'greater' >>> >>> >> http://projects.scipy.org/scipy/ticket/1394 ?for now, >>> >>> >> and adjustments of existing tests in the future (adding the option >>> >>> >> can >>> >>> >> be mostly done in a backwards compatible way and for symmetric >>> >>> >> distributions like ttest it's just a convenience) >>> >>> >> mannwhitneyu seems to be the only "weird" one >>> >> >>> >> This would actually make the fisher_exact implementation more >>> >> consistent, >>> >> since only one p-value is returned in all cases. I just don't like the >>> >> R >>> >> naming much; alternative="greater" does not convey to me that this is a >>> >> one-sided test using the upper tail. How about: >>> >> ??? test : {"two-tailed", "lower-tail", "upper-tail"} >>> >> with two-tailed the default? 
>>> >>> I think matlab uses (in general) larger and smaller, the advantage of >>> less/smaller and greater/larger is that it directly refers to the >>> alternative hypothesis, while the meaning in terms of tails is not >>> always clear (in kstest and I guess some others the test statistics is >>> just reversed and uses the same tail in both cases) >>> >>> so greater smaller is mostly "future proof" across tests, while >>> reference to the tail can only be used where this is an unambiguous >>> statement. but see below >>> >> I think I understand your terminology a bit better now, and consistency >> across all tests is important. So I've updated the Fisher's exact patch to >> use alternative={'two-sided', 'less', greater'} and sent a pull request: >> https://github.com/scipy/scipy/pull/32 >> >> Cheers, >> Ralf >> >>> >>> >>> >> >>> >> Ralf >>> >> >>> >> >>> >>> >>> >>> >> >>> >>> >> * report signed test statistic for two-sided alternative (when a >>> >>> >> signed test statistic exists): ?which is the status quo in >>> >>> >> stats.stats, but I didn't know that this is actually pretty >>> >>> >> consistent >>> >>> >> across tests. >>> >>> >> >>> >>> >> Opinions ? >>> >>> >> >>> >>> >> Josef >>> >>> >> _______________________________________________ >>> >>> >> SciPy-User mailing list >>> >>> >> SciPy-User at scipy.org >>> >>> >> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> > I think that there is some valid misunderstanding here (as I was in >>> >>> > the >>> >>> > same situation) regarding what is meant here. My understanding is >>> >>> > that >>> >>> > under a one-sided hypothesis, all the values of the null hypothesis >>> >>> > only >>> >>> > exist in one tail of the test distribution. In contrast the values >>> >>> > of >>> >>> > null distribution exist in both tails with a two-sided hypothesis. >>> >>> > Yet >>> >>> > that interpretation does not have the same meaning as the tails in >>> >>> > the >>> >>> > Fisher or Kolmogorov-Smirnov tests. >>> >>> >>> >>> The tests have a clear Null Hypothesis (equality) and Alternative >>> >>> Hypothesis (not equal or directional, less or greater). >>> >>> So the "alternative" should be clearly specified in the function >>> >>> argument, as in R. >>> >>> >>> >>> Whether this corresponds to left and right tails of the distribution >>> >>> is an "implementation detail" which holds for ttests but not for >>> >>> kstest/ks_2samp. >>> >>> >>> >>> kstest/ks2sample ? H0: cdf1 == cdf2 ?and H1: ?cdf1 != cdf2 or H1: >>> >>> cdf1 < cdf2 or H1: ?cdf1 > cdf2 >>> >>> (looks similar to comparing two survival curves in Kaplan-Meier ?) >>> >>> >>> >>> fisher_exact (2 by 2) ?H0: odds-ratio == 1 and H1: odds-ratio != 1 or >>> >>> H1: odds-ratio < 1 or H1: odds-ratio > 1 >>> >>> >>> >>> I know the kolmogorov-smirnov tests, but for fisher exact and >>> >>> contingency tables I rely on R >>> >>> >>> >>> from R-help: >>> >>> For 2 by 2 tables, the null of conditional independence is equivalent >>> >>> to the hypothesis that the odds ratio equals one. <...> The >>> >>> alternative for a one-sided test is based on the odds ratio, so >>> >>> alternative = "greater" is a test of the odds ratio being bigger than >>> >>> or. >>> >>> Two-sided tests are based on the probabilities of the tables, and take >>> >>> as ?more extreme? all tables with probabilities less than or equal to >>> >>> that of the observed table, the p-value being the sum of such >>> >>> probabilities. 
>>> >>> >>> >>> Josef >>> >>> >>> >>> >>> >>> > >>> >>> > I never paid much attention to the frequency based tests but it does >>> >>> > not >>> >>> > surprise if there are no one-sided tests. Most are rank-based so it >>> >>> > is >>> >>> > rather hard to do in a simply manner - actually I am not even sure >>> >>> > how >>> >>> > to use a permutation test. >>> >>> > >>> >>> > Bruce >>> >>> > >>> >>> > >>> >>> > >>> >>> > _______________________________________________ >>> >>> > SciPy-User mailing list >>> >>> > SciPy-User at scipy.org >>> >>> > http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> > >>> >>> _______________________________________________ >>> >>> SciPy-User mailing list >>> >>> SciPy-User at scipy.org >>> >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> >>> >> >>> >> _______________________________________________ >>> >> SciPy-User mailing list >>> >> SciPy-User at scipy.org >>> >> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> >>> >> >>> > >>> > But that is NOT the correct interpretation ?here! >>> > I tried to explain to you that this is the not the usual idea >>> > one-sided vs two-sided tests. >>> > For example: >>> > http://www.msu.edu/~fuw/teaching/Fu_ch10_2_categorical.ppt >>> > "The test holds the marginal totals fixed and computes the >>> > hypergeometric probability that n11 is at least as large as the >>> > observed value" >>> >>> this still sounds like a less/greater test to me >>> >>> >>> > "The output consists of three p-values: >>> > Left: Use this when the alternative to independence is that there is >>> > negative association between the variables. ?That is, the observations >>> > tend to lie in lower left and upper right. >>> > Right: Use this when the alternative to independence is that there is >>> > positive association between the variables. That is, the observations >>> > tend to lie in upper left and lower right. >>> > 2-Tail: Use this when there is no prior alternative. >>> > " >>> > There is also the book "Categorical data analysis: using the SAS >>> > system ?By Maura E. Stokes, Charles S. Davis, Gary G. Koch" that came >>> > up via Google that also refers to the n11 cell. >>> > >>> > http://www.langsrud.com/fisher.htm >>> >>> I was trying to read the Agresti paper referenced there but it has too >>> much detail to get through in 15 minutes :) >>> >>> > "The output consists of three p-values: >>> > >>> > ? ?Left: Use this when the alternative to independence is that there >>> > is negative association between the variables. >>> > ? ?That is, the observations tend to lie in lower left and upper right. >>> > ? ?Right: Use this when the alternative to independence is that there >>> > is positive association between the variables. >>> > ? ?That is, the observations tend to lie in upper left and lower right. >>> > ? ?2-Tail: Use this when there is no prior alternative. >>> > >>> > NOTE: Decide to use Left, Right or 2-Tail before collecting (or >>> > looking at) the data." >>> > >>> > But you will get a different p-value if you switch rows and columns >>> > because of the dependence on the n11 cell. If you do that then the >>> > p-values switch between left and right sides as these now refer to >>> > different hypotheses regarding that first cell. >>> >>> switching row and columns doesn't change the p-value in R >>> reversing columns changes the definition of less and greater, reverses >>> them >>> >>> The problem with 2 by 2 contingency tables with given marginals, i.e. 
>>> row and column totals, is that we only have one free entry. Any test >>> on one entry, e.g. element 0,0, pins down all the other ones and >>> (many) tests then become equivalent. >>> >>> >>> http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_freq_a0000000658.htm >>> some math got lost >>> """ >>> For <2 by 2> tables, one-sided -values for Fisher?s exact test are >>> defined in terms of the frequency of the cell in the first row and >>> first column of the table, the (1,1) cell. Denoting the observed (1,1) >>> cell frequency by , the left-sided -value for Fisher?s exact test is >>> the probability that the (1,1) cell frequency is less than or equal to >>> . For the left-sided -value, the set includes those tables with a >>> (1,1) cell frequency less than or equal to . A small left-sided -value >>> supports the alternative hypothesis that the probability of an >>> observation being in the first cell is actually less than expected >>> under the null hypothesis of independent row and column variables. >>> >>> Similarly, for a right-sided alternative hypothesis, is the set of >>> tables where the frequency of the (1,1) cell is greater than or equal >>> to that in the observed table. A small right-sided -value supports the >>> alternative that the probability of the first cell is actually greater >>> than that expected under the null hypothesis. >>> >>> Because the (1,1) cell frequency completely determines the table when >>> the marginal row and column sums are fixed, these one-sided >>> alternatives can be stated equivalently in terms of other cell >>> probabilities or ratios of cell probabilities. The left-sided >>> alternative is equivalent to an odds ratio less than 1, where the odds >>> ratio equals (). Additionally, the left-sided alternative is >>> equivalent to the column 1 risk for row 1 being less than the column 1 >>> risk for row 2, . Similarly, the right-sided alternative is equivalent >>> to the column 1 risk for row 1 being greater than the column 1 risk >>> for row 2, . See Agresti (2007) for details. >>> R C Tables >>> """ >>> >>> I'm not a user of Fisher's exact test (and I have a hard time keeping >>> the different statements straight), so if left/right or lower/upper >>> makes more sense to users, then I don't complain. >>> >>> To me they are all just independence tests with possible one-sided >>> alternatives that one distribution dominates the other. (with the same >>> pattern as ks_2samp or ttest_2samp) >>> >>> Josef >>> >>> > >>> > >>> > Bruce >>> > _______________________________________________ >>> > SciPy-User mailing list >>> > SciPy-User at scipy.org >>> > http://mail.scipy.org/mailman/listinfo/scipy-user >>> > >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > This is just wrong and plain ignorant! Please read the references and > stats books about what the tails actually mean! > > You really need all three tests because these have different meanings > that you do not know in advance which you need. Sorry, but I'm perfectly happy to follow R and SAS in this. 
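To make the (1,1)-cell point from the SAS documentation quoted above concrete: with both margins held fixed, the first cell follows a hypergeometric distribution, so the one-sided p-values are just its tail probabilities. A small sketch with a made-up table:

    from scipy import stats

    a, b, c, d = 8, 2, 1, 5               # made-up table [[a, b], [c, d]]
    M = a + b + c + d                     # grand total
    n = a + b                             # row 1 total
    N = a + c                             # column 1 total

    # the (1,1) cell is hypergeometric when the margins are fixed
    p_left = stats.hypergeom.cdf(a, M, n, N)       # alternative: odds ratio < 1
    p_right = stats.hypergeom.sf(a - 1, M, n, N)   # alternative: odds ratio > 1

These should agree with the one-sided fisher_exact p-values discussed above.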
Josef > > Bruce > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From garyr at fidalgo.net Sun Jun 12 13:14:12 2011 From: garyr at fidalgo.net (garyr) Date: Sun, 12 Jun 2011 10:14:12 -0700 Subject: [SciPy-User] iirdesign arguments Message-ID: <36878D4FD6D94CFA9CC7BF4968D08CF4@owner59bf8d40c> The function iirdesign in signal.filter_design.py has arguments wp, ws: wp, ws -- Passband and stopband edge frequencies, normalized from 0 to 1 (1 corresponds to pi radians / sample). Is pi radians / sample equivalent to one-half the sampling frequency? From warren.weckesser at enthought.com Sun Jun 12 14:10:07 2011 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Sun, 12 Jun 2011 13:10:07 -0500 Subject: [SciPy-User] iirdesign arguments In-Reply-To: <36878D4FD6D94CFA9CC7BF4968D08CF4@owner59bf8d40c> References: <36878D4FD6D94CFA9CC7BF4968D08CF4@owner59bf8d40c> Message-ID: On Sun, Jun 12, 2011 at 12:14 PM, garyr wrote: > The function iirdesign in signal.filter_design.py has arguments wp, ws: > > wp, ws -- Passband and stopband edge frequencies, normalized from 0 > to 1 (1 corresponds to pi radians / sample). > > Is pi radians / sample equivalent to one-half the sampling frequency? > Yes. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsouthey at gmail.com Sun Jun 12 20:30:03 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Sun, 12 Jun 2011 19:30:03 -0500 Subject: [SciPy-User] scipy.stats one-sided two-sided less, greater, signed ? In-Reply-To: References: <4DED1DC5.8090503@gmail.com> Message-ID: On Sun, Jun 12, 2011 at 8:56 AM, wrote: > On Sun, Jun 12, 2011 at 9:36 AM, Bruce Southey wrote: >> On Sun, Jun 12, 2011 at 5:20 AM, Ralf Gommers >> wrote: >>> >>> >>> On Wed, Jun 8, 2011 at 12:56 PM, wrote: >>>> >>>> On Tue, Jun 7, 2011 at 10:37 PM, Bruce Southey wrote: >>>> > On Tue, Jun 7, 2011 at 4:40 PM, Ralf Gommers >>>> > wrote: >>>> >> >>>> >> >>>> >> On Mon, Jun 6, 2011 at 9:34 PM, wrote: >>>> >>> >>>> >>> On Mon, Jun 6, 2011 at 2:34 PM, Bruce Southey >>>> >>> wrote: >>>> >>> > On 06/05/2011 02:43 PM, josef.pktd at gmail.com wrote: >>>> >>> >> What should be the policy on one-sided versus two-sided? >>>> >>> > Yes :-) >>>> >>> > >>>> >>> >> The main reason right now for looking at this is >>>> >>> >> http://projects.scipy.org/scipy/ticket/1394 which specifies a >>>> >>> >> "one-sided" alternative and provides both lower and upper tail. >>>> >>> > That refers to the Fisher's test rather than the more 'traditional' >>>> >>> > one-sided tests. Each value of the Fisher's test has special >>>> >>> > meanings >>>> >>> > about the value or probability of the 'first cell' under the null >>>> >>> > hypothesis. ?So it is necessary to provide those three values. >>>> >>> > >>>> >>> >> I would prefer that we follow the alternative patterns similar to R >>>> >>> >> >>>> >>> >> currently only kstest has ? ?alternative : 'two_sided' (default), >>>> >>> >> 'less' or 'greater' >>>> >>> >> but this should be added to other tests where it makes sense >>>> >>> > I think that these Kolmogorov-Smirnov ?tests are not the traditional >>>> >>> > meaning either. It is a little mind-boggling to try to think about >>>> >>> > cdfs! >>>> >>> > >>>> >>> >> R fisher.exact >>>> >>> >> """alternative ? ? ? 
?indicates the alternative hypothesis and must >>>> >>> >> be >>>> >>> >> one >>>> >>> >> of "two.sided", "greater" or "less". You can specify just the >>>> >>> >> initial >>>> >>> >> letter. Only used in the 2 by 2 case.""" >>>> >>> >> >>>> >>> >> mannwhitneyu reports a one-sided test without actually specifying >>>> >>> >> which alternative is used ?(I thought I remembered other cases like >>>> >>> >> this but don't find any right now) >>>> >>> >> >>>> >>> >> related: >>>> >>> >> in many cases in the two-sided tests the test statistic has a sign >>>> >>> >> that indicates in which tail the test-statistic falls. >>>> >>> >> This is useful in ttests for example, because the one-sided tests >>>> >>> >> can >>>> >>> >> be backed out from the two-sided tests. (With symmetric >>>> >>> >> distributions >>>> >>> >> one-sided p-value is just half of the two-sided pvalue) >>>> >>> >> >>>> >>> >> In the discussion of https://github.com/scipy/scipy/pull/8 ?I >>>> >>> >> argued >>>> >>> >> that this might mislead users to interpret a two-sided result as a >>>> >>> >> one-sided result. However, I doubt now that this is a strong >>>> >>> >> argument >>>> >>> >> against not reporting the signed test statistic. >>>> >>> > (I do not follow pull requests so is there a relevant ticket?) >>>> >>> > >>>> >>> >> After going through scipy.stats.stats, it looks like we always >>>> >>> >> report >>>> >>> >> the signed test statistic. >>>> >>> >> >>>> >>> >> The test statistic in ks_2samp is in all cases defined as a max >>>> >>> >> value >>>> >>> >> and doesn't have a sign in R either, so adding a sign there would >>>> >>> >> break with the standard definition. >>>> >>> >> one-sided option for ks_2samp would just require to find the >>>> >>> >> distribution of the test statistics D+, D- >>>> >>> >> >>>> >>> >> --- >>>> >>> >> >>>> >>> >> So my proposal for the general pattern (with exceptions for special >>>> >>> >> reasons) would be >>>> >>> >> >>>> >>> >> * add/offer alternative : 'two_sided' (default), 'less' or >>>> >>> >> 'greater' >>>> >>> >> http://projects.scipy.org/scipy/ticket/1394 ?for now, >>>> >>> >> and adjustments of existing tests in the future (adding the option >>>> >>> >> can >>>> >>> >> be mostly done in a backwards compatible way and for symmetric >>>> >>> >> distributions like ttest it's just a convenience) >>>> >>> >> mannwhitneyu seems to be the only "weird" one >>>> >> >>>> >> This would actually make the fisher_exact implementation more >>>> >> consistent, >>>> >> since only one p-value is returned in all cases. I just don't like the >>>> >> R >>>> >> naming much; alternative="greater" does not convey to me that this is a >>>> >> one-sided test using the upper tail. How about: >>>> >> ??? test : {"two-tailed", "lower-tail", "upper-tail"} >>>> >> with two-tailed the default? >>>> >>>> I think matlab uses (in general) larger and smaller, the advantage of >>>> less/smaller and greater/larger is that it directly refers to the >>>> alternative hypothesis, while the meaning in terms of tails is not >>>> always clear (in kstest and I guess some others the test statistics is >>>> just reversed and uses the same tail in both cases) >>>> >>>> so greater smaller is mostly "future proof" across tests, while >>>> reference to the tail can only be used where this is an unambiguous >>>> statement. but see below >>>> >>> I think I understand your terminology a bit better now, and consistency >>> across all tests is important. 
So I've updated the Fisher's exact patch to >>> use alternative={'two-sided', 'less', greater'} and sent a pull request: >>> https://github.com/scipy/scipy/pull/32 >>> >>> Cheers, >>> Ralf >>> >>>> >>>> >>>> >> >>>> >> Ralf >>>> >> >>>> >> >>>> >>> >>>> >>> >> >>>> >>> >> * report signed test statistic for two-sided alternative (when a >>>> >>> >> signed test statistic exists): ?which is the status quo in >>>> >>> >> stats.stats, but I didn't know that this is actually pretty >>>> >>> >> consistent >>>> >>> >> across tests. >>>> >>> >> >>>> >>> >> Opinions ? >>>> >>> >> >>>> >>> >> Josef >>>> >>> >> _______________________________________________ >>>> >>> >> SciPy-User mailing list >>>> >>> >> SciPy-User at scipy.org >>>> >>> >> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>> > I think that there is some valid misunderstanding here (as I was in >>>> >>> > the >>>> >>> > same situation) regarding what is meant here. My understanding is >>>> >>> > that >>>> >>> > under a one-sided hypothesis, all the values of the null hypothesis >>>> >>> > only >>>> >>> > exist in one tail of the test distribution. In contrast the values >>>> >>> > of >>>> >>> > null distribution exist in both tails with a two-sided hypothesis. >>>> >>> > Yet >>>> >>> > that interpretation does not have the same meaning as the tails in >>>> >>> > the >>>> >>> > Fisher or Kolmogorov-Smirnov tests. >>>> >>> >>>> >>> The tests have a clear Null Hypothesis (equality) and Alternative >>>> >>> Hypothesis (not equal or directional, less or greater). >>>> >>> So the "alternative" should be clearly specified in the function >>>> >>> argument, as in R. >>>> >>> >>>> >>> Whether this corresponds to left and right tails of the distribution >>>> >>> is an "implementation detail" which holds for ttests but not for >>>> >>> kstest/ks_2samp. >>>> >>> >>>> >>> kstest/ks2sample ? H0: cdf1 == cdf2 ?and H1: ?cdf1 != cdf2 or H1: >>>> >>> cdf1 < cdf2 or H1: ?cdf1 > cdf2 >>>> >>> (looks similar to comparing two survival curves in Kaplan-Meier ?) >>>> >>> >>>> >>> fisher_exact (2 by 2) ?H0: odds-ratio == 1 and H1: odds-ratio != 1 or >>>> >>> H1: odds-ratio < 1 or H1: odds-ratio > 1 >>>> >>> >>>> >>> I know the kolmogorov-smirnov tests, but for fisher exact and >>>> >>> contingency tables I rely on R >>>> >>> >>>> >>> from R-help: >>>> >>> For 2 by 2 tables, the null of conditional independence is equivalent >>>> >>> to the hypothesis that the odds ratio equals one. <...> The >>>> >>> alternative for a one-sided test is based on the odds ratio, so >>>> >>> alternative = "greater" is a test of the odds ratio being bigger than >>>> >>> or. >>>> >>> Two-sided tests are based on the probabilities of the tables, and take >>>> >>> as ?more extreme? all tables with probabilities less than or equal to >>>> >>> that of the observed table, the p-value being the sum of such >>>> >>> probabilities. >>>> >>> >>>> >>> Josef >>>> >>> >>>> >>> >>>> >>> > >>>> >>> > I never paid much attention to the frequency based tests but it does >>>> >>> > not >>>> >>> > surprise if there are no one-sided tests. Most are rank-based so it >>>> >>> > is >>>> >>> > rather hard to do in a simply manner - actually I am not even sure >>>> >>> > how >>>> >>> > to use a permutation test. 
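Since the quoted remark above wonders how a permutation test would even be used here, one simple variant (permute the group labels and recompute the statistic) might look like the following sketch; the data and the choice of a difference-of-means statistic are made up for illustration:

    import numpy as np

    rng = np.random.RandomState(0)        # made-up example data
    x = rng.normal(0.3, 1.0, size=30)
    y = rng.normal(0.0, 1.0, size=40)

    pooled = np.concatenate([x, y])
    observed = x.mean() - y.mean()

    nperm = 10000
    count = 0
    for _ in range(nperm):
        perm = rng.permutation(pooled)    # relabel the observations at random
        diff = perm[:len(x)].mean() - perm[len(x):].mean()
        if diff >= observed:              # one-sided alternative: mean(x) > mean(y)
            count += 1
    p_greater = (count + 1.0) / (nperm + 1.0)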
>>>> >>> > >>>> >>> > Bruce >>>> >>> > >>>> >>> > >>>> >>> > >>>> >>> > _______________________________________________ >>>> >>> > SciPy-User mailing list >>>> >>> > SciPy-User at scipy.org >>>> >>> > http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>> > >>>> >>> _______________________________________________ >>>> >>> SciPy-User mailing list >>>> >>> SciPy-User at scipy.org >>>> >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >> >>>> >> >>>> >> _______________________________________________ >>>> >> SciPy-User mailing list >>>> >> SciPy-User at scipy.org >>>> >> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >> >>>> >> >>>> > >>>> > But that is NOT the correct interpretation ?here! >>>> > I tried to explain to you that this is the not the usual idea >>>> > one-sided vs two-sided tests. >>>> > For example: >>>> > http://www.msu.edu/~fuw/teaching/Fu_ch10_2_categorical.ppt >>>> > "The test holds the marginal totals fixed and computes the >>>> > hypergeometric probability that n11 is at least as large as the >>>> > observed value" >>>> >>>> this still sounds like a less/greater test to me >>>> >>>> >>>> > "The output consists of three p-values: >>>> > Left: Use this when the alternative to independence is that there is >>>> > negative association between the variables. ?That is, the observations >>>> > tend to lie in lower left and upper right. >>>> > Right: Use this when the alternative to independence is that there is >>>> > positive association between the variables. That is, the observations >>>> > tend to lie in upper left and lower right. >>>> > 2-Tail: Use this when there is no prior alternative. >>>> > " >>>> > There is also the book "Categorical data analysis: using the SAS >>>> > system ?By Maura E. Stokes, Charles S. Davis, Gary G. Koch" that came >>>> > up via Google that also refers to the n11 cell. >>>> > >>>> > http://www.langsrud.com/fisher.htm >>>> >>>> I was trying to read the Agresti paper referenced there but it has too >>>> much detail to get through in 15 minutes :) >>>> >>>> > "The output consists of three p-values: >>>> > >>>> > ? ?Left: Use this when the alternative to independence is that there >>>> > is negative association between the variables. >>>> > ? ?That is, the observations tend to lie in lower left and upper right. >>>> > ? ?Right: Use this when the alternative to independence is that there >>>> > is positive association between the variables. >>>> > ? ?That is, the observations tend to lie in upper left and lower right. >>>> > ? ?2-Tail: Use this when there is no prior alternative. >>>> > >>>> > NOTE: Decide to use Left, Right or 2-Tail before collecting (or >>>> > looking at) the data." >>>> > >>>> > But you will get a different p-value if you switch rows and columns >>>> > because of the dependence on the n11 cell. If you do that then the >>>> > p-values switch between left and right sides as these now refer to >>>> > different hypotheses regarding that first cell. >>>> >>>> switching row and columns doesn't change the p-value in R >>>> reversing columns changes the definition of less and greater, reverses >>>> them >>>> >>>> The problem with 2 by 2 contingency tables with given marginals, i.e. >>>> row and column totals, is that we only have one free entry. Any test >>>> on one entry, e.g. element 0,0, pins down all the other ones and >>>> (many) tests then become equivalent. 
>>>> >>>> >>>> http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_freq_a0000000658.htm >>>> some math got lost >>>> """ >>>> For <2 by 2> tables, one-sided -values for Fisher?s exact test are >>>> defined in terms of the frequency of the cell in the first row and >>>> first column of the table, the (1,1) cell. Denoting the observed (1,1) >>>> cell frequency by , the left-sided -value for Fisher?s exact test is >>>> the probability that the (1,1) cell frequency is less than or equal to >>>> . For the left-sided -value, the set includes those tables with a >>>> (1,1) cell frequency less than or equal to . A small left-sided -value >>>> supports the alternative hypothesis that the probability of an >>>> observation being in the first cell is actually less than expected >>>> under the null hypothesis of independent row and column variables. >>>> >>>> Similarly, for a right-sided alternative hypothesis, is the set of >>>> tables where the frequency of the (1,1) cell is greater than or equal >>>> to that in the observed table. A small right-sided -value supports the >>>> alternative that the probability of the first cell is actually greater >>>> than that expected under the null hypothesis. >>>> >>>> Because the (1,1) cell frequency completely determines the table when >>>> the marginal row and column sums are fixed, these one-sided >>>> alternatives can be stated equivalently in terms of other cell >>>> probabilities or ratios of cell probabilities. The left-sided >>>> alternative is equivalent to an odds ratio less than 1, where the odds >>>> ratio equals (). Additionally, the left-sided alternative is >>>> equivalent to the column 1 risk for row 1 being less than the column 1 >>>> risk for row 2, . Similarly, the right-sided alternative is equivalent >>>> to the column 1 risk for row 1 being greater than the column 1 risk >>>> for row 2, . See Agresti (2007) for details. >>>> R C Tables >>>> """ >>>> >>>> I'm not a user of Fisher's exact test (and I have a hard time keeping >>>> the different statements straight), so if left/right or lower/upper >>>> makes more sense to users, then I don't complain. >>>> >>>> To me they are all just independence tests with possible one-sided >>>> alternatives that one distribution dominates the other. (with the same >>>> pattern as ks_2samp or ttest_2samp) >>>> >>>> Josef >>>> >>>> > >>>> > >>>> > Bruce >>>> > _______________________________________________ >>>> > SciPy-User mailing list >>>> > SciPy-User at scipy.org >>>> > http://mail.scipy.org/mailman/listinfo/scipy-user >>>> > >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >> This is just wrong and plain ignorant! Please read the references and >> stats books about what the tails actually mean! >> >> You really need all three tests because these have different meanings >> that you do not know in advance which you need. > > Sorry, but I'm perfectly happy to follow R and SAS in this. 
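In terms of the interface discussed here, a call would look roughly like the following. This is only a sketch of the `alternative` keyword proposed in the pull request, not of anything in a released scipy at this point; the expected p-values are the R numbers quoted elsewhere in the thread.

from scipy import stats

table = [[190, 800], [200, 900]]

for alternative in ('two-sided', 'less', 'greater'):
    oddsratio, pvalue = stats.fisher_exact(table, alternative=alternative)
    print(alternative, oddsratio, pvalue)

# Expected (from the R output quoted in this thread):
#   two-sided ~ 0.5741, less ~ 0.7416, greater ~ 0.2960
# The odds ratio itself is defined differently: scipy reports the sample
# odds ratio 190*900 / (800*200) ~ 1.0688, R the conditional MLE ~ 1.0687.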
> > Josef > >> >> Bruce >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > So am I which is NOT what is happening here! Bruce From josef.pktd at gmail.com Sun Jun 12 20:52:32 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 12 Jun 2011 20:52:32 -0400 Subject: [SciPy-User] scipy.stats one-sided two-sided less, greater, signed ? In-Reply-To: References: <4DED1DC5.8090503@gmail.com> Message-ID: On Sun, Jun 12, 2011 at 8:30 PM, Bruce Southey wrote: > On Sun, Jun 12, 2011 at 8:56 AM, ? wrote: >> On Sun, Jun 12, 2011 at 9:36 AM, Bruce Southey wrote: >>> On Sun, Jun 12, 2011 at 5:20 AM, Ralf Gommers >>> wrote: >>>> >>>> >>>> On Wed, Jun 8, 2011 at 12:56 PM, wrote: >>>>> >>>>> On Tue, Jun 7, 2011 at 10:37 PM, Bruce Southey wrote: >>>>> > On Tue, Jun 7, 2011 at 4:40 PM, Ralf Gommers >>>>> > wrote: >>>>> >> >>>>> >> >>>>> >> On Mon, Jun 6, 2011 at 9:34 PM, wrote: >>>>> >>> >>>>> >>> On Mon, Jun 6, 2011 at 2:34 PM, Bruce Southey >>>>> >>> wrote: >>>>> >>> > On 06/05/2011 02:43 PM, josef.pktd at gmail.com wrote: >>>>> >>> >> What should be the policy on one-sided versus two-sided? >>>>> >>> > Yes :-) >>>>> >>> > >>>>> >>> >> The main reason right now for looking at this is >>>>> >>> >> http://projects.scipy.org/scipy/ticket/1394 which specifies a >>>>> >>> >> "one-sided" alternative and provides both lower and upper tail. >>>>> >>> > That refers to the Fisher's test rather than the more 'traditional' >>>>> >>> > one-sided tests. Each value of the Fisher's test has special >>>>> >>> > meanings >>>>> >>> > about the value or probability of the 'first cell' under the null >>>>> >>> > hypothesis. ?So it is necessary to provide those three values. >>>>> >>> > >>>>> >>> >> I would prefer that we follow the alternative patterns similar to R >>>>> >>> >> >>>>> >>> >> currently only kstest has ? ?alternative : 'two_sided' (default), >>>>> >>> >> 'less' or 'greater' >>>>> >>> >> but this should be added to other tests where it makes sense >>>>> >>> > I think that these Kolmogorov-Smirnov ?tests are not the traditional >>>>> >>> > meaning either. It is a little mind-boggling to try to think about >>>>> >>> > cdfs! >>>>> >>> > >>>>> >>> >> R fisher.exact >>>>> >>> >> """alternative ? ? ? ?indicates the alternative hypothesis and must >>>>> >>> >> be >>>>> >>> >> one >>>>> >>> >> of "two.sided", "greater" or "less". You can specify just the >>>>> >>> >> initial >>>>> >>> >> letter. Only used in the 2 by 2 case.""" >>>>> >>> >> >>>>> >>> >> mannwhitneyu reports a one-sided test without actually specifying >>>>> >>> >> which alternative is used ?(I thought I remembered other cases like >>>>> >>> >> this but don't find any right now) >>>>> >>> >> >>>>> >>> >> related: >>>>> >>> >> in many cases in the two-sided tests the test statistic has a sign >>>>> >>> >> that indicates in which tail the test-statistic falls. >>>>> >>> >> This is useful in ttests for example, because the one-sided tests >>>>> >>> >> can >>>>> >>> >> be backed out from the two-sided tests. 
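For kstest, which already has the keyword, the one-sided alternatives can be exercised like this (a small sketch; the spelling of the default two-sided value has varied between versions, and which direction of shift maps to 'less' versus 'greater' is exactly the kind of convention being debated, so the sketch simply prints all three):

import numpy as np
from scipy import stats

np.random.seed(12345)
x = stats.norm.rvs(loc=0.3, size=200)       # sample shifted relative to N(0, 1)

D, p = stats.kstest(x, 'norm')              # default two-sided alternative
print('two-sided', D, p)
for alternative in ('less', 'greater'):
    D, p = stats.kstest(x, 'norm', alternative=alternative)
    print(alternative, D, p)

# The directional alternatives compare the empirical cdf of x with the
# hypothesized normal cdf, so a pure location shift makes one of the two
# one-sided p-values much smaller than the other.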
(With symmetric >>>>> >>> >> distributions >>>>> >>> >> one-sided p-value is just half of the two-sided pvalue) >>>>> >>> >> >>>>> >>> >> In the discussion of https://github.com/scipy/scipy/pull/8 ?I >>>>> >>> >> argued >>>>> >>> >> that this might mislead users to interpret a two-sided result as a >>>>> >>> >> one-sided result. However, I doubt now that this is a strong >>>>> >>> >> argument >>>>> >>> >> against not reporting the signed test statistic. >>>>> >>> > (I do not follow pull requests so is there a relevant ticket?) >>>>> >>> > >>>>> >>> >> After going through scipy.stats.stats, it looks like we always >>>>> >>> >> report >>>>> >>> >> the signed test statistic. >>>>> >>> >> >>>>> >>> >> The test statistic in ks_2samp is in all cases defined as a max >>>>> >>> >> value >>>>> >>> >> and doesn't have a sign in R either, so adding a sign there would >>>>> >>> >> break with the standard definition. >>>>> >>> >> one-sided option for ks_2samp would just require to find the >>>>> >>> >> distribution of the test statistics D+, D- >>>>> >>> >> >>>>> >>> >> --- >>>>> >>> >> >>>>> >>> >> So my proposal for the general pattern (with exceptions for special >>>>> >>> >> reasons) would be >>>>> >>> >> >>>>> >>> >> * add/offer alternative : 'two_sided' (default), 'less' or >>>>> >>> >> 'greater' >>>>> >>> >> http://projects.scipy.org/scipy/ticket/1394 ?for now, >>>>> >>> >> and adjustments of existing tests in the future (adding the option >>>>> >>> >> can >>>>> >>> >> be mostly done in a backwards compatible way and for symmetric >>>>> >>> >> distributions like ttest it's just a convenience) >>>>> >>> >> mannwhitneyu seems to be the only "weird" one >>>>> >> >>>>> >> This would actually make the fisher_exact implementation more >>>>> >> consistent, >>>>> >> since only one p-value is returned in all cases. I just don't like the >>>>> >> R >>>>> >> naming much; alternative="greater" does not convey to me that this is a >>>>> >> one-sided test using the upper tail. How about: >>>>> >> ??? test : {"two-tailed", "lower-tail", "upper-tail"} >>>>> >> with two-tailed the default? >>>>> >>>>> I think matlab uses (in general) larger and smaller, the advantage of >>>>> less/smaller and greater/larger is that it directly refers to the >>>>> alternative hypothesis, while the meaning in terms of tails is not >>>>> always clear (in kstest and I guess some others the test statistics is >>>>> just reversed and uses the same tail in both cases) >>>>> >>>>> so greater smaller is mostly "future proof" across tests, while >>>>> reference to the tail can only be used where this is an unambiguous >>>>> statement. but see below >>>>> >>>> I think I understand your terminology a bit better now, and consistency >>>> across all tests is important. So I've updated the Fisher's exact patch to >>>> use alternative={'two-sided', 'less', greater'} and sent a pull request: >>>> https://github.com/scipy/scipy/pull/32 >>>> >>>> Cheers, >>>> Ralf >>>> >>>>> >>>>> >>>>> >> >>>>> >> Ralf >>>>> >> >>>>> >> >>>>> >>> >>>>> >>> >> >>>>> >>> >> * report signed test statistic for two-sided alternative (when a >>>>> >>> >> signed test statistic exists): ?which is the status quo in >>>>> >>> >> stats.stats, but I didn't know that this is actually pretty >>>>> >>> >> consistent >>>>> >>> >> across tests. >>>>> >>> >> >>>>> >>> >> Opinions ? 
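As a small illustration of the "backed out from the two-sided test" remark above (sketch only; ttest_ind itself reports the signed statistic and the two-sided p-value):

import numpy as np
from scipy import stats

np.random.seed(0)
x = np.random.normal(0.3, 1.0, size=50)
y = np.random.normal(0.0, 1.0, size=50)

t, p_two_sided = stats.ttest_ind(x, y)

# H1: mean(x) > mean(y).  The t distribution is symmetric, so the one-sided
# p-value is half the two-sided one when the sign of t points in the
# hypothesized direction, and 1 - p/2 when it points the other way.
p_greater = p_two_sided / 2.0 if t > 0 else 1.0 - p_two_sided / 2.0
print(t, p_two_sided, p_greater)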
>>>>> >>> >> >>>>> >>> >> Josef >>>>> >>> >> _______________________________________________ >>>>> >>> >> SciPy-User mailing list >>>>> >>> >> SciPy-User at scipy.org >>>>> >>> >> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>> >>> > I think that there is some valid misunderstanding here (as I was in >>>>> >>> > the >>>>> >>> > same situation) regarding what is meant here. My understanding is >>>>> >>> > that >>>>> >>> > under a one-sided hypothesis, all the values of the null hypothesis >>>>> >>> > only >>>>> >>> > exist in one tail of the test distribution. In contrast the values >>>>> >>> > of >>>>> >>> > null distribution exist in both tails with a two-sided hypothesis. >>>>> >>> > Yet >>>>> >>> > that interpretation does not have the same meaning as the tails in >>>>> >>> > the >>>>> >>> > Fisher or Kolmogorov-Smirnov tests. >>>>> >>> >>>>> >>> The tests have a clear Null Hypothesis (equality) and Alternative >>>>> >>> Hypothesis (not equal or directional, less or greater). >>>>> >>> So the "alternative" should be clearly specified in the function >>>>> >>> argument, as in R. >>>>> >>> >>>>> >>> Whether this corresponds to left and right tails of the distribution >>>>> >>> is an "implementation detail" which holds for ttests but not for >>>>> >>> kstest/ks_2samp. >>>>> >>> >>>>> >>> kstest/ks2sample ? H0: cdf1 == cdf2 ?and H1: ?cdf1 != cdf2 or H1: >>>>> >>> cdf1 < cdf2 or H1: ?cdf1 > cdf2 >>>>> >>> (looks similar to comparing two survival curves in Kaplan-Meier ?) >>>>> >>> >>>>> >>> fisher_exact (2 by 2) ?H0: odds-ratio == 1 and H1: odds-ratio != 1 or >>>>> >>> H1: odds-ratio < 1 or H1: odds-ratio > 1 >>>>> >>> >>>>> >>> I know the kolmogorov-smirnov tests, but for fisher exact and >>>>> >>> contingency tables I rely on R >>>>> >>> >>>>> >>> from R-help: >>>>> >>> For 2 by 2 tables, the null of conditional independence is equivalent >>>>> >>> to the hypothesis that the odds ratio equals one. <...> The >>>>> >>> alternative for a one-sided test is based on the odds ratio, so >>>>> >>> alternative = "greater" is a test of the odds ratio being bigger than >>>>> >>> or. >>>>> >>> Two-sided tests are based on the probabilities of the tables, and take >>>>> >>> as ?more extreme? all tables with probabilities less than or equal to >>>>> >>> that of the observed table, the p-value being the sum of such >>>>> >>> probabilities. >>>>> >>> >>>>> >>> Josef >>>>> >>> >>>>> >>> >>>>> >>> > >>>>> >>> > I never paid much attention to the frequency based tests but it does >>>>> >>> > not >>>>> >>> > surprise if there are no one-sided tests. Most are rank-based so it >>>>> >>> > is >>>>> >>> > rather hard to do in a simply manner - actually I am not even sure >>>>> >>> > how >>>>> >>> > to use a permutation test. >>>>> >>> > >>>>> >>> > Bruce >>>>> >>> > >>>>> >>> > >>>>> >>> > >>>>> >>> > _______________________________________________ >>>>> >>> > SciPy-User mailing list >>>>> >>> > SciPy-User at scipy.org >>>>> >>> > http://mail.scipy.org/mailman/listinfo/scipy-user >>>>> >>> > >>>>> >>> _______________________________________________ >>>>> >>> SciPy-User mailing list >>>>> >>> SciPy-User at scipy.org >>>>> >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>> >> >>>>> >> >>>>> >> _______________________________________________ >>>>> >> SciPy-User mailing list >>>>> >> SciPy-User at scipy.org >>>>> >> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>> >> >>>>> >> >>>>> > >>>>> > But that is NOT the correct interpretation ?here! 
>>>>> > I tried to explain to you that this is the not the usual idea >>>>> > one-sided vs two-sided tests. >>>>> > For example: >>>>> > http://www.msu.edu/~fuw/teaching/Fu_ch10_2_categorical.ppt >>>>> > "The test holds the marginal totals fixed and computes the >>>>> > hypergeometric probability that n11 is at least as large as the >>>>> > observed value" >>>>> >>>>> this still sounds like a less/greater test to me >>>>> >>>>> >>>>> > "The output consists of three p-values: >>>>> > Left: Use this when the alternative to independence is that there is >>>>> > negative association between the variables. ?That is, the observations >>>>> > tend to lie in lower left and upper right. >>>>> > Right: Use this when the alternative to independence is that there is >>>>> > positive association between the variables. That is, the observations >>>>> > tend to lie in upper left and lower right. >>>>> > 2-Tail: Use this when there is no prior alternative. >>>>> > " >>>>> > There is also the book "Categorical data analysis: using the SAS >>>>> > system ?By Maura E. Stokes, Charles S. Davis, Gary G. Koch" that came >>>>> > up via Google that also refers to the n11 cell. >>>>> > >>>>> > http://www.langsrud.com/fisher.htm >>>>> >>>>> I was trying to read the Agresti paper referenced there but it has too >>>>> much detail to get through in 15 minutes :) >>>>> >>>>> > "The output consists of three p-values: >>>>> > >>>>> > ? ?Left: Use this when the alternative to independence is that there >>>>> > is negative association between the variables. >>>>> > ? ?That is, the observations tend to lie in lower left and upper right. >>>>> > ? ?Right: Use this when the alternative to independence is that there >>>>> > is positive association between the variables. >>>>> > ? ?That is, the observations tend to lie in upper left and lower right. >>>>> > ? ?2-Tail: Use this when there is no prior alternative. >>>>> > >>>>> > NOTE: Decide to use Left, Right or 2-Tail before collecting (or >>>>> > looking at) the data." >>>>> > >>>>> > But you will get a different p-value if you switch rows and columns >>>>> > because of the dependence on the n11 cell. If you do that then the >>>>> > p-values switch between left and right sides as these now refer to >>>>> > different hypotheses regarding that first cell. >>>>> >>>>> switching row and columns doesn't change the p-value in R >>>>> reversing columns changes the definition of less and greater, reverses >>>>> them >>>>> >>>>> The problem with 2 by 2 contingency tables with given marginals, i.e. >>>>> row and column totals, is that we only have one free entry. Any test >>>>> on one entry, e.g. element 0,0, pins down all the other ones and >>>>> (many) tests then become equivalent. >>>>> >>>>> >>>>> http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_freq_a0000000658.htm >>>>> some math got lost >>>>> """ >>>>> For <2 by 2> tables, one-sided -values for Fisher?s exact test are >>>>> defined in terms of the frequency of the cell in the first row and >>>>> first column of the table, the (1,1) cell. Denoting the observed (1,1) >>>>> cell frequency by , the left-sided -value for Fisher?s exact test is >>>>> the probability that the (1,1) cell frequency is less than or equal to >>>>> . For the left-sided -value, the set includes those tables with a >>>>> (1,1) cell frequency less than or equal to . 
A small left-sided -value >>>>> supports the alternative hypothesis that the probability of an >>>>> observation being in the first cell is actually less than expected >>>>> under the null hypothesis of independent row and column variables. >>>>> >>>>> Similarly, for a right-sided alternative hypothesis, is the set of >>>>> tables where the frequency of the (1,1) cell is greater than or equal >>>>> to that in the observed table. A small right-sided -value supports the >>>>> alternative that the probability of the first cell is actually greater >>>>> than that expected under the null hypothesis. >>>>> >>>>> Because the (1,1) cell frequency completely determines the table when >>>>> the marginal row and column sums are fixed, these one-sided >>>>> alternatives can be stated equivalently in terms of other cell >>>>> probabilities or ratios of cell probabilities. The left-sided >>>>> alternative is equivalent to an odds ratio less than 1, where the odds >>>>> ratio equals (). Additionally, the left-sided alternative is >>>>> equivalent to the column 1 risk for row 1 being less than the column 1 >>>>> risk for row 2, . Similarly, the right-sided alternative is equivalent >>>>> to the column 1 risk for row 1 being greater than the column 1 risk >>>>> for row 2, . See Agresti (2007) for details. >>>>> R C Tables >>>>> """ >>>>> >>>>> I'm not a user of Fisher's exact test (and I have a hard time keeping >>>>> the different statements straight), so if left/right or lower/upper >>>>> makes more sense to users, then I don't complain. >>>>> >>>>> To me they are all just independence tests with possible one-sided >>>>> alternatives that one distribution dominates the other. (with the same >>>>> pattern as ks_2samp or ttest_2samp) >>>>> >>>>> Josef >>>>> >>>>> > >>>>> > >>>>> > Bruce >>>>> > _______________________________________________ >>>>> > SciPy-User mailing list >>>>> > SciPy-User at scipy.org >>>>> > http://mail.scipy.org/mailman/listinfo/scipy-user >>>>> > >>>>> _______________________________________________ >>>>> SciPy-User mailing list >>>>> SciPy-User at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>>> >>> This is just wrong and plain ignorant! Please read the references and >>> stats books about what the tails actually mean! >>> >>> You really need all three tests because these have different meanings >>> that you do not know in advance which you need. >> >> Sorry, but I'm perfectly happy to follow R and SAS in this. >> >> Josef >> >>> >>> Bruce >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > So am I which is NOT what is happening here! Why do you think that? 
I quoted all the relevant descriptions from the R and SAS help, and I checked the following and similar for the cases that are in the changeset for the tests: > fisher.test(t(matrix(c(190,800,200,900),nrow=2)),alternative='g') Fisher's Exact Test for Count Data data: t(matrix(c(190, 800, 200, 900), nrow = 2)) p-value = 0.296 alternative hypothesis: true odds ratio is greater than 1 95 percent confidence interval: 0.8828407 Inf sample estimates: odds ratio 1.068698 > fisher.test(t(matrix(c(190,800,200,900),nrow=2)),alternative='l') Fisher's Exact Test for Count Data data: t(matrix(c(190, 800, 200, 900), nrow = 2)) p-value = 0.7416 alternative hypothesis: true odds ratio is less than 1 95 percent confidence interval: 0.000000 1.293552 sample estimates: odds ratio 1.068698 > fisher.test(t(matrix(c(190,800,200,900),nrow=2)),alternative='t') Fisher's Exact Test for Count Data data: t(matrix(c(190, 800, 200, 900), nrow = 2)) p-value = 0.5741 alternative hypothesis: true odds ratio is not equal to 1 95 percent confidence interval: 0.8520463 1.3401490 sample estimates: odds ratio 1.068698 All the p-values agree for the alternatives two-sided, less, and greater, the odds ratio is defined differently as explained pretty well in the docstring. Josef > > Bruce > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From bsouthey at gmail.com Sun Jun 12 21:50:53 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Sun, 12 Jun 2011 20:50:53 -0500 Subject: [SciPy-User] scipy.stats one-sided two-sided less, greater, signed ? In-Reply-To: References: <4DED1DC5.8090503@gmail.com> Message-ID: On Sun, Jun 12, 2011 at 7:52 PM, wrote: > On Sun, Jun 12, 2011 at 8:30 PM, Bruce Southey wrote: >> On Sun, Jun 12, 2011 at 8:56 AM, ? wrote: >>> On Sun, Jun 12, 2011 at 9:36 AM, Bruce Southey wrote: >>>> On Sun, Jun 12, 2011 at 5:20 AM, Ralf Gommers >>>> wrote: >>>>> >>>>> >>>>> On Wed, Jun 8, 2011 at 12:56 PM, wrote: >>>>>> >>>>>> On Tue, Jun 7, 2011 at 10:37 PM, Bruce Southey wrote: >>>>>> > On Tue, Jun 7, 2011 at 4:40 PM, Ralf Gommers >>>>>> > wrote: >>>>>> >> >>>>>> >> >>>>>> >> On Mon, Jun 6, 2011 at 9:34 PM, wrote: >>>>>> >>> >>>>>> >>> On Mon, Jun 6, 2011 at 2:34 PM, Bruce Southey >>>>>> >>> wrote: >>>>>> >>> > On 06/05/2011 02:43 PM, josef.pktd at gmail.com wrote: >>>>>> >>> >> What should be the policy on one-sided versus two-sided? >>>>>> >>> > Yes :-) >>>>>> >>> > >>>>>> >>> >> The main reason right now for looking at this is >>>>>> >>> >> http://projects.scipy.org/scipy/ticket/1394 which specifies a >>>>>> >>> >> "one-sided" alternative and provides both lower and upper tail. >>>>>> >>> > That refers to the Fisher's test rather than the more 'traditional' >>>>>> >>> > one-sided tests. Each value of the Fisher's test has special >>>>>> >>> > meanings >>>>>> >>> > about the value or probability of the 'first cell' under the null >>>>>> >>> > hypothesis. ?So it is necessary to provide those three values. >>>>>> >>> > >>>>>> >>> >> I would prefer that we follow the alternative patterns similar to R >>>>>> >>> >> >>>>>> >>> >> currently only kstest has ? ?alternative : 'two_sided' (default), >>>>>> >>> >> 'less' or 'greater' >>>>>> >>> >> but this should be added to other tests where it makes sense >>>>>> >>> > I think that these Kolmogorov-Smirnov ?tests are not the traditional >>>>>> >>> > meaning either. It is a little mind-boggling to try to think about >>>>>> >>> > cdfs! 
>>>>>> >>> > >>>>>> >>> >> R fisher.exact >>>>>> >>> >> """alternative ? ? ? ?indicates the alternative hypothesis and must >>>>>> >>> >> be >>>>>> >>> >> one >>>>>> >>> >> of "two.sided", "greater" or "less". You can specify just the >>>>>> >>> >> initial >>>>>> >>> >> letter. Only used in the 2 by 2 case.""" >>>>>> >>> >> >>>>>> >>> >> mannwhitneyu reports a one-sided test without actually specifying >>>>>> >>> >> which alternative is used ?(I thought I remembered other cases like >>>>>> >>> >> this but don't find any right now) >>>>>> >>> >> >>>>>> >>> >> related: >>>>>> >>> >> in many cases in the two-sided tests the test statistic has a sign >>>>>> >>> >> that indicates in which tail the test-statistic falls. >>>>>> >>> >> This is useful in ttests for example, because the one-sided tests >>>>>> >>> >> can >>>>>> >>> >> be backed out from the two-sided tests. (With symmetric >>>>>> >>> >> distributions >>>>>> >>> >> one-sided p-value is just half of the two-sided pvalue) >>>>>> >>> >> >>>>>> >>> >> In the discussion of https://github.com/scipy/scipy/pull/8 ?I >>>>>> >>> >> argued >>>>>> >>> >> that this might mislead users to interpret a two-sided result as a >>>>>> >>> >> one-sided result. However, I doubt now that this is a strong >>>>>> >>> >> argument >>>>>> >>> >> against not reporting the signed test statistic. >>>>>> >>> > (I do not follow pull requests so is there a relevant ticket?) >>>>>> >>> > >>>>>> >>> >> After going through scipy.stats.stats, it looks like we always >>>>>> >>> >> report >>>>>> >>> >> the signed test statistic. >>>>>> >>> >> >>>>>> >>> >> The test statistic in ks_2samp is in all cases defined as a max >>>>>> >>> >> value >>>>>> >>> >> and doesn't have a sign in R either, so adding a sign there would >>>>>> >>> >> break with the standard definition. >>>>>> >>> >> one-sided option for ks_2samp would just require to find the >>>>>> >>> >> distribution of the test statistics D+, D- >>>>>> >>> >> >>>>>> >>> >> --- >>>>>> >>> >> >>>>>> >>> >> So my proposal for the general pattern (with exceptions for special >>>>>> >>> >> reasons) would be >>>>>> >>> >> >>>>>> >>> >> * add/offer alternative : 'two_sided' (default), 'less' or >>>>>> >>> >> 'greater' >>>>>> >>> >> http://projects.scipy.org/scipy/ticket/1394 ?for now, >>>>>> >>> >> and adjustments of existing tests in the future (adding the option >>>>>> >>> >> can >>>>>> >>> >> be mostly done in a backwards compatible way and for symmetric >>>>>> >>> >> distributions like ttest it's just a convenience) >>>>>> >>> >> mannwhitneyu seems to be the only "weird" one >>>>>> >> >>>>>> >> This would actually make the fisher_exact implementation more >>>>>> >> consistent, >>>>>> >> since only one p-value is returned in all cases. I just don't like the >>>>>> >> R >>>>>> >> naming much; alternative="greater" does not convey to me that this is a >>>>>> >> one-sided test using the upper tail. How about: >>>>>> >> ??? test : {"two-tailed", "lower-tail", "upper-tail"} >>>>>> >> with two-tailed the default? 
>>>>>> >>>>>> I think matlab uses (in general) larger and smaller, the advantage of >>>>>> less/smaller and greater/larger is that it directly refers to the >>>>>> alternative hypothesis, while the meaning in terms of tails is not >>>>>> always clear (in kstest and I guess some others the test statistics is >>>>>> just reversed and uses the same tail in both cases) >>>>>> >>>>>> so greater smaller is mostly "future proof" across tests, while >>>>>> reference to the tail can only be used where this is an unambiguous >>>>>> statement. but see below >>>>>> >>>>> I think I understand your terminology a bit better now, and consistency >>>>> across all tests is important. So I've updated the Fisher's exact patch to >>>>> use alternative={'two-sided', 'less', greater'} and sent a pull request: >>>>> https://github.com/scipy/scipy/pull/32 >>>>> >>>>> Cheers, >>>>> Ralf >>>>> >>>>>> >>>>>> >>>>>> >> >>>>>> >> Ralf >>>>>> >> >>>>>> >> >>>>>> >>> >>>>>> >>> >> >>>>>> >>> >> * report signed test statistic for two-sided alternative (when a >>>>>> >>> >> signed test statistic exists): ?which is the status quo in >>>>>> >>> >> stats.stats, but I didn't know that this is actually pretty >>>>>> >>> >> consistent >>>>>> >>> >> across tests. >>>>>> >>> >> >>>>>> >>> >> Opinions ? >>>>>> >>> >> >>>>>> >>> >> Josef >>>>>> >>> >> _______________________________________________ >>>>>> >>> >> SciPy-User mailing list >>>>>> >>> >> SciPy-User at scipy.org >>>>>> >>> >> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>> >>> > I think that there is some valid misunderstanding here (as I was in >>>>>> >>> > the >>>>>> >>> > same situation) regarding what is meant here. My understanding is >>>>>> >>> > that >>>>>> >>> > under a one-sided hypothesis, all the values of the null hypothesis >>>>>> >>> > only >>>>>> >>> > exist in one tail of the test distribution. In contrast the values >>>>>> >>> > of >>>>>> >>> > null distribution exist in both tails with a two-sided hypothesis. >>>>>> >>> > Yet >>>>>> >>> > that interpretation does not have the same meaning as the tails in >>>>>> >>> > the >>>>>> >>> > Fisher or Kolmogorov-Smirnov tests. >>>>>> >>> >>>>>> >>> The tests have a clear Null Hypothesis (equality) and Alternative >>>>>> >>> Hypothesis (not equal or directional, less or greater). >>>>>> >>> So the "alternative" should be clearly specified in the function >>>>>> >>> argument, as in R. >>>>>> >>> >>>>>> >>> Whether this corresponds to left and right tails of the distribution >>>>>> >>> is an "implementation detail" which holds for ttests but not for >>>>>> >>> kstest/ks_2samp. >>>>>> >>> >>>>>> >>> kstest/ks2sample ? H0: cdf1 == cdf2 ?and H1: ?cdf1 != cdf2 or H1: >>>>>> >>> cdf1 < cdf2 or H1: ?cdf1 > cdf2 >>>>>> >>> (looks similar to comparing two survival curves in Kaplan-Meier ?) >>>>>> >>> >>>>>> >>> fisher_exact (2 by 2) ?H0: odds-ratio == 1 and H1: odds-ratio != 1 or >>>>>> >>> H1: odds-ratio < 1 or H1: odds-ratio > 1 >>>>>> >>> >>>>>> >>> I know the kolmogorov-smirnov tests, but for fisher exact and >>>>>> >>> contingency tables I rely on R >>>>>> >>> >>>>>> >>> from R-help: >>>>>> >>> For 2 by 2 tables, the null of conditional independence is equivalent >>>>>> >>> to the hypothesis that the odds ratio equals one. <...> The >>>>>> >>> alternative for a one-sided test is based on the odds ratio, so >>>>>> >>> alternative = "greater" is a test of the odds ratio being bigger than >>>>>> >>> or. >>>>>> >>> Two-sided tests are based on the probabilities of the tables, and take >>>>>> >>> as ?more extreme? 
all tables with probabilities less than or equal to >>>>>> >>> that of the observed table, the p-value being the sum of such >>>>>> >>> probabilities. >>>>>> >>> >>>>>> >>> Josef >>>>>> >>> >>>>>> >>> >>>>>> >>> > >>>>>> >>> > I never paid much attention to the frequency based tests but it does >>>>>> >>> > not >>>>>> >>> > surprise if there are no one-sided tests. Most are rank-based so it >>>>>> >>> > is >>>>>> >>> > rather hard to do in a simply manner - actually I am not even sure >>>>>> >>> > how >>>>>> >>> > to use a permutation test. >>>>>> >>> > >>>>>> >>> > Bruce >>>>>> >>> > >>>>>> >>> > >>>>>> >>> > >>>>>> >>> > _______________________________________________ >>>>>> >>> > SciPy-User mailing list >>>>>> >>> > SciPy-User at scipy.org >>>>>> >>> > http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>> >>> > >>>>>> >>> _______________________________________________ >>>>>> >>> SciPy-User mailing list >>>>>> >>> SciPy-User at scipy.org >>>>>> >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>> >> >>>>>> >> >>>>>> >> _______________________________________________ >>>>>> >> SciPy-User mailing list >>>>>> >> SciPy-User at scipy.org >>>>>> >> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>> >> >>>>>> >> >>>>>> > >>>>>> > But that is NOT the correct interpretation ?here! >>>>>> > I tried to explain to you that this is the not the usual idea >>>>>> > one-sided vs two-sided tests. >>>>>> > For example: >>>>>> > http://www.msu.edu/~fuw/teaching/Fu_ch10_2_categorical.ppt >>>>>> > "The test holds the marginal totals fixed and computes the >>>>>> > hypergeometric probability that n11 is at least as large as the >>>>>> > observed value" >>>>>> >>>>>> this still sounds like a less/greater test to me >>>>>> >>>>>> >>>>>> > "The output consists of three p-values: >>>>>> > Left: Use this when the alternative to independence is that there is >>>>>> > negative association between the variables. ?That is, the observations >>>>>> > tend to lie in lower left and upper right. >>>>>> > Right: Use this when the alternative to independence is that there is >>>>>> > positive association between the variables. That is, the observations >>>>>> > tend to lie in upper left and lower right. >>>>>> > 2-Tail: Use this when there is no prior alternative. >>>>>> > " >>>>>> > There is also the book "Categorical data analysis: using the SAS >>>>>> > system ?By Maura E. Stokes, Charles S. Davis, Gary G. Koch" that came >>>>>> > up via Google that also refers to the n11 cell. >>>>>> > >>>>>> > http://www.langsrud.com/fisher.htm >>>>>> >>>>>> I was trying to read the Agresti paper referenced there but it has too >>>>>> much detail to get through in 15 minutes :) >>>>>> >>>>>> > "The output consists of three p-values: >>>>>> > >>>>>> > ? ?Left: Use this when the alternative to independence is that there >>>>>> > is negative association between the variables. >>>>>> > ? ?That is, the observations tend to lie in lower left and upper right. >>>>>> > ? ?Right: Use this when the alternative to independence is that there >>>>>> > is positive association between the variables. >>>>>> > ? ?That is, the observations tend to lie in upper left and lower right. >>>>>> > ? ?2-Tail: Use this when there is no prior alternative. >>>>>> > >>>>>> > NOTE: Decide to use Left, Right or 2-Tail before collecting (or >>>>>> > looking at) the data." >>>>>> > >>>>>> > But you will get a different p-value if you switch rows and columns >>>>>> > because of the dependence on the n11 cell. 
If you do that then the >>>>>> > p-values switch between left and right sides as these now refer to >>>>>> > different hypotheses regarding that first cell. >>>>>> >>>>>> switching row and columns doesn't change the p-value in R >>>>>> reversing columns changes the definition of less and greater, reverses >>>>>> them >>>>>> >>>>>> The problem with 2 by 2 contingency tables with given marginals, i.e. >>>>>> row and column totals, is that we only have one free entry. Any test >>>>>> on one entry, e.g. element 0,0, pins down all the other ones and >>>>>> (many) tests then become equivalent. >>>>>> >>>>>> >>>>>> http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_freq_a0000000658.htm >>>>>> some math got lost >>>>>> """ >>>>>> For <2 by 2> tables, one-sided -values for Fisher?s exact test are >>>>>> defined in terms of the frequency of the cell in the first row and >>>>>> first column of the table, the (1,1) cell. Denoting the observed (1,1) >>>>>> cell frequency by , the left-sided -value for Fisher?s exact test is >>>>>> the probability that the (1,1) cell frequency is less than or equal to >>>>>> . For the left-sided -value, the set includes those tables with a >>>>>> (1,1) cell frequency less than or equal to . A small left-sided -value >>>>>> supports the alternative hypothesis that the probability of an >>>>>> observation being in the first cell is actually less than expected >>>>>> under the null hypothesis of independent row and column variables. >>>>>> >>>>>> Similarly, for a right-sided alternative hypothesis, is the set of >>>>>> tables where the frequency of the (1,1) cell is greater than or equal >>>>>> to that in the observed table. A small right-sided -value supports the >>>>>> alternative that the probability of the first cell is actually greater >>>>>> than that expected under the null hypothesis. >>>>>> >>>>>> Because the (1,1) cell frequency completely determines the table when >>>>>> the marginal row and column sums are fixed, these one-sided >>>>>> alternatives can be stated equivalently in terms of other cell >>>>>> probabilities or ratios of cell probabilities. The left-sided >>>>>> alternative is equivalent to an odds ratio less than 1, where the odds >>>>>> ratio equals (). Additionally, the left-sided alternative is >>>>>> equivalent to the column 1 risk for row 1 being less than the column 1 >>>>>> risk for row 2, . Similarly, the right-sided alternative is equivalent >>>>>> to the column 1 risk for row 1 being greater than the column 1 risk >>>>>> for row 2, . See Agresti (2007) for details. >>>>>> R C Tables >>>>>> """ >>>>>> >>>>>> I'm not a user of Fisher's exact test (and I have a hard time keeping >>>>>> the different statements straight), so if left/right or lower/upper >>>>>> makes more sense to users, then I don't complain. >>>>>> >>>>>> To me they are all just independence tests with possible one-sided >>>>>> alternatives that one distribution dominates the other. 
(with the same >>>>>> pattern as ks_2samp or ttest_2samp) >>>>>> >>>>>> Josef >>>>>> >>>>>> > >>>>>> > >>>>>> > Bruce >>>>>> > _______________________________________________ >>>>>> > SciPy-User mailing list >>>>>> > SciPy-User at scipy.org >>>>>> > http://mail.scipy.org/mailman/listinfo/scipy-user >>>>>> > >>>>>> _______________________________________________ >>>>>> SciPy-User mailing list >>>>>> SciPy-User at scipy.org >>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>> >>>>> >>>>> _______________________________________________ >>>>> SciPy-User mailing list >>>>> SciPy-User at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>> >>>>> >>>> This is just wrong and plain ignorant! Please read the references and >>>> stats books about what the tails actually mean! >>>> >>>> You really need all three tests because these have different meanings >>>> that you do not know in advance which you need. >>> >>> Sorry, but I'm perfectly happy to follow R and SAS in this. >>> >>> Josef >>> >>>> >>>> Bruce >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> So am I which is NOT what is happening here! > > Why do you think that? Because all the stuff given above including SAS which YOU provided includes all three tests. > I quoted all the relevant descriptions from the R and SAS help, and I > checked the following and similar for the cases that are in the > changeset for the tests: > >> fisher.test(t(matrix(c(190,800,200,900),nrow=2)),alternative='g') > > ? ? ? ?Fisher's Exact Test for Count Data > > data: ?t(matrix(c(190, 800, 200, 900), nrow = 2)) > p-value = 0.296 > alternative hypothesis: true odds ratio is greater than 1 > 95 percent confidence interval: > ?0.8828407 ? ? ? Inf > sample estimates: > odds ratio > ?1.068698 > >> fisher.test(t(matrix(c(190,800,200,900),nrow=2)),alternative='l') > > ? ? ? ?Fisher's Exact Test for Count Data > > data: ?t(matrix(c(190, 800, 200, 900), nrow = 2)) > p-value = 0.7416 > alternative hypothesis: true odds ratio is less than 1 > 95 percent confidence interval: > ?0.000000 1.293552 > sample estimates: > odds ratio > ?1.068698 > >> fisher.test(t(matrix(c(190,800,200,900),nrow=2)),alternative='t') > > ? ? ? ?Fisher's Exact Test for Count Data > > data: ?t(matrix(c(190, 800, 200, 900), nrow = 2)) > p-value = 0.5741 > alternative hypothesis: true odds ratio is not equal to 1 > 95 percent confidence interval: > ?0.8520463 1.3401490 > sample estimates: > odds ratio > ?1.068698 > > All the p-values agree for the alternatives two-sided, less, and > greater, the odds ratio is defined differently as explained pretty > well in the docstring. 
> > Josef > > >> >> Bruce >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > Yes, but you said to follow BOTH R and SAS - that means providing all three: The FREQ Procedure Table of Exposure by Response Exposure Response Frequency| 0| 1| Total ---------+--------+--------+ 0 | 190 | 800 | 990 ---------+--------+--------+ 1 | 200 | 900 | 1100 ---------+--------+--------+ Total 390 1700 2090 Statistics for Table of Exposure by Response Statistic DF Value Prob ------------------------------------------------------ Chi-Square 1 0.3503 0.5540 Likelihood Ratio Chi-Square 1 0.3500 0.5541 Continuity Adj. Chi-Square 1 0.2869 0.5922 Mantel-Haenszel Chi-Square 1 0.3501 0.5541 Phi Coefficient 0.0129 Contingency Coefficient 0.0129 Cramer's V 0.0129 Pearson Chi-Square Test ---------------------------------- Chi-Square 0.3503 DF 1 Asymptotic Pr > ChiSq 0.5540 Exact Pr >= ChiSq 0.5741 Fisher's Exact Test ---------------------------------- Cell (1,1) Frequency (F) 190 Left-sided Pr <= F 0.7416 Right-sided Pr >= F 0.2960 Table Probability (P) 0.0376 Two-sided Pr <= P 0.5741 Sample Size = 2090 Thus providing all three is the correct answer. Bruce From afrit.mariem at gmail.com Sun Jun 12 06:45:11 2011 From: afrit.mariem at gmail.com (afrit.mariem at gmail.com) Date: Sun, 12 Jun 2011 03:45:11 -0700 (PDT) Subject: [SciPy-User] A problem with scikits.cuda Message-ID: <13e1e613-14f5-4b82-85fc-ec4c2a98e400@v10g2000yqn.googlegroups.com> Hi, I installed scikits on my laptop and when I import scikits.cuda.cublas on ipython I have the next error: --------------------------------------------------------------------------- AttributeError Traceback (most recent call last) /home/integer/ in () /usr/local/lib/python2.6/dist-packages/scikits.cuda-0.041-py2.6.egg/ scikits/cuda/cublas.py in () 153 154 # Single precision real BLAS1 functions: --> 155 _libcublas.cublasIsamax.restype = ctypes.c_int 156 _libcublas.cublasIsamax.argtypes = [ctypes.c_int, 157 ctypes.c_void_p, /usr/lib/python2.6/ctypes/__init__.pyc in __getattr__(self, name) 364 if name.startswith('__') and name.endswith('__'): 365 raise AttributeError(name) --> 366 func = self.__getitem__(name) 367 setattr(self, name, func) 368 return func /usr/lib/python2.6/ctypes/__init__.pyc in __getitem__(self, name_or_ordinal) 369 370 def __getitem__(self, name_or_ordinal): --> 371 func = self._FuncPtr((name_or_ordinal, self)) 372 if not isinstance(name_or_ordinal, (int, long)): 373 func.__name__ = name_or_ordinal AttributeError: /usr/local/cuda/lib64/libcublas.so: undefined symbol: cublasIsamax I couldn't find how to fix that. any idea? thks. From sparkliang at gmail.com Mon Jun 13 00:00:36 2011 From: sparkliang at gmail.com (Spark Liang) Date: Mon, 13 Jun 2011 12:00:36 +0800 Subject: [SciPy-User] ANN: Spyder v2.0.12 In-Reply-To: References: Message-ID: Pierre, Spyder is a wonderful python IDE. It's a Great job! I'm currently involved in wxPhthon, how about using wxPython in Spyder? Best regards, Spark On Sun, Jun 12, 2011 at 8:19 PM, Pierre Raybaut wrote: > Hi all, > > I am pleased to announced that Spyder v2.0.12 has just been released > (changelog available here: > http://code.google.com/p/spyderlib/wiki/ChangeLog). 
> This is the last maintenance release of version 2.0, until the forthcoming > v2.1 release which is scheduled for the end of the month (see the roadmap > here: http://code.google.com/p/spyderlib/wiki/Roadmap). > > Spyder (previously known as Pydee) is a free open-source Python development > environment providing MATLAB-like features in a simple and light-weighted > software, available for Windows XP/Vista/7, GNU/Linux and MacOS X: > http://spyderlib.googlecode.com/. > > Spyder is also a library (spyderlib) providing *pure-Python* (PyQt/PySide) > editors widgets: > * source code editor: > * efficient syntax highlighting (Python, C/C++, Fortran, html/css, > gettext, ...) > * code completion and calltips (powered by `rope`) > * real-time code analysis (powered by `pyflakes`) > * etc. (occurrence highlighting, ...) > * NumPy array editor > * Dictionnary editor > * and many more widgets ("find in files", "text import wizard", ...) > > For those interested by these powerful widgets, note that: > * spyderlib v2.0 and v2.1 are compatible with PyQt >= v4.4 (API #1) > * spyderlib v2.2 is compatible with both PyQt >= v4.6 (API #2) and PySide > (already available through the main source repository) > (Spyder IDE itself -even v2.2- is not fully compatible with PySide -- only > the "light" version is currently working) > > Cheers, > Pierre > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Mon Jun 13 03:46:06 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 13 Jun 2011 09:46:06 +0200 Subject: [SciPy-User] scipy.stats one-sided two-sided less, greater, signed ? In-Reply-To: References: <4DED1DC5.8090503@gmail.com> Message-ID: On Mon, Jun 13, 2011 at 3:50 AM, Bruce Southey wrote: > On Sun, Jun 12, 2011 at 7:52 PM, wrote: > > On Sun, Jun 12, 2011 at 8:30 PM, Bruce Southey > wrote: > >> On Sun, Jun 12, 2011 at 8:56 AM, wrote: > >>> On Sun, Jun 12, 2011 at 9:36 AM, Bruce Southey > wrote: > >>>> On Sun, Jun 12, 2011 at 5:20 AM, Ralf Gommers > >>>> wrote: > >>>>> > >>>>> > >>>>> On Wed, Jun 8, 2011 at 12:56 PM, wrote: > >>>>>> > >>>>>> On Tue, Jun 7, 2011 at 10:37 PM, Bruce Southey > wrote: > >>>>>> > On Tue, Jun 7, 2011 at 4:40 PM, Ralf Gommers > >>>>>> > wrote: > >>>>>> >> > >>>>>> >> > >>>>>> >> On Mon, Jun 6, 2011 at 9:34 PM, wrote: > >>>>>> >>> > >>>>>> >>> On Mon, Jun 6, 2011 at 2:34 PM, Bruce Southey < > bsouthey at gmail.com> > >>>>>> >>> wrote: > >>>>>> >>> > On 06/05/2011 02:43 PM, josef.pktd at gmail.com wrote: > >>>>>> >>> >> What should be the policy on one-sided versus two-sided? > >>>>>> >>> > Yes :-) > >>>>>> >>> > > >>>>>> >>> >> The main reason right now for looking at this is > >>>>>> >>> >> http://projects.scipy.org/scipy/ticket/1394 which specifies > a > >>>>>> >>> >> "one-sided" alternative and provides both lower and upper > tail. > >>>>>> >>> > That refers to the Fisher's test rather than the more > 'traditional' > >>>>>> >>> > one-sided tests. Each value of the Fisher's test has special > >>>>>> >>> > meanings > >>>>>> >>> > about the value or probability of the 'first cell' under the > null > >>>>>> >>> > hypothesis. So it is necessary to provide those three values. 
> >>>>>> >>> > > >>>>>> >>> >> I would prefer that we follow the alternative patterns > similar to R > >>>>>> >>> >> > >>>>>> >>> >> currently only kstest has alternative : 'two_sided' > (default), > >>>>>> >>> >> 'less' or 'greater' > >>>>>> >>> >> but this should be added to other tests where it makes sense > >>>>>> >>> > I think that these Kolmogorov-Smirnov tests are not the > traditional > >>>>>> >>> > meaning either. It is a little mind-boggling to try to think > about > >>>>>> >>> > cdfs! > >>>>>> >>> > > >>>>>> >>> >> R fisher.exact > >>>>>> >>> >> """alternative indicates the alternative hypothesis > and must > >>>>>> >>> >> be > >>>>>> >>> >> one > >>>>>> >>> >> of "two.sided", "greater" or "less". You can specify just the > >>>>>> >>> >> initial > >>>>>> >>> >> letter. Only used in the 2 by 2 case.""" > >>>>>> >>> >> > >>>>>> >>> >> mannwhitneyu reports a one-sided test without actually > specifying > >>>>>> >>> >> which alternative is used (I thought I remembered other > cases like > >>>>>> >>> >> this but don't find any right now) > >>>>>> >>> >> > >>>>>> >>> >> related: > >>>>>> >>> >> in many cases in the two-sided tests the test statistic has a > sign > >>>>>> >>> >> that indicates in which tail the test-statistic falls. > >>>>>> >>> >> This is useful in ttests for example, because the one-sided > tests > >>>>>> >>> >> can > >>>>>> >>> >> be backed out from the two-sided tests. (With symmetric > >>>>>> >>> >> distributions > >>>>>> >>> >> one-sided p-value is just half of the two-sided pvalue) > >>>>>> >>> >> > >>>>>> >>> >> In the discussion of https://github.com/scipy/scipy/pull/8 I > >>>>>> >>> >> argued > >>>>>> >>> >> that this might mislead users to interpret a two-sided result > as a > >>>>>> >>> >> one-sided result. However, I doubt now that this is a strong > >>>>>> >>> >> argument > >>>>>> >>> >> against not reporting the signed test statistic. > >>>>>> >>> > (I do not follow pull requests so is there a relevant ticket?) > >>>>>> >>> > > >>>>>> >>> >> After going through scipy.stats.stats, it looks like we > always > >>>>>> >>> >> report > >>>>>> >>> >> the signed test statistic. > >>>>>> >>> >> > >>>>>> >>> >> The test statistic in ks_2samp is in all cases defined as a > max > >>>>>> >>> >> value > >>>>>> >>> >> and doesn't have a sign in R either, so adding a sign there > would > >>>>>> >>> >> break with the standard definition. > >>>>>> >>> >> one-sided option for ks_2samp would just require to find the > >>>>>> >>> >> distribution of the test statistics D+, D- > >>>>>> >>> >> > >>>>>> >>> >> --- > >>>>>> >>> >> > >>>>>> >>> >> So my proposal for the general pattern (with exceptions for > special > >>>>>> >>> >> reasons) would be > >>>>>> >>> >> > >>>>>> >>> >> * add/offer alternative : 'two_sided' (default), 'less' or > >>>>>> >>> >> 'greater' > >>>>>> >>> >> http://projects.scipy.org/scipy/ticket/1394 for now, > >>>>>> >>> >> and adjustments of existing tests in the future (adding the > option > >>>>>> >>> >> can > >>>>>> >>> >> be mostly done in a backwards compatible way and for > symmetric > >>>>>> >>> >> distributions like ttest it's just a convenience) > >>>>>> >>> >> mannwhitneyu seems to be the only "weird" one > >>>>>> >> > >>>>>> >> This would actually make the fisher_exact implementation more > >>>>>> >> consistent, > >>>>>> >> since only one p-value is returned in all cases. I just don't > like the > >>>>>> >> R > >>>>>> >> naming much; alternative="greater" does not convey to me that > this is a > >>>>>> >> one-sided test using the upper tail. 
How about: > >>>>>> >> test : {"two-tailed", "lower-tail", "upper-tail"} > >>>>>> >> with two-tailed the default? > >>>>>> > >>>>>> I think matlab uses (in general) larger and smaller, the advantage > of > >>>>>> less/smaller and greater/larger is that it directly refers to the > >>>>>> alternative hypothesis, while the meaning in terms of tails is not > >>>>>> always clear (in kstest and I guess some others the test statistics > is > >>>>>> just reversed and uses the same tail in both cases) > >>>>>> > >>>>>> so greater smaller is mostly "future proof" across tests, while > >>>>>> reference to the tail can only be used where this is an unambiguous > >>>>>> statement. but see below > >>>>>> > >>>>> I think I understand your terminology a bit better now, and > consistency > >>>>> across all tests is important. So I've updated the Fisher's exact > patch to > >>>>> use alternative={'two-sided', 'less', greater'} and sent a pull > request: > >>>>> https://github.com/scipy/scipy/pull/32 > >>>>> > >>>>> Cheers, > >>>>> Ralf > >>>>> > >>>>>> > >>>>>> > >>>>>> >> > >>>>>> >> Ralf > >>>>>> >> > >>>>>> >> > >>>>>> >>> > >>>>>> >>> >> > >>>>>> >>> >> * report signed test statistic for two-sided alternative > (when a > >>>>>> >>> >> signed test statistic exists): which is the status quo in > >>>>>> >>> >> stats.stats, but I didn't know that this is actually pretty > >>>>>> >>> >> consistent > >>>>>> >>> >> across tests. > >>>>>> >>> >> > >>>>>> >>> >> Opinions ? > >>>>>> >>> >> > >>>>>> >>> >> Josef > >>>>>> >>> >> _______________________________________________ > >>>>>> >>> >> SciPy-User mailing list > >>>>>> >>> >> SciPy-User at scipy.org > >>>>>> >>> >> http://mail.scipy.org/mailman/listinfo/scipy-user > >>>>>> >>> > I think that there is some valid misunderstanding here (as I > was in > >>>>>> >>> > the > >>>>>> >>> > same situation) regarding what is meant here. My understanding > is > >>>>>> >>> > that > >>>>>> >>> > under a one-sided hypothesis, all the values of the null > hypothesis > >>>>>> >>> > only > >>>>>> >>> > exist in one tail of the test distribution. In contrast the > values > >>>>>> >>> > of > >>>>>> >>> > null distribution exist in both tails with a two-sided > hypothesis. > >>>>>> >>> > Yet > >>>>>> >>> > that interpretation does not have the same meaning as the > tails in > >>>>>> >>> > the > >>>>>> >>> > Fisher or Kolmogorov-Smirnov tests. > >>>>>> >>> > >>>>>> >>> The tests have a clear Null Hypothesis (equality) and > Alternative > >>>>>> >>> Hypothesis (not equal or directional, less or greater). > >>>>>> >>> So the "alternative" should be clearly specified in the function > >>>>>> >>> argument, as in R. > >>>>>> >>> > >>>>>> >>> Whether this corresponds to left and right tails of the > distribution > >>>>>> >>> is an "implementation detail" which holds for ttests but not for > >>>>>> >>> kstest/ks_2samp. > >>>>>> >>> > >>>>>> >>> kstest/ks2sample H0: cdf1 == cdf2 and H1: cdf1 != cdf2 or > H1: > >>>>>> >>> cdf1 < cdf2 or H1: cdf1 > cdf2 > >>>>>> >>> (looks similar to comparing two survival curves in Kaplan-Meier > ?) 
> >>>>>> >>> > >>>>>> >>> fisher_exact (2 by 2) H0: odds-ratio == 1 and H1: odds-ratio != > 1 or > >>>>>> >>> H1: odds-ratio < 1 or H1: odds-ratio > 1 > >>>>>> >>> > >>>>>> >>> I know the kolmogorov-smirnov tests, but for fisher exact and > >>>>>> >>> contingency tables I rely on R > >>>>>> >>> > >>>>>> >>> from R-help: > >>>>>> >>> For 2 by 2 tables, the null of conditional independence is > equivalent > >>>>>> >>> to the hypothesis that the odds ratio equals one. <...> The > >>>>>> >>> alternative for a one-sided test is based on the odds ratio, so > >>>>>> >>> alternative = "greater" is a test of the odds ratio being bigger > than > >>>>>> >>> or. > >>>>>> >>> Two-sided tests are based on the probabilities of the tables, > and take > >>>>>> >>> as ?more extreme? all tables with probabilities less than or > equal to > >>>>>> >>> that of the observed table, the p-value being the sum of such > >>>>>> >>> probabilities. > >>>>>> >>> > >>>>>> >>> Josef > >>>>>> >>> > >>>>>> >>> > >>>>>> >>> > > >>>>>> >>> > I never paid much attention to the frequency based tests but > it does > >>>>>> >>> > not > >>>>>> >>> > surprise if there are no one-sided tests. Most are rank-based > so it > >>>>>> >>> > is > >>>>>> >>> > rather hard to do in a simply manner - actually I am not even > sure > >>>>>> >>> > how > >>>>>> >>> > to use a permutation test. > >>>>>> >>> > > >>>>>> >>> > Bruce > >>>>>> >>> > > >>>>>> >>> > > >>>>>> >>> > > >>>>>> >>> > _______________________________________________ > >>>>>> >>> > SciPy-User mailing list > >>>>>> >>> > SciPy-User at scipy.org > >>>>>> >>> > http://mail.scipy.org/mailman/listinfo/scipy-user > >>>>>> >>> > > >>>>>> >>> _______________________________________________ > >>>>>> >>> SciPy-User mailing list > >>>>>> >>> SciPy-User at scipy.org > >>>>>> >>> http://mail.scipy.org/mailman/listinfo/scipy-user > >>>>>> >> > >>>>>> >> > >>>>>> >> _______________________________________________ > >>>>>> >> SciPy-User mailing list > >>>>>> >> SciPy-User at scipy.org > >>>>>> >> http://mail.scipy.org/mailman/listinfo/scipy-user > >>>>>> >> > >>>>>> >> > >>>>>> > > >>>>>> > But that is NOT the correct interpretation here! > >>>>>> > I tried to explain to you that this is the not the usual idea > >>>>>> > one-sided vs two-sided tests. > >>>>>> > For example: > >>>>>> > http://www.msu.edu/~fuw/teaching/Fu_ch10_2_categorical.ppt > >>>>>> > "The test holds the marginal totals fixed and computes the > >>>>>> > hypergeometric probability that n11 is at least as large as the > >>>>>> > observed value" > >>>>>> > >>>>>> this still sounds like a less/greater test to me > >>>>>> > >>>>>> > >>>>>> > "The output consists of three p-values: > >>>>>> > Left: Use this when the alternative to independence is that there > is > >>>>>> > negative association between the variables. That is, the > observations > >>>>>> > tend to lie in lower left and upper right. > >>>>>> > Right: Use this when the alternative to independence is that there > is > >>>>>> > positive association between the variables. That is, the > observations > >>>>>> > tend to lie in upper left and lower right. > >>>>>> > 2-Tail: Use this when there is no prior alternative. > >>>>>> > " > >>>>>> > There is also the book "Categorical data analysis: using the SAS > >>>>>> > system By Maura E. Stokes, Charles S. Davis, Gary G. Koch" that > came > >>>>>> > up via Google that also refers to the n11 cell. 
> >>>>>> > > >>>>>> > http://www.langsrud.com/fisher.htm > >>>>>> > >>>>>> I was trying to read the Agresti paper referenced there but it has > too > >>>>>> much detail to get through in 15 minutes :) > >>>>>> > >>>>>> > "The output consists of three p-values: > >>>>>> > > >>>>>> > Left: Use this when the alternative to independence is that > there > >>>>>> > is negative association between the variables. > >>>>>> > That is, the observations tend to lie in lower left and upper > right. > >>>>>> > Right: Use this when the alternative to independence is that > there > >>>>>> > is positive association between the variables. > >>>>>> > That is, the observations tend to lie in upper left and lower > right. > >>>>>> > 2-Tail: Use this when there is no prior alternative. > >>>>>> > > >>>>>> > NOTE: Decide to use Left, Right or 2-Tail before collecting (or > >>>>>> > looking at) the data." > >>>>>> > > >>>>>> > But you will get a different p-value if you switch rows and > columns > >>>>>> > because of the dependence on the n11 cell. If you do that then the > >>>>>> > p-values switch between left and right sides as these now refer to > >>>>>> > different hypotheses regarding that first cell. > >>>>>> > >>>>>> switching row and columns doesn't change the p-value in R > >>>>>> reversing columns changes the definition of less and greater, > reverses > >>>>>> them > >>>>>> > >>>>>> The problem with 2 by 2 contingency tables with given marginals, > i.e. > >>>>>> row and column totals, is that we only have one free entry. Any test > >>>>>> on one entry, e.g. element 0,0, pins down all the other ones and > >>>>>> (many) tests then become equivalent. > >>>>>> > >>>>>> > >>>>>> > http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_freq_a0000000658.htm > >>>>>> some math got lost > >>>>>> """ > >>>>>> For <2 by 2> tables, one-sided -values for Fisher?s exact test are > >>>>>> defined in terms of the frequency of the cell in the first row and > >>>>>> first column of the table, the (1,1) cell. Denoting the observed > (1,1) > >>>>>> cell frequency by , the left-sided -value for Fisher?s exact test is > >>>>>> the probability that the (1,1) cell frequency is less than or equal > to > >>>>>> . For the left-sided -value, the set includes those tables with a > >>>>>> (1,1) cell frequency less than or equal to . A small left-sided > -value > >>>>>> supports the alternative hypothesis that the probability of an > >>>>>> observation being in the first cell is actually less than expected > >>>>>> under the null hypothesis of independent row and column variables. > >>>>>> > >>>>>> Similarly, for a right-sided alternative hypothesis, is the set of > >>>>>> tables where the frequency of the (1,1) cell is greater than or > equal > >>>>>> to that in the observed table. A small right-sided -value supports > the > >>>>>> alternative that the probability of the first cell is actually > greater > >>>>>> than that expected under the null hypothesis. > >>>>>> > >>>>>> Because the (1,1) cell frequency completely determines the table > when > >>>>>> the marginal row and column sums are fixed, these one-sided > >>>>>> alternatives can be stated equivalently in terms of other cell > >>>>>> probabilities or ratios of cell probabilities. The left-sided > >>>>>> alternative is equivalent to an odds ratio less than 1, where the > odds > >>>>>> ratio equals (). 
Additionally, the left-sided alternative is > >>>>>> equivalent to the column 1 risk for row 1 being less than the column > 1 > >>>>>> risk for row 2, . Similarly, the right-sided alternative is > equivalent > >>>>>> to the column 1 risk for row 1 being greater than the column 1 risk > >>>>>> for row 2, . See Agresti (2007) for details. > >>>>>> R C Tables > >>>>>> """ > >>>>>> > >>>>>> I'm not a user of Fisher's exact test (and I have a hard time > keeping > >>>>>> the different statements straight), so if left/right or lower/upper > >>>>>> makes more sense to users, then I don't complain. > >>>>>> > >>>>>> To me they are all just independence tests with possible one-sided > >>>>>> alternatives that one distribution dominates the other. (with the > same > >>>>>> pattern as ks_2samp or ttest_2samp) > >>>>>> > >>>>>> Josef > >>>>>> > >>>>>> > > >>>>>> > > >>>>>> > Bruce > >>>>>> > _______________________________________________ > >>>>>> > SciPy-User mailing list > >>>>>> > SciPy-User at scipy.org > >>>>>> > http://mail.scipy.org/mailman/listinfo/scipy-user > >>>>>> > > >>>>>> _______________________________________________ > >>>>>> SciPy-User mailing list > >>>>>> SciPy-User at scipy.org > >>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user > >>>>> > >>>>> > >>>>> _______________________________________________ > >>>>> SciPy-User mailing list > >>>>> SciPy-User at scipy.org > >>>>> http://mail.scipy.org/mailman/listinfo/scipy-user > >>>>> > >>>>> > >>>> This is just wrong and plain ignorant! Please read the references and > >>>> stats books about what the tails actually mean! > >>>> > >>>> You really need all three tests because these have different meanings > >>>> that you do not know in advance which you need. > >>> > >>> Sorry, but I'm perfectly happy to follow R and SAS in this. > >>> > >>> Josef > >>> > >>>> > >>>> Bruce > >>>> _______________________________________________ > >>>> SciPy-User mailing list > >>>> SciPy-User at scipy.org > >>>> http://mail.scipy.org/mailman/listinfo/scipy-user > >>>> > >>> _______________________________________________ > >>> SciPy-User mailing list > >>> SciPy-User at scipy.org > >>> http://mail.scipy.org/mailman/listinfo/scipy-user > >>> > >> So am I which is NOT what is happening here! > > > > Why do you think that? > Because all the stuff given above including SAS which YOU provided > includes all three tests. 
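A quick numerical sketch of the SAS definitions quoted above: with all of the margins held fixed, the (1,1) cell follows a hypergeometric distribution, so the left-, right- and two-sided p-values can be computed directly from it. The 190/800/200/900 table and the reference values 0.7416 / 0.2960 / 0.5741 are the ones posted further down in this thread; the code is only an illustration, not scipy's implementation.

import numpy as np
from scipy.stats import hypergeom

# Exposure x Response table used in this thread
n11, n12, n21, n22 = 190, 800, 200, 900
total = n11 + n12 + n21 + n22      # grand total
row1 = n11 + n12                   # first row margin
col1 = n11 + n21                   # first column margin

# Conditional on the margins, the (1,1) cell is hypergeometric.
dist = hypergeom(total, row1, col1)

p_left = dist.cdf(n11)             # Pr(N11 <= observed), "left-sided"
p_right = dist.sf(n11 - 1)         # Pr(N11 >= observed), "right-sided"

# Two-sided: sum the probabilities of all tables that are no more
# likely than the observed one (the definition quoted from R's help).
support = np.arange(max(0, col1 - (n21 + n22)), min(col1, row1) + 1)
probs = dist.pmf(support)
p_two = probs[probs <= dist.pmf(n11) * (1 + 1e-7)].sum()  # small tolerance for float ties

print("left %.4f  right %.4f  two-sided %.4f" % (p_left, p_right, p_two))
# should print approximately: left 0.7416  right 0.2960  two-sided 0.5741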
> > > I quoted all the relevant descriptions from the R and SAS help, and I > > checked the following and similar for the cases that are in the > > changeset for the tests: > > > >> fisher.test(t(matrix(c(190,800,200,900),nrow=2)),alternative='g') > > > > Fisher's Exact Test for Count Data > > > > data: t(matrix(c(190, 800, 200, 900), nrow = 2)) > > p-value = 0.296 > > alternative hypothesis: true odds ratio is greater than 1 > > 95 percent confidence interval: > > 0.8828407 Inf > > sample estimates: > > odds ratio > > 1.068698 > > > >> fisher.test(t(matrix(c(190,800,200,900),nrow=2)),alternative='l') > > > > Fisher's Exact Test for Count Data > > > > data: t(matrix(c(190, 800, 200, 900), nrow = 2)) > > p-value = 0.7416 > > alternative hypothesis: true odds ratio is less than 1 > > 95 percent confidence interval: > > 0.000000 1.293552 > > sample estimates: > > odds ratio > > 1.068698 > > > >> fisher.test(t(matrix(c(190,800,200,900),nrow=2)),alternative='t') > > > > Fisher's Exact Test for Count Data > > > > data: t(matrix(c(190, 800, 200, 900), nrow = 2)) > > p-value = 0.5741 > > alternative hypothesis: true odds ratio is not equal to 1 > > 95 percent confidence interval: > > 0.8520463 1.3401490 > > sample estimates: > > odds ratio > > 1.068698 > > > > All the p-values agree for the alternatives two-sided, less, and > > greater, the odds ratio is defined differently as explained pretty > > well in the docstring. > > > > Josef > > > > > >> > >> Bruce > >> _______________________________________________ > >> SciPy-User mailing list > >> SciPy-User at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-user > >> > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > Yes, but you said to follow BOTH R and SAS - that means providing all > three: > > The FREQ Procedure > > Table of Exposure by Response > > Exposure Response > > Frequency| 0| 1| Total > ---------+--------+--------+ > 0 | 190 | 800 | 990 > ---------+--------+--------+ > 1 | 200 | 900 | 1100 > ---------+--------+--------+ > Total 390 1700 2090 > > > Statistics for Table of Exposure by Response > > Statistic DF Value Prob > ------------------------------------------------------ > Chi-Square 1 0.3503 0.5540 > Likelihood Ratio Chi-Square 1 0.3500 0.5541 > Continuity Adj. Chi-Square 1 0.2869 0.5922 > Mantel-Haenszel Chi-Square 1 0.3501 0.5541 > Phi Coefficient 0.0129 > Contingency Coefficient 0.0129 > Cramer's V 0.0129 > > > Pearson Chi-Square Test > ---------------------------------- > Chi-Square 0.3503 > DF 1 > Asymptotic Pr > ChiSq 0.5540 > Exact Pr >= ChiSq 0.5741 > > > Fisher's Exact Test > ---------------------------------- > Cell (1,1) Frequency (F) 190 > Left-sided Pr <= F 0.7416 > Right-sided Pr >= F 0.2960 > > Table Probability (P) 0.0376 > Two-sided Pr <= P 0.5741 > > Sample Size = 2090 > > Thus providing all three is the correct answer. > > Eh, we do. The interface is the same as that of R, and all three of {two-sided, less, greater} are extensively checked against R. It looks like you are reacting to only one statement Josef made to explain his interpretation of less/greater. Please check the actual commit and then comment if you see anything wrong. Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
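The same check from the scipy side, assuming the `alternative` keyword from the pull request under discussion (scipy/scipy pull 32); the expected p-values are the ones in the R session quoted above, and the returned odds ratio is the plain cross-product (unconditional) estimate rather than R's conditional MLE.

import numpy as np
from scipy import stats

# the table passed to fisher.test as t(matrix(c(190,800,200,900), nrow=2))
table = np.array([[190, 800],
                  [200, 900]])

for alt in ('two-sided', 'less', 'greater'):
    oddsratio, pvalue = stats.fisher_exact(table, alternative=alt)
    print("%-9s  odds ratio %.6f  p-value %.4f" % (alt, oddsratio, pvalue))

# Expected, per the R output above: p ~ 0.5741, 0.7416 and 0.2960.
# The odds ratio here is 190*900 / (800*200) = 1.06875, versus the
# conditional estimate 1.068698 that R reports.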
URL: From ralf.gommers at googlemail.com Mon Jun 13 03:49:13 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 13 Jun 2011 09:49:13 +0200 Subject: [SciPy-User] A problem with scikits.cuda In-Reply-To: <13e1e613-14f5-4b82-85fc-ec4c2a98e400@v10g2000yqn.googlegroups.com> References: <13e1e613-14f5-4b82-85fc-ec4c2a98e400@v10g2000yqn.googlegroups.com> Message-ID: On Sun, Jun 12, 2011 at 12:45 PM, afrit.mariem at gmail.com < afrit.mariem at gmail.com> wrote: > Hi, > > I installed scikits on my laptop and when I import scikits.cuda.cublas > on ipython I have the next error: > > --------------------------------------------------------------------------- > AttributeError Traceback (most recent call > last) > > /home/integer/ in () > > /usr/local/lib/python2.6/dist-packages/scikits.cuda-0.041-py2.6.egg/ > scikits/cuda/cublas.py in () > 153 > 154 # Single precision real BLAS1 functions: > > --> 155 _libcublas.cublasIsamax.restype = ctypes.c_int > 156 _libcublas.cublasIsamax.argtypes = [ctypes.c_int, > 157 ctypes.c_void_p, > > /usr/lib/python2.6/ctypes/__init__.pyc in __getattr__(self, name) > 364 if name.startswith('__') and name.endswith('__'): > 365 raise AttributeError(name) > --> 366 func = self.__getitem__(name) > 367 setattr(self, name, func) > 368 return func > > /usr/lib/python2.6/ctypes/__init__.pyc in __getitem__(self, > name_or_ordinal) > 369 > 370 def __getitem__(self, name_or_ordinal): > --> 371 func = self._FuncPtr((name_or_ordinal, self)) > 372 if not isinstance(name_or_ordinal, (int, long)): > 373 func.__name__ = name_or_ordinal > > AttributeError: /usr/local/cuda/lib64/libcublas.so: undefined symbol: > cublasIsamax > > I couldn't find how to fix that. any idea? > > > No idea, but have you tried emailing the author? His email can be found on http://pypi.python.org/pypi/scikits.cuda Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From cohen at lpta.in2p3.fr Mon Jun 13 05:50:59 2011 From: cohen at lpta.in2p3.fr (Johann Cohen-Tanugi) Date: Mon, 13 Jun 2011 11:50:59 +0200 Subject: [SciPy-User] git command in github doc and wiki Message-ID: <4DF5DD83.9080906@lpta.in2p3.fr> hi there, in http://ipython.org/ipython-doc/dev/install/install.html I read : git clone https://github.com/ipython/ipython.git while in http://ipython.scipy.org/moin/ I read git clone git://github.com/ipython/ipython.git The 1st command in github fails for me : -bash-3.2$ git clone https://github.com/ipython/ipython.git Initialized empty Git repository in /a/wain006/g.glast.u54/cohen/IPYDEV/ipython/.git/ error: git-remote-curl died of signal 11 while the second works : -bash-3.2$ git clone git://github.com/ipython/ipython.git Initialized empty Git repository in /a/wain006/g.glast.u54/cohen/IPYDEV/ipython/.git/ remote: Counting objects: 28624, done. remote: Compressing objects: 100% (7131/7131), done. remote: Total 28624 (delta 22168), reused 27613 (delta 21371) Receiving objects: 100% (28624/28624), 11.89 MiB | 5.35 MiB/s, done. Resolving deltas: 100% (22168/22168), done. Checking out files: 100% (674/674), done. I know, not a big deal :) best, Johann -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Mon Jun 13 08:58:26 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 13 Jun 2011 14:58:26 +0200 Subject: [SciPy-User] ANN: Numpy 1.6.1 release candidate 1 Message-ID: Hi, I am pleased to announce the availability of the first release candidate of NumPy 1.6.1. 
This is a bugfix release, list of fixed bugs: #1834 einsum fails for specific shapes #1837 einsum throws nan or freezes python for specific array shapes #1838 object <-> structured type arrays regression #1851 regression for SWIG based code in 1.6.0 #1863 Buggy results when operating on array copied with astype() If no problems are reported, the final release will be in one week. Sources and binaries can be found at https://sourceforge.net/projects/numpy/files/NumPy/1.6.1rc1/ Enjoy, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From lev at columbia.edu Mon Jun 13 09:14:03 2011 From: lev at columbia.edu (Lev Givon) Date: Mon, 13 Jun 2011 09:14:03 -0400 Subject: [SciPy-User] A problem with scikits.cuda In-Reply-To: <13e1e613-14f5-4b82-85fc-ec4c2a98e400@v10g2000yqn.googlegroups.com> References: <13e1e613-14f5-4b82-85fc-ec4c2a98e400@v10g2000yqn.googlegroups.com> Message-ID: <20110613131403.GE3294@avicenna.ee.columbia.edu> Received from afrit.mariem at gmail.com on Sun, Jun 12, 2011 at 06:45:11AM EDT: > Hi, > > I installed scikits on my laptop and when I import scikits.cuda.cublas > on ipython I have the next error: > > --------------------------------------------------------------------------- > AttributeError Traceback (most recent call > last) > > /home/integer/ in () > > /usr/local/lib/python2.6/dist-packages/scikits.cuda-0.041-py2.6.egg/ > scikits/cuda/cublas.py in () > 153 > 154 # Single precision real BLAS1 functions: > > --> 155 _libcublas.cublasIsamax.restype = ctypes.c_int > 156 _libcublas.cublasIsamax.argtypes = [ctypes.c_int, > 157 ctypes.c_void_p, > > /usr/lib/python2.6/ctypes/__init__.pyc in __getattr__(self, name) > 364 if name.startswith('__') and name.endswith('__'): > 365 raise AttributeError(name) > --> 366 func = self.__getitem__(name) > 367 setattr(self, name, func) > 368 return func > > /usr/lib/python2.6/ctypes/__init__.pyc in __getitem__(self, > name_or_ordinal) > 369 > 370 def __getitem__(self, name_or_ordinal): > --> 371 func = self._FuncPtr((name_or_ordinal, self)) > 372 if not isinstance(name_or_ordinal, (int, long)): > 373 func.__name__ = name_or_ordinal > > AttributeError: /usr/local/cuda/lib64/libcublas.so: undefined symbol: > cublasIsamax > > I couldn't find how to fix that. any idea? > > thks. Can you please submit this as a bug report at http://github.com/lebedov/scikits.cuda/ ? Offhand, I'm guessing that you are using a MacBook and have encountered the following issue: https://github.com/lebedov/scikits.cuda/issues/13 Thanks, L.G. From wbrevis at gmail.com Mon Jun 13 12:06:41 2011 From: wbrevis at gmail.com (Wernher Brevis) Date: Mon, 13 Jun 2011 17:06:41 +0100 Subject: [SciPy-User] fread/fwrite and fromfile Message-ID: <1307981201.12588.8.camel@wbrevis-linux-II> Hello, I recently updated scipy and numpy. I tried to run a script that contains the following lines: from scipy.io.numpyio import fwrite, fread import numpy as np fid = open(filename, 'r') nx = fread(fid,1,'d') ny = fread(fid,1,'d') x = fread(fid, nx*ny, 'd') y = fread(fid, nx*ny, 'd') u = fread(fid, nx*ny, 'd') v = fread(fid, nx*ny, 'd') What is the best way to rewrite these using the io tools available in numpy, e.g. fromfile? Thank you in advance, Wernher From bsouthey at gmail.com Mon Jun 13 12:18:17 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 13 Jun 2011 11:18:17 -0500 Subject: [SciPy-User] scipy.stats one-sided two-sided less, greater, signed ? 
In-Reply-To: References: <4DED1DC5.8090503@gmail.com> Message-ID: <4DF63849.3060902@gmail.com> On 06/13/2011 02:46 AM, Ralf Gommers wrote: > > > On Mon, Jun 13, 2011 at 3:50 AM, Bruce Southey > wrote: > > On Sun, Jun 12, 2011 at 7:52 PM, > wrote: > > On Sun, Jun 12, 2011 at 8:30 PM, Bruce Southey > > wrote: > >> On Sun, Jun 12, 2011 at 8:56 AM, > wrote: > >>> On Sun, Jun 12, 2011 at 9:36 AM, Bruce Southey > > wrote: > >>>> On Sun, Jun 12, 2011 at 5:20 AM, Ralf Gommers > >>>> > wrote: > >>>>> > >>>>> > >>>>> On Wed, Jun 8, 2011 at 12:56 PM, > wrote: > >>>>>> > >>>>>> On Tue, Jun 7, 2011 at 10:37 PM, Bruce Southey > > wrote: > >>>>>> > On Tue, Jun 7, 2011 at 4:40 PM, Ralf Gommers > >>>>>> > > wrote: > >>>>>> >> > >>>>>> >> > >>>>>> >> On Mon, Jun 6, 2011 at 9:34 PM, > wrote: > >>>>>> >>> > >>>>>> >>> On Mon, Jun 6, 2011 at 2:34 PM, Bruce Southey > > > >>>>>> >>> wrote: > >>>>>> >>> > On 06/05/2011 02:43 PM, josef.pktd at gmail.com > wrote: > >>>>>> >>> >> What should be the policy on one-sided versus two-sided? > >>>>>> >>> > Yes :-) > >>>>>> >>> > > >>>>>> >>> >> The main reason right now for looking at this is > >>>>>> >>> >> http://projects.scipy.org/scipy/ticket/1394 which > specifies a > >>>>>> >>> >> "one-sided" alternative and provides both lower and > upper tail. > >>>>>> >>> > That refers to the Fisher's test rather than the more > 'traditional' > >>>>>> >>> > one-sided tests. Each value of the Fisher's test has > special > >>>>>> >>> > meanings > >>>>>> >>> > about the value or probability of the 'first cell' > under the null > >>>>>> >>> > hypothesis. So it is necessary to provide those > three values. > >>>>>> >>> > > >>>>>> >>> >> I would prefer that we follow the alternative > patterns similar to R > >>>>>> >>> >> > >>>>>> >>> >> currently only kstest has alternative : > 'two_sided' (default), > >>>>>> >>> >> 'less' or 'greater' > >>>>>> >>> >> but this should be added to other tests where it > makes sense > >>>>>> >>> > I think that these Kolmogorov-Smirnov tests are not > the traditional > >>>>>> >>> > meaning either. It is a little mind-boggling to try > to think about > >>>>>> >>> > cdfs! > >>>>>> >>> > > >>>>>> >>> >> R fisher.exact > >>>>>> >>> >> """alternative indicates the alternative > hypothesis and must > >>>>>> >>> >> be > >>>>>> >>> >> one > >>>>>> >>> >> of "two.sided", "greater" or "less". You can specify > just the > >>>>>> >>> >> initial > >>>>>> >>> >> letter. Only used in the 2 by 2 case.""" > >>>>>> >>> >> > >>>>>> >>> >> mannwhitneyu reports a one-sided test without > actually specifying > >>>>>> >>> >> which alternative is used (I thought I remembered > other cases like > >>>>>> >>> >> this but don't find any right now) > >>>>>> >>> >> > >>>>>> >>> >> related: > >>>>>> >>> >> in many cases in the two-sided tests the test > statistic has a sign > >>>>>> >>> >> that indicates in which tail the test-statistic falls. > >>>>>> >>> >> This is useful in ttests for example, because the > one-sided tests > >>>>>> >>> >> can > >>>>>> >>> >> be backed out from the two-sided tests. (With symmetric > >>>>>> >>> >> distributions > >>>>>> >>> >> one-sided p-value is just half of the two-sided pvalue) > >>>>>> >>> >> > >>>>>> >>> >> In the discussion of > https://github.com/scipy/scipy/pull/8 I > >>>>>> >>> >> argued > >>>>>> >>> >> that this might mislead users to interpret a > two-sided result as a > >>>>>> >>> >> one-sided result. However, I doubt now that this is > a strong > >>>>>> >>> >> argument > >>>>>> >>> >> against not reporting the signed test statistic. 
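The "back the one-sided test out of the signed two-sided result" point above, as a small sketch. This is just the usual halving rule for a symmetric null distribution, not a scipy option:

import numpy as np
from scipy import stats

rng = np.random.RandomState(0)
a = rng.normal(0.0, 1.0, 50)
b = rng.normal(0.5, 1.0, 50)

t, p_two = stats.ttest_ind(a, b)     # signed t, two-sided p-value

# One-sided alternatives recovered from the sign of t:
#   'less'    : mean(a) < mean(b), small p when t is very negative
#   'greater' : mean(a) > mean(b), small p when t is very positive
if t < 0:
    p_less, p_greater = p_two / 2.0, 1.0 - p_two / 2.0
else:
    p_less, p_greater = 1.0 - p_two / 2.0, p_two / 2.0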
> >>>>>> >>> > (I do not follow pull requests so is there a relevant > ticket?) > >>>>>> >>> > > >>>>>> >>> >> After going through scipy.stats.stats, it looks like > we always > >>>>>> >>> >> report > >>>>>> >>> >> the signed test statistic. > >>>>>> >>> >> > >>>>>> >>> >> The test statistic in ks_2samp is in all cases > defined as a max > >>>>>> >>> >> value > >>>>>> >>> >> and doesn't have a sign in R either, so adding a > sign there would > >>>>>> >>> >> break with the standard definition. > >>>>>> >>> >> one-sided option for ks_2samp would just require to > find the > >>>>>> >>> >> distribution of the test statistics D+, D- > >>>>>> >>> >> > >>>>>> >>> >> --- > >>>>>> >>> >> > >>>>>> >>> >> So my proposal for the general pattern (with > exceptions for special > >>>>>> >>> >> reasons) would be > >>>>>> >>> >> > >>>>>> >>> >> * add/offer alternative : 'two_sided' (default), > 'less' or > >>>>>> >>> >> 'greater' > >>>>>> >>> >> http://projects.scipy.org/scipy/ticket/1394 for now, > >>>>>> >>> >> and adjustments of existing tests in the future > (adding the option > >>>>>> >>> >> can > >>>>>> >>> >> be mostly done in a backwards compatible way and for > symmetric > >>>>>> >>> >> distributions like ttest it's just a convenience) > >>>>>> >>> >> mannwhitneyu seems to be the only "weird" one > >>>>>> >> > >>>>>> >> This would actually make the fisher_exact implementation > more > >>>>>> >> consistent, > >>>>>> >> since only one p-value is returned in all cases. I just > don't like the > >>>>>> >> R > >>>>>> >> naming much; alternative="greater" does not convey to me > that this is a > >>>>>> >> one-sided test using the upper tail. How about: > >>>>>> >> test : {"two-tailed", "lower-tail", "upper-tail"} > >>>>>> >> with two-tailed the default? > >>>>>> > >>>>>> I think matlab uses (in general) larger and smaller, the > advantage of > >>>>>> less/smaller and greater/larger is that it directly refers > to the > >>>>>> alternative hypothesis, while the meaning in terms of tails > is not > >>>>>> always clear (in kstest and I guess some others the test > statistics is > >>>>>> just reversed and uses the same tail in both cases) > >>>>>> > >>>>>> so greater smaller is mostly "future proof" across tests, while > >>>>>> reference to the tail can only be used where this is an > unambiguous > >>>>>> statement. but see below > >>>>>> > >>>>> I think I understand your terminology a bit better now, and > consistency > >>>>> across all tests is important. So I've updated the Fisher's > exact patch to > >>>>> use alternative={'two-sided', 'less', greater'} and sent a > pull request: > >>>>> https://github.com/scipy/scipy/pull/32 > >>>>> > >>>>> Cheers, > >>>>> Ralf > >>>>> > >>>>>> > >>>>>> > >>>>>> >> > >>>>>> >> Ralf > >>>>>> >> > >>>>>> >> > >>>>>> >>> > >>>>>> >>> >> > >>>>>> >>> >> * report signed test statistic for two-sided > alternative (when a > >>>>>> >>> >> signed test statistic exists): which is the status > quo in > >>>>>> >>> >> stats.stats, but I didn't know that this is actually > pretty > >>>>>> >>> >> consistent > >>>>>> >>> >> across tests. > >>>>>> >>> >> > >>>>>> >>> >> Opinions ? > >>>>>> >>> >> > >>>>>> >>> >> Josef > >>>>>> >>> >> _______________________________________________ > >>>>>> >>> >> SciPy-User mailing list > >>>>>> >>> >> SciPy-User at scipy.org > >>>>>> >>> >> http://mail.scipy.org/mailman/listinfo/scipy-user > >>>>>> >>> > I think that there is some valid misunderstanding > here (as I was in > >>>>>> >>> > the > >>>>>> >>> > same situation) regarding what is meant here. 
My > understanding is > >>>>>> >>> > that > >>>>>> >>> > under a one-sided hypothesis, all the values of the > null hypothesis > >>>>>> >>> > only > >>>>>> >>> > exist in one tail of the test distribution. In > contrast the values > >>>>>> >>> > of > >>>>>> >>> > null distribution exist in both tails with a > two-sided hypothesis. > >>>>>> >>> > Yet > >>>>>> >>> > that interpretation does not have the same meaning as > the tails in > >>>>>> >>> > the > >>>>>> >>> > Fisher or Kolmogorov-Smirnov tests. > >>>>>> >>> > >>>>>> >>> The tests have a clear Null Hypothesis (equality) and > Alternative > >>>>>> >>> Hypothesis (not equal or directional, less or greater). > >>>>>> >>> So the "alternative" should be clearly specified in the > function > >>>>>> >>> argument, as in R. > >>>>>> >>> > >>>>>> >>> Whether this corresponds to left and right tails of the > distribution > >>>>>> >>> is an "implementation detail" which holds for ttests > but not for > >>>>>> >>> kstest/ks_2samp. > >>>>>> >>> > >>>>>> >>> kstest/ks2sample H0: cdf1 == cdf2 and H1: cdf1 != > cdf2 or H1: > >>>>>> >>> cdf1 < cdf2 or H1: cdf1 > cdf2 > >>>>>> >>> (looks similar to comparing two survival curves in > Kaplan-Meier ?) > >>>>>> >>> > >>>>>> >>> fisher_exact (2 by 2) H0: odds-ratio == 1 and H1: > odds-ratio != 1 or > >>>>>> >>> H1: odds-ratio < 1 or H1: odds-ratio > 1 > >>>>>> >>> > >>>>>> >>> I know the kolmogorov-smirnov tests, but for fisher > exact and > >>>>>> >>> contingency tables I rely on R > >>>>>> >>> > >>>>>> >>> from R-help: > >>>>>> >>> For 2 by 2 tables, the null of conditional independence > is equivalent > >>>>>> >>> to the hypothesis that the odds ratio equals one. <...> The > >>>>>> >>> alternative for a one-sided test is based on the odds > ratio, so > >>>>>> >>> alternative = "greater" is a test of the odds ratio > being bigger than > >>>>>> >>> or. > >>>>>> >>> Two-sided tests are based on the probabilities of the > tables, and take > >>>>>> >>> as ?more extreme? all tables with probabilities less > than or equal to > >>>>>> >>> that of the observed table, the p-value being the sum > of such > >>>>>> >>> probabilities. > >>>>>> >>> > >>>>>> >>> Josef > >>>>>> >>> > >>>>>> >>> > >>>>>> >>> > > >>>>>> >>> > I never paid much attention to the frequency based > tests but it does > >>>>>> >>> > not > >>>>>> >>> > surprise if there are no one-sided tests. Most are > rank-based so it > >>>>>> >>> > is > >>>>>> >>> > rather hard to do in a simply manner - actually I am > not even sure > >>>>>> >>> > how > >>>>>> >>> > to use a permutation test. > >>>>>> >>> > > >>>>>> >>> > Bruce > >>>>>> >>> > > >>>>>> >>> > > >>>>>> >>> > > >>>>>> >>> > _______________________________________________ > >>>>>> >>> > SciPy-User mailing list > >>>>>> >>> > SciPy-User at scipy.org > >>>>>> >>> > http://mail.scipy.org/mailman/listinfo/scipy-user > >>>>>> >>> > > >>>>>> >>> _______________________________________________ > >>>>>> >>> SciPy-User mailing list > >>>>>> >>> SciPy-User at scipy.org > >>>>>> >>> http://mail.scipy.org/mailman/listinfo/scipy-user > >>>>>> >> > >>>>>> >> > >>>>>> >> _______________________________________________ > >>>>>> >> SciPy-User mailing list > >>>>>> >> SciPy-User at scipy.org > >>>>>> >> http://mail.scipy.org/mailman/listinfo/scipy-user > >>>>>> >> > >>>>>> >> > >>>>>> > > >>>>>> > But that is NOT the correct interpretation here! > >>>>>> > I tried to explain to you that this is the not the usual idea > >>>>>> > one-sided vs two-sided tests. 
> >>>>>> > For example: > >>>>>> > > http://www.msu.edu/~fuw/teaching/Fu_ch10_2_categorical.ppt > > >>>>>> > "The test holds the marginal totals fixed and computes the > >>>>>> > hypergeometric probability that n11 is at least as large > as the > >>>>>> > observed value" > >>>>>> > >>>>>> this still sounds like a less/greater test to me > >>>>>> > >>>>>> > >>>>>> > "The output consists of three p-values: > >>>>>> > Left: Use this when the alternative to independence is > that there is > >>>>>> > negative association between the variables. That is, the > observations > >>>>>> > tend to lie in lower left and upper right. > >>>>>> > Right: Use this when the alternative to independence is > that there is > >>>>>> > positive association between the variables. That is, the > observations > >>>>>> > tend to lie in upper left and lower right. > >>>>>> > 2-Tail: Use this when there is no prior alternative. > >>>>>> > " > >>>>>> > There is also the book "Categorical data analysis: using > the SAS > >>>>>> > system By Maura E. Stokes, Charles S. Davis, Gary G. > Koch" that came > >>>>>> > up via Google that also refers to the n11 cell. > >>>>>> > > >>>>>> > http://www.langsrud.com/fisher.htm > >>>>>> > >>>>>> I was trying to read the Agresti paper referenced there but > it has too > >>>>>> much detail to get through in 15 minutes :) > >>>>>> > >>>>>> > "The output consists of three p-values: > >>>>>> > > >>>>>> > Left: Use this when the alternative to independence is > that there > >>>>>> > is negative association between the variables. > >>>>>> > That is, the observations tend to lie in lower left > and upper right. > >>>>>> > Right: Use this when the alternative to independence > is that there > >>>>>> > is positive association between the variables. > >>>>>> > That is, the observations tend to lie in upper left > and lower right. > >>>>>> > 2-Tail: Use this when there is no prior alternative. > >>>>>> > > >>>>>> > NOTE: Decide to use Left, Right or 2-Tail before > collecting (or > >>>>>> > looking at) the data." > >>>>>> > > >>>>>> > But you will get a different p-value if you switch rows > and columns > >>>>>> > because of the dependence on the n11 cell. If you do that > then the > >>>>>> > p-values switch between left and right sides as these now > refer to > >>>>>> > different hypotheses regarding that first cell. > >>>>>> > >>>>>> switching row and columns doesn't change the p-value in R > >>>>>> reversing columns changes the definition of less and > greater, reverses > >>>>>> them > >>>>>> > >>>>>> The problem with 2 by 2 contingency tables with given > marginals, i.e. > >>>>>> row and column totals, is that we only have one free entry. > Any test > >>>>>> on one entry, e.g. element 0,0, pins down all the other > ones and > >>>>>> (many) tests then become equivalent. > >>>>>> > >>>>>> > >>>>>> > http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_freq_a0000000658.htm > >>>>>> some math got lost > >>>>>> """ > >>>>>> For <2 by 2> tables, one-sided -values for Fisher?s exact > test are > >>>>>> defined in terms of the frequency of the cell in the first > row and > >>>>>> first column of the table, the (1,1) cell. Denoting the > observed (1,1) > >>>>>> cell frequency by , the left-sided -value for Fisher?s > exact test is > >>>>>> the probability that the (1,1) cell frequency is less than > or equal to > >>>>>> . For the left-sided -value, the set includes those tables > with a > >>>>>> (1,1) cell frequency less than or equal to . 
A small > left-sided -value > >>>>>> supports the alternative hypothesis that the probability of an > >>>>>> observation being in the first cell is actually less than > expected > >>>>>> under the null hypothesis of independent row and column > variables. > >>>>>> > >>>>>> Similarly, for a right-sided alternative hypothesis, is the > set of > >>>>>> tables where the frequency of the (1,1) cell is greater > than or equal > >>>>>> to that in the observed table. A small right-sided -value > supports the > >>>>>> alternative that the probability of the first cell is > actually greater > >>>>>> than that expected under the null hypothesis. > >>>>>> > >>>>>> Because the (1,1) cell frequency completely determines the > table when > >>>>>> the marginal row and column sums are fixed, these one-sided > >>>>>> alternatives can be stated equivalently in terms of other cell > >>>>>> probabilities or ratios of cell probabilities. The left-sided > >>>>>> alternative is equivalent to an odds ratio less than 1, > where the odds > >>>>>> ratio equals (). Additionally, the left-sided alternative is > >>>>>> equivalent to the column 1 risk for row 1 being less than > the column 1 > >>>>>> risk for row 2, . Similarly, the right-sided alternative is > equivalent > >>>>>> to the column 1 risk for row 1 being greater than the > column 1 risk > >>>>>> for row 2, . See Agresti (2007) for details. > >>>>>> R C Tables > >>>>>> """ > >>>>>> > >>>>>> I'm not a user of Fisher's exact test (and I have a hard > time keeping > >>>>>> the different statements straight), so if left/right or > lower/upper > >>>>>> makes more sense to users, then I don't complain. > >>>>>> > >>>>>> To me they are all just independence tests with possible > one-sided > >>>>>> alternatives that one distribution dominates the other. > (with the same > >>>>>> pattern as ks_2samp or ttest_2samp) > >>>>>> > >>>>>> Josef > >>>>>> > >>>>>> > > >>>>>> > > >>>>>> > Bruce > >>>>>> > _______________________________________________ > >>>>>> > SciPy-User mailing list > >>>>>> > SciPy-User at scipy.org > >>>>>> > http://mail.scipy.org/mailman/listinfo/scipy-user > >>>>>> > > >>>>>> _______________________________________________ > >>>>>> SciPy-User mailing list > >>>>>> SciPy-User at scipy.org > >>>>>> http://mail.scipy.org/mailman/listinfo/scipy-user > >>>>> > >>>>> > >>>>> _______________________________________________ > >>>>> SciPy-User mailing list > >>>>> SciPy-User at scipy.org > >>>>> http://mail.scipy.org/mailman/listinfo/scipy-user > >>>>> > >>>>> > >>>> This is just wrong and plain ignorant! Please read the > references and > >>>> stats books about what the tails actually mean! > >>>> > >>>> You really need all three tests because these have different > meanings > >>>> that you do not know in advance which you need. > >>> > >>> Sorry, but I'm perfectly happy to follow R and SAS in this. > >>> > >>> Josef > >>> > >>>> > >>>> Bruce > >>>> _______________________________________________ > >>>> SciPy-User mailing list > >>>> SciPy-User at scipy.org > >>>> http://mail.scipy.org/mailman/listinfo/scipy-user > >>>> > >>> _______________________________________________ > >>> SciPy-User mailing list > >>> SciPy-User at scipy.org > >>> http://mail.scipy.org/mailman/listinfo/scipy-user > >>> > >> So am I which is NOT what is happening here! > > > > Why do you think that? > Because all the stuff given above including SAS which YOU provided > includes all three tests. 
> > > I quoted all the relevant descriptions from the R and SAS help, > and I > > checked the following and similar for the cases that are in the > > changeset for the tests: > > > >> fisher.test(t(matrix(c(190,800,200,900),nrow=2)),alternative='g') > > > > Fisher's Exact Test for Count Data > > > > data: t(matrix(c(190, 800, 200, 900), nrow = 2)) > > p-value = 0.296 > > alternative hypothesis: true odds ratio is greater than 1 > > 95 percent confidence interval: > > 0.8828407 Inf > > sample estimates: > > odds ratio > > 1.068698 > > > >> fisher.test(t(matrix(c(190,800,200,900),nrow=2)),alternative='l') > > > > Fisher's Exact Test for Count Data > > > > data: t(matrix(c(190, 800, 200, 900), nrow = 2)) > > p-value = 0.7416 > > alternative hypothesis: true odds ratio is less than 1 > > 95 percent confidence interval: > > 0.000000 1.293552 > > sample estimates: > > odds ratio > > 1.068698 > > > >> fisher.test(t(matrix(c(190,800,200,900),nrow=2)),alternative='t') > > > > Fisher's Exact Test for Count Data > > > > data: t(matrix(c(190, 800, 200, 900), nrow = 2)) > > p-value = 0.5741 > > alternative hypothesis: true odds ratio is not equal to 1 > > 95 percent confidence interval: > > 0.8520463 1.3401490 > > sample estimates: > > odds ratio > > 1.068698 > > > > All the p-values agree for the alternatives two-sided, less, and > > greater, the odds ratio is defined differently as explained pretty > > well in the docstring. > > > > Josef > > > > > >> > >> Bruce > >> _______________________________________________ > >> SciPy-User mailing list > >> SciPy-User at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-user > >> > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > Yes, but you said to follow BOTH R and SAS - that means providing > all three: > > The FREQ Procedure > > Table of Exposure by Response > > Exposure Response > > Frequency| 0| 1| Total > ---------+--------+--------+ > 0 | 190 | 800 | 990 > ---------+--------+--------+ > 1 | 200 | 900 | 1100 > ---------+--------+--------+ > Total 390 1700 2090 > > > Statistics for Table of Exposure by Response > > Statistic DF Value Prob > ------------------------------------------------------ > Chi-Square 1 0.3503 0.5540 > Likelihood Ratio Chi-Square 1 0.3500 0.5541 > Continuity Adj. Chi-Square 1 0.2869 0.5922 > Mantel-Haenszel Chi-Square 1 0.3501 0.5541 > Phi Coefficient 0.0129 > Contingency Coefficient 0.0129 > Cramer's V 0.0129 > > > Pearson Chi-Square Test > ---------------------------------- > Chi-Square 0.3503 > DF 1 > Asymptotic Pr > ChiSq 0.5540 > Exact Pr >= ChiSq 0.5741 > > > Fisher's Exact Test > ---------------------------------- > Cell (1,1) Frequency (F) 190 > Left-sided Pr <= F 0.7416 > Right-sided Pr >= F 0.2960 > > Table Probability (P) 0.0376 > Two-sided Pr <= P 0.5741 > > Sample Size = 2090 > > Thus providing all three is the correct answer. > > Eh, we do. The interface is the same as that of R, and all three of > {two-sided, less, greater} are extensively checked against R. It looks > like you are reacting to only one statement Josef made to explain his > interpretation of less/greater. Please check the actual commit and > then comment if you see anything wrong. 
> > Ralf > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user I have looked at it (again) and the comments still stand: A user should not have to read a statistical book and then the code to figure out what was actually implemented here. So I do strongly object to Josef's statements as you just can not interpret Fisher's test in that way. Just look at how SAS presents the results as should give a huge clue that the two-sided tests is different than the other one-sided tests. Bruce -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Mon Jun 13 12:35:16 2011 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 13 Jun 2011 16:35:16 +0000 (UTC) Subject: [SciPy-User] fread/fwrite and fromfile References: <1307981201.12588.8.camel@wbrevis-linux-II> Message-ID: On Mon, 13 Jun 2011 17:06:41 +0100, Wernher Brevis wrote: [clip] > v = fread(fid, nx*ny, 'd') > > What is the best way to rewrite these using the io tools available in > numpy, e.g. fromfile? fread(fid, size, 'd') -> fromfile(fid, 'd', size) From ralf.gommers at googlemail.com Mon Jun 13 12:36:31 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 13 Jun 2011 18:36:31 +0200 Subject: [SciPy-User] scipy.stats one-sided two-sided less, greater, signed ? In-Reply-To: <4DF63849.3060902@gmail.com> References: <4DED1DC5.8090503@gmail.com> <4DF63849.3060902@gmail.com> Message-ID: On Mon, Jun 13, 2011 at 6:18 PM, Bruce Southey wrote: > On 06/13/2011 02:46 AM, Ralf Gommers wrote: > > On Mon, Jun 13, 2011 at 3:50 AM, Bruce Southey wrote: > >> On Sun, Jun 12, 2011 at 7:52 PM, wrote: >> > >> > All the p-values agree for the alternatives two-sided, less, and >> > greater, the odds ratio is defined differently as explained pretty >> > well in the docstring. >> > >> > Josef >> Yes, but you said to follow BOTH R and SAS - that means providing all >> three: >> >> The FREQ Procedure >> >> Table of Exposure by Response >> >> Exposure Response >> >> Frequency| 0| 1| Total >> ---------+--------+--------+ >> 0 | 190 | 800 | 990 >> ---------+--------+--------+ >> 1 | 200 | 900 | 1100 >> ---------+--------+--------+ >> Total 390 1700 2090 >> >> >> Statistics for Table of Exposure by Response >> >> Statistic DF Value Prob >> ------------------------------------------------------ >> Chi-Square 1 0.3503 0.5540 >> Likelihood Ratio Chi-Square 1 0.3500 0.5541 >> Continuity Adj. Chi-Square 1 0.2869 0.5922 >> Mantel-Haenszel Chi-Square 1 0.3501 0.5541 >> Phi Coefficient 0.0129 >> Contingency Coefficient 0.0129 >> Cramer's V 0.0129 >> >> >> Pearson Chi-Square Test >> ---------------------------------- >> Chi-Square 0.3503 >> DF 1 >> Asymptotic Pr > ChiSq 0.5540 >> Exact Pr >= ChiSq 0.5741 >> >> >> Fisher's Exact Test >> ---------------------------------- >> Cell (1,1) Frequency (F) 190 >> Left-sided Pr <= F 0.7416 >> Right-sided Pr >= F 0.2960 >> >> Table Probability (P) 0.0376 >> Two-sided Pr <= P 0.5741 >> >> Sample Size = 2090 >> >> Thus providing all three is the correct answer. >> >> Eh, we do. The interface is the same as that of R, and all three of > {two-sided, less, greater} are extensively checked against R. It looks like > you are reacting to only one statement Josef made to explain his > interpretation of less/greater. Please check the actual commit and then > comment if you see anything wrong. 
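As an aside on the fread/fromfile question answered above: applying Pauli's mapping to the original loader gives something like the following, assuming the file really stores nx and ny as doubles (as the old script read them); filename is whatever it was in the original post.

import numpy as np

with open(filename, 'rb') as fid:   # 'rb': read the raw doubles in binary mode
    nx = int(np.fromfile(fid, dtype='d', count=1)[0])
    ny = int(np.fromfile(fid, dtype='d', count=1)[0])
    x = np.fromfile(fid, dtype='d', count=nx * ny)
    y = np.fromfile(fid, dtype='d', count=nx * ny)
    u = np.fromfile(fid, dtype='d', count=nx * ny)
    v = np.fromfile(fid, dtype='d', count=nx * ny)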
> > Ralf > > > _______________________________________________ > SciPy-User mailing listSciPy-User at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user > > I have looked at it (again) and the comments still stand: > A user should not have to read a statistical book and then the code to > figure out what was actually implemented here. So I do strongly object to > Josef's statements as you just can not interpret Fisher's test in that way. > Just look at how SAS presents the results as should give a huge clue that > the two-sided tests is different than the other one-sided tests. > Okay, I am pasting the entire docstring below. You seem to know a lot about this, so can you please suggest wording for things to be added/changed? I have compared with the R doc ( http://rss.acs.unt.edu/Rdoc/library/stats/html/fisher.test.html), and that's not much different as far as I can tell. Thanks a lot, Ralf Performs a Fisher exact test on a 2x2 contingency table. Parameters ---------- table : array_like of ints A 2x2 contingency table. Elements should be non-negative integers. alternative : {'two-sided', 'less', 'greater'}, optional Which alternative hypothesis to the null hypothesis the test uses. Default is 'two-sided'. Returns ------- oddsratio : float This is prior odds ratio and not a posterior estimate. p_value : float P-value, the probability of obtaining a distribution at least as extreme as the one that was actually observed, assuming that the null hypothesis is true. See Also -------- chisquare : inexact alternative that can be used when sample sizes are large enough. Notes ----- The calculated odds ratio is different from the one R uses. In R language, this implementation returns the (more common) "unconditional Maximum Likelihood Estimate", while R uses the "conditional Maximum Likelihood Estimate". For tables with large numbers the (inexact) `chisquare` test can also be used. Examples -------- Say we spend a few days counting whales and sharks in the Atlantic and Indian oceans. In the Atlantic ocean we find 6 whales and 1 shark, in the Indian ocean 2 whales and 5 sharks. Then our contingency table is:: Atlantic Indian whales 8 2 sharks 1 5 We use this table to find the p-value: >>> oddsratio, pvalue = stats.fisher_exact([[8, 2], [1, 5]]) >>> pvalue 0.0349... The probability that we would observe this or an even more imbalanced ratio by chance is about 3.5%. A commonly used significance level is 5%, if we adopt that we can therefore conclude that our observed imbalance is statistically significant; whales prefer the Atlantic while sharks prefer the Indian ocean. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsouthey at gmail.com Mon Jun 13 14:56:24 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 13 Jun 2011 13:56:24 -0500 Subject: [SciPy-User] scipy.stats one-sided two-sided less, greater, signed ? In-Reply-To: References: <4DED1DC5.8090503@gmail.com> <4DF63849.3060902@gmail.com> Message-ID: On Mon, Jun 13, 2011 at 11:36 AM, Ralf Gommers wrote: > > > On Mon, Jun 13, 2011 at 6:18 PM, Bruce Southey wrote: >> >> On 06/13/2011 02:46 AM, Ralf Gommers wrote: >> >> On Mon, Jun 13, 2011 at 3:50 AM, Bruce Southey wrote: >>> >>> On Sun, Jun 12, 2011 at 7:52 PM, ? wrote: >>> > >>> > All the p-values agree for the alternatives two-sided, less, and >>> > greater, the odds ratio is defined differently as explained pretty >>> > well in the docstring. 
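For the whale/shark table in the docstring just quoted, the `alternative` keyword would give the following; the two-sided value is the 0.0349... shown in the docstring, while the one-sided numbers are only a rough hand hypergeometric check, not something posted in this thread.

from scipy import stats

table = [[8, 2], [1, 5]]   # whales/sharks by Atlantic/Indian

odds, p_two = stats.fisher_exact(table)                              # p ~ 0.0349
odds, p_greater = stats.fisher_exact(table, alternative='greater')   # p ~ 0.024
odds, p_less = stats.fisher_exact(table, alternative='less')         # p ~ 0.999

# odds is the unconditional estimate (8*5)/(2*1) = 20 in all three calls.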
>>> > >>> > Josef >>> Yes, but you said to follow BOTH R and SAS - that means providing all >>> three: >>> >>> The FREQ Procedure >>> >>> Table of Exposure by Response >>> >>> Exposure ? ? Response >>> >>> Frequency| ? ? ? 0| ? ? ? 1| ?Total >>> ---------+--------+--------+ >>> ? ? ? 0 | ? ?190 | ? ?800 | ? ?990 >>> ---------+--------+--------+ >>> ? ? ? 1 | ? ?200 | ? ?900 | ? 1100 >>> ---------+--------+--------+ >>> Total ? ? ? ? 390 ? ? 1700 ? ? 2090 >>> >>> >>> Statistics for Table of Exposure by Response >>> >>> Statistic ? ? ? ? ? ? ? ? ? ? DF ? ? ? Value ? ? ?Prob >>> ------------------------------------------------------ >>> Chi-Square ? ? ? ? ? ? ? ? ? ? 1 ? ? ?0.3503 ? ?0.5540 >>> Likelihood Ratio Chi-Square ? ?1 ? ? ?0.3500 ? ?0.5541 >>> Continuity Adj. Chi-Square ? ? 1 ? ? ?0.2869 ? ?0.5922 >>> Mantel-Haenszel Chi-Square ? ? 1 ? ? ?0.3501 ? ?0.5541 >>> Phi Coefficient ? ? ? ? ? ? ? ? ? ? ? 0.0129 >>> Contingency Coefficient ? ? ? ? ? ? ? 0.0129 >>> Cramer's V ? ? ? ? ? ? ? ? ? ? ? ? ? ?0.0129 >>> >>> >>> ? ? Pearson Chi-Square Test >>> ---------------------------------- >>> Chi-Square ? ? ? ? ? ? ? ? ?0.3503 >>> DF ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 1 >>> Asymptotic Pr > ?ChiSq ? ? ?0.5540 >>> Exact ? ? ?Pr >= ChiSq ? ? ?0.5741 >>> >>> >>> ? ? ? Fisher's Exact Test >>> ---------------------------------- >>> Cell (1,1) Frequency (F) ? ? ? 190 >>> Left-sided Pr <= F ? ? ? ? ?0.7416 >>> Right-sided Pr >= F ? ? ? ? 0.2960 >>> >>> Table Probability (P) ? ? ? 0.0376 >>> Two-sided Pr <= P ? ? ? ? ? 0.5741 >>> >>> Sample Size = 2090 >>> >>> Thus providing all three is the correct answer. >>> >> Eh, we do. The interface is the same as that of R, and all three of >> {two-sided, less, greater} are extensively checked against R. It looks like >> you are reacting to only one statement Josef made to explain his >> interpretation of less/greater. Please check the actual commit and then >> comment if you see anything wrong. >> >> Ralf >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> I have looked at it (again) and the comments still stand: >> A user should not have to read a statistical book and then the code to >> figure out what was actually implemented here.? So I do strongly object to >> Josef's statements as you just can not interpret Fisher's test in that way. >> Just look at how SAS presents the results as should give a huge clue that >> the two-sided tests is different than the other one-sided tests. > > Okay, I am pasting the entire docstring below. You seem to know a lot about > this, so can you please suggest wording for things to be added/changed? > > I have compared with the R doc > (http://rss.acs.unt.edu/Rdoc/library/stats/html/fisher.test.html), and > that's not much different as far as I can tell. > > Thanks a lot, > Ralf You are assuming a lot by saying that I even agree with R documentation :-) If you noticed, I never referred to it because it is not correct compared SAS and other sources given. > > > ??? Performs a Fisher exact test on a 2x2 contingency table. > > ??? Parameters > ??? ---------- > ??? table : array_like of ints > ??????? A 2x2 contingency table.? Elements should be non-negative integers. > ??? alternative : {'two-sided', 'less', 'greater'}, optional > ??????? Which alternative hypothesis to the null hypothesis the test uses. > ??????? Default is 'two-sided'. > > ??? Returns > ??? ------- > ??? oddsratio : float > ??????? 
This is prior odds ratio and not a posterior estimate. > ??? p_value : float > ??????? P-value, the probability of obtaining a distribution at least as > ??????? extreme as the one that was actually observed, assuming that the > ??????? null hypothesis is true. > > ??? See Also > ??? -------- > ??? chisquare : inexact alternative that can be used when sample sizes are > ??????????????? large enough. > > ??? Notes > ??? ----- > ??? The calculated odds ratio is different from the one R uses. In R > language, > ??? this implementation returns the (more common) "unconditional Maximum > ??? Likelihood Estimate", while R uses the "conditional Maximum Likelihood > ??? Estimate". > > ??? For tables with large numbers the (inexact) `chisquare` test can also be > ??? used. > > ??? Examples > ??? -------- > ??? Say we spend a few days counting whales and sharks in the Atlantic and > ??? Indian oceans. In the Atlantic ocean we find 6 whales and 1 shark, in > the > ??? Indian ocean 2 whales and 5 sharks. Then our contingency table is:: > > ??????????????? Atlantic? Indian > ??????? whales???? 8??????? 2 > ??????? sharks???? 1??????? 5 > > ??? We use this table to find the p-value: > > ??? >>> oddsratio, pvalue = stats.fisher_exact([[8, 2], [1, 5]]) > ??? >>> pvalue > ??? 0.0349... > > ??? The probability that we would observe this or an even more imbalanced > ratio > ??? by chance is about 3.5%.? A commonly used significance level is 5%, if > we > ??? adopt that we can therefore conclude that our observed imbalance is > ??? statistically significant; whales prefer the Atlantic while sharks > prefer > ??? the Indian ocean. > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > So did two of the six whales give birth? That docstring is incomplete and probably does not meet the Scipy documentation guidelines because not everything is explained. It is not a small amount of effort to clean this up to be technically correct - 0.0349 is not 'about 3.5%'. Bruce From ralf.gommers at googlemail.com Mon Jun 13 15:19:29 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 13 Jun 2011 21:19:29 +0200 Subject: [SciPy-User] scipy.stats one-sided two-sided less, greater, signed ? In-Reply-To: References: <4DED1DC5.8090503@gmail.com> <4DF63849.3060902@gmail.com> Message-ID: On Mon, Jun 13, 2011 at 8:56 PM, Bruce Southey wrote: > On Mon, Jun 13, 2011 at 11:36 AM, Ralf Gommers > wrote: > > > > > > On Mon, Jun 13, 2011 at 6:18 PM, Bruce Southey > wrote: > >> > >> On 06/13/2011 02:46 AM, Ralf Gommers wrote: > >> > >> On Mon, Jun 13, 2011 at 3:50 AM, Bruce Southey > wrote: > >>> > >>> On Sun, Jun 12, 2011 at 7:52 PM, wrote: > >>> > > >>> > All the p-values agree for the alternatives two-sided, less, and > >>> > greater, the odds ratio is defined differently as explained pretty > >>> > well in the docstring. 
> >>> > > >>> > Josef > >>> Yes, but you said to follow BOTH R and SAS - that means providing all > >>> three: > >>> > >>> The FREQ Procedure > >>> > >>> Table of Exposure by Response > >>> > >>> Exposure Response > >>> > >>> Frequency| 0| 1| Total > >>> ---------+--------+--------+ > >>> 0 | 190 | 800 | 990 > >>> ---------+--------+--------+ > >>> 1 | 200 | 900 | 1100 > >>> ---------+--------+--------+ > >>> Total 390 1700 2090 > >>> > >>> > >>> Statistics for Table of Exposure by Response > >>> > >>> Statistic DF Value Prob > >>> ------------------------------------------------------ > >>> Chi-Square 1 0.3503 0.5540 > >>> Likelihood Ratio Chi-Square 1 0.3500 0.5541 > >>> Continuity Adj. Chi-Square 1 0.2869 0.5922 > >>> Mantel-Haenszel Chi-Square 1 0.3501 0.5541 > >>> Phi Coefficient 0.0129 > >>> Contingency Coefficient 0.0129 > >>> Cramer's V 0.0129 > >>> > >>> > >>> Pearson Chi-Square Test > >>> ---------------------------------- > >>> Chi-Square 0.3503 > >>> DF 1 > >>> Asymptotic Pr > ChiSq 0.5540 > >>> Exact Pr >= ChiSq 0.5741 > >>> > >>> > >>> Fisher's Exact Test > >>> ---------------------------------- > >>> Cell (1,1) Frequency (F) 190 > >>> Left-sided Pr <= F 0.7416 > >>> Right-sided Pr >= F 0.2960 > >>> > >>> Table Probability (P) 0.0376 > >>> Two-sided Pr <= P 0.5741 > >>> > >>> Sample Size = 2090 > >>> > >>> Thus providing all three is the correct answer. > >>> > >> Eh, we do. The interface is the same as that of R, and all three of > >> {two-sided, less, greater} are extensively checked against R. It looks > like > >> you are reacting to only one statement Josef made to explain his > >> interpretation of less/greater. Please check the actual commit and then > >> comment if you see anything wrong. > >> > >> Ralf > >> > >> > >> _______________________________________________ > >> SciPy-User mailing list > >> SciPy-User at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-user > >> > >> I have looked at it (again) and the comments still stand: > >> A user should not have to read a statistical book and then the code to > >> figure out what was actually implemented here. So I do strongly object > to > >> Josef's statements as you just can not interpret Fisher's test in that > way. > >> Just look at how SAS presents the results as should give a huge clue > that > >> the two-sided tests is different than the other one-sided tests. > > > > Okay, I am pasting the entire docstring below. You seem to know a lot > about > > this, so can you please suggest wording for things to be added/changed? > > > > I have compared with the R doc > > (http://rss.acs.unt.edu/Rdoc/library/stats/html/fisher.test.html), and > > that's not much different as far as I can tell. > > > > Thanks a lot, > > Ralf > > You are assuming a lot by saying that I even agree with R documentation > :-) > Didn't assume that. > If you noticed, I never referred to it because it is not correct > compared SAS and other sources given. > > > > > > > > Performs a Fisher exact test on a 2x2 contingency table. > > > > Parameters > > ---------- > > table : array_like of ints > > A 2x2 contingency table. Elements should be non-negative > integers. > > alternative : {'two-sided', 'less', 'greater'}, optional > > Which alternative hypothesis to the null hypothesis the test > uses. > > Default is 'two-sided'. > > > > Returns > > ------- > > oddsratio : float > > This is prior odds ratio and not a posterior estimate. 
> > p_value : float > > P-value, the probability of obtaining a distribution at least as > > extreme as the one that was actually observed, assuming that the > > null hypothesis is true. > > > > See Also > > -------- > > chisquare : inexact alternative that can be used when sample sizes > are > > large enough. > > > > Notes > > ----- > > The calculated odds ratio is different from the one R uses. In R > > language, > > this implementation returns the (more common) "unconditional Maximum > > Likelihood Estimate", while R uses the "conditional Maximum > Likelihood > > Estimate". > > > > For tables with large numbers the (inexact) `chisquare` test can also > be > > used. > > > > Examples > > -------- > > Say we spend a few days counting whales and sharks in the Atlantic > and > > Indian oceans. In the Atlantic ocean we find 6 whales and 1 shark, in > > the > > Indian ocean 2 whales and 5 sharks. Then our contingency table is:: > > > > Atlantic Indian > > whales 8 2 > > sharks 1 5 > > > > We use this table to find the p-value: > > > > >>> oddsratio, pvalue = stats.fisher_exact([[8, 2], [1, 5]]) > > >>> pvalue > > 0.0349... > > > > The probability that we would observe this or an even more imbalanced > > ratio > > by chance is about 3.5%. A commonly used significance level is 5%, > if > > we > > adopt that we can therefore conclude that our observed imbalance is > > statistically significant; whales prefer the Atlantic while sharks > > prefer > > the Indian ocean. > > > > > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > > So did two of the six whales give birth? > That docstring is incomplete and probably does not meet the Scipy > documentation guidelines because not everything is explained. Yes, which ones do? It's a lot better than it was, and more complete than your average scipy docstring. Same for the tests. So I'm just going to be satisfied with the bug fix and added functionality. It is not a small amount of effort to clean this up to be technically > correct - 0.0349 is not 'about 3.5%'. > Note the ellipsis? It's also not exactly 0.0349. So I fail to see the problem. There are bigger fish to fry. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From johann.cohentanugi at gmail.com Mon Jun 13 05:23:43 2011 From: johann.cohentanugi at gmail.com (Johann Cohen-Tanugi) Date: Mon, 13 Jun 2011 11:23:43 +0200 Subject: [SciPy-User] git command in github doc and wiki Message-ID: <4DF5D71F.1000107@gmail.com> hi there, in http://ipython.org/ipython-doc/dev/install/install.html I read : git clone https://github.com/ipython/ipython.git while in http://ipython.scipy.org/moin/ I read git clone git://github.com/ipython/ipython.git The 1st command in github fails for me : -bash-3.2$ git clone https://github.com/ipython/ipython.git Initialized empty Git repository in /a/wain006/g.glast.u54/cohen/IPYDEV/ipython/.git/ error: git-remote-curl died of signal 11 while the second works : -bash-3.2$ git clone git://github.com/ipython/ipython.git Initialized empty Git repository in /a/wain006/g.glast.u54/cohen/IPYDEV/ipython/.git/ remote: Counting objects: 28624, done. remote: Compressing objects: 100% (7131/7131), done. remote: Total 28624 (delta 22168), reused 27613 (delta 21371) Receiving objects: 100% (28624/28624), 11.89 MiB | 5.35 MiB/s, done. Resolving deltas: 100% (22168/22168), done. Checking out files: 100% (674/674), done. 
I know, not a big deal :) best, Johann From w.brevis at sheffield.ac.uk Mon Jun 13 12:04:40 2011 From: w.brevis at sheffield.ac.uk (Wernher Brevis) Date: Mon, 13 Jun 2011 17:04:40 +0100 Subject: [SciPy-User] fread/fwrite and fromfile Message-ID: <1307981080.12588.7.camel@wbrevis-linux-II> Hello, I recently updated scipy and numpy. I tried to run a script that contains the following line: from scipy.io.numpyio import fwrite, fread import numpy as np fid = open(filename, 'r') nx = fread(fid,1,'d') ny = fread(fid,1,'d') x = fread(fid, nx*ny, 'd') y = fread(fid, nx*ny, 'd') u = fread(fid, nx*ny, 'd') v = fread(fid, nx*ny, 'd') What is the best way to rewrite these using the io tools available in numpy, e.g. fromfile? Thank you in advance, Wernher From bsouthey at gmail.com Mon Jun 13 16:38:12 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 13 Jun 2011 15:38:12 -0500 Subject: [SciPy-User] scipy.stats one-sided two-sided less, greater, signed ? In-Reply-To: References: <4DED1DC5.8090503@gmail.com> <4DF63849.3060902@gmail.com> Message-ID: On Mon, Jun 13, 2011 at 2:19 PM, Ralf Gommers wrote: > > > On Mon, Jun 13, 2011 at 8:56 PM, Bruce Southey wrote: >> >> On Mon, Jun 13, 2011 at 11:36 AM, Ralf Gommers >> wrote: >> > >> > >> > On Mon, Jun 13, 2011 at 6:18 PM, Bruce Southey >> > wrote: >> >> >> >> On 06/13/2011 02:46 AM, Ralf Gommers wrote: >> >> >> >> On Mon, Jun 13, 2011 at 3:50 AM, Bruce Southey >> >> wrote: >> >>> >> >>> On Sun, Jun 12, 2011 at 7:52 PM, ? wrote: >> >>> > >> >>> > All the p-values agree for the alternatives two-sided, less, and >> >>> > greater, the odds ratio is defined differently as explained pretty >> >>> > well in the docstring. >> >>> > >> >>> > Josef >> >>> Yes, but you said to follow BOTH R and SAS - that means providing all >> >>> three: >> >>> >> >>> The FREQ Procedure >> >>> >> >>> Table of Exposure by Response >> >>> >> >>> Exposure ? ? Response >> >>> >> >>> Frequency| ? ? ? 0| ? ? ? 1| ?Total >> >>> ---------+--------+--------+ >> >>> ? ? ? 0 | ? ?190 | ? ?800 | ? ?990 >> >>> ---------+--------+--------+ >> >>> ? ? ? 1 | ? ?200 | ? ?900 | ? 1100 >> >>> ---------+--------+--------+ >> >>> Total ? ? ? ? 390 ? ? 1700 ? ? 2090 >> >>> >> >>> >> >>> Statistics for Table of Exposure by Response >> >>> >> >>> Statistic ? ? ? ? ? ? ? ? ? ? DF ? ? ? Value ? ? ?Prob >> >>> ------------------------------------------------------ >> >>> Chi-Square ? ? ? ? ? ? ? ? ? ? 1 ? ? ?0.3503 ? ?0.5540 >> >>> Likelihood Ratio Chi-Square ? ?1 ? ? ?0.3500 ? ?0.5541 >> >>> Continuity Adj. Chi-Square ? ? 1 ? ? ?0.2869 ? ?0.5922 >> >>> Mantel-Haenszel Chi-Square ? ? 1 ? ? ?0.3501 ? ?0.5541 >> >>> Phi Coefficient ? ? ? ? ? ? ? ? ? ? ? 0.0129 >> >>> Contingency Coefficient ? ? ? ? ? ? ? 0.0129 >> >>> Cramer's V ? ? ? ? ? ? ? ? ? ? ? ? ? ?0.0129 >> >>> >> >>> >> >>> ? ? Pearson Chi-Square Test >> >>> ---------------------------------- >> >>> Chi-Square ? ? ? ? ? ? ? ? ?0.3503 >> >>> DF ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 1 >> >>> Asymptotic Pr > ?ChiSq ? ? ?0.5540 >> >>> Exact ? ? ?Pr >= ChiSq ? ? ?0.5741 >> >>> >> >>> >> >>> ? ? ? Fisher's Exact Test >> >>> ---------------------------------- >> >>> Cell (1,1) Frequency (F) ? ? ? 190 >> >>> Left-sided Pr <= F ? ? ? ? ?0.7416 >> >>> Right-sided Pr >= F ? ? ? ? 0.2960 >> >>> >> >>> Table Probability (P) ? ? ? 0.0376 >> >>> Two-sided Pr <= P ? ? ? ? ? 0.5741 >> >>> >> >>> Sample Size = 2090 >> >>> >> >>> Thus providing all three is the correct answer. >> >>> >> >> Eh, we do. 
The interface is the same as that of R, and all three of >> >> {two-sided, less, greater} are extensively checked against R. It looks >> >> like >> >> you are reacting to only one statement Josef made to explain his >> >> interpretation of less/greater. Please check the actual commit and then >> >> comment if you see anything wrong. >> >> >> >> Ralf >> >> >> >> >> >> _______________________________________________ >> >> SciPy-User mailing list >> >> SciPy-User at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >> >> I have looked at it (again) and the comments still stand: >> >> A user should not have to read a statistical book and then the code to >> >> figure out what was actually implemented here.? So I do strongly object >> >> to >> >> Josef's statements as you just can not interpret Fisher's test in that >> >> way. >> >> Just look at how SAS presents the results as should give a huge clue >> >> that >> >> the two-sided tests is different than the other one-sided tests. >> > >> > Okay, I am pasting the entire docstring below. You seem to know a lot >> > about >> > this, so can you please suggest wording for things to be added/changed? >> > >> > I have compared with the R doc >> > (http://rss.acs.unt.edu/Rdoc/library/stats/html/fisher.test.html), and >> > that's not much different as far as I can tell. >> > >> > Thanks a lot, >> > Ralf >> >> You are assuming a lot by saying that I even agree with ?R documentation >> :-) > > Didn't assume that. > >> >> If you noticed, I never referred to it because it is not correct >> compared SAS and other sources given. >> >> >> > >> > >> > ??? Performs a Fisher exact test on a 2x2 contingency table. >> > >> > ??? Parameters >> > ??? ---------- >> > ??? table : array_like of ints >> > ??????? A 2x2 contingency table.? Elements should be non-negative >> > integers. >> > ??? alternative : {'two-sided', 'less', 'greater'}, optional >> > ??????? Which alternative hypothesis to the null hypothesis the test >> > uses. >> > ??????? Default is 'two-sided'. >> > >> > ??? Returns >> > ??? ------- >> > ??? oddsratio : float >> > ??????? This is prior odds ratio and not a posterior estimate. >> > ??? p_value : float >> > ??????? P-value, the probability of obtaining a distribution at least as >> > ??????? extreme as the one that was actually observed, assuming that the >> > ??????? null hypothesis is true. >> > >> > ??? See Also >> > ??? -------- >> > ??? chisquare : inexact alternative that can be used when sample sizes >> > are >> > ??????????????? large enough. >> > >> > ??? Notes >> > ??? ----- >> > ??? The calculated odds ratio is different from the one R uses. In R >> > language, >> > ??? this implementation returns the (more common) "unconditional Maximum >> > ??? Likelihood Estimate", while R uses the "conditional Maximum >> > Likelihood >> > ??? Estimate". >> > >> > ??? For tables with large numbers the (inexact) `chisquare` test can >> > also be >> > ??? used. >> > >> > ??? Examples >> > ??? -------- >> > ??? Say we spend a few days counting whales and sharks in the Atlantic >> > and >> > ??? Indian oceans. In the Atlantic ocean we find 6 whales and 1 shark, >> > in >> > the >> > ??? Indian ocean 2 whales and 5 sharks. Then our contingency table is:: >> > >> > ??????????????? Atlantic? Indian >> > ??????? whales???? 8??????? 2 >> > ??????? sharks???? 1??????? 5 >> > >> > ??? We use this table to find the p-value: >> > >> > ??? >>> oddsratio, pvalue = stats.fisher_exact([[8, 2], [1, 5]]) >> > ??? >>> pvalue >> > ??? 0.0349... 
>> > >> > ??? The probability that we would observe this or an even more >> > imbalanced >> > ratio >> > ??? by chance is about 3.5%.? A commonly used significance level is 5%, >> > if >> > we >> > ??? adopt that we can therefore conclude that our observed imbalance is >> > ??? statistically significant; whales prefer the Atlantic while sharks >> > prefer >> > ??? the Indian ocean. >> > >> > >> > >> > _______________________________________________ >> > SciPy-User mailing list >> > SciPy-User at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> > >> >> So did two of the six whales give birth? >> >> That docstring is incomplete and probably does not meet the Scipy >> documentation guidelines because not everything is explained. > > Yes, which ones do? It's a lot better than it was, and more complete than > your average scipy docstring. Same for the tests. So I'm just going to be > satisfied with the bug fix and added functionality. > >> It is not a small amount of effort to clean this up to be technically >> correct - ?0.0349 is not 'about 3.5%'. > > Note the ellipsis? It's also not exactly 0.0349. So I fail to see the > problem. There are bigger fish to fry. > > Ralf > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > Correct as this is not the place to teach statistics especially p-values. Bruce From josef.pktd at gmail.com Mon Jun 13 16:43:16 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 13 Jun 2011 16:43:16 -0400 Subject: [SciPy-User] scipy.stats one-sided two-sided less, greater, signed ? In-Reply-To: References: <4DED1DC5.8090503@gmail.com> <4DF63849.3060902@gmail.com> Message-ID: On Mon, Jun 13, 2011 at 4:38 PM, Bruce Southey wrote: > On Mon, Jun 13, 2011 at 2:19 PM, Ralf Gommers > wrote: >> >> >> On Mon, Jun 13, 2011 at 8:56 PM, Bruce Southey wrote: >>> >>> On Mon, Jun 13, 2011 at 11:36 AM, Ralf Gommers >>> wrote: >>> > >>> > >>> > On Mon, Jun 13, 2011 at 6:18 PM, Bruce Southey >>> > wrote: >>> >> >>> >> On 06/13/2011 02:46 AM, Ralf Gommers wrote: >>> >> >>> >> On Mon, Jun 13, 2011 at 3:50 AM, Bruce Southey >>> >> wrote: >>> >>> >>> >>> On Sun, Jun 12, 2011 at 7:52 PM, ? wrote: >>> >>> > >>> >>> > All the p-values agree for the alternatives two-sided, less, and >>> >>> > greater, the odds ratio is defined differently as explained pretty >>> >>> > well in the docstring. >>> >>> > >>> >>> > Josef >>> >>> Yes, but you said to follow BOTH R and SAS - that means providing all >>> >>> three: >>> >>> >>> >>> The FREQ Procedure >>> >>> >>> >>> Table of Exposure by Response >>> >>> >>> >>> Exposure ? ? Response >>> >>> >>> >>> Frequency| ? ? ? 0| ? ? ? 1| ?Total >>> >>> ---------+--------+--------+ >>> >>> ? ? ? 0 | ? ?190 | ? ?800 | ? ?990 >>> >>> ---------+--------+--------+ >>> >>> ? ? ? 1 | ? ?200 | ? ?900 | ? 1100 >>> >>> ---------+--------+--------+ >>> >>> Total ? ? ? ? 390 ? ? 1700 ? ? 2090 >>> >>> >>> >>> >>> >>> Statistics for Table of Exposure by Response >>> >>> >>> >>> Statistic ? ? ? ? ? ? ? ? ? ? DF ? ? ? Value ? ? ?Prob >>> >>> ------------------------------------------------------ >>> >>> Chi-Square ? ? ? ? ? ? ? ? ? ? 1 ? ? ?0.3503 ? ?0.5540 >>> >>> Likelihood Ratio Chi-Square ? ?1 ? ? ?0.3500 ? ?0.5541 >>> >>> Continuity Adj. Chi-Square ? ? 1 ? ? ?0.2869 ? ?0.5922 >>> >>> Mantel-Haenszel Chi-Square ? ? 1 ? ? ?0.3501 ? ?0.5541 >>> >>> Phi Coefficient ? ? ? ? ? ? ? ? ? ? ? 0.0129 >>> >>> Contingency Coefficient ? ? ? ? ? 
? ? 0.0129 >>> >>> Cramer's V ? ? ? ? ? ? ? ? ? ? ? ? ? ?0.0129 >>> >>> >>> >>> >>> >>> ? ? Pearson Chi-Square Test >>> >>> ---------------------------------- >>> >>> Chi-Square ? ? ? ? ? ? ? ? ?0.3503 >>> >>> DF ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 1 >>> >>> Asymptotic Pr > ?ChiSq ? ? ?0.5540 >>> >>> Exact ? ? ?Pr >= ChiSq ? ? ?0.5741 >>> >>> >>> >>> >>> >>> ? ? ? Fisher's Exact Test >>> >>> ---------------------------------- >>> >>> Cell (1,1) Frequency (F) ? ? ? 190 >>> >>> Left-sided Pr <= F ? ? ? ? ?0.7416 >>> >>> Right-sided Pr >= F ? ? ? ? 0.2960 >>> >>> >>> >>> Table Probability (P) ? ? ? 0.0376 >>> >>> Two-sided Pr <= P ? ? ? ? ? 0.5741 >>> >>> >>> >>> Sample Size = 2090 >>> >>> >>> >>> Thus providing all three is the correct answer. >>> >>> >>> >> Eh, we do. The interface is the same as that of R, and all three of >>> >> {two-sided, less, greater} are extensively checked against R. It looks >>> >> like >>> >> you are reacting to only one statement Josef made to explain his >>> >> interpretation of less/greater. Please check the actual commit and then >>> >> comment if you see anything wrong. >>> >> >>> >> Ralf >>> >> >>> >> >>> >> _______________________________________________ >>> >> SciPy-User mailing list >>> >> SciPy-User at scipy.org >>> >> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> >>> >> I have looked at it (again) and the comments still stand: >>> >> A user should not have to read a statistical book and then the code to >>> >> figure out what was actually implemented here.? So I do strongly object >>> >> to >>> >> Josef's statements as you just can not interpret Fisher's test in that >>> >> way. >>> >> Just look at how SAS presents the results as should give a huge clue >>> >> that >>> >> the two-sided tests is different than the other one-sided tests. >>> > >>> > Okay, I am pasting the entire docstring below. You seem to know a lot >>> > about >>> > this, so can you please suggest wording for things to be added/changed? >>> > >>> > I have compared with the R doc >>> > (http://rss.acs.unt.edu/Rdoc/library/stats/html/fisher.test.html), and >>> > that's not much different as far as I can tell. >>> > >>> > Thanks a lot, >>> > Ralf >>> >>> You are assuming a lot by saying that I even agree with ?R documentation >>> :-) >> >> Didn't assume that. >> >>> >>> If you noticed, I never referred to it because it is not correct >>> compared SAS and other sources given. >>> >>> >>> > >>> > >>> > ??? Performs a Fisher exact test on a 2x2 contingency table. >>> > >>> > ??? Parameters >>> > ??? ---------- >>> > ??? table : array_like of ints >>> > ??????? A 2x2 contingency table.? Elements should be non-negative >>> > integers. >>> > ??? alternative : {'two-sided', 'less', 'greater'}, optional >>> > ??????? Which alternative hypothesis to the null hypothesis the test >>> > uses. >>> > ??????? Default is 'two-sided'. >>> > >>> > ??? Returns >>> > ??? ------- >>> > ??? oddsratio : float >>> > ??????? This is prior odds ratio and not a posterior estimate. >>> > ??? p_value : float >>> > ??????? P-value, the probability of obtaining a distribution at least as >>> > ??????? extreme as the one that was actually observed, assuming that the >>> > ??????? null hypothesis is true. >>> > >>> > ??? See Also >>> > ??? -------- >>> > ??? chisquare : inexact alternative that can be used when sample sizes >>> > are >>> > ??????????????? large enough. >>> > >>> > ??? Notes >>> > ??? ----- >>> > ??? The calculated odds ratio is different from the one R uses. In R >>> > language, >>> > ??? 
this implementation returns the (more common) "unconditional Maximum >>> > ??? Likelihood Estimate", while R uses the "conditional Maximum >>> > Likelihood >>> > ??? Estimate". >>> > >>> > ??? For tables with large numbers the (inexact) `chisquare` test can >>> > also be >>> > ??? used. >>> > >>> > ??? Examples >>> > ??? -------- >>> > ??? Say we spend a few days counting whales and sharks in the Atlantic >>> > and >>> > ??? Indian oceans. In the Atlantic ocean we find 6 whales and 1 shark, >>> > in >>> > the >>> > ??? Indian ocean 2 whales and 5 sharks. Then our contingency table is:: >>> > >>> > ??????????????? Atlantic? Indian >>> > ??????? whales???? 8??????? 2 >>> > ??????? sharks???? 1??????? 5 >>> > >>> > ??? We use this table to find the p-value: >>> > >>> > ??? >>> oddsratio, pvalue = stats.fisher_exact([[8, 2], [1, 5]]) >>> > ??? >>> pvalue >>> > ??? 0.0349... >>> > >>> > ??? The probability that we would observe this or an even more >>> > imbalanced >>> > ratio >>> > ??? by chance is about 3.5%.? A commonly used significance level is 5%, >>> > if >>> > we >>> > ??? adopt that we can therefore conclude that our observed imbalance is >>> > ??? statistically significant; whales prefer the Atlantic while sharks >>> > prefer >>> > ??? the Indian ocean. >>> > >>> > >>> > >>> > _______________________________________________ >>> > SciPy-User mailing list >>> > SciPy-User at scipy.org >>> > http://mail.scipy.org/mailman/listinfo/scipy-user >>> > >>> > >>> >>> So did two of the six whales give birth? >>> >>> That docstring is incomplete and probably does not meet the Scipy >>> documentation guidelines because not everything is explained. >> >> Yes, which ones do? It's a lot better than it was, and more complete than >> your average scipy docstring. Same for the tests. So I'm just going to be >> satisfied with the bug fix and added functionality. >> >>> It is not a small amount of effort to clean this up to be technically >>> correct - ?0.0349 is not 'about 3.5%'. >> >> Note the ellipsis? It's also not exactly 0.0349. So I fail to see the >> problem. There are bigger fish to fry. >> >> Ralf >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > Correct as this is not the place to teach statistics especially p-values. :) Josef (I learned a lot on the mailing lists) > Bruce > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From kwgoodman at gmail.com Mon Jun 13 17:35:50 2011 From: kwgoodman at gmail.com (Keith Goodman) Date: Mon, 13 Jun 2011 14:35:50 -0700 Subject: [SciPy-User] [ANN] Bottleneck 0.5.0 released Message-ID: Bottleneck is a collection of fast NumPy array functions written in Cython. It contains functions like median, nanmedian, nanargmax, move_max, rankdata. The fifth release of bottleneck adds four new functions, comes in a single source distribution instead of separate 32 and 64 bit versions, and contains bug fixes. J. David Lee wrote the C-code implementation of the double heap moving window median. 
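For a quick taste of the double heap moving window median, here is a minimal call on made-up data (treat the exact signature and the NaN padding of the warm-up region as approximate and check the docs linked below):

import numpy as np
import bottleneck as bn

a = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 8.0, 6.0])
# median over a sliding window of length 3 along the last axis;
# the first window-1 entries are not fully determined until the window fills
print(bn.move_median(a, 3))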
New functions: - move_median(), moving window median - partsort(), partial sort - argpartsort() - ss(), sum of squares, faster version of scipy.stats.ss Changes: - Single source distribution instead of separate 32 and 64 bit versions - nanmax and nanmin now follow Numpy 1.6 (not 1.5.1) when input is all NaN Bug fixes: - #14 Support python 2.5 by importing `with` statement - #22 nanmedian wrong for particular ordering of NaN and non-NaN elements - #26 argpartsort, nanargmin, nanargmax returned wrong dtype on 64-bit Windows - #29 rankdata and nanrankdata crashed on 64-bit Windows download ? http://pypi.python.org/pypi/Bottleneck docs ? http://berkeleyanalytics.com/bottleneck code ? http://github.com/kwgoodman/bottleneck mailing list ? http://groups.google.com/group/bottle-neck mailing list 2 ? http://mail.scipy.org/mailman/listinfo/scipy-user From dav at alum.mit.edu Tue Jun 14 02:43:19 2011 From: dav at alum.mit.edu (Dav Clark) Date: Mon, 13 Jun 2011 23:43:19 -0700 Subject: [SciPy-User] git command in github doc and wiki In-Reply-To: <4DF5D71F.1000107@gmail.com> References: <4DF5D71F.1000107@gmail.com> Message-ID: <51CCC0AD-37E2-4D07-B615-97F59F4EDE54@alum.mit.edu> On Jun 13, 2011, at 2:23 AM, Johann Cohen-Tanugi wrote: > The 1st command in github fails for me : > -bash-3.2$ git clone https://github.com/ipython/ipython.git > Initialized empty Git repository in > /a/wain006/g.glast.u54/cohen/IPYDEV/ipython/.git/ > error: git-remote-curl died of signal 11 > > while the second works : > -bash-3.2$ git clone git://github.com/ipython/ipython.git > Initialized empty Git repository in ... The verbatim https command above works for me. I suspect you are having a firewall or proxy issue. Do other https connections work for you? Best, Dav From josef.pktd at gmail.com Tue Jun 14 12:58:46 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 14 Jun 2011 12:58:46 -0400 Subject: [SciPy-User] orthonormal polynomials (cont.) Message-ID: (I'm continuing the story with orthogonal polynomial density estimation, and found a nice new paper http://www.informaworld.com/smpp/content~db=all~content=a933669464 ) Last time I managed to get orthonormal polynomials out of scipy with weight 1, and it worked well for density estimation. Now, I would like to construct my own orthonormal polynomials for arbitrary weights. (The weights represent a base density around which we make the polynomial expansion). The reference refers to Gram-Schmidt or Emmerson recurrence. Is there a reasonably easy way to get the polynomial coefficients for this with numscipython? Thanks, Josef From vanforeest at gmail.com Tue Jun 14 14:10:53 2011 From: vanforeest at gmail.com (nicky van foreest) Date: Tue, 14 Jun 2011 20:10:53 +0200 Subject: [SciPy-User] orthonormal polynomials (cont.) In-Reply-To: References: Message-ID: Hi, Without understanding the details... I recall from numerical recipes in C that Gram Schmidt is a very risky recipe. I don't know whether this advice also pertains to fitting polynomials, however, Nicky On 14 June 2011 18:58, wrote: > (I'm continuing the story with orthogonal polynomial density > estimation, and found a nice new paper > http://www.informaworld.com/smpp/content~db=all~content=a933669464 ) > > Last time I managed to get orthonormal polynomials out of scipy with > weight 1, and it worked well for density estimation. > > Now, I would like to construct my own orthonormal polynomials for > arbitrary weights. (The weights represent a base density around which > we make the polynomial expansion). 
> > The reference refers to Gram-Schmidt or Emmerson recurrence. > > Is there a reasonably easy way to get the polynomial coefficients for > this with numscipython? > > Thanks, > > Josef > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From charlesr.harris at gmail.com Tue Jun 14 16:40:33 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 14 Jun 2011 14:40:33 -0600 Subject: [SciPy-User] orthonormal polynomials (cont.) In-Reply-To: References: Message-ID: On Tue, Jun 14, 2011 at 12:10 PM, nicky van foreest wrote: > Hi, > > Without understanding the details... I recall from numerical recipes > in C that Gram Schmidt is a very risky recipe. I don't know whether > this advice also pertains to fitting polynomials, however, > > Nicky > > On 14 June 2011 18:58, wrote: > > (I'm continuing the story with orthogonal polynomial density > > estimation, and found a nice new paper > > http://www.informaworld.com/smpp/content~db=all~content=a933669464 ) > > > > Last time I managed to get orthonormal polynomials out of scipy with > > weight 1, and it worked well for density estimation. > > > > Now, I would like to construct my own orthonormal polynomials for > > arbitrary weights. (The weights represent a base density around which > > we make the polynomial expansion). > > > > The reference refers to Gram-Schmidt or Emmerson recurrence. > > > > Is there a reasonably easy way to get the polynomial coefficients for > > this with numscipython? > > > What do you mean by 'polynomial'? If you want the values of a set of polynomials orthonormal on a given set of points, you want the 'q' in a qr factorization of a (row) weighted Vandermonde matrix. However, I would suggest using a weighted chebvander instead for numerical stability. You can also solve for the three term recursion (Emerson?), but that is more work. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue Jun 14 17:01:14 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 14 Jun 2011 17:01:14 -0400 Subject: [SciPy-User] orthonormal polynomials (cont.) In-Reply-To: References: Message-ID: On Tue, Jun 14, 2011 at 4:40 PM, Charles R Harris wrote: > > > On Tue, Jun 14, 2011 at 12:10 PM, nicky van foreest > wrote: >> >> Hi, >> >> Without understanding the details... I recall from numerical recipes >> in C that Gram Schmidt is a very risky recipe. I don't know whether >> this advice also pertains to fitting polynomials, however, I read some warnings about the stability of Gram Schmidt, but the idea is that if I can choose the right weight function, then we should need only a few polynomials. So, I guess in that case numerical stability wouldn't be very relevant. >> >> Nicky >> >> On 14 June 2011 18:58, ? wrote: >> > (I'm continuing the story with orthogonal polynomial density >> > estimation, and found a nice new paper >> > http://www.informaworld.com/smpp/content~db=all~content=a933669464 ) >> > >> > Last time I managed to get orthonormal polynomials out of scipy with >> > weight 1, and it worked well for density estimation. >> > >> > Now, I would like to construct my own orthonormal polynomials for >> > arbitrary weights. (The weights represent a base density around which >> > we make the polynomial expansion). >> > >> > The reference refers to Gram-Schmidt or Emmerson recurrence. 
>> > >> > Is there a reasonably easy way to get the polynomial coefficients for >> > this with numscipython? >> > > > What do you mean by 'polynomial'? If you want the values of a set of > polynomials orthonormal on a given set of points, you want the 'q' in a qr > factorization of a (row) weighted Vandermonde matrix.? However, I would > suggest using a weighted chebvander instead for numerical stability. Following your suggestion last time to use QR, I had figured out how to get the orthonormal basis for a given set of points. Now, I would like to get the functional version (not just for a given set of points), that is an orthonormal polynomial basis like Hermite, Legendre, Laguerre and Jacobi, only for any kind of weight function, where the weight function is chosen depending on the data. In the paper they use a mixture of 3 normal distributions as weight function in one example. I have no idea if I can approximate the orthonormal polynomial basis just on a predefined set of points, and use QR, since it needs to be defined for a continuous domain. > > You can also solve for the three term recursion (Emerson?), but that is more > work. (I would prefer not to do more work, I would like to apply it and not fight with numerical problems. :) Thanks, Josef > > Chuck > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From charlesr.harris at gmail.com Tue Jun 14 17:15:24 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 14 Jun 2011 15:15:24 -0600 Subject: [SciPy-User] orthonormal polynomials (cont.) In-Reply-To: References: Message-ID: On Tue, Jun 14, 2011 at 3:01 PM, wrote: > On Tue, Jun 14, 2011 at 4:40 PM, Charles R Harris > wrote: > > > > > > On Tue, Jun 14, 2011 at 12:10 PM, nicky van foreest < > vanforeest at gmail.com> > > wrote: > >> > >> Hi, > >> > >> Without understanding the details... I recall from numerical recipes > >> in C that Gram Schmidt is a very risky recipe. I don't know whether > >> this advice also pertains to fitting polynomials, however, > > I read some warnings about the stability of Gram Schmidt, but the idea > is that if I can choose the right weight function, then we should need > only a few polynomials. So, I guess in that case numerical stability > wouldn't be very relevant. > > >> > >> Nicky > >> > >> On 14 June 2011 18:58, wrote: > >> > (I'm continuing the story with orthogonal polynomial density > >> > estimation, and found a nice new paper > >> > http://www.informaworld.com/smpp/content~db=all~content=a933669464 ) > >> > > >> > Last time I managed to get orthonormal polynomials out of scipy with > >> > weight 1, and it worked well for density estimation. > >> > > >> > Now, I would like to construct my own orthonormal polynomials for > >> > arbitrary weights. (The weights represent a base density around which > >> > we make the polynomial expansion). > >> > > >> > The reference refers to Gram-Schmidt or Emmerson recurrence. > >> > > >> > Is there a reasonably easy way to get the polynomial coefficients for > >> > this with numscipython? > >> > > > > > What do you mean by 'polynomial'? If you want the values of a set of > > polynomials orthonormal on a given set of points, you want the 'q' in a > qr > > factorization of a (row) weighted Vandermonde matrix. However, I would > > suggest using a weighted chebvander instead for numerical stability. 
> > Following your suggestion last time to use QR, I had figured out how > to get the orthonormal basis for a given set of points. > Now, I would like to get the functional version (not just for a given > set of points), that is an orthonormal polynomial basis like Hermite, > Legendre, Laguerre and Jacobi, only for any kind of weight function, > where the weight function is chosen depending on the data. > > But in what basis? The columns of the inverse of 'R' in QR will give you the orthonormal polynomials as series in whatever basis you used for the columns of the pseudo-Vandermonde matrix. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Jun 14 17:48:19 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 14 Jun 2011 15:48:19 -0600 Subject: [SciPy-User] orthonormal polynomials (cont.) In-Reply-To: References: Message-ID: On Tue, Jun 14, 2011 at 3:15 PM, Charles R Harris wrote: > > > On Tue, Jun 14, 2011 at 3:01 PM, wrote: > >> On Tue, Jun 14, 2011 at 4:40 PM, Charles R Harris >> wrote: >> > >> > >> > On Tue, Jun 14, 2011 at 12:10 PM, nicky van foreest < >> vanforeest at gmail.com> >> > wrote: >> >> >> >> Hi, >> >> >> >> Without understanding the details... I recall from numerical recipes >> >> in C that Gram Schmidt is a very risky recipe. I don't know whether >> >> this advice also pertains to fitting polynomials, however, >> >> I read some warnings about the stability of Gram Schmidt, but the idea >> is that if I can choose the right weight function, then we should need >> only a few polynomials. So, I guess in that case numerical stability >> wouldn't be very relevant. >> >> >> >> >> Nicky >> >> >> >> On 14 June 2011 18:58, wrote: >> >> > (I'm continuing the story with orthogonal polynomial density >> >> > estimation, and found a nice new paper >> >> > http://www.informaworld.com/smpp/content~db=all~content=a933669464 ) >> >> > >> >> > Last time I managed to get orthonormal polynomials out of scipy with >> >> > weight 1, and it worked well for density estimation. >> >> > >> >> > Now, I would like to construct my own orthonormal polynomials for >> >> > arbitrary weights. (The weights represent a base density around which >> >> > we make the polynomial expansion). >> >> > >> >> > The reference refers to Gram-Schmidt or Emmerson recurrence. >> >> > >> >> > Is there a reasonably easy way to get the polynomial coefficients for >> >> > this with numscipython? >> >> > >> > >> > What do you mean by 'polynomial'? If you want the values of a set of >> > polynomials orthonormal on a given set of points, you want the 'q' in a >> qr >> > factorization of a (row) weighted Vandermonde matrix. However, I would >> > suggest using a weighted chebvander instead for numerical stability. >> >> Following your suggestion last time to use QR, I had figured out how >> to get the orthonormal basis for a given set of points. >> Now, I would like to get the functional version (not just for a given >> set of points), that is an orthonormal polynomial basis like Hermite, >> Legendre, Laguerre and Jacobi, only for any kind of weight function, >> where the weight function is chosen depending on the data. >> >> > But in what basis? The columns of the inverse of 'R' in QR will give you > the orthonormal polynomials as series in whatever basis you used for the > columns of the pseudo-Vandermonde matrix. > > Example. 
In [1]: from numpy.polynomial.polynomial import polyvander In [2]: v = polyvander(linspace(-1, 1, 1000), 3) In [3]: around(inv(qr(v, mode='r'))*sqrt(1000./2), 5) Out[3]: array([[-0.70711, 0. , 0.79057, -0. ], [ 0. , 1.22352, -0. , -2.80345], [ 0. , 0. , -2.36697, 0. ], [ 0. , 0. , 0. , 4.66309]]) The columns are approx. the coefficients of the normalized Legendre functions as a power series up to a sign. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue Jun 14 17:57:49 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 14 Jun 2011 17:57:49 -0400 Subject: [SciPy-User] orthonormal polynomials (cont.) In-Reply-To: References: Message-ID: On Tue, Jun 14, 2011 at 5:48 PM, Charles R Harris wrote: > > > On Tue, Jun 14, 2011 at 3:15 PM, Charles R Harris > wrote: >> >> >> On Tue, Jun 14, 2011 at 3:01 PM, wrote: >>> >>> On Tue, Jun 14, 2011 at 4:40 PM, Charles R Harris >>> wrote: >>> > >>> > >>> > On Tue, Jun 14, 2011 at 12:10 PM, nicky van foreest >>> > >>> > wrote: >>> >> >>> >> Hi, >>> >> >>> >> Without understanding the details... I recall from numerical recipes >>> >> in C that Gram Schmidt is a very risky recipe. I don't know whether >>> >> this advice also pertains to fitting polynomials, however, >>> >>> I read some warnings about the stability of Gram Schmidt, but the idea >>> is that if I can choose the right weight function, then we should need >>> only a few polynomials. So, I guess in that case numerical stability >>> wouldn't be very relevant. >>> >>> >> >>> >> Nicky >>> >> >>> >> On 14 June 2011 18:58, ? wrote: >>> >> > (I'm continuing the story with orthogonal polynomial density >>> >> > estimation, and found a nice new paper >>> >> > http://www.informaworld.com/smpp/content~db=all~content=a933669464 ) >>> >> > >>> >> > Last time I managed to get orthonormal polynomials out of scipy with >>> >> > weight 1, and it worked well for density estimation. >>> >> > >>> >> > Now, I would like to construct my own orthonormal polynomials for >>> >> > arbitrary weights. (The weights represent a base density around >>> >> > which >>> >> > we make the polynomial expansion). >>> >> > >>> >> > The reference refers to Gram-Schmidt or Emmerson recurrence. >>> >> > >>> >> > Is there a reasonably easy way to get the polynomial coefficients >>> >> > for >>> >> > this with numscipython? >>> >> > >>> > >>> > What do you mean by 'polynomial'? If you want the values of a set of >>> > polynomials orthonormal on a given set of points, you want the 'q' in a >>> > qr >>> > factorization of a (row) weighted Vandermonde matrix.? However, I would >>> > suggest using a weighted chebvander instead for numerical stability. >>> >>> Following your suggestion last time to use QR, I had figured out how >>> to get the orthonormal basis for a given set of points. >>> Now, I would like to get the functional version (not just for a given >>> set of points), that is an orthonormal polynomial basis like Hermite, >>> Legendre, Laguerre and Jacobi, only for any kind of weight function, >>> where the weight function is chosen depending on the data. >>> >> >> But in what basis? The columns of the inverse of 'R' in QR will give you >> the orthonormal polynomials as series in whatever basis you used for the >> columns of the pseudo-Vandermonde matrix. with basis you mean hear for example the power series (x**i, i=0,1,..) that's what they use, but there is also a reference to using fourier polynomials which I haven't looked at for this case. 
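Just to check that I read the suggestion correctly, here is my rough, untested sketch of the grid version with a made-up weight (standard normal pdf as the base density), taking the coefficients from the inverse of R:

import numpy as np
from numpy.polynomial.polynomial import polyvander

x = np.linspace(-6, 6, 2001)
dx = x[1] - x[0]
w = np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)   # base density used as weight

deg = 4
v = polyvander(x, deg)                    # columns 1, x, x**2, ...
b = v * np.sqrt(w * dx)[:, None]          # row-weighted Vandermonde matrix
r = np.linalg.qr(b, mode='r')
coef = np.linalg.inv(r)                   # column j: power-series coefficients of p_j

# check: the discretized integral of p_i * p_j * w should be close to the identity
vals = v.dot(coef)
gram = (vals * (w * dx)[:, None]).T.dot(vals)
print(np.round(gram, 6))
# for the normal weight these should match the normalized probabilists'
# Hermite polynomials up to sign
print(np.round(coef, 4))

If that is the right recipe, swapping in a mixture density for w should be mechanical.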
>> > > Example. > > In [1]: from numpy.polynomial.polynomial import polyvander > > In [2]: v = polyvander(linspace(-1, 1, 1000), 3) > > In [3]: around(inv(qr(v, mode='r'))*sqrt(1000./2), 5) > Out[3]: > array([[-0.70711,? 0.???? ,? 0.79057, -0.???? ], > ?????? [ 0.???? ,? 1.22352, -0.???? , -2.80345], > ?????? [ 0.???? ,? 0.???? , -2.36697,? 0.???? ], > ?????? [ 0.???? ,? 0.???? ,? 0.???? ,? 4.66309]]) > > The columns are approx. the coefficients of the normalized Legendre > functions as a power series up to a sign. Looks interesting. It will take me a while to figure out what this does, but I think I get the idea. Thanks, Josef > > Chuck > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From charlesr.harris at gmail.com Tue Jun 14 20:18:32 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 14 Jun 2011 18:18:32 -0600 Subject: [SciPy-User] orthonormal polynomials (cont.) In-Reply-To: References: Message-ID: On Tue, Jun 14, 2011 at 3:57 PM, wrote: > On Tue, Jun 14, 2011 at 5:48 PM, Charles R Harris > wrote: > > > > > > On Tue, Jun 14, 2011 at 3:15 PM, Charles R Harris > > wrote: > >> > >> > >> On Tue, Jun 14, 2011 at 3:01 PM, wrote: > >>> > >>> On Tue, Jun 14, 2011 at 4:40 PM, Charles R Harris > >>> wrote: > >>> > > >>> > > >>> > On Tue, Jun 14, 2011 at 12:10 PM, nicky van foreest > >>> > > >>> > wrote: > >>> >> > >>> >> Hi, > >>> >> > >>> >> Without understanding the details... I recall from numerical recipes > >>> >> in C that Gram Schmidt is a very risky recipe. I don't know whether > >>> >> this advice also pertains to fitting polynomials, however, > >>> > >>> I read some warnings about the stability of Gram Schmidt, but the idea > >>> is that if I can choose the right weight function, then we should need > >>> only a few polynomials. So, I guess in that case numerical stability > >>> wouldn't be very relevant. > >>> > >>> >> > >>> >> Nicky > >>> >> > >>> >> On 14 June 2011 18:58, wrote: > >>> >> > (I'm continuing the story with orthogonal polynomial density > >>> >> > estimation, and found a nice new paper > >>> >> > > http://www.informaworld.com/smpp/content~db=all~content=a933669464 ) > >>> >> > > >>> >> > Last time I managed to get orthonormal polynomials out of scipy > with > >>> >> > weight 1, and it worked well for density estimation. > >>> >> > > >>> >> > Now, I would like to construct my own orthonormal polynomials for > >>> >> > arbitrary weights. (The weights represent a base density around > >>> >> > which > >>> >> > we make the polynomial expansion). > >>> >> > > >>> >> > The reference refers to Gram-Schmidt or Emmerson recurrence. > >>> >> > > >>> >> > Is there a reasonably easy way to get the polynomial coefficients > >>> >> > for > >>> >> > this with numscipython? > >>> >> > > >>> > > >>> > What do you mean by 'polynomial'? If you want the values of a set of > >>> > polynomials orthonormal on a given set of points, you want the 'q' in > a > >>> > qr > >>> > factorization of a (row) weighted Vandermonde matrix. However, I > would > >>> > suggest using a weighted chebvander instead for numerical stability. > >>> > >>> Following your suggestion last time to use QR, I had figured out how > >>> to get the orthonormal basis for a given set of points. 
> >>> Now, I would like to get the functional version (not just for a given > >>> set of points), that is an orthonormal polynomial basis like Hermite, > >>> Legendre, Laguerre and Jacobi, only for any kind of weight function, > >>> where the weight function is chosen depending on the data. > >>> > >> > >> But in what basis? The columns of the inverse of 'R' in QR will give you > >> the orthonormal polynomials as series in whatever basis you used for the > >> columns of the pseudo-Vandermonde matrix. > > with basis you mean hear for example the power series (x**i, > i=0,1,..) that's what they use, but there is also a reference to > using fourier polynomials which I haven't looked at for this case. > > >> > > > > Example. > > > > In [1]: from numpy.polynomial.polynomial import polyvander > > > > In [2]: v = polyvander(linspace(-1, 1, 1000), 3) > > > > In [3]: around(inv(qr(v, mode='r'))*sqrt(1000./2), 5) > > Out[3]: > > array([[-0.70711, 0. , 0.79057, -0. ], > > [ 0. , 1.22352, -0. , -2.80345], > > [ 0. , 0. , -2.36697, 0. ], > > [ 0. , 0. , 0. , 4.66309]]) > > > > The columns are approx. the coefficients of the normalized Legendre > > functions as a power series up to a sign. > > Looks interesting. It will take me a while to figure out what this > does, but I think I get the idea. > > Thanks, > > The normalization factor comes from integrating the columns of q, i.e., \int p^2 ~= dt*\sum_i (q_{ij})^2 = 2/1000. I really should have weighted the first and last rows of the Vandermonde matrix by 1/sqrt(2) so that the integration was trapazoidal, but you get the idea. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From snow_man013 at hotmail.com Wed Jun 15 00:46:39 2011 From: snow_man013 at hotmail.com (Snow Man) Date: Wed, 15 Jun 2011 04:46:39 +0000 Subject: [SciPy-User] convolve2d ComplexWarning Message-ID: Hello, I wasn't sure how else to get help with this, as there is no forum for scipy, so if someone could help me this would be great. I installed the latest versions of scipy and numpy as of (June 15 2011) and I'm trying to do some simple image processing. However the scipy.signal.convolve2d keeps throwing an error: " C:\Python26\lib\site-packages\scipy\signal\signaltools.py:408: ComplexWarning: Casting complex values to real discards the imaginary part return sigtools._convolve2d(in1,in2,1,val,bval,fillvalue) " And I have no idea where else to get help. More info: windows 7 64bit started with pythonxy 2.6.6.1 when that didnt work reinstalled latest numpy and scipy . . . still didn't work. the code I'm runing is : from numpy import * from scipy import * from scipy.signal import * import pylab dna = double(misc.imread('dna.jpeg')) pylab.imshow(dna) pylab.show() pylab.gray() # lets write an edge detector mask = array([[1,1,1],[1,3,1],[1,1,1]],double) mask = mask/sum(mask) rsl = convolve2d(dna,mask,mode='full',boundary = 'fill' ,fillvalue=0.0) pylab.imshow(rsl) please help. Thank you. -------------- next part -------------- An HTML attachment was scrubbed... URL: From pholvey at gmail.com Tue Jun 14 13:08:20 2011 From: pholvey at gmail.com (Patrick Holvey) Date: Tue, 14 Jun 2011 13:08:20 -0400 Subject: [SciPy-User] optimize.fmin_cg and optimize.fmin_bgfs optimize to 0 In-Reply-To: References: Message-ID: Thanks Pauli! That fixed the issue I was having. The minimization is proceeding like it should! 
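In case it helps anyone searching the archives later, the fix boils down to building the gradient in a fresh array instead of overwriting the optimizer's current point. A stripped-down sketch, with a toy quadratic standing in for the real potential (only the zeros_like/copy part carries over to my actual code):

import numpy as np
from scipy import optimize

def energy(xyz):
    # toy quadratic 'energy' in place of the real force field
    return 0.5 * np.sum((xyz - 1.0) ** 2)

def forces(xyz):
    grad = np.zeros_like(xyz)   # fresh array; never modify xyz in place
    grad[:] = xyz - 1.0
    return grad

x0 = np.zeros(9)                # e.g. three atoms * three coordinates, flattened
xopt = optimize.fmin_cg(energy, x0, fprime=forces, disp=0)
print(np.round(xopt, 4))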
Most sincerely, Patrick On Wed, Jun 8, 2011 at 10:00 AM, Pauli Virtanen wrote: > Tue, 07 Jun 2011 15:23:46 -0400, Patrick Holvey wrote: > [clip] > > However, when > > I use the gradient in the optimization, all of the atom positions shoot > > right to the origin (so they're all at 0,0,0) after just 2 function > > calls and 1 gradient call, which seems very odd to me. So I tried > > fmin_bgfs with the gradient and the same thing happened. Does anyone > > have any experience with analytic gradients where this has happened to > > them? I'm confused as to whether the problem is in my gradient > > implementation or in how I'm passing the gradient or what. > > Your Box.Forces(self, xyz) method modifies the input `xyz` argument. > This you should not do --- the optimizer expects that you do not alter > the current position this way. > > Try replacing > > vectorfield=xyz > > with > > vectorfield = numpy.zeros_like(xyz) > > or put > > xyz = xyz.copy() > > in the beginning of the routine. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Patrick Holvey Graduate Student Dept. of Materials Science and Engineering Johns Hopkins University pholvey1 at jhu.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From johann.cohentanugi at gmail.com Tue Jun 14 16:43:57 2011 From: johann.cohentanugi at gmail.com (Johann Cohen-Tanugi) Date: Tue, 14 Jun 2011 22:43:57 +0200 Subject: [SciPy-User] git command in github doc and wiki In-Reply-To: <51CCC0AD-37E2-4D07-B615-97F59F4EDE54@alum.mit.edu> References: <4DF5D71F.1000107@gmail.com> <51CCC0AD-37E2-4D07-B615-97F59F4EDE54@alum.mit.edu> Message-ID: <4DF7C80D.6080100@gmail.com> Hi Dav, I moved this to ipython ML actually :). The problem could be an old git (1.6) and/or proxy settings, yes, best, johann On 06/14/2011 08:43 AM, Dav Clark wrote: > On Jun 13, 2011, at 2:23 AM, Johann Cohen-Tanugi wrote: > >> The 1st command in github fails for me : >> -bash-3.2$ git clone https://github.com/ipython/ipython.git >> Initialized empty Git repository in >> /a/wain006/g.glast.u54/cohen/IPYDEV/ipython/.git/ >> error: git-remote-curl died of signal 11 >> >> while the second works : >> -bash-3.2$ git clone git://github.com/ipython/ipython.git >> Initialized empty Git repository in ... > The verbatim https command above works for me. I suspect you are having a firewall or proxy issue. Do other https connections work for you? > > Best, > Dav > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From cjordan1 at uw.edu Wed Jun 15 11:13:41 2011 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Wed, 15 Jun 2011 10:13:41 -0500 Subject: [SciPy-User] orthonormal polynomials (cont.) In-Reply-To: References: Message-ID: If you google around, there are numerically stable versions of Gram-Schmidt/QR facotorization and they're quite simple to implement. You just have to be slightly careful and not code it up the way it's taught in a first linear algebra course. However, the numerical instability can show up even for small numbers of basis vectors; the issue isn't the number of basis vectors but whether they're approximately linearly dependent. 
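For concreteness, a bare-bones modified Gram-Schmidt looks roughly like the sketch below (illustration only; np.linalg.qr is the safer and faster choice in practice):

import numpy as np

def modified_gram_schmidt(a):
    # orthonormalize the columns of a; each column is deflated against the
    # already-built basis one vector at a time (the 'modified' ordering)
    a = np.array(a, dtype=float)
    n, k = a.shape
    q = np.zeros((n, k))
    for j in range(k):
        v = a[:, j]
        for i in range(j):
            v = v - np.dot(q[:, i], v) * q[:, i]
        q[:, j] = v / np.linalg.norm(v)
    return q

# quick sanity check: q.T q should be close to the identity
m = np.random.randn(50, 5)
q = modified_gram_schmidt(m)
print(np.round(q.T.dot(q), 10))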
-Chris JS On Tue, Jun 14, 2011 at 7:18 PM, Charles R Harris wrote: > > > On Tue, Jun 14, 2011 at 3:57 PM, wrote: > >> On Tue, Jun 14, 2011 at 5:48 PM, Charles R Harris >> wrote: >> > >> > >> > On Tue, Jun 14, 2011 at 3:15 PM, Charles R Harris >> > wrote: >> >> >> >> >> >> On Tue, Jun 14, 2011 at 3:01 PM, wrote: >> >>> >> >>> On Tue, Jun 14, 2011 at 4:40 PM, Charles R Harris >> >>> wrote: >> >>> > >> >>> > >> >>> > On Tue, Jun 14, 2011 at 12:10 PM, nicky van foreest >> >>> > >> >>> > wrote: >> >>> >> >> >>> >> Hi, >> >>> >> >> >>> >> Without understanding the details... I recall from numerical >> recipes >> >>> >> in C that Gram Schmidt is a very risky recipe. I don't know whether >> >>> >> this advice also pertains to fitting polynomials, however, >> >>> >> >>> I read some warnings about the stability of Gram Schmidt, but the idea >> >>> is that if I can choose the right weight function, then we should need >> >>> only a few polynomials. So, I guess in that case numerical stability >> >>> wouldn't be very relevant. >> >>> >> >>> >> >> >>> >> Nicky >> >>> >> >> >>> >> On 14 June 2011 18:58, wrote: >> >>> >> > (I'm continuing the story with orthogonal polynomial density >> >>> >> > estimation, and found a nice new paper >> >>> >> > >> http://www.informaworld.com/smpp/content~db=all~content=a933669464 ) >> >>> >> > >> >>> >> > Last time I managed to get orthonormal polynomials out of scipy >> with >> >>> >> > weight 1, and it worked well for density estimation. >> >>> >> > >> >>> >> > Now, I would like to construct my own orthonormal polynomials for >> >>> >> > arbitrary weights. (The weights represent a base density around >> >>> >> > which >> >>> >> > we make the polynomial expansion). >> >>> >> > >> >>> >> > The reference refers to Gram-Schmidt or Emmerson recurrence. >> >>> >> > >> >>> >> > Is there a reasonably easy way to get the polynomial coefficients >> >>> >> > for >> >>> >> > this with numscipython? >> >>> >> > >> >>> > >> >>> > What do you mean by 'polynomial'? If you want the values of a set of >> >>> > polynomials orthonormal on a given set of points, you want the 'q' >> in a >> >>> > qr >> >>> > factorization of a (row) weighted Vandermonde matrix. However, I >> would >> >>> > suggest using a weighted chebvander instead for numerical stability. >> >>> >> >>> Following your suggestion last time to use QR, I had figured out how >> >>> to get the orthonormal basis for a given set of points. >> >>> Now, I would like to get the functional version (not just for a given >> >>> set of points), that is an orthonormal polynomial basis like Hermite, >> >>> Legendre, Laguerre and Jacobi, only for any kind of weight function, >> >>> where the weight function is chosen depending on the data. >> >>> >> >> >> >> But in what basis? The columns of the inverse of 'R' in QR will give >> you >> >> the orthonormal polynomials as series in whatever basis you used for >> the >> >> columns of the pseudo-Vandermonde matrix. >> >> with basis you mean hear for example the power series (x**i, >> i=0,1,..) that's what they use, but there is also a reference to >> using fourier polynomials which I haven't looked at for this case. >> >> >> >> > >> > Example. >> > >> > In [1]: from numpy.polynomial.polynomial import polyvander >> > >> > In [2]: v = polyvander(linspace(-1, 1, 1000), 3) >> > >> > In [3]: around(inv(qr(v, mode='r'))*sqrt(1000./2), 5) >> > Out[3]: >> > array([[-0.70711, 0. , 0.79057, -0. ], >> > [ 0. , 1.22352, -0. , -2.80345], >> > [ 0. , 0. , -2.36697, 0. ], >> > [ 0. , 0. , 0. 
, 4.66309]]) >> > >> > The columns are approx. the coefficients of the normalized Legendre >> > functions as a power series up to a sign. >> >> Looks interesting. It will take me a while to figure out what this >> does, but I think I get the idea. >> >> Thanks, >> >> > The normalization factor comes from integrating the columns of q, i.e., > \int p^2 ~= dt*\sum_i (q_{ij})^2 = 2/1000. I really should have weighted the > first and last rows of the Vandermonde matrix by 1/sqrt(2) so that the > integration was trapazoidal, but you get the idea. > > Chuck > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cweisiger at msg.ucsf.edu Wed Jun 15 17:34:17 2011 From: cweisiger at msg.ucsf.edu (Chris Weisiger) Date: Wed, 15 Jun 2011 14:34:17 -0700 Subject: [SciPy-User] Difference in quality from different interpolation orders Message-ID: Various methods in scipy use spline interpolation, and let you choose the order for the interpolation with the default being 3. I've noticed that for one task my program performs, order = 1 is about three times faster than order = 3, and visually I don't notice any decrease in data quality. However, visual inspection isn't enough. Is there some way I can measure the error introduced from using a lesser interpolation order? All else being equal, faster is better, but if it comes at a significant cost in data quality, then it's out of the question. -Chris -------------- next part -------------- An HTML attachment was scrubbed... URL: From rthompsonj at gmail.com Thu Jun 16 01:09:39 2011 From: rthompsonj at gmail.com (Robert Thompson) Date: Wed, 15 Jun 2011 22:09:39 -0700 Subject: [SciPy-User] contour question Message-ID: <4DF99013.2040309@gmail.com> Hi everyone, I have a large file that contains nothing but x & y data. I am trying to plot a number density contour to this but am uncertain how. So far I have: v12,logm=genfromtxt('L250N125v12.dat',unpack=True) X,Y=meshgrid(v12,logm) Past that I am lost. I tried creating a 2D array from the histogram via: num,vel=histogram(v12,bins=len(v12)) histdata = zeros((len(num),2)) for i in range(0,len(num)): histdata[i,0] = num[i] histdata[i,1] = vel[i] Then running 'contour(X,Y,histdata)' and it just returns: TypeError: Inputs x and y must be 1D or 2D. Any help would be greatly appreciated. Right now I am making this plot (http://i.imgur.com/fCA1R.jpg) in SuperMongo and I'd love to switch over to python. Thank you for your time! -Robert Thompson -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael at klitgaard.dk Thu Jun 16 05:02:33 2011 From: michael at klitgaard.dk (Michael Klitgaard) Date: Thu, 16 Jun 2011 11:02:33 +0200 Subject: [SciPy-User] Using numpy.fromfile with structured array skipping some elements. Message-ID: Hello, I hope this is the right mailings list for a numpy user questions, if not, I'm sorry. Im reading a binary file with numpy.fromfile() The binary file is constructed with some integers for checking the data for corruption. This is how the binary file is constructed: Timestamp [ 12 bytes] [ 1 int ] check [ 1 single ] Time stamp (single precision). [ 1 int ] check Data chunk [ 4*(no_sensor+2) bytes ] [ 1 int ] check [ no_sensor single ] Array of sensor readings (single precision). 
[ 1 int ] check The file continues this way [ Timestamp ] [ Data chunk ] [ Timestamp ] [ Data chunk ] .. no_sensor is file dependend int = 4 bytes single = 4 bytes This is my current procedure f = open(file,'rb') f.read(size_of_header) # The file contains a header, where fx. the no_sensor can be read. dt = np.dtype([('junk0', 'i4'), ('timestamp', 'f4'), ('junk1', 'i4'), ('junk2', 'i4'), ('sensors', ('f4',no_sensor)), ('junk3', 'i4')]) data = np.fromfile(f, dtype=dt) Now the data is read in and I can access it, but I have the 'junk' in the array, which annoys me. Is there a way to remove the junk data, or skip it with fromfile ? Another issue is that when accessing one sensor, I do it this way: data['sensors'][:,0] for the first sensor, would it be possible to just do: data['sensors'][0] ? Thank you! Sincerely Michael Klitgaard From tmp50 at ukr.net Thu Jun 16 06:59:28 2011 From: tmp50 at ukr.net (Dmitrey) Date: Thu, 16 Jun 2011 13:59:28 +0300 Subject: [SciPy-User] [ANN] OpenOpt suite 0.34 Message-ID: Hi all, I'm glad to inform you about new quarterly release 0.34 of the OOSuite package software (OpenOpt, FuncDesigner, SpaceFuncs, DerApproximator) . Main changes: * Python 3 compatibility * Lots of improvements and speedup for interval calculations * Now interalg can obtain all solutions of nonlinear equation (example) or systems of them (example) in the involved box lb_i <= x_i <= ub_i (bounds can be very large), possibly constrained (e.g. sin(x) + cos(y+x) > 0.5). * Many other improvements and speedup for interalg. See http://forum.openopt.org/viewtopic.php?id=425 for more details. Regards, D. -------------- next part -------------- An HTML attachment was scrubbed... URL: From webmaster at hasenkopf2000.net Thu Jun 16 08:01:45 2011 From: webmaster at hasenkopf2000.net (Andreas Hasenkopf) Date: Thu, 16 Jun 2011 14:01:45 +0200 Subject: [SciPy-User] Partial Derivative of a 3 dimensional NumPy array Message-ID: <4DF9F0A9.7090005@hasenkopf2000.net> Hello, as part of a little simulation I try to calculate partial derivatives of a 3D array (dering along of one of the three axes). My question is: Is there a simple way to do it? Or do I need to iterate over each 1D subslice of the array and call e.g. scipy.fftpack.diff() on each of those subslices? Thanks and CU Andi -- Andreas Hasenkopf Phone: +49 151 11728439 Homepage: http://www.hasenkopf2000.net GPG Pub Key: http://goo.gl/4mOsM -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 554 bytes Desc: OpenPGP digital signature URL: From zachary.pincus at yale.edu Thu Jun 16 08:58:27 2011 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Thu, 16 Jun 2011 08:58:27 -0400 Subject: [SciPy-User] Partial Derivative of a 3 dimensional NumPy array In-Reply-To: <4DF9F0A9.7090005@hasenkopf2000.net> References: <4DF9F0A9.7090005@hasenkopf2000.net> Message-ID: Check out numpy.gradient -- uses central differences in the interior and the first difference on the edges. It gives gradients in all directions from one call. Zach On Jun 16, 2011, at 8:01 AM, Andreas Hasenkopf wrote: > Hello, > as part of a little simulation I try to calculate partial derivatives of > a 3D array (dering along of one of the three axes). > My question is: Is there a simple way to do it? Or do I need to iterate > over each 1D subslice of the array and call e.g. scipy.fftpack.diff() on > each of those subslices? 
> > Thanks and CU > Andi > > -- > Andreas Hasenkopf > Phone: +49 151 11728439 > Homepage: http://www.hasenkopf2000.net > GPG Pub Key: http://goo.gl/4mOsM > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From josef.pktd at gmail.com Thu Jun 16 10:59:23 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 16 Jun 2011 10:59:23 -0400 Subject: [SciPy-User] check where method is defined Message-ID: What's the best way to check in which class or super class a method is defined ? I would like to write some tests that only apply if the specific distribution class defines the method. For example: Is _sf defined in the specific distribution class or is it the generic implementation in the superclass, rv_continuous >>> stats.gengamma._sf > Josef From jsseabold at gmail.com Thu Jun 16 11:18:20 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Thu, 16 Jun 2011 11:18:20 -0400 Subject: [SciPy-User] check where method is defined In-Reply-To: References: Message-ID: On Thu, Jun 16, 2011 at 10:59 AM, wrote: > What's the best way to check in which class or super class a method is defined ? > > I would like to write some tests that only apply if the specific > distribution class defines the method. > > For example: Is _sf defined in the specific distribution class or is > it the generic implementation in the superclass, rv_continuous > >>>> stats.gengamma._sf > > > Something like this? import inspect def get_class_that_defined_method(meth): obj = meth.im_self for cls in inspect.getmro(meth.im_class): if meth.__name__ in cls.__dict__: return cls return None http://stackoverflow.com/questions/961048/get-class-that-defined-method-in-python Skipper From josef.pktd at gmail.com Thu Jun 16 11:29:36 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 16 Jun 2011 11:29:36 -0400 Subject: [SciPy-User] orthonormal polynomials (cont.) In-Reply-To: References: Message-ID: .......... example with Emerson recursion laguerre for weight function gamma pdf with shape=0.5 instead of shape=1 >>> mnc = moments_gamma(np.arange(21), 0.5, 1) >>> myp = orthonormalpoly_moments(mnc, 10, scale=1) >>> innerprod = dop.inner_cont(myp, 0., 100, stats.gamma(0.5).pdf)[0] >>> np.max(np.abs(innerprod - np.eye(innerprod.shape[0]))) 5.1360515840315202e-10 >>> for pi in myp: print pi.coeffs ... [ 1.] [ 1.41421356 -0.70710678] [ 0.81649658 -2.44948974 0.61237244] [ 0.2981424 -2.23606798 3.35410197 -0.55901699] [ 0.07968191 -1.1155467 4.18330013 -4.18330013 0.52291252] [ 0.01679842 -0.37796447 2.64575131 -6.61437828 4.96078371 -0.49607837] [ 2.92422976e-03 -9.64995819e-02 1.08562030e+00 -5.06622805e+00 9.49917760e+00 -5.69950656e+00 4.74958880e-01] [ 4.33516662e-04 -1.97250081e-02 3.25462634e-01 -2.44096975e+00 8.54339413e+00 -1.28150912e+01 6.40754560e+00 -4.57681829e-01] [ 5.59667604e-05 -3.35800562e-03 7.63946279e-02 -8.40340907e-01 4.72691760e+00 -1.32353693e+01 1.65442116e+01 -7.09037640e+00 4.43148525e-01] [ 6.39881348e-06 -4.89509231e-04 1.46852769e-02 -2.22726700e-01 1.83749528e+00 -8.26872874e+00 1.92937004e+01 -2.06718219e+01 7.75193320e+00 -4.30662955e-01] I haven't had time to figure out QR yet. Josef On Wed, Jun 15, 2011 at 11:13 AM, Christopher Jordan-Squire wrote: > If you google around, there are numerically stable versions of > Gram-Schmidt/QR facotorization and they're quite simple to implement. 
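A concrete version of the round-trip check described above might look like the following sketch (a synthetic smoothed image stands in for a real micrograph, and the subpixel shift amounts are arbitrary):

import numpy as np
from scipy import ndimage

img = ndimage.gaussian_filter(np.random.rand(512, 512), 3)

for order in (1, 3):
    # shift by a subpixel amount and back, then compare with the original
    out = ndimage.shift(ndimage.shift(img, (0.5, 0.25), order=order),
                        (-0.5, -0.25), order=order)
    err = np.abs(out - img)[4:-4, 4:-4]   # trim edges dominated by the fill value
    print(order, err.max(), err.mean())

Keep in mind this measures two interpolations rather than one, and a smooth synthetic image will flatter both orders; your own data is the better test.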
You > just have to be slightly careful and not code it up the way it's taught in a > first linear algebra course. However, the numerical instability can show up > even for small numbers of basis vectors; the issue isn't the number of basis > vectors but whether they're approximately linearly dependent. > > -Chris JS > > On Tue, Jun 14, 2011 at 7:18 PM, Charles R Harris > wrote: >> >> >> On Tue, Jun 14, 2011 at 3:57 PM, wrote: >>> >>> On Tue, Jun 14, 2011 at 5:48 PM, Charles R Harris >>> wrote: >>> > >>> > >>> > On Tue, Jun 14, 2011 at 3:15 PM, Charles R Harris >>> > wrote: >>> >> >>> >> >>> >> On Tue, Jun 14, 2011 at 3:01 PM, wrote: >>> >>> >>> >>> On Tue, Jun 14, 2011 at 4:40 PM, Charles R Harris >>> >>> wrote: >>> >>> > >>> >>> > >>> >>> > On Tue, Jun 14, 2011 at 12:10 PM, nicky van foreest >>> >>> > >>> >>> > wrote: >>> >>> >> >>> >>> >> Hi, >>> >>> >> >>> >>> >> Without understanding the details... I recall from numerical >>> >>> >> recipes >>> >>> >> in C that Gram Schmidt is a very risky recipe. I don't know >>> >>> >> whether >>> >>> >> this advice also pertains to fitting polynomials, however, >>> >>> >>> >>> I read some warnings about the stability of Gram Schmidt, but the >>> >>> idea >>> >>> is that if I can choose the right weight function, then we should >>> >>> need >>> >>> only a few polynomials. So, I guess in that case numerical stability >>> >>> wouldn't be very relevant. >>> >>> >>> >>> >> >>> >>> >> Nicky >>> >>> >> >>> >>> >> On 14 June 2011 18:58, ? wrote: >>> >>> >> > (I'm continuing the story with orthogonal polynomial density >>> >>> >> > estimation, and found a nice new paper >>> >>> >> > >>> >>> >> > http://www.informaworld.com/smpp/content~db=all~content=a933669464 ) >>> >>> >> > >>> >>> >> > Last time I managed to get orthonormal polynomials out of scipy >>> >>> >> > with >>> >>> >> > weight 1, and it worked well for density estimation. >>> >>> >> > >>> >>> >> > Now, I would like to construct my own orthonormal polynomials >>> >>> >> > for >>> >>> >> > arbitrary weights. (The weights represent a base density around >>> >>> >> > which >>> >>> >> > we make the polynomial expansion). >>> >>> >> > >>> >>> >> > The reference refers to Gram-Schmidt or Emmerson recurrence. >>> >>> >> > >>> >>> >> > Is there a reasonably easy way to get the polynomial >>> >>> >> > coefficients >>> >>> >> > for >>> >>> >> > this with numscipython? >>> >>> >> > >>> >>> > >>> >>> > What do you mean by 'polynomial'? If you want the values of a set >>> >>> > of >>> >>> > polynomials orthonormal on a given set of points, you want the 'q' >>> >>> > in a >>> >>> > qr >>> >>> > factorization of a (row) weighted Vandermonde matrix.? However, I >>> >>> > would >>> >>> > suggest using a weighted chebvander instead for numerical >>> >>> > stability. >>> >>> >>> >>> Following your suggestion last time to use QR, I had figured out how >>> >>> to get the orthonormal basis for a given set of points. >>> >>> Now, I would like to get the functional version (not just for a given >>> >>> set of points), that is an orthonormal polynomial basis like Hermite, >>> >>> Legendre, Laguerre and Jacobi, only for any kind of weight function, >>> >>> where the weight function is chosen depending on the data. >>> >>> >>> >> >>> >> But in what basis? The columns of the inverse of 'R' in QR will give >>> >> you >>> >> the orthonormal polynomials as series in whatever basis you used for >>> >> the >>> >> columns of the pseudo-Vandermonde matrix. 
>>> >>> with basis you mean hear for example the power series ?(x**i, >>> i=0,1,..) ?that's what they use, but there is also a reference to >>> using fourier polynomials which I haven't looked at for this case. >>> >>> >> >>> > >>> > Example. >>> > >>> > In [1]: from numpy.polynomial.polynomial import polyvander >>> > >>> > In [2]: v = polyvander(linspace(-1, 1, 1000), 3) >>> > >>> > In [3]: around(inv(qr(v, mode='r'))*sqrt(1000./2), 5) >>> > Out[3]: >>> > array([[-0.70711,? 0.???? ,? 0.79057, -0.???? ], >>> > ?????? [ 0.???? ,? 1.22352, -0.???? , -2.80345], >>> > ?????? [ 0.???? ,? 0.???? , -2.36697,? 0.???? ], >>> > ?????? [ 0.???? ,? 0.???? ,? 0.???? ,? 4.66309]]) >>> > >>> > The columns are approx. the coefficients of the normalized Legendre >>> > functions as a power series up to a sign. >>> >>> Looks interesting. It will take me a while to figure out what this >>> does, but I think I get the idea. >>> >>> Thanks, >>> >> >> The normalization factor comes from integrating the columns of q, i.e., >> \int p^2 ~= dt*\sum_i (q_{ij})^2 = 2/1000. I really should have weighted the >> first and last rows of the Vandermonde matrix by 1/sqrt(2) so that the >> integration was trapazoidal, but you get the idea. >> >> Chuck >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From cweisiger at msg.ucsf.edu Thu Jun 16 11:31:28 2011 From: cweisiger at msg.ucsf.edu (Chris Weisiger) Date: Thu, 16 Jun 2011 08:31:28 -0700 Subject: [SciPy-User] Difference in quality from different interpolation orders In-Reply-To: <72b8d4c3-fb75-4633-b2ae-63efdc5248af@v8g2000yqb.googlegroups.com> References: <72b8d4c3-fb75-4633-b2ae-63efdc5248af@v8g2000yqb.googlegroups.com> Message-ID: On Thu, Jun 16, 2011 at 6:47 AM, denis wrote: > Chris, > could you give some details: 1d UnivariateSpline ? How many points > in / out ? s=0, i.e. interpolating ? > It Depends (TM). > Sorry, I should have been more explicit. I'm using methods like scipy.ndimage.affine_transform, scipy.ndimage.interpolation.shift, and scipy.ndimage.map_coordinates. These just accept a single "order" parameter which defaults to 3; I'm wondering how reducing that value will impact the quality of the results. The input data for affine_transform and map_coordinates is a 2D array, typically 512x512, while the input data for shift is a 3D volume, anywhere from 1x512x512 to 60x512x512. The image data is from optical microscopy, so it's generally pretty smooth. -Chris -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Thu Jun 16 11:45:25 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 16 Jun 2011 11:45:25 -0400 Subject: [SciPy-User] check where method is defined In-Reply-To: References: Message-ID: On Thu, Jun 16, 2011 at 11:18 AM, Skipper Seabold wrote: > On Thu, Jun 16, 2011 at 10:59 AM, ? wrote: >> What's the best way to check in which class or super class a method is defined ? >> >> I would like to write some tests that only apply if the specific >> distribution class defines the method. >> >> For example: Is _sf defined in the specific distribution class or is >> it the generic implementation in the superclass, rv_continuous >> >>>>> stats.gengamma._sf >> > > >> > > Something like this? 
> > import inspect > > def get_class_that_defined_method(meth): > ? ?obj = meth.im_self > ? ?for cls in inspect.getmro(meth.im_class): > ? ? ? ?if meth.__name__ in cls.__dict__: return cls > ? ?return None > > http://stackoverflow.com/questions/961048/get-class-that-defined-method-in-python Thanks, looks good >>> get_class_that_defined_method(stats.gengamma._sf) >>> get_class_that_defined_method(stats.gengamma._pdf) >>> get_class_that_defined_method(stats.gengamma.rvs) >>> get_class_that_defined_method(stats.gengamma.rvs) is stats.distributions.rv_generic True >>> get_class_that_defined_method(stats.gengamma.rvs) is stats.distributions.gengamma_gen False >>> get_class_that_defined_method(stats.gengamma.rvs) is stats.gengamma.__class__ False >>> get_class_that_defined_method(stats.gengamma._pdf) is stats.gengamma.__class__ True Josef > > Skipper > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From zachary.pincus at yale.edu Thu Jun 16 11:52:24 2011 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Thu, 16 Jun 2011 11:52:24 -0400 Subject: [SciPy-User] Difference in quality from different interpolation orders In-Reply-To: References: Message-ID: <8119A813-7EBD-4428-B5BE-C6482160D30C@yale.edu> Hi Chris, Interpolation is by definition making up data, so there's no clear way to evaluate "error induced" in the general case -- it depends on the image. You could decimate and then magnify a test image (using ndimage.zoom) and compare the that to the original to get a sense of the error from using different interpolators, say... but that's not really authoritative either since you're testing a roundtrip. Or you could just downsample the test image (not using any low-pass filtering; just do 'smaller = larger[::2,::2]') and try interpolating that back up to the original size. Or do the roundtrip the other direction... Personally, I find that the higher-order spline filters in ndimage are prone to ringing artifacts at any sort of sharp edges, so I use order=1 almost exclusively. If your micrographs are bandlimited by the optics to something below the sensor's Nyquist frequency, you should be fine with the higher order filters. For ringing, though, it seems that visual inspection is a pretty good way to check the results. Zach On Jun 15, 2011, at 5:34 PM, Chris Weisiger wrote: > Various methods in scipy use spline interpolation, and let you choose the order for the interpolation with the default being 3. I've noticed that for one task my program performs, order = 1 is about three times faster than order = 3, and visually I don't notice any decrease in data quality. However, visual inspection isn't enough. Is there some way I can measure the error introduced from using a lesser interpolation order? All else being equal, faster is better, but if it comes at a significant cost in data quality, then it's out of the question. 
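A minimal sketch of the decimate-and-restore comparison suggested above (the test image, zoom factor, and error metric are arbitrary choices, not anything from the thread):

import numpy as np
from scipy import ndimage

# smooth toy image standing in for a (band-limited) micrograph
yy, xx = np.mgrid[:256, :256]
img = np.sin(xx / 20.0) * np.cos(yy / 15.0)

small = img[::2, ::2]                       # naive downsample, no low-pass filter

for order in (1, 2, 3):
    restored = ndimage.zoom(small, 2.0, order=order)
    h = min(img.shape[0], restored.shape[0])   # guard against off-by-one shapes
    w = min(img.shape[1], restored.shape[1])
    err = np.sqrt(np.mean((img[:h, :w] - restored[:h, :w]) ** 2))
    print("order %d: rms roundtrip error %.5f" % (order, err))

The absolute numbers mean little, since the roundtrip is not a true error measure as noted above, but comparing orders on one's own data gives a rough feel for what the extra cost of order=3 buys.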
> > -Chris > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From cweisiger at msg.ucsf.edu Thu Jun 16 11:59:06 2011 From: cweisiger at msg.ucsf.edu (Chris Weisiger) Date: Thu, 16 Jun 2011 08:59:06 -0700 Subject: [SciPy-User] Difference in quality from different interpolation orders In-Reply-To: <8119A813-7EBD-4428-B5BE-C6482160D30C@yale.edu> References: <8119A813-7EBD-4428-B5BE-C6482160D30C@yale.edu> Message-ID: On Thu, Jun 16, 2011 at 8:52 AM, Zachary Pincus wrote: > Hi Chris, > > Interpolation is by definition making up data, so there's no clear way to > evaluate "error induced" in the general case -- it depends on the image. You > could decimate and then magnify a test image (using ndimage.zoom) and > compare the that to the original to get a sense of the error from using > different interpolators, say... but that's not really authoritative either > since you're testing a roundtrip. Or you could just downsample the test > image (not using any low-pass filtering; just do 'smaller = > larger[::2,::2]') and try interpolating that back up to the original size. > Or do the roundtrip the other direction... > > Personally, I find that the higher-order spline filters in ndimage are > prone to ringing artifacts at any sort of sharp edges, so I use order=1 > almost exclusively. If your micrographs are bandlimited by the optics to > something below the sensor's Nyquist frequency, you should be fine with the > higher order filters. For ringing, though, it seems that visual inspection > is a pretty good way to check the results. > > Okay, thanks for that information. Interesting that higher-order interpolations could actually make the problem worse. I'd assumed that "higher order == more accurate" would hold true, but I guess it makes sense that for sharply discontinuous inputs, that breaks down. I'm still very much inexperienced when it comes to scientific programming; I've only really done application programming and graphical work before. There's a lot of new background knowledge I have to get for many of these projects... -Chris -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Jun 16 12:00:34 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 16 Jun 2011 10:00:34 -0600 Subject: [SciPy-User] orthonormal polynomials (cont.) In-Reply-To: References: Message-ID: On Thu, Jun 16, 2011 at 9:29 AM, wrote: > .......... > > example with Emerson recursion > > laguerre for weight function gamma pdf with shape=0.5 instead of shape=1 > > >>> mnc = moments_gamma(np.arange(21), 0.5, 1) > >>> myp = orthonormalpoly_moments(mnc, 10, scale=1) > >>> innerprod = dop.inner_cont(myp, 0., 100, stats.gamma(0.5).pdf)[0] > >>> np.max(np.abs(innerprod - np.eye(innerprod.shape[0]))) > 5.1360515840315202e-10 > >>> for pi in myp: print pi.coeffs > ... > [ 1.] 
> [ 1.41421356 -0.70710678] > [ 0.81649658 -2.44948974 0.61237244] > [ 0.2981424 -2.23606798 3.35410197 -0.55901699] > [ 0.07968191 -1.1155467 4.18330013 -4.18330013 0.52291252] > [ 0.01679842 -0.37796447 2.64575131 -6.61437828 4.96078371 -0.49607837] > [ 2.92422976e-03 -9.64995819e-02 1.08562030e+00 -5.06622805e+00 > 9.49917760e+00 -5.69950656e+00 4.74958880e-01] > [ 4.33516662e-04 -1.97250081e-02 3.25462634e-01 -2.44096975e+00 > 8.54339413e+00 -1.28150912e+01 6.40754560e+00 -4.57681829e-01] > [ 5.59667604e-05 -3.35800562e-03 7.63946279e-02 -8.40340907e-01 > 4.72691760e+00 -1.32353693e+01 1.65442116e+01 -7.09037640e+00 > 4.43148525e-01] > [ 6.39881348e-06 -4.89509231e-04 1.46852769e-02 -2.22726700e-01 > 1.83749528e+00 -8.26872874e+00 1.92937004e+01 -2.06718219e+01 > 7.75193320e+00 -4.30662955e-01] > > I haven't had time to figure out QR yet. > > One way to think of QR in this context is to think of the columns of the Vandermonde matrix V as the basis functions. Then QR = V Q = V*R^-1 Since R and its inverse are upper triangular, the orthonormal columns of Q are expressed as linear combinations of the basis functions in the columns of V of degree <= the column index. For general numerical reasons I would use a Chebyshev basis rather than powers of x. I can't find a reference on the Emerson recursion. I'm guessing that it is for a power series basis and generates new columns on the fly as c_i = x*c_{i -1} so that the new column is orthogonal to all the columns c_j, j < i - 2. Anyway, that's what I would do if I wanted better numerical conditioning ;) Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Thu Jun 16 12:38:03 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 16 Jun 2011 12:38:03 -0400 Subject: [SciPy-User] orthonormal polynomials (cont.) In-Reply-To: References: Message-ID: On Thu, Jun 16, 2011 at 12:00 PM, Charles R Harris wrote: > > > On Thu, Jun 16, 2011 at 9:29 AM, wrote: >> >> .......... >> >> example with Emerson recursion >> >> laguerre for weight function gamma pdf with shape=0.5 instead of shape=1 >> >> >>> mnc = moments_gamma(np.arange(21), 0.5, 1) >> >>> myp = orthonormalpoly_moments(mnc, 10, scale=1) >> >>> innerprod = dop.inner_cont(myp, 0., 100, stats.gamma(0.5).pdf)[0] >> >>> np.max(np.abs(innerprod - np.eye(innerprod.shape[0]))) >> 5.1360515840315202e-10 >> >>> for pi in myp: print pi.coeffs >> ... >> [ 1.] >> [ 1.41421356 -0.70710678] >> [ 0.81649658 -2.44948974 ?0.61237244] >> [ 0.2981424 ?-2.23606798 ?3.35410197 -0.55901699] >> [ 0.07968191 -1.1155467 ? 4.18330013 -4.18330013 ?0.52291252] >> [ 0.01679842 -0.37796447 ?2.64575131 -6.61437828 ?4.96078371 -0.49607837] >> [ ?2.92422976e-03 ?-9.64995819e-02 ? 1.08562030e+00 ?-5.06622805e+00 >> ? 9.49917760e+00 ?-5.69950656e+00 ? 4.74958880e-01] >> [ ?4.33516662e-04 ?-1.97250081e-02 ? 3.25462634e-01 ?-2.44096975e+00 >> ? 8.54339413e+00 ?-1.28150912e+01 ? 6.40754560e+00 ?-4.57681829e-01] >> [ ?5.59667604e-05 ?-3.35800562e-03 ? 7.63946279e-02 ?-8.40340907e-01 >> ? 4.72691760e+00 ?-1.32353693e+01 ? 1.65442116e+01 ?-7.09037640e+00 >> ? 4.43148525e-01] >> [ ?6.39881348e-06 ?-4.89509231e-04 ? 1.46852769e-02 ?-2.22726700e-01 >> ? 1.83749528e+00 ?-8.26872874e+00 ? 1.92937004e+01 ?-2.06718219e+01 >> ? 7.75193320e+00 ?-4.30662955e-01] >> >> I haven't had time to figure out QR yet. >> > > One way to think of QR in this context is to think of the columns of the > Vandermonde matrix V as the basis functions. 
Then > > QR = V > Q = V*R^-1 > > Since R and its inverse are upper triangular, the orthonormal columns of Q > are expressed as linear combinations of the basis functions in the columns > of V of degree <= the column index. For general numerical reasons I would > use a Chebyshev basis rather than powers of x. > > I can't find a reference on the Emerson recursion. I'm guessing that it is > for a power series basis and generates new columns on the fly as c_i = > x*c_{i -1} so that the new column is orthogonal to all the columns c_j, j < > i - 2. Anyway, that's what I would do if I wanted better numerical > conditioning ;) I pretty much implemented this after I discovered that they use zero-based indexing and it didn't look too scary J. C. W Rayner, O. Thas, and B. De Boeck, ?A GENERALIZED EMERSON RECURRENCE RELATION,? Australian & New Zealand Journal of Statistics 50, no. 3 (September 1, 2008): 235-240. http://onlinelibrary.wiley.com/doi/10.1111/j.1467-842X.2008.00514.x/abstract Before that, I tried your QR example for a while but I had two problems, * the resulting polynomials are not orthogonal if I integrate, int poly_i(x) * poly_j(x) dx * I need orthogonality with respect to a weight function: int poly_i(x) * poly_j(x) * w(x) dx == (i==j).astype(int) The first I may not need anymore. emerson works for continuous functions. The second I would like to figure out when I move to discrete distribution, where I have sum instead of integral. (But after I finish with the continuous distributions). sum_{x in X} poly_i(x) * poly_j(x) * w(x) dx == (i==j).astype(int) Is there a way to get weights into QR? The Emerson recursion that I have only works with power series, so I still don't know how to do it with any other basis functions. Josef > > > > Chuck > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From zachary.pincus at yale.edu Thu Jun 16 13:32:39 2011 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Thu, 16 Jun 2011 13:32:39 -0400 Subject: [SciPy-User] Difference in quality from different interpolation orders In-Reply-To: References: <8119A813-7EBD-4428-B5BE-C6482160D30C@yale.edu> Message-ID: > Okay, thanks for that information. Interesting that higher-order interpolations could actually make the problem worse. I'd assumed that "higher order == more accurate" would hold true, but I guess it makes sense that for sharply discontinuous inputs, that breaks down. Higher-order has two relevant implications here: (1) More parameters to fit, which can either increase or decrease the plausibility of the interpolation (the classic overfitting vs. underfitting dilemma). (2) To have enough input to constrain the parameters, the fit is made over a larger window. Order-1 interpolation just looks at the two neighboring pixels to fit the interpolating line; higher orders look at more distant pixels, which can be helpful if the image is changing slowly, but in any case is slower as you've seen. > I'm still very much inexperienced when it comes to scientific programming; I've only really done application programming and graphical work before. There's a lot of new background knowledge I have to get for many of these projects... There's lots of people with good scientific imaging experience on this list. (My PhD's in image analysis for microscopy, for example, and I'm not the only one with similar experience.) Also check out the scikits.image project and list. 
Also, as far as I can tell, folks on the list are pretty receptive to general "how best to do task X" questions in addition to scipy-specific stuff. Zach From charlesr.harris at gmail.com Thu Jun 16 13:32:47 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 16 Jun 2011 11:32:47 -0600 Subject: [SciPy-User] orthonormal polynomials (cont.) In-Reply-To: References: Message-ID: On Thu, Jun 16, 2011 at 10:38 AM, wrote: > On Thu, Jun 16, 2011 at 12:00 PM, Charles R Harris > wrote: > > > > > > On Thu, Jun 16, 2011 at 9:29 AM, wrote: > >> > >> .......... > >> > >> example with Emerson recursion > >> > >> laguerre for weight function gamma pdf with shape=0.5 instead of shape=1 > >> > >> >>> mnc = moments_gamma(np.arange(21), 0.5, 1) > >> >>> myp = orthonormalpoly_moments(mnc, 10, scale=1) > >> >>> innerprod = dop.inner_cont(myp, 0., 100, stats.gamma(0.5).pdf)[0] > >> >>> np.max(np.abs(innerprod - np.eye(innerprod.shape[0]))) > >> 5.1360515840315202e-10 > >> >>> for pi in myp: print pi.coeffs > >> ... > >> [ 1.] > >> [ 1.41421356 -0.70710678] > >> [ 0.81649658 -2.44948974 0.61237244] > >> [ 0.2981424 -2.23606798 3.35410197 -0.55901699] > >> [ 0.07968191 -1.1155467 4.18330013 -4.18330013 0.52291252] > >> [ 0.01679842 -0.37796447 2.64575131 -6.61437828 4.96078371 > -0.49607837] > >> [ 2.92422976e-03 -9.64995819e-02 1.08562030e+00 -5.06622805e+00 > >> 9.49917760e+00 -5.69950656e+00 4.74958880e-01] > >> [ 4.33516662e-04 -1.97250081e-02 3.25462634e-01 -2.44096975e+00 > >> 8.54339413e+00 -1.28150912e+01 6.40754560e+00 -4.57681829e-01] > >> [ 5.59667604e-05 -3.35800562e-03 7.63946279e-02 -8.40340907e-01 > >> 4.72691760e+00 -1.32353693e+01 1.65442116e+01 -7.09037640e+00 > >> 4.43148525e-01] > >> [ 6.39881348e-06 -4.89509231e-04 1.46852769e-02 -2.22726700e-01 > >> 1.83749528e+00 -8.26872874e+00 1.92937004e+01 -2.06718219e+01 > >> 7.75193320e+00 -4.30662955e-01] > >> > >> I haven't had time to figure out QR yet. > >> > > > > One way to think of QR in this context is to think of the columns of the > > Vandermonde matrix V as the basis functions. Then > > > > QR = V > > Q = V*R^-1 > > > > Since R and its inverse are upper triangular, the orthonormal columns of > Q > > are expressed as linear combinations of the basis functions in the > columns > > of V of degree <= the column index. For general numerical reasons I would > > use a Chebyshev basis rather than powers of x. > > > > I can't find a reference on the Emerson recursion. I'm guessing that it > is > > for a power series basis and generates new columns on the fly as c_i = > > x*c_{i -1} so that the new column is orthogonal to all the columns c_j, j > < > > i - 2. Anyway, that's what I would do if I wanted better numerical > > conditioning ;) > > I pretty much implemented this after I discovered that they use > zero-based indexing and it didn't look too scary > > J. C. W Rayner, O. Thas, and B. De Boeck, ?A GENERALIZED EMERSON > RECURRENCE > RELATION,? Australian & New Zealand Journal of Statistics 50, no. 3 > (September 1, 2008): 235-240. > > http://onlinelibrary.wiley.com/doi/10.1111/j.1467-842X.2008.00514.x/abstract > > Before that, I tried your QR example for a while but I had two problems, > > * the resulting polynomials are not orthogonal if I integrate, int > poly_i(x) * poly_j(x) dx > * I need orthogonality with respect to a weight function: int > poly_i(x) * poly_j(x) * w(x) dx == (i==j).astype(int) > > I think the question is how you perform the integration. 
The QR does it numerically with the sample points passed into the *vander functions and I used uniform spacing for uniform measure. Weighting the rows with the sqrt of the weight function will produce polynomials orthogonal for that weight. The whole thing can be vastly improved by using selected sample points if the weight function is an actual function that can be evaluated at arbitrary points. Send me an example and I will work it out for you. The first I may not need anymore. emerson works for continuous functions. > The second I would like to figure out when I move to discrete > distribution, where I have sum instead of integral. (But after I > finish with the continuous distributions). > sum_{x in X} poly_i(x) * poly_j(x) * w(x) dx == (i==j).astype(int) > Is there a way to get weights into QR? > > The Emerson recursion that I have only works with power series, so I > still don't know how to do it with any other basis functions. > > If it is what I think it is it shouldn't be difficult. I can't get to the reference you linked. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Jun 16 13:42:38 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 16 Jun 2011 11:42:38 -0600 Subject: [SciPy-User] orthonormal polynomials (cont.) In-Reply-To: References: Message-ID: On Thu, Jun 16, 2011 at 11:32 AM, Charles R Harris < charlesr.harris at gmail.com> wrote: > > > On Thu, Jun 16, 2011 at 10:38 AM, wrote: > >> On Thu, Jun 16, 2011 at 12:00 PM, Charles R Harris >> wrote: >> > >> > >> > On Thu, Jun 16, 2011 at 9:29 AM, wrote: >> >> >> >> .......... >> >> >> >> example with Emerson recursion >> >> >> >> laguerre for weight function gamma pdf with shape=0.5 instead of >> shape=1 >> >> >> >> >>> mnc = moments_gamma(np.arange(21), 0.5, 1) >> >> >>> myp = orthonormalpoly_moments(mnc, 10, scale=1) >> >> >>> innerprod = dop.inner_cont(myp, 0., 100, stats.gamma(0.5).pdf)[0] >> >> >>> np.max(np.abs(innerprod - np.eye(innerprod.shape[0]))) >> >> 5.1360515840315202e-10 >> >> >>> for pi in myp: print pi.coeffs >> >> ... >> >> [ 1.] >> >> [ 1.41421356 -0.70710678] >> >> [ 0.81649658 -2.44948974 0.61237244] >> >> [ 0.2981424 -2.23606798 3.35410197 -0.55901699] >> >> [ 0.07968191 -1.1155467 4.18330013 -4.18330013 0.52291252] >> >> [ 0.01679842 -0.37796447 2.64575131 -6.61437828 4.96078371 >> -0.49607837] >> >> [ 2.92422976e-03 -9.64995819e-02 1.08562030e+00 -5.06622805e+00 >> >> 9.49917760e+00 -5.69950656e+00 4.74958880e-01] >> >> [ 4.33516662e-04 -1.97250081e-02 3.25462634e-01 -2.44096975e+00 >> >> 8.54339413e+00 -1.28150912e+01 6.40754560e+00 -4.57681829e-01] >> >> [ 5.59667604e-05 -3.35800562e-03 7.63946279e-02 -8.40340907e-01 >> >> 4.72691760e+00 -1.32353693e+01 1.65442116e+01 -7.09037640e+00 >> >> 4.43148525e-01] >> >> [ 6.39881348e-06 -4.89509231e-04 1.46852769e-02 -2.22726700e-01 >> >> 1.83749528e+00 -8.26872874e+00 1.92937004e+01 -2.06718219e+01 >> >> 7.75193320e+00 -4.30662955e-01] >> >> >> >> I haven't had time to figure out QR yet. >> >> >> > >> > One way to think of QR in this context is to think of the columns of the >> > Vandermonde matrix V as the basis functions. Then >> > >> > QR = V >> > Q = V*R^-1 >> > >> > Since R and its inverse are upper triangular, the orthonormal columns of >> Q >> > are expressed as linear combinations of the basis functions in the >> columns >> > of V of degree <= the column index. For general numerical reasons I >> would >> > use a Chebyshev basis rather than powers of x. 
>> > >> > I can't find a reference on the Emerson recursion. I'm guessing that it >> is >> > for a power series basis and generates new columns on the fly as c_i = >> > x*c_{i -1} so that the new column is orthogonal to all the columns c_j, >> j < >> > i - 2. Anyway, that's what I would do if I wanted better numerical >> > conditioning ;) >> >> I pretty much implemented this after I discovered that they use >> zero-based indexing and it didn't look too scary >> >> J. C. W Rayner, O. Thas, and B. De Boeck, ?A GENERALIZED EMERSON >> RECURRENCE >> RELATION,? Australian & New Zealand Journal of Statistics 50, no. 3 >> (September 1, 2008): 235-240. >> >> http://onlinelibrary.wiley.com/doi/10.1111/j.1467-842X.2008.00514.x/abstract >> >> Before that, I tried your QR example for a while but I had two problems, >> >> * the resulting polynomials are not orthogonal if I integrate, int >> poly_i(x) * poly_j(x) dx >> * I need orthogonality with respect to a weight function: int >> poly_i(x) * poly_j(x) * w(x) dx == (i==j).astype(int) >> >> > I think the question is how you perform the integration. The QR does it > numerically with the sample points passed into the *vander functions and I > used uniform spacing for uniform measure. Weighting the rows with the sqrt > of the weight function will produce polynomials orthogonal for that weight. > The whole thing can be vastly improved by using selected sample points if > the weight function is an actual function that can be evaluated at arbitrary > points. Send me an example and I will work it out for you. > > The first I may not need anymore. emerson works for continuous functions. >> The second I would like to figure out when I move to discrete >> distribution, where I have sum instead of integral. (But after I >> finish with the continuous distributions). >> sum_{x in X} poly_i(x) * poly_j(x) * w(x) dx == (i==j).astype(int) >> Is there a way to get weights into QR? >> >> The Emerson recursion that I have only works with power series, so I >> still don't know how to do it with any other basis functions. >> >> > If it is what I think it is it shouldn't be difficult. I can't get to the > reference you linked. > > OK, I found it, no surprises, it's the standard three term recurrence with expectations replacing integrals. Are you using sampled data? I thought you wanted polynomials for a specified weight over an interval. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Thu Jun 16 14:33:07 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 16 Jun 2011 14:33:07 -0400 Subject: [SciPy-User] orthonormal polynomials (cont.) In-Reply-To: References: Message-ID: On Thu, Jun 16, 2011 at 1:42 PM, Charles R Harris wrote: > > > On Thu, Jun 16, 2011 at 11:32 AM, Charles R Harris > wrote: >> >> >> On Thu, Jun 16, 2011 at 10:38 AM, wrote: >>> >>> On Thu, Jun 16, 2011 at 12:00 PM, Charles R Harris >>> wrote: >>> > >>> > >>> > On Thu, Jun 16, 2011 at 9:29 AM, wrote: >>> >> >>> >> .......... >>> >> >>> >> example with Emerson recursion >>> >> >>> >> laguerre for weight function gamma pdf with shape=0.5 instead of >>> >> shape=1 >>> >> >>> >> >>> mnc = moments_gamma(np.arange(21), 0.5, 1) >>> >> >>> myp = orthonormalpoly_moments(mnc, 10, scale=1) >>> >> >>> innerprod = dop.inner_cont(myp, 0., 100, stats.gamma(0.5).pdf)[0] >>> >> >>> np.max(np.abs(innerprod - np.eye(innerprod.shape[0]))) >>> >> 5.1360515840315202e-10 >>> >> >>> for pi in myp: print pi.coeffs >>> >> ... >>> >> [ 1.] 
>>> >> [ 1.41421356 -0.70710678] >>> >> [ 0.81649658 -2.44948974 ?0.61237244] >>> >> [ 0.2981424 ?-2.23606798 ?3.35410197 -0.55901699] >>> >> [ 0.07968191 -1.1155467 ? 4.18330013 -4.18330013 ?0.52291252] >>> >> [ 0.01679842 -0.37796447 ?2.64575131 -6.61437828 ?4.96078371 >>> >> -0.49607837] >>> >> [ ?2.92422976e-03 ?-9.64995819e-02 ? 1.08562030e+00 ?-5.06622805e+00 >>> >> ? 9.49917760e+00 ?-5.69950656e+00 ? 4.74958880e-01] >>> >> [ ?4.33516662e-04 ?-1.97250081e-02 ? 3.25462634e-01 ?-2.44096975e+00 >>> >> ? 8.54339413e+00 ?-1.28150912e+01 ? 6.40754560e+00 ?-4.57681829e-01] >>> >> [ ?5.59667604e-05 ?-3.35800562e-03 ? 7.63946279e-02 ?-8.40340907e-01 >>> >> ? 4.72691760e+00 ?-1.32353693e+01 ? 1.65442116e+01 ?-7.09037640e+00 >>> >> ? 4.43148525e-01] >>> >> [ ?6.39881348e-06 ?-4.89509231e-04 ? 1.46852769e-02 ?-2.22726700e-01 >>> >> ? 1.83749528e+00 ?-8.26872874e+00 ? 1.92937004e+01 ?-2.06718219e+01 >>> >> ? 7.75193320e+00 ?-4.30662955e-01] >>> >> >>> >> I haven't had time to figure out QR yet. >>> >> >>> > >>> > One way to think of QR in this context is to think of the columns of >>> > the >>> > Vandermonde matrix V as the basis functions. Then >>> > >>> > QR = V >>> > Q = V*R^-1 >>> > >>> > Since R and its inverse are upper triangular, the orthonormal columns >>> > of Q >>> > are expressed as linear combinations of the basis functions in the >>> > columns >>> > of V of degree <= the column index. For general numerical reasons I >>> > would >>> > use a Chebyshev basis rather than powers of x. >>> > >>> > I can't find a reference on the Emerson recursion. I'm guessing that it >>> > is >>> > for a power series basis and generates new columns on the fly as c_i = >>> > x*c_{i -1} so that the new column is orthogonal to all the columns c_j, >>> > j < >>> > i - 2. Anyway, that's what I would do if I wanted better numerical >>> > conditioning ;) >>> >>> I pretty much implemented this after I discovered that they use >>> zero-based indexing and it didn't look too scary >>> >>> ? ?J. C. W Rayner, O. Thas, and B. De Boeck, ?A GENERALIZED EMERSON >>> RECURRENCE >>> ? ?RELATION,? Australian & New Zealand Journal of Statistics 50, no. 3 >>> ? ?(September 1, 2008): 235-240. >>> >>> http://onlinelibrary.wiley.com/doi/10.1111/j.1467-842X.2008.00514.x/abstract >>> >>> Before that, I tried your QR example for a while but I had two problems, >>> >>> * the resulting polynomials are not orthogonal if I integrate, ?int >>> poly_i(x) * poly_j(x) dx >>> * I need orthogonality with respect to a weight function: ?int >>> poly_i(x) * poly_j(x) * w(x) dx == (i==j).astype(int) >>> >> >> I think the question is how you perform the integration. The QR does it >> numerically with the sample points passed into the *vander functions and I >> used uniform spacing for uniform measure. Weighting the rows with the sqrt >> of the weight function will produce polynomials orthogonal for that weight. >> The whole thing can be vastly improved by using selected sample points if >> the weight function is an actual function that can be evaluated at arbitrary >> points. Send me an example and I will work it out for you. >> >>> The first I may not need anymore. emerson works for continuous functions. >>> The second I would like to figure out when I move to discrete >>> distribution, where I have sum instead of integral. (But after I >>> finish with the continuous distributions). >>> sum_{x in X} ?poly_i(x) * poly_j(x) * w(x) dx == (i==j).astype(int) >>> Is there a way to get weights into QR? 
>>> >>> The Emerson recursion that I have only works with power series, so I >>> still don't know how to do it with any other basis functions. >>> >> >> If it is what I think it is it shouldn't be difficult. I can't get to the >> reference you linked. >> > > OK, I found it, no surprises, it's the standard three term recurrence with > expectations replacing integrals. Are you using sampled data? I thought you > wanted polynomials for a specified weight over an interval. For me this is still scipy.special, which is not my area, and not yet stats. This is just to construct a polynomial basis, that then can be used for density estimation or testing, similar to the last time I did this. (I have not gotten this far yet with the new version, I'm still writing tests for emerson.) we can get a density estimate with pdf_estimated(x) = w(x) sum_{i} ahat_{i} * poly_{i} (x) with ahat{i} = sum_{k} poly_{i} (xdata_{k}) that's the same as last time, and where the actual data comes in. w(x) is a base density, and the rest is a polynomial expansion around it. In my previous examples, I assumed w(x) is 1 (Lebesque measure) There are some variation on how the expansion of the density is done, but all start with a polynomial basis for the expansion. I hope to have some examples soon. thanks, Josef > > Chuck > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From alan.isaac at gmail.com Thu Jun 16 15:38:09 2011 From: alan.isaac at gmail.com (Alan G Isaac) Date: Thu, 16 Jun 2011 15:38:09 -0400 Subject: [SciPy-User] which EPD? Message-ID: <4DFA5BA1.9070802@gmail.com> I'm going to run some Python simulations on a cluster. I'm the first, so I have to request the Python distribution that I need. I'm planning to ask for the 64 bit Enthought Python Distribution (Red Hat version, since that's the OS). BUT I want to ask first: is there any reason to avoid the 64 bit version in favor of the 32 bit version? (E.g., missing extension packages?) I have a **very** vague recollection that there was some kind of problem on 64 bits with Fortran dependencies. Is that simply wrong? Are there any other considerations? Thanks, Alan Isaac From robert.kern at gmail.com Thu Jun 16 16:16:15 2011 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 16 Jun 2011 15:16:15 -0500 Subject: [SciPy-User] which EPD? In-Reply-To: <4DFA5BA1.9070802@gmail.com> References: <4DFA5BA1.9070802@gmail.com> Message-ID: On Thu, Jun 16, 2011 at 14:38, Alan G Isaac wrote: > I'm going to run some Python simulations on a cluster. > I'm the first, so I have to request the Python distribution > that I need. I'm planning to ask for the 64 bit Enthought Python > Distribution (Red Hat version, since that's the OS). > > BUT I want to ask first: is there any reason to > avoid the 64 bit version in favor of the 32 bit > version? ?(E.g., missing extension packages?) > I have a **very** vague recollection that there > was some kind of problem on 64 bits with Fortran > dependencies. ?Is that simply wrong? ?Are there > any other considerations? You will probably get better help on epd-users at enthought.com . I am not aware of any problems with the 64-bit RH5 version of EPD, at least not on a stock install of RH5. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? 
-- Umberto Eco From kwatford+scipy at gmail.com Thu Jun 16 16:24:30 2011 From: kwatford+scipy at gmail.com (Ken Watford) Date: Thu, 16 Jun 2011 16:24:30 -0400 Subject: [SciPy-User] which EPD? In-Reply-To: <4DFA5BA1.9070802@gmail.com> References: <4DFA5BA1.9070802@gmail.com> Message-ID: On Thu, Jun 16, 2011 at 3:38 PM, Alan G Isaac wrote: > I'm going to run some Python simulations on a cluster. > I'm the first, so I have to request the Python distribution > that I need. I'm planning to ask for the 64 bit Enthought Python > Distribution (Red Hat version, since that's the OS). > > BUT I want to ask first: is there any reason to > avoid the 64 bit version in favor of the 32 bit > version? ?(E.g., missing extension packages?) > I have a **very** vague recollection that there > was some kind of problem on 64 bits with Fortran > dependencies. ?Is that simply wrong? ?Are there > any other considerations? > > Thanks, > Alan Isaac Since you can download and try the 32-bit version before you buy a license, and this does not (normally) require any admin privileges, why not give it a try? The main thing that 64-bit buys you is address space. If you can get along with the memory constraints of a 32-bit process, then great, just use that. If not, you'll need the 64-bit version even if it has other potential issues. And since you get access to all versions of EPD with any paid license, you could just have your cluster admin install both versions. It won't cost any more, and the setup is trivial. From david_baddeley at yahoo.com.au Thu Jun 16 17:46:55 2011 From: david_baddeley at yahoo.com.au (David Baddeley) Date: Thu, 16 Jun 2011 14:46:55 -0700 (PDT) Subject: [SciPy-User] Difference in quality from different interpolation orders In-Reply-To: References: <8119A813-7EBD-4428-B5BE-C6482160D30C@yale.edu> Message-ID: <643232.17474.qm@web113414.mail.gq1.yahoo.com> I'd have to disagree with Zach on the making up data count. If you've got microscopy images, you know that the original data is band limited, and can in theory reconstruct it perfectly from your samples (assuming you've satisfied Nyquist). To do this you'd theoretically have to use sinc interpolation, but as this is computationally expensive (and only valid on an infinite interval) most people approximate with cubic-spline instead. In a lot of circumstances, linear will be sufficient, but it depends very much on the application - one nasty feature of linear interpolation is that the derivative is discontinuous, whereas for cubic spline it is guaranteed to be continuous. What are you trying to do with the images that makes the speed difference so important? The other thing you've got to watch with microscopy images is the noise .... cheers, David ________________________________ From: Chris Weisiger To: SciPy Users List Sent: Fri, 17 June, 2011 3:59:06 AM Subject: Re: [SciPy-User] Difference in quality from different interpolation orders On Thu, Jun 16, 2011 at 8:52 AM, Zachary Pincus wrote: Hi Chris, > >Interpolation is by definition making up data, so there's no clear way to >evaluate "error induced" in the general case -- it depends on the image. You >could decimate and then magnify a test image (using ndimage.zoom) and compare >the that to the original to get a sense of the error from using different >interpolators, say... but that's not really authoritative either since you're >testing a roundtrip. 
Or you could just downsample the test image (not using any >low-pass filtering; just do 'smaller = larger[::2,::2]') and try interpolating >that back up to the original size. Or do the roundtrip the other direction... > >Personally, I find that the higher-order spline filters in ndimage are prone to >ringing artifacts at any sort of sharp edges, so I use order=1 almost >exclusively. If your micrographs are bandlimited by the optics to something >below the sensor's Nyquist frequency, you should be fine with the higher order >filters. For ringing, though, it seems that visual inspection is a pretty good >way to check the results. > > Okay, thanks for that information. Interesting that higher-order interpolations could actually make the problem worse. I'd assumed that "higher order == more accurate" would hold true, but I guess it makes sense that for sharply discontinuous inputs, that breaks down. I'm still very much inexperienced when it comes to scientific programming; I've only really done application programming and graphical work before. There's a lot of new background knowledge I have to get for many of these projects... -Chris -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan.isaac at gmail.com Thu Jun 16 17:47:03 2011 From: alan.isaac at gmail.com (Alan G Isaac) Date: Thu, 16 Jun 2011 17:47:03 -0400 Subject: [SciPy-User] which EPD? In-Reply-To: References: <4DFA5BA1.9070802@gmail.com> Message-ID: <4DFA79D7.1030905@gmail.com> On 6/16/2011 4:16 PM, Robert Kern wrote: > You will probably get better help on epd-users at enthought.com . I am > not aware of any problems with the 64-bit RH5 version of EPD, at least > not on a stock install of RH5. I was unaware of that mailing list. Thanks! Alan From david_baddeley at yahoo.com.au Thu Jun 16 17:54:49 2011 From: david_baddeley at yahoo.com.au (David Baddeley) Date: Thu, 16 Jun 2011 14:54:49 -0700 (PDT) Subject: [SciPy-User] which EPD? In-Reply-To: <4DFA5BA1.9070802@gmail.com> References: <4DFA5BA1.9070802@gmail.com> Message-ID: <234811.59531.qm@web113419.mail.gq1.yahoo.com> Can't really comment on EPD, but for the stock ubuntu/debian packages the 64 bit versions seem noticeably faster, if somewhat more memory hungry - I've always thought that this was probably due to more compiler optimisations (SSE etc ..) being on by default due to the higher minimum hardware level. Wouldn't be suprised if this was also the case for EPD. cheers, David ----- Original Message ---- From: Alan G Isaac To: SciPy User Sent: Fri, 17 June, 2011 7:38:09 AM Subject: [SciPy-User] which EPD? I'm going to run some Python simulations on a cluster. I'm the first, so I have to request the Python distribution that I need. I'm planning to ask for the 64 bit Enthought Python Distribution (Red Hat version, since that's the OS). BUT I want to ask first: is there any reason to avoid the 64 bit version in favor of the 32 bit version? (E.g., missing extension packages?) I have a **very** vague recollection that there was some kind of problem on 64 bits with Fortran dependencies. Is that simply wrong? Are there any other considerations? 
Thanks, Alan Isaac _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user From cweisiger at msg.ucsf.edu Thu Jun 16 17:59:42 2011 From: cweisiger at msg.ucsf.edu (Chris Weisiger) Date: Thu, 16 Jun 2011 14:59:42 -0700 Subject: [SciPy-User] Difference in quality from different interpolation orders In-Reply-To: <643232.17474.qm@web113414.mail.gq1.yahoo.com> References: <8119A813-7EBD-4428-B5BE-C6482160D30C@yale.edu> <643232.17474.qm@web113414.mail.gq1.yahoo.com> Message-ID: On Thu, Jun 16, 2011 at 2:46 PM, David Baddeley wrote: > What are you trying to do with the images that makes the speed difference > so important? The other thing you've got to watch with microscopy images is > the noise .... > > What I'm doing here is aligning multiple camera views and saving the result. I certainly agree that accuracy is preferable to speed for this application; I was just trying to figure out if there was any loss of accuracy. All else being equal, faster is better than slower, especially when you have hundreds of GB of data to process. Noise is handled in a separate postprocessing step. So long as I don't make the noise problem worse, I'm not worried about that here. -Chris -------------- next part -------------- An HTML attachment was scrubbed... URL: From cweisiger at msg.ucsf.edu Thu Jun 16 19:08:00 2011 From: cweisiger at msg.ucsf.edu (Chris Weisiger) Date: Thu, 16 Jun 2011 16:08:00 -0700 Subject: [SciPy-User] Difference in quality from different interpolation orders In-Reply-To: <77469.24443.qm@web113406.mail.gq1.yahoo.com> References: <8119A813-7EBD-4428-B5BE-C6482160D30C@yale.edu> <643232.17474.qm@web113414.mail.gq1.yahoo.com> <77469.24443.qm@web113406.mail.gq1.yahoo.com> Message-ID: On Thu, Jun 16, 2011 at 3:51 PM, David Baddeley wrote: > Are you doing the interpolation in order to test different alignments (ie > within an alignment loop / minimisation problem), or are you calculating the > shift using e.g. cross correlation and then just shifting the data once? > Yes. :) The program's purpose is to find alignment parameters (XYZ shift, plus rotate about Z and zoom in XY). scipy.optimize.fmin is used for finding those parameters; this requires calculating many transformed 2D views of the data, though we start off with a cross-correlation to generate our initial guess.
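A rough, hypothetical sketch of what such an fmin-driven 2D alignment loop can look like (made-up toy images and helper names; the parametrization below is one arbitrary choice and is not the program described here):

import numpy as np
from scipy import ndimage, optimize

def transform2d(img, params, order=1):
    """Rotate about the image centre, zoom, then shift (hypothetical helper)."""
    dy, dx, angle, zoom = params
    c, s = np.cos(angle), np.sin(angle)
    mat = np.array([[c, -s], [s, c]]) / zoom
    centre = (np.array(img.shape) - 1) / 2.0
    offset = centre - mat.dot(centre) - np.array([dy, dx])
    return ndimage.affine_transform(img, mat, offset=offset, order=order)

def cost(params, ref, moving):
    # sum of squared differences; order=1 keeps each of the many evaluations cheap
    return np.sum((ref - transform2d(moving, params)) ** 2)

# toy data standing in for two camera views of the same field
yy, xx = np.mgrid[:64, :64]
ref = np.exp(-((xx - 30.0) ** 2 + (yy - 34.0) ** 2) / 50.0)
moving = ndimage.shift(ref, (1.5, -2.0), order=1)

p0 = np.array([0.0, 0.0, 0.0, 1.0])   # in practice e.g. from a cross-correlation
best = optimize.fmin(cost, p0, args=(ref, moving), xtol=1e-3, disp=False)
print(best)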
Once 2D alignment is done we calculate the Z offset using scipy.ndimage.shift on the entire 3D volume. Finally we use affine_transform to save the resulting transformed array to disk. I'm using order = 1 for the 2D slices here and getting quite good results (visually speaking), so evidently inaccuracy due to order isn't a problem for that part. What I discovered was that for saving, I could also transform the entire volume by stacking transformed 2D slices, much faster than I could by using affine_transform...so long as the order was 1. If order = 2 then affine_transform is much faster. > If you're doing it within a loop I could see how the performance could > really bite you. That said, if it's as part of a minimisation problem I > think you'd almost certainly want cubic spline due to the differentiability. > If you are doing multiple interpolations from the same data, you can speed > this up a lot by pre-calculating the spline coefficients. eg: > I'll have to take a look at ndimage.spline_filter; it could speed up the optimization problem significantly. I don't think it'll help with the saving process though, since that's only looking at each slice once. Thanks for the advice! (As for noise reduction, this is getting into an area I don't have much domain expertise in. We do do deconvolution as well as denoising, but I don't think anyone here has done a study on the proper order to apply alignment/denoising/deconvolution in for our data) -Chris -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Thu Jun 16 19:08:53 2011 From: cournape at gmail.com (David Cournapeau) Date: Fri, 17 Jun 2011 08:08:53 +0900 Subject: [SciPy-User] which EPD? In-Reply-To: <234811.59531.qm@web113419.mail.gq1.yahoo.com> References: <4DFA5BA1.9070802@gmail.com> <234811.59531.qm@web113419.mail.gq1.yahoo.com> Message-ID: On Fri, Jun 17, 2011 at 6:54 AM, David Baddeley wrote: > Can't really comment on EPD, but for the stock ubuntu/debian packages the 64 bit > versions seem noticeably faster, if somewhat more memory hungry 64 bits will take more memory: every pointer takes 8 bytes instead of 4, and the way cpython works requires the use of a lot of pointers. cheers, David From zachary.pincus at yale.edu Thu Jun 16 20:52:08 2011 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Thu, 16 Jun 2011 20:52:08 -0400 Subject: [SciPy-User] Difference in quality from different interpolation orders In-Reply-To: <643232.17474.qm@web113414.mail.gq1.yahoo.com> References: <8119A813-7EBD-4428-B5BE-C6482160D30C@yale.edu> <643232.17474.qm@web113414.mail.gq1.yahoo.com> Message-ID: > I'd have to disagree with Zach on the making up data count. If you've got microscopy images, you know that the original data is band limited, and can in theory reconstruct it perfectly from your samples (assuming you've satisfied Nyquist). Right indeed -- thanks for pointing that out! (Modulo, as you mention elsewhere, sensor noise...) Is there a good procedure, in a non-theoretical context, to determine which order of interpolation is appropriate for the data and noise level you have at hand? Or is visual inspection to make sure there's not too much ringing from shot noise etc. really the right way to go? I'm curious. 
Zach From david_baddeley at yahoo.com.au Thu Jun 16 21:54:00 2011 From: david_baddeley at yahoo.com.au (David Baddeley) Date: Thu, 16 Jun 2011 18:54:00 -0700 (PDT) Subject: [SciPy-User] Difference in quality from different interpolation orders In-Reply-To: References: <8119A813-7EBD-4428-B5BE-C6482160D30C@yale.edu> <643232.17474.qm@web113414.mail.gq1.yahoo.com> Message-ID: <337615.22218.qm@web113415.mail.gq1.yahoo.com> I think it's a really interesting, but also really hard question, and one which is going to depend a lot what you're going to use it for. I suspect that one should nominally do the interpolation by fitting smoothing (rather than standard) splines, or use some other type of regularisation whenever the data is noisy. At what point this extra work becomes worthwhile, however, is moot. A quick google search turned up a several papers on interpolation of noisy data, with e.g. Tikhonov regularisation, but the maths seems to get pretty torrid pretty quickly. My gut instinct is that if you want the most accurate interpolation possible you should start with a form/order of interpolation that gives reasonable accuracy on noiseless data, and then add some form of smoothness constraint to deal with the noise, should you get an unacceptable level of artefacts, rather than decreasing the order of interpolation. Of course this comes with a not insignificant computational cost and a lot of the time 'good enough' is going to be OK. David ----- Original Message ---- From: Zachary Pincus To: SciPy Users List Sent: Fri, 17 June, 2011 12:52:08 PM Subject: Re: [SciPy-User] Difference in quality from different interpolation orders > I'd have to disagree with Zach on the making up data count. If you've got >microscopy images, you know that the original data is band limited, and can in >theory reconstruct it perfectly from your samples (assuming you've satisfied >Nyquist). Right indeed -- thanks for pointing that out! (Modulo, as you mention elsewhere, sensor noise...) Is there a good procedure, in a non-theoretical context, to determine which order of interpolation is appropriate for the data and noise level you have at hand? Or is visual inspection to make sure there's not too much ringing from shot noise etc. really the right way to go? I'm curious. Zach _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user From deil.christoph at googlemail.com Fri Jun 17 11:08:16 2011 From: deil.christoph at googlemail.com (Christoph Deil) Date: Fri, 17 Jun 2011 17:08:16 +0200 Subject: [SciPy-User] Python significance / error interval / confidence interval module? Message-ID: <20B112A3-6507-430A-AA1F-E65E1070CC4A@googlemail.com> Hi, I am looking for a python module for significance / error interval / confidence interval computation. Specifically I am looking for Poisson rate estimates in the presence of uncertain background and / or efficiency, e.g. for an "on/off measurement". The standard method of Rolke I am mainly interested in is available in ROOT and RooStats, a C++ high energy physics data analysis package: http://root.cern.ch/root/html/TRolke.html https://twiki.cern.ch/twiki/bin/view/RooStats/WebHome However I want my python code to not depend on the huge ROOT package and extracting just this part and writing a python wrapper is not easily possible, I think. 
If you know of an existing python module or are interested in helping me write one (mainly by porting some parts of RooStats to python/numpy/scipy), please let me know. Cheers, Christoph -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Fri Jun 17 11:12:21 2011 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Fri, 17 Jun 2011 17:12:21 +0200 Subject: [SciPy-User] Python significance / error interval / confidence interval module? In-Reply-To: <20B112A3-6507-430A-AA1F-E65E1070CC4A@googlemail.com> References: <20B112A3-6507-430A-AA1F-E65E1070CC4A@googlemail.com> Message-ID: <20110617151220.GL17180@phare.normalesup.org> On Fri, Jun 17, 2011 at 05:08:16PM +0200, Christoph Deil wrote: > I am looking for a python module for significance / error interval / > confidence interval computation. How about http://pypi.python.org/pypi/uncertainties/ > Specifically I am looking for Poisson rate estimates in the presence of > uncertain background and / or efficiency, e.g. for an "on/off > measurement". Wow, that seems a bit more involved than Gaussian error statistics. I am not sure that the above package will solve your problem. > The standard method of Rolke I am mainly interested in is available in > ROOT and RooStats, a C++ high energy physics data analysis package: If you really need proper Poisson-rate errors, then you might indeed not to translate the Rolke method to Python. How about contributing it to uncertainties. G From josef.pktd at gmail.com Fri Jun 17 12:21:21 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 17 Jun 2011 12:21:21 -0400 Subject: [SciPy-User] Python significance / error interval / confidence interval module? In-Reply-To: <20110617151220.GL17180@phare.normalesup.org> References: <20B112A3-6507-430A-AA1F-E65E1070CC4A@googlemail.com> <20110617151220.GL17180@phare.normalesup.org> Message-ID: On Fri, Jun 17, 2011 at 11:12 AM, Gael Varoquaux wrote: > On Fri, Jun 17, 2011 at 05:08:16PM +0200, Christoph Deil wrote: >> ? ?I am looking for a python module for significance / error interval / >> ? ?confidence interval computation. > > How about http://pypi.python.org/pypi/uncertainties/ > >> ? ?Specifically I am looking for Poisson rate estimates in the presence of >> ? ?uncertain background and / or efficiency, e.g. for an "on/off >> ? ?measurement". > > Wow, that seems a bit more involved than Gaussian error statistics. I am > not sure that the above package will solve your problem. > >> ? ?The standard method of Rolke I am mainly interested in is available in >> ? ?ROOT and RooStats, a C++ high energy physics data analysis package: > > If you really need proper Poisson-rate errors, then you might indeed not > to translate the Rolke method to Python. How about contributing it to > uncertainties. It's a very specific model, and I doubt it's covered by any general packages, but implementing http://lanl.arxiv.org/abs/physics/0403059 assuming this is the background for it, doesn't sound too difficult. The main work it looks like is keeping track of all the different models and parameterizations. scipy.stats.distributions and scipy.optimize (fmin, fsolve) will cover much of the calculations. (But then of course there is testing and taking care of corner cases which takes at least several times as long as the initial implementation, in my experience.) 
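As a rough illustration of the kind of calculation meant here, a hypothetical profile-likelihood sketch for the simplest on/off model, X ~ Pois(mu + b), Y ~ Pois(b), with made-up counts and equal on/off exposure assumed; this is not a port of TRolke and ignores the paper's other models and corner cases:

import numpy as np
from scipy import optimize, stats

x_obs, y_obs = 10, 4          # made-up "on" (signal plus background) and "off" counts

def loglike(mu, b):
    # Poisson log-likelihood for X ~ Pois(mu + b), Y ~ Pois(b), up to additive
    # constants (the x! and y! terms cancel in the likelihood ratio)
    if mu + b <= 0 or b <= 0:
        return -np.inf
    return x_obs * np.log(mu + b) - (mu + b) + y_obs * np.log(b) - b

def profile(mu):
    # maximize over the nuisance parameter b for fixed signal rate mu
    bopt = optimize.fmin(lambda b: -loglike(mu, b[0]), [max(y_obs, 1.0)],
                         disp=False)[0]
    return loglike(mu, bopt)

mu_hat = max(x_obs - y_obs, 0)
l_max = profile(mu_hat)
crit = stats.chi2.ppf(0.95, 1) / 2.0      # ~1.92 for an approximate 95% interval

# upper endpoint: where the profile log-likelihood has dropped by `crit`
upper = optimize.brentq(lambda mu: l_max - profile(mu) - crit,
                        mu_hat, mu_hat + 20 * np.sqrt(x_obs + 1))
print("mu_hat = %.1f, approximate 95%% upper limit = %.2f" % (mu_hat, upper))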
Josef > G > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From bsouthey at gmail.com Fri Jun 17 13:08:47 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 17 Jun 2011 12:08:47 -0500 Subject: [SciPy-User] Python significance / error interval / confidence interval module? In-Reply-To: References: <20B112A3-6507-430A-AA1F-E65E1070CC4A@googlemail.com> <20110617151220.GL17180@phare.normalesup.org> Message-ID: <4DFB8A1F.4030307@gmail.com> On 06/17/2011 11:21 AM, josef.pktd at gmail.com wrote: > On Fri, Jun 17, 2011 at 11:12 AM, Gael Varoquaux > wrote: >> On Fri, Jun 17, 2011 at 05:08:16PM +0200, Christoph Deil wrote: >>> I am looking for a python module for significance / error interval / >>> confidence interval computation. >> How about http://pypi.python.org/pypi/uncertainties/ >> >>> Specifically I am looking for Poisson rate estimates in the presence of >>> uncertain background and / or efficiency, e.g. for an "on/off >>> measurement". >> Wow, that seems a bit more involved than Gaussian error statistics. I am >> not sure that the above package will solve your problem. >> >>> The standard method of Rolke I am mainly interested in is available in >>> ROOT and RooStats, a C++ high energy physics data analysis package: >> If you really need proper Poisson-rate errors, then you might indeed not >> to translate the Rolke method to Python. How about contributing it to >> uncertainties. > It's a very specific model, and I doubt it's covered by any general > packages, but implementing > http://lanl.arxiv.org/abs/physics/0403059 > assuming this is the background for it, doesn't sound too difficult. > > The main work it looks like is keeping track of all the different > models and parameterizations. > scipy.stats.distributions and scipy.optimize (fmin, fsolve) will cover > much of the calculations. > > (But then of course there is testing and taking care of corner cases > which takes at least several times as long as the initial > implementation, in my experience.) > > Josef > >> G >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user Actually I am more interested in how this differs from a generalized linear model where modeling Poisson or negative binomial distribution is feasible. Bruce From josef.pktd at gmail.com Fri Jun 17 14:12:57 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 17 Jun 2011 14:12:57 -0400 Subject: [SciPy-User] Python significance / error interval / confidence interval module? In-Reply-To: <4DFB8A1F.4030307@gmail.com> References: <20B112A3-6507-430A-AA1F-E65E1070CC4A@googlemail.com> <20110617151220.GL17180@phare.normalesup.org> <4DFB8A1F.4030307@gmail.com> Message-ID: On Fri, Jun 17, 2011 at 1:08 PM, Bruce Southey wrote: > On 06/17/2011 11:21 AM, josef.pktd at gmail.com wrote: >> On Fri, Jun 17, 2011 at 11:12 AM, Gael Varoquaux >> ?wrote: >>> On Fri, Jun 17, 2011 at 05:08:16PM +0200, Christoph Deil wrote: >>>> ? ? I am looking for a python module for significance / error interval / >>>> ? ? confidence interval computation. >>> How about http://pypi.python.org/pypi/uncertainties/ >>> >>>> ? ? Specifically I am looking for Poisson rate estimates in the presence of >>>> ? ? 
uncertain background and / or efficiency, e.g. for an "on/off >>>> ? ? measurement". >>> Wow, that seems a bit more involved than Gaussian error statistics. I am >>> not sure that the above package will solve your problem. >>> >>>> ? ? The standard method of Rolke I am mainly interested in is available in >>>> ? ? ROOT and RooStats, a C++ high energy physics data analysis package: >>> If you really need proper Poisson-rate errors, then you might indeed not >>> to translate the Rolke method to Python. How about contributing it to >>> uncertainties. >> It's a very specific model, and I doubt it's covered by any general >> packages, but implementing >> http://lanl.arxiv.org/abs/physics/0403059 >> assuming this is the background for it, doesn't sound too difficult. >> >> The main work it looks like is keeping track of all the different >> models and parameterizations. >> scipy.stats.distributions and scipy.optimize (fmin, fsolve) will cover >> much of the calculations. >> >> (But then of course there is testing and taking care of corner cases >> which takes at least several times as long as the initial >> implementation, in my experience.) >> >> Josef >> >>> G >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > Actually I am more interested in how this differs from a generalized > linear model where modeling Poisson or negative binomial distribution is > feasible. That was my first guess, but in the paper it's pretty different, in the paper the assumption is that two variables are observed, x,y, which each have different independent distribution, but have some parameters in common X ? Pois(? + b), Y ? Pois( b) or variations on this like X ? Pois(e? + b), Y ? N(b, sigma_b), Z ? N(e, sigma_e) The rest is mostly profile likelihood from a quick skimming of the paper, to get confidence intervals on mu, getting rid of the nuisance parameter Josef > > Bruce > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From dg.gmane at thesamovar.net Fri Jun 17 14:38:38 2011 From: dg.gmane at thesamovar.net (Dan Goodman) Date: Fri, 17 Jun 2011 20:38:38 +0200 Subject: [SciPy-User] tool for running simulations Message-ID: Hi all, I have an idea for a tool that might be useful for people writing and running scientific simulations. It might be that something like this already exists, in which case if anyone has any good suggestions please let me know! Otherwise, I might have a go at writing something, in which case any feature requests or useful advice would be great. Basically, the situation I keep finding myself in, and I assume many others do so too, is that I have some rather complicated code to set up and run a simulation (in my case, computational neuroscience simulations). I typically want to run each simulation many times, possibly with different parameters, and then do some averaging or more complicated analysis at the end. Usually these simulations take around 1h to 1 week to run, depending on what I'm doing and assuming I'm using multiple computers/CPUs to do it. The issue is that I want to be able to run my code on several computers at once, and have the results available on all the computers. 
I've been coming up with all sorts of annoying ways to do this, for example having each computer generate one file with a unique name, and then merging them afterwards - but this is quite tedious. What I imagine is a tool that does something like this: * Run a server process on each of several computers, that controls file access (this avoids any issues with contention). One computer is the master and if the other ones want to read or write a file then it is transferred to the master. Some files might want to be cached/mirrored on each computer for faster access (typically for read only files in my case). * Use a nice file format like HDF5 that allows fast access, store metadata along with your data, and for which there are good tools to browse the data. This is important because as you change your simulation code, you might want to weed out some old data based on the metadata, but not have to recompute everything, etc. * Allows you to store multiple data entries (something like tables in HDF5 I guess) and then select out specific ones for analysis. * Allows you to use function cacheing. For example, I often have the situation that I have a computation that takes about 10m for each set of parameter values that is then used in several simulations. I'd like these to be automatically cached (maybe based on a hash of the arguments to the function). As far as I can tell, there are tools to do each of the things above, but nothing to combine them all together simply. For example, there are lots of tools for distributed filesystems, for HDF5 and for function value cacheing, but is there something that when you call a function with some particular values, creates a hash, checks in a distributed version of HDF5 for that hash value and then either returns the value or stores it in the HDF5 file with the relevant metadata (maybe the values of the arguments and not just the hash). Since all the tools are basically already there, I don't think this should take too long to write (maybe just a few days), but could be useful for lots of people because at the moment it requires mastering quite a few different tools and writing code to glue them together. The key thing is to choose the best tools for the job and take the right approach, so any ideas for that? Or maybe it's already been done? Dan From josh.holbrook at gmail.com Fri Jun 17 15:47:07 2011 From: josh.holbrook at gmail.com (Joshua Holbrook) Date: Fri, 17 Jun 2011 12:47:07 -0700 Subject: [SciPy-User] tool for running simulations In-Reply-To: References: Message-ID: I did some parameter studies for my thesis (finite element analyses, heat transfer) and something like this would've definitely been useful. Of course, I also had some other problems. My simulations ran in MATLAB/COMSOL in tandem and not python, and due to who-knows-what I had a lot of segfaults. As such, for this tool to have been useful for *that* particular project I would've needed to mash it together with a lightweight userspace process monitor of some sort and then start such an external process with some parameters passed in. There may be some/many similarities between what you're talking about doing, and mapreduce frameworks such as hadoop (http://hadoop.apache.org/) or disco (http://discoproject.org/). In fact, you may find that one of these does basically what you want. If so, I'd love to hear how it goes! I always kinda meant to get my hands dirty with one of these but never did. 
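For the function-cacheing part you describe, the hash-and-store idea can at least be sketched in a few lines with hashlib and h5py -- purely a toy illustration (no locking, no distribution between machines, and all the names, including the file name, are invented):

import hashlib
import numpy as np
import h5py

def hdf5_cache(filename='cache.h5'):
    def decorator(func):
        def wrapper(*args):
            # key on the function name plus the repr of the arguments
            key = hashlib.sha1(func.__name__ + repr(args)).hexdigest()
            f = h5py.File(filename, 'a')
            try:
                if key in f:
                    return f[key][...]           # cache hit: read it back
                result = func(*args)
                dset = f.create_dataset(key, data=np.asarray(result))
                dset.attrs['args'] = repr(args)  # keep the metadata, not just the hash
                return result
            finally:
                f.close()
        return wrapper
    return decorator

@hdf5_cache()
def expensive(n):
    return np.arange(n) ** 2.0

Whether hashing repr(args) is good enough obviously depends on the argument types, and a real version would have to deal with concurrent writers, but that's the basic shape of it.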
Good luck, --Josh On Fri, Jun 17, 2011 at 11:38 AM, Dan Goodman wrote: > Hi all, > > I have an idea for a tool that might be useful for people writing and > running scientific simulations. It might be that something like this > already exists, in which case if anyone has any good suggestions please > let me know! Otherwise, I might have a go at writing something, in which > case any feature requests or useful advice would be great. > > Basically, the situation I keep finding myself in, and I assume many > others do so too, is that I have some rather complicated code to set up > and run a simulation (in my case, computational neuroscience > simulations). I typically want to run each simulation many times, > possibly with different parameters, and then do some averaging or more > complicated analysis at the end. Usually these simulations take around > 1h to 1 week to run, depending on what I'm doing and assuming I'm using > multiple computers/CPUs to do it. The issue is that I want to be able to > run my code on several computers at once, and have the results available > on all the computers. I've been coming up with all sorts of annoying > ways to do this, for example having each computer generate one file with > a unique name, and then merging them afterwards - but this is quite tedious. > > What I imagine is a tool that does something like this: > > * Run a server process on each of several computers, that controls file > access (this avoids any issues with contention). One computer is the > master and if the other ones want to read or write a file then it is > transferred to the master. Some files might want to be cached/mirrored > on each computer for faster access (typically for read only files in my > case). > > * Use a nice file format like HDF5 that allows fast access, store > metadata along with your data, and for which there are good tools to > browse the data. This is important because as you change your simulation > code, you might want to weed out some old data based on the metadata, > but not have to recompute everything, etc. > > * Allows you to store multiple data entries (something like tables in > HDF5 I guess) and then select out specific ones for analysis. > > * Allows you to use function cacheing. For example, I often have the > situation that I have a computation that takes about 10m for each set of > parameter values that is then used in several simulations. I'd like > these to be automatically cached (maybe based on a hash of the arguments > to the function). > > As far as I can tell, there are tools to do each of the things above, > but nothing to combine them all together simply. For example, there are > lots of tools for distributed filesystems, for HDF5 and for function > value cacheing, but is there something that when you call a function > with some particular values, creates a hash, checks in a distributed > version of HDF5 for that hash value and then either returns the value or > stores it in the HDF5 file with the relevant metadata (maybe the values > of the arguments and not just the hash). > > Since all the tools are basically already there, I don't think this > should take too long to write (maybe just a few days), but could be > useful for lots of people because at the moment it requires mastering > quite a few different tools and writing code to glue them together. The > key thing is to choose the best tools for the job and take the right > approach, so any ideas for that? Or maybe it's already been done? 
> > Dan > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From rob.clewley at gmail.com Fri Jun 17 16:07:25 2011 From: rob.clewley at gmail.com (Rob Clewley) Date: Fri, 17 Jun 2011 16:07:25 -0400 Subject: [SciPy-User] tool for running simulations In-Reply-To: References: Message-ID: Hi, >> * Run a server process on each of several computers, that controls file >> access (this avoids any issues with contention). One computer is the >> master and if the other ones want to read or write a file then it is >> transferred to the master. Some files might want to be cached/mirrored >> on each computer for faster access (typically for read only files in my >> case). The closest thing I know of is Andrew Davison's Sumatra: http://software.incf.org/software/sumatra Not sure it has all the other features (HDF5) you're interested in, but it should help people manage batch simulation runs and make them more reproducible, at least. -Rob From noreply at boxbe.com Fri Jun 17 14:39:19 2011 From: noreply at boxbe.com (noreply at boxbe.com) Date: Fri, 17 Jun 2011 11:39:19 -0700 (PDT) Subject: [SciPy-User] tool for running simulations (Action Required) Message-ID: <1917886237.229045.1308335959788.JavaMail.prod@app014.dmz> Hello SciPy Users List, You will not receive any more courtesy notices from our members for two days. Messages you have sent will remain in a lower priority mailbox for our member to review at their leisure. Future messages will be more likely to be viewed if you are on our member's priority Guest List. Thank you, jan.ondercanin at gmail.com Powered by Boxbe -- "End Email Overload" Visit http://www.boxbe.com/how-it-works?tc=8431127439_117424627 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded message was scrubbed... From: Dan Goodman Subject: [SciPy-User] tool for running simulations Date: Fri, 17 Jun 2011 20:38:38 +0200 Size: 3010 URL: From david_baddeley at yahoo.com.au Fri Jun 17 20:46:49 2011 From: david_baddeley at yahoo.com.au (David Baddeley) Date: Fri, 17 Jun 2011 17:46:49 -0700 (PDT) Subject: [SciPy-User] tool for running simulations In-Reply-To: References: Message-ID: <702658.10460.qm@web113415.mail.gq1.yahoo.com> Hi Dan, I've cobbled together something for distributed data analysis (not simulation) which does quite a few of the things you want to do. I didn't really do much planning and it more or less evolved features as I needed them. This means that a number of the architectural choices are a little bit questionable in hindsight, and to some extent specific to my application (distributed analysis of image data), but it might give you some ideas - let me know if you're interested and I can flick you the code and/or tease it out as a separate subproject & open source it (will have to see how deeply entwined it is in the larger project which I can't open at this time). At any rate I should be able to give you some pointers. 
There are two parts, a server which: - accepts image frames from software driving a camera & saves them, along with metadata, in hdf5 format - alternatively accepts generic 'Tasks' which can be pretty much anything - allows clients/workers to request tasks to process - collates the results and saves to a separate hdf5 file, having propagated the metadata and clients which: - request a task - optionally request additional data/metadata from the server - process the task & submit back to the server I'm using Pyro (http://irmen.home.xs4all.nl/pyro3/) for IPC and hdf5 for data storage. The whole lot's platform agnostic (we run it on a mix of windows, linux, and mac machines), and pyro makes the ipc really easy. Using a single server & pyro means it's limited to problems where each task takes long enough that the communication overhead isn't too high. If you want to use hdf5, I'd suggest sticking to a single server which provides the requested data to the clients, rather than having each client independently trying to read the hdf files over e.g. a shared file system. I spent some time trying to work out how best to synchronise hdf5 file access across different processes and didn't come up with any easy solution (my original idea had been to write the data .hdf5 from the camera software, and then just tell each of the workers where it was - this works if you're only doing read-only access, but falls over badly when you need to read and write). cheers, David ----- Original Message ---- From: Dan Goodman To: scipy-user at scipy.org Sent: Sat, 18 June, 2011 6:38:38 AM Subject: [SciPy-User] tool for running simulations Hi all, I have an idea for a tool that might be useful for people writing and running scientific simulations. It might be that something like this already exists, in which case if anyone has any good suggestions please let me know! Otherwise, I might have a go at writing something, in which case any feature requests or useful advice would be great. Basically, the situation I keep finding myself in, and I assume many others do so too, is that I have some rather complicated code to set up and run a simulation (in my case, computational neuroscience simulations). I typically want to run each simulation many times, possibly with different parameters, and then do some averaging or more complicated analysis at the end. Usually these simulations take around 1h to 1 week to run, depending on what I'm doing and assuming I'm using multiple computers/CPUs to do it. The issue is that I want to be able to run my code on several computers at once, and have the results available on all the computers. I've been coming up with all sorts of annoying ways to do this, for example having each computer generate one file with a unique name, and then merging them afterwards - but this is quite tedious. What I imagine is a tool that does something like this: - process the task & submit back to the server* Run a server process on each of several computers, that controls file access (this avoids any issues with contention). One computer is the master and if the other ones want to read or write a file then it is transferred to the master. Some files might want to be cached/mirrored on each computer for faster access (typically for read only files in my case). * Use a nice file format like HDF5 that allows fast access, store metadata along with your data, and for which there are good tools to browse the data. 
This is important because as you change your simulation code, you might want to weed out some old data based on the metadata, but not have to recompute everything, etc. * Allows you to store multiple data entries (something like tables in HDF5 I guess) and then select out specific ones for analysis. * Allows you to use function cacheing. For example, I often have the situation that I have a computation that takes about 10m for each set of parameter values that is then used in several simulations. I'd like these to be automatically cached (maybe based on a hash of the arguments to the function). As far as I can tell, there are tools to do each of the things above, but nothing to combine them all together simply. For example, there are lots of tools for distributed filesystems, for HDF5 and for function value cacheing, but is there something that when you call a function with some particular values, creates a hash, checks in a distributed version of HDF5 for that hash value and then either returns the value or stores it in the HDF5 file with the relevant metadata (maybe the values of the arguments and not just the hash). Since all the tools are basically already there, I don't think this should take too long to write (maybe just a few days), but could be useful for lots of people because at the moment it requires mastering quite a few different tools and writing code to glue them together. The key thing is to choose the best tools for the job and take the right approach, so any ideas for that? Or maybe it's already been done? Dan _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user From lpc at cmu.edu Sat Jun 18 10:41:57 2011 From: lpc at cmu.edu (Luis Pedro Coelho) Date: Sat, 18 Jun 2011 10:41:57 -0400 Subject: [SciPy-User] tool for running simulations Message-ID: <201106181042.02323.lpc@cmu.edu> Hi Dan (and list), My tool, Jug does most of what you want. It works based on files on disk or a redis database, not HDF5, though. http://luispedro.org/software/jug http://github.com/luispedro/jug Here's an example from the README file: from jug import TaskGenerator from time import sleep @TaskGenerator def is_prime(n): sleep(1.) for j in xrange(2,n-1): if (n % j) == 0: return False return True primes100 = map(is_prime, xrange(2,101)) This will run is_prime in parallel for the different inputs (if you have multiple compute nodes, of course). HTH Luis On Friday, June 17, 2011 02:38:38 PM Dan Goodman wrote: > Hi all, > > I have an idea for a tool that might be useful for people writing and > running scientific simulations. It might be that something like this > already exists, in which case if anyone has any good suggestions please > let me know! Otherwise, I might have a go at writing something, in which > case any feature requests or useful advice would be great. > > Basically, the situation I keep finding myself in, and I assume many > others do so too, is that I have some rather complicated code to set up > and run a simulation (in my case, computational neuroscience > simulations). I typically want to run each simulation many times, > possibly with different parameters, and then do some averaging or more > complicated analysis at the end. Usually these simulations take around > 1h to 1 week to run, depending on what I'm doing and assuming I'm using > multiple computers/CPUs to do it. 
The issue is that I want to be able to > run my code on several computers at once, and have the results available > on all the computers. I've been coming up with all sorts of annoying > ways to do this, for example having each computer generate one file with > a unique name, and then merging them afterwards - but this is quite > tedious. > > What I imagine is a tool that does something like this: > > * Run a server process on each of several computers, that controls file > access (this avoids any issues with contention). One computer is the > master and if the other ones want to read or write a file then it is > transferred to the master. Some files might want to be cached/mirrored > on each computer for faster access (typically for read only files in my > case). > > * Use a nice file format like HDF5 that allows fast access, store > metadata along with your data, and for which there are good tools to > browse the data. This is important because as you change your simulation > code, you might want to weed out some old data based on the metadata, > but not have to recompute everything, etc. > > * Allows you to store multiple data entries (something like tables in > HDF5 I guess) and then select out specific ones for analysis. > > * Allows you to use function cacheing. For example, I often have the > situation that I have a computation that takes about 10m for each set of > parameter values that is then used in several simulations. I'd like > these to be automatically cached (maybe based on a hash of the arguments > to the function). > > As far as I can tell, there are tools to do each of the things above, > but nothing to combine them all together simply. For example, there are > lots of tools for distributed filesystems, for HDF5 and for function > value cacheing, but is there something that when you call a function > with some particular values, creates a hash, checks in a distributed > version of HDF5 for that hash value and then either returns the value or > stores it in the HDF5 file with the relevant metadata (maybe the values > of the arguments and not just the hash). > > Since all the tools are basically already there, I don't think this > should take too long to write (maybe just a few days), but could be > useful for lots of people because at the moment it requires mastering > quite a few different tools and writing code to glue them together. The > key thing is to choose the best tools for the job and take the right > approach, so any ideas for that? Or maybe it's already been done? > > Dan > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part. URL: From bala.biophysics at gmail.com Sat Jun 18 11:28:06 2011 From: bala.biophysics at gmail.com (Bala subramanian) Date: Sat, 18 Jun 2011 17:28:06 +0200 Subject: [SciPy-User] opening a hdf5 file using h5py Message-ID: Friends, I am a newbie to using hdf5 files. I need to work on hdf5 format file. To do this, i installed h5py and numpy (in fedora 14) and to make sure if it works, I tried to open a hdf5 format file using the following code as given in the example section of h5py home page. #!/usr/bin/env python import h5py f=h5py.File('eg.hdf5','r') I am getting the following error. Someone kindly write me what is the problem here. 
I tried to google the error but i am not understanding what is going wrong.

Traceback (most recent call last):
  File "test.py", line 4, in
    f=h5py.File('eg.hdf5','r')
  File "/usr/lib64/python2.7/site-packages/h5py-1.3.1-py2.7-linux-x86_64.egg/h5py/highlevel.py", line 797, in __init__
    self.fid = self._generate_fid(name, mode, plist)
  File "/usr/lib64/python2.7/site-packages/h5py-1.3.1-py2.7-linux-x86_64.egg/h5py/highlevel.py", line 831, in _generate_fid
    fid = h5f.open(name, h5f.ACC_RDONLY, fapl=plist)
  File "h5f.pyx", line 68, in h5py.h5f.open (h5py/h5f.c:1268)
h5py.h5e.LowLevelIOError: Unable to find a valid file signature (Low-level I/O: Unable to initialize object)

-------------- next part -------------- An HTML attachment was scrubbed... URL: From rmb62 at cornell.edu Sat Jun 18 11:33:35 2011 From: rmb62 at cornell.edu (Robin M Baur) Date: Sat, 18 Jun 2011 11:33:35 -0400 Subject: [SciPy-User] opening a hdf5 file using h5py In-Reply-To: References: Message-ID: The h5py list is over here: https://groups.google.com/group/h5py That said, it looks like you're trying to read a file that doesn't exist. Does f = h5py.File('eg.hdf5', 'w') work? Robin On Sat, Jun 18, 2011 at 11:28, Bala subramanian wrote: > Friends, > I am a newbie to using hdf5 files. I need to work on hdf5 format file. To do > this, i installed h5py and numpy (in fedora 14) and to make sure if it > works, I tried to open a hdf5 format file using the following code as given > in the example section of h5py home page.
>> >> #!/usr/bin/env python >> import h5py >> f=h5py.File('eg.hdf5','r') >> >> I am getting the following error. Someone kindly write me what is the >> problem here. I tried to google the error but i am not understanding what is >> going wrong. >> >> Traceback (most recent call last): >> ? File "test.py", line 4, in >> ??? f=h5py.File('eg.hdf5','r') >> ? File >> "/usr/lib64/python2.7/site-packages/h5py-1.3.1-py2.7-linux-x86_64.egg/h5py/highlevel.py", >> line 797, in __init__ >> ??? self.fid = self._generate_fid(name, mode, plist) >> ? File >> "/usr/lib64/python2.7/site-packages/h5py-1.3.1-py2.7-linux-x86_64.egg/h5py/highlevel.py", >> line 831, in _generate_fid >> ??? fid = h5f.open(name, h5f.ACC_RDONLY, fapl=plist) >> ? File "h5f.pyx", line 68, in h5py.h5f.open (h5py/h5f.c:1268) >> h5py.h5e.LowLevelIOError: Unable to find a valid file signature (Low-level >> I/O: Unable to initialize object) >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From bala.biophysics at gmail.com Sat Jun 18 12:06:25 2011 From: bala.biophysics at gmail.com (Bala subramanian) Date: Sat, 18 Jun 2011 18:06:25 +0200 Subject: [SciPy-User] opening a hdf5 file using h5py In-Reply-To: References: Message-ID: The file do exist. First i tried by giving f=h5py.File('*eg.hdf*'). Then when i got the error i renamed the file to *eg.hdf5*. In either case the error was the same. Yes,i am able to write h5py.File('eg.hdf5','w') -- this mode works but read mode throws the error i pasted. On Sat, Jun 18, 2011 at 5:44 PM, Darren Dale wrote: > Or, you are trying to read a file that does exist, but the hdf5 > library does not recognize the file's format as hdf5. > > On Sat, Jun 18, 2011 at 11:33 AM, Robin M Baur wrote: > > The h5py list is over here: https://groups.google.com/group/h5py > > > > That said, it looks like you're trying to read a file that doesn't > > exist. Does f = h5py.File('eg.hdf5', 'w') work? > > > > Robin > > > > On Sat, Jun 18, 2011 at 11:28, Bala subramanian > > wrote: > >> Friends, > >> I am a newbie to using hdf5 files. I need to work on hdf5 format file. > To do > >> this, i installed h5py and numpy (in fedora 14) and to make sure if it > >> works, I tried to open a hdf5 format file using the following code as > given > >> in the example section of h5py home page. > >> > >> #!/usr/bin/env python > >> import h5py > >> f=h5py.File('eg.hdf5','r') > >> > >> I am getting the following error. Someone kindly write me what is the > >> problem here. I tried to google the error but i am not understanding > what is > >> going wrong. 
> >> > >> Traceback (most recent call last): > >> File "test.py", line 4, in > >> f=h5py.File('eg.hdf5','r') > >> File > >> > "/usr/lib64/python2.7/site-packages/h5py-1.3.1-py2.7-linux-x86_64.egg/h5py/highlevel.py", > >> line 797, in __init__ > >> self.fid = self._generate_fid(name, mode, plist) > >> File > >> > "/usr/lib64/python2.7/site-packages/h5py-1.3.1-py2.7-linux-x86_64.egg/h5py/highlevel.py", > >> line 831, in _generate_fid > >> fid = h5f.open(name, h5f.ACC_RDONLY, fapl=plist) > >> File "h5f.pyx", line 68, in h5py.h5f.open (h5py/h5f.c:1268) > >> h5py.h5e.LowLevelIOError: Unable to find a valid file signature > (Low-level > >> I/O: Unable to initialize object) > >> > >> > >> _______________________________________________ > >> SciPy-User mailing list > >> SciPy-User at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-user > >> > >> > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From denis-bz-gg at t-online.de Sat Jun 18 12:40:23 2011 From: denis-bz-gg at t-online.de (denis) Date: Sat, 18 Jun 2011 09:40:23 -0700 (PDT) Subject: [SciPy-User] Difference in quality from different interpolation orders In-Reply-To: <337615.22218.qm@web113415.mail.gq1.yahoo.com> References: <8119A813-7EBD-4428-B5BE-C6482160D30C@yale.edu> <643232.17474.qm@web113414.mail.gq1.yahoo.com> <337615.22218.qm@web113415.mail.gq1.yahoo.com> Message-ID: Folks, a non-expert addon to David's and Zach's expert comments: splines can interpolate (go through the input data points exactly), or smooth but not interpolate exactly. Catmull-Rom splines interpolate, B-splines smooth more. One can mix the two, e.g. 2/3 C-R spline + 1/3 Bspline; there's a great paper describing this for cubic splines, Mitchell and Netravali, "Reconstuction filters in computer graphics", 1988 http://portal.acm.org/citation.cfm?id=378514 Does anyone know what splines ndimage.spline_filter uses -- interpolating ? cheers -- denis On Jun 17, 3:54?am, David Baddeley wrote: > I think it's a really interesting, but also really hard question, and one which > is going to depend a lot what you're going to use it for. > > I suspect that one should nominally do the interpolation by fitting smoothing > (rather than standard) splines, From charlesr.harris at gmail.com Sat Jun 18 16:18:46 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 18 Jun 2011 14:18:46 -0600 Subject: [SciPy-User] Difference in quality from different interpolation orders In-Reply-To: References: <8119A813-7EBD-4428-B5BE-C6482160D30C@yale.edu> <643232.17474.qm@web113414.mail.gq1.yahoo.com> <337615.22218.qm@web113415.mail.gq1.yahoo.com> Message-ID: On Sat, Jun 18, 2011 at 10:40 AM, denis wrote: > Folks, > a non-expert addon to David's and Zach's expert comments: > splines can interpolate (go through the input data points exactly), > or smooth but not interpolate exactly. > Catmull-Rom splines interpolate, B-splines smooth more. > One can mix the two, e.g. 
2/3 C-R spline + 1/3 Bspline; > there's a great paper describing this for cubic splines, > Mitchell and Netravali, "Reconstuction filters in computer > graphics", 1988 > http://portal.acm.org/citation.cfm?id=378514 > > Does anyone know what splines ndimage.spline_filter uses -- > interpolating ? > > Interpolating I'm pretty sure, as the images need to be prefiltered. I suspect the smoothing splines can be got at by passing the unfiltered image and prefilter=False to the routines, but I haven't tried it and it isn't documented. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From dg.gmane at thesamovar.net Sun Jun 19 16:40:29 2011 From: dg.gmane at thesamovar.net (Dan Goodman) Date: Sun, 19 Jun 2011 22:40:29 +0200 Subject: [SciPy-User] tool for running simulations In-Reply-To: References: Message-ID: Thanks everyone for ideas and suggestions. Jug seems to come closest to what I was thinking about. The only concern I have with that is the redis database, which I'm not sure I like the sound of, but maybe not for particularly good reasons, and quite probably it's something I could work with. I was aware of Sumatra, and know Andrew Davison (in fact my main project is hosted by his Neural Ensemble group). It seems like a great project, but still alpha and maybe more focused on reproducibility than in managing relatively large amounts of data. Also, the function cacheing part is quite important for what I have in mind for it, and of the suggestions people sent me I think only Jug has this included as standard. David - I'd certainly be interested in seeing your code if it's easy to put something together. Send me an email. Dan On 17/06/2011 20:38, Dan Goodman wrote: > Hi all, > > I have an idea for a tool that might be useful for people writing and > running scientific simulations. It might be that something like this > already exists, in which case if anyone has any good suggestions please > let me know! Otherwise, I might have a go at writing something, in which > case any feature requests or useful advice would be great. > > Basically, the situation I keep finding myself in, and I assume many > others do so too, is that I have some rather complicated code to set up > and run a simulation (in my case, computational neuroscience > simulations). I typically want to run each simulation many times, > possibly with different parameters, and then do some averaging or more > complicated analysis at the end. Usually these simulations take around > 1h to 1 week to run, depending on what I'm doing and assuming I'm using > multiple computers/CPUs to do it. The issue is that I want to be able to > run my code on several computers at once, and have the results available > on all the computers. I've been coming up with all sorts of annoying > ways to do this, for example having each computer generate one file with > a unique name, and then merging them afterwards - but this is quite tedious. > > What I imagine is a tool that does something like this: > > * Run a server process on each of several computers, that controls file > access (this avoids any issues with contention). One computer is the > master and if the other ones want to read or write a file then it is > transferred to the master. Some files might want to be cached/mirrored > on each computer for faster access (typically for read only files in my > case). 
> > * Use a nice file format like HDF5 that allows fast access, store > metadata along with your data, and for which there are good tools to > browse the data. This is important because as you change your simulation > code, you might want to weed out some old data based on the metadata, > but not have to recompute everything, etc. > > * Allows you to store multiple data entries (something like tables in > HDF5 I guess) and then select out specific ones for analysis. > > * Allows you to use function cacheing. For example, I often have the > situation that I have a computation that takes about 10m for each set of > parameter values that is then used in several simulations. I'd like > these to be automatically cached (maybe based on a hash of the arguments > to the function). > > As far as I can tell, there are tools to do each of the things above, > but nothing to combine them all together simply. For example, there are > lots of tools for distributed filesystems, for HDF5 and for function > value cacheing, but is there something that when you call a function > with some particular values, creates a hash, checks in a distributed > version of HDF5 for that hash value and then either returns the value or > stores it in the HDF5 file with the relevant metadata (maybe the values > of the arguments and not just the hash). > > Since all the tools are basically already there, I don't think this > should take too long to write (maybe just a few days), but could be > useful for lots of people because at the moment it requires mastering > quite a few different tools and writing code to glue them together. The > key thing is to choose the best tools for the job and take the right > approach, so any ideas for that? Or maybe it's already been done? > > Dan From gael.varoquaux at normalesup.org Sun Jun 19 16:47:47 2011 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sun, 19 Jun 2011 22:47:47 +0200 Subject: [SciPy-User] tool for running simulations In-Reply-To: References: Message-ID: <20110619204747.GC19338@phare.normalesup.org> On Sun, Jun 19, 2011 at 10:40:29PM +0200, Dan Goodman wrote: > Also, the function cacheing part is quite important for what I have in > mind for it, Have you had a look joblib? Dag Sverre Seljebotn wants to do similar things than what you are talking about with it. He has a pull request to improve joblib to make it more suitable for that. I need to review it... G From dg.gmane at thesamovar.net Mon Jun 20 00:39:14 2011 From: dg.gmane at thesamovar.net (Dan Goodman) Date: Mon, 20 Jun 2011 06:39:14 +0200 Subject: [SciPy-User] tool for running simulations In-Reply-To: <20110619204747.GC19338@phare.normalesup.org> References: <20110619204747.GC19338@phare.normalesup.org> Message-ID: On 19/06/2011 22:47, Gael Varoquaux wrote: > On Sun, Jun 19, 2011 at 10:40:29PM +0200, Dan Goodman wrote: >> Also, the function cacheing part is quite important for what I have in >> mind for it, > > Have you had a look joblib? Dag Sverre Seljebotn wants to do similar > things than what you are talking about with it. He has a pull request to > improve joblib to make it more suitable for that. I need to review it... Gael, this is awesome. Almost exactly what I was looking for. A couple of questions: * Is reading the data fast? At the moment I have a system built on Python shelves, and the performance is not great. My impression was that you'd built it with this in mind, so performance is probably very good. * Can it be used on multiple computers? 
If not at the moment, is there at least a way to easily combine data produced on multiple computers? (e.g. just copying the contents of one directory to another) * Can you browse the generated data easily? That's one thing I liked about the idea of doing it with HDF5 is that there are nice visual browsers and you can include metadata, search via metadata, remove parts of the data, etc. * If I change the code for a function, will that cause a recompute? I'm guessing not, that it's done by the name/package of the function and not by the code. I think it's better that it doesn't cause a recompute, but given that having the ability to easily browse the cached data and remove the cache for a function would be very handy. Dan From gael.varoquaux at normalesup.org Mon Jun 20 04:12:11 2011 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Mon, 20 Jun 2011 10:12:11 +0200 Subject: [SciPy-User] tool for running simulations In-Reply-To: References: <20110619204747.GC19338@phare.normalesup.org> Message-ID: <20110620081211.GA32343@phare.normalesup.org> On Mon, Jun 20, 2011 at 06:39:14AM +0200, Dan Goodman wrote: > * Is reading the data fast? As long as most of the data is in numpy arrays, yes. You can make it faster by passing "mmap_mode='r'" to the Memory object, but you should beware that you will have read-only memmaped arrays in your code. > At the moment I have a system built on Python shelves, and the > performance is not great. :). On top of that, I had a fair amount of database corruptions when I was using shelves. This code is reasonnably isolated, so you won't corrupt your complete cache, just one result. > * Can it be used on multiple computers? If you have an NFS share between the computers, yes. The code works OK in parallel. You will have race conditions, but it captures them, and falls back on its feets. > If not at the moment, is there at least a way to easily combine data > produced on multiple computers? If you don't have a shared disk, I suggest that you use unison. > * Can you browse the generated data easily? No. This is something that could/should be improved (want to organize a sprint in Paris, if you still are in Paris?). > That's one thing I liked about the idea of doing it with HDF5 is that > there are nice visual browsers and you can include metadata, search via > metadata, remove parts of the data, etc. Agreed. Actually, an HDF5 backend would probably be a good idea. But first we would need to merge Dags's changes, that abstract a bit the data storage. > * If I change the code for a function, will that cause a recompute? Yes, but only if it is the function that you have cached. It does not do a deep inspection of the code. > I think it's better that it doesn't cause a recompute, It should be an option. Also, it would be good to be able to version the results with regards to function code. This actually raises non trivial questions with regards to cache flushing. Dags has been working on these questions. Once again, I need to find time to review the code, and for this, I fear I need a couple of days, as these things are not trivial at all. > but given that having the ability to easily browse the cached data and > remove the cache for a function would be very handy. Given a decorated function, "g = mem.cache(f)", "g.clear()" will flush the corresponding cache. The main issue of the code is that it has no cache replacement policy. As a result, it will blow your disk at some point. 
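For completeness, basic usage looks roughly like this -- the cache directory below is made up, and you would point it at the NFS share if several machines need to see the results:

from joblib import Memory

mem = Memory(cachedir='/path/to/shared/cache', mmap_mode='r', verbose=0)

@mem.cache
def simulate(param):
    # the expensive computation goes here
    return param ** 2

simulate(3)        # computed and written under the cache directory
simulate(3)        # second call just loads the stored result
simulate.clear()   # flushes the cache for this one function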
I have a pretty good idea of how to implement this, but I need to find a full free week to hack. The difficulty here is to keep the cache in a sensible state without introducing global locks that kill performance in parallel computing settings. I am telling you this just to stress that I don't believe that the code is yet fully production ready, although we have been using it happily for a couple of years. Ga?l From lpc at cmu.edu Mon Jun 20 08:09:29 2011 From: lpc at cmu.edu (Luis Pedro Coelho) Date: Mon, 20 Jun 2011 08:09:29 -0400 Subject: [SciPy-User] tool for running simulations In-Reply-To: References: Message-ID: <201106200809.36331.lpc@cmu.edu> On Sunday, June 19, 2011 04:40:29 PM Dan Goodman wrote: > Thanks everyone for ideas and suggestions. Jug seems to come closest to > what I was thinking about. The only concern I have with that is the > redis database, which I'm not sure I like the sound of, but maybe not > for particularly good reasons, and quite probably it's something I could > work with. Redis is optional. You can use the filesystem to hold your data if you'd prefer (as long as all your nodes have access to the filesystem, through NFS or similar). Redis is for when either (1) you have very many very small objects or (2) the nodes don't share a filesystem. HTH Luis -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part. URL: From denis-bz-gg at t-online.de Mon Jun 20 10:05:34 2011 From: denis-bz-gg at t-online.de (denis) Date: Mon, 20 Jun 2011 07:05:34 -0700 (PDT) Subject: [SciPy-User] tool for running simulations In-Reply-To: References: <20110619204747.GC19338@phare.normalesup.org> Message-ID: <9159fdc0-7739-4ea8-9037-f63bae308996@28g2000yqu.googlegroups.com> Folks, VisTrails.org is for workflows, a different aspect but worth a look: "allows users to navigate workflow versions in an intuitive way, to undo changes but not lose any results, to visually compare different workflows and their results, and to examine the actions that led to a result." -- essential for group projects (not for J. Irreproducible Results.) It's in Python, but "Talk at PyCon 2010" is a dead link ? cheers -- denis On Jun 20, 6:39?am, Dan Goodman wrote: From andrew.collette at gmail.com Mon Jun 20 10:57:19 2011 From: andrew.collette at gmail.com (Andrew Collette) Date: Mon, 20 Jun 2011 08:57:19 -0600 Subject: [SciPy-User] ANN: HDF5 for Python (h5py) 2.0 Message-ID: Announcing HDF5 for Python (h5py) 2.0 ===================================== We are proud to announce the availability of HDF5 for Python (h5py) 2.0 final. HDF5 for Python (h5py) is a general-purpose Python interface to the Hierarchical Data Format library, version 5. HDF5 is a mature scientific software library originally developed at NCSA, designed for the fast, flexible storage of enormous amounts of data. >From a Python programmer's perspective, HDF5 provides a robust way to store data, organized by name in a tree-like fashion. You can create datasets (arrays on disk) hundreds of gigabytes in size, and perform random-access I/O on desired sections. Datasets are organized in a filesystem-like hierarchy using containers called "groups", and accessed using the traditional POSIX /path/to/resource syntax. Following beta feedback over the past few weeks, and taking into account the substantial number of changes in this release, we have decided to label this release as h5py 2.0. 
While most existing code will run unmodified, we strongly encourage all users to consult the list of changes in the document "What's new in h5py 2.0": http://h5py.alfven.org/docs/intro/whatsnew.html Downloads, FAQ and bug tracker are available at Google Code: * Google code site: http://h5py.googlecode.com Most exciting changes --------------------- * Significant improvements in stability, from a refactoring of the low-level component which talks to HDF5. * HDF5 1.8.3 through 1.8.7 now work correctly and are officially supported. * Python 3.2 is officially supported by h5py! Thanks especially to Darren Dale for getting this working. * HDF5 1.6.X is no longer supported on any platform; following the release of 1.6.10 some time ago, this branch is no longer maintained by The HDF Group. * Python 2.6 or later is now required to run h5py. This is a consequence of the numerous changes made to h5py for Python 3 compatibility. From garyr at fidalgo.net Mon Jun 20 11:03:34 2011 From: garyr at fidalgo.net (garyr) Date: Mon, 20 Jun 2011 08:03:34 -0700 Subject: [SciPy-User] rfft Message-ID: If I generate a sine wave of a particular frequency in an array of type float32 or float64 and compute the transform using the function fft (in scipy/fftpack/basic.py) I find the signal in the correct bin. If I use the function rfft, which is described as returning the Fourier transform of a real sequence, I find the signal in the bin corresponding to twice the actual frequency. I thought rfft would be the proper function to use for real (non-complex) data. What is the correct usage of rfft? From silva at lma.cnrs-mrs.fr Mon Jun 20 11:09:22 2011 From: silva at lma.cnrs-mrs.fr (Fabrice Silva) Date: Mon, 20 Jun 2011 17:09:22 +0200 Subject: [SciPy-User] rfft In-Reply-To: References: Message-ID: <1308582562.3962.1.camel@amilo.coursju> Le lundi 20 juin 2011 ? 08:03 -0700, garyr a ?crit : > If I generate a sine wave of a particular frequency in an array of type > float32 or float64 and compute the transform using the function fft (in > scipy/fftpack/basic.py) I find the signal in the correct bin. If I use the > function rfft, which is described as returning the Fourier transform of a > real sequence, I find the signal in the bin corresponding to twice the > actual frequency. I thought rfft would be the proper function to use for > real (non-complex) data. What is the correct usage of rfft? Note that fft gives estimations of the Fourier transform over the range [0, Fs). rfft uses the property of the real signals (hermitian symmetry) to return values over the [0, Fs/2) range only. -- Fabrice Silva From charlesr.harris at gmail.com Mon Jun 20 11:11:38 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 20 Jun 2011 09:11:38 -0600 Subject: [SciPy-User] rfft In-Reply-To: References: Message-ID: On Mon, Jun 20, 2011 at 9:03 AM, garyr wrote: > If I generate a sine wave of a particular frequency in an array of type > float32 or float64 and compute the transform using the function fft (in > scipy/fftpack/basic.py) I find the signal in the correct bin. If I use the > function rfft, which is described as returning the Fourier transform of a > real sequence, I find the signal in the bin corresponding to twice the > actual frequency. I thought rfft would be the proper function to use for > real (non-complex) data. What is the correct usage of rfft? > > Could you provide an example? Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From garyr at fidalgo.net Mon Jun 20 12:35:21 2011 From: garyr at fidalgo.net (garyr) Date: Mon, 20 Jun 2011 09:35:21 -0700 Subject: [SciPy-User] rfft References: Message-ID: ----- Original Message ----- From: "Charles R Harris" To: "SciPy Users List" Sent: Monday, June 20, 2011 8:11 AM Subject: Re: [SciPy-User] rfft > On Mon, Jun 20, 2011 at 9:03 AM, garyr wrote: > >> If I generate a sine wave of a particular frequency in an array of type >> float32 or float64 and compute the transform using the function fft (in >> scipy/fftpack/basic.py) I find the signal in the correct bin. If I use >> the >> function rfft, which is described as returning the Fourier transform of a >> real sequence, I find the signal in the bin corresponding to twice the >> actual frequency. I thought rfft would be the proper function to use for >> real (non-complex) data. What is the correct usage of rfft? >> >> > Could you provide an example? > > Chuck > -------------------------------------------------------------------------------- > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: rfftTest.py URL: From charlesr.harris at gmail.com Mon Jun 20 13:10:06 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 20 Jun 2011 11:10:06 -0600 Subject: [SciPy-User] rfft In-Reply-To: References: Message-ID: On Mon, Jun 20, 2011 at 10:35 AM, garyr wrote: > > ----- Original Message ----- From: "Charles R Harris" < > charlesr.harris at gmail.com> > To: "SciPy Users List" > Sent: Monday, June 20, 2011 8:11 AM > Subject: Re: [SciPy-User] rfft > > > > On Mon, Jun 20, 2011 at 9:03 AM, garyr wrote: >> >> If I generate a sine wave of a particular frequency in an array of type >>> float32 or float64 and compute the transform using the function fft (in >>> scipy/fftpack/basic.py) I find the signal in the correct bin. If I use >>> the >>> function rfft, which is described as returning the Fourier transform of a >>> real sequence, I find the signal in the bin corresponding to twice the >>> actual frequency. I thought rfft would be the proper function to use for >>> real (non-complex) data. What is the correct usage of rfft? >>> >>> >>> Could you provide an example? >> >> Chuck >> >> > > Ah, rfft from scipy returns a *real* array with the complex numbers packed in there together. That's why you can do it in place, the zero imaginary parts for DC and Nyquist are omitted. If you want a convenient format, use rfft from numpy. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From garyr at fidalgo.net Mon Jun 20 15:30:39 2011 From: garyr at fidalgo.net (garyr) Date: Mon, 20 Jun 2011 12:30:39 -0700 Subject: [SciPy-User] rfft References: Message-ID: <32DA927C4FB94E149003D2FF42B84EF8@owner59bf8d40c> ----- Original Message ----- From: "Charles R Harris" To: "SciPy Users List" Sent: Monday, June 20, 2011 10:10 AM Subject: Re: [SciPy-User] rfft > On Mon, Jun 20, 2011 at 10:35 AM, garyr wrote: > >> >> ----- Original Message ----- From: "Charles R Harris" < >> charlesr.harris at gmail.com> >> To: "SciPy Users List" >> Sent: Monday, June 20, 2011 8:11 AM >> Subject: Re: [SciPy-User] rfft >> >> >> >> On Mon, Jun 20, 2011 at 9:03 AM, garyr wrote: >>> >>> If I generate a sine wave of a particular frequency in an array of type >>>> float32 or float64 and compute the transform using the function fft (in >>>> scipy/fftpack/basic.py) I find the signal in the correct bin. If I use >>>> the >>>> function rfft, which is described as returning the Fourier transform of >>>> a >>>> real sequence, I find the signal in the bin corresponding to twice the >>>> actual frequency. I thought rfft would be the proper function to use >>>> for >>>> real (non-complex) data. What is the correct usage of rfft? >>>> >>>> >>>> Could you provide an example? >>> >>> Chuck >>> >>> >> >> > Ah, rfft from scipy returns a *real* array with the complex numbers packed > in there together. That's why you can do it in place, the zero imaginary > parts for DC and Nyquist are omitted. If you want a convenient format, use > rfft from numpy. I posted a reply with my test code as an attachment but apparently it wasn't accepted, which is just as well. Now I begin to understand; I should have checked the type of arrays returned by the two functions. Thanks for your help. From ralf.gommers at googlemail.com Mon Jun 20 16:36:03 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 20 Jun 2011 22:36:03 +0200 Subject: [SciPy-User] [SciPy-Dev] Advertise on python3 support? In-Reply-To: <4DF6265B.6060507@gmail.com> References: <4DF6265B.6060507@gmail.com> Message-ID: On Mon, Jun 13, 2011 at 5:01 PM, Xavier Gnata wrote: > Hi, > > Looking at http://www.scipy.org/ it is not obvious to find info on the > numpy/scipy python3 support (welldon't even find this statement at all > on scipy.org.). > > Is there a plan to advertise a bit more on that support? > I think it is needed because it would clearly show to other packages > maintainers that the trend to python3 has started. > > I've added this to the FAQ. There may be some more places this could be mentioned. The website needs a little attention anyway, there has been an open issue about the license not being mentioned on the front page for a long time. It would be great to have a volunteer for improving the site. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Mon Jun 20 19:07:28 2011 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 20 Jun 2011 23:07:28 +0000 (UTC) Subject: [SciPy-User] rfft References: <32DA927C4FB94E149003D2FF42B84EF8@owner59bf8d40c> Message-ID: On Mon, 20 Jun 2011 12:30:39 -0700, garyr wrote: [clip] > I posted a reply with my test code as an attachment but apparently it > wasn't accepted, which is just as well. Now I begin to understand; I > should have checked the type of arrays returned by the two functions. 
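For the record, the difference is easy to see side by side with a pure cosine that has 3 cycles in 16 samples:

import numpy as np
import scipy.fftpack

x = np.cos(2 * np.pi * 3 * np.arange(16) / 16.0)

print scipy.fftpack.rfft(x)  # real array of length 16: [y0, Re(y1), Im(y1), ..., Re(y8)]
print np.fft.rfft(x)         # complex array of length 9: [y0, y1, ..., y8]

The peak sits at the same frequency in both, once the packed real layout is taken into account.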
Or checked this: http://docs.scipy.org/doc/scipy/reference/generated/scipy.fftpack.rfft.html From Deil.Christoph at googlemail.com Mon Jun 20 19:16:02 2011 From: Deil.Christoph at googlemail.com (Christoph Deil) Date: Tue, 21 Jun 2011 01:16:02 +0200 Subject: [SciPy-User] Python significance / error interval / confidence interval module? In-Reply-To: References: <20B112A3-6507-430A-AA1F-E65E1070CC4A@googlemail.com> <20110617151220.GL17180@phare.normalesup.org> <4DFB8A1F.4030307@gmail.com> Message-ID: <4CEDBD64-C7A3-40F1-9107-002EA9F0683F@googlemail.com> On Jun 17, 2011, at 8:12 PM, josef.pktd at gmail.com wrote: > On Fri, Jun 17, 2011 at 1:08 PM, Bruce Southey wrote: >> On 06/17/2011 11:21 AM, josef.pktd at gmail.com wrote: >>> On Fri, Jun 17, 2011 at 11:12 AM, Gael Varoquaux >>> wrote: >>>> On Fri, Jun 17, 2011 at 05:08:16PM +0200, Christoph Deil wrote: >>>>> I am looking for a python module for significance / error interval / >>>>> confidence interval computation. >>>> How about http://pypi.python.org/pypi/uncertainties/ >>>> >>>>> Specifically I am looking for Poisson rate estimates in the presence of >>>>> uncertain background and / or efficiency, e.g. for an "on/off >>>>> measurement". >>>> Wow, that seems a bit more involved than Gaussian error statistics. I am >>>> not sure that the above package will solve your problem. >>>> >>>>> The standard method of Rolke I am mainly interested in is available in >>>>> ROOT and RooStats, a C++ high energy physics data analysis package: >>>> If you really need proper Poisson-rate errors, then you might indeed not >>>> to translate the Rolke method to Python. How about contributing it to >>>> uncertainties. Gael, the uncertainties package ( http://packages.python.org/uncertainties/ ) is only for error propagation, not error computation, so I don't think methods for Poisson-rate error computation would fit there. By the way: everyone doing data analysis needs to propagate errors sometimes. In my opinion uncertainties is so useful that its functionality should be included in scipy. >>> It's a very specific model, and I doubt it's covered by any general >>> packages, but implementing >>> http://lanl.arxiv.org/abs/physics/0403059 >>> assuming this is the background for it, doesn't sound too difficult. >>> >>> The main work it looks like is keeping track of all the different >>> models and parameterizations. >>> scipy.stats.distributions and scipy.optimize (fmin, fsolve) will cover >>> much of the calculations. >>> >>> (But then of course there is testing and taking care of corner cases >>> which takes at least several times as long as the initial >>> implementation, in my experience.) >>> >>> Josef >>>> >> Actually I am more interested in how this differs from a generalized >> linear model where modeling Poisson or negative binomial distribution is >> feasible. >> Bruce > > That was my first guess, but in the paper it's pretty different, in > the paper the assumption is that two variables are observed, x,y, > which each have different independent distribution, but have some > parameters in common > > X ? Pois(? + b), Y ? Pois( b) > > or variations on this like > X ? Pois(e? + b), Y ? N(b, sigma_b), Z ? N(e, sigma_e) > > The rest is mostly profile likelihood from a quick skimming of the > paper, to get confidence intervals on mu, getting rid of the nuisance > parameter > > Josef Josef, thanks a lot for your helpful comments! 
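As a rough illustration of the profile-likelihood recipe sketched above (an explicit Poisson log-likelihood plus scipy.optimize for the profiling and the root finding), here is a minimal interval for the simplest on/off model, n_on ~ Pois(mu + b), n_off ~ Pois(tau*b). It is only a sketch: it ignores the efficiency term and the coverage corrections discussed in the Rolke et al. paper, and the function names are invented for the example.

import numpy as np
from scipy import optimize, special

def pois_logpmf(n, lam):
    # Poisson log-likelihood written out explicitly (works on any scipy version)
    return n * np.log(lam) - lam - special.gammaln(n + 1)

def profile_nll(mu, n_on, n_off, tau):
    # negative log-likelihood of (n_on, n_off), profiled over the nuisance
    # background rate b:  n_on ~ Pois(mu + b),  n_off ~ Pois(tau * b)
    nll = lambda b: -(pois_logpmf(n_on, mu + b) + pois_logpmf(n_off, tau * b))
    b_hat = optimize.fminbound(nll, 1e-10, n_on + n_off + 10.0)
    return nll(b_hat)

def profile_interval(n_on, n_off, tau, crit=1.0):
    # crit = 1.0 ~ 68% CL, 2.71 ~ 90%, 3.84 ~ 95% (chi-squared_1 quantiles)
    mu_hat = max(n_on - n_off / tau, 0.0)
    nll_min = profile_nll(mu_hat, n_on, n_off, tau)
    excess = lambda mu: 2.0 * (profile_nll(mu, n_on, n_off, tau) - nll_min) - crit
    upper = optimize.brentq(excess, mu_hat,
                            mu_hat + 10.0 * np.sqrt(n_on + 1.0) + 10.0)
    lower = optimize.brentq(excess, 0.0, mu_hat) if excess(0.0) > 0 else 0.0
    return lower, upper

print(profile_interval(n_on=10, n_off=4, tau=1.0))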
From deil.christoph at googlemail.com Tue Jun 21 06:53:18 2011 From: deil.christoph at googlemail.com (Christoph Deil) Date: Tue, 21 Jun 2011 12:53:18 +0200 Subject: [SciPy-User] How to numpy.vectorize functions with keyword arguments? Message-ID: Hi, I have some functions that use if and for statements such as the simplified example find_x below and would like to use them on numpy arrays. Question 1: Is there a way to write an iterative algorithm with a stopping condition (such as find_x) without if and for, only using numpy methods (I need speed!) ? Question 2: numpy.vectorized functions don't like being called with keyword arguments, the first line in __main__ raises a TypeError. Why does this happen? What is the standard method to make vectorized functions callable with keyword arguments? I found that writing a wrapper (wrapped_find_x) works, but I'd rather not litter my code with many such wrapper functions. In the example below it would be ok just using positional arguments, but I have many functions, each with ~10 keyword arguments. Christoph import numpy as np @np.vectorize def cost(x, scale='square'): """Some complicated function that is supplied by the user""" if scale == 'square': return x ** 2 elif scale == 'cube': return x ** 3 else: return 0 @np.vectorize def find_x(a, f, scale='square', maxiter=100): """Uses an iterative algorithm to determine a result""" x = 1 # just to avoid possibly infinite loop, maxiter should never be reached for _ in range(maxiter): if f(x, scale) > a: break x *= 2 return x def wrapped_find_x(a, f, scale='square', maxiter=100): return find_x(a, f, scale, maxiter) if __name__ == '__main__': print find_x(np.array([10, 100, 1000]), cost, scale='cube') # TypeError print wrapped_find_x(np.array([10, 100, 1000]), cost, scale='cube') # OK -------------- next part -------------- An HTML attachment was scrubbed... URL: From vanforeest at gmail.com Tue Jun 21 09:28:16 2011 From: vanforeest at gmail.com (nicky van foreest) Date: Tue, 21 Jun 2011 15:28:16 +0200 Subject: [SciPy-User] How to numpy.vectorize functions with keyword arguments? In-Reply-To: References: Message-ID: > Question 2: numpy.vectorized functions don't like being called with keyword > arguments, the first line in __main__ raises a TypeError. > Why does this happen? What is the standard method to make vectorized > functions callable with keyword arguments? You might try partial functions, see functools.partial in the functools module. Nicky > > I found that writing a wrapper (wrapped_find_x) works, but I'd rather not > litter my code with many such wrapper functions. > In the example below it would be ok just using positional arguments, but I > have many functions, each with ~10 keyword arguments. > > Christoph > > > import numpy as np > @np.vectorize > def cost(x, scale='square'): > ??? """Some complicated function that is supplied by the user""" > ??? if scale == 'square': > ??????? return x ** 2 > ??? elif scale == 'cube': > ??????? return x ** 3 > ??? else: > ??????? return 0 > @np.vectorize > def find_x(a, f, scale='square', maxiter=100): > ??? """Uses an iterative algorithm to determine a result""" > ??? x = 1 > ??? # just to avoid possibly infinite loop, maxiter should never be reached > ??? for _ in range(maxiter): > ??????? if f(x, scale) > a: > ??????????? break > ??????? x *= 2 > ??? return x > def wrapped_find_x(a, f, scale='square', maxiter=100): > ??? return find_x(a, f, scale, maxiter) > if __name__ == '__main__': > ??? 
print find_x(np.array([10, 100, 1000]), cost, scale='cube') # TypeError > ??? print wrapped_find_x(np.array([10, 100, 1000]), cost, scale='cube') # OK > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From almar.klein at gmail.com Tue Jun 21 09:40:17 2011 From: almar.klein at gmail.com (Almar Klein) Date: Tue, 21 Jun 2011 15:40:17 +0200 Subject: [SciPy-User] ANN: Visvis version 1.5 - The object oriented approach to visualization Message-ID: Hi all, On behalf of the visvis development team, I'm pleased to announce the latest release of visvis! We have a new backend, we've done improvements to the Mesh class, we've done a lot of work on the cameras, and we've got a fun flight-sim style camera . And much more... website: http://code.google.com/p/visvis/ Discussion group: http://groups.google.com/group/visvis/ Documentation: http://code.google.com/p/visvis/wiki/Visvis_basics Release notes: http://code.google.com/p/visvis/wiki/releaseNotes What is visvis? --------------- Visvis is a pure Python library for visualization of 1D to 4D data in an object oriented way. Essentially, visvis is an object oriented layer of Python on top of OpenGl, thereby combining the power of OpenGl with the usability of Python. A Matlab-like interface in the form of a set of functions allows easy creation of objects (e.g. plot(), imshow(), volshow(), surf()). Visvis with Reinteract ---------------------- Robert Schroll has been working to enable using visvis in interact:http://www.reinteract.org/trac/. See this discussion: http://groups.google.com/group/visvis/browse_thread/thread/bfe129a265453140 Most notable changes -------------------- * Visvis now also has a GTK backend. * The cameras are now more explicitly exposed to the user, making it easier for the user to set a new camera, for example to use a single camera for multiple axes. * Reimplemented the FlyCamera so it is much easier to control. Some gaming experience will still help though :) see the meshes examplefor a movie. * The 3D camera now also has a perspective view. Use shift+RMB to interactively change the field of view. * A mesh() convenience funcion was added. The signature of the Mesh class was changed to make it more intuitive. The old signature if still supported but may be removed in future versions. * Visvis now has a settings object, which can be used to change user-specific defaults, such as the preferred backend and the size of new figures. * 3D color data can now be rendered. * Implemented volshow2(), which displays a volume using three 2D slices, which can be moved interactively through the volume. Visvis automatically falls back to this way of visualization if 3D volume rendering is not possible on the client hardware. (see release notes for a more detailed list) Regards, Almar -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.sudwarts at gmail.com Mon Jun 20 09:06:33 2011 From: robert.sudwarts at gmail.com (Rob) Date: Mon, 20 Jun 2011 06:06:33 -0700 (PDT) Subject: [SciPy-User] sckits.timeseries - reportlib question re. masked data Message-ID: Hi, I have a timeseries constructed as a structured array for which I'd like to generate output using the reportlib. The Report object has a `mask_rep` parameter, however, I'd like to suppress the output of any masked data. Is this possible (without going back to the generated text files and scrubbing out the masked data)? 
Thanks, Rob From srean.list at gmail.com Tue Jun 21 15:55:04 2011 From: srean.list at gmail.com (srean) Date: Tue, 21 Jun 2011 14:55:04 -0500 Subject: [SciPy-User] (cumsum, broadcast) in (numexpr, weave) Message-ID: Hi All, [I accidentally cross-posted this to the numpy-discussion list, I think it is more appropriate here] is there a fast way to do cumsum with numexpr ? I could not find it, but the functions available in numexpr does not seem to be exhaustively documented, so it is possible that I missed it. Do not know if 'sum' takes special arguments that can be used. To try another track, does numexpr operators have something like the 'out' parameter for ufuncs ? If it is so, one could perhaps use add( a[0:-1], a[1,:], out = a[1,:) provided it is possible to preserve the sequential semantics. Another option is to use weave which does have cumsum. However my code requires expressions which implement broadcast. That leads to my next question, does repeat or concat return a copy or a view. If they avoid copying, I could perhaps use repeat to simulate efficient broadcasting. Or will it make a copy of that array anyway ?. I would ideally like to use numexpr because I make heavy use of transcendental functions and was hoping to exploit the VML library. Thanks for the help -- srean From j.reid at mail.cryst.bbk.ac.uk Wed Jun 22 04:29:48 2011 From: j.reid at mail.cryst.bbk.ac.uk (John Reid) Date: Wed, 22 Jun 2011 09:29:48 +0100 Subject: [SciPy-User] What method does scipy.integrate.dblquad use? Message-ID: I'm using scipy.integrate.dblquad for a 2-dimensional integral. I'd like to know what underlying method it is using to calculate this. I can't see this in the docs. AFAICT QUADPACK is just for one dimensional integrals so it must be using something else. Thanks, John. From noreply at boxbe.com Wed Jun 22 04:30:33 2011 From: noreply at boxbe.com (noreply at boxbe.com) Date: Wed, 22 Jun 2011 01:30:33 -0700 (PDT) Subject: [SciPy-User] What method does scipy.integrate.dblquad use? (Action Required) Message-ID: <1572655029.962089.1308731433107.JavaMail.prod@app014.dmz> Hello SciPy Users List, You will not receive any more courtesy notices from our members for two days. Messages you have sent will remain in a lower priority mailbox for our member to review at their leisure. Future messages will be more likely to be viewed if you are on our member's priority Guest List. Thank you, jan.ondercanin at gmail.com Powered by Boxbe -- "End Email Overload" Visit http://www.boxbe.com/how-it-works?tc=8476025896_1881010779 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded message was scrubbed... From: John Reid Subject: [SciPy-User] What method does scipy.integrate.dblquad use? Date: Wed, 22 Jun 2011 09:29:48 +0100 Size: 3099 URL: From Deil.Christoph at googlemail.com Wed Jun 22 04:41:31 2011 From: Deil.Christoph at googlemail.com (Christoph Deil) Date: Wed, 22 Jun 2011 10:41:31 +0200 Subject: [SciPy-User] How to numpy.vectorize functions with keyword arguments? In-Reply-To: References: Message-ID: <0903FBE4-CAF2-408D-A075-78A597AEAD63@googlemail.com> On Jun 21, 2011, at 3:28 PM, nicky van foreest wrote: >> Question 2: numpy.vectorized functions don't like being called with keyword >> arguments, the first line in __main__ raises a TypeError. >> Why does this happen? What is the standard method to make vectorized >> functions callable with keyword arguments? 
> > You might try partial functions, see functools.partial in the functools module. > > Nicky I tried using functools.partial but couldn't make it work. Can you give an example of how to use it to make a vectorized function accept keyword arguments? E.g. the following lines still give the same TypeError as before: a = np.array([10, 100, 1000]) cube_find_x = ft.partial(find_x, scale='cube') print cube_find_x(a, cost) Also this is not really what I want, which is to make find_x accept keyword arguments without writing the wrapper myself, maybe there is a way to write a decorator handle_kwargs so that I can simply write something like the following code? @handle_kwargs @np.vectorize def find_x(...) .... > > >> >> I found that writing a wrapper (wrapped_find_x) works, but I'd rather not >> litter my code with many such wrapper functions. >> In the example below it would be ok just using positional arguments, but I >> have many functions, each with ~10 keyword arguments. >> >> Christoph >> >> >> import numpy as np >> @np.vectorize >> def cost(x, scale='square'): >> """Some complicated function that is supplied by the user""" >> if scale == 'square': >> return x ** 2 >> elif scale == 'cube': >> return x ** 3 >> else: >> return 0 >> @np.vectorize >> def find_x(a, f, scale='square', maxiter=100): >> """Uses an iterative algorithm to determine a result""" >> x = 1 >> # just to avoid possibly infinite loop, maxiter should never be reached >> for _ in range(maxiter): >> if f(x, scale) > a: >> break >> x *= 2 >> return x >> def wrapped_find_x(a, f, scale='square', maxiter=100): >> return find_x(a, f, scale, maxiter) >> if __name__ == '__main__': >> print find_x(np.array([10, 100, 1000]), cost, scale='cube') # TypeError >> print wrapped_find_x(np.array([10, 100, 1000]), cost, scale='cube') # OK >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user -------------- next part -------------- An HTML attachment was scrubbed... URL: From andres.luhamaa at ut.ee Wed Jun 22 11:19:34 2011 From: andres.luhamaa at ut.ee (Andres Luhamaa) Date: Wed, 22 Jun 2011 18:19:34 +0300 Subject: [SciPy-User] scipy.io.numpyio.fwrite replacement?? Message-ID: <4E020806.4080209@ut.ee> Hi all, I try to upgrade code from scipy 0.7.2 to 0.9.0 and see that there is no more scipy.io.numpyio. I found two similar questions in the archive of this list without a reasonable answer (new in the list, so cannot replay to these). In particular I need scipy.io.numpyio.fwrite and I do not know how to replace it with something else. np.save seems to save in double, np.lib.format has some options, but nothing seems what I want. As I am not using python to read the data file, using some other file format like ".npy" or ".npz" is not an option. To illustrate what I want to get, to those familiar with fortran, the write sequence in fortran looks like this: open (filenum, file=filename,form='unformatted', & & access='DIRECT', status='unknown', RECL=lon) write (filenum,REC=IREC) data Any guidance really appreciated, Andres From vanforeest at gmail.com Wed Jun 22 14:38:04 2011 From: vanforeest at gmail.com (nicky van foreest) Date: Wed, 22 Jun 2011 20:38:04 +0200 Subject: [SciPy-User] How to numpy.vectorize functions with keyword arguments? 
In-Reply-To: <0903FBE4-CAF2-408D-A075-78A597AEAD63@googlemail.com> References: <0903FBE4-CAF2-408D-A075-78A597AEAD63@googlemail.com> Message-ID: Hi, Some time ago I built the code below, but now I take a look at it again it seems to be the reverse of what you want. I first give the arguments, and then vectorize, whereas you want a to give arguments to an already vectorized function. Sorry for the confusion. Nevertheless, here is what I built: delta = 0.01 grid = arange(np.finfo(float).eps,15,delta) def G(self, i, k): def GG(i, k, x): if x < 10.: return 0. else: return 1. cdf = np.vectorize(functools.partial(GG, i, k)) return cdf(self.grid) I needed the cumulative distribution function on a grid, for specific values of i and k. The code above shows a trivial example (in which I actually don't need the i and k.) hope this helps somewhat. Nicky On 22 June 2011 10:41, Christoph Deil wrote: > > On Jun 21, 2011, at 3:28 PM, nicky van foreest wrote: > > Question 2: numpy.vectorized functions don't like being called with keyword > > arguments, the first line in __main__ raises a TypeError. > > Why does this happen? What is the standard method to make vectorized > > functions callable with keyword arguments? > > You might try partial functions, see functools.partial in the functools > module. > > Nicky > > I tried using functools.partial but couldn't make it work. > Can you give an example of how to use it to make a vectorized function > accept keyword arguments? > E.g. the following lines still give the same TypeError as before: > a = np.array([10, 100, 1000]) > cube_find_x = ft.partial(find_x, scale='cube') > print cube_find_x(a, cost) > Also this is not really what I want, which is to make find_x accept keyword > arguments without writing the wrapper myself, > maybe there is a way to write a decorator handle_kwargs so that I can simply > write something like the following code? > @handle_kwargs > @np.vectorize > def find_x(...) > ?? ?.... > > > > I found that writing a wrapper (wrapped_find_x) works, but I'd rather not > > litter my code with many such wrapper functions. > > In the example below it would be ok just using positional arguments, but I > > have many functions, each with ~10 keyword arguments. > > Christoph > > > import numpy as np > > @np.vectorize > > def cost(x, scale='square'): > > ??? """Some complicated function that is supplied by the user""" > > ??? if scale == 'square': > > ??????? return x ** 2 > > ??? elif scale == 'cube': > > ??????? return x ** 3 > > ??? else: > > ??????? return 0 > > @np.vectorize > > def find_x(a, f, scale='square', maxiter=100): > > ??? """Uses an iterative algorithm to determine a result""" > > ??? x = 1 > > ??? # just to avoid possibly infinite loop, maxiter should never be reached > > ??? for _ in range(maxiter): > > ??????? if f(x, scale) > a: > > ??????????? break > > ??????? x *= 2 > > ??? return x > > def wrapped_find_x(a, f, scale='square', maxiter=100): > > ??? return find_x(a, f, scale, maxiter) > > if __name__ == '__main__': > > ??? print find_x(np.array([10, 100, 1000]), cost, scale='cube') # TypeError > > ??? 
print wrapped_find_x(np.array([10, 100, 1000]), cost, scale='cube') # OK > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From pav at iki.fi Wed Jun 22 14:49:32 2011 From: pav at iki.fi (Pauli Virtanen) Date: Wed, 22 Jun 2011 18:49:32 +0000 (UTC) Subject: [SciPy-User] scipy.io.numpyio.fwrite replacement?? References: <4E020806.4080209@ut.ee> Message-ID: On Wed, 22 Jun 2011 18:19:34 +0300, Andres Luhamaa wrote: > I try to upgrade code from scipy 0.7.2 to 0.9.0 and see that there is no > more scipy.io.numpyio. I found two similar questions in the archive of > this list without a reasonable answer (new in the list, so cannot > replay to these). http://docs.scipy.org/doc/numpy/reference/generated/numpy.fromfile.html http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.tofile.html http://news.gmane.org/gmane.comp.python.scientific.user http://search.gmane.org/?query=fwrite&group=gmane.comp.python.scientific.user&sort=date&xP=Zfwrite&xFILTERS=Gcomp.python.scientific.user From jgomezdans at gmail.com Thu Jun 23 07:42:21 2011 From: jgomezdans at gmail.com (Jose Gomez-Dans) Date: Thu, 23 Jun 2011 12:42:21 +0100 Subject: [SciPy-User] Setting an absolute tolerance to fmin_l_bfgs_b Message-ID: Hi, I'm minimising a function with fmin_l_bfgs_b. The function has many local minima, so I need to have a low high tolerance (factr parameter) so as not to get trapped in one. Also, the area around the global minimum is usually quite flat, and I have an expectation of the value of the function at the minimum (although not of where the minimum is!). When the algorithm reaches the minimum, it goes around for a long time, optimising the function further and further. This is overkill, as I have ways to calculate the uncertainty in the parameters post-optimisation and I would rather the optimisation stopped once the function is under a given threshold. In the fortran version, I just have an if statement and bail out of it, but I was wondering whether the scipy version has something similar (the docs imply that you can only tweak m, pgtol and factr). Any ideas? Thanks! Jose -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Thu Jun 23 09:21:35 2011 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 23 Jun 2011 13:21:35 +0000 (UTC) Subject: [SciPy-User] Setting an absolute tolerance to fmin_l_bfgs_b References: Message-ID: Thu, 23 Jun 2011 12:42:21 +0100, Jose Gomez-Dans wrote: [clip] > I have ways to calculate the uncertainty in the parameters > post-optimisation and I would rather the optimisation stopped once the > function is under a given threshold. In the fortran version, I just have > an if statement and bail out of it, but I was wondering whether the > scipy version has something similar (the docs imply that you can only > tweak m, pgtol and factr). Any ideas? You can bail out by raising an exception in the evaluation function and catching it outside. ----------------------------------------------------- class Bailout(Exception): def __init__(self, x, f): self.x = x self.f = f ... def func(x): ... 
if abs(f) < atol: raise Bailout(x, f) return f try: x, f, d = fmin_l_bfgs_b(func, x0) except Bailout, e: x = e.x f = e.f From jgomezdans at gmail.com Thu Jun 23 09:38:08 2011 From: jgomezdans at gmail.com (Jose Gomez-Dans) Date: Thu, 23 Jun 2011 14:38:08 +0100 Subject: [SciPy-User] Setting an absolute tolerance to fmin_l_bfgs_b In-Reply-To: References: Message-ID: Pauli, On 23 June 2011 14:21, Pauli Virtanen wrote: > Thu, 23 Jun 2011 12:42:21 +0100, Jose Gomez-Dans wrote: > [clip] > > I have ways to calculate the uncertainty in the parameters > > post-optimisation and I would rather the optimisation stopped once the > > function is under a given threshold.[...] > You can bail out by raising an exception in the evaluation function > and catching it outside. > This is a really neat and elegant way of dealing with my problem. Thanks! J -------------- next part -------------- An HTML attachment was scrubbed... URL: From dg.gmane at thesamovar.net Thu Jun 23 19:22:53 2011 From: dg.gmane at thesamovar.net (Dan Goodman) Date: Fri, 24 Jun 2011 01:22:53 +0200 Subject: [SciPy-User] tool for running simulations In-Reply-To: <20110620081211.GA32343@phare.normalesup.org> References: <20110619204747.GC19338@phare.normalesup.org> <20110620081211.GA32343@phare.normalesup.org> Message-ID: >> At the moment I have a system built on Python shelves, and the >> performance is not great. > > :). On top of that, I had a fair amount of database corruptions when I > was using shelves. This code is reasonnably isolated, so you won't > corrupt your complete cache, just one result. Yes, I'm having to make lots of backups because a single interrupted write can completely destroy the whole database it seems. (Sometimes I can rescue some of the data, sometimes not.) >> * Can it be used on multiple computers? > > If you have an NFS share between the computers, yes. The code works OK in > parallel. You will have race conditions, but it captures them, and falls > back on its feets. Ah nice, how does it do that? >> * Can you browse the generated data easily? > > No. This is something that could/should be improved (want to organize a > sprint in Paris, if you still are in Paris?). If we used HDF5 as the backend then you'd get this for free, so maybe that's the better way? I'd possibly be interested in doing a sprint, I'm in Paris until the end of July but I'm finishing up here so I'll probably have quite a lot of things to finish. >> That's one thing I liked about the idea of doing it with HDF5 is that >> there are nice visual browsers and you can include metadata, search via >> metadata, remove parts of the data, etc. > > Agreed. Actually, an HDF5 backend would probably be a good idea. But > first we would need to merge Dags's changes, that abstract a bit the data > storage. I like the idea of putting it in HDF5 because it's becoming quite a good standard for scientific computing, so potentially this makes it easier to integrate with other tools, etc., without having to write translators. I'm not in a huge hurry to get this working because I'm too deep into my current project to switch from using shelves - annoying as they are. But, I really don't want to have to go through all that hassle again so I'm definitely motivated to do something in the relatively near future. I'm hoping to be in London on a grant that will be taking me back to Paris reasonably often, so we could arrange some time next (academic) year if not before. 
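To make the HDF5 idea mentioned above a bit more concrete, here is a minimal sketch of storing one simulation result per group with the run parameters attached as attributes, using h5py (PyTables would work equally well). The layout and the function names are invented for the example; this is not how joblib or any of the tools discussed here actually store results.

import numpy as np
import h5py

def save_run(filename, key, params, result):
    # one group per run; parameters stored as attributes so they can be
    # browsed and searched with any HDF5 viewer
    f = h5py.File(filename, 'a')
    try:
        grp = f.require_group(key)
        for name, value in params.items():
            grp.attrs[name] = value
        if 'result' in grp:
            del grp['result']
        grp.create_dataset('result', data=result)
    finally:
        f.close()

def load_run(filename, key):
    # returns (parameter dict, result array) for a previously saved run
    f = h5py.File(filename, 'r')
    try:
        grp = f[key]
        return dict(grp.attrs), grp['result'][...]
    finally:
        f.close()

save_run('cache.h5', 'run_0001', {'alpha': 0.1, 'n': 1000}, np.random.rand(1000))
params, result = load_run('cache.h5', 'run_0001')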
Dan From gus.is.here at gmail.com Fri Jun 24 02:01:42 2011 From: gus.is.here at gmail.com (Gus Ishere) Date: Fri, 24 Jun 2011 02:01:42 -0400 Subject: [SciPy-User] Simulate hybrid dynamical system Message-ID: Hi all, I am looking to simulate a hybrid dynamical system, containing both continuous-time and discrete-time models. I'm familiar with the ODE integrators in SciPy and have seen a discrete event simulator for Python. Is there a framework or example of simulating a hybrid model? Ideally there would be some tool which would compile the model so that the simulation isn't scripted, similar to PyDST. Thanks! Gustavo From richsharp at stanford.edu Fri Jun 24 04:34:31 2011 From: richsharp at stanford.edu (Richard Sharp) Date: Fri, 24 Jun 2011 01:34:31 -0700 Subject: [SciPy-User] iterative matrix methods seem slow Message-ID: Hello, I'm attempting to solve a largish sparse, banded diagonal, linear system resulting from a finite difference approximation of a diffusion-like equation. I can use `spsolve` on smaller matrices, but get an odd memory error when they get to around (640k)^2: ...Can't expand MemType 1 I've also looked into using some of the iterative solvers, but they are all painfully slow. For example, on a (40k)^2 system `spsolve` runs in 0.8s while gmres takes 4.5s Bumping up to (90k)^2 takes 2.4s with `spsolve` and 15.4s on `gmres`. The other methods don't work, or run so long I don't wait for them to converge. I've tried using a Jacobi preconditioner and making good approximations to the solution for `gmres`, but I only end up with slower times. I think I'm doing something wrong, because It's my impression that the iterative methods should run pretty quickly especially compared against a direct solver. My code looks something like this: #A holds diagonals matrix = spdiags(A,[-m,-1,0,1,m],n*m,n*m,"csr") if direct: result = spsolve(matrix,b) else: print ' system too large trying iteration' result = scipy.sparse.linalg.gmres(matrix,b)[0] I'd appreciate any help with anything I could be doing wrong with setting up the system, making the calls, or a fundamental misunderstanding of the methods. Thanks for any help, Rich -- Richard P. Sharp Jr. Lead Software Developer Natural Capital Project Stanford University, U Minnesota, TNC, WWF 371 Serra Mall Stanford, CA 94305 http://www.stanford.edu/~rpsharp/ From j.reid at mail.cryst.bbk.ac.uk Fri Jun 24 05:26:54 2011 From: j.reid at mail.cryst.bbk.ac.uk (John Reid) Date: Fri, 24 Jun 2011 10:26:54 +0100 Subject: [SciPy-User] How to fit parameters of beta distribution? Message-ID: Hi, I can see a instancemethod scipy.stats.beta.fit. I can't work out from the docs how to use it. From trial & error I got the following: In [12]: scipy.stats.beta.fit([.5]) Out[12]: array([ 1.87795851e+00, 1.81444871e-01, 2.39026963e-04, 4.99760973e-01]) What are the 4 values output by the method? Thanks, John. From pav at iki.fi Fri Jun 24 06:49:35 2011 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 24 Jun 2011 10:49:35 +0000 (UTC) Subject: [SciPy-User] iterative matrix methods seem slow References: Message-ID: Fri, 24 Jun 2011 01:34:31 -0700, Richard Sharp wrote: [clip] > I can use `spsolve` on smaller matrices, but get an odd memory error > when they get to around (640k)^2: > > ...Can't expand MemType 1 For reference to other readers: http://projects.scipy.org/scipy/ticket/1464#comment:2 (It's a platform dependent issue --- things work on Linuxes, and is possibly related with memory fragmentation.) 
[clip] > I've tried using a Jacobi preconditioner and making good approximations > to the solution for `gmres`, but I only end up with slower times. I > think I'm doing something wrong, because It's my impression that the > iterative methods should run pretty quickly especially compared against > a direct solver. I think what you're encountering is only the harsh reality of iterative solvers: for large matrices you usually need good preconditioners, as otherwise iterative solution is either too slow or doesn't work at all. > My code looks something like this: > > #A holds diagonals > matrix = spdiags(A,[-m,-1,0,1,m],n*m,n*m,"csr") > if direct: > result = spsolve(matrix,b) > else: > print ' system too large trying iteration' result = > scipy.sparse.linalg.gmres(matrix,b)[0] > > I'd appreciate any help with anything I could be doing wrong with > setting up the system, making the calls, or a fundamental > misunderstanding of the methods. You can also try adjusting the restart parameter of GMRES. By default, it restarts every 20 iterations, which may be too low for your case. Or, you can try using `lgmres` which may be slightly more resistant to stagnation. I don't see you passing in a preconditioner here -- it goes in via the M= parameter of gmres. On preconditioners: If you want "automatic" preconditioners, you can try the following: http://docs.scipy.org/doc/scipy/reference/sparse.linalg.html#scipy.sparse.linalg.spilu http://code.google.com/p/pyamg/ From pav at iki.fi Fri Jun 24 07:13:59 2011 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 24 Jun 2011 11:13:59 +0000 (UTC) Subject: [SciPy-User] iterative matrix methods seem slow References: Message-ID: Fri, 24 Jun 2011 10:49:35 +0000, Pauli Virtanen wrote: [clip] >> print ' system too large trying iteration' result = >> scipy.sparse.linalg.gmres(matrix,b)[0] Also, you can specify the required tolerance with the tol= parameter: often getting the solution down to machine precision is not needed. From deil.christoph at googlemail.com Fri Jun 24 07:20:55 2011 From: deil.christoph at googlemail.com (Christoph Deil) Date: Fri, 24 Jun 2011 13:20:55 +0200 Subject: [SciPy-User] How to fit parameters of beta distribution? In-Reply-To: References: Message-ID: <10F8E46B-A50A-4639-91FB-C58D8F023978@googlemail.com> On Jun 24, 2011, at 11:26 AM, John Reid wrote: > Hi, > > I can see a instancemethod scipy.stats.beta.fit. I can't work out from > the docs how to use it. From trial & error I got the following: > > In [12]: scipy.stats.beta.fit([.5]) > Out[12]: > array([ 1.87795851e+00, 1.81444871e-01, 2.39026963e-04, > 4.99760973e-01]) > > What are the 4 values output by the method? > > Thanks, > John. Hi John, the short answer is (a, b, loc, scale), but you probably want to fix loc=0 and scale=1 to get meaningful a, b estimates. It takes some time to learn how scipy.stats.rv_continuous works, but this is a good starting point: http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html#distributions There you'll see that every rv_continuous distribution (e.g. norm, chi2, beta) has two parameters loc and scale, which shift and stretch the distribution like this: (x - loc) / scale E.g. from the docstring of scipy.stats.norm, you can see that norm uses these two parameters and has no extra "shape parameters": Normal distribution The location (loc) keyword specifies the mean. The scale (scale) keyword specifies the standard deviation. 
normal.pdf(x) = exp(-x**2/2)/sqrt(2*pi) You can draw a random data sample and fit it like this: data = scipy.stats.norm.rvs(loc=10, scale=2, size=100) scipy.stats.norm.fit(data) # returns loc, scale # (9.9734277669649689, 2.2125503785545551) The beta distribution you are interested in has two shape parameters a and b, plus in addition the loc and scale parameters every rv_continuous has: Beta distribution beta.pdf(x, a, b) = gamma(a+b)/(gamma(a)*gamma(b)) * x**(a-1) * (1-x)**(b-1) for 0 < x < 1, a, b > 0. In your case you probably want to fix loc=0 and scale=1 and only fit the a and b parameter, which you can do like this: data = scipy.stats.beta.rvs(2, 5, size=100) # a = 2, b = 5 (can't use keyword arguments) scipy.stats.beta.fit(data, floc=0, fscale=1) # returns a, b, loc, scale # (2.6928363303187393, 5.9855671734557454, 0, 1) I find that the splitting of parameters into "location and scale" and "shape" makes rv_continuous usage complicated: - it is uncommon that the beta or chi2 or many other distributions have a loc and scale parameter - the auto-generated docstrings are confusing at first But if you look at the implementation it does avoid some repetitive code for the developers. Btw., I don't know how you can fit multiple parameters to only one measurement [.5] in your example. You must have executed some code before that line, otherwise you'll get a bunch of RuntimeWarnings and a different return value from the one you give (I use on scipy 0.9) In [1]: import scipy.stats In [2]: scipy.stats.beta.fit([.5]) Out[2]: (1.0, 1.0, 0.5, 0.0) Christoph From j.reid at mail.cryst.bbk.ac.uk Fri Jun 24 08:37:49 2011 From: j.reid at mail.cryst.bbk.ac.uk (John Reid) Date: Fri, 24 Jun 2011 13:37:49 +0100 Subject: [SciPy-User] How to fit parameters of beta distribution? In-Reply-To: <10F8E46B-A50A-4639-91FB-C58D8F023978@googlemail.com> References: <10F8E46B-A50A-4639-91FB-C58D8F023978@googlemail.com> Message-ID: Thanks for the information. Just out of interest, this is what I get on scipy 0.7 (no warnings) In [1]: import scipy.stats In [2]: scipy.stats.beta.fit([.5]) Out[2]: array([ 1.87795851e+00, 1.81444871e-01, 2.39026963e-04, 4.99760973e-01]) In [3]: scipy.__version__ Out[3]: '0.7.0' Also I have (following your advice): In [7]: scipy.stats.beta.fit([.5], floc=0., fscale=1.) Out[7]: array([ 1.87795851e+00, 1.81444871e-01, 2.39026963e-04, 4.99760973e-01]) which just seems wrong, surely the loc and scale in the output should be what I specified in the arguments? In any case from your example, it seems like it is fixed in 0.9 I'm assuming fit() does a ML estimate of the parameters which I think is fine to do for a beta distribution and one data point. Thanks, John. On 24/06/11 12:20, Christoph Deil wrote: > > On Jun 24, 2011, at 11:26 AM, John Reid wrote: > >> Hi, >> >> I can see a instancemethod scipy.stats.beta.fit. I can't work out from >> the docs how to use it. From trial& error I got the following: >> >> In [12]: scipy.stats.beta.fit([.5]) >> Out[12]: >> array([ 1.87795851e+00, 1.81444871e-01, 2.39026963e-04, >> 4.99760973e-01]) >> >> What are the 4 values output by the method? >> >> Thanks, >> John. > > Hi John, > > the short answer is (a, b, loc, scale), but you probably want to fix loc=0 and scale=1 to get meaningful a, b estimates. > > It takes some time to learn how scipy.stats.rv_continuous works, but this is a good starting point: > http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html#distributions > > There you'll see that every rv_continuous distribution (e.g. 
norm, chi2, beta) has two parameters loc and scale, > which shift and stretch the distribution like this: > (x - loc) / scale > > E.g. from the docstring of scipy.stats.norm, you can see that norm uses these two parameters and has no extra "shape parameters": > Normal distribution > The location (loc) keyword specifies the mean. > The scale (scale) keyword specifies the standard deviation. > normal.pdf(x) = exp(-x**2/2)/sqrt(2*pi) > > You can draw a random data sample and fit it like this: > data = scipy.stats.norm.rvs(loc=10, scale=2, size=100) > scipy.stats.norm.fit(data) # returns loc, scale > # (9.9734277669649689, 2.2125503785545551) > > The beta distribution you are interested in has two shape parameters a and b, plus in addition the loc and scale parameters every rv_continuous has: > Beta distribution > beta.pdf(x, a, b) = gamma(a+b)/(gamma(a)*gamma(b)) * x**(a-1) * (1-x)**(b-1) > for 0< x< 1, a, b> 0. > > In your case you probably want to fix loc=0 and scale=1 and only fit the a and b parameter, which you can do like this: > data = scipy.stats.beta.rvs(2, 5, size=100) # a = 2, b = 5 (can't use keyword arguments) > scipy.stats.beta.fit(data, floc=0, fscale=1) # returns a, b, loc, scale > # (2.6928363303187393, 5.9855671734557454, 0, 1) > > I find that the splitting of parameters into "location and scale" and "shape" makes rv_continuous usage complicated: > - it is uncommon that the beta or chi2 or many other distributions have a loc and scale parameter > - the auto-generated docstrings are confusing at first > But if you look at the implementation it does avoid some repetitive code for the developers. > > Btw., I don't know how you can fit multiple parameters to only one measurement [.5] in your example. > You must have executed some code before that line, otherwise you'll get a bunch of RuntimeWarnings and a different return value from the one you give (I use on scipy 0.9) > In [1]: import scipy.stats > In [2]: scipy.stats.beta.fit([.5]) > Out[2]: (1.0, 1.0, 0.5, 0.0) > > Christoph From josef.pktd at gmail.com Fri Jun 24 08:58:53 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 24 Jun 2011 08:58:53 -0400 Subject: [SciPy-User] How to fit parameters of beta distribution? In-Reply-To: References: <10F8E46B-A50A-4639-91FB-C58D8F023978@googlemail.com> Message-ID: On Fri, Jun 24, 2011 at 8:37 AM, John Reid wrote: > Thanks for the information. Just out of interest, this is what I get on > scipy 0.7 (no warnings) > > In [1]: import scipy.stats > > In [2]: scipy.stats.beta.fit([.5]) > Out[2]: > array([ ?1.87795851e+00, ? 1.81444871e-01, ? 2.39026963e-04, > ? ? ? ? ?4.99760973e-01]) > > In [3]: scipy.__version__ > Out[3]: '0.7.0' > > Also I have (following your advice): > > In [7]: scipy.stats.beta.fit([.5], floc=0., fscale=1.) > Out[7]: > array([ ?1.87795851e+00, ? 1.81444871e-01, ? 2.39026963e-04, > ? ? ? ? ?4.99760973e-01]) > > which just seems wrong, surely the loc and scale in the output should be > what I specified in the arguments? In any case from your example, it > seems like it is fixed in 0.9 floc an fscale where added in scipy 0.9, extra keywords on 0.7 were just ignored > > I'm assuming fit() does a ML estimate of the parameters which I think is > fine to do for a beta distribution and one data point. You need at least as many observations as parameters, and without enough observations the estimate will be very noisy. With fewer observations than parameters, you cannot identify the parameters. Josef > > Thanks, > John. 
> > > On 24/06/11 12:20, Christoph Deil wrote: >> >> On Jun 24, 2011, at 11:26 AM, John Reid wrote: >> >>> Hi, >>> >>> I can see a instancemethod scipy.stats.beta.fit. I can't work out from >>> the docs how to use it. From trial& ?error I got the following: >>> >>> In [12]: scipy.stats.beta.fit([.5]) >>> Out[12]: >>> array([ ?1.87795851e+00, ? 1.81444871e-01, ? 2.39026963e-04, >>> ? ? ? ? ? 4.99760973e-01]) >>> >>> What are the 4 values output by the method? >>> >>> Thanks, >>> John. >> >> Hi John, >> >> the short answer is (a, b, loc, scale), but you probably want to fix loc=0 and scale=1 to get meaningful a, b estimates. >> >> It takes some time to learn how scipy.stats.rv_continuous works, but this is a good starting point: >> http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html#distributions >> >> There you'll see that every rv_continuous distribution (e.g. norm, chi2, beta) has two parameters loc and scale, >> which shift and stretch the distribution like this: >> (x - loc) / scale >> >> E.g. from the docstring of scipy.stats.norm, you can see that norm uses these two parameters and has no extra "shape parameters": >> Normal distribution >> ? ? ?The location (loc) keyword specifies the mean. >> ? ? ?The scale (scale) keyword specifies the standard deviation. >> ? ? ?normal.pdf(x) = exp(-x**2/2)/sqrt(2*pi) >> >> You can draw a random data sample and fit it like this: >> data = scipy.stats.norm.rvs(loc=10, scale=2, size=100) >> scipy.stats.norm.fit(data) # returns loc, scale >> # (9.9734277669649689, 2.2125503785545551) >> >> The beta distribution you are interested in has two shape parameters a and b, plus in addition the loc and scale parameters every rv_continuous has: >> Beta distribution >> ? ? ?beta.pdf(x, a, b) = gamma(a+b)/(gamma(a)*gamma(b)) * x**(a-1) * (1-x)**(b-1) >> ? ? ?for 0< ?x< ?1, a, b> ?0. >> >> In your case you probably want to fix loc=0 and scale=1 and only fit the a and b parameter, which you can do like this: >> data = scipy.stats.beta.rvs(2, 5, size=100) # a = 2, b = 5 (can't use keyword arguments) >> scipy.stats.beta.fit(data, floc=0, fscale=1) # returns a, b, loc, scale >> # (2.6928363303187393, 5.9855671734557454, 0, 1) >> >> I find that the splitting of parameters into "location and scale" and "shape" makes rv_continuous usage complicated: >> - it is uncommon that the beta or chi2 or many other distributions have a loc and scale parameter >> - the auto-generated docstrings are confusing at first >> But if you look at the implementation it does avoid some repetitive code for the developers. >> >> Btw., I don't know how you can fit multiple parameters to only one measurement [.5] in your example. >> You must have executed some code before that line, otherwise you'll get a bunch of RuntimeWarnings and a different return value from the one you give (I use on scipy 0.9) >> In [1]: import scipy.stats >> In [2]: scipy.stats.beta.fit([.5]) >> Out[2]: (1.0, 1.0, 0.5, 0.0) >> >> Christoph > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From j.reid at mail.cryst.bbk.ac.uk Fri Jun 24 09:09:21 2011 From: j.reid at mail.cryst.bbk.ac.uk (John Reid) Date: Fri, 24 Jun 2011 14:09:21 +0100 Subject: [SciPy-User] How to fit parameters of beta distribution? 
In-Reply-To: References: <10F8E46B-A50A-4639-91FB-C58D8F023978@googlemail.com> Message-ID: On 24/06/11 13:58, josef.pktd at gmail.com wrote: > On Fri, Jun 24, 2011 at 8:37 AM, John Reid wrote: >> Thanks for the information. Just out of interest, this is what I get on >> scipy 0.7 (no warnings) >> >> In [1]: import scipy.stats >> >> In [2]: scipy.stats.beta.fit([.5]) >> Out[2]: >> array([ 1.87795851e+00, 1.81444871e-01, 2.39026963e-04, >> 4.99760973e-01]) >> >> In [3]: scipy.__version__ >> Out[3]: '0.7.0' >> >> Also I have (following your advice): >> >> In [7]: scipy.stats.beta.fit([.5], floc=0., fscale=1.) >> Out[7]: >> array([ 1.87795851e+00, 1.81444871e-01, 2.39026963e-04, >> 4.99760973e-01]) >> >> which just seems wrong, surely the loc and scale in the output should be >> what I specified in the arguments? In any case from your example, it >> seems like it is fixed in 0.9 > > floc an fscale where added in scipy 0.9, extra keywords on 0.7 were just ignored OK > >> >> I'm assuming fit() does a ML estimate of the parameters which I think is >> fine to do for a beta distribution and one data point. > > You need at least as many observations as parameters, and without > enough observations the estimate will be very noisy. With fewer > observations than parameters, you cannot identify the parameters. I'm not quite sure what you mean by "identify". It is a ML estimate isn't it? That seems legitimate here but it wasn't really my original question. I was just using [.5] as an example. Thanks, John. From e.antero.tammi at gmail.com Fri Jun 24 09:21:27 2011 From: e.antero.tammi at gmail.com (eat) Date: Fri, 24 Jun 2011 16:21:27 +0300 Subject: [SciPy-User] iterative matrix methods seem slow In-Reply-To: References: Message-ID: Hi, On Fri, Jun 24, 2011 at 11:34 AM, Richard Sharp wrote: > Hello, > > I'm attempting to solve a largish sparse, banded diagonal, linear > system resulting from a finite difference approximation of a > diffusion-like equation. > > I can use `spsolve` on smaller matrices, but get an odd memory error > when they get to around (640k)^2: > > ...Can't expand MemType 1 > > I've also looked into using some of the iterative solvers, but they > are all painfully slow. For example, on a (40k)^2 system `spsolve` > runs in 0.8s while gmres takes 4.5s Bumping up to (90k)^2 takes 2.4s > with `spsolve` and 15.4s on `gmres`. The other methods don't work, or > run so long I don't wait for them to converge. > > I've tried using a Jacobi preconditioner and making good > approximations to the solution for `gmres`, but I only end up with > slower times. I think I'm doing something wrong, because It's my > impression that the iterative methods should run pretty quickly > especially compared against a direct solver. > > My code looks something like this: > > #A holds diagonals > matrix = spdiags(A,[-m,-1,0,1,m],n*m,n*m,"csr") > if direct: > result = spsolve(matrix,b) > else: > print ' system too large trying iteration' > result = scipy.sparse.linalg.gmres(matrix,b)[0] > > I'd appreciate any help with anything I could be doing wrong with > setting up the system, making the calls, or a fundamental > misunderstanding of the methods. > I played a little with your files. I can reproduce the error on Vista 64 bit and Scipy 0.9.0. However giving to spsolve parameter permc_spec value ='MMD_AT_PLUS_A' it will pass (algtough takes some 25 sec.). Also with slightly higher cell size h = 0.01075 and permc_spec= None it will pass. My 2 cents eat > > Thanks for any help, > Rich > > -- > Richard P. Sharp Jr. 
> Lead Software Developer > Natural Capital Project > Stanford University, U Minnesota, TNC, WWF > 371 Serra Mall > Stanford, CA 94305 > http://www.stanford.edu/~rpsharp/ > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From deil.christoph at googlemail.com Fri Jun 24 09:25:44 2011 From: deil.christoph at googlemail.com (Christoph Deil) Date: Fri, 24 Jun 2011 15:25:44 +0200 Subject: [SciPy-User] How to fit parameters of beta distribution? In-Reply-To: References: <10F8E46B-A50A-4639-91FB-C58D8F023978@googlemail.com> Message-ID: On Jun 24, 2011, at 3:09 PM, John Reid wrote: >>> I'm assuming fit() does a ML estimate of the parameters which I think is >>> fine to do for a beta distribution and one data point. >> >> You need at least as many observations as parameters, and without >> enough observations the estimate will be very noisy. With fewer >> observations than parameters, you cannot identify the parameters. > > I'm not quite sure what you mean by "identify". It is a ML estimate > isn't it? That seems legitimate here but it wasn't really my original > question. I was just using [.5] as an example. > > Thanks, > John. Technically you can compute the ML estimate of both parameters of a two-parameter distribution from one datapoint: In [2]: scipy.stats.norm.fit([0]) Out[2]: (4.2006250261886009e-22, 2.0669568930051829e-21) In [7]: scipy.stats.norm.fit([1]) Out[7]: (1.0, 5.4210108624275222e-20) But in this case the width estimate of 0 is not meaningful, as you will get ML estimated width 0 for any true width because you don't have enough data to estimate the width. You need at least two data points to get real estimates for two parameters: In [6]: scipy.stats.norm.fit([0,1]) Out[6]: (0.5, 0.5) From josef.pktd at gmail.com Fri Jun 24 09:32:19 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 24 Jun 2011 09:32:19 -0400 Subject: [SciPy-User] How to fit parameters of beta distribution? In-Reply-To: References: <10F8E46B-A50A-4639-91FB-C58D8F023978@googlemail.com> Message-ID: On Fri, Jun 24, 2011 at 9:09 AM, John Reid wrote: > > > On 24/06/11 13:58, josef.pktd at gmail.com wrote: >> On Fri, Jun 24, 2011 at 8:37 AM, John Reid ?wrote: >>> Thanks for the information. Just out of interest, this is what I get on >>> scipy 0.7 (no warnings) >>> >>> In [1]: import scipy.stats >>> >>> In [2]: scipy.stats.beta.fit([.5]) >>> Out[2]: >>> array([ ?1.87795851e+00, ? 1.81444871e-01, ? 2.39026963e-04, >>> ? ? ? ? ? 4.99760973e-01]) >>> >>> In [3]: scipy.__version__ >>> Out[3]: '0.7.0' >>> >>> Also I have (following your advice): >>> >>> In [7]: scipy.stats.beta.fit([.5], floc=0., fscale=1.) >>> Out[7]: >>> array([ ?1.87795851e+00, ? 1.81444871e-01, ? 2.39026963e-04, >>> ? ? ? ? ? 4.99760973e-01]) >>> >>> which just seems wrong, surely the loc and scale in the output should be >>> what I specified in the arguments? In any case from your example, it >>> seems like it is fixed in 0.9 >> >> floc an fscale where added in scipy 0.9, extra keywords on 0.7 were just ignored > > OK > >> >>> >>> I'm assuming fit() does a ML estimate of the parameters which I think is >>> fine to do for a beta distribution and one data point. >> >> You need at least as many observations as parameters, and without >> enough observations the estimate will be very noisy. 
With fewer >> observations than parameters, you cannot identify the parameters. > > I'm not quite sure what you mean by "identify". It is a ML estimate > isn't it? That seems legitimate here but it wasn't really my original > question. I was just using [.5] as an example. simplest example: fit a linear regression line through one point. There are an infinite number of solutions, that all fit the point exactly. So we cannot estimate constant and slope, but if we fix one, we can estimate the other parameter. Or, in Christoph's example below you just get a mass point, degenerate solution, in other cases the Hessian will be singular. Josef > > Thanks, > John. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From andy at csizma.org Fri Jun 24 09:34:11 2011 From: andy at csizma.org (Andy) Date: Fri, 24 Jun 2011 15:34:11 +0200 Subject: [SciPy-User] Differentiate raw data Message-ID: <4E049253.2040505@csizma.org> Hi People Is there an easy way to use scipy to numerically differentiate a raw data series? I'm trying to do some Thermogravimetric analysis (http://en.wikipedia.org/wiki/Thermogravimetric_analysis), have data in the form of x,y and and would appreciate some wisdom to help differentiate the data so I could plot(using matplotlib) x, dy/dx. There isn't any mathematical fit to the data. Andy From domenico.nappo at gmail.com Fri Jun 24 11:02:03 2011 From: domenico.nappo at gmail.com (Domenico Nappo) Date: Fri, 24 Jun 2011 17:02:03 +0200 Subject: [SciPy-User] Interpolation from a regular grid to a not regular one Message-ID: Hi there, hope you can help me. I'm new to SciPy and I'm not aware of all its nice features. I need some indications about how to complete the following task...just giving me some suggestions about which package/methods to use could be enough. I have three 2d numpy arrays representing the followings: lats, longs = latitudes and longitudes of a regular grid (shape is (161,201)) vals = corresponding values (shape = (161,201)) I've got two more 2d numpy arrays which are latitudes and longitudes of a non regular grid (shapes are (1900,2400)) Now, I've got to produce the grid of values for the non regular grid, using interpolation (probably nearest neighbour). I've come out with something using griddata from the matplotlib.mlab module but I'm not sure it's the right way and I don't know how to test the interpolated results... Many thanks in advance. -- dome -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at hilboll.de Fri Jun 24 11:28:35 2011 From: lists at hilboll.de (Andreas) Date: Fri, 24 Jun 2011 17:28:35 +0200 (CEST) Subject: [SciPy-User] scipy.stats.mstats.linregress bug? Message-ID: <4ea2bb8b28bbaa446231caf1f8a8e1e5.squirrel@srv2.hilboll.net> Hi there, I found a peculiarity in scipy.stats.mstats.linregress. It doesn't give the same results as scipy.stats.linregress on a numpy.array (see below). Is this a bug or a feature? Cheers, Andreas. 
In [1]: import numpy In [2]: import scipy In [3]: numpy.__version__ Out[3]: '1.5.1' In [4]: scipy.__version__ Out[4]: '0.9.0' In [5]: import scipy.stats In [6]: import scipy.stats.mstats In [7]: In [8]: data = numpy.array([5.28511864e+15, 4.87487615e+15, 5.56279671e+15, ...: 4.72866954e+15, 5.08328669e+15, 4.06702155e+15, ...: 4.99224913e+15, 6.29268616e+15, 9.16149273e+15, ...: 5.47843819e+15, 5.86477063e+15, 6.96145031e+15, ...: 7.25725121e+15, 6.52453707e+15, 6.01766151e+15]) In [9]: In [10]: x = numpy.arange(data.shape[0]) In [11]: In [12]: scipy.stats.linregress(x,data) Out[12]: (149163178178571.41, 4832678167416667.0, 0.53093100793359638, 0.041709303490156953, 66031024254034.961) In [13]: scipy.stats.mstats.linregress(x,data) Out[13]: (149163178178571.44, 4832678167416667.0, 0.53093100793359627, masked_array(data = 0.0417093034902, mask = False, fill_value = 1e+20) , 1028615575651548.9) From josef.pktd at gmail.com Fri Jun 24 11:35:25 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 24 Jun 2011 11:35:25 -0400 Subject: [SciPy-User] scipy.stats.mstats.linregress bug? In-Reply-To: <4ea2bb8b28bbaa446231caf1f8a8e1e5.squirrel@srv2.hilboll.net> References: <4ea2bb8b28bbaa446231caf1f8a8e1e5.squirrel@srv2.hilboll.net> Message-ID: On Fri, Jun 24, 2011 at 11:28 AM, Andreas wrote: > Hi there, > > I found a peculiarity in scipy.stats.mstats.linregress. It doesn't give > the same results as scipy.stats.linregress on a numpy.array (see below). > > Is this a bug or a feature? > > Cheers, > Andreas. > > In [1]: import numpy > > In [2]: import scipy > > In [3]: numpy.__version__ > Out[3]: '1.5.1' > > In [4]: scipy.__version__ > Out[4]: '0.9.0' > > In [5]: import scipy.stats > > In [6]: import scipy.stats.mstats > > In [7]: > > In [8]: data = numpy.array([5.28511864e+15, 4.87487615e+15, > 5.56279671e+15, > ? ...: ? ? ? ? ? ? ? ? ? ? 4.72866954e+15, 5.08328669e+15, > 4.06702155e+15, > ? ...: ? ? ? ? ? ? ? ? ? ? 4.99224913e+15, 6.29268616e+15, > 9.16149273e+15, > ? ...: ? ? ? ? ? ? ? ? ? ? 5.47843819e+15, 5.86477063e+15, > 6.96145031e+15, > ? ...: ? ? ? ? ? ? ? ? ? ? 7.25725121e+15, 6.52453707e+15, > 6.01766151e+15]) try to rescale, take away the e15, small numerical differences are possible because of the different way the results are calculated. There might still be a difference in the definition of the returns, but I haven't checked recently. Josef > > In [9]: > > In [10]: x = numpy.arange(data.shape[0]) > > In [11]: > > In [12]: scipy.stats.linregress(x,data) > Out[12]: > (149163178178571.41, > ?4832678167416667.0, > ?0.53093100793359638, > ?0.041709303490156953, > ?66031024254034.961) > > In [13]: scipy.stats.mstats.linregress(x,data) > Out[13]: > (149163178178571.44, > ?4832678167416667.0, > ?0.53093100793359627, > ?masked_array(data = 0.0417093034902, > ? ? ? ? ? ? mask = False, > ? ? ? fill_value = 1e+20) > , > ?1028615575651548.9) > > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From lists at hilboll.de Fri Jun 24 11:51:47 2011 From: lists at hilboll.de (Andreas) Date: Fri, 24 Jun 2011 17:51:47 +0200 (CEST) Subject: [SciPy-User] scipy.stats.mstats.linregress bug? In-Reply-To: References: <4ea2bb8b28bbaa446231caf1f8a8e1e5.squirrel@srv2.hilboll.net> Message-ID: <4d19147bc9c9d65a0710ae0dcb19b634.squirrel@srv2.hilboll.net> > try to rescale, take away the e15, small numerical differences are > possible because of the different way the results are calculated. 
> There might still be a difference in the definition of the returns, > but I haven't checked recently. Rescaling doesn't change a thing (see below). And, we're not talking about small numerical differences here. The problem is the last return value, stderr. It differs by almost a factor 15! Cheers, Andreas. In [15]: scipy.stats.linregress(x,data/1E15) Out[15]: (0.14916317817857139, 4.8326781674166659, 0.53093100793359616, 0.041709303490157057, 0.066031024254034967) In [16]: scipy.stats.mstats.linregress(x,data/1E15) Out[16]: (0.14916317817857139, 4.8326781674166659, 0.53093100793359627, masked_array(data = 0.0417093034902, mask = False, fill_value = 1e+20) , 1.0286155756515489) From jsseabold at gmail.com Fri Jun 24 11:49:34 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Fri, 24 Jun 2011 11:49:34 -0400 Subject: [SciPy-User] scipy.stats.mstats.linregress bug? In-Reply-To: <4d19147bc9c9d65a0710ae0dcb19b634.squirrel@srv2.hilboll.net> References: <4ea2bb8b28bbaa446231caf1f8a8e1e5.squirrel@srv2.hilboll.net> <4d19147bc9c9d65a0710ae0dcb19b634.squirrel@srv2.hilboll.net> Message-ID: On Fri, Jun 24, 2011 at 11:51 AM, Andreas wrote: >> try to rescale, take away the e15, small numerical differences are >> possible because of the different way the results are calculated. >> There might still be a difference in the definition of the returns, >> but I haven't checked recently. > > Rescaling doesn't change a thing (see below). And, we're not talking about > small numerical differences here. The problem is the last return value, > stderr. It differs by almost a factor 15! > > Cheers, > Andreas. > > In [15]: scipy.stats.linregress(x,data/1E15) > Out[15]: > (0.14916317817857139, > ?4.8326781674166659, > ?0.53093100793359616, > ?0.041709303490157057, > ?0.066031024254034967) > > In [16]: scipy.stats.mstats.linregress(x,data/1E15) > Out[16]: > (0.14916317817857139, > ?4.8326781674166659, > ?0.53093100793359627, > ?masked_array(data = 0.0417093034902, > ? ? ? ? ? ? mask = False, > ? ? ? fill_value = 1e+20) > , > ?1.0286155756515489) > > ma linregress sterrest = ma.sqrt(1.-r*r) * y.std() linregress sterrest = np.sqrt((1-r*r)*ssym / ssxm / df) Skipper From lists at hilboll.de Fri Jun 24 12:02:50 2011 From: lists at hilboll.de (Andreas) Date: Fri, 24 Jun 2011 18:02:50 +0200 (CEST) Subject: [SciPy-User] scipy.stats.mstats.linregress bug? In-Reply-To: References: <4ea2bb8b28bbaa446231caf1f8a8e1e5.squirrel@srv2.hilboll.net> <4d19147bc9c9d65a0710ae0dcb19b634.squirrel@srv2.hilboll.net> Message-ID: >>> try to rescale, take away the e15, small numerical differences are >>> possible because of the different way the results are calculated. >>> There might still be a difference in the definition of the returns, >>> but I haven't checked recently. >> >> Rescaling doesn't change a thing (see below). And, we're not talking >> about >> small numerical differences here. The problem is the last return value, >> stderr. It differs by almost a factor 15! >> >> Cheers, >> Andreas. >> >> In [15]: scipy.stats.linregress(x,data/1E15) >> Out[15]: >> (0.14916317817857139, >> ?4.8326781674166659, >> ?0.53093100793359616, >> ?0.041709303490157057, >> ?0.066031024254034967) >> >> In [16]: scipy.stats.mstats.linregress(x,data/1E15) >> Out[16]: >> (0.14916317817857139, >> ?4.8326781674166659, >> ?0.53093100793359627, >> ?masked_array(data = 0.0417093034902, >> ? ? ? ? ? ? mask = False, >> ? ? ? 
fill_value = 1e+20) >> , >> ?1.0286155756515489) >> >> > > ma linregress > sterrest = ma.sqrt(1.-r*r) * y.std() > > linregress > sterrest = np.sqrt((1-r*r)*ssym / ssxm / df) So, why is it treated differently in the two functions that everyone would expect to behave identically? What's the mathematical background. What's ssym, ssxm, df? And: Which one is a better estimate? (In my case, the stats.linregress one seems to be a lot more reasonable ...) Thanks for your insight! Andreas. From josef.pktd at gmail.com Fri Jun 24 11:59:26 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 24 Jun 2011 11:59:26 -0400 Subject: [SciPy-User] scipy.stats.mstats.linregress bug? In-Reply-To: References: <4ea2bb8b28bbaa446231caf1f8a8e1e5.squirrel@srv2.hilboll.net> <4d19147bc9c9d65a0710ae0dcb19b634.squirrel@srv2.hilboll.net> Message-ID: On Fri, Jun 24, 2011 at 12:02 PM, Andreas wrote: >>>> try to rescale, take away the e15, small numerical differences are >>>> possible because of the different way the results are calculated. >>>> There might still be a difference in the definition of the returns, >>>> but I haven't checked recently. >>> >>> Rescaling doesn't change a thing (see below). And, we're not talking >>> about >>> small numerical differences here. The problem is the last return value, >>> stderr. It differs by almost a factor 15! >>> >>> Cheers, >>> Andreas. >>> >>> In [15]: scipy.stats.linregress(x,data/1E15) >>> Out[15]: >>> (0.14916317817857139, >>> ?4.8326781674166659, >>> ?0.53093100793359616, >>> ?0.041709303490157057, >>> ?0.066031024254034967) >>> >>> In [16]: scipy.stats.mstats.linregress(x,data/1E15) >>> Out[16]: >>> (0.14916317817857139, >>> ?4.8326781674166659, >>> ?0.53093100793359627, >>> ?masked_array(data = 0.0417093034902, >>> ? ? ? ? ? ? mask = False, >>> ? ? ? fill_value = 1e+20) >>> , >>> ?1.0286155756515489) >>> >>> >> >> ma linregress >> sterrest = ma.sqrt(1.-r*r) * y.std() >> >> linregress >> sterrest = np.sqrt((1-r*r)*ssym / ssxm / df) > > So, why is it treated differently in the two functions that everyone would > expect to behave identically? What's the mathematical background. What's > ssym, ssxm, df? > > And: Which one is a better estimate? (In my case, the stats.linregress one > seems to be a lot more reasonable ...) stats.stats reports the stderror of the estimate of the slope parameter b stats.mstats reports the stderror of the regression error/residual) y - (a + bx) stats.stats got changed by accident, and mstats didn't follow. Josef > > Thanks for your insight! > Andreas. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From rob.clewley at gmail.com Fri Jun 24 12:01:47 2011 From: rob.clewley at gmail.com (Rob Clewley) Date: Fri, 24 Jun 2011 12:01:47 -0400 Subject: [SciPy-User] Simulate hybrid dynamical system In-Reply-To: References: Message-ID: Hi Gustavo, > Is there a framework or example of simulating a hybrid model? Ideally > there would be some tool which would compile the model so that the > simulation isn't scripted, similar to PyDST. Maybe I don't understand your question properly, but do you realize that PyDSTool *does* support hybrid systems in the way you suggest? Admittedly, only the ODE part of the model is truly compiled... 
http://www2.gsu.edu/~matrhc/HybridSystems.html -Rob From jsseabold at gmail.com Fri Jun 24 12:10:43 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Fri, 24 Jun 2011 12:10:43 -0400 Subject: [SciPy-User] scipy.stats.mstats.linregress bug? In-Reply-To: References: <4ea2bb8b28bbaa446231caf1f8a8e1e5.squirrel@srv2.hilboll.net> <4d19147bc9c9d65a0710ae0dcb19b634.squirrel@srv2.hilboll.net> Message-ID: On Fri, Jun 24, 2011 at 11:59 AM, wrote: > On Fri, Jun 24, 2011 at 12:02 PM, Andreas wrote: >>>>> try to rescale, take away the e15, small numerical differences are >>>>> possible because of the different way the results are calculated. >>>>> There might still be a difference in the definition of the returns, >>>>> but I haven't checked recently. >>>> >>>> Rescaling doesn't change a thing (see below). And, we're not talking >>>> about >>>> small numerical differences here. The problem is the last return value, >>>> stderr. It differs by almost a factor 15! >>>> >>>> Cheers, >>>> Andreas. >>>> >>>> In [15]: scipy.stats.linregress(x,data/1E15) >>>> Out[15]: >>>> (0.14916317817857139, >>>> ?4.8326781674166659, >>>> ?0.53093100793359616, >>>> ?0.041709303490157057, >>>> ?0.066031024254034967) >>>> >>>> In [16]: scipy.stats.mstats.linregress(x,data/1E15) >>>> Out[16]: >>>> (0.14916317817857139, >>>> ?4.8326781674166659, >>>> ?0.53093100793359627, >>>> ?masked_array(data = 0.0417093034902, >>>> ? ? ? ? ? ? mask = False, >>>> ? ? ? fill_value = 1e+20) >>>> , >>>> ?1.0286155756515489) >>>> >>>> >>> >>> ma linregress >>> sterrest = ma.sqrt(1.-r*r) * y.std() >>> >>> linregress >>> sterrest = np.sqrt((1-r*r)*ssym / ssxm / df) >> >> So, why is it treated differently in the two functions that everyone would >> expect to behave identically? What's the mathematical background. What's >> ssym, ssxm, df? >> >> And: Which one is a better estimate? (In my case, the stats.linregress one >> seems to be a lot more reasonable ...) > > stats.stats reports the stderror of the estimate of the slope parameter b > stats.mstats reports the stderror of the regression error/residual) y - (a + bx) > It's a biased estimate in mstats as well by the look of it? > stats.stats got changed by accident, and mstats didn't follow. > Either way, the docs need to be fixed at the least. Skipper From josef.pktd at gmail.com Fri Jun 24 12:28:44 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 24 Jun 2011 12:28:44 -0400 Subject: [SciPy-User] scipy.stats.mstats.linregress bug? In-Reply-To: References: <4ea2bb8b28bbaa446231caf1f8a8e1e5.squirrel@srv2.hilboll.net> <4d19147bc9c9d65a0710ae0dcb19b634.squirrel@srv2.hilboll.net> Message-ID: On Fri, Jun 24, 2011 at 12:10 PM, Skipper Seabold wrote: > On Fri, Jun 24, 2011 at 11:59 AM, ? wrote: >> On Fri, Jun 24, 2011 at 12:02 PM, Andreas wrote: >>>>>> try to rescale, take away the e15, small numerical differences are >>>>>> possible because of the different way the results are calculated. >>>>>> There might still be a difference in the definition of the returns, >>>>>> but I haven't checked recently. >>>>> >>>>> Rescaling doesn't change a thing (see below). And, we're not talking >>>>> about >>>>> small numerical differences here. The problem is the last return value, >>>>> stderr. It differs by almost a factor 15! >>>>> >>>>> Cheers, >>>>> Andreas. 
>>>>> >>>>> In [15]: scipy.stats.linregress(x,data/1E15) >>>>> Out[15]: >>>>> (0.14916317817857139, >>>>> ?4.8326781674166659, >>>>> ?0.53093100793359616, >>>>> ?0.041709303490157057, >>>>> ?0.066031024254034967) >>>>> >>>>> In [16]: scipy.stats.mstats.linregress(x,data/1E15) >>>>> Out[16]: >>>>> (0.14916317817857139, >>>>> ?4.8326781674166659, >>>>> ?0.53093100793359627, >>>>> ?masked_array(data = 0.0417093034902, >>>>> ? ? ? ? ? ? mask = False, >>>>> ? ? ? fill_value = 1e+20) >>>>> , >>>>> ?1.0286155756515489) >>>>> >>>>> >>>> >>>> ma linregress >>>> sterrest = ma.sqrt(1.-r*r) * y.std() >>>> >>>> linregress >>>> sterrest = np.sqrt((1-r*r)*ssym / ssxm / df) >>> >>> So, why is it treated differently in the two functions that everyone would >>> expect to behave identically? What's the mathematical background. What's >>> ssym, ssxm, df? >>> >>> And: Which one is a better estimate? (In my case, the stats.linregress one >>> seems to be a lot more reasonable ...) >> >> stats.stats reports the stderror of the estimate of the slope parameter b >> stats.mstats reports the stderror of the regression error/residual) y - (a + bx) >> > > It's a biased estimate in mstats as well by the look of it? > >> stats.stats got changed by accident, and mstats didn't follow. >> > > Either way, the docs need to be fixed at the least. It might be in last years stats sprint. Do you know what happened to the repo, I lost sight of it? Josef > > Skipper > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From jsseabold at gmail.com Fri Jun 24 12:32:43 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Fri, 24 Jun 2011 12:32:43 -0400 Subject: [SciPy-User] scipy.stats.mstats.linregress bug? In-Reply-To: References: <4ea2bb8b28bbaa446231caf1f8a8e1e5.squirrel@srv2.hilboll.net> <4d19147bc9c9d65a0710ae0dcb19b634.squirrel@srv2.hilboll.net> Message-ID: On Fri, Jun 24, 2011 at 12:28 PM, wrote: > On Fri, Jun 24, 2011 at 12:10 PM, Skipper Seabold wrote: >> On Fri, Jun 24, 2011 at 11:59 AM, ? wrote: >>> On Fri, Jun 24, 2011 at 12:02 PM, Andreas wrote: >>>>>>> try to rescale, take away the e15, small numerical differences are >>>>>>> possible because of the different way the results are calculated. >>>>>>> There might still be a difference in the definition of the returns, >>>>>>> but I haven't checked recently. >>>>>> >>>>>> Rescaling doesn't change a thing (see below). And, we're not talking >>>>>> about >>>>>> small numerical differences here. The problem is the last return value, >>>>>> stderr. It differs by almost a factor 15! >>>>>> >>>>>> Cheers, >>>>>> Andreas. >>>>>> >>>>>> In [15]: scipy.stats.linregress(x,data/1E15) >>>>>> Out[15]: >>>>>> (0.14916317817857139, >>>>>> ?4.8326781674166659, >>>>>> ?0.53093100793359616, >>>>>> ?0.041709303490157057, >>>>>> ?0.066031024254034967) >>>>>> >>>>>> In [16]: scipy.stats.mstats.linregress(x,data/1E15) >>>>>> Out[16]: >>>>>> (0.14916317817857139, >>>>>> ?4.8326781674166659, >>>>>> ?0.53093100793359627, >>>>>> ?masked_array(data = 0.0417093034902, >>>>>> ? ? ? ? ? ? mask = False, >>>>>> ? ? ? fill_value = 1e+20) >>>>>> , >>>>>> ?1.0286155756515489) >>>>>> >>>>>> >>>>> >>>>> ma linregress >>>>> sterrest = ma.sqrt(1.-r*r) * y.std() >>>>> >>>>> linregress >>>>> sterrest = np.sqrt((1-r*r)*ssym / ssxm / df) >>>> >>>> So, why is it treated differently in the two functions that everyone would >>>> expect to behave identically? What's the mathematical background. 
What's >>>> ssym, ssxm, df? >>>> >>>> And: Which one is a better estimate? (In my case, the stats.linregress one >>>> seems to be a lot more reasonable ...) >>> >>> stats.stats reports the stderror of the estimate of the slope parameter b >>> stats.mstats reports the stderror of the regression error/residual) y - (a + bx) >>> >> >> It's a biased estimate in mstats as well by the look of it? >> >>> stats.stats got changed by accident, and mstats didn't follow. >>> >> >> Either way, the docs need to be fixed at the least. > > It might be in last years stats sprint. Do you know what happened to > the repo, I lost sight of it? > Oh, yeah. Hmm, I don't know whose github account it was under, but I have a local repo on an external somewhere that I can look for. Skipper From richsharp at stanford.edu Fri Jun 24 14:58:25 2011 From: richsharp at stanford.edu (Richard Sharp) Date: Fri, 24 Jun 2011 11:58:25 -0700 Subject: [SciPy-User] iterative matrix methods seem slow Message-ID: Thanks Pauli, > I don't see you passing in a preconditioner here -- it goes in via > the M= parameter of gmres. Right, I had one in there to begin with, but removed it later since it seemed to slow down the convergence. > On preconditioners: If you want "automatic" preconditioners, you can > try the following: > > http://docs.scipy.org/doc/scipy/reference/sparse.linalg.html#scipy.sparse.linalg.spilu Thanks for this. Using the incomplete as follows LU I was able to cut the iterative runtimes from 1700s to 30s P = scipy.sparse.linalg.spilu(matrix, drop_tol=1e-5) M_x = lambda x: P.solve(x) M = scipy.sparse.linalg.LinearOperator((n * m, n * m), M_x) result = scipy.sparse.linalg.lgmres(matrix, b, tol=1e-4, M=M)[0] But the spilu factors in about 3s, solves in 0.1s and seems to give a result that's very close to my continuous solution, so I'm using that now with no memory or runtime problems: P = scipy.sparse.linalg.spilu(matrix, drop_tol=1e-5) result = P.solve(b) Thanks for the guidance and help! Rich From e.antero.tammi at gmail.com Fri Jun 24 15:48:55 2011 From: e.antero.tammi at gmail.com (eat) Date: Fri, 24 Jun 2011 22:48:55 +0300 Subject: [SciPy-User] iterative matrix methods seem slow In-Reply-To: References: Message-ID: Hi, On Fri, Jun 24, 2011 at 9:58 PM, Richard Sharp wrote: > Thanks Pauli, > > > I don't see you passing in a preconditioner here -- it goes in via > > the M= parameter of gmres. > > Right, I had one in there to begin with, but removed it later since it > seemed to slow down the convergence. > > > On preconditioners: If you want "automatic" preconditioners, you can > > try the following: > > > > > http://docs.scipy.org/doc/scipy/reference/sparse.linalg.html#scipy.sparse.linalg.spilu > > Thanks for this. Using the incomplete as follows LU I was able to cut > the iterative runtimes from 1700s to 30s > > P = scipy.sparse.linalg.spilu(matrix, drop_tol=1e-5) > M_x = lambda x: P.solve(x) > M = scipy.sparse.linalg.LinearOperator((n * m, n * m), M_x) > result = scipy.sparse.linalg.lgmres(matrix, b, tol=1e-4, M=M)[0] > > But the spilu factors in about 3s, solves in 0.1s and seems to give a > result that's very close to my continuous solution, so I'm using that > now with no memory or runtime problems: > > P = scipy.sparse.linalg.spilu(matrix, drop_tol=1e-5) > result = P.solve(b) > AFAIU, this is quite fast indeed, but inspecting visually the result; it really doesn't seem to agree with your original solution, calculated with the slight increased cell size! (Of'course I don't know which one would be the correct one). 
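A quick numerical check can replace the visual comparison at this point. A minimal sketch, where result_spsolve and result_spilu are illustrative names for the two candidate solution vectors produced by the script above:

    import numpy as np

    diff = np.abs(result_spsolve - result_spilu).max()
    scale = np.abs(result_spsolve).max()
    print 'max abs difference:', diff, ' relative:', diff / scale

A small relative difference means the two solutions agree to within the spilu drop tolerance; a large one means at least one of them is off.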
BTW, did you make any experiments with my suggestion of adding permc_spec= 'MMD_AT_PLUS_A' to the spsolve(.)? Would it be anyway comparable in performance sense? Regards, eat > Rich > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tmp50 at ukr.net Fri Jun 24 15:55:09 2011 From: tmp50 at ukr.net (Dmitrey) Date: Fri, 24 Jun 2011 22:55:09 +0300 Subject: [SciPy-User] [ANN] Numerical integration with guaranteed precision by interalg Message-ID: Hi all, some ideas implemented in the solver interalg (INTERval ALGorithm) that already turn out to be more effective than its competitors in numerical optimization (benchmark) appears to be extremely effective in numerical integration with guaranteed precision. Here are some examples where interalg works perfectly while scipy.integrate solvers fail to solve the problems and lie about obtained residual: * 1-D (vs scipy.integrate quad) > * 2-D (vs scipy.integrate dblquad) > * 3-D (vs scipy.integrate tplquad) > see http://openopt.org/IP for more details. Regards, D. -------------- next part -------------- An HTML attachment was scrubbed... URL: From e.antero.tammi at gmail.com Fri Jun 24 16:44:16 2011 From: e.antero.tammi at gmail.com (eat) Date: Fri, 24 Jun 2011 23:44:16 +0300 Subject: [SciPy-User] iterative matrix methods seem slow In-Reply-To: References: Message-ID: Just to be really specific, please do compare visually the attachments: where a) is spsolve based and b) is spilu based. Which one is the 'correct' one? Thanks, eat On Fri, Jun 24, 2011 at 10:48 PM, eat wrote: > Hi, > > On Fri, Jun 24, 2011 at 9:58 PM, Richard Sharp wrote: > >> Thanks Pauli, >> >> > I don't see you passing in a preconditioner here -- it goes in via >> > the M= parameter of gmres. >> >> Right, I had one in there to begin with, but removed it later since it >> seemed to slow down the convergence. >> >> > On preconditioners: If you want "automatic" preconditioners, you can >> > try the following: >> > >> > >> http://docs.scipy.org/doc/scipy/reference/sparse.linalg.html#scipy.sparse.linalg.spilu >> >> Thanks for this. Using the incomplete as follows LU I was able to cut >> the iterative runtimes from 1700s to 30s >> >> P = scipy.sparse.linalg.spilu(matrix, drop_tol=1e-5) >> M_x = lambda x: P.solve(x) >> M = scipy.sparse.linalg.LinearOperator((n * m, n * m), M_x) >> result = scipy.sparse.linalg.lgmres(matrix, b, tol=1e-4, M=M)[0] >> >> But the spilu factors in about 3s, solves in 0.1s and seems to give a >> result that's very close to my continuous solution, so I'm using that >> now with no memory or runtime problems: >> >> P = scipy.sparse.linalg.spilu(matrix, drop_tol=1e-5) >> result = P.solve(b) >> > AFAIU, this is quite fast indeed, but inspecting visually the result; it > really doesn't seem to agree with your original solution, calculated with > the slight increased cell size! (Of'course I don't know which one would be > the correct one). > > BTW, did you make any experiments with my suggestion of adding permc_spec= > 'MMD_AT_PLUS_A' to the spsolve(.)? Would it be anyway comparable in > performance sense? > > > Regards, > eat > >> Rich >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: case1_a.png Type: image/png Size: 25766 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: case1_b.png Type: image/png Size: 19260 bytes Desc: not available URL: From pav at iki.fi Fri Jun 24 18:02:15 2011 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 24 Jun 2011 22:02:15 +0000 (UTC) Subject: [SciPy-User] iterative matrix methods seem slow References: Message-ID: On Fri, 24 Jun 2011 11:58:25 -0700, Richard Sharp wrote: [clip] > Thanks for this. Using the incomplete as follows LU I was able to cut > the iterative runtimes from 1700s to 30s > > P = scipy.sparse.linalg.spilu(matrix, drop_tol=1e-5) > M_x = lambda x: P.solve(x) > M = scipy.sparse.linalg.LinearOperator((n * m, n * m), M_x) > result = scipy.sparse.linalg.lgmres(matrix, b, tol=1e-4, M=M)[0] [clip] BTW, PyAMG wipes the floor with the competition :) import pyamg ml = pyamg.smoothed_aggregation_solver(matrix) M = ml.aspreconditioner() result, info = scipy.sparse.linalg.gmres(A, b, M=M, tol=1e-12) finds the solution in ~ 10s. That it works so well is probably because the problem seems to be some sort of a diffusion problem. Pauli From gael.varoquaux at normalesup.org Fri Jun 24 18:49:27 2011 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sat, 25 Jun 2011 00:49:27 +0200 Subject: [SciPy-User] tool for running simulations In-Reply-To: References: <20110619204747.GC19338@phare.normalesup.org> <20110620081211.GA32343@phare.normalesup.org> Message-ID: <20110624224926.GC8761@phare.normalesup.org> On Fri, Jun 24, 2011 at 01:22:53AM +0200, Dan Goodman wrote: > >> * Can it be used on multiple computers? > > If you have an NFS share between the computers, yes. The code works OK in > > parallel. You will have race conditions, but it captures them, and falls > > back on its feets. > Ah nice, how does it do that? Breaking up the storage in many different files, located in directories, and enforcing consistency only localy. > >> * Can you browse the generated data easily? > > No. This is something that could/should be improved (want to organize a > > sprint in Paris, if you still are in Paris?). > If we used HDF5 as the backend then you'd get this for free, so maybe > that's the better way? I wouldn't call it a 'better way'. It is a different way that brings in some good things. I do not think that HDF5 can be as robust as my current implementation to crashes. I addition, it enforces a bug depency. I want joblib to work with no dependencies on Python. It can have optional dependencies though. > I'd possibly be interested in doing a sprint, I'm in Paris until the > end of July but I'm finishing up here so I'll probably have quite a lot > of things to finish. OK. I am abroad till the end of July :). G From zachary.pincus at yale.edu Fri Jun 24 22:23:48 2011 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Fri, 24 Jun 2011 19:23:48 -0700 Subject: [SciPy-User] Interpolation from a regular grid to a not regular one In-Reply-To: References: Message-ID: <3A4A5395-9BF4-4308-8A63-E8A7CBCEEFF7@yale.edu> > lats, longs = latitudes and longitudes of a regular grid (shape is (161,201)) > vals = corresponding values (shape = (161,201)) > > I've got two more 2d numpy arrays which are latitudes and longitudes of a non regular grid (shapes are (1900,2400)) scipy.ndimage.map_coordinates() A little confusing, very useful. 
Sorry I've got no time to provide an example, but it should be what you want. On Jun 24, 2011, at 8:02 AM, Domenico Nappo wrote: > Hi there, > hope you can help me. > > I'm new to SciPy and I'm not aware of all its nice features. > I need some indications about how to complete the following task...just giving me some suggestions about which package/methods to use could be enough. > > I have three 2d numpy arrays representing the followings: > > lats, longs = latitudes and longitudes of a regular grid (shape is (161,201)) > vals = corresponding values (shape = (161,201)) > > I've got two more 2d numpy arrays which are latitudes and longitudes of a non regular grid (shapes are (1900,2400)) > > Now, I've got to produce the grid of values for the non regular grid, using interpolation (probably nearest neighbour). > I've come out with something using griddata from the matplotlib.mlab module but I'm not sure it's the right way and I don't know how to test the interpolated results... > > Many thanks in advance. > > -- > dome > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From emmanuelle.gouillart at normalesup.org Sat Jun 25 04:24:02 2011 From: emmanuelle.gouillart at normalesup.org (Emmanuelle Gouillart) Date: Sat, 25 Jun 2011 10:24:02 +0200 Subject: [SciPy-User] [ANN] Euroscipy 2011 - registration now open Message-ID: <20110625082402.GA6260@phare.normalesup.org> Dear all, After some delay due to technical problems, registration for Euroscipy 2011 is now open! Please go to http://www.euroscipy.org/conference/euroscipy2011, login to your account if you have one, or create a new account (right side of the upper banner of the Euroscipy webpage), then click on the registration panel on the sidebar on the left. Choose the items (tutorials and/or conference) that you wish to attend to, and proceed to pay for your items by credit card. Early-bird registration fees are as follows: * for academia, and self-employed persons: 50 euros for the tutorials (two days), and 50 euros for the conference (two days). * for corporate participants: 100 euros for tutorials (two days), and 100 euros for the conference (two days). For all days, lunch will be catered for and is included in the fee. Early-bird fees apply until July 24th; prices will be doubled for late registration. Book early to take advantage of the early-bird prices! As there is a limited number of seats in the lecture halls, registrations shall be accepted on a first-come first-serve basis. All questions regarding registration should be addressed exclusively to org-team at lists.euroscipy.org Looking forward to seeing you at Euroscipy! The organizers From e.antero.tammi at gmail.com Sat Jun 25 06:04:46 2011 From: e.antero.tammi at gmail.com (eat) Date: Sat, 25 Jun 2011 13:04:46 +0300 Subject: [SciPy-User] iterative matrix methods seem slow In-Reply-To: References: Message-ID: Hi, On Sat, Jun 25, 2011 at 1:02 AM, Pauli Virtanen wrote: > On Fri, 24 Jun 2011 11:58:25 -0700, Richard Sharp wrote: > [clip] > > Thanks for this. 
Using the incomplete as follows LU I was able to cut > > the iterative runtimes from 1700s to 30s > > > > P = scipy.sparse.linalg.spilu(matrix, drop_tol=1e-5) > > M_x = lambda x: P.solve(x) > > M = scipy.sparse.linalg.LinearOperator((n * m, n * m), M_x) > > result = scipy.sparse.linalg.lgmres(matrix, b, tol=1e-4, M=M)[0] > [clip] > > BTW, PyAMG wipes the floor with the competition :) > > import pyamg > > ml = pyamg.smoothed_aggregation_solver(matrix) > M = ml.aspreconditioner() > > result, info = scipy.sparse.linalg.gmres(A, b, M=M, tol=1e-12) > > finds the solution in ~ 10s. > Interesting. But shouldn't A actually be matrix. Anyway it will produce similar looking results than original, but: elements: 640000 initialize ... (0.00619422300879s elapsed) building system A... (5.17429177451s elapsed) building sparse matrix ... (0.65811520738s elapsed) doing 1d solution to prime iterative solution ... (4.67539582545s elapsed) solving ... Implicit conversion of A to CSR in pyamg.smoothed_aggregation_solver C:\Python27\lib\site-packages\pyamg\util\linalg.py:233: ComplexWarning: Casting complex values to real discards the imaginary part H[i,j] = numpy.dot(numpy.conjugate(numpy.ravel(v)), numpy.ravel(w)) (17.3736393173s elapsed) generate mesh prepare result plot plot result Any ideas where those complex values emerged? Thanks, eat > > That it works so well is probably because the problem seems to be > some sort of a diffusion problem. > > Pauli > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: case1_c.png Type: image/png Size: 25766 bytes Desc: not available URL: From j.reid at mail.cryst.bbk.ac.uk Sat Jun 25 06:13:09 2011 From: j.reid at mail.cryst.bbk.ac.uk (John Reid) Date: Sat, 25 Jun 2011 11:13:09 +0100 Subject: [SciPy-User] How to fit parameters of beta distribution? In-Reply-To: References: <10F8E46B-A50A-4639-91FB-C58D8F023978@googlemail.com> Message-ID: On 24/06/11 14:32, josef.pktd at gmail.com wrote: > On Fri, Jun 24, 2011 at 9:09 AM, John Reid wrote: >> >> >> On 24/06/11 13:58, josef.pktd at gmail.com wrote: >>> On Fri, Jun 24, 2011 at 8:37 AM, John Reid wrote: >>>> Thanks for the information. Just out of interest, this is what I get on >>>> scipy 0.7 (no warnings) >>>> >>>> In [1]: import scipy.stats >>>> >>>> In [2]: scipy.stats.beta.fit([.5]) >>>> Out[2]: >>>> array([ 1.87795851e+00, 1.81444871e-01, 2.39026963e-04, >>>> 4.99760973e-01]) >>>> >>>> In [3]: scipy.__version__ >>>> Out[3]: '0.7.0' >>>> >>>> Also I have (following your advice): >>>> >>>> In [7]: scipy.stats.beta.fit([.5], floc=0., fscale=1.) >>>> Out[7]: >>>> array([ 1.87795851e+00, 1.81444871e-01, 2.39026963e-04, >>>> 4.99760973e-01]) >>>> >>>> which just seems wrong, surely the loc and scale in the output should be >>>> what I specified in the arguments? In any case from your example, it >>>> seems like it is fixed in 0.9 >>> >>> floc an fscale where added in scipy 0.9, extra keywords on 0.7 were just ignored >> >> OK >> >>> >>>> >>>> I'm assuming fit() does a ML estimate of the parameters which I think is >>>> fine to do for a beta distribution and one data point. >>> >>> You need at least as many observations as parameters, and without >>> enough observations the estimate will be very noisy. 
With fewer >>> observations than parameters, you cannot identify the parameters. >> >> I'm not quite sure what you mean by "identify". It is a ML estimate >> isn't it? That seems legitimate here but it wasn't really my original >> question. I was just using [.5] as an example. > > simplest example: fit a linear regression line through one point. > There are an infinite number of solutions, that all fit the point > exactly. So we cannot estimate constant and slope, but if we fix one, > we can estimate the other parameter. Agreed, although a linear regression is not a beta distribution. > > Or, in Christoph's example below you just get a mass point, degenerate > solution, in other cases the Hessian will be singular. > I agree that a ML estimate of a Gaussian's variance makes little sense from one data point. In the case of a beta distribution, the ML estimate is more useful. I would prefer a Bayesian approach with a prior and full posterior but that could lead to another debate. But anyway I'm not trying to estimate the parameters from one data point, it was just an example. John. From dg.gmane at thesamovar.net Sat Jun 25 06:48:18 2011 From: dg.gmane at thesamovar.net (Dan Goodman) Date: Sat, 25 Jun 2011 12:48:18 +0200 Subject: [SciPy-User] tool for running simulations In-Reply-To: <20110624224926.GC8761@phare.normalesup.org> References: <20110619204747.GC19338@phare.normalesup.org> <20110620081211.GA32343@phare.normalesup.org> <20110624224926.GC8761@phare.normalesup.org> Message-ID: On 25/06/2011 00:49, Gael Varoquaux wrote: > On Fri, Jun 24, 2011 at 01:22:53AM +0200, Dan Goodman wrote: >>>> * Can it be used on multiple computers? > >>> If you have an NFS share between the computers, yes. The code works OK in >>> parallel. You will have race conditions, but it captures them, and falls >>> back on its feets. > >> Ah nice, how does it do that? > > Breaking up the storage in many different files, located in directories, > and enforcing consistency only localy. Ah yes, so I'm using a similar system, and indeed it won't extend well to working with HDF5 as the backend. >>>> * Can you browse the generated data easily? > >>> No. This is something that could/should be improved (want to organize a >>> sprint in Paris, if you still are in Paris?). > >> If we used HDF5 as the backend then you'd get this for free, so maybe >> that's the better way? > > I wouldn't call it a 'better way'. It is a different way that brings in > some good things. I do not think that HDF5 can be as robust as my current > implementation to crashes. I addition, it enforces a bug depency. I want > joblib to work with no dependencies on Python. It can have optional > dependencies though. Good points. >> I'd possibly be interested in doing a sprint, I'm in Paris until the >> end of July but I'm finishing up here so I'll probably have quite a lot >> of things to finish. > > OK. I am abroad till the end of July :). Ah OK, so maybe some time after September then? Unless you'll be in London in August? :) Dan From davide.lasagna at polito.it Sat Jun 25 07:19:29 2011 From: davide.lasagna at polito.it (Davide) Date: Sat, 25 Jun 2011 13:19:29 +0200 Subject: [SciPy-User] Interpolation from a regular grid to a not regular one In-Reply-To: <3A4A5395-9BF4-4308-8A63-E8A7CBCEEFF7@yale.edu> References: <3A4A5395-9BF4-4308-8A63-E8A7CBCEEFF7@yale.edu> Message-ID: <4E05C441.3060306@polito.it> Ciao Domenico, Here is some code i wrote to wrap scipy.ndimage.mapcoordinates. def get_profile( x, y, f, xi, yi, order=3): """Interpolate regular data. 
Parameters ---------- x : two dimensional np.ndarray an array for the :math:`x` coordinates y : two dimensional np.ndarray an array for the :math:`y` coordinates f : two dimensional np.ndarray an array with the value of the function to be interpolated at :math:`x,y` coordinates. xi : one dimension np.ndarray the :math:`x` coordinates of the point where we want the function to be interpolated. yi : one dimension np.ndarray the :math:`y` coordinates of the point where we want the function to be interpolated. order : int the order of the bivariate spline interpolation Returns ------- fi : one dimension np.ndarray the value of the interpolating spline at :math:`xi,yi` """ conditions = [ xi.min() < x.min(), xi.max() > x.max(), yi.min() < y.min(), yi.max() > y.max() ] if True in conditions: print "Warning, extrapolation in being done!!" dx = x[0,1] - x[0,0] dy = y[1,0] - y[0,0] jvals = (xi - x[0,0]) / dx ivals = (yi - y[0,0]) / dy coords = np.array([ivals, jvals]) return scipy.ndimage.map_coordinates(f, coords, mode='nearest', order=order) Buona domenica, Davide On 06/25/2011 04:23 AM, Zachary Pincus wrote: >> lats, longs = latitudes and longitudes of a regular grid (shape is (161,201)) >> vals = corresponding values (shape = (161,201)) >> >> I've got two more 2d numpy arrays which are latitudes and longitudes of a non regular grid (shapes are (1900,2400)) > scipy.ndimage.map_coordinates() > > A little confusing, very useful. Sorry I've got no time to provide an example, but it should be what you want. > > > > On Jun 24, 2011, at 8:02 AM, Domenico Nappo wrote: > >> Hi there, >> hope you can help me. >> >> I'm new to SciPy and I'm not aware of all its nice features. >> I need some indications about how to complete the following task...just giving me some suggestions about which package/methods to use could be enough. >> >> I have three 2d numpy arrays representing the followings: >> >> lats, longs = latitudes and longitudes of a regular grid (shape is (161,201)) >> vals = corresponding values (shape = (161,201)) >> >> I've got two more 2d numpy arrays which are latitudes and longitudes of a non regular grid (shapes are (1900,2400)) >> >> Now, I've got to produce the grid of values for the non regular grid, using interpolation (probably nearest neighbour). >> I've come out with something using griddata from the matplotlib.mlab module but I'm not sure it's the right way and I don't know how to test the interpolated results... >> >> Many thanks in advance. >> >> -- >> dome >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From pav at iki.fi Sat Jun 25 08:19:09 2011 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 25 Jun 2011 12:19:09 +0000 (UTC) Subject: [SciPy-User] iterative matrix methods seem slow References: Message-ID: On Sat, 25 Jun 2011 13:04:46 +0300, eat wrote: [clip] > Anyway it will produce similar looking results than original You can also look at the norm of the residual to check the solution, no need to rely on visual inspection. The error goes down to machine precision. > C:\Python27\lib\site-packages\pyamg\util\linalg.py:233: ComplexWarning: > Casting > complex values to real discards the imaginary part > H[i,j] = numpy.dot(numpy.conjugate(numpy.ravel(v)), numpy.ravel(w)) No idea. PyAMG's a black box to me. 
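For the residual check mentioned above, a minimal sketch (assuming matrix, b and a candidate solution x are the arrays from the earlier script; the names are only illustrative):

    import numpy as np

    # matrix is a scipy.sparse matrix, x and b are dense 1-d arrays
    resid = np.linalg.norm(matrix * x - b) / np.linalg.norm(b)
    print 'relative residual:', resid

A relative residual close to machine precision indicates that x really does solve the system, independently of how the plots look.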
From almar.klein at gmail.com Sun Jun 26 18:18:11 2011 From: almar.klein at gmail.com (Almar Klein) Date: Mon, 27 Jun 2011 00:18:11 +0200 Subject: [SciPy-User] ANN: Visvis version 1.5 - The object oriented approach to visualization In-Reply-To: References: Message-ID: Hi all, Due to an error in the setup script, the windows installer did not work correctly. And I suppose easy install will fail to install visvis correctly as well. I apologise to anyone who wasted their time trying to get it installed. Anyway, fixed versions are available for download now. Regards, Almar On 21 June 2011 15:40, Almar Klein wrote: > Hi all, > > On behalf of the visvis development team, I'm pleased to announce the > latest release of visvis! We have a new backend, we've done improvements to > the Mesh class, we've done a lot of work on the cameras, and we've got a fun > flight-sim style camera. > And much more... > > website: http://code.google.com/p/visvis/ > Discussion group: http://groups.google.com/group/visvis/ > Documentation: http://code.google.com/p/visvis/wiki/Visvis_basics > Release notes: http://code.google.com/p/visvis/wiki/releaseNotes > > What is visvis? > --------------- > Visvis is a pure Python library for visualization of 1D to 4D data in an > object oriented way. Essentially, visvis is an object oriented layer of > Python on top of OpenGl, thereby combining the power of OpenGl with the > usability of Python. A Matlab-like interface in the form of a set of > functions allows easy creation of objects (e.g. plot(), imshow(), volshow(), > surf()). > > > Visvis with Reinteract > ---------------------- > Robert Schroll has been working to enable using visvis in interact:http://www.reinteract.org/trac/. > See this discussion: > http://groups.google.com/group/visvis/browse_thread/thread/bfe129a265453140 > > > Most notable changes > -------------------- > * Visvis now also has a GTK backend. > * The cameras are now more explicitly exposed to the user, making it > easier for the user to set a new camera, for example to use a single camera > for multiple axes. > * Reimplemented the FlyCamera so it is much easier to control. Some > gaming experience will still help though :) see the meshes examplefor a movie. > * The 3D camera now also has a perspective view. Use shift+RMB to > interactively change the field of view. > * A mesh() convenience funcion was added. The signature of the Mesh class > was changed to make it more intuitive. The old signature if still supported > but may be removed in future versions. > * Visvis now has a settings object, which can be used to change > user-specific defaults, such as the preferred backend and the size of new > figures. > * 3D color data can now be rendered. > * Implemented volshow2(), which displays a volume using three 2D slices, > which can be moved interactively through the volume. Visvis automatically > falls back to this way of visualization if 3D volume rendering is not > possible on the client hardware. > > (see release notes for > a more detailed list) > > > Regards, > Almar > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From david_baddeley at yahoo.com.au Sun Jun 26 19:53:43 2011 From: david_baddeley at yahoo.com.au (David Baddeley) Date: Sun, 26 Jun 2011 16:53:43 -0700 (PDT) Subject: [SciPy-User] tool for running simulations In-Reply-To: References: <20110619204747.GC19338@phare.normalesup.org> <20110620081211.GA32343@phare.normalesup.org> <20110624224926.GC8761@phare.normalesup.org> Message-ID: <753470.10326.qm@web113401.mail.gq1.yahoo.com> I'm also of the opinion that an hdf backend might not solve your problems - with my system I've resorted to using a single server which arbitrates and abstracts reading and writing hdf along with a whole pile of locking to get around race conditions wrt. reading and writing HDF. It's OK if computation >> data flow, but can become problematic if this isn't the case. I'm accessing HDF through pytables and there's also a problem with the windows version of the hdf libraries that are shipped with pytables not being threadsafe - meaning that I have to take out a global lock on the library even when I'm accessing two different files. Am still trying to find the time to polish up a few comments so I can give you the code in a somewhat readable state. cheers, David ----- Original Message ---- From: Dan Goodman To: scipy-user at scipy.org Sent: Sat, 25 June, 2011 10:48:18 PM Subject: Re: [SciPy-User] tool for running simulations On 25/06/2011 00:49, Gael Varoquaux wrote: > On Fri, Jun 24, 2011 at 01:22:53AM +0200, Dan Goodman wrote: >>>> * Can it be used on multiple computers? > >>> If you have an NFS share between the computers, yes. The code works OK in >>> parallel. You will have race conditions, but it captures them, and falls >>> back on its feets. > >> Ah nice, how does it do that? > > Breaking up the storage in many different files, located in directories, > and enforcing consistency only localy. Ah yes, so I'm using a similar system, and indeed it won't extend well to working with HDF5 as the backend. >>>> * Can you browse the generated data easily? > >>> No. This is something that could/should be improved (want to organize a >>> sprint in Paris, if you still are in Paris?). > >> If we used HDF5 as the backend then you'd get this for free, so maybe >> that's the better way? > > I wouldn't call it a 'better way'. It is a different way that brings in > some good things. I do not think that HDF5 can be as robust as my current > implementation to crashes. I addition, it enforces a bug depency. I want > joblib to work with no dependencies on Python. It can have optional > dependencies though. Good points. >> I'd possibly be interested in doing a sprint, I'm in Paris until the >> end of July but I'm finishing up here so I'll probably have quite a lot >> of things to finish. > > OK. I am abroad till the end of July :). Ah OK, so maybe some time after September then? Unless you'll be in London in August? :) Dan _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user From ardiepython at gmail.com Mon Jun 27 09:25:40 2011 From: ardiepython at gmail.com (wan Ardie) Date: Mon, 27 Jun 2011 21:25:40 +0800 Subject: [SciPy-User] pygrads problem Message-ID: Hi, I am Wan Ardie, Im not sure if this the right list to post this question but, I have just downloaded and installed pygrads, with the command: *python setup.py install* and targeted to a local directory. 
I assume install is successful as there is no error, but when i try a few lines as suggested at the link ( http://opengrads.org/wiki/index.php?title=Python_Interface_to_GrADS): *from grads.gacore import GaCore ga = GaCore(Bin='gradsc')* i get a blank GrADS display window and no further chance to enter at the prompt (unless I use ctrl-C to quit the display window) ... I have checked the gradsc is in my path, I have also tested my grads without python, and it worked perfectly... I would really appreciate the help, as this will me and my lab mates (hopefully) -------------- next part -------------- An HTML attachment was scrubbed... URL: From deshpande.jaidev at gmail.com Mon Jun 27 10:05:46 2011 From: deshpande.jaidev at gmail.com (Jaidev Deshpande) Date: Mon, 27 Jun 2011 19:35:46 +0530 Subject: [SciPy-User] Spline interpolation using splrep,splev Message-ID: Hi I'm using scipy.interpolate.splrep and splev to construct cubic splines. These functions are part of the 'fitpack' module, and they raise an error when the number of nodes to be interpolated are less than the order of the spline. For instance, these functions can't produce a cubic spline for less than four nodes. Why does this happen? Surely, there can be two cubical functions between three points that follow the necessary end-point restraints! -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Mon Jun 27 11:02:30 2011 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 27 Jun 2011 15:02:30 +0000 (UTC) Subject: [SciPy-User] Spline interpolation using splrep,splev References: Message-ID: Mon, 27 Jun 2011 19:35:46 +0530, Jaidev Deshpande wrote: > I'm using scipy.interpolate.splrep and splev to construct cubic splines. > These functions are part of the 'fitpack' module, and they raise an > error when the number of nodes to be interpolated are less than the > order of the spline. The short answer is that this is a limitation of the underlying Fortran library: http://www.netlib.org/dierckx/curfit.f One way to find the long answer (I don't know it) is to read the papers referenced there, or Dierckx's book on FITPACK. From pav at iki.fi Mon Jun 27 11:07:07 2011 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 27 Jun 2011 15:07:07 +0000 (UTC) Subject: [SciPy-User] Spline interpolation using splrep,splev References: Message-ID: Mon, 27 Jun 2011 15:02:30 +0000, Pauli Virtanen wrote: [clip] > One way to find the long answer (I don't know it) is to read the papers > referenced there, or Dierckx's book on FITPACK. I should also note that it may well be that there is no real reason to have this restriction, but that it is there just for convenience. From johradinger at googlemail.com Mon Jun 27 06:34:07 2011 From: johradinger at googlemail.com (Johannes Radinger) Date: Mon, 27 Jun 2011 03:34:07 -0700 (PDT) Subject: [SciPy-User] leastsq - output integer flag Message-ID: Hello, i am using leastsq to optimize a function. Beside the fitted output variables also an integer flag (number 1-4) is returned. What do the numbers acutally mean? Is there any simple possiblity to get a measure for the accuracy of the fit/optimization, like the residuals etc.? cheers /johannes From jsseabold at gmail.com Mon Jun 27 13:06:46 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Mon, 27 Jun 2011 13:06:46 -0400 Subject: [SciPy-User] leastsq - output integer flag In-Reply-To: References: Message-ID: On Mon, Jun 27, 2011 at 6:34 AM, Johannes Radinger wrote: > > Hello, > > i am using leastsq to optimize a function. 
Beside the fitted output > variables also an integer flag > (number 1-4) is returned. > What do the numbers acutally mean? > If you give full_output = 1, I believe there will be message along with the optimization flag. > Is there any simple possiblity to get a measure for the accuracy of > the fit/optimization, like the residuals etc.? > If you're interested in more goodness of fit statistics, etc. You might want to check out statsmodels. http://pypi.python.org/pypi/scikits.statsmodels http://statsmodels.sourceforge.net/ (I'm about to update those docs) Skipper From josef.pktd at gmail.com Mon Jun 27 13:39:25 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 27 Jun 2011 13:39:25 -0400 Subject: [SciPy-User] leastsq - output integer flag In-Reply-To: References: Message-ID: On Mon, Jun 27, 2011 at 1:06 PM, Skipper Seabold wrote: > On Mon, Jun 27, 2011 at 6:34 AM, Johannes Radinger > wrote: >> >> Hello, >> >> i am using leastsq to optimize a function. Beside the fitted output >> variables also an integer flag >> (number 1-4) is returned. >> What do the numbers acutally mean? >> > > If you give full_output = 1, I believe there will be message along > with the optimization flag. I don't remember if it's anywhere in the official docs, but the (minpack) source has the full descriptions of the return codes, 1 is successful completion of the optimization,... the highest return codes mean it was not successful in finding an optimum. Looking at the source of scipy.optimize.curve_fit shows what is available as output of leastsq and how to use it. Josef > >> Is there any simple possiblity to get a measure for the accuracy of >> the fit/optimization, like the residuals etc.? >> > > If you're interested in more goodness of fit statistics, etc. You > might want to check out statsmodels. > > http://pypi.python.org/pypi/scikits.statsmodels > http://statsmodels.sourceforge.net/ > > (I'm about to update those docs) > > Skipper > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From ckkart at hoc.net Mon Jun 27 14:06:51 2011 From: ckkart at hoc.net (Christian K.) Date: Mon, 27 Jun 2011 20:06:51 +0200 Subject: [SciPy-User] leastsq - output integer flag In-Reply-To: References: Message-ID: Am 27.06.11 12:34, schrieb Johannes Radinger: > Hello, > > i am using leastsq to optimize a function. Beside the fitted output > variables also an integer flag > (number 1-4) is returned. > What do the numbers acutally mean? > > Is there any simple possiblity to get a measure for the accuracy of > the fit/optimization, like the residuals etc.? Have a look at scipy.odr. It does plain leastsq optimization too and has lots of convenient outputs like standard error of the parameters, covariance matrix, variance of residual, etc. Christian From david_baddeley at yahoo.com.au Mon Jun 27 17:21:34 2011 From: david_baddeley at yahoo.com.au (David Baddeley) Date: Mon, 27 Jun 2011 14:21:34 -0700 (PDT) Subject: [SciPy-User] leastsq - output integer flag In-Reply-To: References: Message-ID: <345914.69970.qm@web113417.mail.gq1.yahoo.com> >From the minpack source: c info is an integer output variable. if the user has c terminated execution, info is set to the (negative) c value of iflag. see description of fcn. otherwise, c info is set as follows. c c info = 0 improper input parameters. c c info = 1 both actual and predicted relative reductions c in the sum of squares are at most ftol. 
c c info = 2 relative error between two consecutive iterates c is at most xtol. c c info = 3 conditions for info = 1 and info = 2 both hold. c c info = 4 the cosine of the angle between fvec and any c column of the jacobian is at most gtol in c absolute value. c c info = 5 number of calls to fcn with iflag = 1 has c reached maxfev. c c info = 6 ftol is too small. no further reduction in c the sum of squares is possible. c c info = 7 xtol is too small. no further improvement in c the approximate solution x is possible. c c info = 8 gtol is too small. fvec is orthogonal to the c columns of the jacobian to machine precision. using full_output=1, you can get covariance matrix (cov_x) in the documentation and use this to estimate errors in each of the parameters by doing something like: res, cov_x, infodict, mesg, resCode = scipy.optimize.leastsq(..., full_output=1) fitErrors = scipy.sqrt(scipy.diag(cov_x)*(infodict['fvec']*infodict['fvec']).sum()/(Nobservations - Nparameters)) cheers, David ----- Original Message ---- From: Johannes Radinger To: scipy-user at scipy.org Sent: Mon, 27 June, 2011 10:34:07 PM Subject: [SciPy-User] leastsq - output integer flag Hello, i am using leastsq to optimize a function. Beside the fitted output variables also an integer flag (number 1-4) is returned. What do the numbers acutally mean? Is there any simple possiblity to get a measure for the accuracy of the fit/optimization, like the residuals etc.? cheers /johannes _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user From wardefar at iro.umontreal.ca Mon Jun 27 19:14:53 2011 From: wardefar at iro.umontreal.ca (David Warde-Farley) Date: Mon, 27 Jun 2011 19:14:53 -0400 Subject: [SciPy-User] [ANN] Theano 0.4.0 released Message-ID: <2BE955B9-2B1E-40AE-B8BD-AF4AFB8CD6D2@iro.umontreal.ca> ========================== Announcing Theano 0.4.0 =========================== This is a major release, with lots of new features, bug fixes, and some interface changes (deprecated or potentially misleading features were removed). The upgrade is recommended for everybody, unless you rely on deprecated features that have been removed. For those using the bleeding edge version in the mercurial repository, we encourage you to update to the `0.4.0` tag. Deleting old cache ------------------ The caching mechanism for compiled C modules has been updated. In some cases, using previously-compiled modules with the new version of Theano can lead to high memory usage and code slow-down. If you experience these symptoms, we encourage you to clear your cache. The easiest way to do that is to execute: theano-cache clear (The theano-cache executable is in Theano/bin.) What's New ---------- [Include the content of NEWS.txt here] Change in output memory storage for Ops: If you implemented custom Ops, with either C or Python implementation, this will concern you. The contract for memory storage of Ops has been changed. In particular, it is no longer guaranteed that output memory buffers are either empty, or allocated by a previous execution of the same Op. Right now, here is the situation: * For Python implementation (perform), what is inside output_storage may have been allocated from outside the perform() function, for instance by another node (e.g., Scan) or the Mode. If that was the case, the memory can be assumed to be C-contiguous (for the moment). * For C implementations (c_code), nothing has changed yet. 
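For the Python side, the practical consequence is that perform() should always assign a fresh result into the storage cell rather than reuse whatever it finds there. A minimal sketch of that pattern (illustrative only, not taken from the release notes):

    import theano
    import theano.tensor

    class Double(theano.Op):
        """Toy Op returning 2*x, to show the output_storage convention."""
        def make_node(self, x):
            x = theano.tensor.as_tensor_variable(x)
            return theano.Apply(self, [x], [x.type()])

        def perform(self, node, inputs, output_storage):
            x, = inputs
            # always store a new array here; the old contents of the cell
            # may have been allocated elsewhere (e.g. by Scan or the Mode)
            output_storage[0][0] = x * 2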
In a future version, the content of the output storage, both for Python and C versions, will either be NULL, or have the following guarantees: * It will be a Python object of the appropriate Type (for a Tensor variable, a numpy.ndarray, for a GPU variable, a CudaNdarray, for instance) * It will have the correct number of dimensions, and correct dtype However, its shape and memory layout (strides) will not be guaranteed. When that change is made, the config flag DebugMode.check_preallocated_output will help you find implementations that are not up-to-date. Deprecation: * tag.shape attribute deprecated (#633) * CudaNdarray_new_null is deprecated in favour of CudaNdarray_New * Dividing integers with / is deprecated: use // for integer division, or cast one of the integers to a float type if you want a float result (you may also change this behavior with config.int_division). * Removed (already deprecated) sandbox/compile module * Removed (already deprecated) incsubtensor and setsubtensor functions, inc_subtensor and set_subtensor are to be used instead. Bugs fixed: * In CudaNdarray.__{iadd,idiv}__, when it is not implemented, return the error. * THEANO_FLAGS='optimizer=None' now works as expected * Fixed memory leak in error handling on GPU-to-host copy * Fix relating specifically to Python 2.7 on Mac OS X * infer_shape can now handle Python longs * Trying to compute x % y with one or more arguments being complex now raises an error. * The output of random samples computed with uniform(..., dtype=...) is guaranteed to be of the specified dtype instead of potentially being of a higher-precision dtype. * The perform() method of DownsampleFactorMax did not give the right result when reusing output storage. This happen only if you use the Theano flags 'linker=c|py_nogc' or manually specify the mode to be 'c|py_nogc'. Crash fixed: * Work around a bug in gcc 4.3.0 that make the compilation of 2d convolution crash. * Some optimizations crashed when the "ShapeOpt" optimization was disabled. Optimization: * Optimize all subtensor followed by subtensor. GPU: * Move to the gpu fused elemwise that have other dtype then float32 in them (except float64) if the input and output are float32. * This allow to move elemwise comparisons to the GPU if we cast it to float32 after that. * Implemented CudaNdarray.ndim to have the same interface in ndarray. * Fixed slowdown caused by multiple chained views on CudaNdarray objects * CudaNdarray_alloc_contiguous changed so as to never try to free memory on a view: new "base" property * Safer decref behaviour in CudaNdarray in case of failed allocations * New GPU implementation of tensor.basic.outer * Multinomial random variates now available on GPU New features: * ProfileMode * profile the scan overhead * simple hook system to add profiler * reordered the output to be in the order of more general to more specific * DebugMode now checks Ops with different patterns of preallocated memory, configured by config.DebugMode.check_preallocated_output. * var[vector of index] now work, (grad work recursively, the direct grad work inplace, gpu work) * limitation: work only of the outer most dimensions. * New way to test the graph as we build it. 
Allow to easily find the source of shape mismatch error: `http://deeplearning.net/software/theano/tutorial/debug_faq.html#interactive-debugger`__ * cuda.root inferred if nvcc is on the path, otherwise defaults to /usr/local/cuda * Better graph printing for graphs involving a scan subgraph * Casting behavior can be controlled through config.cast_policy, new (experimental) mode. * Smarter C module cache, avoiding erroneous usage of the wrong C implementation when some options change, and avoiding recompiling the same module multiple times in some situations. * The "theano-cache clear" command now clears the cache more thoroughly. * More extensive linear algebra ops (CPU only) that wrap scipy.linalg now available in the sandbox. * CUDA devices 4 - 16 should now be available if present. * infer_shape support for the View op, better infer_shape support in Scan * infer_shape supported in all case of subtensor * tensor.grad now gives an error by default when computing the gradient wrt a node that is disconnected from the cost (not in the graph, or no continuous path from that op to the cost). * New tensor.isnan and isinf functions. Documentation: * Better commenting of cuda_ndarray.cu * Fixes in the scan documentation: add missing declarations/print statements * Better error message on failed __getitem__ * Updated documentation on profile mode * Better documentation of testing on Windows * Better documentation of the 'run_individual_tests' script Unit tests: * More strict float comparaison by default * Reuse test for subtensor of tensor for gpu tensor(more gpu test) * Tests that check for aliased function inputs and assure appropriate copying (#374) * Better test of copies in CudaNdarray * New tests relating to the new base pointer requirements * Better scripts to run tests individually or in batches * Some tests are now run whenever cuda is available and not just when it has been enabled before * Tests display less pointless warnings. Other: * Correctly put the broadcast flag to True in the output var of a Reshape op when we receive an int 1 in the new shape. * pydotprint: high contrast mode is now the default, option to print more compact node names. * pydotprint: How trunk label that are too long. * More compact printing (ignore leading "Composite" in op names) Download -------- You can download Theano from http://pypi.python.org/pypi/Theano. Description ----------- Theano is a Python library that allows you to define, optimize, and efficiently evaluate mathematical expressions involving multi-dimensional arrays. It is built on top of NumPy. Theano features: * tight integration with NumPy: a similar interface to NumPy's. numpy.ndarrays are also used internally in Theano-compiled functions. * transparent use of a GPU: perform data-intensive computations up to 140x faster than on a CPU (support for float32 only). * efficient symbolic differentiation: Theano can compute derivatives for functions of one or many inputs. * speed and stability optimizations: avoid nasty bugs when computing expressions such as log(1+ exp(x)) for large values of x. * dynamic C code generation: evaluate expressions faster. * extensive unit-testing and self-verification: includes tools for detecting and diagnosing bugs and/or potential problems. Theano has been powering large-scale computationally intensive scientific research since 2007, but it is also approachable enough to be used in the classroom (IFT6266 at the University of Montreal). 
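As a small illustration of the workflow behind the features listed above (a minimal sketch, not part of the release notes; it uses only the long-standing public API, namely theano.tensor, T.grad and theano.function, together with the log(1 + exp(x)) expression mentioned under the stability optimizations):

    import theano
    import theano.tensor as T

    x = T.dscalar('x')                  # a symbolic double-precision scalar
    y = T.log(1 + T.exp(x))             # the numerically delicate expression noted above
    gy = T.grad(y, x)                   # symbolic derivative of y with respect to x
    f = theano.function([x], [y, gy])   # compile expression and gradient into one callable
    print f(2.0)                        # evaluates both the expression and its gradient
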
Resources --------- About Theano: http://deeplearning.net/software/theano/ About NumPy: http://numpy.scipy.org/ About SciPy: http://www.scipy.org/ Machine Learning Tutorial with Theano on Deep Architectures: http://deeplearning.net/tutorial/ Acknowledgments --------------- I would like to thank all contributors of Theano. For this particular release, many people have helped during the release sprint: (in alphabetical order) Frederic Bastien, James Bergstra, Nicolas Boulanger-Lewandowski, Raul Chandias Ferrari, Olivier Delalleau, Guillaume Desjardins, Philippe Hamel, Pascal Lamblin, Razvan Pascanu and David Warde-Farley. Also, thank you to all NumPy and SciPy developers as Theano builds on its strength. All questions/comments are always welcome on the Theano mailing-lists ( http://deeplearning.net/software/theano/ ) From scott.sinclair.za at gmail.com Tue Jun 28 02:33:19 2011 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Tue, 28 Jun 2011 08:33:19 +0200 Subject: [SciPy-User] pygrads problem In-Reply-To: References: Message-ID: On 27 June 2011 15:25, wan Ardie wrote: > Im not sure if this the right list to post this question > but, I have just downloaded and installed pygrads, with the command: > python setup.py install > and targeted to a local directory. > I assume install is successful as there is no error, but when i try a few > lines as suggested at the link > (http://opengrads.org/wiki/index.php?title=Python_Interface_to_GrADS): > from grads.gacore import GaCore ga = GaCore(Bin='gradsc') i get a blank > GrADS display window and no further chance to enter at the prompt (unless I > use ctrl-C to quit the display window) > ... Hi Wan, You'll probably have more luck asking for help on the Opengrads mailing lists at: https://lists.sourceforge.net/lists/listinfo/opengrads-users or https://lists.sourceforge.net/lists/listinfo/opengrads-devel This is the mailing list for the Scipy project (http://www.scipy.org/) and isn't linked to Opengrads. Cheers, Scott From domenico.nappo at gmail.com Tue Jun 28 03:38:39 2011 From: domenico.nappo at gmail.com (Domenico Nappo) Date: Tue, 28 Jun 2011 09:38:39 +0200 Subject: [SciPy-User] Interpolation from a regular grid to a not regular one In-Reply-To: <4E05C441.3060306@polito.it> References: <3A4A5395-9BF4-4308-8A63-E8A7CBCEEFF7@yale.edu> <4E05C441.3060306@polito.it> Message-ID: Hi all, many thanks for your answers. I will have a deep look to the mapcoordinates...it seems to be what I need. Otherwise I'll come back here:) Have a nice day! -- d 2011/6/25 Davide > Ciao Domenico, > > Here is some code i wrote to wrap scipy.ndimage.mapcoordinates. > > > def get_profile( x, y, f, xi, yi, order=3): > """Interpolate regular data. > > Parameters > ---------- > x : two dimensional np.ndarray > an array for the :math:`x` coordinates > > y : two dimensional np.ndarray > an array for the :math:`y` coordinates > > f : two dimensional np.ndarray > an array with the value of the function to be interpolated > at :math:`x,y` coordinates. > > xi : one dimension np.ndarray > the :math:`x` coordinates of the point where we want > the function to be interpolated. > > yi : one dimension np.ndarray > the :math:`y` coordinates of the point where we want > the function to be interpolated. 
> > order : int > the order of the bivariate spline interpolation > > > Returns > ------- > fi : one dimension np.ndarray > the value of the interpolating spline at :math:`xi,yi` > > > """ > conditions = [ xi.min() < x.min(), > xi.max() > x.max(), > yi.min() < y.min(), > yi.max() > y.max() ] > > if True in conditions: > print "Warning, extrapolation in being done!!" > > dx = x[0,1] - x[0,0] > dy = y[1,0] - y[0,0] > > jvals = (xi - x[0,0]) / dx > ivals = (yi - y[0,0]) / dy > > coords = np.array([ivals, jvals]) > > return scipy.ndimage.map_coordinates(f, coords, mode='nearest', > order=order) > > > Buona domenica, > > > Davide > > On 06/25/2011 04:23 AM, Zachary Pincus wrote: > >> lats, longs = latitudes and longitudes of a regular grid (shape is > (161,201)) > >> vals = corresponding values (shape = (161,201)) > >> > >> I've got two more 2d numpy arrays which are latitudes and longitudes of > a non regular grid (shapes are (1900,2400)) > > scipy.ndimage.map_coordinates() > > > > A little confusing, very useful. Sorry I've got no time to provide an > example, but it should be what you want. > > > > > > > > On Jun 24, 2011, at 8:02 AM, Domenico Nappo wrote: > > > >> Hi there, > >> hope you can help me. > >> > >> I'm new to SciPy and I'm not aware of all its nice features. > >> I need some indications about how to complete the following task...just > giving me some suggestions about which package/methods to use could be > enough. > >> > >> I have three 2d numpy arrays representing the followings: > >> > >> lats, longs = latitudes and longitudes of a regular grid (shape is > (161,201)) > >> vals = corresponding values (shape = (161,201)) > >> > >> I've got two more 2d numpy arrays which are latitudes and longitudes of > a non regular grid (shapes are (1900,2400)) > >> > >> Now, I've got to produce the grid of values for the non regular grid, > using interpolation (probably nearest neighbour). > >> I've come out with something using griddata from the matplotlib.mlab > module but I'm not sure it's the right way and I don't know how to test the > interpolated results... > >> > >> Many thanks in advance. > >> > >> -- > >> dome > >> _______________________________________________ > >> SciPy-User mailing list > >> SciPy-User at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-user > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From denis-bz-gg at t-online.de Tue Jun 28 05:03:39 2011 From: denis-bz-gg at t-online.de (denis) Date: Tue, 28 Jun 2011 02:03:39 -0700 (PDT) Subject: [SciPy-User] Interpolation from a regular grid to a not regular one In-Reply-To: References: <3A4A5395-9BF4-4308-8A63-E8A7CBCEEFF7@yale.edu> <4E05C441.3060306@polito.it> Message-ID: Domenico, folks, there's also a short intro to map_coordinates under http://advice.mechanicalkern.com/question/17/getting-started-with-2d-interpolation-in-scipy -- let me know if it's understandable or not. 
cheers -- denis From jeremy at jeremysanders.net Tue Jun 28 07:08:33 2011 From: jeremy at jeremysanders.net (Jeremy Sanders) Date: Tue, 28 Jun 2011 12:08:33 +0100 Subject: [SciPy-User] ANN: Veusz 1.12 - a python-based GUI/scripted scientific plotting package Message-ID: I am pleased to announce Veusz 1.12, a python-based GUI/command line/scripted plotting package. This release has some new features and a large number of bug fixes (see below). Jeremy Veusz 1.12 ---------- Velvet Ember Under Sky Zenith ----------------------------- http://home.gna.org/veusz/ Copyright (C) 2003-2011 Jeremy Sanders and contributors. Licenced under the GPL (version 2 or greater). Veusz is a Qt4 based scientific plotting package. It is written in Python, using PyQt4 for display and user-interfaces, and numpy for handling the numeric data. Veusz is designed to produce publication-ready Postscript/PDF/SVG output. The user interface aims to be simple, consistent and powerful. Veusz provides a GUI, command line, embedding and scripting interface (based on Python) to its plotting facilities. It also allows for manipulation and editing of datasets. Data can be captured from external sources such as internet sockets or other programs. Changes in 1.12: * Multiple widgets can now be selected for editing properties * Add Edit->Select menu and context menu for above * Added context menu on dataset browser for filenames to reload, delete or unlink all associated datasets * New tree-like dataset browsing widget is shown in data edit dialog * Importing 1D fits images is now supported * Date / time data has its own dataset type * The data edit dialog box can create or edit date/time data in human-readable form Minor improvements: * Add LaTeX commands \cdot, \nabla, \overline plus some arrows * Inform user in exception dialog if a new version is available * Add linevertbar and linehorzbar error bar styles Bug fixes: * Fix crash on filling filled error regions if no error bars * Remove grouping separator to numbers in locale as it creates ambiguous lists of numbers * Undo works properly for boolean and integer settings * Prevent widgets getting the same names when dragging and dropping * Hidden plot widgets are ignored when calculating axis ranges * Combo boxes are now case sensitive when displaying matches with previous text * Fix errors if plotting DatasetRange or Dataset1DPlugin datasets against data with nan values * Fix division by zero in dataset preview * Do not leave settings pointing to deleted widgets after an undo * Fix errors when using super/subscripts of super/subscripts * Fix crash when giving positions of bar plot and labels * Do not allow dataset names to be invalid after remaining * Several EMF format bug fixes, including not showing hidden lines and not connecting points making curves * Stop crash when contouring zero-sized datasets Features of package: * X-Y plots (with errorbars) * Line and function plots * Contour plots * Images (with colour mappings and colorbars) * Stepped plots (for histograms) * Bar graphs * Vector field plots * Box plots * Polar plots * Plotting dates * Fitting functions to data * Stacked plots and arrays of plots * Plot keys * Plot labels * Shapes and arrows on plots * LaTeX-like formatting for text * EPS/PDF/PNG/SVG/EMF export * Scripting interface * Dataset creation/manipulation * Embed Veusz within other programs * Text, CSV, FITS and user-plugin importing * Data can be captured from external sources * User defined functions, constants and can import external Python functions * 
Plugin interface to allow user to write or load code to - import data using new formats - make new datasets, optionally linked to existing datasets - arbitrarily manipulate the document * Data picker Requirements for source install: Python (2.4 or greater required) http://www.python.org/ Qt >= 4.3 (free edition) http://www.trolltech.com/products/qt/ PyQt >= 4.3 (SIP is required to be installed first) http://www.riverbankcomputing.co.uk/pyqt/ http://www.riverbankcomputing.co.uk/sip/ numpy >= 1.0 http://numpy.scipy.org/ Optional: Microsoft Core Fonts (recommended for nice output) http://corefonts.sourceforge.net/ PyFITS >= 1.1 (optional for FITS import) http://www.stsci.edu/resources/software_hardware/pyfits pyemf >= 2.0.0 (optional for EMF export) http://pyemf.sourceforge.net/ PyMinuit >= 1.1.2 (optional improved fitting) http://code.google.com/p/pyminuit/ For EMF and better SVG export, PyQt >= 4.6 or better is required, to fix a bug in the C++ wrapping For documentation on using Veusz, see the "Documents" directory. The manual is in PDF, HTML and text format (generated from docbook). The examples are also useful documentation. Please also see and contribute to the Veusz wiki: http://barmag.net/veusz-wiki/ Issues with the current version: * Some recent versions of PyQt/SIP will causes crashes when exporting SVG files. Update to 4.7.4 (if released) or a recent snapshot to solve this problem. If you enjoy using Veusz, we would love to hear from you. Please join the mailing lists at https://gna.org/mail/?group=veusz to discuss new features or if you'd like to contribute code. The latest code can always be found in the Git repository at https://github.com/jeremysanders/veusz.git. From johradinger at googlemail.com Tue Jun 28 08:01:12 2011 From: johradinger at googlemail.com (Johannes Radinger) Date: Tue, 28 Jun 2011 05:01:12 -0700 (PDT) Subject: [SciPy-User] optimize leastsq Message-ID: <541aa5a9-d0cb-4d15-9d85-f829ba87c641@16g2000yqy.googlegroups.com> Hello, I just wanted to ask you about the output of leassq. Besides the fitted values I also get an integer flag 1-4. http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.leastsq.html what does this flag exactly mean and what are the differences between the numbers? Secondly I want to know the accuracy of the fit? is there any value for e.g the residuals etc. or what is the requested tolerance of that algorithm? thanks Johannes From johradinger at googlemail.com Tue Jun 28 08:03:47 2011 From: johradinger at googlemail.com (Johannes Radinger) Date: Tue, 28 Jun 2011 05:03:47 -0700 (PDT) Subject: [SciPy-User] optimize leastsq In-Reply-To: <541aa5a9-d0cb-4d15-9d85-f829ba87c641@16g2000yqy.googlegroups.com> References: <541aa5a9-d0cb-4d15-9d85-f829ba87c641@16g2000yqy.googlegroups.com> Message-ID: Sorry for double posting, but I thought yesterdays posting didn't work. /johannes On 28 Jun., 14:01, Johannes Radinger wrote: > Hello, > > I just wanted to ask you about the output of leassq. > Besides the fitted values I also get an integer flag 1-4. > > http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.le... > > what does this flag exactly mean and what are the differences between > the numbers? > > Secondly I want to know the accuracy of the fit? is there any value > for e.g the residuals etc. or what is the requested tolerance of that > algorithm? > > thanks > Johannes > _______________________________________________ > SciPy-User mailing list > SciPy-U... 
at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user From rie.uoa at googlemail.com Tue Jun 28 10:26:10 2011 From: rie.uoa at googlemail.com (Alexander Riess) Date: Tue, 28 Jun 2011 16:26:10 +0200 Subject: [SciPy-User] scipy.interpolate UnivariateSpline | Results depend on bbox Message-ID: Hello, I did some experiments with UnivariateSpline on Win XP 64 bit with Python 2.6 and wondered, that the results depend on bbox. Firstly I defined a time grid and created some function values f(t) t0 = 0.0 te = 1.0 n = 11 t = np.linspace(t0,te,n) print "t = ",t d = t print "d = exp(t) = ",d Output t = [ 0. 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1. ] d = exp(t) = [ 0. 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1. ] Secondly I tried to reproduce the initial used data on the same time grid and wondered, that it depends on the specified bbox. for i in np.linspace(0,1,21): UsLaBox = UnivariateSpline( t, d, k=1, bbox = [ t[0]-i, t[-1]+i ], s = 0) print "ERROR for bbox", "t = [", t[0], ",", t[-1], "]", "-/+ %0.2f" % i, " ", abs(d - UsLaBox(t)).max() print d - UsLaBox(t) Output: ERROR for bbox t = [ 0.0 , 1.0 ] -/+ 0.00 0.0 [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] ERROR for bbox t = [ 0.0 , 1.0 ] -/+ 0.05 2.22044604925e-16 [ 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 -2.22044605e-16] ERROR for bbox t = [ 0.0 , 1.0 ] -/+ 0.10 1.11022302463e-16 [ 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 -1.11022302e-16 0.00000000e+00] ERROR for bbox t = [ 0.0 , 1.0 ] -/+ 0.15 1.11022302463e-16 [ 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 -1.11022302e-16 0.00000000e+00] ERROR for bbox t = [ 0.0 , 1.0 ] -/+ 0.20 1.11022302463e-16 [ 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 1.11022302e-16 0.00000000e+00] ERROR for bbox t = [ 0.0 , 1.0 ] -/+ 0.25 1.11022302463e-16 [ 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 -1.11022302e-16 0.00000000e+00] ERROR for bbox t = [ 0.0 , 1.0 ] -/+ 0.30 1.11022302463e-16 [ 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 -1.11022302e-16 0.00000000e+00] ERROR for bbox t = [ 0.0 , 1.0 ] -/+ 0.35 0.0 [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] 
ERROR for bbox t = [ 0.0 , 1.0 ] -/+ 0.40 1.11022302463e-16 [ 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 1.11022302e-16 1.11022302e-16] ERROR for bbox t = [ 0.0 , 1.0 ] -/+ 0.45 2.22044604925e-16 [ 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 -2.22044605e-16 -2.22044605e-16] ERROR for bbox t = [ 0.0 , 1.0 ] -/+ 0.50 3.33066907388e-16 [ 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 3.33066907e-16 2.22044605e-16] ERROR for bbox t = [ 0.0 , 1.0 ] -/+ 0.55 1.11022302463e-16 [ 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 -1.11022302e-16 0.00000000e+00] ERROR for bbox t = [ 0.0 , 1.0 ] -/+ 0.60 1.11022302463e-16 [ 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 1.11022302e-16] ERROR for bbox t = [ 0.0 , 1.0 ] -/+ 0.65 2.22044604925e-16 [ 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 -1.11022302e-16 -2.22044605e-16] ERROR for bbox t = [ 0.0 , 1.0 ] -/+ 0.70 1.11022302463e-16 [ 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 -1.11022302e-16 0.00000000e+00] ERROR for bbox t = [ 0.0 , 1.0 ] -/+ 0.75 2.22044604925e-16 [ 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 -1.11022302e-16 -2.22044605e-16] ERROR for bbox t = [ 0.0 , 1.0 ] -/+ 0.80 1.11022302463e-16 [ 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 1.11022302e-16 1.11022302e-16] ERROR for bbox t = [ 0.0 , 1.0 ] -/+ 0.85 0.0 [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] ERROR for bbox t = [ 0.0 , 1.0 ] -/+ 0.90 2.22044604925e-16 [ 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 2.22044605e-16 1.11022302e-16] ERROR for bbox t = [ 0.0 , 1.0 ] -/+ 0.95 2.22044604925e-16 [ 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 -2.22044605e-16 -2.22044605e-16] ERROR for bbox t = [ 0.0 , 1.0 ] -/+ 1.00 0.0 [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] I expected to get 0.0 in all cases, as I used s = 0 and the tested time grid is the same as in UnivariateSpline. It is remarkable that the error only occur at the last 2 indexes of the interpolation array. I hope somebody can help me because I want to know whether this is a bug or not? Kind regards Alex From rie.uoa at googlemail.com Tue Jun 28 11:11:02 2011 From: rie.uoa at googlemail.com (Alexander Riess) Date: Tue, 28 Jun 2011 15:11:02 +0000 (UTC) Subject: [SciPy-User] =?utf-8?q?scipy=2Einterpolate_UnivariateSpline_=7C_R?= =?utf-8?q?esults_depend_on=09bbox?= References: Message-ID: Alexander Riess googlemail.com> writes: > d = exp(t) = [ 0. 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1. ] There is a error in the output. Output d is not the exp function it is just the identity: d = t. 
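For scale, the deviations in the output above are all within a few multiples of 2.22e-16, the machine epsilon of double precision, which corresponds to about one unit in the last place for values near 1.0. A quick check (a sketch, assuming only NumPy):

>>> import numpy as np
>>> np.finfo(np.float64).eps
2.2204460492503131e-16
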
From alex.flint at gmail.com Tue Jun 28 14:17:59 2011 From: alex.flint at gmail.com (Alex Flint) Date: Tue, 28 Jun 2011 14:17:59 -0400 Subject: [SciPy-User] fft convolutions Message-ID: I am trying to perform 2d convolutions between a large 2d array A and a bunch of small 2d arrays B1...Bn. My approach is roughly: a = fft(A,size) for b in bs: ans = ifft(fft(b,size)*A) slow = convolve2d(A, b, 'same') However, as implemented above, ans is offset an inconsistent amount from the answer produced by convolve2d, presumably because convolve2d is treating b as if the origin is in the center whereas fft treats b as if the origin is at the top left (but it doesn't seem to be quite as simple as this). What am I missing? Also, I'm not using fftconvolve at the moment because I want to compute the fft of A just once, then use it repeatedly for each b. Cheers, Alex -------------- next part -------------- An HTML attachment was scrubbed... URL: From lanceboyle at qwest.net Wed Jun 29 01:14:53 2011 From: lanceboyle at qwest.net (Jerry) Date: Tue, 28 Jun 2011 22:14:53 -0700 Subject: [SciPy-User] OT: Re: ANN: Veusz 1.12 - a python-based GUI/scripted scientific plotting package In-Reply-To: References: Message-ID: <735FFAC9-18D7-4359-A3C0-3F2890B2F045@qwest.net> Hi, Veusz appears to be a very capable plotting program. However, I can't use it because it, like many other plotting programs, lacks a flexible and easy import mechanism. My basis for comparison is Igor Pro which allows the user to specify, in an easy-to-use dialog box, the details of the formatting of the data to be imported. This includes both text and binary files, along with Excel files. For example, the binary import dialog lets the user quickly specify single or double floats or 8-, 16-, or 32-bit signed or unsigned integers, how many bytes to skip (allowing a header to be bypassed or to access an arbitrary point in the file), and the number of arrays in the file and the number of points in an array. The latter allows e.g. the easy specification of, say, importing x and y data in either contiguous x and y arrays, or as (x, y) pairs. Further, there is an easy choice to convert any of the above data types from the imported data to any of the other data types at the time of importation. One of the many, many cases that this arrangement can handle is importing a variety of audio file formats, and the one-off formats of my own invention. (I do simulations in a compiled language and use Igor Pro to examine the results.) I realize that I can write my own importer for Vuesz but that is not something that I want to have to do (and I don't know Python well) every time I have a new file format. I am looking at letting go of Igor Pro when I upgrade to OS X Lion which will not allow the running of Power PC programs--the Rosetta emulator is no longer included. (Igor Pro is Intel-native nowadays but my version is several years old, and Power PC.) Igor Pro is $595 and well worth it, but I use it sporadically and there are many options available for less or for free. Unfortunately, they all lack flexible and _easy_to_use_importing functions. (Plot.app on OS X is the exception, nearly matching Igor Pro, lacking only 8-bit integers.) It should be quite easy to incorporate an Igor Pro-like importing function in to Vuesz and most other plotting programs. (I didn't bother to describe the text importing capabilities.) So at least for Veusz, please consider this a feature request. 
Adding a bias from the OS X world, a Quick Look plugin is always welcome and pretty much standard for all apps these days. I understand that they can be rather straightforward to make. Jerry On Jun 28, 2011, at 4:08 AM, Jeremy Sanders wrote: > I am pleased to announce Veusz 1.12, a python-based GUI/command > line/scripted plotting package. > > This release has some new features and a large number of bug fixes (see > below). > > Jeremy > > > Veusz 1.12 > ---------- > Velvet Ember Under Sky Zenith > ----------------------------- > http://home.gna.org/veusz/ > > Copyright (C) 2003-2011 Jeremy Sanders > and contributors. > > Licenced under the GPL (version 2 or greater). > > Veusz is a Qt4 based scientific plotting package. It is written in > Python, using PyQt4 for display and user-interfaces, and numpy for > handling the numeric data. Veusz is designed to produce > publication-ready Postscript/PDF/SVG output. The user interface aims > to be simple, consistent and powerful. > > Veusz provides a GUI, command line, embedding and scripting interface > (based on Python) to its plotting facilities. It also allows for > manipulation and editing of datasets. Data can be captured from > external sources such as internet sockets or other programs. > > Changes in 1.12: > * Multiple widgets can now be selected for editing properties > * Add Edit->Select menu and context menu for above > * Added context menu on dataset browser for filenames to reload, > delete or unlink all associated datasets > * New tree-like dataset browsing widget is shown in data edit dialog > * Importing 1D fits images is now supported > * Date / time data has its own dataset type > * The data edit dialog box can create or edit date/time data in > human-readable form > > Minor improvements: > * Add LaTeX commands \cdot, \nabla, \overline plus some arrows > * Inform user in exception dialog if a new version is available > * Add linevertbar and linehorzbar error bar styles > > Bug fixes: > * Fix crash on filling filled error regions if no error bars > * Remove grouping separator to numbers in locale as it creates > ambiguous lists of numbers > * Undo works properly for boolean and integer settings > * Prevent widgets getting the same names when dragging and dropping > * Hidden plot widgets are ignored when calculating axis ranges > * Combo boxes are now case sensitive when displaying matches with > previous text > * Fix errors if plotting DatasetRange or Dataset1DPlugin datasets > against data with nan values > * Fix division by zero in dataset preview > * Do not leave settings pointing to deleted widgets after an undo > * Fix errors when using super/subscripts of super/subscripts > * Fix crash when giving positions of bar plot and labels > * Do not allow dataset names to be invalid after remaining > * Several EMF format bug fixes, including not showing hidden lines > and not connecting points making curves > * Stop crash when contouring zero-sized datasets > > Features of package: > * X-Y plots (with errorbars) > * Line and function plots > * Contour plots > * Images (with colour mappings and colorbars) > * Stepped plots (for histograms) > * Bar graphs > * Vector field plots > * Box plots > * Polar plots > * Plotting dates > * Fitting functions to data > * Stacked plots and arrays of plots > * Plot keys > * Plot labels > * Shapes and arrows on plots > * LaTeX-like formatting for text > * EPS/PDF/PNG/SVG/EMF export > * Scripting interface > * Dataset creation/manipulation > * Embed Veusz within other programs > * Text, CSV, FITS and 
user-plugin importing > * Data can be captured from external sources > * User defined functions, constants and can import external Python > functions > * Plugin interface to allow user to write or load code to > - import data using new formats > - make new datasets, optionally linked to existing datasets > - arbitrarily manipulate the document > * Data picker > > Requirements for source install: > Python (2.4 or greater required) > http://www.python.org/ > Qt >= 4.3 (free edition) > http://www.trolltech.com/products/qt/ > PyQt >= 4.3 (SIP is required to be installed first) > http://www.riverbankcomputing.co.uk/pyqt/ > http://www.riverbankcomputing.co.uk/sip/ > numpy >= 1.0 > http://numpy.scipy.org/ > > Optional: > Microsoft Core Fonts (recommended for nice output) > http://corefonts.sourceforge.net/ > PyFITS >= 1.1 (optional for FITS import) > http://www.stsci.edu/resources/software_hardware/pyfits > pyemf >= 2.0.0 (optional for EMF export) > http://pyemf.sourceforge.net/ > PyMinuit >= 1.1.2 (optional improved fitting) > http://code.google.com/p/pyminuit/ > For EMF and better SVG export, PyQt >= 4.6 or better is > required, to fix a bug in the C++ wrapping > > > For documentation on using Veusz, see the "Documents" directory. The > manual is in PDF, HTML and text format (generated from docbook). The > examples are also useful documentation. Please also see and contribute > to the Veusz wiki: http://barmag.net/veusz-wiki/ > > Issues with the current version: > > * Some recent versions of PyQt/SIP will causes crashes when exporting > SVG files. Update to 4.7.4 (if released) or a recent snapshot to > solve this problem. > > If you enjoy using Veusz, we would love to hear from you. Please join > the mailing lists at > > https://gna.org/mail/?group=veusz > > to discuss new features or if you'd like to contribute code. The > latest code can always be found in the Git repository > at https://github.com/jeremysanders/veusz.git. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From cjordan1 at uw.edu Wed Jun 29 07:19:04 2011 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Wed, 29 Jun 2011 04:19:04 -0700 Subject: [SciPy-User] optimize leastsq In-Reply-To: References: <541aa5a9-d0cb-4d15-9d85-f829ba87c641@16g2000yqy.googlegroups.com> Message-ID: It's from the fortran code that leastsq calls. According to comments in the source, info = 1 both actual and predicted relative reductions c in the sum of squares are at most ftol. c c info = 2 relative error between two consecutive iterates c is at most xtol. c c info = 3 conditions for info = 1 and info = 2 both hold. c c info = 4 the cosine of the angle between fvec and any c column of the jacobian is at most gtol in c absolute value. I presume that means the info says what conditions caused the function to terminate. -Chris JS On Tue, Jun 28, 2011 at 5:03 AM, Johannes Radinger < johradinger at googlemail.com> wrote: > Sorry for double posting, but I thought yesterdays posting didn't > work. > > /johannes > > On 28 Jun., 14:01, Johannes Radinger > wrote: > > Hello, > > > > I just wanted to ask you about the output of leassq. > > Besides the fitted values I also get an integer flag 1-4. > > > > http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.le... > > > > what does this flag exactly mean and what are the differences between > > the numbers? > > > > Secondly I want to know the accuracy of the fit? 
is there any value > > for e.g the residuals etc. or what is the requested tolerance of that > > algorithm? > > > > thanks > > Johannes > > _______________________________________________ > > SciPy-User mailing list > > SciPy-U... at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From johnl at cs.wisc.edu Wed Jun 29 12:54:06 2011 From: johnl at cs.wisc.edu (J. David Lee) Date: Wed, 29 Jun 2011 11:54:06 -0500 Subject: [SciPy-User] Fitting procedure to take advantage of cluster Message-ID: <4E0B58AE.3090706@cs.wisc.edu> Hello, I'm attempting to perform a fit of a model function's output to some measured data. The model has around 12 parameters, and takes tens of minutes to run. I have access to a cluster with several thousand processors that can run the simulations in parallel, so I'm wondering if there are any algorithms out there that I can use to leverage this computing power to efficiently solve my problem - that is, besides grid searches or Monte-Carlo methods. Thanks for your help, David From ciampagg at usi.ch Wed Jun 29 13:18:17 2011 From: ciampagg at usi.ch (Giovanni Luca Ciampaglia) Date: Wed, 29 Jun 2011 19:18:17 +0200 Subject: [SciPy-User] Fitting procedure to take advantage of cluster In-Reply-To: <4E0B58AE.3090706@cs.wisc.edu> References: <4E0B58AE.3090706@cs.wisc.edu> Message-ID: <4E0B5E59.8050203@usi.ch> Hi, there are several strategies, depending on your problem. You could use a surrogate model, like a Gaussian Process, to fit the data (see for example Higdon et al http://epubs.siam.org/sisc/resource/1/sjoce3/v26/i2/p448_s1?isAuthorized=no). I have personally used scikits.learn for GP estimation but there is also PyMC that should do the same (never tried it). Another option could be indirect inference, but if each run of your model takes several minutes to compute probably it's not the best option: http://cscs.umich.edu/~crshalizi/notabene/indirect-inference.html HTH Giovanni Il 29. 06. 11 18:54, J. David Lee ha scritto: > Hello, > > I'm attempting to perform a fit of a model function's output to some > measured data. The model has around 12 parameters, and takes tens of > minutes to run. I have access to a cluster with several thousand > processors that can run the simulations in parallel, so I'm wondering if > there are any algorithms out there that I can use to leverage this > computing power to efficiently solve my problem - that is, besides grid > searches or Monte-Carlo methods. > > Thanks for your help, > > David > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user -- Giovanni Luca Ciampaglia Ph.D. Candidate Faculty of Informatics University of Lugano Web: http://www.inf.usi.ch/phd/ciampaglia/ Bertastra?e 36 ? 8003 Z?rich ? Switzerland -------------- next part -------------- An HTML attachment was scrubbed... URL: From deil.christoph at googlemail.com Wed Jun 29 18:15:53 2011 From: deil.christoph at googlemail.com (Christoph Deil) Date: Thu, 30 Jun 2011 00:15:53 +0200 Subject: [SciPy-User] Fitting procedure to take advantage of cluster In-Reply-To: <4E0B58AE.3090706@cs.wisc.edu> References: <4E0B58AE.3090706@cs.wisc.edu> Message-ID: On Jun 29, 2011, at 6:54 PM, J. 
David Lee wrote: > I'm attempting to perform a fit of a model function's output to some > measured data. The model has around 12 parameters, and takes tens of > minutes to run. I have access to a cluster with several thousand > processors that can run the simulations in parallel, so I'm wondering if > there are any algorithms out there that I can use to leverage this > computing power to efficiently solve my problem - that is, besides grid > searches or Monte-Carlo methods. Hi David, there are two somewhat unrelated questions: 1) What optimizer can minimize your cost function (a.k.a. fit statistic) with the least evaluations? 2) How can you parallelize the computation, taking advantage of 1000s of cores? We can't give good advice on these questions without knowing more about your data, model and cost function. In principle I think any optimizer can be parallelized. There are many optimizers, you can find a short description of three common ones (Levenberg-Marquardt, Simplex, Monte-Carlo) here: http://cxc.harvard.edu/sherpa/methods/index.html I would recommend trying Levenberg-Marquardt first because it is very fast, but you need reasonable starting parameters and if your cost function is complicated you can get stuck in a local minimum. Concerning the parallelization, since you mentioned that one evaluation of the cost function takes 10 minutes, if you can, split the data into small independent chunks, distribute them to each node (once) with the current parameter set (for each iteration), and report the fit statistic for this chunk back to the master process, which then simply sums to get the total statistic. You can find an example with a few lines of python code using Sherpa for fitting and MPI for parallelization here (I haven't tried this example myself): http://cxc.harvard.edu/ciao/workshop/feb10/talks/aldcroft.pdf (slides 13 to 15) I've heard very good things about pyzmq (http://www.zeromq.org/bindings:python), which is certainly easier to learn than MPI if you haven't used either before. Of course such trivial "data-parallelization" is only possible if you can split your dataset into *independent* chunks, i.e. there are no long-reaching correlations between data points. I hope that helps a bit, in any case you probably need to implement the parallelization yourself and experiment a bit which optimizer works well for your problem. Christoph From jgomezdans at gmail.com Wed Jun 29 18:46:36 2011 From: jgomezdans at gmail.com (Jose Gomez-Dans) Date: Wed, 29 Jun 2011 23:46:36 +0100 Subject: [SciPy-User] Fitting procedure to take advantage of cluster In-Reply-To: <4E0B58AE.3090706@cs.wisc.edu> References: <4E0B58AE.3090706@cs.wisc.edu> Message-ID: Hi, On 29 June 2011 17:54, J. David Lee wrote: > I'm attempting to perform a fit of a model function's output to some > measured data. The model has around 12 parameters, and takes tens of > minutes to run. I have access to a cluster with several thousand > processors that can run the simulations in parallel, so I'm wondering if > there are any algorithms out there that I can use to leverage this > computing power to efficiently solve my problem > We have a similar problem at the moment. It consists of inverting a model (i.e., find the model parameters that result in the smallest misfit between observations and model output, under some assumptions as to how you combine data & model output). The model typically has 100s of input variables, is very nonlinear, and takes "a long time" to run. 
Usually, we need to invert lots and lots of sets of observations. The model code is fortran (f2py-ed for numpy goodness), and there's also a version that uses OpenMP to parallelise some internal loops. Additionally, we took advantage of AD techniques (eg Tapenade ) to calculate the model's derivative with respect to its inputs (and also calculated the derivative of how we put together the mismatch of obs & model output, usually referred to as "cost function"). This was pretty hard, and I wouldn't try it at home :) Then you can use fast optimisation methods (L-BFG-S, for example). The next stage we have used is to parallelise runs over a cluster using IPython's parallelisation capabilities. If you have lots of independent model runs, you can parallelise these, or you can parallelise experiments. We looked at Gaussian Proces emulators too, as Giovanni suggested (see the papers by O'Hagan too). However, the problem is that our model typically has several outputs (think of it as correlated time series, for example, a time series of the outflow of rivers in a basin). This isn't easy to do with GPs. However, if your model provides a scalar, then they can be very efficient and are easy to implement. Finally, if you know pretty well how your model behaves and so on, you can precalculate pairs of input parameters/output value and make a look up table (LUT). Think of the LUT as a poor man's GP emulator (no uncertainty estimates, no derivatives, etc). This is the sort of approach that is used operationally in my field (remote sensing) to invert complex radiative transfer models fast. i think using something like scipy's vector quantisation would be fairly fast and straightforward. Hth, Jose -------------- next part -------------- An HTML attachment was scrubbed... URL: From ciampagg at usi.ch Thu Jun 30 03:38:47 2011 From: ciampagg at usi.ch (Giovanni Luca Ciampaglia) Date: Thu, 30 Jun 2011 09:38:47 +0200 Subject: [SciPy-User] Fitting procedure to take advantage of cluster In-Reply-To: References: <4E0B58AE.3090706@cs.wisc.edu> Message-ID: <4E0C2807.7020602@usi.ch> Il 30. 06. 11 00:46, Jose Gomez-Dans ha scritto: > > We looked at Gaussian Proces emulators too, as Giovanni suggested (see > the papers by O'Hagan too). However, the problem is that our model > typically has several outputs (think of it as correlated time series, > for example, a time series of the outflow of rivers in a basin). This > isn't easy to do with GPs. However, if your model provides a scalar, > then they can be very efficient and are easy to implement. Hi Jose, You might want to have a look at the paper by Dancik et al. (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2789658/) where they use dimensionality reduction (e.g. PCA) in order to use GP with a model whose output is a time series. In my case I had to fit a model whose output was a whole population, i.e. a distributional output, so in the end I used an auxiliary model (in particular a mixture of gaussians) to fit the output of my simulations, and then GPs to learn the mapping between the parameters of my model and the sufficient statistic of the mixture of gaussians. At that point you can define an error function and solve a minimization problem to get the parameter estimates. A bit complicated but it works. Cheers, -- Giovanni Luca Ciampaglia Ph.D. Candidate Faculty of Informatics University of Lugano Web: http://www.inf.usi.ch/phd/ciampaglia/ Bertastra?e 36 ? 8003 Z?rich ? 
Switzerland From sturla at molden.no Thu Jun 30 04:35:12 2011 From: sturla at molden.no (Sturla Molden) Date: Thu, 30 Jun 2011 10:35:12 +0200 Subject: [SciPy-User] fft convolutions In-Reply-To: References: Message-ID: <4E0C3540.3080202@molden.no> Den 28.06.2011 20:17, skrev Alex Flint: > I am trying to perform 2d convolutions between a large 2d array A and > a bunch of small 2d arrays B1...Bn. My approach is roughly: > > a = fft(A,size) > for b in bs: > ans = ifft(fft(b,size)*A) > slow = convolve2d(A, b, 'same') > > However, as implemented above, ans is offset an inconsistent amount > from the answer produced by convolve2d, presumably because convolve2d > is treating b as if the origin is in the center whereas fft treats b > as if the origin is at the top left (but it doesn't seem to be quite > as simple as this). What am I missing? You are not doing 2D convolution with the FFT. You want fft2 (or rfft2). Sturla From alex.flint at gmail.com Thu Jun 30 08:00:18 2011 From: alex.flint at gmail.com (Alex Flint) Date: Thu, 30 Jun 2011 08:00:18 -0400 Subject: [SciPy-User] fft convolutions In-Reply-To: <4E0C3540.3080202@molden.no> References: <4E0C3540.3080202@molden.no> Message-ID: oops, I am actually using fft2/ifft2, I just forgot to write it in my pseudo code On Thu, Jun 30, 2011 at 4:35 AM, Sturla Molden wrote: > Den 28.06.2011 20:17, skrev Alex Flint: > > I am trying to perform 2d convolutions between a large 2d array A and > > a bunch of small 2d arrays B1...Bn. My approach is roughly: > > > > a = fft(A,size) > > for b in bs: > > ans = ifft(fft(b,size)*A) > > slow = convolve2d(A, b, 'same') > > > > However, as implemented above, ans is offset an inconsistent amount > > from the answer produced by convolve2d, presumably because convolve2d > > is treating b as if the origin is in the center whereas fft treats b > > as if the origin is at the top left (but it doesn't seem to be quite > > as simple as this). What am I missing? > > You are not doing 2D convolution with the FFT. You want fft2 (or rfft2). > > Sturla > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Thu Jun 30 08:26:44 2011 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Thu, 30 Jun 2011 14:26:44 +0200 Subject: [SciPy-User] fft convolutions In-Reply-To: References: <4E0C3540.3080202@molden.no> Message-ID: <1309436804.28172.4.camel@sebastian> Hi, I assume its *a inside the loop, but don't you have to conjugate one of the two? a = np.conj(np.fft2(A)) I did this once, and the scipy fftconvolve seemed to forget the conjugation? Also use np.fft.fftshift to undo the shifting introduced by the fft. Regards, Sebastian On Thu, 2011-06-30 at 08:00 -0400, Alex Flint wrote: > oops, I am actually using fft2/ifft2, I just forgot to write it in my > pseudo code > > On Thu, Jun 30, 2011 at 4:35 AM, Sturla Molden > wrote: > Den 28.06.2011 20:17, skrev Alex Flint: > > I am trying to perform 2d convolutions between a large 2d > array A and > > a bunch of small 2d arrays B1...Bn. 
My approach is roughly: > > > > a = fft(A,size) > > for b in bs: > > ans = ifft(fft(b,size)*A) > > slow = convolve2d(A, b, 'same') > > > > However, as implemented above, ans is offset an inconsistent > amount > > from the answer produced by convolve2d, presumably because > convolve2d > > is treating b as if the origin is in the center whereas fft > treats b > > as if the origin is at the top left (but it doesn't seem to > be quite > > as simple as this). What am I missing? > > > You are not doing 2D convolution with the FFT. You want fft2 > (or rfft2). > > Sturla > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From sturla at molden.no Thu Jun 30 09:54:29 2011 From: sturla at molden.no (Sturla Molden) Date: Thu, 30 Jun 2011 15:54:29 +0200 Subject: [SciPy-User] fft convolutions In-Reply-To: <1309436804.28172.4.camel@sebastian> References: <4E0C3540.3080202@molden.no> <1309436804.28172.4.camel@sebastian> Message-ID: <4E0C8015.2090205@molden.no> Den 30.06.2011 14:26, skrev Sebastian Berg: > Hi, > > I assume its *a inside the loop, but don't you have to conjugate one of > the two? No, he just has to multiply in rectangular form. He might be confused by circular connvolution though. >>> import numpy as np >>> a = np.zeros(100) >>> b = np.zeros(100) >>> a[1] = 1 >>> b[:10] = 2 >>> np.convolve(a,b) array([ 0., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 0., 0., >>> np.round(np.fft.ifft(np.fft.fft(a)*np.fft.fft(b)).real) array([ 0., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 0., 0., Sturla From dg.gmane at thesamovar.net Thu Jun 30 10:00:04 2011 From: dg.gmane at thesamovar.net (Dan Goodman) Date: Thu, 30 Jun 2011 16:00:04 +0200 Subject: [SciPy-User] Fitting procedure to take advantage of cluster In-Reply-To: <4E0B58AE.3090706@cs.wisc.edu> References: <4E0B58AE.3090706@cs.wisc.edu> Message-ID: Our package Playdoh might be what you're looking for: http://code.google.com/p/playdoh/ It has some global optimisation algorithms built in, including particle swarm, genetic algorithms and CMA-ES. Dan On 29/06/2011 18:54, J. David Lee wrote: > Hello, > > I'm attempting to perform a fit of a model function's output to some > measured data. The model has around 12 parameters, and takes tens of > minutes to run. I have access to a cluster with several thousand > processors that can run the simulations in parallel, so I'm wondering if > there are any algorithms out there that I can use to leverage this > computing power to efficiently solve my problem - that is, besides grid > searches or Monte-Carlo methods. > > Thanks for your help, > > David From hmgaudecker at gmail.com Thu Jun 30 16:07:40 2011 From: hmgaudecker at gmail.com (Hans-Martin v. Gaudecker) Date: Thu, 30 Jun 2011 15:07:40 -0500 Subject: [SciPy-User] Error installing SciPy 0.9.0 on MacOS 10.6 with 64-bit python.org Python 3.2 Message-ID: <663910E0-E3ED-46B5-84F5-EFB8F6496D13@gmail.com> Hi, Installation under the specification above and the recommended gfortran compiler fails on my MacBook Pro (Core2 Duo) with the error message pasted below. This happens either during "python setup.py build", as suggested in the instructions, or during "install" (doing that directly was suggested on an earlier thread). Any pointers would be greatly appreciated. 
FWIW, I installed NumPy just before trying SciPy and all its tests pass. Thanks, Hans-Martin echo $FFLAGS -arch x86_64 echo $LDFLAGS -arch x86_64 /usr/local/bin/gfortran -v Using built-in specs. Target: i686-apple-darwin8 Configured with: /Builds/unix/gcc/gcc-4.2/configure --prefix=/usr/local --mandir=/share/man --program-transform-name=/^[cg][^.-]*$/s/$/-4.2/ --build=i686-apple-darwin8 --host=i686-apple-darwin8 --target=i686-apple-darwin8 --enable-languages=fortran Thread model: posix gcc version 4.2.3 /usr/local/bin/gfortran -Wall -arch x86_64 build/temp.macosx-10.6-intel-3.2/build/src.macosx-10.6-intel-3.2/scipy/fftpack/_fftpackmodule.o build/temp.macosx-10.6-intel-3.2/scipy/fftpack/src/zfft.o build/temp.macosx-10.6-intel-3.2/scipy/fftpack/src/drfft.o build/temp.macosx-10.6-intel-3.2/scipy/fftpack/src/zrfft.o build/temp.macosx-10.6-intel-3.2/scipy/fftpack/src/zfftnd.o build/temp.macosx-10.6-intel-3.2/build/src.macosx-10.6-intel-3.2/scipy/fftpack/src/dct.o build/temp.macosx-10.6-intel-3.2/build/src.macosx-10.6-intel-3.2/fortranobject.o -L/usr/local/lib/gcc/i686-apple-darwin8/4.2.3/x86_64 -Lbuild/temp.macosx-10.6-intel-3.2 -ldfftpack -lfftpack -lgfortran -o build/lib.macosx-10.6-intel-3.2/scipy/fftpack/_fftpack.so Undefined symbols: "_Py_BuildValue", referenced from: _f2py_rout__fftpack_destroy_dct1_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_dct2_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_ddct1_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_ddct2_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_rfft_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_cfftnd_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_cfft_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_drfft_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_zfftnd_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_zfft_cache in _fftpackmodule.o _f2py_rout__fftpack_dct3 in _fftpackmodule.o _f2py_rout__fftpack_dct2 in _fftpackmodule.o _f2py_rout__fftpack_dct1 in _fftpackmodule.o _f2py_rout__fftpack_ddct3 in _fftpackmodule.o _f2py_rout__fftpack_ddct2 in _fftpackmodule.o _f2py_rout__fftpack_ddct1 in _fftpackmodule.o _f2py_rout__fftpack_crfft in _fftpackmodule.o _f2py_rout__fftpack_rfft in _fftpackmodule.o _f2py_rout__fftpack_cfft in _fftpackmodule.o _f2py_rout__fftpack_zrfft in _fftpackmodule.o _f2py_rout__fftpack_drfft in _fftpackmodule.o _f2py_rout__fftpack_zfft in _fftpackmodule.o _f2py_rout__fftpack_zfftnd in _fftpackmodule.o _f2py_rout__fftpack_cfftnd in _fftpackmodule.o "_PyExc_RuntimeError", referenced from: _PyInit__fftpack in _fftpackmodule.o _PyInit__fftpack in _fftpackmodule.o _PyInit__fftpack in _fftpackmodule.o _PyInit__fftpack in _fftpackmodule.o _fortran_call in fortranobject.o "_PyExc_ImportError", referenced from: _PyInit__fftpack in _fftpackmodule.o _PyInit__fftpack in _fftpackmodule.o _PyInit__fftpack in _fftpackmodule.o _PyInit__fftpack in _fftpackmodule.o _PyInit__fftpack in _fftpackmodule.o _PyInit__fftpack in _fftpackmodule.o _PyInit__fftpack in _fftpackmodule.o _PyInit__fftpack in _fftpackmodule.o "_MAIN__", referenced from: _main in libgfortranbegin.a(fmain.o) "_PyImport_ImportModule", referenced from: _PyInit__fftpack in _fftpackmodule.o "_PyArg_ParseTupleAndKeywords", referenced from: _f2py_rout__fftpack_destroy_dct1_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_dct2_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_ddct1_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_ddct2_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_rfft_cache 
in _fftpackmodule.o _f2py_rout__fftpack_destroy_cfftnd_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_cfft_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_drfft_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_zfftnd_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_zfft_cache in _fftpackmodule.o _f2py_rout__fftpack_dct3 in _fftpackmodule.o _f2py_rout__fftpack_dct2 in _fftpackmodule.o _f2py_rout__fftpack_dct1 in _fftpackmodule.o _f2py_rout__fftpack_ddct3 in _fftpackmodule.o _f2py_rout__fftpack_ddct2 in _fftpackmodule.o _f2py_rout__fftpack_ddct1 in _fftpackmodule.o _f2py_rout__fftpack_crfft in _fftpackmodule.o _f2py_rout__fftpack_rfft in _fftpackmodule.o _f2py_rout__fftpack_cfft in _fftpackmodule.o _f2py_rout__fftpack_zrfft in _fftpackmodule.o _f2py_rout__fftpack_drfft in _fftpackmodule.o _f2py_rout__fftpack_zfft in _fftpackmodule.o _f2py_rout__fftpack_zfftnd in _fftpackmodule.o _f2py_rout__fftpack_cfftnd in _fftpackmodule.o "_PyType_Type", referenced from: _PyInit__fftpack in _fftpackmodule.o "_PySequence_GetItem", referenced from: _int_from_pyobj in _fftpackmodule.o "_PyMem_Free", referenced from: _fortran_dealloc in fortranobject.o "_PyErr_NewException", referenced from: _PyInit__fftpack in _fftpackmodule.o "_PyObject_Type", referenced from: _array_from_pyobj in fortranobject.o "_PyErr_Clear", referenced from: _int_from_pyobj in _fftpackmodule.o _fortran_repr in fortranobject.o _F2PyCapsule_AsVoidPtr in fortranobject.o _F2PyCapsule_FromVoidPtr in fortranobject.o _F2PyDict_SetItemString in fortranobject.o _fortran_getattr in fortranobject.o "_PyExc_AttributeError", referenced from: _PyInit__fftpack in _fftpackmodule.o _fortran_setattr in fortranobject.o _fortran_setattr in fortranobject.o "_PyDict_SetItemString", referenced from: _PyInit__fftpack in _fftpackmodule.o _PyInit__fftpack in _fftpackmodule.o _PyInit__fftpack in _fftpackmodule.o _F2PyDict_SetItemString in fortranobject.o _fortran_setattr in fortranobject.o _fortran_getattr in fortranobject.o _fortran_getattr in fortranobject.o _PyFortranObject_New in fortranobject.o "_PyErr_Format", referenced from: _PyInit__fftpack in _fftpackmodule.o _PyInit__fftpack in _fftpackmodule.o _fortran_call in fortranobject.o _fortran_call in fortranobject.o "_PyObject_GenericGetAttr", referenced from: _fortran_getattr in fortranobject.o "_PyModule_Create2", referenced from: _PyInit__fftpack in _fftpackmodule.o "_PyObject_Str", referenced from: _array_from_pyobj in fortranobject.o "_PySequence_Check", referenced from: _int_from_pyobj in _fftpackmodule.o "_PyObject_GetAttrString", referenced from: _int_from_pyobj in _fftpackmodule.o _PyInit__fftpack in _fftpackmodule.o _fortran_repr in fortranobject.o "_PyExc_TypeError", referenced from: _fortran_call in fortranobject.o _array_from_pyobj in fortranobject.o "_PyCapsule_GetPointer", referenced from: _PyInit__fftpack in _fftpackmodule.o _F2PyCapsule_AsVoidPtr in fortranobject.o "_PyBytes_FromString", referenced from: _PyInit__fftpack in _fftpackmodule.o "_PyDict_DelItemString", referenced from: _fortran_setattr in fortranobject.o "_PyCapsule_New", referenced from: _F2PyCapsule_FromVoidPtr in fortranobject.o _fortran_getattr in fortranobject.o "_PyDict_New", referenced from: _PyFortranObject_NewAsAttr in fortranobject.o _fortran_setattr in fortranobject.o _PyFortranObject_New in fortranobject.o _PyFortranObject_New in fortranobject.o "_PyErr_Occurred", referenced from: _f2py_rout__fftpack_destroy_dct1_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_dct2_cache in 
_fftpackmodule.o _f2py_rout__fftpack_destroy_ddct1_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_ddct2_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_rfft_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_cfftnd_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_cfft_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_drfft_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_zfftnd_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_zfft_cache in _fftpackmodule.o _int_from_pyobj in _fftpackmodule.o _PyInit__fftpack in _fftpackmodule.o _f2py_rout__fftpack_dct3 in _fftpackmodule.o _f2py_rout__fftpack_dct3 in _fftpackmodule.o _f2py_rout__fftpack_dct2 in _fftpackmodule.o _f2py_rout__fftpack_dct2 in _fftpackmodule.o _f2py_rout__fftpack_dct1 in _fftpackmodule.o _f2py_rout__fftpack_dct1 in _fftpackmodule.o _f2py_rout__fftpack_ddct3 in _fftpackmodule.o _f2py_rout__fftpack_ddct3 in _fftpackmodule.o _f2py_rout__fftpack_ddct2 in _fftpackmodule.o _f2py_rout__fftpack_ddct2 in _fftpackmodule.o _f2py_rout__fftpack_ddct1 in _fftpackmodule.o _f2py_rout__fftpack_ddct1 in _fftpackmodule.o _f2py_rout__fftpack_crfft in _fftpackmodule.o _f2py_rout__fftpack_crfft in _fftpackmodule.o _f2py_rout__fftpack_rfft in _fftpackmodule.o _f2py_rout__fftpack_rfft in _fftpackmodule.o _f2py_rout__fftpack_cfft in _fftpackmodule.o _f2py_rout__fftpack_cfft in _fftpackmodule.o _f2py_rout__fftpack_zrfft in _fftpackmodule.o _f2py_rout__fftpack_zrfft in _fftpackmodule.o _f2py_rout__fftpack_drfft in _fftpackmodule.o _f2py_rout__fftpack_drfft in _fftpackmodule.o _f2py_rout__fftpack_zfft in _fftpackmodule.o _f2py_rout__fftpack_zfft in _fftpackmodule.o _f2py_rout__fftpack_zfftnd in _fftpackmodule.o _f2py_rout__fftpack_zfftnd in _fftpackmodule.o _f2py_rout__fftpack_zfftnd in _fftpackmodule.o _f2py_rout__fftpack_zfftnd in _fftpackmodule.o _f2py_rout__fftpack_cfftnd in _fftpackmodule.o _f2py_rout__fftpack_cfftnd in _fftpackmodule.o _f2py_rout__fftpack_cfftnd in _fftpackmodule.o _f2py_rout__fftpack_cfftnd in _fftpackmodule.o _F2PyDict_SetItemString in fortranobject.o "_PyType_IsSubtype", referenced from: _int_from_pyobj in _fftpackmodule.o _array_from_pyobj in fortranobject.o "_PyDict_GetItemString", referenced from: _fortran_getattr in fortranobject.o "_PyUnicodeUCS2_FromString", referenced from: _PyInit__fftpack in _fftpackmodule.o _fortran_repr in fortranobject.o _fortran_repr in fortranobject.o _fortran_getattr in fortranobject.o _fortran_getattr in fortranobject.o _fortran_getattr in fortranobject.o "__PyObject_New", referenced from: _PyFortranObject_NewAsAttr in fortranobject.o _PyFortranObject_New in fortranobject.o _PyFortranObject_New in fortranobject.o "_PyLong_AsLong", referenced from: _int_from_pyobj in _fftpackmodule.o _int_from_pyobj in _fftpackmodule.o "_PyNumber_Long", referenced from: _int_from_pyobj in _fftpackmodule.o "_PyErr_SetString", referenced from: _int_from_pyobj in _fftpackmodule.o _PyInit__fftpack in _fftpackmodule.o _PyInit__fftpack in _fftpackmodule.o _PyInit__fftpack in _fftpackmodule.o _PyInit__fftpack in _fftpackmodule.o _PyInit__fftpack in _fftpackmodule.o _PyInit__fftpack in _fftpackmodule.o _f2py_rout__fftpack_dct3 in _fftpackmodule.o _f2py_rout__fftpack_dct3 in _fftpackmodule.o _f2py_rout__fftpack_dct2 in _fftpackmodule.o _f2py_rout__fftpack_dct2 in _fftpackmodule.o _f2py_rout__fftpack_dct1 in _fftpackmodule.o _f2py_rout__fftpack_dct1 in _fftpackmodule.o _f2py_rout__fftpack_ddct3 in _fftpackmodule.o _f2py_rout__fftpack_ddct3 in _fftpackmodule.o _f2py_rout__fftpack_ddct2 in 
_fftpackmodule.o _f2py_rout__fftpack_ddct2 in _fftpackmodule.o _f2py_rout__fftpack_ddct1 in _fftpackmodule.o _f2py_rout__fftpack_ddct1 in _fftpackmodule.o _f2py_rout__fftpack_crfft in _fftpackmodule.o _f2py_rout__fftpack_crfft in _fftpackmodule.o _f2py_rout__fftpack_rfft in _fftpackmodule.o _f2py_rout__fftpack_rfft in _fftpackmodule.o _f2py_rout__fftpack_cfft in _fftpackmodule.o _f2py_rout__fftpack_cfft in _fftpackmodule.o _f2py_rout__fftpack_zrfft in _fftpackmodule.o _f2py_rout__fftpack_zrfft in _fftpackmodule.o _f2py_rout__fftpack_drfft in _fftpackmodule.o _f2py_rout__fftpack_drfft in _fftpackmodule.o _f2py_rout__fftpack_zfft in _fftpackmodule.o _f2py_rout__fftpack_zfft in _fftpackmodule.o _f2py_rout__fftpack_zfftnd in _fftpackmodule.o _f2py_rout__fftpack_zfftnd in _fftpackmodule.o _f2py_rout__fftpack_zfftnd in _fftpackmodule.o _f2py_rout__fftpack_zfftnd in _fftpackmodule.o _f2py_rout__fftpack_zfftnd in _fftpackmodule.o _f2py_rout__fftpack_cfftnd in _fftpackmodule.o _f2py_rout__fftpack_cfftnd in _fftpackmodule.o _f2py_rout__fftpack_cfftnd in _fftpackmodule.o _f2py_rout__fftpack_cfftnd in _fftpackmodule.o _f2py_rout__fftpack_cfftnd in _fftpackmodule.o _array_from_pyobj in fortranobject.o _array_from_pyobj in fortranobject.o _fortran_setattr in fortranobject.o _fortran_setattr in fortranobject.o "_PyUnicodeUCS2_FromFormat", referenced from: _fortran_repr in fortranobject.o "_PyBytes_AsString", referenced from: _array_from_pyobj in fortranobject.o "_PyUnicodeUCS2_Concat", referenced from: _fortran_getattr in fortranobject.o "_PyComplex_Type", referenced from: _int_from_pyobj in _fftpackmodule.o "__Py_NoneStruct", referenced from: _f2py_rout__fftpack_dct3 in _fftpackmodule.o _f2py_rout__fftpack_dct2 in _fftpackmodule.o _f2py_rout__fftpack_dct1 in _fftpackmodule.o _f2py_rout__fftpack_ddct3 in _fftpackmodule.o _f2py_rout__fftpack_ddct2 in _fftpackmodule.o _f2py_rout__fftpack_ddct1 in _fftpackmodule.o _f2py_rout__fftpack_crfft in _fftpackmodule.o _f2py_rout__fftpack_rfft in _fftpackmodule.o _f2py_rout__fftpack_cfft in _fftpackmodule.o _f2py_rout__fftpack_zrfft in _fftpackmodule.o _f2py_rout__fftpack_drfft in _fftpackmodule.o _f2py_rout__fftpack_zfft in _fftpackmodule.o _f2py_rout__fftpack_zfftnd in _fftpackmodule.o _f2py_rout__fftpack_cfftnd in _fftpackmodule.o _array_from_pyobj in fortranobject.o _array_from_pyobj in fortranobject.o _fortran_setattr in fortranobject.o _fortran_getattr in fortranobject.o "_PyCapsule_Type", referenced from: _PyInit__fftpack in _fftpackmodule.o _F2PyCapsule_Check in fortranobject.o "_PyExc_ValueError", referenced from: _array_from_pyobj in fortranobject.o "_PyModule_GetDict", referenced from: _PyInit__fftpack in _fftpackmodule.o "_PyErr_Print", referenced from: _PyInit__fftpack in _fftpackmodule.o _F2PyDict_SetItemString in fortranobject.o ld: symbol(s) not found collect2: ld returned 1 exit status Undefined symbols: "_Py_BuildValue", referenced from: _f2py_rout__fftpack_destroy_dct1_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_dct2_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_ddct1_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_ddct2_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_rfft_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_cfftnd_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_cfft_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_drfft_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_zfftnd_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_zfft_cache in _fftpackmodule.o _f2py_rout__fftpack_dct3 
in _fftpackmodule.o _f2py_rout__fftpack_dct2 in _fftpackmodule.o _f2py_rout__fftpack_dct1 in _fftpackmodule.o _f2py_rout__fftpack_ddct3 in _fftpackmodule.o _f2py_rout__fftpack_ddct2 in _fftpackmodule.o _f2py_rout__fftpack_ddct1 in _fftpackmodule.o _f2py_rout__fftpack_crfft in _fftpackmodule.o _f2py_rout__fftpack_rfft in _fftpackmodule.o _f2py_rout__fftpack_cfft in _fftpackmodule.o _f2py_rout__fftpack_zrfft in _fftpackmodule.o _f2py_rout__fftpack_drfft in _fftpackmodule.o _f2py_rout__fftpack_zfft in _fftpackmodule.o _f2py_rout__fftpack_zfftnd in _fftpackmodule.o _f2py_rout__fftpack_cfftnd in _fftpackmodule.o "_PyExc_RuntimeError", referenced from: _PyInit__fftpack in _fftpackmodule.o _PyInit__fftpack in _fftpackmodule.o _PyInit__fftpack in _fftpackmodule.o _PyInit__fftpack in _fftpackmodule.o _fortran_call in fortranobject.o "_PyExc_ImportError", referenced from: _PyInit__fftpack in _fftpackmodule.o _PyInit__fftpack in _fftpackmodule.o _PyInit__fftpack in _fftpackmodule.o _PyInit__fftpack in _fftpackmodule.o _PyInit__fftpack in _fftpackmodule.o _PyInit__fftpack in _fftpackmodule.o _PyInit__fftpack in _fftpackmodule.o _PyInit__fftpack in _fftpackmodule.o "_MAIN__", referenced from: _main in libgfortranbegin.a(fmain.o) "_PyImport_ImportModule", referenced from: _PyInit__fftpack in _fftpackmodule.o "_PyArg_ParseTupleAndKeywords", referenced from: _f2py_rout__fftpack_destroy_dct1_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_dct2_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_ddct1_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_ddct2_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_rfft_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_cfftnd_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_cfft_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_drfft_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_zfftnd_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_zfft_cache in _fftpackmodule.o _f2py_rout__fftpack_dct3 in _fftpackmodule.o _f2py_rout__fftpack_dct2 in _fftpackmodule.o _f2py_rout__fftpack_dct1 in _fftpackmodule.o _f2py_rout__fftpack_ddct3 in _fftpackmodule.o _f2py_rout__fftpack_ddct2 in _fftpackmodule.o _f2py_rout__fftpack_ddct1 in _fftpackmodule.o _f2py_rout__fftpack_crfft in _fftpackmodule.o _f2py_rout__fftpack_rfft in _fftpackmodule.o _f2py_rout__fftpack_cfft in _fftpackmodule.o _f2py_rout__fftpack_zrfft in _fftpackmodule.o _f2py_rout__fftpack_drfft in _fftpackmodule.o _f2py_rout__fftpack_zfft in _fftpackmodule.o _f2py_rout__fftpack_zfftnd in _fftpackmodule.o _f2py_rout__fftpack_cfftnd in _fftpackmodule.o "_PyType_Type", referenced from: _PyInit__fftpack in _fftpackmodule.o "_PySequence_GetItem", referenced from: _int_from_pyobj in _fftpackmodule.o "_PyMem_Free", referenced from: _fortran_dealloc in fortranobject.o "_PyErr_NewException", referenced from: _PyInit__fftpack in _fftpackmodule.o "_PyObject_Type", referenced from: _array_from_pyobj in fortranobject.o "_PyErr_Clear", referenced from: _int_from_pyobj in _fftpackmodule.o _fortran_repr in fortranobject.o _F2PyCapsule_AsVoidPtr in fortranobject.o _F2PyCapsule_FromVoidPtr in fortranobject.o _F2PyDict_SetItemString in fortranobject.o _fortran_getattr in fortranobject.o "_PyExc_AttributeError", referenced from: _PyInit__fftpack in _fftpackmodule.o _fortran_setattr in fortranobject.o _fortran_setattr in fortranobject.o "_PyDict_SetItemString", referenced from: _PyInit__fftpack in _fftpackmodule.o _PyInit__fftpack in _fftpackmodule.o _PyInit__fftpack in _fftpackmodule.o 
_F2PyDict_SetItemString in fortranobject.o _fortran_setattr in fortranobject.o _fortran_getattr in fortranobject.o _fortran_getattr in fortranobject.o _PyFortranObject_New in fortranobject.o "_PyErr_Format", referenced from: _PyInit__fftpack in _fftpackmodule.o _PyInit__fftpack in _fftpackmodule.o _fortran_call in fortranobject.o _fortran_call in fortranobject.o "_PyObject_GenericGetAttr", referenced from: _fortran_getattr in fortranobject.o "_PyModule_Create2", referenced from: _PyInit__fftpack in _fftpackmodule.o "_PyObject_Str", referenced from: _array_from_pyobj in fortranobject.o "_PySequence_Check", referenced from: _int_from_pyobj in _fftpackmodule.o "_PyObject_GetAttrString", referenced from: _int_from_pyobj in _fftpackmodule.o _PyInit__fftpack in _fftpackmodule.o _fortran_repr in fortranobject.o "_PyExc_TypeError", referenced from: _fortran_call in fortranobject.o _array_from_pyobj in fortranobject.o "_PyCapsule_GetPointer", referenced from: _PyInit__fftpack in _fftpackmodule.o _F2PyCapsule_AsVoidPtr in fortranobject.o "_PyBytes_FromString", referenced from: _PyInit__fftpack in _fftpackmodule.o "_PyDict_DelItemString", referenced from: _fortran_setattr in fortranobject.o "_PyCapsule_New", referenced from: _F2PyCapsule_FromVoidPtr in fortranobject.o _fortran_getattr in fortranobject.o "_PyDict_New", referenced from: _PyFortranObject_NewAsAttr in fortranobject.o _fortran_setattr in fortranobject.o _PyFortranObject_New in fortranobject.o _PyFortranObject_New in fortranobject.o "_PyErr_Occurred", referenced from: _f2py_rout__fftpack_destroy_dct1_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_dct2_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_ddct1_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_ddct2_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_rfft_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_cfftnd_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_cfft_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_drfft_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_zfftnd_cache in _fftpackmodule.o _f2py_rout__fftpack_destroy_zfft_cache in _fftpackmodule.o _int_from_pyobj in _fftpackmodule.o _PyInit__fftpack in _fftpackmodule.o _f2py_rout__fftpack_dct3 in _fftpackmodule.o _f2py_rout__fftpack_dct3 in _fftpackmodule.o _f2py_rout__fftpack_dct2 in _fftpackmodule.o _f2py_rout__fftpack_dct2 in _fftpackmodule.o _f2py_rout__fftpack_dct1 in _fftpackmodule.o _f2py_rout__fftpack_dct1 in _fftpackmodule.o _f2py_rout__fftpack_ddct3 in _fftpackmodule.o _f2py_rout__fftpack_ddct3 in _fftpackmodule.o _f2py_rout__fftpack_ddct2 in _fftpackmodule.o _f2py_rout__fftpack_ddct2 in _fftpackmodule.o _f2py_rout__fftpack_ddct1 in _fftpackmodule.o _f2py_rout__fftpack_ddct1 in _fftpackmodule.o _f2py_rout__fftpack_crfft in _fftpackmodule.o _f2py_rout__fftpack_crfft in _fftpackmodule.o _f2py_rout__fftpack_rfft in _fftpackmodule.o _f2py_rout__fftpack_rfft in _fftpackmodule.o _f2py_rout__fftpack_cfft in _fftpackmodule.o _f2py_rout__fftpack_cfft in _fftpackmodule.o _f2py_rout__fftpack_zrfft in _fftpackmodule.o _f2py_rout__fftpack_zrfft in _fftpackmodule.o _f2py_rout__fftpack_drfft in _fftpackmodule.o _f2py_rout__fftpack_drfft in _fftpackmodule.o _f2py_rout__fftpack_zfft in _fftpackmodule.o _f2py_rout__fftpack_zfft in _fftpackmodule.o _f2py_rout__fftpack_zfftnd in _fftpackmodule.o _f2py_rout__fftpack_zfftnd in _fftpackmodule.o _f2py_rout__fftpack_zfftnd in _fftpackmodule.o _f2py_rout__fftpack_zfftnd in _fftpackmodule.o _f2py_rout__fftpack_cfftnd in _fftpackmodule.o 
_f2py_rout__fftpack_cfftnd in _fftpackmodule.o _f2py_rout__fftpack_cfftnd in _fftpackmodule.o _f2py_rout__fftpack_cfftnd in _fftpackmodule.o _F2PyDict_SetItemString in fortranobject.o "_PyType_IsSubtype", referenced from: _int_from_pyobj in _fftpackmodule.o _array_from_pyobj in fortranobject.o "_PyDict_GetItemString", referenced from: _fortran_getattr in fortranobject.o "_PyUnicodeUCS2_FromString", referenced from: _PyInit__fftpack in _fftpackmodule.o _fortran_repr in fortranobject.o _fortran_repr in fortranobject.o _fortran_getattr in fortranobject.o _fortran_getattr in fortranobject.o _fortran_getattr in fortranobject.o "__PyObject_New", referenced from: _PyFortranObject_NewAsAttr in fortranobject.o _PyFortranObject_New in fortranobject.o _PyFortranObject_New in fortranobject.o "_PyLong_AsLong", referenced from: _int_from_pyobj in _fftpackmodule.o _int_from_pyobj in _fftpackmodule.o "_PyNumber_Long", referenced from: _int_from_pyobj in _fftpackmodule.o "_PyErr_SetString", referenced from: _int_from_pyobj in _fftpackmodule.o _PyInit__fftpack in _fftpackmodule.o _PyInit__fftpack in _fftpackmodule.o _PyInit__fftpack in _fftpackmodule.o _PyInit__fftpack in _fftpackmodule.o _PyInit__fftpack in _fftpackmodule.o _PyInit__fftpack in _fftpackmodule.o _f2py_rout__fftpack_dct3 in _fftpackmodule.o _f2py_rout__fftpack_dct3 in _fftpackmodule.o _f2py_rout__fftpack_dct2 in _fftpackmodule.o _f2py_rout__fftpack_dct2 in _fftpackmodule.o _f2py_rout__fftpack_dct1 in _fftpackmodule.o _f2py_rout__fftpack_dct1 in _fftpackmodule.o _f2py_rout__fftpack_ddct3 in _fftpackmodule.o _f2py_rout__fftpack_ddct3 in _fftpackmodule.o _f2py_rout__fftpack_ddct2 in _fftpackmodule.o _f2py_rout__fftpack_ddct2 in _fftpackmodule.o _f2py_rout__fftpack_ddct1 in _fftpackmodule.o _f2py_rout__fftpack_ddct1 in _fftpackmodule.o _f2py_rout__fftpack_crfft in _fftpackmodule.o _f2py_rout__fftpack_crfft in _fftpackmodule.o _f2py_rout__fftpack_rfft in _fftpackmodule.o _f2py_rout__fftpack_rfft in _fftpackmodule.o _f2py_rout__fftpack_cfft in _fftpackmodule.o _f2py_rout__fftpack_cfft in _fftpackmodule.o _f2py_rout__fftpack_zrfft in _fftpackmodule.o _f2py_rout__fftpack_zrfft in _fftpackmodule.o _f2py_rout__fftpack_drfft in _fftpackmodule.o _f2py_rout__fftpack_drfft in _fftpackmodule.o _f2py_rout__fftpack_zfft in _fftpackmodule.o _f2py_rout__fftpack_zfft in _fftpackmodule.o _f2py_rout__fftpack_zfftnd in _fftpackmodule.o _f2py_rout__fftpack_zfftnd in _fftpackmodule.o _f2py_rout__fftpack_zfftnd in _fftpackmodule.o _f2py_rout__fftpack_zfftnd in _fftpackmodule.o _f2py_rout__fftpack_zfftnd in _fftpackmodule.o _f2py_rout__fftpack_cfftnd in _fftpackmodule.o _f2py_rout__fftpack_cfftnd in _fftpackmodule.o _f2py_rout__fftpack_cfftnd in _fftpackmodule.o _f2py_rout__fftpack_cfftnd in _fftpackmodule.o _f2py_rout__fftpack_cfftnd in _fftpackmodule.o _array_from_pyobj in fortranobject.o _array_from_pyobj in fortranobject.o _fortran_setattr in fortranobject.o _fortran_setattr in fortranobject.o "_PyUnicodeUCS2_FromFormat", referenced from: _fortran_repr in fortranobject.o "_PyBytes_AsString", referenced from: _array_from_pyobj in fortranobject.o "_PyUnicodeUCS2_Concat", referenced from: _fortran_getattr in fortranobject.o "_PyComplex_Type", referenced from: _int_from_pyobj in _fftpackmodule.o "__Py_NoneStruct", referenced from: _f2py_rout__fftpack_dct3 in _fftpackmodule.o _f2py_rout__fftpack_dct2 in _fftpackmodule.o _f2py_rout__fftpack_dct1 in _fftpackmodule.o _f2py_rout__fftpack_ddct3 in _fftpackmodule.o _f2py_rout__fftpack_ddct2 in _fftpackmodule.o 
_f2py_rout__fftpack_ddct1 in _fftpackmodule.o _f2py_rout__fftpack_crfft in _fftpackmodule.o _f2py_rout__fftpack_rfft in _fftpackmodule.o _f2py_rout__fftpack_cfft in _fftpackmodule.o _f2py_rout__fftpack_zrfft in _fftpackmodule.o _f2py_rout__fftpack_drfft in _fftpackmodule.o _f2py_rout__fftpack_zfft in _fftpackmodule.o _f2py_rout__fftpack_zfftnd in _fftpackmodule.o _f2py_rout__fftpack_cfftnd in _fftpackmodule.o _array_from_pyobj in fortranobject.o _array_from_pyobj in fortranobject.o _fortran_setattr in fortranobject.o _fortran_getattr in fortranobject.o "_PyCapsule_Type", referenced from: _PyInit__fftpack in _fftpackmodule.o _F2PyCapsule_Check in fortranobject.o "_PyExc_ValueError", referenced from: _array_from_pyobj in fortranobject.o "_PyModule_GetDict", referenced from: _PyInit__fftpack in _fftpackmodule.o "_PyErr_Print", referenced from: _PyInit__fftpack in _fftpackmodule.o _F2PyDict_SetItemString in fortranobject.o ld: symbol(s) not found collect2: ld returned 1 exit status error: Command "/usr/local/bin/gfortran -Wall -arch x86_64 build/temp.macosx-10.6-intel-3.2/build/src.macosx-10.6-intel-3.2/scipy/fftpack/_fftpackmodule.o build/temp.macosx-10.6-intel-3.2/scipy/fftpack/src/zfft.o build/temp.macosx-10.6-intel-3.2/scipy/fftpack/src/drfft.o build/temp.macosx-10.6-intel-3.2/scipy/fftpack/src/zrfft.o build/temp.macosx-10.6-intel-3.2/scipy/fftpack/src/zfftnd.o build/temp.macosx-10.6-intel-3.2/build/src.macosx-10.6-intel-3.2/scipy/fftpack/src/dct.o build/temp.macosx-10.6-intel-3.2/build/src.macosx-10.6-intel-3.2/fortranobject.o -L/usr/local/lib/gcc/i686-apple-darwin8/4.2.3/x86_64 -Lbuild/temp.macosx-10.6-intel-3.2 -ldfftpack -lfftpack -lgfortran -o build/lib.macosx-10.6-intel-3.2/scipy/fftpack/_fftpack.so" failed with exit status 1 From elmiller at ece.tufts.edu Thu Jun 30 16:18:56 2011 From: elmiller at ece.tufts.edu (Eric Miller) Date: Thu, 30 Jun 2011 16:18:56 -0400 Subject: [SciPy-User] Error installing SciPy 0.9.0 on MacOS 10.6 with 64-bit python.org Python 3.2 In-Reply-To: <663910E0-E3ED-46B5-84F5-EFB8F6496D13@gmail.com> References: <663910E0-E3ED-46B5-84F5-EFB8F6496D13@gmail.com> Message-ID: <4E0CDA30.6050403@ece.tufts.edu> Just in the last week I have been playing around installing python and all it's scientific tools using Homebrew. See this link as well as this one for some directions. Everything works up to and including scipy. Matplotlib required some extra work as did Mayavi. I can provide details if you decide to go this route. Best Eric On 6/30/11 4:07 PM, Hans-Martin v. Gaudecker wrote: > Hi, > > Installation under the specification above and the recommended gfortran compiler fails on my MacBook Pro (Core2 Duo) with the error message pasted below. This happens either during "python setup.py build", as suggested in the instructions, or during "install" (doing that directly was suggested on an earlier thread). Any pointers would be greatly appreciated. > > FWIW, I installed NumPy just before trying SciPy and all its tests pass. > > Thanks, > Hans-Martin > > > echo $FFLAGS > -arch x86_64 > echo $LDFLAGS > -arch x86_64 > > /usr/local/bin/gfortran -v > Using built-in specs. 
> Target: i686-apple-darwin8
> Configured with: /Builds/unix/gcc/gcc-4.2/configure --prefix=/usr/local --mandir=/share/man --program-transform-name=/^[cg][^.-]*$/s/$/-4.2/ --build=i686-apple-darwin8 --host=i686-apple-darwin8 --target=i686-apple-darwin8 --enable-languages=fortran
> Thread model: posix
> gcc version 4.2.3
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user

-- 
==========================================================
Prof. Eric Miller
Dept. of Electrical and Computer Engineering
Associate Dean of Research, Tufts School of Engineering
Email: elmiller at ece.tufts.edu
Web: http://www.ece.tufts.edu/~elmiller/elmhome/
Phone: 617.627.0835
FAX: 617.627.3220
Ground: Halligan Hall, 161 College Ave., Medford Ma, 02155
==========================================================
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
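A rough sketch of the Homebrew route Eric describes, for readers who cannot follow his links (the HTML part of his message was scrubbed by the list). The formula and package names below are assumptions, not taken from his instructions:

    # Assumed sequence for a Homebrew-based stack; names are illustrative.
    brew install python          # a Homebrew-built Python to install into
    brew install gfortran        # Fortran compiler needed to build SciPy
    pip install numpy            # build and install NumPy first
    pip install scipy            # then build SciPy against it

Matplotlib and Mayavi needed extra steps in Eric's account, so expect some manual work beyond this.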
From hmgaudecker at gmail.com  Thu Jun 30 16:51:47 2011
From: hmgaudecker at gmail.com (Hans-Martin v. Gaudecker)
Date: Thu, 30 Jun 2011 15:51:47 -0500
Subject: [SciPy-User] Error installing SciPy 0.9.0 on MacOS 10.6 with 64-bit python.org Python 3.2
In-Reply-To: <4E0CDA30.6050403@ece.tufts.edu>
References: <663910E0-E3ED-46B5-84F5-EFB8F6496D13@gmail.com> <4E0CDA30.6050403@ece.tufts.edu>
Message-ID: <2890CA7D-6CEB-4DCD-BE4C-0B57FFC0C162@gmail.com>

Classic -- just after hitting the "send" button I discovered that unless I was using SCons, I was *NOT* supposed to set those flags at all. Without them, it worked fine... Two tests don't pass; see below in case it points to anything useful.

Sorry for any extra work this caused, and thanks anyhow, Eric -- I didn't know about Homebrew before this, it looks great!

Best
Hans-Martin


======================================================================
ERROR: Failure: ImportError (No module named c_spec)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/nose-1.0.0-py3.2.egg/nose/failure.py", line 37, in runTest
    raise self.exc_class(self.exc_val).with_traceback(self.tb)
  File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/nose-1.0.0-py3.2.egg/nose/loader.py", line 390, in loadTestsFromName
    addr.filename, addr.module)
  File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/nose-1.0.0-py3.2.egg/nose/importer.py", line 39, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/nose-1.0.0-py3.2.egg/nose/importer.py", line 86, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/scipy/weave/__init__.py", line 13, in <module>
    from .inline_tools import inline
  File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/scipy/weave/inline_tools.py", line 5, in <module>
    from . import ext_tools
  File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/scipy/weave/ext_tools.py", line 7, in <module>
    from . import converters
  File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/scipy/weave/converters.py", line 5, in <module>
    from . import c_spec
  File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/scipy/weave/c_spec.py", line 380, in <module>
    import os, c_spec # yes, I import myself to find out my __file__ location.
ImportError: No module named c_spec

======================================================================
FAIL: test_expon (test_morestats.TestAnderson)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/numpy/testing/utils.py", line 582, in chk_same_position
    assert_array_equal(x_id, y_id)
  File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/numpy/testing/utils.py", line 707, in assert_array_equal
    verbose=verbose, header='Arrays are not equal')
  File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/numpy/testing/utils.py", line 636, in assert_array_compare
    raise AssertionError(msg)
AssertionError:
Arrays are not equal

(mismatch 100.0%)
 x: array([False, False, False, False], dtype=bool)
 y: array(True, dtype=bool)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/scipy/stats/tests/test_morestats.py", line 72, in test_expon
    assert_array_less(crit[:-1], A)
  File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/numpy/testing/utils.py", line 869, in assert_array_less
    header='Arrays are not less-ordered')
  File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/numpy/testing/utils.py", line 613, in assert_array_compare
    chk_same_position(x_id, y_id, hasval='inf')
  File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/numpy/testing/utils.py", line 588, in chk_same_position
    raise AssertionError(msg)
AssertionError:
Arrays are not less-ordered

x and y inf location mismatch:
 x: array([ 0.911, 1.065, 1.325, 1.587])
 y: array(inf)

----------------------------------------------------------------------
Ran 4587 tests in 75.319s


On 30 Jun 2011, at 15:18, Eric Miller wrote:

> Just in the last week I have been playing around installing python and all
> its scientific tools using Homebrew. See this link as well as this one for
> some directions. Everything works up to and including scipy. Matplotlib
> required some extra work as did Mayavi. I can provide details if you decide
> to go this route.
>
> Best
>
> Eric
_fftpackmodule.o >> _PyInit__fftpack in _fftpackmodule.o >> _f2py_rout__fftpack_dct3 in _fftpackmodule.o >> _f2py_rout__fftpack_dct3 in _fftpackmodule.o >> _f2py_rout__fftpack_dct2 in _fftpackmodule.o >> _f2py_rout__fftpack_dct2 in _fftpackmodule.o >> _f2py_rout__fftpack_dct1 in _fftpackmodule.o >> _f2py_rout__fftpack_dct1 in _fftpackmodule.o >> _f2py_rout__fftpack_ddct3 in _fftpackmodule.o >> _f2py_rout__fftpack_ddct3 in _fftpackmodule.o >> _f2py_rout__fftpack_ddct2 in _fftpackmodule.o >> _f2py_rout__fftpack_ddct2 in _fftpackmodule.o >> _f2py_rout__fftpack_ddct1 in _fftpackmodule.o >> _f2py_rout__fftpack_ddct1 in _fftpackmodule.o >> _f2py_rout__fftpack_crfft in _fftpackmodule.o >> _f2py_rout__fftpack_crfft in _fftpackmodule.o >> _f2py_rout__fftpack_rfft in _fftpackmodule.o >> _f2py_rout__fftpack_rfft in _fftpackmodule.o >> _f2py_rout__fftpack_cfft in _fftpackmodule.o >> _f2py_rout__fftpack_cfft in _fftpackmodule.o >> _f2py_rout__fftpack_zrfft in _fftpackmodule.o >> _f2py_rout__fftpack_zrfft in _fftpackmodule.o >> _f2py_rout__fftpack_drfft in _fftpackmodule.o >> _f2py_rout__fftpack_drfft in _fftpackmodule.o >> _f2py_rout__fftpack_zfft in _fftpackmodule.o >> _f2py_rout__fftpack_zfft in _fftpackmodule.o >> _f2py_rout__fftpack_zfftnd in _fftpackmodule.o >> _f2py_rout__fftpack_zfftnd in _fftpackmodule.o >> _f2py_rout__fftpack_zfftnd in _fftpackmodule.o >> _f2py_rout__fftpack_zfftnd in _fftpackmodule.o >> _f2py_rout__fftpack_zfftnd in _fftpackmodule.o >> _f2py_rout__fftpack_cfftnd in _fftpackmodule.o >> _f2py_rout__fftpack_cfftnd in _fftpackmodule.o >> _f2py_rout__fftpack_cfftnd in _fftpackmodule.o >> _f2py_rout__fftpack_cfftnd in _fftpackmodule.o >> _f2py_rout__fftpack_cfftnd in _fftpackmodule.o >> _array_from_pyobj in fortranobject.o >> _array_from_pyobj in fortranobject.o >> _fortran_setattr in fortranobject.o >> _fortran_setattr in fortranobject.o >> "_PyUnicodeUCS2_FromFormat", referenced from: >> _fortran_repr in fortranobject.o >> "_PyBytes_AsString", referenced from: >> _array_from_pyobj in fortranobject.o >> "_PyUnicodeUCS2_Concat", referenced from: >> _fortran_getattr in fortranobject.o >> "_PyComplex_Type", referenced from: >> _int_from_pyobj in _fftpackmodule.o >> "__Py_NoneStruct", referenced from: >> _f2py_rout__fftpack_dct3 in _fftpackmodule.o >> _f2py_rout__fftpack_dct2 in _fftpackmodule.o >> _f2py_rout__fftpack_dct1 in _fftpackmodule.o >> _f2py_rout__fftpack_ddct3 in _fftpackmodule.o >> _f2py_rout__fftpack_ddct2 in _fftpackmodule.o >> _f2py_rout__fftpack_ddct1 in _fftpackmodule.o >> _f2py_rout__fftpack_crfft in _fftpackmodule.o >> _f2py_rout__fftpack_rfft in _fftpackmodule.o >> _f2py_rout__fftpack_cfft in _fftpackmodule.o >> _f2py_rout__fftpack_zrfft in _fftpackmodule.o >> _f2py_rout__fftpack_drfft in _fftpackmodule.o >> _f2py_rout__fftpack_zfft in _fftpackmodule.o >> _f2py_rout__fftpack_zfftnd in _fftpackmodule.o >> _f2py_rout__fftpack_cfftnd in _fftpackmodule.o >> _array_from_pyobj in fortranobject.o >> _array_from_pyobj in fortranobject.o >> _fortran_setattr in fortranobject.o >> _fortran_getattr in fortranobject.o >> "_PyCapsule_Type", referenced from: >> _PyInit__fftpack in _fftpackmodule.o >> _F2PyCapsule_Check in fortranobject.o >> "_PyExc_ValueError", referenced from: >> _array_from_pyobj in fortranobject.o >> "_PyModule_GetDict", referenced from: >> _PyInit__fftpack in _fftpackmodule.o >> "_PyErr_Print", referenced from: >> _PyInit__fftpack in _fftpackmodule.o >> _F2PyDict_SetItemString in fortranobject.o >> ld: symbol(s) not found >> collect2: ld returned 1 exit 
status
>> [... second, identical copy of the "Undefined symbols" listing snipped; same symbols as in the listing above ...]
>> error: Command "/usr/local/bin/gfortran -Wall -arch x86_64 build/temp.macosx-10.6-intel-3.2/build/src.macosx-10.6-intel-3.2/scipy/fftpack/_fftpackmodule.o build/temp.macosx-10.6-intel-3.2/scipy/fftpack/src/zfft.o build/temp.macosx-10.6-intel-3.2/scipy/fftpack/src/drfft.o build/temp.macosx-10.6-intel-3.2/scipy/fftpack/src/zrfft.o build/temp.macosx-10.6-intel-3.2/scipy/fftpack/src/zfftnd.o build/temp.macosx-10.6-intel-3.2/build/src.macosx-10.6-intel-3.2/scipy/fftpack/src/dct.o build/temp.macosx-10.6-intel-3.2/build/src.macosx-10.6-intel-3.2/fortranobject.o -L/usr/local/lib/gcc/i686-apple-darwin8/4.2.3/x86_64 -Lbuild/temp.macosx-10.6-intel-3.2 -ldfftpack -lfftpack -lgfortran -o build/lib.macosx-10.6-intel-3.2/scipy/fftpack/_fftpack.so" failed with exit status 1
>>
>>
>>
>> _______________________________________________
>> SciPy-User mailing list
>>
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>
> --
> ==========================================================
> Prof. Eric Miller
> Dept.
of Electrical and Computer Engineering > Associate Dean of Research, Tufts School of Engineering > > Email: > elmiller at ece.tufts.edu > > Web: > http://www.ece.tufts.edu/~elmiller/elmhome/ > > Phone: 617.627.0835 > FAX: 617.627.3220 > Ground: Halligan Hall, 161 College Ave., Medford Ma, 02155 > ========================================================== > From warren.weckesser at enthought.com Thu Jun 30 17:11:33 2011 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Thu, 30 Jun 2011 16:11:33 -0500 Subject: [SciPy-User] Error installing SciPy 0.9.0 on MacOS 10.6 with 64-bit python.org Python 3.2 In-Reply-To: <2890CA7D-6CEB-4DCD-BE4C-0B57FFC0C162@gmail.com> References: <663910E0-E3ED-46B5-84F5-EFB8F6496D13@gmail.com> <4E0CDA30.6050403@ece.tufts.edu> <2890CA7D-6CEB-4DCD-BE4C-0B57FFC0C162@gmail.com> Message-ID: Hi Hans-Martin, On Thu, Jun 30, 2011 at 3:51 PM, Hans-Martin v. Gaudecker < hmgaudecker at gmail.com> wrote: > Classic -- just after hitting the "send" button I discovered that unless I > was using SCons, I was *NOT* to mingle with the flags. Without them, it > worked fine... Two tests don't pass, see below if it points to anything > useful. > > Sorry for any extra work this caused and thanks anyhow, Eric -- didn't know > about homebrew before this, looks great! > > Best > Hans-Martin > > > ====================================================================== > ERROR: Failure: ImportError (No module named c_spec) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/nose-1.0.0-py3.2.egg/nose/failure.py", > line 37, in runTest > raise self.exc_class(self.exc_val).with_traceback(self.tb) > File > "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/nose-1.0.0-py3.2.egg/nose/loader.py", > line 390, in loadTestsFromName > addr.filename, addr.module) > File > "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/nose-1.0.0-py3.2.egg/nose/importer.py", > line 39, in importFromPath > return self.importFromDir(dir_path, fqname) > File > "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/nose-1.0.0-py3.2.egg/nose/importer.py", > line 86, in importFromDir > mod = load_module(part_fqname, fh, filename, desc) > File > "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/scipy/weave/__init__.py", > line 13, in > from .inline_tools import inline > File > "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/scipy/weave/inline_tools.py", > line 5, in > from . import ext_tools > File > "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/scipy/weave/ext_tools.py", > line 7, in > from . import converters > File > "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/scipy/weave/converters.py", > line 5, in > from . import c_spec > File > "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/scipy/weave/c_spec.py", > line 380, in > import os, c_spec # yes, I import myself to find out my __file__ > location. > ImportError: No module named c_spec > > That line in c_spec.py was changed in January. What version of scipy are you installing? Warren P.S. c_spec.py is in the weave package, which is the only package in scipy not yet ported to python 3. So even if you install the latest source, you won't get a working weave package. 
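The c_spec ImportError above is an import-semantics problem rather than a build problem: under Python 2 the line "import os, c_spec" inside scipy/weave resolved c_spec through an implicit relative import, while Python 3 performs only absolute imports, so there is no top-level module named c_spec and the import fails. The following is a minimal sketch of the underlying point, not the fix that actually went into scipy; the helper name own_directory is invented for illustration:

    # Hypothetical illustration only -- not the scipy.weave code.
    # A module does not need to import itself to learn its location;
    # its own __file__ attribute already carries that information.
    import os

    def own_directory():
        # Directory containing this module's source file.
        return os.path.dirname(os.path.abspath(__file__))

If a self-reference really were needed inside the package on Python 3, it would have to be an explicit relative import such as "from . import c_spec".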
====================================================================== > FAIL: test_expon (test_morestats.TestAnderson) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/numpy/testing/utils.py", > line 582, in chk_same_position > assert_array_equal(x_id, y_id) > File > "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/numpy/testing/utils.py", > line 707, in assert_array_equal > verbose=verbose, header='Arrays are not equal') > File > "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/numpy/testing/utils.py", > line 636, in assert_array_compare > raise AssertionError(msg) > AssertionError: > Arrays are not equal > > (mismatch 100.0%) > x: array([False, False, False, False], dtype=bool) > y: array(True, dtype=bool) > > During handling of the above exception, another exception occurred: > > Traceback (most recent call last): > File > "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/scipy/stats/tests/test_morestats.py", > line 72, in test_expon > assert_array_less(crit[:-1], A) > File > "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/numpy/testing/utils.py", > line 869, in assert_array_less > header='Arrays are not less-ordered') > File > "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/numpy/testing/utils.py", > line 613, in assert_array_compare > chk_same_position(x_id, y_id, hasval='inf') > File > "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/numpy/testing/utils.py", > line 588, in chk_same_position > raise AssertionError(msg) > AssertionError: > Arrays are not less-ordered > > x and y inf location mismatch: > x: array([ 0.911, 1.065, 1.325, 1.587]) > y: array(inf) > > ---------------------------------------------------------------------- > Ran 4587 tests in 75.319s > > > On 30 Jun 2011, at 15:18, Eric Miller wrote: > > > Just in the last week I have been playing around installing python and > all it's scientific tools using Homebrew. See this link as well as this one > for some directions. Everything works up to and including scipy. > Matplotlib required some extra work as did Mayavi. I can provide details > if you decide to go this route. > > > > Best > > > > Eric > > > > > > On 6/30/11 4:07 PM, Hans-Martin v. Gaudecker wrote: > >> Hi, > >> > >> Installation under the specification above and the recommended gfortran > compiler fails on my MacBook Pro (Core2 Duo) with the error message pasted > below. This happens either during "python setup.py build", as suggested in > the instructions, or during "install" (doing that directly was suggested on > an earlier thread). Any pointers would be greatly appreciated. > >> > >> FWIW, I installed NumPy just before trying SciPy and all its tests pass. > >> > >> Thanks, > >> Hans-Martin > >> > >> > >> echo $FFLAGS > >> -arch x86_64 > >> echo $LDFLAGS > >> -arch x86_64 > >> > >> /usr/local/bin/gfortran -v > >> Using built-in specs. 
> >> Target: i686-apple-darwin8 > >> Configured with: /Builds/unix/gcc/gcc-4.2/configure --prefix=/usr/local > --mandir=/share/man --program-transform-name=/^[cg][^.-]*$/s/$/-4.2/ > --build=i686-apple-darwin8 --host=i686-apple-darwin8 > --target=i686-apple-darwin8 --enable-languages=fortran > >> Thread model: posix > >> gcc version 4.2.3 > >> > >> > >> > >> /usr/local/bin/gfortran -Wall -arch x86_64 > build/temp.macosx-10.6-intel-3.2/build/src.macosx-10.6-intel-3.2/scipy/fftpack/_fftpackmodule.o > build/temp.macosx-10.6-intel-3.2/scipy/fftpack/src/zfft.o > build/temp.macosx-10.6-intel-3.2/scipy/fftpack/src/drfft.o > build/temp.macosx-10.6-intel-3.2/scipy/fftpack/src/zrfft.o > build/temp.macosx-10.6-intel-3.2/scipy/fftpack/src/zfftnd.o > build/temp.macosx-10.6-intel-3.2/build/src.macosx-10.6-intel-3.2/scipy/fftpack/src/dct.o > build/temp.macosx-10.6-intel-3.2/build/src.macosx-10.6-intel-3.2/fortranobject.o > -L/usr/local/lib/gcc/i686-apple-darwin8/4.2.3/x86_64 > -Lbuild/temp.macosx-10.6-intel-3.2 -ldfftpack -lfftpack -lgfortran -o > build/lib.macosx-10.6-intel-3.2/scipy/fftpack/_fftpack.so > >> Undefined symbols: > >> "_Py_BuildValue", referenced from: > >> _f2py_rout__fftpack_destroy_dct1_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_dct2_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_ddct1_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_ddct2_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_rfft_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_cfftnd_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_cfft_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_drfft_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_zfftnd_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_zfft_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_dct3 in _fftpackmodule.o > >> _f2py_rout__fftpack_dct2 in _fftpackmodule.o > >> _f2py_rout__fftpack_dct1 in _fftpackmodule.o > >> _f2py_rout__fftpack_ddct3 in _fftpackmodule.o > >> _f2py_rout__fftpack_ddct2 in _fftpackmodule.o > >> _f2py_rout__fftpack_ddct1 in _fftpackmodule.o > >> _f2py_rout__fftpack_crfft in _fftpackmodule.o > >> _f2py_rout__fftpack_rfft in _fftpackmodule.o > >> _f2py_rout__fftpack_cfft in _fftpackmodule.o > >> _f2py_rout__fftpack_zrfft in _fftpackmodule.o > >> _f2py_rout__fftpack_drfft in _fftpackmodule.o > >> _f2py_rout__fftpack_zfft in _fftpackmodule.o > >> _f2py_rout__fftpack_zfftnd in _fftpackmodule.o > >> _f2py_rout__fftpack_cfftnd in _fftpackmodule.o > >> "_PyExc_RuntimeError", referenced from: > >> _PyInit__fftpack in _fftpackmodule.o > >> _PyInit__fftpack in _fftpackmodule.o > >> _PyInit__fftpack in _fftpackmodule.o > >> _PyInit__fftpack in _fftpackmodule.o > >> _fortran_call in fortranobject.o > >> "_PyExc_ImportError", referenced from: > >> _PyInit__fftpack in _fftpackmodule.o > >> _PyInit__fftpack in _fftpackmodule.o > >> _PyInit__fftpack in _fftpackmodule.o > >> _PyInit__fftpack in _fftpackmodule.o > >> _PyInit__fftpack in _fftpackmodule.o > >> _PyInit__fftpack in _fftpackmodule.o > >> _PyInit__fftpack in _fftpackmodule.o > >> _PyInit__fftpack in _fftpackmodule.o > >> "_MAIN__", referenced from: > >> _main in libgfortranbegin.a(fmain.o) > >> "_PyImport_ImportModule", referenced from: > >> _PyInit__fftpack in _fftpackmodule.o > >> "_PyArg_ParseTupleAndKeywords", referenced from: > >> _f2py_rout__fftpack_destroy_dct1_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_dct2_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_ddct1_cache in 
_fftpackmodule.o > >> _f2py_rout__fftpack_destroy_ddct2_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_rfft_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_cfftnd_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_cfft_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_drfft_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_zfftnd_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_zfft_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_dct3 in _fftpackmodule.o > >> _f2py_rout__fftpack_dct2 in _fftpackmodule.o > >> _f2py_rout__fftpack_dct1 in _fftpackmodule.o > >> _f2py_rout__fftpack_ddct3 in _fftpackmodule.o > >> _f2py_rout__fftpack_ddct2 in _fftpackmodule.o > >> _f2py_rout__fftpack_ddct1 in _fftpackmodule.o > >> _f2py_rout__fftpack_crfft in _fftpackmodule.o > >> _f2py_rout__fftpack_rfft in _fftpackmodule.o > >> _f2py_rout__fftpack_cfft in _fftpackmodule.o > >> _f2py_rout__fftpack_zrfft in _fftpackmodule.o > >> _f2py_rout__fftpack_drfft in _fftpackmodule.o > >> _f2py_rout__fftpack_zfft in _fftpackmodule.o > >> _f2py_rout__fftpack_zfftnd in _fftpackmodule.o > >> _f2py_rout__fftpack_cfftnd in _fftpackmodule.o > >> "_PyType_Type", referenced from: > >> _PyInit__fftpack in _fftpackmodule.o > >> "_PySequence_GetItem", referenced from: > >> _int_from_pyobj in _fftpackmodule.o > >> "_PyMem_Free", referenced from: > >> _fortran_dealloc in fortranobject.o > >> "_PyErr_NewException", referenced from: > >> _PyInit__fftpack in _fftpackmodule.o > >> "_PyObject_Type", referenced from: > >> _array_from_pyobj in fortranobject.o > >> "_PyErr_Clear", referenced from: > >> _int_from_pyobj in _fftpackmodule.o > >> _fortran_repr in fortranobject.o > >> _F2PyCapsule_AsVoidPtr in fortranobject.o > >> _F2PyCapsule_FromVoidPtr in fortranobject.o > >> _F2PyDict_SetItemString in fortranobject.o > >> _fortran_getattr in fortranobject.o > >> "_PyExc_AttributeError", referenced from: > >> _PyInit__fftpack in _fftpackmodule.o > >> _fortran_setattr in fortranobject.o > >> _fortran_setattr in fortranobject.o > >> "_PyDict_SetItemString", referenced from: > >> _PyInit__fftpack in _fftpackmodule.o > >> _PyInit__fftpack in _fftpackmodule.o > >> _PyInit__fftpack in _fftpackmodule.o > >> _F2PyDict_SetItemString in fortranobject.o > >> _fortran_setattr in fortranobject.o > >> _fortran_getattr in fortranobject.o > >> _fortran_getattr in fortranobject.o > >> _PyFortranObject_New in fortranobject.o > >> "_PyErr_Format", referenced from: > >> _PyInit__fftpack in _fftpackmodule.o > >> _PyInit__fftpack in _fftpackmodule.o > >> _fortran_call in fortranobject.o > >> _fortran_call in fortranobject.o > >> "_PyObject_GenericGetAttr", referenced from: > >> _fortran_getattr in fortranobject.o > >> "_PyModule_Create2", referenced from: > >> _PyInit__fftpack in _fftpackmodule.o > >> "_PyObject_Str", referenced from: > >> _array_from_pyobj in fortranobject.o > >> "_PySequence_Check", referenced from: > >> _int_from_pyobj in _fftpackmodule.o > >> "_PyObject_GetAttrString", referenced from: > >> _int_from_pyobj in _fftpackmodule.o > >> _PyInit__fftpack in _fftpackmodule.o > >> _fortran_repr in fortranobject.o > >> "_PyExc_TypeError", referenced from: > >> _fortran_call in fortranobject.o > >> _array_from_pyobj in fortranobject.o > >> "_PyCapsule_GetPointer", referenced from: > >> _PyInit__fftpack in _fftpackmodule.o > >> _F2PyCapsule_AsVoidPtr in fortranobject.o > >> "_PyBytes_FromString", referenced from: > >> _PyInit__fftpack in _fftpackmodule.o > >> "_PyDict_DelItemString", referenced 
from: > >> _fortran_setattr in fortranobject.o > >> "_PyCapsule_New", referenced from: > >> _F2PyCapsule_FromVoidPtr in fortranobject.o > >> _fortran_getattr in fortranobject.o > >> "_PyDict_New", referenced from: > >> _PyFortranObject_NewAsAttr in fortranobject.o > >> _fortran_setattr in fortranobject.o > >> _PyFortranObject_New in fortranobject.o > >> _PyFortranObject_New in fortranobject.o > >> "_PyErr_Occurred", referenced from: > >> _f2py_rout__fftpack_destroy_dct1_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_dct2_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_ddct1_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_ddct2_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_rfft_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_cfftnd_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_cfft_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_drfft_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_zfftnd_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_zfft_cache in _fftpackmodule.o > >> _int_from_pyobj in _fftpackmodule.o > >> _PyInit__fftpack in _fftpackmodule.o > >> _f2py_rout__fftpack_dct3 in _fftpackmodule.o > >> _f2py_rout__fftpack_dct3 in _fftpackmodule.o > >> _f2py_rout__fftpack_dct2 in _fftpackmodule.o > >> _f2py_rout__fftpack_dct2 in _fftpackmodule.o > >> _f2py_rout__fftpack_dct1 in _fftpackmodule.o > >> _f2py_rout__fftpack_dct1 in _fftpackmodule.o > >> _f2py_rout__fftpack_ddct3 in _fftpackmodule.o > >> _f2py_rout__fftpack_ddct3 in _fftpackmodule.o > >> _f2py_rout__fftpack_ddct2 in _fftpackmodule.o > >> _f2py_rout__fftpack_ddct2 in _fftpackmodule.o > >> _f2py_rout__fftpack_ddct1 in _fftpackmodule.o > >> _f2py_rout__fftpack_ddct1 in _fftpackmodule.o > >> _f2py_rout__fftpack_crfft in _fftpackmodule.o > >> _f2py_rout__fftpack_crfft in _fftpackmodule.o > >> _f2py_rout__fftpack_rfft in _fftpackmodule.o > >> _f2py_rout__fftpack_rfft in _fftpackmodule.o > >> _f2py_rout__fftpack_cfft in _fftpackmodule.o > >> _f2py_rout__fftpack_cfft in _fftpackmodule.o > >> _f2py_rout__fftpack_zrfft in _fftpackmodule.o > >> _f2py_rout__fftpack_zrfft in _fftpackmodule.o > >> _f2py_rout__fftpack_drfft in _fftpackmodule.o > >> _f2py_rout__fftpack_drfft in _fftpackmodule.o > >> _f2py_rout__fftpack_zfft in _fftpackmodule.o > >> _f2py_rout__fftpack_zfft in _fftpackmodule.o > >> _f2py_rout__fftpack_zfftnd in _fftpackmodule.o > >> _f2py_rout__fftpack_zfftnd in _fftpackmodule.o > >> _f2py_rout__fftpack_zfftnd in _fftpackmodule.o > >> _f2py_rout__fftpack_zfftnd in _fftpackmodule.o > >> _f2py_rout__fftpack_cfftnd in _fftpackmodule.o > >> _f2py_rout__fftpack_cfftnd in _fftpackmodule.o > >> _f2py_rout__fftpack_cfftnd in _fftpackmodule.o > >> _f2py_rout__fftpack_cfftnd in _fftpackmodule.o > >> _F2PyDict_SetItemString in fortranobject.o > >> "_PyType_IsSubtype", referenced from: > >> _int_from_pyobj in _fftpackmodule.o > >> _array_from_pyobj in fortranobject.o > >> "_PyDict_GetItemString", referenced from: > >> _fortran_getattr in fortranobject.o > >> "_PyUnicodeUCS2_FromString", referenced from: > >> _PyInit__fftpack in _fftpackmodule.o > >> _fortran_repr in fortranobject.o > >> _fortran_repr in fortranobject.o > >> _fortran_getattr in fortranobject.o > >> _fortran_getattr in fortranobject.o > >> _fortran_getattr in fortranobject.o > >> "__PyObject_New", referenced from: > >> _PyFortranObject_NewAsAttr in fortranobject.o > >> _PyFortranObject_New in fortranobject.o > >> _PyFortranObject_New in fortranobject.o > >> "_PyLong_AsLong", 
referenced from: > >> _int_from_pyobj in _fftpackmodule.o > >> _int_from_pyobj in _fftpackmodule.o > >> "_PyNumber_Long", referenced from: > >> _int_from_pyobj in _fftpackmodule.o > >> "_PyErr_SetString", referenced from: > >> _int_from_pyobj in _fftpackmodule.o > >> _PyInit__fftpack in _fftpackmodule.o > >> _PyInit__fftpack in _fftpackmodule.o > >> _PyInit__fftpack in _fftpackmodule.o > >> _PyInit__fftpack in _fftpackmodule.o > >> _PyInit__fftpack in _fftpackmodule.o > >> _PyInit__fftpack in _fftpackmodule.o > >> _f2py_rout__fftpack_dct3 in _fftpackmodule.o > >> _f2py_rout__fftpack_dct3 in _fftpackmodule.o > >> _f2py_rout__fftpack_dct2 in _fftpackmodule.o > >> _f2py_rout__fftpack_dct2 in _fftpackmodule.o > >> _f2py_rout__fftpack_dct1 in _fftpackmodule.o > >> _f2py_rout__fftpack_dct1 in _fftpackmodule.o > >> _f2py_rout__fftpack_ddct3 in _fftpackmodule.o > >> _f2py_rout__fftpack_ddct3 in _fftpackmodule.o > >> _f2py_rout__fftpack_ddct2 in _fftpackmodule.o > >> _f2py_rout__fftpack_ddct2 in _fftpackmodule.o > >> _f2py_rout__fftpack_ddct1 in _fftpackmodule.o > >> _f2py_rout__fftpack_ddct1 in _fftpackmodule.o > >> _f2py_rout__fftpack_crfft in _fftpackmodule.o > >> _f2py_rout__fftpack_crfft in _fftpackmodule.o > >> _f2py_rout__fftpack_rfft in _fftpackmodule.o > >> _f2py_rout__fftpack_rfft in _fftpackmodule.o > >> _f2py_rout__fftpack_cfft in _fftpackmodule.o > >> _f2py_rout__fftpack_cfft in _fftpackmodule.o > >> _f2py_rout__fftpack_zrfft in _fftpackmodule.o > >> _f2py_rout__fftpack_zrfft in _fftpackmodule.o > >> _f2py_rout__fftpack_drfft in _fftpackmodule.o > >> _f2py_rout__fftpack_drfft in _fftpackmodule.o > >> _f2py_rout__fftpack_zfft in _fftpackmodule.o > >> _f2py_rout__fftpack_zfft in _fftpackmodule.o > >> _f2py_rout__fftpack_zfftnd in _fftpackmodule.o > >> _f2py_rout__fftpack_zfftnd in _fftpackmodule.o > >> _f2py_rout__fftpack_zfftnd in _fftpackmodule.o > >> _f2py_rout__fftpack_zfftnd in _fftpackmodule.o > >> _f2py_rout__fftpack_zfftnd in _fftpackmodule.o > >> _f2py_rout__fftpack_cfftnd in _fftpackmodule.o > >> _f2py_rout__fftpack_cfftnd in _fftpackmodule.o > >> _f2py_rout__fftpack_cfftnd in _fftpackmodule.o > >> _f2py_rout__fftpack_cfftnd in _fftpackmodule.o > >> _f2py_rout__fftpack_cfftnd in _fftpackmodule.o > >> _array_from_pyobj in fortranobject.o > >> _array_from_pyobj in fortranobject.o > >> _fortran_setattr in fortranobject.o > >> _fortran_setattr in fortranobject.o > >> "_PyUnicodeUCS2_FromFormat", referenced from: > >> _fortran_repr in fortranobject.o > >> "_PyBytes_AsString", referenced from: > >> _array_from_pyobj in fortranobject.o > >> "_PyUnicodeUCS2_Concat", referenced from: > >> _fortran_getattr in fortranobject.o > >> "_PyComplex_Type", referenced from: > >> _int_from_pyobj in _fftpackmodule.o > >> "__Py_NoneStruct", referenced from: > >> _f2py_rout__fftpack_dct3 in _fftpackmodule.o > >> _f2py_rout__fftpack_dct2 in _fftpackmodule.o > >> _f2py_rout__fftpack_dct1 in _fftpackmodule.o > >> _f2py_rout__fftpack_ddct3 in _fftpackmodule.o > >> _f2py_rout__fftpack_ddct2 in _fftpackmodule.o > >> _f2py_rout__fftpack_ddct1 in _fftpackmodule.o > >> _f2py_rout__fftpack_crfft in _fftpackmodule.o > >> _f2py_rout__fftpack_rfft in _fftpackmodule.o > >> _f2py_rout__fftpack_cfft in _fftpackmodule.o > >> _f2py_rout__fftpack_zrfft in _fftpackmodule.o > >> _f2py_rout__fftpack_drfft in _fftpackmodule.o > >> _f2py_rout__fftpack_zfft in _fftpackmodule.o > >> _f2py_rout__fftpack_zfftnd in _fftpackmodule.o > >> _f2py_rout__fftpack_cfftnd in _fftpackmodule.o > >> _array_from_pyobj in fortranobject.o > >> 
_array_from_pyobj in fortranobject.o > >> _fortran_setattr in fortranobject.o > >> _fortran_getattr in fortranobject.o > >> "_PyCapsule_Type", referenced from: > >> _PyInit__fftpack in _fftpackmodule.o > >> _F2PyCapsule_Check in fortranobject.o > >> "_PyExc_ValueError", referenced from: > >> _array_from_pyobj in fortranobject.o > >> "_PyModule_GetDict", referenced from: > >> _PyInit__fftpack in _fftpackmodule.o > >> "_PyErr_Print", referenced from: > >> _PyInit__fftpack in _fftpackmodule.o > >> _F2PyDict_SetItemString in fortranobject.o > >> ld: symbol(s) not found > >> collect2: ld returned 1 exit status > >> Undefined symbols: > >> "_Py_BuildValue", referenced from: > >> _f2py_rout__fftpack_destroy_dct1_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_dct2_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_ddct1_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_ddct2_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_rfft_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_cfftnd_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_cfft_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_drfft_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_zfftnd_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_zfft_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_dct3 in _fftpackmodule.o > >> _f2py_rout__fftpack_dct2 in _fftpackmodule.o > >> _f2py_rout__fftpack_dct1 in _fftpackmodule.o > >> _f2py_rout__fftpack_ddct3 in _fftpackmodule.o > >> _f2py_rout__fftpack_ddct2 in _fftpackmodule.o > >> _f2py_rout__fftpack_ddct1 in _fftpackmodule.o > >> _f2py_rout__fftpack_crfft in _fftpackmodule.o > >> _f2py_rout__fftpack_rfft in _fftpackmodule.o > >> _f2py_rout__fftpack_cfft in _fftpackmodule.o > >> _f2py_rout__fftpack_zrfft in _fftpackmodule.o > >> _f2py_rout__fftpack_drfft in _fftpackmodule.o > >> _f2py_rout__fftpack_zfft in _fftpackmodule.o > >> _f2py_rout__fftpack_zfftnd in _fftpackmodule.o > >> _f2py_rout__fftpack_cfftnd in _fftpackmodule.o > >> "_PyExc_RuntimeError", referenced from: > >> _PyInit__fftpack in _fftpackmodule.o > >> _PyInit__fftpack in _fftpackmodule.o > >> _PyInit__fftpack in _fftpackmodule.o > >> _PyInit__fftpack in _fftpackmodule.o > >> _fortran_call in fortranobject.o > >> "_PyExc_ImportError", referenced from: > >> _PyInit__fftpack in _fftpackmodule.o > >> _PyInit__fftpack in _fftpackmodule.o > >> _PyInit__fftpack in _fftpackmodule.o > >> _PyInit__fftpack in _fftpackmodule.o > >> _PyInit__fftpack in _fftpackmodule.o > >> _PyInit__fftpack in _fftpackmodule.o > >> _PyInit__fftpack in _fftpackmodule.o > >> _PyInit__fftpack in _fftpackmodule.o > >> "_MAIN__", referenced from: > >> _main in libgfortranbegin.a(fmain.o) > >> "_PyImport_ImportModule", referenced from: > >> _PyInit__fftpack in _fftpackmodule.o > >> "_PyArg_ParseTupleAndKeywords", referenced from: > >> _f2py_rout__fftpack_destroy_dct1_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_dct2_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_ddct1_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_ddct2_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_rfft_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_cfftnd_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_cfft_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_drfft_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_zfftnd_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_zfft_cache in _fftpackmodule.o > >> 
_f2py_rout__fftpack_dct3 in _fftpackmodule.o > >> _f2py_rout__fftpack_dct2 in _fftpackmodule.o > >> _f2py_rout__fftpack_dct1 in _fftpackmodule.o > >> _f2py_rout__fftpack_ddct3 in _fftpackmodule.o > >> _f2py_rout__fftpack_ddct2 in _fftpackmodule.o > >> _f2py_rout__fftpack_ddct1 in _fftpackmodule.o > >> _f2py_rout__fftpack_crfft in _fftpackmodule.o > >> _f2py_rout__fftpack_rfft in _fftpackmodule.o > >> _f2py_rout__fftpack_cfft in _fftpackmodule.o > >> _f2py_rout__fftpack_zrfft in _fftpackmodule.o > >> _f2py_rout__fftpack_drfft in _fftpackmodule.o > >> _f2py_rout__fftpack_zfft in _fftpackmodule.o > >> _f2py_rout__fftpack_zfftnd in _fftpackmodule.o > >> _f2py_rout__fftpack_cfftnd in _fftpackmodule.o > >> "_PyType_Type", referenced from: > >> _PyInit__fftpack in _fftpackmodule.o > >> "_PySequence_GetItem", referenced from: > >> _int_from_pyobj in _fftpackmodule.o > >> "_PyMem_Free", referenced from: > >> _fortran_dealloc in fortranobject.o > >> "_PyErr_NewException", referenced from: > >> _PyInit__fftpack in _fftpackmodule.o > >> "_PyObject_Type", referenced from: > >> _array_from_pyobj in fortranobject.o > >> "_PyErr_Clear", referenced from: > >> _int_from_pyobj in _fftpackmodule.o > >> _fortran_repr in fortranobject.o > >> _F2PyCapsule_AsVoidPtr in fortranobject.o > >> _F2PyCapsule_FromVoidPtr in fortranobject.o > >> _F2PyDict_SetItemString in fortranobject.o > >> _fortran_getattr in fortranobject.o > >> "_PyExc_AttributeError", referenced from: > >> _PyInit__fftpack in _fftpackmodule.o > >> _fortran_setattr in fortranobject.o > >> _fortran_setattr in fortranobject.o > >> "_PyDict_SetItemString", referenced from: > >> _PyInit__fftpack in _fftpackmodule.o > >> _PyInit__fftpack in _fftpackmodule.o > >> _PyInit__fftpack in _fftpackmodule.o > >> _F2PyDict_SetItemString in fortranobject.o > >> _fortran_setattr in fortranobject.o > >> _fortran_getattr in fortranobject.o > >> _fortran_getattr in fortranobject.o > >> _PyFortranObject_New in fortranobject.o > >> "_PyErr_Format", referenced from: > >> _PyInit__fftpack in _fftpackmodule.o > >> _PyInit__fftpack in _fftpackmodule.o > >> _fortran_call in fortranobject.o > >> _fortran_call in fortranobject.o > >> "_PyObject_GenericGetAttr", referenced from: > >> _fortran_getattr in fortranobject.o > >> "_PyModule_Create2", referenced from: > >> _PyInit__fftpack in _fftpackmodule.o > >> "_PyObject_Str", referenced from: > >> _array_from_pyobj in fortranobject.o > >> "_PySequence_Check", referenced from: > >> _int_from_pyobj in _fftpackmodule.o > >> "_PyObject_GetAttrString", referenced from: > >> _int_from_pyobj in _fftpackmodule.o > >> _PyInit__fftpack in _fftpackmodule.o > >> _fortran_repr in fortranobject.o > >> "_PyExc_TypeError", referenced from: > >> _fortran_call in fortranobject.o > >> _array_from_pyobj in fortranobject.o > >> "_PyCapsule_GetPointer", referenced from: > >> _PyInit__fftpack in _fftpackmodule.o > >> _F2PyCapsule_AsVoidPtr in fortranobject.o > >> "_PyBytes_FromString", referenced from: > >> _PyInit__fftpack in _fftpackmodule.o > >> "_PyDict_DelItemString", referenced from: > >> _fortran_setattr in fortranobject.o > >> "_PyCapsule_New", referenced from: > >> _F2PyCapsule_FromVoidPtr in fortranobject.o > >> _fortran_getattr in fortranobject.o > >> "_PyDict_New", referenced from: > >> _PyFortranObject_NewAsAttr in fortranobject.o > >> _fortran_setattr in fortranobject.o > >> _PyFortranObject_New in fortranobject.o > >> _PyFortranObject_New in fortranobject.o > >> "_PyErr_Occurred", referenced from: > >> 
_f2py_rout__fftpack_destroy_dct1_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_dct2_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_ddct1_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_ddct2_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_rfft_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_cfftnd_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_cfft_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_drfft_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_zfftnd_cache in _fftpackmodule.o > >> _f2py_rout__fftpack_destroy_zfft_cache in _fftpackmodule.o > >> _int_from_pyobj in _fftpackmodule.o > >> _PyInit__fftpack in _fftpackmodule.o > >> _f2py_rout__fftpack_dct3 in _fftpackmodule.o > >> _f2py_rout__fftpack_dct3 in _fftpackmodule.o > >> _f2py_rout__fftpack_dct2 in _fftpackmodule.o > >> _f2py_rout__fftpack_dct2 in _fftpackmodule.o > >> _f2py_rout__fftpack_dct1 in _fftpackmodule.o > >> _f2py_rout__fftpack_dct1 in _fftpackmodule.o > >> _f2py_rout__fftpack_ddct3 in _fftpackmodule.o > >> _f2py_rout__fftpack_ddct3 in _fftpackmodule.o > >> _f2py_rout__fftpack_ddct2 in _fftpackmodule.o > >> _f2py_rout__fftpack_ddct2 in _fftpackmodule.o > >> _f2py_rout__fftpack_ddct1 in _fftpackmodule.o > >> _f2py_rout__fftpack_ddct1 in _fftpackmodule.o > >> _f2py_rout__fftpack_crfft in _fftpackmodule.o > >> _f2py_rout__fftpack_crfft in _fftpackmodule.o > >> _f2py_rout__fftpack_rfft in _fftpackmodule.o > >> _f2py_rout__fftpack_rfft in _fftpackmodule.o > >> _f2py_rout__fftpack_cfft in _fftpackmodule.o > >> _f2py_rout__fftpack_cfft in _fftpackmodule.o > >> _f2py_rout__fftpack_zrfft in _fftpackmodule.o > >> _f2py_rout__fftpack_zrfft in _fftpackmodule.o > >> _f2py_rout__fftpack_drfft in _fftpackmodule.o > >> _f2py_rout__fftpack_drfft in _fftpackmodule.o > >> _f2py_rout__fftpack_zfft in _fftpackmodule.o > >> _f2py_rout__fftpack_zfft in _fftpackmodule.o > >> _f2py_rout__fftpack_zfftnd in _fftpackmodule.o > >> _f2py_rout__fftpack_zfftnd in _fftpackmodule.o > >> _f2py_rout__fftpack_zfftnd in _fftpackmodule.o > >> _f2py_rout__fftpack_zfftnd in _fftpackmodule.o > >> _f2py_rout__fftpack_cfftnd in _fftpackmodule.o > >> _f2py_rout__fftpack_cfftnd in _fftpackmodule.o > >> _f2py_rout__fftpack_cfftnd in _fftpackmodule.o > >> _f2py_rout__fftpack_cfftnd in _fftpackmodule.o > >> _F2PyDict_SetItemString in fortranobject.o > >> "_PyType_IsSubtype", referenced from: > >> _int_from_pyobj in _fftpackmodule.o > >> _array_from_pyobj in fortranobject.o > >> "_PyDict_GetItemString", referenced from: > >> _fortran_getattr in fortranobject.o > >> "_PyUnicodeUCS2_FromString", referenced from: > >> _PyInit__fftpack in _fftpackmodule.o > >> _fortran_repr in fortranobject.o > >> _fortran_repr in fortranobject.o > >> _fortran_getattr in fortranobject.o > >> _fortran_getattr in fortranobject.o > >> _fortran_getattr in fortranobject.o > >> "__PyObject_New", referenced from: > >> _PyFortranObject_NewAsAttr in fortranobject.o > >> _PyFortranObject_New in fortranobject.o > >> _PyFortranObject_New in fortranobject.o > >> "_PyLong_AsLong", referenced from: > >> _int_from_pyobj in _fftpackmodule.o > >> _int_from_pyobj in _fftpackmodule.o > >> "_PyNumber_Long", referenced from: > >> _int_from_pyobj in _fftpackmodule.o > >> "_PyErr_SetString", referenced from: > >> _int_from_pyobj in _fftpackmodule.o > >> _PyInit__fftpack in _fftpackmodule.o > >> _PyInit__fftpack in _fftpackmodule.o > >> _PyInit__fftpack in _fftpackmodule.o > >> _PyInit__fftpack in _fftpackmodule.o > >> 
_PyInit__fftpack in _fftpackmodule.o > >> _PyInit__fftpack in _fftpackmodule.o > >> _f2py_rout__fftpack_dct3 in _fftpackmodule.o > >> _f2py_rout__fftpack_dct3 in _fftpackmodule.o > >> _f2py_rout__fftpack_dct2 in _fftpackmodule.o > >> _f2py_rout__fftpack_dct2 in _fftpackmodule.o > >> _f2py_rout__fftpack_dct1 in _fftpackmodule.o > >> _f2py_rout__fftpack_dct1 in _fftpackmodule.o > >> _f2py_rout__fftpack_ddct3 in _fftpackmodule.o > >> _f2py_rout__fftpack_ddct3 in _fftpackmodule.o > >> _f2py_rout__fftpack_ddct2 in _fftpackmodule.o > >> _f2py_rout__fftpack_ddct2 in _fftpackmodule.o > >> _f2py_rout__fftpack_ddct1 in _fftpackmodule.o > >> _f2py_rout__fftpack_ddct1 in _fftpackmodule.o > >> _f2py_rout__fftpack_crfft in _fftpackmodule.o > >> _f2py_rout__fftpack_crfft in _fftpackmodule.o > >> _f2py_rout__fftpack_rfft in _fftpackmodule.o > >> _f2py_rout__fftpack_rfft in _fftpackmodule.o > >> _f2py_rout__fftpack_cfft in _fftpackmodule.o > >> _f2py_rout__fftpack_cfft in _fftpackmodule.o > >> _f2py_rout__fftpack_zrfft in _fftpackmodule.o > >> _f2py_rout__fftpack_zrfft in _fftpackmodule.o > >> _f2py_rout__fftpack_drfft in _fftpackmodule.o > >> _f2py_rout__fftpack_drfft in _fftpackmodule.o > >> _f2py_rout__fftpack_zfft in _fftpackmodule.o > >> _f2py_rout__fftpack_zfft in _fftpackmodule.o > >> _f2py_rout__fftpack_zfftnd in _fftpackmodule.o > >> _f2py_rout__fftpack_zfftnd in _fftpackmodule.o > >> _f2py_rout__fftpack_zfftnd in _fftpackmodule.o > >> _f2py_rout__fftpack_zfftnd in _fftpackmodule.o > >> _f2py_rout__fftpack_zfftnd in _fftpackmodule.o > >> _f2py_rout__fftpack_cfftnd in _fftpackmodule.o > >> _f2py_rout__fftpack_cfftnd in _fftpackmodule.o > >> _f2py_rout__fftpack_cfftnd in _fftpackmodule.o > >> _f2py_rout__fftpack_cfftnd in _fftpackmodule.o > >> _f2py_rout__fftpack_cfftnd in _fftpackmodule.o > >> _array_from_pyobj in fortranobject.o > >> _array_from_pyobj in fortranobject.o > >> _fortran_setattr in fortranobject.o > >> _fortran_setattr in fortranobject.o > >> "_PyUnicodeUCS2_FromFormat", referenced from: > >> _fortran_repr in fortranobject.o > >> "_PyBytes_AsString", referenced from: > >> _array_from_pyobj in fortranobject.o > >> "_PyUnicodeUCS2_Concat", referenced from: > >> _fortran_getattr in fortranobject.o > >> "_PyComplex_Type", referenced from: > >> _int_from_pyobj in _fftpackmodule.o > >> "__Py_NoneStruct", referenced from: > >> _f2py_rout__fftpack_dct3 in _fftpackmodule.o > >> _f2py_rout__fftpack_dct2 in _fftpackmodule.o > >> _f2py_rout__fftpack_dct1 in _fftpackmodule.o > >> _f2py_rout__fftpack_ddct3 in _fftpackmodule.o > >> _f2py_rout__fftpack_ddct2 in _fftpackmodule.o > >> _f2py_rout__fftpack_ddct1 in _fftpackmodule.o > >> _f2py_rout__fftpack_crfft in _fftpackmodule.o > >> _f2py_rout__fftpack_rfft in _fftpackmodule.o > >> _f2py_rout__fftpack_cfft in _fftpackmodule.o > >> _f2py_rout__fftpack_zrfft in _fftpackmodule.o > >> _f2py_rout__fftpack_drfft in _fftpackmodule.o > >> _f2py_rout__fftpack_zfft in _fftpackmodule.o > >> _f2py_rout__fftpack_zfftnd in _fftpackmodule.o > >> _f2py_rout__fftpack_cfftnd in _fftpackmodule.o > >> _array_from_pyobj in fortranobject.o > >> _array_from_pyobj in fortranobject.o > >> _fortran_setattr in fortranobject.o > >> _fortran_getattr in fortranobject.o > >> "_PyCapsule_Type", referenced from: > >> _PyInit__fftpack in _fftpackmodule.o > >> _F2PyCapsule_Check in fortranobject.o > >> "_PyExc_ValueError", referenced from: > >> _array_from_pyobj in fortranobject.o > >> "_PyModule_GetDict", referenced from: > >> _PyInit__fftpack in _fftpackmodule.o > >> "_PyErr_Print", 
referenced from: > >> _PyInit__fftpack in _fftpackmodule.o > >> _F2PyDict_SetItemString in fortranobject.o > >> ld: symbol(s) not found > >> collect2: ld returned 1 exit status > >> error: Command "/usr/local/bin/gfortran -Wall -arch x86_64 > build/temp.macosx-10.6-intel-3.2/build/src.macosx-10.6-intel-3.2/scipy/fftpack/_fftpackmodule.o > build/temp.macosx-10.6-intel-3.2/scipy/fftpack/src/zfft.o > build/temp.macosx-10.6-intel-3.2/scipy/fftpack/src/drfft.o > build/temp.macosx-10.6-intel-3.2/scipy/fftpack/src/zrfft.o > build/temp.macosx-10.6-intel-3.2/scipy/fftpack/src/zfftnd.o > build/temp.macosx-10.6-intel-3.2/build/src.macosx-10.6-intel-3.2/scipy/fftpack/src/dct.o > build/temp.macosx-10.6-intel-3.2/build/src.macosx-10.6-intel-3.2/fortranobject.o > -L/usr/local/lib/gcc/i686-apple-darwin8/4.2.3/x86_64 > -Lbuild/temp.macosx-10.6-intel-3.2 -ldfftpack -lfftpack -lgfortran -o > build/lib.macosx-10.6-intel-3.2/scipy/fftpack/_fftpack.so" failed with exit > status 1 > >> > >> > >> > >> _______________________________________________ > >> SciPy-User mailing list > >> > >> SciPy-User at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > > -- > > ========================================================== > > Prof. Eric Miller > > Dept. of Electrical and Computer Engineering > > Associate Dean of Research, Tufts School of Engineering > > > > Email: > > elmiller at ece.tufts.edu > > > > Web: > > http://www.ece.tufts.edu/~elmiller/elmhome/ > > > > Phone: 617.627.0835 > > FAX: 617.627.3220 > > Ground: Halligan Hall, 161 College Ave., Medford Ma, 02155 > > ========================================================== > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at enthought.com Thu Jun 30 17:18:02 2011 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Thu, 30 Jun 2011 16:18:02 -0500 Subject: [SciPy-User] Error installing SciPy 0.9.0 on MacOS 10.6 with 64-bit python.org Python 3.2 In-Reply-To: References: <663910E0-E3ED-46B5-84F5-EFB8F6496D13@gmail.com> <4E0CDA30.6050403@ece.tufts.edu> <2890CA7D-6CEB-4DCD-BE4C-0B57FFC0C162@gmail.com> Message-ID: On Thu, Jun 30, 2011 at 4:11 PM, Warren Weckesser < warren.weckesser at enthought.com> wrote: > Hi Hans-Martin, > > On Thu, Jun 30, 2011 at 3:51 PM, Hans-Martin v. Gaudecker < > hmgaudecker at gmail.com> wrote: > >> Classic -- just after hitting the "send" button I discovered that unless I >> was using SCons, I was *NOT* to mingle with the flags. Without them, it >> worked fine... Two tests don't pass, see below if it points to anything >> useful. >> >> Sorry for any extra work this caused and thanks anyhow, Eric -- didn't >> know about homebrew before this, looks great! 
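For anyone landing here with the same linker failure, a minimal sketch of building "without the flags": clear FFLAGS/LDFLAGS and let distutils pick its own compile/link options. The interpreter name python3.2 and the source directory scipy-0.9.0 are assumptions, not taken from the thread:

    # clear the environment flags so distutils chooses its own options
    unset FFLAGS
    unset LDFLAGS
    cd scipy-0.9.0
    python3.2 setup.py build
    python3.2 setup.py install

Setting FFLAGS/LDFLAGS appears to replace, not extend, the flags numpy.distutils would normally pass, so the link step loses the options that resolve the _Py* symbols shown above.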
>> >> Best >> Hans-Martin >> >> >> ====================================================================== >> ERROR: Failure: ImportError (No module named c_spec) >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> File >> "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/nose-1.0.0-py3.2.egg/nose/failure.py", >> line 37, in runTest >> raise self.exc_class(self.exc_val).with_traceback(self.tb) >> File >> "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/nose-1.0.0-py3.2.egg/nose/loader.py", >> line 390, in loadTestsFromName >> addr.filename, addr.module) >> File >> "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/nose-1.0.0-py3.2.egg/nose/importer.py", >> line 39, in importFromPath >> return self.importFromDir(dir_path, fqname) >> File >> "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/nose-1.0.0-py3.2.egg/nose/importer.py", >> line 86, in importFromDir >> mod = load_module(part_fqname, fh, filename, desc) >> File >> "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/scipy/weave/__init__.py", >> line 13, in >> from .inline_tools import inline >> File >> "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/scipy/weave/inline_tools.py", >> line 5, in >> from . import ext_tools >> File >> "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/scipy/weave/ext_tools.py", >> line 7, in >> from . import converters >> File >> "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/scipy/weave/converters.py", >> line 5, in >> from . import c_spec >> File >> "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/scipy/weave/c_spec.py", >> line 380, in >> import os, c_spec # yes, I import myself to find out my __file__ >> location. >> ImportError: No module named c_spec >> >> > > That line in c_spec.py was changed in January. What version of scipy are > you installing? > Sorry, I just noticed that the version is in the subject! (0.9.0). Warren > Warren > > > P.S. c_spec.py is in the weave package, which is the only package in scipy > not yet ported to python 3. So even if you install the latest source, you > won't get a working weave package. 
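One possible way to keep the expected weave failure out of a Python 3 test run is to exclude the weave tests via nose; a sketch, assuming scipy.test() forwards extra_argv to nose and that nose's --exclude pattern matches the weave subpackage (both are assumptions, not from this thread):

    import scipy

    # run the full suite but skip scipy.weave, which is not ported to Python 3
    scipy.test('full', extra_argv=['--exclude=weave'])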
> > > > ====================================================================== >> FAIL: test_expon (test_morestats.TestAnderson) >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> File >> "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/numpy/testing/utils.py", >> line 582, in chk_same_position >> assert_array_equal(x_id, y_id) >> File >> "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/numpy/testing/utils.py", >> line 707, in assert_array_equal >> verbose=verbose, header='Arrays are not equal') >> File >> "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/numpy/testing/utils.py", >> line 636, in assert_array_compare >> raise AssertionError(msg) >> AssertionError: >> Arrays are not equal >> >> (mismatch 100.0%) >> x: array([False, False, False, False], dtype=bool) >> y: array(True, dtype=bool) >> >> During handling of the above exception, another exception occurred: >> >> Traceback (most recent call last): >> File >> "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/scipy/stats/tests/test_morestats.py", >> line 72, in test_expon >> assert_array_less(crit[:-1], A) >> File >> "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/numpy/testing/utils.py", >> line 869, in assert_array_less >> header='Arrays are not less-ordered') >> File >> "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/numpy/testing/utils.py", >> line 613, in assert_array_compare >> chk_same_position(x_id, y_id, hasval='inf') >> File >> "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/numpy/testing/utils.py", >> line 588, in chk_same_position >> raise AssertionError(msg) >> AssertionError: >> Arrays are not less-ordered >> >> x and y inf location mismatch: >> x: array([ 0.911, 1.065, 1.325, 1.587]) >> y: array(inf) >> >> ---------------------------------------------------------------------- >> Ran 4587 tests in 75.319s >> >> >> On 30 Jun 2011, at 15:18, Eric Miller wrote: >> >> > Just in the last week I have been playing around installing python and >> all it's scientific tools using Homebrew. See this link as well as this one >> for some directions. Everything works up to and including scipy. >> Matplotlib required some extra work as did Mayavi. I can provide details >> if you decide to go this route. >> > >> > Best >> > >> > Eric >> > >> > >> > On 6/30/11 4:07 PM, Hans-Martin v. Gaudecker wrote: >> >> Hi, >> >> >> >> Installation under the specification above and the recommended gfortran >> compiler fails on my MacBook Pro (Core2 Duo) with the error message pasted >> below. This happens either during "python setup.py build", as suggested in >> the instructions, or during "install" (doing that directly was suggested on >> an earlier thread). Any pointers would be greatly appreciated. >> >> >> >> FWIW, I installed NumPy just before trying SciPy and all its tests >> pass. >> >> >> >> Thanks, >> >> Hans-Martin >> >> >> >> >> >> echo $FFLAGS >> >> -arch x86_64 >> >> echo $LDFLAGS >> >> -arch x86_64 >> >> >> >> /usr/local/bin/gfortran -v >> >> Using built-in specs. 
>> >> Target: i686-apple-darwin8 >> >> Configured with: /Builds/unix/gcc/gcc-4.2/configure --prefix=/usr/local >> --mandir=/share/man --program-transform-name=/^[cg][^.-]*$/s/$/-4.2/ >> --build=i686-apple-darwin8 --host=i686-apple-darwin8 >> --target=i686-apple-darwin8 --enable-languages=fortran >> >> Thread model: posix >> >> gcc version 4.2.3 >> >> >> >> >> >> >> >> /usr/local/bin/gfortran -Wall -arch x86_64 >> build/temp.macosx-10.6-intel-3.2/build/src.macosx-10.6-intel-3.2/scipy/fftpack/_fftpackmodule.o >> build/temp.macosx-10.6-intel-3.2/scipy/fftpack/src/zfft.o >> build/temp.macosx-10.6-intel-3.2/scipy/fftpack/src/drfft.o >> build/temp.macosx-10.6-intel-3.2/scipy/fftpack/src/zrfft.o >> build/temp.macosx-10.6-intel-3.2/scipy/fftpack/src/zfftnd.o >> build/temp.macosx-10.6-intel-3.2/build/src.macosx-10.6-intel-3.2/scipy/fftpack/src/dct.o >> build/temp.macosx-10.6-intel-3.2/build/src.macosx-10.6-intel-3.2/fortranobject.o >> -L/usr/local/lib/gcc/i686-apple-darwin8/4.2.3/x86_64 >> -Lbuild/temp.macosx-10.6-intel-3.2 -ldfftpack -lfftpack -lgfortran -o >> build/lib.macosx-10.6-intel-3.2/scipy/fftpack/_fftpack.so >> >> Undefined symbols: >> >> "_Py_BuildValue", referenced from: >> >> _f2py_rout__fftpack_destroy_dct1_cache in _fftpackmodule.o >> >> _f2py_rout__fftpack_destroy_dct2_cache in _fftpackmodule.o >> >> _f2py_rout__fftpack_destroy_ddct1_cache in _fftpackmodule.o >> >> _f2py_rout__fftpack_destroy_ddct2_cache in _fftpackmodule.o >> >> _f2py_rout__fftpack_destroy_rfft_cache in _fftpackmodule.o >> >> _f2py_rout__fftpack_destroy_cfftnd_cache in _fftpackmodule.o >> >> _f2py_rout__fftpack_destroy_cfft_cache in _fftpackmodule.o >> >> _f2py_rout__fftpack_destroy_drfft_cache in _fftpackmodule.o >> >> _f2py_rout__fftpack_destroy_zfftnd_cache in _fftpackmodule.o >> >> _f2py_rout__fftpack_destroy_zfft_cache in _fftpackmodule.o >> >> _f2py_rout__fftpack_dct3 in _fftpackmodule.o >> >> _f2py_rout__fftpack_dct2 in _fftpackmodule.o >> >> _f2py_rout__fftpack_dct1 in _fftpackmodule.o >> >> _f2py_rout__fftpack_ddct3 in _fftpackmodule.o >> >> _f2py_rout__fftpack_ddct2 in _fftpackmodule.o >> >> _f2py_rout__fftpack_ddct1 in _fftpackmodule.o >> >> _f2py_rout__fftpack_crfft in _fftpackmodule.o >> >> _f2py_rout__fftpack_rfft in _fftpackmodule.o >> >> _f2py_rout__fftpack_cfft in _fftpackmodule.o >> >> _f2py_rout__fftpack_zrfft in _fftpackmodule.o >> >> _f2py_rout__fftpack_drfft in _fftpackmodule.o >> >> _f2py_rout__fftpack_zfft in _fftpackmodule.o >> >> _f2py_rout__fftpack_zfftnd in _fftpackmodule.o >> >> _f2py_rout__fftpack_cfftnd in _fftpackmodule.o >> >> "_PyExc_RuntimeError", referenced from: >> >> _PyInit__fftpack in _fftpackmodule.o >> >> _PyInit__fftpack in _fftpackmodule.o >> >> _PyInit__fftpack in _fftpackmodule.o >> >> _PyInit__fftpack in _fftpackmodule.o >> >> _fortran_call in fortranobject.o >> >> "_PyExc_ImportError", referenced from: >> >> _PyInit__fftpack in _fftpackmodule.o >> >> _PyInit__fftpack in _fftpackmodule.o >> >> _PyInit__fftpack in _fftpackmodule.o >> >> _PyInit__fftpack in _fftpackmodule.o >> >> _PyInit__fftpack in _fftpackmodule.o >> >> _PyInit__fftpack in _fftpackmodule.o >> >> _PyInit__fftpack in _fftpackmodule.o >> >> _PyInit__fftpack in _fftpackmodule.o >> >> "_MAIN__", referenced from: >> >> _main in libgfortranbegin.a(fmain.o) >> >> "_PyImport_ImportModule", referenced from: >> >> _PyInit__fftpack in _fftpackmodule.o >> >> "_PyArg_ParseTupleAndKeywords", referenced from: >> >> _f2py_rout__fftpack_destroy_dct1_cache in _fftpackmodule.o >> >> _f2py_rout__fftpack_destroy_dct2_cache 
in _fftpackmodule.o >> >> _f2py_rout__fftpack_destroy_ddct1_cache in _fftpackmodule.o >> >> _f2py_rout__fftpack_destroy_ddct2_cache in _fftpackmodule.o >> >> _f2py_rout__fftpack_destroy_rfft_cache in _fftpackmodule.o >> >> _f2py_rout__fftpack_destroy_cfftnd_cache in _fftpackmodule.o >> >> _f2py_rout__fftpack_destroy_cfft_cache in _fftpackmodule.o >> >> _f2py_rout__fftpack_destroy_drfft_cache in _fftpackmodule.o >> >> _f2py_rout__fftpack_destroy_zfftnd_cache in _fftpackmodule.o >> >> _f2py_rout__fftpack_destroy_zfft_cache in _fftpackmodule.o >> >> _f2py_rout__fftpack_dct3 in _fftpackmodule.o >> >> _f2py_rout__fftpack_dct2 in _fftpackmodule.o >> >> _f2py_rout__fftpack_dct1 in _fftpackmodule.o >> >> _f2py_rout__fftpack_ddct3 in _fftpackmodule.o >> >> _f2py_rout__fftpack_ddct2 in _fftpackmodule.o >> >> _f2py_rout__fftpack_ddct1 in _fftpackmodule.o >> >> _f2py_rout__fftpack_crfft in _fftpackmodule.o >> >> _f2py_rout__fftpack_rfft in _fftpackmodule.o >> >> _f2py_rout__fftpack_cfft in _fftpackmodule.o >> >> _f2py_rout__fftpack_zrfft in _fftpackmodule.o >> >> _f2py_rout__fftpack_drfft in _fftpackmodule.o >> >> _f2py_rout__fftpack_zfft in _fftpackmodule.o >> >> _f2py_rout__fftpack_zfftnd in _fftpackmodule.o >> >> _f2py_rout__fftpack_cfftnd in _fftpackmodule.o >> >> "_PyType_Type", referenced from: >> >> _PyInit__fftpack in _fftpackmodule.o >> >> "_PySequence_GetItem", referenced from: >> >> _int_from_pyobj in _fftpackmodule.o >> >> "_PyMem_Free", referenced from: >> >> _fortran_dealloc in fortranobject.o >> >> "_PyErr_NewException", referenced from: >> >> _PyInit__fftpack in _fftpackmodule.o >> >> "_PyObject_Type", referenced from: >> >> _array_from_pyobj in fortranobject.o >> >> "_PyErr_Clear", referenced from: >> >> _int_from_pyobj in _fftpackmodule.o >> >> _fortran_repr in fortranobject.o >> >> _F2PyCapsule_AsVoidPtr in fortranobject.o >> >> _F2PyCapsule_FromVoidPtr in fortranobject.o >> >> _F2PyDict_SetItemString in fortranobject.o >> >> _fortran_getattr in fortranobject.o >> >> "_PyExc_AttributeError", referenced from: >> >> _PyInit__fftpack in _fftpackmodule.o >> >> _fortran_setattr in fortranobject.o >> >> _fortran_setattr in fortranobject.o >> >> "_PyDict_SetItemString", referenced from: >> >> _PyInit__fftpack in _fftpackmodule.o >> >> _PyInit__fftpack in _fftpackmodule.o >> >> _PyInit__fftpack in _fftpackmodule.o >> >> _F2PyDict_SetItemString in fortranobject.o >> >> _fortran_setattr in fortranobject.o >> >> _fortran_getattr in fortranobject.o >> >> _fortran_getattr in fortranobject.o >> >> _PyFortranObject_New in fortranobject.o >> >> "_PyErr_Format", referenced from: >> >> _PyInit__fftpack in _fftpackmodule.o >> >> _PyInit__fftpack in _fftpackmodule.o >> >> _fortran_call in fortranobject.o >> >> _fortran_call in fortranobject.o >> >> "_PyObject_GenericGetAttr", referenced from: >> >> _fortran_getattr in fortranobject.o >> >> "_PyModule_Create2", referenced from: >> >> _PyInit__fftpack in _fftpackmodule.o >> >> "_PyObject_Str", referenced from: >> >> _array_from_pyobj in fortranobject.o >> >> "_PySequence_Check", referenced from: >> >> _int_from_pyobj in _fftpackmodule.o >> >> "_PyObject_GetAttrString", referenced from: >> >> _int_from_pyobj in _fftpackmodule.o >> >> _PyInit__fftpack in _fftpackmodule.o >> >> _fortran_repr in fortranobject.o >> >> "_PyExc_TypeError", referenced from: >> >> _fortran_call in fortranobject.o >> >> _array_from_pyobj in fortranobject.o >> >> "_PyCapsule_GetPointer", referenced from: >> >> _PyInit__fftpack in _fftpackmodule.o >> >> _F2PyCapsule_AsVoidPtr in 
fortranobject.o >> >> "_PyBytes_FromString", referenced from: >> >> _PyInit__fftpack in _fftpackmodule.o >> >> "_PyDict_DelItemString", referenced from: >> >> _fortran_setattr in fortranobject.o >> >> "_PyCapsule_New", referenced from: >> >> _F2PyCapsule_FromVoidPtr in fortranobject.o >> >> _fortran_getattr in fortranobject.o >> >> "_PyDict_New", referenced from: >> >> _PyFortranObject_NewAsAttr in fortranobject.o >> >> _fortran_setattr in fortranobject.o >> >> _PyFortranObject_New in fortranobject.o >> >> _PyFortranObject_New in fortranobject.o >> >> "_PyErr_Occurred", referenced from: >> >> _f2py_rout__fftpack_destroy_dct1_cache in _fftpackmodule.o >> >> _f2py_rout__fftpack_destroy_dct2_cache in _fftpackmodule.o >> >> _f2py_rout__fftpack_destroy_ddct1_cache in _fftpackmodule.o >> >> _f2py_rout__fftpack_destroy_ddct2_cache in _fftpackmodule.o >> >> _f2py_rout__fftpack_destroy_rfft_cache in _fftpackmodule.o >> >> _f2py_rout__fftpack_destroy_cfftnd_cache in _fftpackmodule.o >> >> _f2py_rout__fftpack_destroy_cfft_cache in _fftpackmodule.o >> >> _f2py_rout__fftpack_destroy_drfft_cache in _fftpackmodule.o >> >> _f2py_rout__fftpack_destroy_zfftnd_cache in _fftpackmodule.o >> >> _f2py_rout__fftpack_destroy_zfft_cache in _fftpackmodule.o >> >> _int_from_pyobj in _fftpackmodule.o >> >> _PyInit__fftpack in _fftpackmodule.o >> >> _f2py_rout__fftpack_dct3 in _fftpackmodule.o >> >> _f2py_rout__fftpack_dct3 in _fftpackmodule.o >> >> _f2py_rout__fftpack_dct2 in _fftpackmodule.o >> >> _f2py_rout__fftpack_dct2 in _fftpackmodule.o >> >> _f2py_rout__fftpack_dct1 in _fftpackmodule.o >> >> _f2py_rout__fftpack_dct1 in _fftpackmodule.o >> >> _f2py_rout__fftpack_ddct3 in _fftpackmodule.o >> >> _f2py_rout__fftpack_ddct3 in _fftpackmodule.o >> >> _f2py_rout__fftpack_ddct2 in _fftpackmodule.o >> >> _f2py_rout__fftpack_ddct2 in _fftpackmodule.o >> >> _f2py_rout__fftpack_ddct1 in _fftpackmodule.o >> >> _f2py_rout__fftpack_ddct1 in _fftpackmodule.o >> >> _f2py_rout__fftpack_crfft in _fftpackmodule.o >> >> _f2py_rout__fftpack_crfft in _fftpackmodule.o >> >> _f2py_rout__fftpack_rfft in _fftpackmodule.o >> >> _f2py_rout__fftpack_rfft in _fftpackmodule.o >> >> _f2py_rout__fftpack_cfft in _fftpackmodule.o >> >> _f2py_rout__fftpack_cfft in _fftpackmodule.o >> >> _f2py_rout__fftpack_zrfft in _fftpackmodule.o >> >> _f2py_rout__fftpack_zrfft in _fftpackmodule.o >> >> _f2py_rout__fftpack_drfft in _fftpackmodule.o >> >> _f2py_rout__fftpack_drfft in _fftpackmodule.o >> >> _f2py_rout__fftpack_zfft in _fftpackmodule.o >> >> _f2py_rout__fftpack_zfft in _fftpackmodule.o >> >> _f2py_rout__fftpack_zfftnd in _fftpackmodule.o >> >> _f2py_rout__fftpack_zfftnd in _fftpackmodule.o >> >> _f2py_rout__fftpack_zfftnd in _fftpackmodule.o >> >> _f2py_rout__fftpack_zfftnd in _fftpackmodule.o >> >> _f2py_rout__fftpack_cfftnd in _fftpackmodule.o >> >> _f2py_rout__fftpack_cfftnd in _fftpackmodule.o >> >> _f2py_rout__fftpack_cfftnd in _fftpackmodule.o >> >> _f2py_rout__fftpack_cfftnd in _fftpackmodule.o >> >> _F2PyDict_SetItemString in fortranobject.o >> >> "_PyType_IsSubtype", referenced from: >> >> _int_from_pyobj in _fftpackmodule.o >> >> _array_from_pyobj in fortranobject.o >> >> "_PyDict_GetItemString", referenced from: >> >> _fortran_getattr in fortranobject.o >> >> "_PyUnicodeUCS2_FromString", referenced from: >> >> _PyInit__fftpack in _fftpackmodule.o >> >> _fortran_repr in fortranobject.o >> >> _fortran_repr in fortranobject.o >> >> _fortran_getattr in fortranobject.o >> >> _fortran_getattr in fortranobject.o >> >> _fortran_getattr in 
fortranobject.o >> >> "__PyObject_New", referenced from: >> >> _PyFortranObject_NewAsAttr in fortranobject.o >> >> _PyFortranObject_New in fortranobject.o >> >> _PyFortranObject_New in fortranobject.o >> >> "_PyLong_AsLong", referenced from: >> >> _int_from_pyobj in _fftpackmodule.o >> >> _int_from_pyobj in _fftpackmodule.o >> >> "_PyNumber_Long", referenced from: >> >> _int_from_pyobj in _fftpackmodule.o >> >> "_PyErr_SetString", referenced from: >> >> _int_from_pyobj in _fftpackmodule.o >> >> _PyInit__fftpack in _fftpackmodule.o >> >> _PyInit__fftpack in _fftpackmodule.o >> >> _PyInit__fftpack in _fftpackmodule.o >> >> _PyInit__fftpack in _fftpackmodule.o >> >> _PyInit__fftpack in _fftpackmodule.o >> >> _PyInit__fftpack in _fftpackmodule.o >> >> _f2py_rout__fftpack_dct3 in _fftpackmodule.o >> >> _f2py_rout__fftpack_dct3 in _fftpackmodule.o >> >> _f2py_rout__fftpack_dct2 in _fftpackmodule.o >> >> _f2py_rout__fftpack_dct2 in _fftpackmodule.o >> >> _f2py_rout__fftpack_dct1 in _fftpackmodule.o >> >> _f2py_rout__fftpack_dct1 in _fftpackmodule.o >> >> _f2py_rout__fftpack_ddct3 in _fftpackmodule.o >> >> _f2py_rout__fftpack_ddct3 in _fftpackmodule.o >> >> _f2py_rout__fftpack_ddct2 in _fftpackmodule.o >> >> _f2py_rout__fftpack_ddct2 in _fftpackmodule.o >> >> _f2py_rout__fftpack_ddct1 in _fftpackmodule.o >> >> _f2py_rout__fftpack_ddct1 in _fftpackmodule.o >> >> _f2py_rout__fftpack_crfft in _fftpackmodule.o >> >> _f2py_rout__fftpack_crfft in _fftpackmodule.o >> >> _f2py_rout__fftpack_rfft in _fftpackmodule.o >> >> _f2py_rout__fftpack_rfft in _fftpackmodule.o >> >> _f2py_rout__fftpack_cfft in _fftpackmodule.o >> >> _f2py_rout__fftpack_cfft in _fftpackmodule.o >> >> _f2py_rout__fftpack_zrfft in _fftpackmodule.o >> >> _f2py_rout__fftpack_zrfft in _fftpackmodule.o >> >> _f2py_rout__fftpack_drfft in _fftpackmodule.o >> >> _f2py_rout__fftpack_drfft in _fftpackmodule.o >> >> _f2py_rout__fftpack_zfft in _fftpackmodule.o >> >> _f2py_rout__fftpack_zfft in _fftpackmodule.o >> >> _f2py_rout__fftpack_zfftnd in _fftpackmodule.o >> >> _f2py_rout__fftpack_zfftnd in _fftpackmodule.o >> >> _f2py_rout__fftpack_zfftnd in _fftpackmodule.o >> >> _f2py_rout__fftpack_zfftnd in _fftpackmodule.o >> >> _f2py_rout__fftpack_zfftnd in _fftpackmodule.o >> >> _f2py_rout__fftpack_cfftnd in _fftpackmodule.o >> >> _f2py_rout__fftpack_cfftnd in _fftpackmodule.o >> >> _f2py_rout__fftpack_cfftnd in _fftpackmodule.o >> >> _f2py_rout__fftpack_cfftnd in _fftpackmodule.o >> >> _f2py_rout__fftpack_cfftnd in _fftpackmodule.o >> >> _array_from_pyobj in fortranobject.o >> >> _array_from_pyobj in fortranobject.o >> >> _fortran_setattr in fortranobject.o >> >> _fortran_setattr in fortranobject.o >> >> "_PyUnicodeUCS2_FromFormat", referenced from: >> >> _fortran_repr in fortranobject.o >> >> "_PyBytes_AsString", referenced from: >> >> _array_from_pyobj in fortranobject.o >> >> "_PyUnicodeUCS2_Concat", referenced from: >> >> _fortran_getattr in fortranobject.o >> >> "_PyComplex_Type", referenced from: >> >> _int_from_pyobj in _fftpackmodule.o >> >> "__Py_NoneStruct", referenced from: >> >> _f2py_rout__fftpack_dct3 in _fftpackmodule.o >> >> _f2py_rout__fftpack_dct2 in _fftpackmodule.o >> >> _f2py_rout__fftpack_dct1 in _fftpackmodule.o >> >> _f2py_rout__fftpack_ddct3 in _fftpackmodule.o >> >> _f2py_rout__fftpack_ddct2 in _fftpackmodule.o >> >> _f2py_rout__fftpack_ddct1 in _fftpackmodule.o >> >> _f2py_rout__fftpack_crfft in _fftpackmodule.o >> >> _f2py_rout__fftpack_rfft in _fftpackmodule.o >> >> _f2py_rout__fftpack_cfft in _fftpackmodule.o >> >> 
_f2py_rout__fftpack_zrfft in _fftpackmodule.o >> >> _f2py_rout__fftpack_drfft in _fftpackmodule.o >> >> _f2py_rout__fftpack_zfft in _fftpackmodule.o >> >> _f2py_rout__fftpack_zfftnd in _fftpackmodule.o >> >> _f2py_rout__fftpack_cfftnd in _fftpackmodule.o >> >> _array_from_pyobj in fortranobject.o >> >> _array_from_pyobj in fortranobject.o >> >> _fortran_setattr in fortranobject.o >> >> _fortran_getattr in fortranobject.o >> >> "_PyCapsule_Type", referenced from: >> >> _PyInit__fftpack in _fftpackmodule.o >> >> _F2PyCapsule_Check in fortranobject.o >> >> "_PyExc_ValueError", referenced from: >> >> _array_from_pyobj in fortranobject.o >> >> "_PyModule_GetDict", referenced from: >> >> _PyInit__fftpack in _fftpackmodule.o >> >> "_PyErr_Print", referenced from: >> >> _PyInit__fftpack in _fftpackmodule.o >> >> _F2PyDict_SetItemString in fortranobject.o >> >> ld: symbol(s) not found >> >> collect2: ld returned 1 exit status >> >> error: Command "/usr/local/bin/gfortran -Wall -arch x86_64 >> build/temp.macosx-10.6-intel-3.2/build/src.macosx-10.6-intel-3.2/scipy/fftpack/_fftpackmodule.o >> build/temp.macosx-10.6-intel-3.2/scipy/fftpack/src/zfft.o >> build/temp.macosx-10.6-intel-3.2/scipy/fftpack/src/drfft.o >> build/temp.macosx-10.6-intel-3.2/scipy/fftpack/src/zrfft.o >> build/temp.macosx-10.6-intel-3.2/scipy/fftpack/src/zfftnd.o >> build/temp.macosx-10.6-intel-3.2/build/src.macosx-10.6-intel-3.2/scipy/fftpack/src/dct.o >> build/temp.macosx-10.6-intel-3.2/build/src.macosx-10.6-intel-3.2/fortranobject.o >> -L/usr/local/lib/gcc/i686-apple-darwin8/4.2.3/x86_64 >> -Lbuild/temp.macosx-10.6-intel-3.2 -ldfftpack -lfftpack -lgfortran -o >> build/lib.macosx-10.6-intel-3.2/scipy/fftpack/_fftpack.so" failed with exit >> status 1 >> >> >> >> >> >> _______________________________________________ >> >> SciPy-User mailing list >> >> >> >> SciPy-User at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> > -- >> > ========================================================== >> > Prof. Eric Miller >> > Dept. of Electrical and Computer Engineering >> > Associate Dean of Research, Tufts School of Engineering >> > >> > Email: >> > elmiller at ece.tufts.edu >> > >> > Web: >> > http://www.ece.tufts.edu/~elmiller/elmhome/ >> > >> > Phone: 617.627.0835 >> > FAX: 617.627.3220 >> > Ground: Halligan Hall, 161 College Ave., Medford Ma, 02155 >> > ========================================================== >> > >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL:
From davidmontgomery at gmail.com Thu Jun 30 19:45:40 2011 From: davidmontgomery at gmail.com (David Montgomery) Date: Fri, 1 Jul 2011 09:45:40 +1000 Subject: [SciPy-User] Time Series using 15 minute intervals using scikits.timeseries Message-ID: Hi, Using scikits timeseries I can create daily and hourly time series....no prob But.... I have time series at 15 minutes intervals...this I dont know how to do... Can a timeseries array handle 15 min intervals? Do I use a minute intervals and use mask arrays for the missing minutes? Also..I can figure out how to create a array at minute intervals. So..what is best practice? Any examples? Thanks st = ts.Date('H', year=ts_start_date.year,month=ts_start_date.month,day=ts_start_date.day,hour=ts_start_hour) ed = ts.Date('H', year=ts_end_date.year,month=ts_end_date.month,day=ts_end_date.day,hour=ts_end_hour) st_beg = st.asfreq('H', relation='START') ed_end = ed.asfreq('H', relation='END')
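As far as I know scikits.timeseries has no built-in quarter-hour frequency, so one option is the minute-frequency workaround described above: store the series at minute frequency and leave the minutes between observations masked. A sketch, assuming the 'MIN' frequency string and the date_array/time_series signatures from the scikits.timeseries docs; the dates and sizes are made up:

    import numpy as np
    import scikits.timeseries as ts

    # one day of minute-frequency dates
    start = ts.Date('MIN', year=2011, month=7, day=1, hour=0, minute=0)
    dates = ts.date_array(start_date=start, length=24 * 60)

    # all minutes masked, then fill in every 15th minute with data
    values = np.ma.masked_all(len(dates))
    values[::15] = np.random.rand(len(values[::15]))

    series = ts.time_series(values, dates=dates)
    print(series[:16])   # first quarter hour: one value, fourteen masked

Whether the masked in-between minutes are acceptable depends on what the series is used for; the alternative is to keep a plain minute-frequency (or even hourly) series and do the 15-minute grouping yourself when needed.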