From ndbecker2 at gmail.com Sun Nov 1 08:26:18 2009
From: ndbecker2 at gmail.com (Neal Becker)
Date: Sun, 01 Nov 2009 08:26:18 -0500
Subject: [SciPy-User] [SciPy-user] least-square filter design
References: <26139404.post@talk.nabble.com>
Message-ID: 

Tom K. wrote:
> 
> 
> Neal Becker wrote:
>> 
>> Anyone have code for least square (minimum mean square error) FIR filter
>> design?
>> 
> 
> Could you be a little more specific?  scipy.signal.firwin almost designs a
> least square low pass FIR filter if you use a rectangular window (I say
> almost because like other packages the filter's response is normalized to
> unity at DC so technically it is not least squares although the difference
> is slight and decreases with increasing filter order).
> 
> Do you need a transition band?  What type of FIR filter: lowpass, highpass,
> bandpass, bandstop, or multiband?  Are discrete samples OK, or do you need a
> continuous band (or set of bands)?  Which type of filter - is symmetric OK,
> or do you need antisymmetric?
> 
> Or, are you talking about an adaptive filter?
> 

I'm looking for something like this:
http://www.mathworks.com/access/helpdesk/help/toolbox/filterdesign/ref/firls.html

From amenity at enthought.com Sun Nov 1 08:59:42 2009
From: amenity at enthought.com (Amenity Applewhite)
Date: Sun, 1 Nov 2009 07:59:42 -0600
Subject: [SciPy-User] November 6 EPD Webinar: How do I... use Envisage for GUIs?
References: <74384495.1256959480096.JavaMail.root@p2-ws608.ad.prodcc.net>
Message-ID: 

Friday, November 6: How do I... use Envisage for GUIs?

Dear Leah,

Envisage is a Python-based framework for building extensible applications.
The Envisage Core and corresponding Envisage Plugins are components of the
Enthought Tool Suite. We've found that Envisage grants us a degree of
immediate functionality in our custom applications and have come to rely on
the framework in much of our development. For November's EPD webinar, Corran
Webster will show how you can hook together existing Envisage plugins to
quickly create a new GUI. We'll also look at how you can easily turn an
existing Traits UI interface into an Envisage plugin.

New: Linux-ready webinars! In order to better serve the Linux-users among our
subscribers, we've decided to begin hosting our EPD webinars on WebEx instead
of GoToMeeting. This means that our original limit of 35 attendees will be
scaled back to 30. As usual, EPD subscribers at a Basic level or above will be
guaranteed seats for the event while the general public may add their name to
the wait list here.

EPD Webinar: How do I... use Envisage for GUIs?
Friday, November 6
1pm CDT/6pm UTC

We look forward to seeing you Friday! As always, feel free to contact us with
questions, concerns, or suggestions for future webinar topics.

Thanks,
The Enthought Team

From tpk at kraussfamily.org Sun Nov 1 15:26:55 2009
From: tpk at kraussfamily.org (Tom K.)
Date: Sun, 1 Nov 2009 12:26:55 -0800 (PST)
Subject: [SciPy-User] [SciPy-user] least-square filter design
In-Reply-To: 
References: <26139404.post@talk.nabble.com>
Message-ID: <26154273.post@talk.nabble.com>

Neal Becker wrote:
> 
> Tom K. wrote:
> 
>> 
>> 
>> Neal Becker wrote:
>>> 
>>> Anyone have code for least square (minimum mean square error) FIR filter
>>> design?
>>> 
> I'm looking for something like this:
> http://www.mathworks.com/access/helpdesk/help/toolbox/filterdesign/ref/firls.html
> 
> 

Here's something that works for odd length, symmetric filters with constant
magnitude per band - it doesn't support everything that MathWorks' firls
supports (e.g. design of even length, differentiators, antisymmetric filters,
sloping bands) but hopefully this meets your need.

import numpy as np
from scipy.special import sinc

def firls(N, f, D=None):
    """Least-squares FIR filter.

    N -- filter length, must be odd
    f -- list of tuples of band edges
         Units of band edges are Hz with 0.5 Hz == Nyquist
         and assumed 1 Hz sampling frequency
    D -- list of desired responses, one per band
    """
    if D is None: D = [1, 0]
    assert len(D) == len(f), "must have one desired response per band"
    assert N%2 == 1, 'filter length must be odd'
    L = (N-1)//2

    k = np.arange(L+1)
    k.shape = (1, L+1)
    j = k.T

    R = 0
    r = 0
    for i, (f0, f1) in enumerate(f):
        R += np.pi*f1*sinc(2*(j-k)*f1) - np.pi*f0*sinc(2*(j-k)*f0) + \
             np.pi*f1*sinc(2*(j+k)*f1) - np.pi*f0*sinc(2*(j+k)*f0)
        r += D[i]*(2*np.pi*f1*sinc(2*j*f1) - 2*np.pi*f0*sinc(2*j*f0))

    a = np.dot(np.linalg.inv(R), r)
    a.shape = (-1,)
    h = np.zeros(N)
    h[:L] = a[:0:-1]/2.
    h[L] = a[0]
    h[L+1:] = a[1:]/2.
    return h

def plot_response(h, name):
    H = np.fft.fft(h, 2000)
    f = np.arange(2000)/2000.
    figure()
    semilogy(f, abs(H))
    grid()
    setp(gca(), xlim=(0, .5))
    xlabel('frequency (Hz)')
    ylabel('magnitude')
    title(name)

if __name__ == '__main__':
    h = firls(31, [(0, .2), (.3, .5)])
    from matplotlib.pyplot import *
    plot_response(h, 'lowpass')

    h = firls(51, [(0, .25), (.35, .5)], [0, 1])
    plot_response(h, 'highpass')

    h = firls(51, [(0, .1), (.2, .3), (.4, .5)], [0, 1, 0])
    plot_response(h, 'bandpass')

    show()

-- 
View this message in context: http://old.nabble.com/least-square-filter-design-tp26083443p26154273.html
Sent from the Scipy-User mailing list archive at Nabble.com.

From arun.gokule at gmail.com Sun Nov 1 15:30:11 2009
From: arun.gokule at gmail.com (Arun Gokule)
Date: Sun, 1 Nov 2009 12:30:11 -0800
Subject: [SciPy-User] scipy.linalg.det TypeError
In-Reply-To: <20091030172409.GA1977@wombat.atmos.colostate.edu>
References: <20091028190701.GA30122@wombat.atmos.colostate.edu>
	<45d1ab480910290158u7d274687t737a27690fd08497@mail.gmail.com>
	<20091030172409.GA1977@wombat.atmos.colostate.edu>
Message-ID: 

Let us know if you need more help.

On Fri, Oct 30, 2009 at 9:24 AM, Norm Wood wrote:
> 
> 
> On 29 Oct., David Goldsmith wrote:
> > Uncertain why you're having a problem - your sample code works for me:
> >
> > >>> import scipy.linalg
> > >>> import numpy.linalg
> > >>> A=np.matrix([[1.1, 1.9],[1.9,3.5]])
> > >>> y = numpy.linalg.det(A); y
> > 0.23999999999999988
> > >>> y = scipy.linalg.det(A); y
> > 0.23999999999999988
> > >>> scipy.__version__
> > '0.7.1'
> > >>> np.__version__
> > '1.3.0rc2'
> > Python 2.5 on Windoze Vista HPE
> >
> 
> Thanks for checking, David.  I'll have to take a closer look at how the
> "get_flinalg_funcs" procedure works, and will probably try
> rebuilding & reinstalling LAPACK, ATLAS, numpy and scipy from scratch.
>
> Norm

From tpk at kraussfamily.org Sun Nov 1 20:28:22 2009
From: tpk at kraussfamily.org (Tom K.)
Date: Sun, 1 Nov 2009 17:28:22 -0800 (PST)
Subject: [SciPy-User] [SciPy-user] least-square filter design
In-Reply-To: <26154273.post@talk.nabble.com>
References: <26139404.post@talk.nabble.com> <26154273.post@talk.nabble.com>
Message-ID: <26155428.post@talk.nabble.com>

Tom K. wrote:
> 
> a = np.dot(np.linalg.inv(R), r)
> 

It occurred to me that "inv" is not the right choice here...
Replace that line with:

    a = np.linalg.solve(R, r)

-- 
View this message in context: http://old.nabble.com/least-square-filter-design-tp26083443p26155428.html
Sent from the Scipy-User mailing list archive at Nabble.com.

From josef.pktd at gmail.com Sun Nov 1 20:45:09 2009
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Sun, 1 Nov 2009 21:45:09 -0400
Subject: [SciPy-User] linear algebra: quadratic forms without linalg.inv
Message-ID: <1cd32cbb0911011745p64a447fjcd14adee13ecb58a@mail.gmail.com>

This is just an exercise. In econometrics (including statsmodels) we have
a lot of quadratic forms that are usually calculate with a matrix inverse.
I finally spent some time to figure out how to do this with a Cholesky or
LU decomposition, which should be numerically more stable or accurate
(and faster).

"Don't let that INV go past your eyes" (matlab file exchange)

Josef

"""Use cholesky or LU decomposition to calculate quadratic forms

different ways to calculate matrix product B.T * inv(A) * B

Note: calling convention in sparse not consistent, sparse requires
loop over right hand side

Author: josef-pktd
"""

import numpy as np
from scipy import linalg

A = np.array([[2., 1.], [1., 3.]])   # symmetric positive definite example matrix
B = np.ones((3,2)).T
B = np.arange(6).reshape((3,2)).T

print 'using inv'
Ainv = linalg.inv(A)
print np.dot(Ainv, B[:,0])
print np.dot(Ainv, B)
print reduce(np.dot, [B.T, Ainv, B])

print 'using cholesky'
F = linalg.cho_factor(A)
print linalg.cho_solve(F, B[:,0])
print linalg.cho_solve(F, B)
print np.dot(B.T, linalg.cho_solve(F, B))

print 'using lu'
F = linalg.lu_factor(A)
print linalg.lu_solve(F, B[:,0])
print linalg.lu_solve(F, B)
print np.dot(B.T, linalg.lu_solve(F, B))

from scipy import sparse
import scipy.sparse.linalg

Asp = sparse.csr_matrix(A)

print 'using sparse symmetric lu'
F = sparse.linalg.splu(Asp.tocsc())
print F.solve(B[:,0])
#print F.solve(B)   # wrong results but no exception
AiB = np.column_stack([F.solve(Bcol) for Bcol in B.T])
print AiB
print np.dot(B.T, AiB)
#not:
#Bsp = sparse.csr_matrix(B)
#print B.T * F.solve(Bsp)   # argument to solve must be dense array

print 'using sparse lu'
F = sparse.linalg.factorized(Asp.tocsc())
print F(B[:,0])
#print F(B)   # wrong results but no exception
AiB = np.column_stack([F(Bcol) for Bcol in B.T])
print np.dot(B.T, AiB)

From sturla at molden.no Sun Nov 1 22:03:14 2009
From: sturla at molden.no (Sturla Molden)
Date: Mon, 02 Nov 2009 04:03:14 +0100
Subject: [SciPy-User] linear algebra: quadratic forms without linalg.inv
In-Reply-To: <1cd32cbb0911011745p64a447fjcd14adee13ecb58a@mail.gmail.com>
References: <1cd32cbb0911011745p64a447fjcd14adee13ecb58a@mail.gmail.com>
Message-ID: <4AEE4BF2.4060403@molden.no>

josef.pktd at gmail.com skrev:
> In econometrics (including statsmodels) we have a lot of quadratic
> forms that are usually calculate with a matrix inverse.

That is a sign of numerical incompetence.
You see this often in statistics as well, people who think matrix inverse
is the way to calculate Mahalanobis distances, when you should really use
a Cholesky.

As for LU, I'd rather use an SVD as it is numerically more stable. Using
LU, you are betting on singular values not being tiny. With SVD you can
solve an ill-conditioned system by zeroing tiny singular values. With LU
you just get astronomic rounding errors.

Sturla

From gael.varoquaux at normalesup.org Sun Nov 1 22:06:52 2009
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Mon, 2 Nov 2009 04:06:52 +0100
Subject: [SciPy-User] linear algebra: quadratic forms without linalg.inv
In-Reply-To: <4AEE4BF2.4060403@molden.no>
References: <1cd32cbb0911011745p64a447fjcd14adee13ecb58a@mail.gmail.com>
	<4AEE4BF2.4060403@molden.no>
Message-ID: <20091102030652.GB27768@phare.normalesup.org>

On Mon, Nov 02, 2009 at 04:03:14AM +0100, Sturla Molden wrote:
> josef.pktd at gmail.com skrev:
> > In econometrics (including statsmodels) we have a lot of quadratic
> > forms that are usually calculate with a matrix inverse.

> That is a sign of numerical incompetence.

Yup, but you'd be surprised to see how much inverse is used. I was
astonished to find out that senior researchers that I respect a lot were
not even aware of the problem.

Gaël

From josef.pktd at gmail.com Sun Nov 1 22:25:42 2009
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Sun, 1 Nov 2009 23:25:42 -0400
Subject: [SciPy-User] linear algebra: quadratic forms without linalg.inv
In-Reply-To: <4AEE4BF2.4060403@molden.no>
References: <1cd32cbb0911011745p64a447fjcd14adee13ecb58a@mail.gmail.com>
	<4AEE4BF2.4060403@molden.no>
Message-ID: <1cd32cbb0911011925x78e852b4l92c9545d2044df98@mail.gmail.com>

On Sun, Nov 1, 2009 at 11:03 PM, Sturla Molden wrote:
> josef.pktd at gmail.com skrev:
>> In econometrics (including statsmodels) we have a lot of quadratic
>> forms that are usually calculate with a matrix inverse.

> That is a sign of numerical incompetence.

I agree, but that's the training. Last time I did principal components
with SVD, it took me a long time to figure out how to get it to work, and
I still don't understand it. The only matrix decomposition that I'm
familiar with is eigenvalue decomposition. But we had this part of the
discussion before: in applied econometrics, if we have enough
multicollinearity that numerical precision matters, then we are screwed
anyway and have to rethink the data analysis or the model, or do a pca.

>
> You see this often in statistics as well, people who think matrix
> inverse is the way to calculate Mahalanobis distances, when you should
> really use a Cholesky.
>
> As for LU, I'd rather use an SVD as it is numerically more stable.
> Using LU, you are betting on singular values not being tiny. With SVD
> you can solve an ill-conditioned system by zeroing tiny singular values.
> With LU you just get astronomic rounding errors.

How can you calculate the quadratic form or the product inv(A)*B with
SVD? Solving the equations is ok, since pinv and lstsq are based on SVD
internally.

In MATLAB there is also a version for QR, but I haven't figured out how
to do this in scipy without an inverse.
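Is it supposed to be something like the following? (untested sketch with a
made-up A; I'm not sure solve_triangular is available in every scipy
version, plain linalg.solve would do in its place)

import numpy as np
from scipy import linalg

A = np.array([[2., 1.], [1., 3.]])      # small symmetric example
B = np.arange(6.).reshape((3,2)).T

# SVD route: A = U*diag(s)*Vt, so B' inv(A) B = (Vt B)' diag(1/s) (U' B)
u, s, vt = linalg.svd(A)
quad_svd = np.dot(np.dot(vt, B).T, np.dot(u.T, B) / s[:,None])

# QR route: A = Q R with R triangular, so inv(A) B = solve(R, Q' B)
q, r = linalg.qr(A)
quad_qr = np.dot(B.T, linalg.solve_triangular(r, np.dot(q.T, B)))

print(quad_svd)
print(quad_qr)    # both should equal B.T * inv(A) * B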
Josef > > Sturla > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From sturla at molden.no Sun Nov 1 23:33:26 2009 From: sturla at molden.no (Sturla Molden) Date: Mon, 02 Nov 2009 05:33:26 +0100 Subject: [SciPy-User] linear algebra: quadratic forms without linalg.inv In-Reply-To: <1cd32cbb0911011925x78e852b4l92c9545d2044df98@mail.gmail.com> References: <1cd32cbb0911011745p64a447fjcd14adee13ecb58a@mail.gmail.com> <4AEE4BF2.4060403@molden.no> <1cd32cbb0911011925x78e852b4l92c9545d2044df98@mail.gmail.com> Message-ID: <4AEE6116.6070602@molden.no> josef.pktd at gmail.com skrev: > if we have enough multicollinearity that numerical > precision matters, then we are screwed anyway and have to rethink the > data analysis or the model, or do a pca. > > And PCA has nothing to do with SVD, right? Or ... what what would you call a procesure that takes your data, subtracts the mean, and does an SVD? :-D Sturla From josef.pktd at gmail.com Mon Nov 2 00:15:41 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 2 Nov 2009 01:15:41 -0400 Subject: [SciPy-User] linear algebra: quadratic forms without linalg.inv In-Reply-To: <4AEE6116.6070602@molden.no> References: <1cd32cbb0911011745p64a447fjcd14adee13ecb58a@mail.gmail.com> <4AEE4BF2.4060403@molden.no> <1cd32cbb0911011925x78e852b4l92c9545d2044df98@mail.gmail.com> <4AEE6116.6070602@molden.no> Message-ID: <1cd32cbb0911012115r35cdcc8el46fc8491c5a22bb4@mail.gmail.com> On Mon, Nov 2, 2009 at 12:33 AM, Sturla Molden wrote: > josef.pktd at gmail.com skrev: >> if we have enough multicollinearity that numerical >> precision matters, then we are screwed anyway and have to rethink the >> data analysis or the model, or do a pca. >> >> > And PCA has nothing to do with SVD, right? > Or ... what what would you call a procesure that takes your data, > subtracts the mean, and does an SVD? All the explanations I read where in terms of eigenvalue decomposition and not with SVD. I'm pretty good in removing negative eigenvalues when I'm supposed to have a positive definite matrix, but SVD has too many parts. (Besides I don't like pca for regression, and I'm still struggling how to do partial least squares with SVD.) Josef > > :-D > > > Sturla > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From robert.kern at gmail.com Mon Nov 2 00:19:50 2009 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 2 Nov 2009 00:19:50 -0500 Subject: [SciPy-User] linear algebra: quadratic forms without linalg.inv In-Reply-To: <1cd32cbb0911012115r35cdcc8el46fc8491c5a22bb4@mail.gmail.com> References: <1cd32cbb0911011745p64a447fjcd14adee13ecb58a@mail.gmail.com> <4AEE4BF2.4060403@molden.no> <1cd32cbb0911011925x78e852b4l92c9545d2044df98@mail.gmail.com> <4AEE6116.6070602@molden.no> <1cd32cbb0911012115r35cdcc8el46fc8491c5a22bb4@mail.gmail.com> Message-ID: <3d375d730911012119t2d8d64d5qad3a7c048e5ec36a@mail.gmail.com> On Mon, Nov 2, 2009 at 00:15, wrote: > On Mon, Nov 2, 2009 at 12:33 AM, Sturla Molden wrote: >> josef.pktd at gmail.com skrev: >>> if we have enough multicollinearity that numerical >>> precision matters, then we are screwed anyway and have to rethink the >>> data analysis or the model, or do a pca. >>> >>> >> And PCA has nothing to do with SVD, right? > >> Or ... what what would you call a procesure that takes your data, >> subtracts the mean, and does an SVD? 
> > All the explanations I read where in terms of eigenvalue decomposition > and not with SVD. Eigenvalues of the covariance matrix. The SVD gives you eigenvalues of the covariance matrix directly from the demeaned data matrix without explicitly forming the covariance matrix. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From josef.pktd at gmail.com Mon Nov 2 00:55:42 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 2 Nov 2009 01:55:42 -0400 Subject: [SciPy-User] linear algebra: quadratic forms without linalg.inv In-Reply-To: <3d375d730911012119t2d8d64d5qad3a7c048e5ec36a@mail.gmail.com> References: <1cd32cbb0911011745p64a447fjcd14adee13ecb58a@mail.gmail.com> <4AEE4BF2.4060403@molden.no> <1cd32cbb0911011925x78e852b4l92c9545d2044df98@mail.gmail.com> <4AEE6116.6070602@molden.no> <1cd32cbb0911012115r35cdcc8el46fc8491c5a22bb4@mail.gmail.com> <3d375d730911012119t2d8d64d5qad3a7c048e5ec36a@mail.gmail.com> Message-ID: <1cd32cbb0911012155n4f460592ocf9ce02122f601c9@mail.gmail.com> On Mon, Nov 2, 2009 at 1:19 AM, Robert Kern wrote: > On Mon, Nov 2, 2009 at 00:15, ? wrote: >> On Mon, Nov 2, 2009 at 12:33 AM, Sturla Molden wrote: >>> josef.pktd at gmail.com skrev: >>>> if we have enough multicollinearity that numerical >>>> precision matters, then we are screwed anyway and have to rethink the >>>> data analysis or the model, or do a pca. >>>> >>>> >>> And PCA has nothing to do with SVD, right? >> >>> Or ... what what would you call a procesure that takes your data, >>> subtracts the mean, and does an SVD? >> >> All the explanations I read where in terms of eigenvalue decomposition >> and not with SVD. > > Eigenvalues of the covariance matrix. The SVD gives you eigenvalues of > the covariance matrix directly from the demeaned data matrix without > explicitly forming the covariance matrix. Good, I didn't realize this when I worked on the eig and svd versions of the pca. In a similar way, I was initially puzzled that pinv can be used on the data matrix or on the covariance matrix (only the latter I have seen in books). I will go back to do my homework, I just saw that numpy.linalg.pinv directly works with the svd. I never read the source of the linalgs, because I thought they are just direct calls to Lapack and Blas. Josef > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > ?-- Umberto Eco > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Mon Nov 2 01:09:52 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 2 Nov 2009 02:09:52 -0400 Subject: [SciPy-User] characteristic functions of probability distributions Message-ID: <1cd32cbb0911012209p117d86fbhd7dab9dbde7fbe46@mail.gmail.com> The characteristic function is just the (continuous) fourier transform of the probability density function. I tried to use fft and ifft to convert between the characteristic function and the density function but I don't manage to get the units or discretization correctly. Does anyone have an example script for any distribution. Right now it's mostly a theoretical exercise, but there are some interesting applications in finance. 
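To make the question concrete, the discretization I have in mind looks
roughly like this for the standard normal (sketch; the grid limits and N
are picked arbitrarily, and the scaling is exactly the part I'm unsure
about):

import numpy as np
from scipy import stats

# density on a regular grid x_j = a + j*dx, j = 0, ..., N-1
N, a, b = 1024, -10.0, 10.0
dx = (b - a) / N
x = a + dx * np.arange(N)
f = stats.norm.pdf(x)

# phi(t) = integral exp(1j*t*x) f(x) dx, approximated at t_k = 2*pi*k/(N*dx)
k = np.arange(N)
t = 2 * np.pi * k / (N * dx)
phi = dx * np.exp(1j * t * a) * N * np.fft.ifft(f)

# for the standard normal this should reproduce exp(-t**2/2) for the first
# few t_k, if the units are right
print(np.abs(phi[:5]))
print(np.exp(-t[:5]**2 / 2.))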
Second related question, since I'm not good with complex numbers. scipy.integrate.quad of a complex function returns the absolute value. Is there a numerical integration function in scipy that returns the complex integral or do I have to integrate the real and imaginary parts separately? Thanks, Josef From sturla at molden.no Mon Nov 2 04:37:26 2009 From: sturla at molden.no (Sturla Molden) Date: Mon, 02 Nov 2009 10:37:26 +0100 Subject: [SciPy-User] linear algebra: quadratic forms without linalg.inv In-Reply-To: <1cd32cbb0911012155n4f460592ocf9ce02122f601c9@mail.gmail.com> References: <1cd32cbb0911011745p64a447fjcd14adee13ecb58a@mail.gmail.com> <4AEE4BF2.4060403@molden.no> <1cd32cbb0911011925x78e852b4l92c9545d2044df98@mail.gmail.com> <4AEE6116.6070602@molden.no> <1cd32cbb0911012115r35cdcc8el46fc8491c5a22bb4@mail.gmail.com> <3d375d730911012119t2d8d64d5qad3a7c048e5ec36a@mail.gmail.com> <1cd32cbb0911012155n4f460592ocf9ce02122f601c9@mail.gmail.com> Message-ID: <4AEEA856.2060109@molden.no> josef.pktd at gmail.com skrev: > Good, I didn't realize this when I worked on the eig and svd versions of > the pca. In a similar way, I was initially puzzled that pinv can be used > on the data matrix or on the covariance matrix (only the latter I have seen > in books). > > I'll try to explain... If you have a matrix C, you can factorize like this, with Sigma being a diagonal matrix: C = U * Sigma * V' >>> u,s,vt = np.linalg.svd(c) If C is square (rank n x n), we now have the inverse C**-1 = V * [S**-1] * U' >>> c_inv = np.mat(vt.T) * np.mat(np.eye(4)/s) * np.mat(u.T) And here you have the pathology diagnosis: A small value of s, will cause a huge value of 1/s. This is "ill-conditioning" that e.g. happens with multicolinearity. You get a small s, you divide by it, and rounding error skyrockets. We can improve the situation by editing the tiny values in Sigma to zero. That just changes C by a tiny amount, but might have a dramatic stabilizing effect on C**-1. Now you can do your LU and not worry. It might not be clear from statistics textbooks why multicolinearity is problem. But using SVD, we see both the problem and the solution very clearly: A small singular value might not contribute significantly to C, but could or severly affect or even dominate in C**-1. We can thus get a biased but numerically better approximation to C**-1 by deleting it from the equation. So after editing s, we could e.g. do: >>> c_fixed = np.mat(u) * np.mat(np.eye(4)*s) * np.mat(vt) and continue with LU on c_fixed to get the quadratic form. Also beware that you can solve C * x = b like this x = (V * [S**-1]) * (U' * b) But if we are to reapeat this for several values of b, it would make more sence to reconstruct C and go for the LU. Soving with LU also involves two matrix multiplications: L * y = b U * x = y but the computational demand is reduced by the triangular structure of L and U. Please don't say you'd rather preprocess data with a PCA. If C was a covariance matrix, we just threw out the smallest principal components out of the data. Deleting tiny singular values is in fact why PCA helps! Also beware that pca = lambda x : np.linalg.svd(x-x.mean(axis=0), full_matrices=0) So we can get PCA from SVD without even calculating the covariance. Now you have the standard deviations in Sigma, the principal components in V, and the factor loadings in U. SVD is how PCA is usually computed. It is better than estimating Cov(X), and then apply Jacobi rotations to get the eigenvalues and eigenvectors of Cov(X). 
One reason is that Cov(X) should be estimated using a "two-pass algorithm" to cancel accumulating rounding error (Am Stat, 37: p. 242-247). But that equation is not shown in most statistics textbooks, so most practitioners tend to not know of it . We can solve the common least squares problem using an SVD: b = argmin { || X * b - Y || ** 2 } If we do an SVD of X, we can compute b = sum( ((u[i,:] * Y )/s[i]) * vt[:,i].T ) Unlike the other methods of fitting least squares, this one cannot fail. And you also see clearly what a PCA will do: Skip "(u[i,:] * Y )/s[i]" for too small values of s[i] So you can preprocess with PCA anf fit LS in one shot. Ridge-regression (Tychonov regularization) is another solution to the multicollinearity problem: (A'A + lambda*I)*x = A'b But how would you choose the numerically optimal value of lambda? It turns out to be a case of SVD as well. Goloub & van Loan has that on page 583. QR with column pivoting can be seen as a case of SVD. Many use this for least-squares, not even knowing it is SVD. So SVD is ubiquitous in data modelling, even if you don't know it. :-) One more thing: The Cholesky factorization is always stabile, the LU is not. But don't be fooled: This only applies to the facotization itself. If you have multicolinearity, the problem is there even if you use Cholesky. You get the "singular value disease" (astronomic rounding error) when you solve the triangular system. A Cholesky can tell you if a covariance matrix is singular at your numerical precision. An SVD can tell you how close to singularity it is, and how to fix it. SVD comes at a cost, which is slower computation. But usually it is worth the extra investment in CPU cycles. Sturla Molden From bsouthey at gmail.com Mon Nov 2 09:46:22 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 02 Nov 2009 08:46:22 -0600 Subject: [SciPy-User] linear algebra: quadratic forms without linalg.inv In-Reply-To: <4AEE4BF2.4060403@molden.no> References: <1cd32cbb0911011745p64a447fjcd14adee13ecb58a@mail.gmail.com> <4AEE4BF2.4060403@molden.no> Message-ID: <4AEEF0BE.3010508@gmail.com> On 11/01/2009 09:03 PM, Sturla Molden wrote: > josef.pktd at gmail.com skrev: > >> In econometrics (including statsmodels) we have a lot of quadratic >> forms that are usually calculate with a matrix inverse. >> > That is a sign of numerical incompetence. > By whom? :-) Sure there are cases that just require solving of a linear system when inverses perhaps should not be used. But there are a lot of other cases especially statistical that you require an inverse such as getting standard errors and using solving algorithms that are way faster than than those that do not use inverses. Although, statistical problems are usually bad numerically, some of the issues like speed and precision get further away with modern 64-bit cpus. Really it is getting the right tool for the job in hand. 
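For example, plain OLS standard errors need the whole diagonal of
inv(X'X), not just a single solve, so some explicit (generalized) inverse
is hard to avoid there. A rough sketch with made-up data:

import numpy as np
from scipy import linalg

np.random.seed(0)
X = np.random.randn(100, 3)
y = np.dot(X, [1., 2., 3.]) + np.random.randn(100)

beta = linalg.lstsq(X, y)[0]
resid = y - np.dot(X, beta)
sigma2 = np.dot(resid, resid) / (X.shape[0] - X.shape[1])

# full inverse of X'X via its Cholesky factor; the diagonal gives the variances
xtx_inv = linalg.cho_solve(linalg.cho_factor(np.dot(X.T, X)), np.eye(X.shape[1]))
se = np.sqrt(sigma2 * np.diag(xtx_inv))
print(se)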
Bruce From souheil.inati at nyu.edu Mon Nov 2 09:50:36 2009 From: souheil.inati at nyu.edu (Souheil Inati) Date: Mon, 2 Nov 2009 09:50:36 -0500 Subject: [SciPy-User] linear algebra: quadratic forms without linalg.inv In-Reply-To: <4AEEF0BE.3010508@gmail.com> References: <1cd32cbb0911011745p64a447fjcd14adee13ecb58a@mail.gmail.com> <4AEE4BF2.4060403@molden.no> <4AEEF0BE.3010508@gmail.com> Message-ID: <4A9052E2-0E8F-4EE8-A01A-20E93D38ACC5@nyu.edu> On Nov 2, 2009, at 9:46 AM, Bruce Southey wrote: > On 11/01/2009 09:03 PM, Sturla Molden wrote: >> josef.pktd at gmail.com skrev: >> >>> In econometrics (including statsmodels) we have a lot of quadratic >>> forms that are usually calculate with a matrix inverse. >>> >> That is a sign of numerical incompetence. >> > By whom? :-) > Sure there are cases that just require solving of a linear system when > inverses perhaps should not be used. But there are a lot of other > cases > especially statistical that you require an inverse such as getting > standard errors and using solving algorithms that are way faster than > than those that do not use inverses. > > Although, statistical problems are usually bad numerically, some of > the > issues like speed and precision get further away with modern 64-bit > cpus. Really it is getting the right tool for the job in hand. > > Bruce > Sorry, this statement is misleading. Precision has nothing to do with 64-bit cpus - that's just of storing bigger matrices. -Souheil --------------------------------- Souheil Inati, PhD Research Associate Professor Center for Neural Science and Department of Psychology Chief Physicist, NYU Center for Brain Imaging New York University 4 Washington Place, Room 809 New York, N.Y., 10003-6621 Office: (212) 998-3741 Fax: (212) 995-4011 Email: souheil.inati at nyu.edu From souheil.inati at nyu.edu Mon Nov 2 09:53:25 2009 From: souheil.inati at nyu.edu (Souheil Inati) Date: Mon, 2 Nov 2009 09:53:25 -0500 Subject: [SciPy-User] linear algebra: quadratic forms without linalg.inv In-Reply-To: <4AEEA856.2060109@molden.no> References: <1cd32cbb0911011745p64a447fjcd14adee13ecb58a@mail.gmail.com> <4AEE4BF2.4060403@molden.no> <1cd32cbb0911011925x78e852b4l92c9545d2044df98@mail.gmail.com> <4AEE6116.6070602@molden.no> <1cd32cbb0911012115r35cdcc8el46fc8491c5a22bb4@mail.gmail.com> <3d375d730911012119t2d8d64d5qad3a7c048e5ec36a@mail.gmail.com> <1cd32cbb0911012155n4f460592ocf9ce02122f601c9@mail.gmail.com> <4AEEA856.2060109@molden.no> Message-ID: <1DD0444A-4815-4F78-B7A4-2E5425427A71@nyu.edu> On Nov 2, 2009, at 4:37 AM, Sturla Molden wrote: > josef.pktd at gmail.com skrev: >> Good, I didn't realize this when I worked on the eig and svd >> versions of >> the pca. In a similar way, I was initially puzzled that pinv can be >> used >> on the data matrix or on the covariance matrix (only the latter I >> have seen >> in books). >> >> > > I'll try to explain... If you have a matrix C, you can factorize like > this, with Sigma being a diagonal matrix: > > C = U * Sigma * V' > >>>> u,s,vt = np.linalg.svd(c) > > If C is square (rank n x n), we now have the inverse > > C**-1 = V * [S**-1] * U' > >>>> c_inv = np.mat(vt.T) * np.mat(np.eye(4)/s) * np.mat(u.T) > > And here you have the pathology diagnosis: > > A small value of s, will cause a huge value of 1/s. This is > "ill-conditioning" that e.g. happens with multicolinearity. You get a > small s, you divide by it, and rounding error skyrockets. We can > improve > the situation by editing the tiny values in Sigma to zero. 
That just > changes C by a tiny amount, but might have a dramatic stabilizing > effect > on C**-1. Now you can do your LU and not worry. It might not be clear > from statistics textbooks why multicolinearity is problem. But using > SVD, we see both the problem and the solution very clearly: A small > singular value might not contribute significantly to C, but could or > severly affect or even dominate in C**-1. We can thus get a biased but > numerically better approximation to C**-1 by deleting it from the > equation. So after editing s, we could e.g. do: > >>>> c_fixed = np.mat(u) * np.mat(np.eye(4)*s) * np.mat(vt) > > and continue with LU on c_fixed to get the quadratic form. > > Also beware that you can solve > > C * x = b > > like this > > x = (V * [S**-1]) * (U' * b) > > But if we are to reapeat this for several values of b, it would make > more sence to reconstruct C and go for the LU. Soving with LU also > involves two matrix multiplications: > > L * y = b > U * x = y > > but the computational demand is reduced by the triangular structure > of L > and U. > > Please don't say you'd rather preprocess data with a PCA. If C was a > covariance matrix, we just threw out the smallest principal components > out of the data. Deleting tiny singular values is in fact why PCA > helps! > > Also beware that > > pca = lambda x : np.linalg.svd(x-x.mean(axis=0), full_matrices=0) > > So we can get PCA from SVD without even calculating the covariance. > Now > you have the standard deviations in Sigma, the principal components in > V, and the factor loadings in U. SVD is how PCA is usually computed. > It > is better than estimating Cov(X), and then apply Jacobi rotations to > get > the eigenvalues and eigenvectors of Cov(X). One reason is that Cov(X) > should be estimated using a "two-pass algorithm" to cancel > accumulating > rounding error (Am Stat, 37: p. 242-247). But that equation is not > shown in most statistics textbooks, so most practitioners tend to not > know of it . > We can solve the common least squares problem using an SVD: > > b = argmin { || X * b - Y || ** 2 } > > If we do an SVD of X, we can compute > > b = sum( ((u[i,:] * Y )/s[i]) * vt[:,i].T ) > > Unlike the other methods of fitting least squares, this one cannot > fail. > And you also see clearly what a PCA will do: > > Skip "(u[i,:] * Y )/s[i]" for too small values of s[i] > > So you can preprocess with PCA anf fit LS in one shot. > > Ridge-regression (Tychonov regularization) is another solution to the > multicollinearity problem: > > (A'A + lambda*I)*x = A'b > > But how would you choose the numerically optimal value of lambda? It > turns out to be a case of SVD as well. Goloub & van Loan has that on > page 583. > > QR with column pivoting can be seen as a case of SVD. Many use this > for > least-squares, not even knowing it is SVD. So SVD is ubiquitous in > data > modelling, even if you don't know it. :-) > > One more thing: The Cholesky factorization is always stabile, the LU > is > not. But don't be fooled: This only applies to the facotization > itself. > If you have multicolinearity, the problem is there even if you use > Cholesky. You get the "singular value disease" (astronomic rounding > error) when you solve the triangular system. A Cholesky can tell you > if > a covariance matrix is singular at your numerical precision. An SVD > can > tell you how close to singularity it is, and how to fix it. SVD > comes at > a cost, which is slower computation. But usually it is worth the > extra > investment in CPU cycles. 
> > Sturla Molden I agree with Sturla's comment's above 100%. You should almost always use SVD to understand your linear system properties. For least squares fitting QR is the modern, stable algorithm of choise. (see for example the matlab \ operator). It's really a crime that we don't teach SVD and QR. There are two sources of error: 1. noise in the measurement and 2. noise in the numerics (rounding, division, etc.). A properly constructed linear system solver will take care of the second type of error (rounding, etc.). If your system is ill-conditioned, then you need to control the inversion so that the signal is maintained and the noise is not amplified too much. In the overwhelming majority of applications, the SNR isn't better than 1000:1. If you know your the relative size of your noise and signal, then you can control the SNR in your parameter estimates by choosing the svd truncation (noise amplification factor). For those of you that want an accessible reference for numerical stability in linear algebra, this book is a must read: Numerical Linear Algebra, Lloyd Trefethen http://www.amazon.com/Numerical-Linear-Algebra-Lloyd-Trefethen/dp/0898713617 Cheers, Souheil --------------------------------- Souheil Inati, PhD Research Associate Professor Center for Neural Science and Department of Psychology Chief Physicist, NYU Center for Brain Imaging New York University 4 Washington Place, Room 809 New York, N.Y., 10003-6621 Office: (212) 998-3741 Fax: (212) 995-4011 Email: souheil.inati at nyu.edu From josef.pktd at gmail.com Mon Nov 2 10:38:59 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 2 Nov 2009 10:38:59 -0500 Subject: [SciPy-User] linear algebra: quadratic forms without linalg.inv In-Reply-To: <1DD0444A-4815-4F78-B7A4-2E5425427A71@nyu.edu> References: <1cd32cbb0911011745p64a447fjcd14adee13ecb58a@mail.gmail.com> <4AEE4BF2.4060403@molden.no> <1cd32cbb0911011925x78e852b4l92c9545d2044df98@mail.gmail.com> <4AEE6116.6070602@molden.no> <1cd32cbb0911012115r35cdcc8el46fc8491c5a22bb4@mail.gmail.com> <3d375d730911012119t2d8d64d5qad3a7c048e5ec36a@mail.gmail.com> <1cd32cbb0911012155n4f460592ocf9ce02122f601c9@mail.gmail.com> <4AEEA856.2060109@molden.no> <1DD0444A-4815-4F78-B7A4-2E5425427A71@nyu.edu> Message-ID: <1cd32cbb0911020738i7172d246o61e63a32fd2a71c1@mail.gmail.com> On Mon, Nov 2, 2009 at 9:53 AM, Souheil Inati wrote: > > On Nov 2, 2009, at 4:37 AM, Sturla Molden wrote: > >> josef.pktd at gmail.com skrev: >>> Good, I didn't realize this when I worked on the eig and svd >>> versions of >>> the pca. In a similar way, I was initially puzzled that pinv can be >>> used >>> on the data matrix or on the covariance matrix (only the latter I >>> have seen >>> in books). >>> >>> >> >> I'll try to explain... If you have a matrix C, you can factorize like >> this, with Sigma being a diagonal matrix: >> >> ? C = U * Sigma * V' >> >>>>> u,s,vt = np.linalg.svd(c) >> >> If C is square (rank n x n), we now have the inverse >> >> ? C**-1 = V * [S**-1] * U' >> >>>>> c_inv = np.mat(vt.T) * np.mat(np.eye(4)/s) * np.mat(u.T) >> >> And here you have the pathology diagnosis: >> >> A small value of s, will cause a huge value of 1/s. This is >> "ill-conditioning" that e.g. happens with multicolinearity. You get a >> small s, you divide by it, and rounding error skyrockets. We can >> improve >> the situation by editing the tiny values in Sigma to zero. That just >> changes C by a tiny amount, but might have a dramatic stabilizing >> effect >> on C**-1. Now you can do your LU and not worry. 
It might not be clear >> from statistics textbooks why multicolinearity is problem. But using >> SVD, we see both the problem and the solution very clearly: A small >> singular value might not contribute significantly to C, but could or >> severly affect or even dominate in C**-1. We can thus get a biased but >> numerically better approximation to C**-1 by deleting it from the >> equation. So after editing s, we could e.g. do: >> >>>>> c_fixed = np.mat(u) * np.mat(np.eye(4)*s) * np.mat(vt) >> >> and continue with LU on c_fixed to get the quadratic form. >> >> Also beware that you can solve >> >> ? C * x = b >> >> like this >> >> ? x = (V * [S**-1]) * (U' * b) >> >> But if we are to reapeat this for several values of b, it would make >> more sence to reconstruct C and go for the LU. Soving with LU also >> involves two matrix multiplications: >> >> ? L * y = b >> ? U * x = y >> >> but the computational demand is reduced by the triangular structure >> of L >> and U. >> >> Please don't say you'd rather preprocess data with a PCA. If C was a >> covariance matrix, we just threw out the smallest principal components >> out of the data. Deleting tiny singular values is in fact why PCA >> helps! >> >> Also beware that >> >> ? pca = lambda x : np.linalg.svd(x-x.mean(axis=0), full_matrices=0) >> >> So we can get PCA from SVD without even calculating the covariance. >> Now >> you have the standard deviations in Sigma, the principal components in >> V, and the factor loadings in U. SVD is how PCA is usually computed. >> It >> is better than estimating Cov(X), and then apply Jacobi rotations to >> get >> the eigenvalues and eigenvectors of ?Cov(X). One reason is that Cov(X) >> should be estimated using a "two-pass algorithm" to cancel >> accumulating >> rounding error (Am Stat, 37: p. ?242-247). But that equation is not >> shown in most statistics textbooks, so most practitioners tend to not >> know of it . >> We can solve the common least squares problem using an SVD: >> >> ? b = argmin { || X * b - Y ?|| ?** ?2 } >> >> If we do an SVD of X, we can compute >> >> ? b = sum( ((u[i,:] * Y )/s[i]) * vt[:,i].T ) >> >> Unlike the other methods of fitting least squares, this one cannot >> fail. >> And you also see clearly what a PCA will do: >> >> ? Skip "(u[i,:] * Y )/s[i]" for too small values of s[i] >> >> So you can preprocess with PCA anf fit LS in one shot. >> >> Ridge-regression (Tychonov regularization) is another solution to the >> multicollinearity problem: >> >> ? ?(A'A + lambda*I)*x = A'b >> >> But how would you choose the numerically optimal value of lambda? It >> turns out to be a case of SVD as well. Goloub & van Loan has that on >> page 583. >> >> QR with column pivoting can be seen as a case of SVD. Many use this >> for >> least-squares, not even knowing it is SVD. So SVD is ubiquitous in >> data >> modelling, even if you don't know it. :-) >> >> One more thing: The Cholesky factorization is always stabile, the LU >> is >> not. But don't be fooled: This only applies to the facotization >> itself. >> If you have multicolinearity, the problem is there even if you use >> Cholesky. You get the "singular value disease" (astronomic rounding >> error) when you solve the triangular system. A Cholesky can tell you >> if >> a covariance matrix is singular at your numerical precision. An SVD >> can >> tell you how close to singularity it is, and how to fix it. SVD >> comes at >> a cost, which is ?slower ?computation. But usually it is worth the >> extra >> investment in CPU cycles. 
>> >> Sturla Molden Thanks Sturla, I'm going to slowly work my way through this. at least I'm able now to calculate a inverse matrix squareroot, which is useful for the quadratic form and we will switch away from some of the remaining uses of the matrix inverse in statsmodels. > > > I agree with Sturla's comment's above 100%. ? ?You should almost > always use SVD to understand your linear system properties. ? For > least squares fitting QR is the modern, stable algorithm of choise. > (see for example the matlab \ operator). ? It's really a crime that we > don't teach SVD and QR. > > There are two sources of error: 1. noise in the measurement and 2. > noise in the numerics (rounding, division, etc.). ? A properly > constructed linear system solver will take care of the second type of > error (rounding, etc.). ?If your system is ill-conditioned, then you > need to control the inversion so that the signal is maintained and the > noise is not amplified too much. ?In the overwhelming majority of > applications, the SNR isn't better than 1000:1. ?If you know your the > relative size of your noise and signal, then you can control the SNR > in your parameter estimates by choosing the svd truncation (noise > amplification factor). > > For those of you that want an accessible reference for numerical > stability in linear algebra, this book is a must read: > Numerical Linear Algebra, Lloyd Trefethen > http://www.amazon.com/Numerical-Linear-Algebra-Lloyd-Trefethen/dp/0898713617 It really depends on the application. From the applications I know, pca is used for dimension reduction, when there are way too many regressors to avoid overfitting. The most popular in econometrics might be Forecasting Using Principal Components from a Large Number of Predictors # James H. Stock and Mark W. Watson # Journal of the American Statistical Association, Vol. 97, No. 460 (Dec., 2002), pp. 1167-1179 A similar problem exists in chemometrics with more regressors than observations (at least from the descriptions I read when reading about NIPALS). I don't think that compared to the big stochastic errors, numerical precision plays much of a role. When we have large estimation errors in small samples in statistics, we don't have to worry, for example, about 10e-15 precision when our sampling errors are 10e-1. Of course, there are other applications, and I'm working my way slowly through the numerical issues. Josef > > Cheers, > Souheil > > --------------------------------- > > Souheil Inati, PhD > Research Associate Professor > Center for Neural Science and Department of Psychology > Chief Physicist, NYU Center for Brain Imaging > New York University > 4 Washington Place, Room 809 > New York, N.Y., 10003-6621 > Office: (212) 998-3741 > Fax: ? ? 
(212) 995-4011 > Email: souheil.inati at nyu.edu > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From souheil.inati at nyu.edu Mon Nov 2 11:26:07 2009 From: souheil.inati at nyu.edu (Souheil Inati) Date: Mon, 2 Nov 2009 11:26:07 -0500 Subject: [SciPy-User] linear algebra: quadratic forms without linalg.inv In-Reply-To: <1cd32cbb0911020738i7172d246o61e63a32fd2a71c1@mail.gmail.com> References: <1cd32cbb0911011745p64a447fjcd14adee13ecb58a@mail.gmail.com> <4AEE4BF2.4060403@molden.no> <1cd32cbb0911011925x78e852b4l92c9545d2044df98@mail.gmail.com> <4AEE6116.6070602@molden.no> <1cd32cbb0911012115r35cdcc8el46fc8491c5a22bb4@mail.gmail.com> <3d375d730911012119t2d8d64d5qad3a7c048e5ec36a@mail.gmail.com> <1cd32cbb0911012155n4f460592ocf9ce02122f601c9@mail.gmail.com> <4AEEA856.2060109@molden.no> <1DD0444A-4815-4F78-B7A4-2E5425427A71@nyu.edu> <1cd32cbb0911020738i7172d246o61e63a32fd2a71c1@mail.gmail.com> Message-ID: On Nov 2, 2009, at 10:38 AM, josef.pktd at gmail.com wrote: > > snip > > It really depends on the application. From the applications I know, > pca is used for dimension reduction, when there are way too many > regressors to avoid overfitting. The most popular in econometrics > might be > > Forecasting Using Principal Components from a Large Number of > Predictors > # James H. Stock and Mark W. Watson > # Journal of the American Statistical Association, Vol. 97, No. 460 > (Dec., 2002), pp. 1167-1179 > > A similar problem exists in chemometrics with more regressors than > observations (at least from the descriptions I read when reading about > NIPALS). > > I don't think that compared to the big stochastic errors, numerical > precision plays much of a role. When we have large estimation errors > in small samples in statistics, we don't have to worry, for example, > about 10e-15 precision when our sampling errors are 10e-1. > > Of course, there are other applications, and I'm working my way slowly > through the numerical issues. > > Josef Hi Josef, I have a strong opinion about this, and I am almost certainly in the minority, but my feeling is this: once you have ill-conditioning all bets are off. Once the problem is ill-conditioned, then there are an infinite number of solutions that match your data in a least-squares sense. You are then required to say something further about how you want to pick a particular solution from among the infinite number of equivalent solutions. SVD/PCA is a procedure to find the minimum-two-norm solution that fits the data. The minimum two-norm solution is unique. For the general case, SVD is the only method that has a proper theory. There is no proper theory for anything else, PERIOD. The only other useful thing one can say is that if you expect your solution to be sparse, then you can use the newly developed theory of compressed sensing (tao and candes). This says that the minimum one- norm solution is best in a statistical sense and provides an algorithm to find it. The difference between SVD and compressed sensing is that the former spreads the power out equally among the coefficients, while the latter picks the solution that maximizes the magnitude of some of the cofficients and sets others to zero (i.e. picks a sparse answer). So if you're problem is ill-conditioned, then you are in trouble. Your only legitimate options are to you use the SVD to pick the minimum two-norm answer, or to use compressed sensing and pick the minimum one-norm answer. 
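To see what the two-norm choice does in the simplest possible case - two
identical columns, so infinitely many least-squares solutions - the
SVD-based pseudoinverse quietly returns the smallest-norm one (sketch):

import numpy as np

X = np.array([[1., 1.], [2., 2.], [3., 3.]])   # rank 1: any b1 + b2 = 1 fits
y = np.array([1., 2., 3.])

b = np.dot(np.linalg.pinv(X), y)
print(b)                                 # [0.5, 0.5], the minimum two-norm solution
print(np.linalg.norm(y - np.dot(X, b)))  # residual is (numerically) zero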
Everything else is completely nonsense. Cheers, Souheil From sturla at molden.no Mon Nov 2 11:31:16 2009 From: sturla at molden.no (Sturla Molden) Date: Mon, 02 Nov 2009 17:31:16 +0100 Subject: [SciPy-User] linear algebra: quadratic forms without linalg.inv In-Reply-To: <1cd32cbb0911020738i7172d246o61e63a32fd2a71c1@mail.gmail.com> References: <1cd32cbb0911011745p64a447fjcd14adee13ecb58a@mail.gmail.com> <4AEE4BF2.4060403@molden.no> <1cd32cbb0911011925x78e852b4l92c9545d2044df98@mail.gmail.com> <4AEE6116.6070602@molden.no> <1cd32cbb0911012115r35cdcc8el46fc8491c5a22bb4@mail.gmail.com> <3d375d730911012119t2d8d64d5qad3a7c048e5ec36a@mail.gmail.com> <1cd32cbb0911012155n4f460592ocf9ce02122f601c9@mail.gmail.com> <4AEEA856.2060109@molden.no> <1DD0444A-4815-4F78-B7A4-2E5425427A71@nyu.edu> <1cd32cbb0911020738i7172d246o61e63a32fd2a71c1@mail.gmail.com> Message-ID: <4AEF0954.8060509@molden.no> josef.pktd at gmail.com skrev: > It really depends on the application. From the applications I know, > pca is used for dimension reduction, when there are way too many > regressors to avoid overfitting. Too many regressors gives you one or more tiny singular values in the covariance matrix (X'X), which you use in: betas = (X'X)**-1 * X' * y So the inverse of X'X is heavily influenced by one or more of these "singular values" that do not contribute significantly to X'X. That is obviously ridicilous, because we want the factors that determines X'X to determinate the inverse, (X'X)**-1, as well. I.e. we want the regressors (betas) we estimate to be determined by the same factors that determines X'X. So we proceed by doing SVD on X'X and throw the offenders out. And in statistics, that is called "PCA". And small singular values in X'X is known as "multicolinearity". When multicolinearity is present, numerical stability is the problem: 1 / s[i] becomes infinite for s[i] == 0, and thus s[i] dominates (X'X)**-1 completely. But with s[i] == 0, s[i] does not even contribute to X'X. So it makes sence to edit too small s[i] values out, so that only the values of s[i] important for X'X is used to compute (X'X)**-1 and betas. And that is what PCA does. Statistics textbooks usually don't teach this. They just say "multicolinearity is bad". Yes PCA is used for "dimensionality reduction" and avoiding overfitting. But why is overfitting a problem anyway? And why does PCA help? This is actually all entagled. The main issue is alwys that 1/s[i] is big when s[i] is small. Overfitting gives you a lot of these big 1/s values. And now the betas you solved does not reflect the signal in X'X, so the model has no predictive power. Sturla From cimrman3 at ntc.zcu.cz Mon Nov 2 11:46:57 2009 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Mon, 02 Nov 2009 17:46:57 +0100 Subject: [SciPy-User] reverse Cuthill-McKee Message-ID: <4AEF0D01.1080209@ntc.zcu.cz> Hi, I need an implementation of the (symmetric) reverse Cuthill-McKee matrix reordering algorithm. Is anyone aware of an implementation callable from Python? A scipy CSR/CSC matrix-based one would be the best, of course. thanks, r. 
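To be concrete about what is meant: plain symmetric RCM is just a
breadth-first ordering that visits neighbours in order of increasing
degree and then reverses the result. A slow pure-Python sketch over the
CSR structure (assuming a symmetric nonzero pattern, and skipping the
usual pseudo-peripheral choice of starting node):

import numpy as np
from scipy import sparse

def rcm_order(A):
    """Reverse Cuthill-McKee permutation for a matrix with symmetric pattern."""
    A = sparse.csr_matrix(A)
    n = A.shape[0]
    degrees = np.diff(A.indptr)            # stored entries per row
    visited = np.zeros(n, dtype=bool)
    order = []
    for start in np.argsort(degrees):      # start each component at a low-degree node
        if visited[start]:
            continue
        visited[start] = True
        queue = [start]
        while queue:
            node = queue.pop(0)
            order.append(node)
            nbrs = A.indices[A.indptr[node]:A.indptr[node+1]]
            for m in sorted((m for m in nbrs if not visited[m]),
                            key=lambda m: degrees[m]):
                visited[m] = True
                queue.append(m)
    return np.array(order[::-1])           # reversing Cuthill-McKee gives RCM

Permuting rows and columns with the returned array should then concentrate
the nonzeros near the diagonal; what is missing is something that does this
(with a proper pseudo-peripheral start) at C speed.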
From bsouthey at gmail.com Mon Nov 2 12:31:53 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 02 Nov 2009 11:31:53 -0600 Subject: [SciPy-User] linear algebra: quadratic forms without linalg.inv In-Reply-To: <4AEF0954.8060509@molden.no> References: <1cd32cbb0911011745p64a447fjcd14adee13ecb58a@mail.gmail.com> <4AEE4BF2.4060403@molden.no> <1cd32cbb0911011925x78e852b4l92c9545d2044df98@mail.gmail.com> <4AEE6116.6070602@molden.no> <1cd32cbb0911012115r35cdcc8el46fc8491c5a22bb4@mail.gmail.com> <3d375d730911012119t2d8d64d5qad3a7c048e5ec36a@mail.gmail.com> <1cd32cbb0911012155n4f460592ocf9ce02122f601c9@mail.gmail.com> <4AEEA856.2060109@molden.no> <1DD0444A-4815-4F78-B7A4-2E5425427A71@nyu.edu> <1cd32cbb0911020738i7172d246o61e63a32fd2a71c1@mail.gmail.com> <4AEF0954.8060509@molden.no> Message-ID: <4AEF1789.5060405@gmail.com> On 11/02/2009 10:31 AM, Sturla Molden wrote: > josef.pktd at gmail.com skrev: > >> It really depends on the application. From the applications I know, >> pca is used for dimension reduction, when there are way too many >> regressors to avoid overfitting. >> > Too many regressors gives you one or more tiny singular values in the > covariance matrix (X'X), which you use in: > > betas = (X'X)**-1 * X' * y > > So the inverse of X'X is heavily influenced by one or more of these > "singular values" that do not contribute significantly to X'X. That is > obviously ridicilous, because we want the factors that determines X'X to > determinate the inverse, (X'X)**-1, as well. I.e. we want the regressors > (betas) we estimate to be determined by the same factors that determines > X'X. > > So we proceed by doing SVD on X'X and throw the offenders out. And in > statistics, that is called "PCA". And small singular values in X'X is > known as "multicolinearity". > > > > When multicolinearity is present, numerical stability is the problem: > > 1 / s[i] becomes infinite for s[i] == 0, and thus s[i] dominates > (X'X)**-1 completely. But with s[i] == 0, s[i] does not even contribute > to X'X. So it makes sence to edit too small s[i] values out, so that > only the values of s[i] important for X'X is used to compute (X'X)**-1 > and betas. And that is what PCA does. Statistics textbooks usually don't > teach this. They just say "multicolinearity is bad". > > Yes PCA is used for "dimensionality reduction" and avoiding overfitting. > But why is overfitting a problem anyway? And why does PCA help? This is > actually all entagled. The main issue is alwys that 1/s[i] is big when > s[i] is small. Overfitting gives you a lot of these big 1/s values. And > now the betas you solved does not reflect the signal in X'X, so the > model has no predictive power. > > > Sturla > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > Well that is fine if you are doing feature extraction but not feature selection. Most of statistical problems involve feature selection so obviously it gets more space and time. Feature extraction has relatively very limited use in statistics (usually when 'black boxes' are useful) so it is usually taught as an advanced topic. 
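The difference shows up directly in code: with extraction every original
column still enters through the projection, while selection really drops
columns (rough sketch with made-up data):

import numpy as np

np.random.seed(1)
X = np.random.randn(50, 10)
Xc = X - X.mean(axis=0)

# feature extraction: scores on the first 3 principal components (all columns used)
u, s, vt = np.linalg.svd(Xc, full_matrices=False)
scores = np.dot(Xc, vt[:3].T)

# feature selection: keep 3 of the original columns, discard the rest
subset = Xc[:, [0, 4, 7]]

print(scores.shape)
print(subset.shape)    # both (50, 3), but very different sets of regressors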
Bruce From josef.pktd at gmail.com Mon Nov 2 12:40:47 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 2 Nov 2009 12:40:47 -0500 Subject: [SciPy-User] linear algebra: quadratic forms without linalg.inv In-Reply-To: <4AEF0954.8060509@molden.no> References: <1cd32cbb0911011745p64a447fjcd14adee13ecb58a@mail.gmail.com> <1cd32cbb0911011925x78e852b4l92c9545d2044df98@mail.gmail.com> <4AEE6116.6070602@molden.no> <1cd32cbb0911012115r35cdcc8el46fc8491c5a22bb4@mail.gmail.com> <3d375d730911012119t2d8d64d5qad3a7c048e5ec36a@mail.gmail.com> <1cd32cbb0911012155n4f460592ocf9ce02122f601c9@mail.gmail.com> <4AEEA856.2060109@molden.no> <1DD0444A-4815-4F78-B7A4-2E5425427A71@nyu.edu> <1cd32cbb0911020738i7172d246o61e63a32fd2a71c1@mail.gmail.com> <4AEF0954.8060509@molden.no> Message-ID: <1cd32cbb0911020940x741047e5m57e9d31d46f0f6bd@mail.gmail.com> On Mon, Nov 2, 2009 at 11:26 AM, Souheil Inati wrote: > > I have a strong opinion about this, and I am almost certainly in the > minority, but my feeling is this: once you have ill-conditioning all > bets are off. > > Once the problem is ill-conditioned, then there are an infinite number > of solutions that match your data in a least-squares sense. You are > then required to say something further about how you want to pick a > particular solution from among the infinite number of equivalent > solutions. I think, that's the point. However, the solution in economics is not to replace the decision about your solution by a numerical procedure that selects one for the researcher. In statsmodels, I looked at the estimation results using pinv, which is exactly svd plus throw away tiny singular values (np.linalg.pinv). The problem is that this provides a nice solution and doesn't ring an alarm bell, I want to have exceptions or infinite standard errors for the parameter estimates. Handling multicollinearity has to be an explicit task and a conscious choice by the researcher, e.g. I used Ridge Regression (Tychonov), Bayesian priors, reparameterization and variable selection in the past. The choice of multicollinearity correction has to be reported in the results. If pinv (or svd) is blindly used, because there is no warning, then we will see researchers presenting their "nice" parameter estimates, which completely hide the fact that the parameters are actually not identified. I think I worry more about numerical precision and efficiency when the multicollinearity is not yet so extreme that we have to drop (near)zero eigenvalues. On Mon, Nov 2, 2009 at 11:31 AM, Sturla Molden wrote: > josef.pktd at gmail.com skrev: >> It really depends on the application. From the applications I know, >> pca is used for dimension reduction, when there are way too many >> regressors to avoid overfitting. > > Too many regressors gives you one or more tiny singular values in the > covariance matrix (X'X), which you use in: > > ? betas = (X'X)**-1 * X' * y > > So the inverse of X'X is heavily influenced by one or more of these > "singular values" that do not contribute significantly to X'X. That is > obviously ridicilous, because we want the factors that determines X'X to > determinate the inverse, (X'X)**-1, as well. I.e. we want the regressors > (betas) we estimate to be determined by the same factors that determines > X'X. > > So we proceed by doing SVD on X'X and throw the offenders out. And in > statistics, that is called "PCA". And small singular values in X'X is > known as "multicolinearity". 
> I think this applies to forecasting, but not when parameter estimates and standard errors of the parameter estimates are the primary interest. > > When multicolinearity is present, numerical stability is the problem: > > 1 / s[i] ?becomes infinite for s[i] == 0, and thus s[i] dominates > (X'X)**-1 completely. But with s[i] == 0, s[i] does not even contribute > to X'X. So it makes sence to edit too small s[i] values out, so that > only the values of s[i] important for X'X is used to compute (X'X)**-1 > and betas. And that is what PCA does. Statistics textbooks usually don't > teach this. They just say "multicolinearity is bad". > > Yes PCA is used for "dimensionality reduction" and avoiding overfitting. > But why is overfitting a problem anyway? And why does PCA help? This is > actually all entagled. The main issue is alwys that 1/s[i] is big when > s[i] is small. Overfitting gives you a lot of these big 1/s values. And > now the betas you solved does not reflect the signal in X'X, so the > model has no predictive power. I'm not sure you need high multicollinearity to have overfitting. Overfitting is still a problem after dropping the near zero singular values, if many of the variables just capture variation in the past data that doesn't really reflect the data generating process. I think, cross validation and parameter selection usually select fewer variables than would be required for positive definiteness. Josef > > > Sturla > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From bsouthey at gmail.com Mon Nov 2 13:31:28 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 02 Nov 2009 12:31:28 -0600 Subject: [SciPy-User] linear algebra: quadratic forms without linalg.inv In-Reply-To: <1cd32cbb0911020940x741047e5m57e9d31d46f0f6bd@mail.gmail.com> References: <1cd32cbb0911011745p64a447fjcd14adee13ecb58a@mail.gmail.com> <1cd32cbb0911011925x78e852b4l92c9545d2044df98@mail.gmail.com> <4AEE6116.6070602@molden.no> <1cd32cbb0911012115r35cdcc8el46fc8491c5a22bb4@mail.gmail.com> <3d375d730911012119t2d8d64d5qad3a7c048e5ec36a@mail.gmail.com> <1cd32cbb0911012155n4f460592ocf9ce02122f601c9@mail.gmail.com> <4AEEA856.2060109@molden.no> <1DD0444A-4815-4F78-B7A4-2E5425427A71@nyu.edu> <1cd32cbb0911020738i7172d246o61e63a32fd2a71c1@mail.gmail.com> <4AEF0954.8060509@molden.no> <1cd32cbb0911020940x741047e5m57e9d31d46f0f6bd@mail.gmail.com> Message-ID: <4AEF2580.3050804@gmail.com> On 11/02/2009 11:40 AM, josef.pktd at gmail.com wrote: > On Mon, Nov 2, 2009 at 11:26 AM, Souheil Inati wrote: > >> I have a strong opinion about this, and I am almost certainly in the >> minority, but my feeling is this: once you have ill-conditioning all >> bets are off. >> >> Once the problem is ill-conditioned, then there are an infinite number >> of solutions that match your data in a least-squares sense. You are >> then required to say something further about how you want to pick a >> particular solution from among the infinite number of equivalent >> solutions. >> > I think, that's the point. However, the solution in economics is not to > replace the decision about your solution by a numerical procedure > that selects one for the researcher. > > In statsmodels, I looked at the estimation results using pinv, which is > exactly svd plus throw away tiny singular values (np.linalg.pinv). > Please do not confuse SVD with pinv as these are not the same functions. 
pinv returns a Moore Penrose inverse: http://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_pseudoinverse Thus pinv is implemented using SVD but that is not the only way to get a Moore Penrose inverse. > The problem is that this provides a nice solution and doesn't ring > an alarm bell, I want to have exceptions or infinite standard errors for > the parameter estimates. > Handling multicollinearity has to be an explicit task and a conscious choice > by the researcher, e.g. I used Ridge Regression (Tychonov), Bayesian priors, > reparameterization and variable selection in the past. > The choice of multicollinearity correction has to be reported in the results. > If pinv (or svd) is blindly used, because there is no warning, then we will see > researchers presenting their "nice" parameter estimates, which completely > hide the fact that the parameters are actually not identified. > There is no people 'blindly' using these methods as these are the basics of linear algebra and really has nothing to do with multicollinearity. When you have an overdetermined system to solve then there are an infinite number of solutions and you can not use the inverse to solve the normal equations. The most common approach is to rely on a generalized inverse (http://en.wikipedia.org/wiki/Generalized_inverse - not a great reference) to solve it - of which the Moore Penrose inverse is one specific type. When these are used such as in analysis of variance, then the results are not wrong, not hidden and totally accepted by the scientific community. But it does rely on the user to know when things are not as expected (which is usually trivial because the degrees of freedom are not as expected). Bruce From dominique.orban at gmail.com Mon Nov 2 14:32:50 2009 From: dominique.orban at gmail.com (Dominique Orban) Date: Mon, 2 Nov 2009 15:32:50 -0400 Subject: [SciPy-User] reverse Cuthill-McKee Message-ID: <8793ae6e0911021132m37fd2560ma00ed66a4effcc86@mail.gmail.com> > ---------- Forwarded message ---------- > From:?Robert Cimrman > To:?SciPy Users List > Date:?Mon, 02 Nov 2009 17:46:57 +0100 > Subject:?[SciPy-User] reverse Cuthill-McKee > Hi, > > I need an implementation of the (symmetric) reverse Cuthill-McKee matrix reordering algorithm. Is anyone aware of an implementation callable from Python? A scipy CSR/CSC matrix-based one would be the best, of course. > > thanks, > r. Hi Robert, I just pushed my repo to GitHub so you can try it out. I'm using the implementation from the Harwell Subroutine Library which you'll need to grab from their website. Specify your source dir in site.cfg and you should be good to go. git clone git://github.com/dpo/pyorder.git Cheers, Dominique From vanforeest at gmail.com Mon Nov 2 15:51:06 2009 From: vanforeest at gmail.com (nicky van foreest) Date: Mon, 2 Nov 2009 21:51:06 +0100 Subject: [SciPy-User] characteristic functions of probability distributions In-Reply-To: <1cd32cbb0911012209p117d86fbhd7dab9dbde7fbe46@mail.gmail.com> References: <1cd32cbb0911012209p117d86fbhd7dab9dbde7fbe46@mail.gmail.com> Message-ID: Hi Josef, 2009/11/2 : > The characteristic function is just the (continuous) fourier transform > of the probability density function. > > I tried to use fft and ifft to convert between the characteristic > function and the density function but I don't manage to get the units > or discretization correctly. Does anyone have an example script for > any distribution. Right now it's mostly a theoretical exercise, but > there are some interesting applications in finance. 
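As a concrete example script for one distribution: the standard normal has characteristic function exp(-t**2/2), and its density can be recovered by numerically integrating the inversion formula f(x) = (1/(2*pi)) * int exp(-i*t*x)*phi(t) dt. Since quad only handles real integrands, the real and imaginary parts have to be integrated separately; for a real-valued density the imaginary part integrates to zero, so only the real part is kept below. This is only a sketch for this one case, not a general-purpose inverter.

import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def cf(t):
    # characteristic function of the standard normal
    return np.exp(-0.5 * t**2)

def pdf_from_cf(x):
    # inversion formula, keeping only the real part of the integrand
    integrand = lambda t: (np.exp(-1j * t * x) * cf(t)).real
    val, err = quad(integrand, -np.inf, np.inf)
    return val / (2 * np.pi)

print pdf_from_cf(0.5), norm.pdf(0.5)  # should agree to quad's tolerance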
There is an inversion formula used to invert the characteristic function to the distribution function (the density should follow easily then), see e.g. Chung (or any other book on graduate probability). I don't know about its numerical properties though. The formula is used to prove the central limit theorem. I also recall that Ward Whitt (see his homepage) used Fourier theory to invert Laplace transforms. He was also concerned with numerical properties, so this might be the best place to look for. He also uses the inversion formula, and refers to Feller. > > Second related question, since I'm not good with complex numbers. > > scipy.integrate.quad of a complex function returns the absolute value. > Is there a numerical integration function in scipy that returns the > complex integral or do I have to integrate the real and imaginary > parts separately? You want to compute \int_w^z f(t) dt? When f is analytic (i.e., satisfies the Cauchy Riemann equations) this integral is path independent. Otherwise the path from w to z is of importance. You might like the book Visual Complex Analysis by Needham for intuition. bye Nicky > > Thanks, > > Josef > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From lorenzo.isella at gmail.com Tue Nov 3 06:16:19 2009 From: lorenzo.isella at gmail.com (Lorenzo Isella) Date: Tue, 3 Nov 2009 12:16:19 +0100 Subject: [SciPy-User] Wrapping C/C++ Code Message-ID: Dear All, I hope this is not too off-topic. If you were asked to wrap C/C++ codes into a Python application (potentially relying on NumPy/SciPy) which route would you follow? Bear in mind that the initial C/C++ code is a standalone program which was not written having Python in mind at all. Many thanks Lorenzo From cimrman3 at ntc.zcu.cz Tue Nov 3 10:05:36 2009 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Tue, 03 Nov 2009 16:05:36 +0100 Subject: [SciPy-User] reverse Cuthill-McKee In-Reply-To: <8793ae6e0911021132m37fd2560ma00ed66a4effcc86@mail.gmail.com> References: <8793ae6e0911021132m37fd2560ma00ed66a4effcc86@mail.gmail.com> Message-ID: <4AF046C0.7090706@ntc.zcu.cz> Dominique Orban wrote: >> ---------- Forwarded message ---------- >> From: Robert Cimrman >> To: SciPy Users List >> Date: Mon, 02 Nov 2009 17:46:57 +0100 >> Subject: [SciPy-User] reverse Cuthill-McKee >> Hi, >> >> I need an implementation of the (symmetric) reverse Cuthill-McKee matrix reordering algorithm. Is anyone aware of an implementation callable from Python? A scipy CSR/CSC matrix-based one would be the best, of course. >> >> thanks, >> r. > > Hi Robert, > > I just pushed my repo to GitHub so you can try it out. I'm using the > implementation from the Harwell Subroutine Library which you'll need > to grab from their website. Specify your source dir in site.cfg and > you should be good to go. > > git clone git://github.com/dpo/pyorder.git > > Cheers, > Dominique Hi Dominique, thanks! I may ultimately need something BSD-ish, but your code would be great to test the stuff against. cheers, r. From rpg.314 at gmail.com Tue Nov 3 10:23:03 2009 From: rpg.314 at gmail.com (Rohit Garg) Date: Tue, 3 Nov 2009 20:53:03 +0530 Subject: [SciPy-User] Wrapping C/C++ Code In-Reply-To: References: Message-ID: <4d5dd8c20911030723y4a98cc74v18ec4a7ceb81ce70@mail.gmail.com> The first thing to do is to expose an API from your program that your script can access. It'll likely not be done as it was written with one language in mind. 
After that it's your call whether you want to embed or extend the interpreter. For extending, IMHO, SWIG is your friend. On Tue, Nov 3, 2009 at 4:46 PM, Lorenzo Isella wrote: > Dear All, > I hope this is not too off-topic. > If you were asked to wrap ?C/C++ codes into a Python application > (potentially relying on NumPy/SciPy) which route would you follow? > Bear in mind that the initial C/C++ code is a standalone program which > was not written having Python in mind at all. > Many thanks > > Lorenzo > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Rohit Garg http://rpg-314.blogspot.com/ Senior Undergraduate Department of Physics Indian Institute of Technology Bombay From bsouthey at gmail.com Tue Nov 3 11:06:43 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Tue, 03 Nov 2009 10:06:43 -0600 Subject: [SciPy-User] Wrapping C/C++ Code In-Reply-To: <4d5dd8c20911030723y4a98cc74v18ec4a7ceb81ce70@mail.gmail.com> References: <4d5dd8c20911030723y4a98cc74v18ec4a7ceb81ce70@mail.gmail.com> Message-ID: <4AF05513.8080307@gmail.com> On 11/03/2009 09:23 AM, Rohit Garg wrote: > The first thing to do is to expose an API from your program that your > script can access. It'll likely not be done as it was written with > one language in mind. > > After that it's your call whether you want to embed or extend the > interpreter. For extending, IMHO, SWIG is your friend. > > On Tue, Nov 3, 2009 at 4:46 PM, Lorenzo Isella wrote: > >> Dear All, >> I hope this is not too off-topic. >> If you were asked to wrap C/C++ codes into a Python application >> (potentially relying on NumPy/SciPy) which route would you follow? >> Bear in mind that the initial C/C++ code is a standalone program which >> was not written having Python in mind at all. >> Many thanks >> >> Lorenzo >> First you should determine if it is worth accessing that code/program. Since you are going to use numpy then it may be worth the effort to rewrite the required parts using numpy/scipy/Cython. If you have no control over the development or it needs to be a standalone program then you probably should call it through Python. The reason is that you probably have little control over code maintenance and how any changes will impact your code. If the code is a stable then I agree with Rohit that swig is a viable option. Bruce From zachary.pincus at yale.edu Tue Nov 3 11:25:52 2009 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Tue, 3 Nov 2009 11:25:52 -0500 Subject: [SciPy-User] Wrapping C/C++ Code In-Reply-To: <4AF05513.8080307@gmail.com> References: <4d5dd8c20911030723y4a98cc74v18ec4a7ceb81ce70@mail.gmail.com> <4AF05513.8080307@gmail.com> Message-ID: <22A68FE3-73C7-4AC0-BBAF-B38448AC9219@yale.edu> Another option instead of SWIG, if you have a reasonably stable C API and a pre-built shared library exporting the same, is to use ctypes to call into it. This works well enough with many numeric APIs, too where you can allocate arrays with numpy, and then use the array's ctypes attribute to get at a pointer to the memory suitable for passing into the C code. The downside is that (as far as I know) there's no good way to build pure-C libraries as part of a "python setup.py build" step (though some functionality along these lines might be now in the numpy distutuls?). 
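A minimal sketch of that route; the shared library name and the C signature (a hypothetical libexample.so exporting void scale(double *x, int n, double factor)) are made up purely for illustration and would be replaced by your own library.

import numpy as np
import ctypes

# hypothetical library and function, assumed here only for illustration
lib = ctypes.CDLL("libexample.so")
lib.scale.restype = None
lib.scale.argtypes = [ctypes.POINTER(ctypes.c_double),
                      ctypes.c_int, ctypes.c_double]

# allocate with numpy, making sure the buffer is contiguous and the right dtype
x = np.ascontiguousarray(np.arange(10), dtype=np.float64)
# the array's .ctypes attribute exposes a pointer into the numpy buffer
lib.scale(x.ctypes.data_as(ctypes.POINTER(ctypes.c_double)), x.size, 2.0)
print x  # modified in place by the C code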
Zach From rmrndr at unife.it Tue Nov 3 14:04:57 2009 From: rmrndr at unife.it (ANDREA ARMAROLI) Date: Tue, 3 Nov 2009 20:04:57 +0100 Subject: [SciPy-User] Troubles with odeint or ode In-Reply-To: References: Message-ID: <20091103185905.M2354@unife.it> Dear users, I'm trying to solve an ODE system that models a parametric oscillator with complex amplitudes at two freqencies. I'm new to python. This problem is simply solved in matlab using ode45. If I try to use odeint I cannot set parameters like tolerances or max step. The result is constant-valued solutions. Trying with ode class and ZVODE integrator, I have that total intensity is not conserved. I have weird quasi-periodic oscillations. I do know that my equations have excess degrees of freedom and I know from symmetries what the integrals of motion are, but in Matlab this works pretty well. Here is the code with odeint import numpy as N import scipy as S import scipy.integrate import pylab as P def deriv(Y,t): A1 = Y[0] A2 = Y[1] A1d = 1j*A2*N.conj(A1) A2d = 1j*(A1**2/2.0 + dk*A2) return [A1d, A2d] nplot = 10000 zmax = 10.0 zstep = zmax/nplot dk = 0.5 # normalised detuning/dispersion phi0 = 0.0*N.pi # initial dephasing eta0 = 0.3 # initial pump intensity # initial values u20 = N.sqrt(eta0)*N.exp(1j*phi0) u10 = N.sqrt(2*(1-eta0)) H0=dk*eta0+2*N.sqrt(eta0)*(1-eta0)*N.cos(phi0); Y,info = scipy.integrate.odeint(deriv,[u10,u20],N.arange(0,zmax+zstep,zstep),full_output=True,printmessg = True) # are conserved quantities constant? Ptot = N.abs(Y[:,0])**2/2+ N.abs(Y[:,1])**2 P.plot(N.arange(0,zmax+zstep-0.0001,zstep),Ptot) # then Hamiltonian... #... p1 = P.plot(N.arange(0,zmax+zstep-0.0001,zstep),N.abs(Y[:,0])) p2 = P.plot(N.arange(0,zmax+zstep-0.0001,zstep),N.abs(Y[:,1])) Thank you very much for your help. Andrea Armaroli From peridot.faceted at gmail.com Tue Nov 3 14:28:09 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Tue, 3 Nov 2009 14:28:09 -0500 Subject: [SciPy-User] Troubles with odeint or ode In-Reply-To: <20091103185905.M2354@unife.it> References: <20091103185905.M2354@unife.it> Message-ID: 2009/11/3 ANDREA ARMAROLI : > Dear users, > > I'm trying to solve an ODE system that models a parametric oscillator with > complex amplitudes at two freqencies. > > I'm new to python. This problem is simply solved in matlab using ode45. > > If I try to use odeint I cannot set parameters like tolerances or max step. You can in fact control these using the (optional) parameters hmax, rtol, and atol. > The result is constant-valued solutions. After a bit of experimentation (in particular, a print statement inside your derivative function) it turns out that the problem is that odeint does not support complex values (it silently discards imaginary parts). This is not a major obstacle, since you can just pack and unpack the values yourself. When I do that, the plot I get is two oscillatory results (the absolute values), and one nice flat line (for the total, which I'm guessing you need to be conserved). I have to say, it would be very good if odeint either reported an error or worked with complex values, so you would have found this easier to track down. But it looks like it does a reasonable job of solving your problem once you work around its lack of complex support. Anne > Trying with ode class and ZVODE integrator, I have that total intensity is not > conserved. I have weird quasi-periodic oscillations. 
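(For reference, a minimal sketch of the pack-and-unpack workaround just described; the foo.py attachment was scrubbed from the archive, so this is an independent illustration rather than the attached script. The real and imaginary parts are stacked into one real vector for odeint and the complex amplitudes are rebuilt afterwards.)

import numpy as np
from scipy.integrate import odeint

dk = 0.5  # normalised detuning/dispersion

def deriv_complex(A, z):
    A1, A2 = A
    return np.array([1j * A2 * np.conj(A1), 1j * (A1**2 / 2.0 + dk * A2)])

def deriv_packed(y, z):
    # y = [real parts, imaginary parts]: rebuild the complex amplitudes,
    # evaluate the complex derivative, then split it again for odeint
    A = y[:2] + 1j * y[2:]
    dA = deriv_complex(A, z)
    return np.concatenate([dA.real, dA.imag])

eta0 = 0.3
A0 = np.array([np.sqrt(2 * (1 - eta0)), np.sqrt(eta0)], dtype=complex)
y0 = np.concatenate([A0.real, A0.imag])
z = np.linspace(0.0, 10.0, 1001)

Y = odeint(deriv_packed, y0, z, rtol=1e-10, atol=1e-10)
A1 = Y[:, 0] + 1j * Y[:, 2]
A2 = Y[:, 1] + 1j * Y[:, 3]
print np.ptp(np.abs(A1)**2 / 2 + np.abs(A2)**2)  # conserved total; should be ~0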
> > I do know that my equations have excess degrees of freedom and I know from > symmetries what the integrals of motion are, but in Matlab this works pretty well. > > Here is the code with odeint > > import numpy as N > import scipy as S > import scipy.integrate > import pylab as P > > def deriv(Y,t): > > A1 = Y[0] > A2 = Y[1] > A1d = 1j*A2*N.conj(A1) > A2d = 1j*(A1**2/2.0 + dk*A2) > return [A1d, A2d] > > > > > nplot = 10000 > zmax = 10.0 > zstep = zmax/nplot > > dk = 0.5 # normalised detuning/dispersion > phi0 = 0.0*N.pi # initial dephasing > eta0 = 0.3 # initial pump intensity > > # initial values > u20 = N.sqrt(eta0)*N.exp(1j*phi0) > u10 = N.sqrt(2*(1-eta0)) > > H0=dk*eta0+2*N.sqrt(eta0)*(1-eta0)*N.cos(phi0); > > Y,info = > scipy.integrate.odeint(deriv,[u10,u20],N.arange(0,zmax+zstep,zstep),full_output=True,printmessg > = True) > # are conserved quantities constant? > Ptot = N.abs(Y[:,0])**2/2+ N.abs(Y[:,1])**2 > P.plot(N.arange(0,zmax+zstep-0.0001,zstep),Ptot) > # then Hamiltonian... > #... > > p1 = P.plot(N.arange(0,zmax+zstep-0.0001,zstep),N.abs(Y[:,0])) > p2 = P.plot(N.arange(0,zmax+zstep-0.0001,zstep),N.abs(Y[:,1])) > > > Thank you very much for your help. > > Andrea Armaroli > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- A non-text attachment was scrubbed... Name: foo.py Type: text/x-python Size: 1235 bytes Desc: not available URL: From fperez.net at gmail.com Tue Nov 3 14:28:55 2009 From: fperez.net at gmail.com (Fernando Perez) Date: Tue, 3 Nov 2009 11:28:55 -0800 Subject: [SciPy-User] [ANN] For SF Bay Area residents: a discussion with Guido at the Berkeley Py4Science seminar Message-ID: Hi folks, if you reside in the San Francisco Bay Area, you may be interested in a meeting we'll be having tomorrow November 4 (2-4 pm), as part of our regular py4science meeting series. Guido van Rossum, the creator of the Python language, will visit for a session where we will first do a very rapid overview of a number of scientific projects that use Python (in a lightning talk format) and then we will have an open discussion with Guido with hopefully interesting questions going in both directions. The meeting is open to all, bring your questions! More details on this seminar series (including location) can be found here: https://cirl.berkeley.edu/view/Py4Science Cheers, f From jagan_cbe2003 at yahoo.co.in Tue Nov 3 14:58:32 2009 From: jagan_cbe2003 at yahoo.co.in (jagan prabhu) Date: Wed, 4 Nov 2009 01:28:32 +0530 (IST) Subject: [SciPy-User] fmin_slsqp- Bounds are not obeyed Message-ID: <807497.44749.qm@web8316.mail.in.yahoo.com> Dear users, Problem is " Bounded with inequality constrains", in slssqp often Bounds are not obeyed, its deviates the bounds. So if deviates i made it to come back with in the region(bounds). But i face a problem in execution. i get error like, File "/usr/lib/python2.5/site-packages/scipy/optimize/slsqp.py", line 277, in fmin_slsqp ??? c_ieq = array([ ieqcons[i](x) for i in range(len(ieqcons)) ]) ? File "/usr/lib/python2.5/site-packages/scipy/optimize/optimize.py", line 97, in function_wrapper ??? return function(x, *args) TypeError: () takes exactly 1 argument (2 given) program will look like, #import os #import scipy.optimize from scipy import * import numpy from scipy import optimize from numpy import asarray from math import * def cst(aParams,bounds): ? aParams = numpy.asarray(aParams) ? 
for par in range(len(aParams)): ?? if ((bounds[par][0]<= aParams[par]<= bounds[par][1])): ???? pass ?? else: ???? if (aParams[par]< bounds[par][0]): aParams[par] = bounds[par][0] ???? if (aParams[par]> bounds[par][1]): aParams[par] = bounds[par][1] ? x = aParams[0] ? y = aParams[1] ? z = aParams[2] # objective function ? eqn = -cos(x)*cos(y)*cos(z)*log(-((x-pi)**2-(y-pi)**2-(z-pi)**2)) ? return eqn #Initial guess Init = numpy.array([5.0,15.0,17.0]) # parameters x,y,z bounds = [(2.0, 20000.0),(4.0, 50000.0),(5.0, 60000.0)] # inequality constraints x must be least,y larger than x smaller than z,and z the largest of all con1 = lambda x:numpy.asarray(x[1]-x[0], x[2]-x[1]) opt = fmin_slsqp(cst,Init,ieqcons= [con1] , bounds=bounds, fprime = None, args=(bounds,), full_output=True, iter=20000, iprint=2, acc=0.001) print '****************************************' print opt[0] print opt[1] print opt[2] print opt[4] Problems are, 1, bounds i could not able to pass to the function as args( ). 2, Whether implementation of the ineq. constraints are correct? any better way? 3, How to avoid bounds deviation? Please help me. Regards, Prabhu Now, send attachments up to 25MB with Yahoo! India Mail. Learn how. http://in.overview.mail.yahoo.com/photos -------------- next part -------------- An HTML attachment was scrubbed... URL: From zachary.pincus at yale.edu Tue Nov 3 15:09:17 2009 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Tue, 3 Nov 2009 15:09:17 -0500 Subject: [SciPy-User] fmin_slsqp- Bounds are not obeyed In-Reply-To: <807497.44749.qm@web8316.mail.in.yahoo.com> References: <807497.44749.qm@web8316.mail.in.yahoo.com> Message-ID: <7AE2F264-385C-4CC9-92CB-C486D6C05693@yale.edu> Hello, > Problem is " Bounded with inequality constrains", in slssqp often > Bounds are not obeyed, its deviates the bounds. So if deviates i > made it to come back with in the region(bounds). But i face a > problem in execution. i get error like, > > File "/usr/lib/python2.5/site-packages/scipy/optimize/slsqp.py", > line 277, in fmin_slsqp > c_ieq = array([ ieqcons[i](x) for i in range(len(ieqcons)) ]) > File "/usr/lib/python2.5/site-packages/scipy/optimize/ > optimize.py", line 97, in function_wrapper > return function(x, *args) > TypeError: () takes exactly 1 argument (2 given) I can't help with the bounds not being obeyed... but I can say that you'll probably need to provide more detail about what you mean by this for others to help though -- is it that the optimizer will occasionally evaluate the objective outside of the bounds (which I understand is normal?) or that the final results are out-of-bounds? Anyhow, the traceback explains exactly what the problem with the execution is. You define your inequality constraint as: con1 = lambda x:numpy.asarray(x[1]-x[0], x[2]-x[1]) but from the traceback, you can see that it is being called like: function(x, *args) The error is quite clear on the problem: your lambda takes one argument, but it is called with two. I assume that x is the current position, and args is just what you passed to the slsqp. So you should rewrite con1 as lambda x, args: whatever... Zach On Nov 3, 2009, at 2:58 PM, jagan prabhu wrote: > Dear users, > > Problem is " Bounded with inequality constrains", in slssqp often > Bounds are not obeyed, its deviates the bounds. So if deviates i > made it to come back with in the region(bounds). But i face a > problem in execution. 
i get error like, > > File "/usr/lib/python2.5/site-packages/scipy/optimize/slsqp.py", > line 277, in fmin_slsqp > c_ieq = array([ ieqcons[i](x) for i in range(len(ieqcons)) ]) > File "/usr/lib/python2.5/site-packages/scipy/optimize/ > optimize.py", line 97, in function_wrapper > return function(x, *args) > TypeError: () takes exactly 1 argument (2 given) > > program will look like, > > > #import os > #import scipy.optimize > from scipy import * > import numpy > from scipy import optimize > from numpy import asarray > from math import * > > > def cst(aParams,bounds): > aParams = numpy.asarray(aParams) > for par in range(len(aParams)): > if ((bounds[par][0]<= aParams[par]<= bounds[par][1])): > pass > else: > if (aParams[par]< bounds[par][0]): aParams[par] = bounds[par][0] > if (aParams[par]> bounds[par][1]): aParams[par] = bounds[par][1] > > x = aParams[0] > y = aParams[1] > z = aParams[2] > # objective function > eqn = -cos(x)*cos(y)*cos(z)*log(-((x-pi)**2-(y-pi)**2-(z-pi)**2)) > return eqn > > > #Initial guess > Init = numpy.array([5.0,15.0,17.0]) # parameters x,y,z > bounds = [(2.0, 20000.0),(4.0, 50000.0),(5.0, 60000.0)] > # inequality constraints x must be least,y larger than x smaller > than z,and z the largest of all > con1 = lambda x:numpy.asarray(x[1]-x[0], x[2]-x[1]) > > > opt = fmin_slsqp(cst,Init,ieqcons= [con1] , bounds=bounds, fprime = > None, args=(bounds,), full_output=True, iter=20000, iprint=2, > acc=0.001) > > print '****************************************' > > print opt[0] > print opt[1] > print opt[2] > print opt[4] > > Problems are, > 1, bounds i could not able to pass to the function as args( ). > 2, Whether implementation of the ineq. constraints are correct? any > better way? > 3, How to avoid bounds deviation? > > Please help me. > > Regards, > Prabhu > > Add whatever you love to the Yahoo! India homepage. Try now! > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From girishvs at gmail.com Tue Nov 3 16:28:21 2009 From: girishvs at gmail.com (Girish Venkatasubramanian) Date: Tue, 3 Nov 2009 13:28:21 -0800 Subject: [SciPy-User] GPG key issues when trying to install on RHEL Message-ID: Hello, I followed your instructions and copied the appropriate .repo file to /etc/yum.repos.d/ on my RHEL5 x86_64 machine. But when I try to install the python-numpy and python-scipy using yum, I get the following error Warning: rpmts_HdrFromFdno: Header V3 DSA signature: NOKEY, key ID eda8433f GPG key retrieval failed: [ Errno 4] IOError: I am trying to install via a proxy - and I have configured the yum.conf and set proxy_http and proxy_ftp directives. Any help is appreciated. Thanks From dwf at cs.toronto.edu Tue Nov 3 17:44:36 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Tue, 3 Nov 2009 17:44:36 -0500 Subject: [SciPy-User] GPG key issues when trying to install on RHEL In-Reply-To: References: Message-ID: On 3-Nov-09, at 4:28 PM, Girish Venkatasubramanian wrote: > Warning: rpmts_HdrFromFdno: Header V3 DSA signature: NOKEY, key ID > eda8433f > > GPG key retrieval failed: [ Errno 4] IOError: 'Connection timed out')> > > I am trying to install via a proxy - and I have configured the > yum.conf and set proxy_http and proxy_ftp directives. Hmm... You'd be better off asking Red Hat support (or possibly on a Fedora forum), the SciPy project doesn't maintain those package repositories. 
David From girishvs at gmail.com Tue Nov 3 18:09:17 2009 From: girishvs at gmail.com (Girish Venkatasubramanian) Date: Tue, 3 Nov 2009 15:09:17 -0800 Subject: [SciPy-User] GPG key issues when trying to install on RHEL In-Reply-To: References: Message-ID: Thanks David - but I managed to figure it out. The problem was that the key could not be retrieved because of proxy issues (even though the rpms themselves were being downloaded). So I downloaded the key (from the location in the .repo file) and imported it using rpm --import. After that, the installation went through OK Thanks. On Tue, Nov 3, 2009 at 2:44 PM, David Warde-Farley wrote: > On 3-Nov-09, at 4:28 PM, Girish Venkatasubramanian wrote: > >> Warning: rpmts_HdrFromFdno: Header V3 DSA signature: NOKEY, key ID >> eda8433f >> >> GPG key retrieval failed: [ Errno 4] IOError: > 'Connection timed out')> >> >> I am trying to install via a proxy - and I have configured the >> yum.conf and set proxy_http and proxy_ftp directives. > > > Hmm... You'd be better off asking Red Hat support (or possibly on a > Fedora forum), > the SciPy project doesn't maintain those package repositories. > > David > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Tue Nov 3 22:40:47 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 3 Nov 2009 22:40:47 -0500 Subject: [SciPy-User] fmin_slsqp- Bounds are not obeyed In-Reply-To: <7AE2F264-385C-4CC9-92CB-C486D6C05693@yale.edu> References: <807497.44749.qm@web8316.mail.in.yahoo.com> <7AE2F264-385C-4CC9-92CB-C486D6C05693@yale.edu> Message-ID: <1cd32cbb0911031940y2e2778b3o5a0867dd4942bfe6@mail.gmail.com> On Tue, Nov 3, 2009 at 3:09 PM, Zachary Pincus wrote: > Hello, > >> Problem is " Bounded with inequality constrains", in slssqp often >> Bounds are not obeyed, its deviates the bounds. So if deviates i >> made it to come back with in the region(bounds). But i face a >> problem in execution. i get error like, >> >> File "/usr/lib/python2.5/site-packages/scipy/optimize/slsqp.py", >> line 277, in fmin_slsqp >> ? ? c_ieq = array([ ieqcons[i](x) for i in range(len(ieqcons)) ]) >> ? File "/usr/lib/python2.5/site-packages/scipy/optimize/ >> optimize.py", line 97, in function_wrapper >> ? ? return function(x, *args) >> TypeError: () takes exactly 1 argument (2 given) > > I can't help with the bounds not being obeyed... but I can say that > you'll probably need to provide more detail about what you mean by > this for others to help though -- is it that the optimizer will > occasionally evaluate the objective outside of the bounds (which I > understand is normal?) or that the final results are out-of-bounds? > > Anyhow, the traceback explains exactly what the problem with the > execution is. You define your inequality constraint as: > con1 = lambda x:numpy.asarray(x[1]-x[0], x[2]-x[1]) > > but from the traceback, you can see that it is being called like: > function(x, *args) > > The error is quite clear on the problem: your lambda takes one > argument, but it is called with two. > > I assume that x is the current position, and args is just what you > passed to the slsqp. So you should rewrite con1 as lambda x, args: > whatever... > > Zach > > > > On Nov 3, 2009, at 2:58 PM, jagan prabhu wrote: > >> Dear users, >> >> Problem is " Bounded with inequality constrains", in slssqp often >> Bounds are not obeyed, its deviates the bounds. 
So if deviates i >> made it to come back with in the region(bounds). But i face a >> problem in execution. i get error like, >> >> File "/usr/lib/python2.5/site-packages/scipy/optimize/slsqp.py", >> line 277, in fmin_slsqp >> ? ? c_ieq = array([ ieqcons[i](x) for i in range(len(ieqcons)) ]) >> ? File "/usr/lib/python2.5/site-packages/scipy/optimize/ >> optimize.py", line 97, in function_wrapper >> ? ? return function(x, *args) >> TypeError: () takes exactly 1 argument (2 given) >> >> program will look like, >> >> >> #import os >> #import scipy.optimize >> from scipy import * >> import numpy >> from scipy import optimize >> from numpy import asarray >> from math import * >> >> >> def cst(aParams,bounds): >> ? aParams = numpy.asarray(aParams) >> ? for par in range(len(aParams)): >> ? ?if ((bounds[par][0]<= aParams[par]<= bounds[par][1])): >> ? ? ?pass >> ? ?else: >> ? ? ?if (aParams[par]< bounds[par][0]): aParams[par] = bounds[par][0] >> ? ? ?if (aParams[par]> bounds[par][1]): aParams[par] = bounds[par][1] >> >> ? x = aParams[0] >> ? y = aParams[1] >> ? z = aParams[2] >> # objective function >> ? eqn = -cos(x)*cos(y)*cos(z)*log(-((x-pi)**2-(y-pi)**2-(z-pi)**2)) >> ? return eqn >> >> >> #Initial guess >> Init = numpy.array([5.0,15.0,17.0]) # parameters x,y,z >> bounds = [(2.0, 20000.0),(4.0, 50000.0),(5.0, 60000.0)] >> # inequality constraints x must be least,y larger than x smaller >> than z,and z the largest of all >> con1 = lambda x:numpy.asarray(x[1]-x[0], x[2]-x[1]) >> >> >> opt = fmin_slsqp(cst,Init,ieqcons= [con1] , bounds=bounds, fprime = >> None, args=(bounds,), full_output=True, iter=20000, iprint=2, >> acc=0.001) >> >> print '****************************************' >> >> print opt[0] >> print opt[1] >> print opt[2] >> print opt[4] >> >> Problems are, >> 1, bounds i could not able to pass to the function as args( ). >> 2, Whether implementation of the ineq. constraints are correct? any >> better way? >> 3, How to avoid bounds deviation? >> >> Please help me. >> >> Regards, >> Prabhu >> According to the slsqp help inequality constraints are supposed to be a list of functions running your example with the args as mentioned by Zachary, produced a result that violated the second inequality changing to this con1 = [lambda x,args: x[1]-x[0], lambda x,args: x[2]-x[1]] opt = optimize.fmin_slsqp(cst, Init, ieqcons= con1 , bounds=bounds, fprime = None, args=(bounds,), full_output=True, iter=20000, iprint=2, acc=0.001) gives results with second ineq constraint binding: **************************************** [6.2838144667527311, 15.724071907665964, 15.724071907665964] -5.72459190155 6 Optimization terminated successfully. I don't know why in your example np.asarray(x[1]-x[0], x[2]-x[1]) doesn't raise an exception, the second argument to asarray is dtype, so this should be wrong. (missing []) >>> xx=opt[0] >>> xx [6.2838144667527311, 15.724071907665964, 15.724071907665964] >>> np.asarray(xx[1]-xx[0], xx[2]-xx[1]) array(9.440257440913232) There was also recently a nice example for slsqp on the mailing list. Josef From sebastian.walter at gmail.com Wed Nov 4 05:05:52 2009 From: sebastian.walter at gmail.com (Sebastian Walter) Date: Wed, 4 Nov 2009 11:05:52 +0100 Subject: [SciPy-User] Wrapping C/C++ Code In-Reply-To: <22A68FE3-73C7-4AC0-BBAF-B38448AC9219@yale.edu> References: <4d5dd8c20911030723y4a98cc74v18ec4a7ceb81ce70@mail.gmail.com> <4AF05513.8080307@gmail.com> <22A68FE3-73C7-4AC0-BBAF-B38448AC9219@yale.edu> Message-ID: 1) I'd also use ctypes whenever possible. 
Numpy offers good builtin support to make it easy to call C/Fortran functions that expect pointers to arrays. There is a nice tutorial on http://www.scipy.org/Cookbook/Ctypes Unfortunately, this route only works for C and not for C++, so you would have to write a C interface to a C++ library. 2) I use boost::python to wrap existing C++ projects in a quite verbose way, e.g. in http://github.com/b45ch1/pyadolc/blob/master/adolc/src/py_adolc.hpp It works reasonably well when you know what you are doing and it's also quite flexible. The downside is the documentation, the long compilation times and the "magic" template implementation that is hard to understand. hope that helps a little, Sebastian On Tue, Nov 3, 2009 at 5:25 PM, Zachary Pincus wrote: > Another option instead of SWIG, if you have a reasonably stable C API > and a pre-built shared library exporting the same, is to use ctypes to > call into it. This works well enough with many numeric APIs, too where > you can allocate arrays with numpy, and then use the array's ctypes > attribute to get at a pointer to the memory suitable for passing into > the C code. > > The downside is that (as far as I know) there's no good way to build > pure-C libraries as part of a "python setup.py build" step (though > some functionality along these lines might be now in the numpy > distutuls?). > > Zach > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From jagan_cbe2003 at yahoo.co.in Wed Nov 4 05:16:43 2009 From: jagan_cbe2003 at yahoo.co.in (jagan prabhu) Date: Wed, 4 Nov 2009 15:46:43 +0530 (IST) Subject: [SciPy-User] fmin_slsqp- Bounds are not obeyed In-Reply-To: <7AE2F264-385C-4CC9-92CB-C486D6C05693@yale.edu> Message-ID: <396448.739.qm@web8314.mail.in.yahoo.com> Thank you for your answers, the program is working now referring to "outside bounds" optimizer occasionally evaluate the objective outside of the bounds (which are quite abnormal for my case?) For example: occasionally fmin_slsqp passes parameters like? -45.0,-12.0,16 which perfectly obeys inequality constrain, but out of bounds.. my code is quite sensitive, at any case it should stick with in bounds. Please help me. Is there any problem in my Bound syntax ? * i face out of bounds problem in all Constrained (multivariate) optimization methods. --- On Wed, 4/11/09, Zachary Pincus wrote: From: Zachary Pincus Subject: Re: [SciPy-User] fmin_slsqp- Bounds are not obeyed To: "SciPy Users List" Date: Wednesday, 4 November, 2009, 1:39 AM Hello, > Problem is " Bounded with inequality constrains", in slssqp often? > Bounds are not obeyed, its deviates the bounds. So if deviates i? > made it to come back with in the region(bounds). But i face a? > problem in execution. i get error like, > > File "/usr/lib/python2.5/site-packages/scipy/optimize/slsqp.py",? > line 277, in fmin_slsqp >? ???c_ieq = array([ ieqcons[i](x) for i in range(len(ieqcons)) ]) >???File "/usr/lib/python2.5/site-packages/scipy/optimize/ > optimize.py", line 97, in function_wrapper >? ???return function(x, *args) > TypeError: () takes exactly 1 argument (2 given) I can't help with the bounds not being obeyed... but I can say that? you'll probably need to provide more detail about what you mean by? this for others to help though -- is it that the optimizer will? occasionally evaluate the objective outside of the bounds (which I? understand is normal?) or that the final results are out-of-bounds? 
Anyhow, the traceback explains exactly what the problem with the? execution is. You define your inequality constraint as: con1 = lambda x:numpy.asarray(x[1]-x[0], x[2]-x[1]) but from the traceback, you can see that it is being called like: function(x, *args) The error is quite clear on the problem: your lambda takes one? argument, but it is called with two. I assume that x is the current position, and args is just what you? passed to the slsqp. So you should rewrite con1 as lambda x, args:? whatever... Zach On Nov 3, 2009, at 2:58 PM, jagan prabhu wrote: > Dear users, > > Problem is " Bounded with inequality constrains", in slssqp often? > Bounds are not obeyed, its deviates the bounds. So if deviates i? > made it to come back with in the region(bounds). But i face a? > problem in execution. i get error like, > > File "/usr/lib/python2.5/site-packages/scipy/optimize/slsqp.py",? > line 277, in fmin_slsqp >? ???c_ieq = array([ ieqcons[i](x) for i in range(len(ieqcons)) ]) >???File "/usr/lib/python2.5/site-packages/scipy/optimize/ > optimize.py", line 97, in function_wrapper >? ???return function(x, *args) > TypeError: () takes exactly 1 argument (2 given) > > program will look like, > > > #import os > #import scipy.optimize > from scipy import * > import numpy > from scipy import optimize > from numpy import asarray > from math import * > > > def cst(aParams,bounds): >???aParams = numpy.asarray(aParams) >???for par in range(len(aParams)): >? ? if ((bounds[par][0]<= aParams[par]<= bounds[par][1])): >? ? ? pass >? ? else: >? ? ? if (aParams[par]< bounds[par][0]): aParams[par] = bounds[par][0] >? ? ? if (aParams[par]> bounds[par][1]): aParams[par] = bounds[par][1] > >???x = aParams[0] >???y = aParams[1] >???z = aParams[2] > # objective function >???eqn = -cos(x)*cos(y)*cos(z)*log(-((x-pi)**2-(y-pi)**2-(z-pi)**2)) >???return eqn > > > #Initial guess > Init = numpy.array([5.0,15.0,17.0]) # parameters x,y,z > bounds = [(2.0, 20000.0),(4.0, 50000.0),(5.0, 60000.0)] > # inequality constraints x must be least,y larger than x smaller? > than z,and z the largest of all > con1 = lambda x:numpy.asarray(x[1]-x[0], x[2]-x[1]) > > > opt = fmin_slsqp(cst,Init,ieqcons= [con1] , bounds=bounds, fprime =? > None, args=(bounds,), full_output=True, iter=20000, iprint=2,? > acc=0.001) > > print '****************************************' > > print opt[0] > print opt[1] > print opt[2] > print opt[4] > > Problems are, > 1, bounds i could not able to pass to the function as args( ). > 2, Whether implementation of the ineq. constraints are correct? any? > better way? > 3, How to avoid bounds deviation? > > Please help me. > > Regards, > Prabhu > > Add whatever you love to the Yahoo! India homepage. Try now! > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user Yahoo! India has a new look. Take a sneak peek http://in.yahoo.com/trynew -------------- next part -------------- An HTML attachment was scrubbed... URL: From zunbeltz at gmail.com Wed Nov 4 07:10:26 2009 From: zunbeltz at gmail.com (Zunbeltz Izaola) Date: Wed, 04 Nov 2009 13:10:26 +0100 Subject: [SciPy-User] iterating in a timeseries Message-ID: <1257336626.13402.3.camel@mineat2.hmi.de> Hi, I am using scikits.timeseries. I have a time series with daily frequency, with data of 2 years. 
I have data almost every day, but some days are missing. I want to iterate over the timeseries to get the values of the first and last day of each month. I had tried to convert freq to 'M' and other things, but I can not find an easy way. Any idea? TIA, Zunbeltz From kmichael.aye at googlemail.com Wed Nov 4 08:09:46 2009 From: kmichael.aye at googlemail.com (Michael Aye) Date: Wed, 4 Nov 2009 05:09:46 -0800 (PST) Subject: [SciPy-User] Troubles with odeint or ode In-Reply-To: References: <20091103185905.M2354@unife.it> Message-ID: <13a85e27-9a1e-4768-9cb5-9c9b2dd777ea@k17g2000yqh.googlegroups.com> I think this example is worth to be kept in the cookbook, what do you guys think? It is showing how to do the oscillator modelling and can show the 'danger' of lack of complex support of odeint. Just my feeling it could be quite helpful. I myself certainly copied this into my treasure of worth-to-remember evernotes.. ;) Regards, Michael On Nov 3, 8:28?pm, Anne Archibald wrote: > 2009/11/3 ANDREA ARMAROLI : > > > Dear users, > > > I'm trying to solve an ODE system that models a parametric oscillator with > > complex amplitudes at two freqencies. > > > I'm new to python. This problem is simply solved in matlab using ode45. > > > If I try to use odeint I cannot set parameters like tolerances or max step. > > You can in fact control these using the (optional) parameters hmax, > rtol, and atol. > > > The result is constant-valued solutions. > > After a bit of experimentation (in particular, a print statement > inside your derivative function) it turns out that the problem is that > odeint does not support complex values (it silently discards imaginary > parts). This is not a major obstacle, since you can just pack and > unpack the values yourself. When I do that, the plot I get is two > oscillatory results (the absolute values), and one nice flat line (for > the total, which I'm guessing you need to be conserved). > > I have to say, it would be very good if odeint either reported an > error or worked with complex values, so you would have found this > easier to track down. But it looks like it does a reasonable job of > solving your problem once you work around its lack of complex support. > > Anne > > > > > Trying with ode class and ZVODE integrator, I have that total intensity is not > > conserved. I have weird quasi-periodic oscillations. > > > I do know that my equations have excess degrees of freedom and I know from > > symmetries what the integrals of motion are, but in Matlab this works pretty well. > > > Here is the code with odeint > > > import numpy as N > > import scipy as S > > import scipy.integrate > > import pylab as P > > > def deriv(Y,t): > > > ? ?A1 = Y[0] > > ? ?A2 = Y[1] > > ? ?A1d = 1j*A2*N.conj(A1) > > ? ?A2d = 1j*(A1**2/2.0 + dk*A2) > > ? ?return [A1d, A2d] > > > nplot = 10000 > > zmax = 10.0 > > zstep = zmax/nplot > > > dk = 0.5 # normalised detuning/dispersion > > phi0 = 0.0*N.pi # initial dephasing > > eta0 = 0.3 # initial pump intensity > > > # initial values > > u20 = N.sqrt(eta0)*N.exp(1j*phi0) > > u10 = N.sqrt(2*(1-eta0)) > > > H0=dk*eta0+2*N.sqrt(eta0)*(1-eta0)*N.cos(phi0); > > > Y,info = > > scipy.integrate.odeint(deriv,[u10,u20],N.arange(0,zmax+zstep,zstep),full_ou tput=True,printmessg > > = True) > > # are conserved quantities constant? > > Ptot = N.abs(Y[:,0])**2/2+ N.abs(Y[:,1])**2 > > P.plot(N.arange(0,zmax+zstep-0.0001,zstep),Ptot) > > # then Hamiltonian... > > #... 
> > > p1 ?= P.plot(N.arange(0,zmax+zstep-0.0001,zstep),N.abs(Y[:,0])) > > p2 ?= P.plot(N.arange(0,zmax+zstep-0.0001,zstep),N.abs(Y[:,1])) > > > Thank you very much for your help. > > > Andrea Armaroli > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-U... at scipy.org > >http://mail.scipy.org/mailman/listinfo/scipy-user > > > > ?foo.py > 1KViewDownload > > _______________________________________________ > SciPy-User mailing list > SciPy-U... at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user From dave.hirschfeld at gmail.com Wed Nov 4 08:49:27 2009 From: dave.hirschfeld at gmail.com (Dave Hirschfeld) Date: Wed, 4 Nov 2009 13:49:27 +0000 (UTC) Subject: [SciPy-User] iterating in a timeseries References: <1257336626.13402.3.camel@mineat2.hmi.de> Message-ID: Zunbeltz Izaola gmail.com> writes: > > Hi, > > I am using scikits.timeseries. > > I have a time series with daily frequency, with data of 2 years. I have > data almost every day, but some days are missing. > > I want to iterate over the timeseries to get the values of the first and > last day of each month. I had tried to convert freq to 'M' and other > things, but I can not find an easy way. Any idea? > > TIA, > > Zunbeltz > Does this do what you're looking for? from numpy.random import randint import scikits.timeseries as ts dates = ts.date_array(ts.Date('D','01-Jan-2009'),ts.Date('D','31-Dec-2011')) series = ts.time_series(dates.day,dates) monthly_series = series.convert('M') ts.first_unmasked_val(monthly_series,axis=1) ts.last_unmasked_val(monthly_series,axis=1) # Example with missing data series[randint(0,series.size,512)] = ma.masked monthly_series = series.convert('M') ts.first_unmasked_val(monthly_series,axis=1) ts.last_unmasked_val(monthly_series,axis=1) HTH, Dave From Jim.Vickroy at noaa.gov Wed Nov 4 10:03:39 2009 From: Jim.Vickroy at noaa.gov (Jim Vickroy) Date: Wed, 04 Nov 2009 08:03:39 -0700 Subject: [SciPy-User] Troubles with odeint or ode In-Reply-To: <13a85e27-9a1e-4768-9cb5-9c9b2dd777ea@k17g2000yqh.googlegroups.com> References: <20091103185905.M2354@unife.it> <13a85e27-9a1e-4768-9cb5-9c9b2dd777ea@k17g2000yqh.googlegroups.com> Message-ID: <4AF197CB.2040402@noaa.gov> Michael Aye wrote: > I think this example is worth to be kept in the cookbook, what do you > guys think? > +1 I thought the same but did not speak up. -- jv > It is showing how to do the oscillator modelling and can show the > 'danger' of lack of complex support of odeint. > > Just my feeling it could be quite helpful. > I myself certainly copied this into my treasure of worth-to-remember > evernotes.. ;) > > Regards, > Michael > > On Nov 3, 8:28 pm, Anne Archibald wrote: > >> 2009/11/3 ANDREA ARMAROLI : >> >> >>> Dear users, >>> >>> I'm trying to solve an ODE system that models a parametric oscillator with >>> complex amplitudes at two freqencies. >>> >>> I'm new to python. This problem is simply solved in matlab using ode45. >>> >>> If I try to use odeint I cannot set parameters like tolerances or max step. >>> >> You can in fact control these using the (optional) parameters hmax, >> rtol, and atol. >> >> >>> The result is constant-valued solutions. >>> >> After a bit of experimentation (in particular, a print statement >> inside your derivative function) it turns out that the problem is that >> odeint does not support complex values (it silently discards imaginary >> parts). This is not a major obstacle, since you can just pack and >> unpack the values yourself. 
When I do that, the plot I get is two >> oscillatory results (the absolute values), and one nice flat line (for >> the total, which I'm guessing you need to be conserved). >> >> I have to say, it would be very good if odeint either reported an >> error or worked with complex values, so you would have found this >> easier to track down. But it looks like it does a reasonable job of >> solving your problem once you work around its lack of complex support. >> >> Anne >> >> >> >> >>> Trying with ode class and ZVODE integrator, I have that total intensity is not >>> conserved. I have weird quasi-periodic oscillations. >>> >>> I do know that my equations have excess degrees of freedom and I know from >>> symmetries what the integrals of motion are, but in Matlab this works pretty well. >>> >>> Here is the code with odeint >>> >>> import numpy as N >>> import scipy as S >>> import scipy.integrate >>> import pylab as P >>> >>> def deriv(Y,t): >>> >>> A1 = Y[0] >>> A2 = Y[1] >>> A1d = 1j*A2*N.conj(A1) >>> A2d = 1j*(A1**2/2.0 + dk*A2) >>> return [A1d, A2d] >>> >>> nplot = 10000 >>> zmax = 10.0 >>> zstep = zmax/nplot >>> >>> dk = 0.5 # normalised detuning/dispersion >>> phi0 = 0.0*N.pi # initial dephasing >>> eta0 = 0.3 # initial pump intensity >>> >>> # initial values >>> u20 = N.sqrt(eta0)*N.exp(1j*phi0) >>> u10 = N.sqrt(2*(1-eta0)) >>> >>> H0=dk*eta0+2*N.sqrt(eta0)*(1-eta0)*N.cos(phi0); >>> >>> Y,info = >>> scipy.integrate.odeint(deriv,[u10,u20],N.arange(0,zmax+zstep,zstep),full_ou tput=True,printmessg >>> = True) >>> # are conserved quantities constant? >>> Ptot = N.abs(Y[:,0])**2/2+ N.abs(Y[:,1])**2 >>> P.plot(N.arange(0,zmax+zstep-0.0001,zstep),Ptot) >>> # then Hamiltonian... >>> #... >>> >>> p1 = P.plot(N.arange(0,zmax+zstep-0.0001,zstep),N.abs(Y[:,0])) >>> p2 = P.plot(N.arange(0,zmax+zstep-0.0001,zstep),N.abs(Y[:,1])) >>> >>> Thank you very much for your help. >>> >>> Andrea Armaroli >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-U... at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> >> foo.py >> 1KViewDownload >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-U... at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From peridot.faceted at gmail.com Wed Nov 4 10:11:34 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Wed, 4 Nov 2009 10:11:34 -0500 Subject: [SciPy-User] Troubles with odeint or ode In-Reply-To: <4AF197CB.2040402@noaa.gov> References: <20091103185905.M2354@unife.it> <13a85e27-9a1e-4768-9cb5-9c9b2dd777ea@k17g2000yqh.googlegroups.com> <4AF197CB.2040402@noaa.gov> Message-ID: 2009/11/4 Jim Vickroy : > Michael Aye wrote: > > I think this example is worth to be kept in the cookbook, what do you > guys think? In the short term it's probably worth putting something like this in the cookbook (though don't use the OP's code without permission!) but really the solution is to fix odeint (which shouldn't be too difficult). I put a ticket in Trac, but haven't had time to actually fix the problem yet. Anne > +1 > > I thought the same but did not speak up. > -- jv > > It is showing how to do the oscillator modelling and can show the > 'danger' of lack of complex support of odeint. > > Just my feeling it could be quite helpful. 
> I myself certainly copied this into my treasure of worth-to-remember > evernotes.. ;) > > Regards, > Michael > > On Nov 3, 8:28?pm, Anne Archibald wrote: > > > 2009/11/3 ANDREA ARMAROLI : > > > > Dear users, > > > I'm trying to solve an ODE system that models a parametric oscillator with > complex amplitudes at two freqencies. > > > I'm new to python. This problem is simply solved in matlab using ode45. > > > If I try to use odeint I cannot set parameters like tolerances or max step. > > > You can in fact control these using the (optional) parameters hmax, > rtol, and atol. > > > > The result is constant-valued solutions. > > > After a bit of experimentation (in particular, a print statement > inside your derivative function) it turns out that the problem is that > odeint does not support complex values (it silently discards imaginary > parts). This is not a major obstacle, since you can just pack and > unpack the values yourself. When I do that, the plot I get is two > oscillatory results (the absolute values), and one nice flat line (for > the total, which I'm guessing you need to be conserved). > > I have to say, it would be very good if odeint either reported an > error or worked with complex values, so you would have found this > easier to track down. But it looks like it does a reasonable job of > solving your problem once you work around its lack of complex support. > > Anne > > > > > > Trying with ode class and ZVODE integrator, I have that total intensity is > not > conserved. I have weird quasi-periodic oscillations. > > > I do know that my equations have excess degrees of freedom and I know from > symmetries what the integrals of motion are, but in Matlab this works pretty > well. > > > Here is the code with odeint > > > import numpy as N > import scipy as S > import scipy.integrate > import pylab as P > > > def deriv(Y,t): > > > ? ?A1 = Y[0] > ? ?A2 = Y[1] > ? ?A1d = 1j*A2*N.conj(A1) > ? ?A2d = 1j*(A1**2/2.0 + dk*A2) > ? ?return [A1d, A2d] > > > nplot = 10000 > zmax = 10.0 > zstep = zmax/nplot > > > dk = 0.5 # normalised detuning/dispersion > phi0 = 0.0*N.pi # initial dephasing > eta0 = 0.3 # initial pump intensity > > > # initial values > u20 = N.sqrt(eta0)*N.exp(1j*phi0) > u10 = N.sqrt(2*(1-eta0)) > > > H0=dk*eta0+2*N.sqrt(eta0)*(1-eta0)*N.cos(phi0); > > > Y,info = > scipy.integrate.odeint(deriv,[u10,u20],N.arange(0,zmax+zstep,zstep),full_ou > tput=True,printmessg > = True) > # are conserved quantities constant? > Ptot = N.abs(Y[:,0])**2/2+ N.abs(Y[:,1])**2 > P.plot(N.arange(0,zmax+zstep-0.0001,zstep),Ptot) > # then Hamiltonian... > #... > > > p1 ?= P.plot(N.arange(0,zmax+zstep-0.0001,zstep),N.abs(Y[:,0])) > p2 ?= P.plot(N.arange(0,zmax+zstep-0.0001,zstep),N.abs(Y[:,1])) > > > Thank you very much for your help. > > > Andrea Armaroli > > > _______________________________________________ > SciPy-User mailing list > SciPy-U... at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > > ?foo.py > 1KViewDownload > > _______________________________________________ > SciPy-User mailing list > SciPy-U... 
at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From jsseabold at gmail.com Wed Nov 4 20:25:12 2009 From: jsseabold at gmail.com (Skipper Seabold) Date: Wed, 4 Nov 2009 20:25:12 -0500 Subject: [SciPy-User] Reshaping Question Message-ID: My brain is failing me. Is there a clean way to reshape an array like the following? import numpy as np c = np.arange(16).reshape(4, 2, 2) In [209]: c Out[209]: array([[[ 0, 1], [ 2, 3]], [[ 4, 5], [ 6, 7]], [[ 8, 9], [10, 11]], [[12, 13], [14, 15]]]) So that c == d where d = np.array(([0, 1, 4, 5], [2,3,6,7], [8,9,12,13], [10, 11, 14, 15])) In [211]: d Out[211]: array([[ 0, 1, 4, 5], [ 2, 3, 6, 7], [ 8, 9, 12, 13], [10, 11, 14, 15]]) Cheers, Skipper From dwf at cs.toronto.edu Wed Nov 4 20:08:35 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Wed, 4 Nov 2009 20:08:35 -0500 Subject: [SciPy-User] Reshaping Question In-Reply-To: References: Message-ID: <20091105010834.GA15007@rodimus> Hi Skipper, No, I don't believe so. The reason is that NumPy arrays have to obey constant stride along each dimension. Assuming dtype is int32, the reshaping you describe (assuming you want to reshape c into d) would require the stride along dim 2 to be 4 bytes to get from 0 to 1, and then 12 bytes to get to 4, and then 4 bytes again to get to 5. This isn't legal, you'd have to do a copy to construct this matrix. David On Wed, Nov 04, 2009 at 08:25:12PM -0500, Skipper Seabold wrote: > My brain is failing me. Is there a clean way to reshape an array like > the following? > > import numpy as np > > c = np.arange(16).reshape(4, 2, 2) > > In [209]: c > Out[209]: > array([[[ 0, 1], > [ 2, 3]], > > [[ 4, 5], > [ 6, 7]], > > [[ 8, 9], > [10, 11]], > > [[12, 13], > [14, 15]]]) > > So that c == d where > > d = np.array(([0, 1, 4, 5], [2,3,6,7], [8,9,12,13], [10, 11, 14, 15])) > > In [211]: d > Out[211]: > array([[ 0, 1, 4, 5], > [ 2, 3, 6, 7], > [ 8, 9, 12, 13], > [10, 11, 14, 15]]) > > Cheers, > > Skipper > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From peridot.faceted at gmail.com Wed Nov 4 21:05:54 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Wed, 4 Nov 2009 21:05:54 -0500 Subject: [SciPy-User] Reshaping Question In-Reply-To: <20091105010834.GA15007@rodimus> References: <20091105010834.GA15007@rodimus> Message-ID: 2009/11/4 David Warde-Farley : > Hi Skipper, > > No, I don't believe so. The reason is that NumPy arrays have to obey constant stride along each dimension. Assuming > dtype is int32, the reshaping you describe (assuming you want to reshape c into d) would require the stride along dim > 2 to be 4 bytes to get from 0 to 1, and then 12 bytes to get to 4, and then 4 bytes again to get to 5. This isn't > legal, you'd have to do a copy to construct this matrix. Reshape sometimes creates copies. 
It tries hard not to, and if you assign the shape attribute rather than calling reshape it won't ever make a copy, but if necessary reshape will copy the input array: In [42]: np.transpose(c.reshape(2,2,2,2),(0,2,1,3)).reshape(4,4)Out[42]: array([[ 0, 1, 4, 5], [ 2, 3, 6, 7], [ 8, 9, 12, 13], [10, 11, 14, 15]]) The trick is to use transpose to do an arbitrary permutation of the input axes, and also to rearrange the first axis with an additional reshape. Anne > David > > On Wed, Nov 04, 2009 at 08:25:12PM -0500, Skipper Seabold wrote: >> My brain is failing me. ?Is there a clean way to reshape an array like >> the following? >> >> import numpy as np >> >> c = np.arange(16).reshape(4, 2, 2) >> >> In [209]: c >> Out[209]: >> array([[[ 0, ?1], >> ? ? ? ? [ 2, ?3]], >> >> ? ? ? ?[[ 4, ?5], >> ? ? ? ? [ 6, ?7]], >> >> ? ? ? ?[[ 8, ?9], >> ? ? ? ? [10, 11]], >> >> ? ? ? ?[[12, 13], >> ? ? ? ? [14, 15]]]) >> >> So that c == d where >> >> d = np.array(([0, 1, 4, 5], [2,3,6,7], [8,9,12,13], [10, 11, 14, 15])) >> >> In [211]: d >> Out[211]: >> array([[ 0, ?1, ?4, ?5], >> ? ? ? ?[ 2, ?3, ?6, ?7], >> ? ? ? ?[ 8, ?9, 12, 13], >> ? ? ? ?[10, 11, 14, 15]]) >> >> Cheers, >> >> Skipper >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From dwf at cs.toronto.edu Wed Nov 4 22:12:18 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Wed, 4 Nov 2009 22:12:18 -0500 Subject: [SciPy-User] Reshaping Question In-Reply-To: References: <20091105010834.GA15007@rodimus> Message-ID: <38D1783F-AF6F-4915-B597-569CE49707A4@cs.toronto.edu> On 4-Nov-09, at 9:05 PM, Anne Archibald wrote: > Reshape sometimes creates copies. It tries hard not to, and if you > assign the shape attribute rather than calling reshape it won't ever > make a copy, but if necessary reshape will copy the input array: > > In [42]: np.transpose(c.reshape(2,2,2,2), > (0,2,1,3)).reshape(4,4)Out[42]: > array([[ 0, 1, 4, 5], > [ 2, 3, 6, 7], > [ 8, 9, 12, 13], > [10, 11, 14, 15]]) > > The trick is to use transpose to do an arbitrary permutation of the > input axes, and also to rearrange the first axis with an additional > reshape. D'oh. When he said reshape I was thinking purely in terms of what could be done with .reshape(). I didn't even think about .transpose(). Is it then the .transpose() call that triggers the copy in this situation? David From peridot.faceted at gmail.com Wed Nov 4 22:49:57 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Wed, 4 Nov 2009 22:49:57 -0500 Subject: [SciPy-User] Reshaping Question In-Reply-To: <38D1783F-AF6F-4915-B597-569CE49707A4@cs.toronto.edu> References: <20091105010834.GA15007@rodimus> <38D1783F-AF6F-4915-B597-569CE49707A4@cs.toronto.edu> Message-ID: 2009/11/4 David Warde-Farley : > > On 4-Nov-09, at 9:05 PM, Anne Archibald wrote: > >> Reshape sometimes creates copies. It tries hard not to, and if you >> assign the shape attribute rather than calling reshape it won't ever >> make a copy, but if necessary reshape will copy the input array: >> >> In [42]: np.transpose(c.reshape(2,2,2,2), >> (0,2,1,3)).reshape(4,4)Out[42]: >> array([[ 0, ?1, ?4, ?5], >> ? ? ? [ 2, ?3, ?6, ?7], >> ? ? ? [ 8, ?9, 12, 13], >> ? ? ? 
[10, 11, 14, 15]]) >> >> The trick is to use transpose to do an arbitrary permutation of the >> input axes, and also to rearrange the first axis with an additional >> reshape. > > D'oh. When he said reshape I was thinking purely in terms of what > could be done with .reshape(). I didn't even think about .transpose(). > > Is it then the .transpose() call that triggers the copy in this > situation? No, transpose() never needs to copy. It's the reshape. In [3]: a = np.arange(6) In [4]: a.shape = (2,3) Here a's shape can be changed without copying the data, so the assignment works. In [5]: b = a.T In [6]: b.reshape(6) Out[6]: array([0, 3, 1, 4, 2, 5]) Here b has been reshaped using the method, which returns a new array that has copied the underlying data. In [7]: b.shape = 6, --------------------------------------------------------------------------- AttributeError Traceback (most recent call last) /home/peridot/ in () AttributeError: incompatible shape for a non-contiguous array This didn't work because assignment to the shape attribute is not allowed to copy the data, and there's no way for this reshape to work without copying the data. The error message is misleading, because there are a number of copy-less rearrangements that are possible even with non-contiguous arrays: In [9]: c = np.arange(12)[::2] In [10]: c.shape = (2,3) In [11]: c.shape = (3,2) IIRC there are still a few rearrangements that can in principle be done without copying the data that numpy doesn't recognize, but it's fairly good at avoiding copies. This is not necessarily a good thing, since it can mean that users expect reshape() never to copy data and then become surprised when they fail to get a view when handed some array whose strides are especially peculiar. So I think the best rule is, if you want a view, always assign to the shape attribute. Anne From jsseabold at gmail.com Wed Nov 4 22:56:10 2009 From: jsseabold at gmail.com (Skipper Seabold) Date: Wed, 4 Nov 2009 22:56:10 -0500 Subject: [SciPy-User] Reshaping Question In-Reply-To: References: <20091105010834.GA15007@rodimus> Message-ID: On Wed, Nov 4, 2009 at 9:05 PM, Anne Archibald wrote: > 2009/11/4 David Warde-Farley : >> Hi Skipper, >> >> No, I don't believe so. The reason is that NumPy arrays have to obey constant stride along each dimension. Assuming >> dtype is int32, the reshaping you describe (assuming you want to reshape c into d) would require the stride along dim >> 2 to be 4 bytes to get from 0 to 1, and then 12 bytes to get to 4, and then 4 bytes again to get to 5. This isn't >> legal, you'd have to do a copy to construct this matrix. Ah ok. This makes sense, and is kind of why I thought I couldn't do what I wanted as easily, as I'd like. > > Reshape sometimes creates copies. It tries hard not to, and if you > assign the shape attribute rather than calling reshape it won't ever > make a copy, but if necessary reshape will copy the input array: > > In [42]: np.transpose(c.reshape(2,2,2,2),(0,2,1,3)).reshape(4,4)Out[42]: > array([[ 0, ?1, ?4, ?5], > ? ? ? [ 2, ?3, ?6, ?7], > ? ? ? [ 8, ?9, 12, 13], > ? ? ? [10, 11, 14, 15]]) > > The trick is to use transpose to do an arbitrary permutation of the > input axes, and also to rearrange the first axis with an additional > reshape. > > Anne > This makes sense as well. This is kind of what I was looking for I just couldn't figure out the permutation. I was trying to roll the axes, though I guess this could still work if you add the extra axis. 
I don't know if I'd use this in the end though, as it might sacrifice too much readability in the code, but maybe that's just me... What if I had the outermost container as a list? Say, c = [np.arange(4).reshape(2,2),np.arange(4,8).reshape(2,2),np.arange(8,12).reshape(2,2),np.arange(12,16).reshape(2,2)] I seem to be running into much the same problems trying to use list comprehension to end up with d. It seems like I'm going to need a copy anyway, so maybe I'd be better off just allocating a new array and filling it up transparently? Skipper >> David >> >> On Wed, Nov 04, 2009 at 08:25:12PM -0500, Skipper Seabold wrote: >>> My brain is failing me. ?Is there a clean way to reshape an array like >>> the following? >>> >>> import numpy as np >>> >>> c = np.arange(16).reshape(4, 2, 2) >>> >>> In [209]: c >>> Out[209]: >>> array([[[ 0, ?1], >>> ? ? ? ? [ 2, ?3]], >>> >>> ? ? ? ?[[ 4, ?5], >>> ? ? ? ? [ 6, ?7]], >>> >>> ? ? ? ?[[ 8, ?9], >>> ? ? ? ? [10, 11]], >>> >>> ? ? ? ?[[12, 13], >>> ? ? ? ? [14, 15]]]) >>> >>> So that c == d where >>> >>> d = np.array(([0, 1, 4, 5], [2,3,6,7], [8,9,12,13], [10, 11, 14, 15])) >>> >>> In [211]: d >>> Out[211]: >>> array([[ 0, ?1, ?4, ?5], >>> ? ? ? ?[ 2, ?3, ?6, ?7], >>> ? ? ? ?[ 8, ?9, 12, 13], >>> ? ? ? ?[10, 11, 14, 15]]) >>> >>> Cheers, >>> >>> Skipper >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From jsseabold at gmail.com Wed Nov 4 23:42:57 2009 From: jsseabold at gmail.com (Skipper Seabold) Date: Wed, 4 Nov 2009 23:42:57 -0500 Subject: [SciPy-User] Reshaping Question In-Reply-To: References: <20091105010834.GA15007@rodimus> Message-ID: On Wed, Nov 4, 2009 at 10:56 PM, Skipper Seabold wrote: >> >> Reshape sometimes creates copies. It tries hard not to, and if you >> assign the shape attribute rather than calling reshape it won't ever >> make a copy, but if necessary reshape will copy the input array: >> >> In [42]: np.transpose(c.reshape(2,2,2,2),(0,2,1,3)).reshape(4,4)Out[42]: >> array([[ 0, ?1, ?4, ?5], >> ? ? ? [ 2, ?3, ?6, ?7], >> ? ? ? [ 8, ?9, 12, 13], >> ? ? ? [10, 11, 14, 15]]) >> >> The trick is to use transpose to do an arbitrary permutation of the >> input axes, and also to rearrange the first axis with an additional >> reshape. >> >> Anne >> > > This makes sense as well. ?This is kind of what I was looking for I > just couldn't figure out the permutation. ?I was trying to roll the > axes, though I guess this could still work if you add the extra axis. > > I don't know if I'd use this in the end though, as it might sacrifice > too much readability in the code, but maybe that's just me... > The more I think about it, this is actually pretty elegant. I'm always going to have a c (it's really a Hessian from a multinomial logit) that's J**2*K x K, so I can just replace (2,2,2,2) with (J,J,K,K) and (4,4) with (J * K, J * K), and I think it's still pretty clear. Thanks! 
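For the record, a minimal sketch of that general case (the helper name block_reshape is purely illustrative; it assumes the Hessian is stored as J*J row-major blocks, each K x K, exactly like the toy array earlier in this thread):

import numpy as np

def block_reshape(c, J, K):
    # stack of J*J blocks of shape (K, K) -> one (J*K, J*K) block matrix,
    # using the transpose trick Anne showed above
    return np.transpose(c.reshape(J, J, K, K), (0, 2, 1, 3)).reshape(J*K, J*K)

# the small example from this thread corresponds to J = K = 2
c = np.arange(16).reshape(4, 2, 2)
d = np.array([[0, 1, 4, 5], [2, 3, 6, 7], [8, 9, 12, 13], [10, 11, 14, 15]])
assert (block_reshape(c, 2, 2) == d).all()
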
Skipper From matthew.brett at gmail.com Thu Nov 5 03:04:07 2009 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 5 Nov 2009 00:04:07 -0800 Subject: [SciPy-User] loading mat file in scipy In-Reply-To: <63321.93.37.128.116.1256227162.squirrel@webmail.sissa.it> References: <63321.93.37.128.116.1256227162.squirrel@webmail.sissa.it> Message-ID: <1e2af89e0911050004u5bb11942l9d28f8023963674b@mail.gmail.com> Hi, > I am sure the cause of ?slow loading of your file and that of mine are the > same. It took ~56sec on my computer to load your data into the python. If you are still interested in this problem, please consider taking a look at a branch I'm working on: git clone git://github.com/matthew-brett/scipy-work.git scipy-mb cd scipy-mb git checkout mio-optimization # make (largish) cython .c file cython scipy/io/matlab/mio5_utils.pyx python setup.py install It's about three times faster for loading Robin's file, at least (10s on my laptop). Best, Matthew From zunbeltz at gmail.com Thu Nov 5 03:45:32 2009 From: zunbeltz at gmail.com (Zunbeltz Izaola) Date: Thu, 05 Nov 2009 09:45:32 +0100 Subject: [SciPy-User] iterating in a timeseries In-Reply-To: References: <1257336626.13402.3.camel@mineat2.hmi.de> Message-ID: <1257410732.13402.5.camel@mineat2.hmi.de> On Wed, 2009-11-04 at 13:49 +0000, Dave Hirschfeld wrote: > Zunbeltz Izaola gmail.com> writes: > > > > > Hi, > > > > I am using scikits.timeseries. > > > > I have a time series with daily frequency, with data of 2 years. I have > > data almost every day, but some days are missing. > > > > I want to iterate over the timeseries to get the values of the first and > > last day of each month. I had tried to convert freq to 'M' and other > > things, but I can not find an easy way. Any idea? > > > > TIA, > > > > Zunbeltz > > > > > Does this do what you're looking for? > Thanks, It works perfectly, Zunbeltz > from numpy.random import randint > import scikits.timeseries as ts > dates = ts.date_array(ts.Date('D','01-Jan-2009'),ts.Date('D','31-Dec-2011')) > series = ts.time_series(dates.day,dates) > monthly_series = series.convert('M') > ts.first_unmasked_val(monthly_series,axis=1) > ts.last_unmasked_val(monthly_series,axis=1) > > # Example with missing data > series[randint(0,series.size,512)] = ma.masked > monthly_series = series.convert('M') > ts.first_unmasked_val(monthly_series,axis=1) > ts.last_unmasked_val(monthly_series,axis=1) > > HTH, > Dave > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From rmrndr at unife.it Thu Nov 5 05:17:15 2009 From: rmrndr at unife.it (ANDREA ARMAROLI) Date: Thu, 5 Nov 2009 11:17:15 +0100 Subject: [SciPy-User] Troubles with odeint or ode In-Reply-To: <4AF197CB.2040402@noaa.gov> References: <20091103185905.M2354@unife.it> <13a85e27-9a1e-4768-9cb5-9c9b2dd777ea@k17g2000yqh.googlegroups.com> <4AF197CB.2040402@noaa.gov> Message-ID: <20091105101716.M4638@unife.it> Hi guys, I think it is worth placing in the cookbook. If you like, I can provide a few details of the framework it comes from. Thank you VERY much for the quick response. Andrea. ---------- Original Message ----------- From: Jim Vickroy To: SciPy Users List Sent: Wed, 04 Nov 2009 08:03:39 -0700 Subject: Re: [SciPy-User] Troubles with odeint or ode > Michael Aye wrote:I think this example is worth to be kept in the cookbook, what do you guys think? +1 > > I thought the same but did not speak up. 
> -- jv > It is showing how to do the oscillator modelling and can show the 'danger' of lack of complex support of odeint. Just my feeling it could be quite helpful. I myself certainly copied this into my treasure of worth-to-remember evernotes.. ;) Regards, Michael On Nov 3, 8:28?pm, Anne Archibald wrote: 2009/11/3 ANDREA ARMAROLI : Dear users, I'm trying to solve an ODE system that models a parametric oscillator with complex amplitudes at two freqencies. I'm new to python. This problem is simply solved in matlab using ode45. If I try to use odeint I cannot set parameters like tolerances or max step. You can in fact control these using the (optional) parameters hmax, rtol, and atol. The result is constant-valued solutions. After a bit of experimentation (in particular, a print statement inside your derivative function) it turns out that the problem is that odeint does not support complex values (it silently discards imaginary parts). This is not a major obstacle, since you can just pack and unpack the values yourself. When I do that, the plot I get is two oscillatory results (the absolute values), and one nice flat line (for the total, which I'm guessing you need to be conserved). I have to say, it would be very good if odeint either reported an error or worked with complex values, so you would have found this easier to track down. But it looks like it does a reasonable job of solving your problem once you work around its lack of complex support. Anne Trying with ode class and ZVODE integrator, I have that total intensity is not conserved. I have weird quasi-periodic oscillations. I do know that my equations have excess degrees of freedom and I know from symmetries what the integrals of motion are, but in Matlab this works pretty well. Here is the code with odeint import numpy as N import scipy as S import scipy.integrate import pylab as P def deriv(Y,t): ? ?A1 = Y[0] ? ?A2 = Y[1] ? ?A1d = 1j*A2*N.conj(A1) ? ?A2d = 1j*(A1**2/2.0 + dk*A2) ? ?return [A1d, A2d] nplot = 10000 zmax = 10.0 zstep = zmax/nplot dk = 0.5 # normalised detuning/dispersion phi0 = 0.0*N.pi # initial dephasing eta0 = 0.3 # initial pump intensity # initial values u20 = N.sqrt(eta0)*N.exp(1j*phi0) u10 = N.sqrt(2*(1-eta0)) H0=dk*eta0+2*N.sqrt(eta0)*(1-eta0)*N.cos(phi0); Y,info = scipy.integrate.odeint(deriv,[u10,u20],N.arange(0,zmax+zstep,zstep),full_ou tput=True,printmessg = True) # are conserved quantities constant? Ptot = N.abs(Y[:,0])**2/2+ N.abs(Y[:,1])**2 P.plot(N.arange(0,zmax+zstep-0.0001,zstep),Ptot) # then Hamiltonian... #... p1 ?= P.plot(N.arange(0,zmax+zstep-0.0001,zstep),N.abs(Y[:,0])) p2 ?= P.plot(N.arange(0,zmax+zstep-0.0001,zstep),N.abs(Y[:,1])) Thank you very much for your help. Andrea Armaroli _______________________________________________ SciPy-User mailing list SciPy-U... at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user ?foo.py 1KViewDownload _______________________________________________ SciPy-User mailing list SciPy-U... at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user ------- End of Original Message ------- -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gaston.fiore at gmail.com Thu Nov 5 13:37:13 2009 From: gaston.fiore at gmail.com (Gaston Fiore) Date: Thu, 5 Nov 2009 13:37:13 -0500 Subject: [SciPy-User] SciPy installation problems Message-ID: <51dfb45c0911051037k58543095v3bc93784f6f8a41e@mail.gmail.com> Hello, I'm trying to install scipy from the SVN repository but I'm getting an error. I've also discovered that numpy doesn't pass the built-in unit tests, although it did seem to install without problems. Below is all the information that I think it's needed to determine the problem and its solution. Is it the fact that UMFPACK is missing? Thanks a lot, -Gaston gbrain2:~ gafiore$ sw_vers ProductName: Mac OS X ProductVersion: 10.5.8 BuildVersion: 9L30 gbrain2:~ gafiore$ gcc --version i686-apple-darwin9-gcc-4.0.1 (GCC) 4.0.1 (Apple Inc. build 5465) Copyright (C) 2005 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. gbrain2:~ gafiore$ gfortran --version GNU Fortran (GCC) 4.2.3 Copyright (C) 2007 Free Software Foundation, Inc. GNU Fortran comes with NO WARRANTY, to the extent permitted by law. You may redistribute copies of GNU Fortran under the terms of the GNU General Public License. For more information about these matters, see the file named COPYING >>> import numpy >>> numpy.test('1','10') Traceback (most recent call last): File "", line 1, in File "/System/Library/Frameworks/Python.framework/Versions/2.5/Extras/lib/python/numpy/__init__.py", line 88, in test return NumpyTest().testall(level, verbosity) File "/System/Library/Frameworks/Python.framework/Versions/2.5/Extras/lib/python/numpy/testing/numpytest.py", line 576, in testall for mthname in self._get_method_names(obj,abs(level)): TypeError: bad operand type for abs(): 'str' dhcp-0011377089-64-8f:scipy gafiore$ python setup.py build Warning: No configuration returned, assuming unavailable. blas_opt_info: FOUND: extra_link_args = ['-Wl,-framework', '-Wl,Accelerate'] define_macros = [('NO_ATLAS_INFO', 3)] extra_compile_args = ['-msse3', '-I/System/Library/Frameworks/vecLib.framework/Headers'] lapack_opt_info: FOUND: extra_link_args = ['-Wl,-framework', '-Wl,Accelerate'] define_macros = [('NO_ATLAS_INFO', 3)] extra_compile_args = ['-msse3'] umfpack_info: libraries umfpack not found in /System/Library/Frameworks/Python.framework/Versions/2.5/lib libraries umfpack not found in /usr/local/lib libraries umfpack not found in /usr/lib libraries umfpack not found in /opt/local/lib /System/Library/Frameworks/Python.framework/Versions/2.5/Extras/lib/python/numpy/distutils/system_info.py:401: UserWarning: UMFPACK sparse solver (http://www.cise.ufl.edu/research/sparse/umfpack/) not found. Directories to search for the libraries can be specified in the numpy/distutils/site.cfg file (section [umfpack]) or by setting the UMFPACK environment variable. 
warnings.warn(self.notfounderror.__doc__) NOT AVAILABLE Traceback (most recent call last): File "setup.py", line 160, in setup_package() File "setup.py", line 152, in setup_package configuration=configuration ) File "/System/Library/Frameworks/Python.framework/Versions/2.5/Extras/lib/python/numpy/distutils/core.py", line 144, in setup config = configuration() File "setup.py", line 118, in configuration config.add_subpackage('scipy') File "/System/Library/Frameworks/Python.framework/Versions/2.5/Extras/lib/python/numpy/distutils/misc_util.py", line 765, in add_subpackage caller_level = 2) File "/System/Library/Frameworks/Python.framework/Versions/2.5/Extras/lib/python/numpy/distutils/misc_util.py", line 748, in get_subpackage caller_level = caller_level + 1) File "/System/Library/Frameworks/Python.framework/Versions/2.5/Extras/lib/python/numpy/distutils/misc_util.py", line 695, in _get_configuration_from_setup_py config = setup_module.configuration(*args) File "./scipy/setup.py", line 20, in configuration config.add_subpackage('special') File "/System/Library/Frameworks/Python.framework/Versions/2.5/Extras/lib/python/numpy/distutils/misc_util.py", line 765, in add_subpackage caller_level = 2) File "/System/Library/Frameworks/Python.framework/Versions/2.5/Extras/lib/python/numpy/distutils/misc_util.py", line 748, in get_subpackage caller_level = caller_level + 1) File "/System/Library/Frameworks/Python.framework/Versions/2.5/Extras/lib/python/numpy/distutils/misc_util.py", line 680, in _get_configuration_from_setup_py ('.py', 'U', 1)) File "scipy/special/setup.py", line 7, in from numpy.distutils.misc_util import get_numpy_include_dirs, get_info ImportError: cannot import name get_info From robert.kern at gmail.com Thu Nov 5 15:20:25 2009 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 5 Nov 2009 14:20:25 -0600 Subject: [SciPy-User] SciPy installation problems In-Reply-To: <51dfb45c0911051037k58543095v3bc93784f6f8a41e@mail.gmail.com> References: <51dfb45c0911051037k58543095v3bc93784f6f8a41e@mail.gmail.com> Message-ID: <3d375d730911051220u7a4f499erbb2629949ac768a2@mail.gmail.com> On Thu, Nov 5, 2009 at 12:37, Gaston Fiore wrote: > Hello, > > I'm trying to install scipy from the SVN repository but I'm getting an > error. I've also discovered that numpy doesn't pass the built-in unit > tests, although it did seem to install without problems. Below is all > the information that I think it's needed to determine the problem and > its solution. Is it the fact that UMFPACK is missing? No. >>>> import numpy >>>> numpy.test('1','10') It's "numpy.test(1, 10)". > dhcp-0011377089-64-8f:scipy gafiore$ python setup.py build > ?File "scipy/special/setup.py", line 7, in > ? ?from numpy.distutils.misc_util import get_numpy_include_dirs, get_info > ImportError: cannot import name get_info The version of numpy supplied with the OS is very old. You will need a newer version in order to build this version of scipy. I recommend avoiding the system's installation of Python entirely and using the binaries from python.org instead. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From gokhansever at gmail.com Thu Nov 5 21:21:15 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Thu, 5 Nov 2009 20:21:15 -0600 Subject: [SciPy-User] Comparing variable time-shifted two measurements Message-ID: <49d6b3500911051821l77618452tc3229af345b5d685@mail.gmail.com> Hello, I have two aircraft based aerosol measurements. The first one is dccnConSTP (blue), and the latter is CPCConc (red) as shown in this screen capture. ( http://img513.imageshack.us/img513/7498/ccncpclag.png). My goal is to compare these two measurements. It is expected to see that they must have a positive correlation throughout the flight. However, the instrument that gives CPCConc was experiencing a sampling issue and therefore making a varying time-shifted measurements with respect to the first instrument. (From the first box it is about 20 seconds, 24 from the seconds before the dccnConSTP measurements shows up.) In other words in different altitude levels, I have varying time differences in between these two measurements in terms of their shapes. So, my goal turns to addressing this variable shifting issue before I start doing the comparisons. Is there a known automated approach to correct this mentioned varying-lag issue? If so, how? Thank you. -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.l.goldsmith at gmail.com Thu Nov 5 23:05:05 2009 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Thu, 5 Nov 2009 20:05:05 -0800 Subject: [SciPy-User] characteristic functions of probability distributions In-Reply-To: References: <1cd32cbb0911012209p117d86fbhd7dab9dbde7fbe46@mail.gmail.com> Message-ID: <45d1ab480911052005y3929daf8q11821596c71c895a@mail.gmail.com> On Mon, Nov 2, 2009 at 12:51 PM, nicky van foreest wrote: > Hi Josef, > > Second related question, since I'm not good with complex numbers. > > > > scipy.integrate.quad of a complex function returns the absolute value. > > Is there a numerical integration function in scipy that returns the > > complex integral or do I have to integrate the real and imaginary > > parts separately? > > You want to compute \int_w^z f(t) dt? When f is analytic (i.e., > satisfies the Cauchy Riemann equations) this integral is path > independent. Otherwise the path from w to z is of importance. You > might like the book Visual Complex Analysis by Needham for intuition. > Furthermore, if f is analytic in an (open) region R homotopic to an (open) disc, then the integral (an integer number of times) around *any* _closed_ path wholly in R is identically equal to zero; there's a similar statement (though the end value is a multiple of 2ipi) if f has only poles of finite order in R. (Indeed, these properties should be used to unit test any numerical complex path integration routine.) Are any of your paths closed? DG > > bye > > Nicky > > > > > Thanks, > > > > Josef > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From peridot.faceted at gmail.com Thu Nov 5 23:19:56 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Thu, 5 Nov 2009 23:19:56 -0500 Subject: [SciPy-User] characteristic functions of probability distributions In-Reply-To: <45d1ab480911052005y3929daf8q11821596c71c895a@mail.gmail.com> References: <1cd32cbb0911012209p117d86fbhd7dab9dbde7fbe46@mail.gmail.com> <45d1ab480911052005y3929daf8q11821596c71c895a@mail.gmail.com> Message-ID: 2009/11/5 David Goldsmith : > On Mon, Nov 2, 2009 at 12:51 PM, nicky van foreest > wrote: >> >> Hi Josef, >> > Second related question, since I'm not good with complex numbers. >> > >> > scipy.integrate.quad of a complex function returns the absolute value. >> > Is there a numerical integration function in scipy that returns the >> > complex integral or do I have to integrate the real and imaginary >> > parts separately? >> >> You want to compute \int_w^z f(t) dt? When f is analytic (i.e., >> satisfies the Cauchy Riemann equations) this integral is path >> independent. Otherwise the path from w to z is of importance. You >> might like the book Visual Complex Analysis by Needham for intuition. > > Furthermore, if f is analytic in an (open) region R homotopic to an (open) > disc, then the integral (an integer number of times) around *any* _closed_ > path wholly in R is identically equal to zero; there's a similar statement > (though the end value is a multiple of 2ipi) if f has only poles of finite > order in R.? (Indeed, these properties should be used to unit test any > numerical complex path integration routine.)? Are any of your paths closed? This may well be a red herring. It happens fairly often (to me at least) that I want to integrate or otherwise manipulate a function whose values are complex but whose independent variable is real. Such a function can arise by substituting a path into an analytic function, but there are potentially many other ways to get such a thing - for example you might choose to represent some random function R -> R2 as R -> C instead. Even if it's obtained by feeding a path into some function from C -> C, it happens very often that that function isn't analytic - say it involves an absolute value, or involves the complex conjugate. There are definitely situations in which all the clever machinery of analytic functions can be applied to integration problems (or for that matter, contour integration may be the best way available to evaluate some complex function), but there are also plenty of situations where what you want is just a real function whose values happen to be complex numbers. (Or vectors of length n for that matter.) But I don't think that any of the adaptive quadrature gizmos can handle such a case, so you might be stuck integrating the real and imaginary parts separately. If you *are* in a situation where you're dealing with an analytic function, then as long as you're well away from its poles and your path is nice enough, you may find that it's very well approximated by a polynomial of high degree, which will let you use Gaussian quadrature, which can very easily work with complex-valued functions. The Romberg integration might even work unmodified. 
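For the characteristic-function case specifically, a minimal sketch of splitting the integral into real and imaginary parts with scipy.integrate.quad (the name char_fn is just for illustration, and the standard normal is used only because its characteristic function exp(-t**2/2) is known exactly):

import numpy as np
from scipy import stats
from scipy.integrate import quad

def char_fn(t, pdf, lower=-np.inf, upper=np.inf):
    # E[exp(1j*t*X)]: quad only handles real-valued integrands, so
    # integrate the real and imaginary parts separately and recombine
    re, _ = quad(lambda x: np.cos(t * x) * pdf(x), lower, upper)
    im, _ = quad(lambda x: np.sin(t * x) * pdf(x), lower, upper)
    return re + 1j * im

t = 0.7
assert np.allclose(char_fn(t, stats.norm.pdf), np.exp(-t**2 / 2.0))
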
Anne > DG >> >> bye >> >> Nicky >> >> > >> > Thanks, >> > >> > Josef >> > _______________________________________________ >> > SciPy-User mailing list >> > SciPy-User at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From peridot.faceted at gmail.com Thu Nov 5 23:48:21 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Thu, 5 Nov 2009 23:48:21 -0500 Subject: [SciPy-User] Comparing variable time-shifted two measurements In-Reply-To: <49d6b3500911051821l77618452tc3229af345b5d685@mail.gmail.com> References: <49d6b3500911051821l77618452tc3229af345b5d685@mail.gmail.com> Message-ID: 2009/11/5 G?khan Sever : > Hello, > > I have two aircraft based aerosol measurements. The first one is dccnConSTP > (blue), and the latter is CPCConc (red) as shown in this screen capture. > (http://img513.imageshack.us/img513/7498/ccncpclag.png). My goal is to > compare these two measurements. It is expected to see that they must have a > positive correlation throughout the flight. However, the instrument that > gives CPCConc was experiencing a sampling issue and therefore making a > varying time-shifted measurements with respect to the first instrument. > (From the first box it is about 20 seconds, 24 from the seconds before the > dccnConSTP measurements shows up.) In other words in different altitude > levels, I have varying time differences in between these two measurements in > terms of their shapes. So, my goal turns to addressing this variable > shifting issue before I start doing the comparisons. > > Is there a known automated approach to correct this mentioned varying-lag > issue? If so, how? There are several tools you can use, depending on exactly what the problem is. If the problem is that there's a constant lag for each data set but you don't know what it is, then you can use the correlation to fit for the lag - if you take the correlation of two vectors, then the highest peak in the correlation vector is the lag where the two vectors are most similar. Correlations can be calculated rapidly using FFTs. If the lag isn't constant over a data set, you can try using correlations to find the lag at several points in the data set and interpolate to get the lag as a function of time (but be careful - depending on what caused the lag, a steadily-drifting model isn't necessarily appropriate; maybe you'll have periods of constant offset separated by jumps). If you know the lag, but it isn't constant and you're not sure how to resample your data set to remove the lag, look at scipy's ndimage. This should have the tools to do what you want. If your data sets are unevenly sampled, so that you can't use simple correlations, I'm not sure quite what to suggest, except perhaps interpolating them to evenly-spaced samples and then running the correlation. For this try scipy.interpolate. If you do end up fitting for the lag, keep in mind that you'll have adjusted the lags to make the time series as similar as possible, so that there's a risk of overestimating their similarities. But the only way around that problem is to know the lags from some independent source. Anne > Thank you. 
> > -- > G?khan > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From josef.pktd at gmail.com Fri Nov 6 00:02:47 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 6 Nov 2009 00:02:47 -0500 Subject: [SciPy-User] characteristic functions of probability distributions In-Reply-To: References: <1cd32cbb0911012209p117d86fbhd7dab9dbde7fbe46@mail.gmail.com> <45d1ab480911052005y3929daf8q11821596c71c895a@mail.gmail.com> Message-ID: <1cd32cbb0911052102r40a066e3v86a5ed62cdf954aa@mail.gmail.com> On Thu, Nov 5, 2009 at 11:19 PM, Anne Archibald wrote: > 2009/11/5 David Goldsmith : >> On Mon, Nov 2, 2009 at 12:51 PM, nicky van foreest >> wrote: >>> >>> Hi Josef, >>> > Second related question, since I'm not good with complex numbers. >>> > >>> > scipy.integrate.quad of a complex function returns the absolute value. >>> > Is there a numerical integration function in scipy that returns the >>> > complex integral or do I have to integrate the real and imaginary >>> > parts separately? >>> >>> You want to compute \int_w^z f(t) dt? When f is analytic (i.e., >>> satisfies the Cauchy Riemann equations) this integral is path >>> independent. Otherwise the path from w to z is of importance. You >>> might like the book Visual Complex Analysis by Needham for intuition. >> >> Furthermore, if f is analytic in an (open) region R homotopic to an (open) >> disc, then the integral (an integer number of times) around *any* _closed_ >> path wholly in R is identically equal to zero; there's a similar statement >> (though the end value is a multiple of 2ipi) if f has only poles of finite >> order in R.? (Indeed, these properties should be used to unit test any >> numerical complex path integration routine.)? Are any of your paths closed? > > This may well be a red herring. It happens fairly often (to me at > least) that I want to integrate or otherwise manipulate a function > whose values are complex but whose independent variable is real. > > Such a function can arise by substituting a path into an analytic > function, but there are potentially many other ways to get such a > thing - for example you might choose to represent some random function > R -> R2 as R -> C instead. Even if it's obtained by feeding a path > into some function from C -> C, it happens very often that that > function isn't analytic - say it involves an absolute value, or > involves the complex conjugate. > > There are definitely situations in which all the clever machinery of > analytic functions can be applied to integration problems (or for that > matter, contour integration may be the best way available to evaluate > some complex function), but there are also plenty of situations where > what you want is just a real function whose values happen to be > complex numbers. (Or vectors of length n for that matter.) But I don't > think that any of the adaptive quadrature gizmos can handle such a > case, so you might be stuck integrating the real and imaginary parts > separately. > > If you *are* in a situation where you're dealing with an analytic > function, then as long as you're well away from its poles and your > path is nice enough, you may find that it's very well approximated by > a polynomial of high degree, which will let you use Gaussian > quadrature, which can very easily work with complex-valued functions. > The Romberg integration might even work unmodified. 
Sorry for not coming back to this earlier, Thanks Nicky, I looked at some papers by Ward Whitt and they look interesting but much more than what I want to chew on right now. There is more background, that I would have to read, than I have time right now for this. I finally added "matlab" to my google searches, and I think I found some references that use discretization and fft more directly. The integration problem should be pretty "nice", just a continuous fourier transform and the inverse http://en.wikipedia.org/wiki/Characteristic_function_%28probability_theory%29#Definition http://en.wikipedia.org/wiki/Characteristic_function_%28probability_theory%29#Inversion_formulas For many distributions there is an explicit formula for both the density and the characteristic function, e.g. normal http://en.wikipedia.org/wiki/Normal_distribution#Characteristic_function For some distributions only the characteristic functions has a closed form expression, and the pdf or cdf has to be recovered numerically, and I would have liked to have a generic method to go between the two. I don't think I ever needed a path integral in my life, and I'm pretty much a newbie to complex numbers, so parts of your explanations are still quite a bit over my head. I think, I will come back to this after I looked more at the examples where the estimation of a statistical model or of a distribution is done in terms of the characteristic function instead of the density. The immediate example that I had tried, was (integration from -large number to +large number) integral exp(i t x)dF(x) = integrate.quad(exp(itx)*f(x)) or do I have to do integral exp(i t x)dF(x) = integrate.quad(real(exp(itx)*f(x))) + j * integrate.quad(imag(exp(itx)*f(x))) or is there another way? The solution/integral might be either real or complex. Thanks, Josef > > Anne > >> DG >>> >>> bye >>> >>> Nicky >>> >>> > >>> > Thanks, >>> > >>> > Josef From denis-bz-py at t-online.de Wed Nov 4 11:24:37 2009 From: denis-bz-py at t-online.de (denis) Date: Wed, 04 Nov 2009 17:24:37 +0100 Subject: [SciPy-User] RBF plot in reference/tutorial/interpolate.html Message-ID: Scipy doc people, in http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html#d-example last updated Oct 08 the two 1d plots Interpolation using univariate spline Interpolation using RBF - multiquadrics look rather similar, hmm - plt.plot(xi, yi, 'g') + plt.plot(xi, fi, 'g') cheers -- denis From scott.sinclair.za at gmail.com Fri Nov 6 06:03:26 2009 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Fri, 6 Nov 2009 13:03:26 +0200 Subject: [SciPy-User] RBF plot in reference/tutorial/interpolate.html In-Reply-To: References: Message-ID: <6a17e9ee0911060303x70c7854eub5c8ca2840c872f3@mail.gmail.com> >2009/11/4 denis : > Scipy doc people, > ? in http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html#d-example > last updated Oct 08 > > the two 1d plots > ? ? Interpolation using univariate spline > ? ? Interpolation using RBF - multiquadrics > look rather similar, hmm > - plt.plot(xi, yi, 'g') > + plt.plot(xi, fi, 'g') Thanks for spotting that. It's fixed in the doc editor http://docs.scipy.org/scipy/docs/scipy-docs/tutorial/interpolate.rst You are welcome to create an account on the doc wiki and help out with similar fixes in future. Take a look at http://docs.scipy.org/numpy/Front%20Page/ to see how you can contribute (esp. the section "Before you start"). 
Cheers, Scott From rcsqtc at iqac.csic.es Fri Nov 6 05:53:25 2009 From: rcsqtc at iqac.csic.es (Ramon Crehuet) Date: Fri, 06 Nov 2009 11:53:25 +0100 Subject: [SciPy-User] Contribution to Performance Python Message-ID: <4AF40025.3000906@iqac.csic.es> Hi all, After reading the "Performance Python" page at: http://www.scipy.org/PerformancePython?action=show I thought some code with Fortran 90/95 was missing, in partuclar considering its useful array features. So I have written a couple of examples with Fortran 90 arrays and with the Fortran 95 forall construct. Both are nicer (to me :-) ) than the FORTRAN77 loops and also faster! In my PC a 1000x1000 array gives: Doing 100 iterations on a 1000x1000 grid numeric took 5.86 seconds fortran77 took 3.53 seconds fortran90-arrays took 1.58 seconds fortran95-forall took 1.58 seconds slow (1 iteration) took 9.13 seconds 100 iterations should take about 913.000000 seconds If this is interesting to the community, who should I contact to have this included in the scipy web page? Cheers, Ramon This are the two new subroutines. I can send the modified laplace.py to whoever wants it. ****************************************************** ! File flaplace90_arrays.f90 subroutine timestep(u,n,m,dx,dy,error) implicit none real (kind=8), dimension(0:n-1,0:m-1), intent(inout):: u real (kind=8), intent(in) :: dx,dy real (kind=8), intent(out) :: error integer, intent(in) :: n,m real (kind=8), dimension(0:n-1,0:m-1) :: diff real (kind=8) :: dx2,dy2,dnr_inv !f2py intent(in) :: dx,dy !f2py intent(in,out) :: u !f2py intent(out) :: error !f2py intent(hide) :: n,m dx2 = dx*dx dy2 = dy*dy dnr_inv = 0.5d0 / (dx2+dy2) diff=u u(1:n-2, 1:m-2) = ((u(0:n-3, 1:m-2) + u(2:n-1, 1:m-2))*dy2 + & (u(1:n-2,0:m-3) + u(1:n-2, 2:m-1))*dx2)*dnr_inv error=sqrt(sum((u-diff)**2)) end subroutine ****************************************************** ! File flaplace95_forall.f90 subroutine timestep(u,n,m,dx,dy,error) implicit none real (kind=8), dimension(0:n-1,0:m-1), intent(inout):: u real (kind=8), intent(in) :: dx,dy real (kind=8), intent(out) :: error integer, intent(in) :: n,m real (kind=8), dimension(0:n-1,0:m-1) :: diff real (kind=8) :: dx2,dy2,dnr_inv integer :: i,j !f2py intent(in) :: dx,dy !f2py intent(in,out) :: u !f2py intent(out) :: error !f2py intent(hide) :: n,m dx2 = dx*dx dy2 = dy*dy dnr_inv = 0.5d0 / (dx2+dy2) diff=u forall (i=1:n-2,j=1:m-2) u(i,j) = ((u(i-1,j) + u(i+1,j))*dy2+(u(i,j-1) + u(i,j+1))*dx2)*dnr_inv end forall error=sqrt(sum((u-diff)**2)) end subroutine ****************************************************** From fperez.net at gmail.com Fri Nov 6 07:18:53 2009 From: fperez.net at gmail.com (Fernando Perez) Date: Fri, 6 Nov 2009 04:18:53 -0800 Subject: [SciPy-User] [ANN] For SF Bay Area residents: a discussion with Guido at the Berkeley Py4Science seminar In-Reply-To: References: Message-ID: On Tue, Nov 3, 2009 at 11:28 AM, Fernando Perez wrote: > if you reside in the San Francisco Bay Area, you may be interested in > a meeting we'll be having tomorrow November 4 (2-4 pm), as part of our > regular py4science meeting series. ?Guido van Rossum, the creator of > the Python language, will visit for a session where we will first do a > very rapid overview of a number of scientific projects that use Python > (in a lightning talk format) and then we will have an open discussion > with Guido with hopefully interesting questions going in both > directions. ?The meeting is open to all, bring your questions! 
Video of the event: http://www.archive.org/details/ucb_py4science_2009_11_04_Guido_van_Rossum Slides: http://fperez.org/py4science/2009_guido_ucb/index.html A few blog posts about it: - Guido: http://neopythonic.blogspot.com/2009/11/python-in-scientific-world.html - Jarrod: http://jarrodmillman.blogspot.com/2009/11/visit-from-guido-van-rossum.html - Matthew: http://nipyworld.blogspot.com/2009/11/guido-van-rossum-talks-about-python-3.html - Me: http://fdoperez.blogspot.com/2009/11/guido-van-rossum-at-uc-berkeleys.html Attendance was excellent (standing room only, and I saw some people leave because it was too full). Many thanks to all the presenters! Cheers, f From nmb at wartburg.edu Fri Nov 6 12:17:27 2009 From: nmb at wartburg.edu (Neil Martinsen-Burrell) Date: Fri, 06 Nov 2009 11:17:27 -0600 Subject: [SciPy-User] [ANN] For SF Bay Area residents: a discussion with Guido at the Berkeley Py4Science seminar In-Reply-To: References: Message-ID: <4AF45A27.3020805@wartburg.edu> On 2009-11-06 06:18 , Fernando Perez wrote: > On Tue, Nov 3, 2009 at 11:28 AM, Fernando Perez wrote: > >> if you reside in the San Francisco Bay Area, you may be interested in >> a meeting we'll be having tomorrow November 4 (2-4 pm), as part of our >> regular py4science meeting series. Guido van Rossum, the creator of >> the Python language, will visit for a session where we will first do a >> very rapid overview of a number of scientific projects that use Python >> (in a lightning talk format) and then we will have an open discussion >> with Guido with hopefully interesting questions going in both >> directions. The meeting is open to all, bring your questions! > > Video of the event: > http://www.archive.org/details/ucb_py4science_2009_11_04_Guido_van_Rossum > > Slides: http://fperez.org/py4science/2009_guido_ucb/index.html > > A few blog posts about it: > > - Guido: http://neopythonic.blogspot.com/2009/11/python-in-scientific-world.html > > - Jarrod: http://jarrodmillman.blogspot.com/2009/11/visit-from-guido-van-rossum.html > > - Matthew: http://nipyworld.blogspot.com/2009/11/guido-van-rossum-talks-about-python-3.html > > - Me: http://fdoperez.blogspot.com/2009/11/guido-van-rossum-at-uc-berkeleys.html > > Attendance was excellent (standing room only, and I saw some people > leave because it was too full). Many thanks to all the presenters! From the silent majority who lurk here, many thanks to you Fernando for setting this up (and for IPython). It is wonderful to know that the concerns and achievements of scientific computing in Python are on the radar of the group of people responsible for leading the language. If you have thoughts on how the wider community can contribute to this sort of communication in the future, please share. -Neil From dsdale24 at gmail.com Fri Nov 6 12:34:34 2009 From: dsdale24 at gmail.com (Darren Dale) Date: Fri, 6 Nov 2009 12:34:34 -0500 Subject: [SciPy-User] [ANN] For SF Bay Area residents: a discussion with Guido at the Berkeley Py4Science seminar In-Reply-To: <4AF45A27.3020805@wartburg.edu> References: <4AF45A27.3020805@wartburg.edu> Message-ID: On Fri, Nov 6, 2009 at 12:17 PM, Neil Martinsen-Burrell wrote: > On 2009-11-06 06:18 , Fernando Perez wrote: >> On Tue, Nov 3, 2009 at 11:28 AM, Fernando Perez ?wrote: >> >>> if you reside in the San Francisco Bay Area, you may be interested in >>> a meeting we'll be having tomorrow November 4 (2-4 pm), as part of our >>> regular py4science meeting series. 
?Guido van Rossum, the creator of >>> the Python language, will visit for a session where we will first do a >>> very rapid overview of a number of scientific projects that use Python >>> (in a lightning talk format) and then we will have an open discussion >>> with Guido with hopefully interesting questions going in both >>> directions. ?The meeting is open to all, bring your questions! >> >> Video of the event: >> http://www.archive.org/details/ucb_py4science_2009_11_04_Guido_van_Rossum >> >> Slides: http://fperez.org/py4science/2009_guido_ucb/index.html >> >> A few blog posts about it: >> >> - Guido: http://neopythonic.blogspot.com/2009/11/python-in-scientific-world.html >> >> - Jarrod: http://jarrodmillman.blogspot.com/2009/11/visit-from-guido-van-rossum.html >> >> - Matthew: http://nipyworld.blogspot.com/2009/11/guido-van-rossum-talks-about-python-3.html >> >> - Me: http://fdoperez.blogspot.com/2009/11/guido-van-rossum-at-uc-berkeleys.html >> >> Attendance was excellent (standing room only, and I saw some people >> leave because it was too full). Many thanks to all the presenters! > > ?From the silent majority who lurk here, many thanks to you Fernando for > setting this up (and for IPython). Yes, thank you Fernando. If you are at liberty to comment further on discussions concerning parallel computing and the GIL, I would be very interested to hear about it. Darren From gokhansever at gmail.com Fri Nov 6 18:36:23 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Fri, 6 Nov 2009 17:36:23 -0600 Subject: [SciPy-User] Comparing variable time-shifted two measurements In-Reply-To: References: <49d6b3500911051821l77618452tc3229af345b5d685@mail.gmail.com> Message-ID: <49d6b3500911061536v780f3686v594b58bf26e419c9@mail.gmail.com> On Thu, Nov 5, 2009 at 10:48 PM, Anne Archibald wrote: > 2009/11/5 G?khan Sever : > > Hello, > > > > I have two aircraft based aerosol measurements. The first one is > dccnConSTP > > (blue), and the latter is CPCConc (red) as shown in this screen capture. > > (http://img513.imageshack.us/img513/7498/ccncpclag.png). My goal is to > > compare these two measurements. It is expected to see that they must have > a > > positive correlation throughout the flight. However, the instrument that > > gives CPCConc was experiencing a sampling issue and therefore making a > > varying time-shifted measurements with respect to the first instrument. > > (From the first box it is about 20 seconds, 24 from the seconds before > the > > dccnConSTP measurements shows up.) In other words in different altitude > > levels, I have varying time differences in between these two measurements > in > > terms of their shapes. So, my goal turns to addressing this variable > > shifting issue before I start doing the comparisons. > > > > Is there a known automated approach to correct this mentioned varying-lag > > issue? If so, how? > > There are several tools you can use, depending on exactly what the problem > is. > > If the problem is that there's a constant lag for each data set but > you don't know what it is, then you can use the correlation to fit for > the lag - if you take the correlation of two vectors, then the highest > peak in the correlation vector is the lag where the two vectors are > most similar. That's how I discovered the varying lag. I was expecting a nicer correlation when I shifted the data at a constant value however, it turned wrong and later analysis showed that the lags are not constant. > Correlations can be calculated rapidly using FFTs. 
> I am curious to know how to use FFT in this case? > > If the lag isn't constant over a data set, you can try using > correlations to find the lag at several points in the data set and > interpolate to get the lag as a function of time (but be careful - > depending on what caused the lag, a steadily-drifting model isn't > necessarily appropriate; maybe you'll have periods of constant offset > separated by jumps). > Ok, good idea. Probably the more finer I correlate the data the higher accuracy I will get from the correlations therefore a better interpolated result. "steadily-drifting model" is another new term to me. > > If you know the lag, but it isn't constant and you're not sure how to > resample your data set to remove the lag, look at scipy's ndimage. > This should have the tools to do what you want. > This is a 1D data. Could you give me an example how to utilize the ndimage library for my case? > > If your data sets are unevenly sampled, so that you can't use simple > correlations, I'm not sure quite what to suggest, except perhaps > interpolating them to evenly-spaced samples and then running the > correlation. For this try scipy.interpolate. > I don't think uneven sampling is an issue in my case. Both instruments sample at 1Hz. One samples from 0.5 L/min flow, the other from 1.0 L/min where it cannot maintain this rate when the pressure gets lower. > > If you do end up fitting for the lag, keep in mind that you'll have > adjusted the lags to make the time series as similar as possible, so > that there's a risk of overestimating their similarities. But the only > way around that problem is to know the lags from some independent > source. > Thank you for your suggestions. For now I am sure that these varying lags are only determined via a manual inspection. If I had the sample flow rate recorded than it would be easy to correct the data, unfortunately this will be something for the future experiments. > Anne > > > Thank you. > > > > -- > > G?khan > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From aarchiba at physics.mcgill.ca Fri Nov 6 19:13:20 2009 From: aarchiba at physics.mcgill.ca (Anne Archibald) Date: Fri, 6 Nov 2009 19:13:20 -0500 Subject: [SciPy-User] Unit testing of Bayesian estimator Message-ID: Hi, I have implemented a simple Bayesian regression program (it takes events modulo one and returns a posterior probability that the data is phase-invariant plus a posterior distribution for two parameters (modulation fraction and phase) in case there is modulation). I'm rather new at this, so I'd like to construct some unit tests. Does anyone have any suggestions on how to go about this? For a frequentist periodicity detector, the return value is a probability that, given the null hypothesis is true, the statistic would be this extreme. So I can construct a strong unit test by generating a collection of data sets given the null hypothesis, evaluating the statistic, and seeing whether the number that claim to be significant at a 5% level is really 5%. (In fact I can use the binomial distribution to get limits on the number of false positive.) 
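In the meantime, a rough sketch of the FFT cross-correlation and ndimage steps suggested above (estimate_lag and the synthetic traces below are invented purely for illustration; the real correction would use the recorded CCN/CPC series and a manually checked lag):

import numpy as np
from scipy import ndimage

def estimate_lag(a, b):
    # number of samples by which b is delayed relative to a (positive means
    # b lags behind a), taken from the peak of the FFT-based cross-correlation
    n = len(a) + len(b) - 1
    a0, b0 = a - np.mean(a), b - np.mean(b)
    xcorr = np.fft.ifft(np.fft.fft(b0, n) * np.conj(np.fft.fft(a0, n))).real
    k = int(np.argmax(xcorr))
    return k - n if k > n // 2 else k

rng = np.random.RandomState(0)
ccn = ndimage.gaussian_filter1d(rng.randn(500), 5)  # smooth synthetic trace
cpc = np.roll(ccn, 24)                              # stand-in for the delayed CPC record
lag = estimate_lag(ccn, cpc)                        # should recover the 24-sample shift
aligned = ndimage.shift(cpc, -lag, mode='nearest')  # shift CPC back into alignment
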
This gives me a unit test that is completely orthogonal to my implementation, and that passes if and only if the code works. For a Bayesian hypothesis testing setup, I don't really see how to do something analogous. I can generate non-modulated data sets and confirm that my code returns a high probability that the data is not modulated, but how high should I expect the probability to be? I can generate data sets with models with known parameters and check that the best-fit parameters are close to the known parameters - but how close? Even if I do it many times, is the posterior mean unbiased? What about the posterior mode or median? I can even generate models and then data sets that are drawn from the prior distribution, but what should I expect from the code output on such a data set? I feel sure there's some test that verifies a statistical property of Bayesian estimators/hypothesis testers, but I cant quite put my finger on it. Suggestions welcome. Thanks, Anne From josef.pktd at gmail.com Fri Nov 6 22:37:44 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 6 Nov 2009 22:37:44 -0500 Subject: [SciPy-User] Unit testing of Bayesian estimator In-Reply-To: References: Message-ID: <1cd32cbb0911061937q3ffe3182g143dd2af461c4aea@mail.gmail.com> On Fri, Nov 6, 2009 at 7:13 PM, Anne Archibald wrote: > Hi, > > I have implemented a simple Bayesian regression program (it takes > events modulo one and returns a posterior probability that the data is > phase-invariant plus a posterior distribution for two parameters > (modulation fraction and phase) in case there is modulation). I'm > rather new at this, so I'd like to construct some unit tests. Does > anyone have any suggestions on how to go about this? > > For a frequentist periodicity detector, the return value is a > probability that, given the null hypothesis is true, the statistic > would be this extreme. So I can construct a strong unit test by > generating a collection of data sets given the null hypothesis, > evaluating the statistic, and seeing whether the number that claim to > be significant at a 5% level is really 5%. (In fact I can use the > binomial distribution to get limits on the number of false positive.) > This gives me a unit test that is completely orthogonal to my > implementation, and that passes if and only if the code works. For a > Bayesian hypothesis testing setup, I don't really see how to do > something analogous. > > I can generate non-modulated data sets and confirm that my code > returns a high probability that the data is not modulated, but how > high should I expect the probability to be? I can generate data sets > with models with known parameters and check that the best-fit > parameters are close to the known parameters - but how close? Even if > I do it many times, is the posterior mean unbiased? What about the > posterior mode or median? I can even generate models and then data > sets that are drawn from the prior distribution, but what should I > expect from the code output on such a data set? I feel sure there's > some test that verifies a statistical property of Bayesian > estimators/hypothesis testers, but I cant quite put my finger on it. The Bayesian experts are at pymc, maybe you can look at there tests for inspiration. I don't know those, since I never looked at that part. I never tried to test a Bayesian estimator but many properties are still the same as in the non-Bayesian analysis. In my Bayesian past, I essentially only used normal and t distributions, and binomial. 
One of my first tests for these things is to create a huge sample and see whether the parameter estimates converge. With Bayesian analysis you still have the law of large numbers, (for non-dogmatic priors) Do you have an example with a known posterior? Then, the posterior with a large sample or the average in a Monte Carlo should still be approximately the true one. For symmetric distributions, the Bayesian posterior confidence intervals and posterior mean should be roughly the same as the frequentist estimates. With diffuse priors, in many cases the results are exactly the same in Bayesian and MLE. Another version I used in the past is to trace the posterior mean, as the prior variance is reduced, in one extreme you should get the prior back in the other extreme the MLE. > I can even generate models and then data > sets that are drawn from the prior distribution, but what should I > expect from the code output on such a data set? If you ignore the Bayesian interpretation, then this is just a standard sampling problem, you draw prior parameters and observations, the rest is just finding the conditional and marginal probabilities. I think the posterior odds ratio should converge in a large Monte Carlo to the true one, and the significance levels should correspond to the one that has been set for the test (5%). (In simplest case of conjugate priors, you can just interpret the prior as a previous sample and you are back to a frequentist explanation.) The problem is that with an informative prior, you always have a biased estimator in small samples and the posterior odds ratio is affected by an informative prior. And "real" Bayesians don't care about sampling properties. What are your prior distributions and the likelihood function in your case? Can you model degenerate and diffuse priors, so that an informative prior doesn't influence you sampling results? I'm trying to think of special cases where you could remove the effect of the prior. It's a bit vague because I don't see the details, and I haven't looked at this in a while. > > Suggestions welcome. > > Thanks, > Anne > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From tpk at kraussfamily.org Sat Nov 7 10:09:27 2009 From: tpk at kraussfamily.org (Tom K.) Date: Sat, 7 Nov 2009 07:09:27 -0800 (PST) Subject: [SciPy-User] [SciPy-user] Unit testing of Bayesian estimator In-Reply-To: References: Message-ID: <26241135.post@talk.nabble.com> Hi Anne, interesting question. I'm not really sure what a Bayesian hypothesis tester is, but I expect that it results in a random variable. For a given input prior and measurement distribution, and choice of hypothesis (signal present or signal absent), can you know the distribution of this random variable? If so, it could come down to a test that this random variable - or a function of it such as mean or probability that it is greater than some value - behaves as expected. How do you create a unit-test that a random variable generator is working? If the random variables were all iid normal, you could average a bunch and then test that the mean of the sample was close to the mean of the distribution - which is going to be impossible to guarantee, since there is a non-zero probability that the mean is large. 
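For the iid normal case I am thinking of something as simple as this sketch:

import numpy as np

def test_sample_mean():
    n = 10000
    x = np.random.normal(0.0, 1.0, n)
    # the standard error of the mean is 1/sqrt(n); asking for agreement to
    # 4 standard errors means a correct generator fails only ~1 in 16000 runs
    assert abs(x.mean()) < 4.0 / np.sqrt(n)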
In practice however it is likely that your test will pass, but you will probably have to tack down the seeds and make sure that the probability of failure is really small so changing seeds (and hence the underlying sequence of "random" inputs) won't likely cause a failure. Is there anything in scipy's stats module to test that a series of random variables has a given distribution? Maybe scipy.stats.kstest? Who the heck are Kolmogorov and Smirnov anyway :-)? Anne Archibald-2 wrote: > > Hi, > > I have implemented a simple Bayesian regression program (it takes > events modulo one and returns a posterior probability that the data is > phase-invariant plus a posterior distribution for two parameters > (modulation fraction and phase) in case there is modulation). I'm > rather new at this, so I'd like to construct some unit tests. Does > anyone have any suggestions on how to go about this? > > For a frequentist periodicity detector, the return value is a > probability that, given the null hypothesis is true, the statistic > would be this extreme. So I can construct a strong unit test by > generating a collection of data sets given the null hypothesis, > evaluating the statistic, and seeing whether the number that claim to > be significant at a 5% level is really 5%. (In fact I can use the > binomial distribution to get limits on the number of false positive.) > This gives me a unit test that is completely orthogonal to my > implementation, and that passes if and only if the code works. For a > Bayesian hypothesis testing setup, I don't really see how to do > something analogous. > > I can generate non-modulated data sets and confirm that my code > returns a high probability that the data is not modulated, but how > high should I expect the probability to be? I can generate data sets > with models with known parameters and check that the best-fit > parameters are close to the known parameters - but how close? Even if > I do it many times, is the posterior mean unbiased? What about the > posterior mode or median? I can even generate models and then data > sets that are drawn from the prior distribution, but what should I > expect from the code output on such a data set? I feel sure there's > some test that verifies a statistical property of Bayesian > estimators/hypothesis testers, but I cant quite put my finger on it. > > Suggestions welcome. > > Thanks, > Anne > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- View this message in context: http://old.nabble.com/Unit-testing-of-Bayesian-estimator-tp26240654p26241135.html Sent from the Scipy-User mailing list archive at Nabble.com. From bsouthey at gmail.com Sat Nov 7 22:23:25 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Sat, 7 Nov 2009 21:23:25 -0600 Subject: [SciPy-User] Unit testing of Bayesian estimator In-Reply-To: References: Message-ID: On Fri, Nov 6, 2009 at 6:13 PM, Anne Archibald wrote: > Hi, > > I have implemented a simple Bayesian regression program (it takes > events modulo one and returns a posterior probability that the data is > phase-invariant plus a posterior distribution for two parameters > (modulation fraction and phase) in case there is modulation). I do not know your field, a little rusty on certain issues and I do not consider myself a Bayesian. Exactly what type of Bayesian did you use? I also do not know how you implemented it especially if it is empirical or Monte Carlo Markov Chains. 
> I'm > rather new at this, so I'd like to construct some unit tests. Does > anyone have any suggestions on how to go about this? Since this is a test, the theoretical 'correctness' is irrelevant. So I would guess that you should use very informative priors and data with a huge amount of information. That should make the posterior have an extremely narrow range so your modal estimate is very close to the true value within a very small range. After that it really depends on the algorithm, the data used and what you need to test. Basically you just have to say given this set of inputs I get this 'result' that I consider reasonable. After all, if the implementation of algorithm works then it is most likely the inputs that are a problem. In statistics, problems usually enter because the desired model can not be estimated from the provided data. Separation of user errors from a bug in the code usually identified by fitting simpler or alternative models. > > For a frequentist periodicity detector, the return value is a > probability that, given the null hypothesis is true, the statistic > would be this extreme. So I can construct a strong unit test by > generating a collection of data sets given the null hypothesis, > evaluating the statistic, and seeing whether the number that claim to > be significant at a 5% level is really 5%. (In fact I can use the > binomial distribution to get limits on the number of false positive.) > This gives me a unit test that is completely orthogonal to my > implementation, and that passes if and only if the code works. For a > Bayesian hypothesis testing setup, I don't really see how to do > something analogous. > > I can generate non-modulated data sets and confirm that my code > returns a high probability that the data is not modulated, but how > high should I expect the probability to be? I can generate data sets > with models with known parameters and check that the best-fit > parameters are close to the known parameters - but how close? Even if > I do it many times, is the posterior mean unbiased? What about the > posterior mode or median? I can even generate models and then data > sets that are drawn from the prior distribution, but what should I > expect from the code output on such a data set? I feel sure there's > some test that verifies a statistical property of Bayesian > estimators/hypothesis testers, but I cant quite put my finger on it. > > Suggestions welcome. > > Thanks, > Anne Please do not mix Frequentist or Likelihood concepts with Bayesian. Also you never generate data for estimation from the prior distribution, you generate it from the posterior distribution as that is what your estimating. Really in Bayesian sense all this data generation is unnecessary because you have already calculated that information in computing the posteriors. The posterior of a parameter is a distribution not a single number so you just compare distributions. For example, you can compute modal values and construct Bayesian credible intervals of the parameters. These should make very strong sense to the original values simulated. For Bayesian work, you must address the data and the priors. In particular, you need to be careful about the informativeness of the prior. You can get great results just because your prior was sufficiently informative but you can get great results because you data was very informative. 
Depending on how it was implemented, a improper prior can be an issue because these do not guarantee a proper posterior (but often do lead to proper posteriors). So if your posterior is improper then you are in a very bad situation and can lead to weird results some or all of the time.Some times this is can easily be fixed such as by putting bounds on flat priors. Whereas proper priors give proper posteriors. But as a final comment, it should not matter which approach you use as if you do not get what you simulated then either your code is wrong or you did not simulate what your code implements. (Surprising how frequent the latter is.) Bruce From peridot.faceted at gmail.com Sun Nov 8 02:14:37 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Sun, 8 Nov 2009 02:14:37 -0500 Subject: [SciPy-User] Unit testing of Bayesian estimator In-Reply-To: <1cd32cbb0911061937q3ffe3182g143dd2af461c4aea@mail.gmail.com> References: <1cd32cbb0911061937q3ffe3182g143dd2af461c4aea@mail.gmail.com> Message-ID: 2009/11/6 : > On Fri, Nov 6, 2009 at 7:13 PM, Anne Archibald > wrote: >> Hi, >> >> I have implemented a simple Bayesian regression program (it takes >> events modulo one and returns a posterior probability that the data is >> phase-invariant plus a posterior distribution for two parameters >> (modulation fraction and phase) in case there is modulation). I'm >> rather new at this, so I'd like to construct some unit tests. Does >> anyone have any suggestions on how to go about this? >> >> For a frequentist periodicity detector, the return value is a >> probability that, given the null hypothesis is true, the statistic >> would be this extreme. So I can construct a strong unit test by >> generating a collection of data sets given the null hypothesis, >> evaluating the statistic, and seeing whether the number that claim to >> be significant at a 5% level is really 5%. (In fact I can use the >> binomial distribution to get limits on the number of false positive.) >> This gives me a unit test that is completely orthogonal to my >> implementation, and that passes if and only if the code works. For a >> Bayesian hypothesis testing setup, I don't really see how to do >> something analogous. >> >> I can generate non-modulated data sets and confirm that my code >> returns a high probability that the data is not modulated, but how >> high should I expect the probability to be? I can generate data sets >> with models with known parameters and check that the best-fit >> parameters are close to the known parameters - but how close? Even if >> I do it many times, is the posterior mean unbiased? What about the >> posterior mode or median? I can even generate models and then data >> sets that are drawn from the prior distribution, but what should I >> expect from the code output on such a data set? I feel sure there's >> some test that verifies a statistical property of Bayesian >> estimators/hypothesis testers, but I cant quite put my finger on it. > > > The Bayesian experts are at pymc, maybe you can look at there tests > for inspiration. I don't know those, since I never looked at that > part. > > I never tried to test a Bayesian estimator but many properties are > still the same as in the non-Bayesian analysis. In my Bayesian past, I > essentially only used normal and t distributions, and binomial. > > One of my first tests for these things is to create a huge sample and > see whether the parameter estimates converge. 
With Bayesian analysis > you still have the law of large numbers, (for non-dogmatic priors) As far as getting the code roughly working, this is what I used; just run it generating lots of photons and see that roughly the right parameters come out. Unfortunately, this isn't really very sensitive to all the things that are supposed to make a Bayesian estimator better than (say) a maximum-likelihood estimator; I could have the probability estimation pretty badly wrong, but there are so many photons that anything but the right parameters are such a horrible fit even a somewhat wrong algorithm won't select them. Maybe you meant something different: I could also try fixing some model parameters and generating just a handful of photons, so I get a crummy estimate, but then repeating the photon generation and fit many times, to see if the average value of the best-fit parameters comes out close to the true parameters. But this is a test for unbiasedness of the estimator, and it's not clear that this estimator should be unbiased even if correct. > Do you have an example with a known posterior? Then, the posterior > with a large sample or the average in a Monte Carlo should still be > approximately the true one. > For symmetric distributions, the Bayesian posterior confidence > intervals and posterior mean should be roughly the same as the > frequentist estimates. With diffuse priors, in many cases the results > are exactly the same in Bayesian and MLE. > Another version I used in the past is to trace the posterior mean, as > the prior variance is reduced, in one extreme you should get the prior > back in the other extreme the MLE. My priors are flat, on (0,1) in both phase and pulsed fraction. It seems a bit peculiar to use anything else in phase, but I can imagine some sort of logarithmic prior for pulsed fraction (making 0.01-0.1 equally likely to 0.1-1). I haven't experimented with introducing localized priors, but it seems like that too wouldn't be very sensitive to whether the Bayesian calculation is right; if the prior insists that the values are both 0.5, then any remotely sane algorithm will come up with posteriors that are also 0.5. >> I can even generate models and then data >> sets that are drawn from the prior distribution, but what should I >> expect from the code output on such a data set? > > If you ignore the Bayesian interpretation, then this is just a > standard sampling problem, you draw prior parameters and observations, > the rest is just finding the conditional and marginal probabilities. I > think the posterior odds ratio should converge in a large Monte Carlo > to the true one, and the significance levels should correspond to the > one that has been set for the test (5%). > (In simplest case of conjugate priors, you can just interpret the > prior as a previous sample and you are back to a frequentist > explanation.) This sounds like what I was trying for - draw a model according to the priors, then generate a data set according to the model. I then get some numbers out: the simplest is a probability that the model was pulsed, but I can also get a credible interval or an estimated CDF for the model parameters. But I'm trying to figure out what test I should apply to those values to see if they make sense. For a credible interval, I suppose I could take (say) a 95% credible interval, then 95 times out of a hundred the model parameters I used to generate the data set should be in the credible interval. 
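Written out for a toy problem where the posterior is known exactly (a uniform prior on a binomial proportion, not my actual model), the loop I have in mind is roughly:

import numpy as np
from scipy import stats

M, n, level = 1000, 50, 0.95
tail = (1.0 - level) / 2.0
hits = 0
for i in range(M):
    p_true = np.random.uniform(0.0, 1.0)    # draw a model from the prior
    k = np.random.binomial(n, p_true)       # draw a data set from that model
    # uniform prior + binomial likelihood means the posterior is
    # Beta(1 + k, 1 + n - k), so the equal-tailed credible interval is exact
    lo, hi = stats.beta.ppf([tail, 1.0 - tail], 1 + k, 1 + n - k)
    if lo <= p_true <= hi:
        hits += 1
print hits, "out of", M    # should come out near level*M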
And I should be able to use the binomial distribution to put limits on how close to 95% I should get in M trials. This seems to work, but I'm not sure I understand why. The credible region is obtained from a probability distribution for the model parameters, but I am turning things around and testing the distribution of credible regions. In any case, that seems to work, so now I just need to figure out a similar test for the probability of being pulsed. > The problem is that with an informative prior, you always have a > biased estimator in small samples and the posterior odds ratio is > affected by an informative prior. ?And "real" Bayesians don't care > about sampling properties. > > What are your prior distributions and the likelihood function in your > case? Can you model degenerate and diffuse priors, so that an > informative prior doesn't influence you sampling results? > I'm trying to think of special cases where you could remove the effect > of the prior. I've put the code on github in case that helps make this any clearer: http://github.com/aarchiba/bayespf The model I'm using is either: completely uniform mod 1 (probability 0.5) or: the PDF is a cosine plus a constant, where the two parameters are the fraction of area under the cosine (as opposed to under the constant) and the phase offset of the cosine. The likelihood is just (modulo logs for range issues) the product over all observed phases x of PDF(fraction, phase, x). So the mode of the posterior is exactly the maximum-likelihood estimate (whether or not I got the math right, more or less). > It's a bit vague because I don't see the details, and I haven't looked > at this in a while. As is probably obvious, I'm pretty vague on Bayesian statistics in general. But I'm working on it. Anne > > > > > > >> >> Suggestions welcome. >> >> Thanks, >> Anne >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From peridot.faceted at gmail.com Sun Nov 8 02:25:04 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Sun, 8 Nov 2009 02:25:04 -0500 Subject: [SciPy-User] [SciPy-user] Unit testing of Bayesian estimator In-Reply-To: <26241135.post@talk.nabble.com> References: <26241135.post@talk.nabble.com> Message-ID: 2009/11/7 Tom K. : > > Hi Anne, interesting question. > > I'm not really sure what a Bayesian hypothesis tester is, but I expect that > it results in a random variable. ?For a given input prior and measurement > distribution, and choice of hypothesis (signal present or signal absent), > can you know the distribution of this random variable? ?If so, it could come > down to a test that this random variable - or a function of it such as mean > or probability that it is greater than some value - behaves as expected. > How do you create a unit-test that a random variable generator is working? > If the random variables were all iid normal, you could average a bunch and > then test that the mean of the sample was close to the mean of the > distribution - which is going to be impossible to guarantee, since there is > a non-zero probability that the mean is large. 
?In practice however it is > likely that your test will pass, but you will probably have to tack down the > seeds and make sure that the probability of failure is really small so > changing seeds (and hence the underlying sequence of "random" inputs) won't > likely cause a failure. > > Is there anything in scipy's stats module to test that a series of random > variables has a given distribution? ?Maybe scipy.stats.kstest? ?Who the heck > are Kolmogorov and Smirnov anyway :-)? Focusing on the hypothesis testing part, what my code does is take a collection of photons and return the probability that they are drawn from a pulsed distribution. My prior has two alternatives: not pulsed (p=0.5) and pulsed (p=0.5, parameters randomly chosen). I feed this to a Bayesian gizmo and get back a probability that the photons were drawn from the former case. In terms of testing, the very crude tests pass: if I give it a zillion photons, it can correctly distinguish pulsed from unpulsed. But what I'd like to test is whether the probability it returns is correct. What I'd really like is some statistical test I can do on the procedure to check whether the returned numbers are correct. Of course, if I knew what distribution they were supposed to have, I could just feed them to the K-S test. But I don't. Part of the problem is that the data quality affects the result: if feed in a zillion unpulsed photons, I get a variety of probabilities that are close to zero - but how close is correct? I have no idea. If I use pulsed photons, it is even more complicated: for a large pulsed fraction, I'll get a variety of probabilities that are close to one. But if I either reduce the number of photons or the pulsed fraction, it gets harder to distinguish pulsed from unpulsed and the probabilities start to drop. But I have no real idea how much. Anne > Anne Archibald-2 wrote: >> >> Hi, >> >> I have implemented a simple Bayesian regression program (it takes >> events modulo one and returns a posterior probability that the data is >> phase-invariant plus a posterior distribution for two parameters >> (modulation fraction and phase) in case there is modulation). I'm >> rather new at this, so I'd like to construct some unit tests. Does >> anyone have any suggestions on how to go about this? >> >> For a frequentist periodicity detector, the return value is a >> probability that, given the null hypothesis is true, the statistic >> would be this extreme. So I can construct a strong unit test by >> generating a collection of data sets given the null hypothesis, >> evaluating the statistic, and seeing whether the number that claim to >> be significant at a 5% level is really 5%. (In fact I can use the >> binomial distribution to get limits on the number of false positive.) >> This gives me a unit test that is completely orthogonal to my >> implementation, and that passes if and only if the code works. For a >> Bayesian hypothesis testing setup, I don't really see how to do >> something analogous. >> >> I can generate non-modulated data sets and confirm that my code >> returns a high probability that the data is not modulated, but how >> high should I expect the probability to be? I can generate data sets >> with models with known parameters and check that the best-fit >> parameters are close to the known parameters - but how close? Even if >> I do it many times, is the posterior mean unbiased? What about the >> posterior mode or median? 
I can even generate models and then data >> sets that are drawn from the prior distribution, but what should I >> expect from the code output on such a data set? I feel sure there's >> some test that verifies a statistical property of Bayesian >> estimators/hypothesis testers, but I cant quite put my finger on it. >> >> Suggestions welcome. >> >> Thanks, >> Anne >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > -- > View this message in context: http://old.nabble.com/Unit-testing-of-Bayesian-estimator-tp26240654p26241135.html > Sent from the Scipy-User mailing list archive at Nabble.com. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From peridot.faceted at gmail.com Sun Nov 8 02:47:04 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Sun, 8 Nov 2009 02:47:04 -0500 Subject: [SciPy-User] Unit testing of Bayesian estimator In-Reply-To: References: Message-ID: 2009/11/7 Bruce Southey : > On Fri, Nov 6, 2009 at 6:13 PM, Anne Archibald > wrote: >> Hi, >> >> I have implemented a simple Bayesian regression program (it takes >> events modulo one and returns a posterior probability that the data is >> phase-invariant plus a posterior distribution for two parameters >> (modulation fraction and phase) in case there is modulation). > > I do not know your field, a little rusty on certain issues and I do > not consider myself a Bayesian. > > Exactly what type of Bayesian did you use? > I also do not know how you implemented it especially if it is > empirical or Monte Carlo Markov Chains. It's an ultra-simple toy problem, really: I did the numerical integration in the absolute simplest way possible, by evaluating the quantity to be evaluated on a grid and averaging. See github for details: http://github.com/aarchiba/bayespf I can certainly improve on this, but I'd rather get my testing issues sorted out first, so that I can test the tests, as it were, on an implementation I'm reasonably confident is correct, before changing it to a mathematically more subtle one. >> I'm >> rather new at this, so I'd like to construct some unit tests. Does >> anyone have any suggestions on how to go about this? > > Since this is a test, the theoretical 'correctness' is irrelevant. So > I would guess that you should use very informative priors and data > with a huge amount of information. That should make the posterior have > an extremely narrow range so your modal estimate is very close to the > true value within a very small range. This doesn't really test whether the estimator is doing a good job, since if I throw mountains of information at it, even a rather badly wrong implementation will eventually converge to the right answer. (This is painful experience speaking.) I disagree on the issue of theoretical correctness, though. The best tests do exactly that: test the theoretical correctness of the routine in question, ideally without any reference to the implementation. To test the SVD, for example, you just test that the two matrices are both orthogonal, and you test that multiplying them together with the singular values between gives you your original matrix. If your implementation passes this test, it is computing the SVD just fine, no matter what it looks like inside. 
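In code that check is only a few lines; here numpy's own svd stands in for the routine under test:

import numpy as np

def check_svd(a, u, s, vt):
    # the claimed factors must reproduce the input matrix...
    assert np.allclose(np.dot(u * s, vt), a)
    # ...and must be orthogonal
    assert np.allclose(np.dot(u.T, u), np.eye(u.shape[1]))
    assert np.allclose(np.dot(vt, vt.T), np.eye(vt.shape[0]))

a = np.random.randn(20, 5)
u, s, vt = np.linalg.svd(a, full_matrices=False)
check_svd(a, u, s, vt)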
With the frequentist signal-detection statistics I'm more familiar with, I can write exactly this sort of test. I talk a little more about it here: http://lighthouseinthesky.blogspot.com/2009/11/testing-statistical-tests.html This works too well, it turns out, to apply to scipy's K-S test or my own Kuiper test, since their p-values are calculated rather approximately, so they fail. > After that it really depends on the algorithm, the data used and what > you need to test. Basically you just have to say given this set of > inputs I get this 'result' that I consider reasonable. After all, if > the implementation of algorithm works then it is most likely the > inputs that are a problem. In statistics, problems usually enter > because the desired model can not be estimated from the provided data. > Separation of user errors from a bug in the code usually identified by > fitting simpler or alternative models. It's exactly the implementation I don't trust, here. I can scrutinize the implementation all I like, but I'd really like an independent check on my calculations, and staring at the code won't get me that. >> >> For a frequentist periodicity detector, the return value is a >> probability that, given the null hypothesis is true, the statistic >> would be this extreme. So I can construct a strong unit test by >> generating a collection of data sets given the null hypothesis, >> evaluating the statistic, and seeing whether the number that claim to >> be significant at a 5% level is really 5%. (In fact I can use the >> binomial distribution to get limits on the number of false positive.) >> This gives me a unit test that is completely orthogonal to my >> implementation, and that passes if and only if the code works. For a >> Bayesian hypothesis testing setup, I don't really see how to do >> something analogous. >> >> I can generate non-modulated data sets and confirm that my code >> returns a high probability that the data is not modulated, but how >> high should I expect the probability to be? I can generate data sets >> with models with known parameters and check that the best-fit >> parameters are close to the known parameters - but how close? Even if >> I do it many times, is the posterior mean unbiased? What about the >> posterior mode or median? I can even generate models and then data >> sets that are drawn from the prior distribution, but what should I >> expect from the code output on such a data set? I feel sure there's >> some test that verifies a statistical property of Bayesian >> estimators/hypothesis testers, but I cant quite put my finger on it. >> >> Suggestions welcome. >> >> Thanks, >> Anne > > Please do not mix Frequentist or Likelihood concepts with Bayesian. > Also you never generate data for estimation from the prior > distribution, you generate it from the posterior distribution as that > is what your estimating. Um. I would be picking models from the prior distribution, not data. However I find the models, I have a well-defined way to generate data from the model. Why do you say it's a bad idea to mix Bayesian and frequentist approaches? It seems to me that as I use them to try to answer similar questions, it makes sense to compare them; and since I know how to test frequentist estimators, it's worth seeing whether I can cast Bayesian estimators in frequentist terms, at least for testing purposes. > Really in Bayesian sense all this data generation is unnecessary > because you have already calculated that information in computing the > posteriors. 
The posterior of a parameter is a distribution not a > single number so you just compare distributions. ?For example, you can > compute modal values and construct Bayesian credible intervals of the > parameters. These should make very strong sense to the original values > simulated. I take this to mean that I don't need to do simulations to get credible intervals (while I normally would have to to get confidence intervals), which I agree with. But this is a different question: I'm talking about constructing a test by simulating the whole Bayesian process and seeing whether it behaves as it should. The problem is coming up with a sufficiently clear mathematical definition of "should". > For Bayesian work, you must address the data and the priors. In > particular, you need to be careful about the informativeness of the > prior. You can get great results just because your prior was > sufficiently informative but you can get great results because you > data was very informative. > > Depending on how it was implemented, a improper prior can be an issue > because these do not guarantee a proper posterior (but often do lead > to proper posteriors). So if your posterior is improper then you are > in a very bad situation and can lead to weird results some or all of > the time.Some times this is can easily be fixed such as by putting > bounds on flat priors. Whereas proper priors give proper posteriors. Indeed. I think my priors are pretty safe: 50% chance it's pulsed, flat priors in phase and pulsed fraction. In the long run I might want a slightly smarter prior on pulsed fraction, but for the moment I think it's fine. > But as a final comment, it should not matter which approach you use as > if you do not get what you simulated then either your code is wrong or > you did not simulate what your code implements. (Surprising how > frequent the latter is.) This is a bit misleading. If I use a (fairly) small number of photons, and/or a fairly small pulsed fraction, I should be astonished if I got back the model parameters exactly. I know already that the data leave a lot of room for slop, so what I am trying to test is how well this Bayesian gizmo quantifies that slop. Anne > Bruce > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From tmp50 at ukr.net Sun Nov 8 06:22:34 2009 From: tmp50 at ukr.net (Dmitrey) Date: Sun, 08 Nov 2009 13:22:34 +0200 Subject: [SciPy-User] isn't it a bug in scipy.sparse? + some questions Message-ID: Hi scipy.sparse developers and all other scipy users, I'm trying to take benefits for solving SLEs in FuncDesigner via involving scipy.sparse. Some examples are here http://openopt.org/FuncDesignerDoc#Solving_systems_of_linear_equations and example for sparse SLEs is here http://trac.openopt.org/openopt/browser/PythonPackages/FuncDesigner/FuncDesigner/examples/sparseSLE.py It already works faster than using dense matrices, but I want to speedup it even more, so I have some questions and seems like bug report (scipy.__version__ 0.7.0): from scipy import sparse from numpy import * a=sparse.lil_matrix((3,1)) a[0:3,:] = ones(3) print a.todense() #prints [[ 1.] ?[ 0.] ?[ 0.]] while I expect all-ones Questions: 1) Seems like a[some_ind,:]=something works very, very slow for lil. I have implemented a workaround, but can I use a[some_ind,:] for another format than lil? (seems like all other ones doesn't support it). 2) What is current situation with matmat and matvec functions? 
They say "deprecated" but no alternative is mentioned. 3) What is the current situation with scipy.sparse.linalg.spsolve? It says /usr/lib/python2.6/dist-packages/scipy/sparse/linalg/dsolve/linsolve.py:78: DeprecationWarning: scipy.sparse.linalg.dsolve.umfpack will be removed, install scikits.umfpack instead ' install scikits.umfpack instead', DeprecationWarning ) But I don't want my code to be dependent on a scikits module. Is there another default/autoselect solver for sparse SLEs? If not, which one would you recommend I use as the default for sparse SLEs - bicg, gmres, something else? Thank you in advance, D. -------------- next part -------------- An HTML attachment was scrubbed... URL: From eadrogue at gmx.net Sun Nov 8 10:16:25 2009 From: eadrogue at gmx.net (Ernest Adrogué) Date: Sun, 8 Nov 2009 16:16:25 +0100 Subject: [SciPy-User] the skellam distribution Message-ID: <20091108151625.GA561@doriath.local> Hi, In case somebody is interested, or you want to include it in scipy. I used these specs here from the R package: cran.r-project.org/web/packages/skellam/skellam.pdf Note that I am no statistician, somebody who knows what he's doing (as opposed to me ;) should verify it's correct.

import numpy
import scipy.stats.distributions

# Skellam distribution
ncx2 = scipy.stats.distributions.ncx2

class skellam_gen(scipy.stats.distributions.rv_discrete):
    def _pmf(self, x, mu1, mu2):
        if x < 0:
            px = ncx2.pdf(2*mu2, 2*(1-x), 2*mu1)*2
        else:
            px = ncx2.pdf(2*mu1, 2*(x+1), 2*mu2)*2
        return px
    def _cdf(self, x, mu1, mu2):
        x = numpy.floor(x)
        if x < 0:
            px = ncx2.cdf(2*mu2, x*(-2), 2*mu1)
        else:
            px = 1 - ncx2.cdf(2*mu1, 2*(x+1), 2*mu2)
        return px
    def _stats(self, mu1, mu2):
        mean = mu1 - mu2
        var = mu1 + mu2
        g1 = (mu1 - mu2) / numpy.sqrt((mu1 + mu2)**3)
        g2 = 1 / (mu1 + mu2)
        return mean, var, g1, g2

skellam = skellam_gen(a=-numpy.inf, name="skellam", longname='A Skellam',
                      shapes="mu1,mu2", extradoc="")

Bye. -- Ernest From vanforeest at gmail.com Sun Nov 8 15:30:36 2009 From: vanforeest at gmail.com (nicky van foreest) Date: Sun, 8 Nov 2009 21:30:36 +0100 Subject: [SciPy-User] characteristic functions of probability distributions In-Reply-To: <1cd32cbb0911052102r40a066e3v86a5ed62cdf954aa@mail.gmail.com> References: <1cd32cbb0911012209p117d86fbhd7dab9dbde7fbe46@mail.gmail.com> <45d1ab480911052005y3929daf8q11821596c71c895a@mail.gmail.com> <1cd32cbb0911052102r40a066e3v86a5ed62cdf954aa@mail.gmail.com> Message-ID: Hi Joseph, > Thanks Nicky, I looked at some papers by Ward Whitt and they look > interesting but much more than what I want to chew on right now. I understand. I wish I had the time to study these papers in more detail. > I don't think I ever needed a path integral in my life, That must be a most undesirable state of affairs, :-) > integral exp(i t x) dF(x) = integrate.quad(real(exp(itx)*f(x))) + j * > integrate.quad(imag(exp(itx)*f(x))) > or is there another way? Perhaps you recall that Re(exp(ix)) = cos(x), and Im(exp(ix)) = sin(x). Hence, you might try simply: integrate.quad(cos(t*x)*f(x)) + i * integrate.quad(sin(t*x)*f(x)) (untested code though..) I have my doubts about the stability of these integrations, although I am by no means an expert on this. Suppose that t is big. Then cos(tx) varies rapidly in comparison to f(x) as a function of x. Then you are adding lots of negative and positive numbers of roughly the same size... This must result in bogus.
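For what it is worth, here is a small sketch of what I mean, on the standard normal where the exact answer exp(-t**2/2) is known (again, untested beyond this toy case):

import numpy as np
from scipy import integrate, stats

def char_fn(t, pdf, lower, upper):
    # E[exp(itX)] = integral of cos(t*x)*f(x) plus i times integral of sin(t*x)*f(x)
    re = integrate.quad(lambda x: np.cos(t * x) * pdf(x), lower, upper, limit=200)[0]
    im = integrate.quad(lambda x: np.sin(t * x) * pdf(x), lower, upper, limit=200)[0]
    return re + 1j * im

for t in (0.5, 2.0, 10.0):
    print t, char_fn(t, stats.norm.pdf, -8.0, 8.0), np.exp(-t**2 / 2.0)

Already at t = 10 the exact value is about 2e-22, far below the absolute error of the quadrature, so whatever comes back at that point is mostly noise; that is the effect I mean.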
Perhaps it is better not to include any "generic" code to tranform the characteristic function into a density, unless the methods work reasonably well. bye Nicky From josef.pktd at gmail.com Sun Nov 8 17:14:58 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 8 Nov 2009 17:14:58 -0500 Subject: [SciPy-User] Unit testing of Bayesian estimator In-Reply-To: References: <1cd32cbb0911061937q3ffe3182g143dd2af461c4aea@mail.gmail.com> Message-ID: <1cd32cbb0911081414u23d3ef8cgb408c2fbed65c098@mail.gmail.com> On Sun, Nov 8, 2009 at 2:14 AM, Anne Archibald wrote: > 2009/11/6 ?: >> On Fri, Nov 6, 2009 at 7:13 PM, Anne Archibald >> wrote: >>> Hi, >>> >>> I have implemented a simple Bayesian regression program (it takes >>> events modulo one and returns a posterior probability that the data is >>> phase-invariant plus a posterior distribution for two parameters >>> (modulation fraction and phase) in case there is modulation). I'm >>> rather new at this, so I'd like to construct some unit tests. Does >>> anyone have any suggestions on how to go about this? >>> >>> For a frequentist periodicity detector, the return value is a >>> probability that, given the null hypothesis is true, the statistic >>> would be this extreme. So I can construct a strong unit test by >>> generating a collection of data sets given the null hypothesis, >>> evaluating the statistic, and seeing whether the number that claim to >>> be significant at a 5% level is really 5%. (In fact I can use the >>> binomial distribution to get limits on the number of false positive.) >>> This gives me a unit test that is completely orthogonal to my >>> implementation, and that passes if and only if the code works. For a >>> Bayesian hypothesis testing setup, I don't really see how to do >>> something analogous. >>> >>> I can generate non-modulated data sets and confirm that my code >>> returns a high probability that the data is not modulated, but how >>> high should I expect the probability to be? I can generate data sets >>> with models with known parameters and check that the best-fit >>> parameters are close to the known parameters - but how close? Even if >>> I do it many times, is the posterior mean unbiased? What about the >>> posterior mode or median? I can even generate models and then data >>> sets that are drawn from the prior distribution, but what should I >>> expect from the code output on such a data set? I feel sure there's >>> some test that verifies a statistical property of Bayesian >>> estimators/hypothesis testers, but I cant quite put my finger on it. >> >> >> The Bayesian experts are at pymc, maybe you can look at there tests >> for inspiration. I don't know those, since I never looked at that >> part. >> >> I never tried to test a Bayesian estimator but many properties are >> still the same as in the non-Bayesian analysis. In my Bayesian past, I >> essentially only used normal and t distributions, and binomial. >> >> One of my first tests for these things is to create a huge sample and >> see whether the parameter estimates converge. With Bayesian analysis >> you still have the law of large numbers, (for non-dogmatic priors) > > As far as getting the code roughly working, this is what I used; just > run it generating lots of photons and see that roughly the right > parameters come out. 
Unfortunately, this isn't really very sensitive > to all the things that are supposed to make a Bayesian estimator > better than (say) a maximum-likelihood estimator; I could have the > probability estimation pretty badly wrong, but there are so many > photons that anything but the right parameters are such a horrible fit > even a somewhat wrong algorithm won't select them. > > Maybe you meant something different: I could also try fixing some > model parameters and generating just a handful of photons, so I get a > crummy estimate, but then repeating the photon generation and fit many > times, to see if the average value of the best-fit parameters comes > out close to the true parameters. But this is a test for unbiasedness > of the estimator, and it's not clear that this estimator should be > unbiased even if correct. When I do a Monte Carlo for point estimates, I usually check bias, variance, mean squared error, and mean absolute and median absolute error (which is a more robust to outliers, e.g. because for some cases the estimator produces numerical nonsense because of non-convergence or other numerical problems). MSE captures better cases of biased estimators that are better in MSE sense. I ran your test, test_bayes.py for M = 50, 500 and 1000 adding "return in_interval_f" and inside = test_credible_interval() If my reading is correct inside should be 80% of M, and you are pretty close. (M=1000 is pretty slow on my notebook) >>> inside 39 >>> 39/50. 0.78000000000000003 >>> >>> inside 410 >>> inside/500. 0.81999999999999995 >>> >>> inside/1000. 0.81499999999999995 I haven't looked enough on the details yet, but I think this way you could test more quantiles of the distribution, to see whether the posterior distribution is roughly the same as the sampling distribution in the MonteCarlo. In each iteration of the Monte Carlo you get a full posterior distribution, after a large number of iterations you have a sampling distribution, and it should be possible to compare this distribution with the posterior distributions. I'm still not sure how. two questions to your algorithm Isn't np.random.shuffle(r) redundant? I didn't see anywhere were the sequence of observation in r would matter. Why do you subtract mx in the loglikelihood function? mx = np.amax(lpdf) p = np.exp(lpdf - mx)/np.average(np.exp(lpdf-mx)) > >> Do you have an example with a known posterior? Then, the posterior >> with a large sample or the average in a Monte Carlo should still be >> approximately the true one. >> For symmetric distributions, the Bayesian posterior confidence >> intervals and posterior mean should be roughly the same as the >> frequentist estimates. With diffuse priors, in many cases the results >> are exactly the same in Bayesian and MLE. >> Another version I used in the past is to trace the posterior mean, as >> the prior variance is reduced, in one extreme you should get the prior >> back in the other extreme the MLE. > > My priors are flat, on (0,1) in both phase and pulsed fraction. It > seems a bit peculiar to use anything else in phase, but I can imagine > some sort of logarithmic prior for pulsed fraction (making 0.01-0.1 > equally likely to 0.1-1). I haven't experimented with introducing > localized priors, but it seems like that too wouldn't be very > sensitive to whether the Bayesian calculation is right; if the prior > insists that the values are both 0.5, then any remotely sane algorithm > will come up with posteriors that are also 0.5. 
In the last sentence, you better hope that, if the true fraction is 0.1 than the posterior should be concentrated around 0.1 and not around 0.5. Right now you don't have an explicit prior, but once you use one, you might want to test the effects of an informative prior. For binomial (fraction) the natural prior is the beta distribution, if I remember correctly. But I don't know if the marginal posterior in this case would also be beta. > >>> I can even generate models and then data >>> sets that are drawn from the prior distribution, but what should I >>> expect from the code output on such a data set? >> >> If you ignore the Bayesian interpretation, then this is just a >> standard sampling problem, you draw prior parameters and observations, >> the rest is just finding the conditional and marginal probabilities. I >> think the posterior odds ratio should converge in a large Monte Carlo >> to the true one, and the significance levels should correspond to the >> one that has been set for the test (5%). >> (In simplest case of conjugate priors, you can just interpret the >> prior as a previous sample and you are back to a frequentist >> explanation.) > > This sounds like what I was trying for - draw a model according to the > priors, then generate a data set according to the model. I then get > some numbers out: the simplest is a probability that the model was > pulsed, but I can also get a credible interval or an estimated CDF for > the model parameters. ?But I'm trying to figure out what test I should > apply to those values to see if they make sense. > > For a credible interval, I suppose I could take (say) a 95% credible > interval, then 95 times out of a hundred the model parameters I used > to generate the data set should be in the credible interval. And I > should be able to use the binomial distribution to put limits on how > close to 95% I should get in M trials. This seems to work, but I'm not > sure I understand why. The credible region is obtained from a > probability distribution for the model parameters, but I am turning > things around and testing the distribution of credible regions. If you ignore the Bayesian belief interpretation, then it's just a problem of Probability Theory, and you are just checking the small and large sample behavior of an estimator and a test, whether it has a Bayesian origin or not. > > In any case, that seems to work, so now I just need to figure out a > similar test for the probability of being pulsed. "probability of being pulsed" I'm not sure what test you have in mind. There are two interpretations: In your current example, fraction is the fraction of observations that are pulsed and fraction=0 is a zero probability event. So you cannot really test fraction==0 versus fraction >0. In the other interpretation you would have a prior probability (mass) that your star is a pulsar with fraction >0 or a non-pulsing unit with fraction=0. The probabilities in both cases would be similar, but the interpretation of the test would be different, and differ between frequentists and Bayesians. Overall your results look almost too "nice", with 8000 observations you get a very narrow posterior in the plot. Josef > >> The problem is that with an informative prior, you always have a >> biased estimator in small samples and the posterior odds ratio is >> affected by an informative prior. ?And "real" Bayesians don't care >> about sampling properties. >> >> What are your prior distributions and the likelihood function in your >> case? 
Can you model degenerate and diffuse priors, so that an >> informative prior doesn't influence you sampling results? >> I'm trying to think of special cases where you could remove the effect >> of the prior. > > I've put the code on github in case that helps make this any clearer: > http://github.com/aarchiba/bayespf > > The model I'm using is either: completely uniform mod 1 (probability > 0.5) or: the PDF is a cosine plus a constant, where the two parameters > are the fraction of area under the cosine (as opposed to under the > constant) and the phase offset of the cosine. The likelihood is just > (modulo logs for range issues) the product over all observed phases x > of PDF(fraction, phase, x). So the mode of the posterior is exactly > the maximum-likelihood estimate (whether or not I got the math right, > more or less). > >> It's a bit vague because I don't see the details, and I haven't looked >> at this in a while. > > As is probably obvious, I'm pretty vague on Bayesian statistics in > general. But I'm working on it. > > Anne > >> >> >> >> >> >> >>> >>> Suggestions welcome. >>> >>> Thanks, >>> Anne >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From peridot.faceted at gmail.com Sun Nov 8 17:51:08 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Sun, 8 Nov 2009 17:51:08 -0500 Subject: [SciPy-User] Unit testing of Bayesian estimator In-Reply-To: <1cd32cbb0911081414u23d3ef8cgb408c2fbed65c098@mail.gmail.com> References: <1cd32cbb0911061937q3ffe3182g143dd2af461c4aea@mail.gmail.com> <1cd32cbb0911081414u23d3ef8cgb408c2fbed65c098@mail.gmail.com> Message-ID: 2009/11/8 : > When I do a Monte Carlo for point estimates, I usually check bias, > variance, mean squared error, > and mean absolute and median absolute error (which is a more > robust to outliers, e.g. because for some cases the estimator produces > numerical nonsense because of non-convergence or other numerical > problems). MSE captures better cases of biased estimators that are > better in MSE sense. I can certainly compute all these quantities from a collection of Monte Carlo runs, but I don't have any idea what values would indicate correctness, apart from "not too big". > I ran your test, test_bayes.py for M = 50, 500 and 1000 adding "return > in_interval_f" > and inside = test_credible_interval() > > If my reading is correct inside should be 80% of M, and you are pretty close. > (M=1000 is pretty slow on my notebook) Yeah, that's the problem with using the world's simplest numerical integration scheme. >>>> inside > 39 >>>> 39/50. > 0.78000000000000003 >>>> >>>> inside > 410 >>>> inside/500. > 0.81999999999999995 >>>> >>>> inside/1000. > 0.81499999999999995 > > I haven't looked enough on the details yet, but I think this way you could > test more quantiles of the distribution, to see whether the posterior > distribution is roughly the same as the sampling distribution in the > MonteCarlo. 
I could test more quantiles, but I'm very distrustful of testing more than one quantile per randomly-generated sample: they should be covariant (if the 90% mark is too high, the 95% mark will almost certainly be too high as well) and I don't know how to take that into account. And running the test is currently so slow I'm inclined to spend my CPU time on a stricter test of a single quantile. Though unfortunately to increase the strictness I also need to improve the sampling in phase and fraction. > In each iteration of the Monte Carlo you get a full posterior distribution, > after a large number of iterations you have a sampling distribution, > and it should be possible to compare this distribution with the > posterior distributions. I'm still not sure how. I don't understand what you mean here. I do get a full posterior distribution out of every simulation. But how would I combine these different distributions, and what would the combined distribution mean? > two questions to your algorithm > > Isn't np.random.shuffle(r) redundant? > I didn't see anywhere were the sequence of observation in r would matter. It is technically redundant. But since the point of all this is that I don't trust my code to be right, I want to make sure there's no way it can "cheat" by taking advantage of the order. And in any case, the slow part is my far-too-simple numerical integration scheme. I'm pretty sure the phase integration, at least, could be done analytically. > Why do you subtract mx in the loglikelihood function? > ? ?mx = np.amax(lpdf) > ? ?p = np.exp(lpdf - mx)/np.average(np.exp(lpdf-mx)) This is to avoid overflows. I could just use logsumexp/logaddexp, but that's not yet in numpy on any of the machines I regularly use. It has no effect on the value, since it's subtracted from top and bottom both, but it ensures that the largest value exponentiated is exactly zero. >>>> I can even generate models and then data >>>> sets that are drawn from the prior distribution, but what should I >>>> expect from the code output on such a data set? >>> >>> If you ignore the Bayesian interpretation, then this is just a >>> standard sampling problem, you draw prior parameters and observations, >>> the rest is just finding the conditional and marginal probabilities. I >>> think the posterior odds ratio should converge in a large Monte Carlo >>> to the true one, and the significance levels should correspond to the >>> one that has been set for the test (5%). >>> (In simplest case of conjugate priors, you can just interpret the >>> prior as a previous sample and you are back to a frequentist >>> explanation.) >> >> This sounds like what I was trying for - draw a model according to the >> priors, then generate a data set according to the model. I then get >> some numbers out: the simplest is a probability that the model was >> pulsed, but I can also get a credible interval or an estimated CDF for >> the model parameters. ?But I'm trying to figure out what test I should >> apply to those values to see if they make sense. >> >> For a credible interval, I suppose I could take (say) a 95% credible >> interval, then 95 times out of a hundred the model parameters I used >> to generate the data set should be in the credible interval. And I >> should be able to use the binomial distribution to put limits on how >> close to 95% I should get in M trials. This seems to work, but I'm not >> sure I understand why. 
The credible region is obtained from a >> probability distribution for the model parameters, but I am turning >> things around and testing the distribution of credible regions. > > If you ignore the Bayesian belief interpretation, then it's just a > problem of Probability Theory, and you are just checking the > small and large sample behavior of an estimator and a test, > whether it has a Bayesian origin or not. Indeed. But with frequentist tests, I have a clear statement of what they're telling me that I can test against: "If you feed this test pure noise you'll get a result this high with probability p". I haven't figured out how to turn the p-value returned by this test into something I can test against. >> In any case, that seems to work, so now I just need to figure out a >> similar test for the probability of being pulsed. > > "probability of being pulsed" > I'm not sure what test you have in mind. > There are two interpretations: > In your current example, fraction is the fraction of observations that > are pulsed and fraction=0 is a zero probability event. So you cannot > really test fraction==0 versus fraction >0. > > In the other interpretation you would have a prior probability (mass) > that your star is a pulsar with fraction >0 or a non-pulsing unit > with fraction=0. This is what the code currently implements: I begin with a 50% chance the signal is unpulsed and a 50% chance the signal is pulsed with some fraction >= 0. > The probabilities in both cases would be similar, but the interpretation > of the test would be different, and differ between frequentists and > Bayesians. > > Overall your results look almost too "nice", with 8000 observations > you get a very narrow posterior in the plot. If you supply a fairly high pulsed fraction, it's indeed easy to tell that it's pulsed with 8000 photons; the difficulty comes when you're looking for a 10% pulsed fraction; it's much harder than 800 photons with a 100% pulsed fraction. If I were really interested in the many-photons case I'd want to think about a prior that made more sense for really small fractions. But I'm keeping things simple for now. Anne From josef.pktd at gmail.com Sun Nov 8 21:35:18 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 8 Nov 2009 21:35:18 -0500 Subject: [SciPy-User] Unit testing of Bayesian estimator In-Reply-To: References: <1cd32cbb0911061937q3ffe3182g143dd2af461c4aea@mail.gmail.com> <1cd32cbb0911081414u23d3ef8cgb408c2fbed65c098@mail.gmail.com> Message-ID: <1cd32cbb0911081835i34e8d404x3e3c2f75b96f9aa9@mail.gmail.com> On Sun, Nov 8, 2009 at 5:51 PM, Anne Archibald wrote: > 2009/11/8 ?: > >> When I do a Monte Carlo for point estimates, I usually check bias, >> variance, mean squared error, >> and mean absolute and median absolute error (which is a more >> robust to outliers, e.g. because for some cases the estimator produces >> numerical nonsense because of non-convergence or other numerical >> problems). MSE captures better cases of biased estimators that are >> better in MSE sense. > > I can certainly compute all these quantities from a collection of > Monte Carlo runs, but I don't have any idea what values would indicate > correctness, apart from "not too big". I consider them mainly as an absolute standard to see how well an estimator works (or what the size and power of a test is) or to compare them to other estimators, which is a common case for publishing Monte Carlo studies for new estimators. 
> >> I ran your test, test_bayes.py for M = 50, 500 and 1000 adding "return >> in_interval_f" >> and inside = test_credible_interval() >> >> If my reading is correct inside should be 80% of M, and you are pretty close. >> (M=1000 is pretty slow on my notebook) > > Yeah, that's the problem with using the world's simplest numerical > integration scheme. > >>>>> inside >> 39 >>>>> 39/50. >> 0.78000000000000003 >>>>> >>>>> inside >> 410 >>>>> inside/500. >> 0.81999999999999995 >>>>> >>>>> inside/1000. >> 0.81499999999999995 >> >> I haven't looked enough on the details yet, but I think this way you could >> test more quantiles of the distribution, to see whether the posterior >> distribution is roughly the same as the sampling distribution in the >> MonteCarlo. > > I could test more quantiles, but I'm very distrustful of testing more > than one quantile per randomly-generated sample: they should be > covariant (if the 90% mark is too high, the 95% mark will almost > certainly be too high as well) and I don't know how to take that into > account. And running the test is currently so slow I'm inclined to > spend my CPU time on a stricter test of a single quantile. Though > unfortunately to increase the strictness I also need to improve the > sampling in phase and fraction. Adding additional quantiles might be relatively cheap, mainly the call to searchsorted. One or two quantiles could be consistent with many different distributions or e.g with fatter tails, so I usually check more points. > >> In each iteration of the Monte Carlo you get a full posterior distribution, >> after a large number of iterations you have a sampling distribution, >> and it should be possible to compare this distribution with the >> posterior distributions. I'm still not sure how. > > I don't understand what you mean here. I do get a full posterior > distribution out of every simulation. But how would I combine these > different distributions, and what would the combined distribution > mean? I'm still trying to think how this can be done. Checking more quantiles as discussed above might be doing it to some extend. (I also wonder whether it might be useful to fix the observations during the monte carlo and only vary the sampling of the parameters ?) > >> two questions to your algorithm >> >> Isn't np.random.shuffle(r) redundant? >> I didn't see anywhere were the sequence of observation in r would matter. > > It is technically redundant. But since the point of all this is that I > don't trust my code to be right, I want to make sure there's no way it > can "cheat" by taking advantage of the order. And in any case, the > slow part is my far-too-simple numerical integration scheme. I'm > pretty sure the phase integration, at least, could be done > analytically. > >> Why do you subtract mx in the loglikelihood function? >> ? ?mx = np.amax(lpdf) >> ? ?p = np.exp(lpdf - mx)/np.average(np.exp(lpdf-mx)) > > This is to avoid overflows. I could just use logsumexp/logaddexp, but > that's not yet in numpy on any of the machines I regularly use. It has > no effect on the value, since it's subtracted from top and bottom > both, but it ensures that the largest value exponentiated is exactly > zero. > >>>>> I can even generate models and then data >>>>> sets that are drawn from the prior distribution, but what should I >>>>> expect from the code output on such a data set? 
>>>> >>>> If you ignore the Bayesian interpretation, then this is just a >>>> standard sampling problem, you draw prior parameters and observations, >>>> the rest is just finding the conditional and marginal probabilities. I >>>> think the posterior odds ratio should converge in a large Monte Carlo >>>> to the true one, and the significance levels should correspond to the >>>> one that has been set for the test (5%). >>>> (In simplest case of conjugate priors, you can just interpret the >>>> prior as a previous sample and you are back to a frequentist >>>> explanation.) >>> >>> This sounds like what I was trying for - draw a model according to the >>> priors, then generate a data set according to the model. I then get >>> some numbers out: the simplest is a probability that the model was >>> pulsed, but I can also get a credible interval or an estimated CDF for >>> the model parameters. ?But I'm trying to figure out what test I should >>> apply to those values to see if they make sense. >>> >>> For a credible interval, I suppose I could take (say) a 95% credible >>> interval, then 95 times out of a hundred the model parameters I used >>> to generate the data set should be in the credible interval. And I >>> should be able to use the binomial distribution to put limits on how >>> close to 95% I should get in M trials. This seems to work, but I'm not >>> sure I understand why. The credible region is obtained from a >>> probability distribution for the model parameters, but I am turning >>> things around and testing the distribution of credible regions. >> >> If you ignore the Bayesian belief interpretation, then it's just a >> problem of Probability Theory, and you are just checking the >> small and large sample behavior of an estimator and a test, >> whether it has a Bayesian origin or not. > > Indeed. But with frequentist tests, I have a clear statement of what > they're telling me that I can test against: "If you feed this test > pure noise you'll get a result this high with probability p". I > haven't figured out how to turn the p-value returned by this test into > something I can test against. What exactly are the null and the alternative hypothesis that you want to test? This is still not clear to me, see also below. > >>> In any case, that seems to work, so now I just need to figure out a >>> similar test for the probability of being pulsed. >> >> "probability of being pulsed" >> I'm not sure what test you have in mind. >> There are two interpretations: >> In your current example, fraction is the fraction of observations that >> are pulsed and fraction=0 is a zero probability event. So you cannot >> really test fraction==0 versus fraction >0. >> >> In the other interpretation you would have a prior probability (mass) >> that your star is a pulsar with fraction >0 or a non-pulsing unit >> with fraction=0. > > This is what the code currently implements: I begin with a 50% chance > the signal is unpulsed and a 50% chance the signal is pulsed with some > fraction >= 0. I don't see this in generate you have m = np.random.binomial(n, fraction) where m is the number of pulsed observations. the probability of observing no pulsed observations is very small >>> stats.binom.pmf(0,100,0.05) 0.0059205292203340009 your likelihood function pdf_data_given_model also treats each observation with equal fraction to be pulsed or not. 
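A rough sketch of the kind of per-observation mixture likelihood being discussed; the cosine pulse profile below is an arbitrary illustrative choice, not necessarily the profile that pdf_data_given_model actually uses:

import numpy as np

def loglikelihood(phases, fraction, phase0):
    # each photon phase in [0, 1) is uniform with probability (1 - fraction)
    # and drawn from a pulse profile centered on phase0 with probability fraction
    profile = 1.0 + np.cos(2*np.pi*(phases - phase0))   # integrates to 1 on [0, 1)
    return np.sum(np.log((1.0 - fraction) + fraction*profile))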
I would have expected something like case1: pulsed according to binomial fraction, same as now case2: no observations are pulsed prior Prob(case1)=0.5, Prob(case2)=0.5 Or am I missing something? Where is there a test? You have a good posterior distribution for the fraction, which imply a point estimate and confidence interval, which look good from the tests. But I don't see a test hypothesis, (especially a Bayesian statement) Josef > >> The probabilities in both cases would be similar, but the interpretation >> of the test would be different, and differ between frequentists and >> Bayesians. >> >> Overall your results look almost too "nice", with 8000 observations >> you get a very narrow posterior in the plot. > > If you supply a fairly high pulsed fraction, it's indeed easy to tell > that it's pulsed with 8000 photons; the difficulty comes when you're > looking for a 10% pulsed fraction; it's much harder than 800 photons > with a 100% pulsed fraction. If I were really interested in the > many-photons case I'd want to think about a prior that made more sense > for really small fractions. But I'm keeping things simple for now. > > Anne > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From peridot.faceted at gmail.com Sun Nov 8 22:22:36 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Sun, 8 Nov 2009 22:22:36 -0500 Subject: [SciPy-User] Unit testing of Bayesian estimator In-Reply-To: <1cd32cbb0911081835i34e8d404x3e3c2f75b96f9aa9@mail.gmail.com> References: <1cd32cbb0911061937q3ffe3182g143dd2af461c4aea@mail.gmail.com> <1cd32cbb0911081414u23d3ef8cgb408c2fbed65c098@mail.gmail.com> <1cd32cbb0911081835i34e8d404x3e3c2f75b96f9aa9@mail.gmail.com> Message-ID: 2009/11/8 : > On Sun, Nov 8, 2009 at 5:51 PM, Anne Archibald > wrote: >> 2009/11/8 ?: >> >>> When I do a Monte Carlo for point estimates, I usually check bias, >>> variance, mean squared error, >>> and mean absolute and median absolute error (which is a more >>> robust to outliers, e.g. because for some cases the estimator produces >>> numerical nonsense because of non-convergence or other numerical >>> problems). MSE captures better cases of biased estimators that are >>> better in MSE sense. >> >> I can certainly compute all these quantities from a collection of >> Monte Carlo runs, but I don't have any idea what values would indicate >> correctness, apart from "not too big". > > I consider them mainly as an absolute standard to see how well > an estimator works (or what the size and power of a test is) or > to compare them to other estimators, which is a common case for > publishing Monte Carlo studies for new estimators. Ah. Yes, I will definitely be wanting to compute these at some point. But first I'd like to make sure this estimator is doing what I want it to. >> I could test more quantiles, but I'm very distrustful of testing more >> than one quantile per randomly-generated sample: they should be >> covariant (if the 90% mark is too high, the 95% mark will almost >> certainly be too high as well) and I don't know how to take that into >> account. And running the test is currently so slow I'm inclined to >> spend my CPU time on a stricter test of a single quantile. Though >> unfortunately to increase the strictness I also need to improve the >> sampling in phase and fraction. > > Adding additional quantiles might be relatively cheap, mainly the > call to searchsorted. 
One or two quantiles could be consistent > with many different distributions or e.g with fatter tails, so I usually > check more points. As I said, I'm concerned about using more than one credible interval per simulation run, since the credible intervals for different quantiles will be different sizes. >>> In each iteration of the Monte Carlo you get a full posterior distribution, >>> after a large number of iterations you have a sampling distribution, >>> and it should be possible to compare this distribution with the >>> posterior distributions. I'm still not sure how. >> >> I don't understand what you mean here. I do get a full posterior >> distribution out of every simulation. But how would I combine these >> different distributions, and what would the combined distribution >> mean? > > I'm still trying to think how this can be done. Checking more quantiles > as discussed above might be doing it to some extend. > (I also wonder whether it might be useful to fix the observations > during the monte carlo and only vary the sampling of the parameters ?) I can see that fixing the data would be sort of nice, but it's not at all clear to me what it would even mean to vary the model while keeping the data constant - after all, the estimator has no access to the model, only the data, so varying the model would have no effect on the result returned. >> Indeed. But with frequentist tests, I have a clear statement of what >> they're telling me that I can test against: "If you feed this test >> pure noise you'll get a result this high with probability p". I >> haven't figured out how to turn the p-value returned by this test into >> something I can test against. > > What exactly are the null and the alternative hypothesis that you > want to test? This is still not clear to me, see also below. Null hypothesis: no pulsations, all photons are drawn from a uniform distributions. Alternative: photons are drawn from a distribution with pulsed fraction f and phase p. >> This is what the code currently implements: I begin with a 50% chance >> the signal is unpulsed and a 50% chance the signal is pulsed with some >> fraction >= 0. > > I don't see this > > in generate you have m = np.random.binomial(n, fraction) > where m is the number of pulsed observations. Generate can be used to generate photons from a uniform distribution by calling it with fraction set to zero. I don't actually do this while testing the credible intervals, because (as I understand it) the presence of this hypothesis does not affect the credible intervals. That is, the credible intervals I'm testing are the credible intervals assuming that the pulsations are real. I'm not at all sure how to incorporate the alternative hypothesis into my testing. > the probability of observing no pulsed observations is very > small >>>> stats.binom.pmf(0,100,0.05) > 0.0059205292203340009 > > your likelihood function pdf_data_given_model > also treats each observation with equal fraction to be pulsed or not. pdf_data_given_model computes the PDF given a set of model parameters. If you want to use it to get a likelihood with fraction=0, you can call it with fraction=0. But this likelihood is always zero. > I would have expected something like > case1: pulsed according to binomial fraction, same as now > case2: no observations are pulsed > prior Prob(case1)=0.5, Prob(case2)=0.5 > > Or am I missing something? > > Where is there a test? 
You have a good posterior distribution > for the fraction, which imply a point estimate and confidence interval, > which look good from the tests. > But I don't see a test hypothesis, (especially a Bayesian statement) When I came to implement it, the only place I actually needed to mention the null hypothesis was in the calculation of the pulsed probability, which is the last value returned by the inference routine. I did make the somewhat peculiar choice to have the model PDF returned normalized so that its total probability was one, rather than scaling all points by the probability that the system is pulsed at all. It turns out that the inference code just computes the total normalization S over all parameters; then the probability that the signal is pulsed is S/(S+1). Anne From josef.pktd at gmail.com Mon Nov 9 00:07:53 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 9 Nov 2009 00:07:53 -0500 Subject: [SciPy-User] Unit testing of Bayesian estimator In-Reply-To: References: <1cd32cbb0911061937q3ffe3182g143dd2af461c4aea@mail.gmail.com> <1cd32cbb0911081414u23d3ef8cgb408c2fbed65c098@mail.gmail.com> <1cd32cbb0911081835i34e8d404x3e3c2f75b96f9aa9@mail.gmail.com> Message-ID: <1cd32cbb0911082107k347cbc13s2bc41efb34c4fba8@mail.gmail.com> On Sun, Nov 8, 2009 at 10:22 PM, Anne Archibald wrote: > 2009/11/8 ?: >> On Sun, Nov 8, 2009 at 5:51 PM, Anne Archibald >> wrote: >>> 2009/11/8 ?: >>> >>>> When I do a Monte Carlo for point estimates, I usually check bias, >>>> variance, mean squared error, >>>> and mean absolute and median absolute error (which is a more >>>> robust to outliers, e.g. because for some cases the estimator produces >>>> numerical nonsense because of non-convergence or other numerical >>>> problems). MSE captures better cases of biased estimators that are >>>> better in MSE sense. >>> >>> I can certainly compute all these quantities from a collection of >>> Monte Carlo runs, but I don't have any idea what values would indicate >>> correctness, apart from "not too big". >> >> I consider them mainly as an absolute standard to see how well >> an estimator works (or what the size and power of a test is) or >> to compare them to other estimators, which is a common case for >> publishing Monte Carlo studies for new estimators. > > Ah. Yes, I will definitely be wanting to compute these at some point. > But first I'd like to make sure this estimator is doing what I want it > to. > >>> I could test more quantiles, but I'm very distrustful of testing more >>> than one quantile per randomly-generated sample: they should be >>> covariant (if the 90% mark is too high, the 95% mark will almost >>> certainly be too high as well) and I don't know how to take that into >>> account. And running the test is currently so slow I'm inclined to >>> spend my CPU time on a stricter test of a single quantile. Though >>> unfortunately to increase the strictness I also need to improve the >>> sampling in phase and fraction. >> >> Adding additional quantiles might be relatively cheap, mainly the >> call to searchsorted. One or two quantiles could be consistent >> with many different distributions or e.g with fatter tails, so I usually >> check more points. > > As I said, I'm concerned about using more than one credible interval > per simulation run, since the credible intervals for different > quantiles will be different sizes. 
> >>>> In each iteration of the Monte Carlo you get a full posterior distribution, >>>> after a large number of iterations you have a sampling distribution, >>>> and it should be possible to compare this distribution with the >>>> posterior distributions. I'm still not sure how. >>> >>> I don't understand what you mean here. I do get a full posterior >>> distribution out of every simulation. But how would I combine these >>> different distributions, and what would the combined distribution >>> mean? >> >> I'm still trying to think how this can be done. Checking more quantiles >> as discussed above might be doing it to some extend. >> (I also wonder whether it might be useful to fix the observations >> during the monte carlo and only vary the sampling of the parameters ?) > > I can see that fixing the data would be sort of nice, but it's not at > all clear to me what it would even mean to vary the model while > keeping the data constant - after all, the estimator has no access to > the model, only the data, so varying the model would have no effect on > the result returned. > >>> Indeed. But with frequentist tests, I have a clear statement of what >>> they're telling me that I can test against: "If you feed this test >>> pure noise you'll get a result this high with probability p". I >>> haven't figured out how to turn the p-value returned by this test into >>> something I can test against. >> >> What exactly are the null and the alternative hypothesis that you >> want to test? This is still not clear to me, see also below. > > Null hypothesis: no pulsations, all photons are drawn from a uniform > distributions. > > Alternative: photons are drawn from a distribution with pulsed > fraction f and phase p. > >>> This is what the code currently implements: I begin with a 50% chance >>> the signal is unpulsed and a 50% chance the signal is pulsed with some >>> fraction >= 0. >> >> I don't see this >> >> in generate you have m = np.random.binomial(n, fraction) >> where m is the number of pulsed observations. > > Generate can be used to generate photons from a uniform distribution > by calling it with fraction set to zero. I don't actually do this > while testing the credible intervals, because ?(as I understand it) > the presence of this hypothesis does not affect the credible > intervals. That is, the credible intervals I'm testing are the > credible intervals assuming that the pulsations are real. I'm not at > all sure how to incorporate the alternative hypothesis into my > testing. > >> the probability of observing no pulsed observations is very >> small >>>>> stats.binom.pmf(0,100,0.05) >> 0.0059205292203340009 >> >> your likelihood function pdf_data_given_model >> also treats each observation with equal fraction to be pulsed or not. > > pdf_data_given_model computes the PDF given a set of model parameters. > If you want to use it to get a likelihood with fraction=0, you can > call it with fraction=0. But this likelihood is always zero. > >> I would have expected something like >> case1: pulsed according to binomial fraction, same as now >> case2: no observations are pulsed >> prior Prob(case1)=0.5, Prob(case2)=0.5 >> >> Or am I missing something? >> >> Where is there a test? You have a good posterior distribution >> for the fraction, which imply a point estimate and confidence interval, >> which look good from the tests. 
>> But I don't see a test hypothesis, (especially a Bayesian statement) > > When I came to implement it, the only place I actually needed to > mention the null hypothesis was in the calculation of the pulsed > probability, which is the last value returned by the inference > routine. I did make the somewhat peculiar choice to have the model PDF > returned normalized so that its total probability was one, rather than > scaling all points by the probability that the system is pulsed at > all. > > It turns out that the inference code just computes the total > normalization S over all parameters; then the probability that the > signal is pulsed is S/(S+1). Ok, I think I'm starting to see how this works, since you drop the prior probabilities (0.5, 0.5) and the likelihood under the uniform distribution is just 1, everything is pretty reduced form. From the posterior probability S/(S+1), you could construct a decision rule similar to a classical test, e.g. accept null if S/(S+1) < 0.95, and then construct a MonteCarlo with samples drawn from either the uniform or the pulsed distribution in the same way as for a classical test, and verify that the decision mistakes, alpha and beta errors, in the sample are close to the posterior probabilities. The posterior probability would be similar to the p-value in a classical test. If you want to balance alpha and beta errors, a threshold S/(S+1)<0.5 would be more appropriate, but for the unit tests it wouldn't matter. Running the example a few times, it looks like the power is relatively low for distinguishing uniform distribution from a pulsed distribution with fraction/binomial parameter 0.05 and sample size <1000. If you have strong beliefs that the fraction is really this low, then an informative prior for the fraction might improve the results. Josef > > Anne > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Mon Nov 9 09:40:50 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 9 Nov 2009 09:40:50 -0500 Subject: [SciPy-User] the skellam distribution In-Reply-To: <20091108151625.GA561@doriath.local> References: <20091108151625.GA561@doriath.local> Message-ID: <1cd32cbb0911090640q1f932141m834e26602c053018@mail.gmail.com> 2009/11/8 Ernest Adrogu? : > Hi, > > In case somebody is interested, or you want to include it > in scipy. I used these specs here from the R package: > cran.r-project.org/web/packages/skellam/skellam.pdf > > Note that I am no statician, somebody who knows what he's > doing (as opposed to me ;) should verify it's correct. > > > import numpy > import scipy.stats.distributions > > # Skellam distribution > > ncx2 = scipy.stats.distributions.ncx2 > > class skellam_gen(scipy.stats.distributions.rv_discrete): > ? ?def _pmf(self, x, mu1, mu2): > ? ? ? ?if x < 0: > ? ? ? ? ? ?px = ncx2.pdf(2*mu2, 2*(1-x), 2*mu1)*2 > ? ? ? ?else: > ? ? ? ? ? ?px = ncx2.pdf(2*mu1, 2*(x+1), 2*mu2)*2 > ? ? ? ?return px > ? ?def _cdf(self, x, mu1, mu2): > ? ? ? ?x = numpy.floor(x) > ? ? ? ?if x < 0: > ? ? ? ? ? ?px = ncx2.cdf(2*mu2, x*(-2), 2*mu1) > ? ? ? ?else: > ? ? ? ? ? ?px = 1-ncx2.cdf(2*mu1, 2*(x+1), 2*mu2) > ? ? ? ?return px > ? ?def _stats(self, mu1, mu2): > ? ? ? ?mean = mu1 - mu2 > ? ? ? ?var = mu1 + mu2 > ? ? ? ?g1 = (mu1 - mu2) / numpy.sqrt((mu1 + mu2)**3) > ? ? ? ?g2 = 1 / (mu1 + mu2) > ? ? ? ?return mean, var, g1, g2 > skellam = skellam_gen(a=-numpy.inf, name="skellam", longname='A Skellam', > ?
? ? ? ? ? ? ? ? ? ?shapes="mu1,mu2", extradoc="") > Thanks, I think the distribution of the difference of two poisson distributed random variables could be useful. Would you please open an enhancement ticket for this at http://projects.scipy.org/scipy/report/1 I had only a brief look at it so far, I had never looked at the Skellam distribution before, and just read a few references. The "if x < 0 .. else ..." will have to be replaced with a "numpy.where" assignment, since the methods are supposed to work with arrays of x (as far as I remember) _rvs could be implemented directly instead of generically (I don't find the reference, where I saw it, right now). Documentation will be necessary, a brief description in the (currently) extradocs, and a listing of the properties for the description of the distributions currently in the stats tutorial. I have some background questions, which address the limitation of the implementation (but are not really necessary for inclusion into scipy). The description in R mentions several implementations of Skellam. Do you have a rough idea what the range of parameters are for which the implementation using ncx produces good results? Do you know if any other special functions would produce good results over a larger range, e.g. using Bessel function? Wikipedia, http://en.wikipedia.org/wiki/Skellam_distribution , also mentions (but doesn't describe) the case of Skellam distribution with correlated Poisson distributions. Do you know what the difference to your implementation would be? Tests for a new distribution will be picked up by the generic tests, but it would be useful to have some extra tests for extreme/uncommon parameter ranges. Do you have any comparisons with R, since you already looked at it? Thanks again, I'm always looking out for new useful distributions, (but I have to find the time to do the testing and actual implementation). Josef > Bye. > -- > Ernest > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From bsouthey at gmail.com Mon Nov 9 10:53:33 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 09 Nov 2009 09:53:33 -0600 Subject: [SciPy-User] the skellam distribution In-Reply-To: <1cd32cbb0911090640q1f932141m834e26602c053018@mail.gmail.com> References: <20091108151625.GA561@doriath.local> <1cd32cbb0911090640q1f932141m834e26602c053018@mail.gmail.com> Message-ID: <4AF83AFD.60304@gmail.com> On 11/09/2009 08:40 AM, josef.pktd at gmail.com wrote: > 2009/11/8 Ernest Adrogu?: > >> Hi, >> >> In case somebody is interested, or you want to include it >> in scipy. I used these specs here from the R package: >> cran.r-project.org/web/packages/skellam/skellam.pdf >> >> Note that I am no statician, somebody who knows what he's >> doing (as opposed to me ;) should verify it's correct.
>> >> >> import numpy >> import scipy.stats.distributions >> >> # Skellam distribution >> >> ncx2 = scipy.stats.distributions.ncx2 >> >> class skellam_gen(scipy.stats.distributions.rv_discrete): >> def _pmf(self, x, mu1, mu2): >> if x< 0: >> px = ncx2.pdf(2*mu2, 2*(1-x), 2*mu1)*2 >> else: >> px = ncx2.pdf(2*mu1, 2*(x+1), 2*mu2)*2 >> return px >> def _cdf(self, x, mu1, mu2): >> x = numpy.floor(x) >> if x< 0: >> px = ncx2.cdf(2*mu2, x*(-2), 2*mu1) >> else: >> px = 1-ncx2.cdf(2*mu1, 2*(x+1), 2*mu2) >> return px >> def _stats(self, mu1, mu2): >> mean = mu1 - mu2 >> var = mu1 + mu2 >> g1 = (mu1 - mu2) / numpy.sqrt((mu1 + mu2)**3) >> g2 = 1 / (mu1 + mu2) >> return mean, var, g1, g2 >> skellam = skellam_gen(a=-numpy.inf, name="skellam", longname='A Skellam', >> shapes="mu1,mu2", extradoc="") >> >> > Thanks, I think the distribution of the difference of two poisson > distributed random variables could be useful. > > Would you please open an enhancement ticket for this at > http://projects.scipy.org/scipy/report/1 > > I had only a brief look at it so far, I had never looked at the > Skellam distribution before, and just read a few references. > > The "if x< 0 .. else ..." will have to be replace with a > "numpy.where" assignment, since the methods are supposed to work with > arrays of x (as far as I remember) > > _rvs could be implemented directly instead of generically (I don't > find the reference, where I saw it, right now). > > Documentation will be necessary, a brief description in the > (currently) extradocs, and a listing of the properties for the > description of the distributions currently in the stats tutorial. > > I have some background questions, which address the limitation of the > implementation (but are not really necessary for inclusion into > scipy). > > The description in R mentions several implementation of Skellam. Do > you have a rough idea what the range of parameters are for which the > implementation using ncx produces good results? Do you know if any > other special functions would produce good results over a larger > range, e.g. using Bessel function? > > Wikipedia, http://en.wikipedia.org/wiki/Skellam_distribution , also > mentions (but doesn't describe) the case of Skellam distribution with > correlated Poisson distributions. Do you know what the difference to > your implementation would be? > > Tests for a new distribution will be picked up by the generic tests, > but it would be useful to have some extra tests for extreme/uncommon > parameter ranges. Do you have any comparisons with R, since you > already looked it? > > > Thanks again, I'm always looking out for new useful distributions, > (but I have to find the time to do the testing and actual > implementation). > > Josef > > > >> Bye. >> >> -- >> Ernest >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > Generally any R code can not be used in numpy because R is GPL. Usually R code is also licensed under GPL so translation from R to Python/numpy still maintains the original license. So the code not used by numpy unless that code is licensed under a BSD compatible license. You *must* show that you implementation is from a BSD-compatible source not from the R package. 
I can see that your code is very simple so there should be an viable alternative source. Also, in the _stats function why do you do not re-use the mean and var variables in computing the g1 and g2 variables? What are 'x, mu1, mu2' ? This looks like a scalar implementation so you need to either check that or allow for array-like inputs. Bruce From cohen at lpta.in2p3.fr Mon Nov 9 11:01:44 2009 From: cohen at lpta.in2p3.fr (Johann Cohen-Tanugi) Date: Mon, 09 Nov 2009 17:01:44 +0100 Subject: [SciPy-User] the skellam distribution In-Reply-To: <4AF83AFD.60304@gmail.com> References: <20091108151625.GA561@doriath.local> <1cd32cbb0911090640q1f932141m834e26602c053018@mail.gmail.com> <4AF83AFD.60304@gmail.com> Message-ID: <4AF83CE8.7080507@lpta.in2p3.fr> From what I understand of the initial statement from Ernest: " In case somebody is interested, or you want to include it in scipy. I used these specs here from the R package: cran.r-project.org/web/packages/skellam/skellam.pdf " he used the spec, as defined in this pdf, and did not look at the code itself. If my interpretation of the small preamble above is correct, I believe his implementation is not GPL-tainted, right? Johann Bruce Southey wrote: > On 11/09/2009 08:40 AM, josef.pktd at gmail.com wrote: > >> 2009/11/8 Ernest Adrogu?: >> >> >>> Hi, >>> >>> In case somebody is interested, or you want to include it >>> in scipy. I used these specs here from the R package: >>> cran.r-project.org/web/packages/skellam/skellam.pdf >>> >>> Note that I am no statician, somebody who knows what he's >>> doing (as opposed to me ;) should verify it's correct. >>> >>> >>> import numpy >>> import scipy.stats.distributions >>> >>> # Skellam distribution >>> >>> ncx2 = scipy.stats.distributions.ncx2 >>> >>> class skellam_gen(scipy.stats.distributions.rv_discrete): >>> def _pmf(self, x, mu1, mu2): >>> if x< 0: >>> px = ncx2.pdf(2*mu2, 2*(1-x), 2*mu1)*2 >>> else: >>> px = ncx2.pdf(2*mu1, 2*(x+1), 2*mu2)*2 >>> return px >>> def _cdf(self, x, mu1, mu2): >>> x = numpy.floor(x) >>> if x< 0: >>> px = ncx2.cdf(2*mu2, x*(-2), 2*mu1) >>> else: >>> px = 1-ncx2.cdf(2*mu1, 2*(x+1), 2*mu2) >>> return px >>> def _stats(self, mu1, mu2): >>> mean = mu1 - mu2 >>> var = mu1 + mu2 >>> g1 = (mu1 - mu2) / numpy.sqrt((mu1 + mu2)**3) >>> g2 = 1 / (mu1 + mu2) >>> return mean, var, g1, g2 >>> skellam = skellam_gen(a=-numpy.inf, name="skellam", longname='A Skellam', >>> shapes="mu1,mu2", extradoc="") >>> >>> >>> >> Thanks, I think the distribution of the difference of two poisson >> distributed random variables could be useful. >> >> Would you please open an enhancement ticket for this at >> http://projects.scipy.org/scipy/report/1 >> >> I had only a brief look at it so far, I had never looked at the >> Skellam distribution before, and just read a few references. >> >> The "if x< 0 .. else ..." will have to be replace with a >> "numpy.where" assignment, since the methods are supposed to work with >> arrays of x (as far as I remember) >> >> _rvs could be implemented directly instead of generically (I don't >> find the reference, where I saw it, right now). >> >> Documentation will be necessary, a brief description in the >> (currently) extradocs, and a listing of the properties for the >> description of the distributions currently in the stats tutorial. >> >> I have some background questions, which address the limitation of the >> implementation (but are not really necessary for inclusion into >> scipy). >> >> The description in R mentions several implementation of Skellam. 
Do >> you have a rough idea what the range of parameters are for which the >> implementation using ncx produces good results? Do you know if any >> other special functions would produce good results over a larger >> range, e.g. using Bessel function? >> >> Wikipedia, http://en.wikipedia.org/wiki/Skellam_distribution , also >> mentions (but doesn't describe) the case of Skellam distribution with >> correlated Poisson distributions. Do you know what the difference to >> your implementation would be? >> >> Tests for a new distribution will be picked up by the generic tests, >> but it would be useful to have some extra tests for extreme/uncommon >> parameter ranges. Do you have any comparisons with R, since you >> already looked it? >> >> >> Thanks again, I'm always looking out for new useful distributions, >> (but I have to find the time to do the testing and actual >> implementation). >> >> Josef >> >> >> >> >>> Bye. >>> >>> -- >>> Ernest >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > Generally any R code can not be used in numpy because R is GPL. Usually > R code is also licensed under GPL so translation from R to Python/numpy > still maintains the original license. So the code not used by numpy > unless that code is licensed under a BSD compatible license. > > You *must* show that you implementation is from a BSD-compatible source > not from the R package. I can see that your code is very simple so there > should be an viable alternative source. > > Also, in the _stats function why do you do not re-use the mean and var > variables in computing the g1 and g2 variables? > > What are 'x, mu1, mu2' ? > This looks like a scalar implementation so you need to either check that > or allow for array-like inputs. > > Bruce > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From josef.pktd at gmail.com Mon Nov 9 11:07:02 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 9 Nov 2009 11:07:02 -0500 Subject: [SciPy-User] the skellam distribution In-Reply-To: <4AF83AFD.60304@gmail.com> References: <20091108151625.GA561@doriath.local> <1cd32cbb0911090640q1f932141m834e26602c053018@mail.gmail.com> <4AF83AFD.60304@gmail.com> Message-ID: <1cd32cbb0911090807x2cdcebbcu9b296b8a943630e9@mail.gmail.com> On Mon, Nov 9, 2009 at 10:53 AM, Bruce Southey wrote: > On 11/09/2009 08:40 AM, josef.pktd at gmail.com wrote: >> 2009/11/8 Ernest Adrogu?: >> >>> Hi, >>> >>> In case somebody is interested, or you want to include it >>> in scipy. I used these specs here from the R package: >>> cran.r-project.org/web/packages/skellam/skellam.pdf >>> >>> Note that I am no statician, somebody who knows what he's >>> doing (as opposed to me ;) should verify it's correct. >>> >>> >>> import numpy >>> import scipy.stats.distributions >>> >>> # Skellam distribution >>> >>> ncx2 = scipy.stats.distributions.ncx2 >>> >>> class skellam_gen(scipy.stats.distributions.rv_discrete): >>> ? ? def _pmf(self, x, mu1, mu2): >>> ? ? ? ? if x< ?0: >>> ? ? ? ? ? ? px = ncx2.pdf(2*mu2, 2*(1-x), 2*mu1)*2 >>> ? ? ? ? else: >>> ? ? ? ? ? ? px = ncx2.pdf(2*mu1, 2*(x+1), 2*mu2)*2 >>> ? ? ? ? return px >>> ? ? 
def _cdf(self, x, mu1, mu2): >>> ? ? ? ? x = numpy.floor(x) >>> ? ? ? ? if x< ?0: >>> ? ? ? ? ? ? px = ncx2.cdf(2*mu2, x*(-2), 2*mu1) >>> ? ? ? ? else: >>> ? ? ? ? ? ? px = 1-ncx2.cdf(2*mu1, 2*(x+1), 2*mu2) >>> ? ? ? ? return px >>> ? ? def _stats(self, mu1, mu2): >>> ? ? ? ? mean = mu1 - mu2 >>> ? ? ? ? var = mu1 + mu2 >>> ? ? ? ? g1 = (mu1 - mu2) / numpy.sqrt((mu1 + mu2)**3) >>> ? ? ? ? g2 = 1 / (mu1 + mu2) >>> ? ? ? ? return mean, var, g1, g2 >>> skellam = skellam_gen(a=-numpy.inf, name="skellam", longname='A Skellam', >>> ? ? ? ? ? ? ? ? ? ? ? shapes="mu1,mu2", extradoc="") >>> >>> >> Thanks, I think the distribution of the difference of two poisson >> distributed random variables could be useful. >> >> Would you please open an enhancement ticket for this at >> http://projects.scipy.org/scipy/report/1 >> >> I had only a brief look at it so far, I had never looked at the >> Skellam distribution before, and just read a few references. >> >> The "if x< ?0 .. else ..." will have to be replace with a >> "numpy.where" assignment, since the methods are supposed to work with >> arrays of x (as far as I remember) >> >> _rvs could be implemented directly instead of generically (I don't >> find the reference, where I saw it, right now). >> >> Documentation will be necessary, ?a brief description in the >> (currently) extradocs, and a listing of the properties for the >> description of the distributions currently in the stats tutorial. >> >> I have some background questions, which address the limitation of the >> implementation (but are not really necessary for inclusion into >> scipy). >> >> The description in R mentions several implementation of Skellam. Do >> you have a rough idea what the range of parameters are for which the >> implementation using ncx produces good results? Do you know if any >> other special functions would produce good results over a larger >> range, e.g. using Bessel function? >> >> Wikipedia, http://en.wikipedia.org/wiki/Skellam_distribution , also >> mentions (but doesn't describe) the case of Skellam distribution with >> correlated Poisson distributions. Do you know what the difference to >> your implementation would be? >> >> Tests for a new distribution will be picked up by the generic tests, >> but it would be useful to have some extra tests for extreme/uncommon >> parameter ranges. Do you have any comparisons with R, since you >> already looked it? >> >> >> Thanks again, I'm always looking out for new useful distributions, >> (but I have to find the time to do the testing and actual >> implementation). >> >> Josef >> >> >> >>> Bye. >>> >>> -- >>> Ernest >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > Generally any R code can not be used in numpy because R is GPL. Usually > R code is also licensed under GPL ?so translation from R to Python/numpy > still maintains the original license. So the code not used by numpy > unless that code is licensed under a BSD compatible license. > > You *must* show that you implementation is from a BSD-compatible source > not from the R package. I can see that your code is very simple so there > should be an viable alternative source. 
We only need to read the R documentation and not the R code: pskellam(x,lambda1,lambda2) returns pchisq(2*lambda2, -2*x, 2*lambda1) for x <= 0 and 1 - pchisq(2*lambda1, 2*(x+1), 2*lambda2) for x >= 0. When pchisq incorrectly returns 0, a saddlepoint approximation is substituted, which typically gives at least 2-figure accuracy. The quantile is defined as the smallest value x such that F(x) >= p, where F is the distribution function. For lower.tail=FALSE, the quantile is defined as the largest value x such that F(x;lower.tail=FALSE) >= p. rskellam is calculated as rpois(n,lambda1)-rpois(n,lambda2) dskellam. and The relation of dgamma to the modified Bessel function of the first kind was given by Skellam (1946). The relation of pgamma to the noncentral chi-square was given by Johnson (1959). Tables are given by Strackee and van der Gon (1962), which can be used to verify this implementation (cf. direct calculation in the examples below). The rest follows from the Wikipedia page (which is also in the list of references in the R docs); there is no copyright on the definition of a distribution. > > Also, in the _stats function why do you do not re-use the mean and var > variables in computing the g1 and g2 variables? > > What are 'x, mu1, mu2' ? x is the integer at which cdf or pmf are calculated, mu1,mu2 are the parameters of the poisson distributions, following wikipedia. Josef > This looks like a scalar implementation so you need to either check that > or allow for array-like inputs. > > Bruce > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From bsouthey at gmail.com Mon Nov 9 11:19:41 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 09 Nov 2009 10:19:41 -0600 Subject: [SciPy-User] the skellam distribution In-Reply-To: <4AF83CE8.7080507@lpta.in2p3.fr> References: <20091108151625.GA561@doriath.local> <1cd32cbb0911090640q1f932141m834e26602c053018@mail.gmail.com> <4AF83AFD.60304@gmail.com> <4AF83CE8.7080507@lpta.in2p3.fr> Message-ID: <4AF8411D.9050605@gmail.com> On 11/09/2009 10:01 AM, Johann Cohen-Tanugi wrote: > ? ?From what I understand of the initial statement from Ernest: > " > In case somebody is interested, or you want to include it > in scipy. I used these specs here from the R package: > cran.r-project.org/web/packages/skellam/skellam.pdf > " > > he used the spec, as defined in this pdf, and did not look at the code > itself. If my interpretation of the small preamble above is correct, I > believe his implementation is not GPL-tainted, right? > > Johann > Bruce Southey wrote: > >> On 11/09/2009 08:40 AM, josef.pktd at gmail.com wrote: >> >> >>> 2009/11/8 Ernest Adrogu?: >>> >>> >>> >>>> Hi, >>>> >>>> In case somebody is interested, or you want to include it >>>> in scipy. I used these specs here from the R package: >>>> cran.r-project.org/web/packages/skellam/skellam.pdf >>>> >>>> Note that I am no statician, somebody who knows what he's >>>> doing (as opposed to me ;) should verify it's correct.
>>>> >>>> >>>> import numpy >>>> import scipy.stats.distributions >>>> >>>> # Skellam distribution >>>> >>>> ncx2 = scipy.stats.distributions.ncx2 >>>> >>>> class skellam_gen(scipy.stats.distributions.rv_discrete): >>>> def _pmf(self, x, mu1, mu2): >>>> if x< 0: >>>> px = ncx2.pdf(2*mu2, 2*(1-x), 2*mu1)*2 >>>> else: >>>> px = ncx2.pdf(2*mu1, 2*(x+1), 2*mu2)*2 >>>> return px >>>> def _cdf(self, x, mu1, mu2): >>>> x = numpy.floor(x) >>>> if x< 0: >>>> px = ncx2.cdf(2*mu2, x*(-2), 2*mu1) >>>> else: >>>> px = 1-ncx2.cdf(2*mu1, 2*(x+1), 2*mu2) >>>> return px >>>> def _stats(self, mu1, mu2): >>>> mean = mu1 - mu2 >>>> var = mu1 + mu2 >>>> g1 = (mu1 - mu2) / numpy.sqrt((mu1 + mu2)**3) >>>> g2 = 1 / (mu1 + mu2) >>>> return mean, var, g1, g2 >>>> skellam = skellam_gen(a=-numpy.inf, name="skellam", longname='A Skellam', >>>> shapes="mu1,mu2", extradoc="") >>>> >>>> >>>> >>>> >>> Thanks, I think the distribution of the difference of two poisson >>> distributed random variables could be useful. >>> >>> Would you please open an enhancement ticket for this at >>> http://projects.scipy.org/scipy/report/1 >>> >>> I had only a brief look at it so far, I had never looked at the >>> Skellam distribution before, and just read a few references. >>> >>> The "if x< 0 .. else ..." will have to be replace with a >>> "numpy.where" assignment, since the methods are supposed to work with >>> arrays of x (as far as I remember) >>> >>> _rvs could be implemented directly instead of generically (I don't >>> find the reference, where I saw it, right now). >>> >>> Documentation will be necessary, a brief description in the >>> (currently) extradocs, and a listing of the properties for the >>> description of the distributions currently in the stats tutorial. >>> >>> I have some background questions, which address the limitation of the >>> implementation (but are not really necessary for inclusion into >>> scipy). >>> >>> The description in R mentions several implementation of Skellam. Do >>> you have a rough idea what the range of parameters are for which the >>> implementation using ncx produces good results? Do you know if any >>> other special functions would produce good results over a larger >>> range, e.g. using Bessel function? >>> >>> Wikipedia, http://en.wikipedia.org/wiki/Skellam_distribution , also >>> mentions (but doesn't describe) the case of Skellam distribution with >>> correlated Poisson distributions. Do you know what the difference to >>> your implementation would be? >>> >>> Tests for a new distribution will be picked up by the generic tests, >>> but it would be useful to have some extra tests for extreme/uncommon >>> parameter ranges. Do you have any comparisons with R, since you >>> already looked it? >>> >>> >>> Thanks again, I'm always looking out for new useful distributions, >>> (but I have to find the time to do the testing and actual >>> implementation). >>> >>> Josef >>> >>> >>> >>> >>> >>>> Bye. >>>> >>>> -- >>>> Ernest >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>>> >>>> >>>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >>> >> Generally any R code can not be used in numpy because R is GPL. Usually >> R code is also licensed under GPL so translation from R to Python/numpy >> still maintains the original license. 
So the code not used by numpy >> unless that code is licensed under a BSD compatible license. >> >> You *must* show that you implementation is from a BSD-compatible source >> not from the R package. I can see that your code is very simple so there >> should be an viable alternative source. >> >> Also, in the _stats function why do you do not re-use the mean and var >> variables in computing the g1 and g2 variables? >> >> What are 'x, mu1, mu2' ? >> This looks like a scalar implementation so you need to either check that >> or allow for array-like inputs. >> >> Bruce >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > I am not a lawyer! But I do not see that any reference to not seeing the code. Furthermore, there is insufficient information in the cited reference for this implementation (but I have not seen the actual code and would rather not have to see it). But, as Josef pointed out, there is a Wikipedia source so it should be trivial to show that this code is independent of the R implementation. Bruce From josef.pktd at gmail.com Mon Nov 9 11:58:04 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 9 Nov 2009 11:58:04 -0500 Subject: [SciPy-User] the skellam distribution In-Reply-To: <4AF8411D.9050605@gmail.com> References: <20091108151625.GA561@doriath.local> <1cd32cbb0911090640q1f932141m834e26602c053018@mail.gmail.com> <4AF83AFD.60304@gmail.com> <4AF83CE8.7080507@lpta.in2p3.fr> <4AF8411D.9050605@gmail.com> Message-ID: <1cd32cbb0911090858r60713d83jf4e351401a640b12@mail.gmail.com> On Mon, Nov 9, 2009 at 11:19 AM, Bruce Southey wrote: > On 11/09/2009 10:01 AM, Johann Cohen-Tanugi wrote: >> ? ?From what I understand of the initial statement from Ernest: >> " >> In case somebody is interested, or you want to include it >> in scipy. I used these specs here from the R package: >> cran.r-project.org/web/packages/skellam/skellam.pdf >> " >> >> he used the spec, as defined in this pdf, and did not look at the code >> itself. If my interpretation of the small preamble above is correct, I >> believe his implementation is not GPL-tainted, right? >> >> Johann >> Bruce Southey wrote: >> >>> On 11/09/2009 08:40 AM, josef.pktd at gmail.com wrote: >>> >>> >>>> 2009/11/8 Ernest Adrogu?: >>>> >>>> >>>> >>>>> Hi, >>>>> >>>>> In case somebody is interested, or you want to include it >>>>> in scipy. I used these specs here from the R package: >>>>> cran.r-project.org/web/packages/skellam/skellam.pdf >>>>> >>>>> Note that I am no statician, somebody who knows what he's >>>>> doing (as opposed to me ;) should verify it's correct. >>>>> >>>>> >>>>> import numpy >>>>> import scipy.stats.distributions >>>>> >>>>> # Skellam distribution >>>>> >>>>> ncx2 = scipy.stats.distributions.ncx2 >>>>> >>>>> class skellam_gen(scipy.stats.distributions.rv_discrete): >>>>> ? ? ?def _pmf(self, x, mu1, mu2): >>>>> ? ? ? ? ?if x< ? 0: >>>>> ? ? ? ? ? ? ?px = ncx2.pdf(2*mu2, 2*(1-x), 2*mu1)*2 >>>>> ? ? ? ? ?else: >>>>> ? ? ? ? ? ? ?px = ncx2.pdf(2*mu1, 2*(x+1), 2*mu2)*2 >>>>> ? ? ? ? ?return px >>>>> ? ? ?def _cdf(self, x, mu1, mu2): >>>>> ? ? ? ? ?x = numpy.floor(x) >>>>> ? ? ? ? ?if x< ? 0: >>>>> ? ? ? ? ? ? ?px = ncx2.cdf(2*mu2, x*(-2), 2*mu1) >>>>> ? ? ? ? ?else: >>>>> ? ? ? ? ? ? ?px = 1-ncx2.cdf(2*mu1, 2*(x+1), 2*mu2) >>>>> ? ? ? ? 
?return px >>>>> ? ? ?def _stats(self, mu1, mu2): >>>>> ? ? ? ? ?mean = mu1 - mu2 >>>>> ? ? ? ? ?var = mu1 + mu2 >>>>> ? ? ? ? ?g1 = (mu1 - mu2) / numpy.sqrt((mu1 + mu2)**3) >>>>> ? ? ? ? ?g2 = 1 / (mu1 + mu2) >>>>> ? ? ? ? ?return mean, var, g1, g2 >>>>> skellam = skellam_gen(a=-numpy.inf, name="skellam", longname='A Skellam', >>>>> ? ? ? ? ? ? ? ? ? ? ? ?shapes="mu1,mu2", extradoc="") >>>>> >>>>> >>>>> >>>>> >>>> Thanks, I think the distribution of the difference of two poisson >>>> distributed random variables could be useful. >>>> >>>> Would you please open an enhancement ticket for this at >>>> http://projects.scipy.org/scipy/report/1 >>>> >>>> I had only a brief look at it so far, I had never looked at the >>>> Skellam distribution before, and just read a few references. >>>> >>>> The "if x< ? 0 .. else ..." will have to be replace with a >>>> "numpy.where" assignment, since the methods are supposed to work with >>>> arrays of x (as far as I remember) >>>> >>>> _rvs could be implemented directly instead of generically (I don't >>>> find the reference, where I saw it, right now). >>>> >>>> Documentation will be necessary, ?a brief description in the >>>> (currently) extradocs, and a listing of the properties for the >>>> description of the distributions currently in the stats tutorial. >>>> >>>> I have some background questions, which address the limitation of the >>>> implementation (but are not really necessary for inclusion into >>>> scipy). >>>> >>>> The description in R mentions several implementation of Skellam. Do >>>> you have a rough idea what the range of parameters are for which the >>>> implementation using ncx produces good results? Do you know if any >>>> other special functions would produce good results over a larger >>>> range, e.g. using Bessel function? >>>> >>>> Wikipedia, http://en.wikipedia.org/wiki/Skellam_distribution , also >>>> mentions (but doesn't describe) the case of Skellam distribution with >>>> correlated Poisson distributions. Do you know what the difference to >>>> your implementation would be? same form, different interpretation: "The distributions of the difference between two independent and two bivariate (correlated) Poisson variates are of the same form. However, the interpretation of the parameters is different. Assuming that the bivariate Poisson distribution is the correct distribution, then the marginal means of x and y will be unbiased estimates of theta1 + theta3 and theta2 + theta3, respectively, instead of the parameters of interest theta1 and theta2. Therefore, the parameters of the PD distribution are not directly connected to the marginal means of the actual Poisson distributions." from Bayesian analysis of the differences of count data, D. Karlis and I. Ntzoufras, STATISTICS IN MEDICINE, Statist. Med. 2006; 25:1885-1905, published online 26 October 2005 in Wiley InterScience (www.interscience.wiley.com), DOI: 10.1002/sim.2382 they have some funny application to soccer scores http://stat-athens.aueb.gr/~jbn/publications.htm Josef >>>> >>>> Tests for a new distribution will be picked up by the generic tests, >>>> but it would be useful to have some extra tests for extreme/uncommon >>>> parameter ranges. Do you have any comparisons with R, since you >>>> already looked it? >>>> >>>> >>>> Thanks again, I'm always looking out for new useful distributions, >>>> (but I have to find the time to do the testing and actual >>>> implementation). >>>> >>>> Josef >>>> >>>> >>>> >>>> >>>>> Bye.
>>>>> >>>>> -- >>>>> Ernest >>>>> _______________________________________________ >>>>> SciPy-User mailing list >>>>> SciPy-User at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>>> >>>>> >>>>> >>>>> >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>>> >>>> >>> Generally any R code can not be used in numpy because R is GPL. Usually >>> R code is also licensed under GPL ?so translation from R to Python/numpy >>> still maintains the original license. So the code not used by numpy >>> unless that code is licensed under a BSD compatible license. >>> >>> You *must* show that you implementation is from a BSD-compatible source >>> not from the R package. I can see that your code is very simple so there >>> should be an viable alternative source. >>> >>> Also, in the _stats function why do you do not re-use the mean and var >>> variables in computing the g1 and g2 variables? >>> >>> What are 'x, mu1, mu2' ? >>> This looks like a scalar implementation so you need to either check that >>> or allow for array-like inputs. >>> >>> Bruce >>> >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > I am not a lawyer! But I do not see that any reference to not seeing the > code. Furthermore, there is insufficient information in the cited > reference for this implementation (but I have not seen the actual code > and would rather not have to see it). But, as Josef pointed out, there > is a Wikipedia source so it should be trivial to show that this code is > independent of the R implementation. > > Bruce > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From bsouthey at gmail.com Mon Nov 9 12:18:41 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 09 Nov 2009 11:18:41 -0600 Subject: [SciPy-User] Unit testing of Bayesian estimator In-Reply-To: References: Message-ID: <4AF84EF1.2090608@gmail.com> On 11/08/2009 01:47 AM, Anne Archibald wrote: > 2009/11/7 Bruce Southey: > >> On Fri, Nov 6, 2009 at 6:13 PM, Anne Archibald >> wrote: >> >>> Hi, >>> >>> I have implemented a simple Bayesian regression program (it takes >>> events modulo one and returns a posterior probability that the data is >>> phase-invariant plus a posterior distribution for two parameters >>> (modulation fraction and phase) in case there is modulation). >>> >> I do not know your field, a little rusty on certain issues and I do >> not consider myself a Bayesian. >> >> Exactly what type of Bayesian did you use? >> I also do not know how you implemented it especially if it is >> empirical or Monte Carlo Markov Chains. >> > It's an ultra-simple toy problem, really: I did the numerical > integration in the absolute simplest way possible, by evaluating the > quantity to be evaluated on a grid and averaging. 
See github for > details: > http://github.com/aarchiba/bayespf > > I can certainly improve on this, but I'd rather get my testing issues > sorted out first, so that I can test the tests, as it were, on an > implementation I'm reasonably confident is correct, before changing it > to a mathematically more subtle one. > I do not know what you are trying to do with the code as it is not my area. But you are using some empirical Bayesian estimator (http://en.wikipedia.org/wiki/Empirical_Bayes_method) and thus you lose much of the value of Bayesian as you are only dealing with modal estimates. Really you should be obtaining the distribution of "Probability the signal is pulsed" not just the modal estimate. >>> I'm >>> rather new at this, so I'd like to construct some unit tests. Does >>> anyone have any suggestions on how to go about this? >>> >> Since this is a test, the theoretical 'correctness' is irrelevant. So >> I would guess that you should use very informative priors and data >> with a huge amount of information. That should make the posterior have >> an extremely narrow range so your modal estimate is very close to the >> true value within a very small range. >> > This doesn't really test whether the estimator is doing a good job, > since if I throw mountains of information at it, even a rather badly > wrong implementation will eventually converge to the right answer. > (This is painful experience speaking.) > Are you testing the code or the method? My understanding of unit tests is that they test the code not the method. Unit tests tell me that my code is working correctly but do not necessary tell me if the method is right always. For example, if I need to iterate to get a solution, my test could stop after 1 or 2 rounds before convergence because I know that rest will be correct if the first rounds are correct. Testing the algorithm is relatively easy because you just have to use sensitivity analysis. Basically just use multiple data sets that vary in the number of observations and parameters to see how well these work. The hard part is making sense of the numbers. Also note that you have some explicit assumptions involved like the type of prior distribution. These tend to limit what you can do because if these assume a uniform prior then you can not use a non-uniform data set. Well you can but unless the data dominates the prior you will most likely get a weird answer. > I disagree on the issue of theoretical correctness, though. The best > tests do exactly that: test the theoretical correctness of the routine > in question, ideally without any reference to the implementation. To > test the SVD, for example, you just test that the two matrices are > both orthogonal, and you test that multiplying them together with the > singular values between gives you your original matrix. If your > implementation passes this test, it is computing the SVD just fine, no > matter what it looks like inside. > I agree that code must provide theoretical correctness. But I disagree that the code should always give the correct answer because the code is only as good as the algorithm. > With the frequentist signal-detection statistics I'm more familiar > with, I can write exactly this sort of test. I talk a little more > about it here: > http://lighthouseinthesky.blogspot.com/2009/11/testing-statistical-tests.html > > This works too well, it turns out, to apply to scipy's K-S test or my > own Kuiper test, since their p-values are calculated rather > approximately, so they fail. 
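A minimal sketch of that kind of calibration check, to make the idea concrete (the use of scipy.stats.kstest against a uniform null, the 5% level and the sample sizes are illustrative assumptions, not code from the thread or the blog post):

import numpy as np
from scipy import stats

def check_test_size(n_datasets=1000, n_events=100, alpha=0.05, seed=0):
    # Draw data sets from the null (uniform, unpulsed), run the test,
    # and count how often it claims significance at level alpha.
    rng = np.random.RandomState(seed)
    false_positives = 0
    for _ in range(n_datasets):
        events = rng.uniform(0.0, 1.0, size=n_events)
        D, p = stats.kstest(events, 'uniform')
        if p < alpha:
            false_positives += 1
    # under the null, false_positives ~ Binomial(n_datasets, alpha),
    # so check that it falls inside a 99% acceptance band
    lo, hi = stats.binom.ppf([0.005, 0.995], n_datasets, alpha)
    return false_positives, lo, hi

if __name__ == '__main__':
    fp, lo, hi = check_test_size()
    print("%d false positives, acceptance band [%d, %d]" % (fp, lo, hi))

If the p-values are only approximate, the count can drift outside the band even when the implementation itself is fine, which is the failure mode described above.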
> Again, this is a failure of the algorithm not the code. Often statistical tests rely on large sample sizes or rely on the central limit theorem so these break down when the data is not well approximated by the normal distribution and when the sample size is small. In the blog, your usage of the chi-squared approximation is an example of this - it will be inappropriate for small sample size as well as when the true probability is very extreme (usually consider it valid between about 0.2 and 0.8 obviously depending on sample size). >> After that it really depends on the algorithm, the data used and what >> you need to test. Basically you just have to say given this set of >> inputs I get this 'result' that I consider reasonable. After all, if >> the implementation of algorithm works then it is most likely the >> inputs that are a problem. In statistics, problems usually enter >> because the desired model can not be estimated from the provided data. >> Separation of user errors from a bug in the code usually identified by >> fitting simpler or alternative models. >> > It's exactly the implementation I don't trust, here. I can scrutinize > the implementation all I like, but I'd really like an independent > check on my calculations, and staring at the code won't get me that. > I think that you mean is the algorithm and I do agree that looking at the code will only tell you that you have implemented the algorithm and will not tell you if the algorithm can be trusted. >>> For a frequentist periodicity detector, the return value is a >>> probability that, given the null hypothesis is true, the statistic >>> would be this extreme. So I can construct a strong unit test by >>> generating a collection of data sets given the null hypothesis, >>> evaluating the statistic, and seeing whether the number that claim to >>> be significant at a 5% level is really 5%. (In fact I can use the >>> binomial distribution to get limits on the number of false positive.) >>> This gives me a unit test that is completely orthogonal to my >>> implementation, and that passes if and only if the code works. For a >>> Bayesian hypothesis testing setup, I don't really see how to do >>> something analogous. >>> >>> I can generate non-modulated data sets and confirm that my code >>> returns a high probability that the data is not modulated, but how >>> high should I expect the probability to be? I can generate data sets >>> with models with known parameters and check that the best-fit >>> parameters are close to the known parameters - but how close? Even if >>> I do it many times, is the posterior mean unbiased? What about the >>> posterior mode or median? I can even generate models and then data >>> sets that are drawn from the prior distribution, but what should I >>> expect from the code output on such a data set? I feel sure there's >>> some test that verifies a statistical property of Bayesian >>> estimators/hypothesis testers, but I cant quite put my finger on it. >>> >>> Suggestions welcome. >>> >>> Thanks, >>> Anne >>> >> Please do not mix Frequentist or Likelihood concepts with Bayesian. >> Also you never generate data for estimation from the prior >> distribution, you generate it from the posterior distribution as that >> is what your estimating. >> > Um. I would be picking models from the prior distribution, not data. > However I find the models, I have a well-defined way to generate data > from the model. > > Why do you say it's a bad idea to mix Bayesian and frequentist > approaches? 
It seems to me that as I use them to try to answer similar > questions, it makes sense to compare them; and since I know how to > test frequentist estimators, it's worth seeing whether I can cast > Bayesian estimators in frequentist terms, at least for testing > purposes. > This is the fundamental difference between Bayesian and Frequentist approaches. In Bayesian, the posterior provides everything that you know about a parameter because it is a distribution. However, the modal parameter estimates should agree between both approaches. >> Really in Bayesian sense all this data generation is unnecessary >> because you have already calculated that information in computing the >> posteriors. The posterior of a parameter is a distribution not a >> single number so you just compare distributions. For example, you can >> compute modal values and construct Bayesian credible intervals of the >> parameters. These should make very strong sense to the original values >> simulated. >> > I take this to mean that I don't need to do simulations to get > credible intervals (while I normally would have to to get confidence > intervals), which I agree with. But this is a different question: I'm > talking about constructing a test by simulating the whole Bayesian > process and seeing whether it behaves as it should. The problem is > coming up with a sufficiently clear mathematical definition of > "should". > In Bayesian, you should have the posterior distribution of the parameter which is far more than just the mode, mean and variance. So if the posterior is normal, then I know what the mean and variance should be and thus what the confidence interval should be. >> For Bayesian work, you must address the data and the priors. In >> particular, you need to be careful about the informativeness of the >> prior. You can get great results just because your prior was >> sufficiently informative but you can get great results because you >> data was very informative. >> >> Depending on how it was implemented, a improper prior can be an issue >> because these do not guarantee a proper posterior (but often do lead >> to proper posteriors). So if your posterior is improper then you are >> in a very bad situation and can lead to weird results some or all of >> the time.Some times this is can easily be fixed such as by putting >> bounds on flat priors. Whereas proper priors give proper posteriors. >> > Indeed. I think my priors are pretty safe: 50% chance it's pulsed, > flat priors in phase and pulsed fraction. In the long run I might want > a slightly smarter prior on pulsed fraction, but for the moment I > think it's fine. > It is not about whether or not the prior are 'safe'. Rather it is the relative amount of information given. Also, from other areas, I tend to distrust anything that assumes a 50:50 split because these often lead to special results that do not always occur. So you think great, my code is working as everything looks fine. Then everything crashes when you deviate from that assumption because the 50% probability 'hides' something. An example in my area, the formula (for genetic effect) is of the form: alpha=2*p*q*(a + d(q-p)) where alpha, a and d are parameters and p and q are frequencies that add to one. We could mistakenly assume that a=alpha but that is only true if d=0 or when p=q=0.5. So we may start to get incorrect results when p is not equal to q and d is not zero. 
>> But as a final comment, it should not matter which approach you use as >> if you do not get what you simulated then either your code is wrong or >> you did not simulate what your code implements. (Surprising how >> frequent the latter is.) >> > This is a bit misleading. If I use a (fairly) small number of photons, > and/or a fairly small pulsed fraction, I should be astonished if I got > back the model parameters exactly. I know already that the data leave > a lot of room for slop, so what I am trying to test is how well this > Bayesian gizmo quantifies that slop. > > Anne > > You should not be astonished to get the prior as that is exactly what should happen if the data contains no information. For your simplistic case, I would implore you to generate the posterior distribution of the probability that the signal is pulsed (yes, easier to say than do). If you can do that then you will not only you get the modal value but you can compute the area under that distribution that is say less than 0.5 or whatever threshold you want to use. Bruce From josef.pktd at gmail.com Mon Nov 9 13:02:38 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 9 Nov 2009 13:02:38 -0500 Subject: [SciPy-User] Unit testing of Bayesian estimator In-Reply-To: <4AF84EF1.2090608@gmail.com> References: <4AF84EF1.2090608@gmail.com> Message-ID: <1cd32cbb0911091002s3bdb14cama1285a6596feda6a@mail.gmail.com> On Mon, Nov 9, 2009 at 12:18 PM, Bruce Southey wrote: > On 11/08/2009 01:47 AM, Anne Archibald wrote: >> 2009/11/7 Bruce Southey: >> >>> On Fri, Nov 6, 2009 at 6:13 PM, Anne Archibald >>> ?wrote: >>> >>>> Hi, >>>> >>>> I have implemented a simple Bayesian regression program (it takes >>>> events modulo one and returns a posterior probability that the data is >>>> phase-invariant plus a posterior distribution for two parameters >>>> (modulation fraction and phase) in case there is modulation). >>>> >>> I do not know your field, a little rusty on certain issues and I do >>> not consider myself a Bayesian. >>> >>> Exactly what type of Bayesian did you use? >>> I also do not know how you implemented it especially if it is >>> empirical or Monte Carlo Markov Chains. >>> >> It's an ultra-simple toy problem, really: I did the numerical >> integration in the absolute simplest way possible, by evaluating the >> quantity to be evaluated on a grid and averaging. See github for >> details: >> http://github.com/aarchiba/bayespf >> >> I can certainly improve on this, but I'd rather get my testing issues >> sorted out first, so that I can test the tests, as it were, on an >> implementation I'm reasonably confident is correct, before changing it >> to a mathematically more subtle one. >> > I do not know what you are trying to do with the code as it is not my > area. But you are using some empirical Bayesian estimator > (http://en.wikipedia.org/wiki/Empirical_Bayes_method) and thus you lose > much of the value of Bayesian as you are only dealing with modal > estimates. Really you should be obtaining the distribution of > "Probability the signal is pulsed" not just the modal estimate. > >>>> I'm >>>> rather new at this, so I'd like to construct some unit tests. Does >>>> anyone have any suggestions on how to go about this? >>>> >>> Since this is a test, the theoretical 'correctness' is irrelevant. So >>> I would guess that you should use very informative priors and data >>> with a huge amount of information. 
That should make the posterior have >>> an extremely narrow range so your modal estimate is very close to the >>> true value within a very small range. >>> >> This doesn't really test whether the estimator is doing a good job, >> since if I throw mountains of information at it, even a rather badly >> wrong implementation will eventually converge to the right answer. >> (This is painful experience speaking.) >> > Are you testing the code or the method? > My understanding of unit tests is that they test the code not the > method. Unit tests tell me that my code is working correctly but do not > necessary tell me if the method is right always. For example, if I need > to iterate to get a solution, my test could stop after 1 or 2 rounds > before convergence because I know that rest will be correct if the first > rounds are correct. > > Testing the algorithm is relatively easy because you just have to use > sensitivity analysis. Basically just use multiple data sets that vary in > the number of observations and parameters to see how well these work. > The hard part is making sense of the numbers. > > Also note that you have some explicit assumptions involved like the type > of prior distribution. These tend to limit what you can do because if > these assume a uniform prior then you can not use a non-uniform data > set. Well you can but unless the data dominates the prior you will most > likely get a weird answer. > >> I disagree on the issue of theoretical correctness, though. The best >> tests do exactly that: test the theoretical correctness of the routine >> in question, ideally without any reference to the implementation. To >> test the SVD, for example, you just test that the two matrices are >> both orthogonal, and you test that multiplying them together with the >> singular values between gives you your original matrix. If your >> implementation passes this test, it is computing the SVD just fine, no >> matter what it looks like inside. >> > I agree that code must provide theoretical correctness. But I disagree > that the code should always give the correct answer because the code is > only as good as the algorithm. > >> With the frequentist signal-detection statistics I'm more familiar >> with, I can write exactly this sort of test. I talk a little more >> about it here: >> http://lighthouseinthesky.blogspot.com/2009/11/testing-statistical-tests.html >> >> This works too well, it turns out, to apply to scipy's K-S test or my >> own Kuiper test, since their p-values are calculated rather >> approximately, so they fail. >> > Again, this is a failure of the algorithm not the code. Often > statistical tests rely on large sample sizes or rely on the central > limit theorem so these break down when the data is not well approximated > by the normal distribution and when the sample size is small. In the > blog, your usage of the chi-squared approximation is an example of this > - it will be inappropriate for small sample size as well as when the > true probability is very extreme (usually consider it valid between > about 0.2 and 0.8 obviously depending on sample size). > > >>> After that it really depends on the algorithm, the data used and what >>> you need to test. Basically you just have to say given this set of >>> inputs I get this 'result' that I consider reasonable. After all, if >>> the implementation of algorithm works then it is most likely the >>> inputs that are a problem. 
In statistics, problems usually enter >>> because the desired model can not be estimated from the provided data. >>> Separation of user errors from a bug in the code usually identified by >>> fitting simpler or alternative models. >>> >> It's exactly the implementation I don't trust, here. I can scrutinize >> the implementation all I like, but I'd really like an independent >> check on my calculations, and staring at the code won't get me that. >> > > I think that you mean is the algorithm and I do agree that looking at > the code will only tell you that you have implemented the algorithm and > will not tell you if the algorithm can be trusted. > >>>> For a frequentist periodicity detector, the return value is a >>>> probability that, given the null hypothesis is true, the statistic >>>> would be this extreme. So I can construct a strong unit test by >>>> generating a collection of data sets given the null hypothesis, >>>> evaluating the statistic, and seeing whether the number that claim to >>>> be significant at a 5% level is really 5%. (In fact I can use the >>>> binomial distribution to get limits on the number of false positive.) >>>> This gives me a unit test that is completely orthogonal to my >>>> implementation, and that passes if and only if the code works. For a >>>> Bayesian hypothesis testing setup, I don't really see how to do >>>> something analogous. >>>> >>>> I can generate non-modulated data sets and confirm that my code >>>> returns a high probability that the data is not modulated, but how >>>> high should I expect the probability to be? I can generate data sets >>>> with models with known parameters and check that the best-fit >>>> parameters are close to the known parameters - but how close? Even if >>>> I do it many times, is the posterior mean unbiased? What about the >>>> posterior mode or median? I can even generate models and then data >>>> sets that are drawn from the prior distribution, but what should I >>>> expect from the code output on such a data set? I feel sure there's >>>> some test that verifies a statistical property of Bayesian >>>> estimators/hypothesis testers, but I cant quite put my finger on it. >>>> >>>> Suggestions welcome. >>>> >>>> Thanks, >>>> Anne >>>> >>> Please do not mix Frequentist or Likelihood concepts with Bayesian. >>> Also you never generate data for estimation from the prior >>> distribution, you generate it from the posterior distribution as that >>> is what your estimating. >>> >> Um. I would be picking models from the prior distribution, not data. >> However I find the models, I have a well-defined way to generate data >> from the model. >> >> Why do you say it's a bad idea to mix Bayesian and frequentist >> approaches? It seems to me that as I use them to try to answer similar >> questions, it makes sense to compare them; and since I know how to >> test frequentist estimators, it's worth seeing whether I can cast >> Bayesian estimators in frequentist terms, at least for testing >> purposes. >> > > This is the fundamental difference between Bayesian and Frequentist > approaches. In Bayesian, the posterior provides everything that you know > about a parameter because it is a distribution. However, the modal > parameter estimates should agree between both approaches. > >>> Really in Bayesian sense all this data generation is unnecessary >>> because you have already calculated that information in computing the >>> posteriors. The posterior of a parameter is a distribution not a >>> single number so you just compare distributions. 
?For example, you can >>> compute modal values and construct Bayesian credible intervals of the >>> parameters. These should make very strong sense to the original values >>> simulated. >>> >> I take this to mean that I don't need to do simulations to get >> credible intervals (while I normally would have to to get confidence >> intervals), which I agree with. But this is a different question: I'm >> talking about constructing a test by simulating the whole Bayesian >> process and seeing whether it behaves as it should. The problem is >> coming up with a sufficiently clear mathematical definition of >> "should". >> > In Bayesian, you should have the posterior distribution of the parameter > which is far more than just the mode, mean and variance. So if the > posterior is normal, then I know what the mean and variance should be > and thus what the confidence interval should be. > > >>> For Bayesian work, you must address the data and the priors. In >>> particular, you need to be careful about the informativeness of the >>> prior. You can get great results just because your prior was >>> sufficiently informative but you can get great results because you >>> data was very informative. >>> >>> Depending on how it was implemented, a improper prior can be an issue >>> because these do not guarantee a proper posterior (but often do lead >>> to proper posteriors). So if your posterior is improper then you are >>> in a very bad situation and can lead to weird results some or all of >>> the time.Some times this is can easily be fixed such as by putting >>> bounds on flat priors. Whereas proper priors give proper posteriors. >>> >> Indeed. I think my priors are pretty safe: 50% chance it's pulsed, >> flat priors in phase and pulsed fraction. In the long run I might want >> a slightly smarter prior on pulsed fraction, but for the moment I >> think it's fine. >> > It is not about whether or not the prior are 'safe'. Rather it is the > relative amount of information given. > > Also, from other areas, I tend to distrust anything that assumes a 50:50 > split because these often lead to special results that do not always > occur. So you think great, my code is working as everything looks fine. > Then everything crashes when you deviate from that assumption because > the 50% probability 'hides' something. > An example in my area, the formula (for genetic effect) is of the form: > alpha=2*p*q*(a + d(q-p)) > where alpha, a and d are parameters and p and q are frequencies that add > to one. > We could mistakenly assume that a=alpha but that is only true if d=0 or > when p=q=0.5. So we may start to get incorrect results when p is not > equal to q and d is not zero. > > >>> But as a final comment, it should not matter which approach you use as >>> if you do not get what you simulated then either your code is wrong or >>> you did not simulate what your code implements. (Surprising how >>> frequent the latter is.) >>> >> This is a bit misleading. If I use a (fairly) small number of photons, >> and/or a fairly small pulsed fraction, I should be astonished if I got >> back the model parameters exactly. I know already that the data leave >> a lot of room for slop, so what I am trying to test is how well this >> Bayesian gizmo quantifies that slop. >> >> Anne >> >> > You should not be astonished to get the prior as that is exactly what > should happen if the data contains no information. 
> > For your simplistic case, I would implore you to generate the posterior > distribution of the probability that the signal is pulsed (yes, easier > to say than do). ?If you can do that then you will not only you get the > modal value but you can compute the area under that distribution that is > say less than 0.5 or whatever threshold you want to use. > > Bruce With the limitation of using only a single prior at the current stage, I think Anne has already implemented the standard Bayesian estimation including a nice graph of the joint posterior distribution of fraction and phase. Whether the signal is pulsed or not is just a binary variable, and the posterior belief is just the probability that it is pulsed. (I think that's correct, since there is no additional level where the "population" distribution of pulsing versus non-pulsing signals is a random variable.) I think it is very useful to look at the sampling distribution of a Bayesian estimator or test. For an individual Bayesian everything is summarized in the posterior distribution, but how well are a 1000 Bayesian econometricians/statisticians doing that look at a 1000 different stars, especially if I'm not sure they programmed their code correctly? At the end, it's only interesting if they have a lower MSE or a higher power in the test, otherwise I just use a different algorithm. I usually check and test the statistical properties of an algorithm and not just whether it is correctly implemented. And I think Anne is doing it in a similar way, checking whether the size of a statistical test is ok. Of course, for applications where only incorrectly sized (biased) tests are available, it is difficult to find a good tightness of the unit tests. Josef > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From peridot.faceted at gmail.com Mon Nov 9 13:06:15 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Mon, 9 Nov 2009 13:06:15 -0500 Subject: [SciPy-User] Unit testing of Bayesian estimator In-Reply-To: <4AF84EF1.2090608@gmail.com> References: <4AF84EF1.2090608@gmail.com> Message-ID: 2009/11/9 Bruce Southey : > I do not know what you are trying to do with the code as it is not my > area. But you are using some empirical Bayesian estimator > (http://en.wikipedia.org/wiki/Empirical_Bayes_method) and thus you lose > much of the value of Bayesian as you are only dealing with modal > estimates. Really you should be obtaining the distribution of > "Probability the signal is pulsed" not just the modal estimate. Um. Given a data set and a prior, I just do Bayesian hypothesis comparison. This gives me a single probability that the signal is pulsed. You seem to be imagining a probability distribution for this probability - but what would the independent variables be? The unpulsed distribution does not depend on any parameters, and I have integrated over all possible values for the pulsed distribution. So what I get should really be the probability, given the data, that the signal is pulsed. I'm not using an empirical Bayesian estimator; I'm doing the numerical integrations directly (and inefficiently). >> This doesn't really test whether the estimator is doing a good job, >> since if I throw mountains of information at it, even a rather badly >> wrong implementation will eventually converge to the right answer. >> (This is painful experience speaking.) >> > Are you testing the code or the method? 
> My understanding of unit tests is that they test the code not the > method. Unit tests tell me that my code is working correctly but do not > necessary tell me if the method is right always. For example, if I need > to iterate to get a solution, my test could stop after 1 or 2 rounds > before convergence because I know that rest will be correct if the first > rounds are correct. Unit tests can be used to do either. Since what I'm trying to do here is make sure I understand Bayesian inference, I'm most worried about the algorithm. > Testing the algorithm is relatively easy because you just have to use > sensitivity analysis. Basically just use multiple data sets that vary in > the number of observations and parameters to see how well these work. > The hard part is making sense of the numbers. It is exactly how to make sense of the numbers that I'm asking about. > Also note that you have some explicit assumptions involved like the type > of prior distribution. These tend to limit what you can do because if > these assume a uniform prior then you can not use a non-uniform data > set. Well you can but unless the data dominates the prior you will most > likely get a weird answer. I don't understand what you mean by a "non-uniform data set". Individual data sets are drawn from models, one of which is uniform. The priors define the distribution of models; the priors I use give a 50% chance the model is uniform and a 50% chance the model is pulsed. Anne From rpg.314 at gmail.com Mon Nov 9 13:11:29 2009 From: rpg.314 at gmail.com (Rohit Garg) Date: Mon, 9 Nov 2009 23:41:29 +0530 Subject: [SciPy-User] Distributed computing: running embarrassingly parallel (python/c++) codes over a cluster Message-ID: <4d5dd8c20911091011hd5d9265k75edfc78dfde2fe@mail.gmail.com> Hi all, I have an embarrassingly parallel problem, very nicely suited to parallelization. I am looking for community feedback on how to best approach this matter? Basically, I just setup a bunch of tasks, and the various cpu's will pull data, process it, and send it back. Out of order arrival of results is no problem. The processing times involved are so large that the communication is effectively free, and hence I don't care how fast/slow the communication is. I thought I'll ask in case somebody has done this stuff before to avoid reinventing the wheel. Any other suggestions are welcome too. My only constraint is that it should be able to run a python extension (c++) with minimum of fuss. I want to minimize the headaches involved with setting up/writing the boilerplate code. Which framework/approach/library would you recommend? There is one method mentioned at [1], and of course, one could resort to something like mpi4py. 
[1] http://docs.python.org/library/multiprocessing.html {see the last example} -- Rohit Garg http://rpg-314.blogspot.com/ Senior Undergraduate Department of Physics Indian Institute of Technology Bombay From peridot.faceted at gmail.com Mon Nov 9 13:14:36 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Mon, 9 Nov 2009 13:14:36 -0500 Subject: [SciPy-User] Unit testing of Bayesian estimator In-Reply-To: <1cd32cbb0911082107k347cbc13s2bc41efb34c4fba8@mail.gmail.com> References: <1cd32cbb0911061937q3ffe3182g143dd2af461c4aea@mail.gmail.com> <1cd32cbb0911081414u23d3ef8cgb408c2fbed65c098@mail.gmail.com> <1cd32cbb0911081835i34e8d404x3e3c2f75b96f9aa9@mail.gmail.com> <1cd32cbb0911082107k347cbc13s2bc41efb34c4fba8@mail.gmail.com> Message-ID: 2009/11/9 : > >From the posterior probability S/(S+1), you could construct > a decision rule similar to a classical test, e.g. accept null > if S/(S+1) < 0.95, and then construct a MonteCarlo > with samples drawn form either the uniform or the pulsed > distribution in the same way as for a classical test, and > verify that the decision mistakes, alpha and beta errors, in the > sample are close to the posterior probabilities. > The posterior probability would be similar to the p-value > in a classical test. If you want to balance alpha and > beta errors, a threshold S/(S+1)<0.5 would be more > appropriate, but for the unit tests it wouldn't matter. Unfortunately this doesn't work. Think of it this way: if my data size is 10000 photons, and I'm looking at the fraction of uniformly-distributed data sets that have a probability > 0.95 that they are pulsed, this won't happen with 5% of my fake data sets - it will almost never happen, since 10000 photons are enough to give a very solid answer (experiment confirms this). So I can't interpret my Bayesian probability as a frequentist probability of alpha error. > Running the example a few times, it looks like that the power > is relatively low for distinguishing uniform distribution from > a pulsed distribution with fraction/binomial parameter 0.05 > and sample size <1000. > If you have strong beliefs that the fraction is really this low > than an informative prior for the fraction, might improve the > results. I really don't want to encourage my code to return reports of pulsations. To be believed in this nest of frequentists I work with, I need a solid detection in spite of very conservative priors. Anne From eadrogue at gmx.net Mon Nov 9 13:16:51 2009 From: eadrogue at gmx.net (Ernest =?iso-8859-1?Q?Adrogu=E9?=) Date: Mon, 9 Nov 2009 19:16:51 +0100 Subject: [SciPy-User] the skellam distribution In-Reply-To: <1cd32cbb0911090640q1f932141m834e26602c053018@mail.gmail.com> References: <20091108151625.GA561@doriath.local> <1cd32cbb0911090640q1f932141m834e26602c053018@mail.gmail.com> Message-ID: <20091109181650.GA5957@doriath.local> 9/11/09 @ 09:40 (-0500), thus spake josef.pktd at gmail.com: > Thanks, I think the distribution of the difference of two poisson > distributed random variables could be useful. > > Would you please open an enhancement ticket for this at > http://projects.scipy.org/scipy/report/1 Okay, I will. > I had only a brief look at it so far, I had never looked at the > Skellam distribution before, and just read a few references. > > The "if x < 0 .. else ..." will have to be replace with a > "numpy.where" assignment, since the methods are supposed to work with > arrays of x (as far as I remember) Done. 
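As an aside, the scalar-versus-array point behind that numpy.where change can be illustrated with a toy function (f_scalar and f_array are hypothetical names, not part of the Skellam code):

import numpy

def f_scalar(x):
    # works only if x is a scalar: "if x < 0" raises an error for arrays
    if x < 0:
        return -x
    else:
        return x

def f_array(x):
    # vectorised equivalent: both branches are evaluated,
    # then numpy.where selects elementwise
    x = numpy.asarray(x)
    return numpy.where(x < 0, -x, x)

print(f_array([-2, -1, 0, 3]))   # -> [2 1 0 3]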
> _rvs could be implemented directly instead of generically (I don't > find the reference, where I saw it, right now). I suppose it could be done, but I can't figure out how. As far as I understand, you can't derive the poisson parameters from the skellam parameters alone, you need the correlation coefficient (between the two poisson rv) too, is that right? > Documentation will be necessary, a brief description in the > (currently) extradocs, and a listing of the properties for the > description of the distributions currently in the stats tutorial. > > I have some background questions, which address the limitation of the > implementation (but are not really necessary for inclusion into > scipy). > > The description in R mentions several implementation of Skellam. Do > you have a rough idea what the range of parameters are for which the > implementation using ncx produces good results? Do you know if any > other special functions would produce good results over a larger > range, e.g. using Bessel function? No, sorry, I have no idea. The R paper says that in R the Bessel function is more accurate, but I don't think this means that the Bessel function implementation is more accurate in general. I compared results using the Bessel function in scipy and using the ncx2 implementation, and the differences were minimal, although I admit my testing wasn't extensive. Another problem is that I haven't got a table of values for this particular distribution, so in case results differ I can't tell which one is more accurate. > Wikipedia, http://en.wikipedia.org/wiki/Skellam_distribution , also > mentions (but doesn't describe) the case of Skellam distribution with > correlated Poisson distributions. Do you know what the difference to > your implementation would be? As far as I know, it makes no difference if the two Poisson variates are correlated or not. The Skellam parameters are defined as mu1 = lam1 - rho * sqrt(lam1*lam2) mu2 = lam2 - rho * sqrt(lam1*lam2) where lam1 and lam2 are the Poisson means and rho is the correlation coefficient. So, in case there is correlation it is implicit in the parameters mu1 and mu2, and it doesn't make any difference in terms of calculating the pmf or cdf. Again, I am no statician or mathematician. > Tests for a new distribution will be picked up by the generic tests, > but it would be useful to have some extra tests for extreme/uncommon > parameter ranges. Do you have any comparisons with R, since you > already looked it? I have this visual test, although it's usefulness is questionable. It shows the deviation between observed and expected frequencies for a Skellam random variable. Ideally errors should be random and centered around 0. To me it looks like errors tend to increase around the mean and are smaller along the tails, but I don't know if this means something or not. 
import numpy
import scipy.stats.distributions

poisson = scipy.stats.distributions.poisson
ncx2 = scipy.stats.distributions.ncx2

# Skellam distribution

class skellam_gen(scipy.stats.distributions.rv_discrete):
    def _pmf(self, x, mu1, mu2):
        px = numpy.where(x < 0,
                         ncx2.pdf(2*mu2, 2*(1-x), 2*mu1)*2,
                         ncx2.pdf(2*mu1, 2*(x+1), 2*mu2)*2)
        return px
    def _cdf(self, x, mu1, mu2):
        x = numpy.floor(x)
        px = numpy.where(x < 0,
                         ncx2.cdf(2*mu2, -2*x, 2*mu1),
                         1-ncx2.cdf(2*mu1, 2*(x+1), 2*mu2))
        return px
    def _stats(self, mu1, mu2):
        mean = mu1 - mu2
        var = mu1 + mu2
        g1 = (mean) / numpy.sqrt((var)**3)
        g2 = 1 / var
        return mean, var, g1, g2

skellam = skellam_gen(a=-numpy.inf, name="skellam", longname='A Skellam',
                      shapes="mu1,mu2", extradoc="")

if __name__ == '__main__':

    lam1 = 3.4
    lam2 = 6.1
    n = 5000

    poisson_var1 = numpy.random.poisson(lam1, n)
    poisson_var2 = numpy.random.poisson(lam2, n)
    skellam_var = poisson_var1-poisson_var2

    low = min(skellam_var)
    high = max(skellam_var)
    obs_freq = numpy.histogram(
        skellam_var, numpy.arange(low, high+2))[0] / float(n)

    rho = numpy.corrcoef(poisson_var1, poisson_var2)[1,0]
    mu1 = lam1 - rho * numpy.sqrt(lam1*lam2)
    mu2 = lam2 - rho * numpy.sqrt(lam1*lam2)
    exp_freq = skellam.pmf(numpy.arange(low, high+1), mu1, mu2)
    print obs_freq-exp_freq

    # plot
    import matplotlib.pyplot as plt
    plt.figure().add_subplot(1,1,1).plot(obs_freq-exp_freq)
    plt.show()

Regards. -- Ernest From gael.varoquaux at normalesup.org Mon Nov 9 13:17:13 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Mon, 9 Nov 2009 19:17:13 +0100 Subject: [SciPy-User] Distributed computing: running embarrassingly parallel (python/c++) codes over a cluster In-Reply-To: <4d5dd8c20911091011hd5d9265k75edfc78dfde2fe@mail.gmail.com> References: <4d5dd8c20911091011hd5d9265k75edfc78dfde2fe@mail.gmail.com> Message-ID: <20091109181713.GF28468@phare.normalesup.org> On Mon, Nov 09, 2009 at 11:41:29PM +0530, Rohit Garg wrote: > Hi all, > I have an embarrassingly parallel problem, very nicely suited to > parallelization. A non-optimal solution that I like: http://gael-varoquaux.info/blog/?p=119 Gaël From peridot.faceted at gmail.com Mon Nov 9 13:18:46 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Mon, 9 Nov 2009 13:18:46 -0500 Subject: [SciPy-User] Distributed computing: running embarrassingly parallel (python/c++) codes over a cluster In-Reply-To: <4d5dd8c20911091011hd5d9265k75edfc78dfde2fe@mail.gmail.com> References: <4d5dd8c20911091011hd5d9265k75edfc78dfde2fe@mail.gmail.com> Message-ID: 2009/11/9 Rohit Garg : > Hi all, > > I have an embarrassingly parallel problem, very nicely suited to > parallelization. I am looking for community feedback on how to best > approach this matter? Basically, I just setup a bunch of tasks, and > the various cpu's will pull data, process it, and send it back. Out of > order arrival of results is no problem. The processing times involved > are so large that the communication is effectively free, and hence I > don't care how fast/slow the communication is. I thought I'll ask in > case somebody has done this stuff before to avoid reinventing the > wheel. Any other suggestions are welcome too. > > My only constraint is that it should be able to run a python extension > (c++) with minimum of fuss. I want to minimize the headaches involved > with setting up/writing the boilerplate code. Which > framework/approach/library would you recommend? For our pulsar searches, we pick about the simplest possible method.
Each job is set up so that you run it from a UNIX shell in a directory containing all the needed files, and it saves any output to a common directory. We then submit jobs to the PBS batch system. We have some minor complications to this setup because copying the input data is quite network-intensive, so we make sure only one job starts at a time, but other than that the jobs have no interaction at all. Anne > There is one method mentioned at [1], and of course, one could resort > to something like mpi4py. > > [1] http://docs.python.org/library/multiprocessing.html {see the last example} > > -- > Rohit Garg > > http://rpg-314.blogspot.com/ > > Senior Undergraduate > Department of Physics > Indian Institute of Technology > Bombay > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From eadrogue at gmx.net Mon Nov 9 13:20:59 2009 From: eadrogue at gmx.net (Ernest =?iso-8859-1?Q?Adrogu=E9?=) Date: Mon, 9 Nov 2009 19:20:59 +0100 Subject: [SciPy-User] the skellam distribution In-Reply-To: <4AF83CE8.7080507@lpta.in2p3.fr> References: <20091108151625.GA561@doriath.local> <1cd32cbb0911090640q1f932141m834e26602c053018@mail.gmail.com> <4AF83AFD.60304@gmail.com> <4AF83CE8.7080507@lpta.in2p3.fr> Message-ID: <20091109182059.GB5957@doriath.local> 9/11/09 @ 17:01 (+0100), thus spake Johann Cohen-Tanugi: > From what I understand of the initial statement from Ernest: > " > In case somebody is interested, or you want to include it > in scipy. I used these specs here from the R package: > cran.r-project.org/web/packages/skellam/skellam.pdf > " > > he used the spec, as defined in this pdf, and did not look at the code > itself. If my interpretation of the small preamble above is correct, I > believe his implementation is not GPL-tainted, right? Your interpretation of my preamble is correct. I didn't look at the source code, I only read the PDF doc where they explain in plain English how the pmf and cdf are calculated. -- Ernest From pav+sp at iki.fi Mon Nov 9 13:28:39 2009 From: pav+sp at iki.fi (Pauli Virtanen) Date: Mon, 9 Nov 2009 18:28:39 +0000 (UTC) Subject: [SciPy-User] Distributed computing: running embarrassingly parallel (python/c++) codes over a cluster References: <4d5dd8c20911091011hd5d9265k75edfc78dfde2fe@mail.gmail.com> Message-ID: Mon, 09 Nov 2009 23:41:29 +0530, Rohit Garg wrote: [clip: embarassingly parallel problems] With multiprocessing, using Pool.imap_unordered to apply a computation function to a list of parameter sets is one good alternative. (IIRC, it balances load between subprocesses &c automatically.) Multiprocessing can however work on only one node at a time. With mpi4py, it's probably best to write a simple master-slave architecture. -- Pauli Virtanen From rpg.314 at gmail.com Mon Nov 9 13:28:55 2009 From: rpg.314 at gmail.com (Rohit Garg) Date: Mon, 9 Nov 2009 23:58:55 +0530 Subject: [SciPy-User] Distributed computing: running embarrassingly parallel (python/c++) codes over a cluster In-Reply-To: <20091109181713.GF28468@phare.normalesup.org> References: <4d5dd8c20911091011hd5d9265k75edfc78dfde2fe@mail.gmail.com> <20091109181713.GF28468@phare.normalesup.org> Message-ID: <4d5dd8c20911091028v3d98242dmd6d2cb5f11741ff7@mail.gmail.com> On Mon, Nov 9, 2009 at 11:47 PM, Gael Varoquaux wrote: > On Mon, Nov 09, 2009 at 11:41:29PM +0530, Rohit Garg wrote: >> Hi all, > >> I have an embarrassingly parallel problem, very nicely suited to >> parallelization. 
> > A non-optimal solution that I like: > http://gael-varoquaux.info/blog/?p=119 Thanks, for the pointer, but after a quick read, it doesn't look like it supports distributed memory parallelism. Or does it? > > Ga?l > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Rohit Garg http://rpg-314.blogspot.com/ Senior Undergraduate Department of Physics Indian Institute of Technology Bombay From gael.varoquaux at normalesup.org Mon Nov 9 13:35:15 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Mon, 9 Nov 2009 19:35:15 +0100 Subject: [SciPy-User] Distributed computing: running embarrassingly parallel (python/c++) codes over a cluster In-Reply-To: <4d5dd8c20911091028v3d98242dmd6d2cb5f11741ff7@mail.gmail.com> References: <4d5dd8c20911091011hd5d9265k75edfc78dfde2fe@mail.gmail.com> <20091109181713.GF28468@phare.normalesup.org> <4d5dd8c20911091028v3d98242dmd6d2cb5f11741ff7@mail.gmail.com> Message-ID: <20091109183515.GG28468@phare.normalesup.org> On Mon, Nov 09, 2009 at 11:58:55PM +0530, Rohit Garg wrote: > On Mon, Nov 9, 2009 at 11:47 PM, Gael Varoquaux > wrote: > > On Mon, Nov 09, 2009 at 11:41:29PM +0530, Rohit Garg wrote: > >> Hi all, > >> I have an embarrassingly parallel problem, very nicely suited to > >> parallelization. > > A non-optimal solution that I like: > > http://gael-varoquaux.info/blog/?p=119 > Thanks, for the pointer, but after a quick read, it doesn't look like > it supports distributed memory parallelism. Or does it? If by distributed memory you mean shared memory, you won't get this, but the copy on write of Unix gives you part of it, but not all of it. One hack is to use memmapping to a file to share memory between processes (it won't cost you IO, because your OS will be smart-enough to cache everything). The right way to do it is to use a shared memory array, which Sturla and I started working on ages ago, but never found time to integrate to numpy. If you mean parallelism on architectures where 'fork' won't distributes the processes (like a cluster), than multiprocessing won't do the trick, and you will need to look at IPython or parallel Python. Ga?l From josef.pktd at gmail.com Mon Nov 9 13:44:50 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 9 Nov 2009 13:44:50 -0500 Subject: [SciPy-User] Unit testing of Bayesian estimator In-Reply-To: References: <1cd32cbb0911061937q3ffe3182g143dd2af461c4aea@mail.gmail.com> <1cd32cbb0911081414u23d3ef8cgb408c2fbed65c098@mail.gmail.com> <1cd32cbb0911081835i34e8d404x3e3c2f75b96f9aa9@mail.gmail.com> <1cd32cbb0911082107k347cbc13s2bc41efb34c4fba8@mail.gmail.com> Message-ID: <1cd32cbb0911091044h28e9a5a7x624be502df05629c@mail.gmail.com> On Mon, Nov 9, 2009 at 1:14 PM, Anne Archibald wrote: > 2009/11/9 ?: > >> >From the posterior probability S/(S+1), you could construct >> a decision rule similar to a classical test, e.g. accept null >> if S/(S+1) < 0.95, and then construct a MonteCarlo >> with samples drawn form either the uniform or the pulsed >> distribution in the same way as for a classical test, and >> verify that the decision mistakes, alpha and beta errors, in the >> sample are close to the posterior probabilities. >> The posterior probability would be similar to the p-value >> in a classical test. If you want to balance alpha and >> beta errors, a threshold S/(S+1)<0.5 would be more >> appropriate, but for the unit tests it wouldn't matter. > > Unfortunately this doesn't work. 
Think of it this way: if my data size > is 10000 photons, and I'm looking at the fraction of > uniformly-distributed data sets that have a probability > 0.95 that > they are pulsed, this won't happen with 5% of my fake data sets - it > will almost never happen, since 10000 photons are enough to give a > very solid answer (experiment confirms this). So I can't interpret my > Bayesian probability as a frequentist probability of alpha error. Doesn't this mean that the Bayesian posterior doesn't have the correct tail probabilities? If my posterior beliefs are that the probability that I make a mistake is 5% and I have the correct model, but the real probability that I make a mistake is only 0.1%, then my updating should have correctly taken into account that the signal is so informative and tightened my posterior distribution. With 8000 in your example, I get Probability the signal is pulsed: 0.999960 This makes it pretty obvious if the signal is pulsed or not. Do the tail probabilities work better for cases that are not so easy to distinguish? Josef > >> Running the example a few times, it looks like that the power >> is relatively low for distinguishing uniform distribution from >> a pulsed distribution with fraction/binomial parameter 0.05 >> and sample size <1000. >> If you have strong beliefs that the fraction is really this low >> than an informative prior for the fraction, might improve the >> results. > > I really don't want to encourage my code to return reports of > pulsations. To be believed in this nest of frequentists I work with, I > need a solid detection in spite of very conservative priors. When you have everything working, then you could check the sensitivity to the priors. For parameter estimation, I found it interesting to see which parameters change a lot when I varied the prior variance, and it helps in the defense against frequentists. Josef > > Anne > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From coughlan at ski.org Mon Nov 9 13:18:03 2009 From: coughlan at ski.org (James Coughlan) Date: Mon, 09 Nov 2009 10:18:03 -0800 Subject: [SciPy-User] Distributed computing: running embarrassingly parallel (python/c++) codes over a cluster In-Reply-To: <4d5dd8c20911091011hd5d9265k75edfc78dfde2fe@mail.gmail.com> References: <4d5dd8c20911091011hd5d9265k75edfc78dfde2fe@mail.gmail.com> Message-ID: <4AF85CDB.5010908@ski.org> Rohit Garg wrote: > Hi all, > > I have an embarrassingly parallel problem, very nicely suited to > parallelization. I am looking for community feedback on how to best > approach this matter? Basically, I just setup a bunch of tasks, and > the various cpu's will pull data, process it, and send it back. Out of > order arrival of results is no problem. The processing times involved > are so large that the communication is effectively free, and hence I > don't care how fast/slow the communication is. I thought I'll ask in > case somebody has done this stuff before to avoid reinventing the > wheel. Any other suggestions are welcome too. > > My only constraint is that it should be able to run a python extension > (c++) with minimum of fuss. I want to minimize the headaches involved > with setting up/writing the boilerplate code. Which > framework/approach/library would you recommend? > > There is one method mentioned at [1], and of course, one could resort > to something like mpi4py. 
> > [1] http://docs.python.org/library/multiprocessing.html {see the last example} > > Hi, I've never done any parallel processing, but you might consider Shedskin, a Python to C++ compiler, which makes it easy to convert Python functions into fast C++ modules, and offers support for parallel processing: http://code.google.com/p/shedskin/ Best, James -- ------------------------------------------------------- James Coughlan, Ph.D., Scientist The Smith-Kettlewell Eye Research Institute Email: coughlan at ski.org URL: http://www.ski.org/Rehab/Coughlan_lab/ Phone: 415-345-2146 Fax: 415-345-8455 ------------------------------------------------------- From bsouthey at gmail.com Mon Nov 9 13:49:34 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 09 Nov 2009 12:49:34 -0600 Subject: [SciPy-User] the skellam distribution In-Reply-To: <20091109182059.GB5957@doriath.local> References: <20091108151625.GA561@doriath.local> <1cd32cbb0911090640q1f932141m834e26602c053018@mail.gmail.com> <4AF83AFD.60304@gmail.com> <4AF83CE8.7080507@lpta.in2p3.fr> <20091109182059.GB5957@doriath.local> Message-ID: <4AF8643E.3020802@gmail.com> On 11/09/2009 12:20 PM, Ernest Adrogu? wrote: > 9/11/09 @ 17:01 (+0100), thus spake Johann Cohen-Tanugi: > >> From what I understand of the initial statement from Ernest: >> " >> In case somebody is interested, or you want to include it >> in scipy. I used these specs here from the R package: >> cran.r-project.org/web/packages/skellam/skellam.pdf >> " >> >> he used the spec, as defined in this pdf, and did not look at the code >> itself. If my interpretation of the small preamble above is correct, I >> believe his implementation is not GPL-tainted, right? >> > Your interpretation of my preamble is correct. > I didn't look at the source code, I only read the PDF doc > where they explain in plain English how the pmf and cdf are > calculated. > > Okay, Then just provide a suitable reference outside the R package where someone can derive it independently or get more details. This also allows people to go back and check the implementation when things are not as expected. Perhaps the reference that Josef provided provides the same formulation? Bruce From josef.pktd at gmail.com Mon Nov 9 14:18:33 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 9 Nov 2009 14:18:33 -0500 Subject: [SciPy-User] the skellam distribution In-Reply-To: <4AF8643E.3020802@gmail.com> References: <20091108151625.GA561@doriath.local> <1cd32cbb0911090640q1f932141m834e26602c053018@mail.gmail.com> <4AF83AFD.60304@gmail.com> <4AF83CE8.7080507@lpta.in2p3.fr> <20091109182059.GB5957@doriath.local> <4AF8643E.3020802@gmail.com> Message-ID: <1cd32cbb0911091118i588800dat2d6707690ea8f868@mail.gmail.com> On Mon, Nov 9, 2009 at 1:49 PM, Bruce Southey wrote: > On 11/09/2009 12:20 PM, Ernest Adrogu? wrote: >> ? 9/11/09 @ 17:01 (+0100), thus spake Johann Cohen-Tanugi: >> >>> ? ?From what I understand of the initial statement from Ernest: >>> " >>> In case somebody is interested, or you want to include it >>> in scipy. I used these specs here from the R package: >>> cran.r-project.org/web/packages/skellam/skellam.pdf >>> " >>> >>> he used the spec, as defined in this pdf, and did not look at the code >>> itself. If my interpretation of the small preamble above is correct, I >>> believe his implementation is not GPL-tainted, right? >>> >> Your interpretation of my preamble is correct. 
>> I didn't look at the source code, I only read the PDF doc >> where they explain in plain English how the pmf and cdf are >> calculated. >> >> > Okay, > Then just provide a suitable reference outside the R package where > someone can derive it independently or get more details. This also > allows people to go back and check the implementation when things are > not as expected. Perhaps the reference that Josef provided provides the > same formulation? The reference below is there if someone wants to check the relationship between the chisquare and the difference of two independent Poissons, but that's an implementation detail and not really informative in a docstring. On an Extension of the Connexion Between Poisson and χ2 Distributions Author(s): N. L. Johnson Source: Biometrika, Vol. 46, No. 3/4 (Dec., 1959), pp. 352-363 Josef > > Bruce > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Mon Nov 9 14:36:29 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 9 Nov 2009 14:36:29 -0500 Subject: [SciPy-User] the skellam distribution In-Reply-To: <20091109181650.GA5957@doriath.local> References: <20091108151625.GA561@doriath.local> <1cd32cbb0911090640q1f932141m834e26602c053018@mail.gmail.com> <20091109181650.GA5957@doriath.local> Message-ID: <1cd32cbb0911091136m26c9dd37r229051142c43c63d@mail.gmail.com> 2009/11/9 Ernest Adrogué : > 9/11/09 @ 09:40 (-0500), thus spake josef.pktd at gmail.com: >> Thanks, I think the distribution of the difference of two poisson >> distributed random variables could be useful. >> >> Would you please open an enhancement ticket for this at >> http://projects.scipy.org/scipy/report/1 > > Okay, I will. > >> I had only a brief look at it so far, I had never looked at the >> Skellam distribution before, and just read a few references. >> >> The "if x < 0 .. else ..." will have to be replaced with a >> "numpy.where" assignment, since the methods are supposed to work with >> arrays of x (as far as I remember) > > Done. > >> _rvs could be implemented directly instead of generically (I don't >> find the reference, where I saw it, right now). > > I suppose it could be done, but I can't figure out how. > As far as I understand, you can't derive the poisson parameters > from the skellam parameters alone, you need the correlation > coefficient (between the two poisson rv) too, is that right? It can be generated as the difference between two independent Poissons (see R docs). Correlation only matters for the interpretation, as you explain below and I found in a reference.
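A sketch of what a direct _rvs along those lines could look like (the standalone function is only for illustration, and the self._size attribute in the comment is an assumption about how rv_discrete passes the requested output size to _rvs):

import numpy

def skellam_rvs(mu1, mu2, size=1):
    # difference of two independent Poisson draws
    return (numpy.random.poisson(mu1, size) -
            numpy.random.poisson(mu2, size))

# inside the skellam_gen class this would become something like
#    def _rvs(self, mu1, mu2):
#        return (numpy.random.poisson(mu1, self._size) -
#                numpy.random.poisson(mu2, self._size))

if __name__ == '__main__':
    x = skellam_rvs(3.4, 6.1, size=100000)
    # sample mean and variance should be close to mu1 - mu2 and mu1 + mu2
    print("mean %.3f  var %.3f" % (x.mean(), x.var()))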
> I compared results using the Bessel function in scipy and using > the ncx2 implementation, and the differences were minimal, although > I admit my testing wasn't extensive. Another problem is that > I haven't got a table of values for this particular distribution, > so in case results differ I can't tell which one is more accurate I will look at it, but small numerical inaccuracies, 1e-??, can also be fixed later, if someone finds a better implementation., > >> Wikipedia, http://en.wikipedia.org/wiki/Skellam_distribution , also >> mentions (but doesn't describe) the case of Skellam distribution with >> correlated Poisson distributions. Do you know what the difference to >> your implementation would be? > > As far as I know, it makes no difference if the two Poisson > variates are correlated or not. The Skellam parameters are > defined as > > mu1 = lam1 - rho * sqrt(lam1*lam2) > mu2 = lam2 - rho * sqrt(lam1*lam2) > > where lam1 and lam2 are the Poisson means and rho is the > correlation coefficient. So, in case there is correlation it > is implicit in the parameters mu1 and mu2, and it doesn't > make any difference in terms of calculating the pmf or cdf. > Again, I am no statician or mathematician. I agree, that's what I have seen in the references. > >> Tests for a new distribution will be picked up by the generic tests, >> but it would be useful to have some extra tests for extreme/uncommon >> parameter ranges. Do you have any comparisons with R, since you >> already looked it? > > I have this visual test, although it's usefulness is > questionable. It shows the deviation between observed and > expected frequencies for a Skellam random variable. Ideally > errors should be random and centered around 0. To me > it looks like errors tend to increase around the mean > and are smaller along the tails, but I don't know if this > means something or not. > > import numpy > import scipy.stats.distributions > > poisson = scipy.stats.distributions.poisson > ncx2 = scipy.stats.distributions.ncx2 > > # Skellam distribution > > class skellam_gen(scipy.stats.distributions.rv_discrete): > ? ?def _pmf(self, x, mu1, mu2): > ? ? ? ?px = numpy.where(x < 0, ncx2.pdf(2*mu2, 2*(1-x), 2*mu1)*2, > ? ? ? ? ? ? ? ? ? ? ? ? ncx2.pdf(2*mu1, 2*(x+1), 2*mu2)*2) > ? ? ? ?return px > ? ?def _cdf(self, x, mu1, mu2): > ? ? ? ?x = numpy.floor(x) > ? ? ? ?px = numpy.where(x < 0, ncx2.cdf(2*mu2, -2*x, 2*mu1), > ? ? ? ? ? ? ? ? ? ? ? ? 1-ncx2.cdf(2*mu1, 2*(x+1), 2*mu2)) > ? ? ? ?return px > ? ?def _stats(self, mu1, mu2): > ? ? ? ?mean = mu1 - mu2 > ? ? ? ?var = mu1 + mu2 > ? ? ? ?g1 = (mean) / numpy.sqrt((var)**3) > ? ? ? ?g2 = 1 / var > ? ? ? ?return mean, var, g1, g2 > skellam = skellam_gen(a=-numpy.inf, name="skellam", longname='A Skellam', > ? ? ? ? ? ? ? ? ? ? ?shapes="mu1,mu2", extradoc="") > > if __name__ == '__main__': > > ? ?lam1 = 3.4 > ? ?lam2 = 6.1 > ? ?n = 5000 > > ? ?poisson_var1 = numpy.random.poisson(lam1, n) > ? ?poisson_var2 = numpy.random.poisson(lam2, n) > ? ?skellam_var = poisson_var1-poisson_var2 > > ? ?low = min(skellam_var) > ? ?high = max(skellam_var) > ? ?obs_freq = numpy.histogram( > ? ? ? ?skellam_var, numpy.arange(low, high+2))[0] / float(n) > > ? ?rho = numpy.corrcoef(poisson_var1, poisson_var2)[1,0] > ? ?mu1 = lam1 - rho * numpy.sqrt(lam1*lam2) > ? ?mu2 = lam2 - rho * numpy.sqrt(lam1*lam2) > ? ?exp_freq = skellam.pmf(numpy.arange(low, high+1), mu1, mu2) > ? ?print obs_freq-exp_freq > > ? ?# plot > ? ?import matplotlib.pyplot as plt > ? 
?plt.figure().add_subplot(1,1,1).plot(obs_freq-exp_freq) > ? ?plt.show() I'm not sure the example is correct. You are simulating two independent poisson variables, so the difference skellam_var should be distributed as skellam with mu1 = lam1 mu2 = lam2 and theoretical rho should be zero. Or am I missing something? thanks, the numpy.where looks good, I still have to actually run the examples. Josef > > > Regards. > -- > Ernest > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From bsouthey at gmail.com Mon Nov 9 14:47:10 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 09 Nov 2009 13:47:10 -0600 Subject: [SciPy-User] Unit testing of Bayesian estimator In-Reply-To: References: <4AF84EF1.2090608@gmail.com> Message-ID: <4AF871BE.6050300@gmail.com> On 11/09/2009 12:06 PM, Anne Archibald wrote: > 2009/11/9 Bruce Southey: > > >> I do not know what you are trying to do with the code as it is not my >> area. But you are using some empirical Bayesian estimator >> (http://en.wikipedia.org/wiki/Empirical_Bayes_method) and thus you lose >> much of the value of Bayesian as you are only dealing with modal >> estimates. Really you should be obtaining the distribution of >> "Probability the signal is pulsed" not just the modal estimate. >> > Um. Given a data set and a prior, I just do Bayesian hypothesis > comparison. This gives me a single probability that the signal is > pulsed. You seem to be imagining a probability distribution for this > probability - but what would the independent variables be? The > unpulsed distribution does not depend on any parameters, and I have > integrated over all possible values for the pulsed distribution. So > what I get should really be the probability, given the data, that the > signal is pulsed. I'm not using an empirical Bayesian estimator; I'm > doing the numerical integrations directly (and inefficiently). > Here are two links on what I mean with reference to the binomial case: http://lingpipe-blog.com/2009/09/11/batting-averages-bayesian-vs-mle-estimate/ TEACHING OF BAYESIAN ESTIMATION OF ?P? PROBABILITY IN A BERNOULLI PROCESS: http://www.stat.auckland.ac.nz/~iase/publications/17/C439.pdf I do not know your area but you should be able to do something similar. Bruce From david_baddeley at yahoo.com.au Mon Nov 9 15:56:31 2009 From: david_baddeley at yahoo.com.au (David Baddeley) Date: Mon, 9 Nov 2009 12:56:31 -0800 (PST) Subject: [SciPy-User] Distributed computing: running embarrassingly parallel (python/c++) codes over a cluster In-Reply-To: <4d5dd8c20911091011hd5d9265k75edfc78dfde2fe@mail.gmail.com> Message-ID: <92881.60847.qm@web33004.mail.mud.yahoo.com> Hi Rohit, I've had a lot of sucess using PYRO (pyro.sourceforge.net) to distribute tasks across a cluster. Pyro's a remote objects implementation for python and makes inter-process communication really easy. The disadvantage of this approach is that you've got to write your own server to distribute the tasks, but this is almost trivial (mine's a class with getTask and postTask methods, and with the tasks stored internally in a list, and which is made remotely accessible using pyro). The advantage is that it seems to work well on any platform I've tried it on, and that it's really easy to add things like a timeout on tasks so that they can be reassigned if one of the workers falls over or is killed (I've had workers running as a windows screensaver). 
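In case it helps to see how little is involved, the server logic boils down to something like the sketch below. This is typed from memory rather than my actual code, and the class and attribute names (TaskQueue, todo, inProgress) are just made up for the example - the real object is simply registered with Pyro and the workers call getTask/postTask on a proxy to it:

import time

class TaskQueue(object):
    """Hands out tasks to workers and collects the results.
    (In the real code an instance of this is exposed over Pyro.)"""

    def __init__(self, tasks, timeout=3600.0):
        # 'tasks' can be any picklable descriptions of the work to be done
        self.todo = [(i, t) for i, t in enumerate(tasks)]
        self.inProgress = {}   # taskID -> (task, time it was handed out)
        self.results = {}      # taskID -> result
        self.timeout = timeout

    def getTask(self):
        now = time.time()
        # re-queue anything whose worker seems to have died or been killed
        for tid, (task, t0) in list(self.inProgress.items()):
            if now - t0 > self.timeout:
                self.todo.append((tid, task))
                del self.inProgress[tid]
        if not self.todo:
            return None
        tid, task = self.todo.pop(0)
        self.inProgress[tid] = (task, now)
        return tid, task

    def postTask(self, tid, result):
        # a late duplicate from a re-assigned task just overwrites the result
        self.results[tid] = result
        self.inProgress.pop(tid, None)

A worker is then just a loop that calls getTask(), does the number crunching, and hands the result back with postTask().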
My tasks use a mixture of python and c, although no communication takes place in the c code. I took this route before I was aware of multiprocessing / the parallel components of ipython etc... and the communications overhead when using PYRO is relatively high so these other options would definitely be worth looking into. I can post the code for a minimal task server/client if you like. best wishes, David --- On Tue, 10/11/09, Rohit Garg wrote: > From: Rohit Garg > Subject: [SciPy-User] Distributed computing: running embarrassingly parallel (python/c++) codes over a cluster > To: "SciPy Users List" , numpy-discussions at scipy.org > Received: Tuesday, 10 November, 2009, 7:11 AM > Hi all, > > I have an embarrassingly parallel problem, very nicely > suited to > parallelization. I am looking for community feedback on how > to best > approach this matter? Basically, I just setup a bunch of > tasks, and > the various cpu's will pull data, process it, and send it > back. Out of > order arrival of results is no problem. The processing > times involved > are so large that the communication is effectively free, > and hence I > don't care how fast/slow the communication is. I thought > I'll ask in > case somebody has done this stuff before to avoid > reinventing the > wheel. Any other suggestions are welcome too. > > My only constraint is that it should be able to run a > python extension > (c++) with minimum of fuss. I want to minimize the > headaches involved > with setting up/writing the boilerplate code. Which > framework/approach/library would you recommend? > > There is one method mentioned at [1], and of course, one > could resort > to something like mpi4py. > > [1] http://docs.python.org/library/multiprocessing.html???{see > the last example} > > -- > Rohit Garg > > http://rpg-314.blogspot.com/ > > Senior Undergraduate > Department of Physics > Indian Institute of Technology > Bombay > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From davide.cittaro at ifom-ieo-campus.it Mon Nov 9 15:58:28 2009 From: davide.cittaro at ifom-ieo-campus.it (Davide Cittaro) Date: Mon, 9 Nov 2009 21:58:28 +0100 Subject: [SciPy-User] poisson distribution in scipy.stats Message-ID: <077A6881-7537-43FD-8BA4-0A32554BC944@ifom-ieo-campus.it> Hi all, about the poisson generator... given l (expected) and k (found) I guess that the way to get the probability of k I have to do this: d = scipy.stats.poisson(l) p = pmf(k) which I found being the same of p = scipy.stats.poisson.pmf(l, k) I've here some code in which it is written: d = scipy.stats.poisson(l, k) p = d.pmf(k) which gives different results. Which is the right way to initialize the distribution (and get the PMF)? Thanks d /* Davide Cittaro Cogentech - Consortium for Genomic Technologies via adamello, 16 20139 Milano Italy tel.: +39(02)574303007 e-mail: davide.cittaro at ifom-ieo-campus.it */ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From robert.kern at gmail.com Mon Nov 9 16:00:29 2009 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 9 Nov 2009 15:00:29 -0600 Subject: [SciPy-User] poisson distribution in scipy.stats In-Reply-To: <077A6881-7537-43FD-8BA4-0A32554BC944@ifom-ieo-campus.it> References: <077A6881-7537-43FD-8BA4-0A32554BC944@ifom-ieo-campus.it> Message-ID: <3d375d730911091300h6d9b4df6u4be1e6ae0811b12@mail.gmail.com> On Mon, Nov 9, 2009 at 14:58, Davide Cittaro wrote: > Hi all, > about the poisson generator... given l (expected) and k (found) I guess that > the way to get the probability of k I have to do this: > > d = scipy.stats.poisson(l) > p = pmf(k) Correct. > which I found being the same of > p = scipy.stats.poisson.pmf(l, k) Also correct. > I've here some code in which it is written: > d = scipy.stats.poisson(l, k) That one is completely wrong. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From wesmckinn at gmail.com Mon Nov 9 16:03:18 2009 From: wesmckinn at gmail.com (Wes McKinney) Date: Mon, 9 Nov 2009 16:03:18 -0500 Subject: [SciPy-User] Distributed computing: running embarrassingly parallel (python/c++) codes over a cluster In-Reply-To: <92881.60847.qm@web33004.mail.mud.yahoo.com> References: <4d5dd8c20911091011hd5d9265k75edfc78dfde2fe@mail.gmail.com> <92881.60847.qm@web33004.mail.mud.yahoo.com> Message-ID: <6c476c8a0911091303k3ecbbb24ve4a2abb862bb4da9@mail.gmail.com> On Mon, Nov 9, 2009 at 3:56 PM, David Baddeley wrote: > Hi Rohit, > > I've had a lot of sucess using PYRO (pyro.sourceforge.net) to distribute tasks across a cluster. Pyro's a remote objects implementation for python and makes inter-process communication really easy. The disadvantage of this approach is that you've got to write your own server to distribute the tasks, but this is almost trivial (mine's a class with getTask and postTask methods, and with the tasks stored internally in a list, and which is made remotely accessible using pyro). The advantage is that it seems to work well on any platform I've tried it on, and that it's really easy to add things like a timeout on tasks so that they can be reassigned if one of the workers falls over or is killed (I've had workers running as a windows screensaver). My tasks use a mixture of python and c, although no communication takes place in the c code. > > I took this route before I was aware of multiprocessing / the parallel components of ipython etc... and the communications overhead when using PYRO is relatively high so these other options would definitely be worth looking into. > > I can post the code for a minimal task server/client if you like. > > best wishes, > David > > --- On Tue, 10/11/09, Rohit Garg wrote: > >> From: Rohit Garg >> Subject: [SciPy-User] Distributed computing: running embarrassingly parallel (python/c++) codes over a cluster >> To: "SciPy Users List" , numpy-discussions at scipy.org >> Received: Tuesday, 10 November, 2009, 7:11 AM >> Hi all, >> >> I have an embarrassingly parallel problem, very nicely >> suited to >> parallelization. I am looking for community feedback on how >> to best >> approach this matter? Basically, I just setup a bunch of >> tasks, and >> the various cpu's will pull data, process it, and send it >> back. Out of >> order arrival of results is no problem. 
The processing >> times involved >> are so large that the communication is effectively free, >> and hence I >> don't care how fast/slow the communication is. I thought >> I'll ask in >> case somebody has done this stuff before to avoid >> reinventing the >> wheel. Any other suggestions are welcome too. >> >> My only constraint is that it should be able to run a >> python extension >> (c++) with minimum of fuss. I want to minimize the >> headaches involved >> with setting up/writing the boilerplate code. Which >> framework/approach/library would you recommend? >> >> There is one method mentioned at [1], and of course, one >> could resort >> to something like mpi4py. >> >> [1] http://docs.python.org/library/multiprocessing.html???{see >> the last example} >> >> -- >> Rohit Garg >> >> http://rpg-314.blogspot.com/ >> >> Senior Undergraduate >> Department of Physics >> Indian Institute of Technology >> Bombay >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > Here's a little parallel processing library using Pyro which might be of interest to some: http://code.google.com/p/papyros/ From karl.young at ucsf.edu Mon Nov 9 16:05:49 2009 From: karl.young at ucsf.edu (Karl Young) Date: Mon, 09 Nov 2009 13:05:49 -0800 Subject: [SciPy-User] Weierstrass and Jacobi In-Reply-To: References: Message-ID: <4AF8842D.5010805@ucsf.edu> Sorry for the dumb question (but some of you know me by now !). I was able to stumble around and solve a differential equation I was working on in terms of Weierstrass elliptic functions (though an open source type of guy I have to thank Wolfram re. wloframalpha for help with that...). I'd like to evaluate the function for various sets of parameters and found that the special functions package for scipy has Jacobi elliptic functions available. I seem to recall that the Weierstrass elliptic functions are special cases of the Jacobi elliptic functions but haven't been able to locate any source that describes that in any detail. Anyone have any hints ? Thanks, -- Karl From vanforeest at gmail.com Mon Nov 9 16:17:44 2009 From: vanforeest at gmail.com (nicky van foreest) Date: Mon, 9 Nov 2009 22:17:44 +0100 Subject: [SciPy-User] Weierstrass and Jacobi In-Reply-To: <4AF8842D.5010805@ucsf.edu> References: <4AF8842D.5010805@ucsf.edu> Message-ID: Hi Karl, I haven't checked.. you might try the books of Apostol (mathematical analysis), Courant and John, or Numerical recipes on this. bye Nicky 2009/11/9 Karl Young : > > Sorry for the dumb question (but some of you know me by now !). I was > able to stumble around and solve a differential equation I was working > on in terms of Weierstrass elliptic functions (though an open source > type of guy I have to thank Wolfram re. wloframalpha for help with > that...). I'd like to evaluate the function for various sets of > parameters and found that the special functions package for scipy has > Jacobi elliptic functions available. I seem to recall that the > Weierstrass elliptic functions are special cases of the Jacobi elliptic > functions but haven't been able to locate any source that describes that > in any detail. Anyone have any hints ? 
Thanks, > > -- Karl > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From davide.cittaro at ifom-ieo-campus.it Mon Nov 9 16:21:59 2009 From: davide.cittaro at ifom-ieo-campus.it (Davide Cittaro) Date: Mon, 9 Nov 2009 22:21:59 +0100 Subject: [SciPy-User] poisson distribution in scipy.stats In-Reply-To: References: Message-ID: On Nov 9, 2009, at 10:03 PM, scipy-user-request at scipy.org wrote: > Message: 5 > Date: Mon, 9 Nov 2009 15:00:29 -0600 > From: Robert Kern > Subject: Re: [SciPy-User] poisson distribution in scipy.stats > To: SciPy Users List > Message-ID: > <3d375d730911091300h6d9b4df6u4be1e6ae0811b12 at mail.gmail.com> > Content-Type: text/plain; charset=UTF-8 > > On Mon, Nov 9, 2009 at 14:58, Davide Cittaro > wrote: >> Hi all, >> about the poisson generator... given l (expected) and k (found) I >> guess that >> the way to get the probability of k I have to do this: >> >> d = scipy.stats.poisson(l) >> p = pmf(k) > > Correct. > >> which I found being the same of >> p = scipy.stats.poisson.pmf(l, k) > > Also correct. > Although I've just plotted values and they are sligthly different... the second version does have only a max value at l, whereas the first has two maxima (l and l-1) >> I've here some code in which it is written: >> d = scipy.stats.poisson(l, k) > > That one is completely wrong. > Ok, I see... It looks like it shifts the pmf of k... but how does it works? I mean, how the discrete distribution constructor interprets this kind of declaration? d /* Davide Cittaro Cogentech - Consortium for Genomic Technologies via adamello, 16 20139 Milano Italy tel.: +39(02)574303007 e-mail: davide.cittaro at ifom-ieo-campus.it */ -------------- next part -------------- An HTML attachment was scrubbed... URL: From karl.young at ucsf.edu Mon Nov 9 16:13:05 2009 From: karl.young at ucsf.edu (Karl Young) Date: Mon, 09 Nov 2009 13:13:05 -0800 Subject: [SciPy-User] Distributed computing: running embarrassingly parallel (python/c++) codes over a cluster In-Reply-To: <6c476c8a0911091303k3ecbbb24ve4a2abb862bb4da9@mail.gmail.com> References: <4d5dd8c20911091011hd5d9265k75edfc78dfde2fe@mail.gmail.com> <92881.60847.qm@web33004.mail.mud.yahoo.com> <6c476c8a0911091303k3ecbbb24ve4a2abb862bb4da9@mail.gmail.com> Message-ID: <4AF885E1.30709@ucsf.edu> If I were starting from scratch I'd learn how to do this using Ipython: http://ipython.scipy.org/moin/ but in the past I had great luck using pypar, a very simple interface to the MPI library; for embarrassingly parallel problems it's very easy to get up and running quickly by just copying examples (I knew a wee bit of MPI before using this but not much). http://datamining.anu.edu.au/~ole/pypar/ -- KY > On Mon, Nov 9, 2009 at 3:56 PM, David Baddeley > wrote: > >> Hi Rohit, >> >> I've had a lot of sucess using PYRO (pyro.sourceforge.net) to distribute tasks across a cluster. Pyro's a remote objects implementation for python and makes inter-process communication really easy. The disadvantage of this approach is that you've got to write your own server to distribute the tasks, but this is almost trivial (mine's a class with getTask and postTask methods, and with the tasks stored internally in a list, and which is made remotely accessible using pyro). 
The advantage is that it seems to work well on any platform I've tried it on, and that it's really easy to add things like a timeout on tasks so that they can be reassigned if one of the workers falls over or is killed (I've had workers running as a windows screensaver). My tasks use a mixture of python and c, although no communication takes place in the c code. >> >> I took this route before I was aware of multiprocessing / the parallel components of ipython etc... and the communications overhead when using PYRO is relatively high so these other options would definitely be worth looking into. >> >> I can post the code for a minimal task server/client if you like. >> >> best wishes, >> David >> >> --- On Tue, 10/11/09, Rohit Garg wrote: >> >> >>> From: Rohit Garg >>> Subject: [SciPy-User] Distributed computing: running embarrassingly parallel (python/c++) codes over a cluster >>> To: "SciPy Users List" , numpy-discussions at scipy.org >>> Received: Tuesday, 10 November, 2009, 7:11 AM >>> Hi all, >>> >>> I have an embarrassingly parallel problem, very nicely >>> suited to >>> parallelization. I am looking for community feedback on how >>> to best >>> approach this matter? Basically, I just setup a bunch of >>> tasks, and >>> the various cpu's will pull data, process it, and send it >>> back. Out of >>> order arrival of results is no problem. The processing >>> times involved >>> are so large that the communication is effectively free, >>> and hence I >>> don't care how fast/slow the communication is. I thought >>> I'll ask in >>> case somebody has done this stuff before to avoid >>> reinventing the >>> wheel. Any other suggestions are welcome too. >>> >>> My only constraint is that it should be able to run a >>> python extension >>> (c++) with minimum of fuss. I want to minimize the >>> headaches involved >>> with setting up/writing the boilerplate code. Which >>> framework/approach/library would you recommend? >>> >>> There is one method mentioned at [1], and of course, one >>> could resort >>> to something like mpi4py. >>> >>> [1] http://docs.python.org/library/multiprocessing.html {see >>> the last example} >>> >>> -- >>> Rohit Garg >>> >>> http://rpg-314.blogspot.com/ >>> >>> Senior Undergraduate >>> Department of Physics >>> Indian Institute of Technology >>> Bombay >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > Here's a little parallel processing library using Pyro which might be > of interest to some: > > http://code.google.com/p/papyros/ > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > . 
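P.S. To give a flavour of how little pypar code an embarrassingly parallel job needs, the master/worker pattern is roughly the sketch below. This is typed from memory and untested, so check the pypar demos for the exact call signatures; do_one_task() is just a placeholder for whatever your python/C++ extension actually computes.

import pypar

def do_one_task(i):
    # placeholder for the real computation (e.g. a call into a C++ extension)
    return i ** 2

rank = pypar.rank()    # this process's id, 0 .. size-1
size = pypar.size()    # number of processes started by mpirun/mpiexec

ntasks = 1000
# static division of the work: process 'rank' takes every size-th task,
# so no communication is needed until the results are gathered at the end
my_results = [do_one_task(i) for i in range(rank, ntasks, size)]

if rank == 0:
    # rank 0 collects everybody else's results; arrival order doesn't matter
    all_results = list(my_results)
    for source in range(1, size):
        all_results.extend(pypar.receive(source))
    print '%d results collected' % len(all_results)
else:
    pypar.send(my_results, 0)

pypar.finalize()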
> > From josef.pktd at gmail.com Mon Nov 9 16:24:35 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 9 Nov 2009 16:24:35 -0500 Subject: [SciPy-User] poisson distribution in scipy.stats In-Reply-To: References: Message-ID: <1cd32cbb0911091324y47ad01edne8d3f666feb746e2@mail.gmail.com> On Mon, Nov 9, 2009 at 4:21 PM, Davide Cittaro wrote: > > On Nov 9, 2009, at 10:03 PM, scipy-user-request at scipy.org wrote: > > Message: 5 > Date: Mon, 9 Nov 2009 15:00:29 -0600 > From: Robert Kern > Subject: Re: [SciPy-User] poisson distribution in scipy.stats > To: SciPy Users List > Message-ID: > <3d375d730911091300h6d9b4df6u4be1e6ae0811b12 at mail.gmail.com> > Content-Type: text/plain; charset=UTF-8 > > On Mon, Nov 9, 2009 at 14:58, Davide Cittaro > wrote: > > Hi all, > > about the poisson generator... given l (expected) and k (found) I guess that > > the way to get the probability of k I have to do this: > > d = scipy.stats.poisson(l) > > p = pmf(k) > > Correct. > > which I found being the same of > > p = scipy.stats.poisson.pmf(l, k) > > Also correct. > > > Although I've just plotted values and they are sligthly different... the > second version does have only a max value at l, whereas the first has two > maxima (l and l-1) > > I've here some code in which it is written: > > d = scipy.stats.poisson(l, k) k is interpreted as location same as stats.poisson(l, loc=k) loc shifts the distribution, and returns a frozen distribution with shifted support Josef > > That one is completely wrong. > > > Ok, I see... It looks like it shifts the pmf of k... but how does it works? > I mean, how the discrete distribution constructor interprets this kind of > declaration? > d > > /* > Davide Cittaro > Cogentech - Consortium for Genomic Technologies > via adamello, 16 > 20139 Milano > Italy > tel.: +39(02)574303007 > e-mail:?davide.cittaro at ifom-ieo-campus.it > */ > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From robert.kern at gmail.com Mon Nov 9 16:25:24 2009 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 9 Nov 2009 15:25:24 -0600 Subject: [SciPy-User] poisson distribution in scipy.stats In-Reply-To: References: Message-ID: <3d375d730911091325g7c70aeb2x68fb7cb2137f0a06@mail.gmail.com> On Mon, Nov 9, 2009 at 15:21, Davide Cittaro wrote: > > On Nov 9, 2009, at 10:03 PM, scipy-user-request at scipy.org wrote: > > Message: 5 > Date: Mon, 9 Nov 2009 15:00:29 -0600 > From: Robert Kern > Subject: Re: [SciPy-User] poisson distribution in scipy.stats > To: SciPy Users List > Message-ID: > <3d375d730911091300h6d9b4df6u4be1e6ae0811b12 at mail.gmail.com> > Content-Type: text/plain; charset=UTF-8 > > On Mon, Nov 9, 2009 at 14:58, Davide Cittaro > wrote: > > Hi all, > > about the poisson generator... given l (expected) and k (found) I guess that > > the way to get the probability of k I have to do this: > > d = scipy.stats.poisson(l) > > p = pmf(k) > > Correct. > > which I found being the same of > > p = scipy.stats.poisson.pmf(l, k) > > Also correct. > > > Although I've just plotted values and they are sligthly different... the > second version does have only a max value at l, whereas the first has two > maxima (l and l-1) I'm sorry. I meant "Wrong." p = scipy.stats.poisson.pmf(k, l) > I've here some code in which it is written: > > d = scipy.stats.poisson(l, k) > > That one is completely wrong. > > > Ok, I see... It looks like it shifts the pmf of k... but how does it works? 
> I mean, how the discrete distribution constructor interprets this kind of > declaration? Exactly as the documentation states: myrv = poisson(mu,loc=0) - frozen RV object with the same methods but holding the given shape and location fixed. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From peridot.faceted at gmail.com Mon Nov 9 16:28:19 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Mon, 9 Nov 2009 16:28:19 -0500 Subject: [SciPy-User] Unit testing of Bayesian estimator In-Reply-To: <4AF871BE.6050300@gmail.com> References: <4AF84EF1.2090608@gmail.com> <4AF871BE.6050300@gmail.com> Message-ID: 2009/11/9 Bruce Southey : > On 11/09/2009 12:06 PM, Anne Archibald wrote: >> 2009/11/9 Bruce Southey: >> >>> I do not know what you are trying to do with the code as it is not my >>> area. But you are using some empirical Bayesian estimator >>> (http://en.wikipedia.org/wiki/Empirical_Bayes_method) and thus you lose >>> much of the value of Bayesian as you are only dealing with modal >>> estimates. Really you should be obtaining the distribution of >>> "Probability the signal is pulsed" not just the modal estimate. >>> >> Um. Given a data set and a prior, I just do Bayesian hypothesis >> comparison. This gives me a single probability that the signal is >> pulsed. You seem to be imagining a probability distribution for this >> probability - but what would the independent variables be? The >> unpulsed distribution does not depend on any parameters, and I have >> integrated over all possible values for the pulsed distribution. So >> what I get should really be the probability, given the data, that the >> signal is pulsed. I'm not using an empirical Bayesian estimator; I'm >> doing the numerical integrations directly (and inefficiently). >> > Here are two links on what I mean with reference to the binomial case: > http://lingpipe-blog.com/2009/09/11/batting-averages-bayesian-vs-mle-estimate/ > > TEACHING OF BAYESIAN ESTIMATION OF ?P? PROBABILITY > IN A BERNOULLI PROCESS: > http://www.stat.auckland.ac.nz/~iase/publications/17/C439.pdf > > > I do not know your area but you should be able to do something similar. They are doing something essentially different from what I am doing. They have a single (parameterized) hypothesis, so they don't compute a probability of it being the case rather than some other hypothesis. Perhaps you are being misled by the fact that the system they are reasoning about is a binomial system, in which the parameter is "probability of occurrence". In my case, I am not working with a binomial system; the closest analog in my system to their p is my fraction parameter, and I seem to have a usable way to test the posterior distribution of this parameter. It is the hypothesis testing that I am trying to test at the moment. Anne From karl.young at ucsf.edu Mon Nov 9 18:44:40 2009 From: karl.young at ucsf.edu (Karl Young) Date: Mon, 09 Nov 2009 15:44:40 -0800 Subject: [SciPy-User] Weierstrass and Jacobi In-Reply-To: References: <4AF8842D.5010805@ucsf.edu> Message-ID: <4AF8A968.8030907@ucsf.edu> Hi Nicky, Thanks for the tips, -- Karl > Hi Karl, > > I haven't checked.. you might try the books of Apostol (mathematical > analysis), Courant and John, or Numerical recipes on this. > > bye > > Nicky > > 2009/11/9 Karl Young : > >> Sorry for the dumb question (but some of you know me by now !). 
I was >> able to stumble around and solve a differential equation I was working >> on in terms of Weierstrass elliptic functions (though an open source >> type of guy I have to thank Wolfram re. wloframalpha for help with >> that...). I'd like to evaluate the function for various sets of >> parameters and found that the special functions package for scipy has >> Jacobi elliptic functions available. I seem to recall that the >> Weierstrass elliptic functions are special cases of the Jacobi elliptic >> functions but haven't been able to locate any source that describes that >> in any detail. Anyone have any hints ? Thanks, >> >> -- Karl >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From sturla at molden.no Mon Nov 9 22:48:01 2009 From: sturla at molden.no (Sturla Molden) Date: Tue, 10 Nov 2009 04:48:01 +0100 Subject: [SciPy-User] scipy.org is down Message-ID: <4AF8E271.9090301@molden.no> Someone please restart the web-server. Sturla From denis-bz-gg at t-online.de Tue Nov 10 10:53:56 2009 From: denis-bz-gg at t-online.de (denis) Date: Tue, 10 Nov 2009 07:53:56 -0800 (PST) Subject: [SciPy-User] Contribution to Performance Python In-Reply-To: <4AF40025.3000906@iqac.csic.es> References: <4AF40025.3000906@iqac.csic.es> Message-ID: <65963269-c0fe-483a-8628-bbcc8a8f9b9e@c3g2000yqd.googlegroups.com> Ramon, not an answer to your question, just a word of agreement: FORALL is a fine construct, natural to write in pseudocode (I don't use fortran) in python, generators in C++: for( j k x y in parallel) is very fast on superscalar cpus (I use e.g. #define Forjkxy(...) a[j][k] = f(x,y), don't tell the C++ police) in testing: for a, b, c in product( [a...], [b...], [c...] ) Would you have any more examples ? They'd help spread FORALL. Apropos examples, one (1) laplace is a bit meager; has anyone (flameproof) done others, weave vs cython vs C or fortran ? cheers -- denis From robince at gmail.com Tue Nov 10 11:21:25 2009 From: robince at gmail.com (Robin) Date: Tue, 10 Nov 2009 16:21:25 +0000 Subject: [SciPy-User] Contribution to Performance Python In-Reply-To: <4AF40025.3000906@iqac.csic.es> References: <4AF40025.3000906@iqac.csic.es> Message-ID: <2d5132a50911100821l1332528bl9509f2f11e74e081@mail.gmail.com> On Fri, Nov 6, 2009 at 10:53 AM, Ramon Crehuet wrote: > If this is interesting to the community, who should I contact to have > this included in the scipy web page? Hi, It's a wiki, so I think you should be able to register an account and modify the page yourself. I'd certainly support the addition of the new fortran versions (although perhaps one is sufficient since they both seem to perform very closely) and perhaps updated timings section. Cheers Robin From dyamins at gmail.com Tue Nov 10 11:37:19 2009 From: dyamins at gmail.com (Dan Yamins) Date: Tue, 10 Nov 2009 11:37:19 -0500 Subject: [SciPy-User] Edge Detection Message-ID: <15e4667e0911100837q5dc003d6re78e6e66ba51972@mail.gmail.com> Hi, I'm looking into using SciPy for a couple of edge-detection problems, involving detection of edges in images of text (in simple, clean fonts). If someone on this list could point me to a relevant resource / function, that would be excellent. 
(I have essentially no background in image processing, but am reasonably comfortable mathematically, and I would be happy to dive into something fairly technical.) thanks, Dan -------------- next part -------------- An HTML attachment was scrubbed... URL: From zachary.pincus at yale.edu Tue Nov 10 11:48:51 2009 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Tue, 10 Nov 2009 11:48:51 -0500 Subject: [SciPy-User] Edge Detection In-Reply-To: <15e4667e0911100837q5dc003d6re78e6e66ba51972@mail.gmail.com> References: <15e4667e0911100837q5dc003d6re78e6e66ba51972@mail.gmail.com> Message-ID: <4A698D9C-7FC0-4FE4-BA8D-7628EF3AAE71@yale.edu> References: Start around just looking at the top google hits for "image processing edge detection" -- that should be a pretty good start. Also, google any unfamiliar terms below... I really find that there's a ton of good basic image-processing information available online. Code: Look at what's available in scipy.ndimage. There are functions for getting gradient magnitudes, as well as standard filters like Sobel etc. (which you'll learn about from the above), plus morphological operators for modifying binarized image regions (e.g. like erosion etc.; useful for getting rid of stray noise-induced edges), plus some basic functions for image smoothing like median filters, etc. For exploratory analysis, you might want some ability to interactively visualize images; you could use matplotlib or the imaging scikit, which is still pre-release but making fast progress: http://github.com/stefanv/scikits.image I've attached basic code for Canny edge detection, which should demonstrate a bit about how ndimage works, plus it's useful in its own right. There is also some code floating around for anisotropic diffusion and bilateral filtering, which are two noise-reduction methods that can be better than simple median filtering. Zach On Nov 10, 2009, at 11:37 AM, Dan Yamins wrote: > Hi, > > I'm looking into using SciPy for a couple of edge-detection > problems, involving detection of edges in images of text (in simple, > clean fonts). If someone on this list could point me to a relevant > resource / function, that would be excellent. (I have essentially > no background in image processing, but am reasonably comfortable > mathematically, and I would be happy to dive into something fairly > technical.) > > thanks, > Dan > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user -------------- next part -------------- A non-text attachment was scrubbed... Name: canny.py Type: text/x-python-script Size: 2174 bytes Desc: not available URL: From agile.aspect at gmail.com Tue Nov 10 12:54:31 2009 From: agile.aspect at gmail.com (Agile Aspect) Date: Tue, 10 Nov 2009 09:54:31 -0800 Subject: [SciPy-User] Weierstrass and Jacobi In-Reply-To: <4AF8842D.5010805@ucsf.edu> References: <4AF8842D.5010805@ucsf.edu> Message-ID: On Mon, Nov 9, 2009 at 1:05 PM, Karl Young wrote: > > Sorry for the dumb question (but some of you know me by now !). I was > able to stumble around and solve a differential equation I was working > on in terms of Weierstrass elliptic functions (though an open source > type of guy I have to thank Wolfram re. wloframalpha for help with > that...). I'd like to evaluate the function for various sets of > parameters and found that the special functions package for scipy has > Jacobi elliptic functions available. 
I seem to recall that the > Weierstrass elliptic functions are special cases of the Jacobi elliptic > functions but haven't been able to locate any source that describes that > in any detail. Anyone have any hints ? Thanks, > > -- Karl > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > Take a look at http://eom.springer.de/w/w097450.htm just before the references, or http://en.wikipedia.org/wiki/Weierstrass's_elliptic_functions#Relation_to_Jacobi_elliptic_functions just before the references. -- Enjoy global warming while it lasts. From mudit_19a at yahoo.com Tue Nov 10 13:16:53 2009 From: mudit_19a at yahoo.com (mudit sharma) Date: Tue, 10 Nov 2009 23:46:53 +0530 (IST) Subject: [SciPy-User] Pytseries numpy func error In-Reply-To: References: <4AF8842D.5010805@ucsf.edu> Message-ID: <835246.33088.qm@web94906.mail.in2.yahoo.com> series.sum() gives this error whereas series.data.sum() works. /usr/local/lib/python2.6/dist-packages/scikits.timeseries-0.91.1-py2.6-linux-x86_64.egg/scikits/timeseries/tseries.pyc in __call__(self, *args, **params) 471 (_dates, _series) = (instance._dates, instance._series) 472 func = getattr(_series, self.__name__) --> 473 result = func(*args, **params) 474 if _dates.size != _series.size: 475 axis = params.get('axis', None) /usr/local/lib/python2.6/dist-packages/numpy-1.3.0-py2.6-linux-x86_64.egg/numpy/ma/core.pyc in sum(self, axis, dtype, out) 3675 # No explicit output 3676 if out is None: -> 3677 result = self.filled(0).sum(axis, dtype=dtype).view(type(self)) 3678 if result.ndim: 3679 result.__setmask__(newmask) AttributeError: 'float' object has no attribute 'view' From rcsqtc at iqac.csic.es Tue Nov 10 13:26:10 2009 From: rcsqtc at iqac.csic.es (Ramon Crehuet) Date: Tue, 10 Nov 2009 19:26:10 +0100 Subject: [SciPy-User] Contribution to Performance Python Message-ID: <4AF9B042.3040402@iqac.csic.es> Robin, I know it is a wiki, but you need to get permission to modify it. I've tried registering at: http://docs.scipy.org/numpy/accounts/login but I always get an Authentication failed error. However if I try to register again, it complains that my username is already in use (it wasn't the first time I registered, so that is me! :-) ) My username is rcrehuet, and the address rcsqtc_at_iqac.csic.es Cheers, Ramon PS. David, I am willing to contribute to the cookbook. On Fri, Nov 6, 2009 at 10:53 AM, Ramon Crehuet wrote: > > If this is interesting to the community, who should I contact to have > > this included in the scipy web page? Hi, It's a wiki, so I think you should be able to register an account and modify the page yourself. I'd certainly support the addition of the new fortran versions (although perhaps one is sufficient since they both seem to perform very closely) and perhaps updated timings section. Cheers Robin From robince at gmail.com Tue Nov 10 13:40:37 2009 From: robince at gmail.com (Robin) Date: Tue, 10 Nov 2009 18:40:37 +0000 Subject: [SciPy-User] Contribution to Performance Python In-Reply-To: <4AF9B042.3040402@iqac.csic.es> References: <4AF9B042.3040402@iqac.csic.es> Message-ID: <2d5132a50911101040gfdfa593r3b914cfaf1f7e0a2@mail.gmail.com> On Tue, Nov 10, 2009 at 6:26 PM, Ramon Crehuet wrote: > Robin, > I know it is a wiki, but you need to get permission to modify it. I've > tried registering at: > http://docs.scipy.org/numpy/accounts/login > but I always get an Authentication failed error. 
However if I try to > register again, it complains that my username is already in use (it > wasn't the first time I registered, so that is me! :-) ) My username is > rcrehuet, and the address rcsqtc_at_iqac.csic.es I think docs.scipy.org is specifically for the documentation effort - it is a wiki whereby people can contribute to docstrings that will then be merged back into the numpy source code. It does require permissions to edit, but I thought the scipy website, which is a wiki at www.scipy.org requires registration, but I thought only certain pages (like the front page) were locked and the rest should be editable by any user. Try to follow the login link at the top right directly from the page you want to edit: http://www.scipy.org/PerformancePython There is a link from the login screen to register a wiki account here: http://www.scipy.org/UserPreferences So you could try that. Cheers Robin > Cheers, > Ramon > > PS. David, I am willing to contribute to the cookbook. > > > > On Fri, Nov 6, 2009 at 10:53 AM, Ramon Crehuet wrote: >> > If this is interesting to the community, who should I contact to have >> > this included in the scipy web page? > > Hi, > > It's a wiki, so I think you should be able to register an account and > modify the page yourself. I'd certainly support the addition of the > new fortran versions (although perhaps one is sufficient since they > both seem to perform very closely) and perhaps updated timings > section. > > Cheers > > Robin > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From david_baddeley at yahoo.com.au Tue Nov 10 15:51:14 2009 From: david_baddeley at yahoo.com.au (David Baddeley) Date: Tue, 10 Nov 2009 12:51:14 -0800 (PST) Subject: [SciPy-User] Saving record arrays to tab formatted txt Message-ID: <67025.17033.qm@web33008.mail.mud.yahoo.com> Hi all, does anyone know of an easy way to save record arrays as tab formatted txt? numpy.savetxt doesn't do the trick. I've got a record array with nested records, and mixed data types - dtype is as follows: dtype([('tIndex', ' In the code below (which is an extraction from a larger set of file), I can plot fine across the dateline and elsewhere, however any plots across the PM error out. I'm using OpenLayers to allow users to designate their plot area via a web form by dragging a map area. OpenLayers provide longitudes from 180/-180. The data I'm extracting has longitudes from 0-360 so some manipulation is necessary. I suspect the problem that is causing the error is the line that reads: "grid = grid[slat:nlat+1,wlong:elong+1]" because my wlong is greater than my elong. Does anyone have a suggestion for handling this at the PM. Any suggestions would be appreciated. 
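The only workaround I've come up with so far is to detect the wrap-around case and stitch two slices back together, roughly like the untested fragment below (meant to replace the single slicing line inside read_netcdf); I'm not sure this is the right approach, and I suspect the meshgrid used to build x,y for plotting would need the same treatment:

# untested idea: when the requested box straddles the prime meridian
# (wlong > elong after converting to 0-360), pull out the piece west of
# the meridian and the piece east of it, then glue them back together
# along the longitude axis
if wlong > elong:
    west_piece = grid[slat:nlat+1, wlong:]      # wlong .. 359
    east_piece = grid[slat:nlat+1, :elong+1]    # 0 .. elong
    grid = M.concatenate((west_piece, east_piece), axis=1)
else:
    grid = grid[slat:nlat+1, wlong:elong+1]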
The error is in the comments of the below script: import matplotlib import matplotlib.pyplot as pyplot #used to build contour and wind barbs plots import matplotlib.colors as pycolors #used to build color schemes for plots import numpy.ma as M #matrix manipulation functions from mpl_toolkits.basemap import Basemap import numpy as np #used to perform simple math functions on data from numpy import * ############################################################## def read_netcdf(filename, hour, param, wlong, elong, nlat, slat,day_time_level): from netCDF4 import Dataset #interprets NetCDF files nc_file = Dataset(filename, mode="r") nlat = nlat+90 slat = slat+90 swh = nc_file.variables['sig_wav_ht'][:] swh = np.squeeze(swh) grid = swh[int(day_time_level)-1, :,: ] grid = M.array(grid) grid = M.masked_where(grid < 0.001, grid) grid = grid[slat:nlat+1,wlong:elong+1] return grid ############################################################### map_res = "i" nlat = 7 slat = -7 #******************************************************* #when the below values for wlong and elong are as such # (across the Prime Meridian) I get the following error: #Traceback (most recent call last): # File "C:\Program Files\Wing\src\debug\tserver\_sandbox.py", line 58, in # File "C:\Python25\Lib\site-packages\numpy\ma\core.py", line 4262, in min # result = self.filled(fill_value).min(axis=axis, out=out).view(type(self)) #ValueError: zero-size array to ufunc.reduce without identity wlong = -12 elong = 8 #******************************************************* # When the below two values are as such, I get the figure I intend #wlong = -89 #elong = -75 day_raw = 1 param = "sig" year1 = 1993 month = "01" vint = 15 para_spacing = 10 merid_spacing = 10 hour = "%02d" % day_raw if wlong < 0: wlong = wlong+360 if elong < 0: elong = elong+360 vtype = "auto" filename = "x:/ww3/NetCDF/daily_means/ww3dm."+str(year1)+str(month)+ ".nc" grid = read_netcdf(filename, hour=hour, param=param, wlong=wlong, elong=elong, nlat=nlat, slat=slat, day_time_level=day_raw) m = Basemap(projection='cyl',resolution=map_res,llcrnrlon=wlong,llcrnrlat=slat,urcrnrlon=elong,urcrnrlat=nlat) x,y = m(*np.meshgrid(range(wlong,elong+1),range(slat,nlat+1))) print wlong print elong if vtype == 'auto': vmin = grid.min() print "
Vmin: ",vmin,"
" vmax = grid.max() print "
Vmax: ",vmax,"
" #vmin = 0 #vmax = 30 elif vtype== "manual": grid = M.masked_outside(grid,vmin,vmax) pyplot.jet() plot = m.contour(x,y,grid,int(vint)-1,linewidths=0.5,colors='k') plot = m.contourf(x,y,grid,int(vint)-1,cmap=pyplot.cm.jet) m.drawcoastlines() #draw coastlines m.drawmapboundary() #draw a line around the map region m.fillcontinents(color='0.8', lake_color=None, ax=None, zorder=None) #fill in continents with color (gray) m.drawparallels(np.arange(-90,90,para_spacing),labels=[1,0,0,0]) #draw parallels m.drawmeridians(np.arange(-180,180,merid_spacing),labels=[0,0,0,1]) #draw meridians pyplot.show() Bruce --------------------------------------- Bruce W. Ford Clear Science, Inc. bruce at clearscienceinc.com bruce.w.ford.ctr at navy.smil.mil http://www.ClearScienceInc.com Phone/Fax: 904-379-9704 8241 Parkridge Circle N. Jacksonville, FL 32211 Skype: bruce.w.ford Google Talk: fordbw at gmail.com From jsseabold at gmail.com Tue Nov 10 17:00:47 2009 From: jsseabold at gmail.com (Skipper Seabold) Date: Tue, 10 Nov 2009 17:00:47 -0500 Subject: [SciPy-User] Saving record arrays to tab formatted txt In-Reply-To: <67025.17033.qm@web33008.mail.mud.yahoo.com> References: <67025.17033.qm@web33008.mail.mud.yahoo.com> Message-ID: On Nov 10, 2009, at 3:51 PM, David Baddeley wrote: > Hi all, > > does anyone know of an easy way to save record arrays as tab > formatted txt? numpy.savetxt doesn't do the trick. > > I've got a record array with nested records, and mixed data types - > dtype is as follows: > > dtype([('tIndex', ' ' ('bx', ' ' ('bx', ' ('slicesUsed', [('x', [('start', ' ' ' ' > and want to flatten it so that each entry in the table becomes a row > in the .txt file. ie: > > tIndex fitResults_A fitResults_x0 ....\n > tIndex fitResults_A fitResults_x0 ....\n > etc... > > If necessary I'll write my own code to format up a string for each > line, but thought I'd ask first in case anyone knew of a pre- > existing solution. It'd ideally be generic as I'm also likely to > want to use variations on the data type. > > many thanks, > David > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user I have just been using savetxt as a template and adding processing as needed. Would be interested in hearing any other solutions or extending savetxt to be more flexible / general. -Skipper From mattknox.ca at gmail.com Tue Nov 10 17:29:41 2009 From: mattknox.ca at gmail.com (Matt Knox) Date: Tue, 10 Nov 2009 22:29:41 +0000 (UTC) Subject: [SciPy-User] Pytseries numpy func error References: <4AF8842D.5010805@ucsf.edu> <835246.33088.qm@web94906.mail.in2.yahoo.com> Message-ID: > series.sum() gives this error whereas series.data.sum() > works. I don't get this error when trying a sum on a TimeSeries object. I noticed you are using an older version of the timeseries module. Can you try upgrading to the latest version and see if you still get an error? Also, if you still get the error please post a small example demonstrating how to get the error, thanks. Also, note that we will probably be doing a new minor bug fix release within the next week or two. 
- Matt From eadrogue at gmx.net Tue Nov 10 17:38:23 2009 From: eadrogue at gmx.net (Ernest =?iso-8859-1?Q?Adrogu=E9?=) Date: Tue, 10 Nov 2009 23:38:23 +0100 Subject: [SciPy-User] the skellam distribution In-Reply-To: <1cd32cbb0911091136m26c9dd37r229051142c43c63d@mail.gmail.com> References: <20091108151625.GA561@doriath.local> <1cd32cbb0911090640q1f932141m834e26602c053018@mail.gmail.com> <20091109181650.GA5957@doriath.local> <1cd32cbb0911091136m26c9dd37r229051142c43c63d@mail.gmail.com> Message-ID: <20091110223823.GA10421@doriath.local> 9/11/09 @ 14:36 (-0500), thus spake josef.pktd at gmail.com: > I'm not sure the example is correct. You are simulating two > independent poisson variables, so the difference skellam_var should be > distributed as skellam with > mu1 = lam1 > mu2 = lam2 > and theoretical rho should be zero. Or am I missing something? Yes, rho should be zero, but in practice, there may be a certain amount of correlation even though the variables are theoretically independent. If you set rho=0 when it's not actually zero, this will result in an artificially worse fit, which sort of defeats the purpose of this test. It would make sense to set rho=0 if we were testing whether the to Poisson variates are independent, though. Bye the way, I have opened a ticket here: http://projects.scipy.org/scipy/ticket/1050 Bye. Ernest From eadrogue at gmx.net Tue Nov 10 17:47:05 2009 From: eadrogue at gmx.net (Ernest =?iso-8859-1?Q?Adrogu=E9?=) Date: Tue, 10 Nov 2009 23:47:05 +0100 Subject: [SciPy-User] the skellam distribution In-Reply-To: <1cd32cbb0911090858r60713d83jf4e351401a640b12@mail.gmail.com> References: <20091108151625.GA561@doriath.local> <1cd32cbb0911090640q1f932141m834e26602c053018@mail.gmail.com> <4AF83AFD.60304@gmail.com> <4AF83CE8.7080507@lpta.in2p3.fr> <4AF8411D.9050605@gmail.com> <1cd32cbb0911090858r60713d83jf4e351401a640b12@mail.gmail.com> Message-ID: <20091110224705.GB10421@doriath.local> 9/11/09 @ 11:58 (-0500), thus spake josef.pktd at gmail.com: > from > Bayesian analysis of the dierences of count data > D. Karlis and I. Ntzoufras > STATISTICS IN MEDICINE > Statist. Med. 2006; 25:1885?1905 > > they have some funny application to soccer scores > http://stat-athens.aueb.gr/~jbn/publications.htm Yes, I have working on modelling soccer scores for some time now, and have come to the conclusion that Poisson models are doomed, because in essence football scores are not Poisson distributed. If you are interested in these things let me suggest you to concentrate your efforts on the negative binomial model, no matter how tempting the Poisson model may be. Cheers. Ernest From devicerandom at gmail.com Wed Nov 11 11:26:23 2009 From: devicerandom at gmail.com (ms) Date: Wed, 11 Nov 2009 16:26:23 +0000 Subject: [SciPy-User] ODR fitting several equations to the same parameters Message-ID: <4AFAE5AF.3020506@gmail.com> Hi, Probably it is a noobish question, but statistics is still not my cup of tea as I'd like it to be. :) Let's start with a simple example. Imagine I have several linear data sets y=ax+b which have different b (all of them are known) but that should fit to the same (unknown) a. To have my best estimate of a, I would want to fit them all together. In this case it is trivial, you just subtract the known b from the data set and fit them all at the same time. In my case it is a bit different, in the sense that I have to do conceptually the same thing but for a highly non-linear equation where the equivalent of "b" above is not so simple to separate. 
I wonder therefore if there is a way to do a simultaneous fit of different equations differing only in the known parameters and having a single output, possibly with the help of ODR. Is this possible? And/or what should be the best thing to do, in general, for this kind of problems? Many thanks, M. From bsouthey at gmail.com Wed Nov 11 12:04:14 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 11 Nov 2009 11:04:14 -0600 Subject: [SciPy-User] ODR fitting several equations to the same parameters In-Reply-To: <4AFAE5AF.3020506@gmail.com> References: <4AFAE5AF.3020506@gmail.com> Message-ID: <4AFAEE8E.9080000@gmail.com> On 11/11/2009 10:26 AM, ms wrote: > Hi, > > Probably it is a noobish question, but statistics is still not my cup of > tea as I'd like it to be. :) > > Let's start with a simple example. Imagine I have several linear data > sets y=ax+b which have different b (all of them are known) but that > should fit to the same (unknown) a. To have my best estimate of a, I > would want to fit them all together. In this case it is trivial, you > just subtract the known b from the data set and fit them all at the same > time. > Although b is known without error you still have potentially effects due to each data set. What I would do is fit: y= mu + dataset + a*x + dataset*a*x Where mu is some overall mean, dataset is the effect of the ith dataset - allows different intercepts for each data set dataset*a is the interaction between a and the dataset - allows different slopes for each dataset. Obviously you first test that interaction is zero. In theory, the difference between the solutions of dataset should equate to the differences between the known b's. > In my case it is a bit different, in the sense that I have to do > conceptually the same thing but for a highly non-linear equation where > the equivalent of "b" above is not so simple to separate. I wonder > therefore if there is a way to do a simultaneous fit of different > equations differing only in the known parameters and having a single > output, possibly with the help of ODR. Is this possible? And/or what > should be the best thing to do, in general, for this kind of problems? > > Many thanks, > M. > Now you just expand your linear model to nonlinear one. The formulation depends on your equation. But really you just replace f(a*x) with f(a*x+dataset*a*x). So I first try with a linear model before a nonlinear. Also I would see if I could linearize the non-linear function. Bruce From seefeld at sympatico.ca Wed Nov 11 12:01:47 2009 From: seefeld at sympatico.ca (Stefan Seefeld) Date: Wed, 11 Nov 2009 12:01:47 -0500 Subject: [SciPy-User] Use of MPI in extension modules Message-ID: <4AFAEDFB.10808@sympatico.ca> Hello, I have a rather basic question about using (C++) extension modules with ipython. Sorry if this is the wrong list for this. I'm working on a signal & image processing library that uses MPI internally. I'd like to provide a Python interface to it, so I can integrate it into SciPy. With 'normal' Python this all works nicely. Just recently I have started to consider parallelism, i.e. I want to use the library's internal parallelism, by running it with ipython in parallel. My assumption was that all the engines started via 'ipcluster mpiexec ..." would already have MPI_Init called, and thus, my extension modules would merely share the global MPI state with the Python interpreter. 
That doesn't seem to be the case, as I either see all my module instances report rank 0, or, if I don't call MPI_Init, get a failure on the first MPI call I do. Can anybody help ? Do I need to initialize MPI myself in my extension module ? Any pointers are highly appreciated. Thanks, Stefan -- ...ich hab' noch einen Koffer in Berlin... From ellisonbg.net at gmail.com Wed Nov 11 13:23:42 2009 From: ellisonbg.net at gmail.com (Brian Granger) Date: Wed, 11 Nov 2009 10:23:42 -0800 Subject: [SciPy-User] Use of MPI in extension modules In-Reply-To: <4AFAEDFB.10808@sympatico.ca> References: <4AFAEDFB.10808@sympatico.ca> Message-ID: <6ce0ac130911111023i2fd49829yd4b0feaa9022cfca@mail.gmail.com> Stefan, This is probably a better topic for the IPython users list: http://mail.scipy.org/mailman/listinfo/ipython-user I'm working on a signal & image processing library that uses MPI internally. I'd like to provide a Python interface to it, so I can > integrate it into SciPy. With 'normal' Python this all works nicely. > Just recently I have started to consider parallelism, i.e. I want to use > the library's internal parallelism, by running it with ipython in parallel. > My assumption was that all the engines started via 'ipcluster mpiexec > ..." would already have MPI_Init called, and thus, my extension modules > would merely share the global MPI state with the Python interpreter. > That doesn't seem to be the case, as I either see all my module > instances report rank 0, or, if I don't call MPI_Init, get a failure on > the first MPI call I do. > > You do need to tell the IPython engine how the should call MPI_Init. The best way of doing this is to install mpi4py and then call ipcluster with the --mpi=mpi4py option. Once you do this, you can simply import your extension module and use it - you won't have to call MPI_Init again. The reason that IPython need to be told how MPI_Init is called is that we try to make sure that the engine ids match the MPI ranks. But, one question. Why not use mip4py for yor MPI calls? If you really need low-level C stuff mpi4py works very well with cython. All that would be much more pleasant than writing low level C/MPI code. The key is that mpi4py handles all the subtleties of the different MPI platforms, and OSs. Doing that yourself is quite painful. Cheers, Brian Can anybody help ? Do I need to initialize MPI myself in my extension > module ? > Any pointers are highly appreciated. > > Thanks, > Stefan > > -- > > ...ich hab' noch einen Koffer in Berlin... > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tmp50 at ukr.net Wed Nov 11 13:28:24 2009 From: tmp50 at ukr.net (Dmitrey) Date: Wed, 11 Nov 2009 20:28:24 +0200 Subject: [SciPy-User] isn't it a bug in scipy.sparse? + some questions In-Reply-To: Message-ID: So, anyone doesn't know answers to the questions about scipy.sparse module mentioned below? As for the bug mentioned, I have installed latest numpy & scipy svn snapshots (1.4.0.dev7726 and 0.8.0.dev6096), the bug still exist. D. --- ???????? ????????? --- ?? ????: "Dmitrey" ????: scipy-user at scipy.org ????: 8 ??????, 13:22:34 ????: [SciPy-User] isn't it a bug in scipy.sparse? + some questions Hi scipy.sparse developers and all other scipy users, I'm trying to take benefits for solving SLEs in FuncDesigner via involving scipy.sparse. 
Some examples are here http://openopt.org/FuncDesignerDoc#Solving_systems_of_linear_equations and example for sparse SLEs is here http://trac.openopt.org/openopt/browser/PythonPackages/FuncDesigner/FuncDesigner/examples/sparseSLE.py It already works faster than using dense matrices, but I want to speedup it even more, so I have some questions and seems like bug report (scipy.__version__ 0.7.0): from scipy import sparse from numpy import * a=sparse.lil_matrix((3,1)) a[0:3,:] = ones(3) print a.todense() #prints [[ 1.] ?[ 0.] ?[ 0.]] while I expect all-ones Questions: 1) Seems like a[some_ind,:]=something works very, very slow for lil. I have implemented a workaround, but can I use a[some_ind,:] for another format than lil? (seems like all other ones doesn't support it). 2) What is current situation with matmat and matvec functions? They say "deprecated" but no alternative is mentioned. 3) What is current situation with scipy.sparse.linalg.spsolve? It says /usr/lib/python2.6/dist-packages/scipy/sparse/linalg/dsolve/linsolve.py:78: DeprecationWarning: scipy.sparse.linalg.dsolve.umfpack will be removed, install scikits.umfpack instead ? ' install scikits.umfpack instead', DeprecationWarning ) But I don't want my code to be dependent on a scikits module. Are there another default/autoselect solver for sparse SLEs? If no, which one would you recommend me to use as default for sparse SLEs - bicg, gmres, something else? Thank you in advance, D. _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Wed Nov 11 14:36:31 2009 From: pav at iki.fi (Pauli Virtanen) Date: Wed, 11 Nov 2009 21:36:31 +0200 Subject: [SciPy-User] isn't it a bug in scipy.sparse? + some questions In-Reply-To: References: Message-ID: <1257968191.4524.2.camel@idol> ke, 2009-11-11 kello 20:28 +0200, Dmitrey kirjoitti: > So, anyone doesn't know answers to the questions about scipy.sparse > module mentioned below? Or the people who know did not have time to immediately answer your question, and forgot about the mail afterwards. If you think it's a bug, please file a ticket in the Trac. Thanks! -- Pauli Virtanen From denis.laxalde at gmail.com Wed Nov 11 14:48:20 2009 From: denis.laxalde at gmail.com (Denis Laxalde) Date: Wed, 11 Nov 2009 14:48:20 -0500 Subject: [SciPy-User] isn't it a bug in scipy.sparse? + some questions In-Reply-To: References: Message-ID: <1257968900.22136.24.camel@157-rome.campus.mcgill.ca> Hi Dmitrey, Le mercredi 11 novembre 2009 ? 20:28 +0200, Dmitrey a ?crit : > So, anyone doesn't know answers to the questions about scipy.sparse > module mentioned below? > As for the bug mentioned, I have installed latest numpy & scipy svn > snapshots (1.4.0.dev7726 and 0.8.0.dev6096), the bug still exist. > D. > > --- ???????? ????????? --- > ?? ????: "Dmitrey" > ????: scipy-user at scipy.org > ????: 8 ??????, 13:22:34 > ????: [SciPy-User] isn't it a bug in scipy.sparse? + some questions > > Hi scipy.sparse developers and all other scipy users, > I'm trying to take benefits for solving SLEs in FuncDesigner > via involving scipy.sparse. 
> Some examples are here > http://openopt.org/FuncDesignerDoc#Solving_systems_of_linear_equations > and example for sparse SLEs is here > http://trac.openopt.org/openopt/browser/PythonPackages/FuncDesigner/FuncDesigner/examples/sparseSLE.py > It already works faster than using dense matrices, but I want > to speedup it even more, so I have some questions and seems > like bug report (scipy.__version__ 0.7.0): > > from scipy import sparse > from numpy import * > a=sparse.lil_matrix((3,1)) > a[0:3,:] = ones(3) > print a.todense() > #prints > [[ 1.] > [ 0.] > [ 0.]] > while I expect all-ones in this case, using: a[0:3,:] = 1 will do what you want. I don't know if it's really a bug. > > Questions: > 1) Seems like a[some_ind,:]=something works very, very slow > for lil. I have implemented a workaround, but can I use > a[some_ind,:] for another format than lil? (seems like all > other ones doesn't support it). >From what I understand, lil format is useful for building matrices terms by terms. As for advanced indexing operations, I guess coo format is more appropriate... -- Denis From sturla at molden.no Wed Nov 11 17:16:43 2009 From: sturla at molden.no (Sturla Molden) Date: Wed, 11 Nov 2009 23:16:43 +0100 Subject: [SciPy-User] Use of MPI in extension modules In-Reply-To: <4AFAEDFB.10808@sympatico.ca> References: <4AFAEDFB.10808@sympatico.ca> Message-ID: <4AFB37CB.7000807@molden.no> Stefan Seefeld skrev: > the library's internal parallelism, by running it with ipython in parallel. > My assumption was that all the engines started via 'ipcluster mpiexec > ..." would already have MPI_Init called, and thus, my extension modules > would merely share the global MPI state with the Python interpreter. I don't know ipython, but I use MPI now and then. You can e.g. spawn 4 processes of an executable using a statement like: $ mpiexec -n 4 executable Each process spawned ny mpiexec must call MPI_Init once and before any other MPI call. The call to MPI_Init is global to the process, it does not matter that Python extensions are DLLs. You need to call MPI_Init exactly once in each MPI-spawned process, and it does not matter how: - using ctypes - in an extension module - in C code embedding a Python interpreter - in a modified Python interpreter If you only get rank 0 reported, it means you spawned just one process. That could happen if you forget to specify how many processes you want in the call to mpiexec. Sturla From seefeld at sympatico.ca Wed Nov 11 17:45:35 2009 From: seefeld at sympatico.ca (Stefan Seefeld) Date: Wed, 11 Nov 2009 17:45:35 -0500 Subject: [SciPy-User] Use of MPI in extension modules In-Reply-To: <6ce0ac130911111023i2fd49829yd4b0feaa9022cfca@mail.gmail.com> References: <4AFAEDFB.10808@sympatico.ca> <6ce0ac130911111023i2fd49829yd4b0feaa9022cfca@mail.gmail.com> Message-ID: <4AFB3E8F.3040009@sympatico.ca> On 11/11/2009 01:23 PM, Brian Granger wrote: > Stefan, > > This is probably a better topic for the IPython users list: > > http://mail.scipy.org/mailman/listinfo/ipython-user Thanks ! I didn't know that actually exists. It doesn't appear to be listed on either http://www.scipy.org or http://ipython.scipy.org, nor on http://www.scipy.org/Mailing_Lists. I'll cross-post there, so we may continue the conversation there, assuming I'm not moderated. > > I'm working on a signal & image processing library that uses MPI > > internally. I'd like to provide a Python interface to it, so I can > integrate it into SciPy. With 'normal' Python this all works nicely. 
> Just recently I have started to consider parallelism, i.e. I want > to use > the library's internal parallelism, by running it with ipython in > parallel. > My assumption was that all the engines started via 'ipcluster mpiexec > ..." would already have MPI_Init called, and thus, my extension > modules > would merely share the global MPI state with the Python interpreter. > That doesn't seem to be the case, as I either see all my module > instances report rank 0, or, if I don't call MPI_Init, get a > failure on > the first MPI call I do. > > > You do need to tell the IPython engine how the should call MPI_Init. > The best way of doing this > is to install mpi4py and then call ipcluster with the --mpi=mpi4py option. > > Once you do this, you can simply import your extension module and use > it - you won't have > to call MPI_Init again. The reason that IPython need to be told how > MPI_Init is called > is that we try to make sure that the engine ids match the MPI ranks. I'm not sure I understand. In fact, I had expected *only* ipython needed to know how to call MPI_Init. The rest of my own (extension) code then merely assumes it has been called with the appropriate arguments (which ultimately come from "mpirun", which itself is invoked by ipcluster, isn't it ? Is that not true ? Is there some documentation that explains the interaction between (i)python (the ipcluster.py module in particular), mpirun, and the ipengine script that the latter then invokes ? May be I can call MPI_Init() myself, if I know the arguments I need to pass along. > > But, one question. Why not use mip4py for yor MPI calls? If you > really need low-level C stuff > mpi4py works very well with cython. All that would be much more > pleasant than writing > low level C/MPI code. The key is that mpi4py handles all the > subtleties of the different MPI > platforms, and OSs. Doing that yourself is quite painful. Well, happily this is already done. :-) (I'm talking about http://www.codesourcery.com/vsiplplusplus) In fact, we have embedded most of the MPI subtleties deeply in our library. Let me (very quickly) outline the idea of our approach: The library provides a set of block types (for vectors, matrices, tensors), which may or may not be distributed. Most of the MPI calls need to be done on assignment, i.e. an equation "A = B" will result in communication if (and only if) A and B are distributed, and their distributions don't match. This programming paradigm is very similar to that used in pMatlab (http://www.ll.mit.edu/pMatlab) So, all of this is already done. I'm now merely interested in adding Python bindings to it. Thanks, Stefan -- ...ich hab' noch einen Koffer in Berlin... From seefeld at sympatico.ca Wed Nov 11 17:48:11 2009 From: seefeld at sympatico.ca (Stefan Seefeld) Date: Wed, 11 Nov 2009 17:48:11 -0500 Subject: [SciPy-User] Use of MPI in extension modules In-Reply-To: <4AFB37CB.7000807@molden.no> References: <4AFAEDFB.10808@sympatico.ca> <4AFB37CB.7000807@molden.no> Message-ID: <4AFB3F2B.8060706@sympatico.ca> On 11/11/2009 05:16 PM, Sturla Molden wrote: > > If you only get rank 0 reported, it means you spawned just one process. > That could happen if you forget to specify how many processes you want > in the call to mpiexec. > I get rank 0 reported in each of the spawned processes, which is caused by each one calling MPI_Init() as if it was the only process. What I need to know is precisely how to call MPI_Init(), i.e. 
what to arguments to pass (either from sys.argv, or some other list where this gets stored when ipython invokes iengine). Thanks, Stefan -- ...ich hab' noch einen Koffer in Berlin... From robert.kern at gmail.com Wed Nov 11 17:52:01 2009 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 11 Nov 2009 16:52:01 -0600 Subject: [SciPy-User] Use of MPI in extension modules In-Reply-To: <4AFB3E8F.3040009@sympatico.ca> References: <4AFAEDFB.10808@sympatico.ca> <6ce0ac130911111023i2fd49829yd4b0feaa9022cfca@mail.gmail.com> <4AFB3E8F.3040009@sympatico.ca> Message-ID: <3d375d730911111452p67fe262t27fd39f3d6024425@mail.gmail.com> On Wed, Nov 11, 2009 at 16:45, Stefan Seefeld wrote: > On 11/11/2009 01:23 PM, Brian Granger wrote: >> Stefan, >> >> This is probably a better topic for the IPython users list: >> >> http://mail.scipy.org/mailman/listinfo/ipython-user > > Thanks ! > I didn't know that actually exists. It doesn't appear to be listed on > either http://www.scipy.org or http://ipython.scipy.org, It's under the section "Mailing Lists" on that page. > nor on > http://www.scipy.org/Mailing_Lists. I'll cross-post there, so we may > continue the conversation there, assuming I'm not moderated. The next time you need to move a thread over, please just start a new thread. Cross-posts have a way of not stopping when you want them to. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From matthew.brett at gmail.com Thu Nov 12 02:51:23 2009 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 11 Nov 2009 23:51:23 -0800 Subject: [SciPy-User] loading mat file in scipy In-Reply-To: <2d5132a50910200925u279cd97av1855069e12e72112@mail.gmail.com> References: <3312.140.105.40.24.1256052796.squirrel@webmail.sissa.it> <3d375d730910200835p65aa28b3i92c72759dbed5f20@mail.gmail.com> <2d5132a50910200925u279cd97av1855069e12e72112@mail.gmail.com> Message-ID: <1e2af89e0911112351l339c015apc8ede71d7751f9fc@mail.gmail.com> Hi, > http://www.robince.net/robince/structs_cells.mat > http://mail.scipy.org/pipermail/scipy-user/2009-April/020860.html > > Make sure you are using a recent version of scipy. I think there was > some performance fixes that improved it - with current scipy SVN on a > macbook pro structs_cells.mat takes about 28s to load > (structs_as_record doesn't seem to make a difference). This is already > some improvement (40s in April, 4 minutes prior to that). On Matlab it > takes about 1.4s. The current code from the git branch that I posted is now running at around 1.5 s to load your file, on a fast machine. Matlab 7.4 is taking around 5s on the same machine. Best, Matthew From robince at gmail.com Thu Nov 12 05:44:44 2009 From: robince at gmail.com (Robin) Date: Thu, 12 Nov 2009 10:44:44 +0000 Subject: [SciPy-User] loading mat file in scipy In-Reply-To: <1e2af89e0911112351l339c015apc8ede71d7751f9fc@mail.gmail.com> References: <3312.140.105.40.24.1256052796.squirrel@webmail.sissa.it> <3d375d730910200835p65aa28b3i92c72759dbed5f20@mail.gmail.com> <2d5132a50910200925u279cd97av1855069e12e72112@mail.gmail.com> <1e2af89e0911112351l339c015apc8ede71d7751f9fc@mail.gmail.com> Message-ID: <2d5132a50911120244p7bcbd1eei496b62531d89f7fa@mail.gmail.com> On Thu, Nov 12, 2009 at 7:51 AM, Matthew Brett wrote: > The current code from the git branch that I posted is now running at > around 1.5 s to load your file, on a fast machine. 
Matlab 7.4 is > taking around 5s on the same machine. Thanks very much, that's a terrific improvement! I have been meaning to test your branch this week, but I wasn't sure if there was a way I could build it without reinstalling my current scipy version so I was waiting for some time to read about how building in place and stuff work - in the end I just moved the current installation sideways and installed your branch as normal. I see a tremendous improvement as well - for me the demo file loads in 2.5s with improved loadmat vs 1.6s for Matlab 7.8. This is on a couple of years old macbook pro. Because I'm building against python 2.5 I'm forced to use apple gcc 4.0 and I think this could account for some of the difference (I read gcc got a lot better in more recent versions). Thanks again, Robin From anderse at gmx.de Thu Nov 12 06:35:30 2009 From: anderse at gmx.de (anderse at gmx.de) Date: Thu, 12 Nov 2009 12:35:30 +0100 Subject: [SciPy-User] Interpolate: Derivatives of parametric splines Message-ID: <20091112113530.279300@gmx.net> Hi, I'd like to get the derivatives of parametric splines. Looking at the tutorial (http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html) I get a spline like this: >>> x = np.arange(0, 2*np.pi + np.pi / 4, 2 * np.pi / 8) >>> y = np.sin(x) >>> tck = interpolate.splrep(x, y, s = 0, k = 5) >>> xnew = np.arange(0, 2 * np.pi, np.pi / 50) >>> ynew = interpolate.splev(xnew, tck, der = 0) now, the derivatives can be determined like this: >>> yder = interpolate.splev(xnew, tck, der = 1) >>> yder2 = interpolate.splev(xnew, tck, der = 2) >>> plt.plot(x, y, 'x', xnew, ynew, xnew, yder, xnew, yder2) The first derivative is about null at pi / 2, the second one at pi, as they should be: >>> interpolate.spalde(np.pi, tck) array([ 0.00000000e+00, -1.00064770e+00, -1.73418916e-17, 1.00726743e+00, -2.65046223e-16, -1.01680119e+00]) >>> interpolate.spalde(np.pi / 2, tck) array([ 1. , -0.00199181, -0.99629386, 0.02365328, 0.90756527, -0.1387468 ]) Of course, the x-range is the same, no matter of der=#. Now the parametric version: >>> tckp, u = interpolate.splprep([x, y], s=0, k=5) >>> u array([ 0. , 0.13941767, 0.25 , 0.36058233, 0.5 , 0.63941767, 0.75 , 0.86058233, 1. ]) so pi is at 0.5, pi/2 is at 0.25. And this is what I get at these 'x' values: >>> interpolate.spalde(0.5, tckp) [array([ 3.14159265e+00, 5.14754151e+00, 1.10395807e-13, 1.69542498e+02, -4.03851332e-11, -2.01255417e+04]), array([ 7.73894012e-16, -5.38240284e+00, -1.31811639e-13, 7.74093936e+01, 5.58012792e-11, 1.89849315e+04])] >>> interpolate.spalde(0.25, tckp) [array([ 1.57079633e+00, 7.44935679e+00, -7.65674781e-02, -1.85343925e+02, 7.51370411e+01, 2.46939899e+04]), array([ 1.00000000e+00, -3.47491248e-01, -5.16420728e+01, 2.05418849e+02, 3.66866738e+03, -5.71113127e+04])] The first array states the x-values, the second one the y-values, respectively, AFAIK. This makes sense without derivatives, and I get a plot using >>> unew = np.arange(0, 1.01, 0.01) >>> out = interpolate.splev(unew, tckp, der = 0) >>> plt.plot(out[0], out[1]) which looks like the one above, but what about the derivatives? >>> der1 = interpolate.splev(unew, tckp, der = 1) >>> der2 = interpolate.splev(unew, tckp, der = 2) >>> plt.plot(der1[0], der1[1], der2[0], der2[1]) dont make sense to me at all. Thank you in advance for your help. Raimund -- GRATIS f?r alle GMX-Mitglieder: Die maxdome Movie-FLAT! 
Jetzt freischalten unter http://portal.gmx.net/de/go/maxdome01 From devicerandom at gmail.com Thu Nov 12 06:35:50 2009 From: devicerandom at gmail.com (ms) Date: Thu, 12 Nov 2009 11:35:50 +0000 Subject: [SciPy-User] ODR fitting several equations to the same parameters In-Reply-To: <4AFAEE8E.9080000@gmail.com> References: <4AFAE5AF.3020506@gmail.com> <4AFAEE8E.9080000@gmail.com> Message-ID: <4AFBF316.30507@gmail.com> Hi Bruce, Thanks for your reply but there are several things I don't really grasp: Bruce Southey ha scritto: > On 11/11/2009 10:26 AM, ms wrote: >> Let's start with a simple example. Imagine I have several linear data >> sets y=ax+b which have different b (all of them are known) but that >> should fit to the same (unknown) a. To have my best estimate of a, I >> would want to fit them all together. In this case it is trivial, you >> just subtract the known b from the data set and fit them all at the same >> time. >> > Although b is known without error you still have potentially effects due > to each data set. > > What I would do is fit: > y= mu + dataset + a*x + dataset*a*x > > Where mu is some overall mean, Mean of what? The b's? > dataset is the effect of the ith dataset - allows different intercepts > for each data set > dataset*a is the interaction between a and the dataset - allows > different slopes for each dataset. I don't really understand what quantities you mean by "effect" and "interaction", and why should I want to allow different slopes for each dataset -the aim to fit one and only one slope from all datasets. > Obviously you first test that interaction is zero. In theory, the > difference between the solutions of dataset should equate to the > differences between the known b's. ...same as above... > Now you just expand your linear model to nonlinear one. The formulation > depends on your equation. But really you just replace f(a*x) with > f(a*x+dataset*a*x). > > So I first try with a linear model before a nonlinear. Also I would see > if I could linearize the non-linear function. Well, the function is for sure non linear (it has a sigmoidal shape). To linearize it is a good idea but I am doubtful it is doable. Thanks! m. From gnurser at googlemail.com Thu Nov 12 08:07:15 2009 From: gnurser at googlemail.com (George Nurser) Date: Thu, 12 Nov 2009 13:07:15 +0000 Subject: [SciPy-User] vectorplot scikit code patch Message-ID: <1d1e6ea70911120507q621c8269h5d5b0c3f934a1377@mail.gmail.com> Hi, Not sure where to post this. The lic_demo.py and lic_efield_demo.py scripts in the vectorplot scikit fail for me with Traceback (most recent call last): File "lic_efield_demo.py", line 55, in plt.figimage(image) File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/matplotlib/pyplot.py", line 404, in figimage sci(ret) File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/matplotlib/pyplot.py", line 160, in sci gca()._sci(im) File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/matplotlib/axes.py", line 1338, in _sci "Argument must be an image, collection, or ContourSet in this Axes") ValueError: Argument must be an image, collection, or ContourSet in this Axes I got both scripts to work by in each of them replacing plt.clf() plt.axis('off') plt.figimage(image) by fig = plt.figure() plt.clf() plt.axis('off') fig.figimage(image) --George. 
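For reference, George's change written out as a runnable sketch. The `image` array below is only a random stand-in for the LIC output that lic_demo.py / lic_efield_demo.py compute from the vector field; everything else follows his description and the traceback above.

import numpy as np
import matplotlib.pyplot as plt

image = np.random.random((300, 300))   # stand-in for the LIC result

# Original demo code -- fails because pyplot.figimage() calls sci(), which
# tries to register the figure image with the current Axes:
#   plt.clf()
#   plt.axis('off')
#   plt.figimage(image)

# Workaround: create the figure explicitly and call figimage on it directly.
fig = plt.figure()
plt.clf()
plt.axis('off')
fig.figimage(image)
plt.show()
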
From zachary.pincus at yale.edu Thu Nov 12 08:19:58 2009 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Thu, 12 Nov 2009 08:19:58 -0500 Subject: [SciPy-User] Interpolate: Derivatives of parametric splines In-Reply-To: <20091112113530.279300@gmx.net> References: <20091112113530.279300@gmx.net> Message-ID: Without thinking deeply about this at all, aren't the derivatives of a parametric spline [x(p), y(p)] given as dx/dp and dy/dp, not the dx/dy that you are perhaps expecting? On Nov 12, 2009, at 6:35 AM, anderse at gmx.de wrote: > Hi, > > I'd like to get the derivatives of parametric splines. > Looking at the tutorial (http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html > ) > I get a spline like this: > >>>> x = np.arange(0, 2*np.pi + np.pi / 4, 2 * np.pi / 8) >>>> y = np.sin(x) >>>> tck = interpolate.splrep(x, y, s = 0, k = 5) >>>> xnew = np.arange(0, 2 * np.pi, np.pi / 50) >>>> ynew = interpolate.splev(xnew, tck, der = 0) > > now, the derivatives can be determined like this: > >>>> yder = interpolate.splev(xnew, tck, der = 1) >>>> yder2 = interpolate.splev(xnew, tck, der = 2) > >>>> plt.plot(x, y, 'x', xnew, ynew, xnew, yder, xnew, yder2) > > The first derivative is about null at pi / 2, > the second one at pi, as they should be: > >>>> interpolate.spalde(np.pi, tck) > array([ 0.00000000e+00, -1.00064770e+00, -1.73418916e-17, > 1.00726743e+00, -2.65046223e-16, -1.01680119e+00]) > >>>> interpolate.spalde(np.pi / 2, tck) > array([ 1. , -0.00199181, -0.99629386, 0.02365328, > 0.90756527, > -0.1387468 ]) > > Of course, the x-range is the same, no matter of der=#. > > Now the parametric version: > >>>> tckp, u = interpolate.splprep([x, y], s=0, k=5) >>>> u > array([ 0. , 0.13941767, 0.25 , 0.36058233, > 0.5 , > 0.63941767, 0.75 , 0.86058233, 1. ]) > > so pi is at 0.5, pi/2 is at 0.25. > > And this is what I get at these 'x' values: > >>>> interpolate.spalde(0.5, tckp) > [array([ 3.14159265e+00, 5.14754151e+00, 1.10395807e-13, > 1.69542498e+02, -4.03851332e-11, -2.01255417e+04]), > array([ 7.73894012e-16, -5.38240284e+00, -1.31811639e-13, > 7.74093936e+01, 5.58012792e-11, 1.89849315e+04])] > >>>> interpolate.spalde(0.25, tckp) > [array([ 1.57079633e+00, 7.44935679e+00, -7.65674781e-02, > -1.85343925e+02, 7.51370411e+01, 2.46939899e+04]), > array([ 1.00000000e+00, -3.47491248e-01, -5.16420728e+01, > 2.05418849e+02, 3.66866738e+03, -5.71113127e+04])] > > The first array states the x-values, the second one the y-values, > respectively, AFAIK. > This makes sense without derivatives, and I get a plot using > >>>> unew = np.arange(0, 1.01, 0.01) >>>> out = interpolate.splev(unew, tckp, der = 0) >>>> plt.plot(out[0], out[1]) > > which looks like the one above, but what about the derivatives? > >>>> der1 = interpolate.splev(unew, tckp, der = 1) >>>> der2 = interpolate.splev(unew, tckp, der = 2) >>>> plt.plot(der1[0], der1[1], der2[0], der2[1]) > > dont make sense to me at all. > > Thank you in advance for your help. > > Raimund > > -- > GRATIS f?r alle GMX-Mitglieder: Die maxdome Movie-FLAT! 
> Jetzt freischalten unter http://portal.gmx.net/de/go/maxdome01 > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From anderse at gmx.de Thu Nov 12 08:44:56 2009 From: anderse at gmx.de (Raimund Andersen) Date: Thu, 12 Nov 2009 14:44:56 +0100 Subject: [SciPy-User] Interpolate: Derivatives of parametric splines In-Reply-To: References: <20091112113530.279300@gmx.net> Message-ID: <20091112134456.279300@gmx.net> Hello Zachary Pincus, thanks for your answer. Maybe I didn't get you right. The first derivative at pi/2 should be 0 ( cos(pi/2) ). What I get from interpolate.spalde(0.25, tckp) is 7.44935679e+00 and -3.47491248e-01. Now, how do I get to 0? Why those different 'x' values at all? It should be always 1.57079633e+00, no? -------- Original-Nachricht -------- > Datum: Thu, 12 Nov 2009 08:19:58 -0500 > Von: Zachary Pincus > An: SciPy Users List > Betreff: Re: [SciPy-User] Interpolate: Derivatives of parametric splines > Without thinking deeply about this at all, aren't the derivatives of a > parametric spline [x(p), y(p)] given as dx/dp and dy/dp, not the dx/dy > that you are perhaps expecting? > > > On Nov 12, 2009, at 6:35 AM, anderse at gmx.de wrote: > > > Hi, > > > > I'd like to get the derivatives of parametric splines. > > Looking at the tutorial > (http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html > > ) > > I get a spline like this: > > > >>>> x = np.arange(0, 2*np.pi + np.pi / 4, 2 * np.pi / 8) > >>>> y = np.sin(x) > >>>> tck = interpolate.splrep(x, y, s = 0, k = 5) > >>>> xnew = np.arange(0, 2 * np.pi, np.pi / 50) > >>>> ynew = interpolate.splev(xnew, tck, der = 0) > > > > now, the derivatives can be determined like this: > > > >>>> yder = interpolate.splev(xnew, tck, der = 1) > >>>> yder2 = interpolate.splev(xnew, tck, der = 2) > > > >>>> plt.plot(x, y, 'x', xnew, ynew, xnew, yder, xnew, yder2) > > > > The first derivative is about null at pi / 2, > > the second one at pi, as they should be: > > > >>>> interpolate.spalde(np.pi, tck) > > array([ 0.00000000e+00, -1.00064770e+00, -1.73418916e-17, > > 1.00726743e+00, -2.65046223e-16, -1.01680119e+00]) > > > >>>> interpolate.spalde(np.pi / 2, tck) > > array([ 1. , -0.00199181, -0.99629386, 0.02365328, > > 0.90756527, > > -0.1387468 ]) > > > > Of course, the x-range is the same, no matter of der=#. > > > > Now the parametric version: > > > >>>> tckp, u = interpolate.splprep([x, y], s=0, k=5) > >>>> u > > array([ 0. , 0.13941767, 0.25 , 0.36058233, > > 0.5 , > > 0.63941767, 0.75 , 0.86058233, 1. ]) > > > > so pi is at 0.5, pi/2 is at 0.25. > > > > And this is what I get at these 'x' values: > > > >>>> interpolate.spalde(0.5, tckp) > > [array([ 3.14159265e+00, 5.14754151e+00, 1.10395807e-13, > > 1.69542498e+02, -4.03851332e-11, -2.01255417e+04]), > > array([ 7.73894012e-16, -5.38240284e+00, -1.31811639e-13, > > 7.74093936e+01, 5.58012792e-11, 1.89849315e+04])] > > > >>>> interpolate.spalde(0.25, tckp) > > [array([ 1.57079633e+00, 7.44935679e+00, -7.65674781e-02, > > -1.85343925e+02, 7.51370411e+01, 2.46939899e+04]), > > array([ 1.00000000e+00, -3.47491248e-01, -5.16420728e+01, > > 2.05418849e+02, 3.66866738e+03, -5.71113127e+04])] > > > > The first array states the x-values, the second one the y-values, > > respectively, AFAIK. 
> > This makes sense without derivatives, and I get a plot using > > > >>>> unew = np.arange(0, 1.01, 0.01) > >>>> out = interpolate.splev(unew, tckp, der = 0) > >>>> plt.plot(out[0], out[1]) > > > > which looks like the one above, but what about the derivatives? > > > >>>> der1 = interpolate.splev(unew, tckp, der = 1) > >>>> der2 = interpolate.splev(unew, tckp, der = 2) > >>>> plt.plot(der1[0], der1[1], der2[0], der2[1]) > > > > dont make sense to me at all. > > > > Thank you in advance for your help. > > > > Raimund > > > > -- > > GRATIS f?r alle GMX-Mitglieder: Die maxdome Movie-FLAT! > > Jetzt freischalten unter http://portal.gmx.net/de/go/maxdome01 > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user -- GRATIS f?r alle GMX-Mitglieder: Die maxdome Movie-FLAT! Jetzt freischalten unter http://portal.gmx.net/de/go/maxdome01 From stefan at sun.ac.za Thu Nov 12 09:08:38 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Thu, 12 Nov 2009 16:08:38 +0200 Subject: [SciPy-User] ANN: scikits.image v0.2 In-Reply-To: <9457e7c80911120603h6863532cy99499a352c2f6fca@mail.gmail.com> References: <9457e7c80911120603h6863532cy99499a352c2f6fca@mail.gmail.com> Message-ID: <9457e7c80911120608i1343d61an9fd02fb82707e668@mail.gmail.com> I'm glad to announce the second release of `scikits.image`, a collection of image processing routines for SciPy. On top of bug-fixes and improved documentation, the following changes/additions were made: - A new IO plugin infrastructure so that commands like 'imshow' are available via multiple backends (PIL, matplotlib, QT4, etc.) - ImageCollections (for cached loading of multiple images) and MultiImage (for working with multi-layered images) - More complete OpenCV wrappers - A graphical image viewer (also installed as a script `scivi`), that allows colour adjustments - Shortest path algorithm For version 0.3, we aim to - Incorporate some of the code offered by the Broad institute - Implement acquisition (grabbing images from cameras) and intrinsic camera calibration - Add real time video and camera display with processing - Improve filtering code - Add morphological operations More information is available at: http://stefanv.github.com/scikits.image/ Regards St?fan From lpc at cmu.edu Thu Nov 12 09:37:36 2009 From: lpc at cmu.edu (Luis Pedro Coelho) Date: Thu, 12 Nov 2009 09:37:36 -0500 Subject: [SciPy-User] Distributed computing: running embarrassingly parallel (python/c++) codes over a cluster Message-ID: <200911120937.36690.lpc@cmu.edu> Rohit Garg wrote: > I have an embarrassingly parallel problem, very nicely suited to > parallelization. I have lots of those :) > My only constraint is that it should be able to run a python extension > (c++) with minimum of fuss. I want to minimize the headaches involved > with setting up/writing the boilerplate code. Which > framework/approach/library would you recommend? My own: It's called jug. 
See http://luispedro.org/software/jug ( Or download the code from github: http://github.com/luispedro/jug ) * It works with any set of processors that can either share a filesystem (plays well with NFS, but can be slow) or a connection to a redis database (which is very easy to set up and is probably as fast as any other approach if everyone is on the same processor). A major advantage is that you write mostly Python (and not something funny looking). For example, here's what a programme with that framework would look like: @TaskGenerator def preprocess(input): ... @TaskGenerator def compute(input, param): ... @TaskGenerator def collect(inputs): ... results = [] for input in glob('*.in'): intermediate = preprocess(input) results.append(compute(intermediate, param)) final = collect(results) The only step that's different w.r.t. to the linear version is adding the TaskGenerator decorator, which changes a call of preprocess(input) into Task(preprocess, input). Jug handles everything else. I have been using this now for almost year for all my research work and it works very well for me. HTH, Luis -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 197 bytes Desc: This is a digitally signed message part. URL: From josef.pktd at gmail.com Thu Nov 12 09:44:12 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 12 Nov 2009 09:44:12 -0500 Subject: [SciPy-User] ODR fitting several equations to the same parameters In-Reply-To: <4AFAE5AF.3020506@gmail.com> References: <4AFAE5AF.3020506@gmail.com> Message-ID: <1cd32cbb0911120644o45d5c2a2pbb8763f1728cf3bd@mail.gmail.com> On Wed, Nov 11, 2009 at 11:26 AM, ms wrote: > Hi, > > Probably it is a noobish question, but statistics is still not my cup of > tea as I'd like it to be. :) > > Let's start with a simple example. Imagine I have several linear data > sets y=ax+b which have different b (all of them are known) but that > should fit to the same (unknown) a. To have my best estimate of a, I > would want to fit them all together. In this case it is trivial, you > just subtract the known b from the data set and fit them all at the same > time. > > In my case it is a bit different, in the sense that I have to do > conceptually the same thing but for a highly non-linear equation where > the equivalent of "b" above is not so simple to separate. I wonder > therefore if there is a way to do a simultaneous fit of different > equations differing only in the known parameters and having a single > output, possibly with the help of ODR. Is this possible? And/or what > should be the best thing to do, in general, for this kind of problems? I don't know enough about ODR, but for least squares, optimize.leastsq or curve_fit, it seems you can just substitute any known parameters into your equation. y_i = f(x_i, a, b_i) for each group i plug in values for all b_i, gives reduced f(x_i, a) independent of specific parameters stack equations [y_i for all i] and [f(..) for all i] If you fit this in curve_fit you could also choose the weights, in case the error variance differs by groups. Does this work or am I missing the point? Josef > Many thanks, > M. 
> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From devicerandom at gmail.com Thu Nov 12 10:04:39 2009 From: devicerandom at gmail.com (ms) Date: Thu, 12 Nov 2009 15:04:39 +0000 Subject: [SciPy-User] ODR fitting several equations to the same parameters In-Reply-To: <1cd32cbb0911120644o45d5c2a2pbb8763f1728cf3bd@mail.gmail.com> References: <4AFAE5AF.3020506@gmail.com> <1cd32cbb0911120644o45d5c2a2pbb8763f1728cf3bd@mail.gmail.com> Message-ID: <4AFC2407.1020902@gmail.com> josef.pktd at gmail.com ha scritto: > On Wed, Nov 11, 2009 at 11:26 AM, ms wrote: >> Let's start with a simple example. Imagine I have several linear data >> sets y=ax+b which have different b (all of them are known) but that >> should fit to the same (unknown) a. To have my best estimate of a, I >> would want to fit them all together. In this case it is trivial, you >> just subtract the known b from the data set and fit them all at the same >> time. >> >> In my case it is a bit different, in the sense that I have to do >> conceptually the same thing but for a highly non-linear equation where >> the equivalent of "b" above is not so simple to separate. I wonder >> therefore if there is a way to do a simultaneous fit of different >> equations differing only in the known parameters and having a single >> output, possibly with the help of ODR. Is this possible? And/or what >> should be the best thing to do, in general, for this kind of problems? > > I don't know enough about ODR, but for least squares, optimize.leastsq > or curve_fit, it seems you can just substitute any known parameters > into your equation. > > y_i = f(x_i, a, b_i) for each group i > plug in values for all b_i, gives reduced f(x_i, a) independent of > specific parameters > stack equations [y_i for all i] and [f(..) for all i] > > If you fit this in curve_fit you could also choose the weights, in > case the error variance differs by groups. > > Does this work or am I missing the point? Probably it's me missing it. Do you just mean to fit them all together separately and then make a weighted average of the fitted parameters, and using the standard deviation of the mean as the error of the fit? I am confused. sorry, m. From sccolbert at gmail.com Thu Nov 12 10:23:53 2009 From: sccolbert at gmail.com (Chris Colbert) Date: Thu, 12 Nov 2009 16:23:53 +0100 Subject: [SciPy-User] Edge Detection In-Reply-To: <4A698D9C-7FC0-4FE4-BA8D-7628EF3AAE71@yale.edu> References: <15e4667e0911100837q5dc003d6re78e6e66ba51972@mail.gmail.com> <4A698D9C-7FC0-4FE4-BA8D-7628EF3AAE71@yale.edu> Message-ID: <7f014ea60911120723j26027a0ascec6d50c977a3add@mail.gmail.com> All of the OpenCV edge detection routines are also available in scikits.image if you have opencv (>= 2.0) installed. On Tue, Nov 10, 2009 at 5:48 PM, Zachary Pincus wrote: > References: Start around just looking at the top google hits for "image > processing edge detection" -- that should be a pretty good start. Also, > google any unfamiliar terms below... I really find that there's a ton of > good basic image-processing information available online. > > Code: Look at what's available in scipy.ndimage. There are functions for > getting gradient magnitudes, as well as standard filters like Sobel etc. > (which you'll learn about from the above), plus morphological operators for > modifying binarized image regions (e.g. 
like erosion etc.; useful for > getting rid of stray noise-induced edges), plus some basic functions for > image smoothing like median filters, etc. > > For exploratory analysis, you might want some ability to interactively > visualize images; you could use matplotlib or the imaging scikit, which is > still pre-release but making fast progress: > http://github.com/stefanv/scikits.image > > I've attached basic code for Canny edge detection, which should demonstrate > a bit about how ndimage works, plus it's useful in its own right. There is > also some code floating around for anisotropic diffusion and bilateral > filtering, which are two noise-reduction methods that can be better than > simple median filtering. > > Zach > > > > On Nov 10, 2009, at 11:37 AM, Dan Yamins wrote: > >> Hi, >> >> I'm looking into using SciPy for a couple of edge-detection problems, >> involving detection of edges in images of text (in simple, clean fonts). >> If someone on this list could point me to a relevant resource / function, >> that would be excellent. ? (I have essentially no background in image >> processing, but am reasonably comfortable mathematically, and I would be >> happy to dive into something fairly technical.) >> >> thanks, >> Dan >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From warren.weckesser at enthought.com Thu Nov 12 10:27:47 2009 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Thu, 12 Nov 2009 09:27:47 -0600 Subject: [SciPy-User] Interpolate: Derivatives of parametric splines In-Reply-To: <20091112134456.279300@gmx.net> References: <20091112113530.279300@gmx.net> <20091112134456.279300@gmx.net> Message-ID: <4AFC2973.6080800@enthought.com> Raimund, When you interpolate the curve using splprep, you get a spline representation of the parameterized curve (x(u), y(u)). As Zachary pointed out, the derivative values returned by spalde are the derivatives of x and y with respect to u. To get dy/dx, you can compute dy/dx = y'(u)/x'(u). This bit of code shows an example: ---------- import numpy as np from scipy import interpolate numpoints = 20 x = np.linspace(0, 2*np.pi, numpoints) y = np.sin(x) tckp, u = interpolate.splprep([x, y], s=0, k=5) u0 = 0.25 ders = interpolate.spalde(u0, tckp) x = ders[0][0] y = ders[1][0] dxdu = ders[0][1] dydu = ders[1][1] dydx = dydu / dxdu print "u =", u0, ": x =", x, " y =", y, " dy/dx = ", dydx ---------- Warren Raimund Andersen wrote: > Hello Zachary Pincus, > > thanks for your answer. Maybe I didn't get you right. > The first derivative at pi/2 should be 0 ( cos(pi/2) ). > What I get from interpolate.spalde(0.25, tckp) is > > 7.44935679e+00 and -3.47491248e-01. > > Now, how do I get to 0? Why those different 'x' values at all? > It should be always 1.57079633e+00, no? > > > -------- Original-Nachricht -------- > >> Datum: Thu, 12 Nov 2009 08:19:58 -0500 >> Von: Zachary Pincus >> An: SciPy Users List >> Betreff: Re: [SciPy-User] Interpolate: Derivatives of parametric splines >> > > >> Without thinking deeply about this at all, aren't the derivatives of a >> parametric spline [x(p), y(p)] given as dx/dp and dy/dp, not the dx/dy >> that you are perhaps expecting? 
>> >> >> On Nov 12, 2009, at 6:35 AM, anderse at gmx.de wrote: >> >> >>> Hi, >>> >>> I'd like to get the derivatives of parametric splines. >>> Looking at the tutorial >>> >> (http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html >> >>> ) >>> I get a spline like this: >>> >>> >>>>>> x = np.arange(0, 2*np.pi + np.pi / 4, 2 * np.pi / 8) >>>>>> y = np.sin(x) >>>>>> tck = interpolate.splrep(x, y, s = 0, k = 5) >>>>>> xnew = np.arange(0, 2 * np.pi, np.pi / 50) >>>>>> ynew = interpolate.splev(xnew, tck, der = 0) >>>>>> >>> now, the derivatives can be determined like this: >>> >>> >>>>>> yder = interpolate.splev(xnew, tck, der = 1) >>>>>> yder2 = interpolate.splev(xnew, tck, der = 2) >>>>>> >>>>>> plt.plot(x, y, 'x', xnew, ynew, xnew, yder, xnew, yder2) >>>>>> >>> The first derivative is about null at pi / 2, >>> the second one at pi, as they should be: >>> >>> >>>>>> interpolate.spalde(np.pi, tck) >>>>>> >>> array([ 0.00000000e+00, -1.00064770e+00, -1.73418916e-17, >>> 1.00726743e+00, -2.65046223e-16, -1.01680119e+00]) >>> >>> >>>>>> interpolate.spalde(np.pi / 2, tck) >>>>>> >>> array([ 1. , -0.00199181, -0.99629386, 0.02365328, >>> 0.90756527, >>> -0.1387468 ]) >>> >>> Of course, the x-range is the same, no matter of der=#. >>> >>> Now the parametric version: >>> >>> >>>>>> tckp, u = interpolate.splprep([x, y], s=0, k=5) >>>>>> u >>>>>> >>> array([ 0. , 0.13941767, 0.25 , 0.36058233, >>> 0.5 , >>> 0.63941767, 0.75 , 0.86058233, 1. ]) >>> >>> so pi is at 0.5, pi/2 is at 0.25. >>> >>> And this is what I get at these 'x' values: >>> >>> >>>>>> interpolate.spalde(0.5, tckp) >>>>>> >>> [array([ 3.14159265e+00, 5.14754151e+00, 1.10395807e-13, >>> 1.69542498e+02, -4.03851332e-11, -2.01255417e+04]), >>> array([ 7.73894012e-16, -5.38240284e+00, -1.31811639e-13, >>> 7.74093936e+01, 5.58012792e-11, 1.89849315e+04])] >>> >>> >>>>>> interpolate.spalde(0.25, tckp) >>>>>> >>> [array([ 1.57079633e+00, 7.44935679e+00, -7.65674781e-02, >>> -1.85343925e+02, 7.51370411e+01, 2.46939899e+04]), >>> array([ 1.00000000e+00, -3.47491248e-01, -5.16420728e+01, >>> 2.05418849e+02, 3.66866738e+03, -5.71113127e+04])] >>> >>> The first array states the x-values, the second one the y-values, >>> respectively, AFAIK. >>> This makes sense without derivatives, and I get a plot using >>> >>> >>>>>> unew = np.arange(0, 1.01, 0.01) >>>>>> out = interpolate.splev(unew, tckp, der = 0) >>>>>> plt.plot(out[0], out[1]) >>>>>> >>> which looks like the one above, but what about the derivatives? >>> >>> >>>>>> der1 = interpolate.splev(unew, tckp, der = 1) >>>>>> der2 = interpolate.splev(unew, tckp, der = 2) >>>>>> plt.plot(der1[0], der1[1], der2[0], der2[1]) >>>>>> >>> dont make sense to me at all. >>> >>> Thank you in advance for your help. >>> >>> Raimund >>> >>> -- >>> GRATIS f?r alle GMX-Mitglieder: Die maxdome Movie-FLAT! 
>>> Jetzt freischalten unter http://portal.gmx.net/de/go/maxdome01 >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > From zachary.pincus at yale.edu Thu Nov 12 10:31:14 2009 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Thu, 12 Nov 2009 10:31:14 -0500 Subject: [SciPy-User] Interpolate: Derivatives of parametric splines In-Reply-To: <20091112134456.279300@gmx.net> References: <20091112113530.279300@gmx.net> <20091112134456.279300@gmx.net> Message-ID: <5E5419A2-84AE-44A5-8A4C-1C96D4857CFD@yale.edu> Hi, > thanks for your answer. Maybe I didn't get you right. > The first derivative at pi/2 should be 0 ( cos(pi/2) ). > What I get from interpolate.spalde(0.25, tckp) is > > 7.44935679e+00 and -3.47491248e-01. The first value is dx/du at 0.25. If you look at der1[0] (e.g. dx/du), you'll see it's basically constant, which is what you expect since x and u are linear with one another. > Why those different 'x' values at all? > It should be always 1.57079633e+00, no? I don't know why you think dx/du ought to be pi/2: x goes from 0 to 2pi while u goes from 0 to 1, therefore the slope of the line x(u) is 2pi; thus dx/du ought to be 2pi as well. Which it is, more or less, except for endpoint effects. These effects are more pronounced with parametric splines since, basically, there's more degrees of freedom for what the spline can do beyond the range of the input data. (Check out how the spline goes beyond the endpoints of your original data -- the parametric spline goes nuts, because, essentially, dx/du isn't fixed at a constant, unlike in the nonparametric spline case. When fitting a function with fewer constraints, it should not be a surprise that the fit is worse.) Now, the second value you show above (-3.47491248e-01) is dy/du at 0.25. Because dx/du is ~constant, dy/du should have the same zeros as dy/dx. Now, -0.35 isn't exactly zero, but if you look at the plot of der1[1], you'll see that der1[1] does have a zero pretty close to 0.25 that point. So again, you're getting more or less the expected result, especially given that a parametric spline fit with all those extra degrees of freedom just won't fit a function y(x) as well as the nonparametric spline designed for fitting functions of the form y(x). Make sense? By the way, if I were to try to evaluate a periodic function with a spline, I'd use interpolate.splrep with per=1. And if I had a periodic parametric function (e.g. a closed plane curve), use splprep with per=1. Periodic functions are the only case where endpoint effects can be completely banished with spline fitting. Otherwise endpoints effects are just par for the course with non-periodic spline fits, and are, as above, more troublesome in the parametric case because there are even more degrees of freedom. Feed splrep more data and you'll get better results because there are more constraints. Alternately, use a lower-order spline -- which are less prone to "ringing" artifacts when under-constrained -- to get better results with sparser data. (Not also below my use of numpy.linspace, which is far easier than arange for the sort of things you're needing.) 
# Not much data, high order spline In [119]: x = np.linspace(0, 2*np.pi, 9) In [120]: y = np.sin(x) In [121]: tckp, u = interpolate.splprep([x, y], s=0, k=5) In [122]: interpolate.spalde(0.25, tckp)[1] array([ 1.00000000e+00, -3.47491248e-01, -5.16420728e+01, 2.05418849e+02, 3.66866738e+03, -5.71113127e+04])] # More data, high order spline In [123]: x = np.linspace(0, 2*np.pi, 20) In [124]: y = np.sin(x) In [125]: tckp, u = interpolate.splprep([x, y], s=0, k=5) In [126]: interpolate.spalde(0.25, tckp)[1] array([ 9.99945859e-01, 4.44888721e-03, -5.85188610e+01, -3.32328600e+01, 1.61037915e+04, 2.86354557e+05])] # Not much data, but lower order spline In [127]: x = np.linspace(0, 2*np.pi, 9) In [128]: y = np.sin(x) In [129]: tckp, u = interpolate.splprep([x, y], s=0, k=3) In [130]: interpolate.spalde(0.25, tckp)[1] array([ 1.00000000e+00, -6.93498869e-02, -6.23269054e+01, 4.25319565e+02])] Zach From bsouthey at gmail.com Thu Nov 12 10:45:34 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 12 Nov 2009 09:45:34 -0600 Subject: [SciPy-User] ODR fitting several equations to the same parameters In-Reply-To: <4AFBF316.30507@gmail.com> References: <4AFAE5AF.3020506@gmail.com> <4AFAEE8E.9080000@gmail.com> <4AFBF316.30507@gmail.com> Message-ID: <4AFC2D9E.2060302@gmail.com> On 11/12/2009 05:35 AM, ms wrote: > Hi Bruce, > > Thanks for your reply but there are several things I don't really grasp: > > Bruce Southey ha scritto: > >> On 11/11/2009 10:26 AM, ms wrote: >> >>> Let's start with a simple example. Imagine I have several linear data >>> sets y=ax+b which have different b (all of them are known) but that >>> should fit to the same (unknown) a. To have my best estimate of a, I >>> would want to fit them all together. In this case it is trivial, you >>> just subtract the known b from the data set and fit them all at the same >>> time. >>> >>> >> Although b is known without error you still have potentially effects due >> to each data set. >> >> What I would do is fit: >> y= mu + dataset + a*x + dataset*a*x >> >> Where mu is some overall mean, >> > Mean of what? The b's? > Depending on what your terms are, y=a*x +b can be viewed is a simple linear regression then b is an intercept and a is a slope. Under a different view (typically general linear modeling), b can be a factor or class variable where 'b' can have multiple levels. As in the model above, this is analysis of covariance. You can get your estimate of 'b' for each data set as mu plus the appropriate solution of dataset. (While you can parameterize the model as y= dataset + ..., it is not as easy to interpret as the one using mu.) The reason for using this type of model is that you can quantify the variation between the data sets. >> dataset is the effect of the ith dataset - allows different intercepts >> for each data set >> dataset*a is the interaction between a and the dataset - allows >> different slopes for each dataset. >> > I don't really understand what quantities you mean by "effect" and > "interaction", and why should I want to allow different slopes for each > dataset -the aim to fit one and only one slope from all datasets. > The reason is that you can test that the slopes are the same and see if any data sets appear unusual. If the slopes are the same then you are back to what you wanted to know. Otherwise, you need to address why one or more data sets are different from the others. >> Obviously you first test that interaction is zero. 
In theory, the >> difference between the solutions of dataset should equate to the >> differences between the known b's. >> > ...same as above... > > >> Now you just expand your linear model to nonlinear one. The formulation >> depends on your equation. But really you just replace f(a*x) with >> f(a*x+dataset*a*x). >> >> So I first try with a linear model before a nonlinear. Also I would see >> if I could linearize the non-linear function. >> > Well, the function is for sure non linear (it has a sigmoidal shape). To > linearize it is a good idea but I am doubtful it is doable. > > Thanks! > > m. > Again it depends on the function because some of these do have linearized forms or can be well approximated by a linear model. Bruce From devicerandom at gmail.com Thu Nov 12 10:46:43 2009 From: devicerandom at gmail.com (ms) Date: Thu, 12 Nov 2009 15:46:43 +0000 Subject: [SciPy-User] ODR fitting several equations to the same parameters In-Reply-To: <4AFC2D9E.2060302@gmail.com> References: <4AFAE5AF.3020506@gmail.com> <4AFAEE8E.9080000@gmail.com> <4AFBF316.30507@gmail.com> <4AFC2D9E.2060302@gmail.com> Message-ID: <4AFC2DE3.1040908@gmail.com> Bruce Southey ha scritto: > On 11/12/2009 05:35 AM, ms wrote: >>> Although b is known without error you still have potentially effects due >>> to each data set. >>> >>> What I would do is fit: >>> y= mu + dataset + a*x + dataset*a*x >>> >>> Where mu is some overall mean, >>> >> Mean of what? The b's? >> > Depending on what your terms are, y=a*x +b can be viewed is a simple > linear regression then b is an intercept and a is a slope. Under a > different view (typically general linear modeling), b can be a factor or > class variable where 'b' can have multiple levels. As in the model > above, this is analysis of covariance. You can get your estimate of 'b' > for each data set as mu plus the appropriate solution of dataset. (While > you can parameterize the model as y= dataset + ..., it is not as easy to > interpret as the one using mu.) > > The reason for using this type of model is that you can quantify the > variation between the data sets. This sounds interesting, but my problem is much more mundane: what are the "mu" or "dataset" quantities? >>> dataset is the effect of the ith dataset - allows different intercepts >>> for each data set >>> dataset*a is the interaction between a and the dataset - allows >>> different slopes for each dataset. >>> >> I don't really understand what quantities you mean by "effect" and >> "interaction", and why should I want to allow different slopes for each >> dataset -the aim to fit one and only one slope from all datasets. >> > The reason is that you can test that the slopes are the same and see if > any data sets appear unusual. If the slopes are the same then you are > back to what you wanted to know. Otherwise, you need to address why one > or more data sets are different from the others. Agree. My point is however that the data sets fitting to the same slope is an *assumption* that I have to make. Of course checking it is a good idea, but again, I don't know what are the mathematical definitions of the quantities you are talking about. > Again it depends on the function because some of these do have > linearized forms or can be well approximated by a linear model. A linear model cannot do it for sure, and I don't think it can be linearized. thanks... m. 
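To make Bruce's formulation concrete: with dummy (0/1) variables the model y = mu + dataset + a*x + dataset*a*x is just an ordinary least-squares problem, and the "interaction" columns are the per-dataset slope deviations, which should come out near zero if a single slope really fits all sets. A minimal sketch, not from the thread -- the three simulated datasets, the reference-level coding and the use of np.linalg.lstsq are my own illustrative choices:

import numpy as np

ngroups, npoints = 3, 20
a_true = 0.5
b_true = np.array([1.0, 2.0, 3.0])           # the known intercepts, one per dataset
x = np.random.uniform(size=(ngroups, npoints))
y = a_true*x + b_true[:, None] + 0.05*np.random.normal(size=x.shape)

rows = []
for i in range(ngroups):
    for xij in x[i]:
        intercepts = np.zeros(ngroups)        # "dataset" effect: per-group intercept shift
        intercepts[i] = 1.0
        slopes = np.zeros(ngroups)            # "dataset*a" interaction: per-group slope shift
        slopes[i] = xij
        # drop the first group's columns so mu and the common slope stay identifiable
        rows.append(np.concatenate(([1.0, xij], intercepts[1:], slopes[1:])))
X = np.array(rows)

beta = np.linalg.lstsq(X, y.ravel())[0]
mu, a_common = beta[0], beta[1]
interaction = beta[2 + (ngroups - 1):]        # per-group slope deviations

# If the interaction terms are ~0, all datasets share one slope and a_common
# is the pooled estimate of a; mu plus the dataset effects recovers the known b's.
print(a_common, interaction)
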
From josef.pktd at gmail.com Thu Nov 12 11:55:40 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 12 Nov 2009 11:55:40 -0500 Subject: [SciPy-User] ODR fitting several equations to the same parameters In-Reply-To: <4AFC2407.1020902@gmail.com> References: <4AFAE5AF.3020506@gmail.com> <1cd32cbb0911120644o45d5c2a2pbb8763f1728cf3bd@mail.gmail.com> <4AFC2407.1020902@gmail.com> Message-ID: <1cd32cbb0911120855w47c06d2s2c0eed035f6192e0@mail.gmail.com> On Thu, Nov 12, 2009 at 10:04 AM, ms wrote: > josef.pktd at gmail.com ha scritto: >> On Wed, Nov 11, 2009 at 11:26 AM, ms wrote: >>> Let's start with a simple example. Imagine I have several linear data >>> sets y=ax+b which have different b (all of them are known) but that >>> should fit to the same (unknown) a. To have my best estimate of a, I >>> would want to fit them all together. In this case it is trivial, you >>> just subtract the known b from the data set and fit them all at the same >>> time. >>> >>> In my case it is a bit different, in the sense that I have to do >>> conceptually the same thing but for a highly non-linear equation where >>> the equivalent of "b" above is not so simple to separate. I wonder >>> therefore if there is a way to do a simultaneous fit of different >>> equations differing only in the known parameters and having a single >>> output, possibly with the help of ODR. Is this possible? And/or what >>> should be the best thing to do, in general, for this kind of problems? >> >> I don't know enough about ODR, but for least squares, optimize.leastsq >> or curve_fit, it seems you can just substitute any known parameters >> into your equation. >> >> y_i = f(x_i, a, b_i) for each group i >> plug in values for all b_i, gives reduced f(x_i, a) independent of >> specific parameters >> stack equations [y_i for all i] and [f(..) for all i] >> >> If you fit this in curve_fit you could also choose the weights, in >> case the error variance differs by groups. >> >> Does this work or am I missing the point? > > Probably it's me missing it. Do you just mean to fit them all together > separately and then make a weighted average of the fitted parameters, > and using the standard deviation of the mean as the error of the fit? I > am confused. I meant stacking all equations into one big estimation problem y = f(x,a) and minimize squared residual over all equations. This assumes homoscedastic errors (identical noise in each equation). an example (quickly written and not optimized, there are parts I don't remember about curve_fit, fixed parameters could be better handled by a class) #################### """stack equations with different known parameters I didn't get curve_fit to work with only 1 parameter to estimate Created on Thu Nov 12 11:17:21 2009 Author: josef-pktd """ import numpy as np from scipy import optimize def fsingle(a,c,b,x): return b*x**a + c atrue = 1. ctrue = 10. b = np.array([[1.]*10, [2.]*10, [3.]*10]) b = np.array([1.,2.,3.]) x = np.random.uniform(size=(3,10)) y = np.hstack([fsingle(atrue, ctrue, b[i], x[i]) for i in range(x.shape[0])]) y += 0.1*np.random.normal(size=y.shape) def fun(x,a,c): #b is taken from enclosing scope #print x.shape xx=x.reshape((3,10)) return np.hstack([fsingle(a, c, b[i], xx[i]) for i in range(xx.shape[0])]) res = optimize.curve_fit(fun,x.ravel(),y, p0=np.array([2.,1.])) print 'true parameters ', atrue, ctrue print 'parameter estimate', res[0] print 'standard deviation', np.sqrt(np.diag(res[1])) #################### > > sorry, > m. 
> _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From devicerandom at gmail.com Thu Nov 12 12:28:00 2009 From: devicerandom at gmail.com (ms) Date: Thu, 12 Nov 2009 17:28:00 +0000 Subject: [SciPy-User] ODR fitting several equations to the same parameters In-Reply-To: <1cd32cbb0911120855w47c06d2s2c0eed035f6192e0@mail.gmail.com> References: <4AFAE5AF.3020506@gmail.com> <1cd32cbb0911120644o45d5c2a2pbb8763f1728cf3bd@mail.gmail.com> <4AFC2407.1020902@gmail.com> <1cd32cbb0911120855w47c06d2s2c0eed035f6192e0@mail.gmail.com> Message-ID: <4AFC45A0.30502@gmail.com> josef.pktd at gmail.com ha scritto: >>> Does this work or am I missing the point? >> Probably it's me missing it. Do you just mean to fit them all together >> separately and then make a weighted average of the fitted parameters, >> and using the standard deviation of the mean as the error of the fit? I >> am confused. > > I meant stacking all equations into one big estimation problem y = > f(x,a) and minimize squared residual over all equations. > This assumes homoscedastic errors (identical noise in each equation). Yes, that's what I want! Thanks. I am going to read and try your code and see what I get and don't get of it. Thanks a lot :) m. > an example > (quickly written and not optimized, there are parts I don't remember > about curve_fit, fixed parameters could be better handled by a class) > > #################### > """stack equations with different known parameters > > I didn't get curve_fit to work with only 1 parameter to estimate > > Created on Thu Nov 12 11:17:21 2009 > Author: josef-pktd > """ > import numpy as np > from scipy import optimize > > > def fsingle(a,c,b,x): > return b*x**a + c > > atrue = 1. > ctrue = 10. > b = np.array([[1.]*10, [2.]*10, [3.]*10]) > b = np.array([1.,2.,3.]) > x = np.random.uniform(size=(3,10)) > y = np.hstack([fsingle(atrue, ctrue, b[i], x[i]) for i in range(x.shape[0])]) > y += 0.1*np.random.normal(size=y.shape) > > def fun(x,a,c): > #b is taken from enclosing scope > #print x.shape > xx=x.reshape((3,10)) > return np.hstack([fsingle(a, c, b[i], xx[i]) for i in range(xx.shape[0])]) > > res = optimize.curve_fit(fun,x.ravel(),y, p0=np.array([2.,1.])) > > print 'true parameters ', atrue, ctrue > print 'parameter estimate', res[0] > print 'standard deviation', np.sqrt(np.diag(res[1])) > #################### > > > > >> sorry, >> m. >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From dyamins at gmail.com Thu Nov 12 13:23:33 2009 From: dyamins at gmail.com (Dan Yamins) Date: Thu, 12 Nov 2009 13:23:33 -0500 Subject: [SciPy-User] Edge Detection In-Reply-To: <7f014ea60911120723j26027a0ascec6d50c977a3add@mail.gmail.com> References: <15e4667e0911100837q5dc003d6re78e6e66ba51972@mail.gmail.com> <4A698D9C-7FC0-4FE4-BA8D-7628EF3AAE71@yale.edu> <7f014ea60911120723j26027a0ascec6d50c977a3add@mail.gmail.com> Message-ID: <15e4667e0911121023x33402f95l99f6cea7e1fb3cdd@mail.gmail.com> On Thu, Nov 12, 2009 at 10:23 AM, Chris Colbert wrote: > All of the OpenCV edge detection routines are also available in > scikits.image if you have opencv (>= 2.0) installed. 
> > On Tue, Nov 10, 2009 at 5:48 PM, Zachary Pincus > wrote: > > > > > Code: Look at what's available in scipy.ndimage. There are functions for > > getting gradient magnitudes, as well as standard filters like Sobel etc. > > (which you'll learn about from the above), plus morphological operators > for > > modifying binarized image regions (e.g. like erosion etc.; useful for > > getting rid of stray noise-induced edges), plus some basic functions for > > image smoothing like median filters, etc. > > > > For exploratory analysis, you might want some ability to interactively > > visualize images; you could use matplotlib or the imaging scikit, which > is > > still pre-release but making fast progress: > > http://github.com/stefanv/scikits.image > > > > I've attached basic code for Canny edge detection, which should > demonstrate > > a bit about how ndimage works, plus it's useful in its own right. There > is > > also some code floating around for anisotropic diffusion and bilateral > > filtering, which are two noise-reduction methods that can be better than > > simple median filtering. > > > Hi Chris and Zachary, thanks very much for your help. I really appreciate it. My goal was to recognize linear (and circular) strokes in images of text. After I wrote my question and did some further research, I realized that I was so ignorant that I didn't know enough to properly ask for what I wanted. Finding strokes in letters is actually more like "line detection" (as in "detecting lines as geometric features") than it is like edge detection (e.g. something that the sobel operator does well). I needed to localize the lines and describe them in some geometric way, not so much determine where their boundaries were. What I ended up doing is using the Radon transform (scipy.misc.radon), together with the hcluster package. The basic idea is that applying Radon transform to the image of a letter transforms the strokes into confined blobs whose position and extent in the resulting point/angle space describes the location, width, and angle of the original stroke. Then, I make a binary version of the transformed image by applying an indicated threshold on intensity -- e.g. a 1 at all points in the transformed image whose intensity are above the threshold, and 0 elsewhere. Then, I cluster this binary image, which ends up identifying clusters whose centroid and diameter correspond to features of idealized strokes. This algorithm seems to work pretty well. Thanks alot again for your help, the scipy.ndimage package really seems great. I read somewhere that the edge-detection routines will actually become part of the next version of the package. Is that still true? Thanks, Dan -------------- next part -------------- An HTML attachment was scrubbed... URL: From sccolbert at gmail.com Thu Nov 12 13:35:13 2009 From: sccolbert at gmail.com (Chris Colbert) Date: Thu, 12 Nov 2009 19:35:13 +0100 Subject: [SciPy-User] Edge Detection In-Reply-To: <15e4667e0911121023x33402f95l99f6cea7e1fb3cdd@mail.gmail.com> References: <15e4667e0911100837q5dc003d6re78e6e66ba51972@mail.gmail.com> <4A698D9C-7FC0-4FE4-BA8D-7628EF3AAE71@yale.edu> <7f014ea60911120723j26027a0ascec6d50c977a3add@mail.gmail.com> <15e4667e0911121023x33402f95l99f6cea7e1fb3cdd@mail.gmail.com> Message-ID: <7f014ea60911121035h73cb69e7pbdd0996253e2adb1@mail.gmail.com> Dan, You may also want to look into HOG features as well (Histogram of oriented gradients). 
They are used quite often for shape characterization, and with proper normalization, can become scale and rotation invariant. Glad to see you got something working! Cheers, Chris On Thu, Nov 12, 2009 at 7:23 PM, Dan Yamins wrote: > > On Thu, Nov 12, 2009 at 10:23 AM, Chris Colbert wrote: >> >> All of the OpenCV edge detection routines are also available in >> scikits.image if you have opencv ?(>= 2.0) installed. >> >> On Tue, Nov 10, 2009 at 5:48 PM, Zachary Pincus >> wrote: >> >> > >> > Code: Look at what's available in scipy.ndimage. There are functions for >> > getting gradient magnitudes, as well as standard filters like Sobel etc. >> > (which you'll learn about from the above), plus morphological operators >> > for >> > modifying binarized image regions (e.g. like erosion etc.; useful for >> > getting rid of stray noise-induced edges), plus some basic functions for >> > image smoothing like median filters, etc. >> > >> > For exploratory analysis, you might want some ability to interactively >> > visualize images; you could use matplotlib or the imaging scikit, which >> > is >> > still pre-release but making fast progress: >> > http://github.com/stefanv/scikits.image >> > >> > I've attached basic code for Canny edge detection, which should >> > demonstrate >> > a bit about how ndimage works, plus it's useful in its own right. There >> > is >> > also some code floating around for anisotropic diffusion and bilateral >> > filtering, which are two noise-reduction methods that can be better than >> > simple median filtering. >> > > > Hi Chris and Zachary, thanks very much for your help.? I really appreciate > it. > > My goal was to recognize linear (and circular) strokes in images of text. > After I wrote my question and did some further research, I realized that I > was so ignorant that I didn't know enough to properly ask for what I > wanted.? Finding strokes in letters is actually more like "line detection" > (as in "detecting lines as geometric features") than it is like edge > detection (e.g. something that the sobel operator does well).? I needed to > localize the lines and describe them in some geometric way, not so much > determine where their boundaries were. > > What I ended up doing is using the Radon transform (scipy.misc.radon), > together with the hcluster package.? The basic idea is that applying Radon > transform to the image of a letter transforms the strokes into confined > blobs whose position and extent in the resulting point/angle space describes > the location, width, and angle of the original stroke. ? Then, I make a > binary version of the transformed image by applying an indicated threshold > on intensity -- e.g. a 1 at all points in the transformed image whose > intensity are above the threshold, and 0 elsewhere.?? Then, I cluster this > binary image, which ends up identifying clusters whose centroid and diameter > correspond to features of idealized strokes.????? This algorithm seems to > work pretty well. > > Thanks alot again for your help, the scipy.ndimage package really seems > great.? I read somewhere that the edge-detection routines will actually > become part of the next version of the package.? Is that still true? 
> > Thanks, > Dan > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From nwagner at iam.uni-stuttgart.de Thu Nov 12 13:53:55 2009 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Thu, 12 Nov 2009 19:53:55 +0100 Subject: [SciPy-User] FAIL: test_lambertw.test_values Message-ID: Hi all, Can someone reproduce the following failure with recent svn ? ====================================================================== FAIL: test_lambertw.test_values ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/nwagner/local/lib64/python2.6/site-packages/nose-0.11.2.dev-py2.6.egg/nose/case.py", line 183, in runTest self.test(*self.arg) File "/home/nwagner/local/lib64/python2.6/site-packages/scipy/special/tests/test_lambertw.py", line 80, in test_values FuncData(w, data, (0,1), 2, rtol=1e-10, atol=1e-13).check() File "/home/nwagner/local/lib64/python2.6/site-packages/scipy/special/tests/testutils.py", line 187, in check assert False, "\n".join(msg) AssertionError: Max |adiff|: 2.5797 Max |rdiff|: 3.81511 Bad results for the following points (in output 0): (-0.44800000000000001+0.40000000000000002j) 0j => (-1.2370928928166736-1.6588828572971359j) != (-0.11855133765652383+0.66570534313583418j) (rdiff 3.8151122286225245) Nils From robfelty at gmail.com Thu Nov 12 18:29:38 2009 From: robfelty at gmail.com (Robert Felty) Date: Thu, 12 Nov 2009 16:29:38 -0700 Subject: [SciPy-User] specify libgfortran.dylib location In-Reply-To: References: Message-ID: <5E9D6CE9-0272-4101-8830-9B36AB4F4544@gmail.com> I've been trying to get scipy working on snow leopard for several weeks now. I have seen several blogs with lots of suggestions, but none have worked for me, until I finally figured it out today. I kept getting an error that my libgfortan.dylib file was the wrong architecture. Here is the stack trace: >>> import scipy.stats Traceback (most recent call last): File "", line 1, in File "/Library/Python/2.6/site-packages/scipy-0.8.0.dev5975-py2.6- macosx-10.6-universal.egg/scipy/stats/__init__.py", line 7, in from stats import * File "/Library/Python/2.6/site-packages/scipy-0.8.0.dev5975-py2.6- macosx-10.6-universal.egg/scipy/stats/stats.py", line 198, in import scipy.special as special File "/Library/Python/2.6/site-packages/scipy-0.8.0.dev5975-py2.6- macosx-10.6-universal.egg/scipy/special/__init__.py", line 8, in from basic import * File "/Library/Python/2.6/site-packages/scipy-0.8.0.dev5975-py2.6- macosx-10.6-universal.egg/scipy/special/basic.py", line 8, in from _cephes import * ImportError: dlopen(/Library/Python/2.6/site-packages/ scipy-0.8.0.dev5975-py2.6-macosx-10.6-universal.egg/scipy/special/ _cephes.so, 2): Library not loaded: /usr/local/lib/libgfortran.2.dylib Referenced from: /Library/Python/2.6/site-packages/ scipy-0.8.0.dev5975-py2.6-macosx-10.6-universal.egg/scipy/special/ _cephes.so Reason: no suitable image found. 
Did find: /usr/local/lib/libgfortran.2.dylib: mach-o, but wrong architecture /usr/local/lib/libgfortran.2.dylib: mach-o, but wrong architecture I discovered today that I had several different libgfortran files: /usr/local/lib/libgfortran.2.0.0.dylib /usr/local/lib/libgfortran.2.dylib /usr/local/lib/libgfortran.a /usr/local/lib/libgfortran.dylib /usr/local/lib/libgfortran.la /usr/local/lib/ppc64/libgfortran.2.0.0.dylib /usr/local/lib/ppc64/libgfortran.2.dylib /usr/local/lib/ppc64/libgfortran.a /usr/local/lib/ppc64/libgfortran.dylib /usr/local/lib/ppc64/libgfortran.la /usr/local/lib/x86_64/libgfortran.2.0.0.dylib /usr/local/lib/x86_64/libgfortran.2.dylib /usr/local/lib/x86_64/libgfortran.a /usr/local/lib/x86_64/libgfortran.dylib /usr/local/lib/x86_64/libgfortran.la I tried copying the x86_64 file to /usr/local/lib, and now scipy works. However, this does not seem like the right way to do it. Is there a way to tell scipy that it should use the version in /usr/local/ lib/x86_64? I am using system Python 2.6.1 on Mac 10.6.1, which scipy 0.8.0 Thanks in advance for any suggestions. Rob From robert.kern at gmail.com Thu Nov 12 18:35:34 2009 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 12 Nov 2009 17:35:34 -0600 Subject: [SciPy-User] specify libgfortran.dylib location In-Reply-To: <5E9D6CE9-0272-4101-8830-9B36AB4F4544@gmail.com> References: <5E9D6CE9-0272-4101-8830-9B36AB4F4544@gmail.com> Message-ID: <3d375d730911121535t589b29a3y1e595bab8aa601e8@mail.gmail.com> On Thu, Nov 12, 2009 at 17:29, Robert Felty wrote: > I tried copying the x86_64 file to /usr/local/lib, and now scipy > works. However, this does not seem like the right way to do it. Is > there a way to tell scipy that it should use the version in /usr/local/ > lib/x86_64? > I am using system Python 2.6.1 on Mac 10.6.1, which scipy 0.8.0 Where did you get your gfortran from? If you are using the R gfortran binaries, my version appears to have the files /usr/local/lib/x86_64/ as just symlinks back to the ones in /usr/local/lib/. Perhaps you should upgrade to the most recent release. Unless if you already are on the latest release and my slightly older release is better. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From kevinar18 at hotmail.com Thu Nov 12 22:41:14 2009 From: kevinar18 at hotmail.com (Kevin Ar18) Date: Thu, 12 Nov 2009 22:41:14 -0500 Subject: [SciPy-User] Is there a Win 64bit version? Message-ID: The installer says: "Python version 2.6 required, which was not found in the registry." I did some searching and it seems like this may be a 32bit vs 64bit conflict. I'm running Vista 64bit and Python 2.6.4 64bit. Has anyone made an installer for 64bit Windows? _________________________________________________________________ Hotmail: Powerful Free email with security by Microsoft. http://clk.atdmt.com/GBL/go/171222986/direct/01/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Thu Nov 12 23:05:26 2009 From: sturla at molden.no (Sturla Molden) Date: Fri, 13 Nov 2009 05:05:26 +0100 Subject: [SciPy-User] Is there a Win 64bit version? In-Reply-To: References: Message-ID: <4AFCDB06.9000105@molden.no> Kevin Ar18 skrev: > Has anyone made an installer for 64bit Windows? I tried the installer for NumPy ... no luck. NumPy segfaulted on import. 
I guess there is a good reason the release notes says "highly experimental". SciPy would be even further away from 64 bit support. I guess if you really need 64 bit, you should use Linux or Mac. In a perfect world, NumPy and SciPy would run on 64-bit Windows 7 and Python 3.1. But in the real world, we have to use Python 2.6.4 in 32-bit mode for stability. From dwf at cs.toronto.edu Thu Nov 12 23:09:35 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Thu, 12 Nov 2009 23:09:35 -0500 Subject: [SciPy-User] Is there a Win 64bit version? In-Reply-To: References: Message-ID: <81633F29-2332-41CD-A6E4-472DAE2F4A0C@cs.toronto.edu> On 12-Nov-09, at 10:41 PM, Kevin Ar18 wrote: > The installer says: > "Python version 2.6 required, which was not found in the registry." > > I did some searching and it seems like this may be a 32bit vs 64bit > conflict. > > I'm running Vista 64bit and Python 2.6.4 64bit. > > Has anyone made an installer for 64bit Windows? So far, no. NumPy is relatively easy to build yourself I think, see: http://mail.scipy.org/pipermail/numpy-discussion/2008-December/039230.html SciPy a little less so, thanks to the mess created by the Fortran situation on Windows. As of June 2009, SciPy could be compiled with gfortran-mingw but would crash randomly, and there had been no success in debugging why. http://mail.scipy.org/pipermail/numpy-discussion/2009-June/043571.html I'd suggest using 32-bit Python/NumPy/SciPy unless you really need it. David From cournape at gmail.com Thu Nov 12 23:21:18 2009 From: cournape at gmail.com (David Cournapeau) Date: Fri, 13 Nov 2009 13:21:18 +0900 Subject: [SciPy-User] Is there a Win 64bit version? In-Reply-To: <81633F29-2332-41CD-A6E4-472DAE2F4A0C@cs.toronto.edu> References: <81633F29-2332-41CD-A6E4-472DAE2F4A0C@cs.toronto.edu> Message-ID: <5b8d13220911122021k458d1fffyb0e1aeba68b72d22@mail.gmail.com> On Fri, Nov 13, 2009 at 1:09 PM, David Warde-Farley wrote: > > So far, no. NumPy is relatively easy to build yourself I think, see: > > ? ? ? ?http://mail.scipy.org/pipermail/numpy-discussion/2008-December/039230.html > Thanks to Enthought support, I have fixed all major *sources* issues so that both numpy and scipy can be build under VS 2008 + ifort combination for windows 64. The build process is complicated though, and require to have the MKL. I have not made much progress on building numpy and scipy with open source tools, though. David From kevinar18 at hotmail.com Thu Nov 12 23:28:17 2009 From: kevinar18 at hotmail.com (Kevin Ar18) Date: Thu, 12 Nov 2009 23:28:17 -0500 Subject: [SciPy-User] Is there a Win 64bit version? In-Reply-To: <81633F29-2332-41CD-A6E4-472DAE2F4A0C@cs.toronto.edu> References: , <81633F29-2332-41CD-A6E4-472DAE2F4A0C@cs.toronto.edu> Message-ID: > So far, no. NumPy is relatively easy to build yourself I think, see: > > http://mail.scipy.org/pipermail/numpy-discussion/2008-December/039230.html My version of MSVC only compiles to 32bit (it's free after all). This only makes me wish more that CLANG C++ support was complete and it could be used to compile all Python modules on the fly in Windows. :) Oh, wait, I need a Fortran compiler? What are my options? > SciPy a little less so, thanks to the mess created by the Fortran > situation on Windows. As of June 2009, SciPy could be compiled with > gfortran-mingw but would crash randomly, and there had been no success > in debugging why. > > http://mail.scipy.org/pipermail/numpy-discussion/2009-June/043571.html > > I'd suggest using 32-bit Python/NumPy/SciPy unless you really need it. 
I only need Numpy -- which from what I am seeing may be possible to do? As for going back to 32bit, well that would be kind of a mess. I've have to figure out what to do with my current Python install, then I'd have to re-install the other Python modules I need, etc.... It's possible, I guess if there is no other option. Here's what I am currently using: Vista 64bit Python 2.6.4 64bit so, I need one for Python 2.6.4 64bit _________________________________________________________________ Hotmail: Free, trusted and rich email service. http://clk.atdmt.com/GBL/go/171222984/direct/01/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From kevinar18 at hotmail.com Thu Nov 12 23:30:07 2009 From: kevinar18 at hotmail.com (Kevin Ar18) Date: Thu, 12 Nov 2009 23:30:07 -0500 Subject: [SciPy-User] Is there a Win 64bit version? In-Reply-To: <5b8d13220911122021k458d1fffyb0e1aeba68b72d22@mail.gmail.com> References: , <81633F29-2332-41CD-A6E4-472DAE2F4A0C@cs.toronto.edu>, <5b8d13220911122021k458d1fffyb0e1aeba68b72d22@mail.gmail.com> Message-ID: > > So far, no. NumPy is relatively easy to build yourself I think, see: > > > > http://mail.scipy.org/pipermail/numpy-discussion/2008-December/039230.html > > > > Thanks to Enthought support, I have fixed all major *sources* issues > so that both numpy and scipy can be build under VS 2008 + ifort > combination for windows 64. The build process is complicated though, > and require to have the MKL. I have not made much progress on building > numpy and scipy with open source tools, though. Is numpy pure C? If so, would CLANG work? http://clang.llvm.org/ _________________________________________________________________ Your E-mail and More On-the-Go. Get Windows Live Hotmail Free. http://clk.atdmt.com/GBL/go/171222985/direct/01/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Thu Nov 12 23:36:05 2009 From: sturla at molden.no (Sturla Molden) Date: Fri, 13 Nov 2009 05:36:05 +0100 Subject: [SciPy-User] Is there a Win 64bit version? In-Reply-To: References: , <81633F29-2332-41CD-A6E4-472DAE2F4A0C@cs.toronto.edu>, <5b8d13220911122021k458d1fffyb0e1aeba68b72d22@mail.gmail.com> Message-ID: <4AFCE235.803@molden.no> Kevin Ar18 skrev: > > Is numpy pure C? C and Python. > If so, would CLANG work? http://clang.llvm.org/ Probably not. Also beware of the CRT issue. From dwf at cs.toronto.edu Thu Nov 12 23:36:21 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Thu, 12 Nov 2009 23:36:21 -0500 Subject: [SciPy-User] Is there a Win 64bit version? In-Reply-To: <5b8d13220911122021k458d1fffyb0e1aeba68b72d22@mail.gmail.com> References: <81633F29-2332-41CD-A6E4-472DAE2F4A0C@cs.toronto.edu> <5b8d13220911122021k458d1fffyb0e1aeba68b72d22@mail.gmail.com> Message-ID: <46421F92-AD72-40B1-AA55-4577A1DA653D@cs.toronto.edu> On 12-Nov-09, at 11:21 PM, David Cournapeau wrote: > Thanks to Enthought support, I have fixed all major *sources* issues > so that both numpy and scipy can be build under VS 2008 + ifort > combination for windows 64. The build process is complicated though, > and require to have the MKL. I have not made much progress on building > numpy and scipy with open source tools, though. Well, that's encouraging; at least there's SOME way to do it. Cheers to Enthought, and of course to Mr. Cournapeau. :) Does it require static linking, or is there a possibility (down the road, of course) that binaries can be built that dynamically link against a (separately installed) MKL? 
Also, any chance you have the build instructions written down somewhere? This is what we have for the Windows build instructions on the new website: http://projects.scipy.org/scipy/browser/scipy.org/source/building/windows.rst I'm sure it is far from complete, and mentions nothing about 64-bit. David From sturla at molden.no Thu Nov 12 23:44:19 2009 From: sturla at molden.no (Sturla Molden) Date: Fri, 13 Nov 2009 05:44:19 +0100 Subject: [SciPy-User] Is there a Win 64bit version? In-Reply-To: References: , <81633F29-2332-41CD-A6E4-472DAE2F4A0C@cs.toronto.edu> Message-ID: <4AFCE423.6000300@molden.no> Kevin Ar18 skrev: > > Oh, wait, I need a Fortran compiler? What are my options? Only for SciPy. Your options are e.g. Intel (ifort), GNU (gfortran), Absoft, Lahey, or NAG. From dwf at cs.toronto.edu Thu Nov 12 23:47:05 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Thu, 12 Nov 2009 23:47:05 -0500 Subject: [SciPy-User] Is there a Win 64bit version? In-Reply-To: References: , <81633F29-2332-41CD-A6E4-472DAE2F4A0C@cs.toronto.edu>, <5b8d13220911122021k458d1fffyb0e1aeba68b72d22@mail.gmail.com> Message-ID: On 12-Nov-09, at 11:30 PM, Kevin Ar18 wrote: > Is numpy pure C? If so, would CLANG work? http://clang.llvm.org/ The other David is the authority on these matters but I would imagine not, unless you also had somehow compiled a custom Python through that. Then maybe you'd have a shot. I think your only option is MSVC on 64-bit Windows. David From sturla at molden.no Thu Nov 12 23:58:10 2009 From: sturla at molden.no (Sturla Molden) Date: Fri, 13 Nov 2009 05:58:10 +0100 Subject: [SciPy-User] Is there a Win 64bit version? In-Reply-To: References: , <81633F29-2332-41CD-A6E4-472DAE2F4A0C@cs.toronto.edu> Message-ID: <4AFCE762.3030907@molden.no> Kevin Ar18 skrev: > I only need Numpy -- which from what I am seeing may be possible to do? If you have a processor that supports Intel VT-X or AMD-V technology, you can always download Sun VirtualBox for free (PUEL license) and install a 64-bit Linux or OpenSolaris on it. From kevinar18 at hotmail.com Thu Nov 12 23:58:38 2009 From: kevinar18 at hotmail.com (Kevin Ar18) Date: Thu, 12 Nov 2009 23:58:38 -0500 Subject: [SciPy-User] Is there a Win 64bit version? In-Reply-To: References: , , <81633F29-2332-41CD-A6E4-472DAE2F4A0C@cs.toronto.edu>, , <5b8d13220911122021k458d1fffyb0e1aeba68b72d22@mail.gmail.com>, , Message-ID: > > Is numpy pure C? If so, would CLANG work? http://clang.llvm.org/ > > The other David is the authority on these matters but I would imagine > not, unless you also had somehow compiled a custom Python through > that. Then maybe you'd have a shot. I think your only option is MSVC > on 64-bit Windows. > > David Is this the same David that said it would be easy to compile numpy for Win64 in that other link? Since I only need numpy, I'd be interested if anybody has made one for Win64 and just not uploaded it yet. :) _________________________________________________________________ Windows 7: Unclutter your desktop. http://go.microsoft.com/?linkid=9690331&ocid=PID24727::T:WLMTAGL:ON:WL:en-US:WWL_WIN_evergreen:112009 -------------- next part -------------- An HTML attachment was scrubbed... URL: From kevinar18 at hotmail.com Fri Nov 13 00:02:27 2009 From: kevinar18 at hotmail.com (Kevin Ar18) Date: Fri, 13 Nov 2009 00:02:27 -0500 Subject: [SciPy-User] Is there a Win 64bit version? 
In-Reply-To: <4AFCE235.803@molden.no> References: , , <81633F29-2332-41CD-A6E4-472DAE2F4A0C@cs.toronto.edu>, , <5b8d13220911122021k458d1fffyb0e1aeba68b72d22@mail.gmail.com>, , <4AFCE235.803@molden.no> Message-ID: > > Is numpy pure C? > C and Python. > > > If so, would CLANG work? http://clang.llvm.org/ > Probably not. > > Also beware of the CRT issue. Parden the fact that I don't know much about this. The only c++ program I've compiled was to convert a Python program I made to something faster. :) I know nothing about the CRT issue. Anyways, I want to ask an off topic question.... Does this CRT issue only apply because Python itself was compiled in MSVC? In theory, would it be possible to eventually create a version of Python that compiled in Clang (after it finishes C++ support), and thus, be able to eventually also compile modules like numpy in Clang? Or does the problem go much deeper (unrelated to Python)? _________________________________________________________________ Hotmail: Powerful Free email with security by Microsoft. http://clk.atdmt.com/GBL/go/171222986/direct/01/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From kevinar18 at hotmail.com Fri Nov 13 00:09:18 2009 From: kevinar18 at hotmail.com (Kevin Ar18) Date: Fri, 13 Nov 2009 00:09:18 -0500 Subject: [SciPy-User] Is there a Win 64bit version? In-Reply-To: <4AFCE762.3030907@molden.no> References: , , <81633F29-2332-41CD-A6E4-472DAE2F4A0C@cs.toronto.edu>, , <4AFCE762.3030907@molden.no> Message-ID: > > I only need Numpy -- which from what I am seeing may be possible to do? > If you have a processor that supports Intel VT-X or AMD-V technology, > you can always download Sun VirtualBox for free (PUEL license) and > install a 64-bit Linux or OpenSolaris on it. Yeah, "personal use" wouldn't work. It's really more trouble to setup an entirely new system for it all. Basically, I have Python like I want it right now and just wanted to add another module which required numpy. If I can't get a numpy for 64bit, I can alway install Python 32bit and then re-install the extra modules/add-ons that I've added to it; but, of course, that's probably also gonna take some time to do. Still, I do want to thank you all for trying to help out. I do appreciate it. :) _________________________________________________________________ Hotmail: Powerful Free email with security by Microsoft. http://clk.atdmt.com/GBL/go/171222986/direct/01/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Fri Nov 13 00:11:03 2009 From: sturla at molden.no (Sturla Molden) Date: Fri, 13 Nov 2009 06:11:03 +0100 Subject: [SciPy-User] Is there a Win 64bit version? In-Reply-To: References: , , <81633F29-2332-41CD-A6E4-472DAE2F4A0C@cs.toronto.edu>, , <5b8d13220911122021k458d1fffyb0e1aeba68b72d22@mail.gmail.com>, , <4AFCE235.803@molden.no> Message-ID: <4AFCEA67.8020208@molden.no> Kevin Ar18 skrev: > Anyways, I want to ask an off topic question.... > Does this CRT issue only apply because Python itself was compiled in MSVC? You always have to link extensions with the same C runtime as Python, otherwise bad things might happen. To be precise: you cannot share resources between different CRTs. E.g. you cannot fopen a FILE* with one CRT and fread on the FILE* with another. If you link with a different CRT than Python, a function like numpy.fromfile can mess up badly. 
> > In theory, would it be possible to eventually create a version of > Python that compiled in Clang (after it finishes C++ support), and > thus, be able to eventually also compile modules like numpy in Clang? > Or does the problem go much deeper (unrelated to Python)? I don't know which libraries clang link. But very likely ... if you want to build NumPy with clang, you also have to build Python with clang. From sturla at molden.no Fri Nov 13 00:14:13 2009 From: sturla at molden.no (Sturla Molden) Date: Fri, 13 Nov 2009 06:14:13 +0100 Subject: [SciPy-User] Is there a Win 64bit version? In-Reply-To: References: , , <81633F29-2332-41CD-A6E4-472DAE2F4A0C@cs.toronto.edu>, , <4AFCE762.3030907@molden.no> Message-ID: <4AFCEB25.5080309@molden.no> Kevin Ar18 skrev: > Yeah, "personal use" wouldn't work. It's really more trouble to setup > an entirely new system for it all. > That is confusing. "Personal use" in Sun's license also covers academic use and commercial use. What is does not cover is automated mass deployment in an organisation, except academic institutions. From sturla at molden.no Fri Nov 13 00:18:48 2009 From: sturla at molden.no (Sturla Molden) Date: Fri, 13 Nov 2009 06:18:48 +0100 Subject: [SciPy-User] Is there a Win 64bit version? In-Reply-To: <4AFCEB25.5080309@molden.no> References: , , <81633F29-2332-41CD-A6E4-472DAE2F4A0C@cs.toronto.edu>, , <4AFCE762.3030907@molden.no> <4AFCEB25.5080309@molden.no> Message-ID: <4AFCEC38.3080606@molden.no> Sturla Molden skrev: > Kevin Ar18 skrev: > >> Yeah, "personal use" wouldn't work. It's really more trouble to setup >> an entirely new system for it all. >> >> > That is confusing. "Personal use" in Sun's license also covers academic > use and commercial use. What is does not cover is automated mass > deployment in an organisation, except academic institutions. > > http://www.virtualbox.org/wiki/Licensing_FAQ 6. *What exactly do you mean by /personal use/ and /academic use/ in the Personal Use and Evaluation License ?* Personal use is when you install the product on one or more PCs yourself and you make use of it (or even your friend, sister and grandmother). It doesn't matter whether you just use it for fun or run your multi-million euro business with it. Also, if you install it on your work PC at some large company, this is still personal use. However, if you are an administrator and want to deploy it to the 500 desktops in your company, this would no longer qualify as /personal use/. Well, you could ask each of your 500 employees to install VirtualBox but don't you think we deserve some money in this case? We'd even assist you with any issue you might have. Use at academic institutions such as schools, colleges and universities by both teachers and students is covered. So in addition to the personal use which is always permitted, academic institutions may also choose to roll out the software in an automated way to make it available to its students and personnel. From kevinar18 at hotmail.com Fri Nov 13 00:48:03 2009 From: kevinar18 at hotmail.com (Kevin Ar18) Date: Fri, 13 Nov 2009 00:48:03 -0500 Subject: [SciPy-User] Is there a Win 64bit version? In-Reply-To: <4AFCEC38.3080606@molden.no> References: , , , <81633F29-2332-41CD-A6E4-472DAE2F4A0C@cs.toronto.edu>, , , , <4AFCE762.3030907@molden.no> , <4AFCEB25.5080309@molden.no>, <4AFCEC38.3080606@molden.no> Message-ID: > Sturla Molden skrev: > > Kevin Ar18 skrev: > > > >> Yeah, "personal use" wouldn't work. It's really more trouble to setup > >> an entirely new system for it all. 
> >> > >> > > That is confusing. "Personal use" in Sun's license also covers academic > > use and commercial use. What is does not cover is automated mass > > deployment in an organisation, except academic institutions. > > > > > > > http://www.virtualbox.org/wiki/Licensing_FAQ > > 6. *What exactly do you mean by /personal use/ and /academic use/ in > the Personal Use and Evaluation License ?* > > Personal use is when you install the product on one or more PCs > yourself and you make use of it (or even your friend, sister and > grandmother). It doesn't matter whether you just use it for fun or > run your multi-million euro business with it. Also, if you install > it on your work PC at some large company, this is still personal > use. However, if you are an administrator and want to deploy it to > the 500 desktops in your company, this would no longer qualify > as /personal use/. Well, you could ask each of your 500 employees to > install VirtualBox but don't you think we deserve some money in this > case? We'd even assist you with any issue you might have. Wow, well thanks for clearing that up. I would have never thought that is what they meant by personal use. :) Maybe I'll even use it for something fun one day. _________________________________________________________________ Hotmail: Trusted email with Microsoft?s powerful SPAM protection. http://clk.atdmt.com/GBL/go/177141664/direct/01/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Fri Nov 13 00:28:17 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Fri, 13 Nov 2009 14:28:17 +0900 Subject: [SciPy-User] Is there a Win 64bit version? In-Reply-To: References: , <81633F29-2332-41CD-A6E4-472DAE2F4A0C@cs.toronto.edu>, <5b8d13220911122021k458d1fffyb0e1aeba68b72d22@mail.gmail.com> Message-ID: <4AFCEE71.1050405@ar.media.kyoto-u.ac.jp> Kevin Ar18 wrote: > > > Is numpy pure C? If so, would CLANG work? http://clang.llvm.org/ I doubt clang is well supported on windows. llvm itself has some issues on windows AFAIK. Since the project is backed up by Apple, I don't think there is a strong incentive to make this works well on windows. To build numpy on windows 64 bits, you need only one software besides python (which has to be 2.6), and that's Visual Studio 2008. If you are willing to spend a lot of time, you can download various packages from MS to get the free 64 bits compilers, but that's a lot of work compared to just using VS 2008. Note that you can download a fully functional trial edition of VS 2008 for free. David From david at ar.media.kyoto-u.ac.jp Fri Nov 13 00:43:22 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Fri, 13 Nov 2009 14:43:22 +0900 Subject: [SciPy-User] Is there a Win 64bit version? In-Reply-To: <46421F92-AD72-40B1-AA55-4577A1DA653D@cs.toronto.edu> References: <81633F29-2332-41CD-A6E4-472DAE2F4A0C@cs.toronto.edu> <5b8d13220911122021k458d1fffyb0e1aeba68b72d22@mail.gmail.com> <46421F92-AD72-40B1-AA55-4577A1DA653D@cs.toronto.edu> Message-ID: <4AFCF1FA.3010706@ar.media.kyoto-u.ac.jp> David Warde-Farley wrote: > Well, that's encouraging; at least there's SOME way to do it. Cheers > to Enthought, and of course to Mr. Cournapeau. :) Does it require > static linking, or is there a possibility (down the road, of course) > that binaries can be built that dynamically link against a (separately > installed) MKL? > It always require some dll, even if you compile the MKL statically. 
> Also, any chance you have the build instructions written down > somewhere? This is what we have for the Windows build instructions on > the new website: > > http://projects.scipy.org/scipy/browser/scipy.org/source/building/windows.rst > > I'm sure it is far from complete, and mentions nothing about 64-bit. The necessary tools have not all been released. David From dineshbvadhia at hotmail.com Fri Nov 13 03:30:40 2009 From: dineshbvadhia at hotmail.com (Dinesh B Vadhia) Date: Fri, 13 Nov 2009 00:30:40 -0800 Subject: [SciPy-User] Is there a Win 64bit version? Message-ID: Ditto I'd be over-the-moon if someone has a working Numpy for Windows 64-bit and are willing to share it. Oh, yes ... bring it on! Dinesh -------------------------------------------------------------------------------- Message: 10 Date: Thu, 12 Nov 2009 23:58:38 -0500 From: Kevin Ar18 Subject: Re: [SciPy-User] Is there a Win 64bit version? To: Message-ID: Content-Type: text/plain; charset="iso-8859-1" > > Is numpy pure C? If so, would CLANG work? http://clang.llvm.org/ > > The other David is the authority on these matters but I would imagine > not, unless you also had somehow compiled a custom Python through > that. Then maybe you'd have a shot. I think your only option is MSVC > on 64-bit Windows. > > David Is this the same David that said it would be easy to compile numpy for Win64 in that other link? Since I only need numpy, I'd be interested if anybody has made one for Win64 and just not uploaded it yet. :) -------------- next part -------------- An HTML attachment was scrubbed... URL: From dineshbvadhia at hotmail.com Fri Nov 13 04:02:06 2009 From: dineshbvadhia at hotmail.com (Dinesh B Vadhia) Date: Fri, 13 Nov 2009 01:02:06 -0800 Subject: [SciPy-User] Is there a Win 64bit version? Message-ID: Good to know about the trial version of VS 2008. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.hirschfeld at gmail.com Fri Nov 13 04:14:26 2009 From: dave.hirschfeld at gmail.com (Dave Hirschfeld) Date: Fri, 13 Nov 2009 09:14:26 +0000 (UTC) Subject: [SciPy-User] scikits.timeseries concatenate Message-ID: It appears that when remove_duplicates is True (the default) ts.concatenate doesn't respect the dimensions of the data array c.f. In [1]: ts1 = ts.time_series(array([[1,2]]).repeat(10,axis=0), start_date=ts.Date('D','01-Jan-2009')) In [2]: ts2 = ts.time_series(array([[3,4]]).repeat(10,axis=0), start_date=ts.Date('D','11-Jan-2009')) In [3]: ts.concatenate([ts1,ts2],axis=0,remove_duplicates=False) Out[3]: timeseries( [[1 2] [1 2] [1 2] [1 2] [1 2] [1 2] [1 2] [1 2] [1 2] [1 2] [3 4] [3 4] [3 4] [3 4] [3 4] [3 4] [3 4] [3 4] [3 4] [3 4]], dates = [01-Jan-2009 ... 20-Jan-2009], freq = D) In [4]: ts.concatenate([ts1,ts2],axis=0) Out[4]: timeseries([1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2], dates = [01-Jan-2009 ... 20-Jan-2009], freq = D) ?!? I think the fix is to pass the axis parameter as follows: result = ts.time_series(ndata.compress(orig,axis=axis), dates=ndates.compress(orig),freq=common_f) I haven't had time to test this thoroughly (execpt that it works for me) but thought I'd post it before the next bug-fix release got out. 
HTH, Dave From anderse at gmx.de Fri Nov 13 07:12:51 2009 From: anderse at gmx.de (Raimund Andersen) Date: Fri, 13 Nov 2009 13:12:51 +0100 Subject: [SciPy-User] Interpolate: Derivatives of parametric splines In-Reply-To: <5E5419A2-84AE-44A5-8A4C-1C96D4857CFD@yale.edu> References: <20091112113530.279300@gmx.net> <20091112134456.279300@gmx.net> <5E5419A2-84AE-44A5-8A4C-1C96D4857CFD@yale.edu> Message-ID: <20091113121251.239630@gmx.net> Hello Zach, once again thank you so much for your long and detailed answer. After testing this with many different kinds of input data, I really like it. Of course you are right, parametric splines are much less exact near the endpoints compared to the nonparametric ones. I didn't thought about that. Indeed, with 20 points of input data, even the second derivative is about 0 near the endpoints, and I am mainly interested in finding the zeros of the second derivative, so here I don't care much about how the slope goes in between. Maybe I can make use of the difference between x and (dx/du*u) as a kind of fuzzy quality rate; comparing (dx/du) between the first and the second derivative could bring some insights, too. This was really the help I needed, also thanks to you, Warren. Raimund -- GRATIS f?r alle GMX-Mitglieder: Die maxdome Movie-FLAT! Jetzt freischalten unter http://portal.gmx.net/de/go/maxdome01 From devicerandom at gmail.com Fri Nov 13 10:28:51 2009 From: devicerandom at gmail.com (ms) Date: Fri, 13 Nov 2009 15:28:51 +0000 Subject: [SciPy-User] ODR fitting several equations to the same parameters In-Reply-To: <1cd32cbb0911120855w47c06d2s2c0eed035f6192e0@mail.gmail.com> References: <4AFAE5AF.3020506@gmail.com> <1cd32cbb0911120644o45d5c2a2pbb8763f1728cf3bd@mail.gmail.com> <4AFC2407.1020902@gmail.com> <1cd32cbb0911120855w47c06d2s2c0eed035f6192e0@mail.gmail.com> Message-ID: <4AFD7B33.1040904@gmail.com> josef.pktd at gmail.com ha scritto: > On Thu, Nov 12, 2009 at 10:04 AM, ms wrote: >> josef.pktd at gmail.com ha scritto: > an example > (quickly written and not optimized, there are parts I don't remember > about curve_fit, fixed parameters could be better handled by a class) Hmm, it seems I don't have curve_fit -I am constrained to use scipy-0.6.0 and there's no chance to change that (it's an external server). I am going to have a good look at what's doable with your approach anyway, but I am happy if someone gives me old-school alternatives :) cheers, m. > #################### > """stack equations with different known parameters > > I didn't get curve_fit to work with only 1 parameter to estimate > > Created on Thu Nov 12 11:17:21 2009 > Author: josef-pktd > """ > import numpy as np > from scipy import optimize > > > def fsingle(a,c,b,x): > return b*x**a + c > > atrue = 1. > ctrue = 10. > b = np.array([[1.]*10, [2.]*10, [3.]*10]) > b = np.array([1.,2.,3.]) > x = np.random.uniform(size=(3,10)) > y = np.hstack([fsingle(atrue, ctrue, b[i], x[i]) for i in range(x.shape[0])]) > y += 0.1*np.random.normal(size=y.shape) > > def fun(x,a,c): > #b is taken from enclosing scope > #print x.shape > xx=x.reshape((3,10)) > return np.hstack([fsingle(a, c, b[i], xx[i]) for i in range(xx.shape[0])]) > > res = optimize.curve_fit(fun,x.ravel(),y, p0=np.array([2.,1.])) > > print 'true parameters ', atrue, ctrue > print 'parameter estimate', res[0] > print 'standard deviation', np.sqrt(np.diag(res[1])) > #################### > > > > >> sorry, >> m. 
>> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Fri Nov 13 11:00:49 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 13 Nov 2009 11:00:49 -0500 Subject: [SciPy-User] ODR fitting several equations to the same parameters In-Reply-To: <4AFD7B33.1040904@gmail.com> References: <4AFAE5AF.3020506@gmail.com> <1cd32cbb0911120644o45d5c2a2pbb8763f1728cf3bd@mail.gmail.com> <4AFC2407.1020902@gmail.com> <1cd32cbb0911120855w47c06d2s2c0eed035f6192e0@mail.gmail.com> <4AFD7B33.1040904@gmail.com> Message-ID: <1cd32cbb0911130800q7d850211xcedd559e4325a26@mail.gmail.com> On Fri, Nov 13, 2009 at 10:28 AM, ms wrote: > josef.pktd at gmail.com ha scritto: >> On Thu, Nov 12, 2009 at 10:04 AM, ms wrote: >>> josef.pktd at gmail.com ha scritto: >> an example >> (quickly written and not optimized, there are parts I don't remember >> about curve_fit, fixed parameters could be better handled by a class) > > Hmm, it seems I don't have curve_fit -I am constrained to use > scipy-0.6.0 and there's no chance to change that (it's an external server). You can just copy the function (plus 2 helper functions) from the current trunk. You would need to add the imports. Alternatively you can just use optimize.leastsq directly, using curve_fit as a recipe. Josef http://projects.scipy.org/scipy/browser/trunk/scipy/optimize/minpack.py#L338 338 def _general_function(params, xdata, ydata, function): 339 return function(xdata, *params) - ydata 340 341 def _weighted_general_function(params, xdata, ydata, function, weights): 342 return weights * (function(xdata, *params) - ydata) 343 344 def curve_fit(f, xdata, ydata, p0=None, sigma=None, **kw): > > I am going to have a good look at what's doable with your approach > anyway, but I am happy if someone gives me old-school alternatives :) > > cheers, > m. > >> #################### >> """stack equations with different known parameters >> >> I didn't get curve_fit to work with only 1 parameter to estimate >> >> Created on Thu Nov 12 11:17:21 2009 >> Author: josef-pktd >> """ >> import numpy as np >> from scipy import optimize >> >> >> def fsingle(a,c,b,x): >> ? ? return b*x**a + c >> >> atrue = 1. >> ctrue = 10. >> b = np.array([[1.]*10, [2.]*10, [3.]*10]) >> b = np.array([1.,2.,3.]) >> x = np.random.uniform(size=(3,10)) >> y = np.hstack([fsingle(atrue, ctrue, b[i], x[i]) for i in range(x.shape[0])]) >> y += 0.1*np.random.normal(size=y.shape) >> >> def fun(x,a,c): >> ? ? #b is taken from enclosing scope >> ? ? #print x.shape >> ? ? xx=x.reshape((3,10)) >> ? ? return np.hstack([fsingle(a, c, b[i], xx[i]) for i in range(xx.shape[0])]) >> >> res = optimize.curve_fit(fun,x.ravel(),y, p0=np.array([2.,1.])) >> >> print 'true parameters ? ', atrue, ctrue >> print 'parameter estimate', res[0] >> print 'standard deviation', np.sqrt(np.diag(res[1])) >> #################### >> >> >> >> >>> sorry, >>> m. 
>>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From devicerandom at gmail.com Fri Nov 13 11:30:07 2009 From: devicerandom at gmail.com (ms) Date: Fri, 13 Nov 2009 16:30:07 +0000 Subject: [SciPy-User] ODR fitting several equations to the same parameters In-Reply-To: <1cd32cbb0911130800q7d850211xcedd559e4325a26@mail.gmail.com> References: <4AFAE5AF.3020506@gmail.com> <1cd32cbb0911120644o45d5c2a2pbb8763f1728cf3bd@mail.gmail.com> <4AFC2407.1020902@gmail.com> <1cd32cbb0911120855w47c06d2s2c0eed035f6192e0@mail.gmail.com> <4AFD7B33.1040904@gmail.com> <1cd32cbb0911130800q7d850211xcedd559e4325a26@mail.gmail.com> Message-ID: <4AFD898F.2040308@gmail.com> josef.pktd at gmail.com ha scritto: > On Fri, Nov 13, 2009 at 10:28 AM, ms wrote: >> josef.pktd at gmail.com ha scritto: >>> On Thu, Nov 12, 2009 at 10:04 AM, ms wrote: >>>> josef.pktd at gmail.com ha scritto: >>> an example >>> (quickly written and not optimized, there are parts I don't remember >>> about curve_fit, fixed parameters could be better handled by a class) >> Hmm, it seems I don't have curve_fit -I am constrained to use >> scipy-0.6.0 and there's no chance to change that (it's an external server). > > You can just copy the function (plus 2 helper functions) from the > current trunk. You would need to add the imports. Alternatively you > can just use optimize.leastsq directly, using curve_fit as a recipe. Thanks, but I've seen that with a bit of tweaking it works good with ODR too. Thanks a lot, really nice trick! A polished version of it should go in the cookbook in my opinion. thanks! m. > > Josef > > http://projects.scipy.org/scipy/browser/trunk/scipy/optimize/minpack.py#L338 > > 338 def _general_function(params, xdata, ydata, function): > 339 return function(xdata, *params) - ydata > 340 > 341 def _weighted_general_function(params, xdata, ydata, function, weights): > 342 return weights * (function(xdata, *params) - ydata) > 343 > 344 def curve_fit(f, xdata, ydata, p0=None, sigma=None, **kw): > > > >> I am going to have a good look at what's doable with your approach >> anyway, but I am happy if someone gives me old-school alternatives :) >> >> cheers, >> m. >> >>> #################### >>> """stack equations with different known parameters >>> >>> I didn't get curve_fit to work with only 1 parameter to estimate >>> >>> Created on Thu Nov 12 11:17:21 2009 >>> Author: josef-pktd >>> """ >>> import numpy as np >>> from scipy import optimize >>> >>> >>> def fsingle(a,c,b,x): >>> return b*x**a + c >>> >>> atrue = 1. >>> ctrue = 10. 
>>> b = np.array([[1.]*10, [2.]*10, [3.]*10]) >>> b = np.array([1.,2.,3.]) >>> x = np.random.uniform(size=(3,10)) >>> y = np.hstack([fsingle(atrue, ctrue, b[i], x[i]) for i in range(x.shape[0])]) >>> y += 0.1*np.random.normal(size=y.shape) >>> >>> def fun(x,a,c): >>> #b is taken from enclosing scope >>> #print x.shape >>> xx=x.reshape((3,10)) >>> return np.hstack([fsingle(a, c, b[i], xx[i]) for i in range(xx.shape[0])]) >>> >>> res = optimize.curve_fit(fun,x.ravel(),y, p0=np.array([2.,1.])) >>> >>> print 'true parameters ', atrue, ctrue >>> print 'parameter estimate', res[0] >>> print 'standard deviation', np.sqrt(np.diag(res[1])) >>> #################### >>> >>> >>> >>> >>>> sorry, >>>> m. >>>> _______________________________________________ >>>> SciPy-User mailing list >>>> SciPy-User at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/scipy-user >>>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From devicerandom at gmail.com Fri Nov 13 11:44:35 2009 From: devicerandom at gmail.com (ms) Date: Fri, 13 Nov 2009 16:44:35 +0000 Subject: [SciPy-User] ODR fitting several equations to the same parameters In-Reply-To: <1cd32cbb0911130800q7d850211xcedd559e4325a26@mail.gmail.com> References: <4AFAE5AF.3020506@gmail.com> <1cd32cbb0911120644o45d5c2a2pbb8763f1728cf3bd@mail.gmail.com> <4AFC2407.1020902@gmail.com> <1cd32cbb0911120855w47c06d2s2c0eed035f6192e0@mail.gmail.com> <4AFD7B33.1040904@gmail.com> <1cd32cbb0911130800q7d850211xcedd559e4325a26@mail.gmail.com> Message-ID: <4AFD8CF3.7090508@gmail.com> josef.pktd at gmail.com ha scritto: > On Fri, Nov 13, 2009 at 10:28 AM, ms wrote: >> josef.pktd at gmail.com ha scritto: >>> On Thu, Nov 12, 2009 at 10:04 AM, ms wrote: >>>> josef.pktd at gmail.com ha scritto: >>> an example >>> (quickly written and not optimized, there are parts I don't remember >>> about curve_fit, fixed parameters could be better handled by a class) >> Hmm, it seems I don't have curve_fit -I am constrained to use >> scipy-0.6.0 and there's no chance to change that (it's an external server). > > You can just copy the function (plus 2 helper functions) from the > current trunk. You would need to add the imports. Alternatively you > can just use optimize.leastsq directly, using curve_fit as a recipe. A further question: It seems to me it works only if the data sets have the same size, because what gets minimized is then the matrix. What about datasets with different sizes? thanks, m. 
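On the curve_fit question raised a few messages back: the same stacked fit can also be run directly with optimize.leastsq, which is available in scipy 0.6.0. A minimal sketch, reusing the simulated setup from Josef's earlier example; the starting values and noise level are illustrative only.

####################
import numpy as np
from scipy import optimize

def fsingle(a, c, b, x):
    return b*x**a + c

# same simulated setup as in the earlier post
atrue, ctrue = 1., 10.
b = np.array([1., 2., 3.])
x = np.random.uniform(size=(3, 10))
y = np.hstack([fsingle(atrue, ctrue, b[i], x[i]) for i in range(3)])
y += 0.1*np.random.normal(size=y.shape)

def residuals(params):
    # stacked residual vector over all three groups
    a, c = params
    yhat = np.hstack([fsingle(a, c, b[i], x[i]) for i in range(3)])
    return yhat - y

p, cov_x, infodict, mesg, ier = optimize.leastsq(residuals, [2., 1.],
                                                 full_output=True)

# rescale cov_x by the residual variance, which is what curve_fit does
# internally, to get parameter standard errors
dof = y.size - len(p)
s_sq = (residuals(p)**2).sum()/dof
print 'true parameters   ', atrue, ctrue
print 'parameter estimate', p
print 'standard deviation', np.sqrt(np.diag(cov_x*s_sq))
####################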
From oliphant at enthought.com Fri Nov 13 11:56:21 2009 From: oliphant at enthought.com (Travis Oliphant) Date: Fri, 13 Nov 2009 10:56:21 -0600 Subject: [SciPy-User] ODR fitting several equations to the same parameters In-Reply-To: <4AFD7B33.1040904@gmail.com> References: <4AFAE5AF.3020506@gmail.com> <1cd32cbb0911120644o45d5c2a2pbb8763f1728cf3bd@mail.gmail.com> <4AFC2407.1020902@gmail.com> <1cd32cbb0911120855w47c06d2s2c0eed035f6192e0@mail.gmail.com> <4AFD7B33.1040904@gmail.com> Message-ID: <8530A83C-7525-4928-8C69-65E7249452F7@enthought.com> On Nov 13, 2009, at 9:28 AM, ms wrote: > josef.pktd at gmail.com ha scritto: >> On Thu, Nov 12, 2009 at 10:04 AM, ms wrote: >>> josef.pktd at gmail.com ha scritto: >> an example >> (quickly written and not optimized, there are parts I don't remember >> about curve_fit, fixed parameters could be better handled by a class) > > Hmm, it seems I don't have curve_fit -I am constrained to use > scipy-0.6.0 and there's no chance to change that (it's an external > server). > The code for curve_fit is pure python. You can grab it from the scipy trunk and just insert it into your own code. http://projects.scipy.org/scipy/browser/trunk/scipy/optimize/minpack.py Get lines 338 through 430 and make sure the module you put them in also has the lines: from scipy.optimize import leastsq from numpy import isscalar, asarray -Travis From josef.pktd at gmail.com Fri Nov 13 12:18:35 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 13 Nov 2009 12:18:35 -0500 Subject: [SciPy-User] ODR fitting several equations to the same parameters In-Reply-To: <4AFD8CF3.7090508@gmail.com> References: <4AFAE5AF.3020506@gmail.com> <1cd32cbb0911120644o45d5c2a2pbb8763f1728cf3bd@mail.gmail.com> <4AFC2407.1020902@gmail.com> <1cd32cbb0911120855w47c06d2s2c0eed035f6192e0@mail.gmail.com> <4AFD7B33.1040904@gmail.com> <1cd32cbb0911130800q7d850211xcedd559e4325a26@mail.gmail.com> <4AFD8CF3.7090508@gmail.com> Message-ID: <1cd32cbb0911130918y71ef45a7h8515164d163a553d@mail.gmail.com> On Fri, Nov 13, 2009 at 11:44 AM, ms wrote: > josef.pktd at gmail.com ha scritto: >> On Fri, Nov 13, 2009 at 10:28 AM, ms wrote: >>> josef.pktd at gmail.com ha scritto: >>>> On Thu, Nov 12, 2009 at 10:04 AM, ms wrote: >>>>> josef.pktd at gmail.com ha scritto: >>>> an example >>>> (quickly written and not optimized, there are parts I don't remember >>>> about curve_fit, fixed parameters could be better handled by a class) >>> Hmm, it seems I don't have curve_fit -I am constrained to use >>> scipy-0.6.0 and there's no chance to change that (it's an external server). >> >> You can just copy the function (plus 2 helper functions) from the >> current trunk. You would need to add the imports. Alternatively you >> can just use optimize.leastsq directly, using curve_fit as a recipe. > > A further question: It seems to me it works only if the data sets have > the same size, because what gets minimized is then the matrix. What > about datasets with different sizes? In the example, I just did the stacking based on the 2d array to have it quickly written, for unequal sized data groups it is easier to work directly with the stacked array, and just index into it, or for example create a `b` array that has the values repeated corresponding to the group sizes. (Same story as with balanced versus unbalance panels.) Do you have a non-linear ODR example? I didn't even know ODR can do non-linear parameter estimation. Josef > > thanks, > m. 
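A short sketch of the repeated-`b` idea for groups of different lengths; the group sizes and values below are made up for illustration.

####################
import numpy as np

# hypothetical group sizes and known per-group parameters
sizes = [5, 12, 8]
b_groups = np.array([1., 2., 3.])

# one long stacked sample instead of a rectangular (3, 10) array
x_all = np.concatenate([np.random.uniform(size=n) for n in sizes])
b_all = np.repeat(b_groups, sizes)   # known b value for every observation

def fun_stacked(x, a, c):
    # same model as fsingle above, evaluated on the stacked data in one go
    return b_all*x**a + c

# fun_stacked can now be passed to curve_fit, or wrapped in a residual
# function for optimize.leastsq, exactly as in the balanced example.
####################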
> _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Fri Nov 13 12:21:14 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 13 Nov 2009 12:21:14 -0500 Subject: [SciPy-User] ODR fitting several equations to the same parameters In-Reply-To: <1cd32cbb0911130918y71ef45a7h8515164d163a553d@mail.gmail.com> References: <4AFAE5AF.3020506@gmail.com> <1cd32cbb0911120644o45d5c2a2pbb8763f1728cf3bd@mail.gmail.com> <4AFC2407.1020902@gmail.com> <1cd32cbb0911120855w47c06d2s2c0eed035f6192e0@mail.gmail.com> <4AFD7B33.1040904@gmail.com> <1cd32cbb0911130800q7d850211xcedd559e4325a26@mail.gmail.com> <4AFD8CF3.7090508@gmail.com> <1cd32cbb0911130918y71ef45a7h8515164d163a553d@mail.gmail.com> Message-ID: <1cd32cbb0911130921y5e568718uf78f4c22227130c6@mail.gmail.com> On Fri, Nov 13, 2009 at 12:18 PM, wrote: > On Fri, Nov 13, 2009 at 11:44 AM, ms wrote: >> josef.pktd at gmail.com ha scritto: >>> On Fri, Nov 13, 2009 at 10:28 AM, ms wrote: >>>> josef.pktd at gmail.com ha scritto: >>>>> On Thu, Nov 12, 2009 at 10:04 AM, ms wrote: >>>>>> josef.pktd at gmail.com ha scritto: >>>>> an example >>>>> (quickly written and not optimized, there are parts I don't remember >>>>> about curve_fit, fixed parameters could be better handled by a class) >>>> Hmm, it seems I don't have curve_fit -I am constrained to use >>>> scipy-0.6.0 and there's no chance to change that (it's an external server). >>> >>> You can just copy the function (plus 2 helper functions) from the >>> current trunk. You would need to add the imports. Alternatively you >>> can just use optimize.leastsq directly, using curve_fit as a recipe. >> >> A further question: It seems to me it works only if the data sets have >> the same size, because what gets minimized is then the matrix. What >> about datasets with different sizes? > > In the example, I just did the stacking based on the 2d array to have > it quickly written, for unequal sized data groups it is easier to work > directly with the stacked array, and just index into it, or for > example create a `b` array that has the values repeated corresponding > to the group sizes. (Same story as with balanced versus unbalance > panels.) > > Do you have a non-linear ODR example? I didn't even know ODR can do > non-linear parameter estimation. Or maybe I knew it and just forgot about it. > > Josef > > > >> >> thanks, >> m. >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > From d.l.goldsmith at gmail.com Fri Nov 13 15:23:10 2009 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Fri, 13 Nov 2009 12:23:10 -0800 Subject: [SciPy-User] For DavidC, relevant to Windoze in general: BLAS/LAPACK installer Message-ID: <45d1ab480911131223t3b2a6ac3l97a2dbd81eb758f8@mail.gmail.com> Hi, David C. (and all). Searching the archives for "Windows BLAS," I found your 2008 post announcing an alpha version of a BLAS/LAPACK Windows installer "superpack," but the link therein appears to be dead (I get a 404 not found error). What's the status of this endeavor? Have you "recalled" or stopped developing the product? If so, what's your present recommendation for installing these in Vista? 
(I found an interesting 2008 paper "Choosing the optimal BLAS and LAPACK library," which has a list indicating ATLAS, Goto, and Intel MKL as "better-than-reference" tested implementations on Intel-based architecture, but in Linux. Also, it indicates that Goto is BLAS-only - is this all I need for a viable build of scipy from source?) Thanks! David G. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dwf at cs.toronto.edu Fri Nov 13 17:08:02 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Fri, 13 Nov 2009 17:08:02 -0500 Subject: [SciPy-User] For DavidC, relevant to Windoze in general: BLAS/LAPACK installer In-Reply-To: <45d1ab480911131223t3b2a6ac3l97a2dbd81eb758f8@mail.gmail.com> References: <45d1ab480911131223t3b2a6ac3l97a2dbd81eb758f8@mail.gmail.com> Message-ID: On 13-Nov-09, at 3:23 PM, David Goldsmith wrote: > (I found an interesting 2008 > paper "Choosing the optimal BLAS and LAPACK library," which has a list > indicating ATLAS, Goto, and Intel MKL as "better-than-reference" > tested > implementations on Intel-based architecture, but in Linux. Also, it > indicates that Goto is BLAS-only - is this all I need for a viable > build of > scipy from source?) Thanks! ATLAS is also "BLAS-only" (i.e. if you download the ATLAS source tarball and build, all you will get are ATLAS's tuned implementations of the Basic Linear Algebra Subroutines, LAPACK must be downloaded separately). LAPACK routines rely on BLAS, The difference is that I think ATLAS has support for building an (optimized?) LAPACK. I think it may also do some auto-tuning of LAPACK compiler flags or something. I'm afraid I have no clue about building on Windows, sorry. If you get it working, document your steps so we can put it on the website (I'm pretty sure the version on the wiki/in SVN now are out of date). David From sturla at molden.no Fri Nov 13 17:10:13 2009 From: sturla at molden.no (Sturla Molden) Date: Fri, 13 Nov 2009 23:10:13 +0100 Subject: [SciPy-User] For DavidC, relevant to Windoze in general: BLAS/LAPACK installer In-Reply-To: <45d1ab480911131223t3b2a6ac3l97a2dbd81eb758f8@mail.gmail.com> References: <45d1ab480911131223t3b2a6ac3l97a2dbd81eb758f8@mail.gmail.com> Message-ID: <4AFDD945.60507@molden.no> David Goldsmith skrev: > Also, it indicates that Goto is BLAS-only - is this all I need for a > viable build of scipy from source?) Thanks! The heavy lifiting in LAPACK is delegated to BLAS. So if you build LAPACK against ATLAS or GotoBLAS, LAPACK does not tend to be the bottleneck. Both ATLAS and GotoBLAS reimplements some routines from LAPACK. But they will not give you a full LAPACK. To avoid LAPACK overshadowing LAPACK routines in GotoBLAS, see this: http://jupiter.ethz.ch/~dmay/Research/GotoBLAS/index.html MKL has both LAPACK and BLAS, but it is very expensive. I don't know if Intel has reimplemented LAPACK, but that would surprise me, as it would suffice to make a fast BLAS. From sturla at molden.no Fri Nov 13 17:16:19 2009 From: sturla at molden.no (Sturla Molden) Date: Fri, 13 Nov 2009 23:16:19 +0100 Subject: [SciPy-User] For DavidC, relevant to Windoze in general: BLAS/LAPACK installer In-Reply-To: References: <45d1ab480911131223t3b2a6ac3l97a2dbd81eb758f8@mail.gmail.com> Message-ID: <4AFDDAB3.5060807@molden.no> David Warde-Farley skrev: > I'm afraid I have no clue about building on Windows, sorry. If you get > it working, document your steps so we can put it on the website (I'm > pretty sure the version on the wiki/in SVN now are out of date). 
> > At least ATLAS needs to be built using Cygwin, which is a PITA. And from what I know, there is no 64 bit support in Cygwin either, so we always end up with a 32 bit ATLAS. And considering that GotoBLAS is claimed to speed up Matlab (ATLAS or MKL being defaults), the choise is not difficult... From d.l.goldsmith at gmail.com Fri Nov 13 17:32:09 2009 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Fri, 13 Nov 2009 14:32:09 -0800 Subject: [SciPy-User] For DavidC, relevant to Windoze in general: BLAS/LAPACK installer In-Reply-To: <4AFDDAB3.5060807@molden.no> References: <45d1ab480911131223t3b2a6ac3l97a2dbd81eb758f8@mail.gmail.com> <4AFDDAB3.5060807@molden.no> Message-ID: <45d1ab480911131432h9e02a3ajcb5f06e9c2c12ffb@mail.gmail.com> Thanks, both, sounds like Goto is the way to go unless David C. still has/supports his MSI for BLAS for Windoze. (Speeding up Matlab isn't an issue for me - except to the extent that it implies speeding up numpy/scipy - as I've committed myself to eventually converting/refactoring all my old Matlab code to Python and its modules.) :-) DG On Fri, Nov 13, 2009 at 2:16 PM, Sturla Molden wrote: > David Warde-Farley skrev: > > I'm afraid I have no clue about building on Windows, sorry. If you get > > it working, document your steps so we can put it on the website (I'm > > pretty sure the version on the wiki/in SVN now are out of date). > > > > > > At least ATLAS needs to be built using Cygwin, which is a PITA. And from > what I know, there is no 64 bit support in Cygwin either, so we always > end up with a 32 bit ATLAS. > > And considering that GotoBLAS is claimed to speed up Matlab (ATLAS or > MKL being defaults), the choise is not difficult... > > > > > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Fri Nov 13 17:44:47 2009 From: sturla at molden.no (Sturla Molden) Date: Fri, 13 Nov 2009 23:44:47 +0100 Subject: [SciPy-User] For DavidC, relevant to Windoze in general: BLAS/LAPACK installer In-Reply-To: <45d1ab480911131432h9e02a3ajcb5f06e9c2c12ffb@mail.gmail.com> References: <45d1ab480911131223t3b2a6ac3l97a2dbd81eb758f8@mail.gmail.com> <4AFDDAB3.5060807@molden.no> <45d1ab480911131432h9e02a3ajcb5f06e9c2c12ffb@mail.gmail.com> Message-ID: <4AFDE15F.7020700@molden.no> David Goldsmith skrev: > Thanks, both, sounds like Goto is the way to go unless David C. still > has/supports his MSI for BLAS for Windoze. (Speeding up Matlab isn't > an issue for me - except to the extent that it implies speeding up > numpy/scipy - as I've committed myself to eventually > converting/refactoring all my old Matlab code to Python and its > modules.) :-) GotoBLAS will speed up linear algebra computation with NumPy and SciPy, and it is trivial to build. The catch is that GotoBLAS is only free for personal or academic use. Note that most use of NumPy or SciPy do not involve heavy use of LAPACK, and the computational bottleneck tends to be in the creation of temporary arrays, not in CPU-intensive computation. 
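[A small aside, not from the message above, to make the point about temporary arrays concrete: an expression like a * 2 + b allocates an intermediate array for a * 2 before the addition, while in-place operations reuse the result buffer.]

import numpy as np

a = np.ones(10**7)
b = np.ones(10**7)

c = a * 2 + b    # allocates a temporary for (a * 2), then the result array

c2 = a * 2       # one allocation, which is also the result array
c2 += b          # in-place add, no further temporaries

assert np.allclose(c, c2)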
From sturla at molden.no Fri Nov 13 18:06:46 2009 From: sturla at molden.no (Sturla Molden) Date: Sat, 14 Nov 2009 00:06:46 +0100 Subject: [SciPy-User] For DavidC, relevant to Windoze in general: BLAS/LAPACK installer In-Reply-To: <45d1ab480911131223t3b2a6ac3l97a2dbd81eb758f8@mail.gmail.com> References: <45d1ab480911131223t3b2a6ac3l97a2dbd81eb758f8@mail.gmail.com> Message-ID: <4AFDE686.9080102@molden.no> David Goldsmith skrev: > Hi, David C. (and all). Searching the archives for "Windows BLAS," I > found your 2008 post announcing an alpha version of a BLAS/LAPACK > Windows installer "superpack," but the link therein appears to be dead > (I get a 404 not found error). I'll just mention that re-distribution of GotoBLAS is prohibited. So if SciPy is to provide a BLAS superpack it will have to be ATLAS. That is difficult as ATLAS requires more tuning parameters than GotoBLAS+LAPACK. The performance of Intel's MKL and AMD's ACML is only slightly less than GotoBLAS. If you cannot build NumPy with GotoBLAS and LAPACK, I suggest you use MKL or ACML instead. By the way, it would be better if NumPy and SciPy had a way of replacing it's BLAS and LAPACK libraries. For example if they were not statically linked, as today, they could be a DLL whose path would be found in a config file. Sturla From cournape at gmail.com Fri Nov 13 18:21:48 2009 From: cournape at gmail.com (David Cournapeau) Date: Sat, 14 Nov 2009 08:21:48 +0900 Subject: [SciPy-User] For DavidC, relevant to Windoze in general: BLAS/LAPACK installer In-Reply-To: <4AFDE686.9080102@molden.no> References: <45d1ab480911131223t3b2a6ac3l97a2dbd81eb758f8@mail.gmail.com> <4AFDE686.9080102@molden.no> Message-ID: <5b8d13220911131521qcbe5ac1laa3ab510b2e09680@mail.gmail.com> On Sat, Nov 14, 2009 at 8:06 AM, Sturla Molden wrote: > > By the way, it would be better if NumPy and ?SciPy had a way of > replacing it's BLAS and LAPACK libraries. For example if they were not > statically linked, as today, they could be a DLL whose path would be > found in a config file. Yes, we know, but that is difficult, partly because of windows limitations, partly because every blas/lapack has different conventions (name mangling for example). It is not impossible, but that's quite a lot of work to make it work reliably. David From pgmdevlist at gmail.com Fri Nov 13 18:54:23 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Fri, 13 Nov 2009 18:54:23 -0500 Subject: [SciPy-User] scikits.timeseries concatenate In-Reply-To: References: Message-ID: <0872A453-272D-4520-907B-8468146B8AB3@gmail.com> On Nov 13, 2009, at 4:14 AM, Dave Hirschfeld wrote: > > It appears that when remove_duplicates is True (the default) ts.concatenate > doesn't respect the dimensions of the data array c.f. Good call, and thanks for the fix. I gonna investigate some more and let you know... From d.l.goldsmith at gmail.com Fri Nov 13 19:16:07 2009 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Fri, 13 Nov 2009 16:16:07 -0800 Subject: [SciPy-User] For DavidC, relevant to Windoze in general: BLAS/LAPACK installer In-Reply-To: <5b8d13220911131521qcbe5ac1laa3ab510b2e09680@mail.gmail.com> References: <45d1ab480911131223t3b2a6ac3l97a2dbd81eb758f8@mail.gmail.com> <4AFDE686.9080102@molden.no> <5b8d13220911131521qcbe5ac1laa3ab510b2e09680@mail.gmail.com> Message-ID: <45d1ab480911131616y7f34c41cx5a6c92723c9375ed@mail.gmail.com> Hi, David. I take it you concur w/ Sturla's rec. that try Goto first? 
DG On Fri, Nov 13, 2009 at 3:21 PM, David Cournapeau wrote: > On Sat, Nov 14, 2009 at 8:06 AM, Sturla Molden wrote: > > > > > By the way, it would be better if NumPy and SciPy had a way of > > replacing it's BLAS and LAPACK libraries. For example if they were not > > statically linked, as today, they could be a DLL whose path would be > > found in a config file. > > Yes, we know, but that is difficult, partly because of windows > limitations, partly because every blas/lapack has different > conventions (name mangling for example). > > It is not impossible, but that's quite a lot of work to make it work > reliably. > > David > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From amenity at enthought.com Sat Nov 14 14:18:04 2009 From: amenity at enthought.com (Amenity Applewhite) Date: Sat, 14 Nov 2009 13:18:04 -0600 Subject: [SciPy-User] November 20 Webinar: Interpolation with NumPy/SciPy References: <1102826606498.1102424111856.4101.9.1814156C@scheduler> Message-ID: Having trouble viewing this email? Click here Friday, November 20: Interpolation with NumPy/SciPy Dear Amenity, It's time for our mid-month Scientific Computing with Python webinar! This month's topic is sure to prove very useful for data analysts: Interpolation with NumPy and SciPy. In many data-processing scenarios, it is necessary to use a discrete set of available data-points to infer the value of a function at a new data-point. One approach to this problem is interpolation, which constructs a new model-function that goes through the original data- points. There are many forms of interpolation - polynomial, spline, kriging, radial basis function, etc. - and SciPy includes some of these interpolation forms. This webinar will review the interpolation modules available in SciPy and in the larger Python community and provide instruction on their use via example. Scientific Computing with Python Webinar: Interpolation with NumPy/SciPy Friday, November 20 1pm CDT/7pm UTC Register at GoToMeeting We look forward to seeing you Friday! As always, feel free to contact us with questions, concerns, or suggestions for future webinar topics. Thanks, The Enthought Team QUICK LINKS ::: www.enthought.com code.enthought.com Facebook Enthought Blog Forward email This email was sent to amenity at enthought.com by amenity at enthought.com. Update Profile/Email Address | Instant removal with SafeUnsubscribe? | Privacy Policy. Enthought, Inc. | 515 Congress Ave. | Suite 2100 | Austin | TX | 78701 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mattknox.ca at gmail.com Sun Nov 15 15:47:51 2009 From: mattknox.ca at gmail.com (Matt Knox) Date: Sun, 15 Nov 2009 20:47:51 +0000 (UTC) Subject: [SciPy-User] ANN: scikits.timeseries 0.91.3 Message-ID: We are pleased to announce the release of scikits.timeseries 0.91.3 This is a bug fix release and is recommended for all users. Home page: http://pytseries.sourceforge.net/ Please see the website for installation requirements and download details. 
Bug Fixes --------- * general improvements for tsfromtxt * accept datetime objects for 'value' positional arg in Date class * fixes for compatibility with matplotlib 0.99.1 * fix problem with '%j' directive in strftime method * fix problem with concatenate and 2-d series * fixed crash in reportlib.Report class when fixed_width=False and a header_row were specified at same time Thanks, Matt Knox & Pierre Gerard-Marchant From cohen at lpta.in2p3.fr Sun Nov 15 16:02:38 2009 From: cohen at lpta.in2p3.fr (Johann Cohen-Tanugi) Date: Sun, 15 Nov 2009 22:02:38 +0100 Subject: [SciPy-User] Unit testing of Bayesian estimator In-Reply-To: References: Message-ID: <4B006C6E.6080800@lpta.in2p3.fr> Anne, do you know of a python implementation of Lomb-Scargle? Johann Anne Archibald wrote: > Hi, > > I have implemented a simple Bayesian regression program (it takes > events modulo one and returns a posterior probability that the data is > phase-invariant plus a posterior distribution for two parameters > (modulation fraction and phase) in case there is modulation). I'm > rather new at this, so I'd like to construct some unit tests. Does > anyone have any suggestions on how to go about this? > > For a frequentist periodicity detector, the return value is a > probability that, given the null hypothesis is true, the statistic > would be this extreme. So I can construct a strong unit test by > generating a collection of data sets given the null hypothesis, > evaluating the statistic, and seeing whether the number that claim to > be significant at a 5% level is really 5%. (In fact I can use the > binomial distribution to get limits on the number of false positive.) > This gives me a unit test that is completely orthogonal to my > implementation, and that passes if and only if the code works. For a > Bayesian hypothesis testing setup, I don't really see how to do > something analogous. > > I can generate non-modulated data sets and confirm that my code > returns a high probability that the data is not modulated, but how > high should I expect the probability to be? I can generate data sets > with models with known parameters and check that the best-fit > parameters are close to the known parameters - but how close? Even if > I do it many times, is the posterior mean unbiased? What about the > posterior mode or median? I can even generate models and then data > sets that are drawn from the prior distribution, but what should I > expect from the code output on such a data set? I feel sure there's > some test that verifies a statistical property of Bayesian > estimators/hypothesis testers, but I cant quite put my finger on it. > > Suggestions welcome. > > Thanks, > Anne > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From dpfrota at yahoo.com.br Sun Nov 15 19:00:47 2009 From: dpfrota at yahoo.com.br (dpfrota) Date: Sun, 15 Nov 2009 16:00:47 -0800 (PST) Subject: [SciPy-User] [SciPy-user] Audiolab on Py2.6 Message-ID: <26355930.post@talk.nabble.com> I got the exactly same error... More tips? Thanks David Cournapeau wrote: > > On Tue, Oct 27, 2009 at 3:04 AM, Christopher Brown wrote: >> Thanks for the suggestion. However, audiolab didn't need it installed on >> Python 2.5. I also see the file _sndfile.dll in the audiolab folder, >> which I assume contains the sndfile code (it is ~3.5mb). >> >> I installed it anyway, and I copied the dll into the audiolab folder, >> but the error persists. Any other suggestions? 
> > There is indeed a problem on 2.6, but I have not found the time to > look at it. Most likely linked to the manifest nonsense on windows, > > David > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- View this message in context: http://old.nabble.com/Audiolab-on-Py2.6-tp26064218p26355930.html Sent from the Scipy-User mailing list archive at Nabble.com. From mudit_19a at yahoo.com Sun Nov 15 19:25:31 2009 From: mudit_19a at yahoo.com (mudit sharma) Date: Mon, 16 Nov 2009 05:55:31 +0530 (IST) Subject: [SciPy-User] Pytseries numpy func error In-Reply-To: References: <4AF8842D.5010805@ucsf.edu> <835246.33088.qm@web94906.mail.in2.yahoo.com> Message-ID: <814366.30498.qm@web94914.mail.in2.yahoo.com> actually i figured that out it throws that error when data array is of dtype object In [74]: data = npy.array([-1840.0,-1550.0,-940.0,2660.0,190.0,3980.0,1130.0,2090.0,1980.0,1220.0,-1220.0,1140.0,-2420.0,2200.0,370.0,230.0,-60.0,2550.0,970.0,660.0,-20.0,50.0,-980.0,6580.0,4090.0,3240.0,-350.0,-1800.0,2020.0,5050.0,-110.0,-330.0,-2290.0], dtype=npy.object) In [75]: dates = "Mar-2007","Apr-2007","May-2007","Jun-2007","Jul-2007","Aug-2007","Sep-2007","Oct-2007","Nov-2007","Dec-2007","Jan-2008","Feb-2008","Mar-2008","Apr-2008","May-2008","Jun-2008","Jul-2008","Aug-2008","Sep-2008","Oct-2008","Nov-2008","Dec-2008","Jan-2009","Feb-2009","Mar-2009","Apr-2009","May-2009","Jun-2009","Jul-2009","Aug-2009","Sep-2009","Oct-2009","Nov-2009" In [76]: series = ts.time_series(data,dates, freq="M") ----- Original Message ---- From: Matt Knox To: scipy-user at scipy.org Sent: Tue, 10 November, 2009 22:29:41 Subject: Re: [SciPy-User] Pytseries numpy func error > series.sum() gives this error whereas series.data.sum() > works. I don't get this error when trying a sum on a TimeSeries object. I noticed you are using an older version of the timeseries module. Can you try upgrading to the latest version and see if you still get an error? Also, if you still get the error please post a small example demonstrating how to get the error, thanks. Also, note that we will probably be doing a new minor bug fix release within the next week or two. - Matt _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user From pgmdevlist at gmail.com Sun Nov 15 19:58:10 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Sun, 15 Nov 2009 19:58:10 -0500 Subject: [SciPy-User] Pytseries numpy func error In-Reply-To: <814366.30498.qm@web94914.mail.in2.yahoo.com> References: <4AF8842D.5010805@ucsf.edu> <835246.33088.qm@web94906.mail.in2.yahoo.com> <814366.30498.qm@web94914.mail.in2.yahoo.com> Message-ID: <11764FA5-B832-4397-9C76-6EEBF3A82AEA@gmail.com> On Nov 15, 2009, at 7:25 PM, mudit sharma wrote: > > actually i figured that out it throws that error when data array is of dtype object Confirmed. The bug is in numpy.ma, I'll check that later this evening... From peridot.faceted at gmail.com Sun Nov 15 21:49:23 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Sun, 15 Nov 2009 21:49:23 -0500 Subject: [SciPy-User] Unit testing of Bayesian estimator In-Reply-To: <4B006C6E.6080800@lpta.in2p3.fr> References: <4B006C6E.6080800@lpta.in2p3.fr> Message-ID: 2009/11/15 Johann Cohen-Tanugi : > Anne, do you know of a python implementation of Lomb-Scargle? I don't, unfortunately. 
But as there seems to be no clever FFT-like trick to it, it can probably be written in a few lines of numpy code - simply take an array of frequencies and an array of times, broadcast them together, and apply the formulas. If you have a lot of events or a lot of frequencies, a loop over the smaller array will save a big intermediate array, but beyond that I don't think there's much cleverness to be put in. Anne From peridot.faceted at gmail.com Sun Nov 15 22:22:28 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Sun, 15 Nov 2009 22:22:28 -0500 Subject: [SciPy-User] Unit testing of Bayesian estimator In-Reply-To: References: <4AF84EF1.2090608@gmail.com> <4AF871BE.6050300@gmail.com> Message-ID: Thank you everyone for all your comments. I have managed to pull together a more-or-less satisfactory solution. If you're curious, I have written up the problem at: http://lighthouseinthesky.blogspot.com/2009/11/curve-fitting-part-3-bayesian-fitting.html and my solution so far at: http://lighthouseinthesky.blogspot.com/2009/11/curve-fitting-part-4-validating.html Thanks, Anne From gokhansever at gmail.com Tue Nov 17 00:44:17 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Mon, 16 Nov 2009 23:44:17 -0600 Subject: [SciPy-User] Fitting a curve on a log-normal distributed data Message-ID: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> Hello, I have a data which represents aerosol size distribution in between 0.1 to 3.0 micrometer ranges. I would like extrapolate the lower size down to 10 nm. The data in this context is log-normally distributed. Therefore I am looking a way to fit a log-normal curve onto my data. Could you please give me some pointers to solve this problem? Thank you. -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Tue Nov 17 00:51:19 2009 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 16 Nov 2009 23:51:19 -0600 Subject: [SciPy-User] Fitting a curve on a log-normal distributed data In-Reply-To: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> References: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> Message-ID: <3d375d730911162151v6f4db525ka8ccca864a32b162@mail.gmail.com> On Mon, Nov 16, 2009 at 23:44, G?khan Sever wrote: > Hello, > > I have a data which represents aerosol size distribution in between 0.1 to > 3.0 micrometer ranges. I would like extrapolate the lower size down to 10 > nm. The data in this context is log-normally distributed. Therefore I am > looking a way to fit a log-normal curve onto my data. Could you please give > me some pointers to solve this problem? Transform the data y=log(x) then estimate the mean and variance of y. With the appropriate transformations (which you will have to look up depending on the convention of the log-normal calculations that you are using), these are reasonable estimates of the log-normal distribution for your data. Or you could just stay in the transformed space. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
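[A short sketch of the transform approach described above; it is not from the original reply and uses a synthetic sample. Taking logs turns log-normal data into normal data, so the mean and standard deviation of the logs estimate the parameters; in scipy.stats' convention the log-normal shape is sigma and the scale is exp(mu).]

import numpy as np
from scipy import stats

x = stats.lognorm.rvs(0.8, scale=np.exp(0.2), size=1000)   # synthetic log-normal data

y = np.log(x)
mu, sigma = y.mean(), y.std()
print(mu, sigma)   # estimates of the underlying normal parameters

# frozen distribution with the estimated parameters (loc fixed at 0)
fitted = stats.lognorm(sigma, loc=0, scale=np.exp(mu))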
-- Umberto Eco From gokhansever at gmail.com Tue Nov 17 12:29:10 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Tue, 17 Nov 2009 11:29:10 -0600 Subject: [SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data In-Reply-To: References: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> Message-ID: <49d6b3500911170929rccaa0e7k75f1450a2f48e519@mail.gmail.com> On Tue, Nov 17, 2009 at 12:13 AM, Ian Mallett wrote: > Theory wise: > -Do a linear regression on your data. > -Apply a logrithmic transform to your data's dependent variable, and do > another linear regression. > -Apply a logrithmic transform to your data's independent variable, and do > another linear regression. > -Take the best regression (highest r^2 value) and execute a back transform. > > Then, to get your desired extrapolation, simply substitute in the size for > the independent variable to get the expected value. > > If, however, you're looking for how to implement this in NumPy or SciPy, I > can't really help :-P > Ian > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > OK, before applying your suggestions. I have a few more questions. Here is 1 real-sample data that I will use as a part of the log-normal fitting. There is 15 elements in this arrays each being a concentration for corresponding 0.1 - 3.0 um size ranges. I[74]: conc O[74]: array([ 119.7681, 118.546 , 146.6548, 96.5478, 109.9911, 32.9974, 20.7762, 6.1107, 12.2212, 3.6664, 3.6664, 1.2221, 2.4443, 2.4443, 3.6664]) For now not calibrated size range I just assume a linear array: I[78]: sizes = linspace(0.1, 3.0, 15) I[79]: sizes O[79]: array([ 0.1 , 0.30714286, 0.51428571, 0.72142857, 0.92857143, 1.13571429, 1.34285714, 1.55 , 1.75714286, 1.96428571, 2.17142857, 2.37857143, 2.58571429, 2.79285714, 3. ]) Not a very ideal looking log-normal, but so far I don't know what else besides a log-normal fit would give me a better estimate: I[80]: figure(); plot(sizes, conc) http://img406.imageshack.us/img406/156/sizeconc.png scipy.stats has the lognorm.fit lognorm.fit(data,s,loc=0,scale=1) - Parameter estimates for lognorm data and applying this to my data. However not sure the right way of calling it, and not sure if this could be applied to my case? I[81]: stats.lognorm.fit(conc) O[81]: array([ 2.31386066, 1.19126064, 9.5748391 ]) Lastly, what is the way to create a ideal log-normal sample using the stats.lognorm.rvs? Thanks -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue Nov 17 13:38:01 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 17 Nov 2009 13:38:01 -0500 Subject: [SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data In-Reply-To: <49d6b3500911170929rccaa0e7k75f1450a2f48e519@mail.gmail.com> References: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> <49d6b3500911170929rccaa0e7k75f1450a2f48e519@mail.gmail.com> Message-ID: <1cd32cbb0911171038w363e055ct3c59ba900ee1e8b4@mail.gmail.com> On Tue, Nov 17, 2009 at 12:29 PM, G?khan Sever wrote: > > > On Tue, Nov 17, 2009 at 12:13 AM, Ian Mallett wrote: >> >> Theory wise: >> -Do a linear regression on your data. >> -Apply a logrithmic transform to your data's dependent variable, and do >> another linear regression. 
>> -Apply a logrithmic transform to your data's independent variable, and do >> another linear regression. >> -Take the best regression (highest r^2 value) and execute a back >> transform. >> >> Then, to get your desired extrapolation, simply substitute in the size for >> the independent variable to get the expected value. >> >> If, however, you're looking for how to implement this in NumPy or SciPy, I >> can't really help :-P >> Ian >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > OK, before applying your suggestions. I have a few more questions. Here is 1 > real-sample data that I will use as a part of the log-normal fitting. There > is 15 elements in this arrays each being a concentration for corresponding > 0.1 - 3.0 um size ranges. > > I[74]: conc > O[74]: > array([ 119.7681,? 118.546 ,? 146.6548,?? 96.5478,? 109.9911,?? 32.9974, > ???????? 20.7762,??? 6.1107,?? 12.2212,??? 3.6664,??? 3.6664,??? 1.2221, > ????????? 2.4443,??? 2.4443,??? 3.6664]) > > For now not calibrated size range I just assume a linear array: > > I[78]: sizes = linspace(0.1, 3.0, 15) > > I[79]: sizes > O[79]: > array([ 0.1?????? ,? 0.30714286,? 0.51428571,? 0.72142857,? 0.92857143, > ??????? 1.13571429,? 1.34285714,? 1.55????? ,? 1.75714286,? 1.96428571, > ??????? 2.17142857,? 2.37857143,? 2.58571429,? 2.79285714,? 3.??????? ]) > > > Not a very ideal looking log-normal, but so far I don't know what else > besides a log-normal fit would give me a better estimate: > I[80]: figure(); plot(sizes, conc) > http://img406.imageshack.us/img406/156/sizeconc.png > > scipy.stats has the lognorm.fit > > ??? lognorm.fit(data,s,loc=0,scale=1) > ??????? - Parameter estimates for lognorm data > > and applying this to my data. However not sure the right way of calling it, > and not sure if this could be applied to my case? > > I[81]: stats.lognorm.fit(conc) > O[81]: array([ 2.31386066,? 1.19126064,? 9.5748391 ]) > > Lastly, what is the way to create a ideal log-normal sample using the > stats.lognorm.rvs? I don't think I understand the connection to the log-normal distribution. You seem to have a non-linear relationship conc = f(size) where you want to find a non-linear relationship f If conc where just lognormal distributed, then you would not get any relationship between conc and size. If you have many observations with conc, size pairs then you could estimate a noisy model conc = f(size) + u where the noise u is for example log-normal distributed but you would still need to get an expression for the non-linear function f. Extending a non-linear function outside of the observed range is essentially always just a guess or by assumption. If you want to fit a curve f that has the same shape as the pdf of the log-normal, then you cannot do it with lognorm.fit, because that just assumes you have a random sample independent of size. So, it's not clear to me what you really want, or what your sample data looks like (do you have only one 15 element sample or lots of them). 
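[A rough illustration, not from the thread, of the distinction drawn above: lognorm.fit expects a sample of particle sizes, not a concentration-versus-size curve. One crude way to apply it to binned data is to expand the bin centres into a pseudo-sample weighted by the counts; this ignores bin widths and the censored region below 0.1 um, so treat it only as a sanity check.]

import numpy as np
from scipy import stats

conc = np.array([119.7681, 118.546, 146.6548, 96.5478, 109.9911, 32.9974,
                 20.7762, 6.1107, 12.2212, 3.6664, 3.6664, 1.2221,
                 2.4443, 2.4443, 3.6664])
sizes = np.linspace(0.1, 3.0, 15)    # nominal bin centres, as assumed above

# repeat each bin centre in proportion to its concentration to build a
# rough pseudo-sample of particle sizes (x10 so small bins survive rounding)
counts = np.round(conc * 10).astype(int)
pseudo_sample = np.repeat(sizes, counts)

# fix loc=0 so only the shape (sigma) and scale (exp(mu)) are estimated
s, loc, scale = stats.lognorm.fit(pseudo_sample, floc=0)
print(s, loc, scale)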
Josef > > Thanks > > > -- > G?khan > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From robert.kern at gmail.com Tue Nov 17 13:57:46 2009 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 17 Nov 2009 12:57:46 -0600 Subject: [SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data In-Reply-To: <1cd32cbb0911171038w363e055ct3c59ba900ee1e8b4@mail.gmail.com> References: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> <49d6b3500911170929rccaa0e7k75f1450a2f48e519@mail.gmail.com> <1cd32cbb0911171038w363e055ct3c59ba900ee1e8b4@mail.gmail.com> Message-ID: <3d375d730911171057g4742284fge322cd4e471c7ec@mail.gmail.com> On Tue, Nov 17, 2009 at 12:38, wrote: > So, it's not clear to me what you really want, or what your sample data > looks like (do you have only one 15 element sample or lots of them). I'm guessing that they aren't really samples of (conc, size) pairs so much as binned data. Particles with sizes between 0.1 and 0.3 um (for example; I don't know where the bin edges actually are in his data) have a concentration of 119.7681 particles/. This can be normalized to a more proper histogrammed distribution, except that the lower end of the distribution below 0.1 um has been censored by his measuring process. He then wants to infer the continuous distribution that generated that censored histogram so he can predict what the distribution is in the censored region. So, I would say that it's a bit trickier than fitting the log-normal PDF to the data for a couple of reasons. 1) Directly fitting PDFs to histogram values is usually not a great idea to begin with. 2) We don't know how much probability mass is in the censored region. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From gokhansever at gmail.com Tue Nov 17 14:28:45 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Tue, 17 Nov 2009 13:28:45 -0600 Subject: [SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data In-Reply-To: <1cd32cbb0911171038w363e055ct3c59ba900ee1e8b4@mail.gmail.com> References: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> <49d6b3500911170929rccaa0e7k75f1450a2f48e519@mail.gmail.com> <1cd32cbb0911171038w363e055ct3c59ba900ee1e8b4@mail.gmail.com> Message-ID: <49d6b3500911171128q78672fc7r50975a06a7c91132@mail.gmail.com> On Tue, Nov 17, 2009 at 12:38 PM, wrote: > On Tue, Nov 17, 2009 at 12:29 PM, G?khan Sever > wrote: > > > > > > On Tue, Nov 17, 2009 at 12:13 AM, Ian Mallett > wrote: > >> > >> Theory wise: > >> -Do a linear regression on your data. > >> -Apply a logrithmic transform to your data's dependent variable, and do > >> another linear regression. > >> -Apply a logrithmic transform to your data's independent variable, and > do > >> another linear regression. > >> -Take the best regression (highest r^2 value) and execute a back > >> transform. > >> > >> Then, to get your desired extrapolation, simply substitute in the size > for > >> the independent variable to get the expected value. 
> >> > >> If, however, you're looking for how to implement this in NumPy or SciPy, > I > >> can't really help :-P > >> Ian > >> > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > > > > OK, before applying your suggestions. I have a few more questions. Here > is 1 > > real-sample data that I will use as a part of the log-normal fitting. > There > > is 15 elements in this arrays each being a concentration for > corresponding > > 0.1 - 3.0 um size ranges. > > > > I[74]: conc > > O[74]: > > array([ 119.7681, 118.546 , 146.6548, 96.5478, 109.9911, 32.9974, > > 20.7762, 6.1107, 12.2212, 3.6664, 3.6664, 1.2221, > > 2.4443, 2.4443, 3.6664]) > > > > For now not calibrated size range I just assume a linear array: > > > > I[78]: sizes = linspace(0.1, 3.0, 15) > > > > I[79]: sizes > > O[79]: > > array([ 0.1 , 0.30714286, 0.51428571, 0.72142857, 0.92857143, > > 1.13571429, 1.34285714, 1.55 , 1.75714286, 1.96428571, > > 2.17142857, 2.37857143, 2.58571429, 2.79285714, 3. ]) > > > > > > Not a very ideal looking log-normal, but so far I don't know what else > > besides a log-normal fit would give me a better estimate: > > I[80]: figure(); plot(sizes, conc) > > http://img406.imageshack.us/img406/156/sizeconc.png > > > > scipy.stats has the lognorm.fit > > > > lognorm.fit(data,s,loc=0,scale=1) > > - Parameter estimates for lognorm data > > > > and applying this to my data. However not sure the right way of calling > it, > > and not sure if this could be applied to my case? > > > > I[81]: stats.lognorm.fit(conc) > > O[81]: array([ 2.31386066, 1.19126064, 9.5748391 ]) > > > > Lastly, what is the way to create a ideal log-normal sample using the > > stats.lognorm.rvs? > > R. Kern has nicely summarized my intention. Let me add some more onto his description. > I don't think I understand the connection to the log-normal distribution. > You seem to have a non-linear relationship > conc = f(size) where you want to find a non-linear relationship f > Here I am directly quoting from on of my cloud physics books: "Once a discrete model size distribution has been laid out, the initial particle number, volume, and mass concentrations must be distributed among model size bins. This can be accomplished by fitting measurements to a continuous size distribution, then discretizing the continuous distribution over the model bins. Three continuous distributions available for this procedure are the lognormal, Marshall?Palmer, and modified gamma distributions." My data are discrete in its nature, since have only 15 channels in between (0.1 to 3.0 um ranges). Say that (from the sample data that I used in my previous e-mail) the first channel is in between 0.10 to 0.31 um and I read the number concentration for this size-range as 119.77 #/cm^3 so on so forth. Since I am interested to estimate the number concentrations below the 0.1 um (preferably down to 0.01 um or 10 nm) I would like to fit a continuous distribution onto my dataset. Among the all three continuous distributions lognormal seems to be the easiest to implement, and log-normal distribution is commonly used to represent aerosol size distribution in the atmosphere. If there is a way to do this discretely I would like to know very much. > > If conc where just lognormal distributed, then you would not get any > relationship between conc and size. 
> > If you have many observations with conc, size pairs then you could > estimate a noisy model > conc = f(size) + u where the noise u is for example log-normal > distributed but you would still need to get an expression for the > non-linear function f. > I don't understand why I can't get a relation between sizes and conc values if conc is log-normally distributed. Can I elaborate this a bit more? The non-linear relationship part is also confusing me. If say to test the linear relationship of x and y data pairs we just fit a line, in this case what I am looking is to fit a log-normal to get a relation between size and conc. > Extending a non-linear function outside of the observed range > is essentially always just a guess or by assumption. > Yes, I am aware of this. Just trying to put my guesses into a well-defined form. So when I am describing the analysis process I will be able tell to others that this extrapolation is a result of log-normal fitting. > > If you want to fit a curve f that has the same shape as the pdf of > the log-normal, then you cannot do it with lognorm.fit, because that > just assumes you have a random sample independent of size. > Could you give an example on this? > > So, it's not clear to me what you really want, or what your sample data > looks like (do you have only one 15 element sample or lots of them). > I have many sample points (thousands) that are composed of this 15 elements. But the whole data don't look much different the sample I used. Most peaks are around 3rd - 4th channel and decaying as shown in the figure. > > Josef > > > > > > > Thanks > > > > > > -- > > G?khan > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From gokhansever at gmail.com Tue Nov 17 14:36:36 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Tue, 17 Nov 2009 13:36:36 -0600 Subject: [SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data In-Reply-To: <3d375d730911171057g4742284fge322cd4e471c7ec@mail.gmail.com> References: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> <49d6b3500911170929rccaa0e7k75f1450a2f48e519@mail.gmail.com> <1cd32cbb0911171038w363e055ct3c59ba900ee1e8b4@mail.gmail.com> <3d375d730911171057g4742284fge322cd4e471c7ec@mail.gmail.com> Message-ID: <49d6b3500911171136m2986a26etcfb0e0116fc00b63@mail.gmail.com> On Tue, Nov 17, 2009 at 12:57 PM, Robert Kern wrote: > On Tue, Nov 17, 2009 at 12:38, wrote: > > > So, it's not clear to me what you really want, or what your sample data > > looks like (do you have only one 15 element sample or lots of them). > > I'm guessing that they aren't really samples of (conc, size) pairs so > much as binned data. Correct. These are discrete sample points. > Particles with sizes between 0.1 and 0.3 um (for > example; I don't know where the bin edges actually are in his data) > have a concentration of 119.7681 particles/. True, in particles/cm^3 units > This can be normalized to a more proper histogrammed distribution, except > that the lower end of the distribution below 0.1 um has been censored > by his measuring process. 
He then wants to infer the continuous > distribution that generated that censored histogram so he can predict > what the distribution is in the censored region. > Exactly. Where later I am hoping to find a critical size point using another equation, and integrating upwards to obtain total concentration from that point on and do a comparison with another instrument. The 0.1 um threshold comes from the instrument limit. It can't measure below this level due to the constraint of the Mie scattering theory. > > So, I would say that it's a bit trickier than fitting the log-normal > PDF to the data for a couple of reasons. > > 1) Directly fitting PDFs to histogram values is usually not a great > idea to begin with. > 2) We don't know how much probability mass is in the censored region. > > So we agree that it is easy to implement a log-normal fit than a discrete one? > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Tue Nov 17 14:37:33 2009 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 17 Nov 2009 13:37:33 -0600 Subject: [SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data In-Reply-To: <49d6b3500911171128q78672fc7r50975a06a7c91132@mail.gmail.com> References: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> <49d6b3500911170929rccaa0e7k75f1450a2f48e519@mail.gmail.com> <1cd32cbb0911171038w363e055ct3c59ba900ee1e8b4@mail.gmail.com> <49d6b3500911171128q78672fc7r50975a06a7c91132@mail.gmail.com> Message-ID: <3d375d730911171137j7d2c8bdfg75bec3df6a856239@mail.gmail.com> On Tue, Nov 17, 2009 at 13:28, G?khan Sever wrote: > > > On Tue, Nov 17, 2009 at 12:38 PM, wrote: >> If conc where just lognormal distributed, then you would not get any >> relationship between conc and size. >> >> If you have many observations with conc, size pairs then you could >> estimate a noisy model >> conc = f(size) + u ?where the noise u is for example log-normal >> distributed but you would still need to get an expression for the >> non-linear function f. > > I don't understand why I can't get a relation between sizes and conc values > if conc is log-normally distributed. Can I elaborate this a bit more? The > non-linear relationship part is also confusing me. If say to test the linear > relationship of x and y data pairs we just fit a line, in this case what I > am looking is to fit a log-normal to get a relation between size and conc. It's a language issue. Your concentration values are not log-normally distributed. Your particle sizes are log-normally distributed (maybe). The concentration of a range of particle sizes is a measurement that is related to particle size the distribution, but you would not say that the measurements themselves are log-normally distributed. Josef was taking your language at face value. >> If you want to fit a curve f that has the same shape as the pdf of >> the log-normal, then you cannot do it with lognorm.fit, because that >> just assumes you have a random sample independent of size. > > Could you give an example on this? 
x = stats.norm.rvs() stats.norm.fit(x) >> So, it's not clear to me what you really want, or what your sample data >> looks like (do you have only one 15 element sample or lots of them). > > I have many sample points (thousands) that are composed of this 15 elements. > But the whole data don't look much different the sample I used. Most peaks > are around 3rd - 4th channel and decaying as shown in the figure. Do you need to fit a different distribution for each 15-vector? Or are all of these measurements supposed to be merged somehow? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From robert.kern at gmail.com Tue Nov 17 14:40:33 2009 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 17 Nov 2009 13:40:33 -0600 Subject: [SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data In-Reply-To: <49d6b3500911171136m2986a26etcfb0e0116fc00b63@mail.gmail.com> References: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> <49d6b3500911170929rccaa0e7k75f1450a2f48e519@mail.gmail.com> <1cd32cbb0911171038w363e055ct3c59ba900ee1e8b4@mail.gmail.com> <3d375d730911171057g4742284fge322cd4e471c7ec@mail.gmail.com> <49d6b3500911171136m2986a26etcfb0e0116fc00b63@mail.gmail.com> Message-ID: <3d375d730911171140r29bea3cx865c7291b3546b12@mail.gmail.com> On Tue, Nov 17, 2009 at 13:36, G?khan Sever wrote: > On Tue, Nov 17, 2009 at 12:57 PM, Robert Kern wrote: >> So, I would say that it's a bit trickier than fitting the log-normal >> PDF to the data for a couple of reasons. >> >> 1) Directly fitting PDFs to histogram values is usually not a great >> idea to begin with. >> 2) We don't know how much probability mass is in the censored region. > > So we agree that it is easy to implement a log-normal fit than a discrete > one? No, none of the things we have suggested will work well for you. You have a more complicated task ahead of you. I have ideas that might work, but explaining them will take more time than I have. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From josef.pktd at gmail.com Tue Nov 17 15:04:20 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 17 Nov 2009 15:04:20 -0500 Subject: [SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data In-Reply-To: <3d375d730911171137j7d2c8bdfg75bec3df6a856239@mail.gmail.com> References: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> <49d6b3500911170929rccaa0e7k75f1450a2f48e519@mail.gmail.com> <1cd32cbb0911171038w363e055ct3c59ba900ee1e8b4@mail.gmail.com> <49d6b3500911171128q78672fc7r50975a06a7c91132@mail.gmail.com> <3d375d730911171137j7d2c8bdfg75bec3df6a856239@mail.gmail.com> Message-ID: <1cd32cbb0911171204t37618c94n39a843d879dd6af9@mail.gmail.com> On Tue, Nov 17, 2009 at 2:37 PM, Robert Kern wrote: > On Tue, Nov 17, 2009 at 13:28, G?khan Sever wrote: >> >> >> On Tue, Nov 17, 2009 at 12:38 PM, wrote: > >>> If conc where just lognormal distributed, then you would not get any >>> relationship between conc and size. 
>>> >>> If you have many observations with conc, size pairs then you could >>> estimate a noisy model >>> conc = f(size) + u ?where the noise u is for example log-normal >>> distributed but you would still need to get an expression for the >>> non-linear function f. >> >> I don't understand why I can't get a relation between sizes and conc values >> if conc is log-normally distributed. Can I elaborate this a bit more? The >> non-linear relationship part is also confusing me. If say to test the linear >> relationship of x and y data pairs we just fit a line, in this case what I >> am looking is to fit a log-normal to get a relation between size and conc. > > It's a language issue. Your concentration values are not log-normally > distributed. Your particle sizes are log-normally distributed (maybe). > The concentration of a range of particle sizes is a measurement that > is related to particle size the distribution, but you would not say > that the measurements themselves are log-normally distributed. Josef > was taking your language at face value. The way I see it, you have to variables, size and counts (or concentration). My initial interpretation was you want to model the relationship between these two variables. When the total number of particles is fixed, then the conditional size distribution is univariate, and could be modeled by a log-normal distribution. (This still leaves the total count unmodelled.) If you have the total particle count per bin, then it should be possible to write down the likelihood function that is discretized to the bins from the continuous distribution. Given a random particle, what's the probability of being in bin 1, bin 2 and so on. Then add the log-likelihood over all particles and maximize as a function of the log-normal parameters. (There might be a numerical trick using fraction instead of conditional count, but I'm not sure what the analogous discrete distribution would be. ) Once the parameters of the log-normal distribution are estimated, the distribution would be defined over all of the real line (where the out of sample pdf is determined by assumption not data). Josef > >>> If you want to fit a curve f that has the same shape as the pdf of >>> the log-normal, then you cannot do it with lognorm.fit, because that >>> just assumes you have a random sample independent of size. >> >> Could you give an example on this? > > x = stats.norm.rvs() > stats.norm.fit(x) > >>> So, it's not clear to me what you really want, or what your sample data >>> looks like (do you have only one 15 element sample or lots of them). >> >> I have many sample points (thousands) that are composed of this 15 elements. >> But the whole data don't look much different the sample I used. Most peaks >> are around 3rd - 4th channel and decaying as shown in the figure. > > Do you need to fit a different distribution for each 15-vector? Or are > all of these measurements supposed to be merged somehow? > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." 
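[A sketch of the discretized likelihood described above; it is not code from the thread. The bin edges are assumed, the concentrations serve as count weights, the bin probabilities come from differences of the log-normal CDF, and they are renormalized by the mass in the observed range so the censored region below 0.1 um is handled as a truncation.]

import numpy as np
from scipy import stats, optimize

edges = np.linspace(0.1, 3.0, 16)    # assumed bin edges (um), 15 bins
conc = np.array([119.7681, 118.546, 146.6548, 96.5478, 109.9911, 32.9974,
                 20.7762, 6.1107, 12.2212, 3.6664, 3.6664, 1.2221,
                 2.4443, 2.4443, 3.6664])

def neg_loglike(params):
    s, scale = params
    if s <= 0 or scale <= 0:
        return np.inf
    cdf = stats.lognorm.cdf(edges, s, scale=scale)
    p = np.diff(cdf)                 # probability of landing in each observed bin
    p = p / (cdf[-1] - cdf[0])       # condition on the observed 0.1-3.0 um range
    if np.any(p <= 0):
        return np.inf
    # concentrations are proportional to counts, so use them as weights
    return -np.sum(conc * np.log(p))

s_hat, scale_hat = optimize.fmin(neg_loglike, [1.0, 0.5], disp=False)
print(s_hat, scale_hat)

# estimated fraction of particles below the 0.1 um detection limit
print(stats.lognorm.cdf(edges[0], s_hat, scale=scale_hat))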
> ?-- Umberto Eco > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From robert.kern at gmail.com Tue Nov 17 15:41:47 2009 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 17 Nov 2009 14:41:47 -0600 Subject: [SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data In-Reply-To: <1cd32cbb0911171204t37618c94n39a843d879dd6af9@mail.gmail.com> References: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> <49d6b3500911170929rccaa0e7k75f1450a2f48e519@mail.gmail.com> <1cd32cbb0911171038w363e055ct3c59ba900ee1e8b4@mail.gmail.com> <49d6b3500911171128q78672fc7r50975a06a7c91132@mail.gmail.com> <3d375d730911171137j7d2c8bdfg75bec3df6a856239@mail.gmail.com> <1cd32cbb0911171204t37618c94n39a843d879dd6af9@mail.gmail.com> Message-ID: <3d375d730911171241j2a0efbbdw9b4ce39dfa73f817@mail.gmail.com> On Tue, Nov 17, 2009 at 14:04, wrote: > The way I see it, you have to variables, size and counts (or concentration). > My initial interpretation was you want to model the relationship between > these two variables. > When the total number of particles is fixed, then the conditional size > distribution is univariate, and could be modeled by a log-normal > distribution. (This still leaves the total count unmodelled.) > > If you have the total particle count per bin, then it > should be possible to write down the likelihood function that is > discretized to the bins from the continuous distribution. > Given a random particle, what's the probability of being in bin 1, > bin 2 and so on. Then add the log-likelihood over all particles > and maximize as a function of the log-normal parameters. > (There might be a numerical trick using fraction instead of > conditional count, but I'm not sure what the analogous discrete > distribution would be. ) I usually use the multinomial as the likelihood for such "histogram-fitting" exercises. The two problem points here are that we have real-valued concentrations, not integer-valued counts, and that we don't have a measurement for the censored region. For the former, I would suggest simply multiplying by the concentrations by a factor of 10 (equivalently, changing the units to particles/<10^n larger volume>) such that the resolution of the measurements is 1 particle/. Then just apply the multinomial. It should be a close enough approximation. I'm not entirely sure what to do about the censored probability mass. I think there might be a simple correction factor that you can apply to the multinomial likelihood, but I haven't worked it out. > Once the parameters of the log-normal distribution are > estimated, the distribution would be defined over all of > the real line (where the out of sample pdf is determined > by assumption not data). Since we are extrapolating to the censored region, it would probably be a good idea to estimate the uncertainty of the estimate. I would probably suggest using PyMC to do a Bayesian model. A parametric bootstrap might also serve. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From josef.pktd at gmail.com Tue Nov 17 16:01:56 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 17 Nov 2009 16:01:56 -0500 Subject: [SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data In-Reply-To: <3d375d730911171241j2a0efbbdw9b4ce39dfa73f817@mail.gmail.com> References: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> <49d6b3500911170929rccaa0e7k75f1450a2f48e519@mail.gmail.com> <1cd32cbb0911171038w363e055ct3c59ba900ee1e8b4@mail.gmail.com> <49d6b3500911171128q78672fc7r50975a06a7c91132@mail.gmail.com> <3d375d730911171137j7d2c8bdfg75bec3df6a856239@mail.gmail.com> <1cd32cbb0911171204t37618c94n39a843d879dd6af9@mail.gmail.com> <3d375d730911171241j2a0efbbdw9b4ce39dfa73f817@mail.gmail.com> Message-ID: <1cd32cbb0911171301u4691cf87ydd3132ed7f9375d4@mail.gmail.com> On Tue, Nov 17, 2009 at 3:41 PM, Robert Kern wrote: > On Tue, Nov 17, 2009 at 14:04, ? wrote: > >> The way I see it, you have to variables, size and counts (or concentration). >> My initial interpretation was you want to model the relationship between >> these two variables. >> When the total number of particles is fixed, then the conditional size >> distribution is univariate, and could be modeled by a log-normal >> distribution. (This still leaves the total count unmodelled.) >> >> If you have the total particle count per bin, then it >> should be possible to write down the likelihood function that is >> discretized to the bins from the continuous distribution. >> Given a random particle, what's the probability of being in bin 1, >> bin 2 and so on. Then add the log-likelihood over all particles >> and maximize as a function of the log-normal parameters. >> (There might be a numerical trick using fraction instead of >> conditional count, but I'm not sure what the analogous discrete >> distribution would be. ) > > I usually use the multinomial as the likelihood for such > "histogram-fitting" exercises. The two problem points here are that we > have real-valued concentrations, not integer-valued counts, and that > we don't have a measurement for the censored region. For the former, I > would suggest simply multiplying by the concentrations by a factor of > 10 (equivalently, changing the units to particles/<10^n larger > volume>) such that the resolution of the measurements is 1 > particle/. Then just apply the multinomial. It should be a > close enough approximation. > > I'm not entirely sure what to do about the censored probability mass. > I think there might be a simple correction factor that you can apply > to the multinomial likelihood, but I haven't worked it out. I think, for the continuous distribution it would be just dividing by the probability of the not-censored region (which is also a function of the distribution parameters). This would then just be a truncated log-normal. multinomial might work the same, as long as the probabilities are defined by the discretization. Would you apply the multinomial directly? I don't see in that case how you would recover the parameters of the continuous distribution. Josef > >> Once the parameters of the log-normal distribution are >> estimated, the distribution would be defined over all of >> the real line (where the out of sample pdf is determined >> by assumption not data). > > Since we are extrapolating to the censored region, it would probably > be a good idea to estimate the uncertainty of the estimate. I would > probably suggest using PyMC to do a Bayesian model. A parametric > bootstrap might also serve. 
I would use bootstrap, since I still haven't figured out how to use MCMC. Josef > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > ?-- Umberto Eco > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From david_baddeley at yahoo.com.au Tue Nov 17 16:11:59 2009 From: david_baddeley at yahoo.com.au (David Baddeley) Date: Tue, 17 Nov 2009 13:11:59 -0800 (PST) Subject: [SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data In-Reply-To: <3d375d730911171241j2a0efbbdw9b4ce39dfa73f817@mail.gmail.com> References: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> <49d6b3500911170929rccaa0e7k75f1450a2f48e519@mail.gmail.com> <1cd32cbb0911171038w363e055ct3c59ba900ee1e8b4@mail.gmail.com> <49d6b3500911171128q78672fc7r50975a06a7c91132@mail.gmail.com> <3d375d730911171137j7d2c8bdfg75bec3df6a856239@mail.gmail.com> <1cd32cbb0911171204t37618c94n39a843d879dd6af9@mail.gmail.com> <3d375d730911171241j2a0efbbdw9b4ce39dfa73f817@mail.gmail.com> Message-ID: <484601.32915.qm@web33004.mail.mud.yahoo.com> I guess it depends on how accurately you want to estimate the missing bin, and whether you can get any information about the amount of error in the individual measurements. Just looking at the curve you posted it looks like the variability at low particle sizes is a lot higher than at larger particle sizes. Although you would expect a similar effect due to the Poisson nature of counting, I'd expect it to be smaller. This might suggest that there is additional structure in your size distribution at these sizes, and that the best you can hope for with a log-normal model is a fairly rough approximation. If this is the case, I suspect you might be able to get away with just doing a least-squares fit of a log-normal model function to your measured values, potentially with weights which reflect the estimated error in each bin (obtained either by taking the std. deviation of repeated measurements, or by analysing the noise characteristics of the measurement instrument). Although it's not strictly optimal, and you ought to be aware of the potential hiccups, it's often good enough for the task at hand (I use it routinely to fit 2D Gaussians to point like objects in image data). cheers, David ----- Original Message ---- From: Robert Kern To: SciPy Users List Sent: Wed, 18 November, 2009 9:41:47 AM Subject: Re: [SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data On Tue, Nov 17, 2009 at 14:04, wrote: > The way I see it, you have to variables, size and counts (or concentration). > My initial interpretation was you want to model the relationship between > these two variables. > When the total number of particles is fixed, then the conditional size > distribution is univariate, and could be modeled by a log-normal > distribution. (This still leaves the total count unmodelled.) > > If you have the total particle count per bin, then it > should be possible to write down the likelihood function that is > discretized to the bins from the continuous distribution. > Given a random particle, what's the probability of being in bin 1, > bin 2 and so on. Then add the log-likelihood over all particles > and maximize as a function of the log-normal parameters. 
> (There might be a numerical trick using fraction instead of > conditional count, but I'm not sure what the analogous discrete > distribution would be. ) I usually use the multinomial as the likelihood for such "histogram-fitting" exercises. The two problem points here are that we have real-valued concentrations, not integer-valued counts, and that we don't have a measurement for the censored region. For the former, I would suggest simply multiplying by the concentrations by a factor of 10 (equivalently, changing the units to particles/<10^n larger volume>) such that the resolution of the measurements is 1 particle/. Then just apply the multinomial. It should be a close enough approximation. I'm not entirely sure what to do about the censored probability mass. I think there might be a simple correction factor that you can apply to the multinomial likelihood, but I haven't worked it out. > Once the parameters of the log-normal distribution are > estimated, the distribution would be defined over all of > the real line (where the out of sample pdf is determined > by assumption not data). Since we are extrapolating to the censored region, it would probably be a good idea to estimate the uncertainty of the estimate. I would probably suggest using PyMC to do a Bayesian model. A parametric bootstrap might also serve. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user From robert.kern at gmail.com Tue Nov 17 16:12:17 2009 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 17 Nov 2009 15:12:17 -0600 Subject: [SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data In-Reply-To: <1cd32cbb0911171301u4691cf87ydd3132ed7f9375d4@mail.gmail.com> References: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> <49d6b3500911170929rccaa0e7k75f1450a2f48e519@mail.gmail.com> <1cd32cbb0911171038w363e055ct3c59ba900ee1e8b4@mail.gmail.com> <49d6b3500911171128q78672fc7r50975a06a7c91132@mail.gmail.com> <3d375d730911171137j7d2c8bdfg75bec3df6a856239@mail.gmail.com> <1cd32cbb0911171204t37618c94n39a843d879dd6af9@mail.gmail.com> <3d375d730911171241j2a0efbbdw9b4ce39dfa73f817@mail.gmail.com> <1cd32cbb0911171301u4691cf87ydd3132ed7f9375d4@mail.gmail.com> Message-ID: <3d375d730911171312k5def059p16e90d7d6063f08@mail.gmail.com> On Tue, Nov 17, 2009 at 15:01, wrote: > On Tue, Nov 17, 2009 at 3:41 PM, Robert Kern wrote: >> On Tue, Nov 17, 2009 at 14:04, ? wrote: >> >>> The way I see it, you have to variables, size and counts (or concentration). >>> My initial interpretation was you want to model the relationship between >>> these two variables. >>> When the total number of particles is fixed, then the conditional size >>> distribution is univariate, and could be modeled by a log-normal >>> distribution. (This still leaves the total count unmodelled.) >>> >>> If you have the total particle count per bin, then it >>> should be possible to write down the likelihood function that is >>> discretized to the bins from the continuous distribution. >>> Given a random particle, what's the probability of being in bin 1, >>> bin 2 and so on. Then add the log-likelihood over all particles >>> and maximize as a function of the log-normal parameters. 
>>> (There might be a numerical trick using fraction instead of >>> conditional count, but I'm not sure what the analogous discrete >>> distribution would be. ) >> >> I usually use the multinomial as the likelihood for such >> "histogram-fitting" exercises. The two problem points here are that we >> have real-valued concentrations, not integer-valued counts, and that >> we don't have a measurement for the censored region. For the former, I >> would suggest simply multiplying by the concentrations by a factor of >> 10 (equivalently, changing the units to particles/<10^n larger >> volume>) such that the resolution of the measurements is 1 >> particle/. Then just apply the multinomial. It should be a >> close enough approximation. >> >> I'm not entirely sure what to do about the censored probability mass. >> I think there might be a simple correction factor that you can apply >> to the multinomial likelihood, but I haven't worked it out. > > I think, for the continuous distribution it would be just dividing by > the probability of the not-censored region (which is also a function of > the distribution parameters). This would then just be a truncated > log-normal. multinomial might work the same, as long as the > probabilities are defined by the discretization. > > Would you apply the multinomial directly? I don't see in that case > how you would recover the parameters of the continuous distribution. You would just be using the multinomial to build the likelihood. For each iteration in the likelihood maximization, you are given the parameters of the continuous distribution. Given the bin edges and those parameters, you compute the probability mass within each bin for that specific distribution (the difference of the CDF between bin edges). That is the p-vector for the multinomial. The probability of getting the observed counts is the likelihood for the given parameters of the continuous distribution. And now that I think about it, you don't need to apply any correction to the multinomial in the likelihood. The number of counts in the censored region is just another unknown parameter to optimize along with the continuous distribution's parameters. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From lorenzo.isella at gmail.com Tue Nov 17 17:03:14 2009 From: lorenzo.isella at gmail.com (Lorenzo Isella) Date: Tue, 17 Nov 2009 23:03:14 +0100 Subject: [SciPy-User] Fitting a curve on a log-normal distributed data In-Reply-To: References: Message-ID: <4B031DA2.8040400@gmail.com> > Date: Mon, 16 Nov 2009 23:44:17 -0600 > From: G?khan Sever > Subject: [SciPy-User] Fitting a curve on a log-normal distributed data > To: Discussion of Numerical Python , SciPy > Users List > Message-ID: > <49d6b3500911162144x1193e04cj1a103776092c4471 at mail.gmail.com> > Content-Type: text/plain; charset="utf-8" > > Hello, > > I have a data which represents aerosol size distribution in between 0.1 to > 3.0 micrometer ranges. I would like extrapolate the lower size down to 10 > nm. The data in this context is log-normally distributed. Therefore I am > looking a way to fit a log-normal curve onto my data. Could you please give > me some pointers to solve this problem? > > Thank you. > > Hello, I have not followed the many replies to this long post in detail, but by chance I happen to know quite in detail what you are talking about (probably SMPS data or similar). 
I normally resort to R for this kind of tasks (http://www.r-project.org/), but nothing prevents you from using Python instead. You just want to compare your empirical data binning with what would be expected from a lognormal distribution. Please have a look at http://tinyurl.com/ygmw4lc and at the functions defined there (A1, mu1 and myvar1 are the overall concentration, the geometric mean and the std of the number-size distribution, respectively). Cheers Lorenzo From gokhansever at gmail.com Tue Nov 17 17:07:17 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Tue, 17 Nov 2009 16:07:17 -0600 Subject: [SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data In-Reply-To: <3d375d730911171137j7d2c8bdfg75bec3df6a856239@mail.gmail.com> References: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> <49d6b3500911170929rccaa0e7k75f1450a2f48e519@mail.gmail.com> <1cd32cbb0911171038w363e055ct3c59ba900ee1e8b4@mail.gmail.com> <49d6b3500911171128q78672fc7r50975a06a7c91132@mail.gmail.com> <3d375d730911171137j7d2c8bdfg75bec3df6a856239@mail.gmail.com> Message-ID: <49d6b3500911171407o19efbbf1t996bd33c698b4e2b@mail.gmail.com> On Tue, Nov 17, 2009 at 1:37 PM, Robert Kern wrote: > On Tue, Nov 17, 2009 at 13:28, G?khan Sever wrote: > > > > > > On Tue, Nov 17, 2009 at 12:38 PM, wrote: > > >> If conc where just lognormal distributed, then you would not get any > >> relationship between conc and size. > >> > >> If you have many observations with conc, size pairs then you could > >> estimate a noisy model > >> conc = f(size) + u where the noise u is for example log-normal > >> distributed but you would still need to get an expression for the > >> non-linear function f. > > > > I don't understand why I can't get a relation between sizes and conc > values > > if conc is log-normally distributed. Can I elaborate this a bit more? The > > non-linear relationship part is also confusing me. If say to test the > linear > > relationship of x and y data pairs we just fit a line, in this case what > I > > am looking is to fit a log-normal to get a relation between size and > conc. > > It's a language issue. Your concentration values are not log-normally > distributed. Your particle sizes are log-normally distributed (maybe). > The concentration of a range of particle sizes is a measurement that > is related to particle size the distribution, but you would not say > that the measurements themselves are log-normally distributed. Josef > was taking your language at face value. > > >> If you want to fit a curve f that has the same shape as the pdf of > >> the log-normal, then you cannot do it with lognorm.fit, because that > >> just assumes you have a random sample independent of size. > > > > Could you give an example on this? > > x = stats.norm.rvs() > stats.norm.fit(x) > > >> So, it's not clear to me what you really want, or what your sample data > >> looks like (do you have only one 15 element sample or lots of them). > > > > I have many sample points (thousands) that are composed of this 15 > elements. > > But the whole data don't look much different the sample I used. Most > peaks > > are around 3rd - 4th channel and decaying as shown in the figure. > > Do you need to fit a different distribution for each 15-vector? Or are > all of these measurements supposed to be merged somehow? > For my comparison case I will use an hour length of data, which are composed of 3600 sample points. At each minute I will average these points. 
This is because I am comparing data from two different instruments and by averaging I am trying to eliminate intrinsic measurement error. It is really not an easy task to make point by point comparison in my case. So in the end I will have 60 averaged data-points where each point composed of 15-elements in them. Later use the same fitting technique to guess the out-ouf-the-measurement-limits parts. > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From gokhansever at gmail.com Tue Nov 17 17:21:27 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Tue, 17 Nov 2009 16:21:27 -0600 Subject: [SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data In-Reply-To: <3d375d730911171140r29bea3cx865c7291b3546b12@mail.gmail.com> References: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> <49d6b3500911170929rccaa0e7k75f1450a2f48e519@mail.gmail.com> <1cd32cbb0911171038w363e055ct3c59ba900ee1e8b4@mail.gmail.com> <3d375d730911171057g4742284fge322cd4e471c7ec@mail.gmail.com> <49d6b3500911171136m2986a26etcfb0e0116fc00b63@mail.gmail.com> <3d375d730911171140r29bea3cx865c7291b3546b12@mail.gmail.com> Message-ID: <49d6b3500911171421x27b2055duc17558c3df3a179e@mail.gmail.com> On Tue, Nov 17, 2009 at 1:40 PM, Robert Kern wrote: > On Tue, Nov 17, 2009 at 13:36, G?khan Sever wrote: > > On Tue, Nov 17, 2009 at 12:57 PM, Robert Kern > wrote: > > >> So, I would say that it's a bit trickier than fitting the log-normal > >> PDF to the data for a couple of reasons. > >> > >> 1) Directly fitting PDFs to histogram values is usually not a great > >> idea to begin with. > >> 2) We don't know how much probability mass is in the censored region. > > > > So we agree that it is easy to implement a log-normal fit than a discrete > > one? > > No, none of the things we have suggested will work well for you. You > have a more complicated task ahead of you. I have ideas that might > work, but explaining them will take more time than I have. > Looking at some recent replies and re-reading them a couple times, I should say the techniques mentioned in them are beyond my technical skills or at least I need a professor to help me or a good statistics book to study further. I should also note that this is just a feasibility study comparing actual observed cloud condensation nuclei concentration measurements to the modelled concentrations using another instrument's size distribution data with the help of a thermodynamic particle activation equation which I will be able to infer an activation size limit. The results that are found in this study will not be placed on a journal, they will just be presented in my cloud physics class presentation. I am trying to assess the sources of errors and testing the usability of the size distributions from that particular instrument in this comparison study. Extending the size distribution beyond and below the instruments measurement limit is one of the biggest source of errors to represent the reality, but of course there other simplifications and assumptions that add uncertainties. 
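For reference, a minimal sketch of the plain least-squares fit of a log-normal
shape to binned concentrations that was suggested earlier in the thread. The
bin centres and concentrations below are made-up placeholders (not real PCASP
channels), and the (A, dg, sg) parametrisation is just one reasonable choice:

import numpy as np
from scipy import optimize, stats

# made-up bin centres (um) and concentrations -- replace with real channels
d = np.array([0.11, 0.14, 0.18, 0.23, 0.29, 0.37, 0.47, 0.60,
              0.76, 0.97, 1.23, 1.56, 1.98, 2.52, 3.0])
conc = np.array([400., 520., 440., 300., 180., 100., 55., 30.,
                 16., 9., 5., 3., 2., 1., 1.])

def model(p, d):
    # amplitude A, geometric mean dg (um), geometric std sg (dimensionless)
    A, dg, sg = p
    return A * stats.lognorm.pdf(d, np.log(sg), scale=dg)

def residuals(p, d, conc):
    return conc - model(p, d)

p0 = [conc.sum(), 0.2, 1.8]                     # rough starting guesses
p_fit, ier = optimize.leastsq(residuals, p0, args=(d, conc))

# extrapolate the fitted curve below the instrument range, e.g. to 0.01 um
d_ext = np.logspace(np.log10(0.01), np.log10(3.0), 200)
conc_ext = model(p_fit, d_ext)

The weighted variant mentioned by David would just divide the residuals by
per-bin uncertainty estimates, and the multinomial-likelihood approach
discussed above would replace `residuals` by a negative log-likelihood handed
to optimize.fmin.
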
Besides, what is wrong with using the spline interpolation technique? It fits nicely on my sample data. See the resulting image here: http://img197.imageshack.us/img197/9638/sizeconcsplinefit.png (Green line represents the fit spline) > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Tue Nov 17 17:27:06 2009 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 17 Nov 2009 16:27:06 -0600 Subject: [SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data In-Reply-To: <49d6b3500911171421x27b2055duc17558c3df3a179e@mail.gmail.com> References: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> <49d6b3500911170929rccaa0e7k75f1450a2f48e519@mail.gmail.com> <1cd32cbb0911171038w363e055ct3c59ba900ee1e8b4@mail.gmail.com> <3d375d730911171057g4742284fge322cd4e471c7ec@mail.gmail.com> <49d6b3500911171136m2986a26etcfb0e0116fc00b63@mail.gmail.com> <3d375d730911171140r29bea3cx865c7291b3546b12@mail.gmail.com> <49d6b3500911171421x27b2055duc17558c3df3a179e@mail.gmail.com> Message-ID: <3d375d730911171427k6f5a9771p9f7a8cf125395f20@mail.gmail.com> On Tue, Nov 17, 2009 at 16:21, G?khan Sever wrote: > Besides, what is wrong with using the spline interpolation technique? It > fits nicely on my sample data. See the resulting image here: > http://img197.imageshack.us/img197/9638/sizeconcsplinefit.png??? (Green line > represents the fit spline) What spline interpolation technique? That certainly doesn't look like a good spline fit. In any case, splines may be fine for *interpolation*, but you need *extrapolation*, and splines are useless for that. You need a physically-motivated model like the distributions recommended by your textbook. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From gokhansever at gmail.com Tue Nov 17 17:30:56 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Tue, 17 Nov 2009 16:30:56 -0600 Subject: [SciPy-User] Fitting a curve on a log-normal distributed data In-Reply-To: <4B031DA2.8040400@gmail.com> References: <4B031DA2.8040400@gmail.com> Message-ID: <49d6b3500911171430i1f85ca94y8a56300ec1952607@mail.gmail.com> On Tue, Nov 17, 2009 at 4:03 PM, Lorenzo Isella wrote: > > Date: Mon, 16 Nov 2009 23:44:17 -0600 >> From: G?khan Sever >> Subject: [SciPy-User] Fitting a curve on a log-normal distributed data >> To: Discussion of Numerical Python , >> SciPy >> >> Users List >> Message-ID: >> <49d6b3500911162144x1193e04cj1a103776092c4471 at mail.gmail.com> >> Content-Type: text/plain; charset="utf-8" >> >> >> Hello, >> >> I have a data which represents aerosol size distribution in between 0.1 to >> 3.0 micrometer ranges. I would like extrapolate the lower size down to 10 >> nm. The data in this context is log-normally distributed. Therefore I am >> looking a way to fit a log-normal curve onto my data. Could you please >> give >> me some pointers to solve this problem? >> >> Thank you. 
>> >> >> > Hello, > I have not followed the many replies to this long post in detail, but by > chance I happen to know quite in detail what you are talking about (probably > SMPS data or similar). > I normally resort to R for this kind of tasks (http://www.r-project.org/), > but nothing prevents you from using Python instead. You just want to compare > your empirical data binning with what would be expected from a lognormal > distribution. Please have a look at > http://tinyurl.com/ygmw4lc > and at the functions defined there (A1, mu1 and myvar1 are the overall > concentration, the geometric mean and the std of the number-size > distribution, respectively). > Cheers > Hey Lorenzo, Finally someone who knows the heart of the subject :) Thanks for stopping by. The data that I am using is Passive Cavity Aerosol Spectrometer (PCASP) measured size-distributions. Unfortunately even if we had the mains part of the SMPS instrument we couldn't fly it since the radioactive element was not reached during campaign. It is always an issue to deliver the radioactive parts out of the US :) Anyways assuming that the relative humidity was quite low in the measurement region I am not expecting a huge deviation from the dry-particle size definition. But as I said above this is just a feasibility study. I will test and see how much an error I will get with this method. Besides there is no information regarding to the chemical composition of the aerosols, therefore I am basing on kappa-kohler theory and making another simplification at that point. Could you please send this script off-list and the data file associated with it? Would be greatly appreciated. > > Lorenzo > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From gokhansever at gmail.com Tue Nov 17 17:42:44 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Tue, 17 Nov 2009 16:42:44 -0600 Subject: [SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data In-Reply-To: <3d375d730911171427k6f5a9771p9f7a8cf125395f20@mail.gmail.com> References: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> <49d6b3500911170929rccaa0e7k75f1450a2f48e519@mail.gmail.com> <1cd32cbb0911171038w363e055ct3c59ba900ee1e8b4@mail.gmail.com> <3d375d730911171057g4742284fge322cd4e471c7ec@mail.gmail.com> <49d6b3500911171136m2986a26etcfb0e0116fc00b63@mail.gmail.com> <3d375d730911171140r29bea3cx865c7291b3546b12@mail.gmail.com> <49d6b3500911171421x27b2055duc17558c3df3a179e@mail.gmail.com> <3d375d730911171427k6f5a9771p9f7a8cf125395f20@mail.gmail.com> Message-ID: <49d6b3500911171442h718384bdy4ef365626f728cc5@mail.gmail.com> On Tue, Nov 17, 2009 at 4:27 PM, Robert Kern wrote: > On Tue, Nov 17, 2009 at 16:21, G?khan Sever wrote: > > > Besides, what is wrong with using the spline interpolation technique? It > > fits nicely on my sample data. See the resulting image here: > > http://img197.imageshack.us/img197/9638/sizeconcsplinefit.png (Green > line > > represents the fit spline) > > What spline interpolation technique? >From here http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html Spline interpolation in 1-d (interpolate.splXXX) That certainly doesn't look like > a good spline fit. True, because I used only 30 points. It looks much smoother with alot more point as you might expected. > In any case, splines may be fine for > *interpolation*, but you need *extrapolation*, and splines are useless > for that. 
> You need a physically-motivated model like the distributions > recommended by your textbook. > > Using spline-interp is a test case to see how good it will do on my data. I will use log-normal way as was in the original intention. Let me check with someone else in the department to get some feedback on this before I completely get lost in the matter. One quick question: "extrapolation" means to estimate a data both "beyond" and "below" the given limits, right? (For my example to guess less than 0.1um should I say downward-extrapolation and above 3.0 um upward-extrapolation or just extrapolation is enough?) > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Tue Nov 17 17:58:43 2009 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 17 Nov 2009 16:58:43 -0600 Subject: [SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data In-Reply-To: <49d6b3500911171442h718384bdy4ef365626f728cc5@mail.gmail.com> References: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> <49d6b3500911170929rccaa0e7k75f1450a2f48e519@mail.gmail.com> <1cd32cbb0911171038w363e055ct3c59ba900ee1e8b4@mail.gmail.com> <3d375d730911171057g4742284fge322cd4e471c7ec@mail.gmail.com> <49d6b3500911171136m2986a26etcfb0e0116fc00b63@mail.gmail.com> <3d375d730911171140r29bea3cx865c7291b3546b12@mail.gmail.com> <49d6b3500911171421x27b2055duc17558c3df3a179e@mail.gmail.com> <3d375d730911171427k6f5a9771p9f7a8cf125395f20@mail.gmail.com> <49d6b3500911171442h718384bdy4ef365626f728cc5@mail.gmail.com> Message-ID: <3d375d730911171458n2b8c49dfl22eecf8e6f4a8b57@mail.gmail.com> On Tue, Nov 17, 2009 at 16:42, G?khan Sever wrote: > > On Tue, Nov 17, 2009 at 4:27 PM, Robert Kern wrote: >> >> On Tue, Nov 17, 2009 at 16:21, G?khan Sever wrote: >> >> > Besides, what is wrong with using the spline interpolation technique? It >> > fits nicely on my sample data. See the resulting image here: >> > http://img197.imageshack.us/img197/9638/sizeconcsplinefit.png??? (Green >> > line >> > represents the fit spline) >> >> What spline interpolation technique? > > From here > http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html > > Spline interpolation in 1-d (interpolate.splXXX) > >> That certainly doesn't look like >> a good spline fit. > > True, because I used only 30 points. It looks much smoother with alot more > point as you might expected. Don't judge it based on its smoothness at many points. The smooth appearance is simply a function of the number of points you choose to sample it with, not how well it fits the data. Even if you weren't dealing with an extrapolation problem, you shouldn't use spline interpolation* on noisy data. You would do something like least-squares fitting to a low-order spline. The spline should not go through the observed data points exactly. * And this brings up another terminological issue. I may have used the term "interpolation" in a couple of different ways. There is a general sense in which "interpolate" means "to make predictions about certain inputs (e.g. 
the concentration [prediction] for the given particle size [input]) within the range of observed inputs". Whereas, "interpolate" can also mean something much more specific: finding a curve that exactly goes through the given observations. "Spline interpolation" would be a form of the latter, and is not related to what you need. >> In any case, splines may be fine for >> *interpolation*, but you need *extrapolation*, and splines are useless >> for that. >> >> You need a physically-motivated model like the distributions >> recommended by your textbook. > > Using spline-interp is a test case to see how good it will do on my data. Good. I just wanted to make sure that you knew what was wrong with using splines in this case. :-) > I > will use log-normal way as was in the original intention. Let me check with > someone else in the department to get some feedback on this before I > completely get lost in the matter. Always wise. :-) > One quick question: "extrapolation" means to estimate a data both "beyond" > and "below" the given limits, right? (For my example to guess less than > 0.1um should I say downward-extrapolation and above 3.0 um > upward-extrapolation or just extrapolation is enough?) Just "extrapolation" can describe either case, yes. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From gokhansever at gmail.com Tue Nov 17 18:52:34 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Tue, 17 Nov 2009 17:52:34 -0600 Subject: [SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data In-Reply-To: <3d375d730911171458n2b8c49dfl22eecf8e6f4a8b57@mail.gmail.com> References: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> <49d6b3500911170929rccaa0e7k75f1450a2f48e519@mail.gmail.com> <1cd32cbb0911171038w363e055ct3c59ba900ee1e8b4@mail.gmail.com> <3d375d730911171057g4742284fge322cd4e471c7ec@mail.gmail.com> <49d6b3500911171136m2986a26etcfb0e0116fc00b63@mail.gmail.com> <3d375d730911171140r29bea3cx865c7291b3546b12@mail.gmail.com> <49d6b3500911171421x27b2055duc17558c3df3a179e@mail.gmail.com> <3d375d730911171427k6f5a9771p9f7a8cf125395f20@mail.gmail.com> <49d6b3500911171442h718384bdy4ef365626f728cc5@mail.gmail.com> <3d375d730911171458n2b8c49dfl22eecf8e6f4a8b57@mail.gmail.com> Message-ID: <49d6b3500911171552n739ae92as79062bedf0391a93@mail.gmail.com> On Tue, Nov 17, 2009 at 4:58 PM, Robert Kern wrote: > On Tue, Nov 17, 2009 at 16:42, G?khan Sever wrote: > > > > On Tue, Nov 17, 2009 at 4:27 PM, Robert Kern > wrote: > >> > >> On Tue, Nov 17, 2009 at 16:21, G?khan Sever > wrote: > >> > >> > Besides, what is wrong with using the spline interpolation technique? > It > >> > fits nicely on my sample data. See the resulting image here: > >> > http://img197.imageshack.us/img197/9638/sizeconcsplinefit.png > (Green > >> > line > >> > represents the fit spline) > >> > >> What spline interpolation technique? > > > > From here > > http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html > > > > Spline interpolation in 1-d (interpolate.splXXX) > > > >> That certainly doesn't look like > >> a good spline fit. > > > > True, because I used only 30 points. It looks much smoother with alot > more > > point as you might expected. > > Don't judge it based on its smoothness at many points. 
The smooth > appearance is simply a function of the number of points you choose to > sample it with, not how well it fits the data. > > Even if you weren't dealing with an extrapolation problem, you > shouldn't use spline interpolation* on noisy data. You would do > something like least-squares fitting to a low-order spline. The spline > should not go through the observed data points exactly. > > * And this brings up another terminological issue. I may have used the > term "interpolation" in a couple of different ways. There is a general > sense in which "interpolate" means "to make predictions about certain > inputs (e.g. the concentration [prediction] for the given particle > size [input]) within the range of observed inputs". Whereas, > "interpolate" can also mean something much more specific: finding a > curve that exactly goes through the given observations. "Spline > interpolation" would be a form of the latter, and is not related to > what you need. > > >> In any case, splines may be fine for > >> *interpolation*, but you need *extrapolation*, and splines are useless > >> for that. > >> > >> You need a physically-motivated model like the distributions > >> recommended by your textbook. > > > > Using spline-interp is a test case to see how good it will do on my data. > > Good. I just wanted to make sure that you knew what was wrong with > using splines in this case. :-) > > > I > > will use log-normal way as was in the original intention. Let me check > with > > someone else in the department to get some feedback on this before I > > completely get lost in the matter. > > Always wise. :-) > Talking to another guy creating second modal (probably a normal distributed way) might be the other approach to take in addition to log-normally extrapolating the data. In any case, I should be able to parametrize the fits since I will do integration once I am done with the extrapolation part. I asked this in one of my early replies just repeating what is the way to get log-normal sample using scipy.stats? I will use it for a demonstrative case. For some reason, this never looks an expected log-normal sample to me: stats.lognorm.rvs(1,size=15) What am I missing here? > > > One quick question: "extrapolation" means to estimate a data both > "beyond" > > and "below" the given limits, right? (For my example to guess less than > > 0.1um should I say downward-extrapolation and above 3.0 um > > upward-extrapolation or just extrapolation is enough?) > > Just "extrapolation" can describe either case, yes. > > Thanks for your time and explanations Robert. I really appreciate your help. Probably I will include you in the acknowledgements part of my presentation. > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From robert.kern at gmail.com Tue Nov 17 19:00:40 2009 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 17 Nov 2009 18:00:40 -0600 Subject: [SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data In-Reply-To: <49d6b3500911171552n739ae92as79062bedf0391a93@mail.gmail.com> References: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> <1cd32cbb0911171038w363e055ct3c59ba900ee1e8b4@mail.gmail.com> <3d375d730911171057g4742284fge322cd4e471c7ec@mail.gmail.com> <49d6b3500911171136m2986a26etcfb0e0116fc00b63@mail.gmail.com> <3d375d730911171140r29bea3cx865c7291b3546b12@mail.gmail.com> <49d6b3500911171421x27b2055duc17558c3df3a179e@mail.gmail.com> <3d375d730911171427k6f5a9771p9f7a8cf125395f20@mail.gmail.com> <49d6b3500911171442h718384bdy4ef365626f728cc5@mail.gmail.com> <3d375d730911171458n2b8c49dfl22eecf8e6f4a8b57@mail.gmail.com> <49d6b3500911171552n739ae92as79062bedf0391a93@mail.gmail.com> Message-ID: <3d375d730911171600i5dbc9dd1mcf6d4dc5ccc9c568@mail.gmail.com> On Tue, Nov 17, 2009 at 17:52, G?khan Sever wrote: > I asked this in one of my early replies just repeating what is the way to > get log-normal sample using scipy.stats? I will use it for a demonstrative > case. > For some reason, this never looks an expected log-normal sample to me: > > stats.lognorm.rvs(1,size=15) > > What am I missing here? Are you expecting that to look like your 15-vector concentration data? That's not what you should expect. Instead, x = stats.lognorm.rvs(1, size=10000) h = np.histogram(x, bins=15) Now, the *histogram* of the samples should look like roughly like the shapes that you are expecting. .rvs() produces the samples themselves. I.e. pretend like each element is the size of an individual particle, not the concentration of a size class of particles. Taking the histogram "simulates" what your instrument does: it finds the amont of particles in each size class. > Thanks for your time and explanations Robert. I really appreciate your help. > Probably I will include you in the acknowledgements part of my presentation. Entirely unnecessary, of course. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From gokhansever at gmail.com Tue Nov 17 19:36:42 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Tue, 17 Nov 2009 18:36:42 -0600 Subject: [SciPy-User] [Numpy-discussion] Fitting a curve on a log-normal distributed data In-Reply-To: <3d375d730911171600i5dbc9dd1mcf6d4dc5ccc9c568@mail.gmail.com> References: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> <3d375d730911171057g4742284fge322cd4e471c7ec@mail.gmail.com> <49d6b3500911171136m2986a26etcfb0e0116fc00b63@mail.gmail.com> <3d375d730911171140r29bea3cx865c7291b3546b12@mail.gmail.com> <49d6b3500911171421x27b2055duc17558c3df3a179e@mail.gmail.com> <3d375d730911171427k6f5a9771p9f7a8cf125395f20@mail.gmail.com> <49d6b3500911171442h718384bdy4ef365626f728cc5@mail.gmail.com> <3d375d730911171458n2b8c49dfl22eecf8e6f4a8b57@mail.gmail.com> <49d6b3500911171552n739ae92as79062bedf0391a93@mail.gmail.com> <3d375d730911171600i5dbc9dd1mcf6d4dc5ccc9c568@mail.gmail.com> Message-ID: <49d6b3500911171636n2fbb5bddt58bb21a0057742a6@mail.gmail.com> On Tue, Nov 17, 2009 at 6:00 PM, Robert Kern wrote: > On Tue, Nov 17, 2009 at 17:52, G?khan Sever wrote: > > > I asked this in one of my early replies just repeating what is the way to > > get log-normal sample using scipy.stats? I will use it for a > demonstrative > > case. > > For some reason, this never looks an expected log-normal sample to me: > > > > stats.lognorm.rvs(1,size=15) > > > > What am I missing here? > > Are you expecting that to look like your 15-vector concentration data? > That's not what you should expect. Instead, > > x = stats.lognorm.rvs(1, size=10000) > h = np.histogram(x, bins=15) > > Now, the *histogram* of the samples should look like roughly like the > shapes that you are expecting. .rvs() produces the samples themselves. > I.e. pretend like each element is the size of an individual particle, > not the concentration of a size class of particles. Taking the > histogram "simulates" what your instrument does: it finds the amont of > particles in each size class. > Now, I see it better. Makes much more sense now. > > > Thanks for your time and explanations Robert. I really appreciate your > help. > > Probably I will include you in the acknowledgements part of my > presentation. > > Entirely unnecessary, of course. > Not at all ;) > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From dpfrota at yahoo.com.br Wed Nov 18 01:29:02 2009 From: dpfrota at yahoo.com.br (dpfrota) Date: Tue, 17 Nov 2009 22:29:02 -0800 (PST) Subject: [SciPy-User] [SciPy-user] Audiolab on Py2.6 In-Reply-To: <4AE5DEDF.7070701@asu.edu> References: <4AE5DEDF.7070701@asu.edu> Message-ID: <26402986.post@talk.nabble.com> What is the meaning of these adresses? I opened these files, and they has some strange lines. The first file has only " __import__('pkg_resources').declare_namespace(__name__) ". Is module PKG necessary? And the second has a comment line that looks a code line. I am looking forward to run Audiolab... Thanks for helping, and forgive my (probably) mistakes! 
Christopher Brown wrote: > > Hi List, > > Has anyone gotten scikits.audiolab working with python 2.6? Here is the > error I get on a clean Python 2.6 install with numpy and audiolab > installed (using the audiolab 0.10.2 installer for py2.6 I downloaded > from pypi, and a clean Win XPSP3 install): > > >>> from scikits import audiolab > Traceback (most recent call last): > File "C:\Python26\lib\site-packages\scikits\audiolab\__init__.py", > line 25, in > from pysndfile import formatinfo, sndfile > File > "C:\Python26\lib\site-packages\scikits\audiolab\pysndfile\__init__.py", > line 1, in > from _sndfile import Sndfile, Format, available_file_formats, > available_encodings > ImportError: DLL load failed: The specified procedure could not be found. > > Any ideas? Everything works fine on py2.5. > > -- > Chris > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- View this message in context: http://old.nabble.com/Audiolab-on-Py2.6-tp26064218p26402986.html Sent from the Scipy-User mailing list archive at Nabble.com. From robert.kern at gmail.com Wed Nov 18 01:31:57 2009 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 18 Nov 2009 00:31:57 -0600 Subject: [SciPy-User] [SciPy-user] Audiolab on Py2.6 In-Reply-To: <26402986.post@talk.nabble.com> References: <4AE5DEDF.7070701@asu.edu> <26402986.post@talk.nabble.com> Message-ID: <3d375d730911172231i4cf42760l80038a00f84fa7c8@mail.gmail.com> On Wed, Nov 18, 2009 at 00:29, dpfrota wrote: > > What is the meaning of these adresses? > I opened these files, and they has some strange lines. The first file has > only " __import__('pkg_resources').declare_namespace(__name__) ". Is module > PKG necessary? These enable the scikits namespace such that you can have multiple scikits packages installed (possibly to separate locations). -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From jr at sun.ac.za Wed Nov 18 05:53:06 2009 From: jr at sun.ac.za (Johann Rohwer) Date: Wed, 18 Nov 2009 12:53:06 +0200 Subject: [SciPy-User] SciPy build error Message-ID: <200911181253.06501.jr@sun.ac.za> I get the following build error for scipy (both numpy and scipy fresh from SVN today): building 'scipy.special.lambertw' extension compiling C sources C compiler: gcc -fno-strict-aliasing -DNDEBUG -O2 -g -pipe -Wformat - Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack- protector --param=ssp-buffer-size=4 -g -fPIC compile options: '-I/usr/lib64/python2.6/site- packages/numpy/core/include -I/usr/lib64/python2.6/site- packages/numpy/core/include -I/usr/include/python2.6 -c' gcc: scipy/special/lambertw.c scipy/special/lambertw.c: In function ?__pyx_f_5scipy_7special_8lambertw_zlog?: scipy/special/lambertw.c:562: error: incompatible types when assigning to type ?npy_cdouble? from type ?int? scipy/special/lambertw.c: In function ?__pyx_f_5scipy_7special_8lambertw_zexp?: scipy/special/lambertw.c:599: error: incompatible types when assigning to type ?npy_cdouble? from type ?int? scipy/special/lambertw.c: In function ?__pyx_f_5scipy_7special_8lambertw_zlog?: scipy/special/lambertw.c:562: error: incompatible types when assigning to type ?npy_cdouble? from type ?int? 
scipy/special/lambertw.c: In function ?__pyx_f_5scipy_7special_8lambertw_zexp?: scipy/special/lambertw.c:599: error: incompatible types when assigning to type ?npy_cdouble? from type ?int? error: Command "gcc -fno-strict-aliasing -DNDEBUG -O2 -g -pipe - Wformat -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions - fstack-protector --param=ssp-buffer-size=4 -g -fPIC - I/usr/lib64/python2.6/site-packages/numpy/core/include - I/usr/lib64/python2.6/site-packages/numpy/core/include - I/usr/include/python2.6 -c scipy/special/lambertw.c -o build/temp.linux-x86_64-2.6/scipy/special/lambertw.o" failed with exit status 1 System: Linux x86_64 gcc version 4.4.1 Self compiled ATLAS 3.8.0 and LAPACK 3.1.1 (Numpy installs fine and passes all tests.) Any ideas? J. From gael.varoquaux at normalesup.org Wed Nov 18 08:46:13 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 18 Nov 2009 14:46:13 +0100 Subject: [SciPy-User] Graph connect components and sparse matrices Message-ID: <20091118134613.GB17382@phare.normalesup.org> Hi there, I would like to list the connect components of a graph (or a sparse matrix, same thing). I know of course of the bread-first traversal, as implemented eg in networkX, to find the connect components. However, I have a feeling that sparse linear algebra must be performing such searches, to decompose sparse matrices in blocks. I'd love to piggy back on such implementations, rather than code and maintain a C or cython version of breadth-first graph traversal. Any idea how I could squeeze the information out of the sparse linear algebra that we carry around with scipy? I thought about using arpack to get the largest eigen vectors of the transition matrix, but that was a stupid idea, as (AFAIK) it will not partition my graph in connect components, but only tell me how many connect components I have. Ga?l From zachary.pincus at yale.edu Wed Nov 18 09:03:18 2009 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Wed, 18 Nov 2009 09:03:18 -0500 Subject: [SciPy-User] Graph connect components and sparse matrices In-Reply-To: <20091118134613.GB17382@phare.normalesup.org> References: <20091118134613.GB17382@phare.normalesup.org> Message-ID: <7E884924-F811-4440-B107-DA86F76A150F@yale.edu> Hi Ga?l, > Any idea how I could squeeze the information out of the sparse linear > algebra that we carry around with scipy? I thought about using > arpack to > get the largest eigen vectors of the transition matrix, but that was a > stupid idea, as (AFAIK) it will not partition my graph in connect > components, but only tell me how many connect components I have. From this useful tutorial on spectral clustering: http://www.kyb.tuebingen.mpg.de/bs/people/ule/publications/publication_downloads/Luxburg07_tutorial.pdf > Thus, the matrix L has as many eigenvalues 0 as there are connected > components, and > the corresponding eigenvectors are the indicator vectors of the > connected components. (where L is the graph laplacian). Zach On Nov 18, 2009, at 8:46 AM, Gael Varoquaux wrote: > Hi there, > > I would like to list the connect components of a graph (or a sparse > matrix, same thing). I know of course of the bread-first traversal, as > implemented eg in networkX, to find the connect components. However, I > have a feeling that sparse linear algebra must be performing such > searches, to decompose sparse matrices in blocks. I'd love to piggy > back > on such implementations, rather than code and maintain a C or cython > version of breadth-first graph traversal. 
> > Any idea how I could squeeze the information out of the sparse linear > algebra that we carry around with scipy? I thought about using > arpack to > get the largest eigen vectors of the transition matrix, but that was a > stupid idea, as (AFAIK) it will not partition my graph in connect > components, but only tell me how many connect components I have. > > Ga?l > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From cimrman3 at ntc.zcu.cz Wed Nov 18 09:03:43 2009 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Wed, 18 Nov 2009 15:03:43 +0100 Subject: [SciPy-User] Graph connect components and sparse matrices In-Reply-To: <20091118134613.GB17382@phare.normalesup.org> References: <20091118134613.GB17382@phare.normalesup.org> Message-ID: <4B03FEBF.3080308@ntc.zcu.cz> Hi Gael, Gael Varoquaux wrote: > Hi there, > > I would like to list the connect components of a graph (or a sparse > matrix, same thing). I know of course of the bread-first traversal, as > implemented eg in networkX, to find the connect components. However, I > have a feeling that sparse linear algebra must be performing such > searches, to decompose sparse matrices in blocks. I'd love to piggy back > on such implementations, rather than code and maintain a C or cython > version of breadth-first graph traversal. I have a function in C (as a part of sfepy), that does that. But as it might be useful for more people, what about putting it scipy sparsetools? > Any idea how I could squeeze the information out of the sparse linear > algebra that we carry around with scipy? I thought about using arpack to > get the largest eigen vectors of the transition matrix, but that was a > stupid idea, as (AFAIK) it will not partition my graph in connect > components, but only tell me how many connect components I have. Getting eigenvectors is imho more costly that the graph search, no? r. From gael.varoquaux at normalesup.org Wed Nov 18 09:12:59 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 18 Nov 2009 15:12:59 +0100 Subject: [SciPy-User] Graph connect components and sparse matrices In-Reply-To: <7E884924-F811-4440-B107-DA86F76A150F@yale.edu> References: <20091118134613.GB17382@phare.normalesup.org> <7E884924-F811-4440-B107-DA86F76A150F@yale.edu> Message-ID: <20091118141259.GA3477@phare.normalesup.org> On Wed, Nov 18, 2009 at 09:03:18AM -0500, Zachary Pincus wrote: > From this useful tutorial on spectral clustering: > http://www.kyb.tuebingen.mpg.de/bs/people/ule/publications/publication_downloads/Luxburg07_tutorial.pdf > > Thus, the matrix L has as many eigenvalues 0 as there are connected > > components, and the corresponding eigenvectors are the indicator > > vectors of the connected components. > (where L is the graph laplacian). I read this tutorial (a very good one, by the way). But I am too dumb to figure out from the above assertion how to retrieve the connect components. Let me explain my problem on an example. Suppose that we have the trivial graph. Its adjacency matrix is the identity, the corresponding laplacian is null. An EVD of these matrices will result in an abitrary orthonormal basis of my vertex space. How do I figure out the connect components from that? The problem arises also on non trivial graphs, by the way. The problem is that doing an EVD of the transition of laplace matrix only gives a subspace of the kernel of the laplace matrix. 
I could probably do a sparse matrix factorization on that, but I see complexity and cost coming in, and I am trying to avoid that. Thanks for your answer, Ga?l From gael.varoquaux at normalesup.org Wed Nov 18 09:16:38 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 18 Nov 2009 15:16:38 +0100 Subject: [SciPy-User] Graph connect components and sparse matrices In-Reply-To: <4B03FEBF.3080308@ntc.zcu.cz> References: <20091118134613.GB17382@phare.normalesup.org> <4B03FEBF.3080308@ntc.zcu.cz> Message-ID: <20091118141638.GB3477@phare.normalesup.org> On Wed, Nov 18, 2009 at 03:03:43PM +0100, Robert Cimrman wrote: > Hi Gael, > Gael Varoquaux wrote: > > Hi there, > > I would like to list the connect components of a graph (or a sparse > > matrix, same thing). I know of course of the bread-first traversal, as > > implemented eg in networkX, to find the connect components. However, I > > have a feeling that sparse linear algebra must be performing such > > searches, to decompose sparse matrices in blocks. I'd love to piggy back > > on such implementations, rather than code and maintain a C or cython > > version of breadth-first graph traversal. > I have a function in C (as a part of sfepy), that does that. But as it > might be useful for more people, what about putting it scipy > sparsetools? I think it would be very useful. I would actually include it in the scipy.sparse namespace too. > > Any idea how I could squeeze the information out of the sparse linear > > algebra that we carry around with scipy? I thought about using arpack to > > get the largest eigen vectors of the transition matrix, but that was a > > stupid idea, as (AFAIK) it will not partition my graph in connect > > components, but only tell me how many connect components I have. > Getting eigenvectors is imho more costly that the graph search, no? Well, getting the largest eigenvector of the transition matrix is in o(n), using arpack, AFAIK. So the cost is similar, and on one side we have optimized C code, and on the other side I only had Python code (or C code that I don't want to maintain). In addition, as I am doing diffusion maps, I needed to call arpack anyhow. Cheers, Ga?l From cimrman3 at ntc.zcu.cz Wed Nov 18 09:24:37 2009 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Wed, 18 Nov 2009 15:24:37 +0100 Subject: [SciPy-User] Graph connect components and sparse matrices In-Reply-To: <20091118141638.GB3477@phare.normalesup.org> References: <20091118134613.GB17382@phare.normalesup.org> <4B03FEBF.3080308@ntc.zcu.cz> <20091118141638.GB3477@phare.normalesup.org> Message-ID: <4B0403A5.60903@ntc.zcu.cz> Gael Varoquaux wrote: > On Wed, Nov 18, 2009 at 03:03:43PM +0100, Robert Cimrman wrote: >> Hi Gael, > >> Gael Varoquaux wrote: >>> Hi there, > >>> I would like to list the connect components of a graph (or a sparse >>> matrix, same thing). I know of course of the bread-first traversal, as >>> implemented eg in networkX, to find the connect components. However, I >>> have a feeling that sparse linear algebra must be performing such >>> searches, to decompose sparse matrices in blocks. I'd love to piggy back >>> on such implementations, rather than code and maintain a C or cython >>> version of breadth-first graph traversal. > >> I have a function in C (as a part of sfepy), that does that. But as it >> might be useful for more people, what about putting it scipy >> sparsetools? > > I think it would be very useful. I would actually include it in the > scipy.sparse namespace too. 
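For reference, the traversal in question can be sketched in a few lines of
pure Python on top of scipy.sparse. This is only an illustrative sketch,
assuming a symmetric adjacency matrix, not the sfepy C routine:

import numpy as np
from scipy import sparse

def connected_components(A):
    """Label the connected components of the graph whose (symmetric)
    adjacency matrix is the sparse matrix A; returns one label per node."""
    A = sparse.csr_matrix(A)
    n = A.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for seed in xrange(n):
        if labels[seed] >= 0:
            continue
        labels[seed] = current
        stack = [seed]
        while stack:                    # simple stack-based flood fill
            node = stack.pop()
            # neighbours read directly from the CSR structure
            for m in A.indices[A.indptr[node]:A.indptr[node + 1]]:
                if labels[m] < 0:
                    labels[m] = current
                    stack.append(m)
        current += 1
    return labels

# toy example with two components: {0, 1, 2} and {3, 4}
rows, cols = [0, 1, 3], [1, 2, 4]
A = sparse.coo_matrix((np.ones(3), (rows, cols)), shape=(5, 5))
print connected_components(A + A.T)     # -> [0 0 0 1 1]

The cost is O(nodes + edges), so for pure labelling it should be cheaper than
going through ARPACK for eigenvectors.
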
OK, I will give it a shot (soon), unless someone jumps in with a better solution. >>> Any idea how I could squeeze the information out of the sparse linear >>> algebra that we carry around with scipy? I thought about using arpack to >>> get the largest eigen vectors of the transition matrix, but that was a >>> stupid idea, as (AFAIK) it will not partition my graph in connect >>> components, but only tell me how many connect components I have. > >> Getting eigenvectors is imho more costly that the graph search, no? > > Well, getting the largest eigenvector of the transition matrix is in > o(n), using arpack, AFAIK. So the cost is similar, and on one side we > have optimized C code, and on the other side I only had Python code (or C > code that I don't want to maintain). In addition, as I am doing diffusion > maps, I needed to call arpack anyhow. I see. BTW. putting a code into scipy somewhat alleviates the maintenance burden ;) cheers, r. From cclarke at chrisdev.com Wed Nov 18 09:04:40 2009 From: cclarke at chrisdev.com (Chris Clarke) Date: Wed, 18 Nov 2009 10:04:40 -0400 Subject: [SciPy-User] timeseries forwardfill Message-ID: Hi I haven't used this library in a while but i seem to recall you could forward fill 2d arrays and set initial starting values etc.?? Am i correct ?? If so, any special reasons were they removed?? Regards Chris From gael.varoquaux at normalesup.org Wed Nov 18 09:31:19 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 18 Nov 2009 15:31:19 +0100 Subject: [SciPy-User] Graph connect components and sparse matrices In-Reply-To: <4B0403A5.60903@ntc.zcu.cz> References: <20091118134613.GB17382@phare.normalesup.org> <4B03FEBF.3080308@ntc.zcu.cz> <20091118141638.GB3477@phare.normalesup.org> <4B0403A5.60903@ntc.zcu.cz> Message-ID: <20091118143119.GC3477@phare.normalesup.org> On Wed, Nov 18, 2009 at 03:24:37PM +0100, Robert Cimrman wrote: > > Well, getting the largest eigenvector of the transition matrix is in > > o(n), using arpack, AFAIK. So the cost is similar, and on one side we > > have optimized C code, and on the other side I only had Python code (or C > > code that I don't want to maintain). In addition, as I am doing diffusion > > maps, I needed to call arpack anyhow. > I see. BTW. putting a code into scipy somewhat alleviates the > maintenance burden ;) I'd love to, but the code I am talking about is not something you want to see. I inherited it from the lab, and its been a horrible burden. Not that there are not good part in it (there are a lot of excellent alogrithms), but the problem is that it uses home grown vector abstractions, and graph structures, which makes it really hard to split out the good part. In the long run, I hope I will be able to trim out the bad parts and the vector library, and replace this by scipy components, and the work that David Cournapeau has been doing to expose numpy internals to C libraries. Once this is doing, we can think of moving things out to other libraries: scipy, networkx, or the machine learning scikit (we have an engineer hired to work on that, beginning in January). 
Ga?l From zachary.pincus at yale.edu Wed Nov 18 09:36:26 2009 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Wed, 18 Nov 2009 09:36:26 -0500 Subject: [SciPy-User] Graph connect components and sparse matrices In-Reply-To: <20091118141259.GA3477@phare.normalesup.org> References: <20091118134613.GB17382@phare.normalesup.org> <7E884924-F811-4440-B107-DA86F76A150F@yale.edu> <20091118141259.GA3477@phare.normalesup.org> Message-ID: > I read this tutorial (a very good one, by the way). But I am too > dumb to > figure out from the above assertion how to retrieve the connect > components. Let me explain my problem on an example. Suppose that we > have > the trivial graph. Its adjacency matrix is the identity, the > corresponding laplacian is null. An EVD of these matrices will > result in > an abitrary orthonormal basis of my vertex space. How do I figure > out the > connect components from that? Good question! On the other hand, IIRC the eigenvectors of the zero matrix are usually defined to be the unit basis vectors, so that solves this particular edge case. Note that at least the non-sparse routines in numpy do this: numpy.linalg.eig(numpy.zeros((5,5))) (array([ 0., 0., 0., 0., 0.]), array([[ 1., 0., 0., 0., 0.], [ 0., 1., 0., 0., 0.], [ 0., 0., 1., 0., 0.], [ 0., 0., 0., 1., 0.], [ 0., 0., 0., 0., 1.]])) So there we have exactly what you want in the trivial case. > The problem arises also on non trivial graphs, by the way. The > problem is > that doing an EVD of the transition of laplace matrix only gives a > subspace of the kernel of the laplace matrix. I could probably do a > sparse matrix factorization on that, but I see complexity and cost > coming > in, and I am trying to avoid that. My linear algebra is only tenuous at best, so I don't exactly see why this is a problem. As far as I understand, to find the connected components, first you find the eigenvectors of the laplacian that have an eigenvalue of zero. Then for each node in the graph i, there will be exactly one eigenvector with a non-zero value at position i. The index of this eigenvector is the index of the connected component that i belongs to. Is that right? Again, my linear algebra is rusty. Zach From cimrman3 at ntc.zcu.cz Wed Nov 18 09:44:16 2009 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Wed, 18 Nov 2009 15:44:16 +0100 Subject: [SciPy-User] Graph connect components and sparse matrices In-Reply-To: <20091118143119.GC3477@phare.normalesup.org> References: <20091118134613.GB17382@phare.normalesup.org> <4B03FEBF.3080308@ntc.zcu.cz> <20091118141638.GB3477@phare.normalesup.org> <4B0403A5.60903@ntc.zcu.cz> <20091118143119.GC3477@phare.normalesup.org> Message-ID: <4B040840.2050606@ntc.zcu.cz> Gael Varoquaux wrote: > On Wed, Nov 18, 2009 at 03:24:37PM +0100, Robert Cimrman wrote: >>> Well, getting the largest eigenvector of the transition matrix is in >>> o(n), using arpack, AFAIK. So the cost is similar, and on one side we >>> have optimized C code, and on the other side I only had Python code (or C >>> code that I don't want to maintain). In addition, as I am doing diffusion >>> maps, I needed to call arpack anyhow. > >> I see. BTW. putting a code into scipy somewhat alleviates the >> maintenance burden ;) > > I'd love to, but the code I am talking about is not something you want to > see. I inherited it from the lab, and its been a horrible burden. 
Not > that there are not good part in it (there are a lot of excellent > alogrithms), but the problem is that it uses home grown vector > abstractions, and graph structures, which makes it really hard to split > out the good part. In the long run, I hope I will be able to trim out the > bad parts and the vector library, and replace this by scipy components, > and the work that David Cournapeau has been doing to expose numpy > internals to C libraries. Once this is doing, we can think of moving > things out to other libraries: scipy, networkx, or the machine learning > scikit (we have an engineer hired to work on that, beginning in January). Now this is an interesting shift in attitude that I experienced myself - instead of putting all the cool stuff into own code, distribute it over well-known and maintained packages ;) cheers, r. From zachary.pincus at yale.edu Wed Nov 18 09:55:17 2009 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Wed, 18 Nov 2009 09:55:17 -0500 Subject: [SciPy-User] Graph connect components and sparse matrices In-Reply-To: References: <20091118134613.GB17382@phare.normalesup.org> <7E884924-F811-4440-B107-DA86F76A150F@yale.edu> <20091118141259.GA3477@phare.normalesup.org> Message-ID: <79E5D595-F6DF-4E94-872B-68059F5E8001@yale.edu> >> The problem arises also on non trivial graphs, by the way. The >> problem is >> that doing an EVD of the transition of laplace matrix only gives a >> subspace of the kernel of the laplace matrix. I could probably do a >> sparse matrix factorization on that, but I see complexity and cost >> coming >> in, and I am trying to avoid that. > > My linear algebra is only tenuous at best, so I don't exactly see why > this is a problem. As far as I understand, to find the connected > components, first you find the eigenvectors of the laplacian that have > an eigenvalue of zero. Then for each node in the graph i, there will > be exactly one eigenvector with a non-zero value at position i. The > index of this eigenvector is the index of the connected component that > i belongs to. Wait... you're saying that the eigenvectors will only span a subspace of the kernel, so that there must be at least some position i where there is a zero value in each eigenvector? If this is correct then I see the problem; hopefully someone who actually knows what they're talking about can help me out here... From pgmdevlist at gmail.com Wed Nov 18 10:50:31 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 18 Nov 2009 10:50:31 -0500 Subject: [SciPy-User] timeseries forwardfill In-Reply-To: References: Message-ID: On Nov 18, 2009, at 9:04 AM, Chris Clarke wrote: > Hi > I haven't used this library in a while but i seem to recall you could > forward fill 2d arrays and set initial starting values etc.?? > Am i correct ?? If so, any special reasons were they removed?? > Regards > Chris What do you mean, removed ? You can find `forward_fill` in scikits.timeseries.lib.interpolate. Am I answering your question ? P. From seb.haase at gmail.com Wed Nov 18 11:00:19 2009 From: seb.haase at gmail.com (Sebastian Haase) Date: Wed, 18 Nov 2009 17:00:19 +0100 Subject: [SciPy-User] difference of angles - to be between -180 and + 180 Message-ID: Hi, Does anyone have a function that calculates delta-angles taking the wrap-around at 180 degrees into account ? 
I'm thinking of a function like: >>> diffAngle(190, -10) 160 My current version looks like this: def diffAngle(a1,a0): """ return a1-a0 handle wrap-around for -180 and +180 """ d = a1-a0 if d < -180: d=360+d if d> 180: d=360-d return d diffAngle=np.vectorize(diffAngle) But I'm not sure if this is handling all cases correctly ;-( Especially I have problems regarding the correct sign - in cases like this: diffAngle(20, -170) where I was expecting -170 , but I get 170. Thanks, Sebastian Haase From gael.varoquaux at normalesup.org Wed Nov 18 11:10:04 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 18 Nov 2009 17:10:04 +0100 Subject: [SciPy-User] Graph connect components and sparse matrices In-Reply-To: References: <20091118134613.GB17382@phare.normalesup.org> <7E884924-F811-4440-B107-DA86F76A150F@yale.edu> <20091118141259.GA3477@phare.normalesup.org> Message-ID: <20091118161004.GE17382@phare.normalesup.org> On Wed, Nov 18, 2009 at 09:36:26AM -0500, Zachary Pincus wrote: > Note that at least the non-sparse routines in numpy do this: > numpy.linalg.eig(numpy.zeros((5,5))) > (array([ 0., 0., 0., 0., 0.]), > array([[ 1., 0., 0., 0., 0.], > [ 0., 1., 0., 0., 0.], > [ 0., 0., 1., 0., 0.], > [ 0., 0., 0., 1., 0.], > [ 0., 0., 0., 0., 1.]])) Correct, and they do work OK on real-word graphs, but arpack doesn't, and its easy to see why (more on that below). > > The problem arises also on non trivial graphs, by the way. The > > problem is that doing an EVD of the transition of laplace matrix only > > gives a subspace of the kernel of the laplace matrix. I could > > probably do a sparse matrix factorization on that, but I see > > complexity and cost coming in, and I am trying to avoid that. > My linear algebra is only tenuous at best, so I don't exactly see why > this is a problem. As far as I understand, to find the connected > components, first you find the eigenvectors of the laplacian that have > an eigenvalue of zero. Then for each node in the graph i, there will > be exactly one eigenvector with a non-zero value at position i. The > index of this eigenvector is the index of the connected component that > i belongs to. > Is that right? Again, my linear algebra is rusty. Well, the problem is that if you have several eigen values that have the same value (0 for the laplacian, or 1 for the transition matrix), there is an infinity of eigen vectors defined: any combination of eigen vector corresponding to that eigen value is an eigen vector. What I am looking for is a set of particular eigen vectors. I suspect that the property that defines seem is sparsity (in a machine learning sens, rather than a sparse linear algebra sens): many of their coefficients are 0. There machine learning algorithms to find a sparse basis from a non-sparse one, but first of all it starts getting too complex for my liking, second I am unsure that the sparsity is really the exact property that will give me the connect components of my graph. Ga?l From guyer at nist.gov Wed Nov 18 11:19:59 2009 From: guyer at nist.gov (Jonathan Guyer) Date: Wed, 18 Nov 2009 11:19:59 -0500 Subject: [SciPy-User] difference of angles - to be between -180 and + 180 In-Reply-To: <8641C428-4AB0-4BFD-8EF5-90F73939668C@nist.gov> References: <8641C428-4AB0-4BFD-8EF5-90F73939668C@nist.gov> Message-ID: On Nov 18, 2009, at 11:14 AM, I wrote: > return np.fmod(d + 540, 360) - 180 Actually, I think you can just write (d + 540) % 360 - 180 I think we used fmod because of some automatic weave inlining we do that didn't play nice with '%'. 
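As a side note, the modulo form vectorizes directly in NumPy (the % operator is elementwise), so the np.vectorize wrapper is not needed. A small sketch of that formula, plus a variant (an addition here, not from the thread) that returns +180 rather than -180 at the wrap-around point:

import numpy as np

def diff_angle(a1, a0):
    """Signed difference a1 - a0, wrapped into [-180, 180)."""
    return (np.asarray(a1) - np.asarray(a0) + 180.0) % 360.0 - 180.0

def diff_angle_closed(a1, a0):
    """Same difference, wrapped into (-180, 180] so the boundary case
    comes back as +180 instead of -180."""
    return -((np.asarray(a0) - np.asarray(a1) + 180.0) % 360.0 - 180.0)

diff_angle(190, -10)                                      # -> -160.0
diff_angle(np.array([20, -10]), np.array([-170, 180]))    # -> array([-170., 170.])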
From gael.varoquaux at normalesup.org Wed Nov 18 11:22:51 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 18 Nov 2009 17:22:51 +0100 Subject: [SciPy-User] Graph connect components and sparse matrices In-Reply-To: <79E5D595-F6DF-4E94-872B-68059F5E8001@yale.edu> References: <20091118134613.GB17382@phare.normalesup.org> <7E884924-F811-4440-B107-DA86F76A150F@yale.edu> <20091118141259.GA3477@phare.normalesup.org> <79E5D595-F6DF-4E94-872B-68059F5E8001@yale.edu> Message-ID: <20091118162251.GF17382@phare.normalesup.org> On Wed, Nov 18, 2009 at 09:55:17AM -0500, Zachary Pincus wrote: > >> The problem arises also on non trivial graphs, by the way. The > >> problem is > >> that doing an EVD of the transition of laplace matrix only gives a > >> subspace of the kernel of the laplace matrix. I could probably do a > >> sparse matrix factorization on that, but I see complexity and cost > >> coming > >> in, and I am trying to avoid that. > > My linear algebra is only tenuous at best, so I don't exactly see why > > this is a problem. As far as I understand, to find the connected > > components, first you find the eigenvectors of the laplacian that have > > an eigenvalue of zero. Then for each node in the graph i, there will > > be exactly one eigenvector with a non-zero value at position i. The > > index of this eigenvector is the index of the connected component that > > i belongs to. > Wait... you're saying that the eigenvectors will only span a subspace > of the kernel, so that there must be at least some position i where > there is a zero value in each eigenvector? Yes, that's it. The eigenvectors are defined only at a rotation. > If this is correct then I see the problem; hopefully someone who > actually knows what they're talking about can help me out here... So do I :) Thanks for your thoughts, Ga?l From cclarke at chrisdev.com Wed Nov 18 17:18:35 2009 From: cclarke at chrisdev.com (Chris Clarke) Date: Wed, 18 Nov 2009 18:18:35 -0400 Subject: [SciPy-User] timeseries forwardfill In-Reply-To: References: Message-ID: <9DC4A120-0DF0-4E33-91E1-04584E04135F@chrisdev.com> Sorry for the later reply. Yes forward_fill is still there and it works!!! But it seemed to have some more capability (initial values, 2d arrays) when it was in the sandbox?? I may be wrong and mixing up with some other library. I just wanted to be sure that if i do my own patch i'm not reinventing the wheel!! As this is why we are standardizing on scikists.timeseries Regards Chris On Nov 18, 2009, at 11:50 AM, Pierre GM wrote: > > On Nov 18, 2009, at 9:04 AM, Chris Clarke wrote: > >> Hi >> I haven't used this library in a while but i seem to recall you >> could >> forward fill 2d arrays and set initial starting values etc.?? >> Am i correct ?? If so, any special reasons were they removed?? >> Regards >> Chris > > What do you mean, removed ? You can find `forward_fill` in > scikits.timeseries.lib.interpolate. > Am I answering your question ? > P. > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From sftrytry at gmail.com Wed Nov 18 17:56:53 2009 From: sftrytry at gmail.com (Jesse Fox) Date: Wed, 18 Nov 2009 17:56:53 -0500 Subject: [SciPy-User] scipy have problems with preinstalled arpack Message-ID: <6a2f0640911181456i3aa5fbffka85c5ba87a61b0a4@mail.gmail.com> I tried to compile scipy and numpy on my Archlinux box. I always got segment fault during scipy.test() on arpack related functions. 
I tried to remove my pre-installed arpack and recompile scipy. The test() ran without any problem. Is there any conflict between scipy and arpack? -------------- next part -------------- An HTML attachment was scrubbed... URL: From reakinator at gmail.com Wed Nov 18 18:13:42 2009 From: reakinator at gmail.com (Rich E) Date: Thu, 19 Nov 2009 00:13:42 +0100 Subject: [SciPy-User] installing scipy on OS X 10.6 Snow Leopard Message-ID: Hi list, I just joined because I've had this mac for over a month and I still can't get a working module of scipy in Snow Leopard. The dmg says it needs 'python 2.6 or newer', but the one that comes with Snow Leopard is 2.6.1. Installing MacPython seems to create more problems than anything else, and building scipy from source is a no-go (I saw various other posts in the archives about dependency problems, but no solutions yet. I am stuck at UMFPACK sourcers being missing.) Any help or guidance is greatly appreciate. Rich -------------- next part -------------- An HTML attachment was scrubbed... URL: From cool-rr at cool-rr.com Wed Nov 18 18:50:40 2009 From: cool-rr at cool-rr.com (cool-RR) Date: Thu, 19 Nov 2009 01:50:40 +0200 Subject: [SciPy-User] Announcment and question Message-ID: Hello, Announcement: I've talked about it in this mailing list before, but yesterday I finally made the first alpha release of my open-source Python scientific computing project, GarlicSim. Check it out: http://garlicsim.org It is a Pythonic framework for working with simulations. Check out the page, also there is a yet-incomplete introductionto it, which goes more in-depth. My first priority right now is getting users and building a community around it, so I'd be available to help people to write their simulation packages and solve problems. Early users will have this benefit, and additionally will have more effect on the evolution of the software. So if you do any work with simulations, drop me a mail, and I can help you use GarlicSim. Also, I have a question. Up to now I've been supporting Python 2.4 through 3.1 with my project. Supporting 2.4 has been a real burden; I created a separate fork for it, because I don't want to limit my entire project to 2.4. (I love context managers, for example.) I'm considering dropping support for 2.4. The question is, how many people in the scientific Python community still use 2.4? Is it worth supporting? Ram. -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgmdevlist at gmail.com Wed Nov 18 19:40:03 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 18 Nov 2009 19:40:03 -0500 Subject: [SciPy-User] installing scipy on OS X 10.6 Snow Leopard In-Reply-To: References: Message-ID: <6324058A-FD14-47BF-9308-9BB5D123CA3A@gmail.com> On Nov 18, 2009, at 6:13 PM, Rich E wrote: > Hi list, > > I just joined because I've had this mac for over a month and I still can't get a working module of scipy in Snow Leopard. The dmg says it needs 'python 2.6 or newer', but the one that comes with Snow Leopard is 2.6.1. So you're set: 2.6.1 is more recent than 2.6... But you probably shouldn't use a dmg: install Scipy from sources, it's far easier to help you. Assuming you have xcode installed, and a proper gfortran (I think this one is the recommended one: http://r.research.att.com/tools/) * Install numpy first. make a local install by using the --user flag when calling python setup.py install. 
No need to install an additional Python if you use --user, you won't be messing with your system * Then, install scipy, using the --user flag as well. Don't bother for UMFPACK for the moment * You may want to set CFLAGS="-arch x86_64" before installing numpy and scipy. * Let me know where your problems are (off-list for now to reduce the noise), post the log of your build somewhere. Don't worry, it's straightforward, provided you stick to the Python that comes w/ SnowLeopard Good luck P. From zachary.pincus at yale.edu Wed Nov 18 20:15:55 2009 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Wed, 18 Nov 2009 20:15:55 -0500 Subject: [SciPy-User] Dijkstra's algorithm on a lattice Message-ID: Hi all, A bit off-topic, but before I write some C or cython to do this, I thought I'd ask to see if anyone knows of existing code for the task of finding the shortest (weighted) path between two points on a lattice. Specifically, I have images with "start" and "end" pixels marked and I want to find the path through the image with the lowest integrated intensity. Trivial but tedious to implement, so if anyone has some good tips I'd be happy to know. (There's already a left-to-right- shortest-path-finder in the image scikit repository, but that's not quite what I need.) Thanks, Zach From pgmdevlist at gmail.com Wed Nov 18 20:17:24 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 18 Nov 2009 20:17:24 -0500 Subject: [SciPy-User] timeseries forwardfill In-Reply-To: <9DC4A120-0DF0-4E33-91E1-04584E04135F@chrisdev.com> References: <9DC4A120-0DF0-4E33-91E1-04584E04135F@chrisdev.com> Message-ID: On Nov 18, 2009, at 5:18 PM, Chris Clarke wrote: > Sorry for the later reply. Yes forward_fill is still there and it > works!!! Good > But it seemed to have some more capability (initial values, 2d arrays) > when it was in the sandbox?? > I may be wrong and mixing up with some other library. That does sound familiar, but i don't think it was part of scikits.timeseries... A patch for 2D would be welcome, I'm not quite sure what you mean by initial value, though From david at ar.media.kyoto-u.ac.jp Wed Nov 18 23:31:39 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Thu, 19 Nov 2009 13:31:39 +0900 Subject: [SciPy-User] installing scipy on OS X 10.6 Snow Leopard In-Reply-To: References: Message-ID: <4B04CA2B.4000105@ar.media.kyoto-u.ac.jp> Rich E wrote: > Hi list, > > I just joined because I've had this mac for over a month and I still > can't get a working module of scipy in Snow Leopard. The dmg says it > needs 'python 2.6 or newer', but the one that comes with Snow Leopard > is 2.6.1. The dmg needs a python installed from python.org. If you want to get scipy with the included python, you need to build it yourself. UMFPACK is not needed - if you have a problem, please report the exact error as well as the command you used to build. 
Just saying it does not work is not enough to help you, cheers, David From cimrman3 at ntc.zcu.cz Thu Nov 19 06:51:34 2009 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Thu, 19 Nov 2009 12:51:34 +0100 Subject: [SciPy-User] Graph connect components and sparse matrices In-Reply-To: <4B0403A5.60903@ntc.zcu.cz> References: <20091118134613.GB17382@phare.normalesup.org> <4B03FEBF.3080308@ntc.zcu.cz> <20091118141638.GB3477@phare.normalesup.org> <4B0403A5.60903@ntc.zcu.cz> Message-ID: <4B053146.6020609@ntc.zcu.cz> Robert Cimrman wrote: > Gael Varoquaux wrote: >> On Wed, Nov 18, 2009 at 03:03:43PM +0100, Robert Cimrman wrote: >>> Hi Gael, >>> Gael Varoquaux wrote: >>>> Hi there, >>>> I would like to list the connect components of a graph (or a sparse >>>> matrix, same thing). I know of course of the bread-first traversal, as >>>> implemented eg in networkX, to find the connect components. However, I >>>> have a feeling that sparse linear algebra must be performing such >>>> searches, to decompose sparse matrices in blocks. I'd love to piggy back >>>> on such implementations, rather than code and maintain a C or cython >>>> version of breadth-first graph traversal. >>> I have a function in C (as a part of sfepy), that does that. But as it >>> might be useful for more people, what about putting it scipy >>> sparsetools? >> I think it would be very useful. I would actually include it in the >> scipy.sparse namespace too. > > OK, I will give it a shot (soon), unless someone jumps in with a better solution. > It's now in ticket #1057. r. From gael.varoquaux at normalesup.org Thu Nov 19 07:27:27 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 19 Nov 2009 13:27:27 +0100 Subject: [SciPy-User] Graph connect components and sparse matrices In-Reply-To: <4B053146.6020609@ntc.zcu.cz> References: <20091118134613.GB17382@phare.normalesup.org> <4B03FEBF.3080308@ntc.zcu.cz> <20091118141638.GB3477@phare.normalesup.org> <4B0403A5.60903@ntc.zcu.cz> <4B053146.6020609@ntc.zcu.cz> Message-ID: <20091119122727.GA1278@phare.normalesup.org> On Thu, Nov 19, 2009 at 12:51:34PM +0100, Robert Cimrman wrote: > > OK, I will give it a shot (soon), unless someone jumps in with a better solution. > It's now in ticket #1057. Excellent. I am looking at it right now. Too bad I am not getting the ticket mail :) Ga?l From seb.haase at gmail.com Thu Nov 19 07:48:22 2009 From: seb.haase at gmail.com (Sebastian Haase) Date: Thu, 19 Nov 2009 13:48:22 +0100 Subject: [SciPy-User] difference of angles - to be between -180 and + 180 In-Reply-To: References: <8641C428-4AB0-4BFD-8EF5-90F73939668C@nist.gov> Message-ID: On Wed, Nov 18, 2009 at 5:19 PM, Jonathan Guyer wrote: > On Nov 18, 2009, at 11:14 AM, I wrote: > >> ? return np.fmod(d + 540, 360) - 180 > > Actually, I think you can just write (d + 540) % 360 - 180 > > I think we used fmod because of some automatic weave inlining we do > that didn't play nice with '%'. Hi Jonathan, thanks for your answer. I might prefer your solution simply for its brevity. However, there are also some sign "problems": for the angle from -10 to 180 I was expecting +170, but your solution returns -170. and for (to:)'190' (from) '-10' expected: '160' , yours returns -160. I have a list of 30 test cases, which these are the only 2 were yours gave unexpected results regarding the sign -- besides the fact that yours always returns -180 instead of +180, but that is obviously not really wrong. 
Thanks, Sebastian From sccolbert at gmail.com Thu Nov 19 08:34:25 2009 From: sccolbert at gmail.com (Chris Colbert) Date: Thu, 19 Nov 2009 14:34:25 +0100 Subject: [SciPy-User] difference of angles - to be between -180 and + 180 In-Reply-To: References: <8641C428-4AB0-4BFD-8EF5-90F73939668C@nist.gov> Message-ID: <7f014ea60911190534w3a838e52ne4bc46467656e46@mail.gmail.com> On Thu, Nov 19, 2009 at 1:48 PM, Sebastian Haase wrote: > On Wed, Nov 18, 2009 at 5:19 PM, Jonathan Guyer wrote: >> On Nov 18, 2009, at 11:14 AM, I wrote: >> >>> ? return np.fmod(d + 540, 360) - 180 >> >> Actually, I think you can just write (d + 540) % 360 - 180 >> >> I think we used fmod because of some automatic weave inlining we do >> that didn't play nice with '%'. > > Hi Jonathan, > thanks for your answer. I might prefer your solution simply for its brevity. > > However, there are also some sign "problems": > for the angle from -10 to 180 I was expecting +170, but your solution > returns -170. > > and for (to:)'190' (from) '-10' ?expected: '160' , yours returns -160. > -170 and -160 are the correct answers for those differences > I have a list of 30 test cases, which these are the only 2 were yours > gave unexpected results regarding the sign -- besides the fact that > yours always returns -180 instead of +180, but that is obviously not > really wrong. > > Thanks, > Sebastian > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From seb.haase at gmail.com Thu Nov 19 08:52:13 2009 From: seb.haase at gmail.com (Sebastian Haase) Date: Thu, 19 Nov 2009 14:52:13 +0100 Subject: [SciPy-User] difference of angles - to be between -180 and + 180 In-Reply-To: <7f014ea60911190534w3a838e52ne4bc46467656e46@mail.gmail.com> References: <8641C428-4AB0-4BFD-8EF5-90F73939668C@nist.gov> <7f014ea60911190534w3a838e52ne4bc46467656e46@mail.gmail.com> Message-ID: Thanks for the enlightenment ;-) -S. On Thu, Nov 19, 2009 at 2:34 PM, Chris Colbert wrote: > On Thu, Nov 19, 2009 at 1:48 PM, Sebastian Haase wrote: >> On Wed, Nov 18, 2009 at 5:19 PM, Jonathan Guyer wrote: >>> On Nov 18, 2009, at 11:14 AM, I wrote: >>> >>>> ? return np.fmod(d + 540, 360) - 180 >>> >>> Actually, I think you can just write (d + 540) % 360 - 180 >>> >>> I think we used fmod because of some automatic weave inlining we do >>> that didn't play nice with '%'. >> >> Hi Jonathan, >> thanks for your answer. I might prefer your solution simply for its brevity. >> >> However, there are also some sign "problems": >> for the angle from -10 to 180 I was expecting +170, but your solution >> returns -170. >> >> and for (to:)'190' (from) '-10' ?expected: '160' , yours returns -160. >> > > > -170 and -160 are the correct answers for those differences > > >> I have a list of 30 test cases, which these are the only 2 were yours >> gave unexpected results regarding the sign -- besides the fact that >> yours always returns -180 instead of +180, but that is obviously not >> really wrong. 
>> >> Thanks, >> Sebastian >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From dcswest at gmail.com Thu Nov 19 10:58:33 2009 From: dcswest at gmail.com (Dennis C) Date: Thu, 19 Nov 2009 07:58:33 -0800 Subject: [SciPy-User] installing scipy on OS X 10.6 Snow Leopard Message-ID: Greetings Rich; Another option that recently worked for me was to just install it through MacPorts. That does maintain its own library in /opt so it will also install Python even when it's already elsewhere on the system, but it'll take care of all other dependencies too including the NumPy... Good luck, Message: 3 Date: Thu, 19 Nov 2009 00:13:42 +0100 From: Rich E Subject: [SciPy-User] installing scipy on OS X 10.6 Snow Leopard To: scipy-user at scipy.org Message-ID: Content-Type: text/plain; charset="iso-8859-1" Hi list, I just joined because I've had this mac for over a month and I still can't get a working module of scipy in Snow Leopard. The dmg says it needs 'python 2.6 or newer', but the one that comes with Snow Leopard is 2.6.1. Installing MacPython seems to create more problems than anything else, and building scipy from source is a no-go (I saw various other posts in the archives about dependency problems, but no solutions yet. I am stuck at UMFPACK sourcers being missing.) Any help or guidance is greatly appreciate. Rich -------------- next part -------------- An HTML attachment was scrubbed... URL: From wnbell at gmail.com Thu Nov 19 11:01:19 2009 From: wnbell at gmail.com (Nathan Bell) Date: Thu, 19 Nov 2009 11:01:19 -0500 Subject: [SciPy-User] Graph connect components and sparse matrices In-Reply-To: <4B053146.6020609@ntc.zcu.cz> References: <20091118134613.GB17382@phare.normalesup.org> <4B03FEBF.3080308@ntc.zcu.cz> <20091118141638.GB3477@phare.normalesup.org> <4B0403A5.60903@ntc.zcu.cz> <4B053146.6020609@ntc.zcu.cz> Message-ID: On Thu, Nov 19, 2009 at 6:51 AM, Robert Cimrman wrote: > > It's now in ticket #1057. > Hi Robert, Sorry for getting on this thread so late, I've been extremely busy lately. I think we should definitely include more graph algorithms in scipy.sparse. The cost of extracting the same info via eigenvectors is high and the results are less trustworthy. We've implemented several such algorithms (like connected_components [1]) in PyAMG. Since the code is organized in similar fashion to scipy.sparse it would make sense to transfer some or all of the functionality in pyamg.graph into scipy.sparse.graph or some such namespace. I'd also like to add some reordering methods like RCM and nested bisection. 
[1] http://code.google.com/p/pyamg/source/browse/trunk/pyamg/graph.py#271 -- Nathan Bell wnbell at gmail.com http://www.wnbell.com/ From gael.varoquaux at normalesup.org Thu Nov 19 11:09:56 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 19 Nov 2009 17:09:56 +0100 Subject: [SciPy-User] Graph connect components and sparse matrices In-Reply-To: References: <20091118134613.GB17382@phare.normalesup.org> <4B03FEBF.3080308@ntc.zcu.cz> <20091118141638.GB3477@phare.normalesup.org> <4B0403A5.60903@ntc.zcu.cz> <4B053146.6020609@ntc.zcu.cz> Message-ID: <20091119160956.GB1278@phare.normalesup.org> On Thu, Nov 19, 2009 at 11:01:19AM -0500, Nathan Bell wrote: > I think we should definitely include more graph algorithms in > scipy.sparse. The cost of extracting the same info via eigenvectors > is high and the results are less trustworthy. > We've implemented several such algorithms (like connected_components > [1]) in PyAMG. I thought you might have. By the way, that thing (pyAMG) is just fantastic). > Since the code is organized in similar fashion to scipy.sparse it would > make sense to transfer some or all of the functionality in pyamg.graph > into scipy.sparse.graph or some such namespace. I'd love to see all of it, actually. > I'd also like to add some reordering methods like RCM and nested > bisection. I am really interested in all that. I don't have time to contribute in the short term, but in the long run (one to two years), I have a big interest there. I think moving these features in scipy would enable code sharing between a lot of other libraries (pyAMG, networkX, sfepy, and probably other PDE solvers). Beside, the nipy project has some graph algorithms for machine learning and computer vision that use custom structures, and should move to common structures in the long run, and maybe in a comon project (we are thinking of the scikit learn). Very exciting talk! Ga?l From mcmcclur at unca.edu Thu Nov 19 11:01:21 2009 From: mcmcclur at unca.edu (Mark McClure) Date: Thu, 19 Nov 2009 11:01:21 -0500 Subject: [SciPy-User] Numerical methods textbook recs? Message-ID: <7414ba0d0911190801i521ebe35s61494f33fa130b9e@mail.gmail.com> I'll be teaching an undergraduate level course in elementary numerical methods next semester. I would seriously consider using Python/SciPy as the computing environment for the course but I have not been able to find a textbook that is Python based. The appropriate level for the text would be similar to Kincaid and Cheney, as you can preview here: http://books.google.com/books?id=x69Q226WR8kC That book is more expensive than I'd like, however, and is not Python based. Any suggestions? Thanks, Mark McClure From cimrman3 at ntc.zcu.cz Thu Nov 19 11:21:56 2009 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Thu, 19 Nov 2009 17:21:56 +0100 Subject: [SciPy-User] Graph connect components and sparse matrices In-Reply-To: References: <20091118134613.GB17382@phare.normalesup.org> <4B03FEBF.3080308@ntc.zcu.cz> <20091118141638.GB3477@phare.normalesup.org> <4B0403A5.60903@ntc.zcu.cz> <4B053146.6020609@ntc.zcu.cz> Message-ID: <4B0570A4.6090402@ntc.zcu.cz> Nathan Bell wrote: > On Thu, Nov 19, 2009 at 6:51 AM, Robert Cimrman wrote: >> It's now in ticket #1057. >> > > Hi Robert, > > Sorry for getting on this thread so late, I've been extremely busy lately. > > > I think we should definitely include more graph algorithms in > scipy.sparse. The cost of extracting the same info via eigenvectors > is high and the results are less trustworthy. 
> > We've implemented several such algorithms (like connected_components > [1]) in PyAMG. Since the code is organized in similar fashion to > scipy.sparse it would make sense to transfer some or all of the > functionality in pyamg.graph into scipy.sparse.graph or some such > namespace. I'd also like to add some reordering methods like RCM and > nested bisection. > > [1] http://code.google.com/p/pyamg/source/browse/trunk/pyamg/graph.py#271 Hi Nathan, I have implemented RCM into sfepy too... Fortunately, I already had a C functions lying around, so I did not waste too much time on that. It would be perfect to have all this in scipy instead! cheers, r. From cimrman3 at ntc.zcu.cz Thu Nov 19 11:24:25 2009 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Thu, 19 Nov 2009 17:24:25 +0100 Subject: [SciPy-User] Graph connect components and sparse matrices In-Reply-To: <20091119160956.GB1278@phare.normalesup.org> References: <20091118134613.GB17382@phare.normalesup.org> <4B03FEBF.3080308@ntc.zcu.cz> <20091118141638.GB3477@phare.normalesup.org> <4B0403A5.60903@ntc.zcu.cz> <4B053146.6020609@ntc.zcu.cz> <20091119160956.GB1278@phare.normalesup.org> Message-ID: <4B057139.7060001@ntc.zcu.cz> Gael Varoquaux wrote: > On Thu, Nov 19, 2009 at 11:01:19AM -0500, Nathan Bell wrote: >> I think we should definitely include more graph algorithms in >> scipy.sparse. The cost of extracting the same info via eigenvectors >> is high and the results are less trustworthy. > >> We've implemented several such algorithms (like connected_components >> [1]) in PyAMG. > > I thought you might have. By the way, that thing (pyAMG) is just > fantastic). +1. (BTW. I still have to explore why it does not work well with my matrices...) >> Since the code is organized in similar fashion to scipy.sparse it would >> make sense to transfer some or all of the functionality in pyamg.graph >> into scipy.sparse.graph or some such namespace. > > I'd love to see all of it, actually. > >> I'd also like to add some reordering methods like RCM and nested >> bisection. > > I am really interested in all that. I don't have time to contribute in > the short term, but in the long run (one to two years), I have a big > interest there. > > I think moving these features in scipy would enable code sharing between > a lot of other libraries (pyAMG, networkX, sfepy, and probably other PDE > solvers). Beside, the nipy project has some graph algorithms for machine > learning and computer vision that use custom structures, and should move > to common structures in the long run, and maybe in a comon project (we > are thinking of the scikit learn). Again, +1. I was forced to code some linear algebra/graph stuff, which is now in sfepy, but which I would prefer to have in scipy instead. r. From aisaac at american.edu Thu Nov 19 11:52:30 2009 From: aisaac at american.edu (Alan G Isaac) Date: Thu, 19 Nov 2009 11:52:30 -0500 Subject: [SciPy-User] Numerical methods textbook recs? In-Reply-To: <7414ba0d0911190801i521ebe35s61494f33fa130b9e@mail.gmail.com> References: <7414ba0d0911190801i521ebe35s61494f33fa130b9e@mail.gmail.com> Message-ID: <4B0577CE.1060500@american.edu> On 11/19/2009 11:01 AM, Mark McClure wrote: > I'll be teaching an undergraduate level course in elementary numerical > methods next semester. I would seriously consider using Python/SciPy > as the computing environment for the course but I have not been able > to find a textbook that is Python based. 
http://www.amazon.com/Numerical-Methods-Engineering-Python-Kiusalaas/dp/0521852870/ref=sr_1_1?ie=UTF8&s=books&qid=1258649498&sr=8-1 hth, Alan Isaac From Dharhas.Pothina at twdb.state.tx.us Thu Nov 19 11:53:33 2009 From: Dharhas.Pothina at twdb.state.tx.us (Dharhas Pothina) Date: Thu, 19 Nov 2009 10:53:33 -0600 Subject: [SciPy-User] Reset IPython to original blank state. Message-ID: <4B0523AD.63BA.009B.0@twdb.state.tx.us> Hi All, I'm trying to do something like matlab's 'clear all; close all; fclose all;' this command basically resets matlab to a blank state by clearing all variables and closing all figures and files. It is hugely useful for avoiding old variables and data interfering with current work when interactively plotting and exploring data. With Ipython on Linux this is not too big of a deal since I can easily just quit and restart Ipython. On windows Ipython seems to take an inordinate amount of time to start so this is really an issue and causes the workflow to be interrupted. I've tried using %reset and while that seems to clear any variables in memory it doesn't seem to reset everything. I'm having lots of issues with matplotlib figures and other crashes related to 'too many open file handles' if I do not close and restart Ipython. Any way around this. Is there a small script I could use to clear everything and take Ipython back to its original startup state without restarting it? - dharhas From robert.kern at gmail.com Thu Nov 19 13:08:50 2009 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 19 Nov 2009 12:08:50 -0600 Subject: [SciPy-User] Reset IPython to original blank state. In-Reply-To: <4B0523AD.63BA.009B.0@twdb.state.tx.us> References: <4B0523AD.63BA.009B.0@twdb.state.tx.us> Message-ID: <3d375d730911191008m628d09aexdca3f0f2c59a00bc@mail.gmail.com> On Thu, Nov 19, 2009 at 10:53, Dharhas Pothina wrote: > Hi All, > > I'm trying to do something like matlab's 'clear all; close all; fclose all;' this command basically resets matlab to a blank state by clearing all variables and closing all figures and files. It is hugely useful for avoiding old variables and data interfering with current work when interactively plotting and exploring data. > > With Ipython on Linux this is not too big of a deal since I can easily just quit and restart Ipython. On windows Ipython seems to take an inordinate amount of time to start so this is really an issue and causes the workflow to be interrupted. > > I've tried using %reset and while that seems to clear any variables in memory it doesn't seem to reset everything. I'm having lots of issues with matplotlib figures and other crashes related to 'too many open file handles' if I do not close and restart Ipython. Where are these open files coming from? Most of the code in numpy/matplotlib/IPython should be properly closing files. If it is your code, it would be worth your time to fix your code to not keep files open longer than necessary rather than restarting IPython. > Any way around this. Is there a small script I could use to clear everything and take Ipython back to its original startup state without restarting it? Not really, no. Also, you will want to ask further IPython questions on the IPython mailing list: http://mail.scipy.org/mailman/listinfo/ipython-user -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From Dharhas.Pothina at twdb.state.tx.us Thu Nov 19 13:41:42 2009 From: Dharhas.Pothina at twdb.state.tx.us (Dharhas Pothina) Date: Thu, 19 Nov 2009 12:41:42 -0600 Subject: [SciPy-User] Reset IPython to original blank state. In-Reply-To: <3d375d730911191008m628d09aexdca3f0f2c59a00bc@mail.gmail.com> References: <4B0523AD.63BA.009B.0@twdb.state.tx.us> <3d375d730911191008m628d09aexdca3f0f2c59a00bc@mail.gmail.com> Message-ID: <4B053D06.63BA.009B.0@twdb.state.tx.us> Sorry didn't realize IPython had a separate mailing list. Will repost there. I think I may have fixed the "too many files" problem but I the big problem I have is when rerunning the same script and having too many legend labels show up in matplotlib plots since it still has the old ones from the previous time the script run. - d >>> Robert Kern 11/19/2009 12:08 PM >>> On Thu, Nov 19, 2009 at 10:53, Dharhas Pothina wrote: > Hi All, > > I'm trying to do something like matlab's 'clear all; close all; fclose all;' this command basically resets matlab to a blank state by clearing all variables and closing all figures and files. It is hugely useful for avoiding old variables and data interfering with current work when interactively plotting and exploring data. > > With Ipython on Linux this is not too big of a deal since I can easily just quit and restart Ipython. On windows Ipython seems to take an inordinate amount of time to start so this is really an issue and causes the workflow to be interrupted. > > I've tried using %reset and while that seems to clear any variables in memory it doesn't seem to reset everything. I'm having lots of issues with matplotlib figures and other crashes related to 'too many open file handles' if I do not close and restart Ipython. Where are these open files coming from? Most of the code in numpy/matplotlib/IPython should be properly closing files. If it is your code, it would be worth your time to fix your code to not keep files open longer than necessary rather than restarting IPython. > Any way around this. Is there a small script I could use to clear everything and take Ipython back to its original startup state without restarting it? Not really, no. Also, you will want to ask further IPython questions on the IPython mailing list: http://mail.scipy.org/mailman/listinfo/ipython-user -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user From pav at iki.fi Thu Nov 19 14:29:37 2009 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 19 Nov 2009 21:29:37 +0200 Subject: [SciPy-User] Reset IPython to original blank state. In-Reply-To: <4B053D06.63BA.009B.0@twdb.state.tx.us> References: <4B0523AD.63BA.009B.0@twdb.state.tx.us> <3d375d730911191008m628d09aexdca3f0f2c59a00bc@mail.gmail.com> <4B053D06.63BA.009B.0@twdb.state.tx.us> Message-ID: <1258658976.6439.0.camel@idol> to, 2009-11-19 kello 12:41 -0600, Dharhas Pothina kirjoitti: > Sorry didn't realize IPython had a separate mailing list. Will repost there. > > I think I may have fixed the "too many files" problem but I the big > problem I have is when rerunning the same script and having too many > legend labels show up in matplotlib plots since it still has the old > ones from the previous time the script run. 
Clear the figure before replotting http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.clf From dav at alum.mit.edu Thu Nov 19 15:23:41 2009 From: dav at alum.mit.edu (Dav Clark) Date: Thu, 19 Nov 2009 12:23:41 -0800 Subject: [SciPy-User] Numerical methods textbook recs? In-Reply-To: <4B0577CE.1060500@american.edu> References: <7414ba0d0911190801i521ebe35s61494f33fa130b9e@mail.gmail.com> <4B0577CE.1060500@american.edu> Message-ID: <4354A80C-A51B-45BA-9BF0-EE9F6A9AE548@alum.mit.edu> On Nov 19, 2009, at 8:52 AM, Alan G Isaac wrote: > On 11/19/2009 11:01 AM, Mark McClure wrote: >> I'll be teaching an undergraduate level course in elementary >> numerical >> methods next semester. I would seriously consider using Python/SciPy >> as the computing environment for the course but I have not been able >> to find a textbook that is Python based. > > > http://www.amazon.com/Numerical-Methods-Engineering-Python-Kiusalaas/dp/0521852870/ref=sr_1_1?ie=UTF8&s=books&qid=1258649498&sr=8-1 I'm partial to the Strang book: http://www.amazon.com/Introduction-Applied-Mathematics-Gilbert-Strang/dp/0961408804 It's more a senior level text, not sure what level you need. Conceptually, it is one of the clearest texts I've come across - but you need to figure out the coding on your own. I've floated the idea of writing a "python complements" to this or any similar book that a group might get behind. If you're doing a course, that could be a nice way to organize such an enterprise. Let me (and the list) know if you do write / want help writing anything like that. Cheers, Dav From reakinator at gmail.com Thu Nov 19 18:39:53 2009 From: reakinator at gmail.com (Rich E) Date: Fri, 20 Nov 2009 00:39:53 +0100 Subject: [SciPy-User] installing scipy on OS X 10.6 Snow Leopard In-Reply-To: References: Message-ID: When trying to install py-scipy in macports, I get the following error: ichard-Eakins-MacBook-Pro:scipy richardeakin$ sudo port install py-scipy Portfile changed since last build; discarding previous state. ---> Computing dependencies for py-scipy ---> Staging py-numpy into destroot Error: Target org.macports.destroot returned: error renaming "/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_release_ports_python_py-numpy/work/destroot/opt/local/bin/f2py": no such file or directory Error: The following dependencies failed to build: py-numpy swig-python bison gsed python_select swig Error: Status 1 encountered during processing. I don't know if that is an issue for this list or the macports list. But, I'm moving on to other methods.. Rich On Thu, Nov 19, 2009 at 4:58 PM, Dennis C wrote: > Greetings Rich; > > Another option that recently worked for me was to just install it through > MacPorts. That does maintain its own library in /opt so it will also > install Python even when it's already elsewhere on the system, but it'll > take care of all other dependencies too including the NumPy... > > Good luck, > > > Message: 3 > Date: Thu, 19 Nov 2009 00:13:42 +0100 > > From: Rich E > Subject: [SciPy-User] installing scipy on OS X 10.6 Snow Leopard > To: scipy-user at scipy.org > Message-ID: > > Content-Type: text/plain; charset="iso-8859-1" > > > Hi list, > > I just joined because I've had this mac for over a month and I still can't > get a working module of scipy in Snow Leopard. The dmg says it needs > 'python 2.6 or newer', but the one that comes with Snow Leopard is 2.6.1. 
> Installing MacPython seems to create more problems than anything else, and > building scipy from source is a no-go (I saw various other posts in the > archives about dependency problems, but no solutions yet. I am stuck at > UMFPACK sourcers being missing.) > > Any help or guidance is greatly appreciate. > > Rich > -------------- next part -------------- An HTML attachment was scrubbed... URL: From reakinator at gmail.com Thu Nov 19 19:20:21 2009 From: reakinator at gmail.com (Rich E) Date: Fri, 20 Nov 2009 01:20:21 +0100 Subject: [SciPy-User] installing scipy on OS X 10.6 Snow Leopard In-Reply-To: <4B04CA2B.4000105@ar.media.kyoto-u.ac.jp> References: <4B04CA2B.4000105@ar.media.kyoto-u.ac.jp> Message-ID: I just installed both python and scipy from the dmg files on their websites - not 64bit but I suppose that is unimportant at the moment. Now, I'm looking for pylab, but I don't see that anywhere (just matplotlib, although my scripts all use "import pylab"). Thanks for the advice. Rich On Thu, Nov 19, 2009 at 5:31 AM, David Cournapeau < david at ar.media.kyoto-u.ac.jp> wrote: > Rich E wrote: > > Hi list, > > > > I just joined because I've had this mac for over a month and I still > > can't get a working module of scipy in Snow Leopard. The dmg says it > > needs 'python 2.6 or newer', but the one that comes with Snow Leopard > > is 2.6.1. > > The dmg needs a python installed from python.org. If you want to get > scipy with the included python, you need to build it yourself. > > UMFPACK is not needed - if you have a problem, please report the exact > error as well as the command you used to build. Just saying it does not > work is not enough to help you, > > cheers, > > David > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Thu Nov 19 19:23:46 2009 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 19 Nov 2009 18:23:46 -0600 Subject: [SciPy-User] installing scipy on OS X 10.6 Snow Leopard In-Reply-To: References: <4B04CA2B.4000105@ar.media.kyoto-u.ac.jp> Message-ID: <3d375d730911191623q4483a6c9of2d96997e7167d13@mail.gmail.com> On Thu, Nov 19, 2009 at 18:20, Rich E wrote: > I just installed both python and scipy from the dmg files on their websites > - not 64bit but I suppose that is unimportant at the moment. > > Now, I'm looking for pylab, but I don't see that anywhere (just matplotlib, > although my scripts all use "import pylab"). pylab is part of matplotlib. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From lujitsu at hotmail.com Thu Nov 19 19:25:16 2009 From: lujitsu at hotmail.com (C. Campbell) Date: Thu, 19 Nov 2009 19:25:16 -0500 Subject: [SciPy-User] Dijkstra's algorithm on a lattice In-Reply-To: References: Message-ID: The package networkx has Dijktra's algorithm implemented; if I understand you correctly you'd just need to assign the intensities to the edge weights when you form the network. http://networkx.lanl.gov/ I hope this helps! 
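A rough sketch of that suggestion (the function name here is made up for illustration), treating the image as a 4-connected lattice and charging each step the mean intensity of the two pixels it joins; exactly how "integrated intensity" maps onto edge weights is a modelling choice:

import numpy as np
import networkx as nx

def lowest_intensity_path(image, start, end):
    """Shortest 4-connected pixel path from start to end, where the cost
    of each edge is the mean intensity of the two pixels it connects."""
    rows, cols = image.shape
    G = nx.grid_2d_graph(rows, cols)           # nodes are (row, col) tuples
    for u, v in G.edges():
        G[u][v]['weight'] = 0.5 * (image[u] + image[v])
    return nx.dijkstra_path(G, start, end)     # list of (row, col) nodes

img = np.random.rand(40, 60)
path = lowest_intensity_path(img, (0, 0), (39, 59))

Building the whole graph up front is memory-hungry for large images, which is presumably where a dedicated C or cython routine would pay off.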
Colin > From: zachary.pincus at yale.edu > To: scipy-user at scipy.org > Date: Wed, 18 Nov 2009 20:15:55 -0500 > Subject: [SciPy-User] Dijkstra's algorithm on a lattice > > Hi all, > > A bit off-topic, but before I write some C or cython to do this, I > thought I'd ask to see if anyone knows of existing code for the task > of finding the shortest (weighted) path between two points on a lattice. > > Specifically, I have images with "start" and "end" pixels marked and I > want to find the path through the image with the lowest integrated > intensity. Trivial but tedious to implement, so if anyone has some > good tips I'd be happy to know. (There's already a left-to-right- > shortest-path-finder in the image scikit repository, but that's not > quite what I need.) > > Thanks, > Zach > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user _________________________________________________________________ Windows 7: It works the way you want. Learn more. http://www.microsoft.com/Windows/windows-7/default.aspx?ocid=PID24727::T:WLMTAGL:ON:WL:en-US:WWL_WIN_evergreen:112009v2 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeremy.vancleve at gmail.com Thu Nov 19 19:35:58 2009 From: jeremy.vancleve at gmail.com (Jeremy Van Cleve) Date: Thu, 19 Nov 2009 17:35:58 -0700 Subject: [SciPy-User] compiling scipy with pgcc 9.0-3 on cray linux 2.2 Message-ID: <4B05E46E.6090907@gmail.com> I am trying to compile scipy 0.7.0 on the cray linux 2.2 nodes on the NICS kraken system. I've successfully built lapack, ATLAS, python 2.6.4, and numpy 1.3.0 but am having problems with scipy. Running "python setup.py install" in the scipy source directory yields the following error: ... compiling C sources C compiler: cc -DNDEBUG -fastsse -fPIC creating build/temp.linux-x86_64-2.6/build creating build/temp.linux-x86_64-2.6/build/src.linux-x86_64-2.6 creating build/temp.linux-x86_64-2.6/build/src.linux-x86_64-2.6/scipy creating build/temp.linux-x86_64-2.6/build/src.linux-x86_64-2.6/scipy/fftpack compile options: '-Ibuild/src.linux-x86_64-2.6 -I/lustre/scratch/jvanclev/opt/lib/python2.6/site-packages/numpy/core/include -I/lustre/scratch/jvanclev/opt/include/python2.6 -c' cc: build/src.linux-x86_64-2.6/fortranobject.c /opt/cray/xt-asyncpe/3.3/bin/cc: INFO: linux target is being used cc: scipy/fftpack/src/zfftnd.c /opt/cray/xt-asyncpe/3.3/bin/cc: INFO: linux target is being used PGC-W-0156-Type not specified, 'int' assumed (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0039-Use of undeclared variable caches_zfftnd_fftpack (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0054-Subscript operator ([]) applied to non-array (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0059-Struct or union required on left of . or -> (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0054-Subscript operator ([]) applied to non-array (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0059-Struct or union required on left of . or -> (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0054-Subscript operator ([]) applied to non-array (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0059-Struct or union required on left of . 
or -> (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-W-0095-Type cast required for this conversion (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-W-0155-Pointer value created from a nonlong integral type (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0054-Subscript operator ([]) applied to non-array (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0059-Struct or union required on left of . or -> (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-W-0095-Type cast required for this conversion (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-W-0155-Pointer value created from a nonlong integral type (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0054-Subscript operator ([]) applied to non-array (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0059-Struct or union required on left of . or -> (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0054-Subscript operator ([]) applied to non-array (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0059-Struct or union required on left of . or -> (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0054-Subscript operator ([]) applied to non-array (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0059-Struct or union required on left of . or -> (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0054-Subscript operator ([]) applied to non-array (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0059-Struct or union required on left of . or -> (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-W-0095-Type cast required for this conversion (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0054-Subscript operator ([]) applied to non-array (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0059-Struct or union required on left of . or -> (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-W-0095-Type cast required for this conversion (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0054-Subscript operator ([]) applied to non-array (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0059-Struct or union required on left of . or -> (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-W-0095-Type cast required for this conversion (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-W-0155-Pointer value created from a nonlong integral type (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0054-Subscript operator ([]) applied to non-array (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0059-Struct or union required on left of . or -> (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-W-0095-Type cast required for this conversion (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-W-0155-Pointer value created from a nonlong integral type (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0054-Subscript operator ([]) applied to non-array (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-S-0059-Struct or union required on left of . or -> (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC-F-0008-Error limit exceeded (scipy/fftpack/src/zfftnd_fftpack.c: 21) PGC/x86-64 Linux 9.0-3: compilation aborted ... I was unable to find any information on a problem in scipy with this source file, so I was wondering whether there might be something specific to compiling with pgcc. Any thoughts? Thanks! best, Jeremy From perfreem at gmail.com Thu Nov 19 19:57:12 2009 From: perfreem at gmail.com (per freem) Date: Thu, 19 Nov 2009 19:57:12 -0500 Subject: [SciPy-User] error using SciPy on Mac OS X Snow Leopard (using scipy.maxentropy) Message-ID: hi all, i recently upgraded to Mac OS X Snow Leopard and moved from Python 2.5 to Python 2.6. 
I reinstalled scipy and it seemed to work, except when I try to execute: from scipy.maxentropy import logsumexp I get the errors: Traceback (most recent call last): File "myfile.py", line 6, in from scipy.maxentropy import logsumexp File "/Library/Python/2.6/site-packages/scipy/maxentropy/__init__.py", line 9, in from maxentropy import * File "/Library/Python/2.6/site-packages/scipy/maxentropy/maxentropy.py", line 74, in from scipy import optimize File "/Library/Python/2.6/site-packages/scipy/optimize/__init__.py", line 7, in from optimize import * File "/Library/Python/2.6/site-packages/scipy/optimize/optimize.py", line 28, in import linesearch File "/Library/Python/2.6/site-packages/scipy/optimize/linesearch.py", line 3, in from scipy.optimize import minpack2 ImportError: /Library/Python/2.6/site-packages/scipy/optimize/minpack2.so: no appropriate 64-bit architecture (see "man python" for running in 32-bit mode) any idea what could be wrong here? thanks. From reakinator at gmail.com Thu Nov 19 20:51:50 2009 From: reakinator at gmail.com (Rich E) Date: Fri, 20 Nov 2009 02:51:50 +0100 Subject: [SciPy-User] installing scipy on OS X 10.6 Snow Leopard In-Reply-To: <3d375d730911191623q4483a6c9of2d96997e7167d13@mail.gmail.com> References: <4B04CA2B.4000105@ar.media.kyoto-u.ac.jp> <3d375d730911191623q4483a6c9of2d96997e7167d13@mail.gmail.com> Message-ID: Cool, I think I have everything I need. Just didn't see all the dmg's at first! Sorry for the waste of time. Rich On Fri, Nov 20, 2009 at 1:23 AM, Robert Kern wrote: > On Thu, Nov 19, 2009 at 18:20, Rich E wrote: > > I just installed both python and scipy from the dmg files on their > websites > > - not 64bit but I suppose that is unimportant at the moment. > > > > Now, I'm looking for pylab, but I don't see that anywhere (just > matplotlib, > > although my scripts all use "import pylab"). > > pylab is part of matplotlib. > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From karl.young at ucsf.edu Fri Nov 20 11:48:48 2009 From: karl.young at ucsf.edu (Young, Karl) Date: Fri, 20 Nov 2009 08:48:48 -0800 Subject: [SciPy-User] finite element packages Message-ID: <72BBA065386338429D2C4E83E442CD4E28BB24958C@EX02.net.ucsf.edu> I'm trying to model a flexible flywheel (hence my question about Wierstrass elliptic functions a couple of weeks ago - thanks again for the helpful replies). I'm now trying to consider realistic models with elastic materials that go beyond my abilities to model analytically and figured I need to look at finite element models. I haven't used finite element packages and was wondering if anyone on the list had any recommendations, preferably scipythonic but I'm just curious generally about what people would consider using for a problem like this (i.e. a rotating flexible rope type problem). 
Thanks for any thoughts, -- Karl From d.l.goldsmith at gmail.com Fri Nov 20 12:10:21 2009 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Fri, 20 Nov 2009 09:10:21 -0800 Subject: [SciPy-User] finite element packages In-Reply-To: <72BBA065386338429D2C4E83E442CD4E28BB24958C@EX02.net.ucsf.edu> References: <72BBA065386338429D2C4E83E442CD4E28BB24958C@EX02.net.ucsf.edu> Message-ID: <45d1ab480911200910j4e31860dkdc8f50f7f0848565@mail.gmail.com> Forgive me if you provided this in the previous thread, but, for reference, what analytic model(s) (differential equations, presumably) are you using that led you to elliptical functions? Also, are you interested in modeling transient (time-dependent) or steady-state (d/dt=0), stability-instability transitions, oscillatory mode amplification and damping, etc.? Finally, are you comparing theory w/ experiment, i.e., do you also have experimental data you're modeling and/or using to tweak your analytic models' parameters? DG On Fri, Nov 20, 2009 at 8:48 AM, Young, Karl wrote: > > I'm trying to model a flexible flywheel (hence my question about Wierstrass > elliptic functions a couple of weeks ago - thanks again for the helpful > replies). I'm now trying to consider realistic models with elastic materials > that go beyond my abilities to model analytically and figured I need to look > at finite element models. > > I haven't used finite element packages and was wondering if anyone on the > list had any recommendations, preferably scipythonic but I'm just curious > generally about what people would consider using for a problem like this > (i.e. a rotating flexible rope type problem). Thanks for any thoughts, > > -- Karl > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ramercer at gmail.com Fri Nov 20 12:33:13 2009 From: ramercer at gmail.com (Adam Mercer) Date: Fri, 20 Nov 2009 11:33:13 -0600 Subject: [SciPy-User] installing scipy on OS X 10.6 Snow Leopard In-Reply-To: References: Message-ID: <799406d60911200933r410a49cfva6b16e953bfcc55d@mail.gmail.com> On Thu, Nov 19, 2009 at 17:39, Rich E wrote: > When trying to install py-scipy in macports, I get the following error: > > ichard-Eakins-MacBook-Pro:scipy richardeakin$ sudo port install py-scipy > Portfile changed since last build; discarding previous state. > --->? Computing dependencies for py-scipy > --->? Staging py-numpy into destroot > Error: Target org.macports.destroot returned: error renaming > "/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_release_ports_python_py-numpy/work/destroot/opt/local/bin/f2py": > no such file or directory > Error: The following dependencies failed to build: py-numpy swig-python > bison gsed python_select swig > Error: Status 1 encountered during processing. This is http://trac.macports.org/ticket/22571 > I don't know if that is an issue for this list or the macports list.? But, > I'm moving on to other methods.. Any reason why you want the python2.4 version? If you're on Snow Leopard the python2.6 version will work much better. Anyway this is a MacPort issue. 
Cheers Adam From reakinator at gmail.com Fri Nov 20 12:52:39 2009 From: reakinator at gmail.com (Rich E) Date: Fri, 20 Nov 2009 18:52:39 +0100 Subject: [SciPy-User] installing scipy on OS X 10.6 Snow Leopard In-Reply-To: <799406d60911200933r410a49cfva6b16e953bfcc55d@mail.gmail.com> References: <799406d60911200933r410a49cfva6b16e953bfcc55d@mail.gmail.com> Message-ID: Where do you see that I wanted the 2.4 version? I ended up installing 2.6 macpython, numpy, scipy, and matplotlib from dmg files, then ipython from source. It is working so far. Rich On Fri, Nov 20, 2009 at 6:33 PM, Adam Mercer wrote: > On Thu, Nov 19, 2009 at 17:39, Rich E wrote: > > When trying to install py-scipy in macports, I get the following error: > > > > ichard-Eakins-MacBook-Pro:scipy richardeakin$ sudo port install py-scipy > > Portfile changed since last build; discarding previous state. > > ---> Computing dependencies for py-scipy > > ---> Staging py-numpy into destroot > > Error: Target org.macports.destroot returned: error renaming > > > "/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_release_ports_python_py-numpy/work/destroot/opt/local/bin/f2py": > > no such file or directory > > Error: The following dependencies failed to build: py-numpy swig-python > > bison gsed python_select swig > > Error: Status 1 encountered during processing. > > This is http://trac.macports.org/ticket/22571 > > > I don't know if that is an issue for this list or the macports list. > But, > > I'm moving on to other methods.. > > Any reason why you want the python2.4 version? If you're on Snow > Leopard the python2.6 version will work much better. Anyway this is a > MacPort issue. > > Cheers > > Adam > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ramercer at gmail.com Fri Nov 20 13:01:07 2009 From: ramercer at gmail.com (Adam Mercer) Date: Fri, 20 Nov 2009 12:01:07 -0600 Subject: [SciPy-User] installing scipy on OS X 10.6 Snow Leopard In-Reply-To: References: <799406d60911200933r410a49cfva6b16e953bfcc55d@mail.gmail.com> Message-ID: <799406d60911201001i7ad71417hec3dbfc2b3855f5b@mail.gmail.com> On Fri, Nov 20, 2009 at 11:52, Rich E wrote: > Where do you see that I wanted the 2.4 version?? I ended up installing 2.6 > macpython, numpy, scipy, and matplotlib from dmg files, then ipython from > source.? It is working so far. The fact that you tried to install py-scipy, this is the python2.4 version. Cheers Adam From karl.young at ucsf.edu Fri Nov 20 13:06:06 2009 From: karl.young at ucsf.edu (Young, Karl) Date: Fri, 20 Nov 2009 10:06:06 -0800 Subject: [SciPy-User] finite element packages In-Reply-To: <45d1ab480911200910j4e31860dkdc8f50f7f0848565@mail.gmail.com> References: <72BBA065386338429D2C4E83E442CD4E28BB24958C@EX02.net.ucsf.edu>, <45d1ab480911200910j4e31860dkdc8f50f7f0848565@mail.gmail.com> Message-ID: <72BBA065386338429D2C4E83E442CD4E28BB249590@EX02.net.ucsf.edu> Hi David, Thanks for the quick reply. I'm at a fairly early stage with this and so it's still fairly exploratory. That said I guess the main goal is to help my friend, who already has a working prtotype of a flexible flywheel, model and balance various parameter choices like speed of the flywheel, deformation of the wheel based on parameters associated with various material choices,... 
I obtained my analytic model by appropriately modifying the force diagram from a paper on the "skipping rope" problem; I obtained a nonlinear differential equation for the form of the loops of the flywheel that had elliptic functions as solutions. To first order I'm hoping that I can do some useful static modeling, i.e. in the rotating frame, even with more realistic parameters for the loop material, i.e. I guess the answer to the question is that my initial interest is in steady-state models (though I guess at some point it would be nice to study spin up and spin down). Again, to first order I'm not that concerned about looking at stability-instability transitions or oscillatory mode amplification and damping because my friend has a working prototype that seems to be pretty deeply in a stable range, at least re. variation in rotation speeds. The hope is that I can model the system in a way such that small changes in things like material parameters won't effect the stability regime (the flexible flywheel, combined with a fancy gimbal system seems to have a sort of surprisingly large stability range, re. parameters like rotation speeds and loop radius). But I may need to eventually model oscillatory modes and stability transitions re. use of some materials for the loop. The first goal will be to compare the model/simulations with his prototype, i.e. experiment (e.g. we may take pictures as in some of the skipping rope papers). Maybe my approach sounds silly; it's very preliminary and exploratory. Physicists (and particularly me) are probably too dumb to think about hard mechanical engineering problems ! -- Karl ________________________________________ From: scipy-user-bounces at scipy.org [scipy-user-bounces at scipy.org] On Behalf Of David Goldsmith [d.l.goldsmith at gmail.com] Sent: Friday, November 20, 2009 9:10 AM To: SciPy Users List Subject: Re: [SciPy-User] finite element packages Forgive me if you provided this in the previous thread, but, for reference, what analytic model(s) (differential equations, presumably) are you using that led you to elliptical functions? Also, are you interested in modeling transient (time-dependent) or steady-state (d/dt=0), stability-instability transitions, oscillatory mode amplification and damping, etc.? Finally, are you comparing theory w/ experiment, i.e., do you also have experimental data you're modeling and/or using to tweak your analytic models' parameters? DG On Fri, Nov 20, 2009 at 8:48 AM, Young, Karl > wrote: I'm trying to model a flexible flywheel (hence my question about Wierstrass elliptic functions a couple of weeks ago - thanks again for the helpful replies). I'm now trying to consider realistic models with elastic materials that go beyond my abilities to model analytically and figured I need to look at finite element models. I haven't used finite element packages and was wondering if anyone on the list had any recommendations, preferably scipythonic but I'm just curious generally about what people would consider using for a problem like this (i.e. a rotating flexible rope type problem). 
Thanks for any thoughts, -- Karl _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user From d.l.goldsmith at gmail.com Fri Nov 20 14:49:37 2009 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Fri, 20 Nov 2009 11:49:37 -0800 Subject: [SciPy-User] finite element packages In-Reply-To: <72BBA065386338429D2C4E83E442CD4E28BB249590@EX02.net.ucsf.edu> References: <72BBA065386338429D2C4E83E442CD4E28BB24958C@EX02.net.ucsf.edu> <45d1ab480911200910j4e31860dkdc8f50f7f0848565@mail.gmail.com> <72BBA065386338429D2C4E83E442CD4E28BB249590@EX02.net.ucsf.edu> Message-ID: <45d1ab480911201149s3e1967dfs1e9d9520436bdb4@mail.gmail.com> On Fri, Nov 20, 2009 at 10:06 AM, Young, Karl wrote: > > Hi David, > > Thanks for the quick reply. I'm at a fairly early stage with this and so > it's still fairly exploratory. That said I guess the main goal is to help my > friend, who already has a working prtotype of a flexible flywheel, model and > balance various parameter choices like speed of the flywheel, deformation > of the wheel based on parameters associated with various material > choices,... > > I obtained my analytic model by appropriately modifying the force diagram > from a paper on the "skipping rope" problem; I obtained a nonlinear > differential equation for the form of the loops of the flywheel that had > elliptic functions as solutions. To first order I'm hoping that I can do > some useful static modeling, i.e. in the rotating frame, even with more > realistic parameters for the loop material, i.e. I guess the answer to the > question is that my initial interest is in steady-state models (though I > guess at some point it would be nice to study spin up and spin down). > > Again, to first order I'm not that concerned about looking at > stability-instability transitions or oscillatory mode amplification and > damping because my friend has a working prototype that seems to be pretty > deeply in a stable range, at least re. variation in rotation speeds. The > hope is that I can model the system in a way such that small changes in > things like material parameters won't effect the stability regime (the > flexible flywheel, combined with a fancy gimbal system seems to have a sort > of surprisingly large stability range, re. parameters like rotation speeds > and loop radius). But I may need to eventually model oscillatory modes and > stability transitions re. use of some materials for the loop. > > The first goal will be to compare the model/simulations with his prototype, > i.e. experiment (e.g. we may take pictures as in some of the skipping rope > papers). > > Maybe my approach sounds silly; it's very preliminary and exploratory. > Physicists (and particularly me) are probably too dumb to think about hard > mechanical engineering problems ! > No, but there is one key factor you're unclear as to how you're modeling, which an ME would consider among the first things to model, namely, a model for the elasticity of the "flexible material": how the flywheel deforms due to centripetal acceleration will clearly affect its moment of inertia, affecting its rotational momentum and kinetic energy, and in turn its elastic potential energy; elastic damping sounds like it is also important. 
In any event, I was hoping you'd supply the actual non-linear DE(s), as the FEM is not always well-suited to such problems: depending on the nature of the nonlinearities and your choice of basis functions, completing the required integration by parts may be intractable (or prohibitively difficult for a first iteration in an "exploratory" investigation). In particular, the physically-required periodicity of your solutions (whatever your solutions are at theta=0, they have to be the same at theta=2pi, unless your flywheel is experiencing a jump discontinuity there) suggest that a spectral method may be more appropriate (aka "Harmonic Balance"; "Article 125" in Zwillinger, D., 1998. "Handbook of Differential Equations, 3rd Ed." Academic Press [highly recommended] states: "Applicable to: Nonlinear ODE's w/ periodic solutions. Yields: An approximate solution valid over the entire period. There is a specified procedure for increasing the number of terms and, hence, for increasing the accuracy." Sounds like exactly what you need...the article furnishes an external reference which I can forward if desired. I'd be remiss if I did not mention however, that spectral and finite element methods are not necessarily mutually exclusive: periodic basis functions are among those for which the FEM is well-developed.) FWIW, DG > > -- Karl > > ________________________________________ > From: scipy-user-bounces at scipy.org [scipy-user-bounces at scipy.org] On > Behalf Of David Goldsmith [d.l.goldsmith at gmail.com] > Sent: Friday, November 20, 2009 9:10 AM > To: SciPy Users List > Subject: Re: [SciPy-User] finite element packages > > Forgive me if you provided this in the previous thread, but, for reference, > what analytic model(s) (differential equations, presumably) are you using > that led you to elliptical functions? Also, are you interested in modeling > transient (time-dependent) or steady-state (d/dt=0), stability-instability > transitions, oscillatory mode amplification and damping, etc.? Finally, are > you comparing theory w/ experiment, i.e., do you also have experimental data > you're modeling and/or using to tweak your analytic models' parameters? > > DG > > On Fri, Nov 20, 2009 at 8:48 AM, Young, Karl karl.young at ucsf.edu>> wrote: > > I'm trying to model a flexible flywheel (hence my question about Wierstrass > elliptic functions a couple of weeks ago - thanks again for the helpful > replies). I'm now trying to consider realistic models with elastic materials > that go beyond my abilities to model analytically and figured I need to look > at finite element models. > > I haven't used finite element packages and was wondering if anyone on the > list had any recommendations, preferably scipythonic but I'm just curious > generally about what people would consider using for a problem like this > (i.e. a rotating flexible rope type problem). Thanks for any thoughts, > > -- Karl > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From david.trem at gmail.com Fri Nov 20 15:26:06 2009 From: david.trem at gmail.com (David Trem) Date: Fri, 20 Nov 2009 21:26:06 +0100 Subject: [SciPy-User] sinc interpolation Message-ID: <4B06FB5E.8070806@gmail.com> Hello, Is sinc interpolation available in Scipy ? I've just ask this question to Travis Oliphant during the entought webinar that had just ended but unfortunately I was not able to ear the reply due to poor sound quality just at that moment :-( Hope someone could give me his or a reply to this question. Thanks, David From karl.young at ucsf.edu Fri Nov 20 16:21:41 2009 From: karl.young at ucsf.edu (Young, Karl) Date: Fri, 20 Nov 2009 13:21:41 -0800 Subject: [SciPy-User] finite element packages In-Reply-To: <45d1ab480911201149s3e1967dfs1e9d9520436bdb4@mail.gmail.com> References: <72BBA065386338429D2C4E83E442CD4E28BB24958C@EX02.net.ucsf.edu> <45d1ab480911200910j4e31860dkdc8f50f7f0848565@mail.gmail.com> <72BBA065386338429D2C4E83E442CD4E28BB249590@EX02.net.ucsf.edu>, <45d1ab480911201149s3e1967dfs1e9d9520436bdb4@mail.gmail.com> Message-ID: <72BBA065386338429D2C4E83E442CD4E28BB249592@EX02.net.ucsf.edu> Hi David, I was assuming that I'd have to just abandon the analytical form if I included elasticity so I didn't think to include the differential equation that I got. I don't have it handy but it was something pretty simple like y(x)'' - c * y(x)^3 = 0 and based on whether I included a couple of approximations or not there was a first derivative term as well; y is the radial extent of the loop and x is angle. My configuration was a little odd in that there were 4 "spokes" for the flywheel, which were just more of the rope tied together and the loopy part consisted of lengths that were longer than circular arcs (a configuration that he found empirically to be more stable). So I only accounted for 1 quarter of the loop (between spokes) and my boundary conditions were just y(0) = L, y(pi/2) = L where L is length of the "spoke". Re. generalizing to account for elasticity I found a nice paper that analyzed the catenary problem for "Neo-Hookean" materials (sort of the next step in sophistication from modeling deformation with Hooke's law, e.g. accounts for change in cross section as a function of stretching - though I'm sure you know about that already) and figured I'd start with that. Since I haven't done any finite element modeling I assumed I could just start with a model per element that included forces, boundary conditions, and elasticity parameters and get a numerical solution. Thanks much for the suggestion re. spectral methods I will definitely try to run down a copy of Zwillinger's article and take a look. -- Karl ________________________________________ From: scipy-user-bounces at scipy.org [scipy-user-bounces at scipy.org] On Behalf Of David Goldsmith [d.l.goldsmith at gmail.com] Sent: Friday, November 20, 2009 11:49 AM To: SciPy Users List Subject: Re: [SciPy-User] finite element packages On Fri, Nov 20, 2009 at 10:06 AM, Young, Karl > wrote: Hi David, Thanks for the quick reply. I'm at a fairly early stage with this and so it's still fairly exploratory. That said I guess the main goal is to help my friend, who already has a working prtotype of a flexible flywheel, model and balance various parameter choices like speed of the flywheel, deformation of the wheel based on parameters associated with various material choices,... 
I obtained my analytic model by appropriately modifying the force diagram from a paper on the "skipping rope" problem; I obtained a nonlinear differential equation for the form of the loops of the flywheel that had elliptic functions as solutions. To first order I'm hoping that I can do some useful static modeling, i.e. in the rotating frame, even with more realistic parameters for the loop material, i.e. I guess the answer to the question is that my initial interest is in steady-state models (though I guess at some point it would be nice to study spin up and spin down). Again, to first order I'm not that concerned about looking at stability-instability transitions or oscillatory mode amplification and damping because my friend has a working prototype that seems to be pretty deeply in a stable range, at least re. variation in rotation speeds. The hope is that I can model the system in a way such that small changes in things like material parameters won't effect the stability regime (the flexible flywheel, combined with a fancy gimbal system seems to have a sort of surprisingly large stability range, re. parameters like rotation speeds and loop radius). But I may need to eventually model oscillatory modes and stability transitions re. use of some materials for the loop. The first goal will be to compare the model/simulations with his prototype, i.e. experiment (e.g. we may take pictures as in some of the skipping rope papers). Maybe my approach sounds silly; it's very preliminary and exploratory. Physicists (and particularly me) are probably too dumb to think about hard mechanical engineering problems ! No, but there is one key factor you're unclear as to how you're modeling, which an ME would consider among the first things to model, namely, a model for the elasticity of the "flexible material": how the flywheel deforms due to centripetal acceleration will clearly affect its moment of inertia, affecting its rotational momentum and kinetic energy, and in turn its elastic potential energy; elastic damping sounds like it is also important. In any event, I was hoping you'd supply the actual non-linear DE(s), as the FEM is not always well-suited to such problems: depending on the nature of the nonlinearities and your choice of basis functions, completing the required integration by parts may be intractable (or prohibitively difficult for a first iteration in an "exploratory" investigation). In particular, the physically-required periodicity of your solutions (whatever your solutions are at theta=0, they have to be the same at theta=2pi, unless your flywheel is experiencing a jump discontinuity there) suggest that a spectral method may be more appropriate (aka "Harmonic Balance"; "Article 125" in Zwillinger, D., 1998. "Handbook of Differential Equations, 3rd Ed." Academic Press [highly recommended] states: "Applicable to: Nonlinear ODE's w/ periodic solutions. Yields: An approximate solution valid over the entire period. There is a specified procedure for increasing the number of terms and, hence, for increasing the accuracy." Sounds like exactly what you need...the article furnishes an external reference which I can forward if desired. I'd be remiss if I did not mention however, that spectral and finite element methods are not necessarily mutually exclusive: periodic basis functions are among those for which the FEM is well-developed.) 
FWIW, DG -- Karl ________________________________________ From: scipy-user-bounces at scipy.org [scipy-user-bounces at scipy.org] On Behalf Of David Goldsmith [d.l.goldsmith at gmail.com] Sent: Friday, November 20, 2009 9:10 AM To: SciPy Users List Subject: Re: [SciPy-User] finite element packages Forgive me if you provided this in the previous thread, but, for reference, what analytic model(s) (differential equations, presumably) are you using that led you to elliptical functions? Also, are you interested in modeling transient (time-dependent) or steady-state (d/dt=0), stability-instability transitions, oscillatory mode amplification and damping, etc.? Finally, are you comparing theory w/ experiment, i.e., do you also have experimental data you're modeling and/or using to tweak your analytic models' parameters? DG On Fri, Nov 20, 2009 at 8:48 AM, Young, Karl >> wrote: I'm trying to model a flexible flywheel (hence my question about Wierstrass elliptic functions a couple of weeks ago - thanks again for the helpful replies). I'm now trying to consider realistic models with elastic materials that go beyond my abilities to model analytically and figured I need to look at finite element models. I haven't used finite element packages and was wondering if anyone on the list had any recommendations, preferably scipythonic but I'm just curious generally about what people would consider using for a problem like this (i.e. a rotating flexible rope type problem). Thanks for any thoughts, -- Karl _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org> http://mail.scipy.org/mailman/listinfo/scipy-user _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user From ferrell at diablotech.com Fri Nov 20 17:55:47 2009 From: ferrell at diablotech.com (Robert Ferrell) Date: Fri, 20 Nov 2009 15:55:47 -0700 Subject: [SciPy-User] installing scipy on OS X 10.6 Snow Leopard In-Reply-To: <6324058A-FD14-47BF-9308-9BB5D123CA3A@gmail.com> References: <6324058A-FD14-47BF-9308-9BB5D123CA3A@gmail.com> Message-ID: On Nov 18, 2009, at 5:40 PM, Pierre GM wrote: > > On Nov 18, 2009, at 6:13 PM, Rich E wrote: > >> Hi list, >> >> I just joined because I've had this mac for over a month and I >> still can't get a working module of scipy in Snow Leopard. The dmg >> says it needs 'python 2.6 or newer', but the one that comes with >> Snow Leopard is 2.6.1. > > So you're set: 2.6.1 is more recent than 2.6... > But you probably shouldn't use a dmg: install Scipy from sources, > it's far easier to help you. > Assuming you have xcode installed, and a proper gfortran (I think > this one is the recommended one: http://r.research.att.com/tools/) > > * Install numpy first. make a local install by using the --user flag > when calling python setup.py install. No need to install an > additional Python if you use --user, you won't be messing with your > system Is there documentation for --user? http://docs.python.org/install/index.html documents --home. Is --user equivalent to --home=~? thanks, -robert > > * Then, install scipy, using the --user flag as well. Don't bother > for UMFPACK for the moment > > * You may want to set CFLAGS="-arch x86_64" before installing numpy > and scipy. > > * Let me know where your problems are (off-list for now to reduce > the noise), post the log of your build somewhere. 
> > Don't worry, it's straightforward, provided you stick to the Python > that comes w/ SnowLeopard > Good luck > P. > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From emmanuelle.gouillart at normalesup.org Sat Nov 21 04:24:24 2009 From: emmanuelle.gouillart at normalesup.org (Emmanuelle Gouillart) Date: Sat, 21 Nov 2009 10:24:24 +0100 Subject: [SciPy-User] finite element packages In-Reply-To: <72BBA065386338429D2C4E83E442CD4E28BB249592@EX02.net.ucsf.edu> References: <72BBA065386338429D2C4E83E442CD4E28BB24958C@EX02.net.ucsf.edu> <45d1ab480911200910j4e31860dkdc8f50f7f0848565@mail.gmail.com> <45d1ab480911201149s3e1967dfs1e9d9520436bdb4@mail.gmail.com> <72BBA065386338429D2C4E83E442CD4E28BB249592@EX02.net.ucsf.edu> Message-ID: <20091121092424.GA29506@phare.normalesup.org> Hello Karl, you may have already found it by googling "python finite elements", but sfepy (http://code.google.com/p/sfepy/) has a good reputation (I haven't used it myself, though). Check the examples page to see if it can suit your needs. Cheers, Emmanuelle On Fri, Nov 20, 2009 at 01:21:41PM -0800, Young, Karl wrote: > Hi David, > I was assuming that I'd have to just abandon the analytical form if I included elasticity so I didn't think to include the differential equation that I got. I don't have it handy but it was something pretty simple like y(x)'' - c * y(x)^3 = 0 and based on whether I included a couple of approximations or not there was a first derivative term as well; y is the radial extent of the loop and x is angle. My configuration was a little odd in that there were 4 "spokes" for the flywheel, which were just more of the rope tied together and the loopy part consisted of lengths that were longer than circular arcs (a configuration that he found empirically to be more stable). So I only accounted for 1 quarter of the loop (between spokes) and my boundary conditions were just y(0) = L, y(pi/2) = L where L is length of the "spoke". > Re. generalizing to account for elasticity I found a nice paper that analyzed the catenary problem for "Neo-Hookean" materials (sort of the next step in sophistication from modeling deformation with Hooke's law, e.g. accounts for change in cross section as a function of stretching - though I'm sure you know about that already) and figured I'd start with that. Since I haven't done any finite element modeling I assumed I could just start with a model per element that included forces, boundary conditions, and elasticity parameters and get a numerical solution. > Thanks much for the suggestion re. spectral methods I will definitely try to run down a copy of Zwillinger's article and take a look. > -- Karl > ________________________________________ > From: scipy-user-bounces at scipy.org [scipy-user-bounces at scipy.org] On Behalf Of David Goldsmith [d.l.goldsmith at gmail.com] > Sent: Friday, November 20, 2009 11:49 AM > To: SciPy Users List > Subject: Re: [SciPy-User] finite element packages > On Fri, Nov 20, 2009 at 10:06 AM, Young, Karl > wrote: > Hi David, > Thanks for the quick reply. I'm at a fairly early stage with this and so it's still fairly exploratory. That said I guess the main goal is to help my friend, who already has a working prtotype of a flexible flywheel, model and balance various parameter choices like speed of the flywheel, deformation of the wheel based on parameters associated with various material choices,... 
> I obtained my analytic model by appropriately modifying the force diagram from a paper on the "skipping rope" problem; I obtained a nonlinear differential equation for the form of the loops of the flywheel that had elliptic functions as solutions. To first order I'm hoping that I can do some useful static modeling, i.e. in the rotating frame, even with more realistic parameters for the loop material, i.e. I guess the answer to the question is that my initial interest is in steady-state models (though I guess at some point it would be nice to study spin up and spin down). > Again, to first order I'm not that concerned about looking at stability-instability transitions or oscillatory mode amplification and damping because my friend has a working prototype that seems to be pretty deeply in a stable range, at least re. variation in rotation speeds. The hope is that I can model the system in a way such that small changes in things like material parameters won't effect the stability regime (the flexible flywheel, combined with a fancy gimbal system seems to have a sort of surprisingly large stability range, re. parameters like rotation speeds and loop radius). But I may need to eventually model oscillatory modes and stability transitions re. use of some materials for the loop. > The first goal will be to compare the model/simulations with his prototype, i.e. experiment (e.g. we may take pictures as in some of the skipping rope papers). > Maybe my approach sounds silly; it's very preliminary and exploratory. Physicists (and particularly me) are probably too dumb to think about hard mechanical engineering problems ! > No, but there is one key factor you're unclear as to how you're modeling, which an ME would consider among the first things to model, namely, a model for the elasticity of the "flexible material": how the flywheel deforms due to centripetal acceleration will clearly affect its moment of inertia, affecting its rotational momentum and kinetic energy, and in turn its elastic potential energy; elastic damping sounds like it is also important. In any event, I was hoping you'd supply the actual non-linear DE(s), as the FEM is not always well-suited to such problems: depending on the nature of the nonlinearities and your choice of basis functions, completing the required integration by parts may be intractable (or prohibitively difficult for a first iteration in an "exploratory" investigation). In particular, the physically-required periodicity of your solutions (whatever your solutions are at theta=0, they have to be the same at theta=2pi, unless your flywheel is experiencing a j > ump discontinuity there) suggest that a spectral method may be more appropriate (aka "Harmonic Balance"; "Article 125" in Zwillinger, D., 1998. "Handbook of Differential Equations, 3rd Ed." Academic Press [highly recommended] states: "Applicable to: Nonlinear ODE's w/ periodic solutions. Yields: An approximate solution valid over the entire period. There is a specified procedure for increasing the number of terms and, hence, for increasing the accuracy." Sounds like exactly what you need...the article furnishes an external reference which I can forward if desired. I'd be remiss if I did not mention however, that spectral and finite element methods are not necessarily mutually exclusive: periodic basis functions are among those for which the FEM is well-developed.) 
> FWIW, > DG > -- Karl > ________________________________________ > From: scipy-user-bounces at scipy.org [scipy-user-bounces at scipy.org] On Behalf Of David Goldsmith [d.l.goldsmith at gmail.com] > Sent: Friday, November 20, 2009 9:10 AM > To: SciPy Users List > Subject: Re: [SciPy-User] finite element packages > Forgive me if you provided this in the previous thread, but, for reference, what analytic model(s) (differential equations, presumably) are you using that led you to elliptical functions? Also, are you interested in modeling transient (time-dependent) or steady-state (d/dt=0), stability-instability transitions, oscillatory mode amplification and damping, etc.? Finally, are you comparing theory w/ experiment, i.e., do you also have experimental data you're modeling and/or using to tweak your analytic models' parameters? > DG > On Fri, Nov 20, 2009 at 8:48 AM, Young, Karl >> wrote: > I'm trying to model a flexible flywheel (hence my question about Wierstrass elliptic functions a couple of weeks ago - thanks again for the helpful replies). I'm now trying to consider realistic models with elastic materials that go beyond my abilities to model analytically and figured I need to look at finite element models. > I haven't used finite element packages and was wondering if anyone on the list had any recommendations, preferably scipythonic but I'm just curious generally about what people would consider using for a problem like this (i.e. a rotating flexible rope type problem). Thanks for any thoughts, > -- Karl > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org> > http://mail.scipy.org/mailman/listinfo/scipy-user > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From sturla at molden.no Sat Nov 21 05:18:46 2009 From: sturla at molden.no (Sturla Molden) Date: Sat, 21 Nov 2009 11:18:46 +0100 Subject: [SciPy-User] sinc interpolation In-Reply-To: <4B06FB5E.8070806@gmail.com> References: <4B06FB5E.8070806@gmail.com> Message-ID: <4B07BE86.6000006@molden.no> I have a least-sqaures interpolator similar to Matlab's interp function. Basically it just constructs a FIR filter that can be used with scipy.signal.lfilter. Also you can use FFTs for interpolation. Just rfft the signal, append zeros, and invert the transform. Sturla David Trem skrev: > Hello, > > Is sinc interpolation available in Scipy ? > > I've just ask this question to Travis Oliphant during > the entought webinar that had just ended but unfortunately > I was not able to ear the reply due to poor sound quality just at > that moment :-( > Hope someone could give me his or a reply to this question. > > Thanks, > > David > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From silva at lma.cnrs-mrs.fr Sat Nov 21 07:42:19 2009 From: silva at lma.cnrs-mrs.fr (Fabricio Silva) Date: Sat, 21 Nov 2009 13:42:19 +0100 Subject: [SciPy-User] sinc interpolation In-Reply-To: <4B06FB5E.8070806@gmail.com> References: <4B06FB5E.8070806@gmail.com> Message-ID: <1258807340.2525.0.camel@PCTerrusse> Le vendredi 20 novembre 2009 ? 21:26 +0100, David Trem a ?crit : > Hello, > > Is sinc interpolation available in Scipy ? 
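(To illustrate the FFT route Sturla mentions above, here is a rough sketch. It assumes a real-valued, uniformly sampled signal and an integer oversampling factor; it is only an illustration, not code taken from scipy or from Sturla's interpolator. scipy.signal.resample performs this kind of Fourier-domain resampling as well.)

import numpy as np

def fft_resample(x, factor):
    # Sinc (Fourier) interpolation of a real, uniformly sampled signal:
    # zero-pad the one-sided spectrum and invert the transform.
    n = len(x)
    X = np.fft.rfft(x)
    X_pad = np.zeros(factor * n // 2 + 1, dtype=complex)
    X_pad[:len(X)] = X
    # irfft normalizes by the new length, so rescale to keep the amplitude;
    # the Nyquist bin is copied as-is, which is fine for a quick sketch.
    return np.fft.irfft(X_pad, factor * n) * factor

# e.g. interpolate one period of a sine wave onto a 4x finer grid
y = fft_resample(np.sin(2 * np.pi * 5 * np.arange(64) / 64.0), 4)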
David Cournapeau has a scikit for that : http://pypi.python.org/pypi/scikits.samplerate/ -- Fabrice Silva Laboratory of Mechanics and Acoustics (CNRS, UPR 7051) From stefan at sun.ac.za Sat Nov 21 09:38:54 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Sat, 21 Nov 2009 16:38:54 +0200 Subject: [SciPy-User] Dijkstra's algorithm on a lattice In-Reply-To: References: Message-ID: <9457e7c80911210638t1df8f21amc1d0616221b64467@mail.gmail.com> Hi Zach 2009/11/19 Zachary Pincus : > A bit off-topic, but before I write some C or cython to do this, I > thought I'd ask to see if anyone knows of existing code for the task > of finding the shortest (weighted) path between two points on a lattice. This is what the shortest path routine in scikits.image is meant to do. How can we modify it to make it more useful to you? Cheers St?fan From zachary.pincus at yale.edu Sat Nov 21 10:06:57 2009 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Sat, 21 Nov 2009 10:06:57 -0500 Subject: [SciPy-User] Dijkstra's algorithm on a lattice In-Reply-To: <9457e7c80911210638t1df8f21amc1d0616221b64467@mail.gmail.com> References: <9457e7c80911210638t1df8f21amc1d0616221b64467@mail.gmail.com> Message-ID: <9E69193F-E01E-48F5-8C81-4747C9AA0823@yale.edu> > 2009/11/19 Zachary Pincus : >> A bit off-topic, but before I write some C or cython to do this, I >> thought I'd ask to see if anyone knows of existing code for the task >> of finding the shortest (weighted) path between two points on a >> lattice. > > This is what the shortest path routine in scikits.image is meant to > do. How can we modify it to make it more useful to you? Hi St?fan, Based on just a rudimentary perusal of that code, I thought it only found the lowest-cost path from the left to the right of an array... is this still the case? I'd been needing something to go from an arbitrary point to any other arbitrary point. I just started working on some cython code that will compute the shortest path (under a given length) from any point to any other. It's not quite Dijkstra (for which one needs to keep a sorted list of the next pixels to visit), but more like breadth-first search to a given depth. I'll be happy to send it over if that sort of thing sounds useful. Zach From zachary.pincus at yale.edu Sat Nov 21 16:52:48 2009 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Sat, 21 Nov 2009 16:52:48 -0500 Subject: [SciPy-User] Dijkstra's algorithm on a lattice In-Reply-To: <9E69193F-E01E-48F5-8C81-4747C9AA0823@yale.edu> References: <9457e7c80911210638t1df8f21amc1d0616221b64467@mail.gmail.com> <9E69193F-E01E-48F5-8C81-4747C9AA0823@yale.edu> Message-ID: <99C27119-17D0-4EA4-A330-ECEEE71B7C2D@yale.edu> OK, here's what I have. Not Dijkstra's algorithm, but very simple and not bad for many purposes. You pass in a 2D costs array, start- and end-points, and a maximum number of iterations; the code then keeps track of the minimum cumulative cost to each pixel from the starting point, as well as the path thereto. It does this by keeping track of "active" pixels -- any time a lower cumulative cost to a given pixel is found, that pixel is made active. Each iteration, all the neighbors of the "active" pixels are examined to see if their costs can be lowered too. Basically breadth-first search. Limitations and oddities: - Currently, diagonal and vertical/horizontal steps are both allowed. Easy enough to make this a parameter. 
- Paths along the boundary aren't traced out because I didn't want to deal with an if-check in the inner loop to make sure that the x,y position plus the current offset wasn't out of bounds. This could be addressed by (a) padding the input array by one pixel on each side, (b) putting the if in the inner loop, or (c) having a second pass through the edge pixels. - In theory, the code could find the cheapest path from top-left to bottom-right in a single pass because "active" pixels are marked immediately as the code iterates through the array. So the max_iters parameter doesn't guarantee that paths longer than that will not be found. But it does guarantee that any path found less than that length is optimal... Let's say it's BSD licensed, in case anyone finds it of use. Zach -------------- next part -------------- A non-text attachment was scrubbed... Name: trace_path.zip Type: application/zip Size: 1263 bytes Desc: not available URL: -------------- next part -------------- From gokhansever at gmail.com Sat Nov 21 16:57:10 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Sat, 21 Nov 2009 15:57:10 -0600 Subject: [SciPy-User] Fitting a curve on a log-normal distributed data In-Reply-To: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> References: <49d6b3500911162144x1193e04cj1a103776092c4471@mail.gmail.com> Message-ID: <49d6b3500911211357h4b4870c8q7355c2d32db55fe6@mail.gmail.com> One more update on this subject. I have been looking through some of the papers on this topic, and I have finally found exactly what I need in this paper: Hussein, T., Dal Maso, M., Petaja, T., Koponen, I. K., Paatero, P., Aalto, P. P., Hameri, K., and Kulmala, M.: Evaluation of an automatic algorithm for ?tting the particle number size distributions, Boreal Environ. Res., 10, 337?355, 2005. Here is the abstract: "The multi log-normal distribution function is widely in use to parameterize the aerosol particle size distributions. The main purpose of such a parameterization is to quantitatively describe size distributions and to allow straightforward comparisons between different aerosol particle data sets. In this study, we developed and evaluated an algorithm to parameterize aerosol particle number size distributions with the multi log-normal distribution function. The current algorithm is automatic and does not need a user decision for the initial input parameters; it requires only the maximum number of possible modes and then it reduces this number, if possible, without affecting the fitting quality. The reduction of the number of modes is based on an overlapping test between adjacent modes. The algorithm was evaluated against a previous algorithm that can be considered as a standard procedure. It was also evaluated against a long-term data set and different types of measured aerosol particle size distributions in the ambient atmosphere. The evaluation of the current algorithm showed the following advantages: (I) it is suitable for different types of aerosol particles observed in different environments and conditions, (2) it showed agreement with the previous standard algorithm in about 90% of long-term data set, (3) it is not time-consuming, particularly when long-term data sets are analyzed, and (4) it is a useful tool in the studies of atmospheric aerosol particle formation and transformation." 
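(As a minimal starting point before the multi-modal algorithm of the paper: a single log-normal mode can be fit with scipy.optimize.curve_fit roughly as below. The numbers and the noise are made up for illustration; curve_fit needs a recent scipy, and optimize.leastsq can be used the same way on older versions.)

import numpy as np
from scipy.optimize import curve_fit

def lognormal_mode(dp, n_total, dpg, sigma_g):
    # dN/dlogDp for one log-normal mode; dp and dpg in micrometers
    return (n_total / (np.sqrt(2 * np.pi) * np.log10(sigma_g)) *
            np.exp(-(np.log10(dp) - np.log10(dpg)) ** 2 /
                   (2 * np.log10(sigma_g) ** 2)))

dp = np.logspace(-1, np.log10(3.0), 30)        # 0.1 - 3.0 micrometer bins
data = lognormal_mode(dp, 1500.0, 0.4, 1.8)    # synthetic "measurements"
data *= 1 + 0.05 * np.random.randn(dp.size)    # with a little noise

popt, pcov = curve_fit(lognormal_mode, dp, data, p0=[1000.0, 0.3, 2.0])

# the fitted parameters can then be evaluated outside the measured range,
# e.g. to extrapolate dN/dlogDp down to 10 nm (0.01 micrometer)
dp_ext = np.logspace(-2, np.log10(3.0), 100)
extrapolated = lognormal_mode(dp_ext, *popt)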
The full-text is freely available at: http://www.borenv.net/BER/pdfs/ber10/ber10-337.pdf On Mon, Nov 16, 2009 at 11:44 PM, G?khan Sever wrote: > Hello, > > I have a data which represents aerosol size distribution in between 0.1 to > 3.0 micrometer ranges. I would like extrapolate the lower size down to 10 > nm. The data in this context is log-normally distributed. Therefore I am > looking a way to fit a log-normal curve onto my data. Could you please give > me some pointers to solve this problem? > > Thank you. > > -- > G?khan > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From gruben at bigpond.net.au Sat Nov 21 19:48:27 2009 From: gruben at bigpond.net.au (Gary Ruben) Date: Sun, 22 Nov 2009 11:48:27 +1100 Subject: [SciPy-User] Dijkstra's algorithm on a lattice In-Reply-To: <99C27119-17D0-4EA4-A330-ECEEE71B7C2D@yale.edu> References: <9457e7c80911210638t1df8f21amc1d0616221b64467@mail.gmail.com> <9E69193F-E01E-48F5-8C81-4747C9AA0823@yale.edu> <99C27119-17D0-4EA4-A330-ECEEE71B7C2D@yale.edu> Message-ID: <4B088A5B.8000604@bigpond.net.au> Hi Zach, I haven't looked at your code, but your description sounds like you've got a very nice solution. When you originally asked this, I immediately thought of Lee's algorithm, or Jarvis's distance-transform based path planning, which uses a modified distance transform that fixes the start and goal point costs. I didn't mention them because they don't cover to your case, but I think your solution is a more general case or theirs - i.e. you can use yours for navigation/maze solving by setting the obstacle/wall values to something greater than the maximum distance to the goal and the floor values to 0. I think would be a very nice, general routine for scikits.image, Gary R. Zachary Pincus wrote: > OK, here's what I have. Not Dijkstra's algorithm, but very simple and > not bad for many purposes. > > You pass in a 2D costs array, start- and end-points, and a maximum > number of iterations; the code then keeps track of the minimum > cumulative cost to each pixel from the starting point, as well as the > path thereto. It does this by keeping track of "active" pixels -- any > time a lower cumulative cost to a given pixel is found, that pixel is > made active. Each iteration, all the neighbors of the "active" pixels > are examined to see if their costs can be lowered too. Basically > breadth-first search. > > Limitations and oddities: > - Currently, diagonal and vertical/horizontal steps are both allowed. > Easy enough to make this a parameter. > - Paths along the boundary aren't traced out because I didn't want to > deal with an if-check in the inner loop to make sure that the x,y > position plus the current offset wasn't out of bounds. This could be > addressed by (a) padding the input array by one pixel on each side, (b) > putting the if in the inner loop, or (c) having a second pass through > the edge pixels. > - In theory, the code could find the cheapest path from top-left to > bottom-right in a single pass because "active" pixels are marked > immediately as the code iterates through the array. So the max_iters > parameter doesn't guarantee that paths longer than that will not be > found. But it does guarantee that any path found less than that length > is optimal... > > Let's say it's BSD licensed, in case anyone finds it of use. 
> > Zach > > > > > ------------------------------------------------------------------------ > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From zachary.pincus at yale.edu Sat Nov 21 23:45:26 2009 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Sat, 21 Nov 2009 23:45:26 -0500 Subject: [SciPy-User] Dijkstra's algorithm on a lattice In-Reply-To: <4B088A5B.8000604@bigpond.net.au> References: <9457e7c80911210638t1df8f21amc1d0616221b64467@mail.gmail.com> <9E69193F-E01E-48F5-8C81-4747C9AA0823@yale.edu> <99C27119-17D0-4EA4-A330-ECEEE71B7C2D@yale.edu> <4B088A5B.8000604@bigpond.net.au> Message-ID: <66E0FDD3-4822-4722-BB6E-C7F21A679D32@yale.edu> Thanks Gary! Attached is a simplified version that addresses all of the caveats I had earlier, plus I added documentation and input-checking. Boundaries are now handled properly (the bounds-checking is not too slow, even on huge arrays), and the code now iterates until all paths have been fully traced. Still BSD licensed; if it might be useful to scikits.image, please feel free to include it. Here's a simple "maze" solving example / test. >>> import numpy >>> import trace_path >>> a = numpy.ones((8,8), dtype=numpy.float32) >>> a[1:-1,1] = 0 >>> a[1,1:-1] = 0 >>> a array([[ 1., 1., 1., 1., 1., 1., 1., 1.], [ 1., 0., 0., 0., 0., 0., 0., 1.], [ 1., 0., 1., 1., 1., 1., 1., 1.], [ 1., 0., 1., 1., 1., 1., 1., 1.], [ 1., 0., 1., 1., 1., 1., 1., 1.], [ 1., 0., 1., 1., 1., 1., 1., 1.], [ 1., 0., 1., 1., 1., 1., 1., 1.], [ 1., 1., 1., 1., 1., 1., 1., 1.]], dtype=float32) >>> trace_path.trace_path(a, (1, 6), [(7, 2)]) (array([[ 1., 1., 1., 1., 1., 1., 1., 1.], [ 1., 0., 0., 0., 0., 0., 0., 1.], [ 1., 0., 1., 1., 1., 1., 1., 1.], [ 1., 0., 1., 2., 2., 2., 2., 2.], [ 1., 0., 1., 2., 3., 3., 3., 3.], [ 1., 0., 1., 2., 3., 4., 4., 4.], [ 1., 0., 1., 2., 3., 4., 5., 5.], [ 1., 1., 1., 2., 3., 4., 5., 6.]], dtype=float32), [[(1, 6), (1, 5), (1, 4), (1, 3), (1, 2), (2, 1), (3, 1), (4, 1), (5, 1), (6, 1), (7, 2)]]) >>> trace_path.trace_path(a, (1, 6), [(7, 2)], diagonal_steps=False) (array([[ 2., 1., 1., 1., 1., 1., 1., 2.], [ 1., 0., 0., 0., 0., 0., 0., 1.], [ 1., 0., 1., 1., 1., 1., 1., 2.], [ 1., 0., 1., 2., 2., 2., 2., 3.], [ 1., 0., 1., 2., 3., 3., 3., 4.], [ 1., 0., 1., 2., 3., 4., 4., 5.], [ 1., 0., 1., 2., 3., 4., 5., 6.], [ 2., 1., 2., 3., 4., 5., 6., 7.]], dtype=float32), [[(1, 6), (1, 5), (1, 4), (1, 3), (1, 2), (1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (6, 1), (6, 2), (7, 2)]]) Zach On Nov 21, 2009, at 7:48 PM, Gary Ruben wrote: > Hi Zach, > > I haven't looked at your code, but your description sounds like you've > got a very nice solution. When you originally asked this, I > immediately > thought of Lee's algorithm, or Jarvis's distance-transform based path > planning, which uses a modified distance transform that fixes the > start > and goal point costs. I didn't mention them because they don't cover > to > your case, but I think your solution is a more general case or > theirs - > i.e. you can use yours for navigation/maze solving by setting the > obstacle/wall values to something greater than the maximum distance to > the goal and the floor values to 0. > > I think would be a very nice, general routine for scikits.image, > > Gary R. > > Zachary Pincus wrote: >> OK, here's what I have. Not Dijkstra's algorithm, but very simple and >> not bad for many purposes. 
>> >> You pass in a 2D costs array, start- and end-points, and a maximum >> number of iterations; the code then keeps track of the minimum >> cumulative cost to each pixel from the starting point, as well as the >> path thereto. It does this by keeping track of "active" pixels -- any >> time a lower cumulative cost to a given pixel is found, that pixel is >> made active. Each iteration, all the neighbors of the "active" pixels >> are examined to see if their costs can be lowered too. Basically >> breadth-first search. >> >> Limitations and oddities: >> - Currently, diagonal and vertical/horizontal steps are both allowed. >> Easy enough to make this a parameter. >> - Paths along the boundary aren't traced out because I didn't want to >> deal with an if-check in the inner loop to make sure that the x,y >> position plus the current offset wasn't out of bounds. This could be >> addressed by (a) padding the input array by one pixel on each side, >> (b) >> putting the if in the inner loop, or (c) having a second pass through >> the edge pixels. >> - In theory, the code could find the cheapest path from top-left to >> bottom-right in a single pass because "active" pixels are marked >> immediately as the code iterates through the array. So the max_iters >> parameter doesn't guarantee that paths longer than that will not be >> found. But it does guarantee that any path found less than that >> length >> is optimal... >> >> Let's say it's BSD licensed, in case anyone finds it of use. >> >> Zach >> >> >> >> >> ------------------------------------------------------------------------ >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user -------------- next part -------------- A non-text attachment was scrubbed... Name: trace_path.zip Type: application/zip Size: 2025 bytes Desc: not available URL: From stefan at sun.ac.za Sun Nov 22 06:00:57 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Sun, 22 Nov 2009 13:00:57 +0200 Subject: [SciPy-User] Dijkstra's algorithm on a lattice In-Reply-To: <66E0FDD3-4822-4722-BB6E-C7F21A679D32@yale.edu> References: <9457e7c80911210638t1df8f21amc1d0616221b64467@mail.gmail.com> <9E69193F-E01E-48F5-8C81-4747C9AA0823@yale.edu> <99C27119-17D0-4EA4-A330-ECEEE71B7C2D@yale.edu> <4B088A5B.8000604@bigpond.net.au> <66E0FDD3-4822-4722-BB6E-C7F21A679D32@yale.edu> Message-ID: <9457e7c80911220300l29370151y655159ab835e6359@mail.gmail.com> Hi Zach 2009/11/22 Zachary Pincus : > Boundaries are now handled properly (the bounds-checking is not too > slow, even on huge arrays), and the code now iterates until all paths > have been fully traced. Still BSD licensed; if it might be useful to > scikits.image, please feel free to include it. This code looks really handy, and I'd love to add it. Would you consider putting your code in a branch on github? Simply go to the following URL and click "fork": http://github.com/stefanv/scikits.image Add your changes, push back to github and click the button "merge request", then I'll make sure it gets merged to the main branch. Thanks! 
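(Since the trace_path attachments are scrubbed from the archive, here is a rough, generic sketch of the kind of approach discussed in this thread: a heap-based Dijkstra variant for minimum cumulative cost paths on a 2D lattice, where each pixel's value is taken to be the cost of stepping onto it. This is an illustration only, not the attached trace_path code.)

import heapq
import numpy as np

def min_cost_path(costs, start, end):
    # Dijkstra over the 8-connected lattice; start and end are (row, col)
    # tuples, and end is assumed to be reachable.
    costs = np.asarray(costs, dtype=float)
    dist = np.full(costs.shape, np.inf)
    prev = {}
    dist[start] = costs[start]
    heap = [(dist[start], start)]
    while heap:
        d, (i, j) = heapq.heappop(heap)
        if (i, j) == end:
            break
        if d > dist[i, j]:
            continue                      # stale heap entry
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if ((di or dj) and 0 <= ni < costs.shape[0]
                        and 0 <= nj < costs.shape[1]):
                    nd = d + costs[ni, nj]
                    if nd < dist[ni, nj]:
                        dist[ni, nj] = nd
                        prev[ni, nj] = (i, j)
                        heapq.heappush(heap, (nd, (ni, nj)))
    # walk back from end to start to recover the path
    path, node = [end], end
    while node != start:
        node = prev[node]
        path.append(node)
    return dist[end], path[::-1]

# usage: total, path = min_cost_path(cost_array, (0, 0), (5, 7))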
Stéfan From stefan at sun.ac.za Sun Nov 22 06:07:14 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Sun, 22 Nov 2009 13:07:14 +0200 Subject: [SciPy-User] Dijkstra's algorithm on a lattice In-Reply-To: <9457e7c80911220300l29370151y655159ab835e6359@mail.gmail.com> References: <9457e7c80911210638t1df8f21amc1d0616221b64467@mail.gmail.com> <9E69193F-E01E-48F5-8C81-4747C9AA0823@yale.edu> <99C27119-17D0-4EA4-A330-ECEEE71B7C2D@yale.edu> <4B088A5B.8000604@bigpond.net.au> <66E0FDD3-4822-4722-BB6E-C7F21A679D32@yale.edu> <9457e7c80911220300l29370151y655159ab835e6359@mail.gmail.com> Message-ID: <9457e7c80911220307x28112983ud6e3b454b3b69911@mail.gmail.com> 2009/11/22 Stéfan van der Walt : > This code looks really handy, and I'd love to add it. Would you > consider putting your code in a branch on github? Actually, don't worry -- I'll add it quickly. Thanks for the contribution! Cheers Stéfan From cedrick.faury at freesbee.fr Sun Nov 22 07:50:47 2009 From: cedrick.faury at freesbee.fr (=?ISO-8859-1?Q?C=E9drick_FAURY?=) Date: Sun, 22 Nov 2009 13:50:47 +0100 Subject: [SciPy-User] Incoherent results with signal.impulse Message-ID: <4B0933A7.6020708@freesbee.fr> Hello, I have scipy 0.7.1, python 2.6, and when I do : n = scipy.array([1]) d = scipy.array([0.01, 0.2, 1.0]) T, yout = scipy.signal.impulse((n,d)) it gives incoherent results for yout. And that doesn't occur with a [1.0, 2.0, 1.0] denominator. Is it a bug? Am I doing something wrong? Does anybody know a solution? Thanks in advance Cédrick FAURY -------------- next part -------------- An HTML attachment was scrubbed... URL: From simon+python at a-oben.org Sun Nov 22 13:42:26 2009 From: simon+python at a-oben.org (Simon Friedberger) Date: Sun, 22 Nov 2009 19:42:26 +0100 Subject: [SciPy-User] Mailing list? Message-ID: <20091122184226.GA9227@a-oben.org> Hi everybody, I sent a message about the K-Means algorithm a couple of days ago but it seems like it never made it on the list. Are new members moderated or something? Best Simon From simon+python at a-oben.org Sun Nov 22 13:46:21 2009 From: simon+python at a-oben.org (Simon Friedberger) Date: Sun, 22 Nov 2009 19:46:21 +0100 Subject: [SciPy-User] Mailing list? In-Reply-To: <20091122184226.GA9227@a-oben.org> References: <20091122184226.GA9227@a-oben.org> Message-ID: <20091122184621.GB9227@a-oben.org> Ok, apparently this message got through, so that answers my question. Here is my original message. Sorry for the confusion. Good Night Everybody, I just looked at the documentation for the K-Means vector quantization functions and I am a bit confused. On the one hand it says that normalization to unit variance would be beneficial; on the other hand there are a lot of "must"s in the descriptions. I was wondering if it is possible to use the functions without normalization or if there is a negative impact. This would make sense because it seems reasonable that one would want to build the codebook on some set and then quantize a different set. In this case the normalization would have to be the same or be omitted. I am also interested in literature recommendations concerning why this is a good idea in general. Any help would be greatly appreciated. Best Simon On 19:42 Sun 22.11.09, Simon Friedberger wrote: > Hi everybody, > > I sent a message about the K-Means algorithm a couple of days ago but it > seems like it never made it on the list. Are new members moderated or > something?
> > Best > Simon > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From warren.weckesser at enthought.com Sun Nov 22 13:59:09 2009 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Sun, 22 Nov 2009 12:59:09 -0600 Subject: [SciPy-User] Incoherent results with signal.impulse In-Reply-To: <4B0933A7.6020708@freesbee.fr> References: <4B0933A7.6020708@freesbee.fr> Message-ID: <4B0989FD.4060809@enthought.com> C?drick FAURY wrote: > Hello, > > I have scipy 0.7.1, python 2.6, and when I do : > > n = scipy.array([1]) > d = scipy.array([0.01, 0.2, 1.0]) > T, yout = scipy.signal.impulse((n,d)) > > it gives incoherent results for yout. > > And that doesn't occurs with [1.0, 2.0, 1.0] denominator. > > Is it a bug ? > I'm doing something wrong ? > Is anybody knows a solution ? Hi C?drick, scipy.signal.impulse assumes that the state matrix A is diagonalizable, so it does not give a correct result when A is defective. I would call that a bug. :) The attached file contains the function impulse_response() that uses a different method to compute the impulse response. If run as a script, the code at the bottom of the file plots impulse responses computed by impulse_response() and by scipy.signal.impulse() for your example, and for two other values of the leading coefficient of your denominator. Warren > Thanks by advance > > C?drick FAURY > > ------------------------------------------------------------------------ > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: impulse_response.py URL: From cedrick.faury at freesbee.fr Sun Nov 22 14:31:00 2009 From: cedrick.faury at freesbee.fr (=?ISO-8859-1?Q?C=E9drick_FAURY?=) Date: Sun, 22 Nov 2009 20:31:00 +0100 Subject: [SciPy-User] Incoherent results with signal.impulse References: 4B0933A7.6020708@freesbee.fr Message-ID: <4B099174.8020800@freesbee.fr> > > >/ Hello, > />/ > />/ I have scipy 0.7.1, python 2.6, and when I do : > />/ > />/ n = scipy.array([1]) > />/ d = scipy.array([0.01, 0.2, 1.0]) > />/ T, yout = scipy.signal.impulse((n,d)) > />/ > />/ it gives incoherent results for yout. > />/ > />/ And that doesn't occurs with [1.0, 2.0, 1.0] denominator. > />/ > />/ Is it a bug ? > />/ I'm doing something wrong ? > />/ Is anybody knows a solution ? > / > > scipy.signal.impulse assumes that the state matrix A is diagonalizable, > so it does not give a correct result when A is defective. I would call > that a bug. :) > > The attached file contains the function impulse_response() that uses a > different method to compute the impulse response. If run as a script, > the code at the bottom of the file plots impulse responses computed by > impulse_response() and by scipy.signal.impulse() for your example, and > for two other values of the leading coefficient of your denominator. > Thank you very much, it works fine now ! Actualy, if i'm right, the solution is to use lsim2 ? 
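For reference, a rough sketch of the lsim2-based workaround (the attached impulse_response.py is not shown here, so the function name, time grid and details are only a guess at the idea confirmed in the reply that follows: simulate the zero-input response starting from x(0) = B, which for a strictly proper system equals C*exp(A*t)*B, i.e. the impulse response):

import numpy as np
from scipy import signal

def impulse_via_lsim2(num, den, T=None):
    # impulse response as the zero-input response from x(0) = B
    # (assumes D = 0, i.e. a strictly proper transfer function)
    sys = signal.lti(num, den)
    if T is None:
        T = np.linspace(0, 10.0, 1001)
    U = np.zeros_like(T)
    T, yout, xout = signal.lsim2(sys, U, T, X0=np.squeeze(sys.B))
    return T, yout

T, y = impulse_via_lsim2([1.0], [0.01, 0.2, 1.0])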
C?drick From warren.weckesser at enthought.com Sun Nov 22 14:38:29 2009 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Sun, 22 Nov 2009 13:38:29 -0600 Subject: [SciPy-User] Incoherent results with signal.impulse In-Reply-To: <4B099174.8020800@freesbee.fr> References: 4B0933A7.6020708@freesbee.fr <4B099174.8020800@freesbee.fr> Message-ID: <4B099335.7070200@enthought.com> C?drick FAURY wrote: >>> / Hello, >>> >> />/ >> />/ I have scipy 0.7.1, python 2.6, and when I do : >> />/ >> />/ n = scipy.array([1]) >> />/ d = scipy.array([0.01, 0.2, 1.0]) >> />/ T, yout = scipy.signal.impulse((n,d)) >> />/ >> />/ it gives incoherent results for yout. >> />/ >> />/ And that doesn't occurs with [1.0, 2.0, 1.0] denominator. >> />/ >> />/ Is it a bug ? >> />/ I'm doing something wrong ? >> />/ Is anybody knows a solution ? >> / >> >> scipy.signal.impulse assumes that the state matrix A is diagonalizable, >> so it does not give a correct result when A is defective. I would call >> that a bug. :) >> >> The attached file contains the function impulse_response() that uses a >> different method to compute the impulse response. If run as a script, >> the code at the bottom of the file plots impulse responses computed by >> impulse_response() and by scipy.signal.impulse() for your example, and >> for two other values of the leading coefficient of your denominator. >> >> > Thank you very much, it works fine now ! > Actualy, if i'm right, the solution is to use lsim2 ? > Yes, it uses lsim2, with the input U all zeros, and with the initial condition set to the B matrix (plus the optional X0, if given). Warren > C?drick > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Mon Nov 23 00:43:57 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 23 Nov 2009 00:43:57 -0500 Subject: [SciPy-User] stats, classes instead of functions for results MovStats Message-ID: <1cd32cbb0911222143p35d24a1m52596afd13bd1661@mail.gmail.com> Following up on a question by Keith on the numpy list and his reminder that covariance can be calculated by the cross-product minus the product of the means, I redid and enhanced my moving stats functions. Suppose x and y are two time series, then the moving correlation requires the calculation of the mean, variance and covariance for each window. Currently in scipy stats intermediate results are usually thrown away on return (while rpy/R returns all intermediate results used for the calculation. Using a decorator/descriptor of Fernando written for nitime, I tried out to write the function as a class instead, so that any desired ( intermediate) calculations are only made on demand, but once they are calculated they are attached to the class as attributes or properties. This seems to be a useful "pattern". Are there any opinion for using the pattern in scipy.stats ? MovStats will currently go into statsmodels Below is the class (with cutting part of init), a full script is the attachment, including examples that test the class. 
about MovStats: y and x are tested for 2d, either (T,N) with axis=0 or (N,T) with axis=1, should (but may not yet) work for nd arrays along any axis (signal.correlate docstring) nans are handled by dropping the corresponding observations from the window, not adding any additional observations, not tested if a window is empty because it contains only nans, nor if variance is zero (kern is intended for weighted statistics in the window but not tested yet, I still need to decide on normalization requirements) requires scipy.signal, all calculations done with signal.correlate, no loops as often, functions are one-liners all results are returned for valid observations only, initial observations with incomplete window are cut bonus: slope of moving regression of y on x, since it was trivial to add still some cleaning and documentation to do usage: ms = MovStats(x, y, axis=1) ms.yvar ms.xmean ms.yxcorr ms.yxcov ... Josef class MovStats(object): def __init__(self, y, x=None, kern=5, axis=0): self.y = y self.x = x if np.isscalar(kern): ws = kern <... snip> @OneTimeProperty def ymean(self): ys = signal.correlate(self.y, self.kern, mode='same')[self.sslice] ym = ys/self.n return ym @OneTimeProperty def yvar(self): ys2 = signal.correlate(self.y*self.y, self.kern, mode='same')[self.sslice] yvar = ys2/self.n - self.ymean**2 return yvar @OneTimeProperty def xmean(self): if self.x is None: return None else: xs = signal.correlate(self.x, self.kern, mode='same')[self.sslice] xm = xs/self.n return xm @OneTimeProperty def xvar(self): if self.x is None: return None else: xs2 = signal.correlate(self.x*self.x, self.kern, mode='same')[self.sslice] xvar = xs2/self.n - self.xmean**2 return xvar @OneTimeProperty def yxcov(self): xys = signal.correlate(self.x*self.y, self.kern, mode='same')[self.sslice] return xys/self.n - self.ymean*self.xmean @OneTimeProperty def yxcorr(self): return self.yxcov/np.sqrt(self.yvar*self.xvar) @OneTimeProperty def yxslope(self): return self.yxcov/self.xvar -------------- next part -------------- # -*- coding: utf-8 -*- """ Created on Sat Nov 21 14:22:29 2009 Author: josef-pktd """ import numpy as np from scipy import signal class OneTimeProperty(object): """A descriptor to make special properties that become normal attributes. This is meant to be used mostly by the auto_attr decorator in this module. Author: Fernando Perez, copied from nitime """ def __init__(self,func): """Create a OneTimeProperty instance. Parameters ---------- func : method The method that will be called the first time to compute a value. Afterwards, the method's name will be a standard attribute holding the value of this computation. """ self.getter = func self.name = func.func_name def __get__(self,obj,type=None): """This will be called on attribute access on the class or instance. """ if obj is None: # Being called on the class, return the original function. This way, # introspection works on the class. 
#return func print 'class access' return self.getter val = self.getter(obj) #print "** auto_attr - loading '%s'" % self.name # dbg setattr(obj, self.name, val) return val def moving_slope(x,y): '''estimate moving slope coefficient of regression of y on x filters along axis=1, returns valid observations Todo: axis and lag options idea by John D'Errico ''' xx = np.column_stack((np.ones(x.shape), x)) pinvxx = np.linalg.pinv(xx)[1:,:] windsize = len(x) lead = windsize//2 - 1 return signal.correlate(y, pinvxx, 'full' )[:,windsize-lead:-(windsize+1*lead-2)] def corrxy(x, y, ws): # based on example by Keith d = np.nan * np.ones_like(y) for i in range(y.shape[0]): yi = y[i,:] xi = x[i,:] for j in range(ws-1, y.shape[1]): yj = yi[j+1-ws:j+1] xj = xi[j+1-ws:j+1] d[i,j] = np.corrcoef(xj, yj, bias=1)[0,1] return d x = np.sin(np.arange(20))[None,:] + np.random.randn(5, 20) #x = y**2 def movstats(y, x=None, ws=5, kind='mvcr', axis=0): ''' return moving correlation between two timeseries handles 1d or 2d data ''' kdim = [1]*y.ndim kdim[axis] = ws kern = np.ones(tuple(kdim)) sslice = [slice(None)]*y.ndim sslice[axis] = slice(ws//2, -ws//2+1) ys = signal.correlate(y, kern, mode='same')[sslice] ys2 = signal.correlate(y*y, kern, mode='same')[sslice] xs = signal.correlate(x, kern, mode='same')[sslice] xs2 = signal.correlate(x*x, kern, mode='same')[sslice] xys = signal.correlate(x*y, kern, mode='same')[sslice] n = ws ym = ys/(1.*n) xm = xs/(1.*n) yvar = ys2/(1.*n) - ym**2 xvar = xs2/(1.*n) - xm**2 xycov = xys/(1.*n) - ym*xm xycorr = xycov/np.sqrt(yvar*xvar) return xycorr class MovStats(object): def __init__(self, y, x=None, kern=5, axis=0): self.y = y self.x = x if np.isscalar(kern): ws = kern kdim = [1]*self.y.ndim #print ws, kdim, self.y.ndim kdim[axis] = ws self.kern = np.ones(tuple(kdim)) else: ws = y.shape[axis] if ((kern.ndim != self.y.ndim) or np.all([kern.shape(i) for i in self.y.ndim if not i==axis])): raise ValueError('kern has incorrect shape') self.kern = kern sslice = [slice(None)]*y.ndim sslice[axis] = slice(ws//2, -ws//2+1) self.sslice = sslice #Todo: add nan handling if self.x is None: ynotnan = ~np.isnan(self.y) else: self.x = np.copy(self.x) ynotnan = (~np.isnan(self.y))*(~np.isnan(self.x)) #ynotnan = ~np.logical_or(np.isnan(self.y), np.isnan(self.x)) self.x[~ynotnan] = 0 self.y = np.copy(self.y) self.y[~ynotnan] = 0 if ynotnan.all(): self.n = 1.0* ws else: self.n = signal.correlate(ynotnan, self.kern, mode='same')[self.sslice] @OneTimeProperty def ymean(self): ys = signal.correlate(self.y, self.kern, mode='same')[self.sslice] ym = ys/self.n return ym @OneTimeProperty def yvar(self): ys2 = signal.correlate(self.y*self.y, self.kern, mode='same')[self.sslice] yvar = ys2/self.n - self.ymean**2 return yvar @OneTimeProperty def xmean(self): if self.x is None: return None else: xs = signal.correlate(self.x, self.kern, mode='same')[self.sslice] xm = xs/self.n return xm @OneTimeProperty def xvar(self): if self.x is None: return None else: xs2 = signal.correlate(self.x*self.x, self.kern, mode='same')[self.sslice] xvar = xs2/self.n - self.xmean**2 return xvar @OneTimeProperty def yxcov(self): xys = signal.correlate(self.x*self.y, self.kern, mode='same')[self.sslice] return xys/self.n - self.ymean*self.xmean @OneTimeProperty def yxcorr(self): return self.yxcov/np.sqrt(self.yvar*self.xvar) @OneTimeProperty def yxslope(self): return self.yxcov/self.xvar x = np.array([1.0, 2.0, 3.0, 4.0, 5.0]) x = np.sin(np.arange(20))[None,:] + np.random.randn(5, 20) y = 1*np.arange(20)[None,:] + np.random.randn(5, 20) 
ws=5 ms = MovStats(x, y, axis=1) print dir(ms) xyc = MovStats(x, y, axis=1).yxcorr xyc_loop = corrxy(x, y, ws)[:,ws-1:] #testing #print xyc_loop #print xyc print np.corrcoef(y[0,:5],x[0,:5],bias=1) print np.corrcoef(y[0,2:7],x[0,2:7],bias=1) print np.corrcoef(y[1,:5],x[1,:5],bias=1) print np.corrcoef(y[-1,-5:],x[-1,-5:],bias=1) print 'maxabsdiff', np.max(np.abs(xyc_loop - xyc)) print 'test yxolsslope' from scipy import stats print stats.linregress(y[0,:5],x[0,:5])[0] print stats.linregress(y[0,2:7],x[0,2:7])[0] print stats.linregress(y[-1,-5:],x[-1,-5:])[0] print ms.yxslope print 'test axis=0' xyc_loopT = corrxy(x.T, y.T, ws)[:,ws-1:].T xycT = MovStats(x, y, axis=0).yxcorr print 'maxabsdiff', np.max(np.abs(xyc_loopT - xycT)) print 'testnan' xn = x.copy() xn[:, 2::5] = np.nan xync = MovStats(xn, y, axis=1).yxcorr #print xync xnr = xn[~np.isnan(xn)].reshape(5,-1) ynr = y[~np.isnan(xn)].reshape(5,-1) xync_loop = corrxy(xnr, ynr, 4)[:,4-1:] xyncr = xync[~np.isnan(xn)[:,4:]].reshape(5,-1) print 'maxabsdiff', np.max(np.abs(xync_loop - xyncr)) From dwf at cs.toronto.edu Mon Nov 23 01:10:03 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Mon, 23 Nov 2009 01:10:03 -0500 Subject: [SciPy-User] kmeans (Re: Mailing list?) In-Reply-To: <20091122184621.GB9227@a-oben.org> References: <20091122184226.GA9227@a-oben.org> <20091122184621.GB9227@a-oben.org> Message-ID: <86B36623-EEEB-4A95-9F72-5664DCC84717@cs.toronto.edu> On 22-Nov-09, at 1:46 PM, Simon Friedberger wrote: > I just looked at the documentation for the K-Means vector quantization > functions and I am a bit confused. On the one hand it says that > normalization to unit variance would be beneficial on the other hand > there are a lot of "must"s in the descriptions. > I was wondering if it is possible to use the functions without > normalization or if there is a negative impact. Damian did use some strong language there. The kmeans function won't know whether you've normalized or not, but in some cases you can expect much better solutions with normalized input (the function is called "whiten" which is somewhat misleading, as "whitening" is often used in the literature to mean decorrelating i.e. rotating by the eigenvectors of the covariance). kmeans uses the Euclidean distance, meaning that the distance between two points is the sum of the squared difference of each point's coordinates. If you have different coordinates that have vastly different scales, say some in the thousands and some that are always less than one, then one or two coordinates can dominate the distance calculation and make the other coordinates nearly irrelevant in the clustering (if the difference in scale is _really_ big then you can lose precision in the rounding error, too). Units are almost always arbitrary, and so scaling by the standard deviation helps this in that it treats all of your features "equally" (you may also want to subtract the mean before calling whiten(), as well - this is in fact one of the standard tricks, it doesn't matter so much here but it can help with numerical conditioning in a lot of algorithms, particularly ones that involve gradient descent). If you want to quantize new vectors after clustering then you can simply apply the reverse transformation to your codebook/centroids (i.e. multiply by the std. dev. of the original data and add back the mean if you subtracted it) which will scale them to the correct position in the space of the original data. 
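A small sketch of that workflow with made-up data (only the scaling step is shown; if you also subtract the mean before clustering, add it back to the codebook in the same way):

import numpy as np
from scipy.cluster.vq import kmeans, vq, whiten

data = np.random.randn(1000, 3) * np.array([1.0, 10.0, 100.0])  # columns on very different scales

std_dev = data.std(axis=0)
wdata = whiten(data)                      # same as data / std_dev
codebook, distortion = kmeans(wdata, 4)

# scale the centroids back to the original units, then quantize new
# points without having to whiten them first
codebook_orig = codebook * std_dev
new_points = np.random.randn(10, 3) * np.array([1.0, 10.0, 100.0])
labels, dists = vq(new_points, codebook_orig)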
> This would make sense because it seems reasonable that one would > want to > build the codebook on some set and then quantize a different set. In > this case normalization would have to be the same or be omitted. Yes, you could apply the normalization you applied to the training data on every point you wish to quantize after the fact, but it's usually easier to just apply the inverse transformation to the codebook (especially if the number of points you want to quantize greatly exceeds the number of items in your codebook, in which case you save a lot of floating point ops). > I am also interested in literature recommendations concerning why this > is a good idea in general. Well, lots of references will tell you what I just told you (that high- variance features will dominate distance calculations if you don't normalize). There is some discussion of it in the 'Prototype Methods' chapter in 'The Elements of Statistical Learning': http://www-stat.stanford.edu/~tibs/ElemStatLearn/ David From pgmdevlist at gmail.com Mon Nov 23 01:39:16 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 23 Nov 2009 01:39:16 -0500 Subject: [SciPy-User] stats, classes instead of functions for results MovStats In-Reply-To: <1cd32cbb0911222143p35d24a1m52596afd13bd1661@mail.gmail.com> References: <1cd32cbb0911222143p35d24a1m52596afd13bd1661@mail.gmail.com> Message-ID: On Nov 23, 2009, at 12:43 AM, josef.pktd at gmail.com wrote: > Following up on a question by Keith on the numpy list and his reminder > that covariance can be calculated by the cross-product minus the > product of the means, I redid and > enhanced my moving stats functions. > > Suppose x and y are two time series, then the moving correlation > requires the calculation of the mean, variance and covariance for each > window. Currently in scipy stats intermediate results are usually > thrown away on return (while rpy/R returns all intermediate results > used for the calculation. > > Using a decorator/descriptor of Fernando written for nitime, I tried > out to write the function as a class instead, so that any desired ( > intermediate) calculations are only made on demand, but once they are > calculated they are attached to the class as attributes or properties. > This seems to be a useful "pattern". > > Are there any opinion for using the pattern in scipy.stats ? MovStats > will currently go into statsmodels > > Below is the class (with cutting part of init), a full script is the > attachment, including examples that test the class. > > about MovStats: > y and x are tested for 2d, either (T,N) with axis=0 or (N,T) with > axis=1, should (but may not yet) work for nd arrays along any axis > (signal.correlate docstring) > nans are handled by dropping the corresponding observations from the > window, not adding any additional observations, > not tested if a window is empty because it contains only nans, nor if > variance is zero > (kern is intended for weighted statistics in the window but not tested > yet, I still need to decide on normalization requirements) > requires scipy.signal, all calculations done with signal.correlate, no loops > as often, functions are one-liners > all results are returned for valid observations only, initial > observations with incomplete window are cut > bonus: slope of moving regression of y on x, since it was trivial to add > still some cleaning and documentation to do Can you add support for MaskedArrays ? The easiest would be to check whether your inputs are masked arrays. 
If yes, make sure they're float (transform them if needed) and fill them w/ nans as needed. You can also check what Matt did w/ scikits.timeseries. About your suggestion: I'd leave it in statsmodels for now... From josef.pktd at gmail.com Mon Nov 23 02:13:28 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 23 Nov 2009 02:13:28 -0500 Subject: [SciPy-User] stats, classes instead of functions for results MovStats In-Reply-To: References: <1cd32cbb0911222143p35d24a1m52596afd13bd1661@mail.gmail.com> Message-ID: <1cd32cbb0911222313v3bd21f9aye671125d0fea28b0@mail.gmail.com> On Mon, Nov 23, 2009 at 1:39 AM, Pierre GM wrote: > On Nov 23, 2009, at 12:43 AM, josef.pktd at gmail.com wrote: >> Following up on a question by Keith on the numpy list and his reminder >> that covariance can be calculated by the cross-product minus the >> product of the means, I redid and >> enhanced my moving stats functions. >> >> Suppose x and y are two time series, then the moving correlation >> requires the calculation of the mean, variance and covariance for each >> window. Currently in scipy stats intermediate results are usually >> thrown away on return (while rpy/R returns all intermediate results >> used for the calculation. >> >> Using a decorator/descriptor of Fernando written for nitime, I tried >> out to write the function as a class instead, so that any desired ( >> intermediate) calculations are only made on demand, but once they are >> calculated they are attached to the class as attributes or properties. >> This seems to be a useful "pattern". >> >> Are there any opinion for using the pattern in scipy.stats ? MovStats >> will currently go into statsmodels >> >> Below is the class (with cutting part of init), a full script is the >> attachment, including examples that test the class. >> >> about MovStats: >> y and x are tested for 2d, either (T,N) with axis=0 or (N,T) with >> axis=1, should (but may not yet) work for nd arrays along any axis >> (signal.correlate docstring) >> nans are handled by dropping the corresponding observations from the >> window, not adding any additional observations, >> not tested if a window is empty because it contains only nans, nor if >> variance is zero >> (kern is intended for weighted statistics in the window but not tested >> yet, I still need to decide on normalization requirements) >> requires scipy.signal, all calculations done with signal.correlate, no loops >> as often, functions are one-liners >> all results are returned for valid observations only, initial >> observations with incomplete window are cut >> bonus: slope of moving regression of y on x, since it was trivial to add >> still some cleaning and documentation to do > > > Can you add support for MaskedArrays ? > The easiest would be to check whether your inputs are masked arrays. If yes, make sure they're float (transform them if needed) and fill them w/ nans as needed. Since only __init__ is affected this should be quite easy, I only need the mask for the calculation of the number of non-nan elements in a window, and to fill the data array with zeros. I haven't thought about different numeric types, I guess I should make sure that also for the non-ma arrays the calculations are done with floats. > You can also check what Matt did w/ scikits.timeseries. The way of calculating this, I initially got from scikits.timeseries autocovariance, your moving_funcs are mostly in c, cmov_window uses np.convolve which is only for 1d and needs to loop. 
The advantage of scipy.signal over numpy is that it does nd convolution. I will look at the mask handling in time series again. I always get mixed up with convolve versus correlate. Is there a standard sorting for time series, up to down or left to right by increasing time or reversed? I have to check this for non-flat window weights/kernels. > About your suggestion: I'd leave it in statsmodels for now... movstat goes into statsmodels.sandbox.tsa which is my playground for time series analysis for scipy.stats I was thinking more of existing or other functions, e.g. my version of groupstats, (mean, variance, demean, ... by groups) would follow the same pattern of partly expensive calculations on demand. Josef > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From simon+python at a-oben.org Mon Nov 23 02:18:33 2009 From: simon+python at a-oben.org (Simon Friedberger) Date: Mon, 23 Nov 2009 08:18:33 +0100 Subject: [SciPy-User] kmeans (Re: Mailing list?) In-Reply-To: <86B36623-EEEB-4A95-9F72-5664DCC84717@cs.toronto.edu> References: <20091122184226.GA9227@a-oben.org> <20091122184621.GB9227@a-oben.org> <86B36623-EEEB-4A95-9F72-5664DCC84717@cs.toronto.edu> Message-ID: <20091123071831.GA14634@a-oben.org> Hi David, thanks for your explanation. I agree with your arguments but couldn't it have the opposite effect: Weighing features that should have less discriminative power more because they have a small variance? I'm just not sure about it but I will check out the book you reference. I've had it lying around for a while anyway. On the case of inverting the transformation. Is this functionality built-in? I can't find anything in the docs. Thanks Simon From robert.kern at gmail.com Mon Nov 23 02:54:16 2009 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 23 Nov 2009 01:54:16 -0600 Subject: [SciPy-User] kmeans (Re: Mailing list?) In-Reply-To: <20091123071831.GA14634@a-oben.org> References: <20091122184226.GA9227@a-oben.org> <20091122184621.GB9227@a-oben.org> <86B36623-EEEB-4A95-9F72-5664DCC84717@cs.toronto.edu> <20091123071831.GA14634@a-oben.org> Message-ID: <3d375d730911222354u4821ab2arbab7c48364b3aacc@mail.gmail.com> On Mon, Nov 23, 2009 at 01:18, Simon Friedberger wrote: > Hi David, > > thanks for your explanation. I agree with your arguments but couldn't it > have the opposite effect: Weighing features that should have less > discriminative power more because they have a small variance? If a variable has a small variance, a large deviation in that variable is *very* informative and should have a larger impact on the classification than a small deviation in a variable that has a large variance. Let's distinguish two cases: one in which each variable has its own units (let's say degrees Celsius and meters) and one in which each variable is commensurable and in the same units (let's say meters). Now, in the first case, you need some way to put all of the variables into the same units so you can sensibly compute a distance using all of the variables. A reasonable choice of units is "one standard deviation [of the marginal distribution for the variable]". In the second case, there *may* be a case for not doing prewhitening. If your points are actually 3D points in real space with a metric, then you may want to use that space's metric as the distance. 
However, if the process that created your data is creating "oblong" distributions of points, that may indicate that it is using a different notion of distance. In fact, you may want to do a PCA to find the right rotation such that your variables are orthogonal to the principal directions of variation. And then prewhiten in those directions. The key point is to find an appropriate definition of distance to use. Prewhitening is a good default when you don't have a model of your process, yet. And you usually don't. :-) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From matthieu.brucher at gmail.com Mon Nov 23 08:46:40 2009 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Mon, 23 Nov 2009 14:46:40 +0100 Subject: [SciPy-User] Modified Bessel functions of the first kind Message-ID: Hi, I need the zero-order modified Bessel function of the first kind. I've seen the scipy.special.besselpoly, but I don't know if it really is what I'm looking for... Does someone know? Matthieu -- Information System Engineer, Ph.D. Website: http://matthieu-brucher.developpez.com/ Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn: http://www.linkedin.com/in/matthieubrucher From eadrogue at gmx.net Mon Nov 23 09:11:33 2009 From: eadrogue at gmx.net (Ernest =?iso-8859-1?Q?Adrogu=E9?=) Date: Mon, 23 Nov 2009 15:11:33 +0100 Subject: [SciPy-User] Modified Bessel functions of the first kind In-Reply-To: References: Message-ID: <20091123141133.GA3743@doriath.local> Hi, 23/11/09 @ 14:46 (+0100), thus spake Matthieu Brucher: > I need the zero-order modified Bessel function of the first kind. I've > seen the scipy.special.besselpoly, but I don't know if it really is > what I'm looking for... > Does someone know? It's scipy.special.iv(order, x) it gives the modified Bessel function of the first kind of order 'order' evaluated at 'x'. Bye. From matthieu.brucher at gmail.com Mon Nov 23 09:15:41 2009 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Mon, 23 Nov 2009 15:15:41 +0100 Subject: [SciPy-User] Modified Bessel functions of the first kind In-Reply-To: <20091123141133.GA3743@doriath.local> References: <20091123141133.GA3743@doriath.local> Message-ID: Excellent! Thanks a lot for this. Matthieu 2009/11/23 Ernest Adrogu? : > Hi, > 23/11/09 @ 14:46 (+0100), thus spake Matthieu Brucher: >> I need the zero-order modified Bessel function of the first kind. I've >> seen the scipy.special.besselpoly, but I don't know if it really is >> what I'm looking for... >> Does someone know? > > It's scipy.special.iv(order, x) > it gives the modified Bessel function of the first kind of > order 'order' evaluated at 'x'. > > Bye. > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Information System Engineer, Ph.D. 
Website: http://matthieu-brucher.developpez.com/ Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn: http://www.linkedin.com/in/matthieubrucher From cclarke at chrisdev.com Mon Nov 23 08:59:21 2009 From: cclarke at chrisdev.com (Christopher Clarke) Date: Mon, 23 Nov 2009 09:59:21 -0400 Subject: [SciPy-User] timeseries forwardfill In-Reply-To: <3E381758-BD11-4EAE-B035-16554519F724@chrisdev.com> References: <9DC4A120-0DF0-4E33-91E1-04584E04135F@chrisdev.com> <3E381758-BD11-4EAE-B035-16554519F724@chrisdev.com> Message-ID: <6fb517fa0911230559k51b8cbaat141c02e1d82acb77@mail.gmail.com> Hi Something seems to have gone wrong with my initial reply!! Anyway, I often encounter the "initial values" use case when I am creating business day time series out of RDBMS tables using a subset of the observations in the table. For example i have a SQL query fragment like WHERE symbol='SFC' and dateix BETWEEN '2009-01-01' AND '2009-09-01' Now suppose that 2009-01-02 and 2009-01-05 are missing (trading is sparse on many of the exchanges i'm dealing with) i am supposed to forward_fill using the last traded value for SFC which may or may not be 2008-12-31. Hence i have a query that find the last traded values and i use these as the "initial values". Anyway here is by forward_fill wrapper. Its not very efficient as i'm copying and forward_fill is copying etc but.. I'm actually starting to have reservation about the usefulness forward_fill on 2d as opposed to the individual the individual series arrays as i am finding that i've often got to do loads of transformations and checking on the individuals arrays before i can combine them into a single ma array for filling anyway def forward_fill2(marr,maxgap=None,init_vals=None): """ init_vals a list with the same no. of elements as marr.shape[1] """ arr=ma.array(marr,copy=True if arr.ndim == 1: if init_vals: if arr.mask.any() and arr.mask[0]: if init_vals: arr[0] = init_vals[0] return forward_fill(arr,maxgap) else: n = arr.shape[1] if init_vals: mask=ma.getmask(arr) if len(init_vals) != n: raise ValueError, 'Initial Values sequence does no match number of columns' for c in range(len(init_vals)): if arr.mask.any() and mask[0,c]: if init_vals[c]: arr[0,c]=init_vals[c] arr = ma.hsplit(arr, n)[0] return ma.column_stack([forward_fill(np.squeeze(a),maxgap) for a in arr]) On Thu, Nov 19, 2009 at 5:07 AM, Chris Clarke wrote: > Hi > The "initial value" > > On Nov 18, 2009, at 9:17 PM, Pierre GM wrote: > > > On Nov 18, 2009, at 5:18 PM, Chris Clarke wrote: > > Sorry for the later reply. Yes forward_fill is still there and it > > works!!! > > > Good > > But it seemed to have some more capability (initial values, 2d arrays) > > when it was in the sandbox?? > > I may be wrong and mixing up with some other library. > > > That does sound familiar, but i don't think it was part of > scikits.timeseries... > A patch for 2D would be welcome, I'm not quite sure what you mean by > initial value, though > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From simon+python at a-oben.org Mon Nov 23 10:29:43 2009 From: simon+python at a-oben.org (Simon Friedberger) Date: Mon, 23 Nov 2009 16:29:43 +0100 Subject: [SciPy-User] kmeans (Re: Mailing list?) 
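A tiny numeric illustration of the scaling issue being discussed in this sub-thread (the points are arbitrary, made up for the example): before rescaling, the large-scale second feature decides the Euclidean distances almost by itself; after dividing each column by its standard deviation both features contribute comparably.

import numpy as np
from scipy.cluster.vq import whiten

obs = np.array([[0.0, 1000.0],
                [5.0, 1010.0],
                [0.1, 2000.0]])

# raw distances from the first point: the second feature dominates
print np.sqrt(((obs[0] - obs[1])**2).sum()), np.sqrt(((obs[0] - obs[2])**2).sum())

w = whiten(obs)   # divide each column by its standard deviation
# now the two distances come out comparable
print np.sqrt(((w[0] - w[1])**2).sum()), np.sqrt(((w[0] - w[2])**2).sum())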
In-Reply-To: <3d375d730911222354u4821ab2arbab7c48364b3aacc@mail.gmail.com> References: <20091122184226.GA9227@a-oben.org> <20091122184621.GB9227@a-oben.org> <86B36623-EEEB-4A95-9F72-5664DCC84717@cs.toronto.edu> <20091123071831.GA14634@a-oben.org> <3d375d730911222354u4821ab2arbab7c48364b3aacc@mail.gmail.com> Message-ID: <20091123152943.GC14634@a-oben.org> Hi Robert, I agree with you in every respect. :) Now, it only remains to see if anybody knows how to get to the 'whitening' transformation so I can invert it or apply it to my other data. Anybody? :) Best Simon From dwf at cs.toronto.edu Mon Nov 23 13:30:57 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Mon, 23 Nov 2009 13:30:57 -0500 Subject: [SciPy-User] kmeans (Re: Mailing list?) In-Reply-To: <20091123071831.GA14634@a-oben.org> References: <20091122184226.GA9227@a-oben.org> <20091122184621.GB9227@a-oben.org> <86B36623-EEEB-4A95-9F72-5664DCC84717@cs.toronto.edu> <20091123071831.GA14634@a-oben.org> Message-ID: <0BA82226-42AD-46D7-A6B8-067759D1A8DD@cs.toronto.edu> On 23-Nov-09, at 2:18 AM, Simon Friedberger wrote: > thanks for your explanation. I agree with your arguments but > couldn't it > have the opposite effect: Weighing features that should have less > discriminative power more because they have a small variance? > I'm just not sure about it but I will check out the book you > reference. > I've had it lying around for a while anyway. It could, but typically when you're employing k-means, you have little reason to believe any of the variables have any more explanatory power than any of the others, so treating them "equally" is the simplest, most reasonable thing to do. It indeed will inflate the range of low variance. You also use the word "discriminative", which makes me think you're trying to do some sort of classification. Note that k-means can't take into account any label information and is thus ill-suited to classification, though it is sometimes used for this. > On the case of inverting the transformation. Is this functionality > built-in? I can't find anything in the docs. It isn't, but maybe it should be. It'd involve rethinking the cluster module a bit (which I've been planning on as a means to expand it, but oh, the time, where does it go?...). David From dwf at cs.toronto.edu Mon Nov 23 13:36:04 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Mon, 23 Nov 2009 13:36:04 -0500 Subject: [SciPy-User] kmeans (Re: Mailing list?) In-Reply-To: <20091123152943.GC14634@a-oben.org> References: <20091122184226.GA9227@a-oben.org> <20091122184621.GB9227@a-oben.org> <86B36623-EEEB-4A95-9F72-5664DCC84717@cs.toronto.edu> <20091123071831.GA14634@a-oben.org> <3d375d730911222354u4821ab2arbab7c48364b3aacc@mail.gmail.com> <20091123152943.GC14634@a-oben.org> Message-ID: On 23-Nov-09, at 10:29 AM, Simon Friedberger wrote: > Hi Robert, > I agree with you in every respect. :) > > Now, it only remains to see if anybody knows how to get to the > 'whitening' transformation so I can invert it or apply it to my other > data. The 'whiten' function is only two lines: std_dev = std(obs, axis=0) return obs / std_dev codebook *= std(youroriginaldata, axis=0) will invert the transformation done by whiten() and apply it to your codebook. David From cimrman3 at ntc.zcu.cz Tue Nov 24 07:10:58 2009 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Tue, 24 Nov 2009 13:10:58 +0100 Subject: [SciPy-User] ANN: SfePy 2009.4 Message-ID: <4B0BCD52.4080706@ntc.zcu.cz> I am pleased to announce release 2009.4 of SfePy. 
Description ----------- SfePy (simple finite elements in Python) is a software, distributed under the BSD license, for solving systems of coupled partial differential equations by the finite element method. The code is based on NumPy and SciPy packages. Mailing lists, issue tracking, git repository: http://sfepy.org Home page: http://sfepy.kme.zcu.cz New documentation site: http://docs.sfepy.org/doc Many thanks to Logan Sorenson for the new documentation contents, and Vladimir Lukes for setting up the server. Highlights of this release -------------------------- - unified handling of user-defined functions (for defining subdomains, heterogeneous material properties, boundary conditions etc.) - greatly improved postprocessing and visualization capabilities, namely: - support for file sequences (evolutionary simulations) - animations (using ffmpeg) - automatic scalar bars - sfepy_gui.py: Mayavi2-based GUI to launch simulations Major improvements ------------------ Apart from many bug-fixes, let us mention: - quasistatic time stepping - graphical logging: - dynamic adding of data groups (new axes) to Log and ProcessPlotter - linear algebra: - reversed Cuthill-McKee permutation algorithm, graph in-place permutation - setting of parameter variables by a user-defined function - new tests and terms For more information on this release, see http://sfepy.googlecode.com/svn/web/releases/2009.4_RELEASE_NOTES.txt (full release notes, rather long). Best regards, Robert Cimrman From osman at fuse.net Tue Nov 24 19:21:33 2009 From: osman at fuse.net (osman) Date: Tue, 24 Nov 2009 19:21:33 -0500 Subject: [SciPy-User] ANN: SfePy 2009.4 In-Reply-To: <4B0BCD52.4080706@ntc.zcu.cz> References: <4B0BCD52.4080706@ntc.zcu.cz> Message-ID: <1259108493.28868.3.camel@osman-laptop> On Tue, 2009-11-24 at 13:10 +0100, Robert Cimrman wrote: > I am pleased to announce release 2009.4 of SfePy. Hi Robert, Thanks for the new version. I put it on my 32 bit ubuntu jaunty. All tests ran fine without any errors. Then I tried isfepy. I am getting an error: In [1]: pb, vec, data = pde_solve('input/poisson.py') sfepy: left over: ['__builtins__', '__file__', '__name__', '_filename', '__doc__', '__package__'] sfepy: reading mesh (database/simple.mesh)... sfepy: ...done in 0.04 s sfepy: setting up domain edges... sfepy: ...done in 0.01 s sfepy: setting up domain faces... sfepy: ...done in 0.01 s sfepy: creating regions... sfepy: Gamma_Right sfepy: Omega sfepy: Gamma_Left sfepy: ...done in 0.04 s sfepy: equation "Temperature": sfepy: dw_laplace.i1.Omega( coef.val, s, t ) = 0 sfepy: setting up dof connectivities... sfepy: ...done in 0.00 s sfepy: describing geometries... sfepy: ...done in 0.00 s sfepy: using solvers: nls: newton ls: ls sfepy: matrix shape: (300, 300) sfepy: assembling matrix graph... sfepy: ...done in 0.00 s sfepy: matrix structural nonzeros: 3538 (3.93e-02% fill) sfepy: updating materials... sfepy: coef sfepy: ...done in 0.01 s sfepy: updating variables... 
sfepy: ...done /usr/lib/python2.6/dist-packages/scipy/linsolve/__init__.py:4: DeprecationWarning: scipy.linsolve has moved to scipy.sparse.linalg.dsolve warn('scipy.linsolve has moved to scipy.sparse.linalg.dsolve', DeprecationWarning) sfepy: nls: iter: 0, residual: 1.176265e-01 (rel: 1.000000e+00) /usr/lib/python2.6/dist-packages/scipy/sparse/linalg/dsolve/linsolve.py:78: DeprecationWarning: scipy.sparse.linalg.dsolve.umfpack will be removed, install scikits.umfpack instead ' install scikits.umfpack instead', DeprecationWarning ) sfepy: rezidual: 0.00 [s] sfepy: solve: 0.01 [s] sfepy: matrix: 0.00 [s] sfepy: nls: iter: 1, residual: 9.957055e-17 (rel: 8.464973e-16) In [2]: view = Viewer(pb.get_output_name()) In [3]: view() --------------------------------------------------------------------------- AttributeError Traceback (most recent call last) /home/osman/sfepy-release-2009.4/sfepy/interactive/__init__.py in () ----> 1 2 3 4 5 /home/osman/sfepy-release-2009.4/sfepy/postprocess/viewer.py in call_mlab(self, scene, show, is_3d, view, roll, layout, scalar_mode, vector_mode, rel_scaling, clamping, ranges, is_scalar_bar, rel_text_width, fig_filename, resolution, filter_names, only_names, step, anti_aliasing) 555 else: 556 gui = ViewerGUI(viewer=self) --> 557 scene = gui.scene.mayavi_scene 558 559 if scene is not self.scene: AttributeError: 'MlabSceneModel' object has no attribute 'mayavi_scene' 2009.3 release has no problem with isfepy. Best, Osman From artpoon at gmail.com Wed Nov 25 01:30:46 2009 From: artpoon at gmail.com (Art Poon) Date: Tue, 24 Nov 2009 22:30:46 -0800 Subject: [SciPy-User] gamma ppf weirdness Message-ID: <2CA3D151-9873-4BED-AD0F-922301CD7535@gmail.com> Hello, I'm trying to hammer out some quick simulation code and need to calculate a bunch of inverse CDF values from the gamma distribution. SciPy seems like a great resource for this. However, I've encountered some strangeness that is probably my own fault: >>> g = stats.gamma(1,0,1) >>> g >>> g.ppf(0.1) 0.10536051565782635 >>> g.ppf(0.25) 0.0 >>> g.ppf(0.2500001) 0.28768220578512327 >>> g.ppf(0.2499999) 0.28768193911845646 >>> g.ppf(0.25) 0.0 I'm just dying to know where I've gone wrong here! In the meantime, I'm coding up a function to compute the inverse CDF from MATLAB code. I'm using Snow Leopard with Python 2.5.4 (bypassing default system Python 2.6.1), numpy-1.3.0 and scipy-0.7.1, both compiled from source. Thanks! - Art. -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Nov 25 01:44:47 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 25 Nov 2009 01:44:47 -0500 Subject: [SciPy-User] gamma ppf weirdness In-Reply-To: <2CA3D151-9873-4BED-AD0F-922301CD7535@gmail.com> References: <2CA3D151-9873-4BED-AD0F-922301CD7535@gmail.com> Message-ID: <1cd32cbb0911242244i32e88512x8f38b7a425d1f093@mail.gmail.com> On Wed, Nov 25, 2009 at 1:30 AM, Art Poon wrote: > Hello, > I'm trying to hammer out some quick simulation code and need to calculate a > bunch of inverse CDF values from the gamma distribution. ?SciPy seems like a > great resource for this. ?However, I've encountered some strangeness that is > probably my own fault: >>>> g = stats.gamma(1,0,1) >>>> g > >>>> g.ppf(0.1) > 0.10536051565782635 >>>> g.ppf(0.25) > 0.0 >>>> g.ppf(0.2500001) > 0.28768220578512327 >>>> g.ppf(0.2499999) > 0.28768193911845646 >>>> g.ppf(0.25) > 0.0 > I'm just dying to know where I've gone wrong here! 
?In the meantime, I'm > coding up a function to compute the inverse CDF from MATLAB code. > I'm using Snow Leopard with Python 2.5.4 (bypassing default system Python > 2.6.1), numpy-1.3.0 and scipy-0.7.1, both compiled from source. > Thanks! > - Art. this has been fixed in trunk, see http://projects.scipy.org/scipy/ticket/975 >>> stats.gamma.ppf(0.25, 1.,0.,1) 0.28768207245178096 >>> stats.gamma.ppf(0.2500001, 1.,0.,1) 0.28768220578512327 >>> stats.gamma.ppf(0.25 -0.00001, 1.,0.,1) 0.28766873920733566 >>> stats.gamma.ppf(0.25, 1,0,1) 0.28768207245178096 >>> stats.gamma(1,0,1).ppf(0.25) 0.28768207245178096 >>> import scipy >>> scipy.version.version '0.8.0.dev6118' Josef > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From josef.pktd at gmail.com Wed Nov 25 02:00:43 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 25 Nov 2009 02:00:43 -0500 Subject: [SciPy-User] gamma ppf weirdness In-Reply-To: <1cd32cbb0911242244i32e88512x8f38b7a425d1f093@mail.gmail.com> References: <2CA3D151-9873-4BED-AD0F-922301CD7535@gmail.com> <1cd32cbb0911242244i32e88512x8f38b7a425d1f093@mail.gmail.com> Message-ID: <1cd32cbb0911242300j8a438eqa864edfc62a0b20@mail.gmail.com> On Wed, Nov 25, 2009 at 1:44 AM, wrote: > On Wed, Nov 25, 2009 at 1:30 AM, Art Poon wrote: >> Hello, >> I'm trying to hammer out some quick simulation code and need to calculate a >> bunch of inverse CDF values from the gamma distribution. ?SciPy seems like a >> great resource for this. ?However, I've encountered some strangeness that is >> probably my own fault: >>>>> g = stats.gamma(1,0,1) >>>>> g >> >>>>> g.ppf(0.1) >> 0.10536051565782635 >>>>> g.ppf(0.25) >> 0.0 >>>>> g.ppf(0.2500001) >> 0.28768220578512327 >>>>> g.ppf(0.2499999) >> 0.28768193911845646 >>>>> g.ppf(0.25) >> 0.0 >> I'm just dying to know where I've gone wrong here! ?In the meantime, I'm >> coding up a function to compute the inverse CDF from MATLAB code. >> I'm using Snow Leopard with Python 2.5.4 (bypassing default system Python >> 2.6.1), numpy-1.3.0 and scipy-0.7.1, both compiled from source. >> Thanks! >> - Art. > > this has been fixed in trunk, see http://projects.scipy.org/scipy/ticket/975 > >>>> stats.gamma.ppf(0.25, 1.,0.,1) > 0.28768207245178096 >>>> stats.gamma.ppf(0.2500001, 1.,0.,1) > 0.28768220578512327 >>>> stats.gamma.ppf(0.25 -0.00001, 1.,0.,1) > 0.28766873920733566 >>>> stats.gamma.ppf(0.25, 1,0,1) > 0.28768207245178096 >>>> stats.gamma(1,0,1).ppf(0.25) > 0.28768207245178096 >>>> import scipy >>>> scipy.version.version > '0.8.0.dev6118' > > > Josef > >> scipy special also has the function for the gamma.isf (which is currently not used in stats.gamma) >>> special.gammainccinv(1, 1-0.25) 0.28768207245178096 You could check whether it is correct on 0.7.0, but I'm not sure whether special.gammaincinv(a,q) and special.gammainccinv(1, 1-0.25) are really independent implementation. 
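A sketch of the workaround on 0.7.1, following the pointer above (the loc/scale handling is just the usual scipy convention, and the comparison against the exponential quantile is only a sanity check for a == 1):

import numpy as np
from scipy import special

def gamma_ppf(q, a, loc=0.0, scale=1.0):
    # invert the regularized upper incomplete gamma function:
    # cdf(x) = q  <=>  gammaincc(a, x) = 1 - q
    return loc + scale * special.gammainccinv(a, 1.0 - q)

print gamma_ppf(0.25, 1.0)      # 0.28768207245178096
print -np.log(1.0 - 0.25)       # same value for a == 1 (exponential case)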
Josef >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > From artpoon at gmail.com Wed Nov 25 02:05:31 2009 From: artpoon at gmail.com (Art Poon) Date: Tue, 24 Nov 2009 23:05:31 -0800 Subject: [SciPy-User] gamma ppf weirdness In-Reply-To: <1cd32cbb0911242300j8a438eqa864edfc62a0b20@mail.gmail.com> References: <2CA3D151-9873-4BED-AD0F-922301CD7535@gmail.com> <1cd32cbb0911242244i32e88512x8f38b7a425d1f093@mail.gmail.com> <1cd32cbb0911242300j8a438eqa864edfc62a0b20@mail.gmail.com> Message-ID: Excellent, thanks very much. - Art. On 2009-11-24, at 11:00 PM, josef.pktd at gmail.com wrote: > On Wed, Nov 25, 2009 at 1:44 AM, wrote: >> On Wed, Nov 25, 2009 at 1:30 AM, Art Poon wrote: >>> Hello, >>> I'm trying to hammer out some quick simulation code and need to calculate a >>> bunch of inverse CDF values from the gamma distribution. SciPy seems like a >>> great resource for this. However, I've encountered some strangeness that is >>> probably my own fault: >>>>>> g = stats.gamma(1,0,1) >>>>>> g >>> >>>>>> g.ppf(0.1) >>> 0.10536051565782635 >>>>>> g.ppf(0.25) >>> 0.0 >>>>>> g.ppf(0.2500001) >>> 0.28768220578512327 >>>>>> g.ppf(0.2499999) >>> 0.28768193911845646 >>>>>> g.ppf(0.25) >>> 0.0 >>> I'm just dying to know where I've gone wrong here! In the meantime, I'm >>> coding up a function to compute the inverse CDF from MATLAB code. >>> I'm using Snow Leopard with Python 2.5.4 (bypassing default system Python >>> 2.6.1), numpy-1.3.0 and scipy-0.7.1, both compiled from source. >>> Thanks! >>> - Art. >> >> this has been fixed in trunk, see http://projects.scipy.org/scipy/ticket/975 >> >>>>> stats.gamma.ppf(0.25, 1.,0.,1) >> 0.28768207245178096 >>>>> stats.gamma.ppf(0.2500001, 1.,0.,1) >> 0.28768220578512327 >>>>> stats.gamma.ppf(0.25 -0.00001, 1.,0.,1) >> 0.28766873920733566 >>>>> stats.gamma.ppf(0.25, 1,0,1) >> 0.28768207245178096 >>>>> stats.gamma(1,0,1).ppf(0.25) >> 0.28768207245178096 >>>>> import scipy >>>>> scipy.version.version >> '0.8.0.dev6118' >> >> >> Josef >> >>> > > scipy special also has the function for the gamma.isf (which is > currently not used in stats.gamma) > >>>> special.gammainccinv(1, 1-0.25) > 0.28768207245178096 > > You could check whether it is correct on 0.7.0, but I'm not sure > whether special.gammaincinv(a,q) and special.gammainccinv(1, 1-0.25) > are really independent implementation. > > Josef > > >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >>> >>> >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From cool-rr at cool-rr.com Wed Nov 25 02:05:06 2009 From: cool-rr at cool-rr.com (Ram Rachum) Date: Wed, 25 Nov 2009 07:05:06 +0000 (UTC) Subject: [SciPy-User] Mean arrivals per time unit -> Time between consecutive arrivals Message-ID: Hello, I've just started using scipy/numpy for some queue theory. I have a queue for which the arrival rate is a Poisson distribution. I also have the mean number of arrivals per time unit. I looked around SciPy and I saw I can use scipy.stats.poisson. I was happy that it could make a random variable for number of arrivals per time unit. But I want the time between consecutive arrivals, as a random variable. Does anyone know how I can get that? Thanks, Ram. 
From josef.pktd at gmail.com Wed Nov 25 02:42:31 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 25 Nov 2009 02:42:31 -0500 Subject: [SciPy-User] Mean arrivals per time unit -> Time between consecutive arrivals In-Reply-To: References: Message-ID: <1cd32cbb0911242342x1aab227el3a615e067b9fca51@mail.gmail.com> On Wed, Nov 25, 2009 at 2:05 AM, Ram Rachum wrote: > Hello, > > I've just started using scipy/numpy for some queue theory. I have a queue for > which the arrival rate is a Poisson distribution. I also have the mean number of > arrivals per time unit. > > I looked around SciPy and I saw I can use scipy.stats.poisson. I was happy that > it could make a random variable for number of arrivals per time unit. But I want > the time between consecutive arrivals, as a random variable. > > Does anyone know how I can get that? I don't remember the relationship for the different random variables related to arrival processes (without looking it up again), but there is a related example in https://matplotlib.svn.sourceforge.net/svnroot/matplotlib/trunk/py4science/examples/stats_distributions.py Josef > > Thanks, > Ram. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Wed Nov 25 03:00:13 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 25 Nov 2009 03:00:13 -0500 Subject: [SciPy-User] Mean arrivals per time unit -> Time between consecutive arrivals In-Reply-To: <1cd32cbb0911242342x1aab227el3a615e067b9fca51@mail.gmail.com> References: <1cd32cbb0911242342x1aab227el3a615e067b9fca51@mail.gmail.com> Message-ID: <1cd32cbb0911250000o70eb0f27n886074fab0fab308@mail.gmail.com> On Wed, Nov 25, 2009 at 2:42 AM, wrote: > On Wed, Nov 25, 2009 at 2:05 AM, Ram Rachum wrote: >> Hello, >> >> I've just started using scipy/numpy for some queue theory. I have a queue for >> which the arrival rate is a Poisson distribution. I also have the mean number of >> arrivals per time unit. >> >> I looked around SciPy and I saw I can use scipy.stats.poisson. I was happy that >> it could make a random variable for number of arrivals per time unit. But I want >> the time between consecutive arrivals, as a random variable. >> >> Does anyone know how I can get that? > > I don't remember the relationship for the different random variables > related to arrival > processes (without looking it up again), but there is a related example in > > https://matplotlib.svn.sourceforge.net/svnroot/matplotlib/trunk/py4science/examples/stats_distributions.py http://en.wikipedia.org/wiki/Queueing_theory#Role_of_Poisson_process.2C_exponential_distributions mentions the exponential distribution for the time between arrivals Josef > > Josef > >> >> Thanks, >> Ram. 
>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > From lucadeluge at gmail.com Wed Nov 25 05:37:23 2009 From: lucadeluge at gmail.com (Luca Delucchi) Date: Wed, 25 Nov 2009 11:37:23 +0100 Subject: [SciPy-User] problem with optimize.curve_fit Message-ID: Hi everybody i try to use optimize.curve_fit but i have a error gis at srvcavit:~/meteo_python$ python prova_optimize.py Traceback (most recent call last): File "prova_optimize.py", line 23, in popt, pcov = curve_fit(func,x,old_y,2) File "/usr/lib/python2.5/site-packages/scipy/optimize/minpack.py", line 423, in curve_fit raise RuntimeError, "Optimal parameters not found: " + mesg RuntimeError: Optimal parameters not found: Both actual and predicted relative reductions in the sum of squares are at most 0.000000 and the relative error between two consecutive iterates is at most 0.000000 i see that is a bug [0], i try to modify the script with the solution proposed by kael but nothing change, here [1] you can find the script that i use, how can i solve my problem? thanks Luca [0] http://projects.scipy.org/scipy/ticket/984 [1] http://pastebin.com/m3c721c6f From cimrman3 at ntc.zcu.cz Wed Nov 25 06:00:03 2009 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Wed, 25 Nov 2009 12:00:03 +0100 Subject: [SciPy-User] [Fwd: Re: ANN: SfePy 2009.4] Message-ID: <4B0D0E33.4010703@ntc.zcu.cz> Hi Osman, thanks for trying out the new version! As isfepy works for me, I assume it must be a version issue with mayavi (tested with 3.3.0). What is your mayavi version? If you cannot try 3.3.0, or use it already, could you send me the output of 'gui.scene.print_traits()'? Just put it prior to the offending line... cheers, r. PS: I guess we should discuss this on sfepy-devel only... osman wrote: > On Tue, 2009-11-24 at 13:10 +0100, Robert Cimrman wrote: >> I am pleased to announce release 2009.4 of SfePy. > > Hi Robert, > Thanks for the new version. I put it on my 32 bit ubuntu jaunty. All > tests ran fine without any errors. Then I tried isfepy. I am getting an > error: > In [1]: pb, vec, data = pde_solve('input/poisson.py') > sfepy: left over: ['__builtins__', '__file__', '__name__', '_filename', > '__doc__', '__package__'] > sfepy: reading mesh (database/simple.mesh)... > sfepy: ...done in 0.04 s > sfepy: setting up domain edges... > sfepy: ...done in 0.01 s > sfepy: setting up domain faces... > sfepy: ...done in 0.01 s > sfepy: creating regions... > sfepy: Gamma_Right > sfepy: Omega > sfepy: Gamma_Left > sfepy: ...done in 0.04 s > sfepy: equation "Temperature": > sfepy: dw_laplace.i1.Omega( coef.val, s, t ) = 0 > sfepy: setting up dof connectivities... > sfepy: ...done in 0.00 s > sfepy: describing geometries... > sfepy: ...done in 0.00 s > sfepy: using solvers: > nls: newton > ls: ls > sfepy: matrix shape: (300, 300) > sfepy: assembling matrix graph... > sfepy: ...done in 0.00 s > sfepy: matrix structural nonzeros: 3538 (3.93e-02% fill) > sfepy: updating materials... > sfepy: coef > sfepy: ...done in 0.01 s > sfepy: updating variables... 
> sfepy: ...done > /usr/lib/python2.6/dist-packages/scipy/linsolve/__init__.py:4: > DeprecationWarning: scipy.linsolve has moved to > scipy.sparse.linalg.dsolve > warn('scipy.linsolve has moved to scipy.sparse.linalg.dsolve', > DeprecationWarning) > sfepy: nls: iter: 0, residual: 1.176265e-01 (rel: 1.000000e+00) > /usr/lib/python2.6/dist-packages/scipy/sparse/linalg/dsolve/linsolve.py:78: DeprecationWarning: scipy.sparse.linalg.dsolve.umfpack will be removed, install scikits.umfpack instead > ' install scikits.umfpack instead', DeprecationWarning ) > sfepy: rezidual: 0.00 [s] > sfepy: solve: 0.01 [s] > sfepy: matrix: 0.00 [s] > sfepy: nls: iter: 1, residual: 9.957055e-17 (rel: 8.464973e-16) > > In [2]: view = Viewer(pb.get_output_name()) > > In [3]: view() > --------------------------------------------------------------------------- > AttributeError Traceback (most recent call > last) > > /home/osman/sfepy-release-2009.4/sfepy/interactive/__init__.py in > () > ----> 1 > 2 > 3 > 4 > 5 > > /home/osman/sfepy-release-2009.4/sfepy/postprocess/viewer.py in > call_mlab(self, scene, show, is_3d, view, roll, layout, scalar_mode, > vector_mode, rel_scaling, clamping, ranges, is_scalar_bar, > rel_text_width, fig_filename, resolution, filter_names, only_names, > step, anti_aliasing) > 555 else: > 556 gui = ViewerGUI(viewer=self) > --> 557 scene = gui.scene.mayavi_scene > 558 > 559 if scene is not self.scene: > > AttributeError: 'MlabSceneModel' object has no attribute 'mayavi_scene' > > > 2009.3 release has no problem with isfepy. > > Best, > Osman From gael.varoquaux at normalesup.org Wed Nov 25 06:06:43 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 25 Nov 2009 12:06:43 +0100 Subject: [SciPy-User] [Fwd: Re: ANN: SfePy 2009.4] In-Reply-To: <4B0D0E33.4010703@ntc.zcu.cz> References: <4B0D0E33.4010703@ntc.zcu.cz> Message-ID: <20091125110643.GB21484@phare.normalesup.org> On Wed, Nov 25, 2009 at 12:00:03PM +0100, Robert Cimrman wrote: > > 555 else: > > 556 gui = ViewerGUI(viewer=self) > > --> 557 scene = gui.scene.mayavi_scene > > 558 > > 559 if scene is not self.scene: > > AttributeError: 'MlabSceneModel' object has no attribute 'mayavi_scene' Yes, mayavi_scene is new in 3.3.0. We realized that this functionnality we needed a bit late :) Ga?l From cimrman3 at ntc.zcu.cz Wed Nov 25 06:33:23 2009 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Wed, 25 Nov 2009 12:33:23 +0100 Subject: [SciPy-User] [Fwd: Re: ANN: SfePy 2009.4] In-Reply-To: <20091125110643.GB21484@phare.normalesup.org> References: <4B0D0E33.4010703@ntc.zcu.cz> <20091125110643.GB21484@phare.normalesup.org> Message-ID: <4B0D1603.40509@ntc.zcu.cz> Gael Varoquaux wrote: > On Wed, Nov 25, 2009 at 12:00:03PM +0100, Robert Cimrman wrote: >>> 555 else: >>> 556 gui = ViewerGUI(viewer=self) >>> --> 557 scene = gui.scene.mayavi_scene >>> 558 >>> 559 if scene is not self.scene: > >>> AttributeError: 'MlabSceneModel' object has no attribute 'mayavi_scene' > > Yes, mayavi_scene is new in 3.3.0. We realized that this functionnality > we needed a bit late :) > > Ga?l And I have started using it even later :) thanks for clarification! r. 
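Since mayavi_scene only exists from Mayavi 3.3.0 on, a small defensive helper (illustrative only, not part of sfepy) makes the failure explicit instead of an AttributeError deep inside the viewer:

def get_mayavi_scene(scene_model):
    # MlabSceneModel.mayavi_scene was added in Mayavi 3.3.0
    scene = getattr(scene_model, 'mayavi_scene', None)
    if scene is None:
        raise ImportError('Mayavi >= 3.3.0 is required '
                          '(MlabSceneModel.mayavi_scene not found)')
    return scene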
From almar.klein at gmail.com Wed Nov 25 07:16:18 2009 From: almar.klein at gmail.com (Almar Klein) Date: Wed, 25 Nov 2009 13:16:18 +0100 Subject: [SciPy-User] Dijkstra's algorithm on a lattice In-Reply-To: <9457e7c80911220307x28112983ud6e3b454b3b69911@mail.gmail.com> References: <9457e7c80911210638t1df8f21amc1d0616221b64467@mail.gmail.com> <9E69193F-E01E-48F5-8C81-4747C9AA0823@yale.edu> <99C27119-17D0-4EA4-A330-ECEEE71B7C2D@yale.edu> <4B088A5B.8000604@bigpond.net.au> <66E0FDD3-4822-4722-BB6E-C7F21A679D32@yale.edu> <9457e7c80911220300l29370151y655159ab835e6359@mail.gmail.com> <9457e7c80911220307x28112983ud6e3b454b3b69911@mail.gmail.com> Message-ID: Hi, I have an implementation of the Minimum Cost Path method too. It uses a binary heap to store the active pixels (I call it the front). Therefore it does not need to iterate over the the whole image at each iteration, but simply pops the pixel with the minimum cumulative cost from the heap. This significantly increase speed. Because I am still changing stuff to my MCP implementation, and I use it for my research, I am a bit reluctant to make it publicly available now. I'd be happy to share the binary heap implementation though. Looking at the posted code, I think it is incorrect. Each iteration, you should only check the neighbours of the pixel that has the minimum cumulative costs. That's why the binary heap is so important to get it fast. I short while ago I made a flash app with some examples. It also contains the pseudo code, although I use a slighly different terminology (cost=speed, cumulative cost=time): http://dl.dropbox.com/u/1463853/mcpExamples.swf Here's a snipet of how I use the binary heap: ===== from heap cimport BinaryHeapWithCrossRef front = BinaryHeapWithCrossRef(frozen_flat) value, ii= front.Pop() # to get the pixel of the front with the minimum cumulative cost front.Push(cumcost_flat[ii], ii) # to insert or update a value ===== I use flat arrays and scalar indices, so I need to store only one reference per pixel. This also makes the implementation work for 3D data (or even higher dimensions if you wish). frozen_flat is a flat array, the same size as the imput (cost or speed) image, that keeps track whether pixels are frozen (indicating they wont change). A pixel is frozen right after it is popped from the heap. I use the same array in the heap to be able to update values if a pixel is already in the heap. I hope this helps a bit. If not, feel free to ask. Almar 2009/11/22 St?fan van der Walt : > 2009/11/22 St?fan van der Walt : >> This code looks really handy, and I'd love to add it. ?Would you >> consider putting your code in a branch on github? > > Actually, don't worry -- I'll add it quickly. ?Thanks for the contribution! > > Cheers > St?fan > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- A non-text attachment was scrubbed... Name: heap.pxd Type: application/octet-stream Size: 1181 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: heap.pyx Type: application/octet-stream Size: 26347 bytes Desc: not available URL: From zachary.pincus at yale.edu Wed Nov 25 07:44:27 2009 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Wed, 25 Nov 2009 07:44:27 -0500 Subject: [SciPy-User] Dijkstra's algorithm on a lattice In-Reply-To: References: <9457e7c80911210638t1df8f21amc1d0616221b64467@mail.gmail.com> <9E69193F-E01E-48F5-8C81-4747C9AA0823@yale.edu> <99C27119-17D0-4EA4-A330-ECEEE71B7C2D@yale.edu> <4B088A5B.8000604@bigpond.net.au> <66E0FDD3-4822-4722-BB6E-C7F21A679D32@yale.edu> <9457e7c80911220300l29370151y655159ab835e6359@mail.gmail.com> <9457e7c80911220307x28112983ud6e3b454b3b69911@mail.gmail.com> Message-ID: <6F8D1FC6-656C-4860-8D40-E61A7996920A@yale.edu> Hi Almar, The binary heap code looks extremely useful in general -- thanks for making it available! Do you have any license you want it under? (BSD seems preferable if this is to be incorporated into a scikit, e.g.) It would be great if you would be interested in making your MCP code available too, even just as a base for others to hack on a bit (rather than as a finished contribution), but this is of course up to you. Otherwise I'll probably try to throw together something similar using the heap code. > Looking at the posted code, I think it is incorrect. Each iteration, > you should only check the neighbours of the pixel that has the > minimum cumulative costs. That's why the binary heap is so important > to get it fast. Incorrect means that the code might give a wrong result: is this the case? I *think* I had satisfied myself that the implementation (while suboptimal because it does extra work -- a lot in some cases!) would yield the correct path. (Note that the code doesn't terminate when the "end" pixel is first assigned a cost, but when no costs are changing anywhere. Basically, brute-force search instead of Dijkstra's algorithm. Again, while a lot more than necessary to just find the minimum cost to a single point, this condition should be sufficient to ensure that the minimum cost to *every* point in the array has been found, right? If my analysis is wrong, though, it wouldn't be the first time!) > I use flat arrays and scalar indices, so I need to store only one > reference per pixel. This also makes the implementation work for 3D > data (or even higher dimensions if you wish). Do you have code to take the flat index and the shape of the original array and return the indices to the neighboring pixels, or is there some other trick with that too? Anyhow, thanks for your suggestions and contribution! I look forward to making use of the heap. Best, Zach On Nov 25, 2009, at 7:16 AM, Almar Klein wrote: > Hi, > > I have an implementation of the Minimum Cost Path method too. It uses > a binary heap to store the > active pixels (I call it the front). Therefore it does not need to > iterate over the the whole image at > each iteration, but simply pops the pixel with the minimum cumulative > cost from the heap. This > significantly increase speed. > > Because I am still changing stuff to my MCP implementation, and I use > it for my research, I am a bit > reluctant to make it publicly available now. I'd be happy to share the > binary heap implementation though. > > Looking at the posted code, I think it is incorrect. Each iteration, > you should only check the neighbours > of the pixel that has the minimum cumulative costs. That's why the > binary heap is so important to get > it fast. > > I short while ago I made a flash app with some examples. 
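For readers following the thread: Almar's recipe above (pop the front pixel with the smallest cumulative cost, freeze it, push or update its neighbours) is essentially Dijkstra's algorithm on the pixel lattice. A rough pure-Python sketch using the standard-library heapq, for a 2D cost image and 4-connectivity, might look like this; it is not Almar's Cython code, and the function name and seed convention are invented:

import heapq
import numpy as np

def minimum_cumulative_cost(costs, seed):
    # cumulative minimum cost from `seed` (a (row, col) tuple) to every pixel
    cumcost = np.empty(costs.shape)
    cumcost.fill(np.inf)
    frozen = np.zeros(costs.shape, dtype=bool)
    cumcost[seed] = 0.0
    front = [(0.0, seed)]            # binary heap of (cumulative cost, pixel)
    while front:
        c, (i, j) = heapq.heappop(front)
        if frozen[i, j]:
            continue                 # stale entry, a cheaper path got there first
        frozen[i, j] = True          # its cumulative cost can no longer change
        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ni, nj = i + di, j + dj
            if (0 <= ni < costs.shape[0] and 0 <= nj < costs.shape[1]
                    and not frozen[ni, nj]):
                new = c + costs[ni, nj]
                if new < cumcost[ni, nj]:
                    cumcost[ni, nj] = new
                    heapq.heappush(front, (new, (ni, nj)))
    return cumcost

A dedicated heap with cross-references, like the one attached above, can update a pixel's value in place instead of re-pushing it, which avoids the stale entries that this sketch simply skips.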
It also > contains the pseudo code, although > I use a slighly different terminology (cost=speed, cumulative > cost=time): > http://dl.dropbox.com/u/1463853/mcpExamples.swf > > > Here's a snipet of how I use the binary heap: > ===== > from heap cimport BinaryHeapWithCrossRef > front = BinaryHeapWithCrossRef(frozen_flat) > value, ii= front.Pop() # to get the pixel of the front with the > minimum cumulative cost > front.Push(cumcost_flat[ii], ii) # to insert or update a value > ===== > I use flat arrays and scalar indices, so I need to store only one > reference per pixel. This also > makes the implementation work for 3D data (or even higher dimensions > if you wish). > frozen_flat is a flat array, the same size as the imput (cost or > speed) image, that keeps track > whether pixels are frozen (indicating they wont change). A pixel is > frozen right after it is popped > from the heap. I use the same array in the heap to be able to update > values if a pixel is already > in the heap. > > I hope this helps a bit. If not, feel free to ask. > > Almar > > > > 2009/11/22 St?fan van der Walt : >> 2009/11/22 St?fan van der Walt : >>> This code looks really handy, and I'd love to add it. Would you >>> consider putting your code in a branch on github? >> >> Actually, don't worry -- I'll add it quickly. Thanks for the >> contribution! >> >> Cheers >> St?fan >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From almar.klein at gmail.com Wed Nov 25 08:06:57 2009 From: almar.klein at gmail.com (Almar Klein) Date: Wed, 25 Nov 2009 14:06:57 +0100 Subject: [SciPy-User] Dijkstra's algorithm on a lattice In-Reply-To: <6F8D1FC6-656C-4860-8D40-E61A7996920A@yale.edu> References: <9457e7c80911210638t1df8f21amc1d0616221b64467@mail.gmail.com> <9E69193F-E01E-48F5-8C81-4747C9AA0823@yale.edu> <99C27119-17D0-4EA4-A330-ECEEE71B7C2D@yale.edu> <4B088A5B.8000604@bigpond.net.au> <66E0FDD3-4822-4722-BB6E-C7F21A679D32@yale.edu> <9457e7c80911220300l29370151y655159ab835e6359@mail.gmail.com> <9457e7c80911220307x28112983ud6e3b454b3b69911@mail.gmail.com> <6F8D1FC6-656C-4860-8D40-E61A7996920A@yale.edu> Message-ID: Hi Zach, > The binary heap code looks extremely useful in general -- thanks for > making it available! Do you have any license you want it under? (BSD > seems preferable if this is to be incorporated into a scikit, e.g.) BSD's fine :) > It would be great if you would be interested in making your MCP code > available too, even just as a base for others to hack on a bit (rather > than as a finished contribution), but this is of course up to you. > Otherwise I'll probably try to throw together something similar using > the heap code. I'll send you my current implementation, you should be able to distill something usefull from that. The problem is that 1) I make use of another module of mine to deal with anisotropic data (because my data is anisotropic) 2) I use the MCP method in a specific way, therefore I needed to make the implementation more flexible. This makes it less easy to use for other people, and thus less handy to include in any toolkit as-is. >> Looking at the posted code, I think it is incorrect. Each iteration, >> you should only check the neighbours of the pixel that has the >> minimum cumulative costs. 
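Zach's question a little further up, about going from a flat index and the array shape back to the flat indices of the neighbouring pixels, comes up again just below. For a C-ordered array it is plain stride arithmetic and works in any number of dimensions; the helper here is only a guess at the kind of bookkeeping Almar means, not his actual code:

import numpy as np

def flat_neighbors(index, shape):
    # coordinates of the pixel, then the flat-index step along each axis
    coords = np.unravel_index(index, shape)
    steps = [int(np.prod(shape[k + 1:])) for k in range(len(shape))]
    neighbors = []
    for axis, step in enumerate(steps):
        if coords[axis] > 0:                    # boundary checks stop wrap-around
            neighbors.append(index - step)
        if coords[axis] < shape[axis] - 1:
            neighbors.append(index + step)
    return neighbors

# e.g. flat_neighbors(5, (3, 4)) gives [1, 9, 4, 6], the axis neighbours of pixel (1, 1)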
That's why the binary heap is so important >> to get it fast. > > Incorrect means that the code might give a wrong result: is this the > case? I *think* I had satisfied myself that the implementation (while > suboptimal because it does extra work -- a lot in some cases!) would > yield the correct path. (Note that the code doesn't terminate when the > "end" pixel is first assigned a cost, but when no costs are changing > anywhere. Basically, brute-force search instead of Dijkstra's > algorithm. Again, while a lot more than necessary to just find the > minimum cost to a single point, this condition should be sufficient to > ensure that the minimum cost to *every* point in the array has been > found, right? If my analysis is wrong, though, it wouldn't be the > first time!) I really mean wrong, sorry. You now select any pixel that is active (meaning an arbitrary pixel in the front), and from it calculate the cumulative cost for its neighbours. However, it might be that the cumulative cost of this pixel is changed later. Therefore you must take the active pixel with the lowest cumulative cost; so you know it won't be changed. >> I use flat arrays and scalar indices, so I need to store only one >> reference per pixel. This also makes the implementation work for 3D >> data (or even higher dimensions if you wish). > > > Do you have code to take the flat index and the shape of the original > array and return the indices to the neighboring pixels, or is there > some other trick with that too? Yes, it's in the code I'll send you. Cheers, Almar From zachary.pincus at yale.edu Wed Nov 25 08:50:00 2009 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Wed, 25 Nov 2009 08:50:00 -0500 Subject: [SciPy-User] Dijkstra's algorithm on a lattice In-Reply-To: References: <9457e7c80911210638t1df8f21amc1d0616221b64467@mail.gmail.com> <9E69193F-E01E-48F5-8C81-4747C9AA0823@yale.edu> <99C27119-17D0-4EA4-A330-ECEEE71B7C2D@yale.edu> <4B088A5B.8000604@bigpond.net.au> <66E0FDD3-4822-4722-BB6E-C7F21A679D32@yale.edu> <9457e7c80911220300l29370151y655159ab835e6359@mail.gmail.com> <9457e7c80911220307x28112983ud6e3b454b3b69911@mail.gmail.com> <6F8D1FC6-656C-4860-8D40-E61A7996920A@yale.edu> Message-ID: > I'll send you my current implementation, you should be able to > distill something > usefull from that. The problem is that 1) I make use of another > module of > mine to deal with anisotropic data (because my data is anisotropic) > 2) I use > the MCP method in a specific way, therefore I needed to make the > implementation more flexible. This makes it less easy to use for > other people, > and thus less handy to include in any toolkit as-is. Thanks -- I'll see what I can distill! >>> Looking at the posted code, I think it is incorrect. Each iteration, >>> you should only check the neighbours of the pixel that has the >>> minimum cumulative costs. That's why the binary heap is so important >>> to get it fast. >> >> Incorrect means that the code might give a wrong result: is this the >> case? I *think* I had satisfied myself that the implementation (while >> suboptimal because it does extra work -- a lot in some cases!) would >> yield the correct path. (Note that the code doesn't terminate when >> the >> "end" pixel is first assigned a cost, but when no costs are changing >> anywhere. Basically, brute-force search instead of Dijkstra's >> algorithm. 
Again, while a lot more than necessary to just find the >> minimum cost to a single point, this condition should be sufficient >> to >> ensure that the minimum cost to *every* point in the array has been >> found, right? If my analysis is wrong, though, it wouldn't be the >> first time!) > > I really mean wrong, sorry. You now select any pixel that is active > (meaning > an arbitrary pixel in the front), and from it calculate the > cumulative cost > for its neighbours. However, it might be that the cumulative cost of > this pixel > is changed later. Therefore you must take the active pixel with the > lowest > cumulative cost; so you know it won't be changed. Each time a pixel's cumulative cost decreases, I put it back into the "active" set (i.e. the front), so then the neighbors get re-examined the next iteration, etc. This should suffice, right? Or am I *still* missing something? Not that it really matters, because the approach is rather inefficient for anything except finding the minimum cost to every single array element (and even then I'm not certain this is better). But I am curious if I just conceived of the whole problem wrong. In which case perhaps I'm not the guy you want implementing this for the scikit! Zach From almar.klein at gmail.com Wed Nov 25 09:05:02 2009 From: almar.klein at gmail.com (Almar Klein) Date: Wed, 25 Nov 2009 15:05:02 +0100 Subject: [SciPy-User] Dijkstra's algorithm on a lattice In-Reply-To: References: <99C27119-17D0-4EA4-A330-ECEEE71B7C2D@yale.edu> <4B088A5B.8000604@bigpond.net.au> <66E0FDD3-4822-4722-BB6E-C7F21A679D32@yale.edu> <9457e7c80911220300l29370151y655159ab835e6359@mail.gmail.com> <9457e7c80911220307x28112983ud6e3b454b3b69911@mail.gmail.com> <6F8D1FC6-656C-4860-8D40-E61A7996920A@yale.edu> Message-ID: >>>> Looking at the posted code, I think it is incorrect. Each iteration, >>>> you should only check the neighbours of the pixel that has the >>>> minimum cumulative costs. That's why the binary heap is so important >>>> to get it fast. >>> >>> Incorrect means that the code might give a wrong result: is this the >>> case? I *think* I had satisfied myself that the implementation (while >>> suboptimal because it does extra work -- a lot in some cases!) would >>> yield the correct path. (Note that the code doesn't terminate when >>> the >>> "end" pixel is first assigned a cost, but when no costs are changing >>> anywhere. Basically, brute-force search instead of Dijkstra's >>> algorithm. Again, while a lot more than necessary to just find the >>> minimum cost to a single point, this condition should be sufficient >>> to >>> ensure that the minimum cost to *every* point in the array has been >>> found, right? If my analysis is wrong, though, it wouldn't be the >>> first time!) >> >> I really mean wrong, sorry. You now select any pixel that is active >> (meaning >> an arbitrary pixel in the front), and from it calculate the >> cumulative cost >> for its neighbours. However, it might be that the cumulative cost of >> this pixel >> is changed later. Therefore you must take the active pixel with the >> lowest >> cumulative cost; so you know it won't be changed. > > Each time a pixel's cumulative cost decreases, I put it back into the > "active" set (i.e. the front), so then the neighbors get re-examined > the next iteration, etc. This should suffice, right? Or am I *still* > missing something? 
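As an aside, the scheme Zach describes (put a pixel back on the active list whenever its cumulative cost drops, stop when nothing changes any more) is the classic label-correcting variant of the same idea. A sketch of that reading, again with invented names and not anyone's posted code, gives the same cumulative costs as the heap version but may re-examine a pixel many times, which is exactly the trade-off being discussed here:

import numpy as np
from collections import deque

def minimum_cumulative_cost_correcting(costs, seed):
    cumcost = np.empty(costs.shape)
    cumcost.fill(np.inf)
    cumcost[seed] = 0.0
    active = deque([seed])                 # the "front", in no particular order
    while active:
        i, j = active.popleft()
        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < costs.shape[0] and 0 <= nj < costs.shape[1]:
                new = cumcost[i, j] + costs[ni, nj]
                if new < cumcost[ni, nj]:  # improved, so this neighbour must be revisited
                    cumcost[ni, nj] = new
                    active.append((ni, nj))
    return cumcost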
Not that it really matters, because the approach is > rather inefficient for anything except finding the minimum cost to > every single array element (and even then I'm not certain this is > better). But I am curious if I just conceived of the whole problem > wrong. In which case perhaps I'm not the guy you want implementing > this for the scikit! Ah, now I see. I'm sorry. Yes, your code should produce the correct result, although it will probably evaluate a lot of pixels more than once :) Almar From oliphant at enthought.com Wed Nov 25 09:48:09 2009 From: oliphant at enthought.com (Travis Oliphant) Date: Wed, 25 Nov 2009 08:48:09 -0600 Subject: [SciPy-User] sinc interpolation Message-ID: On Nov 20, 2009, at 2:26 PM, David Trem wrote: > Hello, > > Is sinc interpolation available in Scipy ? Yes, use scipy.signal.resample which uses a Fourier method to downsample or upsample a signal: from scipy.signal import resample from numpy import r_, sin from pylab import plot x = r_[0:10] y = sin(x) yy = resample(x, 100) # This is a bit tricky to get the x-samples right xx = r_[0:10:101j][:-1] plot(x,y,'ro', xx, yy) -Travis From cournape at gmail.com Wed Nov 25 09:55:57 2009 From: cournape at gmail.com (David Cournapeau) Date: Wed, 25 Nov 2009 23:55:57 +0900 Subject: [SciPy-User] sinc interpolation In-Reply-To: <1258807340.2525.0.camel@PCTerrusse> References: <4B06FB5E.8070806@gmail.com> <1258807340.2525.0.camel@PCTerrusse> Message-ID: <5b8d13220911250655p7c7ad73bqc4ea08b79bfa5722@mail.gmail.com> On Sat, Nov 21, 2009 at 9:42 PM, Fabricio Silva wrote: > Le vendredi 20 novembre 2009 ? 21:26 +0100, David Trem a ?crit : >> Hello, >> >> Is sinc interpolation available in Scipy ? > > David Cournapeau has a scikit for that : > http://pypi.python.org/pypi/scikits.samplerate/ It is mostly useful for audio signals, though, and limited to 1d signals. A more general sinc-based interpolation scheme would be nice for scipy.signal. David From vanforeest at gmail.com Wed Nov 25 10:54:28 2009 From: vanforeest at gmail.com (nicky van foreest) Date: Wed, 25 Nov 2009 16:54:28 +0100 Subject: [SciPy-User] Mean arrivals per time unit -> Time between consecutive arrivals In-Reply-To: References: Message-ID: Hi Ram, YOu should take the interarrival time between two consecutive arrivals to be exponentially distributed with rate lambda, where lambda is the arrival rate. LIke this the number of arrivals in a fixed period is Poisson distributed. I never tried, but I suppose scipy contains a module to generate exponentially distributed rv's. Nicky 2009/11/25 Ram Rachum : > Hello, > > I've just started using scipy/numpy for some queue theory. I have a queue for > which the arrival rate is a Poisson distribution. I also have the mean number of > arrivals per time unit. > > I looked around SciPy and I saw I can use scipy.stats.poisson. I was happy that > it could make a random variable for number of arrivals per time unit. But I want > the time between consecutive arrivals, as a random variable. > > Does anyone know how I can get that? > > Thanks, > Ram. 
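A small note on the resample example a few messages up: yy = resample(x, 100) looks like a typo for resample(y, 100), since it is the signal y = sin(x) that is being sinc-interpolated. A self-contained version of the same idea, keeping the convention used there for the new sample positions:

import numpy as np
from scipy.signal import resample

x = np.arange(10.0)
y = np.sin(x)

yy = resample(y, 100)          # Fourier-domain (sinc) interpolation of y
xx = np.r_[0:10:101j][:-1]     # positions of the 100 new samples

Plotting x, y, 'ro' against xx, yy as in the original message then overlays the coarse samples and the resampled curve.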
> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From dalloliogm at gmail.com Wed Nov 25 11:16:39 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Wed, 25 Nov 2009 17:16:39 +0100 Subject: [SciPy-User] sinc interpolation In-Reply-To: References: Message-ID: <5aa3b3570911250816t38ce53bejd693c7723d2f7924@mail.gmail.com> On Wed, Nov 25, 2009 at 3:48 PM, Travis Oliphant wrote: > > > from scipy.signal import resample > from numpy import r_, sin > from pylab import plot > > x = r_[0:10] > y = sin(x) > yy = resample(x, 100) > > # This is a bit tricky to get the x-samples right > xx = r_[0:10:101j][:-1] > just a question, why don't you use numpy.linspace(0, 10, 101) ? >>> n = numpy.linspace(0, 10, 101)[:-1] array([ 0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. , 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3. , 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 4. , 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, 5. , 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.8, 5.9, 6. , 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9, 7. , 7.1, 7.2, 7.3, 7.4, 7.5, 7.6, 7.7, 7.8, 7.9, 8. , 8.1, 8.2, 8.3, 8.4, 8.5, 8.6, 8.7, 8.8, 8.9, 9. , 9.1, 9.2, 9.3, 9.4, 9.5, 9.6, 9.7, 9.8, 9.9, 10. ]) >>> n == r_[0:10:101j][:-1] [True.....] -- Giovanni Dall'Olio, phd student Department of Biologia Evolutiva at CEXS-UPF (Barcelona, Spain) My blog on bioinformatics: http://bioinfoblog.it -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Nov 25 11:20:09 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 25 Nov 2009 11:20:09 -0500 Subject: [SciPy-User] Mean arrivals per time unit -> Time between consecutive arrivals In-Reply-To: References: Message-ID: <1cd32cbb0911250820p4b662f3eyab200cfef6c1f68b@mail.gmail.com> On Wed, Nov 25, 2009 at 10:54 AM, nicky van foreest wrote: > Hi Ram, > > YOu should take the interarrival time between two consecutive arrivals > to be exponentially distributed with rate lambda, where lambda is the > arrival rate. LIke this the number of arrivals in a fixed period is > Poisson distributed. I never tried, but I suppose scipy contains a > module to generate exponentially distributed rv's. The sum of iid exponential distributed rvs is gamma distributed http://en.wikipedia.org/wiki/Gamma_distribution all available in scipy.stats Josef > > Nicky > > 2009/11/25 Ram Rachum : >> Hello, >> >> I've just started using scipy/numpy for some queue theory. I have a queue for >> which the arrival rate is a Poisson distribution. I also have the mean number of >> arrivals per time unit. >> >> I looked around SciPy and I saw I can use scipy.stats.poisson. I was happy that >> it could make a random variable for number of arrivals per time unit. But I want >> the time between consecutive arrivals, as a random variable. >> >> Does anyone know how I can get that? >> >> Thanks, >> Ram. 
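To make Nicky's and Josef's answers concrete: with a Poisson arrival rate lam, the interarrival times are exponential with scale 1/lam, and the waiting time until the k-th arrival is gamma distributed with shape k. A small sketch with scipy.stats (the variable names are invented):

import numpy as np
from scipy import stats

lam = 3.0                                   # mean arrivals per time unit
gaps = stats.expon.rvs(scale=1.0 / lam, size=100000)
arrivals = np.cumsum(gaps)                  # arrival epochs of the process

# counts per unit-length window come out Poisson(lam): mean ~ variance ~ lam
counts, edges = np.histogram(arrivals, bins=np.arange(0, int(arrivals[-1]) + 1))

# waiting time until every 5th arrival: gamma with shape 5 and scale 1/lam
waits = gaps.reshape(-1, 5).sum(axis=1)
# waits.mean() should be close to stats.gamma(5, scale=1.0 / lam).mean(), i.e. 5/lam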
>> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From d.l.goldsmith at gmail.com Wed Nov 25 14:53:19 2009 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Wed, 25 Nov 2009 11:53:19 -0800 Subject: [SciPy-User] [OT] Transform (i.e., Fourier, Laplace, etc.) methods in Prob. & Stats. Message-ID: <45d1ab480911251153i5e70b5fbp93cdd8f9f2b745ec@mail.gmail.com> Are there enough applications of transform methods (by which I mean, Fourier, Laplace, Z, etc.) in probability & statistics for this to be considered its own specialty therein? Any text recommendations on it (even if it's only a chapter dedicated to it)? Thanks, DG -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Nov 25 15:19:54 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 25 Nov 2009 15:19:54 -0500 Subject: [SciPy-User] [OT] Transform (i.e., Fourier, Laplace, etc.) methods in Prob. & Stats. In-Reply-To: <45d1ab480911251153i5e70b5fbp93cdd8f9f2b745ec@mail.gmail.com> References: <45d1ab480911251153i5e70b5fbp93cdd8f9f2b745ec@mail.gmail.com> Message-ID: <1cd32cbb0911251219q6bcdd086ycf33312b708d6ce8@mail.gmail.com> On Wed, Nov 25, 2009 at 2:53 PM, David Goldsmith wrote: > Are there enough applications of transform methods (by which I mean, > Fourier, Laplace, Z, etc.) in probability & statistics for this to be > considered its own specialty therein?? Any text recommendations on it (even > if it's only a chapter dedicated to it)?? Thanks, > Some information is in the thread on my recent question "characteristic functions of probability distributions" There is a large literature in econometrics and statistics about using the characteristic function for estimation and testing. The reference of Nicky for queuing theory uses mostly the Laplace transform (for discrete distributions), while for continuous distributions and mixtures the continuous fourier transform is used (definition of characteristic function). I started to work my way through part of the literature with application in finance. Main use I looked at was using the inverse Fourier transform when the characteristic function has an analytical expression and the pdf does not, e.g used for estimating difffusion processes by MLE. I haven't looked much at the Laplace transform, because I'm more interested in the continuous random variable case. Related methods work directly with the empirical characteristic function to do estimation and testing, but I haven't looked much at that yet. I looked at references from all over the place, essentially with google searches and searches of the main stats journal collections. (I have a unsorted collection of pdfs on my computer but no overview about what I actually read.) Of course the biggest and oldest use of the Fourier transform is the frequency domain analysis in time series analysis. It's not off topic because I try to get some of these methods programmed in python. 
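Since the characteristic function comes up repeatedly in this thread: the empirical characteristic function Josef mentions is just a sample average of exp(1j*t*x), so a minimal illustration is only a few lines (the standard normal is used purely as a test case because its exact characteristic function is known):

import numpy as np
from scipy import stats

x = stats.norm.rvs(size=1000)         # any sample would do
t = np.linspace(-5, 5, 201)

# empirical characteristic function: the sample mean of exp(i*t*x) at each t
ecf = np.exp(1j * t[:, None] * x[None, :]).mean(axis=1)

# exact characteristic function of the standard normal, for comparison
exact = np.exp(-t ** 2 / 2.0)
# abs(ecf - exact) stays small, shrinking roughly like 1/sqrt(len(x))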
Josef > DG > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From vanforeest at gmail.com Wed Nov 25 17:41:43 2009 From: vanforeest at gmail.com (nicky van foreest) Date: Wed, 25 Nov 2009 23:41:43 +0100 Subject: [SciPy-User] [OT] Transform (i.e., Fourier, Laplace, etc.) methods in Prob. & Stats. In-Reply-To: <1cd32cbb0911251219q6bcdd086ycf33312b708d6ce8@mail.gmail.com> References: <45d1ab480911251153i5e70b5fbp93cdd8f9f2b745ec@mail.gmail.com> <1cd32cbb0911251219q6bcdd086ycf33312b708d6ce8@mail.gmail.com> Message-ID: Hi, 2009/11/25 : > On Wed, Nov 25, 2009 at 2:53 PM, David Goldsmith > wrote: >> Are there enough applications of transform methods (by which I mean, >> Fourier, Laplace, Z, etc.) in probability & statistics for this to be >> considered its own specialty therein?? Any text recommendations on it (even >> if it's only a chapter dedicated to it)?? Thanks, >> > > Some information is in the thread on my recent question > "characteristic functions of probability distributions" > > There is a large literature in econometrics and statistics about using > the characteristic function for estimation and testing. > The reference of Nicky for queuing theory uses mostly the Laplace > transform (for discrete distributions), It has been some time ago (more than 5 years...), but I recall that Whitt, in his articles on the numerical inversion of Laplace transforms, discretized Laplace transforms to facilitate the inversion, The distributions themselves are not necessarily discrete. One example would be the waiting time distribution of customers in a queue, which is continuous for most service and arrival processes. There is certainly potential for dedicated numerical inversion algo's for the Laplace transforms of density and distribution functions. The latter form a somewhat specialized sort of function. Distribution functions are 0 at -\infty, and 1 at \infty, and are non decreasing. They may also have discontinuities, but not too many. These properties may affect the inversion. Besides these properties, the transforms are used to obtain insight into the behavior of the sum of independent random variables. Such sums can be rewritten as the product of the transforms of distribution. This product in turn requires inversion to, as some people call it, take away the Laplacian curtain. Nicky From mjtemkin at gmail.com Wed Nov 25 18:13:00 2009 From: mjtemkin at gmail.com (Michael Temkin) Date: Wed, 25 Nov 2009 15:13:00 -0800 Subject: [SciPy-User] scipy.linalg import issues on Mac OS X Snow Leopard Message-ID: <79a789c20911251513l228219a2mc80f33f51361e61a@mail.gmail.com> I've been having numerous issues getting scipy to work on Mac OX 10.6. I finally got it to build using the instructions from http://blog.hyperjeff.net/?p=160. For some reason the scipy superpack won't work on my machine (and neither would the macports release) so building from source was the best option. Even though the build was finally successful, I am still unable to use the library. 
The error message I am getting now is: from scipy.linalg import norm, inv File "/Library/Python/2.6/site-packages/scipy/linalg/__init__.py", line 8, in from basic import * File "/Library/Python/2.6/site-packages/scipy/linalg/basic.py", line 17, in from lapack import get_lapack_funcs File "/Library/Python/2.6/site-packages/scipy/linalg/lapack.py", line 17, in from scipy.linalg import flapack ImportError: dlopen(/Library/Python/2.6/site-packages/scipy/linalg/flapack.so, 2): Symbol not found: _f2pywrapdlamch_ Referenced from: /Library/Python/2.6/site-packages/scipy/linalg/flapack.so Expected in: dynamic lookup Everything is where it should be, and as far as I know I am not doing anything non-standard. I'm using python 2.6.4, all the libraries were built with no obvious issues. Does anyone have any ideas as to what could cause this issue? thanks. From josef.pktd at gmail.com Wed Nov 25 18:23:26 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 25 Nov 2009 18:23:26 -0500 Subject: [SciPy-User] [OT] Transform (i.e., Fourier, Laplace, etc.) methods in Prob. & Stats. In-Reply-To: References: <45d1ab480911251153i5e70b5fbp93cdd8f9f2b745ec@mail.gmail.com> <1cd32cbb0911251219q6bcdd086ycf33312b708d6ce8@mail.gmail.com> Message-ID: <1cd32cbb0911251523k460e67cdxa8b96341c1826144@mail.gmail.com> On Wed, Nov 25, 2009 at 5:41 PM, nicky van foreest wrote: > Hi, > > 2009/11/25 ?: >> On Wed, Nov 25, 2009 at 2:53 PM, David Goldsmith >> wrote: >>> Are there enough applications of transform methods (by which I mean, >>> Fourier, Laplace, Z, etc.) in probability & statistics for this to be >>> considered its own specialty therein?? Any text recommendations on it (even >>> if it's only a chapter dedicated to it)?? Thanks, >>> >> >> Some information is in the thread on my recent question >> "characteristic functions of probability distributions" >> >> There is a large literature in econometrics and statistics about using >> the characteristic function for estimation and testing. >> The reference of Nicky for queuing theory uses mostly the Laplace >> transform (for discrete distributions), > > It has been some time ago (more than 5 years...), but I recall that > Whitt, in his articles on the numerical inversion of Laplace > transforms, discretized Laplace transforms to facilitate the > inversion, The distributions themselves are not necessarily discrete. > One example would be the waiting time distribution of customers in a > queue, which is continuous for most service and arrival processes. > > There is certainly potential for dedicated numerical inversion algo's > for the Laplace transforms of density and distribution functions. The > latter form a somewhat specialized sort of function. Distribution > functions are 0 at -\infty, and 1 at \infty, and are non decreasing. > They may also have discontinuities, but not too many. These properties > may affect the inversion. ?Besides these properties, the transforms > are used to obtain insight into the behavior of the sum of independent > random variables. Such sums can be rewritten as the product of the > transforms of distribution. This product in turn requires inversion > to, as some people call it, take away the Laplacian curtain. Is there an advantage to using Laplace instead of Fourier transform in this context? I had to stop working on this, because I have to finish up some other projects. 
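For the kind of numerical inversion being discussed in this thread (and in the Waller et al. reference cited just below), the real form of the Fourier inversion formula, pdf(x) = (1/pi) * integral from 0 to infinity of Re[exp(-i*t*x) * phi(t)] dt, drops straight into scipy.integrate.quad. A sketch, using the standard normal as a test case since its characteristic function is known in closed form:

import numpy as np
from scipy import integrate

def cf_normal(t):
    # characteristic function of the standard normal, E[exp(i*t*X)]
    return np.exp(-0.5 * t ** 2)

def pdf_from_cf(x, cf):
    # pointwise inversion; uses phi(-t) == conj(phi(t)) for a real-valued rv
    integrand = lambda t: (np.exp(-1j * t * x) * cf(t)).real
    value, err = integrate.quad(integrand, 0, np.inf)
    return value / np.pi

# pdf_from_cf(0.0, cf_normal) comes out close to 1/sqrt(2*pi) ~ 0.3989

An fft-based route (evaluating phi on a regular grid and doing this integral for all x at once) is discussed just below; setting up the grids correctly takes more care than fits in an aside like this.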
The advantages that I saw for the Fourier transform are that it has directly the interpretation as characteristic function with explicit formulas for many distributions, e.g. stable distribution which has no analytical expression for pdf or cdf, and the availability of fft to do fast inversion instead of pointwise integration. Except reading the definition of the Laplace transform, I don't know much about it and have no idea what the numerical advantages might be. Another application, besides the sum of rvs, that I looked at, are mixture distributions, e.g. Poisson mixture of continuous (lognormal) distributions, which are also easy to calculate in terms of the characteristic function, and I guess the Laplace transform. This is an older reference that is cited quite a bit: Waller, Lance A., Bruce W. Turnbull, and J. Michael Hardin. ?Obtaining Distribution Functions by Numerical Inversion of Characteristic Functions with Applications.? The American Statistician 49, no. 4 (November 1995): 346-350. Josef > > Nicky > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From d.l.goldsmith at gmail.com Wed Nov 25 18:24:37 2009 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Wed, 25 Nov 2009 15:24:37 -0800 Subject: [SciPy-User] [OT] Transform (i.e., Fourier, Laplace, etc.) methods in Prob. & Stats. In-Reply-To: References: <45d1ab480911251153i5e70b5fbp93cdd8f9f2b745ec@mail.gmail.com> <1cd32cbb0911251219q6bcdd086ycf33312b708d6ce8@mail.gmail.com> Message-ID: <45d1ab480911251524k6078e601nb71f97840e2c3ecb@mail.gmail.com> Good info, thanks; I'll look up "your" thread, Josef, on the archive and run down what look like relevant references. (FWIW, my interest is that I'm helping out (nominally, "tutoring," but this level, it's more akin to being a sounding board, checking his derivations, and "reminding" him of various subtleties that are emphasized in math, but not necessarily in EE, etc.) this guy working on his dissertation on air traffic control automation using wireless communication protocols, very probability heavy stuff, and for the second time yesterday, he presented me with a transform application - in this instance, the "Z" transform - in this probability-heavy stuff, and this is outside of my training in probability, so I want to "bone-up.") Thanks again, DG On Wed, Nov 25, 2009 at 2:41 PM, nicky van foreest wrote: > Hi, > > 2009/11/25 : > > On Wed, Nov 25, 2009 at 2:53 PM, David Goldsmith > > wrote: > >> Are there enough applications of transform methods (by which I mean, > >> Fourier, Laplace, Z, etc.) in probability & statistics for this to be > >> considered its own specialty therein? Any text recommendations on it > (even > >> if it's only a chapter dedicated to it)? Thanks, > >> > > > > Some information is in the thread on my recent question > > "characteristic functions of probability distributions" > > > > There is a large literature in econometrics and statistics about using > > the characteristic function for estimation and testing. > > The reference of Nicky for queuing theory uses mostly the Laplace > > transform (for discrete distributions), > > It has been some time ago (more than 5 years...), but I recall that > Whitt, in his articles on the numerical inversion of Laplace > transforms, discretized Laplace transforms to facilitate the > inversion, The distributions themselves are not necessarily discrete. 
> One example would be the waiting time distribution of customers in a > queue, which is continuous for most service and arrival processes. > > There is certainly potential for dedicated numerical inversion algo's > for the Laplace transforms of density and distribution functions. The > latter form a somewhat specialized sort of function. Distribution > functions are 0 at -\infty, and 1 at \infty, and are non decreasing. > They may also have discontinuities, but not too many. These properties > may affect the inversion. Besides these properties, the transforms > are used to obtain insight into the behavior of the sum of independent > random variables. Such sums can be rewritten as the product of the > transforms of distribution. This product in turn requires inversion > to, as some people call it, take away the Laplacian curtain. > > Nicky > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.l.goldsmith at gmail.com Wed Nov 25 18:27:57 2009 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Wed, 25 Nov 2009 15:27:57 -0800 Subject: [SciPy-User] [OT] Transform (i.e., Fourier, Laplace, etc.) methods in Prob. & Stats. In-Reply-To: <1cd32cbb0911251523k460e67cdxa8b96341c1826144@mail.gmail.com> References: <45d1ab480911251153i5e70b5fbp93cdd8f9f2b745ec@mail.gmail.com> <1cd32cbb0911251219q6bcdd086ycf33312b708d6ce8@mail.gmail.com> <1cd32cbb0911251523k460e67cdxa8b96341c1826144@mail.gmail.com> Message-ID: <45d1ab480911251527u473b5503i5a7856ba47e3fccf@mail.gmail.com> On Wed, Nov 25, 2009 at 3:23 PM, wrote: > On Wed, Nov 25, 2009 at 5:41 PM, nicky van foreest > wrote: > > Hi, > >This is an older reference that is cited quite a bit: > Waller, Lance A., Bruce W. Turnbull, and J. Michael Hardin. ?Obtaining > Distribution Functions by Numerical Inversion of Characteristic > Functions with Applications.? The American Statistician 49, no. 4 > (November 1995): 346-350. > Great, thanks! DG > > Josef > > > > > Nicky > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Nov 25 18:45:38 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 25 Nov 2009 18:45:38 -0500 Subject: [SciPy-User] [OT] Transform (i.e., Fourier, Laplace, etc.) methods in Prob. & Stats. In-Reply-To: <45d1ab480911251524k6078e601nb71f97840e2c3ecb@mail.gmail.com> References: <45d1ab480911251153i5e70b5fbp93cdd8f9f2b745ec@mail.gmail.com> <1cd32cbb0911251219q6bcdd086ycf33312b708d6ce8@mail.gmail.com> <45d1ab480911251524k6078e601nb71f97840e2c3ecb@mail.gmail.com> Message-ID: <1cd32cbb0911251545o78ec7964sec8383313bf39084@mail.gmail.com> On Wed, Nov 25, 2009 at 6:24 PM, David Goldsmith wrote: > Good info, thanks; I'll look up "your" thread, Josef, on the archive and run > down what look like relevant references.? (FWIW, my interest is that I'm > helping out (nominally, "tutoring," but this level, it's more akin to being > a sounding board, checking his derivations, and "reminding" him of various > subtleties that are emphasized in math, but not necessarily in EE, etc.) 
> this guy working on his dissertation on air traffic control automation using > wireless communication protocols, very probability heavy stuff, and for the > second time yesterday, he presented me with a transform application - in > this instance, the "Z" transform - in this probability-heavy stuff, and this > is outside of my training in probability, so I want to "bone-up.")? Thanks > again, I always have to look for your reply because you don't follow our bottom-posting policy. I have seen the z-transform only in the context of time series analysis http://en.wikipedia.org/wiki/Z-transform especially this http://en.wikipedia.org/wiki/Z-transform#Linear_constant-coefficient_difference_equation covered to some extend in scipy.signal, lfilter and lti so the other literature to Laplace transforms and characteristic functions might not be very closely related. Josef > > DG > > On Wed, Nov 25, 2009 at 2:41 PM, nicky van foreest > wrote: >> >> Hi, >> >> 2009/11/25 ?: >> > On Wed, Nov 25, 2009 at 2:53 PM, David Goldsmith >> > wrote: >> >> Are there enough applications of transform methods (by which I mean, >> >> Fourier, Laplace, Z, etc.) in probability & statistics for this to be >> >> considered its own specialty therein?? Any text recommendations on it >> >> (even >> >> if it's only a chapter dedicated to it)?? Thanks, >> >> >> > >> > Some information is in the thread on my recent question >> > "characteristic functions of probability distributions" >> > >> > There is a large literature in econometrics and statistics about using >> > the characteristic function for estimation and testing. >> > The reference of Nicky for queuing theory uses mostly the Laplace >> > transform (for discrete distributions), >> >> It has been some time ago (more than 5 years...), but I recall that >> Whitt, in his articles on the numerical inversion of Laplace >> transforms, discretized Laplace transforms to facilitate the >> inversion, The distributions themselves are not necessarily discrete. >> One example would be the waiting time distribution of customers in a >> queue, which is continuous for most service and arrival processes. >> >> There is certainly potential for dedicated numerical inversion algo's >> for the Laplace transforms of density and distribution functions. The >> latter form a somewhat specialized sort of function. Distribution >> functions are 0 at -\infty, and 1 at \infty, and are non decreasing. >> They may also have discontinuities, but not too many. These properties >> may affect the inversion. ?Besides these properties, the transforms >> are used to obtain insight into the behavior of the sum of independent >> random variables. Such sums can be rewritten as the product of the >> transforms of distribution. This product in turn requires inversion >> to, as some people call it, take away the Laplacian curtain. >> >> Nicky >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From d.l.goldsmith at gmail.com Wed Nov 25 19:31:00 2009 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Wed, 25 Nov 2009 16:31:00 -0800 Subject: [SciPy-User] [OT] Transform (i.e., Fourier, Laplace, etc.) methods in Prob. & Stats. 
In-Reply-To: <1cd32cbb0911251545o78ec7964sec8383313bf39084@mail.gmail.com> References: <45d1ab480911251153i5e70b5fbp93cdd8f9f2b745ec@mail.gmail.com> <1cd32cbb0911251219q6bcdd086ycf33312b708d6ce8@mail.gmail.com> <45d1ab480911251524k6078e601nb71f97840e2c3ecb@mail.gmail.com> <1cd32cbb0911251545o78ec7964sec8383313bf39084@mail.gmail.com> Message-ID: <45d1ab480911251631h34751c04u9f84c540ea655b47@mail.gmail.com> On Wed, Nov 25, 2009 at 3:45 PM, wrote: > On Wed, Nov 25, 2009 at 6:24 PM, David Goldsmith > wrote: > > Good info, thanks; I'll look up "your" thread, Josef, on the archive and > run > > down what look like relevant references. (FWIW, my interest is that I'm > > helping out (nominally, "tutoring," but this level, it's more akin to > being > > a sounding board, checking his derivations, and "reminding" him of > various > > subtleties that are emphasized in math, but not necessarily in EE, etc.) > > this guy working on his dissertation on air traffic control automation > using > > wireless communication protocols, very probability heavy stuff, and for > the > > second time yesterday, he presented me with a transform application - in > > this instance, the "Z" transform - in this probability-heavy stuff, and > this > > is outside of my training in probability, so I want to "bone-up.") > Thanks > > again, > > I always have to look for your reply because you don't follow our > bottom-posting > policy. > Sorry, I tend to "follow" when I'm saying something in direct response to something I'm replying to and/or when I think that I'm likely _not_ terminating the thread, but when I'm responding generally and/or think that I am likely terminating the thread, then I tend to just reply at the top. I'll try to remember that we have a policy. :-) I have seen the z-transform only in the context of time series analysis > http://en.wikipedia.org/wiki/Z-transform > especially this > > http://en.wikipedia.org/wiki/Z-transform#Linear_constant-coefficient_difference_equation > covered to some extend in scipy.signal, lfilter and lti > Part of the problem was that it wasn't clear to either of us - myself or my "student" - why the authors of this particular paper were using the z-transform at all where they were - it seemed their result was easily derivable w/out it, so we were both baffled. so the other literature to Laplace transforms and characteristic functions > might not be very closely related. > Perhaps not directly (in any event, presently, I'm interested in theoretical/"analytical," i.e., not numerical, applications anyway), but my philosophy has always been, if I can be directed to something that is closer to on target than what I've been able to find on my own, then even if it's not a bulls-eye, I can often find a bulls-eye in the reference's references. For example, "Chung (or any other book on graduate probability)" sounds like a good starting point. So thanks for reminding me about the thread. (I knew it sounded familiar: I contributed to it! 
And on that note, I "let it lie" at the time, but now feel I should say, admittedly a little defensively, that of course Anne's comments were on the mark; the only reasons I felt it necessary to add what I did about complex integration over a closed path were: A) you had indicated that you were a bit of a novice in the field, and the result I was giving is, perhaps arguably, the subject's most fundamental result, and B) I felt that it was important that you were aware of it because, if any of your functions _were_ analytic and your paths closed, then you shouldn't be doing any (explicit) numerical (or symbolic, for that matter) integration at all - you should just be "hard-wiring" those integrals to zero! And for what it's worth: every time you integrate with respect to one (continuous) real variable, you're doing a path integration - one so comparatively trivial that we don't call it that, but a path integration nevertheless.) :-) DG > > Josef > > > > > > DG > > > > On Wed, Nov 25, 2009 at 2:41 PM, nicky van foreest > > > wrote: > >> > >> Hi, > >> > >> 2009/11/25 : > >> > On Wed, Nov 25, 2009 at 2:53 PM, David Goldsmith > >> > wrote: > >> >> Are there enoug applications of transform methods (by which I mean, > >> >> Fourier, Laplace, Z, etc.) in probability & statistics for this to be > >> >> considered its own specialty therein? Any text recommendations on it > >> >> (even > >> >> if it's only a chapter dedicated to it)? Thanks, > >> >> > >> > > >> > Some information is in the thread on my recent question > >> > "characteristic functions of probability distributions" > >> > > >> > There is a large literature in econometrics and statistics about using > >> > the characteristic function for estimation and testing. > >> > The reference of Nicky for queuing theory uses mostly the Laplace > >> > transform (for discrete distributions), > >> > >> It has been some time ago (more than 5 years...), but I recall that > >> Whitt, in his articles on the numerical inversion of Laplace > >> transforms, discretized Laplace transforms to facilitate the > >> inversion, The distributions themselves are not necessarily discrete. > >> One example would be the waiting time distribution of customers in a > >> queue, which is continuous for most service and arrival processes. > >> > >> There is certainly potential for dedicated numerical inversion algo's > >> for the Laplace transforms of density and distribution functions. The > >> latter form a somewhat specialized sort of function. Distribution > >> functions are 0 at -\infty, and 1 at \infty, and are non decreasing. > >> They may also have discontinuities, but not too many. These properties > >> may affect the inversion. Besides these properties, the transforms > >> are used to obtain insight into the behavior of the sum of independent > >> random variables. Such sums can be rewritten as the product of the > >> transforms of distribution. This product in turn requires inversion > >> to, as some people call it, take away the Laplacian curtain. 
> >> > >> Nicky > >> _______________________________________________ > >> SciPy-User mailing list > >> SciPy-User at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.l.goldsmith at gmail.com Wed Nov 25 19:54:58 2009 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Wed, 25 Nov 2009 16:54:58 -0800 Subject: [SciPy-User] [OT] Transform (i.e., Fourier, Laplace, etc.) methods in Prob. & Stats. In-Reply-To: <45d1ab480911251631h34751c04u9f84c540ea655b47@mail.gmail.com> References: <45d1ab480911251153i5e70b5fbp93cdd8f9f2b745ec@mail.gmail.com> <1cd32cbb0911251219q6bcdd086ycf33312b708d6ce8@mail.gmail.com> <45d1ab480911251524k6078e601nb71f97840e2c3ecb@mail.gmail.com> <1cd32cbb0911251545o78ec7964sec8383313bf39084@mail.gmail.com> <45d1ab480911251631h34751c04u9f84c540ea655b47@mail.gmail.com> Message-ID: <45d1ab480911251654q78b5a4d6ub810ec1042a1011c@mail.gmail.com> On Wed, Nov 25, 2009 at 4:31 PM, David Goldsmith wrote: > On Wed, Nov 25, 2009 at 3:45 PM, wrote: > >> On Wed, Nov 25, 2009 at 6:24 PM, David Goldsmith >> wrote: >> > Good info, thanks; I'll look up "your" thread, Josef, on the archive and >> run >> > Chung, K. L., 2000. "A Course In Probability Theory, 2nd Ed." Academic. looks like a really good general reference, Nicky - I assume this is the "Chung" to which you were referring? Thanks!!! DG -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Nov 25 21:04:09 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 25 Nov 2009 21:04:09 -0500 Subject: [SciPy-User] [OT] Transform (i.e., Fourier, Laplace, etc.) methods in Prob. & Stats. In-Reply-To: <45d1ab480911251631h34751c04u9f84c540ea655b47@mail.gmail.com> References: <45d1ab480911251153i5e70b5fbp93cdd8f9f2b745ec@mail.gmail.com> <1cd32cbb0911251219q6bcdd086ycf33312b708d6ce8@mail.gmail.com> <45d1ab480911251524k6078e601nb71f97840e2c3ecb@mail.gmail.com> <1cd32cbb0911251545o78ec7964sec8383313bf39084@mail.gmail.com> <45d1ab480911251631h34751c04u9f84c540ea655b47@mail.gmail.com> Message-ID: <1cd32cbb0911251804h7ddb8875ud4e402040d7889e4@mail.gmail.com> On Wed, Nov 25, 2009 at 7:31 PM, David Goldsmith wrote: > On Wed, Nov 25, 2009 at 3:45 PM, wrote: >> >> On Wed, Nov 25, 2009 at 6:24 PM, David Goldsmith >> wrote: >> > Good info, thanks; I'll look up "your" thread, Josef, on the archive and >> > run >> > down what look like relevant references.? (FWIW, my interest is that I'm >> > helping out (nominally, "tutoring," but this level, it's more akin to >> > being >> > a sounding board, checking his derivations, and "reminding" him of >> > various >> > subtleties that are emphasized in math, but not necessarily in EE, etc.) 
>> > this guy working on his dissertation on air traffic control automation >> > using >> > wireless communication protocols, very probability heavy stuff, and for >> > the >> > second time yesterday, he presented me with a transform application - in >> > this instance, the "Z" transform - in this probability-heavy stuff, and >> > this >> > is outside of my training in probability, so I want to "bone-up.") >> > Thanks >> > again, >> >> I always have to look for your reply because you don't follow our >> bottom-posting >> policy. > > Sorry, I tend to "follow" when I'm saying something in direct response to > something I'm replying to and/or when I think that I'm likely _not_ > terminating the thread, but when I'm responding generally and/or think that > I am likely terminating the thread, then I tend to just reply at the top. > I'll try to remember that we have a policy. :-) > >> I have seen the z-transform only in the context of time series analysis >> http://en.wikipedia.org/wiki/Z-transform >> especially this >> >> http://en.wikipedia.org/wiki/Z-transform#Linear_constant-coefficient_difference_equation >> covered to some extend in scipy.signal, lfilter and lti > > Part of the problem was that it wasn't clear to either of us - myself or my > "student" - why the authors of this particular paper were using the > z-transform at all where they were - it seemed their result was easily > derivable w/out it, so we were both baffled. > >> so the other literature to Laplace transforms and characteristic functions >> might not be very closely related. > > Perhaps not directly (in any event, presently, I'm interested in > theoretical/"analytical," i.e., not numerical, applications anyway), but my > philosophy has always been, if I can be directed to something that is closer > to on target than what I've been able to find on my own, then even if it's > not a bulls-eye, I can often find a bulls-eye in the reference's > references.? For example, "Chung (or any other book on graduate > probability)" sounds like a good starting point.? So thanks for reminding me > about the thread.? (I knew it sounded familiar: I contributed to it!? And on > that note, I "let it lie" at the time, but now feel I should say, admittedly > a little defensively, that of course Anne's comments were on the mark; the > only reasons I felt it necessary to add what I did about complex integration > over a closed path were: A) you had indicated that you were a bit of a > novice in the field, and the result I was giving is, perhaps arguably, the > subject's most fundamental result, and B) I felt that it was important that > you were aware of it because, if any of your functions _were_ analytic and > your paths closed, then you shouldn't be doing any (explicit) numerical (or > symbolic, for that matter) integration at all - you should just be > "hard-wiring" those integrals to zero!? And for what it's worth: every time > you integrate with respect to one (continuous) real variable, you're doing a > path integration - one so comparatively trivial that we don't call it that, > but a path integration nevertheless.) :-) I was just reading up a bit on contour integrals on wikipedia, and it looks too applied for Probability and Measure theory. It just tells you how to use some tricks to calculate specific Rieman integrals in the complex plane. I didn't see any hints for Lebesque integrals. All real analysis, and measure theory (that I have seen) is based on Lebesque integration or Lebesque-Stiltjes as in Chungs book. 
So for me contour integrals just falls in between the measure theory and the applied (real) calculations, and I never had to figure out what it does. I'm not doing path integration when I integrate with respect to a (probability) measure that has both continuous intervals and mass points (Lebesque not Rieman if you want to be picky) Josef > > DG > >> >> Josef >> >> >> > >> > DG >> > >> > On Wed, Nov 25, 2009 at 2:41 PM, nicky van foreest >> > >> > wrote: >> >> >> >> Hi, >> >> >> >> 2009/11/25 ?: >> >> > On Wed, Nov 25, 2009 at 2:53 PM, David Goldsmith >> >> > wrote: >> >> >> Are there enoug applications of transform methods (by which I mean, >> >> >> Fourier, Laplace, Z, etc.) in probability & statistics for this to >> >> >> be >> >> >> considered its own specialty therein?? Any text recommendations on >> >> >> it >> >> >> (even >> >> >> if it's only a chapter dedicated to it)?? Thanks, >> >> >> >> >> > >> >> > Some information is in the thread on my recent question >> >> > "characteristic functions of probability distributions" >> >> > >> >> > There is a large literature in econometrics and statistics about >> >> > using >> >> > the characteristic function for estimation and testing. >> >> > The reference of Nicky for queuing theory uses mostly the Laplace >> >> > transform (for discrete distributions), >> >> >> >> It has been some time ago (more than 5 years...), but I recall that >> >> Whitt, in his articles on the numerical inversion of Laplace >> >> transforms, discretized Laplace transforms to facilitate the >> >> inversion, The distributions themselves are not necessarily discrete. >> >> One example would be the waiting time distribution of customers in a >> >> queue, which is continuous for most service and arrival processes. >> >> >> >> There is certainly potential for dedicated numerical inversion algo's >> >> for the Laplace transforms of density and distribution functions. The >> >> latter form a somewhat specialized sort of function. Distribution >> >> functions are 0 at -\infty, and 1 at \infty, and are non decreasing. >> >> They may also have discontinuities, but not too many. These properties >> >> may affect the inversion. ?Besides these properties, the transforms >> >> are used to obtain insight into the behavior of the sum of independent >> >> random variables. Such sums can be rewritten as the product of the >> >> transforms of distribution. This product in turn requires inversion >> >> to, as some people call it, take away the Laplacian curtain. >> >> >> >> Nicky >> >> _______________________________________________ >> >> SciPy-User mailing list >> >> SciPy-User at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> > >> > _______________________________________________ >> > SciPy-User mailing list >> > SciPy-User at scipy.org >> > http://mail.scipy.org/mailman/listinfo/scipy-user >> > >> > >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From josef.pktd at gmail.com Wed Nov 25 21:48:56 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 25 Nov 2009 21:48:56 -0500 Subject: [SciPy-User] [OT] Transform (i.e., Fourier, Laplace, etc.) methods in Prob. & Stats. 
In-Reply-To: <1cd32cbb0911251804h7ddb8875ud4e402040d7889e4@mail.gmail.com> References: <45d1ab480911251153i5e70b5fbp93cdd8f9f2b745ec@mail.gmail.com> <1cd32cbb0911251219q6bcdd086ycf33312b708d6ce8@mail.gmail.com> <45d1ab480911251524k6078e601nb71f97840e2c3ecb@mail.gmail.com> <1cd32cbb0911251545o78ec7964sec8383313bf39084@mail.gmail.com> <45d1ab480911251631h34751c04u9f84c540ea655b47@mail.gmail.com> <1cd32cbb0911251804h7ddb8875ud4e402040d7889e4@mail.gmail.com> Message-ID: <1cd32cbb0911251848k6afded9egca0b9b399194eb44@mail.gmail.com> On Wed, Nov 25, 2009 at 9:04 PM, wrote: > On Wed, Nov 25, 2009 at 7:31 PM, David Goldsmith > wrote: >> On Wed, Nov 25, 2009 at 3:45 PM, wrote: >>> >>> On Wed, Nov 25, 2009 at 6:24 PM, David Goldsmith >>> wrote: >>> > Good info, thanks; I'll look up "your" thread, Josef, on the archive and >>> > run >>> > down what look like relevant references.? (FWIW, my interest is that I'm >>> > helping out (nominally, "tutoring," but this level, it's more akin to >>> > being >>> > a sounding board, checking his derivations, and "reminding" him of >>> > various >>> > subtleties that are emphasized in math, but not necessarily in EE, etc.) >>> > this guy working on his dissertation on air traffic control automation >>> > using >>> > wireless communication protocols, very probability heavy stuff, and for >>> > the >>> > second time yesterday, he presented me with a transform application - in >>> > this instance, the "Z" transform - in this probability-heavy stuff, and >>> > this >>> > is outside of my training in probability, so I want to "bone-up.") >>> > Thanks >>> > again, >>> >>> I always have to look for your reply because you don't follow our >>> bottom-posting >>> policy. >> >> Sorry, I tend to "follow" when I'm saying something in direct response to >> something I'm replying to and/or when I think that I'm likely _not_ >> terminating the thread, but when I'm responding generally and/or think that >> I am likely terminating the thread, then I tend to just reply at the top. >> I'll try to remember that we have a policy. :-) >> >>> I have seen the z-transform only in the context of time series analysis >>> http://en.wikipedia.org/wiki/Z-transform >>> especially this >>> >>> http://en.wikipedia.org/wiki/Z-transform#Linear_constant-coefficient_difference_equation >>> covered to some extend in scipy.signal, lfilter and lti >> >> Part of the problem was that it wasn't clear to either of us - myself or my >> "student" - why the authors of this particular paper were using the >> z-transform at all where they were - it seemed their result was easily >> derivable w/out it, so we were both baffled. >> >>> so the other literature to Laplace transforms and characteristic functions >>> might not be very closely related. >> >> Perhaps not directly (in any event, presently, I'm interested in >> theoretical/"analytical," i.e., not numerical, applications anyway), but my >> philosophy has always been, if I can be directed to something that is closer >> to on target than what I've been able to find on my own, then even if it's >> not a bulls-eye, I can often find a bulls-eye in the reference's >> references.? For example, "Chung (or any other book on graduate >> probability)" sounds like a good starting point.? So thanks for reminding me >> about the thread.? (I knew it sounded familiar: I contributed to it!? 
And on >> that note, I "let it lie" at the time, but now feel I should say, admittedly >> a little defensively, that of course Anne's comments were on the mark; the >> only reasons I felt it necessary to add what I did about complex integration >> over a closed path were: A) you had indicated that you were a bit of a >> novice in the field, and the result I was giving is, perhaps arguably, the >> subject's most fundamental result, and B) I felt that it was important that >> you were aware of it because, if any of your functions _were_ analytic and >> your paths closed, then you shouldn't be doing any (explicit) numerical (or >> symbolic, for that matter) integration at all - you should just be >> "hard-wiring" those integrals to zero!? And for what it's worth: every time >> you integrate with respect to one (continuous) real variable, you're doing a >> path integration - one so comparatively trivial that we don't call it that, >> but a path integration nevertheless.) :-) > > I was just reading up a bit on contour integrals on wikipedia, and it > looks too applied for Probability and Measure theory. It just tells > you how to use some tricks to calculate specific Rieman integrals in > the complex plane. I didn't see any hints for Lebesque integrals. > All real analysis, and measure theory (that I have seen) is based on > Lebesque integration or Lebesque-Stiltjes as in Chungs book. So for me > contour integrals just falls in between the measure theory and the > applied (real) calculations, and I never had to figure out what it > does. > > I'm not doing path integration when I integrate with respect to a > (probability) measure that has both continuous intervals and mass > points (Lebesque not Rieman if you want to be picky) Maybe the last statement is wrong, it's too long ago that I struggled with this. Maybe I'm mixing up Lebesgue-integral, Lebesgue-measurable, and measures that are absolutely continuous with respect to Lebesgue-measure. Josef > > Josef > > > > >> >> DG >> >>> >>> Josef >>> >>> >>> > >>> > DG >>> > >>> > On Wed, Nov 25, 2009 at 2:41 PM, nicky van foreest >>> > >>> > wrote: >>> >> >>> >> Hi, >>> >> >>> >> 2009/11/25 ?: >>> >> > On Wed, Nov 25, 2009 at 2:53 PM, David Goldsmith >>> >> > wrote: >>> >> >> Are there enoug applications of transform methods (by which I mean, >>> >> >> Fourier, Laplace, Z, etc.) in probability & statistics for this to >>> >> >> be >>> >> >> considered its own specialty therein?? Any text recommendations on >>> >> >> it >>> >> >> (even >>> >> >> if it's only a chapter dedicated to it)?? Thanks, >>> >> >> >>> >> > >>> >> > Some information is in the thread on my recent question >>> >> > "characteristic functions of probability distributions" >>> >> > >>> >> > There is a large literature in econometrics and statistics about >>> >> > using >>> >> > the characteristic function for estimation and testing. >>> >> > The reference of Nicky for queuing theory uses mostly the Laplace >>> >> > transform (for discrete distributions), >>> >> >>> >> It has been some time ago (more than 5 years...), but I recall that >>> >> Whitt, in his articles on the numerical inversion of Laplace >>> >> transforms, discretized Laplace transforms to facilitate the >>> >> inversion, The distributions themselves are not necessarily discrete. >>> >> One example would be the waiting time distribution of customers in a >>> >> queue, which is continuous for most service and arrival processes. 
>>> >> >>> >> There is certainly potential for dedicated numerical inversion algo's >>> >> for the Laplace transforms of density and distribution functions. The >>> >> latter form a somewhat specialized sort of function. Distribution >>> >> functions are 0 at -\infty, and 1 at \infty, and are non decreasing. >>> >> They may also have discontinuities, but not too many. These properties >>> >> may affect the inversion. ?Besides these properties, the transforms >>> >> are used to obtain insight into the behavior of the sum of independent >>> >> random variables. Such sums can be rewritten as the product of the >>> >> transforms of distribution. This product in turn requires inversion >>> >> to, as some people call it, take away the Laplacian curtain. >>> >> >>> >> Nicky >>> >> _______________________________________________ >>> >> SciPy-User mailing list >>> >> SciPy-User at scipy.org >>> >> http://mail.scipy.org/mailman/listinfo/scipy-user >>> > >>> > >>> > _______________________________________________ >>> > SciPy-User mailing list >>> > SciPy-User at scipy.org >>> > http://mail.scipy.org/mailman/listinfo/scipy-user >>> > >>> > >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > From david at ar.media.kyoto-u.ac.jp Wed Nov 25 23:45:45 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Thu, 26 Nov 2009 13:45:45 +0900 Subject: [SciPy-User] [OT] Transform (i.e., Fourier, Laplace, etc.) methods in Prob. & Stats. In-Reply-To: <1cd32cbb0911251848k6afded9egca0b9b399194eb44@mail.gmail.com> References: <45d1ab480911251153i5e70b5fbp93cdd8f9f2b745ec@mail.gmail.com> <1cd32cbb0911251219q6bcdd086ycf33312b708d6ce8@mail.gmail.com> <45d1ab480911251524k6078e601nb71f97840e2c3ecb@mail.gmail.com> <1cd32cbb0911251545o78ec7964sec8383313bf39084@mail.gmail.com> <45d1ab480911251631h34751c04u9f84c540ea655b47@mail.gmail.com> <1cd32cbb0911251804h7ddb8875ud4e402040d7889e4@mail.gmail.com> <1cd32cbb0911251848k6afded9egca0b9b399194eb44@mail.gmail.com> Message-ID: <4B0E07F9.8070409@ar.media.kyoto-u.ac.jp> josef.pktd at gmail.com wrote: > > Maybe the last statement is wrong, it's too long ago that I > struggled with this. Maybe I'm mixing up Lebesgue-integral, > Lebesgue-measurable, and measures that are absolutely continuous > with respect to Lebesgue-measure. > I am by no mean an expert on this, but I believe you are right. AFAIK, contour integrals require to have a piecewise-continuous parametrization of your path, and for me, the whole point of Lebesgue integrals is to handle cases where the set over which you integrate the function is not a (finite) union of intervals. I don't know if it makes sense to define something "like" contour integrals for lebesgue integrals. The fundamental reason why Lebesgue integrals work the way they do is because for a function f: E ->F, only the properties of F (and how the inversion function maps elements of the sigma algebra F) matter. And complex analysis is 'special' because of the special structure of E, not F. David From josef.pktd at gmail.com Thu Nov 26 01:19:38 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 26 Nov 2009 01:19:38 -0500 Subject: [SciPy-User] [OT] Transform (i.e., Fourier, Laplace, etc.) methods in Prob. & Stats. 
In-Reply-To: <4B0E07F9.8070409@ar.media.kyoto-u.ac.jp> References: <45d1ab480911251153i5e70b5fbp93cdd8f9f2b745ec@mail.gmail.com> <1cd32cbb0911251219q6bcdd086ycf33312b708d6ce8@mail.gmail.com> <45d1ab480911251524k6078e601nb71f97840e2c3ecb@mail.gmail.com> <1cd32cbb0911251545o78ec7964sec8383313bf39084@mail.gmail.com> <45d1ab480911251631h34751c04u9f84c540ea655b47@mail.gmail.com> <1cd32cbb0911251804h7ddb8875ud4e402040d7889e4@mail.gmail.com> <1cd32cbb0911251848k6afded9egca0b9b399194eb44@mail.gmail.com> <4B0E07F9.8070409@ar.media.kyoto-u.ac.jp> Message-ID: <1cd32cbb0911252219p45046eddx88ac1272e9ed8ec4@mail.gmail.com> On Wed, Nov 25, 2009 at 11:45 PM, David Cournapeau wrote: > josef.pktd at gmail.com wrote: >> >> Maybe the last statement is wrong, it's too long ago that I >> struggled with this. Maybe I'm mixing up Lebesgue-integral, >> Lebesgue-measurable, and measures that are absolutely continuous >> with respect to Lebesgue-measure. >> > > I am by no mean an expert on this, but I believe you are right. AFAIK, > contour integrals require to have a piecewise-continuous parametrization > of your path, and for me, the whole point of Lebesgue integrals is to > handle cases where the set over which you integrate the function is not > a (finite) union of intervals. > > I don't know if it makes sense to define something "like" contour > integrals for lebesgue integrals. The fundamental reason why Lebesgue > integrals work the way they do is because for a function f: E ->F, only > the properties of F (and how the inversion function maps elements of the > sigma algebra F) matter. And complex analysis is 'special' because of > the special structure of E, not F. I think on the theoretical level I'm right, but from what I read the last few hours, contour integrals seem to provide a method to actually calculate the integral, while I haven't seen much practical applications of Lebesgue integration. For the simple examples that I tried so far for the inversion of the characteristic function, I didn't need contour nor Lebesque integrals. And I hope it stays this way when I get back to this, especially since I never had to learn anything about complex analysis and the special structure of complex numbers. Josef > > David > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From oliphant at enthought.com Thu Nov 26 11:42:44 2009 From: oliphant at enthought.com (Travis Oliphant) Date: Thu, 26 Nov 2009 10:42:44 -0600 Subject: [SciPy-User] sinc interpolation In-Reply-To: <5aa3b3570911250816t38ce53bejd693c7723d2f7924@mail.gmail.com> References: <5aa3b3570911250816t38ce53bejd693c7723d2f7924@mail.gmail.com> Message-ID: On Nov 25, 2009, at 10:16 AM, Giovanni Marco Dall'Olio wrote: > > > On Wed, Nov 25, 2009 at 3:48 PM, Travis Oliphant > wrote: > > > from scipy.signal import resample > from numpy import r_, sin > from pylab import plot > > x = r_[0:10] > y = sin(x) > yy = resample(x, 100) > > # This is a bit tricky to get the x-samples right > xx = r_[0:10:101j][:-1] > > just a question, why don't you use numpy.linspace(0, 10, 101) ? I do quite often (especially in module code), but r_ is less typing and I like the use of slice syntax to specify endpoints. -Travis -------------- next part -------------- An HTML attachment was scrubbed... URL: From sccolbert at gmail.com Thu Nov 26 12:19:19 2009 From: sccolbert at gmail.com (S. 
Chris Colbert) Date: Thu, 26 Nov 2009 18:19:19 +0100 Subject: [SciPy-User] [Fwd: Re: ANN: SfePy 2009.4] In-Reply-To: <4B0D1603.40509@ntc.zcu.cz> References: <4B0D0E33.4010703@ntc.zcu.cz> <20091125110643.GB21484@phare.normalesup.org> <4B0D1603.40509@ntc.zcu.cz> Message-ID: <200911261819.19584.sccolbert@gmail.com> I'm getting all sorts of errors trying to run sfepy tests and examples: It builds fine. But I fail one of the solvers test because of a bug with OpenMPI (whichever solver is using Petsc4py which btw, is not listed as a dependency). The schroedinger example runs, but produces erroneous output (~300% error). The poisson and valec examples produce error results. System specs: Kubuntu 9.10 x64 Self built/easy_insall: Numpy 1.3.0 Scipy 0.7.1 Newest umfpack scikit Newest Petsc4Py pytables from the repos: hdf5-serial openmpi 1.6.6 pysparse Any help would be awesome! Cheers, Chris > Gael Varoquaux wrote: > > On Wed, Nov 25, 2009 at 12:00:03PM +0100, Robert Cimrman wrote: > >>> 555 else: > >>> 556 gui = ViewerGUI(viewer=self) > >>> --> 557 scene = gui.scene.mayavi_scene > >>> 558 > >>> 559 if scene is not self.scene: > >>> > >>> AttributeError: 'MlabSceneModel' object has no attribute 'mayavi_scene' > > > > Yes, mayavi_scene is new in 3.3.0. We realized that this functionnality > > we needed a bit late :) > > > > Ga?l > > And I have started using it even later :) > > thanks for clarification! > r. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From vanforeest at gmail.com Thu Nov 26 15:34:37 2009 From: vanforeest at gmail.com (nicky van foreest) Date: Thu, 26 Nov 2009 21:34:37 +0100 Subject: [SciPy-User] [OT] Transform (i.e., Fourier, Laplace, etc.) methods in Prob. & Stats. In-Reply-To: <45d1ab480911251654q78b5a4d6ub810ec1042a1011c@mail.gmail.com> References: <45d1ab480911251153i5e70b5fbp93cdd8f9f2b745ec@mail.gmail.com> <1cd32cbb0911251219q6bcdd086ycf33312b708d6ce8@mail.gmail.com> <45d1ab480911251524k6078e601nb71f97840e2c3ecb@mail.gmail.com> <1cd32cbb0911251545o78ec7964sec8383313bf39084@mail.gmail.com> <45d1ab480911251631h34751c04u9f84c540ea655b47@mail.gmail.com> <45d1ab480911251654q78b5a4d6ub810ec1042a1011c@mail.gmail.com> Message-ID: > Chung, K. L., 2000. "A Course In Probability Theory, 2nd Ed." Academic. > looks like a really good general reference, Nicky - I assume this is the > "Chung" to which you were referring?? Thanks!!! That is the one indeed. For z transforms you might like generatingfunctionoly (or something like this). Search on Herbert Wilf, On his website you can find a very nice book on the uses of z transforms. The first chapter is very accessible. HOpe this helps. NIcky From peter.combs at berkeley.edu Thu Nov 26 16:34:12 2009 From: peter.combs at berkeley.edu (Peter Combs) Date: Thu, 26 Nov 2009 13:34:12 -0800 Subject: [SciPy-User] Bivariate Spline Surface Fitting Message-ID: Hi all, I have localization data in 2 color channels that should agree with each other, but in practice, they don't to the level we want. I thought I'd try doing a straight polynomial least squares fit, and while that gives better registration between the two, I'm still not to the level I want. My next thought was a spline fit, so I'm trying to make two least-squares bivariate spline fits: one for taking (x,y) to x', and one for taking (x,y) to y'. import scipy.interpolate as interp ... 
def makeLSQspline(xl, yl, xr, yr): """docstring for makespline""" xmin = xr.min()-1 xmax = xr.max()+1 ymin = yr.min()-1 ymax = yr.max()+1 n = len(xl) print "xrange: ", xmin, xmax, '\t', "yrange: ", ymin, ymax yknots, xknots = mgrid[ymin:ymax:10j, xmin:xmax:10j] # Makes an 11x11 regular grid of knot locations xspline = interp.LSQBivariateSpline(xr, yr, xl, xknots.flat, yknots.flat) yspline = interp.LSQBivariateSpline(xr, yr, yl, xknots.flat, yknots.flat) def mapping(xr, yr): xl = xspline.ev(xr, yr) yl = yspline.ev(xr, yr) return xl, yl return mapping I have a "Registration Error" function which calculates a mapping for all but the ith point, then plugs that point into the mapping and finds the difference between the predicted value and the known value. For the 2nd order polynomial fit, I get a mean registration error around 7nm, but for the spline fitting using the function above, the mean error is more like 20,000nm. Which (along with all the random junk that gets spit out, such as /Library/Frameworks/Python.framework/Versions/5.1.0/lib/python2.5/site-packages/scipy/interpolate/fitpack2.py:498: UserWarning: Error on entry, no approximation returned. The following conditions must hold: xb<=x[i]<=xe, yb<=y[i]<=ye, w[i]>0, i=0..m-1 If iopt==-1, then xb References: <45d1ab480911251153i5e70b5fbp93cdd8f9f2b745ec@mail.gmail.com> <1cd32cbb0911251219q6bcdd086ycf33312b708d6ce8@mail.gmail.com> <45d1ab480911251524k6078e601nb71f97840e2c3ecb@mail.gmail.com> <1cd32cbb0911251545o78ec7964sec8383313bf39084@mail.gmail.com> <45d1ab480911251631h34751c04u9f84c540ea655b47@mail.gmail.com> <45d1ab480911251654q78b5a4d6ub810ec1042a1011c@mail.gmail.com> Message-ID: <45d1ab480911261405p6dbca723j484942a14cbae52@mail.gmail.com> Sounds good, thanks again! DG On Thu, Nov 26, 2009 at 12:34 PM, nicky van foreest wrote: > > Chung, K. L., 2000. "A Course In Probability Theory, 2nd Ed." Academic. > > looks like a really good general reference, Nicky - I assume this is the > > "Chung" to which you were referring? Thanks!!! > > That is the one indeed. > > For z transforms you might like generatingfunctionoly (or something > like this). Search on Herbert Wilf, On his website you can find a very > nice book on the uses of z transforms. The first chapter is very > accessible. > > HOpe this helps. > > NIcky > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Thu Nov 26 17:17:31 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 26 Nov 2009 17:17:31 -0500 Subject: [SciPy-User] Bivariate Spline Surface Fitting In-Reply-To: References: Message-ID: <1cd32cbb0911261417h36ea8a82lf59823bbeece05@mail.gmail.com> On Thu, Nov 26, 2009 at 4:34 PM, Peter Combs wrote: > Hi all, > I have localization data in 2 color channels that should agree with each other, but in practice, they don't to the level we want. I thought I'd try doing a straight polynomial least squares fit, and while that gives better registration between the two, I'm still not to the level I want. My next thought was a spline fit, so I'm trying to make two least-squares bivariate spline fits: one for taking (x,y) to x', and one for taking (x,y) to y'. > > > import scipy.interpolate as interp > ... > def makeLSQspline(xl, yl, xr, yr): > ? """docstring for makespline""" > > ? xmin = xr.min()-1 > ? xmax = xr.max()+1 > ? ymin = yr.min()-1 > ? 
ymax = yr.max()+1 > ? n = len(xl) > > ? print "xrange: ", xmin, xmax, '\t', "yrange: ", ymin, ymax > > ? yknots, xknots = mgrid[ymin:ymax:10j, xmin:xmax:10j] ? # Makes an 11x11 regular grid of knot locations > > ? xspline = interp.LSQBivariateSpline(xr, yr, xl, xknots.flat, yknots.flat) > ? yspline = interp.LSQBivariateSpline(xr, yr, yl, xknots.flat, yknots.flat) > > ? def mapping(xr, yr): > ? ? ?xl = xspline.ev(xr, yr) > ? ? ?yl = yspline.ev(xr, yr) > ? ? ?return xl, yl > ? return mapping > > > I have a "Registration Error" function which calculates a mapping for all but the ith point, then plugs that point into the mapping and finds the difference between the predicted value and the known value. For the 2nd order polynomial fit, I get a mean registration error around 7nm, but for the spline fitting using the function above, the mean error is more like 20,000nm. Which (along with all the random junk that gets spit out, such as > /Library/Frameworks/Python.framework/Versions/5.1.0/lib/python2.5/site-packages/scipy/interpolate/fitpack2.py:498: UserWarning: > Error on entry, no approximation returned. The following conditions > must hold: > xb<=x[i]<=xe, yb<=y[i]<=ye, w[i]>0, i=0..m-1 > If iopt==-1, then > xb yb warnings.warn(message) > > about 1 copy of this (or something similar) per call of makeLSQSpline: > tx= 0.00000000000000 0.00000000000000 0.00000000000000 -264.022095089756 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 > ?12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 28778.4083647834 0.00000000000000 0.00000000000000 0.00000000000000 > tx= 0.00000000000000 0.00000000000000 0.00000000000000 -264.022095089756 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 
25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 > ?12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 28778.4083647834 0.00000000000000 0.00000000000000 0.00000000000000 > > makes me think something isn't quite right. Any guesses what's going on? I have ~3400 data points, roughly evenly spread out over a 28,000nm x 23,000nm grid. > > On another note, how well is this going to scale up? If I end up collecting hundreds of thousands to low-millions of points, does spline fitting go as O(n^2), or more like O(n)? The error registration function runs as O(n*O(fitting)), and takes around 5 seconds now, so O(N) spline fitting is fine, about an hour run time total, but O(n^2) is very much not. > Peter Combs > peter.combs at berkeley.edu > > > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From sccolbert at gmail.com Thu Nov 26 17:45:49 2009 From: sccolbert at gmail.com (S. Chris Colbert) Date: Thu, 26 Nov 2009 23:45:49 +0100 Subject: [SciPy-User] [OT] Transform (i.e., Fourier, Laplace, etc.) methods in Prob. & Stats. In-Reply-To: <45d1ab480911261405p6dbca723j484942a14cbae52@mail.gmail.com> References: <45d1ab480911251153i5e70b5fbp93cdd8f9f2b745ec@mail.gmail.com> <45d1ab480911261405p6dbca723j484942a14cbae52@mail.gmail.com> Message-ID: <200911262345.49903.sccolbert@gmail.com> i dont know if it will be of any use to you, but: Laplace transforms and their inversions are also used extensively in control theory. I wrote some python for the numerical inversion awhile back (before i was any good in python, be warned!) 
There are two different inversion methods in the attached file: the method of riemann sums is the faster of the two, and here is the reference from which I made my implementation: http://books.google.com/books?id=CmX1aHur7jcC&pg=PA410&lpg=PA410&dq=%27%27%27This+algorithm+is+proposed+by+Tzou, +Ozisik+and+Chiffelle+%281994%29%27%27%27&source=bl&ots=NSiw3tKRvG&sig=cqfa_ka_baPbcnhoSA9Gcxo8Vj8&hl=en&ei=YgQPS-3dKJ2qmwPO6LncBQ&sa=X&oi=book_result&ct=result&resnum=1&ved=0CAgQ6AEwAA#v=onepage&q=%27%27%27This%20algorithm%20is%20proposed%20by%20Tzou%2C%20Ozisik%20and%20Chiffelle%20%281994%29%27%27%27&f=false You can probably throw out the stehfest method. I was using it originally as it was faster, but then I managed to vectorize the riemann method. I had no luck vectorizing the stehfest method. Cheers, Chris > Sounds good, thanks again! > > DG > > On Thu, Nov 26, 2009 at 12:34 PM, nicky van foreest wrote: > > > Chung, K. L., 2000. "A Course In Probability Theory, 2nd Ed." Academic. > > > looks like a really good general reference, Nicky - I assume this is > > > the "Chung" to which you were referring? Thanks!!! > > > > That is the one indeed. > > > > For z transforms you might like generatingfunctionoly (or something > > like this). Search on Herbert Wilf, On his website you can find a very > > nice book on the uses of z transforms. The first chapter is very > > accessible. > > > > HOpe this helps. > > > > NIcky > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- A non-text attachment was scrubbed... Name: inverselaplace.py Type: text/x-python Size: 4864 bytes Desc: not available URL: From david_baddeley at yahoo.com.au Thu Nov 26 19:30:27 2009 From: david_baddeley at yahoo.com.au (David Baddeley) Date: Thu, 26 Nov 2009 16:30:27 -0800 (PST) Subject: [SciPy-User] Bivariate Spline Surface Fitting In-Reply-To: Message-ID: <393158.32904.qm@web33005.mail.mud.yahoo.com> Hi Peter, would that be localization microscopy data by any chance? Which method are you using? We're using a very similar approach to correct our chromatic shift, I suspect that the problem you're having with the standard spline is that if you have a few points where the shift estimation is way off (when you've got 1000's of points you're almost guaranteed to have a few in the tails of a distribution) and they're pulling the interpolation out of whack in their neighbourhood. To get around this, we've ended up using a smoothing spline rather than a simple bivariate spline (see code below). We also preprocess the data to throw out any shift measurements which are pointing in dramatically different directions to their neighbours. 
def genShiftVectorFieldSpline(x,y, sx, sy, err_sx, err_sy): '''interpolates shift vectors using smoothing splines''' wonky = findWonkyVectors(x, y, sx, sy, tol=2*err_sx.mean()) good = wonky == 0 print '%d wonky vectors found and discarded' % wonky.sum() spx = SmoothBivariateSpline(x[good], y[good], sx[good], 1./err_sx[good]) spy = SmoothBivariateSpline(x[good], y[good], sy[good], 1./err_sy[good]) X, Y = np.meshgrid(np.arange(0, 512*70, 100), np.arange(0, 256*70, 100)) dx = spx.ev(X.ravel(),Y.ravel()).reshape(X.shape) dy = spy.ev(X.ravel(),Y.ravel()).reshape(X.shape) return (dx.T, dy.T, spx, spy) I've never found that I need more than a few thousand points to calculate a shift field which will get the error down to the 10nm regime, and the most I've tried fitting is probably ~10-20K points, which would only have taken a couple of minutes. Evaluating the splines is fast though, so you should have no worries evaluating with millions of points. Cheers, David --- On Fri, 27/11/09, Peter Combs wrote: > From: Peter Combs > Subject: [SciPy-User] Bivariate Spline Surface Fitting > To: scipy-user at scipy.org > Received: Friday, 27 November, 2009, 10:34 AM > Hi all, > I have localization data in 2 color channels that should > agree with each other, but in practice, they don't to the > level we want. I thought I'd try doing a straight polynomial > least squares fit, and while that gives better registration > between the two, I'm still not to the level I want. My next > thought was a spline fit, so I'm trying to make two > least-squares bivariate spline fits: one for taking (x,y) to > x', and one for taking (x,y) to y'. > > > import scipy.interpolate as interp > ... > def makeLSQspline(xl, yl, xr, yr): > ???"""docstring for makespline""" > ??? > ???xmin = xr.min()-1 > ???xmax = xr.max()+1 > ???ymin = yr.min()-1 > ???ymax = yr.max()+1 > ???n = len(xl) > ??? > ???print "xrange: ", xmin, xmax, '\t', > "yrange: ", ymin, ymax > ??? > ???yknots, xknots = mgrid[ymin:ymax:10j, > xmin:xmax:10j]???# Makes an 11x11 regular > grid of knot locations > ??? > ???xspline = interp.LSQBivariateSpline(xr, > yr, xl, xknots.flat, yknots.flat) > ???yspline = interp.LSQBivariateSpline(xr, > yr, yl, xknots.flat, yknots.flat) > ??? > ???def mapping(xr, yr): > ? ? ? xl = xspline.ev(xr, yr) > ? ? ? yl = yspline.ev(xr, yr) > ? ? ? return xl, yl > ???return mapping > > > I have a "Registration Error" function which calculates a > mapping for all but the ith point, then plugs that point > into the mapping and finds the difference between the > predicted value and the known value. For the 2nd order > polynomial fit, I get a mean registration error around 7nm, > but for the spline fitting using the function above, the > mean error is more like 20,000nm. Which (along with all the > random junk that gets spit out, such as > /Library/Frameworks/Python.framework/Versions/5.1.0/lib/python2.5/site-packages/scipy/interpolate/fitpack2.py:498: > UserWarning: > Error on entry, no approximation returned. 
The following > conditions > must hold: > xb<=x[i]<=xe, yb<=y[i]<=ye, w[i]>0, > i=0..m-1 > If iopt==-1, then > xb yb warnings.warn(message) > > about 1 copy of this (or something similar) per call of > makeLSQSpline: > tx= 0.00000000000000 0.00000000000000 0.00000000000000 > -264.022095089756 337.266858978529 3468.05790461355 > 6598.84895024857 9729.63999588358 12860.4310415186 > 15991.2220871536 19122.0131327886 22252.8041784237 > 25383.5952240587 28514.3862696937 337.266858978529 > 3468.05790461355 6598.84895024857 9729.63999588358 > 12860.4310415186 15991.2220871536 19122.0131327886 > 22252.8041784237 25383.5952240587 28514.3862696937 > 337.266858978529 3468.05790461355 6598.84895024857 > 9729.63999588358 12860.4310415186 15991.2220871536 > 19122.0131327886 22252.8041784237 25383.5952240587 > 28514.3862696937 337.266858978529 3468.05790461355 > 6598.84895024857 9729.63999588358 12860.4310415186 > 15991.2220871536 19122.0131327886 22252.8041784237 > 25383.5952240587 28514.3862696937 337.266858978529 > 3468.05790461355 6598.84895024857 9729.63999588358 > 12860.4310415186 15991.2220871536 19122.0131327886 > 22252.8041784237 25383.5952240587 28514.3862696937 > 337.266858978529 3468.05790461355 6598.84895024857 > 9729.63999588358 > ? 12860.4310415186 15991.2220871536 19122.0131327886 > 22252.8041784237 25383.5952240587 28514.3862696937 > 337.266858978529 3468.05790461355 6598.84895024857 > 9729.63999588358 12860.4310415186 15991.2220871536 > 19122.0131327886 22252.8041784237 25383.5952240587 > 28514.3862696937 337.266858978529 3468.05790461355 > 6598.84895024857 9729.63999588358 12860.4310415186 > 15991.2220871536 19122.0131327886 22252.8041784237 > 25383.5952240587 28514.3862696937 337.266858978529 > 3468.05790461355 6598.84895024857 9729.63999588358 > 12860.4310415186 15991.2220871536 19122.0131327886 > 22252.8041784237 25383.5952240587 28514.3862696937 > 337.266858978529 3468.05790461355 6598.84895024857 > 9729.63999588358 12860.4310415186 15991.2220871536 > 19122.0131327886 22252.8041784237 25383.5952240587 > 28514.3862696937 28778.4083647834 0.00000000000000 > 0.00000000000000 0.00000000000000 > tx= 0.00000000000000 0.00000000000000 0.00000000000000 > -264.022095089756 337.266858978529 3468.05790461355 > 6598.84895024857 9729.63999588358 12860.4310415186 > 15991.2220871536 19122.0131327886 22252.8041784237 > 25383.5952240587 28514.3862696937 337.266858978529 > 3468.05790461355 6598.84895024857 9729.63999588358 > 12860.4310415186 15991.2220871536 19122.0131327886 > 22252.8041784237 25383.5952240587 28514.3862696937 > 337.266858978529 3468.05790461355 6598.84895024857 > 9729.63999588358 12860.4310415186 15991.2220871536 > 19122.0131327886 22252.8041784237 25383.5952240587 > 28514.3862696937 337.266858978529 3468.05790461355 > 6598.84895024857 9729.63999588358 12860.4310415186 > 15991.2220871536 19122.0131327886 22252.8041784237 > 25383.5952240587 28514.3862696937 337.266858978529 > 3468.05790461355 6598.84895024857 9729.63999588358 > 12860.4310415186 15991.2220871536 19122.0131327886 > 22252.8041784237 25383.5952240587 28514.3862696937 > 337.266858978529 3468.05790461355 6598.84895024857 > 9729.63999588358 > ? 
12860.4310415186 15991.2220871536 19122.0131327886 > 22252.8041784237 25383.5952240587 28514.3862696937 > 337.266858978529 3468.05790461355 6598.84895024857 > 9729.63999588358 12860.4310415186 15991.2220871536 > 19122.0131327886 22252.8041784237 25383.5952240587 > 28514.3862696937 337.266858978529 3468.05790461355 > 6598.84895024857 9729.63999588358 12860.4310415186 > 15991.2220871536 19122.0131327886 22252.8041784237 > 25383.5952240587 28514.3862696937 337.266858978529 > 3468.05790461355 6598.84895024857 9729.63999588358 > 12860.4310415186 15991.2220871536 19122.0131327886 > 22252.8041784237 25383.5952240587 28514.3862696937 > 337.266858978529 3468.05790461355 6598.84895024857 > 9729.63999588358 12860.4310415186 15991.2220871536 > 19122.0131327886 22252.8041784237 25383.5952240587 > 28514.3862696937 28778.4083647834 0.00000000000000 > 0.00000000000000 0.00000000000000 > > makes me think something isn't quite right. Any guesses > what's going on? I have ~3400 data points, roughly evenly > spread out over a 28,000nm x 23,000nm grid. > > On another note, how well is this going to scale up? If I > end up collecting hundreds of thousands to low-millions of > points, does spline fitting go as O(n^2), or more like O(n)? > The error registration function runs as O(n*O(fitting)), and > takes around 5 seconds now, so O(N) spline fitting is fine, > about an hour run time total, but O(n^2) is very much not. > Peter Combs > peter.combs at berkeley.edu > > > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Thu Nov 26 23:03:04 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 26 Nov 2009 23:03:04 -0500 Subject: [SciPy-User] Bivariate Spline Surface Fitting In-Reply-To: References: Message-ID: <1cd32cbb0911262003k1d1d3424paeeb5f55d2eac19@mail.gmail.com> On Thu, Nov 26, 2009 at 4:34 PM, Peter Combs wrote: > Hi all, > I have localization data in 2 color channels that should agree with each other, but in practice, they don't to the level we want. I thought I'd try doing a straight polynomial least squares fit, and while that gives better registration between the two, I'm still not to the level I want. My next thought was a spline fit, so I'm trying to make two least-squares bivariate spline fits: one for taking (x,y) to x', and one for taking (x,y) to y'. > > > import scipy.interpolate as interp > ... > def makeLSQspline(xl, yl, xr, yr): > ? """docstring for makespline""" > > ? xmin = xr.min()-1 > ? xmax = xr.max()+1 > ? ymin = yr.min()-1 > ? ymax = yr.max()+1 > ? n = len(xl) > > ? print "xrange: ", xmin, xmax, '\t', "yrange: ", ymin, ymax > > ? yknots, xknots = mgrid[ymin:ymax:10j, xmin:xmax:10j] ? # Makes an 11x11 regular grid of knot locations knots should only specify the point of x and y not all grid points, I added an s to play with the border values following the example in the tests. most of it just trial and error, since I don't have a good example of what I should get as a result with the following knots, it finishes without warning and errors, and I get some numbers back that might be reasonable. s = 1.1 yknots = np.linspace(ymin+s,ymax-s,10) xknots = np.linspace(xmin+s,xmax-s,10) Some good examples for the use of the different options in the spline classes would be nice. 
the docs are still pretty bad, but there is: 473 Input: 474 x,y,z - 1-d sequences of data points (order is not 475 important) 476 tx,ty - strictly ordered 1-d sequences of knots 477 coordinates. 478 Optional input: 479 w - positive 1-d sequence of weights 480 bbox - 4-sequence specifying the boundary of 481 the rectangular approximation domain. 482 By default, bbox=[min(x,tx),max(x,tx), 483 min(y,ty),max(y,ty)] 484 kx,ky=3,3 - degrees of the bivariate spline. 485 eps - a threshold for determining the effective rank 486 of an over-determined linear system of 487 equations. 0 < eps < 1, default is 1e-16. Josef > > ? xspline = interp.LSQBivariateSpline(xr, yr, xl, xknots.flat, yknots.flat) > ? yspline = interp.LSQBivariateSpline(xr, yr, yl, xknots.flat, yknots.flat) > > ? def mapping(xr, yr): > ? ? ?xl = xspline.ev(xr, yr) > ? ? ?yl = yspline.ev(xr, yr) > ? ? ?return xl, yl > ? return mapping > > > I have a "Registration Error" function which calculates a mapping for all but the ith point, then plugs that point into the mapping and finds the difference between the predicted value and the known value. For the 2nd order polynomial fit, I get a mean registration error around 7nm, but for the spline fitting using the function above, the mean error is more like 20,000nm. Which (along with all the random junk that gets spit out, such as > /Library/Frameworks/Python.framework/Versions/5.1.0/lib/python2.5/site-packages/scipy/interpolate/fitpack2.py:498: UserWarning: > Error on entry, no approximation returned. The following conditions > must hold: > xb<=x[i]<=xe, yb<=y[i]<=ye, w[i]>0, i=0..m-1 > If iopt==-1, then > xb yb warnings.warn(message) > > about 1 copy of this (or something similar) per call of makeLSQSpline: > tx= 0.00000000000000 0.00000000000000 0.00000000000000 -264.022095089756 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 > ?12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 28778.4083647834 0.00000000000000 
0.00000000000000 0.00000000000000 > tx= 0.00000000000000 0.00000000000000 0.00000000000000 -264.022095089756 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 > ?12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 337.266858978529 3468.05790461355 6598.84895024857 9729.63999588358 12860.4310415186 15991.2220871536 19122.0131327886 22252.8041784237 25383.5952240587 28514.3862696937 28778.4083647834 0.00000000000000 0.00000000000000 0.00000000000000 > > makes me think something isn't quite right. Any guesses what's going on? I have ~3400 data points, roughly evenly spread out over a 28,000nm x 23,000nm grid. > > On another note, how well is this going to scale up? If I end up collecting hundreds of thousands to low-millions of points, does spline fitting go as O(n^2), or more like O(n)? The error registration function runs as O(n*O(fitting)), and takes around 5 seconds now, so O(N) spline fitting is fine, about an hour run time total, but O(n^2) is very much not. > Peter Combs > peter.combs at berkeley.edu > > > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From cool-rr at cool-rr.com Fri Nov 27 01:02:17 2009 From: cool-rr at cool-rr.com (Ram Rachum) Date: Fri, 27 Nov 2009 06:02:17 +0000 (UTC) Subject: [SciPy-User] Tool for visualizing queues Message-ID: I'm working on a simulation in Queueing Theory, and I would like a good tool for visualizing clients standing in queues and servers serving them. I would eventually like the GUI to be interactive as well, so for example the user could drag a client from one queue to another. Any ideas? Ram. 
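For a bare-bones starting point (an illustrative sketch, not a recommendation from the thread; the M/M/1 set-up, rates, seed and helper names are all assumptions), a single-server queue with exponentially distributed interarrival and service times can be simulated and "visualized" as plain text like this:

import numpy as np

def simulate_mm1(lam=0.8, mu=1.0, n_customers=20, seed=0):
    # exponentially distributed interarrival (rate lam) and service (rate mu) times
    rng = np.random.RandomState(seed)
    arrival = np.cumsum(rng.exponential(1.0 / lam, size=n_customers))
    service = rng.exponential(1.0 / mu, size=n_customers)
    depart = np.empty(n_customers)
    for i in range(n_customers):
        # service starts when both the customer and the single server are free
        start = arrival[i] if i == 0 else max(arrival[i], depart[i - 1])
        depart[i] = start + service[i]
    return arrival, depart

def show_queue(arrival, depart):
    # crude text read-out: one '*' per customer in the system at each arrival epoch
    for t in arrival:
        in_system = int(np.sum((arrival <= t) & (depart > t)))
        print("t=%7.2f  %s" % (t, "*" * in_system))

arrival, depart = simulate_mm1()
show_queue(arrival, depart)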
From cool-rr at cool-rr.com Fri Nov 27 01:49:33 2009 From: cool-rr at cool-rr.com (Ram Rachum) Date: Fri, 27 Nov 2009 06:49:33 +0000 (UTC) Subject: [SciPy-User] =?utf-8?q?Mean_arrivals_per_time_unit_-=3E_Time_betw?= =?utf-8?q?een=09consecutive_arrivals?= References: <1cd32cbb0911250820p4b662f3eyab200cfef6c1f68b@mail.gmail.com> Message-ID: gmail.com> writes: > > YOu should take the interarrival time between two consecutive arrivals > > to be exponentially distributed with rate lambda, where lambda is the > > arrival rate. LIke this the number of arrivals in a fixed period is > > Poisson distributed. I never tried, but I suppose scipy contains a > > module to generate exponentially distributed rv's. > > The sum of iid exponential distributed rvs is gamma distributed > http://en.wikipedia.org/wiki/Gamma_distribution > > all available in scipy.stats > > Josef I don't understand. So you mean that the exponential thing would NOT be the right thing for the time between consecutive arrivals? Also, why doesn't scipy automatically gives me the time between consecutive arrivals when I give the mean number of arrivals per time period? Ram. From cimrman3 at ntc.zcu.cz Fri Nov 27 03:08:44 2009 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Fri, 27 Nov 2009 09:08:44 +0100 Subject: [SciPy-User] [Fwd: Re: ANN: SfePy 2009.4] In-Reply-To: <200911261819.19584.sccolbert@gmail.com> References: <4B0D0E33.4010703@ntc.zcu.cz> <20091125110643.GB21484@phare.normalesup.org> <4B0D1603.40509@ntc.zcu.cz> <200911261819.19584.sccolbert@gmail.com> Message-ID: <4B0F890C.107@ntc.zcu.cz> Hi Chris, thanks for trying sfepy! Let's discuss this at sfepy-devel, or, if you do not want to register there, write me personally, this is IMHO rather off-topic for the scipy-user list. S. Chris Colbert wrote: > I'm getting all sorts of errors trying to run sfepy tests and examples: Can you copy & paste the output and send it to me? I wonder what errors you get. > It builds fine. > > But I fail one of the solvers test because of a bug with OpenMPI (whichever > solver is using Petsc4py which btw, is not listed as a dependency). Yes, PETSc (with Petsc4py) are optional packages, used only by test_linear_solvers.py test file. It should not fail, however, only skip the test. > The schroedinger example runs, but produces erroneous output (~300% error). > The poisson and valec examples produce error results. > > System specs: > Kubuntu 9.10 x64 > > Self built/easy_insall: > Numpy 1.3.0 > Scipy 0.7.1 > Newest umfpack scikit > Newest Petsc4Py > pytables > > from the repos: > hdf5-serial > openmpi 1.6.6 > pysparse > > Any help would be awesome! I will gladly help you, but I need more information - could you send me (off-list) the full outputs of the simulations that do not work, or produce error results? Also attach the solution files if you find it necessary. Thanks, r. From peter.combs at berkeley.edu Fri Nov 27 05:42:51 2009 From: peter.combs at berkeley.edu (Peter Combs) Date: Fri, 27 Nov 2009 02:42:51 -0800 Subject: [SciPy-User] Bivariate Spline Surface Fitting In-Reply-To: <1cd32cbb0911262003k1d1d3424paeeb5f55d2eac19@mail.gmail.com> References: <1cd32cbb0911262003k1d1d3424paeeb5f55d2eac19@mail.gmail.com> Message-ID: <57866FE3-9506-48EA-BDC8-E634B5F48E37@berkeley.edu> On Nov 26, 2009, at 4:30 PM, David Baddeley wrote: > Hi Peter, > > would that be localization microscopy data by any chance? Which method are you using? Indeed it is! 
The lab I'm working in does a lot of FIONA (Fluorescence Imaging with One Nanometer Accuracy), although the data I'm using is a couple steps removed from the usual assays that are done. On Nov 26, 2009, at 8:03 PM, josef.pktd at gmail.com wrote: > knots should only specify the point of x and y not all grid points, I > added an s to play with the border values following the example in the > tests. most of it just trial and error, since I don't have a good > example of what I should get as a result > Thanks, that pretty much did it, I think. I'm still playing with the number of knots to see what gives reasonable results. Taking it up to 75 brings my error down to under 4 nm, which is starting to get to the limit of what we could do in one channel anyways. > with the following knots, it finishes without warning and errors, and > I get some numbers back that might be reasonable. > Now I'm getting this warning, but given that the results are very usable, I'm not too worried: The coefficients of the spline returned have been computed as the minimal norm least-squares solution of a (numerically) rank deficient system (deficiency=92). If deficiency is large, the results may be inaccurate. Deficiency may strongly depend on the value of eps. warnings.warn(message) I think it's saying that there are some grid squares that don't have enough points to calculate a fit, is that right? I'm pretty sure these are at the edge of the mesh, and shouldn't be a big problem. > s = 1.1 > yknots = np.linspace(ymin+s,ymax-s,10) > xknots = np.linspace(xmin+s,xmax-s,10) > > Some good examples for the use of the different options in the spline > classes would be nice. > Yeah. I think once I get things mostly figured out I'll try and condense what I have into an example or two. My problem is that I still don't *really* understand the difference between all these different kinds of splines, so I'll probably want someone to make sure I'm not going totally off the deep end. Peter Combs peter.combs at berkeley.edu From vanforeest at gmail.com Fri Nov 27 07:33:04 2009 From: vanforeest at gmail.com (nicky van foreest) Date: Fri, 27 Nov 2009 13:33:04 +0100 Subject: [SciPy-User] Tool for visualizing queues In-Reply-To: References: Message-ID: Hi Ram, You could have a look at omnetpp. In a simulator I used at Bell Labs there was a small number, like an index, that showed the number of customers in queue (the system). bye Nicky 2009/11/27 Ram Rachum : > I'm working on a simulation in Queueing Theory, and I would like a good tool for > visualizing clients standing in queues and servers serving them. I would > eventually like the GUI to be interactive as well, so for example the user could > drag a client from one queue to another. > > Any ideas? > > Ram. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From vanforeest at gmail.com Fri Nov 27 07:36:18 2009 From: vanforeest at gmail.com (nicky van foreest) Date: Fri, 27 Nov 2009 13:36:18 +0100 Subject: [SciPy-User] Mean arrivals per time unit -> Time between consecutive arrivals In-Reply-To: References: <1cd32cbb0911250820p4b662f3eyab200cfef6c1f68b@mail.gmail.com> Message-ID: > Also, why doesn't scipy automatically gives me the time between consecutive arrivals when I give the mean number of arrivals per time period? Consider a scenario in which precisely t time units fit in between two arrivals. The arrival rate would then be 1/t.
When customers arrive with exponentially distributed interarrival times and at rate 1/t, the arrival rate is the same in both scenarios, but the interarrival times not. > > Ram. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Fri Nov 27 09:08:01 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 27 Nov 2009 09:08:01 -0500 Subject: [SciPy-User] Bivariate Spline Surface Fitting In-Reply-To: <57866FE3-9506-48EA-BDC8-E634B5F48E37@berkeley.edu> References: <1cd32cbb0911262003k1d1d3424paeeb5f55d2eac19@mail.gmail.com> <57866FE3-9506-48EA-BDC8-E634B5F48E37@berkeley.edu> Message-ID: <1cd32cbb0911270608x2a53f411l193111cf9bbc647c@mail.gmail.com> On Fri, Nov 27, 2009 at 5:42 AM, Peter Combs wrote: > On Nov 26, 2009, at 4:30 PM, David Baddeley wrote: > >> Hi Peter, >> >> would that be localization microscopy data by any chance? Which method are you using? > > Indeed it is! ?The lab I'm working in does a lot of FIONA (Fluorescence Imaging with One Nanometer Accuracy), although the data I'm using is a couple steps removed from the usual assays that are done. > > On Nov 26, 2009, at 8:03 PM, josef.pktd at gmail.com wrote: >> knots should only specify the point of x and y not all grid points, I >> added an s to play with the border values following the example in the >> tests. most of it just trial and error, since I don't have a good >> example of what I should get as a result >> > > Thanks, that pretty much did it, I think. ?I'm still playing with the number of knots to see what gives reasonable results. ?Taking it up to 75 brings my error down to under 4 nm, which is starting to get to the limit of what we could do in one channel anyways. > >> with the following knots, it finishes without warning and errors, and >> I get some numbers back that might be reasonable. >> > > Now I'm getting this warning, but given that the results are very usable, I'm not too worried: > The coefficients of the spline returned have been computed as the > minimal norm least-squares solution of a (numerically) rank deficient > system (deficiency=92). If deficiency is large, the results may be > inaccurate. Deficiency may strongly depend on the value of eps. > ?warnings.warn(message) > > I think it's saying that there are some grid squares that don't have enough points to calculate a fit, is that right? ?I'm pretty sure these are at the edge of the mesh, and shouldn't be a big problem. In a example from the tests, I got this message when the third variable z didn't have any variation. I guess for interpolation this might not have a strong effect. > >> ?s = 1.1 >> ?yknots = np.linspace(ymin+s,ymax-s,10) >> ?xknots = np.linspace(xmin+s,xmax-s,10) >> >> Some good examples for the use of the different options in the spline >> classes would be nice. >> > > Yeah. ?I think once I get things mostly figured out I'll try and condense what I have into an example or two. ?My problem is that I still don't *really* understand the difference between all these different kinds of splines, so I'll probably want someone to make sure I'm not going totally off the deep end. >From what I figured out so far, the main difference between SmoothBivariateSpline and LSQBivariateSpline is that in the first case the approximation is controlled by s and the knot points are adjusted, in the second case the knot points are fixed and will imply some s. 
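A minimal sketch of that difference on synthetic data (the data, the smoothing value s=len(x) and the knot grid below are illustrative assumptions, not values from this thread):

import numpy as np
from scipy import interpolate

rng = np.random.RandomState(0)
x = rng.uniform(0, 28000, 2000)
y = rng.uniform(0, 23000, 2000)
z = 1e-4 * x + np.sin(y / 5000.0) + 0.01 * rng.standard_normal(2000)

# SmoothBivariateSpline: you choose the smoothing factor s, the routine places the knots
# (s=len(x) is only a rule-of-thumb starting point; tune it for real data)
smooth = interpolate.SmoothBivariateSpline(x, y, z, s=len(x))
tx_auto, ty_auto = smooth.get_knots()   # the knots the routine settled on for this s

# LSQBivariateSpline: you choose the interior knots (1-d sequences inside the data range),
# and the amount of smoothing is whatever those knots imply
tx = np.linspace(x.min() + 1, x.max() - 1, 10)
ty = np.linspace(y.min() + 1, y.max() - 1, 10)
lsq = interpolate.LSQBivariateSpline(x, y, z, tx, ty)

# both objects are evaluated the same way
print(smooth.ev([14000], [11500]), lsq.ev([14000], [11500]))

One way to connect the two, as the comparison suggests, is to call get_knots() on a smooth fit for a trial s and use those knots (or a simplified version of them) as the fixed knots for the LSQ variant.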
(For the univariate splines, we had the discussion on the mailing list that the class names are not very descriptive and misleading) Controlling s (small positive number) looks easier to adjust, than choosing knots. One possibility might be to try out different values of s and see how the chosen knot points (get_knots()) compare to the ones that you are using now. I started to convert a scipy.tutorial example that uses the old wrapper to using spline classes to see the difference in the required arguments, but didn't get very far yet. Having a good (graphical) summary of the spline results, helps a lot to quickly see what the different "smoothing parameters" are doing. Josef > > > Peter Combs > peter.combs at berkeley.edu > > > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Fri Nov 27 09:58:18 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 27 Nov 2009 09:58:18 -0500 Subject: [SciPy-User] Mean arrivals per time unit -> Time between consecutive arrivals In-Reply-To: References: <1cd32cbb0911250820p4b662f3eyab200cfef6c1f68b@mail.gmail.com> Message-ID: <1cd32cbb0911270658g57c35802l489d53a58f55410d@mail.gmail.com> On Fri, Nov 27, 2009 at 1:49 AM, Ram Rachum wrote: > ? gmail.com> writes: >> > YOu should take the interarrival time between two consecutive arrivals >> > to be exponentially distributed with rate lambda, where lambda is the >> > arrival rate. LIke this the number of arrivals in a fixed period is >> > Poisson distributed. I never tried, but I suppose scipy contains a >> > module to generate exponentially distributed rv's. >> >> The sum of iid exponential distributed rvs is gamma distributed >> http://en.wikipedia.org/wiki/Gamma_distribution >> >> all available in scipy.stats >> >> Josef > > I don't understand. So you mean that the exponential thing would NOT be the > right thing for the time between consecutive arrivals? What I meant was that the distribution of the time to the next arrival is exponential distributed. The time until you have k arrivals is the sum of k exponentially distributed random variables and is gamma distributed. For simulation of queuing models http://pypi.python.org/pypi/SimPy looks also useful, although I never used it. Josef > > Also, why doesn't scipy automatically gives me the time between consecutive > arrivals when I give the mean number of arrivals per time period? > > Ram. > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From cool-rr at cool-rr.com Fri Nov 27 10:42:11 2009 From: cool-rr at cool-rr.com (Ram Rachum) Date: Fri, 27 Nov 2009 15:42:11 +0000 (UTC) Subject: [SciPy-User] Tool for visualizing queues References: Message-ID: nicky van foreest gmail.com> writes: > > Hi Ram, > > You could have a look at omnetpp. In a simulator I used at Bell Labs > there was a small number, like an index, that showed the number of > customers in queue (the system). > > bye > > Nicky > Thanks for the tip. Ram. 
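A quick numerical check of the exponential-interarrival / Poisson-count / gamma relationship discussed in this thread (the rate, sample sizes and seed below are arbitrary illustrative choices):

import numpy as np
from scipy import stats

lam = 3.0                      # mean number of arrivals per time unit
rng = np.random.RandomState(1)

# exponential interarrival times with rate lam, i.e. scale 1/lam
gaps = rng.exponential(scale=1.0 / lam, size=200000)
arrivals = np.cumsum(gaps)

# counts per unit-length interval should look Poisson(lam): mean and variance both near lam
edges = np.arange(int(arrivals[-1]) + 1)
counts = np.histogram(arrivals, bins=edges)[0]
print(counts.mean(), counts.var(), lam)

# the time until the k-th arrival is the sum of k exponentials, i.e. gamma(k, scale=1/lam)
k = 5
kth_times = rng.exponential(scale=1.0 / lam, size=(50000, k)).sum(axis=1)
print(kth_times.mean(), stats.gamma.mean(k, scale=1.0 / lam))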
From josef.pktd at gmail.com Fri Nov 27 13:07:53 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 27 Nov 2009 13:07:53 -0500 Subject: [SciPy-User] BivariateSpline examples and my crashing python Message-ID: <1cd32cbb0911271007v383afadfv957f713cc24da093@mail.gmail.com> I wanted to prepare some examples for the use of the Bivariate Spline classes, The second part of the attached script contains a translation of a scipy.tutorial example to using the 3 classes instead. However, this script keeps crashing on me. Initially I thought it is RectBivariateSpline, but now I think it might be matplotlib, TK backend and maybe my latest numpy build. I would like to know if the splines crash or if it is my current setup that causes the crash. It crashes in spyder, idle and when I close the windows when I run it on the commandline. The last might indicate that the problem is matplotlib related. Can someone run the script, preferably not in an interpreter where you want to keep your session alive? Josef -------------- next part -------------- # -*- coding: utf-8 -*- """ Created on Thu Nov 26 22:00:20 2009 Author: josef-pktd and scipy mailinglist example """ import numpy as np from scipy import interpolate import matplotlib.pyplot as plt # from mailing list - Peter Combs def makeLSQspline(xl, yl, xr, yr): """docstring for makespline""" xmin = xr.min()-1 xmax = xr.max()+1 ymin = yr.min()-1 ymax = yr.max()+1 n = len(xl) print "xrange: ", xmin, xmax, '\t', "yrange: ", ymin, ymax s = 1.1 yknots, xknots = np.mgrid[ymin+s:ymax-s:10j, xmin+s:xmax-s:10j] # Makes an 11x11 regular grid of knot locations yknots = np.linspace(ymin+s,ymax-s,10) xknots = np.linspace(xmin+s,xmax-s,10) xspline = interpolate.LSQBivariateSpline(xr, yr, xl, xknots.flat, yknots.flat) yspline = interpolate.LSQBivariateSpline(xr, yr, yl, xknots.flat, yknots.flat) def mapping(xr, yr): xl = xspline.ev(xr, yr) yl = yspline.ev(xr, yr) return xl, yl return mapping, xspline, yspline xr = np.arange(20) yr = np.arange(20) s=0 xr, yr = np.mgrid[0+s:20-s:30j, 0+s:20-s:30j] xr = xr.ravel() yr = yr.ravel() xl = np.sin(xr) + 0.1*np.random.normal(size=xr.shape) yl = yr + 0.1*np.random.normal(size=yr.shape) smap, xspline, yspline = makeLSQspline(xl, yl, xr, yr) #print smap(xr, yr) plt.plot(xl) plt.plot(xr) #plt.show() xsp = interpolate.SmoothBivariateSpline(xr, yr, xl, kx=2,ky=2) print xsp.get_knots() #example from tests, testfitpack.py x = [1,1,1,2,2,2,3,3,3] y = [1,2,3,1,2,3,1,2,3] z = [3,3,4,4,5,6,3,3,3] s = 0.1 tx = [1+s,3-s] ty = [1+s,3-s] lut = interpolate.LSQBivariateSpline(x,y,z,tx,ty,kx=1,ky=1) import numpy as np from scipy import interpolate import matplotlib.pyplot as plt #2d spline interpolation example from the tutorial #------------------------------------------------- # Define function over sparse 20x20 grid x,y = np.mgrid[-1:1:20j,-1:1:20j] z = (x+y)*np.exp(-6.0*(x*x+y*y)) plt.figure() plt.pcolor(x,y,z) plt.colorbar() plt.title("Sparsely sampled function.") #plt.show() # Interpolate function over new 70x70 grid xnew,ynew = np.mgrid[-1:1:70j,-1:1:70j] tck = interpolate.bisplrep(x,y,z,s=0) znew = interpolate.bisplev(xnew[:,0],ynew[0,:],tck) plt.figure() plt.pcolor(xnew,ynew,znew) plt.colorbar() plt.title("Interpolated function - bisplrep") #plt.show() #Use spline classes instead of original wrapper #---------------------------------------------- #use same example as before ### Define function over sparse 20x20 grid ## ##x,y = np.mgrid[-1:1:20j,-1:1:20j] ##z = (x+y)*np.exp(-6.0*(x*x+y*y)) ## ##plt.figure() 
##plt.pcolor(x,y,z) ##plt.colorbar() ##plt.title("Sparsely sampled function.") ###plt.show() #use SmoothBivariateSpline #^^^^^^^^^^^^^^^^^^^^^^^^^ xnew,ynew = np.mgrid[-1:1:70j,-1:1:70j] #tck = interpolate.bisplrep(x,y,z,s=0) intp = interpolate.SmoothBivariateSpline(x.ravel(),y.ravel(),z.ravel(),s=0.01) znew = intp.ev(xnew.ravel(),ynew.ravel()).reshape((70,70)) plt.figure() plt.pcolor(xnew,ynew,znew) plt.colorbar() plt.title("Interpolated function - SmoothBivariateSpline") #plt.show() #use LSQBivariateSpline #^^^^^^^^^^^^^^^^^^^^^^^^^ xnew,ynew = np.mgrid[-1:1:70j,-1:1:70j] #get knots from previous example tx,ty = intp.get_knots() tx = tx[4:-4] # remove endpoints, 4 in this example ty = ty[4:-4] intp = interpolate.LSQBivariateSpline(x.ravel(),y.ravel(),z.ravel(), tx, ty) znew = intp.ev(xnew.ravel(),ynew.ravel()).reshape((70,70)) plt.figure() plt.pcolor(xnew,ynew,znew) plt.colorbar() plt.title("Interpolated function - LSQBivariateSpline") #plt.show() #use RectBivariateSpline #^^^^^^^^^^^^^^^^^^^^^^^^ # this seems to cause a crash, for eg. s=0.001 # or maybe matplotlib related or maybe numpy ABI problems ? # or maybe some random crashing ? # I think it's matplotlib when closing windows intp = interpolate.RectBivariateSpline(x[:,0],y[0,:],z, s=0.001) znew = intp.ev(xnew.ravel(),ynew.ravel()).reshape((70,70)) plt.figure() plt.pcolor(xnew,ynew,znew) plt.colorbar() plt.title("Interpolated function - RectBivariateSpline") plt.show() From jsseabold at gmail.com Fri Nov 27 13:15:48 2009 From: jsseabold at gmail.com (Skipper Seabold) Date: Fri, 27 Nov 2009 13:15:48 -0500 Subject: [SciPy-User] BivariateSpline examples and my crashing python In-Reply-To: <1cd32cbb0911271007v383afadfv957f713cc24da093@mail.gmail.com> References: <1cd32cbb0911271007v383afadfv957f713cc24da093@mail.gmail.com> Message-ID: On Fri, Nov 27, 2009 at 1:07 PM, wrote: > I wanted to prepare some examples for the use of the Bivariate Spline classes, > The second part of the attached script contains a translation of a > scipy.tutorial example to using the 3 classes instead. > > However, this script keeps crashing on me. Initially I thought it is > RectBivariateSpline, but now I think it might be matplotlib, TK > backend and maybe my latest numpy build. > I would like to know if the splines crash or if it is my current setup > that causes the crash. It crashes in spyder, idle and when I close the > windows when I run it on the commandline. The last might indicate that > the problem is matplotlib related. > > Can someone run the script, preferably not in an interpreter where you > want to keep your session alive? > Runs fine for me and creates plots from within the interpreter and on the command line in Linux. I noticed that you recently ran into the segfault problem that occured somewhere in trunk towards the end summer (I forget when and why). Did you delete everything and rebuild matplotlib as well? I had to rebuild everything that had numpy/scipy as a dependency after that update. Don't know if that's what's going on though. Skipper From josef.pktd at gmail.com Fri Nov 27 13:26:49 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 27 Nov 2009 13:26:49 -0500 Subject: [SciPy-User] BivariateSpline examples and my crashing python In-Reply-To: References: <1cd32cbb0911271007v383afadfv957f713cc24da093@mail.gmail.com> Message-ID: <1cd32cbb0911271026g53c0fcf6v2015589216e806c2@mail.gmail.com> On Fri, Nov 27, 2009 at 1:15 PM, Skipper Seabold wrote: > On Fri, Nov 27, 2009 at 1:07 PM, ? 
wrote: >> I wanted to prepare some examples for the use of the Bivariate Spline classes, >> The second part of the attached script contains a translation of a >> scipy.tutorial example to using the 3 classes instead. >> >> However, this script keeps crashing on me. Initially I thought it is >> RectBivariateSpline, but now I think it might be matplotlib, TK >> backend and maybe my latest numpy build. >> I would like to know if the splines crash or if it is my current setup >> that causes the crash. It crashes in spyder, idle and when I close the >> windows when I run it on the commandline. The last might indicate that >> the problem is matplotlib related. >> >> Can someone run the script, preferably not in an interpreter where you >> want to keep your session alive? >> > > Runs fine for me and creates plots from within the interpreter and on > the command line in Linux. ?I noticed that you recently ran into the > segfault problem that occured somewhere in trunk towards the end > summer (I forget when and why). ?Did you delete everything and rebuild > matplotlib as well? ?I had to rebuild everything that had numpy/scipy > as a dependency after that update. ?Don't know if that's what's going > on though. On Windows I cannot rebuild matplotlib, I tried once but it has too many dependencies (and a at least a while ago couldn't be fully build with MingW.) That's why I'm worried about all the ABI breakage that was going on and I only recently started to work with the numpy trunk version. At least RectBivariateSpline works. I got a bit suspicious because it is missing in my (older) docs, and not in http://docs.scipy.org/scipy/docs/scipy-docs/interpolate.rst/ Josef > > Skipper > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From josef.pktd at gmail.com Fri Nov 27 13:27:32 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 27 Nov 2009 13:27:32 -0500 Subject: [SciPy-User] BivariateSpline examples and my crashing python In-Reply-To: <1cd32cbb0911271026g53c0fcf6v2015589216e806c2@mail.gmail.com> References: <1cd32cbb0911271007v383afadfv957f713cc24da093@mail.gmail.com> <1cd32cbb0911271026g53c0fcf6v2015589216e806c2@mail.gmail.com> Message-ID: <1cd32cbb0911271027s50ca924dwa670e4996f035e7d@mail.gmail.com> On Fri, Nov 27, 2009 at 1:26 PM, wrote: > On Fri, Nov 27, 2009 at 1:15 PM, Skipper Seabold wrote: >> On Fri, Nov 27, 2009 at 1:07 PM, ? wrote: >>> I wanted to prepare some examples for the use of the Bivariate Spline classes, >>> The second part of the attached script contains a translation of a >>> scipy.tutorial example to using the 3 classes instead. >>> >>> However, this script keeps crashing on me. Initially I thought it is >>> RectBivariateSpline, but now I think it might be matplotlib, TK >>> backend and maybe my latest numpy build. >>> I would like to know if the splines crash or if it is my current setup >>> that causes the crash. It crashes in spyder, idle and when I close the >>> windows when I run it on the commandline. The last might indicate that >>> the problem is matplotlib related. >>> >>> Can someone run the script, preferably not in an interpreter where you >>> want to keep your session alive? >>> >> >> Runs fine for me and creates plots from within the interpreter and on >> the command line in Linux. ?I noticed that you recently ran into the >> segfault problem that occured somewhere in trunk towards the end >> summer (I forget when and why). 
?Did you delete everything and rebuild >> matplotlib as well? ?I had to rebuild everything that had numpy/scipy >> as a dependency after that update. ?Don't know if that's what's going >> on though. > > On Windows I cannot rebuild matplotlib, I tried once but it has too > many dependencies (and a at least a while ago couldn't be fully > build with MingW.) > > That's why I'm worried about all the ABI breakage that was going on > and I only recently started to work with the numpy trunk version. > > At least RectBivariateSpline works. I got a bit suspicious because > it is missing in my (older) docs, and not in > http://docs.scipy.org/scipy/docs/scipy-docs/interpolate.rst/ And thank you for checking the script. Josef > > Josef > >> >> Skipper >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > From kmichael.aye at googlemail.com Fri Nov 27 14:49:10 2009 From: kmichael.aye at googlemail.com (Michael Aye) Date: Fri, 27 Nov 2009 11:49:10 -0800 (PST) Subject: [SciPy-User] How to find local minimum of 1d histogram Message-ID: <813bc45b-2d43-4729-a1fa-5f59bcc988b7@x31g2000yqx.googlegroups.com> Hi! I am still fairly new with scipy, so please forgive me, if this is a simple question. But I couldn't find an example for this. What is the easiest way of finding the local minimum between 2 gaussian-like peaks in a 1d Histogram? Background: Using a histogram on an image to identify 2 populations of intensities. The minimum between the gaussian-like peaks in the histogram shall be used as the masking limit to either show one or the other population of pixel intensities. My idea so far, but I'm not sure, if there is not a more obvious way? * Using interpolate1d to get a spline. * somehow get the coefficients of the spline function. * put them into poly1d * do derivative * get roots of derivative I am ready to go this way, but I wondered if it isn't easier? Best regards and a nice weekend! Michael From robince at gmail.com Fri Nov 27 15:12:20 2009 From: robince at gmail.com (Robin) Date: Fri, 27 Nov 2009 20:12:20 +0000 Subject: [SciPy-User] How to find local minimum of 1d histogram In-Reply-To: <813bc45b-2d43-4729-a1fa-5f59bcc988b7@x31g2000yqx.googlegroups.com> References: <813bc45b-2d43-4729-a1fa-5f59bcc988b7@x31g2000yqx.googlegroups.com> Message-ID: <2d5132a50911271212j273088c2pe652c99f38062d1c@mail.gmail.com> On Fri, Nov 27, 2009 at 7:49 PM, Michael Aye wrote: > Hi! > > I am still fairly new with scipy, so please forgive me, if this is a > simple question. But I couldn't find an example for this. > > What is the easiest way of finding the local minimum between 2 > gaussian-like peaks in a 1d Histogram? > > Background: > Using a histogram on an image to identify 2 populations of > intensities. > The minimum between the gaussian-like peaks in the histogram shall be > used as the masking limit to either show one or the other population > of pixel intensities. > > My idea so far, but I'm not sure, if there is not a more obvious way? > * Using interpolate1d to get a spline. > * somehow get the coefficients of the spline function. > * put them into poly1d > * do derivative > * get roots of derivative > > I am ready to go this way, but I wondered if it isn't easier? >From the histogram you get a vector of counts - couldn't you do a diff on the vector and look for where that changes sign? If its a bit noisy you could look for where it changes sign or perhaps smooth before diffing. 
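A rough sketch of that idea, with made-up bimodal data and an arbitrary smoothing window:

import numpy as np

# made-up bimodal "intensity" data standing in for the image histogram
data = np.concatenate([np.random.normal(2.0, 0.5, 5000),
                       np.random.normal(5.0, 0.7, 5000)])
counts, edges = np.histogram(data, bins=100)

# short moving average to tame bin-to-bin noise
kernel = np.ones(5) / 5.0
smoothed = np.convolve(counts, kernel, mode='same')

# a local minimum is where the first difference changes sign from - to +
d = np.diff(smoothed)
minima = np.where((d[:-1] < 0) & (d[1:] > 0))[0] + 1
print(edges[minima])    # candidate thresholds between the two peaks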
Cheers Robin From dwf at cs.toronto.edu Fri Nov 27 16:12:15 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Fri, 27 Nov 2009 16:12:15 -0500 Subject: [SciPy-User] How to find local minimum of 1d histogram In-Reply-To: <813bc45b-2d43-4729-a1fa-5f59bcc988b7@x31g2000yqx.googlegroups.com> References: <813bc45b-2d43-4729-a1fa-5f59bcc988b7@x31g2000yqx.googlegroups.com> Message-ID: <04150ACF-9244-4524-81E3-6B11F281E5FD@cs.toronto.edu> On 27-Nov-09, at 2:49 PM, Michael Aye wrote: > The minimum between the gaussian-like peaks in the histogram shall be > used as the masking limit to either show one or the other population > of pixel intensities. > > My idea so far, but I'm not sure, if there is not a more obvious way? > * Using interpolate1d to get a spline. > * somehow get the coefficients of the spline function. > * put them into poly1d > * do derivative > * get roots of derivative I had a similar problem, actually, and used scipy.ndimage.gaussian_laplace, which will produce a smoothed discrete second derivative. The minimum should be pretty easy to locate (it will appear as a rather significant maximum peak in the transformed curve). David From josef.pktd at gmail.com Fri Nov 27 17:07:49 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 27 Nov 2009 17:07:49 -0500 Subject: [SciPy-User] How to find local minimum of 1d histogram In-Reply-To: <04150ACF-9244-4524-81E3-6B11F281E5FD@cs.toronto.edu> References: <813bc45b-2d43-4729-a1fa-5f59bcc988b7@x31g2000yqx.googlegroups.com> <04150ACF-9244-4524-81E3-6B11F281E5FD@cs.toronto.edu> Message-ID: <1cd32cbb0911271407n450f80d0nc37d1e17d87b0824@mail.gmail.com> On Fri, Nov 27, 2009 at 4:12 PM, David Warde-Farley wrote: > > On 27-Nov-09, at 2:49 PM, Michael Aye wrote: > >> The minimum between the gaussian-like peaks in the histogram shall be >> used as the masking limit to either show one or the other population >> of pixel intensities. >> >> My idea so far, but I'm not sure, if there is not a more obvious way? >> * Using interpolate1d to get a spline. >> * somehow get the coefficients of the spline function. >> * put them into poly1d >> * do derivative >> * get roots of derivative > > I had a similar problem, actually, and used > scipy.ndimage.gaussian_laplace, which will produce a smoothed discrete > second derivative. The minimum should be pretty easy to locate (it > will appear as a rather significant maximum peak in the transformed > curve). In a similar direction, I thought of using gaussian_kde to get a smoothed probability distribution. and look for local minimum. Josef > > David > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From kgdunn at gmail.com Fri Nov 27 17:31:32 2009 From: kgdunn at gmail.com (Kevin Dunn) Date: Fri, 27 Nov 2009 17:31:32 -0500 Subject: [SciPy-User] How to find local minimum of 1d histogram Message-ID: > On Fri, Nov 27, 2009 at 4:12 PM, David Warde-Farley wrote: >> >> On 27-Nov-09, at 2:49 PM, Michael Aye wrote: >> >>> The minimum between the gaussian-like peaks in the histogram shall be >>> used as the masking limit to either show one or the other population >>> of pixel intensities. >>> >>> My idea so far, but I'm not sure, if there is not a more obvious way? >>> * Using interpolate1d to get a spline. >>> * somehow get the coefficients of the spline function. 
>>> * put them into poly1d >>> * do derivative >>> * get roots of derivative >> >> I had a similar problem, actually, and used >> scipy.ndimage.gaussian_laplace, which will produce a smoothed discrete >> second derivative. The minimum should be pretty easy to locate (it >> will appear as a rather significant maximum peak in the transformed >> curve). > > In a similar direction, I thought of using gaussian_kde to get a > smoothed probability distribution. and look for local minimum. Yet another way: Otsu's method [1], which is a standard algorithm in image processing to segment an image. There are other methods as well. When I've used Otsu's method from real-time image processing (under unpredictable lighting), I use it only to provide a starting value. Then you move left or right along the smoothed histogram (I normally just use a moving average smoother, because other exotic smoothers take too much time and don't improve accuracy that much) until you land up in a minimum. Usually the Otsu initial guess isn't far off, but it can be under some circumstances. [1] http://en.wikipedia.org/wiki/Otsu%27s_method (also see the references at the bottom) HTH, Kevin > Josef > >> >> David >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user From kmichael.aye at googlemail.com Fri Nov 27 19:43:42 2009 From: kmichael.aye at googlemail.com (Michael Aye) Date: Fri, 27 Nov 2009 16:43:42 -0800 (PST) Subject: [SciPy-User] How to find local minimum of 1d histogram In-Reply-To: References: Message-ID: <02b80856-e735-4977-88fb-c39af845df23@m3g2000yqf.googlegroups.com> Thanks to you all, love this forum! That will keep me busy on the weekend! ;) BR, Michael On Nov 27, 11:31?pm, Kevin Dunn wrote: > > On Fri, Nov 27, 2009 at 4:12 PM, David Warde-Farley wrote: > > >> On 27-Nov-09, at 2:49 PM, Michael Aye wrote: > > >>> The minimum between the gaussian-like peaks in the histogram shall be > >>> used as the masking limit to either show one or the other population > >>> of pixel intensities. > > >>> My idea so far, but I'm not sure, if there is not a more obvious way? > >>> * Using interpolate1d to get a spline. > >>> * somehow get the coefficients of the spline function. > >>> * put them into poly1d > >>> * do derivative > >>> * get roots of derivative > > >> I had a similar problem, actually, and used > >> scipy.ndimage.gaussian_laplace, which will produce a smoothed discrete > >> second derivative. The minimum should be pretty easy to locate (it > >> will appear as a rather significant maximum peak in the transformed > >> curve). > > > In a similar direction, I thought of using gaussian_kde to get a > > smoothed probability distribution. and look for local minimum. > > Yet another way: Otsu's method [1], which is a standard algorithm in > image processing to segment an image. ?There are other methods as > well. > > When I've used Otsu's method from real-time image processing (under > unpredictable lighting), I use it only to provide a starting value. > Then you move left or right along the smoothed histogram (I normally > just use a moving average smoother, because other exotic smoothers > take too much time and don't improve accuracy that much) until you > land up in a minimum. > > Usually the Otsu initial guess isn't far off, but it can be under some > circumstances. 
> > [1]http://en.wikipedia.org/wiki/Otsu%27s_method(also see the > references at the bottom) > > HTH, > Kevin > > > Josef > > >> David > >> _______________________________________________ > >> SciPy-User mailing list > >> SciPy-U... at scipy.org > >>http://mail.scipy.org/mailman/listinfo/scipy-user > > _______________________________________________ > SciPy-User mailing list > SciPy-U... at scipy.orghttp://mail.scipy.org/mailman/listinfo/scipy-user From nwagner at iam.uni-stuttgart.de Sat Nov 28 03:59:03 2009 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Sat, 28 Nov 2009 09:59:03 +0100 Subject: [SciPy-User] splprep example Message-ID: Hi all, I am looking for a cookbook example wrt splprep. Any pointer would be appreciated. Nils From cool-rr at cool-rr.com Sat Nov 28 05:04:41 2009 From: cool-rr at cool-rr.com (Ram Rachum) Date: Sat, 28 Nov 2009 10:04:41 +0000 (UTC) Subject: [SciPy-User] EPD doesn't run my code Message-ID: Hello, I have my project GarlicSim which runs in ordinary Python. I tried to run it in EPD, and it's failing, at two distinct points I could identify. (Possibly there are more.) Here's the project code: http://github.com/cool-RR/GarlicSim-for-Python-2.5 I identified one of the points in question. It's about the `win32api` and the `win32process` modules. When I load up the Python shell of my EPD, and try `import win32process`, I get this error dialog: python.exe - Entry Point Not Found The procedure entry point ?PyWinGlobals_Ensure@@YAXXZ could not be located in the dynamic link library pywintypes25.dll. When I try 'import win32api`, I get: python.exe - Entry Point Not Found The procedure entry point ?PyWinObject_AsHANDLE@@YAHPAU_object@@PAPAXH at Z could not be located in the dynamic link library pywintypes25.dll. The second point is that in my wxPython window, the images in the toolbar get cropped. Any idea? From cool-rr at cool-rr.com Sat Nov 28 05:48:21 2009 From: cool-rr at cool-rr.com (Ram Rachum) Date: Sat, 28 Nov 2009 10:48:21 +0000 (UTC) Subject: [SciPy-User] EPD doesn't run my code References: Message-ID: Ram Rachum cool-rr.com> writes: > > Hello, > > I have my project GarlicSim which runs in ordinary Python. I tried to run in > EPD, and it's failing, at two distinct points I could identify. (Possibly ther > are more.) Apologies, I didn't notice the epd-users list. I'll post it there. Ram. From josef.pktd at gmail.com Sat Nov 28 08:11:19 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 28 Nov 2009 08:11:19 -0500 Subject: [SciPy-User] splprep example In-Reply-To: References: Message-ID: <1cd32cbb0911280511n6f4c0ac7m4ee09b7b3f8fca59@mail.gmail.com> On Sat, Nov 28, 2009 at 3:59 AM, Nils Wagner wrote: > Hi all, > > I am looking for a cookbook example wrt splprep. > Any pointer would be appreciated. There are some examples in the scipy tutorial for interpolate in the docs and in http://www.scipy.org/Cookbook/Interpolation?highlight=%28splprep%29#head-34818696f8d7066bb3188495567dd776a451cf11 A mailinglist search should also turn up a few examples. Do you have anything specific in mind? 
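In the meantime, a bare-bones sketch of splprep on a made-up noisy curve (the data and s are arbitrary):

import numpy as np
from scipy import interpolate

# noisy samples along an arc (made up)
theta = np.linspace(0, np.pi, 50)
x = np.cos(theta) + 0.05 * np.random.randn(50)
y = np.sin(theta) + 0.05 * np.random.randn(50)

# splprep fits a parametric spline; tck holds knots/coefficients/degree,
# u is the parameter value assigned to each input point
tck, u = interpolate.splprep([x, y], s=0.1)

# evaluate the smoothed curve on a finer parameter grid
unew = np.linspace(0, 1, 200)
xs, ys = interpolate.splev(unew, tck)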
Josef > > > Nils > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From kalle-test at gmx.de Sat Nov 28 14:19:42 2009 From: kalle-test at gmx.de (Kalle) Date: Sat, 28 Nov 2009 20:19:42 +0100 Subject: [SciPy-User] BivariateSpline examples and my crashing python In-Reply-To: <1cd32cbb0911271007v383afadfv957f713cc24da093@mail.gmail.com> References: <1cd32cbb0911271007v383afadfv957f713cc24da093@mail.gmail.com> Message-ID: Hello Josef, josef.pktd at gmail.com schrieb: > I wanted to prepare some examples for the use of the Bivariate Spline classes, > The second part of the attached script contains a translation of a > scipy.tutorial example to using the 3 classes instead. [...] > Can someone run the script, preferably not in an interpreter where you > want to keep your session alive? your Script runs fine here under Windows XP SP3 with python 2.5.4, scipy 0.7.1, matplotlib 0.98.5.3 (WX Backend) and numpy 1.2.1 There is only one warning, which might come from makeLSQspline i guess... Thanks for the example BTW, Kalle. From josef.pktd at gmail.com Sun Nov 29 20:42:08 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 29 Nov 2009 20:42:08 -0500 Subject: [SciPy-User] chebfun Message-ID: <1cd32cbb0911291742m1d3f3ab8r886009c7fba3cab9@mail.gmail.com> I just came by chance across this (for matlab) http://www2.maths.ox.ac.uk/chebfun/index.html http://www2.maths.ox.ac.uk/chebfun/license.html The documentation looks helpful for someone like me who doesn't know enough about what Chebyshev polynomials are good for. Josef From almar.klein at gmail.com Mon Nov 30 04:05:23 2009 From: almar.klein at gmail.com (Almar Klein) Date: Mon, 30 Nov 2009 10:05:23 +0100 Subject: [SciPy-User] ANN: visvis Message-ID: Hi all, I am pleased to announce the first release of visvis, a Python visualization library for of 1D to 4D data. Website: http://code.google.com/p/visvis/ Discussion group: http://groups.google.com/group/visvis/ Since this is the first release, it hasn't been tested on a large scale yet. Therefore I'm specifically interested to know whether it works for everyone. === Description === Visvis is a pure Python visualization library that uses OpenGl to display 1D to 4D data; it can be used from simple plotting tasks to rendering 3D volumetric data that moves in time. Visvis can be used in Python scripts, interactive Python sessions (as with IPython or IEP) and can be embedded in applications. Visvis employs an object oriented structure; each object being visualized (e.g. a line or a texture) has various properties that can be modified to change its behaviour or appearance. A Matlab-like interface in the form of a set of functions allows easy creation of these objects (e.g. plot(), imshow(), volshow()). Regards, Almar From jgomezdans at gmail.com Mon Nov 30 07:05:57 2009 From: jgomezdans at gmail.com (Jose Gomez-Dans) Date: Mon, 30 Nov 2009 12:05:57 +0000 Subject: [SciPy-User] Parallel code Message-ID: <91d218430911300405w34759867x7145cab01938d6bd@mail.gmail.com> Hi! I want to run some code in parallel, and I have toyed with the idea of either using the multiprocessing module, or using ipython (which is quite easy to use). The main idea is to run a number of class methods in parallel (unsurprisingly!), fed with some arguments. However, these methods will need (read-)access to a rather large numpy array. 
Ideally (and since this is running on a SMP box), this could be a chunk of shared memory. I am aware of Sturla Molden's suggestion of using ctypes, but I guess that I was wondering whether some magic simple stuff is available off the shelf for this shared memory business? Thanks! J -------------- next part -------------- An HTML attachment was scrubbed... URL: From robince at gmail.com Mon Nov 30 07:39:41 2009 From: robince at gmail.com (Robin) Date: Mon, 30 Nov 2009 12:39:41 +0000 Subject: [SciPy-User] Parallel code In-Reply-To: <91d218430911300405w34759867x7145cab01938d6bd@mail.gmail.com> References: <91d218430911300405w34759867x7145cab01938d6bd@mail.gmail.com> Message-ID: <2d5132a50911300439g2d406ee7wb244217f39a30e16@mail.gmail.com> If it is read only and you are on a platform with fork (ie not windows) that multiprocessing is great for this sort of situation... as long as the data is loaded before the fork, all the children can read it fine (but be sure not to write to it - on write the page will be copied for the child process leading to more memory use and changes not visible between children). Usually I put the variable to share in a module before calling pool... ie: import mymodule # a blank module mymodule.d = big_data_array p = Pool(8) p.map(function_which_does_something_to_mymodule.d, list_of_paraters) p.close() Cheers Robin On Mon, Nov 30, 2009 at 12:05 PM, Jose Gomez-Dans wrote: > Hi! > I want to run some code in parallel, and I have toyed with the idea of > either using the multiprocessing module, or using ipython (which is quite > easy to use). The main idea is to run a number of class methods in parallel > (unsurprisingly!), fed with some arguments. However, these methods will need > (read-)access to a rather large numpy array. Ideally (and since this is > running on a SMP box), this could be a chunk of shared memory. I am aware of > Sturla Molden's suggestion of using ctypes, but I guess that I was wondering > whether some magic simple stuff is available off the shelf for this shared > memory business? > > Thanks! > J > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From sturla at molden.no Mon Nov 30 08:28:56 2009 From: sturla at molden.no (Sturla Molden) Date: Mon, 30 Nov 2009 14:28:56 +0100 Subject: [SciPy-User] Parallel code In-Reply-To: <91d218430911300405w34759867x7145cab01938d6bd@mail.gmail.com> References: <91d218430911300405w34759867x7145cab01938d6bd@mail.gmail.com> Message-ID: <4B13C898.7030407@molden.no> What do you mean by my suggestion using ctypes? Why don't you use shared memory? Ga?l Varoquaux and I wrote a shared memory backend for ndarrays earlier this year. Sturla Jose Gomez-Dans skrev: > Hi! > I want to run some code in parallel, and I have toyed with the idea of > either using the multiprocessing module, or using ipython (which is > quite easy to use). The main idea is to run a number of class methods > in parallel (unsurprisingly!), fed with some arguments. However, these > methods will need (read-)access to a rather large numpy array. Ideally > (and since this is running on a SMP box), this could be a chunk of > shared memory. I am aware of Sturla Molden's suggestion of using > ctypes, but I guess that I was wondering whether some magic simple > stuff is available off the shelf for this shared memory business? > > Thanks! 
> J > ------------------------------------------------------------------------ > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From sturla at molden.no Mon Nov 30 08:37:41 2009 From: sturla at molden.no (Sturla Molden) Date: Mon, 30 Nov 2009 14:37:41 +0100 Subject: [SciPy-User] Parallel code In-Reply-To: <2d5132a50911300439g2d406ee7wb244217f39a30e16@mail.gmail.com> References: <91d218430911300405w34759867x7145cab01938d6bd@mail.gmail.com> <2d5132a50911300439g2d406ee7wb244217f39a30e16@mail.gmail.com> Message-ID: <4B13CAA5.7050102@molden.no> Robin skrev: > If it is read only and you are on a platform with fork (ie not > windows) that multiprocessing is great for this sort of situation... > as long as the data is loaded before the fork, all the children canread it fine On a system with a copy-on-write optimized os.fork (i.e. almost anything but Cygwin), no shared memory are needed for shared read-only access. Anonymous shared memory (multiprocessing.Array) will work on Windows as well, as handles can be inherited. This must be instantiated prior to process creation. Named shared memory can be used for read-write access to shared memory created before of after forking. Sturla From bsouthey at gmail.com Mon Nov 30 16:05:03 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 30 Nov 2009 15:05:03 -0600 Subject: [SciPy-User] stats, classes instead of functions for results MovStats In-Reply-To: <1cd32cbb0911222143p35d24a1m52596afd13bd1661@mail.gmail.com> References: <1cd32cbb0911222143p35d24a1m52596afd13bd1661@mail.gmail.com> Message-ID: <4B14337F.40208@gmail.com> On 11/22/2009 11:43 PM, josef.pktd at gmail.com wrote: > Following up on a question by Keith on the numpy list and his reminder > that covariance can be calculated by the cross-product minus the > product of the means, I redid and > enhanced my moving stats functions. > > Suppose x and y are two time series, then the moving correlation > requires the calculation of the mean, variance and covariance for each > window. Currently in scipy stats intermediate results are usually > thrown away on return (while rpy/R returns all intermediate results > used for the calculation. > > Using a decorator/descriptor of Fernando written for nitime, I tried > out to write the function as a class instead, so that any desired ( > intermediate) calculations are only made on demand, but once they are > calculated they are attached to the class as attributes or properties. > This seems to be a useful "pattern". > > Are there any opinion for using the pattern in scipy.stats ? MovStats > will currently go into statsmodels > > Below is the class (with cutting part of init), a full script is the > attachment, including examples that test the class. 
> > about MovStats: > y and x are tested for 2d, either (T,N) with axis=0 or (N,T) with > axis=1, should (but may not yet) work for nd arrays along any axis > (signal.correlate docstring) > nans are handled by dropping the corresponding observations from the > window, not adding any additional observations, > not tested if a window is empty because it contains only nans, nor if > variance is zero > (kern is intended for weighted statistics in the window but not tested > yet, I still need to decide on normalization requirements) > requires scipy.signal, all calculations done with signal.correlate, no loops > as often, functions are one-liners > all results are returned for valid observations only, initial > observations with incomplete window are cut > bonus: slope of moving regression of y on x, since it was trivial to add > still some cleaning and documentation to do > > usage: > ms = MovStats(x, y, axis=1) > ms.yvar > ms.xmean > ms.yxcorr > ms.yxcov > ... > > > Josef > > class MovStats(object): > def __init__(self, y, x=None, kern=5, axis=0): > self.y = y > self.x = x > if np.isscalar(kern): > ws = kern > <... snip> > > @OneTimeProperty > def ymean(self): > ys = signal.correlate(self.y, self.kern, mode='same')[self.sslice] > ym = ys/self.n > return ym > > @OneTimeProperty > def yvar(self): > ys2 = signal.correlate(self.y*self.y, self.kern, > mode='same')[self.sslice] > yvar = ys2/self.n - self.ymean**2 > return yvar > > @OneTimeProperty > def xmean(self): > if self.x is None: > return None > else: > xs = signal.correlate(self.x, self.kern, mode='same')[self.sslice] > xm = xs/self.n > return xm > > @OneTimeProperty > def xvar(self): > if self.x is None: > return None > else: > xs2 = signal.correlate(self.x*self.x, self.kern, > mode='same')[self.sslice] > xvar = xs2/self.n - self.xmean**2 > return xvar > @OneTimeProperty > def yxcov(self): > xys = signal.correlate(self.x*self.y, self.kern, > mode='same')[self.sslice] > return xys/self.n - self.ymean*self.xmean > > @OneTimeProperty > def yxcorr(self): > return self.yxcov/np.sqrt(self.yvar*self.xvar) > > @OneTimeProperty > def yxslope(self): > return self.yxcov/self.xvar > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > I think your handling of NaN's is incorrect because you do not drop the corresponding observations. That is for two arrays y=np.array([[ 1.229563, -0.339428, 0.83891 , 4.026574, 3.069378, 5.95668 ]]) x=np.array([[-1.236469, 1.941089, -0.346566, -0.268529, np.nan, 0.191336]]) For a windows size of 5, in the first window, the first mean and variance of y should use all 5 elements of y, the mean and variance of X should use the first 4 elements of x and, the regression and correlation coefficients should use the first 4 elements of x and y. Some other points: 1) Your calculation of variance is susceptible to errors, see http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance Provided that you are using sufficient precision (like numpy defaults) it is probably not that big a problem. 2) You only use the 'full' windows so when the window width is 5, you miss the first two windows and the last 2 windows. At least the mean exists in these windows and the variance in most of these partial windows. This may provided unexpected results to a user if they do not release which windows are not returned. 
3) I think the user needs to define the kern argument for your MovStat class as there is probably no meaningful default value (except 42). 4) I do not know how you should handle positive and negative infinity. 5) Your code expects at least 2 dimensions so 1-d arrays fail because you can not do this assignment 'kdim[axis] = ws' with 1-d arrays. Bruce -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon Nov 30 16:43:55 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 30 Nov 2009 16:43:55 -0500 Subject: [SciPy-User] stats, classes instead of functions for results MovStats In-Reply-To: <4B14337F.40208@gmail.com> References: <1cd32cbb0911222143p35d24a1m52596afd13bd1661@mail.gmail.com> <4B14337F.40208@gmail.com> Message-ID: <1cd32cbb0911301343r4859874r8059eb1ebee19b3c@mail.gmail.com> On Mon, Nov 30, 2009 at 4:05 PM, Bruce Southey wrote: > On 11/22/2009 11:43 PM, josef.pktd at gmail.com wrote: > > Following up on a question by Keith on the numpy list and his reminder > that covariance can be calculated by the cross-product minus the > product of the means, I redid and > enhanced my moving stats functions. > > Suppose x and y are two time series, then the moving correlation > requires the calculation of the mean, variance and covariance for each > window. Currently in scipy stats intermediate results are usually > thrown away on return (while rpy/R returns all intermediate results > used for the calculation. > > Using a decorator/descriptor of Fernando written for nitime, I tried > out to write the function as a class instead, so that any desired ( > intermediate) calculations are only made on demand, but once they are > calculated they are attached to the class as attributes or properties. > This seems to be a useful "pattern". > > Are there any opinion for using the pattern in scipy.stats ? MovStats > will currently go into statsmodels > > Below is the class (with cutting part of init), a full script is the > attachment, including examples that test the class. > > about MovStats: > y and x are tested for 2d, either (T,N) with axis=0 or (N,T) with > axis=1, should (but may not yet) work for nd arrays along any axis > (signal.correlate docstring) > nans are handled by dropping the corresponding observations from the > window, not adding any additional observations, > not tested if a window is empty because it contains only nans, nor if > variance is zero > (kern is intended for weighted statistics in the window but not tested > yet, I still need to decide on normalization requirements) > requires scipy.signal, all calculations done with signal.correlate, no loops > as often, functions are one-liners > all results are returned for valid observations only, initial > observations with incomplete window are cut > bonus: slope of moving regression of y on x, since it was trivial to add > still some cleaning and documentation to do > > usage: > ms = MovStats(x, y, axis=1) > ms.yvar > ms.xmean > ms.yxcorr > ms.yxcov > ... > > > Josef > > class MovStats(object): > def __init__(self, y, x=None, kern=5, axis=0): > self.y = y > self.x = x > if np.isscalar(kern): > ws = kern > <... 
snip> > > @OneTimeProperty > def ymean(self): > ys = signal.correlate(self.y, self.kern, mode='same')[self.sslice] > ym = ys/self.n > return ym > > @OneTimeProperty > def yvar(self): > ys2 = signal.correlate(self.y*self.y, self.kern, > mode='same')[self.sslice] > yvar = ys2/self.n - self.ymean**2 > return yvar > > @OneTimeProperty > def xmean(self): > if self.x is None: > return None > else: > xs = signal.correlate(self.x, self.kern, > mode='same')[self.sslice] > xm = xs/self.n > return xm > > @OneTimeProperty > def xvar(self): > if self.x is None: > return None > else: > xs2 = signal.correlate(self.x*self.x, self.kern, > mode='same')[self.sslice] > xvar = xs2/self.n - self.xmean**2 > return xvar > @OneTimeProperty > def yxcov(self): > xys = signal.correlate(self.x*self.y, self.kern, > mode='same')[self.sslice] > return xys/self.n - self.ymean*self.xmean > > @OneTimeProperty > def yxcorr(self): > return self.yxcov/np.sqrt(self.yvar*self.xvar) > > @OneTimeProperty > def yxslope(self): > return self.yxcov/self.xvar > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > Thanks for checking > I think your handling of NaN's is incorrect because you do not drop the > corresponding observations. That is for two arrays > > y=np.array([[ 1.229563, -0.339428,? 0.83891 ,? 4.026574,? 3.069378,? 5.95668 > ]]) > x=np.array([[-1.236469,? 1.941089, -0.346566, -0.268529,???? np.nan, > 0.191336]]) > > For a windows size of 5, in the first window, the first mean and variance of > y should use all 5 elements of y, the mean and variance of X should use the > first 4 elements of x and, the regression and correlation coefficients > should use the first 4 elements of x and y. What I do currently is a compromise, I don't want to calculate mean and variance twice. So the behavior now is, if only one array is given, then you get the mean and variance dropping the nan observations for that array. If two arrays are given then I drop observations in both arrays if either one has a nan. This way the user can choose whether they want the separate calculation. If a user provides two arrays, my assumption is that they want cov and corr. > > > Some other points: > 1) Your calculation of variance is susceptible to errors, see > http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance > Provided that you are using sufficient precision (like numpy defaults) it is > probably not that big a problem. Yes, I'm aware of this, for the example the difference to np.corrcoeff is around 1e-14. I will add a warning to the docstring that this is designed for speed with some precision loss. I might be able to use some preprocessing to at least treat some badly scaled data, but the usual higher numerical precision ways of calculating would require much slower loops. For reasonably short windows this seems to be an acceptable tradeoff. > 2) You only use the 'full' windows so when the window width is 5, you miss > the first two windows and the last 2 windows. At least the mean exists in > these windows and the variance in most of these partial windows. This may > provided unexpected results to a user if they do not release which windows > are not returned. Currently I'm returning "valid" observations, that have a full window. In a previous version I allowed for a lag, lead, centered option, Keith returns nans, I think scikits timeseries masks. 
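For reference, a numpy-only sketch of the cross-product moving variance over full ("valid") windows, with a slow per-window loop as a precision check (window length and data are arbitrary, and this is not the MovStats class itself):

import numpy as np

x = np.random.randn(200)
w = 5
kern = np.ones(w) / w

# moving mean and moving E[x**2] over full windows only:
# len(result) == len(x) - w + 1, the "valid observations" convention
m = np.convolve(x, kern, mode='valid')
m2 = np.convolve(x * x, kern, mode='valid')
var_fast = m2 - m ** 2                      # cross-product formula

# direct per-window computation, slower but numerically safer
var_loop = np.array([x[i:i + w].var() for i in range(len(x) - w + 1)])
print(abs(var_fast - var_loop).max())       # ~1e-16 for well-scaled data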
I don't know yet which or whether these options should be included in this function (class). > 3) I think the user needs to define the kern argument for your MovStat class > as there is probably no meaningful default value (except 42). at least the window length should be specified by the user. I picked 5 for business week mostly arbitrary to reduce typing (?) In a slightly updated version I switched to convolution instead of correlation to have the correct orientation for e.g. exponential weights. But I haven't tested this yet > 4) I do not know how you should handle positive and negative infinity. I haven't thought about this, but since most of the time I consider inf as a valid number, I think, it will return infs in the corresponding windows. With a masked array, infs could be masked, but for regular arrays, I don't want to convert infs to nans. However, I still have to figure out some corner cases, e.g. no valid observations in a window, windows with zero variance. it would also be possible to require a minimum of valid observations per window. > 5) Your code expects at least 2 dimensions so 1-d arrays fail because you > can not do this assignment 'kdim[axis] = ws' with 1-d arrays. Thanks, I have tested only the 2d case. I guess I have to review general axis handling again Josef > > Bruce > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > From Scott.Askey at afit.edu Mon Nov 30 16:52:32 2009 From: Scott.Askey at afit.edu (Askey, Scott A Capt USAF AETC AFIT/ENY) Date: Mon, 30 Nov 2009 16:52:32 -0500 Subject: [SciPy-User] "profiling" a function References: Message-ID: <792700546363C941B876B9D41AF4475902D689CF@MS-AFIT-03.afit.edu> What are the tools available in Scipy for evaluating the (computational) cost of a function call? In particular I am solving nonlinear systems (with fsolve) and considering exact versus approximate Jacobians and trig functions versus their approximations. V/R Scott -----Original Message----- From: scipy-user-bounces at scipy.org on behalf of scipy-user-request at scipy.org Sent: Mon 11/30/2009 1:00 PM To: scipy-user at scipy.org Subject: SciPy-User Digest, Vol 75, Issue 70 Send SciPy-User mailing list submissions to scipy-user at scipy.org To subscribe or unsubscribe via the World Wide Web, visit http://mail.scipy.org/mailman/listinfo/scipy-user or, via email, send a message with subject or body 'help' to scipy-user-request at scipy.org You can reach the person managing the list at scipy-user-owner at scipy.org When replying, please edit your Subject line so it is more specific than "Re: Contents of SciPy-User digest..." Today's Topics: 1. chebfun (josef.pktd at gmail.com) 2. ANN: visvis (Almar Klein) 3. Parallel code (Jose Gomez-Dans) 4. Re: Parallel code (Robin) 5. Re: Parallel code (Sturla Molden) 6. Re: Parallel code (Sturla Molden) ---------------------------------------------------------------------- Message: 1 Date: Sun, 29 Nov 2009 20:42:08 -0500 From: josef.pktd at gmail.com Subject: [SciPy-User] chebfun To: SciPy Users List Message-ID: <1cd32cbb0911291742m1d3f3ab8r886009c7fba3cab9 at mail.gmail.com> Content-Type: text/plain; charset=ISO-8859-1 I just came by chance across this (for matlab) http://www2.maths.ox.ac.uk/chebfun/index.html http://www2.maths.ox.ac.uk/chebfun/license.html The documentation looks helpful for someone like me who doesn't know enough about what Chebyshev polynomials are good for. 
Josef
Sturla ------------------------------ _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user End of SciPy-User Digest, Vol 75, Issue 70 ****************************************** From dwf at cs.toronto.edu Mon Nov 30 17:45:03 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Mon, 30 Nov 2009 17:45:03 -0500 Subject: [SciPy-User] "profiling" a function In-Reply-To: <792700546363C941B876B9D41AF4475902D689CF@MS-AFIT-03.afit.edu> References: <792700546363C941B876B9D41AF4475902D689CF@MS-AFIT-03.afit.edu> Message-ID: <66D17E78-C2BB-420E-BF6C-78C70DA3EBBB@cs.toronto.edu> On 30-Nov-09, at 4:52 PM, Askey, Scott A Capt USAF AETC AFIT/ENY wrote: > > What are the tools available in Scipy for evaluating the > (computational) cost of a function call? > > > > In particular I am solving nonlinear systems (with fsolve) and > considering exact versus approximate Jacobians > and trig functions versus their approximations. Nothing in SciPy itself, but Python contains the cProfile module, as well as hotshot. There's also Robert Kern's line_profiler: http://packages.python.org/line_profiler/ which is rather handy. If you'd just like to time things, the 'timeit' module is good, as well as IPython's shortcuts for it (i.e. %timeit -n 3 my_call()) David From Chris.Barker at noaa.gov Mon Nov 30 18:58:29 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Mon, 30 Nov 2009 15:58:29 -0800 Subject: [SciPy-User] scikits.timeseries question Message-ID: <4B145C25.7040303@noaa.gov> HI all, Maybe I'm missing something, but I can't seem to get this to work as I'd like. I have a bunch of data that is indexed by "day since Jan 1, 2001". It seemed I should be able to do a DateArray like this: In [40]: import scikits.timeseries as ts In [41]: sd = ts.Date(freq='D', year=2001, month=1, day=1) In [42]: sd Out[42]: In [43]: da = ts.date_array((1,2,3,4), start_date=sd) In [44]: da Out[44]: DateArray([1, 2, 3, 4], freq='U') but it looks like it didn't get the frequency ffomr teh start date, so I did: In [46]: da = ts.date_array((1,2,3,4), start_date=sd, freq='D') In [47]: da Out[47]: DateArray([01-Jan-0001, 02-Jan-0001, 03-Jan-0001, 04-Jan-0001], freq='D') Now it's got the frequency, but it's using year 0001, instead of 2001, which is the same as I get if I don't use a start_date at all. What am I missing? In [50]: ts.__version__ Out[50]: '0.91.3' -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From pgmdevlist at gmail.com Mon Nov 30 19:12:52 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 30 Nov 2009 19:12:52 -0500 Subject: [SciPy-User] scikits.timeseries question In-Reply-To: <4B145C25.7040303@noaa.gov> References: <4B145C25.7040303@noaa.gov> Message-ID: <9DF1735B-BD7A-456A-8DD2-08C281440221@gmail.com> On Nov 30, 2009, at 6:58 PM, Christopher Barker wrote: > HI all, > > Maybe I'm missing something, but I can't seem to get this to work as I'd > like. I guess you're confusing DateArrays and TimeSeries. DateArrays are just arrays of dates (think a ndarray of datetime objects, or a ndarray with a datetime64 dtype). TimeSeries are like MaskedArrays, the combination of a ndarray of values with 2 others ndarrays: one array of booleans (the mask), one DateArray. > I have a bunch of data that is indexed by "day since Jan 1, 2001". 
It > seemed I should be able to do a DateArray like this: > > In [40]: import scikits.timeseries as ts > > In [41]: sd = ts.Date(freq='D', year=2001, month=1, day=1) > > In [42]: sd > Out[42]: All is well here. > In [43]: da = ts.date_array((1,2,3,4), start_date=sd) Check the doc for date_array: the first argument can be * an existing :class:`DateArray` object; * a sequence of :class:`Date` objects with the same frequency; * a sequence of :class:`datetime.datetime` objects; * a sequence of dates in string format; * a sequence of integers corresponding to the representation of :class:`Date` objects. So, what you're trying to do is to build a an array of four dates (1,2,3,4) Instead, use that: >>> ts.time_series((1,2,3,4),start_date=sd) timeseries([1 2 3 4], dates = [01-Jan-2001 ... 04-Jan-2001], freq = D) If you think the doc is confusing to that respect, please let me know how to improve it. And of course, don't hesitate to contact me if you need further info P. From Chris.Barker at noaa.gov Mon Nov 30 19:23:52 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Mon, 30 Nov 2009 16:23:52 -0800 Subject: [SciPy-User] scikits.timeseries question In-Reply-To: <9DF1735B-BD7A-456A-8DD2-08C281440221@gmail.com> References: <4B145C25.7040303@noaa.gov> <9DF1735B-BD7A-456A-8DD2-08C281440221@gmail.com> Message-ID: <4B146218.9000305@noaa.gov> Pierre GM wrote: > On Nov 30, 2009, at 6:58 PM, Christopher Barker wrote: > I guess you're confusing DateArrays and TimeSeries. > DateArrays are just arrays of dates (think a ndarray of datetime > objects, or a ndarray with a datetime64 dtype). TimeSeries are like > MaskedArrays, the combination of a ndarray of values with 2 others > ndarrays: one array of booleans (the mask), one DateArray. Actually, I think I got that. >> In [41]: sd = ts.Date(freq='D', year=2001, month=1, day=1) >> >> In [42]: sd >> Out[42]: > > All is well here. yup. >> In [43]: da = ts.date_array((1,2,3,4), start_date=sd) > > Check the doc for date_array: the first argument can be ... > * a sequence of integers corresponding to the representation of > :class:`Date` objects. That's what I'm trying to give it. > So, what you're trying to do is to build a an array of four dates (1,2,3,4) > Instead, use that: > >>>> ts.time_series((1,2,3,4),start_date=sd) > timeseries([1 2 3 4], > dates = [01-Jan-2001 ... 04-Jan-2001], > freq = D) Ah, but what I am trying to do is build that "dates" array -- in teh real case, I have 1212 pieces of data, associated with time, in terms of "days since Jan 1, 2001). So I need to construct that dates array to associate with the time_series data. So I want: dates = what_to_put_here? ts.time_series(an_array_of_data, start_date=sd) timeseries([1 2 3 4], dates = dates], freq = D) While I'm at it -- what I really have is a big 'ol 3-d array, which is gridded model output, of shape: (time, lat, lon). Time is expressed in days since... I need to do a moving average of the while grid over time. Can a time_series be n-d, with time as one of the axis? -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From pgmdevlist at gmail.com Mon Nov 30 19:49:35 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 30 Nov 2009 19:49:35 -0500 Subject: [SciPy-User] scikits.timeseries question In-Reply-To: <4B146218.9000305@noaa.gov> References: <4B145C25.7040303@noaa.gov> <9DF1735B-BD7A-456A-8DD2-08C281440221@gmail.com> <4B146218.9000305@noaa.gov> Message-ID: On Nov 30, 2009, at 7:23 PM, Christopher Barker wrote: > Pierre GM wrote: > ... > >> * a sequence of integers corresponding to the representation of >> :class:`Date` objects. > > That's what I'm trying to give it. Ah OK. Well, the answer is: that depends. iIf you know that your dates are just in daily increments from 2001-01-01 (like a range), then just use start_date and length. If you may have several duplicated dates (like 2001-01-01, 2001-01-02, 2001-01-02, 2001-01-03...), then the easiest is probably: >>> da = ts.date_array(np.array(0,1,1,2)+sd) np.array(...) + sd gives you a ndarray of Date objects (so its dtype is np.object), and you use that as the input of date_array. The frequency should be recognized properly. Note that if 1 in your data set means '2001-01-01', then use (sd-1) instead, but you would have guessed that. > While I'm at it -- what I really have is a big 'ol 3-d array, which is > gridded model output, of shape: (time, lat, lon). Time is expressed in > days since... > > I need to do a moving average of the while grid over time. Can a > time_serie be n-d, with time as one of the axis? Well, I never tried so I can tell you. Check wheter lib.moving_funcs supports 2D data. If not, not a big deal: just fill the missing dates (so that you have a regular-spaced series with masked elements for missing dates), and use whatever moving average you need on the .series attribute (which is just a MaskedArray). Or fill this .series with np.nans if your averaging function accepts floats but no missing values... Let me know how it goes P. From ferrell at diablotech.com Mon Nov 30 19:53:56 2009 From: ferrell at diablotech.com (Robert Ferrell) Date: Mon, 30 Nov 2009 17:53:56 -0700 Subject: [SciPy-User] scikits.timeseries question In-Reply-To: <4B146218.9000305@noaa.gov> References: <4B145C25.7040303@noaa.gov> <9DF1735B-BD7A-456A-8DD2-08C281440221@gmail.com> <4B146218.9000305@noaa.gov> Message-ID: <5B296682-78C2-44AC-81AD-CCC220B8E47F@diablotech.com> On Nov 30, 2009, at 5:23 PM, Christopher Barker wrote: > Pierre GM wrote: >> On Nov 30, 2009, at 6:58 PM, Christopher Barker wrote: >> I guess you're confusing DateArrays and TimeSeries. > >> DateArrays are just arrays of dates (think a ndarray of datetime >> objects, or a ndarray with a datetime64 dtype). TimeSeries are like >> MaskedArrays, the combination of a ndarray of values with 2 others >> ndarrays: one array of booleans (the mask), one DateArray. > > Actually, I think I got that. > >>> In [41]: sd = ts.Date(freq='D', year=2001, month=1, day=1) >>> >>> In [42]: sd >>> Out[42]: >> >> All is well here. > > yup. > >>> In [43]: da = ts.date_array((1,2,3,4), start_date=sd) >> >> Check the doc for date_array: the first argument can be > > ... > >> * a sequence of integers corresponding to the representation >> of >> :class:`Date` objects. > > That's what I'm trying to give it. 
> >> So, what you're trying to do is to build a an array of four dates >> (1,2,3,4) >> Instead, use that: >> >>>>> ts.time_series((1,2,3,4),start_date=sd) >> timeseries([1 2 3 4], >> dates = [01-Jan-2001 ... 04-Jan-2001], >> freq = D) > > Ah, but what I am trying to do is build that "dates" array -- in teh > real case, I have 1212 pieces of data, associated with time, in > terms of > "days since Jan 1, 2001). So I need to construct that dates array to > associate with the time_series data. > > So I want: > > dates = what_to_put_here? I may be misunderstanding what you are trying to do, but here's what I do: In [68]: sd = ts.Date('d', '2001-01-01') In [69]: dates = ts.date_array(cumsum(ones(4)) + sd) In [70]: dates Out[70]: DateArray([02-Jan-2001, 03-Jan-2001, 04-Jan-2001, 05-Jan-2001], freq='D') If the dates aren't consecutive, you can always just use the known offsets: In [73]: days_since_beginning = array([1, 3, 4, 8]) In [74]: dates = ts.date_array(days_since_beginning + sd) In [75]: dates Out[75]: DateArray([02-Jan-2001, 04-Jan-2001, 05-Jan-2001, 09-Jan-2001], freq='D') There's probably an easier way... If this happens to be what you are trying to do, be careful of the counting of days (0 based, vs 1 based). -robert > > ts.time_series(an_array_of_data, > start_date=sd) > timeseries([1 2 3 4], > dates = dates], > freq = D) > > While I'm at it -- what I really have is a big 'ol 3-d array, which is > gridded model output, of shape: (time, lat, lon). Time is expressed in > days since... > > I need to do a moving average of the while grid over time. Can a > time_series be n-d, with time as one of the axis? > > -Chris > > > > -- > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From pgmdevlist at gmail.com Mon Nov 30 20:06:44 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 30 Nov 2009 20:06:44 -0500 Subject: [SciPy-User] scikits.timeseries question In-Reply-To: <5B296682-78C2-44AC-81AD-CCC220B8E47F@diablotech.com> References: <4B145C25.7040303@noaa.gov> <9DF1735B-BD7A-456A-8DD2-08C281440221@gmail.com> <4B146218.9000305@noaa.gov> <5B296682-78C2-44AC-81AD-CCC220B8E47F@diablotech.com> Message-ID: <55C6FC2C-8242-47CA-817D-4E0289C7B9DD@gmail.com> On Nov 30, 2009, at 7:53 PM, Robert Ferrell wrote: > > I may be misunderstanding what you are trying to do, but here's what I > do: > > In [68]: sd = ts.Date('d', '2001-01-01') > > In [69]: dates = ts.date_array(cumsum(ones(4)) + sd) > > In [70]: dates > Out[70]: > DateArray([02-Jan-2001, 03-Jan-2001, 04-Jan-2001, 05-Jan-2001], > freq='D') The cumsum approach works only if you have irregular time steps as inputs (as in 1 day after the first, 1 day after that, 3 days after that...). If you have regular time steps of 1, just use arange+start_date (or even just length+start_date) From Chris.Barker at noaa.gov Mon Nov 30 20:16:41 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Mon, 30 Nov 2009 17:16:41 -0800 Subject: [SciPy-User] scikits.timeseries question In-Reply-To: References: <4B145C25.7040303@noaa.gov> <9DF1735B-BD7A-456A-8DD2-08C281440221@gmail.com> <4B146218.9000305@noaa.gov> Message-ID: <4B146E79.7090407@noaa.gov> Pierre GM wrote: > Ah OK. Well, the answer is: that depends. 
iIf you know that your > dates are just in daily increments from 2001-01-01 (like a range), > then just use start_date and length. right -- but I don't know that. > If you may have several duplicated dates (like 2001-01-01, > 2001-01-02, 2001-01-02, 2001-01-03...), then the easiest is probably: > >>>> da = ts.date_array(np.array(0,1,1,2)+sd) nope -- not duplicated, but maybe there are missing ones. The point is that I have an array of "days since", and I want array of timeseries.dates (which is a DateArray, yes?) > np.array(...) + sd gives you a ndarray of Date objects (so its dtype > is np.object), and you use that as the input of date_array. The > frequency should be recognized properly. OK -- though it seems I SHOULD be able to go straight to an DateArray, and I'm still confused about what this means: >> In [43]: da = ts.date_array((1,2,3,4), start_date=sd) > > Check the doc for date_array: the first argument can be > * an existing :class:`DateArray` object; > * a sequence of :class:`Date` objects with the same frequency; > * a sequence of :class:`datetime.datetime` objects; > * a sequence of dates in string format; > * a sequence of integers corresponding to the representation of > :class:`Date` objects. That's what I have: a sequence of integers corresponding to the representation of the Date objects (doesn't it represent them as "units since start date" where units is the "freq" ? If that's not what if means, then what does it mean? Robert Ferrell wrote: > If this happens to be what you are trying to do, be careful of the > counting of days (0 based, vs 1 based). yup -- thanks for the reminder. >> I need to do a moving average of the while grid over time. Can a >> time_series be n-d, with time as one of the axis? > Well, I never tried so I can tell you. Check wheter lib.moving_funcs > supports 2D data. hmm -- I see this: Definition: ts_lib.mov_average(data, span, dtype=None) Docstring: Calculates the moving average of a series. Parameters ---------- data : array-like Input data, as a sequence or (subclass of) ndarray. Masked arrays and TimeSeries objects are also accepted. The input array should be 1D or 2D at most. If the input array is 2D, the function is applied on each column. I've got a 3-d array -- darn! Maybe I'll poke into it and see if it can be generalized. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From pgmdevlist at gmail.com Mon Nov 30 20:39:39 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 30 Nov 2009 20:39:39 -0500 Subject: [SciPy-User] scikits.timeseries question In-Reply-To: <4B146E79.7090407@noaa.gov> References: <4B145C25.7040303@noaa.gov> <9DF1735B-BD7A-456A-8DD2-08C281440221@gmail.com> <4B146218.9000305@noaa.gov> <4B146E79.7090407@noaa.gov> Message-ID: <3EB215DA-3808-42CE-B2E0-6568B6B40C37@gmail.com> On Nov 30, 2009, at 8:16 PM, Christopher Barker wrote: > nope -- not duplicated, but maybe there are missing ones. The point is > that I have an array of "days since", and I want array of > timeseries.dates (which is a DateArray, yes?) Got it. Duplicated and/or missing dates correspond to the same problem: you can't assume that your dates are regularly spaced, so you can't use start_date and length. >> np.array(...) + sd gives you a ndarray of Date objects (so its dtype >> is np.object), and you use that as the input of date_array. 
The >> frequency should be recognized properly. > > OK -- though it seems I SHOULD be able to go straight to an DateArray, > and I'm still confused about what this means: Well, that depends on the type of starting date, actually. If it's a Date, adding a ndarray to it will give you a ndarray of Date objects. If it's a DateArray of length 1, it'll give you a DateArray. (Note to self: we could probably be a bit more consistent on this one...) >>> In [43]: da = ts.date_array((1,2,3,4), start_date=sd) >> >> Check the doc for date_array: the first argument can be >> * an existing :class:`DateArray` object; >> * a sequence of :class:`Date` objects with the same frequency; >> * a sequence of :class:`datetime.datetime` objects; >> * a sequence of dates in string format; >> * a sequence of integers corresponding to the representation of >> :class:`Date` objects. > > That's what I have: a sequence of integers corresponding to the > representation of the Date objects (doesn't it represent them as "units > since start date" where units is the "freq" ? No, not exactly: the representation of a Date objects is relative to an absolute build-in reference (Day #1 being 01/01/01). (Likewise, nump.datetime64 uses the standard 1970/01/01). We can't have a variable reference as it would be far too messy too quickly. Instead, you have to use the trick start_date + ndarray of integers to get what you want. > If that's not what if means, then what does it mean? If you have a 'A' frequency, that'd be a sequence like 2001, 2002, ... For a 'M' frequency, that'd be 24001 (for 2001/01), 24002 (for 2001/02)... For a 'D' frequency, that'd be 730486, 730487... for 2001/01/01, 2001/01/02... In other terms, the nb of units since the absolute reference. > > hmm -- I see this: > > Definition: > ts_lib.mov_average(data, span, dtype=None) > Docstring: > Calculates the moving average of a series. > > Parameters > ---------- > data : array-like > Input data, as a sequence or (subclass of) ndarray. > Masked arrays and TimeSeries objects are also accepted. > The input array should be 1D or 2D at most. > If the input array is 2D, the function is applied on each > column. > > I've got a 3-d array -- darn! Maybe I'll poke into it and see if it can > be generalized. 3D ? What are your actual variables ? Keep in mind that when we talk about dimensions with time series, we zap the time one, so if you have a series of maps, your array is only 2D in our terminology. If you have a time series of (lat, lon), mov_average will average your lats independently of your lons From ferrell at diablotech.com Mon Nov 30 21:59:32 2009 From: ferrell at diablotech.com (Robert Ferrell) Date: Mon, 30 Nov 2009 19:59:32 -0700 Subject: [SciPy-User] scikits.timeseries question In-Reply-To: <4B146E79.7090407@noaa.gov> References: <4B145C25.7040303@noaa.gov> <9DF1735B-BD7A-456A-8DD2-08C281440221@gmail.com> <4B146218.9000305@noaa.gov> <4B146E79.7090407@noaa.gov> Message-ID: <2B27D020-8DE3-4BFE-8B84-936E4A1B9FBE@diablotech.com> On Nov 30, 2009, at 6:16 PM, Christopher Barker wrote: > Pierre GM wrote: > >> Ah OK. Well, the answer is: that depends. iIf you know that your >> dates are just in daily increments from 2001-01-01 (like a range), >> then just use start_date and length. > > right -- but I don't know that. > >> If you may have several duplicated dates (like 2001-01-01, >> 2001-01-02, 2001-01-02, 2001-01-03...), then the easiest is probably: >> >>>>> da = ts.date_array(np.array(0,1,1,2)+sd) > > nope -- not duplicated, but maybe there are missing ones. 
The point is > that I have an array of "days since", and I want array of > timeseries.dates (which is a DateArray, yes?) I don't think so. An array of dates is not a DateArray. In [98]: sd = ts.Date('d', '2001-01-01') In [99]: zeros(4) + sd Out[99]: array([01-Jan-2001, 01-Jan-2001, 01-Jan-2001, 01-Jan-2001], dtype=object) This seems natural to me, (array + Date = array) although I do have to include an extra line sometimes to get a DateArray if I need it. If I need a timeseries, sometimes I can skip making the DateArray explicitly. In [109]: a = arange(4) + sd In [110]: a Out[110]: array([01-Jan-2001, 02-Jan-2001, 03-Jan-2001, 04-Jan-2001], dtype=object) In [111]: ts.time_series([1,2,3,4], dates=a) Out[111]: timeseries([1 2 3 4], dates = [01-Jan-2001 ... 04-Jan-2001], freq = D) > >> np.array(...) + sd gives you a ndarray of Date objects (so its dtype >> is np.object), and you use that as the input of date_array. The >> frequency should be recognized properly. > > OK -- though it seems I SHOULD be able to go straight to an DateArray, Is the issue that sd is a Date and not a DateArray? You can always make a DataArray with sd, of the correct length, and then add to that: In [83]: sd = ts.Date('d', '2001-01-01') In [84]: d1 = ts.date_array(zeros(4) + sd) In [85]: d1 Out[85]: DateArray([01-Jan-2001, 01-Jan-2001, 01-Jan-2001, 01-Jan-2001], freq='D') In [86]: d1 + array([0,2,3,5]) Out[86]: DateArray([01-Jan-2001, 03-Jan-2001, 04-Jan-2001, 06-Jan-2001], freq='D') I'm probably telling you things that are obvious and are not addressing your question. > and I'm still confused about what this means: > >>> In [43]: da = ts.date_array((1,2,3,4), start_date=sd) This throws an exception for me. : year=1 is before 1900; the datetime strftime() methods require year >= 1900 -robert From ferrell at diablotech.com Mon Nov 30 22:15:29 2009 From: ferrell at diablotech.com (Robert Ferrell) Date: Mon, 30 Nov 2009 20:15:29 -0700 Subject: [SciPy-User] scikits.timeseries question In-Reply-To: <55C6FC2C-8242-47CA-817D-4E0289C7B9DD@gmail.com> References: <4B145C25.7040303@noaa.gov> <9DF1735B-BD7A-456A-8DD2-08C281440221@gmail.com> <4B146218.9000305@noaa.gov> <5B296682-78C2-44AC-81AD-CCC220B8E47F@diablotech.com> <55C6FC2C-8242-47CA-817D-4E0289C7B9DD@gmail.com> Message-ID: On Nov 30, 2009, at 6:06 PM, Pierre GM wrote: > On Nov 30, 2009, at 7:53 PM, Robert Ferrell wrote: >> >> I may be misunderstanding what you are trying to do, but here's >> what I >> do: >> >> In [68]: sd = ts.Date('d', '2001-01-01') >> >> In [69]: dates = ts.date_array(cumsum(ones(4)) + sd) >> >> In [70]: dates >> Out[70]: >> DateArray([02-Jan-2001, 03-Jan-2001, 04-Jan-2001, 05-Jan-2001], >> freq='D') > > The cumsum approach works only if you have irregular time steps as > inputs (as in 1 day after the first, 1 day after that, 3 days after > that...). If you have regular time steps of 1, just use arange > +start_date (or even just length+start_date) Sort of. The cumsum approach works even if the intervals are uniform, of course, but it may be overkill and arange may be sufficient. In any case, I get the impression that the OP has an array of integer offsets generated in some other fashion entirely. 
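Putting the pieces of this thread together for the original "days since" use case, a minimal sketch might look like the following; the offsets and values are made up for illustration, and the real arrays would come from the model output:

import numpy as np
import scikits.timeseries as ts

# Illustrative inputs: integer offsets in "days since 2001-01-01"
# (possibly with gaps) and one data value per offset.
offsets = np.array([0, 1, 3, 4, 8])
values = np.array([10.0, 11.5, 9.8, 12.1, 10.7])

sd = ts.Date(freq='D', year=2001, month=1, day=1)
# Adding an integer array to a Date gives an array of Date objects,
# which date_array accepts.  If a value of 1 (not 0) in the offsets
# means 2001-01-01, add them to (sd - 1) instead.
dates = ts.date_array(offsets + sd)
series = ts.time_series(values, dates=dates)
print(series)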
From pgmdevlist at gmail.com Mon Nov 30 23:03:22 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 30 Nov 2009 23:03:22 -0500 Subject: [SciPy-User] scikits.timeseries question In-Reply-To: <2B27D020-8DE3-4BFE-8B84-936E4A1B9FBE@diablotech.com> References: <4B145C25.7040303@noaa.gov> <9DF1735B-BD7A-456A-8DD2-08C281440221@gmail.com> <4B146218.9000305@noaa.gov> <4B146E79.7090407@noaa.gov> <2B27D020-8DE3-4BFE-8B84-936E4A1B9FBE@diablotech.com> Message-ID: <17E9CA7E-7446-4202-997C-9AB5081977C0@gmail.com> On Nov 30, 2009, at 9:59 PM, Robert Ferrell wrote: > This seems natural to me, (array + Date = array) although I do have to > include an extra line sometimes to get a DateArray if I need it. If I > need a timeseries, sometimes I can skip making the DateArray explicitly. Well, keep in mind that Date was implemented a few years ago already, far before the new datetime64 dtype, and it was the easiest way we had to define a new datatype (well, a kind of datatype). I'll check how we can merge the two approaches when I'll have some time. Anyhow, in practice, a Date object will be seen as a np.object by numpy, and you end up having a ndarray with a np.object dtype. > Is the issue that sd is a Date and not a DateArray? You can always > make a DataArray with sd, of the correct length, and then add to that: > > In [83]: sd = ts.Date('d', '2001-01-01') > > In [84]: d1 = ts.date_array(zeros(4) + sd) Wow, that's overkill ! Just make sd a DateArray: >>> np.arange(4) + ts.DateArray(sd) Now, because DateArray is a subclass of ndarray with a higher priority, its _add__ method takes over and the ouput is a DateArray. > >> and I'm still confused about what this means: >> >>>> In [43]: da = ts.date_array((1,2,3,4), start_date=sd) > > This throws an exception for me. > > : year=1 is before 1900; the datetime > strftime() methods require year >= 1900 What version are you using ? And anyway, you get the exception only if you try to print it (as strftime is called only when calling repr/str) From mattknox.ca at gmail.com Mon Nov 30 23:13:55 2009 From: mattknox.ca at gmail.com (Matt Knox) Date: Tue, 1 Dec 2009 04:13:55 +0000 (UTC) Subject: [SciPy-User] scikits.timeseries question References: <4B145C25.7040303@noaa.gov> <9DF1735B-BD7A-456A-8DD2-08C281440221@gmail.com> <4B146218.9000305@noaa.gov> <4B146E79.7090407@noaa.gov> Message-ID: Christopher Barker noaa.gov> writes: > >> In [43]: da = ts.date_array((1,2,3,4), start_date=sd) > > > > Check the doc for date_array: the first argument can be > > * an existing :class:`DateArray` object; > > * a sequence of :class:`Date` objects with the same frequency; > > * a sequence of :class:`datetime.datetime` objects; > > * a sequence of dates in string format; > > * a sequence of integers corresponding to the representation of > > :class:`Date` objects. > > That's what I have: a sequence of integers corresponding to the > representation of the Date objects (doesn't it represent them as "units > since start date" where units is the "freq" ? > > If that's not what if means, then what does it mean? I agree the documentation is perhaps a bit confusing here. The sequence of integers being referred to are the internal representation of the Date objects (eg. ts.now('d').value) which is absolute, not relative (not relative to a custom start date anyway). Another thing you are missing is that the first argument (dlist) is not supposed to be used in conjunction with the start_date parameter. There are a couple ways to call date_array: 1. 
Using the `dlist` argument, possibly in combination with the `freq` argument if the frequency is not implicit in the dlist being passed. 2. Using the `start_date` parameter in combination with either the `length` or `end_date` parameter. This option would only be used for a continuous time series (i.e. no missing or duplicated dates). Whether this is a good API is probably debatable, but that is how it works currently. In addition to the methods described by Pierre and Robert, you could also do: >>> sd = ts.now('d') >>> relative_days = np.array([1,5,8]) >>> absolute_days = relative_days + sd.value >>> darray = ts.date_array(absolute_days, freq = sd.freq) which I think probably has the lowest overhead (but don't hold me to that :) ) if that matters for your application. - Matt
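For completeness, a short, untested sketch showing the two calling styles side by side, with illustrative offsets:

import numpy as np
import scikits.timeseries as ts

sd = ts.Date('d', '2001-01-01')

# Style 1: a dlist of absolute integer representations plus an explicit freq.
relative_days = np.array([0, 1, 2, 3])   # illustrative "days since" offsets
d1 = ts.date_array(relative_days + sd.value, freq=sd.freq)

# Style 2: start_date plus length (or end_date), for a continuous run of dates.
d2 = ts.date_array(start_date=sd, length=4)

print(d1)
print(d2)   # both should cover 01-Jan-2001 through 04-Jan-2001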