From stefan at sun.ac.za Mon Mar 2 00:55:39 2009
From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=)
Date: Mon, 2 Mar 2009 07:55:39 +0200
Subject: [SciPy-dev] Advice on Simulated Annealing (ticket #875)
Message-ID: <9457e7c80903012155q79e148f4r4ae7af061b61a15@mail.gmail.com>

Hi,

Is there anyone here who has some experience with simulated annealing?

The code at

http://scipy.org/scipy/scipy/ticket/875#comment:1

looks fragile, but I don't know how to fix it best.

Thanks
Stéfan

From josef.pktd at gmail.com Mon Mar 2 06:53:29 2009
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Mon, 2 Mar 2009 06:53:29 -0500
Subject: [SciPy-dev] Advice on Simulated Annealing (ticket #875)
In-Reply-To: <9457e7c80903012155q79e148f4r4ae7af061b61a15@mail.gmail.com>
References: <9457e7c80903012155q79e148f4r4ae7af061b61a15@mail.gmail.com>
Message-ID: <1cd32cbb0903020353v7bdb727bh339a5ca2e0a83f81@mail.gmail.com>

On Mon, Mar 2, 2009 at 12:55 AM, Stéfan van der Walt wrote:
> Hi,
>
> Is there anyone here who has some experience with simulated annealing?
>
> The code at
>
> http://scipy.org/scipy/scipy/ticket/875#comment:1
>
> looks fragile, but I don't know how to fix it best.
>
> Thanks
> Stéfan

I had looked at this ticket briefly and, I think, following his suggestion of building the random array xt incrementally should not affect any other code and would speed up finding a random array inside the bounds. The elements of xc are independently drawn, so in each iteration only those values for which indu or indo are true have to be replaced, and successful draws can be kept. xc can be drawn for the full array and partially discarded, since this is cheap.

Something like this (with indu and indo initialized before the first iteration) should work:

xt[indu | indo] = x0[indu | indo] + xc[indu | indo]

Josef

From sturla at molden.no Mon Mar 2 07:52:37 2009
From: sturla at molden.no (Sturla Molden)
Date: Mon, 02 Mar 2009 13:52:37 +0100
Subject: [SciPy-dev] Advice on Simulated Annealing (ticket #875)
In-Reply-To: <9457e7c80903012155q79e148f4r4ae7af061b61a15@mail.gmail.com>
References: <9457e7c80903012155q79e148f4r4ae7af061b61a15@mail.gmail.com>
Message-ID: <49ABD695.207@molden.no>

On 3/2/2009 6:55 AM, Stéfan van der Walt wrote:
> Hi,
>
> Is there anyone here who has some experience with simulated annealing?
>
> The code at
>
> http://scipy.org/scipy/scipy/ticket/875#comment:1

An atrocious thing about SciPy's SA is line 188 in anneal.py that says

schedule = eval(schedule+'_sa()')

S.M.

From sturla at molden.no Mon Mar 2 08:54:09 2009
From: sturla at molden.no (Sturla Molden)
Date: Mon, 02 Mar 2009 14:54:09 +0100
Subject: [SciPy-dev] Advice on Simulated Annealing (ticket #875)
In-Reply-To: <9457e7c80903012155q79e148f4r4ae7af061b61a15@mail.gmail.com>
References: <9457e7c80903012155q79e148f4r4ae7af061b61a15@mail.gmail.com>
Message-ID: <49ABE501.4050100@molden.no>

On 3/2/2009 6:55 AM, Stéfan van der Walt wrote:
> http://scipy.org/scipy/scipy/ticket/875#comment:1
>
> looks fragile, but I don't know how to fix it best.

This ticket is bogus. There is no simple_sa in anneal.py in SVN, and line 151 is a docstring. Google says that the simple_sa class was posted to this mailing list two years ago by William Ratcliff. But as far as I can tell it is not in SciPy.
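A minimal, self-contained sketch of the masked redraw Josef suggests above (hypothetical names, not the anneal.py source; it assumes the bounds are actually reachable by the draw):

import numpy as np

def redraw_until_inside(x0, lower, upper, draw):
    # `draw(n)` is a stand-in for the schedule's random increment generator
    x0 = np.asarray(x0, dtype=float)
    xt = x0 + draw(x0.size)
    bad = (xt < lower) | (xt > upper)      # indo and indu combined elementwise
    while bad.any():
        xc = draw(x0.size)                 # drawing the full vector is cheap ...
        xt[bad] = x0[bad] + xc[bad]        # ... but only the failed entries are replaced
        bad = (xt < lower) | (xt > upper)
    return xt

With, say, draw = lambda n: np.random.uniform(-3, 3, size=n) and bounds 1 and 4 around x0 = [3, 3], this keeps the components that already landed inside the bounds instead of rejecting the whole vector each time.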
Sturla Molden From cimrman3 at ntc.zcu.cz Mon Mar 2 09:14:00 2009 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Mon, 02 Mar 2009 15:14:00 +0100 Subject: [SciPy-dev] ANN: SfePy 2009.1 Message-ID: <49ABE9A8.2040002@ntc.zcu.cz> I am pleased to announce the release of SfePy 2009.1. SfePy (simple finite elements in Python) is a finite element analysis software based primarily on Numpy and SciPy. Mailing lists, issue tracking, git repository: http://sfepy.org Home page: http://sfepy.kme.zcu.cz Major improvements: - new solvers: - simple backtracking steepest descent optimization solver - PETSc Krylov solvers via petsc4py, sequential mode - LOBPCG eigenvalue solver (SciPy implementation) - new mesh readers: - mesh3d (hermes3d) - AVS UCD ascii mesh - Hypermesh ascii mesh - homogenization framework: - unified approach to resolve data dependencies: HomogenizationEngine class - switched DVCS from mercurial to git Applications: - phononic materials: - dispersion analysis, phase velocity computation for phononic materials - caching of coefficients to speed up parametric runs - schroedinger.py: - fixed DFT iterations, iteration plot saving - basic smearing around Fermi limit For more information on this release, see http://sfepy.googlecode.com/svn/web/releases/2009.1_RELEASE_NOTES.txt Best regards, Robert Cimrman From william.ratcliff at gmail.com Mon Mar 2 09:43:09 2009 From: william.ratcliff at gmail.com (william ratcliff) Date: Mon, 2 Mar 2009 09:43:09 -0500 Subject: [SciPy-dev] Advice on Simulated Annealing (ticket #875) In-Reply-To: <49ABE501.4050100@molden.no> References: <9457e7c80903012155q79e148f4r4ae7af061b61a15@mail.gmail.com> <49ABE501.4050100@molden.no> Message-ID: <827183970903020643r668756efvf3db36c97cb3b42b@mail.gmail.com> I posted a suggested patch because it would seem that the simulated annealing routine in scipy as it was would not respect bounds for the search space. I'll try to reconstruct the test case to show where it fails. There seemed to be a lack of interest in a version that did respect bounds (upper/lower limits on the parameters), so I've just been using my own copy. Cheers, William On Mon, Mar 2, 2009 at 8:54 AM, Sturla Molden wrote: > On 3/2/2009 6:55 AM, St?fan van der Walt wrote: > > > http://scipy.org/scipy/scipy/ticket/875#comment:1 > > > > looks fragile, but I don't know how to fix it best. > > This ticket i bogus. There is no simple_sa in anneal.py in SVN, and line > 151 is a docstring. Google says that the simple_sa class was posted to > this mailing list two years ago by William Ratcliff. But as far as I can > tell it is not in SciPy. > > Sturla Molden > _______________________________________________ > Scipy-dev mailing list > Scipy-dev at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From william.ratcliff at gmail.com Mon Mar 2 12:04:13 2009 From: william.ratcliff at gmail.com (william ratcliff) Date: Mon, 2 Mar 2009 12:04:13 -0500 Subject: [SciPy-dev] Advice on Simulated Annealing (ticket #875) In-Reply-To: <827183970903020643r668756efvf3db36c97cb3b42b@mail.gmail.com> References: <9457e7c80903012155q79e148f4r4ae7af061b61a15@mail.gmail.com> <49ABE501.4050100@molden.no> <827183970903020643r668756efvf3db36c97cb3b42b@mail.gmail.com> Message-ID: <827183970903020904j53459cdaj5d27b9348b99399e@mail.gmail.com> Here is code that will demonstrate the failure. 
Suppose you want to minimize the simple function f(x,y)=x^2+y^2, but you want to do it in a specified domain. This will not respect the upper and lower bounds:

cheers,
William

import numpy as N
import scipy.optimize.anneal as anneal

def fcn(p):
    x,y=p
    result=x**2+y**2
    return result


if __name__=="__main__":
    p0=N.array([3,3],'d')
    lowerm=[1,1]
    upperm=[4,4]
    myschedule='fast'
    p0,jmin=anneal(fcn,p0,\
                   schedule=myschedule,lower=lowerm,upper=upperm,\
                   maxeval=None, maxaccept=None,dwell=10,maxiter=600,T0=10000)
    print 'p0',p0,'jmin',jmin

On Mon, Mar 2, 2009 at 9:43 AM, william ratcliff wrote:
> I posted a suggested patch because it would seem that the simulated
> annealing routine in scipy as it was would not respect bounds for the search
> space. I'll try to reconstruct the test case to show where it fails. There
> seemed to be a lack of interest in a version that did respect bounds
> (upper/lower limits on the parameters), so I've just been using my own copy.
>
>
> Cheers,
> William
>
>
> On Mon, Mar 2, 2009 at 8:54 AM, Sturla Molden wrote:
>
>> On 3/2/2009 6:55 AM, Stéfan van der Walt wrote:
>>
>> > http://scipy.org/scipy/scipy/ticket/875#comment:1
>> >
>> > looks fragile, but I don't know how to fix it best.
>>
>> This ticket is bogus. There is no simple_sa in anneal.py in SVN, and line
>> 151 is a docstring. Google says that the simple_sa class was posted to
>> this mailing list two years ago by William Ratcliff. But as far as I can
>> tell it is not in SciPy.
>>
>> Sturla Molden
>> _______________________________________________
>> Scipy-dev mailing list
>> Scipy-dev at scipy.org
>> http://projects.scipy.org/mailman/listinfo/scipy-dev
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From josef.pktd at gmail.com Mon Mar 2 14:04:11 2009
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Mon, 2 Mar 2009 14:04:11 -0500
Subject: [SciPy-dev] Advice on Simulated Annealing (ticket #875)
In-Reply-To: <827183970903020904j53459cdaj5d27b9348b99399e@mail.gmail.com>
References: <9457e7c80903012155q79e148f4r4ae7af061b61a15@mail.gmail.com> <49ABE501.4050100@molden.no> <827183970903020643r668756efvf3db36c97cb3b42b@mail.gmail.com> <827183970903020904j53459cdaj5d27b9348b99399e@mail.gmail.com>
Message-ID: <1cd32cbb0903021104j3e47c1devb3b9e02e9a040355@mail.gmail.com>

On Mon, Mar 2, 2009 at 12:04 PM, william ratcliff wrote:
> Here is code that will demonstrate the failure. Suppose you want to
> minimize the simple function f(x,y)=x^2+y^2, but you want to do it in a
> specified domain. This will not respect the upper and lower bounds:
>
> cheers,
> William
>
>
> import numpy as N
> import scipy.optimize.anneal as anneal
>
> def fcn(p):
>     x,y=p
>     result=x**2+y**2
>     return result
>
>
> if __name__=="__main__":
>     p0=N.array([3,3],'d')
>     lowerm=[1,1]
>     upperm=[4,4]
>     myschedule='fast'
>     p0,jmin=anneal(fcn,p0,\
>                    schedule=myschedule,lower=lowerm,upper=upperm,\
>                    maxeval=None, maxaccept=None,dwell=10,maxiter=600,T0=10000)
>     print 'p0',p0,'jmin',jmin
>
>

After looking a bit more carefully:

`upper` and `lower` in `fast_sa` are the bounds on the updating increment, xc, not on the parameters being estimated (x0, xnew, and xt in the ticket). I didn't see any constraints on the parameters themselves in anneal.

The current bounds restrict the updating to local perturbations, while in your case perturbations would always be global.
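To make that distinction concrete, here is a schematic of one update step (a sketch with hypothetical names; the uniform draws only stand in for the temperature-dependent schedules, and this is not the anneal.py implementation):

import numpy as np

def step_with_increment_bounds(x0, lower, upper, rng=np.random):
    # `lower`/`upper` bound the increment xc, as in the current schedules,
    # so the candidate x0 + xc can still leave the feasible region.
    xc = rng.uniform(lower, upper, size=np.size(x0))
    return np.asarray(x0) + xc

def step_with_parameter_bounds(x0, lower, upper, rng=np.random):
    # Bounding the parameters themselves needs an extra step, e.g. clipping
    # the candidate back into [lower, upper]; William's ansatz would instead
    # return x0 unchanged whenever the candidate violates a bound.
    xc = rng.uniform(-1.0, 1.0, size=np.size(x0))
    return np.clip(np.asarray(x0) + xc, lower, upper)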
Rewriting anneal to incorporate bounds might be a good enhancement but, I think, you need to distinguish between bounds on the parameters and bounds on the update increments. Then the update increments can easily bound by (xbounds - x0) and you don't need iteration to find the updated values. Josef From bsouthey at gmail.com Mon Mar 2 14:09:17 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 02 Mar 2009 13:09:17 -0600 Subject: [SciPy-dev] Depreciating functions in scipy.stats In-Reply-To: <7CFAD058-CB6E-4CAB-A59B-4AF03FB365A7@gmail.com> References: <49A7171F.5070500@gmail.com> <1cd32cbb0902261547v43de301du7199b3bb7af26c47@mail.gmail.com> <49A800A7.4080407@gmail.com> <1cd32cbb0902270819l7da77aacv9ff8f7ac3c60fd1b@mail.gmail.com> <64BE787E-059E-4BC4-A9BC-F40B294442DD@gmail.com> <1cd32cbb0902271148u12111254t72b76ddbc5efb17b@mail.gmail.com> <1cd32cbb0902271214q2e08a68ciead44452234f2f30@mail.gmail.com> <6CF41A55-2514-410F-8C68-CA46D26A628F@gmail.com> <1cd32cbb0902271352j928a978g3a3f0cd4b53b69a4@mail.gmail.com> <7CFAD058-CB6E-4CAB-A59B-4AF03FB365A7@gmail.com> Message-ID: <49AC2EDD.4080000@gmail.com> Hi, I am seeing a few functions that should be made depreciated as these appear to duplicate Numpy or Scipy functions. Do you want these as new or old tickets (for example, samplestd has ticket #81 as part of the Statistics Review)? Would you want a large patch or one for each ticket? These functions are just renamed functions present in scipy.special just with perhaps slightly more informative names: erfc ksprob fprob chisqprob zprob But I do not think we need these as separate functions but there is the issue of depreciation involved if users use these specific functions. There are other like that should be treated as depreciated: samplestd samplevar >>> import numpy as np >>> import scipy.stats.stats as stats >>> a=np.array([[1,2,3,4,5], [6,7,8,9,10]]) >>> np.std(a,axis=0) array([ 2.5, 2.5, 2.5, 2.5, 2.5]) >>> stats.samplestd(a,axis=0) array([ 2.5, 2.5, 2.5, 2.5, 2.5]) >>> stats.samplestd(a,axis=None) 2.8722813232690143 >>> np.std(a,axis=None) 2.8722813232690143 Also, stats.py has the histogram and histogram2 functions where I agree with the comment in the code about being obsoleted by numpy.histogram. I would think these should be depreciated although the cumfreq and relfreq functions would need to be rewritten, Thanks Bruce From william.ratcliff at gmail.com Mon Mar 2 14:15:43 2009 From: william.ratcliff at gmail.com (william ratcliff) Date: Mon, 2 Mar 2009 14:15:43 -0500 Subject: [SciPy-dev] Advice on Simulated Annealing (ticket #875) In-Reply-To: <1cd32cbb0903021104j3e47c1devb3b9e02e9a040355@mail.gmail.com> References: <9457e7c80903012155q79e148f4r4ae7af061b61a15@mail.gmail.com> <49ABE501.4050100@molden.no> <827183970903020643r668756efvf3db36c97cb3b42b@mail.gmail.com> <827183970903020904j53459cdaj5d27b9348b99399e@mail.gmail.com> <1cd32cbb0903021104j3e47c1devb3b9e02e9a040355@mail.gmail.com> Message-ID: <827183970903021115m6eb81aeby8e3391d2579df22e@mail.gmail.com> Perhaps we could enhance the documentation so this is clear? Also, having another module which does impose bounds on the actual values of the parameters would be useful. My ansatz was that if the next iteration would fall outside of the bounds, stay at the current location. Cheers, William On Mon, Mar 2, 2009 at 2:04 PM, wrote: > On Mon, Mar 2, 2009 at 12:04 PM, william ratcliff > wrote: > > Here is code that will demonstrate the failure. 
Suppose you want to > > minimize the simple function f(x,y)=x^2+y^2, but you want to do it in a > > specified domain. This will not respect the upper and lower bounds: > > > > cheers, > > William > > > > > > import numpy as N > > import scipy.optimize.anneal as anneal > > > > def fcn(p): > > x,y=p > > result=x**2+y**2 > > return result > > > > > > if __name__=="__main__": > > p0=N.array([3,3],'d') > > lowerm=[1,1] > > upperm=[4,4] > > myschedule='fast' > > p0,jmin=anneal(fcn,p0,\ > > schedule=myschedule,lower=lowerm,upper=upperm,\ > > maxeval=None, > > maxaccept=None,dwell=10,maxiter=600,T0=10000) > > print 'p0',p0,'jmin',jmin > > > > > > After looking a bit more carefully: > > `upper` and `lower` in` fast_ca` are the bounds on the updating > increment, xc, not on the parameters that are estimated x0, xnew and > in the ticket. I didn't see any constraints on the parameters > themselves in anneal. > > The current bounds restrict the updating to local perturbations, while > in your case perturbations would always be global. > > Rewriting anneal to incorporate bounds might be a good enhancement > but, I think, you need to distinguish between bounds on the parameters > and bounds on the update increments. Then the update increments can > easily bound by (xbounds - x0) and you don't need iteration to find > the updated values. > > Josef > _______________________________________________ > Scipy-dev mailing list > Scipy-dev at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon Mar 2 15:47:30 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 2 Mar 2009 15:47:30 -0500 Subject: [SciPy-dev] Depreciating functions in scipy.stats In-Reply-To: <49AC2EDD.4080000@gmail.com> References: <49A7171F.5070500@gmail.com> <49A800A7.4080407@gmail.com> <1cd32cbb0902270819l7da77aacv9ff8f7ac3c60fd1b@mail.gmail.com> <64BE787E-059E-4BC4-A9BC-F40B294442DD@gmail.com> <1cd32cbb0902271148u12111254t72b76ddbc5efb17b@mail.gmail.com> <1cd32cbb0902271214q2e08a68ciead44452234f2f30@mail.gmail.com> <6CF41A55-2514-410F-8C68-CA46D26A628F@gmail.com> <1cd32cbb0902271352j928a978g3a3f0cd4b53b69a4@mail.gmail.com> <7CFAD058-CB6E-4CAB-A59B-4AF03FB365A7@gmail.com> <49AC2EDD.4080000@gmail.com> Message-ID: <1cd32cbb0903021247i38861e7ak33d1330881da818c@mail.gmail.com> On Mon, Mar 2, 2009 at 2:09 PM, Bruce Southey wrote: > Hi, > I am seeing a few functions that should be made depreciated as these > appear to duplicate Numpy or Scipy functions. > > Do you want these as new or old tickets (for example, samplestd has > ticket #81 as part of the Statistics Review)? > Would you want a large patch or one for each ticket? I agree with all the depreciation, and there might be some more (eg. sem and stderr are essentially the same). For depreciation warnings I would prefer one new ticket with one patch (or easier for me to verify is the changed complete sourcefile of stats.py) > > These functions are just renamed functions present in scipy.special just > with perhaps slightly more informative names: > erfc > ksprob > fprob > chisqprob > zprob Most calls to these functions can be replaced to calls to the distribution, e.g distributions.f.sf, as I did for the t-tests. However, I have seen them used in some external packages, and a release with a depreciation warning might be necessary. 
> But I do not think we need these as separate functions but there is the > issue of depreciation involved if users use these specific functions. > > There are other like that should be treated as depreciated: > samplestd > samplevar > > ?>>> import numpy as np > ?>>> import scipy.stats.stats as stats > ?>>> a=np.array([[1,2,3,4,5], [6,7,8,9,10]]) > ?>>> np.std(a,axis=0) > array([ 2.5, ?2.5, ?2.5, ?2.5, ?2.5]) > ?>>> stats.samplestd(a,axis=0) > array([ 2.5, ?2.5, ?2.5, ?2.5, ?2.5]) > ?>>> stats.samplestd(a,axis=None) > 2.8722813232690143 > ?>>> np.std(a,axis=None) > 2.8722813232690143 > > Also, stats.py has the histogram and histogram2 functions where I agree > with the comment in the code about being obsoleted by numpy.histogram. > I would think these should be depreciated although the cumfreq and > relfreq functions would need to be rewritten, > I never looked closely at the histogram and histogram2 functions in stats, because I also use the numpy version. So I don't know if they have equivalent functionality. Neither the histogram functions nor cumfreq and relfreq have tests, so before depreciating we should find out what these functions are doing for different cases. > Thanks > Bruce Thank you for checking this Josef From bsouthey at gmail.com Mon Mar 2 16:59:22 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 02 Mar 2009 15:59:22 -0600 Subject: [SciPy-dev] Depreciating functions in scipy.stats In-Reply-To: <1cd32cbb0903021247i38861e7ak33d1330881da818c@mail.gmail.com> References: <49A7171F.5070500@gmail.com> <49A800A7.4080407@gmail.com> <1cd32cbb0902270819l7da77aacv9ff8f7ac3c60fd1b@mail.gmail.com> <64BE787E-059E-4BC4-A9BC-F40B294442DD@gmail.com> <1cd32cbb0902271148u12111254t72b76ddbc5efb17b@mail.gmail.com> <1cd32cbb0902271214q2e08a68ciead44452234f2f30@mail.gmail.com> <6CF41A55-2514-410F-8C68-CA46D26A628F@gmail.com> <1cd32cbb0902271352j928a978g3a3f0cd4b53b69a4@mail.gmail.com> <7CFAD058-CB6E-4CAB-A59B-4AF03FB365A7@gmail.com> <49AC2EDD.4080000@gmail.com> <1cd32cbb0903021247i38861e7ak33d1330881da818c@mail.gmail.com> Message-ID: <49AC56BA.5050904@gmail.com> josef.pktd at gmail.com wrote: > On Mon, Mar 2, 2009 at 2:09 PM, Bruce Southey wrote: > >> Hi, >> I am seeing a few functions that should be made depreciated as these >> appear to duplicate Numpy or Scipy functions. >> >> Do you want these as new or old tickets (for example, samplestd has >> ticket #81 as part of the Statistics Review)? >> Would you want a large patch or one for each ticket? >> > > I agree with all the depreciation, and there might be some more (eg. sem and > stderr are essentially the same). For depreciation warnings I would prefer > one new ticket with one patch (or easier for me to verify is the > changed complete > sourcefile of stats.py) > > >> These functions are just renamed functions present in scipy.special just >> with perhaps slightly more informative names: >> erfc >> ksprob >> fprob >> chisqprob >> zprob >> > > Most calls to these functions can be replaced to calls to the distribution, e.g > distributions.f.sf, as I did for the t-tests. However, I have seen them used in > some external packages, and a release with a depreciation warning might be > necessary. > > I agree that these should be first depreciated. I will try to write these when I get the time. >> But I do not think we need these as separate functions but there is the >> issue of depreciation involved if users use these specific functions. 
>> >> There are other like that should be treated as depreciated: >> samplestd >> samplevar >> Okay, I have created two tickets with hopefully suitable patches for: samplevar: 877 samplestd: 878 I did not change the info.py and any tests but these will need to be changed if the patches are applied. Also, if you apply these patches, I think that tickets 80 and 81 can be closed. >> >> Also, stats.py has the histogram and histogram2 functions where I agree >> with the comment in the code about being obsoleted by numpy.histogram. >> I would think these should be depreciated although the cumfreq and >> relfreq functions would need to be rewritten, >> >> > > I never looked closely at the histogram and histogram2 functions in stats, > because I also use the numpy version. So I don't know if they have > equivalent functionality. > I have not examined it in detail, histogram is different from numpy in various ways like arguments and implementation. But, after the histogram discussion on the numpy list, I do not consider that it is sufficiently different than the numpy version to justify yet another version. I think histogram2 is just a utility function than a useful function and it not used elsewhere. > Neither the histogram functions nor cumfreq and relfreq have tests, so > before depreciating we should find out what these functions are doing > for different cases. > When I get to these functions, I will look into providing tests. Also these may need changes depending on what happens with histogram. > >> Thanks >> Bruce >> > > Thank you for checking this > > Josef > No problems (yet) especially when I will have more 'issues' as I go through these functions. Bruce From stefan at sun.ac.za Mon Mar 2 17:17:11 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 3 Mar 2009 00:17:11 +0200 Subject: [SciPy-dev] Hosting infrastructure upgrade tomorrow Message-ID: <9457e7c80903021417p7d35d01dh46769ba5c93fb4cf@mail.gmail.com> Hi all, Tomorrow afternoon at 14:00 UTC, the SciPy SVN and Trac services will be migrated to a new machine. Please be advised that, for a period of two hours, access to these and other services hosted on scipy.org may be unavailable. Regards St?fan From lesserwhirls at gmail.com Mon Mar 2 22:25:29 2009 From: lesserwhirls at gmail.com (Sean Arms) Date: Mon, 2 Mar 2009 21:25:29 -0600 Subject: [SciPy-dev] Continuous Wavelet Transform for SciPy Message-ID: Greetings! My name is Sean Arms and I'm a graduate student at the University of Oklahoma in the School of Meteorology. As part of my PhD research, I'm studying coherent structures in atmospheric boundary layer turbulence, primarily using in-situ observations and, secondarily, Direct Numerical Simulation (DNS) output. One common approach for detecting coherent structures in observational datasets relies on the use of the global wavelet power spectrum as estimated from a continuous wavelet transform (CWT). I know SciPy has a DWT impementation, and I've already been in contact with Filip. He recommeded that I post my code in hopes that it would add some momentum to the python-cwt development and create some feedback (I'm currently looking for a good place to post my code). I've implemented the CWT using pure python (that is, I did not write any C extension code myself - nothing to build), along with one mother wavelet (second derivitive of a Gaussian, or the Mexican Hat) - I'll be adding more Mother wavelets as I go along. I've made it a point to (try to) design my MotherWavelet class to be easily extendable. 
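(A rough pure-NumPy illustration of the kind of transform being described — hypothetical names and normalisation, not Sean's MotherWavelet code:

import numpy as np

def mexican_hat(t):
    # Second derivative of a Gaussian, normalised to unit energy.
    return (2.0 / (np.sqrt(3.0) * np.pi ** 0.25)) * (1.0 - t ** 2) * np.exp(-t ** 2 / 2.0)

def cwt(x, scales, wavelet=mexican_hat):
    # Direct convolution; assumes the dilated wavelet support stays shorter
    # than the signal, and exploits that the Mexican hat is symmetric, so
    # convolution equals correlation here.
    x = np.asarray(x, dtype=float)
    coeffs = np.empty((len(scales), x.size))
    for i, s in enumerate(scales):
        t = np.arange(-5.0 * s, 5.0 * s + 1.0)
        psi = wavelet(t / s) / np.sqrt(s)      # L2-normalised daughter wavelet
        coeffs[i] = np.convolve(x, psi, mode='same')
    return coeffs

The global wavelet power spectrum used for coherent-structure detection is then just the time average of |coeffs|**2 at each scale.)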
I'm working on documentation and a few tests at the moment, but so far my code compares well with other wavelet routines. The point of this email is to introduce myself and let the SciPy dev community know that I am willing to help develop CWT support for SciPy - I'll already be doing the work for my research, so I might as well put in the extra effort to make is usable by the larger community! Cheers! Sean Arms Graduate Research Assistant School of Meteorology University of Oklahoma -------------- next part -------------- An HTML attachment was scrubbed... URL: From thouis at broad.mit.edu Mon Mar 2 22:40:43 2009 From: thouis at broad.mit.edu (Thouis (Ray) Jones) Date: Mon, 2 Mar 2009 22:40:43 -0500 Subject: [SciPy-dev] Question: general heap implementation versus specific? Message-ID: <6c17e6f50903021940y397605aamc3b9aac6dbbd2aba@mail.gmail.com> Lee Kamentsky has been working on a replacement for the watershed image transformation in scipy (the current one isn't exactly broken, but behaves in unexpected ways for some images). As part of this work he implemented heaps in cython, specific to this problem (integers, 4 entries per heap element). I spent some time generalizing it to arbitrary number of entries per element, with the thought that it could be of use elsewhere in scipy. I wonder now if that was worthwhile. The performance cost of the generalization isn't horrible, but not negligible, and I wonder if it the code can actually be used elsewhere effectively. For instance, the recent kdTree code has a heap implementation as well, but I don't think it could take advantage of this code (I haven't looked too closely). Any opinions or advice on the relative cost of generality and simplicity versus performance? I've attached the current heap implementation for reference. Ray Jones -------------- next part -------------- A non-text attachment was scrubbed... Name: heap.pxi Type: application/octet-stream Size: 4593 bytes Desc: not available URL: From peridot.faceted at gmail.com Mon Mar 2 23:24:03 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Mon, 2 Mar 2009 23:24:03 -0500 Subject: [SciPy-dev] Question: general heap implementation versus specific? In-Reply-To: <6c17e6f50903021940y397605aamc3b9aac6dbbd2aba@mail.gmail.com> References: <6c17e6f50903021940y397605aamc3b9aac6dbbd2aba@mail.gmail.com> Message-ID: 2009/3/2 Thouis (Ray) Jones : > Lee Kamentsky has been working on a replacement for the watershed > image transformation in scipy (the current one isn't exactly broken, > but behaves in unexpected ways for some images). ?As part of this work > he implemented heaps in cython, specific to this problem (integers, 4 > entries per heap element). ?I spent some time generalizing it to > arbitrary number of entries per element, with the thought that it > could be of use elsewhere in scipy. > > I wonder now if that was worthwhile. ?The performance cost of the > generalization isn't horrible, but not negligible, and I wonder if it > the code can actually be used elsewhere effectively. ?For instance, > the recent kdTree code has a heap implementation as well, but I don't > think it could take advantage of this code (I haven't looked too > closely). I wrote the heap implementation in cKDTree, and I wrestled with the same question. As JWZ put it in the XKeyCaps source "I'd just like to take this moment to point out that C has all the expressive power of two dixie cups and a string." The problem is really one of writing generic data structures in C. 
For a heap, I see two relevant questions: what is the type of the keys? and how big are the objects being stored? For the type of the keys, you clearly want different implementations for each of int, short int, float, double, user-defined type (with comparison function). If the stored objects are small - ints or the like - you can stow them in the heap directly, copying when you have to move heap objects around. If they're large, then you want to store pointers to them, and have some well-defined memory allocation and deallocation syntax. (The cKDTree code deals with this an an ad-hoc manner with a union.) While C++ templates should be able to help with this sort of thing, I think as long as one is sticking to C and to cython, there's no good solution to making a general-enough library. Of course, if you want a fully-flexible heap at the python level, that exists already. But I think, annoying as it is, we're stuck reimplementing heaps. That said, I suspect that with a little care it might be possible to merge the heap implementations between cKDTree and the image transformations. Anne > Any opinions or advice on the relative cost of generality and > simplicity versus performance? > > I've attached the current heap implementation for reference. > > Ray Jones > > _______________________________________________ > Scipy-dev mailing list > Scipy-dev at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-dev > > From josef.pktd at gmail.com Tue Mar 3 00:38:47 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 3 Mar 2009 00:38:47 -0500 Subject: [SciPy-dev] Advice on Simulated Annealing (ticket #875) In-Reply-To: <827183970903021115m6eb81aeby8e3391d2579df22e@mail.gmail.com> References: <9457e7c80903012155q79e148f4r4ae7af061b61a15@mail.gmail.com> <49ABE501.4050100@molden.no> <827183970903020643r668756efvf3db36c97cb3b42b@mail.gmail.com> <827183970903020904j53459cdaj5d27b9348b99399e@mail.gmail.com> <1cd32cbb0903021104j3e47c1devb3b9e02e9a040355@mail.gmail.com> <827183970903021115m6eb81aeby8e3391d2579df22e@mail.gmail.com> Message-ID: <1cd32cbb0903022138s95afcbt8a96427769bca57f@mail.gmail.com> On Mon, Mar 2, 2009 at 2:15 PM, william ratcliff wrote: > Perhaps we could enhance the documentation so this is clear?? Also, having > another module which does impose bounds on the actual values of the > parameters would be useful.? My ansatz was that if the next iteration would > fall outside of the bounds, stay at the current location. > > > Cheers, > William I checked again and neither anneal nor brute have any tests. Are there any good or classical test cases for global optimization? I only know the basic principles about simulated annealing, but looking at the implementation in anneal, it seems to me that your extension to impose bounds in update_guess should work, at least in fast and simple (I didn't look carefully at the others.) lower and upper bounds are also used in `getstart_temp`, which might also need to be rewritten if the bounds of the increment are different from the bounds on the underlying parameter. As we discussed last week, new code needs to be accompanied by proper tests. Especially, since no one has been really "maintaining" anneal for a long time, that's the impression I get from the changelog, >My ansatz was that if the next iteration would > fall outside of the bounds, stay at the current location. 
That's not my reading of the code, since, if xt is outside, then a new increment candidate, xc, is drawn, which underlies the problem of the ticket (looking for a very long time for a new candidate). But I still think a more efficient drawing from a rectangular area inside the parameter bounds should be sufficient to make this work. Josef From scott.sinclair.za at gmail.com Tue Mar 3 01:04:07 2009 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Tue, 3 Mar 2009 08:04:07 +0200 Subject: [SciPy-dev] Advice on Simulated Annealing (ticket #875) In-Reply-To: <827183970903021115m6eb81aeby8e3391d2579df22e@mail.gmail.com> References: <9457e7c80903012155q79e148f4r4ae7af061b61a15@mail.gmail.com> <49ABE501.4050100@molden.no> <827183970903020643r668756efvf3db36c97cb3b42b@mail.gmail.com> <827183970903020904j53459cdaj5d27b9348b99399e@mail.gmail.com> <1cd32cbb0903021104j3e47c1devb3b9e02e9a040355@mail.gmail.com> <827183970903021115m6eb81aeby8e3391d2579df22e@mail.gmail.com> Message-ID: <6a17e9ee0903022204l5985118cr4dd942ea405c68a6@mail.gmail.com> > 2009/3/2 william ratcliff > > Perhaps we could enhance the documentation so this is clear? > Also, having another module which does impose bounds on the actual values of the parameters would be useful. > My ansatz was that if the next iteration would fall outside of the bounds, stay at the current location. Another option is to use a suitable transform on the parameters so that the parameter search is carried out in an unbounded domain and the unbounded parameters transformed to bounded parameters when evaluating the objective function. Something like: Given an unbounded parameter estimate y, a bounded estimate x with a < x < b is given by x = ( (b*exp(y) + a) / (1 + exp(y)) ) This is based on a variation of the logit transform (e.g. http://en.wikipedia.org/wiki/Logit) For a < x < b, y is unbounded if y = ln( (x - a) / (b - x) ) I don't know if there are any subtle problems with using this approach, especially when x is close to the bounds a & b. I've found it to work well in the past. Cheers, Scott From stefan at sun.ac.za Tue Mar 3 03:27:35 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 3 Mar 2009 10:27:35 +0200 Subject: [SciPy-dev] Continuous Wavelet Transform for SciPy In-Reply-To: References: Message-ID: <9457e7c80903030027l7b83ba9ena62a67158f8c64ca@mail.gmail.com> Hi Sean 2009/3/3 Sean Arms : > ???? The point of this email is to introduce myself and let the SciPy dev > community know that I am willing to help develop CWT support for SciPy - > I'll already be doing the work for my research, so I might as well put in > the extra effort to make is usable by the larger community! Welcome! I am very glad to hear that you'll be working on wavelet support (hey, that's actually a domain-specific pun :-), which I believe to be a very worthwhile endeavour! Regards St?fan From sturla at molden.no Tue Mar 3 05:42:38 2009 From: sturla at molden.no (Sturla Molden) Date: Tue, 03 Mar 2009 11:42:38 +0100 Subject: [SciPy-dev] Question: general heap implementation versus specific? In-Reply-To: References: <6c17e6f50903021940y397605aamc3b9aac6dbbd2aba@mail.gmail.com> Message-ID: <49AD099E.6010106@molden.no> On 3/3/2009 5:24 AM, Anne Archibald wrote: > While C++ templates should be able to help with this sort of thing, I > think as long as one is sticking to C and to cython, there's no good > solution to making a general-enough library. Of course, if you want a > fully-flexible heap at the python level, that exists already. 
But I > think, annoying as it is, we're stuck reimplementing heaps. Python is a fantastic text processor though. It is easy to set up an ad hoc template system in Python. In its simplest form: >>> code = """ cdef T1 foobar(T2 *t): return t[0] """ >>> print code.replace('T1', 'int').replace('T2', 'double') cdef int foobar(double *t): return t[0] Sturla Molden From cournape at gmail.com Tue Mar 3 06:36:54 2009 From: cournape at gmail.com (David Cournapeau) Date: Tue, 3 Mar 2009 20:36:54 +0900 Subject: [SciPy-dev] Question: general heap implementation versus specific? In-Reply-To: References: <6c17e6f50903021940y397605aamc3b9aac6dbbd2aba@mail.gmail.com> Message-ID: <5b8d13220903030336u644a47a0s7922c0fd195d0b38@mail.gmail.com> On Tue, Mar 3, 2009 at 1:24 PM, Anne Archibald wrote: > > I wrote the heap implementation in cKDTree, and I wrestled with the > same question. As JWZ put it in the XKeyCaps source "I'd just like to > take this moment to point out that C has all the expressive power of > two dixie cups and a string." The problem is really one of writing > generic data structures in C. There are partial solutions to this problem, I don't know if you are aware of them: - autogen: http://www.gnu.org/software/autogen/ - Python pre-processing (using for example the template system we have in numpy for .src files). Autogen is quite nice (but runs only on unix/cygwin). cheers, David From sturla at molden.no Tue Mar 3 07:22:52 2009 From: sturla at molden.no (Sturla Molden) Date: Tue, 03 Mar 2009 13:22:52 +0100 Subject: [SciPy-dev] Question: general heap implementation versus specific? In-Reply-To: <5b8d13220903030336u644a47a0s7922c0fd195d0b38@mail.gmail.com> References: <6c17e6f50903021940y397605aamc3b9aac6dbbd2aba@mail.gmail.com> <5b8d13220903030336u644a47a0s7922c0fd195d0b38@mail.gmail.com> Message-ID: <49AD211C.3020807@molden.no> On 3/3/2009 12:36 PM, David Cournapeau wrote: > There are partial solutions to this problem, I don't know if you are > aware of them: > - autogen: http://www.gnu.org/software/autogen/ > - Python pre-processing (using for example the template system we > have in numpy for .src files). Yet another option (if you will excuse the profanity) is to make Cython generate C++. Cython allows C name specifiers that can be used to get C++ syntax compiled into the output. # template # void foobar(T arg); ctypedef void templateT_void "template void" ctypedef int T cdef templateT_void foobar(T arg): #whatever # template # T foobar(T arg); ctypedef T templateT_T "template T" cdef templateT_T foobar(T arg): #whatever Presumably, the C naming syntax of Cython should also make it possible to use C++ template container classes from STL or Boost. Sturla Molden From ajvogel at tuks.co.za Tue Mar 3 07:45:09 2009 From: ajvogel at tuks.co.za (Adolph J. Vogel) Date: Tue, 3 Mar 2009 14:45:09 +0200 Subject: [SciPy-dev] Advice on Simulated Annealing (ticket #875) Message-ID: <200903031445.09934.ajvogel@tuks.co.za> Josef >I checked again and neither anneal nor brute have any tests. Are there >any good or classical test cases for global optimization? I have "A Collection of Test Problems for Constrained Global Optimization Algorithms" by C.A Floudas on my desk at the moment. Maybe they can be used for tests for the algorithms. If you`d like I can type up some of the problems and relay them to list? Regards Adolph -- Adolph J. Vogel BEng(Mech) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cournape at gmail.com Tue Mar 3 08:07:30 2009 From: cournape at gmail.com (David Cournapeau) Date: Tue, 3 Mar 2009 22:07:30 +0900 Subject: [SciPy-dev] talkbox scikit In-Reply-To: <49A93327.9000701@mur.at> References: <49A93327.9000701@mur.at> Message-ID: <5b8d13220903030507r19c8803yf40f256de9331f84@mail.gmail.com> Hi Georg On Sat, Feb 28, 2009 at 9:50 PM, Georg Holzmann wrote: > Hallo David ! > > I want to contribute some code to the talkbox scikit and have some > questions. Great. > > - First, is this scikit also for audio signal processing / audio (music) > feature extraction, or mainly for speech only ? No, no, music is definitely welcomed. Actually, I only do speech because it pays my bill :) And a talkox is mainly used for music, after all. > - Second, I have implemented some (random) code for audio signal > processing which IMHO would be nice to have in a scikit: > > * Implementation of a Generalized Cross Correlation (GCC) with various > pre-whitening filters. > (after "The Generalized Correlation Method for Estimation of Time Delay" > by Charles Knapp and Clifford Carter, programmed with looking at the > matlab GCC implementation by Davide Renzi) > this function is used for robustly determine the time delay between two > real signals > > * Equivalent Rectangular Bandwidth Filter Coefficients for biquad IIR > Filters. > (implemented after "An Efficient Implementation of the > Patterson-Holdsworth Auditory Filter Bank" by Malcolm Slaney) > > * Filter coefficients for a bank of Gammatone filters. > (implemented after "An Efficient Implementation of the > Patterson-Holdsworth Auditory Filter Bank" by Malcolm Slaney) > Implementation also with multiple biquad filters, to avoid numerical > unstabilities. > > * Common filter parameters for audio biquad IIR filters (after "Cookbook > formulae for audio EQ biquad filter coefficients", > http://www.musicdsp.org/files/Audio-EQ-Cookbook.txt) > > * Conversion of linear IIR filter parameters to a minimum phase filter > with the same amplitude response. > > * MFCC feature extraction (but I have seen that you already have > implemented mfccs...) > > * I plan to implement more audio/music feature extraction methods in > near future (chroma features, beat features, beat-synchronous features ...) All this sounds great. I don't have much time to work on talkbox at the moment, so I won't be able to review in detail your code. I know more or less what I want to see in the scikits (all the above fits it), but I don't know yet how to organize. There are only two big requirements: - I do want a pure python implementation for everything (with optional C/Cython). - It should be under the BSD. I hope that at least some of it will be included in scipy at some point. > > - In which categories should I put all these ? So I propose all the > filter parameter calculations in talkbox/fbanks/, feature extraction > methods of course into talkbox/features/ and the generalized cross > correlation maybe into talkbox/tools/correlations.py, or maybe in a > seperate file ... ? The organization is quite messy ATM, I have not thought too much about it. The difference between features and fbanks is not clear, for example. The good news is that I may well be the only user of talkbox for now, so if you have a better suggestion, we can break things, > > - And last but not least, is this the right mailing list for such > discussions ;) ? 
Or are there any special lists for scikits No special scikits ML, no, I think it is the right place, David From josef.pktd at gmail.com Tue Mar 3 08:52:49 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 3 Mar 2009 08:52:49 -0500 Subject: [SciPy-dev] talkbox scikit In-Reply-To: <5b8d13220903030507r19c8803yf40f256de9331f84@mail.gmail.com> References: <49A93327.9000701@mur.at> <5b8d13220903030507r19c8803yf40f256de9331f84@mail.gmail.com> Message-ID: <1cd32cbb0903030552i7271ac2bp3d799a2998287e18@mail.gmail.com> >> - Second, I have implemented some (random) code for audio signal >> processing which IMHO would be nice to have in a scikit: >> >> * Implementation of a Generalized Cross Correlation (GCC) with various >> pre-whitening filters. >> (after "The Generalized Correlation Method for Estimation of Time Delay" >> by Charles Knapp and Clifford Carter, programmed with looking at the >> matlab GCC implementation by Davide Renzi) >> this function is used for robustly determine the time delay between two >> real signals > > There are only two big requirements: > ?- I do want a pure python implementation for everything (with > optional C/Cython). > ?- It should be under the BSD. I hope that at least some of it will be > included in scipy at some point. > matlab GCC implementation by Davide Renzi is GPL so depending how it is implemented, this might be a problem Josef From david at ar.media.kyoto-u.ac.jp Tue Mar 3 09:01:58 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Tue, 03 Mar 2009 23:01:58 +0900 Subject: [SciPy-dev] SVN and TRAC migrations starting NOW Message-ID: <49AD3856.3090407@ar.media.kyoto-u.ac.jp> Dear Numpy and Scipy developers, We are now starting the svn and trac migrations to new servers: - The svn repositories of both numpy and scipy are now unavailable, and should be available around 16:00 UTC (3rd March 2009). You will then be able to update/commit again. - Trac for numpy and scipy are also unavailable. We will send an email when everything will be backed up, The Scipy website administrators From josef.pktd at gmail.com Tue Mar 3 09:20:42 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 3 Mar 2009 09:20:42 -0500 Subject: [SciPy-dev] Advice on Simulated Annealing (ticket #875) In-Reply-To: <200903031445.09934.ajvogel@tuks.co.za> References: <200903031445.09934.ajvogel@tuks.co.za> Message-ID: <1cd32cbb0903030620s2d5db0d2s669a441882ee1998@mail.gmail.com> On Tue, Mar 3, 2009 at 7:45 AM, Adolph J. Vogel wrote: > Josef > >>I checked again and neither anneal nor brute have any tests. Are there >>any good or classical test cases for global optimization? > > I have "A Collection of Test Problems for Constrained Global Optimization > Algorithms" by C.A Floudas on my desk at the moment. Maybe they can be used > for tests for the algorithms. > > If you`d like I can type up some of the problems and relay them to list? > > Regards Adolph > > -- > Adolph J. Vogel BEng(Mech) > Yes, this would be very useful, both for quality control in scipy and making refactoring more reliable. With nose testing it is very easy to convert an example to a test if you know the correct answer: from numpy.testing import assert_array_almost_equal # or similar ....(calculate result) assert_array_almost_equal(result, expected_result, decimal) For examples and tests, a few easy and a few "tough" problems would be enough. 
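As a concrete illustration of that pattern, a nose-style test built from William's quadratic example might look like the following sketch; the schedule settings, the seed, and the loose tolerance are guesses that would need tuning against the actual anneal behaviour:

import numpy as np
from numpy.testing import assert_array_almost_equal
from scipy.optimize import anneal

def fcn(p):
    return p[0]**2 + p[1]**2

def test_anneal_quadratic():
    np.random.seed(1234)            # keep the stochastic search reproducible
    x0 = np.array([3.0, 3.0])
    # interior minimum at the origin, so the bounds should not bind
    xmin, jmin = anneal(fcn, x0, schedule='fast', lower=[-10., -10.],
                        upper=[10., 10.], dwell=50, maxiter=600, T0=10.0)
    assert_array_almost_equal(xmin, [0.0, 0.0], decimal=1)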
It would be good if some of the examples can also be solved by unconstrained optimizers or if the constraints can be included through transformation of the parameters to test the current implementation. It would also be very helpful to report any problems or limitations with the current code that your examples and tests might uncover, either on the mailing list or by opening a ticket. Thank you, Josef From pgmdevlist at gmail.com Tue Mar 3 11:14:55 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 3 Mar 2009 11:14:55 -0500 Subject: [SciPy-dev] SVN server down ? Message-ID: <0248775B-F40E-45D0-A9F6-AA556B93CFBF@gmail.com> All, I'm trying to commit some changes to the scikits I'm in charge of, but I get this error message: """ svn: Server sent unexpected return value (405 Method Not Allowed) in response to PROPFIND request """ Is the server down or am I doing something wrong on my side ? Thx a lot in advance for any help. P. From robince at gmail.com Tue Mar 3 11:21:11 2009 From: robince at gmail.com (Robin) Date: Tue, 3 Mar 2009 16:21:11 +0000 Subject: [SciPy-dev] SVN server down ? In-Reply-To: <0248775B-F40E-45D0-A9F6-AA556B93CFBF@gmail.com> References: <0248775B-F40E-45D0-A9F6-AA556B93CFBF@gmail.com> Message-ID: I think svn is in the process of being moved to a new server: http://projects.scipy.org/scipy/numpy """ SciPy infrastructure upgrade: 3 March 2009, 14:00 UTC. All changes to Trac, SVN or other hosted resources during this period may be lost. """ Cheers Robin On Tue, Mar 3, 2009 at 4:14 PM, Pierre GM wrote: > All, > I'm trying to commit some changes to the scikits I'm in charge of, but > I get this error message: > """ > svn: Server sent unexpected return value (405 Method Not Allowed) in > response to PROPFIND request > """ > Is the server down or am I doing something wrong on my side ? > Thx a lot in advance for any help. > P. > _______________________________________________ > Scipy-dev mailing list > Scipy-dev at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-dev > From david at ar.media.kyoto-u.ac.jp Tue Mar 3 11:11:21 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Wed, 04 Mar 2009 01:11:21 +0900 Subject: [SciPy-dev] SVN server down ? In-Reply-To: <0248775B-F40E-45D0-A9F6-AA556B93CFBF@gmail.com> References: <0248775B-F40E-45D0-A9F6-AA556B93CFBF@gmail.com> Message-ID: <49AD56A9.7040302@ar.media.kyoto-u.ac.jp> Pierre GM wrote: > All, > I'm trying to commit some changes to the scikits I'm in charge of, but > I get this error message: > """ > svn: Server sent unexpected return value (405 Method Not Allowed) in > response to PROPFIND request > """ > Is the server down Yes, we are migrating both trac and svn to new servers. It should be back up soon, David From pgmdevlist at gmail.com Tue Mar 3 11:29:23 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 3 Mar 2009 11:29:23 -0500 Subject: [SciPy-dev] SVN server down ? In-Reply-To: <49AD56A9.7040302@ar.media.kyoto-u.ac.jp> References: <0248775B-F40E-45D0-A9F6-AA556B93CFBF@gmail.com> <49AD56A9.7040302@ar.media.kyoto-u.ac.jp> Message-ID: <404604BA-1789-45AA-8D8B-C6FA2C4C54FF@gmail.com> OK, thanks a lot for the quick answers. I'll try a bit later. 
On Mar 3, 2009, at 11:11 AM, David Cournapeau wrote: > Pierre GM wrote: >> All, >> I'm trying to commit some changes to the scikits I'm in charge of, >> but >> I get this error message: >> """ >> svn: Server sent unexpected return value (405 Method Not Allowed) in >> response to PROPFIND request >> """ >> Is the server down > > Yes, we are migrating both trac and svn to new servers. It should be > back up soon, > > David > _______________________________________________ > Scipy-dev mailing list > Scipy-dev at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-dev From grh at mur.at Tue Mar 3 12:45:27 2009 From: grh at mur.at (Georg Holzmann) Date: Tue, 03 Mar 2009 18:45:27 +0100 Subject: [SciPy-dev] talkbox scikit In-Reply-To: <5b8d13220903030507r19c8803yf40f256de9331f84@mail.gmail.com> References: <49A93327.9000701@mur.at> <5b8d13220903030507r19c8803yf40f256de9331f84@mail.gmail.com> Message-ID: <49AD6CB7.1080800@mur.at> Hallo! > No, no, music is definitely welcomed. Actually, I only do speech > because it pays my bill :) And a talkox is mainly used for music, > after all. OK, thats fine ;) > There are only two big requirements: > - I do want a pure python implementation for everything (with > optional C/Cython). > - It should be under the BSD. I hope that at least some of it will be > included in scipy at some point. Yes I know - no problems ! > The organization is quite messy ATM, I have not thought too much about > it. The difference between features and fbanks is not clear, for > example. The good news is that I may well be the only user of talkbox > for now, so if you have a better suggestion, we can break things, OK, so I will put all the filter parameter stuff into fbanks ... LG Georg From grh at mur.at Tue Mar 3 12:46:46 2009 From: grh at mur.at (Georg Holzmann) Date: Tue, 03 Mar 2009 18:46:46 +0100 Subject: [SciPy-dev] talkbox scikit In-Reply-To: <1cd32cbb0903030552i7271ac2bp3d799a2998287e18@mail.gmail.com> References: <49A93327.9000701@mur.at> <5b8d13220903030507r19c8803yf40f256de9331f84@mail.gmail.com> <1cd32cbb0903030552i7271ac2bp3d799a2998287e18@mail.gmail.com> Message-ID: <49AD6D06.3040409@mur.at> Hallo! > matlab GCC implementation by Davide Renzi is GPL > > so depending how it is implemented, this might be a problem Hm, I mean this is no direct copy of his implementation - I just tried to use the same interface. (however, the implementation is of course somehow similar ...) But I don't think that this should be a problem ... LG Georg From david at ar.media.kyoto-u.ac.jp Tue Mar 3 13:13:18 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Wed, 04 Mar 2009 03:13:18 +0900 Subject: [SciPy-dev] talkbox scikit In-Reply-To: <49AD6D06.3040409@mur.at> References: <49A93327.9000701@mur.at> <5b8d13220903030507r19c8803yf40f256de9331f84@mail.gmail.com> <1cd32cbb0903030552i7271ac2bp3d799a2998287e18@mail.gmail.com> <49AD6D06.3040409@mur.at> Message-ID: <49AD733E.30109@ar.media.kyoto-u.ac.jp> Georg Holzmann wrote: > (however, the implementation is of course somehow similar ...) > > But I don't think that this should be a problem ... > I think it does. IANAL, but at least morally, I would be pretty pissed to release a GPL code and seeing a BSD code with same interface and "similar" implementation afterwards. I think you should check with the original developer - my experience is that at least in academia, people who release things under the GPL do not mind releasing under the BSD (they chose GPL by default). 
This is of course not always the case, though, David From grh at mur.at Tue Mar 3 13:57:55 2009 From: grh at mur.at (Georg Holzmann) Date: Tue, 03 Mar 2009 19:57:55 +0100 Subject: [SciPy-dev] talkbox scikit In-Reply-To: <49AD733E.30109@ar.media.kyoto-u.ac.jp> References: <49A93327.9000701@mur.at> <5b8d13220903030507r19c8803yf40f256de9331f84@mail.gmail.com> <1cd32cbb0903030552i7271ac2bp3d799a2998287e18@mail.gmail.com> <49AD6D06.3040409@mur.at> <49AD733E.30109@ar.media.kyoto-u.ac.jp> Message-ID: <49AD7DB3.5060804@mur.at> Hallo! > I think it does. IANAL, but at least morally, I would be pretty pissed > to release a GPL code and seeing a BSD code with same interface and > "similar" implementation afterwards. hm, I can of course change the interface (and don't mention the author of the code, just the author of the paper) - but that sounds somehow silly to me ... > I think you should check with the original developer - my experience is > that at least in academia, people who release things under the GPL do > not mind releasing under the BSD (they chose GPL by default). This is of > course not always the case, though, OK - I will. But is it enough if I write him a mail if that is ok and he says yes ? Or must this be in a more formal way ... LG Georg > > David > _______________________________________________ > Scipy-dev mailing list > Scipy-dev at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-dev From pwang at enthought.com Tue Mar 3 14:05:50 2009 From: pwang at enthought.com (Peter Wang) Date: Tue, 3 Mar 2009 13:05:50 -0600 Subject: [SciPy-dev] SVN and Trac servers are back up In-Reply-To: <49AD3856.3090407@ar.media.kyoto-u.ac.jp> References: <49AD3856.3090407@ar.media.kyoto-u.ac.jp> Message-ID: <67278550-7BBB-4499-B578-CC05702533ED@enthought.com> Hi everyone, We have moved the scipy and numpy Trac and SVN servers to a new machine. We have also moved the scikits SVN repository, but not its Trac (scipy.org/scipy/scikits). The SVN repositories for wavelets, mpi4py, and other projects that are hosted on scipy have not been moved yet, and will be temporarily unavailable until we get them moved over. Please poke around (gently!) and let us know if you experience any broken links, incorrect redirects, and the like. A few things to note: - The URLs for the trac pages have been simplified to: http://projects.scipy.org/numpy http://projects.scipy.org/scipy You should be seemlessly redirected to these sites if you try to access any of the old URLs (which were of the form /scipy/scipy/ or / scipy/numpy/). - The mailman archives and listinfo pages should now redirect to mail.scipy.org/mailman/ and mail.scipy.org/pipermail/. Again, this should be seemless, so if you experience any difficulties please let us know. Thanks, Peter, Stefan, and David From david at ar.media.kyoto-u.ac.jp Tue Mar 3 13:53:19 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Wed, 04 Mar 2009 03:53:19 +0900 Subject: [SciPy-dev] talkbox scikit In-Reply-To: <49AD7DB3.5060804@mur.at> References: <49A93327.9000701@mur.at> <5b8d13220903030507r19c8803yf40f256de9331f84@mail.gmail.com> <1cd32cbb0903030552i7271ac2bp3d799a2998287e18@mail.gmail.com> <49AD6D06.3040409@mur.at> <49AD733E.30109@ar.media.kyoto-u.ac.jp> <49AD7DB3.5060804@mur.at> Message-ID: <49AD7C9F.5030703@ar.media.kyoto-u.ac.jp> Georg Holzmann wrote: > > hm, I can of course change the interface (and don't mention the author > of the code, just the author of the paper) - but that sounds somehow > silly to me ... 
> Again, IANAL, but I don't think it changes anything at this point (changing the interface or not). GPL says that any derivative work that you distribute must be GPL itself. Something with the same API and "somewhat" similar may well qualify as derivative work. This is annoying, but we have to be careful with those 'rules' - that's the only reason why we cannot use things like the gsl, or R code, even though those codebases are potentially very useful to us (and ours to them, maybe). > OK - I will. But is it enough if I write him a mail if that is ok and he > says yes ? > >From a "moral" POV, it is enough for me - I have done exactly this in the past. Since the code is open source, I think asking for permission is a kind of "minimal decency". cheers, David From pav at iki.fi Tue Mar 3 14:15:49 2009 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 3 Mar 2009 19:15:49 +0000 (UTC) Subject: [SciPy-dev] SVN and Trac servers are back up References: <49AD3856.3090407@ar.media.kyoto-u.ac.jp> <67278550-7BBB-4499-B578-CC05702533ED@enthought.com> Message-ID: Tue, 03 Mar 2009 13:05:50 -0600, Peter Wang wrote: [clip: SVN etc. move] > Peter, Stefan, and David Thanks a lot for taking care of this! -- Pauli Virtanen From bsouthey at gmail.com Tue Mar 3 14:33:05 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Tue, 03 Mar 2009 13:33:05 -0600 Subject: [SciPy-dev] talkbox scikit In-Reply-To: <49AD7C9F.5030703@ar.media.kyoto-u.ac.jp> References: <49A93327.9000701@mur.at> <5b8d13220903030507r19c8803yf40f256de9331f84@mail.gmail.com> <1cd32cbb0903030552i7271ac2bp3d799a2998287e18@mail.gmail.com> <49AD6D06.3040409@mur.at> <49AD733E.30109@ar.media.kyoto-u.ac.jp> <49AD7DB3.5060804@mur.at> <49AD7C9F.5030703@ar.media.kyoto-u.ac.jp> Message-ID: <49AD85F1.7050002@gmail.com> David Cournapeau wrote: > Georg Holzmann wrote: > >> hm, I can of course change the interface (and don't mention the author >> of the code, just the author of the paper) - but that sounds somehow >> silly to me ... >> >> > > Again, IANAL, but I don't think it changes anything at this point > (changing the interface or not). GPL says that any derivative work that > you distribute must be GPL itself. Something with the same API and > "somewhat" similar may well qualify as derivative work. > > This is annoying, but we have to be careful with those 'rules' - that's > the only reason why we cannot use things like the gsl, or R code, even > though those codebases are potentially very useful to us (and ours to > them, maybe). > > >> OK - I will. But is it enough if I write him a mail if that is ok and he >> says yes ? >> >> > > >From a "moral" POV, it is enough for me - I have done exactly this in > the past. Since the code is open source, I think asking for permission > is a kind of "minimal decency". > > cheers, > > David > _______________________________________________ > Scipy-dev mailing list > Scipy-dev at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-dev > Hi, Ideally you should have a provided a clean room implementation. In any case you have to be very careful here if you have actually viewed the code licensed under the GPL because in part it may imply acceptance of the license (EULA conditions). But I agree that it far better to ask for permission and see what happens. You might even be able to create a better product. 
You might find the various resources provide by Software Freedom Law Center at http://www.softwarefreedom.org/resources/ 'A Practical Guide to GPL Compliance' http://www.softwarefreedom.org/resources/2008/compliance-guide.html 'Maintaining Permissive-Licensed Files in a GPL-Licensed Project: Guidelines for Developers' http://www.softwarefreedom.org/resources/2007/gpl-non-gpl-collaboration.html Bruce From grh at mur.at Tue Mar 3 15:18:07 2009 From: grh at mur.at (Georg Holzmann) Date: Tue, 03 Mar 2009 21:18:07 +0100 Subject: [SciPy-dev] talkbox scikit In-Reply-To: <49AD85F1.7050002@gmail.com> References: <49A93327.9000701@mur.at> <5b8d13220903030507r19c8803yf40f256de9331f84@mail.gmail.com> <1cd32cbb0903030552i7271ac2bp3d799a2998287e18@mail.gmail.com> <49AD6D06.3040409@mur.at> <49AD733E.30109@ar.media.kyoto-u.ac.jp> <49AD7DB3.5060804@mur.at> <49AD7C9F.5030703@ar.media.kyoto-u.ac.jp> <49AD85F1.7050002@gmail.com> Message-ID: <49AD907F.6010001@mur.at> Hallo! > You might find the various resources provide by Software Freedom Law > Center at > http://www.softwarefreedom.org/resources/ > > 'A Practical Guide to GPL Compliance' > http://www.softwarefreedom.org/resources/2008/compliance-guide.html > 'Maintaining Permissive-Licensed Files in a GPL-Licensed Project: > Guidelines for Developers' > http://www.softwarefreedom.org/resources/2007/gpl-non-gpl-collaboration.html Thanks for the links ! LG Georg From dwf at cs.toronto.edu Tue Mar 3 18:16:37 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Tue, 3 Mar 2009 18:16:37 -0500 Subject: [SciPy-dev] talkbox scikit In-Reply-To: <49AD733E.30109@ar.media.kyoto-u.ac.jp> References: <49A93327.9000701@mur.at> <5b8d13220903030507r19c8803yf40f256de9331f84@mail.gmail.com> <1cd32cbb0903030552i7271ac2bp3d799a2998287e18@mail.gmail.com> <49AD6D06.3040409@mur.at> <49AD733E.30109@ar.media.kyoto-u.ac.jp> Message-ID: On 3-Mar-09, at 1:13 PM, David Cournapeau wrote: > Georg Holzmann wrote: >> (however, the implementation is of course somehow similar ...) >> >> But I don't think that this should be a problem ... >> > > I think it does. IANAL, but at least morally, I would be pretty pissed > to release a GPL code and seeing a BSD code with same interface and > "similar" implementation afterwards. I guess it depends on how he means "interface", i.e. most of scipy.spatial.distance designed to have an almost identical interface to the matlab functions with the same name, although under the hood everything is completely different. David From cournape at gmail.com Tue Mar 3 18:34:14 2009 From: cournape at gmail.com (David Cournapeau) Date: Wed, 4 Mar 2009 08:34:14 +0900 Subject: [SciPy-dev] talkbox scikit In-Reply-To: References: <49A93327.9000701@mur.at> <5b8d13220903030507r19c8803yf40f256de9331f84@mail.gmail.com> <1cd32cbb0903030552i7271ac2bp3d799a2998287e18@mail.gmail.com> <49AD6D06.3040409@mur.at> <49AD733E.30109@ar.media.kyoto-u.ac.jp> Message-ID: <5b8d13220903031534k3c432da0oefdb27f062f04e60@mail.gmail.com> On Wed, Mar 4, 2009 at 8:16 AM, David Warde-Farley wrote: > On 3-Mar-09, at 1:13 PM, David Cournapeau wrote: > >> Georg Holzmann wrote: >>> (however, the implementation is of course somehow similar ...) >>> >>> But I don't think that this should be a problem ... >>> >> >> I think it does. IANAL, but at least morally, I would be pretty pissed >> to release a GPL code and seeing a BSD code with same interface and >> "similar" implementation afterwards. > > I guess it depends on how he means "interface", i.e. 
most of > scipy.spatial.distance designed to have an almost identical interface > to the matlab functions with the same name, although under the hood > everything is completely different. having the same interface is almost always OK - but if you looked at the implementation, you are almost guaranteed to be "tainted". According to Georg, both API and the implementation is somewhat similar. And now there is public proof that he did look at the GPL implementation for "inspiration". I would not bet this cannot be considered as derivative work. Thinking another way: say the original code is matlab code from mathworks, and we have an implementation which looks like their (same variables, same structure, public record we took a look at matlab implementation). Would you feel confident if mathworks brings you to court ? I wouldn't. In this case, since the original code is GPL, I would think there is little legal risk. Really, asking the original author is just easier :) David From charlesr.harris at gmail.com Wed Mar 4 03:09:55 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 4 Mar 2009 01:09:55 -0700 Subject: [SciPy-dev] Continuous Wavelet Transform for SciPy In-Reply-To: References: Message-ID: Hi Sean, On Mon, Mar 2, 2009 at 8:25 PM, Sean Arms wrote: > Greetings! > > My name is Sean Arms and I'm a graduate student at the University of > Oklahoma in the School of Meteorology. As part of my PhD research, I'm > studying coherent structures in atmospheric boundary layer turbulence, > primarily using in-situ observations and, secondarily, Direct Numerical > Simulation (DNS) output. One common approach for detecting coherent > structures in observational datasets relies on the use of the global wavelet > power spectrum as estimated from a continuous wavelet transform (CWT). I > know SciPy has a DWT impementation, and I've already been in contact with > Filip. He recommeded that I post my code in hopes that it would add some > momentum to the python-cwt development and create some feedback (I'm > currently looking for a good place to post my code). I've implemented the > CWT using pure python (that is, I did not write any C extension code myself > - nothing to build), along with one mother wavelet (second derivitive of a > Gaussian, or the Mexican Hat) - I'll be adding more Mother wavelets as I go > along. I've made it a point to (try to) design my MotherWavelet class to be > easily extendable. I'm working on documentation and a few tests at the > moment, but so far my code compares well with other wavelet routines. > > The point of this email is to introduce myself and let the SciPy dev > community know that I am willing to help develop CWT support for SciPy - > I'll already be doing the work for my research, so I might as well put in > the extra effort to make is usable by the larger community! > If you are running on Linux someone here should be able to give you some pointers on using the git mirror and posting your code for review. There will be a howto page at some point... Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tom.grydeland at gmail.com Wed Mar 4 06:51:45 2009 From: tom.grydeland at gmail.com (Tom Grydeland) Date: Wed, 4 Mar 2009 12:51:45 +0100 Subject: [SciPy-dev] talkbox scikit In-Reply-To: <49A93327.9000701@mur.at> References: <49A93327.9000701@mur.at> Message-ID: On Sat, Feb 28, 2009 at 1:50 PM, Georg Holzmann wrote: > - Second, I have implemented some (random) code for audio signal > processing which IMHO would be nice to have in a scikit: > - In which categories should I put all these ? So I propose all the > filter parameter calculations in talkbox/fbanks/, feature extraction > methods of course into talkbox/features/ and the generalized cross > correlation maybe into talkbox/tools/correlations.py, or maybe in a > seperate file ... ? Just my opinion, but from the names and descriptions of these, I think several could easily be argued to belong with signal processing functionality in scipy.signal, as they do not appear to be limited to sound/speech applications. > Georg -- Tom Grydeland From grh at mur.at Wed Mar 4 08:22:00 2009 From: grh at mur.at (Georg Holzmann) Date: Wed, 04 Mar 2009 14:22:00 +0100 Subject: [SciPy-dev] talkbox scikit In-Reply-To: References: <49A93327.9000701@mur.at> Message-ID: <49AE8078.1070608@mur.at> Hallo! > Just my opinion, but from the names and descriptions of these, I think > several could easily be argued to belong with signal processing > functionality in scipy.signal, as they do not appear to be limited to > sound/speech applications. Yes, maybe ... However, maybe it is a good first step to put them into talkbox. Then people can try it and if they think it is useful for scipy, they can move it there ... LG Georg From david at ar.media.kyoto-u.ac.jp Wed Mar 4 08:11:39 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Wed, 04 Mar 2009 22:11:39 +0900 Subject: [SciPy-dev] talkbox scikit In-Reply-To: <49AE8078.1070608@mur.at> References: <49A93327.9000701@mur.at> <49AE8078.1070608@mur.at> Message-ID: <49AE7E0B.8010205@ar.media.kyoto-u.ac.jp> Georg Holzmann wrote: > Hallo! > > >> Just my opinion, but from the names and descriptions of these, I think >> several could easily be argued to belong with signal processing >> functionality in scipy.signal, as they do not appear to be limited to >> sound/speech applications. >> > > Yes, maybe ... > However, maybe it is a good first step to put them into talkbox. > Then people can try it and if they think it is useful for scipy, they > can move it there ... > Yes, that's exactly the point (and the only rationale for BSD code - using GPL would have been better to use fftw and co). Also, scipy.signal is already kind of messy and would need some code cleaning (there is a lot of old code which would benefit to be updated with recent numpy facilities, such as array iterators). I would prefer making it better before adding more code. Finally, being in scikits means the code can change as much as the author want, there is much less constraints compared to being in scipy. cheers, David From lesserwhirls at gmail.com Wed Mar 4 09:32:39 2009 From: lesserwhirls at gmail.com (Sean Arms) Date: Wed, 4 Mar 2009 08:32:39 -0600 Subject: [SciPy-dev] Continuous Wavelet Transform for SciPy In-Reply-To: References: Message-ID: Greeting Chuck, On Wed, Mar 4, 2009 at 2:09 AM, Charles R Harris wrote: > Hi Sean, > > On Mon, Mar 2, 2009 at 8:25 PM, Sean Arms wrote: >> >> Greetings! >> >> ?????? My name is Sean Arms and I'm a graduate student at the University >> of Oklahoma in the School of Meteorology.? 
As part of my PhD research, I'm >> studying coherent structures in atmospheric boundary layer turbulence, >> primarily using in-situ observations and, secondarily, Direct Numerical >> Simulation (DNS) output.? One common approach for detecting coherent >> structures in observational datasets relies on the use of the global wavelet >> power spectrum as estimated from a continuous wavelet transform (CWT).? I >> know SciPy has a DWT impementation, and I've already been in contact with >> Filip.? He recommeded that I post my code in hopes that it would add some >> momentum to the python-cwt development and create some feedback (I'm >> currently looking for a good place to post my code).? I've implemented the >> CWT using pure python (that is, I did not write any C extension code myself >> - nothing to build), along with one mother wavelet (second derivitive of a >> Gaussian, or the Mexican Hat) - I'll be adding more Mother wavelets as I go >> along.? I've made it a point to (try to) design my MotherWavelet class to be >> easily extendable.? I'm working on documentation and a few tests at the >> moment, but so far my code compares well with other wavelet routines. >> >> ???? The point of this email is to introduce myself and let the SciPy dev >> community know that I am willing to help develop CWT support for SciPy - >> I'll already be doing the work for my research, so I might as well put in >> the extra effort to make is usable by the larger community! > > If you are running on Linux someone here should be able to give you some > pointers on using the git mirror and posting your code for review. There > will be a howto page at some point... > > Chuck > I am running Linux and currently use SVN, but I do have GIT installed...GIT was something that I wanted to check out, so now I have the perfect reason! Sean From sturla at molden.no Wed Mar 4 11:13:59 2009 From: sturla at molden.no (Sturla Molden) Date: Wed, 04 Mar 2009 17:13:59 +0100 Subject: [SciPy-dev] Bug in PyArray_AsCArray? Message-ID: <49AEA8C7.1050407@molden.no> Shouldn't this function fail if the last dimension is strided? S.M. From oliphant at enthought.com Wed Mar 4 14:28:09 2009 From: oliphant at enthought.com (Travis E. Oliphant) Date: Wed, 04 Mar 2009 13:28:09 -0600 Subject: [SciPy-dev] Bug in PyArray_AsCArray? In-Reply-To: <49AEA8C7.1050407@molden.no> References: <49AEA8C7.1050407@molden.no> Message-ID: <49AED649.8040109@enthought.com> Sturla Molden wrote: > Shouldn't this function fail if the last dimension is strided? > Yes, probably, but maybe there is a use case. This code was adapted from the 1d case in Numeric. -Travis From pav at iki.fi Wed Mar 4 15:07:52 2009 From: pav at iki.fi (Pauli Virtanen) Date: Wed, 4 Mar 2009 20:07:52 +0000 (UTC) Subject: [SciPy-dev] Continuous Wavelet Transform for SciPy References: Message-ID: Wed, 04 Mar 2009 08:32:39 -0600, Sean Arms wrote: [clip] > I am running Linux and currently use SVN, but I do have GIT > installed...GIT was something that I wanted to check out, so now I have > the perfect reason! This may be useful, albeit work-in-progress information: http://projects.scipy.org/numpy/wiki/GitMirror I'd suggest also adding an enhancement ticket in Scipy's Trac, http://projects.scipy.org/scipy so that we don't lose track of your work, and pasting the URL to your Git repository there. (Or just uploading your work as patches there, if you choose not to use Git. But this is more cumbersome.) 
Sending your patch to the code review tool http://codereview.appspot.com/ and sending the review URL to this list may also be useful to get feedback. The upload.py tool supports git: upload.py --rev=svn/trunk uploads the patch. -- Pauli Virtanen From gareth.elston.floss at googlemail.com Wed Mar 4 17:10:10 2009 From: gareth.elston.floss at googlemail.com (Gareth Elston) Date: Wed, 4 Mar 2009 22:10:10 +0000 Subject: [SciPy-dev] A module for homogeneous transformation matrices, Euler angles and quaternions Message-ID: <2352c0540903041410j263dbb4dk6d6a2662ae7c4216@mail.gmail.com> I found a nice module for these transforms at http://www.lfd.uci.edu/~gohlke/code/transformations.py.html . I've been using an older version for some time and thought it might make a good addition to numpy/scipy. I made some simple mods to the older version to add a couple of functions I needed and to allow it to be used with Python 2.4. The module is pure Python (2.5, with numpy 1.2 imported), includes doctests, and is BSD licensed. Here's the first part of the module docstring: """Homogeneous Transformation Matrices and Quaternions. A library for calculating 4x4 matrices for translating, rotating, mirroring, scaling, shearing, projecting, orthogonalizing, and superimposing arrays of homogenous coordinates as well as for converting between rotation matrices, Euler angles, and quaternions. """ I'd like to see this added to numpy/scipy so I know I've got some reading to do (scipy.org/Developer_Zone and the huge scipy-dev discussions on Scipy development infrastructure / workflow) to make sure it follows the guidelines, but where would people like to see this? In numpy? scipy? scikits? elsewhere? I seem to remember that there was a first draft of a guide for developers being written. Are there any links available? Thanks, Gareth. From wnbell at gmail.com Wed Mar 4 21:19:20 2009 From: wnbell at gmail.com (Nathan Bell) Date: Wed, 4 Mar 2009 21:19:20 -0500 Subject: [SciPy-dev] mimicking output of scipy.test() in other packages that use numpy.testing Message-ID: PyAMG uses numpy.testing for unittesting support. How do I make pyamg.test() output the SciPy and PyAMG version info as scipy.test() does below? I browsed through the numpy and scipy source trees and it's still unclear to me where this information is provided to test(). FWIW here's the pyamg trunk: http://code.google.com/p/pyamg/source/browse/#svn/trunk/pyamg >>> from pyamg import test >>> test() Running unit tests for pyamg NumPy version 1.2.1 NumPy is installed in /usr/lib/python2.5/site-packages/numpy Python version 2.5.2 (r252:60911, Oct 5 2008, 19:24:49) [GCC 4.3.2] nose version 0.10.1 >>> from scipy import test >>> test() Running unit tests for scipy NumPy version 1.2.1 NumPy is installed in /usr/lib/python2.5/site-packages/numpy SciPy version 0.7.0 SciPy is installed in /usr/lib/python2.5/site-packages/scipy Python version 2.5.2 (r252:60911, Oct 5 2008, 19:24:49) [GCC 4.3.2] nose version 0.10.1 -- Nathan Bell wnbell at gmail.com http://graphics.cs.uiuc.edu/~wnbell/ From robert.kern at gmail.com Wed Mar 4 21:24:01 2009 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 4 Mar 2009 20:24:01 -0600 Subject: [SciPy-dev] mimicking output of scipy.test() in other packages that use numpy.testing In-Reply-To: References: Message-ID: <3d375d730903041824w229b7fd2uf49c2db78360e60e@mail.gmail.com> On Wed, Mar 4, 2009 at 20:19, Nathan Bell wrote: > PyAMG uses numpy.testing for unittesting support. 
?How do I make > pyamg.test() output the SciPy and PyAMG version info as scipy.test() > does below? ?I browsed through the numpy and scipy source trees and > it's still unclear to me where this information is provided to test(). It's hardcoded in nosetester.py: def _show_system_info(self): nose = import_nose() import numpy print "NumPy version %s" % numpy.__version__ npdir = os.path.dirname(numpy.__file__) print "NumPy is installed in %s" % npdir if 'scipy' in self.package_name: import scipy print "SciPy version %s" % scipy.__version__ spdir = os.path.dirname(scipy.__file__) print "SciPy is installed in %s" % spdir pyversion = sys.version.replace('\n','') print "Python version %s" % pyversion print "nose version %d.%d.%d" % nose.__versioninfo__ -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From wnbell at gmail.com Wed Mar 4 22:06:04 2009 From: wnbell at gmail.com (Nathan Bell) Date: Wed, 4 Mar 2009 22:06:04 -0500 Subject: [SciPy-dev] mimicking output of scipy.test() in other packages that use numpy.testing In-Reply-To: <3d375d730903041824w229b7fd2uf49c2db78360e60e@mail.gmail.com> References: <3d375d730903041824w229b7fd2uf49c2db78360e60e@mail.gmail.com> Message-ID: On Wed, Mar 4, 2009 at 9:24 PM, Robert Kern wrote: > > It's hardcoded in nosetester.py: > Thanks. I will settle for a monkey patch then. -- Nathan Bell wnbell at gmail.com http://graphics.cs.uiuc.edu/~wnbell/ From stefan at sun.ac.za Thu Mar 5 16:28:41 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Thu, 5 Mar 2009 23:28:41 +0200 Subject: [SciPy-dev] mimicking output of scipy.test() in other packages that use numpy.testing In-Reply-To: <3d375d730903041824w229b7fd2uf49c2db78360e60e@mail.gmail.com> References: <3d375d730903041824w229b7fd2uf49c2db78360e60e@mail.gmail.com> Message-ID: <9457e7c80903051328m109e9343xfe13767228c4b9eb@mail.gmail.com> 2009/3/5 Robert Kern : > It's hardcoded in nosetester.py: > > ? ?def _show_system_info(self): > ? ? ? ?nose = import_nose() > > ? ? ? ?import numpy > ? ? ? ?print "NumPy version %s" % numpy.__version__ > ? ? ? ?npdir = os.path.dirname(numpy.__file__) > ? ? ? ?print "NumPy is installed in %s" % npdir > > ? ? ? ?if 'scipy' in self.package_name: > ? ? ? ? ? ?import scipy > ? ? ? ? ? ?print "SciPy version %s" % scipy.__version__ > ? ? ? ? ? ?spdir = os.path.dirname(scipy.__file__) > ? ? ? ? ? ?print "SciPy is installed in %s" % spdir Having nosetester.py specially adapted for SciPy is not ideal. Can't we rather provide a hook into nosetester, and have NumPy and SciPy call it upon import? St?fan > > ? ? ? ?pyversion = sys.version.replace('\n','') > ? ? ? ?print "Python version %s" % pyversion > ? ? ? ?print "nose version %d.%d.%d" % nose.__versioninfo__ > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." 
> ?-- Umberto Eco > _______________________________________________ > Scipy-dev mailing list > Scipy-dev at scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-dev > From robert.kern at gmail.com Thu Mar 5 16:37:26 2009 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 5 Mar 2009 15:37:26 -0600 Subject: [SciPy-dev] mimicking output of scipy.test() in other packages that use numpy.testing In-Reply-To: <9457e7c80903051328m109e9343xfe13767228c4b9eb@mail.gmail.com> References: <3d375d730903041824w229b7fd2uf49c2db78360e60e@mail.gmail.com> <9457e7c80903051328m109e9343xfe13767228c4b9eb@mail.gmail.com> Message-ID: <3d375d730903051337w6dc2ff32s3ac85e8298c7782e@mail.gmail.com> On Thu, Mar 5, 2009 at 15:28, St?fan van der Walt wrote: > 2009/3/5 Robert Kern : >> It's hardcoded in nosetester.py: >> >> ? ?def _show_system_info(self): >> ? ? ? ?nose = import_nose() >> >> ? ? ? ?import numpy >> ? ? ? ?print "NumPy version %s" % numpy.__version__ >> ? ? ? ?npdir = os.path.dirname(numpy.__file__) >> ? ? ? ?print "NumPy is installed in %s" % npdir >> >> ? ? ? ?if 'scipy' in self.package_name: >> ? ? ? ? ? ?import scipy >> ? ? ? ? ? ?print "SciPy version %s" % scipy.__version__ >> ? ? ? ? ? ?spdir = os.path.dirname(scipy.__file__) >> ? ? ? ? ? ?print "SciPy is installed in %s" % spdir > > Having nosetester.py specially adapted for SciPy is not ideal. ?Can't > we rather provide a hook into nosetester, and have NumPy and SciPy > call it upon import? I wouldn't call that ideal, either. I would probably have test() functions in numpy.__init__ and scipy.__init__ which would internally import the Tester, print out these messages, then run Tester().test(*args, **kwds). I don't see a real need for this information to be in the Tester class itself, just the convenience {numpy,scipy}.test() functions. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From stefan at sun.ac.za Thu Mar 5 17:08:01 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Fri, 6 Mar 2009 00:08:01 +0200 Subject: [SciPy-dev] mimicking output of scipy.test() in other packages that use numpy.testing In-Reply-To: <3d375d730903051337w6dc2ff32s3ac85e8298c7782e@mail.gmail.com> References: <3d375d730903041824w229b7fd2uf49c2db78360e60e@mail.gmail.com> <9457e7c80903051328m109e9343xfe13767228c4b9eb@mail.gmail.com> <3d375d730903051337w6dc2ff32s3ac85e8298c7782e@mail.gmail.com> Message-ID: <9457e7c80903051408n76bd902eqc9f6ca327184dd6a@mail.gmail.com> 2009/3/5 Robert Kern : > I would probably have test() functions in numpy.__init__ and > scipy.__init__ which would internally import the Tester, print out > these messages, then run Tester().test(*args, **kwds). I don't see a > real need for this information to be in the Tester class itself, just > the convenience {numpy,scipy}.test() functions. Although, a common way of calling tests is nosetests numpy which would not play well with the above. S. 
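For concreteness, a rough sketch of the package-level wrapper Robert describes above, written for the PyAMG case that started the thread. The Tester class and its test() signature come from numpy.testing; the pyamg names and the exact print lines are only illustrative, not existing PyAMG code.

# Hypothetical pyamg/__init__.py fragment (illustration only): print the
# extra version information up front, then delegate to numpy.testing's
# Tester instead of patching numpy's nosetester.
import os
import sys

def test(label='fast', verbose=1, extra_argv=None, doctests=False):
    import numpy
    import pyamg
    from numpy.testing import Tester

    print "PyAMG version %s" % pyamg.__version__
    print "PyAMG is installed in %s" % os.path.dirname(pyamg.__file__)
    print "NumPy version %s" % numpy.__version__
    print "Python version %s" % sys.version.replace('\n', '')

    # Tester accepts a package module and discovers its tests with nose.
    return Tester(pyamg).test(label, verbose, extra_argv, doctests)

The same pattern would let scipy.test() print its own header without numpy.testing having to know anything about scipy.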
From robert.kern at gmail.com Thu Mar 5 17:09:31 2009 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 5 Mar 2009 16:09:31 -0600 Subject: [SciPy-dev] mimicking output of scipy.test() in other packages that use numpy.testing In-Reply-To: <9457e7c80903051408n76bd902eqc9f6ca327184dd6a@mail.gmail.com> References: <3d375d730903041824w229b7fd2uf49c2db78360e60e@mail.gmail.com> <9457e7c80903051328m109e9343xfe13767228c4b9eb@mail.gmail.com> <3d375d730903051337w6dc2ff32s3ac85e8298c7782e@mail.gmail.com> <9457e7c80903051408n76bd902eqc9f6ca327184dd6a@mail.gmail.com> Message-ID: <3d375d730903051409lb96997dh3731d691058672a0@mail.gmail.com> On Thu, Mar 5, 2009 at 16:08, St?fan van der Walt wrote: > 2009/3/5 Robert Kern : >> I would probably have test() functions in numpy.__init__ and >> scipy.__init__ which would internally import the Tester, print out >> these messages, then run Tester().test(*args, **kwds). I don't see a >> real need for this information to be in the Tester class itself, just >> the convenience {numpy,scipy}.test() functions. > > Although, a common way of calling tests is > > nosetests numpy > > which would not play well with the above. "nosetests numpy" doesn't even touch nosetester. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From stefan at sun.ac.za Thu Mar 5 17:16:29 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Fri, 6 Mar 2009 00:16:29 +0200 Subject: [SciPy-dev] mimicking output of scipy.test() in other packages that use numpy.testing In-Reply-To: <3d375d730903051409lb96997dh3731d691058672a0@mail.gmail.com> References: <3d375d730903041824w229b7fd2uf49c2db78360e60e@mail.gmail.com> <9457e7c80903051328m109e9343xfe13767228c4b9eb@mail.gmail.com> <3d375d730903051337w6dc2ff32s3ac85e8298c7782e@mail.gmail.com> <9457e7c80903051408n76bd902eqc9f6ca327184dd6a@mail.gmail.com> <3d375d730903051409lb96997dh3731d691058672a0@mail.gmail.com> Message-ID: <9457e7c80903051416t54c7c528v844437dc2e84def0@mail.gmail.com> 2009/3/6 Robert Kern : > "nosetests numpy" doesn't even touch nosetester. So I see now, sorry. Then I like your suggested change. Cheers St?fan From matthew.brett at gmail.com Fri Mar 6 16:30:54 2009 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 6 Mar 2009 13:30:54 -0800 Subject: [SciPy-dev] Cython / python policy Message-ID: <1e2af89e0903061330w2716f84ah49b21d48cca3291@mail.gmail.com> Hi David, and team, David, I've quoted you here on your response to an offer to submit some code to your scikit, but as a jumping off point for further discussion. > There are only two big requirements: > ?- I do want a pure python implementation for everything (with > optional C/Cython). I was just thinking of doing some Cython. Do we think that, in general, scipy code should have both C(ython) _and_ python implementations of the same thing, with different names, as for Anne's spatial package? Or, in different namespaces (scipy.mypackage.c.func, scipy.mypackage.python.func sort of thing) Or, switched by a decorator (that depends on a conditional import), like this: @replace_with('mycimplementation.func') def func(a, b): return a+b # or something (an idea from http://www.lfd.uci.edu/~gohlke/code/transformations.py.html). To me, all these seem to add maintenance overhead without much gain. We don't do that for C++ wrapping, after all... 
Best, Matthew From peridot.faceted at gmail.com Fri Mar 6 16:48:54 2009 From: peridot.faceted at gmail.com (Anne Archibald) Date: Fri, 6 Mar 2009 16:48:54 -0500 Subject: [SciPy-dev] Cython / python policy In-Reply-To: <1e2af89e0903061330w2716f84ah49b21d48cca3291@mail.gmail.com> References: <1e2af89e0903061330w2716f84ah49b21d48cca3291@mail.gmail.com> Message-ID: 2009/3/6 Matthew Brett : > Hi David, and team, > > David, I've quoted you here on your response to an offer to submit > some code to your scikit, but as a jumping off point for further > discussion. > >> There are only two big requirements: >> ?- I do want a pure python implementation for everything (with >> optional C/Cython). > > I was just thinking of doing some Cython. > > Do we think that, in general, scipy code should have both C(ython) > _and_ python implementations of the same thing, with different names, > as for Anne's spatial package? Just let me chime in here to point out two things: * The python implementation predates the cython implementation, and in fact the cython implementation began as a conversion of the python implementation. I'm not necessarily advocating this, just describing how I did it. * The python implementation provides additional functionality at the expense of speed. In particular, if people want to write additional algorithms, particularly those that involve annotating the kd-tree, the python implementation makes this much easier that the cython implementation. I don't think it's worth providing a python implementation solely for those who can't compile the cython one; after all, cython modules are distributed as C, and if a user can't compile C they much of scipy breaks, But for certain things a python implementation allows more flexibility: that's why, for example, for years python supported both pickle and cPickle. Anne > Or, in different namespaces (scipy.mypackage.c.func, > scipy.mypackage.python.func sort of thing) > > Or, switched by a decorator (that depends on a conditional import), like this: > > @replace_with('mycimplementation.func') > def func(a, b): > ? ?return a+b # or something > > (an idea from http://www.lfd.uci.edu/~gohlke/code/transformations.py.html). > > To me, all these seem to add maintenance overhead without much gain. > We don't do that for C++ wrapping, after all... > > Best, > > Matthew > _______________________________________________ > Scipy-dev mailing list > Scipy-dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From hoytak at cs.ubc.ca Fri Mar 6 17:08:44 2009 From: hoytak at cs.ubc.ca (Hoyt Koepke) Date: Fri, 6 Mar 2009 14:08:44 -0800 Subject: [SciPy-dev] Cython / python policy In-Reply-To: References: <1e2af89e0903061330w2716f84ah49b21d48cca3291@mail.gmail.com> Message-ID: <4db580fd0903061408h62ae1d43h4146d5494bf95533@mail.gmail.com> Hello, While I'm not a regular scipy contributer, in my research I use scipy a lot and use cython a lot. From personal experience, I have found that the *small* bit of extra work involved in cythonizing your code almost always pays off unless speed really doesn't matter. Additionally, cython tries hard to generate maximally portable C code, and has good mechanisms for exception handling, etc. that are very tedious to implement in straight C code. So including more cython code, and even cythonizing some existing code, gets a +1 from me. 
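As a concrete sketch of the decorator-based switch Matthew floats at the start of the thread: mycimplementation and func are placeholders, and this is only one possible shape for such a decorator, not an existing API.

# Hypothetical sketch of a conditional-replacement decorator: the pure
# Python function stays in the file as the reference implementation, and a
# compiled version shadows it only if it can actually be imported.
def replace_with(dotted_name):
    def decorator(py_func):
        modname, funcname = dotted_name.rsplit('.', 1)
        try:
            mod = __import__(modname, fromlist=[funcname])
            return getattr(mod, funcname)
        except (ImportError, AttributeError):
            return py_func
    return decorator

@replace_with('mycimplementation.func')
def func(a, b):
    return a + b

Whether this reads better than the pickle/cPickle-style module-level fallback is a matter of taste; both keep the Python code as the canonical version.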
Along with what Anne said, I think that ease of subclassing is the only real argument for having both python and cython implementations of the same thing floating around (besides, perhaps, backwards compatibility if that's an issue). It seems that sticking to the "justified by use cases" reasoning that python itself employs here is appropriate -- i.e. Is there a use case common enough, and where subclassing would significantly be more difficult than other ways around it, to justify the added complexity of two solutions? Note that cython actually improves this subclassing issue significantly over straight c or fortran extensions. If there is a policy that all cython extension type code must have an accompanying .pxd definition file, one can simply subclass that type in cython without much difficulty. Just a few thoughts... --Hoyt ++++++++++++++++++++++++++++++++++++++++++ + Hoyt Koepke + University of Washington Department of Statistics + http://www.stat.washington.edu/~hoytak/ + hoytak at gmail.com ++++++++++++++++++++++++++++++++++++++++++ From sturla at molden.no Fri Mar 6 20:55:09 2009 From: sturla at molden.no (Sturla Molden) Date: Sat, 07 Mar 2009 02:55:09 +0100 Subject: [SciPy-dev] Cython / python policy In-Reply-To: <1e2af89e0903061330w2716f84ah49b21d48cca3291@mail.gmail.com> References: <1e2af89e0903061330w2716f84ah49b21d48cca3291@mail.gmail.com> Message-ID: <49B1D3FD.3000700@molden.no> Matthew Brett wrote: > I was just thinking of doing some Cython. > Do we think that, in general, scipy code should have both C(ython) > _and_ python implementations of the same thing, with different names, > as for Anne's spatial package?y-dev > I think SciPy should adopt the same policy as Python 3. That is, first import a Python module, then try to import a C module on top of that, replacing names. Having two versions with similar functionality but different name in Python and C (e.g. pickle and cPickle, profile and cProfile, KDTree and cKDTree) is not a good idea. SciPy will move to Python 3 at some point anyway, and doing this differently from other Python packages would be confusing. It is less messy this way as well, as we avoid subtle differences between the C and Python objects - any difference would be considered a bug. It would also mean that anything is prototyped in Python first, before migration to C. Hopefully it would also limit the use of C to the parts that really benefit from migration, as the working Python code is always written first. To use Anne's kd-tree as an example, a tentative organization of the code would be a Python module kdtree.py with a full Python implementation of teh kd-tree, and a Cython file ckdtree.pyx with a partial Cython re-implementation. In kdtree.py: class baseKDTree(object): # full python implementation here try: from ckdtree import KDTree except ImportError: class KDTree(baseKDTree): pass __all__ = ['KDTree'] In ckdtree.pyx: from kdtree import baseKDTree class KDTree(baseKDTree): # Cython re-implementation of the slowest # parts of the base class, the rest we keep # in Python. Sturla Molden From avi at sicortex.com Sat Mar 7 11:54:20 2009 From: avi at sicortex.com (Avi Purkayastha) Date: Sat, 7 Mar 2009 10:54:20 -0600 Subject: [SciPy-dev] help on fortran compiler Message-ID: Hi, I am trying to build scipy v0.7.0 on SiCortex platform (MIPS64 architecture on linux). However the problem is that the script does not recognize the pathscale fortran compiler I am passing.. 
The build step was % python setup.py build --fcompiler=pathf95 After the build process proceeds for a while, I get.. : don't know how to compile Fortran code on platform 'posix' with 'pathf95' compiler. Supported compilers are: compaq,none,absoft,nag,gnu,sun,lahey,intelem,gnu95,intelv,g95,intele,int elev,pg,intel,mips,hpux,vast,ibm) building 'dfftpack' library error: library dfftpack has Fortran sources but no Fortran compiler found How do I force the script to accept pathf95 since I have no other fortran compilers? Both BLAS and LAPACK have been built with pathscale and that is recognized in the build step. : ATLAS version 3.7.32 built by root on Fri Jul 25 11:31:20 EDT 2008: UNAME : Linux sf1-m0n0.scsystem 2.6.18.8-sc-lustre-perfmon #1 SMP Fri Jul 4 18:40:20 Local time zone must be set--see zic m mips64 SiCortex ICE9A V1.0 FPU V0.1 SiCortex SC-1000 GNU/Linux INSTFLG : -1 0 -a 1 ARCHDEFS : -DATL_OS_Linux -DATL_ARCH_MIPSICE9 -DATL_USE64BITS F2CDEFS : -DAdd__ -DF77_INTEGER=int -DStringSunStyle CACHEEDGE: 131072 F77 : pathf95, version F77FLAGS : -mabi=64 SMC : gcc, version gcc (GCC) 4.1.2 (Gentoo 4.1.2) SMCFLAGS : -O2 -mips64 -march=5kf -mtune=5kf -fno-schedule-insns - fschedule-insns2 -fno-peephole -fno-peephole2 -mabi=64 SKC : gcc, version gcc (GCC) 4.1.2 (Gentoo 4.1.2) SKCFLAGS : -O2 -mips64 -march=5kf -mtune=5kf -fno-schedule-insns - fschedule-insns2 -fno-peephole -fno-peephole2 -mabi=64 success! : Here are some other informations.. 1) (sc1-m3n6:~/builds/scipy-0.7.0) avi% uname -a Linux sc1-m3n6.scsystem 2.6.18.8-sc-lustre-perfmon #1 SMP Thu Sep 25 13:19:45 EDT 2008 mips64 SiCortex ICE9B V1.0 FPU V0.1 SiCortex SC-1000 GNU/Linux 2) (sc1-m3n6:~/builds/scipy-0.7.0) avi% python -c 'import os,sys;print os.name,sys.platform' posix linux2 II) I have a second issue once the fortran issue is resolved: we do not have dfftpack or umfpack built on our system. Does scipy build them, if not found in our library stack, so can I force the script to ignore dfftpack and umfpack? Thanks for any help and suggestions. -- Avi -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Sat Mar 7 12:16:38 2009 From: cournape at gmail.com (David Cournapeau) Date: Sun, 8 Mar 2009 02:16:38 +0900 Subject: [SciPy-dev] Cython / python policy In-Reply-To: <1e2af89e0903061330w2716f84ah49b21d48cca3291@mail.gmail.com> References: <1e2af89e0903061330w2716f84ah49b21d48cca3291@mail.gmail.com> Message-ID: <5b8d13220903070916w2b5649b4mb817b074638624cd@mail.gmail.com> On Sat, Mar 7, 2009 at 6:30 AM, Matthew Brett wrote: > Hi David, and team, > > David, I've quoted you here on your response to an offer to submit > some code to your scikit, but as a jumping off point for further > discussion. > >> There are only two big requirements: >> ?- I do want a pure python implementation for everything (with >> optional C/Cython). > > I was just thinking of doing some Cython. > > Do we think that, in general, scipy code should have both C(ython) > _and_ python implementations of the same thing, with different names, > as for Anne's spatial package? I think we should have as much python code as possible. We have too much C/C++/Fortran code already, and this is order of magnitude harder to maintain than python. As long as the original writer is there, it is ok, but when he is not available anymore, it can become a problem. When you have semi working code in C (or worse fortran or C++), without documentation, it is almost guaranteed to become unmaintainable short of rewriting. 
Even cython has problems - only for it cannot support complex number or other features of numpy as well as python code. Having a pure python code is even more useful than test in my experience. > > Or, in different namespaces (scipy.mypackage.c.func, > scipy.mypackage.python.func sort of thing) I have tried with different namespace and package names: it does not work very well. I agree with Sturla that we should follow the python way of doing things if possible. But there is still the problem of python implementation which does more than the C implementation.I have this case in my scikits, because cython does not have native complex number handling yet, so my routine for Levinson Durbin (a well known method for inversion of toeplitz matrix) works for real and complex number in python, but only for real in cython (the cython version is order of magnitude faster though - the code cannot be vectorized efficiently). cheers, David From pav at iki.fi Sat Mar 7 18:56:09 2009 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 7 Mar 2009 23:56:09 +0000 (UTC) Subject: [SciPy-dev] Git-svn sub-optimality References: <1e2af89e0902241059r145a10d3n118745ac24f80a7b@mail.gmail.com> <6ce0ac130902252321j6139238by634364acd2bd07b2@mail.gmail.com> <49A6D913.9040809@enthought.com> <6ce0ac130902261240y3596278fo30693766a0194d5@mail.gmail.com> <9457e7c80902261307m1d4cdedckc7df633763a9b29d@mail.gmail.com> <9457e7c80902261448y2719b96bg8225a0717969eedd@mail.gmail.com> <5b8d13220902262023i675a4bc4ra8bf981267fb2156@mail.gmail.com> Message-ID: Hi David (& other prospective git-svn users), I just ran into a speedbump with git-svn and using the Git mirror: http://projects.scipy.org/numpy/wiki/GitMirror#Abigfatwordofwarning In short: apparently git-svn does not automatically track SVN commits appearing elsewhere than locally via `git-svn fetch/rebase`. So if you want to `dcommit` after `git fetch`ing from the mirror or from someone else, the database of git-svn needs to be rebuilt: rm -rf .git/svn git svn rebase -l Otherwise, the `dcommit` will shove bogus changesets to SVN. (Ouch!) -- Pauli Virtanen From david at ar.media.kyoto-u.ac.jp Sun Mar 8 04:31:24 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sun, 08 Mar 2009 17:31:24 +0900 Subject: [SciPy-dev] Git-svn sub-optimality In-Reply-To: References: <1e2af89e0902241059r145a10d3n118745ac24f80a7b@mail.gmail.com> <6ce0ac130902252321j6139238by634364acd2bd07b2@mail.gmail.com> <49A6D913.9040809@enthought.com> <6ce0ac130902261240y3596278fo30693766a0194d5@mail.gmail.com> <9457e7c80902261307m1d4cdedckc7df633763a9b29d@mail.gmail.com> <9457e7c80902261448y2719b96bg8225a0717969eedd@mail.gmail.com> <5b8d13220902262023i675a4bc4ra8bf981267fb2156@mail.gmail.com> Message-ID: <49B3825C.3020200@ar.media.kyoto-u.ac.jp> Pauli Virtanen wrote: > Hi David (& other prospective git-svn users), > > I just ran into a speedbump with git-svn and using the Git mirror: > > http://projects.scipy.org/numpy/wiki/GitMirror#Abigfatwordofwarning > > In short: apparently git-svn does not automatically track SVN commits > appearing elsewhere than locally via `git-svn fetch/rebase`. So if you > want to `dcommit` after `git fetch`ing from the mirror or from someone > else, the database of git-svn needs to be rebuilt: > > rm -rf .git/svn > git svn rebase -l > > Otherwise, the `dcommit` will shove bogus changesets to SVN. (Ouch!) > Hm, do you have an example of that ? It never happened to me. 
My typical usage is when I start working on numpy is: git svn fetch --all git co master && git svn rebase -l git co -b line_of_work When I use dcommit, it first rebase my changes on top of svn last revision if the last svn revision differs from the on I have locally. cheers, David From pav at iki.fi Sun Mar 8 10:04:22 2009 From: pav at iki.fi (Pauli Virtanen) Date: Sun, 8 Mar 2009 14:04:22 +0000 (UTC) Subject: [SciPy-dev] Git-svn sub-optimality References: <1e2af89e0902241059r145a10d3n118745ac24f80a7b@mail.gmail.com> <6ce0ac130902252321j6139238by634364acd2bd07b2@mail.gmail.com> <49A6D913.9040809@enthought.com> <6ce0ac130902261240y3596278fo30693766a0194d5@mail.gmail.com> <9457e7c80902261307m1d4cdedckc7df633763a9b29d@mail.gmail.com> <9457e7c80902261448y2719b96bg8225a0717969eedd@mail.gmail.com> <5b8d13220902262023i675a4bc4ra8bf981267fb2156@mail.gmail.com> <49B3825C.3020200@ar.media.kyoto-u.ac.jp> Message-ID: Sun, 08 Mar 2009 17:31:24 +0900, David Cournapeau wrote: > Pauli Virtanen wrote: >> Hi David (& other prospective git-svn users), >> >> I just ran into a speedbump with git-svn and using the Git mirror: >> >> http://projects.scipy.org/numpy/wiki/GitMirror#Abigfatwordofwarning >> >> In short: apparently git-svn does not automatically track SVN commits >> appearing elsewhere than locally via `git-svn fetch/rebase`. So if you >> want to `dcommit` after `git fetch`ing from the mirror or from someone >> else, the database of git-svn needs to be rebuilt: >> >> rm -rf .git/svn >> git svn rebase -l >> >> Otherwise, the `dcommit` will shove bogus changesets to SVN. (Ouch!) >> >> > Hm, do you have an example of that ? It never happened to me. My typical > usage is when I start working on numpy is: > > git svn fetch --all > git co master && git svn rebase -l > git co -b line_of_work > > When I use dcommit, it first rebase my changes on top of svn last > revision if the last svn revision differs from the on I have locally. It doesn't occur if you stick to the usual git-svn workflow of getting SVN commits via `git svn fetch/rebase` only. An example where it occurs is git fetch mirror # fetch branch from mirror or from someone else git rebase svn/trunk # rebase on it git svn dcommit -n # now try to dcommit Now git-svn thinks the current branch still corresponds to the old version, and uses that as the base for `dcommit`. However, doing `git svn rebase` does not fix the situation, since `git-svn` also thinks that the current branch is based on the latest commit, and is so up-to-date. I'll also note that if you do merges and then `dcommit` the result, this makes the tree generated by `git-svn` diverge from the one in the mirror (since in the mirror the merge commit has only one parent, but in the private tree it has two -- `git-svn fetch` doesn't track merges). -- Pauli Virtanen From ondrej at certik.cz Mon Mar 9 00:39:12 2009 From: ondrej at certik.cz (Ondrej Certik) Date: Sun, 8 Mar 2009 21:39:12 -0700 Subject: [SciPy-dev] cmake build system for scipy Message-ID: <85b5c3130903082139m7286b5e6j4ec8e4b4f1c6e97d@mail.gmail.com> Hi, I started to write cmake build system for scipy, so that it a) can compile in parallel b) compiles on all systems out of the box (currently it fails in Debian without patching) It's in my cmake branch: http://github.com/certik/scipy/tree/cmake Currently it only compiles part of the sparsetools, but it works with f2py and the templates (.src -> .f), so I am posting it here in case anyone would like to take it and finish it. 
Unfortunately I am very busy, so I don't know when I have time to work on it more. Is there some reason why the scipy package cannot be build inplace? There are some checks for that: ImportError: Error importing scipy: you cannot import scipy while being in scipy source directory; please exit the scipy source tree first, and relaunch your python intepreter. So I guess there are good reasons for that. cmake supports both inplace and outplace build, but as scipy doesn't seem to work inplace anyway, I'll just support the "make install" way. Ondrej From robert.kern at gmail.com Mon Mar 9 00:45:45 2009 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 8 Mar 2009 23:45:45 -0500 Subject: [SciPy-dev] cmake build system for scipy In-Reply-To: <85b5c3130903082139m7286b5e6j4ec8e4b4f1c6e97d@mail.gmail.com> References: <85b5c3130903082139m7286b5e6j4ec8e4b4f1c6e97d@mail.gmail.com> Message-ID: <3d375d730903082145j17194441n980cca5138fa4d52@mail.gmail.com> On Sun, Mar 8, 2009 at 23:39, Ondrej Certik wrote: > Hi, > > I started to write cmake build system for scipy, so that it > > a) can compile in parallel > b) compiles on all systems out of the box (currently it fails in > Debian without patching) > > It's in my cmake branch: > > http://github.com/certik/scipy/tree/cmake > > Currently it only compiles part of the sparsetools, but it works with > f2py and the templates (.src -> .f), so I am posting it here in case > anyone would like to take it and finish it. > Unfortunately I am very busy, so I don't know when I have time to work > on it more. > > Is there some reason why the scipy package cannot be build inplace? > There are some checks for that: > > ImportError: Error importing scipy: you cannot import scipy while > ? ?being in scipy source directory; please exit the scipy source > ? ?tree first, and relaunch your python intepreter. > > So I guess there are good reasons for that. cmake supports both > inplace and outplace build, but as scipy doesn't seem to work inplace > anyway, I'll just support the "make install" way. It works fine in-place using distutils. You are probably not installing the __config__.py properly. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From david at ar.media.kyoto-u.ac.jp Mon Mar 9 00:39:42 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Mon, 09 Mar 2009 13:39:42 +0900 Subject: [SciPy-dev] cmake build system for scipy In-Reply-To: <85b5c3130903082139m7286b5e6j4ec8e4b4f1c6e97d@mail.gmail.com> References: <85b5c3130903082139m7286b5e6j4ec8e4b4f1c6e97d@mail.gmail.com> Message-ID: <49B49D8E.3050608@ar.media.kyoto-u.ac.jp> Ondrej Certik wrote: > Hi, > > I started to write cmake build system for scipy, so that it > > a) can compile in parallel > b) compiles on all systems out of the box (currently it fails in > Debian without patching) > > It's in my cmake branch: > > http://github.com/certik/scipy/tree/cmake > > Currently it only compiles part of the sparsetools, but it works with > f2py and the templates (.src -> .f), so I am posting it here in case > anyone would like to take it and finish it. > Unfortunately I am very busy, so I don't know when I have time to work > on it more. > > Is there some reason why the scipy package cannot be build inplace? 
> There are some checks for that: > > ImportError: Error importing scipy: you cannot import scipy while > being in scipy source directory; please exit the scipy source > tree first, and relaunch your python intepreter. > The message is misleading in your case - scipy can be imported in-place, but you need more than just installing the .so. You need to recreate the scipy logic which enables the in-place import (I don't remember the details, but it boils down to installing a few .py files - the scipy __init__.py should be clear). cheers, David From cournape at gmail.com Mon Mar 9 03:23:24 2009 From: cournape at gmail.com (David Cournapeau) Date: Mon, 9 Mar 2009 16:23:24 +0900 Subject: [SciPy-dev] Git-svn sub-optimality In-Reply-To: References: <1e2af89e0902241059r145a10d3n118745ac24f80a7b@mail.gmail.com> <6ce0ac130902261240y3596278fo30693766a0194d5@mail.gmail.com> <9457e7c80902261307m1d4cdedckc7df633763a9b29d@mail.gmail.com> <9457e7c80902261448y2719b96bg8225a0717969eedd@mail.gmail.com> <5b8d13220902262023i675a4bc4ra8bf981267fb2156@mail.gmail.com> <49B3825C.3020200@ar.media.kyoto-u.ac.jp> Message-ID: <5b8d13220903090023j58bb5a0bn9c56fbd7c39d5f65@mail.gmail.com> On Sun, Mar 8, 2009 at 11:04 PM, Pauli Virtanen wrote: > Sun, 08 Mar 2009 17:31:24 +0900, David Cournapeau wrote: > >> Pauli Virtanen wrote: >>> Hi David (& other prospective git-svn users), >>> >>> I just ran into a speedbump with git-svn and using the Git mirror: >>> >>> ? ? http://projects.scipy.org/numpy/wiki/GitMirror#Abigfatwordofwarning >>> >>> In short: apparently git-svn does not automatically track SVN commits >>> appearing elsewhere than locally via `git-svn fetch/rebase`. So if you >>> want to `dcommit` after `git fetch`ing from the mirror or from someone >>> else, the database of git-svn needs to be rebuilt: >>> >>> ? ? rm -rf .git/svn >>> ? ? git svn rebase -l >>> >>> Otherwise, the `dcommit` will shove bogus changesets to SVN. (Ouch!) >>> >>> >> Hm, do you have an example of that ? It never happened to me. My typical >> usage is when I start working on numpy is: >> >> git svn fetch --all >> git co master && git svn rebase -l >> git co -b line_of_work >> >> When I use dcommit, it first rebase my changes on top of svn last >> revision if the last svn revision differs from the on I have locally. > > It doesn't occur if you stick to the usual git-svn workflow of getting > SVN commits via `git svn fetch/rebase` only. An example where it occurs is > > ? ?git fetch mirror ? ? ?# fetch branch from mirror or from someone else > ? ?git rebase svn/trunk ?# rebase on it > ? ?git svn dcommit -n ? ?# now try to dcommit Ah, yes, you should definitely stick to one and only one mirror. That's a git-svn limitation I think. I guess I was careful because of my experience with previous conversion tools (in bzr). I think the problem is linked to guaranteeing that a given commit with the same content will be recognized as such. In git proper, it is by design if the history is the same. In bzr and other systems, it has to be explicit because a commit does not depend only on the content (but on meta-data as well). That's what they call deterministic on the BzrMigration page: http://bazaar-vcs.org/BzrMigration If I look at my git-svn import and yours, the commit sha1 are not the same for the corresponding svn revision. As such, I don't see how it is possible to guarantee consistency with multiple mirrors. 
cheers, David From cimrman3 at ntc.zcu.cz Mon Mar 9 05:12:37 2009 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Mon, 09 Mar 2009 10:12:37 +0100 Subject: [SciPy-dev] Cython / python policy In-Reply-To: <5b8d13220903070916w2b5649b4mb817b074638624cd@mail.gmail.com> References: <1e2af89e0903061330w2716f84ah49b21d48cca3291@mail.gmail.com> <5b8d13220903070916w2b5649b4mb817b074638624cd@mail.gmail.com> Message-ID: <49B4DD85.40700@ntc.zcu.cz> David Cournapeau wrote: > On Sat, Mar 7, 2009 at 6:30 AM, Matthew Brett wrote: >> Hi David, and team, >> >> David, I've quoted you here on your response to an offer to submit >> some code to your scikit, but as a jumping off point for further >> discussion. >> >>> There are only two big requirements: >>> - I do want a pure python implementation for everything (with >>> optional C/Cython). >> I was just thinking of doing some Cython. >> >> Do we think that, in general, scipy code should have both C(ython) >> _and_ python implementations of the same thing, with different names, >> as for Anne's spatial package? > > I think we should have as much python code as possible. We have too > much C/C++/Fortran code already, and this is order of magnitude harder > to maintain than python. As long as the original writer is there, it > is ok, but when he is not available anymore, it can become a problem. > When you have semi working code in C (or worse fortran or C++), > without documentation, it is almost guaranteed to become > unmaintainable short of rewriting. > > Even cython has problems - only for it cannot support complex number > or other features of numpy as well as python code. Having a pure > python code is even more useful than test in my experience. +1 to as much pure python code as possible. After all, scipy is not "scicy". IMHO only the real performance bottlenecks (i.e. after many people complain :)) should be considered worth cythonizing/C++-ing. r. From ondrej at certik.cz Mon Mar 9 09:16:03 2009 From: ondrej at certik.cz (Ondrej Certik) Date: Mon, 9 Mar 2009 06:16:03 -0700 Subject: [SciPy-dev] least squares error Message-ID: <85b5c3130903090616p4c64204et7c70448cb97f571f@mail.gmail.com> Hi, I think each time I used leastsq, I also needed to calculate the errors in the fitted parameters. I use this method, that takes the output of leastsq and returns the parameters+errors. def calc_error(args): p, cov, info, mesg, success = args chisq=sum(info["fvec"]*info["fvec"]) dof=len(info["fvec"])-len(p) sigma = array([sqrt(cov[i,i])*sqrt(chisq/dof) for i in range(len(p))]) return p, sigma let's integrate this with leastsq? E.g. add a new key in the info dict? Ondrej From sturla at molden.no Mon Mar 9 09:26:57 2009 From: sturla at molden.no (Sturla Molden) Date: Mon, 09 Mar 2009 14:26:57 +0100 Subject: [SciPy-dev] Update memmap for Python 2.6 and 3? Message-ID: <49B51921.1050005@molden.no> In Python 2.5, mmap always memory maps from the beginning of the file. This is a problem on 32 bit systems when working on large data files. In Python 2.6, the mmap object takes an offset parameter to solve this. 
I suggest we do something like this in memmap.py: if float(sys.version[:3]) > 2.5: bytes = bytes - offset mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=offset) self = ndarray.__new__(subtype, shape, dtype=descr, buffer=mm, offset=0, order=order) else: mm = mmap.mmap(fid.fileno(), bytes, access=acc) self = ndarray.__new__(subtype, shape, dtype=descr, buffer=mm, offset=offset, order=order) Reagards, Sturla Molden -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: memmap.py URL: From josef.pktd at gmail.com Mon Mar 9 10:27:20 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 9 Mar 2009 09:27:20 -0500 Subject: [SciPy-dev] least squares error In-Reply-To: <85b5c3130903090616p4c64204et7c70448cb97f571f@mail.gmail.com> References: <85b5c3130903090616p4c64204et7c70448cb97f571f@mail.gmail.com> Message-ID: <1cd32cbb0903090727u183d14f4hc925b5d0129291b8@mail.gmail.com> On Mon, Mar 9, 2009 at 8:16 AM, Ondrej Certik wrote: > Hi, > > I think each time I used leastsq, I also needed to calculate the > errors in the fitted parameters. I use this method, that takes the > output of leastsq and returns the parameters+errors. > > def calc_error(args): > ? ?p, cov, info, mesg, success = args > ? ?chisq=sum(info["fvec"]*info["fvec"]) > ? ?dof=len(info["fvec"])-len(p) > ? ?sigma = array([sqrt(cov[i,i])*sqrt(chisq/dof) for i in range(len(p))]) > ? ?return p, sigma > > let's integrate this with leastsq? E.g. add a new key in the info dict? > > Ondrej > _______________________________________________ That's close to what the new scipy.optimize.curve_fit does, except it returns the correctly scaled covariance matrix of the parameter estimate popt, pcov = curve_fit(func, x, yn) psigma = np.sqrt(np.diag(pcov)) ) #standard deviation of parameter estimates Note: using chisq=sum(info["fvec"]*info["fvec"]) saves one function call compared to the curve_fit implementation. I would prefer that optimize.leastsq stays a low level wrapper, and the interpretation and additional statistics are produced in higher level functions, such as curve_fit. The higher level functions can take care of calculating the correct covariance of the parameter estimates for different cases, e.g. when using weights as in curve_fit, or generalized least squares, or sandwich estimates for the covariance. What I would like to see additional in optimize.leastsq is the Jacobian directly, since this can be used to test additional restrictions on the parameter estimates, or derive additional statistics. I still haven't figured out whether it is possible to recover it. Josef From pav at iki.fi Mon Mar 9 14:46:34 2009 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 9 Mar 2009 18:46:34 +0000 (UTC) Subject: [SciPy-dev] Git-svn sub-optimality References: <1e2af89e0902241059r145a10d3n118745ac24f80a7b@mail.gmail.com> <6ce0ac130902261240y3596278fo30693766a0194d5@mail.gmail.com> <9457e7c80902261307m1d4cdedckc7df633763a9b29d@mail.gmail.com> <9457e7c80902261448y2719b96bg8225a0717969eedd@mail.gmail.com> <5b8d13220902262023i675a4bc4ra8bf981267fb2156@mail.gmail.com> <49B3825C.3020200@ar.media.kyoto-u.ac.jp> <5b8d13220903090023j58bb5a0bn9c56fbd7c39d5f65@mail.gmail.com> Message-ID: Mon, 09 Mar 2009 16:23:24 +0900, David Cournapeau wrote: [clip] >> It doesn't occur if you stick to the usual git-svn workflow of getting >> SVN commits via `git svn fetch/rebase` only. An example where it occurs >> is >> >> ? ?git fetch mirror ? ? ?# fetch branch from mirror or from someone >> ? 
>>    git rebase svn/trunk  # rebase on it
>>    git svn dcommit -n    # now try to dcommit
>
> Ah, yes, you should definitely stick to one and only one mirror. That's > a git-svn limitation I think. This is a different issue, I believe: the commits are exactly the same, hashes match etc., but git-svn's caching just gets confused. [clip] > If I look at my git-svn import and yours, the commit sha1 are not the > same for the corresponding svn revision. As such, I don't see how it is > possible to guarantee consistency with multiple mirrors. This is because in your history, git-svn has made one of the preceding commits a merge commit in `git svn rebase`. This information can't of course be reconstructed from SVN. -- Pauli Virtanen From nwagner at iam.uni-stuttgart.de Mon Mar 9 16:37:04 2009 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Mon, 09 Mar 2009 21:37:04 +0100 Subject: [SciPy-dev] least squares error In-Reply-To: <85b5c3130903090616p4c64204et7c70448cb97f571f@mail.gmail.com> References: <85b5c3130903090616p4c64204et7c70448cb97f571f@mail.gmail.com> Message-ID: On Mon, 9 Mar 2009 06:16:03 -0700 Ondrej Certik wrote: > Hi, > > I think each time I used leastsq, I also needed to calculate the > errors in the fitted parameters. I use this method, that takes the > output of leastsq and returns the parameters+errors. >
> def calc_error(args):
>     p, cov, info, mesg, success = args
>     chisq = sum(info["fvec"]*info["fvec"])
>     dof = len(info["fvec"]) - len(p)
>     sigma = array([sqrt(cov[i,i])*sqrt(chisq/dof) for i in range(len(p))])
>     return p, sigma
> > let's integrate this with leastsq? E.g. add a new key in the info dict? > > Ondrej > _______________________________________________ > Scipy-dev mailing list > Scipy-dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev Hi, there is another issue concerning least squares http://projects.scipy.org/numpy/ticket/937 Nils From josef.pktd at gmail.com Mon Mar 9 16:47:36 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 9 Mar 2009 15:47:36 -0500 Subject: [SciPy-dev] least squares error In-Reply-To: References: <85b5c3130903090616p4c64204et7c70448cb97f571f@mail.gmail.com> Message-ID: <1cd32cbb0903091347s6d233465m8dd00f6c77ef8433@mail.gmail.com> On Mon, Mar 9, 2009 at 3:37 PM, Nils Wagner wrote: > On Mon, 9 Mar 2009 06:16:03 -0700 Ondrej Certik wrote: >> Hi, >> >> I think each time I used leastsq, I also needed to calculate the >> errors in the fitted parameters. I use this method, that takes the >> output of leastsq and returns the parameters+errors. >>
>> def calc_error(args):
>>     p, cov, info, mesg, success = args
>>     chisq = sum(info["fvec"]*info["fvec"])
>>     dof = len(info["fvec"]) - len(p)
>>     sigma = array([sqrt(cov[i,i])*sqrt(chisq/dof) for i in range(len(p))])
>>     return p, sigma
>> >> let's integrate this with leastsq? E.g. add a new key in the info dict? >> >> Ondrej >> _______________________________________________ >> Scipy-dev mailing list >> Scipy-dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev > > Hi, > > there is another issue concerning least squares > > http://projects.scipy.org/numpy/ticket/937 > > Nils ticket 937 is for numpy.linalg.lstsq, while I assume Ondrej meant scipy.optimize.leastsq. Namespaces would be nice.
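A toy example, just to make the naming difference concrete (data and model are made up here):

import numpy as np
from scipy import optimize

x = np.linspace(0, 1, 50)
y = 2.0 * x + 1.0 + 0.1 * np.random.randn(50)

# numpy.linalg.lstsq solves a *linear* least squares problem, min ||A p - y||
A = np.vstack([x, np.ones_like(x)]).T
p_lin, res, rank, sv = np.linalg.lstsq(A, y)

# scipy.optimize.leastsq minimizes the sum of squares of a residual function,
# which may be nonlinear in the parameters
resid = lambda p: y - (p[0] * x + p[1])
p_nl, cov, info, mesg, ier = optimize.leastsq(resid, [1.0, 0.0],
                                              full_output=True)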
Josef From ondrej at certik.cz Mon Mar 9 16:53:48 2009 From: ondrej at certik.cz (Ondrej Certik) Date: Mon, 9 Mar 2009 13:53:48 -0700 Subject: [SciPy-dev] least squares error In-Reply-To: <1cd32cbb0903091347s6d233465m8dd00f6c77ef8433@mail.gmail.com> References: <85b5c3130903090616p4c64204et7c70448cb97f571f@mail.gmail.com> <1cd32cbb0903091347s6d233465m8dd00f6c77ef8433@mail.gmail.com> Message-ID: <85b5c3130903091353l690e6e41j2c78bda81a53e2ef@mail.gmail.com> On Mon, Mar 9, 2009 at 7:27 AM, wrote: > On Mon, Mar 9, 2009 at 8:16 AM, Ondrej Certik wrote: >> Hi, >> >> I think each time I used leastsq, I also needed to calculate the >> errors in the fitted parameters. I use this method, that takes the >> output of leastsq and returns the parameters+errors. >> >> def calc_error(args): >> p, cov, info, mesg, success = args >> chisq=sum(info["fvec"]*info["fvec"]) >> dof=len(info["fvec"])-len(p) >> sigma = array([sqrt(cov[i,i])*sqrt(chisq/dof) for i in range(len(p))]) >> return p, sigma >> >> let's integrate this with leastsq? E.g. add a new key in the info dict? >> >> Ondrej >> _______________________________________________ > > That's close to what the new scipy.optimize.curve_fit does, except it > returns the correctly scaled covariance matrix of the parameter > estimate > > popt, pcov = curve_fit(func, x, yn) > psigma = np.sqrt(np.diag(pcov)) ) #standard deviation of parameter estimates > > Note: using chisq=sum(info["fvec"]*info["fvec"]) saves one function > call compared to the curve_fit implementation. > > I would prefer that optimize.leastsq stays a low level wrapper, and > the interpretation and additional statistics are produced in higher > level functions, such as curve_fit. Yes, but given the complexity of my code above (e.g. it's trivial), I think it could also be added to leastsq to the info dict, because it already returns some stat. data. > > The higher level functions can take care of calculating the correct > covariance of the parameter estimates for different cases, e.g. when > using weights as in curve_fit, or generalized least squares, or > sandwich estimates for the covariance. > > What I would like to see additional in optimize.leastsq is the > Jacobian directly, since this can be used to test additional > restrictions on the parameter estimates, or derive additional > statistics. I still haven't figured out whether it is possible to > recover it. I thought the Jacobian is in the info dict as well. On Mon, Mar 9, 2009 at 1:47 PM, wrote: > On Mon, Mar 9, 2009 at 3:37 PM, Nils Wagner > wrote: >> Hi, >> >> there is another issue concerning least squares >> >> http://projects.scipy.org/numpy/ticket/937 >> >> Nils > > ticket 937 is for numpy.linalg.lstsq > while I assume Ondrej meant scipy.optimize.leastsq Yep. Ondrej From luis94855510 at gmail.com Tue Mar 10 01:48:41 2009 From: luis94855510 at gmail.com (Luis Saavedra) Date: Tue, 10 Mar 2009 02:48:41 -0300 Subject: [SciPy-dev] there is a bug in PyArray_CheckFromAny ? Message-ID: <49B5FF39.4000100@gmail.com> Hi, if o is a "vector" and , the following lines: po = PyArray_CheckFromAny(o,PyArray_DescrFromType(NPY_INT),0,0, NPY_FORCECAST|NPY_OUT_FARRAY|NPY_ELEMENTSTRIDES,NULL); int *ver = (int *)((PyArrayObject *)po)->data; printf("0: %d\n",ver[0]); printf("1: %d\n",ver[1]); show that: 0: 0 1: -1 regardless of the value (size>2). But if I modify the shape of "o" before the previous line, for example shape = (1,n), works fine! Regards, Luis. 
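A rough Python-level analogue of that conversion (forced cast plus Fortran-ordered output), with a made-up input array; it may or may not show the same problem, since the report is about the C-level data pointer:

import numpy as np

o = np.array([3.0, 7.0, 5.0])             # made-up "vector" input
po = np.asfortranarray(o, dtype=np.intc)  # forced cast, Fortran-ordered
print(po)                                 # expected: [3 7 5]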
PD: In debian squeeze: $ dpkg -s python-numpy |grep Version Version: 1:1.2.1-1 Python-Version: 2.4, 2.5 From avi at sicortex.com Tue Mar 10 12:48:14 2009 From: avi at sicortex.com (Avi Purkayastha) Date: Tue, 10 Mar 2009 11:48:14 -0500 Subject: [SciPy-dev] build_ext fails to pick up fortran compiler Message-ID: <22A21E9F-C805-4ABC-BFD6-D36034981C59@sicortex.com> Hi, When building scipy with a new compiler (pathscale) added to the numpy*distutils*fcompiler list, the build log shows that the build process picks up the fortran compiler for parts of the build where needed except for build_ext. Some parts of the log is listed below for explanation. : customize PathScaleFCompiler customize PathScaleFCompiler using build_clib building 'dfftpack' library compiling Fortran sources Fortran f77 compiler: /opt/pathscale/ice9_native_3.2n_B_sicortex/bin/pathf95 -fixedform -O3 Fortran f90 compiler: /opt/pathscale/ice9_native_3.2n_B_sicortex/bin/pathf95 -O3 Fortran fix compiler: /opt/pathscale/ice9_native_3.2n_B_sicortex/bin/pathf95 -fixedform -O3 creating build/temp.linux-mips64-2.4 creating build/temp.linux-mips64-2.4/scipy creating build/temp.linux-mips64-2.4/scipy/fftpack creating build/temp.linux-mips64-2.4/scipy/fftpack/src creating build/temp.linux-mips64-2.4/scipy/fftpack/src/dfftpack compile options: '-c' pathf95:f77: scipy/fftpack/src/dfftpack/dcosqi.f pathf95:f77: scipy/fftpack/src/dfftpack/dcosqf.f pathf95:f77: scipy/fftpack/src/dfftpack/zfftf.f : and the build continues with success on picking up and building with the pathscale compiler on other pieces that need the fortran compiler until.. : running build_ext customize UnixCCompiler customize UnixCCompiler using build_ext resetting extension 'scipy.integrate._odepack' language from 'c' to 'f77'. resetting extension 'scipy.integrate.vode' language from 'c' to 'f77'. resetting extension 'scipy.lib.blas.fblas' language from 'c' to 'f77'. resetting extension 'scipy.odr.__odrpack' language from 'c' to 'f77'. extending extension 'scipy.sparse.linalg.dsolve._zsuperlu' defined_macros with [('USE_VENDOR_BLAS', 1)] extending extension 'scipy.sparse.linalg.dsolve._dsuperlu' defined_macros with [('USE_VENDOR_BLAS', 1)] extending extension 'scipy.sparse.linalg.dsolve._csuperlu' defined_macros with [('USE_VENDOR_BLAS', 1)] extending extension 'scipy.sparse.linalg.dsolve._ssuperlu' defined_macros with [('USE_VENDOR_BLAS', 1)] customize UnixCCompiler customize UnixCCompiler using build_ext customize PathScaleFCompiler warning: build_ext: f77_compiler=pathscale is not available. : scgcc -shared build/temp.linux- mips64-2.4/scipy/cluster/src/hierarchy_wrap.o build/temp.linux- mips64-2.4/scipy/cluster/src/h ierarchy.o -Lbuild/temp.linux-mips64-2.4 -o build/lib.linux- mips64-2.4/scipy/cluster/_hierarchy_wrap.so building 'scipy.fftpack._fftpack' extension warning: build_ext: extension 'scipy.fftpack._fftpack' has Fortran libraries but no Fortran linker found, using default linker compiling C sources : and ultimately failure because of this reason.. scgcc -shared build/temp.linux- mips64-2.4/scipy/interpolate/src/_fitpackmodule.o -Lbuild/temp.linux- mips64-2.4 -lfitpack -o build/lib.linux-mips64-2.4/scipy/interpolate/_fitpack.so building 'scipy.interpolate.dfitpack' extension error: extension 'scipy.interpolate.dfitpack' has Fortran sources but no Fortran compiler found Any suggestions on why build_ext is failing on picking up the fortran compiler or any work-arounds for this? Thanks Avi -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gareth.elston.floss at googlemail.com Wed Mar 11 19:07:19 2009 From: gareth.elston.floss at googlemail.com (Gareth Elston) Date: Wed, 11 Mar 2009 23:07:19 +0000 Subject: [SciPy-dev] [Numpy-discussion] A module for homogeneous transformation matrices, Euler angles and quaternions In-Reply-To: <463e11f90903041928j7508b2fcu4abbaa65cfe11460@mail.gmail.com> References: <2352c0540903041410j263dbb4dk6d6a2662ae7c4216@mail.gmail.com> <463e11f90903041928j7508b2fcu4abbaa65cfe11460@mail.gmail.com> Message-ID: <2352c0540903111607r32d50c4fm976f010a76e1f72d@mail.gmail.com> Does anyone know any good internet references for defining and using homogeneous transformation matrices, especially oblique projection matrices? I'm writing some tests for transformations.py and I'm getting unexpected results, quite possibly because I'm making naive assumptions about how to use projection_matrix(). Thanks, Gareth. On Thu, Mar 5, 2009 at 3:28 AM, Jonathan Taylor wrote: > Looks cool but a lot of this should be done in an extension module to > make it fast. ?Perhaps starting this process off as a separate entity > until stability is acheived. ?I would be tempted to do some of this > using cython. ?I just wrote found that generating a rotation matrix > from euler angles is about 10x faster when done properly with cython. > > J. > > On Wed, Mar 4, 2009 at 5:10 PM, Gareth Elston > wrote: >> I found a nice module for these transforms at >> http://www.lfd.uci.edu/~gohlke/code/transformations.py.html . I've >> been using an older version for some time and thought it might make a >> good addition to numpy/scipy. I made some simple mods to the older >> version to add a couple of functions I needed and to allow it to be >> used with Python 2.4. >> >> The module is pure Python (2.5, with numpy 1.2 imported), includes >> doctests, and is BSD licensed. Here's the first part of the module >> docstring: >> >> """Homogeneous Transformation Matrices and Quaternions. >> >> A library for calculating 4x4 matrices for translating, rotating, mirroring, >> scaling, shearing, projecting, orthogonalizing, and superimposing arrays of >> homogenous coordinates as well as for converting between rotation matrices, >> Euler angles, and quaternions. >> """ >> >> I'd like to see this added to numpy/scipy so I know I've got some >> reading to do (scipy.org/Developer_Zone and the huge scipy-dev >> discussions on Scipy development infrastructure / workflow) to make >> sure it follows the guidelines, but where would people like to see >> this? In numpy? scipy? scikits? elsewhere? >> >> I seem to remember that there was a first draft of a guide for >> developers being written. Are there any links available? >> >> Thanks, >> Gareth. From lists at onerussian.com Thu Mar 12 22:41:53 2009 From: lists at onerussian.com (Yaroslav Halchenko) Date: Thu, 12 Mar 2009 22:41:53 -0400 Subject: [SciPy-dev] Sad sad sad... Was: Warning about remaining issues in stats.distributions ? In-Reply-To: <20081209183213.GL25994@washoe.rutgers.edu> References: <1cd32cbb0812010806x6eb5bdcdt684c404e4e5e8027@mail.gmail.com> <20081209183213.GL25994@washoe.rutgers.edu> Message-ID: <20090313024153.GA31303@washoe.rutgers.edu> heh heh... 
very sad to see that the warning was simply ignored and 0.7.0 still has this issue on exactly the same command (hence advice to include it to unittests was ignored as well): >>> print scipy.__version__ 0.7.0 >>> scipy.stats.rdist(1.32, 0, 1).cdf(-1.0+numpy.finfo(float).eps) Traceback (most recent call last): File "", line 1, in File "/usr/lib/python2.5/site-packages/scipy/stats/distributions.py", line 117, in cdf return self.dist.cdf(x,*self.args,**self.kwds) File "/usr/lib/python2.5/site-packages/scipy/stats/distributions.py", line 625, in cdf place(output,cond,self._cdf(*goodargs)) File "/usr/lib/python2.5/site-packages/scipy/stats/distributions.py", line 528, in _cdf return self.veccdf(x,*args) File "/usr/lib/python2.5/site-packages/numpy/lib/function_base.py", line 1886, in __call__ _res = array(self.ufunc(*newargs),copy=False, File "/usr/lib/python2.5/site-packages/scipy/stats/distributions.py", line 525, in _cdf_single_call return scipy.integrate.quad(self._pdf, self.a, x, args=args)[0] File "/usr/lib/python2.5/site-packages/scipy/integrate/quadpack.py", line 185, in quad retval = _quad(func,a,b,args,full_output,epsabs,epsrel,limit,points) File "/usr/lib/python2.5/site-packages/scipy/integrate/quadpack.py", line 249, in _quad return _quadpack._qagse(func,a,b,args,full_output,epsabs,epsrel,limit) File "/usr/lib/python2.5/site-packages/scipy/stats/distributions.py", line 3046, in _pdf return pow((1.0-x*x),c/2.0-1) / special.beta(0.5,c/2.0) ZeroDivisionError: 0.0 cannot be raised to a negative power and my workaround doesn't work any more so I need to look for another one. On Tue, 09 Dec 2008, Yaroslav Halchenko wrote: > > * distributions that have problems for some range of parameters > so a good (imho) piece to add to unittests for the 'issues' to be fixed: > scipy.stats.rdist(1.32, 0, 1).cdf(-1.0+numpy.finfo(float).eps) > (tried on the SVN trunk to verify that it fails... discover the reason on > your own ;-)) > For myself I resolved it with > __eps = N.sqrt(N.finfo(float).eps) > rdist = rdist_gen(a=-1.0+__eps, b=1.0-__eps, .... > but I am not sure if that is the cleanest way... and may be some other > distributions would need such tweakery to make them more stable. -- .-. =------------------------------ /v\ ----------------------------= Keep in touch // \\ (yoh@|www.)onerussian.com Yaroslav Halchenko /( )\ ICQ#: 60653192 Linux User ^^-^^ [175555] From josef.pktd at gmail.com Thu Mar 12 23:51:24 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 12 Mar 2009 23:51:24 -0400 Subject: [SciPy-dev] Sad sad sad... Was: Warning about remaining issues in stats.distributions ? In-Reply-To: <20090313024153.GA31303@washoe.rutgers.edu> References: <1cd32cbb0812010806x6eb5bdcdt684c404e4e5e8027@mail.gmail.com> <20081209183213.GL25994@washoe.rutgers.edu> <20090313024153.GA31303@washoe.rutgers.edu> Message-ID: <1cd32cbb0903122051x664c0f2fvcfbe47011508932@mail.gmail.com> On Thu, Mar 12, 2009 at 10:41 PM, Yaroslav Halchenko wrote: > heh heh... very sad to see that the warning was simply ignored and 0.7.0 > still has this issue on exactly the same command (hence advice to include > it to unittests was ignored as well): > >>>> print scipy.__version__ > 0.7.0 >>>> scipy.stats.rdist(1.32, 0, 1).cdf(-1.0+numpy.finfo(float).eps) > Traceback (most recent call last): > ?File "", line 1, in > ?File "/usr/lib/python2.5/site-packages/scipy/stats/distributions.py", line 117, in cdf > ? 
?return self.dist.cdf(x,*self.args,**self.kwds) > ?File "/usr/lib/python2.5/site-packages/scipy/stats/distributions.py", line 625, in cdf > ? ?place(output,cond,self._cdf(*goodargs)) > ?File "/usr/lib/python2.5/site-packages/scipy/stats/distributions.py", line 528, in _cdf > ? ?return self.veccdf(x,*args) > ?File "/usr/lib/python2.5/site-packages/numpy/lib/function_base.py", line 1886, in __call__ > ? ?_res = array(self.ufunc(*newargs),copy=False, > ?File "/usr/lib/python2.5/site-packages/scipy/stats/distributions.py", line 525, in _cdf_single_call > ? ?return scipy.integrate.quad(self._pdf, self.a, x, args=args)[0] > ?File "/usr/lib/python2.5/site-packages/scipy/integrate/quadpack.py", line 185, in quad > ? ?retval = _quad(func,a,b,args,full_output,epsabs,epsrel,limit,points) > ?File "/usr/lib/python2.5/site-packages/scipy/integrate/quadpack.py", line 249, in _quad > ? ?return _quadpack._qagse(func,a,b,args,full_output,epsabs,epsrel,limit) > ?File "/usr/lib/python2.5/site-packages/scipy/stats/distributions.py", line 3046, in _pdf > ? ?return pow((1.0-x*x),c/2.0-1) / special.beta(0.5,c/2.0) > ZeroDivisionError: 0.0 cannot be raised to a negative power > > and my workaround doesn't work any more so I need to look for another > one. > > > On Tue, 09 Dec 2008, Yaroslav Halchenko wrote: > >> > * distributions that have problems for some range of parameters >> so a good (imho) piece to add to unittests for the 'issues' to be fixed: > >> scipy.stats.rdist(1.32, 0, 1).cdf(-1.0+numpy.finfo(float).eps) > >> (tried on the SVN trunk to verify that it fails... discover the reason on >> your own ;-)) > >> For myself I resolved it with > >> ? ? __eps = N.sqrt(N.finfo(float).eps) >> ? ? rdist = rdist_gen(a=-1.0+__eps, b=1.0-__eps, .... > >> but I am not sure if that is the cleanest way... and may be some other >> distributions would need such tweakery to make them more stable. > -- > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?.-. > =------------------------------ ? /v\ ?----------------------------= > Keep in touch ? ? ? ? ? ? ? ? ? ?// \\ ? ? (yoh@|www.)onerussian.com > Yaroslav Halchenko ? ? ? ? ? ? ?/( ? )\ ? ? ? ? ? ? ? ICQ#: 60653192 > ? ? ? ? ? ? ? ? ? Linux User ? ?^^-^^ ? ?[175555] > > Fixing numerical integration over the distance of a machine epsilon of a function that has a singularity at the boundary was not very high on my priority list. If there is a real use case that requires this, I can do a temporary fix. As far as I have seen, you use explicitly this special case as a test case and not a test that would reflect a failing use case. Overall, I prefer to have a general solution to the boundary problem for numerical integration, instead of messing around with the theoretically correct boundaries. Also, I would like to know what the references for the rdist are. Google search for r distribution is pretty useless, and I have not yet found a reference or an explanation of the rdist and its uses. Josef From josef.pktd at gmail.com Fri Mar 13 01:11:05 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 13 Mar 2009 01:11:05 -0400 Subject: [SciPy-dev] Sad sad sad... Was: Warning about remaining issues in stats.distributions ? 
In-Reply-To: <1cd32cbb0903122051x664c0f2fvcfbe47011508932@mail.gmail.com> References: <1cd32cbb0812010806x6eb5bdcdt684c404e4e5e8027@mail.gmail.com> <20081209183213.GL25994@washoe.rutgers.edu> <20090313024153.GA31303@washoe.rutgers.edu> <1cd32cbb0903122051x664c0f2fvcfbe47011508932@mail.gmail.com> Message-ID: <1cd32cbb0903122211y67bbe3f9sba7a63e8dc47ec03@mail.gmail.com> On Thu, Mar 12, 2009 at 11:51 PM, wrote: > On Thu, Mar 12, 2009 at 10:41 PM, Yaroslav Halchenko > wrote: >> heh heh... very sad to see that the warning was simply ignored and 0.7.0 >> still has this issue on exactly the same command (hence advice to include >> it to unittests was ignored as well): >> >>>>> print scipy.__version__ >> 0.7.0 >>>>> scipy.stats.rdist(1.32, 0, 1).cdf(-1.0+numpy.finfo(float).eps) >> Traceback (most recent call last): >> ?File "", line 1, in >> ?File "/usr/lib/python2.5/site-packages/scipy/stats/distributions.py", line 117, in cdf >> ? ?return self.dist.cdf(x,*self.args,**self.kwds) >> ?File "/usr/lib/python2.5/site-packages/scipy/stats/distributions.py", line 625, in cdf >> ? ?place(output,cond,self._cdf(*goodargs)) >> ?File "/usr/lib/python2.5/site-packages/scipy/stats/distributions.py", line 528, in _cdf >> ? ?return self.veccdf(x,*args) >> ?File "/usr/lib/python2.5/site-packages/numpy/lib/function_base.py", line 1886, in __call__ >> ? ?_res = array(self.ufunc(*newargs),copy=False, >> ?File "/usr/lib/python2.5/site-packages/scipy/stats/distributions.py", line 525, in _cdf_single_call >> ? ?return scipy.integrate.quad(self._pdf, self.a, x, args=args)[0] >> ?File "/usr/lib/python2.5/site-packages/scipy/integrate/quadpack.py", line 185, in quad >> ? ?retval = _quad(func,a,b,args,full_output,epsabs,epsrel,limit,points) >> ?File "/usr/lib/python2.5/site-packages/scipy/integrate/quadpack.py", line 249, in _quad >> ? ?return _quadpack._qagse(func,a,b,args,full_output,epsabs,epsrel,limit) >> ?File "/usr/lib/python2.5/site-packages/scipy/stats/distributions.py", line 3046, in _pdf >> ? ?return pow((1.0-x*x),c/2.0-1) / special.beta(0.5,c/2.0) >> ZeroDivisionError: 0.0 cannot be raised to a negative power >> >> and my workaround doesn't work any more so I need to look for another >> one. >> >> >> On Tue, 09 Dec 2008, Yaroslav Halchenko wrote: >> >>> > * distributions that have problems for some range of parameters >>> so a good (imho) piece to add to unittests for the 'issues' to be fixed: >> >>> scipy.stats.rdist(1.32, 0, 1).cdf(-1.0+numpy.finfo(float).eps) >> >>> (tried on the SVN trunk to verify that it fails... discover the reason on >>> your own ;-)) >> >>> For myself I resolved it with >> >>> ? ? __eps = N.sqrt(N.finfo(float).eps) >>> ? ? rdist = rdist_gen(a=-1.0+__eps, b=1.0-__eps, .... >> >>> but I am not sure if that is the cleanest way... and may be some other >>> distributions would need such tweakery to make them more stable. >> -- >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?.-. >> =------------------------------ ? /v\ ?----------------------------= >> Keep in touch ? ? ? ? ? ? ? ? ? ?// \\ ? ? (yoh@|www.)onerussian.com >> Yaroslav Halchenko ? ? ? ? ? ? ?/( ? )\ ? ? ? ? ? ? ? ICQ#: 60653192 >> ? ? ? ? ? ? ? ? ? Linux User ? ?^^-^^ ? ?[175555] >> >> > > Fixing numerical integration over the distance of a machine epsilon of > a function that has a singularity at the boundary was not very high on > my priority list. > > If there is a real use case that requires this, I can do a temporary > fix. 
As far as I have seen, you use explicitly this special case as a > test case and not a test that would reflect a failing use case. > Overall, I prefer to have a general solution to the boundary problem > for numerical integration, instead of messing around with the > theoretically correct boundaries. > > Also, I would like to know what the references for the rdist are. > Google search for r distribution is pretty useless, and I have not yet > found a reference or an explanation of the rdist and its uses. > > Josef > numerical inprecision for calculation close to the boundary with >>> stats.rdist.a -1 rdist is symmetric, so the following two should be the same: >>> stats.rdist(1.32, 0, 1).cdf(-1.0+1e-12) 5.5125207421811825e-009 >>> 1-stats.rdist(1.32, 0, 1).cdf(1.0-1e-12) 1.8800738743607326e-011 If you want to avoid the exception for values very close to the boundary, then you can override the boundary yourself. >>> stats.rdist.a = -1 + sqrt(np.finfo(float).eps) >>> sqrt(np.finfo(float).eps) 1.4901161193847656e-008 but then you loose numerical precision at the other points close to the boundary. The zero values are obtained because the integral is negative, the upper integration bound is smaller than the lower integration bound, and I guess, the check for validity of values in cdf sets out-of-bound (negative) values to zero. >>> stats.rdist(1.32, 0, 1).cdf(-1.0+np.finfo(float).eps) 0.0 >>> stats.rdist(1.32, 0, 1).cdf(-1.0+1e-14) 0.0 >>> stats.rdist(1.32, 0, 1).cdf(-1.0+1e-12) 0.0 >>> stats.rdist(1.32, 0, 1).cdf(-1.0+1e-11) 0.0 >>> stats.rdist(1.32, 0, 1).cdf(-1.0+1e-10) 0.0 >>> stats.rdist(1.32, 0, 1).cdf(-1.0+1e-9) 0.0 >>> stats.rdist(1.32, 0, 1).cdf(-1.0+1e-8) 0.0 >>> stats.rdist(1.32, 0, 1).cdf(-1.0+1e-7) 7.8873658135486906e-006 compared with symmetric case >>> stats.rdist(1.32, 0, 1).sf(1.0-1e-7) 1.8275658764110858e-010 I don't know about the internals of scipy.integrate.quad and how it handles boundary points. For anything closer to the boundary than 1e-10 the estimated absolute error is larger than the integral. Closer to the boundary than 1e-14 raises an exception. >>> scipy.integrate.quad(stats.rdist._pdf, stats.rdist.a, -1+1e-8, args=(1.32,)) (2.4122318926575886e-006, 1.984389088393137e-012) >>> scipy.integrate.quad(stats.rdist._pdf, stats.rdist.a, -1+1e-9, args=(1.32,)) (5.2774056254715333e-007, 5.6048532675329467e-012) >>> scipy.integrate.quad(stats.rdist._pdf, stats.rdist.a, -1+1e-10, args=(1.32,)) (1.152923988934946e-007, 1.3805352057819491e-008) >>> scipy.integrate.quad(stats.rdist._pdf, stats.rdist.a, -1+1e-11, args=(1.32,)) (2.5201736781677286e-008, 4.7714144348752183e-009) this looks ok: >>> stats.rdist.a = -1 + np.finfo(float).eps >>> scipy.integrate.quad(stats.rdist._pdf, stats.rdist.a, -1+1e-7, args=(1.32,)) (1.1026024914000647e-005, 2.4375839668006874e-012) >>> scipy.integrate.quad(stats.rdist._pdf, stats.rdist.a, -1+1e-14, args=(1.32,)) (2.4315077563114524e-010, 3.8495287806628835e-012) >>> stats.rdist(1.32, 0, 1).cdf(-1.0+np.finfo(float).eps) 0.0 >>> stats.rdist(1.32, 0, 1).cdf(-1.0+2*np.finfo(float).eps) Warning: Extremely bad integrand behavior occurs at some points of the integration interval. 1.2660374321273558e-011 >>> stats.rdist(1.32, 0, 1).cdf(-1.0+3*np.finfo(float).eps) Warning: Extremely bad integrand behavior occurs at some points of the integration interval. 
2.2899449003632814e-011 >>> stats.rdist(1.32, 0, 1).cdf(-1.0+1e-13) 1.1878387051893005e-009 >>> stats.rdist(1.32, 0, 1).cdf(-1.0+1e-10) 1.1528892986424152e-007 So, your solution for increasing the lower bound removes the exception but changes (screws up) the next range of values. But, introducing integration bounds different from dist.a and dist.b might work, but it requires testing to see if it works for all distributions, and we don't have test for all epsilon boundary cases. Josef From lists at onerussian.com Fri Mar 13 10:10:06 2009 From: lists at onerussian.com (Yaroslav Halchenko) Date: Fri, 13 Mar 2009 10:10:06 -0400 Subject: [SciPy-dev] Sad sad sad... Was: Warning about remaining issues in stats.distributions ? In-Reply-To: <1cd32cbb0903122051x664c0f2fvcfbe47011508932@mail.gmail.com> References: <1cd32cbb0812010806x6eb5bdcdt684c404e4e5e8027@mail.gmail.com> <20081209183213.GL25994@washoe.rutgers.edu> <20090313024153.GA31303@washoe.rutgers.edu> <1cd32cbb0903122051x664c0f2fvcfbe47011508932@mail.gmail.com> Message-ID: <20090313141004.GA25857@washoe.rutgers.edu> > Fixing numerical integration over the distance of a machine epsilon of > a function that has a singularity at the boundary was not very high on > my priority list. fair enough, but still sad ;) it is just that I got frustrated since upgrade to 0.7.0 caused quite a few places to break, and this one was one of the 'soar' ones ;) > If there is a real use case that requires this, I can do a temporary well... I think I've mentioned before how I ran into this issue: our unittests of PyMVPA [1] fail. Primarily (I think) to my silly distribution matching function. > fix. As far as I have seen, you use explicitly this special case as a > test case and not a test that would reflect a failing use case. well -- this test case is just an example. that distribution matching is the one which causes it in unittests > Overall, I prefer to have a general solution to the boundary problem > for numerical integration, instead of messing around with the > theoretically correct boundaries. sure! proper solution would be nice. as for "messing with boundaries": imho it depends on how 'they are messed up with" ;) May be original self.{a,b} could be left alone, but for numeric integration some others could be introduced (self.{a,b}_eps), which are used for integration and some correction term to be added whenever we are in the 'theoretical' boundaries, to compensate for "missing" part of [self.a, self.a_eps] Most of the distributions would not need to have them different from a,b and have 0 correction. Testing is imho quite obvious -- just go through all distributions and try to obtain .cdf values within sample points in the vicinity of the boundaries. I know that it is know exhaustive if a distribution has singularities within, but well -- it is better than nothing imho ;) I bet you can come up with a better solution. > Also, I would like to know what the references for the rdist are. actually I've found empirically that rdist is the one which I needed, and indeed there is not much information on the web: rdist corresponds to the distribution of a (single) coordinate component for a point located on a hypersphere (in space of N dimensions) of radius 1. When N is large it is well approximated by Gaussian, but in the low dimensions it is very different and quite interesting (e.g. flat in N=3) n.b. 
actually my boss told me that there is a family of distributions where this one belongs to but I've forgotten which one ;) will ask today again > Google search for r distribution is pretty useless, and I have not yet > found a reference or an explanation of the rdist and its uses. there was just a single page which I ran to which described rdist and plotted sample pdfs. but can't find it now [1] http://www.pymvpa.org/ -- .-. =------------------------------ /v\ ----------------------------= Keep in touch // \\ (yoh@|www.)onerussian.com Yaroslav Halchenko /( )\ ICQ#: 60653192 Linux User ^^-^^ [175555] From josef.pktd at gmail.com Fri Mar 13 12:12:45 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 13 Mar 2009 12:12:45 -0400 Subject: [SciPy-dev] Sad sad sad... Was: Warning about remaining issues in stats.distributions ? In-Reply-To: <20090313141004.GA25857@washoe.rutgers.edu> References: <1cd32cbb0812010806x6eb5bdcdt684c404e4e5e8027@mail.gmail.com> <20081209183213.GL25994@washoe.rutgers.edu> <20090313024153.GA31303@washoe.rutgers.edu> <1cd32cbb0903122051x664c0f2fvcfbe47011508932@mail.gmail.com> <20090313141004.GA25857@washoe.rutgers.edu> Message-ID: <1cd32cbb0903130912r370fb65fn389c39e75a40b0e6@mail.gmail.com> On Fri, Mar 13, 2009 at 10:10 AM, Yaroslav Halchenko wrote: > >> Fixing numerical integration over the distance of a machine epsilon of >> a function that has a singularity at the boundary was not very high on >> my priority list. > fair enough, but still sad ;) > it is just that I got frustrated since upgrade to 0.7.0 caused quite a > few places to break, and this one was one of the 'soar' ones ;) I ran the pymvpa test suite several times with uptodate versions of scipy and I usually didn't see any problems. I expected that, all my changes in stats so far should be backwards compatible, at least for parts that worked correctly before. If there are any problems you could be more specific and report them on the mailing list or open a ticket. > >> If there is a real use case that requires this, I can do a temporary > well... I think I've mentioned before how I ran into this issue: our > unittests of PyMVPA [1] fail. ?Primarily (I think) to my silly > distribution matching function. > >> fix. As far as I have seen, you use explicitly this special case as a >> test case and not a test that would reflect a failing use case. > well -- this test case is just an example. that distribution matching is > the one which causes it in unittests I only found the test for the rdist in test_stats.testRDistStability, and there it explicitely checks this corner case. In general the current implementation of the fit method still has several major problems to work out of the box for all distributions, especially those with a finite support boundary. Generic starting values don't work with many distributions, and fitting the location for distributions with finite support bounds using maximum likelihood often has problems. In pymvpa you also allow fitting of semifrozen distributions, is this your default in the use of MatchDistributions for distributions with bounded support as rdist, or half-open support. In your match distribution example I get many distributions with very bad kstest statistic, and when I tried out possible changes to automatically match distributions with fit, then this often indicated estimation problems and not necessarily a bad fit. But I did this for only a few sample distributions. 
For the largest part, I like the statistical analysis in pymvpa and you are far ahead of scipy.stats, eg. MatchDistribution and rv_semifrozen, or many of the other statistical methods. But in many cases, I don't like the actual implementation so much at least for a general statistical use. If you have an example where matching the rdist distribution actually raises your exception, then we could look at the fitting method and see whether we can make it more robust. I still don't see how any statistical method can hit boundary+eps. For rdist solution see below. > >> Overall, I prefer to have a general solution to the boundary problem >> for numerical integration, instead of messing around with the >> theoretically correct boundaries. > sure! proper solution would be nice. ?as for "messing with boundaries": > imho it depends on how 'they are messed up with" ;) May be original > self.{a,b} could be left alone, but for numeric integration some others > could be introduced (self.{a,b}_eps), which are used for integration and > some correction term to be added whenever we are in the 'theoretical' > boundaries, to compensate for "missing" part of [self.a, self.a_eps] > > Most of the distributions would not need to have them different from a,b > and have 0 correction. > Testing is imho quite obvious -- just go through all distributions and > try to obtain .cdf values within sample points in the vicinity of the > boundaries. I know that it is know exhaustive if a distribution has > singularities within, but well -- it is better than nothing imho ;) > > I bet you can come up with a better solution. > >> Also, I would like to know what the references for the rdist are. > actually I've found empirically that rdist is the one which I needed, > and indeed there is not much information on the web: > > rdist corresponds to the distribution of a (single) coordinate > component for a point located on a hypersphere (in space of N > dimensions) of radius 1. When ?N is large it is well approximated by > Gaussian, but in the low dimensions it is very > different and quite interesting (e.g. flat in N=3) > > n.b. actually my boss told me that there is a family of distributions where > this one belongs to but I've forgotten which one ;) will ask today again > >> Google search for r distribution is pretty useless, and I have not yet >> found a reference or an explanation of the rdist and its uses. > there was just a single page which I ran to which described rdist and > plotted sample pdfs. but can't find it now I read somewhere, I don't remember where that rdist is the distribution of the correlation coefficient, but without more information that's pretty useless > > [1] http://www.pymvpa.org/ > -- The solution to the rdist problem is trivial: >>> np.power(1-1.,-2) inf >>> pow(1-1.,-2) Traceback (most recent call last): File "", line 1, in pow(1-1.,-2) ZeroDivisionError: 0.0 cannot be raised to a negative power I hate missing name spaces, I didn't know that pow is the python buildin and not numpy.power numpy.power can raise 0.0 to the negative power, I just have to remove the usage or python's pow. Much better than fiddling with boundaries. I still have to test it, but this looks ok. I was wondering why you used the square root in your eps increase on the boundary, because in all my trying in the python shell it seems to work also without. 
But the problem is that I was using a numpy float instead of a python float and so numpy.power was used instead of python pow, (I assume): >>> (-1+(1-(1e-14)**2))**(-2) Traceback (most recent call last): File "", line 1, in (-1+(1-(1e-14)**2))**(-2) ZeroDivisionError: 0.0 cannot be raised to a negative power >>> (-1+(1-np.finfo(float).eps**2))**(-1) inf >>> (-1+(1-(1e-14)**2)).__class__ >>> (-1+(1-np.finfo(float).eps**2)).__class__ Josef From josef.pktd at gmail.com Fri Mar 13 12:44:43 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 13 Mar 2009 12:44:43 -0400 Subject: [SciPy-dev] Sad sad sad... Was: Warning about remaining issues in stats.distributions ? In-Reply-To: <1cd32cbb0903130912r370fb65fn389c39e75a40b0e6@mail.gmail.com> References: <1cd32cbb0812010806x6eb5bdcdt684c404e4e5e8027@mail.gmail.com> <20081209183213.GL25994@washoe.rutgers.edu> <20090313024153.GA31303@washoe.rutgers.edu> <1cd32cbb0903122051x664c0f2fvcfbe47011508932@mail.gmail.com> <20090313141004.GA25857@washoe.rutgers.edu> <1cd32cbb0903130912r370fb65fn389c39e75a40b0e6@mail.gmail.com> Message-ID: <1cd32cbb0903130944h5c217bc6ub4b04e0a239cfb22@mail.gmail.com> > The solution to the rdist problem is trivial: > >>>> np.power(1-1.,-2) > inf >>>> pow(1-1.,-2) > Traceback (most recent call last): > ?File "", line 1, in > ? ?pow(1-1.,-2) > ZeroDivisionError: 0.0 cannot be raised to a negative power > It only partially helped, it doesn't raise an exception anymore, but the inf is incorrect >>> stats.rdist(1.32, 0, 1).cdf(-1.0+np.finfo(float).eps) Warning: Extremely bad integrand behavior occurs at some points of the integration interval. Warning: Extremely bad integrand behavior occurs at some points of the integration interval. inf >>> stats.rdist(1.32, 0, 1).cdf(-1.0+1e-14) Warning: Extremely bad integrand behavior occurs at some points of the integration interval. inf further away from the boundary it looks good >>> stats.rdist(1.32, 0, 1).cdf(-1.0+1e-12) 5.5125207421811825e-009 >>> stats.rdist._cdf_skip(-1.0+1e-12, 1.32) 5.5260225839681709e-009 >>> stats.rdist._cdf_skip(-1.0+1e-13, 1.32) 1.2092278844910709e-009 >>> stats.rdist(1.32, 0, 1).cdf(-1.0+1e-13) 1.2060822041379395e-009 so back to fiddling with the boundary ? so just do >>> stats.rdist.a = -1.0+np.finfo(float).eps before calling rdist, this works in some basic tests. Josef From lists at onerussian.com Fri Mar 13 15:16:53 2009 From: lists at onerussian.com (Yaroslav Halchenko) Date: Fri, 13 Mar 2009 15:16:53 -0400 Subject: [SciPy-dev] Sad sad sad... Was: Warning about remaining issues in stats.distributions ? In-Reply-To: <1cd32cbb0903130912r370fb65fn389c39e75a40b0e6@mail.gmail.com> References: <1cd32cbb0812010806x6eb5bdcdt684c404e4e5e8027@mail.gmail.com> <20081209183213.GL25994@washoe.rutgers.edu> <20090313024153.GA31303@washoe.rutgers.edu> <1cd32cbb0903122051x664c0f2fvcfbe47011508932@mail.gmail.com> <20090313141004.GA25857@washoe.rutgers.edu> <1cd32cbb0903130912r370fb65fn389c39e75a40b0e6@mail.gmail.com> Message-ID: <20090313191652.GC25857@washoe.rutgers.edu> > >> Google search for r distribution is pretty useless, and I have not yet > >> found a reference or an explanation of the rdist and its uses. > > there was just a single page which I ran to which described rdist and > > plotted sample pdfs. but can't find it now > I read somewhere, I don't remember where that rdist is the distribution > of the correlation coefficient, but without more information that's pretty > useless doh! sure it is related... 
hence the name rdist, since pearsons corr coeff is abbreviated as 'r' ;) hence rdist ;) http://en.wikipedia.org/wiki/Correlation_coefficient says that The distribution of the correlation coefficient has been examined by R. A. Fisher[2][3] and A. K. Gayen.[4] but those are 100 and 50 years old books... not sure if we have them online to check if they were the one who brought analytic function for it... and it seems that it is related to the 'multidimensional' correlation mentioned in the wikipedia but it is now clear how sample size "fits into equation"... c seems to relate to the dimensions of the data... is it possible to trace back who introduced this lovely piece into scipy? ;) may be we could ask the author? ;) -- .-. =------------------------------ /v\ ----------------------------= Keep in touch // \\ (yoh@|www.)onerussian.com Yaroslav Halchenko /( )\ ICQ#: 60653192 Linux User ^^-^^ [175555] From lists at onerussian.com Fri Mar 13 15:32:05 2009 From: lists at onerussian.com (Yaroslav Halchenko) Date: Fri, 13 Mar 2009 15:32:05 -0400 Subject: [SciPy-dev] Sad sad sad... Was: Warning about remaining issues in stats.distributions ? In-Reply-To: <1cd32cbb0903130912r370fb65fn389c39e75a40b0e6@mail.gmail.com> References: <1cd32cbb0812010806x6eb5bdcdt684c404e4e5e8027@mail.gmail.com> <20081209183213.GL25994@washoe.rutgers.edu> <20090313024153.GA31303@washoe.rutgers.edu> <1cd32cbb0903122051x664c0f2fvcfbe47011508932@mail.gmail.com> <20090313141004.GA25857@washoe.rutgers.edu> <1cd32cbb0903130912r370fb65fn389c39e75a40b0e6@mail.gmail.com> Message-ID: <20090313193204.GD25857@washoe.rutgers.edu> doh... just now realized that some pieces don't come together... whenever I wrote you reply (while reading your email) I killed everything till the signature which commonly starts with '^^-$'... hence I killed till this lines and never saw your power finding ;) cool!!! so if it works all-around (or whenever you think that you are sick of testing it) -- please let me know -- I will adopt this solution in my monkey patch within pymvpa ;) THANKS! P.S. Told ya you will come up with a nicer one ;) > I was wondering why you used the square root in your eps increase on > the boundary, because in all my trying in the python shell it seems to > work also without. But the problem is that I was using a numpy float > instead of a python float and so numpy.power was used instead of > python pow, (I assume): probably ;) -- .-. =------------------------------ /v\ ----------------------------= Keep in touch // \\ (yoh@|www.)onerussian.com Yaroslav Halchenko /( )\ ICQ#: 60653192 Linux User ^^-^^ [175555] From lists at onerussian.com Fri Mar 13 15:53:56 2009 From: lists at onerussian.com (Yaroslav Halchenko) Date: Fri, 13 Mar 2009 15:53:56 -0400 Subject: [SciPy-dev] Scipy workflow (and not tools). In-Reply-To: References: <6ce0ac130902252321j6139238by634364acd2bd07b2@mail.gmail.com> <49A6D913.9040809@enthought.com> <6ce0ac130902261240y3596278fo30693766a0194d5@mail.gmail.com> <9457e7c80902261307m1d4cdedckc7df633763a9b29d@mail.gmail.com> <9457e7c80902261448y2719b96bg8225a0717969eedd@mail.gmail.com> <5b8d13220902262023i675a4bc4ra8bf981267fb2156@mail.gmail.com> <85b5c3130902262054i27fca37fq7e2d58c6cf06626a@mail.gmail.com> Message-ID: <20090313195355.GE25857@washoe.rutgers.edu> just 1 tiny comment... 
probably it was raised already, then I beg your pardon for its reincarnation http://projects.scipy.org/numpy/wiki/GitMirror advices to use git clone --origin svn http://projects.scipy.org/git/scipy.git scipy.git but there is a convention to use .git suffix for '--bare' repositories. So I would advice to use scipy-git or scipy_git or just scipy ;) On Fri, 27 Feb 2009, Pauli Virtanen wrote: > Yes, instructions for this are available also on the page > http://scipy.org/scipy/numpy/wiki/GitMirror > But if the mirror is up-to-date (and I hope we manage to get > the SVN post-commit hook installed), there's no need to do this, > you can just ''git fetch''. -- .-. =------------------------------ /v\ ----------------------------= Keep in touch // \\ (yoh@|www.)onerussian.com Yaroslav Halchenko /( )\ ICQ#: 60653192 Linux User ^^-^^ [175555] From lists at onerussian.com Fri Mar 13 16:19:44 2009 From: lists at onerussian.com (Yaroslav Halchenko) Date: Fri, 13 Mar 2009 16:19:44 -0400 Subject: [SciPy-dev] Question to Travis: what is rdist about? Message-ID: <20090313201943.GF25857@washoe.rutgers.edu> Travis, I wonder if you get a moment and desire to give a bit of theory/history for the hungry people ;) In a recent thread (a part of it is below the body of this email) dealing with instabilities of rdist Josef asked what is the application domain of rdist distribution... he heard about relation to correlation, I mentioned that it is related to the distribution of a coordinate of points on c-dimensional sphere. But I wonder -- what was the original reason for this distribution to appear? where have you found it, or in other words -- what literature source describes it? thanks to git I found that you introduced it in commit 8ce8603696448c171c186ea2aab158cf34e25441 Author: travo Date: Fri Nov 22 09:04:46 2002 +0000 Changed statistics module to use clasasses. git-svn-id: http://svn.scipy.org/svn/scipy/trunk at 648 d6536bca-fef9-0310-8506-e4c0a848fbcf but I can't figure out if it was really a new distribution or refactored from some other one. Thank you in advance! On Fri, 13 Mar 2009, Yaroslav Halchenko wrote: > > >> Google search for r distribution is pretty useless, and I have not yet > > >> found a reference or an explanation of the rdist and its uses. > > > there was just a single page which I ran to which described rdist and > > > plotted sample pdfs. but can't find it now > > I read somewhere, I don't remember where that rdist is the distribution > > of the correlation coefficient, but without more information that's pretty > > useless > doh! sure it is related... hence the name rdist, since pearsons corr > coeff is abbreviated as 'r' ;) hence rdist ;) > http://en.wikipedia.org/wiki/Correlation_coefficient > says that > The distribution of the correlation coefficient has been examined by R. > A. Fisher[2][3] and A. K. Gayen.[4] > but those are 100 and 50 years old books... not sure if we have them > online to check if they were the one who brought analytic function for > it... > and it seems that it is related to the 'multidimensional' correlation > mentioned in the wikipedia > but it is now clear how sample size "fits into equation"... c seems to > relate to the dimensions of the data... > is it possible to trace back who introduced this lovely piece into > scipy? ;) may be we could ask the author? ;) -- .-. 
=------------------------------ /v\ ----------------------------= Keep in touch // \\ (yoh@|www.)onerussian.com Yaroslav Halchenko /( )\ ICQ#: 60653192 Linux User ^^-^^ [175555] From bsouthey at gmail.com Fri Mar 13 16:31:23 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 13 Mar 2009 15:31:23 -0500 Subject: [SciPy-dev] Question to Travis: what is rdist about? In-Reply-To: <20090313201943.GF25857@washoe.rutgers.edu> References: <20090313201943.GF25857@washoe.rutgers.edu> Message-ID: <49BAC29B.9000003@gmail.com> Hi, I presume it refers to the correlation distribution. The pdf is that given at: http://www.xycoon.com/rdis_density.htm where scipy.stats c variable is equal to n-2 in that formula. You can find things if you look for correlation test. Bruce Yaroslav Halchenko wrote: > Travis, > > I wonder if you get a moment and desire to give a bit of theory/history for > the hungry people ;) > > In a recent thread (a part of it is below the body of this email) dealing > with instabilities of rdist Josef asked what is the application domain of > rdist distribution... he heard about relation to correlation, I mentioned that > it is related to the distribution of a coordinate of points on c-dimensional > sphere. But I wonder -- what was the original reason for this distribution to > appear? where have you found it, or in other words -- what literature > source describes it? > > thanks to git I found that you introduced it in > > commit 8ce8603696448c171c186ea2aab158cf34e25441 > Author: travo > Date: Fri Nov 22 09:04:46 2002 +0000 > > Changed statistics module to use clasasses. > > > git-svn-id: http://svn.scipy.org/svn/scipy/trunk at 648 d6536bca-fef9-0310-8506-e4c0a848fbcf > > > but I can't figure out if it was really a new distribution or refactored from > some other one. > > Thank you in advance! > > On Fri, 13 Mar 2009, Yaroslav Halchenko wrote: > > >>>>> Google search for r distribution is pretty useless, and I have not yet >>>>> found a reference or an explanation of the rdist and its uses. >>>>> >>>> there was just a single page which I ran to which described rdist and >>>> plotted sample pdfs. but can't find it now >>>> >>> I read somewhere, I don't remember where that rdist is the distribution >>> of the correlation coefficient, but without more information that's pretty >>> useless >>> >> doh! sure it is related... hence the name rdist, since pearsons corr >> coeff is abbreviated as 'r' ;) hence rdist ;) >> > > >> http://en.wikipedia.org/wiki/Correlation_coefficient >> says that >> The distribution of the correlation coefficient has been examined by R. >> A. Fisher[2][3] and A. K. Gayen.[4] >> > > >> but those are 100 and 50 years old books... not sure if we have them >> online to check if they were the one who brought analytic function for >> it... >> > > >> and it seems that it is related to the 'multidimensional' correlation >> mentioned in the wikipedia >> but it is now clear how sample size "fits into equation"... c seems to >> relate to the dimensions of the data... >> > > >> is it possible to trace back who introduced this lovely piece into >> scipy? ;) may be we could ask the author? ;) >> From oliphant at enthought.com Fri Mar 13 17:32:09 2009 From: oliphant at enthought.com (Travis E. Oliphant) Date: Fri, 13 Mar 2009 16:32:09 -0500 Subject: [SciPy-dev] Question to Travis: what is rdist about? 
In-Reply-To: <20090313201943.GF25857@washoe.rutgers.edu> References: <20090313201943.GF25857@washoe.rutgers.edu> Message-ID: <49BAD0D9.3040103@enthought.com> Yaroslav Halchenko wrote: > Travis, > > I wonder if you get a moment and desire to give a bit of theory/history for > the hungry people ;) > Thanks for emailing me directly. Unfortunately, I don't get the time to read all of SciPy-dev anymore. These are the references I used in constructing the distributions (they are comments in the code). ## References:: ## Documentation for ranlib, rv2, cdflib and ## ## Eric Wesstein's world of mathematics http://mathworld.wolfram.com/ ## http://mathworld.wolfram.com/topics/StatisticalDistributions.html ## ## Documentation to Regress+ by Michael McLaughlin ## ## Engineering and Statistics Handbook (NIST) ## http://www.itl.nist.gov/div898/handbook/index.htm ## ## Documentation for DATAPLOT from NIST ## http://www.itl.nist.gov/div898/software/dataplot/distribu.htm ## ## Norman Johnson, Samuel Kotz, and N. Balakrishnan "Continuous ## Univariate Distributions", second edition, ## Volumes I and II, Wiley & Sons, 1994. The rdist distribution appeared at the same time as a lot of other distributions. It must be referred to in one of the above sources. But, here is a decent current source: http://demonstrations.wolfram.com/TheRDistribution/ From the text (some of the math images disappeared): The r-distribution with parameter is the distribution of the correlation coefficient of a random sample of size drawn from a bivariate normal distribution with . It can be used to construct tests about the correlation coefficient of bivariate normal data; that is, tests with null hypothesis . The mean of the distribution is always zero, and as the sample size grows, the distribution's mass concentrates more closely about this mean. Thanks for pushing hard against SciPy --- it's the only way to ferret out the problems that exist. Best regards, -Travis -- Travis Oliphant Enthought, Inc. (512) 536-1057 (office) (512) 536-1059 (fax) http://www.enthought.com oliphant at enthought.com From cournape at gmail.com Fri Mar 13 23:45:55 2009 From: cournape at gmail.com (David Cournapeau) Date: Sat, 14 Mar 2009 12:45:55 +0900 Subject: [SciPy-dev] Scipy workflow (and not tools). In-Reply-To: <20090313195355.GE25857@washoe.rutgers.edu> References: <6ce0ac130902252321j6139238by634364acd2bd07b2@mail.gmail.com> <6ce0ac130902261240y3596278fo30693766a0194d5@mail.gmail.com> <9457e7c80902261307m1d4cdedckc7df633763a9b29d@mail.gmail.com> <9457e7c80902261448y2719b96bg8225a0717969eedd@mail.gmail.com> <5b8d13220902262023i675a4bc4ra8bf981267fb2156@mail.gmail.com> <85b5c3130902262054i27fca37fq7e2d58c6cf06626a@mail.gmail.com> <20090313195355.GE25857@washoe.rutgers.edu> Message-ID: <5b8d13220903132045q6e3f515cmbd2f4866558a4c33@mail.gmail.com> On Sat, Mar 14, 2009 at 4:53 AM, Yaroslav Halchenko wrote: > > but there is a convention to use .git suffix for '--bare' repositories. > So I would advice to use scipy-git or scipy_git or just scipy ;) IIRC, the .git suffix is automatically added by github. cheers, David From pav at iki.fi Sat Mar 14 07:38:44 2009 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 14 Mar 2009 11:38:44 +0000 (UTC) Subject: [SciPy-dev] Scipy workflow (and not tools). 
References: <6ce0ac130902252321j6139238by634364acd2bd07b2@mail.gmail.com> <6ce0ac130902261240y3596278fo30693766a0194d5@mail.gmail.com> <9457e7c80902261307m1d4cdedckc7df633763a9b29d@mail.gmail.com> <9457e7c80902261448y2719b96bg8225a0717969eedd@mail.gmail.com> <5b8d13220902262023i675a4bc4ra8bf981267fb2156@mail.gmail.com> <85b5c3130902262054i27fca37fq7e2d58c6cf06626a@mail.gmail.com> <20090313195355.GE25857@washoe.rutgers.edu> <5b8d13220903132045q6e3f515cmbd2f4866558a4c33@mail.gmail.com> Message-ID: Sat, 14 Mar 2009 12:45:55 +0900, David Cournapeau wrote: > On Sat, Mar 14, 2009 at 4:53 AM, Yaroslav Halchenko > wrote: > > >> but there is a convention to use .git suffix for '--bare' repositories. >> So I would advice to use scipy-git or scipy_git or just scipy ;) > > IIRC, the .git suffix is automatically added by github. I guess he's talking about the name of the cloned directory here. -- Pauli Virtanen From thouis at broad.mit.edu Sat Mar 14 20:56:49 2009 From: thouis at broad.mit.edu (Thouis (Ray) Jones) Date: Sat, 14 Mar 2009 20:56:49 -0400 Subject: [SciPy-dev] Sad sad sad... Was: Warning about remaining issues in stats.distributions ? In-Reply-To: <1cd32cbb0903122051x664c0f2fvcfbe47011508932@mail.gmail.com> References: <1cd32cbb0812010806x6eb5bdcdt684c404e4e5e8027@mail.gmail.com> <20081209183213.GL25994@washoe.rutgers.edu> <20090313024153.GA31303@washoe.rutgers.edu> <1cd32cbb0903122051x664c0f2fvcfbe47011508932@mail.gmail.com> Message-ID: <6c17e6f50903141756x221949d8mdfa2f2b4d84a381@mail.gmail.com> On Thu, Mar 12, 2009 at 23:51, wrote: > Fixing numerical integration over the distance of a machine epsilon of > a function that has a singularity at the boundary was not very high on > my priority list. I've been working with some code for integration using the double-exponential transform, which is useful for exactly this case. I'm not familiar with all the issues involved here, but if you would like to explore that option, let me know and I'll send you the code. I was planning on cleaning it up a bit and contributing it to scipy, anyway. From josef.pktd at gmail.com Sat Mar 14 23:02:33 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 14 Mar 2009 23:02:33 -0400 Subject: [SciPy-dev] Sad sad sad... Was: Warning about remaining issues in stats.distributions ? In-Reply-To: <6c17e6f50903141756x221949d8mdfa2f2b4d84a381@mail.gmail.com> References: <1cd32cbb0812010806x6eb5bdcdt684c404e4e5e8027@mail.gmail.com> <20081209183213.GL25994@washoe.rutgers.edu> <20090313024153.GA31303@washoe.rutgers.edu> <1cd32cbb0903122051x664c0f2fvcfbe47011508932@mail.gmail.com> <6c17e6f50903141756x221949d8mdfa2f2b4d84a381@mail.gmail.com> Message-ID: <1cd32cbb0903142002k340d9f1csb44e5b0fd8c73123@mail.gmail.com> On Sat, Mar 14, 2009 at 8:56 PM, Thouis (Ray) Jones wrote: > On Thu, Mar 12, 2009 at 23:51, ? wrote: >> Fixing numerical integration over the distance of a machine epsilon of >> a function that has a singularity at the boundary was not very high on >> my priority list. > > I've been working with some code for integration using the > double-exponential transform, which is useful for exactly this case. > I'm not familiar with all the issues involved here, but if you would > like to explore that option, let me know and I'll send you the code. > > I was planning on cleaning it up a bit and contributing it to scipy, anyway. Thank you for the offer, it would be good to have some more tools available to handle these special cases. 
There are several distributions that have a point in the interior or the boundary of the support where the density function goes to infinity. In general scipy.integrate.quad seems to be handling them quite well except for corner cases. But I don't know what the precision loss in these cases is. For the specific case of the r-distribution, rdist, the main source of the problem is that I had to disable the explicit formula for the cumulative distribution function because scipy.special.hyp2f1 produces incorrect numbers over part of the parameter range used in rdist, and I had to fall back to numerical integration. But for me there is no hurry for these corner case problems, since, in my available time, I still have other problems or improvements in stats to chase, (and for me it's not so much fun to figure out what epsilon multiplied by (going to infinity) is, especially if it's not clear what the use case for it is.) Josef From david at ar.media.kyoto-u.ac.jp Sun Mar 15 05:56:04 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sun, 15 Mar 2009 18:56:04 +0900 Subject: [SciPy-dev] #880: lfilter, variable number of output Message-ID: <49BCD0B4.9010007@ar.media.kyoto-u.ac.jp> Hi, A user of scipy.signal.lfilter reported a bug in it, which I fixed, but he also raised an interesting point concerning the variable number of output in lfilter. I think always outputting the final values of the delay line is better as well, but we can't change it wo breaking lfilter API. I first thought about adding a lfilter2, but the name is a bit misleading - and I also noticed that filter is not taken (which is the name of this functionality under matlab). What about adding a filter function which is like lfilter, but always return zf ? cheers, David From stefan at sun.ac.za Sun Mar 15 06:30:09 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Sun, 15 Mar 2009 12:30:09 +0200 Subject: [SciPy-dev] #880: lfilter, variable number of output In-Reply-To: <49BCD0B4.9010007@ar.media.kyoto-u.ac.jp> References: <49BCD0B4.9010007@ar.media.kyoto-u.ac.jp> Message-ID: <9457e7c80903150330s2b633625i23e376719f41b980@mail.gmail.com> 2009/3/15 David Cournapeau : > ? ?A user of scipy.signal.lfilter reported a bug in it, which I fixed, > but he also raised an interesting point concerning the variable number > of output in lfilter. I think always outputting the final values of the > delay line is better as well, but we can't change it wo breaking lfilter > API. I first thought about adding a lfilter2, but the name is a bit > misleading - and I also noticed that filter is not taken (which is the > name of this functionality under matlab). > ? ?What about adding a filter function which is like lfilter, but > always return zf ? So we move the functionality to filter, let lfilter call filter, and add a deprecation notice to lfilter? Sounds good. 
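For readers who have not run into it, the variable output being discussed is the zi/zf pair: lfilter returns only y by default, and (y, zf) when an initial state zi is passed in. A small sketch of how the final delay-line values are carried between blocks with the current interface (the filter choice and block split below are arbitrary):

import numpy as np
from scipy import signal

b, a = signal.butter(4, 0.2)            # any stable filter will do
x = np.random.randn(1000)
zi = np.zeros(max(len(a), len(b)) - 1)  # zero initial conditions

# one call over the whole signal ...
y_all, _ = signal.lfilter(b, a, x, zi=zi)

# ... is equivalent to two calls where zf of the first block
# is fed back in as zi of the second block
y1, zf = signal.lfilter(b, a, x[:500], zi=zi)
y2, _ = signal.lfilter(b, a, x[500:], zi=zf)

assert np.allclose(y_all, np.concatenate([y1, y2]))

A function that always returns (y, zf) would simply remove the surprise of the return type changing with the arguments.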
Cheers St?fan From david at ar.media.kyoto-u.ac.jp Sun Mar 15 06:19:17 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sun, 15 Mar 2009 19:19:17 +0900 Subject: [SciPy-dev] #880: lfilter, variable number of output In-Reply-To: <9457e7c80903150330s2b633625i23e376719f41b980@mail.gmail.com> References: <49BCD0B4.9010007@ar.media.kyoto-u.ac.jp> <9457e7c80903150330s2b633625i23e376719f41b980@mail.gmail.com> Message-ID: <49BCD625.3060806@ar.media.kyoto-u.ac.jp> St?fan van der Walt wrote: > 2009/3/15 David Cournapeau : > >> A user of scipy.signal.lfilter reported a bug in it, which I fixed, >> but he also raised an interesting point concerning the variable number >> of output in lfilter. I think always outputting the final values of the >> delay line is better as well, but we can't change it wo breaking lfilter >> API. I first thought about adding a lfilter2, but the name is a bit >> misleading - and I also noticed that filter is not taken (which is the >> name of this functionality under matlab). >> What about adding a filter function which is like lfilter, but >> always return zf ? >> > > So we move the functionality to filter, let lfilter call filter, and > add a deprecation notice to lfilter? Sounds good. > Modulo the deprecation warning, yes. I think scipy.signal would benefit from a lot of cleaning, and if we ever decide to deprecate some things, it may be better to do everything at once ? cheers, David From fperez.net at gmail.com Mon Mar 16 00:42:46 2009 From: fperez.net at gmail.com (Fernando Perez) Date: Sun, 15 Mar 2009 21:42:46 -0700 Subject: [SciPy-dev] Has IPython been useful to you? Please let me know... Message-ID: Hi all, [ apologies for the semi-spam, I'll keep this brief and expect all replies off-list ] IPython is a project that many of you on this list are likely to use in your daily work, either directly or indirectly (if you've embedded it or used it as a component of some other system). I would simply like to ask you, if IPython has been significantly useful for a project you use, lead, develop, etc., to let me know. For legal/professional reasons, I need to gather information about who has found IPython to be of value. I started IPython as a toy 'afternoon hack' in late 2001, and today it continues to grow, as the nicely summarized Ohloh stats show: https://www.ohloh.net/p/ipython (obviously, this is now the result of the work of many, not just myself, as is true of any healthy open source project as it grows). But I have never systematically tracked its impact, and now I need to do so. So, if you have used IPython and it has made a significant contribution to your project, work, research, company, whatever, I'd be very grateful if you let me know. A short paragraph on what this benefit has been is all I ask. Once I gather any information I get, I would contact directly some of the responders to ask for your authorization before quoting you. I should stress that any information you give me will only go in a documentation packet in support of my legal/residency process here in the USA (think of it as an oversized, obnoxiously detailed CV that goes beyond just publications and regular academic information). To keep traffic off this list, please send your replies directly to me, either at this address or my regular work one: Fernando.Perez at berkeley.edu In advance, many thanks to anyone willing to reply. 
I've never asked for anything in return for working on IPython and the ecosystem of scientific Python tools, but this is actually very important, so any information you can provide me will be very useful. Best regards, Fernando Perez. From sturla at molden.no Mon Mar 16 09:42:42 2009 From: sturla at molden.no (Sturla Molden) Date: Mon, 16 Mar 2009 14:42:42 +0100 Subject: [SciPy-dev] Changing the container for SciPy's FFTPACK cache? Message-ID: <49BE5752.60802@molden.no> SciPy's interface to fftpack is different from NumPy's fftpack_lite. One notable difference is that SciPy's uses a cache in C, whereas NumPy uses a Python dict for the same purpose. SciPy's cache is accessed directly from the C wrappers to FFTPACK. This means that SciPy's FFTs are not threadsafe, and thus the GIL cannot be released. Thus, SciPy has to lock up the interpreter while doing FFTs. Using a Python container with ndarrays as cache would allow the GIL to be released when calling FFTPACK (cf. ticket #1055 for NumPy). I've posted a patch for NumPy yesterday, and I thought of giving SciPy a shot as well (when I can find some spare time). Sturla Molden From lists at onerussian.com Mon Mar 16 12:36:06 2009 From: lists at onerussian.com (Yaroslav Halchenko) Date: Mon, 16 Mar 2009 12:36:06 -0400 Subject: [SciPy-dev] Scipy workflow (and not tools). In-Reply-To: References: <9457e7c80902261307m1d4cdedckc7df633763a9b29d@mail.gmail.com> <9457e7c80902261448y2719b96bg8225a0717969eedd@mail.gmail.com> <5b8d13220902262023i675a4bc4ra8bf981267fb2156@mail.gmail.com> <85b5c3130902262054i27fca37fq7e2d58c6cf06626a@mail.gmail.com> <20090313195355.GE25857@washoe.rutgers.edu> <5b8d13220903132045q6e3f515cmbd2f4866558a4c33@mail.gmail.com> Message-ID: <20090316163605.GJ25857@washoe.rutgers.edu> On Sat, 14 Mar 2009, Pauli Virtanen wrote: > >> but there is a convention to use .git suffix for '--bare' repositories. > >> So I would advice to use scipy-git or scipy_git or just scipy ;) > > IIRC, the .git suffix is automatically added by github. > I guess he's talking about the name of the cloned directory here. yeap ;) -- .-. =------------------------------ /v\ ----------------------------= Keep in touch // \\ (yoh@|www.)onerussian.com Yaroslav Halchenko /( )\ ICQ#: 60653192 Linux User ^^-^^ [175555] From lists at onerussian.com Mon Mar 16 13:31:22 2009 From: lists at onerussian.com (Yaroslav Halchenko) Date: Mon, 16 Mar 2009 13:31:22 -0400 Subject: [SciPy-dev] Sad sad sad... Was: Warning about remaining issues in stats.distributions ? In-Reply-To: <1cd32cbb0903130944h5c217bc6ub4b04e0a239cfb22@mail.gmail.com> References: <1cd32cbb0812010806x6eb5bdcdt684c404e4e5e8027@mail.gmail.com> <20081209183213.GL25994@washoe.rutgers.edu> <20090313024153.GA31303@washoe.rutgers.edu> <1cd32cbb0903122051x664c0f2fvcfbe47011508932@mail.gmail.com> <20090313141004.GA25857@washoe.rutgers.edu> <1cd32cbb0903130912r370fb65fn389c39e75a40b0e6@mail.gmail.com> <1cd32cbb0903130944h5c217bc6ub4b04e0a239cfb22@mail.gmail.com> Message-ID: <20090316173121.GK25857@washoe.rutgers.edu> Thanks Josef once again. I have tested s/pow/N.power/ in rdist and it seems to work great! thanks a lot. numpy is more fun though than I thought ;) In [12]:np.power(0, -1) Out[12]:-9223372036854775808 In [14]:np.power(0.0, -1) Out[14]:inf In [16]:np.power(0, -1.0) Out[16]:inf I know that in my case I will not (or should not) get ints as input but... I thought it might be worth mentioning... may be it is worth a bug report in numpy? 
On Fri, 13 Mar 2009, josef.pktd at gmail.com wrote: > > The solution to the rdist problem is trivial: > >>>> np.power(1-1.,-2) > > inf > >>>> pow(1-1.,-2) > > Traceback (most recent call last): > > ?File "", line 1, in > > ? ?pow(1-1.,-2) > > ZeroDivisionError: 0.0 cannot be raised to a negative power > It only partially helped, it doesn't raise an exception anymore, but the inf is > incorrect > >>> stats.rdist(1.32, 0, 1).cdf(-1.0+np.finfo(float).eps) > Warning: Extremely bad integrand behavior occurs at some points of the > integration interval. > Warning: Extremely bad integrand behavior occurs at some points of the > integration interval. > inf > >>> stats.rdist(1.32, 0, 1).cdf(-1.0+1e-14) > Warning: Extremely bad integrand behavior occurs at some points of the > integration interval. > inf > further away from the boundary it looks good > >>> stats.rdist(1.32, 0, 1).cdf(-1.0+1e-12) > 5.5125207421811825e-009 > >>> stats.rdist._cdf_skip(-1.0+1e-12, 1.32) > 5.5260225839681709e-009 > >>> stats.rdist._cdf_skip(-1.0+1e-13, 1.32) > 1.2092278844910709e-009 > >>> stats.rdist(1.32, 0, 1).cdf(-1.0+1e-13) > 1.2060822041379395e-009 > so back to fiddling with the boundary ? > so just do > >>> stats.rdist.a = -1.0+np.finfo(float).eps > before calling rdist, this works in some basic tests. > Josef > _______________________________________________ > Scipy-dev mailing list > Scipy-dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev -- .-. =------------------------------ /v\ ----------------------------= Keep in touch // \\ (yoh@|www.)onerussian.com Yaroslav Halchenko /( )\ ICQ#: 60653192 Linux User ^^-^^ [175555] From almer at gnome.org Tue Mar 17 15:05:49 2009 From: almer at gnome.org (Almer S. Tigelaar) Date: Tue, 17 Mar 2009 20:05:49 +0100 Subject: [SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau) Message-ID: <1237316749.6984.13.camel@rufio-laptop> Hello, (I realize this mail is a bit lengthy, but I would appreciate it if someone could comment on it). I believe that I found a bug in your implementation of Kendall's Tau. I have evaluated the implementation (to verify a self-written implementation). When the results turned out to be different I investigated the current SciPy implementation at the following URL: http://svn.scipy.org/svn/scipy/trunk/scipy/stats/stats.py (I am aware of the fact that there is also a Kendall's Tau implementation in mstats.py, but have not evaluated that implementation yet). I will give some explanation of my interpretation of Kendall's tau, an example showing the differences between SciPy's and my implementation and a possible fix for SciPy's implementation. Your implementation is Kendall's tau-b with tie correction (same as mine). I take as my reference definition, the one in the following 'poster' paper: http://portal.acm.org/citation.cfm?id=1277935 (this same definition appears in other places as well, this is the shortest resource I could find) Recall that Kendall's tau calculates a score t given two rankings R1 and R2. Variables P, Q and T are all characteristics of the pairs in those rankings. The definition given in the reference is: t = (P - Q) / SQRT((P + Q + T) * (P + Q + U)) where P is the number of concordant pairs, Q the number of discordant pairs, T the number ties in R1 and U the number of ties in R2. An example: ----------- Let's use two identical rankings with a tie: A B C R1 = [1, 1, 2] R2 = [1, 1, 2] There are three pair combinations in these lists, namely: (A, B), (A, C) and (B, C). 
It is obvious that _one_ of these combinations has a tie for both lists (the (A,B) combination which is (1,1) for both R1 and R2). So, since there is one tie in both list we have T = U = 1 We find that there are two concordant pairs in both lists (A, C) and (B,C) so P = 2. There are no discordant pairs, so Q = 0. With all variables given, we can now calculate Kendall's tau for R1 and R2: t = (2 - 0) / SQRT((2 + 0 + 1)*(2 + 0 + 1)) t = 2 / SQRT(3*3) t = 2 / 3 t = 0.6666666 However, using scipy (svn HEAD) as follows: import scipy.stats.stats as s s.kendalltau([1,1,2], [1,1,2]) Yields t = 1.0: (1.0, 0.11718509694604401) Which I believe is wrong (or at least: has no correction for ties, as is claimed in the source code). If there are three combinations and one of these is a tie, and the other two combinations are concordant, it makes sense that Kendall's tau-b should yield 2 / 3. The cause and fix ----------------- Playing around with SciPy's code (and comparing it with my own) I believe I discovered a probable cause for this difference in SciPy's code. Again, I used the implementation at the following URL: http://svn.scipy.org/svn/scipy/trunk/scipy/stats/stats.py (please take look at the implementation first, otherwise you will not understand my explanation) In the 'kendalltau(x,y)' function we see a test for ties and an 'else' branch. In the 'else' branch the values of 'n1' and 'n2' are incremented if there is a tie (conforming to +T and +U in the formula given above). However, I believe that the 'if' conditions here are wrong: 1) Consider that if 'a1' has value '0' it is tied (the same goes for 'a2'). In the else branch I see: if a1: n1 = n1 + 1 if a2: n2 = n2 + 1 So, here the addition takes places on the variables (n1, n2) if there is NO tie, instead of if there is a tie. Hence, this explains the different outcome. Translating this back to the formula gives me T = U = 0, which would yield: t = (2 - 0) / SQRT((2 + 0 + 0)*(2 + 0 + 0)) t = 2 / SQRT(2*2) t = 2 / 2 t = 1.0 Which is indeed consistent with the SciPy outcome. Henceforth, I believe the solution to this is to correct the condition in the if statements in the Kendall's tau function: if not a1: n1 = n1 + 1 if not a2: n2 = n2 + 1 Closing ------- Of course, my interpretation of Kendall's Tau could be wrong. Since I can not exclude that possibility I would appreciate it if one of you could check and see if you reach the same conclusion. Maybe the base formula that SciPy uses is different. I have compared your implementation also to that implemented in the R project, however their source code suggests that they do not adjust for ties (effectively implementing Kendall's tau-a). -- With kind regards, Almer S. Tigelaar University of Twente From xavier.gnata at gmail.com Tue Mar 17 16:21:46 2009 From: xavier.gnata at gmail.com (Xavier Gnata) Date: Tue, 17 Mar 2009 21:21:46 +0100 Subject: [SciPy-dev] Status on ubuntu jaunty 64bits. 
Message-ID: <49C0065A.6010509@gmail.com> Hi, Here it is: ====================================================================== ERROR: test_implicit (test_odr.TestODR) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib/python2.6/dist-packages/scipy/odr/tests/test_odr.py", line 88, in test_implicit out = implicit_odr.run() File "/usr/local/lib/python2.6/dist-packages/scipy/odr/odrpack.py", line 1055, in run self.output = Output(apply(odr, args, kwds)) TypeError: y must be a sequence or integer (if model is implicit) ====================================================================== FAIL: test_random_real (test_basic.TestSingleIFFT) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib/python2.6/dist-packages/scipy/fftpack/tests/test_basic.py", line 206, in test_random_real assert_array_almost_equal (y2, x) File "/usr/lib/python2.6/dist-packages/numpy/testing/utils.py", line 311, in assert_array_almost_equal header='Arrays are not almost equal') File "/usr/lib/python2.6/dist-packages/numpy/testing/utils.py", line 296, in assert_array_compare raise AssertionError(msg) AssertionError: Arrays are not almost equal (mismatch 0.900900900901%) x: array([ 0.89560729 -4.65661287e-09j, 0.87991965 +7.21774995e-09j, 0.44631395 -2.04890966e-08j, 0.71974921 +4.46598869e-11j, 0.20776373 +1.41855736e-08j, 0.83089650 -1.69798398e-09j,... y: array([ 0.89560729, 0.87991947, 0.44631401, 0.71974903, 0.20776364, 0.83089662, 0.86079419, 0.93193549, 0.20852582, 0.51215041, 0.91066802, 0.99397069, 0.74227983, 0.67712617, 0.244197 ,... ====================================================================== FAIL: test_iv_cephes_vs_amos (test_basic.TestBessel) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib/python2.6/dist-packages/scipy/special/tests/test_basic.py", line 1653, in test_iv_cephes_vs_amos self.check_cephes_vs_amos(iv, iv, rtol=1e-8, atol=1e-305) File "/usr/local/lib/python2.6/dist-packages/scipy/special/tests/test_basic.py", line 1642, in check_cephes_vs_amos assert_tol_equal(c1, c2, err_msg=(v, z), rtol=rtol, atol=atol) File "/usr/local/lib/python2.6/dist-packages/scipy/special/tests/test_basic.py", line 38, in assert_tol_equal verbose=verbose, header=header) File "/usr/lib/python2.6/dist-packages/numpy/testing/utils.py", line 296, in assert_array_compare raise AssertionError(msg) AssertionError: Not equal to tolerance rtol=1e-08, atol=1e-305 (-120, -11) (mismatch 100.0%) x: array(1.3384173609003782e-110) y: array((1.3384173859242368e-110+0j)) ====================================================================== FAIL: test_yn_zeros (test_basic.TestBessel) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib/python2.6/dist-packages/scipy/special/tests/test_basic.py", line 1598, in test_yn_zeros 488.98055964441374646], rtol=1e-19) File "/usr/local/lib/python2.6/dist-packages/scipy/special/tests/test_basic.py", line 38, in assert_tol_equal verbose=verbose, header=header) File "/usr/lib/python2.6/dist-packages/numpy/testing/utils.py", line 296, in assert_array_compare raise AssertionError(msg) AssertionError: Not equal to tolerance rtol=1e-19, atol=0 (mismatch 100.0%) x: array([ 450.136, 463.057, 472.807, 481.274, 488.981]) y: array([ 450.136, 463.057, 472.807, 481.274, 488.981]) 
====================================================================== FAIL: test_ynp_zeros (test_basic.TestBessel) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib/python2.6/dist-packages/scipy/special/tests/test_basic.py", line 1604, in test_ynp_zeros assert_tol_equal(yvp(443, ao), 0, atol=1e-15) File "/usr/local/lib/python2.6/dist-packages/scipy/special/tests/test_basic.py", line 38, in assert_tol_equal verbose=verbose, header=header) File "/usr/lib/python2.6/dist-packages/numpy/testing/utils.py", line 296, in assert_array_compare raise AssertionError(msg) AssertionError: Not equal to tolerance rtol=1e-07, atol=1e-15 (mismatch 100.0%) x: array([ 1.239e-10, -8.119e-16, 3.608e-16, 5.898e-16, 1.226e-15]) y: array(0) ====================================================================== FAIL: test_yv_cephes_vs_amos (test_basic.TestBessel) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib/python2.6/dist-packages/scipy/special/tests/test_basic.py", line 1650, in test_yv_cephes_vs_amos self.check_cephes_vs_amos(yv, yn, rtol=1e-11, atol=1e-305) File "/usr/local/lib/python2.6/dist-packages/scipy/special/tests/test_basic.py", line 1640, in check_cephes_vs_amos assert c2.imag != 0, (v, z) AssertionError: (301, 1.0) ====================================================================== FAIL: test_pbdv (test_basic.TestCephes) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/lib/python2.6/dist-packages/scipy/special/tests/test_basic.py", line 370, in test_pbdv assert_equal(cephes.pbdv(1,0),(0.0,0.0)) File "/usr/lib/python2.6/dist-packages/numpy/testing/utils.py", line 176, in assert_equal assert_equal(actual[k], desired[k], 'item=%r\n%s' % (k,err_msg), verbose) File "/usr/lib/python2.6/dist-packages/numpy/testing/utils.py", line 183, in assert_equal raise AssertionError(msg) AssertionError: Items are not equal: item=1 ACTUAL: 1.0 DESIRED: 0.0 ---------------------------------------------------------------------- The last error on pbdv is a quite old one. Can someone reproduce this error? http://projects.scipy.org/scipy/ticket/803 According to mathematica (and my understanding of pbdv....) (0.0, 1.0) is the correct answer. Xavier From nwagner at iam.uni-stuttgart.de Tue Mar 17 16:26:01 2009 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Tue, 17 Mar 2009 21:26:01 +0100 Subject: [SciPy-dev] Status on ubuntu jaunty 64bits. 
In-Reply-To: <49C0065A.6010509@gmail.com> References: <49C0065A.6010509@gmail.com> Message-ID: On Tue, 17 Mar 2009 21:21:46 +0100 Xavier Gnata wrote: > Hi, > > Here it is: > > ====================================================================== > > > ERROR: test_implicit > (test_odr.TestODR) > > ---------------------------------------------------------------------- > > > Traceback (most recent call > last): > > File > "/usr/local/lib/python2.6/dist-packages/scipy/odr/tests/test_odr.py", > line 88, in > test_implicit > > > > out = > implicit_odr.run() > > > File >"/usr/local/lib/python2.6/dist-packages/scipy/odr/odrpack.py", > line 1055, in run > self.output = Output(apply(odr, args, > kwds)) > TypeError: y must be a sequence or integer (if model is > implicit) > > ====================================================================== >FAIL: test_random_real (test_basic.TestSingleIFFT) > > ---------------------------------------------------------------------- > Traceback (most recent call last): > > File > "/usr/local/lib/python2.6/dist-packages/scipy/fftpack/tests/test_basic.py", > line 206, in > test_random_real > > > > assert_array_almost_equal (y2, > x) > File >"/usr/lib/python2.6/dist-packages/numpy/testing/utils.py", >line > 311, in > assert_array_almost_equal > > > > header='Arrays are not almost > equal') > File >"/usr/lib/python2.6/dist-packages/numpy/testing/utils.py", >line > 296, in > assert_array_compare > > > > raise > AssertionError(msg) > > > AssertionError: > > > > Arrays are not almost > equal > > > > (mismatch 0.900900900901%) > x: array([ 0.89560729 -4.65661287e-09j, 0.87991965 >+7.21774995e-09j, > 0.44631395 -2.04890966e-08j, 0.71974921 >+4.46598869e-11j, > 0.20776373 +1.41855736e-08j, 0.83089650 >-1.69798398e-09j,... > y: array([ 0.89560729, 0.87991947, 0.44631401, > 0.71974903, 0.20776364, > 0.83089662, 0.86079419, 0.93193549, > 0.20852582, 0.51215041, > 0.91066802, 0.99397069, 0.74227983, > 0.67712617, 0.244197 ,... 
> > ====================================================================== >FAIL: test_iv_cephes_vs_amos (test_basic.TestBessel) > > ---------------------------------------------------------------------- > Traceback (most recent call last): > > File > "/usr/local/lib/python2.6/dist-packages/scipy/special/tests/test_basic.py", > line 1653, in > test_iv_cephes_vs_amos > > > self.check_cephes_vs_amos(iv, iv, rtol=1e-8, > atol=1e-305) > File > "/usr/local/lib/python2.6/dist-packages/scipy/special/tests/test_basic.py", > line 1642, in > check_cephes_vs_amos > > > assert_tol_equal(c1, c2, err_msg=(v, z), rtol=rtol, > atol=atol) > File > "/usr/local/lib/python2.6/dist-packages/scipy/special/tests/test_basic.py", > line 38, in > assert_tol_equal > > > > verbose=verbose, > header=header) > > File >"/usr/lib/python2.6/dist-packages/numpy/testing/utils.py", >line > 296, in > assert_array_compare > > > > raise > AssertionError(msg) > > > AssertionError: > > > > Not equal to tolerance rtol=1e-08, > atol=1e-305 > (-120, > -11) > > > > (mismatch > 100.0%) > > > > x: > array(1.3384173609003782e-110) > > > y: > array((1.3384173859242368e-110+0j)) > > > > ====================================================================== >FAIL: test_yn_zeros (test_basic.TestBessel) > > ---------------------------------------------------------------------- > Traceback (most recent call last): > > File > "/usr/local/lib/python2.6/dist-packages/scipy/special/tests/test_basic.py", > line 1598, in > test_yn_zeros > > > > 488.98055964441374646], > rtol=1e-19) > > File > "/usr/local/lib/python2.6/dist-packages/scipy/special/tests/test_basic.py", > line 38, in > assert_tol_equal > > > > verbose=verbose, > header=header) > > File >"/usr/lib/python2.6/dist-packages/numpy/testing/utils.py", >line > 296, in > assert_array_compare > > > > raise > AssertionError(msg) > > > AssertionError: > > > > Not equal to tolerance rtol=1e-19, > atol=0 > > (mismatch 100.0%) > x: array([ 450.136, 463.057, 472.807, 481.274, > 488.981]) > y: array([ 450.136, 463.057, 472.807, 481.274, > 488.981]) > > ====================================================================== >FAIL: test_ynp_zeros (test_basic.TestBessel) > > ---------------------------------------------------------------------- > Traceback (most recent call last): > > File > "/usr/local/lib/python2.6/dist-packages/scipy/special/tests/test_basic.py", > line 1604, in > test_ynp_zeros > > > > assert_tol_equal(yvp(443, ao), 0, > atol=1e-15) > File > "/usr/local/lib/python2.6/dist-packages/scipy/special/tests/test_basic.py", > line 38, in > assert_tol_equal > > > > verbose=verbose, > header=header) > > File >"/usr/lib/python2.6/dist-packages/numpy/testing/utils.py", >line > 296, in > assert_array_compare > > > > raise > AssertionError(msg) > > > AssertionError: > > > > Not equal to tolerance rtol=1e-07, > atol=1e-15 > > (mismatch 100.0%) > x: array([ 1.239e-10, -8.119e-16, 3.608e-16, > 5.898e-16, 1.226e-15]) > y: array(0) > > > > ====================================================================== >FAIL: test_yv_cephes_vs_amos (test_basic.TestBessel) > > ---------------------------------------------------------------------- > Traceback (most recent call last): > > File > "/usr/local/lib/python2.6/dist-packages/scipy/special/tests/test_basic.py", > line 1650, in > test_yv_cephes_vs_amos > > > self.check_cephes_vs_amos(yv, yn, rtol=1e-11, > atol=1e-305) > File > "/usr/local/lib/python2.6/dist-packages/scipy/special/tests/test_basic.py", > line 1640, in > check_cephes_vs_amos > > > assert 
c2.imag != 0, (v, > z) > > AssertionError: (301, > 1.0) > > > > ====================================================================== >FAIL: test_pbdv (test_basic.TestCephes) > > ---------------------------------------------------------------------- > Traceback (most recent call last): > > File > "/usr/local/lib/python2.6/dist-packages/scipy/special/tests/test_basic.py", > line 370, in test_pbdv > assert_equal(cephes.pbdv(1,0),(0.0,0.0)) > File >"/usr/lib/python2.6/dist-packages/numpy/testing/utils.py", >line > 176, in assert_equal > assert_equal(actual[k], desired[k], 'item=%r\n%s' % >(k,err_msg), > verbose) > File >"/usr/lib/python2.6/dist-packages/numpy/testing/utils.py", >line > 183, in assert_equal > raise AssertionError(msg) > AssertionError: > Items are not equal: > item=1 > > ACTUAL: 1.0 > DESIRED: 0.0 > > ---------------------------------------------------------------------- > > The last error on pbdv is a quite old one. > Can someone reproduce this error? Yes I can. ====================================================================== FAIL: test_pbdv (test_basic.TestCephes) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/nwagner/local/lib64/python2.6/site-packages/scipy/special/tests/test_basic.py", line 370, in test_pbdv assert_equal(cephes.pbdv(1,0),(0.0,0.0)) File "/home/nwagner/local/lib64/python2.6/site-packages/numpy/testing/utils.py", line 183, in assert_equal assert_equal(actual[k], desired[k], 'item=%r\n%s' % (k,err_msg), verbose) File "/home/nwagner/local/lib64/python2.6/site-packages/numpy/testing/utils.py", line 190, in assert_equal raise AssertionError(msg) AssertionError: Items are not equal: item=1 ACTUAL: 1.0 DESIRED: 0.0 ---------------------------------------------------------------------- Ran 3542 tests in 80.817s FAILED (KNOWNFAIL=2, SKIP=17, errors=1, failures=6) Cheers, Nils From josef.pktd at gmail.com Tue Mar 17 16:41:39 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 17 Mar 2009 16:41:39 -0400 Subject: [SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau) In-Reply-To: <1237316749.6984.13.camel@rufio-laptop> References: <1237316749.6984.13.camel@rufio-laptop> Message-ID: <1cd32cbb0903171341x29d9a029u53b354fcba0fb400@mail.gmail.com> On Tue, Mar 17, 2009 at 3:05 PM, Almer S. Tigelaar wrote: > Hello, > > (I realize this mail is a bit lengthy, but I would appreciate it if someone could comment on it). > > I believe that I found a bug in your implementation of Kendall's Tau. I > have evaluated the implementation (to verify a self-written > implementation). When the results turned out to be different I > investigated the current SciPy implementation at the following URL: > http://svn.scipy.org/svn/scipy/trunk/scipy/stats/stats.py > > (I am aware of the fact that there is also a Kendall's Tau implementation in mstats.py, but > ?have not evaluated that implementation yet). > > I will give some explanation of my interpretation of Kendall's tau, an > example showing the differences between SciPy's and my implementation and a > possible fix for SciPy's implementation. > > Your implementation is Kendall's tau-b with tie correction (same as > mine). 
I take as my reference definition, the one in the following > 'poster' paper: > http://portal.acm.org/citation.cfm?id=1277935 > (this same definition appears in other places as well, this is the > shortest resource I could find) > > Recall that Kendall's tau calculates a score t given two rankings R1 and > R2. Variables P, Q and T are all characteristics of the pairs in those > rankings. > > The definition given in the reference is: > ? ? ? ?t = (P - Q) / SQRT((P + Q + T) * (P + Q + U)) > where P is the number of concordant pairs, Q the number of discordant > pairs, T the number ties in R1 and U the number of ties in R2. > > An example: > ----------- > Let's use two identical rankings with a tie: > ? ? ? ? ? ? ?A ?B ?C > ? ? ? ?R1 = [1, 1, 2] > ? ? ? ?R2 = [1, 1, 2] > > There are three pair combinations in these lists, namely: (A, B), (A, C) > and (B, C). It is obvious that _one_ of these combinations has a tie for > both lists (the (A,B) combination which is (1,1) for both R1 and R2). > So, since there is one tie in both list we have T = U = 1 > > We find that there are two concordant pairs in both lists (A, C) and > (B,C) so P = 2. There are no discordant pairs, so Q = 0. With all > variables given, we can now calculate Kendall's tau for R1 and R2: > > ? ? ? ?t = (2 - 0) / SQRT((2 + 0 + 1)*(2 + 0 + 1)) > ? ? ? ?t = 2 / SQRT(3*3) > ? ? ? ?t = 2 / 3 > ? ? ? ?t = 0.6666666 > > However, using scipy (svn HEAD) as follows: > > ? ? ? ?import scipy.stats.stats as s > ? ? ? ?s.kendalltau([1,1,2], [1,1,2]) > > Yields t = 1.0: > > ? ? ? ?(1.0, 0.11718509694604401) > > Which I believe is wrong (or at least: has no correction for ties, as is > claimed in the source code). If there are three combinations and one of > these is a tie, and the other two combinations are concordant, it makes > sense that Kendall's tau-b should yield 2 / 3. > > The cause and fix > ----------------- > Playing around with SciPy's code (and comparing it with my own) I believe I > discovered a probable cause for this difference in SciPy's code. Again, I used the > implementation at the following URL: > http://svn.scipy.org/svn/scipy/trunk/scipy/stats/stats.py > (please take look at the implementation first, otherwise you will not > understand my explanation) > > In the 'kendalltau(x,y)' function we see a test for ties and an 'else' > branch. In the 'else' branch the values of 'n1' and 'n2' are incremented > if there is a tie (conforming to +T and +U in the formula given above). > However, I believe that the 'if' conditions here are wrong: > 1) Consider that if 'a1' has value '0' it is tied (the same goes for > 'a2'). In the else branch I see: > > ? ? ? ?if a1: > ? ? ? ? ? ? ? ?n1 = n1 + 1 > ? ? ? ?if a2: > ? ? ? ? ? ? ? ?n2 = n2 + 1 > > So, here the addition takes places on the variables (n1, n2) if there is > NO tie, instead of if there is a tie. Hence, this explains the different > outcome. Translating this back to the formula gives me T = U = 0, which > would yield: > > ? ? ? ?t = (2 - 0) / SQRT((2 + 0 + 0)*(2 + 0 + 0)) > ? ? ? ?t = 2 / SQRT(2*2) > ? ? ? ?t = 2 / 2 > ? ? ? ?t = 1.0 > > Which is indeed consistent with the SciPy outcome. Henceforth, I believe > the solution to this is to correct the condition in the if statements in > the Kendall's tau function: > > ? ? ? ?if not a1: > ? ? ? ? ? ? ? ?n1 = n1 + 1 > ? ? ? ?if not a2: > ? ? ? ? ? ? ? ?n2 = n2 + 1 Yes, comparing this part of the function with your reference, your change is correct. 
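To make the two possible counting conventions concrete, here is a small self-contained pair-counting sketch (illustrative code only, not the stats.py implementation; the flag name is made up):

import math

def kendall_tau_b(x, y, ignore_joint_ties=True):
    # count concordant pairs, discordant pairs, and pairs tied in one
    # variable only; a pair tied in both x and y is either skipped
    # entirely (Numerical Recipes style) or counted as a tie in both
    P = Q = tx = ty = 0
    n = len(x)
    for i in range(n - 1):
        for j in range(i + 1, n):
            a1 = x[i] - x[j]
            a2 = y[i] - y[j]
            if a1 * a2 > 0:
                P += 1
            elif a1 * a2 < 0:
                Q += 1
            elif a1 == 0 and a2 == 0:
                if not ignore_joint_ties:
                    tx += 1
                    ty += 1
            elif a1 == 0:
                tx += 1
            else:
                ty += 1
    return (P - Q) / math.sqrt((P + Q + tx) * (P + Q + ty))

For the example in this thread, P = 2, Q = 0 and one pair is tied in both rankings, so kendall_tau_b([1, 1, 2], [1, 1, 2]) gives 1.0, while kendall_tau_b([1, 1, 2], [1, 1, 2], False) gives 0.666..., which is exactly the 2/3 versus 1.0 disagreement above.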
I prefer an explicit comparison with 0, because it's much easier to read than thinking about the truth value of a number if a1 == 0: n1 = n1 + 1 if a2 == 0: n2 = n2 + 1 This is already the third problem with tie handling, so I am not surprised. Note, however that this only creates the incorrect count if both vectors have a tie for exactly the same pair. If only one of the two variables has a tie, then it increases n of the other variable, either n1 if a2 ==0, or n2 if a1 ==0. So, since only matching ties are not counted in the calculation of kendalls tau, this might be up to interpretation whether two identical vectors should have correlation 1 or something smaller, reduced by matching tie correction. > > Closing > ------- > Of course, my interpretation of Kendall's Tau could be wrong. Since I > can not exclude that possibility I would appreciate it if one of you could > check and see if you reach the same conclusion. Maybe the base formula that > SciPy uses is different. > > I have compared your implementation also to that implemented in the R > project, however their source code suggests that they do not adjust for > ties (effectively implementing Kendall's tau-a). > > -- > With kind regards, > > Almer S. Tigelaar > University of Twente > Thank you for checking and reporting this. I still need to look at a it bit more closely, and find some correct tests, but I will change it withing a few days. The problem, I had with Kendalls tau was that I didn't find a good, non-ambiguous reference, also with the hints to different versions of kendalls tau, it wasn't clear to me what exactly is implemented or how the different versions are defined. Do you also know a reference for the variance that is used in calculating the p-value, so we can also verify it? I'm a bit surprised that R doesn't do the tie handling. Josef From sturla at molden.no Tue Mar 17 17:09:55 2009 From: sturla at molden.no (Sturla Molden) Date: Tue, 17 Mar 2009 22:09:55 +0100 (CET) Subject: [SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau) In-Reply-To: <1237316749.6984.13.camel@rufio-laptop> References: <1237316749.6984.13.camel@rufio-laptop> Message-ID: <39162f05a73e9a6fb9cf829b491857e7.squirrel@webmail.uio.no> > Let's use two identical rankings with a tie: > A B C > R1 = [1, 1, 2] > R2 = [1, 1, 2] Minitab says Kendall's tau is 0.67 in this case. When looking at page 752 in Numerical Receipes, 3rd edition: tau = (c - d)/(sqrt(c+d+ey)*sqrt(c+d+ex)) c = #concordant pairs d = #disconcordant pairs ey = #pairs with tied rank in y but not in x ex = #pairs with tied tank in x but not in y Here the pairs are: (1,1) vs. (1,1) -> tie in x and tie in y (1,1) vs. (2,2) -> concordant pair (1,1) vs. (2,2) -> concordant pair tau = (2 - 0) / (sqrt(2+0+0)*sqrt(2+0+0)) = 1 So from NR we are forced to conclude that tau is 1 in this case. Sturla Molden c = 2 d = 0 ex = 1 ey = 1 tau = 2/(sqrt(3)*sqrt(3)) = 2/3 = 0.666667 Which by the way is just what Minitab says. Sturla Molden > There are three pair combinations in these lists, namely: (A, B), (A, C) > and (B, C). It is obvious that _one_ of these combinations has a tie for > both lists (the (A,B) combination which is (1,1) for both R1 and R2). > So, since there is one tie in both list we have T = U = 1 > > We find that there are two concordant pairs in both lists (A, C) and > (B,C) so P = 2. There are no discordant pairs, so Q = 0. 
With all > variables given, we can now calculate Kendall's tau for R1 and R2: > > t = (2 - 0) / SQRT((2 + 0 + 1)*(2 + 0 + 1)) > t = 2 / SQRT(3*3) > t = 2 / 3 > t = 0.6666666 > > However, using scipy (svn HEAD) as follows: > > import scipy.stats.stats as s > s.kendalltau([1,1,2], [1,1,2]) > > Yields t = 1.0: > > (1.0, 0.11718509694604401) > > Which I believe is wrong (or at least: has no correction for ties, as is > claimed in the source code). If there are three combinations and one of > these is a tie, and the other two combinations are concordant, it makes > sense that Kendall's tau-b should yield 2 / 3. > > The cause and fix > ----------------- > Playing around with SciPy's code (and comparing it with my own) I believe > I > discovered a probable cause for this difference in SciPy's code. Again, I > used the > implementation at the following URL: > http://svn.scipy.org/svn/scipy/trunk/scipy/stats/stats.py > (please take look at the implementation first, otherwise you will not > understand my explanation) > > In the 'kendalltau(x,y)' function we see a test for ties and an 'else' > branch. In the 'else' branch the values of 'n1' and 'n2' are incremented > if there is a tie (conforming to +T and +U in the formula given above). > However, I believe that the 'if' conditions here are wrong: > 1) Consider that if 'a1' has value '0' it is tied (the same goes for > 'a2'). In the else branch I see: > > if a1: > n1 = n1 + 1 > if a2: > n2 = n2 + 1 > > So, here the addition takes places on the variables (n1, n2) if there is > NO tie, instead of if there is a tie. Hence, this explains the different > outcome. Translating this back to the formula gives me T = U = 0, which > would yield: > > t = (2 - 0) / SQRT((2 + 0 + 0)*(2 + 0 + 0)) > t = 2 / SQRT(2*2) > t = 2 / 2 > t = 1.0 > > Which is indeed consistent with the SciPy outcome. Henceforth, I believe > the solution to this is to correct the condition in the if statements in > the Kendall's tau function: > > if not a1: > n1 = n1 + 1 > if not a2: > n2 = n2 + 1 > > Closing > ------- > Of course, my interpretation of Kendall's Tau could be wrong. Since I > can not exclude that possibility I would appreciate it if one of you could > check and see if you reach the same conclusion. Maybe the base formula > that > SciPy uses is different. > > I have compared your implementation also to that implemented in the R > project, however their source code suggests that they do not adjust for > ties (effectively implementing Kendall's tau-a). > > -- > With kind regards, > > Almer S. Tigelaar > University of Twente > > _______________________________________________ > Scipy-dev mailing list > Scipy-dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From sturla at molden.no Tue Mar 17 17:16:27 2009 From: sturla at molden.no (Sturla Molden) Date: Tue, 17 Mar 2009 22:16:27 +0100 (CET) Subject: [SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau) In-Reply-To: <1237316749.6984.13.camel@rufio-laptop> References: <1237316749.6984.13.camel@rufio-laptop> Message-ID: <8f999ebe2068b2c62ecf068df813b3af.squirrel@webmail.uio.no> > Hello, > The definition given in the reference is: > t = (P - Q) / SQRT((P + Q + T) * (P + Q + U)) Correct. > There are three pair combinations in these lists, namely: (A, B), (A, C) > and (B, C). It is obvious that _one_ of these combinations has a tie for > both lists (the (A,B) combination which is (1,1) for both R1 and R2). > So, since there is one tie in both list we have T = U = 1 Wrong. 
Since the tie is in both x and y, this pair should be ignored. T = U = 0. Sturla Molden From sturla at molden.no Tue Mar 17 17:36:24 2009 From: sturla at molden.no (Sturla Molden) Date: Tue, 17 Mar 2009 22:36:24 +0100 (CET) Subject: [SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau) In-Reply-To: <1237316749.6984.13.camel@rufio-laptop> References: <1237316749.6984.13.camel@rufio-laptop> Message-ID: <18f8e379820d4affae266687ea793cb9.squirrel@webmail.uio.no> What the heck... Please ignore the crap below my first signature. Sturla Molden From josef.pktd at gmail.com Tue Mar 17 18:10:30 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 17 Mar 2009 18:10:30 -0400 Subject: [SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau) In-Reply-To: <18f8e379820d4affae266687ea793cb9.squirrel@webmail.uio.no> References: <1237316749.6984.13.camel@rufio-laptop> <18f8e379820d4affae266687ea793cb9.squirrel@webmail.uio.no> Message-ID: <1cd32cbb0903171510o4f614c25i7dfeaa7d219a78a3@mail.gmail.com> Hollander, M., and D. A. Wolfe. 1999. Nonparametric statistical methods is supposed to have a discussion on tie handling for kendall's tau, but I don't have access to it. Searching some references again, I still get only ambiguous answers, whether matching ties should be counted or not. I guess we can stick with the current implementation if it produces the same result as numerical recipes. And I like it better if a vector has correlation 1 with itself. But I found a verification for the calculation of the variance and the pvalue. Josef From xavier.gnata at gmail.com Tue Mar 17 19:03:33 2009 From: xavier.gnata at gmail.com (Xavier Gnata) Date: Wed, 18 Mar 2009 00:03:33 +0100 Subject: [SciPy-dev] Status on ubuntu jaunty 64bits.
In-Reply-To: References: <49C0065A.6010509@gmail.com> Message-ID: <49C02C45.9030403@gmail.com> Nils Wagner wrote: > On Tue, 17 Mar 2009 21:21:46 +0100 > Xavier Gnata wrote: >> Hi, >> >> Here it is: >> [...] >> The last error on pbdv is a quite old one. >> Can someone reproduce this error? > > Yes I can. > > [...] > FAIL: test_pbdv (test_basic.TestCephes) > AssertionError: Items are not equal: item=1 > ACTUAL: 1.0 > DESIRED: 0.0 > Ran 3542 tests in 80.817s > FAILED (KNOWNFAIL=2, SKIP=17, errors=1, failures=6) > > Cheers, > Nils > _______________________________________________ Ok. According to mathematica: ParabolicCylinderD[1., 0]=0 : Same result with scipy. Ok. ParabolicCylinderD[2., 0]=-1 : Same result with scipy. Ok. D[ParabolicCylinderD[a, x], x]=1/2 x ParabolicCylinderD[a, x] - ParabolicCylinderD[1 + a, x] As a result, with a=1 and x=0, we have: 1/2 * 0 * 0 - (-1) = 1 It means that: assert_equal(cephes.pbdv(1,0),(0.0,0.0)) is wrong. It should be : assert_equal(cephes.pbdv(1,0),(0.0,1.0)) It also looks like there is anyhow another problem: I cannot reproduce the value of the derivative for "large" value of a and x: pbdv(.2,.3)=(0.90436167932402323, 0.067568701127784084) : Ok. but pbdv(2,3)=(0.84319379649491466, 1.4228895315851684) : according to mathematica, it should be (0.84319379649491466,-0.632395) Cheers, Xavier From sturla at molden.no Tue Mar 17 19:09:44 2009 From: sturla at molden.no (Sturla Molden) Date: Wed, 18 Mar 2009 00:09:44 +0100 (CET) Subject: [SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau) In-Reply-To: <1cd32cbb0903171510o4f614c25i7dfeaa7d219a78a3@mail.gmail.com> References: <1237316749.6984.13.camel@rufio-laptop> <18f8e379820d4affae266687ea793cb9.squirrel@webmail.uio.no> <1cd32cbb0903171510o4f614c25i7dfeaa7d219a78a3@mail.gmail.com> Message-ID:
Wolfe. 1999. Nonparametric statistical methods > is supposed to have a discussion on tie handling for kendall's tau, > but I don't have access to it. I have this book in my book shelf at work. Sturla Molden From josef.pktd at gmail.com Tue Mar 17 19:11:41 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 17 Mar 2009 19:11:41 -0400 Subject: [SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau) In-Reply-To: <1cd32cbb0903171510o4f614c25i7dfeaa7d219a78a3@mail.gmail.com> References: <1237316749.6984.13.camel@rufio-laptop> <18f8e379820d4affae266687ea793cb9.squirrel@webmail.uio.no> <1cd32cbb0903171510o4f614c25i7dfeaa7d219a78a3@mail.gmail.com> Message-ID: <1cd32cbb0903171611t38c9bd59hf83092c69ffcc4f5@mail.gmail.com> On Tue, Mar 17, 2009 at 6:10 PM, wrote: > Hollander, M., and D. A. Wolfe. 1999. Nonparametric statistical methods > is supposed to have a discussion on tie handling for kendall's tau, > but I don't have access to it. > > Searching some references again, I still get only ambiguous answers, > whether matching ties should be counted or not. > > I guess we can stick with the current implementation if it produces > the same result as numerical recipes. > And I like it better if a vector has correlation 1 with itself. > > But I found a verification for the calculation of the variance and the pvalue. > > Josef > I saw it mentioned somewhere, that Kendall's tau is the correlation coefficient of pairwise ranking indicators. I tried to see if I can get this, and the version below exactly replicates the current implementation for the test examples. So similar to spearman and the other correlation statistics, we just need to construct the right transformation to get a nice correlation interpretation back. I think, this wouldn't hold if we don't exclude matching ties in the counts for the denominator as is done with the current implementation. 
Josef

import numpy as np
from scipy import stats
from numpy.testing import assert_equal

def kendalltaustat(x, y):
    '''calculate Kendall's tau-b correlation statistic

    this is just the (non-central) correlation of all pairwise rankings
    '''
    # calculate indicators of all pairs
    ppos1 = np.sign((x[:,np.newaxis] - x)).astype(float).ravel()
    ppos2 = np.sign((y[:,np.newaxis] - y)).astype(float).ravel()
    #correlation coefficient without mean correction
    tau = np.dot(ppos1,ppos2) / np.sqrt(np.dot(ppos1,ppos1) * np.dot(ppos2,ppos2))
    return tau

x1a = np.array([0, 1, 3, 3, 4, 5, 5, 7, 8])
x1b = np.array([1, 3, 3, 4, 5, 5, 7, 8, 9])
x1c = np.array([1, 3, 3, 3, 5, 5, 7, 8, 9])
x1 = np.array([1,1,2])
x2a = np.array([1,1,2,2])
x2b = np.array([1,2,3,4])

data = [(x1a,x1a), (x1a,x1b), (x1a,x1b), (x1,x1), (x2a,x2b), (x2b,x2b),
        (x2a,3-x2a), (x2b,3-x2b)]

for x,y in data:
    t1 = kendalltaustat(x,y)
    ts, ps = stats.kendalltau(x,y)
    print t1, (ts, ps)
    assert_equal(t1,ts)

for i in range(10):
    x = np.random.randn(20)
    y = np.random.randn(20)
    t1 = kendalltaustat(x,y)
    ts, ps = stats.kendalltau(x,y)
    print t1, (ts, ps)
    assert_equal(t1,ts)

From josef.pktd at gmail.com Tue Mar 17 19:32:53 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 17 Mar 2009 19:32:53 -0400 Subject: [SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau) In-Reply-To: References: <1237316749.6984.13.camel@rufio-laptop> <18f8e379820d4affae266687ea793cb9.squirrel@webmail.uio.no> <1cd32cbb0903171510o4f614c25i7dfeaa7d219a78a3@mail.gmail.com> Message-ID: <1cd32cbb0903171632j46a91640i911c8ec8da3bc039@mail.gmail.com> On Tue, Mar 17, 2009 at 7:09 PM, Sturla Molden wrote: >> Hollander, M., and D. A. Wolfe. 1999. Nonparametric statistical methods >> is supposed to have a discussion on tie handling for kendall's tau, >> but I don't have access to it. > > I have this book in my book shelf at work. > > Sturla Molden > If you can verify that we don't include matching ties in the denominator, then we could settle the issue and mark kendalltau as verified. Thanks, Josef From sturla at molden.no Tue Mar 17 19:37:37 2009 From: sturla at molden.no (Sturla Molden) Date: Wed, 18 Mar 2009 00:37:37 +0100 (CET) Subject: [SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau) In-Reply-To: <1cd32cbb0903171611t38c9bd59hf83092c69ffcc4f5@mail.gmail.com> References: <1237316749.6984.13.camel@rufio-laptop> <18f8e379820d4affae266687ea793cb9.squirrel@webmail.uio.no> <1cd32cbb0903171510o4f614c25i7dfeaa7d219a78a3@mail.gmail.com> <1cd32cbb0903171611t38c9bd59hf83092c69ffcc4f5@mail.gmail.com> Message-ID: <32800b5f0562df92bbbb300b1fdf17ef.squirrel@webmail.uio.no> > On Tue, Mar 17, 2009 at 6:10 PM, wrote: > I saw it mentioned somewhere, that Kendall's tau is the correlation > coefficient of pairwise ranking indicators. By definition, given a joint probability distribution f(x,y), tau = 2 * Probability{ (x1-x2)*(y1-y2) > 0 } - 1 where (x1,y1) and (x2,y2) are i.i.d. f(x,y). Sturla Molden From almer at gnome.org Wed Mar 18 05:12:42 2009 From: almer at gnome.org (Almer S.
Tigelaar) Date: Wed, 18 Mar 2009 10:12:42 +0100 Subject: [SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau) In-Reply-To: <1cd32cbb0903171341x29d9a029u53b354fcba0fb400@mail.gmail.com> References: <1237316749.6984.13.camel@rufio-laptop> <1cd32cbb0903171341x29d9a029u53b354fcba0fb400@mail.gmail.com> Message-ID: <1237367562.6824.14.camel@ewi1341> Hi Josef, On Tue, 2009-03-17 at 16:41 -0400, josef.pktd at gmail.com wrote: > The problem, I had with Kendalls tau was that I didn't find a good, > non-ambiguous reference, also with the hints to different versions of > kendalls tau, it wasn't clear to me what exactly is implemented or how > the different versions are defined. For clarity (and future reference), there are three versions that I know of (I will give them in full, repeating some text for each definition): Kendall tau-a (with NO handling for ties): ------------------------------------------ t = (P - Q) / (0.5 * n * (n - 1)) where P is the number of concordant pairs, Q the number of discordant pairs and n is the number of items to rank (the denominator is actually equal to the number of pairs). [I have verified this with multiple academic sources and implementations and am quite sure that this definition is correct] Kendall tau-b (tie handling): ----------------------------- t = (P - Q) / SQRT((P + Q + T) * (P + Q + U)) where P is the number of concordant pairs, Q the number of discordant pairs, T the number ties in R1 and U the number of ties in R2. (and were we are still discussing whether T and U should be incremented a tie occurs in the same pair for both). Kendall tau-c (alternative tie handling): ----------------------------------------- (also called Stuart's tau-c or Kendall-Stuart's tau-c) t = (m * (P - Q)) / (n^2 * (m - 1)) where P is the number of concordant pairs, Q the number of discordant pairs, n the number of items and m = min(r,s) where r and s are the number of rows and columns in the data. [Note that there are some incorrect definition of Kendall tau-c floating around which substitute 2m instead of m in the numerator, as this can yield values outside of the (-1, +1) range this is obviously wrong] --- Some sources state that Kendall tau-b is more appropriate for square tables and Kendall tau-c for rectangular ones. However, this is an argument I admittedly do not yet fully grasp. > I'm a bit surprised that R doesn't do the tie handling. I might be wrong, my knowledge of R is not (yet) very thorough, but this is the conclusion I drew from the comments in this file: https://svn.r-project.org/R/trunk/src/library/stats/src/kendall.c With kind regards, Almer S. Tigelaar From almer at gnome.org Wed Mar 18 05:25:02 2009 From: almer at gnome.org (Almer S. Tigelaar) Date: Wed, 18 Mar 2009 10:25:02 +0100 Subject: [SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau) In-Reply-To: <8f999ebe2068b2c62ecf068df813b3af.squirrel@webmail.uio.no> References: <1237316749.6984.13.camel@rufio-laptop> <8f999ebe2068b2c62ecf068df813b3af.squirrel@webmail.uio.no> Message-ID: <1237368302.6824.26.camel@ewi1341> Hi Sturla, Thanks for your reaction(s). On Tue, 2009-03-17 at 22:16 +0100, Sturla Molden wrote: > Wrong. Since the tie in in x and y, this pair should be ignored. T = U = 0. *Ponders* .... well it does intuitively make sense that if the tie occurs in the same pair it would be ignored. Since it does not provide any evidence of the rankings being more dissimilar. Good point. However, do I interpret you correct that minitab disagrees with this? 
--
With kind regards, Almer S. Tigelaar

From almer at gnome.org  Wed Mar 18 05:36:05 2009
From: almer at gnome.org (Almer S. Tigelaar)
Date: Wed, 18 Mar 2009 10:36:05 +0100
Subject: [SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau)
In-Reply-To: <1cd32cbb0903171611t38c9bd59hf83092c69ffcc4f5@mail.gmail.com>
References: <1237316749.6984.13.camel@rufio-laptop>
	<18f8e379820d4affae266687ea793cb9.squirrel@webmail.uio.no>
	<1cd32cbb0903171510o4f614c25i7dfeaa7d219a78a3@mail.gmail.com>
	<1cd32cbb0903171611t38c9bd59hf83092c69ffcc4f5@mail.gmail.com>
Message-ID: <1237368965.6824.37.camel@ewi1341>

Hi Josef,

On Tue, 2009-03-17 at 19:11 -0400, josef.pktd at gmail.com wrote:
> I saw it mentioned somewhere, that Kendall's tau is the correlation
> coefficient of pairwise ranking indicators.
> I think, this wouldn't hold if we don't exclude matching ties in the
> counts for the denominator as is done with the current implementation.

Okay, given this and reading the other posts (from Sturla), my initial
interpretation is probably incorrect. In closing we can agree that the
theoretical definition should really be as follows:

Kendall's tau-b (tie handling):
-------------------------------
t = (P - Q) / SQRT((P + Q + T) * (P + Q + U))
where P is the number of concordant pairs, Q the number of discordant
pairs, T the number of ties only in R1 and U the number of ties only in R2.
If a tie occurs for the same pair in both R1 and R2, it is not added to
either T or U.
-------------------------------

I have not yet been able to find a source myself that unambiguously gives
precisely this same definition. But given Sturla's interpretation, his
access to the books by Hollander & Wolfe and the Numerical Recipes book,
and your confirmation using the correlation coefficient, I am inclined to
accept this definition.

Thanks all for your feedback and help!

From sturla at molden.no  Wed Mar 18 08:11:35 2009
From: sturla at molden.no (Sturla Molden)
Date: Wed, 18 Mar 2009 13:11:35 +0100
Subject: [SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau)
In-Reply-To: 
References: <1237316749.6984.13.camel@rufio-laptop>
	<18f8e379820d4affae266687ea793cb9.squirrel@webmail.uio.no>
	<1cd32cbb0903171510o4f614c25i7dfeaa7d219a78a3@mail.gmail.com>
Message-ID: <49C0E4F7.3020707@molden.no>

On 3/18/2009 12:09 AM, Sturla Molden wrote:
>> Hollander, M., and D. A. Wolfe. 1999. Nonparametric statistical methods
>> is supposed to have a discussion on tie handling for kendall's tau,
>> but I don't have access to it.
>
> I have this book in my book shelf at work.

Here is a completely naïve Kendall's tau based on Hollander & Wolfe's book:

def tau(x, y):
    """
    Kendall's tau according to Hollander, M., and D. A. Wolfe. 1999.
    Nonparametric statistical methods. 2nd edition. New York: Wiley.
    Page 382.
    """
    def Q((a, b), (c, d)):
        if (d - b) * (c - a) > 0: return 1
        if (d - b) * (c - a) == 0: return 0
        if (d - b) * (c - a) < 0: return -1
        raise ValueError, 'this should never occur'
    K = 0
    n = len(x)
    assert(len(x) == len(y))
    for i in range(n - 1):
        for j in range(i + 1, n):
            K += Q((x[i], y[i]), (x[j], y[j]))
    return 2.0 * K / (n * (n - 1))    # Eq 8.34

And with this:

>>> tau([1,1,2],[1,1,2])
0.66666666666666663

So it seems Hollander & Wolfe and Minitab say 0.67, whereas Numerical
Recipes says 1.0. Intuitively a vector should be exactly correlated with
itself, but I am inclined to trust Hollander & Wolfe more than Numerical
Recipes.
For example, if we use this probability definition of tau: tau = P{concordant pair} - P{disconcordant pair} then tau should indeed be 0.67. Best regards, Sturla Molden From almer at gnome.org Wed Mar 18 09:19:38 2009 From: almer at gnome.org (Almer S. Tigelaar) Date: Wed, 18 Mar 2009 14:19:38 +0100 Subject: [SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau) In-Reply-To: <49C0E4F7.3020707@molden.no> References: <1237316749.6984.13.camel@rufio-laptop> <18f8e379820d4affae266687ea793cb9.squirrel@webmail.uio.no> <1cd32cbb0903171510o4f614c25i7dfeaa7d219a78a3@mail.gmail.com> <49C0E4F7.3020707@molden.no> Message-ID: <1237382378.6824.120.camel@ewi1341> Hello, On Wed, 2009-03-18 at 13:11 +0100, Sturla Molden wrote: > So it seems Hollander & Wolfe and Minitab says 0.67, whereas Numerical > Receipes says 1.0. Intuitively a vector correlation should be exactly > correlated with itself, but I am inclined to trust Hollander & Wolfe > more than Numerical Receipes. Ah, I was under the impression you already checked Hollander & Wolfe. Anyway, it seems my initial interpretation was right then. Repeating the formula here (augmented) for future reference: Kendall's tau-b (tie handling): ------------------------------- Given two rankings R1 and R2, Kendall's tau-b is calculated by: t = (P - Q) / SQRT((P + Q + T) * (P + Q + U)) where P is the number of concordant pairs, Q the number of discordant pairs, T the number of ties in R1 and U the number of ties in R2. [Ties are always counted regardless of whether they occur for the same pair in R1 and R2 or different pairs] ------------------------------- Some tests I ran today with the R implementation of Kendall's Tau(-a) and the original implementation in SciPy.stats.stats (Kendall's Tau-b) seem to suggests that if we do NOT count ties on the same pair (the current situation in SciPy.stats.stats) effectively Kendall's Tau-b gives the same outcomes as Kendall's Tau-a for about 36 test cases. This seems to suggest that Kendall's Tau-b (tie correction) in SciPy as it is behaves like Kendall's Tau-a (no tie correction), possibly because of leaving out ties on identical pairs in T and U above. I unfortunately do not have the time to mathematically prove (or disprove) the equivalence of Kendall's Tau-a and the current SciPy implementation right now, but I thought I'd be useful to mention these test results. -- With kind regards, Almer S. Tigelaar University of Twente From bsouthey at gmail.com Wed Mar 18 09:29:17 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 18 Mar 2009 08:29:17 -0500 Subject: [SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau) In-Reply-To: <1237382378.6824.120.camel@ewi1341> References: <1237316749.6984.13.camel@rufio-laptop> <18f8e379820d4affae266687ea793cb9.squirrel@webmail.uio.no> <1cd32cbb0903171510o4f614c25i7dfeaa7d219a78a3@mail.gmail.com> <49C0E4F7.3020707@molden.no> <1237382378.6824.120.camel@ewi1341> Message-ID: <49C0F72D.6050406@gmail.com> Almer S. Tigelaar wrote: > Hello, > > On Wed, 2009-03-18 at 13:11 +0100, Sturla Molden wrote: > >> So it seems Hollander & Wolfe and Minitab says 0.67, whereas Numerical >> Receipes says 1.0. Intuitively a vector correlation should be exactly >> correlated with itself, but I am inclined to trust Hollander & Wolfe >> more than Numerical Receipes. >> > > Ah, I was under the impression you already checked Hollander & Wolfe. > Anyway, it seems my initial interpretation was right then. 
Repeating the > formula here (augmented) for future reference: > > Kendall's tau-b (tie handling): > ------------------------------- > Given two rankings R1 and R2, Kendall's tau-b is calculated by: > t = (P - Q) / SQRT((P + Q + T) * (P + Q + U)) > where P is the number of concordant pairs, Q the number of discordant > pairs, T the number of ties in R1 and U the number of ties in R2. > [Ties are always counted regardless of whether they occur for the same > pair in R1 and R2 or different pairs] > ------------------------------- > > Some tests I ran today with the R implementation of Kendall's Tau(-a) > and the original implementation in SciPy.stats.stats (Kendall's Tau-b) > seem to suggests that if we do NOT count ties on the same pair (the > current situation in SciPy.stats.stats) effectively Kendall's Tau-b > gives the same outcomes as Kendall's Tau-a for about 36 test cases. > > This seems to suggest that Kendall's Tau-b (tie correction) in SciPy as > it is behaves like Kendall's Tau-a (no tie correction), possibly because > of leaving out ties on identical pairs in T and U above. > > I unfortunately do not have the time to mathematically prove (or > disprove) the equivalence of Kendall's Tau-a and the current SciPy > implementation right now, but I thought I'd be useful to mention these > test results. > > Hi, This link might be useful as it has worked examples: http://faculty.chass.ncsu.edu/garson/PA765/assocordinal.htm I find that the SAS documentation for Proc Freq very useful: http://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/freq_index.htm http://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/freq_sect20.htm Also, does these implementation depend on the type of array? It would be great to have a single function that accepts an array, masked array or an object that can be converted into an array. Finally, this measure assumes ordinal data but there is no type checking done in the Scipy function. Bruce From josef.pktd at gmail.com Wed Mar 18 11:12:27 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 18 Mar 2009 11:12:27 -0400 Subject: [SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau) In-Reply-To: <49C0F72D.6050406@gmail.com> References: <1237316749.6984.13.camel@rufio-laptop> <18f8e379820d4affae266687ea793cb9.squirrel@webmail.uio.no> <1cd32cbb0903171510o4f614c25i7dfeaa7d219a78a3@mail.gmail.com> <49C0E4F7.3020707@molden.no> <1237382378.6824.120.camel@ewi1341> <49C0F72D.6050406@gmail.com> Message-ID: <1cd32cbb0903180812q2172c29fr34419e773b423ed6@mail.gmail.com> On Wed, Mar 18, 2009 at 9:29 AM, Bruce Southey wrote: > Almer S. Tigelaar wrote: >> Hello, >> >> On Wed, 2009-03-18 at 13:11 +0100, Sturla Molden wrote: >> >>> So it seems Hollander & Wolfe and Minitab says 0.67, whereas Numerical >>> Receipes says 1.0. Intuitively a vector correlation should be exactly >>> correlated with itself, but I am inclined to trust Hollander & Wolfe >>> more than Numerical Receipes. >>> >> >> Ah, I was under the impression you already checked Hollander & Wolfe. >> Anyway, it seems my initial interpretation was right then. Repeating the >> formula here (augmented) for future reference: >> >> Kendall's tau-b (tie handling): >> ------------------------------- >> Given two rankings R1 and R2, Kendall's tau-b is calculated by: >> ? ? ? ? t = (P - Q) / SQRT((P + Q + T) * (P + Q + U)) >> where P is the number of concordant pairs, Q the number of discordant >> pairs, T the number of ties in R1 and U the number of ties in R2. 
>> [Ties are always counted regardless of whether they occur for the same >> pair in R1 and R2 or different pairs] >> ------------------------------- >> >> Some tests I ran today with the R implementation of Kendall's Tau(-a) >> and the original implementation in SciPy.stats.stats (Kendall's Tau-b) >> seem to suggests that if we do NOT count ties on the same pair (the >> current situation in SciPy.stats.stats) effectively Kendall's Tau-b >> gives the same outcomes as Kendall's Tau-a for about 36 test cases. >> >> This seems to suggest that Kendall's Tau-b (tie correction) in SciPy as >> it is behaves like Kendall's Tau-a (no tie correction), possibly because >> of leaving out ties on identical pairs in T and U above. >> >> I unfortunately do not have the time to mathematically prove (or >> disprove) the equivalence of Kendall's Tau-a and the current SciPy >> implementation right now, but I thought I'd be useful to mention these >> test results. >> >> > > Hi, > This link might be useful as it has worked examples: > http://faculty.chass.ncsu.edu/garson/PA765/assocordinal.htm > > I find that the SAS documentation for Proc Freq very useful: > http://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/freq_index.htm > http://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/freq_sect20.htm > > Also, does these implementation depend on the type of array? > It would be great to have a single function that accepts an array, > masked array or an object that can be converted into an array. > > Finally, this measure assumes ordinal data but there is no type checking > done in the Scipy function. > > Bruce I'm giving up, this takes too much time: the sas reference by Bruce hat a reference to a book by Agresti, where I finally found a clear formal definition and it excludes matching ties in the in the denominator: http://books.google.ca/books?id=hpEzw4T0sPUC&dq=Agresti,+Alan+(1996).+Introduction+to+categorical+data+analysis&printsec=frontcover&source=bn&hl=en&ei=hADBSamzBpKWMrTttJoI&sa=X&oi=book_result&resnum=4&ct=result#PPA68,M1 categorical data: sas and spss are using kendall's tau and the other association measures for contingency tables, and sas says kendall tau-b is only appropriate for categorical data. I think this is not true in general, a saw other applications of kendall's tau for continuous (normal) random variables, and since in this case there are no ties all definitions for kendalls tau or gamma produce identical results, corresponding to the probability and correlation definition. the definition that Bruce found: Kendall's tau according to Hollander, M., and D. A. Wolfe. 1999, I think, corresponds to kendall's tau-a since there is no tie correction in the denominator. general principles: I like the correlation interpretation that I mentioned, In a reference on the wikipedia page, they gave a preference ordering interpretation of kendall's tau, how strongly are two preference orderings, e.g. rankings related. They refered to it only for strict preferences, but the same should hold for weak preferences. If you consider [1,1,2] as preference ranking, than two individuals with the same weak ranking have exactly the same preferences, and the correlation should be 1 and not 2/3 I don't know about kendall's tau-c because m seems to be specific to contingency tables, while all other measures have a more general interpretation. Similar in view of the general definitions, I don't understand the talk about square or rectangular tables. But I don't have a good intuition for contingency tables. 
I haven't checked the comparison with R, but I would be very surprised if stats.kendalltau corresponds to kendall's tau-a in R. the version of kendall's tau-b that adds matching ties in the denominator is wrong from my reading. So, I prefer to leave stats.kendalltau as it is and maybe add on option for tiehandling, so that also obtain kendall's tau-a and maybe gamma (in the definition of sas and spss) Josef attached is my test script -------------- next part -------------- import numpy as np from scipy import stats from numpy.testing import assert_equal def kendalltaustat(x, y): '''calculate Kendall's tau-b correlation statistic this is just the (non-central) correlation of all pairwise rankings Notes ----- tau-b = (P - Q) / sqrt(((P + Q + Tx - Txy)(P + Q + Ty - Txy))) equivalent to tau-b = (P - Q) / sqrt(((n*(n-1) - Tx)*((n*(n-1) - Ty))) total number of pairs (n over 2) = (n*(n-1) Tx number of ties in x Ty number of ties in y Txy number of pairs that are tied in x and in y Tx - Txy number of pairs only tied in x and not in y Ty - Txy number of pairs only tied in y and not in x P = sum_pairs( (x1-x2)*(y1-y2)>0) concordant pairs Q = sum_pairs( (x1-x2)*(y1-y2)<0) discordant pairs ''' # not this is using 2 times the pairs # calculate indicators of all pairs ppos1 = np.sign((x[:,np.newaxis] - x)).astype(float).ravel() ppos2 = np.sign((y[:,np.newaxis] - y)).astype(float).ravel() #correlation coefficient without mean correction tau = np.dot(ppos1,ppos2) / np.sqrt(np.dot(ppos1,ppos1) * np.dot(ppos2,ppos2)) return tau def kendalltaustata(x, y): '''calculate Kendall's tau-a correlation statistic sum of rank indicator (-1,0,+1) divided by all pairs ''' n = x.shape[0] tau = (np.sum(np.sign((x[:,np.newaxis]-x)*(y[:,np.newaxis]-y))))/(1.0*n*(n-1)) return tau def gammaassoc(x, y): ''' this is gamma correlation statistic this is just the (non-central) correlation of all pairwise rankings ''' n = x.shape[0] conc = np.sign((x[:,np.newaxis]-x)*(y[:,np.newaxis]-y)) tau = np.sum(conc)/float(np.sum(np.abs(conc))) return tau def kendalltaustatba(x, y): ''' calculate Kendall's tau-b correlation statistic with different tie handling, adding simultanous ties in x and y in denominator ''' n = x.shape[0] conc = np.sign((x[:,np.newaxis]-x)*(y[:,np.newaxis]-y)) strict = np.sum(np.abs(conc)) tiesx = np.sum((x[:,np.newaxis]-x)==0) - n tiesy = np.sum((y[:,np.newaxis]-y)==0) - n tau = np.sum(conc)/np.float(np.sqrt((strict + tiesx) * (strict + tiesx))) return tau x1a = np.array([0, 1, 3, 3, 4, 5, 5, 7, 8]) x1b = np.array([1, 3, 3, 4, 5, 5, 7, 8, 9]) x1c = np.array([1, 3, 3, 3, 5, 5, 7, 8, 9]) x1d = np.array([1, 3, 3, 3, 5, 5, 7, 8, 6]) x1 = np.array([1,1,2]) x2a = np.array([1,1,2,2]) x2b = np.array([1,2,3,4]) data = [(x1a, x1a), (x1a, x1b), (x1a, x1c), (x1a, 10-x1c), (x1, x1), (x2a, x2b), (x2b, x2b), (x2a, 3-x2a), (x2a, 3-x2b), ] data = [(x1a, x1a, 1.0, 0.00017455009626808976), (x1a, x1b, 0.94117647058823528, 0.0004116823127257346), (x1a, x1c, 0.93982554701579024, 0.00041964794448139537), (x1a, x1d, 0.81855773449762381, 0.0021244496819179748), (x1a, 10-x1c, -0.93982554701579024, 0.00041964794448139537), (x1, x1, 1.0, 0.11718509694604401), (x2a, x2b, 0.81649658092772615, 0.0960923383021903), (x2b, x2b, 1.0, 0.041540072431798185), (x2a, 3-x2a, -1.0, 0.041540072431798185), (x2a, 3-x2b, -0.81649658092772615, 0.0960923383021903), ] for x,y, tc, pc in data: t1 = kendalltaustat(x, y) ts, ps = stats.kendalltau(x, y) ta = kendalltaustata(x, y) taa = gammaassoc(x, y) tba = kendalltaustatba(x, y) print ts, t1, ta, taa, tba #print 
(ts, ps) assert_equal(t1, ts) assert_equal(ts, tc) assert_equal(ps, pc) for i in range(10): x = np.random.randn(20) y = np.random.randn(20) t1 = kendalltaustat(x, y) ts, ps = stats.kendalltau(x, y) ta = kendalltaustata(x, y) taa = gammaassoc(x, y) tba = kendalltaustatba(x, y) #print t1, (ts, ps) print ts, t1, ta, taa, tba assert_equal(t1,ts) violence = np.array([1,1,1,2,2,2]) rating = np.array([1,2,3,1,2,3]) count = np.array([10, 5, 2, 9, 12, 16]) vi = np.repeat(violence,count) ra = np.repeat(rating,count) print 'stats.kendalltau, 0.34932342479397172',stats.kendalltau(vi, ra) print 'stats.spearmanr, 0.370', stats.spearmanr(vi, ra) # wrong vir = stats.rankdata(vi) rar = stats.rankdata(ra) print 'pearson, 0.370', np.corrcoef(vi, ra)[0,1] print 'rankdata correlation, 0.370', np.corrcoef(vir, rar)[0,1] # correct From sturla at molden.no Wed Mar 18 11:55:38 2009 From: sturla at molden.no (Sturla Molden) Date: Wed, 18 Mar 2009 16:55:38 +0100 Subject: [SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau) In-Reply-To: <1cd32cbb0903180812q2172c29fr34419e773b423ed6@mail.gmail.com> References: <1237316749.6984.13.camel@rufio-laptop> <18f8e379820d4affae266687ea793cb9.squirrel@webmail.uio.no> <1cd32cbb0903171510o4f614c25i7dfeaa7d219a78a3@mail.gmail.com> <49C0E4F7.3020707@molden.no> <1237382378.6824.120.camel@ewi1341> <49C0F72D.6050406@gmail.com> <1cd32cbb0903180812q2172c29fr34419e773b423ed6@mail.gmail.com> Message-ID: <49C1197A.8070109@molden.no> On 3/18/2009 4:12 PM, josef.pktd at gmail.com wrote: > the sas reference by Bruce hat a reference to a book by Agresti, where > I finally found a clear formal definition and it excludes matching > ties in the in the denominator: > > http://books.google.ca/books?id=hpEzw4T0sPUC&dq=Agresti,+Alan This is basically the same as in Numerical Receipes. Hollander & Wolfe has a discussion of tie handling on page 374 and 375: "We have recommended dealing with tied X observation and/or tied Y observations by counting a zero in Q (8.17) counts leading to the computation of K (8.6). This approach is statisfactory as long as the number of (X,Y) pairs containing a tied X and/or tied Y observation does not represent a sizable percentage of the total number (n) of sample pairs. We should, however, point out that methods other than this zero assignment to Q counts have been considered for dealing with tied C and/or tied Y observations [...]" They then suggest counting +1 or -1 by coin toss for ties, or a strategy to be conservative about rejecting H0. They also suggest using Efron's bootstrap for confidency intervals on tau (page 388). I don't see any mention of tau-a, tau-b or tau-c in H&W, nor contigency tables. I don't quite understand Hollander & Wolfe's argument. They are basically saying that their recommended method of dealing with ties only works when ties are so few in numbers that they don't matter anyway. > I don't know about kendall's tau-c because m seems to be specific to > contingency tables, while all other measures have a more general > interpretation. Similar in view of the general definitions, I don't > understand the talk about square or rectangular tables. But I don't > have a good intuition for contingency tables. The ide is that Kendall's tau works on ordinal scale, not rank scale as Spearman's r. You can use the number of categories for X and Y you like, but the categories have to be ordered. You thus get a table of counts. 
If you for example use two categories (small or big) in X and four categories (tiny, small, big, huge) in Y, the table is 2 x 4. If you go all the way up to rank scale, you get a very sparse table with a lot 0 counts. With few categories, ties will be quite common, and that is the justification for tau-b instead of gamma. Sturla Molden From almer at gnome.org Wed Mar 18 11:59:00 2009 From: almer at gnome.org (Almer S. Tigelaar) Date: Wed, 18 Mar 2009 16:59:00 +0100 Subject: [SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau) In-Reply-To: <1cd32cbb0903180812q2172c29fr34419e773b423ed6@mail.gmail.com> References: <1237316749.6984.13.camel@rufio-laptop> <18f8e379820d4affae266687ea793cb9.squirrel@webmail.uio.no> <1cd32cbb0903171510o4f614c25i7dfeaa7d219a78a3@mail.gmail.com> <49C0E4F7.3020707@molden.no> <1237382378.6824.120.camel@ewi1341> <49C0F72D.6050406@gmail.com> <1cd32cbb0903180812q2172c29fr34419e773b423ed6@mail.gmail.com> Message-ID: <1237391940.6824.130.camel@ewi1341> Hi Josef, On Wed, 2009-03-18 at 11:12 -0400, josef.pktd at gmail.com wrote: > I'm giving up, this takes too much time: Okay, we can simply conclude that there are some conflicting interpretations of Kendall's Tau-b. Both, I believe, are defensible. In such cases one is best off choosing just one approach and making clear what it is. So, I would simply say in the function documentation precisely the definition that you use (also for ties) and your motivation relating to the correlation interpretation (which is indeed pretty convincing). You can use my template Kendall tau-b definition from this post for that: http://mail.scipy.org/pipermail/scipy-dev/2009-March/011569.html Then at least there will be no misunderstanding about what the function is supposed (and will) do. If people then disagree (and want to use the other interpretation) then they can copy the function and adjust it to their wishes. Thanks for your rigorous testing and general effort on this. I appreciate it, very useful for what I am working on. -- With kind regards, Almer S. Tigelaar University of Twente From josef.pktd at gmail.com Wed Mar 18 12:13:09 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 18 Mar 2009 12:13:09 -0400 Subject: [SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau) In-Reply-To: <1cd32cbb0903180812q2172c29fr34419e773b423ed6@mail.gmail.com> References: <1237316749.6984.13.camel@rufio-laptop> <18f8e379820d4affae266687ea793cb9.squirrel@webmail.uio.no> <1cd32cbb0903171510o4f614c25i7dfeaa7d219a78a3@mail.gmail.com> <49C0E4F7.3020707@molden.no> <1237382378.6824.120.camel@ewi1341> <49C0F72D.6050406@gmail.com> <1cd32cbb0903180812q2172c29fr34419e773b423ed6@mail.gmail.com> Message-ID: <1cd32cbb0903180913s47ab93fk388adb275bb59397@mail.gmail.com> kendall's tau in R stats is exactly the same as scipy.stats.kendalltau for all the test cases, independent of ties and matching ties, see also >>> rcortest([1,1,2], [1,1,2], method = "kendall", exact=0)['estimate']['tau'] 1.0 R help doesn't specify which version of kendall tau is in cor.test import rpy rcortest = rpy.r('cor.test') rkend = rcortest(x, y, method = "kendall", exact=0) tr = rkend['estimate']['tau'] ts, ps = stats.kendalltau(x, y) assert_almost_equal(tr, ts, decimal=10) doesn't raise exception for any test cases, I also checked the p-values, but there are a few discrepancies between R and scipy.stats. 
but all test cases have for very small sample size difference in p-values for test cases: np.diff(rcomparr, axis=1).T array([[ -1.17641404e-04, -2.40189199e-04, -3.84139134e-04, -1.38669151e-03, -3.84139134e-04, -4.01141101e-02, -2.52429121e-02, 5.42191308e-09, -4.17244442e-02, -2.52429121e-02, 0.00000000e+00, 1.09393424e-08, 3.23933650e-09, 9.63088664e-09, 1.09393424e-08, 1.18686162e-08, 1.04538346e-08, 1.37460343e-09, 1.27648634e-08, 1.22071195e-08]]) Since we also agree with R.stats cor.test for the definition of kendall tau, there is really no reason to change stats.kendalltau, maybe checking for which cases the p-values differ could be useful. Josef From sturla at molden.no Wed Mar 18 12:30:51 2009 From: sturla at molden.no (Sturla Molden) Date: Wed, 18 Mar 2009 17:30:51 +0100 Subject: [SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau) In-Reply-To: <49C1197A.8070109@molden.no> References: <1237316749.6984.13.camel@rufio-laptop> <18f8e379820d4affae266687ea793cb9.squirrel@webmail.uio.no> <1cd32cbb0903171510o4f614c25i7dfeaa7d219a78a3@mail.gmail.com> <49C0E4F7.3020707@molden.no> <1237382378.6824.120.camel@ewi1341> <49C0F72D.6050406@gmail.com> <1cd32cbb0903180812q2172c29fr34419e773b423ed6@mail.gmail.com> <49C1197A.8070109@molden.no> Message-ID: <49C121BB.3020701@molden.no> On 3/18/2009 4:55 PM, Sturla Molden wrote: > The ide is that Kendall's tau works on ordinal scale, not rank scale as > Spearman's r. You can use the number of categories for X and Y you like, > but the categories have to be ordered. You thus get a table of counts. > If you for example use two categories (small or big) in X and four > categories (tiny, small, big, huge) in Y, the table is 2 x 4. If you go > all the way up to rank scale, you get a very sparse table with a lot 0 > counts. With few categories, ties will be quite common, and that is the > justification for tau-b instead of gamma. One very important aspect of this is that it can reduce the computational burden substantially. If you e.q. know that 100 categories is sufficient resolution, you get a 100 x 100 contigency table. tau-b can be computed directly from the table. So for large data sets, this avoids the O(N**2) complexity of tau. The complexity of tau-b becomes O(N) and O(C*D), with C and D the number of categories in X and Y. So having a contingency-table version of tau-b would be very useful. Sturla Molden From josef.pktd at gmail.com Wed Mar 18 12:35:03 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 18 Mar 2009 12:35:03 -0400 Subject: [SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau) In-Reply-To: <1237391940.6824.130.camel@ewi1341> References: <1237316749.6984.13.camel@rufio-laptop> <18f8e379820d4affae266687ea793cb9.squirrel@webmail.uio.no> <1cd32cbb0903171510o4f614c25i7dfeaa7d219a78a3@mail.gmail.com> <49C0E4F7.3020707@molden.no> <1237382378.6824.120.camel@ewi1341> <49C0F72D.6050406@gmail.com> <1cd32cbb0903180812q2172c29fr34419e773b423ed6@mail.gmail.com> <1237391940.6824.130.camel@ewi1341> Message-ID: <1cd32cbb0903180935r5c5a52a5x59330bf349835fa0@mail.gmail.com> On Wed, Mar 18, 2009 at 11:59 AM, Almer S. Tigelaar wrote: > Hi Josef, > > On Wed, 2009-03-18 at 11:12 -0400, josef.pktd at gmail.com wrote: >> I'm giving up, this takes too much time: > > Okay, we can simply conclude that there are some conflicting > interpretations of Kendall's Tau-b. Both, I believe, are defensible. In > such cases one is best off choosing just one approach and making clear > what it is. 
> > So, I would simply say in the function documentation precisely the > definition that you use (also for ties) and your motivation relating to > the correlation interpretation (which is indeed pretty convincing). > You can use my template Kendall tau-b definition from this post for > that: > http://mail.scipy.org/pipermail/scipy-dev/2009-March/011569.html > > Then at least there will be no misunderstanding about what the function > is supposed (and will) do. If people then disagree (and want to use the > other interpretation) then they can copy the function and adjust it to > their wishes. > > Thanks for your rigorous testing and general effort on this. I > appreciate it, very useful for what I am working on. > The proliferation of different version of a statistic was the reason that I wasn't able to verify kendall's tau before. I think, there should be a clear theoretical foundation and interpretation and not just twisting the tie handling a bit. For example spearman's r: the calculation is based on a short hand formula that only works when there are no ties. If there are ties, the discussion starts how to handle them. But, if you go back to the definition as correlation of the rank ordering implied by the data then we can just use the standard correlation coefficient on the rankdata and we don't have to worry about tie handling. another case: What's the point of the pointbiserialr(x, y), it's just the correlation between a binary and a continuous variable. It has a nice explicit formula to calculate it (almost) by hand. But using a computer we can just use np.corrcoef and don't have to worry about special functions. I think some of these formulas are left-over from the time before we had fast computers. Josef From josef.pktd at gmail.com Wed Mar 18 13:16:59 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 18 Mar 2009 13:16:59 -0400 Subject: [SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau) In-Reply-To: <49C121BB.3020701@molden.no> References: <1237316749.6984.13.camel@rufio-laptop> <18f8e379820d4affae266687ea793cb9.squirrel@webmail.uio.no> <1cd32cbb0903171510o4f614c25i7dfeaa7d219a78a3@mail.gmail.com> <49C0E4F7.3020707@molden.no> <1237382378.6824.120.camel@ewi1341> <49C0F72D.6050406@gmail.com> <1cd32cbb0903180812q2172c29fr34419e773b423ed6@mail.gmail.com> <49C1197A.8070109@molden.no> <49C121BB.3020701@molden.no> Message-ID: <1cd32cbb0903181016y1c44ee7cw2dd46b7b5117a8f0@mail.gmail.com> On Wed, Mar 18, 2009 at 12:30 PM, Sturla Molden wrote: > On 3/18/2009 4:55 PM, Sturla Molden wrote: > >> The ide is that Kendall's tau works on ordinal scale, not rank scale as >> Spearman's r. You can use the number of categories for X and Y you like, >> but the categories have to be ordered. You thus get a table of counts. >> If you for example use two categories (small or big) in X and four >> categories (tiny, small, big, huge) in Y, the table is 2 x 4. If you go >> all the way up to rank scale, you get a very sparse table with a lot 0 >> counts. With few categories, ties will be quite common, and that is the >> justification for tau-b instead of gamma. I got confused what the statement "requires ordinal scale" means. I think the clearer statement should be: requires at least ordinal scale, the categories have to be ordered and not unordered as in "male", "female". Kendall tau uses only ordinal information even if the variable is metric like a continuous variable. 
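(A small illustration of this distinction, with made-up toy data: Spearman's r can be
computed as the plain Pearson correlation of the midranks, so rankdata takes care of
ties, while Kendall's tau only ever looks at the signs of the pairwise differences.)

import numpy as np
from scipy import stats

# toy data with ties (not from the examples above)
x = np.array([1, 1, 2, 3, 5])
y = np.array([2, 1, 2, 4, 5])

# Spearman: correlation of the (mid)ranks, no separate tie formula needed
rs = np.corrcoef(stats.rankdata(x), stats.rankdata(y))[0, 1]

# Kendall: only the ordinal information, i.e. the signs of pairwise differences, is used
tau, p = stats.kendalltau(x, y)

print rs, tau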
It didn't catch my attention before that Spearman's r uses a rank scale and not a (ordinal) rank ordering. > > One very important aspect of this is that it can reduce the > computational burden substantially. If you e.q. know that 100 categories > is sufficient resolution, you get a 100 x 100 contigency table. tau-b > can be computed directly from the table. So for large data sets, this > avoids the O(N**2) complexity of tau. The complexity of tau-b becomes > O(N) and O(C*D), with C and D the number of categories in X and Y. > > So having a contingency-table version of tau-b would be very useful. You could still have O(C*D) > O(N**2), if the table is sparse, and you haven't deleted the empty rows and columns. If you have 100 categories for a variable, do you still have to treat it as an ordinal variable, I would expect that using statistics for continuous variables should produce almost the same results. I think there should be some general tools or tricks to work on contingency tables, to have a common pattern for working with them. But, I never used them much, so it would take me quite some time to figure out how to do this efficiently. I'm more used to continuous variable and maybe a few dummy variables. I was looking at categorical variables for regression and for ANOVA and there it is a similar story for the size of the matrices. When I create a dummy variable for each category combination, then in your case, I would have a matrix of dummy variables in the size of (number of observation)*100*100. If the category variables are race and sex and age group, then the dimension would be much smaller. In the case for a small number of categories, everything can be written using simple linear algebra (broadcasting and dot products all over) which is very fast, but would require more memory if the number of categories is really large. Given that there are so many different application cases for statistics, choosing an implementation that satisfies most, looks pretty difficult, and requires feedback, contributions and time to actually do it. Josef From sturla at molden.no Wed Mar 18 13:53:04 2009 From: sturla at molden.no (Sturla Molden) Date: Wed, 18 Mar 2009 18:53:04 +0100 Subject: [SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau) In-Reply-To: <1cd32cbb0903181016y1c44ee7cw2dd46b7b5117a8f0@mail.gmail.com> References: <1237316749.6984.13.camel@rufio-laptop> <18f8e379820d4affae266687ea793cb9.squirrel@webmail.uio.no> <1cd32cbb0903171510o4f614c25i7dfeaa7d219a78a3@mail.gmail.com> <49C0E4F7.3020707@molden.no> <1237382378.6824.120.camel@ewi1341> <49C0F72D.6050406@gmail.com> <1cd32cbb0903180812q2172c29fr34419e773b423ed6@mail.gmail.com> <49C1197A.8070109@molden.no> <49C121BB.3020701@molden.no> <1cd32cbb0903181016y1c44ee7cw2dd46b7b5117a8f0@mail.gmail.com> Message-ID: <49C13500.5030605@molden.no> On 3/18/2009 6:16 PM, josef.pktd at gmail.com wrote: > You could still have O(C*D) > O(N**2), if the table is sparse, and you > haven't deleted the empty rows and columns. Yes. So what is the faster option depends on N and the number of ordinal categories. But often we have C*D << N**2. If N is a million and 100 categories suffice, it is easy to do the math. Also, it is possible to estimate tau by Monte Carlo. S.M. 
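(A rough sketch of that Monte Carlo idea, just to make it concrete; the function name,
sample size and seed are made up. Random ordered pairs with i != j are drawn and the
concordance indicator is averaged, which estimates tau-a; ties simply contribute zero,
and a tau-b estimate would additionally need estimates of the tie probabilities for
the denominator.)

import numpy as np

def kendalltau_mc(x, y, nsamples=100000, seed=0):
    # draw random ordered pairs (i, j) with i != j and average the
    # concordance indicator sign((x_i - x_j) * (y_i - y_j))
    rng = np.random.RandomState(seed)
    n = len(x)
    i = rng.randint(0, n, nsamples)
    j = rng.randint(0, n, nsamples)
    keep = (i != j)
    return np.mean(np.sign((x[i[keep]] - x[j[keep]]) * (y[i[keep]] - y[j[keep]])))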
From josef.pktd at gmail.com Wed Mar 18 16:13:11 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 18 Mar 2009 16:13:11 -0400 Subject: [SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau) In-Reply-To: <49C13500.5030605@molden.no> References: <1237316749.6984.13.camel@rufio-laptop> <49C0E4F7.3020707@molden.no> <1237382378.6824.120.camel@ewi1341> <49C0F72D.6050406@gmail.com> <1cd32cbb0903180812q2172c29fr34419e773b423ed6@mail.gmail.com> <49C1197A.8070109@molden.no> <49C121BB.3020701@molden.no> <1cd32cbb0903181016y1c44ee7cw2dd46b7b5117a8f0@mail.gmail.com> <49C13500.5030605@molden.no> Message-ID: <1cd32cbb0903181313g3501e20bu1fbabe81190c9a88@mail.gmail.com> On Wed, Mar 18, 2009 at 1:53 PM, Sturla Molden wrote: > On 3/18/2009 6:16 PM, josef.pktd at gmail.com wrote: > >> You could still have O(C*D) > O(N**2), if the table is sparse, and you >> haven't deleted the empty rows and columns. > > Yes. So what is the faster option depends on N and the number of ordinal > categories. But often we have C*D << N**2. If N is a million and 100 > categories suffice, it is easy to do the math. > > Also, it is possible to estimate tau by Monte Carlo. > I got a (number of cells)**2 version for the contingency table for Kendall's tau-a, that's as far as I could go without loops. I don't see how you could get O(C*D) and not O((C*D)**2) since you still need to compare all pairs of cells, so my impression is that the relevant comparison is between C*D and N. Josef # contingency table violence = np.array([1,1,1,2,2,2]) rating = np.array([1,2,3,1,2,3]) count = np.array([10, 5, 2, 9, 12, 16]) # individual observation vi = np.repeat(violence,count) ra = np.repeat(rating,count) # tau-a calculated using contingency table from example # creates arrays of size (number of cells)**2, no loops but (almost 50%) redundant points deltax = violence[:,np.newaxis] - violence deltay = rating[:,np.newaxis] - rating paircount = count[:,np.newaxis]*count - np.diag(count) tau_a = np.sum(np.sign(deltax*deltay)*paircount)/(1.*paircount.sum()) print tau_a, kendalltaustata(vi,ra) From almer at gnome.org Wed Mar 18 16:18:01 2009 From: almer at gnome.org (Almer S. Tigelaar) Date: Wed, 18 Mar 2009 21:18:01 +0100 Subject: [SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau) In-Reply-To: <1cd32cbb0903180935r5c5a52a5x59330bf349835fa0@mail.gmail.com> References: <1237316749.6984.13.camel@rufio-laptop> <18f8e379820d4affae266687ea793cb9.squirrel@webmail.uio.no> <1cd32cbb0903171510o4f614c25i7dfeaa7d219a78a3@mail.gmail.com> <49C0E4F7.3020707@molden.no> <1237382378.6824.120.camel@ewi1341> <49C0F72D.6050406@gmail.com> <1cd32cbb0903180812q2172c29fr34419e773b423ed6@mail.gmail.com> <1237391940.6824.130.camel@ewi1341> <1cd32cbb0903180935r5c5a52a5x59330bf349835fa0@mail.gmail.com> Message-ID: <1237407481.6482.5.camel@rufio-laptop> Hi Josef, On Wed, 2009-03-18 at 12:35 -0400, josef.pktd at gmail.com wrote: > I think, there should be a clear theoretical foundation and > interpretation and not just twisting the tie handling a bit. True. Honestly, my expectation before running my [1,1,2] [1,1,2] example was also (expressed in unit tests) that this should yield a +1.0 correlation. I started to investigate this further when I noticed it was not so (for my own implementation). I do think the correlation argument is pretty convincing both intuitively and semantically. On identical lists, the correlation should be +1.0. 
This is not so for example for the Kendall tau-c variant, which gets closer to +1.0 as the number of items to rank increases, for which I can still not give an entirely satisfactory explanation. With kind regards, Almer S. Tigelaar. From sturla at molden.no Wed Mar 18 16:42:12 2009 From: sturla at molden.no (Sturla Molden) Date: Wed, 18 Mar 2009 21:42:12 +0100 (CET) Subject: [SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau) In-Reply-To: <1cd32cbb0903181313g3501e20bu1fbabe81190c9a88@mail.gmail.com> References: <1237316749.6984.13.camel@rufio-laptop> <49C0E4F7.3020707@molden.no> <1237382378.6824.120.camel@ewi1341> <49C0F72D.6050406@gmail.com> <1cd32cbb0903180812q2172c29fr34419e773b423ed6@mail.gmail.com> <49C1197A.8070109@molden.no> <49C121BB.3020701@molden.no> <1cd32cbb0903181016y1c44ee7cw2dd46b7b5117a8f0@mail.gmail.com> <49C13500.5030605@molden.no> <1cd32cbb0903181313g3501e20bu1fbabe81190c9a88@mail.gmail.com> Message-ID: <82d3c7587b21972d45039f7cbe04f669.squirrel@webmail.uio.no> > I got a (number of cells)**2 version for the contingency table for > Kendall's tau-a, that's as far as I could go without loops. I don't > see how you could get O(C*D) and not O((C*D)**2) since you still need > to compare all pairs of cells, so my impression is that the relevant > comparison is between C*D and N. You are right. We have to compare C*D with N. I was thinking about how to best use the contigency-table for tau-b. I don't really see how it can be done without some loops. It may be easier to do this in Fortran. Sturla Molden From sturla at molden.no Wed Mar 18 18:00:28 2009 From: sturla at molden.no (Sturla Molden) Date: Wed, 18 Mar 2009 23:00:28 +0100 (CET) Subject: [SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau) In-Reply-To: <82d3c7587b21972d45039f7cbe04f669.squirrel@webmail.uio.no> References: <1237316749.6984.13.camel@rufio-laptop> <49C0E4F7.3020707@molden.no> <1237382378.6824.120.camel@ewi1341> <49C0F72D.6050406@gmail.com> <1cd32cbb0903180812q2172c29fr34419e773b423ed6@mail.gmail.com> <49C1197A.8070109@molden.no> <49C121BB.3020701@molden.no> <1cd32cbb0903181016y1c44ee7cw2dd46b7b5117a8f0@mail.gmail.com> <49C13500.5030605@molden.no> <1cd32cbb0903181313g3501e20bu1fbabe81190c9a88@mail.gmail.com> <82d3c7587b21972d45039f7cbe04f669.squirrel@webmail.uio.no> Message-ID: <8a11e2fa01503c44bf051dc04a36857a.squirrel@webmail.uio.no> > I was thinking about how to best use the contigency-table for tau-b. I > don't really see how it can be done without some loops. It may be easier > to do this in Fortran. f2py'ing something like this should work (not thoroughly tested though)... subroutine taub(C, D, table, tau) implicit none intrinsic :: sqrt, sum integer*4, intent(in) :: C, D integer*4, dimension(C,D), intent(in) :: table real*8, intent(out) :: tau integer*4 :: i, j, tmp, P, Q, Tx, Ty P = 0 Q = 0 Tx = 0 Ty = 0 !$omp parallel do & !$omp& private(i,j,tmp) & !$omp& reduction(+:P,Q,Tx,Ty) & !$omp& shared(table,C,D) do i = 1,D do j = 1,C tmp = table(j,i) ! count concordant pairs if ((i .lt. D) .and. (j .gt. 1)) then P = P + sum(table(1:j-1,i+1:D)*tmp) end if ! count disconcordant pairs if ((i .gt. 1) .and. (j .lt. C)) then Q = Q + sum(table(j+1:C,1:i-1)*tmp) end if ! count pairs tied in y if (i .lt. D) then Ty = Ty + sum(table(j,i+1:D)*tmp) end if ! count pairs tied in x if (j .lt. 
C) then Tx = Tx + sum(table(j+1:C,i)*tmp) end if end do end do !$omp end parallel do tau = real(P-Q)/sqrt(real((P+Q+Tx)*(P+Q+Ty))) end subroutine From josef.pktd at gmail.com Wed Mar 18 21:58:45 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 18 Mar 2009 21:58:45 -0400 Subject: [SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau) In-Reply-To: <8a11e2fa01503c44bf051dc04a36857a.squirrel@webmail.uio.no> References: <1237316749.6984.13.camel@rufio-laptop> <49C0F72D.6050406@gmail.com> <1cd32cbb0903180812q2172c29fr34419e773b423ed6@mail.gmail.com> <49C1197A.8070109@molden.no> <49C121BB.3020701@molden.no> <1cd32cbb0903181016y1c44ee7cw2dd46b7b5117a8f0@mail.gmail.com> <49C13500.5030605@molden.no> <1cd32cbb0903181313g3501e20bu1fbabe81190c9a88@mail.gmail.com> <82d3c7587b21972d45039f7cbe04f669.squirrel@webmail.uio.no> <8a11e2fa01503c44bf051dc04a36857a.squirrel@webmail.uio.no> Message-ID: <1cd32cbb0903181858j2ad945b3k2b3bb8f5c236bf0a@mail.gmail.com> On Wed, Mar 18, 2009 at 6:00 PM, Sturla Molden wrote: > >> I was thinking about how to best use the contigency-table for tau-b. I >> don't really see how it can be done without some loops. It may be easier >> to do this in Fortran. > > f2py'ing something like this should work (not thoroughly tested though)... > > subroutine taub(C, D, table, tau) > ? ?implicit none I have a hard time following the double loop and slice indices inside the double loop. It would be fast and safe on memory since it runs through the double loop only once, but hard to read, debug and maintain. Since I already did most of the work, I finished kendall's tau-a, tau-b and tau-c for both versions, for original 2 data vectors and for the contingency table, tau-b and tau-c are verified with the spss example for the contingency table the results: first column from contingency table, second column from 2 data vector version tau_a 0.190775681342 0.190775681342 tau_b 0.349323424794 (0.34932342479397172, 0.00019199787220495093) tau_c 0.374485596708 0.374485596708 I didn't use any loop but instead used intermediate arrays. Several calculations are necessary because I use the contingency table in flattened format. And I think it's quite readable. 
Here's the contingency table version: def kendalltau_fromct(x, y, count): '''return tau-a, tau-b and tau-c from contingency table data example for contingency table x = np.array([1,1,1,2,2,2]) y = np.array([1,2,3,1,2,3]) count = np.array([10, 5, 2, 9, 12, 16]) ''' catx = np.unique(x) caty = np.unique(y) ncatx = len(catx) ncaty = len(caty) deltax = np.sign(x[:,np.newaxis] - x) deltay = np.sign(y[:,np.newaxis] - y) paircount = count[:,np.newaxis]*count - np.diag(count) # number of concordant minus number of discordant pairs netconc = np.sum((deltax*deltay)*paircount) # calculation for tau-a tau_a = netconc/(1.*paircount.sum()) # calculation for tau-c m = min(ncatx,ncaty) tau_c = netconc /(1.*count.sum()**2)* m/(m-1.0) #extra calculation for tau_b # row and column counts of contingency table countx = np.dot(count,(x[:,np.newaxis]==catx)) county = np.dot(count,(y[:,np.newaxis]==caty)) #total number of pairs npairs = paircount.sum() #number of ties ntiex = np.dot(countx,(countx-1)) ntiey = np.dot(county,(county-1)) denom = 1.0*np.sqrt((npairs - ntiex ) * (npairs - ntiey)) tau_b = netconc / denom return tau_a, tau_b, tau_c violence = np.array([1,1,1,2,2,2]) rating = np.array([1,2,3,1,2,3]) count = np.array([10, 5, 2, 9, 12, 16]) vi = np.repeat(violence,count) ra = np.repeat(rating,count) tau_a, tau_b, tau_c = kendalltau_fromct(violence, rating, count) print 'tau_a', tau_a, kendalltaustata(vi,ra) print 'tau_b', tau_b, stats.kendalltau(vi, ra) print 'tau_c', tau_c, kendalltauc(vi, ra) Josef From josef.pktd at gmail.com Wed Mar 18 23:07:15 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 18 Mar 2009 23:07:15 -0400 Subject: [SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau) In-Reply-To: <1237367562.6824.14.camel@ewi1341> References: <1237316749.6984.13.camel@rufio-laptop> <1cd32cbb0903171341x29d9a029u53b354fcba0fb400@mail.gmail.com> <1237367562.6824.14.camel@ewi1341> Message-ID: <1cd32cbb0903182007x7242ef41vdbfe978e07d14b50@mail.gmail.com> > Kendall tau-c (alternative tie handling): > ----------------------------------------- > (also called Stuart's tau-c or Kendall-Stuart's tau-c) > ? ? ? ?t = (m * (P - Q)) / (n^2 * (m - 1)) > where P is the number of concordant pairs, Q the number of discordant > pairs, n the number of items and m = min(r,s) where r and s are the > number of rows and columns in the data. > > [Note that there are some incorrect definition of Kendall tau-c floating > ?around which substitute 2m instead of m in the numerator, as this > ?can yield values outside of the (-1, +1) range this is obviously wrong] > Just a final comment: I think there are also two different definitions of pairs in usage, whether each pair is counted twice, e.g. a is compared with b and b is compared with a If both directions are counted as separate pairs, then there are n*(n-1) pairs (this is what I use), otherwise there are n*(n-1)/2 pairs. For tau-a and tau-b it doesn't matter as long as the same definition is used in the numerator and in the denominator. For tau-c, I get the same result as in the spss example, however in the explanation for spss, they use 2*m while I use m, but they have only half the number of pairs that I do, which exactly compensates. 
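(A quick numerical check of this pair-counting point, reusing the violence/rating
example data from above; np.triu keeps only the pairs with i < j.)

import numpy as np

violence = np.array([1,1,1,2,2,2])
rating = np.array([1,2,3,1,2,3])
count = np.array([10, 5, 2, 9, 12, 16])
x = np.repeat(violence, count)
y = np.repeat(rating, count)

conc = np.sign((x[:,np.newaxis] - x) * (y[:,np.newaxis] - y))

# ordered pairs: every (i, j) with i != j, each pair counted twice
net_ordered = conc.sum()
# unordered pairs: only i < j, each pair counted once
net_unordered = np.triu(conc, 1).sum()

print net_ordered == 2 * net_unordered   # True, so the factor of 2 cancels in tau-a and tau-b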
Josef From sturla at molden.no Thu Mar 19 07:42:58 2009 From: sturla at molden.no (Sturla Molden) Date: Thu, 19 Mar 2009 12:42:58 +0100 Subject: [SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau) In-Reply-To: <1cd32cbb0903181858j2ad945b3k2b3bb8f5c236bf0a@mail.gmail.com> References: <1237316749.6984.13.camel@rufio-laptop> <49C0F72D.6050406@gmail.com> <1cd32cbb0903180812q2172c29fr34419e773b423ed6@mail.gmail.com> <49C1197A.8070109@molden.no> <49C121BB.3020701@molden.no> <1cd32cbb0903181016y1c44ee7cw2dd46b7b5117a8f0@mail.gmail.com> <49C13500.5030605@molden.no> <1cd32cbb0903181313g3501e20bu1fbabe81190c9a88@mail.gmail.com> <82d3c7587b21972d45039f7cbe04f669.squirrel@webmail.uio.no> <8a11e2fa01503c44bf051dc04a36857a.squirrel@webmail.uio.no> <1cd32cbb0903181858j2ad945b3k2b3bb8f5c236bf0a@mail.gmail.com> Message-ID: <49C22FC2.5090203@molden.no> On 3/19/2009 2:58 AM, josef.pktd at gmail.com wrote: > I have a hard time following the double loop and slice indices inside > the double loop. It would be fast and safe on memory since it runs > through the double loop only once, but hard to read, debug and > maintain. We start with a table of counts. Given a pivot with index (j,i), we can split the table into 8 fields. When counting pairs only once, concordant pairs are to the lower right and disconcordant to the lower left. Pairs with tied-y are to the right and pairs with tied x are below. import numpy as np from math import sqrt import numpy as np from math import sqrt def taub(table): def _table(j,i): return { #'above' : table[:j,i], 'below' : table[j+1:,i], #'left' : table[j,:i], 'right' : table[j,i+1:], #'upper-left' : table[:j,:i], #'upper-right' : table[:j,i+1:], 'lower-left' : table[j+1:,:i], 'lower-right' : table[j+1:,i+1:] } D = table.shape[0] C = table.shape[1] P = 0 Q = 0 Tx = 0 Ty = 0 for i in range(C): for j in range(D): pivot = table[j,i] # use this as pivot ct = _table(j,i) # split remainder into 8 sections # count concordant pairs # -- multiply pivot with 'lower-right' and summate P += np.sum(ct['lower-right'] * pivot) # count disconcordant pairs # -- multiply pivot with 'lower-left' and summate Q += np.sum(ct['lower-left'] * pivot) # count pairs tied in y # -- multiply pivot with 'right' and summate Ty += np.sum(ct['right'] * pivot) # count pairs tied in x # -- multiply pivot with 'lower' and summate Tx += np.sum(ct['below'] * pivot) return float(P-Q)/(sqrt((P+Q+Tx)*(P+Q+Ty))) It seems I did a mistake in the Fortran. I counted upper-right as concordant where I should have counted lower-right (!#"?#"%...): subroutine taub(C, D, table, tau) implicit none intrinsic :: sqrt, sum integer*4, intent(in) :: C, D integer*4, dimension(C,D), intent(in) :: table real*8, intent(out) :: tau integer*4 :: i, j, pivot, P, Q, Tx, Ty P = 0 Q = 0 Tx = 0 Ty = 0 !$omp parallel do & !$omp& private(i,j,tmp) & !$omp& reduction(+:P,Q,Tx,Ty) & !$omp& shared(table,C,D) do i = 1,D do j = 1,C pivot = table(j,i) ! count concordant pairs if ((i .lt. D) .and. (j .lt. C)) then P = P + sum(table(j+1:C,i+1:D)*pivot) end if ! count disconcordant pairs if ((i .gt. 1) .and. (j .lt. C)) then Q = Q + sum(table(j+1:C,1:i-1)*pivot) end if ! count pairs tied in y if (i .lt. D) then Ty = Ty + sum(table(j,i+1:D)*pivot) end if ! count pairs tied in x if (j .lt. 
C) then Tx = Tx + sum(table(j+1:C,i)*pivot) end if end do end do !$omp end parallel do tau = real(P-Q)/sqrt(real((P+Q+Tx)*(P+Q+Ty))) end subroutine > def kendalltau_fromct(x, y, count): > '''return tau-a, tau-b and tau-c from contingency table data > > example for contingency table > x = np.array([1,1,1,2,2,2]) > y = np.array([1,2,3,1,2,3]) > count = np.array([10, 5, 2, 9, 12, 16]) > ''' It looks good, but it can be terribly slow. E.g. when I run it with say an contigency table of 100 categories for X and 100 categoried for Y, it takes for ever and finally raises a MemoryError. Whereas my version with loops takes just 2 seconds. On the other hand, if we go furter up to a 100 x 1000 ct, the looped version takes 90 seconds. Yours exits immediately with 'ValueError: broadcast dimensions too large.' An other thing is that the input is a bit difficult. Perhaps we could have flattened count and calculated x and y inside the function? Then it would just take a 2D array of counts as input. def kendalltau_fromct(count): '''return tau-a, tau-b and tau-c from contingency table data example for contingency table count = np.array([[10, 5, 2], [ 9, 12, 16]]) ''' assert(count.ndim == 2) ny = count.shape[0] nx = count.shape[1] x, y = np.meshgrid(range(nx),range(ny)) x = x.flatten() y = y.flatten() count = count.flatten() catx = np.unique(x) caty = np.unique(y) ncatx = len(catx) ncaty = len(caty) deltax = np.sign(x[:,np.newaxis] - x) deltay = np.sign(y[:,np.newaxis] - y) paircount = count[:,np.newaxis]*count - np.diag(count) # number of concordant minus number of discordant pairs netconc = np.sum((deltax*deltay)*paircount) # calculation for tau-a tau_a = netconc/(1.*paircount.sum()) # calculation for tau-c m = min(ncatx,ncaty) tau_c = netconc /(1.*count.sum()**2)* m/(m-1.0) #extra calculation for tau_b # row and column counts of contingency table countx = np.dot(count,(x[:,np.newaxis]==catx)) county = np.dot(count,(y[:,np.newaxis]==caty)) #total number of pairs npairs = paircount.sum() #number of ties ntiex = np.dot(countx,(countx-1)) ntiey = np.dot(county,(county-1)) denom = 1.0*np.sqrt((npairs - ntiex ) * (npairs - ntiey)) tau_b = netconc / denom return tau_a, tau_b, tau_c Anyhow, I think Fortran or Cython is really needed here. 
Sturla From sturla at molden.no Thu Mar 19 09:30:25 2009 From: sturla at molden.no (Sturla Molden) Date: Thu, 19 Mar 2009 14:30:25 +0100 Subject: [SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau) In-Reply-To: <49C22FC2.5090203@molden.no> References: <1237316749.6984.13.camel@rufio-laptop> <49C0F72D.6050406@gmail.com> <1cd32cbb0903180812q2172c29fr34419e773b423ed6@mail.gmail.com> <49C1197A.8070109@molden.no> <49C121BB.3020701@molden.no> <1cd32cbb0903181016y1c44ee7cw2dd46b7b5117a8f0@mail.gmail.com> <49C13500.5030605@molden.no> <1cd32cbb0903181313g3501e20bu1fbabe81190c9a88@mail.gmail.com> <82d3c7587b21972d45039f7cbe04f669.squirrel@webmail.uio.no> <8a11e2fa01503c44bf051dc04a36857a.squirrel@webmail.uio.no> <1cd32cbb0903181858j2ad945b3k2b3bb8f5c236bf0a@mail.gmail.com> <49C22FC2.5090203@molden.no> Message-ID: <49C248F1.7000108@molden.no> So, it seems this version is faster and more memory efficient: def kendalltau_fromct(table): '''return tau-a, tau-b and tau-c from contingency table data example for contingency table: count = np.array([[10, 5, 2], [ 9, 12, 16]]) ''' def _table(j,i): return { #'above' : table[:j,i], 'below' : table[j+1:,i], #'left' : table[j,:i], 'right' : table[j,i+1:], #'upper-left' : table[:j,:i], #'upper-right' : table[:j,i+1:], 'lower-left' : table[j+1:,:i], 'lower-right' : table[j+1:,i+1:] } D = table.shape[0] C = table.shape[1] P = 0 Q = 0 Tx = 0 Ty = 0 for i in range(C): for j in range(D): pivot = table[j,i] # use this as pivot ct = _table(j,i) # split remainder into 8 sections # count concordant pairs # -- multiply pivot with 'lower-right' and summate P += np.sum(ct['lower-right'] * pivot) # count disconcordant pairs # -- multiply pivot with 'lower-left' and summate Q += np.sum(ct['lower-left'] * pivot) # count pairs tied in y # -- multiply pivot with 'right' and summate Ty += np.sum(ct['right'] * pivot) # count pairs tied in x # -- multiply pivot with 'below' and summate Tx += np.sum(ct['below'] * pivot) n = np.sum(table) tau_a = 2*float(P-Q)/(n*(n-1)) tau_b = float(P-Q)/(sqrt((P+Q+Tx)*(P+Q+Ty))) m = C if C < D else D tau_c = (P-Q)*(2*m/float((m-1)*n**2)) return tau_a, tau_b, tau_c If Fortran 95 is difficult to build and maintain, we can do this in Cython as well. There is no need for function calls with array arguments here, so Fortan has no advantage over Cython in this case. 
Something like this: import numpy as np fom math import sqrt cimport numpy as np ctypedef np.int_t int_t cimport cython @cython.boundscheck(False) def kendalltau_fromct(np.ndarray[int_t, ndim=2] table): '''return tau-a, tau-b and tau-c from contingency table data example for contingency table: count = np.array([[10, 5, 2], [ 9, 12, 16]]) ''' cdef int C, D, P, Q, Tx, Ty cdef int i, j, ii, jj, m, n D = table.shape[0] C = table.shape[1] P = 0 Q = 0 Tx = 0 Ty = 0 n = 0 for j in range(D): for i in range(C): pivot = table[j,i] # use this as pivot n = n + pivot # count concordant pairs # -- multiply pivot with 'lower-right' and summate for jj in range(j+1,D): for ii in range(i+1,C): P = P + table[jj,ii] * pivot # count disconcordant pairs # -- multiply pivot with 'lower-left' and summate for jj in range(j+1,D): for ii in range(i): Q = Q + table[jj,ii] * pivot # count pairs tied in y # -- multiply pivot with 'right' and summate for ii in range(i+1,C): Ty = Ty + table[j,ii] * pivot # count pairs tied in x # -- multiply pivot with 'below' and summate for jj in range(j+1,D): Tx = Tx + table[jj,i] * pivot tau_a = 2*float(P-Q)/(n*(n-1)) tau_b = float(P-Q)/(sqrt(float((P+Q+Tx)*(P+Q+Ty)))) m = C if C < D else D tau_c = (P-Q)*(2*m/float((m-1)*n**2)) return tau_a, tau_b, tau_c Now I am done Kendall's tau for sure. Pick whichever you like. Sturla Molden From josef.pktd at gmail.com Thu Mar 19 13:03:16 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 19 Mar 2009 13:03:16 -0400 Subject: [SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau) In-Reply-To: <49C248F1.7000108@molden.no> References: <1237316749.6984.13.camel@rufio-laptop> <49C121BB.3020701@molden.no> <1cd32cbb0903181016y1c44ee7cw2dd46b7b5117a8f0@mail.gmail.com> <49C13500.5030605@molden.no> <1cd32cbb0903181313g3501e20bu1fbabe81190c9a88@mail.gmail.com> <82d3c7587b21972d45039f7cbe04f669.squirrel@webmail.uio.no> <8a11e2fa01503c44bf051dc04a36857a.squirrel@webmail.uio.no> <1cd32cbb0903181858j2ad945b3k2b3bb8f5c236bf0a@mail.gmail.com> <49C22FC2.5090203@molden.no> <49C248F1.7000108@molden.no> Message-ID: <1cd32cbb0903191003y1ea48ca3g331b82ec3aeffa99@mail.gmail.com> On Thu, Mar 19, 2009 at 9:30 AM, Sturla Molden wrote: > > So, it seems this version is faster and more memory efficient: > > def kendalltau_fromct(table): > ? ? '''return tau-a, tau-b and tau-c from contingency table data > > ? ? example for contingency table: > > ? ? count = np.array([[10, ?5, ?2], > ? ? ? ? ? ? ? ? ? ? ? ?[ 9, 12, 16]]) > > ? ? ''' > > ? ? def _table(j,i): > ? ? ? return { > ? ? ? ? ?#'above' : ? ? ? table[:j,i], > ? ? ? ? ?'below' : ? ? ? table[j+1:,i], > ? ? ? ? ?#'left' : ? ? ? ?table[j,:i], > ? ? ? ? ?'right' : ? ? ? table[j,i+1:], > ? ? ? ? ?#'upper-left' : ?table[:j,:i], > ? ? ? ? ?#'upper-right' : table[:j,i+1:], > ? ? ? ? ?'lower-left' : ?table[j+1:,:i], > ? ? ? ? ?'lower-right' : table[j+1:,i+1:] } > > > ? ? D = table.shape[0] > ? ? C = table.shape[1] > ? ? P = 0 > ? ? Q = 0 > ? ? Tx = 0 > ? ? Ty = 0 > ? ? for i in range(C): > ? ? ? ? for j in range(D): > > ? ? ? ? ? ? pivot = table[j,i] # use this as pivot > ? ? ? ? ? ? ct = _table(j,i) ? # split remainder into 8 sections > > ? ? ? ? ? ? # count concordant pairs > ? ? ? ? ? ? # -- multiply pivot with 'lower-right' and summate > ? ? ? ? ? ? P += np.sum(ct['lower-right'] * pivot) > > ? ? ? ? ? ? # count disconcordant pairs > ? ? ? ? ? ? # -- multiply pivot with 'lower-left' and summate > > ? ? ? ? ? ? Q += np.sum(ct['lower-left'] * pivot) > > ? ? ? ? ? ? 
# count pairs tied in y > ? ? ? ? ? ? # -- multiply pivot with 'right' and summate > > ? ? ? ? ? ? Ty += np.sum(ct['right'] * pivot) > > ? ? ? ? ? ? # count pairs tied in x > ? ? ? ? ? ? # -- multiply pivot with 'below' and summate > > ? ? ? ? ? ? Tx += np.sum(ct['below'] * pivot) > > ? ? n = np.sum(table) > ? ? tau_a = 2*float(P-Q)/(n*(n-1)) > ? ? tau_b = float(P-Q)/(sqrt((P+Q+Tx)*(P+Q+Ty))) > ? ? m = C if C < D else D > ? ? tau_c = (P-Q)*(2*m/float((m-1)*n**2)) > ? ? return tau_a, tau_b, tau_c > > > > If Fortran 95 is difficult to build and maintain, we can do this in > Cython as well. There is no need for function calls with array arguments > here, so Fortan has no advantage over Cython in this case. Something > like this: > > > import numpy as np > fom math import sqrt > cimport numpy as np > ctypedef np.int_t int_t > cimport cython > > @cython.boundscheck(False) > def kendalltau_fromct(np.ndarray[int_t, ndim=2] table): > ? ? '''return tau-a, tau-b and tau-c from contingency table data > > ? ? example for contingency table: > > ? ? count = np.array([[10, ?5, ?2], > ? ? ? ? ? ? ? ? ? ? ? ?[ 9, 12, 16]]) > > ? ? ''' > > ? ? cdef int C, D, P, Q, Tx, Ty > ? ? cdef int i, j, ii, jj, m, n > > ? ? D = table.shape[0] > ? ? C = table.shape[1] > ? ? P = 0 > ? ? Q = 0 > ? ? Tx = 0 > ? ? Ty = 0 > ? ? n = 0 > ? ? for j in range(D): > ? ? ? ? for i in range(C): > > ? ? ? ? ? ? pivot = table[j,i] # use this as pivot > ? ? ? ? ? ? n = n + pivot > > ? ? ? ? ? ? # count concordant pairs > ? ? ? ? ? ? # -- multiply pivot with 'lower-right' and summate > ? ? ? ? ? ? for jj in range(j+1,D): > ? ? ? ? ? ? ? ? for ii in range(i+1,C): > ? ? ? ? ? ? ? ? ? ? P = P + table[jj,ii] * pivot > > > ? ? ? ? ? ? # count disconcordant pairs > ? ? ? ? ? ? # -- multiply pivot with 'lower-left' and summate > ? ? ? ? ? ? for jj in range(j+1,D): > ? ? ? ? ? ? ? ? for ii in range(i): > ? ? ? ? ? ? ? ? ? ? Q = Q + table[jj,ii] * pivot > > ? ? ? ? ? ? # count pairs tied in y > ? ? ? ? ? ? # -- multiply pivot with 'right' and summate > ? ? ? ? ? ? for ii in range(i+1,C): > ? ? ? ? ? ? ? ? Ty = Ty + table[j,ii] * pivot > > ? ? ? ? ? ? # count pairs tied in x > ? ? ? ? ? ? # -- multiply pivot with 'below' and summate > ? ? ? ? ? ? for jj in range(j+1,D): > ? ? ? ? ? ? ? ? Tx = Tx + table[jj,i] * pivot > > ? ? tau_a = 2*float(P-Q)/(n*(n-1)) > ? ? tau_b = float(P-Q)/(sqrt(float((P+Q+Tx)*(P+Q+Ty)))) > ? ? m = C if C < D else D > ? ? tau_c = (P-Q)*(2*m/float((m-1)*n**2)) > ? ? return tau_a, tau_b, tau_c > > > Now I am done Kendall's tau for sure. Pick whichever you like. > > > > Sturla Molden > Thanks, this was very instructive, especially seeing a python, a cython and a fortran version next to each other. Your explanation with the 8 regions for a pivot point in the table is very helpful. I looked at the SAS documentation and they define Kendall tau and similar in an equivalent way, but double counting pairs. To be a bit more precise in the comment to Tx and Ty, we should add that the count excludes simultaneous ties in both variables, since this started the entire discussion # Ty: count pairs tied in x and not in y # Ty: count pairs tied in y and not in x I only checked your python version, and all numbers agree with mine and so can be considered as verified with spss and R (for tau-b). The only part that is left to do, is to find the pvalues for tau-a and tau-c and check the discrepancy for tau-b to R. scipy.stats.kendalltau doesn't take ties into account when calculating the variance of tau-b. 
however, quote from the SAS documentation: http://support.sas.com/documentation/cdl/en/statug/59654/HTML/default/statug_freq_a0000000630.htm "Note that the ratio of `est` to `sqrt(variance(est))` is the same for the following measures: gamma, Kendall?s tau-b, Stuart?s tau-c, Somers? D(R|C) , and Somers? D(C|R) . Therefore, the tests for these measures are identical. For example, the p-values for the test of H0:gamma=0 equal the p-values for the test of H0:tau-b=0 " This would imply that we need to calculate the p-value only once. SAS doesn't have information on tau-a. But there calculation of the variance of tau-c could be directly included in the loop that calculates tau-c. The SAS documentation is pretty good. I should have found it earlier, and it would have saved me a lot of time googling. The reference for the calculation for kendall tau-b for the original data series is at http://support.sas.com/documentation/cdl/en/procstat/59629/HTML/default/procstat_corr_sect015.htm (here they count pairs only once) I was not successful in finding the variance, for kendall's tau-a. I think without p-value, we can also skip tau-a since it doesn't seem so popular and include gamma which uses the same calculations as tau. I will open an enhancement ticket for your contingency table version and the extension of the current stats.kendalltau. I'm in favor of the cython version with the python version as reference, but I still have to figure out how to include cython code in scipy. This was a long thread, but I'm glad that the version proliferation and ambiguities for Kendall's tau have been cleared up. Thanks, Josef From sturla at molden.no Thu Mar 19 13:13:58 2009 From: sturla at molden.no (Sturla Molden) Date: Thu, 19 Mar 2009 18:13:58 +0100 Subject: [SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau) In-Reply-To: <1cd32cbb0903191003y1ea48ca3g331b82ec3aeffa99@mail.gmail.com> References: <1237316749.6984.13.camel@rufio-laptop> <49C121BB.3020701@molden.no> <1cd32cbb0903181016y1c44ee7cw2dd46b7b5117a8f0@mail.gmail.com> <49C13500.5030605@molden.no> <1cd32cbb0903181313g3501e20bu1fbabe81190c9a88@mail.gmail.com> <82d3c7587b21972d45039f7cbe04f669.squirrel@webmail.uio.no> <8a11e2fa01503c44bf051dc04a36857a.squirrel@webmail.uio.no> <1cd32cbb0903181858j2ad945b3k2b3bb8f5c236bf0a@mail.gmail.com> <49C22FC2.5090203@molden.no> <49C248F1.7000108@molden.no> <1cd32cbb0903191003y1ea48ca3g331b82ec3aeffa99@mail.gmail.com> Message-ID: <49C27D56.6090906@molden.no> On 3/19/2009 6:03 PM, josef.pktd at gmail.com wrote: > I only checked your python version, and all numbers agree with mine > and so can be considered as verified with spss and R (for tau-b). > I will open an enhancement ticket for your contingency table version > and the extension of the current stats.kendalltau. You don't have to. I just did (ticket #893). I have added Cython versions of kendalltau and kendalltau_fromct. They are more than ten times faster than the Python version I posted. It needs review and decision. I get numbers that look correct to me. http://projects.scipy.org/scipy/ticket/893 Sturla Molden From thouis at broad.mit.edu Thu Mar 19 13:55:58 2009 From: thouis at broad.mit.edu (Thouis (Ray) Jones) Date: Thu, 19 Mar 2009 13:55:58 -0400 Subject: [SciPy-dev] double exponential integrator Message-ID: <6c17e6f50903191055u4fab96bbs787e60e5e060a362@mail.gmail.com> I'm soliciting feedback on an implementation of integration using the double exponential transform. I've tentatively placed it in scipy.integrate as de_integrate. 
It's available at this url and branch: http://broad.mit.edu/~thouis/scipy.git DEintegrator (assuming I've set up my git repository correctly, which is quite possibly not the case.)
See http://www.johndcook.com/double_exponential_integration.html for an explanation of the double exponential transform. Its main use is integrating functions that have a singularity at one or both ends of the integration limits, though it works reasonably well for arbitrary functions; my implementation could be made more robust in many ways. I've also included code for generating the constants necessary for integrations with infinite limits (0 to infinity as well as -inf to +inf). Thouis Jones
From josef.pktd at gmail.com Thu Mar 19 14:04:17 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 19 Mar 2009 14:04:17 -0400 Subject: [SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau) In-Reply-To: <49C27D56.6090906@molden.no> References: <1237316749.6984.13.camel@rufio-laptop> <49C13500.5030605@molden.no> <1cd32cbb0903181313g3501e20bu1fbabe81190c9a88@mail.gmail.com> <82d3c7587b21972d45039f7cbe04f669.squirrel@webmail.uio.no> <8a11e2fa01503c44bf051dc04a36857a.squirrel@webmail.uio.no> <1cd32cbb0903181858j2ad945b3k2b3bb8f5c236bf0a@mail.gmail.com> <49C22FC2.5090203@molden.no> <49C248F1.7000108@molden.no> <1cd32cbb0903191003y1ea48ca3g331b82ec3aeffa99@mail.gmail.com> <49C27D56.6090906@molden.no> Message-ID: <1cd32cbb0903191104h171556e8i9047a5a9288a6925@mail.gmail.com>
On Thu, Mar 19, 2009 at 1:13 PM, Sturla Molden wrote: > On 3/19/2009 6:03 PM, josef.pktd at gmail.com wrote: > >> I only checked your python version, and all numbers agree with mine >> and so can be considered as verified with spss and R (for tau-b). > > > I will open an enhancement ticket for your contingency table version > > and the extension of the current stats.kendalltau. > > You don't have to. I just did (ticket #893). > > I have added Cython versions of kendalltau and kendalltau_fromct. They > are more than ten times faster than the Python version I posted. > > It needs review and decision. I get numbers that look correct to me. > > http://projects.scipy.org/scipy/ticket/893 >
Thank you, I will look at it. Since we needed the conversion between flat and 2d contingency table, I cleaned it up a bit. If you find it useful, we could also include it, it handles only 2d tables, not higher dimensional, but allows for arbitrary category labels. (I reversed x and y from your original version of table2flat.)
Josef def flat2table(x, y, count): '''convert flat contingency table to 2d Parameters ---------- x, y: 1d arrays flattened categories of observation x is row variable y is column variable count: 1d array of cells content Returns ------- --------- ctable: 2d array, contingency table xcat, ycat: 1d arrays labels of x and y categories Examples -------- >>> x = np.array([1,1,1,2,2,2]) >>> y = np.array([1,2,3,1,2,3]) >>> count = np.array([10, 5, 2, 9, 12, 16]) >>> flat2table(x, y, count) [0 0 0 1 1 1] [0 1 2 0 1 2] (array([[ 10., 5., 2.], [ 9., 12., 16.]]), array([1, 2]), array([1, 2, 3])) >>> flat2table(x.astype(str), y.astype(str), count) [0 0 0 1 1 1] [0 1 2 0 1 2] (array([[ 10., 5., 2.], [ 9., 12., 16.]]), array(['1', '2'], dtype='|S1'), array(['1', '2', '3'], dtype='|S1')) ''' catx, xrinv = np.unique1d(x, return_inverse=True) caty, yrinv = np.unique1d(y, return_inverse=True) ncatx = len(catx) ncaty = len(caty) tab = np.zeros((ncatx, ncaty)) tab[xrinv,yrinv] = count return tab, catx, caty def table2flat(ctable, xcat=None, ycat=None): '''convert contingency table to flat format Parmeters --------- ctable: 2d array, contingency table xcat, ycat: 1d arrays labels of x and y categories Returns ------- x, y: 1d arrays flattened categories of observation x is row variable y is column variable count: 1d array of cells content Examples -------- >>> tab = np.array([[ 10., 5., 2.], [ 9., 12., 16.]]) >>> table2flat(tab) (array([0, 0, 0, 1, 1, 1]), array([0, 1, 2, 0, 1, 2]), array([ 10., 5., 2., 9., 12., 16.])) >>> table2flat(tab,np.arange(1,nx+1).astype(str), np.arange(1,ny+1).astype(str)) (array(['1', '1', '1', '2', '2', '2'], dtype='|S1'), array(['1', '2', '3', '1', '2', '3'], dtype='|S1'), array([ 10., 5., 2., 9., 12., 16.])) ''' assert(ctable.ndim == 2) nx = ctable.shape[0] ny = ctable.shape[1] if xcat is None: xcat = range(nx) if ycat is None: ycat = range(ny) y, x = np.meshgrid(ycat, xcat) x = x.flatten() y = y.flatten() count = ctable.flatten() return x, y, count # example and round trip test x = np.array([1,1,1,2,2,2]) y = np.array([1,2,3,1,2,3]) count = np.array([10, 5, 2, 9, 12, 16]) tab = flat2table(x, y, count)[0] nx, ny = tab.shape assert_array_equal((x-1,y-1,count),table2flat(flat2table(x, y, count)[0])) assert_array_equal((x,y,count), table2flat(flat2table(x, y, count)[0],range(1,nx+1), range(1,ny+1))) From pav at iki.fi Fri Mar 20 05:46:42 2009 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 20 Mar 2009 09:46:42 +0000 (UTC) Subject: [SciPy-dev] double exponential integrator References: <6c17e6f50903191055u4fab96bbs787e60e5e060a362@mail.gmail.com> Message-ID: Hi, Thu, 19 Mar 2009 13:55:58 -0400, Thouis (Ray) Jones wrote: > I'm soliciting feedback on an implementation of integration using the > double exponential transform. I've tentatively placed it in > scipy.integrate as de_integrate. +1 for including this in Scipy 0.8.0. Some quick comments (I'll try to find time for better comments later): - Vectorization: you are using a list comprehension in doubleexp.py:contribution_at_level and elsewhere These statements could be vectorized -- I believe you can also require that the integrand function `f` can evaluate many points at the same time and return an array. Could be an useful speedup. - Is generating the _abscissas_raw and _weights_raw costly? I see that you use `mpmath` to prepare these. Does the generation fail in double precision? If not, it might be better to generate them when the integration function is first used. 
(Also, it might be nice to put also the generator functions in the same doubleexp.py; these are not long files.) - Function name `de_integrate`: Perhaps it should be `quad_de` or something similar, since it's a quadrature, and the "basic" quadrature in scipy.integrate is called `quad`. - I'm not sure about if a `full_output` switch is good API. Several Scipy functions do currently use something like that, but perhaps it would be better not to introduce more... > It's available at this url and branch: > http://broad.mit.edu/~thouis/scipy.git DEintegrator > > (assuming I've set up my git repository correctly, which is quite > possibly not the case.) The repository works OK. -- Pauli Virtanen From pav at iki.fi Fri Mar 20 05:52:06 2009 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 20 Mar 2009 09:52:06 +0000 (UTC) Subject: [SciPy-dev] double exponential integrator References: <6c17e6f50903191055u4fab96bbs787e60e5e060a362@mail.gmail.com> Message-ID: Thu, 19 Mar 2009 13:55:58 -0400, Thouis (Ray) Jones wrote: > I'm soliciting feedback on an implementation of integration using the > double exponential transform. I've tentatively placed it in > scipy.integrate as de_integrate. Also, could you file an enhancement ticket on Scipy's Trac: http://projects.scipy.org/scipy/ This should ensure that your contribution does not disappear in the mailing list traffic... -- Pauli Virtanen From josef.pktd at gmail.com Fri Mar 20 11:04:38 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 20 Mar 2009 11:04:38 -0400 Subject: [SciPy-dev] special.multigammaln missing in docs Message-ID: <1cd32cbb0903200804m29f9c200v10df4e040435b605@mail.gmail.com> While searching in trac, I found that in changeset 3181 multigammaln was added to scipy.special. It is missing in the docs for special, special.rst. I don't want to add it myself, because I don't know in which category it belongs. Josef From thouis at broad.mit.edu Fri Mar 20 12:31:16 2009 From: thouis at broad.mit.edu (Thouis (Ray) Jones) Date: Fri, 20 Mar 2009 12:31:16 -0400 Subject: [SciPy-dev] double exponential integrator In-Reply-To: References: <6c17e6f50903191055u4fab96bbs787e60e5e060a362@mail.gmail.com> Message-ID: <6c17e6f50903200931q417d81fbj29fad3ab2961d6fd@mail.gmail.com> Renamed the function to quad_de. Changed its signature to be closer to quadrature(). Improved reporting when the func() returns nonfinite values. Added vectorization (via the vectorize1 functions in quadrature.py. Filed a trac ticket: #895. The generation of the weights and abscissas is not particularly costly, but I believe they do fail in double precision arithmetic. If it's really desirable to have them created dynamically (to allow more levels of integration, for instance), it would probably require writing a Taylor expansion or similar. I've left them in a separate file for now, due to their reliance on mpmath. In the future, this code could accept arbitrary limits (0, 1, or 2 infinite limits) and use whichever abscissa and weight functions are appropriate, with C code to generate them on demand to arbitrary levels. I would like to gauge interest in this code in general before adding more functionality. The interface would be backwards compatible, with possibly one new keyword argument to handle the two different versions of integrals from 0 to infinity (standard versus exponential falloff). 
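For anyone who has not seen the transform written out, here is a minimal, self-contained sketch of tanh-sinh (double exponential) quadrature on a finite interval. It is only an illustration, not the code in the branch: the truncation at |t| <= 3.5, the level-doubling loop and the stopping rule are arbitrary choices made for brevity.

import numpy as np

def tanh_sinh_quad(f, a, b, max_level=10, tol=1e-12):
    """Integrate f over [a, b] with the tanh-sinh (double exponential) rule.

    The nodes x = tanh(pi/2 * sinh(t)) pile up exponentially fast at the
    endpoints, which is why integrable endpoint singularities are handled
    so well.  Purely a sketch: every level rebuilds all nodes instead of
    reusing the previous ones.
    """
    mid, half = 0.5 * (a + b), 0.5 * (b - a)
    prev = None
    for level in range(1, max_level + 1):
        h = 0.5 ** level
        t = np.arange(-3.5, 3.5 + 0.5 * h, h)
        u = 0.5 * np.pi * np.sinh(t)
        x = np.tanh(u)                                  # abscissas in (-1, 1)
        w = 0.5 * np.pi * np.cosh(t) / np.cosh(u) ** 2  # weights, dx/dt
        keep = np.abs(x) < 1.0       # drop nodes that rounded onto an endpoint
        val = half * h * np.sum(w[keep] * f(mid + half * x[keep]))
        if prev is not None and abs(val - prev) <= tol * max(abs(val), 1.0):
            return val
        prev = val
    return prev

# integral of 1/sqrt(x) over (0, 1] is exactly 2; this naive sketch gets it to
# roughly 1e-9 (real implementations do better by tracking 1 - |x| analytically)
print(tanh_sinh_quad(lambda x: 1.0 / np.sqrt(x), 0.0, 1.0))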
Ray Jones On Fri, Mar 20, 2009 at 05:46, Pauli Virtanen wrote: > Hi, > > Thu, 19 Mar 2009 13:55:58 -0400, Thouis (Ray) Jones wrote: >> I'm soliciting feedback on an implementation of integration using the >> double exponential transform. ?I've tentatively placed it in >> scipy.integrate as de_integrate. > > +1 for including this in Scipy 0.8.0. > > Some quick comments (I'll try to find time for better comments later): > > - Vectorization: you are using a list comprehension in > ?doubleexp.py:contribution_at_level and elsewhere > > ?These statements could be vectorized -- I believe you can also require > ?that the integrand function `f` can evaluate many points at the same > ?time and return an array. Could be an useful speedup. > > - Is generating the _abscissas_raw and _weights_raw costly? > > ?I see that you use `mpmath` to prepare these. Does the generation > ?fail in double precision? > > ?If not, it might be better to generate them when the integration > ?function is first used. (Also, it might be nice to put also the > ?generator functions in the same doubleexp.py; these are not long files.) > > - Function name `de_integrate`: Perhaps it should be `quad_de` or > ?something similar, since it's a quadrature, and the "basic" quadrature > ?in scipy.integrate is called `quad`. > > - I'm not sure about if a `full_output` switch is good API. > ?Several Scipy functions do currently use something like that, > ?but perhaps it would be better not to introduce more... > >> It's available at this url and branch: >> http://broad.mit.edu/~thouis/scipy.git DEintegrator >> >> (assuming I've set up my git repository correctly, which is quite >> possibly not the case.) > > The repository works OK. > > -- > Pauli Virtanen > > _______________________________________________ > Scipy-dev mailing list > Scipy-dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From jsseabold at gmail.com Fri Mar 20 13:47:47 2009 From: jsseabold at gmail.com (Skipper Seabold) Date: Fri, 20 Mar 2009 13:47:47 -0400 Subject: [SciPy-dev] GSoC Project Ideas Questions Message-ID: Hello all, I am a PhD student in economics, and I am very interested in getting involved with the GSoC 2009 and most specifically SciPy/Numpy. I was wondering if there was an application template available yet, so I could get started. Also wondering if there were priorities in place for the project ideas. I'm trying to decide where I might be most useful. Any advice would be appreciated. Along this line, does anyone have the source of Jonathan Taylor's statistical models? Links I have seen are broken. (Sorry for the cross post from SoC2009-General.) Best, Skipper Seabold From josef.pktd at gmail.com Fri Mar 20 14:04:17 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 20 Mar 2009 14:04:17 -0400 Subject: [SciPy-dev] GSoC Project Ideas Questions In-Reply-To: References: Message-ID: <1cd32cbb0903201104p1daf0b90u8fe1cf7930c196d2@mail.gmail.com> On Fri, Mar 20, 2009 at 1:47 PM, Skipper Seabold wrote: > Hello all, > > Along this line, does anyone have the source of Jonathan Taylor's > statistical models? ?Links I have seen are broken. ?(Sorry for the > cross post from SoC2009-General.) > stats.models is currently in nipy: https://code.launchpad.net/nipy One branch with some bug fixes is in http://bazaar.launchpad.net/~nipy-developers/nipy/trunk-josef-models/files/head%3A/neuroimaging/fixes//scipy/ nipy trunk and other branches, also have or might have changes to the original models code. 
Josef From berkes at gatsby.ucl.ac.uk Fri Mar 20 14:26:15 2009 From: berkes at gatsby.ucl.ac.uk (Pietro Berkes) Date: Fri, 20 Mar 2009 14:26:15 -0400 Subject: [SciPy-dev] Checking matrix dtype in umfpack.py Message-ID: Dear all, on a Mac OS X 10.5, with the latest scipy SVN version I get a few errors from the sparse package when running scipy.test() . They all look more or less like this: ====================================================================== ERROR: Prefactorize (with UMFPACK) matrix for solving with multiple rhs ---------------------------------------------------------------------- Traceback (most recent call last): File "numpy/testing/decorators.py", line 82, in skipper File "/Users/berkes/local/python_libs/lib/python2.5/site-packages/scipy/sparse/linalg/dsolve/umfpack/tests/test_umfpack.py", line 90, in test_factorized_umfpack solve = linsolve.factorized( a ) File "/Users/berkes/local/python_libs//lib/python2.5/site-packages/scipy/sparse/linalg/dsolve/linsolve.py", line 160, in factorized umf.numeric( A ) File "umfpack/umfpack.py", line 395, in numeric File "umfpack/umfpack.py", line 356, in symbolic File "umfpack/umfpack.py", line 341, in _getIndx ValueError: matrix must have float64 values This is due to a non-robust type checking in umfpack.py in the function _getIndx. I suggest to replace this code if self.isReal: if mtx.data.dtype != nm.dtype(' I've been fiddling with ideas for GSoC related to SciPy and I wanted to run this by people on the list. David C. and others are often complaining that C and Fortran code is an order of magnitude harder to maintain than Python/Cython code. Thus, would there be interest in a proposal that included rewriting Damian Eads' excellent scipy.spatial.distance and scipy.cluster.vq in Cython? I've already been scoping this out as I had wanted to add output matrix functionality to scipy.spatial.pdist and scipy.spatial.cdist, which would make scenarios where distances are recomputed frequently (as in some sort of tracking application) much less memory-intensive. kmeans Also at the back of my mind have been implementing some of the tricks found in the literature for speeding up k-means (optimized versions that take advantage of the triangle inequality, for instance; "online" k-means, by which I mean updating the means with the contribution of each data point sequentially as opposed to considering them all at once). I'd also like to see the addition of exemplar based methods such as k-centers and the relatively new affinity propagation (there is a reference implementation of the latter which would be unsuitable for direct translation from MATLAB due to licensing, so I'd be proposing a clean-room implementation). Any feedback, additional suggestions would be welcome. Thanks, David From cournape at gmail.com Sat Mar 21 01:39:39 2009 From: cournape at gmail.com (David Cournapeau) Date: Sat, 21 Mar 2009 14:39:39 +0900 Subject: [SciPy-dev] special.multigammaln missing in docs In-Reply-To: <1cd32cbb0903200804m29f9c200v10df4e040435b605@mail.gmail.com> References: <1cd32cbb0903200804m29f9c200v10df4e040435b605@mail.gmail.com> Message-ID: <5b8d13220903202239oc2cac0fg29be2a21690fc534@mail.gmail.com> On Sat, Mar 21, 2009 at 12:04 AM, wrote: > While searching in trac, I found that in changeset 3181 ?multigammaln > was added to scipy.special. > > It is missing in the docs for special, special.rst. > > I don't want to add it myself, because I don't know in which category > it belongs. I added it myself (IIRC, I am the one added it). 
I also fixed the docstring to follow the new convention, thanks for the heads-up, David From cournape at gmail.com Sat Mar 21 01:50:19 2009 From: cournape at gmail.com (David Cournapeau) Date: Sat, 21 Mar 2009 14:50:19 +0900 Subject: [SciPy-dev] Another GSoC idea In-Reply-To: References: Message-ID: <5b8d13220903202250m170eabf3mfe9fc35f4bd4273b@mail.gmail.com> Hi David, On Sat, Mar 21, 2009 at 2:01 PM, David Warde-Farley wrote: > I've been fiddling with ideas for GSoC related to SciPy and I wanted > to run this by people on the list. > > David C. and others are often complaining that C and Fortran code is > an order of magnitude harder to maintain than Python/Cython code. > Thus, would there be interest in a proposal that included rewriting > Damian Eads' excellent scipy.spatial.distance and scipy.cluster.vq in > Cython? For scipy.cluster.vq, I already have something in Cython - just not put into scipy because the code is barely "research quality" (whatever that means :) ). But I think it would be less work to improve it than to start from scratch. > > I've already been scoping this out as I had wanted to add output > matrix functionality to scipy.spatial.pdist and scipy.spatial.cdist, > which would make scenarios where distances are recomputed frequently > (as in some sort of tracking application) much less memory-intensive. > kmeans I think this would be a great addition. You are of course free to choose what you work on, but I like the idea of a basic set of recursives implementations of basic statistics and clustering algorithms. I have also myself an implementation of online EM for online estimation of GMM, based on the following preprint: http://www.citeulike.org/user/stibor/article/3245946 But again, "research quality" code. Does this idea of focusing your proposal on the recursive side of things sounds appealing ? cheers, David From david at ar.media.kyoto-u.ac.jp Sat Mar 21 03:58:00 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sat, 21 Mar 2009 16:58:00 +0900 Subject: [SciPy-dev] Will scipy 0.8 depend on numpy 1.3.0 ? Message-ID: <49C49E08.4000201@ar.media.kyoto-u.ac.jp> Hi, Everything in the title: will scipy 0.8 depend on numpy 1.3 or do we still want to maintain compatibility with the 1.2 serie ? cheers, David From pav at iki.fi Sat Mar 21 06:03:53 2009 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 21 Mar 2009 10:03:53 +0000 (UTC) Subject: [SciPy-dev] Will scipy 0.8 depend on numpy 1.3.0 ? References: <49C49E08.4000201@ar.media.kyoto-u.ac.jp> Message-ID: Sat, 21 Mar 2009 16:58:00 +0900, David Cournapeau wrote: > Everything in the title: will scipy 0.8 depend on numpy 1.3 or do we > still want to maintain compatibility with the 1.2 serie ? Depending on 1.3 would have the advantage of being able to use the NPY_NAN etc. core math symbols in scipy.special. -- Pauli Virtanen From david at ar.media.kyoto-u.ac.jp Sat Mar 21 09:36:50 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sat, 21 Mar 2009 22:36:50 +0900 Subject: [SciPy-dev] Will scipy 0.8 depend on numpy 1.3.0 ? In-Reply-To: References: <49C49E08.4000201@ar.media.kyoto-u.ac.jp> Message-ID: <49C4ED72.6060409@ar.media.kyoto-u.ac.jp> Pauli Virtanen wrote: > Sat, 21 Mar 2009 16:58:00 +0900, David Cournapeau wrote: > >> Everything in the title: will scipy 0.8 depend on numpy 1.3 or do we >> still want to maintain compatibility with the 1.2 serie ? >> > > Depending on 1.3 would have the advantage of being able to use the > NPY_NAN etc. core math symbols in scipy.special. 
> Yes and no - although the core math library is independent from the rest of numpy/core by design, it is not yet made available to other extensions. The library is neither installed nor can an extension get the necessary flags to link against it. I was planning on doing this in 1.4 As a temporary gap, we could copy the library sources in scipy/special in the meantime - that's ugly, but better than the current situation in scipy/special (which incidentally cause trouble for scipy on win64). Or I could make sure the library is available in numpy 1.3, but I would prefer having time to think about a general mechanism to share libraries between numpy and other extensions in numpy.distutils, cheers, David From mellerf at netvision.net.il Sat Mar 21 12:06:09 2009 From: mellerf at netvision.net.il (Yosef Meller) Date: Sat, 21 Mar 2009 18:06:09 +0200 Subject: [SciPy-dev] Tests for optimize.fsolve Message-ID: <49C51071.4090901@netvision.net.il> I'm looking in optimize/test_optimize/ and see no test for optimize.fsolve(). Is it because there isn't any or because I'm looking at the wrong place or something else? Thanks. From pav at iki.fi Sat Mar 21 12:58:13 2009 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 21 Mar 2009 16:58:13 +0000 (UTC) Subject: [SciPy-dev] Tests for optimize.fsolve References: <49C51071.4090901@netvision.net.il> Message-ID: Sat, 21 Mar 2009 18:06:09 +0200, Yosef Meller wrote: > I'm looking in optimize/test_optimize/ and see no test for > optimize.fsolve(). Is it because there isn't any or because I'm looking > at the wrong place or something else? If there are no tests under optimize/tests/ for fsolve (there indeed don't appear to be any), this very likely means that there are no tests anywhere. Certainly the current situation re testing needs improvement. If you (or someone else) wants to write tests for fsolve or the other routines, this would be a useful contribution. -- Pauli Virtanen From mellerf at netvision.net.il Sat Mar 21 14:17:40 2009 From: mellerf at netvision.net.il (Yosef Meller) Date: Sat, 21 Mar 2009 20:17:40 +0200 Subject: [SciPy-dev] Tests for optimize.fsolve In-Reply-To: References: <49C51071.4090901@netvision.net.il> Message-ID: <49C52F44.40708@netvision.net.il> ????? Pauli Virtanen: > Sat, 21 Mar 2009 18:06:09 +0200, Yosef Meller wrote: >> I'm looking in optimize/test_optimize/ and see no test for >> optimize.fsolve(). Is it because there isn't any or because I'm looking >> at the wrong place or something else? > > If there are no tests under optimize/tests/ for fsolve (there indeed > don't appear to be any), this very likely means that there are no tests > anywhere. > > Certainly the current situation re testing needs improvement. If you (or > someone else) wants to write tests for fsolve or the other routines, this > would be a useful contribution. That's the reason I'm asking (but no promises). From gael.varoquaux at normalesup.org Sun Mar 22 01:38:00 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sun, 22 Mar 2009 06:38:00 +0100 Subject: [SciPy-dev] Will scipy 0.8 depend on numpy 1.3.0 ? 
In-Reply-To: <49C4ED72.6060409@ar.media.kyoto-u.ac.jp> References: <49C49E08.4000201@ar.media.kyoto-u.ac.jp> <49C4ED72.6060409@ar.media.kyoto-u.ac.jp> Message-ID: <20090322053800.GA20988@phare.normalesup.org> On Sat, Mar 21, 2009 at 10:36:50PM +0900, David Cournapeau wrote: > I could make sure the library is available in numpy 1.3, but I would > prefer having time to think about a general mechanism to share libraries > between numpy and other extensions in numpy.distutils, That would sure be very useful. Ga?l From millman at berkeley.edu Sun Mar 22 02:41:48 2009 From: millman at berkeley.edu (Jarrod Millman) Date: Sat, 21 Mar 2009 23:41:48 -0700 Subject: [SciPy-dev] Will scipy 0.8 depend on numpy 1.3.0 ? In-Reply-To: <49C49E08.4000201@ar.media.kyoto-u.ac.jp> References: <49C49E08.4000201@ar.media.kyoto-u.ac.jp> Message-ID: On Sat, Mar 21, 2009 at 12:58 AM, David Cournapeau wrote: > ? ?Everything in the title: will scipy 0.8 depend on numpy 1.3 or do we > still want to maintain compatibility with the 1.2 serie ? I think the general policy should be that new releases of scipy require the most recently released version of numpy. So I wouldn't require scipy 0.8 to be compatible with numpy 1.2.x. From tom.grydeland at gmail.com Sun Mar 22 07:31:27 2009 From: tom.grydeland at gmail.com (Tom Grydeland) Date: Sun, 22 Mar 2009 12:31:27 +0100 Subject: [SciPy-dev] Has IPython been useful to you? Please let me know... In-Reply-To: References: Message-ID: Hello, Mr. Perez On Mon, Mar 16, 2009 at 5:42 AM, Fernando Perez wrote: > Hi all, > So, if you have used IPython and it has made a significant > contribution to your project, work, research, company, whatever, I'd > be very grateful if you let me know. ?A short paragraph on what this > benefit has been is all I ask. ?Once I gather any information I get, I > would contact directly some of the responders to ask for your > authorization before quoting you. My name is Tom Grydeland, I am PhD of physics, working in a small petroleum exploration company in Norway called Discover Petroleum. As part of expanding our collection of exploration tools, we have a significant research and development effort, and the majority of our in-house development uses Python with NumPy and SciPy. Without the availability of a convenient cross-platform interactive environment such as that provided by IPython, the choice of Python as a language would have been less obvious. While there are alternatives, they typically are much more expensive, or inconvenient, or both. Thank you so much for your efforts, they are much appreciated. > Best regards, > > Fernando Perez. Thanks, and best of luck with your application. -- Tom Grydeland From mellerf at netvision.net.il Sun Mar 22 15:26:42 2009 From: mellerf at netvision.net.il (Yosef Meller) Date: Sun, 22 Mar 2009 21:26:42 +0200 Subject: [SciPy-dev] Initial tests for optimize.fsolve() Message-ID: <49C690F2.3070906@netvision.net.il> Added a test problem and two initial tests that just check that nothing croaks: a run without a jacobian and a run with a jacobian. Now, I kind of lost track of the whole workflow discussion, so please tell me what process to follow to get this in. Also, is the huge docstring in TestFSolve.pressure_network() an overkill? Yours, Yosef. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Initial-tests-for-optimize_fsolve.patch Type: text/x-diff Size: 2972 bytes Desc: not available URL: From pav at iki.fi Sun Mar 22 16:33:07 2009 From: pav at iki.fi (Pauli Virtanen) Date: Sun, 22 Mar 2009 20:33:07 +0000 (UTC) Subject: [SciPy-dev] Initial tests for optimize.fsolve() References: <49C690F2.3070906@netvision.net.il> Message-ID: Sun, 22 Mar 2009 21:26:42 +0200, Yosef Meller wrote: > Added a test problem and two initial tests that just check that nothing > croaks: a run without a jacobian and a run with a jacobian. > > Now, I kind of lost track of the whole workflow discussion, so please > tell me what process to follow to get this in. In general: open a ticket, attach the patch (or, even better, post it to the codereview site and paste the URL to the ticket, or branch your own Git clone and paste an URL to it), finally marking the ticket as needs review. And maybe, ping this mailing list, too. Or, you can just send the patch to this mailing list, but then it's quite possible that it gets lost in the traffic and people forget about it. This one was a trivial change, so I just committed it directly (r5633). > Also, is the huge docstring in TestFSolve.pressure_network() an > overkill? The docstring is ok, not too big, though maybe not indispensable. -- Pauli Virtanen From stefan at sun.ac.za Sun Mar 22 16:46:31 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Sun, 22 Mar 2009 22:46:31 +0200 Subject: [SciPy-dev] Initial tests for optimize.fsolve() In-Reply-To: <49C690F2.3070906@netvision.net.il> References: <49C690F2.3070906@netvision.net.il> Message-ID: <9457e7c80903221346n478f328bg9b2812e52a2c1146@mail.gmail.com> Hi Yosef 2009/3/22 Yosef Meller : > Added a test problem and two initial tests that just check that nothing > croaks: a run without a jacobian and a run with a jacobian. Thanks for your contribution! > Now, I kind of lost track of the whole workflow discussion, so please tell > me what process to follow to get this in. Attach your patch to a ticket, and mark the ticket as "Ready for Review". > Also, is the huge docstring in TestFSolve.pressure_network() an overkill? That's an interesting test case! The docstring is informative, so I don't think we need to remove it. Regards St?fan P.S. If you are interested, here are some minor nitpicks about formatting. I don't include it in the main message, because it won't make the difference between a positive and negative review: The paragraph with the formulas can be marked up with two colons: + the pressures and flows in a system of n parallel pipes:: + + f_i = P_i - P_0, for i = 1..n + f_0 = sum(Q_i) - Qtot Remember the space after the paramter name: + flow_rates: float -> flow_rate : float Sentences are capitalised with full stops: + A 1D array of n flow rates [kg/s]. According to PEP08, spaces should be inserted between operators (although you'll see this "rule" being broken all over SciPy): + P = k*flow_rates**2 -> k * flow_rates**2 I guess that could also be k * flow_rates ** 2, but that doesn't feel quite right. Remove the extraneous whitespace at the end and beginning of certain lines. + jac[:n-1,:n-1] = pdiff + jac[:n-1,n-1] = 0 + jac[n-1,:] = np.ones(n) Do not align equal marks (according to PEP08). 
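For anyone reading along without the attached patch, a self-contained sketch of this kind of pressure-network problem fed to fsolve; the n = 4 setup, the constants and the function names are illustrative, not the committed test.

import numpy as np
from scipy.optimize import fsolve

def pressure_network(flow_rates, Qtot, k):
    """Residuals for n parallel pipes that share a total flow Qtot.

    The pressure drop over pipe i is taken as k[i] * Q_i**2; in a parallel
    network all drops must be equal and the flows must add up to Qtot.
    """
    P = k * flow_rates ** 2
    resid = np.empty_like(flow_rates)
    resid[:-1] = P[1:] - P[0]              # equal pressure drop in every pipe
    resid[-1] = flow_rates.sum() - Qtot    # conservation of mass
    return resid

def pressure_network_jacobian(flow_rates, Qtot, k):
    n = len(flow_rates)
    dP = 2 * k * flow_rates
    jac = np.zeros((n, n))
    jac[:n - 1, 1:] = np.diag(dP[1:])      # d resid_i / d Q_(i+1)
    jac[:n - 1, 0] = -dP[0]                # d resid_i / d Q_0
    jac[n - 1, :] = 1.0                    # mass-balance row
    return jac

k = 0.5 * np.ones(4)
Qtot = 4.0
guess = 2.0 * np.ones(4)
flows = fsolve(pressure_network, guess, args=(Qtot, k),
               fprime=pressure_network_jacobian)
# identical pipes split the flow evenly, so flows comes back as [1. 1. 1. 1.]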
From mellerf at netvision.net.il Sun Mar 22 17:27:40 2009 From: mellerf at netvision.net.il (Yosef Meller) Date: Sun, 22 Mar 2009 23:27:40 +0200 Subject: [SciPy-dev] Initial tests for optimize.fsolve() In-Reply-To: <9457e7c80903221346n478f328bg9b2812e52a2c1146@mail.gmail.com> References: <49C690F2.3070906@netvision.net.il> <9457e7c80903221346n478f328bg9b2812e52a2c1146@mail.gmail.com> Message-ID: <49C6AD4C.3090503@netvision.net.il> ????? St?fan van der Walt: >> Now, I kind of lost track of the whole workflow discussion, so please tell >> me what process to follow to get this in. > > Attach your patch to a ticket, and mark the ticket as "Ready for Review". Like this? http://projects.scipy.org/scipy/ticket/897 > P.S. If you are interested, here are some minor nitpicks about > formatting. I don't include it in the main message, because it won't > make the difference between a positive and negative review: [snip] > Do not align equal marks (according to PEP08). Thanks, I applied what wasn't done for me. As for style, I think aligning equal marks sometimes makes it visually easier to read code (like a table), but this is too small to argue about :) From stefan at sun.ac.za Sun Mar 22 18:43:17 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Mon, 23 Mar 2009 00:43:17 +0200 Subject: [SciPy-dev] Initial tests for optimize.fsolve() In-Reply-To: <49C6AD4C.3090503@netvision.net.il> References: <49C690F2.3070906@netvision.net.il> <9457e7c80903221346n478f328bg9b2812e52a2c1146@mail.gmail.com> <49C6AD4C.3090503@netvision.net.il> Message-ID: <9457e7c80903221543g6081956u6086c9d5a133332a@mail.gmail.com> 2009/3/22 Yosef Meller : > Like this? > http://projects.scipy.org/scipy/ticket/897 Thanks, your tests are included now! See revisions 5633, 5634, 5635. Cheers St?fan From rjsm at umich.edu Sun Mar 22 22:11:18 2009 From: rjsm at umich.edu (ross smith) Date: Sun, 22 Mar 2009 22:11:18 -0400 Subject: [SciPy-dev] Porting SciPy to Py3k GSOC project Message-ID: <73531abb0903221911k1d862881q9db5f387fa93bb39@mail.gmail.com> Hello everyone, I am interested in porting SciPy/NumPy to Py3k. I've been working this past school year to port an existing code base to py3k for a research group on campus. A lot of the code relies on SciPy and NumPy but the scope of my project didn't let me work on porting either project, to my dismay. I'd love the opportunity to port a project I use heavily in my own code and gain a better understanding of how it works. We are supposed to contact the group we would be working with, to flesh out the details of our application. I've looked at the application and the only thing I know I'll need significant help with is the Milestones portion. Of course, Any and all suggestions are welcome! thank you, Ross Smith (Gaurdro on Freenode) -------------- next part -------------- An HTML attachment was scrubbed... URL: From mattknox.ca at gmail.com Mon Mar 23 13:56:30 2009 From: mattknox.ca at gmail.com (Matt Knox) Date: Mon, 23 Mar 2009 17:56:30 +0000 (UTC) Subject: [SciPy-dev] recommended compiler for distributing scikits windows binaries Message-ID: Pierre and I are preparing to do a first official release of the timeseries scikit once numpy 1.3 is released and I was just wondering what compiler I should be using to distribute binaries for windows. What compiler is being used to build the official numpy binaries? MinGW? or MSVC? I assume it is recommended to use the same compiler for building numpy C extensions that was used to build numpy itself? 
Also, do I need to compile separate versions using each of the binaries included in the numpy superpack installers and thus create a "superpack" of my own for the C extensions to work with all versions? or is that only needed for the lower level numpy library itself? The C code in question doesn't do anything fancy, just uses the basic numpy C api. And will binaries compiled against the current numpy 1.3 beta version continue to work properly with the final released version? Any help would be greatly appreciated. Thanks. - Matt From dpeterson at enthought.com Mon Mar 23 16:46:40 2009 From: dpeterson at enthought.com (Dave Peterson) Date: Mon, 23 Mar 2009 15:46:40 -0500 Subject: [SciPy-dev] ANNOUNCE: ETS 3.2.0 Released Message-ID: <49C7F530.5020700@enthought.com> Hello, I'm pleased to announce that Enthought Tool Suite (ETS) version 3.2.0 has been tagged and released! Source distributions (.tar.gz) have been uploaded to PyPi, and Windows binaries will be follow shortly. A full install of ETS can be done using Setuptools via a command like: easy_install -U "ets[nonets] >= 3.2.0" NOTE 1: Users of an old ETS release will need to first uninstall prior to installing the new ETS. NOTE 2: If you get a 'SandboxViolation' error, simply re-run the command again -- it may take multiple invocations to get everything installed. (This error appears to be a long-standing incompatibility between numpy.distutils and setuptools.) Please see below for a list of what's new in this release. What Is ETS? =========== The Enthought Tool Suite (ETS) is a collection of components developed by Enthought and the open-source community, which we use every day to construct custom scientific applications. It includes a wide variety of components, including: * an extensible application framework * application building blocks * 2-D and 3-D graphics libraries * scientific and math libraries * developer tools The cornerstone on which these tools rest is the Traits package, which provides explicit type declarations in Python; its features include initialization, validation, delegation, notification, and visualization of typed attributes. More information on ETS is available from the development home page: http://code.enthought.com/projects/index.php Changelog ========= ETS 3.2.0 is a feature-added update to ETS 3.1.0, including numerous bug-fixes. Some of the notable changes include: Chaco ----- * Domain limits - Mappers now can declare the "limits" of their valid domain. PanTool and ZoomTool respect these limits. (pwang) * Adding "hide_grids" parameter to Plot.img_plot() and Plot.contour_plot() so users can override the default behavior of hiding grids. (pwang) * Refactored examples to declare a Demo object so they can be be run with the demo.py example launcher. (vibha) * Adding chaco.overlays package with some canned SVG overlays. (bhendrix) * DragZoom now can scale both X and Y axes independently corresponding to the mouse cursor motion along the X and Y axes (similar to the zoom behavior in Matplotlib). (pwang) * New Examples: * world map (bhendrix) * more financial plots (pwang) * scatter_toggle (pwang) * stacked_axis (pwang) * Fixing the chaco.scales TimeFormatter to use the built-in localtime() instead of the one in the safetime.py module due to Daylight Savings Time issues with timedelta. 
(r23231, pwang) * Improved behavior of ScatterPlot when it doesn't get the type of metadata it expects in its "selections" and "selection_masks" metadata keys (r23121, pwang) * Setting the .range2d attribute on GridMapper now properly sets the two DataRange1D instances of its sub-mappers. (r23119, pwang) * ScatterPlot.map_index() now respects the index_only flag (r23060, pwang) * Fixed occasional traceback/bug in LinePlot that occurred when data was completely outside the visible range (r23059, pwang) * Implementing is_in() on legends to account for padding and alignment (caused by tools that move the legend) (r23052, bhendrix) * Legend behaves properly when there are no plots to display (r23012, judah) * Fixed LogScale in the chaco.scales package to correctly handle the case when the length of the interval is less than a decade (r22907, warren.weckesser) * Fixed traceback when calling copy_traits() on a DataView (r22894, vibha) * Scatter plots generated by Plot.plot() now properly use the "auto" coloring feature of Plot. (r22727, pwang) * Reduced the size of screenshots in the user manual. (r22720, rkern) Mayavi ------ * 17, 18 March, 2009 (PR): * NEW: A simple example to show how one can use TVTK?s visual module with mlab. [23250] * BUG: The size trait was being overridden and was different from the parent causing a bug with resizing the viewer. [23243] * 15 March, 2009 (GV): * ENH: Add a volume factory to mlab that knows how to set color, vmin and vmax for the volume module [23221]. * 14 March, 2009 (PR): * API/TEST: Added a new testing entry point: ?mayavi -t? now runs tests in separate process, for isolation. Added enthought.mayavi.api.test to allow for simple testing from the interpreter [23195]...[23200], [23213], [23214], [23223]. * BUG: The volume module was directly importing the wx_gradient_editor leading to an import error when no wxPython is available. This has been tested and fixed. Thanks to Christoph Bohme for reporting this issue. [23191] * 14 March, 2009 (GV): * BUG: [mlab]: fix positioning for titles [23194], and opacity for titles and text [23193]. * ENH: Add the mlab_source attribute on all objects created by mlab, when possible [23201], [23209]. * ENH: Add a message to help the first-time user, using the new banner feature of the IPython shell view [23208]. * 13 March, 2009 (PR): * NEW/API: Adding a powerful TCP/UDP server for scripting mayavi via the network. This is available in enthought.mayavi.tools.server and is fully documented. It uses twisted and currently only works with wxPython. It is completely insecure though since it allows a remote user to do practically anything from mayavi. * 13 March, 2009 (GV) * API: rename mlab.orientationaxes to mlab.orientation_axes [23184] * 11 March, 2009 (GV) * API: Expose ?traverse? in mlab.pipeline [23181] * 10 March, 2009 (PR) * BUG: Fixed a subtle bug that affected the ImagePlaneWidget. This happened because the scalar_type of the output data from the VTKDataSource was not being set correctly. Getting the range of any input scalars also seems to silence warnings from VTK. This should hopefully fix issues with the use of the IPW with multiple scalars. I?ve added two tests for this, one is an integration test since those errors really show up only when the display is used. The other is a traditional unittest. 
[23166] * 08 March, 2009 (GV) * ENH: Raises an error when the user passes to mlab an array with infinite values [23150] * 07 March, 2009 (PR) * BUG: A subtle bug with a really gross error in the GridPlane component, I was using the extents when I should really have been looking at the dimensions. The extract grid filter was also not flushing the data changes downstream leading to errors that are also fixed now. These errors would manifest when you use an ExtractGrid to select a VOI or a sample rate and then used a grid plane down stream causing very wierd and incorrect rendering of the grid plane (thanks to conflation of extents and dimensions). This bug was seen at NAL for a while and also reported by Fred with a nice CME. The CME was then converted to a nice unittest by Suyog and then improved. Thanks to them all. [23146] * 28 February, 2009 (PR) * BUG: Fixed some issues reported by Ondrej Certik regarding the use Of mlab.options.offscreen, mlab.options.backend = ?test?, removed cruft from earlier ?null? backend, fixed bug with incorrect imports, add_dataset set no longer adds one new null engine each time figure=False is passed, added test case for the options.backend test. [23088] * 23 February, 2009 (PR) * ENH: Updating show so that it supports a stop keyword argument that pops up a little UI that lets the user stop the mainloop temporarily and continue using Python [23049] * 21 February, 2009 (GV) * ENH: Add a richer view for the pipeline to the MayaviScene [23035] * ENH: Add safegards to capture wrong triangle array sizes in mlab.triangular_mesh_source. [23037] * 21 February, 2009 (PR) * ENH: Making the transform data filter recordable. [23033] * NEW: A simple animator class to make it relatively to create animations. [23036] [23039] * 20 February, 2009 (PR) * ENH: Added readers for various image file formats, poly data readers and unstructured grid readers. These include DICOM, GESigna, DEM, MetaImage (mha,mhd) MINC, AVSucd, GAMBIT, Exodus, STL, Points, Particle, PLY, PDB, SLC, OBJ, Facet and BYU files. Also added several tests for most of this functionality along with small data files. These are additions from PR?s project staff, Suyog Jain and Sreekanth Ravindran. [23013] * ENH: We now change the default so the ImagePlaneWidget does not control the LUT. Also made the IPW recordable. [23011] * 18 February, 2009 (GV) * ENH: Add a preference manager view for editing preferences outside envisage [22998] * 08 February, 2009 (GV) * ENH: Center the glyphs created by barchart on the data points, as mentioned by Rauli Ruohonen [22906] * 29 January, 2009 (GV) * ENH: Make it possible to avoid redraws with mlab by using mlab.gcf().scene.disable_render = True [22869] * 28 January, 2009 (PR and GV) * ENH: Make the mlab.pipeline.user_defined factory function usable to add arbitrary filters on the pipeline. [22867], [22865] * 11 January, 2009 (GV) * ENH: Make mlab.imshow use the ImageActor. Enhance the ImageActor to map scalars to colors when needed. [22816] Traits ------ * Fixed a bug whereby faulty error handling in the PyProtocols Pyrex speedup code keeps references to tracebacks that have been handled. In so doing, clean up the same code such that it can be used with a modern Pyrex release (a bare raise can no longer be used outside of an except: clause). 
* RangeEditor factory now supports a 'logslider' mode: Thanks to Matthew Turk for the patch * TabularEditor factory now supports editing of all columns: Thanks to Didrik Pinte for the patch * DateEditor factory in 'custom' style now supports multi-select feature. * DateEditor and TimeEditor now support the 'readonly' style. * Fixed a bug in the ArrayEditor factory that was causing multiple trait change events to get fired when the underlying array is changed externally to the editor: Thanks to Matthew Turk for he patch. * Fixed a circular import error in Color, Font and RGBColor traits * Fixed a bug in the factory for ArrayViewEditor so it now calls the toolkit backend-specific editor TraitsBackendWX --------------- * RangeEditor now supports a 'logslider' mode: Thanks to Matthew Turk for the patch * TabularEditor now supports editing of all columns: Thanks to Didrik Pinte for the patch * DateEditor in 'custom' style now supports multi-select feature. * DateEditor and TimeEditor now support the 'readonly' style. * Added a trait to the wx pyface workbench View to indicate if the view dock window should be closeable. * Fixed the DirectoryEditor to popup the correct file dialog (thanks to Luca Fasano and Phil Thompson) * Fixed a circular import error in Color, Font and RGBColor traits * Fixed a bug in the ColorEditor that was causing the revert action to not work correctly. * Fixed a bug that caused a traceback when trying to undock a pyface dock window * Fixed a bug in the 'livemodal' view that caused the UI to become unresponsive if the 'updated' event was fired on the contained view. * Fixed bugs in ListEditor (notebook style) that caused a loss of sync between the 'selected' trait and the activated dock window. TraitsBackendQt --------------- * RangeEditor now supports a 'logslider' mode: Thanks to Matthew Turk for the patch * Fixed the DirectoryEditor to popup the correct file dialog (thanks to Luca Fasano and Phil Thompson) From charlesr.harris at gmail.com Mon Mar 23 16:47:38 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 23 Mar 2009 14:47:38 -0600 Subject: [SciPy-dev] Porting SciPy to Py3k GSOC project In-Reply-To: <73531abb0903221911k1d862881q9db5f387fa93bb39@mail.gmail.com> References: <73531abb0903221911k1d862881q9db5f387fa93bb39@mail.gmail.com> Message-ID: Hi Ross, 2009/3/22 ross smith > Hello everyone, > > I am interested in porting SciPy/NumPy to Py3k. I've been working this > past school year to port an existing code base to py3k for a research group > on campus. A lot of the code relies on SciPy and NumPy but the scope of my > project didn't let me work on porting either project, to my dismay. I'd > love the opportunity to port a project I use heavily in my own code and gain > a better understanding of how it works. > > We are supposed to contact the group we would be working with, to flesh out > the details of our application. I've looked at the application and the only > thing I know I'll need significant help with is the Milestones portion. Of > course, Any and all suggestions are welcome! > Do you have a plan of attack? What all does your experience suggest will be needed? I think if you can end the project with a report on your experience and a list of things that needed to be done that that would be helpful in itself. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cournape at gmail.com Mon Mar 23 21:05:59 2009 From: cournape at gmail.com (David Cournapeau) Date: Tue, 24 Mar 2009 10:05:59 +0900 Subject: [SciPy-dev] recommended compiler for distributing scikits windows binaries In-Reply-To: References: Message-ID: <5b8d13220903231805y7aff2e27ke25ad65e7ec3ac96@mail.gmail.com> On Tue, Mar 24, 2009 at 2:56 AM, Matt Knox wrote: > Pierre and I are preparing to do a first official release of the timeseries > scikit once numpy 1.3 is released and I was just wondering what compiler I > should be using to distribute binaries for windows. > > What compiler is being used to build the official numpy binaries? MinGW? or > MSVC? Mingw (the official release, that is 3.*-based). > I assume it is recommended to use the same compiler for building numpy C > extensions that was used to build numpy itself? Yes and no. Yes, it is better to use the same compiler - but we already break this rule with numpy, since numpy and python do not use the same compiler. The problem with MS compilers is that you need to use a different compiler for every python version; in particular, there is no free version of VS 2003.net (the one used for python 2.4 and 2.5), and using VS 2005 or later will definitely not work. Actually, the problem is not so much the compiler as using the same C runtime. > Also, do I need to compile separate versions using each of the binaries included > in the numpy superpack installers and thus create a "superpack" of my own for > the C extensions to work with all versions? No - but build against numpy 1.3, to avoid getting special optimization flags which may render your binary useless for low-spec machines. > And will binaries compiled against the current > numpy 1.3 beta version continue to work properly with the final released version? They should. cheers, David From mattknox.ca at gmail.com Mon Mar 23 22:14:43 2009 From: mattknox.ca at gmail.com (Matt Knox) Date: Tue, 24 Mar 2009 02:14:43 +0000 (UTC) Subject: [SciPy-dev] =?utf-8?q?recommended_compiler_for_distributing_sciki?= =?utf-8?q?ts=09windows_binaries?= References: <5b8d13220903231805y7aff2e27ke25ad65e7ec3ac96@mail.gmail.com> Message-ID: > > What compiler is being used to build the official numpy binaries? MinGW? or > > MSVC? > > Mingw (the official release, that is 3.*-based). Thanks for the info David. I'll just use MinGW then to be consistent with numpy. No sense adding even more variation into the mix. - Matt From martyfuhry at gmail.com Mon Mar 23 23:20:50 2009 From: martyfuhry at gmail.com (Marty Fuhry) Date: Mon, 23 Mar 2009 23:20:50 -0400 Subject: [SciPy-dev] Summer of Code: Proposal for Implementing date/time types in NumPy Message-ID: Hello, I was reading through the Summer of Code ideas and I'm terribly interested in the date/time proposal (http://projects.scipy.org/numpy/browser/trunk/doc/neps/datetime-proposal3.rst). I would love to work on this for a Google Summer of Code project. I'm a sophomore studying Computer Science and Mathematics at Kent State University in Ohio, so this project directly relates to my studies. Is there anyone looking into this proposal yet? Thank you.
-Marty Fuhry From rjsm at umich.edu Tue Mar 24 01:17:36 2009 From: rjsm at umich.edu (ross smith) Date: Tue, 24 Mar 2009 01:17:36 -0400 Subject: [SciPy-dev] Porting SciPy to Py3k GSOC project In-Reply-To: <73531abb0903232212i6e004447nbc423b926a94cf39@mail.gmail.com> References: <73531abb0903221911k1d862881q9db5f387fa93bb39@mail.gmail.com> <73531abb0903232212i6e004447nbc423b926a94cf39@mail.gmail.com> Message-ID: <73531abb0903232217i5cb12570gccbc92b7b25dca7@mail.gmail.com> Hello again, > > I don't know if my second email made it through the moderator (it was too > large). > > 2009/3/23 Charles R Harris > >> Hi Ross, >> >> 2009/3/22 ross smith >> >>> Hello everyone, >>> >>> I am interested in porting SciPy/NumPy to Py3k. I've been working this >>> past school year to port an existing code base to py3k for a research group >>> on campus. A lot of the code relies on SciPy and NumPy but the scope of my >>> project didn't let me work on porting either project, to my dismay. I'd >>> love the opportunity to port a project I use heavily in my own code and gain >>> a better understanding of how it works. >>> >>> We are supposed to contact the group we would be working with, to flesh >>> out the details of our application. I've looked at the application and the >>> only thing I know I'll need significant help with is the Milestones >>> portion. Of course, any and all suggestions are welcome! >>> >> >> Do you have a plan of attack? >> > > I do. I've done some looking through the trunk svn, and I see three > chunks to the project: the Distutils, NumPy and SciPy. Distutils would be > first on the list to be ported as the other two won't install without it. > For the most part I plan to stick to the outline in python's suggested > porting method as it worked well for the Lab's codebase. ( > http://docs.python.org/3.0/whatsnew/3.0.html) > > What all does your experience suggest will be needed? >> > > Most of the code will require minor or stylistic changes here and > there. The two issues that require much more work are any __cmp__ methods > and any questionable coding practices. I've found that the things that 2to3 > (the provided auto-converter) chokes on are things that shouldn't have made > it to production in the first place. Once it's been run through 2to3, the > bugs and errors that pop up don't seem to follow much of a pattern and I > expect much more work per bug or error in this auto-converted code. > > I think if you can end the project with a report on your experience and a >> list of things that needed to be done, that would be helpful in itself. >> >> Chuck >> > > Thanks for the suggestions. I have one other question, are there unit > tests available somewhere that I haven't seen? Looking under the tests and > testing folders in the source tree turned up a fairly lean set of tests. > > -Ross > >> >> >> >> _______________________________________________ >> Scipy-dev mailing list >> Scipy-dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From charlesr.harris at gmail.com Tue Mar 24 01:35:48 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 23 Mar 2009 23:35:48 -0600 Subject: [SciPy-dev] Porting SciPy to Py3k GSOC project In-Reply-To: <73531abb0903232217i5cb12570gccbc92b7b25dca7@mail.gmail.com> References: <73531abb0903221911k1d862881q9db5f387fa93bb39@mail.gmail.com> <73531abb0903232212i6e004447nbc423b926a94cf39@mail.gmail.com> <73531abb0903232217i5cb12570gccbc92b7b25dca7@mail.gmail.com> Message-ID: 2009/3/23 ross smith > Hello again, > >> >> I don't know if my second email made it through the moderator (it was too >> large). >> >> 2009/3/23 Charles R Harris >> >>> Hi Ross, >>> >>> 2009/3/22 ross smith >>> >>>> Hello everyone, >>>> >>>> I am interested in porting SciPy/NumPy to Py3k. I've been working this >>>> past school year to port an existing code base to py3k for a research group >>>> on campus. A lot of the code relies on SciPy and NumPy but the scope of my >>>> project didn't let me work on porting either project, to my dismay. I'd >>>> love the opportunity to port a project I use heavily in my own code and gain >>>> a better understanding of how it works. >>>> >>>> We are supposed to contact the group we would be working with, to flesh >>>> out the details of our application. I've looked at the application and the >>>> only thing I know I'll need significant help with is the Milestones >>>> portion. Of course, Any and all suggestions are welcome! >>>> >>> >>> Do you have a plan of attack? >>> >> >> I do. I've done some looking through the trunk svn, and I see three >> chunks to the project. the Distutils, NumPy and SciPy. Distutils would be >> first on the list to be ported as the other two won't install without it. >> for the most part I plan to stick to the outline in python's suggested >> porting method as it worked well for the Lab's codebase. ( >> http://docs.python.org/3.0/whatsnew/3.0.html) >> >> What all does your experience suggest will be needed? >>> >> >> Most of the code will require minor or stylistic changes here and >> there. The two issues that require much more work are any __cmp__ methods >> and any questionable coding practices. I've found that the things that 2to3 >> (the provided auto-converter) chokes on are things that shouldn't have made >> it to production in the first place. Once it's been run through 2to3, the >> bugs and errors that pop up don't seem to follow much of a pattern and I >> expect much more work per bug or error in this auto-converted code. >> >> I think if you can end the project with a report on your experience and a >>> list of things that needed to be done that that would be helpful in itself. >>> >>> Chuck >>> >> >> Thanks for the suggestions. I have one other question, are there unit >> tests available somwhere that I haven't seen? looking under the tests and >> testing folders in the source tree turned up a fairly lean set of tests. >> > Yes, test coverage certainly isn't where it should be and varies a lot among packages. We could probably use someone slogging away at the unromantic work of expanding the coverage if you know anyone who might be interested. For distutils, David would probably be the best bet. The f2py translator is important for scipy and could probably use a looksie too. Other folks know more about scipy than I, so hopefully they will weigh in. I'm mostly a numpy guy so I was happy to see you included numpy in your list. We can use all the help we can get. 
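To make the __cmp__ point above concrete, here is a minimal sketch of the kind of by-hand rewrite 2to3 leaves behind (the class is hypothetical, not taken from numpy):

class Version(object):
    # Python 2 style: __cmp__ and the cmp() builtin are gone in Python 3
    def __init__(self, num):
        self.num = num
    def __cmp__(self, other):
        return cmp(self.num, other.num)

class Version3(object):
    # Python 3 friendly: spell out the rich comparisons instead
    def __init__(self, num):
        self.num = num
    def __eq__(self, other):
        return self.num == other.num
    def __lt__(self, other):
        return self.num < other.num
    # __le__, __gt__, __ge__ and __ne__ follow the same pattern

2to3 handles syntax and renamed modules, but it does not translate __cmp__, so classes like the first one above have to be rewritten by hand.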
Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Mar 24 01:52:17 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 23 Mar 2009 23:52:17 -0600 Subject: [SciPy-dev] Summer of Code: Proposal for Implementing date/time types in NumPy In-Reply-To: References: Message-ID: Hi Marty, On Mon, Mar 23, 2009 at 9:20 PM, Marty Fuhry wrote: > Hello, > > I was reading through the Summer of Code ideas and I'm terribly > interested in date/time proposal > ( > http://projects.scipy.org/numpy/browser/trunk/doc/neps/datetime-proposal3.rst > ). > I would love to work on this for a Google Summer of Code project. I'm > a sophmore studying Computer Science and Mathematics at Kent State > University in Ohio, so this project directly relates to my studies. Is > there anyone looking into this proposal yet? > You might want to cross post on the numpy mailing list also just to make sure Francesc sees it. Note also the recent post by Matt Knox (Recommended compiler...), he and Pierre might also be interested in this work. Do you know Laura Smithies? I was in graduate school with her. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Tue Mar 24 02:31:53 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Tue, 24 Mar 2009 15:31:53 +0900 Subject: [SciPy-dev] Porting SciPy to Py3k GSOC project In-Reply-To: References: <73531abb0903221911k1d862881q9db5f387fa93bb39@mail.gmail.com> <73531abb0903232212i6e004447nbc423b926a94cf39@mail.gmail.com> <73531abb0903232217i5cb12570gccbc92b7b25dca7@mail.gmail.com> Message-ID: <49C87E59.4070106@ar.media.kyoto-u.ac.jp> Charles R Harris wrote: > > > 2009/3/23 ross smith > > > Hello again, > > > I don't know if my second email made it through the moderator > (it was too large). > > 2009/3/23 Charles R Harris > > > Hi Ross, > > 2009/3/22 ross smith > > > Hello everyone, > > I am interested in porting SciPy/NumPy to Py3k. I've > been working this past school year to port an existing > code base to py3k for a research group on campus. A > lot of the code relies on SciPy and NumPy but the > scope of my project didn't let me work on porting > either project, to my dismay. I'd love the > opportunity to port a project I use heavily in my own > code and gain a better understanding of how it works. > > We are supposed to contact the group we would be > working with, to flesh out the details of our > application. I've looked at the application and the > only thing I know I'll need significant help with is > the Milestones portion. Of course, Any and all > suggestions are welcome! > > > Do you have a plan of attack? > > > I do. I've done some looking through the trunk svn, and I > see three chunks to the project. the Distutils, NumPy and > SciPy. Distutils would be first on the list to be ported as > the other two won't install without it. for the most part I > plan to stick to the outline in python's suggested porting > method as it worked well for the Lab's codebase. > (http://docs.python.org/3.0/whatsnew/3.0.html) > You would need to port numpy itself before starting scipy as well. I think it is fair to say that most work will be inside numpy/core - which is ~ 30 000 LOC according to sloccount (wo counting comments, empty lines and the like); IOW, it is massive, and there is no chance to do the conversion in a couple of months unless you are very familiar with numpy. 
I am not sure I can see a meaningful subpart of numpy/core which could be ported for python 3 for a SoC. A more limited in scope project would be to port f2py to python 3. It is only python code, it is a non trivial piece of code, and required for scipy as well (although of course f2py could be run from python 2, but that would require some changes in numpy.distutils, as f2py is imported as a python module for the moment). Another angle would be to rewrite some C numpy code into cython. I don't know enough about cython to assess whether this is a good idea or even feasible at all, but it is my understanding that cython can generate 2. and 3.-compatible C code. It would be helpful for numpy's maintainability and would help for the port to python 3 as well. cheers, David From stefan at sun.ac.za Tue Mar 24 04:11:28 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 24 Mar 2009 10:11:28 +0200 Subject: [SciPy-dev] Porting SciPy to Py3k GSOC project In-Reply-To: <49C87E59.4070106@ar.media.kyoto-u.ac.jp> References: <73531abb0903221911k1d862881q9db5f387fa93bb39@mail.gmail.com> <73531abb0903232212i6e004447nbc423b926a94cf39@mail.gmail.com> <73531abb0903232217i5cb12570gccbc92b7b25dca7@mail.gmail.com> <49C87E59.4070106@ar.media.kyoto-u.ac.jp> Message-ID: <9457e7c80903240111r48e3bb62u7df112277bdb78e9@mail.gmail.com> 2009/3/24 David Cournapeau : > You would need to port numpy itself before starting scipy as well. I > think it is fair to say that most work will be inside numpy/core - which > is ~ 30 000 LOC according to sloccount (wo counting comments, empty > lines and the like); IOW, it is massive, and there is no chance to do > the conversion in a couple of months unless you are very familiar with > numpy. I am not sure I can see a meaningful subpart of numpy/core which > could be ported for python 3 for a SoC. Is the situation really that dire? How much has the C API changed between 2 and 3, and are these changes difficult to propagate? St?fan From david at ar.media.kyoto-u.ac.jp Tue Mar 24 04:21:46 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Tue, 24 Mar 2009 17:21:46 +0900 Subject: [SciPy-dev] Porting SciPy to Py3k GSOC project In-Reply-To: <9457e7c80903240111r48e3bb62u7df112277bdb78e9@mail.gmail.com> References: <73531abb0903221911k1d862881q9db5f387fa93bb39@mail.gmail.com> <73531abb0903232212i6e004447nbc423b926a94cf39@mail.gmail.com> <73531abb0903232217i5cb12570gccbc92b7b25dca7@mail.gmail.com> <49C87E59.4070106@ar.media.kyoto-u.ac.jp> <9457e7c80903240111r48e3bb62u7df112277bdb78e9@mail.gmail.com> Message-ID: <49C8981A.80805@ar.media.kyoto-u.ac.jp> St?fan van der Walt wrote: > 2009/3/24 David Cournapeau : > >> You would need to port numpy itself before starting scipy as well. I >> think it is fair to say that most work will be inside numpy/core - which >> is ~ 30 000 LOC according to sloccount (wo counting comments, empty >> lines and the like); IOW, it is massive, and there is no chance to do >> the conversion in a couple of months unless you are very familiar with >> numpy. I am not sure I can see a meaningful subpart of numpy/core which >> could be ported for python 3 for a SoC. >> > > Is the situation really that dire? How much has the C API changed > between 2 and 3, and are these changes difficult to propagate? 
> Talking only about the C code, here are some things which changed: - the buffer API is changed -> I don't know how significant this is - the basic C types/objects structures have changed a bit -> again, no idea how significant this is - the Unicode/String unification - long/int unification But I would think the main problem is that numpy simply is a big, complicated set of C code, whose parts can't simply be done separately. You would need to do quite a bit of changes for the code to only compile, making bugs hard to track - and that's assuming numpy.distutils itself won't get in the way. Then, there is the problem on how to deal with two codebases - if we can't handle things with a few #ifdef, the situation will be really bad. That's why I am not convinced that it would be a good project for a GSoC. You can't try things easily. Using cython for the C code is the advice given by the python doc itself - although again, numpy is not the usual extension. I think almost everything outside numpy/core should be relatively easy to convert to cython (where easy means would take time, but could be done gradually without impact everywhere). I don't know how usable cython would be to define C-accessible, public extension types. But if it is, then things can be done gradually - this can be tested by many people, etc... I think Travis already thought a bit about the transition, but I am not sure he has written anything about it ? he is obviously the best person to give directions for the transition, cheers, David From charlesr.harris at gmail.com Tue Mar 24 10:17:14 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 24 Mar 2009 08:17:14 -0600 Subject: [SciPy-dev] Porting SciPy to Py3k GSOC project In-Reply-To: <49C8981A.80805@ar.media.kyoto-u.ac.jp> References: <73531abb0903221911k1d862881q9db5f387fa93bb39@mail.gmail.com> <73531abb0903232212i6e004447nbc423b926a94cf39@mail.gmail.com> <73531abb0903232217i5cb12570gccbc92b7b25dca7@mail.gmail.com> <49C87E59.4070106@ar.media.kyoto-u.ac.jp> <9457e7c80903240111r48e3bb62u7df112277bdb78e9@mail.gmail.com> <49C8981A.80805@ar.media.kyoto-u.ac.jp> Message-ID: On Tue, Mar 24, 2009 at 2:21 AM, David Cournapeau < david at ar.media.kyoto-u.ac.jp> wrote: > St?fan van der Walt wrote: > > 2009/3/24 David Cournapeau : > > > >> You would need to port numpy itself before starting scipy as well. I > >> think it is fair to say that most work will be inside numpy/core - which > >> is ~ 30 000 LOC according to sloccount (wo counting comments, empty > >> lines and the like); IOW, it is massive, and there is no chance to do > >> the conversion in a couple of months unless you are very familiar with > >> numpy. I am not sure I can see a meaningful subpart of numpy/core which > >> could be ported for python 3 for a SoC. > >> > > > > Is the situation really that dire? How much has the C API changed > > between 2 and 3, and are these changes difficult to propagate? > > > > Talking only about the C code, here are some things which changed: > - the buffer API is changed -> I don't know how significant this is > - the basic C types/objects structures have changed a bit -> again, > no idea how significant this is > - the Unicode/String unification > - long/int unification > > But I would think the main problem is that numpy simply is a big, > complicated set of C code, whose parts can't simply be done separately. 
> You would need to do quite a bit of changes for the code to only > compile, making bugs hard to track - and that's assuming numpy.distutils > itself won't get in the way. Then, there is the problem on how to deal > with two codebases - if we can't handle things with a few #ifdef, the > situation will be really bad. That's why I am not convinced that it > would be a good project for a GSoC. You can't try things easily. > > Using cython for the C code is the advice given by the python doc itself > - although again, numpy is not the usual extension. I think almost > everything outside numpy/core should be relatively easy to convert to > cython (where easy means would take time, but could be done gradually > without impact everywhere). I don't know how usable cython would be to > define C-accessible, public extension types. But if it is, then things > can be done gradually - this can be tested by many people, etc... > A while back I took a shot at interfacing lapack_lite using cython just to see what it looked like, but decided that the current interface was actually pretty clean. There is also fftpack. Random is already done. Are there any separate bits folks can think of? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Mar 24 13:05:42 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 24 Mar 2009 11:05:42 -0600 Subject: [SciPy-dev] Porting SciPy to Py3k GSOC project In-Reply-To: References: <73531abb0903221911k1d862881q9db5f387fa93bb39@mail.gmail.com> <73531abb0903232212i6e004447nbc423b926a94cf39@mail.gmail.com> <73531abb0903232217i5cb12570gccbc92b7b25dca7@mail.gmail.com> <49C87E59.4070106@ar.media.kyoto-u.ac.jp> <9457e7c80903240111r48e3bb62u7df112277bdb78e9@mail.gmail.com> <49C8981A.80805@ar.media.kyoto-u.ac.jp> Message-ID: On Tue, Mar 24, 2009 at 8:17 AM, Charles R Harris wrote: > > > On Tue, Mar 24, 2009 at 2:21 AM, David Cournapeau < > david at ar.media.kyoto-u.ac.jp> wrote: > >> St?fan van der Walt wrote: >> > 2009/3/24 David Cournapeau : >> > >> >> You would need to port numpy itself before starting scipy as well. I >> >> think it is fair to say that most work will be inside numpy/core - >> which >> >> is ~ 30 000 LOC according to sloccount (wo counting comments, empty >> >> lines and the like); IOW, it is massive, and there is no chance to do >> >> the conversion in a couple of months unless you are very familiar with >> >> numpy. I am not sure I can see a meaningful subpart of numpy/core which >> >> could be ported for python 3 for a SoC. >> >> >> > >> > Is the situation really that dire? How much has the C API changed >> > between 2 and 3, and are these changes difficult to propagate? >> > >> >> Talking only about the C code, here are some things which changed: >> - the buffer API is changed -> I don't know how significant this is >> - the basic C types/objects structures have changed a bit -> again, >> no idea how significant this is >> - the Unicode/String unification >> - long/int unification >> >> But I would think the main problem is that numpy simply is a big, >> complicated set of C code, whose parts can't simply be done separately. >> You would need to do quite a bit of changes for the code to only >> compile, making bugs hard to track - and that's assuming numpy.distutils >> itself won't get in the way. Then, there is the problem on how to deal >> with two codebases - if we can't handle things with a few #ifdef, the >> situation will be really bad. 
That's why I am not convinced that it >> would be a good project for a GSoC. You can't try things easily. >> >> Using cython for the C code is the advice given by the python doc itself >> - although again, numpy is not the usual extension. I think almost >> everything outside numpy/core should be relatively easy to convert to >> cython (where easy means would take time, but could be done gradually >> without impact everywhere). I don't know how usable cython would be to >> define C-accessible, public extension types. But if it is, then things >> can be done gradually - this can be tested by many people, etc... >> > > A while back I took a shot at interfacing lapack_lite using cython just to > see what it looked like, but decided that the current interface was actually > pretty clean. There is also fftpack. Random is already done. Are there any > separate bits folks can think of? > Continuing the cython thoughts, there are parts of the current c code that I think would look better in python. For instance, I think most of arraymethods.c and umath_ufunc_object.inc would look cleaner in python. However, I don't know what this would mean for the C-ABI and call overhead. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From rjsm at umich.edu Tue Mar 24 17:44:46 2009 From: rjsm at umich.edu (ross smith) Date: Tue, 24 Mar 2009 17:44:46 -0400 Subject: [SciPy-dev] Porting SciPy to Py3k GSOC project In-Reply-To: <73531abb0903241222x545ca7e7t28a9961f06d86791@mail.gmail.com> References: <73531abb0903221911k1d862881q9db5f387fa93bb39@mail.gmail.com> <73531abb0903232212i6e004447nbc423b926a94cf39@mail.gmail.com> <73531abb0903232217i5cb12570gccbc92b7b25dca7@mail.gmail.com> <49C87E59.4070106@ar.media.kyoto-u.ac.jp> <9457e7c80903240111r48e3bb62u7df112277bdb78e9@mail.gmail.com> <49C8981A.80805@ar.media.kyoto-u.ac.jp> <73531abb0903241222x545ca7e7t28a9961f06d86791@mail.gmail.com> Message-ID: <73531abb0903241444p2e42839ay619c9e65a1d2ed50@mail.gmail.com> Just to summarize what I'm seeing. Porting all three (distutils, NumPy and SciPy) is going to be too ambitious an idea for a summer. Likewise, all the work necessary to get even NumPy to compile would make it a poor project as well.
> > on the other hand, porting relevant parts of NumPy to Cython would take a > more reasonable amount of work, and would provide more room for testing and > bug squashing during the summer as well as helping with the eventual > porting. Another possible project is to port f2py, a smaller project that's > all python that would be valueable to the overall porting of the code. > > I don't have any experience (yet) with cython so I'm a little worried about > the learning curve associated with it. I'd tend to lean toward the porting > of f2py as the project I'd be interested in. Possibly with the addition of > a design document for what specifics will need to be done to get Numpy > completely ported. > > I haven't had the chance to look at f2py yet, but I'll put together a rough > outline of a timeline for the f2py and Cython projects and post back for > your thoughts. > > -Ross > > > 2009/3/24 Charles R Harris > >> >> >> On Tue, Mar 24, 2009 at 8:17 AM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> >>> >>> On Tue, Mar 24, 2009 at 2:21 AM, David Cournapeau < >>> david at ar.media.kyoto-u.ac.jp> wrote: >>> >>>> St?fan van der Walt wrote: >>>> > 2009/3/24 David Cournapeau : >>>> > >>>> >> You would need to port numpy itself before starting scipy as well. I >>>> >> think it is fair to say that most work will be inside numpy/core - >>>> which >>>> >> is ~ 30 000 LOC according to sloccount (wo counting comments, empty >>>> >> lines and the like); IOW, it is massive, and there is no chance to do >>>> >> the conversion in a couple of months unless you are very familiar >>>> with >>>> >> numpy. I am not sure I can see a meaningful subpart of numpy/core >>>> which >>>> >> could be ported for python 3 for a SoC. >>>> >> >>>> > >>>> > Is the situation really that dire? How much has the C API changed >>>> > between 2 and 3, and are these changes difficult to propagate? >>>> > >>>> >>>> Talking only about the C code, here are some things which changed: >>>> - the buffer API is changed -> I don't know how significant this is >>>> - the basic C types/objects structures have changed a bit -> again, >>>> no idea how significant this is >>>> - the Unicode/String unification >>>> - long/int unification >>>> >>>> But I would think the main problem is that numpy simply is a big, >>>> complicated set of C code, whose parts can't simply be done separately. >>>> You would need to do quite a bit of changes for the code to only >>>> compile, making bugs hard to track - and that's assuming numpy.distutils >>>> itself won't get in the way. Then, there is the problem on how to deal >>>> with two codebases - if we can't handle things with a few #ifdef, the >>>> situation will be really bad. That's why I am not convinced that it >>>> would be a good project for a GSoC. You can't try things easily. >>>> >>>> Using cython for the C code is the advice given by the python doc itself >>>> - although again, numpy is not the usual extension. I think almost >>>> everything outside numpy/core should be relatively easy to convert to >>>> cython (where easy means would take time, but could be done gradually >>>> without impact everywhere). I don't know how usable cython would be to >>>> define C-accessible, public extension types. But if it is, then things >>>> can be done gradually - this can be tested by many people, etc... >>>> >>> >>> A while back I took a shot at interfacing lapack_lite using cython just >>> to see what it looked like, but decided that the current interface was >>> actually pretty clean. 
There is also fftpack. Random is already done. Are >>> there any separate bits folks can think of? >>> >> >> Continuing the cython thoughts, there are parts of the current c code that >> I think would look better in python. For instance, I think most of >> arraymethods.c and umath_ufunc_object.inc would look cleaner in python. >> However, I don't know what this would mean for the C-ABI and call overhead. >> >> Chuck >> >> >> >> _______________________________________________ >> Scipy-dev mailing list >> Scipy-dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Mar 24 22:52:12 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 24 Mar 2009 20:52:12 -0600 Subject: [SciPy-dev] About that date/time thingie. Message-ID: Folks, We have a student who has expressed an interest in the date/time project for GSoC. Could the people who were discussing this last summer please step forward and extend him the courtesy of a comment. TIA Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Mar 24 23:05:57 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 24 Mar 2009 21:05:57 -0600 Subject: [SciPy-dev] Porting SciPy to Py3k GSOC project In-Reply-To: <73531abb0903241444p2e42839ay619c9e65a1d2ed50@mail.gmail.com> References: <73531abb0903221911k1d862881q9db5f387fa93bb39@mail.gmail.com> <73531abb0903232217i5cb12570gccbc92b7b25dca7@mail.gmail.com> <49C87E59.4070106@ar.media.kyoto-u.ac.jp> <9457e7c80903240111r48e3bb62u7df112277bdb78e9@mail.gmail.com> <49C8981A.80805@ar.media.kyoto-u.ac.jp> <73531abb0903241222x545ca7e7t28a9961f06d86791@mail.gmail.com> <73531abb0903241444p2e42839ay619c9e65a1d2ed50@mail.gmail.com> Message-ID: 2009/3/24 ross smith > Just to summarize what I'm seeing. > > Porting all three (distutils, NumPy and SciPY) is going to be a too > ambitious idea for a summer. Likewise, all the work necisary to get even > NumPy to compile would make it a poor project as well. > > on the other hand, porting relevant parts of NumPy to Cython would take a > more reasonable amount of work, and would provide more room for testing and > bug squashing during the summer as well as helping with the eventual > porting. Another possible project is to port f2py, a smaller project that's > all python that would be valueable to the overall porting of the code. > > I don't have any experience (yet) with cython so I'm a little worried about > the learning curve associated with it. I'd tend to lean toward the porting > of f2py as the project I'd be interested in. Possibly with the addition of > a design document for what specifics will need to be done to get Numpy > completely ported. > > I haven't had the chance to look at f2py yet, but I'll put together a rough > outline of a timeline for the f2py and Cython projects and post back for > your thoughts. > > -Ross > If you go the f2py route it might be worth while to think about ways of testing it. Can't say I have many ideas about how to do it myself. It could also be interesting to just try running it with the python2.6 flag to see what turns up. The original author of the code was Pearu Peterson,but he is no longer maintaining it. I don't know what his current plans are, he has moved development elsewhere and has a real job that is taking up most of his time. 
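As a rough sketch of what such a test could look like (the module and routine names are made up, it assumes a Fortran compiler is on the path, and it relies on numpy.f2py.compile returning 0 on a successful build):

import numpy as np
from numpy import f2py

# a tiny fixed-form routine; the cf2py line marks x as intent(in,out)
FSRC = """
      subroutine dbl(x, n)
      integer n, i
      double precision x(n)
cf2py intent(in,out) x
      do i = 1, n
         x(i) = 2.0d0 * x(i)
      end do
      end
"""

def test_f2py_roundtrip():
    # f2py.compile writes the source to a temporary file and shells out to f2py -c -m
    assert f2py.compile(FSRC, modulename='tiny_dbl', verbose=0) == 0
    import tiny_dbl
    x = np.arange(5.0)
    # pass a copy, since intent(in,out) may modify its argument in place
    result = tiny_dbl.dbl(x.copy())
    np.testing.assert_array_equal(result, 2 * x)

Whether the freshly built module can be imported from the build directory depends on how the test is run, so this is only meant to show the shape of a round-trip check, not a drop-in test.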
Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From mattknox.ca at gmail.com Wed Mar 25 08:12:50 2009 From: mattknox.ca at gmail.com (Matt Knox) Date: Wed, 25 Mar 2009 12:12:50 +0000 (UTC) Subject: [SciPy-dev] Summer of Code: Proposal for Implementing date/time types in NumPy References: Message-ID: > Hello, > I was reading through the Summer of Code ideas and I'm terribly > interested in date/time proposal > (http://projects.scipy.org/numpy/browser/trunk/doc/neps/datetime-proposal3.rst). > I would love to work on this for a Google Summer of Code project. I'm > a sophmore studying Computer Science and Mathematics at Kent State > University in Ohio, so this project directly relates to my studies. Is > there anyone looking into this proposal yet? > > > You might want to cross post on the numpy mailing list also just to make sure > Francesc sees it. Note also the recent post by Matt Knox (Recommended > compiler...), he and Pierre might also be interested in this work. Not to discourage anyone from working on this, but since my name was brought up I'll just mention that I am not personally that interested in this project as the scikits.timeseries package fulfills my needs for handling date/time data. We use an integer array with some meta data to represent the "dates" portion of our TimeSeries class and it seems to work fairly well for the most part. It isn't very often I need to store dates for the actual "values" in an array (as opposed to just representing the time dimension), but for the odd occasion that I do I find using an object dtype array with standard datetime values to be sufficient. That being said, the timeseries module is not the silver bullet for every need and does have limitations that are addressed in this proposal such as frequencies higher than seconds (microsecond, etc), but again these aren't interesting to me personally given the type of data I work with. And things like "Quarterly frequency with different origins" (mentioned at the bottom of the proposal) are very important for the timeseries module but considered out of scope for the date/time data type enhancement proposal. - Matt From david.huard at gmail.com Wed Mar 25 10:15:31 2009 From: david.huard at gmail.com (David Huard) Date: Wed, 25 Mar 2009 10:15:31 -0400 Subject: [SciPy-dev] Porting SciPy to Py3k GSOC project In-Reply-To: References: <73531abb0903221911k1d862881q9db5f387fa93bb39@mail.gmail.com> <49C87E59.4070106@ar.media.kyoto-u.ac.jp> <9457e7c80903240111r48e3bb62u7df112277bdb78e9@mail.gmail.com> <49C8981A.80805@ar.media.kyoto-u.ac.jp> <73531abb0903241222x545ca7e7t28a9961f06d86791@mail.gmail.com> <73531abb0903241444p2e42839ay619c9e65a1d2ed50@mail.gmail.com> Message-ID: <91cf711d0903250715lccff543n6612c31a544a3fad@mail.gmail.com> 2009/3/24 Charles R Harris > > > 2009/3/24 ross smith > >> Just to summarize what I'm seeing. >> >> Porting all three (distutils, NumPy and SciPY) is going to be a too >> ambitious idea for a summer. Likewise, all the work necisary to get even >> NumPy to compile would make it a poor project as well. >> >> on the other hand, porting relevant parts of NumPy to Cython would take a >> more reasonable amount of work, and would provide more room for testing and >> bug squashing during the summer as well as helping with the eventual >> porting. Another possible project is to port f2py, a smaller project that's >> all python that would be valueable to the overall porting of the code. 
>> > If you look at numpy/f2py/src, you'll see fortranobject.c and fortranobject.h. > >> I don't have any experience (yet) with cython so I'm a little worried >> about the learning curve associated with it. I'd tend to lean toward the >> porting of f2py as the project I'd be interested in. Possibly with the >> addition of a design document for what specifics will need to be done to get >> Numpy completely ported. >> >> I haven't had the chance to look at f2py yet, but I'll put together a >> rough outline of a timeline for the f2py and Cython projects and post back >> for your thoughts. >> >> -Ross >> > > If you go the f2py route it might be worth while to think about ways of > testing it. Can't say I have many ideas about how to do it myself. It could > also be interesting to just try running it with the python2.6 flag to see > what turns up. > > The original author of the code was Pearu Peterson,but he is no longer maintaining it. I don't know what his current plans are, > he has moved development elsewhere and has a real job that is taking up most > of his time. > My understanding is that Pearu still maintains the code in the sense that he fixes the occasional bugs, but does not actively develop it anymore. He started a refactoring of the code known as f2py g3 which is hosted at http://launchpad.net/f2py/ but this project seems to be on hold for the moment. If you wish to work on f2py, I suggest you make sure he has some time to act as mentor for this project. I am wondering if you'll be able to test f2py if the numpy core is not ported first ? Regards, David > > > > Chuck > > > > _______________________________________________ > Scipy-dev mailing list > Scipy-dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dagss at student.matnat.uio.no Wed Mar 25 10:26:53 2009 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Wed, 25 Mar 2009 15:26:53 +0100 Subject: [SciPy-dev] Porting SciPy to Py3k GSOC project In-Reply-To: <91cf711d0903250715lccff543n6612c31a544a3fad@mail.gmail.com> References: <73531abb0903221911k1d862881q9db5f387fa93bb39@mail.gmail.com> <49C87E59.4070106@ar.media.kyoto-u.ac.jp> <9457e7c80903240111r48e3bb62u7df112277bdb78e9@mail.gmail.com> <49C8981A.80805@ar.media.kyoto-u.ac.jp> <73531abb0903241222x545ca7e7t28a9961f06d86791@mail.gmail.com> <73531abb0903241444p2e42839ay619c9e65a1d2ed50@mail.gmail.com> <91cf711d0903250715lccff543n6612c31a544a3fad@mail.gmail.com> Message-ID: <49CA3F2D.8060105@student.matnat.uio.no> David Huard wrote: > > My understanding is that Pearu still maintains the code in the sense > that he fixes the occasional bugs, but does not actively develop it > anymore. He started a refactoring of the code known as f2py g3 which > is hosted at http://launchpad.net/f2py/ but this project seems to be > on hold for the moment. > If you wish to work on f2py, I suggest you make sure he has some time > to act as mentor for this project Note that I'm currently discussing a GSoC project in the Cython camp on Fortran integration with a promising student (with me as mentor). One possibility we're looking at is using f2py for parsing and Cython as the backend/output format. The aim would be to have a more transparent Cython/Fortran experience without having to go through a Python layer. I'll get back to the NumPy list in a day or two when we have discussed the road we want to take a bit more. 
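To make item 1) a bit more tangible for readers who have not dug into the buffer protocol, the information such a struct would carry is already available from Python today; a small illustrative snippet (nothing here is new API, it is just the data Cython would pass along):

import numpy as np

# a Fortran-ordered array, as a Fortran routine would expect it
a = np.asfortranarray(np.arange(12.0).reshape(3, 4))

# the pieces a buffer-style struct would hand to C/Fortran
ptr, readonly = a.__array_interface__['data']   # raw data pointer and a read-only flag
print(hex(ptr), a.shape, a.strides, a.dtype.str)

Copy-in/copy-out, as mentioned in 2), would come into play when the strides do not describe a layout the Fortran routine can use directly.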
Dag Sverre From pgmdevlist at gmail.com Wed Mar 25 11:03:06 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 25 Mar 2009 11:03:06 -0400 Subject: [SciPy-dev] About that date/time thingie. In-Reply-To: References: Message-ID: <4C4DB925-7963-48B9-9959-A9AFBD1803A4@gmail.com> On Mar 24, 2009, at 10:52 PM, Charles R Harris wrote: > Folks, > > We have a student who has expressed an interest in the date/time > project for GSoC. Could the people who were discussing this last > summer please step forward and extend him the courtesy of a comment. Aye, Aye... A link to the previous discussion would be welcome, though. Keep me in the loop, Thx in advance P. From oliphant at enthought.com Wed Mar 25 12:50:28 2009 From: oliphant at enthought.com (Travis E. Oliphant) Date: Wed, 25 Mar 2009 11:50:28 -0500 Subject: [SciPy-dev] Summer of Code: Proposal for Implementing date/time types in NumPy In-Reply-To: References: Message-ID: <49CA60D4.2070109@enthought.com> Matt Knox wrote: >> Hello, >> I was reading through the Summer of Code ideas and I'm terribly >> interested in date/time proposal >> (http://projects.scipy.org/numpy/browser/trunk/doc/neps/datetime-proposal3.rst). >> I would love to work on this for a Google Summer of Code project. I'm >> a sophmore studying Computer Science and Mathematics at Kent State >> University in Ohio, so this project directly relates to my studies. Is >> there anyone looking into this proposal yet? >> >> >> You might want to cross post on the numpy mailing list also just to make sure >> Francesc sees it. Note also the recent post by Matt Knox (Recommended >> compiler...), he and Pierre might also be interested in this work. >> > > Not to discourage anyone from working on this, but since my name was brought up > I'll just mention that I am not personally that interested in this project as > the scikits.timeseries package fulfills my needs for handling date/time data. We > use an integer array with some meta data to represent the "dates" portion of our > TimeSeries class and it seems to work fairly well for the most part. It isn't > very often I need to store dates for the actual "values" in an array (as opposed > to just representing the time dimension), but for the odd occasion that I do I > find using an object dtype array with standard datetime values to be sufficient. > > That being said, the timeseries module is not the silver bullet for every need > and does have limitations that are addressed in this proposal such as > frequencies higher than seconds (microsecond, etc), but again these aren't > interesting to me personally given the type of data I work with. And things like > "Quarterly frequency with different origins" (mentioned at the bottom of the > proposal) are very important for the timeseries module but considered out of > scope for the date/time data type enhancement proposal. > I've had a chance recently to look at the Date class in the timeseries module and I liked the way it was put together. I think it would benefit NumPy to have something like this class more tightly integrated with the NumPy distribution. My approach would be to make a NumPy dtype that is basically the Date class with more frequencies as taken from the date/time proposal. But, perhaps just pulling the DateArray into NumPy is a sufficient first step. 
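For anyone who has not used the scikit, the Date/DateArray classes being discussed look roughly like this in practice (a small sketch written from memory of the scikits.timeseries docs, so treat the exact signatures as approximate):

import scikits.timeseries as ts

# a monthly-frequency date and some simple arithmetic
d = ts.Date('M', year=2009, month=3)
print(d + 1)           # the following month
print(d.asfreq('A'))   # converted to annual frequency

# a DateArray covering one year of months
dates = ts.date_array(start_date=d, length=12)
print(dates.year)
print(dates.month)

The frequency-aware integer representation underneath, mentioned by Matt earlier in the thread, is what would map naturally onto a NumPy dtype.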
-Travis From Dwf at cs.toronto.edu Wed Mar 25 13:21:33 2009 From: Dwf at cs.toronto.edu (David Warde-Farley) Date: Wed, 25 Mar 2009 13:21:33 -0400 Subject: [SciPy-dev] Another GSoC idea In-Reply-To: <5b8d13220903202250m170eabf3mfe9fc35f4bd4273b@mail.gmail.com> References: <5b8d13220903202250m170eabf3mfe9fc35f4bd4273b@mail.gmail.com> Message-ID: Hi David, Thanks for your reply - I fell ill over the weekend and then fell behind on email (and other things :). On 21-Mar-09, at 1:50 AM, David Cournapeau wrote: > For scipy.cluster.vq, I already have something in Cython - just not > put into scipy because the code is barely "research quality" (whatever > that means :) ). But I think it would be less work to improve it than > to start from scratch. For sure - it's usually not a good idea to throw out code that works unless you have a very good reason! Do you think you'll ever get around to improving it? > I think this would be a great addition. You are of course free to > choose what you work on, but I like the idea of a basic set of > recursives implementations of basic statistics and clustering > algorithms. I have also myself an implementation of online EM for > online estimation of GMM, based on the following preprint: > > http://www.citeulike.org/user/stibor/article/3245946 The idea of general "building blocks" for doing EM (and other things) with probabilistic models in Python interests me very much, and probably interests a lot of other people. However, it's a somewhat ambitious undertaking, let alone for a GSoC. Part of the difficulty I see is that there's a lot of good code that we wouldn't want to reinvent. There's a lot of code in, for example, PyEM that would be of use, some of my own "research quality" machinations, but there's also the (often ignored) maxentropy module, which as far as I know doesn't support hidden variables but would nonetheless have useful chunks (personally, I had encountered maxent models under the moniker of exponential family models and forgotten the tidbit about equivalence of the two until one day I looked at the maxentropy docs). Then there's PyMC, which as far as I can see has developed a *really* well thought out object-oriented system for specifying probabilistic graphical models. Of course, it's geared toward Bayesian inference via MCMC. In the (relatively rare) case that the posterior is analytically available it shouldn't be all that difficult to graft on code for doing that. Likewise with maximum likelihood (hyper)parameter fitting via EM or gradient-based optimization. Then there's of course code written in other languages, like Kevin Murphy's Bayes Net toolbox for Matlab, which I recall you got permission to port with a BSD license. In summary, I think a general treatment of mixture models, etc. in Python is a big task, and as such I'm not certain it'd be suitable for a SoC. Having a really solid module with a few canned non- probabilistic algorithms like k-means (like it already does), k- medoids/centers might be a more manageable task in the short term. David From david at ar.media.kyoto-u.ac.jp Wed Mar 25 13:25:25 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Thu, 26 Mar 2009 02:25:25 +0900 Subject: [SciPy-dev] Another GSoC idea In-Reply-To: References: <5b8d13220903202250m170eabf3mfe9fc35f4bd4273b@mail.gmail.com> Message-ID: <49CA6905.7020308@ar.media.kyoto-u.ac.jp> David Warde-Farley wrote: > Hi David, > > Thanks for your reply - I fell ill over the weekend and then fell > behind on email (and other things :). 
> Hope everything is going well now. > > For sure - it's usually not a good idea to throw out code that works > unless you have a very good reason! Do you think you'll ever get > around to improving it? > Yes, otherwise, I would not have mentioned it - who cares that I have code if it is not somewhere available publicly :) > > The idea of general "building blocks" for doing EM (and other things) > with probabilistic models in Python interests me very much, and > probably interests a lot of other people. However, it's a somewhat > ambitious undertaking, let alone for a GSoC. Part of the difficulty I > see is that there's a lot of good code that we wouldn't want to > reinvent. > I think I may not have been very clear: building blocks for machine learning is definitely out of scope. What I had in mind, following your example of recursive kmeans, is a set of simple algorithms which can be used recursively. By simple, I meant things like averages and other moment-like statistics. There was some discussion before: http://www.mail-archive.com/numpy-discussion at scipy.org/msg14473.html But again, that's only a mere suggestion, being something I am interested in myself, and which sounded similar to some of the ideas you talked about (for application to tracking). > Then there's PyMC, which as far as I can see has developed a *really* > well thought out object-oriented system for specifying probabilistic > graphical models. Of course, it's geared toward Bayesian inference via > MCMC. In the (relatively rare) case that the posterior is analytically > available it shouldn't be all that difficult to graft on code for > doing that. Likewise with maximum likelihood (hyper)parameter fitting > via EM or gradient-based optimization. > I have even worse (very research quality :) ) code implementing Variational Bayes for GMM, if that's something you are interested in, which is a relatively well known approximation of Bayesian computation for latent models. > In summary, I think a general treatment of mixture models, etc. in > Python is a big task, and as such I'm not certain it'd be suitable for > a SoC. Having a really solid module with a few canned non- > probabilistic algorithms like k-means (like it already does), k- > medoids/centers might be a more manageable task in the short term. Yes, agreed. My suggestion was about focusing more on the recursive aspect rather than cython side of things, since I have partly done the job already, although not publicly (yet). cheers, David From charlesr.harris at gmail.com Wed Mar 25 15:01:15 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 25 Mar 2009 13:01:15 -0600 Subject: [SciPy-dev] Porting SciPy to Py3k GSOC project In-Reply-To: <49CA3F2D.8060105@student.matnat.uio.no> References: <73531abb0903221911k1d862881q9db5f387fa93bb39@mail.gmail.com> <9457e7c80903240111r48e3bb62u7df112277bdb78e9@mail.gmail.com> <49C8981A.80805@ar.media.kyoto-u.ac.jp> <73531abb0903241222x545ca7e7t28a9961f06d86791@mail.gmail.com> <73531abb0903241444p2e42839ay619c9e65a1d2ed50@mail.gmail.com> <91cf711d0903250715lccff543n6612c31a544a3fad@mail.gmail.com> <49CA3F2D.8060105@student.matnat.uio.no> Message-ID: On Wed, Mar 25, 2009 at 8:26 AM, Dag Sverre Seljebotn < dagss at student.matnat.uio.no> wrote: > David Huard wrote: > > > > My understanding is that Pearu still maintains the code in the sense > > that he fixes the occasional bugs, but does not actively develop it > > anymore. 
He started a refactoring of the code known as f2py g3 which > > is hosted at http://launchpad.net/f2py/ but this project seems to be > > on hold for the moment. > > If you wish to work on f2py, I suggest you make sure he has some time > > to act as mentor for this project > Note that I'm currently discussing a GSoC project in the Cython camp on > Fortran integration with a promising student (with me as mentor). One > possibility we're looking at is using f2py for parsing and Cython as the > backend/output format. The aim would be to have a more transparent > Cython/Fortran experience without having to go through a Python layer. > > I'll get back to the NumPy list in a day or two when we have discussed > the road we want to take a bit more. > Looking at numpy again, there are certainly large chunks of python code that could use auditing before a future transition is attempted, the tests for instance. Last I heard, nose didn't run on python3, nor can we run the tests on python3 until most of the rest of Numpy is ported. Even so, there are probably some idioms that will make the transition difficult and they could be cleaned up. Having someone with experience in the area could be very helpful in spotting and fixing such things. And it might also serve to start putting together a more detailed plan for how to get to python3.0 when the time comes. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Wed Mar 25 17:56:58 2009 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 25 Mar 2009 16:56:58 -0500 Subject: [SciPy-dev] Another GSoC idea In-Reply-To: References: <5b8d13220903202250m170eabf3mfe9fc35f4bd4273b@mail.gmail.com> Message-ID: <3d375d730903251456pc846454uf200779f7530d8a5@mail.gmail.com> On Wed, Mar 25, 2009 at 12:21, David Warde-Farley wrote: > Then there's PyMC, which as far as I can see has developed a *really* > well thought out object-oriented system for specifying probabilistic > graphical models. Of course, it's geared toward Bayesian inference via > MCMC. In the (relatively rare) case that the posterior is analytically > available it shouldn't be all that difficult to graft on code for > doing that. Likewise with maximum likelihood (hyper)parameter fitting > via EM or gradient-based optimization. It does bring to mind an idea, though: replicate the probability distributions from scipy.stats analytically using sympy. I think that has reasonable scope by itself, but going through *all* of the distributions may be more of a boring chore than I would inflict on a student for an entire summer. However, once one gets enough distributions to be interesting and to give assurances that one has covered all of the use cases in the design, the student can forgo the remaining distributions to work on a way to combine these distribution objects into probabilistic models that can be converted to efficient MCMC, EM, or other such numerical codes for estimation. I haven't looked at PyMC recently, though. Maybe it has already cornered the model construction API, and the implementation just needs some tweaking to allow other operations on the models. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From dagss at student.matnat.uio.no Wed Mar 25 18:23:20 2009 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Wed, 25 Mar 2009 23:23:20 +0100 Subject: [SciPy-dev] Cython, f2py and GSoC Message-ID: <49CAAED8.4030204@student.matnat.uio.no> This is in preparation for a GSoC project application; Kurt Smith has approached me about doing a project on Fortran integration in Cython with myself as mentor. Although we started out with a Fortran/Cython perspective, we think that this potentially affects the SciPy community and f2py as well. The main issues: 1) f2py doesn't work that well for Cython, as it requires Python packing/unpacking of arguments. A more direct call approach is needed. 2) f2py and Cython have a certain overlap in their implementation (both generate Python extension modules), and need to tackle many of the same issues both now and especially in the future. Could we solve this so that in getting Fortran/Cython integration, we also set up a development path for further development of f2py with Cython as a backend? Below is a sketch of our current plan to give you an idea. More full specifications etc. will come later and we can have any discussions then. 1) Add a Cython syntax and API for passing acquired PEP-3118 buffers/NumPy arrays to external C functions (i.e. as a struct with the necessary information (pointer, shape, strides)). This simply means defining a syntax for passing information that Cython already has to an external C function. 2) Create a new tool which uses the parser part of f2py (with any necessary improvements) but adds a different backend which generates a C interface to the given Fortran module, along with a Cython pxd file for it. (Adding a C .h file target, to get "f2c" functionality, would be trivial.) This will be done using the Fortran 2003 C bindings. So a .f90 file is generated which compiles to a C interface to the library. Array parameters will be passed as the PEP-3118-like structs we define in 1), and so the functions will be callable directly with e.g. NumPy arrays from Cython. Copy-in/out might be necessary for Fortran to be able to work with the arrays; if so, this will happen in the Fortran wrapper generated by this new tool. 3) One could then add a feature to Cython to automatically export external functions as Python functions so that one doesn't have to write wrapper stubs. This should bring the functionality to a level comparable to current f2py. Now, how does the SciPy community see this project? 1) Is there a potential for a joint Cython/SciPy project on Fortran/Python integration here? I could do the main mentoring work, but support of the idea etc. is important too. 2) Any co-mentors perhaps on the f2py parser side? Improvements might be needed there. 3) Would you prefer us to a) rip/fork the parser out of f2py and stay within the Cython project, or b) work on f2py upstream to add another backend? Or something else? -- Dag Sverre From martyfuhry at gmail.com Wed Mar 25 20:22:23 2009 From: martyfuhry at gmail.com (Marty Fuhry) Date: Wed, 25 Mar 2009 20:22:23 -0400 Subject: [SciPy-dev] Summer of Code: Proposal for Implementing date/time types in NumPy In-Reply-To: <49CA60D4.2070109@enthought.com> References: <49CA60D4.2070109@enthought.com> Message-ID: -theflamingchicken On Wed, Mar 25, 2009 at 12:50 PM, Travis E.
Oliphant wrote: > Matt Knox wrote: >>> Hello, >>> I was reading through the Summer of Code ideas and I'm terribly >>> interested in date/time proposal >>> (http://projects.scipy.org/numpy/browser/trunk/doc/neps/datetime-proposal3.rst). >>> I would love to work on this for a Google Summer of Code project. I'm >>> a sophmore studying Computer Science and Mathematics at Kent State >>> University in Ohio, so this project directly relates to my studies. Is >>> there anyone looking into this proposal yet? >>> >>> >>> You might want to cross post on the numpy mailing list also just to make sure >>> Francesc sees it. Note also the recent post by Matt Knox (Recommended >>> compiler...), he and Pierre might also be interested in this work. >>> >> >> Not to discourage anyone from working on this, but since my name was brought up >> I'll just mention that I am not personally that interested in this project as >> the scikits.timeseries package fulfills my needs for handling date/time data. We >> use an integer array with some meta data to represent the "dates" portion of our >> TimeSeries class and it seems to work fairly well for the most part. It isn't >> very often I need to store dates for the actual "values" in an array (as opposed >> to just representing the time dimension), but for the odd occasion that I do I >> find using an object dtype array with standard datetime values to be sufficient. >> >> That being said, the timeseries module is not the silver bullet for every need >> and does have limitations that are addressed in this proposal such as >> frequencies higher than seconds (microsecond, etc), but again these aren't >> interesting to me personally given the type of data I work with. And things like >> "Quarterly frequency with different origins" (mentioned at the bottom of the >> proposal) are very important for the timeseries module but considered out of >> scope for the date/time data type enhancement proposal. >> > > I've had a chance recently to look at the Date class in the timeseries > module and I liked the way it was put together. ? ?I think it would > benefit NumPy to have something like this class more tightly integrated > with the NumPy distribution. > > My approach would be to make a NumPy dtype that is basically the Date > class with more frequencies as taken from the date/time proposal. > But, perhaps just pulling the DateArray into NumPy is a sufficient first > step. > > -Travis > > _______________________________________________ > Scipy-dev mailing list > Scipy-dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > >Do you know Laura Smithies? I was in graduate school with her. Thanks, Chuck. I don't know any Laura Smithies, sorry. >My approach would be to make a NumPy dtype that is basically the Date >class with more frequencies as taken from the date/time proposal. I'll be sure to take a look at the scikits.timeseries package before writing up an application. Is this date / time proposal still a candidate for GSoC? 
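For reference, a rough pure-Python/NumPy illustration of the "integer array plus frequency metadata" representation described above. This is not the scikits.timeseries API and not the proposed dtype; the class DailyDateArray and its methods are invented for the example, and only show the idea of storing dates as plain int64 offsets at a fixed (here daily) frequency.

import datetime
import numpy as np

EPOCH = datetime.date(1970, 1, 1)

class DailyDateArray(object):
    """Toy date array: int64 offsets from EPOCH at daily frequency."""
    freq = 'D'

    def __init__(self, ordinals):
        self.ordinals = np.asarray(ordinals, dtype=np.int64)

    @classmethod
    def from_dates(cls, dates):
        return cls([(d - EPOCH).days for d in dates])

    def to_dates(self):
        return [EPOCH + datetime.timedelta(days=int(o)) for o in self.ordinals]

    def shift(self, n):
        # date arithmetic is just integer arithmetic on the underlying array
        return DailyDateArray(self.ordinals + n)

d = DailyDateArray.from_dates([datetime.date(2009, 3, 25),
                               datetime.date(2009, 3, 26)])
print(d.shift(7).to_dates())    # the same dates one week later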
From david.huard at gmail.com Wed Mar 25 23:44:53 2009 From: david.huard at gmail.com (David Huard) Date: Wed, 25 Mar 2009 23:44:53 -0400 Subject: [SciPy-dev] Porting SciPy to Py3k GSOC project In-Reply-To: <49CA3F2D.8060105@student.matnat.uio.no> References: <73531abb0903221911k1d862881q9db5f387fa93bb39@mail.gmail.com> <9457e7c80903240111r48e3bb62u7df112277bdb78e9@mail.gmail.com> <49C8981A.80805@ar.media.kyoto-u.ac.jp> <73531abb0903241222x545ca7e7t28a9961f06d86791@mail.gmail.com> <73531abb0903241444p2e42839ay619c9e65a1d2ed50@mail.gmail.com> <91cf711d0903250715lccff543n6612c31a544a3fad@mail.gmail.com> <49CA3F2D.8060105@student.matnat.uio.no> Message-ID: <91cf711d0903252044x1014a58dmafc3507975ca642d@mail.gmail.com> On Wed, Mar 25, 2009 at 10:26 AM, Dag Sverre Seljebotn < dagss at student.matnat.uio.no> wrote: > David Huard wrote: > > > > My understanding is that Pearu still maintains the code in the sense > > that he fixes the occasional bugs, but does not actively develop it > > anymore. He started a refactoring of the code known as f2py g3 which > > is hosted at http://launchpad.net/f2py/ but this project seems to be > > on hold for the moment. > > If you wish to work on f2py, I suggest you make sure he has some time > > to act as mentor for this project > Note that I'm currently discussing a GSoC project in the Cython camp on > Fortran integration with a promising student (with me as mentor). One > possibility we're looking at is using f2py for parsing and Cython as the > backend/output format. The aim would be to have a more transparent > Cython/Fortran experience without having to go through a Python layer. > Dag, This is a great idea. I really hope it works out. Just in case you were not aware, f2py g3 has a standalone fortran parser, so no need to pry that out of numpy.f2py. David > I'll get back to the NumPy list in a day or two when we have discussed > the road we want to take a bit more. > > Dag Sverre > _______________________________________________ > Scipy-dev mailing list > Scipy-dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From scott.collier at utoronto.ca Wed Mar 25 23:49:11 2009 From: scott.collier at utoronto.ca (scott.collier at utoronto.ca) Date: Wed, 25 Mar 2009 23:49:11 -0400 Subject: [SciPy-dev] Google Summer of Code 2009 Message-ID: <20090325234911.38omt96kg00os0k0@webmail.utoronto.ca> My name is Scott Collier. I'm a student at the University of Toronto studying Physics, Mathematics and Computer Science, and am considering applying to SciPy/SymPy for the google summer of code program. I am especially interesting in the application/simulation of Newtonian Mechanics and Special Relativity using Python. I have basic knowledge in python, and love programming, physics, and math, so being able to integrate the three would be an incredible way to spend a summer. Any help/guidance would be much appreciated. Scott. From charlesr.harris at gmail.com Thu Mar 26 00:34:11 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 25 Mar 2009 22:34:11 -0600 Subject: [SciPy-dev] Google Summer of Code 2009 In-Reply-To: <20090325234911.38omt96kg00os0k0@webmail.utoronto.ca> References: <20090325234911.38omt96kg00os0k0@webmail.utoronto.ca> Message-ID: On Wed, Mar 25, 2009 at 9:49 PM, wrote: > My name is Scott Collier. 
I'm a student at the University of Toronto > studying Physics, Mathematics and Computer Science, and am considering > applying to SciPy/SymPy for the google summer of code program. I am > especially interesting in the application/simulation of Newtonian > Mechanics and Special Relativity using Python. I have basic knowledge > in python, and love programming, physics, and math, so being able to > integrate the three would be an incredible way to spend a summer. > Any help/guidance would be much appreciated. > Something along theselines? You might try talking to those folks. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From david.huard at gmail.com Thu Mar 26 10:01:44 2009 From: david.huard at gmail.com (David Huard) Date: Thu, 26 Mar 2009 10:01:44 -0400 Subject: [SciPy-dev] Cython, f2py and GSoC In-Reply-To: <49CAAED8.4030204@student.matnat.uio.no> References: <49CAAED8.4030204@student.matnat.uio.no> Message-ID: <91cf711d0903260701va983eccg21f498e8ea7a89c4@mail.gmail.com> On Wed, Mar 25, 2009 at 6:23 PM, Dag Sverre Seljebotn < dagss at student.matnat.uio.no> wrote: > This is in preparation for a GSoC project application; Kurt Smith has > approached me about doing a project on Fortran integration in Cython > with myself as mentor. > > Although we started out with a Fortran/Cython perspective, we think > that this potentially affects the SciPy community and f2py as well. > > The main issues: > 1) f2py doesn't work that well for Cython, as it requires Python > packing/unpacking of arguments. A more direct call approach is needed. > 2) f2py and Cython has a certain overlap in their implementation (both > generate Python extension modules), and need to tackle many of the same > issues both now and especially in the future > > Could we solve this so that in getting Fortran/Cython integration, we > also set up a development path for further development of f2py with > Cython as a backend? > > Below is a scetch of our current plan to give you an idea. More full > specifications etc. will come later and we can have any discussions then. > > 1) Add a Cython syntax and API for passing acquired PEP-3118 > buffers/NumPy arrays to external C functions (i.e. as a struct with the > necesarry information (pointer, shape, strides)). This simply means > defining a syntax for passing information that Cython already has to an > external C function. > > 2) Create a new tool which uses the parser part of f2py (with any > necesarry improvements) but adds a different backend which generates a C > interface to the given Fortran module, along with a Cython pxd file for > it. (Adding a C .h file target, to get "f2c" functionality, would be > trivial.) > > This will be done using the Fortran 2003 C bindings. So a .f90 file is > generated which compiles to a C interface to the library. Array > parameters will be passed as the PEP-3118-like structs we define in 1), > and so the functions will be callable directly with e.g. NumPy arrays > from Cython. Copy-in/out might be necesarry for Fortran to be able to > work with the arrays, if so this will happen in the Fortran wrapper > generated by this new tool. > Do you plan to write a modified copy of the entire subroutine or just a wrapper subroutine accepting C types, which then calls the original function ? > > 3) One could then add a feature to Cython to automatically export > external functions as Python functions so that one doesn't have to write > wrapper stubs. 
This should bring the functionality to a level comparable > to current f2py. > This would be really nice. > > Now, how does the SciPy community see this project? > > 1) Is there a potential for a joint Cython/SciPy project on > Fortran/Python integration here? I could do the main mentoring work, but > support of the idea etc. is important too. > I think this is a great project. I really wish I could interface fortran and python more closely. Many new features of f90 are not supported by f2py, so I am ending up writing python code that I already wrote in f90. > > 2) Any co-mentors perhaps on the f2py parser side? Improvements might > be needed there. > > 3) Would you prefer us to a) rip/fork the parser out of f2py and stay > within the Cython project, or b) work on f2py upstream to add another > backend? Or something else? > > -- > Dag Sverre > _______________________________________________ > Scipy-dev mailing list > Scipy-dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dagss at student.matnat.uio.no Thu Mar 26 10:45:13 2009 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Thu, 26 Mar 2009 15:45:13 +0100 Subject: [SciPy-dev] Cython, f2py and GSoC In-Reply-To: <91cf711d0903260701va983eccg21f498e8ea7a89c4@mail.gmail.com> References: <49CAAED8.4030204@student.matnat.uio.no> <91cf711d0903260701va983eccg21f498e8ea7a89c4@mail.gmail.com> Message-ID: <49CB94F9.3070708@student.matnat.uio.no> David Huard wrote: > > > On Wed, Mar 25, 2009 at 6:23 PM, Dag Sverre Seljebotn > > wrote: > > This is in preparation for a GSoC project application; Kurt Smith has > approached me about doing a project on Fortran integration in Cython > with myself as mentor. > > Although we started out with a Fortran/Cython perspective, we think > that this potentially affects the SciPy community and f2py as well. > > The main issues: > 1) f2py doesn't work that well for Cython, as it requires Python > packing/unpacking of arguments. A more direct call approach is needed. > 2) f2py and Cython has a certain overlap in their implementation > (both > generate Python extension modules), and need to tackle many of the > same > issues both now and especially in the future > > Could we solve this so that in getting Fortran/Cython integration, we > also set up a development path for further development of f2py with > Cython as a backend? > > Below is a scetch of our current plan to give you an idea. More full > specifications etc. will come later and we can have any > discussions then. > > 1) Add a Cython syntax and API for passing acquired PEP-3118 > buffers/NumPy arrays to external C functions (i.e. as a struct > with the > necesarry information (pointer, shape, strides)). This simply means > defining a syntax for passing information that Cython already has > to an > external C function. > > 2) Create a new tool which uses the parser part of f2py (with any > necesarry improvements) but adds a different backend which > generates a C > interface to the given Fortran module, along with a Cython pxd > file for > it. (Adding a C .h file target, to get "f2c" functionality, would be > trivial.) > > This will be done using the Fortran 2003 C bindings. So a .f90 file is > generated which compiles to a C interface to the library. Array > parameters will be passed as the PEP-3118-like structs we define > in 1), > and so the functions will be callable directly with e.g. NumPy arrays > from Cython. 
Copy-in/out might be necessary for Fortran to be able to
> work with the arrays, if so this will happen in the Fortran wrapper
> generated by this new tool.
>
> Do you plan to write a modified copy of the entire subroutine or just
> a wrapper subroutine accepting C types, which then calls the original
> function ?

The latter.

Though we could always skip wrapping of functions which already are declared BIND(C), that's actually a nice idea. This would make it possible to write a wrapper manually and skip this overhead in extreme situations (where the Fortran compiler also cannot inline the call), while still having Python bindings generated.

Dag Sverre

From sturla at molden.no Thu Mar 26 11:20:20 2009
From: sturla at molden.no (Sturla Molden)
Date: Thu, 26 Mar 2009 16:20:20 +0100
Subject: [SciPy-dev] Cython, f2py and GSoC
In-Reply-To: <49CB94F9.3070708@student.matnat.uio.no>
References: <49CAAED8.4030204@student.matnat.uio.no> <91cf711d0903260701va983eccg21f498e8ea7a89c4@mail.gmail.com> <49CB94F9.3070708@student.matnat.uio.no>
Message-ID: <49CB9D34.7060601@molden.no>

On 3/26/2009 3:45 PM, Dag Sverre Seljebotn wrote:
> Though we could always skip wrapping of functions which already are
> declared BIND(C), that's actually a nice idea. This would make it
> possible to write a wrapper manually and skip this overhead in extreme
> situations (where the Fortran compiler also cannot inline the call),
> while still having Python bindings generated.

If the idea is to take a Fortran module and autogenerate an .f03 file with C bindings and a corresponding C/C++ header, I don't really see where Cython comes in. Such a tool would be useful beyond the Python community.

Sturla Molden

From rjsm at umich.edu Thu Mar 26 13:00:09 2009
From: rjsm at umich.edu (ross smith)
Date: Thu, 26 Mar 2009 13:00:09 -0400
Subject: [SciPy-dev] NumPy f2py GSOC project
Message-ID: <73531abb0903261000n333a4651hdeda83130e09c9b6@mail.gmail.com>

Hello,

I've done some looking at what will be required for the project.

1. Port f2py to Cython (Py3k).
   a. Begin porting the C code to Python, without worrying about speed-ups; instead focus on proper runtime behavior.
   b. Ask the community to help with providing Fortran code to test proper runtime behavior.
   c. Develop a small test suite of Fortran/Python code for problem cases.
   d. Fix major bugs introduced from porting.
   e. Document code (docstrings).
   f. Begin modifying code for speedier execution.

Midterm eval:
   a. Have f2py code all ported to Cython and compiling.
   b. Have at least 5 tests written and working in ported code.

Final eval:
   a. Major bugs found during the summer squashed.
   b. f2py compiling and running as expected for all test cases.
   c. Documentation completed for Cython code.

I'd happily stay on to bugfix once the summer is done. I'd still be a student so my availability would be limited at times.

Comments and suggestions, as always, are welcome.

-Ross

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From kwmsmith at gmail.com Thu Mar 26 13:14:33 2009 From: kwmsmith at gmail.com (Kurt Smith) Date: Thu, 26 Mar 2009 12:14:33 -0500 Subject: [SciPy-dev] Cython, f2py and GSoC In-Reply-To: <49CB9D34.7060601@molden.no> References: <49CAAED8.4030204@student.matnat.uio.no> <91cf711d0903260701va983eccg21f498e8ea7a89c4@mail.gmail.com> <49CB94F9.3070708@student.matnat.uio.no> <49CB9D34.7060601@molden.no> Message-ID: On Thu, Mar 26, 2009 at 10:20 AM, Sturla Molden wrote: > On 3/26/2009 3:45 PM, Dag Sverre Seljebotn wrote: > >> Though we could always skip wrapping of functions which already are >> declared BIND(C), that's actually a nice idea. This would make it >> possible to write a wrapper manually and skip this overhead in extreme >> situations (where the Fortran compiler also cannot inline the call), >> while still having Python bindings generated. > > > If the idea is to take a Fortran module and autogenerate an .f03 file > with C bindings and a corresponding C/C++ header, I don't really see > where Cython comes in. Such a tool would be useful beyond the Python > community. The project has 2 'fronts': the Fortran <-> C/Cython/Python bindings end with a patched & updated f2py, and enhancing Cython to pass python buffers to external functions. The Fortran bindings generator would be able to generate bindings that are python-buffer aware. Currently one has to include the array bounds information in the argument list for all fortran functions to work with f2py; one can't use the nice assumed-shape arrays in Fortran 90 and later. We would be adding that functionality (among other things). One could then seamlessly pass around multi-dimensional numpy arrays to external fortran functions that use assumed-shape arrays. Kurt From david.huard at gmail.com Thu Mar 26 13:30:21 2009 From: david.huard at gmail.com (David Huard) Date: Thu, 26 Mar 2009 13:30:21 -0400 Subject: [SciPy-dev] Cython, f2py and GSoC In-Reply-To: References: <49CAAED8.4030204@student.matnat.uio.no> <91cf711d0903260701va983eccg21f498e8ea7a89c4@mail.gmail.com> <49CB94F9.3070708@student.matnat.uio.no> <49CB9D34.7060601@molden.no> Message-ID: <91cf711d0903261030n331df8b9h90d9beb02e5a1e9b@mail.gmail.com> On Thu, Mar 26, 2009 at 1:14 PM, Kurt Smith wrote: > On Thu, Mar 26, 2009 at 10:20 AM, Sturla Molden wrote: > > On 3/26/2009 3:45 PM, Dag Sverre Seljebotn wrote: > > > >> Though we could always skip wrapping of functions which already are > >> declared BIND(C), that's actually a nice idea. This would make it > >> possible to write a wrapper manually and skip this overhead in extreme > >> situations (where the Fortran compiler also cannot inline the call), > >> while still having Python bindings generated. > > > > > > If the idea is to take a Fortran module and autogenerate an .f03 file > > with C bindings and a corresponding C/C++ header, I don't really see > > where Cython comes in. Such a tool would be useful beyond the Python > > community. > > The project has 2 'fronts': the Fortran <-> C/Cython/Python bindings > end with a patched & updated f2py, and enhancing Cython to pass python > buffers to external functions. > > The Fortran bindings generator would be able to generate bindings that > are python-buffer aware. Currently one has to include the array > bounds information in the argument list for all fortran functions to > work with f2py; one can't use the nice assumed-shape arrays in Fortran > 90 and later. We would be adding that functionality (among other > things). 
One could then seamlessly pass around multi-dimensional > numpy arrays to external fortran functions that use assumed-shape > arrays. This would be great ! Have you thought of a way to handle ELEMENTAL statements ? Could they be converted into ufuncs directly ? David > > > Kurt > _______________________________________________ > Scipy-dev mailing list > Scipy-dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From david.huard at gmail.com Thu Mar 26 13:51:03 2009 From: david.huard at gmail.com (David Huard) Date: Thu, 26 Mar 2009 13:51:03 -0400 Subject: [SciPy-dev] NumPy f2py GSOC project In-Reply-To: <73531abb0903261000n333a4651hdeda83130e09c9b6@mail.gmail.com> References: <73531abb0903261000n333a4651hdeda83130e09c9b6@mail.gmail.com> Message-ID: <91cf711d0903261051w56d243k4b9bf50e89a80625@mail.gmail.com> Ross, I still don't understand how you'll test the py3k version of f2py if numpy is not ported first ? I would have thought the first step in porting numpy would be to port the ndarray type and its methods. But maybe that's already too ambitious for a gsoc ? Cheers, David 2009/3/26 ross smith > Hello, > > I've done some looking at what will be required for the project. > > > 1. port f2py to Cython (Py3k). > a. begin porting the C code to python, without worrying about > speed-ups. Instead focusing on proper runtime behavior > b. ask community to help with providing fortran code to test proper > runtime behavior. > c. develop small test suit of fortran/python code for problem cases. > d. Fix major bugs introduced from porting. > e. Document code, (docstrings) > f. Beginning modifying code for speedier execution > > Midterm eval: > a. Have f2py code all ported to Cython and compiling. > b. Have at least 5 tests written and working in ported code. > Final Eval > a. Major bugs found during the summer squashed. > b. f2py compiling and running as expected for all test cases. > c. Documentation completed for Cython code. > > > I'd happily stay on to bugfix once the summer is done. I'd still be a > student so my availability would be limited at times. > > Comments ans Suggestions, as always, are welcome. > > > -Ross > > _______________________________________________ > Scipy-dev mailing list > Scipy-dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Thu Mar 26 13:55:16 2009 From: cournape at gmail.com (David Cournapeau) Date: Fri, 27 Mar 2009 02:55:16 +0900 Subject: [SciPy-dev] NumPy f2py GSOC project In-Reply-To: <91cf711d0903261051w56d243k4b9bf50e89a80625@mail.gmail.com> References: <73531abb0903261000n333a4651hdeda83130e09c9b6@mail.gmail.com> <91cf711d0903261051w56d243k4b9bf50e89a80625@mail.gmail.com> Message-ID: <5b8d13220903261055h409df166pa12757d07d8a3cac@mail.gmail.com> 2009/3/27 David Huard : > Ross, > > I still don't understand how you'll test the py3k version of f2py if numpy > is not ported first ? f2py itself could be ported to py3k but to generate python 2. C code, right ? > I would have thought the first step in porting numpy > would be to port the ndarray type and its methods. But maybe that's already > too ambitious for a gsoc ? 
I think that's very ambitious, even for people already intimately familiar with numpy C core, cheers, David From david.huard at gmail.com Thu Mar 26 15:25:58 2009 From: david.huard at gmail.com (David Huard) Date: Thu, 26 Mar 2009 15:25:58 -0400 Subject: [SciPy-dev] NumPy f2py GSOC project In-Reply-To: <5b8d13220903261055h409df166pa12757d07d8a3cac@mail.gmail.com> References: <73531abb0903261000n333a4651hdeda83130e09c9b6@mail.gmail.com> <91cf711d0903261051w56d243k4b9bf50e89a80625@mail.gmail.com> <5b8d13220903261055h409df166pa12757d07d8a3cac@mail.gmail.com> Message-ID: <91cf711d0903261225g72547205t9f1798e4a0a20af9@mail.gmail.com> On Thu, Mar 26, 2009 at 1:55 PM, David Cournapeau wrote: > 2009/3/27 David Huard : > > Ross, > > > > I still don't understand how you'll test the py3k version of f2py if > numpy > > is not ported first ? > > f2py itself could be ported to py3k but to generate python 2. C code, right > ? > RIght, and how does this play with the other cython/f2py gsoc project ? > > > I would have thought the first step in porting numpy > > would be to port the ndarray type and its methods. But maybe that's > already > > too ambitious for a gsoc ? > > I think that's very ambitious, even for people already intimately > familiar with numpy C core, > > cheers, > > David > _______________________________________________ > Scipy-dev mailing list > Scipy-dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dagss at student.matnat.uio.no Thu Mar 26 15:43:20 2009 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Thu, 26 Mar 2009 20:43:20 +0100 Subject: [SciPy-dev] Cython, f2py and GSoC In-Reply-To: <49CB9D34.7060601@molden.no> References: <49CAAED8.4030204@student.matnat.uio.no> <91cf711d0903260701va983eccg21f498e8ea7a89c4@mail.gmail.com> <49CB94F9.3070708@student.matnat.uio.no> <49CB9D34.7060601@molden.no> Message-ID: <49CBDAD8.1090103@student.matnat.uio.no> Sturla Molden wrote: > On 3/26/2009 3:45 PM, Dag Sverre Seljebotn wrote: > >> Though we could always skip wrapping of functions which already are >> declared BIND(C), that's actually a nice idea. This would make it >> possible to write a wrapper manually and skip this overhead in extreme >> situations (where the Fortran compiler also cannot inline the call), >> while still having Python bindings generated. > > > If the idea is to take a Fortran module and autogenerate an .f03 file > with C bindings and a corresponding C/C++ header, I don't really see > where Cython comes in. Such a tool would be useful beyond the Python > community. Indeed. Thanks for bringing that up. If somebody had already done the "f2c" (f2py with C backend) part I'm sure both me and Kurt would like to focus on Cython; but as it is, necessity is the mother of invention. The "f2c" will be kept seperate from Cython; we hope that it can go into f2py (as that's who has the parser) but that is not up to us. Some further thoughts: a) As Kurt touched upon, there's some Cython-specific features needed as well to make this happen. I hope that about 50% of the project can be on the Cython part and 50% on "f2c". b) While there's a clear technical seperation between Fortran <-> C and C <-> Cython, the main challenge is the same both places (dealing with strided arrays on a C level) and so it's benefitial to have them within the same project. c) Going via C one will have "information loss" in the function signature as there's no canonical strided array type for C. 
Therefore a pxd for Cython must be generated directly as well. BTW, part of that Cython work would involve getting 90% of the way towards cdef utility_function_of_buffer(ndarray[int] foo): ... without reacquisition of the buffer :-) (i.e. caller-acquired buffer) (The only similar thing I found to "f2c" to this was some Babel-specific XSLT transforms in Chasm... if you know of anything else please tell us!) -- Dag Sverre From dagss at student.matnat.uio.no Thu Mar 26 15:45:53 2009 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Thu, 26 Mar 2009 20:45:53 +0100 Subject: [SciPy-dev] Cython, f2py and GSoC In-Reply-To: <91cf711d0903261030n331df8b9h90d9beb02e5a1e9b@mail.gmail.com> References: <49CAAED8.4030204@student.matnat.uio.no> <91cf711d0903260701va983eccg21f498e8ea7a89c4@mail.gmail.com> <49CB94F9.3070708@student.matnat.uio.no> <49CB9D34.7060601@molden.no> <91cf711d0903261030n331df8b9h90d9beb02e5a1e9b@mail.gmail.com> Message-ID: <49CBDB71.1030308@student.matnat.uio.no> David Huard wrote: > The Fortran bindings generator would be able to generate bindings that > are python-buffer aware. Currently one has to include the array > bounds information in the argument list for all fortran functions to > work with f2py; one can't use the nice assumed-shape arrays in Fortran > 90 and later. We would be adding that functionality (among other > things). One could then seamlessly pass around multi-dimensional > numpy arrays to external fortran functions that use assumed-shape > arrays. > > > This would be great ! > > Have you thought of a way to handle ELEMENTAL statements ? Could they be > converted into ufuncs directly ? It's possible to take Cython to a level where this could be done through Cython, and I hope that will happen, but I don't think it should fall within the scope of this project because of the workload. -- Dag Sverre From rjsm at umich.edu Thu Mar 26 16:16:35 2009 From: rjsm at umich.edu (ross smith) Date: Thu, 26 Mar 2009 16:16:35 -0400 Subject: [SciPy-dev] NumPy f2py GSOC project In-Reply-To: <91cf711d0903261225g72547205t9f1798e4a0a20af9@mail.gmail.com> References: <73531abb0903261000n333a4651hdeda83130e09c9b6@mail.gmail.com> <91cf711d0903261051w56d243k4b9bf50e89a80625@mail.gmail.com> <5b8d13220903261055h409df166pa12757d07d8a3cac@mail.gmail.com> <91cf711d0903261225g72547205t9f1798e4a0a20af9@mail.gmail.com> Message-ID: <73531abb0903261316i5c4e805ue783f582e9bea386@mail.gmail.com> When I looked at that one, It seemed to me like they are trying to use it to bind fortran and C together with only a thin, fairly transparent python layer in the middle. This one, moves the f2py code to Cython to ease development later on and help with compatibility between Py3k and Py2x. I don't see any reason why the two projects would be incompatible. if I'm wrong please correct me. -Ross 2009/3/26 David Huard > > > On Thu, Mar 26, 2009 at 1:55 PM, David Cournapeau > wrote: > >> 2009/3/27 David Huard : >> > Ross, >> > >> > I still don't understand how you'll test the py3k version of f2py if >> numpy >> > is not ported first ? >> >> f2py itself could be ported to py3k but to generate python 2. C code, >> right ? >> > > RIght, and how does this play with the other cython/f2py gsoc project ? > > >> >> > I would have thought the first step in porting numpy >> > would be to port the ndarray type and its methods. But maybe that's >> already >> > too ambitious for a gsoc ? 
>> >> I think that's very ambitious, even for people already intimately >> familiar with numpy C core, >> >> cheers, >> >> David >> _______________________________________________ >> Scipy-dev mailing list >> Scipy-dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> > > > _______________________________________________ > Scipy-dev mailing list > Scipy-dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kwmsmith at gmail.com Thu Mar 26 16:44:30 2009 From: kwmsmith at gmail.com (Kurt Smith) Date: Thu, 26 Mar 2009 15:44:30 -0500 Subject: [SciPy-dev] NumPy f2py GSOC project In-Reply-To: <73531abb0903261316i5c4e805ue783f582e9bea386@mail.gmail.com> References: <73531abb0903261000n333a4651hdeda83130e09c9b6@mail.gmail.com> <91cf711d0903261051w56d243k4b9bf50e89a80625@mail.gmail.com> <5b8d13220903261055h409df166pa12757d07d8a3cac@mail.gmail.com> <91cf711d0903261225g72547205t9f1798e4a0a20af9@mail.gmail.com> <73531abb0903261316i5c4e805ue783f582e9bea386@mail.gmail.com> Message-ID: 2009/3/26 ross smith : > When I looked at that one, ?It seemed to me like they are trying to use it > to bind fortran and C together with only a thin, fairly transparent python > layer in the middle. Actually, it's more like binding Fortran to Python/Cython and enhancing the ability to pass numpy arrays and python buffer objects to external fortran code. The "f2c" capability (with fortran 2003 ISO C bindings) is just gravy ;-) The "python layer" will likely have to do a good bit of work to make the passing of arrays seamless to the user and minimize the copying of arrays. Kurt From charlesr.harris at gmail.com Thu Mar 26 17:07:46 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 26 Mar 2009 15:07:46 -0600 Subject: [SciPy-dev] NumPy f2py GSOC project In-Reply-To: References: <73531abb0903261000n333a4651hdeda83130e09c9b6@mail.gmail.com> <91cf711d0903261051w56d243k4b9bf50e89a80625@mail.gmail.com> <5b8d13220903261055h409df166pa12757d07d8a3cac@mail.gmail.com> <91cf711d0903261225g72547205t9f1798e4a0a20af9@mail.gmail.com> <73531abb0903261316i5c4e805ue783f582e9bea386@mail.gmail.com> Message-ID: On Thu, Mar 26, 2009 at 2:44 PM, Kurt Smith wrote: > 2009/3/26 ross smith : > > When I looked at that one, It seemed to me like they are trying to use > it > > to bind fortran and C together with only a thin, fairly transparent > python > > layer in the middle. > > Actually, it's more like binding Fortran to Python/Cython and > enhancing the ability to pass numpy arrays and python buffer objects > to external fortran code. The "f2c" capability (with fortran 2003 ISO > C bindings) is just gravy ;-) The "python layer" will likely have to > do a good bit of work to make the passing of arrays seamless to the > user and minimize the copying of arrays. > I'm a bit confused as to how these two projects relate each other. Should they be coordinated or can they stand separately? What would be the upshot? If f2py is updated, then presumably it could act as a drop in replacement for the current f2py. Would the cython project have that same effect? Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dagss at student.matnat.uio.no Thu Mar 26 18:47:38 2009 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Thu, 26 Mar 2009 23:47:38 +0100 Subject: [SciPy-dev] NumPy f2py GSOC project In-Reply-To: References: <73531abb0903261000n333a4651hdeda83130e09c9b6@mail.gmail.com> <91cf711d0903261051w56d243k4b9bf50e89a80625@mail.gmail.com> <5b8d13220903261055h409df166pa12757d07d8a3cac@mail.gmail.com> <91cf711d0903261225g72547205t9f1798e4a0a20af9@mail.gmail.com> <73531abb0903261316i5c4e805ue783f582e9bea386@mail.gmail.com> Message-ID: <49CC060A.6060902@student.matnat.uio.no> Charles R Harris wrote: > On Thu, Mar 26, 2009 at 2:44 PM, Kurt Smith > wrote: > > 2009/3/26 ross smith >: > > When I looked at that one, It seemed to me like they are trying > to use it > > to bind fortran and C together with only a thin, fairly > transparent python > > layer in the middle. > > Actually, it's more like binding Fortran to Python/Cython and > enhancing the ability to pass numpy arrays and python buffer objects > to external fortran code. The "f2c" capability (with fortran 2003 ISO > C bindings) is just gravy ;-) The "python layer" will likely have to > do a good bit of work to make the passing of arrays seamless to the > user and minimize the copying of arrays. > > > I'm a bit confused as to how these two projects relate each other. > Should they be coordinated or can they stand separately? What would be > the upshot? If f2py is updated, then presumably it could act as a drop > in replacement for the current f2py. Would the cython project have that > same effect? Kurt's project/Cython-Fortran: - Only uses the parser from f2py, but may provide some improvements/patches for that part. There's a good chance 3rd gen f2py will be chosen in which case there is probably no overlap codewise (if Ross' project is for 2nd gen f2py). - A drop-in replacement for f2py is not a goal. It could be for projects where no changes are made to generated .pyf files, but will not become one for custom-written .pyf files - Will facilitate wrapping Fortran modules using Cython in the same way one uses Cython for C (i.e. if non-trivial wrapping is needed one writes Cython code rather than a modify a .pyf file). If Ross' project involves work on the same parser (likely this means the 3rd gen f2py) then some coordination would be in order in the parser; this could rather trivially be resolved. Otherwise there doesn't seem to be any overlap. As far as I can gather, that is. But I must admit that I don't really understand Ross' project. What part will be ported to Cython? As I understand it, a) f2py itself is entirely written in Python already, b) it generates a C module much like Cython does, which must currently be kept Py2 because of the NumPy dependency. If the plan is to switch to a Cython backend, so that f2py starts *generating* Cython code, then there's somewhat more overlap in goals. (But that would break existing .pyf files as they can contain free-form C code?) 
-- Dag Sverre From rjsm at umich.edu Thu Mar 26 18:57:20 2009 From: rjsm at umich.edu (ross smith) Date: Thu, 26 Mar 2009 18:57:20 -0400 Subject: [SciPy-dev] NumPy f2py GSOC project In-Reply-To: <73531abb0903261555t26a0894ag6d21691635a71976@mail.gmail.com> References: <73531abb0903261000n333a4651hdeda83130e09c9b6@mail.gmail.com> <91cf711d0903261051w56d243k4b9bf50e89a80625@mail.gmail.com> <5b8d13220903261055h409df166pa12757d07d8a3cac@mail.gmail.com> <91cf711d0903261225g72547205t9f1798e4a0a20af9@mail.gmail.com> <73531abb0903261316i5c4e805ue783f582e9bea386@mail.gmail.com> <49CC060A.6060902@student.matnat.uio.no> <73531abb0903261555t26a0894ag6d21691635a71976@mail.gmail.com> Message-ID: <73531abb0903261557j48442f0fw5b06ae2803c7617f@mail.gmail.com> >What part will be ported to Cython? As I >understand it, a) f2py itself is entirely written in Python already, b) >it generates a C module much like Cython does, which must currently be >kept Py2 because of the NumPy dependency. the fortranobject is written in C, and would be ported to Cython. also, much of the python code could be ported to cython for some speedup and to make the code 'python neutral'. Cython would do the heavy lifting of generating valid python (2 or 3) code when it is compiled, so the code wouldn't have to be ported by hand when the rest of Numpy is ported. -Ross -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Mar 27 00:30:44 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 26 Mar 2009 22:30:44 -0600 Subject: [SciPy-dev] NumPy f2py GSOC project In-Reply-To: <73531abb0903261557j48442f0fw5b06ae2803c7617f@mail.gmail.com> References: <73531abb0903261000n333a4651hdeda83130e09c9b6@mail.gmail.com> <91cf711d0903261051w56d243k4b9bf50e89a80625@mail.gmail.com> <5b8d13220903261055h409df166pa12757d07d8a3cac@mail.gmail.com> <91cf711d0903261225g72547205t9f1798e4a0a20af9@mail.gmail.com> <73531abb0903261316i5c4e805ue783f582e9bea386@mail.gmail.com> <49CC060A.6060902@student.matnat.uio.no> <73531abb0903261555t26a0894ag6d21691635a71976@mail.gmail.com> <73531abb0903261557j48442f0fw5b06ae2803c7617f@mail.gmail.com> Message-ID: 2009/3/26 ross smith > >What part will be ported to Cython? As I > >understand it, a) f2py itself is entirely written in Python already, b) > >it generates a C module much like Cython does, which must currently be > >kept Py2 because of the NumPy dependency. > > the fortranobject is written in C, and would be ported to Cython. also, > much of the python code could be ported to cython for some speedup and to > make the code 'python neutral'. Cython would do the heavy lifting of > generating valid python (2 or 3) code when it is compiled, so the code > wouldn't have to be ported by hand when the rest of Numpy is ported. > So you intend for the output file to be in cython? I suppose the reasoning here is that that would solve c-level compatibility problems. I'm not sure that's the best way to go, it introduces a dependency on cython python3.0 and numpy support and I don't know where that stands. Maybe Dag can comment. I don't think rewriting the parser itself in cython is a good idea. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rjsm at umich.edu Fri Mar 27 00:54:37 2009 From: rjsm at umich.edu (ross smith) Date: Fri, 27 Mar 2009 00:54:37 -0400 Subject: [SciPy-dev] NumPy f2py GSOC project In-Reply-To: References: <73531abb0903261000n333a4651hdeda83130e09c9b6@mail.gmail.com> <5b8d13220903261055h409df166pa12757d07d8a3cac@mail.gmail.com> <91cf711d0903261225g72547205t9f1798e4a0a20af9@mail.gmail.com> <73531abb0903261316i5c4e805ue783f582e9bea386@mail.gmail.com> <49CC060A.6060902@student.matnat.uio.no> <73531abb0903261555t26a0894ag6d21691635a71976@mail.gmail.com> <73531abb0903261557j48442f0fw5b06ae2803c7617f@mail.gmail.com> Message-ID: <73531abb0903262154v3c7fcb93wbdfd05798b328b6d@mail.gmail.com> Yes I do. I can see the Numpy and Cython dependence, which may not be a good idea but Cython can produce both python2 code and python3 code. In any case, I'm seeing this project getting harder to pitch and define so, are there any other projects for NumPy or SciPy that would appropriate for a gsoc project? -Ross 2009/3/27 Charles R Harris > > > 2009/3/26 ross smith > >> >What part will be ported to Cython? As I >> >> >understand it, a) f2py itself is entirely written in Python already, b) >> >it generates a C module much like Cython does, which must currently be >> >kept Py2 because of the NumPy dependency. >> >> the fortranobject is written in C, and would be ported to Cython. also, >> much of the python code could be ported to cython for some speedup and to >> make the code 'python neutral'. Cython would do the heavy lifting of >> generating valid python (2 or 3) code when it is compiled, so the code >> wouldn't have to be ported by hand when the rest of Numpy is ported. >> > > So you intend for the output file to be in cython? I suppose the reasoning > here is that that would solve c-level compatibility problems. I'm not sure > that's the best way to go, it introduces a dependency on cython python3.0 > and numpy support and I don't know where that stands. Maybe Dag can comment. > I don't think rewriting the parser itself in cython is a good idea. > > Chuck > > > > _______________________________________________ > Scipy-dev mailing list > Scipy-dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Mar 27 01:30:25 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 26 Mar 2009 23:30:25 -0600 Subject: [SciPy-dev] NumPy f2py GSOC project In-Reply-To: <73531abb0903262154v3c7fcb93wbdfd05798b328b6d@mail.gmail.com> References: <73531abb0903261000n333a4651hdeda83130e09c9b6@mail.gmail.com> <91cf711d0903261225g72547205t9f1798e4a0a20af9@mail.gmail.com> <73531abb0903261316i5c4e805ue783f582e9bea386@mail.gmail.com> <49CC060A.6060902@student.matnat.uio.no> <73531abb0903261555t26a0894ag6d21691635a71976@mail.gmail.com> <73531abb0903261557j48442f0fw5b06ae2803c7617f@mail.gmail.com> <73531abb0903262154v3c7fcb93wbdfd05798b328b6d@mail.gmail.com> Message-ID: 2009/3/26 ross smith > Yes I do. I can see the Numpy and Cython dependence, which may not be a > good idea but Cython can produce both python2 code and python3 code. > > In any case, I'm seeing this project getting harder to pitch and define so, > are there any other projects for NumPy or SciPy that would appropriate for a > gsoc project? > I think the main things going on in numpy at this time are test coverage/documentation/code cleanup. David is also working the build/distribution problems. 
I can't speak for scipy, as I am less familiar with its current state. It would be a shame to waste your experience with the python 2.x -> 3.x transition. I wish I was in a better position to help plan out the project, but I don't even know what the problems will be at this point. Which probably doesn't help you pitch the project to Google. No doubt there is a lot of code cleanup in the python sections of numpy that would help with the transition, but I don't know what that would be either. My feeling is we won't be working much on the transition until next year sometime. It would certainly be nice to have a plan in place before we start, but I don't know if that sort of thing interests you or if you could sell it to Google. I think it would require a lot of your time just getting familiar with all the parts of numpy and that is quite a job in itself. I think f2py remains a possibility, but you will have to judge that. It would certainly help if Pearu were available to guide that work. And like many open source projects that just growed, I suspect f2py is rather complicated. Hence Pearu's start on a refactoring. Have you tried contacting Pearu? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From rjsm at umich.edu Fri Mar 27 02:32:20 2009 From: rjsm at umich.edu (ross smith) Date: Fri, 27 Mar 2009 02:32:20 -0400 Subject: [SciPy-dev] NumPy f2py GSOC project In-Reply-To: References: <73531abb0903261000n333a4651hdeda83130e09c9b6@mail.gmail.com> <73531abb0903261316i5c4e805ue783f582e9bea386@mail.gmail.com> <49CC060A.6060902@student.matnat.uio.no> <73531abb0903261555t26a0894ag6d21691635a71976@mail.gmail.com> <73531abb0903261557j48442f0fw5b06ae2803c7617f@mail.gmail.com> <73531abb0903262154v3c7fcb93wbdfd05798b328b6d@mail.gmail.com> Message-ID: <73531abb0903262332y40889cfcg53f12a9788a09d21@mail.gmail.com> I have tried to contact Pearu but my schools mail server has bounced the emails I send back for some reason. I'm going to try again tomorrow. code clean-up/plan building is something that does interest me. I've seen a few other mentor organizations that have 'bug fixer' ideas listed for gsoc projects so I think I could sell a project like that. the way I'd see it working is, I'd help clean-up code and bugfix for the first part of the summer, to get acquainted with the various pieces to NumPy, then the later portion would be starting to flesh out a plan for the transition. I'll see what I can come up with for a proposal for this type of project. I would like to continue working with the NumPy/SciPy team after this summer. I think it's a bit of a waste, both for me and for the mentors, to ditch a project after only a couple months. So I should be around when more work is being done on the conversion. -Ross 2009/3/27 Charles R Harris > > > 2009/3/26 ross smith > >> Yes I do. I can see the Numpy and Cython dependence, which may not be a >> good idea but Cython can produce both python2 code and python3 code. >> >> In any case, I'm seeing this project getting harder to pitch and define >> so, are there any other projects for NumPy or SciPy that would appropriate >> for a gsoc project? >> > > I think the main things going on in numpy at this time are test > coverage/documentation/code cleanup. David is also working the > build/distribution problems. I can't speak for scipy, as I am less familiar > with its current state. > > It would be a shame to waste your experience with the python 2.x -> 3.x > transition. 
I wish I was in a better position to help plan out the project, > but I don't even know what the problems will be at this point. Which > probably doesn't help you pitch the project to Google. No doubt there is a > lot of code cleanup in the python sections of numpy that would help with the > transition, but I don't know what that would be either. > > My feeling is we won't be working much on the transition until next year > sometime. It would certainly be nice to have a plan in place before we > start, but I don't know if that sort of thing interests you or if you could > sell it to Google. I think it would require a lot of your time just getting > familiar with all the parts of numpy and that is quite a job in itself. > > I think f2py remains a possibility, but you will have to judge that. It > would certainly help if Pearu were available to guide that work. And like > many open source projects that just growed, I suspect f2py is rather > complicated. Hence Pearu's start on a refactoring. Have you tried contacting > Pearu? > > Chuck > > > > > _______________________________________________ > Scipy-dev mailing list > Scipy-dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dagss at student.matnat.uio.no Fri Mar 27 03:54:31 2009 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Fri, 27 Mar 2009 08:54:31 +0100 Subject: [SciPy-dev] NumPy f2py GSOC project In-Reply-To: References: <73531abb0903261000n333a4651hdeda83130e09c9b6@mail.gmail.com> <91cf711d0903261051w56d243k4b9bf50e89a80625@mail.gmail.com> <5b8d13220903261055h409df166pa12757d07d8a3cac@mail.gmail.com> <91cf711d0903261225g72547205t9f1798e4a0a20af9@mail.gmail.com> <73531abb0903261316i5c4e805ue783f582e9bea386@mail.gmail.com> <49CC060A.6060902@student.matnat.uio.no> <73531abb0903261555t26a0894ag6d21691635a71976@mail.gmail.com> <73531abb0903261557j48442f0fw5b06ae2803c7617f@mail.gmail.com> Message-ID: <49CC8637.8080300@student.matnat.uio.no> Charles R Harris wrote: > > > 2009/3/26 ross smith > > > >What part will be ported to Cython? As I > >understand it, a) f2py itself is entirely written in Python > already, b) > >it generates a C module much like Cython does, which must currently be > >kept Py2 because of the NumPy dependency. > > the fortranobject is written in C, and would be ported to Cython. > also, much of the python code could be ported to cython for some > speedup and to make the code 'python neutral'. Cython would do the > heavy lifting of generating valid python (2 or 3) code when it is > compiled, so the code wouldn't have to be ported by hand when the > rest of Numpy is ported. > > > So you intend for the output file to be in cython? I suppose the > reasoning here is that that would solve c-level compatibility problems. > I'm not sure that's the best way to go, it introduces a dependency on > cython python3.0 and numpy support and I don't know where that stands. > Maybe Dag can comment. I don't think rewriting the parser itself in > cython is a good idea. Porting the parser to Cython doesn't seem like a good idea. It is only needed as a standalone tool; and py2 will be available parallell with py3 for years to come (at which point one can jump to py3 instead of cy for the parser). Cython likely supports NumPy under Py3 the moment it is out due to PEP 3118. However I'm worried that changing to Cython output would break backwards compatability with pyf files which allow inline C. 
It is a big design change which goes beyond Py3 compatability. It would then overlap a bit with Kurt's proposal. -- Dag Sverre From david at ar.media.kyoto-u.ac.jp Fri Mar 27 04:13:58 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Fri, 27 Mar 2009 17:13:58 +0900 Subject: [SciPy-dev] NumPy f2py GSOC project In-Reply-To: <73531abb0903262332y40889cfcg53f12a9788a09d21@mail.gmail.com> References: <73531abb0903261000n333a4651hdeda83130e09c9b6@mail.gmail.com> <73531abb0903261316i5c4e805ue783f582e9bea386@mail.gmail.com> <49CC060A.6060902@student.matnat.uio.no> <73531abb0903261555t26a0894ag6d21691635a71976@mail.gmail.com> <73531abb0903261557j48442f0fw5b06ae2803c7617f@mail.gmail.com> <73531abb0903262154v3c7fcb93wbdfd05798b328b6d@mail.gmail.com> <73531abb0903262332y40889cfcg53f12a9788a09d21@mail.gmail.com> Message-ID: <49CC8AC6.3080006@ar.media.kyoto-u.ac.jp> ross smith wrote: > I have tried to contact Pearu but my schools mail server has bounced > the emails I send back for some reason. I'm going to try again tomorrow. Which email address did you use ? My impression is that Pearu still partcipates in the numpy/scipy mailing lists. > > code clean-up/plan building is something that does interest me. I've > seen a few other mentor organizations that have 'bug fixer' ideas > listed for gsoc projects so I think I could sell a project like > that. the way I'd see it working is, I'd help clean-up code and > bugfix for the first part of the summer, to get acquainted with the > various pieces to NumPy, then the later portion would be starting to > flesh out a plan for the transition. I'll see what I can come up with > for a proposal for this type of project. As you can see, there is no clear answer from us about the 3k transition - simply because not many people have a clear idea of what is needed. For example, neither Stefan or Chuck, both significant contributors can suggest a clear roadmap; neither can I. Now, there are people who know numpy internals very well, but it may be hard for them to find the time to help you - and maybe if they had that time in the first place, it would be easier to do it by themselves than to mentor you :) So basically, yes, I would agree with the assertion that GSoC focused 3k is not a very good project, not in the current context at least - unless you are extremely familiar with numpy internals (which is not the case as I understand, right ?). If you are interested in working on scipy-related projects, something which could help the transition is to replace C code by cython. This would be particularly helpful in scipy - there are many modules which are crufty and would benefit from code cleaning, but if you don't know cython, maybe that's not something that interesting for you. Cython is not difficult to use, though. scipy.signal comes to mind, to take something where I could definitely help you - but there are other modules as well, where other people would be more qualified (scipy.ndimage, etc...). One module means that it would be easier for you to work on, easier for use to help/coordinate, and at least in the case of scipy.signal, there is definitely some new features which are needed as well (new feature is generally more fun than code cleanup/bug hunting). Of course, you should choose modules which interest you. And if you become a regular numpy/scipy contributor, you will be able to help for the transition later anyway. 
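As a concrete aside on the code clean-up theme in this thread, the kind of 2.x-only idioms such an audit would look for are things like the following. These are generic examples, not taken from numpy's sources; only Python 2.6's __future__ imports are assumed.

from __future__ import print_function, division

d = {'a': 1}

# 2.x-only: d.has_key('a')  (has_key is gone in 3.0; 'in' works on both)
print('a' in d)

# 2.x-only: print "value", d['a']  (print becomes a function in 3.0)
print('value', d['a'])

# 2.x '/' truncates ints; with the __future__ import it is true division,
# and '//' is explicit floor division on both
print(3 / 2)     # 1.5
print(3 // 2)    # 1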
cheers, David

From dagss at student.matnat.uio.no Fri Mar 27 04:58:48 2009
From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn)
Date: Fri, 27 Mar 2009 09:58:48 +0100
Subject: [SciPy-dev] NumPy f2py GSOC project
In-Reply-To: <49CC8AC6.3080006@ar.media.kyoto-u.ac.jp>
References: <73531abb0903261000n333a4651hdeda83130e09c9b6@mail.gmail.com> <73531abb0903261316i5c4e805ue783f582e9bea386@mail.gmail.com> <49CC060A.6060902@student.matnat.uio.no> <73531abb0903261555t26a0894ag6d21691635a71976@mail.gmail.com> <73531abb0903261557j48442f0fw5b06ae2803c7617f@mail.gmail.com> <73531abb0903262154v3c7fcb93wbdfd05798b328b6d@mail.gmail.com> <73531abb0903262332y40889cfcg53f12a9788a09d21@mail.gmail.com> <49CC8AC6.3080006@ar.media.kyoto-u.ac.jp>
Message-ID: <49CC9548.8010905@student.matnat.uio.no>

David Cournapeau wrote:
> If you are interested in working on scipy-related projects, something
> which could help the transition is to replace C code by cython. This
> would be particularly helpful in scipy - there are many modules which
> are crufty and would benefit from code cleaning, but if you don't know
> cython, maybe that's not something that interesting for you. Cython is
> not difficult to use, though. scipy.signal comes to mind, to take
> something where I could definitely help you - but there are other
> modules as well, where other people would be more qualified
> (scipy.ndimage, etc...). One module means that it would be easier for
> you to work on, easier for use to help/coordinate, and at least in the
> case of scipy.signal, there is definitely some new features which are
> needed as well (new feature is generally more fun than code cleanup/bug
> hunting). Of course, you should choose modules which interest you.

If something like this is done, one idea I have for SciPy is quick Cython callbacks for solvers/integrators. Perhaps Lisandro would be interested in being Cython-side mentor? Though the main mentor would need to be from SciPy.

People often write on the Cython list about code like this:

def f(double x): return x*x

some_scipy_function(f, 0, 10)

The problem here is that a Python overhead is incurred on all function evaluations. What one could do is add a "Cython protocol" like this:

cdef class DoubleFunction:
    cpdef double eval(double x): return 0

Then inside SciPy core (pseudo-code):

PyObject* some_scipy_function(...) {
    PyObject* callback = parse argument...
    ...
    if (callback is instance of DoubleFunction) {
        call using quick Cython dispatch
    } else {
        call using Python API
    }
}

Then in user code:

cdef class MyFunc(DoubleFunction):
    cpdef double eval(double x): return x*x

some_scipy_function(MyFunc(), 0, 10)

Voila!, 50x speedup of solvers from Cython.
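A pure-Python mock-up of the dispatch idea sketched above may help make the intended interface concrete. In the real thing DoubleFunction would be a Cython cdef class and eval() a cpdef method, so the fast path avoids Python call overhead; here both paths are ordinary Python, and all names are illustrative rather than an existing SciPy API.

class DoubleFunction(object):
    """Base class for double -> double callbacks (illustrative)."""
    def eval(self, x):
        raise NotImplementedError

def call_scalar(callback, x):
    # how a solver could dispatch on the callback type
    if isinstance(callback, DoubleFunction):
        return callback.eval(x)       # a C-level call in the Cython version
    return callback(x)                # fallback: any plain Python callable

class Square(DoubleFunction):
    def eval(self, x):
        return x * x

print(call_scalar(Square(), 3.0))          # 9.0, via the protocol
print(call_scalar(lambda x: x * x, 3.0))   # 9.0, via the generic path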
-- Dag Sverre From dagss at student.matnat.uio.no Fri Mar 27 05:02:06 2009 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Fri, 27 Mar 2009 10:02:06 +0100 Subject: [SciPy-dev] NumPy f2py GSOC project In-Reply-To: <49CC9548.8010905@student.matnat.uio.no> References: <73531abb0903261000n333a4651hdeda83130e09c9b6@mail.gmail.com> <73531abb0903261316i5c4e805ue783f582e9bea386@mail.gmail.com> <49CC060A.6060902@student.matnat.uio.no> <73531abb0903261555t26a0894ag6d21691635a71976@mail.gmail.com> <73531abb0903261557j48442f0fw5b06ae2803c7617f@mail.gmail.com> <73531abb0903262154v3c7fcb93wbdfd05798b328b6d@mail.gmail.com> <73531abb0903262332y40889cfcg53f12a9788a09d21@mail.gmail.com> <49CC8AC6.3080006@ar.media.kyoto-u.ac.jp> <49CC9548.8010905@student.matnat.uio.no> Message-ID: <49CC960E.3070300@student.matnat.uio.no> Dag Sverre Seljebotn wrote: > David Cournapeau wrote: >> If you are interested in working on scipy-related projects, something >> which could help the transition is to replace C code by cython. This >> would be particularly helpful in scipy - there are many modules which >> are crufty and would benefit from code cleaning, but if you don't know >> cython, maybe that's not something that interesting for you. Cython is >> not difficult to use, though. scipy.signal comes to mind, to take >> something where I could definitely help you - but there are other >> modules as well, where other people would be more qualified >> (scipy.ndimage, etc...). One module means that it would be easier for >> you to work on, easier for use to help/coordinate, and at least in the >> case of scipy.signal, there is definitely some new features which are >> needed as well (new feature is generally more fun than code cleanup/bug >> hunting). Of course, you should choose modules which interest you. > > If something like this is done, one idea I have for SciPy is quick > Cython callbacks for solvers/integrators. Perhaps Lisandro would be > interested in being Cython-side mentor? Though the main mentor would > need to be from SciPy. > > People often write on the Cython list about code like this: > > def f(double x): return x*x > > some_scipy_function(f, 0, 10) > > The problem here is that a Python overhead is encurred on all function > evaluations. What one could do is add a "Cython protocol" like this: > > cdef class DoubleFunction: > cpdef double eval(double x): return 0 > > Then inside SciPy core (psuedo-code): > > PyObject* some_scipy_function(...) { > PyObject* callback = parse argument... > ... > if (callback is instance of DoubleFunction) { > call using quick Cython dispatch > } else { > call using Python API > } > } > > Then in user code: > > cdef class MyFunc(DoubleFunction): > cpdef double eval(double x): return x*x > > some_scipy_function(MyFunc(), 0, 10) > > Voila!, 50x speedup of solvers from Cython. > That is, for algorithms where one cannot use vectorized functions. Integration would not benefit, but Newton's method would. 
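Continuing the pure-Python illustration, a scalar Newton iteration is exactly the kind of loop where per-call overhead dominates, so a cheap eval() dispatch is what would pay off. The names newton, Cubic and CubicPrime are invented; the callback objects only need an eval(x) method, which in the Cython version would come from the DoubleFunction protocol.

def newton(f, df, x0, tol=1e-12, maxiter=50):
    x = x0
    for _ in range(maxiter):
        fx = f.eval(x)                # hot inner call, the one worth making cheap
        if abs(fx) < tol:
            return x
        x -= fx / df.eval(x)
    return x

class Cubic(object):                  # f(x) = x**3 - 2, root at 2**(1/3)
    def eval(self, x):
        return x * x * x - 2.0

class CubicPrime(object):             # f'(x) = 3*x**2
    def eval(self, x):
        return 3.0 * x * x

print(newton(Cubic(), CubicPrime(), x0=1.0))   # ~1.2599210498948732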
-- Dag Sverre From david at ar.media.kyoto-u.ac.jp Fri Mar 27 04:34:37 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Fri, 27 Mar 2009 17:34:37 +0900 Subject: [SciPy-dev] NumPy f2py GSOC project In-Reply-To: <49CC9548.8010905@student.matnat.uio.no> References: <73531abb0903261000n333a4651hdeda83130e09c9b6@mail.gmail.com> <73531abb0903261316i5c4e805ue783f582e9bea386@mail.gmail.com> <49CC060A.6060902@student.matnat.uio.no> <73531abb0903261555t26a0894ag6d21691635a71976@mail.gmail.com> <73531abb0903261557j48442f0fw5b06ae2803c7617f@mail.gmail.com> <73531abb0903262154v3c7fcb93wbdfd05798b328b6d@mail.gmail.com> <73531abb0903262332y40889cfcg53f12a9788a09d21@mail.gmail.com> <49CC8AC6.3080006@ar.media.kyoto-u.ac.jp> <49CC9548.8010905@student.matnat.uio.no> Message-ID: <49CC8F9D.6070401@ar.media.kyoto-u.ac.jp> Dag Sverre Seljebotn wrote: > David Cournapeau wrote: > >> If you are interested in working on scipy-related projects, something >> which could help the transition is to replace C code by cython. This >> would be particularly helpful in scipy - there are many modules which >> are crufty and would benefit from code cleaning, but if you don't know >> cython, maybe that's not something that interesting for you. Cython is >> not difficult to use, though. scipy.signal comes to mind, to take >> something where I could definitely help you - but there are other >> modules as well, where other people would be more qualified >> (scipy.ndimage, etc...). One module means that it would be easier for >> you to work on, easier for use to help/coordinate, and at least in the >> case of scipy.signal, there is definitely some new features which are >> needed as well (new feature is generally more fun than code cleanup/bug >> hunting). Of course, you should choose modules which interest you. >> > > If something like this is done, one idea I have for SciPy is quick > Cython callbacks for solvers/integrators. Perhaps Lisandro would be > interested in being Cython-side mentor? Though the main mentor would > need to be from SciPy. > I am also very interested in this kind of things (in a somewhat different context: recursive statistical estimation). But that goes very far from the original proposal :) cheers, David From dagss at student.matnat.uio.no Fri Mar 27 05:09:26 2009 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Fri, 27 Mar 2009 10:09:26 +0100 Subject: [SciPy-dev] NumPy f2py GSOC project In-Reply-To: <49CC8F9D.6070401@ar.media.kyoto-u.ac.jp> References: <73531abb0903261000n333a4651hdeda83130e09c9b6@mail.gmail.com> <73531abb0903261316i5c4e805ue783f582e9bea386@mail.gmail.com> <49CC060A.6060902@student.matnat.uio.no> <73531abb0903261555t26a0894ag6d21691635a71976@mail.gmail.com> <73531abb0903261557j48442f0fw5b06ae2803c7617f@mail.gmail.com> <73531abb0903262154v3c7fcb93wbdfd05798b328b6d@mail.gmail.com> <73531abb0903262332y40889cfcg53f12a9788a09d21@mail.gmail.com> <49CC8AC6.3080006@ar.media.kyoto-u.ac.jp> <49CC9548.8010905@student.matnat.uio.no> <49CC8F9D.6070401@ar.media.kyoto-u.ac.jp> Message-ID: <49CC97C6.1020300@student.matnat.uio.no> David Cournapeau wrote: > Dag Sverre Seljebotn wrote: >> David Cournapeau wrote: >> >>> If you are interested in working on scipy-related projects, something >>> which could help the transition is to replace C code by cython. This >>> would be particularly helpful in scipy - there are many modules which >>> are crufty and would benefit from code cleaning, but if you don't know >>> cython, maybe that's not something that interesting for you. 
Cython is >>> not difficult to use, though. scipy.signal comes to mind, to take >>> something where I could definitely help you - but there are other >>> modules as well, where other people would be more qualified >>> (scipy.ndimage, etc...). One module means that it would be easier for >>> you to work on, easier for use to help/coordinate, and at least in the >>> case of scipy.signal, there is definitely some new features which are >>> needed as well (new feature is generally more fun than code cleanup/bug >>> hunting). Of course, you should choose modules which interest you. >>> >> If something like this is done, one idea I have for SciPy is quick >> Cython callbacks for solvers/integrators. Perhaps Lisandro would be >> interested in being Cython-side mentor? Though the main mentor would >> need to be from SciPy. >> > > I am also very interested in this kind of things (in a somewhat > different context: recursive statistical estimation). > > But that goes very far from the original proposal :) :-) I'm sorry, it just leapt to mind as part of the conversation. I should have started a new and unrelated thread on it. -- Dag Sverre From stefan at sun.ac.za Fri Mar 27 06:28:34 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Fri, 27 Mar 2009 12:28:34 +0200 Subject: [SciPy-dev] NumPy f2py GSOC project In-Reply-To: <49CC97C6.1020300@student.matnat.uio.no> References: <73531abb0903261000n333a4651hdeda83130e09c9b6@mail.gmail.com> <73531abb0903261557j48442f0fw5b06ae2803c7617f@mail.gmail.com> <73531abb0903262154v3c7fcb93wbdfd05798b328b6d@mail.gmail.com> <73531abb0903262332y40889cfcg53f12a9788a09d21@mail.gmail.com> <49CC8AC6.3080006@ar.media.kyoto-u.ac.jp> <49CC9548.8010905@student.matnat.uio.no> <49CC8F9D.6070401@ar.media.kyoto-u.ac.jp> <49CC97C6.1020300@student.matnat.uio.no> Message-ID: <9457e7c80903270328p149b9441m394ef8a6aae25ea2@mail.gmail.com> 2009/3/27 Dag Sverre Seljebotn : >> I am also very interested in this kind of things (in a somewhat >> different context: recursive statistical estimation). >> >> But that goes very far from the original proposal :) > > :-) > > I'm sorry, it just leapt to mind as part of the conversation. I should > have started a new and unrelated thread on it. I know Travis had some good ideas on the topic. This is certainly a project we can all get excited about! As a start, what would be extremely useful is to be able to build Cython ufuncs on the fly. Cheers St?fan From sturla at molden.no Fri Mar 27 07:12:59 2009 From: sturla at molden.no (Sturla Molden) Date: Fri, 27 Mar 2009 12:12:59 +0100 Subject: [SciPy-dev] Cython, f2py and GSoC In-Reply-To: <49CBDAD8.1090103@student.matnat.uio.no> References: <49CAAED8.4030204@student.matnat.uio.no> <91cf711d0903260701va983eccg21f498e8ea7a89c4@mail.gmail.com> <49CB94F9.3070708@student.matnat.uio.no> <49CB9D34.7060601@molden.no> <49CBDAD8.1090103@student.matnat.uio.no> Message-ID: <49CCB4BB.50408@molden.no> On 3/26/2009 8:43 PM, Dag Sverre Seljebotn wrote: > Indeed. Thanks for bringing that up. If somebody had already done the > "f2c" (f2py with C backend) part I'm sure both me and Kurt would like to > focus on Cython; but as it is, necessity is the mother of invention. Please don't use the name f2c or you are going to confuse people in the Fortran community. f2c is a well known Fortran 77 to C compiler. g77 is based on f2c, afaik. S.M. 
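On the suggestion above about building Cython ufuncs on the fly: NumPy can already turn a Python scalar function into a ufunc with np.frompyfunc, but the inner loop still runs in the interpreter and returns object arrays; the hoped-for tool would generate and compile that loop with Cython instead. A small sketch of the current, slow state (the function here is arbitrary):

    import numpy as np

    def damped(x):
        # arbitrary scalar function to be applied elementwise
        return np.exp(-x) * np.sin(x)

    damped_u = np.frompyfunc(damped, 1, 1)           # 1 input, 1 output
    y = damped_u(np.linspace(0.0, 3.0, 301)).astype(float)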
From sturla at molden.no Fri Mar 27 09:59:57 2009 From: sturla at molden.no (Sturla Molden) Date: Fri, 27 Mar 2009 14:59:57 +0100 Subject: [SciPy-dev] Cython, f2py and GSoC In-Reply-To: <49CBDAD8.1090103@student.matnat.uio.no> References: <49CAAED8.4030204@student.matnat.uio.no> <91cf711d0903260701va983eccg21f498e8ea7a89c4@mail.gmail.com> <49CB94F9.3070708@student.matnat.uio.no> <49CB9D34.7060601@molden.no> <49CBDAD8.1090103@student.matnat.uio.no> Message-ID: <49CCDBDD.7020303@molden.no> On 3/26/2009 8:43 PM, Dag Sverre Seljebotn wrote: > b) While there's a clear technical seperation between Fortran <-> C and > C <-> Cython, the main challenge is the same both places (dealing with > strided arrays on a C level) and so it's benefitial to have them within > the same project. Do you need to pass strided arrays to Fortran? Would not a local copy suffice? Also consider that the Fortran 2003 function C_F_POINTER (used to convert a C pointer to a Fortran array pointer) assumes the C pointer references a continuous array. You would have to pass in the original unstrided array, create one Fortran pointer with C_F_POINTER and then index this with appropriate strides. Most likely the Fortran compiler would react to these strides by making its own local copy, which would make these efforts a bit futile. It is easier to always pass contiguous arrays to Fortran, and in most cases just as efficient. S.M. From dagss at student.matnat.uio.no Fri Mar 27 10:51:21 2009 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Fri, 27 Mar 2009 15:51:21 +0100 Subject: [SciPy-dev] Cython, f2py and GSoC In-Reply-To: <49CCDBDD.7020303@molden.no> References: <49CAAED8.4030204@student.matnat.uio.no> <91cf711d0903260701va983eccg21f498e8ea7a89c4@mail.gmail.com> <49CB94F9.3070708@student.matnat.uio.no> <49CB9D34.7060601@molden.no> <49CBDAD8.1090103@student.matnat.uio.no> <49CCDBDD.7020303@molden.no> Message-ID: <49CCE7E9.7040801@student.matnat.uio.no> Sturla Molden wrote: > On 3/26/2009 8:43 PM, Dag Sverre Seljebotn wrote: > > >> b) While there's a clear technical seperation between Fortran <-> C and >> C <-> Cython, the main challenge is the same both places (dealing with >> strided arrays on a C level) and so it's benefitial to have them within >> the same project. >> > > Do you need to pass strided arrays to Fortran? Would not a local copy > suffice? > Thanks for the feedback. You know a lot more about this than me so I'm very grateful for any input. Is it necesarrily easier? There's two tools generating respectively C and Fortran code, and one of those tools need to emit the logic to do the copying, in either C or Fortran. Which one seems almost arbitrary. (Note that with Cython one should be able to pass any Python object implementing PEP 3118 to Fortran as an array, so we can't use NumPy's copy function (and we shouldn't introduce a dependency on NumPy for this). But it's not much code either way.) Some sort custom struct (or change of signature in some other way) containing the bounds of the array are necesarry anyway. But even if I think the work amount is the same, contiguous memory is a simpler concept and perhaps more robust to maintain etc. from a project management perspective. Hmm.. > Also consider that the Fortran 2003 function C_F_POINTER (used to > convert a C pointer to a Fortran array pointer) assumes the C pointer > references a continuous array. 
You would have to pass in the original > unstrided array, create one Fortran pointer with C_F_POINTER and then > index this with appropriate strides. Most likely the Fortran compiler > would react to these strides by making its own local copy, which would > make these efforts a bit futile. It is easier to always pass contiguous > arrays to Fortran, and in most cases just as efficient. > Yes, that was exactly the plan. And if the dimension ordering is different I suppose a copying reshape call is needed. Since it seems doing the copying either side is about the same amount of work, my idea was that copying in Fortran would have the advantage that it moves the decision to the Fortran compiler, which is smarter and knows more. (That, and you say "most cases" and I'm too much of a perfectionist...) Is it possible to write Fortran code in a way which doesn't make the compiler copy, in case you know that memory use is a bigger problem than speed? If so I'd say that settles the issue for my part, no point "turning a feature off". If not, we may very well heed your suggestion. (Ideally one should benchmark I suppose...) Dag Sverre From dagss at student.matnat.uio.no Fri Mar 27 10:39:15 2009 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Fri, 27 Mar 2009 15:39:15 +0100 (CET) Subject: [SciPy-dev] Cython, f2py and GSoC In-Reply-To: <49CCDBDD.7020303@molden.no> References: <49CAAED8.4030204@student.matnat.uio.no> <91cf711d0903260701va983eccg21f498e8ea7a89c4@mail.gmail.com> <49CB94F9.3070708@student.matnat.uio.no> <49CB9D34.7060601@molden.no> <49CBDAD8.1090103@student.matnat.uio.no> <49CCDBDD.7020303@molden.no> Message-ID: <6f52f286b6f05e34dd2c5c39740a283a.squirrel@webmail.uio.no> Sturla Molden wrote: > On 3/26/2009 8:43 PM, Dag Sverre Seljebotn wrote: > >> b) While there's a clear technical seperation between Fortran <-> C and >> C <-> Cython, the main challenge is the same both places (dealing with >> strided arrays on a C level) and so it's benefitial to have them within >> the same project. > > Do you need to pass strided arrays to Fortran? Would not a local copy > suffice? Thanks for the feedback. You know a lot more about this than me so I'm very grateful for any input. Is that necesarrily easier? It's a different approach which I think would involve about the same amount of work (shifting some logic from the new tool to the Cython compiler instead...) There's two tools generating respectively C and Fortran code, and one of those tools need to emit the logic to do the copying in either C or Fortran. (Note that Cython doesn't want to directly depend on NumPy as such, but PEP 3118, and would need to reimplement copying to contiguous memory anyway. But it's not much code either way.) Some sort custom struct (or change of signature in some other way) containing the bounds of the array are necesarry anyway. But even if I think the work amount is the same, contiguous memory might simpler and therefore more robust to maintain etc. Hmm.. > > Also consider that the Fortran 2003 function C_F_POINTER (used to > convert a C pointer to a Fortran array pointer) assumes the C pointer > references a continuous array. You would have to pass in the original > unstrided array, create one Fortran pointer with C_F_POINTER and then > index this with appropriate strides. Most likely the Fortran compiler > would react to these strides by making its own local copy, which would > make these efforts a bit futile. 
It is easier to always pass contiguous > arrays to Fortran, and in most cases just as efficient. Yes, that was exactly the plan. And if the dimension ordering is different I couldn't find a set of slicing operations to do it, so some Fortran function (reshape?) must be used which copies anyway (which is probably just as well). Since it seems doing the copying either side is about the same amount of work, my idea was that copying in Fortran would have the advantage that it moves the decision to the Fortran compiler, which is smarter and knows more. Is it possible to write Fortran code in a way which doesn't make the compiler copy, in case you know that memory use is a bigger problem than speed? If so I'd say that settles the issue for my part, no point "turning a feature off". If not, I suppose this should ideally be resolved by benchmarks. Perhaps in some obscure cases an important stride is already in register C side making the copying more efficient there :-) Dag Sverre From jsseabold at gmail.com Fri Mar 27 13:43:54 2009 From: jsseabold at gmail.com (Skipper Seabold) Date: Fri, 27 Mar 2009 13:43:54 -0400 Subject: [SciPy-dev] GSoC Project Proposal: Datasource and Jonathan Taylor's statistical models Message-ID: Hello all, I am a first year PhD student in Economics at American University, and I would very much like to participate in the GSoC with the NumPy/SciPy community. I am looking for some feedback and discussion before I submit a proposal. Judging by the ideas page and the discussion in this thread ( http://mail.scipy.org/pipermail/scipy-dev/2009-February/011373.html ) I think the following project proposal would be useful to the community. My proposal would have two parts, the first would be to improve datasource and integrate it into the numpy/scipy io. I see this as a way to get my feet wet working on a project. I do not imagine that it would take more than 2-3 weeks work on my end. The second part would be to get Jonathan Taylor's statistical models from the NiPy project into scipy.stats. I think that I would be a good candidate for this work, as I am currently studying statistics and learning the ins and outs of NumPy/SciPy, so I don't mind doing some of the less appealing work as this is also a great learning opportunity. Also I see this as a great way to get involved in the SciPy community in an area that currently needs some attention. I am a student, so I would be able to help maintain the code, bug fix, and address other areas of the statistical capabilities that need attention. Below is a general outline of my proposal with some areas that I have identified as needing work. I am eager to discuss some aspects of the projects with those that are interested and to work on the appropriate milestones. 
1) Improve datasource and integrate it into all the numpy/scipy io

   Bug Fixes
     Catch and handle malformed URLs

   Refactoring

   Enhancements
     Improve findfile method
     Improve cache method
     Add zip archive, tar file handling capabilities
     Improve networking interface to handle timeouts and proxies if there is sufficient interest

   Documentation
     Document changes

   Tests
     Implement test coverage for new changes

   Copy/Move to scipy.io

2) Integrate Jonathan Taylor's statistical models into scipy.stats

   These models are currently in the NiPy project
   Merge relevant branches (branch trunk-josef models has the most recent changes, I believe)

   I will focus mostly on bringing over the linear models, which I believe would include at the least: bspline.py, contrast.py, gam.py, glm.py, model.py, regression.py, utils.py

   Bug Fixes
     Bug hunting
     Improve existing test coverage

   Refactoring
     Eliminate existing and created duplicate functionality
     Make sure parameters are consistent, etc.

   Enhancements

   Documentation
     Document changes
     Make any necessary changes to stats/info.py

   Testing
     Make sure test coverage is adequate

From aisaac at american.edu Fri Mar 27 14:43:39 2009 From: aisaac at american.edu (Alan G Isaac) Date: Fri, 27 Mar 2009 14:43:39 -0400 Subject: [SciPy-dev] GSoC Project Proposal: Datasource and Jonathan Taylor's statistical models In-Reply-To: References: Message-ID: <49CD1E5B.6010007@american.edu> I think this proposal would be useful, and I would be willing to serve as a GSoC mentor in order to support it. (I was a mentor for the past two summers.) Alan Isaac From lesserwhirls at gmail.com Fri Mar 27 14:44:31 2009 From: lesserwhirls at gmail.com (Sean Arms) Date: Fri, 27 Mar 2009 13:44:31 -0500 Subject: [SciPy-dev] Continuous Wavelet Transform for SciPy In-Reply-To: References: Message-ID: Greetings! With the help of Pauli, I've created a git clone of the scipy tree and made a cwt branch. The branch is located at http://github.com/lesserwhirls/scipy-cwt/tree/cwt Here is an example using the cwt code with the Mexican Hat wavelet:

==============
import numpy as np
from scipy.signal import SDG, Morlet, cwt

# create data array - number of tropical cyclones per year (1970-2006) in the
# Northwest Australian region
data = np.array([5,6,8,8,11,6,4,4,2,6,7,9,4,8,10,8,4,14,5,5,2,2,7,3,7,5,5,7,9,5,3,6,5,5,7])

# remove mean
data = (data - data.mean())

# create the scales at which you wish to do the analysis
scales = np.arange(1,15,0.1)

# initialize the mother wavelet
mother_wavelet = SDG(len_signal = len(data), pad_to = np.power(2,10), scales = scales)

# perform continuous wavelet transform on `data` using `mother_wavelet`
wavelet = cwt(data, mother_wavelet)

# plot scalogram, wavelet power spectrum, and time series
wavelet.scalogram(ts = data, show_coi = True, show_wps = True, use_period = True, ylog_base = 2)
==============

I'll add an enhancement ticket in Scipy's Trac once I get some initial feedback. Cheers! Sean Arms Graduate Research Assistant School of Meteorology University of Oklahoma On Mon, Mar 2, 2009 at 10:25 PM, Sean Arms wrote: > Greetings! > > My name is Sean Arms and I'm a graduate student at the University of > Oklahoma in the School of Meteorology. As part of my PhD research, I'm > studying coherent structures in atmospheric boundary layer turbulence, > primarily using in-situ observations and, secondarily, Direct Numerical > Simulation (DNS) output.
One common approach for detecting coherent > structures in observational datasets relies on the use of the global wavelet > power spectrum as estimated from a continuous wavelet transform (CWT). I > know SciPy has a DWT implementation, and I've already been in contact with > Filip. He recommended that I post my code in hopes that it would add some > momentum to the python-cwt development and create some feedback (I'm > currently looking for a good place to post my code). I've implemented the > CWT using pure python (that is, I did not write any C extension code myself > - nothing to build), along with one mother wavelet (second derivative of a > Gaussian, or the Mexican Hat) - I'll be adding more Mother wavelets as I go > along. I've made it a point to (try to) design my MotherWavelet class to be > easily extendable. I'm working on documentation and a few tests at the > moment, but so far my code compares well with other wavelet routines. > > The point of this email is to introduce myself and let the SciPy dev > community know that I am willing to help develop CWT support for SciPy - > I'll already be doing the work for my research, so I might as well put in > the extra effort to make it usable by the larger community! > > Cheers! > > Sean Arms > Graduate Research Assistant > School of Meteorology > University of Oklahoma > From aisaac at american.edu Fri Mar 27 14:53:37 2009 From: aisaac at american.edu (Alan G Isaac) Date: Fri, 27 Mar 2009 14:53:37 -0400 Subject: [SciPy-dev] GSoC Project Proposal: Datasource and Jonathan Taylor's statistical models In-Reply-To: <49CD1E5B.6010007@american.edu> References: <49CD1E5B.6010007@american.edu> Message-ID: <49CD20B1.5050306@american.edu> I should add that I know Skipper Seabold, who is an Economics PhD student at American University, where I work. Alan Isaac From millman at berkeley.edu Fri Mar 27 15:04:26 2009 From: millman at berkeley.edu (Jarrod Millman) Date: Fri, 27 Mar 2009 12:04:26 -0700 Subject: [SciPy-dev] GSoC Project Proposal: Datasource and Jonathan Taylor's statistical models In-Reply-To: <49CD1E5B.6010007@american.edu> References: <49CD1E5B.6010007@american.edu> Message-ID: On Fri, Mar 27, 2009 at 11:43 AM, Alan G Isaac wrote: > I think this proposal would be useful, > and I would be willing to serve as a > GSoC mentor in order to support it. > (I was a mentor for the past two summers.) I also like this project and am happy to hear that you are interested in mentoring it. Jarrod From bsouthey at gmail.com Fri Mar 27 15:21:33 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 27 Mar 2009 14:21:33 -0500 Subject: [SciPy-dev] GSoC Project Proposal: Datasource and Jonathan Taylor's statistical models In-Reply-To: References: Message-ID: <49CD273D.1070001@gmail.com> Skipper Seabold wrote: > Hello all, > > I am a first year PhD student in Economics at American University, and > I would very much like to participate in the GSoC with the NumPy/SciPy > community. I am looking for some feedback and discussion before I > submit a proposal. > > Judging by the ideas page and the discussion in this thread ( > http://mail.scipy.org/pipermail/scipy-dev/2009-February/011373.html ) > I think the following project proposal would be useful to the > community. > > My proposal would have two parts, the first would be to improve > datasource and integrate it into the numpy/scipy io. I see this as a > way to get my feet wet working on a project. I do not imagine that it > would take more than 2-3 weeks work on my end.
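As background for the datasource part of the proposal: the existing class lives in numpy/lib/_datasource.py and is exposed as numpy.DataSource; it transparently fetches, caches and decompresses local or remote files. A rough sketch of its current use (the URL and cache directory below are placeholders, not real data):

    import numpy as np

    ds = np.DataSource('/tmp/ds_cache')              # downloaded files are cached here
    url = 'http://example.com/data/series.csv.gz'    # placeholder address
    if ds.exists(url):
        fh = ds.open(url)                            # fetched, cached, decompressed
        data = np.loadtxt(fh, delimiter=',')
        fh.close()

The enhancement items in the proposal outline (zip and tar handling, timeouts, proxies) would extend exactly this behaviour.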
> Can you provide a link to datasource? > The second part would be to get Jonathan Taylor's statistical models > from the NiPy project into scipy.stats. I think that I would be a > good candidate for this work, as I am currently studying statistics > and learning the ins and outs of NumPy/SciPy, so I don't mind doing > some of the less appealing work as this is also a great learning > opportunity. Also I see this as a great way to get involved in the > SciPy community in an area that currently needs some attention. I am > a student, so I would be able to help maintain the code, bug fix, and > address other areas of the statistical capabilities that need > attention. > I would be willing to help to some degree. I would strongly suggest that the main emphasis is just to get Jonathan's code integrated into Scipy and perhaps something from various places like the Scikit learn (how many logistic regression or least squares codes do we really need?) and EconPy http://code.google.com/p/econpy/wiki/EconPy It is too complex to address anything more than this and this would provide a very solid base for future development. > Below is a general outline of my proposal with some areas that I have > identified as needing work. I am eager to discuss some aspects of the > projects with those that are interested and to work on the appropriate > milestones. > > 1) Improve datasource and integrate it into all the numpy/scipy io > > Bug Fixes > Catch and handle malformed URLs > > Refactoring > > Enhancements > Improve findfile method > Improve cache method > Add zip archive, tar file handling capabilities > Improve networking interface to handle timeouts and proxies if > there is sufficient interest > > Documentation > Document changes > > Tests > Implement test coverage for new changes > > Copy/Move to scipy.io > This looks like quite a lot of work for a short period especially to do both parts (I am also biased in having the stats part finished). > 2) Integrate Jonathan Taylor's statistical models into scipy.stats > > These models are currently in the NiPy project > Merge relevant branches (branch trunk-josef models has the most recent > changes, I believe) > > I will focus mostly on bringing over the linear models, which I > believe would include at the least: > bspline.py, contrast.py, gam.py, glm.py, model.py, regression.py, utils.py > Not that it is really that important, but these are not all 'linear models' :-) > Bug Fixes > Bug hunting > Improve existing test coverage > > Refactoring > Eliminate existing and created duplicate functionality > Make sure parameters are consistent, etc. > I would not be that concerned with duplicate functionality because it is better to train people to use the new code and depreciate the old code. There some cases where you may want different versions, for example code that assumes normality will be faster than code for generalized linear models where non-normal distributions are allowed. > Enhancements > > I would think that it is essential to get these to work with masked arrays (allows missing observations) or record array (enables the use of 'variable' names in model statements like most statistics packages do). > Documentation > Document changes > Make any necessary changes to stats/info.py > Actually the reference is the SciPy documentation marathon. I would also suggest that examples/tutorials are important here. 
> Testing > Make sure test coverage is adequate > I would like to see the inclusion of Statistical Reference Datasets Project: http://www.itl.nist.gov/div898/strd/ The datasets would allow us to validate the accuracy of the code. Regards Bruce From millman at berkeley.edu Fri Mar 27 15:28:22 2009 From: millman at berkeley.edu (Jarrod Millman) Date: Fri, 27 Mar 2009 12:28:22 -0700 Subject: [SciPy-dev] GSoC Project Proposal: Datasource and Jonathan Taylor's statistical models In-Reply-To: <49CD273D.1070001@gmail.com> References: <49CD273D.1070001@gmail.com> Message-ID: On Fri, Mar 27, 2009 at 12:21 PM, Bruce Southey wrote: > Can you provide a link to datasource? http://projects.scipy.org/numpy/browser/trunk/numpy/lib/_datasource.py From bsouthey at gmail.com Fri Mar 27 16:32:15 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 27 Mar 2009 15:32:15 -0500 Subject: [SciPy-dev] GSoC Project Proposal: Datasource and Jonathan Taylor's statistical models In-Reply-To: References: <49CD273D.1070001@gmail.com> Message-ID: <49CD37CF.3030806@gmail.com> Jarrod Millman wrote: > On Fri, Mar 27, 2009 at 12:21 PM, Bruce Southey wrote: > >> Can you provide a link to datasource? >> > > http://projects.scipy.org/numpy/browser/trunk/numpy/lib/_datasource.py > _______________________________________________ > Scipy-dev mailing list > Scipy-dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > Thanks! Not getting into the merits of either part, I think you are asking for trouble doing both because there is not clear connection between the two parts. Knowing one part is not going to help you with the other. (The argument that it helps get 'your feet wet' is rather lame.) But you can justify it by saying you want an solution to access online datasets like: UC Irvine Machine Learning Repository: http://archive.ics.uci.edu/ml/ Datamob: http://datamob.org/datasets Bruce From jsseabold at gmail.com Fri Mar 27 18:09:20 2009 From: jsseabold at gmail.com (Skipper Seabold) Date: Fri, 27 Mar 2009 18:09:20 -0400 Subject: [SciPy-dev] GSoC Project Proposal: Datasource and Jonathan Taylor's statistical models In-Reply-To: <49CD37CF.3030806@gmail.com> References: <49CD273D.1070001@gmail.com> <49CD37CF.3030806@gmail.com> Message-ID: Bruce Southey wrote: > Not getting into the merits of either part, I think you are asking for > trouble doing both because there is not clear connection between the two > parts. Knowing one part is not going to help you with the other. (The > argument that it helps get 'your feet wet' is rather lame.) Your point is well taken. I think I will focus on the second part, as there seems to be much more interest in the statistical functionality. And my work would undoubtedly be better if focused. >I would strongly suggest that the main emphasis is just to get >Jonathan's code integrated into Scipy and perhaps something from various >places like the Scikit learn (how many logistic regression or least >squares codes do we really need?) and EconPy >http://code.google.com/p/econpy/wiki/EconPy I will have a closer look through Scikit learn and econpy and revise. >I would think that it is essential to get these to work with masked >arrays (allows missing observations) or record array (enables the use of >'variable' names in model statements like most statistics packages do). I agree. 
There has been some discussion of the most appropriate way to handle this in your thread previously mentioned (eg., it would not always be appropriate to force conversion to a masked array, should stats and mstats be merged, etc.), and I would appreciate any direction that could be offered. I like the idea of the "usemask" flag here http://mail.scipy.org/pipermail/scipy-dev/2009-February/011414.html but obviously would defer to others for the best solution. Should I be spending most of my time looking through mstats rather than stats? >I would like to see the inclusion of Statistical Reference Datasets Project: >http://www.itl.nist.gov/div898/strd/ > >The datasets would allow us to validate the accuracy of the code. Very good idea. Thanks for some initial feedback. I will take under advisement and revise my proposal as needed. Best, Skipper From millman at berkeley.edu Fri Mar 27 19:20:02 2009 From: millman at berkeley.edu (Jarrod Millman) Date: Fri, 27 Mar 2009 16:20:02 -0700 Subject: [SciPy-dev] SciPy 2009 Conference will be Aug. 18-23 Message-ID: The subject says it all. Over the next few days, we will be updating the conference website with additional information. So if you are interested, please keep an eye on: http://conference.scipy.org/ Jarrod From josef.pktd at gmail.com Sat Mar 28 00:11:09 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 28 Mar 2009 00:11:09 -0400 Subject: [SciPy-dev] GSoC Project Proposal: Datasource and Jonathan Taylor's statistical models In-Reply-To: References: <49CD273D.1070001@gmail.com> <49CD37CF.3030806@gmail.com> Message-ID: <1cd32cbb0903272111n78e46bd5u94be7dfa2dcbd1dc@mail.gmail.com> I think it would be very good if you can improve some of the statistics in scipy. On Fri, Mar 27, 2009 at 6:09 PM, Skipper Seabold wrote: > Bruce Southey wrote: >> Not getting into the merits of either part, I think you are asking for >> trouble doing both because there is not clear connection between the two >> parts. Knowing one part is not going to help you with the other. (The >> argument that it helps get 'your feet wet' is rather lame.) > > Your point is well taken. ?I think I will focus on the second part, as > there seems to be much more interest in the statistical functionality. > ?And my work would undoubtedly be better if focused. I think there is enough to do in improving statistics that you don't need to add another side project. And as a warmup increasing test coverage would be very useful. > >>I would strongly suggest that the main emphasis is just to get >>Jonathan's code integrated into Scipy and perhaps something from various >>places like the Scikit learn (how many logistic regression or least >From a help search in R, I would say 5 to 10 logistic regression implementations. >>squares codes do we really need?) and EconPy >>http://code.google.com/p/econpy/wiki/EconPy > > I will have a closer look through Scikit learn and econpy and revise. > One of my current favorites is pymvpa it looks mostly well written and has quite a good coverage of multivariate statistics, for distributions pymc is the most complete (which will not concern you so much in the focus on models) and of course nipy. (all MIT or BSD licence) most machine learning libraries have more restrictive licenses, which constrains how much we can look at them For regression and implementation details I also look quite often at Econometrics Toolbox: by James P. 
LeSage http://www.spatial-econometrics.com/ in matlab, which has "classical" algorithms in econometrics in public domain. >>I would think that it is essential to get these to work with masked >>arrays (allows missing observations) or record array (enables the use of >>'variable' names in model statements like most statistics packages do). > > I agree. ?There has been some discussion of the most appropriate way > to handle this in your thread previously mentioned (eg., it would not > always be appropriate to force conversion to a masked array, should > stats and mstats be merged, etc.), and I would appreciate any > direction that could be offered. ?I like the idea of the "usemask" > flag here http://mail.scipy.org/pipermail/scipy-dev/2009-February/011414.html > but obviously would defer to others for the best solution. ?Should I > be spending most of my time looking through mstats rather than stats? > >>I would like to see the inclusion of Statistical Reference Datasets Project: >>http://www.itl.nist.gov/div898/strd/ >> >>The datasets would allow us to validate the accuracy of the code. > > Very good idea. The problem is that it has very limited coverage, I recently scraped/parsed the ANOVA examples (has only balanced) to check stats.f_oneway and anova in pymvpa. I took a non-linear regression case to test optimize.curve_fit and there is additional linear regression and descriptive which would be more for numpy and one more. I was looking for other benchmarks but with only limited success. > > Thanks for some initial feedback. ?I will take under advisement and > revise my proposal as needed. > A straightforward port of "models" would not be a lot of work, mainly increasing test coverage and fixing any bugs that they reveal. However, changes to the structure (refactoring) and completing missing pieces such as additional test statistics can be quite time consuming. >From my experience with stats, one of the biggest time sink in checking the code from someone else, can be hunting for a reference to fix some numbers that are not quite right compared to R or matlab (e.g. tiehandling or some "exotic" distributions). Being able to follow some good books is very helpful. Some time will be required on the design when pulling in new code into scipy because code that is written for a specialized package might not be in the right form for a general purpose scipy. I assume we will have more discussion later, Josef From dwf at cs.toronto.edu Sat Mar 28 21:18:28 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Sat, 28 Mar 2009 21:18:28 -0400 Subject: [SciPy-dev] Another GSoC idea In-Reply-To: <3d375d730903251456pc846454uf200779f7530d8a5@mail.gmail.com> References: <5b8d13220903202250m170eabf3mfe9fc35f4bd4273b@mail.gmail.com> <3d375d730903251456pc846454uf200779f7530d8a5@mail.gmail.com> Message-ID: <774E6D34-2B53-4F5D-9946-6A43C528B109@cs.toronto.edu> On 25-Mar-09, at 5:56 PM, Robert Kern wrote: > However, once one gets enough > distributions to be interesting and to give assurances that one has > covered all of the use cases in the design, the student can forgo the > remaining distributions to work on a way to combine these distribution > objects into probabilistic models that can be converted to efficient > MCMC, EM, or other such numerical codes for estimation. That is indeed a compelling idea. 
In what little experimentation I've done with PyMC2, it seems to be quite well designed and it would definitely be a mistake to ignore it in crafting such a project, it would at the very least be a good choice as of the downstream paths for model rendering. I can't attest to how well it copes with being scaffolding for other inference methods, but it would be great to have all of the MCMC infrastructure right there to test against, for example, a variational approximation to the posterior that you've written code for (or, in the future we are pondering, _generated_ code for). David From pav at iki.fi Sat Mar 28 21:57:57 2009 From: pav at iki.fi (Pauli Virtanen) Date: Sun, 29 Mar 2009 01:57:57 +0000 (UTC) Subject: [SciPy-dev] Continuous Wavelet Transform for SciPy References: Message-ID: Hi, Fri, 27 Mar 2009 13:44:31 -0500, Sean Arms wrote: > Here is an example using the cwt code with the Mexican Hat wavelet: [clip] > I'll add an enhancement ticket in Scipy's Trac once I get some initial > feedback. A suggestion: write some documentation for this module as you go on. (And so be a good example for the rest of us, currently Scipy's documentation is not very good :) I suggest the following: - In doc/source/signal.rst add a new section "Continuous Wavelets" that contains an autosummary:: listing of the routines you have. Also add a line .. toctree:: signal.cwt.rst in this section - Create a new file, doc/source/signal.cwt.rst that contains - Maybe a brief summary of what's in the module. - Possibly some definitions that people need (probably best to assume that they have the necessary background knowledge). - Some examples of the use, for example that what you just posted. Code can be put on lines such as ">>> import numpy as np". If you want to include figures, just add the code that generates them like this: .. plot:: import matplotlib.pyplot as plt plt.plot([1,2,3,4], [5,6,7,8]) plt.show() - Some references. - Probably no need to worry if it builds with Sphinx at this stage if you don't have time to spare, the formatting can be fixed later on easily. -- Pauli Virtanen From rjsm at umich.edu Mon Mar 30 04:36:34 2009 From: rjsm at umich.edu (ross smith) Date: Mon, 30 Mar 2009 04:36:34 -0400 Subject: [SciPy-dev] Numpy/Scipy GSOC project(s) Message-ID: <73531abb0903300136k3cd1e9e2r23e749b0ec0695e0@mail.gmail.com> I apologize for my lack of response on this, I have a lot due in my classes so I was focusing on my schoolwork. On Fri, Mar 27, 2009 at 04:13, David Cournapeau < david at ar.media.kyoto-u.ac.jp> wrote: > ross smith wrote: > > I have tried to contact Pearu but my schools mail server has bounced > > the emails I send back for some reason. I'm going to try again tomorrow. > > Which email address did you use ? My impression is that Pearu still > partcipates in the numpy/scipy mailing lists. I've tried pearu at cens.ioc.ee , the issue has and is that my schools mail server is rejecting my outbound emails. > > If you are interested in working on scipy-related projects, something > which could help the transition is to replace C code by cython. This > would be particularly helpful in scipy - there are many modules which > are crufty and would benefit from code cleaning, but if you don't know > cython, maybe that's not something that interesting for you. Cython is > not difficult to use, though. scipy.signal comes to mind, to take > something where I could definitely help you - but there are other > modules as well, where other people would be more qualified > (scipy.ndimage, etc...). 
at least in the > case of scipy.signal, there is definitely some new features which are > needed as well (new feature is generally more fun than code cleanup/bug > hunting). Of course, you should choose modules which interest you. > scipy.signal sounds very interesting to me. I've been toying with the idea of a soft-synth written in python for a while now. I would be up for tidying up that module. what other features would be of use to implement? >One module means that it would be easier for >you to work on, easier for use to help/coordinate After looking at how much code is in a single module, I wholeheartedly agree. I'm planning on hedging my bets a bit on the GSOC applications. I can submit up to 20 applications. I'm going to write up one for a 'preliminary toward Numpy for py3k' and one for 'clean up and add XXXX to scipy.signal'. > As a start, what would be extremely useful is to be able to build >Cython ufuncs on the fly. This also sounds like something very useful that I'd be interested in. right at the moment, I'm a bit unsure on the details of how to implement this or where. as always, feedback and ideas are welcome. thanks for your input thus far! -Ross > > And if you become a regular numpy/scipy contributor, you will be able to > help for the transition later anyway. > > cheers, > > David > _______________________________________________ > Scipy-dev mailing list > Scipy-dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Mon Mar 30 05:47:26 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Mon, 30 Mar 2009 18:47:26 +0900 Subject: [SciPy-dev] Numpy/Scipy GSOC project(s) In-Reply-To: <73531abb0903300136k3cd1e9e2r23e749b0ec0695e0@mail.gmail.com> References: <73531abb0903300136k3cd1e9e2r23e749b0ec0695e0@mail.gmail.com> Message-ID: <49D0952E.2010202@ar.media.kyoto-u.ac.jp> Hi Ross, ross smith wrote: > > I've tried pearu at cens.ioc.ee , the issue > has and is that my schools mail server is rejecting my outbound emails. Ok. I have contacted him personally - I will keep the mailing list posted. > > If you are interested in working on scipy-related projects, something > which could help the transition is to replace C code by cython. This > would be particularly helpful in scipy - there are many modules which > are crufty and would benefit from code cleaning, but if you don't know > cython, maybe that's not something that interesting for you. Cython is > not difficult to use, though. scipy.signal comes to mind, to take > something where I could definitely help you - but there are other > modules as well, where other people would be more qualified > (scipy.ndimage, etc...). at least in the > case of scipy.signal, there is definitely some new features which are > needed as well (new feature is generally more fun than code > cleanup/bug > hunting). Of course, you should choose modules which interest you. > > > scipy.signal sounds very interesting to me. I've been toying with the > idea of a soft-synth written in python for a while now. I would be > up for tidying up that module. what other features would be of use to > implement? For scipy.signal: - a lot of C code is 'old', using 'old' idioms. The module works well, but modifying it when there are bugs is challenging. In particular, a lot of array indexing is done manually, but numpy now have array iterator which makes this much easier to handle. 
- The new numpy generalized ufunc may be useful to improve the functionalities of things like filtering, etc... (I have yet to study generalized ufunc). - some new code to fill the gap compared to matlab signal toolbox (LPC analysis, etc... I already have some code to that purpose in the talkbox scikits, but the code would need to be generalized to arbitrary rank) What's your background in signal processing ? Would you be more interested in new features or bug cleaning ? > > As a start, what would be extremely useful is to be able to build > >Cython ufuncs on the fly. For cython, you should definitely ask the cython ML. cheers, David From jh at physics.ucf.edu Mon Mar 30 12:26:39 2009 From: jh at physics.ucf.edu (Joe Harrington) Date: Mon, 30 Mar 2009 12:26:39 -0400 Subject: [SciPy-dev] JOB: write numpy docs Message-ID: Last year's Doc Marathon got us off to a great start on documenting NumPy! But, there's still much work to be done, and SciPy after that. It's time to gear up for doing it again. Critical to last year's success was Stefan van der Walt's committed time, but he will be unable to play that role this year. So, I am looking to hire someone to write NumPy docs and help coordinate the doc project and its volunteers. The job includes working with me, the doc team, doc volunteers, and developers to: write and review a lot of docs, mainly those that others don't want to write help define milestones organize campaigns and volunteer teams to meet them research the NumPy and SciPy source codes to help plan: the eventual SciPy documentation the writing of a good User Manual work with the packaging team to meet their release deadlines perform other duties as assigned I am seeking someone to work full time if possible, and at least half time, from mid-April (or soon thereafter) through at least the (northern) summer. Candidates must be experienced NumPy and SciPy programmers; familiarity under the hood is a strong plus. They must also demonstrate their ability to produce excellent docs on the docs.SciPy.org wiki. Having contributed at a high level to an open-source community, especially to SciPy, is a big plus. Ability to take direction, work with and lead a team, and to work for extended periods without direct supervision on a list of assigned tasks are all critical. The applicant must be able to function well in a Linux environment; familiarity with multiple platforms is a plus. Please reply directly to me by email only. Include the following (PDF or ASCII formats strongly preferred): CV Statement of interest, qualifications per requirements above, availability, and wage expectations. Contact info for at least 3 professional references. Links to doc wiki pages for which you wrote the initial draft Links to doc wiki pages started by others to which you contributed significantly (edited, reviewed, proofed) The position is open until filled; candidates with complete applications by April 15 will receive full consideration. This is an open posting. Candidates who have not written any pages on the doc wiki yet have several weeks in which to do so. Pay will be commensurate with experience (up to a point). Relocation is not necessary. Candidates will need to provide their own computer and internet access. The University of Central Florida is an equal opportunity, equal access, affirmative action employer. --jh-- Prof. Joseph Harrington Department of Physics MAP 414 4000 Central Florida Blvd. 
University of Central Florida Orlando, FL 32816-2385 (407) 823-3416 voice (407) 823-5112 fax (407) 823-2325 physics office jh at physics.ucf.edu From rjsm at umich.edu Tue Mar 31 12:42:27 2009 From: rjsm at umich.edu (ross smith) Date: Tue, 31 Mar 2009 12:42:27 -0400 Subject: [SciPy-dev] Numpy/Scipy GSOC project(s) In-Reply-To: <73531abb0903310936r7eaf2a8exaa587b4c129790c1@mail.gmail.com> References: <73531abb0903300136k3cd1e9e2r23e749b0ec0695e0@mail.gmail.com> <49D0952E.2010202@ar.media.kyoto-u.ac.jp> <73531abb0903310936r7eaf2a8exaa587b4c129790c1@mail.gmail.com> Message-ID: <73531abb0903310942h414f42d0m2bc4efd5fed97a06@mail.gmail.com> Hi David, ross smith wrote: > > I've tried pearu at cens.ioc.ee > , the issue > has and is that my schools mail server is rejecting my outbound emails. Ok. I have contacted him personally - I will keep the mailing list posted. > > If you are interested in working on scipy-related projects, something > which could help the transition is to replace C code by cython. This > would be particularly helpful in scipy - there are many modules which > are crufty and would benefit from code cleaning, but if you don't know > cython, maybe that's not something that interesting for you. Cython is > not difficult to use, though. scipy.signal comes to mind, to take > something where I could definitely help you - but there are other > modules as well, where other people would be more qualified > (scipy.ndimage, etc...). at least in the > case of scipy.signal, there is definitely some new features which are > needed as well (new feature is generally more fun than code > cleanup/bug > hunting). Of course, you should choose modules which interest you. > > > scipy.signal sounds very interesting to me. I've been toying with the > idea of a soft-synth written in python for a while now. I would be > up for tidying up that module. what other features would be of use to > implement? For scipy.signal: - a lot of C code is 'old', using 'old' idioms. The module works well, but modifying it when there are bugs is challenging. In particular, a lot of array indexing is done manually, but numpy now have array iterator which makes this much easier to handle. - The new numpy generalized ufunc may be useful to improve the functionalities of things like filtering, etc... (I have yet to study generalized ufunc). - some new code to fill the gap compared to matlab signal toolbox (LPC analysis, etc... I already have some code to that purpose in the talkbox scikits, but the code would need to be generalized to arbitrary rank) What's your background in signal processing ? I have a very little from playing with jMax to create beeps and blips. The codebase I worked on porting this school year, also does some signal processing. Would you be more interested in new features or bug cleaning ? At first I'd be more interested in bug cleaning. I think it's the best way to get comfortable with a new codebase. Toward the second half of the summer, I'd be more interested in new features. > > As a start, what would be extremely useful is to be able to build > >Cython ufuncs on the fly. For cython, you should definitely ask the cython ML. cheers, David _______________________________________________ Scipy-dev mailing list Scipy-dev at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-dev -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rjsm at umich.edu Mon Mar 23 16:18:05 2009 From: rjsm at umich.edu (ross smith) Date: Mon, 23 Mar 2009 20:18:05 -0000 Subject: [SciPy-dev] Porting SciPy to Py3k GSOC project In-Reply-To: <73531abb0903221911k1d862881q9db5f387fa93bb39@mail.gmail.com> References: <73531abb0903221911k1d862881q9db5f387fa93bb39@mail.gmail.com> Message-ID: <73531abb0903231317w6c373c8u64406ef0501a33f4@mail.gmail.com> Hello Again, I've attached a draft of my application. Any feedback you can provide would be greatly appreciated. thanks, Ross On Sun, Mar 22, 2009 at 22:11, ross smith wrote: > Hello everyone, > > I am interested in porting SciPy/NumPy to Py3k. I've been working this > past school year to port an existing code base to py3k for a research group > on campus. A lot of the code relies on SciPy and NumPy but the scope of my > project didn't let me work on porting either project, to my dismay. I'd > love the opportunity to port a project I use heavily in my own code and gain > a better understanding of how it works. > > We are supposed to contact the group we would be working with, to flesh out > the details of our application. I've looked at the application and the only > thing I know I'll need significant help with is the Milestones portion. Of > course, Any and all suggestions are welcome! > > > thank you, > > Ross Smith > > (Gaurdro on Freenode) > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: scipy.doc Type: application/msword Size: 101888 bytes Desc: not available URL: