From kmichael.aye at gmail.com  Wed May  1 17:46:26 2013
From: kmichael.aye at gmail.com (K.-Michael Aye)
Date: Wed, 1 May 2013 14:46:26 -0700
Subject: [SciPy-User] scipy sprint @ EuroSciPy '12?
References: 
Message-ID: 

On 2012-05-01 19:53:59 +0000, Ralf Gommers said:

> Hi all,
> 
> Would people be interested in having a scipy sprint at EuroSciPy this
> year? Last year we tried to do a last minute mini-sprint and ended up
> hunting for wifi access for half the time, so it would be good to
> organize things better this time around. I'm thinking a one-day sprint
> on Wed Aug 22nd would be good.
> 
> If there's interest, I'm happy to organize (contact the conference
> organizers for a room, create a wiki page, etc.).

I am currently trying to find out what sprints are actually happening
at all; I can't seem to find any sprints overview page.
I would be happy to join a sprint for the first time, after many years
of using scipy successfully, now that I'm not so far away from the
conference.

Michael

> 
> Ralf
> 
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev

From kmichael.aye at gmail.com  Wed May  1 17:47:42 2013
From: kmichael.aye at gmail.com (K.-Michael Aye)
Date: Wed, 1 May 2013 14:47:42 -0700
Subject: [SciPy-User] scipy sprint @ EuroSciPy '12?
References: 
Message-ID: 

On 2013-05-01 21:46:26 +0000, K.-Michael Aye said:

> On 2012-05-01 19:53:59 +0000, Ralf Gommers said:
> 
>> Hi all,
>> 
>> Would people be interested in having a scipy sprint at EuroSciPy this
>> year? Last year we tried to do a last minute mini-sprint and ended up
>> hunting for wifi access for half the time, so it would be good to
>> organize things better this time around. I'm thinking a one-day sprint
>> on Wed Aug 22nd would be good.
>> 
>> If there's interest, I'm happy to organize (contact the conference
>> organizers for a room, create a wiki page, etc.).
> 
> I am currently trying to find out what sprints are actually happening
> at all; I can't seem to find any sprints overview page.
> I would be happy to join a sprint for the first time, after many years
> of using scipy successfully, now that I'm not so far away from the
> conference.

Oops, scratch that, I didn't see that this was for EuroSciPy, sorry!

M.

> 
> Michael
> 
>> 
>> Ralf
>> 
>> _______________________________________________
>> SciPy-Dev mailing list
>> SciPy-Dev at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-dev

From victor.gonzalez at geomati.co  Fri May  3 04:24:58 2013
From: victor.gonzalez at geomati.co (Victor Gonzalez)
Date: Fri, 3 May 2013 10:24:58 +0200
Subject: [SciPy-User] Kernel analysis from R to scipy
Message-ID: 

Hi all,

I'm trying to migrate a functionality from R to python. It has to perform a
kernel density estimation and I managed to use gaussian_kde [1]
successfully. The problem is that I need to use least squares
cross-validation (LSCV) for the bandwidth selection and, as far as I can
see, only the 'scotts' and 'silverman' rules of thumb are supported. Is there a
way to perform the kernel estimation with LSCV?

If you need more info on what I'm trying to do, here is the definition in R
of the kernel that I'm trying to migrate [2] (page 124). Here is an example
of how to use the kernel in R to obtain some results [3]. And here is the
link to download the R source code [4].

Thanks in advance,
Víctor.

[1] http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html
[2] http://cran.r-project.org/web/packages/adehabitat/adehabitat.pdf
[3] https://trac.faunalia.it/animove/wiki/AnimoveHowto#Kernelhomerange
[4] http://cran.r-project.org/web/packages/adehabitat/index.html
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From dan.stowell at eecs.qmul.ac.uk  Fri May  3 04:59:51 2013
From: dan.stowell at eecs.qmul.ac.uk (Dan Stowell)
Date: Fri, 03 May 2013 09:59:51 +0100
Subject: [SciPy-User] multi-dimensional scaling
Message-ID: <51837C87.6030109@eecs.qmul.ac.uk>

Hello,

I'm looking in scipy for something to perform multi-dimensional
scaling*. I don't see anything - have I missed it? Is it easy to make it
from scipy components?

Thanks
Dan

* http://en.wikipedia.org/wiki/Multidimensional_scaling

-- 
Dan Stowell
Postdoctoral Research Assistant
Centre for Digital Music
Queen Mary, University of London
Mile End Road, London E1 4NS
http://www.elec.qmul.ac.uk/digitalmusic/people/dans.htm
http://www.mcld.co.uk/

From josef.pktd at gmail.com  Fri May  3 07:27:07 2013
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Fri, 3 May 2013 07:27:07 -0400
Subject: [SciPy-User] Kernel analysis from R to scipy
In-Reply-To: 
References: 
Message-ID: 

On Fri, May 3, 2013 at 4:24 AM, Victor Gonzalez wrote:

> Hi all,
>
> I'm trying to migrate a functionality from R to python. It has to perform a
> kernel density estimation and I managed to use gaussian_kde [1]
> successfully. The problem is that I need to use the least squares
> cross-validation (LSCV) for the bandwidth selection and, as far as I can
> see, only 'scotts' and 'silverman' rule of thumb are supported. Is there a
> way to perform the kernel estimation with LSCV?
>
> If you need more info on what I'm trying to do, here is the definition in R
> of the kernel that I'm trying to migrate [2] (page 124). Here is an example
> of how to use the kernel in R to obtain some results [3]. And here is the
> link to download the R source code [4].
>
> Thanks in advance,
> Víctor.
>
> [1]
> http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html
> [2] http://cran.r-project.org/web/packages/adehabitat/adehabitat.pdf
> [3] https://trac.faunalia.it/animove/wiki/AnimoveHowto#Kernelhomerange
> [4] http://cran.r-project.org/web/packages/adehabitat/index.html

statsmodels has LSCV for kernel density estimation
http://statsmodels.sourceforge.net/notebooks/nonparametric.html
there are tutorial examples for the univariate case
http://statsmodels.sourceforge.net/notebooks/examples/notebooks/generated/kernel_density.html

this requires a recent master version of statsmodels, or a soon to be
released version.

Josef

> 
> 
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
> 

From zachary.pincus at yale.edu  Fri May  3 08:36:25 2013
From: zachary.pincus at yale.edu (Zachary Pincus)
Date: Fri, 3 May 2013 08:36:25 -0400
Subject: [SciPy-User] multi-dimensional scaling
In-Reply-To: <51837C87.6030109@eecs.qmul.ac.uk>
References: <51837C87.6030109@eecs.qmul.ac.uk>
Message-ID: 

> I'm looking in scipy for something to perform multi-dimensional
> scaling*. I don't see anything - have I missed it? Is it easy to make it
> from scipy components?
>
> Thanks
> Dan
>
> * http://en.wikipedia.org/wiki/Multidimensional_scaling

MDS is more a class of approaches than a specific algorithm. If you want
to do "classic" MDS with euclidean distances as the metric, then you would
use PCA to implement that:
http://stats.stackexchange.com/questions/14002/whats-the-difference-between-principal-components-analysis-and-multidimensional

And PCA is just a simple eigendecomposition that you can build from the
basic linear algebra tools in numpy. I'm happy to send over the short
wrapper code I wrote to do "PCA" on data in a vaguely smart way, if you
want.
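[Editor's note] The classical-MDS-via-eigendecomposition route described above fits in a few lines of NumPy. The following is a generic textbook sketch of the Torgerson double-centering construction, not Zach's wrapper code; the function name `classical_mds` is made up:

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: coordinates from a pairwise distance matrix.

    D is an (n, n) symmetric matrix of Euclidean distances; returns an
    (n, k) array of coordinates whose pairwise distances approximate D.
    """
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * np.dot(J, np.dot(D ** 2, J))      # double-centered squared distances
    w, V = np.linalg.eigh(B)                     # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:k]                # keep the k largest eigenvalues
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

# Round-trip check: embed 2-D points from their own distance matrix.
rng = np.random.RandomState(0)
X = rng.rand(10, 2)
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
Y = classical_mds(D, k=2)
D2 = np.sqrt(((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1))
print(np.allclose(D, D2))  # True: distances reproduced (up to rotation/reflection)
```

Double-centering converts squared distances into an inner-product (Gram) matrix; its top eigenvectors, scaled by the square roots of the eigenvalues, give the embedding, which is why exact Euclidean input is recovered to machine precision.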
Zach

From nelle.varoquaux at gmail.com  Fri May  3 08:39:18 2013
From: nelle.varoquaux at gmail.com (Nelle Varoquaux)
Date: Fri, 3 May 2013 14:39:18 +0200
Subject: [SciPy-User] multi-dimensional scaling
In-Reply-To: 
References: <51837C87.6030109@eecs.qmul.ac.uk>
Message-ID: 

> > I'm looking in scipy for something to perform multi-dimensional
> > scaling*. I don't see anything - have I missed it? Is it easy to make it
> > from scipy components?
> >
> > Thanks
> > Dan
> >
> > * http://en.wikipedia.org/wiki/Multidimensional_scaling
>
> MDS is more a class of approaches than a specific algorithm. If you want
> to do "classic" MDS with euclidean distances as the metric, then you would
> use PCA to implement that:
> http://stats.stackexchange.com/questions/14002/whats-the-difference-between-principal-components-analysis-and-multidimensional
>
> And PCA is just a simple eigendecomposition that you can build from the
> basic linear algebra tools in numpy. I'm happy to send over the short
> wrapper code I wrote to do "PCA" on data in a vaguely smart way, if you
> want.

There are both the classical MDS (smacof algorithm) and NMDS (non metric)
in scikit-learn (and PCA :) ).

Cheers,
N

> 
> Zach
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From victor.gonzalez at geomati.co  Fri May  3 09:01:50 2013
From: victor.gonzalez at geomati.co (Victor Gonzalez)
Date: Fri, 3 May 2013 15:01:50 +0200
Subject: [SciPy-User] Kernel analysis from R to scipy
In-Reply-To: 
References: 
Message-ID: 

Thanks a lot for the quick response! I'll check it later, but it seems
exactly what I've been looking for.

Thanks!
Víctor.

2013/5/3 

> On Fri, May 3, 2013 at 4:24 AM, Victor Gonzalez wrote:
> > Hi all,
> >
> > I'm trying to migrate a functionality from R to python. It has to
> > perform a kernel density estimation and I managed to use gaussian_kde [1]
> > successfully. The problem is that I need to use the least squares
> > cross-validation (LSCV) for the bandwidth selection and, as far as I can
> > see, only 'scotts' and 'silverman' rule of thumb are supported. Is there
> > a way to perform the kernel estimation with LSCV?
> >
> > If you need more info on what I'm trying to do, here is the definition
> > in R of the kernel that I'm trying to migrate [2] (page 124). Here is an
> > example of how to use the kernel in R to obtain some results [3]. And
> > here is the link to download the R source code [4].
> >
> > Thanks in advance,
> > Víctor.
> >
> > [1]
> > http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html
> > [2] http://cran.r-project.org/web/packages/adehabitat/adehabitat.pdf
> > [3] https://trac.faunalia.it/animove/wiki/AnimoveHowto#Kernelhomerange
> > [4] http://cran.r-project.org/web/packages/adehabitat/index.html
>
> statsmodels has LSCV for kernel density estimation
> http://statsmodels.sourceforge.net/notebooks/nonparametric.html
> there are tutorial examples for the univariate case
> http://statsmodels.sourceforge.net/notebooks/examples/notebooks/generated/kernel_density.html
>
> this requires a recent master version of statsmodels, or a soon to be
> released version.
>
> Josef
>
> > _______________________________________________
> > SciPy-User mailing list
> > SciPy-User at scipy.org
> > http://mail.scipy.org/mailman/listinfo/scipy-user
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user

-------------- next part --------------
An HTML attachment was scrubbed...
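[Editor's note] To make the thread's LSCV criterion concrete: for a Gaussian kernel the least-squares cross-validation score has a closed form (the integral of the squared density estimate involves the N(0, 2) kernel, the convolution of two N(0, 1) kernels), and a brute-force grid search over bandwidths fits in a few lines. This is a minimal textbook 1-D sketch with made-up names, not the statsmodels implementation:

```python
import numpy as np

def lscv_score(h, x):
    """Least-squares cross-validation score for a 1-D Gaussian KDE with bandwidth h."""
    n = len(x)
    d = (x[:, None] - x[None, :]) / h
    # Closed-form integral of fhat(x)^2: sum of N(0, 2) kernel values.
    int_f2 = np.exp(-0.25 * d ** 2).sum() / (2.0 * np.sqrt(np.pi) * n ** 2 * h)
    # Leave-one-out density estimates at the sample points
    # (subtract each point's own kernel contribution, K(0) = 1/sqrt(2*pi)).
    K = np.exp(-0.5 * d ** 2) / np.sqrt(2.0 * np.pi)
    loo = (K.sum(axis=1) - 1.0 / np.sqrt(2.0 * np.pi)) / ((n - 1.0) * h)
    return int_f2 - 2.0 * loo.mean()

rng = np.random.RandomState(0)
x = rng.normal(size=200)
grid = np.linspace(0.05, 1.5, 60)
h_best = grid[int(np.argmin([lscv_score(h, x) for h in grid]))]
print(h_best)  # bandwidth minimizing the LSCV criterion on this grid
```

The score blows up as h goes to zero (the diagonal terms of `int_f2` dominate) and degrades for oversmoothing, so the minimizer lies in between; statsmodels optimizes the same kind of criterion rather than scanning a fixed grid.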
URL: From dan.stowell at eecs.qmul.ac.uk Fri May 3 11:18:04 2013 From: dan.stowell at eecs.qmul.ac.uk (Dan Stowell) Date: Fri, 03 May 2013 16:18:04 +0100 Subject: [SciPy-User] multi-dimensional scaling In-Reply-To: References: <51837C87.6030109@eecs.qmul.ac.uk> Message-ID: <5183D52C.8040701@eecs.qmul.ac.uk> On 03/05/13 13:39, Nelle Varoquaux wrote: > > > I'm looking in scipy for something to perform multi-dimensional > > scaling*. I don't see anything - have I missed it? Is it easy to > make it > > from scipy components? > > > > Thanks > > Dan > > > > * http://en.wikipedia.org/wiki/Multidimensional_scaling > > > MDS is more a class of approaches than a specific algorithm. If you > want to do "classic" MDS with euclidian distances as the metric, > then you would use PCA to implement that: > http://stats.stackexchange.com/questions/14002/whats-the-difference-between-principal-components-analysis-and-multidimensional > > And PCA is just a simple eigendecomposition that you can build from > the basic linear algebra tools in numpy. I'm happy to send over the > short wrapper code I wrote to do "PCA" on data in a vaguely smart > way, if you want. > > > There are both the classical MDS (smacof algorithm) and NMDS (non > metric) in scikit-learn (and PCA :) ). Thanks both. In my case, for each pair of points I have a list of binary values representing match-or-no-match, so I will use Hamming distance rather than Euclidean. It looks like sklearn.manifold.MDS(metric=False) will do the job for me. Just need to update my installation of sklearn to 0.12+... By the way: the example here uses a variable called "similarities", but most of the way through they are really dissimilarities, and then later (AFTER their use in mds) converted to similarities - a touch confusing. Also, the documentation for fit_transform() here just uses "X" and "Input data" and doesn't explicitly say whether it expects similarities or dissimilarities. 
It would really help if the documentation was a little clearer about that. (I think it wants dissimilarities - please correct me if I'm wrong...) Thanks Dan -- Dan Stowell Postdoctoral Research Assistant Centre for Digital Music Queen Mary, University of London Mile End Road, London E1 4NS http://www.elec.qmul.ac.uk/digitalmusic/people/dans.htm http://www.mcld.co.uk/ From nelle.varoquaux at gmail.com Fri May 3 11:20:03 2013 From: nelle.varoquaux at gmail.com (Nelle Varoquaux) Date: Fri, 3 May 2013 17:20:03 +0200 Subject: [SciPy-User] multi-dimensional scaling In-Reply-To: <5183D52C.8040701@eecs.qmul.ac.uk> References: <51837C87.6030109@eecs.qmul.ac.uk> <5183D52C.8040701@eecs.qmul.ac.uk> Message-ID: On 3 May 2013 17:18, Dan Stowell wrote: > On 03/05/13 13:39, Nelle Varoquaux wrote: > > > > > I'm looking in scipy for something to perform multi-dimensional > > > scaling*. I don't see anything - have I missed it? Is it easy to > > make it > > > from scipy components? > > > > > > Thanks > > > Dan > > > > > > * http://en.wikipedia.org/wiki/Multidimensional_scaling > > > > > > MDS is more a class of approaches than a specific algorithm. If you > > want to do "classic" MDS with euclidian distances as the metric, > > then you would use PCA to implement that: > > > http://stats.stackexchange.com/questions/14002/whats-the-difference-between-principal-components-analysis-and-multidimensional > > > > And PCA is just a simple eigendecomposition that you can build from > > the basic linear algebra tools in numpy. I'm happy to send over the > > short wrapper code I wrote to do "PCA" on data in a vaguely smart > > way, if you want. > > > > > > There are both the classical MDS (smacof algorithm) and NMDS (non > > metric) in scikit-learn (and PCA :) ). > > Thanks both. In my case, for each pair of points I have a list of binary > values representing match-or-no-match, so I will use Hamming distance > rather than Euclidean. 
It looks like sklearn.manifold.MDS(metric=False) > will do the job for me. Just need to update my installation of sklearn > to 0.12+... > > By the way: the example here > > uses a variable called "similarities", but most of the way through they > are really dissimilarities, and then later (AFTER their use in mds) > converted to similarities - a touch confusing. > > Also, the documentation for fit_transform() here > < > http://scikit-learn.sourceforge.net/dev/modules/generated/sklearn.manifold.MDS.html > > > just uses "X" and "Input data" and doesn't explicitly say whether it > expects similarities or dissimilarities. It would really help if the > documentation was a little clearer about that. (I think it wants > dissimilarities - please correct me if I'm wrong...) > I'll try to improve the documentation on the MDS in the near future. Thanks for the feedback. Cheers, N > > Thanks > Dan > > -- > Dan Stowell > Postdoctoral Research Assistant > Centre for Digital Music > Queen Mary, University of London > Mile End Road, London E1 4NS > http://www.elec.qmul.ac.uk/digitalmusic/people/dans.htm > http://www.mcld.co.uk/ > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dan.stowell at eecs.qmul.ac.uk Fri May 3 11:33:58 2013 From: dan.stowell at eecs.qmul.ac.uk (Dan Stowell) Date: Fri, 03 May 2013 16:33:58 +0100 Subject: [SciPy-User] multi-dimensional scaling In-Reply-To: References: <51837C87.6030109@eecs.qmul.ac.uk> <5183D52C.8040701@eecs.qmul.ac.uk> Message-ID: <5183D8E6.7020208@eecs.qmul.ac.uk> On 03/05/13 16:20, Nelle Varoquaux wrote: > > > > On 3 May 2013 17:18, Dan Stowell > wrote: > > On 03/05/13 13:39, Nelle Varoquaux wrote: > > > > > I'm looking in scipy for something to perform > multi-dimensional > > > scaling*. I don't see anything - have I missed it? 
Is it > easy to > > make it > > > from scipy components? > > > > > > Thanks > > > Dan > > > > > > * http://en.wikipedia.org/wiki/Multidimensional_scaling > > > > > > MDS is more a class of approaches than a specific algorithm. > If you > > want to do "classic" MDS with euclidian distances as the metric, > > then you would use PCA to implement that: > > > http://stats.stackexchange.com/questions/14002/whats-the-difference-between-principal-components-analysis-and-multidimensional > > > > And PCA is just a simple eigendecomposition that you can > build from > > the basic linear algebra tools in numpy. I'm happy to send > over the > > short wrapper code I wrote to do "PCA" on data in a vaguely smart > > way, if you want. > > > > > > There are both the classical MDS (smacof algorithm) and NMDS (non > > metric) in scikit-learn (and PCA :) ). > > Thanks both. In my case, for each pair of points I have a list of binary > values representing match-or-no-match, so I will use Hamming distance > rather than Euclidean. It looks like sklearn.manifold.MDS(metric=False) > will do the job for me. Just need to update my installation of sklearn > to 0.12+... > > By the way: the example here > > uses a variable called "similarities", but most of the way through they > are really dissimilarities, and then later (AFTER their use in mds) > converted to similarities - a touch confusing. > > Also, the documentation for fit_transform() here > > just uses "X" and "Input data" and doesn't explicitly say whether it > expects similarities or dissimilarities. It would really help if the > documentation was a little clearer about that. (I think it wants > dissimilarities - please correct me if I'm wrong...) > > > I'll try to improve the documentation on the MDS in the near future. > Thanks for the feedback. And thanks for the code! I've written a simple example script, but I must be doing something wrong. 
It simply generates two classes of data points then does NMDS on them, but contrary to my expectations it doesn't cluster the two classes separately from each other in the solution. If you have any hints I'd be grateful. Thanks Dan -------------- next part -------------- A non-text attachment was scrubbed... Name: mds_test.py Type: text/x-python Size: 1441 bytes Desc: not available URL: From pierre.raybaut at gmail.com Fri May 3 15:48:25 2013 From: pierre.raybaut at gmail.com (Pierre Raybaut) Date: Fri, 3 May 2013 21:48:25 +0200 Subject: [SciPy-User] ANN: New WinPython with Python 2.7.4 and 3.3.1 (32/64bit) Message-ID: Hi all, I am pleased to announce that four new versions of WinPython have been released yesterday with Python 2.7.4 and 3.3.1, 32 and 64 bits. Many packages have been added or upgraded (see the automatically-generated changelogs). Special thanks to Christoph Gohlke for building most of the binary packages bundled in WinPython. WinPython is a free open-source portable distribution of Python for Windows, designed for scientists. 
It is a full-featured (see http://code.google.com/p/winpython/wiki/PackageIndex) Python-based scientific environment: * Designed for scientists (thanks to the integrated libraries NumPy, SciPy, Matplotlib, guiqwt, etc.: * Regular *scientific users*: interactive data processing and visualization using Python with Spyder * *Advanced scientific users and software developers*: Python applications development with Spyder, version control with Mercurial and other development tools (like gettext) * *Portable*: preconfigured, it should run out of the box on any machine under Windows (without any installation requirements) and the folder containing WinPython can be moved to any location (local, network or removable drive) * *Flexible*: one can install (or should I write "use" as it's portable) as many WinPython versions as necessary (like isolated and self-consistent environments), even if those versions are running different versions of Python (2.7, 3.3) or different architectures (32bit or 64bit) on the same machine * *Customizable*: using the integrated package manager (wppm, as WinPython Package Manager), it's possible to install, uninstall or upgrade Python packages (see http://code.google.com/p/winpython/wiki/WPPM for more details on supported package formats). *WinPython is not an attempt to replace Python(x,y)*, this is just something different (see http://code.google.com/p/winpython/wiki/Roadmap): more flexible, easier to maintain, movable and less invasive for the OS, but certainly less user-friendly, with less packages/contents and without any integration to Windows explorer [*]. [*] Actually there is an optional integration into Windows explorer, providing the same features as the official Python installer regarding file associations and context menu entry (this option may be activated through the WinPython Control Panel), and adding shortcuts to Windows Start menu. Enjoy! -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From alec.kalinin at gmail.com  Sun May  5 07:18:50 2013
From: alec.kalinin at gmail.com (Alexander Kalinin)
Date: Sun, 5 May 2013 15:18:50 +0400
Subject: [SciPy-User] Matrix by matrix vs. matrix by vector multiplications are not the same
Message-ID: 

Hello,

Look at this code:

import numpy as np

N = 500
A = np.random.rand(N, N)
B = np.random.rand(N, N)

c1 = np.dot(A, B)[:, 0]
c2 = np.dot(A, B[:, 0])

print np.linalg.norm(c1 - c2)

The output is:
3.92795839192e-13

For me it is a little bit strange that the results are not the same. Does it
mean that matrix by matrix and matrix by vector multiplication algorithms
are different?

Sincerely,
Alexander

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From pav at iki.fi  Sun May  5 08:00:17 2013
From: pav at iki.fi (Pauli Virtanen)
Date: Sun, 05 May 2013 15:00:17 +0300
Subject: [SciPy-User] Matrix by matrix vs. matrix by vector multiplications are not the same
In-Reply-To: 
References: 
Message-ID: 

05.05.2013 14:18, Alexander Kalinin wrote:
[clip]
> For me it is a little bit strange that the results are not the same. Does
> it mean that matrix by matrix and matrix by vector multiplication
> algorithms are different?

Correct, they are different. BLAS libraries typically try to optimize
for speed and the most appropriate algorithms for each case are
different. The speed increase gained by this can be quite significant.

The underlying issue is that in floating point, addition and
multiplication are not associative operations, so that mathematically
equivalent algorithms may produce somewhat different results due to
different accumulation of rounding error.

-- 
Pauli Virtanen

From alec.kalinin at gmail.com  Mon May  6 03:01:02 2013
From: alec.kalinin at gmail.com (Alexander Kalinin)
Date: Mon, 6 May 2013 11:01:02 +0400
Subject: [SciPy-User] Matrix by matrix vs. matrix by vector multiplications are not the same
In-Reply-To: 
References: 
Message-ID: 

Thank you for the explanation.

Alexander.
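[Editor's note] Pauli's point about non-associative floating-point arithmetic is easy to demonstrate in isolation; regrouping a sum changes which rounding errors occur, exactly as different accumulation orders inside BLAS matrix-matrix vs. matrix-vector kernels do:

```python
# Floating-point addition is not associative: the two groupings below are
# mathematically equal but round differently.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
print(left)            # 0.6000000000000001
print(right)           # 0.6
print(left == right)   # False
```

Both results are within one unit in the last place of the true sum, just as `c1` and `c2` in the original example agree to about 13 digits; neither is "wrong", they simply rounded along different paths.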
On Sun, May 5, 2013 at 4:00 PM, Pauli Virtanen wrote: > 05.05.2013 14:18, Alexander Kalinin kirjoitti: > [clip] > > For me it is little bit strange that the results are not the same. Does > > it mean that matrix by matrix and matrix by vector multiplications > > algorithms are different? > > Correct, they are different. BLAS libraries typically try to optimize > for speed and the most appropriate algorithms for each case are > different. The speed increase gained by this can be quite significant. > > The underlying issue is that in floating point, addition and > multiplication are not associative operations, so that mathematically > equivalent algorithms may produce somewhat different results due to > different accumulation of rounding error. > > -- > Pauli Virtanen > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wuzzyview at gmail.com Tue May 7 09:57:04 2013 From: wuzzyview at gmail.com (Ahmed Fasih) Date: Tue, 7 May 2013 09:57:04 -0400 Subject: [SciPy-User] Use numpy.random.RandomState with scipy.stats Message-ID: I really like the random variable objects in scipy.stats, e.g., scipy.stats.norm, which let me store the parameters of a distribution and sample from it. I see that, to get repeatable results, scipy.stats documentation suggests setting the default global Numpy random seed via numpy.random.seed. The rest of my application uses individual random streams generated from numpy.random.RandomState, since I may have multiple streams. Is there any way to get my scipy.stats random variable objects to use a user-supplied RandomState object instead of the default numpy stream? If not, would this qualify as a (minor) bug? Thanks, Ahmed -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From robert.kern at gmail.com Tue May 7 10:43:23 2013 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 7 May 2013 15:43:23 +0100 Subject: [SciPy-User] Use numpy.random.RandomState with scipy.stats In-Reply-To: References: Message-ID: On Tue, May 7, 2013 at 2:57 PM, Ahmed Fasih wrote: > I really like the random variable objects in scipy.stats, e.g., > scipy.stats.norm, which let me store the parameters of a distribution and > sample from it. I see that, to get repeatable results, scipy.stats > documentation suggests setting the default global Numpy random seed via > numpy.random.seed. The rest of my application uses individual random streams > generated from numpy.random.RandomState, since I may have multiple streams. > Is there any way to get my scipy.stats random variable objects to use a > user-supplied RandomState object instead of the default numpy stream? If > not, would this qualify as a (minor) bug? Not quite a bug. It is a long-sought missing feature. -- Robert Kern From kmtac at live.com Tue May 7 13:03:43 2013 From: kmtac at live.com (. kt) Date: Tue, 7 May 2013 13:03:43 -0400 Subject: [SciPy-User] Planet Scipy: Organization and Interesting Multi-Axis Plot Message-ID: Hi, I've been using various scientific python tools for a couple years and have found them very useful. The blogs at planet.scipy.org have also been useful, A few months back, there was a blog post about a decomposition analysis (PCA or ICA, maybe) that included a multi-axis (6- or 8-?) plot. Unfortunately, I forgot the title and name of the blogger. Could anyone point me to the name of the blogger or send a link? Thanks! In general, I was wondering if it would be possible to organize links to the planet scipy blog posts by subject as well as by poster. Sometimes I'll skim a post and mentally note that it's interesting, but not have a need for it. Then, a couple months later, I'll realize that I'd like to use the techniques described in the post. 
At times like this, it'd be really useful to have the posts categorized by subject. I'd think others would find this useful as well. What do you think? Btw, I'm volunteering. Thanks, Kathy -------------- next part -------------- An HTML attachment was scrubbed... URL: From scipy-user at sabonrai.com Wed May 8 10:20:15 2013 From: scipy-user at sabonrai.com (Terry Westley) Date: Wed, 8 May 2013 10:20:15 -0400 Subject: [SciPy-User] Bus error 10 in statically linked scipy.test() Message-ID: Please let me know if there's a better mailing list for this question. This is not really a user nor a dev question, but I'm starting here... I'm trying to build a statically linked version of python interpreter with numpy and scipy. Numpy.test() runs to completion: FAILED (KNOWNFAIL=5, SKIP=25, errors=1). A shared library built with the same components results in "OK (KNOWNFAIL=5, SKIP=6)" so there's clearly a problem here. But I'm ignoring this error for now because scipy error is more serious: Bus Error 10. It gets a bus error without displaying any results whatsoever, so I've run a trace version of the tests as follows: import trace, scipy tracer = trace.Trace(count=False, trace=True) tracer.run('scipy.test("full", 10)') The trace is 152,000 lines so no way am I going to include that in this email. If you're interested and can help me: - text file, http://www.sabonrai.com/scipy-user/scipy-test-bus-error.txt, 7.34 MB - zipped, http://www.sabonrai.com/scipy-user/scipy-test-bus-error.zip, 321 KB I can't make heads or tails of this trace output; I don't really know anything about scipy self test. It seems to fail shortly after importing vq and hierarchy from cluster. 
Environment: - Mac OS X 10.7.5 - Python 2.7.3 - Numpy 1.7.0 - Scipy 0.12.0 - Nose 1.3.0 - clang: Apple LLVM version 4.2 (clang-425.0.28) (based on LLVM 3.2svn), Target: x86_64-apple-darwin11.4.2 - clang++: Apple LLVM version 4.2 (clang-425.0.28) (based on LLVM 3.2svn), Target: x86_64-apple-darwin11.4.2 - gfortran: GNU Fortran (MacPorts gcc46 4.6.3_9) 4.6.3 Thanks for any suggestions. --Terry P.S. Yes, I've wondered if the numpy error is causing the scipy error. This is the end of the numpy.test() run: ====================================================================== ERROR: Failure: AttributeError ('module' object has no attribute '__file__') ---------------------------------------------------------------------- Traceback (most recent call last): File "/pym/lib/python2.7/site-packages/nose/loader.py", line 413, in loadTestsFromName addr.filename, addr.module) File "/pym/lib/python2.7/site-packages/nose/importer.py", line 47, in importFromPath return self.importFromDir(dir_path, fqname) File "/pym/lib/python2.7/site-packages/nose/importer.py", line 94, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/pym/lib/python2.7/site-packages/numpy/tests/test_ctypeslib.py", line 9, in cdll = load_library('multiarray', np.core.multiarray.__file__) AttributeError: 'module' object has no attribute '__file__' ---------------------------------------------------------------------- Ran 4437 tests in 66.937s FAILED (KNOWNFAIL=5, SKIP=25, errors=1) -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Wed May 8 10:37:35 2013 From: cournape at gmail.com (David Cournapeau) Date: Wed, 8 May 2013 15:37:35 +0100 Subject: [SciPy-User] Bus error 10 in statically linked scipy.test() In-Reply-To: References: Message-ID: On Wed, May 8, 2013 at 3:20 PM, Terry Westley wrote: > Please let me know if there's a better mailing list for this question. 
This > is not really a user nor a dev question, but I'm starting here... > > I'm trying to build a statically linked version of python interpreter with > numpy and scipy. > > Numpy.test() runs to completion: FAILED (KNOWNFAIL=5, SKIP=25, errors=1). A > shared library built with the same components results in "OK (KNOWNFAIL=5, > SKIP=6)" so there's clearly a problem here. > > But I'm ignoring this error for now because scipy error is more serious: Bus > Error 10. It gets a bus error without displaying any results whatsoever, so > I've run a trace version of the tests as follows: > > import trace, scipy > tracer = trace.Trace(count=False, trace=True) > tracer.run('scipy.test("full", 10)') > > The trace is 152,000 lines so no way am I going to include that in this > email. If you're interested and can help me: > - text file, http://www.sabonrai.com/scipy-user/scipy-test-bus-error.txt, > 7.34 MB > - zipped, http://www.sabonrai.com/scipy-user/scipy-test-bus-error.zip, 321 > KB > > I can't make heads or tails of this trace output; I don't really know > anything about scipy self test. It seems to fail shortly after importing vq > and hierarchy from cluster. > > Environment: > - Mac OS X 10.7.5 > - Python 2.7.3 > - Numpy 1.7.0 > - Scipy 0.12.0 > - Nose 1.3.0 > - clang: Apple LLVM version 4.2 (clang-425.0.28) (based on LLVM 3.2svn), > Target: x86_64-apple-darwin11.4.2 > - clang++: Apple LLVM version 4.2 (clang-425.0.28) (based on LLVM 3.2svn), > Target: x86_64-apple-darwin11.4.2 > - gfortran: GNU Fortran (MacPorts gcc46 4.6.3_9) 4.6.3 clang is known to cause issue when building scipy, though it is not clear whether this is a scipy bug or a clang bug. I suggest you try building with gcc instead. 
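[Editor's note] A sketch of how one might follow this suggestion, assuming a MacPorts GCC is installed (the exact compiler names, e.g. `gcc-mp-4.6`, depend on the local install; these environment variables are the conventional way to steer a numpy/scipy build toward a particular toolchain):

```shell
# Hypothetical setup: make the scipy build use GCC instead of clang.
# Adjust the names to your installation (MacPorts gcc46 installs
# compilers named gcc-mp-4.6, g++-mp-4.6, gfortran-mp-4.6).
export CC=gcc
export CXX=g++
export F77=gfortran
export F90=gfortran
# then rebuild from a clean scipy source tree:
#   python setup.py build
#   python setup.py install
```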
cheers, David From pierre.raybaut at gmail.com Wed May 8 11:15:20 2013 From: pierre.raybaut at gmail.com (Pierre Raybaut) Date: Wed, 8 May 2013 17:15:20 +0200 Subject: [SciPy-User] ANN: Spyder v2.2 Message-ID: Hi all, On behalf of Spyder's development team ( http://code.google.com/p/spyderlib/people/list), I'm pleased to announce that Spyder v2.2 has been released and is available for Windows XP/Vista/7/8, GNU/Linux and MacOS X: http://code.google.com/p/spyderlib/. This release represents 18 months of development since v2.1 and introduces major enhancements and new features: * Full support for IPython v0.13, including the ability to attach to existing kernels * New MacOS X application * Much improved debugging experience * Various editor improvements for code completion, zooming, auto insertion, and syntax highlighting * Better looking and faster Object Inspector * Single instance mode * Spanish translation of the interface * And many other changes: http://code.google.com/p/spyderlib/wiki/ChangeLog This is the last release to support Python 2.5: * Spyder 2.2 supports Python 2.5 to 2.7 * Spyder 2.3 will support Python 2.7 and Python 3 * (Spyder 2.1.14dev4 is a development release which already supports Python 3) See also https://code.google.com/p/spyderlib/downloads/list. Spyder is a free, open-source (MIT license) interactive development environment for the Python language with advanced editing, interactive testing, debugging and introspection features.
Thanks to the `spyderlib` library, Spyder also provides powerful ready-to-use widgets: embedded Python console (example: http://packages.python.org/guiqwt/_images/sift3.png), NumPy array editor (example: http://packages.python.org/guiqwt/_images/sift2.png), dictionary editor, source code editor, etc. Description of key features with tasty screenshots can be found at: http://code.google.com/p/spyderlib/wiki/Features Don't forget to follow Spyder updates/news: * on the project website: http://code.google.com/p/spyderlib/ * and on our official blog: http://spyder-ide.blogspot.com/ Last, but not least, we welcome any contribution that helps make Spyder an efficient scientific development/computing environment. Join us to help create your favourite environment! (http://code.google.com/p/spyderlib/wiki/NoteForContributors) Enjoy! -Pierre -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre.raybaut at gmail.com Wed May 8 11:26:50 2013 From: pierre.raybaut at gmail.com (Pierre Raybaut) Date: Wed, 8 May 2013 17:26:50 +0200 Subject: [SciPy-User] ANN: WinPython 2.7.4.1 and 3.3.1.1, 32 and 64bit Message-ID: Hi all, I am pleased to announce that four new versions of WinPython have been released yesterday with Python 2.7.4 and 3.3.1, 32 and 64 bits. Many packages have been added or upgraded (see the automatically-generated changelogs): * WinPython 2.7.4.1 32bit, 64bit -- including new Spyder v2.2: https://code.google.com/p/winpython/wiki/ChangeLog_27 * WinPython 3.3.1.1 32bit, 64bit: https://code.google.com/p/winpython/wiki/ChangeLog_33 Special thanks to Christoph Gohlke for building most of the binary packages bundled in WinPython. WinPython is a free open-source portable distribution of Python for Windows, designed for scientists.
It is a full-featured (see http://code.google.com/p/winpython/wiki/PackageIndex) Python-based scientific environment: * Designed for scientists (thanks to the integrated libraries NumPy, SciPy, Matplotlib, guiqwt, etc.): * Regular *scientific users*: interactive data processing and visualization using Python with Spyder * *Advanced scientific users and software developers*: Python applications development with Spyder, version control with Mercurial and other development tools (like gettext) * *Portable*: preconfigured, it should run out of the box on any machine under Windows (without any installation requirements) and the folder containing WinPython can be moved to any location (local, network or removable drive) * *Flexible*: one can install (or should I write "use" as it's portable) as many WinPython versions as necessary (like isolated and self-consistent environments), even if those versions are running different versions of Python (2.7, 3.3) or different architectures (32bit or 64bit) on the same machine * *Customizable*: using the integrated package manager (wppm, as WinPython Package Manager), it's possible to install, uninstall or upgrade Python packages (see http://code.google.com/p/winpython/wiki/WPPM for more details on supported package formats). *WinPython is not an attempt to replace Python(x,y)*, this is just something different (see http://code.google.com/p/winpython/wiki/Roadmap): more flexible, easier to maintain, movable and less invasive for the OS, but certainly less user-friendly, with less packages/contents and without any integration to Windows explorer [*]. [*] Actually there is an optional integration into Windows explorer, providing the same features as the official Python installer regarding file associations and context menu entry (this option may be activated through the WinPython Control Panel), and adding shortcuts to Windows Start menu. Enjoy! -------------- next part -------------- An HTML attachment was scrubbed...
URL: From lanceboyle at qwest.net Thu May 9 04:58:30 2013 From: lanceboyle at qwest.net (Jerry) Date: Thu, 9 May 2013 01:58:30 -0700 Subject: [SciPy-User] Where are GUI apps for ports installed? Message-ID: <8C753D58-CF13-48F3-9566-6EC60D8548D5@qwest.net> Some ports install a double-clickable binary in /Applications/MacPorts. However, the only things installed in that location on my system are AquaTerm, Pallet, some python 2.7 auxiliary stuff, and something related to Qt4 called qmlplugindump. I have the Spyder port installed and there is a binary for it but it does not install in this location but rather apparently in /opt/local/Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python with a symlink in /opt/local/bin to another small file which is a short python script which launches it. A different situation is cmake. When I install this from another place (probably the official cmake developer site), there is a GUI application placed in /Applications. However, the MacPorts version apparently does not install a GUI app anywhere. I find this a little disconcerting. Is there a MacPorts policy about these kinds of things? Jerry From lanceboyle at qwest.net Thu May 9 04:59:04 2013 From: lanceboyle at qwest.net (Jerry) Date: Thu, 9 May 2013 01:59:04 -0700 Subject: [SciPy-User] Is Macports version of Spyder 2.2.0 same as "official" release? Message-ID: Hi, I see that Spyder 2.2.0 (Scientific PYthon Development EnviRonment) is out. There is an official .dmg distribution available from the project's home page. According to the list of changes for 2.2.0 http://code.google.com/p/spyderlib/wiki/ChangeLog is this: "The App comes with its own interpreter, which has the main Python scientific libraries preinstalled: Numpy, SciPy, Matplotlib, IPython, Pandas, Sympy, Scikit-learn and Scikit-image." Does the MacPorts version of Spyder 2.2.0 include all of these add-ons? I hate duplicating things such as Python interpreters, Numpy, etc.
all over the place if I can do it once with MacPorts so I would much prefer to use the MacPorts version if possible. Also, the MacPorts page lists both py-spyder 2.2.0 and py27-spyder 2.2.0. Are these the same? Thanks, Jerry From lanceboyle at qwest.net Thu May 9 05:20:01 2013 From: lanceboyle at qwest.net (Jerry) Date: Thu, 9 May 2013 02:20:01 -0700 Subject: [SciPy-User] ANN: Spyder v2.2 In-Reply-To: References: Message-ID: <73841731-12BE-4777-BC6E-0096BB585287@qwest.net> On May 8, 2013, at 8:15 AM, Pierre Raybaut wrote: > Hi all, > > On the behalf of Spyder's development team (http://code.google.com/p/spyderlib/people/list), I'm pleased to announce that Spyder v2.2 has been released and is available for Windows XP/Vista/7/8, GNU/Linux and MacOS X: http://code.google.com/p/spyderlib/. > > This release represents 18 months of development since v2.1 and introduces major enhancements and new features: > * Full support for IPython v0.13, including the ability to attach to existing kernels > * New MacOS X application > * Much improved debugging experience > * Various editor improvements for code completion, zooming, auto insertion, and syntax highlighting > * Better looking and faster Object Inspector > * Single instance mode > * Spanish tranlation of the interface > * And many other changes: http://code.google.com/p/spyderlib/wiki/ChangeLog > > This is the last release to support Python 2.5: > * Spyder 2.2 supports Python 2.5 to 2.7 > * Spyder 2.3 will support Python 2.7 and Python 3 > * (Spyder 2.1.14dev4 is a development release which already supports Python 3) > See also https://code.google.com/p/spyderlib/downloads/list. > > Spyder is a free, open-source (MIT license) interactive development environment for the Python language with advanced editing, interactive testing, debugging and introspection features. 
Originally designed to provide MATLAB-like features (integrated help, interactive console, variable explorer with GUI-based editors for dictionaries, NumPy arrays, ...), it is strongly oriented towards scientific computing and software development. Thanks to the `spyderlib` library, Spyder also provides powerful ready-to-use widgets: embedded Python console (example: http://packages.python.org/guiqwt/_images/sift3.png), NumPy array editor (example: http://packages.python.org/guiqwt/_images/sift2.png), dictionary editor, source code editor, etc. > > Description of key features with tasty screenshots can be found at: > http://code.google.com/p/spyderlib/wiki/Features > > Don't forget to follow Spyder updates/news: > * on the project website: http://code.google.com/p/spyderlib/ > * and on our official blog: http://spyder-ide.blogspot.com/ > > Last, but not least, we welcome any contribution that helps make Spyder an efficient scientific development/computing environment. Join us to help create your favourite environment! > (http://code.google.com/p/spyderlib/wiki/NoteForContributors) > > Enjoy! > -Pierre > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user Sorry -- fails to launch on my OS X 10.7.5. http://code.google.com/p/spyderlib/issues/detail?id=1383 Jerry -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Thu May 9 05:29:20 2013 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 9 May 2013 10:29:20 +0100 Subject: [SciPy-User] Where are GUI apps for ports installed? In-Reply-To: <8C753D58-CF13-48F3-9566-6EC60D8548D5@qwest.net> References: <8C753D58-CF13-48F3-9566-6EC60D8548D5@qwest.net> Message-ID: On Thu, May 9, 2013 at 9:58 AM, Jerry wrote: > Some ports install a double-clickable binary in /Applications/MacPorts.
> > However, the only things installed in that location on my system are AquaTerm, Pallet, some python 2.7 auxiliary stuff, and something related to Qt4 called qmlplugindump. > > I have the Spyder port installed and there is a binary for it but it does not install in this location but rather apparently in The MacPorts mailing lists may be found here: http://trac.macports.org/wiki/MailingLists The Spyder mailing list may be found here: https://groups.google.com/forum/?fromgroups#!forum/spyderlib -- Robert Kern From tlinnet at gmail.com Thu May 9 09:19:43 2013 From: tlinnet at gmail.com (=?ISO-8859-1?Q?Troels_Emtek=E6r_Linnet?=) Date: Thu, 9 May 2013 15:19:43 +0200 Subject: [SciPy-User] ANN: WinPython 2.7.4.1 and 3.3.1.1, 32 and 64bit In-Reply-To: References: Message-ID: What would be the free similar python distribution on linux? I am only aware of the (academic-free) Enthought distribution. Best Troels Emtekær Linnet 2013/5/8 Pierre Raybaut > Hi all, > I am pleased to announce that four new versions of WinPython have been > released yesterday with Python 2.7.4 and 3.3.1, 32 and 64 bits. Many > packages have been added or upgraded (see the automatically-generated > changelogs): > * WinPython 2.7.4.1 32bit, 64bit -- including new Spyder v2.2: > https://code.google.com/p/winpython/wiki/ChangeLog_27 > * WinPython 3.3.1.1 32bit, 64bit: > https://code.google.com/p/winpython/wiki/ChangeLog_33 > > Special thanks to Christoph Gohlke for building most of the binary > packages bundled in WinPython. > > WinPython is a free open-source portable distribution of Python > for Windows, designed for scientists.
> > It is a full-featured (see > http://code.google.com/p/winpython/wiki/PackageIndex) > Python-based scientific environment: > * Designed for scientists (thanks to the integrated libraries > NumPy, SciPy, Matplotlib, guiqwt, etc.: > * Regular *scientific users*: interactive data processing > and visualization using Python with Spyder > * *Advanced scientific users and software developers*: > Python applications development with Spyder, version control with > Mercurial and other development tools (like gettext) > * *Portable*: preconfigured, it should run out of the box on any machine > under Windows (without any installation requirements) and the folder > containing WinPython can be moved to any location (local, network or > removable drive) > * *Flexible*: one can install (or should I write "use" as it's portable) > as many WinPython versions as necessary (like isolated and self-consistent > environments), even if those versions are running different versions of > Python (2.7, 3.3) or different architectures (32bit or 64bit) on the same > machine > * *Customizable*: using the integrated package manager (wppm, > as WinPython Package Manager), it's possible to install, uninstall > or upgrade Python packages (see > http://code.google.com/p/winpython/wiki/WPPM for more details > on supported package formats). > > *WinPython is not an attempt to replace Python(x,y)*, this is > just something different (see > http://code.google.com/p/winpython/wiki/Roadmap): more flexible, easier > to maintain, movable and less invasive for the OS, but certainly less > user-friendly, with less packages/contents and without any integration to > Windows explorer [*]. > > [*] Actually there is an optional integration into Windows > explorer, providing the same features as the official Python installer > regarding file associations and context menu entry (this option may be > activated through the WinPython Control Panel), and adding shortcuts to > Windows Start menu. > > Enjoy! 
> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From takowl at gmail.com Thu May 9 12:53:17 2013 From: takowl at gmail.com (Thomas Kluyver) Date: Thu, 9 May 2013 17:53:17 +0100 Subject: [SciPy-User] ANN: WinPython 2.7.4.1 and 3.3.1.1, 32 and 64bit In-Reply-To: References: Message-ID: On 9 May 2013 14:19, Troels Emtekær Linnet wrote: > What would be the free similar python distribution on linux? There's arguably less need for it on Linux, because it's convenient to manage it through the system package manager. But if you do want a Python distribution: - Anaconda is pretty comprehensive for free (Continuum sell some extras) - EPD has a free option, 'EPD Free', though it's smaller and lighter - Pyzo is a new distribution based on Python 3 The new Scipy website has a list of Scipy distributions: http://new.scipy.org/install.html Best wishes, Thomas -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Thu May 9 13:02:18 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 9 May 2013 13:02:18 -0400 Subject: [SciPy-User] ANN: WinPython 2.7.4.1 and 3.3.1.1, 32 and 64bit In-Reply-To: References: Message-ID: On Thu, May 9, 2013 at 12:53 PM, Thomas Kluyver wrote: > On 9 May 2013 14:19, Troels Emtekær Linnet wrote: >> >> What would be the free similar python distribution on linux? > > > There's arguably less need for it on Linux, because it's convenient to > manage it through the system package manager.
But if you do want a Python > distribution: > > - Anaconda is pretty comprehensive for free (Continuum sell some extras) > - EPD has a free option, 'EPD Free', though it's smaller and lighter > - Pyzo is a new distribution based on Python 3 - https://code.google.com/p/pythonxy-linux/ with even some daily builds https://launchpad.net/~pythonxy/+archive/pythonxy-devel It's only the "stick" that is Windows-only. Josef > > The new Scipy website has a list of Scipy distributions: > http://new.scipy.org/install.html > > Best wishes, > Thomas > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From arnaldorusso at gmail.com Thu May 9 16:20:54 2013 From: arnaldorusso at gmail.com (Arnaldo Russo) Date: Thu, 9 May 2013 17:20:54 -0300 Subject: [SciPy-User] efficiency of the simplex routine: R (optim) vs scipy.optimize.fmin In-Reply-To: References: Message-ID: Hi, Sorry for crossposting, but I'm trying to replace my "optim" function inside R, converting it to Python using scipy.optimize.fmin and leastsq. In Python, my results are approximately the same, when the function is set up to run with the right parameters. I have read that fmin needs a single value as a result, so I have to np.sum the return value, while with leastsq I don't have to. But, when compared with R results of the same function (the "optim" function), the results are quite different.
Example below:

# Inside python
import numpy as np
from scipy.optimize import leastsq, fmin

def pp_min(params, x, y):
    alpha = params[0]
    Beta = params[1]
    gama = params[2]
    model = ((y - gama*(1 - np.exp(-alpha*x/gama))*np.exp(-Beta*x/gama))**2)
    return model

params = (0.4, 1.5, 80)
x = np.array([0, 1, 8, 16, 64, 164, 264, 364, 464, 564, 664, 764, 864, 964, 1064, 1164])
y = np.array([0., 0.2436, 1.6128, 2.8224, 6.72, 15.1536, 22.176, 30.576, 31.1808, 40.2696, 47.4096, 41.7144, 61.6896, 56.6832, 62.5632, 63.5544])

pp = leastsq(pp_min, params, args=(x, y))

In [2]: pp
Out[2]: (array([ 8.48990490e-02, -1.56537197e-02, 6.51505458e+01]), 1)

# inside R
x <- c(0, 1, 8, 16, 64, 164, 264, 364, 464, 564, 664, 764, 864, 964, 1064, 1164)
y <- c(0.0, 0.2436, 1.6128, 2.8224, 6.72, 15.1536, 22.176, 30.576, 31.1808, 40.2696, 47.4096, 41.7144, 61.6896, 56.6832, 62.5632, 63.5544)

dat <- as.data.frame(cbind(x, y))
names(dat) <- c("x", "y")
attach(dat)

pp_min <- function(params, data=dat)
{ alpha <- params[1]
  Beta <- params[2]
  gama <- params[3]
  return( sum( (y-gama*(1-exp(-alpha*x/gama))*exp(-Beta*x/gama))^2) )
}

pp <- optim(par=c(0.4, 1.5, 80), fn=pp_min)

Out:
> pp
$par
[1]   0.09157204   0.02129695 148.89173924

The third parameter, especially, is very different. Any reasonable explanation?

Thank you.

--- *Arnaldo D'Amaral Pereira Granja Russo* Lab. de Estudos dos Oceanos e Clima Instituto de Oceanografia - FURG

2012/7/23 servant mathieu
> Hi Denis,
> Thanks for your response. For the fmin function in scipy, I took the
> default ftol and stol values. I'm just trying to minimize a chi square between
> observed experimental data and simulated data. I've done this in python and
> R with the Nelder-Mead algorithm, with exactly the same starting values.
> While the solutions produced by R and python are not very different, R
> systematically produces a lower chi-square after the same amount of
> iterations. This may be related to ftol and stol, but I don't know which
> value I should give to these parameters....
> Cheers, > Mat > > 2012/7/20 denis > >> Hi Mathieu, >> (months later) two differences among implementations of Nelder-Mead: >> 1) the start simplex: x0 +- what ? It's common to take x0 + a fixed >> (user-specified) stepsize in each dimension. NLOpt takes a "walking >> simplex", don't know what R does >> >> 2) termination: what ftol, xtol did you specify ? NLOpt looks at >> fhi - flo: fhi changes at each iteration, flo is sticky. >> >> Could you post a testcase similar to yours ? >> That would sure be helpful. >> >> cheers >> -- denis >> >> >> On 24/05/2012 10:15, servant mathieu wrote: >> > Dear scipy users, >> > Again a question about optimization. >> > I've just compared the efficiency of the simplex routine in R >> > (optim) vs scipy (fmin), when minimizing a chi-square. fmin is faster >> > than optim, but appears to be less efficient. In R, the value of the >> > function is always minimized step by step (there are of course some >> > exceptions) while there is lot of fluctuations in python. Given that the >> > underlying simplex algorithm is supposed to be the same, which mechanism >> > is responsible for this difference? Is it possible to constrain fmin so >> > it could be more rigorous? >> > Cheers, >> > Mathieu >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From josef.pktd at gmail.com Thu May 9 16:31:25 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 9 May 2013 16:31:25 -0400 Subject: [SciPy-User] efficiency of the simplex routine: R (optim) vs scipy.optimize.fmin In-Reply-To: References: Message-ID: On Thu, May 9, 2013 at 4:20 PM, Arnaldo Russo wrote: > Hi, > > Sorry for crossposting, but I'm trying to replace my "optim" function inside > R, converting it to Python using scipy.optimize.fmin and leastsq. > In Python, my results are aproximately the same, when the function is > setted to run with the right parameters. > I have read that fmin needs as a result a single value, so I have to np.sum > the return value and with leastsq I don't have to. > But, when compared with R results of the same function ( "optim" function), > the results are much different. > Example below: > > # Inside python > import numpy as np > from scipy.optimize import leastsq, fmin > > def pp_min(params, x, y): > alpha = params[0] > Beta = params[1] > gama = params[2] > > model = ( (y-gama*(1-np.exp(-alpha*x/gama))*np.exp(-Beta*x/gama))**2) > > return model > > > params = (0.4, 1.5 , 80) > x = np.array([0, 1, 8, 16, 64, 164, 264, 364, 464, 564, 664, 764, 864, 964, > 1064, 1164]) > y = np.array([0., 0.2436, 1.6128, 2.8224, 6.72, 15.1536, 22.176, 30.576, > 31.1808, 40.2696, 47.4096, 41.7144, 61.6896, 56.6832, 62.5632, 63.5544]) > > pp = leastsq(pp_min, params, args=(x, y)) for leastsq the function should not be squared, taking square and sum is part of the algorithm. 
drop the **2 in pp_min and it should work from the docstring: x = arg min(sum(func(y)**2,axis=0)) y Josef > > In [2]: pp > Out[2]: (array([ 8.48990490e-02, -1.56537197e-02, 6.51505458e+01]), 1) > > > # inside R > x <- c(0, 1, 8, 16, 64, 164, 264, 364, 464, 564, 664, 764, 864, 964, 1064, > 1164) > > y <- c(0.0, 0.2436, 1.6128, 2.8224, 6.72, 15.1536, 22.176, 30.576, 31.1808, > 40.2696, 47.4096, 41.7144, 61.6896, 56.6832, 62.5632, 63.5544) > > dat<-as.data.frame(cbind(x, y)) > names(dat)<-c("x","y") > attach(dat) > > pp_min<-function(params,data=dat) > { alpha<-params[1] > Beta<-params[2] > gama<-params[3] > return( sum( (y-gama*(1-exp(-alpha*x/gama))*exp(-Beta*x/gama))^2)) > } > > pp <- optim(par=c(0.4, 1.5 , 80), fn=pp_min) > > Out: >> pp > $par > [1] 0.09157204 0.02129695 148.89173924 > > The third argument is specially much different. > Any reasonable explanation? > > Thank you. > > > --- > Arnaldo D'Amaral Pereira Granja Russo > Lab. de Estudos dos Oceanos e Clima > Instituto de Oceanografia - FURG > > > > > 2012/7/23 servant mathieu >> >> Hi Denis, >> Thanks for your response. For the fmin function in scipy, I took the >> default ftol and stol values. I'm just trying to minize a chi square between >> observed experimental data and simulated data. I've done this in python and >> R with the Nelder-Mead algorithm, with exactly the same starting values. >> While the solutions produced by R and python are not very different, R >> systematicaly produces a lower chi-square after the same amount of >> iterations. This may be related to ftol and stol, but I don't know which >> value I should give to these parameters.... >> Cheers, >> Mat >> >> 2012/7/20 denis >>> >>> Hi Mathieu, >>> (months later) two differences among implementations of Nelder-Mead: >>> 1) the start simplex: x0 +- what ? It's common to take x0 + a fixed >>> (user-specified) stepsize in each dimension. 
NLOpt takes a "walking >>> simplex", don't know what R does >>> >>> 2) termination: what ftol, xtol did you specify ? NLOpt looks at >>> fhi - flo: fhi changes at each iteration, flo is sticky. >>> >>> Could you post a testcase similar to yours ? >>> That would sure be helpful. >>> >>> cheers >>> -- denis >>> >>> >>> On 24/05/2012 10:15, servant mathieu wrote: >>> > Dear scipy users, >>> > Again a question about optimization. >>> > I've just compared the efficiency of the simplex routine in R >>> > (optim) vs scipy (fmin), when minimizing a chi-square. fmin is faster >>> > than optim, but appears to be less efficient. In R, the value of the >>> > function is always minimized step by step (there are of course some >>> > exceptions) while there is lot of fluctuations in python. Given that >>> > the >>> > underlying simplex algorithm is supposed to be the same, which >>> > mechanism >>> > is responsible for this difference? Is it possible to constrain fmin so >>> > it could be more rigorous? >>> > Cheers, >>> > Mathieu >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From arnaldorusso at gmail.com Thu May 9 17:16:10 2013 From: arnaldorusso at gmail.com (Arnaldo Russo) Date: Thu, 9 May 2013 18:16:10 -0300 Subject: [SciPy-User] efficiency of the simplex routine: R (optim) vs scipy.optimize.fmin In-Reply-To: References: Message-ID: Hi Josef, thank you for quick reply. 
Changing my pp_min, without square, I have the results: Out[4]: (array([ 9.82259647e-02, -1.58338152e-02, 5.03125198e+01]), 1) It's not much different from the previous results. Any reason for this? --- *Arnaldo D'Amaral Pereira Granja Russo* Lab. de Estudos dos Oceanos e Clima Instituto de Oceanografia - FURG 2013/5/9 > On Thu, May 9, 2013 at 4:20 PM, Arnaldo Russo > wrote: > > Hi, > > > > Sorry for crossposting, but I'm trying to replace my "optim" function > inside > > R, converting it to Python using scipy.optimize.fmin and leastsq. > > In Python, my results are aproximately the same, when the function is > > setted to run with the right parameters. > > I have read that fmin needs as a result a single value, so I have to > np.sum > > the return value and with leastsq I don't have to. > > But, when compared with R results of the same function ( "optim" > function), > > the results are much different. > > Example below: > > > > # Inside python > > import numpy as np > > from scipy.optimize import leastsq, fmin > > > > def pp_min(params, x, y): > > alpha = params[0] > > Beta = params[1] > > gama = params[2] > > > > model = ( > (y-gama*(1-np.exp(-alpha*x/gama))*np.exp(-Beta*x/gama))**2) > > > > return model > > > > > > params = (0.4, 1.5 , 80) > > x = np.array([0, 1, 8, 16, 64, 164, 264, 364, 464, 564, 664, 764, 864, > 964, > > 1064, 1164]) > > y = np.array([0., 0.2436, 1.6128, 2.8224, 6.72, 15.1536, 22.176, 30.576, > > 31.1808, 40.2696, 47.4096, 41.7144, 61.6896, 56.6832, 62.5632, 63.5544]) > > > > pp = leastsq(pp_min, params, args=(x, y)) > > for leastsq the function should not be squared, taking square and sum > is part of the algorithm.
> drop the **2 in pp_min and it should work > > from the docstring: > > x = arg min(sum(func(y)**2,axis=0)) > y > > Josef > > > > > In [2]: pp > > Out[2]: (array([ 8.48990490e-02, -1.56537197e-02, 6.51505458e+01]), > 1) > > > > > > # inside R > > x <- c(0, 1, 8, 16, 64, 164, 264, 364, 464, 564, 664, 764, 864, 964, > 1064, > > 1164) > > > > y <- c(0.0, 0.2436, 1.6128, 2.8224, 6.72, 15.1536, 22.176, 30.576, > 31.1808, > > 40.2696, 47.4096, 41.7144, 61.6896, 56.6832, 62.5632, 63.5544) > > > > dat<-as.data.frame(cbind(x, y)) > > names(dat)<-c("x","y") > > attach(dat) > > > > pp_min<-function(params,data=dat) > > { alpha<-params[1] > > Beta<-params[2] > > gama<-params[3] > > return( sum( (y-gama*(1-exp(-alpha*x/gama))*exp(-Beta*x/gama))^2)) > > } > > > > pp <- optim(par=c(0.4, 1.5 , 80), fn=pp_min) > > > > Out: > >> pp > > $par > > [1] 0.09157204 0.02129695 148.89173924 > > > > The third argument is specially much different. > > Any reasonable explanation? > > > > Thank you. > > > > > > --- > > Arnaldo D'Amaral Pereira Granja Russo > > Lab. de Estudos dos Oceanos e Clima > > Instituto de Oceanografia - FURG > > > > > > > > > > 2012/7/23 servant mathieu > >> > >> Hi Denis, > >> Thanks for your response. For the fmin function in scipy, I took the > >> default ftol and stol values. I'm just trying to minize a chi square > between > >> observed experimental data and simulated data. I've done this in python > and > >> R with the Nelder-Mead algorithm, with exactly the same starting values. > >> While the solutions produced by R and python are not very different, R > >> systematicaly produces a lower chi-square after the same amount of > >> iterations. This may be related to ftol and stol, but I don't know which > >> value I should give to these parameters.... > >> Cheers, > >> Mat > >> > >> 2012/7/20 denis > >>> > >>> Hi Mathieu, > >>> (months later) two differences among implementations of Nelder-Mead: > >>> 1) the start simplex: x0 +- what ? 
It's common to take x0 + a fixed > >>> (user-specified) stepsize in each dimension. NLOpt takes a "walking > >>> simplex", don't know what R does > >>> > >>> 2) termination: what ftol, xtol did you specify ? NLOpt looks at > >>> fhi - flo: fhi changes at each iteration, flo is sticky. > >>> > >>> Could you post a testcase similar to yours ? > >>> That would sure be helpful. > >>> > >>> cheers > >>> -- denis > >>> > >>> > >>> On 24/05/2012 10:15, servant mathieu wrote: > >>> > Dear scipy users, > >>> > Again a question about optimization. > >>> > I've just compared the efficiency of the simplex routine in R > >>> > (optim) vs scipy (fmin), when minimizing a chi-square. fmin is faster > >>> > than optim, but appears to be less efficient. In R, the value of the > >>> > function is always minimized step by step (there are of course some > >>> > exceptions) while there is lot of fluctuations in python. Given that > >>> > the > >>> > underlying simplex algorithm is supposed to be the same, which > >>> > mechanism > >>> > is responsible for this difference? Is it possible to constrain fmin > so > >>> > it could be more rigorous? > >>> > Cheers, > >>> > Mathieu > >>> > >>> _______________________________________________ > >>> SciPy-User mailing list > >>> SciPy-User at scipy.org > >>> http://mail.scipy.org/mailman/listinfo/scipy-user > >> > >> > >> > >> _______________________________________________ > >> SciPy-User mailing list > >> SciPy-User at scipy.org > >> http://mail.scipy.org/mailman/listinfo/scipy-user > >> > > > > > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pav at iki.fi Thu May 9 17:31:20 2013 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 10 May 2013 00:31:20 +0300 Subject: [SciPy-User] efficiency of the simplex routine: R (optim) vs scipy.optimize.fmin In-Reply-To: References: Message-ID: 10.05.2013 00:16, Arnaldo Russo kirjoitti: > Hi Josef, > thank you for quick reply. > > Changing my pp_min, without square, I have the results: > > Out[4]: (array([ 9.82259647e-02, -1.58338152e-02, 5.03125198e+01]), 1) > > It's not much different from previously results. Any reason for this? The chi^2 for the solution found by leastsq is smaller than for that found by R. -- Pauli Virtanen From pav at iki.fi Thu May 9 17:37:32 2013 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 10 May 2013 00:37:32 +0300 Subject: [SciPy-User] efficiency of the simplex routine: R (optim) vs scipy.optimize.fmin In-Reply-To: References: Message-ID: 10.05.2013 00:31, Pauli Virtanen kirjoitti: > 10.05.2013 00:16, Arnaldo Russo kirjoitti: >> Hi Josef, >> thank you for quick reply. >> >> Changing my pp_min, without square, I have the results: >> >> Out[4]: (array([ 9.82259647e-02, -1.58338152e-02, 5.03125198e+01]), 1) >> >> It's not much different from previously results. Any reason for this? > > The chi^2 for the solution found by leastsq is smaller than for that > found by R. I.e., your function has several local minima, and the solver in R doesn't happen to pick the lowest one. This is a common occurrence with local minimization algorithms --- different algorithms may find different local minima. -- Pauli Virtanen From arnaldorusso at gmail.com Thu May 9 17:43:25 2013 From: arnaldorusso at gmail.com (Arnaldo Russo) Date: Thu, 9 May 2013 18:43:25 -0300 Subject: [SciPy-User] efficiency of the simplex routine: R (optim) vs scipy.optimize.fmin In-Reply-To: References: Message-ID: Hi Pauli, I didn't understand. I thought that the output was the solution of my best values of "alpha", "beta" and "gama".
But a much lower value of gama is a best choice? The R solver picked a double value while comparing with python results. I'm asking these things because I want to plot a fit curve with these parameters and I don't know how. Thank you --- *Arnaldo D'Amaral Pereira Granja Russo* Lab. de Estudos dos Oceanos e Clima Instituto de Oceanografia - FURG 2013/5/9 Pauli Virtanen > 10.05.2013 00:16, Arnaldo Russo kirjoitti: > > Hi Josef, > > thank you for quick reply. > > > > Changing my pp_min, without square, I have the results: > > > > Out[4]: (array([ 9.82259647e-02, -1.58338152e-02, 5.03125198e+01]), > 1) > > > > It's not much different from previously results. Any reason for this? > > The chi^2 for the solution found by leastsq is smaller than for that > found by R. > > -- > Pauli Virtanen > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From joonpyro at gmail.com Fri May 10 00:11:25 2013 From: joonpyro at gmail.com (Joon Ro) Date: Thu, 9 May 2013 23:11:25 -0500 Subject: [SciPy-User] efficiency of the simplex routine: R (optim) vs scipy.optimize.fmin In-Reply-To: References: Message-ID: Hi Arnaldo, On Thu, May 9, 2013 at 4:43 PM, Arnaldo Russo wrote: > > I didn't understand. > I thought that the output was the solution of my best values of "alpha", > "beta" and "gama". > But a much lower value of gama is a best choice? The R solver picked a > double value while comparing with python results. > I'm asking these things because I want to plot a fit curve with these > parameters and I don't know how. > > The solver tries to find the parameter (alpha, beta, gama) values which minimizes the sum squared value of the return of your function, `pp_min()`. 
When I try the Python and R solution: sum(pp_min(array([ 9.82259647e-02, -1.58338152e-02, 5.03125198e+01]), x, y), axis=0) Out[23]: 153.42871102569131 >>> pp_R = array([0.09157204, 0.02129695, 148.89173924]) sum(pp_min(pp_R, x, y), axis=0) Out[21]: 155.07002552970221 So it seems the Python solver found a better solution than the R one. You might want to fix the alpha, beta parameters and draw plots with different values of gama to see what is going on. Best, Joon -------------- next part -------------- An HTML attachment was scrubbed... URL: From johann.cohentanugi at gmail.com Fri May 10 04:18:23 2013 From: johann.cohentanugi at gmail.com (Johann Cohen-Tanugi) Date: Fri, 10 May 2013 10:18:23 +0200 Subject: [SciPy-User] efficiency of the simplex routine: R (optim) vs scipy.optimize.fmin In-Reply-To: References: Message-ID: <518CAD4F.2000200@gmail.com> Dear Arnaldo, I think what Pauli is trying to say is that the algorithm does not guarantee that the *global* minimum will be found, it proceeds with conjugate gradients or the like to find a local minimum starting from the initial parameter values, so you might very well converge to a local minimum, but miss the global one. Maybe your function is ill-behaved, it seems to be "scale invariant" : X=x/gamma Y=y/gamma turns your function into g(X,Y) that depends on gamma only through an overall factor, if I read your formula correctly. cheers, Johann On 05/09/2013 11:43 PM, Arnaldo Russo wrote: > Hi Pauli, > > I didn't understand. > I thought that the output was the solution of my best values of "alpha", > "beta" and "gama". > But a much lower value of gama is a best choice? The R solver picked a > double value while comparing with python results. > I'm asking these things because I want to plot a fit curve with these > parameters and I don't know how. > > Thank you > --- > *Arnaldo D'Amaral Pereira Granja Russo* > Lab.
de Estudos dos Oceanos e Clima > Instituto de Oceanografia - FURG > > > > > 2013/5/9 Pauli Virtanen > > > 10.05.2013 00:16, Arnaldo Russo kirjoitti: > > Hi Josef, > > thank you for quick reply. > > > > Changing my pp_min, without square, I have the results: > > > > Out[4]: (array([ 9.82259647e-02, -1.58338152e-02, > 5.03125198e+01]), 1) > > > > It's not much different from previously results. Any reason for this? > > The chi^2 for the solution found by leastsq is smaller than for that > found by R. > > -- > Pauli Virtanen > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > -- > This message has been scanned for viruses and > dangerous content by *MailScanner* , and is > believed to be clean. > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From lists at hilboll.de Fri May 10 05:58:45 2013 From: lists at hilboll.de (Andreas Hilboll) Date: Fri, 10 May 2013 11:58:45 +0200 Subject: [SciPy-User] Ideas for effective linear ND interpolation? Message-ID: <518CC4D5.9080400@hilboll.de> Hi, I'll have to code multilinear interpolation in n dimensions, n~7. My data space is quite large, ~10**9 points. The values are given on a rectangular (but not square) grid. The values are numbers in a range of approx. [0.0, 100.0]. The challenge is to do this efficiently, and it would be great if the whole thing would be able to run fast on a machine with only 8G (or better 4G) RAM. A common task will be to interpolate 10**6 points, which shouldn't take too long. Any ideas on how to do this efficiently are welcome: * which dtype to use? * is using pytables/blosc an option? How can this be integrated in the interpolation? * you name it ... ;) Cheers, Andreas.
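[Editorial note: later SciPy releases added scipy.interpolate.RegularGridInterpolator, which targets exactly this rectangular-grid case. Below is a minimal sketch, with toy grid sizes standing in for the real table; storing the table as float32 halves the memory relative to float64, so ~1e9 values fit in roughly 4 GB.]

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator  # available in later SciPy releases

# Toy stand-in for the 7-D table: 6 nodes per axis here (the real grid is
# much larger); axes need not be equally spaced, only rectangular.
ndim = 7
axes = tuple(np.linspace(0.0, 1.0, 6) for _ in range(ndim))
values = np.random.uniform(0.0, 100.0, size=(6,) * ndim).astype(np.float32)

interp = RegularGridInterpolator(axes, values, method="linear")

pts = np.random.rand(10**4, ndim)  # query points, shape (npoints, ndim)
out = interp(pts)                  # vectorized multilinear interpolation
print(out.shape)                   # (10000,)
```

A batch of 10**6 query points can be processed in chunks of this call, which keeps the per-call working memory small.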
From kmichael.aye at gmail.com Fri May 10 06:00:15 2013 From: kmichael.aye at gmail.com (K.-Michael Aye) Date: Fri, 10 May 2013 03:00:15 -0700 Subject: [SciPy-User] ANN: Spyder v2.2 References: Message-ID: On 2013-05-08 15:15:20 +0000, Pierre Raybaut said: > Hi all, > > On the behalf of Spyder's development team > (http://code.google.com/p/spyderlib/people/list), I'm pleased to > announce that Spyder v2.2 has been released and is available for > Windows XP/Vista/7/8, GNU/Linux and MacOS X: > http://code.google.com/p/spyderlib/. The download link for the Mac App is still in RC phase (2.2.0rc) is that on purpose? I am wondering because you announced the release on May 8th, while this RC has been uploaded on April 6th? Cheers, Michael > > This release represents 18 months of development since v2.1 and > introduces major enhancements and new features: > ?* Full support for IPython v0.13, including the ability to attach to > existing kernels > ?* New MacOS X application > ?* Much improved debugging experience > ?* Various editor improvements for code completion, zooming, auto > insertion, and syntax highlighting > ?* Better looking and faster Object Inspector > ?* Single instance mode > ?* Spanish tranlation of the interface > ?* And many other changes: http://code.google.com/p/spyderlib/wiki/ChangeLog > > This is the last release to support Python 2.5: > ?* Spyder 2.2 supports Python 2.5 to 2.7 > ?* Spyder 2.3 will support Python 2.7 and Python 3 > ?* (Spyder 2.1.14dev4 is a development release which already supports Python 3) > See also https://code.google.com/p/spyderlib/downloads/list. > > Spyder is a free, open-source (MIT license) interactive development > environment for the Python language with advanced editing, interactive > testing, debugging and introspection features. 
Originally designed to > provide MATLAB-like features (integrated help, interactive console, > variable explorer with GUI-based editors for dictionaries, NumPy > arrays, ...), it is strongly oriented towards scientific computing and > software development. Thanks to the `spyderlib` library, Spyder also > provides powerful ready-to-use widgets: embedded Python console > (example: http://packages.python.org/guiqwt/_images/sift3.png), NumPy > array editor (example: > http://packages.python.org/guiqwt/_images/sift2.png), dictionary > editor, source code editor, etc. > > Description of key features with tasty screenshots can be found at: > http://code.google.com/p/spyderlib/wiki/Features > > Don't forget to follow Spyder updates/news: > ? * on the project website: http://code.google.com/p/spyderlib/ > ? * and on our official blog: http://spyder-ide.blogspot.com/ > > Last, but not least, we welcome any contribution that helps making > Spyder an efficient scientific development/computing environment. Join > us to help creating your favourite environment! > (http://code.google.com/p/spyderlib/wiki/NoteForContributors) > > Enjoy! > -Pierre > > -- > You received this message because you are subscribed to the Google > Groups "spyder" group. > To unsubscribe from this group and stop receiving emails from it, send > an email to > spyderlib+unsubscribe at googlegroups.com. > To post to this group, send email to > spyderlib at googlegroups.com. > Visit this group at http://groups.google.com/group/spyderlib?hl=en. > For more options, visit https://groups.google.com/groups/opt_out. > ? > ? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From robert.kern at gmail.com Fri May 10 09:45:24 2013 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 10 May 2013 14:45:24 +0100 Subject: [SciPy-User] [SciPy-Dev] New scipy.org site In-Reply-To: References: Message-ID: On Fri, May 10, 2013 at 2:33 PM, Pauli Virtanen wrote: > Dear all, > > We are working on migrating the front of http://scipy.org away from the > wiki to a static site, which also should address the performance > problems the site has been encountering recently. Having disabled new logins and deleting all of the spammy users, the performance issues should now basically be resolved. ;-) The static site will have a better collaboration model through Github PRs, of course. How is the conversion process going? Are there people working on migrating the Topical Software and Cookbook pages, yet? -- Robert Kern From arnaldorusso at gmail.com Fri May 10 11:10:13 2013 From: arnaldorusso at gmail.com (Arnaldo Russo) Date: Fri, 10 May 2013 12:10:13 -0300 Subject: [SciPy-User] efficiency of the simplex routine: R (optim) vs scipy.optimize.fmin In-Reply-To: <518CAD4F.2000200@gmail.com> References: <518CAD4F.2000200@gmail.com> Message-ID: Thank you so much for explanations. Now I understood the minimum solution. My question is, if I put those minimized values as parameters of my pp_min function, I'll get values of adjusted curve? Arnaldo. --- *Arnaldo D'Amaral Pereira Granja Russo* Lab. de Estudos dos Oceanos e Clima Instituto de Oceanografia - FURG 2013/5/10 Johann Cohen-Tanugi > Dear Arnaldo, > I think what Pauli is trying to say is that the algorithm does not > guarantee that the *global* minimum will be found, it proceeds with > conjugate gradients or the like to find a local minimum starting from the > initial parameter, values, so you might very well converge to a local > minimum, but miss the global one. 
> > Maybe your function is ill-behaved, it seems to be "scale invariant" : > X=x/gamma Y=y/gamma turns your function into g(X,Y) that depends on gamma > only through an overall factor, if I read your formula correctly. > > cheers, > Johann > > > On 05/09/2013 11:43 PM, Arnaldo Russo wrote: > >> Hi Pauli, >> >> I didn't understand. >> I thought that the output was the solution of my best values of "alpha", >> "beta" and "gama". >> But a much lower value of gama is a best choice? The R solver picked a >> double value while comparing with python results. >> I'm asking these things because I want to plot a fit curve with these >> parameters and I don't know how. >> >> Thank you >> --- >> *Arnaldo D'Amaral Pereira Granja Russo* >> >> Lab. de Estudos dos Oceanos e Clima >> Instituto de Oceanografia - FURG >> >> >> >> >> 2013/5/9 Pauli Virtanen > >> >> >> 10.05.2013 00:16, Arnaldo Russo kirjoitti: >> > Hi Josef, >> > thank you for quick reply. >> > >> > Changing my pp_min, without square, I have the results: >> > >> > Out[4]: (array([ 9.82259647e-02, -1.58338152e-02, >> 5.03125198e+01]), 1) >> > >> > It's not much different from previously results. Any reason for >> this? >> >> The chi^2 for the solution found by leastsq is smaller than for that >> found by R. >> >> -- >> Pauli Virtanen >> >> ______________________________**_________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/**listinfo/scipy-user >> >> >> >> -- >> This message has been scanned for viruses and >> dangerous content by *MailScanner* **, and >> is >> believed to be clean. >> >> >> >> ______________________________**_________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/**listinfo/scipy-user >> >> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From matthieu.brucher at gmail.com Fri May 10 11:17:52 2013 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Fri, 10 May 2013 16:17:52 +0100 Subject: [SciPy-User] efficiency of the simplex routine: R (optim) vs scipy.optimize.fmin In-Reply-To: References: <518CAD4F.2000200@gmail.com> Message-ID: Hi, Yes, as Joon wrote, you can use the two sets as parameters for your function. Cheers, 2013/5/10 Arnaldo Russo > Thank you so much for explanations. > > Now I understood the minimum solution. > My question is, if I put those minimized values as parameters of my pp_min > function, > I'll get values of adjusted curve? > > Arnaldo. > > > > --- > *Arnaldo D'Amaral Pereira Granja Russo* > Lab. de Estudos dos Oceanos e Clima > Instituto de Oceanografia - FURG > > > > > 2013/5/10 Johann Cohen-Tanugi > >> Dear Arnaldo, >> I think what Pauli is trying to say is that the algorithm does not >> guarantee that the *global* minimum will be found, it proceeds with >> conjugate gradients or the like to find a local minimum starting from the >> initial parameter, values, so you might very well converge to a local >> minimum, but miss the global one. >> >> Maybe your function is ill-behaved, it seems to be "scale invariant" : >> X=x/gamma Y=y/gamma turns your function into g(X,Y) that depends on gamma >> only through an overall factor, if I read your formula correctly. >> >> cheers, >> Johann >> >> >> On 05/09/2013 11:43 PM, Arnaldo Russo wrote: >> >>> Hi Pauli, >>> >>> I didn't understand. >>> I thought that the output was the solution of my best values of "alpha", >>> "beta" and "gama". >>> But a much lower value of gama is a best choice? The R solver picked a >>> double value while comparing with python results. >>> I'm asking these things because I want to plot a fit curve with these >>> parameters and I don't know how. >>> >>> Thank you >>> --- >>> *Arnaldo D'Amaral Pereira Granja Russo* >>> >>> Lab. 
de Estudos dos Oceanos e Clima >>> Instituto de Oceanografia - FURG >>> >>> >>> >>> >>> 2013/5/9 Pauli Virtanen > >>> >>> >>> 10.05.2013 00:16, Arnaldo Russo kirjoitti: >>> > Hi Josef, >>> > thank you for quick reply. >>> > >>> > Changing my pp_min, without square, I have the results: >>> > >>> > Out[4]: (array([ 9.82259647e-02, -1.58338152e-02, >>> 5.03125198e+01]), 1) >>> > >>> > It's not much different from previously results. Any reason for >>> this? >>> >>> The chi^2 for the solution found by leastsq is smaller than for that >>> found by R. >>> >>> -- >>> Pauli Virtanen >>> >>> ______________________________**_________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/**listinfo/scipy-user >>> >>> >>> >>> -- >>> This message has been scanned for viruses and >>> dangerous content by *MailScanner* **, >>> and is >>> believed to be clean. >>> >>> >>> >>> ______________________________**_________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/**listinfo/scipy-user >>> >>> > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher Music band: http://liliejay.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From lanceboyle at qwest.net Fri May 10 22:37:20 2013 From: lanceboyle at qwest.net (Jerry) Date: Fri, 10 May 2013 19:37:20 -0700 Subject: [SciPy-User] Where are GUI apps for ports installed? In-Reply-To: References: <8C753D58-CF13-48F3-9566-6EC60D8548D5@qwest.net> Message-ID: <3449CD4A-B9F9-4C7A-A9B7-9D2856564037@qwest.net> Oops, my bad. Meant to sent this to MacPorts list. 
Jerry On May 9, 2013, at 2:29 AM, Robert Kern wrote: > On Thu, May 9, 2013 at 9:58 AM, Jerry wrote: >> Some ports install a double-clickable binary in /Applications/MacPorts. >> >> However, the only things installed in that location are on my system are AquaTerm, Pallet, some python 2.7 auxiliary stuff, and something related to Qt4 called qmlplugindump. >> >> I have the Spyder port installed and there is a binary for it but it does not install in this location but rather apparently in > > The MacPorts mailing lists may be found here: > > http://trac.macports.org/wiki/MailingLists > > The Spyder mailing list may be found here: > > https://groups.google.com/forum/?fromgroups#!forum/spyderlib > > -- > Robert Kern > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From lanceboyle at qwest.net Fri May 10 22:41:32 2013 From: lanceboyle at qwest.net (Jerry) Date: Fri, 10 May 2013 19:41:32 -0700 Subject: [SciPy-User] Is Macports version of Spyder 2.2.0 same as "official" release? In-Reply-To: References: Message-ID: <438A88A6-25B3-4C7E-88D4-41A3D7BE9F2C@qwest.net> Geez, sorry.... :-\ Jerry On May 9, 2013, at 1:59 AM, Jerry wrote: > Hi, > > I see that Spyder 2.2.0 (Scientific PYthon Development EnviRonment) is out. There is an official .dmg distribution available from the project's home page. According to the list of changes for 2.2.0 http://code.google.com/p/spyderlib/wiki/ChangeLog is this: > > "The App comes with its own interpreter, which has the main Python scientific libraries preinstalled: Numpy, SciPy, Matplotlib, IPython, Pandas, Sympy, Scikit-learn and Scikit-image." > > Does the MacPorts version of Spyder 2.2.0 include all of these add-ons? I hate duplicating things such as Python interpreters, Numpy, etc. all over the place if I can do it once with MacPorts so I would much prefer to use the MacPorts version if possible. 
> > Also, the MacPorts page lists both py-spyder 2.2.0 and py27-spyder > 2.2.0. Are these the same? > On Fri, May 10, 2013 at 7:41 PM, Jerry wrote: > Geez, sorry.... :-\ > Jerry > No need to apologize. This list doesn't really support Spyder. You might have better luck at Spyder's Google Code page. -p -------------- next part -------------- An HTML attachment was scrubbed...
URL: From sergio_r at mail.com Tue May 14 15:34:10 2013 From: sergio_r at mail.com (Sergio Rojas) Date: Tue, 14 May 2013 15:34:10 -0400 Subject: [SciPy-User] About wrong results from scipy statistical distributions Message-ID: <20130514193410.181550@gmx.com> I am wondering whether there are specific examples one could run to check what exactly is wrong with skew and kurtosis on scipy as mentioned in the "Remaining Issues" [1] section of the documentation [ http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html#remaining-issues ]. It is also mentioned there that there is a range of values on which the scipy distributions give wrong results. Is there any other document explaining further this (and the previous) issue? Thanks in advance, Sergio [1] Remaining Issues * skew and kurtosis, 3rd and 4th moments and entropy are not thoroughly tested and some coarse testing indicates that there are still some incorrect results left. * the distributions have been tested over some range of parameters, however in some corner ranges, a few incorrect results may remain. -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue May 14 16:13:40 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 14 May 2013 16:13:40 -0400 Subject: [SciPy-User] About wrong results from scipy statistical distributions In-Reply-To: <20130514193410.181550@gmx.com> References: <20130514193410.181550@gmx.com> Message-ID: On Tue, May 14, 2013 at 3:34 PM, Sergio Rojas wrote: > I am wondering whether there are specific examples one could run to check > what exactly is > wrong with skew and kurtosis on scipy as mentioned in the "remaining > Issuess" [1] section > of the documention > [ > http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html#remaining-issues > ]. > > It is also mentioned there that there is a range of values on which the > scipy distributions gives wrong results.
Is there any other document > explaining further this (and the previous) issue? > > Thanks in advance, > > Sergio > > [1] > > Remaining Issues > > skew and kurtosis, 3rd and 4th moments and entropy are not thoroughly tested > and some coarse testing indicates that there are still some incorrect > results left. I think I left some tests in the test suite that are not run. The problem is that these are coarse statistical tests that have several false failures. Also there are problems if the moments don't even exist. https://github.com/scipy/scipy/issues/1329#issuecomment-17022751 is the list when I looked at this the last time, IIRC. We should also be able to get the moments by numerical integration, but I haven't tried that yet. I think some of the entropies have possibly the wrong sign. Essentially, it requires going through the list and checking all the suspicious skew, kurtosis and entropy, to see whether they are a false alarm or a bug. Related issues: non-existing moments are not correctly specified. Searching the issues for skew and kurtosis finds these two: https://github.com/scipy/scipy/issues/2401 https://github.com/scipy/scipy/issues/1866 > > the distributions have been tested over some range of parameters, however in > some corner ranges, a few incorrect results may remain. This is a generic warning. Some cases are known and have tickets, like truncnorm if you work only far out in the right tail. There are some functions where the pdf is singular (-> inf) at the boundary (or maybe also interior), and the results can get strange when we get close enough. (Here is a case that Warren fixed https://github.com/scipy/scipy/pull/106 ) Some distributions degenerate to a single point at the limit of the parameter space, and I don't know how close to the limits they start to get "weird", i.e. numerical problems could dominate the result. There can also be numerical precision problems in the scipy.special functions (but Pauli has been improving those a lot).
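[Editorial note: the numerical-integration cross-check mentioned above can be sketched for one distribution with known support. The spot-check below is hypothetical (not part of scipy's test suite); gamma with shape a=3 has support (0, inf) and closed-form skew 2/sqrt(a).]

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

dist = stats.gamma(3.0)  # shape a = 3

# Central moments E[(X - mu)^n] by numerical integration of the pdf.
mu = quad(lambda x: x * dist.pdf(x), 0, np.inf)[0]
var = quad(lambda x: (x - mu) ** 2 * dist.pdf(x), 0, np.inf)[0]
m3 = quad(lambda x: (x - mu) ** 3 * dist.pdf(x), 0, np.inf)[0]
skew_quad = m3 / var ** 1.5

skew_closed = float(dist.stats(moments="s"))  # scipy's closed-form value
print(skew_quad, skew_closed)  # both should be close to 2/sqrt(3) ~ 1.1547
```

Running the same comparison over a grid of shape parameters would flag the suspicious cases automatically, up to the accuracy of quad on heavy-tailed integrands.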
Contributions here would be very helpful. Also if a case is identified as a false alarm, it would be good to know so it can be taken off the "suspicious" list. Josef > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From alok.jadhav at credit-suisse.com Wed May 15 02:06:04 2013 From: alok.jadhav at credit-suisse.com (Jadhav, Alok) Date: Wed, 15 May 2013 14:06:04 +0800 Subject: [SciPy-User] 64 bit weave issues Message-ID: SciPy gurus, I am facing a strange problem using weave on a 64 bit machine. Specifically with weave's inline function. It has something to do with weave's catalog. Similar issues I found in the past (very old) http://mail.scipy.org/pipermail/scipy-dev/2006-June/005908.html http://mail.scipy.org/pipermail/scipy-dev/2005-June/003042.html Common things I have in my observation are: - Already working setup in 32 bit doesn't work in same manner in 64 bit env - Weave recompiles inline code which is already compiled and doesn't require recompilation. This doesn't happen every time but at random. Whenever weave recompiles I see a notification "repairing catalog by removing key" in the output which ends up in the error message "ImportError: DLL load failed: Invalid access to memory location". If I don't see the repair catalog message, then it doesn't fail. It compiles correctly and generates the output. - Sometimes gcc gets into an infinite loop printing error message "Looking for python27.dll". Even though the dll is on the path. This process doesn't end. Had to kill it forcefully. G++ process became ghost even after killing python process. Could someone advise what I am missing here? Is there any specific setup that I need to do? Is there an issue with python 27 - 64 bit weave implementation? Is there some issue in using Scipy for 64bit in production? Is it not stable yet?
Regards, Alok I have a simple script to calculate moving average using weave's inline function. source mvg.py import numpy as np import scipy.weave as weave import distutils.sysconfig import distutils.dir_util import os distutils.sysconfig._config_vars["LDSHARED"]="-LC:\strawberry\c\x86_64-w64-mingw32\lib" def ExpMovAvg(data,time,lag): if (data.size!=time.size): print "error in EMA, data and time have different size" return None result=np.repeat(0.0,data.size) code=""" #line 66 "basics.py" result(0)=data(0); for (int i=0;i1) { alpha=10; } result(i+1)=(1-alpha)*data(i)+alpha*result(i); } """ weave.inline(code,["data","time","lag","result"],type_converters=weave.converters.blitz,headers=[""],compiler="gcc",verbose=2) return result source test.py import string import numpy as np import mvg print(mvg.ExpMovAvg(np.array(range(10)),np.array(range(10)),2)) Output: Working output: Y:\STMM\alpha\klse\PROD>c:\python27\python.exe s:\common\tools\python\python-2.7-64bit\test.py [ 0. 0. 0.63212774 1.49679774 2.44701359 3.42869938 4.42196209 5.41948363 6.41857187 7.41823646] Now if I keep running the script multiple times, sometimes I see correct output... but suddenly sometimes I get below error.
Y:\STMM\alpha\klse\PROD>c:\python27\python.exe s:\common\tools\python\python-2.7-64bit\test.py repairing catalog by removing key Looking for python27.dll running build_ext running build_src build_src building extension "sc_44f3fe3c65d5c3feecb45d9269ac207f5" sources build_src: building npy-pkg config files Looking for python27.dll customize Mingw32CCompiler customize Mingw32CCompiler using build_ext Looking for python27.dll customize Mingw32CCompiler customize Mingw32CCompiler using build_ext building 'sc_44f3fe3c65d5c3feecb45d9269ac207f5' extension compiling C++ sources C compiler: g++ -g -DDEBUG -DMS_WIN64 -O0 -Wall compile options: '-Ic:\python27\lib\site-packages\scipy\weave -Ic:\python27\lib\site-packages\scipy\weave\scxx -Ic:\python27\lib\site-packages\scipy\weave\blitz -Ic:\python27\lib\site-packages\numpy\core\include -Ic:\python27\include -Ic:\python27\PC -c' g++ -g -DDEBUG -DMS_WIN64 -O0 -Wall -Ic:\python27\lib\site-packages\scipy\weave -Ic:\python27\lib\site-packages\scipy\weave\scxx -Ic:\python27\lib\site-packages\scipy\weave\blitz -Ic:\python27\lib\site-packages\numpy\core\include -Ic:\python27\include -Ic:\python27\PC -c c:\users\ajadhav2\appdata\local\temp\ajadhav2\python27_compiled\sc_44f3f e3c65d5c3feecb45d9269ac207f5.cpp -o c:\users\ajadhav2\appdata\local\temp\ajadhav2\python27_intermediate\comp iler_2d3e1e2e4de6a91419d2376b162e5342\Release\users\ajadhav2\appdata\loc al\temp\ajadhav2\python27_compiled\sc_44f3fe3c65d5c3feecb45d9269ac207f5. 
o Found executable C:\strawberry\c\bin\g++.exe g++ -g -DDEBUG -DMS_WIN64 -O0 -Wall-Ic:\python27\lib\site-packages\scipy\weave-Ic:\python27\lib\site-p ackages\scipy\weave\scxx -Ic:\python27\lib\site-packages\scipy\weave\blitz -Ic:\python27\lib\site-packages\numpy\core\include-Ic:\python27\include -Ic:\python27\PC -c c:\python27\lib\site-packages\scipy\weave\scxx\weave_imp.cpp -o c:\users\ajadhav2\appdata\local\temp\ajadhav2\python27_intermediate\comp iler_2d3e1e2e4de6a91419d2376b162e5342\Release\python27\lib\site-packages \scipy\weave\scxx\weave_imp.o g++ -g -shared c:\users\ajadhav2\appdata\local\temp\ajadhav2\python27_intermediate\comp iler_2d3e1e2e4de6a91419d2376b162e5342\Release\users\ajadhav2\appdata\loc al\temp\ajadhav2\python27_compiled\sc_44f3fe3c65d5c3feecb45d9269ac207f5. o c:\users\ajadhav2\appdata\local\temp\ajadhav2\python27_intermediate\comp iler_2d3e1e2e4de6a91419d2376b162e5342\Release\python27\lib\site-packages \scipy\weave\scxx\weave_imp.o -Lc:\python27\libs -Lc:\python27\PCbuild\amd64 -lpython27 -lmsvcr90 -o c:\users\ajadhav2\appdata\local\temp\ajadhav2\python27_compiled\sc_44f3f e3c65d5c3feecb45d9269ac207f5.pyd running scons Traceback (most recent call last): File "s:\common\tools\python\python-2.7-64bit\test.py", line 5, in print(mvg.ExpMovAvg(np.array(range(10)),np.array(range(10)),2)) File "s:\common\tools\python\python-2.7-64bit\mvg.py", line 30, in ExpMovAvgweave.inline(code,["data","time","lag","result"], type_converters=weave.converters.blitz,headers=[""],compiler="gc c",verbose=2) File "c:\python27\lib\site-packages\scipy\weave\inline_tools.py", line355, in inline **kw) File "c:\python27\lib\site-packages\scipy\weave\inline_tools.py", line 488, in compile_function exec 'import ' + module_name File "", line 1, in ImportError: DLL load failed: Invalid access to memory location. 
Y:\STMM\alpha\klse\PROD> =============================================================================== Please access the attached hyperlink for an important electronic communications disclaimer: http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html =============================================================================== -------------- next part -------------- An HTML attachment was scrubbed... URL: From dan.stowell at eecs.qmul.ac.uk Wed May 15 04:35:59 2013 From: dan.stowell at eecs.qmul.ac.uk (Dan Stowell) Date: Wed, 15 May 2013 09:35:59 +0100 Subject: [SciPy-User] Matching Pursuit Toolkit (MPTK) 0.7 released Message-ID: <519348EF.4080704@eecs.qmul.ac.uk> Dear all, Matching Pursuit Toolkit (MPTK) is a fast and efficient library for the sparse decomposition of multichannel audio signals. Version 0.7 is now officially released: https://gforge.inria.fr/frs/?group_id=36 New in this version is a Python wrapper, so you can decompose/reconstruct signals directly from within Python. Best Dan -- Dan Stowell Postdoctoral Research Assistant Centre for Digital Music Queen Mary, University of London Mile End Road, London E1 4NS http://www.elec.qmul.ac.uk/digitalmusic/people/dans.htm http://www.mcld.co.uk/ From bricklemacho at gmail.com Wed May 15 22:19:35 2013 From: bricklemacho at gmail.com (Brickle Macho) Date: Thu, 16 May 2013 10:19:35 +0800 Subject: [SciPy-User] Matlab imfilter/fspecial equivalent Message-ID: <51944237.50209@gmail.com> I am porting some Matlab code to python. When I run the ported version the output differs. It appears the difference may be due to how I have translated Matlab's imfilter() and/or fspecial(). Below are my translation, are there any Scipy/Matlab gurus that can let me know if there are correct or if I have made some errors, or could suggest better translations. Matlab: imfilter(saliencyMap, fspecial('gaussian', [10, 10], 2.5)) Scipy: ndimage.gaussian_filter(saliencyMap, sigma=2.5) Also the following. 
At the risk of highlighting my lack of understanding of the terminology, is uniform_filter the same as 'average'?

Matlab: imfilter(myLogAmplitude, fspecial('average', 3), 'replicate');
Scipy: scipy.ndimage.uniform_filter(mylogAmplitude, size=3, mode="nearest")

Thanks,

Michael.
--

If curious, here are the 8 lines of Matlab code I am trying to port:

%% Read image from file
inImg = im2double(rgb2gray(imread('yourImage.jpg')));
inImg = imresize(inImg, 64/size(inImg, 2));

%% Spectral Residual
myFFT = fft2(inImg);
myLogAmplitude = log(abs(myFFT));
myPhase = angle(myFFT);
mySpectralResidual = myLogAmplitude - imfilter(myLogAmplitude, fspecial('average', 3), 'replicate');
saliencyMap = abs(ifft2(exp(mySpectralResidual + i*myPhase))).^2;

%% After Effect
saliencyMap = mat2gray(imfilter(saliencyMap, fspecial('gaussian', [10, 10], 2.5)));
imshow(saliencyMap);
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From travis at continuum.io Thu May 16 01:11:26 2013
From: travis at continuum.io (Travis Oliphant)
Date: Thu, 16 May 2013 00:11:26 -0500
Subject: [SciPy-User] Matlab imfilter/fspecial equivalent
In-Reply-To: <51944237.50209@gmail.com>
References: <51944237.50209@gmail.com>
Message-ID: 
Hi Michael,

Thanks for sharing your code and example. The imfilter command is equivalent to scipy.signal.correlate and scipy.ndimage.correlate (the one in scipy.ndimage is faster, I believe). These implement simple correlation-based filtering given a finite kernel.

To get the same output you would need to generate the same kind of kernel in Python as the Matlab fspecial command is producing. One approach to get you started, and to help separate the problem into two stages (reproducing the imfilter and reproducing the fspecial filter), is to export the result of the Matlab fspecial command and use that kernel in Python code (save it as a .mat file and read that .mat file with Python).
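The export/import round trip suggested here is easy to sketch. Note the hedges: no kernel.mat exists in this thread, so the snippet fabricates one with scipy.io.savemat purely to stay self-contained (in real use that line goes away and the file comes from MATLAB, e.g. `h = fspecial('gaussian', [10 10], 2.5); save('kernel.mat', 'h')`), and the variable name `h` is an assumption.

```python
import numpy as np
from scipy import io, ndimage

# Stand-in for the file MATLAB's save() would write; replace this
# savemat() call with a real export from MATLAB in practice.
io.savemat('kernel.mat', {'h': np.ones((10, 10)) / 100.0})

h = io.loadmat('kernel.mat')['h']   # the exported kernel, shape (10, 10)
img = np.random.rand(32, 32)

# imfilter(img, h) pads with zeros by default, which corresponds to
# mode='constant' here; imfilter's 'replicate' corresponds to 'nearest'.
out = ndimage.correlate(img, h, mode='constant')
print(out.shape)   # (32, 32)
```

Comparing `out` against the array MATLAB's imfilter produces for the same image then isolates any remaining discrepancy to the kernel itself.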
SciPy does not have the equivalent to the fspecial command but you can generate all kinds of 1-d special filters with scipy.signal.get_window. You can also "generate" the filters you need directly from code. Here is some untested code for generating something close to what fspecial('gaussian", [10,10], 2.5) would be producing import numpy as np def fgaussian(size, sigma): m,n = size h, k = m//2, n//2 x, y = np.mgrid[-h:h, -k:k] On Wed, May 15, 2013 at 9:19 PM, Brickle Macho wrote: > I am porting some Matlab code to python. When I run the ported version > the output differs. It appears the difference may be due to how I have > translated Matlab's imfilter() and/or fspecial(). Below are my > translation, are there any Scipy/Matlab gurus that can let me know if there > are correct or if I have made some errors, or could suggest better > translations. > > Matlab: imfilter(saliencyMap, fspecial('gaussian', [10, 10], 2.5)) > Scipy: ndimage.gaussian_filter(saliencyMap, sigma=2.5) > > Also the following. At the rsik of highlighting my lack of understanding > of the temrinology, Is uniform_filter the same as 'average'? > > Matlab: imfilter(myLogAmplitude, fspecial('average', 3), 'replicate'); > Scipy: scipy.ndimage.uniform_filter(mylogAmplitude, size=3, > mode="nearest") > > > Thanks, > > Michael. 
> --
>
> If curious here are the 8 lines of matlab code I am trying to port:
>
> %% Read image from file
> inImg = im2double(rgb2gray(imread('yourImage.jpg')));
> inImg = imresize(inImg, 64/size(inImg, 2));
>
> %% Spectral Residual
> myFFT = fft2(inImg);
> myLogAmplitude = log(abs(myFFT));
> myPhase = angle(myFFT);
> mySpectralResidual = myLogAmplitude - imfilter(myLogAmplitude, fspecial('average', 3), 'replicate');
> saliencyMap = abs(ifft2(exp(mySpectralResidual + i*myPhase))).^2;
>
> %% After Effect
> saliencyMap = mat2gray(imfilter(saliencyMap, fspecial('gaussian', [10, 10], 2.5)));
> imshow(saliencyMap);
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
--
---
Travis Oliphant
Continuum Analytics, Inc.
http://www.continuum.io
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From travis at continuum.io Thu May 16 01:25:56 2013
From: travis at continuum.io (Travis Oliphant)
Date: Thu, 16 May 2013 00:25:56 -0500
Subject: [SciPy-User] Matlab imfilter/fspecial equivalent
In-Reply-To: <51944237.50209@gmail.com>
References: <51944237.50209@gmail.com>
Message-ID: 
I accidentally hit send before I finished the original email (Gmail interface quirk strikes again...). I'll include the entire original email.

Hi Michael,

Thanks for sharing your code and example. The imfilter command is equivalent to scipy.signal.correlate and scipy.ndimage.correlate (the one in scipy.ndimage is faster, I believe). These implement simple correlation-based filtering given a finite kernel.

To get the same output you would need to generate the same kind of kernel in Python as the Matlab fspecial command is producing.
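Generating the kernel directly in Python is also only a few lines. A hedged sketch (the helper name fspecial_gaussian is ours, not a SciPy function): it normalises the kernel to unit sum and centres even-sized grids on half-sample offsets, both of which MATLAB's fspecial('gaussian') is understood to do, but it should be checked against a real fspecial dump before being relied on.

```python
import numpy as np

def fspecial_gaussian(shape=(10, 10), sigma=2.5):
    """Approximate MATLAB's fspecial('gaussian', shape, sigma).

    Builds a grid centred on the kernel (half-sample offsets for even
    sizes) and normalises the result to sum to 1.
    """
    m, n = shape
    y = np.arange(m) - (m - 1) / 2.0
    x = np.arange(n) - (n - 1) / 2.0
    Y, X = np.meshgrid(y, x, indexing='ij')
    h = np.exp(-(X**2 + Y**2) / (2.0 * sigma**2))
    return h / h.sum()

k = fspecial_gaussian((10, 10), 2.5)
print(k.shape)   # (10, 10); k.sum() is 1 up to rounding
```

Because the kernel is symmetric, applying it with correlate or convolve gives the same result, which removes one more source of Matlab-vs-SciPy discrepancy.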
One approach to get you started, and to help separate the problem into two stages (reproducing the imfilter and reproducing the fspecial filter), is to export the result of the Matlab fspecial command and use that kernel in Python code (save it as a .mat file and read that .mat file with Python).

SciPy does not have an equivalent to the fspecial command, but you can generate all kinds of 1-d special filters with scipy.signal.get_window. You can also "generate" the filters you need directly from code.

Here is some untested code for generating something close to what fspecial('gaussian', [10,10], 2.5) would be producing:

import numpy as np

def fgaussian(size, sigma):
    m,n = size
    h, k = m//2, n//2
    x, y = np.mgrid[-h:h, -k:k]
    return np.exp(-(x**2 + y**2)/(2*sigma**2))

Up to a scaling constant (and possibly shifted a bit) this should return a similar filter to fspecial('gaussian', size, sigma).

I'm not completely sure if ndimage.gaussian_filter does a finite-window convolution like imfilter (and correlate) or if it does something else.

The uniform filter is the same as an averaging filter (up to a scaling constant). The other approach is to just use scipy.ndimage.correlate with np.ones(size) / np.product(size), where size is the size of the kernel.

Perhaps you would be willing to post your code when you get it to work.

Best,

-Travis

On Wed, May 15, 2013 at 9:19 PM, Brickle Macho wrote:
> I am porting some Matlab code to python. When I run the ported version
> the output differs. It appears the difference may be due to how I have
> translated Matlab's imfilter() and/or fspecial(). Below are my
> translations; are there any Scipy/Matlab gurus who can let me know if they
> are correct, if I have made some errors, or could suggest better
> translations?
>
> Matlab: imfilter(saliencyMap, fspecial('gaussian', [10, 10], 2.5))
> Scipy: ndimage.gaussian_filter(saliencyMap, sigma=2.5)
>
> Also the following.
> At the risk of highlighting my lack of understanding of the terminology,
> is uniform_filter the same as 'average'?
>
> Matlab: imfilter(myLogAmplitude, fspecial('average', 3), 'replicate');
> Scipy: scipy.ndimage.uniform_filter(mylogAmplitude, size=3, mode="nearest")
>
> Thanks,
>
> Michael.
> --
>
> If curious here are the 8 lines of matlab code I am trying to port:
>
> %% Read image from file
> inImg = im2double(rgb2gray(imread('yourImage.jpg')));
> inImg = imresize(inImg, 64/size(inImg, 2));
>
> %% Spectral Residual
> myFFT = fft2(inImg);
> myLogAmplitude = log(abs(myFFT));
> myPhase = angle(myFFT);
> mySpectralResidual = myLogAmplitude - imfilter(myLogAmplitude, fspecial('average', 3), 'replicate');
> saliencyMap = abs(ifft2(exp(mySpectralResidual + i*myPhase))).^2;
>
> %% After Effect
> saliencyMap = mat2gray(imfilter(saliencyMap, fspecial('gaussian', [10, 10], 2.5)));
> imshow(saliencyMap);
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
--
---
Travis Oliphant
Continuum Analytics, Inc.
http://www.continuum.io
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From bricklemacho at gmail.com Thu May 16 02:12:09 2013
From: bricklemacho at gmail.com (Brickle Macho)
Date: Thu, 16 May 2013 14:12:09 +0800
Subject: [SciPy-User] Matlab imfilter/fspecial equivalent
In-Reply-To: 
References: <51944237.50209@gmail.com>
Message-ID: <519478B9.9050201@gmail.com>
Hi Travis,

Thanks for the response; it has given me a better understanding of the functions. Once it is working, I will post the new version. The code comes from [1], but [2] questions the use/need of the log-amplitude and experimentally shows that just using the phase produces the same result. This new approach reduces the computation by about 1/3.
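For reference, the phase-only variant described above fits in a few lines of NumPy/SciPy. This is a sketch rather than the papers' code: a grayscale float image is assumed, and the sigma=2.5 smoothing mirrors the Gaussian "after effect" step of the original MATLAB listing.

```python
import numpy as np
from scipy import ndimage

def phase_saliency(img, sigma=2.5):
    """Phase-spectrum saliency: keep only the phase of the FFT,
    invert, square the magnitude, then smooth."""
    f = np.fft.fft2(img)
    phase = np.angle(f)
    sal = np.abs(np.fft.ifft2(np.exp(1j * phase))) ** 2
    return ndimage.gaussian_filter(sal, sigma=sigma)

img = np.random.rand(64, 64)   # stand-in for the resized grayscale image
sal = phase_saliency(img)
print(sal.shape)   # (64, 64)
```

Since this drops the log-amplitude and the 3x3 averaging entirely, it is also a convenient test case for pinning down where the Matlab and SciPy pipelines diverge.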
I have been able to confirm that using just the phase component produces the same result within either environment, Matlab or Scipy. Obviously I still have the problem that the output from Matlab differs from Scipy. I also discovered that if I don't resize the image (it is currently being reduced to approx 64x64), the output from Matlab is almost the same as Python's. This suggests to me that, since the kernels are kept the same while the image size changes, the differences in output are due to the kernel being used (I think). So for now I will focus my effort on implementing fspecial('gaussian', [10,10], 2.5), using the one provided below as a starting point.

Thanks again for the pointers.

Michael.
--

[1] Xiaodi Hou and Liqing Zhang, Saliency Detection: A Spectral Residual Approach, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2007.
[2] Chenlei Guo, Qi Ma and Liming Zhang, Spatio-temporal Saliency Detection Using Phase Spectrum of Quaternion Fourier Transform, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008.

On 16/05/13 1:25 PM, Travis Oliphant wrote:
> I accidentally hit send before I finished the original email (Gmail
> interface quirk strikes again...). I'll include the entire original
> email.
>
> Hi Michael,
>
> Thanks for sharing your code and example. The imfilter command is
> equivalent to scipy.signal.correlate and scipy.ndimage.correlate (the
> one in scipy.ndimage is faster I believe). These implement simple
> correlation-based filtering given a finite kernel.
>
> To get the same output you would need to generate the same kind of
> kernel in Python as the Matlab fspecial command is producing.
one > approach to get you started and to help separate the problem into 2 > stages (reproducing the imfilter and reproducing the fspecial filter) > is to export the results of the Matlab fspecial command and use that > kernel in Python code (save it as a .mat file and read that .mat file > with Python). > > SciPy does not have the equivalent to the fspecial command but you can > generate all kinds of 1-d special filters with > scipy.signal.get_window. You can also "generate" the filters you > need directly from code. > > Here is some untested code for generating something close to what > fspecial('gaussian", [10,10], 2.5) would be producing > > import numpy as np > > def fgaussian(size, sigma): > m,n = size > h, k = m//2, n//2 > x, y = np.mgrid[-h:h, -k:k] > return np.exp(-(x**2 + y**2)/(2*sigma**2)) > > Up to a scaling constant (and possibly shifted a bit) this should > return a similar filter to fspecial('gaussian', size, sigma). > > I'm not completely sure if the ndimage.gaussian_filter does a > finite-window convolution like imfilter (and correlate) or if it does > something else. > > The uniform filter is the same as an averaging filter (up to a scaling > constant). To other approach is to just use scipy.ndimage.correlate > with np.ones(size) / np.product(size) where size is the size of the > kernel. > > Perhaps you would be willing to post your code when you get it to work. > > Best, > > -Travis > > > > > On Wed, May 15, 2013 at 9:19 PM, Brickle Macho > wrote: > > I am porting some Matlab code to python. When I run the ported > version the output differs. It appears the difference may be due > to how I have translated Matlab's imfilter() and/or fspecial(). > Below are my translation, are there any Scipy/Matlab gurus that > can let me know if there are correct or if I have made some > errors, or could suggest better translations. 
> > Matlab: imfilter(saliencyMap, fspecial('gaussian', [10, 10], 2.5)) > Scipy: ndimage.gaussian_filter(saliencyMap, sigma=2.5) > > Also the following. At the rsik of highlighting my lack of > understanding of the temrinology, Is uniform_filter the same as > 'average'? > > Matlab: imfilter(myLogAmplitude, fspecial('average', 3), > 'replicate'); > Scipy: scipy.ndimage.uniform_filter(mylogAmplitude, size=3, > mode="nearest") > > > Thanks, > > Michael. > -- > > > If curious here are the 8 lines of matlab code I am trying to port: > > %% Read image from file > inImg = im2double(rgb2gray(imread('yourImage.jpg'))); > inImg = imresize(inImg, 64/size(inImg, 2)); > > %% Spectral Residual > myFFT = fft2(inImg); > myLogAmplitude = log(abs(myFFT)); > myPhase = angle(myFFT); > mySpectralResidual = myLogAmplitude - imfilter(myLogAmplitude, > fspecial('average', 3), 'replicate'); > saliencyMap = abs(ifft2(exp(mySpectralResidual + i*myPhase))).^2; > > %% After Effect > saliencyMap = mat2gray(imfilter(saliencyMap, fspecial('gaussian', > [10, 10], 2.5))); > imshow(saliencyMap); > > > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > -- > --- > Travis Oliphant > Continuum Analytics, Inc. > http://www.continuum.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From bricklemacho at gmail.com Thu May 16 20:23:54 2013 From: bricklemacho at gmail.com (Brickle Macho) Date: Fri, 17 May 2013 08:23:54 +0800 Subject: [SciPy-User] Matlab imfilter/fspecial equivalent In-Reply-To: References: <51944237.50209@gmail.com> Message-ID: <5195789A.8060302@gmail.com> I am getting closer. Here are some observations: * If I don't imresize() the original image, then visually I can't tell the difference in the output image from Matlab or Scipy. * Empirically the results using different kernels are the same. 
That is, the difference between the two images is essentially zero. It doesn't seem to matter if I use the imported "fspecial()" kernel from Matlab, generate my own kernel using fgaussian(), or use scipy.ndimage.gaussian_filter(). So it looks like the problem is with ndimage.resize().

Michael.
--

On 16/05/13 1:25 PM, Travis Oliphant wrote:
> I accidentally hit send before I finished the original email (Gmail
> interface quirk strikes again...). I'll include the entire original
> email.
>
> Hi Michael,
>
> Thanks for sharing your code and example. The imfilter command is
> equivalent to scipy.signal.correlate and scipy.ndimage.correlate (the
> one in scipy.ndimage is faster I believe). These implement simple
> correlation-based filtering given a finite kernel.
>
> To get the same output you would need to generate the same kind of
> kernel in Python as the Matlab fspecial command is producing. One
> approach to get you started and to help separate the problem into 2
> stages (reproducing the imfilter and reproducing the fspecial filter)
> is to export the results of the Matlab fspecial command and use that
> kernel in Python code (save it as a .mat file and read that .mat file
> with Python).
>
> SciPy does not have the equivalent to the fspecial command but you can
> generate all kinds of 1-d special filters with
> scipy.signal.get_window. You can also "generate" the filters you
> need directly from code.
>
> Here is some untested code for generating something close to what
> fspecial('gaussian', [10,10], 2.5) would be producing:
>
> import numpy as np
>
> def fgaussian(size, sigma):
>     m,n = size
>     h, k = m//2, n//2
>     x, y = np.mgrid[-h:h, -k:k]
>     return np.exp(-(x**2 + y**2)/(2*sigma**2))
>
> Up to a scaling constant (and possibly shifted a bit) this should
> return a similar filter to fspecial('gaussian', size, sigma).
> > I'm not completely sure if the ndimage.gaussian_filter does a > finite-window convolution like imfilter (and correlate) or if it does > something else. > > The uniform filter is the same as an averaging filter (up to a scaling > constant). To other approach is to just use scipy.ndimage.correlate > with np.ones(size) / np.product(size) where size is the size of the > kernel. > > Perhaps you would be willing to post your code when you get it to work. > > Best, > > -Travis > > > > > On Wed, May 15, 2013 at 9:19 PM, Brickle Macho > wrote: > > I am porting some Matlab code to python. When I run the ported > version the output differs. It appears the difference may be due > to how I have translated Matlab's imfilter() and/or fspecial(). > Below are my translation, are there any Scipy/Matlab gurus that > can let me know if there are correct or if I have made some > errors, or could suggest better translations. > > Matlab: imfilter(saliencyMap, fspecial('gaussian', [10, 10], 2.5)) > Scipy: ndimage.gaussian_filter(saliencyMap, sigma=2.5) > > Also the following. At the rsik of highlighting my lack of > understanding of the temrinology, Is uniform_filter the same as > 'average'? > > Matlab: imfilter(myLogAmplitude, fspecial('average', 3), > 'replicate'); > Scipy: scipy.ndimage.uniform_filter(mylogAmplitude, size=3, > mode="nearest") > > > Thanks, > > Michael. 
> -- > > > If curious here are the 8 lines of matlab code I am trying to port: > > %% Read image from file > inImg = im2double(rgb2gray(imread('yourImage.jpg'))); > inImg = imresize(inImg, 64/size(inImg, 2)); > > %% Spectral Residual > myFFT = fft2(inImg); > myLogAmplitude = log(abs(myFFT)); > myPhase = angle(myFFT); > mySpectralResidual = myLogAmplitude - imfilter(myLogAmplitude, > fspecial('average', 3), 'replicate'); > saliencyMap = abs(ifft2(exp(mySpectralResidual + i*myPhase))).^2; > > %% After Effect > saliencyMap = mat2gray(imfilter(saliencyMap, fspecial('gaussian', > [10, 10], 2.5))); > imshow(saliencyMap); > > > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > -- > --- > Travis Oliphant > Continuum Analytics, Inc. > http://www.continuum.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed.ludlow at gmail.com Tue May 21 09:54:00 2013 From: jed.ludlow at gmail.com (Jed Ludlow) Date: Tue, 21 May 2013 07:54:00 -0600 Subject: [SciPy-User] Is Macports version of Spyder 2.2.0 same as "official" release? In-Reply-To: References: Message-ID: On Thu, May 9, 2013 at 2:59 AM, Jerry wrote: > Hi, > > I see that Spyder 2.2.0 (Scientific PYthon Development EnviRonment) is > out. There is an official .dmg distribution available from the project's > home page. According to the list of changes for 2.2.0 > http://code.google.com/p/spyderlib/wiki/ChangeLog is this: > > "The App comes with its own interpreter, which has the main Python > scientific libraries preinstalled: Numpy, SciPy, Matplotlib, IPython, > Pandas, Sympy, Scikit-learn and Scikit-image." > > Does the MacPorts version of Spyder 2.2.0 include all of these add-ons? I > hate duplicating things such as Python interpreters, Numpy, etc. all over > the place if I can do it once with MacPorts so I would much prefer to use > the MacPorts version if possible. 
> > Also, the MacPorts page lists both py-spyder 2.2.0 and py27-spyder 2.2.0. > Are these the same? > > Thanks, > Jerry > Jerry, This as actually really valuable feedback for us on the Spyder team. I've posted your question to the Spyder list here if you'd like to follow it: https://groups.google.com/d/msg/spyderlib/jjRHQSi6qFE/qDrRUtjggeoJ Jed -------------- next part -------------- An HTML attachment was scrubbed... URL: From glciampagl at gmail.com Tue May 21 14:49:34 2013 From: glciampagl at gmail.com (Giovanni Luca Ciampaglia) Date: Tue, 21 May 2013 14:49:34 -0400 Subject: [SciPy-User] MemoryError in scipy.sparse.csgraph.shortest_path Message-ID: <519BC1BE.8010503@gmail.com> Hi, I want to compute the shortest path distances in a sparse directed graph with 2M nodes but scipy.csgraph.shortest_path immediately throws a MemoryError -- regardless of the chosen method. This is what I do: > In [2]: import numpy as np > > In [3]: import scipy.sparse as sp > > In [4]: import scipy.sparse.csgraph as csg > > In [5]: a = np.load('adjacency_isa.npy') > > In [6]: adj = sp.coo_matrix((a['weight'], (a['row'], a['col'])), (2351254,)*2) > > In [7]: adj = adj.tocsr() > And this is what I get: > In [8]: D = csg.shortest_path(adj, directed=True) > --------------------------------------------------------------------------- > MemoryError Traceback (most recent call last) > in () > ----> 1 D = csg.shortest_path(adj, directed=True) > > /nfs/nfs4/home/gciampag/local/lib/python/scipy/sparse/csgraph/_shortest_path.so in > scipy.sparse.csgraph._shortest_path.shortest_path > (scipy/sparse/csgraph/_shortest_path.c:2117)() > > /nfs/nfs4/home/gciampag/local/lib/python/scipy/sparse/csgraph/_shortest_path.so in > scipy.sparse.csgraph._shortest_path.dijkstra > (scipy/sparse/csgraph/_shortest_path.c:3948)() > > MemoryError: > > In [9]: D = csg.dijkstra(adj, directed=True) > --------------------------------------------------------------------------- > MemoryError Traceback (most recent call last) 
> in ()
> ----> 1 D = csg.dijkstra(adj, directed=True)
>
> /nfs/nfs4/home/gciampag/local/lib/python/scipy/sparse/csgraph/_shortest_path.so in
> scipy.sparse.csgraph._shortest_path.dijkstra
> (scipy/sparse/csgraph/_shortest_path.c:3948)()
>
> MemoryError:
>
> In [10]: D = csg.floyd_warshall(adj, directed=True)
> ---------------------------------------------------------------------------
> MemoryError                               Traceback (most recent call last)
> in ()
> ----> 1 D = csg.floyd_warshall(adj, directed=True)
>
> /nfs/nfs4/home/gciampag/local/lib/python/scipy/sparse/csgraph/_shortest_path.so in
> scipy.sparse.csgraph._shortest_path.floyd_warshall
> (scipy/sparse/csgraph/_shortest_path.c:2457)()
>
> /nfs/nfs4/home/gciampag/local/lib/python/scipy/sparse/csgraph/_validation.pyc
> in validate_graph(csgraph, directed, dtype, csr_output, dense_output,
> copy_if_dense, copy_if_sparse, null_value_in, null_value_out, infinity_null,
> nan_null)
>      26     csgraph = csr_matrix(csgraph, dtype=DTYPE, copy=copy_if_sparse)
>      27 else:
> ---> 28     csgraph = csgraph_to_dense(csgraph, null_value=null_value_out)
>      29 elif np.ma.is_masked(csgraph):
>      30     if dense_output:
>
> /nfs/nfs4/home/gciampag/local/lib/python/scipy/sparse/csgraph/_tools.so in
> scipy.sparse.csgraph._tools.csgraph_to_dense
> (scipy/sparse/csgraph/_tools.c:2984)()
>
> MemoryError:

It seems that there are two distinct issues:

1. floyd_warshall() calls validate_graph with csr_output = False (_shortest_path.pyx:218), causing the graph to be converted to dense. I believe this is a bug.
2. dijkstra creates a dense distance matrix (_shortest_path.pyx:409).
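One practical way around the dense allocation in issue 2 is to run dijkstra over a manageable slice of source indices at a time and keep only the finite distances; dijkstra accepts an indices argument for exactly this. A toy sketch (the six-node chain, the chunk size, and the dict bookkeeping are all illustrative):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import dijkstra

# Tiny stand-in for the 2.3M-node graph: a directed chain with unit weights.
n = 6
adj = sp.csr_matrix((np.ones(n - 1), (np.arange(n - 1), np.arange(1, n))),
                    shape=(n, n))

chunk = 2
finite = {}   # (src, dst) -> distance; only finite entries are kept
for start in range(0, n, chunk):
    idx = np.arange(start, min(start + chunk, n))
    d = dijkstra(adj, directed=True, indices=idx)   # shape (len(idx), n)
    src, dst = np.where(np.isfinite(d))
    for s, t in zip(src, dst):
        finite[(int(idx[s]), int(t))] = float(d[s, t])

print(finite[(0, 5)])   # 5.0, five unit-weight hops along the chain
```

With the real graph, each iteration only allocates a (chunk, n) block instead of the full (n, n) matrix, so the peak memory is controlled by the chunk size.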
I understand that one cannot make any assumption about the connectivity of the graph, and thus of the sparsity of the distance matrix itself; and of course I can get around this calling dijkstra multiple times with a manageable chunk of indices, and discarding the values that are equal to inf, but it would be nonetheless nice if the code tried to do something similar, at least for the cases when one knows that most of the distances will be inf. Best, -- Giovanni Luca Ciampaglia Postdoctoral fellow Center for Complex Networks and Systems Research Indiana University ? 910 E 10th St ? Bloomington ? IN 47408 ? http://cnets.indiana.edu/ ? gciampag at indiana.edu From cimrman3 at ntc.zcu.cz Wed May 22 09:48:28 2013 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Wed, 22 May 2013 15:48:28 +0200 Subject: [SciPy-User] ANN: SfePy 2013.2 Message-ID: <519CCCAC.1000203@ntc.zcu.cz> I am pleased to announce release 2013.2 of SfePy. Description ----------- SfePy (simple finite elements in Python) is a software for solving systems of coupled partial differential equations by the finite element method. The code is based on NumPy and SciPy packages. It is distributed under the new BSD license. Home page: http://sfepy.org Downloads, mailing list, wiki: http://code.google.com/p/sfepy/ Git (source) repository, issue tracker: http://github.com/sfepy Highlights of this release -------------------------- - automatic testing of term calls (many terms fixed w.r.t. corner cases) - new elastic contact plane term + example - translated low level base functions from Cython to C for reusability - improved gallery http://docs.sfepy.org/gallery/gallery For full release notes see http://docs.sfepy.org/doc/release_notes.html#id1 (rather long and technical). Best regards, Robert Cimrman and Contributors (*) (*) Contributors to this release (alphabetical order): Vladim?r Luke?, Ankit Mahato, Maty?? 
Nov?k From J.M.Rowe at exeter.ac.uk Thu May 23 10:02:24 2013 From: J.M.Rowe at exeter.ac.uk (John Rowe) Date: Thu, 23 May 2013 15:02:24 +0100 Subject: [SciPy-User] Dynamic library loading problem Message-ID: <1369317744.18275.3085.camel@amp> I have installed python3.3.2 and the latest numpy and scipy on linux x86_64 using the Intel compilers and MKL following a few variants of the advice on http://www.scipy.org/Installing_SciPy/Linux and http://software.intel.com/en-us/articles/numpyscipy-with-intel-mkl. However, although the installation goes fine the moment I try to use it it fails, for example: > python3 -c 'import scipy; scipy.test()' Running unit tests for scipy NumPy version 1.7.0 NumPy is installed in /usr/local64/python3.3.2/lib/python3.3/site-packages/numpy SciPy version 0.12.0 SciPy is installed in /usr/local64/python3.3.2/lib/python3.3/site-packages/scipy Python version 3.3.2 (default, May 22 2013, 15:10:00) [GCC Intel(R) C++ gcc 4.3 mode] nose version 1.3.0 ............................................................................ *** libmkl_mc3.so *** failed with error : /opt/intel/intel-fc/x86_64/11.1.064/mkl/lib/em64t/libmkl_mc3.so: undefined symbol: mkl_dft_commit_descriptor_s_c2c_md_omp *** libmkl_def.so *** failed with error : /opt/intel/intel-fc/x86_64/11.1.064/mkl/lib/em64t/libmkl_def.so: undefined symbol: mkl_dft_commit_descriptor_s_c2c_md_omp MKL FATAL ERROR: Cannot load neither libmkl_mc3.so nor libmkl_def.so I get pretty much the same message with numpy.test() and if I try to run the actual program, but with different libraries and missing symbols. The missing symbols always appear to be ones that are found in the specified mkl shared libraries, eg in this case: > nm libmkl_intel_thread.so | grep \ mkl_dft_commit_descriptor_s_c2c_md_omp 00000000006181a0 T mkl_dft_commit_descriptor_s_c2c_md_omp This makes me think it's a systemic issue with shared libraries rather than me having just missed one out. 
Any help would be hugely appreciated (scipy config follows). Thanks John > python3 -c 'import scipy; scipy.show_config()' blas_mkl_info: libraries = ['mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'mkl_mc', 'iomp5', 'pthread'] define_macros = [('SCIPY_MKL_H', None)] library_dirs = ['/opt/intel/intel-fc/x86_64/11.1.064/mkl/lib/em64t', '/opt/intel/intel-fc/x86_64/11.1.064/lib/intel64'] include_dirs = ['/opt/intel/intel-fc/x86_64/11.1.064/mkl/include/em64t', '/opt/intel/intel-fc/x86_64/11.1.064/include/intel64'] lapack_mkl_info: libraries = ['mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'mkl_mc', 'iomp5', 'pthread'] define_macros = [('SCIPY_MKL_H', None)] library_dirs = ['/opt/intel/intel-fc/x86_64/11.1.064/mkl/lib/em64t', '/opt/intel/intel-fc/x86_64/11.1.064/lib/intel64'] include_dirs = ['/opt/intel/intel-fc/x86_64/11.1.064/mkl/include/em64t', '/opt/intel/intel-fc/x86_64/11.1.064/include/intel64'] lapack_opt_info: libraries = ['mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'mkl_mc', 'iomp5', 'pthread'] define_macros = [('SCIPY_MKL_H', None)] library_dirs = ['/opt/intel/intel-fc/x86_64/11.1.064/mkl/lib/em64t', '/opt/intel/intel-fc/x86_64/11.1.064/lib/intel64'] include_dirs = ['/opt/intel/intel-fc/x86_64/11.1.064/mkl/include/em64t', '/opt/intel/intel-fc/x86_64/11.1.064/include/intel64'] umfpack_info: NOT AVAILABLE mkl_info: libraries = ['mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'mkl_mc', 'iomp5', 'pthread'] define_macros = [('SCIPY_MKL_H', None)] library_dirs = ['/opt/intel/intel-fc/x86_64/11.1.064/mkl/lib/em64t', '/opt/intel/intel-fc/x86_64/11.1.064/lib/intel64'] include_dirs = ['/opt/intel/intel-fc/x86_64/11.1.064/mkl/include/em64t', '/opt/intel/intel-fc/x86_64/11.1.064/include/intel64'] blas_opt_info: libraries = ['mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'mkl_mc', 'iomp5', 'pthread'] define_macros = [('SCIPY_MKL_H', None)] library_dirs = ['/opt/intel/intel-fc/x86_64/11.1.064/mkl/lib/em64t', 
'/opt/intel/intel-fc/x86_64/11.1.064/lib/intel64'] include_dirs = ['/opt/intel/intel-fc/x86_64/11.1.064/mkl/include/em64t', '/opt/intel/intel-fc/x86_64/11.1.064/include/intel64'] From otrov at hush.ai Fri May 24 03:29:19 2013 From: otrov at hush.ai (zetah) Date: Fri, 24 May 2013 09:29:19 +0200 Subject: [SciPy-User] How to read CSV with missing data in numpy array? Message-ID: <20130524072920.519AFA6E42@smtp.hushmail.com> Trying to read this csv: 6;6;7;8;3; 8;10;8;;5; 3;5;6;7;; with this code: loadtxt('test.csv', delimiter=';', usecols=range(5)) yields "ValueError: could not convert string to float: " Is there any easy way numpy to read this CSV? Thanks -------------- next part -------------- An HTML attachment was scrubbed... URL: From servant.mathieu at gmail.com Fri May 24 05:15:00 2013 From: servant.mathieu at gmail.com (servant mathieu) Date: Fri, 24 May 2013 11:15:00 +0200 Subject: [SciPy-User] How to cite the scipy.optimize module in a paper? Message-ID: Dear community, I've recently fitted a model to empirical data using the Simplex routine implemented in the scipy.optimize module. How could I cite this work in a paper using the APA style? Cheers, Mathieu -------------- next part -------------- An HTML attachment was scrubbed... URL: From gary.ruben at gmail.com Fri May 24 05:22:18 2013 From: gary.ruben at gmail.com (gary ruben) Date: Fri, 24 May 2013 19:22:18 +1000 Subject: [SciPy-User] How to read CSV with missing data in numpy array? In-Reply-To: <20130524072920.519AFA6E42@smtp.hushmail.com> References: <20130524072920.519AFA6E42@smtp.hushmail.com> Message-ID: Yes, genfromtxt('test.csv', delimiter=';') On 24 May 2013 17:29, zetah wrote: > Trying to read this csv: > > 6;6;7;8;3; > 8;10;8;;5; > 3;5;6;7;; > > with this code: > > loadtxt('test.csv', delimiter=';', usecols=range(5)) > > yields "ValueError: could not convert string to float: " > > Is there any easy way numpy to read this CSV? 
> > Thanks
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From lists at hilboll.de Fri May 24 05:19:22 2013
From: lists at hilboll.de (Andreas Hilboll)
Date: Fri, 24 May 2013 11:19:22 +0200
Subject: [SciPy-User] How to cite the scipy.optimize module in a paper?
In-Reply-To: References: Message-ID: <519F309A.7000505@hilboll.de>

On 24.05.2013 11:15, servant mathieu wrote:
> Dear community,
>
> I've recently fitted a model to empirical data using the Simplex routine
> implemented in the scipy.optimize module. How could I cite this work in
> a paper using the APA style?

For SciPy, I'd follow http://www.scipy.org/Citing_SciPy. The fmin docstring lists two references, you could just cite those?

-- Andreas.

From otrov at hush.ai Fri May 24 05:38:29 2013
From: otrov at hush.ai (zetah)
Date: Fri, 24 May 2013 11:38:29 +0200
Subject: [SciPy-User] How to read CSV with missing data in numpy array
Message-ID: <20130524093830.82E9DA6E38@smtp.hushmail.com>

`genfromtxt()` works beautifully. Thanks Gary

From pav at iki.fi Fri May 24 05:40:45 2013
From: pav at iki.fi (Pauli Virtanen)
Date: Fri, 24 May 2013 09:40:45 +0000 (UTC)
Subject: [SciPy-User] How to cite the scipy.optimize module in a paper?
References: Message-ID:

servant mathieu gmail.com> writes:
> I've recently fitted a model to empirical data using the
> Simplex routine implemented in the scipy.optimize module.
> How could I cite this work in a paper using the APA style?

I don't know about APA style, but if it was my own paper, I'd just consider the simplex method a "well known basic method" and wouldn't necessarily bother citing which implementation was used, unless this is somehow central to the work.
If it's central or the method is not a basic one, I'd maybe say "using method XXX [1] implemented in Scipy [2]" where [1] is a citation to the original paper or a relevant review article or a book introducing method XXX, and [2] a citation to Scipy. The documentation in Scipy often contains citations of the articles the implementation is based on. It's important to remember to support the authors of those articles by citing appropriately. Something was written on citing Scipy: http://new.scipy.org/scipylib/citing.html -- Pauli Virtanen From pierre.raybaut at gmail.com Sat May 25 10:31:11 2013 From: pierre.raybaut at gmail.com (Pierre Raybaut) Date: Sat, 25 May 2013 16:31:11 +0200 Subject: [SciPy-User] ANN: New WinPython with Python 2.7.5 and 3.3.2 (32/64bit) Message-ID: Hi all, I am pleased to announce that four new versions of WinPython have been released yesterday with Python 2.7.5 and 3.3.2, 32 and 64 bits. Many packages have been added or upgraded. Special thanks to Christoph Gohlke for building most of the binary packages bundled in WinPython. WinPython is a free open-source portable distribution of Python for Windows, designed for scientists. 
It is a full-featured (see http://code.google.com/p/winpython/wiki/PackageIndex) Python-based scientific environment:

* Designed for scientists (thanks to the integrated libraries NumPy, SciPy, Matplotlib, guiqwt, etc.):
  * Regular *scientific users*: interactive data processing and visualization using Python with Spyder
  * *Advanced scientific users and software developers*: Python applications development with Spyder, version control with Mercurial and other development tools (like gettext)
* *Portable*: preconfigured, it should run out of the box on any machine under Windows (without any installation requirements) and the folder containing WinPython can be moved to any location (local, network or removable drive)
* *Flexible*: one can install (or should I write "use" as it's portable) as many WinPython versions as necessary (like isolated and self-consistent environments), even if those versions are running different versions of Python (2.7, 3.x in the near future) or different architectures (32bit or 64bit) on the same machine
* *Customizable*: using the integrated package manager (wppm, as WinPython Package Manager), it's possible to install, uninstall or upgrade Python packages (see http://code.google.com/p/winpython/wiki/WPPM for more details on supported package formats).

*WinPython is not an attempt to replace Python(x,y)*, this is just something different (see http://code.google.com/p/winpython/wiki/Roadmap): more flexible, easier to maintain, movable and less invasive for the OS, but certainly less user-friendly, with fewer packages/contents and without any integration to Windows explorer [*].

[*] Actually there is an optional integration into Windows explorer, providing the same features as the official Python installer regarding file associations and context menu entry (this option may be activated through the WinPython Control Panel), and adding shortcuts to Windows Start menu.

Enjoy!
-Pierre
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From otrov at hush.ai Sat May 25 18:04:10 2013
From: otrov at hush.ai (zetah)
Date: Sun, 26 May 2013 00:04:10 +0200
Subject: [SciPy-User] ndimage.zoom for array with nan values
Message-ID: <20130525220410.6FEA8E6739@smtp.hushmail.com>

When I try to interpolate an array with nan values with ndimage.zoom, I get a whole array of nans. For example ndimage.zoom(a, 3), for a:

numpy.array([[ 0. , 5. , 7.2, 5. , nan, 3.9, 3.9, 7.2, -1.1, 1.1, nan, 8.9, 11.1, 12.2, 7.2, 0. , 2.8, 2.2, ...

results in something like:

[[nan, nan, nan...]]

whole array filled with nan values. What function should I use to interpolate/resample an array with nans?

Thanks in advance

From otrov at hush.ai Sat May 25 18:47:07 2013
From: otrov at hush.ai (zetah)
Date: Sun, 26 May 2013 00:47:07 +0200
Subject: [SciPy-User] ndimage.zoom for array with nan values
Message-ID: <20130525224707.EAC37E6736@smtp.hushmail.com>

Found an answer that's acceptable for me:
http://astrolitterbox.blogspot.com/2012/03/healing-holes-in-arrays-in-python.html

Code at the bottom of the post does inpainting magic to numpy arrays, just like inpainting does magic with images. My results:

1. original array imshow and healed array: http://i.imgur.com/dYS8Rn3.png
2. original contour and healed + zoomed 3x contour: http://i.imgur.com/V4upQ6w.png

From njs at pobox.com Sun May 26 14:17:52 2013
From: njs at pobox.com (Nathaniel Smith)
Date: Sun, 26 May 2013 19:17:52 +0100
Subject: [SciPy-User] Weighted kernel density estimate?
Message-ID:

Hey all,

I have a probability distribution represented as a set of weighted samples. I'd like to visualize it. Right now I'm using the weights= argument to np.histogram2d to do this, which is nice. What would be even nicer, though, would be if I could use a KDE instead of a histogram. But the Python KDE routines I've found in a quick google (like scipy.stats.kde) don't seem to have any sort of weights argument.
Any suggestions? -n From zachary.pincus at yale.edu Mon May 27 10:33:28 2013 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Mon, 27 May 2013 10:33:28 -0400 Subject: [SciPy-User] Weighted kernel density estimate? In-Reply-To: References: Message-ID: > I have a probability distribution represented as a set of weighted > samples. I'd like to visualize it. Right now I'm using the weights= > argument to np.histogram2d to do this, which is nice. What would be > even nicer, though, would be if I could use a KDE instead of > histogram. But the Python KDE routines I've found in a quick google > (like, scipy.stats.kde) don't seem to have any sort of weights > argument. Any suggestions? > There have been a couple threads on this topic: http://mail.scipy.org/pipermail/scipy-user/2012-May/032202.html http://mail.scipy.org/pipermail/scipy-user/2013-January/033956.html Attached is a modification to the scipy KDE routine that handles weights (only for density estimation, and none of the other tasks like integration), based on these threads. The code is not well tested, but seems to work. See the previous threads for some caveats... Zach -------------- next part -------------- A non-text attachment was scrubbed... 
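For readers without access to the attachment, the core idea behind a weighted KDE can be sketched in a few lines. This is an illustrative 1-D stand-in (weighted_kde_1d is a hypothetical name, not the attached weighted_kde.py and not scipy API): each sample contributes one Gaussian bump, scaled by its normalized weight.

```python
import numpy as np

def weighted_kde_1d(samples, weights, grid, bandwidth):
    """Gaussian KDE with per-sample weights, evaluated on `grid`.

    Illustrative sketch only: scipy.stats.gaussian_kde has no
    weights argument, so each sample's Gaussian kernel is simply
    scaled by its normalized weight before summing.
    """
    samples = np.asarray(samples, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # density then integrates to ~1
    # One Gaussian kernel per sample, centred on that sample
    diff = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * diff ** 2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return kernels.dot(weights)

grid = np.linspace(-5.0, 6.0, 2001)
density = weighted_kde_1d([0.0, 1.0], [1.0, 3.0], grid, bandwidth=0.5)
# Riemann-sum check that the weighted density still integrates to ~1
print(density.sum() * (grid[1] - grid[0]))
```

Note this fixes the bandwidth by hand; bandwidth selection for weighted samples is one of the caveats discussed in the threads linked above.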
Name: weighted_kde.py
Type: text/x-python-script
Size: 4770 bytes
Desc: not available
URL:

From alan.isaac at gmail.com Mon May 27 13:44:13 2013
From: alan.isaac at gmail.com (Alan G Isaac)
Date: Mon, 27 May 2013 13:44:13 -0400
Subject: [SciPy-User] peer review of scientific software
Message-ID: <51A39B6D.4030607@gmail.com>

http://www.sciencemag.org/content/340/6134/814.summary

Alan Isaac

From Andrew.G.York+scipy at gmail.com Mon May 27 19:11:36 2013
From: Andrew.G.York+scipy at gmail.com (Andrew York)
Date: Mon, 27 May 2013 19:11:36 -0400
Subject: [SciPy-User] fftconvolve performance depends on padding to a power of two
Message-ID:

I'm using scipy.signal.fftconvolve in a performance-sensitive context, and I've found that its default behavior (padding array size to the next power of two) does not always give the best speed, and can consume a lot more memory. I made a modified version that lets the user choose whether or not to pad to the nearest power of two, and in some contexts, it goes faster by a reasonable amount:

Shape of input data: (10L, 100L, 100L)
Time elapsed with 2**n padding: 0.144182774132
Time elapsed without 2**n padding: 0.112595033085
Shape of input data: (10L, 200L, 200L)
Time elapsed with 2**n padding: 0.636056829348
Time elapsed without 2**n padding: 0.713067174562
Shape of input data: (10L, 300L, 300L)
Time elapsed with 2**n padding: 3.10080130933
Time elapsed without 2**n padding: 1.0824417215

Assuming I'm not doing something really dumb, this might be a useful option to expose to the user. Here's the modified code, very similar to the existing fftconvolve code:

import numpy as np
from numpy import array, product
from scipy.fftpack import fftn, ifftn
from scipy.signal.signaltools import _centered

def fftconvolve(in1, in2, mode="full", pad_to_power_of_two=True):
    """Convolve two N-dimensional arrays using FFT.

    See convolve.
    """
    s1 = array(in1.shape)
    s2 = array(in2.shape)
    complex_result = (np.issubdtype(in1.dtype, np.complex) or
                      np.issubdtype(in2.dtype, np.complex))
    size = s1 + s2 - 1
    if pad_to_power_of_two:
        # Use 2**n-sized FFT; it might improve performance
        fsize = 2 ** np.ceil(np.log2(size))
    else:
        # Padding to a power of two might degrade performance, too
        fsize = size
    IN1 = fftn(in1, fsize)
    IN1 *= fftn(in2, fsize)
    fslice = tuple([slice(0, int(sz)) for sz in size])
    ret = ifftn(IN1)[fslice].copy()
    del IN1
    if not complex_result:
        ret = ret.real
    if mode == "full":
        return ret
    elif mode == "same":
        if product(s1, axis=0) > product(s2, axis=0):
            osize = s1
        else:
            osize = s2
        return _centered(ret, osize)
    elif mode == "valid":
        return _centered(ret, abs(s2 - s1) + 1)

if __name__ == '__main__':
    import time
    for sz in (100, 200, 300):
        a = np.zeros((10, sz, sz), dtype=np.float64)
        b = np.zeros((10, 20, 20), dtype=np.float64)
        print "Shape of input data:", a.shape
        start = time.clock()
        fftconvolve(a, b, mode='same', pad_to_power_of_two=True)
        end = time.clock()
        print "Time elapsed with 2**n padding:", end - start
        start = time.clock()
        fftconvolve(a, b, mode='same', pad_to_power_of_two=False)
        end = time.clock()
        print "Time elapsed without 2**n padding:", end - start
        print

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From pav at iki.fi Tue May 28 10:52:09 2013
From: pav at iki.fi (Pauli Virtanen)
Date: Tue, 28 May 2013 17:52:09 +0300
Subject: [SciPy-User] fftconvolve performance depends on padding to a power of two
In-Reply-To: References: Message-ID:

28.05.2013 02:11, Andrew York kirjoitti:
> I'm using scipy.signal.fftconvolve in a performance-sensitive context,
> and I've found that its default behavior (padding array size to the next
> power of two) does not always give the best speed, and can consume a lot
> more memory.
I made a modified version that lets the user choose whether
> or not to pad to the nearest power of two, and in some contexts, it goes
> faster by a reasonable amount:

It could be changed to pad to the nearest composite number of 2, 3, and 5, as this is what the FFT code has special cases for.

Adding a new keyword option for controlling the padding is perhaps not so optimal, as there is a good heuristic.

-- Pauli Virtanen

From Andrew.G.York+scipy at gmail.com Tue May 28 13:11:37 2013
From: Andrew.G.York+scipy at gmail.com (Andrew York)
Date: Tue, 28 May 2013 13:11:37 -0400
Subject: [SciPy-User] fftconvolve performance depends on padding to a power of two
In-Reply-To: References: Message-ID:

Good suggestion! Would it be helpful for me to try to code the behavior you suggest, or would it be better to leave it to those with more experience?

On Tue, May 28, 2013 at 10:52 AM, Pauli Virtanen wrote:
> 28.05.2013 02:11, Andrew York kirjoitti:
> > I'm using scipy.signal.fftconvolve in a performance-sensitive context,
> > and I've found that its default behavior (padding array size to the next
> > power of two) does not always give the best speed, and can consume a lot
> > more memory. I made a modified version that lets the user choose whether
> > or not to pad to the nearest power of two, and in some contexts, it goes
> > faster by a reasonable amount:
>
> It could be changed to pad to the nearest composite number of 2, 3, and
> 5, as this is what the FFT code has special cases for.
>
> Adding a new keyword option for controlling the padding is perhaps not
> so optimal, as there is a good heuristic.
>
> --
> Pauli Virtanen
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
-------------- next part --------------
An HTML attachment was scrubbed...
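Pauli's suggestion, padding to the nearest composite of 2, 3 and 5 rather than always to a power of two, can be sketched with a small helper. (next_fast_len is an illustrative name here, not an existing scipy function at the time of this thread.)

```python
def next_fast_len(n):
    """Smallest integer >= n whose only prime factors are 2, 3 and 5.

    FFTPACK has fast special cases for these factors, so padding to
    such a "5-smooth" size avoids the slow large-prime code path
    without the memory cost of always rounding up to a power of two.
    """
    m = max(int(n), 1)
    while True:
        k = m
        for p in (2, 3, 5):
            while k % p == 0:
                k //= p
        if k == 1:        # m is 5-smooth
            return m
        m += 1            # brute-force search is fine for FFT-sized inputs

# e.g. a 97-point axis would be padded to 100 instead of 128,
# and a 257-point axis to 270 instead of 512
print(next_fast_len(97), next_fast_len(257))  # prints: 100 270
```

In fftconvolve this would replace `fsize = 2 ** np.ceil(np.log2(size))` with an element-wise application of the helper to `size`.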
URL: From mutantturkey at gmail.com Tue May 28 14:23:27 2013 From: mutantturkey at gmail.com (Calvin Morrison) Date: Tue, 28 May 2013 14:23:27 -0400 Subject: [SciPy-User] peer review of scientific software In-Reply-To: <51A39B6D.4030607@gmail.com> References: <51A39B6D.4030607@gmail.com> Message-ID: On 27 May 2013 13:44, Alan G Isaac wrote: > > http://www.sciencemag.org/content/340/6134/814.summary Maybe I can use this as a ranting point. As a background I am a programmer, but I have been hired by various professors and other academic people to work on projects. My biggest problem with the "scientific computing" community is having to put up with this crap! I am so sick of reading a paper, that will exactly fulfill my needs, only to find out the software has disappeared, only works on HP-UX on one computer in the world, or has absolutely zero documentation. I've started pushing my lab to follow standards, use version control, document our changes, and publish our code. So far it is going really well, but I can only do so much. If only all of all students had a few basic "proper programming practices" courses, everything would go a lot smoother. Teaching programming is fine, but why don't we teach our scientists the best way to collaborate in the 21st century? Below is another related paper that is a good starting point for converting users. Enough emailing tarballs back and forth! Enough undocumented code! Enough is Enough! http://arxiv.org/pdf/1210.0530v3.pdf Pissed-off Scientific Programmer, Calvin Morrison -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Tue May 28 14:27:38 2013 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 28 May 2013 18:27:38 +0000 (UTC) Subject: [SciPy-User] fftconvolve performance depends on padding to a power of two References: Message-ID: Andrew York gmail.com> writes: > Good suggestion! 
Would it be helpful for me to try to code the behavior you suggest, or would it be better to leave it to those with more experience?

There's a patch doing most of this among the issues, so that could be a useful point to start from.

From vanleeuwen.martin at gmail.com Tue May 28 15:41:26 2013
From: vanleeuwen.martin at gmail.com (Martin van Leeuwen)
Date: Tue, 28 May 2013 12:41:26 -0700
Subject: [SciPy-User] peer review of scientific software
In-Reply-To: References: <51A39B6D.4030607@gmail.com> Message-ID:

Hi,

Nice article. The frustration for students without a formal programming background, such as a bachelor in computer science, is as big as that for students and Profs that do have such a background, I think. The solution is of course proper education. Code is, like math, a language on its own. Math we often learn from teachers - not necessarily in the Math department - who went through it themselves years ago. For coding this is different. Programming techniques change so quickly and new languages keep popping up, making transfer of up-to-date knowledge in academic curricula more challenging.

Bad programming practices in academia (and elsewhere) certainly aren't the frustration of "good" and "up-to-date" programmers alone; they are also the frustration of those who want to learn by example. Much of this learning process evolves through online tutorials, blogs, cookbooks etc., but there are so many of them. Python especially is a language that is used by many, and it is also a recommended beginner's language, so one can expect a lot of bad programming practices.

Perhaps some sort of ranking of open source code would help developers realize what is good practice and what is bad practice. Or even a prestigious display of some excellent coding projects, at a variety of levels of complexity or project size. Could SciPy perhaps take the lead by making knowledge as to what are good programming skills more accessible through some sort of open ranking/voting platform?
Martin 2013/5/28 Calvin Morrison > > > > On 27 May 2013 13:44, Alan G Isaac wrote: > >> >> http://www.sciencemag.org/content/340/6134/814.summary > > > Maybe I can use this as a ranting point. > > As a background I am a programmer, but I have been hired by various > professors and other academic people to work on projects. My biggest > problem with the "scientific computing" community is having to put up with > this crap! > > I am so sick of reading a paper, that will exactly fulfill my needs, only > to find out the software has disappeared, only works on HP-UX on one > computer in the world, or has absolutely zero documentation. > > I've started pushing my lab to follow standards, use version control, > document our changes, and publish our code. So far it is going really well, > but I can only do so much. > > If only all of all students had a few basic "proper programming practices" > courses, everything would go a lot smoother. Teaching programming is fine, > but why don't we teach our scientists the best way to collaborate in the > 21st century? > > Below is another related paper that is a good starting point for > converting users. Enough emailing tarballs back and forth! Enough > undocumented code! Enough is Enough! > > http://arxiv.org/pdf/1210.0530v3.pdf > > Pissed-off Scientific Programmer, > Calvin Morrison > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Tue May 28 16:00:51 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 28 May 2013 13:00:51 -0700 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> Message-ID: Hi, On Tue, May 28, 2013 at 12:41 PM, Martin van Leeuwen wrote: > Hi, > > Nice article. 
The frustration for students without a formal programming > background such as a bachelor in computer science is as big as that for > students and Profs that do have such a background, I think. I found the article frustrating - it didn't seem to have much to add to a general set of feelings (that most of us share) that writing code properly is a good idea. The question that always comes up is - why? Most scientists trained in the ad-hoc get-it-to-work model have a rather deep belief that this model is more or less OK, and that doing all that version-control, testing stuff is for programming types with lots of time on their hands. If we want to persuade those guys and gals, we have to come up with something pretty compelling, and I don't think we have that yet. I would love to see some really good data to show that we'd proceed faster as scientists with more organized coding practice, Cheers, Matthew From mutantturkey at gmail.com Tue May 28 16:11:47 2013 From: mutantturkey at gmail.com (Calvin Morrison) Date: Tue, 28 May 2013 16:11:47 -0400 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> Message-ID: On 28 May 2013 16:00, Matthew Brett wrote: > Hi, > > On Tue, May 28, 2013 at 12:41 PM, Martin van Leeuwen > wrote: > > Hi, > > > > Nice article. The frustration for students without a formal programming > > background such as a bachelor in computer science is as big as that for > > students and Profs that do have such a background, I think. > > I found the article frustrating - it didn't seem to have much to add > to a general set of feelings (that most of us share) that writing code > properly is a good idea. > > The question that always comes up is - why? Most scientists trained > in the ad-hoc get-it-to-work model have a rather deep belief that this > model is more or less OK, and that doing all that version-control, > testing stuff is for programming types with lots of time on their > hands. 
Yes and wearing lab vests, writing down procedures and documenting methods is all just for people with so much time... Version control, unit testing, proper practices in general, actually are time savers. I use version control on my own projects, because it helps me organize my code and allows me to stay organized. Coding well helps me read through my code easier and collaborate with others. The issue is not with the tools, it is with people refusing to learn how to use them. The ability to reproduce results is a very important aspect of science is it not? How can I know if your claims are true if you have hidden software that has never seen the light of day? How can you benefit the community if nobody can use your software? If we want to persuade those guys and gals, we have to come up > with something pretty compelling, and I don't think we have that yet. > I would love to see some really good data to show that we'd proceed > faster as scientists with more organized coding practice, > Proceed faster as scientists individually? Maybe not. But as an aggregate, the community most certainly benefit from not having to reimplement tools, by developing tools for other people to use, by making it easier to collaborate and reasons that I can't even think of! What is the point of publishing works if you aren't publishing the tools? Who are you helping? The community, the populace as a whole, or your silly CV? Calvin -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Tue May 28 16:14:11 2013 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 28 May 2013 21:14:11 +0100 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> Message-ID: On Tue, May 28, 2013 at 9:00 PM, Matthew Brett wrote: > The question that always comes up is - why? 
Most scientists trained > in the ad-hoc get-it-to-work model have a rather deep belief that this > model is more or less OK, and that doing all that version-control, > testing stuff is for programming types with lots of time on their > hands. If we want to persuade those guys and gals, we have to come up > with something pretty compelling, and I don't think we have that yet. > I would love to see some really good data to show that we'd proceed > faster as scientists with more organized coding practice, I always make newbies read this: http://boscoh.com/protein/a-sign-a-flipped-structure-and-a-scientific-flameout-of-epic-proportions.html (Short version: basically someone's career being destroyed by a sign error in code written by some random person in the next lab.) Then I show them the magic of 'if __name__ == "__main__": import nose; nose.runmodule()'. Maybe it sinks in sometimes... I agree that while it's all very noble to talk about the benefit to science as a whole and the long term benefits of eating your vegetables and blah blah blah, the actually compelling reason to use VCS and test everything is that it always turns out to pay back the investment in, like, tens of minutes. Telling people about the benefits of regular exercise never works either. -n From matthew.brett at gmail.com Tue May 28 16:23:10 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 28 May 2013 13:23:10 -0700 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> Message-ID: Hi, On Tue, May 28, 2013 at 1:11 PM, Calvin Morrison wrote: > > > > On 28 May 2013 16:00, Matthew Brett wrote: >> >> Hi, >> >> On Tue, May 28, 2013 at 12:41 PM, Martin van Leeuwen >> wrote: >> > Hi, >> > >> > Nice article. The frustration for students without a formal programming >> > background such as a bachelor in computer science is as big as that for >> > students and Profs that do have such a background, I think. 
>> >> I found the article frustrating - it didn't seem to have much to add >> to a general set of feelings (that most of us share) that writing code >> properly is a good idea. >> >> The question that always comes up is - why? Most scientists trained >> in the ad-hoc get-it-to-work model have a rather deep belief that this >> model is more or less OK, and that doing all that version-control, >> testing stuff is for programming types with lots of time on their >> hands. > > > Yes and wearing lab vests, writing down procedures and documenting methods > is all just for people with so much time... > > Version control, unit testing, proper practices in general, actually are > time savers. I use version control on my own projects, because it helps me > organize my code and allows me to stay organized. Coding well helps me read > through my code easier and collaborate with others. The issue is not with > the tools, it is with people refusing to learn how to use them. > > The ability to reproduce results is a very important aspect of science is it > not? How can I know if your claims are true if you have hidden software that > has never seen the light of day? How can you benefit the community if nobody > can use your software? > >> If we want to persuade those guys and gals, we have to come up >> with something pretty compelling, and I don't think we have that yet. >> I would love to see some really good data to show that we'd proceed >> faster as scientists with more organized coding practice, > > > Proceed faster as scientists individually? Maybe not. But as an aggregate, > the community most certainly benefit from not having to reimplement tools, > by developing tools for other people to use, by making it easier to > collaborate and reasons that I can't even think of! > > What is the point of publishing works if you aren't publishing the tools? > Who are you helping? The community, the populace as a whole, or your silly > CV? 
I have personally been doing these good things "since my youth", and trying to teach other people to do the same. I don't often have much success though, hence my email. The response usually goes something like "I can't afford to waste time, I've got a deadline / I've got to get tenure" etc. If our response is a general 'oh but it's much better to do it that way' - I can assure you, most of the time, that doesn't cut it. As Nathaniel says - for those who are interested - just showing them how to do this stuff can often be enough. "Oh yes, I get it, awesome". For those who sense this as a threat to waste their time in tedious detail - that isn't going to work. We really have to persuade these people that - for a short investment of time - they will reap major benefits - for themselves and / or for their fellow scientists. I don't know of much data to help with this latter thing. I can imagine data, but I don't know where it is, or if it exists... Cheers, Matthew From vanleeuwen.martin at gmail.com Tue May 28 16:23:23 2013 From: vanleeuwen.martin at gmail.com (Martin van Leeuwen) Date: Tue, 28 May 2013 13:23:23 -0700 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> Message-ID: I tend to think that we get less and less time on our sleeves as science seems to become more competitive. But that being said, I don't think we should respond by teaching students to forget about version control and unit testing. If it isn't to support your own debugging efforts, it also provides a handle for anyone attempting to use someone else's code. 2013/5/28 Nathaniel Smith > On Tue, May 28, 2013 at 9:00 PM, Matthew Brett > wrote: > > The question that always comes up is - why? Most scientists trained > > in the ad-hoc get-it-to-work model have a rather deep belief that this > > model is more or less OK, and that doing all that version-control, > > testing stuff is for programming types with lots of time on their > > hands. 
If we want to persuade those guys and gals, we have to come up > > with something pretty compelling, and I don't think we have that yet. > > I would love to see some really good data to show that we'd proceed > > faster as scientists with more organized coding practice, > > I always make newbies read this: > > http://boscoh.com/protein/a-sign-a-flipped-structure-and-a-scientific-flameout-of-epic-proportions.html > (Short version: basically someone's career being destroyed by a sign > error in code written by some random person in the next lab.) > > Then I show them the magic of 'if __name__ == "__main__": import nose; > nose.runmodule()'. > > Maybe it sinks in sometimes... > > I agree that while it's all very noble to talk about the benefit to > science as a whole and the long term benefits of eating your > vegetables and blah blah blah, the actually compelling reason to use > VCS and test everything is that it always turns out to pay back the > investment in, like, tens of minutes. > > Telling people about the benefits of regular exercise never works either. > > -n > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mutantturkey at gmail.com Tue May 28 16:33:18 2013 From: mutantturkey at gmail.com (Calvin Morrison) Date: Tue, 28 May 2013 16:33:18 -0400 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> Message-ID: On 28 May 2013 16:23, Matthew Brett wrote: > Hi, > > On Tue, May 28, 2013 at 1:11 PM, Calvin Morrison > wrote: > > > > > > > > On 28 May 2013 16:00, Matthew Brett wrote: > >> > >> Hi, > >> > >> On Tue, May 28, 2013 at 12:41 PM, Martin van Leeuwen > >> wrote: > >> > Hi, > >> > > >> > Nice article. 
The frustration for students without a formal > programming > >> > background such as a bachelor in computer science is as big as that > for > >> > students and Profs that do have such a background, I think. > >> > >> I found the article frustrating - it didn't seem to have much to add > >> to a general set of feelings (that most of us share) that writing code > >> properly is a good idea. > >> > >> The question that always comes up is - why? Most scientists trained > >> in the ad-hoc get-it-to-work model have a rather deep belief that this > >> model is more or less OK, and that doing all that version-control, > >> testing stuff is for programming types with lots of time on their > >> hands. > > > > > > Yes and wearing lab vests, writing down procedures and documenting > methods > > is all just for people with so much time... > > > > Version control, unit testing, proper practices in general, actually are > > time savers. I use version control on my own projects, because it helps > me > > organize my code and allows me to stay organized. Coding well helps me > read > > through my code easier and collaborate with others. The issue is not with > > the tools, it is with people refusing to learn how to use them. > > > > The ability to reproduce results is a very important aspect of science > is it > > not? How can I know if your claims are true if you have hidden software > that > > has never seen the light of day? How can you benefit the community if > nobody > > can use your software? > > > >> If we want to persuade those guys and gals, we have to come up > >> with something pretty compelling, and I don't think we have that yet. > >> I would love to see some really good data to show that we'd proceed > >> faster as scientists with more organized coding practice, > > > > > > Proceed faster as scientists individually? Maybe not. 
But as an > aggregate, > > the community most certainly benefits from not having to reimplement > tools, > > by developing tools for other people to use, by making it easier to > > collaborate and reasons that I can't even think of! > > > > What is the point of publishing works if you aren't publishing the tools? > > Who are you helping? The community, the populace as a whole, or your > silly > > CV? > > I have personally been doing these good things "since my youth", and > trying to teach other people to do the same. > > I don't often have much success though, hence my email. > > The response usually goes something like "I can't afford to waste > time, I've got a deadline / I've got to get tenure" etc. > > If our response is a general 'oh but it's much better to do it that > way' - I can assure you, most of the time, that doesn't cut it. > > As Nathaniel says - for those who are interested - just showing them > how to do this stuff can often be enough. "Oh yes, I get it, > awesome". > > For those who sense this as a threat to waste their time in tedious > detail - that isn't going to work. > > We really have to persuade these people that - for a short investment > of time - they will reap major benefits - for themselves and / or for > their fellow scientists. I don't know of much data to help with this > latter thing. I can imagine data, but I don't know where it is, or if > it exists... > > We need to persuade people on a large scale. I think it took some time for regular science to establish baseline standards, and as computing is still relatively new, people haven't figured out the baseline standards. I think this is a very important issue, and some real action should be taken to improve it. Two ways I can think of are requiring peer-reviewed software, like we review our journals, and requiring the publishing of software with the journal submission.
Not "find it on our page", not "materials upon request", but some way that we can guarantee it won't drop off the face of the earth. Just an idea, Calvin -------------- next part -------------- An HTML attachment was scrubbed... URL: From cweisiger at msg.ucsf.edu Tue May 28 16:35:51 2013 From: cweisiger at msg.ucsf.edu (Chris Weisiger) Date: Tue, 28 May 2013 13:35:51 -0700 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> Message-ID: I joined a lab after it had already developed a substantial codebase, which was littered with comments like: # 20060824 recalibrated # old values [0.08, .11, .09] I chucked the entire codebase into source control, and then went through and started deleting these comments...and had to spend a lot of time convincing my coworkers that nothing was being lost, and that I could retrieve any older version if they just gave me a date! I honestly think that a lot of this stuff is pretty self-evidently valuable, *if you have learned about it to begin with*. My coworkers aren't software developers; they're scientists in biology, physics, optics, etc. Programming is outside of their skillset and thus difficult (just as I would have a lot of trouble designing an experiment), not because they're bad at programming but just because they haven't had the relevant training. In short, if you want scientists to be better programmers, you have to train them to be better programmers. I know my alma mater is placing more of an emphasis on programming these days, since everyone who does any work in the sciences needs at least some skill in that discipline. This doesn't help us with all of the scientists who are currently out there producing bad code and not using standard tools, of course. -Chris On Tue, May 28, 2013 at 1:23 PM, Martin van Leeuwen < vanleeuwen.martin at gmail.com> wrote: > I tend to think that we get less and less time on our hands as science > seems to become more competitive.
But that being said, I don't think we > should respond by teaching students to forget about version control and > unit testing. If it isn't to support your own debugging efforts, it also > provides a handle for anyone attempting to use someone else's code. > > > 2013/5/28 Nathaniel Smith > >> On Tue, May 28, 2013 at 9:00 PM, Matthew Brett >> wrote: >> > The question that always comes up is - why? Most scientists trained >> > in the ad-hoc get-it-to-work model have a rather deep belief that this >> > model is more or less OK, and that doing all that version-control, >> > testing stuff is for programming types with lots of time on their >> > hands. If we want to persuade those guys and gals, we have to come up >> > with something pretty compelling, and I don't think we have that yet. >> > I would love to see some really good data to show that we'd proceed >> > faster as scientists with more organized coding practice, >> >> I always make newbies read this: >> >> http://boscoh.com/protein/a-sign-a-flipped-structure-and-a-scientific-flameout-of-epic-proportions.html >> (Short version: basically someone's career being destroyed by a sign >> error in code written by some random person in the next lab.) >> >> Then I show them the magic of 'if __name__ == "__main__": import nose; >> nose.runmodule()'. >> >> Maybe it sinks in sometimes... >> >> I agree that while it's all very noble to talk about the benefit to >> science as a whole and the long term benefits of eating your >> vegetables and blah blah blah, the actually compelling reason to use >> VCS and test everything is that it always turns out to pay back the >> investment in, like, tens of minutes. >> >> Telling people about the benefits of regular exercise never works either. 
>> >> -n >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From newville at cars.uchicago.edu Tue May 28 16:58:44 2013 From: newville at cars.uchicago.edu (Matt Newville) Date: Tue, 28 May 2013 15:58:44 -0500 Subject: [SciPy-User] peer review of scientific software In-Reply-To: <51A39B6D.4030607@gmail.com> References: <51A39B6D.4030607@gmail.com> Message-ID: Hi, As others have said, I find the low average programming skill level among scientists frustrating, but I also found this article quite frustrating. From my perspective, the authors' main complaint seems to be that there is not enough independent checking of specialized scientific software written by scientists. They seem particularly unhappy about the tendency to use existing packages written by other scientists based on "trust", "reputation", "previous citations" and without independent checking. They also say: A "well-respected" end-user developer will almost certainly have earned that respect through scientific breakthroughs, perhaps not for their software engineering skills (although agreement on what constitutes "appropriate" scientific software engineering standards is still under debate). On this point in particular, and indeed in this whole line of argument, I think the authors are misguided, perhaps even to the point of fatally damaging their whole argument.
I believe the much more common case is for the "well-respected" end-user developer to be known for the programs written and supported, and less so for the scientific breakthroughs (unless you count new programs as new instrumentation, and so, well, breakthroughs, but it's pretty clear that the authors are making a distinction). It's too often the case that spending any significant time on such programs is career suicide, as it takes time and attention away from such breakthroughs. It's perfectly believable that the programming skills of such a scientific developer may be incomplete, but I think it's fair to say that most supported and well-used programs are likely the effort of people with above-average programming skills and the interest and intent to support such programs. Indeed, I would argue that instead of being unhappy about the reliance on trusted programs and developers, the authors would better serve the scientific community by arguing that the authors of such programs should be better supported, and given access to tools and resources (ie, fund them) to improve their work rather than treat them as untrustworthy programmers. I should admit to being one such author of a "well-respected" and "trust" package for a very small scientific discipline, and with the proverbial "many citations etc" because of this. So I would admit to being just the sort of person the authors are unhappy about. I suspect many people on this mailing list are in the same category. I would like to think the trust and respect for certain packages have been earned, and that people use such packages because they are "known to work", both in the sense of actually having been tested on idealized cases, and in producing verifiable results in real cases (where "testing" would not always be possible).
Indeed, the small, decentralized group of scientific programmers that I work with (mostly trained as physicists, and learning to program in Fortran -- some of us still use mostly Fortran, in fact) do test and verify such codes, precisely because we know other people use them. Of course errors occur, and of course testing is important. Modern techniques like distributed version control and unit testing are very good tools to use. I agree they should be used more thoroughly, and that one should always be willing to question the results of a computer program. Then again, when was the last time I tested the correctness of results from my handheld HP calculator? Hmm, a very, very long time ago. That's software. I tend to believe the messages I read in my inbox are actually the message sent, and hardly ever do a checksum on it. But that's software. Indeed, all science is a social enterprise and so "trust", "reputation", and reliance on the literature (aka "past experience") are not merely unfortunate outcomes of laziness, but an important part of the process. I am certainly happy to support the notion that "more scientists should be able to program better", so I am not going to say the entire article is wrong, and I don't disagree with their main conclusions. But I think they have a fatal flaw in their assumptions and arguments. --Matt Newville From hasslerjc at comcast.net Tue May 28 17:52:39 2013 From: hasslerjc at comcast.net (John Hassler) Date: Tue, 28 May 2013 17:52:39 -0400 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> Message-ID: <51A52727.9080606@comcast.net> On 5/28/2013 4:58 PM, Matt Newville wrote: > Hi, > > As others have said, I find the low average programming skill level > among scientists frustrating, but I also found this article quite frustrating.
> > >From my perspective, the authors main complaint seems to be that there > is not enough independent checking of specialized scientific software > written by scientists. They seem particularly unhappy about the > tendency to use existing packages written by other scientists based on > "trust", "reputation", "previous citations" and without independent > checking. They also say: > > A "well-respected" end-user developer will almost certainly have > earned that respect > through scientific breakthroughs, perhaps not for their software > engineering skills > (although agreement on what constitutes "appropriate" scientific > software engineering > standards is still under debate). > > On this point in particular, and indeed in this whole line of > argument, I think the authors are misguided, perhaps even to the point > of fatality damaging their whole argument. I believe much more > common case is for the "well-respected" end-user developer to be known > for the programs written and supported, and less so for the scientific > breakthroughs (unless you count new programs as new instrumentation, > and so, well, breakthroughs, but it's pretty clear that the authors > are making a distinction). It's too often the case that spending > any significant time on such programs is career suicide, as it takes > time and attention away from such breakthroughs. It's perfectly > believable that the programming skills of such a scientific developer > may be incomplete, but I think it's fair to say that most supported > and well-used programs are likely the effort of people with > above-average programming skills and the interest and intent to > support such programs. 
Indeed, I would argue that instead of being > unhappy about the reliance on trusted programs and developers, the > authors would better serve the scientific community by arguing that > the authors of such programs should be better supported, and given > access to tools and resources (ie, fund them) to improve their work > rather than treat them as untrustworthy programmers. > > I should admit to being one such author of a "well-respected" and > "trust" package for a very small scientific discipline, and with the > proverbial "many citations etc" because of this. So I would admit to > being the just sort of person the authors are unhappy about. I > suspect many people on this mailing list are in the same category. I > would like to think the trust and respect for certain packages have > been earned, and that people use such packages because they are "known > to work", both in the sense of actually having been tested on > idealized cases, and in producing verifiable results in real cases > (where "testing" would not always be possible). Indeed, the small, > decentralized group of scientific programmers that I work with (mostly > trained as physicists, and learning to program in Fortran -- some of > us still use mostly Fortran, in fact) do test and verify such codes, > precisely because we know other people use them. Of course errors > occur, and of course testing is important. Modern techniques like > distributed version control and unit testing are very good tools to > use. I agree they should be used more thoroughly, and that one > should always be willing to question the results of a computer > program. > > Then again, when was the last time I tested the correctness of results > from my handheld HP calculator? Hmm, a very, very long time ago. > That's software. I tend to believe the messages I read in my inbox > are actually the message sent, and hardly ever do a checksum on it. > But that's software. 
Indeed, all science is a social enterprise and > so "trust", "reputation", and reliance on the literature (aka "past > experience") are not merely unfortunate outcomes of laziness, but an > important part of the process. > > I am certainly am happy to support the notion that "more scientists > should be able to program better", so I am not going to say the > entire article is wrong, and I don't disagree with their main > conclusions. But I think they have a fatal flaw in their assumptions > and arguments. > > --Matt Newville > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-use Exactly! There is actually a question here that hasn't been made explicit. For whom is this advice intended? There are all levels of programming/programmers in STEM. Some of my colleagues use Excel for everything. (As in, EVERYTHING.) Some fewer use Matlab. Still fewer use C/Fortran/Java/C#/whatever. So far as I know, I'm the one lone Pythonista. Each group uses programming differently. I've been programming for more than 50 years. I've taught programming to engineers in several contexts over the years. For a time, I really wanted to 'do it right.' (I even taught 'structured programming' and 'Warnier-Orr' at one point, but realized that it was worse than useless for the particular audience.) I've come to realize that most engineers just want an answer. They are not interested in how gracefully the answer was arrived at. MOST programs written by MOST engineers are small, short, simple, and intended to solve one problem one time. (The deficiency I've most often seen is the lack of error checking for the answer, and better programming techniques would not generally help much.) The problem is that nobody sets out to write a "well respected" program. Someone sets out to scratch a particular itch ('one problem one time'). It expands. Others find it useful. It becomes widely used. 
The original author, however, was solving his/her own particular problem, and was not at all interested in "proper" programming. So, I guess my question is, how do we find that person who is going to write the "well respected" program and convince him/her to take time out and learn proper programming first? Because we are certainly not going to convince everybody to do it. john From matthew.brett at gmail.com Tue May 28 18:05:37 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 28 May 2013 15:05:37 -0700 Subject: [SciPy-User] peer review of scientific software In-Reply-To: <51A52727.9080606@comcast.net> References: <51A39B6D.4030607@gmail.com> <51A52727.9080606@comcast.net> Message-ID: Hi, On Tue, May 28, 2013 at 2:52 PM, John Hassler wrote: > > On 5/28/2013 4:58 PM, Matt Newville wrote: >> I am certainly am happy to support the notion that "more scientists >> should be able to program better", so I am not going to say the >> entire article is wrong, and I don't disagree with their main >> conclusions. But I think they have a fatal flaw in their assumptions >> and arguments. >> >> --Matt Newville >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-use > > Exactly! There is actually a question here that hasn't been made > explicit. For whom is this advice intended? There are all levels of > programming/programmers in STEM. Some of my colleagues use Excel for > everything. (As in, EVERYTHING.) Some fewer use Matlab. Still fewer > use C/Fortran/Java/C#/whatever. So far as I know, I'm the one lone > Pythonista. Each group uses programming differently. > > I've been programming for more than 50 years. I've taught programming > to engineers in several contexts over the years. For a time, I really > wanted to 'do it right.' 
(I even taught 'structured programming' and > 'Warnier-Orr' at one point, but realized that it was worse than useless > for the particular audience.) I've come to realize that most engineers > just want an answer. They are not interested in how gracefully the > answer was arrived at. MOST programs written by MOST engineers are > small, short, simple, and intended to solve one problem one time. (The > deficiency I've most often seen is the lack of error checking for the > answer, and better programming techniques would not generally help much.) > > The problem is that nobody sets out to write a "well respected" > program. Someone sets out to scratch a particular itch ('one problem > one time'). It expands. Others find it useful. It becomes widely > used. The original author, however, was solving his/her own particular > problem, and was not at all interested in "proper" programming. So, I > guess my question is, how do we find that person who is going to write > the "well respected" program and convince him/her to take time out and > learn proper programming first? Because we are certainly not going to > convince everybody to do it. You might find this reference interesting: Basili, Victor R., et al. "Understanding the High-Performance-Computing Community." (2008). I found it from the Joppa article: http://blog.nipy.org/science-joins-software.html The take home message seems to be - "we tell scientists to use our fancy stuff, they tell us no, and now we realize they were often right". That article is about high-level programming tools, but it must be entirely different for version control, testing, and code review in particular. I believe these tools are very fundamental in controlling error. The point about error is the central one, for me.
As I proceed further down my scientific career, I slowly begin to realize the number of errors we make, and how easy we find it to miss them: http://blog.nipy.org/unscientific-programming.html That, for me, is the key argument - we will make fewer mistakes and do better science if we use the basic tools to help us control error and to help others find our errors. Most scientists (myself included) tend to believe this error is not very important. I believe that is wrong, but as scientists we don't believe everything we think, and so we need data. I wonder how we should get it... Cheers, Matthew From josef.pktd at gmail.com Tue May 28 18:16:23 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 28 May 2013 18:16:23 -0400 Subject: [SciPy-User] peer review of scientific software In-Reply-To: <51A52727.9080606@comcast.net> References: <51A39B6D.4030607@gmail.com> <51A52727.9080606@comcast.net> Message-ID: On Tue, May 28, 2013 at 5:52 PM, John Hassler wrote: > > On 5/28/2013 4:58 PM, Matt Newville wrote: >> Hi, >> >> As others have said, I find the low average programming skill level >> among scientists frustrating, but I also found this article quite >> frustrating. >> >> From my perspective, the authors main complaint seems to be that there >> is not enough independent checking of specialized scientific software >> written by scientists. They seem particularly unhappy about the >> tendency to use existing packages written by other scientists based on >> "trust", "reputation", "previous citations" and without independent >> checking. They also say: >> >> A "well-respected" end-user developer will almost certainly have >> earned that respect >> through scientific breakthroughs, perhaps not for their software >> engineering skills >> (although agreement on what constitutes "appropriate" scientific >> software engineering >> standards is still under debate).
>> >> On this point in particular, and indeed in this whole line of >> argument, I think the authors are misguided, perhaps even to the point >> of fatality damaging their whole argument. I believe much more >> common case is for the "well-respected" end-user developer to be known >> for the programs written and supported, and less so for the scientific >> breakthroughs (unless you count new programs as new instrumentation, >> and so, well, breakthroughs, but it's pretty clear that the authors >> are making a distinction). It's too often the case that spending >> any significant time on such programs is career suicide, as it takes >> time and attention away from such breakthroughs. It's perfectly >> believable that the programming skills of such a scientific developer >> may be incomplete, but I think it's fair to say that most supported >> and well-used programs are likely the effort of people with >> above-average programming skills and the interest and intent to >> support such programs. Indeed, I would argue that instead of being >> unhappy about the reliance on trusted programs and developers, the >> authors would better serve the scientific community by arguing that >> the authors of such programs should be better supported, and given >> access to tools and resources (ie, fund them) to improve their work >> rather than treat them as untrustworthy programmers. >> >> I should admit to being one such author of a "well-respected" and >> "trust" package for a very small scientific discipline, and with the >> proverbial "many citations etc" because of this. So I would admit to >> being the just sort of person the authors are unhappy about. I >> suspect many people on this mailing list are in the same category. 
I >> would like to think the trust and respect for certain packages have >> been earned, and that people use such packages because they are "known >> to work", both in the sense of actually having been tested on >> idealized cases, and in producing verifiable results in real cases >> (where "testing" would not always be possible). Indeed, the small, >> decentralized group of scientific programmers that I work with (mostly >> trained as physicists, and learning to program in Fortran -- some of >> us still use mostly Fortran, in fact) do test and verify such codes, >> precisely because we know other people use them. Of course errors >> occur, and of course testing is important. Modern techniques like >> distributed version control and unit testing are very good tools to >> use. I agree they should be used more thoroughly, and that one >> should always be willing to question the results of a computer >> program. >> >> Then again, when was the last time I tested the correctness of results >> from my handheld HP calculator? Hmm, a very, very long time ago. >> That's software. I tend to believe the messages I read in my inbox >> are actually the message sent, and hardly ever do a checksum on it. >> But that's software. Indeed, all science is a social enterprise and >> so "trust", "reputation", and reliance on the literature (aka "past >> experience") are not merely unfortunate outcomes of laziness, but an >> important part of the process. >> >> I am certainly am happy to support the notion that "more scientists >> should be able to program better", so I am not going to say the >> entire article is wrong, and I don't disagree with their main >> conclusions. But I think they have a fatal flaw in their assumptions >> and arguments. >> >> --Matt Newville >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-use > > Exactly! 
There is actually a question here that hasn't been made > explicit. For whom is this advice intended? There are all levels of > programming/programmers in STEM. Some of my colleagues use Excel for > everything. (As in, EVERYTHING.) Some fewer use Matlab. Still fewer > use C/Fortran/Java/C#/whatever. So far as I know, I'm the one lone > Pythonista. Each group uses programming differently. > > I've been programming for more than 50 years. I've taught programming > to engineers in several contexts over the years. For a time, I really > wanted to 'do it right.' (I even taught 'structured programming' and > 'Warnier-Orr' at one point, but realized that it was worse than useless > for the particular audience.) I've come to realize that most engineers > just want an answer. They are not interested in how gracefully the > answer was arrived at. MOST programs written by MOST engineers are > small, short, simple, and intended to solve one problem one time. (The > deficiency I've most often seen is the lack of error checking for the > answer, and better programming techniques would not generally help much.) > > The problem is that nobody sets out to write a "well respected" > program. Someone sets out to scratch a particular itch ('one problem > one time'). It expands. Others find it useful. It becomes widely > used. The original author, however, was solving his/her own particular > problem, and was not at all interested in "proper" programming. So, I > guess my question is, how do we find that person who is going to write > the "well respected" program and convince him/her to take time out and > learn proper programming first? Because we are certainly not going to > convince everybody to do it. > > john I had the same impression as Matt about the article, but his writing is clearer than my thinking. For statistics and econometrics (and some economics), there are researchers who develop tools and some who write tools and sometimes they are the same.
R, Stata, SAS, and Matlab have support for user contributions, journals, conferences, distribution channels. Developers of new algorithms, statistical tests or estimators have an incentive to see that the code goes to potential users because it boosts adoption and with it the number of citations. Some examples of open source code, possibly without source control, unit tests, or a license: http://ideas.repec.org/s/boc/bocode.html http://www.feweb.vu.nl/econometriclinks/software.html#GAUSS http://www.unc.edu/~jbhill/Gauss_by_code.htm Alan Isaac had a Gauss program page, but I cannot find it anymore. An example is bocode and the Stata Journal: Stata is very good in supporting user code with peer review on the mailing lists (besides the articles), and if everybody else is using it, then it must be "correct". Josef > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From wccarithers at lbl.gov Tue May 28 18:44:05 2013 From: wccarithers at lbl.gov (Bill Carithers) Date: Tue, 28 May 2013 15:44:05 -0700 Subject: [SciPy-User] peer review of scientific software In-Reply-To: Message-ID: As a scientist who spends the majority of my time writing code to analyze data, I've found this discussion fascinating. Early in my career, I actually coded in assembly language (remember index registers?), then Fortran for a couple of decades, then bit the bullet and moved to object-oriented languages (Java, Python, Objective C). Now I use mostly Python. I hope the following comments from this perspective will be useful. 1. There is no "one size fits all". Sometimes I use Python as a BASIC-style calculator, sometimes Python as procedural like Fortran, most of the time Python as fully OO. The level of documentation, testing, and version control needs to be tailored to the problem. 2.
In terms of getting scientists into the "modern world" of writing maintainable, reusable code, I think the most useful tool is a really good IDE. Then much of the documentation, version control, and debugging tools are seamlessly there. I don't think I could have written acceptable Java without Eclipse, and I'm absolutely positive that I couldn't write Objective C without Xcode. I use IDLE for Python, but it is nowhere near the level of these others. Hope these help and keep up the good work, Bill On 5/28/13 3:05 PM, "Matthew Brett" wrote: > Hi, > > On Tue, May 28, 2013 at 2:52 PM, John Hassler wrote: >> >> On 5/28/2013 4:58 PM, Matt Newville wrote: > > > >>> I am certainly am happy to support the notion that "more scientists >>> should be able to program better", so I am not going to say the >>> entire article is wrong, and I don't disagree with their main >>> conclusions. But I think they have a fatal flaw in their assumptions >>> and arguments. >>> >>> --Matt Newville >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-use >> >> Exactly! There is actually a question here that hasn't been made >> explicit. For whom is this advice intended? There are all levels of >> programming/programmers in STEM. Some of my colleagues use Excel for >> everything. (As in, EVERYTHING.) Some fewer use Matlab. Still fewer >> use C/Fortran/Java/C#/whatever. So far as I know, I'm the one lone >> Pythonista. Each group uses programming differently. >> >> I've been programming for more than 50 years. I've taught programming >> to engineers in several contexts over the years. For a time, I really >> wanted to 'do it right.' (I even taught 'structured programming' and >> 'Warnier-Orr' at one point, but realized that it was worse than useless >> for the particular audience.) I've come to realize that most engineers >> just want an answer.
They are not interested in how gracefully the >> answer was arrived at. MOST programs written by MOST engineers are >> small, short, simple, and intended to solve one problem one time. (The >> deficiency I've most often seen is the lack of error checking for the >> answer, and better programming techniques would not generally help much.) >> >> The problem is that nobody sets out to write a "well respected" >> program. Someone sets out to scratch a particular itch ('one problem >> one time'). It expands. Others find it useful. It becomes widely >> used. The original author, however, was solving his/her own particular >> problem, and was not at all interested in "proper" programming. So, I >> guess my question is, how do we find that person who is going to write >> the "well respected" program and convince him/her to take time out and >> learn proper programming first? Because we are certainly not going to >> convince everybody to do it. > > You might find this reference interesting : > > Basili, Victor R., et al. "Understanding the > High-Performance-Computing Community." (2008). > > I found it from the Joppa article : > http://blog.nipy.org/science-joins-software.html > > The take home message seems to be - "we tell scientists to use our > fancy stuff, they tell us no, and now we realize they were often > right". > > That article is about high-level programming tools, but it must be > entirely different for version control, testing, code review, in > particular. I believe these tools are very fundamental in controlling > error. > > The point about error is the central, for me. As I proceed further > down my scientific career, I slowly begin to realize the number of > errors we make, and how easy we find it to miss them: > > http://blog.nipy.org/unscientific-programming.html > > That, for me, is the key argument - we will make fewer mistakes and do > better science if we use the basic tools to help us control error and > to help others find our errors. 
> > Most scientists (myself included) tend to believe this error is not > very important. > > I believe that's is wrong, but as scientists we don't believe > everything we think, and so we need data. I wonder how we should get > it... > > Cheers, > > Matthew > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From matthew.brett at gmail.com Tue May 28 19:23:20 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 28 May 2013 16:23:20 -0700 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: Message-ID: Hi, On Tue, May 28, 2013 at 3:44 PM, Bill Carithers wrote: > As a scientist who spends the majority of my time writing code to analyze > data, I've found this discussion fascinating. Early in my career, I actually > coded in assembly language (remember index registers?), then Fortran for a > couple of decades, then biting the bullet and moving to object-oriented > languages (Java, Python, Objective C). Now I use mostly Python. I hope the > following comments from this perspective will be useful. Yes, thanks for sending it. > 1. There is no "one size fits all" . Sometimes I use Python as a BASIC-style > calculator, sometimes Python as procedural like Fortran, most of the time > Python as fully OO. The level of documentation, testing, and version control > need to be tailored to the problem. That's true, but I personally use version control for everything. Whenever I'm writing more than a few lines of code I start feeling uncomfortable if it's not somewhere in version control and it's not tested. Version control is so easy I do remember to do that. Testing is hard and annoying, I sometimes press on and almost invariably regret it. 
I think that discomfort - the feeling that I'm setting myself up for future problems if I don't do this stuff - is what I'd like to be able to teach the next generation of scientists so they can do a better job than we did. I'm still struggling with how to do that. > 2. In terms of getting scientists into the "modern world" of writing > maintainable, re-useable code, I think the most useful tool is a really good > IDE. Then much of the documentation, version control, de-bugging tools are > seamlessly there. I don't think I could have written acceptable Java without > Eclipse, and I'm absolutely positive that I couldn't write Objective C > without Xcode. I use IDLE for Python, but it is no where near the level of > these others. I'm a scientist, I've never taken a course in programming or CS. I've written a lot of code in Matlab and Python. I very occasionally used the Matlab IDE, for debugging, but I've never used a Python IDE. My typical workflow is text editor, nosetests from terminal, IPython console in a terminal to try stuff out. I feel this helps me think more clearly - it separates the editing world from the testing world and the version control world. But I might well be wrong about that. It seems to me there's a constant and difficult tension between making it easy and making it easier to think. Cheers, Matthew From pjabardo at yahoo.com.br Tue May 28 22:18:29 2013 From: pjabardo at yahoo.com.br (Paulo Jabardo) Date: Tue, 28 May 2013 19:18:29 -0700 (PDT) Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> Message-ID: <1369793909.71134.YahooMailNeo@web121502.mail.ne1.yahoo.com> I'm an engineer working in research but I spend a good deal of time coding. What I've seen with most of my colleagues and friends is that they will only code whenever it is extremely necessary for an immediate application in an experiment or for their PhD. 
The problem starts very early: when I was beginning my studies, we were taught C (and that is still the case almost 20 years later). A small percentage of the students (10%?) enjoy programming and they will profit. I really loved pointers and doing neat tricks. For the rest it was torture, plain and simple torture. And completely useless. Most students couldn't do anything useful with programming. All their suffering was for nothing. What happened later was obvious: they would avoid programming at all costs and if they had to do something they would use MS-Excel. The spreadsheets I've seen... I still have nightmares. The things they accomplished humble me, prove that I'm a lower being. I've seen people solve partial differential equations where each cell was an element in the solution and it was colored according to the result. Beautiful but I'd rather suffer acute physical pain than to do something like that, or worse, debug such a "program". By the way, this sort of application was not a joke or a neat hack, it was actually the only way those guys knew how to solve a problem. 15 years later... I have a physics undergraduate student working with me. Very smart and interested. They still learn C and later on when they need to do something, what is it they do? Most professors use Origin. A huge improvement over Excel, but still. A couple of months ago, he had to turn in a report and since we don't have Origin, he was using Excel. I kind of felt sorry for him and I helped him out to do it in Python. He couldn't believe it. I did my Masters and PhD in CFD. Most other students had almost no background in programming and did most things using Excel! When they had to modify some code, it was almost by accident that things worked. You can imagine what sort of code comes out of this. The professors didn't know programming much better. Just getting them to understand the concept of version control took a while.
In my opinion, if schools taught, at the beginning, something like Python/Octave/R instead of C, students would be able to use this knowledge easily and productively throughout their courses and eventually learn C when they really needed it. Paulo ________________________________ From: Calvin Morrison To: SciPy Users List Sent: Tuesday, 28 May 2013 15:23 Subject: Re: [SciPy-User] peer review of scientific software On 27 May 2013 13:44, Alan G Isaac wrote: >http://www.sciencemag.org/content/340/6134/814.summary Maybe I can use this as a ranting point. As a background I am a programmer, but I have been hired by various professors and other academic people to work on projects. My biggest problem with the "scientific computing" community is having to put up with this crap! I am so sick of reading a paper that will exactly fulfill my needs, only to find out the software has disappeared, only works on HP-UX on one computer in the world, or has absolutely zero documentation. I've started pushing my lab to follow standards, use version control, document our changes, and publish our code. So far it is going really well, but I can only do so much. If only all students had a few basic "proper programming practices" courses, everything would go a lot more smoothly. Teaching programming is fine, but why don't we teach our scientists the best way to collaborate in the 21st century? Below is another related paper that is a good starting point for converting users. Enough emailing tarballs back and forth! Enough undocumented code! Enough is Enough! http://arxiv.org/pdf/1210.0530v3.pdf Pissed-off Scientific Programmer, Calvin Morrison _______________________________________________ SciPy-User mailing list SciPy-User at scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user -------------- next part -------------- An HTML attachment was scrubbed...
URL: From matthew.brett at gmail.com Tue May 28 22:34:26 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 28 May 2013 19:34:26 -0700 Subject: [SciPy-User] peer review of scientific software In-Reply-To: <1369793909.71134.YahooMailNeo@web121502.mail.ne1.yahoo.com> References: <51A39B6D.4030607@gmail.com> <1369793909.71134.YahooMailNeo@web121502.mail.ne1.yahoo.com> Message-ID: Hi, On Tue, May 28, 2013 at 7:18 PM, Paulo Jabardo wrote: > I'm an engineer working in research but I spend a good deal of time coding. > What I've seen with most of my colleagues and friends is that they will only > code whenever it is extremely necessary for an immediate application in an > experiment or for their PhD. The problem starts very early, when I was > beginning my studies, we were taught C (and that is still the case almost 20 > years later). A small percentage of the students (10%?) enjoy programming > and they will profit. I really loved pointers and doing neat tricks. For the > rest it was torture, plain and simple torture. And completely useless. Most > students couldn't do anything useful with programming. All their suffering > was for nothing. What happened later was obvious: they would avoid > programming at all costs and if they had to do something they would use > MS-Excel. The spreadsheets I've seen... I still have nightmares. The things > they accomplished humbles me, proves that I'm a lower being. I've seen > people solve partial differential equations where each cell was an element > in the solution and it was colored according to the result. Beautiful but > I'd rather suffer accute physical pain than to do something like that, or > worse, debug such a "program". By the way, this sort of application was not > a joke or a neat hack, it was actually the only way those guys knew how to > solve a problem. > > 15 years later... I have a physics undergraduate student working with me. > Very smart and interested. 
They still learn C and later on when they need to > do something, what is it they do? Most professors use Origin. A huge > improvement over Excel, but still. A couple of months ago, he had to turn in > a report and since we don't have Origin, he was using Excel. I kind of felt > sorry for him and I helped him out to do it in Python. He couldn't believe > it. Oh - dear; you probably saw this stuff? http://blog.stodden.net/2013/04/19/what-the-reinhart-rogoff-debacle-really-shows-verifying-empirical-results-needs-to-be-routine/ > I did my Masters and PhD in CFD. Most other students had almost no > background in programming and did most things using Excel! When they had to > modify some code, it was almost by accident that things worked. You can > imagine what sort of code comes out of this. The professors didn't know > programming much better. Just getting them to understand the concept of > version control took a while. > > In my opinion, If schools taught, at the begining, something like > Python/Octave/R instead of C, students would be able to use this knowledge > easily and productively throughout their courses and eventually learn C when > they really needed it. That's surely one of the big arguments for Python - it is a great first language, and it is capable across a wider range than Octave or R - or even Excel :) Cheers, Matthew From bjorn.madsen at operationsresearchgroup.com Wed May 29 02:25:54 2013 From: bjorn.madsen at operationsresearchgroup.com (Bjorn Madsen) Date: Wed, 29 May 2013 07:25:54 +0100 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: Message-ID: +1 to Bill Carithers. On 28 May 2013 23:44, Bill Carithers wrote: > As a scientist who spends the majority of my time writing code to analyze > data, I've found this discussion fascinating. 
Early in my career, I > actually > coded in assembly language (remember index registers?), then Fortran for a > couple of decades, then biting the bullet and moving to object-oriented > languages (Java, Python, Objective C). Now I use mostly Python. I hope the > following comments from this perspective will be useful. > > 1. There is no "one size fits all" . Sometimes I use Python as a > BASIC-style > calculator, sometimes Python as procedural like Fortran, most of the time > Python as fully OO. The level of documentation, testing, and version > control > need to be tailored to the problem. > > 2. In terms of getting scientists into the "modern world" of writing > maintainable, re-useable code, I think the most useful tool is a really > good > IDE. Then much of the documentation, version control, de-bugging tools are > seamlessly there. I don't think I could have written acceptable Java > without > Eclipse, and I'm absolutely positive that I couldn't write Objective C > without Xcode. I use IDLE for Python, but it is no where near the level of > these others. > > Hope these help and keep up the good work, > Bill > > > On 5/28/13 3:05 PM, "Matthew Brett" wrote: > > > Hi, > > > > On Tue, May 28, 2013 at 2:52 PM, John Hassler > wrote: > >> > >> On 5/28/2013 4:58 PM, Matt Newville wrote: > > > > > > > >>> I am certainly am happy to support the notion that "more scientists > >>> should be able to program better", so I am not going to say the > >>> entire article is wrong, and I don't disagree with their main > >>> conclusions. But I think they have a fatal flaw in their assumptions > >>> and arguments. > >>> > >>> --Matt Newville > >>> _______________________________________________ > >>> SciPy-User mailing list > >>> SciPy-User at scipy.org > >>> http://mail.scipy.org/mailman/listinfo/scipy-use > >> > >> Exactly! There is actually a question here that hasn't been made > >> explicit. For whom is this advice intended? 
There are all levels of > >> programming/programmers in STEM. Some of my colleagues use Excel for > >> everything. (As in, EVERYTHING.) Some fewer use Matlab. Still fewer > >> use C/Fortran/Java/C#/whatever. So far as I know, I'm the one lone > >> Pythonista. Each group uses programming differently. > >> > >> I've been programming for more than 50 years. I've taught programming > >> to engineers in several contexts over the years. For a time, I really > >> wanted to 'do it right.' (I even taught 'structured programming' and > >> 'Warnier-Orr' at one point, but realized that it was worse than useless > >> for the particular audience.) I've come to realize that most engineers > >> just want an answer. They are not interested in how gracefully the > >> answer was arrived at. MOST programs written by MOST engineers are > >> small, short, simple, and intended to solve one problem one time. (The > >> deficiency I've most often seen is the lack of error checking for the > >> answer, and better programming techniques would not generally help > much.) > >> > >> The problem is that nobody sets out to write a "well respected" > >> program. Someone sets out to scratch a particular itch ('one problem > >> one time'). It expands. Others find it useful. It becomes widely > >> used. The original author, however, was solving his/her own particular > >> problem, and was not at all interested in "proper" programming. So, I > >> guess my question is, how do we find that person who is going to write > >> the "well respected" program and convince him/her to take time out and > >> learn proper programming first? Because we are certainly not going to > >> convince everybody to do it. > > > > You might find this reference interesting : > > > > Basili, Victor R., et al. "Understanding the > > High-Performance-Computing Community." (2008). 
> > > > I found it from the Joppa article : > > http://blog.nipy.org/science-joins-software.html > > > > The take home message seems to be - "we tell scientists to use our > > fancy stuff, they tell us no, and now we realize they were often > > right". > > > > That article is about high-level programming tools, but it must be > > entirely different for version control, testing, code review, in > > particular. I believe these tools are very fundamental in controlling > > error. > > > > The point about error is the central, for me. As I proceed further > > down my scientific career, I slowly begin to realize the number of > > errors we make, and how easy we find it to miss them: > > > > http://blog.nipy.org/unscientific-programming.html > > > > That, for me, is the key argument - we will make fewer mistakes and do > > better science if we use the basic tools to help us control error and > > to help others find our errors. > > > > Most scientists (myself included) tend to believe this error is not > > very important. > > > > I believe that's is wrong, but as scientists we don't believe > > everything we think, and so we need data. I wonder how we should get > > it... > > > > Cheers, > > > > Matthew > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Bjorn Madsen *Researcher Complex Systems Research* Ph.: (+44) 0 7792 030 720 bjorn.madsen at operationsresearchgroup.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From msuzen at gmail.com Wed May 29 06:32:02 2013 From: msuzen at gmail.com (Suzen, Mehmet) Date: Wed, 29 May 2013 12:32:02 +0200 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> Message-ID: On 28 May 2013 20:23, Calvin Morrison wrote: > > > > On 27 May 2013 13:44, Alan G Isaac wrote: >> >> >> http://www.sciencemag.org/content/340/6134/814.summary > > It is very ironic that someone from Microsoft, a closed source software company, demands that others release their source code: ".. need for open access to software ..even the most basic step of making source code available upon publication.". I don't know, maybe Microsoft Research publishes all the source code they use/develop in their research output; then it would make sense, but I highly doubt it due to patents etc. > Maybe I can use this as a ranting point. > > As a background I am a programmer, but I have been hired by various > professors and other academic people to work on projects. My biggest problem > with the "scientific computing" community is having to put up with this > crap! Just a remark: software is not the subject of scientific research, at least in the computational physical sciences if not everywhere. It's a tool and infrastructure. NO clever software development practice would give you a good scientific output. It can improve the efficiency/correctness greatly. However, implementing the wrong equation, for example, or an n-body simulation code that does not conserve momentum by construction, cannot be detected by even the highest-quality software development life cycle. So scientific programmers should think about this as well before cursing professors and academic people.
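The n-body example cuts both ways, though: a conservation law is itself an independently known case, so a test can encode the physics as well as the software. A minimal sketch (the leapfrog stepper, masses, and tolerances below are illustrative, not taken from any package):

```python
import numpy as np

def accelerations(pos, mass):
    """Pairwise gravitational accelerations (G = 1), written out naively."""
    acc = np.zeros_like(pos)
    for i in range(len(mass)):
        for j in range(len(mass)):
            if i != j:
                r = pos[j] - pos[i]
                acc[i] += mass[j] * r / np.linalg.norm(r) ** 3
    return acc

def leapfrog_step(pos, vel, mass, dt):
    """One kick-drift-kick step; pairwise-symmetric forces conserve momentum."""
    vel = vel + 0.5 * dt * accelerations(pos, mass)
    pos = pos + dt * vel
    vel = vel + 0.5 * dt * accelerations(pos, mass)
    return pos, vel

# The test encodes the physics, not just the software: total momentum of
# an isolated system must stay constant to rounding error.
mass = np.array([1.0, 2.0, 0.5])
pos = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.5]])
vel = np.array([[0.1, 0.0], [0.0, 0.3], [-0.2, 0.0]])
p_start = (mass[:, None] * vel).sum(axis=0)
for _ in range(100):
    pos, vel = leapfrog_step(pos, vel, mass, dt=1e-3)
p_end = (mass[:, None] * vel).sum(axis=0)
assert np.allclose(p_start, p_end, atol=1e-12), "momentum not conserved"
```

A stepper with a sign error in the force, or a non-symmetric force law, fails this assertion immediately, so the scientific knowledge and the unit test are not separate things here.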
From lists at hilboll.de Wed May 29 06:36:47 2013 From: lists at hilboll.de (Andreas Hilboll) Date: Wed, 29 May 2013 12:36:47 +0200 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> Message-ID: <51A5DA3F.6090109@hilboll.de> On 29.05.2013 12:32, Suzen, Mehmet wrote: > On 28 May 2013 20:23, Calvin Morrison wrote: >> >> >> >> On 27 May 2013 13:44, Alan G Isaac wrote: >>> >>> >>> http://www.sciencemag.org/content/340/6134/814.summary >> >> > > It is very ironic that someone from Microsoft, a closed source > software company, > demands others to release their source code. ".. need for open access > to software > ..even the most basic step of making source code available upon > publication.". I don't > know maybe Microsoft Research publishes all the source code they > use/developed in their research > output, so then it makes sense but I highly doubt it though due to patents etc. > > >> Maybe I can use this as a ranting point. >> >> As a background I am a programmer, but I have been hired by various >> professors and other academic people to work on projects. My biggest problem >> with the "scientific computing" community is having to put up with this >> crap! > > Just a remark: software is not the subject of scientific research at > least in computational > physical sciences if not all. It's a tool and infrastructure. NO > clever software development practice would give you a good > scientific output. It can improve the efficiency/correctness greatly. > However, for example implementing wrong equation, > n-body simulation code that do not conserve momentum by construction > or similar can not be detected by the highest quality > software development life cycle. So scientific programmers should > think about this as well before cursing to professors and academic > people. Mehmet, you're correct that version control doesn't protect from wrongly implemented equations. 
However, proper testing of the code you write *does*. If you implement an equation, and implement a test case in which you test the output of this function for a number of well-known cases (computed independently), you'll immediately see that you have a problem. This being said, it is true that it requires hard work to actually stick to these good practices in everyday life with deadlines and supervisors who haven't grown up with these good practices and therefore don't want to accept that they do require a front-up time investment. -- Andreas. From msuzen at gmail.com Wed May 29 06:43:55 2013 From: msuzen at gmail.com (Suzen, Mehmet) Date: Wed, 29 May 2013 12:43:55 +0200 Subject: [SciPy-User] peer review of scientific software In-Reply-To: <51A5DA3F.6090109@hilboll.de> References: <51A39B6D.4030607@gmail.com> <51A5DA3F.6090109@hilboll.de> Message-ID: On 29 May 2013 12:36, Andreas Hilboll wrote: > implemented equations. However, proper testing of the code you write > *does*. If you implement an equation, and implement a test case in which I agree with you that unit testing or functional testing helps greatly to catch problems and there is a lack of appreciating these good practices from "old-school" computational scientists. But, you need to design your "proper test" and obtain cases by your scientific knowledge. A sole unit test can *not* catch your scientific error alone. But probably it is an art of mixing two; good software development practice and good scientific knowledge. From hasslerjc at comcast.net Wed May 29 09:56:48 2013 From: hasslerjc at comcast.net (John Hassler) Date: Wed, 29 May 2013 09:56:48 -0400 Subject: [SciPy-User] peer review of scientific software In-Reply-To: References: <51A39B6D.4030607@gmail.com> <51A5DA3F.6090109@hilboll.de> Message-ID: <51A60920.5060207@comcast.net> On 5/29/2013 6:43 AM, Suzen, Mehmet wrote: > On 29 May 2013 12:36, Andreas Hilboll wrote: >> implemented equations. However, proper testing of the code you write >> *does*. 
If you implement an equation, and implement a test case in which > I agree with you that unit testing or functional testing helps greatly to catch > problems and there is a lack of appreciating these good practices from > "old-school" computational > scientists. But, you need to design your "proper test" and obtain > cases by your scientific knowledge. > A sole unit test can *not* catch your scientific error alone. But > probably it is an art of mixing two; good > software development practice and good scientific knowledge. > _______________________________________________ The thing that made me uncomfortable with the original article was that all of the emphasis was on the program. For most of us, the program is not the goal. Neither is the answer, in itself. We want to DO something with it. "The purpose of computing is insight, not numbers." (Hamming) When I was teaching, one of the things I emphasized (with, I'm afraid, only modest success) was "error checking." Just because the computer spit out an answer (to 14 fig.) doesn't mean it's correct. How, in fact, DO you know your answer is correct? If you're a student, you hand it in and see if it comes back covered in red marks. But what if you are working on a real job for real money, and you have a real problem and not a 'solution kit' (Shockley)? THEN how do you know your answer is correct? Short answer: You don't. The best you can do is to use whatever checks and methods you can find to reduce the chances that there is an error of some sort. Proper programming methods can help, but in my experience, programming errors are not the major source of errors in real life(TM). A couple of examples: A certain thermodynamics text has an elaborate example involving extensive calculations. The equations are correct. The solution of the equations is correct. The answer *to the problem* is wrong. The problem as the author set it up is not well posed, and the computation has lost all significance by the end. 
Yet each single step is correct, and would pass any reasonable unit test. The infamous Mars Orbiter "metric mixup" was not really a programming problem. Both programs were correct, and no tests applied to either single program would have found a problem. (After the launch, there were warning flags that something was amiss. They were ignored.) Obviously, a correct program is NECESSARY for a correct answer. Equally obviously, it is not SUFFICIENT. So, if you are an engineer/scientist who desperately needs an answer to a problem, and who greets the prospect of programming with the joy usually reserved for root canal work, where will you put your effort? I like programming. I've studied it. I've put a lot of effort into learning to 'do it right.' You've never heard of me. Maybe ... if I'd put more effort into the 'engineering' and less into the 'programming' ... maybe .. I coulda been a contender. john From matthew.brett at gmail.com Wed May 29 16:26:05 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 29 May 2013 13:26:05 -0700 Subject: [SciPy-User] peer review of scientific software In-Reply-To: <51A60920.5060207@comcast.net> References: <51A39B6D.4030607@gmail.com> <51A5DA3F.6090109@hilboll.de> <51A60920.5060207@comcast.net> Message-ID: Hi, On Wed, May 29, 2013 at 6:56 AM, John Hassler wrote: > > On 5/29/2013 6:43 AM, Suzen, Mehmet wrote: >> On 29 May 2013 12:36, Andreas Hilboll wrote: >>> implemented equations. However, proper testing of the code you write >>> *does*. If you implement an equation, and implement a test case in which >> I agree with you that unit testing or functional testing helps greatly to catch >> problems and there is a lack of appreciating these good practices from >> "old-school" computational >> scientists. But, you need to design your "proper test" and obtain >> cases by your scientific knowledge. >> A sole unit test can *not* catch your scientific error alone. 
But >> probably it is an art of mixing two; good >> software development practice and good scientific knowledge. >> _______________________________________________ > > The thing that made me uncomfortable with the original article was that > all of the emphasis was on the program. For most of us, the program is > not the goal. Neither is the answer, in itself. We want to DO > something with it. "The purpose of computing is insight, not numbers." > (Hamming) > > When I was teaching, one of the things I emphasized (with, I'm afraid, > only modest success) was "error checking." Just because the computer > spit out an answer (to 14 fig.) doesn't mean it's correct. How, in fact, > DO you know your answer is correct? > > If you're a student, you hand it in and see if it comes back covered in > red marks. But what if you are working on a real job for real money, > and you have a real problem and not a 'solution kit' (Shockley)? THEN > how do you know your answer is correct? > > Short answer: You don't. Right - but the key thing is - that we work to show that we are correct - as far as we can. With the emphasis on 'show'. I think this is what Donoho means in the second quote here: http://blog.nipy.org/ubiquity-of-error.html That's tests for code correctness and more - a way of thinking 'how could this be wrong?' - 'how could I check if it was wrong?' and 'how can I show you that this may not be wrong?'. This is the process Donoho calls "rooting out error". I don't think that instinct is universal at the moment.
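One concrete form of "how could I check if it was wrong?" is to compute the same number by two independent routes and demand agreement. A toy sketch (the integral and tolerances are chosen purely for illustration):

```python
import numpy as np

def trapezoid(y, x):
    """Plain trapezoid rule, written out so nothing is hidden."""
    return float(np.sum((y[:-1] + y[1:]) * np.diff(x)) / 2.0)

# Route 1: numerical integration of sin(x) on [0, pi].
x_fine = np.linspace(0.0, np.pi, 10001)
numeric = trapezoid(np.sin(x_fine), x_fine)

# Route 2: the closed form -- the integral of sin from 0 to pi is exactly 2.
exact = 2.0
fine_err = abs(numeric - exact)

# A third, independent check: halving the resolution should grow the error
# roughly 4x, because the trapezoid rule is second order.
x_coarse = np.linspace(0.0, np.pi, 5001)
coarse_err = abs(trapezoid(np.sin(x_coarse), x_coarse) - exact)

assert fine_err < 1e-7
assert 3.0 < coarse_err / fine_err < 5.0  # consistent with 2nd-order convergence
```

Neither check proves the answer right, but each is a different way the computation could have been shown wrong, which is the point.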
You may have seen the recent evidence that most studies do not replicate in Cancer research: http://pipeline.corante.com/archives/2012/03/29/sloppy_science.php There's been a lot of discussion for the life sciences in general about this: http://pps.sagepub.com/content/7/6/543.short http://jeps.efpsa.org/blog/2013/01/30/replication-studies-its-time-to-clean-up-your-act-psychologists/ http://www.plosmedicine.org/article/info:doi/10.1371/journal.pmed.0020124 http://www.nature.com/nrn/journal/v14/n5/abs/nrn3475.html So, yes, it's better for our careers to get stuff out, but it may well be worse for science in general. Cheers, Matthew From opit at hardcore.lt Thu May 30 12:23:11 2013 From: opit at hardcore.lt (astrolitterbox) Date: Thu, 30 May 2013 16:23:11 +0000 (UTC) Subject: [SciPy-User] ndimage.zoom for array with nan values References: <20130525224707.EAC37E6736@smtp.hushmail.com> Message-ID: zetah hush.ai> writes: > > Found an answer that's acceptable for me: http://astrolitterbox.blogspot.com/2012/03/healing-holes-in-arrays-in-python.html Glad it was of any use! However, if you have rather large holes you need to fill, the next iteration of this script is better. It fills the pixels with the largest number of neighbours first, iteratively filling them all. http://astrolitterbox.blogspot.de/2012/08/the-new-inpainting-script.html From ralf.gommers at gmail.com Thu May 30 15:45:43 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Thu, 30 May 2013 21:45:43 +0200 Subject: [SciPy-User] MemoryError in scipy.sparse.csgraph.shortest_path In-Reply-To: <519BC1BE.8010503@gmail.com> References: <519BC1BE.8010503@gmail.com> Message-ID: On Tue, May 21, 2013 at 8:49 PM, Giovanni Luca Ciampaglia < glciampagl at gmail.com> wrote: > Hi, > > I want to compute the shortest path distances in a sparse directed graph > with 2M > nodes but scipy.csgraph.shortest_path immediately throws a MemoryError -- > regardless of the chosen method. 
This is what I do: > > > In [2]: import numpy as np > > > > In [3]: import scipy.sparse as sp > > > > In [4]: import scipy.sparse.csgraph as csg > > > > In [5]: a = np.load('adjacency_isa.npy') > > > > In [6]: adj = sp.coo_matrix((a['weight'], (a['row'], a['col'])), > (2351254,)*2) > > > > In [7]: adj = adj.tocsr() > > > > And this is what I get: > > > In [8]: D = csg.shortest_path(adj, directed=True) > > > --------------------------------------------------------------------------- > > MemoryError Traceback (most recent call last) > > in () > > ----> 1 D = csg.shortest_path(adj, directed=True) > > > > > /nfs/nfs4/home/gciampag/local/lib/python/scipy/sparse/csgraph/_shortest_path.so > in > > scipy.sparse.csgraph._shortest_path.shortest_path > > (scipy/sparse/csgraph/_shortest_path.c:2117)() > > > > > /nfs/nfs4/home/gciampag/local/lib/python/scipy/sparse/csgraph/_shortest_path.so > in > > scipy.sparse.csgraph._shortest_path.dijkstra > > (scipy/sparse/csgraph/_shortest_path.c:3948)() > > > > MemoryError: > > > > In [9]: D = csg.dijkstra(adj, directed=True) > > > --------------------------------------------------------------------------- > > MemoryError Traceback (most recent call last) > > in () > > ----> 1 D = csg.dijkstra(adj, directed=True) > > > > > /nfs/nfs4/home/gciampag/local/lib/python/scipy/sparse/csgraph/_shortest_path.so > in > > scipy.sparse.csgraph._shortest_path.dijkstra > > (scipy/sparse/csgraph/_shortest_path.c:3948)() > > > > MemoryError: > > > > In [10]: D = csg.floyd_warshall(adj, directed=True) > > > --------------------------------------------------------------------------- > > MemoryError Traceback (most recent call last) > > in () > > ----> 1 D = csg.floyd_warshall(adj, directed=True) > > > > > /nfs/nfs4/home/gciampag/local/lib/python/scipy/sparse/csgraph/_shortest_path.so > in > > scipy.sparse.csgraph._shortest_path.floyd_warshall > > (scipy/sparse/csgraph/_shortest_path.c:2457)() > > > > > 
/nfs/nfs4/home/gciampag/local/lib/python/scipy/sparse/csgraph/_validation.pyc > > in validate_graph(csgraph, directed, dtype, csr_output, dense_output, > > copy_if_dense, copy_if_sparse, null_value_in, null_value_out, > infinity_null, > > nan_null) > > 26 csgraph = csr_matrix(csgraph, dtype=DTYPE, copy=copy_if_sparse) > > 27 else: > > ---> 28 csgraph = csgraph_to_dense(csgraph, null_value=null_value_out) > > 29 elif np.ma.is_masked(csgraph): > > 30 if dense_output: > > > > /nfs/nfs4/home/gciampag/local/lib/python/scipy/sparse/csgraph/_tools.so > in > > scipy.sparse.csgraph._tools.csgraph_to_dense > > (scipy/sparse/csgraph/_tools.c:2984)() > > > > MemoryError: > > > It seems that there are two distinct issues: > > 1. floyd_warshall() calls validate_graph with csr_output = False > (_shortest_path.pyx:218), causing the graph to be converted to dense. I > believe > this a bug. > 2. dijkstra creates a dense distance matrix (_shortest_path.pyx:409). I > understand that one cannot make any assumption about the connectivity of > the > graph, and thus of the sparsity of the distance matrix itself; and of > course I > can get around this calling dijkstra multiple times with a manageable > chunk of > indices, and discarding the values that are equal to inf, but it would be > nonetheless nice if the code tried to do something similar, at least for > the > cases when one knows that most of the distances will be inf. > Hi Giovanni, sorry no one replied so far. Could you please open an issue for this on Github? Thanks, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From otrov at hush.ai Thu May 30 16:35:12 2013 From: otrov at hush.ai (zetah) Date: Thu, 30 May 2013 22:35:12 +0200 Subject: [SciPy-User] ndimage.zoom for array with nan values In-Reply-To: References: <20130525224707.EAC37E6736@smtp.hushmail.com> Message-ID: <20130530203513.8FBFAA6E38@smtp.hushmail.com> "astrolitterbox" wrote: > >Glad it was of any use! 
>However, if you have rather large holes you need to fill, the next
>iteration of this script is better. It fills the pixels with the largest
>number of neighbours first, iteratively filling them all.
>
>http://astrolitterbox.blogspot.de/2012/08/the-new-inpainting-script.html

Thanks for the fast reply and your considered answer :)

The first version works perfectly for me, and with all those docstrings and
its module structure it's now in my site-packages. If I have to use it on an
array with large holes and notice issues, I'll switch to the new version,
which seems more suited to specialized usage and would need user
intervention before importing.

Thanks again

Cheers

From evilper at gmail.com  Fri May 31 05:58:51 2013
From: evilper at gmail.com (Per Nielsen)
Date: Fri, 31 May 2013 11:58:51 +0200
Subject: [SciPy-User] Unexpectedly large memory usage in scipy.ode class
Message-ID: 

Hi all,

I am solving large linear ODE systems using the QuTiP python package (
https://code.google.com/p/qutip/) which uses scipy ODE solvers under the
hood. The system is of the form

dydt = L*y,

where L is a large complex sparse matrix, all pretty standard. In this
type of problem the matrix L is the biggest memory user, expected to be
much larger than the solution vector y itself.

Below is the output of a @profile from the memory_profiler package on the
function setting up the ode object; no actual time-stepping is done (the
code can be found here:
https://github.com/qutip/qutip/blob/master/qutip/mesolve.py#L561).

Line #    Mem usage    Increment   Line Contents
================================================
   562                             @profile
   563                             def _mesolve_const(H, rho0, tlist, c_op_list, expt_ops, args, opt,
   564                                                progress_bar):
   565                                 """!
   566                                 Evolve the density matrix using an ODE solver, for constant hamiltonian
   567                                 and collapse operators.
   568                                 """
   569    61.961 MB     0.000 MB
   570    61.961 MB     0.000 MB       if debug:
   571                                     print(inspect.stack()[0][3])
   572
   573                                 #
   574                                 # check initial state
   575                                 #
   576    61.961 MB     0.000 MB       if isket(rho0):
   577                                     # if initial state is a ket and no collapse operator where given,
   578                                     # fallback on the unitary schrodinger equation solver
   579    61.961 MB     0.000 MB           if len(c_op_list) == 0 and isoper(H):
   580                                         return _sesolve_const(H, rho0, tlist, expt_ops, args, opt)
   581
   582                                     # Got a wave function as initial state: convert to density matrix.
   583    61.973 MB     0.012 MB           rho0 = rho0 * rho0.dag()
   584
   585                                 #
   586                                 # construct liouvillian
   587                                 #
   588    61.973 MB     0.000 MB       if opt.tidy:
   589    61.973 MB     0.000 MB           H = H.tidyup(opt.atol)
   590
   591   327.887 MB   265.914 MB       L = liouvillian_fast(H, c_op_list)
   592
   593                                 #
   594                                 # setup integrator
   595                                 #
   596   343.168 MB    15.281 MB       initial_vector = mat2vec(rho0.full())
   597   343.168 MB     0.000 MB       r = scipy.integrate.ode(cy_ode_rhs)
   598   343.168 MB     0.000 MB       r.set_f_params(L.data.data, L.data.indices, L.data.indptr)
   599   343.168 MB     0.000 MB       r.set_integrator('zvode', method=opt.method, order=opt.order,
   600   343.168 MB     0.000 MB                        atol=opt.atol, rtol=opt.rtol, nsteps=opt.nsteps,
   601   343.168 MB     0.000 MB                        first_step=opt.first_step, min_step=opt.min_step,
   602   343.172 MB     0.004 MB                        max_step=opt.max_step)
   603   572.055 MB   228.883 MB       r.set_initial_value(initial_vector, tlist[0])
   604
   605                                 #
   606                                 # call generic ODE code
   607                                 #
   608   602.805 MB    30.750 MB       return _generic_ode_solve(r, rho0, tlist, expt_ops, opt, progress_bar)

On line 591 the L matrix is generated and eats a large chunk of memory, as
expected. However, on line 603 setting the initial condition eats an almost
comparable chunk, despite the fact that the initial vector itself only
takes up ~ 15 MB (line 596).

I find this strange, as I would expect that setting the initial condition
would at most increase the memory usage by approximately the size of the
initial vector.
I have tried to reproduce the problem using a minimal script (see
attachment), but here the memory usage is as expected:

Filename: test_ode2.py

Line #    Mem usage    Increment   Line Contents
================================================
     7                             @profile
     8    18.707 MB     0.000 MB   def runode():
     9    18.707 MB     0.000 MB       N = 5000
    10
    11                                 # M = np.random.rand(N, N)
    12   111.230 MB    92.523 MB       M = sparse.rand(N, N, density=0.05, format='csr') \
    13   198.797 MB    87.566 MB           + 1j * sparse.rand(N, N, density=0.05, format='csr')
    14   199.031 MB     0.234 MB       y0 = np.random.rand(N, 1) + 1j * np.random.rand(N, 1)
    15
    16   199.031 MB     0.000 MB       t0 = 0.0
    17
    18   199.031 MB     0.000 MB       def f(t, y, M):
    19                                     # return np.dot(M, y)
    20                                     return M.dot(y)
    21
    22   199.031 MB     0.000 MB       r = ode(f)
    23   199.031 MB     0.000 MB       r.set_integrator('zvode', atol=1e-10)
    24   199.035 MB     0.004 MB       r.set_f_params(M)
    25   199.035 MB     0.000 MB       r.set_initial_value(y0, t0)

Might someone with more insight into the scipy.ode solver have an idea of
what's going on? I looked in the file myself but didn't see any indication
of large memory consumption.

Best,
Per

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_ode2.py
Type: application/octet-stream
Size: 801 bytes
Desc: not available
URL: 

From warren.weckesser at gmail.com  Fri May 31 10:23:36 2013
From: warren.weckesser at gmail.com (Warren Weckesser)
Date: Fri, 31 May 2013 10:23:36 -0400
Subject: [SciPy-User] Unexpectedly large memory usage in scipy.ode class
In-Reply-To: 
References: 
Message-ID: 

On Fri, May 31, 2013 at 5:58 AM, Per Nielsen  wrote:

> Hi all,
>
> I am solving large linear ODE systems using the QuTiP python package (
> https://code.google.com/p/qutip/) which uses scipy ODE solvers under the
> hood. The system is of the form
>
> dydt = L*y,
>
> where L is a large complex sparse matrix, all pretty standard.
In this > type of problem the matrix L is the biggest memory user, expected to be > much larger than the solution vector y itself. > > Below is the output of a @profile from the memory_profiler package on the > function setting up the ode object, no actual time-stepping is done (the > code can be found here: > https://github.com/qutip/qutip/blob/master/qutip/mesolve.py#L561). > > Line # Mem usage Increment Line Contents > ================================================ > 562 @profile > 563 def _mesolve_const(H, rho0, tlist, > c_op_list, expt_ops, args, opt, > 564 progress_bar): > 565 """! > 566 Evolve the density matrix using an > ODE solver, for constant hamiltonian > 567 and collapse operators. > 568 """ > 569 61.961 MB 0.000 MB > 570 61.961 MB 0.000 MB if debug: > 571 print(inspect.stack()[0][3]) > 572 > 573 # > 574 # check initial state > 575 # > 576 61.961 MB 0.000 MB if isket(rho0): > 577 # if initial state is a ket and > no collapse operator where given, > 578 # fallback on the unitary > schrodinger equation solver > 579 61.961 MB 0.000 MB if len(c_op_list) == 0 and > isoper(H): > 580 return _sesolve_const(H, > rho0, tlist, expt_ops, args, opt) > 581 > 582 # Got a wave function as > initial state: convert to density matrix. 
> 583 61.973 MB 0.012 MB rho0 = rho0 * rho0.dag() > 584 > 585 # > 586 # construct liouvillian > 587 # > 588 61.973 MB 0.000 MB if opt.tidy: > 589 61.973 MB 0.000 MB H = H.tidyup(opt.atol) > 590 > 591 327.887 MB 265.914 MB L = liouvillian_fast(H, c_op_list) > 592 > 593 # > 594 # setup integrator > 595 # > 596 343.168 MB 15.281 MB initial_vector = > mat2vec(rho0.full()) > 597 343.168 MB 0.000 MB r = scipy.integrate.ode(cy_ode_rhs) > 598 343.168 MB 0.000 MB r.set_f_params(L.data.data, > L.data.indices, L.data.indptr) > 599 343.168 MB 0.000 MB r.set_integrator('zvode', > method=opt.method, order=opt.order, > 600 343.168 MB 0.000 MB atol=opt.atol, > rtol=opt.rtol, nsteps=opt.nsteps, > 601 343.168 MB 0.000 MB > first_step=opt.first_step, min_step=opt.min_step, > 602 343.172 MB 0.004 MB > max_step=opt.max_step) > 603 572.055 MB 228.883 MB r.set_initial_value(initial_vector, > tlist[0]) > 604 > 605 # > 606 # call generic ODE code > 607 # > 608 602.805 MB 30.750 MB return _generic_ode_solve(r, rho0, > tlist, expt_ops, opt, progress_bar) > > On line 591 the L matrix generated and eats a large chunk of memory, as > expected. However, on line 603 setting the initial condition eats an almost > comparable chunk, despite the fact that the initial vector itself only > takes up ~ 15 MB (line 596). > > I find this strange, as I would expect that setting the initial condition > would at most increase the memory usage by approximately the size of the > initial vector. 
> [Per's minimal test script, its memory profile, and the closing question
> snipped; they are quoted verbatim from the message above.]

The `set_initial_value` method calls the integrator's `reset` method. The
`reset` method of the 'zvode' integrator allocates three "work" arrays,
`iwork`, `rwork` and `zwork`, whose sizes depend on the size of `y0`. To
verify that these are the cause of the memory growth, you can access these
arrays after calling `r.set_initial_value(y0, t0)` as `r._integrator.iwork`,
etc.

Warren

> Best,
> Per
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>

-------------- next part --------------
An HTML attachment was scrubbed...
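In code, the check Warren describes can look like the sketch below. It mirrors the minimal test script from this thread, just smaller, and then prints the sizes of the work arrays. Note that `_integrator` and its `zwork`/`rwork`/`iwork` attributes are private scipy internals, so their names may change between versions; this is a diagnostic sketch, not a supported API.

```python
import numpy as np
from scipy import sparse
from scipy.integrate import ode

N = 1000
# Same setup as the minimal test script in this thread, just smaller.
M = sparse.rand(N, N, density=0.01, format='csr') \
    + 1j * sparse.rand(N, N, density=0.01, format='csr')
y0 = np.random.rand(N, 1) + 1j * np.random.rand(N, 1)

def f(t, y, M):
    return M.dot(y)

r = ode(f)
r.set_integrator('zvode', atol=1e-10)
r.set_f_params(M)
r.set_initial_value(y0, 0.0)   # this calls the integrator's reset()

# The 'zvode' work arrays scale with len(y0):
for name in ('zwork', 'rwork', 'iwork'):
    work = getattr(r._integrator, name)
    print(name, work.nbytes / 1e6, 'MB')
```

For the ~1M-element complex vector in the original report, work arrays that scale with `len(y0)` would account for the kind of growth seen at `set_initial_value`.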
URL: 

From glciampagl at gmail.com  Fri May 31 10:57:42 2013
From: glciampagl at gmail.com (Giovanni Luca Ciampaglia)
Date: Fri, 31 May 2013 10:57:42 -0400
Subject: [SciPy-User] MemoryError in scipy.sparse.csgraph.shortest_path
In-Reply-To: 
References: 
Message-ID: <51A8BA66.9020009@gmail.com>

On 05/31/2013 05:54 AM, scipy-user-request at scipy.org wrote:
>> It seems that there are two distinct issues: 1. floyd_warshall() calls
>> validate_graph with csr_output = False (_shortest_path.pyx:218), causing
>> the graph to be converted to dense. I believe this is a bug. 2. dijkstra
>> creates a dense distance matrix (_shortest_path.pyx:409). I understand
>> that one cannot make any assumption about the connectivity of the graph,
>> and thus of the sparsity of the distance matrix itself; and of course I
>> can get around this calling dijkstra multiple times with a manageable
>> chunk of indices, and discarding the values that are equal to inf, but it
>> would be nonetheless nice if the code tried to do something similar, at
>> least for the cases when one knows that most of the distances will be inf.
> Hi Giovanni, sorry no one replied so far. Could you please open an issue
> for this on Github?
>
> Thanks,
> Ralf

Hi Ralf,

done, it's here: https://github.com/scipy/scipy/issues/2526

Do I have to specify assignees?

Best,

--
Giovanni Luca Ciampaglia

Postdoctoral fellow
Center for Complex Networks and Systems Research
Indiana University

910 E 10th St, Bloomington, IN 47408
http://cnets.indiana.edu/
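The chunking workaround mentioned in this thread can be sketched as follows. `dijkstra_chunked` is a hypothetical helper, not part of scipy: it runs `dijkstra` over blocks of source indices and yields only the finite distances, so the dense per-chunk distance matrix never holds more than `chunk` rows at a time.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

def dijkstra_chunked(adj, chunk=1000):
    """Run dijkstra over blocks of source indices, yielding only the
    finite distances as (sources, targets, distances) triples."""
    n = adj.shape[0]
    for start in range(0, n, chunk):
        idx = np.arange(start, min(start + chunk, n))
        D = dijkstra(adj, directed=True, indices=idx)
        src, tgt = np.nonzero(np.isfinite(D))
        yield idx[src], tgt, D[src, tgt]

# Tiny demo graph: 0 -> 1 (weight 1), 1 -> 2 (weight 2)
adj = csr_matrix(np.array([[0., 1., 0.],
                           [0., 0., 2.],
                           [0., 0., 0.]]))
for sources, targets, dists in dijkstra_chunked(adj, chunk=2):
    print(list(zip(sources, targets, dists)))
```

The triples can be accumulated into whatever sparse structure suits the application; for a mostly disconnected graph this stays far smaller than the full dense distance matrix.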
gciampag at indiana.edu From ralf.gommers at gmail.com Fri May 31 12:48:24 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Fri, 31 May 2013 18:48:24 +0200 Subject: [SciPy-User] MemoryError in scipy.sparse.csgraph.shortest_path In-Reply-To: <51A8BA66.9020009@gmail.com> References: <51A8BA66.9020009@gmail.com> Message-ID: On Fri, May 31, 2013 at 4:57 PM, Giovanni Luca Ciampaglia < glciampagl at gmail.com> wrote: > On 05/31/2013 05:54 AM, scipy-user-request at scipy.org wrote: > >> It seems that there are two distinct issues: 1. floyd_warshall() calls > >> validate_graph with csr_output = False (_shortest_path.pyx:218), > causing the > >> graph to be converted to dense. I believe this a bug. 2. dijkstra > creates a > >> dense distance matrix (_shortest_path.pyx:409). I understand that one > cannot > >> make any assumption about the connectivity of the graph, and thus of the > >> sparsity of the distance matrix itself; and of course I can get around > this > >> calling dijkstra multiple times with a manageable chunk of indices, and > >> discarding the values that are equal to inf, but it would be > nonetheless nice > >> if the code tried to do something similar, at least for the cases when > one > >> knows that most of the distances will be inf. > > Hi Giovanni, sorry no one replied so far. Could you please open an issue > > for this on Github? > > > > Thanks, > > Ralf > > Hi Ralf, > > done, it's here: https://github.com/scipy/scipy/issues/2526 > Thanks. > Do I have to specify assignees? > Not needed (and I don't think you can without commit rights). We label the issues and perhaps ping people we know are interested, but don't assign issues to others. Every committer is welcome to assign issues to him/her self though. Ralf > > Best, > > -- > Giovanni Luca Ciampaglia > > Postdoctoral fellow > Center for Complex Networks and Systems Research > Indiana University > > ? 910 E 10th St ? Bloomington ? IN 47408 > ? http://cnets.indiana.edu/ > ? 
gciampag at indiana.edu
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From afraser at lanl.gov  Fri May 31 16:55:01 2013
From: afraser at lanl.gov (Andrew Fraser)
Date: Fri, 31 May 2013 14:55:01 -0600
Subject: [SciPy-User] Python3 code for hidden Markov models
Message-ID: <87obbqhntm.fsf@watcher.lanl.gov>

A couple of weeks ago, I posted python3 code for HMMs at
http://code.google.com/p/hmmds/

From that page:

Derived from the code for Fraser's book "Hidden Markov Models and
Dynamical Systems", hmmds provides an HMM class that supports customized
output/observation models. We include code and data for all examples and
figures in the book. We have written the base classes in python3 for
clarity and used cython and scipy sparse matrices for efficiency and
speed.

Although I will continue to work on the many incomplete examples, the
code for basic HMMs with customized output/observation models is solid.
I like the way I've implemented it. I welcome suggestions for
improvement.

Andy Fraser

From stonebig34 at gmail.com  Sat May 4 07:22:33 2013
From: stonebig34 at gmail.com (stonebig)
Date: Sat, 04 May 2013 11:22:33 -0000
Subject: [SciPy-User] ANN: New WinPython with Python 2.7.4 and 3.3.1 (32/64bit)
In-Reply-To: 
References: 
Message-ID: <0d87976a-0f7b-4b46-9e2b-aa0b7a48af4b@googlegroups.com>

Hello Pierre,

Thank you very much for this wonderful tool. A small question:

- SQLAlchemy is not in winpython3.3.1.0, but seems compatible with Python 3.3,
- is there a special known problem if I do a "pip install SQLAlchemy"?

Cheers.

On Friday, 3 May 2013 21:48:25 UTC+2, Pierre Raybaut wrote:
>
> Hi all,
>
> I am pleased to announce that four new versions of WinPython have been
> released yesterday with Python 2.7.4 and 3.3.1, 32 and 64 bits. Many
Many > packages have been added or upgraded (see the automatically-generated > changelogs). > > Special thanks to Christoph Gohlke for building most of the binary > packages bundled in WinPython. > > WinPython is a free open-source portable distribution of Python > for Windows, designed for scientists. > > It is a full-featured (see > http://code.google.com/p/winpython/wiki/PackageIndex) > Python-based scientific environment: > * Designed for scientists (thanks to the integrated libraries > NumPy, SciPy, Matplotlib, guiqwt, etc.: > * Regular *scientific users*: interactive data processing > and visualization using Python with Spyder > * *Advanced scientific users and software developers*: > Python applications development with Spyder, version control with > Mercurial and other development tools (like gettext) > * *Portable*: preconfigured, it should run out of the box on any machine > under Windows (without any installation requirements) and the folder > containing WinPython can be moved to any location (local, network or > removable drive) > * *Flexible*: one can install (or should I write "use" as it's portable) > as many WinPython versions as necessary (like isolated and self-consistent > environments), even if those versions are running different versions of > Python (2.7, 3.3) or different architectures (32bit or 64bit) on the same > machine > * *Customizable*: using the integrated package manager (wppm, > as WinPython Package Manager), it's possible to install, uninstall > or upgrade Python packages (see > http://code.google.com/p/winpython/wiki/WPPM for more details > on supported package formats). > > *WinPython is not an attempt to replace Python(x,y)*, this is > just something different (see > http://code.google.com/p/winpython/wiki/Roadmap): more flexible, easier > to maintain, movable and less invasive for the OS, but certainly less > user-friendly, with less packages/contents and without any integration to > Windows explorer [*]. 
> > [*] Actually there is an optional integration into Windows > explorer, providing the same features as the official Python installer > regarding file associations and context menu entry (this option may be > activated through the WinPython Control Panel), and adding shortcuts to > Windows Start menu. > > Enjoy! > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rishi.sharma at gmail.com Wed May 8 02:07:02 2013 From: rishi.sharma at gmail.com (Rishi Sharma) Date: Wed, 08 May 2013 06:07:02 -0000 Subject: [SciPy-User] Error installing Scipy from repository with latest Cython installed In-Reply-To: <517A8F15.3060500@gmail.com> References: <517A8F15.3060500@gmail.com> Message-ID: <001d6678-2dba-44d2-840e-3a050813ed11@googlegroups.com> I'm getting the same error while trying to build the epd numpy and scipy with cython0.16. The numpy build works smoothly, In [2]: numpy.__version__ Out[2]: '1.6.1' but, the development version of scipy fails to compile, (epd-7.3.2)bash-3.2$ python setup.py install Cythonizing sources Processing scipy/special/_ufuncs_cxx.pyx .... Error compiling Cython file: ------------------------------------------------------------ ... 
######################################################################
# Function specializers
######################################################################

def get_nonzero_line(np.ndarray[data_t] a):
    return fused_nonzero_line[data_t]
                                                 ^
------------------------------------------------------------

_ni_label.pyx:96:50: Type not in fused type
Traceback (most recent call last):
  File "/home/rishi/.python_environments/epd-7.3.2/src/scipy/tools/cythonize.py", line 188, in 
    main()
  File "/home/rishi/.python_environments/epd-7.3.2/src/scipy/tools/cythonize.py", line 184, in main
    find_process_files(root_dir)
  File "/home/rishi/.python_environments/epd-7.3.2/src/scipy/tools/cythonize.py", line 176, in find_process_files
    process(cur_dir, fromfile, tofile, function, hash_db)
  File "/home/rishi/.python_environments/epd-7.3.2/src/scipy/tools/cythonize.py", line 153, in process
    processor_function(fromfile, tofile)
  File "/home/rishi/.python_environments/epd-7.3.2/src/scipy/tools/cythonize.py", line 71, in process_pyx
    raise Exception('Cython failed')
Exception: Cython failed
Traceback (most recent call last):
  File "setup.py", line 230, in 
    setup_package()
  File "setup.py", line 223, in setup_package
    generate_cython()
  File "setup.py", line 159, in generate_cython
    raise RuntimeError("Running cythonize failed!")
RuntimeError: Running cythonize failed!
(epd-7.3.2)bash-3.2$ cython --version Cython version 0.16 On Friday, April 26, 2013 7:28:37 AM UTC-7, Jose Guzman wrote: > > Hi everybody, > > I am trying to install the latest developmental version of Scipy, for > that i did: > > >>> git clone https://github.com/scipy/scipy.git Scipy > > when I enter in the Scipy folder and type > >>> python setup.py build, > > I found the following error: > > Exception: Cython failed > Traceback (most recent call last): > File "setup.py", line 229, in > setup_package() > File "setup.py", line 222, in setup_package > generate_cython() > File "setup.py", line 159, in generate_cython > raise RuntimeError("Running cythonize failed!") > RuntimeError: Running cythonize failed! > > > My cython version is: > >>> cython --version 0.19 > > Any ideas? > > Thanks in advance > > > -- > Jose Guzman > http://www.ist.ac.at/~jguzman/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rishi.sharma at gmail.com Wed May 8 03:43:38 2013 From: rishi.sharma at gmail.com (Rishi Sharma) Date: Wed, 08 May 2013 07:43:38 -0000 Subject: [SciPy-User] Error installing Scipy from repository with latest Cython installed In-Reply-To: <001d6678-2dba-44d2-840e-3a050813ed11@googlegroups.com> References: <517A8F15.3060500@gmail.com> <001d6678-2dba-44d2-840e-3a050813ed11@googlegroups.com> Message-ID: <776ac8a7-1713-4d1d-9089-40a4e894dc25@googlegroups.com> It works with the new version of cython. the main changes from the failed attempt (a) a new version of cython was used (b) the virtual environment used distribute, not setuptools (virtualenv --distribute epd-7.3.2) (see also) http://stackoverflow.com/questions/15175135/build-scipy-error-cythonize-failed (epd-7.3.2)bash-3.2$ cython --version Cython version 0.19 In [4]: numpy.__version__ Out[4]: '1.6.1' In [2]: scipy.__version__ Out[2]: '0.10.1' -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From sylvain.corlay at gmail.com  Wed May 8 11:47:33 2013
From: sylvain.corlay at gmail.com (Sylvain Corlay)
Date: Wed, 08 May 2013 15:47:33 -0000
Subject: [SciPy-User] ANN: Spyder v2.2
In-Reply-To: 
References: 
Message-ID: <6865fbef-4be3-4f83-9e1c-3747369eea01@googlegroups.com>

Congratulations to Carlos, Jed and Pierre for this new release.

Best,
Sylvain

On Wednesday, May 8, 2013 11:15:20 AM UTC-4, Pierre Raybaut wrote:
>
> Hi all,
>
> On behalf of Spyder's development team (
> http://code.google.com/p/spyderlib/people/list), I'm pleased to announce
> that Spyder v2.2 has been released and is available for Windows
> XP/Vista/7/8, GNU/Linux and MacOS X: http://code.google.com/p/spyderlib/.
>
> This release represents 18 months of development since v2.1 and introduces
> major enhancements and new features:
> * Full support for IPython v0.13, including the ability to attach to
> existing kernels
> * New MacOS X application
> * Much improved debugging experience
> * Various editor improvements for code completion, zooming, auto
> insertion, and syntax highlighting
> * Better looking and faster Object Inspector
> * Single instance mode
> * Spanish translation of the interface
> * And many other changes:
> http://code.google.com/p/spyderlib/wiki/ChangeLog
>
> This is the last release to support Python 2.5:
> * Spyder 2.2 supports Python 2.5 to 2.7
> * Spyder 2.3 will support Python 2.7 and Python 3
> * (Spyder 2.1.14dev4 is a development release which already supports
> Python 3)
> See also https://code.google.com/p/spyderlib/downloads/list.
>
> Spyder is a free, open-source (MIT license) interactive development
> environment for the Python language with advanced editing, interactive
> testing, debugging and introspection features.
Originally designed to > provide MATLAB-like features (integrated help, interactive console, > variable explorer with GUI-based editors for dictionaries, NumPy arrays, > ...), it is strongly oriented towards scientific computing and software > development. Thanks to the `spyderlib` library, Spyder also provides > powerful ready-to-use widgets: embedded Python console (example: > http://packages.python.org/guiqwt/_images/sift3.png), NumPy array editor > (example: http://packages.python.org/guiqwt/_images/sift2.png), > dictionary editor, source code editor, etc. > > Description of key features with tasty screenshots can be found at: > http://code.google.com/p/spyderlib/wiki/Features > > Don't forget to follow Spyder updates/news: > * on the project website: http://code.google.com/p/spyderlib/ > * and on our official blog: http://spyder-ide.blogspot.com/ > > Last, but not least, we welcome any contribution that helps making Spyder > an efficient scientific development/computing environment. Join us to help > creating your favourite environment! > (http://code.google.com/p/spyderlib/wiki/NoteForContributors) > > Enjoy! > -Pierre > -------------- next part -------------- An HTML attachment was scrubbed... URL: From keith.braithwaite0 at gmail.com Thu May 9 12:22:41 2013 From: keith.braithwaite0 at gmail.com (Keith Braithwaite) Date: Thu, 09 May 2013 16:22:41 -0000 Subject: [SciPy-User] MLE for discrete distributions: no fit() method on rv_discrete? Message-ID: Hi, I would like to find MLEs (and subsequently goodness of fit and so on) for empirical data against discrete distributions. I notice that scipy.stats.rv_discrete does not have the fit() method that scipy.stats.rv_continuous does. Is there a technical (either Pythonic or statistical) reason why not? If I roll my own am I going to get nonsense results? Best Regards, Keith -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From wadnerkar.rohan at gmail.com  Mon May 13 06:55:43 2013
From: wadnerkar.rohan at gmail.com (rohan wadnerkar)
Date: Mon, 13 May 2013 10:55:43 -0000
Subject: [SciPy-User] problem with scipy.io.wavfile (urgent)
Message-ID: 

Hello all,

I am trying to read .wav files using scipy.io.wavfile.read(). It reads
some files properly. For some files it gives the following error...

Warning (from warnings module):
  File "D:\project\cardiocare-1.0\src\scipy\io\wavfile.py", line 121
    warnings.warn("chunk not understood", WavFileWarning)
WavFileWarning: chunk not understood
Traceback (most recent call last):
  File "D:\project\cardiocare-1.0\src\ccare\plot.py", line 37, in plot
    input_data = read(p.bitfile)
  File "D:\project\cardiocare-1.0\src\scipy\io\wavfile.py", line 119, in read
    data = _read_data_chunk(fid, noc, bits)
  File "D:\project\cardiocare-1.0\src\scipy\io\wavfile.py", line 56, in _read_data_chunk
    data = data.reshape(-1,noc)
ValueError: total size of new array must be unchanged

Can anyone suggest a solution?

--
Rohan Wadnerkar
09590686578

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From adiamondcsi at gmail.com  Mon May 13 16:18:55 2013
From: adiamondcsi at gmail.com (psoriasis)
Date: Mon, 13 May 2013 20:18:55 -0000
Subject: [SciPy-User] noob question: numpy copy vs standard lib copy
Message-ID: 

I'm new to python. As I understand it, assignment copies by reference, and
to do otherwise requires a function like the standard library's copy or
deepcopy functions. However, from what I see numpy has its own copy
function, and using it on a random object (an instance of a test class I
made up, not an array etc.) doesn't seem to return the expected copy
object.

I did try importing the copy module and that worked, but then the numpy
copy function was "shadowed", but I don't know if that's a problem. Still,
I'm sure numpy users need to copy regular objects, so what's the standard
solution to this?
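A short illustration of the difference described above (the class `Point` is just a stand-in for "a test class I made up"): `numpy.copy` is meant for arrays, so applied to an arbitrary object it wraps it in a 0-d object array without copying the object itself, while the standard library's `copy.deepcopy` makes a genuinely new object.

```python
import copy
import numpy as np

class Point:
    def __init__(self, x):
        self.x = x

p = Point([1, 2])

q = np.copy(p)         # wraps p in a 0-d object array; no real copy is made
assert q.item() is p   # still the very same Point instance

p2 = copy.deepcopy(p)  # the standard-library way: a genuinely new object
assert p2 is not p and p2.x == p.x and p2.x is not p.x
```

On shadowing: the collision only arises with `from numpy import *`; with the usual `import numpy as np`, the stdlib `copy` module and `np.copy` coexist without conflict.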
Note: I'm using the latest pythonxy (==> spyder, numpy, etc.)

Thanks

From michael at yanchepmartialarts.com.au  Fri May 17 13:38:41 2013
From: michael at yanchepmartialarts.com.au (Michael Borck)
Date: Fri, 17 May 2013 17:38:41 -0000
Subject: [SciPy-User] Matlab imfilter/fspecial equivalent
In-Reply-To: <5195789A.8060302@gmail.com>
References: <51944237.50209@gmail.com> <5195789A.8060302@gmail.com>
Message-ID: <51966C4A.9000801@borck.id.au>

An update. On further testing, the problem appeared to be caused by
converting the RGB image to grayscale, not imresize(). I was getting
different image values using an rgb2grayscale converter from skimage. If I
use ndimage.imread(, flatten=True) then I get the same values in the image
array/matrix.

Since I was using skimage to do the conversion, I posted a query on the
skimage mailing list, and got back the following response:

They're just different color conversion factors. Based on
http://www.mathworks.com/help/images/ref/rgb2gray.html, Matlab uses:

0.2989 R + 0.5870 G + 0.1140 B

Based on the docstring for `color.rgb2gray`:

0.2125 R + 0.7154 G + 0.0721 B

So I think this will explain why the output is similar in structure but
different in detail. I will write a converter using Matlab's conversion
factors to convince myself that the output is the same as in the paper. I
will post the final ported version later.

Michael.

--
On 17/05/13 8:23 AM, Brickle Macho wrote:
> I am getting closer. Here are some observations:
>
> * If I don't imresize() the original image, then visually I can't tell
> the difference in the output image from Matlab or Scipy.
> * Empirically the results using different kernels are the same. That
> is, the difference between the two images is essentially zero. It
> doesn't seem to matter whether I use the imported "fspecial()" kernel
> from matlab, generate my own kernel using fgaussion(), or use
> scipy.ndimage.gaussian_filter().
>
> So it looks like the problem is with ndimage.resize().
>
> Michael.
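Applying the Matlab weights quoted above takes only a line or two of numpy. This is a sketch; `rgb2gray_matlab` is a hypothetical helper name, and the input is assumed to be a float RGB array with channels last.

```python
import numpy as np

# Matlab's rgb2gray weights (the ITU-R BT.601 luma coefficients,
# rounded as on the Mathworks page quoted above).
MATLAB_WEIGHTS = np.array([0.2989, 0.5870, 0.1140])

def rgb2gray_matlab(rgb):
    # rgb is assumed to be an (..., 3) float array, channels last
    return rgb[..., :3] @ MATLAB_WEIGHTS

img = np.dstack([np.full((2, 2), 0.2),   # R
                 np.full((2, 2), 0.5),   # G
                 np.full((2, 2), 0.8)])  # B
print(rgb2gray_matlab(img))
```

Swapping in skimage's coefficients (0.2125, 0.7154, 0.0721) reproduces `color.rgb2gray` instead, which makes it easy to confirm that the two conversions, not the filtering, account for the differing output.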
> --

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From LUJIAN at cec.sc.edu  Wed May 22 11:45:55 2013
From: LUJIAN at cec.sc.edu (LU, JIANMIN)
Date: Wed, 22 May 2013 15:45:55 -0000
Subject: [SciPy-User] Can SciPy deal with a set of non-linear equations?
Message-ID: <1FAC1D95C571204AAEE7BEE54B5AFB561A884C01@CAE145EMBP02.ds.sc.edu>

Hi SciPy guys,

I get a set of non-linear equations to be solved with Python, and the goal
is to find the roots of this set of equations. Can SciPy handle this?

Regards,
Jianmin, from U. of South Carolina.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From LUJIAN at cec.sc.edu  Wed May 22 11:59:06 2013
From: LUJIAN at cec.sc.edu (LU, JIANMIN)
Date: Wed, 22 May 2013 15:59:06 -0000
Subject: [SciPy-User] Can SciPy deal with a set of non-linear equations?
Message-ID: <1FAC1D95C571204AAEE7BEE54B5AFB561A884C17@CAE145EMBP02.ds.sc.edu>

Hi guys,

I know that non-linear equations can be solved by fsolve() in SciPy.
Another question from me would be: Can SciPy handle multiple-precision
numbers? Namely, can SciPy define a variable with arbitrary precision, and
can fsolve() take that correctly? Thanks.

Regards,
Jianmin, from U. of South Carolina.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From paul.blelloch at gmail.com  Fri May 24 11:19:31 2013
From: paul.blelloch at gmail.com (Paul Blelloch)
Date: Fri, 24 May 2013 15:19:31 -0000
Subject: [SciPy-User] Numpy 1.7.1 Crashing with MKL and AVX instructions
Message-ID: <07647d8b-69b2-422a-aa78-b3af9ad11c31@googlegroups.com>

I've tried posting this on the numpy list, but it keeps on getting
bounced, so I'll try here:

I found that when I went from numpy 1.7.0 to 1.7.1 I get a crash whenever
I try an eigenvalue calculation (or any other linalg calculation) on
matrices bigger than about 200x200.
This happens with both the latest Anaconda and WinPython 64-bit Windows distributions (both of which use numpy 1.7.1) and occurs on all the HP workstations at my company. The folks at Continuum helped me debug this and determined that the failure was most likely in use of AVX instructions in MKL, but we haven't found a workaround yet short of sticking to numpy 1.7.0. The problem is apparently specific to some combination of processor and OS. Here's what I have: OS Name Microsoft Windows 7 Professional Version 6.1.7601 Service Pack 1 Build 7601 OS Manufacturer Microsoft Corporation System Name Z420-6 System Manufacturer Hewlett-Packard System Model HP Z420 Workstation System Type x64-based PC Processor Intel(R) Xeon(R) CPU E5-1650 0 @ 3.20GHz, 3201 Mhz, 6 Core(s), 12 Logical Processor(s) BIOS Version/Date Hewlett-Packard J61 v01.14, 7/17/2012 SMBIOS Version 2.7 Windows Directory C:\Windows System Directory C:\Windows\system32 Boot Device \Device\HarddiskVolume1 Locale United States Hardware Abstraction Layer Version = "6.1.7601.17514" Installed Physical Memory (RAM) 32.0 GB Total Physical Memory 31.9 GB Available Physical Memory 28.4 GB Total Virtual Memory 95.8 GB Available Virtual Memory 92.1 GB Page File Space 63.9 GB Page File C:\pagefile.sys I'm surprised that I'm the only person out there running into this. Is there anyone else who's running into this problem with numpy 1.7.1 with MKL on 64-bit Windows? -Paul Blelloch -------------- next part -------------- An HTML attachment was scrubbed... 
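A minimal sketch of the kind of call reported to trigger the crash. On an unaffected install this simply computes the eigenvalues; on an affected MKL/AVX combination the report above says the process dies inside the LAPACK call.

```python
import numpy as np

# 300x300 is past the ~200x200 threshold reported above.
a = np.random.rand(300, 300)
w = np.linalg.eigvals(a)
print(w.shape)
```

Any other LAPACK-backed routine (`eig`, `svd`, `solve`) should exercise the same code path, which makes this a quick smoke test when comparing numpy 1.7.0 and 1.7.1 installs.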
URL:

From paul.blelloch at gmail.com Fri May 24 11:36:06 2013
From: paul.blelloch at gmail.com (Paul Blelloch)
Date: Fri, 24 May 2013 15:36:06 -0000
Subject: [SciPy-User] Numpy 1.7.1 Crashing with MKL and AVX instructions
In-Reply-To: <07647d8b-69b2-422a-aa78-b3af9ad11c31@googlegroups.com>
References: <07647d8b-69b2-422a-aa78-b3af9ad11c31@googlegroups.com>
Message-ID: <1d1b56d7-3110-49f6-8bad-2ea94e9a8e89@googlegroups.com>

One extra piece of information: I also downloaded numpy 1.7.1 directly from http://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy, so this seems to be fairly universal for the MKL version of numpy 1.7.1.

On Friday, May 24, 2013 8:24:37 AM UTC-7, Paul Blelloch wrote:

> I've tried posting this on the numpy list, but it keeps on getting
> bounced, so I'll try here:
>
> I found that when I went from numpy 1.7.0 to 1.7.1 I get a crash whenever
> I try an eigenvalue calculation (or any other linalg calculation) on
> matrices bigger than about 200x200. This happens with both the latest
> Anaconda and WinPython 64-bit Windows distributions (both of which use
> numpy 1.7.1) and occurs on all the HP workstations at my company. The
> folks at Continuum helped me debug this and determined that the failure was
> most likely in use of AVX instructions in MKL, but we haven't found a
> workaround yet short of sticking to numpy 1.7.0. The problem is apparently
> specific to some combination of processor and OS.
> Here's what I have:
>
> OS Name Microsoft Windows 7 Professional
> Version 6.1.7601 Service Pack 1 Build 7601
> OS Manufacturer Microsoft Corporation
> System Name Z420-6
> System Manufacturer Hewlett-Packard
> System Model HP Z420 Workstation
> System Type x64-based PC
> Processor Intel(R) Xeon(R) CPU E5-1650 0 @ 3.20GHz, 3201 Mhz, 6 Core(s),
> 12 Logical Processor(s)
> BIOS Version/Date Hewlett-Packard J61 v01.14, 7/17/2012
> SMBIOS Version 2.7
> Windows Directory C:\Windows
> System Directory C:\Windows\system32
> Boot Device \Device\HarddiskVolume1
> Locale United States
> Hardware Abstraction Layer Version = "6.1.7601.17514"
> Installed Physical Memory (RAM) 32.0 GB
> Total Physical Memory 31.9 GB
> Available Physical Memory 28.4 GB
> Total Virtual Memory 95.8 GB
> Available Virtual Memory 92.1 GB
> Page File Space 63.9 GB
> Page File C:\pagefile.sys
>
> I'm surprised that I'm the only person out there running into this. Is
> there anyone else who's running into this problem with numpy 1.7.1 with MKL
> on 64-bit Windows?
>
> -Paul Blelloch
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From paul.blelloch at gmail.com Fri May 24 15:31:19 2013
From: paul.blelloch at gmail.com (Paul Blelloch)
Date: Fri, 24 May 2013 19:31:19 -0000
Subject: [SciPy-User] sqrtm is too slow for matrices of size 1000
In-Reply-To:
References:
Message-ID: <100d0fea-dd88-4858-8599-0aecd5fa79a3@googlegroups.com>

I was interested, so I tried this, and in my case it just crashed Python. I think that the crash might be related to an issue that I posted earlier related to numpy 1.7.1. In fact, any of the linalg functions in scipy 0.12.0 seem to cause Python to crash on my computer.

On Saturday, April 13, 2013 8:47:25 AM UTC-7, Vivek Kulkarni wrote:
> Hi,
> I am implementing spectral clustering for my course work and am using the
> sqrtm function to find the square root of a matrix. But it's far too slow;
> my matrix is of size (1258,1258).
> Any suggestions on how I can speed things
> up or some other function that scipy supports which can be used.
>
> I am implementing the algorithm as described here:
> http://books.nips.cc/papers/files/nips14/AA35.pdf
>
> Finding the square root of D^(-1) is way too slow for D^-1 of size
> (1258,1258).
>
> Thanks.
> Vivek.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From paul.blelloch at gmail.com Fri May 24 15:32:21 2013
From: paul.blelloch at gmail.com (Paul Blelloch)
Date: Fri, 24 May 2013 19:32:21 -0000
Subject: [SciPy-User] Numpy 1.7.1 Crashing with MKL and AVX instructions
In-Reply-To: <07647d8b-69b2-422a-aa78-b3af9ad11c31@googlegroups.com>
References: <07647d8b-69b2-422a-aa78-b3af9ad11c31@googlegroups.com>
Message-ID:

Another piece of information: I get the same behavior with the scipy 0.12.0 linalg routines such as sqrtm.

On Friday, May 24, 2013 8:24:37 AM UTC-7, Paul Blelloch wrote:

> I've tried posting this on the numpy list, but it keeps on getting
> bounced, so I'll try here:
>
> I found that when I went from numpy 1.7.0 to 1.7.1 I get a crash whenever
> I try an eigenvalue calculation (or any other linalg calculation) on
> matrices bigger than about 200x200. This happens with both the latest
> Anaconda and WinPython 64-bit Windows distributions (both of which use
> numpy 1.7.1) and occurs on all the HP workstations at my company. The
> folks at Continuum helped me debug this and determined that the failure was
> most likely in use of AVX instructions in MKL, but we haven't found a
> workaround yet short of sticking to numpy 1.7.0. The problem is apparently
> specific to some combination of processor and OS.
> Here's what I have:
>
> OS Name Microsoft Windows 7 Professional
> Version 6.1.7601 Service Pack 1 Build 7601
> OS Manufacturer Microsoft Corporation
> System Name Z420-6
> System Manufacturer Hewlett-Packard
> System Model HP Z420 Workstation
> System Type x64-based PC
> Processor Intel(R) Xeon(R) CPU E5-1650 0 @ 3.20GHz, 3201 Mhz, 6 Core(s),
> 12 Logical Processor(s)
> BIOS Version/Date Hewlett-Packard J61 v01.14, 7/17/2012
> SMBIOS Version 2.7
> Windows Directory C:\Windows
> System Directory C:\Windows\system32
> Boot Device \Device\HarddiskVolume1
> Locale United States
> Hardware Abstraction Layer Version = "6.1.7601.17514"
> Installed Physical Memory (RAM) 32.0 GB
> Total Physical Memory 31.9 GB
> Available Physical Memory 28.4 GB
> Total Virtual Memory 95.8 GB
> Available Virtual Memory 92.1 GB
> Page File Space 63.9 GB
> Page File C:\pagefile.sys
>
> I'm surprised that I'm the only person out there running into this. Is
> there anyone else who's running into this problem with numpy 1.7.1 with MKL
> on 64-bit Windows?
>
> -Paul Blelloch
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From paul.blelloch at gmail.com Sat May 25 00:37:03 2013
From: paul.blelloch at gmail.com (Paul Blelloch)
Date: Sat, 25 May 2013 04:37:03 -0000
Subject: [SciPy-User] sqrtm is too slow for matrices of size 1000
In-Reply-To:
References:
Message-ID: <7d0779e0-8d21-45a9-9d1b-b9049571ac13@googlegroups.com>

I don't have an answer, but I can confirm that the algorithm seems to be fairly slow. I compared a sqrtm of a 1258x1258 random (non-symmetric) matrix in Scipy 0.12.0 and Matlab 2011b on the same machine (weak laptop). Matlab was 38.5 seconds. Scipy was 367 seconds. I did look briefly at the code in both Scipy and Matlab, and they both appear to be using an algorithm by Nicholas J. Higham that's based on a Schur decomposition.
I checked the elapsed time for the Schur decomposition itself, and it wasn't very different (~9 seconds in Scipy and 7 seconds in Matlab). In fact, the Scipy version first returns the real Schur decomposition and then converts that to complex, which only seems to take about 3.5 seconds. The difference would appear to be somewhere else in the algorithm. Clearly it can be improved, but that would require some digging.

On Saturday, April 13, 2013 8:47:25 AM UTC-7, Vivek Kulkarni wrote:
>
> Hi,
> I am implementing spectral clustering for my course work and am using the
> sqrtm function to find the square root of a matrix. But it's far too slow;
> my matrix is of size (1258,1258). Any suggestions on how I can speed things
> up or some other function that scipy supports which can be used.
>
> I am implementing the algorithm as described here:
> http://books.nips.cc/papers/files/nips14/AA35.pdf
>
> Finding the square root of D^(-1) is way too slow for D^-1 of size
> (1258,1258).
>
> Thanks.
> Vivek.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From paul.blelloch at gmail.com Sat May 25 00:46:58 2013
From: paul.blelloch at gmail.com (Paul Blelloch)
Date: Sat, 25 May 2013 04:46:58 -0000
Subject: [SciPy-User] sqrtm is too slow for matrices of size 1000
In-Reply-To:
References:
Message-ID: <03398196-8dfb-4be3-b45f-9804c1abff84@googlegroups.com>

I looked at the rest of the algorithm, and Scipy and Matlab appear to be doing nearly the same thing. In Scipy there are three nested for loops, as follows:

    for j in range(n):
        R[j,j] = sqrt(T[j,j])
        for i in range(j-1,-1,-1):
            s = 0
            for k in range(i+1,j):
                s = s + R[i,k]*R[k,j]
            R[i,j] = (T[i,j] - s)/(R[i,i] + R[j,j])

While in Matlab it's two nested loops, as follows:

    for j=1:n
        R(j,j) = sqrt(T(j,j));
        for i=j-1:-1:1
            k = i+1:j-1;
            s = R(i,k)*R(k,j);
            R(i,j) = (T(i,j) - s)/(R(i,i) + R(j,j));
        end
    end

The difference is that Matlab replaces the inner loop with a vector multiply of a slice of the matrix.
This is where the speed difference may be. So if you're feeling adventurous, you could try to do the same thing.

On Saturday, April 13, 2013 8:47:25 AM UTC-7, Vivek Kulkarni wrote:
>
> Hi,
> I am implementing spectral clustering for my course work and am using the
> sqrtm function to find the square root of a matrix. But it's far too slow;
> my matrix is of size (1258,1258). Any suggestions on how I can speed things
> up or some other function that scipy supports which can be used.
>
> I am implementing the algorithm as described here:
> http://books.nips.cc/papers/files/nips14/AA35.pdf
>
> Finding the square root of D^(-1) is way too slow for D^-1 of size
> (1258,1258).
>
> Thanks.
> Vivek.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From paul.blelloch at gmail.com Sat May 25 01:21:15 2013
From: paul.blelloch at gmail.com (Paul Blelloch)
Date: Sat, 25 May 2013 05:21:15 -0000
Subject: [SciPy-User] sqrtm is too slow for matrices of size 1000
In-Reply-To:
References:
Message-ID: <63b35b81-023d-4950-a584-7b2bcbfee247@googlegroups.com>

I think that I found the problem. It was in not recognizing that the inner for loop is actually a dot product. If I replace the following lines of code:

    s = 0
    for k in range(i+1,j):
        s = s + R[i,k]*R[k,j]

with

    s = np.dot(R[i,(i+1):j],R[(i+1):j,j])

the run time decreases from 367 to 15.6 seconds. My guess is that you could get considerable further speedup, but I'm pleased with the 15.6 seconds. If you copy the sqrtm function from scipy and make that change, I think that you'll see considerable improvement.

On Saturday, April 13, 2013 8:47:25 AM UTC-7, Vivek Kulkarni wrote:
>
> Hi,
> I am implementing spectral clustering for my course work and am using the
> sqrtm function to find the square root of a matrix. But it's far too slow;
> my matrix is of size (1258,1258). Any suggestions on how I can speed things
> > I am implementing the algorithm as described here: > http://books.nips.cc/papers/files/nips14/AA35.pdf > > Finding the square root of D^(-1) is way too slow for D^-1 of size > (1258,1258). > > > Thanks. > Vivek. > -------------- next part -------------- An HTML attachment was scrubbed... URL: