From newville at cars.uchicago.edu Tue Jul 1 15:14:46 2014 From: newville at cars.uchicago.edu (Matt Newville) Date: Tue, 1 Jul 2014 14:14:46 -0500 Subject: [SciPy-Dev] SciPy-Dev Digest, Vol 129, Issue 1 In-Reply-To: References: Message-ID: > Date: Tue, 1 Jul 2014 00:05:43 +0000 (UTC) > From: Sturla Molden > Subject: Re: [SciPy-Dev] SciPy-Dev Digest, Vol 128, Issue 5 > To: scipy-dev at scipy.org > Message-ID: > <1816260967425865047.889482sturla.molden-gmail.com at news.gmane.org> > Content-Type: text/plain; charset=UTF-8 > > Matt Newville wrote: > > > I don't disagree that scipy could use more pure optimizers, but I also > > think that striving for a more consistent and elegant interface to these > > would be very helpful. With the notable exception of the relatively recent > > unification of the scalar minimizers with minimize(), it seems that many of > > the existing methods are fairly bare-bones wrappings of underlying C or > > Fortran code. Of course, having such wrapping is critically important, > > but I think there is a need for a higher level interface as well. > > The raison d'être for SciPy is "nice to use". So clearly simple and > intuitive high-level interfaces are needed. If we only cared about speed we > should all be coding in Fortran 77. Personally I am willing to sacrifice a > lot of speed for a nice high-level interface. The solvers in scipy.optimize use interfaces that are clearly inherited directly from the Fortran, with very little change, even in the use of short argument names. In some ways that makes it easy for old-timers who see leastsq() as a shallow wrapping of MINPACK's lmdif/lmder. It's "nice" in that the objective function is written in Python, but the interfaces to these functions themselves are not very Pythonic. > Currently my main interest in SciPy's LM is the underlying solver, though. > It's a very old Fortran code that even supplies its own linear algebra > solvers because it was written before LAPACK. It's not very nice on modern > computers, for various reasons. Could you elaborate? What do you see as the main problems, and why is it not very nice on modern computers? --Matt Newville -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.dvro at gmail.com Wed Jul 2 07:51:45 2014 From: victor.dvro at gmail.com (Dayvid Victor) Date: Wed, 2 Jul 2014 08:51:45 -0300 Subject: [SciPy-Dev] scikit package naming/submission Message-ID: Hello, I am creating a new package which, for now, is out of scope for sklearn. So I created scikit-protopy (current link: https://github.com/dvro/scikit-protopy). So far I have seen the following use: - from scikits.protopy import * - from skprotopy import * But I'm currently using: - from protopy import * Is it ok? Will it be included in the scikits ( https://scikits.appspot.com/scikits) when submitted to pypi? Thanks, -- *Dayvid Victor R. de Oliveira* PhD Candidate in Computer Science at Federal University of Pernambuco (UFPE) MSc in Computer Science at Federal University of Pernambuco (UFPE) BSc in Computer Engineering - Federal University of Pernambuco (UFPE) -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed Jul 2 12:16:43 2014 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 2 Jul 2014 17:16:43 +0100 Subject: [SciPy-Dev] scikit package naming/submission In-Reply-To: References: Message-ID: On 2 Jul 2014 12:52, "Dayvid Victor" wrote: > > Hello, > > I am creating a new package which, for now, is out of scope for sklearn.
So I created scikit-protopy (current link: https://github.com/dvro/scikit-protopy). > > So far I have seen the following use: > from scikits.protopy import * > from skprotopy import * > But I'm currently using: > from protopy import * > Is it ok? It's fine - the "scikits brand" is a nice idea that's turned out not to matter terribly much in practice (IMHO). There are lots of popular packages that don't use the scikits namespace, and several of the popular packages that used to use the scikits namespace have given up on it because it causes distribution headaches. As long as your package is googleable and on PyPi, then people will be able to find and install it. > Will it be included in the scikits (https://scikits.appspot.com/scikits) when submitted to pypi? Probably not (at least by default?). Listing on this website is probably the only concrete advantage given by using the scikits brand. You can decide how important that is to you :-) (Technically I think you could name the project scikits-protopy on PyPi while having the python package be called just protopy, but I wouldn't recommend it. IMO PyPi names and python packages should always match unless you have a very good reason why not (e.g. backcompat constraints). Anything else is pointlessly confusing for everyone trying to install your package.) -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Wed Jul 2 13:39:01 2014 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 02 Jul 2014 19:39:01 +0200 Subject: [SciPy-Dev] scikit package naming/submission Message-ID: I actually think that it should be included on the page once it is on pypi: the page simply uses a regex on pypi if I remember correctly. Gaël
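For reference, the arrangement Nathaniel mentions at the end of his reply (a PyPI project name that differs from the importable package name) would look roughly like the sketch below; every value here is a placeholder, not the actual scikit-protopy configuration:

# a minimal setup.py sketch in which the PyPI project name and the
# importable package name differ; all names and versions are placeholders
from setuptools import setup, find_packages

setup(
    name='scikit-protopy',      # the name used on PyPI: pip install scikit-protopy
    version='0.1.0',
    packages=find_packages(),   # ships the 'protopy' package: import protopy
    install_requires=['numpy', 'scipy', 'scikit-learn'],
)

Whether that dependency list is right for protopy is of course for its author to decide; the point is only that the two names are set independently.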
-------- Original message --------
From: Nathaniel Smith
Date: 02/07/2014 18:16 (GMT+01:00)
To: SciPy Developers List
Subject: Re: [SciPy-Dev] scikit package naming/submission
On 2 Jul 2014 12:52, "Dayvid Victor" wrote: > > Hello, > > I am creating a new package which, for now, is out of scope for sklearn. So I created scikit-protopy (current link: https://github.com/dvro/scikit-protopy). > > So far I have seen the following use: > from scikits.protopy import * > from skprotopy import * > But I'm currently using: > from protopy import * > Is it ok? It's fine - the "scikits brand" is a nice idea that's turned out not to matter terribly much in practice (IMHO). There are lots of popular packages that don't use the scikits namespace, and several of the popular packages that used to use the scikits namespace have given up on it because it causes distribution headaches. As long as your package is googleable and on PyPi, then people will be able to find and install it. > Will it be included in the scikits (https://scikits.appspot.com/scikits) when submited to pypi? Probably not (at least by default?). Listing on this website is probably the only concrete advantage given by using the scikits brand. You can decide how important that is to you :-) (Technically I think you could name the project scikits-protopy on PyPi while having the python package be called just protopy, but I wouldn't recommend it. IMO PyPi names and python packages should always match unless you have a very good reason why not (e.g. backcompat constraints). Anything else is pointlessly confusing for everyone trying to install your package.) -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexandre.gramfort at inria.fr Fri Jul 4 07:44:41 2014 From: alexandre.gramfort at inria.fr (Alexandre Gramfort) Date: Fri, 4 Jul 2014 13:44:41 +0200 Subject: [SciPy-Dev] griddata equivalent of matlab "V4" option Message-ID: hi, scipy does not seem to implement the griddata V4 option from matlab. The original reference is : David T. Sandwell, Biharmonic spline interpolation of GEOS-3 and SEASAT altimeter data, Geophysical Research Letters, 2, 139-142, 1987. is there a reason for it? should I look for it somewhere else? thanks Alex From pav at iki.fi Fri Jul 4 14:59:50 2014 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 04 Jul 2014 21:59:50 +0300 Subject: [SciPy-Dev] griddata equivalent of matlab "V4" option In-Reply-To: References: Message-ID: 04.07.2014 14:44, Alexandre Gramfort kirjoitti: > scipy does not seem to implement the griddata V4 option from matlab. > > The original reference is : > > David T. Sandwell, Biharmonic spline interpolation of GEOS-3 and > SEASAT altimeter data, Geophysical Research Letters, 2, 139-142, 1987. > > is there a reason for it? Is there a good reason to implement it? -- Pauli Virtanen From alexandre.gramfort at inria.fr Fri Jul 4 15:38:22 2014 From: alexandre.gramfort at inria.fr (Alexandre Gramfort) Date: Fri, 4 Jul 2014 21:38:22 +0200 Subject: [SciPy-Dev] griddata equivalent of matlab "V4" option In-Reply-To: References: Message-ID: > Is there a good reason to implement it? I need to produce images like: http://fieldtrip.fcdonders.nl/_media/tutorial/plotting/figure4.png?cache=&w=568&h=496 the black dots are the sensors where measurements are done. the interpolation needs to extrapolate a tiny bit to fill the circle although the sensors are a little inside. The options of griddata don't do the job. cubic cannot extrapolate and nearest is not nice. 
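For the extrapolate-a-little use case Alexandre describes, one option already in SciPy is a thin-plate radial basis function, which is closely related to the biharmonic spline behind MATLAB's 'v4' mode; Alex Liberzon suggests the same thing further down in this thread. A minimal sketch with made-up sensor positions and values (not the real data):

import numpy as np
from scipy.interpolate import Rbf

# made-up sensor layout and measurements, standing in for the real data
x, y = np.random.uniform(-1, 1, (2, 64))
v = np.cos(3 * x) * np.sin(2 * y)

# thin-plate spline RBF: smooth, and willing to extrapolate a little
# beyond the convex hull of the sensors
rbf = Rbf(x, y, v, function='thin_plate')

# evaluate on a grid slightly larger than the sensor layout
xi, yi = np.meshgrid(np.linspace(-1.1, 1.1, 200), np.linspace(-1.1, 1.1, 200))
vi = rbf(xi, yi)

How closely this matches MATLAB's 'v4' output in practice would need to be checked against real data.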
Alex From lasagnadavide at gmail.com Sat Jul 5 06:36:24 2014 From: lasagnadavide at gmail.com (Davide Lasagna) Date: Sat, 5 Jul 2014 11:36:24 +0100 Subject: [SciPy-Dev] griddata equivalent of matlab "V4" option In-Reply-To: References: Message-ID: Do you by chance happen to know the value on the circle? Because you could add some artificial points and use whatever SciPy provides. Davide On 4 Jul 2014 20:38, "Alexandre Gramfort" wrote: > > Is there a good reason to implement it? > > I need to produce images like: > > > http://fieldtrip.fcdonders.nl/_media/tutorial/plotting/figure4.png?cache=&w=568&h=496 > > the black dots are the sensors where measurements are done. > > the interpolation needs to extrapolate a tiny bit to fill the circle > although the sensors are a little inside. The options of griddata > don't do the job. cubic cannot extrapolate and nearest is not nice. > > Alex > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexandre.gramfort at inria.fr Sat Jul 5 15:12:03 2014 From: alexandre.gramfort at inria.fr (Alexandre Gramfort) Date: Sat, 5 Jul 2014 21:12:03 +0200 Subject: [SciPy-Dev] griddata equivalent of matlab "V4" option In-Reply-To: References: Message-ID: > Do you by chance happen to know the value on the circle? Because you could > add some artificial points and use whatever SciPy provides. no I don't. In the mean time we reimplemented this algorithm. let me know if it shall be pushed to scipy. Alex From alexlib at eng.tau.ac.il Sat Jul 5 15:13:01 2014 From: alexlib at eng.tau.ac.il (Alex Liberzon) Date: Sat, 5 Jul 2014 22:13:01 +0300 Subject: [SciPy-Dev] griddata equivalent of matlab "V4" option Message-ID: I believe the Radial Basis Function can do the work: http://wiki.scipy.org/Cookbook/RadialBasisFunctions On Sat, Jul 5, 2014 at 8:00 PM, wrote: > Send SciPy-Dev mailing list submissions to > scipy-dev at scipy.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://mail.scipy.org/mailman/listinfo/scipy-dev > or, via email, send a message with subject or body 'help' to > scipy-dev-request at scipy.org > > You can reach the person managing the list at > scipy-dev-owner at scipy.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of SciPy-Dev digest..." > > > Today's Topics: > > 1. Re: griddata equivalent of matlab "V4" option (Pauli Virtanen) > 2. Re: griddata equivalent of matlab "V4" option (Alexandre Gramfort) > 3. Re: griddata equivalent of matlab "V4" option (Davide Lasagna) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Fri, 04 Jul 2014 21:59:50 +0300 > From: Pauli Virtanen > Subject: Re: [SciPy-Dev] griddata equivalent of matlab "V4" option > To: scipy-dev at scipy.org > Message-ID: > Content-Type: text/plain; charset=ISO-8859-1 > > 04.07.2014 14:44, Alexandre Gramfort kirjoitti: > > scipy does not seem to implement the griddata V4 option from matlab. > > > > The original reference is : > > > > David T. Sandwell, Biharmonic spline interpolation of GEOS-3 and > > SEASAT altimeter data, Geophysical Research Letters, 2, 139-142, 1987. > > > > is there a reason for it? > > Is there a good reason to implement it? 
> > -- > Pauli Virtanen > > > > ------------------------------ > > Message: 2 > Date: Fri, 4 Jul 2014 21:38:22 +0200 > From: Alexandre Gramfort > Subject: Re: [SciPy-Dev] griddata equivalent of matlab "V4" option > To: SciPy Developers List > Message-ID: > < > CADeotZrpfFj7S3B3jmHvDWxcVtgRg283oCrYRTegd2gqyGQ6sQ at mail.gmail.com> > Content-Type: text/plain; charset=UTF-8 > > > Is there a good reason to implement it? > > I need to produce images like: > > > http://fieldtrip.fcdonders.nl/_media/tutorial/plotting/figure4.png?cache=&w=568&h=496 > > the black dots are the sensors where measurements are done. > > the interpolation needs to extrapolate a tiny bit to fill the circle > although the sensors are a little inside. The options of griddata > don't do the job. cubic cannot extrapolate and nearest is not nice. > > Alex > > > ------------------------------ > > Message: 3 > Date: Sat, 5 Jul 2014 11:36:24 +0100 > From: Davide Lasagna > Subject: Re: [SciPy-Dev] griddata equivalent of matlab "V4" option > To: SciPy Developers List > Message-ID: > sh8aES7wn369ay7+WzqLq4QevQBgns7ambw at mail.gmail.com> > Content-Type: text/plain; charset="utf-8" > > Do you by chance happen to know the value on the circle? Because you could > add some artificial points and use whatever SciPy provides. > > Davide > On 4 Jul 2014 20:38, "Alexandre Gramfort" > wrote: > > > > Is there a good reason to implement it? > > > > I need to produce images like: > > > > > > > http://fieldtrip.fcdonders.nl/_media/tutorial/plotting/figure4.png?cache=&w=568&h=496 > > > > the black dots are the sensors where measurements are done. > > > > the interpolation needs to extrapolate a tiny bit to fill the circle > > although the sensors are a little inside. The options of griddata > > don't do the job. cubic cannot extrapolate and nearest is not nice. > > > > Alex > > _______________________________________________ > > SciPy-Dev mailing list > > SciPy-Dev at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-dev > > > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: > http://mail.scipy.org/pipermail/scipy-dev/attachments/20140705/e08215ed/attachment-0001.html > > ------------------------------ > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > > End of SciPy-Dev Digest, Vol 129, Issue 5 > ***************************************** > -- Alex Liberzon | Turbulence Structure Laboratory | School of Mechanical Engineering | Tel Aviv University | Tel Aviv 69978 | Israel E-mail: alexlib at tau.ac.il | Office: +972-3-640-8928 | Lab: +972-3-640-6860 (telefax) | www.eng.tau.ac.il/turbulencelab -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sat Jul 5 15:24:10 2014 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 5 Jul 2014 20:24:10 +0100 Subject: [SciPy-Dev] griddata equivalent of matlab "V4" option In-Reply-To: References: Message-ID: On Sat, Jul 5, 2014 at 8:12 PM, Alexandre Gramfort wrote: >> Do you by chance happen to know the value on the circle? Because you could >> add some artificial points and use whatever SciPy provides. > > no I don't. In the mean time we reimplemented this algorithm. > > let me know if it shall be pushed to scipy. I'm not a scipy dev, but I am interested in the functionality, so if your code is a good way to provide it then I'd vote for submitting a PR to scipy :-) -- Nathaniel J. 
Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From denis.engemann at gmail.com Sat Jul 5 15:45:00 2014 From: denis.engemann at gmail.com (Denis A. Engemann) Date: Sat, 5 Jul 2014 21:45:00 +0200 Subject: [SciPy-Dev] griddata equivalent of matlab "V4" option In-Reply-To: References: Message-ID: <80FE735A-68B7-4334-B814-F2D58FDBA028@gmail.com> Cool, I'll open a PR tomorrow if no one objects. > On Jul 5, 2014, at 9:24 PM, Nathaniel Smith wrote: > > On Sat, Jul 5, 2014 at 8:12 PM, Alexandre Gramfort > wrote: >>> Do you by chance happen to know the value on the circle? Because you could >>> add some artificial points and use whatever SciPy provides. >> >> no I don't. In the meantime we reimplemented this algorithm. >> >> let me know if it shall be pushed to scipy. > > I'm not a scipy dev, but I am interested in the functionality, so if > your code is a good way to provide it then I'd vote for submitting a > PR to scipy :-) > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev From rajsai24 at gmail.com Thu Jul 10 06:08:53 2014 From: rajsai24 at gmail.com (Sai Rajeshwar) Date: Thu, 10 Jul 2014 15:38:53 +0530 Subject: [SciPy-Dev] scipy improve performance by parallelizing Message-ID: hi all, im trying to optimise a python code that takes a huge amount of time in scipy functions such as scipy.signal.convolve. Following are some of my queries regarding the same.. It would be great to hear from you.. thanks.. ---------------------------------------------------- 1) Can Scipy take advantage of multi-cores.. if so how 2) what are ways we can improve the performance of scipy/numpy functions eg: using openmp, mpi etc 3) If scipy internally uses blas/mkl libraries, can we enable parallelism through these? looks like i have to work on internals of scipy.. thanks a lot.. *with regards..* *M. Sai Rajeswar* *M-tech Computer Technology* *IIT Delhi----------------------------------Cogito Ergo Sum---------* -------------- next part -------------- An HTML attachment was scrubbed... URL: From nouiz at nouiz.org Thu Jul 10 09:00:20 2014 From: nouiz at nouiz.org (=?UTF-8?B?RnLDqWTDqXJpYyBCYXN0aWVu?=) Date: Thu, 10 Jul 2014 09:00:20 -0400 Subject: [SciPy-Dev] scipy improve performance by parallelizing In-Reply-To: References: Message-ID: Specifically about convolution, there is a faster implementation in Theano: http://deeplearning.net/software/theano/library/tensor/nnet/conv.html It allows you to do multiple convolutions at the same time. There is a parallel implementation; sometimes it speeds things up, but other times it slows things down. Fred p.s. I'm a Theano developer. On Thu, Jul 10, 2014 at 6:08 AM, Sai Rajeshwar wrote: > hi all, > > im trying to optimise a python code that takes a huge amount of time in scipy > functions such as scipy.signal.convolve. Following are some of my queries > regarding the same.. It would be great to hear from you.. thanks.. > ---------------------------------------------------- > 1) Can Scipy take advantage of multi-cores.. if so how > 2) what are ways we can improve the performance of scipy/numpy functions > eg: using openmp, mpi etc > 3) If scipy internally uses blas/mkl libraries, can we enable parallelism > through these? > > > looks like i have to work on internals of scipy.. thanks a lot.. > > > *with regards..* > > *M.
Sai Rajeswar* > *M-tech Computer Technology* > > > *IIT Delhi----------------------------------Cogito Ergo Sum---------* > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ashwinsrnth at gmail.com Thu Jul 10 11:19:00 2014 From: ashwinsrnth at gmail.com (Ashwin Srinath) Date: Thu, 10 Jul 2014 11:19:00 -0400 Subject: [SciPy-Dev] scipy improve performance by parallelizing In-Reply-To: References: Message-ID: Hey, Sai I'm no expert, so I'll just share a few links to start this discussion. You definitely want to look at Cython if you're computing with NumPy arrays. If you're familiar with the MPI programming model, you want to check out mpi4py . If you have NVIDIA GPUs that you'd like to take advantage of, check out PyCUDA . Thanks, Ashwin On Thu, Jul 10, 2014 at 6:08 AM, Sai Rajeshwar wrote: > hi all, > > im trying to optimise a python code takes huge amount of time on scipy > functions such as scipy.signa.conv. Following are some of my queries > regarding the same.. It would be great to hear from you.. thanks.. > ---------------------------------------------------- > 1) Can Scipy take advantage of multi-cores.. if so how > 2)what are ways we can improve the performance of scipy/numpy functions > eg: using openmp, mpi etc > 3)If scipy internally use blas/mkl libraries can we enable parallelism > through these? > > > looks like i have to work on internals of scipy.. thanks a lot.. > > > *with regards..* > > *M. Sai Rajeswar* > *M-tech Computer Technology* > > > *IIT Delhi----------------------------------Cogito Ergo Sum---------* > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.dvro at gmail.com Fri Jul 11 08:32:42 2014 From: victor.dvro at gmail.com (Dayvid Victor) Date: Fri, 11 Jul 2014 09:32:42 -0300 Subject: [SciPy-Dev] scipy improve performance by parallelizing In-Reply-To: References: Message-ID: Hi =, About PyCUDA, scikits.cuda package uses PyCUDA to provide high-level functions similar to those in numpy. maybe you should check it out (at least for examples)! []'s On Thu, Jul 10, 2014 at 12:19 PM, Ashwin Srinath wrote: > Hey, Sai > > I'm no expert, so I'll just share a few links to start this discussion. > You definitely want to look at Cython if you're > computing with NumPy arrays. If you're familiar with the MPI programming > model, you want to check out mpi4py . If you > have NVIDIA GPUs that you'd like to take advantage of, check out PyCUDA > . > > Thanks, > Ashwin > > > On Thu, Jul 10, 2014 at 6:08 AM, Sai Rajeshwar wrote: > >> hi all, >> >> im trying to optimise a python code takes huge amount of time on scipy >> functions such as scipy.signa.conv. Following are some of my queries >> regarding the same.. It would be great to hear from you.. thanks.. >> ---------------------------------------------------- >> 1) Can Scipy take advantage of multi-cores.. if so how >> 2)what are ways we can improve the performance of scipy/numpy functions >> eg: using openmp, mpi etc >> 3)If scipy internally use blas/mkl libraries can we enable parallelism >> through these? >> >> >> looks like i have to work on internals of scipy.. thanks a lot.. >> >> >> *with regards..* >> >> *M. 
Sai Rajeswar* >> *M-tech Computer Technology* >> >> >> *IIT Delhi----------------------------------Cogito Ergo Sum---------* >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -- *Dayvid Victor R. de Oliveira* PhD Candidate in Computer Science at Federal University of Pernambuco (UFPE) MSc in Computer Science at Federal University of Pernambuco (UFPE) BSc in Computer Engineering - Federal University of Pernambuco (UFPE) -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Fri Jul 11 13:43:17 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Fri, 11 Jul 2014 19:43:17 +0200 Subject: [SciPy-Dev] scipy improve performance by parallelizing In-Reply-To: References: Message-ID: <53C02235.8000705@googlemail.com> for simple convolutions there is also np.convolve compared to scipy it releases the GIL and you can use normal python threads for parallization if you need to compute many independent convolutions and not just one. That said scipy should probably release the GIL too, probably a bug that it doesn't. On 10.07.2014 17:19, Ashwin Srinath wrote: > Hey, Sai > > I'm no expert, so I'll just share a few links to start this discussion. > You definitely want to look at Cython if you're > computing with NumPy arrays. If you're familiar with the MPI programming > model, you want to check out mpi4py . If you > have NVIDIA GPUs that you'd like to take advantage of, check out PyCUDA > . > > Thanks, > Ashwin > > > On Thu, Jul 10, 2014 at 6:08 AM, Sai Rajeshwar > wrote: > > hi all, > > im trying to optimise a python code takes huge amount of time on > scipy functions such as scipy.signa.conv. Following are some of my > queries regarding the same.. It would be great to hear from you.. > thanks.. > ---------------------------------------------------- > 1) Can Scipy take advantage of multi-cores.. if so how > 2)what are ways we can improve the performance of scipy/numpy > functions eg: using openmp, mpi etc > 3)If scipy internally use blas/mkl libraries can we enable > parallelism through these? > > > looks like i have to work on internals of scipy.. thanks a lot.. > > > *with regards..* > * > * > *M. Sai Rajeswar* > *M-tech Computer Technology* > *IIT Delhi > ----------------------------------Cogito Ergo Sum--------- > * > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > From derek at astro.physik.uni-goettingen.de Fri Jul 11 08:32:34 2014 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Fri, 11 Jul 2014 14:32:34 +0200 Subject: [SciPy-Dev] scipy improve performance by parallelizing In-Reply-To: References: Message-ID: <26946BDF-2841-4E79-AF20-9266A2C97377@astro.physik.uni-goettingen.de> On 10 Jul 2014, at 05:19 pm, Ashwin Srinath wrote: > I'm no expert, so I'll just share a few links to start this discussion. You definitely want to look at Cython if you're computing with NumPy arrays. If you're familiar with the MPI programming model, you want to check out mpi4py. 
If you have NVIDIA GPUs that you'd like to take advantage of, check out PyCUDA. > > Thanks, > Ashwin > > > On Thu, Jul 10, 2014 at 6:08 AM, Sai Rajeshwar wrote: > hi all, > > im trying to optimise a python code takes huge amount of time on scipy functions such as scipy.signa.conv. Following are some of my queries regarding the same.. It would be great to hear from you.. thanks.. > ---------------------------------------------------- > 1) Can Scipy take advantage of multi-cores.. if so how > 2)what are ways we can improve the performance of scipy/numpy functions eg: using openmp, mpi etc > 3)If scipy internally use blas/mkl libraries can we enable parallelism through these? > If your operations are using the BLAS functions a lot, you get SMP parallelisation for very cheap by linking to the multithreaded MKL or ACML versions and setting OMP_NUM_THREADS/MKL_NUM_THREADS to the no. of available cores. Cheers, Derek From rajsai24 at gmail.com Sun Jul 13 01:59:18 2014 From: rajsai24 at gmail.com (Sai Rajeshwar) Date: Sun, 13 Jul 2014 11:29:18 +0530 Subject: [SciPy-Dev] scipy improve performance by parallelizing In-Reply-To: References: Message-ID: ok thanks for scipy especially is there any way .. we can speed it up.. im right now using scipy.signal.convolve which is taking huge amount of time... can i take liverage on openmp/mpi/cuda? *with regards..* *M. Sai Rajeswar* *M-tech Computer Technology* *IIT Delhi----------------------------------Cogito Ergo Sum---------* On Fri, Jul 11, 2014 at 6:02 PM, Dayvid Victor wrote: > Hi =, > > About PyCUDA, scikits.cuda package uses > PyCUDA to provide high-level functions similar to those in numpy. maybe you > should check it out (at least for examples)! > > []'s > > > On Thu, Jul 10, 2014 at 12:19 PM, Ashwin Srinath > wrote: > >> Hey, Sai >> >> I'm no expert, so I'll just share a few links to start this discussion. >> You definitely want to look at Cython if you're >> computing with NumPy arrays. If you're familiar with the MPI programming >> model, you want to check out mpi4py . If you >> have NVIDIA GPUs that you'd like to take advantage of, check out PyCUDA >> . >> >> Thanks, >> Ashwin >> >> >> On Thu, Jul 10, 2014 at 6:08 AM, Sai Rajeshwar >> wrote: >> >>> hi all, >>> >>> im trying to optimise a python code takes huge amount of time on >>> scipy functions such as scipy.signa.conv. Following are some of my queries >>> regarding the same.. It would be great to hear from you.. thanks.. >>> ---------------------------------------------------- >>> 1) Can Scipy take advantage of multi-cores.. if so how >>> 2)what are ways we can improve the performance of scipy/numpy functions >>> eg: using openmp, mpi etc >>> 3)If scipy internally use blas/mkl libraries can we enable parallelism >>> through these? >>> >>> >>> looks like i have to work on internals of scipy.. thanks a lot.. >>> >>> >>> *with regards..* >>> >>> *M. Sai Rajeswar* >>> *M-tech Computer Technology* >>> >>> >>> *IIT Delhi----------------------------------Cogito Ergo Sum---------* >>> >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-dev >>> >>> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> > > > -- > *Dayvid Victor R. 
de Oliveira* > PhD Candidate in Computer Science at Federal University of Pernambuco > (UFPE) > MSc in Computer Science at Federal University of Pernambuco (UFPE) > BSc in Computer Engineering - Federal University of Pernambuco (UFPE) > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rajsai24 at gmail.com Sun Jul 13 08:28:26 2014 From: rajsai24 at gmail.com (Sai Rajeshwar) Date: Sun, 13 Jul 2014 17:58:26 +0530 Subject: [SciPy-Dev] scipy improve performance by parallelizing In-Reply-To: <26946BDF-2841-4E79-AF20-9266A2C97377@astro.physik.uni-goettingen.de> References: <26946BDF-2841-4E79-AF20-9266A2C97377@astro.physik.uni-goettingen.de> Message-ID: hi, thanks for the suggestions actually I am running my code on Stampede (TACC), where numpy, scipy are built against mkl libraries for optimal performance.. observations are as follows -------------------------------------------------- 1) setting different OMP_NUM_THREADS to different values did not change the runtimes 2) the code took the same time as it took on a mac pro with the accelerated framework for blas and lapack.. so is mkl not being helpful, or is it not getting configured to use multiple threads -------------------------- the statements taking a lot of time are as follows -------------------- 1) for i in xrange(conv_out_shape[1]): conv_out[0][i]=scipy.signal.convolve(self.input[0][i%self.image_shape[1]],numpy.rot90(self.W[0][i/self.image_shape[1]],2),mode='valid') 2) for i in xrange(pooled_shape[1]): for j in xrange(pooled_shape[2]): for k in xrange(pooled_shape[3]): for l in xrange(pooled_shape[4]): pooled[0][i][j][k][l]=math.tanh((numpy.sum(conv_out[0][i][j][k*3][l*3:(l+1)*3])+numpy.sum(conv_out[0][i][j][k*3+1][l*3:(l+1)*3])+numpy.sum(conv_out[0][i][j][k*3+2][l*3:(l+1)*3]))/9.0+b[i][j]) thanks *with regards..* *M. Sai Rajeswar* *M-tech Computer Technology* *IIT Delhi----------------------------------Cogito Ergo Sum---------* On Fri, Jul 11, 2014 at 6:02 PM, Derek Homeier < derek at astro.physik.uni-goettingen.de> wrote: > On 10 Jul 2014, at 05:19 pm, Ashwin Srinath wrote: > > > I'm no expert, so I'll just share a few links to start this discussion. > You definitely want to look at Cython if you're computing with NumPy > arrays. If you're familiar with the MPI programming model, you want to > check out mpi4py. If you have NVIDIA GPUs that you'd like to take advantage > of, check out PyCUDA. > > > > Thanks, > > Ashwin > > > > > > On Thu, Jul 10, 2014 at 6:08 AM, Sai Rajeshwar > wrote: > > hi all, > > > > im trying to optimise a python code that takes a huge amount of time in > scipy functions such as scipy.signal.convolve. Following are some of my queries > regarding the same.. It would be great to hear from you.. thanks.. > > ---------------------------------------------------- > > 1) Can Scipy take advantage of multi-cores.. if so how > > 2) what are ways we can improve the performance of scipy/numpy functions > eg: using openmp, mpi etc > > 3) If scipy internally uses blas/mkl libraries, can we enable parallelism > through these? > > > If your operations are using the BLAS functions a lot, you get SMP > parallelisation for very cheap by > linking to the multithreaded MKL or ACML versions and setting > OMP_NUM_THREADS/MKL_NUM_THREADS > to the no. of available cores.
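On the convolution side, one low-effort experiment with statement (1) above is to swap scipy.signal.convolve for scipy.signal.fftconvolve, which is often much faster once the kernel is not tiny; Sturla raises the same FFT idea further down in this thread. A minimal sketch with made-up shapes standing in for self.input and self.W, which are not shown in full here:

import numpy as np
from scipy import signal

# hypothetical stand-ins for one 3-D input volume and one 3-D filter
image = np.random.rand(9, 60, 60)
kernel = np.random.rand(3, 5, 5)

direct = signal.convolve(image, kernel, mode='valid')
fast = signal.fftconvolve(image, kernel, mode='valid')

print(np.allclose(direct, fast))  # the two agree up to floating-point error

Whether the FFT route actually wins depends on the real array sizes, so it is worth timing both on the actual data.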
> > Cheers, > Derek > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From davidmenhur at gmail.com Sun Jul 13 11:38:40 2014 From: davidmenhur at gmail.com (=?UTF-8?B?RGHPgGlk?=) Date: Sun, 13 Jul 2014 17:38:40 +0200 Subject: [SciPy-Dev] scipy improve performance by parallelizing In-Reply-To: References: <26946BDF-2841-4E79-AF20-9266A2C97377@astro.physik.uni-goettingen.de> Message-ID: On 13 July 2014 14:28, Sai Rajeshwar wrote: > > 2)for i in xrange(pooled_shape[1]): > for j in xrange(pooled_shape[2]): > for k in xrange(pooled_shape[3]): > for l in xrange(pooled_shape[4]): > > pooled[0][i][j][k][l]=math.tanh((numpy.sum(conv_out[0][i][j][k*3][l*3:(l+1)*3])+numpy.sum(conv_out[0][i][j][k*3+1][l*3:(l+1)*3])+numpy.sum(conv_out[0][i][j][k*3+2][l*3:(l+1)*3]))/9.0+b[i][j]) You should get a speed up by accessing the arrays in a more efficient way: pooled[0, i, j, k, l] = math.tanh((numpy.sum(conv_out[0, i, j, k*3, l*3:(l+1)*3]) + numpy.sum(conv_out[0, i, j, k*3+1, l*3:(l+1)*3]) + numpy.sum(conv_out[0, i, j, k*3+2, l*3:(l+1)*3]))/9.0 + b[i, j]) In fact: numpy.sum(conv_out[0, i, j, k*3, l*3:(l+1)*3]) + numpy.sum(conv_out[0, i, j, k*3+1, l*3:(l+1)*3]) seems equivalent to: numpy.sum(conv_out[0, i, j, k*3: k*3 + 2, l*3:(l+1)*3]) To take the last one into account: vec = numpy.sum(conv_out[0, i, j, k*3: k*3 + 3, l*3:(l+1)*3], axis=-1) pooled[0, i, j, k, l] = math.tanh((vec[0] + vec[1] + vec[2]) / 9.0 + b[i, j]) And you can probably get rid of the i and j indexes altogether. Something like this should work (untested): for k in... for l in... output = numpy.sum(conv_out[0, :, :, k*3:(k+1)*3, l*3:(l+1)*3], axis=(-2, -1)) / 9.0 output += b pooled[0, :, :, k, l] = numpy.tanh(output) In this case, one of the loops seems a great target for parallelisation. Also, Cython should help reduce the loop overhead. -------------- next part -------------- An HTML attachment was scrubbed... URL: From luke.pfister at gmail.com Sun Jul 13 13:00:55 2014 From: luke.pfister at gmail.com (Luke Pfister) Date: Sun, 13 Jul 2014 12:00:55 -0500 Subject: [SciPy-Dev] scipy improve performance by parallelizing In-Reply-To: References: <26946BDF-2841-4E79-AF20-9266A2C97377@astro.physik.uni-goettingen.de> Message-ID: Scipy does not call the MKL convolution function, so that isn't surprising. I've had good success with writing my own Cython wrapper around the Intel IPP convolution functions. On Sunday, July 13, 2014, Sai Rajeshwar wrote: > hi, thanks for the suggestions > > actually I am running my code on Stampede (TACC), where numpy, scipy are > built against mkl libraries for optimal performance.. observations are as > follows > -------------------------------------------------- > > 1) setting different OMP_NUM_THREADS to different values did not change > the runtimes > 2) the code took the same time as it took on a mac pro with the accelerated framework > for blas and lapack..
> > so is mkl not being helpful, or its not getting configured to use > multithreads > > -------------------------- > the statements taking lot fo time are like folllows > -------------------- > > 1) for i in xrange(conv_out_shape[1]): > > conv_out[0][i]=scipy.signal.convolve(self.input[0][i%self.image_shape[1]],numpy.rot90(self.W[0][i/self.image_shape[1]],2),mode='valid') > > > > 2)for i in xrange(pooled_shape[1]): > for j in xrange(pooled_shape[2]): > for k in xrange(pooled_shape[3]): > for l in xrange(pooled_shape[4]): > > pooled[0][i][j][k][l]=math.tanh((numpy.sum(conv_out[0][i][j][k*3][l*3:(l+1)*3])+numpy.sum(conv_out[0][i][j][k*3+1][l*3:(l+1)*3])+numpy.sum(conv_out[0][i][j][k*3+2][l*3:(l+1)*3]))/9.0+b[i][j]) > > > thanks > > *with regards..* > > *M. Sai Rajeswar* > *M-tech Computer Technology* > > > *IIT Delhi----------------------------------Cogito Ergo Sum---------* > > > On Fri, Jul 11, 2014 at 6:02 PM, Derek Homeier < > derek at astro.physik.uni-goettingen.de > > > wrote: > >> On 10 Jul 2014, at 05:19 pm, Ashwin Srinath > > wrote: >> >> > I'm no expert, so I'll just share a few links to start this discussion. >> You definitely want to look at Cython if you're computing with NumPy >> arrays. If you're familiar with the MPI programming model, you want to >> check out mpi4py. If you have NVIDIA GPUs that you'd like to take advantage >> of, check out PyCUDA. >> > >> > Thanks, >> > Ashwin >> > >> > >> > On Thu, Jul 10, 2014 at 6:08 AM, Sai Rajeshwar > > wrote: >> > hi all, >> > >> > im trying to optimise a python code takes huge amount of time on >> scipy functions such as scipy.signa.conv. Following are some of my queries >> regarding the same.. It would be great to hear from you.. thanks.. >> > ---------------------------------------------------- >> > 1) Can Scipy take advantage of multi-cores.. if so how >> > 2)what are ways we can improve the performance of scipy/numpy functions >> eg: using openmp, mpi etc >> > 3)If scipy internally use blas/mkl libraries can we enable parallelism >> through these? >> > >> If your operations are using the BLAS functions a lot, you get SMP >> parallelisation for very cheap by >> linking to the multithreaded MKL or ACML versions and setting >> OMP_NUM_THREADS/MKL_NUM_THREADS >> to the no. of available cores. >> >> Cheers, >> Derek >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rajsai24 at gmail.com Mon Jul 14 10:53:42 2014 From: rajsai24 at gmail.com (Sai Rajeshwar) Date: Mon, 14 Jul 2014 20:23:42 +0530 Subject: [SciPy-Dev] scipy improve performance by parallelizing In-Reply-To: References: Message-ID: hi frederic, thanks, actually im trying to implement a 3d-convolutional neural network as you can see in the snippet.. so you mean to say 1)instead of using scipy.signal.convolve i should import theano and use signal.conv2d , if so signal.conv2d is right or any other function according to my need.. 2)also any hints on speeding up numpy.sum in pooled[0][i][j][k][l]=math. tanh((numpy.sum(conv_out[0][i][j][k*3][l*3:(l+1)*3])+numpy.sum(conv_out[0][i][j][k*3+1][l*3:(l+1)*3])+numpy.sum(conv_out[0][i][j][k*3+2][l*3:(l+1)*3]))/9.0+b[i][j]) thanks a lot.. also i have seen your name some where in pylearn2.. are ua pylearn developer too. *with regards..* *M. 
Sai Rajeswar* *M-tech Computer Technology* *IIT Delhi----------------------------------Cogito Ergo Sum---------* On Thu, Jul 10, 2014 at 6:30 PM, Frédéric Bastien wrote: > Specifically about convolution, there is a faster implementation in Theano: > > http://deeplearning.net/software/theano/library/tensor/nnet/conv.html > > It allows you to do multiple convolutions at the same time. > > There is a parallel implementation; sometimes it speeds things up, but > other times it slows things down. > > Fred > > p.s. I'm a Theano developer. > > On Thu, Jul 10, 2014 at 6:08 AM, Sai Rajeshwar wrote: > >> hi all, >> >> im trying to optimise a python code that takes a huge amount of time in scipy >> functions such as scipy.signal.convolve. Following are some of my queries >> regarding the same.. It would be great to hear from you.. thanks.. >> ---------------------------------------------------- >> 1) Can Scipy take advantage of multi-cores.. if so how >> 2) what are ways we can improve the performance of scipy/numpy functions >> eg: using openmp, mpi etc >> 3) If scipy internally uses blas/mkl libraries, can we enable parallelism >> through these? >> >> >> looks like i have to work on internals of scipy.. thanks a lot.. >> >> >> *with regards..* >> >> *M. Sai Rajeswar* >> *M-tech Computer Technology* >> >> >> *IIT Delhi----------------------------------Cogito Ergo Sum---------* >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nouiz at nouiz.org Mon Jul 14 11:03:10 2014 From: nouiz at nouiz.org (=?UTF-8?B?RnLDqWTDqXJpYyBCYXN0aWVu?=) Date: Mon, 14 Jul 2014 11:03:10 -0400 Subject: [SciPy-Dev] scipy improve performance by parallelizing In-Reply-To: References: Message-ID: On Mon, Jul 14, 2014 at 10:53 AM, Sai Rajeshwar wrote: > hi frederic, > > thanks, actually im trying to implement a 3d-convolutional neural network > as you can see in the snippet.. so you mean to say > > 1)instead of using scipy.signal.convolve i should import theano and use > signal.conv2d > > , if so signal.conv2d is right or any other function according to my > need.. > We have some special conv3d for neural networks: http://deeplearning.net/software/theano/library/tensor/nnet/conv.html. Maybe they suit what you want better. But to be useful, you will need medium/big convolutions, not tiny videos. > > 2)also any hints on speeding up numpy.sum in > > pooled[0][i][j][k][l]=math. > > tanh((numpy.sum(conv_out[0][i][j][k*3][l*3:(l+1)*3])+numpy.sum(conv_out[0][i][j][k*3+1][l*3:(l+1)*3])+numpy.sum(conv_out[0][i][j][k*3+2][l*3:(l+1)*3]))/9.0+b[i][j]) > Someone else replied with some information on how to do less indexing and do the sum on bigger chunks of data each time. This could speed up your stuff. > thanks a lot.. also i have seen your name some where in pylearn2.. are > ua pylearn developer too. > Yes and no. I'm in the same lab as the main Pylearn2 dev and I make some small contributions from time to time (stuff mostly related to optimization or Theano). But I wouldn't call myself a pylearn2 core dev. Fred -------------- next part -------------- An HTML attachment was scrubbed...
URL: From padarn at gmail.com Tue Jul 15 05:54:02 2014 From: padarn at gmail.com (Padarn Wilson) Date: Tue, 15 Jul 2014 19:54:02 +1000 Subject: [SciPy-Dev] scipy improve performance by parallelizing In-Reply-To: References: Message-ID: On Tue, Jul 15, 2014 at 12:53 AM, Sai Rajeshwar wrote: > hi frederic, > > thanks, actually im trying to implement a 3d-convolutional neural network > as you can see in the snippet.. so you mean to say > > 1)instead of using scipy.signal.convolve i should import theano and use > signal.conv2d > > , if so signal.conv2d is right or any other function according to my > need.. > > 2)also any hints on speeding up numpy.sum in > > pooled[0][i][j][k][l]=math. > > tanh((numpy.sum(conv_out[0][i][j][k*3][l*3:(l+1)*3])+numpy.sum(conv_out[0][i][j][k*3+1][l*3:(l+1)*3])+numpy.sum(conv_out[0][i][j][k*3+2][l*3:(l+1)*3]))/9.0+b[i][j]) > > Is it actually the individual sums that are slow, or the whole loop? Its a bit hard to read, but it looks like you could vectorise the addition and then sum? Not sure if that would help much but worth a go maybe? > thanks a lot.. also i have seen your name some where in pylearn2.. are > ua pylearn developer too. > > > *with regards..* > > *M. Sai Rajeswar* > *M-tech Computer Technology* > > > *IIT Delhi----------------------------------Cogito Ergo Sum--------- * > > > On Thu, Jul 10, 2014 at 6:30 PM, Fr?d?ric Bastien wrote: > >> Specific about convolution, there is a faster implementation in Theano: >> >> http://deeplearning.net/software/theano/library/tensor/nnet/conv.html >> >> It allow you to do multiple convolution at the same time. >> >> There is a parallel implementation, but sometimes, it speed things up, >> but othertimes, it slow things down. >> >> Fred >> >> p.s. I'm a Theano developer. >> >> On Thu, Jul 10, 2014 at 6:08 AM, Sai Rajeshwar >> wrote: >> >>> hi all, >>> >>> im trying to optimise a python code takes huge amount of time on >>> scipy functions such as scipy.signa.conv. Following are some of my queries >>> regarding the same.. It would be great to hear from you.. thanks.. >>> ---------------------------------------------------- >>> 1) Can Scipy take advantage of multi-cores.. if so how >>> 2)what are ways we can improve the performance of scipy/numpy functions >>> eg: using openmp, mpi etc >>> 3)If scipy internally use blas/mkl libraries can we enable parallelism >>> through these? >>> >>> >>> looks like i have to work on internals of scipy.. thanks a lot.. >>> >>> >>> *with regards..* >>> >>> *M. Sai Rajeswar* >>> *M-tech Computer Technology* >>> >>> >>> *IIT Delhi----------------------------------Cogito Ergo Sum---------* >>> >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-dev >>> >>> >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Tue Jul 15 14:06:26 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Tue, 15 Jul 2014 20:06:26 +0200 Subject: [SciPy-Dev] __numpy_ufunc__ and 1.9 release Message-ID: <53C56DA2.40402@googlemail.com> hi, as you may know we want to release numpy 1.9 soon. 
We should have solved most indexing regressions the first beta showed. The remaining blockers are finishing the new __numpy_ufunc__ feature. This feature should allow for an alternative method of overriding the behavior of ufuncs from subclasses. It is described here: https://github.com/numpy/numpy/blob/master/doc/neps/ufunc-overrides.rst The current blocker issues are: https://github.com/numpy/numpy/issues/4753 https://github.com/numpy/numpy/pull/4815 I'm not too familiar with all the complications of subclassing so I can't really say how hard this is to solve. My issue is that there still seems to be debate on how to handle operator overriding correctly, and I am opposed to releasing a numpy with yet another experimental feature that may or may not be finished sometime later. Having datetime in an infinite experimental state is bad enough. I think nobody is served well if we release 1.9 with the feature prematurely, based on an unrepresentative set of users, and then later, after more users show up, see that we have to change its behavior. So I'm wondering if we should delay the introduction of this feature to 1.10, or is it important enough to wait until there is a consensus on the remaining issues? From rajsai24 at gmail.com Wed Jul 16 05:55:47 2014 From: rajsai24 at gmail.com (Sai Rajeshwar) Date: Wed, 16 Jul 2014 15:25:47 +0530 Subject: [SciPy-Dev] building scipy with umfpack and amd Message-ID: hi, im running a code which uses scipy.signal.convolve and numpy.sum extensively. I ran the code on two machines. One machine took much less time compared to the other with the same configuration, so i checked the scipy configuration on that machine and found that its scipy is built with umfpack and amd.. is this the reason behind it.. in what way do umfpack and amd aid scipy operations..? -------------------------------- >>> scipy.__config__.show() blas_info: libraries = ['blas'] library_dirs = ['/usr/lib64'] language = f77 amd_info: libraries = ['amd'] library_dirs = ['/usr/lib64'] define_macros = [('SCIPY_AMD_H', None)] swig_opts = ['-I/usr/include/suitesparse'] include_dirs = ['/usr/include/suitesparse'] lapack_info: libraries = ['lapack'] library_dirs = ['/usr/lib64'] language = f77 atlas_threads_info: NOT AVAILABLE blas_opt_info: libraries = ['blas'] library_dirs = ['/usr/lib64'] language = f77 define_macros = [('NO_ATLAS_INFO', 1)] atlas_blas_threads_info: NOT AVAILABLE umfpack_info: libraries = ['umfpack', 'amd'] library_dirs = ['/usr/lib64'] define_macros = [('SCIPY_UMFPACK_H', None), ('SCIPY_AMD_H', None)] swig_opts = ['-I/usr/include/suitesparse', '-I/usr/include/suitesparse'] include_dirs = ['/usr/include/suitesparse'] thanks a lot for your replies in advance *with regards..* *M. Sai Rajeswar* *M-tech Computer Technology* *IIT Delhi----------------------------------Cogito Ergo Sum---------* -------------- next part -------------- An HTML attachment was scrubbed... URL: From yoshiki89 at gmail.com Wed Jul 16 14:06:45 2014 From: yoshiki89 at gmail.com (Yoshiki Vazquez Baeza) Date: Wed, 16 Jul 2014 12:06:45 -0600 Subject: [SciPy-Dev] Adding Procrustes to SciPy Message-ID: <20140716180645.GG76326@rl1-1-220-56-dhcp.int.colorado.edu> Hello, There seems to be some interest in adding Procrustes analysis to SciPy; there is an existing implementation in scikit-bio ( https://github.com/biocore/scikit-bio/blob/master/skbio/math/stats/spatial.py a package in which I am a developer) which could probably be ported over.
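For orientation, the core of a classical Procrustes superimposition is a single SVD; the following is a minimal sketch of that step only (not the scikit-bio implementation), assuming the two input matrices are already centred and comparably scaled:

import numpy as np

def orthogonal_procrustes_sketch(a, b):
    # rotation r minimising ||a.dot(r) - b|| in the Frobenius norm,
    # obtained from one SVD of the cross-covariance matrix a.T b
    u, s, vt = np.linalg.svd(a.T.dot(b))
    return u.dot(vt), s.sum()

# made-up example: recover a random orthogonal transform of a point cloud
rng = np.random.RandomState(0)
a = rng.randn(10, 3)
a -= a.mean(axis=0)
q, _ = np.linalg.qr(rng.randn(3, 3))    # a random orthogonal matrix
b = a.dot(q)
r, scale = orthogonal_procrustes_sketch(a, b)
print(np.allclose(a.dot(r), b))         # True: the transform is recovered

The full routine (centring, scaling, and the disparity statistic) is what would actually need porting.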
The thing that's not particularly clear is where should this code live, the suggestion by Ralf Gommers is "linalg". However skbio puts the code inside the "spatial" submodule. This is the GitHub issue where this was initially discussed: https://github.com/scipy/scipy/issues/3786 Thanks! Yoshiki. From lists at informa.tiker.net Sun Jul 13 20:35:12 2014 From: lists at informa.tiker.net (Andreas Kloeckner) Date: Sun, 13 Jul 2014 19:35:12 -0500 Subject: [SciPy-Dev] scipy improve performance by parallelizing In-Reply-To: References: Message-ID: <53C325C0.4030306@informa.tiker.net> Am 10.07.2014 um 10:19 schrieb Ashwin Srinath: > I'm no expert, so I'll just share a few links to start this > discussion. You definitely want to look at Cython > if you're computing with NumPy arrays. If you're familiar with the MPI > programming model, you want to check out mpi4py > . If you have NVIDIA GPUs that you'd like to > take advantage of, check out PyCUDA > . Just stopping by to mention PyOpenCL [1] as a possible, non-Nvidia-specific (in fact not-GPU-specific) alternative to PyCUDA. [1] http://pypi.python.org/pypi/pyopencl Andreas -------------- next part -------------- An HTML attachment was scrubbed... URL: From rajsai24 at gmail.com Thu Jul 17 09:35:46 2014 From: rajsai24 at gmail.com (Sai Rajeshwar) Date: Thu, 17 Jul 2014 14:35:46 +0100 Subject: [SciPy-Dev] scipy improve performance by parallelizing In-Reply-To: References: <26946BDF-2841-4E79-AF20-9266A2C97377@astro.physik.uni-goettingen.de> Message-ID: ok luke. thanks can you throw some light on Cython wrapper for IPP convolution function.. how should i go about it ..to start with.. and bit of details would be helpful... thanks *with regards..* *M. Sai Rajeswar* *M-tech Computer Technology* *IIT Delhi----------------------------------Cogito Ergo Sum---------* On Sun, Jul 13, 2014 at 6:00 PM, Luke Pfister wrote: > Scipy does not call the MKL convolution function, so that isn't > surprising. > > I've had good success with writing my own Cython wrapper around the Intel > IPP convolution functions. > > > On Sunday, July 13, 2014, Sai Rajeshwar wrote: > >> hi , thanks for suggestions >> >> actually iam running my code on stampede tacc. where numpy,scipy are >> built against mkl libraries for optimal performance.. observations are as >> follows >> -------------------------------------------------- >> >> 1) setting different OMP_NUM_THREADS to different values didnot change >> the runtimes >> 2)the code took same time as it took on mac pro with accelerated >> framework for blas and lapack.. >> >> so is mkl not being helpful, or its not getting configured to use >> multithreads >> >> -------------------------- >> the statements taking lot fo time are like folllows >> -------------------- >> >> 1) for i in xrange(conv_out_shape[1]): >> >> conv_out[0][i]=scipy.signal.convolve(self.input[0][i%self.image_shape[1]],numpy.rot90(self.W[0][i/self.image_shape[1]],2),mode='valid') >> >> >> >> 2)for i in xrange(pooled_shape[1]): >> for j in xrange(pooled_shape[2]): >> for k in xrange(pooled_shape[3]): >> for l in xrange(pooled_shape[4]): >> >> pooled[0][i][j][k][l]=math.tanh((numpy.sum(conv_out[0][i][j][k*3][l*3:(l+1)*3])+numpy.sum(conv_out[0][i][j][k*3+1][l*3:(l+1)*3])+numpy.sum(conv_out[0][i][j][k*3+2][l*3:(l+1)*3]))/9.0+b[i][j]) >> >> >> thanks >> >> *with regards..* >> >> *M. 
Sai Rajeswar* >> *M-tech Computer Technology* >> >> >> *IIT Delhi----------------------------------Cogito Ergo Sum---------* >> >> >> On Fri, Jul 11, 2014 at 6:02 PM, Derek Homeier < >> derek at astro.physik.uni-goettingen.de> wrote: >> >>> On 10 Jul 2014, at 05:19 pm, Ashwin Srinath >>> wrote: >>> >>> > I'm no expert, so I'll just share a few links to start this >>> discussion. You definitely want to look at Cython if you're computing with >>> NumPy arrays. If you're familiar with the MPI programming model, you want >>> to check out mpi4py. If you have NVIDIA GPUs that you'd like to take >>> advantage of, check out PyCUDA. >>> > >>> > Thanks, >>> > Ashwin >>> > >>> > >>> > On Thu, Jul 10, 2014 at 6:08 AM, Sai Rajeshwar >>> wrote: >>> > hi all, >>> > >>> > im trying to optimise a python code takes huge amount of time on >>> scipy functions such as scipy.signa.conv. Following are some of my queries >>> regarding the same.. It would be great to hear from you.. thanks.. >>> > ---------------------------------------------------- >>> > 1) Can Scipy take advantage of multi-cores.. if so how >>> > 2)what are ways we can improve the performance of scipy/numpy >>> functions eg: using openmp, mpi etc >>> > 3)If scipy internally use blas/mkl libraries can we enable parallelism >>> through these? >>> > >>> If your operations are using the BLAS functions a lot, you get SMP >>> parallelisation for very cheap by >>> linking to the multithreaded MKL or ACML versions and setting >>> OMP_NUM_THREADS/MKL_NUM_THREADS >>> to the no. of available cores. >>> >>> Cheers, >>> Derek >>> >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-dev >>> >> >> > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Jul 17 12:11:11 2014 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 17 Jul 2014 17:11:11 +0100 Subject: [SciPy-Dev] __numpy_ufunc__ and 1.9 release In-Reply-To: <53C56DA2.40402@googlemail.com> References: <53C56DA2.40402@googlemail.com> Message-ID: On Tue, Jul 15, 2014 at 7:06 PM, Julian Taylor wrote: > hi, > as you may know we want to release numpy 1.9 soon. We should have solved > most indexing regressions the first beta showed. > > The remaining blockers are finishing the new __numpy_ufunc__ feature. > This feature should allow for alternative method to overriding the > behavior of ufuncs from subclasses. > It is described here: > https://github.com/numpy/numpy/blob/master/doc/neps/ufunc-overrides.rst > > The current blocker issues are: > https://github.com/numpy/numpy/issues/4753 > https://github.com/numpy/numpy/pull/4815 > > I'm not to familiar with all the complications of subclassing so I can't > really say how hard this is to solve. > My issue is that it there still seems to be debate on how to handle > operator overriding correctly and I am opposed to releasing a numpy with > yet another experimental feature that may or may not be finished > sometime later. Having datetime in infinite experimental state is bad > enough. > I think nobody is served well if we release 1.9 with the feature > prematurely based on a not representative set of users and the later > after more users showed up see we have to change its behavior. 
> > So I'm wondering if we should delay the introduction of this feature to > 1.10 or is it important enough to wait until there is a consensus on the > remaining issues? -1 on delaying the release (but you knew I'd say that) I don't have a strong feeling about whether or not we should disable __numpy_ufunc__ for the 1.9 release based on those bugs. They don't seem obviously catastrophic to me, but you make a good point about datetime. I think it's your call as release manager... -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From scipy.org at alexander-behringer-online.de Fri Jul 18 05:53:24 2014 From: scipy.org at alexander-behringer-online.de (Alexander Behringer) Date: Fri, 18 Jul 2014 11:53:24 +0200 Subject: [SciPy-Dev] Is Brent's method for minimizing the value of a function implemented twice in SciPy? Message-ID: <53C8EE94.8050606@alexander-behringer-online.de> Hello, while studying the SciPy documentation, I noticed that the 'brent' and the 'fminbound' function in the 'scipy.optimize' package both seem to implement Brent's method for function minimization. Both functions have been implemented by Travis Oliphant (see commit infos below). One minor difference is, that the 'brent' function _optionally_ allows for auto bracketing via the help of the 'bracket' function, when supplied only with two bounds via the 'brack' parameter instead of a triplet as required by Brent's algorithm. So is it possible, that Brent's method has been implemented twice? 'fminbound' was added in 2001: https://github.com/scipy/scipy/commit/3f44f63b481abf676a0b344fc836acf76bc86b35 'brent' was added approximately three-quarters of a year later in 2002: https://github.com/scipy/scipy/commit/b94c30dcb1ba9ad0b4c3e2090f5e99a8a21275ab The 'brent' code has later been moved into a separate internal class: https://github.com/scipy/scipy/commit/675ad592465be178cde88a89e9e362fdd5237004 Sincerely, Alexander Behringer From robert.kern at gmail.com Fri Jul 18 11:35:09 2014 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 18 Jul 2014 16:35:09 +0100 Subject: [SciPy-Dev] building scipy with umfpack and amd In-Reply-To: References: Message-ID: On Wed, Jul 16, 2014 at 10:55 AM, Sai Rajeshwar wrote: > hi, > > im running a code which uses scipy.signal.convolve and numpy.sum > extensively. I ran the code on two machines. One machine took very less time > compared to other with same configuration, i checked the scipy > configuration in that machine. i found scipy in that is built with umfpack > and amd.. > > is this the reason behind it.. in what way umfpack and amd aid scipy > operations..? They are not involved in either scipy.signal.convolve() or numpy.sum(). They are involved in linear algebra operations on sparse matrices. -- Robert Kern From rajsai24 at gmail.com Fri Jul 18 12:39:16 2014 From: rajsai24 at gmail.com (Sai Rajeshwar) Date: Fri, 18 Jul 2014 22:09:16 +0530 Subject: [SciPy-Dev] scipy improve performance by parallelizing In-Reply-To: References: <26946BDF-2841-4E79-AF20-9266A2C97377@astro.physik.uni-goettingen.de> Message-ID: thanks.. thats great to start with.. any hints about the scipy.convolve function which is a real bottleneck.. how can i speed it up *with regards..* *M. 
Sai Rajeswar* *M-tech Computer Technology* *IIT Delhi----------------------------------Cogito Ergo Sum---------* On Sun, Jul 13, 2014 at 9:08 PM, Da?id wrote: > > On 13 July 2014 14:28, Sai Rajeshwar wrote: > >> >> 2)for i in xrange(pooled_shape[1]): >> for j in xrange(pooled_shape[2]): >> for k in xrange(pooled_shape[3]): >> for l in xrange(pooled_shape[4]): >> >> pooled[0][i][j][k][l]=math.tanh((numpy.sum(conv_out[0][i][j][k*3][l*3:(l+1)*3])+numpy.sum(conv_out[0][i][j][k*3+1][l*3:(l+1)*3])+numpy.sum(conv_out[0][i][j][k*3+2][l*3:(l+1)*3]))/9.0+b[i][j]) > > > You should get a speed up by accessing the arrays in a more efficient way: > > pooled[0, i, j, k, l] = math.tanh((numpy.sum(conv_out[0, i, j, k*3, > l*3:(l+1)*3]) + numpy.sum(conv_out[0, i, j, k*3+1, l*3:(l+1)*3]) + > numpy.sum(conv_out[0, i, j, k*3+2, l*3:(l+1)*3]))/9.0+b[i, j]) > > In fact: > > numpy.sum(conv_out[0, i, j, k*3, l*3:(l+1)*3]) + numpy.sum(conv_out[0, i, > j, k*3+1, l*3:(l+1)*3]) > > seems equivalent to: > > numpy.sum(conv_out[0, i, j, k*3: k*3 +1, l*3:(l+1)*3]) > > To take the last one into account: > > vec = numpy.sum(conv_out[0, i, j, k*3: k*3 + 2, l*3:(l+1)*3], axis=-1) > pooled[0, i, j, k, l] = vec[0] + vec[1] + vec[2] / 9.0 > > And you can probably get rid of the i and j indexes all together. > Something like this should work (untested): > > for k in... > for l in... > output = numpy.sum(conv_out[0, :, :, k*3: k*3 +1, l*3:(l+1)*3]), axis=-1) > output += numpy.sum(conv_out[0, :, :, k*3 + 2, l*3 : (l+1)*3])), > axis=-1)/9.0 > output += b > pooled[0, :, :, k, l] = numpy.tanh(output) > > In this case, one of the loops seems a great target for parallelisation. > Also, Cython should help reduce the loop overhead. > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Fri Jul 18 13:15:11 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Fri, 18 Jul 2014 17:15:11 +0000 (UTC) Subject: [SciPy-Dev] scipy improve performance by parallelizing References: <26946BDF-2841-4E79-AF20-9266A2C97377@astro.physik.uni-goettingen.de> Message-ID: <288246983427396452.672571sturla.molden-gmail.com@news.gmane.org> Sai Rajeshwar wrote: > thanks.. > > thats great to start with.. any hints about the scipy.convolve function > which is a real bottleneck.. > > how can i speed it up Have you considered FFT? Sturla From rajsai24 at gmail.com Sat Jul 19 13:32:31 2014 From: rajsai24 at gmail.com (Sai Rajeshwar) Date: Sat, 19 Jul 2014 23:02:31 +0530 Subject: [SciPy-Dev] scipy improve performance by parallelizing In-Reply-To: References: Message-ID: hi Frederic.. following your advice i tried to rewrite my code.. using theano conv3d. basically im implementing a convolutional neural network.. and the problem with my code using theano is.. that error percenage across epochs doesnot decrease. i dont know if the problem with my implementation of conv3d.. i attach my code here.. thanks a lot in advance.. *with regards..* *M. Sai Rajeswar* *M-tech Computer Technology* *IIT Delhi----------------------------------Cogito Ergo Sum---------* On Mon, Jul 14, 2014 at 8:33 PM, Fr?d?ric Bastien wrote: > > > > On Mon, Jul 14, 2014 at 10:53 AM, Sai Rajeshwar > wrote: > >> hi frederic, >> >> thanks, actually im trying to implement a 3d-convolutional neural network >> as you can see in the snippet.. 
so you mean to say >> >> 1)instead of using scipy.signal.convolve i should import theano and use >> signal.conv2d >> >> , if so signal.conv2d is right or any other function according to my >> need.. >> > > We have some special conv3d for neural network: > http://deeplearning.net/software/theano/library/tensor/nnet/conv.html. > Maybe the suite better what you want. But to be useful, you will need > medium/big convolution, not tini video. > > >> >> 2)also any hints on speeding up numpy.sum in >> >> pooled[0][i][j][k][l]=math. >> >> tanh((numpy.sum(conv_out[0][i][j][k*3][l*3:(l+1)*3])+numpy.sum(conv_out[0][i][j][k*3+1][l*3:(l+1)*3])+numpy.sum(conv_out[0][i][j][k*3+2][l*3:(l+1)*3]))/9.0+b[i][j]) >> > > Someone else reply with some information to do less indexing and do the > sum on bigger chunk of data each time. This could speed up your stuff. > > >> thanks a lot.. also i have seen your name some where in pylearn2.. are >> ua pylearn developer too. >> > > Yes and no. I'm in the same lab as the main Pylearn2 dev and I do some > small contribution from time to time(stuff mostly related to optimizaiton > or Theano). But I wouldn't call me a pylearn2 core dev. > > Fred > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: lenet_kth_my.py Type: text/x-python Size: 11700 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: logistic_sgd1.py Type: text/x-python Size: 15103 bytes Desc: not available URL: From moritz.beber at gmail.com Mon Jul 21 04:09:02 2014 From: moritz.beber at gmail.com (Moritz Emanuel Beber) Date: Mon, 21 Jul 2014 10:09:02 +0200 Subject: [SciPy-Dev] computing pairwise distance of vectors with missing (nan) values Message-ID: <53CCCA9E.1060103@gmail.com> Dear all, My basic problem is that I would like to compute distances between vectors with missing values. You can find more detail in my question on SO (http://stackoverflow.com/questions/24781461/compute-the-pairwise-distance-in-scipy-with-missing-values). Since it seems this is not directly possible with scipy at the moment, I started to Cythonize my function. Currently, the below function is not much faster than my pure Python implementation, so I thought I'd ask the experts here. *Note that even though I'm computing the euclidean distance, I'd like to make use of different distance metrics. * So my current attempt at Cythonizing is: import numpy cimport numpy cimport cython from numpy.linalg import norm numpy.import_array() @cython.boundscheck(False) @cython.wraparound(False) def masked_euclidean(numpy.ndarray[numpy.double_t, ndim=2] data): cdef Py_ssize_t m = data.shape[0] cdef Py_ssize_t i = 0 cdef Py_ssize_t j = 0 cdef Py_ssize_t k = 0 cdef numpy.ndarray[numpy.double_t] dm = numpy.zeros(m * (m - 1) // 2, dtype=numpy.double) cdef numpy.ndarray[numpy.uint8_t, ndim=2, cast=True] mask = numpy.isfinite(data) # boolean for i in range(m - 1): for j in range(i + 1, m): curr = numpy.logical_and(mask[i], mask[j]) u = data[i][curr] v = data[j][curr] dm[k] = norm(u - v) k += 1 return dm Maybe the lack of speed-up is due to the Python function 'norm'? So my question is, how to improve the Cython implementation? Or is there a completely different way of approaching this problem? Thanks in advance, Moritz -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bussonniermatthias at gmail.com Mon Jul 21 04:47:51 2014 From: bussonniermatthias at gmail.com (Matthias Bussonnier) Date: Mon, 21 Jul 2014 10:47:51 +0200 Subject: [SciPy-Dev] computing pairwise distance of vectors with missing (nan) values In-Reply-To: <53CCCA9E.1060103@gmail.com> References: <53CCCA9E.1060103@gmail.com> Message-ID: <84F5C3FF-0A6D-4468-8E30-5F4A315D9AC5@gmail.com> Le 21 juil. 2014 ? 10:09, Moritz Emanuel Beber a ?crit : > Dear all, > > My basic problem is that I would like to compute distances between vectors with missing values. You can find more detail in my question on SO (http://stackoverflow.com/questions/24781461/compute-the-pairwise-distance-in-scipy-with-missing-values). Since it seems this is not directly possible with scipy at the moment, I started to Cythonize my function. Currently, the below function is not much faster than my pure Python implementation, so I thought I'd ask the experts here. Note that even though I'm computing the euclidean distance, I'd like to make use of different distance metrics. > > So my current attempt at Cythonizing is: > > import numpy > cimport numpy > cimport cython > from numpy.linalg import norm > > numpy.import_array() > > @cython.boundscheck(False) > @cython.wraparound(False) > def masked_euclidean(numpy.ndarray[numpy.double_t, ndim=2] data): > cdef Py_ssize_t m = data.shape[0] > cdef Py_ssize_t i = 0 > cdef Py_ssize_t j = 0 > cdef Py_ssize_t k = 0 > cdef numpy.ndarray[numpy.double_t] dm = numpy.zeros(m * (m - 1) // 2, dtype=numpy.double) > cdef numpy.ndarray[numpy.uint8_t, ndim=2, cast=True] mask = numpy.isfinite(data) # boolean > for i in range(m - 1): > for j in range(i + 1, m): > curr = numpy.logical_and(mask[i], mask[j]) > u = data[i][curr] > v = data[j][curr] > dm[k] = norm(u - v) > k += 1 > return dm > > Maybe the lack of speed-up is due to the Python function 'norm'? So my question is, how to improve the Cython implementation? Or is there a completely different way of approaching this problem? > > Thanks in advance, I would suggest using the python --anotate option (or -a option of python magic in IPython notebook) ,it will show you the generated c-code with hints of which line is slow and why as a nice syntax highlighted html page. You are right that `norm`, is slow, but apparently so is gitItem on data[] and numpy.logical_and -- M > Moritz > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: From johannes.kulick at ipvs.uni-stuttgart.de Mon Jul 21 05:06:20 2014 From: johannes.kulick at ipvs.uni-stuttgart.de (Johannes Kulick) Date: Mon, 21 Jul 2014 11:06:20 +0200 Subject: [SciPy-Dev] Pull Request: Dirichlet Distribution Message-ID: <20140721090620.22952.33520@quirm.robotics.tu-berlin.de> Hi, I sent a pull request, that implements a Dirichlet distribution. Code review would be appreciated! https://github.com/scipy/scipy/pull/3815 Best, Johannes Kulick -- Question: What is the weird attachment to all my emails? Answer: http://en.wikipedia.org/wiki/Digital_signature -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 490 bytes Desc: signature URL: From moritz.beber at gmail.com Wed Jul 23 05:51:23 2014 From: moritz.beber at gmail.com (Moritz Beber) Date: Wed, 23 Jul 2014 11:51:23 +0200 Subject: [SciPy-Dev] computing pairwise distance of vectors with missing (nan) values In-Reply-To: <53CCCA9E.1060103@gmail.com> References: <53CCCA9E.1060103@gmail.com> Message-ID: Hello again, (I somehow lost the ability to reply to your message Matthias, since I got mails in digest mode, apologies for that.) So I've poked around the code with profilers and got two conflicting pieces of information. I now use a .pyx file (see attachment) and I profiled it in two different ways: 1. Using cProfile which gave the following results: ncalls tottime percall cumtime percall filename:lineno(function) 1 1.641 1.641 2.303 2.303 distance.pyx:13(masked_euclidean) 44850 0.294 0.000 0.662 0.000 linalg.py:1924(norm) 44850 0.292 0.000 0.292 0.000 {method 'reduce' of 'numpy.ufunc' objects} 44850 0.041 0.000 0.041 0.000 {numpy.core.multiarray.array} 44850 0.023 0.000 0.065 0.000 numeric.py:392(asarray) 44850 0.012 0.000 0.012 0.000 {method 'conj' of 'numpy.ndarray' objects} 1 0.000 0.000 2.303 2.303 :1() 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects} This leads me to believe that, yes, the boolean subset is taking a significant amount of time but the majority is spent in `norm`. 2. I also ran valgrind and that seems to suggest that 55% of the time is spent in the boolean subset (you can get it here: https://dl.dropboxusercontent.com/u/51564502/callgrind.log). Or am I reading the results wrong? 3. I couldn't get the %lprof magic to work in the IPyNB, just get 0 time for the whole function call. Is this possible somehow by now? So my questions at this point are: Can I improve the fancy indexing somehow? And can I include the scipy distance measures easily so that I avoid the call to numpy.linalg.norm? Thank you so much, Moritz -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: distance.pyx Type: application/octet-stream Size: 908 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: profile_cython.py Type: text/x-python Size: 425 bytes Desc: not available URL: From bussonniermatthias at gmail.com Wed Jul 23 08:36:09 2014 From: bussonniermatthias at gmail.com (Matthias Bussonnier) Date: Wed, 23 Jul 2014 14:36:09 +0200 Subject: [SciPy-Dev] computing pairwise distance of vectors with missing (nan) values In-Reply-To: References: <53CCCA9E.1060103@gmail.com> Message-ID: Quick from my phone. Isn't numpy.take() faster than fancy indexing ? -- M Envoy? de mon iPhone > Le 23 juil. 2014 ? 11:51, Moritz Beber a ?crit : > > Hello again, > > (I somehow lost the ability to reply to your message Matthias, since I got mails in digest mode, apologies for that.) > > > So I've poked around the code with profilers and got two conflicting pieces of information. I now use a .pyx file (see attachment) and I profiled it in two different ways: > > 1. 
Using cProfile which gave the following results: > > ncalls tottime percall cumtime percall filename:lineno(function) > 1 1.641 1.641 2.303 2.303 distance.pyx:13(masked_euclidean) > 44850 0.294 0.000 0.662 0.000 linalg.py:1924(norm) > 44850 0.292 0.000 0.292 0.000 {method 'reduce' of 'numpy.ufunc' objects} > 44850 0.041 0.000 0.041 0.000 {numpy.core.multiarray.array} > 44850 0.023 0.000 0.065 0.000 numeric.py:392(asarray) > 44850 0.012 0.000 0.012 0.000 {method 'conj' of 'numpy.ndarray' objects} > 1 0.000 0.000 2.303 2.303 :1() > 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects} > > This leads me to believe that, yes, the boolean subset is taking a significant amount of time but the majority is spent in `norm`. > > 2. I also ran valgrind and that seems to suggest that 55% of the time is spent in the boolean subset (you can get it here: https://dl.dropboxusercontent.com/u/51564502/callgrind.log). Or am I reading the results wrong? > > 3. I couldn't get the %lprof magic to work in the IPyNB, just get 0 time for the whole function call. Is this possible somehow by now? > > So my questions at this point are: Can I improve the fancy indexing somehow? And can I include the scipy distance measures easily so that I avoid the call to numpy.linalg.norm? > > Thank you so much, > Moritz > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: From vanforeest at gmail.com Wed Jul 23 10:40:31 2014 From: vanforeest at gmail.com (nicky van foreest) Date: Wed, 23 Jul 2014 16:40:31 +0200 Subject: [SciPy-Dev] scipy.sparse versus pysparse Message-ID: Hi, I am doing some testing between scipy.sparse and pysparse on my ubuntu machine. Some testing reveals that pysparse is about 9 times faster in matrix-vector multiplication that scipy.sparse. Might there be anything specific I forgot to do during scipy's installation (I just ran apt-get install python-scipy)? Is there another simple explanation for this difference? I prefer to use scipy.sparse for its cleaner api, but a factor 9 in speed is considerable. thanks Nicky -------------- next part -------------- An HTML attachment was scrubbed... URL: From moritz.beber at gmail.com Wed Jul 23 10:51:52 2014 From: moritz.beber at gmail.com (Moritz Beber) Date: Wed, 23 Jul 2014 16:51:52 +0200 Subject: [SciPy-Dev] scipy.sparse versus pysparse In-Reply-To: References: Message-ID: Hey, On Wed, Jul 23, 2014 at 4:40 PM, nicky van foreest wrote: > > > I am doing some testing between scipy.sparse and pysparse on my ubuntu > machine. Some testing reveals that pysparse is about 9 times faster in > matrix-vector multiplication that scipy.sparse. Might there be anything > specific I forgot to do during scipy's installation (I just ran apt-get > install python-scipy)? Is there another simple explanation for this > difference? I prefer to use scipy.sparse for its cleaner api, but a factor > 9 in speed is considerable. > > Could you post your benchmarking code somewhere (or show here), please? Cheers, Moritz -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jtaylor.debian at googlemail.com Wed Jul 23 13:37:39 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Wed, 23 Jul 2014 19:37:39 +0200 Subject: [SciPy-Dev] __numpy_ufunc__ and 1.9 release In-Reply-To: <53C56DA2.40402@googlemail.com> References: <53C56DA2.40402@googlemail.com> Message-ID: <53CFF2E3.1020708@googlemail.com> On 15.07.2014 20:06, Julian Taylor wrote: > hi, > as you may know we want to release numpy 1.9 soon. We should have solved > most indexing regressions the first beta showed. > > The remaining blockers are finishing the new __numpy_ufunc__ feature. > This feature should allow for alternative method to overriding the > behavior of ufuncs from subclasses. > It is described here: > https://github.com/numpy/numpy/blob/master/doc/neps/ufunc-overrides.rst > > The current blocker issues are: > https://github.com/numpy/numpy/issues/4753 > https://github.com/numpy/numpy/pull/4815 > > I'm not to familiar with all the complications of subclassing so I can't > really say how hard this is to solve. > My issue is that it there still seems to be debate on how to handle > operator overriding correctly and I am opposed to releasing a numpy with > yet another experimental feature that may or may not be finished > sometime later. Having datetime in infinite experimental state is bad > enough. > I think nobody is served well if we release 1.9 with the feature > prematurely based on a not representative set of users and the later > after more users showed up see we have to change its behavior. > > So I'm wondering if we should delay the introduction of this feature to > 1.10 or is it important enough to wait until there is a consensus on the > remaining issues? > So its been a week and we got a few answers and new issues. To summarize: - to my knowledge no progress was made on the issues - scipy already has a released version using the current implementation - no very loud objections to delaying the feature to 1.10 - I am still unfamiliar with the problematics of subclassing, but don't want to release something new which has unsolved issues. That scipy already uses it in a released version (0.14) is very problematic. Can maybe someone give some insight if the potential changes to resolve the remaining issues would break scipy? If so we have following choices: - declare what we have as final and close the remaining issues as 'won't fix'. Any changes would have to have a new name __numpy_ufunc2__ or a somehow versioned the interface - delay the introduction, potentially breaking scipy 0.14 when numpy 1.10 is released. I would like to get the next (and last) numpy 1.9 beta out soon, so I would propose to make a decision until this Saturday the 26.02.2014 however misinformed it may be. Please note that the numpy 1.10 release cycle is likely going to be a very long one as we are currently planning to change a bunch of default behaviours that currently raise deprecation warnings and possibly will try to fix string types, text IO and datetime. Please see the future changes notes in the current 1.9.x release notes. If we delay numpy_ufunc it is not unlikely that it will take a year until we release 1.10. Though we could still put it into a earlier 1.9.1. Cheers, Julian From vanforeest at gmail.com Wed Jul 23 16:08:16 2014 From: vanforeest at gmail.com (nicky van foreest) Date: Wed, 23 Jul 2014 22:08:16 +0200 Subject: [SciPy-Dev] scipy.sparse versus pysparse In-Reply-To: References: Message-ID: Hi Moritz, Sure. Please see below. 
I included an extra time stamp to analyse the results in slightly more detail. It turns out that the matrix-vector multiplications are roughly the same in scipy.stats and pysparse, but that building the matrices in pysparse is way faster. I compute the stationary distribution vector of a Markov chain. The details of the algo are not really important. I guess that the logic of the code is easy to understand, if not, please let me know. There are three nearly identical methods to compute the distribution vector: the first uses scipy.sparse and the * operator to compute a vector times a matrix, the second uses scipy.sparse dot(), and the third uses pysparse. You might want to skip the code and jump right away to the results, which I include below the code.

==== Code

from numpy import ones, zeros, empty
import scipy.sparse as sp
import pysparse
from pylab import matshow, savefig
from scipy.linalg import norm
import time

labda, mu1, mu2 = 1., 1.1, 1.01
N1, N2 = 400, 400
size = N1*N2
eps = 1e-3
maxIterations = 1e5
print "size = ", size

def state(i,j):
    return j*N1 + i

def fillOffDiagonal(Q):
    # labda
    for i in range(0,N1-1):
        for j in range(0,N2):
            Q[(state(i,j),state(i+1,j))] = labda
    # mu2
    for i in range(0,N1):
        for j in range(1,N2):
            Q[(state(i,j),state(i,j-1))] = mu2
    # mu1
    for i in range(1,N1):
        for j in range(0,N2-1):
            Q[(state(i,j),state(i-1,j+1))] = mu1
    #print "ready filling"

def computePiMethod1():
    """ based on scipy.sparse, naive matrix-vector multiplication """
    e0 = time.time()
    Q = sp.dok_matrix((size,size))
    fillOffDiagonal(Q)
    # Set the diagonal of Q such that the row sums are zero
    Q.setdiag( -Q*ones(size) )
    # Compute a suitable stochastic matrix by means of uniformization
    l = min(Q.values())*1.001  # avoid periodicity, see trivedi's book
    P = sp.eye(size, size) - Q/l
    P = P.tocsr()
    pi = zeros(size); pi1 = zeros(size)
    pi[0] = 1
    n = norm(pi - pi1,1); i = 0
    e1 = time.time()
    while n > eps and i < maxIterations:
        pi1 = pi*P
        pi = pi1*P  # avoid copying pi1 to pi
        n = norm(pi - pi1,1); i += 1
    print "Method 1: ", e1-e0, time.time() - e1, i
    return pi

def computePiMethod2():
    """ based on scipy.sparse, dot multiplication """
    e0 = time.time()
    Q = sp.dok_matrix((size,size))
    fillOffDiagonal(Q)
    # Set the diagonal of Q such that the row sums are zero
    Q.setdiag( -Q*ones(size) )
    # Compute a suitable stochastic matrix by means of uniformization
    l = min(Q.values())*1.001  # avoid periodicity, see trivedi's book
    P = sp.eye(size, size) - Q/l
    P = P.transpose()
    P = P.tocsr()
    pi = zeros(size); pi1 = zeros(size)
    pi[0] = 1
    n = norm(pi - pi1,1); i = 0
    e1 = time.time()
    while n > eps and i < maxIterations:
        pi1 = P.dot(pi)
        pi = P.dot(pi1)
        n = norm(pi - pi1,1); i += 1
    print "Method 2: ", e1-e0, time.time() - e1, i
    return pi

def computePiMethod3():
    """ based on pysparse """
    e0 = time.time()
    Q = pysparse.spmatrix.ll_mat(size,size)
    fillOffDiagonal(Q)
    # fill diagonal
    x = empty(size)
    Q.matvec(ones(size),x)
    Q.put(-x)
    # uniformize
    l = min(Q.values())*1.001
    P = pysparse.spmatrix.ll_mat(size,size)
    P.put(ones(size))
    P.shift(-1./l, Q)
    # Compute pi
    P = P.to_csr()
    pi = zeros(size); pi1 = zeros(size)
    pi[0] = 1
    n = norm(pi - pi1,1); i = 0
    e1 = time.time()
    while n > eps and i < maxIterations:
        P.matvec_transp(pi,pi1)
        P.matvec_transp(pi1,pi)
        n = norm(pi - pi1,1); i += 1
    print "Method 3: ", e1-e0, time.time() - e1, i
    return pi

def plotPi(pi):
    pi = pi.reshape(N2,N1)
    matshow(pi)
    savefig("pi.png")

if __name__ == "__main__":
    pi1 = computePiMethod1()
    pi2 = computePiMethod2()
    pi3 = computePiMethod3()
    d1 = norm(pi1-pi2,1)
    d2 = norm(pi1-pi3,1)
    print d1, d2

============================== Results

nicky at chuck:~/myprogs/python/queueing/tandemQueueMDP$ python tandemqueue.py
size =  40000
Method 1:  4.31593680382 0.387089014053 285
Method 2:  4.27599096298 0.273495912552 285
Method 3:  0.0856800079346 0.267058134079 285
0.0 5.05082123061e-15

The first number after "Method 1:" represents the time it takes to fill the matrix, the second number is the time involved to carry out the multiplications, and the third is (twice) the number of multiplications involved. The second number is (nearly) the same for all three methods, hence the multiplication time is about the same. There is, however, a huge difference in the first number, ie, the time required to build the matrix. 
It takes about 4 sec for scipy.stats, and 8e-2 sec for pysparse. The last row prints the difference between the results (the stationary distributions vectors) as obtained by all three methods. Luckily the results are the same, up to rounding. Here is a try with a bigger matrix: nicky at chuck:~/myprogs/python/queueing/tandemQueueMDP$ python tandemqueue.py size = 160000 Method 1: 17.4650111198 1.80849194527 285 Method 2: 17.5270321369 1.54912996292 285 Method 3: 0.382800102234 1.63900899887 285 0.0 5.87925665394e-15 Again the same result. Do you perhaps have any explanation for this? Thanks Nicky On 23 July 2014 16:51, Moritz Beber wrote: > Hey, > > > On Wed, Jul 23, 2014 at 4:40 PM, nicky van foreest > wrote: > >> >> >> I am doing some testing between scipy.sparse and pysparse on my ubuntu >> machine. Some testing reveals that pysparse is about 9 times faster in >> matrix-vector multiplication that scipy.sparse. Might there be anything >> specific I forgot to do during scipy's installation (I just ran apt-get >> install python-scipy)? Is there another simple explanation for this >> difference? I prefer to use scipy.sparse for its cleaner api, but a factor >> 9 in speed is considerable. >> >> > Could you post your benchmarking code somewhere (or show here), please? > > Cheers, > Moritz > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Wed Jul 23 17:09:55 2014 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 24 Jul 2014 00:09:55 +0300 Subject: [SciPy-Dev] scipy.sparse versus pysparse In-Reply-To: References: Message-ID: 23.07.2014, 23:08, nicky van foreest kirjoitti: > Sure. Please see below. I included an extra time stamp to analyse > the results in slightly more detail. It turns out that the > matrix-vector multiplications are roughly the same in scipy.stats > and pysparse, but that building the matrices in pysparse is way > faster. The benchmark is mainly measuring the speed of dok_matrix.__setitem__ for scalars (dok_matrix.setdiag is naive and justs sets items in a for loop). Neither dok_matrix or lil_matrix is very fast. This is largely limited by the fact that they use Python dict and Python lists as data structures, which have non-negligible overheads. lil_matrix was optimized in Scipy 0.14.0, so you may get better results using it (for those Scipy versions). Additionally, vectorized assignment into sparse matrices is now supported, so further performance improvement can be obtained by replacing the for loops in fillOffDiagonal. There may be some room for optimization in dok_matrix for scalar assignment, but this is probably not more than 2x. The remaining 10x factor vs. pysparse requires pretty much not using Python data structures for storing the numbers. csr, csr, bsr, and dia are OK, but the data structures are not well-suited for matrix assembly. -- Pauli Virtanen From pav at iki.fi Wed Jul 23 18:35:57 2014 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 24 Jul 2014 01:35:57 +0300 Subject: [SciPy-Dev] __numpy_ufunc__ and 1.9 release In-Reply-To: <53CFF2E3.1020708@googlemail.com> References: <53C56DA2.40402@googlemail.com> <53CFF2E3.1020708@googlemail.com> Message-ID: <53D038CD.3000306@iki.fi> 23.07.2014, 20:37, Julian Taylor kirjoitti: [clip: __numpy_ufunc__] > So its been a week and we got a few answers and new issues. 
To > summarize: - to my knowledge no progress was made on the issues - > scipy already has a released version using the current > implementation - no very loud objections to delaying the feature to > 1.10 - I am still unfamiliar with the problematics of subclassing, > but don't want to release something new which has unsolved issues. > > That scipy already uses it in a released version (0.14) is very > problematic. Can maybe someone give some insight if the potential > changes to resolve the remaining issues would break scipy? > > If so we have following choices: > > - declare what we have as final and close the remaining issues as > 'won't fix'. Any changes would have to have a new name > __numpy_ufunc2__ or a somehow versioned the interface - delay the > introduction, potentially breaking scipy 0.14 when numpy 1.10 is > released. > > I would like to get the next (and last) numpy 1.9 beta out soon, so > I would propose to make a decision until this Saturday the > 26.02.2014 however misinformed it may be. It seems fairly unlikely to me that the `__numpy_ufunc__` interface itself requires any changes. I believe the definition of the interface is quite safe to consider as fixed --- it is a fairly straighforward hook for Numpy ufuncs. (There are also no essential changes in it since last year.) For the binary operator overriding, Scipy sets the constraint that ndarray * spmatrix MUST call spmatrix.__rmul__ even if spmatrix.__numpy_ufunc__ is defined. spmatrixes are not ndarray subclasses, and various subclassing problems do not enter here. Note that this binop discussion is somewhat separate from the __numpy_ufunc__ interface itself. The only information available about it at the binop stage is `hasattr(other, '__numpy_ufunc__')`. *** Regarding the blockers: (1) https://github.com/numpy/numpy/issues/4753 This is a bug in the argument normalization --- output arguments are not checked for the presence of "__numpy_ufunc__" if they are passed as keyword arguments (as a positional argument it works). It's a bug in the implementation, but I don't think it is really a blocker. Scipy sparse matrices will in practice seldom be used as output args for ufuncs. *** (2) https://github.com/numpy/numpy/pull/4815 The is open question concerns semantics of `__numpy_ufunc__` versus Python operator overrides. When should ndarray.__mul__(other) return NotImplemented? Scipy sparse matrices are not subclasses of ndarray, so the code in question in Numpy gets to run only for ndarray * spmatrix This provides a constraint to what solution we can choose in Numpy to deal with the issue: ndarray.__mul__(spmatrix) MUST continue to return NotImplemented This is the current behavior, and cannot be changed: it is not possible to defer this to __numpy_ufunc__(ufunc=np.multiply), because sparse matrices define `*` as the matrix multiply, and not the elementwise multiply. (This settles one line of discussion in the issues --- ndarray should defer.) How Numpy currently determines whether to return NotImplemented in this case or to call np.multiply(self, other) is by comparing `__array_priority__` attributes of `self` and `other`. Scipy sparse matrices define an `__array_priority__` larger than ndarrays, which then makes a NotImplemented be returned. The idea in the __numpy_ufunc__ NEP was to replace this with `hasattr(other, '__numpy_ufunc__') and hasattr(other, '__rmul__')`. However, when both self and other are ndarray subclasses in a certain configuration, both end up returning NotImplemented, and Python raises TypeError. 
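
For illustration, a minimal sketch of this deferral mechanism for a non-ndarray class in the style of scipy.sparse (the class name and the priority value are made up; behaviour as described above for 2014-era numpy):

import numpy as np

class MatLike(object):
    # Not an ndarray subclass. The higher __array_priority__ plus a
    # reflected operator is what makes ndarray.__mul__ return
    # NotImplemented, so Python falls back to MatLike.__rmul__,
    # mirroring what spmatrix relies on.
    __array_priority__ = 100.0

    def __rmul__(self, other):
        return "MatLike.__rmul__ got %s" % type(other).__name__

a = np.arange(3)
print(a * MatLike())   # -> MatLike.__rmul__ got ndarray, not np.multiply(a, ...)

Here `a * MatLike()` ends up in the reflected method, which is where scipy.sparse can implement the matrix multiply instead of the elementwise one.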
The `__array_priority__` mechanism is also broken in some of the subclassing cases: https://github.com/numpy/numpy/issues/4766 As far as I see, the backward compatibility requirement from Scipy only rules out the option that ndarray.__mul__(other) should unconditionally call `np.add(self, other)`. We have some freedom how to solve the binop vs. subclass issues. It's possible to e.g. retain the __array_priority__ stuff as a backward compatibility measure as we do currently. -- Pauli Virtanen From vanforeest at gmail.com Thu Jul 24 04:11:57 2014 From: vanforeest at gmail.com (nicky van foreest) Date: Thu, 24 Jul 2014 10:11:57 +0200 Subject: [SciPy-Dev] scipy.sparse versus pysparse In-Reply-To: References: Message-ID: Hi Pauli, Thanks for your clarifications. NIcky On 23 July 2014 23:09, Pauli Virtanen wrote: > 23.07.2014, 23:08, nicky van foreest kirjoitti: > > Sure. Please see below. I included an extra time stamp to analyse > > the results in slightly more detail. It turns out that the > > matrix-vector multiplications are roughly the same in scipy.stats > > and pysparse, but that building the matrices in pysparse is way > > faster. > > The benchmark is mainly measuring the speed of dok_matrix.__setitem__ > for scalars (dok_matrix.setdiag is naive and justs sets items in a for > loop). > > Neither dok_matrix or lil_matrix is very fast. This is largely limited > by the fact that they use Python dict and Python lists as data > structures, which have non-negligible overheads. > > lil_matrix was optimized in Scipy 0.14.0, so you may get better > results using it (for those Scipy versions). Additionally, vectorized > assignment into sparse matrices is now supported, so further > performance improvement can be obtained by replacing the for loops in > fillOffDiagonal. > > There may be some room for optimization in dok_matrix for scalar > assignment, but this is probably not more than 2x. The remaining 10x > factor vs. pysparse requires pretty much not using Python data > structures for storing the numbers. > > csr, csr, bsr, and dia are OK, but the data structures are not > well-suited for matrix assembly. > > -- > Pauli Virtanen > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Thu Jul 24 09:13:11 2014 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 24 Jul 2014 16:13:11 +0300 Subject: [SciPy-Dev] scipy.sparse versus pysparse In-Reply-To: References: Message-ID: 24.07.2014, 11:11, nicky van foreest kirjoitti: > Thanks for your clarifications. I should note that the issue of adding a sparse format more suitable for fast matrix assembly has been brought up, but not implemented yet. While the fact that lil_matrix and dok_matrix are Python data structures is nice, a more practical approach would use opaque data storage (similar to ll_mat in pysparse). Pauli From rajsai24 at gmail.com Thu Jul 24 12:29:04 2014 From: rajsai24 at gmail.com (Sai Rajeshwar) Date: Thu, 24 Jul 2014 21:59:04 +0530 Subject: [SciPy-Dev] scipy improve performance by parallelizing In-Reply-To: <53C02235.8000705@googlemail.com> References: <53C02235.8000705@googlemail.com> Message-ID: hi julian thanks.. but when i use numpy.convolve i get this error ValueError: object too deep for desired array does numpy.convolve work for 2D or 3D convolution? thanks *with regards..* *M. 
Sai Rajeswar* *M-tech Computer Technology* *IIT Delhi----------------------------------Cogito Ergo Sum---------* On Fri, Jul 11, 2014 at 11:13 PM, Julian Taylor < jtaylor.debian at googlemail.com> wrote: > for simple convolutions there is also np.convolve > > compared to scipy it releases the GIL and you can use normal python > threads for parallization if you need to compute many independent > convolutions and not just one. > > That said scipy should probably release the GIL too, probably a bug that > it doesn't. > > On 10.07.2014 17:19, Ashwin Srinath wrote: > > Hey, Sai > > > > I'm no expert, so I'll just share a few links to start this discussion. > > You definitely want to look at Cython if you're > > computing with NumPy arrays. If you're familiar with the MPI programming > > model, you want to check out mpi4py . If you > > have NVIDIA GPUs that you'd like to take advantage of, check out PyCUDA > > . > > > > Thanks, > > Ashwin > > > > > > On Thu, Jul 10, 2014 at 6:08 AM, Sai Rajeshwar > > wrote: > > > > hi all, > > > > im trying to optimise a python code takes huge amount of time on > > scipy functions such as scipy.signa.conv. Following are some of my > > queries regarding the same.. It would be great to hear from you.. > > thanks.. > > ---------------------------------------------------- > > 1) Can Scipy take advantage of multi-cores.. if so how > > 2)what are ways we can improve the performance of scipy/numpy > > functions eg: using openmp, mpi etc > > 3)If scipy internally use blas/mkl libraries can we enable > > parallelism through these? > > > > > > looks like i have to work on internals of scipy.. thanks a lot.. > > > > > > *with regards..* > > * > > * > > *M. Sai Rajeswar* > > *M-tech Computer Technology* > > *IIT Delhi > > ----------------------------------Cogito Ergo Sum--------- > > * > > > > _______________________________________________ > > SciPy-Dev mailing list > > SciPy-Dev at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-dev > > > > > > > > > > _______________________________________________ > > SciPy-Dev mailing list > > SciPy-Dev at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-dev > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewm at redtetrahedron.org Thu Jul 24 12:46:20 2014 From: ewm at redtetrahedron.org (Eric Moore) Date: Thu, 24 Jul 2014 12:46:20 -0400 Subject: [SciPy-Dev] scipy improve performance by parallelizing In-Reply-To: References: <53C02235.8000705@googlemail.com> Message-ID: On Thursday, July 24, 2014, Sai Rajeshwar wrote: > hi julian thanks.. > > but when i use numpy.convolve i get this error ValueError: object too > deep for desired array > > does numpy.convolve work for 2D or 3D convolution? > thanks > > *with regards..* > > *M. Sai Rajeswar* > *M-tech Computer Technology* > > > *IIT Delhi----------------------------------Cogito Ergo Sum---------* > > > On Fri, Jul 11, 2014 at 11:13 PM, Julian Taylor < > jtaylor.debian at googlemail.com > > wrote: > >> for simple convolutions there is also np.convolve >> >> compared to scipy it releases the GIL and you can use normal python >> threads for parallization if you need to compute many independent >> convolutions and not just one. >> >> That said scipy should probably release the GIL too, probably a bug that >> it doesn't. 
>> >> On 10.07.2014 17:19, Ashwin Srinath wrote: >> > Hey, Sai >> > >> > I'm no expert, so I'll just share a few links to start this discussion. >> > You definitely want to look at Cython if you're >> > computing with NumPy arrays. If you're familiar with the MPI programming >> > model, you want to check out mpi4py . If you >> > have NVIDIA GPUs that you'd like to take advantage of, check out PyCUDA >> > . >> > >> > Thanks, >> > Ashwin >> > >> > >> > On Thu, Jul 10, 2014 at 6:08 AM, Sai Rajeshwar > >> > > >> wrote: >> > >> > hi all, >> > >> > im trying to optimise a python code takes huge amount of time on >> > scipy functions such as scipy.signa.conv. Following are some of my >> > queries regarding the same.. It would be great to hear from you.. >> > thanks.. >> > ---------------------------------------------------- >> > 1) Can Scipy take advantage of multi-cores.. if so how >> > 2)what are ways we can improve the performance of scipy/numpy >> > functions eg: using openmp, mpi etc >> > 3)If scipy internally use blas/mkl libraries can we enable >> > parallelism through these? >> > >> > >> > looks like i have to work on internals of scipy.. thanks a lot.. >> > >> > >> > *with regards..* >> > * >> > * >> > *M. Sai Rajeswar* >> > *M-tech Computer Technology* >> > *IIT Delhi >> > ----------------------------------Cogito Ergo Sum--------- >> > * >> > >> > _______________________________________________ >> > SciPy-Dev mailing list >> > SciPy-Dev at scipy.org >> > SciPy-Dev at scipy.org >> > >> > http://mail.scipy.org/mailman/listinfo/scipy-dev >> > >> > >> > >> > >> > _______________________________________________ >> > SciPy-Dev mailing list >> > SciPy-Dev at scipy.org >> >> > http://mail.scipy.org/mailman/listinfo/scipy-dev >> > >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> > > There are also convolution functions in scipy.ndimage. For simple smallish 1d convolution ndimage is much much faster than scipy.signal and somewhat faster than numpy.convolve. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rajsai24 at gmail.com Thu Jul 24 12:47:46 2014 From: rajsai24 at gmail.com (Sai Rajeshwar) Date: Thu, 24 Jul 2014 22:17:46 +0530 Subject: [SciPy-Dev] scipy improve performance by parallelizing In-Reply-To: References: <53C02235.8000705@googlemail.com> Message-ID: ok .. what about 2d or 3d convolution.. does it perform better? thanks *with regards..* *M. Sai Rajeswar* *M-tech Computer Technology* *IIT Delhi----------------------------------Cogito Ergo Sum---------* On Thu, Jul 24, 2014 at 10:16 PM, Eric Moore wrote: > > > On Thursday, July 24, 2014, Sai Rajeshwar wrote: > >> hi julian thanks.. >> >> but when i use numpy.convolve i get this error ValueError: object too >> deep for desired array >> >> does numpy.convolve work for 2D or 3D convolution? >> thanks >> >> *with regards..* >> >> *M. Sai Rajeswar* >> *M-tech Computer Technology* >> >> >> *IIT Delhi----------------------------------Cogito Ergo Sum---------* >> >> >> On Fri, Jul 11, 2014 at 11:13 PM, Julian Taylor < >> jtaylor.debian at googlemail.com> wrote: >> >>> for simple convolutions there is also np.convolve >>> >>> compared to scipy it releases the GIL and you can use normal python >>> threads for parallization if you need to compute many independent >>> convolutions and not just one. >>> >>> That said scipy should probably release the GIL too, probably a bug that >>> it doesn't. 
>>> >>> On 10.07.2014 17:19, Ashwin Srinath wrote: >>> > Hey, Sai >>> > >>> > I'm no expert, so I'll just share a few links to start this discussion. >>> > You definitely want to look at Cython if you're >>> > computing with NumPy arrays. If you're familiar with the MPI >>> programming >>> > model, you want to check out mpi4py . If you >>> > have NVIDIA GPUs that you'd like to take advantage of, check out PyCUDA >>> > . >>> > >>> > Thanks, >>> > Ashwin >>> > >>> > >>> > On Thu, Jul 10, 2014 at 6:08 AM, Sai Rajeshwar >> > > wrote: >>> > >>> > hi all, >>> > >>> > im trying to optimise a python code takes huge amount of time on >>> > scipy functions such as scipy.signa.conv. Following are some of my >>> > queries regarding the same.. It would be great to hear from you.. >>> > thanks.. >>> > ---------------------------------------------------- >>> > 1) Can Scipy take advantage of multi-cores.. if so how >>> > 2)what are ways we can improve the performance of scipy/numpy >>> > functions eg: using openmp, mpi etc >>> > 3)If scipy internally use blas/mkl libraries can we enable >>> > parallelism through these? >>> > >>> > >>> > looks like i have to work on internals of scipy.. thanks a lot.. >>> > >>> > >>> > *with regards..* >>> > * >>> > * >>> > *M. Sai Rajeswar* >>> > *M-tech Computer Technology* >>> > *IIT Delhi >>> > ----------------------------------Cogito Ergo Sum--------- >>> > * >>> > >>> > _______________________________________________ >>> > SciPy-Dev mailing list >>> > SciPy-Dev at scipy.org >>> > http://mail.scipy.org/mailman/listinfo/scipy-dev >>> > >>> > >>> > >>> > >>> > _______________________________________________ >>> > SciPy-Dev mailing list >>> > SciPy-Dev at scipy.org >>> > http://mail.scipy.org/mailman/listinfo/scipy-dev >>> > >>> >>> _______________________________________________ >>> SciPy-Dev mailing list >>> SciPy-Dev at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-dev >>> >> >> > > There are also convolution functions in scipy.ndimage. For simple smallish > 1d convolution ndimage is much much faster than scipy.signal and somewhat > faster than numpy.convolve. > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From derek at astro.physik.uni-goettingen.de Thu Jul 24 12:46:58 2014 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Thu, 24 Jul 2014 18:46:58 +0200 Subject: [SciPy-Dev] scipy improve performance by parallelizing In-Reply-To: References: <53C02235.8000705@googlemail.com> Message-ID: <71F4F370-A187-4F84-9353-DCCB5C189D2C@astro.physik.uni-goettingen.de> Hi Sai, > but when i use numpy.convolve i get this error ValueError: object too deep for desired array > > does numpy.convolve work for 2D or 3D convolution? > no, it works on linear arrays only, as you will find in the documentation. It seems the best optimisation strategy for your case would depend on how many individual convolutions of what size arrays it involves. For large arrays, as Sturla has suggested, scipy.signal.fftconvolve which does operate on multi-D arrays, could be the best (or at least initially easiest) way to go. 
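
For reference, a minimal sketch of that fftconvolve route (the array shapes here are made up, roughly matching the 3-D volumes discussed earlier in the thread):

import numpy as np
from scipy.signal import fftconvolve

vol = np.random.rand(9, 60, 80)      # a small 3-D input volume
kernel = np.random.rand(3, 5, 5)     # a 3-D filter

# fftconvolve accepts N-dimensional arrays, unlike numpy.convolve
out = fftconvolve(vol, kernel, mode='valid')
print(out.shape)   # (7, 56, 76)

For kernels this small a direct method can still win, so it is worth timing both approaches on realistic sizes.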
HTH Derek From rajsai24 at gmail.com Thu Jul 24 13:34:24 2014 From: rajsai24 at gmail.com (Sai Rajeshwar) Date: Thu, 24 Jul 2014 23:04:24 +0530 Subject: [SciPy-Dev] scipy improve performance by parallelizing In-Reply-To: References: <26946BDF-2841-4E79-AF20-9266A2C97377@astro.physik.uni-goettingen.de> Message-ID: hi david, tried as you suggested ---------------------------------------------------------------- )for i in xrange(pooled_shape[1]): for j in xrange(pooled_shape[2]): for k in xrange(pooled_shape[3]): for l in xrange(pooled_shape[4]): pooled[0][i][j][k][l]=math. > > tanh((numpy.sum(conv_out[0][i][j][k*3][l*3:(l+1)*3])+numpy. > sum(conv_out[0][i][j][k*3+1][l*3:(l+1)*3])+numpy.sum(conv_ > out[0][i][j][k*3+2][l*3:(l+1)*3]))/9.0+b[i][j]) You should get a speed up by accessing the arrays in a more efficient way: pooled[0, i, j, k, l] = math.tanh((numpy.sum(conv_out[0, i, j, k*3, l*3:(l+1)*3]) + numpy.sum(conv_out[0, i, j, k*3+1, l*3:(l+1)*3]) + numpy.sum(conv_out[0, i, j, k*3+2, l*3:(l+1)*3]))/9.0+b[i, j]) In fact: numpy.sum(conv_out[0, i, j, k*3, l*3:(l+1)*3]) + numpy.sum(conv_out[0, i, j, k*3+1, l*3:(l+1)*3]) seems equivalent to: numpy.sum(conv_out[0, i, j, k*3: k*3 +1, l*3:(l+1)*3]) To take the last one into account: vec = numpy.sum(conv_out[0, i, j, k*3: k*3 + 2, l*3:(l+1)*3], axis=-1) pooled[0, i, j, k, l] = vec[0] + vec[1] + vec[2] / 9.0 And you can probably get rid of the i and j indexes all together. Something like this should work (untested): for k in... for l in... output = numpy.sum(conv_out[0, :, :, k*3: k*3 +1, l*3:(l+1)*3]), axis=-1) output += numpy.sum(conv_out[0, :, :, k*3 + 2, l*3 : (l+1)*3])), axis=-1)/9.0 output += b pooled[0, :, :, k, l] = numpy.tanh(output) ----------------------------------------------------------------- for i in xrange(self.pooled_shape[1]): for j in xrange(self.pooled_shape[2]): for k in xrange(self.pooled_shape[3]): for l in xrange(self.pooled_shape[4]): #-- commented-- self.pooled[0][i][j][k][l]=math.tanh((numpy.sum(self.conv_out[0][i][j][k*3][l*3:(l+1)*3])+numpy.sum(self.conv_out[0][i][j][k*3+1][l*3:(l+1)*3])+numpy.sum(self.conv_out[0][i][j][k*3+2][l*3:(l+1)*3]))/9.0+self.b[i][j]) vec = numpy.sum(self.conv_out[0, i, j, k*3: k*3 + 2, l*3:(l+1)*3], axis=-1) self.pooled[0, i, j, k, l] = math.tanh((vec[0] + vec[1] + vec[2] )/ 9.0+self.b[i][j]) but it gave following error ---------------------------------------------------------------------------------- Traceback (most recent call last): File "3dcnn_test.py", line 401, in check() File "3dcnn_test.py", line 392, in check layer1.change_input(numpy.reshape(test_set_x[i],(1,1,9,60,80))) File "3dcnn_test.py", line 77, in change_input self.pooled[0, i, j, k, l] = math.tanh((vec[0] + vec[1] + vec[2] )/ 9.0+self.b[i][j]) IndexError: index out of bounds *with regards..* *M. 
Sai Rajeswar* *M-tech Computer Technology* *IIT Delhi----------------------------------Cogito Ergo Sum---------* On Sun, Jul 13, 2014 at 9:08 PM, Da?id wrote: > > On 13 July 2014 14:28, Sai Rajeshwar wrote: > >> >> 2)for i in xrange(pooled_shape[1]): >> for j in xrange(pooled_shape[2]): >> for k in xrange(pooled_shape[3]): >> for l in xrange(pooled_shape[4]): >> >> pooled[0][i][j][k][l]=math.tanh((numpy.sum(conv_out[0][i][j][k*3][l*3:(l+1)*3])+numpy.sum(conv_out[0][i][j][k*3+1][l*3:(l+1)*3])+numpy.sum(conv_out[0][i][j][k*3+2][l*3:(l+1)*3]))/9.0+b[i][j]) > > > You should get a speed up by accessing the arrays in a more efficient way: > > pooled[0, i, j, k, l] = math.tanh((numpy.sum(conv_out[0, i, j, k*3, > l*3:(l+1)*3]) + numpy.sum(conv_out[0, i, j, k*3+1, l*3:(l+1)*3]) + > numpy.sum(conv_out[0, i, j, k*3+2, l*3:(l+1)*3]))/9.0+b[i, j]) > > In fact: > > numpy.sum(conv_out[0, i, j, k*3, l*3:(l+1)*3]) + numpy.sum(conv_out[0, i, > j, k*3+1, l*3:(l+1)*3]) > > seems equivalent to: > > numpy.sum(conv_out[0, i, j, k*3: k*3 +1, l*3:(l+1)*3]) > > To take the last one into account: > > vec = numpy.sum(conv_out[0, i, j, k*3: k*3 + 2, l*3:(l+1)*3], axis=-1) > pooled[0, i, j, k, l] = vec[0] + vec[1] + vec[2] / 9.0 > > And you can probably get rid of the i and j indexes all together. > Something like this should work (untested): > > for k in... > for l in... > output = numpy.sum(conv_out[0, :, :, k*3: k*3 +1, l*3:(l+1)*3]), axis=-1) > output += numpy.sum(conv_out[0, :, :, k*3 + 2, l*3 : (l+1)*3])), > axis=-1)/9.0 > output += b > pooled[0, :, :, k, l] = numpy.tanh(output) > > In this case, one of the loops seems a great target for parallelisation. > Also, Cython should help reduce the loop overhead. > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vanforeest at gmail.com Thu Jul 24 14:49:08 2014 From: vanforeest at gmail.com (nicky van foreest) Date: Thu, 24 Jul 2014 20:49:08 +0200 Subject: [SciPy-Dev] scipy.sparse versus pysparse In-Reply-To: References: Message-ID: That would be great. I'll check the mailing list to see when it comes along :-) On 24 July 2014 15:13, Pauli Virtanen wrote: > 24.07.2014, 11:11, nicky van foreest kirjoitti: > > Thanks for your clarifications. > > I should note that the issue of adding a sparse format more suitable for > fast matrix assembly has been brought up, but not implemented yet. > > While the fact that lil_matrix and dok_matrix are Python data structures > is nice, a more practical approach would use opaque data storage > (similar to ll_mat in pysparse). > > Pauli > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From argriffi at ncsu.edu Thu Jul 24 22:47:35 2014 From: argriffi at ncsu.edu (alex) Date: Thu, 24 Jul 2014 22:47:35 -0400 Subject: [SciPy-Dev] scipy.sparse versus pysparse In-Reply-To: References: Message-ID: On Thu, Jul 24, 2014 at 2:49 PM, nicky van foreest wrote: > That would be great. I'll check the mailing list to see when it comes along :-) After profiling the code I see that much of the time is spent doing numerical things with the DOK data, like negating or scaling each value. This is slow for DOK but faster for other formats like COO. 
Although you can't set COO matrix entries after having built the matrix, implementing setdiag seemed enough to work for your case, although possibly losing the parts of the API that you prefer over that of pysparse. I took the liberty to rewrite some of the code exploring some other scipy.sparse.linalg algorithms; I too run across problems like this where I need the equilibrium distribution for a sparse instantaneous transition rate matrix with a combinatorially large state space. For your particular example the shift-invert arpack function that uses superlu for sparse decomposition seems unreasonably effective. Cheers, Alex --- from __future__ import print_function from numpy import ones, zeros, array import scipy.sparse as sp from scipy.sparse.linalg import eigs from pylab import matshow, savefig from scipy.linalg import norm import time labda, mu1, mu2 = 1., 1.1, 1.01 N1, N2 = 400, 400 size = N1*N2 eps = 1e-3 maxIterations = int(1e5) guess = array([1] + [0] * (size-1)) def state(i,j): return j*N1 + i def gen_rate_triples(): for i in range(0,N1-1): for j in range(0,N2): yield state(i, j), state(i+1, j), labda for i in range(0,N1): for j in range(1,N2): yield state(i, j), state(i, j-1), mu2 for i in range(1,N1): for j in range(0,N2-1): yield state(i, j), state(i-1, j+1), mu1 def get_rate_info(triples): rows, cols, rates = zip(*triples) pre_Q = sp.coo_matrix((rates, (rows, cols)), shape=(size, size)) exit_rates = pre_Q.dot(ones(size)) return pre_Q, exit_rates def get_PT(pre_Q, exit_rates): urate = exit_rates.max() * 1.001 P = pre_Q / urate P.setdiag(1 - exit_rates / urate) return P.T.tocsr() def get_QT(pre_Q, exit_rates): Q = pre_Q.copy() Q.setdiag(-exit_rates) return Q.T.tocsr() def QTpi(QT, guess, tol): w, v = eigs(QT, k=1, v0=guess, sigma=1e-6, which='LM', tol=tol, maxiter=maxIterations) pi = v[:, 0].real return pi / pi.sum() def PTpi(PT, guess, tol): w, v = eigs(PT, k=1, v0=guess, tol=eps, maxiter=maxIterations) pi = v[:, 0].real return pi / pi.sum() def power_method(PT, guess, abstol): p_prev = zeros(size) p = guess.copy() for i in range(maxIterations): if norm(p - p_prev, ord=1) < abstol: break p_prev = PT.dot(p) p = PT.dot(p_prev) return p print('settings:') print('labda:', labda) print('mu1:', mu1) print('mu2:', mu2) print('N1:', N1) print('N2:', N2) print() print('precalculation times:') tm = time.time() pre_Q, exit_rates = get_rate_info(gen_rate_triples()) print(time.time() - tm, 'for some rate info') tm = time.time() QT = get_QT(pre_Q, exit_rates) print(time.time() - tm, 'to make the transition rate matrix') tm = time.time() PT = get_PT(pre_Q, exit_rates) print(time.time() - tm, 'to make the uniformized trans prob matrix') print() # use an iterative method tm = time.time() pi_P = PTpi(PT, guess, eps) tm_P = time.time() - tm # use superlu and arpack tm = time.time() pi_Q = QTpi(QT, guess, eps) tm_Q = time.time() - tm # use a power method tm = time.time() pi_R = power_method(PT, guess, eps) tm_R = time.time() - tm # make pngs matshow(pi_P.reshape(N2,N1)); savefig("pi_P.png") matshow(pi_Q.reshape(N2,N1)); savefig("pi_Q.png") matshow(pi_R.reshape(N2,N1)); savefig("pi_R.png") print('distribution estimates:') print('P:', pi_P) print('Q:', pi_Q) print('R:', pi_R) print() print('computation times for the iterations:') print('P:', tm_P) print('Q:', tm_Q) print('R:', tm_R) print() print('violation of the invariant pi * P = pi:') print('P:', norm(PT*pi_P - pi_P, ord=1)) print('Q:', norm(PT*pi_Q - pi_Q, ord=1)) print('R:', norm(PT*pi_R - pi_R, ord=1)) print() print('violation of the 
invariant pi * Q = 0:') print('P:', norm(QT*pi_P, ord=1)) print('Q:', norm(QT*pi_Q, ord=1)) print('R:', norm(QT*pi_R, ord=1)) print() print('pngs:') print('P: pi_P.png') print('Q: pi_Q.png') print('R: pi_R.png') print() --- settings: labda: 1.0 mu1: 1.1 mu2: 1.01 N1: 400 N2: 400 precalculation times: 0.753726005554 for some rate info 0.045077085495 to make the transition rate matrix 0.0467808246613 to make the uniformized trans prob matrix distribution estimates: P: [ 0.00501086 0.00455523 0.00414086 ..., -0. -0. -0. ] Q: [ 9.00503398e-04 8.18606355e-04 7.44190344e-04 ..., 3.62901458e-07 3.59308342e-07 3.55750817e-07] R: [ 0.0053192 0.00483551 0.00439555 ..., 0. 0. 0. ] computation times for the iterations: P: 2.88090801239 Q: 4.26303100586 R: 0.834770202637 violation of the invariant pi * P = pi: P: 0.0016872009599 Q: 2.74708912485e-08 R: 0.000994144099758 violation of the invariant pi * Q = 0: P: 0.00525244218029 Q: 8.55199061005e-08 R: 0.0030948799384 pngs: P: pi_P.png Q: pi_Q.png R: pi_R.png From vanforeest at gmail.com Fri Jul 25 03:59:47 2014 From: vanforeest at gmail.com (nicky van foreest) Date: Fri, 25 Jul 2014 09:59:47 +0200 Subject: [SciPy-Dev] scipy.sparse versus pysparse In-Reply-To: References: Message-ID: HI Alex, Thanks for your input. I'll try to run it and get back to you. Nicky On 25 July 2014 04:47, alex wrote: > On Thu, Jul 24, 2014 at 2:49 PM, nicky van foreest > wrote: > > That would be great. I'll check the mailing list to see when it comes > along :-) > > After profiling the code I see that much of the time is spent doing > numerical things with the DOK data, like negating or scaling each > value. This is slow for DOK but faster for other formats like COO. > Although you can't set COO matrix entries after having built the > matrix, implementing setdiag seemed enough to work for your case, > although possibly losing the parts of the API that you prefer over > that of pysparse. > > I took the liberty to rewrite some of the code exploring some other > scipy.sparse.linalg algorithms; I too run across problems like this > where I need the equilibrium distribution for a sparse instantaneous > transition rate matrix with a combinatorially large state space. For > your particular example the shift-invert arpack function that uses > superlu for sparse decomposition seems unreasonably effective. 
> > Cheers, > Alex > > --- > > from __future__ import print_function > from numpy import ones, zeros, array > import scipy.sparse as sp > from scipy.sparse.linalg import eigs > from pylab import matshow, savefig > from scipy.linalg import norm > import time > > labda, mu1, mu2 = 1., 1.1, 1.01 > N1, N2 = 400, 400 > size = N1*N2 > eps = 1e-3 > maxIterations = int(1e5) > guess = array([1] + [0] * (size-1)) > > def state(i,j): > return j*N1 + i > > def gen_rate_triples(): > for i in range(0,N1-1): > for j in range(0,N2): > yield state(i, j), state(i+1, j), labda > for i in range(0,N1): > for j in range(1,N2): > yield state(i, j), state(i, j-1), mu2 > for i in range(1,N1): > for j in range(0,N2-1): > yield state(i, j), state(i-1, j+1), mu1 > > def get_rate_info(triples): > rows, cols, rates = zip(*triples) > pre_Q = sp.coo_matrix((rates, (rows, cols)), shape=(size, size)) > exit_rates = pre_Q.dot(ones(size)) > return pre_Q, exit_rates > > def get_PT(pre_Q, exit_rates): > urate = exit_rates.max() * 1.001 > P = pre_Q / urate > P.setdiag(1 - exit_rates / urate) > return P.T.tocsr() > > def get_QT(pre_Q, exit_rates): > Q = pre_Q.copy() > Q.setdiag(-exit_rates) > return Q.T.tocsr() > > def QTpi(QT, guess, tol): > w, v = eigs(QT, k=1, v0=guess, sigma=1e-6, which='LM', > tol=tol, maxiter=maxIterations) > pi = v[:, 0].real > return pi / pi.sum() > > def PTpi(PT, guess, tol): > w, v = eigs(PT, k=1, v0=guess, tol=eps, maxiter=maxIterations) > pi = v[:, 0].real > return pi / pi.sum() > > def power_method(PT, guess, abstol): > p_prev = zeros(size) > p = guess.copy() > for i in range(maxIterations): > if norm(p - p_prev, ord=1) < abstol: > break > p_prev = PT.dot(p) > p = PT.dot(p_prev) > return p > > print('settings:') > print('labda:', labda) > print('mu1:', mu1) > print('mu2:', mu2) > print('N1:', N1) > print('N2:', N2) > print() > > print('precalculation times:') > tm = time.time() > pre_Q, exit_rates = get_rate_info(gen_rate_triples()) > print(time.time() - tm, 'for some rate info') > tm = time.time() > QT = get_QT(pre_Q, exit_rates) > print(time.time() - tm, 'to make the transition rate matrix') > tm = time.time() > PT = get_PT(pre_Q, exit_rates) > print(time.time() - tm, 'to make the uniformized trans prob matrix') > print() > > # use an iterative method > tm = time.time() > pi_P = PTpi(PT, guess, eps) > tm_P = time.time() - tm > > # use superlu and arpack > tm = time.time() > pi_Q = QTpi(QT, guess, eps) > tm_Q = time.time() - tm > > # use a power method > tm = time.time() > pi_R = power_method(PT, guess, eps) > tm_R = time.time() - tm > > # make pngs > matshow(pi_P.reshape(N2,N1)); savefig("pi_P.png") > matshow(pi_Q.reshape(N2,N1)); savefig("pi_Q.png") > matshow(pi_R.reshape(N2,N1)); savefig("pi_R.png") > > print('distribution estimates:') > print('P:', pi_P) > print('Q:', pi_Q) > print('R:', pi_R) > print() > print('computation times for the iterations:') > print('P:', tm_P) > print('Q:', tm_Q) > print('R:', tm_R) > print() > print('violation of the invariant pi * P = pi:') > print('P:', norm(PT*pi_P - pi_P, ord=1)) > print('Q:', norm(PT*pi_Q - pi_Q, ord=1)) > print('R:', norm(PT*pi_R - pi_R, ord=1)) > print() > print('violation of the invariant pi * Q = 0:') > print('P:', norm(QT*pi_P, ord=1)) > print('Q:', norm(QT*pi_Q, ord=1)) > print('R:', norm(QT*pi_R, ord=1)) > print() > print('pngs:') > print('P: pi_P.png') > print('Q: pi_Q.png') > print('R: pi_R.png') > print() > > --- > > settings: > labda: 1.0 > mu1: 1.1 > mu2: 1.01 > N1: 400 > N2: 400 > > precalculation times: > 0.753726005554 for some 
rate info > 0.045077085495 to make the transition rate matrix > 0.0467808246613 to make the uniformized trans prob matrix > > distribution estimates: > P: [ 0.00501086 0.00455523 0.00414086 ..., -0. -0. > -0. ] > Q: [ 9.00503398e-04 8.18606355e-04 7.44190344e-04 ..., > 3.62901458e-07 > 3.59308342e-07 3.55750817e-07] > R: [ 0.0053192 0.00483551 0.00439555 ..., 0. 0. > 0. ] > > computation times for the iterations: > P: 2.88090801239 > Q: 4.26303100586 > R: 0.834770202637 > > violation of the invariant pi * P = pi: > P: 0.0016872009599 > Q: 2.74708912485e-08 > R: 0.000994144099758 > > violation of the invariant pi * Q = 0: > P: 0.00525244218029 > Q: 8.55199061005e-08 > R: 0.0030948799384 > > pngs: > P: pi_P.png > Q: pi_Q.png > R: pi_R.png > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rajsai24 at gmail.com Fri Jul 25 09:34:19 2014 From: rajsai24 at gmail.com (Sai Rajeshwar) Date: Fri, 25 Jul 2014 19:04:19 +0530 Subject: [SciPy-Dev] scipy improve performance by parallelizing In-Reply-To: References: <26946BDF-2841-4E79-AF20-9266A2C97377@astro.physik.uni-goettingen.de> Message-ID: ok i guess axis=-1 option is not required.. that solved it *with regards..* *M. Sai Rajeswar* *M-tech Computer Technology* *IIT Delhi----------------------------------Cogito Ergo Sum---------* On Thu, Jul 24, 2014 at 11:04 PM, Sai Rajeshwar wrote: > hi david, > > tried as you suggested > ---------------------------------------------------------------- > > )for i in xrange(pooled_shape[1]): > for j in xrange(pooled_shape[2]): > for k in xrange(pooled_shape[3]): > for l in xrange(pooled_shape[4]): > pooled[0][i][j][k][l]=math. >> >> tanh((numpy.sum(conv_out[0][i][j][k*3][l*3:(l+1)*3])+numpy. >> sum(conv_out[0][i][j][k*3+1][l*3:(l+1)*3])+numpy.sum(conv_ >> out[0][i][j][k*3+2][l*3:(l+1)*3]))/9.0+b[i][j]) > > > You should get a speed up by accessing the arrays in a more efficient way: > > pooled[0, i, j, k, l] = math.tanh((numpy.sum(conv_out[0, i, j, k*3, > l*3:(l+1)*3]) + numpy.sum(conv_out[0, i, j, k*3+1, l*3:(l+1)*3]) + > numpy.sum(conv_out[0, i, j, k*3+2, l*3:(l+1)*3]))/9.0+b[i, j]) > > In fact: > > numpy.sum(conv_out[0, i, j, k*3, l*3:(l+1)*3]) + numpy.sum(conv_out[0, i, > j, k*3+1, l*3:(l+1)*3]) > > seems equivalent to: > > numpy.sum(conv_out[0, i, j, k*3: k*3 +1, l*3:(l+1)*3]) > > To take the last one into account: > > vec = numpy.sum(conv_out[0, i, j, k*3: k*3 + 2, l*3:(l+1)*3], axis=-1) > pooled[0, i, j, k, l] = vec[0] + vec[1] + vec[2] / 9.0 > > And you can probably get rid of the i and j indexes all together. > Something like this should work (untested): > > for k in... > for l in... 
> output = numpy.sum(conv_out[0, :, :, k*3: k*3 +1, l*3:(l+1)*3]), axis=-1) > output += numpy.sum(conv_out[0, :, :, k*3 + 2, l*3 : (l+1)*3])), > axis=-1)/9.0 > output += b > pooled[0, :, :, k, l] = numpy.tanh(output) > ----------------------------------------------------------------- > > > for i in xrange(self.pooled_shape[1]): > for j in xrange(self.pooled_shape[2]): > for k in xrange(self.pooled_shape[3]): > for l in xrange(self.pooled_shape[4]): > > #-- commented-- > self.pooled[0][i][j][k][l]=math.tanh((numpy.sum(self.conv_out[0][i][j][k*3][l*3:(l+1)*3])+numpy.sum(self.conv_out[0][i][j][k*3+1][l*3:(l+1)*3])+numpy.sum(self.conv_out[0][i][j][k*3+2][l*3:(l+1)*3]))/9.0+self.b[i][j]) > > vec = numpy.sum(self.conv_out[0, i, j, k*3: k*3 + > 2, l*3:(l+1)*3], axis=-1) > self.pooled[0, i, j, k, l] = math.tanh((vec[0] + > vec[1] + vec[2] )/ 9.0+self.b[i][j]) > > > but it gave following error > > ---------------------------------------------------------------------------------- > Traceback (most recent call last): > File "3dcnn_test.py", line 401, in > check() > File "3dcnn_test.py", line 392, in check > layer1.change_input(numpy.reshape(test_set_x[i],(1,1,9,60,80))) > File "3dcnn_test.py", line 77, in change_input > self.pooled[0, i, j, k, l] = math.tanh((vec[0] + vec[1] + vec[2] )/ > 9.0+self.b[i][j]) > IndexError: index out of bounds > > *with regards..* > > *M. Sai Rajeswar* > *M-tech Computer Technology* > > > *IIT Delhi----------------------------------Cogito Ergo Sum---------* > > > On Sun, Jul 13, 2014 at 9:08 PM, Da?id wrote: > >> >> On 13 July 2014 14:28, Sai Rajeshwar wrote: >> >>> >>> 2)for i in xrange(pooled_shape[1]): >>> for j in xrange(pooled_shape[2]): >>> for k in xrange(pooled_shape[3]): >>> for l in xrange(pooled_shape[4]): >>> >>> pooled[0][i][j][k][l]=math.tanh((numpy.sum(conv_out[0][i][j][k*3][l*3:(l+1)*3])+numpy.sum(conv_out[0][i][j][k*3+1][l*3:(l+1)*3])+numpy.sum(conv_out[0][i][j][k*3+2][l*3:(l+1)*3]))/9.0+b[i][j]) >> >> >> You should get a speed up by accessing the arrays in a more efficient way: >> >> pooled[0, i, j, k, l] = math.tanh((numpy.sum(conv_out[0, i, j, k*3, >> l*3:(l+1)*3]) + numpy.sum(conv_out[0, i, j, k*3+1, l*3:(l+1)*3]) + >> numpy.sum(conv_out[0, i, j, k*3+2, l*3:(l+1)*3]))/9.0+b[i, j]) >> >> In fact: >> >> numpy.sum(conv_out[0, i, j, k*3, l*3:(l+1)*3]) + numpy.sum(conv_out[0, i, >> j, k*3+1, l*3:(l+1)*3]) >> >> seems equivalent to: >> >> numpy.sum(conv_out[0, i, j, k*3: k*3 +1, l*3:(l+1)*3]) >> >> To take the last one into account: >> >> vec = numpy.sum(conv_out[0, i, j, k*3: k*3 + 2, l*3:(l+1)*3], axis=-1) >> pooled[0, i, j, k, l] = vec[0] + vec[1] + vec[2] / 9.0 >> >> And you can probably get rid of the i and j indexes all together. >> Something like this should work (untested): >> >> for k in... >> for l in... >> output = numpy.sum(conv_out[0, :, :, k*3: k*3 +1, l*3:(l+1)*3]), axis=-1) >> output += numpy.sum(conv_out[0, :, :, k*3 + 2, l*3 : (l+1)*3])), >> axis=-1)/9.0 >> output += b >> pooled[0, :, :, k, l] = numpy.tanh(output) >> >> In this case, one of the loops seems a great target for parallelisation. >> Also, Cython should help reduce the loop overhead. >> >> _______________________________________________ >> SciPy-Dev mailing list >> SciPy-Dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-dev >> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
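For reference, the 3x3 pooling discussed in this thread can also be vectorised without the k and l loops. The original loop sums three rows of conv_out (k*3, k*3+1 and k*3+2) and divides the whole 3x3 sum by 9, so a two-row slice like k*3:k*3+2 leaves vec with only two entries and vec[2] is out of bounds, which is the IndexError reported above. A minimal sketch, untested, assuming conv_out has shape (1, C1, C2, 3*K, 3*L), b has shape (C1, C2) and pooled has shape (1, C1, C2, K, L), as the loop bounds suggest:

    import numpy

    c1, c2, K, L = pooled.shape[1:]
    # split the last two axes into non-overlapping 3x3 windows
    patches = conv_out[0].reshape(c1, c2, K, 3, L, 3)
    # sum each window, add the per-map bias, squash with tanh
    pooled[0] = numpy.tanh(patches.sum(axis=(3, 5)) / 9.0 + b[:, :, None, None])
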
URL: From rajsai24 at gmail.com Sun Jul 27 04:28:38 2014 From: rajsai24 at gmail.com (Sai Rajeshwar) Date: Sun, 27 Jul 2014 13:58:38 +0530 Subject: [SciPy-Dev] convolution using numpy/scipy using MKL libraries Message-ID:

hi all,

I'm trying to implement 3d convolutional networks, for which I wanted to use convolve or fftconvolve from scipy.signal, but it looks like neither of them uses the MKL libraries. Is there any implementation of convolution which uses the MKL libraries, or is MKL-threaded, so that the code runs faster?

thanks a lot in advance

*with regards..*

*M. Sai Rajeswar*
*M-tech Computer Technology*
*IIT Delhi----------------------------------Cogito Ergo Sum---------*
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sturla.molden at gmail.com Sun Jul 27 12:36:48 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Sun, 27 Jul 2014 16:36:48 +0000 (UTC) Subject: [SciPy-Dev] convolution using numpy/scipy using MKL libraries References: Message-ID: <348070374428171087.737381sturla.molden-gmail.com@news.gmane.org>

Sai Rajeshwar wrote:

> I'm trying to implement 3d convolutional networks, for which I wanted to
> use convolve or fftconvolve from scipy.signal, but it looks like neither
> of them uses the MKL libraries.

That is correct. MKL is used for matrix multiplication and linear algebra.

> Is there any implementation of convolution which uses the MKL libraries,
> or is MKL-threaded, so that the code runs faster?

In Enthought Canopy, numpy.fft.* uses MKL if you install the fastnumpy package. Otherwise you will have to make it yourself, which shouldn't be difficult. The main thing is to find out if you want FFT convolution or time-domain convolution. The latter boils down to a simple loop and a call to the BLAS function *DOT. You can use Python threads or a Cython prange loop to run it in parallel. Anyway, a convolution is just a tiny loop to write in Cython, C or Fortran, so I don't really see what the problem is.

Sturla

From jtaylor.debian at googlemail.com Wed Jul 30 16:20:05 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Wed, 30 Jul 2014 22:20:05 +0200 Subject: [SciPy-Dev] ANN: NumPy 1.9.0 beta 2 release Message-ID: <53D95375.5080707@googlemail.com>

Hello,

The source packages and binaries of numpy 1.9.0 beta 2 have just been uploaded to sourceforge.
https://sourceforge.net/projects/numpy/files/NumPy/1.9.0b2

1.9.0 will be a new feature release supporting Python 2.6 - 2.7 and 3.2 - 3.4.

Unfortunately we have disabled the new __numpy_ufunc__ feature for overriding ufuncs in subclasses for now. There are still some unresolved issues with its behavior regarding python operator precedence and subclasses. If you have a stake in the issue please read Pauli's summary of the remaining issues:
http://mail.scipy.org/pipermail/numpy-discussion/2014-July/070737.html
When the issues are resolved to everyone's satisfaction we hope to enable the feature for 1.10 in its final form.

We have restored the indexing edge case that broke matplotlib with numpy 1.9.0 beta 1, but some of the other test failures in other packages are deemed bugs in their code and not reasonable to support in numpy anymore. Most projects have fixed the issues in their latest stable or development versions. Depending on how bad the broken functionality is you may need to update your third party packages when updating numpy to 1.9.0b2.
An attempt was made to update the Windows binary toolchain to the latest mingw/mingw64 version and an up-to-date ATLAS version, but this turned up a few ugly test failures. Help in resolving these issues is appreciated; no core developer has Windows debugging experience. Please see this issue for details:
https://github.com/numpy/numpy/issues/4909

The changelog is mostly the same as in beta 1. Please read it carefully; there have been many small changes that could affect your code.
https://github.com/numpy/numpy/blob/maintenance/1.9.x/doc/release/1.9.0-notes.rst

Please also take special note of the future changes section, which will apply to the following release (1.10.0), and make sure to check whether your applications would be affected by them.

Source tarballs, Windows installers and release notes can be found at
https://sourceforge.net/projects/numpy/files/NumPy/1.9.0b2

Cheers,
Julian Taylor

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: OpenPGP digital signature
URL:
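For anyone who wants to help with the Windows failures mentioned above, the quickest way to reproduce them is to run numpy's bundled test suite against the beta. A minimal sketch (assuming nose is installed, which the test runner of that era requires):

    import numpy
    print(numpy.__version__)       # should report the beta, e.g. '1.9.0b2'
    numpy.test('full', verbose=2)  # 'full' also runs the tests marked as slow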