From kumar.sachin at yandex.com Tue Jan 7 13:56:44 2014 From: kumar.sachin at yandex.com (Sachin Kumar) Date: Wed, 08 Jan 2014 00:26:44 +0530 Subject: [SciPy-User] Number of equations and precision of integrate.odeint() Message-ID: <18531389121004@web22g.yandex.ru> Hi, Is there any reason why integrate.odeint() should become less precise when the number of equations increase? ?I'm trying to solve these two sets of differential equations: ??dy1/dx = x**2 ????(1) ??dy2/dx = x**3 and, ??dy1/dx = x**2 ????(2) Since the two equations in (1) are uncoupled, y1 from integrating (1) and (2) should be the same. ?I'm implementing the two in the following snippet: ??import numpy as np ??from scipy.integrate import odeint ??def f(y, x): ??????return [x**2, x**3] ??def g(y, x): ??????return x**2 ??a = odeint(f, [0.0, 0.0], np.arange(0, 5, 0.0001)) ??b = odeint(g, 0.0, np.arange(0, 5, 0.0001)) ??print a[-1][0], b[-1][0], abs(a[-1][0] - b[-1][0]) Running the above code gives me: ??41.6641667161 41.6641667354 1.93298319573e-08 The difference seems to be very insignificant here. ?But in cases where the number of equations becomes large (for e.g. in Lyapunov exponent calculations), it appears to cause significant differences. What could be going on here? -sachin From newville at cars.uchicago.edu Tue Jan 7 15:22:48 2014 From: newville at cars.uchicago.edu (Matt Newville) Date: Tue, 7 Jan 2014 14:22:48 -0600 Subject: [SciPy-User] leastsq and multiprocessing In-Reply-To: References: Message-ID: On Fri, Dec 20, 2013 at 7:09 AM, Matt Newville wrote: > Jeremy, > > On Fri, Dec 20, 2013 at 6:43 AM, Jeremy Sanders > wrote: >> Matt Newville wrote: >> >>> Currently, scipy's leastsq() simply calls the Fortran lmdif() (for >>> finite-diff Jacobian). I think replacing fdjac2() with a >>> multiprocessing version would require reimplementing both lmdif() and >>> fdjac2(), probably using cython. If calls to MINPACKs lmpar() and >>> qrfac() could be left untouched, this translation does not look too >>> insane -- the two routines lmdif() and fdjac2() themselves are not >>> that complicated. It would be a fair amount of work, and I cannot >>> volunteer to do this myself any time soon. But, I do think it >>> actually would improve the speed of leastsq() for many use cases. >> >> Computing the Jacobian using using multiprocessing definitely helps the >> speed. I wrote the unrated answer (xioxox) there which shows how to do it in >> Python. >> >> Jeremy >> > > Sorry, I hadn't read the stackoverflow discussion carefully enough. > You're right that this is the same basic approach, and your suggestion > is much easier to implement. I think having helper functions to > automatically provide this functionality would be really great. > I've implemented such an ability to use a multiprocessing Pool for leastsq() in a way that I think is suitable for scipy. This is currently at https://github.com/newville/scipy/commit/3d0ac1da3bcd1d34a1bec8226ea0284f04fcb5dc it adds an "mp_pool" argument to leastsq() which, if not None and if Dfun is not otherwise defined, will provide a Dfun that uses the provided multiprocessing Pool. It requires that user to make and manage the multiprocessing Pool rather than try to manage it inside leastsq(). I think this is ready for a PR, but would happily take comments on it. I do notice that for small test programs, such as the test_minpack.py suite, this approach (which is basically a generalization of Jeremy's implementation) is significantly slower (~10x) than not using multiprocessing. 
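A minimal sketch of the approach being discussed -- a Dfun for leastsq() that builds the finite-difference Jacobian by farming the perturbed function evaluations out to a multiprocessing Pool, in the spirit of Jeremy's stackoverflow answer -- might look like the following. The helper names, the toy model, and the fixed forward-difference step are assumptions made for illustration only; this is not the code in the branch linked above.
-----------
import numpy as np
from multiprocessing import Pool
from scipy.optimize import leastsq

def residuals(params, x, data):
    # the objective must be a plain module-level function so it can be pickled
    a, b = params
    return data - a * np.exp(-b * x)

class _CallAt(object):
    # small picklable helper: evaluates func at one perturbed parameter vector
    def __init__(self, func, args):
        self.func = func
        self.args = args
    def __call__(self, params):
        return self.func(params, *self.args)

def parallel_fd_jacobian(func, pool, eps=1.49e-8):
    # returns a Dfun(params, *args) that computes a forward-difference
    # Jacobian of shape (m, n), one perturbed evaluation per worker task
    def dfun(params, *args):
        params = np.asarray(params, dtype=float)
        f0 = func(params, *args)
        perturbed = []
        for i in range(len(params)):
            p = params.copy()
            p[i] += eps
            perturbed.append(p)
        cols = pool.map(_CallAt(func, args), perturbed)
        jac = np.empty((len(f0), len(params)))
        for i, fi in enumerate(cols):
            jac[:, i] = (fi - f0) / eps
        return jac
    return dfun

if __name__ == '__main__':
    x = np.linspace(0, 4, 50)
    data = 2.5 * np.exp(-1.3 * x) + 0.05 * np.random.randn(x.size)
    pool = Pool(2)            # the user creates and manages the pool
    dfun = parallel_fd_jacobian(residuals, pool)
    fit = leastsq(residuals, [1.0, 1.0], args=(x, data), Dfun=dfun)
    pool.close()
    pool.join()
    print fit[0]
-----------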
This slow-down seems entirely to be due to replaced the Fortran subroutine fdjac2 with Python function Dfun. I imagine that real performance will be highly variable, and have not explored (yet?) whether using Cython might help here. I expect it would, but have no experience using Cython and multiprocessing together. One caveat of the multiprocessing approach is that the objective function must be Pickleable, which can be challenging in many real-world situations, say where the objective function is an instance method. Solutions using copy_reg() are reported to work, but I couldn't get this to work for the test_minpack.py suite, and so that only tests using a plain function for the objective function. Is this addition worth including in leastsq()? I would think that it does little harm, might be useful for some, and provides a starting point for further work. Cheers, --Matt From rob.clewley at gmail.com Thu Jan 9 19:37:35 2014 From: rob.clewley at gmail.com (Rob Clewley) Date: Thu, 9 Jan 2014 19:37:35 -0500 Subject: [SciPy-User] Number of equations and precision of integrate.odeint() In-Reply-To: <18531389121004@web22g.yandex.ru> References: <18531389121004@web22g.yandex.ru> Message-ID: Hi Sachin, On Tue, Jan 7, 2014 at 1:56 PM, Sachin Kumar wrote: > Hi, > > Is there any reason why integrate.odeint() should become less precise > when the number of equations increase? There are certainly reasons why it would be different. You are basically expecting too much of a numerical algorithm. > I'm trying to solve these two > sets of differential equations: > > dy1/dx = x**2 (1) > dy2/dx = x**3 > > and, > > dy1/dx = x**2 (2) > > Since the two equations in (1) are uncoupled, y1 from integrating (1) > and (2) should be the same. "Should" by what standard? This is the critical question you must ask! A pure mathematical standard regarding the dense solution curves of these ODEs is not realistic because on a computer you are dealing with a finite algorithm for approximating solutions. odeint is a variable-step solver. If you give it several variables, coupled or not, it will choose a set of time steps that it finds most appropriate for all of the variables. No doubt, the fact that the rates of change of the two variables differ on your domain means that it chooses different time steps and therefore yields slightly different numerical solutions. The errors will necessarily be different. A numerical analysis textbook will give you a better idea why, but wikipedia explains it pretty well (e.g., look up Runge-Kutta). Tolerances and a sensitivity analysis of your outcomes with respect to your algorithmic parameters are the way to keep control on the growing difference between your two models (or for models with larger dimension). If you really want to force the time steps to be the same for both cases you could use a fixed step solver, but I suspect that's trying to solve the wrong problem. -- Robert Clewley, Ph.D. 
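The tolerance and sensitivity check suggested here is easy to try on the original example: rerun both integrations with tighter rtol/atol and compare each result against the exact antiderivative x**3/3 at the final grid point. The tolerance values below are an arbitrary illustrative choice, not a recommendation.
-----------
import numpy as np
from scipy.integrate import odeint

def f(y, x):
    return [x**2, x**3]

def g(y, x):
    return x**2

x = np.arange(0, 5, 0.0001)
exact = x[-1]**3 / 3.0      # exact value of y1 at the last grid point

# default tolerances (roughly 1.5e-8): the two runs disagree slightly
a = odeint(f, [0.0, 0.0], x)
b = odeint(g, 0.0, x)
print abs(a[-1, 0] - b[-1, 0]), abs(a[-1, 0] - exact), abs(b[-1, 0] - exact)

# tighter tolerances (values chosen only for illustration)
a = odeint(f, [0.0, 0.0], x, rtol=1e-11, atol=1e-11)
b = odeint(g, 0.0, x, rtol=1e-11, atol=1e-11)
print abs(a[-1, 0] - b[-1, 0]), abs(a[-1, 0] - exact), abs(b[-1, 0] - exact)
-----------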
Assistant Professor Neuroscience Institute and Department of Mathematics and Statistics Georgia State University PO Box 5030 Atlanta, GA 30302, USA tel: 404-413-6420 fax: 404-413-5446 http://neuroscience.gsu.edu/rclewley.html From anand.prabhakar.patil at gmail.com Fri Jan 10 18:36:06 2014 From: anand.prabhakar.patil at gmail.com (Anand Patil) Date: Fri, 10 Jan 2014 15:36:06 -0800 Subject: [SciPy-User] ANN: Sense, a new cloud platform for data analysis Message-ID: Hi everyone, As a longtime fan of the SciPy/NumPy ecosystem, I'm happy to announce Sense, a new cloud platform for data analysis that has the Python data stack built in. For Python programmers, the core Sense experience is a browser-based console-and-editor environment with some nifty extra features. You can use IPython's rich display system right in the console, provision and dismiss worker machines to distribute computation, share code with colleagues without requiring them to install anything, and much more. Sense is a serious Python environment where you can install packages and run code on powerful machines. Our long-term goal is to make Sense the best possible cloud teammate to Python and the rest of the world's awesome open-source data tools. If you're interested in learning more, please watch our demo at https://senseplatform.com and sign up for the beta. Feedback, questions, comments etc. are very welcome. Anand Patil Tristan Zajonc -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sun Jan 12 22:28:38 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 12 Jan 2014 22:28:38 -0500 Subject: [SciPy-User] Semi-OT: weekend puzzle enumerating number of balls in bins Message-ID: nobs = 5 rr = 2 We have 5 balls and 2 bins. We can put as many balls as we want (0 to 5) in the first bin. Then we put as many balls as we want (and are available) in the second bin. And so on if we had more bins. I'd like to enumerate all possible bin allocations, in this case 21: >>> m_all array([[0, 0], [0, 1], [0, 2], [0, 3], [0, 4], [0, 5], [1, 0], [1, 1], [1, 2], [1, 3], [1, 4], [2, 0], [2, 1], [2, 2], [2, 3], [3, 0], [3, 1], [3, 2], [4, 0], [4, 1], [5, 0]]) Here is a simple way to caculate it. The number of cases gets large very fast. ----------- import itertools import numpy as np nobs = 10 rr = 3 m_all = np.array(list(itertools.ifilter(lambda ii: sum(ii)<=nobs, itertools.product(*[range(nobs + 1)]*rr)))) print m_all.shape print np.bincount(m_all.sum(1)) ----------- Is there a faster way to do this? I need to do some calculation on each case, so either a numpy array or an iterator works. The motivation: We have two kinds of machines, 10 each that we want to test for failing early. The test has the null hypothesis that the failure distributions are the same. Against a one-sided alternative that the first kind of machines fails earlier. We run the 20 machines and count how many machines of the first kind has failed by the time that we see the rth (3rd) failure of the second kind. If a large number of the machines of the first kind have failed, then we reject the Null that they have the same failure distribution. We don't have time to wait until all machines fail. So we need to do "very small" sample statistics. Wilcoxon-type precedence tests. Josef It's Big Data, if we have big machines. 
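The test statistic described in the motivation is also cheap to simulate, which gives an independent check on whatever exact enumeration ends up being used. The exponential lifetimes below are only an illustrative choice of a common null distribution for both machine types.
-----------
import numpy as np

n1, n2, rr = 10, 10, 3
nsim = 100000

stats = np.empty(nsim, dtype=int)
for k in range(nsim):
    t1 = np.random.exponential(1.0, size=n1)   # failure times, first kind
    t2 = np.random.exponential(1.0, size=n2)   # failure times, second kind
    cutoff = np.sort(t2)[rr - 1]               # time of the rr-th failure of the second kind
    stats[k] = (t1 <= cutoff).sum()            # failures of the first kind seen by then

print np.bincount(stats, minlength=n1 + 1) / float(nsim)   # simulated null frequencies
-----------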
From guziy.sasha at gmail.com Mon Jan 13 01:17:41 2014 From: guziy.sasha at gmail.com (Oleksandr Huziy) Date: Mon, 13 Jan 2014 01:17:41 -0500 Subject: [SciPy-User] Semi-OT: weekend puzzle enumerating number of balls in bins In-Reply-To: References: Message-ID: Hi Josef: This recursive solution looks faster: http://nbviewer.ipython.org/urls/raw2.github.com/guziy/PyNotebooks/master/bins.ipynb?create=1 I think you can optimize it to get rid of the loops using numpy functions, I did not do it since you probably know them better)) Cheers 2014/1/12 > nobs = 5 > rr = 2 > > We have 5 balls and 2 bins. > We can put as many balls as we want (0 to 5) in the first bin. > Then we put as many balls as we want (and are available) in the second bin. > And so on if we had more bins. > > I'd like to enumerate all possible bin allocations, in this case 21: > > >>> m_all > array([[0, 0], > [0, 1], > [0, 2], > [0, 3], > [0, 4], > [0, 5], > [1, 0], > [1, 1], > [1, 2], > [1, 3], > [1, 4], > [2, 0], > [2, 1], > [2, 2], > [2, 3], > [3, 0], > [3, 1], > [3, 2], > [4, 0], > [4, 1], > [5, 0]]) > > > Here is a simple way to caculate it. The number of cases gets large very > fast. > ----------- > import itertools > import numpy as np > > nobs = 10 > rr = 3 > m_all = np.array(list(itertools.ifilter(lambda ii: sum(ii)<=nobs, > itertools.product(*[range(nobs + > 1)]*rr)))) > print m_all.shape > print np.bincount(m_all.sum(1)) > ----------- > > Is there a faster way to do this? > > I need to do some calculation on each case, so either a numpy array or > an iterator works. > > > > The motivation: > > We have two kinds of machines, 10 each that we want to test for failing > early. > The test has the null hypothesis that the failure distributions are the > same. > Against a one-sided alternative that the first kind of machines fails > earlier. > > We run the 20 machines and count how many machines of the first kind > has failed by the time that we see the rth (3rd) failure of the second > kind. > If a large number of the machines of the first kind have failed, then > we reject the Null that they have the same failure distribution. > > We don't have time to wait until all machines fail. So we need to do > "very small" sample statistics. > > Wilcoxon-type precedence tests. > > > Josef > It's Big Data, if we have big machines. > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -- Sasha -------------- next part -------------- An HTML attachment was scrubbed... URL: From gblive at gmail.com Mon Jan 13 01:32:48 2014 From: gblive at gmail.com (Mike Timonin) Date: Mon, 13 Jan 2014 06:32:48 +0000 Subject: [SciPy-User] Semi-OT: weekend puzzle enumerating number of balls in bins In-Reply-To: References: Message-ID: Hey Josef, If you just need the number of possible allocations, have a look at this http://math.stackexchange.com/questions/382935/combinatorics-distribution-number-of-integer-solutions-concept-explanation Regards, Mikhail On 13 Jan 2014 03:28, wrote: > nobs = 5 > rr = 2 > > We have 5 balls and 2 bins. > We can put as many balls as we want (0 to 5) in the first bin. > Then we put as many balls as we want (and are available) in the second bin. > And so on if we had more bins. 
> > I'd like to enumerate all possible bin allocations, in this case 21: > > >>> m_all > array([[0, 0], > [0, 1], > [0, 2], > [0, 3], > [0, 4], > [0, 5], > [1, 0], > [1, 1], > [1, 2], > [1, 3], > [1, 4], > [2, 0], > [2, 1], > [2, 2], > [2, 3], > [3, 0], > [3, 1], > [3, 2], > [4, 0], > [4, 1], > [5, 0]]) > > > Here is a simple way to caculate it. The number of cases gets large very > fast. > ----------- > import itertools > import numpy as np > > nobs = 10 > rr = 3 > m_all = np.array(list(itertools.ifilter(lambda ii: sum(ii)<=nobs, > itertools.product(*[range(nobs + > 1)]*rr)))) > print m_all.shape > print np.bincount(m_all.sum(1)) > ----------- > > Is there a faster way to do this? > > I need to do some calculation on each case, so either a numpy array or > an iterator works. > > > > The motivation: > > We have two kinds of machines, 10 each that we want to test for failing > early. > The test has the null hypothesis that the failure distributions are the > same. > Against a one-sided alternative that the first kind of machines fails > earlier. > > We run the 20 machines and count how many machines of the first kind > has failed by the time that we see the rth (3rd) failure of the second > kind. > If a large number of the machines of the first kind have failed, then > we reject the Null that they have the same failure distribution. > > We don't have time to wait until all machines fail. So we need to do > "very small" sample statistics. > > Wilcoxon-type precedence tests. > > > Josef > It's Big Data, if we have big machines. > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at gmail.com Mon Jan 13 01:46:26 2014 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Mon, 13 Jan 2014 01:46:26 -0500 Subject: [SciPy-User] Semi-OT: weekend puzzle enumerating number of balls in bins In-Reply-To: References: Message-ID: On Mon, Jan 13, 2014 at 1:32 AM, Mike Timonin wrote: > Hey Josef, > > If you just need the number of possible allocations, have a look at this > > > http://math.stackexchange.com/questions/382935/combinatorics-distribution-number-of-integer-solutions-concept-explanation > > Regards, > Mikhail > See also http://en.wikipedia.org/wiki/Composition_%28combinatorics%29 >From the comment there about counting the number of weak compositions, you can find that the number of possible assignments for your problem is comb(nobs + rr, rr). Warren On 13 Jan 2014 03:28, wrote: > >> nobs = 5 >> rr = 2 >> >> We have 5 balls and 2 bins. >> We can put as many balls as we want (0 to 5) in the first bin. >> Then we put as many balls as we want (and are available) in the second >> bin. >> And so on if we had more bins. >> >> I'd like to enumerate all possible bin allocations, in this case 21: >> >> >>> m_all >> array([[0, 0], >> [0, 1], >> [0, 2], >> [0, 3], >> [0, 4], >> [0, 5], >> [1, 0], >> [1, 1], >> [1, 2], >> [1, 3], >> [1, 4], >> [2, 0], >> [2, 1], >> [2, 2], >> [2, 3], >> [3, 0], >> [3, 1], >> [3, 2], >> [4, 0], >> [4, 1], >> [5, 0]]) >> >> >> Here is a simple way to caculate it. The number of cases gets large very >> fast. 
>> ----------- >> import itertools >> import numpy as np >> >> nobs = 10 >> rr = 3 >> m_all = np.array(list(itertools.ifilter(lambda ii: sum(ii)<=nobs, >> itertools.product(*[range(nobs + >> 1)]*rr)))) >> print m_all.shape >> print np.bincount(m_all.sum(1)) >> ----------- >> >> Is there a faster way to do this? >> >> I need to do some calculation on each case, so either a numpy array or >> an iterator works. >> >> >> >> The motivation: >> >> We have two kinds of machines, 10 each that we want to test for failing >> early. >> The test has the null hypothesis that the failure distributions are the >> same. >> Against a one-sided alternative that the first kind of machines fails >> earlier. >> >> We run the 20 machines and count how many machines of the first kind >> has failed by the time that we see the rth (3rd) failure of the second >> kind. >> If a large number of the machines of the first kind have failed, then >> we reject the Null that they have the same failure distribution. >> >> We don't have time to wait until all machines fail. So we need to do >> "very small" sample statistics. >> >> Wilcoxon-type precedence tests. >> >> >> Josef >> It's Big Data, if we have big machines. >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan.isaac at gmail.com Mon Jan 13 01:46:28 2014 From: alan.isaac at gmail.com (Alan G Isaac) Date: Mon, 13 Jan 2014 01:46:28 -0500 Subject: [SciPy-User] Semi-OT: weekend puzzle enumerating number of balls in bins In-Reply-To: References: Message-ID: <52D38BC4.8040803@gmail.com> I think this does what you want. (BSD license.) Note I left out argument error checking for presentational clarity. Alan def exact_partitions(n, nbins): result = [] if nbins == 1: result.append([n]) else: for n1 in range(n+1): for part in exact_partitions(n-n1,nbins-1): result.append([n1]+part) return result def all_partitions(n, nbins): result = [] for n1 in range(n+1): result += exact_partitions(n1, nbins) return result From alan.isaac at gmail.com Mon Jan 13 02:26:22 2014 From: alan.isaac at gmail.com (Alan G Isaac) Date: Mon, 13 Jan 2014 02:26:22 -0500 Subject: [SciPy-User] Semi-OT: weekend puzzle enumerating number of balls in bins In-Reply-To: References: Message-ID: <52D3951E.8010003@gmail.com> On 1/13/2014 1:32 AM, Mike Timonin wrote: > If you just need the number of possible allocations, have a look at this > http://math.stackexchange.com/questions/382935/combinatorics-distribution-number-of-integer-solutions-concept-explanation Josef said enumerate but maybe he did mean count. Good point. Then we just sum(C(nbins - 1 + n1, n1) for n1 in range(n+1)) which should produce (1+n) C(n+nbins,n+1)/nbins. Alan From alan.isaac at gmail.com Mon Jan 13 02:30:24 2014 From: alan.isaac at gmail.com (Alan G Isaac) Date: Mon, 13 Jan 2014 02:30:24 -0500 Subject: [SciPy-User] Semi-OT: weekend puzzle enumerating number of balls in bins In-Reply-To: References: Message-ID: <52D39610.4060403@gmail.com> On 1/13/2014 1:46 AM, Warren Weckesser wrote: > the number of possible assignments for your problem is comb(nobs + rr, rr) Yes, that's a nicer expression. 
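For anyone wanting to cross-check the two counts: Alan's summation and Warren's closed form agree, as a couple of lines confirm (scipy.misc.comb is used here just as a convenient binomial coefficient).
-----------
from scipy.misc import comb

nobs, rr = 10, 3

# Alan's sum of the exact counts over all totals 0..nobs
total = sum(comb(rr - 1 + k, k, exact=True) for k in range(nobs + 1))
print total                             # 286
print comb(nobs + rr, rr, exact=True)   # 286 as well: the closed form agrees
# len(all_partitions(nobs, rr)) from Alan's post gives the same number
-----------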
Alan From marwan at pik-potsdam.de Mon Jan 13 06:46:51 2014 From: marwan at pik-potsdam.de (Norbert Marwan) Date: Mon, 13 Jan 2014 12:46:51 +0100 Subject: [SciPy-User] weave folder Message-ID: <96266250-4902-4FA9-B4F8-861C0709AD3C@pik-potsdam.de> Hello, I would like to change the directory where weave is copying and compiling the c-code. I have tried to find some information in the documentation but failed. Can anyone help? Thanks! Norbert -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 203 bytes Desc: Message signed with OpenPGP using GPGMail URL: From josef.pktd at gmail.com Mon Jan 13 08:47:26 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 13 Jan 2014 08:47:26 -0500 Subject: [SciPy-User] Semi-OT: weekend puzzle enumerating number of balls in bins In-Reply-To: <52D38BC4.8040803@gmail.com> References: <52D38BC4.8040803@gmail.com> Message-ID: On Mon, Jan 13, 2014 at 1:46 AM, Alan G Isaac wrote: > I think this does what you want. > (BSD license.) > Note I left out argument error checking > for presentational clarity. > > Alan > > > def exact_partitions(n, nbins): > result = [] > if nbins == 1: > result.append([n]) > else: > for n1 in range(n+1): > for part in exact_partitions(n-n1,nbins-1): > result.append([n1]+part) > return result > > def all_partitions(n, nbins): > result = [] > for n1 in range(n+1): > result += exact_partitions(n1, nbins) > return result Thanks Oleksandr, Alan Those look like what I need and I can try later today. Thanks to all others The count is useful to see how large the problems is, but for the actual calculations I need to be able to calculate the test statistic, mean and variances and so on for the full cases. For small cases we can use the exact distribution. For medim cases we can use permutation resampling. For large cases we can use the normal approximation. However, I also need the exact distribution to figure out what the typos in the journal article or in my code for the normal approximation are. Thanks. Josef > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From josef.pktd at gmail.com Mon Jan 13 11:38:37 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 13 Jan 2014 11:38:37 -0500 Subject: [SciPy-User] Semi-OT: weekend puzzle enumerating number of balls in bins In-Reply-To: References: <52D38BC4.8040803@gmail.com> Message-ID: On Mon, Jan 13, 2014 at 8:47 AM, wrote: > On Mon, Jan 13, 2014 at 1:46 AM, Alan G Isaac wrote: >> I think this does what you want. >> (BSD license.) >> Note I left out argument error checking >> for presentational clarity. >> >> Alan >> >> >> def exact_partitions(n, nbins): >> result = [] >> if nbins == 1: >> result.append([n]) >> else: >> for n1 in range(n+1): >> for part in exact_partitions(n-n1,nbins-1): >> result.append([n1]+part) >> return result >> >> def all_partitions(n, nbins): >> result = [] >> for n1 in range(n+1): >> result += exact_partitions(n1, nbins) >> return result > > > Thanks Oleksandr, Alan > Those look like what I need and I can try later today. To report back for nobs = 10 rr = 7 which is the example in the journal article: Alan's is 80 times faster than my itertools version. Oleksandr's is twice as fast as Alans The result is the same in all three versions, based on the statistics I need to calculate. Thanks again. 
Will end up in statsmodels as soon as I have figured out what the typos are (in my code or in the journal articles). Josef > > Thanks to all others > The count is useful to see how large the problems is, but for the > actual calculations I need to be able to calculate the test statistic, > mean and variances and so on for the full cases. > > For small cases we can use the exact distribution. > For medim cases we can use permutation resampling. > For large cases we can use the normal approximation. > > However, I also need the exact distribution to figure out what the > typos in the journal article or in my code for the normal > approximation are. > > Thanks. > > Josef > > >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user From josef.pktd at gmail.com Mon Jan 13 12:06:39 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 13 Jan 2014 12:06:39 -0500 Subject: [SciPy-User] Semi-OT: weekend puzzle enumerating number of balls in bins In-Reply-To: References: <52D38BC4.8040803@gmail.com> Message-ID: On Mon, Jan 13, 2014 at 11:38 AM, wrote: > On Mon, Jan 13, 2014 at 8:47 AM, wrote: >> On Mon, Jan 13, 2014 at 1:46 AM, Alan G Isaac wrote: >>> I think this does what you want. >>> (BSD license.) >>> Note I left out argument error checking >>> for presentational clarity. >>> >>> Alan >>> >>> >>> def exact_partitions(n, nbins): >>> result = [] >>> if nbins == 1: >>> result.append([n]) >>> else: >>> for n1 in range(n+1): >>> for part in exact_partitions(n-n1,nbins-1): >>> result.append([n1]+part) >>> return result >>> >>> def all_partitions(n, nbins): >>> result = [] >>> for n1 in range(n+1): >>> result += exact_partitions(n1, nbins) >>> return result >> >> >> Thanks Oleksandr, Alan >> Those look like what I need and I can try later today. > > To report back > > for nobs = 10 rr = 7 which is the example in the journal article: > Alan's is 80 times faster than my itertools version. > Oleksandr's is twice as fast as Alans if rr is small relative to nobs, then Alan's is just a little bit slower than Oleksandr's. And I should have checked Warren's count formula before trying examples that break with MemoryError. :) Thanks, Josef > > The result is the same in all three versions, based on the statistics > I need to calculate. > > Thanks again. > > Will end up in statsmodels as soon as I have figured out what the > typos are (in my code or in the journal articles). > > Josef > > >> >> Thanks to all others >> The count is useful to see how large the problems is, but for the >> actual calculations I need to be able to calculate the test statistic, >> mean and variances and so on for the full cases. >> >> For small cases we can use the exact distribution. >> For medim cases we can use permutation resampling. >> For large cases we can use the normal approximation. >> >> However, I also need the exact distribution to figure out what the >> typos in the journal article or in my code for the normal >> approximation are. >> >> Thanks. 
>> >> Josef >> >> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user From alan.isaac at gmail.com Mon Jan 13 12:45:16 2014 From: alan.isaac at gmail.com (Alan G Isaac) Date: Mon, 13 Jan 2014 12:45:16 -0500 Subject: [SciPy-User] Semi-OT: weekend puzzle enumerating number of balls in bins In-Reply-To: References: <52D38BC4.8040803@gmail.com> Message-ID: <52D4262C.6060500@gmail.com> On 1/13/2014 11:38 AM, josef.pktd at gmail.com wrote: > Oleksandr's is twice as fast as Alans That's due to memoization. I put a memoized version here: http://econpy.googlecode.com/svn/trunk/pytrix/pytrix.py (Renamed `ordered_partitions` and `ordered_subpartitions`.) This will make even more difference with larger problems. Alan From bwoods at aer.com Mon Jan 13 14:49:57 2014 From: bwoods at aer.com (Bryan Woods) Date: Mon, 13 Jan 2014 14:49:57 -0500 Subject: [SciPy-User] calculating numerous linear regressions quickly Message-ID: <52D44365.4060602@aer.com> Given some geospatial grid with a time dimension V[t, lat, lon], I want to compute the trend at each spatial point in the domain. Essentially I am trying to compute many linear regressions in the form: y = mx+b where y is the predicted value of V, x is the time coordinate array. The coordinates t, lat, lon at all equispaced 1-D arrays, so the predictor (x, or t) will be the same for each regression. I want to gather the regression coefficients (m,b), correlation, and p-value for the temporal trend at each spatial point. This can be directly accomplished by repeatedly calling stats.linregress inside of a loop for every [lat, lon] point in the domain, but it is not efficient. The challenge is that I need to compute a lot of them quickly and a python loop is proving very slow. I feel like there should be some version of stats.linregress that accepts and returns multidimensional without being forced into using a python loop. Suggestions? Thanks, Bryan -------------- next part -------------- A non-text attachment was scrubbed... Name: bwoods.vcf Type: text/x-vcard Size: 341 bytes Desc: not available URL: From dshean at gmail.com Mon Jan 13 15:02:49 2014 From: dshean at gmail.com (David Shean) Date: Mon, 13 Jan 2014 12:02:49 -0800 Subject: [SciPy-User] calculating numerous linear regressions quickly In-Reply-To: <52D44365.4060602@aer.com> References: <52D44365.4060602@aer.com> Message-ID: <7009BAC4-5E9D-4003-AE2A-C6CD8F90B941@gmail.com> Hi Bryan, The discussion/links here might be useful: http://stackoverflow.com/questions/20343500/efficient-1d-linear-regression-for-each-element-of-3d-numpy-array -David On Jan 13, 2014, at 11:49 AM, Bryan Woods wrote: > Given some geospatial grid with a time dimension V[t, lat, lon], I want to compute the trend at each spatial point in the domain. Essentially I am trying to compute many linear regressions in the form: > y = mx+b > where y is the predicted value of V, x is the time coordinate array. The coordinates t, lat, lon at all equispaced 1-D arrays, so the predictor (x, or t) will be the same for each regression. I want to gather the regression coefficients (m,b), correlation, and p-value for the temporal trend at each spatial point. This can be directly accomplished by repeatedly calling stats.linregress inside of a loop for every [lat, lon] point in the domain, but it is not efficient. > > The challenge is that I need to compute a lot of them quickly and a python loop is proving very slow. 
I feel like there should be some version of stats.linregress that accepts and returns multidimensional without being forced into using a python loop. Suggestions? > > Thanks, > Bryan > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From josef.pktd at gmail.com Mon Jan 13 15:06:24 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 13 Jan 2014 15:06:24 -0500 Subject: [SciPy-User] calculating numerous linear regressions quickly In-Reply-To: <52D44365.4060602@aer.com> References: <52D44365.4060602@aer.com> Message-ID: On Mon, Jan 13, 2014 at 2:49 PM, Bryan Woods wrote: > Given some geospatial grid with a time dimension V[t, lat, lon], I want to > compute the trend at each spatial point in the domain. Essentially I am > trying to compute many linear regressions in the form: > y = mx+b > where y is the predicted value of V, x is the time coordinate array. The > coordinates t, lat, lon at all equispaced 1-D arrays, so the predictor (x, > or t) will be the same for each regression. I want to gather the regression > coefficients (m,b), correlation, and p-value for the temporal trend at each > spatial point. This can be directly accomplished by repeatedly calling > stats.linregress inside of a loop for every [lat, lon] point in the domain, > but it is not efficient. > > The challenge is that I need to compute a lot of them quickly and a python > loop is proving very slow. I feel like there should be some version of > stats.linregress that accepts and returns multidimensional without being > forced into using a python loop. Suggestions? That can be done completely without loops. reshape the grid to 2d (t, nlat*nlong) -> Y trend = np.vander(t, 2) (m,b) = np.linalg.pinv(trend).dot(Y) and then a few more array operations to get the other statistics. I can try to do it later if needed. Josef > > Thanks, > Bryan > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From jjhelmus at gmail.com Mon Jan 13 15:09:43 2014 From: jjhelmus at gmail.com (Jonathan Helmus) Date: Mon, 13 Jan 2014 14:09:43 -0600 Subject: [SciPy-User] calculating numerous linear regressions quickly In-Reply-To: <7009BAC4-5E9D-4003-AE2A-C6CD8F90B941@gmail.com> References: <52D44365.4060602@aer.com> <7009BAC4-5E9D-4003-AE2A-C6CD8F90B941@gmail.com> Message-ID: <52D44807.8050008@gmail.com> Since the points are equally spaced and you want the coefficients of a low order polynomial you should be able to use a analytical solution to the linear least squares problem. The Savitzky-Golay filter can be used to calculate the derivatives which are related to the polynomials coefficients. S-G isn't in Scipy but there is a cookbook page in the wiki: http://wiki.scipy.org/Cookbook/SavitzkyGolay - Jonathan Helmus On 01/13/2014 02:02 PM, David Shean wrote: > Hi Bryan, > The discussion/links here might be useful: > http://stackoverflow.com/questions/20343500/efficient-1d-linear-regression-for-each-element-of-3d-numpy-array > -David > > On Jan 13, 2014, at 11:49 AM, Bryan Woods wrote: > >> Given some geospatial grid with a time dimension V[t, lat, lon], I want to compute the trend at each spatial point in the domain. Essentially I am trying to compute many linear regressions in the form: >> y = mx+b >> where y is the predicted value of V, x is the time coordinate array. 
The coordinates t, lat, lon at all equispaced 1-D arrays, so the predictor (x, or t) will be the same for each regression. I want to gather the regression coefficients (m,b), correlation, and p-value for the temporal trend at each spatial point. This can be directly accomplished by repeatedly calling stats.linregress inside of a loop for every [lat, lon] point in the domain, but it is not efficient. >> >> The challenge is that I need to compute a lot of them quickly and a python loop is proving very slow. I feel like there should be some version of stats.linregress that accepts and returns multidimensional without being forced into using a python loop. Suggestions? >> >> Thanks, >> Bryan >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From ralf.gommers at gmail.com Mon Jan 13 16:42:02 2014 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 13 Jan 2014 22:42:02 +0100 Subject: [SciPy-User] weave folder In-Reply-To: <96266250-4902-4FA9-B4F8-861C0709AD3C@pik-potsdam.de> References: <96266250-4902-4FA9-B4F8-861C0709AD3C@pik-potsdam.de> Message-ID: On Mon, Jan 13, 2014 at 12:46 PM, Norbert Marwan wrote: > Hello, > > I would like to change the directory where weave is copying and compiling > the c-code. I have tried to find some information in the documentation but > failed. Can anyone help? > There is no good way to change that at the moment AFAIK. It's not supposed to matter. If you want to change it by altering scipy.weave itself, look for the `module_dir` variable. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From fboulogne at sciunto.org Mon Jan 13 17:10:40 2014 From: fboulogne at sciunto.org (=?ISO-8859-1?Q?Fran=E7ois_Boulogne?=) Date: Mon, 13 Jan 2014 17:10:40 -0500 Subject: [SciPy-User] calculating numerous linear regressions quickly In-Reply-To: <52D44807.8050008@gmail.com> References: <52D44365.4060602@aer.com> <7009BAC4-5E9D-4003-AE2A-C6CD8F90B941@gmail.com> <52D44807.8050008@gmail.com> Message-ID: <52D46460.7060604@sciunto.org> Le 13/01/2014 15:09, Jonathan Helmus a ?crit : > Since the points are equally spaced and you want the coefficients of a > low order polynomial you should be able to use a analytical solution to > the linear least squares problem. The Savitzky-Golay filter can be used > to calculate the derivatives which are related to the polynomials > coefficients. S-G isn't in Scipy but there is a cookbook page in the > wiki: http://wiki.scipy.org/Cookbook/SavitzkyGolay It's in master actually: https://github.com/scipy/scipy/blob/ed6b0fbd339fa3b48e56116760e2fba8d583fcaa/scipy/signal/_savitzky_golay.py Best. Fran?ois. -- Fran?ois Boulogne. http://www.sciunto.org GPG fingerprint: 25F6 C971 4875 A6C1 EDD1 75C8 1AA7 216E 32D5 F22F From josef.pktd at gmail.com Mon Jan 13 20:59:56 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 13 Jan 2014 20:59:56 -0500 Subject: [SciPy-User] calculating numerous linear regressions quickly In-Reply-To: References: <52D44365.4060602@aer.com> Message-ID: On Mon, Jan 13, 2014 at 3:06 PM, wrote: > On Mon, Jan 13, 2014 at 2:49 PM, Bryan Woods wrote: >> Given some geospatial grid with a time dimension V[t, lat, lon], I want to >> compute the trend at each spatial point in the domain. 
Essentially I am >> trying to compute many linear regressions in the form: >> y = mx+b >> where y is the predicted value of V, x is the time coordinate array. The >> coordinates t, lat, lon at all equispaced 1-D arrays, so the predictor (x, >> or t) will be the same for each regression. I want to gather the regression >> coefficients (m,b), correlation, and p-value for the temporal trend at each >> spatial point. This can be directly accomplished by repeatedly calling >> stats.linregress inside of a loop for every [lat, lon] point in the domain, >> but it is not efficient. >> >> The challenge is that I need to compute a lot of them quickly and a python >> loop is proving very slow. I feel like there should be some version of >> stats.linregress that accepts and returns multidimensional without being >> forced into using a python loop. Suggestions? > > That can be done completely without loops. > > reshape the grid to 2d (t, nlat*nlong) -> Y > trend = np.vander(t, 2) > (m,b) = np.linalg.pinv(trend).dot(Y) > > and then a few more array operations to get the other statistics. > > I can try to do it later if needed. some quickly written version this is for general x matrix as long as it is common to all regression That's pretty much how statsmodels calculates OLS (sm.OLS is not vectorized but has many other goodies instead). A version that uses that there is only one slope coefficient might be a bit faster, but I don't have that memorized. That would be copying linregress and vectorizing it. Josef ------------------------------- # -*- coding: utf-8 -*- """multivariate OLS, vectorized independent OLS with common exog Created on Mon Jan 13 19:33:44 2014 Author: Josef Perktold """ import numpy as np from statsmodels.regression.linear_model import OLS nobs = 20 k_y = 30 # generate trend trend = np.linspace(-1, 1, nobs) x = np.vander(trend, 2) assert x.shape == (nobs, 2) # generate random sample beta = 1 + np.random.randn(2, k_y) y_true = np.dot(x, beta) y = y_true + 0.5 * np.random.randn(*y_true.shape) ######## estimate # x is common design matrix (nobs, k_vars) # y are independent/response variables (nobs, k_y) xpinv = np.linalg.pinv(x) params = xpinv.dot(y) # get some additional results xxinv = np.dot(xpinv, xpinv.T) fitted = np.dot(x, params) resid = y - fitted y_mean = y.mean(0) tss = ((y - y_mean)**2).sum(0) rss = (resid**2).sum(0) r_squared = 1 - rss / tss df_resid = nobs - 2 mse = rss / df_resid bse = np.sqrt(np.diag(xxinv)[:, None] * mse) # standard error of params ######## # reference statsmodels OLS from collections import defaultdict as DefaultDict results = DefaultDict(list) for yj in y.T: res = OLS(yj, x).fit() results['params'].append(res.params) results['bse'].append(res.bse) results['r_squared'].append(res.rsquared) results['mse'].append(res.mse_resid) print '\nparameters' print np.column_stack((params.T, results['params'])) print '\nstandard error of parameter estimates' print np.column_stack((bse.T, results['bse'])) print '\nR-squared' print np.column_stack((r_squared.T, results['r_squared'])) from numpy.testing import assert_allclose assert_allclose(params.T, results['params'], rtol=1e-13) assert_allclose(bse.T, results['bse'], rtol=1e-13) assert_allclose(r_squared.T, results['r_squared'], rtol=1e-13) --------------- > > Josef > >> >> Thanks, >> Bryan >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> From Jerome.Kieffer at esrf.fr Tue Jan 14 01:26:34 2014 From: 
Jerome.Kieffer at esrf.fr (Jerome Kieffer) Date: Tue, 14 Jan 2014 07:26:34 +0100 Subject: [SciPy-User] calculating numerous linear regressions quickly In-Reply-To: <52D46460.7060604@sciunto.org> References: <52D44365.4060602@aer.com> <7009BAC4-5E9D-4003-AE2A-C6CD8F90B941@gmail.com> <52D44807.8050008@gmail.com> <52D46460.7060604@sciunto.org> Message-ID: <20140114072634.760f257a5ca01f8e1f8dbd6d@esrf.fr> On Mon, 13 Jan 2014 17:10:40 -0500 Fran?ois Boulogne wrote: > It's in master actually: > https://github.com/scipy/scipy/blob/ed6b0fbd339fa3b48e56116760e2fba8d583fcaa/scipy/signal/_savitzky_golay.py Hi, I was wondering if the 2D-Savitsly-Golay were of some interest for some of us: http://research.microsoft.com/en-us/um/people/jckrumm/SavGol/SavGol.htm I never used them personnaly but I often use the 1D version and I found the idea interesting for cheap de-noising continuous signal. Cheers, -- J?r?me Kieffer Data analysis unit - ESRF From marwan at pik-potsdam.de Tue Jan 14 03:24:34 2014 From: marwan at pik-potsdam.de (Norbert Marwan) Date: Tue, 14 Jan 2014 09:24:34 +0100 Subject: [SciPy-User] weave folder In-Reply-To: References: <96266250-4902-4FA9-B4F8-861C0709AD3C@pik-potsdam.de> Message-ID: Hi Ralf, thanks for the hint. But I cannot find any proper place where to change this module_dir variable. My guess is that I should do it in catalog.py, but I'm completely unsure where and whether this is right. BEst Norbert Am 13. Jan. 2014 um 22:42 schrieb Ralf Gommers : > > > > On Mon, Jan 13, 2014 at 12:46 PM, Norbert Marwan wrote: > Hello, > > I would like to change the directory where weave is copying and compiling the c-code. I have tried to find some information in the documentation but failed. Can anyone help? > > There is no good way to change that at the moment AFAIK. It's not supposed to matter. If you want to change it by altering scipy.weave itself, look for the `module_dir` variable. > > Ralf > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 203 bytes Desc: Message signed with OpenPGP using GPGMail URL: From ralf.gommers at gmail.com Tue Jan 14 15:13:54 2014 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Tue, 14 Jan 2014 21:13:54 +0100 Subject: [SciPy-User] weave folder In-Reply-To: References: <96266250-4902-4FA9-B4F8-861C0709AD3C@pik-potsdam.de> Message-ID: On Tue, Jan 14, 2014 at 9:24 AM, Norbert Marwan wrote: > Hi Ralf, > > thanks for the hint. But I cannot find any proper place where to change > this module_dir variable. My guess is that I should do it in catalog.py, > but I'm completely unsure where and whether this is right. > For weave.inline I think here: https://github.com/scipy/scipy/blob/master/scipy/weave/inline_tools.py#L351 Ralf > > BEst > Norbert > > Am 13. Jan. 2014 um 22:42 schrieb Ralf Gommers : > > > > > On Mon, Jan 13, 2014 at 12:46 PM, Norbert Marwan wrote: > >> Hello, >> >> I would like to change the directory where weave is copying and compiling >> the c-code. I have tried to find some information in the documentation but >> failed. Can anyone help? >> > > There is no good way to change that at the moment AFAIK. It's not > supposed to matter. If you want to change it by altering scipy.weave > itself, look for the `module_dir` variable. 
> > Ralf > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kumar.sachin at yandex.com Tue Jan 14 18:54:21 2014 From: kumar.sachin at yandex.com (Sachin Kumar) Date: Wed, 15 Jan 2014 05:24:21 +0530 Subject: [SciPy-User] Number of equations and precision of integrate.odeint() In-Reply-To: References: <18531389121004@web22g.yandex.ru> Message-ID: <26911389743661@web26g.yandex.ru> On 10/01/2014 at 06:07, "Rob Clewley" wrote: > A pure mathematical standard regarding the dense solution curves of > these ODEs is not realistic because on a computer you are dealing with > a finite algorithm for approximating solutions. odeint is a > variable-step solver. If you give it several variables, coupled or > not, it will choose a set of time steps that it finds most appropriate > for all of the variables. No doubt, the fact that the rates of change > of the two variables differ on your domain means that it chooses > different time steps and therefore yields slightly different numerical > solutions. The errors will necessarily be different. A numerical > analysis textbook will give you a better idea why, but wikipedia > explains it pretty well (e.g., look up Runge-Kutta). Thanks, that really helped. -- Sachin Kumar From josef.pktd at gmail.com Tue Jan 14 19:33:27 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 14 Jan 2014 19:33:27 -0500 Subject: [SciPy-User] Semi-OT: weekend puzzle enumerating number of balls in bins In-Reply-To: References: Message-ID: On Sun, Jan 12, 2014 at 10:28 PM, wrote: > nobs = 5 > rr = 2 > > We have 5 balls and 2 bins. > We can put as many balls as we want (0 to 5) in the first bin. > Then we put as many balls as we want (and are available) in the second bin. > And so on if we had more bins. > > I'd like to enumerate all possible bin allocations, in this case 21: > >>>> m_all > array([[0, 0], > [0, 1], > [0, 2], > [0, 3], > [0, 4], > [0, 5], > [1, 0], > [1, 1], > [1, 2], > [1, 3], > [1, 4], > [2, 0], > [2, 1], > [2, 2], > [2, 3], > [3, 0], > [3, 1], > [3, 2], > [4, 0], > [4, 1], > [5, 0]]) > > > Here is a simple way to caculate it. The number of cases gets large very fast. > ----------- > import itertools > import numpy as np > > nobs = 10 > rr = 3 > m_all = np.array(list(itertools.ifilter(lambda ii: sum(ii)<=nobs, > itertools.product(*[range(nobs + 1)]*rr)))) > print m_all.shape > print np.bincount(m_all.sum(1)) > ----------- > > Is there a faster way to do this? > > I need to do some calculation on each case, so either a numpy array or > an iterator works. > > > > The motivation: > > We have two kinds of machines, 10 each that we want to test for failing early. > The test has the null hypothesis that the failure distributions are the same. > Against a one-sided alternative that the first kind of machines fails earlier. > > We run the 20 machines and count how many machines of the first kind > has failed by the time that we see the rth (3rd) failure of the second > kind. > If a large number of the machines of the first kind have failed, then > we reject the Null that they have the same failure distribution. I think this sentence is wrong, we reject when the test statistic is small. 
> > We don't have time to wait until all machines fail. So we need to do > "very small" sample statistics. > > Wilcoxon-type precedence tests. In case anyone is interested, I pushed a first complete version https://github.com/josef-pkt/statsmodels/compare/precedence#diff-459f12a0fad805d03bf9b3d78cb5ee90R218 It's almost a multivariate distribution class, except that we are only interested in the distributions of a few real functions on it. It contains Oleksandr's solution, I haven't tried yet Alan's updated solution. Josef > > > Josef > It's Big Data, if we have big machines. From josef.pktd at gmail.com Wed Jan 15 20:00:48 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 15 Jan 2014 20:00:48 -0500 Subject: [SciPy-User] integrate.quad out of center Message-ID: FYI or JAR: integrate.quad with infinite bounds doesn't always find the relevant range. >>> f = lambda x: float((100 <= x) & (x <= 250)) >>> integrate.quad(f, -np.inf, np.inf) (0.0, 0.0) >>> integrate.quad(f, 0, 500) (150.00000000000006, 1.6653345369377353e-13) Josef JAR: http://i.imgur.com/pspS3hj.jpg - Don't forget to feed the ...! - I know, I know. From jeremy at jeremysanders.net Thu Jan 16 03:30:32 2014 From: jeremy at jeremysanders.net (Jeremy Sanders) Date: Thu, 16 Jan 2014 09:30:32 +0100 Subject: [SciPy-User] ANN: Veusz 1.20 Message-ID: Veusz 1.20 ---------- http://home.gna.org/veusz/ Veusz is a scientific plotting package. It is designed to produce publication-ready Postscript, PDF or SVG output. Graphs are built-up by combining plotting widgets. The user interface aims to be simple, consistent and powerful. Veusz provides GUI, Python module, command line, scripting, DBUS and SAMP interfaces to its plotting facilities. It also allows for manipulation and editing of datasets. Data can be captured from external sources such as Internet sockets or other programs. 
Changes in 1.20: * Add HDF5 file data import * Allow expressions to be edited for linked 2D datasets * Add support for 2D datasets with irregular gridpoints * Add 2D data CSV import * Allow safe renaming of linked datasets * Support importing text from FITS files Bug fixes: * When capturing data from a file/named pipe, do not stop when no more data are available * Fixes mangling of text in saved files using Windows binary * Fix encoding for standard file import * Fix FITS import for python3 Features of package: Plotting features: * X-Y plots (with errorbars) * Line and function plots * Contour plots * Images (with colour mappings and colorbars) * Stepped plots (for histograms) * Bar graphs * Vector field plots * Box plots * Polar plots * Ternary plots * Plotting dates * Fitting functions to data * Stacked plots and arrays of plots * Nested plots * Plot keys * Plot labels * Shapes and arrows on plots * LaTeX-like formatting for text * Multiple axes * Axes with steps in axis scale (broken axes) * Axis scales using functional forms * Plotting functions of datasets Input and output: * EPS/PDF/PNG/SVG/EMF export * Dataset creation/manipulation * Embed Veusz within other programs * Text, HDF5, CSV, FITS, NPY/NPZ, QDP, binary and user-plugin importing * Data can be captured from external sources Extending: * Use as a Python module * User defined functions, constants and can import external Python functions * Plugin interface to allow user to write or load code to - import data using new formats - make new datasets, optionally linked to existing datasets - arbitrarily manipulate the document * Scripting interface * Control with DBUS and SAMP Other features: * Data picker * Interactive tutorial * Multithreaded rendering Requirements for source install: Python 2.x (2.6 or greater required) or 3.x (3.3 or greater required) http://www.python.org/ Qt >= 4.6 (free edition) http://www.trolltech.com/products/qt/ PyQt >= 4.5 (SIP is required to be installed first) http://www.riverbankcomputing.co.uk/software/pyqt/ http://www.riverbankcomputing.co.uk/software/sip/ numpy >= 1.0 http://numpy.scipy.org/ Optional requirements: h5py (optional for HDF5 support) http://www.h5py.org/ astropy >= 0.2 or PyFITS >= 1.1 (optional for FITS import) http://www.stsci.edu/resources/software_hardware/pyfits http://www.astropy.org/ pyemf >= 2.0.0 (optional for EMF export) http://pyemf.sourceforge.net/ PyMinuit >= 1.1.2 (optional improved fitting) http://code.google.com/p/pyminuit/ dbus-python, for dbus interface http://dbus.freedesktop.org/doc/dbus-python/ astropy (optional for VO table import) http://www.astropy.org/ SAMPy (optional for SAMP support) http://pypi.python.org/pypi/sampy/ Veusz is Copyright (C) 2003-2014 Jeremy Sanders and contributors. It is licenced under the GPL (version 2 or greater). For documentation on using Veusz, see the "Documents" directory. The manual is in PDF, HTML and text format (generated from docbook). The examples are also useful documentation. Please also see and contribute to the Veusz wiki: https://github.com/jeremysanders/veusz/wiki Issues with the current version: * Due to a bug in the Qt XML processing, some MathML elements containing purely white space (e.g. thin space) will give an error. If you enjoy using Veusz, we would love to hear from you. Please join the mailing lists at https://gna.org/mail/?group=veusz to discuss new features or if you'd like to contribute code. The latest code can always be found in the Git repository at https://github.com/jeremysanders/veusz.git. 
From arun.gokule at gmail.com Thu Jan 16 03:40:19 2014 From: arun.gokule at gmail.com (Arun Gokule) Date: Thu, 16 Jan 2014 00:40:19 -0800 Subject: [SciPy-User] ANN: Sense, a new cloud platform for data analysis In-Reply-To: References: Message-ID: Great, but does it blend? On Fri, Jan 10, 2014 at 3:36 PM, Anand Patil < anand.prabhakar.patil at gmail.com> wrote: > Hi everyone, > > As a longtime fan of the SciPy/NumPy ecosystem, I'm happy to announce > Sense, a new cloud platform for data analysis that has the Python data > stack built in. > > For Python programmers, the core Sense experience is a browser-based > console-and-editor environment with some nifty extra features. You can use > IPython's rich display system right in the console, provision and dismiss > worker machines to distribute computation, share code with colleagues > without requiring them to install anything, and much more. Sense is a > serious Python environment where you can install packages and run code on > powerful machines. > > Our long-term goal is to make Sense the best possible cloud teammate to > Python and the rest of the world's awesome open-source data tools. If > you're interested in learning more, please watch our demo at > https://senseplatform.com and sign up for the beta. Feedback, questions, > comments etc. are very welcome. > > Anand Patil > Tristan Zajonc > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From anand.prabhakar.patil at gmail.com Thu Jan 16 11:38:18 2014 From: anand.prabhakar.patil at gmail.com (Anand Patil) Date: Thu, 16 Jan 2014 08:38:18 -0800 Subject: [SciPy-User] ANN: Sense, a new cloud platform for data analysis In-Reply-To: References: Message-ID: It does reduce mixing time. On Thu, Jan 16, 2014 at 12:40 AM, Arun Gokule wrote: > Great, but does it blend? > > > On Fri, Jan 10, 2014 at 3:36 PM, Anand Patil < > anand.prabhakar.patil at gmail.com> wrote: > >> Hi everyone, >> >> As a longtime fan of the SciPy/NumPy ecosystem, I'm happy to announce >> Sense, a new cloud platform for data analysis that has the Python data >> stack built in. >> >> For Python programmers, the core Sense experience is a browser-based >> console-and-editor environment with some nifty extra features. You can use >> IPython's rich display system right in the console, provision and dismiss >> worker machines to distribute computation, share code with colleagues >> without requiring them to install anything, and much more. Sense is a >> serious Python environment where you can install packages and run code on >> powerful machines. >> >> Our long-term goal is to make Sense the best possible cloud teammate to >> Python and the rest of the world's awesome open-source data tools. If >> you're interested in learning more, please watch our demo at >> https://senseplatform.com and sign up for the beta. Feedback, questions, >> comments etc. are very welcome. >> >> Anand Patil >> Tristan Zajonc >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From njs at pobox.com Sat Jan 18 18:29:25 2014 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 18 Jan 2014 23:29:25 +0000 Subject: [SciPy-User] [ANN] scikits.sparse v0.2 released Message-ID: Hi all, I've just released v0.2 of scikits.sparse, the scipy.sparse-compatible CHOLMOD wrapper. I no longer use this package and don't have time to maintain it, so if anyone is interested in taking it over please get in touch. Highlights of this release: * Factor solve methods now return 1d output for 1d input (just like np.dot does). * Factor.solve_P() and Factor.solve_P() deprecated; use Factor.apply_P() and Factor.apply_Pt() instead. * New methods for computing determinants of positive-definite matrices: Factor.det(), Factor.logdet(), Factor.slogdet(). * New method for explicitly computing inverse of a positive-definite matrix: Factor.inv(). Factor.D() has much better implementation. * Build system improvements. * Wrapper code re-licensed under BSD terms. Downloads: https://pypi.python.org/pypi/scikits.sparse Code: https://github.com/njsmith/scikits-sparse Manual: http://packages.python.org/scikits.sparse/ Share and enjoy, -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From jonas-v at gmx.net Sun Jan 19 19:25:08 2014 From: jonas-v at gmx.net (Jonas V) Date: Mon, 20 Jan 2014 01:25:08 +0100 Subject: [SciPy-User] Error estimation of integration via numpy.trapz Message-ID: <52DC6CE4.6060606@gmx.net> Dear all, I integrate samples f(x) whereas f and x are both 1-dimensional Numpy arrays. A = numpy.trapz(f, x=x) Is there a way to get an error estimation of this integration? Usually, the error could be estimated like this: |E| <= (xmax-xmin)^3/12 * max(|f''(x)|) So do I need to find the max(|f''(x)|) myself or is there already something in Numpy that can be used? Thank you! Kind regards, Jonas From gdmcbain at freeshell.org Mon Jan 20 00:32:14 2014 From: gdmcbain at freeshell.org (Geordie McBain) Date: Mon, 20 Jan 2014 16:32:14 +1100 Subject: [SciPy-User] Error estimation of integration via numpy.trapz In-Reply-To: <52DC6CE4.6060606@gmx.net> References: <52DC6CE4.6060606@gmx.net> Message-ID: Le 20 janv. 2014 11:25, "Jonas V" a ?crit : > > Dear all, > > I integrate samples f(x) whereas f and x are both 1-dimensional Numpy > arrays. > > A = numpy.trapz(f, x=x) > > Is there a way to get an error estimation of this integration? > > Usually, the error could be estimated like this: > |E| <= (xmax-xmin)^3/12 * max(|f''(x)|) > > So do I need to find the max(|f''(x)|) myself or is there already > something in Numpy that can be used? One way to get an idea of the error in numerical quadrature is to repeat at a longer step; e.g. compare trapz (f) with trapz (f [::2]) * 2 and trapz ([1::2]) * 2. This doesn't require derivatives and only uses tools available in NumPy. > > Thank you! > > Kind regards, > Jonas > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rcsqtc at iqac.csic.es Wed Jan 22 05:46:31 2014 From: rcsqtc at iqac.csic.es (Ramon Crehuet) Date: Wed, 22 Jan 2014 11:46:31 +0100 Subject: [SciPy-User] Standard error of the mean for weighted data Message-ID: <52DFA187.8090403@iqac.csic.es> Dear all, I would like to calculate the standard error of the mean for data values that each has some (normalized) weight. 
I guess this cannot be done with scipy.stats.sem... I thought of coding that, but I'm afraid I don't know what to code! For weighted data, the SEM cannot be std/sqrt(N), even if the std is calculated from the weighted data as explained here: http://en.wikipedia.org/wiki/Weighted_arithmetic_mean Imagine I have 1000 values, but only 2 have weights different from zero. It makes no sense to divide the weighted std by sqrt(1000). Right? Any help or suggestion is welcome. (I also looked at scikits.bootstrap but I don't think I can define an array of weights anywhere). Thanks in advance, Ramon From kevinkunzmann at gmx.net Wed Jan 22 07:29:22 2014 From: kevinkunzmann at gmx.net (Kevin Kunzmann) Date: Wed, 22 Jan 2014 13:29:22 +0100 Subject: [SciPy-User] Standard error of the mean for weighted data In-Reply-To: <52DFA187.8090403@iqac.csic.es> References: <52DFA187.8090403@iqac.csic.es> Message-ID: <52DFB9A2.4020801@gmx.net> Hey, er, why not use the weighted sample variance? Same Wiki site, lil' further down. Take care when using deriving the std from that, as the estimator is no longer unbiased, cheers, Kevin On 22.01.2014 11:46, Ramon Crehuet wrote: > Dear all, > I would like to calculate the standard error of the mean for data values that > each has some (normalized) weight. I guess this cannot be done with > scipy.stats.sem... > I thought of coding that, but I'm afraid I don't know what to code! For weighted > data, the SEM cannot be std/sqrt(N), even if the std is calculated from the > weighted data as explained here: > http://en.wikipedia.org/wiki/Weighted_arithmetic_mean > Imagine I have 1000 values, but only 2 have weights different from zero. It > makes no sense to divide the weighted std by sqrt(1000). Right? > Any help or suggestion is welcome. (I also looked at scikits.bootstrap but I > don't think I can define an array of weights anywhere). > Thanks in advance, > Ramon > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From josef.pktd at gmail.com Wed Jan 22 09:08:02 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 22 Jan 2014 09:08:02 -0500 Subject: [SciPy-User] Standard error of the mean for weighted data In-Reply-To: <52DFB9A2.4020801@gmx.net> References: <52DFA187.8090403@iqac.csic.es> <52DFB9A2.4020801@gmx.net> Message-ID: On Wed, Jan 22, 2014 at 7:29 AM, Kevin Kunzmann wrote: > Hey, > > er, why not use the weighted sample variance? Same Wiki site, lil' > further down. Take care when using deriving the std from that, as the > estimator is no longer unbiased, > > cheers, > > Kevin > > On 22.01.2014 11:46, Ramon Crehuet wrote: >> Dear all, >> I would like to calculate the standard error of the mean for data values that >> each has some (normalized) weight. I guess this cannot be done with >> scipy.stats.sem... >> I thought of coding that, but I'm afraid I don't know what to code! For weighted >> data, the SEM cannot be std/sqrt(N), even if the std is calculated from the >> weighted data as explained here: >> http://en.wikipedia.org/wiki/Weighted_arithmetic_mean >> Imagine I have 1000 values, but only 2 have weights different from zero. It >> makes no sense to divide the weighted std by sqrt(1000). Right? >> Any help or suggestion is welcome. (I also looked at scikits.bootstrap but I >> don't think I can define an array of weights anywhere). 
>> Thanks in advance, two ways using statsmodels >>> import numpy as np >>> import statsmodels.api as sm >>> nobs=100 >>> x = 1 + np.random.randn(nobs) >>> w = np.random.chisquare(5, size=nobs) >>> res = sm.WLS(x, np.ones(nobs), weights=w).fit() >>> res.params array([ 1.22607483]) >>> res.bse array([ 0.09177795]) need to normalize weights to the number of observations: >>> ws = sm.stats.DescrStatsW(x, weights = w / w.sum() * nobs) >>> ws.mean 1.2260748286171022 >>> ws.std_mean 0.09177794656529388 >>> ws.ttest_mean() (13.359144266153654, 6.9422438873768692e-24, 99.0) >>> res.tvalues, res.pvalues (array([ 13.35914427]), array([ 6.94224389e-24])) >>> ws.ttest_mean(value=1) (2.4632805273788185, 0.01549276757842121, 99.0) >>> tt = res.t_test(r_matrix=[1], q_matrix=[1]) >>> tt.tvalue, tt.pvalue (array([[ 2.46328053]]), array(0.015492767578421171)) http://statsmodels.sourceforge.net/devel/stats.html#basic-statistics-and-t-tests-with-frequency-weights Josef >> Ramon >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From fboulogne at sciunto.org Wed Jan 22 09:10:24 2014 From: fboulogne at sciunto.org (=?ISO-8859-1?Q?Fran=E7ois_Boulogne?=) Date: Wed, 22 Jan 2014 09:10:24 -0500 Subject: [SciPy-User] calculating numerous linear regressions quickly In-Reply-To: <20140114072634.760f257a5ca01f8e1f8dbd6d@esrf.fr> References: <52D44365.4060602@aer.com> <7009BAC4-5E9D-4003-AE2A-C6CD8F90B941@gmail.com> <52D44807.8050008@gmail.com> <52D46460.7060604@sciunto.org> <20140114072634.760f257a5ca01f8e1f8dbd6d@esrf.fr> Message-ID: <52DFD150.5030506@sciunto.org> Le 14/01/2014 01:26, Jerome Kieffer a ?crit : > > Hi, > > I was wondering if the 2D-Savitsly-Golay were of some interest for some of us: > http://research.microsoft.com/en-us/um/people/jckrumm/SavGol/SavGol.htm > > I never used them personnaly but I often use the 1D version and I found the idea interesting for cheap de-noising continuous signal. > > Cheers, > I think it would be a great addition and definitely useful! Cheers. -- Fran?ois Boulogne. http://www.sciunto.org GPG fingerprint: 25F6 C971 4875 A6C1 EDD1 75C8 1AA7 216E 32D5 F22F From matthew.brett at gmail.com Thu Jan 23 18:34:44 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 23 Jan 2014 15:34:44 -0800 Subject: [SciPy-User] SciPy-User Digest, Vol 120, Issue 12 In-Reply-To: <52137F9C.5000802@imperial.ac.uk> References: <52137F9C.5000802@imperial.ac.uk> Message-ID: Hi, On Tue, Aug 20, 2013 at 7:39 AM, Michal Romaniuk wrote: > Hi, > >> Hi, >> >> On Mon, Aug 19, 2013 at 7:44 AM, Michal Romaniuk >> wrote: >>> Hi, >>> >>> I'm saving a large batch of data using savemat and although I get no >>> errors, the files produced are not readable for either matlab or scipy. >>> Is there a limit on file size? >> >> Ah - yes there is - the individual matrices in the mat file cannot be >> larger than 4GB. Is it possible you hit this limit? >> >> Sorry, I only realized this when Richard Llewellyn pointed this out a >> couple of weeks ago on the list: >> >> http://scipy-user.10969.n7.nabble.com/SciPy-User-scipy-io-loadmat-throws-TypeError-with-large-files-td18558.html >> >> The current scipy code has an error message for matrices that are too large. 
>> >> Cheers, >> >> Matthew > > Well, I managed to work around the problem to some extent by setting > do_compression=True. Now Matlab can read those files (so they must be > valid to some extent) but SciPy can't (even though they were written > with SciPy). > > I get this error: > > > PATH/lib/python2.6/site-packages/scipy/io/matlab/mio.pyc in > loadmat(file_name, mdict, appendmat, **kwargs) > 173 variable_names = kwargs.pop('variable_names', None) > 174 MR = mat_reader_factory(file_name, appendmat, **kwargs) > --> 175 matfile_dict = MR.get_variables(variable_names) > 176 if mdict is not None: > 177 mdict.update(matfile_dict) > > PATH/lib/python2.6/site-packages/scipy/io/matlab/mio5.pyc in > get_variables(self, variable_names) > 290 continue > 291 try: > --> 292 res = self.read_var_array(hdr, process) > 293 except MatReadError, err: > 294 warnings.warn( > > PATH/lib/python2.6/site-packages/scipy/io/matlab/mio5.pyc in > read_var_array(self, header, process) > 253 `process`. > 254 ''' > --> 255 return self._matrix_reader.array_from_header(header, > process) > 256 > 257 def get_variables(self, variable_names=None): > > PATH/lib/python2.6/site-packages/scipy/io/matlab/mio5_utils.so in > scipy.io.matlab.mio5_utils.VarReader5.array_from_header > (scipy/io/matlab/mio5_utils.c:5401)() > > PATH/lib/python2.6/site-packages/scipy/io/matlab/mio5_utils.so in > scipy.io.matlab.mio5_utils.VarReader5.array_from_header > (scipy/io/matlab/mio5_utils.c:4849)() > > PATH/lib/python2.6/site-packages/scipy/io/matlab/mio5_utils.so in > scipy.io.matlab.mio5_utils.VarReader5.read_real_complex > (scipy/io/matlab/mio5_utils.c:5602)() > > ValueError: total size of new array must be unchanged > > > > The size of the main array is about 9 GB before compression, but the > compressed files are less than 500 MB and closer to 400 MB. There are > some other arrays in the file too but they are much smaller. > > Any ideas on how I could get SciPy to read this data back? Right now I > can only think of storing the data in single precision format... Sorry for this ridiculously late reply. To check - you are trying to read the files that scipy generated before the fix to raise an error for files that are too large? Or after the fix? Does matlab read the files correctly? Are you getting the error with current scipy master? Cheers, Matthew From sebastian at sipsolutions.net Fri Jan 24 05:32:24 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 24 Jan 2014 11:32:24 +0100 Subject: [SciPy-User] SciPy-User Digest, Vol 120, Issue 12 In-Reply-To: References: <52137F9C.5000802@imperial.ac.uk> Message-ID: <1390559544.7837.4.camel@sebastian-laptop> On Thu, 2014-01-23 at 15:34 -0800, Matthew Brett wrote: > Hi, > > On Tue, Aug 20, 2013 at 7:39 AM, Michal Romaniuk > wrote: > > Hi, > > > >> Hi, > >> > >> On Mon, Aug 19, 2013 at 7:44 AM, Michal Romaniuk > >> wrote: > >>> Hi, > >>> > >>> I'm saving a large batch of data using savemat and although I get no > >>> errors, the files produced are not readable for either matlab or scipy. > >>> Is there a limit on file size? > >> Hi, seems like a bug in https://github.com/scipy/scipy/blob/master/scipy/io/matlab/mio5_utils.pyx#L123 the line should use np.intp_t not int32_t. - Sebastian > >> Ah - yes there is - the individual matrices in the mat file cannot be > >> larger than 4GB. Is it possible you hit this limit? 
> >> > >> Sorry, I only realized this when Richard Llewellyn pointed this out a > >> couple of weeks ago on the list: > >> > >> http://scipy-user.10969.n7.nabble.com/SciPy-User-scipy-io-loadmat-throws-TypeError-with-large-files-td18558.html > >> > >> The current scipy code has an error message for matrices that are too large. > >> > >> Cheers, > >> > >> Matthew > > > > Well, I managed to work around the problem to some extent by setting > > do_compression=True. Now Matlab can read those files (so they must be > > valid to some extent) but SciPy can't (even though they were written > > with SciPy). > > > > I get this error: > > > > > > PATH/lib/python2.6/site-packages/scipy/io/matlab/mio.pyc in > > loadmat(file_name, mdict, appendmat, **kwargs) > > 173 variable_names = kwargs.pop('variable_names', None) > > 174 MR = mat_reader_factory(file_name, appendmat, **kwargs) > > --> 175 matfile_dict = MR.get_variables(variable_names) > > 176 if mdict is not None: > > 177 mdict.update(matfile_dict) > > > > PATH/lib/python2.6/site-packages/scipy/io/matlab/mio5.pyc in > > get_variables(self, variable_names) > > 290 continue > > 291 try: > > --> 292 res = self.read_var_array(hdr, process) > > 293 except MatReadError, err: > > 294 warnings.warn( > > > > PATH/lib/python2.6/site-packages/scipy/io/matlab/mio5.pyc in > > read_var_array(self, header, process) > > 253 `process`. > > 254 ''' > > --> 255 return self._matrix_reader.array_from_header(header, > > process) > > 256 > > 257 def get_variables(self, variable_names=None): > > > > PATH/lib/python2.6/site-packages/scipy/io/matlab/mio5_utils.so in > > scipy.io.matlab.mio5_utils.VarReader5.array_from_header > > (scipy/io/matlab/mio5_utils.c:5401)() > > > > PATH/lib/python2.6/site-packages/scipy/io/matlab/mio5_utils.so in > > scipy.io.matlab.mio5_utils.VarReader5.array_from_header > > (scipy/io/matlab/mio5_utils.c:4849)() > > > > PATH/lib/python2.6/site-packages/scipy/io/matlab/mio5_utils.so in > > scipy.io.matlab.mio5_utils.VarReader5.read_real_complex > > (scipy/io/matlab/mio5_utils.c:5602)() > > > > ValueError: total size of new array must be unchanged > > > > > > > > The size of the main array is about 9 GB before compression, but the > > compressed files are less than 500 MB and closer to 400 MB. There are > > some other arrays in the file too but they are much smaller. > > > > Any ideas on how I could get SciPy to read this data back? Right now I > > can only think of storing the data in single precision format... > > Sorry for this ridiculously late reply. > > To check - you are trying to read the files that scipy generated > before the fix to raise an error for files that are too large? Or > after the fix? Does matlab read the files correctly? Are you getting > the error with current scipy master? 
> > Cheers, > > Matthew > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From sebastian at sipsolutions.net Fri Jan 24 05:38:47 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 24 Jan 2014 11:38:47 +0100 Subject: [SciPy-User] SciPy-User Digest, Vol 120, Issue 12 In-Reply-To: <1390559544.7837.4.camel@sebastian-laptop> References: <52137F9C.5000802@imperial.ac.uk> <1390559544.7837.4.camel@sebastian-laptop> Message-ID: <1390559927.7837.5.camel@sebastian-laptop> On Fri, 2014-01-24 at 11:32 +0100, Sebastian Berg wrote: > On Thu, 2014-01-23 at 15:34 -0800, Matthew Brett wrote: > > Hi, > > > > On Tue, Aug 20, 2013 at 7:39 AM, Michal Romaniuk > > wrote: > > > Hi, > > > > > >> Hi, > > >> > > >> On Mon, Aug 19, 2013 at 7:44 AM, Michal Romaniuk > > >> wrote: > > >>> Hi, > > >>> > > >>> I'm saving a large batch of data using savemat and although I get no > > >>> errors, the files produced are not readable for either matlab or scipy. > > >>> Is there a limit on file size? > > >> > > Hi, > > seems like a bug in > > https://github.com/scipy/scipy/blob/master/scipy/io/matlab/mio5_utils.pyx#L123 > > the line should use np.intp_t not int32_t. > Sorry didn't think that through. I bet the int32 is just the format and the 9 GiB with double doesn't indicate some overflow as I thought anyway. > - Sebastian > > > > > >> Ah - yes there is - the individual matrices in the mat file cannot be > > >> larger than 4GB. Is it possible you hit this limit? > > >> > > >> Sorry, I only realized this when Richard Llewellyn pointed this out a > > >> couple of weeks ago on the list: > > >> > > >> http://scipy-user.10969.n7.nabble.com/SciPy-User-scipy-io-loadmat-throws-TypeError-with-large-files-td18558.html > > >> > > >> The current scipy code has an error message for matrices that are too large. > > >> > > >> Cheers, > > >> > > >> Matthew > > > > > > Well, I managed to work around the problem to some extent by setting > > > do_compression=True. Now Matlab can read those files (so they must be > > > valid to some extent) but SciPy can't (even though they were written > > > with SciPy). > > > > > > I get this error: > > > > > > > > > PATH/lib/python2.6/site-packages/scipy/io/matlab/mio.pyc in > > > loadmat(file_name, mdict, appendmat, **kwargs) > > > 173 variable_names = kwargs.pop('variable_names', None) > > > 174 MR = mat_reader_factory(file_name, appendmat, **kwargs) > > > --> 175 matfile_dict = MR.get_variables(variable_names) > > > 176 if mdict is not None: > > > 177 mdict.update(matfile_dict) > > > > > > PATH/lib/python2.6/site-packages/scipy/io/matlab/mio5.pyc in > > > get_variables(self, variable_names) > > > 290 continue > > > 291 try: > > > --> 292 res = self.read_var_array(hdr, process) > > > 293 except MatReadError, err: > > > 294 warnings.warn( > > > > > > PATH/lib/python2.6/site-packages/scipy/io/matlab/mio5.pyc in > > > read_var_array(self, header, process) > > > 253 `process`. 
> > > 254 ''' > > > --> 255 return self._matrix_reader.array_from_header(header, > > > process) > > > 256 > > > 257 def get_variables(self, variable_names=None): > > > > > > PATH/lib/python2.6/site-packages/scipy/io/matlab/mio5_utils.so in > > > scipy.io.matlab.mio5_utils.VarReader5.array_from_header > > > (scipy/io/matlab/mio5_utils.c:5401)() > > > > > > PATH/lib/python2.6/site-packages/scipy/io/matlab/mio5_utils.so in > > > scipy.io.matlab.mio5_utils.VarReader5.array_from_header > > > (scipy/io/matlab/mio5_utils.c:4849)() > > > > > > PATH/lib/python2.6/site-packages/scipy/io/matlab/mio5_utils.so in > > > scipy.io.matlab.mio5_utils.VarReader5.read_real_complex > > > (scipy/io/matlab/mio5_utils.c:5602)() > > > > > > ValueError: total size of new array must be unchanged > > > > > > > > > > > > The size of the main array is about 9 GB before compression, but the > > > compressed files are less than 500 MB and closer to 400 MB. There are > > > some other arrays in the file too but they are much smaller. > > > > > > Any ideas on how I could get SciPy to read this data back? Right now I > > > can only think of storing the data in single precision format... > > > > Sorry for this ridiculously late reply. > > > > To check - you are trying to read the files that scipy generated > > before the fix to raise an error for files that are too large? Or > > after the fix? Does matlab read the files correctly? Are you getting > > the error with current scipy master? > > > > Cheers, > > > > Matthew > > _______________________________________________ > > SciPy-User mailing list > > SciPy-User at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-user > > > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From cweisiger at msg.ucsf.edu Fri Jan 24 12:13:52 2014 From: cweisiger at msg.ucsf.edu (Chris Weisiger) Date: Fri, 24 Jan 2014 09:13:52 -0800 Subject: [SciPy-User] Microscopy TIFF files Message-ID: Apologies for the somewhat off-topic nature of this post, but I'm not aware of any more-pertinent communities. Please feel free to redirect me elsewhere. Our microscope control software currently outputs files in MRC format. In the interests of interoperability with the rest of the microscope community, it would be nice if we had TIFF support as well. There exist programs that convert MRC to TIFF and vice versa, but a native solution is always preferable if only to reduce the number of steps involved. Personally I have next to no experience with the TIFF format as used in microscopy, so I'm starting out by doing some research. There exist Python libraries for writing TIFF files, of course; my concern with them is that they perform writes "atomically" (that is, the entire image data is written at once). This requires you to *have* all of the image data at the time you perform the writing, and means that for large datasets you need lots of RAM (to hold the dataset in memory) and lots of time (to write the data to disk all at once). I prefer to stream data to disk as it comes in, which greatly reduces RAM requirements and also improves performance by allowing writes to happen in parallel with data acquisition. However, I'm not aware of a Python/TIFF library that allows for this. I've taken a brief look at the TIFF spec ( http://partners.adobe.com/public/developer/en/tiff/TIFF6.pdf ). Is this actually the same format that is used for microscopy data? 
I've seen OME-TIFF as well ( http://www.openmicroscopy.org/site/support/ome-model/ome-tiff/ ) but I don't know how widely it is supported (though I do see that Micro-Manager supports it). The Adobe TIFF standard specifies a limit in filesize of 4GB, which could be potentially troublesome. I don't know if OME-TIFF has similar limitations. I'd appreciate any advice on this topic. I'm missing a lot of context, I'm sure, which will make it a lot harder to implement a working and useful solution. -Chris -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Fri Jan 24 12:24:58 2014 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 24 Jan 2014 17:24:58 +0000 Subject: [SciPy-User] Microscopy TIFF files In-Reply-To: References: Message-ID: On Fri, Jan 24, 2014 at 5:13 PM, Chris Weisiger wrote: > > Apologies for the somewhat off-topic nature of this post, but I'm not aware of any more-pertinent communities. Please feel free to redirect me elsewhere. > > Our microscope control software currently outputs files in MRC format. In the interests of interoperability with the rest of the microscope community, it would be nice if we had TIFF support as well. There exist programs that convert MRC to TIFF and vice versa, but a native solution is always preferable if only to reduce the number of steps involved. Personally I have next to no experience with the TIFF format as used in microscopy, so I'm starting out by doing some research. > > There exist Python libraries for writing TIFF files, of course; my concern with them is that they perform writes "atomically" (that is, the entire image data is written at once). This requires you to *have* all of the image data at the time you perform the writing, and means that for large datasets you need lots of RAM (to hold the dataset in memory) and lots of time (to write the data to disk all at once). I prefer to stream data to disk as it comes in, which greatly reduces RAM requirements and also improves performance by allowing writes to happen in parallel with data acquisition. However, I'm not aware of a Python/TIFF library that allows for this. These are perfectly good reasons to save to your own format, MRC, and have the conversion take place as a post-processing step. The cost of that extra step is not that great, IMO. > I've taken a brief look at the TIFF spec ( http://partners.adobe.com/public/developer/en/tiff/TIFF6.pdf ). Is this actually the same format that is used for microscopy data? I've seen OME-TIFF as well ( http://www.openmicroscopy.org/site/support/ome-model/ome-tiff/ ) but I don't know how widely it is supported (though I do see that Micro-Manager supports it). The Adobe TIFF standard specifies a limit in filesize of 4GB, which could be potentially troublesome. I don't know if OME-TIFF has similar limitations. I don't know the microscopy field, but the answer is "sort of". TIFF is kind of a framework for building an image file format than an image format per se. Sure, you can stick to the defaults, write out one image with standard metadata, and most general image programs will be able to read it. However, there are a bunch of standards for extra, domain-specific (and application-specific!) metadata that build on top of the general TIFF spec to specify the semantics of certain metadata fields and what kind of raster data can go in, etc. OME-TIFF is, essentially, an instantiation of the general class of the TIFF standard and not a different standard altogether. 
The 4GB limitation is probably inherited. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From hbabcock at mac.com Fri Jan 24 13:02:55 2014 From: hbabcock at mac.com (Hazen Babcock) Date: Fri, 24 Jan 2014 13:02:55 -0500 Subject: [SciPy-User] Microscopy TIFF files In-Reply-To: References: Message-ID: <52E2AACF.4040108@mac.com> On Fri, Jan 24, 2014 at 5:13 PM, Chris Weisiger wrote: > There exist Python libraries for writing TIFF files, of course; my concern > with them is that they perform writes "atomically" (that is, the entire > image data is written at once). This requires you to *have* all of the > image data at the time you perform the writing, and means that for large > datasets you need lots of RAM (to hold the dataset in memory) and lots of > time (to write the data to disk all at once). I prefer to stream data to > disk as it comes in, which greatly reduces RAM requirements and also > improves performance by allowing writes to happen in parallel with data > acquisition. However, I'm not aware of a Python/TIFF library that allows > for this. On the off chance that this is helpful, I wrote a TIFF writer that works as you describe (i.e. it can write frame-by-frame as the images become available). It is available here: https://github.com/ZhuangLab/storm-control/blob/master/hal4000/halLib/tiffwriter.py It is designed for 16bit images. -Hazen From matthew.brett at gmail.com Fri Jan 24 13:44:07 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 24 Jan 2014 10:44:07 -0800 Subject: [SciPy-User] SciPy-User Digest, Vol 120, Issue 12 In-Reply-To: <1390559927.7837.5.camel@sebastian-laptop> References: <52137F9C.5000802@imperial.ac.uk> <1390559544.7837.4.camel@sebastian-laptop> <1390559927.7837.5.camel@sebastian-laptop> Message-ID: Hi, On Fri, Jan 24, 2014 at 2:38 AM, Sebastian Berg wrote: > On Fri, 2014-01-24 at 11:32 +0100, Sebastian Berg wrote: >> On Thu, 2014-01-23 at 15:34 -0800, Matthew Brett wrote: >> > Hi, >> > >> > On Tue, Aug 20, 2013 at 7:39 AM, Michal Romaniuk >> > wrote: >> > > Hi, >> > > >> > >> Hi, >> > >> >> > >> On Mon, Aug 19, 2013 at 7:44 AM, Michal Romaniuk >> > >> wrote: >> > >>> Hi, >> > >>> >> > >>> I'm saving a large batch of data using savemat and although I get no >> > >>> errors, the files produced are not readable for either matlab or scipy. >> > >>> Is there a limit on file size? >> > >> >> >> Hi, >> >> seems like a bug in >> >> https://github.com/scipy/scipy/blob/master/scipy/io/matlab/mio5_utils.pyx#L123 >> >> the line should use np.intp_t not int32_t. >> > > Sorry didn't think that through. I bet the int32 is just the format and > the 9 GiB with double doesn't indicate some overflow as I thought > anyway. Yes, the int32 is the format - see page 1-15 at http://www.mathworks.com/help/pdf_doc/matlab/matfile_format.pdf Cheers, Matthew From fx.thomas at gmail.com Fri Jan 24 16:45:50 2014 From: fx.thomas at gmail.com (=?UTF-8?Q?Fran=C3=A7ois=2DXavier_Thomas?=) Date: Fri, 24 Jan 2014 22:45:50 +0100 Subject: [SciPy-User] Microscopy TIFF files In-Reply-To: References: Message-ID: On Jan 24, 2014 6:14 PM, "Chris Weisiger" wrote: > > Apologies for the somewhat off-topic nature of this post, but I'm not aware of any more-pertinent communities. Please feel free to redirect me elsewhere. > > Our microscope control software currently outputs files in MRC format. In the interests of interoperability with the rest of the microscope community, it would be nice if we had TIFF support as well. 
There exist programs that convert MRC to TIFF and vice versa, but a native solution is always preferable if only to reduce the number of steps involved. Personally I have next to no experience with the TIFF format as used in microscopy, so I'm starting out by doing some research. > > There exist Python libraries for writing TIFF files, of course; my concern with them is that they perform writes "atomically" (that is, the entire image data is written at once). This requires you to *have* all of the image data at the time you perform the writing, and means that for large datasets you need lots of RAM (to hold the dataset in memory) and lots of time (to write the data to disk all at once). I prefer to stream data to disk as it comes in, which greatly reduces RAM requirements and also improves performance by allowing writes to happen in parallel with data acquisition. However, I'm not aware of a Python/TIFF library that allows for this. If I remember correctly, PyLibTiff can write streaming content using the low-level primitives from the C LibTiff (using WriteStrip for instance), the only issue being knowing how, because the documentation is spartan. It is mostly a wrapper around the C version though, so you might find something in its documentation or re-use existing C code. > I've taken a brief look at the TIFF spec ( http://partners.adobe.com/public/developer/en/tiff/TIFF6.pdf ). Is this actually the same format that is used for microscopy data? I've seen OME-TIFF as well ( http://www.openmicroscopy.org/site/support/ome-model/ome-tiff/ ) but I don't know how widely it is supported (though I do see that Micro-Manager supports it). The Adobe TIFF standard specifies a limit in filesize of 4GB, which could be potentially troublesome. I don't know if OME-TIFF has similar limitations. Baseline Tiff does have the 4GB limit (which OME-TIFF must have inherited), but an extension called BigTiff does not. I am not sure if you can use it easily using Python though. Well, you can use GDAL, but it is a pretty large library for such a simple goal, especially since you mentioned limited RAM. Cheers, Fran?ois-Xavier -------------- next part -------------- An HTML attachment was scrubbed... URL: From jimparker96313 at gmail.com Fri Jan 24 17:40:50 2014 From: jimparker96313 at gmail.com (Jim Parker) Date: Fri, 24 Jan 2014 16:40:50 -0600 Subject: [SciPy-User] Numpy broadcasting rules when using masked_where (or do they not apply?) Message-ID: I posted the below to Stack Overflow, http://stackoverflow.com/questions/21295788/numpy-broadcasting-rules-when-using-masked-where-or-do-they-not-apply and the consensus was it is a bug. I don't know, but I believe this is a better forum to address the issue. I would like to use broadcasting to use values from one array (lower dimension) to mask another array (higher dimension). An example would be a= np.arange(9).reshape(3,3) b= np.arange(3) what I tried is c=np.ma.masked_where(b[:,np.newaxis]>1, a) but it fails with "IndexError: Inconsistant[sic] shape between the condition and the input (got(3,1) and (3,3))" which is perplexing since broadcasting should apply here and b should broadcast onto a. A work around is to build a temporary d array of the right size d=np.broadcast_arrays(b[:,np.newaxis],a)[0] and use it in the masked_where statement c=np.ma.masked_where(d>1,a) but this is goes against the paradigm of avoiding temp arrays by using np.newaxis. 
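A self-contained script version of the failure and the workaround described above (same toy arrays as in the snippets; the try/except is only there so the script runs on through to the workaround):

    import numpy as np

    a = np.arange(9).reshape(3, 3)   # (3, 3) array to be masked
    b = np.arange(3)                 # (3,) array supplying the condition

    # Raises IndexError, although the (3, 1) condition broadcasts cleanly to (3, 3)
    try:
        c = np.ma.masked_where(b[:, np.newaxis] > 1, a)
    except IndexError as err:
        print(err)

    # Workaround: materialize the broadcast condition before masking
    d = np.broadcast_arrays(b[:, np.newaxis], a)[0]
    c = np.ma.masked_where(d > 1, a)
    print(c)   # only the last row (where b == 2) is masked, as intended
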
OBTW, using the where command does allow broadcasting in this manner c=np.where(b[:,np.newaxis]>1, 0.0, a) works as expected. Additionally,if the condition in the masked_where is always false, the command will execute without error... c=np.ma.masked_where(b[:,np.newaxis]>5, 0.0, a) will yield a. FWIW, I'm using numpy 1.6.1. Others have replicated this with versions 1.8 and 1.9.0.dev-205598b Cheers, --Jim Parker -------------- next part -------------- An HTML attachment was scrubbed... URL: From anthony.j.mannucci at jpl.nasa.gov Fri Jan 24 17:46:28 2014 From: anthony.j.mannucci at jpl.nasa.gov (Mannucci, Anthony J (335G)) Date: Fri, 24 Jan 2014 22:46:28 +0000 Subject: [SciPy-User] Writing string variables with scipy.io.netcdf Message-ID: I am not able to write variables with type "string" using the scipy netCDF interface. Is this a hard limitation? I am trying to write a date as a string. fnet = CDF.netcdf_file(ncfile, 'w') fnet.createDimension('nepochs',nepochs) fnet.createVariable('varepochs',,('nepochs',)) I have tried many things for and they give a ValueError exception. Is this simply not possible? Thanks. -Tony -- Tony Mannucci Supervisor, Ionospheric and Atmospheric Remote Sensing Group Mail-Stop 138-308, Tel > (818) 354-1699 Jet Propulsion Laboratory, Fax > (818) 393-5115 California Institute of Technology, Email > Tony.Mannucci at jpl.nasa.gov 4800 Oak Grove Drive, http://scienceandtechnology.jpl.nasa.gov/people/a_mannucci/ Pasadena, CA 91109 -------------- next part -------------- An HTML attachment was scrubbed... URL: From guziy.sasha at gmail.com Fri Jan 24 19:41:17 2014 From: guziy.sasha at gmail.com (Oleksandr Huziy) Date: Fri, 24 Jan 2014 19:41:17 -0500 Subject: [SciPy-User] Writing string variables with scipy.io.netcdf In-Reply-To: References: Message-ID: Hi Tony: Have you tried "c" ? http://nbviewer.ipython.org/urls/raw2.github.com/guziy/PyNotebooks/master/netcdf_in_scipy_char_data.ipynb?create=1 But it has to be 2d array of chars. Cheers 2014/1/24 Mannucci, Anthony J (335G) > I am not able to write variables with type "string" using the scipy > netCDF interface. Is this a hard limitation? I am trying to write a date as > a string. > > fnet = CDF.netcdf_file(ncfile, 'w') > fnet.createDimension('nepochs',nepochs) > fnet.createVariable('varepochs',,('nepochs',)) > > I have tried many things for and they give a ValueError > exception. Is this simply not possible? Thanks. > > -Tony > > -- > Tony Mannucci > Supervisor, Ionospheric and Atmospheric Remote Sensing Group > Mail-Stop 138-308, Tel > (818) 354-1699 > Jet Propulsion Laboratory, Fax > (818) 393-5115 > California Institute of Technology, Email > > Tony.Mannucci at jpl.nasa.gov > 4800 Oak Grove Drive, > http://scienceandtechnology.jpl.nasa.gov/people/a_mannucci/ > Pasadena, CA 91109 > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- Sasha -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Sat Jan 25 02:09:30 2014 From: sturla.molden at gmail.com (sturla.molden at gmail.com) Date: Sat, 25 Jan 2014 07:09:30 +0000 (UTC) Subject: [SciPy-User] Microscopy TIFF files References: Message-ID: <1100431831412325780.320265sturla.molden-gmail.com@news.gmane.org> I have found the tifffile module from Christoph Gohlke to be sufficient to read TIFF fikes from Olympus microscope software. 
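Reading a stack with it is a one-liner (the filename below is just a placeholder):

    import tifffile
    stack = tifffile.imread('cells.tif')   # placeholder filename
    print(stack.shape)                     # e.g. (nframes, 512, 512) for a multi-frame TIFF
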
It also writes TIFF files: http://www.lfd.uci.edu/~gohlke/ Also note that a number of Python image processing packages have i/o routines for various image formats. Look around in OpenCV, scikit-image, PIL, and scipy.ndarray. Personally I prefer to store image data in a Pytables database (hdf5). Using an image format is only needed to get images into e.g. Adobe Illustrator when preparing a figure. I would advice against using multiple image files as a file format for storing microscope image data. Sturla Chris Weisiger wrote: > Apologies for the somewhat off-topic nature of this post, but I'm not > aware of any more-pertinent communities. Please feel free to redirect me elsewhere. > > Our microscope control software currently outputs files in MRC format. In > the interests of interoperability with the rest of the microscope > community, it would be nice if we had TIFF support as well. There exist > programs that convert MRC to TIFF and vice versa, but a native solution > is always preferable if only to reduce the number of steps involved. > Personally I have next to no experience with the TIFF format as used in > microscopy, so I'm starting out by doing some research. > > There exist Python libraries for writing TIFF files, of course; my > concern with them is that they perform writes "atomically" (that is, the > entire image data is written at once). This requires you to *have* all of > the image data at the time you perform the writing, and means that for > large datasets you need lots of RAM (to hold the dataset in memory) and > lots of time (to write the data to disk all at once). I prefer to stream > data to disk as it comes in, which greatly reduces RAM requirements and > also improves performance by allowing writes to happen in parallel with > data acquisition. However, I'm not aware of a Python/TIFF library that allows for this. > > I've taken a brief look at the TIFF spec ( href="http://partners.adobe.com/public/developer/en/tiff/TIFF6.pdf">http://partners.adobe.com/public/developer/en/tiff/TIFF6.pdf > ). Is this actually the same format that is used for microscopy data? > I've seen OME-TIFF as well ( href="http://www.openmicroscopy.org/site/support/ome-model/ome-tiff/">http://www.openmicroscopy.org/site/support/ome-model/ome-tiff/ > ) but I don't know how widely it is supported (though I do see that > Micro-Manager supports it). The Adobe TIFF standard specifies a limit in > filesize of 4GB, which could be potentially troublesome. I don't know if > OME-TIFF has similar limitations. > > I'd appreciate any advice on this topic. I'm missing a lot of context, > I'm sure, which will make it a lot harder to implement a working and useful solution. > > -Chris > > _______________________________________________ SciPy-User mailing list > SciPy-User at scipy.org href="http://mail.scipy.org/mailman/listinfo/scipy-user">http://mail.scipy.org/mailman/listinfo/scipy-user From sturla.molden at gmail.com Sat Jan 25 03:33:21 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Sat, 25 Jan 2014 08:33:21 +0000 (UTC) Subject: [SciPy-User] leastsq and multiprocessing References: Message-ID: <2115119013412331145.568300sturla.molden-gmail.com@news.gmane.org> Matt Newville wrote: > Is this addition worth including in leastsq()? I would think that it > does little harm, might be useful for some, and provides a starting > point for further work. 
I believe the "right place" to start vectorizing leastsq would be to use LAPACK for the QR factorization, and then leave the parallel computing to MKL or OpenBLAS. But if you do, it would be just as easy to write a Levenberg-Marquardt method from scratch rather than to patch MINPACK. Sturla From njs at pobox.com Sat Jan 25 15:58:00 2014 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 25 Jan 2014 20:58:00 +0000 Subject: [SciPy-User] [SciPy-Dev] 64-bit sparse matrix indices In-Reply-To: References: Message-ID: On Sat, Jan 25, 2014 at 8:34 PM, Pauli Virtanen wrote: > Hi, > > The 32 & 64 support for sparse matrices is nearly finished, and > essentially waiting for more testing & merging: > > https://github.com/scipy/scipy/pull/442 > > What this will do is that sparse matrices with nnz that fit into 32 bit > use 32-bit index arrays, but those with larger nnz automatically switch > into 64-bit indices. > > This means that e.g. csr_matrix.indices can be either int32 or int64. In > most cases (for sparse matrices taking less than a few gigabytes of > memory) it will be int32. Does this also allow for sparse matrices with more than 2**32 rows or columns (which might have arbitrarily small nnz)? -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From sturla.molden at gmail.com Sat Jan 25 16:31:08 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Sat, 25 Jan 2014 21:31:08 +0000 (UTC) Subject: [SciPy-User] calculating numerous linear regressions quickly References: <52D44365.4060602@aer.com> Message-ID: <261068776412378043.548379sturla.molden-gmail.com@news.gmane.org> I had the same issue when computing "lowess" regression. I ended up using a Fortran subroutine that called the LAPACK subroutine DGELS. (It is possible to vectorize for multiple cores using OpenMP or link with a multithreaded LAPACK/BLAS. What works best is dependent on the problem.) Sturla From yw5aj at virginia.edu Sun Jan 26 16:01:03 2014 From: yw5aj at virginia.edu (Yuxiang Wang) Date: Sun, 26 Jan 2014 16:01:03 -0500 Subject: [SciPy-User] Normalization for optimization in python Message-ID: Dear all, During optimization, it is often helpful to normalize the input parameters to make them on the same order of magnitude, so the convergence can be much better. For example, if we want to minimize f(x), while a reasonable approximation is x0=[1e3, 1e-4], it might be helpful to normalize x0[0] and x0[1] to about the same order of magnitude (often O(1)). My question is, I have been using scipy.optimize and specifically, the L-BFGS-B algorithm. I was wondering that, do I need to normalize that manually by writing a function, or the algorithm already did it for me? Thank you! -Shawn -- Yuxiang "Shawn" Wang Gerling Research Lab University of Virginia yw5aj at virginia.edu +1 (434) 284-0836 https://sites.google.com/a/virginia.edu/yw5aj/ From klonuo at gmail.com Sun Jan 26 18:02:26 2014 From: klonuo at gmail.com (klo uo) Date: Mon, 27 Jan 2014 00:02:26 +0100 Subject: [SciPy-User] Beltrami flow (Laplace-Beltrami operator) Message-ID: I'm looking for Python implementation of Beltrami flow (for image denoising and smoothing), but couldn't find one. Does someone maybe knows if this algorithm is implemented in some project or has a pointers which could probably help me? 
Thanks From macdonald at maths.ox.ac.uk Sun Jan 26 18:40:56 2014 From: macdonald at maths.ox.ac.uk (Colin Macdonald) Date: Sun, 26 Jan 2014 23:40:56 +0000 Subject: [SciPy-User] Beltrami flow (Laplace-Beltrami operator) In-Reply-To: References: Message-ID: <52E59D08.7030408@maths.ox.ac.uk> On 26/01/14 23:02, klo uo wrote: > I'm looking for Python implementation of Beltrami flow (for image > denoising and smoothing), but couldn't find one. > > Does someone maybe knows if this algorithm is implemented in some > project or has a pointers which could probably help me? Hi, I've done some work with Tom M?rz on image processing on surfaces (where the image is defined on a curved surface). We use the closest point method (finite differences with embedded surface representation). My implementation is here: github.com/cbm755/cp_matrices (although Python implementation lags Octave/Matlab one in some ways, and whole thing is "beta" at best). Happy to follow-up off-list if this is relevant. best, Colin From ehermes at chem.wisc.edu Sun Jan 26 19:01:45 2014 From: ehermes at chem.wisc.edu (Eric Hermes) Date: Sun, 26 Jan 2014 18:01:45 -0600 Subject: [SciPy-User] Normalization for optimization in python In-Reply-To: References: Message-ID: <52E5A1E9.5000009@chem.wisc.edu> Shawn, I extensively use scipy.optimize.fmin_l_bfgs_b, and I always explicitly normalize the input before passing it to the optimizer. I usually do it by writing my function as g(x,norm) == f([x[0]*norm[0],x[1]*norm[1],...]), and pass it to the optimizer as fmin_l_bfgs_b(func=g,x0=[1.,1.],args=(norm)). Note that you can achieve rigorous convergence by multiplying norm by the result of optimization and iterating, but convergence behavior far from a minimum may highly depend both on what you choose as your initial guess and what your initial normalization factor is. Eric On 1/26/2014 3:01 PM, Yuxiang Wang wrote: > Dear all, > > During optimization, it is often helpful to normalize the input > parameters to make them on the same order of magnitude, so the > convergence can be much better. For example, if we want to minimize > f(x), while a reasonable approximation is x0=[1e3, 1e-4], it might be > helpful to normalize x0[0] and x0[1] to about the same order of > magnitude (often O(1)). > > My question is, I have been using scipy.optimize and specifically, the > L-BFGS-B algorithm. I was wondering that, do I need to normalize that > manually by writing a function, or the algorithm already did it for > me? > > Thank you! > > -Shawn > From klonuo at gmail.com Sun Jan 26 20:22:33 2014 From: klonuo at gmail.com (klo uo) Date: Mon, 27 Jan 2014 02:22:33 +0100 Subject: [SciPy-User] Beltrami flow (Laplace-Beltrami operator) In-Reply-To: <52E59D08.7030408@maths.ox.ac.uk> References: <52E59D08.7030408@maths.ox.ac.uk> Message-ID: Colin, thanks for your help. I cloned your project, build the package and browsed thru it, but I'm not sure if I can handle it. I'm following pattern recognition method, where Beltrami flow is suggested as a step in image preprocessing. I read a bit about it, but it seemed too advanced for me to make that concept in Python code right now, so I searched around for implementation expecting a straightforward function in a manner of Gaussian filter or similar. It looks like your project works on mesh surface object, and the interface plays nice with PLY 3D format. So I'll need to somehow translate my image into mesh surface? Or maybe your project is meant for different type of problem? 
Thanks, Klo On Mon, Jan 27, 2014 at 12:40 AM, Colin Macdonald wrote: > On 26/01/14 23:02, klo uo wrote: >> I'm looking for Python implementation of Beltrami flow (for image >> denoising and smoothing), but couldn't find one. >> >> Does someone maybe knows if this algorithm is implemented in some >> project or has a pointers which could probably help me? > > Hi, > > I've done some work with Tom M?rz on image processing on surfaces > (where the image is defined on a curved surface). > > We use the closest point method (finite differences with embedded > surface representation). My implementation is here: > > github.com/cbm755/cp_matrices > > (although Python implementation lags Octave/Matlab one in some ways, > and whole thing is "beta" at best). Happy to follow-up off-list if > this is relevant. > > best, > Colin > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From Jerome.Kieffer at esrf.fr Mon Jan 27 01:29:27 2014 From: Jerome.Kieffer at esrf.fr (Jerome Kieffer) Date: Mon, 27 Jan 2014 07:29:27 +0100 Subject: [SciPy-User] Microscopy TIFF files In-Reply-To: References: Message-ID: <20140127072927.ebcfc249155277b22efc2a60@esrf.fr> Hi Chris, We have developped a library (FabIO) to open images from various sources of X-ray detectors in read/write mode and includes a bit of conversion between formats (while not obvious: see publication: http://journals.iucr.org/j/issues/2013/02/00/kk5124/ ) This library supports MRC & Tiff among other formats: https://github.com/kif/fabio or: http://sourceforge.net/projects/fable/files/fabio/0.1.3/ Hope this helps. -- J?r?me Kieffer Data analysis unit - ESRF From macdonald at maths.ox.ac.uk Mon Jan 27 05:08:52 2014 From: macdonald at maths.ox.ac.uk (Colin Macdonald) Date: Mon, 27 Jan 2014 10:08:52 +0000 Subject: [SciPy-User] Beltrami flow (Laplace-Beltrami operator) In-Reply-To: References: <52E59D08.7030408@maths.ox.ac.uk> Message-ID: <52E63034.5030808@maths.ox.ac.uk> On 27/01/14 01:22, klo uo wrote: > I'm following pattern recognition method, where Beltrami flow is > suggested as a step in image preprocessing. I read a bit about it, but > it seemed too advanced for me to make that concept in Python code > right now, so I searched around for implementation expecting a > straightforward function in a manner of Gaussian filter or similar. > It looks like your project works on mesh surface object, and the > interface plays nice with PLY 3D format. So I'll need to somehow > translate my image into mesh surface? Or maybe your project is meant > for different type of problem? It works on a "closest point representation" where each grid point in R^n (say R^3) stores its closest point on the surface. We then use this representation to solve PDEs on the surface by discretizing in the R^3 space. Not sure if that is appropriate for your problem or not. Can you send a reference? (For example, if the surface deforms, then this software isn't ready yet---and the maths is not much ahead of it!) The ply and mesh stuff is there just because we have some routines that convert such meshes into closest point representation. In an ideal world, these should probably be their own packages. (this is probably more clear in the Octave/Matlab version.) 
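As a concrete instance of that representation, the closest-point function for the unit sphere is simply x/|x|; a minimal sketch:

    import numpy as np

    def cp_sphere(x):
        # closest point on the unit sphere to a point x in R^3
        x = np.asarray(x, dtype=float)
        return x / np.sqrt((x * x).sum())

    print(cp_sphere([1.0, 2.0, 2.0]))   # roughly [0.333, 0.667, 0.667]
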
Colin From yw5aj at virginia.edu Mon Jan 27 15:20:18 2014 From: yw5aj at virginia.edu (Yuxiang Wang) Date: Mon, 27 Jan 2014 15:20:18 -0500 Subject: [SciPy-User] Normalization for optimization in python In-Reply-To: <52E5A1E9.5000009@chem.wisc.edu> References: <52E5A1E9.5000009@chem.wisc.edu> Message-ID: Hi Eric, Great! Thanks a lot for the confirmation. That is really helpful. -Shawn On Sun, Jan 26, 2014 at 7:01 PM, Eric Hermes wrote: > Shawn, > > I extensively use scipy.optimize.fmin_l_bfgs_b, and I always explicitly > normalize the input before passing it to the optimizer. I usually do it > by writing my function as g(x,norm) == > f([x[0]*norm[0],x[1]*norm[1],...]), and pass it to the optimizer as > fmin_l_bfgs_b(func=g,x0=[1.,1.],args=(norm)). Note that you can achieve > rigorous convergence by multiplying norm by the result of optimization > and iterating, but convergence behavior far from a minimum may highly > depend both on what you choose as your initial guess and what your > initial normalization factor is. > > Eric > > On 1/26/2014 3:01 PM, Yuxiang Wang wrote: >> Dear all, >> >> During optimization, it is often helpful to normalize the input >> parameters to make them on the same order of magnitude, so the >> convergence can be much better. For example, if we want to minimize >> f(x), while a reasonable approximation is x0=[1e3, 1e-4], it might be >> helpful to normalize x0[0] and x0[1] to about the same order of >> magnitude (often O(1)). >> >> My question is, I have been using scipy.optimize and specifically, the >> L-BFGS-B algorithm. I was wondering that, do I need to normalize that >> manually by writing a function, or the algorithm already did it for >> me? >> >> Thank you! >> >> -Shawn >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user -- Yuxiang "Shawn" Wang Gerling Research Lab University of Virginia yw5aj at virginia.edu +1 (434) 284-0836 https://sites.google.com/a/virginia.edu/yw5aj/ From klonuo at gmail.com Mon Jan 27 18:03:06 2014 From: klonuo at gmail.com (klo uo) Date: Tue, 28 Jan 2014 00:03:06 +0100 Subject: [SciPy-User] Beltrami flow (Laplace-Beltrami operator) In-Reply-To: <52E63034.5030808@maths.ox.ac.uk> References: <52E59D08.7030408@maths.ox.ac.uk> <52E63034.5030808@maths.ox.ac.uk> Message-ID: Beltrami flow was suggested as a filter because it preserves object edges in satisfactory way, compared to other diffusion filters. While searching for this filter, I came also by your paper about surface segmentation and several others dealing with meshes like for example in MRI scans. I found a simple Matlab procedure in "Numerical Geometry of Images" by Ron Kimmel: http://books.google.com/books?id=0kuKevTKxOAC&pg=PA148 and here is IPython Notebook which demonstrates this procedure applied to sample images from "Handbook of Geometric Computing" as a validation: http://nbviewer.ipython.org/gist/anonymous/7657b8105c15d316bb82 I just need to translate this code to Python, which shouldn't be a problem Cheers On Mon, Jan 27, 2014 at 11:08 AM, Colin Macdonald wrote: > On 27/01/14 01:22, klo uo wrote: >> I'm following pattern recognition method, where Beltrami flow is >> suggested as a step in image preprocessing. I read a bit about it, but >> it seemed too advanced for me to make that concept in Python code >> right now, so I searched around for implementation expecting a >> straightforward function in a manner of Gaussian filter or similar. 
>> It looks like your project works on mesh surface object, and the >> interface plays nice with PLY 3D format. So I'll need to somehow >> translate my image into mesh surface? Or maybe your project is meant >> for different type of problem? > > It works on a "closest point representation" where each grid point in > R^n (say R^3) stores its closest point on the surface. We then use > this representation to solve PDEs on the surface by discretizing in > the R^3 space. > > Not sure if that is appropriate for your problem or not. Can you send > a reference? (For example, if the surface deforms, then this software > isn't ready yet---and the maths is not much ahead of it!) > > The ply and mesh stuff is there just because we have some routines > that convert such meshes into closest point representation. In an > ideal world, these should probably be their own packages. > (this is probably more clear in the Octave/Matlab version.) > > Colin > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From rroemer at gmail.com Wed Jan 29 09:42:31 2014 From: rroemer at gmail.com (=?UTF-8?Q?Ronald_R=C3=B6mer?=) Date: Wed, 29 Jan 2014 15:42:31 +0100 Subject: [SciPy-User] Swapped axis of interpolate.bisplev Message-ID: Hello, I have a simple question about the bisplrep-function. Here is my example: ===== Python 2.7.1 (r271:86832, Apr 12 2011, 16:15:16) [GCC 4.6.0 20110331 (Red Hat 4.6.0-2)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import numpy >>> from scipy import interpolate >>> x = [[0.00044999999999999999, 0.0013500000000000001, 0.0022499999999999998, 0.00315], ... [0.00044999999999999999, 0.0013500000000000001, 0.0022499999999999998, 0.00315], ... [0.00044999999999999999, 0.0013500000000000001, 0.0022499999999999998, 0.00315], ... [0.00044999999999999999, 0.0013500000000000001, 0.0022499999999999998, 0.00315]] >>> y = [[0.00044999999999999999, 0.00044999999999999999, 0.00044999999999999999, 0.00044999999999999999], ... [0.0013500000000000001, 0.0013500000000000001, 0.0013500000000000001, 0.0013500000000000001], ... [0.0022499999999999998, 0.0022499999999999998, 0.0022499999999999998, 0.0022499999999999998], ... [0.00315, 0.00315, 0.00315, 0.00315]] >>> r = [[354532.60790243, 549112.50964666996, 753454.90521641006, 136672.17168719001], ... [262992.88639023999, 126681.69270699999, -171772.44189881999, -705770.91541941999], ... [261424.75081678, 125749.32400532, -177467.12005669001, -730048.73154369998], ... [350700.67623087001, 542821.56886627001, 734994.61649696995, 87886.793706120006]] >>> ipl = interpolate.bisplrep(x, y, r) >>> interpolate.bisplev(x[0], [ yi[0] for yi in y ], ipl) array([[ 354532.60790243, 262992.88639024, 261424.75081678, 350700.67623087], [ 549112.50964667, 126681.692707 , 125749.32400532, 542821.56886627], [ 753454.90521641, -171772.44189882, -177467.12005669, 734994.61649697], [ 136672.17168719, -705770.91541942, -730048.7315437 , 87886.79370612]]) >>> ===== And now the question: Why is the last output not equivalent to r? It seems so, as if the x- and y-axis are swapped. Whats the reason for that issue? Best regards. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From j.s4403 at gmail.com Wed Jan 29 10:49:31 2014 From: j.s4403 at gmail.com (j s) Date: Wed, 29 Jan 2014 09:49:31 -0600 Subject: [SciPy-User] scipy.sparse.linalg license Message-ID: Hello, I'd like to incorporate sparse matrix routines into my project, but I need clarification about the project's license. Is UMFPACK still the underlying solver? Is the matrix factorization code under the GPL? Are any of the routines under a more permissive license? Regards, J From cournape at gmail.com Wed Jan 29 16:39:21 2014 From: cournape at gmail.com (David Cournapeau) Date: Wed, 29 Jan 2014 21:39:21 +0000 Subject: [SciPy-User] scipy.sparse.linalg license In-Reply-To: References: Message-ID: On Wed, Jan 29, 2014 at 3:49 PM, j s wrote: > Hello, > > I'd like to incorporate sparse matrix routines into my project, but I > need clarification about the project's license. > > Is UMFPACK still the underlying solver? Is the matrix factorization > code under the GPL? Are any of the routines under a more permissive > license? > Hi J, You would need to give us more details about the exact code you are using, but all the code in the current scipy git repository should be available under a permissive license. David -------------- next part -------------- An HTML attachment was scrubbed... URL: From klamathconservation at gmail.com Wed Jan 29 18:50:38 2014 From: klamathconservation at gmail.com (Carlos Carroll) Date: Wed, 29 Jan 2014 15:50:38 -0800 Subject: [SciPy-User] multivariate distance calculations and clustering of large matrices Message-ID: We are attempting to generalize a function developed in R from the univariate to multivariate context, and are finding that computational and memory requirements suggest a shift to python. The function calculates the geographic distance between each cell in a matrix and the closest cell which is considered "similar" in terms of environmental variables (climate in this case). The matrix has between 10^7 to 10^8 cells, and ~20 environmental covariates. The univariate function in R: t <- 0.25 # plus/minus threshold for climate match t <- 1/(t*2) # inverse for rounding, double for plus/minus x <- present$x # vector of grid cell x coordinates y <- present$y # vector of grid cell y coordinates p <- round(present$var.1*t)/t # vector of rounded present climate values f <- round(future$var.1*t)/t # vector of rounded future climate values d <- vector(length=length(p)) # empty vector to write distance to climate match u <- unique(p)[order(unique(p))] # list of unique climate values in p match <- function(u){c(which(u==f))} # function finding climate matches of u with f m <- sapply(u, match) # list of climate matches for unique values for(i in 1:length(p)){ # loop for all grid cells of p mi <- m[[which(u==p[i])]] # matches for grid cell i d[i] <- sqrt(min((x[i]-x[mi])^2 + (y[i]-y[mi])^2)) # distance to closest match } For the multivariate function, we would be using something like the standardized Euclidean distance (eg from scipy,spatial.distance) to assess environmental similarity. standardized Euclidean distance, between two 1-D arrays u and v is: np.sqrt(((u - v) ** 2 / V).sum()) Can someone suggest a good solution to reducing the computational and memory requirements? Maybe we need to explore some type of clustering algorithms that work on sparse matrices? -------------- next part -------------- An HTML attachment was scrubbed... 
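A rough NumPy/SciPy sketch of the univariate matching step described above (the function name and the use of cKDTree for the nearest-match search are illustrative, not taken from the original R code):

    import numpy as np
    from scipy.spatial import cKDTree

    def dist_to_climate_match(x, y, p, f, t=0.25):
        # x, y: cell coordinates; p, f: present/future climate; t: half-width of the match bin
        step = 2 * t
        p_bin = np.round(p / step) * step   # 0.5-wide bins, matching round(p*2)/2 for t = 0.25
        f_bin = np.round(f / step) * step
        xy = np.column_stack([x, y])
        d = np.full(p_bin.shape, np.inf)    # cells with no match keep inf
        for val in np.unique(p_bin):
            need = (p_bin == val)           # cells looking for a match in this bin
            cand = (f_bin == val)           # future cells falling in this bin
            if cand.any():
                dist, _ = cKDTree(xy[cand]).query(xy[need])
                d[need] = dist
        return d

Building one small KD-tree per climate bin keeps the memory footprint modest compared to forming a full pairwise distance matrix over 10^7 to 10^8 cells.
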
URL: From j.s4403 at gmail.com Wed Jan 29 19:58:45 2014 From: j.s4403 at gmail.com (j s) Date: Wed, 29 Jan 2014 18:58:45 -0600 Subject: [SciPy-User] scipy.sparse.linalg license In-Reply-To: References: Message-ID: <52E9A3C5.2010907@gmail.com> On 1/29/14, 3:39 PM, David Cournapeau wrote: > > > > On Wed, Jan 29, 2014 at 3:49 PM, j s > wrote: > > Hello, > > I'd like to incorporate sparse matrix routines into my project, but I > need clarification about the project's license. > > Is UMFPACK still the underlying solver? Is the matrix factorization > code under the GPL? Are any of the routines under a more permissive > license? > > > Hi J, > > You would need to give us more details about the exact code you are > using, but all the code in the current scipy git repository should be > available under a permissive license. I haven't started using it yet, but I would like to do sparse matrix factorization. The default solver for scipy.sparse.linalg is UMFPACK, which is GPL. Was scipy able to negotiate the use this code under a more permissive license? What solver is used when use_umfpack=False? J > > David > > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From warren.weckesser at gmail.com Wed Jan 29 23:15:57 2014 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Wed, 29 Jan 2014 23:15:57 -0500 Subject: [SciPy-User] scipy.sparse.linalg license In-Reply-To: <52E9A3C5.2010907@gmail.com> References: <52E9A3C5.2010907@gmail.com> Message-ID: On 1/29/14, j s wrote: > On 1/29/14, 3:39 PM, David Cournapeau wrote: >> >> >> >> On Wed, Jan 29, 2014 at 3:49 PM, j s > > wrote: >> >> Hello, >> >> I'd like to incorporate sparse matrix routines into my project, but I >> need clarification about the project's license. >> >> Is UMFPACK still the underlying solver? Is the matrix factorization >> code under the GPL? Are any of the routines under a more permissive >> license? >> >> >> Hi J, >> >> You would need to give us more details about the exact code you are >> using, but all the code in the current scipy git repository should be >> available under a permissive license. > > I haven't started using it yet, but I would like to do sparse matrix > factorization. The default solver for scipy.sparse.linalg is UMFPACK, > which is GPL. Was scipy able to negotiate the use this code under a > more permissive license? What solver is used when use_umfpack=False? > UMFPACK is deprecated, and will be removed when this pull request is merged: https://github.com/scipy/scipy/pull/3178 Warren > J > > >> >> David >> >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > From L.J.Buitinck at uva.nl Thu Jan 30 07:47:18 2014 From: L.J.Buitinck at uva.nl (Lars Buitinck) Date: Thu, 30 Jan 2014 13:47:18 +0100 Subject: [SciPy-User] L-BFGS broken in Ubuntu's scipy builds? Message-ID: Dear all, I was trying to implement logistic regression using the fmin_l_bfgs_b from SciPy 0.11/0.12 as shipped by Ubuntu 13.04/13.10 and nothing seemed to work. Fearing I had done something wrong in implementing the gradient from Bishop's book, I turned to Fabian Pedregosa's implementation [1] to see if that would work. 
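(For concreteness, a toy version of this kind of setup -- an L2-regularized logistic loss with its analytic gradient, sanity-checked with scipy.optimize.check_grad and then handed to fmin_l_bfgs_b; the data and names below are made up and this is not the benchmark script itself:

    import numpy as np
    from scipy.special import expit
    from scipy.optimize import fmin_l_bfgs_b, check_grad

    def loss(w, X, y, alpha):
        # negative log-likelihood + L2 penalty, labels y in {-1, +1}
        z = X.dot(w)
        return np.logaddexp(0, -y * z).sum() + 0.5 * alpha * w.dot(w)

    def grad(w, X, y, alpha):
        z = X.dot(w)
        return -X.T.dot(y * expit(-y * z)) + alpha * w

    rng = np.random.RandomState(0)
    X = rng.randn(200, 5)
    y = np.sign(X.dot(rng.randn(5)))

    w0 = np.zeros(X.shape[1])
    print("gradient check: %g" % check_grad(loss, grad, w0, X, y, 1.0))

    w_hat, fval, info = fmin_l_bfgs_b(loss, w0, fprime=grad, args=(X, y, 1.0))
    print(info["task"])

A small value from check_grad, relative to the scale of the loss, is a quick way to rule out the gradient itself as the culprit.)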
I removed the callback and maxiter options that aren't available in 0.11 from the script, then tried the following combinations: * SciPy 0.11.0 (Ubuntu), NumPy 1.7.1 (Ubuntu): ABNORMAL_TERMINATION_IN_LNSRCH * SciPy 0.12.0 (Ubuntu), NumPy 1.7.1 (Ubuntu): ABNORMAL_TERMINATION_IN_LNSRCH * SciPy 0.12.0 (fresh build), NumPy ad5bded (GitHub): converges * SciPy 0.13.0rc1, NumPy ad5bded (GitHub): converges I also tried a freshly built SciPy 0.11.0 from GH but it failed to load something from LAPACK. I got the same error from my own code. A numerical gradient check pointed out no obvious flaws in my code. Is this a known issue with Ubuntu's SciPy build? I saw a bunch of commits related to line search issues in the Git log, could any of those be related? [1] http://nbviewer.ipython.org/github/fabianp/pytron/blob/master/doc/benchmark_logistic.ipynb Regards, Lars Buitinck From alan.isaac at gmail.com Thu Jan 30 10:17:49 2014 From: alan.isaac at gmail.com (Alan G Isaac) Date: Thu, 30 Jan 2014 10:17:49 -0500 Subject: [SciPy-User] missing web page? Message-ID: <52EA6D1D.8080103@gmail.com> At http://docs.scipy.org/doc/numpy/reference/routines.statistics.html all the links seem to work except this one: http://docs.scipy.org/doc/numpy/reference/generated/numpy.var.html#numpy.var Odd ... Alan Isaac From j.s4403 at gmail.com Thu Jan 30 11:03:21 2014 From: j.s4403 at gmail.com (j s) Date: Thu, 30 Jan 2014 10:03:21 -0600 Subject: [SciPy-User] scipy.sparse.linalg license In-Reply-To: References: <52E9A3C5.2010907@gmail.com> Message-ID: Looking at the pull request, superlu will be the default, which I've used from its C interface. This solver is sufficient for my needs. Thank you, Juan On Wed, Jan 29, 2014 at 10:15 PM, Warren Weckesser wrote: > On 1/29/14, j s wrote: >> On 1/29/14, 3:39 PM, David Cournapeau wrote: >>> >>> >>> >>> On Wed, Jan 29, 2014 at 3:49 PM, j s >> > wrote: >>> >>> Hello, >>> >>> I'd like to incorporate sparse matrix routines into my project, but I >>> need clarification about the project's license. >>> >>> Is UMFPACK still the underlying solver? Is the matrix factorization >>> code under the GPL? Are any of the routines under a more permissive >>> license? >>> >>> >>> Hi J, >>> >>> You would need to give us more details about the exact code you are >>> using, but all the code in the current scipy git repository should be >>> available under a permissive license. >> >> I haven't started using it yet, but I would like to do sparse matrix >> factorization. The default solver for scipy.sparse.linalg is UMFPACK, >> which is GPL. Was scipy able to negotiate the use this code under a >> more permissive license? What solver is used when use_umfpack=False? 
>> > > > UMFPACK is deprecated, and will be removed when this pull request is merged: > https://github.com/scipy/scipy/pull/3178 > > Warren > > >> J >> >> >>> >>> David >>> >>> >>> _______________________________________________ >>> SciPy-User mailing list >>> SciPy-User at scipy.org >>> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user From jsseabold at gmail.com Thu Jan 30 11:35:29 2014 From: jsseabold at gmail.com (Skipper Seabold) Date: Thu, 30 Jan 2014 11:35:29 -0500 Subject: [SciPy-User] scipy.sparse.linalg license In-Reply-To: References: <52E9A3C5.2010907@gmail.com> Message-ID: On Thu, Jan 30, 2014 at 11:03 AM, j s wrote: > Looking at the pull request, superlu will be the default, which I've > used from its C interface. This solver is sufficient for my needs. > Indeed. Could/should SuperLU's use be better advertised? Skipper From pav at iki.fi Thu Jan 30 11:49:28 2014 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 30 Jan 2014 18:49:28 +0200 Subject: [SciPy-User] L-BFGS broken in Ubuntu's scipy builds? In-Reply-To: References: Message-ID: 30.01.2014 14:47, Lars Buitinck kirjoitti: > I was trying to implement logistic regression using the fmin_l_bfgs_b > from SciPy 0.11/0.12 as shipped by Ubuntu 13.04/13.10 and nothing > seemed to work. Fearing I had done something wrong in implementing the > gradient from Bishop's book, I turned to Fabian Pedregosa's > implementation [1] to see if that would work. I removed the callback > and maxiter options that aren't available in 0.11 from the script, > then tried the following combinations: > > [1] http://nbviewer.ipython.org/github/fabianp/pytron/blob/master >/doc/benchmark_logistic.ipynb This works for me without any problems on Ubuntu 13.10, which has Scipy 0.12.0, Numpy 1.7.1, as directly installed from Ubuntu packages. Given this, I'm not sure you have problems. Different Ubuntu version? Faulty BLAS library? > * SciPy 0.11.0 (Ubuntu), NumPy 1.7.1 (Ubuntu): ABNORMAL_TERMINATION_IN_LNSRCH > * SciPy 0.12.0 (Ubuntu), NumPy 1.7.1 (Ubuntu): ABNORMAL_TERMINATION_IN_LNSRCH > * SciPy 0.12.0 (fresh build), NumPy ad5bded (GitHub): converges > * SciPy 0.13.0rc1, NumPy ad5bded (GitHub): converges > > I also tried a freshly built SciPy 0.11.0 from GH but it failed to > load something from LAPACK. I got the same error from my own code. A > numerical gradient check pointed out no obvious flaws in my code. The error message comes from LBFGS code itself. This is unmodified code by J. Nocedal et al. > Is this a known issue with Ubuntu's SciPy build? I saw a bunch of > commits related to line search issues in the Git log, could any of > those be related? Those are unrelated. -- Pauli Virtanen From pav at iki.fi Thu Jan 30 11:51:29 2014 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 30 Jan 2014 18:51:29 +0200 Subject: [SciPy-User] scipy.sparse.linalg license In-Reply-To: References: <52E9A3C5.2010907@gmail.com> Message-ID: 30.01.2014 18:35, Skipper Seabold kirjoitti: [clip] > Indeed. Could/should SuperLU's use be better advertised? The spsolve docstring should give a reference to it, at the least. 
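For anyone landing on this thread, a minimal sketch of the SuperLU-backed path in scipy.sparse.linalg (the matrix below is an arbitrary small example, chosen only for illustration):

    import numpy as np
    from scipy import sparse
    from scipy.sparse.linalg import spsolve, splu

    n = 100
    A = sparse.diags([-np.ones(n - 1), 2 * np.ones(n), -np.ones(n - 1)],
                     [-1, 0, 1]).tocsc()        # 1-D Laplacian, CSC as SuperLU expects
    b = np.ones(n)

    x1 = spsolve(A, b, use_umfpack=False)       # forces the SuperLU code path
    lu = splu(A)                                # explicit LU factorization via SuperLU
    x2 = lu.solve(b)                            # reusable for many right-hand sides
    print(np.allclose(x1, x2))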
-- Pauli Virtanen From jenny.stone125 at gmail.com Thu Jan 30 18:01:01 2014 From: jenny.stone125 at gmail.com (jennifer stone) Date: Fri, 31 Jan 2014 04:31:01 +0530 Subject: [SciPy-User] Suggestions for GSoC Projects Message-ID: With GSoC 2014 being round the corner, I hereby put up few projects for discussion that I would love to pursue as a student. Guidance, suggestions are cordially welcome:- 1. If I am not mistaken, contour integration is not supported by SciPy; in fact even line integrals of real functions is yet to be implemented in SciPy, which is surprising. Though we at present have SymPy for line Integrals, I doubt if there is any open-source python package supporting the calculation of Contour Integrals. With integrate module of SciPy already having been properly developed for definite integration, implementation of line as well as contour integrals, I presume; would not require work from scratch and shall be a challenging but fruitful project. 2. I really have no idea if the purpose of NumPy or SciPy would encompass this but we are yet to have indefinite integration. An implementation of that, though highly challenging, may open doors for innumerable other functions like the ones to calculate the Laplace transform, Hankel transform and many more. 3. As stated earlier, we have spherical harmonic functions (with much scope for dev) we are yet to have elliptical and cylindrical harmonic function, which may be developed. 4. Lastly, we are yet to have Inverse Laplace transforms which as Ralf has rightly pointed out it may be too challenging to implement. 5. Further reading the road-map given by Mr.Ralf, I would like to develop the Bluestein's FFT algorithm. Thanks for reading along till the end. I shall append to this mail as when I am struck with ideas. Please do give your valuable guidance -------------- next part -------------- An HTML attachment was scrubbed... URL: From guziy.sasha at gmail.com Thu Jan 30 18:24:23 2014 From: guziy.sasha at gmail.com (Oleksandr Huziy) Date: Thu, 30 Jan 2014 18:24:23 -0500 Subject: [SciPy-User] Suggestions for GSoC Projects In-Reply-To: References: Message-ID: Hi Jennifer: >2. I really have no idea if the purpose of NumPy or SciPy would encompass this but we are yet to have indefinite integration. An implementation of that, though highly challenging, may >open doors for innumerable other functions like the ones to calculate the Laplace transform, Hankel transform and many more do you mean the integrals with infinite upper or lower limit.... I think it is already there, if you've meant something else, please correct me.. http://guziy.github.io/CNRCWP_student_workshop_2013-12-17/Python_CNRCWP_tutorial.slides.html#/6/10 Thanks 2014-01-30 jennifer stone > With GSoC 2014 being round the corner, I hereby put up few projects for > discussion that I would love to pursue as a student. > Guidance, suggestions are cordially welcome:- > > 1. If I am not mistaken, contour integration is not supported by SciPy; in > fact even line integrals of real functions is yet to be implemented in > SciPy, which is surprising. Though we at present have SymPy for line > Integrals, I doubt if there is any open-source python package supporting > the calculation of Contour Integrals. With integrate module of SciPy > already having been properly developed for definite integration, > implementation of line as well as contour integrals, I presume; would not > require work from scratch and shall be a challenging but fruitful project. > > 2. 
I really have no idea if the purpose of NumPy or SciPy would encompass > this but we are yet to have indefinite integration. An implementation of > that, though highly challenging, may open doors for innumerable other > functions like the ones to calculate the Laplace transform, Hankel > transform and many more. > > 3. As stated earlier, we have spherical harmonic functions (with much > scope for dev) we are yet to have elliptical and cylindrical harmonic > function, which may be developed. > > 4. Lastly, we are yet to have Inverse Laplace transforms which as Ralf has > rightly pointed out it may be too challenging to implement. > > 5. Further reading the road-map given by Mr.Ralf, I would like to develop > the Bluestein's FFT algorithm. > > Thanks for reading along till the end. I shall append to this mail as when > I am struck with ideas. Please do give your valuable guidance > > _______________________________________________ > SciPy-User mailing list > SciPy-User at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-user > > -- Sasha -------------- next part -------------- An HTML attachment was scrubbed... URL: From guziy.sasha at gmail.com Thu Jan 30 18:26:29 2014 From: guziy.sasha at gmail.com (Oleksandr Huziy) Date: Thu, 30 Jan 2014 18:26:29 -0500 Subject: [SciPy-User] Suggestions for GSoC Projects In-Reply-To: References: Message-ID: Oops, excuse me, indefinite, I have misunderstood 2014-01-30 Oleksandr Huziy > Hi Jennifer: > > >2. I really have no idea if the purpose of NumPy or SciPy would encompass > this but we are yet to have indefinite integration. An implementation of > that, though highly challenging, may >open doors for innumerable other > functions like the ones to calculate the Laplace transform, Hankel > transform and many more > > do you mean the integrals with infinite upper or lower limit.... I think > it is already there, if you've meant something else, please correct me.. > > http://guziy.github.io/CNRCWP_student_workshop_2013-12-17/Python_CNRCWP_tutorial.slides.html#/6/10 > > Thanks > > > > > 2014-01-30 jennifer stone > >> With GSoC 2014 being round the corner, I hereby put up few projects for >> discussion that I would love to pursue as a student. >> Guidance, suggestions are cordially welcome:- >> >> 1. If I am not mistaken, contour integration is not supported by SciPy; >> in fact even line integrals of real functions is yet to be implemented in >> SciPy, which is surprising. Though we at present have SymPy for line >> Integrals, I doubt if there is any open-source python package supporting >> the calculation of Contour Integrals. With integrate module of SciPy >> already having been properly developed for definite integration, >> implementation of line as well as contour integrals, I presume; would not >> require work from scratch and shall be a challenging but fruitful project. >> >> 2. I really have no idea if the purpose of NumPy or SciPy would encompass >> this but we are yet to have indefinite integration. An implementation of >> that, though highly challenging, may open doors for innumerable other >> functions like the ones to calculate the Laplace transform, Hankel >> transform and many more. >> >> 3. As stated earlier, we have spherical harmonic functions (with much >> scope for dev) we are yet to have elliptical and cylindrical harmonic >> function, which may be developed. >> >> 4. Lastly, we are yet to have Inverse Laplace transforms which as Ralf >> has rightly pointed out it may be too challenging to implement. >> >> 5. 
Further reading the road-map given by Mr.Ralf, I would like to develop >> the Bluestein's FFT algorithm. >> >> Thanks for reading along till the end. I shall append to this mail as >> when I am struck with ideas. Please do give your valuable guidance >> >> _______________________________________________ >> SciPy-User mailing list >> SciPy-User at scipy.org >> http://mail.scipy.org/mailman/listinfo/scipy-user >> >> > > > -- > Sasha > -- Sasha -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Fri Jan 31 07:23:20 2014 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Fri, 31 Jan 2014 13:23:20 +0100 Subject: [SciPy-User] missing web page? In-Reply-To: <52EA6D1D.8080103@gmail.com> References: <52EA6D1D.8080103@gmail.com> Message-ID: On Thu, Jan 30, 2014 at 4:17 PM, Alan G Isaac wrote: > At http://docs.scipy.org/doc/numpy/reference/routines.statistics.html > all the links seem to work except this one: > > http://docs.scipy.org/doc/numpy/reference/generated/numpy.var.html#numpy.var > > Odd ... > https://github.com/numpy/numpy/issues/1951 Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From gb.gabrielebrambilla at gmail.com Fri Jan 31 13:47:35 2014 From: gb.gabrielebrambilla at gmail.com (Gabriele Brambilla) Date: Fri, 31 Jan 2014 13:47:35 -0500 Subject: [SciPy-User] Gaussian filter Message-ID: Hi, I'm an italian guy that studies astrophysics. I'm using Scipy's gaussian filter to smooth some images... Could you tell me the meaning the unity of measure of the parameter sigma? thanks Gabriele -------------- next part -------------- An HTML attachment was scrubbed... URL:
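Regarding the last question above: in scipy.ndimage the sigma of gaussian_filter is the standard deviation of the Gaussian kernel expressed in array samples (pixels), given either as one number or as a per-axis sequence; it carries no physical unit, so any physical smoothing scale has to be converted by the caller (e.g. sigma_pixels = sigma_arcsec / pixel_scale_arcsec, where the pixel scale is whatever your instrument provides). A short sketch that makes the unit visible by filtering a single bright pixel (the image here is synthetic):

    import numpy as np
    from scipy import ndimage

    img = np.zeros((101, 101))
    img[50, 50] = 1.0                          # a delta "point source"
    sm = ndimage.gaussian_filter(img, sigma=3.0)

    # the response is a sampled 2-D Gaussian; measure its width in pixels
    xs = np.arange(101) - 50.0
    profile = sm[50, :]                        # central row through the source
    width = np.sqrt((profile * xs ** 2).sum() / profile.sum())
    print("measured sigma: %.3f pixels" % width)   # ~3.0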